Is it possible to get a file that contains the mapping between input reads and representative sequences for the q2-dada2 process?

clionnet · November 16, 2017, 4:46pm

Hi,

Sorry for my bad English.
I have a simulated dataset of MiSeq paired-end reads and I know the taxonomy of each read. I would like to test my dataset with dada2 and sklearn. To do this, I need the correspondence between the input reads and the representative sequences produced by q2-dada2. Currently, I have the taxonomy for each representative sequence but not for each input read.

Is there a method or a file to get the correspondence between reads and sequences?

Regards,
Clément

ebolyen · November 17, 2017, 1:04am

Hi @clionnet!

I believe what you are asking for is something like an OTU map, correct? We don't have anything in QIIME 2 that has that concept. @benjjneb, would it be possible to track which reads became which ASVs if we were to use DADA2 directly in R?

Do you have the reference sequences that are the basis of your simulated dataset? It might make more sense to try and match the ASVs back to the references (I think you could use qiime vsearch cluster-features-closed-reference with --p-perc-identity of 1.0) as ideally they should be exact matches or very close to it. This assumes your simulation provides the same kind of error profiles that we see with real sequencing instruments. (I'm assuming you are trying to validate your simulation rather than DADA2/feature-classifier?)
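To illustrate the idea of matching ASVs back to the references at 100% identity, here is a minimal Python sketch. It is a simplified stand-in, not what vsearch does internally: it only checks whether each ASV occurs as an exact substring of a reference, and it assumes both are in the same orientation. The function name and the dictionary inputs are hypothetical.

```python
def match_asvs_to_references(asvs, references):
    """Return {asv_id: [ref_id, ...]} for exact substring matches.

    asvs and references are hypothetical {id: sequence} dictionaries.
    Assumption: ASVs and references share the same strand orientation;
    reverse complements are not checked.
    """
    matches = {}
    for asv_id, asv_seq in asvs.items():
        query = asv_seq.upper()
        # Keep every reference that contains the ASV exactly.
        matches[asv_id] = [
            ref_id
            for ref_id, ref_seq in references.items()
            if query in ref_seq.upper()
        ]
    return matches


if __name__ == "__main__":
    toy_asvs = {"asv1": "ACGT", "asv2": "TTTT"}
    toy_refs = {"refA": "GGACGTCC", "refB": "acgtAA"}
    print(match_asvs_to_references(toy_asvs, toy_refs))
```

An ASV with an empty hit list did not match any reference exactly, which in a simulated dataset would point to either an uncorrected error or a mismatch in orientation or trimming.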

benjjneb · November 17, 2017, 1:19am

Yes, but no. It is possible, but requires some R hacking and using some poorly documented features of the dada2 R package.

We've recently had a request for this same feature over at our GitHub site, and it is on our radar for our next release. You'll see progress on easy read tracking appear there.
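For readers wondering what the tracking involves conceptually, the sketch below composes two index maps: read to unique sequence (from dereplication) and unique sequence to ASV (from denoising). As far as I know, the dada2 R objects expose analogous maps (the `$map` entries of the derep and dada objects), but treat that as an assumption. This Python code is not the dada2 API: all names are hypothetical, indices are 0-based here (R is 1-based), and paired-end merging and chimera removal are ignored.

```python
def reads_to_asvs(read_ids, read_to_unique, unique_to_asv):
    """Compose read -> unique -> ASV index maps.

    read_ids:        list of read identifiers, in input order.
    read_to_unique:  read_to_unique[i] is the index of the unique
                     sequence that read i collapsed into.
    unique_to_asv:   unique_to_asv[u] is the index of the ASV that
                     unique sequence u was assigned to, or None if
                     it was discarded.
    """
    if len(read_ids) != len(read_to_unique):
        raise ValueError("read_ids and read_to_unique differ in length")
    return {
        read_id: unique_to_asv[unique_idx]
        for read_id, unique_idx in zip(read_ids, read_to_unique)
    }


if __name__ == "__main__":
    ids = ["r1", "r2", "r3", "r4"]
    print(reads_to_asvs(ids, [0, 1, 0, 2], [0, 0, None]))
```

In a real paired-end run the forward and reverse reads each have their own maps, and a read pair only maps to a final ASV if both mates survive and merge, so the composition has one more step than shown here.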

clionnet · November 17, 2017, 3:31pm

Thanks for your answers, @ebolyen and @benjjneb.
Yes, in a way I would like an OTU map, but I read that dada2 doesn't perform clustering, only dereplication to remove read redundancy.

I will try to use qiime vsearch cluster-features-closed-reference with your parameter; it's a good idea.

clionnet · November 17, 2017, 3:31pm

I read the documentation for qiime vsearch cluster-features-closed-reference and I realize that it doesn't solve my problem, because qiime vsearch needs the output of dada2, where the individual reads have already been lost through dereplication.

I will try to use dada2 directly in R.
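Once a read-to-representative-sequence map is available, the per-read test described in the original question could look roughly like the Python sketch below. It joins the map with the taxonomy assigned to each representative sequence and compares the result with the known taxonomy of each simulated read. All names and the dictionary layout are hypothetical, and counting unmapped reads as incorrect is only one possible choice.

```python
def per_read_accuracy(read_to_asv, asv_taxonomy, true_taxonomy):
    """Fraction of reads whose ASV-derived taxonomy equals the truth.

    read_to_asv:   {read_id: asv_id or None}  (None = read was dropped)
    asv_taxonomy:  {asv_id: taxonomy string from the classifier}
    true_taxonomy: {read_id: known taxonomy from the simulation}

    Reads that were dropped, or whose ASV has no taxonomy, count as
    incorrect in this sketch.
    """
    if not true_taxonomy:
        raise ValueError("true_taxonomy is empty")
    correct = 0
    for read_id, truth in true_taxonomy.items():
        asv_id = read_to_asv.get(read_id)
        predicted = asv_taxonomy.get(asv_id)
        if predicted is not None and predicted == truth:
            correct += 1
    return correct / len(true_taxonomy)
```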
