Sorry for my bad English.
I have a simulated dataset of MiSeq paired-end reads, and I know the taxonomy of each read. I would like to test my dataset with DADA2 and sklearn. To do so, I need the correspondence between the input reads and the representative sequences produced by q2-dada2. Currently, I have a taxonomy for each representative sequence but not for each input read.
Is there a method or a file to get the correspondence between reads and sequences?
I believe what you are asking for is something like an OTU map, correct? We don't have anything in QIIME 2 that has that concept. @benjjneb, would it be possible to track which reads became which ASVs if we were to use DADA2 directly in R?
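To make the "OTU map" idea concrete: conceptually, tracking reads to ASVs is a composition of two lookups, read to unique sequence (dereplication) and unique sequence to ASV (denoising). The sketch below is plain Python with made-up data, not DADA2's or QIIME 2's API. The names `read_to_unique` and `unique_to_asv` are hypothetical, and the sketch ignores paired-end merging and chimera removal, which would each add a further lookup step.

```python
from collections import defaultdict


def build_read_to_asv(read_ids, read_to_unique, unique_to_asv):
    """Compose read -> unique and unique -> ASV lookups.

    read_to_unique[i] is the index of the unique sequence for read i.
    unique_to_asv[j] is the ASV label for unique j, or None if that
    unique sequence was discarded. Reads pointing at a discarded
    unique are left out of the result.
    """
    mapping = {}
    for read_id, unique_index in zip(read_ids, read_to_unique):
        asv = unique_to_asv[unique_index]
        if asv is not None:
            mapping[read_id] = asv
    return mapping


def invert(read_to_asv):
    """Turn read -> ASV into ASV -> [reads], i.e. an OTU-map-like table."""
    asv_to_reads = defaultdict(list)
    for read_id, asv in read_to_asv.items():
        asv_to_reads[asv].append(read_id)
    return dict(asv_to_reads)


# Hypothetical toy data: five reads, four unique sequences, two ASVs.
read_ids = ["r1", "r2", "r3", "r4", "r5"]
read_to_unique = [0, 1, 2, 3, 0]
unique_to_asv = ["ASV1", "ASV1", "ASV2", None]

read_to_asv = build_read_to_asv(read_ids, read_to_unique, unique_to_asv)
otu_map = invert(read_to_asv)
```

With a table like `read_to_asv`, the known taxonomy of each simulated read could be compared directly against the taxonomy assigned to its ASV.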
Do you have the reference sequences that are the basis of your simulated dataset? It might make more sense to try and match the ASVs back to the references (I think you could use qiime vsearch cluster-features-closed-reference with --p-perc-identity of 1.0) as ideally they should be exact matches or very close to it. This assumes your simulation provides the same kind of error profiles that we see with real sequencing instruments. (I'm assuming you are trying to validate your simulation rather than DADA2/feature-classifier?)
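As a rough illustration of what matching ASVs back to the references at 100% identity means, here is a minimal Python sketch. It is a simplified stand-in, not what vsearch does internally: it assumes each ASV is an amplicon that appears as an exact substring of a full-length reference in the same orientation. The function name and the toy sequences are hypothetical.

```python
def match_asvs_to_references(asvs, references):
    """Return {asv_id: [ids of references that contain the ASV exactly]}.

    Assumption: ASVs and references are in the same orientation and an
    ASV counts as a match only if it is an exact substring of a reference.
    """
    matches = {}
    for asv_id, asv_seq in asvs.items():
        asv_seq = asv_seq.upper()
        matches[asv_id] = [
            ref_id
            for ref_id, ref_seq in references.items()
            if asv_seq in ref_seq.upper()
        ]
    return matches


# Hypothetical toy sequences, far shorter than real amplicons.
references = {
    "refA": "AAACCCGGGTTTACGT",
    "refB": "TTTTGGGGCCCCAAAA",
}
asvs = {
    "asv1": "CCCGGGTTT",  # exact substring of refA
    "asv2": "GGGGCCCC",   # exact substring of refB
    "asv3": "CCCGGGTTA",  # one mismatch against refA, so no match
}

matches = match_asvs_to_references(asvs, references)
unmatched = [asv_id for asv_id, refs in matches.items() if not refs]
```

ASVs that end up in `unmatched` are the ones worth inspecting, since they differ from every reference by at least one base.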
Thanks for your answers, @ebolyen and @benjjneb.
Yes, in a way I would like an OTU map, but I read that DADA2 doesn't perform clustering, only dereplication to remove read redundancy.
I will try qiime vsearch cluster-features-closed-reference with your parameter; it's a good idea.
I read the docs for qiime vsearch cluster-features-closed-reference and realized that it doesn't solve my problem, because it needs the output of DADA2, where I have already lost the individual reads through dereplication.