Determining Original Read Names and Identifying the Reads Clustered into Each Representative Sequence in QIIME 2

Dear QIIME2 community,

I am new to QIIME 2 and currently encountering an issue for which I have extensively searched forums and webpages without finding a satisfactory answer. Consequently, I am reaching out to seek guidance on the following matters:

  1. After denoising a ".fastq" file to obtain Amplicon Sequence Variants (ASVs) as a "FeatureData[Sequence]" artifact and a corresponding feature table "FeatureTable[Frequency]", I observed that the feature table contains the names of the rep-seqs and their respective frequencies. However, these sequence names differ from the names of the reads in the original ".fastq" file. Is there a method to determine the original names of the reads in the ".fastq" file from which the representative sequences were derived?

  2. To inspect the names (in the ".fastq" file) of the reads that cluster into each representative sequence, I used "qiime tools export". However, the "FeatureTable[Frequency]" export contains only the frequency of each representative sequence, not the individual read names, and the "FeatureData[Sequence]" export contains only the representative sequences themselves. Is there a recommended approach to identifying the names of the individual reads that are clustered into each representative sequence?

I would greatly appreciate any assistance, information about relevant forum posts, or insights into potential solutions that I may have overlooked.

Thank you in advance to anyone who can provide assistance.

Best regards,

Hi @Min-Ho,

No, because each rep seq is derived from possibly several raw sequences (which are then denoised, and possibly merged, so the rep seq might not be a 100% match to any single raw read).

What you are looking for is something like a "Feature map" to determine which raw read IDs map to the resulting ASVs. This is not an output with q2-dada2 currently.

This has been occasionally requested for both denoising and OTU clustering methods in QIIME 2, so may be added in the future, but it is not possible at the moment.

For now, I would suggest exporting the FeatureData[Sequence] artifact and then aligning the raw reads against the rep seqs to find the closest match, e.g., with VSEARCH, which would then give you a full report on the hits, allowing you to map reads to rep seqs.
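As a rough sketch of that workflow (not a built-in QIIME 2 feature): the exported rep seqs end up in dna-sequences.fasta, and a search along the lines of `vsearch --usearch_global reads.fasta --db dna-sequences.fasta --id 0.97 --blast6out hits.tsv` should write one tab-separated line per hit, with the read ID, the rep seq ID, and the percent identity in the first three columns. The file names, the 0.97 threshold, the example IDs, and the helper name `read_to_feature_map` are all assumptions made for illustration. A small Python script could then group the read IDs by rep seq:

```python
import csv
import io
from collections import defaultdict


def read_to_feature_map(hits_tsv, min_identity=97.0):
    """Group read IDs by their best-matching rep seq ID.

    hits_tsv is the text of a blast6-style table: tab-separated rows whose
    first three columns are read ID, rep seq ID, and percent identity.
    (Hypothetical helper, not part of QIIME 2 or VSEARCH.)
    """
    best = {}  # read ID -> (identity, rep seq ID) of the best hit seen so far
    for row in csv.reader(io.StringIO(hits_tsv), delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        read_id, feature_id, identity = row[0], row[1], float(row[2])
        if identity < min_identity:
            continue  # ignore weak matches
        if read_id not in best or identity > best[read_id][0]:
            best[read_id] = (identity, feature_id)

    grouped = defaultdict(list)
    for read_id, (_, feature_id) in best.items():
        grouped[feature_id].append(read_id)
    return dict(grouped)


# Made-up hits for illustration; real input would be the contents of hits.tsv.
example_hits = (
    "read_1\tasvA\t100.0\n"
    "read_2\tasvA\t99.2\n"
    "read_3\tasvB\t98.0\n"
    "read_3\tasvA\t97.5\n"
    "read_4\tasvB\t90.0\n"
)
mapping = read_to_feature_map(example_hits)
for feature_id, read_ids in mapping.items():
    print(feature_id, read_ids)
```

Keep in mind that this only gives the closest match for each read. It is not the same as the assignment made internally by the denoiser, so the per-rep-seq read counts will not necessarily agree with the frequencies in the feature table.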

I hope that helps!


Dear Bokulich,

I extend my sincere gratitude for your prompt and helpful response.

I am sorry to learn that a 'Feature map' output is not currently available.

Following your suggestion, I will initiate the process of mapping reads to representative sequences.

Your insightful guidance is immensely valued, and I wish to express my gratitude once again for your assistance. I hope you have a wonderful day.

Best regards,