How can I know the taxonomy of all my sequences contained in my fastq?

Hi all!!

I am writing in the forum because I would like to know if there is any way to see all the sequences contained in a fastq together with their taxonomy after running the command qiime feature-classifier classify-consensus-vsearch.

For this I use the qiime metadata tabulate command with rep-seqs.qza and the taxonomy output file of the qiime feature-classifier classify-consensus-vsearch command (using silva132), but the result is a metadata table in which not all the sequences of my fastq appear.

Could you help me with this?

Thank you

Hi @hamac87,

Welcome to the QIIME 2 forum!

I'm not entirely sure I understand your question.

If you want the direct taxonomy-to-sequence match for your rep-set, I would suggest using the --p-no-hashed-feature-ids option when you denoise/cluster your data. That uses the full sequence, rather than a hash, as the feature ID in your rep-set.

But your question about the fastq makes me think that you want to classify directly on the per-sample fastq. That isn't a common path. A denoising approach helps overcome any error in your original fastq and gives you single-nucleotide resolution. The output of this is a table with the counts per feature (ASV), and a rep-set. You can classify taxonomy for your rep-set. Combined, you have the taxonomy for all the counts in your ASV table (built from your input fastqs), and then you can do things like collapse your data into a classified table. Is that more like what you'd like?
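For reference, collapsing the ASV table by taxonomy could look roughly like the sketch below. The wrapper function name and the file names in the usage note are placeholders I've assumed, not something from this thread; only the qiime taxa collapse call itself is the real command.

```shell
# Hypothetical helper (name is an assumption): collapse a feature table
# to a single taxonomic level with `qiime taxa collapse`.
# Arguments: 1) feature table artifact   2) taxonomy artifact
#            3) taxonomic level (integer) 4) output path for the collapsed table
collapse_table_by_taxonomy() {
  qiime taxa collapse \
    --i-table "$1" \
    --i-taxonomy "$2" \
    --p-level "$3" \
    --o-collapsed-table "$4"
}
```

You would call it as something like collapse_table_by_taxonomy table.qza taxonomy.qza 6 table-l6.qza. Which rank level 6 corresponds to depends on your reference taxonomy; with a 7-rank taxonomy it would typically be genus, so check your own taxonomy strings first.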

Best,
Justine

qiime metadata tabulate is the correct command to tabulate your taxonomy classification results. All input sequences will be classified there. If you are missing some it is either:

  1. Because only 100 records are shown on each page. Use the "next" button to move to the next page or, better yet, use the search bar to find a feature or taxon of interest.
  2. Because the features that are being taxonomically classified are ASVs represented as a fasta, not a fastq. So if you are looking for sequence IDs from your input fastq they will not be there — those fastq IDs have long since been dereplicated, denoised, and rendered as ASVs.

No need for --p-no-hashed-feature-ids. Instead, you can use qiime metadata tabulate to collate your taxonomy and sequences together. Just input your sequences as an additional input file to this command and the output will show you feature ID, taxonomic classification, and sequence all together.
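As a rough sketch of that, passing both artifacts to the same tabulate call might look like this. The wrapper function name and the file names in the usage note are assumptions for illustration; substitute your own taxonomy and rep-seqs artifacts.

```shell
# Hypothetical helper (name is an assumption): tabulate taxonomy and
# representative sequences side by side, keyed on feature ID.
# Arguments: 1) taxonomy artifact (e.g. the classify-consensus-vsearch output)
#            2) representative sequences artifact
#            3) output path for the visualization
tabulate_taxonomy_with_seqs() {
  qiime metadata tabulate \
    --m-input-file "$1" \
    --m-input-file "$2" \
    --o-visualization "$3"
}
```

For example: tabulate_taxonomy_with_seqs taxonomy.qza rep-seqs.qza taxonomy-with-seqs.qzv. The resulting .qzv can then be opened with qiime tools view or at view.qiime2.org.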
