How to get the sequences that were taxonomically assigned to my reference database?

Hello everyone, does anyone know how I can find out which sequences were taxonomically assigned against my reference database? In the end I get the OTUs found in a .qzv file, but now I would like to know which sequences were assigned to those OTUs, so that I can validate them in NCBI.

mkdir 16s
cd 16s
qiime tools import --type 'SampleData[SequencesWithQuality]' --input-path secuencias --input-format CasavaOneEightSingleLanePerSampleDirFmt --output-path demux-single-end-todas16s.qza

qiime tools validate demux-single-end-todas16s.qza

qiime demux summarize --i-data demux-single-end-todas16s.qza --o-visualization demux-single-end-todas16s.qzv

qiime dada2 denoise-single --i-demultiplexed-seqs demux-single-end-todas16s.qza --p-trunc-len 440 --p-n-threads 0 --o-table table-dada2-todas16s.qza --o-representative-sequences rep-seq-dada2-todas16s.qza --o-denoising-stats stats-dada2.qza

qiime tools import --type 'FeatureData[Sequence]' --input-path silva_132_99_16S.fna --output-path reference.qza

qiime feature-classifier extract-reads --i-sequences reference.qza --p-f-primer CCTACGGGNGGCWGCAG --p-r-primer GACTACHVGGGTATCTAATCC --p-trunc-len 440 --o-reads ref-seqs-cianobacteria-16s-nocloroo.qza

qiime tools import --type 'FeatureData[Taxonomy]' --input-format HeaderlessTSVTaxonomyFormat --input-path silva_cianobacterias_99_16S_nocloro.txt --output-path taxa_silva_ref_99_nocloro.qza

qiime feature-classifier fit-classifier-naive-bayes --i-reference-reads ref-seqs-cianobacteria-16s-nocloroo.qza --i-reference-taxonomy taxa_silva_ref_99_nocloro.qza --o-classifier classifier_cianobacterias_nocloro.qza

qiime feature-classifier classify-sklearn --i-classifier classifier_cianobacterias_nocloro.qza --i-reads rep-seq-dada2-todas16s.qza --o-classification taxonomy_cianobacterias_nocloro.qza

qiime taxa barplot --i-table table-dada2-todas16s.qza --i-taxonomy taxonomy_cianobacterias_nocloro.qza --m-metadata-file sample-metadata.tsv --o-visualization taxa-bar-plots.qzv

I don’t know whether this can be done within QIIME 2 itself, so I will leave that for other participants to answer.

But you can always open the taxonomy.qza file, extract the taxonomy.tsv file, find the hashes of all ASVs assigned to the taxa you care about, and use that list of ASV hashes/names to pull the matching sequences out of the FASTA file inside rep-seqs.qza. You can do it manually if you are interested in only a few taxa, but for a large list you may want to write a script for it.
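For what it's worth, here is a rough sketch of such a script. It assumes both artifacts have already been exported with qiime tools export, and that the export produces taxonomy.tsv (columns: Feature ID, Taxon, Confidence) and dna-sequences.fasta; please check those names in your own output folders. The function name extract_taxon_seqs and the output directory names are made up for illustration.

```shell
#!/usr/bin/env bash
# Assumed export step (directory names are placeholders):
#   qiime tools export --input-path taxonomy_cianobacterias_nocloro.qza --output-path exported-taxonomy
#   qiime tools export --input-path rep-seq-dada2-todas16s.qza --output-path exported-rep-seqs

# Hypothetical helper: print every FASTA record whose feature ID was assigned
# a taxonomy string containing the given text (plain substring match).
# Usage: extract_taxon_seqs TEXT taxonomy.tsv dna-sequences.fasta
extract_taxon_seqs() {
  local pattern="$1" taxonomy_tsv="$2" fasta="$3"
  awk -F'\t' -v pat="$pattern" '
    # First file: taxonomy table. Skip the header row, remember matching IDs.
    NR == FNR {
      if (FNR > 1 && index($2, pat) > 0) wanted[$1] = $2
      next
    }
    # Second file: FASTA. A header line switches printing on or off.
    /^>/ {
      split(substr($0, 2), parts, /[ \t]/)
      id = parts[1]
      keep = (id in wanted)
      if (keep) print ">" id " " wanted[id]
      next
    }
    # Sequence lines are printed only while the current record is wanted.
    keep { print }
  ' "$taxonomy_tsv" "$fasta"
}

# Example call (paths assume the export step above):
#   extract_taxon_seqs 'Cyanobacteria' exported-taxonomy/taxonomy.tsv \
#     exported-rep-seqs/dna-sequences.fasta > cyanobacteria-seqs.fasta
```

The assigned taxonomy is appended to each FASTA header so the records are easier to recognise when pasting them into NCBI BLAST; drop that part of the print statement if you want the headers unchanged.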

Hello, thank you very much for your recommendation. I think I could do that, because I am really only interested in validating 30 sequences. The truth is that I am practically new to QIIME. Would it be too much to ask you to explain how to do what you described, please?

Just run qiime metadata tabulate with the sequences and taxonomy as input data. This will collate those files and generate a searchable visualization…

So as above, create the metadata visualization, search for your 30 sequences (or specific taxonomies), and find the sequences associated.

Good luck!


That’s easier indeed… :face_with_monocle:

Metadata tabulate was the solution!
Thank you very much!

By the way, I could not find the “SOLVED” option.
I used the following command:

qiime metadata tabulate \
  --m-input-file rep-seqs.qza \
  --m-input-file taxonomy.qza \
  --o-visualization tabulated-feature-metadata.qzv