How to get the sequences that were taxonomically assigned to my reference database?

osidito91 · January 15, 2020, 10:46pm

Hello everyone, does anyone know how I can find out which sequences were taxonomically assigned using my reference database? In the end I get the OTUs in a .qzv visualization, but now I would like to know which sequences were assigned to those OTUs, so that I can validate them in NCBI.

mkdir 16s
cd 16s
qiime tools import --type 'SampleData[SequencesWithQuality]' --input-path secuencias --input-format CasavaOneEightSingleLanePerSampleDirFmt --output-path demux-single-end-todas16s.qza

qiime tools validate demux-single-end-todas16s.qza

qiime demux summarize --i-data demux-single-end-todas16s.qza --o-visualization demux-single-end-todas16s.qzv

qiime dada2 denoise-single --i-demultiplexed-seqs demux-single-end.qza --p-trunc-len 440 --p-n-threads 0 --o-table table-dada2.qza --o-representative-sequences rep-seq-dada2.qza --o-denoising-stats stats-dada2.qza

qiime tools import --type 'FeatureData[Sequence]' --input-path silva_132_99_16S.fna --output-path reference.qza

qiime feature-classifier extract-reads --i-sequences reference.qza --p-f-primer CCTACGGGNGGCWGCAG --p-r-primer GACTACHVGGGTATCTAATCC --p-trunc-len 440 --o-reads ref-seqs-cianobacteria-16s-nocloroo.qza

qiime tools import --type 'FeatureData[Taxonomy]' --input-format HeaderlessTSVTaxonomyFormat --input-path silva_cianobacterias_99_16S_nocloro.txt --output-path taxa_silva_ref_99_nocloro.qza

qiime feature-classifier fit-classifier-naive-bayes --i-reference-reads ref-seqs-cianobacteria-16s-nocloroo.qza --i-reference-taxonomy taxa_silva_ref_99_nocloro.qza --o-classifier classifier_cianobacterias_nocloro.qza

qiime feature-classifier classify-sklearn --i-classifier classifier_cianobacterias_nocloro.qza --i-reads rep-seq-dada2-todas16s.qza --o-classification taxonomy_cianobacterias_nocloro.qza

qiime taxa barplot --i-table table-dada2-todas16s.qza --i-taxonomy taxonomy_cianobacterias_nocloro.qza --m-metadata-file sample-metadata.tsv --o-visualization taxa-bar-plots.qzv

timanix · January 16, 2020, 9:57am

Hi!
I don’t know if you can do it within QIIME itself, so I am posting here and we can wait for other participants to weigh in.

But you can always open the taxonomy.qza file, extract taxonomy.tsv, find the hashes of all ASVs assigned to a given taxon, and use that list of ASV hashes to look up the sequences in the FASTA file inside rep-seqs.qza. You can do it manually if you are interested in only a few taxa, but for a long list you may want to write a script.
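
For the scripted route, here is a minimal sketch rather than a tested pipeline. It assumes both artifacts were first exported with qiime tools export (the export commands are shown as comments), that the exported files are named taxonomy.tsv and dna-sequences.fasta, and that taxonomy.tsv is tab-separated with a header row and the feature ID and taxon in the first two columns. The function name seqs_for_taxon and the exported-* directory names are made up for illustration.

```shell
# Export step (run once in an activated QIIME 2 environment):
#   qiime tools export --input-path taxonomy.qza --output-path exported-taxonomy
#   qiime tools export --input-path rep-seqs.qza --output-path exported-rep-seqs
# Assumed results: exported-taxonomy/taxonomy.tsv and
#                  exported-rep-seqs/dna-sequences.fasta

# seqs_for_taxon PATTERN TAXONOMY_TSV FASTA  (hypothetical helper)
# Prints every FASTA record whose feature ID was assigned a taxon
# containing PATTERN (plain substring match, not a regex).
seqs_for_taxon() {
  awk -v pat="$1" '
    # First file: taxonomy table. Skip the header row, then remember
    # each feature ID whose taxon string contains the pattern.
    NR == FNR {
      split($0, f, "\t")
      if (FNR > 1 && index(f[2], pat) > 0) { keep[f[1]] = f[2] }
      next
    }
    # Second file: FASTA. A ">" line starts a new record; decide
    # whether to print it and append the taxon to the header.
    /^>/ {
      id = substr($1, 2)
      show = (id in keep)
      if (show) { print ">" id " " keep[id] }
      next
    }
    # Sequence lines belong to the most recent header.
    show { print }
  ' "$2" "$3"
}
```

For example, seqs_for_taxon "Cyanobacteria" exported-taxonomy/taxonomy.tsv exported-rep-seqs/dna-sequences.fasta > cyanobacteria.fasta would collect the matching sequences into one FASTA file that can then be checked against NCBI.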

osidito91 · January 16, 2020, 3:40pm

Hello, thank you very much for your recommendation. I think I could do that, because I am really only interested in validating 30 sequences. The truth is that I am practically new to QIIME. Would it be too much to ask you to tell me how to do what you mentioned, please?

Nicholas_Bokulich · January 16, 2020, 3:43pm

Just run qiime metadata tabulate with the sequences and taxonomy as input data. This will collate those files and generate a searchable visualization...

So as above, create the metadata visualization, search for your 30 sequences (or specific taxonomies), and find the sequences associated.

Good luck!

timanix · January 16, 2020, 5:46pm

That's easier indeed...

osidito91 · January 16, 2020, 5:47pm

Metadata tabulate was the solution!!
Thank you very much!

By the way, I did not find the “SOLVED” option.
I used the following command.

qiime metadata tabulate \
  --m-input-file rep-seqs.qza \
  --m-input-file taxonomy.qza \
  --o-visualization tabulated-feature-metadata.qzv