How can I get the taxon ID and rep-seq for each taxon per sample?

jlli2000 · June 1, 2023, 2:41am

Hello:

I have a project using the CO1 BOLD database to detect invertebrates.

I used DADA2 to create the feature table, rep-seqs and denoising stats.
I used feature-classifier classify-consensus-blast to create a taxonomy.
I used barplot to create a taxa table.

My question is: how can I get the taxon ID and rep-seq for each taxon in each sample?

Thanks for your help.

Best,

Jin

Nicholas_Bokulich · June 1, 2023, 4:43am

Hi @jlli2000 ,

You can use qiime metadata tabulate to merge these into a single table visualization. See the example here:

https://docs.qiime2.org/2023.5/tutorials/metadata/#exploring-feature-metadata
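qiime metadata tabulate performs this join inside QIIME 2. If you also want the merged table as a plain file outside QIIME 2, here is a minimal Python sketch of the same idea. It assumes the rep-seqs and taxonomy were exported with `qiime tools export` to `dna-sequences.fasta` and `taxonomy.tsv`, and that the first taxonomy column holds the feature ID; the column names in the demo (`Taxon`, `Consensus`) are assumptions, so check your own export.

```python
import csv
import io


def read_fasta(handle):
    """Parse FASTA text (e.g. an exported dna-sequences.fasta) into {feature_id: sequence}."""
    sequences = {}
    current_id = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current_id is not None:
                sequences[current_id] = "".join(chunks)
            # The ID is the first whitespace-delimited token after '>'.
            current_id = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)
    if current_id is not None:
        sequences[current_id] = "".join(chunks)
    return sequences


def read_taxonomy(handle):
    """Parse a taxonomy TSV (first column = feature ID) into {feature_id: {column: value}}."""
    reader = csv.DictReader(handle, delimiter="\t")
    id_column = reader.fieldnames[0]  # e.g. "Feature ID"
    taxonomy = {}
    for row in reader:
        feature_id = row.pop(id_column)
        taxonomy[feature_id] = dict(row)
    return taxonomy


def merge_features(fasta_handle, taxonomy_handle):
    """Join sequences and taxonomy on feature ID, keeping ASVs that lack a taxonomy row."""
    sequences = read_fasta(fasta_handle)
    taxonomy = read_taxonomy(taxonomy_handle)
    return {fid: {"Sequence": seq, **taxonomy.get(fid, {})} for fid, seq in sequences.items()}


if __name__ == "__main__":
    # Toy in-memory inputs (hypothetical IDs); in practice open the exported files.
    fasta = io.StringIO(">asv1\nACGT\nAC\n>asv2\nGGG\n")
    tax = io.StringIO("Feature ID\tTaxon\tConsensus\nasv1\tk__Animalia\t1.0\n")
    for fid, row in merge_features(fasta, tax).items():
        print(fid, row)
```

ASVs without a taxonomy row are kept with only their sequence, which makes unclassified features easy to spot.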

jlli2000 · June 4, 2023, 6:54pm

Hi, Nicholas:

Thanks for your quick response.

I did use this to create a table that contains the feature ID, sequence, taxon and confidence. However, I also need to add the reference sequence ID, i.e. 11794152 in the following example. How can I extract the sequence ID from the BOLD database along with the sequences?

11794152
ATGTTCGCCGACCGTTGACTATTCTCTACAAACCACAAAGACATTGGAACACTATACCTATTATTCGGCGCATGAGCTGGAGTCCTAGGCACAGCTCTAAGCCTCCTTATTCGAGCCGAGCTGGGCCAGCCAGGCAACCTTCTAGGTAACGACCACATCTACAACGTTATCGTCACAGCCCATGCATTTGTAATAATCTTCTTCATAGTAATACCCATCATAATCGGAGGCTTTGGCAACTGACTAGTTCCCCTAATAATCGGTGCCCCCGATATGGCGTTTCCCCGCATAAACAACATAAGCTTCTGACTCTTACCTCCCTCTCTCCTACTCCTGCTCGCATCTGCTATAGTGGAGGCCGGAGCAGGAACAGGTTGAACAGTCTACCCTCCCTTAGCAGGGAACTACTCCCACCCTGGAGCCTCCGTAGACCTAACCATCTTCTCCTTACACCTAGCAGGTGTCTCCTCTATCTTAGGGGCCATCAATTTCATCACAACAATTATCAATATAAAACCCCCTGCCATAACCCAATACCAAACGCCCCTCTTCGTCTGATCCGTCCTAATCACAGCAGTCCTACTTCTCCTATCTCTCCCAGTCCTAGCTGCTGGCATCACTATACTACTAACAGACCGCAACCTCAACACCACCTTCTTCGACCCCGCCGGAGGAGGAGACCCCATTCTATACCAACACCTATTCTGATTTTTCGGTCACCCTGAAGTTTATATTCTTATCCTACCAGGCTTCGGAATAATCTCCCATATTGTAACTTACTACTCCGGAAAAAAAGAACCATTTGGATACATAGGTATGGTCTGAGCTATGATATCAATTGGCTTCCTAGGGTTTATCGTGTGAGCACACCATATATTTACAGTAGGAATAGACGTAGACACACGAGCATATTTCACCTCCGCTACCATAATCATCGCTATCCCCACCGGCGTCAAAGTATTTAGCTGACTCGCCACACTCCACGGAAGCAATATGAAATGATCTGCTGCAGTGCTCTGAGCCCTAGGATTCATCTTTCTTTTCACCGTAGGTGGCCTGACTGGCATTGTATTAGCAAACTCATCACTAGACATCGTACTACACGACACGTACTACGTTGTAGCTCACTTCCACTATGTCCTATCAATAGGAGCTGTATTTGCCATCATAGGAGGCTTCATTCACTGATTTCCCCTATTCTCAGGCTACACCCTAGACCAAACCTACGCCAAAATCCATTTCACTATCATATTCATCGGCGTAAATCTAACTTTCTTCCCACAACACTTTCTCGGCCTATCCGGAATGCCCCGACGTTACTCGGACTACCCCGATGCATACACCACATGAAACATCCTATCATCTGTAGGCTCATTCATTTCTCTAACAGCAGTAATATTAATAATTTTCATGATTTGAGAAGCCTTCGCTTCGAAGCGAAAAGTCCTAATAGTAGAAGAACCCTCCATAAACCTGGAGTGACTATATGGATGCCCCCCACCCTACCACACATTCGAAGAACCCGTATACATAAAATCTAGA

My other question is how to assign this sequence and the taxonomy confidence to each sample in the taxa table exported from the barplot.

By the way, I replied to your email and it was returned to me.

Thanks,

Jin

Nicholas_Bokulich · June 6, 2023, 5:31am

Hmm... this is a feature ID in a separate file (the reference database), so there is not really a good way to link it to the feature table and taxonomy that you currently have (the feature IDs are distinct). It sounds like you want to display the ID of the closest reference hit for each query. For this you could, outside of QIIME 2, run BLAST, extract the query ID and target ID columns, and then merge these with the other files using qiime metadata tabulate.
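A minimal sketch of the "extract the query ID and target ID columns" step, assuming you ran BLAST+ (e.g. `blastn ... -outfmt 6`) with the exported rep-seqs as queries against the reference sequences, producing the 12 default tabular columns. It keeps the highest-bitscore hit per query and writes a TSV whose `feature-id` header QIIME 2 accepts as feature metadata, so it can be passed as one more `--m-input-file` to qiime metadata tabulate. The file name and the output column names are hypothetical.

```python
import csv
import io
import sys

# Default column order of BLAST+ tabular output (-outfmt 6, "std" fields).
BLAST6_FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                 "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(blast6_handle):
    """Return {query_id: (subject_id, percent_identity, bitscore)}, keeping the top-bitscore hit."""
    best = {}
    for line in blast6_handle:
        if not line.strip() or line.startswith("#"):
            continue
        record = dict(zip(BLAST6_FIELDS, line.rstrip("\n").split("\t")))
        query, score = record["qseqid"], float(record["bitscore"])
        if query not in best or score > best[query][2]:
            best[query] = (record["sseqid"], float(record["pident"]), score)
    return best


def write_hits_metadata(best, out_handle):
    """Write a TSV whose 'feature-id' header lets QIIME 2 read it as feature metadata."""
    writer = csv.writer(out_handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["feature-id", "best-hit-id", "percent-identity"])
    for query in sorted(best):
        subject, pident, _ = best[query]
        writer.writerow([query, subject, pident])


if __name__ == "__main__":
    # Toy hits with hypothetical IDs; in practice pass open("blast_hits.tsv").
    demo = io.StringIO("asv1\tREF_A\t99.5\t200\t1\t0\t1\t200\t5\t204\t1e-90\t360\n"
                       "asv1\tREF_B\t97.0\t200\t6\t0\t1\t200\t5\t204\t1e-80\t330\n")
    write_hits_metadata(best_hits(demo), sys.stdout)
```

Ties in bitscore keep the first hit encountered; if you need all equally good hits, collect a list per query instead.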

If confidence information is present in the taxonomy file, it will be included in the merged visualization produced by qiime metadata tabulate.
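This also explains why the same taxon can show up with different confidence values. For classify-consensus-blast the confidence column is, roughly, the fraction of retained hits (up to `--p-maxaccepts`, 10 in the script below) that agree on the assigned lineage. The sketch below is a simplified illustration of that consensus idea, not the q2-feature-classifier source; `min_consensus=0.51` is assumed to mirror the plugin's `--p-min-consensus` default.

```python
from collections import Counter


def consensus_taxonomy(hit_taxa, min_consensus=0.51):
    """Simplified consensus over the taxonomy strings of the top BLAST hits.

    Walks rank by rank and keeps the most common lineage prefix while the
    fraction of hits supporting it is >= min_consensus. Returns the accepted
    lineage and the support fraction at its deepest rank.
    """
    if not hit_taxa:
        return "Unassigned", 0.0
    lineages = [[label.strip() for label in taxon.split(";")] for taxon in hit_taxa]
    n_hits = len(lineages)
    accepted, support = [], 0.0
    for rank in range(max(len(lineage) for lineage in lineages)):
        # Count whole prefixes so a label is only supported by hits that
        # also agree on every higher rank.
        prefixes = Counter(tuple(l[: rank + 1]) for l in lineages if len(l) > rank)
        prefix, count = prefixes.most_common(1)[0]
        fraction = count / n_hits
        if fraction < min_consensus:
            break
        accepted, support = list(prefix), fraction
    if not accepted:
        return "Unassigned", 0.0
    return "; ".join(accepted), support
```

With 6 of 10 hits on one species the ASV gets that species with support 0.6, while an ASV whose 10 hits split 5/5 stops at the genus-level prefix with support 1.0; two ASVs can therefore share a taxon label yet carry different confidence values.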

Good luck!

jlli2000 · June 9, 2023, 1:46am

Hi, Nicholas:

Thanks for your reply.

I created the taxa table using the barplot plugin. I found that each taxon may be associated with multiple sequences with different confidence values. Is there any way I can associate each taxon per sample with the appropriate sequence?

Thanks,

Jin

Nicholas_Bokulich · June 9, 2023, 10:42am

No, not with the output of the barplot plugin, which reports the counts per taxon rather than the counts per ASV.

But yes, this is possible. You need to use metadata tabulate to merge the ASV table with the taxonomy and sequence artifacts.
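If the goal is a long table with one row per sample and ASV (sample, feature ID, count, taxon, confidence, sequence), a minimal sketch outside QIIME 2 follows. It assumes the DADA2 table was exported with `qiime tools export` and converted with `biom convert -i feature-table.biom -o feature-table.tsv --to-tsv`, and that the taxonomy and sequences are already available as dicts keyed by feature ID (e.g. parsed from the exported `taxonomy.tsv` and `dna-sequences.fasta`). Function and column names are hypothetical.

```python
import csv
import io
import sys


def read_biom_tsv(handle):
    """Parse `biom convert --to-tsv` output into (sample_ids, {feature_id: [counts]})."""
    samples, counts = [], {}
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith("#OTU ID"):
            samples = line.split("\t")[1:]
        elif line and not line.startswith("#"):
            feature_id, *values = line.split("\t")
            counts[feature_id] = [float(v) for v in values]
    return samples, counts


def per_sample_rows(samples, counts, taxonomy, sequences):
    """One row per (sample, ASV) with a non-zero count, annotated with taxon,
    confidence and sequence. taxonomy maps feature ID -> (taxon, confidence)."""
    rows = []
    for feature_id, values in counts.items():
        taxon, confidence = taxonomy.get(feature_id, ("Unassigned", ""))
        for sample, n in zip(samples, values):
            if n > 0:
                rows.append({"sample": sample, "feature-id": feature_id, "count": n,
                             "taxon": taxon, "confidence": confidence,
                             "sequence": sequences.get(feature_id, "")})
    # Group by sample, most abundant ASVs first.
    rows.sort(key=lambda r: (r["sample"], -r["count"]))
    return rows


def write_rows(rows, out_handle):
    """Write the long table as TSV."""
    fields = ["sample", "feature-id", "count", "taxon", "confidence", "sequence"]
    writer = csv.DictWriter(out_handle, fieldnames=fields, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # Toy inputs with hypothetical IDs and values.
    table = io.StringIO("# Constructed from biom file\n#OTU ID\tS1\tS2\n"
                        "asv1\t10.0\t0.0\nasv2\t3.0\t7.0\n")
    samples, counts = read_biom_tsv(table)
    rows = per_sample_rows(samples, counts,
                           {"asv1": ("k__Animalia; s__X", "1.0")},
                           {"asv1": "ACGT", "asv2": "GGCC"})
    write_rows(rows, sys.stdout)
```

Because each row stays at the ASV level, two ASVs assigned to the same taxon in one sample appear as two rows, each with its own sequence and confidence.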

jlli2000 · June 10, 2023, 10:10pm

Hi, Nicholas:

Thanks for your reply.

I created the taxa table using the barplot plugin. I found that each taxon may be associated with multiple sequences with different confidence values. Is there any way I can associate each taxon per sample with the appropriate sequence?

If the barplot can't associate the taxon with the rep-sequence, can you recommend a taxonomy plugin that can do this?

My goal is to associate each rep-sequence with its taxon per sample.

Thanks,

Jin

Nicholas_Bokulich · June 11, 2023, 6:09am

Yes, see my advice above: metadata tabulate is what you want. The tutorial at the link above describes exactly how to do this. If you are expecting something different, I suggest writing out a small example here to describe the format that you are looking for.

jlli2000 · June 12, 2023, 3:24am

Hi, Nicholas:

Sorry to bother you again. Here is the script that I used to create the barplot and to associate the taxa with the rep-seqs. However, I need the per-sample taxa table created by the barplot, with the rep-seq added to it.

I am new to QIIME 2 and would appreciate any further help. Thanks, Jin

qiime tools import \
--type 'SampleData[PairedEndSequencesWithQuality]' \
--input-path F230 \
--input-format CasavaOneEightSingleLanePerSampleDirFmt \
--output-path F230-fullrun.qza

qiime cutadapt trim-paired \
--i-demultiplexed-sequences F230-fullrun.qza \
--p-cores 46 \
--p-front-f GGTCAACAAATCATAAAGATATTGG \
--p-front-r CTTATRTTRTTTATNCGNGGRAANGC \
--p-error-rate 0.4 \
--p-minimum-length 200 \
--p-discard-untrimmed \
--p-overlap 10 \
--o-trimmed-sequences F230-fullrun-trimmed.qza

qiime dada2 denoise-paired \
--p-n-threads 0 \
--i-demultiplexed-seqs F230-fullrun-trimmed.qza \
--p-trim-left-f 0 \
--p-trim-left-r 0 \
--p-trunc-len-f 200 \
--p-trunc-len-r 180 \
--p-chimera-method consensus \
--o-representative-sequences trimmed-F230-fullrun-rep-seqs-dada2.qza \
--o-table trimmed-F230-fullrun-table-dada2.qza \
--o-denoising-stats trimmed-F230-fullrun-stats-dada2.qza

qiime feature-classifier classify-consensus-blast \
--i-query trimmed-F230-fullrun-rep-seqs-dada2.qza \
--i-reference-taxonomy bold_database/bold_taxa-derep2.qza \
--i-reference-reads bold_database/bold_seqs-derep2.qza \
--o-classification bold_F230-fullrun-trimmed-rep-seqs-dada2-taxa-derep2.qza \
--p-perc-identity 0.97 \
--p-maxaccepts 10

qiime feature-table filter-features \
--i-table trimmed-F230-fullrun-table-dada2.qza \
--m-metadata-file bold_F230-fullrun-trimmed-rep-seqs-dada2-taxa-derep2.qza \
--o-filtered-table FilteredSusan-F230-fullrun-trimmed-table-dada2-derep2.qza

# Taxa abundance table at species level, downloaded from Susan_F230-sample-barplot-derep2.qzv

qiime taxa barplot \
--i-table FilteredSusan-F230-fullrun-trimmed-table-dada2-derep2.qza \
--i-taxonomy bold_F230-fullrun-trimmed-rep-seqs-dada2-taxa-derep2.qza \
--m-metadata-file F230-original-sample-metadata.tsv \
--o-visualization Susan_F230-sample-barplot-derep2.qzv

# Associate sequences with taxa
qiime metadata tabulate \
--m-input-file trimmed-F230-fullrun-rep-seqs-dada2.qza \
--m-input-file bold_F230-fullrun-trimmed-rep-seqs-dada2-taxa-derep2.qza \
--o-visualization trimmed-F230-fullrun-rep-seqs-taxa-dada2test.qzv

Nicholas_Bokulich · June 13, 2023, 6:37am

As explained above, this is not possible with the table that you are exporting from the barplot visualization. Please read my advice above about this.
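To see why the barplot export cannot be mapped back to sequences, here is a simplified sketch of collapsing by taxon: counts of all ASVs that share a taxon label are summed per sample. It illustrates the same idea behind the barplot's per-level view and `qiime taxa collapse`, but it is not their actual code. After this step the feature IDs, and therefore the sequences and per-ASV confidences, are gone.

```python
from collections import defaultdict


def collapse_by_taxon(samples, counts, taxonomy):
    """Sum ASV counts that share a taxon label, per sample.

    counts maps feature ID -> per-sample counts (same order as samples);
    taxonomy maps feature ID -> taxon string. Feature IDs do not survive.
    """
    collapsed = defaultdict(lambda: [0.0] * len(samples))
    for feature_id, values in counts.items():
        taxon = taxonomy.get(feature_id, "Unassigned")
        for i, n in enumerate(values):
            collapsed[taxon][i] += n
    return dict(collapsed)


if __name__ == "__main__":
    # Two hypothetical ASVs assigned to the same species collapse into one row.
    print(collapse_by_taxon(["S1", "S2"],
                            {"asv1": [10.0, 0.0], "asv2": [3.0, 7.0]},
                            {"asv1": "s__X", "asv2": "s__X"}))
```

The collapsed row for `s__X` cannot tell you which of its two ASVs contributed which counts, which is why the per-ASV table has to be built from the feature table, taxonomy and rep-seqs instead.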