Keeping Unassigned Sequences in the barplot

dr_hulk · April 20, 2023, 2:45pm

Hello,

New user here, still trying to set up a workflow that works...

I got as far as assigning Taxonomy using a pre-trained database (silva138_AB_V4_classifier.qza).
However, there are still some unassigned sequences left. When I want to make a barplot, I get an error because some 'Feature IDs found in the table are missing from the taxonomy'.
I can get rid of them by using this command:

Step 1
qiime feature-table filter-features \
--i-table output/table-filtered-seq-clustered.qza \
--m-metadata-file output/taxonomy.qza \
--o-filtered-table output/id-filtered-table.qza

This solves it to generate a barplot, but I feel like these unassigned sequences are important. I want to include them in my analyses.

How can I synchronise my Feature table and my Sequence table without removing unassigned sequences?

My complete workflow (with the error) is below:
(Don't hesitate to comment on it, as I just got started and can use all the suggestions I can get)

Step 2
(qiime2-2023.2) mac-XXX-XXX:registry myname$ cd ~/Documents/registry
(qiime2-2023.2) mac-XXX-XXX:registry myname$ qiime tools import
--type 'SampleData[PairedEndSequencesWithQuality]'
--input-path /Users/myname/Documents/Registry/Raw_data
--input-format CasavaOneEightSingleLanePerSampleDirFmt
--output-path output/demux-paired-end.qza
Imported /Users/myname/Documents/Registry/Raw_data as CasavaOneEightSingleLanePerSampleDirFmt to output/demux-paired-end.qza
(qiime2-2023.2) mac-XXX-XXX:registry myname$ qiime cutadapt trim-paired
--i-demultiplexed-sequences output/demux-paired-end.qza
--p-front-f GTGYCAGCMGCCGCGGTAA
--p-front-r GGACTACNVGGGTWTCTAAT
--p-match-read-wildcards
--quiet
--o-trimmed-sequences output/primers-removed.qza

Step 3
(qiime2-2023.2) mac-XXX-XXX:registry myname$ qiime dada2 denoise-paired
--i-demultiplexed-seqs output/primers-removed.qza
--p-trim-left-f 0
--p-trim-left-r 0
--p-trunc-len-f 233
--p-trunc-len-r 229
--o-table output/table.qza
--o-representative-sequences output/rep-seqs.qza
--o-denoising-stats output/denoising-stats.qza
Saved FeatureTable[Frequency] to: output/table.qza
Saved FeatureData[Sequence] to: output/rep-seqs.qza
Saved SampleData[DADA2Stats] to: output/denoising-stats.qza

Step 4
(qiime2-2023.2) mac-XXX-XXX:registry myname$ qiime feature-table summarize
--i-table output/table.qza
--m-sample-metadata-file metadata.tsv
--o-visualization output/table-summary.qzv
Saved Visualization to: output/table-summary.qzv

Step 5
(qiime2-2023.2) mac-XXX-XXX:registry myname$ qiime feature-table filter-features
--i-table output/table.qza
--p-min-frequency 1
--o-filtered-table output/table-filtered-seq.qza
Saved FeatureTable[Frequency] to: output/table-filtered-seq.qza

Step 6
(qiime2-2023.2) mac-XXX-XXX:registry myname$ qiime feature-table summarize
--i-table output/table-filtered-seq.qza
--o-visualization output/table-filtered-seq.qzv
Saved Visualization to: output/table-filtered-seq.qzv

(qiime2-2023.2) mac-XXX-XXX:registry myname$ (specify the lowest number of reads a sample must have to be retained for further analysis)
-bash: specify: command not found

Step 7
(qiime2-2023.2) mac-XXX-XXX:registry myname$ qiime feature-table filter-samples
--i-table output/table-filtered-seq.qza
--p-min-frequency 2000
--o-filtered-table output/filtered-table-samples.qza
Saved FeatureTable[Frequency] to: output/filtered-table-samples.qza

Step 8
(qiime2-2023.2) mac-XXX-XXX:registry myname$ qiime feature-table summarize
--i-table output/filtered-table-samples.qza
--o-visualization output/filtered-table-samples-summary.qzv
Saved Visualization to: output/filtered-table-samples-summary.qzv

Step 9
(qiime2-2023.2) mac-XXX-XXX:registry myname$ qiime vsearch cluster-features-de-novo
--i-table output/filtered-table-samples.qza
--i-sequences output/rep-seqs.qza
--p-perc-identity 0.99
--p-threads 6
--o-clustered-table output/table-filtered-seq-clustered.qza
--o-clustered-sequences output/rep-seqs-clustered.qza
Saved FeatureTable[Frequency] to: output/table-filtered-seq-clustered.qza
Saved FeatureData[Sequence] to: output/rep-seqs-clustered.qza

Step 10
(qiime2-2023.2) mac-XXX-XXX:registry myname$ qiime vsearch uchime-denovo
--i-table output/table-filtered-seq-clustered.qza
--i-sequences output/rep-seqs-clustered.qza
--o-chimeras output/chimeras.qza
--o-nonchimeras output/rep-seqs-nonchimeric.qza
--o-stats output/chimeras-stats.qza
Saved FeatureData[Sequence] to: output/chimeras.qza
Saved FeatureData[Sequence] to: output/rep-seqs-nonchimeric.qza
Saved UchimeStats to: output/chimeras-stats.qza

Step 11
(qiime2-2023.2) mac-XXX-XXX:registry myname$ qiime feature-classifier classify-sklearn
--i-classifier Classifier/silva138_AB_V4_classifier.qza
--i-reads output/rep-seqs-nonchimeric.qza
--o-classification output/taxonomy.qza
Saved FeatureData[Taxonomy] to: output/taxonomy.qza

(qiime2-2023.2) mac-XXX-XXX:registry myname$ Phylogenetic Diversity and weighted and unweighted UniFrac.
-bash: Phylogenetic: command not found

Step 12
(qiime2-2023.2) mac-XXX-XXX:registry myname$ qiime phylogeny align-to-tree-mafft-fasttree
--i-sequences output/rep-seqs-nonchimeric.qza
--o-alignment output/aligned-rep-seqs.qza
--o-masked-alignment output/masked-aligned-rep-seqs.qza
--o-tree output/unrooted-tree.qza
--o-rooted-tree output/rooted-tree.qza
Saved FeatureData[AlignedSequence] to: output/aligned-rep-seqs.qza
Saved FeatureData[AlignedSequence] to: output/masked-aligned-rep-seqs.qza
Saved Phylogeny[Unrooted] to: output/unrooted-tree.qza
Saved Phylogeny[Rooted] to: output/rooted-tree.qza

Step 13
(qiime2-2023.2) mac-XXX-XXX:registry myname$ qiime taxa barplot
--i-table output/table-filtered-seq-clustered.qza
--i-taxonomy output/taxonomy.qza
--m-metadata-file metadata.tsv
--o-visualization output/taxa-barplot.qzv
Plugin error from taxa:
Feature IDs found in the table are missing from the taxonomy: {'87959caafcd3a9754a7ff216c69f10db', '08e542b57c85006f23fc55d4afe2e80e', '8e798cd6222603d9ae5335fabd3109d7', 'e02e5e20e6c69df17fba1558706899cf', '8ad99eaeb7cfc8078721880d7e431e54', 'a54b3d089a55f8a84e35b8ec1521aabf', 'd1345a893834131ca5d77b2e66a5c6af', '9facdd58f96e1f72e4e3165c97ba58ab'}
Debug info has been saved to /var/folders/y1/dw4jwvvs7v76_vgklwnbsw0c0000gp/T/qiime2-q2cli-err-hfzla_06.log

colinvwood · April 20, 2023, 5:41pm

Hello @dr_hulk,

Welcome to the forum!

I annotated your post with step numbers so I can refer to certain parts. In step 10 you filter out chimeric sequences, and then in step 11 you use this filtered set of features to create your taxonomy. Then in step 13 you use an unfiltered set of features (table-filtered-seq-clustered.qza) to try to make a barplot. That's why you get the "Feature IDs found in the table are missing from the taxonomy" error.
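
One way to see this mismatch directly is to list the feature IDs that are in the clustered sequences but not in the non-chimeric set. This is only a sketch: it assumes both sequence artifacts were first exported with qiime tools export (which writes a dna-sequences.fasta file for FeatureData[Sequence]), and the directory names exported-clustered and exported-nonchimeric as well as the helper name ids_only_in_first are hypothetical.

```shell
# List FASTA record IDs present in the first file but not in the second.
ids_only_in_first() {
  first_ids=$(mktemp)
  second_ids=$(mktemp)
  # Keep header lines, strip the leading '>' and any description, then sort.
  grep '^>' "$1" | sed 's/^>//; s/[[:space:]].*$//' | sort -u > "$first_ids"
  grep '^>' "$2" | sed 's/^>//; s/[[:space:]].*$//' | sort -u > "$second_ids"
  # comm -23 prints only the lines unique to the first sorted list.
  comm -23 "$first_ids" "$second_ids"
  rm -f "$first_ids" "$second_ids"
}

# Run only when both exports exist (these paths are assumptions, see above).
if [ -f exported-clustered/dna-sequences.fasta ] && [ -f exported-nonchimeric/dna-sequences.fasta ]; then
  ids_only_in_first exported-clustered/dna-sequences.fasta exported-nonchimeric/dna-sequences.fasta
fi
```

If the explanation above is right, the IDs printed here should match the ones listed in the barplot error message.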

To make your feature table and taxonomy consistent you can either:

filter your feature table with the non-chimeric features
create the taxonomy with your rep-seqs-clustered.qza sequences (skipping the chimeric filtering step)
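
A minimal sketch of the second option, using the file names from the steps above. The output name output/taxonomy-all.qza is hypothetical, and run_qiime is a hypothetical helper that prints each command and only executes it when qiime is installed and DRY_RUN is unset.

```shell
# Hypothetical helper: print the command, then run it if qiime is available
# and DRY_RUN is not set.
run_qiime() {
  echo "qiime $*"
  if [ -z "${DRY_RUN:-}" ] && command -v qiime >/dev/null 2>&1; then
    qiime "$@"
  fi
}

classify_and_plot_clustered() {
  # Classify the clustered sequences from step 9 (no chimera filtering),
  # so the taxonomy covers every feature in the clustered table.
  run_qiime feature-classifier classify-sklearn \
    --i-classifier Classifier/silva138_AB_V4_classifier.qza \
    --i-reads output/rep-seqs-clustered.qza \
    --o-classification output/taxonomy-all.qza &&
  # The table and the taxonomy now refer to the same set of feature IDs.
  run_qiime taxa barplot \
    --i-table output/table-filtered-seq-clustered.qza \
    --i-taxonomy output/taxonomy-all.qza \
    --m-metadata-file metadata.tsv \
    --o-visualization output/taxa-barplot.qzv
}

classify_and_plot_clustered
```

Note that this keeps the sequences flagged as chimeras in the barplot, so it trades the chimera check for consistency.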

Thanks.

dr_hulk · April 21, 2023, 11:47am

Hi Colin,

Thanks for your help! I understand where the problem is, but I don't know how to fix it.

It doesn't work when I try to make a barplot based on the filtered data (output/rep-seqs-nonchimeric.qza):

qiime taxa barplot \

--i-table output/rep-seqs-nonchimeric.qza
--i-taxonomy output/taxonomy.qza
--m-metadata-file metadata.tsv
--o-visualization output/taxa-barplot.qzv

error given:
(1/4) Missing option '--i-table'.
(2/4) Missing option '--i-taxonomy'.
(3/4) Missing option '--o-visualization'. ("--output-dir" may also be used)
(4/4) Got unexpected extra arguments (output/rep-seqs-nonchimeric.qza
output/taxonomy.qza metadata.tsv output/taxa-barplot.qzv)

From what I understand, the command "qiime feature-table filter-features" can only filter based on frequency and/or metadata, so how can I filter my table using my filtered sequences? In other words, how can I remove the chimeras from my list?

colinvwood · April 21, 2023, 5:44pm

Hello @dr_hulk,

That barplot command won't work because rep-seqs-nonchimeric.qza has the semantic type FeatureData[Sequence] and the taxa barplot command wants a table of semantic type FeatureTable[Frequency]. You also seem to have some command line syntax errors going on here.
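
If you're unsure what type an artifact has, qiime tools peek will tell you. Here is a small sketch using file names from your workflow; the helper name artifact_type is hypothetical, and it simply skips the check when qiime or the file isn't available.

```shell
# Print the semantic type line reported by `qiime tools peek` for an artifact.
artifact_type() {
  if command -v qiime >/dev/null 2>&1 && [ -f "$1" ]; then
    qiime tools peek "$1" | grep '^Type:'
  else
    echo "skipped: qiime or $1 not available"
  fi
}

# Per the "Saved ..." lines in the workflow, the first should be a
# FeatureData[Sequence] and the second a FeatureTable[Frequency].
artifact_type output/rep-seqs-nonchimeric.qza
artifact_type output/table-filtered-seq-clustered.qza
```

Only the second of these has the type that the --i-table option of taxa barplot accepts.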

To answer your second question, you can use the non-chimeric sequences as a filter to the qiime feature-table filter-features command like so:

qiime feature-table filter-features \
--i-table <your-table.qza> \
--m-metadata-file <non-chimeric-seqs.qza> \
--o-filtered-table <your-filtered-table.qza>

Yes, this is strange because the non-chimeric-seqs.qza isn't technically metadata, but it can be treated as such in this case.
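
With the file names from your workflow filled in, the whole thing might look like the sketch below. The name output/table-nonchimeric.qza is a hypothetical output name, and run_qiime is a hypothetical helper that prints each command and only executes it when qiime is installed and DRY_RUN is unset.

```shell
# Hypothetical helper: print the command, then run it if qiime is available
# and DRY_RUN is not set.
run_qiime() {
  echo "qiime $*"
  if [ -z "${DRY_RUN:-}" ] && command -v qiime >/dev/null 2>&1; then
    qiime "$@"
  fi
}

sync_table_and_plot() {
  # Keep only the features whose IDs appear in the non-chimeric sequences.
  run_qiime feature-table filter-features \
    --i-table output/table-filtered-seq-clustered.qza \
    --m-metadata-file output/rep-seqs-nonchimeric.qza \
    --o-filtered-table output/table-nonchimeric.qza &&
  # The filtered table now matches the taxonomy built in step 11.
  run_qiime taxa barplot \
    --i-table output/table-nonchimeric.qza \
    --i-taxonomy output/taxonomy.qza \
    --m-metadata-file metadata.tsv \
    --o-visualization output/taxa-barplot.qzv
}

sync_table_and_plot
```

Each backslash has to be the last character on its line, with the next option directly on the following line.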

dr_hulk · April 21, 2023, 7:10pm

It worked! Thank you very much!

If I could just ask an additional question: I don't have any 'Unassigned sequences', which surprises me a bit, because barplots that were generated in our lab before always had them.

I do have a small group that does not pass the domain level. Are those the unassigned?

Furthermore, I have not filtered out any eukaryotes, chloroplasts, or mitochondria, so I was expecting to pick up at least some of them.

colinvwood · April 21, 2023, 9:13pm

Hello @dr_hulk,

Depends on what you mean by unassigned. The features that are given the d__Bacteria;__(...) classification could not be resolved beyond the domain level. However, these features are still "assigned" to the domain Bacteria.

As to why you don't have eukaryote/chloroplast/mitochondria features, I can't say exactly without knowing more about your project's background. In 16S pipelines, even if such DNA is extracted, after amplification of the 16S target region it can be present in such a low relative amount that it's undetectable after sequencing (among other ways that 16S-specific library preparation can work to exclude these sequences).

Even if such DNA is sequenced, it won't necessarily be classified/properly annotated unless the database is constructed to do so.
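
If you want to check what labels the classifier actually assigned, one rough approach is to export the taxonomy and count the rows that mention each label. This sketch assumes taxonomy.qza was exported to a directory named exported-taxonomy (a hypothetical name), which gives a taxonomy.tsv file; count_label is a hypothetical helper.

```shell
# Count rows in a taxonomy TSV whose text contains a label (case-insensitive).
count_label() {
  # grep -c exits non-zero when the count is 0, so ignore its exit status.
  grep -ci -- "$1" "$2" || true
}

# Assumed export step (run once beforehand):
#   qiime tools export --input-path output/taxonomy.qza --output-path exported-taxonomy
tsv=exported-taxonomy/taxonomy.tsv
if [ -f "$tsv" ]; then
  for label in Unassigned Eukaryota Chloroplast Mitochondria; do
    echo "$label: $(count_label "$label" "$tsv")"
  done
fi
```

A count of 0 for a label only shows that nothing was classified that way; it doesn't tell you whether such sequences were absent from the sample or just not recognised by the database.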
