Why do most of my samples have uncategorized bacteria?

Hi there,
I wanted to understand why most of the samples have uncategorized bacteria. Some of the samples are 100% uncategorized. Does it mean something went wrong with the sequencing or the sample prep? Or am I missing something?

This is PE300 Illumina sequencing with 16S primers.

Thanks

Hi @amm59063,

Can you please provide the exact commands you ran up until this point? Even better, can you provide the QZV file from this barplot image? Looking at the provenance information will help us help you. Finally, what amplicon region are you using?

Otherwise, these issues often occur because the wrong classifier is being used, the sequencing reads are not in the same orientation as the reference database, etc. There are many threads on the forum related to this topic. Here are links to a couple to help get you started:

-Mike

Hi Mike,
Thank you for the response. I used primers covering the V3-V4 region.
This is PE300 MiSeq data.

Command for import:

qiime tools import \
  --type 'SampleData[SequencesWithQuality]' \
  --input-path /scratch/amm59063/workdir/sb2_analysis/samples/fastq/r1 \
  --input-format CasavaOneEightSingleLanePerSampleDirFmt \
  --output-path /scratch/amm59063/workdir/sb2_analysis/samples/fastq/demux-r1.qza

qiime tools import \
  --type 'SampleData[SequencesWithQuality]' \
  --input-path /scratch/amm59063/workdir/sb2_analysis/samples/fastq/r2 \
  --input-format CasavaOneEightSingleLanePerSampleDirFmt \
  --output-path /scratch/amm59063/workdir/sb2_analysis/samples/fastq/demux-r2.qza

DADA2 command:

qiime dada2 denoise-single \
  --p-trim-left 8 \
  --p-trunc-len 299 \
  --i-demultiplexed-seqs /scratch/amm59063/workdir/sb2_analysis/samples/fastq/demux-r1.qza \
  --o-representative-sequences /scratch/amm59063/workdir/sb2_analysis/samples/qc/rep_seqs_r1.qza \
  --o-table /scratch/amm59063/workdir/sb2_analysis/samples/qc/table_r1.qza \
  --o-denoising-stats /scratch/amm59063/workdir/sb2_analysis/samples/qc/stats_r1.qza

qiime dada2 denoise-single \
  --p-trim-left 7 \
  --p-trunc-len 266 \
  --i-demultiplexed-seqs /scratch/amm59063/workdir/sb2_analysis/samples/fastq/demux-r2.qza \
  --o-representative-sequences /scratch/amm59063/workdir/sb2_analysis/samples/qc/rep_seqs_r2.qza \
  --o-table /scratch/amm59063/workdir/sb2_analysis/samples/qc/table_r2.qza \
  --o-denoising-stats /scratch/amm59063/workdir/sb2_analysis/samples/qc/stats_r2.qza

Merge two reads:

qiime feature-table merge-seqs \
  --i-data /scratch/amm59063/workdir/sb2_analysis/samples/qc/rep_seqs_r1.qza \
  --i-data /scratch/amm59063/workdir/sb2_analysis/samples/qc/rep_seqs_r2.qza \
  --o-merged-data /scratch/amm59063/workdir/sb2_analysis/samples/merge/rep-seq.qza

qiime feature-table merge \
  --i-tables /scratch/amm59063/workdir/sb2_analysis/samples/qc/table_r1.qza \
  --i-tables /scratch/amm59063/workdir/sb2_analysis/samples/qc/table_r2.qza \
  --p-overlap-method error_on_overlapping_feature \
  --o-merged-table /scratch/amm59063/workdir/sb2_analysis/samples/merge/table.qza

qiime feature-table summarize \
  --i-table /scratch/amm59063/workdir/sb2_analysis/samples/merge/table.qza \
  --o-visualization /scratch/amm59063/workdir/sb2_analysis/samples/merge/table.qzv \
  --m-sample-metadata-file /scratch/amm59063/workdir/sb2_analysis/samples/metadata.tsv

qiime feature-table tabulate-seqs \
  --i-data /scratch/amm59063/workdir/sb2_analysis/samples/merge/rep-seq.qza \
  --o-visualization /scratch/amm59063/workdir/sb2_analysis/samples/merge/rep-seq.qzv

Taxonomy analysis:

qiime feature-classifier classify-sklearn \
  --i-classifier /scratch/amm59063/workdir/sb2_analysis/gg/gg_97/classifier_gg_97.qza \
  --i-reads /scratch/amm59063/workdir/sb2_analysis/samples/merge/rep-seq.qza \
  --o-classification /scratch/amm59063/workdir/sb2_analysis/samples/taxa_gg_97/sample_taxonomy_gg_97.qza

qiime taxa filter-table \
  --i-table /scratch/amm59063/workdir/sb2_analysis/samples/merge/table.qza \
  --i-taxonomy /scratch/amm59063/workdir/sb2_analysis/samples/taxa_gg_97/sample_taxonomy_gg_97.qza \
  --p-exclude mitochondria,chloroplast,Unassigned,Eukaryota \
  --o-filtered-table /scratch/amm59063/workdir/sb2_analysis/samples/taxa_gg_97/filtered_table_gg_97.qza

qiime taxa barplot \
  --i-table /scratch/amm59063/workdir/sb2_analysis/samples/taxa_gg_97/filtered_table_gg_97.qza \
  --i-taxonomy /scratch/amm59063/workdir/sb2_analysis/samples/taxa_gg_97/sample_taxonomy_gg_97.qza \
  --m-metadata-file /scratch/amm59063/workdir/sb2_analysis/samples/metadata.tsv \
  --o-visualization /scratch/amm59063/workdir/sb2_analysis/samples/taxa_gg_97/barplot_samples_gg_97.qzv

Barplot QZV file:

https://drive.google.com/file/d/1zZOE6-ebusQT2cKis6h1Uh_Dcnsi1G-7/view?usp=sharing

Hi @amm59063,

I immediately noticed a problem with your pipeline. You are merging the data as if they came from different sequencing runs. The merge commands here do not actually join your forward and reverse reads, as vsearch join-pairs or DADA2 would. Explained below…

If you want to make use of paired-end reads with DADA2, you should run dada2 denoise-paired …. DADA2 will denoise both reads and merge them for you. Alternatively, you can use the Deblur pipeline in place of DADA2.

See the Atacama Tutorial for more details.

Also, you might try the SILVA database for taxonomic classification. The classifiers, and the files used to make them, are here.

-Mike

Hi Mike,
I trained a SILVA classifier and the problem still occurred. But when I used the full-length SILVA classifier, the results were much better. I have attached the QZV file. A few samples still show 100% uncategorized bacteria.

If you want to make use of paired-end reads with DADA2, you should run dada2 denoise-paired …. DADA2 will denoise both reads and merge them for you. Alternatively, you can use the Deblur pipeline in place of DADA2.

I tried importing the forward and reverse sequences together and running dada2 denoise-paired, but I noticed no difference.

Code for import:

qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path /scratch/amm59063/workdir/sb2_analysis/samples/fastq/both \
  --input-format CasavaOneEightSingleLanePerSampleDirFmt \
  --output-path /scratch/amm59063/workdir/sb2_analysis/samples/fastq/demux-paired-end.qza

Code for DADA2:

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs /scratch/amm59063/workdir/sb2_analysis/samples/fastq/demux-paired-end.qza \
  --p-trim-left-f 8 \
  --p-trim-left-r 7 \
  --p-trunc-len-f 299 \
  --p-trunc-len-r 266 \
  --o-table /scratch/amm59063/workdir/sb2_analysis/samples/qc_paired/table.qza \
  --o-representative-sequences /scratch/amm59063/workdir/sb2_analysis/samples/qc_paired/rep-seqs.qza \
  --o-denoising-stats /scratch/amm59063/workdir/sb2_analysis/samples/qc_paired/denoising-stats.qza

Do you see anything wrong with this code?

Thanks again

-Afaq

barplot_samples_silva.qzv (3.9 MB)

@amm59063

I would not recommend using the data in which you’ve run DADA2 separately on R1 and R2 and then merged them (see prior comment in this thread). Again, you only run those merge commands if you are combining several sequencing runs together, which is not what you are doing here.

The taxonomy should be better with the DADA2 paired-end output as the sequences will be longer, increasing your taxonomic resolution.

Can you make / send a version of the qzv barplot made from the qiime dada2 denoise-paired sequences? This barplot is from your initial denoise-single approach, as shown in the provenance tab.

-Mike

Hi Mike,

Can you make / send a version of the qzv barplot made from the qiime dada2 denoise-paired sequences? This barplot is from your initial denoise-single approach, as shown in the provenance tab.

Here I'm attaching the qzv file. This was made with the full-length classifier (available on the QIIME Data resources page).

barplot_samples_silva_paired.qzv (3.9 MB)

@amm59063 this is the barplot result for the individual dada2 denoise-single steps that were merged together. I would like to see the barplot result of dada2 denoise-paired. See the screenshot of the provenance of the last qzv file you sent:

Your provenance should say denoise-paired like this:

-Mike

Hi Mike,
Sorry about that.

see the screen shot of the provenance of the last qzv file you sent:

I had used the previous table to make the taxonomy (I forgot to change the directory). Now I'm getting better results. I have attached the barplot: barplot_samples_silva_paired.qzv (2.7 MB).

No worries @amm59063,

These results look good to me. I clicked on the color box for d__Bacteria;__;__;__;__, which hides everything except that group of taxa. You can see there are now far fewer unclassified bacteria compared to your original post. Yay!

-Mike

Hi Mike,

Thank you again. One more question: I wanted to filter out all of those uncategorized assignments (d__Bacteria;__;__;__;__). I tried the following code, but I couldn't eliminate them. Any advice on this code?

> qiime taxa filter-table \
>         --i-table /scratch/amm59063/workdir/sb2_analysis/samples/taxa_silva_paired/filtered_table_silva_paired.qza \
>         --i-taxonomy /scratch/amm59063/workdir/sb2_analysis/samples/taxa_silva_paired/sample_taxonomy_paired.qza \
>         --p-mode exact \
>         --p-exclude "d__Bacteria;__;__;__;__;__;__" \
>         --o-filtered-table /scratch/amm59063/workdir/sb2_analysis/samples/taxa_silva_paired/filtered2_table_silva_paired.qza

-Afaq

Hi @amm59063,

Note that d__Bacteria;__;__;__;__;__;__ only appears in that form within the visualizer, so searching for that pattern will not work. The ;__;__;__;__;__;__ suffix is simply filled in as you view more ranks in the visualizer. In actuality, the taxonomy information may only contain d__Bacteria.

The best way to figure out what to search for is to make a sample_taxonomy_paired.qzv file and view that.
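As a rough sketch of how to do that, `qiime metadata tabulate` can turn a taxonomy artifact into a viewable table. The helper name `tabulate_taxonomy` and the filenames in the example call are placeholders for illustration, not something from your pipeline.

```shell
# Hypothetical helper: tabulate a taxonomy artifact so the stored labels can be inspected.
# $1 = taxonomy artifact (.qza), $2 = output visualization (.qzv)
tabulate_taxonomy() {
  qiime metadata tabulate \
    --m-input-file "$1" \
    --o-visualization "$2"
}

# Example call (placeholder filenames; needs an active QIIME 2 environment):
# tabulate_taxonomy sample_taxonomy_paired.qza sample_taxonomy_paired.qzv
```

The Taxon column in the resulting table shows each label as it is stored, which is the text that filter-table searches against.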

I usually run something like this when filtering SILVA taxonomy:

  --p-mode 'contains'  \
  --p-include 'p__' \
  --p-exclude 'p__;,Eukaryota,Chloroplast,Mitochondria,Unassigned' \

Note the p__ in --p-include, and p__; in --p-exclude. The include keeps only taxa with p__ in the label. However, we do not want empty phylum labels, i.e. p__; (note the semi-colon). These two combined have the effect of explicitly removing all taxa that do not have at least a phylum-level designation. So, in this case the d__Bacteria; p__; ... or d__Bacteria ... (similarly for Archaea too) will be removed. Depending on the reference database you are using, you may need to change Unassigned to Unclassified, or simply add it to the list.
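To see how the two patterns interact, here is a small sketch that mimics the substring matching with grep. This is not what QIIME 2 runs internally: the labels are made up for illustration, and the case-insensitive matching (`-i`) is my assumption about how 'contains' mode behaves.

```shell
# Made-up taxonomy labels, one per line (not taken from the data in this thread).
labels='d__Bacteria
d__Bacteria; p__Proteobacteria; c__Gammaproteobacteria
d__Bacteria; p__; c__
d__Eukaryota; p__Ascomycota
d__Bacteria; p__Cyanobacteria; c__Cyanobacteriia; o__Chloroplast
Unassigned'

# Step 1 (include): keep only labels that contain 'p__'.
# Step 2 (exclude): drop labels that contain any of the exclude terms.
# -F matches fixed strings rather than regexes; -i is an assumed case-insensitivity.
filter_labels() {
  printf '%s\n' "$1" \
    | grep -iF -e 'p__' \
    | grep -viF -e 'p__;' -e 'Eukaryota' -e 'Chloroplast' -e 'Mitochondria' -e 'Unassigned'
}

filter_labels "$labels"
# prints: d__Bacteria; p__Proteobacteria; c__Gammaproteobacteria
```

The bare d__Bacteria and Unassigned labels fail the include step because they have no p__ at all, while the empty-phylum, Eukaryota, and Chloroplast labels pass the include step and are then removed by the exclude step.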

-Mike

Thanks Mike, it is working. Honestly, I learn new things on this forum every day.
-Afaq
