Lots of Unassigned reads

victoriamesa · October 21, 2019, 1:25am

Dear all,

I'm analyzing gut microbiota data from 2 x 300 paired-end MiSeq runs on the V3-V4 region. My issue is similar to ones previously posted on this topic. I have QIIME2 version 2019.7 installed on macOS.

Summary: I followed the Moving Pictures tutorial. I denoised with DADA2 and truncated at 420 based on the QC plots.
The majority of samples (around 50%) ended up with unassigned taxonomy. However, when I processed the data through QIIME1, I did not get many unassigned reads, unlike with QIIME2.

-The commands that I have used:
qiime dada2 denoise-single \
--i-demultiplexed-seqs Gut_microbiota/Sequences/demux-single-end2.qza \
--p-trim-left 30 \
--p-trunc-len 420 \
--o-representative-sequences Gut_microbiota/rep-seqs-dada2.qza \
--o-table Gut_microbiota/table-dada2.qza \
--o-denoising-stats Gut_microbiota/stats-dada2.qza

For qiime feature-classifier classify-sklearn, I have tried both Greengenes and the newest SILVA database, using my own primer sequences.

-Outputs: Please check the demux plot (demux.png), the barplot from QIIME2, and the barplot from some sequences processed in QIIME1.

bar-plots_Filtered-no-mitochondria-no-chloroplast.qzv (2.9 MB)

Thank you in advance!

sixvable · October 21, 2019, 3:59am

Hi @victoriamesa
The taxonomy result is very strange. It assigned almost nothing.

As I remember, the V3-V4 16S amplicon is about 437 bp long, but your sequence lengths show that some amplicons are longer than 450 bp.
I guess this is because you merged your paired-end reads before quality control. That is not appropriate, since MiSeq PE300 has poor base accuracy in the read tails.
I suggest you import the paired-end reads without merging (type SampleData[PairedEndSequencesWithQuality]) and let the command qiime dada2 denoise-paired do the merging.
Also train your own V3-V4 classifier.
Hope that would help you!
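The import-then-denoise suggestion above might look roughly like the sketch below. None of these values come from this thread. The manifest file name, the Phred33 manifest format, the output paths, and all trim and truncation numbers are placeholders. Pick real values from your own quality plots, and keep enough overlap between the truncated forward and reverse reads for merging. The small `run` helper is only a convenience: it prints each command and executes it only when `qiime` is on the PATH.

```shell
#!/usr/bin/env bash
# Sketch only: every path and number below is a placeholder, not a value from this thread.
MANIFEST="manifest.tsv"   # hypothetical manifest listing forward and reverse FASTQ files
OUT_DIR="Gut_microbiota"
TRIM_LEFT_F=17            # assumed forward primer length
TRIM_LEFT_R=21            # assumed reverse primer length
TRUNC_LEN_F=280           # placeholder; choose from the forward quality plot
TRUNC_LEN_R=220           # placeholder; choose from the reverse quality plot

# Print each command; execute it only if QIIME 2 is available on the PATH.
run() {
  echo "+ $*"
  if command -v qiime >/dev/null 2>&1; then "$@"; fi
}

# Import the unmerged paired-end reads (assumes a Phred33 TSV manifest).
run qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path "$MANIFEST" \
  --input-format PairedEndFastqManifestPhred33V2 \
  --output-path "$OUT_DIR/demux-paired-end.qza"

# Denoise and let DADA2 merge the read pairs itself.
run qiime dada2 denoise-paired \
  --i-demultiplexed-seqs "$OUT_DIR/demux-paired-end.qza" \
  --p-trim-left-f "$TRIM_LEFT_F" \
  --p-trim-left-r "$TRIM_LEFT_R" \
  --p-trunc-len-f "$TRUNC_LEN_F" \
  --p-trunc-len-r "$TRUNC_LEN_R" \
  --o-representative-sequences "$OUT_DIR/rep-seqs-dada2.qza" \
  --o-table "$OUT_DIR/table-dada2.qza" \
  --o-denoising-stats "$OUT_DIR/stats-dada2.qza"
```

If the denoising stats show that most reads are lost at the merging step, the truncation lengths are probably too short for the amplicon and should be relaxed.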

Nicholas_Bokulich · October 21, 2019, 3:49pm

Welcome to the forum @victoriamesa!

The 50% unassigned is a pretty clear indicator: your reads are probably in mixed orientations. The classify-sklearn method currently only supports classification of sequences that are in a single orientation. However, the classify-consensus-vsearch method can classify mixed orientation reads just fine. Could you give that method a try and let us know what you see? Thanks!
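As a minimal sketch of that suggestion, the call could look like the following. The reference file names and the output path are hypothetical. They stand for a FeatureData[Sequence] artifact and a FeatureData[Taxonomy] artifact from whichever database you use. `--p-strand both` is set explicitly so that reads in either orientation are searched. The `run` helper only prints each command and executes it when `qiime` is on the PATH.

```shell
#!/usr/bin/env bash
# Sketch only: reference and output file names are hypothetical placeholders.
REP_SEQS="Gut_microbiota/rep-seqs-dada2.qza"   # representative sequences from DADA2
REF_SEQS="ref-seqs.qza"                        # hypothetical FeatureData[Sequence] reference
REF_TAX="ref-taxonomy.qza"                     # hypothetical FeatureData[Taxonomy] reference
OUT_TAX="Gut_microbiota/taxonomy-vsearch.qza"  # hypothetical output path

# Print each command; execute it only if QIIME 2 is available on the PATH.
run() {
  echo "+ $*"
  if command -v qiime >/dev/null 2>&1; then "$@"; fi
}

# Classify against both strands so mixed-orientation reads can still match.
run qiime feature-classifier classify-consensus-vsearch \
  --i-query "$REP_SEQS" \
  --i-reference-reads "$REF_SEQS" \
  --i-reference-taxonomy "$REF_TAX" \
  --p-strand both \
  --o-classification "$OUT_TAX"
```

Other QIIME 2 releases may expect additional outputs or parameters for this action, so check `qiime feature-classifier classify-consensus-vsearch --help` for your installed version.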

But @sixvable is also correct:

Even though merging before import is probably not what is causing this classification issue, it is probably leading to suboptimal performance with q2-dada2, which assumes that reads are not yet merged. So demultiplex and denoise the unmerged reads and let q2-dada2 do the merging for you.

victoriamesa · October 22, 2019, 12:48pm

Dear @Nicholas_Bokulich and @sixvable,

Thank you for your fantastic work on this forum, and thank you very much for taking the time to answer in detail with your explanations. I did the taxonomic assignment with the classify-consensus-vsearch method, and now I only have a small fraction of unassigned bacteria. I think it was a problem with the orientation of the reads.

Your help is greatly appreciated!

Best wishes,
