almost all sequences filtered out with dada2

awaller · July 5, 2019, 12:23am

Hi there,

After demultiplexing, I have 3302 sequences of length 240-457.
However, after dada2 I have 16 sequences, and they are all exactly 250 bp long.
I have pasted my commands below; please let me know if I am missing something.

Thanks

qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path Manifest_16S.tsv \
  --output-path paired-end-demux.qza \
  --input-format PairedEndFastqManifestPhred33V2
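For reference, the `PairedEndFastqManifestPhred33V2` format expects a tab-separated manifest with a `sample-id` column plus absolute paths to the forward and reverse FASTQ files. The sketch below writes such a manifest with a hypothetical `make_manifest` helper. It assumes files are named `<sample>_R1.fastq.gz` and `<sample>_R2.fastq.gz`, which may not match your naming scheme, so adjust the suffixes as needed.

```shell
#!/usr/bin/env bash
# Sketch: write a PairedEndFastqManifestPhred33V2 manifest (tab-separated,
# absolute paths). The <sample>_R1.fastq.gz / <sample>_R2.fastq.gz naming is a
# hypothetical convention; change the suffixes to match your files.
set -euo pipefail

make_manifest() {
  local fastq_dir=$1 out=$2 fwd rev sample abs_dir
  abs_dir=$(cd "$fastq_dir" && pwd)        # manifest paths must be absolute
  printf 'sample-id\tforward-absolute-filepath\treverse-absolute-filepath\n' > "$out"
  for fwd in "$abs_dir"/*_R1.fastq.gz; do
    [ -e "$fwd" ] || continue              # glob matched nothing
    sample=$(basename "$fwd" _R1.fastq.gz)
    rev="${fwd%_R1.fastq.gz}_R2.fastq.gz"
    if [ ! -e "$rev" ]; then
      echo "missing reverse read file for $sample: $rev" >&2
      return 1
    fi
    printf '%s\t%s\t%s\n' "$sample" "$fwd" "$rev" >> "$out"
  done
}

# Usage (hypothetical directory name):
#   make_manifest raw_fastq Manifest_16S.tsv
```

A missing reverse file stops the helper with an error rather than writing an incomplete row, because a manifest with a bad path would only fail later at import time.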

# Generate a summary of the demultiplexing

qiime demux summarize \
  --i-data paired-end-demux.qza \
  --o-visualization paired-end-demux.qzv

qiime tools view paired-end-demux.qzv
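The truncation lengths in the dada2 command below were read off the interactive quality plot. If you want a repeatable rule instead, one common heuristic is to truncate at the last position before the median quality drops below some threshold. The sketch below applies that rule to a hypothetical two-column, tab-separated input (position, median quality) that you would need to prepare yourself; the helper name `trunc_from_medians` and any threshold you pass are assumptions, not part of QIIME 2.

```shell
#!/usr/bin/env bash
# Heuristic sketch (not a QIIME 2 feature): print the last read position
# before the median quality first drops below a threshold.
# Input on stdin: hypothetical "position<TAB>median_quality" lines, sorted by position.
trunc_from_medians() {
  local threshold=$1
  awk -F'\t' -v q="$threshold" '
    $2 < q { found = 1; print last; exit }  # first low-quality position: report the one before it
    { last = $1 }                           # remember the last position that passed
    END { if (!found) print last }          # quality never dropped: keep the full length
  '
}

# Usage (hypothetical file and threshold):
#   trunc_from_medians 25 < reverse_medians.tsv
```

A single dip followed by recovery still ends the read here; that is deliberately conservative, and a looser rule (for example, requiring several consecutive low positions) is equally defensible.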

# Added "cons" to the output names, as these settings are fairly conservative

# Forward reads have good quality up to position 240, but reverse reads only up to 220

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs paired-end-demux.qza \
  --p-trim-left-f 15 \
  --p-trunc-len-f 240 \
  --p-trim-left-r 15 \
  --p-trunc-len-r 220 \
  --o-representative-sequences rep-seqs-cons-dada2.qza \
  --o-table table-cons-dada2.qza \
  --o-denoising-stats stats-cons-dada2.qza
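Since the question is where the reads are lost, the denoising stats are worth a look; `qiime metadata tabulate --m-input-file stats-cons-dada2.qza --o-visualization stats-cons-dada2.qzv` renders them as a table. For a quick command-line check, the sketch below reads an exported copy of that table and prints the percentage of input reads that survive to the non-chimeric step. It assumes a tab-separated file whose header has columns named `input` and `non-chimeric` (with the sample ID first) and an optional `#q2:types` row; column names can differ between QIIME 2 versions, so check your file's header first. The helper name `retained_fraction` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: per-sample fraction of reads retained by DADA2.
# Assumes a tab-separated table whose header has "input" and "non-chimeric"
# columns (first column = sample ID); "#q2:..." rows are skipped.
retained_fraction() {
  awk -F'\t' '
    /^#q2:/ { next }                           # QIIME 2 column-type directive row
    !seen_header {
      for (i = 1; i <= NF; i++) col[$i] = i    # map column name -> index
      if (!("input" in col) || !("non-chimeric" in col)) {
        print "missing input/non-chimeric columns" > "/dev/stderr"
        exit 1
      }
      seen_header = 1
      next
    }
    {
      inp = $(col["input"]); kept = $(col["non-chimeric"])
      pct = (inp > 0) ? 100 * kept / inp : 0   # avoid dividing by zero
      printf "%s\t%d\t%d\t%.1f%%\n", $1, inp, kept, pct
    }
  ' "$1"
}

# Usage (hypothetical path to the exported table):
#   retained_fraction dada2-stats/stats.tsv
```

Comparing the intermediate columns the same way (filtered, denoised, merged) shows which step removes the most reads.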

Attachments: paired-end-demux.qzv (288.8 KB), rep-seqs-cons-dada2.qzv (197.2 KB)


Mehrbod_Estaki · July 5, 2019, 1:22am

Hi @awaller,

Where exactly are you getting these values from? I ask because your demux visualization shows something different (sequence counts per sample):

Minimum:	13
Median:	114.0
Mean:	110.39285714285714
Maximum:	114
Total:	3091

and your quality plot shows reads of a uniform ~250 bp. After merging with dada2, your features are all ~225 bp, which suggests an almost complete overlap between your forward and reverse reads. This is typical when sequencing short regions such as V4.
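The overlap argument can be checked with simple arithmetic. The bases kept from each read are its truncation length minus its trimmed length, and the overlap is the sum of the two minus the merged length. In the sketch below, treating the ~225 bp merged length as exact is an assumption, and the 12 bp threshold is the default minimum overlap of DADA2's `mergePairs`. The loop also shows how much overlap would remain with shorter, purely illustrative reverse truncation lengths; `overlap_bp` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Rough overlap estimate for DADA2 paired-end merging (illustrative sketch).
# overlap = (trunc_f - trim_f) + (trunc_r - trim_r) - merged_len
overlap_bp() {
  local trunc_f=$1 trim_f=$2 trunc_r=$3 trim_r=$4 merged_len=$5
  echo $(( (trunc_f - trim_f) + (trunc_r - trim_r) - merged_len ))
}

merged_len=225   # assumed merged feature length (~225 bp in this thread)
min_overlap=12   # default minOverlap of DADA2's mergePairs

# 220 is the reverse truncation used in the posted command; the rest are hypothetical.
for trunc_r in 220 200 180 160; do
  ov=$(overlap_bp 240 15 "$trunc_r" 15 "$merged_len")
  if (( ov >= min_overlap )); then status=ok; else status=too-short; fi
  echo "trunc-len-r=${trunc_r}: overlap=${ov} bp (${status})"
done
```

With these numbers the forward read alone (225 bp after trimming) spans the whole merged sequence, so the reverse read lies inside it and even much harder reverse truncation leaves far more than 12 bp of overlap.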

Your dada2 trim/trunc parameters look OK to me, assuming there is sufficient overlap between your reads. I would personally truncate more from your reverse reads if possible.
You may already be aware of this, but this data looks a bit artificial to me: I've never seen reads distributed this evenly (114 sequences) across samples the way you have here. It looks as though each sample was subsampled or rarefied prior to importing. The number of reads per sample is also very low. BLASTing a few of the ASVs from your rep-seqs file identifies them as chloroplast and mitochondria. Again, this may be what you are expecting, but I wanted to point out the oddity of this dataset in case it's not intentional.
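The summary numbers themselves support the subsampling reading. Dividing the total by the mean gives the number of samples; with one sample at the minimum and one at the maximum, the remaining reads can only add up to the total if every other sample also sits at the maximum. The sketch below does that arithmetic with the values from the demux summary quoted earlier in this reply.

```shell
#!/usr/bin/env bash
# Values from the demux summary quoted in this thread.
total=3091
mean=110.39285714285714
min=13
max=114

# Number of samples = total / mean, rounded to the nearest integer.
n=$(awk -v t="$total" -v m="$mean" 'BEGIN { printf "%d", t / m + 0.5 }')

# Reads belonging to the n - 2 samples that are not the min or the max sample,
# and the most those samples could hold if none exceeds the maximum.
rest=$(( total - min - max ))
ceiling=$(( (n - 2) * max ))

echo "samples: ${n}; remaining reads: ${rest}; ceiling: ${ceiling}"
if (( rest == ceiling )); then
  echo "all samples except one have exactly ${max} reads"
fi
```

It prints `samples: 28; remaining reads: 2964; ceiling: 2964`, so 27 of the 28 samples have exactly 114 reads and one has 13, which is consistent with subsampling to an even depth before import.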

system · August 5, 2019, 7:22am

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.