DADA2 output shows the same number of reads as unique sequences

gkong · November 13, 2017, 3:12pm

Hi all, I am having trouble understanding what the problem is here.
I am looking at 16S sequencing data (515F-806R) from a 150 bp Illumina MiSeq run. I'm running this on QIIME 2 (2017.10) and using only the forward reads for now. However, my DADA2 output reports the same number of reads as unique sequences.

My command line:

> qiime dada2 denoise-single --i-demultiplexed-seqs demux-fwd.qza --p-trunc-len 150 --p-n-threads 0 --verbose --o-representative-sequences rep-seqs-fwd.qza --o-table table-fwd-20170805.qza
 
DADA2 output:
R version 3.3.2 (2016-10-31) 
Loading required package: Rcpp
There were 50 or more warnings (use warnings() to see the first 50)
DADA2 R package version: 1.4.0 
1) Filtering ...................................
2) Learning Error Rates
Initializing error rates to maximum possible estimate.
Sample 1 - 36102 reads in 36102 unique sequences.
Sample 2 - 48634 reads in 48634 unique sequences.
Sample 3 - 12 reads in 12 unique sequences.
Sample 4 - 61432 reads in 61428 unique sequences.
Sample 5 - 27783 reads in 27783 unique sequences.
Sample 6 - 32174 reads in 32174 unique sequences.
Sample 7 - 25428 reads in 25428 unique sequences.
Sample 8 - 31563 reads in 31562 unique sequences.
Sample 9 - 78258 reads in 78258 unique sequences.
Sample 10 - 95979 reads in 95978 unique sequences.
Sample 11 - 172712 reads in 172710 unique sequences.

etc....

I believe this is abnormal, since the number of unique sequences should not equal the number of reads. Can anyone please help me understand what's wrong here?
The read quality has room for improvement: demux-paired-end-20170805.qzv (280.5 KB). Is it possible that this is the issue?

Nicholas_Bokulich · November 13, 2017, 6:43pm

Hi @gkong,
Thanks for posting! Something unusual is definitely happening — and it may all stem from your data. I have never seen quality profiles like those shown in the file you posted. Typically, quality gradually declines, e.g., like this. You may want to check back on any pre-processing steps that were performed on these data prior to importing into QIIME 2 — if these are raw data, I wonder if unusual levels of sequencing error could explain your issue.

Do your sequence reads still contain barcode and primer sequences? These would typically cause dada2 to filter out the reads as chimeras, but I wonder if that could also explain the output you report (read count = unique count).
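One rough way to check for leftover primers is to test how many reads begin with the forward primer. The sketch below is not part of QIIME 2 or DADA2. The function names are hypothetical, and the 515F sequence shown is the commonly published one; it is an assumption here and should be checked against the actual protocol, since variants exist.

```python
# IUPAC nucleotide codes mapped to the bases each one can represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "GC", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Assumption: commonly published 515F primer; verify against your protocol.
PRIMER_515F = "GTGCCAGCMGCCGCGGTAA"


def starts_with_primer(read, primer, max_mismatches=1):
    """Return True if the read begins with the (possibly degenerate) primer."""
    if len(read) < len(primer):
        return False
    mismatches = sum(
        1
        for base, code in zip(read.upper(), primer.upper())
        if base not in IUPAC.get(code, "")
    )
    return mismatches <= max_mismatches


def fraction_with_primer(reads, primer, max_mismatches=1):
    """Fraction of reads that start with the primer (0.0 for no reads)."""
    reads = list(reads)
    if not reads:
        return 0.0
    hits = sum(starts_with_primer(r, primer, max_mismatches) for r in reads)
    return hits / len(reads)


# Made-up reads for illustration only.
example_reads = [
    "GTGCCAGCAGCCGCGGTAATACG",  # starts with primer (M matches A)
    "GTGCCAGCCGCCGCGGTAATACG",  # starts with primer (M matches C)
    "TACGGAGGGTGCAAGCGTTAATC",  # no primer at the start
]
print(fraction_with_primer(example_reads, PRIMER_515F))
```

A fraction close to 1 would suggest the primers were not trimmed; a fraction close to 0 would suggest they were removed upstream, or that the reads start somewhere else entirely.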

@benjjneb do you have any insight on what is occurring here?

benjjneb · November 13, 2017, 7:03pm

I see the same thing: There are (essentially) no repeated sequences in the data (technically samples 4/9/10/11 have a couple). In general, that's a bad sign for the input data.
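The read and unique counts can also be checked independently of DADA2. This is a minimal sketch, assuming plain four-line FASTQ records with no wrapped sequence lines; the function name and the `trunc_len` parameter are hypothetical and only mimic truncating reads to a fixed length and dropping shorter ones.

```python
from collections import Counter
from typing import Iterable, Tuple


def count_unique_reads(fastq_lines: Iterable[str], trunc_len: int = 0) -> Tuple[int, int]:
    """Return (total reads, unique sequences) for four-line FASTQ records.

    If trunc_len > 0, reads shorter than trunc_len are dropped and the
    rest are truncated to trunc_len before counting.
    """
    counts = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 != 1:  # the sequence is the second line of each record
            continue
        seq = line.strip().upper()
        if trunc_len:
            if len(seq) < trunc_len:
                continue
            seq = seq[:trunc_len]
        counts[seq] += 1
    return sum(counts.values()), len(counts)


# Made-up records for illustration only.
example = [
    "@r1", "ACGTACGT", "+", "IIIIIIII",
    "@r2", "ACGTACGT", "+", "IIIIIIII",
    "@r3", "ACGTTCGT", "+", "IIIIIIII",
]
total, unique = count_unique_reads(example)
print(f"{total} reads in {unique} unique sequences")
```

For a real file, the lines could be streamed with `gzip.open(path, "rt")`, where `path` is a placeholder for one sample's FASTQ file. In a typical amplicon dataset many reads are exact duplicates, so the unique count should be far below the read count.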

How was this data generated?

gkong · November 14, 2017, 10:56pm

Thanks for replying! The pre-processing steps (demultiplexing and primer trimming) were done by a collaborator, so I'm not sure which commands were used. It will take a few days to get that information, and I will post an update once I have it. Hopefully this mystery will be solved soon.
