Loss of data with DADA2

Droush · February 13, 2018, 2:51pm

Hello,

I've run QIIME 2 with DADA2 before, and usually around 30% of my sequences are filtered out. This time I got only around 35% of my sequences back (meaning about 65% of my sequences are being filtered out per sample).

This was the command I ran:

qiime dada2 denoise-paired --i-demultiplexed-seqs vanessa-demux-paired-end.qza --o-table vanessa_table_ee8 --o-representative-sequences vanessa-rep-seqs-ee8 --p-trim-left-f 10 --p-trim-left-r 10 --p-trunc-len-f 290 --p-trunc-len-r 235 --p-n-threads 30 --p-max-ee 8 --verbose

And this was my output:

                             input    filtered denoised  merged non-chimeric
20-1_48_L001_R1_001.fastq.gz 413752   370970   370970    226078       153173
20-2_49_L001_R1_001.fastq.gz 631046   576952   576952    385678       237943
20-3_50_L001_R1_001.fastq.gz 709152   614104   614104    371633       209208
20-4_51_L001_R1_001.fastq.gz 683558   599355   599355    399690       285173
20-5_52_L001_R1_001.fastq.gz 711601   629273   629273    421946       258020
5-1_43_L001_R1_001.fastq.gz  659969   591283   591283    368305       186078

Is this normal? Should I just continue with my data analysis?

Thank you.

benjjneb · February 13, 2018, 3:51pm

The number of reads you are losing at the merging step and at the chimera-removal step is high enough to warrant concern.
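
For reference, the per-step retention can be worked out directly from the output posted above. Here is a minimal Python sketch, with the counts copied from that output, the sample names shortened, and the helper name `step_retention` made up for illustration:

```python
# Read counts copied from the DADA2 output posted above, in column order:
# (input, filtered, denoised, merged, non-chimeric).
# Sample names are shortened from the fastq file names.
counts = {
    "20-1": (413752, 370970, 370970, 226078, 153173),
    "20-2": (631046, 576952, 576952, 385678, 237943),
    "20-3": (709152, 614104, 614104, 371633, 209208),
    "20-4": (683558, 599355, 599355, 399690, 285173),
    "20-5": (711601, 629273, 629273, 421946, 258020),
    "5-1": (659969, 591283, 591283, 368305, 186078),
}

STEPS = ("filtered", "denoised", "merged", "non-chimeric")


def step_retention(row):
    """Return the fraction of reads kept at each step, relative to the previous step."""
    return {step: row[i + 1] / row[i] for i, step in enumerate(STEPS)}


for sample, row in counts.items():
    kept = step_retention(row)
    overall = row[-1] / row[0]  # non-chimeric reads as a fraction of input reads
    per_step = ", ".join(f"{step} {kept[step]:.0%}" for step in STEPS)
    print(f"{sample}: {per_step}, overall {overall:.0%}")
```

On these numbers, quality filtering keeps roughly 87-91% of reads, merging keeps roughly 61-67% of the denoised reads, and chimera removal keeps roughly 51-71% of the merged reads, which leaves about 28-42% of the input overall.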

Can you clarify: What is the length of the amplicon you are sequencing? What are the primers? Are the primers on the reads?

Droush · February 13, 2018, 4:53pm

Hey Ben,

Looks like someone in the lab used my old github account to post, but I can at least provide some insight here.

We are using the 341F-806R primer pair, which I have used in the past for other data sets without much trouble. The merged amplicon is about 418 bp. I ran 2 other data sets from the same MiSeq run, and they turned out fine. The data were generated with the adapter+primer PCR, so the primers may be in the reads; that is something we can fix and rerun. There were also some concerns about variable quality in the first 100 bp of both the forward and reverse reads (nothing major, but a wider spread than normal).
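
As a quick sanity check on the truncation settings, the expected read overlap can be estimated from the numbers in this thread. This is a rough sketch that assumes the 418 bp figure is the full length spanned by the two reads (the span would be longer if the primers are on the reads but not counted in that figure). `expected_overlap` is a hypothetical helper, and the 12 nt threshold is assumed from DADA2's default minimum overlap for merging:

```python
def expected_overlap(amplicon_len, trunc_len_f, trunc_len_r):
    """Estimate how many bases the truncated forward and reverse reads share.

    Assumes both reads start at opposite ends of the amplicon, so the
    overlap is the combined read length minus the amplicon length.
    """
    return trunc_len_f + trunc_len_r - amplicon_len


# Assumption: minimum overlap required for merging (DADA2 default).
MIN_OVERLAP = 12

# Values from this thread: ~418 bp amplicon,
# --p-trunc-len-f 290 and --p-trunc-len-r 235.
overlap = expected_overlap(418, 290, 235)
print(f"expected overlap: {overlap} nt (minimum assumed: {MIN_OVERLAP} nt)")
print("enough overlap" if overlap >= MIN_OVERLAP else "too little overlap")
```

Under those assumptions the estimate is 107 nt, well above the minimum, so the truncation lengths alone would probably not explain the merging losses. The `--p-trim-left-f` and `--p-trim-left-r` values remove bases from the outer ends of the reads, so they should not change this estimate.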

benjjneb · February 14, 2018, 5:46pm

Removing the primers is definitely the first thing to try. If the data loss persists after the primers are off, we can revisit, but there's a very good chance that will fix the issue, as primers with ambiguous nucleotides hanging around can interfere with both merging and chimera removal.
