Dada2 parameters problem

maryver · August 8, 2020, 7:05pm

Good morning,
I am new to qiime2 and I've been reading some of the forums that have been previously posted to familiarize myself with the language.

I managed to import my data with CassavaOneEight and now I am working on the denoise with dada2. I was handed paired-end-demultiplexed data containing barcodes and primers.

My target region is V3-V4 of the 16S gene with primers 314F (CCTAYGGGRBGCASCAG) and 806R (GGACTACNNGGGTATCTAAT).
I attach my quality plot of my sequences.

I've been having trouble with the trim / trunc parameters for denoising. Reviewing forums, the parameters that I decided to place were the following: --p-trunc-len-f 250 --p-trunc-len-r 230. However, when Dada2 finished running, it keeps deleting a good percentage of my sequences. I started with 342197 sequences and Dada2 gave me 24,858.

I would appreciate any of your help. Thank you in advance

I leave you the script that I am using to import them and until the dada2
Import data:
qiime tools import --type 'SampleData[PairedEndSequencesWithQuality]' --input-path secuencias --input-format CasavaOneEightSingleLanePerSampleDirFmt --output-path demux-paired-end-mary.qza

qiime demux summarize --i-data demux-paired-end-mary.qza --o-visualization demux-paired-end-mary.qzv

DADA2:
qiime dada2 denoise-paired --i-demultiplexed-seqs demux-paired-end-mary.qza --p-trunc-len-f 250 --p-trunc-len-r 230 --p-trim-left-f 0 --p-trim-left-r 0 --p-n-threads 1 --o-table table-dada2-mary.qza --o-representative-sequences rep-seq-dada2--mary.qza --o-denoising-stats stats-dada2-mary.qza

colinbrislawn · August 12, 2020, 9:22pm

Hello @maryver,

Welcome to the forums! :qiime2:

This one is a bit of a mystery because many factors could cause dada2 to remove so many reads. We will have to investigate...

We could find some useful information using qiime metadata tabulate to visualize your denoising-stats. This will tell us how many reads were retained at each step of the dada2 pipeline.

qiime metadata tabulate \
  --m-input-file stats-dada2-mary.qza \
  --o-visualization stats-dada2-mary.qzv

@Mehrbod_Estaki has a hunch that the small area of overlap between the reads might be causing pairing to fail. Let's do a simple estimate of the number of overlapping base pairs of your forward and reverse reads.

bp sequenced - bp in amplicon
(250f + 230r) - (806r-314f)
480 - 492
-12

So... if these number are correct, these reads might not overlap at all, and instead have a 12 bp gap in between, making pairing impossible.

But we will learn more by checking the visualization

Colin

maryver · August 13, 2020, 9:45pm

Hello @colinbrislawn

Thank you for helping me

I ran dada2 again with my sequences without primer and barcode and set the parameters to 215f and 185r assuming that my fragment is approximately 445 bp and thus had an overlap of 25 bp. And I have the same problem that discards most of my sequences, the stats file tells me that most of them are in the filtered and denoise step.

It is not very clear to me how you choose the overlap. Could you explain me? to know how to choose my parameters from dada2 correctly. I chose those parameter according to another post in the forum.

In the same way I attach my stat results of dada2 for my sequences with primers-barcodes (stats-dada2-mary-primer) and with primers-barcodes removed (stats-dada2-mary-Wprimer).

stats-dada2-mary-Wprimer.qzv (1.2 MB) stats-dada2-mary-primer.qzv (1.2 MB)
Marycarmen

colinbrislawn · August 13, 2020, 10:30pm

Hello Marycarmen,

I took a look at your files, and I agree. Most reads make it through filtering and denoising, but then are removed during merging.

q2-DADA2 will throw away any reads that are not able to merge. So we have to make sure you have enough area of overlap to merge your reads successfully.

Here is how I calculate area of overlap for paired end reads.

(I'm not really choosing the overlap, I'm trying to calculate how long the overlap could be based on the length of the region amplified and the length of the reads).

|----------------------------------------------- 16S amplicon ~1500 bp
     314f |>                   <| 806r primer
          |---------------------| regioned amplified 806-314 = 492 bp
250 bp f  |-----------> 
                      <---------| 250 bp r read
  the area of overlap ^ (250+250-492 = 8 bp)

If you trimmed your reads:
          |---------------------| regioned amplified 806-314 = 492 bp
250 bp f  |-----------> 
                        <-------| 230 bp r read
Now there is no overlap^  (250+230-492 = -12 bp)

I'm not sure if I'm explaining this well...

Do you think your reads will overlap?

maryver · August 14, 2020, 2:52pm

Hi @colinbrislawn

Thank you so much for your explication, now its clear.

An apology as I had a finger error on the primers: instead of 314F it is 341F.

In the same way I did the calculations according to your explication:
I have an amplicon of (806-341) = 465bp
If I set my values to 242f and 242r gives me a total of 484bp
then the overlap would be: (242 + 242-465 = 19 bp)
However dada2 throws le following error:

I tried to vary the lengths of the forward and reverse but I still get the same error.

I attach my demux of the sequences to observe the quality
demux-paired-end-mary.qzv (289.8 KB)

Thank you in advance

colinbrislawn · August 14, 2020, 7:35pm

Awesome! I think we are making good progress.

Good work. Now that we have corrected the amplicon size, we will get a better estimate of the overlap!

Based on the demus-paired-end-mary.qzv file you provided, we may have to update the sequence lengths too. That new files says that the forward read is 227 bp long, and reverse read is 224 bp long.
So:
227+224-465= -14
which is another gap, that would prevent your reads from joining.

The 'no reads passed filter' is a different error. Passing --p-trunc-length-r 234 will truncate longer reads to 234 bp and remove all reads shorter than 234, and because your forward reads in that file are 227 bp long, they are all removed.

Did you import raw reads directly from the Illumina instrument, or was some preprocessing done to these reads before importing into Qiime 2?

Colin

maryver · August 15, 2020, 10:03pm

Hello @colinbrislawn

Yes, for these reasons I choose 242 and 242 as a parameter for the overlap.

I imported the reads directly as the company give me the sequences. I have two formats of my sequences: one have primers and barcodes in the sequence and the other have primers and barcodes removed. I imported the ones with the barcodes and primers removed and then uses for dada2

Previously I ran dada2 with the sequences that have primers and barcodes but with other parameters (leaving a gap) . I don't know if having primers and barcodes in the sequence influences the command but I have a few reads too as an output.

Thank you

Marycarmen

thermokarst · August 25, 2020, 2:33pm

Hi @maryver - you appear to be using a very old version of QIIME 2 - can you upgrade and give this a shot again? Thanks! :qiime2:

system · September 25, 2020, 8:33pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.