Need a second opinion on my data - DADA2, truncation length for V4 region - 250 bases PE read data

anwesh · June 2, 2018, 8:02am

Hi all,

I know this question has been asked before, but I want to confirm that I'm handling my dataset correctly, so please share your suggestions.

Here are the details and my workflow.
Dataset properties: 7 stool samples, 16S rDNA (V4 region), 2×250 bp paired-end, HiSeq platform, demultiplexed.

Quality checking with FastQC (per-base quality plots for Sample1_R1 and Sample1_R2).

I used Trimmomatic to remove adapter content and filter out low-quality reads:

$ java -jar trimmomatic-0.38.jar PE -phred33 FJ1_R1.fastq.gz FJ1_R2.fastq.gz FJ1_for_paired.fq.gz FJ1_for_unpaired.fq.gz FJ1_rev_paired.fq.gz FJ1_rev_unpaired.fq.gz ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10 LEADING:20 TRAILING:20 SLIDINGWINDOW:4:20 MINLEN:100
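The trimming steps in the command above can be approximated for a single read as follows. This is a simplified Python model. The function name `trimmomatic_like` is hypothetical, ILLUMINACLIP is not modelled, and Trimmomatic's real SLIDINGWINDOW logic differs slightly in where it cuts. It shows how LEADING:20, TRAILING:20, SLIDINGWINDOW:4:20 and MINLEN:100 leave reads of variable length, and how raising MINLEN changes which reads survive.

```python
def trimmomatic_like(quals, leading=20, trailing=20, window=4, window_q=20, minlen=100):
    """Simplified model of LEADING, TRAILING, SLIDINGWINDOW and MINLEN.

    Steps are applied in the order they appear on the command line.
    Returns (start, end) of the kept region, or None if the read is dropped.
    ILLUMINACLIP (adapter removal) is not modelled.
    """
    start, end = 0, len(quals)
    # LEADING:20 - remove 5' bases while their quality is below `leading`
    while start < end and quals[start] < leading:
        start += 1
    # TRAILING:20 - remove 3' bases while their quality is below `trailing`
    while end > start and quals[end - 1] < trailing:
        end -= 1
    # SLIDINGWINDOW:4:20 - scan from the 5' end and cut at the start of the first
    # window whose mean quality is below `window_q` (real Trimmomatic cuts
    # slightly differently inside the failing window)
    for i in range(start, end - window + 1):
        if sum(quals[i:i + window]) / window < window_q:
            end = i
            break
    # MINLEN:100 - drop the read if the remaining length is too short
    if end - start < minlen:
        return None
    return start, end


good_then_bad = [35] * 150 + [10] * 100  # hypothetical 250-cycle read
print(trimmomatic_like(good_then_bad))              # prints (0, 150)
print(trimmomatic_like(good_then_bad, minlen=175))  # prints None: dropped by MINLEN
```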

The paired forward and reverse reads were then imported into QIIME 2 (qiime2-2018.4):

$ qiime tools import --type 'SampleData[PairedEndSequencesWithQuality]' --input-path jar_manifest.csv --output-path raw_imports/jar-raw.qza --source-format PairedEndFastqManifestPhred33
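The import step reads jar_manifest.csv, which isn't shown. For the PairedEndFastqManifestPhred33 format, QIIME 2 releases of that period expected a CSV with the header `sample-id,absolute-filepath,direction` and one `forward` row and one `reverse` row per sample. Below is a minimal sketch that produces such text from the Trimmomatic paired outputs. `build_manifest`, the directory and any sample ID other than FJ1 are hypothetical.

```python
import csv
import io
import os


def build_manifest(sample_ids, read_dir):
    """Return manifest CSV text (sample-id,absolute-filepath,direction).

    Assumes the Trimmomatic naming used above: <sample>_for_paired.fq.gz and
    <sample>_rev_paired.fq.gz. Unpaired outputs are left out.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["sample-id", "absolute-filepath", "direction"])
    for sid in sample_ids:
        for suffix, direction in (("for_paired", "forward"), ("rev_paired", "reverse")):
            path = os.path.abspath(os.path.join(read_dir, f"{sid}_{suffix}.fq.gz"))
            writer.writerow([sid, path, direction])
    return out.getvalue()


# Hypothetical directory holding the trimmed reads
print(build_manifest(["FJ1"], "/data/trimmed"))
```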

I then visualized the quality of the imported reads with the QIIME 2 tools.
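The point where quality stops being good (around 175 for the reverse reads in this case) can also be estimated numerically. The sketch below assumes Phred+33 encoding, as in the import command. The function names and the cutoff value are hypothetical, and this is not a QIIME 2 command. It reports how many leading positions keep a median quality at or above the cutoff, which is one way to pick a truncation length.

```python
import io
import statistics


def median_quality_by_position(fastq_lines, max_reads=10000, offset=33):
    """Median Phred score at each position over the first `max_reads` records.

    `fastq_lines` is any iterable of FASTQ text lines; Phred+33 is assumed.
    """
    columns = []  # columns[i] collects the quality scores seen at position i
    for n, line in enumerate(fastq_lines):
        if n // 4 >= max_reads:
            break
        if n % 4 == 3:  # the 4th line of each record is the quality string
            for i, ch in enumerate(line.rstrip("\r\n")):
                if i == len(columns):
                    columns.append([])
                columns[i].append(ord(ch) - offset)
    return [statistics.median(col) for col in columns]


def suggest_trunc_len(medians, threshold):
    """Number of leading positions whose median quality stays >= threshold."""
    for i, m in enumerate(medians):
        if m < threshold:
            return i
    return len(medians)


# Tiny in-memory example: two 4-nt reads ('I' = Q40, '#' = Q2 in Phred+33)
demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\nII##\n")
medians = median_quality_by_position(demo)
print(medians, suggest_trunc_len(medians, threshold=30))
```

On real data, pass an open file instead, for example `gzip.open("FJ1_rev_paired.fq.gz", "rt")`.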

Next, I denoised the reads with DADA2 using two different truncation settings.

With truncation (--p-trunc-len-f 200 and --p-trunc-len-r 175):

$ qiime dada2 denoise-paired --i-demultiplexed-seqs jar-raw.qza --p-trunc-len-f 200 --p-trunc-len-r 175 --p-chimera-method pooled --p-n-threads 8 --o-representative-sequences denoise/jar_rep-seqs.qza --o-table denoise/jar_table.qza --o-denoising-stats denoise/jar_stats.qza --verbose
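Whatever truncation values are chosen, the truncated forward and reverse reads still have to overlap so that DADA2 can merge them. The overlap is trunc_f + trunc_r minus the amplicon length. The sketch below makes two assumptions. First, it assumes the common 515F/806R V4 primers, giving an amplicon of roughly 253 nt without primers or 292 nt with them; the post does not say which primers were used. Second, it assumes a 12 nt minimum overlap, taken from recent q2-dada2 help text. Check both against your own protocol and release.

```python
def merge_overlap(trunc_f, trunc_r, amplicon_len, read_len=250):
    """Expected overlap (nt) of forward and reverse reads after truncation.

    A truncation length of 0 means no truncation, so the full read length
    is used (this ignores any shortening done earlier by Trimmomatic).
    """
    len_f = trunc_f if trunc_f > 0 else read_len
    len_r = trunc_r if trunc_r > 0 else read_len
    return len_f + len_r - amplicon_len


MIN_OVERLAP = 12  # assumption: minimum overlap quoted in recent q2-dada2 help
# assumption: ~253 nt (primers removed) or ~292 nt (primers in reads) for 515F/806R V4
for amplicon_len in (253, 292):
    for trunc_f, trunc_r in ((200, 175), (0, 0)):
        overlap = merge_overlap(trunc_f, trunc_r, amplicon_len)
        print(f"V4 {amplicon_len} nt, trunc {trunc_f}/{trunc_r}: "
              f"overlap {overlap} nt, enough: {overlap >= MIN_OVERLAP}")
```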

Output: jar_table.qzv
Total features: 1,342
Total frequency: 831,346
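One factor that may contribute to the much lower total frequency of the truncated run (831,346 vs. 2,428,146) is length filtering. The q2-dada2 help states that reads shorter than the truncation length are discarded, and a value of 0 disables truncation. Meanwhile, Trimmomatic's quality trimming leaves reads of varying length. The sketch below uses invented read lengths to count how many pairs survive each setting. It ignores DADA2's other filters (expected errors, merging, chimera removal), so it is an illustration rather than a diagnosis of this dataset.

```python
def pairs_kept(pair_lengths, trunc_f, trunc_r):
    """Count read pairs that survive DADA2-style length truncation.

    A read shorter than its truncation length is discarded (and its mate with
    it); a truncation length of 0 disables this length filter. Expected-error
    filtering, merging and chimera removal are deliberately ignored.
    """
    def passes(length, trunc):
        return trunc == 0 or length >= trunc

    return sum(1 for len_f, len_r in pair_lengths
               if passes(len_f, trunc_f) and passes(len_r, trunc_r))


# Invented post-Trimmomatic (forward, reverse) lengths, for illustration only
pairs = [(250, 250), (250, 160), (190, 240), (230, 180), (120, 110)]
print(pairs_kept(pairs, 200, 175))  # prints 2
print(pairs_kept(pairs, 0, 0))      # prints 5
```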

Without truncation (--p-trunc-len-f 0 and --p-trunc-len-r 0), since the quality looked comparatively good:

$ qiime dada2 denoise-paired --i-demultiplexed-seqs jar-raw.qza --p-trunc-len-f 0 --p-trunc-len-r 0 --p-chimera-method pooled --p-n-threads 8 --o-representative-sequences denoise/jar-untr_rep-seqs.qza --o-table denoise/jar-untr_table.qza --o-denoising-stats denoise/jar-untr_stats.qza --verbose

Output: jar-untr_table.qzv
Total features: 699
Total frequency: 2,428,146

From a related topic, I understood that the differences in total features and frequency depend on the probability of obtaining error-free reads.
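The idea of "error-free reads" above is usually expressed through expected errors. A read's expected errors are the sum of its per-base error probabilities, 10^(-Q/10). DADA2 drops reads whose expected errors exceed a threshold, so truncating a low-quality 3' tail can turn a failing read into a passing one. The quality values below are invented. The threshold of 2.0 is an assumption that I believe matches q2-dada2's default `--p-max-ee`.

```python
def expected_errors(quals):
    """Expected number of errors in a read: sum of 10^(-Q/10) over its bases."""
    return sum(10 ** (-q / 10) for q in quals)


MAX_EE = 2.0  # assumed threshold; believed to match q2-dada2's default --p-max-ee

read = [38] * 175 + [12] * 75  # invented 250-nt read with a low-quality 3' tail
full = expected_errors(read)
truncated = expected_errors(read[:175])
print(f"full read: EE = {full:.2f}, passes: {full <= MAX_EE}")
print(f"truncated at 175: EE = {truncated:.3f}, passes: {truncated <= MAX_EE}")
```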

My questions are:

1. Is the workflow correct? Should I run Trimmomatic before denoising?
2. Does Trimmomatic's MINLEN parameter affect the total number of output reads? Do I need to increase it to ~175 (where the good-quality region of the reverse reads ends)?
3. Since truncation is clearly required, are the DADA2 truncation values I used appropriate?
4. Do I still need to truncate the reverse reads? (I could run it and check, but with the computational resources available here, denoising this dataset takes at least 72 hours.)
5. Do I need to trim the 5' end of the reverse reads?

Thank you for your time.

With regards,
Anwesh

thermokarst · June 4, 2018, 10:42pm

Hi @anwesh!

On running Trimmomatic before denoising: I know you mentioned computational resources are limited, but it may be worth processing a subset of your samples with and without the Trimmomatic step so that you can compare the results.

On MINLEN: I'm not sure, since we don't develop Trimmomatic, but I suspect that this parameter does affect the final number of reads.

On the truncation values: they look reasonable to me, but again, it may be worth testing a few sets of values on a subset of samples and comparing the results.

On truncating the reverse reads: it looks like you already truncated them in your first example above (--p-trunc-len-r 175).

On trimming the 5' end of the reverse reads: maybe! Again, I would recommend testing this on a subset of your samples to understand how these parameters affect your downstream results.
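If you do test 5' trimming, it helps to know what it changes. The sketch below follows DADA2's documented convention that a read filtered with both trimLeft and truncLen ends up truncLen minus trimLeft bases long; q2-dada2 exposes these as --p-trim-left-f/--p-trim-left-r and --p-trunc-len-f/--p-trunc-len-r. It also assumes a 253-nt V4 amplicon, and the 10-nt trim is an arbitrary example. Trimming the 5' end shortens each read and the merged sequence by the same amount, so the expected overlap stays the same.

```python
def processed_lengths(trim_left_f, trunc_f, trim_left_r, trunc_r, amplicon_len):
    """Read lengths, merged length and overlap after 5' trimming and 3' truncation.

    Follows DADA2's convention that a read filtered with both trimLeft and
    truncLen keeps positions [trimLeft, truncLen), i.e. truncLen - trimLeft bases.
    """
    len_f = trunc_f - trim_left_f
    len_r = trunc_r - trim_left_r
    # 5' trimming removes bases at both ends of the amplicon, so the merged
    # sequence gets shorter by the same amounts
    merged_len = amplicon_len - trim_left_f - trim_left_r
    overlap = len_f + len_r - merged_len
    return len_f, len_r, merged_len, overlap


# Assumed 253-nt V4 amplicon; trimming 10 nt from the reverse 5' end is arbitrary
print(processed_lengths(0, 200, 0, 175, 253))   # prints (200, 175, 253, 122)
print(processed_lengths(0, 200, 10, 175, 253))  # prints (200, 165, 243, 122)
```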

Hope that helps!

system · July 7, 2018, 11:41am

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.