Need a second opinion on my data - DADA2, truncation length for V4 region - 250 bases PE read data

Hi all,

I know this question has been asked before, but I just want to confirm that I'm doing it right for my dataset, so please share your suggestions.

Here are the details and workflow...
Dataset Properties: 7 stool samples, 16S rDNA - V4 region, 2*250bp, HiSeq platform, Demultiplexed

Quality checking with FastQC:

[FastQC plot: Sample1_R1]

[FastQC plot: Sample1_R2]

Used Trimmomatic to remove adaptor content and filter out low quality reads...

$ java -jar trimmomatic-0.38.jar PE -phred33 FJ1_R1.fastq.gz FJ1_R2.fastq.gz FJ1_for_paired.fq.gz FJ1_for_unpaired.fq.gz FJ1_rev_paired.fq.gz FJ1_rev_unpaired.fq.gz ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10 LEADING:20 TRAILING:20 SLIDINGWINDOW:4:20 MINLEN:100

The paired forward and reverse reads were imported into qiime2 (qiime2-2018.4)

$ qiime tools import --type 'SampleData[PairedEndSequencesWithQuality]' --input-path jar_manifest.csv --output-path raw_imports/jar-raw.qza --source-format PairedEndFastqManifestPhred33

Quality visualization of the imported raw reads with qiime tools...


Further, DADA2 was used for denoising with two different truncation values...

  1. trunc-len-f 200 and trunc-len-r 175
    $ qiime dada2 denoise-paired --i-demultiplexed-seqs jar-raw.qza --p-trunc-len-f 200 --p-trunc-len-r 175 --p-chimera-method pooled --p-n-threads 8 --o-representative-sequences denoise/jar_rep-seqs.qza --o-table denoise/jar_table.qza --o-denoising-stats denoise/jar_stats.qza --verbose
    

    Output is jar_table.qzv (366.8 KB)
    Total features: 1342
    Total frequency: 831,346


  2. No truncation, as the quality was comparatively better:
    $ qiime dada2 denoise-paired --i-demultiplexed-seqs jar-raw.qza --p-trunc-len-f 0 --p-trunc-len-r 0 --p-chimera-method pooled --p-n-threads 8 --o-representative-sequences denoise/jar-untr_rep-seqs.qza --o-table denoise/jar-untr_table.qza --o-denoising-stats denoise/jar-untr_stats.qza --verbose
    

    Output is jar-untr_table.qzv (339.3 KB)
    Total features: 699
    Total frequency: 2,428,146


I understood from this topic that the changes in total features and frequencies are based on the probability of finding the error-free reads...

Now my questions are...

  1. Is the workflow correct, in particular using Trimmomatic prior to denoising?
  2. Does the MINLEN parameter of Trimmomatic affect the total number of output reads? Do I need to increase it to ~175 (the end of good quality in the reverse reads)?
  3. DADA2: As it is clear that truncation is required, are the values used for truncation appropriate?
  4. Do I still need to truncate the reverse reads? (I can run it and check, but with the computational resources available here, it takes a minimum of 72 hrs to complete the denoising for this dataset.)
  5. Do I need to trim the 5' end of the reverse reads?

Thank you for your time...

With Regards
Anwesh

Hi @anwesh!

On question 1: I know you mentioned that computational resources are limited, but it may be worth processing a subset of your samples with and without the Trimmomatic step so that you can compare the results.
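
A rough sketch of how such a subset could be carved out of a manifest, assuming the manifest uses the usual `sample-id,absolute-filepath,direction` header with one forward and one reverse row per sample. The sample IDs other than FJ1, the `/data/` paths, and the `jar_manifest_subset.csv` name are made up for illustration; the script builds its own stand-in manifest so it can run anywhere.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for jar_manifest.csv so the sketch runs on its own;
# the sample IDs and paths below are invented for illustration.
workdir=$(mktemp -d)
manifest="$workdir/jar_manifest.csv"
{
  echo "sample-id,absolute-filepath,direction"
  for s in FJ1 FJ2 FJ3 FJ4; do
    echo "$s,/data/${s}_for_paired.fq.gz,forward"
    echo "$s,/data/${s}_rev_paired.fq.gz,reverse"
  done
} > "$manifest"

n_samples=2   # how many samples to keep in the test subset
subset="$workdir/jar_manifest_subset.csv"

# Keep the header, then every row whose sample-id is among the first
# n_samples distinct IDs encountered in the file.
awk -F, -v n="$n_samples" '
  NR == 1 { print; next }
  !($1 in seen) { if (count >= n) next; seen[$1] = 1; count++ }
  { print }
' "$manifest" > "$subset"

cat "$subset"
```

The subset file could then be passed to the same `qiime tools import` command via `--input-path` in place of the full manifest, once for the Trimmomatic-processed reads and once for the untouched reads.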

On question 2: I'm not sure, since we don't develop Trimmomatic, but I suspect that this parameter does affect the final number of reads.
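
One thing that can be said from the DADA2 side: when a nonzero truncation length is given, reads shorter than that length are discarded rather than kept. So reads that Trimmomatic shortened to somewhere between MINLEN and the truncation length would be lost at the denoising step. A minimal sketch for checking how many reads in a FASTQ file clear a given length, using a tiny invented file (the read names and lengths are placeholders, not your data):

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
fastq="$workdir/example_R2.fastq"

# make_read NAME LENGTH: emit one 4-line FASTQ record of the given length.
make_read() {
  local bases quals
  bases=$(printf '%*s' "$2" '' | tr ' ' 'A')
  quals=$(printf '%*s' "$2" '' | tr ' ' 'I')
  printf '@%s\n%s\n+\n%s\n' "$1" "$bases" "$quals"
}

# Invented example reads of mixed length, as quality trimming might leave them.
{
  make_read r1 250
  make_read r2 180
  make_read r3 174
  make_read r4 120
} > "$fastq"

# count_at_least FILE LEN: number of reads whose sequence length is >= LEN.
count_at_least() {
  awk -v min="$2" 'NR % 4 == 2 && length($0) >= min { n++ } END { print n + 0 }' "$1"
}

# Compare the Trimmomatic MINLEN used above (100) with the reverse
# truncation length from run 1 (175).
for len in 100 175; do
  echo "reads >= ${len} nt: $(count_at_least "$fastq" "$len")"
done
```

For a real gzipped file, the same `awk` filter could be fed from `gzip -dc FJ1_rev_paired.fq.gz` instead of a plain file.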

On question 3: The values look reasonable to me, but again, it may be worth testing a few sets of values on a subset of samples and comparing the results.
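
A quick way to sanity-check any pair of truncation values is to work out how much overlap they leave for merging. The sketch below assumes an amplicon of about 253 nt (typical for V4 with primers removed, but the primers used here were not stated) and a minimum required overlap of 20 nt (the exact threshold depends on the DADA2/q2-dada2 version), so both numbers are assumptions to adjust.

```shell
#!/usr/bin/env bash
set -euo pipefail

amplicon_len=253   # ASSUMPTION: approximate V4 amplicon length without primers
min_overlap=20     # ASSUMPTION: conservative minimum overlap needed for merging

# overlap TRUNC_F TRUNC_R: bases shared by the truncated forward and reverse reads.
overlap() {
  echo $(( $1 + $2 - amplicon_len ))
}

trunc_f=200   # --p-trunc-len-f from run 1
trunc_r=175   # --p-trunc-len-r from run 1

ov=$(overlap "$trunc_f" "$trunc_r")
echo "trunc-len-f=$trunc_f trunc-len-r=$trunc_r leaves $ov nt of overlap"

if [ "$ov" -ge "$min_overlap" ]; then
  echo "overlap is above the assumed minimum of $min_overlap nt"
else
  echo "overlap is below the assumed minimum of $min_overlap nt"
fi
```

Under these assumptions, 200 and 175 leave far more overlap than merging needs, so there would be room to truncate further if quality demanded it.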

On question 4: It looks like you already truncated your reverse reads in example 1 above.

On question 5: Maybe! Again, I would recommend testing this on a subset of your samples to understand how these parameters affect your downstream results.

Hope that helps!
