I know this question has been asked before, but I want to confirm that I'm doing it right for my dataset, so please share your suggestions...
Here are the details and workflow...
Dataset Properties: 7 stool samples, 16S rDNA - V4 region, 2*250bp, HiSeq platform, Demultiplexed
Quality checking with FASTQC:
Used Trimmomatic to remove adaptor content and filter out low quality reads...
$ java -jar trimmomatic-0.38.jar PE -phred33 FJ1_R1.fastq.gz FJ1_R2.fastq.gz FJ1_for_paired.fq.gz FJ1_for_unpaired.fq.gz FJ1_rev_paired.fq.gz FJ1_rev_unpaired.fq.gz ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10 LEADING:20 TRAILING:20 SLIDINGWINDOW:4:20 MINLEN:100
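Since there are 7 samples, the same command could be wrapped in a small loop. The following is only a sketch: it assumes every sample follows the `<ID>_R1.fastq.gz` / `<ID>_R2.fastq.gz` naming shown for FJ1 (the other sample IDs are not listed in this post), and the `trim_sample` function and `DRY_RUN` variable are illustrative names, not part of Trimmomatic. By default it only prints the command it would run.

```shell
#!/bin/sh
# Sketch: apply the Trimmomatic command above to each sample.
# Assumption: all samples are named like FJ1 (<ID>_R1.fastq.gz, <ID>_R2.fastq.gz).

# DRY_RUN defaults to 1 (print the command only).
# Export DRY_RUN as an empty string to actually run Trimmomatic.
DRY_RUN=${DRY_RUN-1}

trim_sample() {
    s=$1
    # ${DRY_RUN:+echo} expands to "echo" when DRY_RUN is non-empty, else to nothing.
    ${DRY_RUN:+echo} java -jar trimmomatic-0.38.jar PE -phred33 \
        "${s}_R1.fastq.gz" "${s}_R2.fastq.gz" \
        "${s}_for_paired.fq.gz" "${s}_for_unpaired.fq.gz" \
        "${s}_rev_paired.fq.gz" "${s}_rev_unpaired.fq.gz" \
        ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10 \
        LEADING:20 TRAILING:20 SLIDINGWINDOW:4:20 MINLEN:100
}

# Only FJ1 is named in the post; add the remaining sample IDs to this list.
for sample in FJ1; do
    trim_sample "$sample"
done
```

Checking the printed command against the original before switching off the dry run helps catch naming mismatches early.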
The paired forward and reverse reads were imported into qiime2 (qiime2-2018.4)
$ qiime tools import --type 'SampleData[PairedEndSequencesWithQuality]' --input-path jar_manifest.csv --output-path raw_imports/jar-raw.qza --source-format PairedEndFastqManifestPhred33
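For reference, the manifest passed to the import could be generated along these lines. This is a sketch under assumptions: the trimmed files all sit in one directory and use the `<ID>_for_paired.fq.gz` / `<ID>_rev_paired.fq.gz` names from the Trimmomatic step, and `make_manifest` is a hypothetical helper, not a QIIME 2 command. The header and the `forward` / `reverse` direction values follow the comma-separated manifest layout used by `PairedEndFastqManifestPhred33`.

```shell
#!/bin/sh
# Sketch: print a paired-end manifest for the trimmed reads.
# Assumption: file names follow the Trimmomatic outputs shown for FJ1.

make_manifest() {
    dir=$1      # absolute path to the directory holding the trimmed reads
    shift       # remaining arguments are sample IDs
    echo "sample-id,absolute-filepath,direction"
    for s in "$@"; do
        echo "${s},${dir}/${s}_for_paired.fq.gz,forward"
        echo "${s},${dir}/${s}_rev_paired.fq.gz,reverse"
    done
}

# Only FJ1 is named in the post; list the other sample IDs after it.
make_manifest "$PWD" FJ1
```

Redirecting the output (for example `make_manifest "$PWD" FJ1 > jar_manifest.csv`) would produce the file used in the import command.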
Quality visualization of the imported raw reads with qiime tools...
Further, DADA2 was used for denoising with two different truncation values...
- trunc-len-f 200 and trunc-len-r 175
$ qiime dada2 denoise-paired --i-demultiplexed-seqs jar-raw.qza --p-trunc-len-f 200 --p-trunc-len-r 175 --p-chimera-method pooled --p-n-threads 8 --o-representative-sequences denoise/jar_rep-seqs.qza --o-table denoise/jar_table.qza --o-denoising-stats denoise/jar_stats.qza --verbose
Output is jar_table.qzv (366.8 KB)
Total features: 1342
Total frequency: 831,346
- No truncation, as the quality was comparatively better:
$ qiime dada2 denoise-paired --i-demultiplexed-seqs jar-raw.qza --p-trunc-len-f 0 --p-trunc-len-r 0 --p-chimera-method pooled --p-n-threads 8 --o-representative-sequences denoise/jar-untr_rep-seqs.qza --o-table denoise/jar-untr_table.qza --o-denoising-stats denoise/jar-untr_stats.qza --verbose
Output is jar-untr_table.qzv (339.3 KB)
Total features: 699
Total frequency: 2,428,146
I understood from this topic that the changes in total features and frequencies depend on the probability of finding error-free reads...
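One way to sanity-check the two truncation settings is to estimate how much the forward and reverse reads would still overlap. The sketch below assumes a V4 amplicon of about 253 bp, which is typical for the common 515F/806R primer pair; the primers are not stated in this post, so `AMPLICON_LEN` is an assumption and should be replaced with the real value. The `overlap` function is an illustrative helper.

```shell
#!/bin/sh
# Sketch: estimated overlap between forward and reverse reads after truncation.
# AMPLICON_LEN is an ASSUMPTION (about 253 bp for 515F/806R V4 amplicons);
# the primers used for this dataset are not stated in the post.
AMPLICON_LEN=253

overlap() {
    # $1 = forward read length, $2 = reverse read length, $3 = amplicon length
    echo $(( $1 + $2 - $3 ))
}

echo "trunc-len-f 200, trunc-len-r 175: $(overlap 200 175 "$AMPLICON_LEN") nt"
# Without truncation, reads are at most 250 nt; Trimmomatic may have shortened many.
echo "no truncation (up to 250/250):    $(overlap 250 250 "$AMPLICON_LEN") nt"
```

Under that assumption both settings leave far more overlap than merging needs, so the choice of truncation length would mainly trade read quality against the number of reads retained, not the ability to merge.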
Now my questions are...
- Is the workflow correct, in particular using Trimmomatic prior to denoising?
- Does the MINLEN parameter of Trimmomatic affect the total output of the reads? Do I need to increase it to ~175 (end of good quality in reverse reads)?
- DADA2: As it is clear that truncation is required, are the values used for truncation appropriate?
- Do I still need to truncate the reverse reads? (I can run it and check, but with the computational resources available here, it takes a minimum of 72 hrs to complete the denoising for this dataset.)
- Do I need to trim the 5' end of the reverse reads?
Thank you for your time...