Hi everyone,
I am sorry for the long post, but I wanted to provide as much detail as possible.
I am new to QIIME 2 (version 2024.2, installed via miniconda3), and I have sequences from a 2 x 300 bp Illumina run (primers 338F_806R). Attached are the Q-score plots of my raw data: 16Ssequences-391521.qzv (315.4 KB)
After importing the raw data into QIIME 2, I first attempted to denoise the sequences using DADA2 with the following command:
qiime dada2 denoise-paired \
--i-demultiplexed-seqs 16Ssequences.qza \
--p-trim-left-f 26 \
--p-trim-left-r 26 \
--p-trunc-len-f 297 \
--p-trunc-len-r 207 \
--p-max-ee-f 3 \
--p-max-ee-r 3 \
--p-n-threads 6 \
--o-table 16Sseqs_table.qza \
--o-representative-sequences 16Sseqs_rep.qza \
--o-denoising-stats 16Sseqs_denoising_stats.qza
However, I encountered unsatisfactory results. Below is a summary of my attempts with different parameter sets, including using only forward reads (last row), which showed better results.
trim-left-f | trunc-len-f | trim-left-r | trunc-len-r | overlap | max-ee-f | max-ee-r | percentage of input passed filter | percentage of input merged | percentage of input non-chimeric |
---|---|---|---|---|---|---|---|---|---|
26 | 296 | 26 | 246 | 22 | 4 | 4 | 73-75% | 38-41% | 35-39% |
26 | 296 | 26 | 236 | 12 | 4 | 4 | 80-81% | 42-46% | 39-43% |
26 | 296 | 26 | 230 | 6 | 4 | 4 | 80-82% | 43-48% | 40-45% |
26 | 296 | 26 | 220 | -4 | 5 | 5 | 81-83% | 46-51% | 42-47% |
26 | 196 | / | / | / | 7 | / | 87-89% | / | 70-75% |
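
As a rough check, the overlap values in the table are consistent with trunc-len-f + trunc-len-r minus a total amplicon length of 520 bp. That 520 bp figure is inferred from the table values rather than measured, so treat it as an assumption. The function name `expected_overlap` is only illustrative. A minimal sketch of the arithmetic:

```shell
#!/usr/bin/env bash
# ASSUMPTION: total amplicon length in bp, inferred from the table above
# (not measured from the data).
AMPLICON_LEN=520

# Expected overlap between the truncated forward and reverse reads.
# Both truncation positions are counted from the 5' end of each read,
# so trim-left does not change the result.
expected_overlap() {
  local trunc_f=$1 trunc_r=$2
  echo $(( trunc_f + trunc_r - AMPLICON_LEN ))
}

# Reproduce the overlap column for the four paired-end rows.
for trunc_r in 246 236 230 220; do
  echo "trunc-len-f=296 trunc-len-r=${trunc_r} overlap=$(expected_overlap 296 "$trunc_r")"
done
```

Under this assumption the four rows give 22, 12, 6 and -4. DADA2's mergePairs requires a 12-nucleotide overlap by default, so the rows with an expected overlap of 6 and -4 would likely only merge reads from amplicons shorter than the assumed length.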
Here is one of the QZV files: 16S_296236_denoising_stats.qzv (1.2 MB)
Since DADA2 did not provide satisfactory results, I then attempted to denoise the raw data using Deblur.
Before running Deblur, I used the following steps:
qiime cutadapt trim-paired \
--i-demultiplexed-sequences 16Ssequences_391521.qza \
--p-cores 4 \
--p-front-f ACTCCTACGGGAGGCAGCAG \
--p-front-r GGACTACACGGGTATCTAAT \
--p-error-rate 0.2 \
--p-discard-untrimmed \
--o-trimmed-sequences 16S_trimmed.qza
qiime vsearch merge-pairs \
--i-demultiplexed-seqs 16S_trimmed.qza \
--o-merged-sequences 16S-deblur/merged-seqs.qza \
--o-unmerged-sequences 16S-deblur/unmerged-seqs.qza
qiime demux summarize \
--i-data 16S-deblur/merged-seqs.qza \
--o-visualization 16S-deblur/merged-seqs-summary.qzv
qiime quality-filter q-score \
--i-demux 16S-deblur/merged-seqs.qza \
--o-filtered-sequences 16S-deblur/16Sseqs-filtered.qza \
--o-filter-stats 16S-deblur/16Sseqs-filter-stats.qza
qiime deblur denoise-16S \
--i-demultiplexed-seqs 16S-deblur/16Sseqs-filtered.qza \
--p-trim-length 0 \
--o-representative-sequences 16S-deblur/rep-seqs-deblur.qza \
--o-table 16S-deblur/table-deblur.qza \
--p-sample-stats \
--o-stats 16S-deblur/deblur-stats.qza
I observed that only around 50,000 reads were merged, even though I have more than 250,000 raw reads. Here is the summary QZV file: merged-seqs-summary.qzv (305.2 KB). I also tried adjusting parameters such as --p-minovlen and --p-maxee, but the results did not improve as expected. Additionally, here is the QZV file of the feature table from Deblur: table-deblur.qzv (422.9 KB)
I would appreciate your assistance with the following questions:
- The percentage of merged and non-chimeric sequences is consistently below 50%. Could you advise on how to improve parameter settings for DADA2 and Deblur to obtain better results?
- How should I prioritize between the overlap of forward and reverse reads and maximizing the percentage of merged and non-chimeric sequences (e.g., the 4th row in the table)?
- Should I consider using only forward reads for this analysis? My concern is whether important microbial information would be lost if I proceed with that approach.
Thanks a lot for your help,
Siqin