Hey, I am seeking a little guidance in understanding, if I am doing anything wrong while trimming primers using q2 cutadapt trim-paired command.
A brief overview, I am using qiime2-2018.11. I sequenced 16S rRNA (V3-V4 region using Pro341F/Pro805R primer, degenerate primers). Sequences were already demultiplexed when provided by the sequencing facility. So, I imported them using Casava 1.8 paired-end demultiplexed fastq¶format. Then,
I followed 3 different routes just to get an idea how accurate I am proceeding with my data analysis
- In First route, I ran DADA2 without trimming primers, followed by qiime feature-classifier extract-reads command to remove primer sequences after DADA2 from my repseq prior to taxonomic analysis
qiime dada2 denoise-paired
--i-demultiplexed-seqs demux-paired-end_805.qza
--p-trim-left-f 0
--p-trim-left-r 0
--p-trunc-len-f 284
--p-trunc-len-r 235
--o-table featuretable_primer.qza
--o-representative-sequences repseqs_primer.qza
--o-denoising-stats denoisingstats_primer.qza
qiime feature-classifier extract-reads
--i-sequences repseqs_primer.qza
--p-f-primer CCTACGGGNBGCASCAG
--p-r-primer GACTACNVGGGTATCTAATCC
--o-reads rep-seqs_extracted.qza
qiime feature-table filter-features
--i-table featuretable_primer.qza
--m-metadata-file rep-seqs_extracted.qza
--o-filtered-table table_extracted_Filetered.qza
- In second route, I ran q2 cutadapt trim-paired command, to remove primer, followed by DADA2 step (using same parameters as mentioned above to have fair comparison between both routes)
qiime cutadapt trim-paired
--i-demultiplexed-sequences demux-paired-end_805.qza
--p-front-f CCTACGGGNBGCASCAG
--p-front-r GACTACNVGGGTATCTAATCC
--o-trimmed-sequences Cutadapt_regular.qza
--verbose
- In 3rd route, I ran DADA2 by specifying, --p-trim-left-f and --p-trim-left-r parameters to trim/remove primers
qiime dada2 denoise-paired
--i-demultiplexed-seqs demux-paired-end_805.qza
--p-trim-left-f 17
--p-trim-left-r 21
--p-trunc-len-f 284
--p-trunc-len-r 235
--o-table featuretable_primercut_DADA.qza
--o-representative-sequences repseqs_primer_cut_DADA.qza
--o-denoising-stats denoisingstats_primer_cut_DADA.qza
For taxanomic analysis, I used
qiime feature-classifier classify-sklearn
--i-classifier gg-13-8-99-nb-classifier.qza
--i-reads repseqs_primer.qza
--o-classification taxonomy_primer.qza
So, after comparing DADA2 output and taxa-bar-plots of these 3 routes, I got very strange results:
- Route 1, resulted in loss of almost 80-85% reads as Chimeras during DADA2 step (which is probably due to the presence of primers), However, filtration, denoising and merging looks fine.
Route 1 | Roue 2 | Route 3 | ||
---|---|---|---|---|
With primers | Primer cut |
(after DADA2 with extract-reads)|Primer cut (Cutadapt)|Primer cut (DADA2)|
|input|55444|55444|55444|55444|
|filtered|43924|Cannot have this information as extract-reads step was done after DADA2|35430|44046|
|denoised|43924|35430|44046|
|merged|29642|26575|32310|
|non-chimeric|7113|20001|22963|
- Route 2 and 3 resulted in similar taxonomic classification, but, completely opposite to route 1
Relative Abundance
Route 1 Roue 2 Route 3
Phylum With primers Primer cut
(after DADA2 with extract-reads) Primer cut (Cutadapt) Primer cut (DADA2)
Proteobacteria 23 % 23 % 54 % 51 %
Bacteroidetes 31 % 31 % 18 % 20 %
Firmicutes 31 % 31 % 19 % 18 %
So, by looking at this comparison, I think, it is useless to remove primers after DADA2!! (It’s better if we do it before), but on the other hand I am not sure, if removing primers with either Cutadapt or directly through DADA2 results in best results as I am not highly convinced with such huge differences in taxonomy with/without primers. Can you help me in understanding, why there are such huge differences!
I will be happy to share more results, if needed.