DADA2 with Cutadapt to have OTU table

I am new in microbiome analysis and QIIME 2. I found the published paired-end data (https://www.ncbi.nlm.nih.gov/sra/?term=PRJNA601757) and would like to generate the OTU table by QIIME 2 for the downstream analysis.
My first step is to read in the data by using qiime tools import:

qiime tools import \
--type 'SampleData[PairedEndSequencesWithQuality]' \
--input-path manifest_file.tsv \
--output-path paired-end-demux.qza \
--input-format PairedEndFastqManifestPhred33V2

By looking at the “Moving Pictures” tutorial, I would like to use DADA2 to handle the data. In DADA2 own website, it mentions non-biological nucleotides need to be removed, e.g. primers, adapters, linkers, etc. And by using R to check the sequence, I see there are primers in the data. Therefore, I decide to use qiime cutadapt trim-paired before qiime dada2 denoise-paired.

qiime cutadapt trim-paired \
--i-demultiplexed-sequences paired-end-demux.qza \
--p-front-f CCTACGGG \
--p-front-r GACTAC \
--o-trimmed-sequences trimmed-seqs.qza \
--verbose
qiime demux summarize \
--i-data trimmed-seqs.qza \
--o-visualization demux.qzv
qiime dada2 denoise-paired \
--i-demultiplexed-seqs trimmed-seqs.qza \
--p-trim-left-f 0 \
--p-trunc-len-f 251 \
--p-trim-left-r 0 \
--p-trunc-len-r 244 \
--o-table table.qza \
--o-representative-sequences rep-seqs.qza \
--o-denoising-stats stats.qza \
--verbose
qiime metadata tabulate \
--m-input-file stats.qza \
--o-visualization stats.qzv

251 and 244 are from demux.qzv Quality Plot to make sure the scores from Middle of Box are all above 20. However, in stats.qzv, the numbers from “percentage of input passed filter” are all vey low, and there are around 80 OTUs in the final OTU table.

If I use paired-end-demux.qza in DADA2 directly without qiime cutadapt trim-paired, then the result is very different.

qiime dada2 denoise-paired \
--i-demultiplexed-seqs paired-end-demux.qza \
--p-trim-left-f 0 \
--p-trunc-len-f 251 \
--p-trim-left-r 0 \
--p-trunc-len-r 244 \
--o-table table.qza \
--o-representative-sequences rep-seqs.qza \
--o-denoising-stats stats.qza \
--verbose
qiime metadata tabulate \
--m-input-file stats.qza \
--o-visualization stats.qzv

I think I should put “–p-trim-left-f 8” and “–p-trim-left-r 6” to remove the primers. But without removing primers, the “percentage of input passed filter” improve a lot in stats.qzv. And there are around 650 OTUs in the final OTU table.

Does anyone know why the results are so different with and without cutadapt? And should I avoid using cutadapt before dada2? But it is mentioned that primers need to be removed before dada2 procedure in DADA2 website.

In addition, the following is my steps to generate and output OTU table by using Greengenes reference data version 13.8 with 97% similarity threshold. The following link is where I got Greengenes reference: ftp://greengenes.microbio.me/greengenes_release/gg_13_8_otus/
Can you also help me to check if my steps are correct?

### match reference library
qiime vsearch cluster-features-closed-reference \
--i-table table.qza \
--i-sequences rep-seqs.qza \
--i-reference-sequences 97_otus-GG.qza \
--p-perc-identity 0.97 \
--o-clustered-table tbl-gg-97_nasal.qza \
--o-clustered-sequences rep-seqs-gg-97_nasal.qza \
--o-unmatched-sequences unmatched-gg-97_nasal.qza
### export to biom file
qiime tools export \
--input-path tbl-gg-97_nasal.qza \
--output-path feature-table
### convert to txt file
biom convert -i ~/nasal_all/chk_primer/feature-table/feature-table.biom -o feature-table.txt --to-tsv

Thank you very much for any insight and apologies for the long question.

Hello, @OneWay
Welcome to the forum!
I think that the reason for so different outputs are truncating parameters you are providing for Dada2:

--p-trunc-len-f 251 \
--p-trunc-len-r 244 \

Dada2 will trash any sequence that is shorter than these numbers. So, with barcodes still attached, more sequences are passing the filters, meanwhile after cutadapt sequences are shorter and removed at this step.
So I can suggest you to decrease these parameters to keep more sequences. It is better to know approximate amplicon sizes in order to choose truncating parameters to keep as many sequences you can from one side and do not decrease too much overlapping region from other.

I can not see if there is something wrong. If you intended to work with OTUs instead of ASVs, then everything looks fine to me.

2 Likes

Hi, @timanix
Thanks for the prompt response!
I thought --p-trunc-len-f INTEGER is similar to --p-trim-left-f INTEGER, but one is working on 3’ and the other is working on 5’. I check qiime dada2 denoise-paired procedure again and see --p-trunc-len-f INTEGER will trash any sequence that is shorter than these numbers.

Position at which forward read sequences should be truncated due to decrease in quality. This truncates the 3’ end of the of the input sequences, which will be the bases that were sequenced in the last cycles. Reads that are shorter than this value will be discarded.

I have two follow-up questions:

  1. Does that mean we can only trim on 5’ but cannot trim/remove bp on 3’?
  2. The followings are the quality plots for forward and reverse reads. In reverse reads, quality score for Middle of Box starts to be below 20 from position 244. What should I put in my qiime dada2 denoise-paired to handle it?

You can cut at both sides by using a combination of trim (at the beginning, for example, 4) and trunc (at the end, for example, 210).
Regarding your quality plots, it looks like primers are still attached. You can remove them with cutadapt, and plot it again to gain a better idea of of which parameters to use, or use trim parameter to cut the primers directly in Dada2 by length of the primers.
I would suggest that you will take an average length of your amplicons, and calculate based on it the trunc parameters to keep overlapping region (at least 12-20 bp).If you are applying both trim and trunc, then the length of the read before merging will be trunc - trim.

Hi @timanix,
Thank you again! I think I will probably just use dada2 directly, rather than use both cutadapt and dada2.