Hi,
I am analyzing fungal ITS2 (ITS86F/ITS4 primers, PE300) sequences and I have some questions regarding the truncation of the reads as well as dealing with amplicons of different lengths.
On several sites I have seen the advice to set the truncation length in the DADA2 denoising step to 0 (https://github.com/Joseph7e/ITS_metabarcoding_analyses and the DADA2 Pipeline Tutorial (1.16)).
This makes sense, as ITS length varies widely between species: using one truncation length for all reads will bias the results, because all the shorter amplicons are lost. At the same time, when I set the truncation length to 0, far fewer sequences pass the merging step. I also tried this on data that had been processed with cutadapt.
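To make the bias concrete, here is a minimal Python sketch of the truncation rule as documented for DADA2: reads shorter than the truncation length are discarded, and the rest are cut to exactly that length. The read lengths are made-up values for illustration, not taken from this dataset, and `truncate_reads` is a hypothetical helper, not part of DADA2 or QIIME 2. The rule mainly matters for reads that have already been shortened, for example by cutadapt.

```python
def truncate_reads(read_lengths, trunc_len):
    """Mimic the DADA2 truncation rule on a list of read lengths.

    trunc_len == 0 means no truncation: every read is kept at full length.
    Otherwise reads shorter than trunc_len are dropped and the remaining
    reads are cut to exactly trunc_len.
    """
    if trunc_len == 0:
        return list(read_lengths)
    return [trunc_len for n in read_lengths if n >= trunc_len]


# Hypothetical forward-read lengths after primer/read-through trimming.
example_lengths = [180, 240, 265, 285, 300]

kept_270 = truncate_reads(example_lengths, 270)
kept_0 = truncate_reads(example_lengths, 0)

print(f"trunc-len 270: kept {len(kept_270)} of {len(example_lengths)} reads")
print(f"trunc-len 0:   kept {len(kept_0)} of {len(example_lengths)} reads")
```

In this toy example every read shorter than 270 nt disappears before denoising, which is the length bias described above.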
I found some explanation in the thread "DADA2, truncation lengths and features number", but I still don't know how to process my data so that I keep all good-quality reads. Maybe ITSxpress will help (I posted another topic about my issues with that tool).
demux-paired-SB72.qzv (270.0 KB)
trimmed_sequencesSB72.qzv (274.4 KB)
Original data:
qiime dada2 denoise-paired \
  --verbose \
  --i-demultiplexed-seqs demux-paired-end_SB72.qza \
  --p-trim-left-f 0 \
  --p-trim-left-r 0 \
  --p-trunc-len-f 270 \
  --p-trunc-len-r 220 \
  --p-n-threads 40 \
  --o-table table_0SB72.qza \
  --o-representative-sequences rep-seqs_SB72.qza \
  --o-denoising-stats denoising-stats_truncSB72.qzv
Denoising stats:
sample-id  input  filtered  denoised  merged  non-chimeric
SB72       18647  14620     14620     14513   14466
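For reference, the truncation lengths in the command above also cap the amplicon length that can still merge. The sketch below is simple arithmetic; it assumes the minimum overlap of 12 nt that DADA2 documents as the default for `mergePairs`, which may differ if the plugin version in use exposes and changes that setting. `max_mergeable_length` is a hypothetical helper name.

```python
def max_mergeable_length(trunc_len_f, trunc_len_r, min_overlap=12):
    """Longest amplicon whose truncated reads still overlap by min_overlap bases.

    min_overlap=12 is an assumption based on the documented DADA2 default.
    """
    return trunc_len_f + trunc_len_r - min_overlap


longest = max_mergeable_length(270, 220)
print(f"Longest amplicon that can still merge: {longest} bp")
```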
qiime dada2 denoise-paired \
  --verbose \
  --i-demultiplexed-seqs demux-paired-end_SB72.qza \
  --p-trim-left-f 0 \
  --p-trim-left-r 0 \
  --p-trunc-len-f 0 \
  --p-trunc-len-r 0 \
  --p-n-threads 40 \
  --o-table table_0SB72.qza \
  --o-representative-sequences rep-seqs_0SB72.qza \
  --o-denoising-stats denoising-stats_0SB72.qza
Denoising stats:
sample-id  input  filtered  denoised  merged  non-chimeric
SB72       18647  11091     11091     298     298
Cutadapt-trimmed data (I removed the part of each read that, for short amplicons, runs into the reverse primer):
qiime dada2 denoise-paired \
  --verbose \
  --i-demultiplexed-seqs trimmed_sequences.qza \
  --p-trim-left-f 0 \
  --p-trim-left-r 0 \
  --p-trunc-len-f 270 \
  --p-trunc-len-r 220 \
  --p-n-threads 40 \
  --o-table table_trimmedSB72.qza \
  --o-representative-sequences rep-seqs_trimmedSB72.qza \
  --o-denoising-stats denoising-stats_trimmedSB72.qza
Denoising stats:
sample-id  input  filtered  denoised  merged  non-chimeric
SB72       18647  13822     13822     13770   13770
qiime dada2 denoise-paired \
  --verbose \
  --i-demultiplexed-seqs trimmed_sequences.qza \
  --p-trim-left-f 0 \
  --p-trim-left-r 0 \
  --p-trunc-len-f 0 \
  --p-trunc-len-r 0 \
  --p-n-threads 40 \
  --o-table table_0trimmedSB72.qza \
  --o-representative-sequences rep-seqs_0trimmedSB72.qza \
  --o-denoising-stats denoising-stats_0trimmedSB72.qza
Denoising stats:
sample-id  input  filtered  denoised  merged  non-chimeric
SB72       18647  11488     11488     703     703
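To compare the four runs side by side, here is a small Python sketch that turns the SB72 counts from the denoising stats above into per-step retention percentages. The run labels and the `retention` helper are made up for this illustration; only the counts come from the tables.

```python
# (input, filtered, denoised, merged, non-chimeric) for sample SB72,
# copied from the four denoising-stats tables above.
runs = {
    "raw, trunc 270/220": (18647, 14620, 14620, 14513, 14466),
    "raw, trunc 0/0": (18647, 11091, 11091, 298, 298),
    "cutadapt, trunc 270/220": (18647, 13822, 13822, 13770, 13770),
    "cutadapt, trunc 0/0": (18647, 11488, 11488, 703, 703),
}


def retention(counts):
    """Return percentages of reads surviving filtering, merging, and overall."""
    n_input, n_filtered, n_denoised, n_merged, n_nonchim = counts
    return {
        "filtered_of_input": 100 * n_filtered / n_input,
        "merged_of_denoised": 100 * n_merged / n_denoised,
        "nonchim_of_input": 100 * n_nonchim / n_input,
    }


summary = {name: retention(counts) for name, counts in runs.items()}

for name, pct in summary.items():
    print(
        f"{name:<24} filtered {pct['filtered_of_input']:5.1f}%  "
        f"merged {pct['merged_of_denoised']:5.1f}%  "
        f"final {pct['nonchim_of_input']:5.1f}%"
    )
```

Laid out this way, the loss with truncation set to 0 shows up almost entirely at the merging step rather than at filtering.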
So in the end I am stuck: the recommendation not to truncate ITS reads makes sense, but when I follow it I lose most of my reads at the merging step.
Thank you in advance!!!