DADA2 filters out over 50% of my reads | Paired-end reads

Hi everyone,
I followed the Cancer Microbiome Intervention Tutorial (Filtering feature tables - QIIME 2 Cancer Microbiome Intervention Tutorial), but I am having issues in the denoising step using DADA2. I have 2x300 Illumina paired-end reads. I was given just the raw reads, and I imported them using the Casava format. This is what the file names look like:
Sample_Barcode_L001_R1_001.fastq.gz and Sample_Barcode_L001_R2_001.fastq.gz
I used the following commands to import the data:
qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-format CasavaOneEightSingleLanePerSampleDirFmt \
  --input-path data_to_import \
  --output-path demultiplexed-sequences.qza
Based on my quality plots, I truncated both the forward and reverse reads at position 270; however, fewer than 50% of my reads pass the filter. If you have any advice, please let me know. Thank you!
image
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux-paired-end.qza \
  --p-trunc-len-f 270 \
  --p-trunc-len-r 270 \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats denoising-stats.qza

Hello @Mar,

Welcome to the QIIME 2 forum!

From looking at your sequence quality plots, I think a reverse truncation position (i.e. --p-trunc-len-r) of 270 is probably too far into the read. At that position the bases have very poor quality, so many reads are likely to fail the quality filter.

You could try a length closer to the 200-220 range (e.g. --p-trunc-len-r 220) and see if this helps.

Please keep in mind that the truncated forward and reverse reads still need to overlap enough to merge. How far you can truncate will depend on the length of your target region.
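
As a rough check on the overlap, the arithmetic can be sketched as below. This is only an illustration: the amplicon length of 460 is a hypothetical value (the thread does not state the target region), and `expected_overlap` is a made-up helper name, not part of QIIME 2 or DADA2. The threshold of 12 nt is DADA2's default minimum overlap for merging.

```shell
#!/usr/bin/env bash
# Rough estimate of how many bases the truncated reads share.
# expected_overlap is an illustrative helper, not a QIIME 2 or DADA2 command.
# Usage: expected_overlap TRUNC_F TRUNC_R AMPLICON_LEN
expected_overlap() {
  local trunc_f=$1 trunc_r=$2 amplicon_len=$3
  echo $(( trunc_f + trunc_r - amplicon_len ))
}

# Assumption: 460 is a hypothetical amplicon length; replace it with your own.
overlap=$(expected_overlap 270 220 460)
echo "expected overlap: ${overlap} nt"

# DADA2 needs at least 12 nt of overlap by default; leave some margin
# because amplicon length varies between taxa.
if (( overlap < 12 )); then
  echo "warning: reads are unlikely to merge"
fi
```

If the estimate drops close to or below 12, truncating less aggressively on one of the reads is usually safer than losing the merge.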

Let us know if adjusting the reverse truncating position helps with read loss.

Thanks!

Thank you! I changed the position and now more of my reads pass the filter. Still, the highest percentage I get is 70%, but I will try other values in that range (200-220).
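
One way to organize that comparison is sketched below as a dry run: it only prints one denoise-paired command per reverse truncation length, so nothing is executed until the lines are reviewed. The values 200, 210 and 220, the `print_sweep` name, and the `-r<length>` suffixes on the output files are assumptions made for illustration; the input file name is the one from the command above.

```shell
#!/usr/bin/env bash
# Dry run: print one denoise-paired command per reverse truncation length.
# print_sweep and the -r<length> output suffixes are illustrative choices.
print_sweep() {
  local r
  for r in 200 210 220; do
    echo "qiime dada2 denoise-paired" \
      "--i-demultiplexed-seqs demux-paired-end.qza" \
      "--p-trunc-len-f 270 --p-trunc-len-r ${r}" \
      "--o-table table-r${r}.qza" \
      "--o-representative-sequences rep-seqs-r${r}.qza" \
      "--o-denoising-stats denoising-stats-r${r}.qza"
  done
}

print_sweep
```

Writing each run to its own denoising-stats file keeps the results side by side, so the percentage of reads retained can be compared across truncation lengths.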