Issue with demultiplexing paired-end reads

Hello
@Afaf and I are having an issue with demultiplexing (or possibly denoising) paired-end reads as well. We used cutadapt demux-paired in qiime2-2022.8, and we are looking for some guidance.

Briefly, we received multiplexed forward and reverse reads and metadata files (.txt) with barcodes for ITS and 16S amplicons. We started with the 16S, which was amplified with EMP primers 515f and 806r. Because we do not have a fastq file for the barcodes, we imported the data as multiplexed paired-end data with barcodes in the sequences and demultiplexed via cutadapt. The adapters were already trimmed off by the sequencing facility, so no trimming was performed.

Import code:

qiime tools import \
  --type MultiplexedPairedEndBarcodeInSequence \
  --input-path muxed-pe-barcode-in-seq \
  --output-path multiplexed-seqs.qza

Demux code:

qiime cutadapt demux-paired \
  --i-seqs multiplexed-seqs.qza \
  --m-forward-barcodes-file barcodes.txt \
  --m-forward-barcodes-column BarcodeSequence \
  --o-per-sample-sequences 16S_per_sample_sequences.qza \
  --o-untrimmed-sequences untrimmed.qza

https://view.qiime2.org/visualization/?type=html&src=09c6d6cd-b1d7-4553-8da4-d6c4e568d125

Although there is some variability in the number of reads from sample to sample (~15,000 down to ~600 across 96 samples), we proceeded with denoising in DADA2 with limited truncation.

Denoising code:

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs 16S_per_sample_sequences.qza \
  --p-n-threads 28 \
  --p-trim-left-f 0 \
  --p-trim-left-r 0 \
  --p-trunc-len-f 248 \
  --p-trunc-len-r 220 \
  --o-representative-sequences 16S_rep-seqs-dada2_f.qza \
  --o-table 16S_table-dada2_f.qza \
  --o-denoising-stats 16S_stats-dada2_f.qza \
  --verbose

But it looks like lots of reads were lost at the filtering step.
Example:

sample-id input filtered denoised merged non-chimeric
#q2:types numeric numeric numeric numeric numeric
1.0 26123 298 298 119 119
1.1 26844 17 17 0 0
1.2 9459 0 0 0 0
1.3 7119 0 0 0 0
10.0 49600 37 37 0 0
10.1 2664 0 0 0 0
10.2 19477 132 132 37 37
10.3 14230 0 0 0 0
11.0 20846 2742 2742 2007 2007
11.1 23153 24 24 0 0
11.2 5627 7 7 0 0
11.3 2755 0 0 0 0

Our concerns:

  1. It does not seem that the reads are all that bad, so we're not sure why the filtering step is discarding so many (all of the reads, in some cases). We are re-running the denoising step with lower --p-max-ee values to test this, and again with no truncation in either direction.

  2. This makes us wonder whether something is wrong with our demultiplexing or importing steps. The primers used were from the Earth Microbiome Project (suggesting that we should have imported using --type EMPPairedEndSequences), but no barcodes.fastq file was provided (which is why we went with --type MultiplexedPairedEndBarcodeInSequence).

  3. There seems to be high and unexpected read-count variability across the samples. This makes me think that maybe something in the demux or import step was incorrect and reads are being assigned to the wrong samples. Could this happen?

So to sum it up: is this a read quality issue or is this an import issue?

Thanks for bearing with me on this super long query, apologies if this is covered elsewhere in the forum, and thanks in advance for any insight!

Laura

Hello and welcome back to the forum!

Did you try to search for the primers in your reads? Even if the adapters were removed by the sequencing center, the biological primers may still be present in the sequences.
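One quick way to check is to count how many reads begin with the primer. A minimal Python sketch, assuming the standard EMP 515F sequence (GTGYCAGCMGCCGCGGTAA) and using made-up example reads rather than your actual fastq data:

```python
import re

# IUPAC degenerate-base codes mapped to regex character classes
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "M": "[AC]", "K": "[GT]",
         "S": "[CG]", "W": "[AT]", "N": "[ACGT]",
         "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]"}

def primer_pattern(primer):
    """Build a regex that matches a degenerate primer sequence."""
    return re.compile("".join(IUPAC[base] for base in primer))

# EMP 515F primer; 806R on the reverse reads would be checked the same way
p515f = primer_pattern("GTGYCAGCMGCCGCGGTAA")

reads = [  # toy examples; real sequences would come from the fastq file
    "GTGTCAGCAGCCGCGGTAATACG",   # still carries the primer
    "TACGGAGGGTGCGAGCGTTAATC",   # primer already removed
]
hits = sum(bool(p515f.match(r)) for r in reads)
print(f"{hits}/{len(reads)} reads start with 515F")  # prints "1/2 reads start with 515F"
```

If a large fraction of reads still start with the primer, it should be removed (e.g. with q2-cutadapt) before denoising.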

You are targeting a relatively short region, so you can truncate the reads at lower positions to remove the low-quality parts at the ends.

I guess that you are losing most of the reads due to a too-high truncation parameter for the forward reads.

DADA2 will discard any read shorter than 248 bp. As I already wrote, the targeted region is relatively short, so you can truncate more (by setting a lower value). More reads will then pass the filter, and truncating the bad regions at the ends will increase the merging rate.
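The arithmetic behind this can be sketched as follows. The 515F/806R amplicon is roughly 250-253 bp, and DADA2 needs about 12 bp of overlap to merge a read pair; the amplicon_len=253 below is an assumption for illustration, not a measured value for this run:

```python
def merge_overlap(trunc_f, trunc_r, amplicon_len=253):
    """Overlap (bp) left for DADA2 read merging after truncation.

    DADA2 needs roughly 12 bp of overlap to merge a read pair.
    """
    return trunc_f + trunc_r - amplicon_len

# Original settings: plenty of overlap, but every read shorter than
# 248 bp (forward) or 220 bp (reverse) is discarded at filtering.
print(merge_overlap(248, 220))  # 215

# More aggressive truncation keeps the short/low-quality reads in play
# and still leaves far more than 12 bp of overlap for merging.
print(merge_overlap(200, 150))  # 97
```

So there is a lot of room to lower the truncation positions before merging becomes a problem.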

Try setting lower truncation positions. I am fairly sure that this will fix the issue and that there is nothing wrong with the imported data. I am not sure about the initial variability between samples, but most probably that will be fixed as well.

The more details one provides, the easier it is for us to answer the question.

Best,

Hi @timanix
Sorry for the slow response! We heard back from the sequencing facility, who recommended trimming the multiplexed reads to 150 bp before proceeding with demultiplexing and denoising. I used Trimmomatic for that, trimming 100 bp from the trailing end, which resulted in 0 paired sequences.

Using the untrimmed demultiplexed files, I have also tried setting --p-trunc-len to 151 for both the forward and reverse reads in DADA2, but that led to a large number of reads being lost at the filtering step (although with some improvement over the original run). Maybe I am misunderstanding what that parameter is actually doing?
Any advice would be appreciated!

Trimmomatic code:

trimmomatic PE \
  forward.fastq.gz reverse.fastq.gz \
  trimmed151_forward_paired.fq.gz trimmed151_forward_unpaired.fq.gz \
  trimmed151_reverse_paired.fq.gz trimmed151_reverse_unpaired.fq.gz \
  LEADING:0 TRAILING:100

Denoising code:

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs per_sample_sequences.qza \
  --p-n-threads 28 \
  --p-trunc-len-f 151 \
  --p-trunc-len-r 151 \
  --o-representative-sequences 16S_rep-seqs-dada2_151.qza \
  --o-table 16S_table-dada2_151.qza \
  --o-denoising-stats 16S_stats-dada2_151.qza

Example of dada2 stats

sample-id	input	filtered	denoised	merged	non-chimeric
#q2:types	numeric	numeric	numeric	numeric	numeric
1.0	26110	381	381	151	151
1.1	26839	35	35	0	0
1.2	9450	31	31	27	27
1.3	7117	12	12	0	0

Thank you
Laura

Hi Laura!

Performing quality trimming on the reads before DADA2 is not recommended.

However, you can use q2-cutadapt to remove the primers from the sequences.

I suspect that this time the issue is in the reverse reads (as shown by the stats in your post).
I can suggest two things to try:
Option 1. Truncate the forward reads at position 240 (reconsider this if you remove the primers with cutadapt) and the reverse reads at 100-110, to check whether this improves the filtering step while the reads remain long enough to merge.
Option 2. Use only the forward reads (no need to re-import; just use dada2 denoise-single for denoising). Since the targeted region is not very long, you will not lose much in the taxonomy annotations with forward reads only.

Hope it will work,
Timur

Hi @timanix !
Thanks for the advice - we are running option 1 now, and I will let you know how it goes!
Laura