The usual recommendation is to try relaxing the truncation parameters; however, I have tried that and it is still not working. Here is some info about what I've been doing:
My samples are imported and the .qzv is here: samples.qzv (311.3 KB)
You will see that the first 12-13 nts are of bad quality, so I was trimming them.
Then I decided to keep the whole read length and truncate at the last nts, 150-151.
Have you removed the primer sequences from the reads in samples.qza? If you haven't, this could explain the poor quality in that region, and you will want to remove them anyway.
Secondly, what amplicon are you using? In particular, what is its length?
Trim your primers using the cutadapt plugin (I prefer to discard untrimmed sequences), and then remake the visualisation to select new trim parameters.
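For example, the cutadapt step and the new visualisation could look roughly like this (a sketch only; FWD_PRIMER and REV_PRIMER are placeholders for your actual primer sequences, and the output file names are just examples):

```
# Sketch: remove primers from demultiplexed paired-end reads and discard
# read pairs in which the primer was not found.
qiime cutadapt trim-paired \
  --i-demultiplexed-sequences samples.qza \
  --p-front-f FWD_PRIMER \
  --p-front-r REV_PRIMER \
  --p-discard-untrimmed \
  --o-trimmed-sequences trimmed.qza

# Rebuild the quality visualisation from the primer-trimmed reads.
qiime demux summarize \
  --i-data trimmed.qza \
  --o-visualization trimmed.qzv
```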
Selecting the truncation length is a balance: too short and you will lose amplicon reads that are too long; too long and you may include poor-quality bases that will cause the reads to be discarded anyway. Notice how you have a massive drop in quality at around 130 bp? You will definitely want to truncate before this drop, as many of the reads will otherwise be discarded due to poor quality.
If you simply do not have long enough high-quality sequences, you could use just the reverse reads, as their quality is better.
At first I was using cutadapt to trim the primers. The thing is that they could not tell me exactly which primers were used. The genomics platform told me that they used the Nextera kit and sent me the documentation, but I don't understand exactly what I should be removing:
I'm not clear on which sequences I should be trimming... I tried the first parts of the Index 1 (i7) and Index 2 (i5) adapters, but it did not work, and the samples imported into QIIME 2 did not have any reads left in them.
That is why I decided to import the sequences without a prior cutadapt step and just trim the reads based on the quality score graph.
The total length of the amplicon is 150.
I've tried other options and what I see is the following:
I need to trim the first 13 nts, which are probably the primers, so I always trim.
If I truncate the reads at a length < 150, no error is shown (in this case I should be truncating before 130, as you said).
If I decide to truncate at 150, so the whole amplicon is maintained, I get the following error:
```
Error in isBimeraDenovoTable(unqs[[i]], ..., verbose = verbose) :
  Input must be a valid sequence table.
Calls: removeBimeraDenovo -> isBimeraDenovoTable
```
Why is this?
It seems that the best option would be to trim at 13 and truncate at 130... However, the % of reads passing the filter is really low.
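To make sure we are talking about the same parameters, this is roughly how I understand those values map onto denoise-paired (a sketch; I am assuming paired-end denoising with the same trim/trunc values in both directions, and the file names are placeholders):

```
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs samples.qza \
  --p-trim-left-f 13 --p-trim-left-r 13 \
  --p-trunc-len-f 130 --p-trunc-len-r 130 \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats stats.qza
```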
The quality drop at the beginning of your sequences isn't significant enough to warrant trimming.
The thing is that they could not tell me exactly which primers were used.
I'm guessing that "they" refers to the sequencing center that you used. If another party performed the 16S amplification and then sent the amplicons to the sequencing center, then the latter wouldn't know which 16S primers you used. Which 16S primers were used for your amplicons is what needs to be figured out and then used for trimming. Trimming with these primers will allow you to ignore the Illumina adapters. (I'm assuming that these reads are indeed 16S amplicons.)
Yes, exactly! I'm trying to contact the sequencing center to get the exact primers that were used... However, I have a question: what happens if I use cutadapt and provide the wrong sequences as primers? I have tried running cutadapt with 2 "random" primers (which I thought could be the primers that were actually used), and my samples ended up with no reads. Is that normal?
Yes, I understood. The thing is that our genomics platform is the same one that performs both the 16S amplification and the sequencing.
They finally gave me the correct primer sequences.
However, I still have a huge loss of reads per sample (there are almost no non-chimeric reads). I have also checked the samples with FastQC and the % of duplicate reads is enormous...
From looking at your DADA2 stats I can see that the problem isn't the chimera filter but the merging step. Almost none of your reads are merging. Do you know how long your amplicon is or which hypervariable region you're targeting? I also see that you chose truncation lengths of 112 forward and 128 reverse. According to the quality plot you posted earlier, these are far too aggressive and are most likely what's keeping your reads from merging.
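As a rough sanity check (assuming DADA2's default minimum overlap of 12 nt): for paired reads to merge you need roughly trunc-len-f + trunc-len-r ≥ amplicon length + 12. With 112 + 128 = 240 nt retained, any amplicon longer than about 240 − 12 = 228 nt cannot merge at all.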
The gene-specific sequences used in this protocol target the 16S V3 and V4 region. They are selected from the Klindworth et al. publication (Klindworth A, Pruesse E, Schweer T, Peplies J, Quast C, et al. (2013) Evaluation of general 16S ribosomal RNA gene PCR primers for classical and next-generation sequencing-based diversity studies. Nucleic Acids Res 41(1).) as the most promising bacterial primer pair. Illumina adapter overhang nucleotide sequences are added to the gene-specific sequences. The full-length primer sequences, using standard IUPAC nucleotide nomenclature, to follow the protocol targeting this region are:
16S Amplicon PCR Forward Primer = 5' TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCTACGGGNGGCWGCAG
16S Amplicon PCR Reverse Primer = 5' GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGACTACHVGGGTATCTAATCC
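In case it helps, I believe the gene-specific parts are the portions after the Illumina overhangs in the sequences above, i.e. CCTACGGGNGGCWGCAG (forward) and GACTACHVGGGTATCTAATCC (reverse), so the cutadapt step would look roughly like this (a sketch; please correct me if I have split these incorrectly, and the file names are placeholders):

```
# Sketch: trim only the gene-specific primer portions from the 5' end of the reads.
qiime cutadapt trim-paired \
  --i-demultiplexed-sequences samples.qza \
  --p-front-f CCTACGGGNGGCWGCAG \
  --p-front-r GACTACHVGGGTATCTAATCC \
  --p-discard-untrimmed \
  --o-trimmed-sequences primer-trimmed.qza
```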
This is actually a first look at this first sequencing run that they have performed, so we can decide whether we should use a longer amplicon or whether the followed protocol is correct.
Regarding the truncation lengths, which ones would you recommend I use?
I am not sure if this could have an effect, but I am comparing, in the same run, samples that were amplified with different polymerases: AmpliTaq Gold was used for 2 samples and Kappa for the other 2.
The analysis of these results and the comparison was aimed at deciding which polymerase would be better.
If your amplicon is 300 bp on average, then two 150 bp reads are not going to merge, because at least 12 bp of overlap (by default) are needed between the two reads.
To have any shot at merging, you'll have to do basically no truncation. I would try a run with this approach and see what happens.
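Something along these lines (a sketch; 0 disables truncation in denoise-paired, and the file names are placeholders):

```
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs primer-trimmed.qza \
  --p-trim-left-f 0 --p-trim-left-r 0 \
  --p-trunc-len-f 0 --p-trunc-len-r 0 \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats stats.qza

# Then check the merging numbers in the denoising stats.
qiime metadata tabulate \
  --m-input-file stats.qza \
  --o-visualization stats.qzv
```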
I even tried without trimming the primers or truncating the reads, just to see what happens: stats-dada2-paired.qzv (1.2 MB)
Some merged values increase... but I still lose most of my samples...
What could we do now? Is there any technical approach to continue analyzing these sequencing files, or should we move to another approach and sequence the samples again?
You can move forward with just one read direction, using denoise-single. If you need full coverage of this amplicon, unfortunately it looks like you would need to resequence with longer read lengths.
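A single-direction run could look roughly like this (a sketch; denoise-single on a paired-end artifact uses only the forward reads, so to use only the reverse reads you would import them separately as single-end data; the truncation length here is a placeholder to be chosen from your quality plot):

```
qiime dada2 denoise-single \
  --i-demultiplexed-seqs samples.qza \
  --p-trim-left 0 \
  --p-trunc-len 130 \
  --o-table table-single.qza \
  --o-representative-sequences rep-seqs-single.qza \
  --o-denoising-stats stats-single.qza
```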