Analyze sequences from different sequencing facilities and with different primer.

Dear all,

i know this is not the first post about using qiime2 with DADA2 to analyze sequences from different facilities and primers. However, after trying numerous possibilities and options (all found in this forum) I decided to write this post, as I don’t know any other option anymore.
The situation is:

I have 2 sets of sequences (both 2x300bp; V3-V4) from activated sludege samples from the same waste water treatment plant. Sequencing was perfromed on MiSeq with the primer pair 341f and 802r for the first batch and 341f and 806r for the second batch.
After reading all posts about how to deal with problems like this I trimmed my reads with cutadapt:

qiime cutadapt trim-paired \
   --i-demultiplexed-sequences /cluster/scratch/niederro/WEnzel/Uster_1/Uster_1.qza \
   --p-cores 4 \
   --p-front-f CCTACGGGNGGCWGCAG \
   --p-front-r TACNVGGGTATCTAATCC \
   --p-error-rate 0 \
   --o-trimmed-sequences /cluster/scratch/niederro/WEnzel/Filtered_QZas/Uster_1_filt.qza


qiime cutadapt trim-paired \
   --i-demultiplexed-sequences /cluster/scratch/niederro/WEnzel/Uster_2/Uster_2.qza \
   --p-cores 4 \
   --p-front-f CCTACGGGNGGCWGCAG \
   --p-front-r GACTACHVGGGTATCTAATCC \
   --p-error-rate 0 \
   --o-trimmed-sequences /cluster/scratch/niederro/WEnzel/Filtered_QZas/Uster_2_filt.qza

For the 341/802 batch (#1) it worked fine and it removed ll primers. In the 2nd batch, however, no primers were attached anymore. Which left me with sequences from the 1st batch with ~285bp and sequences from the 2nd batch with 301 bp.
Uster_1_filt.qzv (298.9 KB) Uster_2_filt.qzv (296.6 KB)

I tried multiple different ways to trim them in the same way to uncover ASVs that are present in both batches. However, in most of the cases it worked for one batch but not the other and merging of the DADA2 outputs never led to a single common ASV.
Just one example:

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs /cluster/scratch/niederro/WEnzel/Uster_1/Uster_1_filt.qza \
  --p-max-ee-f 2 \
  --p-max-ee-r 3 \
  --p-trunc-len-f 270 \
  --p-trunc-len-r 205 \
  --p-n-threads 4 \
  --o-table /cluster/scratch/niederro/WEnzel/Uster_1/DADA2/Dada2_Uster_1.qza \
  --o-representative-sequences /cluster/scratch/niederro/WEnzel/Uster_1/DADA2/DADA2_Uster1_rep-seqs.qza \
  --o-denoising-stats /cluster/scratch/niederro/WEnzel/Uster_1/DADA2/Dada2_denoising-stats_Uster1.qza

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs /cluster/scratch/niederro/WEnzel/Uster_2/Uster_2.qza \
  --p-trim-left-f 15 \
  --p-trim-left-r 15 \
  --p-max-ee-f 2 \
  --p-max-ee-r 3 \
  --p-trunc-len-f 285 \
  --p-trunc-len-r 220 \
  --p-n-threads 4 \
  --o-table /cluster/scratch/niederro/WEnzel/Consensus_final_choices/Final_Final/Dada2_Uster_2.qza \
  --o-representative-sequences /cluster/scratch/niederro/WEnzel/Consensus_final_choices/Final_Final/DADA2_Uster2_rep-seqs.qza \
  --o-denoising-stats /cluster/scratch/niederro/WEnzel/Consensus_final_choices/Final_Final/Dada2_denoising-stats_Uster2.qza

Next I tried to take the raw sequences with following DADA2 parameters:

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs /cluster/scratch/niederro/WEnzel/Uster_1/Uster_1.qza \
  --p-trim-left-f 17 \
  --p-trim-left-r 18 \
  --p-max-ee-f 2 \
  --p-max-ee-r 3 \
  --p-trunc-len-f 289 \
  --p-trunc-len-r 220 \
  --p-n-threads 4 \
  --o-table /cluster/scratch/niederro/WEnzel/Uster_raw/Rawer/Dada2_Uster_1_new.qza \
  --o-representative-sequences /cluster/scratch/niederro/WEnzel/Uster_raw/Rawer/DADA2_Uster1_rep-seqs_new.qza \
  --o-denoising-stats /cluster/scratch/niederro/WEnzel/Uster_raw/Rawer/Dada2_denoising-stats_Uster1.new.qza


qiime dada2 denoise-paired \
  --i-demultiplexed-seqs /cluster/scratch/niederro/WEnzel/Uster_2/Uster_2.qza \
  --p-trim-left-f 17 \
  --p-trim-left-r 18 \
  --p-max-ee-f 2 \
  --p-max-ee-r 3 \
  --p-trunc-len-f 289 \
  --p-trunc-len-r 220 \
  --p-n-threads 4 \
  --o-table /cluster/scratch/niederro/WEnzel/Uster_raw/Rawer/Dada2_Uster_2_raw_new.qza \
  --o-representative-sequences /cluster/scratch/niederro/WEnzel/Uster_raw/Rawer/DADA2_Uster2_rep-seqs_new.qza \
  --o-denoising-stats /cluster/scratch/niederro/WEnzel/Uster_raw/Rawer/Dada2_denoising-stats_Uster2_new.qza  

Here I treid to make sure that (i) the primers get removed from batch 1 and (ii) sequences from batch 2 start at the same position as batch 1. Again, I was not able to find a single common ASV.
Dada2_Uster_1_new.qzv (1.1 MB) Dada2_Uster_2_new.qzv (597.3 KB)

Do you know what could be problem here or even better what could be the solution for this problem?

Thank you very much for any advice.

Robert

Hi @Robert_Niederdorfer,
Welcome to the forum and thanks for taking time to look through the forum first and trying some strategies first.
I think your logic here is sound to try and achieve sequences covering the exact same region but trying to find the exact region with trimming is a bit tedious, perhaps there’s an easier way to achieve this.

I’m not sure if this will work, but why don’t you try and use the primers set from your shorter region, in both cutadapt trimming steps. That way you ensure you are flanking the same spot in both runs and discarding all the preceding nts that are left over in your run2.
Basically, just run cutadapt from your run1 exactly as-is with run2 data.
If I’m not mistaken that should leave you with reads of the exact same region. You can also set the --p-discard-untrimmed flag to discard any reads that didn’t have these primers in them in case there are some unwanted reads hanging around.
Give that a go and let us know how it goes.

1 Like

Thank you @Mehrbod_Estaki for your input…
I had to freeze these anlysis as there other deadlines coming up, therfore the very late response.
The problem here is that, as sequences from run1 have the primers attached sequences from run2 doesn’t have them anymore, or at least just a few of them. After cutadapt this normally leaves me with ~280bp reads for run1 (as primers were trimmed) and 301bp reads from run2. If I use discard untrimmed I have hardly any reads left from run2.
Therefore this approach also doesn’t work for me. When I analyse both set individually I nearly get the same taxonomic classification even down to the Genus level but breaking it down to that and compare it on that level is not very reliable.
Would it make sense to only use the forward reads from both sets?

Best and thank you,

Robert

Hi @Robert_Niederdorfer,

Aha, I think I understand your issue here better now. Sorry if I didn’t catch it the first time around.

Yes, that would certainly agree with what you mentioned that your run2 doesn’t have the primers in them anymore.

This is totally expected, and reassuring, since the 2 reads are really only different a a few nucleotides, so the classifiers should have no problem classifying them to the same taxa. However,

If I were you, I would stick with this approach you suggested. In fact this is the most common approach when we do metanalyses or trying to combine multiple studies.
That way you are dealing with the exact same ASVs, you can truncate them all to the same length and that way you will get exact ASVs coverage and you can do analysis at the ASV level rather than collapsing reads based on taxonomy levels.

Hope this helps, keep us posted if you run into any other issues!

Thank you @Mehrbod_Estaki for your answers.

If I were you, I would stick with this approach you suggested. In fact this is the most common approach when we do metanalyses or trying to combine multiple studies.
That way you are dealing with the exact same ASVs, you can truncate them all to the same length and that way you will get exact ASVs coverage and you can do analysis at the ASV level rather than collapsing reads based on taxonomy levels.

I think this is indeed, the best solution. However, I’m still facing the problem wiith the primers hanging on the sequences in set1 and not in set2. I obtain again different sets of ASVs from Deblur. Any idea how to overcome this issue?

Hi @Robert_Niederdorfer,

If removing the primers from set1 using cutadapt and truncating both sets to the same length doesn’t give you the the same ASVS between the 2 sets there may be something else at play here.

The easiest solution would be see if you can grab the unfiltered sets for both runs from your sequencing facility and do the primer removal yourself.

If this isn’t an option then you’ll want to know exactly what they have done with their primer removal. Maybe they just cut a fixed number of nucleotides instead of searching for primer target. <- not ideal, but maybe you can use the same parameters as this with set 1 and hopefully that will put them in the same starting point.

Can you also double check to make sure that the forward primers used is identical in both cases?