Merging after denoising step

Ayushi_Bhagat · September 4, 2024, 10:11am

Greetings,

I am running QIIME 2 (2023 version) and have a few questions about one of my datasets. Everything runs fine, but after the denoising step I am not getting enough merged sequences. As a result I lose a lot of features, and when assigning taxonomy I do not get enough OTU abundances for the bacteria. I have tried various truncation lengths for the forward and reverse reads, but this is the maximum length at which I at least get some output. Can you please suggest what can be done?

qiime cutadapt trim-paired \
  --i-demultiplexed-sequences demux-paired-end.qza \
  --p-front-f TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGAYTGGGYDTAAAGNG \
  --p-front-r GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTACNVGGGTATCTAATCC \
  --o-trimmed-sequences demux-paired-end-trimmed.qza

qiime demux summarize \
  --i-data demux-paired-end-trimmed.qza \
  --o-visualization demux-paired-end-trimmed.qzv

qiime tools view demux-paired-end-trimmed.qzv

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux-paired-end-trimmed.qza \
  --p-trunc-len-f 125 \
  --p-trunc-len-r 125 \
  --o-representative-sequences rep-seqs.qza \
  --o-table table.qza \
  --o-denoising-stats stats-dada2.qza

qiime metadata tabulate \
  --m-input-file stats-dada2.qza \
  --o-visualization stats-dada2.qzv

qiime tools view stats-dada2.qzv

qiime feature-table summarize \
  --i-table table.qza \
  --m-sample-metadata-file metadata.txt \
  --o-visualization table.qzv

qiime tools view table.qzv

qiime feature-table tabulate-seqs \
  --i-data rep-seqs.qza \
  --o-visualization rep-seqs.qzv

qiime tools view rep-seqs.qzv

qiime feature-classifier classify-sklearn \
  --i-classifier silva-138-99-515-806-nb-classifier.qza \
  --i-reads rep-seqs.qza \
  --o-classification taxonomy.qza

qiime metadata tabulate \
  --m-input-file taxonomy.qza \
  --o-visualization taxonomy.qzv

qiime taxa barplot \
  --i-table table.qza \
  --i-taxonomy taxonomy.qza \
  --m-metadata-file metadata.txt \
  --o-visualization taxa-barplot.qzv

qiime tools export \
  --input-path table.qza \
  --output-path exported-feature-table

biom convert \
  --input-fp exported-feature-table/feature-table.biom \
  --output-fp exported-feature-table/otu-table.tsv \
  --to-tsv

qiime tools view taxa-barplot.qzv

qiime tools view exported-feature-table/otu-table.tsv

buzic · September 4, 2024, 11:22am

Hi @Ayushi_Bhagat

Welcome to the forum! So the issue is that too few reads are merging after the denoising step.

Could you share the denoising stats as a visualisation file (stats-dada2.qzv) so we can have a look?
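If attaching the .qzv is not possible, the same numbers can be shared as text. The following is a minimal sketch, assuming that exporting stats-dada2.qza yields a tab-separated file whose header row has columns named `input` and `merged`. The output directory `exported-stats`, the file name `stats.tsv`, and the helper `merged_fraction` are assumptions made for illustration.

```shell
# Export the denoising stats to plain text (run where QIIME 2 is installed):
#   qiime tools export --input-path stats-dada2.qza --output-path exported-stats

# Hypothetical helper: print each sample ID with the percentage of input
# reads that were merged. Assumes a header row containing "input" and
# "merged" columns; rows starting with "#" (such as a column-types row)
# are skipped.
merged_fraction() {
  awk -F'\t' '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
    /^#/    { next }
    col["input"] && col["merged"] && $(col["input"]) > 0 {
      printf "%s\t%.1f\n", $1, 100 * $(col["merged"]) / $(col["input"])
    }
  ' "$1"
}
```

Under those assumptions, `merged_fraction exported-stats/stats.tsv` prints one line per sample that can be pasted straight into a reply.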

In your cutadapt step, you appear to have supplied the entire adapter sequence rather than just the primer, and this could be one reason for the poor merging. Try running the cutadapt step with only the primer (it should trim anything preceding that sequence anyway), and I would add the --verbose setting to see how the trimming is going. Warning: it will print out lots of stuff, but it should give you an idea, per sample, of how many times your primers are found and trimmed.
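As a rough sketch of that suggestion, the primer can be recovered by stripping the overhang prefix from each string in the original command. The split between overhang and primer shown here is an assumption, although the resulting primers match the ones the original poster lists later in the thread. The function name `trim_primers_only` is made up for illustration.

```shell
# Full strings used in the original command (adapter overhang + primer).
full_f=TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGAYTGGGYDTAAAGNG
full_r=GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTACNVGGGTATCTAATCC

# Assumption: these prefixes are the overhang and the remainder is the primer.
overhang_f=TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG
overhang_r=GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG

# Strip the overhang prefix so only the primer is left.
primer_f=${full_f#"$overhang_f"}
primer_r=${full_r#"$overhang_r"}

# Wrapped in a function so nothing runs until it is called explicitly.
trim_primers_only() {
  qiime cutadapt trim-paired \
    --i-demultiplexed-sequences demux-paired-end.qza \
    --p-front-f "$primer_f" \
    --p-front-r "$primer_r" \
    --o-trimmed-sequences demux-paired-end-trimmed.qza \
    --verbose
}
```

Calling `trim_primers_only` in an environment with QIIME 2 runs the trimming and prints the per-sample cutadapt report.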

all the best,

Vic

Micro_Biologist · September 4, 2024, 12:25pm

I assume you're using the V4 region primers, given the classifier you're using. Is this correct?

If so, what read length did you sequence? It seems to me you probably don't have sufficient overlap if you're trimming the sequences to 125 bp for a ~290 bp amplicon. If you did 2x150 bp, I would consider just using the forward reads; if you did longer, you can (hopefully) extend the truncation length to allow sufficient overlap.
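The arithmetic behind this can be sketched in a couple of lines. This is a simplified estimate, assuming all three lengths are measured the same way (either all including primers or all excluding them); the function name `overlap_bp` is hypothetical.

```shell
# Expected overlap (bp) between forward and reverse reads after truncation.
# Arguments: forward length, reverse length, amplicon length.
# A negative result means the reads cannot meet in the middle.
overlap_bp() {
  echo $(( $1 + $2 - $3 ))
}

overlap_bp 125 125 290   # prints -40
overlap_bp 150 150 290   # prints 10
```

A result below the merging tool's minimum overlap means the pair cannot be merged, however good the read quality is.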

Ayushi_Bhagat · September 4, 2024, 10:24pm

I tried uploading the .html files but could not, as the portal didn't allow it. I am attaching a screenshot here just for reference.

Ayushi_Bhagat · September 4, 2024, 10:25pm

Hi Jono, thanks for this input.

So, I tried using just the primer sequences, but when I do that I get only 31 features and not many sequences are merged, although a good percentage (more than 97%) of sequences pass the filtering step. This was the reason I tried using the full overhang sequences. I am attaching the files; please do have a look. I have tried multiple times and I am still stuck. Also, if I increase the truncation parameters beyond 130 bp, to 135 bp or 140 bp, I lose most of the sequences at the filtering stage. If I decrease the truncation value to 125 bp, the same thing happens. I have 150 bp forward and reverse reads with an expected amplicon length of 260 bp. Please do suggest something to fix this. I have tried more than 15 times so far.
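One thing that may be worth checking here (a possibility, not something confirmed in the thread): DADA2 discards any read shorter than the truncation length, and reads get shorter once cutadapt removes the overhang and primer. Below is a minimal sketch for counting how many reads are long enough to survive a given truncation length. It assumes the trimmed reads have been exported to gzipped FASTQ with four lines per record; the directory `exported-reads` and the helper `reads_at_least` are hypothetical names.

```shell
# Export first (run where QIIME 2 is installed):
#   qiime tools export --input-path demux-paired-end-trimmed.qza --output-path exported-reads

# Hypothetical helper: count reads in a gzipped FASTQ that are at least
# MIN_LEN bases long. Assumes unwrapped, 4-line FASTQ records.
# Usage: reads_at_least FILE.fastq.gz MIN_LEN
reads_at_least() {
  gzip -dc "$1" |
    awk -v minlen="$2" 'NR % 4 == 2 && length($0) >= minlen { n++ } END { print n + 0 }'
}
```

Comparing the count at 130 with the count at 135 or 140 for one sample's forward and reverse files would show whether the losses at the filtering stage are simply reads that are too short to truncate.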

Ayushi_Bhagat · September 4, 2024, 10:26pm

Ayushi_Bhagat · September 4, 2024, 10:37pm

buzic · September 5, 2024, 8:27am

Hi @Ayushi_Bhagat,

I have a few thoughts/questions that might help.

If you are trying to reconstruct an amplicon with an expected length of 260 bp, you'll need to make sure you have enough overlap; the default minimum overlap for DADA2 in QIIME 2 is 12 bp.

The truncation parameters don't have to be the same for forward and reverse. Looking at your graph, it appears your forward reads are fine and it's the reverse reads that need truncating where the quality dips at the end. What happens if you use something like:

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux-paired-end-trimmed.qza \
  --p-trunc-len-f 0 \
  --p-trunc-len-r 130 \
  --o-representative-sequences rep-seqs.qza \
  --o-table table.qza \
  --o-denoising-stats stats-dada2.qza
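To see how much room a given reverse truncation leaves, the expected overlap can be tabulated for a range of values. This is a rough sketch only: it assumes the forward length passed in is the actual read length after any primer trimming, that the amplicon length is measured the same way, and it uses the 12 bp default mentioned above unless a third argument is given. The function name `sweep_reverse_trunc` is hypothetical.

```shell
# Hypothetical sweep: for a fixed forward read length and amplicon length,
# print the reverse truncation length, the expected overlap, and whether
# that overlap reaches the minimum (default 12 bp).
# Usage: sweep_reverse_trunc FWD_LEN AMPLICON_LEN [MIN_OVERLAP]
sweep_reverse_trunc() {
  fwd=$1; amplicon=$2; min_overlap=${3:-12}
  for rev in 120 125 130 135 140 145 150; do
    ov=$(( fwd + rev - amplicon ))
    if [ "$ov" -ge "$min_overlap" ]; then status=ok; else status=too-short; fi
    printf '%s\t%s\t%s\n' "$rev" "$ov" "$status"
  done
}
```

For example, `sweep_reverse_trunc 150 260` lists 120 as too short (10 bp overlap) and 125 and above as sufficient. If the reads are shorter than 150 bp after primer removal, pass that shorter length instead.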

Ayushi_Bhagat · September 6, 2024, 5:48am

I tried this as well today but, again, I am losing sequences at the merging step. A good number of sequences are filtered and denoised, but only 3-4 sequences get merged. Can you please provide more suggestions? I have checked the primers.

buzic · September 6, 2024, 12:34pm

Hi again @Ayushi_Bhagat,

If this isn't working, I would point you back to the issue that @Micro_Biologist raised.

Are you absolutely sure your sequencing is long enough to cover your target region after trimming? See this post. What primers did you use? You might only have an overlap with no trimming because, as I mentioned, the default minimum overlap for DADA2 in QIIME 2 is 12 bp.

So, you can play with the --p-min-overlap setting in DADA2 to adjust this. For example, try some tests with no trimming to see if this is the case, something like:


qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux-paired-end-trimmed.qza \
  --p-trunc-len-f 0 \
  --p-trunc-len-r 0 \
  --p-min-overlap 10 \
  --o-representative-sequences rep-seqs.qza \
  --o-table table.qza \
  --o-denoising-stats stats-dada2.qza

If that works, then the issue is your sequence length versus your target amplicon length, and you may need to proceed with forward reads only, because your reverse reads will require trimming and that won't let you keep the small overlap you had.

Ayushi_Bhagat · September 8, 2024, 2:21am

My target amplicon sequence length is 253 bp, which I verified from the literature. I have 2x150 bp reads, so I believe it should have been okay. However, even with truncation values of 0, it didn't run properly and not many sequences were merged. The same issue popped up again.

buzic · September 9, 2024, 8:28am

Hi again @Ayushi_Bhagat,

If you don't get an overlap with any of the trimming values, it would suggest to me that, unfortunately, your sequencing was not long enough to cover the entire amplicon you targeted. But I can't be sure without you answering this question: what primers did you use?

Your forward reads appear to be of good quality, and you could do as previously suggested and use just the forward reads.

To do this, use qiime dada2 denoise-single (see here for details). An example of the command looks like this:

  qiime dada2 denoise-single \
    --i-demultiplexed-seqs demux-single.qza \
    --p-trunc-len 145 \
    --o-representative-sequences representative-sequences.qza \
    --o-table table.qza \
    --o-denoising-stats denoising-stats.qza

Ayushi_Bhagat · September 9, 2024, 10:33pm

Thank you so much for your response. My primers are as follows -
AYTGGGYDTAAAGNG \ forward
TACNVGGGTATCTAATCC \ reverse
In the literature, the expected amplicon length is 253 bp, but merging doesn't work at all based on this length. I had it checked by the sequencing facility.

buzic · September 11, 2024, 10:36am

Hi @Ayushi_Bhagat,

I would suggest that you continue with forward reads only. Usually, the V4 region is sequenced with 2x250 bp reads to leave room to trim and merge. If using 2x150 bp, the rigorously tested EMP approach is much more dependable and consistent.
