Denoising stat results

Namraj_Jaishi · January 16, 2025, 3:10pm

Hi QIIME Forum,

I am performing microbiome analysis on my soil data (banks, sediments, and upstream samples). The data are demultiplexed, and I have removed the primers using the following command:

qiime cutadapt trim-paired \
  --i-demultiplexed-sequences paired-end-demux.qza \
  --p-front-f GTGYCAGCMGCCGCGGTAA \
  --p-front-r GGACTACNVGGGTWTCTAAT \
  --p-match-adapter-wildcards \
  --p-discard-untrimmed \
  --o-trimmed-sequences trimmed-seqs-16S.qza \
  --verbose

Afterwards, I performed DADA2 on the trimmed sequences using three different truncation lengths based on quality score thresholds: 20%, 25%, and 30% of the 25th percentile of the quality score.
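
For anyone reproducing a sweep like this, here is a minimal sketch of how the three runs could be scripted. It rests on assumptions: the forward/reverse truncation pairs (254/204, 257/209, 275/223) are only guessed from the attached file names, the `dada2-F-R` output directory names are made up, and `dada2_cmd` is a hypothetical helper. The commands are printed rather than executed so they can be checked first.

```shell
#!/usr/bin/env bash
# Print one `qiime dada2 denoise-paired` command per truncation setting.
# ASSUMPTION: the truncation pairs below are inferred from the attached
# file names (e.g. denoising-stats254204.qzv), not confirmed by the poster.
# ASSUMPTION: the output directory names (dada2-F-R) are hypothetical.

dada2_cmd() {
  local trunc_f="$1" trunc_r="$2"
  # --output-dir collects all outputs so they need not be named one by one.
  printf 'qiime dada2 denoise-paired --i-demultiplexed-seqs trimmed-seqs-16S.qza --p-trunc-len-f %s --p-trunc-len-r %s --output-dir dada2-%s-%s --verbose\n' \
    "$trunc_f" "$trunc_r" "$trunc_f" "$trunc_r"
}

for pair in "254 204" "257 209" "275 223"; do
  # Intentional word splitting: each pair holds "forward reverse".
  dada2_cmd $pair
done
```

Piping the printed lines to `bash` would run them, with each setting writing its outputs into its own directory so the denoising stats are easy to compare afterwards.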

I now have three sets of denoising statistics, one for each truncation setting, but when viewing them in QIIME 2 View, I notice minimal variation across the results (percentage of reads merged, percentage of non-chimeric reads). Also, about 25 samples show 0 percent for both merged and non-chimeric reads.

Given these results, I’m seeking advice on which truncation length and percentage might be most appropriate for my analysis. Are there specific factors I should consider to get better results?
Thank you in advance.

Best regards,
Namraj

Here are the attached files:
denoising-stats254204.qzv (1.2 MB)
denoising-stats257209.qzv (1.2 MB)
denoising-stats275223.qzv (1.2 MB)
paired-end-demux.qzv (315.7 KB)
trimmed-seqs-16S.qzv (321.7 KB)

colinbrislawn · January 16, 2025, 6:16pm

Hello @Namraj_Jaishi,

This is a great post! I always find parameter sweeps very helpful. Thank you for sharing all your data.

Based on the quality, I like your choices for trim and trunc locations.

One goal is to keep the most data possible. The other goal is to keep only the highest quality data. So it's a tradeoff between quantity and quality, hopefully with a sweet spot in the middle.

In the third file, denoising-stats275223.qzv, only a very small percentage of reads merge, so I would not use that one.

The other two files are similar. Most samples have >11k reads in them, which is pretty good!
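
To put a number on that comparison, one option is to average the merged percentage for each run. This is a rough sketch, assuming each run's stats have been saved as a tab-separated file with a header row that includes a "percentage of input merged" column and an optional "#q2:types" row beneath it. The file names and the `mean_column` helper are hypothetical.

```shell
#!/usr/bin/env bash
# Average one numeric column of a denoising-stats TSV.
# ASSUMPTION: tab-separated file, header on line 1, optional "#q2:types" row.
# ASSUMPTION: the file names in the loop below are placeholders.

mean_column() {
  local stats_file="$1" column_name="$2"
  awk -F'\t' -v name="$column_name" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == name) col = i; next }
    /^#/    { next }                      # skip the #q2:types directive row
    col     { sum += $col; n++ }          # only count rows if the column exists
    END     { if (n) printf "%.2f\n", sum / n }
  ' "$stats_file"
}

for f in stats-254-204.tsv stats-257-209.tsv stats-275-223.tsv; do
  # Skip files that are not present so the script can run anywhere.
  if [ -f "$f" ]; then
    printf '%s\t%s\n' "$f" "$(mean_column "$f" "percentage of input merged")"
  fi
done
```

A mean is pulled down by samples that failed outright, so it is worth reading it alongside the per-sample table rather than on its own.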

I noticed that just over 30 samples had fewer than 50 input reads.
I would say these "failed to sequence." This is pretty common for samples with low biomass, but it could indicate something else.

It could also be bad luck from the sequencing core, and it's worth asking them if they would run your samples again! Losing 30 samples from a cohort is really hard!
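
If it helps when talking to the sequencing core, the low-input samples can be listed straight from the stats table. The sketch below makes the same kind of assumptions: a tab-separated file named `stats.tsv` (a placeholder name) with "sample-id" in the first column and an "input" column in the header, plus an optional "#q2:types" row. `low_input_samples` is a hypothetical helper, and the cutoff of 50 reads is the one mentioned above.

```shell
#!/usr/bin/env bash
# List samples whose "input" read count is below a cutoff.
# ASSUMPTION: tab-separated file, sample IDs in column 1, header on line 1
# containing an "input" column, optional "#q2:types" row on line 2.

low_input_samples() {
  local stats_file="$1" cutoff="${2:-50}"
  awk -F'\t' -v cutoff="$cutoff" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "input") col = i; next }
    /^#/    { next }                      # skip the #q2:types directive row
    col && ($col + 0) < cutoff { print $1 "\t" $col }
  ' "$stats_file"
}

# Only run if the (placeholder) file is present.
if [ -f stats.tsv ]; then
  low_input_samples stats.tsv 50
fi
```

The output is one line per sample, with the sample ID and its input read count, which can be pasted into an email or matched against the sample sheet.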

Let us know if you have more questions!

Namraj_Jaishi · January 16, 2025, 10:46pm

Thank you for your insights. I will get back to the sequencing center about the samples with low input reads.

Best,
Namraj