benchmarking single/paired/join prior to denoising

Because the reverse reads are of low quality, I wish to compare different strategies for dealing with this problem: using only the forward read, using both reads of each pair, joining reads followed by denoising with Deblur, and different trimming lengths.
I’m wondering which measurements to use for this comparison and thought about:

  1. Number of denoised sequences per sample / fraction of denoised out of input sequences
  2. Number of features
    Since read quality and length affect taxonomy assignment, I also wish to measure:
  3. Distribution of taxonomy levels (assuming that longer features allow more features to be assigned to lower-rank taxa such as genus and species)
  4. Taxonomy classification confidence (assuming that low-quality and shorter sequences may reduce confidence).

Do you have any additional suggestions?
Is there an easy way to count the number of features that were assigned to each level?


Hi @yipinto,
You could certainly use all four, and these are all great ideas.

Unless your goal is explicitly to benchmark, I’d recommend just going with #1 (that’s what I do on a routine basis), because that will give you more time to focus on the biology.

Remember to focus on the number of successfully joined reads with the paired-end data. Yield is important, but making sure you are not losing too many reads at joining is critical for ensuring that you are not biasing the composition of your reads by systematically excluding reads that are too long to join.
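
As a rough illustration of that comparison, here is a minimal Python sketch that computes the fraction of input reads retained at each step. The step names and read counts are made-up placeholders, not real data; in practice they would come from whatever per-sample stats your joining and denoising steps report.

```python
def retention(counts):
    """Return the fraction of input reads surviving each step.

    counts: ordered mapping of step name -> read count; the first entry
    is treated as the input read count.
    """
    steps = list(counts.items())
    input_reads = steps[0][1]
    if input_reads <= 0:
        raise ValueError("input read count must be positive")
    return {name: n / input_reads for name, n in steps}


# Hypothetical counts for one sample under two strategies (illustrative only).
single_end = {"input": 10000, "filtered": 9000, "denoised": 8000}
joined = {"input": 10000, "joined": 7000, "filtered": 6500, "denoised": 6000}

for label, counts in [("single-end", single_end), ("joined", joined)]:
    fractions = retention(counts)
    print(label, {step: round(f, 2) for step, f in fractions.items()})
```

A large drop at the joining step specifically, rather than at filtering or denoising, is the pattern to watch for.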

The number of features (#2) is important if you care about alpha diversity, but we can expect that it will almost certainly differ between the single-end and joined reads (and between different denoising techniques, for that matter). So the differences you see may not be all that informative or useful.

For #3, we can assume this is almost certainly going to happen, but not necessarily, depending on your target. So it is not something I would do for routine decision-making, but definitely for a more extensive benchmark.

The confidence scores (#4) are really used by the classifier for decision-making; they are not really there for human consumption. The classifiers use these scores to decide what taxonomic level can be confidently classified, so you could use two different approaches and observe equivalent confidence scores, but one method reports these scores at the family level and the other at the genus level. That makes them not useful for benchmarking.

As for an easy way to count the features assigned to each level: not in QIIME 2.
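
Outside QIIME 2 it is simple enough to script. Here is a minimal sketch, assuming the taxonomy has been exported as semicolon-separated strings with Greengenes-style rank prefixes (`k__`, `p__`, ... `s__`), where an empty rank such as `s__` means unassigned. Other reference databases format their strings differently and would need a different parser. The function name and the example strings are invented for illustration.

```python
from collections import Counter

RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def deepest_rank(taxon):
    """Return the name of the deepest rank that carries a non-empty label."""
    if taxon.strip().lower() == "unassigned":
        return "unassigned"
    depth = 0
    for field in taxon.split(";"):
        field = field.strip()
        # Drop an 'x__' prefix if present; an empty remainder means unassigned.
        label = field.split("__", 1)[1] if "__" in field else field
        if not label:
            break
        depth += 1
    if depth == 0:
        return "unassigned"
    return RANKS[min(depth, len(RANKS)) - 1]


# Invented example assignments (illustrative only).
taxa = [
    "k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; "
    "f__Lactobacillaceae; g__Lactobacillus; s__",
    "k__Bacteria; p__Bacteroidetes; c__Bacteroidia; o__Bacteroidales; "
    "f__Bacteroidaceae; g__Bacteroides; s__fragilis",
    "k__Bacteria; p__Proteobacteria",
    "Unassigned",
]

counts = Counter(deepest_rank(t) for t in taxa)
for rank in RANKS + ["unassigned"]:
    print(rank, counts.get(rank, 0))
```

Running this on the taxonomy from each strategy gives the per-level counts asked about in #3.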

Thanks! @Nicholas_Bokulich
My goal is not to write a benchmarking paper, but since we often get poor-quality reverse reads, I’m trying to better understand the trade-off between sequence length and quality.
So it seems that the fraction of denoised out of input sequences is the easiest measurement, and an informative one.

Why would reads be too long to join? (We use 250 bp paired-end for V4.)

What do you think about looking at the distribution of feature length?

Thanks again!

Makes sense! I’d say focus on point #1 above (paying attention to the # of reads lost during filtering, joining, and total yield), and it could be instructive to compare taxonomy classification with each, though unless you have a mock community or simulated community, you cannot really choose which is “best”.


Reads failing to join is just a common problem with paired-end reads, depending on the amplicon. 250 bp paired-end should be plenty long for V4, so you probably will not have that problem.
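
To make the arithmetic concrete: two reads can only be joined if together they span the amplicon with some overlap to spare. Here is a rough sketch that ignores primers and sequencing errors. The amplicon lengths and the 20 bp minimum overlap are assumed placeholders, not values from any particular joining tool.

```python
def expected_overlap(amplicon_len, fwd_len, rev_len):
    """Overlap in bp between forward and reverse reads; negative means a gap.

    The overlap cannot exceed the shorter read or the amplicon itself.
    """
    return min(fwd_len + rev_len - amplicon_len, fwd_len, rev_len, amplicon_len)


def can_join(amplicon_len, fwd_len, rev_len, min_overlap=20):
    """True if the expected overlap meets the (assumed) minimum."""
    return expected_overlap(amplicon_len, fwd_len, rev_len) >= min_overlap


# Assumed ~253 bp V4 amplicon with full-length 250 bp reads.
print(expected_overlap(253, 250, 250), can_join(253, 250, 250))

# Assumed ~460 bp amplicon: full-length reads still overlap ...
print(expected_overlap(460, 250, 250), can_join(460, 250, 250))

# ... but truncating a poor-quality reverse read to 180 bp leaves a gap.
print(expected_overlap(460, 250, 180), can_join(460, 250, 180))
```

The last case is the one to keep in mind when trimming low-quality reverse reads: every base trimmed comes out of the overlap.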

The feature length distribution could be worth seeing, but keep in mind that you will see some variation. You can check the literature to see the expected distribution.
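
If you do want to look, tallying lengths is quick. Here is a minimal sketch that counts sequence lengths in FASTA-formatted text, such as exported representative sequences. The function name is hypothetical and the records below are invented toy sequences.

```python
from collections import Counter


def length_distribution(fasta_text):
    """Count sequences by length in FASTA-formatted text."""
    lengths = Counter()
    current = None  # length of the record being read; None before any header
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if current is not None:
                lengths[current] += 1
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths[current] += 1
    return lengths


# Toy records (illustrative only); feature2 is wrapped over two lines.
toy = """>feature1
ACGTACGTAC
>feature2
ACGTACGT
ACGT
>feature3
ACGTACGTAC
"""

for length, n in sorted(length_distribution(toy).items()):
    print(length, n)
```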
