Phred score cutoff vs. sequence length for an unbiased comparison between different cohorts

Nicholas_Bokulich · March 13, 2020, 3:03pm

Hi @Parix,
Great questions!

I am assuming these are from different sequencing runs?

The most important thing is that the sequence lengths are the same if you plan to use denoising methods to denoise and dereplicate. Two reads of the same amplicon that differ only in length count as different sequences, so if the runs are trimmed to different lengths, each run will have its own unique ASVs and you will have trouble comparing them.

So if the reads are single-end, it will be important to trim them to the same length. If they are paired-end, then theoretically the merged reads should yield the same length and same amplicons.
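To make the length issue concrete, here is a minimal sketch in plain Python. It is not how any denoiser actually works: exact dereplication stands in for ASV inference, and the reads, the helper names (`truncate_reads`, `dereplicate`), and the common length of 8 are all made up for illustration.

```python
from collections import Counter

def truncate_reads(reads, length):
    """Keep only reads of at least `length` bases, cut to exactly `length`."""
    return [r[:length] for r in reads if len(r) >= length]

def dereplicate(reads):
    """Count identical sequences (a stand-in for ASVs in this toy example)."""
    return Counter(reads)

# Hypothetical toy data: the same amplicons sequenced in two runs,
# but run B was trimmed 4 bases shorter than run A.
run_a = ["ACGTACGTACGT", "ACGTACGTACGT", "ACGTTCGTACGT"]
run_b = ["ACGTACGT", "ACGTACGT", "ACGTTCGT"]

# Without a common length, no sequence is shared between the runs.
shared_before = set(dereplicate(run_a)) & set(dereplicate(run_b))

# After truncating both runs to the same length, the sequences match.
common_length = 8  # assumed value, chosen only for this example
shared_after = set(dereplicate(truncate_reads(run_a, common_length))) & set(
    dereplicate(truncate_reads(run_b, common_length))
)

print(len(shared_before), len(shared_after))
```

In this toy case the two runs share no sequences before truncation and share both after it, which is the whole point of picking one length for every run.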

Another way to handle this if you must trim to different lengths is to use closed-reference OTU clustering to cluster the reads against known full-length reference sequences. It's a lazy but efficient way to solve what is sometimes an intractable problem.
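A rough sketch of why this helps, again in plain Python. Exact substring matching is used here as a crude stand-in for alignment at an identity threshold, and the reference sequences, reads, and the function name `closed_reference_assign` are hypothetical. A real analysis would use a proper clustering tool and a curated reference database.

```python
from collections import Counter

# Hypothetical full-length reference sequences keyed by reference ID.
references = {
    "ref_A": "ACGTACGTACGTTTGA",
    "ref_B": "ACGTTCGTACGTCCGA",
}

def closed_reference_assign(reads, references):
    """Count reads per reference ID; reads with no match are discarded.

    Exact substring matching stands in for alignment at an identity threshold.
    """
    table = Counter()
    for read in reads:
        hits = [ref_id for ref_id, seq in references.items() if read in seq]
        if len(hits) == 1:  # keep unambiguous hits only
            table[hits[0]] += 1
    return dict(table)

# Two runs trimmed to different lengths, plus one read with no reference hit.
run_a = ["ACGTACGTACGT", "ACGTACGTACGT", "ACGTTCGTACGT"]  # 12 bases
run_b = ["ACGTACGT", "ACGTACGT", "ACGTTCGT", "GGGGGGGG"]  # 8 bases

table_a = closed_reference_assign(run_a, references)
table_b = closed_reference_assign(run_b, references)
print(table_a, table_b)
```

Both runs end up counted against the same reference IDs even though their reads differ in length. The cost is visible too: the read that matches no reference is simply dropped.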

You should rarefy all samples to the same depth if you plan to compare these for alpha or beta diversity analyses. Otherwise it would be like comparing apples and oranges!
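For anyone unsure what rarefying does, here is a minimal sketch using only the Python standard library. The sample names, counts, and the `rarefy` helper are invented for the example, and using the smallest sample's total as the depth is just one simple choice, not a recommendation.

```python
import random
from collections import Counter

def rarefy(counts, depth, seed=0):
    """Subsample `depth` reads without replacement from a {feature: count} dict.

    Returns None when the sample has fewer than `depth` reads, since such
    samples are normally dropped rather than rarefied.
    """
    pool = [feature for feature, n in counts.items() for _ in range(n)]
    if len(pool) < depth:
        return None
    rng = random.Random(seed)
    return dict(Counter(rng.sample(pool, depth)))

# Hypothetical feature tables for two samples with very different depths.
samples = {
    "cohort1_s1": {"ASV1": 500, "ASV2": 300, "ASV3": 200},
    "cohort2_s1": {"ASV1": 60, "ASV2": 30, "ASV3": 10},
}

# Assumed choice for this example: rarefy to the shallowest sample.
depth = min(sum(c.values()) for c in samples.values())
rarefied = {name: rarefy(c, depth) for name, c in samples.items()}

for name, table in rarefied.items():
    print(name, sum(table.values()), table)
```

After this step every sample has the same total number of reads, so differences in diversity metrics are less likely to reflect sequencing depth alone.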

Good luck!