V3-V4 region, training a classifier, parameters

Hi,

It can always feel a bit like trial and error (and slightly arbitrary) at this point, but you can make informed decisions by following the walkthrough, so don’t worry! Sometimes you will feel like you are going around in circles :sweat_smile:

Choosing parameters depends on a range of things (for example, your sequencing approach and target region length). In the tutorial you link to, there are note sections under the example that explain how to make an informed decision about these parameters.

For example, for --p-trunc-len it says this should only be used if “query sequences are trimmed to this same length or shorter”, and goes on to explain that for the “classification of paired-end reads and untrimmed single-end reads, we recommend training a classifier on sequences that have been extracted at the appropriate primer sites, but are not trimmed.”

The next notes give similar insight into the --p-min-length and --p-max-length choices: these can be used to remove amplicons far outside the length you were aiming for. The notes also mention how the additional trim parameters work.
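If it helps to picture what a min/max length filter does, here is a rough Python sketch. It is only an illustration of the idea, not how QIIME 2 implements it: the function name, the toy sequences and the 300/600 cut-offs are all made up for the example.

```python
def filter_by_length(seqs, min_length, max_length):
    """Keep only sequences whose length falls within [min_length, max_length]."""
    return {
        name: seq
        for name, seq in seqs.items()
        if min_length <= len(seq) <= max_length
    }


# Toy "extracted amplicons" (hypothetical data): lengths 30, 460 and 900.
amplicons = {
    "too_short": "A" * 30,
    "on_target": "ACGT" * 115,
    "too_long": "G" * 900,
}

# Illustrative cut-offs only; pick yours based on your own target region.
kept = filter_by_length(amplicons, min_length=300, max_length=600)
print(sorted(kept))
```

Only the sequence close to the expected amplicon length survives; the very short and very long ones are dropped.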

So, if your target amplicon is ~460 bp, what would count as far outside that, and what was your sequencing read length?
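One way to reason about those two questions is with a bit of arithmetic. The sketch below is just that, a sketch: the 25% tolerance is an arbitrary assumption of mine rather than a recommended value, the helper names are hypothetical, and the overlap estimate ignores primer removal and any quality truncation.

```python
def length_window(target_bp, tolerance=0.25):
    """Return (lower, upper) length bounds around a target amplicon length.

    `tolerance` is an assumed fraction, not a recommended setting.
    """
    lower = round(target_bp * (1 - tolerance))
    upper = round(target_bp * (1 + tolerance))
    return lower, upper


def paired_end_overlap(read_length, amplicon_bp):
    """Naive overlap (in bp) between forward and reverse reads of equal length."""
    return 2 * read_length - amplicon_bp


# Candidate min/max lengths for a ~460 bp amplicon.
print(length_window(460))

# Untrimmed overlap for two example read lengths.
print(paired_end_overlap(300, 460))
print(paired_end_overlap(250, 460))
```

With a ~460 bp target, 2 x 300 bp reads leave far more room to overlap than 2 x 250 bp reads, which is worth keeping in mind when you decide how much to truncate.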

If you look across the forum, many people have asked about training classifiers for the same region, so maybe see what they settled on as well? For example, there is some useful conversation here or here.

Hope that helps, :slightly_smiling_face: :dna:

Vic
