Training classifiers

Hi,

I have seen answers about training classifiers for the V3–V4 region of the 16S rRNA gene, and I am currently trying it out myself.

The command suggested by the QIIME 2 tutorial was:

qiime feature-classifier extract-reads \
  --i-sequences 85_otus.qza \
  --p-f-primer GTGCCAGCMGCCGCGGTAA \
  --p-r-primer GGACTACHVGGGTWTCTAAT \
  --p-trunc-len 120 \
  --p-min-length 100 \
  --p-max-length 400 \
  --o-reads ref-seqs.qza

However, I am not clear on the “trunc-len”, “min-length” and “max-length” options, or on how their values may affect the resulting classifier.

I have also searched for similar discussions about which values need to be set.

Currently I am using the following commands:

qiime feature-classifier extract-reads \
  --i-sequences 99_otus.qza \
  --p-f-primer CCTACGGGNGGCWGCAG \
  --p-r-primer GACTACHVGGGTATCTAATCC \
  --o-reads ref-seqs.qza

and (based on answers posted here):

qiime feature-classifier extract-reads \
  --i-sequences 99_otus.qza \
  --p-f-primer CCTACGGGNGGCWGCAG \
  --p-r-primer GACTACHVGGGTATCTAATCC \
  --p-min-length 30 \
  --o-reads ref-seqs.qza

What I am most confused about is the “trunc-len”, “min-length” and “max-length” options.

I am sorry if this is a stupid question, but I’m very new to microbiome analysis.

Thank you in advance for your time and consideration.


Good morning!

There is a tutorial just for training a feature classifier and extracting reads, and it discusses these settings. Have you tried this tutorial? :point_down:
https://docs.qiime2.org/2020.2/tutorials/feature-classifier/

I think this question is great! There's a lot to learn about QIIME 2, and it takes a while to get used to it. If you have more questions after trying that tutorial, I'm always happy to help.

Colin


Hi @colinbrislawn

Yes, I’ve already tried the tutorial at https://docs.qiime2.org/2020.2/tutorials/feature-classifier/ and was able to train the classifier.

However, I did not fully understand the

--p-trunc-len
--p-min-length
--p-max-length

options. Even so, I trained two classifiers, one with a minimum length of 30 bp and the other with both a minimum and a maximum length, and compared the results. There were no differences between them.

I am just worried whether the minimum length, maximum length and truncation options are important requirements for the analysis. My samples are Illumina 2×300 paired-end reads of the V3–V4 region of the 16S rRNA gene.

Thank you in advance.


Good morning @farisfauzimuhammad,

What questions did you have about these settings? We are always looking to make the documentation more helpful.

Comparing methods is always a good idea! I'm glad the results are consistent. But this is a little surprising because, just like you said, the read lengths you use for classification should impact performance, with longer reads usually being better. :thinking:

Let's see if some of the QIIME 2 devs have any advice!

Colin

No, these are not critical parameters. What they do, and what effect they may have on results, is described in the tutorial that @colinbrislawn directed you to.

This is not surprising: these settings probably do not affect the lengths of most (or all?) of the sequences you extracted. The min/max lengths just filter out aberrant sequences, which are probably mismatched hits; these occur with some databases but are not common. The min/max length parameters serve only as a safety catch against misprimed hits.
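To make the order of these filters concrete, here is a minimal bash sketch of how a single extracted amplicon might be handled. It is not QIIME 2 code: it assumes the behaviour described in the `extract-reads` help text (check `qiime feature-classifier extract-reads --help` for your release), namely that `--p-max-length` is checked on the full amplicon before truncation, `--p-trunc-len` keeps only the first N bases, `--p-min-length` is checked after truncation, and 0 disables each option. The function names `apply_length_options` and `make_amplicon` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper, NOT part of QIIME 2: applies the three length options
# to a single extracted amplicon in the assumed order.
# Prints the amplicon that would be kept, or nothing if it would be discarded.
# A value of 0 disables the corresponding option.
apply_length_options() {
  local amplicon="$1" trunc_len="$2" min_length="$3" max_length="$4"

  # --p-max-length (assumed): checked on the full amplicon, before truncation
  if (( max_length > 0 && ${#amplicon} > max_length )); then
    return 0
  fi

  # --p-trunc-len (assumed): keep only the first trunc_len bases
  if (( trunc_len > 0 )); then
    amplicon="${amplicon:0:trunc_len}"
  fi

  # --p-min-length (assumed): checked after truncation
  if (( min_length > 0 && ${#amplicon} < min_length )); then
    return 0
  fi

  printf '%s\n' "$amplicon"
}

# Build a toy amplicon of N identical bases.
make_amplicon() { printf 'A%.0s' $(seq 1 "$1"); }

# Try a few amplicon lengths with the tutorial's values
# (trunc-len 120, min-length 100, max-length 400).
for len in 80 110 300 450; do
  kept=$(apply_length_options "$(make_amplicon "$len")" 120 100 400)
  if [[ -n "$kept" ]]; then
    echo "${len} bp amplicon -> kept as ${#kept} bp"
  else
    echo "${len} bp amplicon -> discarded"
  fi
done
```

With these toy inputs, the loop reports the 80 bp and 450 bp amplicons as discarded, keeps the 110 bp amplicon unchanged, and cuts the 300 bp amplicon to 120 bp. Under the same assumptions, a max-length tuned for the tutorial's shorter V4 amplicons could remove longer V3–V4 amplicons, so copied values are worth checking against the expected amplicon length.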


Thank you, colinbrislawn and Nicholas_Bokulich.

colinbrislawn, I was not able to tell whether it is necessary to set values for

--p-trunc-len
--p-min-length
--p-max-length

But as Nicholas_Bokulich mentioned, they serve as safety measures when analysing the data. I think I finally understand the underlying reasons for these options. Later I will try varying the truncation, minimum length and maximum length values to better understand the differences.
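Before varying the values, a quick way to see whether the length options would remove anything at all is to look at the length distribution of the reads that `extract-reads` produced. The sketch below is a hypothetical bash/awk helper (`fasta_length_summary` is not a QIIME 2 command). It assumes the extracted reads have been exported to FASTA with `qiime tools export`, which for sequence artifacts writes a `dna-sequences.fasta` file.

```shell
#!/usr/bin/env bash
# Hypothetical helper, NOT part of QIIME 2: summarises the lengths of the
# sequences in a FASTA file (multi-line records are supported) and counts
# how many fall outside optional length bounds.
# Usage: fasta_length_summary FILE [MIN_LENGTH] [MAX_LENGTH]   (0 = no bound)
fasta_length_summary() {
  local fasta="$1" min_len="${2:-0}" max_len="${3:-0}"
  awk -v min_len="$min_len" -v max_len="$max_len" '
    # A header line closes the previous record, if any, and starts a new one.
    /^>/ { if (in_record) record(seqlen); in_record = 1; seqlen = 0; next }
    # Sequence lines: add their length to the current record.
    { seqlen += length($0) }
    END { if (in_record) record(seqlen); report() }

    function record(n) {
      count++
      if (count == 1 || n < shortest) shortest = n
      if (count == 1 || n > longest) longest = n
      if (min_len > 0 && n < min_len) too_short++
      if (max_len > 0 && n > max_len) too_long++
    }

    function report() {
      printf "sequences: %d\nshortest: %d\nlongest: %d\n", count, shortest, longest
      printf "below min (%d): %d\nabove max (%d): %d\n", min_len, too_short + 0, max_len, too_long + 0
    }
  ' "$fasta"
}

# Example usage (assumes QIIME 2 is installed and ref-seqs.qza exists):
#   qiime tools export --input-path ref-seqs.qza --output-path exported-ref-seqs
#   fasta_length_summary exported-ref-seqs/dna-sequences.fasta 100 400
```

If the shortest and longest extracted reads already fall inside the chosen bounds, changing min-length or max-length should leave the reference reads, and therefore the classifier, unchanged, which would be consistent with the identical results reported earlier in this thread.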

I would also like to suggest, if possible, adding more explanation to the documentation about why values should or should not be set for the

--p-trunc-len
--p-min-length
--p-max-length

options.

I am sure the explanations in the tutorial are sufficient and very helpful, but more explanation of this part would help a lot, especially for newcomers like me and others who are just starting microbiome analysis.

Thank you, colinbrislawn and Nicholas_Bokulich, for the explanations and support. They were very helpful.
