Questions for classifier training

Hi!

I have a few questions about classifier training, especially about extracting reference reads.

(1) The tutorial (https://docs.qiime2.org/2017.9/tutorials/feature-classifier/) says, “We know from the Moving Pictures tutorial that the sequence reads that we’re trying to classify are 100-base single-end reads that were amplified with the 515F/806R primer pair,” and that we need to use the following flags:
qiime feature-classifier extract-reads \
  --i-sequences 85_otus.qza \
  --p-f-primer GTGCCAGCMGCCGCGGTAA \
  --p-r-primer GGACTACHVGGGTWTCTAAT \
  --p-trunc-len 100 \
  --o-reads ref-seqs.qza

However, as I understand it, the Moving Pictures tutorial truncates the sequences at 120 bases.
So, should I change “--p-trunc-len 100” to “--p-trunc-len 120”?

(2) For my data, I use 150 bp forward and reverse reads, as in the “Atacama soil” tutorial, and I didn’t trim or truncate any bases when I ran denoise-paired.
To clarify, I used the following flags:

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux.qza \
  --o-table table \
  --o-representative-sequences rep-seqs \
  --p-trim-left-f 0 \
  --p-trim-left-r 0 \
  --p-trunc-len-f 150 \
  --p-trunc-len-r 150

If I want to train a classifier for this data using the 97% OTUs from the Greengenes database,
should I use the following flags?

qiime feature-classifier extract-reads \
  --i-sequences 85_otus.qza \
  --p-f-primer GTGCCAGCMGCCGCGGTAA \
  --p-r-primer GGACTACHVGGGTWTCTAAT \
  --p-trunc-len 100 \
  --o-reads ref-seqs.qza

(3) This is my last and (maybe) very basic question. According to the sequencing information, the forward and reverse primer sequences are FWD: GTGYCAGCMGCCGCGGTAA; REV: GGACTACNVGGGTWTCTAAT (for 515F-806R), which contain ambiguous bases such as Y, M, or W.
Can I just copy and paste the above primer sequences into the qiime feature-classifier extract-reads flags?

I think I have too many questions… Thank you very much in advance for your time and kind help!
Thank you.

Hi @scho73, thanks for your questions.

(1) Yes, --p-trunc-len should be 120 to match the Moving Pictures tutorial. Either I got it wrong when I wrote the tutorial or the Moving Pictures tutorial has changed.

(2) Change --p-trunc-len to 300 to cover both reads. It doesn’t matter if it’s longer than it has to be. @Nicholas_Bokulich, please correct me if I’m wrong here. (I’m not very familiar with dada2.) Also, it looks like you’re using 85% OTUs, not 97% OTUs.

(3) extract-reads can handle ambiguous characters, so yes, just leave them in.
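
To illustrate what those ambiguous characters stand for, here is a minimal shell sketch that expands the IUPAC codes in a primer into a regular expression. The function name iupac_to_regex is made up for this example, and this is only an illustration of the codes themselves, not a description of how extract-reads is implemented.

```shell
#!/bin/sh
# Hypothetical helper (not part of QIIME 2): expand IUPAC ambiguity codes
# in a primer into an extended regular expression.
# The replacements contain only A, C, G, T and brackets, so applying the
# substitutions one after another cannot re-expand an earlier result.
iupac_to_regex() {
    printf '%s\n' "$1" | sed \
        -e 's/R/[AG]/g' \
        -e 's/Y/[CT]/g' \
        -e 's/S/[CG]/g' \
        -e 's/W/[AT]/g' \
        -e 's/K/[GT]/g' \
        -e 's/M/[AC]/g' \
        -e 's/B/[CGT]/g' \
        -e 's/D/[AGT]/g' \
        -e 's/H/[ACT]/g' \
        -e 's/V/[ACG]/g' \
        -e 's/N/[ACGT]/g'
}

# The primers from question (3):
iupac_to_regex GTGYCAGCMGCCGCGGTAA    # prints GTG[CT]CAGC[AC]GCCGCGGTAA
iupac_to_regex GGACTACNVGGGTWTCTAAT   # prints GGACTAC[ACGT][ACG]GGGT[AT]TCTAAT
```

So Y in the forward primer stands for either C or T, which means the degenerate primer also covers the GTGCCAGCMGCCGCGGTAA variant used in the tutorial.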

@scho73, --p-trunc-len-f and --p-trunc-len-r should both be set to 150, as you have them. These control the forward and reverse read truncation lengths and only need to be long enough to permit some overlap between reads; they do not need to cover the full V4 amplicon length. Thanks for checking, @BenKaehler!
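
As a rough sanity check on the overlap, the arithmetic can be sketched in shell. The amplicon length of 253 nt below is an assumption (an approximate figure for the 515F/806R V4 region with primers removed); substitute the value for your own amplicon.

```shell
#!/bin/sh
# Estimate how many bases the truncated forward and reverse reads share.
trunc_len_f=150    # --p-trunc-len-f from the command above
trunc_len_r=150    # --p-trunc-len-r from the command above
amplicon_len=253   # ASSUMPTION: approximate V4 amplicon length, primers removed

# The reads start at opposite ends of the amplicon, so whatever exceeds
# the amplicon length is covered by both reads.
overlap=$((trunc_len_f + trunc_len_r - amplicon_len))
echo "expected overlap: ${overlap} nt"
```

A result at or below zero would mean the reads cannot be joined at those truncation lengths.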

Thanks @BenKaehler. To follow up on (1): we have opened an issue to update the tutorial with the correct trunc-len value. Thanks for reporting! :t_rex:

The QIIME 2 2017.11 release has the incorrect trunc-len value fixed in the tutorial! :tada: