Trimming sequences for classifier training

eDNA · August 10, 2021, 11:45pm

I plan to follow the tutorial “Training feature classifiers with q2-feature-classifier” to train a classifier from the bold_derep1_seqs.qza and bold_derep1_taxa.qza files.

This means I will use “qiime feature-classifier extract-reads” instead of Steps 4-6 of this tutorial. Is there any difference between the two approaches when it comes to trimming sequence outside the primer region?

Any concerns or suggestions?

How long might "extract-reads" take? My MacBook has a 2.6 GHz Intel Core i7 processor and 16 GB of 2133 MHz LPDDR3 memory.

Nicholas_Bokulich · August 16, 2021, 2:52pm

Hi @eDNA ,
I think that @devonorourke used steps 4-6 because the primers were absent from some reference sequences, so multiple sequence alignment and positional trimming were needed as a workaround.

We now have a function for this in RESCRIPt called trim-alignment. The difference from extract-reads is that trim-alignment trims every aligned sequence at a specific site, whereas extract-reads only trims reads that contain the primer and discards the rest. So this workflow can now be accomplished more fully within QIIME 2.
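To make that difference concrete, here is a toy Python sketch. It is not the actual extract-reads or trim-alignment code: the function names, sequences, primers, and alignment columns below are all made up for illustration, and real primer matching tolerates mismatches rather than requiring an exact match.

```python
# Toy illustration only: hypothetical helpers, not the QIIME 2 / RESCRIPt implementation.

def extract_by_primers(seqs, fwd, rev_rc):
    """Keep only sequences that contain both primer sites (exact match);
    return the region between them. Sequences missing a primer are dropped."""
    kept = {}
    for name, seq in seqs.items():
        start = seq.find(fwd)
        if start == -1:
            continue  # forward primer missing: sequence is discarded
        end = seq.find(rev_rc, start + len(fwd))
        if end == -1:
            continue  # reverse primer site missing: sequence is discarded
        kept[name] = seq[start + len(fwd):end]
    return kept


def trim_by_position(aligned, start, end):
    """Slice every aligned sequence at the same alignment columns,
    then remove gap characters. No sequence is dropped for lacking a primer."""
    return {name: seq[start:end].replace("-", "") for name, seq in aligned.items()}


# Two made-up aligned references; ref_b is truncated and lacks the forward primer.
aligned = {
    "ref_a": "AAGGTTCCATGCATGCTTGGAA",
    "ref_b": "------CCATGGATGCTTGGAA",
}
unaligned = {name: seq.replace("-", "") for name, seq in aligned.items()}

by_primer = extract_by_primers(unaligned, fwd="GGTT", rev_rc="TTGG")
by_position = trim_by_position(aligned, start=6, end=16)

print(sorted(by_primer))    # only ref_a survives primer-based extraction
print(sorted(by_position))  # both references survive positional trimming
```

In this toy case the primer-based approach loses ref_b entirely, while positional trimming keeps the same amplicon region from both references.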

I am not sure exactly how long it would take, but BOLD is very large, so aligning the sequences, trimming them, and training the classifier will take a long time, and 16 GB of RAM is most likely not enough. @devonorourke has shared his trimmed sequences and pre-trained classifiers, which are linked from the tutorial. I recommend using those to save yourself a month or more of trouble if you can!
