Multiple orientation repseqs (not fastq!): feature-classifier extract-reads modifier?

devonorourke · January 14, 2019, 10:03pm

After testing out the vsearch and blast-style consensus classifiers, I dabbled with the scikit-learn approach using the same cobbled-together COI database I created for the alignment-based classifiers. However, when I ran the extract-reads step to ensure that my reference sequences contained the primers I used to amplify my COI sequences, only about 10% of the reference sequences remained. It appears my sequences are in mixed orientations, and I suspect that explains the low number of filtered sequences I get in the output.

Is there any way the extract-reads script could be modified so that the primer sequences are searched in both the forward and reverse-complement orientations? Maybe something around this part of the code could be changed so that the user enters the forward and reverse primer sequences just once, with an additional parameter added to search for primers in both directions... kind of like what is already done for alignment searches in vsearch and blast with the --p-strand both flag, I think?

The argument could look like this:

qiime feature-classifier extract-reads \
  --i-sequences unfiltered.repseqs.qza --o-reads primerfiltered.repseqs.qza \
  --p-f-primer FORWARDPRIMER --p-r-primer REVERSEPRIMER \
  --p-min-length 200 --p-max-length 500 \
  --p-strand both
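
To illustrate what a both-strand primer search might involve, here is a minimal Python sketch. It is not the plugin's code: the function names (`reverse_complement`, `primer_regex`, `extract_amplicon`) and the `strand` argument are hypothetical, and it uses exact regex matching of IUPAC-degenerate primers rather than whatever matching extract-reads actually performs. Trimming the primers from the returned region and reorienting reverse-strand hits are also assumptions.

```python
import re

# IUPAC nucleotide codes mapped to the bases they can represent
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Complement table that also handles degenerate codes (e.g. R <-> Y)
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Turn a (possibly degenerate) primer into a regex fragment."""
    parts = []
    for base in primer.upper():
        bases = IUPAC[base]
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)


def extract_amplicon(seq, f_primer, r_primer, strand="both"):
    """Return the region between the primers, or None if they are not found.

    Hypothetical behaviour: with strand="both", the reverse complement of
    the sequence is searched as well, and any hit is returned in the
    forward-primer orientation with the primers trimmed off.
    """
    pattern = re.compile(
        primer_regex(f_primer)
        + "(.*?)"
        + primer_regex(reverse_complement(r_primer))
    )
    candidates = [seq.upper()]
    if strand == "both":
        candidates.append(reverse_complement(seq))
    for candidate in candidates:
        match = pattern.search(candidate)
        if match:
            return match.group(1)
    return None
```

With something like this, a reference sequence stored in the reverse orientation is recovered when `strand="both"` and skipped when only the given orientation is searched, which matches the kind of loss described above.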

Apologies if this already exists and I'm just missing something completely.
Thanks for the consideration!

Nicholas_Bokulich · January 15, 2019, 2:31pm

Thanks @devonorourke! I have opened an issue to track this suggestion. I agree, this would be really useful.