`qiime feature-classifier extract-reads` running for days


(Anuj Gupta) #1

Hi,

I am trying to build a classifier using the Greengenes database. Here is the command I am using:

```bash
qiime feature-classifier extract-reads \
  --i-sequences gg_13_5_otus.qza \
  --p-f-primer AATGATACGGCGACCACCGAGATCTACACTATGGTAATTGTCCTACGGGAGGCAGCAG \
  --p-r-primer CAAGCAGAAGACGGCATACGAGATGCCGCATTCGATNNNNNNNNNNNNCCGTCAATTCMTTTRAGT \
  --o-reads gg_13_5_ref_seqs.qza
```

The above command has been running for 3 days with no output. I have built a Greengenes-based classifier in the past from the same input files (i.e. the FASTA and taxonomy files) but with a different pair of primers, and that took only a few hours. Since this run has now gone on for days, I am wondering whether I am doing something wrong.

I installed QIIME 2 in a conda environment. Any help would be greatly appreciated.

Thanks
Anuj


(Nicholas Bokulich) #2

The problem is that you are asking that command to perform a very difficult alignment: your primers are LONG and contain lots of ambiguous/degenerate nucleotides, which is why the job is taking so long.
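To put a rough number on how ambiguous these primers are, here is a small bash sketch. `primer_stats` is a hypothetical helper, not part of QIIME 2: it counts the IUPAC ambiguity codes in a primer and how many concrete sequences the primer stands for. It only illustrates the point about degeneracy; it does not reproduce how `extract-reads` aligns primers internally.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report a primer's length, how many positions are
# IUPAC ambiguity codes, and how many concrete sequences it expands to.
# This is only a measure of degeneracy, not what extract-reads does.
primer_stats() {
  local seq n=1 amb=0 i base
  seq=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
  for (( i = 0; i < ${#seq}; i++ )); do
    base=${seq:i:1}
    case $base in
      A|C|G|T) ;;                                          # unambiguous base
      R|Y|S|W|K|M) amb=$((amb + 1)); n=$((n * 2)) ;;       # two possible bases
      B|D|H|V)     amb=$((amb + 1)); n=$((n * 3)) ;;       # three possible bases
      N)           amb=$((amb + 1)); n=$((n * 4)) ;;       # any base
      *) echo "unexpected character: $base" >&2; return 1 ;;
    esac
  done
  echo "length=${#seq} ambiguous=$amb variants=$n"
}
```

Each N multiplies the count by 4, so the twelve Ns in the reverse oligo above stand for 4^12 = 16,777,216 sequences on their own, whereas `primer_stats CCGTCAATTCMTTTRAGT` reports just 4 variants (one M and one R).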

Besides, it looks like you are probably using the wrong sequences as your “primers”. Only use the actual primer sequence, i.e., the part that anneals to the biological DNA. Do not input the adapter sequences, barcode sequences, etc.; you appear to have tossed all of that in there! Your reverse primer should probably be CCGTCAATTCMTTTRAGT and your forward maybe CCTACGGGAGGCAGCAG? I don’t know which primers you are actually using.
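One way to sanity-check this is to confirm that the suspected gene-specific primer sits at the 3' end of the full oligo, and to see what 5' tail (adapter, barcode, etc.) is left over. The sketch below uses a hypothetical bash helper, `split_oligo`; it does a literal suffix match, so it assumes you already know the gene-specific primer and that it appears verbatim at the 3' end of the oligo.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split a full sequencing oligo into its 5' tail
# (non-biological parts such as adapter and barcode) and the
# gene-specific primer, which must sit verbatim at the oligo's 3' end.
split_oligo() {
  local full=$1 primer=$2
  # Quoting "$primer" makes the pattern a literal suffix match.
  if [[ $full != *"$primer" ]]; then
    echo "primer is not at the 3' end of the oligo" >&2
    return 1
  fi
  # ${full%"$primer"} strips the primer from the end, leaving the tail.
  echo "tail=${full%"$primer"} primer=$primer"
}
```

For the forward oligo from the original command, `split_oligo AATGATACGGCGACCACCGAGATCTACACTATGGTAATTGTCCTACGGGAGGCAGCAG CCTACGGGAGGCAGCAG` reports the tail `AATGATACGGCGACCACCGAGATCTACACTATGGTAATTGT`, i.e. everything that should not go into `--p-f-primer`. Assuming those are indeed the primers used, the command would then become `qiime feature-classifier extract-reads --i-sequences gg_13_5_otus.qza --p-f-primer CCTACGGGAGGCAGCAG --p-r-primer CCGTCAATTCMTTTRAGT --o-reads gg_13_5_ref_seqs.qza`.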


(Anuj Gupta) #3

Hi Nicholas,

Thanks a lot for getting back to me. You were correct about the primers; I confirmed this with the authors of the data I am working on. Hopefully everything will go smoothly from now on.

Thanks again.

Anuj