Assigning SILVA taxonomy to open-reference OTU table


I am using qiime2-2019.1

I need to create a 97% clustered OTU table. I used the following command to create it:

qiime vsearch cluster-features-open-reference \
  --i-table table.qza \
  --i-sequences rep-seqs.qza \
  --i-reference-sequences 97-silva-otu-repseqs.qza \
  --p-perc-identity 0.97 \
  --o-clustered-table table-or-97-silva.qza \
  --o-clustered-sequences rep-seqs-or-97.qza \
  --o-new-reference-sequences new-ref-seqs-or-97.qza

This worked smoothly and I was able to get the BIOM file and download it. However, there is no taxonomy information in this table. From what I gathered from previous forum posts, I think I need to classify the new reference sequences that came out of the clustering step? This is how I tried to do that:

qiime feature-classifier classify-sklearn \
  --i-classifier /nfs/turbo/umms-yvjhuang/CAARS-silva/silva-132-99-515-806-nb-classifier.qza \
  --i-reads /nfs/turbo/umms-yvjhuang/CAARS-silva/new-ref-seqs-or-97.qza \
  --o-classification /nfs/turbo/umms-yvjhuang/CAARS-silva/taxonomy-97.qza

This gives a "Killed: 9" error on my personal machine, which I have been told is a memory problem. I then tried it on our high-performance cluster, on the high-memory nodes, and it gave a memory error there too.

I don’t understand how this step is using so much memory. Am I way off and doing the wrong thing? I just need the taxonomy file that goes with my OTU table. Thank you!

You are doing the right thing, and yes: if you do open-reference OTU clustering, you need to classify taxonomy separately.

There’s lots of advice in the forum archives for dealing with memory errors and reducing memory requirements with the sklearn classifier. Please search the archives for similar errors and solutions.
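For example, one memory-reduction approach that comes up often in those archives is lowering the classifier's batch size with the --p-reads-per-batch parameter, so fewer sequences are held in memory at once. A sketch (your file paths shortened here for readability; the batch size of 1000 is just an illustrative starting point):

```shell
# Classify in smaller batches to reduce peak memory usage.
# --p-reads-per-batch controls how many sequences are processed at a
# time; smaller values trade runtime for lower memory. Paths and the
# batch size below are placeholders to adapt to your setup.
qiime feature-classifier classify-sklearn \
  --i-classifier silva-132-99-515-806-nb-classifier.qza \
  --i-reads new-ref-seqs-or-97.qza \
  --p-reads-per-batch 1000 \
  --o-classification taxonomy-97.qza
```

Note that the classifier artifact itself still has to fit in memory, so this helps with the per-batch overhead, not the baseline footprint.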

In your case, you may not be requesting enough resources from your HPC — if the forum archive recommendations do not help, chat with the HPC admins to make sure you are requesting the right amount of resources for your job.

Good luck!


This job has been running on the HPC for 3 continuous days and is still not completed. Is this normal?

Yes, it sounds like you have a very large number of sequences because you used OTU clustering. If you have not already, you may want to perform abundance-based filtering to remove spurious low-abundance OTUs.
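A minimal sketch of that filtering, assuming the artifact names from your earlier commands (the minimum frequency of 10 is a placeholder; choose a threshold appropriate for your data):

```shell
# Drop OTUs observed fewer than 10 times across all samples.
qiime feature-table filter-features \
  --i-table table-or-97-silva.qza \
  --p-min-frequency 10 \
  --o-filtered-table table-or-97-silva-filtered.qza

# Keep only the representative sequences for OTUs that survived
# the filter, so the classifier has fewer sequences to process.
qiime feature-table filter-seqs \
  --i-data rep-seqs-or-97.qza \
  --i-table table-or-97-silva-filtered.qza \
  --o-filtered-data rep-seqs-or-97-filtered.qza
```

You would then classify the filtered sequences instead of the full set.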

It also looks like you are not taking advantage of multiprocessing. You can use the n_jobs parameter to parallelize this job (just make sure you are requesting the appropriate number of cores and memory from the HPC; talk to your admins for details on how to do this).
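For instance, with 4 cores requested from the scheduler, the call would look something like this (paths shortened; 4 is a placeholder to match whatever allocation you request):

```shell
# Run the classifier with 4 parallel workers. Keep in mind that
# each worker adds to the memory footprint, so scale the memory
# you request from the scheduler along with the core count.
qiime feature-classifier classify-sklearn \
  --i-classifier silva-132-99-515-806-nb-classifier.qza \
  --i-reads new-ref-seqs-or-97.qza \
  --p-n-jobs 4 \
  --o-classification taxonomy-97.qza
```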

