Import RDP classifier output taxonomy file into qiime2

yileiwu · July 29, 2021, 4:09pm

Running python3 script.py will print out the usage.

SoilRotifer · July 30, 2021, 5:01pm

Thanks for this clarification. It was a bit unclear to me which step you were referring to: making a reference database for your classifier, or filtering your data after classification. It seems like there are two separate discussions occurring here.

If you only care about filtering after classification, then it seems you are already doing everything correctly as you have:

qiime taxa filter-table \
    --i-table table-dada2-4.qza \
    --i-taxonomy ncbi-taxonomy-4.qza \
    --p-include k__Bacteria \
    --o-filtered-table ncbi-Bacteria-dada2-table-4.qza

I'd suggest the more explicit approach of filtering by exclusion, as it makes it easier to say exactly what you've removed. But that is just my personal preference.

--p-exclude d__Archaea,d__Eukaryota,Unclassified
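To make the idea of filtering by exclusion concrete, here is a toy Python sketch. It is not QIIME 2 code: the function name, feature IDs, and taxonomy strings are all made up for illustration, and it assumes a plain case-sensitive substring match on comma-separated terms, which may differ from how the real tool handles case.

```python
# Toy illustration only: this is NOT QIIME 2's implementation.
# Feature IDs and taxonomy strings below are hypothetical.

def filter_by_exclusion(taxonomy, exclude):
    """Return feature IDs whose taxonomy contains none of the exclude terms.

    `exclude` is a comma-separated string, mirroring the shape of the
    --p-exclude argument. Matching here is a simple substring test.
    """
    terms = [t.strip() for t in exclude.split(",") if t.strip()]
    return [
        fid
        for fid, tax in taxonomy.items()
        if not any(term in tax for term in terms)
    ]


taxonomy = {
    "feat1": "d__Bacteria; p__Proteobacteria",
    "feat2": "d__Archaea; p__Euryarchaeota",
    "feat3": "d__Eukaryota",
    "feat4": "Unclassified",
    "feat5": "d__Bacteria; p__Firmicutes",
}

kept = filter_by_exclusion(taxonomy, "d__Archaea,d__Eukaryota,Unclassified")
print(kept)  # ['feat1', 'feat5']
```

The point of the sketch is that the exclusion list doubles as a record of what was dropped, whereas an inclusion filter only tells you what was kept.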

An easier and more consistent way to filter your sequence file would be to use your new table to keep the matching sequences. That is, instead of doing this:

qiime taxa filter-seqs \
  --i-sequences rep-seq-dada2-4.qza \
  --i-taxonomy ncbi-taxonomy-4.qza \
  --p-include d__Bacteria  \
  --o-filtered-sequences ncbi-Bacteria-sequences-4.qza

Do this:

qiime feature-table filter-seqs \
    --i-data ./rep-seq-dada2-4.qza \
    --i-table ./ncbi-Bacteria-dada2-table-4.qza  \
    --o-filtered-data ./ncbi-Bacteria-sequences-4.qza

This way you do not have to worry about typing the same filtering commands twice.
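Conceptually, filtering the sequences by the table just keeps the sequences whose feature IDs still appear in the filtered table. A minimal Python sketch of that idea follows; it is not QIIME 2 code, and the function name, IDs, and sequences are hypothetical.

```python
# Toy illustration only: this is NOT QIIME 2's implementation.
# Feature IDs and sequences below are hypothetical.

def filter_seqs_by_table(sequences, table_feature_ids):
    """Keep only the sequences whose feature ID is present in the table."""
    keep = set(table_feature_ids)
    return {fid: seq for fid, seq in sequences.items() if fid in keep}


# Representative sequences before any taxonomy-based filtering.
sequences = {
    "feat1": "ACGTACGT",
    "feat2": "TTGGCCAA",
    "feat3": "GGGTTTAA",
}

# Feature IDs that survived the table filtering step.
filtered_table_ids = ["feat1", "feat3"]

filtered_seqs = filter_seqs_by_table(sequences, filtered_table_ids)
print(filtered_seqs)  # {'feat1': 'ACGTACGT', 'feat3': 'GGGTTTAA'}
```

Because the sequences are derived from the table, the two can never drift apart, which is the main advantage over repeating the taxonomy filter on each file.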

I assume you are asking how to remove multiple unwanted features, such as those with unreliable taxonomy? Take a look at the examples referenced here:

-Mike
