Using 18S data to analyze protists

Dear all,
I want to use my 18S data to analyze protists. I have finished the taxonomy classification against the SILVA 132 database in QIIME 2 2018.11, but I don't know how to keep only the protist sequences. I don't know at which taxonomic level I should remove taxa. And why do Fungi appear at the D3 level?

Hi @YuZhang,

If you are only interested in protists, you can filter your data by taxonomy.

PCR & Sequencing primers are always a bit “leaky” and often will hit off-targets. In this case I would not be surprised if you hit many non-protist eukaryotes.

A couple of tips:

  • Update to the new QIIME 2-formatted SILVA 138 reference databases available here, or make your own SILVA database using RESCRIPt. I recommend these because they have a Greengenes-like taxonomy annotation, with none of that D_2__, … mess.

  • Also, there is a protist specific reference database you can try, PR2, and the PR2 primer database. I’ve not made use of this myself, but figured it might be of interest to you.



Thanks, sir. Maybe I used the wrong reference database. I can't update my QIIME 2 for some reason, so the easy method is to use Greengenes, right?

If I update QIIME 2, can I use the table.qza generated with the old version directly in the new version, or continue other analyses with the files generated by the old version?

Hi @YuZhang,
I highly recommend using the SILVA database.
Updating QIIME 2 is not what you might think. You install the new version in a way that leaves the old installation usable too.
In my experience, you can use a previously generated table in the new QIIME 2 version.
Good luck.

I used the SILVA database before, but the result is a mess. As in my first post, I don't know how to exclude non-protist taxonomy. If I filter my data by [taxonomy], I need to know which taxa are protists. Where can I get that?

Mike suggested some lovely methods! You can try them one by one. If you do not want to do more computational steps, I recommend filtering out non-protists by taxonomy (remove or exclude the non-protist taxa). Look at the feature-table plugin; he also shared its link above.
Please read the filtering section. You will surely get some ideas.
Keep your fingers crossed.

Thanks, sir. I know how to filter. My question now is that I don't know which taxa are protists and which are not.
Is there a table listing the taxonomy that belongs to protists?

Sir, I used Greengenes,

but it seems it doesn't support eukaryotes.

My commands are here:
qiime feature-classifier classify-sklearn \
  --i-classifier /share/disk0/database/16s_18s_database/qiime2_2018_11/gg-13-8-99-nb-classifier.qza \
  --i-reads core_data/rep-seq-dada2-1.qza \
  --o-classification core_data/taxonomy_gg.qza

qiime metadata tabulate \
  --m-input-file core_data/taxonomy_gg.qza \
  --o-visualization core_data/taxonomy_gg.qzv

qiime taxa barplot \
  --i-table core_data/table-dada2-1.qza \
  --i-taxonomy core_data/taxonomy_gg.qza \
  --m-metadata-file core_data/metadata.tsv \
  --o-visualization core_data/taxonomy-gg-bar-plots.qzv

You should use an 18S database! Download SILVA from the source here, then open the file and pick the 18S database instead of the 16S one. As I see it, you want to explore eukaryotes (protists) but you have used a 16S database, if I am not mistaken. The screenshot you shared shows bacterial ranks, which should not appear with an 18S database; that’s why I think this went wrong.

Hi @YuZhang,

@TurboQiimer is correct: Greengenes only contains 16S rRNA genes, not 18S. SILVA and PR2 are your best options.

I would simply download the taxonomy data from the PR2 database I linked earlier in this thread, and compile a list of taxonomies you wish to keep or exclude. Protists are not a natural clade, so you’ll simply have to do the work of compiling the taxonomy. One place to start is the Protist wiki, though I am not sure how accurate it is. You can confirm by searching the listed taxonomies via the GenBank taxonomy page.

A quick approach would be to simply discard any data that are not Eukaryotes. Then you’ll have a smaller list of data to work through in order to determine if you have protists in your data.
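
A rough sketch of that first pass, assuming you have exported your taxonomy artifact to a tab-separated file with Feature ID, Taxon and Confidence columns, and that it uses SILVA 132-style D_0__ labels. The file name, feature IDs and lineages below are invented for illustration. It drops everything that is not a eukaryote and tallies the remaining features by their first three ranks, which gives you a short list to check by hand:

```shell
# Stand-in for an exported taxonomy.tsv; these rows are invented for illustration.
{
  printf 'Feature ID\tTaxon\tConfidence\n'
  printf 'f1\tD_0__Eukaryota;D_1__SAR;D_2__Alveolata;D_3__Ciliophora\t0.99\n'
  printf 'f2\tD_0__Eukaryota;D_1__SAR;D_2__Alveolata;D_3__Dinoflagellata\t0.95\n'
  printf 'f3\tD_0__Eukaryota;D_1__Opisthokonta;D_2__Holozoa;D_3__Metazoa\t0.98\n'
  printf 'f4\tD_0__Bacteria;D_1__Proteobacteria\t0.90\n'
  printf 'f5\tUnassigned\t0.50\n'
} > taxonomy.tsv

# Keep only rows whose lineage starts with Eukaryota, then count features
# per group defined by the first three ranks (D_0 to D_2).
awk -F'\t' '
  NR > 1 && $2 ~ /^D_0__Eukaryota/ {
    n = split($2, ranks, /; */)
    key = ranks[1]
    for (i = 2; i <= 3 && i <= n; i++) key = key ";" ranks[i]
    count[key]++
  }
  END { for (k in count) print k "\t" count[k] }
' taxonomy.tsv | sort > euk_groups.tsv

cat euk_groups.tsv
```

Each line of euk_groups.tsv is one group with its feature count, so you only have to look up a handful of group names rather than every feature.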



Thanks, sir! I will try it.

Maybe it is not the most efficient way to do it, but I filtered the higher eukaryotes out of my 18S feature table manually. I deleted from the *.tsv file of the 18S feature table all taxonomic IDs belonging to plants and animals. Then I converted the *.txt file into a biom file:

biom convert -i biom.txt -o biom_json.biom --table-type="OTU table" --to-json

qiime tools import --input-path biom_json.biom --type 'FeatureTable[Frequency]' --input-format BIOMV100Format --output-path biom.qza

Subsequently, when you analyze the alpha diversity of your samples using the new biom table, an index such as Chao1 will have higher values.

Thanks. How did you identify which are plants and animals?

For instance, these definitely belong to higher eukaryotes (animals and land plants, respectively):
D_0__Eukaryota; D_1__Opisthokonta; D_2__Holozoa; D_3__Metazoa; D_4__Animalia
D_0__Eukaryota; D_1__Archaeplastida; D_2__Chloroplastida; D_3__Charophyta; D_4__Phragmoplastophyta; D_5__Streptophyta; D_6__Embryophyta

Thanks, I got it. So I should check at the D_4__ level and remove plants and animals? And what’s left is the protists?

For my dataset, it was enough to remove all the entries mentioned above. Animalia is easy to identify and remove. Plants might be trickier. Your dataset could be different, so check for plant species.
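
If you would rather not edit the table by hand, the same exclusion can be sketched in shell on an exported taxonomy file. This is only an illustration under assumptions: the file name and rows are invented, and the two patterns (Metazoa, Embryophyta) come from the lineages above, so extend the list (Fungi, for example) for your own data. It writes the IDs of the eukaryotic features that match neither pattern:

```shell
# Invented example rows in the layout of an exported taxonomy.tsv.
{
  printf 'Feature ID\tTaxon\tConfidence\n'
  printf 'f1\tD_0__Eukaryota;D_1__SAR;D_2__Alveolata\t0.99\n'
  printf 'f2\tD_0__Eukaryota;D_1__Opisthokonta;D_2__Holozoa;D_3__Metazoa;D_4__Animalia\t0.98\n'
  printf 'f3\tD_0__Eukaryota;D_1__Archaeplastida;D_2__Chloroplastida;D_3__Charophyta;D_4__Phragmoplastophyta;D_5__Streptophyta;D_6__Embryophyta\t0.97\n'
  printf 'f4\tD_0__Eukaryota;D_1__Archaeplastida;D_2__Chloroplastida;D_3__Chlorophyta\t0.93\n'
  printf 'f5\tD_0__Bacteria;D_1__Cyanobacteria\t0.90\n'
} > taxonomy.tsv

# Groups to drop; an assumption based on the lineages in this thread.
exclude='Metazoa|Embryophyta'

# Keep eukaryotes whose lineage matches none of the excluded groups,
# and write their feature IDs under a header line.
awk -F'\t' -v pat="$exclude" '
  NR == 1 { print "feature-id"; next }
  $2 ~ /^D_0__Eukaryota/ && $2 !~ pat { print $1 }
' taxonomy.tsv > keep_ids.tsv

cat keep_ids.tsv
```

I have not run the next step end to end, but a list like keep_ids.tsv should be usable as the --m-metadata-file input of qiime feature-table filter-features. Alternatively, qiime taxa filter-table with --p-include Eukaryota and --p-exclude Metazoa,Embryophyta does the taxonomy matching directly on the artifacts, without any export.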
