over 70 % of reads are assigned as d__Bacteria;__;__;__;__;__;__ qiime feature-classifier classify-sklearn

Hi @jrhaulung,

It appears that SILVA does not have many representatives for Paramecia. If you go here and search for Paramecium bursaria, you'll see that there are ~29 entries. If you search for Paramecium, you'll find 262 references, but many of them are quite short and may not span the V1V2 region.
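As a rough first check on the "quite short" point, a small Python sketch like the one below could tally how many reference sequences in a FASTA export fall under a chosen length. The function names, the toy records, and the `min_length` value are all made up for illustration, and length alone does not prove that a sequence covers V1V2; it only flags references that are too short to be worth checking further.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one first.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_by_length(records, min_length):
    """Split record headers into (long_enough, too_short) by sequence length."""
    long_enough, too_short = [], []
    for header, seq in records:
        if len(seq) >= min_length:
            long_enough.append(header)
        else:
            too_short.append(header)
    return long_enough, too_short


# Toy records, NOT real SILVA or GenBank data; min_length is an arbitrary
# illustrative cutoff, not a recommended value for V1V2.
example = StringIO(">ref_A\nACGTACGTAC\nGTACGT\n>ref_B\nACGT\n>ref_C\nACGTACGTACGT\n")
kept, dropped = split_by_length(read_fasta(example), min_length=10)
print(f"{len(kept)} kept, {len(dropped)} dropped: {dropped}")
```

To use it on real data, you would pass an open file handle for your exported FASTA in place of the `StringIO` example and pick a cutoff that makes sense for your amplicon.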

I downloaded your data and trained the classifier on the full SILVA 138.2 database, and observed essentially the same results. Given the lack of references I mentioned above, and the many hits I obtain with GenBank, perhaps you could make your own database from GenBank? Many of your reads hit quite well there.

You can modify the above tutorials as needed to pull down the taxonomic groups / gene sequences you are interested in.

-Mike
