taxonomic classification after feature distinction

Lennon_Lee · October 20, 2020, 8:19am

Dear all,

I have performed 16S amplicon data analysis with QIIME 2 several times. I process the data on my own MacBook (16 GB RAM). Every time I train the naive Bayes classifier on the SILVA reference database, the program is killed because it runs out of memory (training on Greengenes works).

I run DADA2 for sequence denoising. Could I identify the distinguishing features (differential abundance testing) before taxonomic classification? For example, I would like to run ANCOM (or ALDEx2, LEfSe) on the ASV table to obtain the distinct features. With only a few distinct features, I could easily BLAST their representative sequences to get the taxonomic information.

Would that cause any problems?

Thank you very much

llenzi · October 20, 2020, 1:12pm

Hi @Lennon_Lee,

Sure, please have a look at the Moving Pictures tutorial (https://docs.qiime2.org/2020.8/tutorials/moving-pictures/). The commands proposed for the differential abundance test start from ‘table.qza’, which is the output of DADA2 (or Deblur, depending on which you prefer). You may still want to do some filtering before the ANCOM test, to remove possible noise and reduce the running time.
ALDEx2 works on the output of DADA2/Deblur, so the same applies here.
On the other hand, I think LEfSe requires taxonomically assigned ASVs, so you may not be able to run it.
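
To illustrate what "a bit of filtering" could mean, here is a minimal sketch in plain Python, not QIIME 2 itself. The table layout (a dict mapping ASV IDs to per-sample counts), the function name `filter_features`, and both thresholds are assumptions made up for illustration. In practice you would apply the same kind of filter to the table.qza artifact with QIIME 2's own feature-table filtering.

```python
from typing import Dict, List


def filter_features(
    table: Dict[str, List[int]],
    min_total: int = 10,   # hypothetical threshold: minimum total count
    min_samples: int = 2,  # hypothetical threshold: minimum number of samples
) -> Dict[str, List[int]]:
    """Keep features whose total count and prevalence meet both thresholds."""
    kept = {}
    for feature_id, counts in table.items():
        total = sum(counts)
        prevalence = sum(1 for c in counts if c > 0)
        if total >= min_total and prevalence >= min_samples:
            kept[feature_id] = counts
    return kept


# Toy ASV table: three features across four samples (made-up numbers).
toy_table = {
    "asv_a": [120, 80, 0, 45],  # abundant and widespread: kept
    "asv_b": [0, 0, 3, 0],      # total count too low: dropped
    "asv_c": [0, 25, 0, 0],     # present in one sample only: dropped
}
filtered = filter_features(toy_table)
print(sorted(filtered))
```

Rare, low-prevalence features like these add tests without adding much signal, which is why removing them can cut both noise and running time.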

Good luck

Nicholas_Bokulich · October 20, 2020, 1:50pm

Just adding to @llenzi's great advice:

Since you are using 16S data, I recommend using one of the pre-trained 16S classifiers that we supply on the QIIME 2 website. You do not need to train your own. You could also use one of the other classification methods in QIIME 2 (for example, the BLAST- or VSEARCH-based consensus classifiers), which tend to have lower memory requirements.

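As for testing for differential abundance first and then BLASTing the significant features: yes, in theory that is fine, with just a couple of caveats: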

1. You need to inspect the BLAST results closely. There can be other equally good or nearly as good hits in the output, and you need to manually inspect coverage and % identity to evaluate how reliable the hits are, and whether you "trust" the species-level hits you will receive. QIIME 2 automates this process for you to evaluate the confidence of classification.
2. You will not have taxonomic information on the other features present, so you are limiting the amount of information you obtain from the data! At the very least I recommend using the pre-trained classifier of choice to classify everything and look at overall taxonomic composition, then you can use BLAST to see what it says about the significant ASVs.
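
The manual inspection in the first caveat can be partly scripted. The sketch below assumes BLAST+ tabular output (`-outfmt 6`) with the default twelve columns. The function name `summarize_hits`, the `query_lengths` lookup that you would have to supply yourself, and the 1% bit-score margin used to call a hit "nearly as good" are all hypothetical choices, not anything QIIME 2 or BLAST prescribes.

```python
import csv
import io
from collections import defaultdict

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def summarize_hits(blast_tsv, query_lengths, bitscore_margin=0.01):
    """For each query, report the best hit and any hits nearly as good."""
    hits = defaultdict(list)
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        rec = dict(zip(COLUMNS, row))
        qlen = query_lengths[rec["qseqid"]]
        aligned = abs(int(rec["qend"]) - int(rec["qstart"])) + 1
        hits[rec["qseqid"]].append({
            "subject": rec["sseqid"],
            "pident": float(rec["pident"]),
            "coverage": 100.0 * aligned / qlen,  # % of the query aligned
            "bitscore": float(rec["bitscore"]),
        })
    summary = {}
    for query, query_hits in hits.items():
        query_hits.sort(key=lambda h: h["bitscore"], reverse=True)
        best = query_hits[0]
        cutoff = best["bitscore"] * (1.0 - bitscore_margin)
        rivals = [h["subject"] for h in query_hits[1:]
                  if h["bitscore"] >= cutoff]
        summary[query] = {"best": best, "rivals": rivals}
    return summary


# Made-up example: one 250 bp ASV with three database hits.
rows = [
    ["asv_a", "ref_1", "100.000", "250", "0", "0", "1", "250", "10", "259", "1e-130", "462"],
    ["asv_a", "ref_2", "99.600", "250", "1", "0", "1", "250", "5", "254", "5e-129", "460"],
    ["asv_a", "ref_3", "95.000", "200", "10", "0", "1", "200", "1", "200", "1e-90", "320"],
]
toy_blast = "\n".join("\t".join(r) for r in rows)
summary = summarize_hits(toy_blast, {"asv_a": 250})
print(summary["asv_a"]["best"]["subject"], summary["asv_a"]["rivals"])
```

A non-empty `rivals` list is a sign that the top hit alone should not be trusted at species level, since another reference matches almost equally well.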

Good luck!

Lennon_Lee · October 20, 2020, 2:42pm

Thanks, @llenzi and @Nicholas_Bokulich, for the very helpful advice!!
