Seeking Advice on Analyzing Overlap in Taxonomic Classification Results

Antani · December 29, 2023, 11:41am

Dear QIIME2 Community,

I'm currently working on the analysis of my samples where both ITS1 and ITS2 regions have been amplified. I conducted separate analyses for each set of samples.

I used the qiime feature-classifier classify-sklearn method to classify my representative sequences. Now, I'm interested in understanding the level of taxonomic overlap between the ITS1 and ITS2 sets.

Specifically, I would like to know how many sequences have been classified as the same in both datasets. Could you please provide suggestions on how I can obtain this information? Ideally, I would like to generate a list of accession numbers for these shared sequences.

Thank you so much for your assistance and any suggestions you can offer.

Best,
Edo

lizgehret · December 29, 2023, 7:51pm

Hi @Antani!

So, just to make sure I understand - you used the same taxonomic classifier on two sets of rep seqs? One with the ITS1 region amplified, and the other with ITS2?

Antani · December 29, 2023, 11:37pm

Hi @lizgehret
Correct, same taxonomic classifier on both.
I used the UNITE ITS database v9.

Thank you so much for your help!

lizgehret · January 5, 2024, 11:02pm

Hey @Antani,

So sorry for the delayed response here!

Unfortunately, there isn't an action within QIIME 2 that will accomplish this. I think your best bet would be to compare your rep-seqs files in Python or R, returning a list of IDs for all matching values between the two files.
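As a rough illustration of that kind of comparison, here is a minimal Python sketch. It assumes each classification result has been exported to a tab-separated file with `Feature ID` and `Taxon` columns; the column names, the helper names `load_taxonomy` and `shared_taxa`, and the tiny in-memory tables are all assumptions made for the example, not something taken from your data. It matches on the full taxonomy string, so two features count as shared only if their assignments are identical at every rank.

```python
import csv
import io
from collections import defaultdict


def load_taxonomy(handle):
    """Map each taxonomy string to the set of feature IDs assigned to it.

    `handle` is any open text file with tab-separated columns that
    include "Feature ID" and "Taxon" (assumed column names).
    """
    taxon_to_ids = defaultdict(set)
    for row in csv.DictReader(handle, delimiter="\t"):
        taxon_to_ids[row["Taxon"].strip()].add(row["Feature ID"].strip())
    return taxon_to_ids


def shared_taxa(its1, its2):
    """Return {taxon: (ITS1 feature IDs, ITS2 feature IDs)} for taxa in both."""
    return {
        taxon: (sorted(its1[taxon]), sorted(its2[taxon]))
        for taxon in sorted(its1.keys() & its2.keys())
    }


# Made-up example tables standing in for the two exported files.
ITS1_TSV = (
    "Feature ID\tTaxon\tConfidence\n"
    "its1_a\tk__Fungi;p__Ascomycota\t0.99\n"
    "its1_b\tk__Fungi;p__Basidiomycota\t0.91\n"
)
ITS2_TSV = (
    "Feature ID\tTaxon\tConfidence\n"
    "its2_a\tk__Fungi;p__Ascomycota\t0.97\n"
    "its2_b\tk__Fungi;p__Mucoromycota\t0.88\n"
)

its1 = load_taxonomy(io.StringIO(ITS1_TSV))
its2 = load_taxonomy(io.StringIO(ITS2_TSV))

for taxon, (ids1, ids2) in shared_taxa(its1, its2).items():
    print(taxon, ids1, ids2, sep="\t")
```

With real data you would pass `open("path/to/file.tsv", newline="")` in place of the `io.StringIO(...)` objects. If exact matching turns out to be too strict, trimming each taxonomy string to a chosen rank before comparing is one option.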

Hope this helps! Cheers

lizgehret · January 8, 2024, 4:22pm

Hi @Antani,

Just following up on this - there are a couple of options within QIIME 2 that will get you some of what you want (and might get you close enough without having to write a custom Python/R script).

q2-quality-control has the evaluate-taxonomy action, which compares a pair of observed and expected taxonomic assignments to calculate precision, recall, and F-measure at each taxonomic level, up to the maximum level specified by the depth parameter. RESCRIPt also has an action with the same name that produces a similar output.

These actions won't get you a list of IDs, but will provide you with general quantification of the amount of overlap between the two datasets.
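To give a feel for what a per-level precision/recall/F-measure summary means, here is a small Python sketch. It is only an illustration of the idea under simple assumptions (taxonomy strings are semicolon-delimited, and the sets of distinct taxa are compared at each level); it is not the plugin's actual implementation, and the names `truncate` and `overlap_by_level` are made up for the example.

```python
def truncate(taxon, level):
    """Keep the first `level` ranks of a semicolon-delimited taxonomy string.

    Returns None when the string has fewer than `level` ranks.
    """
    ranks = [rank.strip() for rank in taxon.split(";")]
    if len(ranks) < level:
        return None
    return ";".join(ranks[:level])


def overlap_by_level(expected, observed, depth):
    """Compare two lists of taxonomy strings at levels 1..depth.

    Returns a list of (level, precision, recall, f_measure) tuples, where
    precision = shared / observed taxa and recall = shared / expected taxa.
    """
    results = []
    for level in range(1, depth + 1):
        exp = {t for t in (truncate(x, level) for x in expected) if t}
        obs = {t for t in (truncate(x, level) for x in observed) if t}
        shared = exp & obs
        precision = len(shared) / len(obs) if obs else 0.0
        recall = len(shared) / len(exp) if exp else 0.0
        total = precision + recall
        f_measure = 2 * precision * recall / total if total else 0.0
        results.append((level, precision, recall, f_measure))
    return results


# Made-up assignments standing in for the two classification results.
its1_taxa = ["k__Fungi;p__Ascomycota", "k__Fungi;p__Basidiomycota"]
its2_taxa = ["k__Fungi;p__Ascomycota", "k__Fungi;p__Mucoromycota"]

for level, precision, recall, f_measure in overlap_by_level(its1_taxa, its2_taxa, depth=2):
    print(level, precision, recall, f_measure)
```

Which dataset you treat as "expected" and which as "observed" is arbitrary here; swapping them swaps precision and recall.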

Hope this helps! Cheers

Nicholas_Bokulich · January 8, 2024, 5:20pm

Hi @Antani,

See also qiime rescript evaluate-classifications.

Good luck!
