Taxonomy feature frequency

Hi all-
In Q2 I generated a dada2 feature table with the following overview summary numbers:

Number of samples: 1 (I'm working through these one sample at a time)

Number of features: 36

Total frequency: 19,781 (I assume this is the total number of sequences that passed filter in dada2)

The feature detail then goes on to break down the frequency of sequences within each feature. I pushed these through the feature-classifier and generated a taxonomy artifact and visualization.

Because there are (relatively) few features, I manually matched the feature IDs from the feature detail (dada2 table output) to the taxonomy .qzv (exported into Excel) so I could pair the observed frequencies with the taxa.
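For anyone doing the same manual matching: it is just a join on the shared feature ID. Here is a minimal sketch, assuming the feature table and taxonomy have already been exported into simple ID-to-value mappings. The feature IDs, frequencies, and the "unclassified" fallback label are all hypothetical, for illustration only.

```python
# Hypothetical exported data: feature ID -> frequency, and feature ID -> taxon.
feature_freqs = {"feat_a": 120, "feat_b": 45, "feat_c": 8}
taxonomy = {
    "feat_a": "k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; "
              "f__Streptococcaceae; g__Streptococcus",
    "feat_b": "k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria",
    "feat_c": "k__Bacteria",
}


def pair_frequencies_with_taxa(freqs, taxa):
    """Join two mappings on their shared feature IDs.

    Features with no taxonomy entry get the hypothetical label "unclassified".
    Rows are returned most abundant first.
    """
    rows = [(fid, freq, taxa.get(fid, "unclassified")) for fid, freq in freqs.items()]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


for feature_id, freq, taxon in pair_frequencies_with_taxa(feature_freqs, taxonomy):
    print(feature_id, freq, taxon, sep="\t")
```

With more complex samples the same idea scales without any copy-pasting, since the ID is the only key needed.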

Since both files share the same Feature ID designations, it seemed reasonable to expect a way to combine them so I wouldn't have to do it manually, especially for more complex samples. I found a thread that discussed how to do this with the qiime taxa collapse command, which I ran with --p-level 6 (to collapse to the genus level). I then extracted the collapsed table and looked at the resulting .biom file, which displayed the following:

My question is: why are some genus-level classifications not listed in the collapsed table's .biom file? And why is the k__Bacteria-only feature (8) included? The taxa that are listed show the correct frequencies. The bulk of the sequences were grouped into 6 features, all of which were classified down to g__Streptococcus, yet they were not represented in the collapsed table?
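To make the collapse step concrete, here is a rough sketch of the idea behind collapsing at --p-level 6: truncate each taxonomy string to the first 6 ranks and sum the frequencies of features that end up with the same label. This is an illustration only, not QIIME's actual implementation, and it does not pad labels that have fewer ranks. The feature IDs and frequencies are hypothetical.

```python
from collections import defaultdict

STREP = ("k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; "
         "f__Streptococcaceae; g__Streptococcus")

# Hypothetical data: six features that all classify to g__Streptococcus,
# plus one feature classified only to k__Bacteria.
feature_freqs = {"s1": 5000, "s2": 4000, "s3": 3000, "s4": 2000,
                 "s5": 1500, "s6": 1000, "k1": 8}
taxonomy = {fid: STREP for fid in ["s1", "s2", "s3", "s4", "s5", "s6"]}
taxonomy["k1"] = "k__Bacteria"


def collapse_to_level(freqs, taxa, level):
    """Sum frequencies of features whose taxonomy matches after truncation to `level` ranks.

    Labels with fewer than `level` ranks are kept as-is; this sketch does not pad them.
    """
    collapsed = defaultdict(int)
    for feature_id, freq in freqs.items():
        ranks = [rank.strip() for rank in taxa[feature_id].split(";")]
        collapsed[";".join(ranks[:level])] += freq
    return dict(collapsed)


for label, total in collapse_to_level(feature_freqs, taxonomy, level=6).items():
    print(total, label, sep="\t")
```

The six Streptococcus features become a single genus-level row whose frequency is their sum, so a collapsed table will have fewer rows than the original feature table.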

Any help would be appreciated!
S

Hi @Sausage_Mahoney!!

Regarding all of your steps up to your final screenshot, I think that all makes sense. There are certainly some pain points, and hopefully we will be able to remove some of those hurdles in the near future.

Your screenshot is just showing the head (the first few lines) of the biom file. What does it look like if you run feature-table summarize on the collapsed feature table? That will show you details about the entire file.
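To illustrate why the head can be misleading: rows appear in whatever order they are stored, so the most abundant taxon may not be among the first few. A minimal sketch, assuming a hypothetical collapsed table held as (label, frequency) pairs; the genus names other than Streptococcus are placeholders, and the summary here is only loosely in the spirit of feature-table summarize, not its actual output.

```python
# Hypothetical collapsed table, in the order the rows happen to appear in the file.
collapsed_rows = [
    ("k__Bacteria;p__PhylumA;c__ClassA;o__OrderA;f__FamilyA;g__GenusA", 40),
    ("k__Bacteria;p__PhylumB;c__ClassB;o__OrderB;f__FamilyB;g__GenusB", 25),
    ("k__Bacteria;p__PhylumC;c__ClassC;o__OrderC;f__FamilyC;g__GenusC", 8),
    ("k__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;"
     "f__Streptococcaceae;g__Streptococcus", 16500),
]


def head(rows, n=3):
    """Return only the first n rows, like peeking at the top of a file."""
    return rows[:n]


def summarize(rows):
    """Look at every row: count, total frequency, and rows sorted by frequency."""
    return {
        "taxa": len(rows),
        "total_frequency": sum(freq for _, freq in rows),
        "by_frequency": sorted(rows, key=lambda row: row[1], reverse=True),
    }


in_head = any("g__Streptococcus" in label for label, _ in head(collapsed_rows))
summary = summarize(collapsed_rows)
print("Streptococcus in head:", in_head)
print("Total frequency:", summary["total_frequency"])
print("Most abundant:", summary["by_frequency"][0][0])
```

Streptococcus is missing from the first three rows but is the most abundant taxon once the whole table is considered.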

As for the k__Bacteria-only feature: it is included because that particular sequence is only annotated to the kingdom level. The collapse tool needs to pad out the missing levels below it, because other tools tend to expect that. This particular sequence variant is the same one listed in your spreadsheet screenshot as ID 0b3e3d7.....
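Here is a small sketch of what that padding means: a kingdom-only label gets placeholder ranks appended so that every label has the same depth. The "__" placeholder is an assumption made for this illustration; check your own collapsed table for the exact string QIIME uses.

```python
def pad_taxonomy(taxon, depth, placeholder="__"):
    """Pad a taxonomy string with placeholder ranks so it has exactly `depth` levels.

    The default placeholder "__" is an assumption for illustration.
    Labels deeper than `depth` are truncated.
    """
    ranks = [rank.strip() for rank in taxon.split(";")]
    ranks.extend([placeholder] * (depth - len(ranks)))
    return ";".join(ranks[:depth])


print(pad_taxonomy("k__Bacteria", 6))
# k__Bacteria;__;__;__;__;__
```

So a genus-level (level 6) collapse still produces a row for the kingdom-only feature; its label simply has empty ranks below k__Bacteria.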

As for the missing g__Streptococcus features, see my note/question above about feature-table summarize. I suspect the taxon is present; you just aren't seeing it in the head listing of the biom file.


Keep us posted! Thanks for all the great screenshots, this really helped out! :tada:


Hi @thermokarst!
Got it all sorted out... thanks for the assist! User support from QIIME is on point. :trophy:
Best
S :hotdog:

