Feature numbers

Dear all,

I saw there are 42,039 features obtained after quality control, but how come only 978 features ended up in my feature table?

Detail: in the overview, the metrics section shows that the number of features is 42039. I then did taxonomy classification and collapsed the table to level 7. When I exported it to TSV format and opened it, I could see only 978 rows in the feature table, which is far fewer than 42039. How should I interpret this?

thanks

Hi @hongwei2017,
Nothing to worry about — based on your description, it sounds like you have 42039 sequence variants (or OTUs) and 978 unique species-level taxa. In other words, 42039 unique sequences (or OTUs) were detected in your samples, but when you collapse at level 7 (species) you reduce the number of features because you are no longer looking at individual variants/OTUs, you are collapsing these features according to their taxonomic affiliation.
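
To make the collapsing step concrete, here is a minimal sketch in plain Python. It is not the QIIME 2 implementation: the toy table, the sequence IDs, the taxonomy strings, and the function name `collapse_by_taxonomy` are all made up for illustration. It also assumes every feature has an annotation down to level 7, so it does not show how shallower annotations are handled. The point is only that features sharing a taxonomy label are summed together, not discarded.

```python
from collections import defaultdict

# Hypothetical toy data: four sequence variants observed in two samples.
feature_table = {
    "seq1": {"sampleA": 10, "sampleB": 0},
    "seq2": {"sampleA": 1, "sampleB": 4},
    "seq3": {"sampleA": 0, "sampleB": 7},
    "seq4": {"sampleA": 3, "sampleB": 3},
}

# Hypothetical 7-level taxonomy strings (kingdom through species).
PSEUDOMONAS = ("k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; "
               "o__Pseudomonadales; f__Pseudomonadaceae; g__Pseudomonas; "
               "s__fluorescens")
BACILLUS = ("k__Bacteria; p__Firmicutes; c__Bacilli; o__Bacillales; "
            "f__Bacillaceae; g__Bacillus; s__subtilis")

taxonomy = {
    "seq1": PSEUDOMONAS,
    "seq2": PSEUDOMONAS,
    "seq3": PSEUDOMONAS,
    "seq4": BACILLUS,
}


def collapse_by_taxonomy(table, taxonomy, level):
    """Sum the counts of all features that share the same label at `level`."""
    collapsed = defaultdict(lambda: defaultdict(int))
    for feature_id, counts in table.items():
        ranks = [rank.strip() for rank in taxonomy[feature_id].split(";")]
        label = ";".join(ranks[:level])  # the label becomes the new feature ID
        for sample, count in counts.items():
            collapsed[label][sample] += count
    return {label: dict(counts) for label, counts in collapsed.items()}


collapsed = collapse_by_taxonomy(feature_table, taxonomy, level=7)
print(len(feature_table), "features before,", len(collapsed), "after")
for label, counts in collapsed.items():
    print(label.split(";")[-1], counts)
```

In this toy case the three Pseudomonas variants become a single row whose counts are the sums of the originals, so the table shrinks from 4 rows to 2 while the total number of counts stays the same.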

I hope that helps!

Hi @Nicholas_Bokulich

Thanks a lot for your answer! I understand that something happened when I collapsed the taxonomy to level 7, which resulted in 978 features, but I am still not clear on how it works. Does it eliminate singletons from the feature table? I thought each sequence corresponds to one "operational taxonomic unit", so they should all appear in the feature table. Otherwise, how can I know which OTUs were kept and which were removed, and what the criteria are?

Recently my colleagues have questioned me about this a lot. They claim that far fewer taxa were obtained than with QIIME 1, and that is true if we only look at the final table. I don't know how to convince them that this is normal and expected.

Cheers

Additionally, the alpha diversity is very low: only a few hundred observed species, which is much lower than I have seen before. We would expect thousands of species in soils.

Cheers
hongwei

Hi @Nicholas_Bokulich

After reading more in this forum, I understand that in my case 42039 sequences (OTUs) were obtained, but after collapsing to species level only 978 features remained. So if I want to generate an OTU table (42039 rows) with taxonomy, what should I do? I guess I should calculate alpha diversity using those 42039 OTUs rather than the 978 features in my final feature table. What is the logic behind using "feature" rather than "OTU" now?

Hi @hongwei2017,

What do you want to do with that OTU table with taxonomy? QIIME 2 feature tables do not contain taxonomy in the table; any QIIME 2 action that handles taxonomy labels either accepts a FeatureData[Taxonomy] artifact as input or operates on a collapsed feature table (so that taxonomic affiliations become the feature IDs). If you want an OTU table with taxonomy for use outside of QIIME 2, you will need to export to a .biom table and use biom to merge metadata (taxonomic information) into the .biom table. biom is not part of QIIME 2, so any support questions you have about it should be asked in the QIIME 1 forum.
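
For a sense of what "a table with taxonomy" means here, the following is a rough sketch in plain Python of attaching a taxonomy column to an uncollapsed table. It does not use the QIIME 2 or biom APIs; the toy data, the column names, the `annotate` and `to_tsv` helpers, and the "Unassigned" placeholder are assumptions for illustration only. Every original feature keeps its own row, and the taxonomy is simply extra information on that row.

```python
import csv
import io

# Hypothetical toy data standing in for an exported, uncollapsed feature table.
feature_table = {
    "seq1": {"sampleA": 10, "sampleB": 0},
    "seq2": {"sampleA": 1, "sampleB": 4},
    "seq3": {"sampleA": 0, "sampleB": 7},
}

# Hypothetical taxonomy assignments; seq3 is deliberately left without one.
taxonomy = {
    "seq1": "k__Bacteria; p__Proteobacteria",
    "seq2": "k__Bacteria; p__Firmicutes",
}

samples = ["sampleA", "sampleB"]


def annotate(table, taxonomy, samples):
    """Return one row per feature: ID, per-sample counts, taxonomy label."""
    rows = []
    for feature_id in sorted(table):
        counts = [table[feature_id].get(sample, 0) for sample in samples]
        # "Unassigned" is a placeholder chosen for this sketch.
        rows.append([feature_id, *counts, taxonomy.get(feature_id, "Unassigned")])
    return rows


def to_tsv(rows, samples):
    """Serialize the annotated rows as tab-separated text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["feature_id", *samples, "taxonomy"])
    writer.writerows(rows)
    return buffer.getvalue()


rows = annotate(feature_table, taxonomy, samples)
print(to_tsv(rows, samples))
```

The annotated table has exactly as many rows as the original feature table, which is the difference from a collapsed table.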

As for which table to use for alpha diversity, that depends. Are you trying to count unique OTUs or unique taxa?
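
A small sketch of why the answer matters, again in plain Python with made-up toy tables rather than any QIIME 2 API (the `observed_features` helper is hypothetical). It counts the features with a nonzero count in each sample, first on an uncollapsed table and then on the same data collapsed to taxa.

```python
# Hypothetical uncollapsed table: four sequence variants in two samples.
variant_table = {
    "seq1": {"sampleA": 10, "sampleB": 0},
    "seq2": {"sampleA": 1, "sampleB": 4},
    "seq3": {"sampleA": 0, "sampleB": 7},
    "seq4": {"sampleA": 3, "sampleB": 3},
}

# The same toy data after collapsing: seq1 to seq3 share one species label.
taxon_table = {
    "g__Pseudomonas;s__fluorescens": {"sampleA": 11, "sampleB": 11},
    "g__Bacillus;s__subtilis": {"sampleA": 3, "sampleB": 3},
}


def observed_features(table):
    """Count, per sample, the features with a count greater than zero."""
    richness = {}
    for counts in table.values():
        for sample, count in counts.items():
            richness.setdefault(sample, 0)
            if count > 0:
                richness[sample] += 1
    return richness


print("variants:", observed_features(variant_table))
print("taxa:    ", observed_features(taxon_table))
```

On the uncollapsed table each sample contains three observed variants, while on the collapsed table each contains two observed taxa. Both numbers are correct; they answer different questions.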

"Feature" is just a generic term for any type of observation that is contained within a feature table. Depending on the type of analysis that has been performed, these features may be OTUs (i.e., an OTU table), they may be sequence variants, they may be taxa, or even other data like gene or metabolite frequencies. So we do not "use" features now — we just use a different, more general terminology because many different types of feature data may be contained in the table.

If you are wondering why we use denoising methods like dada2 instead of OTU picking methods in QIIME2 (though OTU picking methods are still supported through q2-vsearch), you should refer to the literature.

I hope that helps!

Hi @Nicholas_Bokulich

Many thanks for your explanations, but I am still not sure what to do next. For now I have a feature table with taxonomy labels, made using the qiime-collapse function. As I mentioned, there are about 978 rows in the table 16S_max-ee-4-15000.tsv (673.9 KB). There is another exported file, the feature table without taxonomy labels, which contains 42039 rows (or should I say OTUs?). It seems the first feature table merged a lot of information together, which is why it has far fewer rows (978), and I suspect it has lower resolution than the second one with more than 42000 OTUs. A table collapsed at level 7 is not what I really want; what I need is every sequence that was detected, along with what it is. Species level is not the end, since there are also strains, so it doesn't make sense to me to combine OTUs into species-level taxa. I would like a feature table that contains more information than only 978 features. I have many data sets from QIIME 1, and each OTU table there has tens of thousands of OTUs (soil samples), each representing a different sequence, which I think is much better.

I need to convince our statistician to use the results I got by using qiime2.

Hi Nicholas,

Then should I do statistics on the 42039 table or the table with 978 features? Is the “collapse” step necessary to generate a table for downstream statistical analysis?

Cheers

Hi @hongwei2017,

That all depends on your analysis goals. Are you trying to compare samples based on their taxonomic (e.g., species-level) composition or their OTU/sequence variant composition?

No, the collapse step is not necessary unless you wish to compare samples based on their taxonomic composition.

It sounds like you are mostly confused about the general workflow in QIIME2 and the fundamental nature of various data types. I recommend thoroughly reading the tutorials provided here to get a sense of the typical steps in a "basic" analysis, and the different sorts of analyses you will perform on taxonomic features versus actual sequence variants or OTUs.

Hi @Nicholas_Bokulich

Thank you for all your answers, and definitely they helped me.
