Some confusion about the feature table in QIIME 2

Daryl · February 18, 2020, 4:57am

I am a beginner in bioinformatics. As far as I know, there are three ways to generate a feature table in QIIME 2:

1) dada2
2) deblur
3) the dereplicate-sequences method in q2-vsearch

I'm not sure whether the feature tables generated by these three methods are the same thing. I assume they should be. If so, **is a feature table nothing but dereplicated sequences?**

Some people say that a feature table is different from OTUs and can be regarded as 100% OTUs. I know that OTUs are the result of sequence clustering. So does a 100% OTU mean no clustering, just the sequences after dereplication?

So, a feature table is the set of sequences after dereplication, and it will yield OTUs if further clustering is carried out. Is this correct?

In addition, I found ASVs (amplicon sequence variants) in some tutorials. Is this an alias for the feature table?
Any answer to my question is appreciated. Thank you in advance.

jwdebelius · February 18, 2020, 9:41am

Hi @Daryl,

Let's take a step back. A "feature table" is a tabular representation of your data where we map samples to some kind of feature (ASVs, OTUs, species, metabolites, genes, cytokines, cities visited, nutrients... if you can represent it as a feature table, you can represent it as a feature table!). In QIIME (and microbiome bioinformatics specifically), the feature table typically looks something like this under the hood:


5	0	500
100	200	50
0	5000	30
0	0	10

For legacy reasons, we typically represent the features as the rows (four of them here) and the samples as the columns (three here). We can also represent the data as relative abundance, rather than counts. But... those are details.
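A minimal sketch of that layout in plain Python, converting the counts above to relative abundance per sample. The row and column labels (`feature_1`, `sample_A`, and so on) are placeholders invented for illustration, and this is not how QIIME 2 stores the table internally:

```python
# Hypothetical labels: these row and column names are placeholders, not real data.
features = ["feature_1", "feature_2", "feature_3", "feature_4"]
samples = ["sample_A", "sample_B", "sample_C"]

# Rows are features, columns are samples (counts copied from the table above).
counts = [
    [5, 0, 500],
    [100, 200, 50],
    [0, 5000, 30],
    [0, 0, 10],
]


def relative_abundance(table):
    """Divide each count by the total of its sample (column)."""
    totals = [sum(col) for col in zip(*table)]
    return [
        [count / total if total else 0.0 for count, total in zip(row, totals)]
        for row in table
    ]


rel = relative_abundance(counts)
for name, row in zip(features, rel):
    print(name, [round(x, 3) for x in row])
```

Each column of `rel` sums to 1, so samples with very different sequencing depths become comparable.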

The primary ways to generate a feature table in QIIME 2 from amplicon data are dada2, deblur, and clustering via vsearch to make OTUs. (I recommend the tutorial here for more details on OTU clustering.) Essentially, though, after you take the dereplicated sequences from vsearch, you then cluster those to generate a table like the one above.

You can learn more about all the different kinds of features here:

A 100% OTU (also zOTU or sOTU [zeroOTU or subOTU]) is usually the same as an ESV (exact sequence variant) or ASV (amplicon sequence variant). This should be the output of one of the denoising algorithms: dada2 or deblur. You want to denoise to get this output because denoising addresses error in your sequences. If you use the dereplicated sequences directly, you'll carry PCR and sequencing errors into your features.
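To make the difference concrete, here is a toy sketch of plain dereplication: identical reads are collapsed and counted per sample, nothing more. The reads and sample names are made up, and the function is an illustration rather than what q2-vsearch runs:

```python
from collections import Counter


def dereplicate(reads_by_sample):
    """Build a feature table keyed by exact sequence: {sequence: {sample: count}}."""
    table = {}
    for sample, reads in reads_by_sample.items():
        for seq, n in Counter(reads).items():
            table.setdefault(seq, {})[sample] = n
    return table


# Toy reads (made up). "ACGA" stands in for a read carrying one sequencing error.
reads_by_sample = {
    "sample_A": ["ACGT", "ACGT", "ACGT", "ACGA"],
    "sample_B": ["ACGT", "TTGC", "TTGC"],
}

table = dereplicate(reads_by_sample)
for seq, row in sorted(table.items()):
    print(seq, row)
```

The erroneous read `ACGA` ends up as its own feature, because dereplication only checks for exact identity. Resolving that kind of feature is what denoising is for; dada2 and deblur use their own error models, which this sketch does not attempt to reproduce.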

The ASVs or raw sequences can then be clustered into operational taxonomic units (OTUs) which are just clusters of sequences with some threshold of similarity.
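A rough sketch of threshold-based clustering, assuming equal-length, already aligned sequences and a simple greedy centroid scheme. This is a simplification for illustration, not vsearch's actual algorithm, and the sequences and counts are invented:

```python
def identity(a, b):
    """Fraction of matching positions; assumes equal-length, pre-aligned sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(abundances, threshold):
    """Assign each sequence to the first centroid it matches at >= threshold.

    Sequences are visited from most to least abundant; a sequence that
    matches no existing centroid becomes a new centroid.
    Returns {centroid: summed count}.
    """
    clusters = {}
    for seq, count in sorted(abundances.items(), key=lambda kv: (-kv[1], kv[0])):
        for centroid in clusters:
            if identity(seq, centroid) >= threshold:
                clusters[centroid] += count
                break
        else:
            clusters[seq] = count
    return clusters


# Made-up sequences and counts; the second differs from the first at one position.
abundances = {"ACGTACGTAC": 90, "ACGTACGTAA": 4, "TTGCATGCAT": 30}

otus_90 = greedy_cluster(abundances, 0.9)
otus_100 = greedy_cluster(abundances, 1.0)
print(otus_90)
print(otus_100)
```

At a 90% threshold the one-mismatch variant is absorbed into the abundant centroid, while at 100% every distinct sequence stays its own cluster, which is the same result as dereplication.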

Here's a really nice discussion about OTUs and ASVs, etc

ASVs and OTUs are one kind of feature a table can contain. But if you use something like q2-PICRUSt, your feature table might contain genes. If you use q2-taxa and collapse your taxonomy, your feature table will contain the collapsed taxonomy. If you have metabolomics data and q2-metabolomics, then your features are molecules. You could even make a feature table out of metagenomics data using q2-metaphlan or q2-shogun.
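As an illustration of how the same table can hold a different kind of feature, here is a small sketch of collapsing an ASV-level table to genus level by summing counts. The ASV IDs, genus labels, and the `collapse` helper are all hypothetical, and this is not the q2-taxa implementation:

```python
def collapse(table, taxonomy):
    """Sum per-sample counts over features that share a taxon label."""
    collapsed = {}
    for feature, row in table.items():
        taxon = taxonomy.get(feature, "Unassigned")
        target = collapsed.setdefault(taxon, {})
        for sample, count in row.items():
            target[sample] = target.get(sample, 0) + count
    return collapsed


# Hypothetical ASV-level table and taxonomy assignments.
table = {
    "asv_1": {"sample_A": 5, "sample_B": 0},
    "asv_2": {"sample_A": 100, "sample_B": 200},
    "asv_3": {"sample_A": 0, "sample_B": 5000},
}
taxonomy = {
    "asv_1": "g__Bacteroides",
    "asv_2": "g__Bacteroides",
    "asv_3": "g__Prevotella",
}

genus_table = collapse(table, taxonomy)
for taxon, row in genus_table.items():
    print(taxon, row)
```

The result is still a feature table with samples and counts; only the meaning of a "feature" has changed from a sequence to a genus.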

Best,
Justine

Daryl · February 24, 2020, 1:57pm

Oh, Jesus. It must be too late to reply now! But please accept my sincere thanks. I have read your words about the feature table: very clear, very logical. The discussion about whether or not to cluster is very interesting and useful, and I read through everyone's views on that issue.

To the best of my knowledge, the sequences after dereplication should be ASVs. Of course, filtering and quality control are required. However, other views suggest that sequences only become ASVs after being dereplicated and clustered at 100%. One can compare the output of dereplicate-sequences (Dereplicate sequences) and cluster-features-de-novo (De novo clustering of features).

By the way, I found a sentence in the tutorial "OTU picking strategies in QIIME" that may support my point: "If you're interested only in dereplicating sequences as your OTU picking process, that is a special case of de novo clustering where the similarity threshold is 100%."

jwdebelius · February 24, 2020, 2:46pm

Hi @Daryl,

The topic is still open, so it's certainly not too late! You may want to look into this topic:

However, your dereplicated sequences don't account for PCR error, which denoising solves!

Best,
Justine

Daryl · February 24, 2020, 2:49pm

OK, I will take a look at that.
Best wishes