ESV vs OTU after dada2

kmz · February 13, 2020, 10:53pm

Hello,

I have a general question about ESVs and OTUs. Normally, I perform analysis in dada2 and assume each unique ESV corresponds to a unique species. As a sanity check, I wanted to obtain the OTU composition of my data (16S paired-end sequences). I basically followed the Moving Pictures tutorial and created the file table.qza (using the dada2 option).

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux.qza \
  --o-representative-sequences rep-seqs-dada2.qza \
  --o-table table-dada2.qza \
  --o-denoising-stats stats-dada2.qza \
  --p-trunc-len-f 0 \
  --p-trunc-len-r 0

My reads are 151 bp long from either end. The read quality was good, so I did not want to trim. I tried setting trunc-len to 150 for both forward and reverse reads, but even after 24 h the job had not finished. So, I changed it to 0.

After this, I did:

qiime feature-table summarize \
  --i-table table.qza \
  --o-visualization table.qzv \
  --m-sample-metadata-file sample-metadata.tsv

qiime feature-table tabulate-seqs \
  --i-data rep-seqs.qza \
  --o-visualization rep-seqs.qzv

Next, I exported this to a feature table:

qiime tools export \
  --input-path table.qza \
  --output-path exported-feature-table

Then I changed directory to exported-feature-table and did:

biom convert -i feature-table.biom -o table.tsv --to-tsv

Now, this .tsv file has 1319 OTUs, whereas I had 1653 ESVs.

I have 2 questions:

1. Are what I obtained really OTUs? Or did I make an error somewhere, e.g. by setting trunc-len to 0?
2. I had thought I would obtain a much smaller number of OTUs (compared to the number of ESVs I had). While I'm happy that there was not a huge collapse, I was wondering if this is normal? (These are soil microbial communities cultured in the lab.)

I would appreciate any thoughts.

Mehrbod_Estaki · February 13, 2020, 11:35pm

Hi @kmz,
First, a note on terminology so we are on the same page: when you use DADA2, you obtain amplicon sequence variants (ASVs), or ESVs if you prefer that name. Simply exporting them out of QIIME 2 and converting the biom file to a .tsv does not make them OTUs. OTUs traditionally refer to sequence bins that have been clustered at some percent identity, typically 97%. A 100% OTU would therefore be like an ASV, where even a single-nucleotide difference between sequences is called a unique variant, or OTU if you will.

Your assumption that each unique ESV corresponds to a unique species also needs a quick clarification. This is not technically true either, as many different ASVs can, and often do, correspond to the same taxon (a species, for example). The best way to think about ASVs is to treat them as biologically informative sequences; assigning taxonomy to them is a separate procedure altogether.

Truncating your reads versus not truncating them shouldn't really affect how long DADA2 takes, especially since you were only truncating by 1 bp. To speed up the process, increase the number of cores you dedicate to the task using the --p-n-threads parameter.
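As a rough sketch of the multithreading suggestion, the denoising call from the first post could be rerun with --p-n-threads set. The file names are taken from that post, the truncation length of 150 is the value that was tried originally, and the shell variables N_THREADS and TRUNC_LEN are only illustrative. A value of 0 for --p-n-threads asks for all available cores.

```shell
# Sketch only: file names follow the first post; adjust them to your own artifacts.
# N_THREADS and TRUNC_LEN are illustrative variable names, not QIIME 2 settings.
N_THREADS=0     # 0 = use all available cores
TRUNC_LEN=150   # the truncation length tried in the first post

if command -v qiime >/dev/null 2>&1; then
  qiime dada2 denoise-paired \
    --i-demultiplexed-seqs demux.qza \
    --p-trunc-len-f "$TRUNC_LEN" \
    --p-trunc-len-r "$TRUNC_LEN" \
    --p-n-threads "$N_THREADS" \
    --o-representative-sequences rep-seqs-dada2.qza \
    --o-table table-dada2.qza \
    --o-denoising-stats stats-dada2.qza
else
  # Without an active QIIME 2 environment there is nothing to run.
  echo "qiime not found on PATH; activate your QIIME 2 environment first" >&2
fi
```

On a shared machine it may be kinder to set N_THREADS to an explicit number of cores rather than 0.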

The difference between 1319 and 1653 is odd; the two counts should be identical. Can you clarify how you determined these two values? Are you sure the same feature table was used in both cases?
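One way to pin down the count from the exported table is to count only the data rows of the .tsv, skipping the header lines that start with #. The sketch below builds a tiny made-up table in a temporary file (the feature IDs, sample names and counts are hypothetical) and counts its features; on real data you would point it at table.tsv instead.

```shell
# Hypothetical stand-in for a biom-converted table: '#' header lines,
# then one row per feature. All IDs and counts below are made up.
tsv="$(mktemp)"
{
  printf '# Constructed from biom file\n'
  printf '#OTU ID\tsample-1\tsample-2\n'
  printf 'feature-a\t10\t0\n'
  printf 'feature-b\t3\t7\n'
  printf 'feature-c\t0\t5\n'
} > "$tsv"

# Count feature rows, i.e. every line that does not start with '#'.
n_features="$(grep -vc '^#' "$tsv")"
echo "features: $n_features"

rm -f "$tsv"
```

The number printed here should agree with the feature count reported in the table.qzv summary for the same table.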

Since you haven't actually collapsed your ASVs into OTUs, the two numbers should be the same.

kmz · February 14, 2020, 6:09am

Hi @Mehrbod_Estaki,

Thanks a lot for your detailed response!

Yes, indeed I am aware of this, which is why I want to compare the ESVs vs OTUs.

So, according to the moving pictures tutorial, after demultiplexing the reads, and de-noising (in my case using dada2), I followed the steps that result in a "feature table". Aren't the reads clustered into OTUs in the feature table? I thought they would be because, when I exported the feature table to a .tsv, the column header said #OTU. I read this post, and followed your advice there. I didn't do the last step because I don't really need the actual names of the species, which is what I thought the last step was doing. But I guess I was wrong - I guess this will tell me what needs to be done.

The data were the same, but the pipelines were different. In one case I did exactly what the Moving Pictures tutorial suggested (which I described in the first post), and in the other I used the dada2 pipeline in R. Could that explain the difference?

Thanks!

Mehrbod_Estaki · February 14, 2020, 6:56am

Hi @kmz,

No, the reads are not clustered into OTUs after DADA2. At this step you get a feature table of ASVs by samples.

I totally get that this can be a bit confusing, but since you exported ASVs, the output will also be ASVs. If you had done OTU picking with vsearch and then exported, the column would still be labelled OTU, but the features would be true OTUs.
The column is named OTU purely for historical reasons: before denoising methods such as DADA2 produced ASVs, everyone was using OTUs and biom tables, so there was no reason to call them anything else.

The last step in that post simply adds a separate taxonomy file to the biom table. Taxonomy assignment is a completely separate process in these pipelines, so you're right to ignore that step. The final link you see, which covers OTU picking with vsearch, is indeed the right tutorial to follow if you want to go from ASVs -> OTUs. To be honest, I'm not sure how useful comparing ASVs and OTUs is in this scenario. They are different beasts altogether, and the choice of one over the other should be driven by the experimental question. I'd go further and say that in most cases you should use ASVs, as they offer higher resolution than OTUs.
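For reference, a minimal sketch of the ASVs -> OTUs step with the q2-vsearch plugin might look like the following. The input names table.qza and rep-seqs.qza follow the earlier posts, the output names are hypothetical, and 0.97 is just the conventional threshold mentioned above, so change PERC_IDENTITY to whatever identity you want to test.

```shell
# Sketch only: input names follow the earlier posts; output names are hypothetical.
PERC_IDENTITY=0.97   # conventional OTU threshold; e.g. 0.99 for tighter clusters

if command -v qiime >/dev/null 2>&1; then
  # De novo clustering of the DADA2 features into OTUs at the chosen identity.
  qiime vsearch cluster-features-de-novo \
    --i-table table.qza \
    --i-sequences rep-seqs.qza \
    --p-perc-identity "$PERC_IDENTITY" \
    --o-clustered-table table-dn-97.qza \
    --o-clustered-sequences rep-seqs-dn-97.qza
else
  # Without an active QIIME 2 environment there is nothing to run.
  echo "qiime not found on PATH; activate your QIIME 2 environment first" >&2
fi
```

Exporting the clustered table the same way as before would then give a .tsv whose rows are actual OTUs.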

Yes, using different pipelines could totally explain the difference. Even though you are using the core DADA2 algorithm in both scenarios, there may be small differences between the DADA2 version in QIIME 2 and the one used in R, and in their default parameters.
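If you want to see where the two runs disagree, one option is to compare the sets of sequences rather than the feature IDs, since the two pipelines may label features differently (QIIME 2 hashes DADA2 sequences into IDs by default, while the R workflow usually keeps the sequences as names). The sketch below assumes you have already written one sequence per line for each run; the file names and the short sequences are made up for illustration.

```shell
# comm needs both inputs sorted with the same collation, so fix the locale.
export LC_ALL=C
workdir="$(mktemp -d)"

# Hypothetical sequence lists, one sequence per line, from each pipeline.
printf '%s\n' ACGTACGT ACGTTCGT GGCCTTAA | sort > "$workdir/qiime2-seqs.txt"
printf '%s\n' ACGTACGT GGCCTTAA TTGACCAA TTTTCCCC | sort > "$workdir/r-seqs.txt"

# comm -12: lines in both files; -23: only in the first; -13: only in the second.
shared="$(comm -12 "$workdir/qiime2-seqs.txt" "$workdir/r-seqs.txt" | wc -l | tr -d ' ')"
only_qiime2="$(comm -23 "$workdir/qiime2-seqs.txt" "$workdir/r-seqs.txt" | wc -l | tr -d ' ')"
only_r="$(comm -13 "$workdir/qiime2-seqs.txt" "$workdir/r-seqs.txt" | wc -l | tr -d ' ')"

echo "shared: $shared, QIIME 2 only: $only_qiime2, R only: $only_r"
rm -rf "$workdir"
```

Note that an exact string comparison like this will also flag sequences that differ only in length, for example when the two runs used different truncation settings.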

kmz · February 14, 2020, 2:55pm

Thank you @Mehrbod_Estaki! I used the vsearch tool and did de novo clustering at 99%. It worked. I still have 1095 OTUs. I will play around with the percentages and see how they affect my results.
