Question about generating a feature table with OTU IDs (e.g. OTU1, OTU2, ...)

Hi, I am currently working on sequencing data analysis. The reads were filtered using DADA2, and clustering was performed with QIIME VSEARCH cluster-features-de-novo (--p-perc-identity 0.97).

Afterwards, I got the feature ID, sequence length, and sequence information as:

I also generated a table_abundance.tsv at 0.97 sequence similarity like:

(I converted it to Excel.)

The feature ID (OTU ID) is a string of characters. I have seen papers that conducted OTU-level analysis and labeled the OTUs as OTU1, OTU2, etc. Is there a way to convert this feature table so that it uses OTU IDs like OTU1, OTU2, ... instead of the character strings?

Thanks so much for your help!


I have a follow-up question regarding the data analysis.
As mentioned before, and as in this paper ( Great differences in performance and outcome of high-throughput sequencing data analysis platforms for fungal metabarcoding ),

The reads were filtered using DADA2, and clustering was performed with QIIME VSEARCH cluster-features-de-novo (--p-perc-identity 0.97)

I then ran the command
qiime vsearch uchime-denovo
to remove possible chimeras.

I noticed that before qiime vsearch uchime-denovo, cluster_table.qzv showed a total frequency of 3014234 and 1149 features. After the command, the total frequency dropped slightly to 2977404, and the number of features dropped to 270. What caused the drop in the number of features?


To answer my own follow-up question: I realized that after removing the chimeras, some low-frequency features were removed.

E.g., the minimum frequency before qiime vsearch uchime-denovo is 0; the minimum frequency after running qiime vsearch uchime-denovo is 20688.

That is probably one reason.
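As a rough check on the numbers quoted above, here is a minimal Python sketch of the arithmetic. The variable names are made up for illustration; the four counts are the ones reported from cluster_table.qzv before and after chimera removal.

```python
# Totals reported before and after qiime vsearch uchime-denovo
reads_before, reads_after = 3014234, 2977404
features_before, features_after = 1149, 270

reads_removed = reads_before - reads_after
features_removed = features_before - features_after

# Share of reads and of features that disappeared
read_loss_pct = 100 * reads_removed / reads_before
feature_loss_pct = 100 * features_removed / features_before

# Average number of reads carried by each removed feature
avg_reads_per_removed_feature = reads_removed / features_removed

print(f"reads removed:    {reads_removed} ({read_loss_pct:.2f}%)")
print(f"features removed: {features_removed} ({feature_loss_pct:.1f}%)")
print(f"avg reads per removed feature: {avg_reads_per_removed_feature:.1f}")
```

So roughly three quarters of the features accounted for only about 1% of the reads, which fits the idea that the removed features were low-frequency ones. One caveat: 270 features with at least 20688 reads each would add up to more than the 2977404 total, so the 20688 figure is presumably the per-sample minimum rather than the per-feature minimum.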

Hi @YaochunYu,

The OTU ID you’re getting reflects the sequence. Essentially, it’s a (mostly) unique identifier for the sequence, so you can’t get them confused. The problem with the older system was that OTU1 in study A could be a completely different sequence than OTU1 in study B. (The exception, obviously, is working with a reference database, where OTU 7323 was always clustered against OTU 7323.)
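To illustrate why a hash-based ID is a property of the sequence, here is a minimal Python sketch. It assumes the ID is an MD5 hex digest of the sequence string, which is my understanding of DADA2's default hashed feature IDs in QIIME 2; the function name and the two toy sequences are hypothetical.

```python
import hashlib


def sequence_id(seq: str) -> str:
    """Return a hex digest of the sequence (assumed here to be MD5)."""
    return hashlib.md5(seq.encode("utf-8")).hexdigest()


# Two made-up sequences that differ by a single base
seq_a = "ACGTACGTTAGC"
seq_b = "ACGTACGTTAGG"

id_a = sequence_id(seq_a)
id_b = sequence_id(seq_b)

# The same sequence always yields the same ID, in any study;
# a different sequence yields a different ID (barring rare collisions).
print(id_a)
print(id_b)
```

Because the ID is computed from the sequence alone, two studies that recover the same sequence end up with the same ID without coordinating their numbering.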

My suggestion in QIIME is to work with the current hash because it is, in fact, more specific than the OTU number. You could then either publish with the hash, or convert the hash names in the table into something more palatable when you publish. I’ll be honest, the second thing is something I’m struggling with in my own work. My co-authors aren’t huge fans of the full hash, but it’s nice to have an ID that’s a property of the centroid sequence and can’t get mixed up.
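If you do want more palatable names for publication, one option is to build a lookup from hash to label yourself. The following is a minimal sketch in plain Python, not an official QIIME 2 method: the function name `make_otu_labels`, the toy IDs, and the choice to rank by total abundance are all assumptions for illustration.

```python
from typing import Dict, List


def make_otu_labels(table: Dict[str, List[int]], prefix: str = "OTU") -> Dict[str, str]:
    """Map each feature ID to a short label, ranked by total abundance."""
    # Sort by descending total count; ties are broken by ID so the
    # labeling is deterministic from run to run.
    ranked = sorted(table, key=lambda fid: (-sum(table[fid]), fid))
    return {fid: f"{prefix}{rank}" for rank, fid in enumerate(ranked, start=1)}


# Hypothetical toy table: feature ID (shortened hash) -> per-sample counts
toy_table = {
    "a3f1": [10, 5],
    "09bc": [200, 150],
    "77de": [40, 60],
}

labels = make_otu_labels(toy_table)

# Apply the mapping to get a table keyed by the short labels
renamed = {labels[fid]: counts for fid, counts in toy_table.items()}

for fid, label in labels.items():
    print(label, fid)
```

Keeping the hash-to-label mapping (for example as a supplementary table) means readers can still recover the original sequence-based ID. I believe QIIME 2 also has a `qiime feature-table rename-ids` action that can apply such a mapping from a metadata column, but check the documentation for your version.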

Best,
Justine

Hi Justine,

Thanks so much for your explanation! That’s extremely clear and useful! And your suggestion to “convert the hash table names into something more palatable when you publish” is great!

Frankly, I am a rookie at sequencing data analysis (:sweat_smile:) and only recently got the chance to analyze my own data set. I have only learned and performed the analysis in QIIME 2, so when I realized my feature ID was not the “OTU number” that I had read about in papers, I was confused. Now I understand that OTU IDs with specific numbers were generated by the older system (QIIME 1, presumably?).

Thanks,

Yaochun
