Form of OTU (ASV)

feature-table

(Taeho Kim) #1

Hi,

I am working on 16S microbiome data and trying to get ASVs from paired-end sequence data.
Fortunately, I have an example data set for which somebody has already built an OTU table using QIIME 1.

So, my goal has been to replicate this result, in terms of the number and variety of features, using QIIME 2.
Once I have a satisfactory pipeline that replicates it closely, I plan to apply the same approach to the rest of the data.

Thanks to the responses in this forum, I could get ASVs very close to the target OTUs by using the DADA2 procedure and the GG taxonomy.
The result looks like a usual ASV table. (e.g., the level 6 table is OTU_6_forward_GG.tsv (47.1 KB))

The thing is that the result based on the example data comes in another form of OTU table, which looks like the following txt file (part of the data: OTU.txt (78.9 KB)).

I am curious about the relation between the first (ASV) form and the OTU form in the txt file.
In particular, I am not sure why duplicated taxonomies appear across the OTU IDs, which makes the number of rows a lot larger (I cut the file at 500 rows, but the original has 2443).

I am not sure if I am delivering my question in the correct way.
But, any clue or hint would be very helpful to me.
Thank you in advance for your time.


(Nicholas Bokulich) #2

These are both OTU tables. The first is collapsed on taxonomic labels. The second is not collapsed on taxonomy, but each feature is appended with its taxonomic assignment.

ASVs/OTUs are merely unique sequences, not necessarily unique taxa. Many ASVs/OTUs can be classified as the same taxon. Your first table is shorter because it has been collapsed on taxonomy. E.g., see qiime taxa collapse
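
To make the relationship between the two tables concrete, here is a minimal Python sketch of what collapsing on taxonomy does. The feature IDs, sample names, counts, and taxonomy strings are made-up toy data, and `collapse_on_taxonomy` is a hypothetical helper, not the code behind qiime taxa collapse. It only illustrates the idea: features that share the same label down to a given level are summed into one row.

```python
from collections import defaultdict

# Toy data (hypothetical): per-sample counts for three features.
counts = {
    "feature-a": {"S1": 10, "S2": 0},
    "feature-b": {"S1": 5, "S2": 7},
    "feature-c": {"S1": 0, "S2": 3},
}
# Toy taxonomy (hypothetical, only three ranks): two features share a genus.
taxonomy = {
    "feature-a": "k__Bacteria; p__Firmicutes; g__Lactobacillus",
    "feature-b": "k__Bacteria; p__Firmicutes; g__Lactobacillus",
    "feature-c": "k__Bacteria; p__Bacteroidetes; g__Bacteroides",
}


def collapse_on_taxonomy(counts, taxonomy, level):
    """Sum counts of all features sharing the first `level` taxonomy ranks."""
    collapsed = defaultdict(lambda: defaultdict(int))
    for feature_id, per_sample in counts.items():
        ranks = [r.strip() for r in taxonomy[feature_id].split(";")]
        label = "; ".join(ranks[:level])
        for sample, n in per_sample.items():
            collapsed[label][sample] += n
    return {label: dict(samples) for label, samples in collapsed.items()}


by_genus = collapse_on_taxonomy(counts, taxonomy, level=3)
by_phylum = collapse_on_taxonomy(counts, taxonomy, level=2)
print(len(counts), "features ->", len(by_genus), "taxa")
```

The uncollapsed table keeps one row per feature (with the taxonomy attached as an annotation), which is why it has many more rows than the collapsed one.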


(Taeho Kim) #3

Thank you for the kind response!

Those were merely two different forms of OTU tables.
So, can I say the total frequencies are the same for both tables?

I know the first table can be produced with the command “qiime taxa collapse”.
Then, what command or procedure should I follow to produce the second table?


(Nicholas Bokulich) #4

You will need to summarize the tables to determine that. These are in biom format, so see the biom-format docs for how to do that.
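
As a rough illustration of that check, the sketch below sums every count in a tab-separated table and compares two tables. It assumes the layout that a TSV conversion of a biom table typically has (optional comment lines, then a header row starting with "#OTU ID", then one row per feature), and it treats a trailing "taxonomy" column as an annotation rather than a count. `total_frequency` and both example tables are hypothetical. For files that are still in biom format, the biom command-line tool's summarize-table command is the more direct route.

```python
def total_frequency(tsv_text, non_count_columns=("taxonomy",)):
    """Sum every count in a tab-separated feature table."""
    header = None
    total = 0.0
    for line in tsv_text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        if header is None:
            # Skip comment lines until the "#OTU ID" header row is found.
            if fields[0].startswith("#OTU ID"):
                header = fields
            continue
        for name, value in zip(header[1:], fields[1:]):
            if name not in non_count_columns:
                total += float(value)
    return total


# Hypothetical tables: the same counts before and after collapsing.
per_feature = (
    "# Constructed from biom file\n"
    "#OTU ID\tS1\tS2\ttaxonomy\n"
    "1\t10.0\t0.0\tk__Bacteria; p__Firmicutes\n"
    "2\t5.0\t7.0\tk__Bacteria; p__Firmicutes\n"
    "3\t0.0\t3.0\tk__Bacteria; p__Bacteroidetes\n"
)
collapsed = (
    "#OTU ID\tS1\tS2\n"
    "k__Bacteria; p__Firmicutes\t15.0\t7.0\n"
    "k__Bacteria; p__Bacteroidetes\t0.0\t3.0\n"
)
print(total_frequency(per_feature), total_frequency(collapsed))
```

If one table was derived from the other purely by collapsing, the two totals should match; a mismatch suggests the tables came from different processing steps.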

This is not a QIIME 2 format. Take the QIIME 2 table that you would have used as input for the collapse method. Export to biom format. Export your taxonomy classifications file. Use biom add-metadata to annotate your biom table with taxonomy information. See biom-format docs for more information.


(Taeho Kim) #5

Thank you for the clarification!

To extract the second form, I referred to some posts here and the biom-format documentation. Basically, what I did was:

qiime tools export --input-path feature-table.qza --output-path exported
qiime tools export --input-path taxonomy.qza --output-path exported

and after moving into the exported directory:

biom add-metadata -i feature-table.biom -o table-with-taxonomy.biom --observation-metadata-fp taxonomy.tsv --sc-separated taxonomy
biom convert -i table-with-taxonomy.biom -o OTU_taxo.tsv --to-tsv --header-key taxonomy

I found “--header-key taxonomy” quite critical, as
biom convert will not add the taxonomy column unless it is specified.
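
A note on the taxonomy file, offered as an assumption rather than something stated in this thread: an exported taxonomy.tsv commonly starts with the header "Feature ID", "Taxon", "Confidence", while the commands above refer to a column named "taxonomy". If that is the case for your file, the header needs renaming before biom add-metadata is run. The sketch below does that in Python; `biom_ready_taxonomy`, the "#OTUID" header name, and the example row are illustrative, so check them against your own export and the biom-format docs.

```python
def biom_ready_taxonomy(tsv_text):
    """Rename the header of an exported taxonomy table for biom add-metadata."""
    lines = tsv_text.splitlines()
    # Assumed export header names -> names used by the biom commands above.
    renames = {
        "Feature ID": "#OTUID",
        "Taxon": "taxonomy",
        "Confidence": "confidence",
    }
    header = [renames.get(name, name) for name in lines[0].split("\t")]
    return "\n".join(["\t".join(header)] + lines[1:]) + "\n"


# Hypothetical two-line export, for illustration only.
exported = (
    "Feature ID\tTaxon\tConfidence\n"
    "feature-a\tk__Bacteria; p__Firmicutes\t0.98\n"
)
print(biom_ready_taxonomy(exported), end="")
```

If your taxonomy.tsv already has a "taxonomy" column, this step is unnecessary.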

By doing this, I got a very similar OTU table, except for the first column, #OTU ID.
In the original table, it is just some integers. However, in the one I created, it consists of strings such as:
40c8a6895c058cb465aecbe5e0ad57c
8b62f571d04259f00a5697338b05827f

What is the meaning of the integer numbers?
(I am sorry if this is not about QIIME 2.)
But is there any way I could provide such numbers?
Do you think I need to prepare another kind of map file?


(Nicholas Bokulich) #6

Denoising methods use the md5sum of the sequence as the feature ID of that sequence. This makes it possible to merge different studies that use the same exact processing parameters.
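
The idea of hashing a sequence into a feature ID can be sketched with Python's standard hashlib module. The sequences below are made up, and `feature_id` is a hypothetical helper; the exact normalization a given denoising plugin applies before hashing is an assumption here, so treat this as an illustration rather than a way to reproduce specific IDs.

```python
import hashlib


def feature_id(sequence):
    """Return the MD5 hex digest of a sequence, used here as its feature ID."""
    return hashlib.md5(sequence.encode("ascii")).hexdigest()


# Hypothetical sequences, for illustration only.
seq_a = "TACGGAGGATCCGAGCGTTATCCGGATTTATTGGGTTTAAAGGG"
seq_b = "TACGTAGGTGGCAAGCGTTATCCGGATTTATTGGGCGTAAAGGG"

print(feature_id(seq_a))
print(feature_id(seq_b))
```

Because the ID depends only on the sequence, the same sequence gets the same 32-character ID in every run and every study, which is what makes merging possible.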

De novo OTU clustering creates an arbitrary ID for each feature. Hence, there is no way to compare these features by ID between studies, or even between multiple OTU clustering runs.

It is not impossible to compare these sequences, but it is somewhat unusual. Follow this solution:

I can’t really provide support if this fails, because you are venturing into unknown territory. I still suspect the OTUs and ASVs will not be comparable, since you probably processed them with different parameters. It would be more straightforward and feasible to import the QIIME 1 OTU table into QIIME 2 and collapse both tables on taxonomy.
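
Once both tables are collapsed on taxonomy, the comparison reduces to matching taxonomy labels. A minimal sketch, assuming each collapsed table has already been reduced to a mapping from label to total count; the labels, counts, and the `compare_collapsed` helper are all hypothetical. The labels only line up if both tables were classified against the same reference taxonomy and collapsed at the same level.

```python
def compare_collapsed(table_a, table_b):
    """Compare two taxon -> total-count mappings by their taxonomy labels."""
    shared = sorted(set(table_a) & set(table_b))
    only_a = sorted(set(table_a) - set(table_b))
    only_b = sorted(set(table_b) - set(table_a))
    return shared, only_a, only_b


# Hypothetical collapsed totals from a QIIME 1 table and a QIIME 2 table.
qiime1_taxa = {"g__Lactobacillus": 120, "g__Bacteroides": 80, "g__Prevotella": 5}
qiime2_taxa = {"g__Lactobacillus": 110, "g__Bacteroides": 85, "g__Blautia": 9}

shared, only_q1, only_q2 = compare_collapsed(qiime1_taxa, qiime2_taxa)
print("shared:", shared)
print("only in QIIME 1 table:", only_q1)
print("only in QIIME 2 table:", only_q2)
for taxon in shared:
    # Side-by-side totals for the taxa found in both tables.
    print(taxon, qiime1_taxa[taxon], qiime2_taxa[taxon])
```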

Good luck!

