Clustering and taxonomy output

Isis_Guibert · October 7, 2020, 7:40am

Hi,

I made it through the taxonomy identification but I have 2 questions to make sure that I did everything well.

I am working with COI sequences and followed the Moving Pictures tutorial. I imported my data, used cutadapt to trim my primers, summarized my data to choose the truncation length, and then did the truncation and denoising using dada2 denoise-paired. After that I created the feature table summary. I followed the tutorial and there is no clustering step; however, the tutorial overview seems to say that we should cluster after denoising. Is the clustering done by "dada2 denoise-paired", or did I miss a step? Should I follow another tutorial?

With the feature table summary, I then explored the alpha and beta diversity. After that I trained my classifier using the MIDORI database for COI sequences and started to explore my data using "taxa barplot". In the CSV file I now have a number for each sample at the chosen level. I would like to know if this number is the "number of UNIQUE features". This is important for my data because I am working with COI. I read about why you use the word "feature" and what it means, but that part is still not clear to me, sorry. What nomenclature do you use for publishing? Feature?

Thank you for your help

Nicholas_Bokulich · October 9, 2020, 2:06pm

Does it really say that? To my recollection it says that clustering is optional but unnecessary, but maybe the wording is vague and needs fixing. That's what it should say.

dada2 denoise-paired dereplicates your data but does not cluster. So that is what you want, no other step required.

As a matter of fact, clustering is not recommended for COI data specifically, because it degrades resolution:
https://onlinelibrary.wiley.com/doi/full/10.1002/ece3.6594

No, the CSV that you download from the visualization reports the number of sequences assigned to each taxon.

A feature is any kind of observation you have in a feature table. So it could be ASVs, OTUs, metabolites, etc. In your case it would be COI ASVs.

In publications I often use "feature" or "ASV", but it largely depends on the nomenclature standards of the domain. Maybe check what the authors used in the COI paper I linked above, and follow what they did?

I hope that helps!

Isis_Guibert · October 12, 2020, 5:55am

Thank you!

You are right, it's not written that we should, but the clustering step comes just after the denoising, so it was not clear in my mind that it was optional. Thanks for the clarification and the paper.

If my CSV file reports the number of sequences assigned to each taxon, how do I get a CSV file with the number of unique COI ASVs for each taxon?

Thank you

Nicholas_Bokulich · October 12, 2020, 6:53am

If you want the number of unique ASVs assigned to each taxon, there is not a straightforward way to accomplish this in QIIME 2, though it would be relatively simple to whip up a solution in R or Python. See here for some related discussion:
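
For readers looking for a starting point, here is a minimal pandas sketch of the Python route mentioned above. It is not a QIIME 2 feature or an official recipe. It assumes you have already loaded the feature table as a DataFrame (rows are ASV IDs, columns are sample IDs, values are sequence counts) and the taxonomy as a Series of semicolon-delimited taxon strings indexed by ASV ID. The function name `unique_asvs_per_taxon`, the toy table, and the taxon labels are all hypothetical and only there for illustration; real MIDORI labels may be formatted differently.

```python
import pandas as pd


def unique_asvs_per_taxon(table: pd.DataFrame, taxonomy: pd.Series, level: int) -> pd.DataFrame:
    """Count the ASVs observed (count > 0) in each sample, per taxon.

    table:    rows = ASV IDs, columns = sample IDs, values = sequence counts.
    taxonomy: index = ASV IDs, values = semicolon-delimited taxon strings.
    level:    number of taxonomic ranks to keep when collapsing.
    """
    # Truncate each taxonomy string to the requested number of ranks.
    collapsed = taxonomy.str.split(";").apply(
        lambda ranks: ";".join(r.strip() for r in ranks[:level])
    )
    # Presence/absence: each ASV counts once in every sample where it occurs.
    presence = (table > 0).astype(int)
    # Align the taxon labels to the table rows, then sum presence per taxon.
    return presence.groupby(collapsed.reindex(presence.index)).sum()


# Hypothetical toy data standing in for an exported feature table and taxonomy.
table = pd.DataFrame(
    {"sample1": [10, 0, 5, 3], "sample2": [2, 7, 0, 0]},
    index=["asv1", "asv2", "asv3", "asv4"],
)
taxonomy = pd.Series(
    {
        "asv1": "Animalia; Cnidaria; Anthozoa",
        "asv2": "Animalia; Cnidaria; Anthozoa",
        "asv3": "Animalia; Cnidaria; Hydrozoa",
        "asv4": "Animalia; Arthropoda; Insecta",
    }
)

asv_counts = unique_asvs_per_taxon(table, taxonomy, level=2)
print(asv_counts)
```

In this toy example, "Animalia;Cnidaria" contains 15 sequences in sample1 (10 + 5) but only 2 unique ASVs, which is the difference between the barplot CSV and the number asked about here. The result can be written out with `asv_counts.to_csv(...)` if a CSV file is needed.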
