I generated some results using the feature table BIOM file from QIIME 2 and then ran that file through PICRUSt. The output, however, only lists sample IDs and OTU numbers. Can anyone explain how these OTU numbers can be linked to the Feature IDs? I am trying to extract the following information: Kingdom, Phylum, Class, Order, Family, Genus, and Species.
Hi. There are tutorials for this, for example the taxonomic analysis section of Moving Pictures. If you performed some analysis outside of QIIME 2, or used statistical scripts within QIIME 2, you may want to filter your existing table according to the output you received. You can run filter-features before the taxonomic analysis, using a metadata .tsv file to indicate which features to keep.
I am aware of the taxonomy analysis, but it is not clear to me where these OTU numbers are generated when running the data through QIIME. I am also aware that Kingdom, Phylum, Class, Order, Family, Genus, and Species are produced by the taxonomy command, but that information is indexed by Feature ID. I am trying to figure out at which point the OTU numbers and Feature IDs are linked.
The feature ID is the OTU ID, unless you have re-clustered your sequences in preparation for PICRUSt.
If you have, then you have new sequences and you will need to work from the OTU IDs, classifying those sequences to see their taxonomic affiliation. If these are re-clustered, you will not be able to link the original feature ID to the new OTU ID unless you have some type of OTU map that links them.
Does that answer your question? If not, please upload some examples (QZV files, screenshots of the contents of exported files, etc) so that we can see what types of data you are describing, and describe the commands that you ran. Thanks!
But here's a guess: the feature IDs in clustered_table.qza do not match the feature IDs in MiSeq_data_Aug18_V1_rep_seqs.qza and MiSeq_data_Aug18_V1_table.qza.
This is because cluster-features-closed-reference is clustering your sequences into OTUs. There is not currently a way to map those OTU IDs to the original IDs.
The PICRUSt reference sequences may have taxonomy associated with them, so you can map to those IDs. Otherwise, just reclassify your sequences to obtain taxonomic predictions.
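As a rough illustration of the mapping step, here is a minimal Python sketch that reads a taxonomy table keyed by ID and splits each lineage into the seven ranks asked about above. It assumes a tab-separated file whose first column is the ID and which has a `Taxon` column holding a Greengenes-style string such as `k__Bacteria; p__Firmicutes; ...`. The function names and the example rows are hypothetical and not taken from this thread.

```python
import csv
import io

RANKS = ["Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species"]


def split_taxon(taxon):
    """Split a Greengenes-style lineage string into the seven ranks."""
    parts = [p.strip() for p in taxon.split(";")]
    names = []
    for part in parts[:len(RANKS)]:
        # Drop a rank prefix such as "k__" or "g__" if one is present.
        name = part.split("__", 1)[1] if "__" in part else part
        names.append(name or "Unassigned")
    # Pad lineages that stop before species level.
    names += ["Unassigned"] * (len(RANKS) - len(names))
    return dict(zip(RANKS, names))


def load_taxonomy(handle):
    """Read a taxonomy TSV (ID in the first column, lineage in 'Taxon')."""
    reader = csv.DictReader(handle, delimiter="\t")
    id_column = reader.fieldnames[0]
    return {row[id_column]: split_taxon(row["Taxon"]) for row in reader}


# Hypothetical example rows (assumed layout, made-up values).
example = io.StringIO(
    "Feature ID\tTaxon\tConfidence\n"
    "4479944\tk__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; "
    "f__Streptococcaceae; g__Streptococcus; s__\t0.98\n"
    "1000986\tk__Bacteria; p__Proteobacteria\t0.91\n"
)
taxonomy = load_taxonomy(example)
print(taxonomy["4479944"]["Genus"])
```

Looking up an OTU ID from the PICRUSt output in `taxonomy` then gives its ranks, provided the IDs in the taxonomy file are the same IDs used in the clustered table.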
I need some clarification regarding the Feature IDs and the OTU IDs generated by the clustering algorithm. I used the following clustering command:
This generates the clustered OTU IDs. The output folder lists the OTU IDs that were clustered and those that were not. I have also seen a significant reduction in the number of entries after clustering.
Here are the issues I am having:
Is there any way to map these Feature IDs to the OTU IDs generated by the clustering command?
I also want to know how this clustering is performed, as I want to include more OTU IDs for further analysis.
Note that I am performing this clustering because the Feature IDs generated by QIIME cannot be read against the Greengenes dataset in PICRUSt. I have attached the ID files generated before and after the clustering command.
Unfortunately, not at this time. There hasn't yet been a super compelling reason to generate an OTU map in QIIME 2, but it isn't unreasonable to think it might happen someday.
Could you expand on this a little? For example the implementation is here, but I suspect you are looking for something else?
With closed reference, it's straightforward to include more data, as all OTUs are defined by your reference database (the downside is that you lose anything your database didn't know about).
Yep, that's a great reason to do this step. There is a picrust2 in the works which can handle ASVs, but I haven't tried it out yet.
Please find the response regarding the issues I am facing:
Could you expand on this a little? For example, the implementation is here, but I suspect you are looking for something else?
I am trying to get a good understanding of the clustering methodology that was performed. I am trying to list the Feature IDs that were included in the clusters and the ones that were not. If you look at the tsv files, you will see a substantial reduction in the number of features, and I want to make sure more features can be considered for clustering.
Yep, that’s a great reason to do this step. There is a picrust2 in the works which can handle ASVs, but I haven’t tried it out yet.
I have not yet tried PICRUSt2. I may need to go through the documentation to understand the changes. Also, I am working with 16S rRNA sequencing data.
I see, if you wanted to explore more deeply, you could run vsearch directly. Closed-reference is a very simple bioinformatic protocol and the OTU map would be accessible at that point.
Otherwise, losing a lot of features to closed-reference clustering is very typical. It's why open reference is often used.
Thanks a lot. Could you elaborate on the idea of using vsearch for deeper exploration? Also, what exactly do you mean by open reference? That point is not clear to me.
Running vsearch directly to cluster your sequences will allow you to generate a file mapping the OTU IDs to the original sequence IDs. I believe the --uc parameter is what you are looking for.
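To show what such a mapping file could give you, here is a minimal Python sketch that turns `.uc` output into an OTU map. It assumes the usual tab-separated uc layout: record type in the first column, query label in the ninth, target (reference) label in the tenth, with `H` records for hits and `N` records for queries that matched nothing. The function name and the example lines are hypothetical. Labels carrying extra annotations such as `;size=` would need trimming first.

```python
import io
from collections import defaultdict


def parse_uc(handle):
    """Return ({otu_id: [query_ids]}, [unmatched_query_ids]) from uc lines."""
    otu_map = defaultdict(list)
    unmatched = []
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        record_type, query, target = fields[0], fields[8], fields[9]
        if record_type == "H":
            # The query matched a reference sequence; the target is the OTU ID.
            otu_map[target].append(query)
        elif record_type == "N":
            # The query had no hit at the chosen identity threshold.
            unmatched.append(query)
    return dict(otu_map), unmatched


# Hypothetical uc lines (made-up IDs and identities).
example = io.StringIO(
    "H\t0\t253\t98.4\t+\t0\t0\t253M\tfeatA\t4479944\n"
    "H\t0\t253\t97.2\t+\t0\t0\t253M\tfeatB\t4479944\n"
    "N\t*\t*\t*\t.\t*\t*\t*\tfeatC\t*\n"
)
otu_map, unmatched = parse_uc(example)
print(otu_map)
print(unmatched)
```

The `unmatched` list is the set of original IDs that closed-reference clustering would drop, which is one way to see which features were lost.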
Closed-reference OTU picking throws out any sequences that do not have at least X % similarity to the reference database.
Open-reference OTU picking keeps those features, and performs de novo clustering to form new OTU clusters from the sequences that failed to cluster to the reference.
However, you can only use closed-reference OTU picking with picrust, since the point is that you are identifying genome sequences that have similar 16S sequences to your queries.
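The difference between the two strategies can be sketched in a few lines of Python. This is a toy illustration only, assuming the best reference hit and its percent identity are already known for each query. The `pick_otus` function is hypothetical, and the `denovo_` labels are placeholders: a real open-reference workflow clusters the leftover sequences against each other instead of giving each one its own OTU.

```python
def pick_otus(best_hits, threshold, open_reference=False):
    """Assign queries to OTUs from precomputed best hits.

    best_hits maps query ID -> (reference ID, percent identity), or None
    when the query had no hit at all.
    """
    assigned = {}
    leftover = []
    for query, hit in best_hits.items():
        if hit is not None and hit[1] >= threshold:
            assigned[query] = hit[0]
        else:
            leftover.append(query)
    if open_reference:
        # Placeholder for de novo clustering: keep each leftover as its own OTU.
        for query in leftover:
            assigned[query] = "denovo_" + query
        leftover = []
    return assigned, leftover


# Hypothetical hits (made-up IDs and identities).
hits = {
    "featA": ("4479944", 98.4),
    "featB": ("1000986", 91.0),
    "featC": None,
}
closed, dropped = pick_otus(hits, threshold=97.0)
opened, remaining = pick_otus(hits, threshold=97.0, open_reference=True)
print(closed, dropped)
print(opened, remaining)
```

Only the reference-assigned OTUs carry IDs that a reference-based tool can look up, which is why the leftovers are of no use for PICRUSt even when open reference keeps them.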
But now I want to reduce the rep-seq file that was originally generated along with the table. The reduced rep-seq file should keep only features with a minimum frequency of 8 that appear in at least 3 samples. How can I do that?
I am trying to do this so that it works properly with the cluster command. If you need the reduced qzv file, I can share it anytime.
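For reference, the filtering rule described here can be sketched in plain Python. This is only an illustration on made-up data. It assumes "minimum frequency 8" means a total count of at least 8 across all samples and "samples 3" means presence in at least 3 samples. The function names are hypothetical. Within QIIME 2, the equivalent route would presumably be `feature-table filter-features` on the table, followed by `feature-table filter-seqs` to subset the representative sequences to the filtered table.

```python
def filter_features(table, min_frequency=8, min_samples=3):
    """Keep features with enough total counts and enough samples.

    table maps feature ID -> {sample ID: count}.
    """
    kept = {}
    for feature_id, counts in table.items():
        total = sum(counts.values())
        n_samples = sum(1 for count in counts.values() if count > 0)
        if total >= min_frequency and n_samples >= min_samples:
            kept[feature_id] = counts
    return kept


def filter_seqs(rep_seqs, table):
    """Keep only the representative sequences whose IDs remain in the table."""
    return {fid: seq for fid, seq in rep_seqs.items() if fid in table}


# Hypothetical table and sequences (made-up values).
table = {
    "featA": {"S1": 5, "S2": 3, "S3": 2},   # total 10 in 3 samples: kept
    "featB": {"S1": 20, "S2": 0, "S3": 0},  # only 1 sample: dropped
    "featC": {"S1": 2, "S2": 2, "S3": 2},   # total 6: dropped
}
rep_seqs = {"featA": "ACGT", "featB": "GGCC", "featC": "TTAA"}

filtered_table = filter_features(table)
filtered_seqs = filter_seqs(rep_seqs, filtered_table)
print(sorted(filtered_seqs))
```

Filtering the table first and then subsetting the sequences to it keeps the two files in sync before they go into the cluster command.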