Trying to link OTU numbers from a feature table (BIOM file) with feature IDs (#q2:types)

dheeman00 · August 26, 2018, 5:04pm

Hi All

I generated a feature table (BIOM file) in QIIME 2 and then used it as input to PICRUSt. However, the PICRUSt output lists sample IDs and OTU numbers. Can anyone explain how these OTU numbers can be linked to the feature IDs? I am trying to extract the following information: Kingdom, Phylum, Class, Order, Family, Genus, and Species.

timanix · August 26, 2018, 6:35pm

Hi. There are tutorials for this, for example the taxonomic analysis section of the Moving Pictures tutorial. If you performed some analysis outside of QIIME 2, or used statistical scripts within QIIME 2, you may want to filter your existing table according to the output you received: you can run filter-features before the taxonomic analysis, using a metadata .tsv file to indicate which features to keep.
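As a minimal sketch of that metadata-file route, the file of features to keep could be generated with a few lines of Python. This assumes that an ID-only feature metadata file with a "feature-id" header is accepted by filter-features --m-metadata-file in your QIIME 2 version; the helper name write_feature_metadata and the feature IDs are hypothetical.

```python
import csv
import io


def write_feature_metadata(feature_ids, handle):
    """Write a minimal QIIME 2 feature metadata file listing features to keep.

    Assumption: 'feature-id' is used as the ID column header and no other
    columns are needed for filtering.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["feature-id"])
    for fid in feature_ids:
        writer.writerow([fid])


# Hypothetical feature IDs (real QIIME 2 feature IDs are usually MD5 hashes).
buf = io.StringIO()
write_feature_metadata(["a1b2c3", "d4e5f6"], buf)
print(buf.getvalue())
```

In practice you would write to a real file (e.g. open("features-to-keep.tsv", "w")) and pass that path to --m-metadata-file.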

dheeman00 · August 26, 2018, 10:35pm

Hi

I am aware of the taxonomy analysis. What is unclear to me is where these OTU numbers are generated when running the analysis on the QIIME platform. I also understand that Kingdom, Phylum, Class, Order, Family, Genus, and Species are assigned by the taxonomy command, but that information is indexed by feature ID. I am trying to figure out at which point the OTU numbers and feature IDs are linked.
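Since the taxonomy is indexed by feature ID, one way to get the Kingdom-to-Species columns is to split each Taxon string in an exported taxonomy.tsv. The sketch below assumes Greengenes-style strings ("k__Bacteria; p__Firmicutes; ..."), a "Feature ID / Taxon / Confidence" header, and an optional "#q2:types" line; the function name parse_taxonomy_tsv and the example rows are hypothetical.

```python
RANKS = ["Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species"]


def parse_taxonomy_tsv(lines):
    """Map each feature ID to a {rank: name} dict.

    Assumes tab-separated rows of Feature ID, Taxon, Confidence with
    Greengenes-style rank prefixes (k__, p__, ...).
    """
    result = {}
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines, the header row, and any '#q2:types' directive.
        if not line or line.startswith("Feature ID") or line.startswith("#"):
            continue
        feature_id, taxon = line.split("\t")[:2]
        ranks = {}
        # zip() stops early when a taxon is only resolved to a higher rank.
        for rank, level in zip(RANKS, taxon.split(";")):
            name = level.strip()
            # Drop the 'k__'-style prefix; an empty name means unassigned.
            ranks[rank] = name.split("__", 1)[1] if "__" in name else name
        result[feature_id] = ranks
    return result


example = [
    "Feature ID\tTaxon\tConfidence\n",
    "#q2:types\tcategorical\tnumeric\n",
    "abc123\tk__Bacteria; p__Firmicutes; c__Bacilli\t0.98\n",
]
parsed = parse_taxonomy_tsv(example)
print(parsed)
```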

timanix · August 27, 2018, 4:12am

Oh, maybe it happens when you use the classifier, if I understood you correctly.

Nicholas_Bokulich · August 27, 2018, 1:55pm

The feature ID is the OTU ID, unless you have re-clustered your sequences in preparation for PICRUSt.

If you have, then you have new sequences: you will need to work from the OTU IDs and classify those sequences to see their taxonomic affiliation. If they were re-clustered, you will not be able to link the original feature IDs to the new OTU IDs unless you have some type of OTU map that links them.
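To illustrate why an OTU map is needed, here is a small sketch of what clustering does to a table: several features can collapse into one OTU, and their counts are summed. Once that happens, the OTU table alone no longer records which features contributed. The dictionary-based table layout, the function name collapse_by_otu_map, and the IDs are all hypothetical.

```python
from collections import defaultdict


def collapse_by_otu_map(feature_table, otu_map):
    """Sum per-sample counts of features that were assigned to the same OTU.

    feature_table: {feature_id: {sample_id: count}}
    otu_map: {feature_id: otu_id}; features missing from the map are dropped,
    similar to unmatched sequences in closed-reference clustering.
    """
    otu_table = defaultdict(lambda: defaultdict(int))
    for feature_id, counts in feature_table.items():
        otu_id = otu_map.get(feature_id)
        if otu_id is None:
            continue
        for sample_id, count in counts.items():
            otu_table[otu_id][sample_id] += count
    return {otu: dict(samples) for otu, samples in otu_table.items()}


table = {
    "f1": {"S1": 5, "S2": 1},
    "f2": {"S1": 3},
    "f3": {"S2": 7},
}
otu_map = {"f1": "OTU_A", "f2": "OTU_A"}  # hypothetical map; f3 did not match
collapsed = collapse_by_otu_map(table, otu_map)
print(collapsed)
```

The result contains only OTU_A; without otu_map there is no way to tell that f1 and f2 were merged into it.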

Does that answer your question? If not, please upload some examples (QZV files, screenshots of the contents of exported files, etc) so that we can see what types of data you are describing, and describe the commands that you ran. Thanks!

dheeman00 · September 5, 2018, 6:45pm

Hi Nicholas

Thanks for your response.

I was trying to find a way to feed the QIIME 2 feature IDs into the PICRUSt platform. I used the following commands:

qiime vsearch cluster-features-closed-reference \
  --i-sequences Final_Aug_2018/MiSeq_data_Aug18_V1_rep_seqs.qza \
  --i-table Final_Aug_2018/MiSeq_data_Aug18_V1_table.qza \
  --i-reference-sequences gg_13_5_otu_99.qza \
  --p-perc-identity 1 \
  --p-threads 0 \
  --output-dir Final_Aug_2018/PICRUST

qiime tools export \
  Final_Aug_2018/PICRUST/clustered_table.qza \
  --output-dir Final_Aug_2018/PICRUST/BIOM_File

But the IDs generated by this command are different from the ones produced when I ran the rep-seqs and table commands. Is there a way to map those IDs?

Nicholas_Bokulich · September 6, 2018, 8:05pm

I do not know what commands you are referring to.

But here's a guess: the feature IDs in clustered_table.qza do not match the feature IDs in MiSeq_data_Aug18_V1_rep_seqs.qza and MiSeq_data_Aug18_V1_table.qza.

This is because cluster-features-closed-reference is clustering your sequences into OTUs. There is not currently a way to map those OTU IDs to the original IDs.

The PICRUSt reference sequences may have taxonomy associated with them, so you can map to those IDs; otherwise, just reclassify your sequences to obtain taxonomic predictions.
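Because closed-reference OTU IDs are the reference sequence IDs themselves, mapping to reference taxonomy amounts to a lookup. A minimal sketch, assuming a two-column tab-separated taxonomy file (reference ID, then taxonomy string) from the same Greengenes release used for clustering, and a list of OTU IDs taken from the exported clustered table; the function names and IDs below are hypothetical.

```python
def load_reference_taxonomy(lines):
    """Read '<reference ID>\t<taxonomy string>' lines into a dict."""
    taxonomy = {}
    for line in lines:
        line = line.rstrip("\n")
        if line:
            ref_id, taxon = line.split("\t", 1)
            taxonomy[ref_id] = taxon
    return taxonomy


def annotate_otus(otu_ids, taxonomy):
    """Pair each clustered OTU ID with its reference taxonomy (None if absent)."""
    return {otu_id: taxonomy.get(otu_id) for otu_id in otu_ids}


# Hypothetical reference IDs and taxonomy strings.
ref_lines = ["1001\tk__Bacteria; p__Firmicutes\n", "1002\tk__Archaea\n"]
annotated = annotate_otus(["1001", "9999"], load_reference_taxonomy(ref_lines))
print(annotated)
```

A None value flags an OTU ID that does not appear in the taxonomy file, which usually means the file and the reference sequences come from different releases.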

I hope that helps!

dheeman00 · September 9, 2018, 5:50pm

Hi Nicholas

That's helpful. I just want to know one thing: does the taxonomy data change when I perform clustering?

dheeman00 · September 10, 2018, 6:54pm

Hi QIIME Forum,

I need some clarification regarding the feature IDs generated by QIIME 2 and the OTU IDs generated by the clustering algorithm. I used the following clustering commands:

qiime vsearch cluster-features-closed-reference \
  --i-sequences Final_Aug_2018/MiSeq_data_Aug18_V1_rep_seqs.qza \
  --i-table Final_Aug_2018/MiSeq_data_Aug18_V1_table.qza \
  --i-reference-sequences gg_13_5_otu_99.qza \
  --p-perc-identity 1 \
  --p-threads 0 \
  --output-dir Final_Aug_2018/PICRUST

qiime tools export \
  Final_Aug_2018/PICRUST/clustered_table.qza \
  --output-dir Final_Aug_2018/PICRUST/BIOM_File

This generates the clustered OTU IDs. The output folder contains lists of OTU IDs that were clustered and ones that were not. I have also noticed a significant reduction in the number of entries after clustering.

Here are the issues I am having:

Is there any way to map these feature IDs to the OTU IDs generated by the clustering command?
I also want to know how this clustering is performed, as I want to include more OTU IDs in further analysis.

Note: I am performing this clustering because the feature IDs generated by QIIME 2 are not recognized by the Greengenes dataset in the PICRUSt platform. I have attached the ID files generated before and after the clustering command.

taxonomy.tsv (1.5 MB)
taxonomy.tsv (141.3 KB)

ebolyen · September 11, 2018, 5:54pm

Hi @dheeman00,

Exactly.

Unfortunately, not at this time. There hasn't yet been a super compelling reason to generate an OTU map in QIIME 2, but it isn't unreasonable to think it might happen someday.

Could you expand on this a little? For example, the implementation is here, but I suspect you are looking for something else?

With closed reference, it's straightforward to include more data, as all OTUs are defined by your reference database (the downside is that you lose anything your database didn't know about).

Yep, that's a great reason to do this step. PICRUSt2 is in the works and can handle ASVs, but I haven't tried it out yet.

dheeman00 · September 12, 2018, 1:44pm

Hi

Thank you for your response.

Here are my responses regarding the issues I am facing:

Could you expand on this a little? For example, the implementation is here, but I suspect you are looking for something else?
I am trying to get a good understanding of the clustering methodology being performed. I want to list the feature IDs that were assigned to clusters and those that were not. If you look at the .tsv files, you will see a substantial reduction in the number of features, and I want to make sure more features can be retained during clustering.
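One way to split feature IDs into "clustered" and "not clustered" without an OTU map: cluster-features-closed-reference also produces an unmatched_sequences output, so every input feature that is not listed there was assigned to some OTU. A minimal sketch, assuming both the original rep-seqs and unmatched_sequences have been exported to FASTA; the function names and sequences are hypothetical.

```python
def fasta_ids(lines):
    """Return sequence IDs from FASTA header lines (text after '>' up to whitespace)."""
    ids = []
    for line in lines:
        if line.startswith(">") and line[1:].strip():
            ids.append(line[1:].split()[0])
    return ids


def split_by_match(all_ids, unmatched_ids):
    """Split input feature IDs into (clustered, unmatched), preserving input order."""
    unmatched = set(unmatched_ids)
    clustered = [fid for fid in all_ids if fid not in unmatched]
    not_clustered = [fid for fid in all_ids if fid in unmatched]
    return clustered, not_clustered


# Hypothetical exported FASTA contents.
all_lines = [">f1\n", "ACGT\n", ">f2 desc\n", "GGCC\n", ">f3\n", "TTAA\n"]
unmatched_lines = [">f2\n", "GGCC\n"]
clustered, not_clustered = split_by_match(fasta_ids(all_lines), fasta_ids(unmatched_lines))
print(clustered, not_clustered)
```

This tells you which features survived, but not which OTU each one joined; that still requires an OTU map.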

Yep, that’s a great reason to do this step. There is a picrust2 in the works which can handle ASVs, but I haven’t tried it out yet.

I have not yet tried PICRUSt2; I need to go through its documentation to understand the changes. Also, I am working with 16S rRNA sequencing data.

ebolyen · September 12, 2018, 11:18pm

Hey @dheeman00,

I see. If you want to explore more deeply, you could run vsearch directly. Closed-reference clustering is a very simple bioinformatic protocol, and the OTU map would be accessible at that point.

Otherwise, losing a lot of features to closed-reference clustering is very typical. That's why open-reference clustering is often used.

dheeman00 · September 13, 2018, 7:13pm

Hi Ebolyen

Thanks a lot. Could you elaborate further on using vsearch for deeper exploration? Also, what exactly do you mean by open reference? I am not clear on that point.

Dheeman

Nicholas_Bokulich · September 13, 2018, 9:17pm

Running vsearch directly to cluster your sequences will allow you to generate a file mapping the OTU IDs to the original sequence IDs. I believe the --uc parameter is what you are looking for.

Check out the vsearch github page for installation and usage details.
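As a sketch of what the --uc output enables: after running something like vsearch --usearch_global rep-seqs.fasta --db reference.fasta --id 0.99 --uc map.uc (file names are placeholders), each "H" (hit) line pairs a query label with the reference it matched, and "N" lines are queries with no hit. The parser below assumes the standard 10-column tab-separated UC layout (record type in column 1, query label in column 9, target label in column 10); the function name and the example lines are hypothetical. If size annotations such as ";size=..." are present in your labels, they would need stripping.

```python
def parse_uc(lines):
    """Build an OTU map {query_id: target_id} from vsearch --uc output.

    'H' records are hits; 'N' records are queries with no hit.
    Other record types (e.g. 'S'/'C' from de novo clustering) are ignored.
    """
    otu_map, unmatched = {}, []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue
        record, query, target = fields[0], fields[8], fields[9]
        if record == "H":
            otu_map[query] = target
        elif record == "N":
            unmatched.append(query)
    return otu_map, unmatched


uc_lines = [
    "H\t0\t253\t100.0\t+\t0\t0\t253M\tf1\t4479944\n",
    "N\t*\t253\t*\t*\t*\t*\t*\tf3\t*\n",
]
otu_map, unmatched = parse_uc(uc_lines)
print(otu_map, unmatched)
```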

Closed-reference OTU picking throws out any sequences that do not have at least X % similarity to the reference database.

Open-reference OTU picking keeps those features, and performs de novo clustering to form new OTU clusters from the sequences that failed to cluster to the reference.
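The difference can be seen in a toy sketch of closed-reference assignment: each query goes to its best-matching reference if identity meets the threshold, and everything else is discarded (open-reference picking would instead pass the discarded queries on to de novo clustering). The identity measure here is a simplification that assumes equal-length, pre-aligned sequences; real tools such as vsearch compute identity from an alignment. All names and sequences are hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions (toy metric for equal-length, aligned sequences)."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def closed_reference(queries, references, threshold):
    """Assign each query to its best reference at >= threshold identity; discard the rest."""
    assigned, discarded = {}, []
    for qid, qseq in queries.items():
        best_id, best_score = None, 0.0
        for rid, rseq in references.items():
            score = identity(qseq, rseq)
            if score > best_score:
                best_id, best_score = rid, score
        if best_id is not None and best_score >= threshold:
            assigned[qid] = best_id
        else:
            discarded.append(qid)
    return assigned, discarded


refs = {"R1": "ACGTACGTAC", "R2": "TTTTTTTTTT"}
queries = {"q1": "ACGTACGTAA", "q2": "GGGGGGGGGG"}
assigned, discarded = closed_reference(queries, refs, threshold=0.8)
print(assigned, discarded)
```

Here q1 matches R1 at 90% identity and is kept at a 0.8 threshold; with a threshold of 1.0 (the equivalent of --p-perc-identity 1 in the commands earlier in this thread), q1 would be discarded as well.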

However, you can only use closed-reference OTU picking with PICRUSt, since the point is that you are identifying genome sequences that have similar 16S sequences to your queries.

dheeman00 · September 16, 2018, 1:45am

Thank you for the clarification.

But there is one issue that is bothering me:

I reduced the feature table using the following commands:

qiime feature-table filter-features \
  --i-table Outputs/MiSeq_data_v3_table.qza \
  --p-min-frequency 8 \
  --o-filtered-table Reduced_Features/MiSeq_data_v3_table_min_8_feature_filtered_table.qza

qiime feature-table filter-features \
  --i-table Reduced_Features/MiSeq_data_v3_table_min_8_feature_filtered_table.qza \
  --p-min-samples 3 \
  --o-filtered-table Reduced_Features/MiSeq_data_v3_table_min_8_feature_filtered_table_min_sample_3.qza
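The logic of these two filters can be sketched in plain Python, assuming the thresholds are inclusive (total frequency >= min-frequency, present in >= min-samples samples). Because each criterion depends only on the feature's own counts, applying the two filters one after the other gives the same result as applying both at once. The table layout and function name are hypothetical, not the QIIME 2 implementation.

```python
def filter_features(table, min_frequency=0, min_samples=0):
    """Keep features with total count >= min_frequency that occur in >= min_samples samples.

    table: {feature_id: {sample_id: count}}
    """
    kept = {}
    for fid, counts in table.items():
        total = sum(counts.values())
        present = sum(1 for c in counts.values() if c > 0)
        if total >= min_frequency and present >= min_samples:
            kept[fid] = counts
    return kept


table = {
    "f1": {"S1": 5, "S2": 2, "S3": 1},            # total 8, in 3 samples
    "f2": {"S1": 20},                              # total 20, in 1 sample
    "f3": {"S1": 1, "S2": 1, "S3": 1, "S4": 1},    # total 4, in 4 samples
}
combined = filter_features(table, min_frequency=8, min_samples=3)
sequential = filter_features(filter_features(table, min_frequency=8), min_samples=3)
print(sorted(combined), combined == sequential)
```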

Now I want to filter the rep-seqs file that was originally generated along with the table, so that it contains only the features retained by these filters (minimum frequency 8, minimum 3 samples). How can I do that?

I am doing this so that the inputs work properly with the clustering command. If you need the filtered .qzv file, I can share it anytime.

dheeman00 · September 16, 2018, 2:22am

Following up on my previous post:

I used the following commands:

qiime feature-table filter-seqs \
  --i-data Final_Aug_2018/MiSeq_data_Aug18_V1_rep_seqs.qza \
  --i-table Final_Aug_2018/MiSeq_data_Aug18_V1_table_min_8_freq_min_3_samples.qza \
  --o-filtered-data Final_Aug_2018/MiSeq_data_Aug18_V1_rep_seq_min_8_freq_min_3_samples.qza

qiime feature-table tabulate-seqs \
  --i-data Final_Aug_2018/MiSeq_data_Aug18_V1_rep_seq_min_8_freq_min_3_samples.qza \
  --o-visualization Final_Aug_2018/MiSeq_data_Aug18_V1_rep_seq_min_8_freq_min_3_samples.qzv
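Conceptually, filter-seqs with --i-table keeps only the sequences whose IDs are still present in the filtered table. The same idea applied to an exported FASTA file, as a minimal sketch (the function name and records are hypothetical, and this is not how QIIME 2 implements it internally):

```python
def filter_fasta(lines, keep_ids):
    """Yield FASTA lines belonging to records whose ID is in keep_ids."""
    keep = set(keep_ids)
    writing = False
    for line in lines:
        if line.startswith(">"):
            header = line[1:].split()
            # A record is kept when its first header token is a retained feature ID.
            writing = bool(header) and header[0] in keep
        if writing:
            yield line


fasta = [">f1\n", "ACGT\n", ">f2\n", "GGCC\n", ">f3\n", "TTAA\n"]
filtered = list(filter_fasta(fasta, {"f1", "f3"}))
print("".join(filtered))
```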

Nicholas_Bokulich · September 17, 2018, 1:53pm

Perfect! Sounds like you found the solution.

Let us know if this is still not working!