Which OTU/feature table matches the phylogenetic tree?

BecksHawaii · September 1, 2020, 3:52pm

Can you please tell me which file is the abundance table that matches the tips of the generated phylogenetic tree? I have a tree with ~21,000 tips, and the abundance table (what would have been the OTU table in QIIME 1) has ~12,000 rows (features/OTUs). In another project I have the same problem: I cannot find the file that matches the phylogenetic tree that was generated. In that case the tree has ~5,000 tips, and the OTU table (the feature table) has ~9,000 rows. The tree was made with FastTree in QIIME2. I think I am confused about which file is used to make the phylogenetic tree.

My apologies if this is already somewhere on the forum; I did a search first and could not find the answer. I am still learning QIIME2, so again, my apologies!

Cheers
Becks

thermokarst · September 1, 2020, 4:01pm

Hey @BecksHawaii!

We are missing a bit of information here! How did you create these files? Can you share some commands? I don't just mean "The tree was made with FastTree in QIIME2", but actual copy-and-paste commands, or, some QZAs/QZVs with provenance, that way we can see what you've done so far.

The rows of a QIIME 2 FeatureTable are actually samples by default.

Can you provide a little more context to us, that way we can lend a more helpful hand to you? Thanks!

BecksHawaii · September 1, 2020, 6:29pm

Aloha, my apologies (again!). I was hoping that there might be a general answer that I could follow, like it runs under a specific set of scripts and should be in "x" folder. Here is the script that was run to generate the phylogeny:

qiime phylogeny align-to-tree-mafft-fasttree --i-sequences QCandFT/rep-seqs.qza --o-alignment PhyloAnalysis/aligned-rep-seqs.qza --o-masked-alignment PhyloAnalysis/masked-aligned-rep-seqs.qza --o-tree PhyloAnalysis/unrooted-tree.qza --o-rooted-tree PhyloAnalysis/rooted-tree.qza --p-n-threads 14
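Because `align-to-tree-mafft-fasttree` aligns the sequences in `QCandFT/rep-seqs.qza` and builds the tree from that alignment, each tip label should be a feature ID from that artifact. Below is a minimal Python sketch for pulling the tip labels out of the tree. It assumes the rooted tree has already been exported to Newick text (for example with `qiime tools export`) and that the labels are unquoted and contain no spaces. `newick_tip_labels` is a hypothetical helper, not part of QIIME 2.

```python
from __future__ import annotations

import re

# Matches a label that directly follows '(' or ',', i.e. a leaf.
# Labels after ')' are internal-node labels (e.g. support values) and are skipped.
_TIP_RE = re.compile(r"[(,]\s*([^(),:;\s]+)")


def newick_tip_labels(newick: str) -> set[str]:
    """Return the tip labels of a Newick tree string (hypothetical helper).

    Assumes unquoted labels without spaces, as with QIIME 2 feature IDs.
    """
    return set(_TIP_RE.findall(newick))


if __name__ == "__main__":
    tree = "((featA:0.1,featB:0.2)0.95:0.3,featC:0.4);"
    tips = newick_tip_labels(tree)
    print(len(tips), sorted(tips))  # 3 ['featA', 'featB', 'featC']
```

A dedicated tree library (for example scikit-bio, which QIIME 2 uses internally) handles quoted labels and other Newick edge cases more robustly. The regex here is only meant for a quick tip count.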

The tree has ~21,000 tips, as does the alignment that was used. I may have just found the right file, but I would love it if you could confirm that. I am using this information in another program to look at phylogenetic diversity and distinctness in lava caves, whether per sample, per cave, etc.

I may be using the wrong language regarding "feature table." What I am looking at has the samples in the columns and has the "species/OTUs/features" in the rows. The files I have looked at include:

rarefied-table-summary/feature-table.tsv (has ~2,000 rows, I think because it's rarefied)
TaxonomyAnalysis/feature-table-taxanomy.tsv (has ~12,000 rows, which are OTUs, not samples, in this case)
QCandFT/table-summary/feature-table.tsv (has ~21,000 rows, so I think this might be it!)
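The row counts listed above can also be checked programmatically. The following is a sketch that assumes the table was exported to the classic TSV layout produced by `biom convert --to-tsv`: comment and `#OTU ID` header lines start with `#`, there is one feature per row, and samples are in the columns. `read_feature_ids` is a hypothetical helper.

```python
from __future__ import annotations


def read_feature_ids(tsv_text: str) -> list[str]:
    """Return the feature ID (first column) of every data row in a feature-table TSV.

    Assumes the `biom convert --to-tsv` layout: lines starting with '#'
    (the '# Constructed from biom file' comment and the '#OTU ID' header)
    are skipped, and each remaining row is one feature.
    """
    ids = []
    for line in tsv_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        ids.append(line.split("\t", 1)[0])
    return ids


if __name__ == "__main__":
    example = (
        "# Constructed from biom file\n"
        "#OTU ID\tcave1\tcave2\n"
        "featA\t10.0\t0.0\n"
        "featB\t3.0\t7.0\n"
    )
    ids = read_feature_ids(example)
    print(len(ids), ids)  # 2 ['featA', 'featB']
```

Comparing `len(ids)` with the number of tips (~21,000 here) is a quick first check of which table goes with the tree. A TSV whose header row does not start with `#` would have that header counted as a feature under this assumption.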

Attached are the scripts for all that we ran.
[processLog_LC_all_OTUs_SILVA.txt|attachment](upload://2kS8Fr3cjjlLZUWPnOANU0aTR2U.txt) (16.6 KB)

I spent most of the evening looking through the files I have generated for this project and could not find a matching file, so I thought I would try asking.

If you can confirm that this is the abundance table whose features match the labels in the phylogenetic tree, that would be very helpful.

Happy to provide any further info!

Cheers
Becks

thermokarst · September 1, 2020, 6:46pm

Thanks @BecksHawaii!

Oops, looks like this didn't actually attach the file. Can you try again?

Hmm, this isn't something directly produced by QIIME 2 - hopefully the log I asked for above can provide us with a bit more information.

BecksHawaii · September 1, 2020, 8:15pm

processLog_LC_all_OTUs_SILVA.txt (16.6 KB)

thermokarst · September 1, 2020, 11:32pm

Thanks for attaching, @BecksHawaii!

You might be able to ask the person who made this pipeline for more information - this looks like a third-party script/pipeline.

Judging from the pipeline log you attached, it looks like you can use either QCandFT/table.qza or DiversityAnalysis/rarefied_table.qza with PhyloAnalysis/rooted-tree.qza - it really depends on what you intend to do with it downstream. Most or all QIIME 2 actions will work with either - if there are missing features in the table they will be skipped in whatever the action does. The QCandFT/table.qza file should have one feature per tip in the phylogeny produced.
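Equal counts do not prove a match, because the IDs themselves have to agree. Below is a sketch of that check. `compare_ids` and `matches_one_to_one` are hypothetical helpers that take the table's feature IDs and the tree's tip labels as sets. For `QCandFT/table.qza`, both one-sided groups should be empty. For the rarefied table, `only_in_tree` may list features that rarefaction removed.

```python
from __future__ import annotations


def compare_ids(table_ids: set[str], tip_ids: set[str]) -> dict[str, set[str]]:
    """Group feature IDs by whether they occur in the table, the tree, or both."""
    return {
        "shared": table_ids & tip_ids,
        "only_in_table": table_ids - tip_ids,  # rows with no tip in the tree
        "only_in_tree": tip_ids - table_ids,   # tips with no row in the table
    }


def matches_one_to_one(table_ids: set[str], tip_ids: set[str]) -> bool:
    """True when every table feature is a tip and every tip is a table feature."""
    groups = compare_ids(table_ids, tip_ids)
    return not groups["only_in_table"] and not groups["only_in_tree"]


if __name__ == "__main__":
    tips = {"featA", "featB", "featC"}
    full_table = {"featA", "featB", "featC"}
    rarefied_table = {"featA", "featC"}  # featB dropped, e.g. by rarefaction
    print(matches_one_to_one(full_table, tips))               # True
    print(compare_ids(rarefied_table, tips)["only_in_tree"])  # {'featB'}
```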

Hope that helps!

PS - can you share any info about the pipeline used here? I would like to review it further when I get a chance, if possible.

BecksHawaii · September 2, 2020, 12:43am

Aloha, yes, that helps, and thank you for helping with this. Unfortunately, the downstream analysis I am trying to use is outside of QIIME and crashes if there are mismatches.
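For a downstream tool that fails on any mismatch, one option is to drop table rows that have no tip in the tree before handing the table over. The following is a sketch that assumes the table has been exported to `biom convert --to-tsv` style text, with `#`-prefixed header lines and one feature per row. `filter_table_to_tips` is a hypothetical helper. It also returns the tips that have no row in the table. If that set is not empty, the tree itself would still need pruning, which this sketch does not do, or the unrarefied table could be used instead.

```python
from __future__ import annotations


def filter_table_to_tips(tsv_text: str, tip_ids: set[str]) -> tuple[str, set[str]]:
    """Keep header lines plus rows whose feature ID is a tree tip (hypothetical helper).

    Returns the filtered TSV text and the set of tips with no row in the
    table (those tips would still have to be pruned from the tree).
    """
    kept: list[str] = []
    seen: set[str] = set()
    for line in tsv_text.splitlines():
        if line.startswith("#"):
            kept.append(line)  # '# Constructed from biom file' and '#OTU ID' header
            continue
        if not line.strip():
            continue
        feature_id = line.split("\t", 1)[0]
        if feature_id in tip_ids:
            kept.append(line)
            seen.add(feature_id)
    return "\n".join(kept) + "\n", tip_ids - seen


if __name__ == "__main__":
    table = "#OTU ID\tcave1\nfeatA\t5\nfeatX\t2\n"
    filtered, missing = filter_table_to_tips(table, {"featA", "featB"})
    print(filtered, end="")  # '#OTU ID\tcave1' header, then the featA row only
    print(missing)           # {'featB'}
```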

I am happy to give you more info on the pipeline and provide anything else you need. It comes out of one of the national labs and is a set of automated scripts that generate what I would call standard QIIME results, with options for OTUs or ASVs (I have run these data both ways). It is a free platform that helps facilitate the use of bioinformatic tools in research and education (including K-12). I use QIIME both on the command line and in this program, and I find that the scripts here match what I would do for a general analysis (though not for specific analyses, and sometimes I need to make adjustments to the pipeline).

If you want me to post any info here, I can; otherwise, let me know if there is somewhere else I can send you information on the pipeline and platform.

I will be in the QIIME2 workshop, so I will get better at using QIIME2!

Cheers
Becks

thermokarst · September 9, 2020, 7:54pm

Please share if you get a chance. As I mentioned above, I would like to review it further when I get time.

Thanks!

BecksHawaii · September 9, 2020, 9:27pm

If you can tell me what additional information you would like, I can give you the details. Cheers, Becks

thermokarst · September 9, 2020, 9:42pm

Well, maybe a URL would be useful? Or a link to a paper? Or any kind of information that you think would be relevant? The point is, I have no clue what this "pipeline" is about, and since you were asking for help with it, and mentioned that it uses QIIME 2, I figured it would be helpful for you, and others, if one of the QIIME 2 team members was at least familiar with it.

BecksHawaii · September 9, 2020, 10:32pm

Aloha, hopefully this will be the more general info you are looking for then:
https://edgebioinformatics.org (this is created by Los Alamos Nat. Lab, who I work with, and includes a pipeline for QIIME2 now).

EDGE: Empowering the Development of Genomics Expertise — EDGE develop documentation (this is documentation for the site).

https://academic.oup.com/nar/article/45/1/67/2572059 (article about EDGE)

I am not aware of a specific paper from LANL that focuses on the QIIME2 pipeline and platform.

Hopefully this is what you are looking for. I appreciate the help the other day; I just needed to narrow down which files go with what. EDGE is just running QIIME, so the analysis itself is not really developed within the LANL team per se. I believe some of the info you might be looking for is at the top of the scripts/log I sent.

Cheers
Becks

thermokarst · September 9, 2020, 11:13pm

So then where did this pipeline come from? That's all I was looking for. Is it in-house, or published somewhere? Handed down to you? It doesn't really matter to me, but since you asked some very specific questions about how the pipeline worked earlier in this thread (and none of us on the QIIME 2 team were involved with your pipeline's creation), I was just wondering if there was some way I could actually help you. As much as I wish otherwise, I don't actually have a crystal ball to gaze into...

thermokarst · September 9, 2020, 11:21pm

Is this what you ran on EDGE?

https://edge.readthedocs.io/en/latest/gui.html#run-qiime

BecksHawaii · September 10, 2020, 12:06am

Aloha,

Yes, that is what I run. I use EDGE regularly (as well as KBase) to run 16S data, metagenomics, etc. As I mentioned, it works great for a general analysis since it is a “set” pipeline, and it generates most of the files one needs for further downstream analysis. I also use it in the classroom with K-12 students and undergrads.

Cheers

Becks

BecksHawaii · September 10, 2020, 12:08am

Aloha,

Please direct any further questions to the LANL team directly, so there is no confusion over wording, which I think is the problem. There is a link on their webpage for emailing them, so please discuss with them if you want to know exactly who created the pipeline. They are bogged down with COVID work, so my apologies if they don’t respond right away. I am trying not to bother them too much with my work at the moment because they are working hard on COVID, so I asked the forum instead.

You did help the other day, and I have communicated (repeatedly) how much I appreciate your hard work and your team’s hard work. I had no crystal ball expectations, nor did I ask for that. If the pipeline doesn’t matter to you (I thought you wanted that additional information for your own knowledge of the pipeline, since you asked for it), then I think there is no need to continue this discussion with me. You helped me resolve my question by sorting out which file went with which, from the log I sent you; you guys are the experts on QIIME scripts overall, and it was a big help. Thank you, truly.

Hope you had a lovely holiday!

Cheers

Becks

thermokarst · September 10, 2020, 12:25am

Perfect, thanks, that's all I was looking for! I wasn't asking about the technical details; I just wanted to know how you got your results. This answers the question. Don't worry, I won't ask you any more questions about this. In the future, though, please keep in mind that the only way we can help you is if you provide detailed information about what you have done (this gives us the context necessary to assist). Thanks!

Nicholas_Bokulich · September 18, 2020, 12:07pm

A post was split to a new topic: creating a phylogenetic tree for iTOL

system · October 19, 2020, 6:07pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.