Merging picrust2 feature tables

jairideout · January 29, 2019, 6:54pm

I’m using q2-picrust2 to predict metagenomes from ASVs following the q2-picrust2 tutorial:

1. Using picrust2 default reference files with qiime2-2018.11 release.
2. Inserting ASV representative sequences into picrust2 default reference tree with qiime fragment-insertion sepp.
3. Predicting metagenomes with qiime picrust2 custom-tree-pipeline.

I’m planning to run this pipeline on multiple ASV feature tables and representative sequences generated from different samples, and then merge the resulting picrust2 feature tables (e.g. KO tables). Is this a valid approach, or is it better to merge the ASV feature tables and representative sequences before running picrust2?

I tested out both approaches by:

1. Running picrust2 separately on two single-sample ASV feature tables and merging the picrust2 feature tables with qiime feature-table merge.
2. Merging the two single-sample ASV feature tables and representative sequences with qiime feature-table merge and qiime feature-table merge-seqs, respectively, and running picrust2 on the merged table and sequences.
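For readers who want a concrete picture of the merge step used in both approaches, here is a minimal sketch in plain Python. It is a toy model, not the q2-feature-table implementation: tables are assumed to be nested dicts of the form `{sample: {feature: count}}`, the sample and feature names are made up, and `merge_tables` is a hypothetical helper. It shows the behaviour that matters here: when the sample IDs do not overlap, the merged table holds the union of the features, with zeros where a feature was absent from a sample.

```python
def merge_tables(*tables):
    """Merge {sample: {feature: count}} tables whose sample IDs do not overlap."""
    merged = {}
    for table in tables:
        for sample, counts in table.items():
            if sample in merged:
                raise ValueError(f"sample {sample!r} appears in more than one table")
            merged[sample] = dict(counts)
    # Give every sample an entry for every feature; absent features count as 0.
    all_features = sorted({f for counts in merged.values() for f in counts})
    return {
        sample: {f: counts.get(f, 0) for f in all_features}
        for sample, counts in merged.items()
    }


# Two single-sample tables (made-up IDs and counts) that share one feature.
table_a = {"sampleA": {"asv1": 10, "asv2": 5}}
table_b = {"sampleB": {"asv2": 7, "asv3": 3}}

merged = merge_tables(table_a, table_b)
print(merged)
```

The same shape applies whether the features are ASVs (merging before picrust2) or KOs (merging after picrust2); only the point in the workflow differs.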

I compared the resulting picrust2 feature tables with qiime feature-table summarize and noticed some minor differences. For example, the numbers of samples and features (e.g. KOs) are the same in both tables, but the total number of sequences is lower for Approach #1 (95,409,620 seqs) than for Approach #2 (95,412,070 seqs).
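The two totals above differ by 2,450 out of roughly 95 million, so a summary alone will not show where the discrepancy sits. One way to localize it is a cell-by-cell comparison of the two tables. The sketch below is illustrative only: it assumes the tables have been exported to nested dicts of the form `{sample: {feature: count}}`, the values are invented rather than taken from the real outputs, and `table_differences` and `total_frequency` are hypothetical helpers.

```python
def total_frequency(table):
    """Sum of all counts in a {sample: {feature: count}} table."""
    return sum(sum(row.values()) for row in table.values())


def table_differences(table_1, table_2):
    """Return {(sample, feature): (value_1, value_2)} for every cell that differs."""
    diffs = {}
    for sample in sorted(set(table_1) | set(table_2)):
        row_1 = table_1.get(sample, {})
        row_2 = table_2.get(sample, {})
        for feature in sorted(set(row_1) | set(row_2)):
            v1, v2 = row_1.get(feature, 0), row_2.get(feature, 0)
            if v1 != v2:
                diffs[(sample, feature)] = (v1, v2)
    return diffs


# Invented KO tables standing in for the outputs of the two approaches.
approach_1 = {
    "sampleA": {"K00001": 120, "K00002": 40},
    "sampleB": {"K00001": 80, "K00002": 10},
}
approach_2 = {
    "sampleA": {"K00001": 120, "K00002": 40},
    "sampleB": {"K00001": 80, "K00002": 12},
}

print(total_frequency(approach_1), total_frequency(approach_2))
print(table_differences(approach_1, approach_2))
```

If the differing cells cluster in a handful of KOs, that would point to a small number of ASVs being predicted differently between the two runs rather than a table-wide effect.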

Are these differences to be expected (e.g. due to stochasticity in picrust2 or some other reason)? Is one approach recommended over the other?

I’m happy to provide exact commands and outputs if that’s helpful – just wanted to put the conceptual question about merging out there first. Thanks for your guidance!

thermokarst · January 30, 2019, 9:24pm

ccing @gmdouglas

gmdouglas · January 30, 2019, 9:48pm

Hi @jairideout,

Interesting question! Did you run SEPP for each approach independently? That could be where variability is introduced if so, although I haven’t tested SEPP enough to know for sure.

I would recommend merging the tables beforehand to ensure that the same predictions are output for the same ASVs present across multiple tables. If ASVs don’t overlap then it shouldn’t make a difference though.
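To illustrate the reasoning, here is a simplified sketch of why a shared ASV matters. It assumes a sample's abundance for one function is the sum over ASVs of ASV count times predicted gene copy number, and it leaves out other steps such as 16S copy-number normalization. The counts, the copy numbers, and the `predict_function_abundance` helper are all hypothetical, and the differing prediction for the shared ASV is an assumed scenario, not something confirmed in this thread.

```python
def predict_function_abundance(asv_counts, copies_per_asv):
    """Per sample: sum of (ASV count x predicted gene copies) for one function."""
    return {
        sample: sum(count * copies_per_asv[asv] for asv, count in counts.items())
        for sample, counts in asv_counts.items()
    }


# Two single-sample ASV tables that share one ASV (made-up values).
table_a = {"sampleA": {"asv_shared": 100, "asv_a": 50}}
table_b = {"sampleB": {"asv_shared": 200, "asv_b": 30}}

# Separate runs: assume the shared ASV ends up with a different predicted
# copy number in each run (for example, because it was placed differently).
run_a = predict_function_abundance(table_a, {"asv_shared": 2, "asv_a": 1})
run_b = predict_function_abundance(table_b, {"asv_shared": 3, "asv_b": 1})
separate = {**run_a, **run_b}

# Single run on the merged table: one prediction per ASV, used for every sample.
merged_table = {**table_a, **table_b}
together = predict_function_abundance(
    merged_table, {"asv_shared": 2, "asv_a": 1, "asv_b": 1}
)

print(separate)
print(together)
```

In this toy case sampleB differs between the two workflows only because of the shared ASV. With no shared ASVs, each ASV is predicted exactly once either way, which matches the point that non-overlapping tables should give the same result.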

jairideout · January 31, 2019, 9:21pm

Thanks for getting back to me @gmdouglas!

Yes, for Approach #1 I ran SEPP separately on each sample's representative sequences, and ran picrust2 on each SEPP tree independently prior to merging the picrust2 feature tables. For Approach #2 I merged both samples' representative sequences and ran SEPP on those a single time prior to the picrust2 predictions.

It makes sense that SEPP may be the source of variability here (I haven't confirmed this though). Thank you for recommending a path forward -- I'll merge tables and representative sequences beforehand (Approach #2) to ensure that shared ASVs are predicted in a consistent manner across samples.
