Merging two datasets given phylogenetic trees, OTU tables, & metadata


I am collaborating with another research group that has been collecting 16S data on children at multiple ages. They have shared two separate datasets with me: one containing participants with data at all ages, and another containing participants with data from only one age timepoint. For each dataset, the materials provided include a phylogenetic tree, an OTU table, and a metadata file. I am interested in analyzing all samples at Age 1, regardless of whether they have longitudinal data. So my question is: could I merge the samples from both datasets that were collected at Age 1, given the materials I have? My hunch is that I cannot, because merging the two phylogenetic trees would create issues, but I am not very experienced with microbiome data analysis, so I would greatly appreciate hearing others’ thoughts.

Thank you!

Hi @fquerdasi ,
You have some options here depending on what format your files are in:

  1. If at all possible, start with the raw FASTQ files and work right from the beginning. If the samples are all from the same run, you can import them all as one artifact and go through the regular pipeline (denoising, tree building, taxonomy, analysis, etc.). If the samples come from different runs (and are Illumina data), you can use Deblur to denoise.
  If raw reads are not available:
  2. You can either import all your tables into QIIME 2 and merge them there, or merge the individual tables before importing. Depending on how comfortable you are with programming languages, the latter might be a lot easier: you can do it in R or Python, or with biom itself, which has a merge function. Or, if your files are already in QIIME 2 format, you can simply merge the tables with the feature-table merge action (see an example here).
  3. You’ll need to build a new tree from the merged table; I’m not familiar with any method for merging trees in QIIME 2 (or anywhere else, really). To build the tree you’ll need a representative sequence file. To get this from your table:
    a) If you’re starting with a merged .biom table:
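# use your merged biom table here
# note: this prints each feature ID as both the FASTA header and the sequence,
# so it only works if the feature IDs are the sequences themselves
biom summarize-table --observations -i merged_biom_table.biom \
| tail -n +16 | awk -F ':' '{print ">"$1"\n"$1}' > rep-seqs.fna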

#then import this into QIIME 2 as a FeatureData artifact

qiime tools import \
  --input-path rep-seqs.fna \
  --type 'FeatureData[Sequence]' \
  --output-path rep-seqs.qza

    b) If you’re starting with a QIIME 2 feature table: export the table artifact to get the underlying biom table, then follow the steps above to get the rep-seqs file:

qiime tools export \
  --input-path feature-table.qza \
  --output-path exported-feature-table
  4. Once you have your table and rep-seqs file, you can follow any of the QIIME 2 tutorials to build a tree, assign taxonomy, and run analyses.
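
One caveat on the awk one-liner in 3a: it copies each feature ID into the sequence line, so the resulting FASTA is only valid when the IDs are the DNA sequences themselves (not hashes or OTU numbers). Here is a minimal sketch of a sanity check, assuming you have dumped the feature IDs one per line into a text file. The function name ids_to_fasta and the file names are made up for illustration, and the toy IDs stand in for your real ones:

```shell
# Hypothetical helper: read feature IDs (one per line), write FASTA records
# for IDs that look like DNA, and flag everything else on stderr.
ids_to_fasta() {
  awk '
    BEGIN { bad = 0 }
    /^[ACGTNacgtn]+$/ { print ">" $0; print $0; next }  # ID is a sequence: header + sequence
    NF { print "not a sequence: " $0 > "/dev/stderr"; bad = 1 }  # non-empty, non-DNA ID
    END { exit bad }  # non-zero exit status if any ID was rejected
  ' "$1"
}

# Toy input standing in for the feature IDs from your merged table
workdir=$(mktemp -d)
printf 'ACGTACGT\nTTGGCCAA\n' > "$workdir/feature-ids.txt"
ids_to_fasta "$workdir/feature-ids.txt" > "$workdir/rep-seqs.fna"
cat "$workdir/rep-seqs.fna"
```

If the function exits with a non-zero status, the IDs are not sequences. In that case the one-liner will not give you usable rep-seqs, and you would most likely need the representative sequences from whoever generated the tables.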

Hope this helps.