How to produce a merged feature table?

wlb_merging_help_code.txt (2.2 KB) feature-table_w_tax.txt (219 Bytes) merged0.qza (59.0 KB) merged_data0.qza (59.6 KB)

Summary:
I am searching for a generalizable way to merge samples for downstream analysis and have not had success with feature-table group, feature-table merge, or feature-table merge-seqs.

I need to be able to merge both technical replicates and biological replicates, some from different sequencing runs. I have successfully used feature-table group for producing/viewing taxa barplots. I have tried using feature-table merge and merge-seqs to try to produce unified feature tables. But barplots are not all I need, and at the end, the exported feature tables still seem to contain data about original inputs rather than about fused artifacts.

Long version:
I have searched the forum but found nothing exactly like my case (or nothing for a recent version of QIIME), and many users seem to stop analyzing once they have barplots. I also need to run diversity commands, e.g. diversity core-metrics-phylogenetic, and use merged feature tables as inputs to MicrobiomeAnalyst. I have read the glossary. I am running QIIME 2 2021.2, installed through conda. I don’t have error messages; I’m looking for a how-to. I am already aware of the "Fecal microbiota transplant (FMT) study: an exercise" tutorial in the QIIME 2 2021.2.0 documentation and found merge/merge-seqs there. Maybe I ran the wrong commands.

I have sampled three ponds four times each, taking three amphibian egg masses from each pond on each visit, and four eggs from each egg mass. The 16S V4/V5 sequences from these eggs are my data. I sequenced technical replicates at first: three samples, each in triplicate (e.g., A1-A3, B1-B3, C1-C3), giving eighteen fasta.gz files from R01 and R02 for these “nine” samples. I want to work with just A, B, and C, whether by concatenating the sequence files as inputs or by producing fused QIIME artifacts later on. These A, B, and C will then be part of the larger, later dataset, which contains the remainder of my samples.

Also, not all my samples were successfully sequenced, cutting some of my biological replicates out. It will be useful for me to be able to treat all the eggs from one mass as one sample, or all masses from one visit to a pond as one mass.
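To make the goal concrete, here is a minimal sketch in plain Python of the collapsing described above: summing feature counts across every sample that belongs to the same group. It is not QIIME code and does not reproduce any QIIME implementation; the function name `collapse_samples`, the sample IDs, the `ASV_*` feature IDs, and all counts are hypothetical and exist only for illustration.

```python
from collections import defaultdict


def collapse_samples(table, sample_to_group):
    """Sum feature counts across all samples that map to the same group.

    table: {sample_id: {feature_id: count}}   (hypothetical layout)
    sample_to_group: {sample_id: group_id}    (e.g. replicate -> egg mass)
    """
    collapsed = defaultdict(lambda: defaultdict(int))
    for sample_id, counts in table.items():
        group = sample_to_group[sample_id]
        for feature_id, count in counts.items():
            collapsed[group][feature_id] += count
    # Convert nested defaultdicts to plain dicts for easy inspection.
    return {group: dict(counts) for group, counts in collapsed.items()}


# Made-up counts: three technical replicates of A, one sample of B.
table = {
    "A1": {"ASV_1": 10, "ASV_2": 0},
    "A2": {"ASV_1": 7, "ASV_2": 3},
    "A3": {"ASV_1": 5, "ASV_2": 1},
    "B1": {"ASV_1": 2, "ASV_2": 8},
}
sample_to_group = {"A1": "A", "A2": "A", "A3": "A", "B1": "B"}

collapsed = collapse_samples(table, sample_to_group)
print(collapsed)
```

The same mapping idea covers both cases in question: pointing replicates at one sample, or pointing all eggs from one mass at that mass. Only the grouping column changes.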

I have so far tried, as a minimal and laborious case, merging just two of my technical replicates: turning A-1 and A-2 into A. I have attached my work (abbreviated code, feature table, two merged artifacts) merging BP614-1 and BP614-2 and trying (and failing) to produce a feature table of just one “sample”, BP614. The code and feature table are self-explanatory; merged0.qza is the FeatureTable[Frequency] result of feature-table merge, and merged_data0.qza is the FeatureData[Sequence] result of feature-table merge-seqs.
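The difference between what was obtained and what was wanted can be sketched in plain Python. This is an illustration under assumptions, not QIIME's implementation: `merge_tables` and `fuse_samples` are hypothetical helpers, the counts are made up, and raising an error on duplicate sample IDs is a choice of this sketch. Merging two tables with different sample IDs keeps both samples as separate columns, whereas the goal is a single BP614 column holding the summed counts.

```python
from collections import Counter


def merge_tables(*tables):
    """Combine tables side by side: every input sample stays its own column."""
    merged = {}
    for table in tables:
        for sample_id, counts in table.items():
            if sample_id in merged:
                # Sketch-level choice: refuse to guess how to combine duplicates.
                raise ValueError(f"duplicate sample ID: {sample_id}")
            merged[sample_id] = dict(counts)
    return merged


def fuse_samples(table, sample_ids, new_id):
    """Sum the listed samples into one new sample; leave the others untouched."""
    total = Counter()
    for sample_id in sample_ids:
        total.update(table[sample_id])
    fused = {sid: c for sid, c in table.items() if sid not in sample_ids}
    fused[new_id] = dict(total)
    return fused


# Made-up counts for the two technical replicates.
rep1 = {"BP614-1": {"ASV_1": 12, "ASV_2": 4}}
rep2 = {"BP614-2": {"ASV_1": 9, "ASV_3": 6}}

merged = merge_tables(rep1, rep2)
print(sorted(merged))   # both replicate IDs are still present

fused = fuse_samples(merged, ["BP614-1", "BP614-2"], "BP614")
print(fused)            # a single BP614 sample with summed counts
```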

How can I take arbitrarily many paired-end sequences and mash them together with the ease of the command line interface? My apologies for the very long and complicated request. If I’ve omitted anything important or been foolish, I apologize and will be happy to try to remedy anything I did wrong. Thanks to any helper.

Hi, @wburgess,

Thanks for all the details! I need to do a bit more research before I try to advise you, but in the meantime I suggest searching the forum for something like, “merge replicates.” If you find any posts that are successfully accomplishing something similar to what you need to do but not quite, it would be helpful to link to those posts with a note about how/why that doesn’t address your issue.

Cheers,
:sponge: :mage:

@wburgess, @llenzi shared a post with me in which they were discussing something similar with @SoilRotifer:

I think this could provide a good starting point for you! Please feel free to follow up with any specific questions. :smiley: