Best way to merge or group runs/samples

colinbrislawn · February 4, 2018, 6:36pm

Hello again, Sarah. I hope your 2018 is off to a good start.

Like I said before, when I combine samples from two MiSeq runs, I keep them separate so that I can include a metadata variable like MiSeqRunNumber and use it to detect batch effects. So I would demultiplex these all as separate samples and process them in a single, unified batch to make a single feature abundance table. At the end, I could combine them into the categories you described using this QIIME 2 plugin:
https://docs.qiime2.org/2017.12/plugins/available/feature-table/group/
This page also describes the --p-mode option you were asking about.
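
To make the grouping step concrete, here is a toy Python sketch of what summing samples by a metadata category amounts to. It is not how QIIME 2 implements it; the table layout (a dict of sample ID to feature counts), the `group_samples` helper, and the sample, feature, and category names are all made up for illustration.

```python
from collections import defaultdict


def group_samples(table, metadata_column):
    """Sum feature counts across all samples that share a metadata value.

    table: {sample_id: {feature_id: count}}   (hypothetical layout)
    metadata_column: {sample_id: group_label} (hypothetical layout)
    """
    grouped = defaultdict(lambda: defaultdict(int))
    for sample_id, counts in table.items():
        group = metadata_column[sample_id]  # KeyError if a sample has no metadata
        for feature_id, count in counts.items():
            grouped[group][feature_id] += count
    # Convert the nested defaultdicts to plain dicts for easy inspection.
    return {group: dict(counts) for group, counts in grouped.items()}


# One subject sequenced on two runs, plus a second subject (made-up data).
table = {
    "subjA_run1": {"ASV1": 10, "ASV2": 0},
    "subjA_run2": {"ASV1": 7, "ASV2": 3},
    "subjB_run1": {"ASV1": 1, "ASV2": 5},
}
subject = {"subjA_run1": "subjA", "subjA_run2": "subjA", "subjB_run1": "subjB"}

grouped = group_samples(table, subject)
print(grouped)
```

Because the per-run samples stay separate until this last step, you can still check MiSeqRunNumber for batch effects before collapsing anything.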

The feature-table merge action is needed when you have processed two batches of samples into two tables, and now you want to combine these tables. This is sort of risky because if you have two different samples that happen to have the same name, it could merge them by mistake! To prevent this problem, it starts by checking to see if any of the sample names or feature names overlap. If it finds overlapping names, you can choose what it does next (like sum these two samples, or throw an error about the identical name).
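
The overlap check can be sketched in a few lines of Python. This is only an illustration of the behaviour described here, not the plugin's code: the `merge_tables` helper, its `on_overlap` argument, and the table layout are hypothetical.

```python
def merge_tables(table_a, table_b, on_overlap="error"):
    """Combine two {sample_id: {feature_id: count}} tables (hypothetical layout).

    on_overlap="error": refuse to merge if any sample ID appears in both tables.
    on_overlap="sum":   add together the counts of samples that share an ID.
    """
    if on_overlap not in ("error", "sum"):
        raise ValueError(f"unknown on_overlap value: {on_overlap!r}")

    shared = sorted(set(table_a) & set(table_b))
    if shared and on_overlap == "error":
        raise ValueError(f"sample IDs present in both tables: {shared}")

    # Copy the inner dicts so the input tables are left untouched.
    merged = {sample_id: dict(counts) for sample_id, counts in table_a.items()}
    for sample_id, counts in table_b.items():
        target = merged.setdefault(sample_id, {})
        for feature_id, count in counts.items():
            target[feature_id] = target.get(feature_id, 0) + count
    return merged


# Two made-up tables that both contain a sample called "S2".
run1 = {"S1": {"ASV1": 4}, "S2": {"ASV1": 2, "ASV2": 6}}
run2 = {"S2": {"ASV2": 1}, "S3": {"ASV3": 9}}

try:
    merge_tables(run1, run2)
    overlap_detected = False
except ValueError:
    overlap_detected = True

summed = merge_tables(run1, run2, on_overlap="sum")
print(overlap_detected, summed)
```

If the two "S2" samples are really different samples, summing them is exactly the mistake to avoid, so giving samples run-specific names before merging is the safer habit.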

Let's zoom out a bit. You have lots of samples, and you want to group them in different ways. The major choice here is to either:

1. process them all in one batch, then make meaningful categories using feature-table group, or
2. process them in many batches (one for each meaningful category), then feature-table merge when you want to see the full study.

2024 update:

Both options are okay, but one works better for DADA2!

Because DADA2 builds an error model for each sequencing run, you should run DADA2 on each sequencing run separately, then merge the tables like in option 2.
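
As a rough sketch of the first step, here is one way to split a sample sheet into per-run batches in Python, so each batch can be denoised on its own before the tables are merged. The `split_by_run` helper and the sample IDs are hypothetical; the run labels stand in for a metadata column like MiSeqRunNumber.

```python
from collections import defaultdict


def split_by_run(run_of_sample):
    """Group sample IDs by sequencing run.

    run_of_sample: {sample_id: run_label}, e.g. taken from a MiSeqRunNumber column.
    Returns {run_label: sorted list of sample IDs}.
    """
    batches = defaultdict(list)
    for sample_id, run in run_of_sample.items():
        batches[run].append(sample_id)
    return {run: sorted(ids) for run, ids in batches.items()}


# Made-up sample sheet: three samples spread over two runs.
run_of_sample = {"S1": 1, "S2": 2, "S3": 1}
batches = split_by_run(run_of_sample)
print(batches)
```

Each batch would then go through DADA2 separately, and the resulting tables get combined afterwards.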

If you want to 'collapse/merge/sum' replicates later on, you can use feature-table group, like in option 1.

They serve different purposes, so QIIME 2 supports both!