Merging samples with feature-table group

jmb · December 14, 2017, 3:54pm

Hi,

I have a question about merging samples together. My experiment includes samples along the GI tract from four individuals of each of two species. For example, for each individual I have three samples from different locations in the small intestine. Ultimately, I would like to compare the microbiota (e.g., diversity) in the small intestines of the two species.

I am concerned about treating multiple samples from one individual as separate samples in statistical analyses, since they are not really independent. Ideally, I would like to combine the three samples from a given individual into one "composite small intestine" sample, so that I would have four rather than twelve samples per species when I compare species. It seems like I might be able to do this with the feature-table group command, but after looking over the documentation I am unclear about exactly how to do it.

Another idea I had was to concatenate the fastq files prior to importing (the reads are from a single MiSeq run and are already demultiplexed, so I would import them with a manifest). But if I did, the sequence headers in a single file would include different barcodes, and I'm not sure whether this would cause any issues on import.

Thanks, I appreciate any suggestions!

Nicholas_Bokulich · December 14, 2017, 4:09pm

Hi @jmb,
It sounds like the feature-table group action would achieve what you describe. If you have a metadata category that records which individual each sample came from, you can pass it to group to collapse the feature table so that all samples sharing a value in that category become a single sample.
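To make concrete what "collapsing by group" means, here is a minimal plain-Python sketch (not QIIME 2's implementation) that sums each feature's counts across all samples sharing a metadata value. The dict-based table, the function name `group_samples`, and sample IDs such as `A-SI1` are hypothetical. Summing is only one way to combine counts; the QIIME 2 action lets you choose how counts are combined, so check `qiime feature-table group --help` in your release for the available modes.

```python
from collections import defaultdict


def group_samples(table, sample_to_group):
    """Collapse a sample -> {feature: count} table by summing counts per group.

    table: dict mapping sample ID -> dict of feature ID -> count (hypothetical layout)
    sample_to_group: dict mapping sample ID -> group value (e.g. individual ID)
    """
    grouped = defaultdict(lambda: defaultdict(int))
    for sample_id, counts in table.items():
        group = sample_to_group[sample_id]  # KeyError if a sample has no metadata value
        for feature_id, count in counts.items():
            grouped[group][feature_id] += count
    # Convert the nested defaultdicts to plain dicts for readable output.
    return {group: dict(features) for group, features in grouped.items()}


# Hypothetical example: three small-intestine samples from individual A, one from B.
table = {
    "A-SI1": {"f1": 10, "f2": 0},
    "A-SI2": {"f1": 5, "f2": 3},
    "A-SI3": {"f1": 0, "f2": 7},
    "B-SI1": {"f1": 2, "f2": 2},
}
sample_to_group = {"A-SI1": "A", "A-SI2": "A", "A-SI3": "A", "B-SI1": "B"}
grouped = group_samples(table, sample_to_group)
print(grouped)  # {'A': {'f1': 15, 'f2': 10}, 'B': {'f1': 2, 'f2': 2}}
```

The group values (`A`, `B`) replace the original sample IDs in the output, which is why the grouped table needs metadata keyed by those values.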

Note that the grouped table's sample IDs will no longer be the original sample IDs; they will be the values from that metadata category (e.g., the individual ID). So you will need to create a new metadata file with these values as sample IDs for downstream analyses.
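One way to build that new metadata file is to derive it from the per-sample metadata, keeping only columns that take a single value within every individual (such as species) and dropping per-sample columns (such as gut location) that no longer apply after grouping. This is a hypothetical stdlib-only sketch: the column names, the helper names, and the `sample-id` header are assumptions, so confirm which identifier header your QIIME 2 release accepts.

```python
import csv
import io


def make_grouped_metadata(sample_metadata, group_column):
    """Return {group_id: {column: value}} keyed by the values of group_column.

    Only columns with exactly one value inside every group are kept
    (e.g. species); per-sample columns such as location are dropped.
    """
    # Collect the metadata rows belonging to each group (e.g. each individual).
    per_group = {}
    for row in sample_metadata.values():
        per_group.setdefault(row[group_column], []).append(row)

    # Candidate columns: everything except the grouping column itself.
    first_row = next(iter(sample_metadata.values()))
    columns = [col for col in first_row if col != group_column]

    # Keep a column only if it is constant within every group.
    constant = [
        col for col in columns
        if all(len({row[col] for row in rows}) == 1 for rows in per_group.values())
    ]
    return {group: {col: rows[0][col] for col in constant}
            for group, rows in per_group.items()}


def to_tsv(grouped, id_header="sample-id"):
    """Serialize grouped metadata as tab-separated text (header name is an assumption)."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    columns = list(next(iter(grouped.values())))
    writer.writerow([id_header] + columns)
    for group_id, row in grouped.items():
        writer.writerow([group_id] + [row[col] for col in columns])
    return buf.getvalue()


# Hypothetical per-sample metadata for two individuals.
sample_metadata = {
    "A-SI1": {"individual": "A", "species": "sp1", "location": "SI1"},
    "A-SI2": {"individual": "A", "species": "sp1", "location": "SI2"},
    "B-SI1": {"individual": "B", "species": "sp2", "location": "SI1"},
}
grouped_md = make_grouped_metadata(sample_metadata, "individual")
print(to_tsv(grouped_md), end="")
```

Dropping non-constant columns is a deliberate choice here: a value like "location" has no single meaning for a composite sample, so carrying it over would be misleading.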

Keeping technical replicates separate also has benefits from a statistical standpoint. Within-individual variation can be important when comparing samples, e.g., with diversity statistics: high intra-individual variation may decrease, rather than artificially inflate, confidence. But it cuts both ways. At the very least, it is worth looking at a PCoA plot to see how much "spread" there is between technical replicates, and grouping these samples only if doing so does not artificially reduce noise and you still have sufficient replication for other tests.
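Alongside the PCoA plot, the same question can be asked numerically: are samples from the same individual more similar to each other than to samples from other individuals? The sketch below computes Bray-Curtis dissimilarity (chosen here only as an example metric) for every pair of samples and averages the within-individual and between-individual pairs. The helper names and dict-based counts are hypothetical, and this is not a QIIME 2 command; on real data you would likely compute distances on rarefied counts or use QIIME 2's own distance matrices.

```python
from itertools import combinations
from statistics import mean


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two {feature: count} dicts."""
    features = set(a) | set(b)
    num = sum(abs(a.get(f, 0) - b.get(f, 0)) for f in features)
    den = sum(a.get(f, 0) + b.get(f, 0) for f in features)
    return num / den if den else 0.0  # two empty samples are treated as identical


def within_vs_between(table, sample_to_group):
    """Mean dissimilarity for same-individual pairs vs different-individual pairs.

    Raises statistics.StatisticsError if either category has no pairs.
    """
    within, between = [], []
    for s1, s2 in combinations(sorted(table), 2):
        d = bray_curtis(table[s1], table[s2])
        if sample_to_group[s1] == sample_to_group[s2]:
            within.append(d)
        else:
            between.append(d)
    return mean(within), mean(between)


# Hypothetical counts: two locations from each of two individuals.
table = {
    "A-SI1": {"f1": 10, "f2": 0},
    "A-SI2": {"f1": 8, "f2": 2},
    "B-SI1": {"f1": 0, "f2": 10},
    "B-SI2": {"f1": 1, "f2": 9},
}
sample_to_group = {"A-SI1": "A", "A-SI2": "A", "B-SI1": "B", "B-SI2": "B"}
within, between = within_vs_between(table, sample_to_group)
print(f"within-individual: {within:.2f}, between-individual: {between:.2f}")
```

Roughly, a within-individual mean far below the between-individual mean suggests the location samples are close to redundant, while similar values suggest they carry real spread that grouping would hide.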

I hope that helps!

jmb · December 15, 2017, 8:13pm

Thank you, I was able to do as you described. The piece I was missing before was that I needed to re-make the metadata file. I appreciate your help!