Question about analyzing different sample types

MBugay · March 17, 2025, 6:53pm

Hi all,

I have a question about processing/analyzing different sample types as I have not done this before. I have 3 different sample types (A, B, C) that were all on the same sequencing run. As of now, I know that I will be comparing B and C, but I'm unsure if I will eventually add A to the B-C comparison or if A will be a standalone. For more context, all sample types had the same mixed amplicon primers (16S and ITS) and were dual-indexed so each sample had its own unique indices. The sequencing center has already de-multiplexed them.

I wanted to ask how people generally handle multiple sample types in QIIME2. For instance, do you run everything together for DADA2, generating a tree, etc., or do you separate by sample type and run each one separately? Or is there no one way to analyze multiple sample types, and it's up to the researcher?

I would probably trim any adapters prior to QIIME2 and was assuming it doesn't matter if I trimmed them together or separated them by sample type before trimming.

Eventually, I would plan on exporting the qiime artifacts to R. Any advice would be appreciated. Thank you.

jwdebelius · March 17, 2025, 7:39pm

Hi @MBugay,

As with everything in microbiome analysis in general, there is no one right way to do this.

Personally, when I deal with this problem, I tend to process all my 16S samples together and all my ITS samples together across all three sample types. (I work in humans, so I'd put my oral, fecal, and colon biopsy samples in the same DADA2 run if they were all sequenced on the same sequencing run.) I'd then build the tree and do the taxonomic classification off my ASVs. I'd likely go through rarefaction with alpha and beta diversity across all my samples.
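For readers new to rarefaction, here is a minimal sketch of the idea in plain Python: subsample every sample to the same read depth without replacement, and drop samples that fall below that depth. The `rarefy` helper, the toy table, and the depth of 60 are all illustrative assumptions; in practice QIIME 2's diversity actions do this for you on the real feature table.

```python
import random

def rarefy(counts, depth, seed=0):
    """Subsample one sample's feature counts to `depth` reads without replacement.

    Returns None when the sample has fewer than `depth` reads, so the caller
    can drop it (hypothetical helper, not a QIIME 2 function).
    """
    # Expand counts into one entry per read, e.g. {"asv1": 2} -> ["asv1", "asv1"].
    reads = [feature for feature, n in counts.items() for _ in range(n)]
    if len(reads) < depth:
        return None
    picked = random.Random(seed).sample(reads, depth)
    rarefied = {}
    for feature in picked:
        rarefied[feature] = rarefied.get(feature, 0) + 1
    return rarefied

# Toy table (made-up numbers): sample ID -> {feature ID: read count}.
table = {
    "A1": {"asv1": 50, "asv2": 30, "asv3": 20},
    "B1": {"asv1": 5, "asv2": 70, "asv3": 0},
    "C1": {"asv2": 10, "asv3": 8},
}

depth = 60  # assumed sampling depth, chosen only for this example
rarefied_table = {}
for sample_id, counts in table.items():
    result = rarefy(counts, depth, seed=42)
    if result is not None:  # samples below the depth are dropped
        rarefied_table[sample_id] = result
```

Because all sample types go through the same depth, the resulting alpha and beta diversity values stay comparable across types, which is one reason to process everything together first.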

I tend to process and calculate beta diversity distance metrics all together because it's very easy to filter a distance matrix, but it's computationally expensive to build one. So, I want to build the biggest distance matrix I think I'll use (all the samples across all the sample types) and then I can filter specifically to what I need later. I tend to do an all-body-site PCoA and taxonomic barplots because I want to make sure everything is labeled correctly.
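The "build once, filter later" point can be sketched in plain Python. This is only an illustration under assumed toy data: the sample IDs, counts, and helper names are hypothetical, and Bray-Curtis is used simply as a familiar example metric. Building the matrix takes a pass over every pair of samples, while filtering is just a lookup of values already computed.

```python
from itertools import combinations

def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two equal-length count vectors."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator if denominator else 0.0

def build_distance_matrix(table):
    """Expensive step: one distance calculation per pair of samples."""
    ids = sorted(table)
    dm = {s: {s: 0.0} for s in ids}
    for a, b in combinations(ids, 2):
        d = bray_curtis(table[a], table[b])
        dm[a][b] = d
        dm[b][a] = d
    return dm

def filter_distance_matrix(dm, keep):
    """Cheap step: reuse stored distances for the samples we want."""
    kept = [s for s in dm if s in keep]
    return {a: {b: dm[a][b] for b in kept} for a in kept}

# Toy data (made up): sample ID -> counts for three features.
counts = {
    "A1": [10, 0, 5],
    "B1": [8, 2, 5],
    "B2": [0, 9, 1],
    "C1": [1, 7, 2],
}
sample_type = {"A1": "A", "B1": "B", "B2": "B", "C1": "C"}

full_dm = build_distance_matrix(counts)  # built once, for all sample types

# Later: restrict to the B vs. C comparison without recomputing anything.
bc_ids = {s for s, t in sample_type.items() if t in {"B", "C"}}
bc_dm = filter_distance_matrix(full_dm, bc_ids)
```

If sample type A is added to the comparison later, the same `full_dm` can be filtered again with a different set of IDs. Within QIIME 2 itself, the corresponding step is the `filter-distance-matrix` action in the diversity plugin.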

I think there are a couple of special cases. If you're planning to do an environment-specific analysis with a specialized database, you might want to filter your table and representative features before doing that taxonomic classification.
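As a rough sketch of what "filter your table and representative features" means, assuming a toy table stored as nested dicts: keep only the samples of one type, then keep only the features (and their sequences) that are actually observed in those samples. The helper names, sample IDs, and sequences below are hypothetical; in QIIME 2 the equivalent steps would be done with the feature-table plugin's filtering actions on the real artifacts.

```python
def filter_table(table, metadata, sample_type):
    """Keep samples of one type, then drop features unobserved in them."""
    kept = {s: feats for s, feats in table.items() if metadata[s] == sample_type}
    observed = {f for feats in kept.values() for f, n in feats.items() if n > 0}
    filtered = {
        s: {f: n for f, n in feats.items() if f in observed}
        for s, feats in kept.items()
    }
    return filtered, observed

def filter_seqs(rep_seqs, features):
    """Keep only the representative sequences for the retained features."""
    return {f: seq for f, seq in rep_seqs.items() if f in features}

# Toy inputs (made up): sample ID -> {feature ID: count}, plus sample types.
table = {
    "A1": {"asv1": 12, "asv2": 0, "asv3": 3},
    "B1": {"asv1": 0, "asv2": 30, "asv3": 4},
    "B2": {"asv1": 0, "asv2": 11, "asv3": 0},
}
metadata = {"A1": "A", "B1": "B", "B2": "B"}
rep_seqs = {"asv1": "ACGTAC", "asv2": "TTGACA", "asv3": "GGCATT"}

b_table, b_features = filter_table(table, metadata, "B")
b_seqs = filter_seqs(rep_seqs, b_features)  # only these would be classified
```

Only `b_seqs` would then go to the specialized classifier, while the full set of representative sequences stays available for the general analysis.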

All that said, there are a few things you may need to be aware of. Well-to-well contamination happens if you do 96-well plate robotic extraction and your sample types were extracted together. Be particularly aware of mixed biomass, as high-biomass samples are more likely to contaminate low-biomass samples than the reverse. I often reference this article on the topic.

There can also be index hopping, where reads from one sample can accidentally be mislabeled as another. I have all kinds of horror stories about well-to-well contamination and index hopping... including the time we accidentally found tick bacteria in our patients, which we were super excited about... until we realized they'd been run with tick samples. Thankfully that never made it anywhere but the "lessons learned" section of a thesis.

Hopefully this helps!

Best,
Justine

MBugay · March 19, 2025, 8:04pm

Hi @jwdebelius

Thank you for the reply! Your response was very informative, especially the points about well-to-well contamination and index hopping.

Fortunately, I did not do robotic extraction, so contamination from that is not an issue.

I do not plan on an environment-specific analysis with a specialized database. For further context, my sample types are 2 types of soil and 1 type of root from the same environment.

I was worried that running everything together would be computationally taxing since I have over 200 samples total, but I have access to an HPC, which should help.

For now, I will plan to process all my 16S and ITS samples separately across all the sample types then filter later.

Thanks again

jwdebelius · March 20, 2025, 2:12pm

Hi @MBugay,

That depends on your depth and available computational power; I'd probably process on an HPC if I could personally, but you could maybe do it on a laptop.

Best,
Justine