Merging two different libraries from the same experiment

Dear All,

I'm posting this question hoping that it will be accepted.

I am analyzing fecal microbiota from an experiment conducted in two separate batches. In the first batch, the V3-V4 region was sequenced using an Illumina MiSeq. In the second batch, only the V3 region was sequenced using an Illumina NextSeq 2000. Is it possible to combine these two libraries before proceeding with denoising and downstream analyses? If so, which tools within QIIME2-AMPLICON-2025.10 should I use?

Hello @AMosca96,

The best approach here is probably to run the same analysis on the two datasets in parallel and compare the results you get in the end for each dataset.

If you do really want to merge them, you are most likely going to want to extract only the V3 region out of your V3-V4 sequences. The addition of the V4 region is going to throw off your analysis if you merge V3-only data with V3-V4 combined data.
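To illustrate what "extracting only the V3 region" means, here is a minimal Python sketch, not a QIIME 2 command. It assumes the V3-V4 reads already have their forward primer removed and start at the same position as the V3-only reads, so only the 3' end needs to be cut at the V3 reverse primer site. The function names are hypothetical, and the default primer is the reverse primer quoted later in this thread.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse-complement a DNA string, including IUPAC ambiguity codes."""
    return seq.translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Compile a primer (possibly degenerate) into an exact-match regex."""
    return re.compile("".join(IUPAC[base] for base in primer))


def extract_v3(read, rev_primer="ATTACCGCGGCTGCT"):
    """Return the part of `read` upstream of the V3 reverse primer site.

    The reverse primer anneals to the opposite strand, so we search the
    read for its reverse complement. Returns None if the site is absent.
    Assumption: no mismatches are tolerated, unlike real trimming tools.
    """
    site = primer_regex(reverse_complement(rev_primer)).search(read)
    return read[:site.start()] if site else None
```

A real workflow would use a tool that tolerates mismatches in the primer site; this only shows the idea of truncating the longer amplicon to the shared region.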

If you really do want to merge, I can help you with that, but I would again advise you to just run the datasets in parallel unless you have a very compelling reason to merge.

Thank you,
Anthony Simard


One alternative to what @Oddant1 is suggesting - you could also do a closed-reference OTU clustering process on the data after processing them independently with q2-dada2. QIIME 2's OTU clustering approaches are described here, though I just noticed some failures in that documentation that we will be fixing this week. You would basically follow the steps in that document, but use cluster-features-closed-reference in place of cluster-features-open-reference. I would use a percent identity threshold of 1.0.
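As a rough picture of why closed-reference clustering at 100% identity can put V3 and V3-V4 ASVs into the same feature, here is a small Python sketch. It is a conceptual stand-in, not what cluster-features-closed-reference does internally: it treats a "100% hit" as the ASV being an exact substring of a full-length reference sequence, and it takes the first matching reference when several match. The function name and data layout are hypothetical.

```python
from collections import defaultdict


def closed_reference_exact(table, asv_seqs, reference):
    """Collapse ASV counts onto reference IDs using exact substring matches.

    table:     {asv_id: {sample_id: count}}
    asv_seqs:  {asv_id: sequence}
    reference: {ref_id: full-length reference sequence}

    Returns (clustered_table, unmatched_asv_ids). ASVs with no exact
    match are reported as unmatched and excluded from the table.
    """
    clustered = defaultdict(lambda: defaultdict(int))
    unmatched = []
    for asv_id, seq in asv_seqs.items():
        # Simplification: first reference containing the ASV wins.
        hit = next((rid for rid, rseq in reference.items() if seq in rseq), None)
        if hit is None:
            unmatched.append(asv_id)
            continue
        for sample, count in table[asv_id].items():
            clustered[hit][sample] += count
    return {rid: dict(samples) for rid, samples in clustered.items()}, unmatched
```

Because both a V3 ASV and a V3-V4 ASV from the same organism can match the same reference sequence, their counts end up under one reference ID, at the cost of dropping anything not in the reference.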


Thank you for your reply. Unfortunately, I must merge the two batches because the second batch doesn’t have control samples, and I’d like to try an ASV-based approach instead of the OTU approach.

I used RESCRIPt to extract the V3 region from the ASVs obtained from denoising, but when I eventually merged the two datasets I observed a so-called “batch effect”.

I used the following commands for both batches, specifying the primers for the V3 amplicon:

qiime rescript trim-alignment --i-aligned-sequences output_alignment_batch2.qza --p-primer-fwd CCTACGGGRSGCAGCAG --p-primer-rev ATTACCGCGGCTGCT --o-trimmed-sequences output_rescript_batch2.qza

Then I converted the “feature table alignment” output to a “feature table sequence” output for both batches and merged the results. I also merged the two ASV tables that I obtained previously from the denoising analysis.

I’m not sure if this approach is correct and if it’s possible to avoid the batch effect.

Any suggestions would be much appreciated! :folded_hands:

A similar discussion is happening in another thread on the forum right now. @AMosca96, I recommend taking a look at this post.

The approach you're describing will not resolve matching ASVs from the two runs into the same feature. If I understand the approach correctly, the ASV IDs are not updated after the sequences are trimmed, so you will continue to observe them as different features in the merged table (which will lead to an apparent batch effect that is bigger than what is probably there). There are ways you might be able to post-process those data to address this, e.g., de novo OTU picking at 100% identity following your trimming and merging, but I wouldn't recommend this route. Rather, I would go with one of the approaches covered in the post I'm linking to.
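To make the ID problem concrete: with q2-dada2's default hashed feature IDs, each ID is an MD5 hash of the ASV sequence, so an ID computed before trimming no longer corresponds to the trimmed sequence. The Python sketch below re-derives IDs from the trimmed sequences and sums counts for features that become identical. This is exact dereplication, used here as a simple stand-in for the 100% identity de novo clustering mentioned above; it assumes the trimmed sequences have been degapped, and the function name and data layout are hypothetical.

```python
import hashlib
from collections import defaultdict


def rehash_and_collapse(table, trimmed_seqs):
    """Recompute feature IDs from trimmed sequences and merge duplicates.

    table:        {old_feature_id: {sample_id: count}}
    trimmed_seqs: {old_feature_id: trimmed, degapped sequence}

    Returns (new_table, new_seqs), both keyed by the MD5 hex digest of
    the trimmed sequence.
    """
    merged = defaultdict(lambda: defaultdict(int))
    new_seqs = {}
    for old_id, seq in trimmed_seqs.items():
        new_id = hashlib.md5(seq.encode("ascii")).hexdigest()
        new_seqs[new_id] = seq
        # Features that trim to the same sequence now share one ID,
        # so their per-sample counts are summed.
        for sample, count in table.get(old_id, {}).items():
            merged[new_id][sample] += count
    return {fid: dict(samples) for fid, samples in merged.items()}, new_seqs
```

Without a step like this, two ASVs that are identical after trimming still sit in separate rows of the merged table, which inflates the apparent difference between batches. It does not remove the primer bias itself.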

Note that in all realistic scenarios there will be a batch effect if you're using different primers, because the primers enable you to "see" different community members due to primer bias. Combining data sequenced with different primers is therefore always an exercise in trying to manage the bias that we know is there.
