Combining 16S amplicon data from various regions.

Rajeev · June 11, 2024, 6:52am

Hello everyone,

I am currently working on a public 16S rRNA gene amplicon dataset, which was sequenced with different primers targeting three main regions (V3-V4, V4, V4-V5). Based on the previous discussions in this forum, I have trimmed all datasets to the V4 region (common to all datasets) using cutadapt. Now, I plan to run DADA2 separately for each dataset and then merge the representative sequences and feature tables for further analyses.

However, I am unsure whether I should use the same "trim" and "trunc" parameters for all datasets, or whether I can use independent parameters since all datasets have already been trimmed to the same region. Any advice on this would be greatly appreciated. Thanks.

colinbrislawn · June 11, 2024, 1:45pm

Hello Rajeev,

All of this sounds very good. Using the same region for all downstream analyses makes merging possible.

Good question.

The trunc settings truncate the end of R1 and R2, which is the part that overlaps.
For each sequencing run, you can choose what settings make the most reads join.
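
To make the "most reads join" idea concrete, here is a rough sketch of the overlap arithmetic. The amplicon length (253 nt, a typical V4 length after primer removal) and the 12 nt minimum overlap are assumptions for illustration, so check them against your own data and DADA2 settings. `expected_overlap` is a hypothetical helper, not part of any plugin.

```python
def expected_overlap(trunc_len_f: int, trunc_len_r: int, amplicon_len: int) -> int:
    """Bases shared by the forward and reverse reads after truncation."""
    return trunc_len_f + trunc_len_r - amplicon_len


MIN_OVERLAP = 12    # assumed minimum overlap needed to join a read pair
AMPLICON_LEN = 253  # assumed V4 amplicon length after primer removal

# Try a few candidate trunc settings (forward, reverse) for one sequencing run.
for trunc_f, trunc_r in [(150, 150), (140, 120), (130, 110)]:
    overlap = expected_overlap(trunc_f, trunc_r, AMPLICON_LEN)
    status = "ok" if overlap >= MIN_OVERLAP else "too short to join"
    print(f"trunc {trunc_f}/{trunc_r}: overlap {overlap} nt ({status})")
```

Because each run has its own read length and quality profile, the pair of trunc values that keeps enough overlap can differ from run to run.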

The trim settings trim from the start of R1 and R2.
This is the outside section of your amplicon, and it really matters when you have multiple regions.

For ASVs to merge, all their letters must match exactly, including their length.

full 16s |-----------------------|
v4                |---|
v3-v4           |-----|

You may need to remove one or two base pairs from v3-v4 to make it perfectly match the start of v4.
The end of v4 uses the same primer, so that may not need trimming.
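
A toy illustration of why the start position matters, using made-up sequences and counts. Tables are treated as plain dicts keyed by ASV sequence, and `trim_start` is a hypothetical helper. In a real analysis the trimming would happen before denoising (with cutadapt or the DADA2 trim settings), not on the finished table; this only shows the exact-match behaviour.

```python
from collections import Counter


def trim_start(table: dict[str, int], n: int) -> Counter:
    """Drop n leading bases from every ASV, summing counts of ASVs that collapse."""
    trimmed = Counter()
    for seq, count in table.items():
        trimmed[seq[n:]] += count
    return trimmed


# Same organism in two runs; run_b starts 2 bases earlier (made-up data).
run_a = {"TACGGAGGGT": 40}
run_b = {"GGTACGGAGGGT": 25}

naive = Counter(run_a) + Counter(run_b)        # sequences differ, so 2 features
fixed = Counter(run_a) + trim_start(run_b, 2)  # identical after trimming, 1 feature

print(len(naive), len(fixed))
```

With the two extra bases removed, both runs contribute to a single feature instead of two.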

Your cutadapt trimming may have already solved this problem, so run DADA2 and see if you can merge the output tables!

Good luck! Let us know if you have questions.

Rajeev · June 12, 2024, 3:39am

Thank you, @colinbrislawn, for your kind response. If I understood correctly, you suggest that the "trunc" parameter can be dataset-specific and adjusted to achieve a higher number of merged reads for each dataset. However, it is preferable to maintain consistent "trim" parameters across different datasets. Thanks once again.

colinbrislawn · June 12, 2024, 1:24pm

That's right! The devs recommend running DADA2 separately on each sequencing run, so you have the chance to adjust the 'trunc' setting to help the reads join.

Keeping 'trim' consistent is usually the right call, but your data may be the exception!

The output region must match exactly, and you can use 'trim' to adjust the start and end of the amplicon so they create identical ASVs.

If you used the same primers, changing 'trim' would probably mess up the ASVs, so most of the advice on the forums says to keep trim consistent.
(Same primers, same trim settings)

For your data, different primers were used, so changing 'trim' could be exactly what you need to get all the ASVs to match exactly.

Rajeev · June 13, 2024, 2:13am

Thank you, @colinbrislawn. Appreciate your response but I'm a bit confused here. As I mentioned earlier, I have already trimmed all datasets to the V4 region using V4-targeting primers with Cutadapt. Can I now use a consistent "trim" parameter and dataset-specific "trunc" parameters? I believe this is feasible because all the data are now technically from the same region.

colinbrislawn · June 13, 2024, 12:24pm

Yes. Because you used cutadapt to select the same region, you can use the same trim parameters. This is the next step!

Later on, if you need to adjust things slightly, you can change the trim settings.
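
If the merged table shows few shared ASVs, one way to check whether a start offset is the cause is sketched below. It assumes you have exported the representative sequences of two runs as plain sets of strings; `start_offsets` and the example sequences are hypothetical. It only looks for extra bases at the start, since the end of the amplicon shares a primer.

```python
from collections import Counter


def start_offsets(reference: set[str], query: set[str]) -> Counter:
    """For each query ASV missing from reference, count how many leading bases
    must be dropped before it equals some reference ASV exactly."""
    offsets = Counter()
    for seq in query - reference:
        for n in range(1, len(seq)):
            if seq[n:] in reference:
                offsets[n] += 1
                break  # smallest offset that produces an exact match
    return offsets


# Made-up sequences: two query ASVs carry 2 extra leading bases, one is unrelated.
ref_asvs = {"TACGGAGGGT", "TACGTAGGGT"}
query_asvs = {"GGTACGGAGGGT", "GGTACGTAGGGT", "TACGAAGGGC"}

print(start_offsets(ref_asvs, query_asvs))
```

If most unmatched ASVs agree on one offset, that number is a reasonable candidate for the extra trim on that dataset.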

Rajeev · June 14, 2024, 1:25am

Thanks, @colinbrislawn, for this informative discussion. I will proceed with the analyses :).