How to phylogenetically analyse 16S genes recovered from metagenomes?

Hi there,

I recently reconstructed or identified some 16S genes using tools like MATAM or barrnap, respectively. I noticed that the resulting 16S fragments vary in length across samples or MAGs, generally from 300 to 1500 bp. My aim is to find out whether there is any novel high-rank lineage in my samples, but at a low computational cost, which is why I would rather not analyse the MAGs directly.

To my current knowledge, 16S phylogenetic analysis is commonly based on one or two of the hypervariable regions, using fragments of virtually the same length. That doesn't suit my fragments.

I also learned that some tools, such as AutoTax, perform de novo taxonomy. However, full-length 16S is required. I am not sure whether this sort of method is also applicable to partial 16S sequences.

How can I deal with my fragments to perform phylogenetic analysis? My current plan is to discard the short fragments, which mostly derive from 16S identification in MAGs, and retain the near-full-length fragments (>1400 bp, or ideally 1500 bp).

Many thanks in advance.

Thank you for sharing this question with us!

I don't know of a pipeline that does this, so instead I'll brainstorm a way to do this from within QIIME 2.

Yes. Importantly, 16S amplicon studies use PCR amplification to target the same part of the same gene, so the amplicons are all about the same length. This means our tools usually do global alignments, so our input sequences need to have end-to-end coverage.

But you have variable lengths:

So let's start here. See if you can extract just one region from all your MAGs, say the 16S V4 region, so you get a set of 16S V4 sequences. This will be like computational PCR, and you can process the results like the 16S V4 sequences from any other study.
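To make the "computational PCR" idea concrete, here is a minimal sketch in plain Python. It is only an illustration, not how RESCRIPt or any QIIME 2 plugin does it. The assumptions are mine: the commonly used 515F/806R V4 primer pair, exact matching of degenerate primers (no mismatches allowed), and the function names, which are made up for this example.

```python
import re

# IUPAC degenerate nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

# Assumed primer pair (515F / 806R), both written 5'->3'.
FWD = "GTGYCAGCMGCCGCGGTAA"
REV = "GGACTACNVGGGTWTCTAAT"


def reverse_complement(seq):
    """Reverse-complement a DNA string, including IUPAC codes."""
    return seq.translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Turn a degenerate primer into a regular expression."""
    return "".join(IUPAC[base] for base in primer.upper())


def extract_region(seq, fwd_primer=FWD, rev_primer=REV):
    """Return the sequence between the two primers (primers trimmed).

    Both orientations of the input are tried. Returns None when the
    primer sites are not both present, e.g. for a fragment that does
    not span the region.
    """
    fwd_re = re.compile(primer_regex(fwd_primer))
    # On the forward strand the reverse primer appears reverse-complemented.
    rev_re = re.compile(primer_regex(reverse_complement(rev_primer)))
    seq = seq.upper()
    for strand in (seq, reverse_complement(seq)):
        fwd = fwd_re.search(strand)
        if fwd is None:
            continue
        rev = rev_re.search(strand, fwd.end())
        if rev is not None:
            return strand[fwd.end():rev.start()]
    return None


# Toy example: flank + 515F site + insert + 806R site + flank.
toy = "TTTT" + "GTGCCAGCAGCCGCGGTAA" + "ACGTACGTAC" + "ATTAGATACCCTGGTAGTCC" + "GGGG"
print(extract_region(toy))  # ACGTACGTAC
```

Fragments that return None simply don't cover the region, so they drop out of the V4 set. Real data will have mismatches in the primer sites, which this exact-match sketch does not tolerate, so a dedicated tool is the better choice for an actual analysis.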

RESCRIPt can do this a few different ways. Try this!


Instead of going towards shorter sequences from one variable region, like 16S V4, you could also go towards full length, like you mentioned!

retain near full-length fragments (>1400 bp, or best 1500 bp)

I don't know anything about full length 16S taxonomy, but there has got to be a way!
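The length-filtering step, at least, is simple. A minimal sketch in plain Python, assuming the recovered 16S sequences are in FASTA format and using the 1400 bp cutoff from the question (the function names are made up for this example, and "at least 1400 bp" is my reading of the threshold):

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records, min_len=1400):
    """Keep only records whose sequence is at least min_len bases long."""
    return [(h, s) for h, s in records if len(s) >= min_len]
```

With a file on disk this would be used as `filter_by_length(read_fasta(open(path)))`, where `path` points to your own FASTA file. The retained near-full-length sequences can then go into whatever alignment and tree-building workflow you choose.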

Hi Colin,

Thanks for your help. I will consider which of the two ways is better for my work. Extracting V4 or another region seems feasible, but I am a bit worried about how accurately this approach can distinguish clades at high taxonomic ranks. I will read more of the literature before making a final decision. Anyway, thank you and the big QIIME 2 family again.

Best regards.
