How to subdivide the 16S rRNA gene into 9 hypervariable regions using QIIME 2?

M_F · September 20, 2021, 3:36am

Hello,
Using the Swift 16S+ITS and QIAseq 16S kits, it's possible to target the entire 16S rRNA gene. However, how can I split this gene into the 9 hypervariable regions (V1V2, V2V3, V3V4, V4V5, V5V6, V6V7, V7V8, V8V9), since I want to investigate each region individually? Moreover, I don't have the primer sequences for the QIAseq and Swift kits, since they supply a pool of primers with undisclosed sequences. I found this article: Development of an Analysis Pipeline Characterizing Multiple Hypervariable Regions of 16S rRNA Using Mock Samples

Referring to Table 1 in the article, which QIIME 2 command should I use to align the reads to the SILVA reference database, and how can I set the start and stop coordinates from Table 1 to split the reads?

Thanks

jwdebelius · September 22, 2021, 7:59am

Hi @M_F,

This is a really challenging problem.

I don't have a good QIIME-based solution right now. My best recommendation would be to contact the sequencing provider and see if they will demultiplex by region for you. (If they have already trimmed the primers, their pipeline must include a primer-removal step, which is where the reads could also have been separated by region.)

Best,
Justine

Nicholas_Bokulich · September 22, 2021, 8:16am

Hi @M_F ,

For a QIIME-based solution:

1. Grab a reference gene (e.g., E. coli, or maybe the Greengenes 80% OTUs).
2. Extract the regions of interest using known primers, or just by trimming at specific positions (e.g., the middle of the conserved regions, based on known positions vs. the E. coli 16S gene). This can be done using q2-feature-classifier (extract by primer on unaligned seqs) or RESCRIPt (extract by primer or position on aligned sequences), depending on how you want to extract.
3. Use q2-quality-control to select sequences that align to these reference genes... this can be done pre-denoising (on the demultiplexed sequences) with filter-reads or post-denoising with exclude-seqs.

Step 3 would need to be run for each region of interest, but the result will be sequences separated by variable region that can be analyzed separately downstream.
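The three steps above can be sketched as a small Python helper that only builds the QIIME 2 command lines for one region, so they can be reviewed before anything is run. Treat this as a rough sketch under assumptions: the file names (`ref-seqs.qza`, `demux.qza`) are hypothetical, the two primer pairs are published general-purpose primers used as stand-ins (not the kit primers), and the flags are written as I recall them, so please confirm each one with `--help`. Note that `filter-reads` drops reads that align to the database by default, so the sketch passes `--p-no-exclude-seqs` to keep the aligning reads instead.

```python
import shlex
from typing import Dict, List, Tuple

# Stand-in primer pairs from the literature (assumption: NOT the kit primers).
REGION_PRIMERS: Dict[str, Tuple[str, str]] = {
    "V3V4": ("CCTACGGGNGGCWGCAG", "GACTACHVGGGTATCTAATCC"),   # 341F / 805R
    "V4V5": ("GTGYCAGCMGCCGCGGTAA", "CCGYCAATTYMTTTRAGTTT"),  # 515F-Y / 926R
}


def region_commands(
    region: str,
    reference_qza: str = "ref-seqs.qza",  # hypothetical full-length reference
    demux_qza: str = "demux.qza",         # hypothetical demultiplexed reads
) -> List[List[str]]:
    """Build (but do not run) the commands for steps 2 and 3 for one region."""
    fwd, rev = REGION_PRIMERS[region]
    ref_region = f"ref-{region}.qza"
    index = f"ref-{region}-bowtie2.qza"
    return [
        # Step 2: cut the region out of the unaligned reference by primer.
        ["qiime", "feature-classifier", "extract-reads",
         "--i-sequences", reference_qza,
         "--p-f-primer", fwd, "--p-r-primer", rev,
         "--o-reads", ref_region],
        # Step 3a: index the region-specific reference for bowtie2.
        ["qiime", "quality-control", "bowtie2-build",
         "--i-sequences", ref_region,
         "--o-database", index],
        # Step 3b: keep only the reads that align to this region.
        ["qiime", "quality-control", "filter-reads",
         "--i-demultiplexed-sequences", demux_qza,
         "--i-database", index,
         "--p-no-exclude-seqs",
         "--o-filtered-sequences", f"demux-{region}.qza"],
    ]


# Print the commands for every region so they can be checked by eye.
for name in REGION_PRIMERS:
    for cmd in region_commands(name):
        print(shlex.join(cmd))
```

Building the commands as lists keeps the per-region loop explicit, and each list could be passed to `subprocess.run` once the flags have been checked.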

Sounds like using RESCRIPt (in step 2) is the method that you are looking for, to trim by position against a reference gene.

With q2-quality-control, see step 3 above.

If you are looking at aligned reads, the position will be different from what is listed here, so you either need to determine manually (i.e., recalculate the positions in Table 1 to include gaps in the E. coli reference alignment) or use common primer sets to trim based on common positions.
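To recalculate a position so that it includes gaps, one option is to walk along the aligned E. coli reference and count only the non-gap characters. The sketch below does that in plain Python. It assumes 1-based positions and that gaps are written as `-` or `.`; the toy alignment string is made up for illustration, and it would be worth checking which indexing convention the trimming tool expects.

```python
def ungapped_to_column(aligned_ref: str, position: int, gap_chars: str = "-.") -> int:
    """Return the 1-based alignment column of the `position`-th (1-based)
    non-gap residue in an aligned reference sequence."""
    if position < 1:
        raise ValueError("position must be >= 1")
    seen = 0
    for column, char in enumerate(aligned_ref, start=1):
        if char not in gap_chars:
            seen += 1
            if seen == position:
                return column
    raise ValueError("position is beyond the end of the ungapped reference")


# Toy aligned reference (made up): residues sit in columns 3, 4, 6, 9 and 11.
toy_reference = "--AC-G..T-A"
print(ungapped_to_column(toy_reference, 3))  # -> 6
print(ungapped_to_column(toy_reference, 5))  # -> 11
```

Applying this to the start and stop coordinates of each region gives the alignment columns to trim at.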

It might take a bit of trial-and-error, but this should accomplish more or less what you are after...

M_F · September 23, 2021, 5:47am

Thanks @jwdebelius @Nicholas_Bokulich,

@Nicholas_Bokulich, do you have a tutorial on using RESCRIPt to trim hypervariable regions by position against a reference gene? How can I recalculate the positions in Table 1? When you say to use common primers, do you mean I should search the literature for the primers used to target the different hypervariable regions (V1V2, V2V3, ...)? In that case, would I be able to split those regions using qiime feature-classifier?

Nicholas_Bokulich · September 23, 2021, 8:31am

No, this method has not yet been added to a tutorial, so see the help docs for more information:

qiime rescript trim-alignment --help

One advantage of this method is that it (a) removes the need for primer specificity and (b) does not discard sequences that do not hit that primer (e.g., truncated reference sequences).
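As a rough illustration of trimming by position, the Python sketch below only assembles a `trim-alignment` command line from a start and end column. The flag names are written as I recall them from the help text, and the file names and column numbers are hypothetical placeholders, so please check everything against `qiime rescript trim-alignment --help` before running it.

```python
import shlex
from typing import List


def trim_alignment_cmd(aligned_qza: str, start: int, end: int, out_qza: str) -> List[str]:
    """Build (but do not run) a positional trim-alignment command."""
    if start < 1 or end <= start:
        raise ValueError("expected 1 <= start < end")
    return [
        "qiime", "rescript", "trim-alignment",
        "--i-aligned-sequences", aligned_qza,
        "--p-position-start", str(start),
        "--p-position-end", str(end),
        "--o-trimmed-sequences", out_qza,
    ]


# Hypothetical file names and alignment columns, for illustration only.
cmd = trim_alignment_cmd("aligned-ref-seqs.qza", 100, 500, "aligned-ref-region.qza")
print(shlex.join(cmd))
```

The start and end values here are alignment columns, so they need to include gaps rather than being raw E. coli positions.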

Correct... presumably their positions are close enough to the proprietary primers used for your data (or you could align a few of your sequences vs. a reference sequence and manually check to find approximate positions, either for positional trimming or to find the nearest primers).
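For the manual check, a quick way to see roughly where a read lands on a reference is k-mer voting: every k-mer shared by the read and the reference votes for an offset, and the most common offset wins. The Python sketch below is a simple stand-in for a proper aligner, not a replacement for one. It assumes the read is on the same strand as the reference, and the sequences in the usage lines are randomly generated placeholders rather than real 16S data.

```python
import random
from collections import Counter
from typing import Dict, List, Optional


def estimate_start(read: str, reference: str, k: int = 8) -> Optional[int]:
    """Estimate the 1-based reference position where `read` starts, by letting
    each shared k-mer vote for an offset. Returns None if nothing is shared."""
    index: Dict[str, List[int]] = {}
    for i in range(len(reference) - k + 1):
        index.setdefault(reference[i:i + k], []).append(i)
    votes: Counter = Counter()
    for j in range(len(read) - k + 1):
        for i in index.get(read[j:j + k], ()):
            votes[i - j] += 1  # offset of the read start on the reference
    if not votes:
        return None
    offset, _ = votes.most_common(1)[0]
    return offset + 1


# Placeholder data: a random "reference" and a read cut from it with one mismatch.
random.seed(0)
reference = "".join(random.choice("ACGT") for _ in range(200))
read = list(reference[50:90])
read[20] = "A" if read[20] != "A" else "C"
print(estimate_start("".join(read), reference))
```

Running this on a handful of reads per amplicon against the E. coli 16S gene should give approximate start positions that can be compared with the published region boundaries.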

With extract-reads, yes.

One possible concern is if the positions/primers are too imprecise, then you could have trouble with downstream taxonomic classification (one reason to do a manual check as I mention above). Classification against a full-length reference database would be an easy way to get around this...

M_F · September 23, 2021, 8:34am

thanks @Nicholas_Bokulich

bridlin · October 4, 2021, 3:18pm

Hello,

You can find a link to the primer sequences for the Swift kit on the company’s github page:

https://ws.onehub.com/folders/82s4teyk

I hope that helps.

I am also trying to figure out the best way to analyse the sequencing data from their overlapping multiplexed amplicons.

Bridlin

WeedCentipede · October 4, 2021, 6:27pm

Hello,

People in my lab are also dealing with this kind of data.
So @jwdebelius, I was wondering if q2-sidle would be useful for combining this kind of data from different regions into one consensus: after pooling the reads per region (and extracting the k-mers), it could compute either the overlap or just associate them by depth/taxonomy/phylogenetics/k-mer composition to reconstruct the different regions into one feature.

Cheers,
Luis A.

jwdebelius · October 4, 2021, 7:17pm

Hi @WeedCentipede,

We did some simulations in the sidle preprint showing (at least in our opinions) that Sidle is better at scaffolding multiple regions, providing more faithful abundance-based reconstruction and in some cases, better species level resolution.

Right now, Sidle works if you have the primer pairs for k-mer extraction; at some point we hope to have a version that doesn't require the primer pairs.

You can find tutorials, etc. on the Read the Docs page.

Best,
Justine
