Classifier V4 different primers

iordanis · December 18, 2024, 10:39am

Hello everyone,

I have 5 microbiome datasets, each from a different V region. However, I trimmed all of them to the V4 region, using different primers depending on which pair produced more trimmed reads.

Some datasets were trimmed with these primers:

GTGYCAGCMGCCGCGGTAA forward primer
ATTAGAWACCCBNGTAGTCC reverse primer

and others with these:

GTGYCAGCMGCCGCGGTAA forward primer
GGACTACNVGGGTWTCTAAT reverse primer

I built a classifier from SILVA 138.1 using these primers for the V4 region:

GTGYCAGCMGCCGCGGTAA
GGACTACNVGGGTWTCTAAT

Finally, I used this classifier on all datasets without any problems, by which I mean the classifier did not produce errors.

So here is my question: is it wrong to use this classifier on datasets from the same V region that were trimmed with different primers? Will it introduce diversity bias?
Or am I okay?

Thank you all for your support.

SoilRotifer · December 18, 2024, 8:53pm

Hi @iordanis,

If your goal is to compare these data sets with each other, make sure that all of the amplicons cover the exact same region after primer removal. Otherwise, your data will be of different lengths.

Correction: these are both V4 primer pairs... the first pair reports the reverse primer in the reverse-complemented orientation:

These are the V3V4 primers ...
GTGYCAGCMGCCGCGGTAA forward
ATTAGAWACCCBNGTAGTCC reverse

~~... will yield longer fragments compared to the V4 primers:~~
GTGYCAGCMGCCGCGGTAA forward
GGACTACNVGGGTWTCTAAT reverse

See my later post about the correction.

So you should trim ~~the V3V4 reads down~~ to the V4 read length after primer removal. Actually, you can likely just use cutadapt to trim / extract the V4 primer region for all the data sets. Then I'd use the V4 classifier.
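To illustrate what "extracting the V4 primer region" means, here is a minimal pure-Python sketch; it is not cutadapt. It looks for the forward primer and the reverse complement of the reverse primer in a read, expanding IUPAC degenerate bases (Y, M, N, ...) into regex character classes, and returns the sequence in between. It assumes the read is in forward orientation and spans the whole amplicon (for example, merged read pairs), and it only does exact matching, whereas cutadapt tolerates mismatches and indels. The function names are hypothetical.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC_CLASS = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Complement of each IUPAC code (e.g. B = "not A" pairs with V = "not T").
IUPAC_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def revcomp(seq: str) -> str:
    """Reverse complement of a (possibly degenerate) DNA sequence."""
    return seq.upper().translate(IUPAC_COMPLEMENT)[::-1]


def primer_regex(primer: str) -> str:
    """Turn a degenerate primer into a regex fragment (exact match, no indels)."""
    return "".join(IUPAC_CLASS[base] for base in primer.upper())


def extract_region(read: str, fwd: str, rev: str):
    """Return the sequence between the forward primer and the reverse
    complement of the reverse primer, or None if either one is missing."""
    pattern = primer_regex(fwd) + "(.+?)" + primer_regex(revcomp(rev))
    match = re.search(pattern, read.upper())
    return match.group(1) if match else None


FWD = "GTGYCAGCMGCCGCGGTAA"
REV = "GGACTACNVGGGTWTCTAAT"
# Synthetic read: concrete primer instance + insert + reverse-complemented reverse primer.
read = "GTGCCAGCAGCCGCGGTAA" + "ACGTACGTAC" + "ATTAGAAACCCCGGTAGTCC"
print(extract_region(read, FWD, REV))  # ACGTACGTAC
```

Reads in which either primer is not found return None; discarding those is the rough equivalent of keeping only reads that cover the full region.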

But be cautious. Even after you trim the data to cover the same region (i.e. v4), there will still be PCR amplification biases present within the data. See this post, and this post with references.

iordanis · December 19, 2024, 9:49am

Hello @SoilRotifer

Well, let's see if I understood.

Do these primers not trim the V4 region?

GTGYCAGCMGCCGCGGTAA forward
ATTAGAWACCCBNGTAGTCC reverse

From what I know, the primer GTGYCAGCMGCCGCGGTAA is for the 5' end of V4 and the primer ATTAGAWACCCBNGTAGTCC is for the 3' end of V3V4. So theoretically you can trim V4 with them. But this is just my thinking.

Now let me explain my problem: of the 5 datasets I have, 2 are from the V4-V5 region,
and when I trimmed them with these primers

GTGYCAGCMGCCGCGGTAA forward
GGACTACNVGGGTWTCTAAT reverse

the primers were recognized only 3 to 30 times in each sample.

But when I use these primers

GTGYCAGCMGCCGCGGTAA forward
ATTAGAWACCCBNGTAGTCC reverse

they are recognized and trimmed thousands of times.
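One quick way to check whether a primer is being searched for in the wrong orientation is to count how many reads contain it as written versus as its reverse complement. The sketch below is a hypothetical helper (plain Python with exact matching only, not part of cutadapt or QIIME 2), and the example reads are synthetic, not taken from these datasets.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC_CLASS = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
IUPAC_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def revcomp(seq: str) -> str:
    """Reverse complement of a (possibly degenerate) DNA sequence."""
    return seq.upper().translate(IUPAC_COMPLEMENT)[::-1]


def primer_pattern(primer: str):
    """Compile a degenerate primer into a regex (exact match, no indels)."""
    return re.compile("".join(IUPAC_CLASS[base] for base in primer.upper()))


def count_orientations(reads, primer):
    """Count reads containing the primer as written and as its reverse complement."""
    as_written = primer_pattern(primer)
    as_revcomp = primer_pattern(revcomp(primer))
    n_written = sum(1 for r in reads if as_written.search(r.upper()))
    n_revcomp = sum(1 for r in reads if as_revcomp.search(r.upper()))
    return n_written, n_revcomp


# Synthetic reverse reads that start with a concrete instance of the reverse primer.
reverse_reads = ["GGACTACCAGGGTATCTAATTTGACGT", "GGACTACAAGGGTTTCTAATCCGA"]
print(count_orientations(reverse_reads, "GGACTACNVGGGTWTCTAAT"))  # (2, 0)
print(count_orientations(reverse_reads, "ATTAGAWACCCBNGTAGTCC"))  # (0, 2)
```

If a primer matches far more reads as its reverse complement than as written, it was probably supplied in the wrong orientation for those reads.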

So what do you suggest?

Also, I want you to know that I denoised each dataset separately, all to the same length (0-240 bases).

And I created a classifier for the V4 region by extracting reads with GTGYCAGCMGCCGCGGTAA forward and GGACTACNVGGGTWTCTAAT reverse, using the RESCRIPt plugin. I use this classifier for the datasets that I trimmed with the regular V4 primers.

I also used this classifier for the 2 datasets that I trimmed with the primers GTGYCAGCMGCCGCGGTAA forward and ATTAGAWACCCBNGTAGTCC reverse, and it works.

Also, all datasets are from human gut (stool) samples.

In the end, my differential abundance meta-analysis works fine and gives results.

In addition, I made a PCoA plot and all samples overlap (with differences between control and disease, of course). So I believe I am okay.

I am asking only in case I am doing something wrong without knowing it.

Any help?

Jordan

SoilRotifer · December 19, 2024, 2:28pm

Hi @iordanis,

Sorry, I made a mistake. I must have misread a primer table I was looking at... These are the primers, but I was mistaken about the reverse primer. The reverse primer you list, ATTAGAWACCCBNGTAGTCC, is actually the reverse complement of the actual primer sequence:
GGACTACNVGGGTWTCTAAT

This is the sequence you have listed under your "second" primer set. You can check for yourself here.
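The same orientation check can be done with a few lines of Python. This is a minimal sketch using the standard IUPAC complement pairs (for example, B pairs with V and W with itself); the function and variable names are illustrative.

```python
# Complement table for IUPAC nucleotide codes, including degenerate bases.
IUPAC_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a (possibly degenerate) DNA sequence."""
    return seq.upper().translate(IUPAC_COMPLEMENT)[::-1]


first_set_reverse = "ATTAGAWACCCBNGTAGTCC"   # reverse primer as listed in the first set
second_set_reverse = "GGACTACNVGGGTWTCTAAT"  # reverse primer as listed in the second set
print(revcomp(first_set_reverse) == second_set_reverse)  # True
```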

So, you do not have two different primer sets; one was simply annotated in the incorrect orientation. This is why you are unable to find the hits with cutadapt, and likely why your data appear to analyze just fine. So, just use the second primer set on all of the data and you should be good to go.

I'll edit my original post for clarification.

-Mike

iordanis · December 19, 2024, 2:52pm

Hello, @SoilRotifer

Well, 2 datasets work only with this set, by which I mean that the primers are identified in the sequences and trimmed:

GTGYCAGCMGCCGCGGTAA forward
ATTAGAWACCCBNGTAGTCC reverse

and the others work only with this set:

GTGYCAGCMGCCGCGGTAA forward
GGACTACNVGGGTWTCTAAT reverse

But from what you told me, I understand that these are the same primers and target the same stretch of V4, so I don't have significant bias, right?

Thank you for your support and for your time!
Best
Jordan

SoilRotifer · December 19, 2024, 2:57pm

Correct.

I would simply use the following:

GTGYCAGCMGCCGCGGTAA forward
GGACTACNVGGGTWTCTAAT reverse

as these are the primers as defined by the EMP 16S rRNA protocol.

iordanis · December 19, 2024, 2:59pm

Thank you so much, Mike!

Goodbye

system · January 19, 2025, 9:00pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.