How to train the classifier with multiple mixed forward primers?

Birong · January 31, 2023, 2:14pm

Hi,

I am analysing V1-V2 16S rRNA sequence data. I want to use qiime feature-classifier extract-reads to extract reads and train a classifier.

However, these data were generated with a mix of forward primers:

V1-V2 MiSeq primers (parts in bold are adapter sequences)
Forward primers, mixed at a 4:1:1:1 ratio (28F-YM is the 4):

28F-YM: **TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG** GAGTTTGATYMTGGCTCAG 
28F-Borrellia: **TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG** GAGTTTGATCCTGGCTTAG 
28FChloroflex: **TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG** GAATTTGATCTTGGTTCAG 
28F-Bifdo: **TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG** GGGTTCGATTCTGGCTCAG
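Each oligo above is the shared adapter (in bold) followed by the gene-specific primer, and only the primer part would be given to extract-reads, as is done later in this thread. A minimal bash sketch of separating the two, assuming every oligo starts with exactly this adapter; `strip_adapter` is a hypothetical helper, not part of QIIME 2:

```bash
#!/usr/bin/env bash
# Adapter sequence shared by all four forward primers (the bold part above).
ADAPTER=TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG

# Hypothetical helper: print the primer with the adapter prefix removed,
# or fail if the oligo does not start with the adapter.
strip_adapter() {
  local full=$1
  if [[ $full != "$ADAPTER"* ]]; then
    echo "error: '$full' does not start with the adapter" >&2
    return 1
  fi
  echo "${full#"$ADAPTER"}"
}

# Full oligos as listed above (adapter + primer, written without the space).
for oligo in \
  TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGAGTTTGATYMTGGCTCAG \
  TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGAGTTTGATCCTGGCTTAG \
  TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGAATTTGATCTTGGTTCAG \
  TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGGGTTCGATTCTGGCTCAG; do
  strip_adapter "$oligo"
done
```

This prints the four 19 bp primers, one per line.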

I came across the thread "How to train the classifier with multiple reverse primers?", but my case seems more complicated: I have four forward primers, and they look quite different from each other.

What should I do?
Any advice would be highly appreciated! Thanks!

Kind regards,
Birong

crusher083 · January 31, 2023, 2:17pm

Hello,

Let's start with the method @colinbrislawn explained in detail in the linked thread: we need a metric to decide whether the primers are really different or not.
So please provide some information on how different these primers are.

Cheers
V

Birong · January 31, 2023, 2:51pm

Hi V,

Thanks for your kind help.

How about this:

28F-YM:        GAGTTTGATYMTGGCTCAG 
28F-Borrellia: GAGTTTGATCCTGGCTTAG 
28FChloroflex: GAATTTGATCTTGGTTCAG 
28F-Bifdo:     GGGTTCGATTCTGGCTCAG


Number of differing positions per pair (out of 19 bp):
28F-YM vs 28F-Borrellia           ==   3
28F-YM vs 28FChloroflex           ==   4
28F-YM vs 28F-Bifdo               ==   4
28F-Borrellia vs 28FChloroflex    ==   4
28F-Borrellia vs 28F-Bifdo        ==   4
28FChloroflex vs 28F-Bifdo        ==   6


(19 - 4) matching positions / 19 bp = 78.95% identity (4 being the largest number of differences from 28F-YM)
So should I set --p-identity to 0.7 or 0.8?
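The table above can be reproduced with a short bash sketch; `mismatches` is a hypothetical helper. Identity here is only the per-primer arithmetic from this post, (length - mismatches) / length, and is not meant as a model of how extract-reads scores primer matches internally.

```bash
#!/usr/bin/env bash
# Gene-specific parts of the four forward primers (adapter removed).
names=(28F-YM 28F-Borrellia 28FChloroflex 28F-Bifdo)
seqs=(GAGTTTGATYMTGGCTCAG GAGTTTGATCCTGGCTTAG GAATTTGATCTTGGTTCAG GGGTTCGATTCTGGCTCAG)

# Count positions where two equal-length strings differ. This is a literal
# comparison: an ambiguity code such as Y is NOT treated as matching C or T.
mismatches() {
  local a=$1 b=$2 n=0 i
  for (( i = 0; i < ${#a}; i++ )); do
    if [[ ${a:i:1} != "${b:i:1}" ]]; then
      n=$(( n + 1 ))
    fi
  done
  echo "$n"
}

# Every pair once; identity = (length - mismatches) / length.
for (( i = 0; i < ${#seqs[@]}; i++ )); do
  for (( j = i + 1; j < ${#seqs[@]}; j++ )); do
    d=$(mismatches "${seqs[i]}" "${seqs[j]}")
    awk -v a="${names[i]}" -v b="${names[j]}" -v d="$d" -v L="${#seqs[i]}" \
      'BEGIN { printf "%s vs %s: mismatches=%d, identity=%.2f%%\n", a, b, d, 100 * (L - d) / L }'
  done
done
```

The output matches the counts above. It also shows that the most divergent pair overall is 28FChloroflex vs 28F-Bifdo (`mismatches=6, identity=68.42%`), so the 78.95% figure holds only when each primer is compared with 28F-YM.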

Another problem is that these primers are mixed at a 4:1:1:1 ratio (28F-YM is the 4). How should I take this into account? Should I use 28F-YM as the forward primer with --p-identity 0.7 or 0.8?
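One way to inform that choice is to count mismatches against 28F-YM while letting its ambiguity codes match any base they stand for (Y = C/T, M = A/C). A minimal bash sketch of that idea; `iupac_bases` and `degenerate_mismatches` are hypothetical helpers, and whether extract-reads treats degenerate positions the same way is an assumption not verified here.

```bash
#!/usr/bin/env bash
# Bases represented by each IUPAC nucleotide code.
iupac_bases() {
  case $1 in
    A) echo A ;; C) echo C ;; G) echo G ;; T) echo T ;;
    R) echo AG ;; Y) echo CT ;; S) echo CG ;; W) echo AT ;;
    K) echo GT ;; M) echo AC ;; B) echo CGT ;; D) echo AGT ;;
    H) echo ACT ;; V) echo ACG ;; N) echo ACGT ;;
    *) echo "unknown IUPAC code: $1" >&2; return 1 ;;
  esac
}

# Count positions where the target base is NOT among the bases allowed by the
# (possibly degenerate) primer. Assumes equal lengths and a concrete target.
degenerate_mismatches() {
  local primer=$1 target=$2 n=0 i allowed
  for (( i = 0; i < ${#primer}; i++ )); do
    allowed=$(iupac_bases "${primer:i:1}")
    if [[ $allowed != *"${target:i:1}"* ]]; then
      n=$(( n + 1 ))
    fi
  done
  echo "$n"
}

ym=GAGTTTGATYMTGGCTCAG
for pair in 28F-Borrellia:GAGTTTGATCCTGGCTTAG \
            28FChloroflex:GAATTTGATCTTGGTTCAG \
            28F-Bifdo:GGGTTCGATTCTGGCTCAG; do
  name=${pair%%:*}
  seq=${pair#*:}
  d=$(degenerate_mismatches "$ym" "$seq")
  awk -v n="$name" -v d="$d" -v L="${#ym}" \
    'BEGIN { printf "28F-YM vs %s: mismatches=%d, identity=%.2f%%\n", n, d, 100 * (L - d) / L }'
done
```

Under this assumption the other primers differ from 28F-YM at 1, 3 and 2 positions (about 94.74%, 84.21% and 89.47% identity), fewer than the literal counts because 28F-YM's Y and M already cover some of their bases.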

Thanks!

SoilRotifer · February 1, 2023, 2:41pm

Hi @Birong,

It appears that all of these primers bind to the same location and differ by only a few bases. You could combine these 4 sequences into a single pseudo-sequence using IUPAC ambiguity codes.

An extreme case would result in something like this:
GRRTTYGATYHTGGYTYAG
^^Warning: This might be too ambiguous and lead to spurious hits.
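The fully ambiguous pseudo-sequence can be computed rather than written by hand. A minimal bash sketch; `iupac_bases`, `code_for` and `consensus` are hypothetical helpers. For each position it collects every base allowed by any of the primers and maps that set back to one IUPAC code. For these four primers the strict union has H (A/C/T) at position 11, not M, because 28FChloroflex carries a T there.

```bash
#!/usr/bin/env bash
# Bases represented by each IUPAC nucleotide code, listed in A-C-G-T order.
iupac_bases() {
  case $1 in
    A) echo A ;; C) echo C ;; G) echo G ;; T) echo T ;;
    R) echo AG ;; Y) echo CT ;; S) echo CG ;; W) echo AT ;;
    K) echo GT ;; M) echo AC ;; B) echo CGT ;; D) echo AGT ;;
    H) echo ACT ;; V) echo ACG ;; N) echo ACGT ;;
    *) echo "unknown IUPAC code: $1" >&2; return 1 ;;
  esac
}

# Inverse mapping: base set (in A-C-G-T order) -> IUPAC code.
code_for() {
  case $1 in
    A) echo A ;; C) echo C ;; G) echo G ;; T) echo T ;;
    AG) echo R ;; CT) echo Y ;; CG) echo S ;; AT) echo W ;;
    GT) echo K ;; AC) echo M ;; CGT) echo B ;; AGT) echo D ;;
    ACT) echo H ;; ACG) echo V ;; ACGT) echo N ;;
  esac
}

# Union consensus of equal-length primers: each position gets the code that
# covers every base any input primer allows at that position.
consensus() {
  local len=${#1} out="" i p b bases union
  for (( i = 0; i < len; i++ )); do
    bases=""
    for p in "$@"; do
      bases+=$(iupac_bases "${p:i:1}")
    done
    union=""
    for b in A C G T; do
      if [[ $bases == *"$b"* ]]; then
        union+=$b
      fi
    done
    out+=$(code_for "$union")
  done
  echo "$out"
}

consensus GAGTTTGATYMTGGCTCAG GAGTTTGATCCTGGCTTAG GAATTTGATCTTGGTTCAG GGGTTCGATTCTGGCTCAG
```

This prints `GRRTTYGATYHTGGYTYAG`.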

Since we can allow for a certain number of mismatches, let's try something like you suggested: slightly lower the identity threshold, or make a new sequence string (see below). I retained the ambiguous IUPAC bases already present in 28F-YM and added new ones where the most common base forms the stronger bond (i.e. a G or a C).
GARTTTGATYMTGGCTYAG
^^This still might be too ambiguous, but you get the idea
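One rough way to put a number on "too ambiguous" is degeneracy: how many distinct concrete oligos a degenerate primer stands for, i.e. the product over positions of how many bases each code allows. A minimal bash sketch with hypothetical helper names; degeneracy is only a proxy and does not by itself predict spurious hits.

```bash
#!/usr/bin/env bash
# Bases represented by each IUPAC nucleotide code.
iupac_bases() {
  case $1 in
    A) echo A ;; C) echo C ;; G) echo G ;; T) echo T ;;
    R) echo AG ;; Y) echo CT ;; S) echo CG ;; W) echo AT ;;
    K) echo GT ;; M) echo AC ;; B) echo CGT ;; D) echo AGT ;;
    H) echo ACT ;; V) echo ACG ;; N) echo ACGT ;;
    *) echo "unknown IUPAC code: $1" >&2; return 1 ;;
  esac
}

# Number of concrete sequences encoded by a degenerate primer.
degeneracy() {
  local primer=$1 total=1 i bases
  for (( i = 0; i < ${#primer}; i++ )); do
    bases=$(iupac_bases "${primer:i:1}") || return 1
    total=$(( total * ${#bases} ))
  done
  echo "$total"
}

# 28F-YM, the relaxed pseudo-sequence, and the strict union of all four primers.
for primer in GAGTTTGATYMTGGCTCAG GARTTTGATYMTGGCTYAG GRRTTYGATYHTGGYTYAG; do
  echo "$primer: $(degeneracy "$primer") oligos"
done
```

28F-YM alone encodes 4 oligos, the relaxed string 16, and the strict union 192.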

Another option, which I'd recommend, is to use only one of the primer sets, specifically the one with the 28F-YM primer, and then use the resulting extracted sequences as a reference pool to guide the extraction of this region without the use of additional primer pairs. That is, follow the approach outlined here.

-Cheers!
-Mike

Birong · February 1, 2023, 3:33pm

Hi Mike,

Thanks for your reply! I learned a lot, and I will give it a try!

I guess the last option also works with data from qiime rescript get-silva-data, like this:

## get-silva-data
qiime rescript get-silva-data \
    --p-version '138.1' \
    --p-target 'SSURef_NR99' \
    --p-include-species-labels \
    --o-silva-sequences silva-138-99-seqs.qza \
    --o-silva-taxonomy silva-138-99-tax.qza


## Dereplicate 
qiime rescript dereplicate \
    --i-sequences silva-138-99-seqs.qza \
    --i-taxa silva-138-99-tax.qza \
    --p-mode 'uniq' \
    --p-threads 8 \
    --o-dereplicated-sequences silva-138-99-seqs-derep.qza \
    --o-dereplicated-taxa silva-138-99-tax-derep.qza

## extract-reads
## forward primer: 28F-YM; reverse primer: 388R
qiime feature-classifier extract-reads \
    --i-sequences silva-138-99-seqs-derep.qza \
    --p-f-primer GAGTTTGATYMTGGCTCAG \
    --p-r-primer GCTGCCTCCCGTAGGAGT \
    --p-n-jobs 8 \
    --o-reads silva-138-99-seqs-segments.qza

Many thanks!
Birong

system · March 4, 2023, 9:34pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.