Feature classifier with multiple primers

Maria_A_Sierra · June 29, 2021, 12:48pm

Hello,
I have a question about which primer sequences to use when training the classifier.
I have three datasets: one paired-end 16S, one single-end 16S (because of the low quality of the reverse reads) and one paired-end 18S. Each was generated with a different primer set. To get a single merged result, I merged all three .qza tables, and now I am unsure how to proceed with feature-classifier.

I am using the SILVA 138 database for 16S and 18S in qiime2-2020.2.
These are the primer sets:

16S paired-end
27f AGAGTTTGATCATGGCTCAG
806R GGACTACHVGGGTWTCTAAT

16S single-end
341F CCTACGGGNGGCWGCAG

18S paired-end
Euk 1391f GTACACACCGCCCGTC
EukBr TGATCCTTCTGCAGGTTCACCTAC

Command I am trying to run:

qiime feature-classifier extract-reads \
  --i-sequences silva-138-ssu-nr99-seqs-derep-uniq.qza \
  --p-f-primer xxxxxxxx \
  --p-r-primer xxxxxxx \
  --o-reads silva-138-ssu-nr99-seqs-16s-18s.qza
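In the command above, the primer placeholders take the sequences as listed (5' to 3'), including degenerate IUPAC bases such as H, V, W and N. As a rough conceptual illustration of what trimming a reference down to one amplicon means, here is a minimal Python sketch. It expands degenerate primers into regular expressions and returns the region between the forward primer and the reverse complement of the reverse primer. It rests on simplifying assumptions: it allows no mismatches and searches only one strand. `primer_to_regex`, `reverse_complement` and `extract_amplicon` are hypothetical helpers and are not part of q2-feature-classifier, which has its own matching logic.

```python
import re

# IUPAC nucleotide codes and the concrete bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Complement of each IUPAC code (e.g. H = A/C/T pairs with D = A/G/T).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse-complement a DNA sequence that may contain degenerate bases."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_to_regex(primer):
    """Expand a degenerate primer into a regex, e.g. 'W' becomes '[AT]'."""
    parts = []
    for base in primer.upper():
        options = IUPAC[base]
        parts.append(options if len(options) == 1 else "[" + options + "]")
    return "".join(parts)


def extract_amplicon(reference, f_primer, r_primer):
    """Return the reference region between two primers (primers excluded).

    Hypothetical helper: exact degenerate matching, no mismatches, one strand.
    Both primers are written 5'->3'; the reverse primer is reverse-complemented
    so it can be located on the same strand as the forward primer.
    Returns None if either primer site is missing.
    """
    ref = reference.upper()
    fwd = re.search(primer_to_regex(f_primer), ref)
    if fwd is None:
        return None
    downstream = ref[fwd.end():]
    rev = re.search(primer_to_regex(reverse_complement(r_primer)), downstream)
    if rev is None:
        return None
    return downstream[:rev.start()]


# Toy reference: 27f site, an 8-base insert, then the reverse complement of
# one concrete variant of 806R (H=A, V=C, W=A). Invented for illustration.
toy = "AGAGTTTGATCATGGCTCAG" + "ACGTACGT" + reverse_complement("GGACTACACGGGTATCTAAT")
print(extract_amplicon(toy, "AGAGTTTGATCATGGCTCAG", "GGACTACHVGGGTWTCTAAT"))
# prints ACGTACGT
```

Running the same reference through each primer pair returns a different region, which is the intuition behind training one classifier per amplicon.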

I would appreciate any help on this matter!

Best,

Maria.

Nicholas_Bokulich · June 30, 2021, 8:11am

Hi @Maria_A_Sierra, welcome to the forum!

I recommend that you do not merge the datasets; classify each one separately, or at least keep 16S and 18S separate. There is no way to train a classifier on three specific separate regions using q2-feature-classifier.

You could also use the pre-trained full-length SILVA classifier to classify all of these amplicons, since it covers full-length 16S and 18S. Classification accuracy will be slightly better if you train a separate classifier for each amplicon, but not by a large margin.

Good luck!

Maria_A_Sierra · June 30, 2021, 2:16pm

Thank you Nicholas.
I was initially planning not to merge them. However, after analyzing the 18S dataset, I noticed that it also amplified many bacteria and archaea. That is why I wanted a single merged result.

Nicholas_Bokulich · July 1, 2021, 5:32am

Thanks for the clarification. Merging makes me a little worried: the 18S primers likely amplify a skewed selection of bacteria and archaea (i.e., they probably have poor coverage of those groups unless they were designed as "universal SSU" primers), so merging these data might lead to some strange-looking results. It may be worth a look just to see what comes out, but I would be very cautious when interpreting it.
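One way to probe the coverage concern in silico is to ask what fraction of reference sequences in each group a primer matches. Below is a minimal Python sketch under strong assumptions: exact degenerate matching only (dedicated primer-evaluation tools typically tolerate mismatches), a single strand, and a hypothetical `primer_coverage` helper that takes (group, sequence) pairs. The toy sequences are invented and say nothing about the real coverage of Euk 1391f.

```python
import re
from collections import defaultdict

# IUPAC degenerate codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]",
    "M": "[AC]", "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def primer_coverage(references, primer):
    """Fraction of sequences per group that contain an exact primer match.

    `references` is an iterable of (group, sequence) pairs. Hypothetical helper:
    degenerate bases are honoured, but no mismatches are tolerated and only the
    given strand is searched.
    """
    pattern = re.compile("".join(IUPAC[base] for base in primer.upper()))
    hits, totals = defaultdict(int), defaultdict(int)
    for group, seq in references:
        totals[group] += 1
        if pattern.search(seq.upper()):
            hits[group] += 1
    return {group: hits[group] / totals[group] for group in totals}


# Invented toy sequences; they do not reflect real SILVA entries.
toy_refs = [
    ("Eukaryota", "TTGTACACACCGCCCGTCAA"),
    ("Eukaryota", "GTACACACCGCCCGTC"),
    ("Bacteria", "GGGTACACACCGCCCGTCTT"),
    ("Bacteria", "ACGTACGTACGTACGTACGT"),
]
print(primer_coverage(toy_refs, "GTACACACCGCCCGTC"))  # Euk 1391f as listed
# prints {'Eukaryota': 1.0, 'Bacteria': 0.5}
```

A low or uneven fraction for Bacteria or Archaea on a real reference would be one concrete sign of the skew described above.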

In any case, you can follow either of my suggestions above: train amplicon-specific classifiers, classify each dataset separately, then merge; or classify everything with the full-length SILVA SSU classifier, either before or after merging.
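The "classify separately, then merge" route can be sketched in Python. The sketch assumes each per-amplicon classification has been exported as a plain {feature_id: taxonomy} mapping and each table as {feature_id: count} for one sample. Because the 16S and 18S amplicons cover different regions, their features are different sequences. One way to compare them after merging is to collapse counts to a shared taxonomic rank, which is conceptually similar to what `qiime taxa collapse` does for a table. The helpers `merge_taxonomies` and `collapse_counts`, the first-assignment-wins conflict policy, and the toy taxonomy strings are all hypothetical.

```python
from collections import Counter


def merge_taxonomies(*taxonomies):
    """Merge per-dataset {feature_id: taxonomy} maps into one.

    Hypothetical policy: the first assignment seen for a feature ID is kept,
    and IDs that a later dataset labels differently are reported as conflicts.
    """
    merged, conflicts = {}, set()
    for taxonomy in taxonomies:
        for feature_id, label in taxonomy.items():
            if feature_id not in merged:
                merged[feature_id] = label
            elif merged[feature_id] != label:
                conflicts.add(feature_id)
    return merged, conflicts


def collapse_counts(counts, taxonomy, level):
    """Sum {feature_id: count} by taxonomy truncated to the first `level` ranks."""
    collapsed = Counter()
    for feature_id, count in counts.items():
        ranks = [rank.strip() for rank in taxonomy[feature_id].split(";")]
        collapsed["; ".join(ranks[:level])] += count
    return dict(collapsed)


# Toy per-amplicon results for one sample; labels are illustrative only.
tax_16s = {"asv1": "d__Bacteria; p__Firmicutes"}
tax_18s = {"euk1": "d__Eukaryota; p__Ciliophora", "bac1": "d__Bacteria; p__Firmicutes"}
merged, conflicts = merge_taxonomies(tax_16s, tax_18s)
counts = {"asv1": 5, "euk1": 2, "bac1": 3}
print(collapse_counts(counts, merged, level=1))
# prints {'d__Bacteria': 8, 'd__Eukaryota': 2}
```

Summing at a shallow rank like this still mixes bacteria picked up by the 18S primers into the totals, so the caution about a skewed selection applies to the collapsed numbers too.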

Good luck!

system · August 1, 2021, 11:33am

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.