Hello,
I have a question regarding the primer sequences to train the classifier.
I have three datasets: one paired-end 16S, one single-end 16S (because the reverse reads were of low quality) and one paired-end 18S. Each was generated with a different primer set. To get a single merged result, I merged all three .qza tables, and now I am unsure how to proceed with the feature classifier.
I am using the SILVA 138 database for 16S and 18S in qiime2-2020.2.
These are the primer sets:
I recommend that you do not merge the datasets; classify each one separately, or at least keep 16S and 18S separate. There is no way to train a classifier on three specific separate regions using q2-feature-classifier.
You could also use the pre-trained full-length SILVA classifier to classify all of these amplicons, since it covers full-length 16S + 18S. Classification accuracy will be slightly better if you train a separate classifier for each amplicon, but not by a large margin.
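The amplicon-specific route could be scripted roughly as below. This is a minimal sketch, not an official recipe: it only builds the `qiime feature-classifier extract-reads` and `fit-classifier-naive-bayes` command lines for each dataset and prints them. The dataset names, the SILVA artifact file names and the primer strings (`FWD_16S_A` and so on) are hypothetical placeholders, since the actual primer sequences are not reproduced here.

```python
import shlex

def training_commands(name, f_primer, r_primer, ref_seqs, ref_tax):
    """Return argv lists that train one amplicon-specific classifier.

    `name`, the primer strings and the reference file names are
    hypothetical placeholders; substitute your own values.
    """
    reads = f"{name}-ref-reads.qza"
    classifier = f"{name}-classifier.qza"
    return [
        # Trim the full-length reference sequences to this primer pair's region.
        ["qiime", "feature-classifier", "extract-reads",
         "--i-sequences", ref_seqs,
         "--p-f-primer", f_primer,
         "--p-r-primer", r_primer,
         "--o-reads", reads],
        # Fit a naive Bayes classifier on the trimmed reference reads.
        ["qiime", "feature-classifier", "fit-classifier-naive-bayes",
         "--i-reference-reads", reads,
         "--i-reference-taxonomy", ref_tax,
         "--o-classifier", classifier],
    ]

# Hypothetical placeholders: replace with the three real primer pairs.
PRIMERS = {
    "16S-paired": ("FWD_16S_A", "REV_16S_A"),
    "16S-single": ("FWD_16S_B", "REV_16S_B"),
    "18S-paired": ("FWD_18S", "REV_18S"),
}

if __name__ == "__main__":
    for name, (fwd, rev) in PRIMERS.items():
        # Hypothetical SILVA 138 artifact names.
        for argv in training_commands(name, fwd, rev,
                                      "silva-138-seqs.qza", "silva-138-tax.qza"):
            print(shlex.join(argv))
```

Building the commands as argument lists rather than running them keeps the sketch testable without a QIIME 2 install; in an activated qiime2 environment each list could be passed to `subprocess.run(argv, check=True)`.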
Thank you Nicholas.
I initially planned not to merge them. However, after analyzing the 18S dataset, I noticed that those primers also amplified many bacteria and archaea, which is why I wanted a single merged result.
Thanks for the clarification. Merging makes me a little worried: the 18S primers probably amplify a skewed selection of bacteria and archaea (i.e., they have poor coverage of those groups unless they were designed as "universal SSU" primers), so merging these data might produce some strange-looking results. I suppose it is worth a look to see what you find, but I would be very cautious when interpreting it.
Either way, you can follow either of my suggestions above: train amplicon-specific classifiers, classify each dataset separately, then merge; or classify everything with the full-length SILVA SSU classifier, before or after merging.
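The classify-then-merge route could look roughly like this. Again a hedged sketch that only builds command lines: one `qiime feature-classifier classify-sklearn` per dataset, followed by `qiime feature-table merge`, `merge-seqs` and `merge-taxa`. All artifact paths and the dictionary keys (`table`, `seqs`, `classifier`) are hypothetical; for the full-length SILVA option, point every dataset at the same pre-trained classifier. List inputs are passed by repeating the flag, as in the QIIME 2 tutorials of that era; check `--help` for your version.

```python
import shlex

def classify_then_merge_commands(datasets, out_prefix="merged"):
    """Build argv lists for: classify each dataset, then merge the results.

    `datasets` maps a dataset name to a dict with the hypothetical keys
    "table", "seqs" and "classifier" (paths to .qza artifacts).
    """
    cmds, taxa = [], []
    for name, d in datasets.items():
        taxonomy = f"{name}-taxonomy.qza"
        taxa.append(taxonomy)
        # Classify this dataset's representative sequences with its classifier.
        cmds.append(["qiime", "feature-classifier", "classify-sklearn",
                     "--i-classifier", d["classifier"],
                     "--i-reads", d["seqs"],
                     "--o-classification", taxonomy])

    def repeated(flag, paths):
        # Repeat the input flag once per artifact.
        out = []
        for p in paths:
            out += [flag, p]
        return out

    tables = [d["table"] for d in datasets.values()]
    seqs = [d["seqs"] for d in datasets.values()]
    cmds.append(["qiime", "feature-table", "merge", *repeated("--i-tables", tables),
                 "--o-merged-table", f"{out_prefix}-table.qza"])
    cmds.append(["qiime", "feature-table", "merge-seqs", *repeated("--i-data", seqs),
                 "--o-merged-data", f"{out_prefix}-seqs.qza"])
    cmds.append(["qiime", "feature-table", "merge-taxa", *repeated("--i-data", taxa),
                 "--o-merged-data", f"{out_prefix}-taxonomy.qza"])
    return cmds

if __name__ == "__main__":
    # Hypothetical artifact names; one shared full-length classifier shown here.
    full = "silva-138-full-length-classifier.qza"
    datasets = {n: {"table": f"{n}-table.qza", "seqs": f"{n}-rep-seqs.qza",
                    "classifier": full}
                for n in ("16S-paired", "16S-single", "18S-paired")}
    for argv in classify_then_merge_commands(datasets):
        print(shlex.join(argv))
```

If the same sample IDs appear in more than one table (e.g. the same samples sequenced with different primers), `feature-table merge` may stop with an overlapping-sample error by default; its `--p-overlap-method` parameter controls that behaviour.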