For one of my projects I was trying to recover archaeal DNA from sediment samples from high-mountain wetlands. I sampled different depths and, for some of them, ran separate PCRs (many replicates of each) amplifying only the V3 region with the primers U341F: CCTACGGGRSGCAGCAG and 534R: GWATTACCGCGGCKGCTG (Ziesemer et al. 2015). I targeted only V3 because the original goal of the project was to obtain ancient DNA, so the amplicons had to be short (200 bp at most). These primers carried long tags for sequencing on a MiSeq (PE300).

The sequencing service ran the whole bioinformatic analysis in QIIME2, but for the taxonomic assignment they used a classifier trained on the V3-V4 region of the latest SILVA database, since that is their standard. However, the results did not contain a single archaeal sequence. I expected most of the sequences to belong to Bacteria, since the samples are soil from a wetland, but some Archaea were expected.

My question is: would a classifier trained on the SILVA database, but only on the V3 region, give a significantly different result from a V3-V4 trained classifier?

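Possibly. Different primer pairs have different biases, so they pick up different sets of species, and the reference sequences extracted with V3-V4 primers to train the standard classifier need not cover the same taxa as your V3 primers. This may explain why you did not observe any Archaea, even though you expected your primers to detect them.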

So I think it is certainly worth re-classifying your sequences with a classifier trained on V3-only reference sequences extracted from SILVA using your own primer pair (U341F/534R).
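In QIIME2 the usual route is to run `qiime feature-classifier extract-reads` on the SILVA reference sequences, passing your primers with `--p-f-primer` and `--p-r-primer`, and then train on the extracted reads and the matching taxonomy with `qiime feature-classifier fit-classifier-naive-bayes`. To show roughly what the extraction step does, here is a minimal Python sketch. It is not QIIME2's implementation. It expands the IUPAC degenerate bases into a regex, finds U341F and the reverse complement of 534R in a reference sequence, and returns the region between them. The assumptions are exact primer matches only (real tools tolerate some mismatches), references given in the forward orientation, and a made-up toy sequence. The function names `extract_amplicon` and `coverage` are hypothetical.

```python
import re
from typing import Optional, Sequence

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Complement of each IUPAC code (R<->Y, K<->M, B<->V, D<->H; S, W, N map to themselves).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a (possibly degenerate) DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_regex(primer: str) -> "re.Pattern[str]":
    """Compile a degenerate primer into a regex that matches any allowed base."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def extract_amplicon(ref: str, fwd: str, rev: str) -> Optional[str]:
    """Return the part of `ref` between the forward primer site and the
    reverse-complemented reverse primer site (primers excluded), or None
    if either site is missing. Exact matches only, by assumption."""
    ref = ref.upper().replace("U", "T")  # tolerate RNA-style references
    f = primer_regex(fwd).search(ref)
    if f is None:
        return None
    # The reverse primer binds the other strand, so search for its reverse
    # complement, starting after the end of the forward primer site.
    r = primer_regex(reverse_complement(rev)).search(ref, f.end())
    if r is None:
        return None
    return ref[f.end():r.start()]


def coverage(refs: Sequence[str], fwd: str, rev: str) -> float:
    """Fraction of reference sequences from which an amplicon could be extracted."""
    if not refs:
        return 0.0
    hits = sum(extract_amplicon(s, fwd, rev) is not None for s in refs)
    return hits / len(refs)


U341F = "CCTACGGGRSGCAGCAG"
R534 = "GWATTACCGCGGCKGCTG"

# Toy reference, made up for illustration: both primer sites flank a short insert.
toy = "AAAA" + "CCTACGGGAGGCAGCAG" + "TTTTGGGG" + "CAGCAGCCGCGGTAATAC" + "AAAA"

if __name__ == "__main__":
    print(reverse_complement(R534))            # CAGCMGCCGCGGTAATWC
    print(extract_amplicon(toy, U341F, R534))  # TTTTGGGG
    print(extract_amplicon("ACGT" * 10, U341F, R534))  # None
```

Running something like `coverage` separately over the archaeal and the bacterial subsets of SILVA would give a rough, mismatch-intolerant idea of how well a given primer pair covers each domain. Real extraction tools allow partial primer matches, so they will usually recover more sequences than this sketch.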

