Feature-classifier V3-V4

irenedecarlos · October 15, 2025, 3:20pm

Classifier works without `extract-reads`, but collapses to one phylum after trimming (BEExact, V3–V4)

Hello, I am analyzing some sequences of the V3-V4 16S region. I trained a classifier and it worked great without trimming, but the moment I trimmed the references to my primer sites, everything collapsed into basically one phylum. The SLURM jobs ran fine, with no error messages… just awful classifications.

The setup (nothing exotic)

References: BEExact full-length 16S (BEEx_FL-refs_sequences.qza + taxonomy).
Target region: the usual V3–V4 primers:
- F: CCTACGGGNGGCWGCAG
- R: GACTACHVGGGTATCTAATCC

Trim command (QIIME 2 extract-reads):

qiime feature-classifier extract-reads \
  --i-sequences BEEx_FL-refs_sequences.qza \
  --p-f-primer CCTACGGGNGGCWGCAG \
  --p-r-primer GACTACHVGGGTATCTAATCC \
  --p-min-length 100 \
  --p-max-length 500 \
  --o-reads BEEx-V3V4-refs_sequences.qza

After trimming, references only dropped from 20,099 → 20,072. So no, I didn’t accidentally destroy the database.
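
To make the trimming step concrete, here is a minimal Python sketch of what primer-based extraction does conceptually: expand the degenerate IUPAC codes in each primer, locate the forward primer and the reverse complement of the reverse primer, and keep the region in between. This is only an illustration under simplifying assumptions (exact matching, one strand, a made-up toy reference); the function names are hypothetical and this is not the code `extract-reads` runs.

```python
import re

# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse-complement a sequence, including degenerate bases."""
    return seq.translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Turn a degenerate primer into a regex with one character class per base."""
    return "".join(f"[{IUPAC[base]}]" for base in primer)


def extract_amplicon(ref, f_primer, r_primer):
    """Return the region between the two primers (primers excluded), or None.

    Assumption: exact matches only, and the reference is in forward orientation.
    """
    fwd = re.search(primer_regex(f_primer), ref)
    if fwd is None:
        return None
    downstream = ref[fwd.end():]
    rev = re.search(primer_regex(reverse_complement(r_primer)), downstream)
    if rev is None:
        return None
    return downstream[:rev.start()]


F_PRIMER = "CCTACGGGNGGCWGCAG"
R_PRIMER = "GACTACHVGGGTATCTAATCC"

# Toy reference (made up): flank + forward primer site + insert + reverse primer site + flank.
toy_ref = "AAAA" + "CCTACGGGAGGCAGCAG" + "ACGTACGTAC" + "GGATTAGATACCCTGGTAGTC" + "CCC"
print(extract_amplicon(toy_ref, F_PRIMER, R_PRIMER))  # ACGTACGTAC
```

As far as I know, the real command is more forgiving than this sketch: it tolerates mismatches in the primer sites (see `--p-identity`) and then applies the `--p-min-length` / `--p-max-length` filters shown above.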

Training on those trimmed refs gave me barplots where almost everything became the same group, whereas if I don't trim and train on the full-length refs I get diverse taxonomy. Any idea what I am doing wrong here?

SoilRotifer · October 15, 2025, 4:08pm

Hi @irenedecarlos, I assume you are following the protocol from the BEExact GitHub page on how to make an amplicon specific classifier?

For the qiime feature-classifier classify-sklearn command, what did you set for --p-confidence? The default is 0.7, but I noticed that their tutorial sets it to 0.5. I am not sure I'd advise using this setting, as you might run the risk of erroneous classifications. But I'm not as experienced with this database as others might be, so others may have more insight.

irenedecarlos · October 16, 2025, 8:40am

Hi @SoilRotifer, I also saw that the BEExact tutorial uses 0.5, but I kept the default one :).

SoilRotifer · October 16, 2025, 4:27pm

Thanks @irenedecarlos,

Well, if you followed their tutorial and used the default --p-confidence value, I suppose you can try their recommendation of using 0.5.
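
For intuition on what changing --p-confidence does, here is a simplified Python sketch: walk down the taxonomic ranks, sum the class probabilities that agree at each rank, and stop as soon as the best-supported taxon falls below the threshold. The probabilities and lineages are invented for illustration, the function name is hypothetical, and this is a rough approximation of the idea rather than the actual classify-sklearn implementation.

```python
from collections import defaultdict


def truncate_by_confidence(class_probs, confidence=0.7):
    """Report the deepest lineage whose summed support stays >= confidence.

    class_probs maps full lineage strings ("rank1;rank2;...") to probabilities.
    """
    assigned = []
    depth = max(len(lineage.split(";")) for lineage in class_probs)
    for level in range(depth):
        support = defaultdict(float)
        for lineage, prob in class_probs.items():
            ranks = lineage.split(";")
            # Only count lineages consistent with what has been assigned so far.
            if ranks[:level] == assigned and len(ranks) > level:
                support[ranks[level]] += prob
        if not support:
            break
        taxon, total = max(support.items(), key=lambda item: item[1])
        if total < confidence:
            break
        assigned.append(taxon)
    return ";".join(assigned) or "Unassigned"


# Invented per-class probabilities for a single read.
probs = {
    "Bacteria;Firmicutes;Bacilli": 0.45,
    "Bacteria;Firmicutes;Clostridia": 0.20,
    "Bacteria;Proteobacteria;Gammaproteobacteria": 0.35,
}
print(truncate_by_confidence(probs, confidence=0.7))  # Bacteria
print(truncate_by_confidence(probs, confidence=0.5))  # Bacteria;Firmicutes
```

In this toy case, lowering the threshold gives a deeper label, but one backed by less agreement, which is the trade-off behind the caution about 0.5.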

I would also sanity-check your data by trying to classify your reads using Greengenes, SILVA, or RDP. If they all return poor classifications, that might be a sign that something is wrong with the data. Even if the quality is good... perhaps too many off-targets or host DNA?

SoilRotifer · October 16, 2025, 5:26pm

Hi @irenedecarlos,

It is also quite possible that this is a mixed-orientation read issue. That is, the naive Bayes classifier requires that reads match the sequence orientation of the reference database. Fortunately, the latest version of RESCRIPt has a couple of tools for this:

qiime rescript orient-reads ...
^^ Just provide any reference database FASTA formatted artifact as your reference, along with your imported paired-end FASTQ artifact, then your reads should hopefully be re-oriented properly. Then you can proceed with DADA2, ... classification, and see if they improve.

and
qiime rescript orient-seqs ...
^^ If you have an already merged FASTA file artifact you can reorient these. Then retry classification. Though I prefer to do this with FASTQs, as I worry about potential denoising issues.
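
To illustrate what "orientation" means here, the following Python sketch picks, for a single read, whichever strand shares more k-mers with a reference. It is a toy under stated assumptions (a made-up reference, k = 8, hypothetical function names) and is not how the RESCRIPt actions work internally.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse-complement a plain ACGTN sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq, k=8):
    """Return the set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def orient(read, reference_kmers, k=8):
    """Return the read on whichever strand shares more k-mers with the reference."""
    flipped = reverse_complement(read)
    forward_hits = len(kmers(read, k) & reference_kmers)
    reverse_hits = len(kmers(flipped, k) & reference_kmers)
    return flipped if reverse_hits > forward_hits else read


# Made-up reference and a read taken from it, then flipped to the other strand.
reference = "ACGGTCATTGCAGGCTTAACGTAGCCATGA"
read = reference[5:25]
flipped_read = reverse_complement(read)

ref_kmers = kmers(reference)
print(orient(flipped_read, ref_kmers) == read)  # True
```

A read that is already in the reference orientation is returned unchanged, so running such a check over a mixed-orientation dataset leaves correctly oriented reads alone.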

irenedecarlos · October 21, 2025, 2:11pm

Hi @SoilRotifer, thanks a lot for the input. I have tried other classifiers (SILVA) and they work well, and the BEExact classifier trained without trimming to V3-V4 worked as well. It's only when I trim to the V3-V4 region that it fails… I tried reorienting the reads as you proposed, but I still have the same issue. I have gotten in contact with the creator of BEExact to see if he can help! Thanks for the replies :).

gregcaporaso · October 21, 2025, 2:35pm

Hi @irenedecarlos,
One other thing to try: you can run qiime feature-table tabulate-seqs on the BEEx-V3V4-refs_sequences.qza file that you created, and the corresponding FeatureData[Taxonomy] artifact could be provided as well. That would let you actually look at the sequences post-trimming, and make sure they look like what you expect. You could potentially try searching that file as well with some of your sequences to see if they hit - if not, that might provide some insight into what's going on.

Is there any chance that the region you're trimming to doesn't match what was sequenced (e.g., the sequencing was actually V2, but you're trimming to V3-V4)? (I don't mean to suggest you're making a silly mistake - it's just that this result would be what I would expect if the trimming was incorrect, so just want to throw that idea out there so you can confirm.)
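
One quick way to check for a region mismatch is to compare the length distribution of your representative sequences with that of the trimmed references, since amplicons from different regions usually differ in length. Below is a minimal Python sketch, assuming both artifacts have been exported to FASTA; the toy FASTA text and function names are hypothetical.

```python
from statistics import median


def read_fasta(text):
    """Parse FASTA-formatted text into a dict of sequence id -> sequence."""
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]  # id is the first token of the header
            records[current] = ""
        elif current is not None:
            records[current] += line.upper()
    return records


def length_summary(records):
    """Summarize sequence lengths as count, min, median, and max."""
    lengths = sorted(len(seq) for seq in records.values())
    return {
        "n": len(lengths),
        "min": lengths[0],
        "median": median(lengths),
        "max": lengths[-1],
    }


# Toy FASTA text standing in for an exported file.
toy_fasta = ">a some description\nACGT\nAC\n>b\nacg\n"
print(length_summary(read_fasta(toy_fasta)))
```

If the two summaries differ a lot (for example, representative sequences much shorter than the trimmed references), that would be consistent with the trimming not matching what was sequenced, although a length match alone does not prove the region is right.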
