Classifier trained on NCBI RefSeq 16S rRNA data gives weird results

Hi @mkcheung,

@Nicholas_Bokulich reminded me of another potential reason for what you are observing: the primers for the V1V9 region may simply not be present in many of the reference sequences, so the primer search and extraction with qiime feature-classifier extract-reads may not work well. I'd expect a V1V9 classifier to be almost equivalent to the "full-length" classifier we provide. As a sanity check, perhaps confirm that the full-length classifier gives reasonable results, and let us know what you find.

Other options:

  1. If you know the positions of your region within the SILVA alignment, you could simply download and import one of the aligned SILVA FASTA files from here, and extract the region by alignment position using qiime rescript trim-alignment with the --p-position-start xxx --p-position-end yyy ... flags. Then use qiime rescript degap-seqs ... to remove the alignment gaps, and proceed as you normally would with the remainder of the SILVA tutorial.
  2. Try out the alpha / beta release of our RESCRIPt extract-seq-segments command. This command was made specifically with the thought that primer sequences may not be present within your reference reads.
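
To make option 1 a bit more concrete, here is a toy sketch in plain Python of what "trim by alignment position, then degap" does to aligned sequences. This is only an illustration and not how RESCRIPt implements it: the function name trim_and_degap, the toy sequences, and the 1-based inclusive coordinates are all assumptions made for the example, so check the trim-alignment help text for how it actually interprets positions.

```python
def trim_and_degap(aligned, start, end, gap_chars="-."):
    """Slice every aligned sequence to the same alignment columns, then drop gaps.

    aligned   : dict mapping sequence ID -> aligned sequence (all the same length)
    start/end : alignment columns, assumed 1-based and inclusive for this sketch
    gap_chars : characters treated as alignment gaps
    """
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")

    trimmed = {}
    for seq_id, seq in aligned.items():
        # Same column slice for every sequence, so no primer match is needed.
        segment = seq[start - 1:end]
        # Remove gap characters to get back an unaligned sequence.
        degapped = "".join(ch for ch in segment if ch not in gap_chars)
        # Sequences that are entirely gaps in this window carry no data; skip them.
        if degapped:
            trimmed[seq_id] = degapped
    return trimmed


# Hypothetical toy alignment, 11 columns wide.
toy_alignment = {
    "seqA": "AC-GT--ACGT",
    "seqB": "--CGTTA-CG-",
    "seqC": "ACG------GT",
}

result = trim_and_degap(toy_alignment, start=4, end=9)
print(result)
```

The point is that the region is defined by shared alignment columns rather than by finding primers in each sequence, which is why this route still works when the primer sites are missing from many reference reads.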