Dear QIIME2 developers,
I am trying to use `qiime feature-classifier` to train a classifier for 18S sequences.
Here are the primers I used to extract the V4 target region:
- 565F: `CCAGCASCYGCGGTAATTCC`
- 948R: `ACTTTCGTTCTTGATYRA`
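Both primers contain IUPAC degenerate bases (S, Y, R). To check by hand whether a reference sequence contains both primer sites, the forward primer can be searched as written, but the reverse primer has to be searched as its reverse complement on the same strand. Below is a minimal bash sketch, assuming an uppercase DNA FASTA such as the SILVA `.fna` file. The function names `iupac_to_regex`, `revcomp` and `count_primer_hits` are hypothetical helpers, not QIIME 2 commands, and only exact matches (ambiguity codes aside) are counted, so treat the result as a rough check rather than a reproduction of what `extract-reads` does.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of QIIME 2) for inspecting degenerate primers.

# Translate IUPAC ambiguity codes into grep -E bracket classes.
# Applying the substitutions in sequence is safe: no replacement
# introduces another ambiguity code letter.
iupac_to_regex() {
  printf '%s\n' "$1" | sed -e 's/R/[AG]/g' -e 's/Y/[CT]/g' -e 's/S/[CG]/g' \
    -e 's/W/[AT]/g' -e 's/K/[GT]/g' -e 's/M/[AC]/g' -e 's/B/[CGT]/g' \
    -e 's/D/[AGT]/g' -e 's/H/[ACT]/g' -e 's/V/[ACG]/g' -e 's/N/[ACGT]/g'
}

# Reverse complement, including ambiguity codes (R<->Y, K<->M, B<->V, D<->H).
revcomp() {
  printf '%s\n' "$1" | tr 'ACGTRYSWKMBDHVN' 'TGCAYRSWMKVHDBN' |
    awk '{ s = ""; for (i = length($0); i > 0; i--) s = s substr($0, i, 1); print s }'
}

# Count FASTA records that contain the forward primer AND the reverse
# complement of the reverse primer. Multi-line records are joined first,
# so a primer site split across two lines is still found.
# Usage: count_primer_hits FASTA FWD_PRIMER REV_PRIMER
count_primer_hits() {
  local fwd_re rev_re
  fwd_re=$(iupac_to_regex "$2")
  rev_re=$(iupac_to_regex "$(revcomp "$3")")
  awk '/^>/ { if (seq != "") print seq; seq = ""; next }
       { seq = seq $0 }
       END { if (seq != "") print seq }' "$1" |
    grep -E "$fwd_re" | grep -Ec "$rev_re"
}
```

For example, `count_primer_hits silva_132_99_18S.fna CCAGCASCYGCGGTAATTCC ACTTTCGTTCTTGATYRA` would report how many reference sequences carry both sites exactly.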
The first confusing thing is that the sequences extracted by `qiime feature-classifier extract-reads` are much longer than the expected 948 - 565 = 383 nt: they average about 800 nt.
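One way to quantify this is to summarise the lengths of the extracted reads. The sketch below assumes `ref-seqs.qza` has already been exported to a plain FASTA file (for example with `qiime tools export`; the output file name, e.g. `dna-sequences.fasta`, is an assumption). `fasta_length_summary` is a hypothetical awk-based helper, not a QIIME 2 command.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of QIIME 2): summarise sequence lengths in a
# FASTA file. Handles multi-line records and ignores whitespace/CR characters.
# Prints: number of records, minimum, maximum and mean length (truncated to int).
fasta_length_summary() {
  awk '
    function flush() {
      if (have) {
        tot += len; cnt++
        if (cnt == 1 || len < min) min = len
        if (len > max) max = len
      }
    }
    /^>/ { flush(); have = 1; len = 0; next }
    { gsub(/[ \t\r]/, ""); len += length($0) }
    END { flush(); if (cnt) printf "n=%d min=%d max=%d mean=%d\n", cnt, min, max, int(tot / cnt) }
  ' "$1"
}
```

Running it on both the full reference FASTA and the extracted reads would show whether the extraction trimmed anything at all.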
Even more confusing, `qiime feature-classifier classify-sklearn` assigns exactly the same taxonomy to all 330 features. Here is an excerpt:
|Feature ID|Taxonomy|
|---|---|
|0b298f6d9f609ae27dff8397c6b5dfea|D_0__Eukaryota;D_1__Opisthokonta;D_2__Holozoa;D_3__Metazoa (Animalia);D_8__Arachnida;D_9__Opiliones;D_10__Karamea lobata australis|
|e49af68b1ed288f280740f1efd0f628a|D_0__Eukaryota;D_1__Opisthokonta;D_2__Holozoa;D_3__Metazoa (Animalia);D_8__Arachnida;D_9__Opiliones;D_10__Karamea lobata australis|
|4e798893271f5ecce40feed01c675957|D_0__Eukaryota;D_1__Opisthokonta;D_2__Holozoa;D_3__Metazoa (Animalia);D_8__Arachnida;D_9__Opiliones;D_10__Karamea lobata australis|
|42613dfbaf97fdcd05e4c7da6852390d|D_0__Eukaryota;D_1__Opisthokonta;D_2__Holozoa;D_3__Metazoa (Animalia);D_8__Arachnida;D_9__Opiliones;D_10__Karamea lobata australis|
|581e00979f3019eb8a88ea8f05be2110|D_0__Eukaryota;D_1__Opisthokonta;D_2__Holozoa;D_3__Metazoa (Animalia);D_8__Arachnida;D_9__Opiliones;D_10__Karamea lobata australis|
|9c0bddb2fa5aa59e7059faefdd81b161|D_0__Eukaryota;D_1__Opisthokonta;D_2__Holozoa;D_3__Metazoa (Animalia);D_8__Arachnida;D_9__Opiliones;D_10__Karamea lobata australis|
|c07c010e7f93e777d631ebf52b1c3809|D_0__Eukaryota;D_1__Opisthokonta;D_2__Holozoa;D_3__Metazoa (Animalia);D_8__Arachnida;D_9__Opiliones;D_10__Karamea lobata australis|
|d11b4aa67389d800c12dbfbb1abe8f62|D_0__Eukaryota;D_1__Opisthokonta;D_2__Holozoa;D_3__Metazoa (Animalia);D_8__Arachnida;D_9__Opiliones;D_10__Karamea lobata australis|
|f1d3bed5959ea9b11ec2a8ea15e4a0ac|D_0__Eukaryota;D_1__Opisthokonta;D_2__Holozoa;D_3__Metazoa (Animalia);D_8__Arachnida;D_9__Opiliones;D_10__Karamea lobata australis|
|1a254c4fdce63af73175a6caba4aae73|D_0__Eukaryota;D_1__Opisthokonta;D_2__Holozoa;D_3__Metazoa (Animalia);D_8__Arachnida;D_9__Opiliones;D_10__Karamea lobata australis|
|e3b005dbf7d747de71d29b8ebbcb5fdb|D_0__Eukaryota;D_1__Opisthokonta;D_2__Holozoa;D_3__Metazoa (Animalia);D_8__Arachnida;D_9__Opiliones;D_10__Karamea lobata australis|
|2de3cb6cd76a45c01ce5daf55122c28d|D_0__Eukaryota;D_1__Opisthokonta;D_2__Holozoa;D_3__Metazoa (Animalia);D_8__Arachnida;D_9__Opiliones;D_10__Karamea lobata australis|
|09e9fb7ab6765be1d52adb5b4bf37a94|D_0__Eukaryota;D_1__Opisthokonta;D_2__Holozoa;D_3__Metazoa (Animalia);D_8__Arachnida;D_9__Opiliones;D_10__Karamea lobata australis|
|160f27c869746e32d4924e18d2049c35|D_0__Eukaryota;D_1__Opisthokonta;D_2__Holozoa;D_3__Metazoa (Animalia);D_8__Arachnida;D_9__Opiliones;D_10__Karamea lobata australis|
|9fc64f6b75f145c4657e3b939231c7f9|D_0__Eukaryota;D_1__Opisthokonta;D_2__Holozoa;D_3__Metazoa (Animalia);D_8__Arachnida;D_9__Opiliones;D_10__Karamea lobata australis|
|1715e06abf6db633d7a01ab0122c31b4|D_0__Eukaryota;D_1__Opisthokonta;D_2__Holozoa;D_3__Metazoa (Animalia);D_8__Arachnida;D_9__Opiliones;D_10__Karamea lobata australis|
|90289e6e368fd01fbeee6d7562b32228|D_0__Eukaryota;D_1__Opisthokonta;D_2__Holozoa;D_3__Metazoa (Animalia);D_8__Arachnida;D_9__Opiliones;D_10__Karamea lobata australis|
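To confirm that every feature really received the same assignment, the exported taxonomy table can be tallied per taxonomy string. This is a minimal sketch, assuming `taxonomy.qza` has been exported to a tab-separated file with a single header row and the taxonomy string in the second column (the usual `Feature ID`, `Taxon`, `Confidence` layout, treated here as an assumption). `taxon_counts` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of QIIME 2): count how many features share each
# taxonomy string. Assumes one header line and the taxon string in column 2.
# Output: one line per distinct taxonomy, most frequent first.
taxon_counts() {
  tail -n +2 "$1" | cut -f2 | sort | uniq -c | sort -k1,1nr
}
```

If the classifier is collapsing everything to one taxon, the output should contain a single line with a count of 330.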
However, if I use a classifier trained on the full-length sequences, the results look normal, with the features assigned to a range of different taxa.
Here are the commands I used to train the classifier:

    qiime tools import --type 'FeatureData[Sequence]' --input-path SILVA_132_QIIME_release/rep_set/rep_set_18S_only/99/silva_132_99_18S.fna --output-path rep-set.qza &&
    qiime tools import --type 'FeatureData[Taxonomy]' --input-format HeaderlessTSVTaxonomyFormat --input-path SILVA_132_QIIME_release/taxonomy/18S_only/99/consensus_taxonomy_7_levels.txt --output-path ref-taxonomy.qza &&
    qiime feature-classifier extract-reads --i-sequences rep-set.qza --p-f-primer CCAGCASCYGCGGTAATTCC --p-r-primer ACTTTCGTTCTTGATYRA --o-reads ref-seqs.qza &&
    qiime feature-classifier fit-classifier-naive-bayes --i-reference-reads ref-seqs.qza --i-reference-taxonomy ref-taxonomy.qza --o-classifier classifier.qza
And here is the command I used to assign taxonomy:

    qiime feature-classifier classify-sklearn --p-n-jobs 16 --i-classifier classifier.qza --i-reads rep-seqs.qza --o-classification taxonomy.qza
I am using QIIME 2 version 2018.8. Is this a common situation in QIIME 2, or am I doing something wrong? I would appreciate any help figuring this out.
Thanks