Hi everyone!!
I am trying to train a classifier for my datasets, which come from V3-V4 16S rRNA
Hello Pau,
Can you post the command you ran and the error you got? The more information you provide, the more we can help.
Colin
Sorry @colinbrislawn, I was still writing the post and didn’t mean to publish it yet, but I posted it accidentally. That’s why so much information was missing.
The problem is that I want to use my own data to assemble taxonomic weights with q2-clawback and then train a classifier with them, as described in the “assembling more exotic weights” section of Using q2-clawback to assemble taxonomic weights.
I’m having problems with the following command:
qiime clawback sequence-variants-from-samples \
--i-samples samples.qza \
--o-sequences sv.qza
The input requires a feature table, but whichever table I provide, I get the same error:
Plugin error from clawback:
Invalid characters in sequence: [‘0’, ‘d’, ‘e’, ‘n’, ‘o’, ‘v’].
Valid characters: [‘H’, ‘R’, ‘N’, ‘M’, ‘W’, ‘D’, ‘-’, ‘V’, ‘S’, ‘K’, ‘T’, ‘.’, ‘B’, ‘Y’, ‘G’, ‘A’, ‘C’]
Note: Use lowercase if your sequence contains lowercase characters not in the sequence’s alphabet.
Debug info has been saved to /tmp/qiime2-q2cli-err-gf1h38lt.log
I tried changing the table many times, even using tables from the QIIME 2 tutorials (just for testing), but I always get the same error (only the invalid characters differ).
I don’t know whether I’m providing the wrong input.
Thank you very much, I hope I explained it better this time!
Thanks @pau, how are you generating samples.qza?
If you are using deblur or dada2, you need to add the flag --p-no-hashed-feature-ids. Not using that flag would result in the error that you are seeing. (--p-no-hashed-feature-ids forces the denoising step to label features with their actual sequences, which is essential for this step.)
If you are not using deblur or dada2, there may be other workarounds, but I would have to know more about your pipeline to make a recommendation.
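The error in the original post lists characters such as 0, d and e that are not allowed in a DNA sequence, which is what you see when the feature IDs are labels rather than the sequences themselves. Below is a minimal shell sketch for spotting such IDs before running clawback. It assumes you already have the feature IDs as plain text, one per line (for example from a feature table you exported and converted to TSV yourself); non_sequence_ids and invalid_chars are hypothetical helpers, not part of QIIME 2, and the allowed alphabet is copied from the “Valid characters” list in the error.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: flag feature IDs that are not nucleotide sequences.
# The alphabet copies the "Valid characters" list from the clawback error.

# Read IDs (one per line) on stdin; print those containing any invalid character.
non_sequence_ids() {
  grep -v -E '^[ACGTRYSWKMBDHVN.-]+$' || true
}

# Print the distinct invalid characters of a single ID, sorted, as one string.
invalid_chars() {
  printf '%s\n' "$1" | grep -o '[^ACGTRYSWKMBDHVN.-]' | LC_ALL=C sort -u | tr -d '\n'
}

# Example: a sequence ID passes, a de novo OTU label does not.
printf 'TACGGAGGGTGC\ndenovo0\n' | non_sequence_ids   # prints denovo0
invalid_chars denovo0                                 # prints 0denov
echo
```

If every ID in your table is reported, the table was most likely built without --p-no-hashed-feature-ids (or by a pipeline that assigns its own labels).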
Hi Ben, thanks a lot for your quick response!
I’m trying this option by rerunning dada2 on a small test dataset.
In the meantime, could this dada2 option (--p-no-hashed-feature-ids) affect other downstream steps that also use the feature table, such as alpha and beta diversity, taxonomy, or differential abundance tests?
Thanks again!
No, that option only affects the feature IDs. So unless you are merging with other datasets (in which case feature IDs must match), this option will not affect your results.
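To see why the flag only changes the labels, here is a minimal sketch of how a hashed ID relates to its sequence. It assumes, as far as I know, that the hashed feature IDs written by q2-dada2 are MD5 hex digests of the sequences, and that GNU md5sum is available (on macOS, md5 -q does the same job); hashed_id is a hypothetical helper name and the sequence is illustrative only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compute an MD5-style hashed feature ID for a sequence.
# Assumes GNU coreutils md5sum; the sequence is hashed without a trailing newline.
hashed_id() {
  printf '%s' "$1" | md5sum | cut -d' ' -f1
}

seq_id='TACGGAGGGTGCAAGCGTTAATCGG'   # illustrative sequence, not real data
hashed_id "$seq_id"                  # prints a 32-character lowercase hex string
```

The digest is deterministic, so each sequence always maps to the same label and the counts in the table are untouched. Digits and lowercase a to f are not in the DNA alphabet, though, so a hashed label cannot be read back as a sequence; and tables from separate runs only merge cleanly if their IDs were generated the same way.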
Dear developers,
I continued with the tutorial pipeline using the following commands. The feature table, table_full.qza, comes from a dada2 run on the sequences from my own dataset (to be used as the “normal” weights), and it was generated with the --p-no-hashed-feature-ids option, as you suggested.
qiime clawback sequence-variants-from-samples \
--i-samples DADA2_files/table_full.qza \
--o-sequences sv.qza
qiime feature-classifier classify-sklearn \
--i-classifier v3v4-classifier.qza \
--i-reads sv.qza \
--p-confidence=1 \
--o-classification classification.qza
# v3v4-classifier.qza is the one I got from the other training tutorial, using the V3-V4 region primers as described in https://docs.qiime2.org/2018.11/tutorials/feature-classifier/
qiime clawback generate-class-weights \
--i-reference-taxonomy ref-taxonomy.qza \
--i-reference-sequences ref-seqs.qza \
--i-samples DADA2_files/table_full.qza \
--i-taxonomy-classification classification.qza \
--o-class-weight nasal-weights.qza
So now I’m having trouble with this last command, which gives this error:
Plugin error from clawback:
taxonomy_classification does not match reference_taxonomy
Do you have any idea what’s going wrong?
Thank you very much for your help!!!
Hi @pau, thanks for sticking with it.
The classifier (v3v4-classifier.qza) must be trained with the same reference taxonomy and sequences that you then feed into generate-class-weights (ref-taxonomy.qza and ref-seqs.qza). Is v3v4-classifier.qza trained on ref-taxonomy.qza and ref-seqs.qza?
Also, be aware that you should be using 99% OTUs, not the 85% OTUs in that tutorial (which you probably already knew).
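One way to check this outside clawback is to export both taxonomies (for example with qiime tools export, which writes a taxonomy.tsv) and list the taxon strings in the classification that never appear verbatim in the reference. This is a rough sketch: I am assuming clawback compares full taxon strings, which I have not verified, and missing_taxa is a hypothetical helper. Both files are assumed to have a header row and the taxon in the second tab-separated column.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list classification taxa absent from the reference taxonomy.
# $1: reference taxonomy TSV; $2: classification TSV (both exported, with headers).
missing_taxa() {
  awk -F'\t' '
    FNR == 1 { next }                            # skip the header row of each file
    NR == FNR { ref[$2] = 1; next }              # first file: remember reference taxa
    !($2 in ref) && !seen[$2]++ { print $2 }     # second file: report each unknown taxon once
  ' "$1" "$2"
}
```

An empty result means every assigned label exists in the reference; labels such as Unassigned, or taxa truncated to a few ranks, would show up here.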
Hi @BenKaehler ! Thanks a lot for your help.
Yes, this classifier is trained with the same ref-taxonomy.qza and ref-seqs.qza, and I’m using the 99% OTUs. I still get the same error from qiime clawback generate-class-weights:
Plugin error from clawback:
taxonomy_classification does not match reference_taxonomy
I’m attaching the whole pipeline I’m using to obtain a trained classifier, in case it’s useful. Maybe I’m doing something else wrong.
Thanks again!
qiime tools import \
--type 'FeatureData[Sequence]' \
--input-path 99_otus.fasta \
--output-path 99_otus.qza
qiime tools import \
--type 'FeatureData[Taxonomy]' \
--input-format HeaderlessTSVTaxonomyFormat \
--input-path 99_otu_taxonomy.txt \
--output-path ref-taxonomy.qza
qiime feature-classifier extract-reads \
--i-sequences 99_otus.qza \
--p-f-primer CCTACGGGNGGCWGCAG \
--p-r-primer GACTACHVGGGTATCTAATCC \
--o-reads ref-seqs.qza
qiime feature-classifier fit-classifier-naive-bayes \
--i-reference-reads ref-seqs.qza \
--i-reference-taxonomy ref-taxonomy.qza \
--o-classifier v3v4-classifier.qza
qiime clawback sequence-variants-from-samples \
--i-samples DADA2_files/table_full.qza \
--o-sequences sv.qza
qiime feature-classifier classify-sklearn \
--i-classifier v3v4-classifier.qza \
--i-reads sv.qza \
--p-confidence=1 \
--o-classification classification.qza
qiime clawback generate-class-weights \
--i-reference-taxonomy ref-taxonomy.qza \
--i-reference-sequences ref-seqs.qza \
--i-samples DADA2_files/table_full.qza \
--i-taxonomy-classification classification.qza \
--o-class-weight nasal-weights.qza
qiime feature-classifier fit-classifier-naive-bayes \
--i-reference-reads ref-seqs.qza \
--i-reference-taxonomy ref-taxonomy.qza \
--i-class-weight nasal-weights.qza \
--o-classifier nasal-classifier.qza
Hi @pau, try setting --p-confidence=disable instead of --p-confidence=1 when you run qiime feature-classifier classify-sklearn. The interface changed and the tutorial wasn’t updated to reflect that. (I have updated it now.)
I suspect that classification.qza contains a lot of empty classifications, which may be leading to that error.
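A quick way to test that suspicion is to export classification.qza and count how deep each assignment goes. This is a minimal sketch, assuming the exported taxonomy.tsv has a header row, the taxon string in column 2 with ranks separated by ';', and unclassified features labelled Unassigned; summarize_depth is a hypothetical helper. With --p-confidence=1 I would expect most features to be Unassigned or truncated to a few ranks, whereas with --p-confidence=disable every feature should carry a full-length reference label.

```shell
#!/usr/bin/env bash
# Hypothetical helper: histogram of assignment depth in an exported classification.
# Prints "<depth> <count>" lines; "Unassigned" is counted as depth 0.
summarize_depth() {
  awk -F'\t' '
    NR == 1 { next }                                              # skip header row
    { depth = ($2 == "Unassigned") ? 0 : split($2, ranks, ";"); n[depth]++ }
    END { for (d in n) print d, n[d] }
  ' "$1" | sort -n
}
```

For example, summarize_depth classification/taxonomy.tsv (the path is illustrative). A large count at depth 0 or at shallow depths would be consistent with the “does not match reference_taxonomy” error.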
Hi @BenKaehler
Everything solved! Thanks a lot for your time!!