Error using q2-clawback to assemble taxonomic weights

Hi everyone!!

I am trying to train a classifier for my datasets, which come from the V3-V4 region of the 16S rRNA gene.


Hello Pau,

Can you post the command you ran and the error you got? The more information you provide, the more we can help.

Colin

Sorry @colinbrislawn, I was still writing the post and didn't mean to publish it yet, but I did so accidentally. That's why so much information is missing.

I want to use my own data to assemble weights with q2-clawback and then train a classifier, as described in the "Assembling more exotic weights" section of "Using q2-clawback to assemble taxonomic weights".
I'm having problems with the following command:

qiime clawback sequence-variants-from-samples \
  --i-samples samples.qza \
  --o-sequences sv.qza

The input must be a feature table, and whichever table I give it, I get the same error:

Plugin error from clawback:

Invalid characters in sequence: [‘0’, ‘d’, ‘e’, ‘n’, ‘o’, ‘v’].
Valid characters: [‘H’, ‘R’, ‘N’, ‘M’, ‘W’, ‘D’, ‘-’, ‘V’, ‘S’, ‘K’, ‘T’, ‘.’, ‘B’, ‘Y’, ‘G’, ‘A’, ‘C’]
Note: Use lowercase if your sequence contains lowercase characters not in the sequence’s alphabet.

Debug info has been saved to /tmp/qiime2-q2cli-err-gf1h38lt.log

I have tried many different tables, even the QIIME 2 tutorial tables (just for testing), but I always get the same error (only the invalid characters differ).
I don't know whether I'm providing the wrong input.

Thank you very much, I hope I explained it better this time!

Thanks @pau, how are you generating samples.qza?

If you are using deblur or dada2, you need to add the flag --p-no-hashed-feature-ids. Without that flag you get exactly the error you are seeing. (--p-no-hashed-feature-ids makes the denoising step label each feature with its actual sequence, which is essential for this step.)
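To see why hashed IDs trip up clawback, here is a small Python sketch. It assumes that a hashed feature ID is the MD5 hex digest of the sequence (which, as far as I know, is the default scheme in the DADA2 and deblur plugins); the alphabet is copied from the "Valid characters" list in the error above, and the example sequence is made up. clawback reads each feature ID as a sequence, so any character outside that alphabet is rejected.

```python
import hashlib

# Alphabet copied from the "Valid characters" list in the clawback error.
VALID_CHARS = set("ACGTRYSWKMBDHVN.-")


def invalid_chars(feature_id):
    """Return the sorted characters of feature_id that are not valid nucleotides."""
    return sorted(set(feature_id) - VALID_CHARS)


def is_sequence_id(feature_id):
    """True if feature_id could be parsed as a DNA sequence."""
    return bool(feature_id) and not invalid_chars(feature_id)


# A made-up ASV, labelled two ways.
seq = "TACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAG"
hashed_id = hashlib.md5(seq.encode("ascii")).hexdigest()  # assumed hashing scheme

print(is_sequence_id(seq))        # True: the label is the sequence itself
print(is_sequence_id(hashed_id))  # False: lowercase hex digits are not nucleotides
print(invalid_chars("denovo0"))   # ['0', 'd', 'e', 'n', 'o', 'v']
```

An ID of the hypothetical form denovo0 yields exactly the characters listed in the error posted above, so any ID that is not a sequence fails the same way.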

If you are not using deblur or dada2 there may be other workarounds, but I would have to know more about your pipeline to make a recommendation.


Hi Ben, thanks a lot for your quick response!
I'm trying this option with a new DADA2 run on a small test dataset.
In the meantime, could this DADA2 option (--p-no-hashed-feature-ids) affect other downstream processes that also use the feature table, such as alpha and beta diversity, taxonomy, or differential abundance tests?
Thanks again!

No, that option only affects the feature IDs. Unless you are merging with other datasets (in which case the feature IDs must match), it will not change your results.
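As a rough illustration of the merging caveat, the Python sketch below represents feature tables as plain dicts (feature ID to per-sample counts); that representation and the helper names are hypothetical, not the BIOM format QIIME 2 actually uses. If one run labels features with sequences and another with MD5 hashes (again assuming that is how hashed IDs are made), the same ASV ends up as two separate rows unless the IDs are first brought to a common form.

```python
import hashlib


def md5_id(sequence):
    """Hashed feature ID, assuming it is the MD5 hex digest of the sequence."""
    return hashlib.md5(sequence.encode("ascii")).hexdigest()


def rehash(table):
    """Relabel a sequence-keyed table {sequence: {sample: count}} with MD5 IDs."""
    return {md5_id(seq): dict(counts) for seq, counts in table.items()}


def merge(*tables):
    """Sum counts per (feature ID, sample) across tables."""
    merged = {}
    for table in tables:
        for feature_id, counts in table.items():
            row = merged.setdefault(feature_id, {})
            for sample, n in counts.items():
                row[sample] = row.get(sample, 0) + n
    return merged


asv = "TACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAG"  # made-up ASV
run_a = {asv: {"S1": 10}}         # denoised with --p-no-hashed-feature-ids
run_b = {md5_id(asv): {"S2": 5}}  # denoised with the default hashed IDs

print(len(merge(run_a, run_b)))          # 2: the same ASV counted as two features
print(len(merge(rehash(run_a), run_b)))  # 1: IDs agree after rehashing
```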


Dear developers,
I continued with the tutorial pipeline using the following commands. The feature table, "table_full.qza", comes from a DADA2 run on the sequences from my dataset, which I want to use as the "normal weights". As you said, I ran DADA2 with the --p-no-hashed-feature-ids option.

qiime clawback sequence-variants-from-samples \
  --i-samples DADA2_files/table_full.qza \
  --o-sequences sv.qza

qiime feature-classifier classify-sklearn \
  --i-classifier v3v4-classifier.qza \
  --i-reads sv.qza \
  --p-confidence=1 \
  --o-classification classification.qza

# v3v4-classifier.qza is the one I got from the other training tutorial, using V3-V4 region primers as described in https://docs.qiime2.org/2018.11/tutorials/feature-classifier/

qiime clawback generate-class-weights \
  --i-reference-taxonomy ref-taxonomy.qza \
  --i-reference-sequences ref-seqs.qza \
  --i-samples DADA2_files/table_full.qza \
  --i-taxonomy-classification classification.qza \
  --o-class-weight nasal-weights.qza
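For readers following along, here is a simplified Python sketch of the idea behind generate-class-weights: classify the sample sequence variants, then turn their abundances into a frequency for each reference taxon. Averaging per-sample relative abundances and giving unobserved taxa a small floor (the hypothetical `floor` parameter) are assumptions for illustration; the plugin's exact arithmetic may differ.

```python
def class_weights(samples, classification, reference_labels, floor=1e-6):
    """Sketch of habitat-specific class weights.

    samples: {sample: {feature_id: count}}
    classification: {feature_id: taxonomy label}
    reference_labels: every taxonomy label in the reference taxonomy
    floor: hypothetical small weight for reference taxa never observed
    """
    weights = {label: 0.0 for label in reference_labels}
    for counts in samples.values():
        total = sum(counts.values())
        for feature_id, n in counts.items():
            label = classification[feature_id]
            # A KeyError here means the label is not in the reference taxonomy.
            weights[label] += n / total / len(samples)
    weights = {label: max(w, floor) for label, w in weights.items()}
    norm = sum(weights.values())
    return {label: w / norm for label, w in weights.items()}


# Made-up toy data.
samples = {"S1": {"f1": 3, "f2": 1}, "S2": {"f1": 1, "f2": 1}}
classification = {
    "f1": "k__Bacteria; g__Staphylococcus",
    "f2": "k__Bacteria; g__Corynebacterium",
}
reference = [
    "k__Bacteria; g__Staphylococcus",
    "k__Bacteria; g__Corynebacterium",
    "k__Bacteria; g__Escherichia",
]
print(class_weights(samples, classification, reference))
```

Staphylococcus gets roughly 0.625 of the weight here, and Escherichia, never observed, keeps only the tiny floor.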

Now I'm having trouble with this last command, which gives this error:

Plugin error from clawback:

taxonomy_classification does not match reference_taxonomy

Do you have any idea what's going wrong?

Thank you very much for your help!!! :slightly_smiling_face:

Hi @pau, thanks for sticking with it.

The classifier (v3v4-classifier.qza) must be trained with the same reference taxonomy and sequences that you then feed into generate-class-weights (ref-taxonomy.qza and ref-seqs.qza). Is v3v4-classifier.qza trained on ref-taxonomy.qza and ref-seqs.qza?
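One rough way to check this yourself is to compare the labels in the classification against the reference taxonomy. The Python sketch below works on plain dicts standing in for the exported artifacts (exporting and parsing them is not shown), the data are made up, and it is not necessarily the exact test clawback performs.

```python
def unmatched_labels(classification, reference_taxonomy):
    """Labels assigned to sample sequences that never occur in the reference.

    classification: {feature_id: taxonomy label} from classify-sklearn
    reference_taxonomy: {reference_id: taxonomy label} used to train the classifier
    """
    reference_labels = set(reference_taxonomy.values())
    return sorted({label for label in classification.values()
                   if label not in reference_labels})


reference_taxonomy = {
    "r1": "k__Bacteria; p__Firmicutes; g__Staphylococcus",
    "r2": "k__Bacteria; p__Actinobacteria; g__Corynebacterium",
}
classification = {
    "ACGT": "k__Bacteria; p__Firmicutes; g__Staphylococcus",
    "GGCC": "k__Bacteria; p__Firmicutes",  # truncated label
    "TTAA": "Unassigned",
}
print(unmatched_labels(classification, reference_taxonomy))
# ['Unassigned', 'k__Bacteria; p__Firmicutes']
```

If the classifier and the reference files really do match, a non-empty result made of truncated or Unassigned labels would point at the classification step rather than at the reference files.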

Also, be aware that you should be using 99% OTUs, not the 85% OTUs in that tutorial (which you probably already knew).

Hi @BenKaehler! Thanks a lot for your help.
Yes, this classifier is trained on the same ref-taxonomy.qza and ref-seqs.qza, and I am using the 99% OTUs. I still get the same error from qiime clawback generate-class-weights:

Plugin error from clawback:
taxonomy_classification does not match reference_taxonomy

I'm attaching the whole pipeline I'm using to obtain a trained classifier, in case it helps. Maybe I'm doing something else wrong.

Thanks again!

qiime tools import \
  --type 'FeatureData[Sequence]' \
  --input-path 99_otus.fasta \
  --output-path 99_otus.qza

qiime tools import \
  --type 'FeatureData[Taxonomy]' \
  --input-format HeaderlessTSVTaxonomyFormat \
  --input-path 99_otu_taxonomy.txt \
  --output-path ref-taxonomy.qza

qiime feature-classifier extract-reads \
  --i-sequences 99_otus.qza \
  --p-f-primer CCTACGGGNGGCWGCAG \
  --p-r-primer GACTACHVGGGTATCTAATCC \
  --o-reads ref-seqs.qza

qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads ref-seqs.qza \
  --i-reference-taxonomy ref-taxonomy.qza \
  --o-classifier v3v4-classifier.qza

qiime clawback sequence-variants-from-samples \
  --i-samples DADA2_files/table_full.qza \
  --o-sequences sv.qza

qiime feature-classifier classify-sklearn \
  --i-classifier v3v4-classifier.qza \
  --i-reads sv.qza \
  --p-confidence=1 \
  --o-classification classification.qza

qiime clawback generate-class-weights \
  --i-reference-taxonomy ref-taxonomy.qza \
  --i-reference-sequences ref-seqs.qza \
  --i-samples DADA2_files/table_full.qza \
  --i-taxonomy-classification classification.qza \
  --o-class-weight nasal-weights.qza

qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads ref-seqs.qza \
  --i-reference-taxonomy ref-taxonomy.qza \
  --i-class-weight nasal-weights.qza \
  --o-classifier nasal-classifier.qza
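The last command is where the weights pay off. As far as I understand, the naive Bayes classifier uses the class weights as its class prior, so the posterior for each taxon is proportional to weight × likelihood. The Python sketch below shows only that Bayes-rule arithmetic for two taxa with made-up likelihoods and hypothetical weights; it is not q2-feature-classifier's code.

```python
def posterior(likelihoods, prior):
    """Bayes' rule over classes: P(taxon | read) is proportional to prior * likelihood."""
    unnormalised = {t: prior[t] * likelihoods[t] for t in likelihoods}
    total = sum(unnormalised.values())
    return {t: p / total for t, p in unnormalised.items()}


# Made-up likelihoods P(read | taxon) for one read.
likelihoods = {"g__Staphylococcus": 0.25, "g__Streptococcus": 0.75}

uniform = {"g__Staphylococcus": 0.5, "g__Streptococcus": 0.5}
habitat = {"g__Staphylococcus": 0.875, "g__Streptococcus": 0.125}  # hypothetical weights

print(posterior(likelihoods, uniform))  # Streptococcus favoured
print(posterior(likelihoods, habitat))  # Staphylococcus favoured
```

The same read flips from Streptococcus to Staphylococcus once the habitat weights say Staphylococcus is far more common in these samples.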


Hi @pau, try setting --p-confidence=disable instead of --p-confidence=1 when you run qiime feature-classifier classify-sklearn. The interface changed and the tutorial wasn’t updated to reflect that. (I have updated it now.)

I suspect that classification.qza contains a bunch of empty classifications, which may be leading to that error.
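The effect of the confidence threshold can be illustrated with a simplified Python model (not the plugin's actual implementation): walk down the ranks of the most probable label and keep a rank only while the total probability of all classes sharing that prefix stays at or above the threshold; `None` stands in for `disable`. Because a classifier rarely puts probability exactly 1 on a taxon, a threshold of 1 can leave reads Unassigned, which would fit the empty classifications suspected here. The probabilities are made up.

```python
def ranks(label):
    """Split a taxonomy string into its stripped ranks."""
    return [r.strip() for r in label.split(";")]


def assign(posteriors, threshold=None):
    """posteriors: {full taxonomy label: probability}; threshold None ~ 'disable'."""
    best = max(posteriors, key=posteriors.get)
    if threshold is None:
        return best  # no confidence filtering: report the top label in full
    best_ranks = ranks(best)
    kept = []
    for depth in range(1, len(best_ranks) + 1):
        prefix = best_ranks[:depth]
        # Total probability of every class that agrees with the top label so far.
        support = sum(p for label, p in posteriors.items()
                      if ranks(label)[:depth] == prefix)
        if support < threshold:
            break
        kept = prefix
    return "; ".join(kept) if kept else "Unassigned"


posteriors = {
    "k__Bacteria; p__Firmicutes; g__Staphylococcus": 0.5,
    "k__Bacteria; p__Firmicutes; g__Streptococcus": 0.25,
    "k__Bacteria; p__Actinobacteria; g__Corynebacterium": 0.125,
    "k__Archaea; p__Euryarchaeota; g__Methanobrevibacter": 0.125,
}
print(assign(posteriors))       # k__Bacteria; p__Firmicutes; g__Staphylococcus
print(assign(posteriors, 0.7))  # k__Bacteria; p__Firmicutes
print(assign(posteriors, 1.0))  # Unassigned
```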


Hi @BenKaehler
Everything solved! Thanks a lot for your time!! :slightly_smiling_face:
