feature-classifier classify-sklearn trimming zeros off sample-ids?

Cistron · September 27, 2019, 9:32am

Hi all,

I have come across some weird behaviour. I've sent my data through the deblur pipeline and assigned taxonomy.

qiime feature-classifier classify-sklearn \
    --i-classifier silva-132-99-515-806-nb-classifier.qza \
    --i-reads rep-seqs-deblur.qza \
    --o-classification taxonomy-deblur.qza

However, when I tried to make bar plots, I got an error saying that some of my Feature IDs are missing.

qiime taxa barplot \
    --i-table table-deblur.qza \
    --i-taxonomy taxonomy-deblur.qza \
    --m-metadata-file oral_microbiome.tsv \
    --o-visualization taxa-bar-plots-deblur.qzv

Plugin error from taxa:

  Feature IDs found in the table are missing in the metadata: {'15.0', '18.0', '16.0', '14.0', '13.0', '10.0', '8.0', '17.0', '9.0', '12.0', '11.0'}.

This being a time series, I gave the samples IDs like 8.00, 9.00, etc. I believe the last zero was clipped off the feature IDs in the taxonomy table, as the only sample not 'missing' is 8.02.

I will probably just change the IDs in my metadata file and run everything again. However, I thought I'd post in case someone else stumbles across this error, or in case it prompts a fix in the next version.

Cheers,
Mike

Cistron · September 27, 2019, 11:01am

I'd just like to add that leading zeros (0) in sample IDs (I assume coming from a manifest txt) are also stripped off somewhere along the way.

ben · September 27, 2019, 11:40am

Is the classifier you are using correct for the V region in your microbiome data? Which V region did you use (it looks like V4)?

Cistron · September 27, 2019, 11:48am

Hi Ben,

Yes, the data were generated with EMP primers; so they are v4 between 515f and 806r.

Anyway, once I changed the sample IDs in the manifest file (for importing demultiplexed data) to something like s01 to s12, all was good.
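In case it helps anyone scripting the same workaround, here is a minimal Python sketch of that renaming. The function name `make_safe_ids` and the `s` prefix are illustrative assumptions, not part of QIIME 2, and the sketch assumes the original IDs are unique.

```python
def make_safe_ids(original_ids, prefix="s"):
    """Map each original ID to a prefixed, zero-padded label such as 's01'.

    Assumes the original IDs are unique; duplicates would collapse in the dict.
    """
    # Pad to at least two digits so the labels sort as s01, s02, ... s12.
    width = max(2, len(str(len(original_ids))))
    return {
        old: f"{prefix}{position:0{width}d}"
        for position, old in enumerate(original_ids, start=1)
    }


# Hypothetical time-point IDs in the style described in this thread.
time_points = ["8.00", "8.02", "9.00", "10.00"]
for old, new in make_safe_ids(time_points).items():
    print(old, "->", new)
```

Keeping the returned mapping around (for example as an extra metadata column) lets the original time points be recovered later, and the same new IDs would need to go into both the manifest and the metadata file.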

Nicholas_Bokulich · September 27, 2019, 2:10pm

The feature classifier is definitely not doing this, because it is not even touching the samples! You are inputting feature data only (there are no sample IDs in any of these inputs or outputs).

This issue is happening elsewhere upstream, very possibly in deblur (which has its own sample ID formatting requirements outside of QIIME 2's). In general, using numbers as sample IDs is quite a bad idea, since the default behavior of some underlying programs is to read them as numbers and drop the trailing zeroes.
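As a rough illustration of that number-parsing behavior, the Python sketch below round-trips an ID through a float, the way a tool that treats IDs as numbers might. It is not the code of deblur or any QIIME 2 plugin; the helper names `roundtrip_as_number` and `ids_at_risk` are made up for this example.

```python
def roundtrip_as_number(sample_id: str) -> str:
    """Parse an ID as a float and write it back as text."""
    try:
        return str(float(sample_id))
    except ValueError:
        # Not parseable as a number, so the ID survives unchanged.
        return sample_id


def ids_at_risk(sample_ids):
    """Return the IDs whose text changes after a numeric round trip."""
    return [sid for sid in sample_ids if roundtrip_as_number(sid) != sid]


# Hypothetical IDs: numeric time points, one with a leading zero, one prefixed.
example_ids = ["8.00", "8.02", "15.00", "08", "s01"]
for sid in example_ids:
    print(repr(sid), "->", repr(roundtrip_as_number(sid)))
print("at risk:", ids_at_risk(example_ids))
```

Here '8.00' comes back as '8.0' while '8.02' is unchanged, which matches the pattern in the error message above, and an ID such as 's01' cannot be parsed as a number at all, so it is left alone. Running a check like `ids_at_risk` over the manifest before import would flag problem IDs early.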