Rescript 'KeyError'

arwqiime · September 20, 2021, 3:32pm

Hello,
I have imported 16S sequences and a corresponding taxonomy from a database of putative pathogenic bacteria. In order to prepare a taxonomy classifier from these sequences, I extracted 16S fragments (q2 feature-classifier extract-reads). When I run q2 rescript dereplicate, I get a KeyError as shown in the log file, which indicates an issue with the 'Feature ID'.

Plugin error from rescript:
'16S_015'

Is there any restriction with feature ids, e.g. the presence of underscores, etc.?
I attach the log file (changed to *.txt due to upload restrictions), but I can add some more documents, if necessary.
Thank you for your comments!

qiime2-q2cli-err-2r3fb6vc.log.txt (4.2 KB)

SoilRotifer · September 20, 2021, 4:49pm

Yes there are. Here are the general metadata requirements. More specifically, here are the identifier requirements.

Basically, you'd want to avoid underscores (_) in the sequence labels. Many older tools (e.g. QIIME 1) and pipelines often treat only the text before the first underscore as the sequence or sample label. There are also inconsistencies regarding this approach...
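If it helps, here is a minimal sketch for spotting labels with underscores in a FASTA file before importing. The function names (`find_underscore_ids`, `rename_id`) and the choice of a dash as the replacement character are my own assumptions for illustration; they are not part of QIIME 2 or RESCRIPt.

```python
def find_underscore_ids(fasta_lines):
    """Return sequence IDs (first token of each '>' header) that contain '_'."""
    flagged = []
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens and "_" in tokens[0]:
                flagged.append(tokens[0])
    return flagged


def rename_id(seq_id):
    """Hypothetical renaming rule: swap every underscore for a dash."""
    return seq_id.replace("_", "-")


# Toy input standing in for the lines of a real FASTA file.
example = [">16S_015 some description", "ACGT", ">16S-016", "ACGT"]
flagged = find_underscore_ids(example)
renamed = {old: rename_id(old) for old in flagged}
print(renamed)
```

Whatever renaming you apply to the sequences would have to be applied identically to the IDs in the taxonomy file, otherwise the two will no longer match.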

I suspect what is happening is that one portion of the code is trimming off anything after the underscore, and the other is not. That is, 16S_015 is being trimmed to 16S and saved. Then, when another part of the code searches for 16S_015 (i.e. the key), it is not found because it has been changed to 16S. This is the KeyError.
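To illustrate the kind of mismatch I mean, here is a toy Python sketch. This is not the actual RESCRIPt code; `trim_label` and the taxonomy entry are made up purely to show how a lookup fails when one step trims the ID and another does not.

```python
def trim_label(label):
    """Hypothetical step that keeps only the text before the first underscore."""
    return label.split("_", 1)[0]


# Made-up taxonomy entry keyed by the original feature ID.
taxonomy = {"16S_015": "some taxon"}

# One part of the pipeline stores the entries under trimmed IDs...
stored = {trim_label(key): value for key, value in taxonomy.items()}

# ...while another part still looks up the untrimmed ID.
try:
    stored["16S_015"]
    error = None
except KeyError as exc:
    error = str(exc)

print(list(stored))  # keys that were actually saved
print(error)         # message carried by the KeyError
```

The message of a Python KeyError is just the missing key in quotes, which is consistent with the bare '16S_015' in the plugin error above.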

system · October 21, 2021, 10:49pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.