Error on qiime vsearch cluster-features-closed-reference command

Whenever I try to run this command:

qiime vsearch cluster-features-closed-reference \
    --i-table AGP_feature-table.qza \
    --i-sequences AGP_sequences.qza \
    --i-reference-sequences 99_ref_seqs.qza \
    --p-perc-identity 0.99 \
    --o-clustered-table AGP_table-cr-99.qza \
    --o-clustered-sequences AGP_rep-seqs-cr-99.qza \
    --o-unmatched-sequences AGP_unmatched-cr-99.qza

I get the following error:

Plugin error from vsearch:

  Invalid character in sequence: b'g'. 
  Valid characters: ['-', 'N', 'K', 'Y', 'G', 'M', 'S', 'R', 'A', 'W', 'H', 'T', 'D', 'B', 'V', 'C', '.']
  Note: Use `lowercase` if your sequence contains lowercase characters not in the sequence's alphabet.

Debug info has been saved to /tmp/qiime2-q2cli-err-c5jwypdt.log
Usage: qiime feature-table filter-samples [OPTIONS]

How can I fix this? Thank you!

Hey @Stephanieorch!

The problem is either in AGP_sequences.qza or 99_ref_seqs.qza. It looks like you have a fasta file with lowercase letters.

Did you by chance use the aligned fasta file, from what I assume is the GG database, for 99_ref_seqs? Usually we see lowercase letters because the alignment algorithm is trying to communicate some extra information in-band.
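
If it helps to narrow down which file is the culprit, here is a minimal Python sketch that lists the records whose sequence lines contain lowercase characters. It assumes you have the plain fasta text available (not the .qza itself), and the function name `find_lowercase_records` is just something I made up for illustration:

```python
import io


def find_lowercase_records(fasta_handle):
    """Return the IDs of records whose sequence lines contain lowercase letters.

    Header lines (starting with '>') are never checked, only the sequence proper.
    """
    offenders = []
    current_id = None
    flagged = False
    for line in fasta_handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # The record ID is the first whitespace-delimited token of the header.
            parts = line[1:].split()
            current_id = parts[0] if parts else ""
            flagged = False
        elif current_id is not None and not flagged:
            if any(ch.islower() for ch in line):
                offenders.append(current_id)
                flagged = True  # report each record only once
    return offenders


# Toy input standing in for a real fasta file (hypothetical data).
example = io.StringIO(">seq1\nACGT\n>seq2\nACgT\nAAAA\n")
print(find_lowercase_records(example))
```

For a real file you would pass `open("your_file.fasta")` instead of the `io.StringIO` toy input.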

I found the lowercase sequences in my AGP_sequences.qza file. I was able to fix those quickly. However, now I need to fix the ones in my biom table. Is there an easy way to do this?

You may not need to worry about the IDs actually…

I presume your fasta file looks something like this?


If so, the lowercase sequences in your fasta IDs and biom table IDs won’t really be an issue. It’s just the sequence proper that needs to be uppercase.
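
To make that concrete, here is a rough Python sketch that uppercases only the sequence lines and leaves the header (ID) lines exactly as they are. The helper name `uppercase_sequences` and the toy records are my own placeholders, not anything from your data:

```python
def uppercase_sequences(lines):
    """Yield fasta lines with sequence lines uppercased and headers untouched."""
    for line in lines:
        if line.startswith(">"):
            # Header line: keep the ID as-is, even if it is lowercase.
            yield line
        else:
            # Sequence line: this is the part that must be uppercase.
            yield line.upper()


# Hypothetical records whose IDs happen to be lowercase sequences.
fasta = [">acgt\n", "acgt\n", ">seq2\n", "ACgT\n"]
fixed = list(uppercase_sequences(fasta))
print("".join(fixed), end="")
```

Because the IDs are not modified, they should still match the IDs in the biom table afterwards.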

Otherwise, if you do need to relabel your biom table, you could use feature-table group with a column that just maps every ID to a new ID (in this case the uppercase form).
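
As a sketch of what that mapping column could look like, the Python below writes a tab-separated file with the existing ID in the first column and its uppercase form in the second. The column name `uppercase_id`, the helper `write_id_map`, and the example IDs are all assumptions for illustration; you would substitute the real feature IDs from your table:

```python
import csv
import io


def write_id_map(feature_ids, out_handle, column="uppercase_id"):
    """Write a two-column TSV mapping each feature ID to its uppercase form."""
    writer = csv.writer(out_handle, delimiter="\t", lineterminator="\n")
    # First column is the ID column, second is the new ID to group by.
    writer.writerow(["feature-id", column])
    for fid in feature_ids:
        writer.writerow([fid, fid.upper()])


# Hypothetical feature IDs; written to an in-memory buffer for demonstration.
buf = io.StringIO()
write_id_map(["acgt", "ttga"], buf)
print(buf.getvalue(), end="")
```

The resulting file would then be passed to feature-table group as the metadata file, with the axis set to feature and `uppercase_id` as the metadata column. Note that if two existing IDs map to the same uppercase ID, they would be combined according to the mode you choose, so it is worth checking `qiime feature-table group --help` for your version before running it.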
