Classification of ITS question

The issue here is that some of the feature IDs in the taxonomy do not match those in the seqs. My guess is that at some point you converted the lowercase 'acgt' characters in the seqs file to uppercase across the whole file, header lines included, because the mismatched IDs look like this:

# In the seqs
SH1660673.08FU_DQ974771_refs_sinGleTon
# In the taxonomy
SH1660673.08FU_DQ974771_refs_singleton
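If you want to list every ID that disagrees, one rough check is to compare the FASTA header IDs with the first column of the taxonomy file. The sketch below uses toy data and hypothetical file names (refs.fasta, seqs.fasta, taxonomy.txt), and assumes a tab-separated taxonomy file with the feature ID in the first column. It also reproduces the suspected mistake with tr, which is only one plausible way the whole-file conversion could have happened: uppercasing a/c/g/t everywhere turns "singleton" into exactly "sinGleTon".

```shell
#!/bin/sh
# Minimal sketch on toy data; all file names are hypothetical placeholders.
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Toy reference: the same ID appears in the FASTA header and in the taxonomy.
printf '>SH1660673.08FU_DQ974771_refs_singleton\nacgtacgt\n' > refs.fasta
printf 'SH1660673.08FU_DQ974771_refs_singleton\tk__Fungi\n' > taxonomy.txt

# Suspected mistake (assumption): uppercasing a/c/g/t in every line,
# headers included.
tr 'acgt' 'ACGT' < refs.fasta > seqs.fasta

# Header IDs: drop the leading '>' and keep the first whitespace-delimited field.
grep '^>' seqs.fasta | sed 's/^>//' | awk '{print $1}' | sort > seq_ids.txt
# Taxonomy IDs: first tab-separated column.
cut -f1 taxonomy.txt | sort > tax_ids.txt

# comm -3 prints IDs found only in the taxonomy (no indent) and
# IDs found only in the seqs (indented with a tab).
comm -3 tax_ids.txt seq_ids.txt
```

On real data, point the grep and cut lines at your own files and skip the toy-data and tr steps; any output from comm -3 means the two files still disagree.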

Go back to the original reference sequences and redo the uppercasing with the command from this tutorial:

awk '/^>/ {print($0)}; /^[^>]/ {print(toupper($0))}' developer/sh_refs_qiime_ver7_99_01.12.2017_dev.fasta > developer/sh_refs_qiime_ver7_99_01.12.2017_dev_uppercase.fasta

That command uppercases only the sequence lines; header lines (those starting with '>') are printed unchanged, so the IDs still match the taxonomy.
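To see the behaviour on a small example, the sketch below runs the same awk program on a two-line toy FASTA (the file names are hypothetical placeholders). The header comes through untouched and only the sequence line is uppercased.

```shell
#!/bin/sh
# Minimal sketch on toy data; file names are hypothetical placeholders.
set -eu
workdir=$(mktemp -d)
cd "$workdir"

printf '>SH1660673.08FU_DQ974771_refs_singleton\nacgtnacgt\n' > refs.fasta

# Same awk program as in the tutorial command: lines starting with '>'
# are printed as-is, every other non-empty line is uppercased.
awk '/^>/ {print($0)}; /^[^>]/ {print(toupper($0))}' refs.fasta > refs_uppercase.fasta

cat refs_uppercase.fasta
```

One side effect worth knowing: empty lines match neither pattern, so the command drops them. That is harmless for a FASTA file.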
