taxonomy db problem

Hello

I imported the database and ran vsearch.

Most of the taxonomy assignments only go as far as "d__Bacteria".

My database has no "d__Bacteria" entries; its taxonomy starts at the phylum level.

Is there a specific format required for the TSV file when importing the database?

Thank you.

Hello @shinseung,

Welcome back to the forums.

Can you post the full command you ran? I think that would help clarify what the vsearch plugin has done.

Thank you for your answer.

After I edited the taxonomy file, some of the taxonomy names came out correctly.

The database's taxonomy .txt file was imported into QIIME 2, but there still seems to be a problem with its format.

Some assignments still come out as only "d__Bacteria".

Is there a way to validate the taxonomy file?

There seems to be an invisible tab character somewhere in it.

Thank you.

OK.

Can you post

  • the command you ran to import the database
  • the vsearch command you ran
  • the first few lines of the database file before import

The taxonomy file is a tab-separated values (TSV) file, but other than that I'm not sure we have a stand-alone validator. The RESCRIPt plugin does provide a lot of functionality for this. Check out this section of the tutorial about editing taxonomy.
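In the absence of a stand-alone validator, a quick check can be scripted by hand. The following is a minimal sketch, assuming a headerless file with exactly two tab-separated columns (feature ID, then taxonomy string). The function name `check_taxonomy_tsv` and the sample rows are made up for illustration and are not part of QIIME 2 or RESCRIPt.

```python
def check_taxonomy_tsv(lines):
    """Return (line_number, problem) tuples for a headerless two-column TSV."""
    problems = []
    seen_ids = set()
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line:
            problems.append((number, "blank line"))
            continue
        fields = line.split("\t")
        if len(fields) != 2:
            # A stray or trailing tab shows up here as an extra field.
            problems.append(
                (number, f"expected 2 tab-separated fields, found {len(fields)}")
            )
            continue
        feature_id, taxon = fields
        if feature_id != feature_id.strip() or taxon != taxon.strip():
            problems.append((number, "leading or trailing whitespace in a field"))
        if feature_id in seen_ids:
            problems.append((number, f"duplicate feature ID {feature_id!r}"))
        seen_ids.add(feature_id)
    return problems


# Made-up rows for illustration only; they are not taken from any real database.
sample = [
    "seq1\td__Bacteria;p__Firmicutes\n",
    "seq2\td__Bacteria;p__Firmicutes\t\n",  # trailing (invisible) tab
    "seq3 d__Bacteria;p__Firmicutes\n",     # space where a tab should be
]
for number, problem in check_taxonomy_tsv(sample):
    print(f"line {number}: {problem}")
```

To check a real file, pass an open file handle instead of the sample list, for example `check_taxonomy_tsv(open("taxonomy.tsv", encoding="utf-8"))`, where the file name is a placeholder.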

The database is a custom one.

My guess is that the GTDB taxonomy format is the problem.

Some feature IDs contain '-' or '#'.

However, even after I removed the '-' and '#' characters and imported again, vsearch still returned only "d__Bacteria".

The GTDB taxonomy file was created by extracting the headers from the FASTA file.
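A header-extraction step like this is a common place for stray tabs to creep in. As a rough sketch only, assuming each FASTA header has the form `>ID taxonomy string` (the exact header layout of the file used here is not shown in this thread), the conversion to `ID<TAB>taxonomy` rows could look like the following. The function names and the example header are hypothetical.

```python
def header_to_row(header):
    """Turn '>ID taxonomy ...' into a clean 'ID<TAB>taxonomy' row."""
    text = header.lstrip(">").strip()      # drop '>' and surrounding whitespace
    parts = text.split(None, 1)            # split on the first run of whitespace
    feature_id = parts[0]
    taxon = parts[1] if len(parts) > 1 else ""
    taxon = " ".join(taxon.split())        # replace embedded tabs and double spaces
    return f"{feature_id}\t{taxon}"


def fasta_headers_to_rows(lines):
    """Yield one taxonomy row per FASTA header line, skipping sequence lines."""
    for line in lines:
        if line.startswith(">"):
            yield header_to_row(line)


# Hypothetical header and sequence, used only to show the cleanup.
fasta = [
    ">id1 d__Bacteria;p__Firmicutes;s__Example species\t\n",
    "ACGTACGT\n",
]
for row in fasta_headers_to_rows(fasta):
    print(repr(row))
```

Each emitted row then contains exactly one tab, between the feature ID and the taxonomy string.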

I have attached the taxonomy file.

Thank you.

add.txt (4.8 MB)

When most of the reads receive poor taxonomic assignments, a likely cause is that your sequences are not in the same orientation as the reference database. Here are a few threads that may help:

When it is not possible to achieve an exact or near-hit to any of the reference sequences, the classifier will return the Lowest Common Ancestor (LCA) taxonomy of similar sequences.
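To illustrate why this produces results like "d__Bacteria", here is a minimal sketch of the LCA idea, assuming semicolon-delimited taxonomy strings and strict agreement at every rank. It is not the vsearch plugin's actual implementation, and the function name `lowest_common_ancestor` is made up for this example.

```python
def lowest_common_ancestor(taxonomies, sep=";"):
    """Return the shared leading ranks of several taxonomy strings."""
    split = [[rank.strip() for rank in t.split(sep)] for t in taxonomies]
    if not split:
        return ""
    common = []
    # Walk the ranks in parallel and stop at the first disagreement.
    for ranks in zip(*split):
        if all(rank == ranks[0] for rank in ranks):
            common.append(ranks[0])
        else:
            break
    return sep.join(common)


# Two weak hits that agree only at the domain level.
hits = [
    "d__Bacteria;p__Firmicutes;c__Bacilli",
    "d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria",
]
print(lowest_common_ancestor(hits))
```

When the top hits for a query disagree below the domain level, as they can when reads are matched in the wrong orientation, the shared prefix collapses to the domain alone.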

-Cheers!
-Mike
