Taxonomy file has lines not in sequences file

Hello,

I ran a large database of full-length ITS sequences through ITSx to isolate the ITS2 region. The resulting ITS2 file is missing some of the original FASTA entries because ITSx did not find the region in them. My taxonomy file corresponds to the original database, so it still contains every entry from before filtering. Is it OK for the taxonomy file to contain entries that are not in the corresponding sequences file? If not, is there a tool that removes the unmatched entries, or will I have to write a script to match and remove them?

John

Hi @John,

Yes, this is not a problem (it is actually quite typical after filtering out sequences for one reason or another). Most actions will simply ignore the extra lines in the taxonomy.

If you ever run into a situation where the taxonomy and sequences must match exactly (a few rare actions require this), you can use RESCRIPt to filter your taxonomy, as shown in this tutorial (see the filter-taxa action).
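Separately, if you would rather do the "match and remove" step yourself outside of QIIME 2, here is a minimal Python sketch. It is not the RESCRIPt method, and it rests on a few assumptions: the taxonomy is a tab-separated file with the sequence ID in the first column, the ID in each FASTA header is the first whitespace-delimited token after `>`, and an optional header line starts with `Feature ID`. The function names are made up for illustration. If ITSx changed your headers in some way, you would need to adjust how the IDs are parsed.

```python
import sys


def fasta_ids(lines):
    """Collect sequence IDs from FASTA header lines.

    Assumption: the ID is the first whitespace-delimited token after '>'.
    """
    ids = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def filter_taxonomy(lines, keep_ids):
    """Yield only the taxonomy lines whose ID is in keep_ids.

    Assumption: tab-separated, ID in the first column. A first line
    whose ID column reads 'Feature ID' is treated as a header and kept.
    """
    for i, line in enumerate(lines):
        feature_id = line.rstrip("\n").split("\t", 1)[0]
        is_header = i == 0 and feature_id == "Feature ID"
        if is_header or feature_id in keep_ids:
            yield line


if __name__ == "__main__":
    # Usage: python filter_tax.py its2.fasta taxonomy.tsv filtered_taxonomy.tsv
    fasta_path, tax_path, out_path = sys.argv[1:4]
    with open(fasta_path) as fasta:
        keep = fasta_ids(fasta)
    with open(tax_path) as tax_in, open(out_path, "w") as tax_out:
        tax_out.writelines(filter_taxonomy(tax_in, keep))
```

Both files are read line by line and only the set of IDs is held in memory, so this should cope with a large database. The order of the taxonomy lines is preserved.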

