RESCRIPt question: Can you/should you dereplicate NCBI data?

SoilRotifer · August 25, 2021, 4:12pm

What are the errors?

The --p-rank-handles parameter simply sets the expected default prefix style. You can check out the code here:

rank_handles = {
    'silva': [' d__', ' p__', ' c__', ' o__', ' f__', ' g__', ' s__'],
    'greengenes': ['k__', 'p__', 'c__', 'o__', 'f__', 'g__', 's__'],
    'gtdb': ['k__', 'p__', 'c__', 'o__', 'f__', 'g__', 's__'],
    'disable': None,
}

You can, of course, disable these using --p-rank-handles 'disable'.
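
To make the idea of a "prefix style" concrete, here is a rough sketch of how a list of rank handles could be used to pad a truncated taxonomy string with empty rank prefixes. This is only an illustration: `backfill` is a hypothetical helper, not RESCRIPt's actual implementation, and the assumption that the handles are used for padding missing ranks, as well as the example taxonomy string, are mine.

```python
# Hypothetical sketch, NOT RESCRIPt's actual code.
# Assumption: rank handles are used to pad missing trailing ranks.

# Greengenes-style handles, as listed in the dictionary above.
greengenes_handles = ['k__', 'p__', 'c__', 'o__', 'f__', 'g__', 's__']


def backfill(taxon, handles, sep='; '):
    """Pad a truncated taxonomy string with empty rank prefixes.

    If handles is None (the 'disable' case), return the string unchanged.
    """
    if handles is None:
        return taxon
    # Split into ranks, ignoring stray whitespace and empty fields.
    ranks = [r.strip() for r in taxon.split(';') if r.strip()]
    # Append the bare prefix for every rank that is missing.
    padded = ranks + [h.strip() for h in handles[len(ranks):]]
    return sep.join(padded)


print(backfill('k__Metazoa; p__Arthropoda', greengenes_handles))
# k__Metazoa; p__Arthropoda; c__; o__; f__; g__; s__

print(backfill('k__Metazoa; p__Arthropoda', None))
# k__Metazoa; p__Arthropoda
```

With handles disabled, the taxonomy string is passed through untouched, which is the safer choice if your NCBI taxonomy labels do not follow one of the listed prefix styles.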

Dereplication has always been optional, but it is often recommended because it keeps the database small and removes redundant information.

Please paste the error output here. Have you worked through this COI NCBI tutorial? You'll see that rescript dereplicate is used there.

What ranks did you choose while downloading data from NCBI? Can you list all commands used prior to the dereplication step?

-Mike