I have used the RESCRIPt plugin to import nifH nucleotide data from NCBI to create a classifier for this gene, and I am attempting to quality-filter the data so that the classifier is more accurate when run against my representative sequences.
In the SILVA tutorial, I noticed that there is a dereplicate step both before and after making an amplicon-specific classifier, using specific options such as:
--p-rank-handles 'silva' \
--p-mode 'uniq' \
I tried to dereplicate my data, keeping --p-mode the same but swapping 'silva' for 'ncbi' in --p-rank-handles, but both commands caused errors, so I'm assuming these options are specific to SILVA, Greengenes, etc. If so, why don't we need to dereplicate with these settings for NCBI data?
I also noticed that this command:
doesn't work either when I try to dereplicate. Why is this not necessary (or not used) with NCBI data?
Just trying to understand the ins-and-outs of this stuff for my own sanity. Any clarification would be deeply appreciated. Thank you!!!