Nice catch @devonorourke!
Update: if working with 1.8 million COI sequences, be prepared to wait about 12 days...
Hello, great job.
Did you distinguish between the 'Consensus' and 'Majority' taxonomies, as described in the QIIME 2-formatted SILVA 132 release notes?
Best regards,
Hi @arwqiime,
If you run
qiime rescript dereplicate --help
you can read descriptions of how each dereplication mode handles the sequences and taxonomy:
  --p-mode TEXT Choices('uniq', 'lca', 'majority', 'super')
      How to handle dereplication when sequences map to
      distinct taxonomies. "uniq" will retain all
      sequences with unique taxonomic affiliations. "lca"
      will find the least common ancestor among all taxa
      sharing a sequence. "majority" will find the most
      common taxonomic label associated with that
      sequence; note that in the event of a tie,
      "majority" will pick the winner arbitrarily. "super"
      finds the LCA consensus while giving preference to
      majority labels and collapsing substrings into
      superstrings. For example, when a more specific
      taxonomy does not contradict a less specific
      taxonomy, the more specific is chosen. That is,
      "g__Faecalibacterium; s__prausnitzii", will be
      preferred over "g__Faecalibacterium; s__"
For the pre-made classifiers, we used the "uniq" setting.
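To make the difference between the modes concrete, here is a rough Python sketch of what "uniq", "lca", and "majority" do when one sequence maps to several taxonomies. This is not RESCRIPt's actual code: the function names, the toy records, and the "s__other" label are all made up for illustration, "super" is left out, and I'm assuming taxonomy strings are rank labels separated by semicolons.

```python
from collections import Counter, defaultdict


def split_ranks(taxonomy):
    """Split 'g__X; s__y' into ['g__X', 's__y'] (assumed ';'-delimited)."""
    return [rank.strip() for rank in taxonomy.split(";")]


def lca(taxonomies):
    """Return the longest rank prefix shared by every taxonomy string."""
    ranked = [split_ranks(t) for t in taxonomies]
    shared = []
    for ranks in zip(*ranked):
        if len(set(ranks)) != 1:
            break  # first rank where the labels disagree
        shared.append(ranks[0])
    return "; ".join(shared)


def majority(taxonomies):
    """Return the most frequent label; ties are broken arbitrarily."""
    return Counter(taxonomies).most_common(1)[0][0]


def dereplicate(records, mode="uniq"):
    """Group (sequence, taxonomy) pairs by sequence and resolve the labels.

    Hypothetical helper: returns {sequence: [taxonomy, ...]}.
    """
    by_sequence = defaultdict(list)
    for sequence, taxonomy in records:
        by_sequence[sequence].append(taxonomy)

    resolved = {}
    for sequence, taxa in by_sequence.items():
        if mode == "uniq":
            # keep one copy of the sequence per distinct taxonomy
            resolved[sequence] = sorted(set(taxa))
        elif mode == "lca":
            resolved[sequence] = [lca(taxa)]
        elif mode == "majority":
            resolved[sequence] = [majority(taxa)]
        else:
            raise ValueError(f"unsupported mode in this sketch: {mode}")
    return resolved


# Toy input: the sequence 'ACGT' appears three times with two different labels.
records = [
    ("ACGT", "g__Faecalibacterium; s__prausnitzii"),
    ("ACGT", "g__Faecalibacterium; s__prausnitzii"),
    ("ACGT", "g__Faecalibacterium; s__other"),  # made-up label
    ("TTGA", "g__Bacteroides; s__fragilis"),
]

for mode in ("uniq", "lca", "majority"):
    print(mode, dereplicate(records, mode))
```

With this toy input, "uniq" keeps both labels for ACGT, "lca" backs off to the genus they share, and "majority" keeps the label seen twice.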
-Mike
An off-topic reply has been split into a new topic: Testing feature classifier accuracy
Please keep replies on-topic in the future.
2 off-topic replies have been split into a new topic: Should I cluster my reference sequences to 97% for my classifier?
An off-topic reply has been split into a new topic: Is there a way to parallelize evaluate-fit-classifier?
An off-topic reply has been split into a new topic: Modifying taxonomic annotation from RESCRIPt
An off-topic reply has been split into a new topic: Slightly different taxa with regional and full length taxonomy classifiers
An off-topic reply has been split into a new topic: rescript reverse-transcribe error: file not found
There is a typo in the command: it should be --p-replacement-strings,
not --p-replacementS-strings.
An off-topic reply has been split into a new topic: how to train my classifier for V3-V4 16SRNA gene region
An off-topic reply has been split into a new topic: Questions about SILVA classifier for v3v4
An off-topic reply has been split into a new topic: unable to download SILVA ARB files