Taxonomic assignment with UNITE database: most reads classified as "s__unidentified"

Nicholas_Bokulich · March 8, 2021, 4:25pm

Hi @Danilo_Reis ,
Two hypotheses for the low diversity: either (1) too many reads were lost during QC, or (2) your reads are hitting junk sequences in the UNITE database, e.g., abnormally short sequences.

Troubleshooting/solutions:

1. Look at your DADA2 stats and feature table summaries, keeping an eye on if and where reads are lost. If you are losing many reads during merging, analyze single-end reads instead of paired-end.
2. Use RESCRIPt to filter out abnormally short/long sequences, and maybe q2-taxa to remove any unidentified sequences from UNITE, if desired. See the tutorial: Processing, filtering, and evaluating the SILVA database (and other reference sequence data) with RESCRIPt
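
To make the two checks above concrete, here is a rough sketch in plain Python. It is not the RESCRIPt or q2-taxa API (those plugins work on QIIME 2 artifacts); it only illustrates the logic. The read counts, sequence IDs, taxonomy strings, and length thresholds are all made up for illustration, and the step names are an assumption about how the denoising stats are laid out for paired-end data.

```python
# Assumed order of processing steps in paired-end denoising stats.
STEPS = ["input", "filtered", "denoised", "merged", "non-chimeric"]


def largest_loss_step(counts):
    """Return (step, fraction of input reads lost at that step) for the worst step."""
    if counts["input"] == 0:
        return None, 0.0
    worst_step, worst_frac = None, 0.0
    for prev, step in zip(STEPS, STEPS[1:]):
        lost = (counts[prev] - counts[step]) / counts["input"]
        if lost > worst_frac:
            worst_step, worst_frac = step, lost
    return worst_step, worst_frac


def filter_reference(seqs, taxonomy, min_len, max_len, exclude="unidentified"):
    """Keep reference sequences whose length is within [min_len, max_len]
    and whose taxonomy string does not contain `exclude`."""
    return {
        seq_id: seq
        for seq_id, seq in seqs.items()
        if min_len <= len(seq) <= max_len and exclude not in taxonomy[seq_id]
    }


# Hypothetical read counts for one sample.
sample = {"input": 10000, "filtered": 9000, "denoised": 8800,
          "merged": 2000, "non-chimeric": 1900}
step, frac = largest_loss_step(sample)
print(f"largest loss at '{step}': {frac:.0%} of input reads")

# Hypothetical reference sequences and taxonomy strings.
ref_seqs = {
    "ref1": "ACGT" * 50,  # 200 nt, identified to species
    "ref2": "ACGT" * 5,   # 20 nt, abnormally short
    "ref3": "ACGT" * 60,  # 240 nt, unidentified
}
ref_tax = {
    "ref1": "k__Fungi;p__Ascomycota;g__Fusarium;s__Fusarium_oxysporum",
    "ref2": "k__Fungi;p__Ascomycota;g__Fusarium;s__Fusarium_solani",
    "ref3": "k__Fungi;p__unidentified;g__unidentified;s__unidentified",
}
# min_len and max_len are illustrative thresholds, not recommended values.
kept = filter_reference(ref_seqs, ref_tax, min_len=100, max_len=1000)
print(sorted(kept))
```

If the largest loss shows up at the merged step, that points toward trying single-end reads. Note that dropping every sequence with "unidentified" anywhere in its taxonomy is a fairly aggressive choice; you might only want to exclude sequences that are unidentified at higher ranks.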

Note: I changed the title to be more descriptive. Thanks!

Hope that helps!