Thanks for sending the viz in a DM! Okay, guess what? Even though the viz looks okay, if I pull out the FASTA file from within:
>3cafeb270ab9f7183bbc2e7c24b7cc1ffb2f196c UU3micro-18S-12_S14_L001_404900
CCTGAAAGCCGGTAATGACTTTCTCGCGTCAAACCGCGAAAAGCCAGGCGTGACCGAACTCCTCAGCGGACTCCAGTACGAAGTGATCCACATGGGTGACGGAGCCAAACCCTGGCCCACCAGCAAAGTGACCTGCCATTACCAT
>bbfdbd5a45738494ef2a3fc5f95a878bfb9f8475 UU3micro-18S-12_S14_L001_404902
ATAACAGGTCTGTGATGCCCTTAGATGTTCTGGGCCGCACGCGCGCTACACTGACTGGTTCATCGAGCTTACAACCTTGACCGAGAGGTCTGGGTAATCTTTTTAAAGCCAGTCGTGATGGGGATAGATTATTGCAATTTTTAATCTTCAACGAGGAATTCCTAGTAGACGCAGGTCATCAACCTGCATCGATTACGTCCCTGCCCTTTGTACACGCCGCCCGTCGCTACTACCGATTGAATGGCTTAGTGAGCCCTCTGGACTGGTGCACGGCGTTGGAAACTTCGCCGCGCGTTCAGGAAGGAGGTCAAACTTGATCATTTAGAGGAAGTAAAAGTCGTAACAAGGTTTCC
>bc69f76bfcb0deac1cb364f08edd54725d7d47d1 UU3micro-18S-12_S14_L001_404918
ATAACAGGTCTGTGATGCCCTTAGATGTTTTGGGCTGCACGCGTGCTACACTGGTTTAATTAACGAGCTGCTGGTCTTGTTTGAAAGCGTGGGGTAAACTTTAATGTAAATCGTGATTGGGGTGGATTGTTGCAATTATTGATCTTGAACGAGGAATTCCTAGTACGCCGAAGTCATCAGCTTGGGCTGACTACGTCCCTGCCCTTTGTACACACCGCCCGTCGCTCCTACCGATTGAGTAATCCGGTGAAATGCTTGGCTTGGCACAGTGGTCATAAATGAGTGTTGTGCAACAAGTGCTTTGAACCTTGTTACTTAGAGGAAGGAGAAGTCGTAACAAGGTTTCT
Ah hah! vsearch, you scoundrel! Okay, so the main issue appears to be that dereplicate-sequences is modifying the feature IDs. Secondly, the tabulate-seqs viz should show the entire feature ID, not just the first word found in it.
Okay - workarounds...
You could cluster de novo at 100%. This will keep your features more or less the same (for example, I ran this as a check on the dereplicated outputs for the Moving Pictures tutorial dataset: I started with 229143 features and ended up with 229137 after clustering).
If that is not acceptable for you, your other options are to choose a different pipeline (e.g. DADA2), or to clean up your feature IDs with some kind of external script or tool.
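If you go the external-script route, here is a rough sketch of what that could look like in Python. It assumes the ID you want to keep is the first whitespace-delimited word of each FASTA header (the hash in the records above) and that everything after it can be dropped; that is my guess at the cleanup you need, not something QIIME 2 does for you. The function name `clean_fasta_ids` is made up for this example, and the sequence in the demo record is truncated for brevity.

```python
import io


def clean_fasta_ids(src, dst):
    """Copy FASTA text from src to dst, keeping only the first word of each header.

    Assumption: the first whitespace-delimited word is the feature ID to keep.
    Returns the number of header lines rewritten.
    """
    n_headers = 0
    for line in src:
        if line.startswith(">"):
            words = line[1:].split()
            if not words:
                raise ValueError("found a FASTA header with no ID")
            dst.write(">" + words[0] + "\n")
            n_headers += 1
        else:
            # Sequence lines are copied through unchanged.
            dst.write(line)
    return n_headers


# Demo on one record from above (sequence truncated for illustration).
example = io.StringIO(
    ">3cafeb270ab9f7183bbc2e7c24b7cc1ffb2f196c UU3micro-18S-12_S14_L001_404900\n"
    "CCTGAAAGCC\n"
)
cleaned = io.StringIO()
count = clean_fasta_ids(example, cleaned)
print(cleaned.getvalue(), end="")
```

For real data you would pass open file handles instead of the `io.StringIO` objects, after exporting the sequences from the artifact, and then re-import the cleaned file. Whether the IDs in your feature table need matching changes depends on your data, so check that the two still line up afterwards.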
Thanks for working with us on this one! :qiime2: