Here is a summary of the results:
From the BOLD database, the taxonomic assignment looks as follows:
BOLDSYSTEMS - using this db
Using this, the result is as
100.00 4905 0 R 1 root
99.70 4890 0 R1 131567 cellular organisms
45.71 2242 0 D 2759 Eukaryota
45.63 2238 0 D1 33154 Opisthokonta
45.61 2237 0 K 33208 Metazoa
45.61 2237 0 K1 6072 Eumetazoa
45.61 2237 0 K2 33213 Bilateria
43.41 2129 0 K3 33317 Protostomia
0.12 6 0 K4 2697495 Spiralia
0.12 6 0 K5 1206795 Lophotrochozoa
0.12 6 0 P 6157 Platyhelminthes
0.10 5 0 C 6199 Cestoda
0.10 5 0 C1 6200 Eucestoda
0.10 5 0 O 56708 Tetraphyllidea
0.10 5 0 F 2835625 Rhoptrobothriidae
0.10 5 0 G 2835626 Myzophyllobothrium
0.10 5 0 G1 2835628 Unclassified Myzophyllobothrium
0.10 5 5 S 2835629 Myzophyllobothrium sp. 'cf. nagasawai'
The last result is Myzophyllobothrium sp. 'cf. nagasawai'.
This result corresponds to 5 sequences as follows:
C SRR16510773.5626088 2835629 52 131567:2 2835629:2 0:14
@SRR16510773.5626088 5626088 length=52
CTCTACGCATTTCACCGCTACACCTGGAATTCTACCTTCCTCTCACATACTC
+SRR16510773.5626088 5626088 length=52
AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
C SRR16510773.5944974 2835629 54 0:15 2835629:2 131567:3
@SRR16510773.5944974 5944974 length=54
AGAGTATGTGAGAGGAAGGTAGAATTCCAGGTGTAGCGGTGAAATGCGTAGAGA
+SRR16510773.5944974 5944974 length=54
AAAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
C SRR16510773.5944974 2835629 54 0:15 2835629:2 131567:3
@SRR16510773.5944974 5944974 length=54
AGAGTATGTGAGAGGAAGGTAGAATTCCAGGTGTAGCGGTGAAATGCGTAGAGA
+SRR16510773.5944974 5944974 length=54
AAAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
C SRR16510773.6212720 2835629 53 0:12 2835629:2 131567:2 A:3
@SRR16510773.6212720 6212720 length=53
GTATGTGAGAGGAAGGTAGAATTCCAGGTGTAGCGGTGAAATGCGTAGAGATC
+SRR16510773.6212720 6212720 length=53
AAAAAEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEAEEEEEE/EE
C SRR16510773.6568304 2835629 66 0:28 2835629:2 131567:2
@SRR16510773.6568304 6568304 length=66
AAACTGTAGAGCTAGAGTATGTGAGAGGAAGGTAGAATTCCAGGTGTAGCGGTGAAATGCGTAGAG
+SRR16510773.6568304 6568304 length=66
AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
The sequence from the BOLD database that matches the above sequences is: >2835629:emu_db:1|kraken:taxid|2835629
ATTAATACTCTAGGATANTGGACGTTACTCNCANAATAAGCACCGGCTAACTCTGTGCCA
GCAGCCGCGGTAATACAGAGGGTGCGAGCGTTANTCGGATTTACTGGGCGTAAAGCGTGC
GTAGGCGGCTTNTTAAGTCGGATGTGAAATCCCTGAGCTTAACTTAGGAATNGCATTCGA
TACTGGGAAGCTAGAGTNTGGGAGAGGANGGTAGAATTCCAGGTGTAGCGGTGAAATGCG
TAGAGAT
However, when I run a BLAST search for these 5 sequences on NCBI, the highest-ranking result is g-proteobacteria, rather than the expected taxon.
I believe this discrepancy might stem from the differences between the BOLD database accession numbers and the NCBI accession numbers. I manually aligned the 5 sequences with the BOLD database, and I found that the identity between them is about 94%, which seems to be correct.
The issue appears to be that Kraken2 is assigning a different taxon compared to what is reported in the BOLD database.
For context, I used the qiime moshpit build-kraken-db
command to generate the Kraken2 database for QIIME2. However, I did not use the kraken2-build
command from the Kraken2 Python package to transform the database. I suspect that this is the reason for the incorrect taxonomic assignment.
Could you please confirm if the issue could be due to the way I built the Kraken2 database in QIIME2? Is there anything I might be missing in the database preparation process?
I appreciate any guidance or suggestions you can provide!
Best regards,