I have FASTQ files produced by paired-end shotgun DNA sequencing on an Illumina platform.
I sequenced 3 fungi in monoculture, so I have 3 samples and I expect 3 different species. I have 2 questions:
1. Is it possible to classify these sequences at the species level with Kraken2, given that these are fungal species? I need to be very specific: I don't want just the genera, I want the species.
2. If I convert the FASTQ files to FASTA, can I use ITSx to extract the ITS sequences (I am already running this) and classify them against the UNITE database to get my taxonomic classification? Would that be more accurate and specific for fungi, or should I adopt another strategy?
Hi @FRANCESCOMONTESI,
If you really require species-level information, I suggest following an assembly-based workflow (e.g., see the basic MAG assembly workflow in the MOSHPIT documentation). Assembling contigs/MAGs will give you longer contiguous sequences to work with, which are generally more informative than short raw reads.
On your first question: read-based classification will give mixed results. Some reads will likely classify to species level; others will not (e.g., if they cover regions that are highly conserved across a larger clade). Assembling reads into contigs and binning them into MAGs gives longer sequences that classify more specifically. Results will vary depending on your database, the species present, etc., so it may be best to just try read-based classification first, and follow the (much more complicated) assembly workflow only if you are not satisfied with the results.
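To judge whether read-based classification is "good enough" for a sample, one option is to check what share of reads actually reached species rank. Here is a minimal Python sketch that does this from a Kraken2 report. It assumes the standard six-column, tab-separated report (percentage, clade read count, direct read count, rank code, taxid, name); the function name `species_fraction` and the sample counts are made up for illustration, and a report written with extra minimizer columns would need different column indices.

```python
import csv

def species_fraction(report_lines):
    """Fraction of all reads assigned at or below species rank.

    Assumes the standard six-column Kraken2 report layout:
    percent, clade reads, direct reads, rank code, taxid, name.
    """
    unclassified = 0
    root = 0
    species = 0
    for row in csv.reader(report_lines, delimiter="\t"):
        if len(row) < 6:
            continue  # skip blank or malformed lines
        clade_reads = int(row[1])
        rank = row[3].strip()
        if rank == "U":
            unclassified += clade_reads
        elif rank == "R":
            root += clade_reads
        elif rank == "S":
            # Exactly "S": sub-ranks such as "S1" are already counted
            # inside their parent species' clade total.
            species += clade_reads
    total = unclassified + root
    return species / total if total else 0.0

# Synthetic report for illustration only (counts are made up).
SAMPLE_REPORT = "\n".join([
    "20.00\t200\t200\tU\t0\tunclassified",
    "80.00\t800\t10\tR\t1\troot",
    "75.00\t750\t50\tD\t2759\t  Eukaryota",
    "70.00\t700\t100\tG\t5052\t    Aspergillus",
    "60.00\t600\t600\tS\t5061\t      Aspergillus niger",
])

if __name__ == "__main__":
    frac = species_fraction(SAMPLE_REPORT.splitlines())
    print(f"Reads classified to species: {frac:.1%}")
```

For a monoculture you would hope to see most classified reads concentrated in one species; a low fraction, or reads spread across several species in the same genus, would be a reason to move on to the assembly workflow.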
On your second question: no, this will have limited effectiveness. You could probably use ITSx to identify the reads that map to ITS (though other tools could probably do this more efficiently) and then classify those against UNITE. But they will still be short Illumina reads, not the full ITS region, so the classifications will be incomplete.
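If you still want to try the ITSx route, the FASTQ to FASTA conversion itself is simple. Below is a rough Python sketch using only the standard library. It assumes plain four-line FASTQ records (the usual Illumina layout, no wrapped sequence lines), and the function name `fastq_to_fasta` and the file names are placeholders, not part of any tool.

```python
import gzip

def fastq_to_fasta(fastq_path, fasta_path):
    """Convert a four-line-per-record FASTQ file to FASTA.

    Quality scores are dropped. Returns the number of reads written.
    Files ending in .gz are read as gzip.
    """
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    n_reads = 0
    with opener(fastq_path, "rt") as fq, open(fasta_path, "w") as fa:
        while True:
            header = fq.readline()
            if not header:
                break  # end of file
            if not header.strip():
                continue  # tolerate stray blank lines
            if not header.startswith("@"):
                raise ValueError(f"Unexpected FASTQ header: {header!r}")
            sequence = fq.readline().rstrip("\n")
            fq.readline()  # '+' separator line, ignored
            fq.readline()  # quality line, ignored
            fa.write(">" + header[1:].rstrip("\n") + "\n")
            fa.write(sequence + "\n")
            n_reads += 1
    return n_reads

if __name__ == "__main__":
    # Hypothetical file names; replace with your own.
    count = fastq_to_fasta("sample1_R1.fastq.gz", "sample1_R1.fasta")
    print(f"Wrote {count} reads")
```

The resulting FASTA can then be given to ITSx, but keep in mind the limitation above: each extracted sequence is still only a read-length fragment of the ITS region.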
So if deeper classification really matters for your experiment, try an assembly-based workflow!