I’ve completed the QIIME pipeline for 16S data, and some of my OTUs are either unclassified or uncultured bacteria. My supervisor suggested I BLAST my sequences, but I’m not sure what to BLAST or how to do it.
If the sequences are in fastq format from the sequencing company, is there a way to convert them to fasta files? And after BLASTing the sequences, do I have to re-run the QIIME pipeline on the BLAST outputs? I’m not sure what the analysis steps are after BLASTing sequences.
Sorry if this is a really simple question - still very new to the bioinformatics form of analysis!
A simple solution would be to just look at your rep-seqs visualization artifact (if you followed the Moving Pictures tutorial, this is the rep-seqs.qzv file). The features there are already hyperlinked to BLAST, so that’s all you have to do: find the features that are unclassified in your table and look them up there.
Of course, this is only practical when you’re exploring a few unclassified reads. If you have a lot of them, a less manual approach may be needed, for example:
qiime taxa filter-seqs to only keep unclassified sequences, then
qiime feature-table tabulate-seqs.
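On the fastq question: nothing needs converting for the approach above, since the rep-seqs artifact already holds the sequences. If a fasta file is still wanted for a manual BLAST search, here is a minimal sketch, assuming plain (uncompressed) fastq with exactly four lines per record. The function name fastq_to_fasta and the file names are made up for illustration.

```shell
# Convert a four-line-per-record FASTQ file to FASTA on stdout.
# Assumption: records are never wrapped over more than four lines.
fastq_to_fasta() {
    awk '
        NR % 4 == 1 { print ">" substr($0, 2) }  # header line: swap leading "@" for ">"
        NR % 4 == 2 { print }                    # sequence line: keep as-is
    ' "$1"
}
```

It could be called as `fastq_to_fasta reads.fastq > reads.fasta`. Files with wrapped (multi-line) records would need a proper parser instead.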
I’m just a little confused - how do I set up the qiime taxa filter-seqs code to only keep unclassified sequences?
And just to clarify, you’d use the filtered .qza file in the feature-table tabulate-seqs command to create a new rep-seqs.qzv file, and then use that filtered .qzv file to see which/how many of the samples are unclassified? And then do the BLAST search?
I guess I’m just also not sure what to be looking for after you BLAST a sequence…
Sorry I’m still very new to this and trying to get my head around it!
qiime taxa filter-seqs \
--i-sequences sequences.qza \
--i-taxonomy taxonomy.qza \
--p-include Unclassified \
--o-filtered-sequences unclassified-sequences.qza
Yes. The tabulate-seqs output will contain only unclassified sequences, with hyperlinks to NCBI BLAST searches with those sequences. So no need to copy/paste individually into NCBI BLAST.
Chances are these are unclassified because they are non-target DNA (e.g., host DNA) or some sort of error-riddled sequences. So BLAST will probably confirm this (the former if you get something like human, mouse, or plant hits; the latter if you do not get very good matches, e.g., only partial or low-quality alignments).
No need to BLAST every last sequence… spot check as many as needed to confirm a trend (i.e., these are bad seqs, or there really are a lot of bacteria and something went wrong with your classifier) and then either filter out or reclassify those sequences as appropriate.
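To see how many features are affected before spot checking, something like the following may help. It is only a sketch: it assumes the taxonomy has been exported to a tab-separated file with one header row, the feature ID in column 1 and the taxon string in column 2. The function name list_unclassified and the default search term are made up for illustration.

```shell
# Print feature IDs whose taxon string contains a search term (case-insensitive).
# Assumed layout: tab-separated, header on line 1, feature ID in column 1, taxon in column 2.
list_unclassified() {
    file=$1
    term=${2:-unclassified}   # second argument is optional
    awk -F'\t' -v term="$term" '
        NR > 1 && index(tolower($2), tolower(term)) > 0 { print $1 }
    ' "$file"
}
```

For example, `list_unclassified taxonomy.tsv | wc -l` would give the number of matching features, and passing a different term as the second argument (such as Unassigned) changes what is searched for.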
Thank you so much for that information - It’s been really helpful at getting my head around my data!
It also really helped me know what to look for when I BLAST.
Really appreciate the help