Get a full-length 16S rRNA gene sequence from a database for an ASV/OTU

Hello!

If I have a table of 16S ASVs with assigned taxonomy, is there any way to get a full-length 16S rRNA gene sequence for a species belonging to each ASV's assigned taxonomic group, e.g. by pulling it from a database?

For example, I could BLAST the ASV corresponding to a given genus, then download the full 16S rRNA gene sequence of the top hit. However, I am looking for a more automated way of doing this, as doing it by hand is far too labor-intensive for hundreds of ASVs.

If anyone knows of a way to do this, that would be incredible!

Thank you for reading!
August

Welcome @a_staubus !

This is basically the motivation behind automated taxonomic classification methods, such as those in QIIME 2's q2-feature-classifier plugin (and other taxonomy classifiers like RDP, etc.). BLAST also has a batch mode (also available in the NCBI web interface), though you'd still need a way to map the top hits in the BLAST results table back to their reference sequences. q2-feature-classifier's classify-consensus-blast action wraps blastn to do this automatically: it finds the taxonomies of all hits and then their consensus.
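
If you do go the batch-BLAST route yourself (e.g. blastn with -outfmt 6 against a full-length 16S reference set), picking each ASV's top hit and pulling its full-length sequence only takes a few lines. Below is a minimal plain-Python sketch, assuming the default 12-column tabular output and a reference FASTA whose IDs match the subject-ID column. The helper names are hypothetical, and "top hit" here just means highest bitscore (ties keep the first hit seen), so it skips the consensus logic classify-consensus-blast applies.

```python
import csv
import io

# Column positions in BLAST+ tabular output (-outfmt 6 with its default 12 columns).
QUERY, SUBJECT, PIDENT, BITSCORE = 0, 1, 2, 11


def best_hits(blast_tsv_text):
    """Return {query_id: (subject_id, pident, bitscore)}, keeping only the
    highest-bitscore hit per query (ties keep the first hit seen)."""
    best = {}
    for row in csv.reader(io.StringIO(blast_tsv_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and '#' comment lines
        query, bitscore = row[QUERY], float(row[BITSCORE])
        if query not in best or bitscore > best[query][2]:
            best[query] = (row[SUBJECT], float(row[PIDENT]), bitscore)
    return best


def read_fasta(fasta_text):
    """Parse FASTA text into {id: sequence}; the ID is the first word of each header."""
    seqs, current = {}, None
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            words = line[1:].split()
            current = words[0] if words else ""
            seqs[current] = []
        elif current is not None and line:
            seqs[current].append(line)
    return {seq_id: "".join(parts) for seq_id, parts in seqs.items()}


def full_length_for_asvs(blast_tsv_text, reference_fasta_text):
    """Map each ASV ID to the full-length reference sequence of its best hit,
    skipping hits whose subject ID is missing from the reference FASTA."""
    refs = read_fasta(reference_fasta_text)
    return {
        asv: refs[subject]
        for asv, (subject, _pident, _bits) in best_hits(blast_tsv_text).items()
        if subject in refs
    }
```

Keeping the percent identity in the best_hits result lets you drop weak hits with whatever cutoff you trust before treating the full-length sequence as representative of the ASV's taxon.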

What you are describing is basically a closed-reference OTU clustering approach. See here for an example:
https://docs.qiime2.org/2022.2/tutorials/otu-clustering/#closed-reference-clustering
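
To make the idea concrete, here is a toy pure-Python sketch of what closed-reference clustering does: each ASV is assigned to its most similar reference sequence at >= 97% identity, counts are summed under that reference ID, and ASVs with no such match are dropped. It is only an illustration. The tutorial above uses vsearch for this, while the scoring scheme, the identity definition (a fitting alignment where skipping the ends of the reference is free), the closed_reference helper and the single-sample counts dict below are all my own assumptions. It is also far too slow for a real reference database.

```python
from collections import Counter

MATCH, MISMATCH, GAP = 1, -1, -2  # toy scoring scheme (an assumption)


def identity(query, ref):
    """Percent identity of `query` aligned end to end inside `ref`
    (skipping the start/end of the reference is free)."""
    n, m = len(query), len(ref)
    # s[i][j] = best score aligning query[:i] against a stretch of ref ending at j
    s = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        s[i][0] = i * GAP  # query bases left unaligned are penalised
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = s[i - 1][j - 1] + (MATCH if query[i - 1] == ref[j - 1] else MISMATCH)
            s[i][j] = max(diag, s[i - 1][j] + GAP, s[i][j - 1] + GAP)
    # Finish at the best cell of the last row (free trailing reference).
    i, j = n, max(range(m + 1), key=lambda c: s[n][c])
    matches = columns = 0
    while i > 0:  # trace back until the whole query is consumed
        step = MATCH if j > 0 and query[i - 1] == ref[j - 1] else MISMATCH
        if j > 0 and s[i][j] == s[i - 1][j - 1] + step:
            matches += query[i - 1] == ref[j - 1]
            i, j = i - 1, j - 1
        elif j == 0 or s[i][j] == s[i - 1][j] + GAP:
            i -= 1  # query base against a gap
        else:
            j -= 1  # gap in the query inside the aligned region
        columns += 1
    return 100.0 * matches / columns if columns else 0.0


def closed_reference(asv_seqs, asv_counts, refs, min_identity=97.0):
    """Sum ASV counts under the best-matching reference ID; ASVs with no
    reference at >= min_identity are returned as unmatched and discarded."""
    table, unmatched = Counter(), []
    for asv_id, seq in asv_seqs.items():
        best_id, best_pid = None, -1.0
        for ref_id, ref_seq in refs.items():
            pid = identity(seq, ref_seq)
            if pid > best_pid:  # ties keep the first reference seen
                best_id, best_pid = ref_id, pid
        if best_id is not None and best_pid >= min_identity:
            table[best_id] += asv_counts.get(asv_id, 0)
        else:
            unmatched.append(asv_id)
    return table, unmatched
```

Because the output is keyed by reference IDs, the full-length sequence for each feature is simply the matching entry in the reference FASTA.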

Your observed feature IDs (in the output feature table) would then correspond to the reference sequence IDs. If what you really want is just a FASTA file of the hits (rather than a table of per-sample counts for each reference sequence), you can use this action to filter the reference sequences down to only those contained in the table:
https://docs.qiime2.org/2022.2/plugins/available/feature-table/filter-seqs/
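
If you'd rather do that filtering outside QIIME 2 (say, on an exported feature table and the reference FASTA), it amounts to keeping only the FASTA records whose IDs appear in the table. A minimal plain-Python sketch; the helper names are hypothetical, and it assumes a TSV whose first column holds feature IDs and whose header lines start with "#" (e.g. a table converted with biom convert --to-tsv):

```python
def feature_ids_from_tsv(tsv_text):
    """Collect feature IDs from the first column of a feature-table TSV,
    skipping header/comment lines that start with '#'."""
    ids = set()
    for line in tsv_text.splitlines():
        if line and not line.startswith("#"):
            ids.add(line.split("\t")[0])
    return ids


def filter_fasta(fasta_text, keep_ids):
    """Return FASTA text containing only records whose ID (first word of the
    header) is in keep_ids, preserving order, headers and line wrapping."""
    out, keep = [], False
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            words = line[1:].split()
            keep = bool(words) and words[0] in keep_ids
        if keep:
            out.append(line)
    return "\n".join(out) + ("\n" if out else "")
```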

Good luck!
