Preparing SILVA132 for QIIME1/2 Use

peterleary · December 15, 2017, 5:43pm

So I figured out that when running the pick_rep_set.py script, I’m supposed to use the otu_map.txt from pick_otus.py together with the original SILVA132 fasta file, rather than the denovo_abundance_sorted.fna that pick_otus.py also produces. Doh. I think I’ve made some progress, though.
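To make the otu_map + original fasta idea concrete, here’s a rough Python sketch of what that step boils down to. It assumes the QIIME1 OTU map layout (OTU id, then tab-separated member sequence ids) and picks the first listed member of each OTU. `pick_rep_set_first` and the other helpers are hypothetical names of mine, and which picking method the SILVA notes actually pass to pick_rep_set.py isn’t assumed here. The point is just that the sequence is looked up in the original fasta, not in the abundance-sorted file.

```python
"""Rough sketch: build a rep set from a QIIME1-style OTU map plus the original fasta."""
from typing import Dict, Iterable, Iterator, List, Optional, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (id, sequence) pairs; the id is the first word after '>'."""
    seq_id: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            seq_id, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)  # multi-line sequences are joined
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def parse_otu_map(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Each line: OTU id, then tab-separated member sequence ids."""
    otus: Dict[str, List[str]] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0]:  # skip blank lines
            otus[fields[0]] = fields[1:]
    return otus


def pick_rep_set_first(otu_map: Dict[str, List[str]],
                       sequences: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Hypothetical 'first member' picking: OTU id -> (member id, sequence)."""
    rep_set: Dict[str, Tuple[str, str]] = {}
    for otu_id, members in otu_map.items():
        if members:  # skip OTUs with no listed members
            first = members[0]
            # Look the member up in the ORIGINAL sequences, not in a
            # dereplicated / abundance-sorted file.
            rep_set[otu_id] = (first, sequences[first])
    return rep_set


if __name__ == "__main__":
    fasta = [">AB001 example header", "ACGT", "ACGT", ">AB002", "GGGG", ">AB003", "TTTT"]
    otu_map = parse_otu_map(["AB002\tAB002\tAB001", "AB003\tAB003"])
    print(pick_rep_set_first(otu_map, dict(parse_fasta(fasta))))
    # {'AB002': ('AB002', 'GGGG'), 'AB003': ('AB003', 'TTTT')}
```

With the real files you’d feed in the lines of otu_map.txt and the SILVA132 fasta instead of the toy lists.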

I followed the instructions in the SILVA128 notes to get a 16S 99% rep set fasta file (as per the “creation of representative sequence files” and then “Splitting fasta files by domain” sections), then imported it into QIIME2.
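For anyone wondering what “splitting by domain” amounts to, here’s a hedged sketch: keep only records whose taxonomy starts with Bacteria or Archaea (i.e. the 16S set). It assumes a tab-separated id/taxonomy map and tolerates an optional rank prefix such as `D_0__`; that prefix handling and all the function names are my assumptions, not taken from the notes’ scripts.

```python
"""Rough sketch: keep only 16S (Bacteria + Archaea) records using a taxonomy map."""
from typing import Dict, Iterable, List, Set, Tuple


def parse_taxonomy_map(lines: Iterable[str]) -> Dict[str, str]:
    """Each line: sequence id, a tab, then a ';'-separated taxonomy string."""
    taxonomy: Dict[str, str] = {}
    for line in lines:
        line = line.rstrip("\n")
        if line:
            seq_id, tax = line.split("\t", 1)
            taxonomy[seq_id] = tax
    return taxonomy


def domain_of(tax: str) -> str:
    """Return the first (domain) level, dropping an assumed 'D_0__'-style prefix."""
    first = tax.split(";")[0].strip()
    return first.split("__", 1)[1] if "__" in first else first


def split_by_domain(records: Iterable[Tuple[str, str]],
                    taxonomy: Dict[str, str],
                    keep: Set[str]) -> List[Tuple[str, str]]:
    """Keep (id, seq) records whose domain is in `keep`; unmapped ids are dropped."""
    return [(seq_id, seq) for seq_id, seq in records
            if seq_id in taxonomy and domain_of(taxonomy[seq_id]) in keep]


def to_fasta(records: Iterable[Tuple[str, str]]) -> str:
    """Serialise (id, seq) pairs as single-line fasta records."""
    return "".join(f">{seq_id}\n{seq}\n" for seq_id, seq in records)


if __name__ == "__main__":
    tax = parse_taxonomy_map([
        "A1\tD_0__Bacteria;D_1__Firmicutes",
        "E1\tD_0__Eukaryota;D_1__Opisthokonta",
        "R1\tArchaea;Euryarchaeota",
    ])
    records = [("A1", "ACGT"), ("E1", "GGGG"), ("R1", "TTTT"), ("X1", "CCCC")]
    # Only the Bacteria (A1) and Archaea (R1) records survive.
    print(to_fasta(split_by_domain(records, tax, {"Bacteria", "Archaea"})), end="")
```

Swapping `keep` to `{"Eukaryota"}` would give the 18S side instead.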

Next, I followed the instructions to create a taxonomy map (as per the “Taxonomy mapping file creation” and then “Parsing and splitting taxonomy mapping files” sections). It would not parse to 7 levels (from 14), and I haven’t created a majority or consensus taxonomy map yet (mainly because I was too eager to see if it would even work). My parsed 16S taxonomy map imported into QIIME2.
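On the 14-to-7-levels issue, this is a rough idea of what the collapse step does: pick a fixed set of level positions out of each semicolon-separated taxonomy string. The `KEEP_LEVELS` tuple below (just the first seven levels) is a hypothetical placeholder. I’m not reproducing which SILVA levels the notes’ parsing script actually keeps, so substitute those indices.

```python
"""Rough sketch: collapse a long SILVA taxonomy string to a fixed set of levels."""
from typing import List, Sequence

# HYPOTHETICAL level choice: just the first seven levels. The SILVA128 notes
# select particular levels from the longer SILVA strings; put those indices here.
KEEP_LEVELS = (0, 1, 2, 3, 4, 5, 6)


def collapse_taxonomy(tax: str, keep: Sequence[int] = KEEP_LEVELS,
                      fill: str = "") -> List[str]:
    """Return the labels at the `keep` positions, using `fill` where a level is missing."""
    levels = [level.strip() for level in tax.split(";")]
    return [levels[i] if i < len(levels) and levels[i] else fill for i in keep]


def collapse_map_line(line: str, keep: Sequence[int] = KEEP_LEVELS) -> str:
    """Rewrite one 'id<TAB>taxonomy' line with the collapsed taxonomy."""
    seq_id, tax = line.rstrip("\n").split("\t", 1)
    return seq_id + "\t" + ";".join(collapse_taxonomy(tax, keep))


if __name__ == "__main__":
    fourteen = ";".join(f"L{i}" for i in range(14))  # a fake 14-level string
    print(collapse_map_line("AB001\t" + fourteen))
    # AB001<TAB>L0;L1;L2;L3;L4;L5;L6
```

Padding short strings with `fill` keeps every line at exactly seven levels, which is what a 7-level classifier expects.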

I then extracted the reads, trained the classifier and classified the features. And it seems to work?

I’m comparing the same data against a 99% SILVA128 classifier I made, and there are some differences, mainly in the classification of archaea. But the SILVA128 classifier was made with a majority taxonomy map, so it’s not quite a like-for-like comparison.
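For a like-for-like comparison I’d need the majority map too. Here’s a hedged sketch of the general idea: given the taxonomies of all members of one 99% OTU (which you could assemble from the otu_map and the full taxonomy map), walk down the ranks and keep the most common lineage while at least `threshold` of the members agree. The 0.9 default is an illustrative assumption, not a value taken from the SILVA notes, and setting `threshold=1.0` gives a strict consensus instead.

```python
"""Rough sketch: majority / consensus taxonomy for one OTU cluster."""
from collections import Counter
from typing import List, Sequence


def majority_taxonomy(members: Sequence[Sequence[str]],
                      threshold: float = 0.9) -> List[str]:
    """Walk down the ranks, keeping the most common lineage prefix while at least
    `threshold` of ALL members share it. threshold=1.0 gives strict consensus."""
    n = len(members)
    result: List[str] = []
    if n == 0:
        return result
    for rank in range(max(len(m) for m in members)):
        # Count whole prefixes (not single labels) so a lower-rank label always
        # sits under the same parent lineage. Members lacking this rank still
        # count in the denominator n, i.e. they count as disagreeing.
        prefixes = Counter(tuple(m[: rank + 1]) for m in members if len(m) > rank)
        prefix, count = prefixes.most_common(1)[0]
        if count / n < threshold:
            break  # not enough agreement; stop at the previous rank
        result = list(prefix)
    return result


if __name__ == "__main__":
    cluster = ([["Bacteria", "Firmicutes", "Bacilli"]] * 9
               + [["Bacteria", "Firmicutes", "Clostridia"]])
    print(majority_taxonomy(cluster))       # ['Bacteria', 'Firmicutes', 'Bacilli']
    print(majority_taxonomy(cluster, 1.0))  # ['Bacteria', 'Firmicutes']
```

The toy cluster shows why the two maps can disagree: 9 of 10 members agreeing is enough for the majority version but not for the consensus one.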

The author of the SILVA128 notes and the authors of the scripts have made it possible for me to get to this stage, but even so it hasn’t been easy for an amateur like me! Now that I have my head around the principles, the task seems pretty straightforward, so perhaps a more detailed walkthrough would suffice. It’s a job that only really needs doing once, and the folks who have done it before have saved us more trouble than people like me realised (as is always the way!). All the scripts used are on GitHub, so maybe it’s just a case of turning them into a plugin?