Human oral microbiome database

RuneGronseth · March 8, 2018, 2:47pm

Hi,
One of our collaborators (a medical microbiologist) advocates for using the Human oral microbiome database. On their website various downloads are available: http://www.ehomd.org/index.php?name=seqDownload&file&type=R

I see that there are some old versions in "QIIME" format. Does anyone have advice on how to modify/train these files to be able to use them in our qiime2 run?

Thankful for any help!

Rune

Nicholas_Bokulich · March 8, 2018, 2:54pm

Hi @RuneGronseth,
QIIME1-compatible files will be QIIME2-compatible, so you can use the "QIIME" taxonomy and fasta files as they are. It looks like only an older version of the fasta files is QIIME-compatible. To make the v. 15.1 fasta QIIME2-compatible, use the v. 13.2 file in QIIME format as a model; you just need to:

remove everything from the fasta header lines except for the seq ID
make sure the sequences are not aligned. No gaps! No lowercase characters!
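
As a rough sketch of the second step, assuming a POSIX shell with awk available, something like the following would drop gap characters and uppercase the sequence lines while leaving the header lines alone. The function name clean_sequences and the file names are placeholders, and treating both - and . as gap characters is an assumption about how the aligned file marks gaps.

```shell
# Hypothetical helper: print a fasta file with gaps removed and
# sequence lines uppercased. Header lines (starting with ">") pass
# through untouched.
clean_sequences() {
  awk '
    /^>/ { print; next }            # header line: print as-is
    {
      gsub(/[-.]/, "")              # assumption: "-" and "." are the gap characters
      if (length($0) > 0)           # skip lines that were nothing but gaps
        print toupper($0)
    }
  ' "$1"
}

# Example usage (paths are placeholders):
# clean_sequences aligned.fasta > unaligned.fasta
```

It is worth spot-checking a few records of the output by eye before importing, since this does not validate that the remaining characters are legal nucleotide codes.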

Then you can import to QIIME2 and use for classifier training and taxonomy classification as described in this tutorial. That tutorial just covers the naive bayes machine learning classifier in QIIME2 — that should work great on this dataset since it's 16S, but there are other classifiers (taxonomy consensus classifiers based on BLAST+ and vsearch) in case you want to give those a try.

I hope that helps!

RuneGronseth · March 9, 2018, 3:11pm

Thank you so much! I'm not that experienced with grep commands. Would there be a simple command to erase everything after a vertical bar, up to the next line break?

Thanks!

Rune

ebolyen · March 13, 2018, 8:33pm

Hi @RuneGronseth,

Sorry for the delayed response.

There is, but in Unix nothing is simple, just always possible. The program you need is sed, and here's an invocation that should work:

 sed 's/|.*//' path/to/your/sequences.fasta > cleaned.fasta

This means: substitute (s) the first pipe on a line and everything after it (|.*) with nothing. The / characters are delimiters between the terms, which doesn't help readability either.
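
One caveat with the command above: it runs on every line, and if the headers have a space before the first pipe, that space is left at the end of the seq ID. A slightly more defensive sketch, assuming headers shaped like ">ID | description | ..." (a hypothetical layout, so check your own file), limits the substitution to header lines and also removes whitespace before the pipe. The function name strip_fasta_headers is just a placeholder.

```shell
# Hypothetical helper: on header lines only, delete everything from the
# first pipe (plus any whitespace right before it) to the end of the line.
strip_fasta_headers() {
  # /^>/ restricts the substitution to lines that start with ">"
  sed '/^>/ s/[[:space:]]*|.*$//' "$1"
}

# Example usage (paths are placeholders):
# strip_fasta_headers path/to/your/sequences.fasta > cleaned.fasta
```

Running grep '^>' cleaned.fasta | head afterwards is a quick way to confirm the headers now hold only the seq ID.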

Hope that helps!
