I see that there are some old versions in "QIIME" format. Does anyone have advice on how to modify/train these files to be able to use them in our qiime2 run?
Hi @RuneGronseth,
QIIME1-compatible files will be QIIME2-compatible, so you can use the "QIIME" taxonomy and fasta files as they are. It looks like only an older version of the fasta files is in QIIME format. To make the v. 15.1 fasta QIIME2-compatible, use the v. 13.2 files in QIIME format as a guide; you just need to:
remove everything from the fasta header lines except for the seq ID
make sure the sequences are not aligned. No gaps! No lowercase characters!
Then you can import to QIIME2 and use it for classifier training and taxonomy classification as described in this tutorial. That tutorial only covers the naive Bayes machine learning classifier in QIIME2. That should work great on this dataset since it's 16S, but there are other classifiers (taxonomy consensus classifiers based on BLAST+ and vsearch) in case you want to give those a try.
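For reference, the import and training steps look roughly like the sketch below. The file names (cleaned.fasta, taxonomy.txt, and the .qza outputs) and the function name import_and_train are placeholders, and it assumes the taxonomy file is a headerless two-column TSV. Option names have changed between QIIME2 releases, so check `qiime tools import --help` for your version before relying on this.

```shell
# Sketch only: file names and the function name are placeholders.
import_and_train() {
  seqs="$1"      # unaligned, uppercase FASTA with ID-only headers
  taxonomy="$2"  # assumed: headerless TSV of seq ID <tab> taxonomy string

  # Import the reference sequences as a QIIME2 artifact.
  qiime tools import \
    --type 'FeatureData[Sequence]' \
    --input-path "$seqs" \
    --output-path ref-seqs.qza || return 1

  # Import the matching taxonomy.
  qiime tools import \
    --type 'FeatureData[Taxonomy]' \
    --input-format HeaderlessTSVTaxonomyFormat \
    --input-path "$taxonomy" \
    --output-path ref-taxonomy.qza || return 1

  # Train the naive Bayes classifier on the imported reference.
  qiime feature-classifier fit-classifier-naive-bayes \
    --i-reference-reads ref-seqs.qza \
    --i-reference-taxonomy ref-taxonomy.qza \
    --o-classifier classifier.qza
}
```

You would then call it as `import_and_train cleaned.fasta taxonomy.txt` from the directory holding both files.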
Thank you so much! I'm not that experienced in grep commands, would there be a simple command to just erase everything after the occurrence of a vertical bar and before the next line shift?
There is, but in Unix nothing is simple, just always possible. The program you need is sed, and here's an invocation that should work:
sed 's/|.*//' path/to/your/sequences.fasta > cleaned.fasta
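which means: substitute (s) on each line, matching a pipe and everything after it (|.*), and replace the match with nothing (the empty space between the last two slashes). The / characters are delimiters between the terms, which doesn't help the readability either. Sequence lines should not contain a pipe, so only the header lines are affected.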
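If you also want to handle the other cleanup step mentioned earlier (no gaps, no lowercase) in the same pass, here is a minimal sketch using awk instead of sed. The helper name clean_fasta is made up, and it assumes gaps are written as "-" or "." and that the seq ID is everything before the first pipe on a header line; adjust if your file differs.

```shell
# Sketch only: clean_fasta is a made-up helper name; "-" and "." as gap
# characters and "|" as the header separator are assumptions about the input.
clean_fasta() {
  awk '
    # Header line: drop the first pipe and everything after it.
    /^>/ { sub(/[|].*/, ""); print; next }
    # Sequence line: strip gap characters, uppercase, skip lines left empty.
    {
      gsub(/[-.]/, "")
      if (length($0) > 0) print toupper($0)
    }
  ' "$1"
}
```

Usage would be `clean_fasta path/to/your/sequences.fasta > cleaned.fasta`. It is worth comparing a few records of the output against the original before importing.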