I am pretty new to QIIME, and to microbiome analysis as well.
I was following the 'Training Feature Classifiers' tutorial with an in-house reference database in aligned sequence format when I got this error message:
Invalid value for "--i-sequences": Expected an artifact of at least
type FeatureData[Sequence]. An artifact of type FeatureData[AlignedSequence]
was provided.
I looked into it and learned that 'qiime feature-classifier extract-reads' does not work on aligned sequences. Could you please guide me on how to proceed?
You could rerun feature-classifier extract-reads on an unaligned version of your database.
Or you could 'un-align' your reads and pass those into the plugin. (Removing all the - from the reads should accomplish this. Let me know if you would like help crafting a sed command to remove all the dashes!)
Unfortunately, I only have the aligned FASTA version of the reference (developed by a colleague), which is approx. 93 GB in size and therefore hard to manipulate. It contains both dots and dashes that I need to get rid of (screenshot attached), and I have been struggling to convert it to an unaligned version. I would be very grateful if you could help me run the necessary ‘sed’ command.
Let's start with this detailed discussion about using sed to process fasta files:
With that in mind, let's craft this command:
My initial thought was to replace all dashes with nothing: sed 's/-//g' < input_aligned.fasta > output_unaligned.fasta
But... 1) that would also strip dashes from the headers, and 2) it would not remove the dots (.) from the file.
We could follow up with sed 's/\.//g' < output_unaligned.fasta > output_unaligned_no_dots.fasta
... but that would also remove periods in the headers!
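To make the header problem concrete, here is a small sketch on a made-up two-line file. The file name demo_aligned.fasta and its contents are hypothetical, not taken from your database. It chains the two sed substitutions above in a single call:

```shell
# Hypothetical toy record: the header contains both a dash and a dot.
printf '>seq-1 E.coli\nAC-G..T\n' > demo_aligned.fasta

# Naive approach: strip dashes and dots from every line, headers included.
naive=$(sed -e 's/-//g' -e 's/\.//g' < demo_aligned.fasta)
printf '%s\n' "$naive"
# prints:
# >seq1 Ecoli
# ACGT
```

The sequence line comes out right, but the header has changed from ">seq-1 E.coli" to ">seq1 Ecoli", so the IDs would no longer match anything that refers to the original names.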
It's time to get serious and upgrade to awk.
Here's what I came up with:
awk '/^>/ {print;next} {gsub(/\.|-/,"")}1' < test.fasta > test.unaligned.fasta
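awk '...' processes your file one line at a time.
/^>/ matches lines that start with > (the FASTA headers).
{print;next} prints each matching line unchanged, then skips to the next line.
{gsub()} globally substitutes characters on the remaining (sequence) lines, like this:
/pattern/,"replacement"
/\.|-/,"" in your case: replace every . or - with nothing ("").
The trailing 1 is an always-true condition, so awk prints each sequence line after the substitution.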
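Before running this on the full 93 GB file, it may be worth trying it on a tiny sample first. Below is a minimal sketch, assuming a made-up file called demo_aligned.fasta (the records are hypothetical). It runs the same awk command, then checks that the number of headers is unchanged and that no gap characters remain in the sequence lines. The sed line at the end is an alternative I am adding for comparison, not part of the command above: the /^>/! address restricts the substitution to non-header lines.

```shell
# Hypothetical sample: two records, gaps written as '-' and '.',
# and punctuation in the headers that must survive untouched.
printf '>seq-1 E.coli\nAC-G..T\n..GG--A\n>seq-2 B.sub\n--TT.A\n' > demo_aligned.fasta

# Headers are printed as-is; dots and dashes are removed everywhere else.
awk '/^>/ {print; next} {gsub(/\.|-/, "")} 1' < demo_aligned.fasta > demo_unaligned.fasta
cat demo_unaligned.fasta

# Sanity checks: same number of headers, no gap characters left in sequences.
headers_in=$(grep -c '^>' demo_aligned.fasta)
headers_out=$(grep -c '^>' demo_unaligned.fasta)
gaps_left=$(grep -v '^>' demo_unaligned.fasta | grep -c '[.-]' || true)
echo "headers: $headers_in -> $headers_out, sequence lines with gaps: $gaps_left"

# Alternative (an addition for comparison): sed restricted to non-header lines.
sed '/^>/!s/[.-]//g' < demo_aligned.fasta > demo_unaligned_sed.fasta
cmp demo_unaligned.fasta demo_unaligned_sed.fasta && echo "awk and sed outputs match"
```

Both commands stream the file line by line, so memory use should not grow with file size, though the output still needs its own disk space. One thing to watch for: if the sequences are wrapped over several lines, a line made up only of gaps becomes an empty line in the output.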