Error in fasta sequence conversion

Asha1 · October 18, 2019, 11:26am

Hi all,

Since the Greengenes and SILVA databases have no species-level classification for many organisms, I want to add sequences from some organisms to the existing Greengenes database. I am having a problem with my FASTA file. My FASTA sequences all differ in length: some are 2.1 Mb, some are 1 Mb, and some are small. This is causing major problems in the taxonomy analysis.

I couldn't convert all of the FASTA sequences to single-line format; I could convert only a few. If I add the sequences to the existing Greengenes database without converting them to single lines, I get an error, mainly in the classifier step.

I don't get an error when I run only the existing Greengenes database (gg_13_8_otus). When I add my target sequences to it, I can't run my command; an error occurs in the classifier step.

I used two different command lines to convert my FASTA sequences to single lines:
awk '/^>/ {printf("\n%s\n",$0);next;}{printf("%s",$0);} END {printf("\n");}'< 99_otus.fasta > 99sp.fasta
awk '{if(NR==1) {print $0} else {if($0 ~ /^>/) {print "\n"$0} else {printf $0}}}' nocardia.fasta > snocardia.fasta
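
For comparison, here is a minimal sketch of another way to join wrapped sequence lines. It rests on an assumption the thread does not confirm: that some of the input files have Windows-style (CRLF) line endings, which can make a linearized file look as if it is still broken. The file names demo_wrapped.fasta and demo_single.fasta are made up for the example.

```shell
# Hypothetical demo input: two records, wrapped over several lines, CRLF endings.
printf '>seq1 demo\r\nACGT\r\nTTGA\r\n>seq2 demo\r\nGGCC\r\n' > demo_wrapped.fasta

# Strip a trailing carriage return from every line (assumed cause, not confirmed),
# print each header on its own line, and join the sequence lines of each record.
awk '{ sub(/\r$/, "") }
     /^>/ { if (seq != "") print seq; print; seq = ""; next }
     { seq = seq $0 }
     END { if (seq != "") print seq }' demo_wrapped.fasta > demo_single.fasta

cat demo_single.fasta
```

Unlike the first command above, this version does not write an empty line before the first header, and the last sequence always ends with a newline.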

I also tried converting a single organism's sequence to single-line FASTA, but for some of my target organism genomes the conversion still did not work.

Is there any command that will convert huge FASTA sequences to single lines so they can be added to the existing Greengenes database?

Thank you in advance for your support and help.

Nicholas_Bokulich · October 18, 2019, 4:18pm

So is the issue that those awk commands convert your FASTA seqs to single lines, i.e., the seq ID and sequence are separated by a tab instead of a line break? If that's the case, just use tr "\t" "\n" < old.fasta > new.fasta

But please let me know if that's not what you're after.
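
To illustrate what the tr command does, here is a small sketch on a made-up file in which each seq ID and its sequence share one line, separated by a tab. The file names demo_tabbed.fasta and demo_fixed.fasta are hypothetical, and the tab-separated layout is the assumption stated in the reply, not something confirmed about the original files.

```shell
# Hypothetical input: header and sequence on one line, separated by a tab.
printf '>seq1\tACGTTTGA\n>seq2\tGGCC\n' > demo_tabbed.fasta

# Replace every tab with a newline so each header and sequence get their own line.
tr "\t" "\n" < demo_tabbed.fasta > demo_fixed.fasta

cat demo_fixed.fasta
```

Note that tr replaces every tab in the file, so a header that itself contains a tab would also be split across two lines.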

Asha1 · October 19, 2019, 11:07am

Thank you so much for your suggestion, sir. I will try this command and let you know the outcome.