Hi all,
Since the Greengenes and SILVA databases have no species-level classification for many organisms, I want to add sequences from some of those organisms to the existing Greengenes database. I am having a problem with my FASTA file: the sequences differ greatly in length. Some are 2.1 Mb, some are 1 Mb, and others are small. This is causing major problems in the taxonomy analysis.
I could not convert all of the FASTA sequences to single-line format; the conversion worked for only a few of them. If I add the sequences to the existing Greengenes database without converting them to single-line format, I get an error, mainly at the classifier step.
I do not get any error when I run with only the existing Greengenes database (gg_13_8_otus). When I add my target sequences to it, the command fails with an error at the classifier step.
I used two different command lines to convert my FASTA sequences to single-line format:
awk '/^>/ {printf("\n%s\n",$0);next;}{printf("%s",$0);} END {printf("\n");}'< 99_otus.fasta > 99sp.fasta
awk '{if(NR==1) {print $0} else {if($0 ~ /^>/) {print "\n"$0} else {printf $0}}}' nocardia.fasta > snocardia.fasta
I also tried converting one organism's sequence at a time, but for some of my target organism genomes I still could not get a single-line FASTA sequence.
Is there a command that will convert very large FASTA sequences to single-line format so that I can add them to the existing Greengenes database?
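To make clear what output I am after, here is a minimal Python sketch of the conversion. The function name linearize_fasta and the command-line usage are only my own illustration, not part of any existing tool. It assumes the input is plain-text FASTA in which every header line starts with ">". It reads the file line by line, so the length of a record should not matter, and it strips Windows-style line endings and blank lines in case they are present.

```python
import sys
from typing import Iterable, Iterator


def linearize_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield each header line followed by its sequence joined onto one line.

    Hypothetical helper for illustration. Any sequence text that appears
    before the first ">" header is ignored.
    """
    chunks = []          # sequence fragments of the current record
    have_header = False  # becomes True once the first ">" line is seen
    for raw in lines:
        line = raw.strip()  # drops "\n", "\r\n" and surrounding spaces
        if not line:
            continue        # skip blank lines
        if line.startswith(">"):
            if have_header:
                yield "".join(chunks)  # finish the previous record
            yield line
            chunks = []
            have_header = True
        else:
            chunks.append(line)
    if have_header:
        yield "".join(chunks)          # finish the last record


# Assumed usage: python linearize.py input.fasta output.fasta
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        for out in linearize_fasta(src):
            dst.write(out + "\n")
```

With this sketch, every record in the output file would be exactly two lines: the header and the full sequence.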
Thank you in advance for your support and help.