Hi! Thank you for pointing out the issue.
I found that this may be caused by the download and extraction steps.
These are the steps I used before:
- Download the NCBI sample files with prefetch.
The download command generally looks like this:
prefetch SRA_ID --location NCBI
- After downloading the .sralite file (a kind of compressed file), I used fastq-dump to extract it:
fastq-dump --split-3 SRA.sralite
- Finally, two paired-end files or one single-end file are generated. The content looks like this:
@SRR19603331.1 1 length=251
ACTCCTACGGGAGGCAGCAGTAGGGAATCTTCCACAATGGACGCAAGTCTGATGGAGCAACGCCGCGTGAGTGAAGAAGGTTTTCGGATCGTAAAGCTCTGTTGTTGGTGAAGAAGGATAGAGGTAGTAACTGGCCTTTATTTGACGGTAATCAACCAGAAAGTCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGATTTATTGGGCGTCACGTGAGAGCAGGCGG
+SRR19603331.1 1 length=251
???????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????
@SRR19603331.2 2 length=251
ACTCCTACGGGAGGCAGCAGTAGGGAATCTTCCACAATGGACGCAAGTCTGATGGAGCAACGCCGCGTGAGTGAAGAAGGTTTTCGGATCGTAAAGCTCTGTTGTTGGTGAAGAAGGATAGAGGTAGTAACTGGCCTTTATTTGACGGTAATCAACCAGAAAGTCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGATTTATTGGGCGTAAAGTGAGCGCAGGCGG
+SRR19603331.2 2 length=251
???????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????
@SRR19603331.3 3 length=251
ACTCCTACGGGAGGCAGCAGTAGGGAATCTTCCACAATGGACGCAAGTCTGATGGAGCAACGCCGCGTGAGTGAATAAGGTTTTCGGATCGTAAAGCTCTGTTGTTGGTGAAGAAGGATAGAGGTAGTAAATGGCCTTTATTTGAAGGTAATCAACCAGAAAGTCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGATGTATTGGGCGTAAAGCGAGCGCAGGCGG
+SRR19603331.3 3 length=251
???????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????
We can conclude that this file is not right, because the quality lines contain nothing but '?' characters.
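A quick way to confirm this without scrolling through the file is to count how many distinct characters appear in the quality lines. The sketch below is only an illustration: flat_quality_check is a hypothetical helper name, it assumes the usual four-line FASTQ record layout, and it only looks at the first 1000 records by default. In the common Phred+33 encoding, '?' corresponds to a quality score of 30, so a file that reports "flat" has the same score for every base. As far as I know this is what the SRA Lite format does (it stores simplified quality scores), which would match the output above, but please treat that as an assumption and check the NCBI documentation.

```shell
# Hypothetical helper: report whether the quality lines of a FASTQ file
# use a single character ("flat") or several ("varied").
# $1 = FASTQ file, $2 = number of records to inspect (default 1000).
flat_quality_check() {
    # Each FASTQ record is assumed to span exactly 4 lines.
    head -n "$(( ${2:-1000} * 4 ))" "$1" | awk '
        NR % 4 == 0 {                       # 4th line of each record = qualities
            n = length($0)
            for (i = 1; i <= n; i++) seen[substr($0, i, 1)] = 1
        }
        END {
            count = 0
            for (c in seen) count++         # number of distinct quality characters
            if (count == 0)      print "empty"
            else if (count == 1) print "flat"
            else                 print "varied"
        }
    '
}
```

For example, `flat_quality_check reads.fastq` (reads.fastq is a placeholder file name) prints "flat" when every inspected quality line is built from one character, as in the output above, and "varied" when real per-base qualities are present.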
However, when I use wget to download the SRA file, the result is very different.
- I opened the page for one of the runs, copied the AWS URL, and used wget to download the file:
wget -b -c https://sra-pub-run-odp.s3.amazonaws.com/sra/SRR19603331/SRR19603331
- Then I used parallel-fastq-dump to generate FASTQ files from the downloaded file:
parallel-fastq-dump -t 20 -O ./ --split-3 -s SRR19603331
- Checking the FASTQ files gives the result shown below:
@SRR19603331.1 1 length=251
ACTCCTACGGGAGGCAGCAGTAGGGAATCTTCCACAATGGACGCAAGTCTGATGGAGCAACGCCGCGTGAGTGAAGAAGGTTTTCGGATCGTAAAGCTCTGTTGTTGGTGAAGAAGGATAGAGGTAGTAACTGGCCTTTATTTGACGGTAATCAACCAGAAAGTCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGATTTATTGGGCGTCACGTGAGAGCAGGCGG
+SRR19603331.1 1 length=251
FFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FF,FFFFFFFFFF:FFFFFFFFFF:FFFFFFF:FF:FFFFFFFFFFFFF:FFFFFFF,F::FF:F:FFF,FF:FFFFFFF:F:,FFFFFFF::F:FF:F:FFFFF:FFF,FFFFFFFFFFFFFFFFF:FFF::FFF::FFFFFF:FFFF:FFF:FFF::FFFFF:,F,FFFFFFF::F,F:,FFFF,FF:F,FFFFF,FF,FF:F:
@SRR19603331.2 2 length=251
ACTCCTACGGGAGGCAGCAGTAGGGAATCTTCCACAATGGACGCAAGTCTGATGGAGCAACGCCGCGTGAGTGAAGAAGGTTTTCGGATCGTAAAGCTCTGTTGTTGGTGAAGAAGGATAGAGGTAGTAACTGGCCTTTATTTGACGGTAATCAACCAGAAAGTCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGATTTATTGGGCGTAAAGTGAGCGCAGGCGG
+SRR19603331.2 2 length=251
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFF:FFFFFFFFFFF:F:FFF:FFFFF:FFFF
@SRR19603331.3 3 length=251
ACTCCTACGGGAGGCAGCAGTAGGGAATCTTCCACAATGGACGCAAGTCTGATGGAGCAACGCCGCGTGAGTGAATAAGGTTTTCGGATCGTAAAGCTCTGTTGTTGGTGAAGAAGGATAGAGGTAGTAAATGGCCTTTATTTGAAGGTAATCAACCAGAAAGTCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGATGTATTGGGCGTAAAGCGAGCGCAGGCGG
+SRR19603331.3 3 length=251
FFFFF,FF::F:::FFFFFFFFFF:FF:FFFF,:FFFFFFFFFFFF,,FFFFFFFFFF,FFFFF,FF,:FF:FFF,,::FFFFFF,,FFFFFF:FFF,:,FF,FF,FFFFFFF:F,FFFFFFFFFF:F:F,,F:F::F,F,F:,F,FF:,FF:FFF,FFF,F,FFF:,,F,FFFF,F,FF:FFF:FF:FFFF:FF,F,,F,F,FF:::F,FFFF,FFFF:,,F,F,F:F:,F::F:,,:,,F:,FFF,FF,
This file is correct! In a nutshell, the problem seems to be caused by prefetch.
Thank you for your patience and kindness! I will use this approach to generate a feature table and do species annotation.