Problems importing sequences from SRA

Dear QIIME 2 users, I'm trying to import paired-end sequences downloaded from SRA, but I have problems with the command. For example:
SRA files (one per sample), e.g. experiment SRX2730405: SRX2730405.txt (3.3 KB)
Manifest file: Map-imput_Narish2017.csv (164 Bytes)
Command:
qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path Map-imput_Naris2017.csv \
  --output-path Narish2017-demux.qza \
  --input-format PairedEndFastqManifestPhred64

I already tried several forms of --type and --input-format, but did not identify the problem. Please help me identify and correct the error.
Also, I would appreciate any recommendation on how to download data from SRA (I just identified and downloaded each experiment as .fastq, is that correct?)

Hi @Dasiel,

I suspect the issue is in the manifest file. I don’t see a file extension listed as part of the path. I would expect those paths to end in .fastq (or maybe .fastq.gz). If you run the following command, what do you see?

ls /mnt/c/Dowload_linux/set-sequence/

Otherwise, could you post what error you are seeing specifically?

Thanks!

Evan, thank you very much for your help.
Yes, you're right, I must include .fastq at the end of the file path, i.e.

sample-id,absolute-filepath,direction
rcbc1,/mnt/c/Dowload_linux/set-sequence/SRX2730405_1.fastq,forward
rcbc1,/mnt/c/Dowload_linux/set-sequence/SRX2730405_2.Fastq,reverse

But it still doesn't work. Here are the ls output and the error output.

I see that the R1 and R2 sequences are in the same file (SRA run). Is it necessary to separate them in order to import them?
I was reading about FASTQ de-interlacer for paired-end reads (on Galaxy). What do you suggest?
Cheers

Hey there @Dasiel! It looks like there might be an issue related to the manifest file itself. I noticed you are running an older version of QIIME 2 (2018.8). If you upgrade to 2018.11 you will see a more detailed error message about the manifest file that can hopefully set you in the right direction.

One thing I notice is that the filename in your manifest says "SRX2730405_1.fastq", but the closest filename in the dir is called "SRX2730405.fastq" (note the missing _1 at the end).
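
A quick way to catch this kind of mismatch is to check every path in the manifest against what is actually on disk. Below is a minimal Python sketch, not part of QIIME 2. The function name `missing_manifest_files` is made up for illustration, and it assumes a comma-separated manifest with the `sample-id,absolute-filepath,direction` header used earlier in this thread.

```python
import csv
import os


def missing_manifest_files(manifest_path):
    """Return (sample-id, path) pairs whose file does not exist on disk."""
    missing = []
    with open(manifest_path, newline="") as handle:
        # The header row (sample-id,absolute-filepath,direction) names the columns.
        for row in csv.DictReader(handle):
            path = row["absolute-filepath"]
            if not os.path.isfile(path):
                missing.append((row["sample-id"], path))
    return missing


# Example call (adjust the manifest name to your own file):
#   for sample_id, path in missing_manifest_files("manifest.csv"):
#       print(f"{sample_id}: file not found: {path}")
```

An empty list means every file named in the manifest was found. It says nothing about whether the contents are valid FASTQ.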

Are they interlaced, or pre-joined?
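
If you are not sure, one rough way to tell is to look at the read names: in an interlaced file, records 1 and 2, 3 and 4, and so on should carry the same name. The Python sketch below is only a heuristic under some assumptions: the file is uncompressed, every record is exactly four lines, and mates share the first header token apart from an optional `/1` or `/2` suffix. The names `read_name` and `looks_interlaced` are hypothetical, and other header styles would need a different rule.

```python
import re
from itertools import islice


def read_name(header):
    """Return the first header token without '@' or a trailing /1 or /2."""
    parts = header[1:].split()
    token = parts[0] if parts else ""
    return re.sub(r"/[12]$", "", token)


def looks_interlaced(fastq_path, max_pairs=1000):
    """Return True if the first records pair up by name (1 with 2, 3 with 4, ...)."""
    with open(fastq_path) as handle:
        # Every fourth line, starting with the first, is a record header.
        lines = islice(handle, 8 * max_pairs)
        headers = [line for i, line in enumerate(lines) if i % 4 == 0]
    if len(headers) < 2 or len(headers) % 2:
        return False
    return all(
        read_name(first) == read_name(second)
        for first, second in zip(headers[0::2], headers[1::2])
    )
```

A `False` result does not distinguish pre-joined reads from single-end reads; it only says that consecutive records do not look like mates.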

Hello Matthew, thanks for your observation. Normally I work on the lab server, and we have the current version, but I will also update my laptop, thanks.

The sequences are interlaced. What do you recommend?

Unfortunately we don’t have a mechanism in QIIME 2 for dealing with interlaced reads — if you are able to deinterlace using an external tool (see this link for a suggestion) you could then import those deinterlaced reads using the manifest format you started working with above. Sorry!
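
If you would rather script it yourself, deinterlacing is just a matter of sending alternating records to two files. Here is a minimal Python sketch of that idea, not a QIIME 2 feature and not the Galaxy tool. It assumes an uncompressed FASTQ with exactly four lines per record, no trailing blank lines, and strictly alternating forward/reverse reads. The function names are made up for illustration.

```python
from itertools import islice


def read_fastq(handle):
    """Yield FASTQ records as lists of four lines (header, sequence, '+', quality)."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def deinterlace(interleaved_path, forward_path, reverse_path):
    """Write alternating records to forward/reverse files; return the pair count."""
    pairs = 0
    with open(interleaved_path) as src, \
            open(forward_path, "w") as fwd, \
            open(reverse_path, "w") as rev:
        records = read_fastq(src)
        for forward in records:
            # The record right after each forward read is taken as its mate.
            reverse = next(records, None)
            if reverse is None:
                raise ValueError("odd number of records; file is not fully interlaced")
            fwd.writelines(forward)
            rev.writelines(reverse)
            pairs += 1
    return pairs
```

Writing the two outputs as, for example, `SRX2730405_1.fastq` and `SRX2730405_2.fastq` would match the paths already listed in the manifest above.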

Great, thanks to this exchange I now understand the problem! Interestingly, not all the samples (runs) in this set are interlaced, so I'm checking them individually. I'm using FASTQ de-interlacer (Galaxy version). Thanks for your important support to the users of QIIME 2. Cheers

