data import type and directory format

Hi, I am new to working with microbial data and have been searching through the qiime forum for answers, but I haven't found anything that fits my exact problem.

I have been working to import data and process it in QIIME 2 2020.2. The data is from a paper that uploaded sequences to NCBI SRA. I downloaded 22 files (of 239), which were part of one run, from NCBI into a directory. The general file name format is EERfileID.fastq.gz. I have them in my directory but I cannot import them into qiime2 without errors.

This is the code I have run:
qiime tools import \
 --type PairedEndSequences \
 --input-path import_devries \
 --output-path paired_devries.qza

but I keep getting this error:

The main thing confusing me is the "Semantic type PairedEndSequences does not have a compatible directory format" error. I don't know whether the data I have is in the correct format. Based on the importing, Atacama, and Moving Pictures tutorials, fastq.gz files are fine to import into qiime. Is there another format the files should be in?

The data should be paired end reads but I cannot tell R1 from R2 and don't know if I need to specify this or if doing so would remove the error. If I need to import all 239 files from NCBI I can but am not sure if that will fix the directory format.

If anyone has advice regarding this/importing from NCBI, I'd appreciate the help!
Thanks!

Hi @cal1037,

I think you’re missing information about the semantic type. PairedEndSequences on its own isn’t a semantic type you can import; you need SampleData[PairedEndSequencesWithQuality].

Have you taken a look at the data importing tutorial? I find it really helpful.

Best,
Justine

Thanks for the quick response and help @jwdebelius!

I did check out the data importing tutorial. I had tried running the code posted before with --type EERPairedEndSequences but that didn’t work. Do I add the quality score of 38 to the end (EERPairedEndSequences38)? I wasn’t sure where in the importing tutorial to find semantic types that work for data from NCBI.

Based on my understanding of the info on NCBI, the samples I’ve imported have two reads per spot in each file, so both the forward and reverse reads are together. I am not sure how to separate the read spots. Is that necessary to do before putting data into qiime?

Really appreciate the help!

Hi @cal1037,

Have you tried the --show-importable-types and --show-importable-formats flags to see whether your format/type is one that has been described?

My assumption is that you have demultiplexed Phred33 data that doesn’t follow the Casava convention, but I don’t download from NCBI all that often.

Best,
Justine
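
For reference, in the 2020.2 release used in this thread, both flags can be run on their own to print what `qiime tools import` accepts. This is only a minimal sketch, assuming the QIIME 2 environment is already activated; the `grep` filters and the `qiime_available` variable are illustrative additions, not part of QIIME 2.

```shell
# Sketch: list what `qiime tools import` accepts (QIIME 2 2020.2 flags).
# Assumption: the QIIME 2 conda environment has already been activated.
if command -v qiime >/dev/null 2>&1; then
    # Semantic types that can be imported, narrowed to the SampleData family.
    qiime tools import --show-importable-types | grep 'SampleData'
    # Input formats that can be imported, narrowed to the manifest formats.
    qiime tools import --show-importable-formats | grep 'Manifest'
    qiime_available=yes
else
    echo "qiime not found on PATH; activate the QIIME 2 environment first" >&2
    qiime_available=no
fi
```

Whatever is passed to --type and --input-format has to appear in these two lists, spelled exactly as printed.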

Hi @jwdebelius,

I did try a variety of import types but I have not found one that works with the files I have. In reading through the forum and tutorials I think I need to use the manifest option to import data but I am having trouble understanding what to include in a .txt or .tsv file. I will keep looking through the forum but if you have any advice as to how to format the document or what needs to be in it, I’d appreciate it! I think it is different from the metadata, but in the import tutorial/page it notes some metadata info.

Thanks,
Christina

Hi @cal1037,

Yes, my assumption would be a manifest. You can prepare this in Excel like a regular table with three columns: sample-id, forward-absolute-filepath and reverse-absolute-filepath. In each row, you map whatever name you want to call the sample to its forward file (probably contains an R1) and its reverse file (same name with an R2). You need to use the absolute path, which you can probably get by adding $PWD to your path from the folder where you build your manifest.

Once you have constructed it in Excel, go to File > Save As and select “text” from the drop-down menu. This will give you a tab-separated manifest.

I like to keep mine separate from my metadata because I find it easier to troubleshoot that way, but you need to make sure the sample IDs in the file line up with whatever IDs are in your metadata.

Best,
Justine
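
As an alternative to building the table by hand, the same three-column manifest could be generated from the shell. The sketch below is one possible approach, not the tutorial's method. It assumes forward files end in `_R1_001.fastq.gz` and reverse files carry the same name with `_R2_001.fastq.gz`; the function name `build_manifest` and the choice of sample ID (the file name without that suffix) are hypothetical.

```shell
# Sketch: write a paired-end manifest (tab-separated) for a folder of reads.
# Assumption: forward files end in _R1_001.fastq.gz and each reverse file has
# the same name with _R2_001.fastq.gz; change the suffixes for other schemes.
build_manifest() {
    reads_dir="$(cd "$1" && pwd)" || return 1   # make the folder path absolute
    manifest="$2"

    # Header with the three columns of the paired-end manifest.
    printf 'sample-id\tforward-absolute-filepath\treverse-absolute-filepath\n' > "$manifest"

    for fwd in "$reads_dir"/*_R1_001.fastq.gz; do
        [ -e "$fwd" ] || continue                       # the glob matched nothing
        rev="${fwd%_R1_001.fastq.gz}_R2_001.fastq.gz"   # same name with R2
        if [ ! -e "$rev" ]; then
            echo "warning: no reverse file for $fwd" >&2
            continue
        fi
        # Hypothetical choice: the file name without the suffix is the sample ID.
        sample_id="$(basename "$fwd" _R1_001.fastq.gz)"
        printf '%s\t%s\t%s\n' "$sample_id" "$fwd" "$rev" >> "$manifest"
    done
}
```

Calling `build_manifest import_devries manifest.tsv` would then write one row per R1/R2 pair found in that folder and skip, with a warning, any forward file that has no partner.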

Hi @jwdebelius,

I created a manifest file with sample-id and absolute-filepath columns. I think I am using the correct import type and format because I don't get an error about that, but I am now getting this other error (attached).

I don't know what it means by 'No transformation'. Does this mean there is a problem with my code/the absolute file path or is there a problem with the format my data is in or something else?

I really appreciate the help, thanks so much!
Christina

Hi @cal1037,

You are using SingleEndFastqManifestPhred33V2 as your format, but your type is SampleData[PairedEndSequencesWithQuality]. If you have single end data, then you need to use SampleData[SequencesWithQuality].

Best,
Justine
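
To make the pairing concrete: the type and the input format have to agree on single end versus paired end. A sketch of the two consistent combinations named in this thread follows. The wrapper function names and the positional arguments (manifest path, output path) are hypothetical; the types and formats are the ones quoted above.

```shell
# Sketch: the two consistent type/format pairings, wrapped in functions.
# $1 = path to the manifest file, $2 = path of the .qza artifact to create.

# Single-end reads: single-end type with a single-end manifest format.
import_single_end() {
    qiime tools import \
        --type 'SampleData[SequencesWithQuality]' \
        --input-path "$1" \
        --output-path "$2" \
        --input-format SingleEndFastqManifestPhred33V2
}

# Paired-end reads: paired-end type with a paired-end manifest format.
import_paired_end() {
    qiime tools import \
        --type 'SampleData[PairedEndSequencesWithQuality]' \
        --input-path "$1" \
        --output-path "$2" \
        --input-format PairedEndFastqManifestPhred33V2
}
```

Mixing the two, for example a paired-end type with a single-end manifest format, is the kind of mismatch described in the post above.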

Hi @jwdebelius,

Thank you so much for the fast replies! I have paired end sequences so I fixed my input code.
code:
qiime tools import \ --type 'SampleData[PairedEndSequencesWithQuality]' \ --input-path /mnt/home/ernakovich/cal1037/devries_files/manifest_file_devries_4.txt \ --output-path /mnt/home/ernakovich/cal1037/devries_files/devries-demux.qza \ --input-format PairedEndFastqManifestPhred33V2

I have run the code on one line and on multiple lines, but either way I keep getting this error

Do you see something wrong with my code which is leading to this error?

Hi @cal1037,

There’s something wrong with either the spacing or the quotes. I think it’s probably that you’re using a different quote character (such as ‘ or ’) instead of the straight ' quote. So, maybe try

qiime tools import \
 --type 'SampleData[PairedEndSequencesWithQuality]' \
 --input-path /mnt/home/ernakovich/cal1037/devries_files/manifest_file_devries_4.txt \
 --output-path /mnt/home/ernakovich/cal1037/devries_files/devries-demux.qza \
 --input-format PairedEndFastqManifestPhred33V2

If that doesn’t work, could you post a picture of the command in your terminal?

Best,
Justine

Hi @jwdebelius,

I am still getting this error: [image]

Here is the code I am using: [image]
It doesn't work when it is on different lines, which is why I have it all on one line.

When I run the exact code you sent I get this error: [image]

When the only change I make to your code is putting it onto one line I get this error: [image]

I am not sure what is wrong with the code I am typing or why on the code you sent it doesn't understand the type, output path, input path etc.

I really appreciate the help! Thank you so much for taking the time to work with me!
Best,
Christina

Hi @cal1037,
You don’t need to type "\" when you enter a command on one line; it only marks where the command wraps onto a new line (where you would press "Enter").

Good luck

Hi @cal1037,

As @Iris says, you only need the “\” if you’re wrapping the command over multiple lines. So, you can either try without it or copy the command line by line.

Best,
Justine
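
The difference can be seen without running qiime at all. In the sketch below, `show_args` is a hypothetical helper that prints each argument it receives in brackets, so it shows what a command actually gets in each case.

```shell
# Hypothetical helper: print every argument on its own line, in brackets.
show_args() { printf '[%s]\n' "$@"; }

# Multi-line form: each backslash is the very last character on its line,
# so the shell joins the lines into a single command.
wrapped="$(show_args --type \
    --input-path)"

# One-line form: no backslashes are needed.
one_line="$(show_args --type --input-path)"

# A backslash followed by a space does not continue anything. It escapes the
# space, so the space is passed along as part of an argument. At the end of a
# line (backslash, then a trailing space) the command also stops right there.
stray="$(show_args --type \ --input-path)"

echo "$wrapped"   # same two arguments as the one-line form
echo "$stray"     # second argument now starts with a space
```

With the stray backslash, the second argument is " --input-path" with a leading space, which a command-line tool will not recognise as an option.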

Thanks for the help @Iris and @jwdebelius.

I am still having trouble importing the data. I thought it was paired-end data so I was using [image] but it appears this is not correct... I am confused about the error, though, because my manifest file does have the header absolute-filepath, as seen here (the top few lines of my .txt file): [image]

I am not sure if I have done something wrong setting up my manifest file, through reading on the forum/tutorials I thought I set it up correctly. Do you all see a problem with how it is put together?

I know my data is paired end (according to NCBI where I downloaded the data from) but the ...Phred64V2 also does not work for importing the data.

Is there another data type/format that my data might fit under?

I really appreciate the help!
Christina

Hi @cal1037,

If you have paired end data, you need to use the paired end manifest format. You have a single end manifest format. Please go back to my previous post or the tutorial on paired end sequences.

Best,
Justine

Hi @jwdebelius,

Looking at other posts and the errors I get, I am not sure how the setup of my manifest file is incorrect. If I am understanding the error correctly, it wants me to use the "absolute-filepath" column, which I have. Am I misinterpreting the error (below)? [image]

Based on the file run code I don't know how to tell the R1 from R2 (forward from reverse), does this mean it is not possible to import the data? This is a link of one of the runs/sequences: Run Browser : Browse : Sequence Read Archive : NCBI/NLM/NIH
Am I using the wrong link to get the data?

Thank you for the help!
Christina

Hi @cal1037,

You’ve got the error flipped. It says that you have a column called absolute-filepath, and it’s looking for two columns called forward-absolute-filepath and reverse-absolute-filepath.

Again, this is outlined pretty clearly in the manifest tutorial. Please read that closely.

Best,
Justine
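
A quick way to check which kind of manifest a file is, before handing it to qiime, is to look at its header line. The sketch below is illustrative only: the helper names `has_column` and `manifest_kind` are hypothetical, and it assumes the manifest is tab-separated with the column names given in this thread.

```shell
# True if the tab-separated header (first line) of file $1 has a column named $2.
has_column() {
    head -n 1 "$1" | tr -d '\r' | tr '\t' '\n' | grep -qxF -- "$2"
}

# Print "paired", "single", or "unknown" depending on the header columns.
manifest_kind() {
    if has_column "$1" forward-absolute-filepath &&
       has_column "$1" reverse-absolute-filepath; then
        echo paired     # matches the paired-end manifest layout
    elif has_column "$1" absolute-filepath; then
        echo single     # matches the single-end manifest layout
    else
        echo unknown    # wrong column names, or not tab-separated
    fi
}
```

A file that reports `single` would need its absolute-filepath column replaced by the forward and reverse columns before it could be used with a paired-end manifest format.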

Hi @jwdebelius,

So I think my problem is that I have paired-end reads (according to all the info on NCBI, where I am getting the samples from) but I only have one link for the forward/reverse sequences. For example, this is one of the runs I have taken from NCBI: ERR2654632, which was imported as a fastq file. This has been the only run style I’ve been able to upload to my directory.

I have been unable to figure out how to get these links (which appear to be forward/reverse reads) http://ftp.sra.ebi.ac.uk/vol1/run/ERR265/ERR2654632/H11C_S287_L001_R1_001.fastq.gz and http://ftp.sra.ebi.ac.uk/vol1/run/ERR265/ERR2654632/H11C_S287_L001_R2_001.fastq.gz to import into my directory/work with the manifest format.

Is there some way I should be renaming the files to make them be in the correct format of R1 and R2? Do I need to have these files in my directory before I import them to qiime or can qiime import them directly from NCBI?

Thanks,
Christina

Hi @cal1037,

Right now, the files do need to be in your directory. So, you need to download them first.

Best,
Justine
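
One way to get both files of a pair into the working directory is to fetch them by URL. This is a hedged sketch: it assumes `curl` is installed (`wget` would work just as well) and that the forward and reverse URLs differ only in R1 versus R2, as in the two links quoted earlier in this thread. The function names are hypothetical.

```shell
# Derive the reverse-read URL from the forward-read URL.
# Assumption: the two URLs differ only in _R1_ versus _R2_.
reverse_url() {
    printf '%s\n' "$1" | sed 's/_R1_001\.fastq\.gz$/_R2_001.fastq.gz/'
}

# Download both files of a pair into the current directory.
# curl flags: -f fail on HTTP errors, -L follow redirects,
# -O save under the remote file name.
download_pair() {
    fwd_url="$1"
    rev_url="$(reverse_url "$fwd_url")"
    curl -fLO "$fwd_url" && curl -fLO "$rev_url"
}

# Example (not run here), using the forward URL quoted in this thread:
# download_pair "http://ftp.sra.ebi.ac.uk/vol1/run/ERR265/ERR2654632/H11C_S287_L001_R1_001.fastq.gz"
```

Once the R1 and R2 files are on disk, their absolute paths can go into the forward and reverse columns of the manifest.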

Hi @jwdebelius,

I reimported my data so I have the R1 and R2 (forward and reverse reads). I am now working on importing these data to qiime. Could you let me know if you see something wrong with my code or manifest file?

This image shows my code and error:

This image shows my manifest file.

Thanks for the help!
Christina