Hi, I am new to working with microbial data and have been searching the QIIME 2 forum for answers, but I haven't found anything that fits my exact problem.
I have been trying to import data and process it in QIIME 2 2020.2. The data are from a paper whose authors uploaded the sequences to the NCBI SRA. I downloaded 22 of the 239 files, all part of one run, into a directory. The files are named in the format ERRfileID.fastq.gz. They are in my directory, but I cannot import them into QIIME 2 without errors.
This is the code I have run:
qiime tools import \
  --type PairedEndSequences \
  --input-path import_devries \
  --output-path paired_devries.qza
The main thing confusing me is the "Semantic type PairedEndSequences does not have a compatible directory format" error. I don't know whether my data are in the format they should be in. Based on the importing, Atacama soil, and Moving Pictures tutorials, .fastq.gz files should be fine to import into QIIME 2. Is there another format the files should be in?
The data should be paired-end reads, but I cannot tell R1 from R2, and I don't know whether I need to specify this or whether doing so would remove the error. If I need to import all 239 files from NCBI I can, but I'm not sure that will fix the directory format.
If anyone has advice about this or about importing from NCBI, I'd appreciate the help!
Thanks!
I think you're missing information about the semantic type. PairedEndSequences isn't a semantic type; you need SampleData[PairedEndSequencesWithQuality].
Have you taken a look at the data importing tutorial? I find it really helpful.
Thanks for the quick response and help @jwdebelius!
I did check out the data importing tutorial. I had tried running the command posted above with --type EERPairedEndSequences, but that didn't work. Do I add the quality score of 38 to the end (EERPairedEndSequences38)? I wasn't sure where in the importing tutorial to find semantic types that work for data from NCBI.
Based on my understanding of the information on NCBI, the samples I've downloaded have two reads per spot in each file, so the forward and reverse reads are together. I'm not sure how to separate the reads in each spot. Is that necessary before putting the data into QIIME 2?
I did try a variety of import types, but I haven't found one that works with my files. From reading the forum and tutorials, I think I need to use the manifest option to import the data, but I'm having trouble understanding what to include in the .txt or .tsv file. I will keep looking through the forum, but if you have any advice on how to format the file or what needs to be in it, I'd appreciate it! I think it is different from the metadata, but the importing tutorial mentions some metadata information.
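If "two reads per spot in one file" means the file is interleaved (each forward record immediately followed by its reverse record), the reads can be split into separate R1 and R2 files before building a manifest. The sketch below assumes exactly that layout, 4-line FASTQ records, and gzipped input; the function name `deinterleave` and the `_R1`/`_R2` output names are hypothetical choices. If the reads were instead concatenated into one long read per spot, this will not work, and re-downloading with sra-tools `fasterq-dump --split-files` (which writes separate `_1.fastq` and `_2.fastq` files) is the cleaner route.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: split an interleaved, gzipped FASTQ into R1/R2 files.
# Assumes 4-line records where record 1 is forward, record 2 is reverse,
# record 3 is forward, and so on.
deinterleave() {
  local in="$1" prefix="$2"
  gzip -dc "$in" | awk -v r1="${prefix}_R1.fastq" -v r2="${prefix}_R2.fastq" '
    {
      # lines 1-4 -> R1, lines 5-8 -> R2, lines 9-12 -> R1, ...
      out = (int((NR - 1) / 4) % 2 == 0) ? r1 : r2
      print > out
    }'
  # compress the two outputs to *_R1.fastq.gz and *_R2.fastq.gz
  gzip -f "${prefix}_R1.fastq" "${prefix}_R2.fastq"
}
```

For example, `deinterleave ERR2654632.fastq.gz ERR2654632` would produce `ERR2654632_R1.fastq.gz` and `ERR2654632_R2.fastq.gz`, but only if that file really is interleaved; inspecting the first eight lines with `gzip -dc FILE | head -n 8` is a quick way to check.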
Yes, my assumption would be a manifest. You can prepare it in Excel like a regular table with three columns: sample-id, forward-absolute-filepath, and reverse-absolute-filepath. In each row, you map whatever name you want the sample to have in the end to its forward file (probably containing R1) and its reverse file (the same name with R2). You need to use absolute paths, which you can probably get by prefixing $PWD to the file names from the folder where you build your manifest.
Once you have built it in Excel, go to File > Save As and select "Text" from the drop-down menu. This will give you a tab-separated manifest.
I like to keep mine separate from my metadata because I find it easier to troubleshoot that way, but you need to make sure the sample IDs in the manifest line up with the IDs in your metadata.
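Building the manifest in Excel works; for many samples, a short script can write the same three-column file. This is a minimal sketch, assuming the forward and reverse files sit in one folder and differ only by an `_R1.fastq.gz`/`_R2.fastq.gz` suffix (the suffixes can be overridden, e.g. `_1.fastq.gz`/`_2.fastq.gz` for SRA/ENA downloads). The function name `make_manifest` is hypothetical; the header line is the one the paired-end manifest format expects.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: write a tab-separated paired-end manifest for every
# forward file in DIR that has a matching reverse file.
# Usage: make_manifest DIR OUT [FWD_SUFFIX] [REV_SUFFIX]
make_manifest() {
  local dir out="$2" fwd_sfx="${3:-_R1.fastq.gz}" rev_sfx="${4:-_R2.fastq.gz}"
  local fwd rev id
  dir="$(cd "$1" && pwd)"   # absolute path, like prefixing $PWD
  printf 'sample-id\tforward-absolute-filepath\treverse-absolute-filepath\n' > "$out"
  for fwd in "$dir"/*"$fwd_sfx"; do
    [ -e "$fwd" ] || continue             # glob matched nothing
    rev="${fwd%"$fwd_sfx"}$rev_sfx"
    if [ ! -e "$rev" ]; then
      echo "no reverse file for $fwd" >&2
      return 1
    fi
    id="$(basename "$fwd" "$fwd_sfx")"    # e.g. ERR2654632
    printf '%s\t%s\t%s\n' "$id" "$fwd" "$rev" >> "$out"
  done
}
```

The sample IDs here are simply the file names minus the suffix; if the metadata uses different IDs, the first column would need to be edited to match.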
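Depending on the platform and Excel version, a manifest saved as text can end up with Windows (CRLF) or classic Mac (CR) line endings, and stray rows can have extra or missing columns; both can show up as confusing import errors. A minimal sketch for normalizing and checking the saved file, assuming a three-column paired-end manifest; `clean_manifest` is a hypothetical name.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: normalize line endings of an Excel-exported manifest
# and check that every row has exactly three tab-separated columns.
clean_manifest() {
  local in="$1" out="$2"
  # Turn every carriage return into a newline, then drop the empty lines
  # this creates (and any blank rows Excel left behind).
  tr '\r' '\n' < "$in" | awk 'NF > 0' > "$out"
  # Report rows that do not have exactly 3 fields; non-zero exit if any.
  awk -F'\t' 'NF != 3 { printf "line %d has %d fields\n", NR, NF; bad = 1 }
              END { exit (bad ? 1 : 0) }' "$out"
}
```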
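To check that the manifest and metadata line up, one option is to list sample IDs that appear in the manifest but not in the metadata. A minimal sketch, assuming both files are tab-separated, the IDs are in the first column, the header is on line 1, and any other lines starting with `#` (such as a `#q2:types` row) should be skipped; the function names are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print the sorted, unique IDs from column 1 of a tab-separated file,
# skipping line 1 (the header), blank IDs and lines starting with '#'.
first_column_ids() {
  awk -F'\t' 'NR > 1 && $1 !~ /^#/ { sub(/\r$/, "", $1); if ($1 != "") print $1 }' "$1" \
    | LC_ALL=C sort -u
}

# Hypothetical helper: IDs present in the manifest but missing from the metadata.
ids_missing_from_metadata() {
  LC_ALL=C comm -23 <(first_column_ids "$1") <(first_column_ids "$2")
}
```

Empty output means every manifest ID also appears in the metadata; swapping the two arguments lists metadata IDs that have no sequence files.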
I created a manifest file with sample-id and absolute-filepath columns. I think I am using the correct import type and format, because I no longer get an error about that, but I am now getting a different error (attached).
I don't know what "No transformation" means. Does this mean there is a problem with my command or the absolute file paths, a problem with the format my data are in, or something else?
I really appreciate the help, thanks so much!
Christina
You are using SingleEndFastqManifestPhred33V2 as your format, but your type is SampleData[PairedEndSequencesWithQuality]. If you have single-end data, you need to use SampleData[SequencesWithQuality].
Thank you so much for the fast replies! I have paired-end sequences, so I fixed my import command.
code:
qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path /mnt/home/ernakovich/cal1037/devries_files/manifest_file_devries_4.txt \
  --output-path /mnt/home/ernakovich/cal1037/devries_files/devries-demux.qza \
  --input-format PairedEndFastqManifestPhred33V2
I have run the code on one line and on multiple lines, but either way I keep getting this error
There's something wrong either with the spacing or with the quotes. I think it's probably that you're using a curly (typographic) quote instead of a straight ' quote. So, maybe try
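A minimal sketch for catching the two usual suspects, assuming the command is saved in a text file (for example, copied from a document or web page): typographic quotes, which the shell treats as ordinary characters rather than quoting, and a backslash followed by a space in the middle of a line, which escapes the space instead of continuing the command onto the next line. When the command is kept on one line, the backslashes can simply be dropped. The function names are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Typographic quotes as raw UTF-8 bytes: left/right single, left/right double.
LSQ=$'\xe2\x80\x98'; RSQ=$'\xe2\x80\x99'; LDQ=$'\xe2\x80\x9c'; RDQ=$'\xe2\x80\x9d'

# Hypothetical helper: print the command file with curly quotes made straight.
straighten_quotes() {
  LC_ALL=C sed -e "s/${LSQ}/'/g" -e "s/${RSQ}/'/g" \
               -e "s/${LDQ}/\"/g" -e "s/${RDQ}/\"/g" "$1"
}

# Hypothetical helper: list lines with a backslash followed by a space and more
# text; a line-continuation backslash must be the last character on its line.
find_midline_backslash() {
  grep -n '\\ [^ ]' "$1" || true
}
```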
I am still having trouble importing the data. I thought it was paired-end data, so I was using but it appears this is not correct... I am confused about the error, though, because my manifest file does have the header absolute-filepath, as seen here (the top few lines of my .txt file):
I am not sure if I have done something wrong setting up my manifest file; from reading the forum and tutorials, I thought I had set it up correctly. Do you see a problem with how it is put together?
I know my data are paired-end (according to NCBI, where I downloaded the data from), but the ...Phred64V2 format also does not work for importing the data.
Is there another data type/format that my data might fit under?
If you have paired-end data, you need to use the paired-end manifest format. You have a single-end manifest format. Please go back to my previous post or to the tutorial on paired-end sequences.
I am not sure how my manifest file differs from what it should be, based on other posts and the errors I get. If I understand the error correctly, it wants me to use the "absolute-filepath" column, which I have. Am I misinterpreting the error (below)?
Based on the run file names, I don't know how to tell R1 from R2 (forward from reverse). Does this mean it is not possible to import the data? This is a link to one of the runs: Run Browser : Browse : Sequence Read Archive : NCBI/NLM/NIH
Am I using the wrong link to get the data?
You've got the error flipped. It says that you have a column called absolute-filepath, and it's looking for two columns called forward-absolute-filepath and reverse-absolute-filepath.
Again, this is outlined pretty clearly in the manifest tutorial. Please read that closely.
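The header line alone shows which manifest format a file matches. A small sketch that reports whether a manifest's header is the single-end layout (`sample-id`, `absolute-filepath`) or the paired-end layout (`sample-id`, `forward-absolute-filepath`, `reverse-absolute-filepath`); `manifest_kind` is a hypothetical name, and it assumes tab separators.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: classify a manifest by its header line.
manifest_kind() {
  local header
  local paired=$'sample-id\tforward-absolute-filepath\treverse-absolute-filepath'
  local single=$'sample-id\tabsolute-filepath'
  header="$(head -n 1 "$1" | tr -d '\r')"   # ignore a Windows line ending
  if [ "$header" = "$paired" ]; then
    echo "paired"
  elif [ "$header" = "$single" ]; then
    echo "single"
  else
    echo "unknown"   # e.g. comma-separated or misspelled column names
  fi
}
```

A result of `single` together with `--input-format PairedEndFastqManifestPhred33V2` is exactly the mismatch described in the error above.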
So I think my problem is that I have paired-end reads (according to all the information on NCBI, where I am getting the samples from), but only one file per run for both the forward and reverse sequences. For example, one of the runs I have taken from NCBI is ERR2654632, which I downloaded as a single fastq file. This has been the only run format I've been able to get into my directory.
Is there some way I should rename the files to put them in the correct R1 and R2 format? Do I need to have these files in my directory before I import them into QIIME 2, or can QIIME 2 import them directly from NCBI?
I re-downloaded my data so that I have the R1 and R2 files (forward and reverse reads). I am now working on importing them into QIIME 2. Could you let me know if you see anything wrong with my code or manifest file?
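The manifest only has to say which file is forward and which is reverse, so renaming is optional. When runs are re-downloaded with sra-tools `fasterq-dump --split-files`, each run comes out as `<accession>_1.fastq` and `<accession>_2.fastq` (gzip them afterwards if you want `.fastq.gz`); ENA also offers `_1`/`_2` files per run. Before writing the manifest, it can help to confirm that every forward file has a reverse partner with the same number of reads. A minimal sketch, assuming gzipped 4-line FASTQ and the `_1.fastq.gz`/`_2.fastq.gz` naming; `check_pairs` and `read_count` are hypothetical names.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Number of reads in a gzipped FASTQ (4 lines per record).
read_count() {
  echo $(( $(gzip -dc "$1" | wc -l) / 4 ))
}

# Hypothetical helper: for each *_1.fastq.gz in DIR, confirm a *_2.fastq.gz
# partner exists and holds the same number of reads. Returns 1 on any problem.
check_pairs() {
  local dir="$1" fwd rev n1 n2 status=0
  for fwd in "$dir"/*_1.fastq.gz; do
    [ -e "$fwd" ] || continue
    rev="${fwd%_1.fastq.gz}_2.fastq.gz"
    if [ ! -e "$rev" ]; then
      echo "NO_REVERSE $(basename "$fwd")"
      status=1
      continue
    fi
    n1="$(read_count "$fwd")"
    n2="$(read_count "$rev")"
    if [ "$n1" -ne "$n2" ]; then
      echo "MISMATCH $(basename "$fwd" _1.fastq.gz) $n1 $n2"
      status=1
    else
      echo "OK $(basename "$fwd" _1.fastq.gz) $n1"
    fi
  done
  return "$status"
}
```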