How to import our samples into QIIME 2

qiime tools import \
  --type 3949_A_run554_ATCACGTT_S39_L002_R1_001 \
  --input-path casava-18-paired-end-demultiplexed \
  --input-format CasavaOneEightSingleLanePerSampleDirFmt \
  --output-path demux-paired-end.qza

Hi, does a user need to change anything in the command above?

Please tell me how I can import my data into QIIME 2. The tutorial is puzzling. It provides a sample that we should download and then unzip, but I would like to work on my own data. The command that comes after those two commands seems like it should be modified with my file name. That is just a guess; I have no clue. I changed some parts, but it did not work. Also, some terms, such as INPUT, OUTPUT and TYPE, are unfamiliar to me. Please tell me what I should do. Thanks

I unzipped my fastq.gz file to fastq. I tried to import this fastq file into QIIME 2, but it gave an error. The command and the error are below.
qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path R2.fastq \
  --input-format CasavaOneEightSingleLanePerSampleDirFmt \
  --output-path demux-paired-end.qza
There was a problem importing R2.fastq:

  R2.fastq is not a directory.
So what is the problem?
What should I do to solve it? Am I going about this the wrong way?

I used these commands to unzip my fastq.gz files, but they did not produce a file that I could use for importing. Do you have any idea why?

gunzip *.fastq.gz
pigz -d *.fastq.gz

Thanks a lot

Hi @Mehrdad,

Have you looked through the tutorial and played with those files to try and structure yours the same way? I find the tutorials are typically really helpful in specifying the file types if you work through them. The test files are presented as zipped files, since that makes them easiest to download, but sequence files are also usually compressed.

If you have questions about the formats, please also review the semantic types in QIIME 2, since these can be helpful to determine what you need.

Best,
Justine

I looked at the importing part. There are three types. 1. EMP 2. Casava 3. others.
My data is paired-end Casava. I have two files: one for F (forward) and another for R (reverse).

I know where my file is located in Linux. In the terminal I go to that location and run the command below, but I replaced the --input-path value with the name of my data file.

qiime tools import \
  --type 'SampleData[SequencesWithQuality]' \
  --input-path MY FILE NAME.fastq \
  --input-format CasavaOneEightSingleLanePerSampleDirFmt \
  --output-path demux-single-end.qza

By the way, I unzipped my files before running this command.

So tell me what is the problem now, please?

I put the error message here:
qiime tools import \
  --type 'SampleData[SequencesWithQuality]' \
  --input-path 1.fastq \
  --input-format CasavaOneEightSingleLanePerSampleDirFmt \
  --output-path demux-single-end.qza
Usage: qiime tools import [OPTIONS]
Try "qiime tools import --help" for help.

Error: Invalid value for "--input-path": Path "1" does not exist.

I appreciate your help.

Hi @Mehrdad,

Your import format is CasavaOneEightSingleLanePerSampleDirFmt. Usually a label like “dir” suggests you should load a directory, rather than a file (which might be fp or something similar).

Best,
Justine

Excuse me!
I changed the format.

qiime tools import \
  --type 'SampleData[SequencesWithQuality]' \
  --input-path 1.fastq.gz \
  --input-format 1dirFmt \
  --output-path demux-single-end.qza
Usage: qiime tools import [OPTIONS]
Try "qiime tools import --help" for help.

Error: Invalid value for "--input-path": Path "1.fastq.gz" does not exist.

I still have the same problem.
Is there a protocol that explains the format, path, etc.? I am stuck at this step. I tried several more things but could not solve it.
I know I have asked many questions, but I have to share the current problem with you.

I tried another way and got a new error related to the format. I am not familiar with the formats and do not know which one suits my data.
qiime tools import \
  --type 'SampleData[SequencesWithQuality]' \
  --input-path test \
  --input-format dir \
  --output-path 11.qza
Traceback (most recent call last):
  File "/home/mpi/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/qiime2/sdk/util.py", line 91, in parse_format
    format_record = pm.formats[format_str]
KeyError: 'dir'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/mpi/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/q2cli/tools.py", line 146, in import_data
    view_type=input_format)
  File "/home/mpi/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/qiime2/sdk/result.py", line 206, in import_data
    view_type = qiime2.sdk.parse_format(view_type)
  File "/home/mpi/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/qiime2/sdk/util.py", line 93, in parse_format
    raise TypeError("No format: %s" % format_str)
TypeError: No format: dir

An unexpected error has occurred:

No format: dir

See above for debug info.

Hi @Mehrdad,

Since I know you’re a new user to both Linux and QIIME 2, it’s really important to learn to troubleshoot yourself. dir is not a valid format (a valid --input-format argument); have you read the format and type documentation?

Have you been able to get the tutorial to work? If you download and unzip the tutorial data, have you explored the structure (directory structure and file types) there? Those may also help you better pattern your imports.

Best,
Justine

Dear Justine,
I made a .qza artifact with the data provided on the tutorial page, in the Casava paired-end section.
That was successful! But my problem is with my own data. I have not been able to import it, which is why I asked your team.
It seems the input-format has to be changed to something else, but I do not know what.

Sorry @Mehrdad,

I understand your frustration. You need to place all your files in a single directory, then pass that directory in as your input-path and pass in your format as CasavaOneEightSingleLanePerSampleDirFmt. However, it looks like your file names are not Casava-compatible, in which case I'd suggest a sample manifest format.
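
A minimal sketch of the first suggestion, assuming the reads currently sit in some source directory as .fastq or .fastq.gz files. The helper name stage_reads and the directory names in the example are hypothetical, not part of QIIME 2. It copies the reads into one directory and re-compresses any that were unzipped, because the Casava directory format looks for .fastq.gz files.

```shell
#!/bin/sh
# Hypothetical helper (not a QIIME 2 command): gather reads into a single
# directory, gzip-compressing any file that is still a plain .fastq.
# The source and destination are assumed to be different directories.
stage_reads() {
    src_dir=$1
    dest_dir=$2
    mkdir -p "$dest_dir" || return 1
    for f in "$src_dir"/*.fastq "$src_dir"/*.fastq.gz; do
        [ -e "$f" ] || continue            # skip glob patterns that matched nothing
        case "$f" in
            *.gz) cp "$f" "$dest_dir/" ;;
            *)    gzip -c "$f" > "$dest_dir/$(basename "$f").gz" ;;
        esac
    done
}

# Example (directory names are assumptions):
#   stage_reads raw_reads Seq
#   qiime tools import \
#     --type 'SampleData[PairedEndSequencesWithQuality]' \
#     --input-path Seq \
#     --input-format CasavaOneEightSingleLanePerSampleDirFmt \
#     --output-path demux-paired-end.qza
```

The import itself will still fail if the file names do not follow the Casava naming convention, so staging the files is only the first half of the fix.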

Best,
Justine

The directory that stores my fastq.gz files is named Seq.
So:

type = 'SampleData[PairedEndSequencesWithQuality]'
input-path = Seq
input-format = ???
output-path = Seq11

I checked a lot of things, but I still do not understand what the format should be.

Hi! Everything is explained here
https://docs.qiime2.org/2019.1/tutorials/importing/
type - indicates which type of reads you are going to use; for some of them you should also indicate the input format
input path - the path to your reads
output path - the path where you want the output file to be written


I read the page several times.
It seems I have to change the items to match the details of my data, but I am not familiar with them. I got errors.

qiime tools import \
  --type 'SampleData[SequencesWithQuality]' \
  --input-path test \
  --input-format CasavaOneNineSingleLanePerSampleDirFmt \
  --output-path 11.qza
Traceback (most recent call last):
  File "/home/mpi/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/qiime2/sdk/util.py", line 91, in parse_format
    format_record = pm.formats[format_str]
KeyError: 'CasavaOneNineSingleLanePerSampleDirFmt'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/mpi/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/q2cli/tools.py", line 146, in import_data
    view_type=input_format)
  File "/home/mpi/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/qiime2/sdk/result.py", line 206, in import_data
    view_type = qiime2.sdk.parse_format(view_type)
  File "/home/mpi/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/qiime2/sdk/util.py", line 93, in parse_format
    raise TypeError("No format: %s" % format_str)
TypeError: No format: CasavaOneNineSingleLanePerSampleDirFmt

An unexpected error has occurred:

No format: CasavaOneNineSingleLanePerSampleDirFmt

See above for debug info.

Where can I find my format? I read the content of this link, but it did not help me!

My files are in FASTQ format, but knowing that does not help:

Format

A FASTQ file normally uses four lines per sequence.

  • Line 1 begins with a '@' character and is followed by a sequence identifier and an optional description (like a FASTA title line).
  • Line 2 is the raw sequence letters.
  • Line 3 begins with a '+' character and is optionally followed by the same sequence identifier (and any description) again.
  • Line 4 encodes the quality values for the sequence in Line 2, and must contain the same number of symbols as letters in the sequence.

A FASTQ file containing a single sequence might look like this:

@SEQ_ID
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
+
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65

The byte representing quality runs from 0x21 (lowest quality; '!' in ASCII) to 0x7e (highest quality; '~' in ASCII). Here are the quality value characters in left-to-right increasing order of quality (ASCII):

!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~

The original Sanger FASTQ files also allowed the sequence and quality strings to be wrapped (split over multiple lines), but this is generally discouraged as it can make parsing complicated due to the unfortunate choice of "@" and "+" as markers (these characters can also occur in the quality string).
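
The four-line layout described above can be checked mechanically. The following is a rough sketch, assuming unwrapped records (exactly four lines each); the function name check_fastq is hypothetical. It only verifies the '@' and '+' markers and that the quality line is as long as the sequence line, and it says nothing about which QIIME 2 import format to use.

```shell
#!/bin/sh
# Hypothetical helper: sanity-check the four-line FASTQ layout.
# Reads the named file, or standard input when no file is given.
check_fastq() {
    awk '
        NR % 4 == 1 && substr($0, 1, 1) != "@" {
            print "line " NR ": expected a header starting with @"; bad = 1
        }
        NR % 4 == 2 { seqlen = length($0) }
        NR % 4 == 3 && substr($0, 1, 1) != "+" {
            print "line " NR ": expected a separator starting with +"; bad = 1
        }
        NR % 4 == 0 && length($0) != seqlen {
            print "line " NR ": quality length differs from sequence length"; bad = 1
        }
        END {
            if (NR % 4 != 0) { print "file ends in the middle of a record"; bad = 1 }
            exit bad + 0
        }
    ' "$@"
}

# Example (file names are assumptions):
#   check_fastq reads.fastq
#   gzip -dc reads.fastq.gz | check_fastq
```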

Are your reads in the test directory?
Did you try this?

qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path test \
  --input-format CasavaOneEightSingleLanePerSampleDirFmt \
  --output-path demux-paired-end.qza

Yes, it is in 'test' directory!

I copied and pasted the command from your reply. This is the result:

qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path test \
  --input-format CasavaOneEightSingleLanePerSampleDirFmt \
  --output-path demux-paired-end.qza
There was a problem importing test:

Missing one or more files for CasavaOneEightSingleLanePerSampleDirFmt: '.+_.+_L[0-9][0-9][0-9]_R[12]_001\.fastq\.gz'
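
The pattern in this error message describes the file names the Casava format looks for. As a rough sketch, the names in a directory can be compared against it before running the import. The function name check_casava_names is hypothetical, and anchoring the pattern to the whole file name is an assumption about how QIIME 2 applies it.

```shell
#!/bin/sh
# Hypothetical helper: report which files in a directory match the pattern
# quoted in the CasavaOneEightSingleLanePerSampleDirFmt error message.
check_casava_names() {
    dir=$1
    # Anchored with ^ and $ here (an assumption) so the whole name must match.
    pattern='^.+_.+_L[0-9][0-9][0-9]_R[12]_001\.fastq\.gz$'
    status=0
    for f in "$dir"/*; do
        [ -e "$f" ] || continue            # empty directory: nothing to check
        name=$(basename "$f")
        if printf '%s\n' "$name" | grep -Eq "$pattern"; then
            echo "OK   $name"
        else
            echo "BAD  $name"
            status=1
        fi
    done
    return $status
}

# Example: check_casava_names test
```

A name such as 3949_A_run554_ATCACGTT_S39_L002_R1_001.fastq.gz matches, while names such as R2.fastq or 1.fastq.gz do not.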

Hello @Mehrdad - I have merged your two parallel discussion threads into one. Please refrain from opening another thread with the same question: it takes up valuable time on our end that we could spend providing actual user support rather than doing this kind of administration.


@Mehrdad - can you please provide a screenshot of your raw files (we need to see the filenames)? Once we have a better idea of what shape your data is currently in, we can provide a more concrete recommendation. In the meantime, I think your most likely bet is to use the FASTQ Manifest format.
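
A minimal sketch of what a FASTQ manifest for paired-end data could look like in the 2019.1 release used in this thread, where the manifest is a CSV file with the header sample-id,absolute-filepath,direction. The helper name make_manifest is hypothetical, and it assumes files named <sample>_R1.fastq.gz and <sample>_R2.fastq.gz, which may not match your actual file names.

```shell
#!/bin/sh
# Hypothetical helper: print a paired-end FASTQ manifest (CSV) for a directory.
# Assumed naming (not from the thread): <sample>_R1.fastq.gz / <sample>_R2.fastq.gz
make_manifest() {
    dir=$(cd "$1" && pwd) || return 1      # the manifest needs absolute paths
    echo "sample-id,absolute-filepath,direction"
    for fwd in "$dir"/*_R1.fastq.gz; do
        [ -e "$fwd" ] || continue
        sample=$(basename "$fwd" _R1.fastq.gz)
        rev="$dir/${sample}_R2.fastq.gz"
        if [ ! -e "$rev" ]; then
            echo "missing reverse read for $sample" >&2
            return 1
        fi
        echo "$sample,$fwd,forward"
        echo "$sample,$rev,reverse"
    done
}

# Example (Phred33 quality encoding is an assumption about the data):
#   make_manifest Seq > manifest.csv
#   qiime tools import \
#     --type 'SampleData[PairedEndSequencesWithQuality]' \
#     --input-path manifest.csv \
#     --input-format PairedEndFastqManifestPhred33 \
#     --output-path demux-paired-end.qza
```

With a manifest, the file names no longer have to follow the Casava convention, because each file is mapped to a sample ID and read direction explicitly.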


Still struggling with it? Can you attach a screenshot showing the full contents of the test directory, or attach a txt file with a complete list of the files?
