importing own data

Hello everyone,

I am new to Qiime2. I need to import my data, which is in manifest fastq format. The attached file also includes the metadata that goes with my results.
I made a directory:
(qiime2-2020.11) hamed@x86_64-apple-darwin13 ~ % mkdir afnan
(qiime2-2020.11) hamed@x86_64-apple-darwin13 ~ % cd afnan

Then I used this command to import the file:

qiime tools import \
--type 'SampleData[PairedEndSequencesWithQuality]' \
--input-path sind-manifest-file2.csv \
--output-path paired-end-demux.qza \
--input-format PairedEndFastqManifestPhred33

The error is:
There was a problem importing sind-manifest-file2.csv:

sind-manifest-file2.csv is not a(n) PairedEndFastqManifestPhred33 file:

Found header on line 1 with the following labels: ['sample-id', 'absolute-filepath', 'direction', 'fertilizer', 'site of collection', 'Crop rotations', 'Total N (N%)', 'organic compound (OC)', 'nitrate( NO3-)', 'ammonium (NH4)', 'Fe', 'Zn', 'K%', 'Cu', 'Mn', 'average well color development (AWCD)', 'Shannon index (H)', 'Substrate Richness (SR)', 'Eveness (E)', 'carbohydrate', 'polymers', 'carboxylic acid', 'amino acids'], expected: ['sample-id', 'absolute-filepath', 'direction']

Does anyone know how to solve this?

sind-manifest-file2.csv (2.7 KB)


Hello Sing,

Welcome to the forums!

It looks like your manifest file is a .csv file (comma separated value), but qiime tools import requires a .tsv file (tab separated value).

Simply saving your manifest as a .tsv instead of a .csv file should fix this problem!

EDIT: Some of these formats require comma separated files, some require tab separated files. Check the docs for your version of Qiime2 and try it both ways!

Let us know if you have any other question!

EDIT: Also make sure the manifest file only has the three columns:
'sample-id', 'absolute-filepath', 'direction'

You can include those columns as sample metadata later on!
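
For anyone following along, here is a minimal sketch of one way to trim a manifest down to the three required columns. It assumes a plain comma-separated file in which the required columns come first and no field contains a quoted comma. The function name keep_manifest_columns and the output name manifest-3col.csv are placeholders, not part of Qiime2.

```shell
# Sketch: keep only the first three comma-separated columns of a manifest.
# Assumes sample-id, absolute-filepath and direction are columns 1-3
# and that no field contains a quoted comma.
keep_manifest_columns() {
  cut -d',' -f1-3 "$1"
}

# Example usage (hypothetical output name):
# keep_manifest_columns sind-manifest-file2.csv > manifest-3col.csv
```

The extra columns are not lost this way: the original file stays untouched and can be used as the basis for a sample metadata file later.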

Dear Colin

Thanks for your reply.

I edited the manifest file so that it includes just the three columns you mentioned above, and converted it to .tsv.

The error is:

There was a problem importing sind-manifest-file2.tsv:

sind-manifest-file2.tsv is not a(n) PairedEndFastqManifestPhred33 file:

Found header on line 1 with the following labels: ['sample-id\tabsolute-filepath\tdirection'], expected: ['sample-id', 'absolute-filepath', 'direction']

Please, how should I fix this error?

Much appreciated.

Hello @saldouri,

I made a mistake! It looks like the PairedEndFastqManifestPhred33 format requires a comma separated values (.csv) file.

Those column headers look correct, and once you export using commas, that file should work!

I did exactly as you mentioned, and I get the same error message.

Any suggestions, please? I would appreciate it.

Would you be willing to post the command that you ran, your manifest file, and the full error message?

I can look at the file on my machine and see if I can see any other issues.

Hello,

The command and the error message are:

(qiime2-2020.11) hamed@x86_64-apple-darwin13 sajad % qiime tools import \
--type 'SampleData[PairedEndSequencesWithQuality]' \
--input-path sind-manifest-file2.csv \
--output-path paired-end-demux.qza \
--input-format PairedEndFastqManifestPhred33
There was a problem importing sind-manifest-file2.csv:

sind-manifest-file2.csv is not a(n) PairedEndFastqManifestPhred33 file:

Found header on line 1 with the following labels: ['`sample-id`', '`absolute-filepath`', "`direction'"], expected: ['sample-id', 'absolute-filepath', 'direction']

The manifest file is attached:

sind-manifest-file2.csv (1.5 KB)

Thanks for posting that!

Most of your file looks good, but your header row has some unexpected characters in it.

If you take out the back ticks ` and quotes ' so it looks like this

sample-id,absolute-filepath,direction
B1,$PWD/SIND/B1_6-16S_V3-V4_BPHVY_CGATCAGT-CTAAGCCT_L001_R1.fastq,forward

then this should work!

Thanks for being patient while we worked on this problem. Formats are always tricky to get exactly right.

Colin


Dear Colin

Sorry to disturb you.

I removed the back ticks ` and quotes ' and now get a different error message:

(qiime2-2020.11) hamed@x86_64-apple-darwin13 sajad % qiime tools import \
--type 'SampleData[PairedEndSequencesWithQuality]' \
--input-path sind-manifest-file2.csv \
--output-path paired-end-demux.qza \
--input-format PairedEndFastqManifestPhred33
There was a problem importing sind-manifest-file2.csv:

sind-manifest-file2.csv is not a(n) PairedEndFastqManifestPhred33 file:

File referenced on line 2 could not be found ($PWD/SIND/B1_6-16S_V3-V4_BPHVY_CGATCAGT-CTAAGCCT_L001_R1.fastq).

Why do all these complications occur?

The good news is that the file formatting issues have been fixed!

Now, it looks like one of your files paths is not quite right.

Instead of using $PWD/ in your folder path, perhaps you could use the full folder path instead. It would look something like this:
/path/to/file/SIND/B1_6-16S_V3-V4_BPHVY_CGATCAGT-CTAAGCCT_L001_R1.fastq
Note how I've replaced $PWD/ with /path/to/file/

Importing data is hard because everyone's files are in different places. Once the data is imported, the commands in the tutorials will be much more reliable and consistent.

If this is your first time working with Linux or the command line, this tutorial could also be helpful.

Colin

Hello Colin

I used my full path, as shown in the attached file.

There is still an error:

qiime tools import \
--type 'SampleData[PairedEndSequencesWithQuality]' \
--input-path sind-manifest-file2.csv \
--output-path paired-end-demux.qza \
--input-format PairedEndFastqManifestPhred33
There was a problem importing sind-manifest-file2.csv:

sind-manifest-file2.csv is not a(n) PairedEndFastqManifestPhred33 file:

File referenced on line 2 could not be found (/users/hamed/malak/SIND/B1_6-16S_V3-V4_BPHVY_CGATCAGT-CTAAGCCT_L001_R1.fastq).

sind-manifest-file2.csv (1.8 KB)

So, please, what do I need to do now?

Double check the file paths, and make sure the spelling and file extension are correct.

You can start by making sure you can view these files without using Qiime2. Try this:

head /users/hamed/malak/SIND/B1_6-16S_V3-V4_BPHVY_CGATCAGT-CTAAGCCT_L001_R1.fastq

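Also try this, in case the file is gzip-compressed (decompressing it on the fly keeps the output readable):

gunzip -c /users/hamed/malak/SIND/B1_6-16S_V3-V4_BPHVY_CGATCAGT-CTAAGCCT_L001_R1.fastq.gz | head

Whichever file exists, that should print out the first few lines of it...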
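
To check every row of the manifest at once rather than one path at a time, something like the sketch below could help. It assumes a comma-separated manifest with a header row and the file path in the second column. The function name check_manifest_paths is made up for this example, and paths are tested literally, so variables such as $PWD are not expanded.

```shell
# Sketch: report manifest rows whose file path does not exist.
# Assumes a header on line 1 and the path in column 2 of a comma-separated file.
# Paths are tested exactly as written; $PWD and similar are NOT expanded here.
check_manifest_paths() {
  missing=0
  # Skip the header and read the remaining rows field by field.
  while IFS=',' read -r sample path direction; do
    if [ -z "$sample" ]; then
      continue  # ignore blank lines
    fi
    if [ ! -f "$path" ]; then
      echo "missing: $path (sample $sample, $direction)"
      missing=1
    fi
  done <<EOF
$(tail -n +2 "$1")
EOF
  return "$missing"
}

# Example usage:
# check_manifest_paths sind-manifest-file2.csv
```

If this prints nothing and returns 0, every path in the manifest points at an existing file.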

We are so close to importing this data!

Dear Colin

You are right: when I tried the file paths without using Qiime2, there was no such file or directory.

So the steps I took were:

  1. I changed the name of my file to sind.csv and edited the paths (see my attached file).
  2. Then I used this command:
    head /Users/hamed/malak/sind.csv
    The paths in the output are the same throughout the file:

head /Users/hamed/malak/sind.csv
sample-id,absolute-filepath,direction
B1,/Users/hamed/malak/sind.csv /B1_6-16S_V3-V4_BPHVY_CGATCAGT-CTAAGCCT_L001_R1.fastq,forward
B1,/Users/hamed/malak/sind.csv /B1_6-16S_V3-V4_BPHVY_CGATCAGT-CTAAGCCT_L001_R2.fastq,reverse
B2,/Users/hamed/malak/sind.csv /B2_7-16S_V3-V4_BPHVY_TGCAGCTA-CTAAGCCT_L001_R1.fastq,forward
B2,/Users/hamed/malak/sind.csv /B2_7-16S_V3-V4_BPHVY_TGCAGCTA-CTAAGCCT_L001_R2.fastq,reverse
B3,/Users/hamed/malak/sind.csv /B3_8-16S_V3-V4_BPHVY_TCGACGTC-CTAAGCCT_L001_R1.fastq,forward
B3,/Users/hamed/malak/sind.csv /B3_8-16S_V3-V4_BPHVY_TCGACGTC-CTAAGCCT_L001_R2.fastq,reverse
B4,/Users/hamed/malak/sind.csv /B4_9-16S_V3-V4_BPHVY_ACTCGCTA-CGTCTAAT_L001_R1.fastq,forward
B4,/Users/hamed/malak/sind.csv /B4_9-16S_V3-V4_BPHVY_ACTCGCTA-CGTCTAAT_L001_R2.fastq,reverse
B5,/Users/hamed/malak/sind.csv /B5_10-16S_V3-V4_BPHVY_GGAGCTAC-CGTCTAAT_L001_R1.fastq,forward

So far everything looks good.

  3. I used the following command:
    qiime tools import
    --type 'SampleData[PairedEndSequencesWithQuality]'
    --input-path sind.csv \
    --output-path paired-end-demux.qza
    --input-format PairedEndFastqManifestPhred33V2

The error message is:
(1/1) Got unexpected extra argument ( )
zsh: command not found: --input-format

I tried it again and got another error:
There was a problem importing sind.csv:

sind.csv is not a(n) PairedEndFastqManifestPhred33 file:

File referenced on line 2 could not be found (/Users/hamed/malak/sind.csv /B1_6-16S_V3-V4_BPHVY_CGATCAGT-CTAAGCCT_L001_R1.fastq).

What do I need to do, please?

sind.csv (1.9 KB)

Note: I also ran head sind.csv inside the qiime2 environment to make sure my file is in the malak folder. The result is:

(qiime2-2020.11) hamed@x86_64-apple-darwin13 malak % head sind.csv
sample-id,absolute-filepath,direction
B1,/Users/hamed/malak/sind.csv /B1_6-16S_V3-V4_BPHVY_CGATCAGT-CTAAGCCT_L001_R1.fastq,forward
B1,/Users/hamed/malak/sind.csv /B1_6-16S_V3-V4_BPHVY_CGATCAGT-CTAAGCCT_L001_R2.fastq,reverse
B2,/Users/hamed/malak/sind.csv /B2_7-16S_V3-V4_BPHVY_TGCAGCTA-CTAAGCCT_L001_R1.fastq,forward
B2,/Users/hamed/malak/sind.csv /B2_7-16S_V3-V4_BPHVY_TGCAGCTA-CTAAGCCT_L001_R2.fastq,reverse
B3,/Users/hamed/malak/sind.csv /B3_8-16S_V3-V4_BPHVY_TCGACGTC-CTAAGCCT_L001_R1.fastq,forward
B3,/Users/hamed/malak/sind.csv /B3_8-16S_V3-V4_BPHVY_TCGACGTC-CTAAGCCT_L001_R2.fastq,reverse
B4,/Users/hamed/malak/sind.csv /B4_9-16S_V3-V4_BPHVY_ACTCGCTA-CGTCTAAT_L001_R1.fastq,forward
B4,/Users/hamed/malak/sind.csv /B4_9-16S_V3-V4_BPHVY_ACTCGCTA-CGTCTAAT_L001_R2.fastq,reverse
B5,/Users/hamed/malak/sind.csv /B5_10-16S_V3-V4_BPHVY_GGAGCTAC-CGTCTAAT_L001_R1.fastq,forward


Good to hear from you again! I'm glad you are making progress.

There are a couple of issues we can address. You might know about these already; I just want to mention them for future folks who might find this thread.


File formats and file extensions (.csv and .tsv)

Renaming the file to sind.csv is a good first step! But the most important part is to change the file format to 'comma separated value', not just the file name to .csv.

(You already did this! I just wanted to mention it for other folks, because while the file extension (.csv or .tsv) does not matter in Linux, the file format does.)
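
A quick way to see which format a file really has, whatever its extension says, is to look at the header line. The sketch below is only illustrative: header_delimiter and tabs_to_commas are hypothetical helper names, and the conversion assumes no field contains a tab or comma of its own.

```shell
# Sketch: report which delimiter the header line of a manifest uses.
header_delimiter() {
  header=$(head -n 1 "$1")
  case "$header" in
    *"$(printf '\t')"*) echo "tab" ;;
    *,*)                echo "comma" ;;
    *)                  echo "unknown" ;;
  esac
}

# Sketch: turn a tab-separated file into a comma-separated one.
# Assumes fields contain neither tabs nor commas.
tabs_to_commas() {
  tr '\t' ',' < "$1"
}

# Example usage (hypothetical output name):
# header_delimiter sind-manifest-file2.tsv
# tabs_to_commas sind-manifest-file2.tsv > manifest-commas.csv
```

A header that is reported as a single label containing \t, as in the earlier error message, is the typical sign of a tab-separated file being read as comma-separated.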


Multi-line Linux commands

When you see a command in a tutorial like this

qiime tools import
--type 'SampleData[PairedEndSequencesWithQuality]'
--input-path sind.csv
--output-path paired-end-demux.qza
--input-format PairedEndFastqManifestPhred33V2

that's a single Linux command split over multiple lines. To make sure Linux knows that this is 1 command over 5 lines, and not 5 separate commands, you need to add a backslash (\) at the end of every line except the last to continue the one command. Like this:

qiime tools import \
--type 'SampleData[PairedEndSequencesWithQuality]' \
--input-path sind.csv \
--output-path paired-end-demux.qza \
--input-format PairedEndFastqManifestPhred33V2

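Without those backslashes, Linux tries to run each line as a separate command, and it can't find commands called --type or --input-path, so it throws an error. The backslash also has to be the very last character on the line: if a space follows it, the shell passes that space along as an argument of its own, which is most likely where the "Got unexpected extra argument ( )" message came from.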
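
The effect is easy to see without Qiime2 at all. In this small sketch, count_args is a made-up helper that prints how many arguments it received; the option names are borrowed from the import command purely as sample text.

```shell
# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

# One command split over three lines. Each backslash is the LAST character
# on its line, so the shell joins the lines into a single command.
joined=$(count_args --input-path sind.csv \
  --output-path paired-end-demux.qza \
  --input-format PairedEndFastqManifestPhred33)
echo "$joined"   # 6 arguments

# A backslash followed by a space does not continue the line.
# It escapes the space, which then becomes an argument of its own.
extra=$(count_args sind.csv \ )
echo "$extra"    # 2 arguments: "sind.csv" and " "
```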


Fixing the path to the files

It looks like your .csv file includes paths to that .csv file, instead of to your .fastq files. Maybe this is a find-and-replace glitch? It should say something like
/Users/hamed/malak/B1_6-16S_V3-V4_BPHVY_CGATCAGT-CTAAGCCT_L001_R1.fastq
instead.
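
If editing every row by hand is tedious, a one-line substitution can do it. This is only a sketch based on the paths shown in this thread: it assumes the stray text is exactly "sind.csv /" and that the .fastq files sit directly in the folder that remains. The names fix_manifest_paths and sind-fixed.csv are placeholders.

```shell
# Sketch: remove the stray "sind.csv /" fragment from each path in a manifest.
# The pattern is taken from the paths shown in this thread; adjust it to your file.
fix_manifest_paths() {
  sed 's#/sind\.csv /#/#' "$1"
}

# Example usage (hypothetical output name):
# fix_manifest_paths sind.csv > sind-fixed.csv
```

Writing to a new file keeps the original manifest available in case the pattern needs adjusting.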

Keep in touch!


Dear Colin

I fixed the paths to the files, as shown in the attached file,

and ran this command, adding a backslash at the end of each line:
(qiime2-2020.11) hamed@x86_64-apple-darwin13 malak % qiime tools import
--type 'SampleData[PairedEndSequencesWithQuality]'
--input-path sind.csv \
--output-path paired-end-demux.qza
--input-format PairedEndFastqManifestPhred33

There was a problem importing sind.csv:

sind.csv is not a(n) PairedEndFastqManifestPhred33 file:

File referenced on line 2 could not be found (/Users/hamed/malak/B1_6-16S_V3-V4_BPHVY_CGATCAGT-CTAAGCCT_L001_R1.fastq).

sind.csv (1.7 KB)

Note 1: I checked the path with the head command, as you recommended previously, and it is correct and the same throughout the file.

Note 2: When I paste the import command here, the backslash at the end of each line does not show up, but I did include them.

I don't know where the problem is. Why does Qiime2 refuse my data?

Hi @saldouri!

The main part of the error is here:

File referenced on line 2 could not be found (/Users/hamed/malak/B1_6-16S_V3-V4_BPHVY_CGATCAGT-CTAAGCCT_L001_R1.fastq).

It is saying that there is no file at that location with that specific name. As @colinbrislawn suggested above, you should confirm that the file exists, and if it does, make a note of the full path to its location.

Please run the following and report back what you see:

ls -la /Users/hamed/malak/B1_6-16S_V3-V4_BPHVY_CGATCAGT-CTAAGCCT_L001_R1.fastq

Dear Dillon

First of all, I would like to thank Colin and you for your efforts to solve my problem.

Now let me show you screenshots of everything:

  1. The location where my data is. As you can see in the attached photo, my full path is Users/hamed/malak.

  2. In the second photo you will see my Excel sheet and how I wrote my path.

  3. The screenshot of the command I ran.

Note: I attached my file for further checking.

sind.csv (1.7 KB)
