Problem importing data

Hi,

I’m trying to import data into QIIME 2, but I’m having an issue.
This is an example of my filename format:

W28_J0545_S100_L001_R1_001.fastq.gz  W28_J0545_S170_L001_R1_001.fastq.gz  W28_J0545_S31_L001_R1_001.fastq.gz
W28_J0545_S100_L001_R2_001.fastq.gz  W28_J0545_S170_L001_R2_001.fastq.gz  W28_J0545_S31_L001_R2_001.fastq.gz
W28_J0545_S101_L001_R1_001.fastq.gz  W28_J0545_S171_L001_R1_001.fastq.gz  W28_J0545_S32_L001_R1_001.fastq.gz
W28_J0545_S101_L001_R2_001.fastq.gz  W28_J0545_S171_L001_R2_001.fastq.gz  W28_J0545_S32_L001_R2_001.fastq.gz
W28_J0545_S102_L001_R1_001.fastq.gz  W28_J0545_S172_L001_R1_001.fastq.gz  W28_J0545_S33_L001_R1_001.fastq.gz
W28_J0545_S102_L001_R2_001.fastq.gz  W28_J0545_S172_L001_R2_001.fastq.gz  W28_J0545_S33_L001_R2_001.fastq.gz
W28_J0545_S103_L001_R1_001.fastq.gz  W28_J0545_S173_L001_R1_001.fastq.gz  W28_J0545_S34_L001_R1_001.fastq.gz
W28_J0545_S103_L001_R2_001.fastq.gz  W28_J0545_S173_L001_R2_001.fastq.gz  W28_J0545_S34_L001_R2_001.fastq.gz
W28_J0545_S104_L001_R1_001.fastq.gz  W28_J0545_S174_L001_R1_001.fastq.gz  W28_J0545_S35_L001_R1_001.fastq.gz

After I run this command:

   qiime tools import \
    --type 'SampleData[PairedEndSequencesWithQuality]' \
    --input-path m_28w \
    --source-format CasavaOneEightSingleLanePerSampleDirFmt \
    --output-path demux-paired-end.qza

I got this error:

There was a problem importing m_28w:

  m_28w is not a(n) CasavaOneEightSingleLanePerSampleDirFmt:

  Duplicate samples in forward reads: {'W28_J0545'}

I tried removing W28 from the file names, but I still have the same issue.

Is there a way to solve this issue?

Thanks,

Hi @Faisal,
Is it possible to share the m_28w manifest file you are using to help with the troubleshooting?

Hi @Faisal,
The thread "Demultiplex pair end sequence import problem" may be of help to you.

Basically, this import type expects each file name to contain these underscore-separated fields, in this order:
the sample identifier, the barcode sequence or a barcode identifier, the lane number, the read number, and the set number.

In your case, it looks like S100 etc. is the sample identifier, so you could delete W28_J0545_ from the name, but then the string is still missing a barcode identifier.
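The error message hints at how the file names are being split. The sketch below uses a hypothetical helper (`parse_casava_name` is not part of QIIME 2) and assumes that the last four underscore-separated fields are the barcode, lane, read and set number, with everything before them taken as the sample ID. Under that assumption every file in the listing above yields the sample ID `W28_J0545`, which is exactly the duplicate reported. Removing only `W28_` still leaves every file with the same ID, `J0545`.

```python
from pathlib import PurePath


def parse_casava_name(filename):
    """Split a Casava 1.8 style file name into its fields (hypothetical helper).

    Assumption: the last four underscore-separated fields are barcode, lane,
    read and set number; everything before them is taken as the sample ID.
    """
    stem = PurePath(filename).name.split('.')[0]  # drop '.fastq.gz'
    parts = stem.split('_')
    if len(parts) < 5:
        raise ValueError(f'{filename!r}: expected at least 5 underscore-separated fields')
    barcode, lane, read, set_number = parts[-4:]
    return {'sample_id': '_'.join(parts[:-4]), 'barcode': barcode,
            'lane': lane, 'read': read, 'set': set_number}


names = ['W28_J0545_S100_L001_R1_001.fastq.gz', 'W28_J0545_S101_L001_R1_001.fastq.gz']
print({parse_casava_name(n)['sample_id'] for n in names})  # {'W28_J0545'}
print(parse_casava_name('J0545_S100_L001_R1_001.fastq.gz')['sample_id'])  # J0545
```

Dropping the whole `W28_J0545_` prefix leaves only four fields. This sketch rejects such a name as missing the barcode field, which matches the point above.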

Alternatively, you could import with the more general manifest format. The manifest file used by that format is what @Mehrbod_Estaki was referring to.
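Here is a minimal sketch of building such a manifest with Python's standard library. It assumes the comma-separated layout (`sample-id,absolute-filepath,direction`) used by the 2018.4 release that appears in the traceback later in this thread. It also assumes that the `S100`-style field is the sample ID, as suggested above. `build_manifest` and `PATTERN` are hypothetical names. Later QIIME 2 releases changed the manifest format, so check the import documentation for the version you run.

```python
import csv
import io
import os
import re

# Assumed naming pattern, based on the file listing at the top of the thread;
# the S<number> field is treated as the sample ID.
PATTERN = re.compile(
    r'^.+_(?P<sample>S\d+)_L\d{3}_(?P<read>R[12])_\d{3}\.fastq\.gz$')


def build_manifest(filenames, data_dir):
    """Return manifest text (hypothetical helper) with one row per read file."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator='\n')  # comma-separated, '\n' endings
    writer.writerow(['sample-id', 'absolute-filepath', 'direction'])
    for name in sorted(filenames):
        match = PATTERN.match(name)
        if match is None:
            continue  # skip files that do not look like read files
        direction = 'forward' if match.group('read') == 'R1' else 'reverse'
        path = os.path.abspath(os.path.join(data_dir, name))
        writer.writerow([match.group('sample'), path, direction])
    return out.getvalue()


# Usage: write a manifest for the m_28w directory.
# with open('manifest.csv', 'w') as fh:
#     fh.write(build_manifest(os.listdir('m_28w'), 'm_28w'))
```

Building the paths with `os.path.abspath` avoids the relative-path problem. Generating the file from code also avoids the stray cells a spreadsheet can leave behind.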

Hi @Mehrbod_Estaki

Thanks for your reply.

I haven’t used the manifest method before, but I will try it.

Thanks

Hi @sejsong

I will try the manifest approach and hope it works.

Thanks,

Hi @Mehrbod_Estaki, @sejsong, and anyone else who can help,

I made the manifest and tried to import the data, but QIIME 2 still does not recognise the format.

This is the command I used:

qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path 28wks_manifest_29042018.csv \
  --output-path w28_paired-end-demux.qza \
  --source-format PairedEndFastqManifestPhred64

and this is the error:

There was a problem importing 28wks_manifest_29042018.csv:

  28wks_manifest_29042018.csv is not a(n) PairedEndFastqManifestPhred64 file

I also tried changing the format to Phred33, but the error is still there:

qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path 28wks_manifest_29042018.csv \
  --output-path w28_paired-end-demux.qza \
  --source-format PairedEndFastqManifestPhred33

There was a problem importing 28wks_manifest_29042018.csv:

  28wks_manifest_29042018.csv is not a(n) PairedEndFastqManifestPhred33 file

Here is an example of the sequence data, in case I picked the wrong file format:
[screenshot of sequence data]

And this is a screenshot of the manifest file:
[screenshot of manifest file]

I hope to find a solution for this.

Thanks.

Hi @Faisal,

I believe you’re almost there. The only thing you need to adjust is to change the relative file paths in your manifest file to absolute paths, e.g. /home/users/…/filename.fastq.gz. Give that a try!

Hi @Mehrbod_Estaki

I changed the file paths to absolute paths, but I’m still getting this error:

There was a problem importing 28wks_manifest_renames_29042018.csv:

  28wks_manifest_renames_29042018.csv is not a(n) PairedEndFastqManifestPhred64 file

Could this be an issue with QIIME 2 identifying the FASTQ format?

Hi @Faisal,

Sorry you’re still having issues with this! According to the forum, other common causes of this error are extra rows or columns and empty cells in the manifest. Excel-like tools sometimes introduce these without you noticing, so check for them, and also make sure the file is comma-separated rather than tab-separated. Finally, make sure the Phred format you pick is correct for your data; most recent sequencers produce Phred33.
If none of that works, we’ll wait for one of the developers to take a more thorough look. Sorry I couldn’t be more helpful.
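The checks suggested above can be automated. The rough sketch below uses a hypothetical `check_manifest` helper, and its expected header assumes the same comma-separated layout as the 2018.4 release. It flags blank rows, empty cells, wrong column counts, a tab-separated header, and paths that neither start with `/` nor use `$PWD`.

```python
import csv
import io

# Header assumed from the comma-separated manifest layout of QIIME 2 2018.x.
EXPECTED_HEADER = ['sample-id', 'absolute-filepath', 'direction']


def check_manifest(text):
    """Return a list of likely problems in manifest text (hypothetical helper)."""
    problems = []
    for lineno, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        # Blank lines, or rows of empty cells such as ',,' left by spreadsheets.
        if not row or all(cell.strip() == '' for cell in row):
            problems.append(f'line {lineno}: empty row')
            continue
        if lineno == 1:
            if row != EXPECTED_HEADER:
                hint = ' (tab-separated?)' if '\t' in ''.join(row) else ''
                problems.append(f'line 1: unexpected header {row}{hint}')
            continue
        if len(row) != len(EXPECTED_HEADER):
            problems.append(f'line {lineno}: {len(row)} columns, expected 3')
        elif any(cell.strip() == '' for cell in row):
            problems.append(f'line {lineno}: empty cell')
        elif row[2] not in ('forward', 'reverse'):
            problems.append(f'line {lineno}: direction must be forward or reverse')
        elif not row[1].startswith(('/', '$PWD')):
            problems.append(f'line {lineno}: path is not absolute')
    return problems


# Usage on an existing manifest:
# with open('cleaned_rename_28w_manifest.csv', newline='') as fh:
#     print(check_manifest(fh.read()))
```

An empty list means none of these particular problems was found. It does not guarantee that the import will succeed.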

Hi @Mehrbod_Estaki,

Thank you! I copied only the required cells, pasted them into a new sheet, and used that.
The import now works with Phred33, but I get an error with Phred64:

qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path cleaned_rename_28w_manifest.csv \
  --output-path w28_Ph64_paired-end-demux.qza \
  --source-format PairedEndFastqManifestPhred64
/Users/faisal/miniconda2/envs/qiime2-2018.4/lib/python3.5/site-packages/q2_types/per_sample_sequences/_transformer.py:344: UserWarning: Importing of PHRED 64 data is slow as it is converted internally to PHRED 33. Working with the imported data will not be slower than working with PHRED 33 data.
  warnings.warn(_phred64_warning)
/Users/faisal/miniconda2/envs/qiime2-2018.4/lib/python3.5/site-packages/skbio/io/registry.py:557: ArgumentOverrideWarning: Best guess was: variant='illumina1.8', continuing with user supplied: 'illumina1.3'
  ArgumentOverrideWarning)
Traceback (most recent call last):
  File "/Users/faisal/miniconda2/envs/qiime2-2018.4/lib/python3.5/site-packages/q2cli/tools.py", line 116, in import_data
    view_type=source_format)
  File "/Users/faisal/miniconda2/envs/qiime2-2018.4/lib/python3.5/site-packages/qiime2/sdk/result.py", line 218, in import_data
    return cls._from_view(type_, view, view_type, provenance_capture)
  File "/Users/faisal/miniconda2/envs/qiime2-2018.4/lib/python3.5/site-packages/qiime2/sdk/result.py", line 243, in _from_view
    result = transformation(view)
  File "/Users/faisal/miniconda2/envs/qiime2-2018.4/lib/python3.5/site-packages/qiime2/core/transform.py", line 70, in transformation
    new_view = transformer(view)
  File "/Users/faisal/miniconda2/envs/qiime2-2018.4/lib/python3.5/site-packages/q2_types/per_sample_sequences/_transformer.py", line 346, in _9
    single_end=False)
  File "/Users/faisal/miniconda2/envs/qiime2-2018.4/lib/python3.5/site-packages/q2_types/per_sample_sequences/_transformer.py", line 288, in _fastq_manifest_helper
    fastq_copy_fn(input_fastq_fp, str(output_fastq_fp))
  File "/Users/faisal/miniconda2/envs/qiime2-2018.4/lib/python3.5/site-packages/q2_types/per_sample_sequences/_transformer.py", line 313, in _write_phred64_to_phred33
    variant='illumina1.3'):
  File "/Users/faisal/miniconda2/envs/qiime2-2018.4/lib/python3.5/site-packages/skbio/io/registry.py", line 1161, in read
    **kwargs)
  File "/Users/faisal/miniconda2/envs/qiime2-2018.4/lib/python3.5/site-packages/skbio/io/registry.py", line 506, in read
    return (x for x in itertools.chain([next(gen)], gen))
  File "/Users/faisal/miniconda2/envs/qiime2-2018.4/lib/python3.5/site-packages/skbio/io/registry.py", line 531, in _read_gen
    yield from reader(file, **kwargs)
  File "/Users/faisal/miniconda2/envs/qiime2-2018.4/lib/python3.5/site-packages/skbio/io/registry.py", line 1008, in wrapped_reader
    yield from reader_function(fhs[-1], **kwargs)
  File "/Users/faisal/miniconda2/envs/qiime2-2018.4/lib/python3.5/site-packages/skbio/io/format/fastq.py", line 354, in _fastq_to_generator
    qual_header)
  File "/Users/faisal/miniconda2/envs/qiime2-2018.4/lib/python3.5/site-packages/skbio/io/format/fastq.py", line 523, in _parse_quality_scores
    phred_offset=phred_offset))
  File "/Users/faisal/miniconda2/envs/qiime2-2018.4/lib/python3.5/site-packages/skbio/io/format/_base.py", line 34, in _decode_qual_to_phred
    % (phred_range[0], phred_range[1]))
ValueError: Decoded Phred score is out of range [0, 62].

An unexpected error has occurred:

  Decoded Phred score is out of range [0, 62].

See above for debug info.

I’m not sure which Phred encoding my data uses, but I’m glad it imports with Phred33 at least.

Thanks!

Great news! It sounds like the issue was just some empty cells in your original manifest file. You should carry on with your analyses using the Phred33 format; the error you get with Phred64 indicates that your data are not Phred64. Most modern Illumina machines produce Phred33 FASTQ files these days, but if you want to be sure, you can ask your sequencing facility. Wikipedia has a nice explanation of Phred scores and the history of their use if you are curious to learn more. Good luck with the rest of the analyses!
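A quick way to check the encoding yourself is to look at the lowest quality character. In Phred+64 a score of 0 is `@` (ASCII 64), so any character below that cannot be Phred+64. For example, `#` (ASCII 35) decodes to 35 - 33 = 2 under Phred+33 but to 35 - 64 = -29 under Phred+64, which falls outside the [0, 62] range in the error above. The warning in the traceback also shows that scikit-bio's own best guess was `illumina1.8`, i.e. Phred+33. The sketch below is a heuristic built on those assumptions, with hypothetical helper names, and is not how QIIME 2 or scikit-bio decide. A Phred+33 file whose qualities are all high could still be misread as Phred+64.

```python
import gzip
from itertools import islice


def quality_lines(path, n_records=1000):
    """Yield the quality string (every 4th line) of the first n_records reads.

    Assumes plain 4-line FASTQ records, gzip-compressed.
    """
    with gzip.open(path, 'rt') as fh:
        for i, line in enumerate(islice(fh, 4 * n_records)):
            if i % 4 == 3:
                yield line.rstrip('\n')


def guess_phred_offset(qual_strings):
    """Heuristic guess of the Phred offset (33 or 64) from quality strings.

    Phred+64 encodes a score of 0 as '@' (ASCII 64), so any character below
    ASCII 64 rules Phred+64 out. If none is found we only *guess* 64.
    """
    lowest = min(min(map(ord, q)) for q in qual_strings if q)
    return 33 if lowest < 64 else 64


# Usage against one of the files from the listing above:
# print(guess_phred_offset(quality_lines('m_28w/W28_J0545_S100_L001_R1_001.fastq.gz')))
```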