Import failure with the public Galaxy

I recently tried to use the public Galaxy server (https://cancer.usegalaxy.org/) to upload files and run "qiime tools import". I have experience with a natively installed QIIME 2, and in that case I imported successfully using the following command:

qiime tools import \
  --type 'SampleData[SequencesWithQuality]' \
  --input-path manifest.txt \
  --output-path single-end-demux.qza \
  --input-format SingleEndFastqManifestPhred33V2
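
For reference, a manifest.txt like the one passed to the command above could be generated with a short script. This is only a sketch: it assumes the V2 manifest layout of a tab-separated file with a `sample-id` column and an `absolute-filepath` column, and the `/data/fastq` directory and the `write_manifest` helper are hypothetical names used for illustration.

```python
import csv
import tempfile
from pathlib import Path


def write_manifest(samples, manifest_path):
    """Write a tab-separated manifest with one row per sample."""
    with open(manifest_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["sample-id", "absolute-filepath"])
        for sample_id, fastq_path in samples.items():
            # The manifest stores absolute paths, so resolve each one.
            writer.writerow([sample_id, str(Path(fastq_path).resolve())])


# Hypothetical local copies of the two files discussed in this thread.
samples = {
    "IBD177": "/data/fastq/1629.SubjectIBD177.fastq.gz",
    "IBD178": "/data/fastq/1629.SubjectIBD178.fastq.gz",
}

with tempfile.TemporaryDirectory() as tmp:
    manifest_path = Path(tmp) / "manifest.txt"
    write_manifest(samples, manifest_path)
    manifest_text = manifest_path.read_text(encoding="utf-8")

print(manifest_text)
```

The manifest itself is plain text; only the FASTQ files it points to are gzip-compressed.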

I would like to try the simple GUI on Galaxy, but I have not been able to upload simple fastq.gz files, either directly or using the URL method, as shown below:

IBD177 http://ftp.sra.ebi.ac.uk/vol1/run/ERR174/ERR1746515/1629.SubjectIBD177.fastq.gz
IBD178 http://ftp.sra.ebi.ac.uk/vol1/run/ERR174/ERR1746516/1629.SubjectIBD178.fastq.gz

Could you please let me know how to use public Galaxy to import FASTQ files?
Either method (using a URL or directly uploading a local file) is okay.

Sincerely,

@jaehyun.kim,
When using QIIME 2 locally, your files and QIIME 2 are already on the same machine. This is not the case with the Galaxy server, so you will need to perform the additional step of getting the files to the server. To do this, click the Upload Data button (denoted by #1 in the screenshot below). After your data is uploaded, you will need to import it just as you would when using QIIME 2 on your own machine, using qiime tools import at the bottom of the list of QIIME 2 tools (denoted by #2 in the screenshot).

Hope this gets you moving forwards!

Hi @Keegan-Evans . Thank you for your reply.

I basically followed the following tutorial page: Importing demultiplexed sequence data — QIIME 2 Cancer Microbiome Intervention Tutorial

  1. Using the Upload Data tool: On the fourth tab (Rule-based):
  • Set "Upload data as" to Collections
  • Set "Load tabular data from" to Pasted Table
  • Press the build button at the bottom

Then I built the rules according to the instructions.

  2. Using the qiime2 tools import tool: (I think this step is somewhat problematic)
  • Set “Type of data to import” to SampleData[SequencesWithQuality]
  • Set “QIIME 2 file format to import from” to Single End FASTQ Manifest Phred33V2
  • For import_sequences, do the following:
    • Leave “Select a mechanism” as Use collection to import
    • Set “elements” to #: data_to_import:sequence
    • Leave “Append an extension?” as No.
  • Press the Execute button.

But it failed with the following error. I also tried the other way around (first download the fastq.gz from the SRA, then upload from the local directory), but it failed again.

Unexpected error importing data:
/mnt/efs/fs1/cancer-usegalaxy-shared/database/datasets/062/dataset_62594.dat is not a(n)
SingleEndFastqManifestPhred33V2 file:Metadata file must be encoded as UTF-8 or ASCII. The following error occurred when decoding

Could you please let me know how to create qza files from the fastq.gz files using the Galaxy server?

@jaehyun.kim,

SingleEndFastqManifestPhred33V2 expects a single file containing the data, whereas a collection like the one you are trying to upload is a directory (folder) of data. Instead, you can follow the example below to import the collection:

By selecting the Associate Individual Files option, you give your files new names that follow the filename conventions required by the Casava One Eight format. In the included screenshot, you can see an example of what this should look like; just replace the generic terms with the corresponding terms for the data in each file and you should be good to go. If you had many more files to import, it might be worth using the Use collection to import option, but in your case I think associating the individual files would be fastest.

I offer the following only in the interest of getting questions answered more quickly in the future: it would be helpful if you could post the full output of the error message you received. It looks like you posted a screenshot here, which cut off the end of the message. You can always post large blocks of command line text inside a "Preformatted Text Block" (use the image button).

Hi @Keegan-Evans. Thank you for your reply. It seems to be working now.

The full output of the error was as below. It seems like something went wrong with the UTF-8:

Unexpected error importing data:
/mnt/efs/fs1/cancer-usegalaxy-shared/database/datasets/062/dataset_62177.dat is not a(n) SingleEndFastqManifestPhred33V2 file:
Metadata file must be encoded as UTF-8 or ASCII. The following error occurred when decoding the file:
'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
There may be more errors present in the metadata file. To get a full report, sample/feature metadata files can be validated with Keemei: https://keemei.qiime2.org
Find details on QIIME 2 metadata requirements here: https://docs.qiime2.org/2022.11/tutorials/metadata/

:(
Traceback (most recent call last):
  File "/opt/conda/envs/qiime2-2022.11/lib/python3.8/site-packages/qiime2/metadata/io.py", line 73, in read
    header = self._read_header()
  File "/opt/conda/envs/qiime2-2022.11/lib/python3.8/site-packages/qiime2/metadata/io.py", line 145, in _read_header
    for row in self._reader:
  File "/opt/conda/envs/qiime2-2022.11/lib/python3.8/site-packages/qiime2/metadata/io.py", line 71, in <genexpr>
    self._reader = (self._strip_cell_whitespace(row)
  File "/opt/conda/envs/qiime2-2022.11/lib/python3.8/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
  File "/opt/conda/envs/qiime2-2022.11/lib/python3.8/encodings/utf_8_sig.py", line 69, in _buffer_decode
    return codecs.utf_8_decode(input, errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/envs/qiime2-2022.11/lib/python3.8/site-packages/q2_types/per_sample_sequences/_format.py", line 40, in _validate_
    md = qiime2.Metadata.load(str(self))
  File "/opt/conda/envs/qiime2-2022.11/lib/python3.8/site-packages/qiime2/metadata/metadata.py", line 396, in load
    return MetadataReader(filepath).read(
  File "/opt/conda/envs/qiime2-2022.11/lib/python3.8/site-packages/qiime2/metadata/io.py", line 84, in read
    raise MetadataFileError(
qiime2.metadata.io.MetadataFileError: Metadata file must be encoded as UTF-8 or ASCII. The following error occurred when decoding the file:

'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

There may be more errors present in the metadata file. To get a full report, sample/feature metadata files can be validated with Keemei: https://keemei.qiime2.org

Find details on QIIME 2 metadata requirements here: https://docs.qiime2.org/2022.11/tutorials/metadata/

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/envs/qiime2-2022.11/lib/python3.8/site-packages/qiime2/plugin/model/file_format.py", line 26, in validate
    self._validate_(level)
  File "/opt/conda/envs/qiime2-2022.11/lib/python3.8/site-packages/q2_types/per_sample_sequences/_format.py", line 42, in _validate_
    raise ValidationError(md_exc) from md_exc
qiime2.core.exceptions.ValidationError: Metadata file must be encoded as UTF-8 or ASCII. The following error occurred when decoding the file:

'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

There may be more errors present in the metadata file. To get a full report, sample/feature metadata files can be validated with Keemei: https://keemei.qiime2.org

Find details on QIIME 2 metadata requirements here: https://docs.qiime2.org/2022.11/tutorials/metadata/

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/envs/qiime2-2022.11/bin/q2galaxy", line 11, in <module>
    sys.exit(root())
  File "/opt/conda/envs/qiime2-2022.11/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/envs/qiime2-2022.11/lib/python3.8/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/opt/conda/envs/qiime2-2022.11/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/envs/qiime2-2022.11/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/envs/qiime2-2022.11/lib/python3.8/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/opt/conda/envs/qiime2-2022.11/lib/python3.8/site-packages/q2galaxy/__main__.py", line 96, in run
    builtin_runner(action, config)
  File "/opt/conda/envs/qiime2-2022.11/lib/python3.8/site-packages/q2galaxy/core/drivers/builtins.py", line 24, in builtin_runner
    tool(inputs, stdio=stdio)
  File "/opt/conda/envs/qiime2-2022.11/lib/python3.8/site-packages/q2galaxy/core/drivers/builtins.py", line 43, in import_data
    artifact = _import_name_data(type_, format_, files_to_move,
  File "/opt/conda/envs/qiime2-2022.11/lib/python3.8/site-packages/q2galaxy/core/drivers/stdio.py", line 38, in wrapped
    return function(*args, **kwargs)
  File "/opt/conda/envs/qiime2-2022.11/lib/python3.8/site-packages/q2galaxy/core/drivers/builtins.py", line 79, in _import_name_data
    return qiime2.Artifact.import_data(type_, path, view_type=format_)
  File "/opt/conda/envs/qiime2-2022.11/lib/python3.8/site-packages/qiime2/sdk/result.py", line 321, in import_data
    return cls._from_view(type_, view, view_type, provenance_capture,
  File "/opt/conda/envs/qiime2-2022.11/lib/python3.8/site-packages/qiime2/sdk/result.py", line 349, in _from_view
    result = transformation(view, validate_level)
  File "/opt/conda/envs/qiime2-2022.11/lib/python3.8/site-packages/qiime2/core/transform.py", line 68, in transformation
    self.validate(view, level=validate_level)
  File "/opt/conda/envs/qiime2-2022.11/lib/python3.8/site-packages/qiime2/core/transform.py", line 143, in validate
    view.validate(level)
  File "/opt/conda/envs/qiime2-2022.11/lib/python3.8/site-packages/qiime2/plugin/model/file_format.py", line 28, in validate
    raise ValidationError(
qiime2.core.exceptions.ValidationError: /mnt/efs/fs1/cancer-usegalaxy-shared/database/datasets/062/dataset_62177.dat is not a(n) SingleEndFastqManifestPhred33V2 file:

Metadata file must be encoded as UTF-8 or ASCII. The following error occurred when decoding the file:

'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

There may be more errors present in the metadata file. To get a full report, sample/feature metadata files can be validated with Keemei: https://keemei.qiime2.org

Find details on QIIME 2 metadata requirements here: https://docs.qiime2.org/2022.11/tutorials/metadata/
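
A note on the decode error above: gzip-compressed files begin with the bytes 0x1f 0x8b, so a UTF-8 failure on byte 0x8b at position 1 is what one would expect if a fastq.gz file itself were read as the text manifest. This is an interpretation of the traceback, not something confirmed in the thread. A minimal Python sketch of the check, where the helper names and the sample data are made up for illustration:

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def looks_gzipped(data):
    """Return True if the bytes start with the gzip magic number."""
    return data[:2] == GZIP_MAGIC


def utf8_decode_problem(data):
    """Return (position, byte) of the first undecodable byte, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start, data[exc.start]
    return None


# Made-up stand-ins for a compressed FASTQ file and a text manifest.
fastq_gz = gzip.compress(b"@read1\nACGT\n+\nIIII\n")
manifest = b"sample-id\tabsolute-filepath\nIBD177\t/data/IBD177.fastq.gz\n"

print(looks_gzipped(fastq_gz), utf8_decode_problem(fastq_gz))
print(looks_gzipped(manifest), utf8_decode_problem(manifest))
```

For the compressed bytes, the first undecodable byte is 0x8b at position 1, the same position and value reported in the traceback.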

After following your instructions, I successfully imported the FASTQ.GZ files. I still have one more question.

In your screenshot, there's an 'Add Elements' section where you specified the name as 'sampleID_barcode1_R1_001.fastq.gz'. However, the sample names in my case (IBD177, IBD178) and their corresponding URLs do not adhere to that particular format. Despite the discrepancy between the sample's name and the specified rules, I proceeded with the input based on your screenshot. What would be the correct approach in this situation?

That is just how the Casava format expects the files to be named. You could replace SampleID with your sample name and take the barcode from your metadata, or, I think, probably just make something up. The import tool should rename the files to whatever you put here, and the sample names should already be associated from the upload step.

You just need to make sure that whatever you put matches this regular expression: r'.+_.+_L[0-9][0-9][0-9]_R[12]_001\.fastq\.gz', so that it can be validly stored in the Casava format. You could use a regular expression checker (like Pythex, though their SSL certificate seems to be stale right now, so you may want to find a different one) to verify that the names you would like to use will work before trying the import and spending time on it.
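
If an online checker is not available, the same check can be run locally with Python's re module. The sketch below uses the expression quoted above and assumes the whole filename has to match it; the `casava_name` helper and its placeholder barcode and lane values are hypothetical, not part of QIIME 2 or Galaxy.

```python
import re

# The expression quoted above, compiled once for reuse.
CASAVA_NAME = re.compile(r'.+_.+_L[0-9][0-9][0-9]_R[12]_001\.fastq\.gz')


def casava_name(sample_id, barcode="barcode1", lane=1, read=1):
    """Build a Casava-style filename; barcode and lane are placeholders."""
    return f"{sample_id}_{barcode}_L{lane:03d}_R{read}_001.fastq.gz"


def is_valid_casava_name(filename):
    """Return True if the whole filename matches the expression."""
    return CASAVA_NAME.fullmatch(filename) is not None


for sample_id in ("IBD177", "IBD178"):
    name = casava_name(sample_id)
    print(name, is_valid_casava_name(name))

# A name without the lane field (L followed by three digits) does not match.
print(is_valid_casava_name("IBD177_barcode1_R1_001.fastq.gz"))
```

Note that the expression requires a lane field such as L001 between the barcode and the read number, so it is worth checking candidate names for that part in particular.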

Thank you so much. I have an additional question.
Thanks to your advice, I completed the import step and proceeded to the next stage, running qiime2 demux summarize. However, when I attempted to view the generated .qzv file for all samples using Q2View, I encountered an error message stating "Not a .qza/.qzv file," as shown in the image below.

While I was able to view the .qzv files for the individual samples (IBD177, IBD178) as shown below, I am interested in viewing the interactive quality plot for all samples collectively.

Previously, when I used QIIME 2 with Python to execute demux summarize, it would generate a single .qzv file even when multiple samples were present, allowing me to access the interactive quality plot for all of them. Is it not possible to view the interactive quality plot for all samples when conducting the analysis on the Galaxy server?

Thank you!

Hello @jaehyun.kim. Can you send screenshots of the exact import command you ran and the exact demux summarize command you ran? I am just trying to get a good idea of exactly what happened. Additionally, you didn't run anything between importing and summarizing, correct? Thank you.

Hi @Oddant1.
I started over from the beginning by deleting the previous history and proceeded with a fresh upload. I followed the exact same process as before. As shown in the screen capture below, I performed the upload step, but this time even the data_to_import:sequence was not created properly. What's really strange is that in the previous attempt, everything went smoothly up to this point. However, this time, even the upload step didn't work correctly. I'm not sure of the exact reason behind this issue.


I tried to investigate why the data_to_import:sequence was not created, but unfortunately, I couldn't find any error message to provide a specific reason. The only thing I noticed was an empty list without any additional information.

Thank you!

@jaehyun.kim, that is strange. Your upload looks fine. It looks like you tried multiple times and it failed every time. It is possible this was just a transient error caused by the servers hosting your data being temporarily down, or by the Galaxy server itself having issues. I was able to perform the upload successfully just now. Try again now that a few days have passed and see if it works for you. If there were any server-related issues, they should be resolved by now.

Thank you @Oddant1. Yes, this is strange. But now it worked.
After importing the fastq.gz files, I followed the same procedure as before using qiime2 tools import and qiime2 demux summarize to generate qzv files. The execution screens provided below indicate that everything executed successfully.


I would like to obtain a single qzv file for multiple samples, similar to when I generated qzv files using Python. However, the qzv file obtained from Galaxy, as shown in the image below, combines the IBD177 sample and the IBD178 sample into a single entity.

I'm unsure how to obtain a single quality plot (qzv) for more than two samples.
Thank you!

Hello @jaehyun.kim, instead of importing the collection, try creating two different elements as shown in this screenshot. You should be able to get the (hidden) IBD177 and 178 by selecting "single dataset" as your type then clicking on the little folder on the right. That should open a menu that lets you select individual elements from collections you have uploaded.

If you import in this manner and then run demux summarize, it should more closely mimic what you did on the CLI.

Thank you @Oddant1. This exactly mimics what I did on the CLI. Thank you so much for your support!
