Imported .qza file differs between the Linux and Mac (Intel) versions

Hi!

When I started working with raw sequences downloaded from NCBI, I ran into a problem with the .qza files produced by the import step.

THE CODE on my Linux server (Ubuntu):
[version 2024.2, installed with Anaconda from the yml file]

  1. fasterq-dump --split-3 SRR15347540
  2. time qiime tools import \
       --type 'SampleData[PairedEndSequencesWithQuality]' \
       --input-path SRR15347540.txt \
       --output-path SRR15347540.qza \
       --input-format PairedEndFastqManifestPhred33V2
  3. time qiime demux summarize --i-data SRR15347540.qza --o-visualization SRR15347540.qzv
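
For context, SRR15347540.txt is the tab-separated manifest given to --input-path. Below is a minimal sketch of how such a file could be generated, assuming fasterq-dump wrote SRR15347540_1.fastq and SRR15347540_2.fastq into the current directory. The `write_manifest` helper is a made-up name for illustration and is not part of QIIME 2.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a PairedEndFastqManifestPhred33V2 manifest
# for one run accession whose reads live in directory $2 (default: $PWD).
write_manifest() {
    local acc="$1"
    local dir="${2:-$PWD}"
    # Column names used by the V2 paired-end manifest format.
    printf 'sample-id\tforward-absolute-filepath\treverse-absolute-filepath\n'
    # Assumes the <accession>_1.fastq / <accession>_2.fastq naming from --split-3.
    printf '%s\t%s/%s_1.fastq\t%s/%s_2.fastq\n' "$acc" "$dir" "$acc" "$dir" "$acc"
}

write_manifest SRR15347540 > SRR15347540.txt
```

The paths in the manifest have to be absolute, which is why the sketch defaults to $PWD rather than a relative directory.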

I ran THE SAME CODE on my Mac.

However, when I checked SRR15347540.qzv on QIIME 2 View, the Interactive Quality Plot looked very different between the two systems.
On Mac:

On Linux:

I was curious about this and checked the size of each .qza file: 9 MB for the Linux version and 33 MB for the macOS version.

I have no idea what causes this, and I need correct base quality scores for the downstream DADA2 analysis.

I would appreciate any advice you can give me.
Sorry for bothering you!
Thanks so much!

Hi @Jyi_Y,
Would you mind sending the visualizations for both of these runs? I want to peek around and look at the provenance.

Of course! I will send them here.
Thanks for your help. I am now trying to reinstall QIIME 2 on my Linux server.
SRR15347540_Linux.qzv (311.2 KB)

SRR15347540_mac.qzv (314.9 KB)

Hi @Jyi_Y,
This is certainly weird!
Would you mind sending me your SRR15347540.qza files? I am going to try to replicate this.

Sure, but the .qza files seem to exceed the upload size limit, so I have uploaded them to my GitHub repository: Jyi-Yang/Qiime2data_qza
Thank you!


Hi @Jyi_Y,
Those demux files are different sizes. Are you expecting that?

Thanks for your help. No, I did not expect that. The input fastq files are the same, so I want to know why the results differ and how to fix the problem.
Have you ever run into this situation?
Unfortunately, I have not been able to install QIIME 2 again since I removed the previous installation, even after reinstalling conda.
I am still trying.
Thanks again.

Hello @Jyi_Y,

It appears as though something went wrong with the actual .fastq files in your Linux .qza. The left is Linux and the right is Mac. As you can see, the quality scores in the Linux file were replaced with nothing but ?. This is why the mac file is much larger when zipped. The strings of ? in the Linux file compress much better than the actual scores in the Mac file.



As for how or why this happened, I can't say. Maybe it has something to do with fasterq-dump? I'm unfamiliar with that utility. I would say based on what I'm seeing here the results you are seeing on Mac are probably legitimate and something went very wrong on Linux.
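
If you want to check a file for this yourself, something along these lines might work. It is only a rough sketch: `count_quality_chars` is a made-up helper, and the two tiny FASTQ files are invented stand-ins for your Linux and Mac reads. It counts how many distinct characters appear on the quality lines of an uncompressed FASTQ file with four lines per record.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count the distinct characters used on the quality
# lines (every 4th line) of an uncompressed, four-line-per-record FASTQ file.
count_quality_chars() {
    awk 'NR % 4 == 0 {
             n = length($0)
             for (i = 1; i <= n; i++) seen[substr($0, i, 1)] = 1
         }
         END { c = 0; for (k in seen) c++; print c }' "$1"
}

# Tiny made-up records standing in for the two downloads.
tmpdir=$(mktemp -d)
printf '@r1\nACGT\n+\n????\n@r2\nTTGA\n+\n????\n' > "$tmpdir/lite.fastq"
printf '@r1\nACGT\n+\nFF:#\n@r2\nTTGA\n+\nF,FF\n' > "$tmpdir/full.fastq"

count_quality_chars "$tmpdir/lite.fastq"   # prints 1
count_quality_chars "$tmpdir/full.fastq"   # prints 4
```

A result of 1 would mean every base has the same quality character, which matches what I see in the Linux file. The reads inside a .qza are gzipped, so they would need to be decompressed before running a check like this.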

You said you are unable to reinstall QIIME 2. Can you create a separate thread about that? Tell us which OS you are trying to reinstall it on, what version you are trying to install, and what error you're getting.

Thank you.


Thanks so much!
I will give it another try. I want to reinstall QIIME 2 on Linux. Although I found some solutions on this forum, I could not get them to work: conda install fails on bioconda::bioconductor-genomeinfodbdata-1.2.9. So I tried removing that package from the yml file and installing it separately.
That really helps, and I will check and compare the raw sequences obtained with fasterq-dump and fastq-dump on Linux.
I appreciate all of your time and help.

The error is shown here:
Downloading and Extracting Packages:

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
ERROR conda.core.link:_execute(945): An error occurred while installing package 'bioconda::bioconductor-genomeinfodbdata-1.2.9-r42hdfd78af_0'.
Rolling back transaction: done
class: LinkError
message:
post-link script failed for package bioconda::bioconductor-genomeinfodbdata-1.2.9-r42hdfd78af_0
location of failed script: /home/YangJingyi/anaconda3/envs/qiime2-amplicon-2024.2/bin/.bioconductor-genomeinfodbdata-post-link.sh
==> script messages <==

==> script output <==
stdout:
stderr: QIIME is caching your current deployment for improved performance. This may take a few moments and should only happen once per deployment.
++ dirname -- /home/YangJingyi/anaconda3/envs/qiime2-amplicon-2024.2/bin/installBiocDataPackage.sh
+ SCRIPT_DIR=/home/YangJingyi/anaconda3/envs/qiime2-amplicon-2024.2/bin/../share/bioconductor-data-packages
+ json=/home/YangJingyi/anaconda3/envs/qiime2-amplicon-2024.2/bin/../share/bioconductor-data-packages/dataURLs.json
++ yq '."genomeinfodbdata-1.2.9".fn' /home/YangJingyi/anaconda3/envs/qiime2-amplicon-2024.2/bin/../share/bioconductor-data-packages/dataURLs.json
+ FN='"GenomeInfoDbData_1.2.9.tar.gz"'
+ IFS=
+ read -r value
++ yq '."genomeinfodbdata-1.2.9".urls' /home/YangJingyi/anaconda3/envs/qiime2-amplicon-2024.2/bin/../share/bioconductor-data-packages/dataURLs.json
+ URLS+=($value)
+ IFS=
+ read -r value
+ URLS+=($value)
+ IFS=
+ read -r value
+ URLS+=($value)
+ IFS=
+ read -r value
++ yq '."genomeinfodbdata-1.2.9".md5' /home/YangJingyi/anaconda3/envs/qiime2-amplicon-2024.2/bin/../share/bioconductor-data-packages/dataURLs.json
+ MD5='"7cc138cfb74665fdfa8d1c244eac4879"'
+ STAGING=/home/YangJingyi/anaconda3/envs/qiime2-amplicon-2024.2/share/genomeinfodbdata-1.2.9
+ mkdir -p /home/YangJingyi/anaconda3/envs/qiime2-amplicon-2024.2/share/genomeinfodbdata-1.2.9
+ TARBALL='/home/YangJingyi/anaconda3/envs/qiime2-amplicon-2024.2/share/genomeinfodbdata-1.2.9/"GenomeInfoDbData_1.2.9.tar.gz"'
+ SUCCESS=0
+ for URL in ${URLS[@]}
++ echo '"https://bioconductor.org/packages/3.16/data/annotation/src/contrib/GenomeInfoDbData_1.2.9.tar.gz"'
++ tr -d '"'
+ URL=https://bioconductor.org/packages/3.16/data/annotation/src/contrib/GenomeInfoDbData_1.2.9.tar.gz
++ tr -d '"'
++ echo '"7cc138cfb74665fdfa8d1c244eac4879"'
+ MD5=7cc138cfb74665fdfa8d1c244eac4879
+ curl -L https://bioconductor.org/packages/3.16/data/annotation/src/contrib/GenomeInfoDbData_1.2.9.tar.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   415  100   415    0     0    375      0  0:00:01  0:00:01 --:--:--   375
 40 11.1M   40 4657k    0     0  11872      0  0:16:27  0:06:41  0:09:46 11701
curl: (18) transfer closed with 6955914 bytes remaining to read

return code: 18
kwargs:
{}

: <exception str() failed>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/YangJingyi/anaconda3/bin/conda", line 13, in <module>
    sys.exit(main())
    ^^^^^^
  File "/home/YangJingyi/anaconda3/lib/python3.11/site-packages/conda/cli/main.py", line 128, in main
    return conda_exception_handler(main, *args, **kwargs)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/YangJingyi/anaconda3/lib/python3.11/site-packages/conda/exception_handler.py", line 388, in conda_exception_handler
    return_value = exception_handler(func, *args, **kwargs)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/YangJingyi/anaconda3/lib/python3.11/site-packages/conda/exception_handler.py", line 20, in __call__
    return self.handle_exception(exc_val, exc_tb)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/YangJingyi/anaconda3/lib/python3.11/site-packages/conda/exception_handler.py", line 62, in handle_exception
    return self.handle_application_exception(exc_val, exc_tb)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/YangJingyi/anaconda3/lib/python3.11/site-packages/conda/exception_handler.py", line 78, in handle_application_exception
    self._print_conda_exception(exc_val, exc_tb)
  File "/home/YangJingyi/anaconda3/lib/python3.11/site-packages/conda/exception_handler.py", line 84, in _print_conda_exception
    print_conda_exception(exc_val, exc_tb)
  File "/home/YangJingyi/anaconda3/lib/python3.11/site-packages/conda/exceptions.py", line 1258, in print_conda_exception
    stderrlog.error("\n%r\n", exc_val)
  File "/home/YangJingyi/anaconda3/lib/python3.11/logging/__init__.py", line 1518, in error
    self._log(ERROR, msg, args, **kwargs)
  File "/home/YangJingyi/anaconda3/lib/python3.11/logging/__init__.py", line 1634, in _log
    self.handle(record)
  File "/home/YangJingyi/anaconda3/lib/python3.11/logging/__init__.py", line 1643, in handle
    if (not self.disabled) and self.filter(record):
    ^^^^^^^^^^^^^^^^^^^
  File "/home/YangJingyi/anaconda3/lib/python3.11/logging/__init__.py", line 830, in filter
    result = f.filter(record)
    ^^^^^^^^^^^^^^^^
  File "/home/YangJingyi/anaconda3/lib/python3.11/site-packages/conda/gateways/logging.py", line 65, in filter
    record.msg = record.msg % new_args
    ~~~^~
  File "/home/YangJingyi/anaconda3/lib/python3.11/site-packages/conda/__init__.py", line 104, in __repr__
    errs.append(e.__repr__())
    ^^^^^^^^^^^^
  File "/home/YangJingyi/anaconda3/lib/python3.11/site-packages/conda/__init__.py", line 58, in __repr__
    return f"{self.__class__.__name__}: {self}"
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/YangJingyi/anaconda3/lib/python3.11/site-packages/conda/__init__.py", line 62, in __str__
    return str(self.message % self._kwargs)
    ~^~
ValueError: unsupported format character 'T' (0x54) at index 2212


Hi!
First, thank you all for your help. I have solved both problems.

  1. For the QIIME 2 reinstallation, I changed the order of the channels in the .yml file, which resolved this error:
    ERROR conda.core.link:_execute(945): An error occurred while installing package 'bioconda::bioconductor-genomeinfodbdata-1.2.9-r42hdfd78af_0'.
    Rolling back transaction: done
    class: LinkError

  2. As for the different results:
    They were caused by different default settings in sratoolkit on Linux and macOS, so the two systems dumped different versions of the run:

  • on Mac - SRA Normalized Format files with full base quality scores,
  • on Linux - SRA Lite files with simplified base quality scores.

Solution:
To download only SRA Normalized Format files with full base quality scores, set NCBI_VDB_QUALITY=R before the command, for example:
NCBI_VDB_QUALITY=R fasterq-dump --split-3 SRR25305389
This forces sratoolkit to use only files with full quality scores.
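
For several runs, the same setting could be wrapped in a small function. This is only a sketch of how I might do it: `dump_full_quality` is a name I made up, and it simply repeats the command above for each accession.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run fasterq-dump for each accession with
# NCBI_VDB_QUALITY=R set only for that command, as in the solution above.
dump_full_quality() {
    local acc
    for acc in "$@"; do
        # Stop at the first failed download so the error is not missed.
        NCBI_VDB_QUALITY=R fasterq-dump --split-3 "$acc" || return 1
    done
}

# Example call (needs sratoolkit on PATH and network access):
# dump_full_quality SRR15347540 SRR25305389
```

Setting the variable inline keeps it scoped to each fasterq-dump call instead of changing the whole shell session.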

Thanks for all of your help there!
