Sidle reconstruct-database

Hello,

I've been trying to reconstruct a V1V9 count table from 6 different hypervariable regions.
I ran into some issues with process parallelization through Dask, but I managed to get around them by running without it.
I then encountered an error when running:

qiime sidle reconstruct-database --p-debug \
	--p-region V1V2 --i-kmer-map $db_kmer_folder/sidle-db-V1V2-$kmer_length\nt-map.qza --i-regional-alignment $home_dir/denoised_V1V2/alignment/V1V2-align-map.qza \
	--p-region V2V3 --i-kmer-map $db_kmer_folder/sidle-db-V2V3-$kmer_length\nt-map.qza --i-regional-alignment $home_dir/denoised_V2V3/alignment/V2V3-align-map.qza \
	--p-region V3V4 --i-kmer-map $db_kmer_folder/sidle-db-V3V4-$kmer_length\nt-map.qza --i-regional-alignment $home_dir/denoised_V3V4/alignment/V3V4-align-map.qza \
	--p-region V4V5 --i-kmer-map $db_kmer_folder/sidle-db-V4V5-$kmer_length\nt-map.qza --i-regional-alignment $home_dir/denoised_V4V5/alignment/V4V5-align-map.qza \
	--p-region V5V7 --i-kmer-map $db_kmer_folder/sidle-db-V5V7-$kmer_length\nt-map.qza --i-regional-alignment $home_dir/denoised_V5V7/alignment/V5V7-align-map.qza \
	--p-region V7V9 --i-kmer-map $db_kmer_folder/sidle-db-V7V9-$kmer_length\nt-map.qza --i-regional-alignment $home_dir/denoised_V7V9/alignment/V7V9-align-map.qza \
	--o-database-map $home_dir/reconstruction/V1V9_map.qza --o-database-summary $home_dir/reconstruction/V1V9_summary.qza

The error message is the following:

Plugin error from sidle:

  New division must be list or tuple

I tried several ways of arranging the inputs, such as passing all inputs of the same kind as a list instead of repeating the parameter for each file:

qiime sidle reconstruct-database --p-debug --p-region V1V2 V2V3 V3V4 V4V5 V5V7 V7V9 \
--i-kmer-map $db_kmer_folder/sidle-db-V1V2-$kmer_length\nt-map.qza $db_kmer_folder/sidle-db-V2V3-$kmer_length\nt-map.qza $db_kmer_folder/sidle-db-V3V4-$kmer_length\nt-map.qza $db_kmer_folder/sidle-db-V4V5-$kmer_length\nt-map.qza $db_kmer_folder/sidle-db-V5V7-$kmer_length\nt-map.qza $db_kmer_folder/sidle-db-V7V9-$kmer_length\nt-map.qza \
--i-regional-alignment $home_dir/denoised_V1V2/alignment/V1V2-align-map.qza $home_dir/denoised_V2V3/alignment/V2V3-align-map.qza $home_dir/denoised_V3V4/alignment/V3V4-align-map.qza $home_dir/denoised_V4V5/alignment/V4V5-align-map.qza $home_dir/denoised_V5V7/alignment/V5V7-align-map.qza $home_dir/denoised_V7V9/alignment/V7V9-align-map.qza \
--o-database-map $home_dir/reconstruction/V1V9_map.qza \
--o-database-summary $home_dir/reconstruction/V1V9_summary.qza

But the error message is the same as before.

I then checked the files for corruption, and all of them passed validation at the max level with qiime tools validate.
I don't know where the problem lies, and I could not find anything similar in previous discussions on the q2-sidle plugin.
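
A check of this kind can be sketched as follows. This is a rough reconstruction under assumptions, not the exact script: the two folder values are placeholders, the braces around kmer_length are just an alternative spelling of the same file names, and the qiime call is skipped when the CLI is not on the PATH.

```shell
# Placeholder locations (assumptions); override them by exporting the variables.
kmer_length=300
db_kmer_folder=${db_kmer_folder:-/path/to/db}
home_dir=${home_dir:-/path/to/project}

regions="V1V2 V2V3 V3V4 V4V5 V5V7 V7V9"
checked=0

for region in $regions; do
  # Each region contributes one kmer map and one regional alignment.
  for artifact in \
    "$db_kmer_folder/sidle-db-$region-${kmer_length}nt-map.qza" \
    "$home_dir/denoised_$region/alignment/$region-align-map.qza"
  do
    checked=$((checked + 1))
    if command -v qiime >/dev/null 2>&1; then
      # Full validation of the artifact contents.
      qiime tools validate "$artifact" --level max
    else
      # No QIIME 2 on this machine: only show what would be checked.
      printf 'would validate: %s\n' "$artifact"
    fi
  done
done

printf 'artifacts considered: %s\n' "$checked"
```

With six regions and two artifacts per region, the loop visits twelve files.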

I would be really grateful if someone could share their opinion on this issue!

P.S.: Since installing the QIIME 2 amplicon distribution (2024.5), I get the following warning when using the q2-sidle plugin:

.../user/.conda/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/dask/dataframe/_pyarrow_compat.py:15: FutureWarning: Minimal version of pyarrow will soon be increased to 14.0.1. You are using 8.0.0. Please consider upgrading.
  warnings.warn(

It seems I can't update pyarrow, which I suspect is due to the installed pyarrow-hotfix package.
I don't think this is the reason behind the issue, as it is a warning and not an error, but I wanted to let you know about it!

Thank you,
Andrea Gatti

Hello @Andrea_G,

It is possible that this plugin error could be related to the required input for --i-regional-alignment or --i-kmer-map, which should be a list. Have you checked if the last backslash in your kmer-map qza file paths (e.g., \nt-map.qza) might be causing a problem in this context?

Andrés

Thank you @andresarroyo for the quick response!
I'm working in an HPC environment and all the operations are sent through SLURM jobs, so I set the variable $kmer_length=300 in the script.
Anyway, I re-submitted the job after replacing the variable reference in the file names with the actual number (300 in this case).
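
For reference, here is a minimal sketch of how the shell expands that part of the path. The folder value is a made-up placeholder, not my real path. As far as I understand, an unquoted backslash before an ordinary character just yields that character, so both spellings should produce the same file name:

```shell
# Hypothetical values for illustration only.
kmer_length=300
db_kmer_folder=/tmp/db

# Unquoted "\n" is a literal "n" here, not a newline; the backslash only
# stops the shell from reading "kmer_lengthnt" as the variable name.
with_backslash=$db_kmer_folder/sidle-db-V1V2-$kmer_length\nt-map.qza

# Braces mark the end of the variable name without any backslash.
with_braces="${db_kmer_folder}/sidle-db-V1V2-${kmer_length}nt-map.qza"

printf '%s\n' "$with_backslash"
printf '%s\n' "$with_braces"
```

Both lines print the same path, which suggests the backslash is harmless as long as the path is not wrapped in quotes.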

The error message is still the same...

All the inputs I'm trying to use are obtained following the documentation, step by step.
Do you perhaps have other ideas/suggestions?

Thank you so much,
Andrea

Additional information: I retried both approaches from the first post with the variable replaced by "300" in the file paths, without success...
In addition, I've had no problems running the previous step:

for i in V1V2 V2V3 V3V4 V4V5 V5V7 V7V9
do

output_dir=$home_dir/denoised_$i
mkdir -p $output_dir/alignment

qiime sidle align-regional-kmers \
--p-debug \
--i-kmers $db_kmer_folder/sidle-db-$i\-$kmer_length\nt-kmers.qza \
--i-rep-seq $output_dir/rep-seqs-$kmer_length\nt.qza \
--p-max-mismatch $mismatch \
--p-region $i \
--o-regional-alignment $output_dir/alignment/$i-align-map.qza

done

Hi @Andrea_G,

Could you please provide more details about the error message and version? Could you run in verbose mode? This is a Python type error, so it would be helpful to see where in the code the problem is.

Thanks,
Justine

Hi @jwdebelius,

I re-submitted the job using full paths to the files, to rule out any typos, both with and without parallel processing.
The error is below:

/adgatti/.conda/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/dask/dataframe/_pyarrow_compat.py:15: FutureWarning: Minimal version of pyarrow will soon be increased to 14.0.1. You are using 8.0.0. Please consider upgrading.
  warnings.warn(
Traceback (most recent call last):
  File "/adgatti/.conda/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/q2cli/commands.py", line 520, in __call__
    results = self._execute_action(
  File "/adgatti/.conda/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/q2cli/commands.py", line 581, in _execute_action
    results = action(**arguments)
  File "<decorator-gen-680>", line 2, in reconstruct_database
  File "/adgatti/.conda/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/qiime2/sdk/action.py", line 342, in bound_callable
    outputs = self._callable_executor_(
  File "/adgatti/.conda/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/qiime2/sdk/action.py", line 576, in _callable_executor_
    output_views = self._callable(**view_args)
  File "/adgatti/.conda/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/q2_sidle/_build_database.py", line 93, in reconstruct_database
    mapped_kmer = mapped_kmer.repartition(npartitons)
  File "/adgatti/.conda/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/dask_expr/_collection.py", line 1358, in repartition
    check_divisions(divisions)
  File "/adgatti/.conda/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/dask/dataframe/core.py", line 7639, in check_divisions
    raise ValueError("New division must be list or tuple")
ValueError: New division must be list or tuple

Plugin error from sidle:

  New division must be list or tuple

See above for debug info.

I'm working in Debian 4.19.194-3, and I have installed the qiime2-amplicon-2024.5 distribution in a conda environment with the following packages:

dask                      2024.8.0           pyhd8ed1ab_0    conda-forge
dask-core                 2024.8.0           pyhd8ed1ab_0    conda-forge
dask-expr                 1.1.10             pyhd8ed1ab_0    conda-forge
rescript                  2024.5.1         py39hd35c8a2_0    https://packages.qiime2.org/qiime2/2024.5/amplicon/released
q2-sidle                  2021.2.dev0              pypi_0    pypi
pyarrow                   8.0.0            py39h992f0b0_0
pyarrow-hotfix            0.6                pyhd8ed1ab_0    conda-forge

Thank you,
Andrea

Hi @Andrea_G,

It looks like this is a change associated with Dask and how things run in parallel. I need some time to do more testing. Can you give me a week to look into some things and follow up?

Sorry I don't have an immediate fix.

Best,
Justine

Thank you for the response!
Take your time, of course! In the meantime, I might try another conda environment in which I installed qiime2 (v2022.2) along with the required dask package and the sidle/rescript plugins.
If the error is related to dask, then I hope an earlier version will help.

Thank you for your time! :pray:

Andrea Gatti

Hello again @Andrea_G,

A few weeks ago, I encountered a different problem with Dask (error: P2P shuffling [id] failed during the transfer phase) while working with q2-sidle in a qiime2-amplicon-2024.5 Conda environment in Ubuntu 22.04.4 LTS, which was using Dask v2024.7.1, dask-core v2024.7.1, and dask-expr v1.19. I was able to solve the problem by specifying Dask v2023.5.0 in the YAML config file, which had worked for me in a previous qiime2-amplicon-2024.2 environment.

Hope it helps!

Andrés

Thank you @andresarroyo, I will try to do the same then and see if I can fix that.

By the way, I used a previous qiime2 installation in another conda environment (2022.2) and I was able to reconstruct my table just fine. I'll work on this output data for now, and switch to more updated versions of the dependencies if I manage to make them work :+1:

Thank you so much for the feedback!
Andrea
