decontam-score-viz missing output?

lxsteiner · May 15, 2024, 2:08pm

Hi,

I was using qiime quality-control decontam-score-viz in the latest QIIME 2 release and noticed that it reports only the histogram of decontam scores, not the individual features labeled as contaminants. Is that the intended behavior? It would be nice to check the features in question before removing them.

Usage: qiime quality-control decontam-score-viz [OPTIONS]

  Creates histogram based on the output of decontam identify

Inputs:
  --i-decontam-scores ARTIFACTS... Collection[FeatureData[DecontamScore]]
                          Output from decontam identify to be visualized
                                                                    [required]
  --i-table ARTIFACTS... Collection[FeatureTable[Frequency]]
                          Raw OTU/ASV table that was used as input to
                          decontam-identify                         [required]
Parameters:
  --p-threshold NUMBER    Select threshold cutoff for decontam algorithm
                          scores                                [default: 0.1]
  --p-weighted / --p-no-weighted
                          weight the decontam scores by their associated read
                          number                               [default: True]
  --p-bin-size NUMBER     Select bin size for the histogram    [default: 0.02]
Outputs:
  --o-visualization VISUALIZATION
                                                                    [required]

@jordenrabasco Or would it still be preferable to do this step in R instead? Thanks.

jordenrabasco · May 15, 2024, 2:27pm

Hi @lxsteiner, thanks for pointing this out! We actually have an active pull request that does exactly what you are suggesting: it provides a table that lets you investigate each individual feature. If you would like, you can download the updated code here and then install it locally within your QIIME 2 environment via the "make dev" command. Let me know if you have any further questions. Hope that helps!
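
Until that table is available, the per-feature check could also be approximated outside the visualizer. Below is a minimal Python sketch, assuming the decontam scores have been exported to a tab-separated file with a feature ID column and a `p` score column. Both column names are assumptions for illustration and are not confirmed in this thread. It also assumes that features scoring below the threshold are the ones treated as contaminants, as in the R decontam package. The function name and example data are hypothetical.

```python
import csv
import io


def flag_contaminants(tsv_text, threshold=0.1, id_col="#OTU ID", score_col="p"):
    """Return (feature_id, score) pairs whose score is below the threshold.

    Column names are assumptions for illustration; adjust them to the real file.
    Rows with a missing or non-numeric score (e.g. "NA") are skipped.
    """
    flagged = []
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        try:
            score = float(row[score_col])
        except (TypeError, ValueError):
            continue
        if score < threshold:
            flagged.append((row[id_col], score))
    # Lowest scores first, i.e. the strongest contaminant candidates on top.
    return sorted(flagged, key=lambda pair: pair[1])


# Made-up example data, not real decontam output.
example = "#OTU ID\tp\nfeat_a\t0.02\nfeat_b\t0.85\nfeat_c\tNA\nfeat_d\t0.09\n"
for feature_id, score in flag_contaminants(example):
    print(feature_id, score)
```

With a real export, the file contents would be read from disk and passed in as `tsv_text`, and the threshold would match the value given to --p-threshold.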

lxsteiner · May 16, 2024, 9:06am

Hi @jordenrabasco, thanks for the answer. Unfortunately that didn't work for me: I get the same output as before, with only the barplot. Not sure if I did something wrong:

$ conda activate qiime2-amplicon-2024.2
$ git clone https://github.com/jordenrabasco/q2-quality-control.git
$ cd q2-quality-control/
$ make dev
pip install -e .
Obtaining file:///.../q2-quality-control
  Preparing metadata (setup.py) ... done
Installing collected packages: q2-quality-control
  Attempting uninstall: q2-quality-control
    Found existing installation: q2-quality-control 2024.2.0
    Uninstalling q2-quality-control-2024.2.0:
      Successfully uninstalled q2-quality-control-2024.2.0
  Running setup.py develop for q2-quality-control
Successfully installed q2-quality-control-0+untagged.234.gbf11ca5

That seemed good so far! But when I ran the command with --help, there were no new options for passing additional files with features, so I assumed the change was at least internal.

$ qiime quality-control decontam-score-viz --help
QIIME is caching your current deployment for improved performance. This may take a few moments and should only happen once per deployment.
Usage: qiime quality-control decontam-score-viz [OPTIONS]

  Creates histogram based on the output of decontam identify

Inputs:
  --i-decontam-scores ARTIFACTS... Collection[FeatureData[DecontamScore]]
                          Output from decontam identify to be visualized
                                                                    [required]
  --i-table ARTIFACTS... Collection[FeatureTable[Frequency]]
                          Raw OTU/ASV table that was used as input to
                          decontam-identify                         [required]
Parameters:
  --p-threshold NUMBER    Select threshold cutoff for decontam algorithm
                          scores                                [default: 0.1]
  --p-weighted / --p-no-weighted
                          weight the decontam scores by their associated read
                          number                               [default: True]
  --p-bin-size NUMBER     Select bin size for the histogram    [default: 0.02]
Outputs:
  --o-visualization VISUALIZATION
                                                                    [required]
Miscellaneous:
  --output-dir PATH       Output unspecified results to a directory
  --verbose / --quiet     Display verbose output to stdout and/or stderr
                          during execution of this action. Or silence output
                          if execution is successful (silence is golden).
  --example-data PATH     Write example data and exit.
  --citations             Show citations and exit.
  --help                  Show this message and exit.

I reran the first two commands:

qiime quality-control decontam-identify \
--i-table dada2-table-271bp.qza \
--m-metadata-file ../metadata.txt \
--p-method prevalence \
--p-prev-control-column Sample_or_Control \
--p-prev-control-indicator Control \
--o-decontam-scores prev_decontam_scores_v2.qza

qiime quality-control decontam-score-viz \
--i-decontam-scores prev_decontam_scores_v2.qza \
--i-table dada2-table-271bp.qza \
--p-threshold 0.1 \
--p-no-weighted \
--p-bin-size 0.05 \
--o-visualization prev_decontam_score_noweigh_p01_viz_v2.qzv

Same plot, same numbers, and nothing else.

Thanks.
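
One thing that might be worth ruling out at this point is whether Python is actually importing the cloned copy of the plugin rather than another installed copy. Below is a minimal sketch using only the standard library. The helper names are hypothetical, and the module name `q2_quality_control` is taken from the test paths in this thread.

```python
import importlib.util
from pathlib import Path


def module_location(module_name):
    """Return the directory a module would be imported from, or None."""
    spec = importlib.util.find_spec(module_name)
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).resolve().parent


def is_loaded_from(module_name, checkout_dir):
    """True if the module resolves to a path inside checkout_dir."""
    location = module_location(module_name)
    if location is None:
        return False
    checkout = Path(checkout_dir).resolve()
    return checkout == location or checkout in location.parents


# Prints None when the plugin is not importable in the active environment.
print(module_location("q2_quality_control"))
```

Run inside the activated conda environment, the printed path should point into the cloned q2-quality-control directory if the development install is the one in use.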

lxsteiner · May 16, 2024, 9:22am

Not sure where I should be looking instead, but in q2-quality-control/q2_quality_control/plugin_setup.py:

plugin.visualizers.register_function(
    function=decontam_score_viz,
    inputs={
        'decontam_scores': Collection[FeatureData[DecontamScore]],
        'table': Collection[FeatureTable[Frequency]],
        'rep_seqs': FeatureData[Sequence]
    },
    parameters={
        'threshold':  Float,
        'weighted': Bool,
        'bin_size': Float,
    },
    name='Generate a histogram representation of the scores',
    description='Creates histogram based on the output of decontam identify',
    input_descriptions={
        'decontam_scores': 'Output from decontam identify '
                           'to be visualized',
        'table': 'Raw OTU/ASV table that was used '
                 'as input to decontam-identify',
        'rep_seqs': ('Representative Sequences table which contaminate '
                     'seqeunces will be removed from')

There is an option to pass rep_seqs, but the qiime command doesn't seem to recognize it:

 (1/1?) No such option: --i-rep-seqs
 (1/1?) No such option: --i-rep_seqs
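
For reference, the help text above suggests how option names relate to the registered Python names: a prefix for the kind of argument, with underscores replaced by hyphens (decontam_scores becomes --i-decontam-scores, bin_size becomes --p-bin-size). Below is a minimal sketch of that naming convention, inferred from the help output in this thread. The function is hypothetical and is not q2cli's actual code.

```python
# Prefixes inferred from the help output: inputs, parameters, outputs.
PREFIXES = {"input": "i", "parameter": "p", "output": "o"}


def cli_option(kind, name):
    """Build the expected CLI flag for a registered input/parameter/output."""
    if kind not in PREFIXES:
        raise ValueError(f"unknown kind: {kind!r}")
    return f"--{PREFIXES[kind]}-{name.replace('_', '-')}"


print(cli_option("input", "decontam_scores"))  # --i-decontam-scores
print(cli_option("parameter", "bin_size"))     # --p-bin-size
print(cli_option("input", "rep_seqs"))         # --i-rep-seqs
```

Under this convention, --i-rep-seqs would be the expected spelling, so the "No such option" message suggests that the plugin version being loaded does not register rep_seqs, rather than that the flag was misspelled.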

lxsteiner · May 16, 2024, 9:30am

I also ran make test after make dev, and it fails:

$ make test
py.test
==================================== test session starts ====================================
platform linux -- Python 3.8.15, pytest-8.0.0, pluggy-1.4.0
rootdir: /gxfs_work/geomar/smomw445/soft/q2-quality-control
plugins: typeguard-2.13.3, anyio-4.2.0
collected 94 items

q2_quality_control/tests/test_decontam.py ..........                                  [ 10%]
q2_quality_control/tests/test_filter.py .FFFF                                         [ 15%]
q2_quality_control/tests/test_quality_control.py .................................... [ 54%]
......................................                                                [ 94%]
q2_quality_control/tests/test_stats.py .....                                          [100%]

========================================= FAILURES ==========================================
_____________________ TestFilterSingle.test_filter_single_exclude_seqs ______________________

self = <q2_quality_control.tests.test_filter.TestFilterSingle testMethod=test_filter_single_exclude_seqs>

    def test_filter_single_exclude_seqs(self):
>       obs_art, = self.plugin.methods['filter_reads'](
            self.demuxed_art, self.indexed_genome, exclude_seqs=True)

q2_quality_control/tests/test_filter.py:47:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
<decorator-gen-46>:2: in filter_reads
    ???
../../.conda/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/qiime2/sdk/action.py:342: in bound_callable
    outputs = self._callable_executor_(
../../.conda/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/qiime2/sdk/action.py:566: in _callable_executor_
    output_views = self._callable(**view_args)
q2_quality_control/_filter.py:77: in filter_reads
    _bowtie2_filter(fwd, rev, filtered_seqs, database, n_threads, mode,
q2_quality_control/_filter.py:106: in _bowtie2_filter
    _run_command(bowtie_cmd)
q2_quality_control/_utilities.py:39: in _run_command
    subprocess.run(cmd, check=True, stdout=stdout, stdin=stdin, cwd=cwd)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

input = None, capture_output = False, timeout = None, check = True
popenargs = (['bowtie2', '-p', '1', '--sensitive-local', '--rfg', '5,3', ...],)
kwargs = {'cwd': None, 'stdin': None, 'stdout': None}
process = <subprocess.Popen object at 0x15526d14b490>, stdout = None, stderr = None
retcode = 127

    def run(*popenargs,
            input=None, capture_output=False, timeout=None, check=False, **kwargs):
        """Run command with arguments and return a CompletedProcess instance.

        The returned instance will have attributes args, returncode, stdout and
        stderr. By default, stdout and stderr are not captured, and those attributes
        will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them.

        If check is True and the exit code was non-zero, it raises a
        CalledProcessError. The CalledProcessError object will have the return code
        in the returncode attribute, and output & stderr attributes if those streams
        were captured.

        If timeout is given, and the process takes too long, a TimeoutExpired
        exception will be raised.

        There is an optional argument "input", allowing you to
        pass bytes or a string to the subprocess's stdin.  If you use this argument
        you may not also use the Popen constructor's "stdin" argument, as
        it will be used internally.

        By default, all communication is in bytes, and therefore any "input" should
        be bytes, and the stdout and stderr will be bytes. If in text mode, any
        "input" should be a string, and stdout and stderr will be strings decoded
        according to locale encoding, or by "encoding" if set. Text mode is
        triggered by setting any of text, encoding, errors or universal_newlines.

        The other arguments are the same as for the Popen constructor.
        """
        if input is not None:
            if kwargs.get('stdin') is not None:
                raise ValueError('stdin and input arguments may not both be used.')
            kwargs['stdin'] = PIPE

        if capture_output:
            if kwargs.get('stdout') is not None or kwargs.get('stderr') is not None:
                raise ValueError('stdout and stderr arguments may not be used '
                                 'with capture_output.')
            kwargs['stdout'] = PIPE
            kwargs['stderr'] = PIPE

        with Popen(*popenargs, **kwargs) as process:
            try:
                stdout, stderr = process.communicate(input, timeout=timeout)
            except TimeoutExpired as exc:
                process.kill()
                if _mswindows:
                    # Windows accumulates the output in a single blocking
                    # read() call run on child threads, with the timeout
                    # being done in a join() on those threads.  communicate()
                    # _after_ kill() is required to collect that and add it
                    # to the exception.
                    exc.stdout, exc.stderr = process.communicate()
                else:
                    # POSIX _communicate already populated the output so
                    # far into the TimeoutExpired exception.
                    process.wait()
                raise
            except:  # Including KeyboardInterrupt, communicate handled that.
                process.kill()
                # We don't call process.wait() as .__exit__ does that for us.
                raise
            retcode = process.poll()
            if check and retcode:
>               raise CalledProcessError(retcode, process.args,
                                         output=stdout, stderr=stderr)
E               subprocess.CalledProcessError: Command '['bowtie2', '-p', '1', '--sensitive-local', '--rfg', '5,3', '-x', '/tmp/qiime2/smomw445/data/32eb5e3e-f571-424d-bd94-2e56e0347c98/data/db', '-U', '/tmp/q2-CasavaOneEightSingleLanePerSampleDirFmt-ydgb9030/sample_a_S01_L001_R1_001.fastq.gz', '-S', '/tmp/tmprkmjx5ai']' returned non-zero exit status 127.

../../.conda/envs/qiime2-amplicon-2024.2/lib/python3.8/subprocess.py:516: CalledProcessError
----------------------------------- Captured stdout call ------------------------------------
Running external command line application. This may print messages to stdout and/or stderr.
The commands to be run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.

Command: bowtie2 -p 1 --sensitive-local --rfg 5,3 -x /tmp/qiime2/smomw445/data/32eb5e3e-f571-424d-bd94-2e56e0347c98/data/db -U /tmp/q2-CasavaOneEightSingleLanePerSampleDirFmt-ydgb9030/sample_a_S01_L001_R1_001.fastq.gz -S /tmp/tmprkmjx5ai

----------------------------------- Captured stderr call ------------------------------------
perl: error while loading shared libraries: libnsl.so.1: cannot open shared object file: No such file or directory
_______________________ TestFilterSingle.test_filter_single_keep_seqs _______________________

self = <q2_quality_control.tests.test_filter.TestFilterSingle testMethod=test_filter_single_keep_seqs>

    def test_filter_single_keep_seqs(self):
>       obs_art, = self.plugin.methods['filter_reads'](
            self.demuxed_art, self.indexed_genome, exclude_seqs=False)

q2_quality_control/tests/test_filter.py:63:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
<decorator-gen-46>:2: in filter_reads
    ???
../../.conda/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/qiime2/sdk/action.py:342: in bound_callable
    outputs = self._callable_executor_(
../../.conda/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/qiime2/sdk/action.py:566: in _callable_executor_
    output_views = self._callable(**view_args)
q2_quality_control/_filter.py:77: in filter_reads
    _bowtie2_filter(fwd, rev, filtered_seqs, database, n_threads, mode,
q2_quality_control/_filter.py:106: in _bowtie2_filter
    _run_command(bowtie_cmd)
q2_quality_control/_utilities.py:39: in _run_command
    subprocess.run(cmd, check=True, stdout=stdout, stdin=stdin, cwd=cwd)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

input = None, capture_output = False, timeout = None, check = True
popenargs = (['bowtie2', '-p', '1', '--sensitive-local', '--rfg', '5,3', ...],)
kwargs = {'cwd': None, 'stdin': None, 'stdout': None}
process = <subprocess.Popen object at 0x1552d649da60>, stdout = None, stderr = None
retcode = 127

    def run(*popenargs,
            input=None, capture_output=False, timeout=None, check=False, **kwargs):
        """Run command with arguments and return a CompletedProcess instance.

        The returned instance will have attributes args, returncode, stdout and
        stderr. By default, stdout and stderr are not captured, and those attributes
        will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them.

        If check is True and the exit code was non-zero, it raises a
        CalledProcessError. The CalledProcessError object will have the return code
        in the returncode attribute, and output & stderr attributes if those streams
        were captured.

        If timeout is given, and the process takes too long, a TimeoutExpired
        exception will be raised.

        There is an optional argument "input", allowing you to
        pass bytes or a string to the subprocess's stdin.  If you use this argument
        you may not also use the Popen constructor's "stdin" argument, as
        it will be used internally.

        By default, all communication is in bytes, and therefore any "input" should
        be bytes, and the stdout and stderr will be bytes. If in text mode, any
        "input" should be a string, and stdout and stderr will be strings decoded
        according to locale encoding, or by "encoding" if set. Text mode is
        triggered by setting any of text, encoding, errors or universal_newlines.

        The other arguments are the same as for the Popen constructor.
        """
        if input is not None:
            if kwargs.get('stdin') is not None:
                raise ValueError('stdin and input arguments may not both be used.')
            kwargs['stdin'] = PIPE

        if capture_output:
            if kwargs.get('stdout') is not None or kwargs.get('stderr') is not None:
                raise ValueError('stdout and stderr arguments may not be used '
                                 'with capture_output.')
            kwargs['stdout'] = PIPE
            kwargs['stderr'] = PIPE

        with Popen(*popenargs, **kwargs) as process:
            try:
                stdout, stderr = process.communicate(input, timeout=timeout)
            except TimeoutExpired as exc:
                process.kill()
                if _mswindows:
                    # Windows accumulates the output in a single blocking
                    # read() call run on child threads, with the timeout
                    # being done in a join() on those threads.  communicate()
                    # _after_ kill() is required to collect that and add it
                    # to the exception.
                    exc.stdout, exc.stderr = process.communicate()
                else:
                    # POSIX _communicate already populated the output so
                    # far into the TimeoutExpired exception.
                    process.wait()
                raise
            except:  # Including KeyboardInterrupt, communicate handled that.
                process.kill()
                # We don't call process.wait() as .__exit__ does that for us.
                raise
            retcode = process.poll()
            if check and retcode:
>               raise CalledProcessError(retcode, process.args,
                                         output=stdout, stderr=stderr)
E               subprocess.CalledProcessError: Command '['bowtie2', '-p', '1', '--sensitive-local', '--rfg', '5,3', '-x', '/tmp/qiime2/smomw445/data/32eb5e3e-f571-424d-bd94-2e56e0347c98/data/db', '-U', '/tmp/q2-CasavaOneEightSingleLanePerSampleDirFmt-lm45t44g/sample_a_S01_L001_R1_001.fastq.gz', '-S', '/tmp/tmp2lm08w15']' returned non-zero exit status 127.

../../.conda/envs/qiime2-amplicon-2024.2/lib/python3.8/subprocess.py:516: CalledProcessError
----------------------------------- Captured stdout call ------------------------------------
Running external command line application. This may print messages to stdout and/or stderr.
The commands to be run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.

Command: bowtie2 -p 1 --sensitive-local --rfg 5,3 -x /tmp/qiime2/smomw445/data/32eb5e3e-f571-424d-bd94-2e56e0347c98/data/db -U /tmp/q2-CasavaOneEightSingleLanePerSampleDirFmt-lm45t44g/sample_a_S01_L001_R1_001.fastq.gz -S /tmp/tmp2lm08w15

----------------------------------- Captured stderr call ------------------------------------
perl: error while loading shared libraries: libnsl.so.1: cannot open shared object file: No such file or directory
_____________________ TestFilterPaired.test_filter_paired_exclude_seqs ______________________

self = <q2_quality_control.tests.test_filter.TestFilterPaired testMethod=test_filter_paired_exclude_seqs>

    def test_filter_paired_exclude_seqs(self):
>       obs_art, = self.plugin.methods['filter_reads'](
            self.demuxed_art, self.indexed_genome, exclude_seqs=True)

q2_quality_control/tests/test_filter.py:89:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
<decorator-gen-46>:2: in filter_reads
    ???
../../.conda/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/qiime2/sdk/action.py:342: in bound_callable
    outputs = self._callable_executor_(
../../.conda/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/qiime2/sdk/action.py:566: in _callable_executor_
    output_views = self._callable(**view_args)
q2_quality_control/_filter.py:77: in filter_reads
    _bowtie2_filter(fwd, rev, filtered_seqs, database, n_threads, mode,
q2_quality_control/_filter.py:106: in _bowtie2_filter
    _run_command(bowtie_cmd)
q2_quality_control/_utilities.py:39: in _run_command
    subprocess.run(cmd, check=True, stdout=stdout, stdin=stdin, cwd=cwd)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

input = None, capture_output = False, timeout = None, check = True
popenargs = (['bowtie2', '-p', '1', '--sensitive-local', '--rfg', '5,3', ...],)
kwargs = {'cwd': None, 'stdin': None, 'stdout': None}
process = <subprocess.Popen object at 0x1552e6b429d0>, stdout = None, stderr = None
retcode = 127

    def run(*popenargs,
            input=None, capture_output=False, timeout=None, check=False, **kwargs):
        """Run command with arguments and return a CompletedProcess instance.

        The returned instance will have attributes args, returncode, stdout and
        stderr. By default, stdout and stderr are not captured, and those attributes
        will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them.

        If check is True and the exit code was non-zero, it raises a
        CalledProcessError. The CalledProcessError object will have the return code
        in the returncode attribute, and output & stderr attributes if those streams
        were captured.

        If timeout is given, and the process takes too long, a TimeoutExpired
        exception will be raised.

        There is an optional argument "input", allowing you to
        pass bytes or a string to the subprocess's stdin.  If you use this argument
        you may not also use the Popen constructor's "stdin" argument, as
        it will be used internally.

        By default, all communication is in bytes, and therefore any "input" should
        be bytes, and the stdout and stderr will be bytes. If in text mode, any
        "input" should be a string, and stdout and stderr will be strings decoded
        according to locale encoding, or by "encoding" if set. Text mode is
        triggered by setting any of text, encoding, errors or universal_newlines.

        The other arguments are the same as for the Popen constructor.
        """
        if input is not None:
            if kwargs.get('stdin') is not None:
                raise ValueError('stdin and input arguments may not both be used.')
            kwargs['stdin'] = PIPE

        if capture_output:
            if kwargs.get('stdout') is not None or kwargs.get('stderr') is not None:
                raise ValueError('stdout and stderr arguments may not be used '
                                 'with capture_output.')
            kwargs['stdout'] = PIPE
            kwargs['stderr'] = PIPE

        with Popen(*popenargs, **kwargs) as process:
            try:
                stdout, stderr = process.communicate(input, timeout=timeout)
            except TimeoutExpired as exc:
                process.kill()
                if _mswindows:
                    # Windows accumulates the output in a single blocking
                    # read() call run on child threads, with the timeout
                    # being done in a join() on those threads.  communicate()
                    # _after_ kill() is required to collect that and add it
                    # to the exception.
                    exc.stdout, exc.stderr = process.communicate()
                else:
                    # POSIX _communicate already populated the output so
                    # far into the TimeoutExpired exception.
                    process.wait()
                raise
            except:  # Including KeyboardInterrupt, communicate handled that.
                process.kill()
                # We don't call process.wait() as .__exit__ does that for us.
                raise
            retcode = process.poll()
            if check and retcode:
>               raise CalledProcessError(retcode, process.args,
                                         output=stdout, stderr=stderr)
E               subprocess.CalledProcessError: Command '['bowtie2', '-p', '1', '--sensitive-local', '--rfg', '5,3', '-x', '/tmp/qiime2/smomw445/data/32eb5e3e-f571-424d-bd94-2e56e0347c98/data/db', '-1', '/tmp/q2-CasavaOneEightSingleLanePerSampleDirFmt-q8afsxex/sample_a_S01_L001_R1_001.fastq.gz', '-2', '/tmp/q2-CasavaOneEightSingleLanePerSampleDirFmt-q8afsxex/sample_a_S01_L001_R2_001.fastq.gz', '-S', '/tmp/tmp7a3vsbxq']' returned non-zero exit status 127.

../../.conda/envs/qiime2-amplicon-2024.2/lib/python3.8/subprocess.py:516: CalledProcessError
----------------------------------- Captured stdout call ------------------------------------
Running external command line application. This may print messages to stdout and/or stderr.
The commands to be run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.

Command: bowtie2 -p 1 --sensitive-local --rfg 5,3 -x /tmp/qiime2/smomw445/data/32eb5e3e-f571-424d-bd94-2e56e0347c98/data/db -1 /tmp/q2-CasavaOneEightSingleLanePerSampleDirFmt-q8afsxex/sample_a_S01_L001_R1_001.fastq.gz -2 /tmp/q2-CasavaOneEightSingleLanePerSampleDirFmt-q8afsxex/sample_a_S01_L001_R2_001.fastq.gz -S /tmp/tmp7a3vsbxq

----------------------------------- Captured stderr call ------------------------------------
perl: error while loading shared libraries: libnsl.so.1: cannot open shared object file: No such file or directory
_______________________ TestFilterPaired.test_filter_paired_keep_seqs _______________________

self = <q2_quality_control.tests.test_filter.TestFilterPaired testMethod=test_filter_paired_keep_seqs>

    def test_filter_paired_keep_seqs(self):
>       obs_art, = self.plugin.methods['filter_reads'](
            self.demuxed_art, self.indexed_genome, exclude_seqs=False)

q2_quality_control/tests/test_filter.py:105:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
<decorator-gen-46>:2: in filter_reads
    ???
../../.conda/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/qiime2/sdk/action.py:342: in bound_callable
    outputs = self._callable_executor_(
../../.conda/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/qiime2/sdk/action.py:566: in _callable_executor_
    output_views = self._callable(**view_args)
q2_quality_control/_filter.py:77: in filter_reads
    _bowtie2_filter(fwd, rev, filtered_seqs, database, n_threads, mode,
q2_quality_control/_filter.py:106: in _bowtie2_filter
    _run_command(bowtie_cmd)
q2_quality_control/_utilities.py:39: in _run_command
    subprocess.run(cmd, check=True, stdout=stdout, stdin=stdin, cwd=cwd)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

input = None, capture_output = False, timeout = None, check = True
popenargs = (['bowtie2', '-p', '1', '--sensitive-local', '--rfg', '5,3', ...],)
kwargs = {'cwd': None, 'stdin': None, 'stdout': None}
process = <subprocess.Popen object at 0x1552d6214070>, stdout = None, stderr = None
retcode = 127

    def run(*popenargs,
            input=None, capture_output=False, timeout=None, check=False, **kwargs):
        """Run command with arguments and return a CompletedProcess instance.

        The returned instance will have attributes args, returncode, stdout and
        stderr. By default, stdout and stderr are not captured, and those attributes
        will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them.

        If check is True and the exit code was non-zero, it raises a
        CalledProcessError. The CalledProcessError object will have the return code
        in the returncode attribute, and output & stderr attributes if those streams
        were captured.

        If timeout is given, and the process takes too long, a TimeoutExpired
        exception will be raised.

        There is an optional argument "input", allowing you to
        pass bytes or a string to the subprocess's stdin.  If you use this argument
        you may not also use the Popen constructor's "stdin" argument, as
        it will be used internally.

        By default, all communication is in bytes, and therefore any "input" should
        be bytes, and the stdout and stderr will be bytes. If in text mode, any
        "input" should be a string, and stdout and stderr will be strings decoded
        according to locale encoding, or by "encoding" if set. Text mode is
        triggered by setting any of text, encoding, errors or universal_newlines.

        The other arguments are the same as for the Popen constructor.
        """
        if input is not None:
            if kwargs.get('stdin') is not None:
                raise ValueError('stdin and input arguments may not both be used.')
            kwargs['stdin'] = PIPE

        if capture_output:
            if kwargs.get('stdout') is not None or kwargs.get('stderr') is not None:
                raise ValueError('stdout and stderr arguments may not be used '
                                 'with capture_output.')
            kwargs['stdout'] = PIPE
            kwargs['stderr'] = PIPE

        with Popen(*popenargs, **kwargs) as process:
            try:
                stdout, stderr = process.communicate(input, timeout=timeout)
            except TimeoutExpired as exc:
                process.kill()
                if _mswindows:
                    # Windows accumulates the output in a single blocking
                    # read() call run on child threads, with the timeout
                    # being done in a join() on those threads.  communicate()
                    # _after_ kill() is required to collect that and add it
                    # to the exception.
                    exc.stdout, exc.stderr = process.communicate()
                else:
                    # POSIX _communicate already populated the output so
                    # far into the TimeoutExpired exception.
                    process.wait()
                raise
            except:  # Including KeyboardInterrupt, communicate handled that.
                process.kill()
                # We don't call process.wait() as .__exit__ does that for us.
                raise
            retcode = process.poll()
            if check and retcode:
>               raise CalledProcessError(retcode, process.args,
                                         output=stdout, stderr=stderr)
E               subprocess.CalledProcessError: Command '['bowtie2', '-p', '1', '--sensitive-local', '--rfg', '5,3', '-x', '/tmp/qiime2/smomw445/data/32eb5e3e-f571-424d-bd94-2e56e0347c98/data/db', '-1', '/tmp/q2-CasavaOneEightSingleLanePerSampleDirFmt-pv90bwiq/sample_a_S01_L001_R1_001.fastq.gz', '-2', '/tmp/q2-CasavaOneEightSingleLanePerSampleDirFmt-pv90bwiq/sample_a_S01_L001_R2_001.fastq.gz', '-S', '/tmp/tmpsqa7f6wv']' returned non-zero exit status 127.

../../.conda/envs/qiime2-amplicon-2024.2/lib/python3.8/subprocess.py:516: CalledProcessError
----------------------------------- Captured stdout call ------------------------------------
Running external command line application. This may print messages to stdout and/or stderr.
The commands to be run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.

Command: bowtie2 -p 1 --sensitive-local --rfg 5,3 -x /tmp/qiime2/smomw445/data/32eb5e3e-f571-424d-bd94-2e56e0347c98/data/db -1 /tmp/q2-CasavaOneEightSingleLanePerSampleDirFmt-pv90bwiq/sample_a_S01_L001_R1_001.fastq.gz -2 /tmp/q2-CasavaOneEightSingleLanePerSampleDirFmt-pv90bwiq/sample_a_S01_L001_R2_001.fastq.gz -S /tmp/tmpsqa7f6wv

----------------------------------- Captured stderr call ------------------------------------
perl: error while loading shared libraries: libnsl.so.1: cannot open shared object file: No such file or directory
===================================== warnings summary ======================================
../../.conda/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/qiime2/core/archive/provenance.py:13
  /gxfs_work/geomar/smomw445/.conda/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/qiime2/core/archive/provenance.py:13: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
    import pkg_resources

../../.conda/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/pkg_resources/__init__.py:2868
  /gxfs_work/geomar/smomw445/.conda/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/pkg_resources/__init__.py:2868: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('mpl_toolkits')`.
  Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
    declare_namespace(pkg)

../../.conda/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/pkg_resources/__init__.py:2868
  /gxfs_work/geomar/smomw445/.conda/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/pkg_resources/__init__.py:2868: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('google')`.
  Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
    declare_namespace(pkg)

q2_quality_control/tests/test_decontam.py::TestIdentify::test_combined
q2_quality_control/tests/test_decontam.py::TestIdentify::test_combined
q2_quality_control/tests/test_decontam.py::TestIdentify::test_combined
  <frozen importlib._bootstrap>:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 216 from C header, got 232 from PyObject

q2_quality_control/tests/test_decontam.py::TestIdentify::test_combined
  /gxfs_work/geomar/smomw445/.conda/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/sklearn/utils/multiclass.py:14: DeprecationWarning: Please use `spmatrix` from the `scipy.sparse` namespace, the `scipy.sparse.base` namespace is deprecated.
    from scipy.sparse.base import spmatrix

q2_quality_control/tests/test_decontam.py::TestIdentify::test_combined
  /gxfs_work/geomar/smomw445/.conda/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/sklearn/utils/optimize.py:18: DeprecationWarning: Please use `line_search_wolfe2` from the `scipy.optimize` namespace, the `scipy.optimize.linesearch` namespace is deprecated.
    from scipy.optimize.linesearch import line_search_wolfe2, line_search_wolfe1

q2_quality_control/tests/test_decontam.py::TestIdentify::test_combined
  /gxfs_work/geomar/smomw445/.conda/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/sklearn/utils/optimize.py:18: DeprecationWarning: Please use `line_search_wolfe1` from the `scipy.optimize` namespace, the `scipy.optimize.linesearch` namespace is deprecated.
    from scipy.optimize.linesearch import line_search_wolfe2, line_search_wolfe1

q2_quality_control/tests/test_decontam.py::TestRemove::test_remove
  /gxfs_work/geomar/smomw445/soft/q2-quality-control/q2_quality_control/tests/test_decontam.py:137: FutureWarning: The behavior of .astype from SparseDtype to a non-sparse dtype is deprecated. In a future version, this will return a non-sparse array with the requested dtype. To retain the old behavior, use `obj.astype(SparseDtype(dtype))`
    temp_table.to_csv(test_biom_fp, sep="\t")

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================================== short test summary info ==================================
FAILED q2_quality_control/tests/test_filter.py::TestFilterSingle::test_filter_single_exclude_seqs - subprocess.CalledProcessError: Command '['bowtie2', '-p', '1', '--sensitive-local', '--r...
FAILED q2_quality_control/tests/test_filter.py::TestFilterSingle::test_filter_single_keep_seqs - subprocess.CalledProcessError: Command '['bowtie2', '-p', '1', '--sensitive-local', '--r...
FAILED q2_quality_control/tests/test_filter.py::TestFilterPaired::test_filter_paired_exclude_seqs - subprocess.CalledProcessError: Command '['bowtie2', '-p', '1', '--sensitive-local', '--r...
FAILED q2_quality_control/tests/test_filter.py::TestFilterPaired::test_filter_paired_keep_seqs - subprocess.CalledProcessError: Command '['bowtie2', '-p', '1', '--sensitive-local', '--r...
=================== 4 failed, 90 passed, 10 warnings in 66.99s (0:01:06) ====================
make: *** [Makefile:12: test] Error 1
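
A note on reading this log: all four failures are in test_filter.py, and the captured stderr shows perl failing to load libnsl.so.1. Since the bowtie2 launcher is a Perl script, the aligner most likely never started; exit status 127 is what is typically reported when a command or one of its shared libraries cannot be found. A minimal sketch for checking whether a tool can start at all in the active environment follows. The helper name probe_tool is hypothetical and not part of QIIME 2.

```python
import shutil
import subprocess
import sys


def probe_tool(argv):
    """Return (found_on_path, exit_code, stderr) for a short command.

    Hypothetical helper for illustration only.
    """
    if shutil.which(argv[0]) is None:
        # The executable is not on PATH (or not executable) in this environment.
        return False, None, ""
    done = subprocess.run(argv, capture_output=True, text=True)
    return True, done.returncode, done.stderr.strip()


if __name__ == "__main__":
    # The Python interpreter is probed as a known-good baseline.
    for argv in ([sys.executable, "--version"], ["bowtie2", "--version"]):
        print(argv[0], probe_tool(argv))
```

If bowtie2 is found but returns a non-zero code with a shared-library message in stderr, the problem lies in the environment rather than in the plugin's tests.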

jordenrabasco · May 16, 2024, 12:33pm

Hi @lxsteiner, make sure you are git cloning from the score_viz_update branch.
You can do this by adding the -b flag to your git command. I have linked a Stack Overflow post here that describes how to do it. When it is installed successfully, there will be an additional rep-seqs parameter in the score-viz action.
Let me know if you run into any more issues!

lxsteiner · May 17, 2024, 12:48pm

Thanks @jordenrabasco, I totally didn't pay attention to the branch. decontam-score-viz conveniently outputs a table with the individual ASVs now! I had also switched to R in the meantime, to explore my data.

FYI: if the table is big, it's quite sluggish, since the whole table is loaded at once, especially if you try to sort by the columns.

I have a few other questions:

1. Could you please elaborate on what the purpose of the --p-weighted / --p-no-weighted parameter is? I couldn't find a corresponding parameter in the R package.
2. When do you recommend the "batch" alternative, decontam-identify-batches? I've read in other topics that you recommend having several negative control samples (~5 or more), which some users might achieve only by pooling all of their different sequencing run batches beforehand. What approach do you recommend? Pool all runs to have >5 negative controls in one decontam process, or still process all runs separately with <3 negative controls in each individual batch?
3. I've read the paper and the tutorials, but I don't think I managed to find information on reporting features/ASVs with "unknown" or "unassigned" decontam scores. What happened here?

Thanks!

jordenrabasco · May 21, 2024, 1:59pm

Thanks for the feedback!
From a user's perspective, would you prefer an option to truncate the table to the top X features to make it more manageable?

1. The weighted flag controls whether the histogram displays read numbers instead of feature counts, so that the features are weighted by their respective read numbers.
2. Each method (batches and regular decontam-identify) was implemented to achieve a slightly different goal, and each may be more applicable to some datasets than to others. If you have <5 controls in your subset tables, I would recommend the regular decontam-identify method if your goal is just to identify contaminant features in your data. However, if your goal is to identify the sources of contamination within your experiment, and you are not too concerned about the sensitivity of the contaminant identification, then I would recommend the batches method. That said, if you have a plethora of controls, I would suggest running both methods on your data, as we have seen that the different methods can emphasize different features as contaminants.
3. Features are assigned "unknown" when there is not enough statistical evidence to say that they are contaminants, but also not enough to say definitively that they are true features. Since we are asking which of these features are contaminants, we treat the unknowns as true features in our calculations but denote that there is low statistical evidence associated with them.
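
To make the weighted flag and the "unknown" label concrete, here is a minimal Python sketch. It is not the plugin's actual code. It assumes that scores lie between 0 and 1, that a feature is called a contaminant when its score is below the threshold, and that a missing score is stored as NaN. The function names label_features and score_histogram are made up for illustration; the defaults mirror the --p-threshold and --p-bin-size defaults from the help text.

```python
import numpy as np


def label_features(scores, threshold=0.1):
    """Label each feature from its score (assumed rule: score < threshold)."""
    scores = np.asarray(scores, dtype=float)
    # NaN < threshold is False, so unscored features start as "true feature" ...
    labels = np.where(scores < threshold, "contaminant", "true feature")
    # ... and are then flagged as "unknown" to mark the weak evidence.
    labels[np.isnan(scores)] = "unknown"
    return labels


def score_histogram(scores, read_counts, bin_size=0.02, weighted=True):
    """Histogram of scores; bar heights are reads if weighted, else features."""
    scores = np.asarray(scores, dtype=float)
    read_counts = np.asarray(read_counts, dtype=float)
    known = ~np.isnan(scores)  # unscored features cannot be placed in a bin
    n_bins = int(round(1.0 / bin_size))
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    weights = read_counts[known] if weighted else None
    heights, _ = np.histogram(scores[known], bins=edges, weights=weights)
    return heights, edges
```

With weighting on, one abundant low-scoring feature can dominate the leftmost bar; with weighting off, the same feature adds a height of one, like any other feature.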

lxsteiner · May 22, 2024, 9:10am

Thank you for answering the questions!

Yeah, one workaround could be listing only the contaminants with their associated scores and FASTA sequences as it is now, instead of all the features. I suppose displaying the nucleotide sequences as well is what causes most of the lag. I assume this would get even worse with merged PE amplicons.
I, at least, really just look for a short summary of the numbers here and then maybe do a quick and dirty lookup for some of the reported features, to see whether they really are contaminants, false positives, or whatever, with a BLAST identity lookup or similar. Additional investigations are easier to do within R or by downloading a TSV table and multi-FASTA file from the .qzv.

I think qiime metadata tabulate generally has a good solution: it wraps tables into several clickable pages, with 100 features displayed on a single page, and provides a search field as well. 100 features per page is even more than necessary. I'm not sure how difficult it would be to implement something similar.
I just tried it for the heck of it, to see what would happen with the output of decontam-identify:

qiime metadata tabulate --m-input-file prev_decontam_scores.qza --o-visualization prev_decontam_scores.qzv

but the output is definitely messed up (plus all the postprocessing of decontam scores is missing anyway...):

maybe if decontam-score-viz also produced a .qza output it could simply work with tabulate?
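
The two ideas above (list only the contaminants, and split the table into pages) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how the visualizer or qiime metadata tabulate is implemented: rows are plain dicts with hypothetical keys "feature_id" and "score", and a contaminant is assumed to be any feature scoring below the threshold.

```python
def contaminant_pages(rows, threshold=0.1, page_size=100):
    """Keep rows scoring below threshold and split them into pages."""
    # A NaN score compares as False here, so unscored features are left out.
    hits = [row for row in rows if row["score"] < threshold]
    # Strongest contaminant evidence (lowest score) comes first.
    hits.sort(key=lambda row: row["score"])
    return [hits[i:i + page_size] for i in range(0, len(hits), page_size)]
```

Rendering one page at a time keeps the number of rows, and of displayed sequences, bounded no matter how large the full feature table is.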

Cheers.

system · June 22, 2024, 3:11pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.