ancombc error Value provided in `reference_levels` parameter not found in the associated column within the metadata.

Hi team,

I get the error below when I run this command:

(qiime2-amplicon-2024.5) qiime composition ancombc \
  --i-table /dss/work/suga8254/ampliseq/merged_results/filtered-A_crassa-table_genus.qza \
  --m-metadata-file /dss/work/suga8254/ampliseq/merged_results/merged_metadata.tsv \
  --p-formula 'Sample_type + Environment' \
  --p-reference-levels "Sample_type::Feces Environment::Feces" \
  --verbose \
  --o-differentials /dss/work/suga8254/ampliseq/merged_results/ancombc-Sample_type_Environment_Feces_A_crassa-genus.qza

filtered-A_crassa-table_genus.qza (229.2 KB)
merged_metadata.tsv (6.5 KB)

Traceback (most recent call last):
  File "/user/suga8254/.conda/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/q2cli/commands.py", line 520, in __call__
    results = self._execute_action(
  File "/user/suga8254/.conda/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/q2cli/commands.py", line 581, in _execute_action
    results = action(**arguments)
  File "<decorator-gen-31>", line 2, in ancombc
  File "/user/suga8254/.conda/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/qiime2/sdk/action.py", line 342, in bound_callable
    outputs = self._callable_executor_(
  File "/user/suga8254/.conda/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/qiime2/sdk/action.py", line 576, in _callable_executor_
    output_views = self._callable(**view_args)
  File "/user/suga8254/.conda/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/q2_composition/_ancombc.py", line 41, in ancombc
    return _ancombc(
  File "/user/suga8254/.conda/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/q2_composition/_ancombc.py", line 195, in _ancombc
    raise ValueError(message)
ValueError: Value provided in `reference_levels` parameter not found in the associated column within the metadata. Please make sure each column::value pair is present within the metadata file.

 column::value pair with a value that was not found: "Sample_type::Feces Environment::Feces"

NOTE: Your level value appears to contain a `:`, which can be a problem for this action.

Plugin error from composition:

  Value provided in `reference_levels` parameter not found in the associated column within the metadata. Please make sure each column::value pair is present within the metadata file.

   column::value pair with a value that was not found: "Sample_type::Feces Environment::Feces"

  NOTE: Your level value appears to contain a `:`, which can be a problem for this action.

See above for debug info.

I checked the metadata file, and both the columns and the values are there:

(qiime2-amplicon-2024.5) grep "Feces" /dss/work/suga8254/ampliseq/merged_results/merged_metadata.tsv
S63_HA_FE       Feces   Feces   Subject_4       H_atra  D       atra_S_Feces
S48_HA_FE       Feces   Feces   Subject_4       H_atra  D       atra_S_Feces
S36_HA_FE       Feces   Feces   Subject_1       H_atra  C_D     atra_S_Feces
S44_HA_FE       Feces   Feces   Subject_3       H_atra  C_D     atra_S_Feces
S52_HA_FE       Feces   Feces   Subject_4       H_atra  C_D     atra_S_Feces
S19_AC_FE       Feces   Feces   Subject_3       A_crassa        C_D     crassa_S_Feces
S27_AC_FE       Feces   Feces   Subject_4       A_crassa        C_D     crassa_S_Feces
S15_AC_FE       Feces   Feces   Subject_2       A_crassa        C_D     crassa_S_Feces
S11_AC_FE       Feces   Feces   Subject_1       A_crassa        D       crassa_S_Feces
(qiime2-amplicon-2024.5) [suga8254@hpcl002 exported_filtered-A_crassa-table]$ head /dss/work/suga8254/ampliseq/merged_results/merged_metadata.tsv
sampleID        Sample_type     Environment     Subject Species run     Group
wA28Mg  Midgut  Gut     Subject_1       H_atra  A_B     atra_W_Midgut
S33_HA_FG       Foregut Gut     Subject_1       H_atra  D       atra_S_Foregut
wC1Fg   Foregut Gut     Subject_1       A_crassa        A_B     crassa_W_Foregut
wC16Sg  Seagrass        Seagrass        Seagrass        A_crassa        B       crassa_W_Seagrass
wA34Mg  Midgut  Gut     Subject_3       H_atra  B       atra_W_Midgut
S63_HA_FE       Feces   Feces   Subject_4       H_atra  D       atra_S_Feces
wC24SeaW        Seawater        Seawater        Seawater        A_crassa        B_D     crassa_W_Seawater
S43_HA_HG       Hingut  Gut     Subject_3       H_atra  D       atra_S_Hingut
S28_AC_SE       Sediment        Sediment        Sediment        A_crassa        C_D     crassa_S_Sediment

Could you please help me figure out why "Sample_type::Feces Environment::Feces" is giving me an error?

Many thanks in advance,
Sabrin

Hi @Sabrin,

Here's where the mixup is coming from:

You've enclosed both sets of reference levels in one set of quotes, so the shell passes them as a single argument, which is then interpreted as a single column::value pair. You can either enclose each pair in its own quotes:

--p-reference-levels "Sample_type::Feces" "Environment::Feces"

Or you can leave the quotes out entirely:

--p-reference-levels Sample_type::Feces Environment::Feces
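
To see why the quoting matters, here is a small Python sketch (not part of QIIME 2). It uses the standard-library `shlex` module, which follows POSIX shell quoting rules, to show how each form of the option is split into arguments. The last step assumes, purely for illustration, that a pair is split on its first `::`; I have not checked how q2-composition actually parses it.

```python
import shlex

# The three ways of writing the option discussed above.
one_quoted_string = '--p-reference-levels "Sample_type::Feces Environment::Feces"'
each_pair_quoted = '--p-reference-levels "Sample_type::Feces" "Environment::Feces"'
no_quotes = '--p-reference-levels Sample_type::Feces Environment::Feces'


def reference_levels(command_fragment):
    """Return the arguments that follow the option name after shell-style splitting."""
    return shlex.split(command_fragment)[1:]


together = reference_levels(one_quoted_string)
separate = reference_levels(each_pair_quoted)
bare = reference_levels(no_quotes)

print(together)  # ['Sample_type::Feces Environment::Feces']
print(separate)  # ['Sample_type::Feces', 'Environment::Feces']
print(bare)      # ['Sample_type::Feces', 'Environment::Feces']

# ASSUMPTION (illustration only): split a pair on its first "::".
column, _, value = together[0].partition("::")
print(column)  # Sample_type
print(value)   # Feces Environment::Feces
```

Under that assumption, the "value" of the single quoted string still contains a `:`, which would be consistent with the NOTE in the error message.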

Hope this helps! Cheers :lizard:

I tried it, but I still get an error:

Error: Estimation failed for the following covariates:
EnvironmentGut, EnvironmentSeagrass, EnvironmentSeawater, EnvironmentSediment
Please ensure that these covariates do not have missing values and check for multicollinearity before re-estimating the model

qiime composition ancombc \
  --i-table filtered-A_crassa-table_genus.qza \
  --m-metadata-file merged_metadata.tsv \
  --p-formula 'Sample_type + Environment' \
  --p-reference-levels "Sample_type::Feces" "Environment::Feces" \
  --verbose \
  --o-differentials ancombc-Sample_type_Environment_Feces_A_crassa-genus.qza

Hi @Sabrin,

This is a different error message, which in some ways is good news (i.e. we've sorted out the original problem)!

So for this error:

Error: Estimation failed for the following covariates:
EnvironmentGut, EnvironmentSeagrass, EnvironmentSeawater, EnvironmentSediment
Please ensure that these covariates do not have missing values and check for multicollinearity before re-estimating the model

It looks like one of two things could be happening.

Either there are missing values associated with the covariates listed above (i.e. the Gut, Seagrass, Seawater and Sediment groups under the Environment column), or some of the covariates in your model are multicollinear (i.e. your independent variables are correlated with one another).
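
Both possibilities can be checked directly on the metadata. Below is a minimal Python sketch, assuming only the nine rows you posted from `head` (the full file, or the subset of samples present in your feature table, may look different). The helper names `missing_counts` and `levels_by` are made up for this example. It counts empty cells in the two formula columns and then checks whether every Sample_type maps to exactly one Environment.

```python
import csv
import io
from collections import defaultdict

# Rows copied from the `head` output in this thread.
# Spaces stand in for the tabs of the real file and are converted below.
SAMPLE = """\
sampleID Sample_type Environment Subject Species run Group
wA28Mg Midgut Gut Subject_1 H_atra A_B atra_W_Midgut
S33_HA_FG Foregut Gut Subject_1 H_atra D atra_S_Foregut
wC1Fg Foregut Gut Subject_1 A_crassa A_B crassa_W_Foregut
wC16Sg Seagrass Seagrass Seagrass A_crassa B crassa_W_Seagrass
wA34Mg Midgut Gut Subject_3 H_atra B atra_W_Midgut
S63_HA_FE Feces Feces Subject_4 H_atra D atra_S_Feces
wC24SeaW Seawater Seawater Seawater A_crassa B_D crassa_W_Seawater
S43_HA_HG Hingut Gut Subject_3 H_atra D atra_S_Hingut
S28_AC_SE Sediment Sediment Sediment A_crassa C_D crassa_S_Sediment
""".replace(" ", "\t")

# For the real file, replace io.StringIO(SAMPLE) with open("merged_metadata.tsv").
rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter="\t"))


def missing_counts(rows, columns):
    """Count empty or absent cells per column."""
    return {c: sum(1 for r in rows if not (r.get(c) or "").strip()) for c in columns}


def levels_by(rows, inner, outer):
    """Map each level of `inner` to the set of `outer` levels it occurs with."""
    mapping = defaultdict(set)
    for r in rows:
        mapping[r[inner]].add(r[outer])
    return dict(mapping)


missing = missing_counts(rows, ["Sample_type", "Environment"])
mapping = levels_by(rows, "Sample_type", "Environment")
# If every Sample_type occurs with a single Environment, Environment is
# fully determined by Sample_type and adds no independent information.
nested = all(len(envs) == 1 for envs in mapping.values())

print(missing)
print(mapping)
print("Environment fully determined by Sample_type:", nested)
```

In the rows shown, nothing is missing and each Sample_type occurs with only one Environment (e.g. Midgut and Foregut both fall under Gut). If that also holds for the full file, the two columns are perfectly collinear, which might be what the error is pointing at; in that case keeping only one of them in `--p-formula` may be worth trying.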

If your problem ends up being the latter, this can be a bit trickier to resolve; you may find this related thread to be helpful in figuring out how you'll want to adjust your model to deal with this.

Hope this helps! Cheers :lizard: