Bioenv diversity error with qiime version 2021.4, but not with 2020.2

203967.metadata.tsv (702 Bytes) table-trim55126.qza (58.4 KB) rooted-tree.qza (76.7 KB)

I have qiime 2020.2 and qiime 2021.4 installed via conda.

Using the attached files, I ran the following command which succeeded in both qiime version 2020.2 and version 2021.4:

qiime diversity core-metrics-phylogenetic --verbose --i-phylogeny rooted-tree.qza --i-table table-trim55126.qza --m-metadata-file 203967.metadata.tsv --p-sampling-depth 300 --output-dir ./output/

.../site-packages/sklearn/metrics/pairwise.py:1735: DataConversionWarning: Data was converted to boolean for metric jaccard
  warnings.warn(msg, DataConversionWarning)
.../site-packages/skbio/stats/ordination/_principal_coordinate_analysis.py:152: RuntimeWarning: The result contains negative eigenvalues. Please compare their magnitude with the magnitude of some of the largest positive eigenvalues. If the negative ones are smaller, it's probably safe to ignore them, but if they are large in magnitude, the results won't be useful. See the Notes section for more details. The smallest eigenvalue is -0.00311433891175376 and the largest is 0.5898033715594031.
  RuntimeWarning
Saved FeatureTable[Frequency] to: ./output/rarefied_table.qza
Saved SampleData[AlphaDiversity] % Properties('phylogenetic') to: ./output/faith_pd_vector.qza
Saved SampleData[AlphaDiversity] to: ./output/observed_otus_vector.qza
Saved SampleData[AlphaDiversity] to: ./output/shannon_vector.qza
Saved SampleData[AlphaDiversity] to: ./output/evenness_vector.qza
Saved DistanceMatrix % Properties('phylogenetic') to: ./output/unweighted_unifrac_distance_matrix.qza
Saved DistanceMatrix % Properties('phylogenetic') to: ./output/weighted_unifrac_distance_matrix.qza
Saved DistanceMatrix to: ./output/jaccard_distance_matrix.qza
Saved DistanceMatrix to: ./output/bray_curtis_distance_matrix.qza
Saved PCoAResults to: ./output/unweighted_unifrac_pcoa_results.qza
Saved PCoAResults to: ./output/weighted_unifrac_pcoa_results.qza
Saved PCoAResults to: ./output/jaccard_pcoa_results.qza
Saved PCoAResults to: ./output/bray_curtis_pcoa_results.qza
Saved Visualization to: ./output/unweighted_unifrac_emperor.qzv
Saved Visualization to: ./output/weighted_unifrac_emperor.qzv
Saved Visualization to: ./output/jaccard_emperor.qzv
Saved Visualization to: ./output/bray_curtis_emperor.qzv

Then, I ran this command:

qiime diversity bioenv --verbose --i-distance-matrix ./output/unweighted_unifrac_distance_matrix.qza --m-metadata-file 203967.metadata.tsv --o-visualization ./output/unweighted-unifrac-bioenv.qzv

In qiime version 2020.2, I saw the output:

Saved Visualization to: ./output/unweighted-unifrac-bioenv.qzv

But in qiime version 2021.4, I saw the output:

Traceback (most recent call last):
  File "/home/jacobs/miniconda3/envs/qiime2-2021.4/lib/python3.8/site-packages/q2cli/commands.py", line 329, in __call__
    results = action(**arguments)
  File "<decorator-gen-495>", line 2, in bioenv
  File "/home/jacobs/miniconda3/envs/qiime2-2021.4/lib/python3.8/site-packages/qiime2/sdk/action.py", line 244, in bound_callable
    outputs = self._callable_executor_(scope, callable_args,
  File "/home/jacobs/miniconda3/envs/qiime2-2021.4/lib/python3.8/site-packages/qiime2/sdk/action.py", line 452, in _callable_executor_
    ret_val = self._callable(output_dir=temp_dir, **view_args)
  File "/home/jacobs/miniconda3/envs/qiime2-2021.4/lib/python3.8/site-packages/q2_diversity/_beta/_visualizer.py", line 47, in bioenv
    metadata = qiime2.Metadata(df)
  File "/home/jacobs/miniconda3/envs/qiime2-2021.4/lib/python3.8/site-packages/qiime2/metadata/metadata.py", line 357, in __init__
    super().__init__(dataframe.index)
  File "/home/jacobs/miniconda3/envs/qiime2-2021.4/lib/python3.8/site-packages/qiime2/metadata/metadata.py", line 92, in __init__
    raise ValueError(
ValueError: Metadata must contain at least one ID.

Plugin error from diversity:

  Metadata must contain at least one ID.

See above for debug info.

Hello Daniel,

Check out the qiime2 docs on the format of Metadata for 2020.2 and 2021.4.

EDIT: That header line should be fine. I'm stumped! :thinking:

This metadata file successfully worked for other commands, just not for bioenv diversity.

Also, as I mentioned initially, this same command with the same files worked in qiime2 2020.2 but failed in qiime2 2021.4, so this is a difference between 2020.2 and 2021.4, not a difference between qiime2 and qiime1.

The "Metadata in QIIME 2" page of the QIIME 2 2021.4.0 documentation says #SampleID is still supported for backwards compatibility, and it worked in qiime2 version 2020.2.

Also, I did try switching to SampleID without a pound sign and got the same error.

Thank you for the clarification. I should have read your original description more carefully.

I took a quick look at the git history, and q2-metadata seems pretty similar between 2020.2 and 2021.4. This is consistent with your observation that the metadata works for other commands.

The q2-diversity plugin has changed more, and there have been several edits to q2_diversity/_beta/_visualizer.py, the file mentioned in the traceback.

Could this commit be the cause? I did notice some NAs in your metadata file...

    # Drop samples that have any missing values.
    # TODO use Metadata API if more filtering is supported in the future.
    df = metadata.to_dataframe()
    df = df.dropna()
    metadata = qiime2.Metadata(df)
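
To illustrate why the NAs might matter, here is a minimal pandas sketch of what the `dropna()` call above could do to a table in which some columns are empty for every sample. The sample IDs and the `ph` column are made up for illustration, and this is only a guess at the mechanism, not a confirmed diagnosis:

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the metadata: two columns with no values at all
# (like BarcodeSequence and LinkerPrimerSequence) and one filled column.
df = pd.DataFrame(
    {
        "BarcodeSequence": [np.nan, np.nan, np.nan],
        "LinkerPrimerSequence": [np.nan, np.nan, np.nan],
        "ph": [6.5, 7.1, 6.9],  # hypothetical numeric column
    },
    index=pd.Index(["s1", "s2", "s3"], name="sample-id"),
)

# With default arguments, dropna() removes every row that has at least one
# missing value. Two columns are entirely empty, so every row is removed.
remaining = df.dropna()
print(len(remaining.index))  # 0
```

If the real dataframe ends up empty in the same way, passing it to `qiime2.Metadata` would have no IDs left, which would match the "Metadata must contain at least one ID" error.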

I'm out of my depth here, so I'll let the real Qiime devs chime in. :qiime2:

Thank you. This pointed me in the right direction. When I dropped the BarcodeSequence and LinkerPrimerSequence columns entirely (neither of which has any values), the command worked in Qiime 2021.4 as well.
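
For anyone who wants to script the same workaround, the sketch below removes only the columns that are empty for every sample and leaves the rows alone. The helper name `drop_empty_columns`, the sample IDs, and the `ph` column are made up for illustration, and the sketch assumes a plain TSV with no extra comment or type-directive lines:

```python
import io

import pandas as pd


def drop_empty_columns(df):
    """Remove columns in which every value is missing; rows are untouched."""
    return df.dropna(axis=1, how="all")


# Made-up metadata in the same shape as the problem file: the two
# sequence columns exist but hold no values for any sample.
tsv = (
    "#SampleID\tBarcodeSequence\tLinkerPrimerSequence\tph\n"
    "s1\t\t\t6.5\n"
    "s2\t\t\t7.1\n"
)

# Read everything as strings; empty fields are parsed as missing values.
raw = pd.read_csv(io.StringIO(tsv), sep="\t", index_col=0, dtype=str)
cleaned = drop_empty_columns(raw)

print(list(cleaned.columns))        # ['ph']
print(len(cleaned.dropna().index))  # 2
```

Writing the result back out with `cleaned.to_csv(path, sep="\t")` should give a file without the empty columns, though I have only tested the idea by deleting the columns by hand.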

Question for the devs: should this really be necessary?

This isn't limited to a single metadata file, and the metadata file I attached is not my own. Rather, this is the type of metadata file that DNA Subway (https://dnasubway.cyverse.org/) has created ever since the purple line was on Qiime version 1. I am currently the sole developer of DNA Subway, and I'd rather not disrupt a workflow that has worked for years by changing the rules for metadata files. Some old projects may have used a metadata file like this for trimming but not yet run core metrics, and I don't want to invalidate the files they used in case they run core metrics later.

Hi @jacobs, I'll let @colinbrislawn respond in general, but I do want to comment on this:

Fortunately, nothing changed about the metadata formatting/processing rules in QIIME 2, so you have nothing to be concerned about there. The specific issue you report here appears to be a minor regression introduced while fixing a different (but related) bug in the bioenv visualizer. I haven't taken a deep dive here yet, but I suspect that part of the issue is also related to pandas 1.0 API changes that have changed the semantics of this particular dataframe.

The "fix" you shared above makes sense to me, and I agree, it isn't ideal - we will revisit this visualizer's implementation. Thanks for reporting.
