Mock Communities for Assessing Quality Control

Hello everyone!

I am using the Zymo Microbial Community DNA Standard as a mock community to assess the quality of the sequencing run.

For this, I have created one artifact for the expected sequences and another for the observed sequences. For taxonomy classification, I used both a classifier that I trained on SILVA 132 and the Greengenes classifier available on the website in the "Moving Pictures" tutorial. After classification, I checked the taxonomy, and with both classifiers some of the sequences are not classified beyond level 1 (SILVA) or level 4 (Greengenes).

Therefore, when I try to use the plugin q2-quality-control, I get the following:
Requested level of 2 is larger than the maximum level available in taxonomy data (1).
Is there any way to circumvent this error?

Thanks in advance!

Sequences for Expected Seqs_Freq-seqs.qza (10.2 KB)
FeatureTable for Expected Seqs_Freq-table.qza (11.8 KB)
Taxonomy for Expected Seqs_Freq-taxonomy.qzv (1.2 MB)

Sequences for Observed All_PositiveControls_seqs.qza (50.9 KB)
FeatureTable for Observed All_PositiveControls_table.qza (32.6 KB)
Taxonomy for Observed All_PositiveControls-taxonomy.qzv (1.2 MB)


Welcome to the QIIME 2 forum, @asbarros!

Could you please let us know:

  1. what command you are running
  2. the full error message

I am making some assumptions based on the information you have given, but I think the issue is that you are using evaluate-composition to compare the feature tables as they are (with unique hash IDs for each feature), but this action should be run on collapsed feature tables. See this tutorial for an example (the evaluate-composition example is near the bottom of the tutorial):

I am also a little confused about how you are creating your “expected” compositions. It looks like the feature IDs of your expected features also consist of hash IDs; how did you obtain those? Presumably the feature IDs for the expected composition should be the taxonomic lineages of the species you added to the community; hash IDs make it seem like this is additional sequencing data.
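To illustrate why hash IDs would lead to this error, here is a minimal Python sketch, assuming that the depth of a taxonomy is simply the number of semicolon-separated ranks in each feature ID. This is a simplified stand-in written for this thread, not the q2-taxa source code; the function names and the example IDs are hypothetical.

```python
def max_taxonomy_level(feature_ids):
    """Return the largest number of semicolon-separated ranks among the IDs."""
    return max(len(fid.split(";")) for fid in feature_ids)


def check_level(feature_ids, level):
    """Raise ValueError if `level` is deeper than any ID in `feature_ids`."""
    available = max_taxonomy_level(feature_ids)
    if level > available:
        raise ValueError(
            "Requested level of %d is larger than the maximum level "
            "available in taxonomy data (%d)." % (level, available))
    return available


# Hypothetical feature IDs, for illustration only.
hash_ids = ["4b5eeb300368260019c1fbc7a3c718fc"]          # no ';' -> 1 level
lineage_ids = ["k__Bacteria;p__Firmicutes;c__Bacilli"]   # 3 levels
```

Under this assumption, a table whose feature IDs are hashes only ever has one "level", so any request for level 2 or deeper fails, while a table keyed by lineages passes the check.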

Hi Nicholas,

The command I'm running is the following:

qiime quality-control evaluate-composition \
      --i-expected-features intermediate_files/Pos_Expected_Rel.qza \
      --i-observed-features intermediate_files/Pos_Obtained_Rel.qza \
      --o-visualization visualizations/GDF15_PositiveControls.qzv

and I get the following error message:
Requested level of 2 is larger than the maximum level available in taxonomy data (1).

I thought so too, but even after collapsing the tables at taxonomic level 7, the same issue remains. I am sending you the collapsed tables:
Pos_Obtained.qza (81.1 KB) Pos_Expected.qza (45.3 KB)
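For readers following along, "collapsing" a table at a given level can be pictured roughly as below. This is only a sketch of the idea, assuming a table that maps semicolon-delimited lineages to abundances; it is not how QIIME 2 implements the step, it does nothing special for lineages shorter than the requested level, and the function name and example values are hypothetical.

```python
from collections import defaultdict


def collapse(table, level):
    """Sum the abundances of features that share their first `level` ranks.

    `table` maps a semicolon-delimited lineage to an abundance.
    """
    collapsed = defaultdict(float)
    for lineage, abundance in table.items():
        ranks = [rank.strip() for rank in lineage.split(";")]
        collapsed[";".join(ranks[:level])] += abundance
    return dict(collapsed)


# Hypothetical relative abundances, for illustration only.
example_table = {
    "k__Bacteria;p__Firmicutes;c__Bacilli": 0.25,
    "k__Bacteria;p__Firmicutes;c__Clostridia": 0.25,
    "k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria": 0.5,
}
phylum_table = collapse(example_table, 2)
```

The point is that a collapsed table is keyed by lineage strings rather than by hash IDs, which is what the comparison needs in order to walk through the taxonomic levels.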

The expected community had to be "created" using the 16S sequences provided by Zymo for the species that are expected to be in the positive controls, at the expected abundances. This is an in silico-produced FASTA file.

Please post the complete error message (check the log file or run the command with the --verbose flag).

Sorry Nicholas, here is the full error message:

Sorting because non-concatenation axis is not aligned. A future version
of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=False'.

To retain the current behavior and silence the warning, pass 'sort=True'.

  obs_collapsed.loc[sample]], axis=1).fillna(0)
Traceback (most recent call last):
  File "/Users/asbarros/miniconda3/envs/qiime2-2019.10/lib/python3.6/site-packages/q2cli/", line 328, in __call__
    results = action(**arguments)
  File "</Users/asbarros/miniconda3/envs/qiime2-2019.10/lib/python3.6/site-packages/>", line 2, in evaluate_composition
  File "/Users/asbarros/miniconda3/envs/qiime2-2019.10/lib/python3.6/site-packages/qiime2/sdk/", line 240, in bound_callable
    output_types, provenance)
  File "/Users/asbarros/miniconda3/envs/qiime2-2019.10/lib/python3.6/site-packages/qiime2/sdk/", line 445, in _callable_executor_
    ret_val = self._callable(output_dir=temp_dir, **view_args)
  File "/Users/asbarros/miniconda3/envs/qiime2-2019.10/lib/python3.6/site-packages/q2_quality_control/", line 76, in evaluate_composition
  File "/Users/asbarros/miniconda3/envs/qiime2-2019.10/lib/python3.6/site-packages/q2_quality_control/", line 104, in _evaluate_composition
    results, vectors = _compute_per_level_accuracy(exp, obs, metadata, depth)
  File "/Users/asbarros/miniconda3/envs/qiime2-2019.10/lib/python3.6/site-packages/q2_quality_control/", line 162, in _compute_per_level_accuracy
    obs_collapsed = _collapse_table(obs, level)
  File "/Users/asbarros/miniconda3/envs/qiime2-2019.10/lib/python3.6/site-packages/q2_quality_control/", line 310, in _collapse_table
    table.columns, index=table.columns, name='Taxon'), level)
  File "/Users/asbarros/miniconda3/envs/qiime2-2019.10/lib/python3.6/site-packages/q2_taxa/", line 27, in collapse
    (level, max_observed_level))
ValueError: Requested level of 2 is larger than the maximum level available in taxonomy data (1).

Plugin error from quality-control:

  Requested level of 2 is larger than the maximum level available in taxonomy data (1).

See above for debug info.

Thanks @asbarros! I am not able to replicate this error with the files you sent. In fact, here is the output:

GDF15_PositiveControls.qzv (441.9 KB)

I suggest looking back over your commands carefully — I suspect this is a matter of mixing up the inputs (the error you described would make sense with the ASV tables you sent initially, but not with the collapsed tables that you sent).

Good luck!

Thanks Nicholas!

The error message I sent you today was made using the files I had sent… But I’ll re-trace my steps!

Hi Andre!

I’m using mock community 16 from mockrobiota to test a taxonomic classifier that I’ve trained on the SILVA 132 99% OTU database in QIIME. Now I need to test how well my classifier works by using q2-quality-control.
As far as I understand, I need to use the expected taxonomy of the mock community as input (--i-expected-features).
My question is, how did you get a qza file from an expected_taxonomy.tsv file?

I figured you may have already done that in your research, since you have generated a Seqs_Freq-table.qza
Any help would be appreciated!

Welcome to the forum @Mia_T !

Please see an example of using a mockrobiota dataset in this tutorial:
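As a rough picture of the data involved, the sketch below reads a tab-separated expected-taxonomy table into per-sample relative frequencies keyed by lineage. It assumes a layout with a header row, lineages in the first column, and one abundance column per sample; that layout, the function name, and the example values are assumptions made for illustration. It does not produce a .qza file: importing into QIIME 2 is a separate step that is not shown here.

```python
import csv
import io


def read_expected_taxonomy(handle):
    """Parse a tab-separated table into {sample: {lineage: relative frequency}}."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]  # first column holds the lineage strings
    table = {sample: {} for sample in samples}
    for row in reader:
        if not row:
            continue  # skip blank lines
        lineage = row[0]
        for sample, value in zip(samples, row[1:]):
            table[sample][lineage] = float(value)
    # Rescale each sample so that its abundances sum to 1.
    for sample, values in table.items():
        total = sum(values.values())
        if total > 0:
            table[sample] = {k: v / total for k, v in values.items()}
    return table


# Hypothetical file contents, for illustration only.
example = io.StringIO(
    "Taxonomy\tmock_sample\n"
    "k__Bacteria;p__Firmicutes\t3\n"
    "k__Bacteria;p__Proteobacteria\t1\n"
)
expected = read_expected_taxonomy(example)
```

The feature IDs here are the lineages themselves, which matches the earlier point in this thread that the expected composition should be keyed by taxonomy rather than by hash IDs.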

Thank you so much, this is very helpful!
