Metadata index not found issue

Hello again!

So I have received the following errors. I ran my dada2 command and it finished successfully. I then went on to incorporate my metadata and got the following:

My first attempt:

(qiime2-2017.10) wsb255bioimac27:Fink_fermenter_Exp3 mel_local$ qiime feature-table summarize --i-table ADe3-Arc-table_dada2.qza --o-visualization ADe3-Arc-table_dada2.qzv --m-sample-metadata-file ADe3_Arc_sample-metadata.tsv 

There was an issue with loading the file ADe3_Arc_sample-metadata.tsv as metadata:

  Invalid characters (e.g. '/', '\x00', '\\', '*', '<', '>', '?', '|', '$') or empty ID detected in metadata category label: ''. There may be more errors present in this metadata. Sample/feature metadata files can be validated using Keemei: http://keemei.qiime.org

Seeing the above, I figured it was the underscore issue again (à la the issue we figured out from a dada2 error: dada2 underscore issue). Additionally, when I went back and remembered to validate using Keemei, it also told me there was something wrong with my sampleID column, though it was not specific except to say there might be illegal characters. So I changed all my sample names to use dashes instead of underscores. It passed Keemei, so I reran the command:

metadata from below command: sample-metadata.tsv (4.0 KB)

(qiime2-2017.10) wsb255bioimac27:Fink_fermenter_Exp3 mel_local$ qiime feature-table summarize --i-table ADe3-Arc-table_dada2.qza --o-visualization ADe3-Arc-table_dada2.qzv --m-sample-metadata-file sample-metadata.tsv 
Plugin error from feature-table:

  "None of [Index(['F1-42-Arc', 'F1-42-5-Arc', 'F1-101-5-Arc', 'F2-8-Arc', 'F2-2-Arc',\n       'F1-89-5-Arc', 'F2-24-Arc', 'F1-57-5-Arc', 'F1-0-Arc', 'F1-1-Arc',\n       'F1-0-5-Arc', 'F1-65-5-Arc', 'F1-128-5-Arc', 'F2-45-5-Arc',\n       'F1-41-5-Arc', 'F1-49-5-Arc', 'F2-42-Arc', 'F2-42-5-Arc',\n       'F2-128-5-Arc', 'F2-1-Arc', 'F2-43-5-Arc', 'F2-41-5-Arc',\n       'F2-182-5-Arc', 'M1-Arc', 'F2-4-Arc', 'F2-101-5-Arc', 'F2-0-5-Arc',\n       'F2-0-Arc', 'F1-16-Arc', 'F1-45-5-Arc', 'F1-8-Arc', 'F2-16-Arc',\n       'F2-89-5-Arc', 'F2-49-5-Arc', 'F1-24-Arc', 'F1-4-Arc', 'F1-43-5-Arc',\n       'F2-155-5-Arc', 'FD1-Arc', 'F1-182-5-Arc', 'F2-57-5-Arc', 'F1-2-Arc',\n       'F2-65-5-Arc', 'F1-155-5-Arc', 'BLANK2-Arc', 'F1-W-Arc', 'F2-W-Arc',\n       'BLANK1-Arc'],\n      dtype='object')] are in the [index]"

Debug info has been saved to /var/folders/12/2j8hq03s52lbnstw5wh008k80000gq/T/qiime2-q2cli-err-z1iymwtl.log
(qiime2-2017.10) wsb255bioimac27:Fink_fermenter_Exp3 mel_local$ vi /var/folders/12/2j8hq03s52lbnstw5wh008k80000gq/T/qiime2-q2cli-err-z1iymwtl.log 

Traceback (most recent call last):
  File "/Users/mel_local/miniconda2/envs/qiime2-2017.10/lib/python3.5/site-packages/q2cli/commands.py", line 218, in __call__
    results = action(**arguments)
  File "<decorator-gen-239>", line 2, in summarize
  File "/Users/mel_local/miniconda2/envs/qiime2-2017.10/lib/python3.5/site-packages/qiime2/sdk/action.py", line 220, in bound_callable
    output_types, provenance)
  File "/Users/mel_local/miniconda2/envs/qiime2-2017.10/lib/python3.5/site-packages/qiime2/sdk/action.py", line 416, in _callable_executor_
    ret_val = self._callable(output_dir=temp_dir, **view_args)
  File "/Users/mel_local/miniconda2/envs/qiime2-2017.10/lib/python3.5/site-packages/q2_feature_table/_summarize/_visualizer.py", line 148, in summarize
    df.loc[sample_frequencies.index].to_json(fh)
  File "/Users/mel_local/miniconda2/envs/qiime2-2017.10/lib/python3.5/site-packages/pandas/core/indexing.py", line 1328, in __getitem__
    return self._getitem_axis(key, axis=0)
  File "/Users/mel_local/miniconda2/envs/qiime2-2017.10/lib/python3.5/site-packages/pandas/core/indexing.py", line 1541, in _getitem_axis
    return self._getitem_iterable(key, axis=axis)
  File "/Users/mel_local/miniconda2/envs/qiime2-2017.10/lib/python3.5/site-packages/pandas/core/indexing.py", line 1081, in _getitem_iterable
    self._has_valid_type(key, axis)
  File "/Users/mel_local/miniconda2/envs/qiime2-2017.10/lib/python3.5/site-packages/pandas/core/indexing.py", line 1418, in _has_valid_type
    (key, self.obj._get_axis_name(axis)))
KeyError: "None of [Index(['F1-42-Arc', 'F1-42-5-Arc', 'F1-101-5-Arc', 'F2-8-Arc', 'F2-2-Arc',\n       'F1-89-5-Arc', 'F2-24-Arc', 'F1-57-5-Arc', 'F1-0-Arc', 'F1-1-Arc',\n       'F1-0-5-Arc', 'F1-65-5-Arc', 'F1-128-5-Arc', 'F2-45-5-Arc',\n       'F1-41-5-Arc', 'F1-49-5-Arc', 'F2-42-Arc', 'F2-42-5-Arc',\n       'F2-128-5-Arc', 'F2-1-Arc', 'F2-43-5-Arc', 'F2-41-5-Arc',\n       'F2-182-5-Arc', 'M1-Arc', 'F2-4-Arc', 'F2-101-5-Arc', 'F2-0-5-Arc',\n       'F2-0-Arc', 'F1-16-Arc', 'F1-45-5-Arc', 'F1-8-Arc', 'F2-16-Arc',\n       'F2-89-5-Arc', 'F2-49-5-Arc', 'F1-24-Arc', 'F1-4-Arc', 'F1-43-5-Arc',\n       'F2-155-5-Arc', 'FD1-Arc', 'F1-182-5-Arc', 'F2-57-5-Arc', 'F1-2-Arc',\n       'F2-65-5-Arc', 'F1-155-5-Arc', 'BLANK2-Arc', 'F1-W-Arc', 'F2-W-Arc',\n       'BLANK1-Arc'],\n      dtype='object')] are in the [index]"

Would this be because my actual sequence file names have underscores, while in my metadata file I changed the underscores to dashes?

The only other thing I could think of was I do have cells that are missing metadata because we didn't get measurements for those time points. Is there a null value I can put in there if that is the issue?

Thanks!
Mel

One issue could be hidden characters from formatting in older versions of Excel. You could try running it through TextEdit and converting it to plain text, or run a command like sed to remove the offending characters.

I did

:set list

And it showed ^I characters as the tabs.

Only ‘odd’ character I could find…would that character be a violating character?

Not sure, but when we were moving files between a Windows 7 and a Mac OS X computer, we noticed that an older version of Excel stuck in hidden \n and \r characters that we had to remove to get the file working. This may not be true for your issue.
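For anyone wanting to check for this kind of hidden character without an editor, here is a minimal Python sketch. The function names and the sample bytes are made up for illustration; none of this is part of QIIME 2 or Keemei.

```python
def find_hidden_characters(raw: bytes) -> dict:
    """Count bytes that are usually invisible in a spreadsheet or editor."""
    return {
        "carriage_returns": raw.count(b"\r"),
        "utf8_bom": 1 if raw.startswith(b"\xef\xbb\xbf") else 0,
        "nul_bytes": raw.count(b"\x00"),
    }


def normalize_line_endings(raw: bytes) -> bytes:
    """Convert Windows (CR LF) and old Mac (CR) line endings to LF."""
    return raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


# Hypothetical file content, as it might look after a Windows/Excel round trip.
sample = b"#SampleID\tpH\r\nF1-0-Arc\t7.1\r\n"
report = find_hidden_characters(sample)
cleaned = normalize_line_endings(sample)
```

On a real file you would read the bytes with something like `open(path, "rb").read()` and write the cleaned bytes to a new file rather than overwriting the original.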

Sure sure - I actually made the sheet in google sheets and exported it to TSV.

Keemei validated it as fine in google sheets prior to export.

If you look at my metadata linked above and screenshotted here:

You'll see I have those ^I characters and end of line characters $ so nothing that looks out of the ordinary.

Inside Vim, I replaced all ^I characters with a space, thinking the ^I was the issue, and then put tabs back in place of the spaces to make it a TSV again. That just brought back the ^I characters, so I'm thinking the file is valid.

I do have some cells that lack data because we didn't get measurements for it - so I am wondering:

  1. Do I need to put in some sort of null value (na, nan)? Is the feature-table command bailing on my metadata because of blank cells?
  2. In my original .qza upload, my sequence files had underscores in the names, not dashes. As you can see, I have dashes in the SampleID column, because Keemei and the first-attempt command above complained about illegal characters and I assumed it was the underscore issue. Is this an issue?

You can try putting something like n.a for all the missing data - we usually do because we found that when we used QIIME1.9.1 it didn’t like to have missing values. So, instead of missing boxes we had n.a for all of the places that were missing.

Hi @mmelendrez,

I think your initial guess was right. If you have underscores in your file names and dashes in your metadata, then the script won't be able to match these IDs together. You'll have to change your metadata IDs to match exactly what you called your sequences. There's no rule saying you can't have underscores in your metadata sample ID names, see here for naming requirements.

OK I can retry that - Keemei was throwing an error, as was qiime, when I did that - see first post, first attempt. BUT that first attempt could also have failed because I had empty cells, which per @ben I will fill with n.a

Hi @mmelendrez,

Sorry I believe you were right with the underscores being highlighted as illegal. They are indeed illegal in keemei when I checked as well, not sure why I was so sure they were ok, oops :zipper_mouth_face:. But with the names now matching and the empty cells being removed you should be ok. Sorry about the confusion.

Ok - I'm thinking it has to be the underscore versus dash issue.

I reran with a metadata file replacing all empty cells with n.a: ADe3_sample-metadata.tsv (4.3 KB)

(qiime2-2017.10) wsb255bioimac27:Fink_fermenter_Exp3 mel_local$ qiime feature-table summarize --i-table ADe3-Arc-table_dada2.qza --o-visualization ADe3-Arc-table_dada2.qzv --m-sample-metadata-file ADe3_sample-metadata.tsv 
Plugin error from feature-table:

  "None of [Index(['F1-42-Arc', 'F1-42-5-Arc', 'F1-101-5-Arc', 'F2-8-Arc', 'F2-2-Arc',\n       'F1-89-5-Arc', 'F2-24-Arc', 'F1-57-5-Arc', 'F1-0-Arc', 'F1-1-Arc',\n       'F1-0-5-Arc', 'F1-65-5-Arc', 'F1-128-5-Arc', 'F2-45-5-Arc',\n       'F1-41-5-Arc', 'F1-49-5-Arc', 'F2-42-Arc', 'F2-42-5-Arc',\n       'F2-128-5-Arc', 'F2-1-Arc', 'F2-43-5-Arc', 'F2-41-5-Arc',\n       'F2-182-5-Arc', 'M1-Arc', 'F2-4-Arc', 'F2-101-5-Arc', 'F2-0-5-Arc',\n       'F2-0-Arc', 'F1-16-Arc', 'F1-45-5-Arc', 'F1-8-Arc', 'F2-16-Arc',\n       'F2-89-5-Arc', 'F2-49-5-Arc', 'F1-24-Arc', 'F1-4-Arc', 'F1-43-5-Arc',\n       'F2-155-5-Arc', 'FD1-Arc', 'F1-182-5-Arc', 'F2-57-5-Arc', 'F1-2-Arc',\n       'F2-65-5-Arc', 'F1-155-5-Arc', 'BLANK2-Arc', 'F1-W-Arc', 'F2-W-Arc',\n       'BLANK1-Arc'],\n      dtype='object')] are in the [index]"

Debug info has been saved to /var/folders/12/2j8hq03s52lbnstw5wh008k80000gq/T/qiime2-q2cli-err-g_cedo70.log

@thermokarst - so this means I need to go back to my original fastqs sent by the sequencing center...change the underscores in the names to dashes....reload as a .qza and redo dada2 again, correct?

Nope! Let's break down what has happened in this thread:

The error from your first attempt states that either an invalid character (e.g. '/', '\x00', '\\', '*', '<', '>', '?', '|', '$') was observed, or there was a missing ID. There are clearly formatting issues with this file, because the other part of that error message says the problem was observed in a metadata category labelled ''.
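An empty category label usually means the header row contains a cell with nothing in it, for example because of a stray tab. A minimal sketch for locating such cells, assuming a tab-separated file whose first line is the header (the helper name `empty_header_columns` and the sample text are hypothetical, not something QIIME 2 provides):

```python
import csv
import io


def empty_header_columns(tsv_text: str) -> list:
    """Return 0-based positions of header cells that are empty or whitespace."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    return [i for i, label in enumerate(header) if label.strip() == ""]


# Hypothetical metadata with one trailing tab on every line.
sample = "#SampleID\tpH\t\nF1-0-Arc\t7.1\t\n"
positions = empty_header_columns(sample)
```

Any position it reports is a column that the metadata loader would see as having the label ''.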

Okay, onto the next:

Without seeing the Keemei issue, I can't comment on the specifics. I will mention, though, that there is a distinction between Keemei errors and Keemei warnings. What @Mehrbod_Estaki mentioned is actually a Keemei warning; take a closer look at the warning message:

 ID doesn't meet the recommendations for choosing identifiers described in the QIIME 2 metadata documentation. IDs are recommended to have the following attributes:

- IDs should be 36 characters long or less.

- IDs should contain only ASCII alphanumeric characters (i.e. in the range of [a-z], [A-Z], or [0-9]), the period (.) character, or the dash (-) character.

It is simply a recommendation that you not have underscores in your Sample IDs. I know you had some issues related to importing and DADA2 related to underscores, but, that is an extraordinary case, and I am still looking into that issue.
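The two recommendations in that warning are easy to check by hand. The following is only a sketch built from the warning text quoted above, not Keemei's actual code; `id_warnings` is a hypothetical helper, and `F1_42_5_Arc` is a made-up underscore variant of one of the IDs in this thread.

```python
import re

# Pattern taken from the warning text: ASCII letters, digits, '.' and '-'.
RECOMMENDED_ID = re.compile(r"[A-Za-z0-9.-]+")
MAX_LENGTH = 36


def id_warnings(sample_id: str) -> list:
    """Return the recommendations that a sample ID does not meet."""
    problems = []
    if len(sample_id) > MAX_LENGTH:
        problems.append("longer than 36 characters")
    if not RECOMMENDED_ID.fullmatch(sample_id):
        problems.append("not made up only of ASCII letters, digits, '.' and '-'")
    return problems


dash_result = id_warnings("F1-42-5-Arc")        # no warnings expected
underscore_result = id_warnings("F1_42_5_Arc")  # one warning, for the underscores
```

A non-empty result here corresponds to a warning, not an error: the ID is still usable, it just falls outside the recommendations.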

Moving on:

Okay, this error (the "None of [Index([...])] are in the [index]" one) says that the Sample IDs in your Feature table don't match the Sample IDs in your metadata.

Let's figure out what your Sample IDs actually are at this point, using your feature table. Please run the following, and post the viz here:

qiime feature-table summarize --i-table ADe3-Arc-table_dada2.qza --o-visualization ADe3-Arc-table_dada2.qzv

This will give us a list of the sample IDs present in your feature table.

Once we have that, we can compare the IDs between the table and the metadata file. The values need to be a 100% exact match (otherwise how will QIIME know what metadata belongs to which sample?), so we will then need to edit your metadata file to have the right IDs in there. Make sense?
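The comparison itself amounts to two set differences. A minimal sketch, assuming you have copied both ID lists out by hand; `compare_ids` is a hypothetical helper, the table IDs are taken from the error message in this thread (assuming those are the table's IDs), and the metadata IDs are invented to show a near miss.

```python
def compare_ids(table_ids, metadata_ids):
    """Report IDs present on one side only; matching is exact and case-sensitive."""
    table, metadata = set(table_ids), set(metadata_ids)
    return {
        "missing_from_metadata": sorted(table - metadata),
        "unused_in_metadata": sorted(metadata - table),
    }


# Table IDs as they appear in the error message; metadata IDs are hypothetical.
table_ids = ["F1-0-Arc", "F1-1-Arc", "M1-Arc"]
metadata_ids = ["F1-0", "F1-1", "M1-Arc"]
result = compare_ids(table_ids, metadata_ids)
```

Anything listed under `missing_from_metadata` is a sample in the table that has no matching metadata row, which is the situation the KeyError is describing.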

Ah you know that's right Keemei was throwing a warning - not an error.

Also looked at the file. Apparently all my sample IDs are labeled with dashes already, so I'd already fixed that, but I don't have the -Arc on there... let me fix that.

ADe3-Arc-table_dada2.qzv (460.8 KB)

@ben FYI - Keemei doesn't like n.a :frowning: for numerics

Keemei is cool with this - empty cells seem to be fine:

Alright that worked:

Some For-my-Information:

  1. Underscores = warnings but not deal breakers according to Keemei
  2. If you don’t have numeric data, input nothing - Keemei and QIIME are fine with that
  3. When working in Google Sheets as of 5/30/2018 - their version inserts extra tab characters when you export to TSV, even if you highlight your specific cells and export from there. When you type :set list in Vim you will see them as extra ^I characters. I had to get rid of those.
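For the extra-tab problem in point 3, a small script can do the cleanup instead of editing by hand. This is a sketch under the assumption that the stray tabs only create trailing columns that are empty in every row, header included; `drop_empty_trailing_columns` and the sample text are hypothetical. Empty cells inside real columns are kept, since those are legitimate missing data.

```python
import csv
import io


def drop_empty_trailing_columns(tsv_text: str) -> str:
    """Remove trailing columns that are empty in every row, header included."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    if not rows:
        return tsv_text
    width = max(len(row) for row in rows)
    # Pad short rows so every row can be indexed by column position.
    rows = [row + [""] * (width - len(row)) for row in rows]
    while width > 0 and all(row[width - 1].strip() == "" for row in rows):
        width -= 1
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerows(row[:width] for row in rows)
    return out.getvalue()


# Hypothetical export with two extra tabs per line; the empty pH cell in the
# last row is real missing data and is kept.
sample = "#SampleID\tpH\t\t\nF1-0-Arc\t7.1\t\t\nF1-1-Arc\t\t\t\n"
cleaned = drop_empty_trailing_columns(sample)
```

Note that the `csv` module may re-quote fields containing quote characters, so check the output if your metadata has any.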

Thanks!

Ugh, I don’t like Keemei :clown_face:

Hm, you’re making me at least rethink the use of n.a in our empty cells. We tend to validate using the old QIIME1 mapping file validation tool. There, placing n.a in those columns was accepted, but thanks for the tip. I will rethink those mapping files.

These aren't keemei issues. keemei is actually helping here, not hurting.

they have nothing to do with keemei, specifically. keemei is just validating based on the QIIME2 specifications (and these are recommendations). Keemei is a tool we make to help validate qiime1 and qiime2 sample metadata files — it is in no way a "burden". And in most cases these are not arbitrary rules that QIIME2 is making — it's often underlying software or just good sense (e.g., see below about n.a. values). So don't hate the player, hate the game.

keemei is your friend :smile:

that is okay (because :qiime2: is backwards-compatible with qiime1-style metadata files) but you're sticking to the old ways.

You don't need to use keemei to validate a sample metadata file for use in :qiime2:, but it helps if you are unsure if your file meets expectations.

QIIME2 is actually quite flexible in terms of metadata. E.g., you could use those n.a. values if you feel so inclined. But these are not actually recognized as empty values, as described here and here. Empty spaces are — which means that those samples can be automatically removed during some statistical tests. n.a. values would get interpreted as a different group and, in the case of numeric columns, will cause methods requiring numeric values to error (because n.a. isn't a number, nor is it missing data).
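To illustrate why a placeholder breaks numeric columns while an empty cell does not, here is a plain-Python sketch. It is not QIIME 2's parsing code; `parse_numeric_column` is a hypothetical helper that treats empty cells as missing and tries to convert everything else to a number.

```python
def parse_numeric_column(cells):
    """Turn text cells into floats, treating empty cells as missing (None)."""
    values = []
    for cell in cells:
        text = cell.strip()
        if text == "":
            values.append(None)  # missing data, can be skipped later
        else:
            values.append(float(text))  # raises ValueError for "n.a."
    return values


with_blanks = parse_numeric_column(["7.1", "", "6.8"])

try:
    parse_numeric_column(["7.1", "n.a.", "6.8"])
    placeholder_error = None
except ValueError as exc:
    placeholder_error = str(exc)
```

The column with a blank parses cleanly with one missing value, while the column containing n.a. fails as soon as the placeholder is reached.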

Because most QIIME2 methods are okay with underscores — but this is not guaranteed (particularly for 3rd party plugins). Some software doesn't like underscores in sample names, so some 3rd party plugins may not support that, or :bug:s can arise.

QIIME2 metadata is meant to be flexible, and keemei is there to help. Warnings are there to help you find the "right" way (or at least a better way that will avoid conflicts)

I am after all, wary and afraid of change. Thanks for the clarification.
