Metadata error in diversity core metrics pipeline

Sarah_McGrath · December 12, 2017, 1:42am

Hello,

I have been getting an error when I try to run the qiime diversity core-metrics-phylogenetic plugin, i.e.,

(qiime2-2017.11) qiime2@qiime2core2017-11:~$ qiime diversity core-metrics-phylogenetic --i-phylogeny rooted-tree-frog2forward.qza --i-table table-frog2forward.qza --p-sampling-depth 150 --m-metadata-file frog2forward-metadata_1.tsv --output-dir frog2forward-core-metrics-results
There was an issue with loading the file frog2forward-metadata_1.tsv as metadata:

Invalid characters (e.g. '/', '\x00', '\', '*', '<', '>', '?', '|', '$') or empty ID detected in metadata index: '--2017-12-12 00:34:29-- http://microbiome/'. There may be more errors present in this metadata. Sample/feature metadata files can be validated using Keemei: http://keemei.qiime.org

My metadata file was created in google sheets and was validated using Keemei.

I ran this same code in the Qiime 2 Core-2017.9 version and read on a previous forum post that updating the version might help. I have now downloaded and tried this code in Qiime 2 Core-2017.11 and run into the same error. Is there something wrong with my metadata that Keemei isn't picking up or is it something I am missing with the plugin?

Any assistance would be greatly appreciated!

Best,
Sarah

thermokarst · December 12, 2017, 1:48am

Hi @Sarah_McGrath!

Check out this little tidbit in the error message:

According to that error, one of the Sample IDs is '--2017-12-12 00:34:29-- http://microbiome/'. The / character is an invalid character in your IDs.

Maybe something wormed its way into frog2forward-metadata_1.tsv after you downloaded it from Google Sheets? I would recommend taking a closer look at that file in a text editor (Notepad on Windows or TextEdit on Mac). You could also download the file again from Google Sheets, but I would still double check in a text editor first.
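For anyone who would rather scan the file programmatically than by eye, here is a minimal sketch. It assumes the IDs are in the first tab-separated column and that the first row holds the column labels. The character list is copied from the error message above, which says "e.g.", so it may not be everything QIIME 2 rejects. The function name find_invalid_ids is made up for this example and is not part of QIIME 2.

```python
# Characters quoted in the error message above. The message says "e.g.",
# so this may not be the complete set that QIIME 2 rejects.
INVALID_CHARS = set('/\x00\\*<>?|$')


def find_invalid_ids(tsv_text):
    """Return (line_number, id, reason) for each suspicious ID in column 1.

    Hypothetical helper for illustration only; not a QIIME 2 function.
    """
    problems = []
    for line_number, line in enumerate(tsv_text.splitlines(), start=1):
        if line_number == 1 or line == "":
            continue  # skip the column labels and fully blank lines
        sample_id = line.split("\t")[0]
        if not sample_id.strip():
            problems.append((line_number, sample_id, "empty ID"))
            continue
        bad = sorted(INVALID_CHARS.intersection(sample_id))
        if bad:
            reason = "invalid characters: " + " ".join(bad)
            problems.append((line_number, sample_id, reason))
    return problems


if __name__ == "__main__":
    with open("frog2forward-metadata_1.tsv", encoding="utf-8") as handle:
        for line_number, sample_id, reason in find_invalid_ids(handle.read()):
            print(line_number, repr(sample_id), reason)
```

Printing the ID with repr() makes stray whitespace or control characters visible, which a text editor can hide.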

Good luck and keep us posted!

Sarah_McGrath · December 12, 2017, 3:58pm

Thanks for letting me know what part of the error message to pay attention to.

I opened the tsv file in Notepad per your suggestion and I am still not seeing anything that might constitute an invalid character.

QIIME2 Frog Microbiome Metadata - Frog2Forward Metadata_2.tsv (1.4 KB)

Here is my metadata file. My data is paired-end run on an Illumina MiSeq and is already demultiplexed, so I just have sequence ID and description as my columns. I tried downloading it from google sheets and running the code again with this new file and it is still giving me the same error message.

(qiime2-2017.11) qiime2@qiime2core2017-11:~$ qiime diversity core-metrics-phylogenetic --i-phylogeny rooted-tree-frog2forward.qza --i-table table-frog2forward.qza --p-sampling-depth 150 --m-metadata-file frog2forward-metadata_2.tsv --output-dir frog2forward-core-metrics-results
There was an issue with loading the file frog2forward-metadata_2.tsv as metadata:

Invalid characters (e.g. '/', '\x00', '\', '*', '<', '>', '?', '|', '$') or empty ID detected in metadata index: '--2017-12-12 12:54:52-- http://microbiome/'. There may be more errors present in this metadata. Sample/feature metadata files can be validated using Keemei: http://keemei.qiime.org

Has anyone had issues with having the # symbol in front of sample ID (i.e., #SampleID)? Are there any other characters in my metadata file that are invalid that I am just not seeing?

Again, any assistance is greatly appreciated!!!

Thanks,
Sarah

thermokarst · December 12, 2017, 4:04pm

Hi @Sarah_McGrath! Are you sure you have all of your files straight? The metadata file referenced in your command is called frog2forward-metadata_2.tsv, but the file you attached here is called QIIME2 Frog Microbiome Metadata - Frog2Forward Metadata_2.tsv.

What do you see when you run:

$ cat frog2forward-metadata_2.tsv

QIIME 2 doesn't do any interpolation or extrapolation of IDs - it is reporting that there is a value in your first column that looks like this: --2017-12-12 12:54:52-- http://microbiome/

The # is valid in that spot, to support QIIME 1 backwards compatibility. As far as characters you might not be seeing, please see my comments above RE how IDs are handled in QIIME 2. Thanks!
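A quick structural check can also show whether the file QIIME 2 is reading is a tab-separated table at all. The sketch below is only a sanity check under two assumptions of mine: the file has at least two columns (an ID column plus one more), and every row has as many fields as the header. It is not QIIME 2's own validation, and check_tab_structure is a hypothetical name.

```python
def check_tab_structure(text):
    """Return a list of problems found; an empty list means none were found.

    Assumption for this sketch: a usable metadata file has a header with at
    least two tab-separated columns, and every row has the same field count.
    """
    header_fields = None
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if header_fields is None:
            if len(fields) < 2:
                return ["header line has no tab-separated columns: %r" % line]
            header_fields = fields
            continue
        if len(fields) != len(header_fields):
            problems.append(
                "line %d has %d field(s), expected %d"
                % (number, len(fields), len(header_fields))
            )
    if header_fields is None:
        return ["file is empty"]
    return problems
```

A file that contains a download log instead of a table fails at the header, because its first line has no tabs.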

Sarah_McGrath · December 12, 2017, 4:41pm

Hello @thermokarst!

When I ran $ cat frog2forward-metadata_2.tsv I got this...

(qiime2-2017.11) qiime2@qiime2core2017-11:~$ cat frog2forward-metadata_2.tsv
--2017-12-12 12:54:52--  http://frog/
Resolving frog (frog)... failed: Name or service not known.
wget: unable to resolve host address ‘frog’
--2017-12-12 12:54:52--  http://microbiome/
Resolving microbiome (microbiome)... failed: Name or service not known.
wget: unable to resolve host address ‘microbiome’
--2017-12-12 12:54:52--  http://metadata/
Resolving metadata (metadata)... failed: Name or service not known.
wget: unable to resolve host address ‘metadata’
--2017-12-12 12:54:52--  http://-/
Resolving - (-)... failed: Name or service not known.
wget: unable to resolve host address ‘-’
--2017-12-12 12:54:52--  http://frog2forward/
Resolving frog2forward (frog2forward)... failed: Name or service not known.
wget: unable to resolve host address ‘frog2forward’
--2017-12-12 12:54:52--  http://metadata_2.tsv/
Resolving metadata_2.tsv (metadata_2.tsv)... failed: Name or service not known.
wget: unable to resolve host address ‘metadata_2.tsv’
/media/sf_Shared_Folder/QIIME2: No such file or directory
No URLs found in /media/sf_Shared_Folder/QIIME2.

I had tried using the wget command to input my metadata file into my QIIME2 directory. Is that command only applicable to obtaining data from the QIIME2 website? (I am obviously new to coding!).

I tried running the diversity code again and putting the actual file path to the tsv file in my shared folder and got this...

(qiime2-2017.11) qiime2@qiime2core2017-11:~$ qiime diversity core-metrics-phylogenetic --i-phylogeny rooted-tree-frog2forward.qza --i-table table-frog2forward.qza --p-sampling-depth 150 --m-metadata-file '/media/sf_Shared_Folder/QIIME2 Frog Microbiome Metadata - Frog2Forward Metadata_2.tsv' --output-dir frog2forward-core-metrics-results
There was an issue with loading the file /media/sf_Shared_Folder/QIIME2 Frog Microbiome Metadata - Frog2Forward Metadata_2.tsv as metadata:

  Non-string Metadata index values detected. There may be more errors present in this metadata. Sample/feature metadata files can be validated using Keemei: http://keemei.qiime.org

HELP!

thermokarst · December 12, 2017, 5:29pm

Yowzah! Looks like a party in there! Well, that would explain the first error message you saw: the very first value in the file is --2017-12-12 12:54:52-- http://microbiome/ (because the first row is parsed as the column labels).

Sorry, the wget commands are just for the official QIIME 2 tutorial files - you will need to do what works for you and your workflow instead. Since you made it in Google Sheets, you should be able to just download the file as a TSV from that interface.
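The wget log above looks like what happens when a path containing spaces is passed without quotes: the shell splits it on whitespace, and each word is then treated as a separate address. The exact command that was typed is not shown in this thread, so the command strings below are hypothetical. The sketch uses Python's shlex module, which follows shell-like splitting rules, to show the difference quoting makes.

```python
import shlex

# Hypothetical commands: the thread does not show what was actually typed.
unquoted = (
    "wget /media/sf_Shared_Folder/QIIME2 Frog Microbiome Metadata"
    " - Frog2Forward Metadata_2.tsv"
)
quoted = (
    "wget '/media/sf_Shared_Folder/QIIME2 Frog Microbiome Metadata"
    " - Frog2Forward Metadata_2.tsv'"
)

# Without quotes, every space starts a new argument.
unquoted_args = shlex.split(unquoted)
# With quotes, the whole path stays together as one argument.
quoted_args = shlex.split(quoted)

print(len(unquoted_args) - 1, "arguments:", unquoted_args[1:])
print(len(quoted_args) - 1, "argument:", quoted_args[1:])
```

Quoting alone would not make wget the right tool here, since it fetches URLs. For a file that is already on disk, copying it with cp (with the path in quotes) or pointing --m-metadata-file at the quoted path is enough.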

The best way to learn is to dive in head-first! We all start somewhere, and this is a great place to get help!

Okay, this one (the "Non-string Metadata index values detected" error) is our fault, and we have an open issue about it: NA is an invalid identifier in QIIME 2 at the moment. This has to do with some of the behind-the-scenes stuff in Q2 (we use pandas to load the metadata - it sees NA and turns it into a special "empty" value, with no real meaning).
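To spot IDs like this before running a pipeline, a small pre-check may help. The sketch below assumes the IDs are in the first tab-separated column. The set of strings is only a subset of the values that pandas.read_csv treats as missing by default, and whether QIIME 2 rejects each of them is an assumption on my part. The helper name ids_read_as_missing is made up for this example.

```python
# A subset of the strings that pandas.read_csv converts to a missing value
# by default. Not the full pandas list, and not taken from QIIME 2 itself.
PANDAS_DEFAULT_MISSING = {
    "", "NA", "N/A", "n/a", "NaN", "nan", "NULL", "null", "#N/A",
}


def ids_read_as_missing(tsv_text):
    """Return (line_number, id) for IDs that pandas would read as missing.

    Hypothetical helper for illustration only.
    """
    flagged = []
    for number, line in enumerate(tsv_text.splitlines(), start=1):
        if number == 1 or not line.strip():
            continue  # skip the column labels and blank lines
        sample_id = line.split("\t")[0]
        if sample_id in PANDAS_DEFAULT_MISSING:
            flagged.append((number, sample_id))
    return flagged
```

For a file with a sample named NA on its third line, this returns [(3, 'NA')].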

Unfortunately, the options here aren't great:

1. You could remove that line from your metadata file, and then that sample would be ignored. Depending on what you're doing in your analysis though, this could be a problem for you (that is a judgement call you will have to make, related to the soundness of your study with and without that sample).
2. You could rename that sample, which would require re-importing your sequences from the beginning, using a new sample name other than NA (e.g. NotApplicable).
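The metadata half of the rename can be sketched as below, assuming the ID is in the first tab-separated column and the first row holds the column labels. This only edits the metadata text. The sequences still have to be re-imported under the new name, as described above. The function name rename_sample_id is hypothetical.

```python
def rename_sample_id(tsv_text, old_id, new_id):
    """Return the metadata text with old_id replaced by new_id in column 1.

    Only an exact match in the first column is changed. Other columns are
    left alone, even if they contain the same text.
    """
    renamed = []
    for number, line in enumerate(tsv_text.splitlines(), start=1):
        fields = line.split("\t")
        if number > 1 and fields[0] == old_id:
            fields[0] = new_id
        renamed.append("\t".join(fields))
    return "\n".join(renamed) + "\n"
```

Matching on the whole first field avoids touching a description that happens to contain the letters NA.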

Let us know if you need a hand with the second option there!

Thanks and good luck!

Sarah_McGrath · December 12, 2017, 7:54pm

Hi @thermokarst!

I just tried the second option you proposed:

You could rename that sample, which would require re-importing your sequences from the beginning, using a new sample name other than NA (e.g. NotApplicable).

...and that worked! I re-ran all of the steps from the beginning after changing the file name and sample ID in the metadata file. I ran the diversity core metrics pipeline and now have the first visualizations of my data!

Thanks so much for your assistance with this. I really appreciate all of the hard work that has gone into QIIME 2 and all of you that continuously assist people like myself on the forum! It makes a huge difference, especially for those of us just starting out.

Thanks again!
Sarah

system · January 13, 2018, 1:54am

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.

jairideout · February 16, 2018, 4:58pm

In the QIIME 2 2018.2 release, Metadata now supports IDs and column names that are NA; this name will no longer be interpreted as missing data.

There are a number of other changes to QIIME 2 Metadata in the 2018.2 release. See this forum announcement for details on what changed, as well as the updated Metadata tutorial.