Expand the impact of your microbiome data through metadata standardization!

cherman2 · August 9, 2022, 5:36pm

Hello QIIME 2 Community,

Have you ever tried to work with publicly available data and been left frustrated or confused trying to interpret the associated metadata? There has to be a better way!

Have you ever used publicly available metadata?

Yes and it was frustrating or confusing
Yes and it was easy
No


I am Chloe Herman, a third-year Ph.D. student in Dr. Greg Caporaso's lab and a National Microbiome Data Collaborative (NMDC) Ambassador. I am working with the NMDC to teach the importance of metadata standardization and data management. As part of this project, I have created a new video that can help you expand the impact of your microbiome data through metadata standardization.

The NMDC's mission is to provide a gateway to Findable, Accessible, Interoperable, and Reusable (FAIR) multi-omics microbiome data by leveraging best practices for data curation and processing. The NMDC is providing a pathway to FAIR metadata by using and combining existing community-driven metadata standards for our diverse microbiome community! They are also providing training on how to use these standards, like the video I just made!

My video discusses why standardizing metadata is important and how standardized metadata can easily fit into your analysis in QIIME 2! If you want to follow along with the video, here is a link to the incorrectly standardized metadata that I use in the demo:

I would love to start a discussion on this thread, so let me know if you have any questions about metadata standardization or metadata in QIIME 2. I would also love to know one thing you learned from the video, or something you thought was missing that would be good for our QIIME 2 community to know.

Important Links:
The Video: https://www.youtube.com/watch?v=erklD1bofzE
NMDC Website: microbiomedata.org
NMDC Data Portal: data.microbiomedata.org
Genomics Standards Consortium (GSC) Website: https://www.gensc.org/index.html
GSC MIxS GitHub: https://github.com/GenomicsStandardsConsortium/mixs
GOLD Ecosystem Tree: https://gold.jgi.doe.gov/ecosystemtree
Ontology Search Website: https://www.ebi.ac.uk/ols/index
QIIME 2 Metadata: https://docs.qiime2.org/2022.2/tutorials/metadata/
Cual-id GitHub: https://github.com/johnchase/cual-id
Cual-id Paper: https://journals.asm.org/doi/10.1128/msystems.00010-15?permanently=true
UUID Wiki: Universally unique identifier - Wikipedia

hugh · August 12, 2022, 10:55pm

Hi Chloe,

Great video! Thanks for starting this discussion. I just started as a scientist at the National Ecological Observatory Network (NEON) in Boulder, CO. Check us out on the NEON website. I am responsible for the microbial metagenomics data. Your post is very timely for me: we have just started discussions with the NMDC about linking our data with theirs using the MIxS/GSC standards. It is early days, but everyone here has been working hard to realize this goal. Initially, we will be submitting the NEON shotgun metagenomics data; however, I will be applying the same GSC standards and formats to our marker gene data as well (bacterial 16S, fungal ITS, and macroinvertebrate and zooplankton COI).

A long-term goal of mine is to move beyond the MIxS minimum metadata and create ways to link more of NEON's data. We have over 180 data products, ranging from remote sensing to soil chemistry to plant and animal records. In my view, metabarcoding data become exciting only when paired with other data, so I believe this will be useful to a wide range of researchers.

This is a great forum to start this discussion, as the QIIME 2 community is large and very active. I look forward to seeing feedback and input from everyone here.
-Hugh

Propolis4651 · August 16, 2022, 9:06pm

Hi Chloe,
Thanks for the great YouTube presentation. It looks like the sample metadata is in good shape. Now we have a list of organisms from the sequencing data for each sample. Do you see any possibility of associating the sequences and/or individual organism names with your sample metadata? That sounds like “messing around” with some really big data science stuff. I have just started working with the NCBI’s command-line tools, “datasets” and “dataformat”. The NCBI uses .json files, which makes the jq command-line tool indispensable. I’m looking forward to learning about your metadata portal; I have only just created my login account and searched for PI Knight. I’m also very interested in your cual-id suggestion, and I will definitely add it to my NCBI conda environment. Keep up the good work!

cherman2 · August 31, 2022, 9:36pm

Hello @Propolis4651,

Thanks for your comment, and sorry for the slow response on my end!
It sounds like what you are describing is a feature table! A feature table associates all of your sequences/organisms with your samples.

I sometimes merge these using Python, which might be of interest to you! Usually, though, the sample metadata holds information about the samples, the feature table holds information about the features in those samples, and you can link the two using the sample IDs.
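As a rough illustration of linking the two by sample ID, here is a minimal, standard-library-only Python sketch. It is not the workflow from the video. All sample IDs, feature IDs, and metadata values are hypothetical toy data. The sketch assumes the first column of the metadata TSV holds the sample IDs, as in QIIME 2 metadata files, and it represents the feature table as a plain dictionary of per-sample counts rather than a real QIIME 2 artifact.

```python
import csv
import io

# Hypothetical QIIME 2-style metadata TSV; the first column holds the sample IDs.
METADATA_TSV = """sample-id\tenv_broad_scale\tcollection_date
S1\tforest biome\t2022-06-01
S2\tgrassland biome\t2022-06-02
S3\tforest biome\t2022-06-03
"""

# Hypothetical feature table: sample ID -> {feature ID: count}.
FEATURE_TABLE = {
    "S1": {"ASV_a": 10, "ASV_b": 0},
    "S2": {"ASV_a": 3, "ASV_b": 7},
    "S4": {"ASV_a": 5, "ASV_b": 1},  # S4 has no matching metadata row
}


def read_metadata(tsv_text):
    """Parse a metadata TSV into {sample_id: {column: value}}."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    id_column = reader.fieldnames[0]  # assumption: first column is the sample ID
    return {row.pop(id_column): row for row in reader}


def merge_on_sample_id(feature_table, metadata):
    """Inner-join feature counts and metadata on sample ID.

    Returns one long-format record per (sample, feature) pair, plus the
    sample IDs that appear in only one of the two inputs.
    """
    shared = sorted(feature_table.keys() & metadata.keys())
    records = []
    for sample_id in shared:
        for feature_id, count in sorted(feature_table[sample_id].items()):
            record = {"sample_id": sample_id, "feature_id": feature_id, "count": count}
            record.update(metadata[sample_id])  # attach this sample's metadata columns
            records.append(record)
    unmatched = sorted(feature_table.keys() ^ metadata.keys())
    return records, unmatched


metadata = read_metadata(METADATA_TSV)
records, unmatched = merge_on_sample_id(FEATURE_TABLE, metadata)
for r in records:
    print(r)
print("Unmatched sample IDs:", unmatched)
```

The sketch reports unmatched IDs on purpose, because sample IDs that don't line up between the table and the metadata are a common symptom of unstandardized metadata. If pandas is available, `DataFrame.join` or `DataFrame.merge` on a sample-ID index does the same inner join more concisely.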