SMetaS

Publication

Article in Metabolites.

Introduction

There is growing interest in standardizing metabolomics data. While past work in standardization has focused on data acquisition, data processing, and data storage aspects, metabolomics sample/study databases are useless without ontology-based descriptions of biological samples. Standardizing in this way would validate metabolomics’ reproducibility overall, decrease meta-analysis workload, and help metabolomics feed into large machine learning models.

Therefore, we designed SMetaS to enable sample-oriented standardization as a frontend for study submissions to metabolomics databases, with the aim of overcoming the user apathy that leads to submissions with low-quality metadata.

Results

Sample Metadata Curation

The core result of this work is a tool that enables the programmatic capture of sample metadata in conjunction with established vocabularies. An example is shown below.

image.png

Characteristics of SMetaS

SMetaS has several characteristics that make it robust and easy to use.

image.png

Workflow

To realize these principles, we developed a pipeline that accumulates orthogonal vocabularies and machine learning models, which quickly relate the terms a user writes to the terms in our vocabularies. Our vocabularies can also expand when new terms are created.

image.png
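
The idea of relating a written term to a vocabulary term, and of expanding a vocabulary when a new term appears, can be illustrated with a minimal sketch. This is not the SMetaS implementation: it swaps in simple string similarity from Python's standard library (`difflib`) in place of the machine learning models, and the vocabulary contents, the `EX:` identifiers, and the function names `suggest_terms` and `add_term` are all hypothetical placeholders.

```python
from difflib import get_close_matches

# Hypothetical toy vocabulary: lowercase label -> identifier.
# The identifiers are made-up placeholders, not real ontology IDs.
VOCABULARY = {
    "liver": "EX:0001",
    "kidney": "EX:0002",
    "blood plasma": "EX:0003",
}


def suggest_terms(written, vocabulary, n=3, cutoff=0.6):
    """Return up to n vocabulary labels that resemble the written term.

    String similarity stands in here for a trained model (an assumption).
    """
    normalized = written.strip().lower()
    return get_close_matches(normalized, list(vocabulary), n=n, cutoff=cutoff)


def add_term(vocabulary, label, identifier):
    """Expand the vocabulary with a new term; existing labels are kept."""
    vocabulary.setdefault(label.strip().lower(), identifier)


if __name__ == "__main__":
    # A misspelled entry is related to the closest vocabulary label.
    print(suggest_terms("Livr", VOCABULARY))

    # A term missing from the vocabulary can be added and then matched.
    add_term(VOCABULARY, "Spleen", "EX:0004")
    print(suggest_terms("spleen ", VOCABULARY))
```

In this sketch a term with no sufficiently similar label returns an empty list, which is the point at which a new vocabulary entry could be proposed.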

Documentation

Extensive documentation is available at https://metabolomics-us.github.io/metadatastandardizer/