SMetaS

Publication

Introduction

There is growing interest in standardizing metabolomics data. Past standardization work has focused on data acquisition, data processing, and data storage, but metabolomics sample and study databases are of limited use without ontology-based descriptions of the biological samples they contain. Standardizing sample descriptions in this way would support the reproducibility of metabolomics as a whole, reduce the workload of meta-analyses, and help metabolomics data feed into large machine learning models.

We therefore designed SMetaS to enable sample-oriented standardization. It serves as a frontend for study submissions to metabolomics databases and aims to counter the user apathy that leads to data being submitted with low-quality metadata.

Results

Sample Metadata Curation

The core result of this work is a tool that enables programmatic capture of sample metadata using established vocabularies. An example is shown below.
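As a general illustration of capturing metadata against controlled vocabularies (separate from the tool's own example), the sketch below normalizes each submitted value and stores it only if it matches a term in that field's vocabulary; anything else is flagged for curation rather than kept as free text. The field names, the tiny term sets, and the names `VOCABULARIES`, `SampleRecord`, and `capture_sample` are hypothetical and are not part of the SMetaS API.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical controlled vocabularies, one per metadata field.
# Illustrative only: SMetaS draws on established vocabularies not shown here.
VOCABULARIES: dict[str, set[str]] = {
    "species": {"homo sapiens", "mus musculus"},
    "organ": {"liver", "plasma", "kidney"},
}


@dataclass
class SampleRecord:
    sample_id: str
    metadata: dict[str, str]
    unrecognized: dict[str, str] = field(default_factory=dict)


def capture_sample(sample_id: str, raw: dict[str, str]) -> SampleRecord:
    """Keep a raw value only if its normalized form is a vocabulary term."""
    record = SampleRecord(sample_id=sample_id, metadata={})
    for key, value in raw.items():
        normalized = value.strip().lower()
        vocab = VOCABULARIES.get(key)
        if vocab is not None and normalized in vocab:
            record.metadata[key] = normalized
        else:
            # Values outside the vocabulary are flagged for curation
            # instead of being stored as free text.
            record.unrecognized[key] = value
    return record


rec = capture_sample("S1", {"species": "Homo Sapiens", "organ": "Livr"})
print(rec.metadata)      # {'species': 'homo sapiens'}
print(rec.unrecognized)  # {'organ': 'Livr'}
```

Exact matching alone rejects simple misspellings such as "Livr", which is the gap a term-matching step is meant to close.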

Characteristics of SMetaS

SMetaS has several characteristics that make it robust and easy to use.

Workflow

To achieve these characteristics, we developed a pipeline that combines orthogonal vocabularies with machine learning models to quickly map user-written terms to terms from our vocabularies. The vocabularies can also be expanded as new terms are created.
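The models used by the pipeline are not described here, so the sketch below swaps in a simple stand-in: character-trigram cosine similarity (a string-matching technique, not a trained model) that maps a user-written term to its closest vocabulary term, plus an `add` method showing how the vocabulary can grow. The `Vocabulary` class, the 0.5 threshold, and the example terms are hypothetical assumptions, not SMetaS code.

```python
from __future__ import annotations

import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams of a padded, lowercased term."""
    padded = f"  {text.strip().lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class Vocabulary:
    """Hypothetical vocabulary index; a stand-in for the ML matching step."""

    def __init__(self, terms: list[str]) -> None:
        self.index: dict[str, Counter] = {}
        for term in terms:
            self.add(term)

    def add(self, term: str) -> None:
        # Expanding the vocabulary only requires indexing the new term.
        self.index[term] = char_ngrams(term)

    def match(self, written: str,
              threshold: float = 0.5) -> tuple[str | None, float]:
        """Return the closest term, or None if no score reaches threshold."""
        query = char_ngrams(written)
        best_term, best_score = None, 0.0
        for term, grams in self.index.items():
            score = cosine(query, grams)
            if score > best_score:
                best_term, best_score = term, score
        if best_score >= threshold:
            return best_term, best_score
        return None, best_score


vocab = Vocabulary(["liver", "kidney", "blood plasma"])
print(vocab.match("Livr"))   # matches "liver" despite the typo
print(vocab.match("urine"))  # no match yet: (None, 0.0)
vocab.add("urine")
print(vocab.match("urine"))  # now matches the newly added term
```

A trained model could capture synonyms (for example, a common name versus a scientific name) that pure string similarity misses; this sketch only shows the shape of the lookup and expansion steps.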

Documentation

Extensive documentation is available at https://metabolomics-us.github.io/metadatastandardizer/