BEDMS: A metadata standardizer for genomic region attributes
Tambe, S.; Khoroshevskyi, O.; Park, S.-H.; LeRoy, N. J.; Campbell, D. R.; Zheng, G.; Zhang, A.; Sheffield, N. C.
High-throughput sequencing technologies have generated vast omics data annotating genomic regions. A challenge arises in integrating this data because the associated metadata does not follow a uniform schema. This hinders data management, discovery, interoperability, and reusability. Existing tools that address metadata standardization are limited in scope, target specific data sets or data types, and are not generally applicable to custom schemas. To improve standardization of genomic interval metadata, we have developed BEDMS. We developed and evaluated several model architectures and trained models that achieved high performance on held-out training data. With a trained model, BEDMS provides users with predicted metadata attributes that follow a standardized schema. Furthermore, BEDMS provides the ability to train custom models. To demonstrate, we trained BEDMS on three different schemas, allowing users to choose which schema to standardize into. We also deployed BEDMS on PEPhub, which provides a graphical user interface that lets users standardize metadata without any local training or software. In conclusion, BEDMS offers a practical one-stop solution for metadata management and standardization for genomic interval data.
Matching journals
The top 4 journals account for 50% of the predicted probability mass.