MIMOSA: A model-independent framework for transcription factor binding site motif similarity assessment
Tsukanov, A. V.; Levitsky, V. G.
Motivation: Transcription factors (TFs) regulate gene expression by binding specific DNA sequences, which are commonly represented by motif models. Although position weight matrices (PWMs) remain the dominant motif representation, alternative models, such as Markov models, can capture interpositional dependencies and may provide higher predictive performance. However, existing motif comparison tools are designed mainly for PWMs or require motifs to be reduced to PWM/PPM representations. This creates a major bottleneck for comparing motifs represented by different model architectures. This limitation complicates the interpretation of de novo motif discovery results and hinders the systematic integration of diverse motif models into genomic analyses.

Results: We present MIMOSA (Model-Independent Motif Similarity Assessment), a model-independent framework for direct comparison of TF binding site (TFBS) motifs regardless of their mathematical representation. MIMOSA assesses motif similarity by comparing calibrated recognition profiles produced by motifs of different models on the same DNA sequence set, rather than by comparing the motifs themselves. In a cross-database benchmark on HOCOMOCO motifs, MIMOSA achieved retrieval performance comparable to established PWM-oriented tools, including Tomtom and MACRO-APE, with MRR and Recall@k close to the best-performing methods. Pairwise ranking comparisons showed that MIMOSA captures a similarity signal consistent with existing approaches while providing a representation-independent comparison strategy. Application to de novo motifs derived from ChIP-seq data for the ATF3 TF demonstrated that recognition-profile comparison distinguished alternative spacer variants represented as separate PWMs from their integration within more flexible models such as BaMM and Slim. Thus, MIMOSA enables quantitative cross-model motif comparison and supports interpretation of motif heterogeneity in TFBS analyses.
Availability and implementation: MIMOSA is implemented in Python and is freely available at https://github.com/ubercomrade/mimosa.
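
The retrieval metrics named in the Results, MRR (mean reciprocal rank) and Recall@k, can be illustrated with a minimal sketch. This is not taken from the MIMOSA code base: the function names, the toy motif identifiers, and the per-query definition of Recall@k used here (fraction of a query's relevant motifs found in its top k hits, averaged over queries) are assumptions made for illustration, and the benchmark itself may define Recall@k differently.

```python
from typing import Dict, List, Set


def reciprocal_rank(ranked: List[str], relevant: Set[str]) -> float:
    """Return 1/rank of the first relevant hit, or 0.0 if none is retrieved."""
    for rank, hit in enumerate(ranked, start=1):
        if hit in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(rankings: Dict[str, List[str]],
                         relevant: Dict[str, Set[str]]) -> float:
    """Average the reciprocal rank over all query motifs."""
    scores = [reciprocal_rank(hits, relevant[q]) for q, hits in rankings.items()]
    return sum(scores) / len(scores)


def recall_at_k(rankings: Dict[str, List[str]],
                relevant: Dict[str, Set[str]],
                k: int) -> float:
    """Average, over queries, the fraction of relevant motifs in the top k hits.

    Assumption: every query has at least one relevant motif.
    """
    scores = [
        len(set(hits[:k]) & relevant[q]) / len(relevant[q])
        for q, hits in rankings.items()
    ]
    return sum(scores) / len(scores)


# Hypothetical toy data: each query motif has a ranked list of database hits
# and one "correct" target motif (identifiers are made up).
rankings = {
    "query1": ["motifA", "motifB", "motifC"],  # correct hit at rank 1
    "query2": ["motifB", "motifA", "motifC"],  # correct hit at rank 2
    "query3": ["motifC", "motifB", "motifA"],  # correct hit at rank 3
}
relevant = {q: {"motifA"} for q in rankings}

mrr = mean_reciprocal_rank(rankings, relevant)   # (1 + 1/2 + 1/3) / 3
recall_1 = recall_at_k(rankings, relevant, k=1)  # 1 of 3 queries hit in top 1
recall_2 = recall_at_k(rankings, relevant, k=2)  # 2 of 3 queries hit in top 2
```

With a single relevant motif per query, as in this toy example, Recall@k reduces to the fraction of queries whose correct match appears among the top k hits.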