
Advancing Privacy-Aware Machine Learning on Sensitive Data via Edge-Based Continual μ-Training for Personalized Large Models

Huang, Z.; Yu, L.; Herbozo Contreras, L. F.; Eshraghian, K.; Truong, N. D.; Nikpour, A.; Kavehei, O.

2024-05-20 · cardiovascular medicine
medRxiv · DOI: 10.1101/2024.05.18.24307564
Abstract

This paper introduces an innovative method for fine-tuning a larger multi-label model for abnormality detection, utilizing a smaller trainer and advanced knowledge distillation techniques. It examines the effects of fine-tuning on various abnormalities, noting varied improvements depending on the Original Model's performance on specific tasks. The experimental setup, optimized for on-device inference and fine-tuning with limited computational resources, demonstrates moderate yet promising enhancements in model performance post-fine-tuning. Key insights from the study include the importance of aligning the μ-Trainer's behavior with the Original Model and the influence of hyper-parameters such as batch size on fine-tuning outcomes. The research acknowledges limitations such as the limited exploration of loss functions in multi-label models and constraints in architectural design, suggesting potential avenues for future investigation. While the proposed Naive Continual Fine-tuning Process is in its early stages, it highlights the potential for long-term model personalization. Moreover, relying exclusively on weight transfer for fine-tuning strengthens user privacy: fine-tuning happens on-device, and no data or gradients are transferred to the server. Although the performance improvements after fine-tuning are modest, the fine-tuned layers represent only a small fraction of the total weights: 0.7% in the Original Model and 1.6% in the μ-Trainer. This study establishes a foundational framework for advancing personalized model adaptation, on-device inference, and fine-tuning while emphasizing the importance of safeguarding data privacy in model development.
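To make the distillation-plus-weight-transfer idea concrete, here is a minimal PyTorch sketch of the kind of loop the abstract describes: a small μ-Trainer is fine-tuned on local data against a frozen Original Model's soft predictions, and only the updated weights of a shared layer leave the device. The architecture, layer names (`backbone`, `head`), loss blend, and all hyper-parameters here are illustrative assumptions, not the paper's actual design.

```python
# Hypothetical sketch of on-device distillation fine-tuning with
# weight-only transfer. Shapes, names, and hyper-parameters are
# illustrative assumptions, not taken from the paper.
import torch
import torch.nn as nn

NUM_LABELS = 5  # assumed number of abnormality labels

class MicroTrainer(nn.Module):
    """Small surrogate model; its final head is shared with the Original Model."""
    def __init__(self, in_dim=128):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, 32), nn.ReLU())
        self.head = nn.Linear(32, NUM_LABELS)  # the transferable layer

    def forward(self, x):
        return self.head(self.backbone(x))

def fine_tune_on_device(micro, original, loader, epochs=1, alpha=0.5):
    """Distill the frozen Original Model into the micro-trainer on local data."""
    original.eval()
    opt = torch.optim.Adam(micro.parameters(), lr=1e-4)
    bce = nn.BCEWithLogitsLoss()  # multi-label objective; accepts soft targets
    for _ in range(epochs):
        for x, y in loader:  # y: float multi-hot labels in [0, 1]
            with torch.no_grad():
                teacher_probs = torch.sigmoid(original(x))  # soft targets
            logits = micro(x)
            # Blend ground-truth labels with the teacher's soft predictions.
            loss = alpha * bce(logits, y) + (1 - alpha) * bce(logits, teacher_probs)
            opt.zero_grad()
            loss.backward()
            opt.step()
    # Only the fine-tuned head's weights leave the loop -- no data, no
    # gradients -- mirroring the weight-transfer privacy argument.
    return {k: v.detach().clone() for k, v in micro.head.state_dict().items()}
```

In this sketch the returned state dict would then be loaded into the Original Model's matching layer (e.g. `original.head.load_state_dict(transferred)`), so personalization propagates while raw data and gradients stay on the device.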

Matching journals

The top 6 journals account for 50% of the predicted probability mass (a quick check of this cutoff follows the table).

Rank | Journal | Papers in training set | Percentile | Probability
1 | Nature Communications | 4913 | Top 3% | 22.2%
2 | PNAS Nexus | 147 | Top 0.1% | 6.3%
3 | Bioinformatics | 1061 | Top 4% | 6.2%
4 | Scientific Reports | 3102 | Top 20% | 6.2%
5 | PLOS Global Public Health | 293 | Top 2% | 6.2%
6 | Medical Image Analysis | 33 | Top 0.3% | 4.3%
--- 50% of probability mass above ---
7 | eLife | 5422 | Top 21% | 4.1%
8 | Nature Medicine | 117 | Top 0.7% | 3.9%
9 | npj Digital Medicine | 97 | Top 1% | 3.5%
10 | iScience | 1063 | Top 7% | 3.0%
11 | PLOS ONE | 4510 | Top 49% | 2.1%
12 | Cell Systems | 167 | Top 6% | 1.9%
13 | Nature | 575 | Top 10% | 1.9%
14 | Nature Computational Science | 50 | Top 0.9% | 1.3%
15 | Nature Machine Intelligence | 61 | Top 3% | 1.2%
16 | Patterns | 70 | Top 2% | 1.2%
17 | IEEE Transactions on Computational Biology and Bioinformatics | 17 | Top 0.4% | 1.2%
18 | PLOS Computational Biology | 1633 | Top 21% | 1.1%
19 | Cell Genomics | 162 | Top 5% | 1.1%
20 | Frontiers in Immunology | 586 | Top 6% | 0.9%
21 | PLOS Digital Health | 91 | Top 2% | 0.9%
22 | IEEE Access | 31 | Top 0.9% | 0.8%
23 | Science Bulletin | 22 | Top 0.9% | 0.7%
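The 50% cutoff flagged in the table is just a running sum over the ranked probabilities. As a quick sanity check, the snippet below (with the table's percentages hard-coded) confirms that the cutoff falls after the sixth journal:

```python
# Recompute the 50%-probability-mass cutoff from the table above.
probs = [22.2, 6.3, 6.2, 6.2, 6.2, 4.3, 4.1, 3.9, 3.5, 3.0,
         2.1, 1.9, 1.9, 1.3, 1.2, 1.2, 1.2, 1.1, 1.1, 0.9,
         0.9, 0.8, 0.7]  # predicted probability per journal, in %

cumulative = 0.0
for rank, p in enumerate(probs, start=1):
    cumulative += p
    if cumulative >= 50.0:
        print(f"top {rank} journals cover {cumulative:.1f}% of the mass")
        break  # prints: top 6 journals cover 51.4% of the mass
```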