# FLASH: Fast Model Adaptation in ML-centric Cloud Platforms
FLASH is a general framework that introduces embedding-based meta-learning in ML-based cloud systems management to facilitate fast model adaptation across new applications or cloud environments without any changes to the original ML model design or training algorithm. In this repository, we demonstrate how to use FLASH on three existing ML agents that manage resource configurations, autoscaling, and server power.
See the paper for more details.
## Case Study I: Resource Config Search
Resource configuration search is a fundamental problem in cloud computing where the goal is to find the optimal configuration of resources (e.g., CPU, memory, storage) for a given workload. The problem is usually formulated as a regression problem (supervised learning) where the inputs are workload-related features (e.g., CPU usage, memory usage, request rate) and resource configuration, and the output is the expected performance (e.g., response time, throughput).
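As a concrete (hypothetical) instance of this regression formulation, the sketch below fits a linear model mapping workload features plus a memory configuration to predicted latency, then searches candidate configurations for the best predicted performance. All data values, feature names, and the linear model choice are illustrative assumptions, not the repo's actual model:

```python
import numpy as np

# Hypothetical toy data: each row is (cpu_util, mem_util, req_rate, mem_config_mb)
# and the target is the observed response time in ms. All values are made up;
# real features would come from workload profiling runs.
X = np.array([
    [0.2, 0.3, 100, 128],
    [0.5, 0.4, 200, 256],
    [0.8, 0.7, 400, 512],
    [0.3, 0.2, 150, 256],
])
y = np.array([120.0, 90.0, 60.0, 95.0])  # response time (ms)

# Fit a least-squares linear model as the simplest f(features, config) -> perf.
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])  # add a bias column
w, *_ = np.linalg.lstsq(X_aug, y, rcond=None)

def predict_latency(features_and_config):
    """Predict response time for one (features + config) vector."""
    return float(np.append(features_and_config, 1.0) @ w)

# Config search: for fixed workload features, pick the candidate memory size
# with the lowest predicted latency.
candidates = [128, 256, 512]
best = min(candidates, key=lambda m: predict_latency([0.4, 0.3, 180, m]))
```

A real agent would use a learned nonlinear regressor and a larger configuration space, but the search loop has the same shape: predict performance per candidate, then optimize over candidates.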
After pretraining, FLASH can be used to adapt the learned ML model to new, unseen applications or cloud environments, without requiring retraining the model. We use the following datasets to train and evaluate FLASH on resource configuration search:
- Sizeless dataset: https://github.com/Sizeless/ReplicationPackage
- Multi-cloud config search dataset: https://github.com/IBM/multi-cloud-configuration-dataset
- OpenWhisk dataset: Generated by running 1000 synthetic applications on an OpenWhisk cluster deployed on IBM Cloud.
For functionality and usage details, please refer to the README files in each subfolder: Sizeless, OpenWhisk, and Multi-cloud.
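The adaptation step described above works by inferring a task representation from a handful of samples while keeping the pretrained weights frozen. The sketch below illustrates this embedding-conditioned inference pattern; the shapes, the random stand-in weights, and the mean-pooling encoder are all assumptions for illustration, not FLASH's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "pretrained" weights (random stand-ins; real weights would come
# from meta-training across many applications).
W_feat = rng.normal(size=(4, 8))   # input features -> hidden
W_embed = rng.normal(size=(8, 8))  # task embedding -> hidden modulation
w_out = rng.normal(size=8)         # hidden -> predicted performance

def encode_task(support_x, support_y):
    """Pool a few (input, output) support samples into a fixed-size task embedding."""
    pairs = np.hstack([support_x, support_y[:, None]])  # shape (k, 5)
    W_enc = np.ones((5, 8)) / 5.0  # hypothetical encoder projection
    return (pairs @ W_enc).mean(axis=0)

def predict(x, task_embedding):
    """Frozen model conditioned on the task embedding -- no weight updates."""
    hidden = np.tanh(x @ W_feat + task_embedding @ W_embed)
    return float(hidden @ w_out)

# 3-shot adaptation to a new application: only the embedding changes,
# so there is no retraining of the model parameters.
support_x = rng.normal(size=(3, 4))
support_y = rng.normal(size=3)
e = encode_task(support_x, support_y)
pred = predict(rng.normal(size=4), e)
```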
Reproducing the results in the paper:

Requirements:
- Ubuntu 20.04
- Python 3.12.1

```shell
cd case-1-resource-config-search-sizeless
pip install -r requirements.txt
./reproduce.sh
cd ../case-1-resource-config-search-owk
./reproduce.sh
cd ../case-1-resource-config-search-multicloud
./reproduce.sh
```
The results correspond to Table 2 in the paper; the main results should be similar to:
**Sizeless Dataset**

| # of Samples | 1-shot | 2-shot | 3-shot |
| --- | --- | --- | --- |
| Sizeless (testing) | 0.36 | 0.40 | 0.34 |
| FLASH (testing) | 0.05 | 0.04 | 0.03 |

**OpenWhisk Dataset**

| # of Samples | 1-shot | 2-shot | 3-shot |
| --- | --- | --- | --- |
| Sizeless (testing) | 0.82 | 0.55 | 0.54 |
| FLASH (testing) | 0.36 | 0.26 | 0.28 |

**Multi-cloud Dataset**

| # of Samples | 1-shot | 2-shot | 3-shot |
| --- | --- | --- | --- |
| Sizeless (testing) | 0.99 | 0.89 | 0.80 |
| FLASH (testing) | 0.65 | 0.42 | 0.50 |
## Case Study II: Workload Autoscaling
Workload autoscaling is usually modeled as a sequential decision-making problem in which the agent scales the controlled workload horizontally or vertically. For example, in Kubernetes, the autoscaler agent can scale the number of replicas and the size of each replica/pod for a Deployment instance. As many papers have shown (e.g., DeepScaler), RL is well suited to learning an optimal policy for a specific application without relying on potentially inaccurate heuristics or hand-crafted rules.
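To make the sequential decision-making framing concrete, here is a toy autoscaling MDP sketch. The state, action space, capacity threshold, and reward weights are illustrative assumptions, not the environment or reward used in the paper:

```python
import random

class ToyAutoscalingEnv:
    """Minimal autoscaling MDP sketch.
    State: (replica count, offered load). Action: scale replicas by -1/0/+1.
    All dynamics and constants below are made up for illustration."""

    def __init__(self):
        self.replicas, self.load = 2, 100.0

    def step(self, action):
        # Horizontal scaling action, clamped to [1, 10] replicas.
        self.replicas = max(1, min(10, self.replicas + action))
        # Load drifts randomly between steps.
        self.load = max(10.0, self.load + random.uniform(-20.0, 20.0))
        per_replica = self.load / self.replicas
        # Penalize load above an assumed per-replica capacity (SLO proxy).
        slo_violation = max(0.0, per_replica - 60.0)
        # Reward balances SLO compliance against resource cost.
        reward = -slo_violation - 0.5 * self.replicas
        return (self.replicas, self.load), reward

env = ToyAutoscalingEnv()
state, reward = env.step(+1)  # scale out by one replica
```

An RL agent trained in such an environment learns when scaling out is worth the extra resource cost; FLASH's contribution is making that learned policy transfer quickly to a new application.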
In this case study, we demonstrate that FLASH adapts the pre-trained RL model to new, unseen applications faster than transfer learning with simple parameter sharing.
For functionality and usage details, please refer to the README file.
Reproducing the results in the paper (about 2.5 hours):

Requirements:
- Ubuntu 20.04
- Python 3.12.1

```shell
cd case-2-autoscaling
pip install -r requirements.txt
./reproduce.sh
```
The results correspond to Sec. 5.2 in the paper, where the main results claimed by the paper are:
- Improved reward by 71.6% (i.e., from 37% to 10.5%)
- Reduced adaptation cost by 5.5x
## Case Study III: Server Power Management
CPU frequency scaling requires the agent to balance workload performance improvements against the extra power cost of increasing the core frequency. As with workload autoscaling, an RL model can learn an optimal policy for controlling the core frequency (i.e., when and by how much to scale it) for a specific application.
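The performance/power trade-off can be expressed as a reward function; the sketch below uses a made-up linear trade-off with an assumed weight `alpha` and made-up throughput/power numbers, not the paper's actual reward:

```python
def freq_scaling_reward(throughput, power_watts, alpha=0.1):
    """Illustrative reward trading off performance against power.
    alpha is an assumed weighting, not a value from the paper."""
    return throughput - alpha * power_watts

# Higher frequency typically means more throughput but disproportionately
# more power; the reward decides whether the boost pays off.
low = freq_scaling_reward(throughput=800, power_watts=50)     # e.g., a low P-state
high = freq_scaling_reward(throughput=1000, power_watts=120)  # e.g., a high P-state
```

With these (made-up) numbers the higher frequency wins, but as `alpha` grows the power penalty dominates and the agent learns to stay at lower frequencies.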
In this case study, we demonstrate that FLASH adapts the pre-trained RL model to new, unseen applications faster than transfer learning with simple parameter sharing.
For functionality and usage details, please refer to the README file.
Reproducing the results in the paper (about 3 hours):

Requirements:
- Ubuntu 20.04
- Python 3.12.1

```shell
cd case-4-cpu-freq-scaling
pip install -r requirements.txt
./reproduce.sh
```
The results correspond to Sec. 5.3 in the paper, where the main results claimed by the paper are:
- Improved reward by 81.8% (i.e., from 39% to 7.1%)
- Reduced adaptation cost by 9.2x
## Contact
Haoran Qiu, haoranq4@illinois.edu