# Autotuning and Predictive Autotuning
`predtuner` performs autotuning on program approximation knobs using an error-predictive proxy
in place of the original program, greatly speeding up autotuning while obtaining results of
comparable quality.
Work in progress.
## Requirements
Prerequisite packages are listed in `./env.yaml`. Conda is the validated and recommended way to set
up a working environment, and `pip` is needed for installing this package.
At the root directory of this repo, do:
```bash
conda env create -n predtuner -f env.yaml
conda activate predtuner
pip install -e .
```
`-e` can be omitted if you don't intend to modify the code in this package.
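If the installation succeeded, the package's main entry points should import cleanly.
A minimal sanity check (the names below are the ones used throughout the tutorial):

```python
# Minimal install check for predtuner.
import predtuner as pt

print(pt.TorchApp, pt.accuracy, pt.get_knobs_from_file, pt.config_pylogger)
```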
## Getting Started
The documentation page contains a full tutorial.
To build the documentation:
```bash
pip install sphinx sphinx_rtd_theme sphinx_autodoc_typehints
cd doc
make html
```
The documentation page will be created at `doc/build/html/index.html`.
Open it in a browser and go to the "Getting Started" section.
### Model Data for Example / Testing
`predtuner` contains 10 demo models which are also used in tests.
- Download and extract [this](https://drive.google.com/file/d/1V_yd9sKcZQ7zhnO5YhRpOsaBPLEEvM9u/view?usp=sharing) file containing all 10 models, for testing purposes.
- The "Getting Started" example on the documentation page only uses VGG16-CIFAR10.
If you don't need the other models, get the data for VGG16-CIFAR10
[here](https://drive.google.com/file/d/1Z84z-nsv_nbrr8t9i28UoxSJg-Sd_Ddu/view?usp=sharing).
In either case, there should be a `model_params/` folder at the root of the repo after extraction.
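To confirm the layout, here is a quick check of the paths that the tutorial and examples below expect
(VGG16-CIFAR10 only):

```python
# Verify that the extracted data matches the layout used in the examples.
from pathlib import Path

prefix = Path("model_params/vgg16_cifar10")
for name in ["tune_input.bin", "tune_labels.bin", "test_input.bin", "test_labels.bin"]:
    assert (prefix / name).is_file(), f"missing {prefix / name}"
assert Path("model_params/vgg16_cifar10.pth.tar").is_file(), "missing VGG16 checkpoint"
print("model_params/ looks good")
```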
Getting Started
===================
This guide can help you start working with PredTuner.
Installation
------------
Install PredTuner from source using `pip`:
.. code-block:: shell

   pip install -e .

PredTuner will also be available on PyPI in the future, after we publish the first release.
Tuning a PyTorch DNN
--------------------
PredTuner can tune any user-defined application,
but it is optimized for tuning DNN applications defined in PyTorch.
We will use models predefined in PredTuner for demonstration purposes.
Download pretrained VGG16 model parameters and CIFAR10 dataset from `here
<https://drive.google.com/file/d/1Z84z-nsv_nbrr8t9i28UoxSJg-Sd_Ddu/view?usp=sharing>`_.
After extraction, there should be a `model_params/` folder in the current directory.
Load the tuning and test subsets of the CIFAR10 dataset, and create a pretrained VGG16 model:
.. code-block:: python

   from pathlib import Path

   import torch
   from torch.utils.data.dataloader import DataLoader

   import predtuner as pt
   from predtuner.model_zoo import CIFAR, VGG16Cifar10

   prefix = Path("model_params/vgg16_cifar10")
   tune_set = CIFAR.from_file(prefix / "tune_input.bin", prefix / "tune_labels.bin")
   tune_loader = DataLoader(tune_set, batch_size=500)
   test_set = CIFAR.from_file(prefix / "test_input.bin", prefix / "test_labels.bin")
   test_loader = DataLoader(test_set, batch_size=500)

   module = VGG16Cifar10()
   module.load_state_dict(torch.load("model_params/vgg16_cifar10.pth.tar"))
PredTuner provides a logging mechanism.
While not required, it is recommended that you set up the logger to output into a file:
.. code-block:: python

   msg_logger = pt.config_pylogger(output_dir="vgg16_cifar10/", verbose=True)
For each tuning task, both a tuning dataset and a test dataset are required.
The tuning dataset is used to evaluate the accuracy of the application in the autotuning stage,
while the test dataset is used to evaluate the configurations found by autotuning.
This is similar to the split between training and validation sets in machine learning tasks.
In this case, both the tuning and test datasets contain 5000 images.
Create an instance of `TorchApp` for tuning a PyTorch DNN:
.. code-block:: python

   app = pt.TorchApp(
       "TestTorchApp",  # Application name -- can be anything
       module,
       tune_loader,
       test_loader,
       knobs=pt.get_knobs_from_file(),
       tensor_to_qos=pt.accuracy,
       model_storage_folder="vgg16_cifar10/",
   )
PredTuner provides `TorchApp`, which is specialized for the use case of tuning PyTorch DNNs.
In addition, two more functions from PredTuner are used:

- `pt.accuracy` is the *classification accuracy* metric:
  it receives the probability distribution output from the VGG16 model,
  compares it to the ground truth in the dataset,
  and returns a scalar between 0 and 100 for the classification accuracy
  (see the sketch below).
- `pt.get_knobs_from_file()` returns a set of approximations preloaded in PredTuner,
  which are applied to `torch.nn.Conv2d` layers.
See ??? for these approximations and how to define custom approximations.
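As a concrete illustration of the metric (a sketch only; the tensor values are made up),
`pt.accuracy` takes the batched model output and the ground-truth labels and returns a percentage:

.. code-block:: python

   import torch

   # 3 samples, 2 classes; the second prediction is wrong.
   outputs = torch.tensor([[0.9, 0.1], [0.8, 0.2], [0.3, 0.7]])
   labels = torch.tensor([0, 1, 1])
   print(pt.accuracy(outputs, labels))  # roughly 66.7 -- 2 of 3 predictions correct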
Now we can obtain a tuner object from the application and start tuning.
We will keep configurations that don't exceed 3% loss of accuracy,
but encourage the tuner to find configurations with loss of accuracy below 2.1%.
.. code-block:: python

   tuner = app.get_tuner()
   tuner.tune(
       max_iter=500,
       qos_tuner_threshold=2.1,  # QoS threshold to guide the tuner toward
       qos_keep_threshold=3.0,  # QoS threshold above which configurations are kept
       is_threshold_relative=True,  # Thresholds are relative to baseline, e.g. baseline_acc - 2.1
       take_best_n=50,
       cost_model="cost_linear",  # Use the linear cost predictor
   )
**QoS** (quality of service) is a general term for the quality of the application after approximations are applied;
e.g., here it refers to the accuracy of the DNN over the given datasets.
We will be using the term QoS throughout the tutorials.
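Because `is_threshold_relative=True`, both thresholds are interpreted relative to the baseline QoS.
The baseline can be measured directly from the app, as in the bundled example script
(an empty configuration `{}` means no approximation):

.. code-block:: python

   baseline_qos, _ = app.measure_qos_cost({}, False)
   print(f"Baseline accuracy: {baseline_qos:.2f}")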
`max_iter` defines the number of iterations to use in autotuning.
Within 500 iterations, PredTuner should find about 200 valid configurations.
PredTuner will also automatically mark out `Pareto-optimal
<https://en.wikipedia.org/wiki/Pareto_efficiency>`_
configurations.
These are called "best" configurations (`tuner.best_configs`),
in contrast to "valid" configurations, which are those that satisfy our accuracy requirement
(`tuner.kept_configs`).
`take_best_n` allows taking some extra close-optimal configurations in addition to Pareto-optimal ones.
500 iterations is for demonstration; in practice,
at least 10000 iterations are necessary on VGG16-sized models to converge to a set of good configurations.
Depending on hardware performance, this tuning should take several minutes to several tens of minutes.
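Once tuning finishes, the configurations can be inspected programmatically.
A short sketch, assuming each configuration exposes `qos`, `speedup`, and `knobs`
as the `Config` class in PredTuner does:

.. code-block:: python

   print(len(tuner.kept_configs), "valid configs,", len(tuner.best_configs), "best configs")
   for cfg in tuner.best_configs:
       print(f"speedup {cfg.speedup:.2f}x, QoS {cfg.qos:.2f}, {len(cfg.knobs)} knobs applied")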
Saving Tuning Results
---------------------
Now that the `tuner` object holds the tuning results,
we can export them into a JSON file
and visualize all configurations in a figure:
.. code-block:: python

   tuner.dump_configs("vgg16_cifar10/configs.json", best_only=False)
   fig = tuner.plot_configs(show_qos_loss=True)
   fig.savefig("vgg16_cifar10/configs.png")
The generated figure should look like this:

.. image:: tuning_result.png

where the blue points show the QoS and speedup of all valid configurations,
and the "best" configurations are marked out in orange.
Autotuning with a QoS Model
-------------------------------------
The tuning session shown above is already slow,
and it will be much slower with larger models, more iterations, and multiple tuning thresholds.
Instead, we can use a *QoS prediction model*, which predicts the QoS
with some inaccuracy but runs much faster than the application itself.
To do that, simply use the argument `qos_model` when calling `tuner.tune()`:
.. code-block:: python

   tuner = app.get_tuner()
   tuner.tune(
       max_iter=500,
       qos_tuner_threshold=2.1,  # QoS threshold to guide the tuner toward
       qos_keep_threshold=3.0,  # QoS threshold above which configurations are kept
       is_threshold_relative=True,  # Thresholds are relative to baseline, e.g. baseline_acc - 2.1
       take_best_n=50,
       cost_model="cost_linear",  # Use the linear cost predictor
       qos_model="qos_p1",  # Use the P1 QoS predictor
   )
The QoS model first undergoes an initialization stage (which takes a bit of time),
during which it learns the behavior of each knob on each operator (DNN layer).
Because the configurations end up with predicted QoS values after tuning,
a *validation* stage is added at the end of tuning, where the QoS of the best configurations
is empirically measured and the bad ones are removed.
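After validation, each remaining best configuration carries both the predicted QoS and the
empirically validated QoS (stored as `validated_qos` in PredTuner's `ValConfig`).
A short sketch for comparing the two:

.. code-block:: python

   for cfg in tuner.best_configs:
       if cfg.validated_qos is not None:
           print(f"predicted QoS {cfg.qos:.2f} -> validated QoS {cfg.validated_qos:.2f}")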
@@ -9,27 +9,21 @@ PredTuner performs autotuning on approximation choices for a program
using an error-predictive proxy instead of executing the program,
to greatly speedup autotuning while getting results of comparable quality.
PredTuner is a contribution of [ApproxTuner](https://ppopp21.sigplan.org/details/PPoPP-2021-main-conference/41/ApproxTuner-A-Compiler-and-Runtime-System-for-Adaptive-Approximations).
Short-term Goals
- Measure accuracy impact of approximations
- Obtain a tuned, approximated CNN in <5 lines of code
- Easy to manage multiple approximation configs
- Easy to load and manage prior tuning results
- Flexible retraining support
Possible Long-term Goals
- High-performance implementations of approximate layers
- Allow users to register their own approximations
- Support for other frameworks: TF, ONNX, JAX
PredTuner is a main component of `ApproxTuner
<https://ppopp21.sigplan.org/details/PPoPP-2021-main-conference/41/ApproxTuner-A-Compiler-and-Runtime-System-for-Adaptive-Approximations>`_.
Documentation
-------------
.. only:: html
Solution for Efficient Approximation Autotuning
-----------------------------------------------
- Start a tuning session in 10 lines of code
- Deep integration with PyTorch for DNN supports
- Multiple levels of APIs for generality and ease-of-use
- Effective accuracy prediction models
- Easily store and visualize tuning results in many formats
:Release: |version|
:Date: |today|
Documentation
-------------
.. toctree::
:maxdepth: 1
@@ -41,6 +35,3 @@ Indices and tables
------------------
* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
* :ref:`glossary`
doc/tuning_result.png (image, 26.3 KiB)
import site
from pathlib import Path
import torch
from torch.utils.data.dataloader import DataLoader
from torch.utils.data.dataset import Subset
site.addsitedir(Path(__file__).parent.parent.absolute().as_posix())
from predtuner import TorchApp, accuracy, config_pylogger, get_knobs_from_file
from predtuner.model_zoo import CIFAR, VGG16Cifar10
# Set up logger to put log file in /tmp
msg_logger = config_pylogger(output_dir="/tmp", verbose=True)
# Load "tuning" dataset and "test" dataset,
# and use only the first 500 images from each dataset as an example
# TODO: you should use all (5000) images for actual tuning.
prefix = Path("model_params/vgg16_cifar10")
tune_set = CIFAR.from_file(prefix / "tune_input.bin", prefix / "tune_labels.bin")
tune_loader = DataLoader(Subset(tune_set, range(500)), batch_size=500)
test_set = CIFAR.from_file(prefix / "test_input.bin", prefix / "test_labels.bin")
test_loader = DataLoader(Subset(test_set, range(500)), batch_size=500)
# Load checkpoint for VGG16 (CIFAR10)
module = VGG16Cifar10()
module.load_state_dict(torch.load("model_params/vgg16_cifar10.pth.tar"))
app = TorchApp(
"TestTorchApp", # name -- can be anything
module,
tune_loader,
test_loader,
get_knobs_from_file(), # default knobs -- see "predtuner/approxes/default_approx_params.json"
accuracy, # the QoS metric to use -- classification accuracy
# Where to serialize prediction models if they are used
# For example, if you use p1 (see below), this will leave you a
# tuner_results/vgg16_cifar10/p1.pkl
# which can be quickly reloaded the next time you run tuning with the same model
model_storage_folder="tuner_results/vgg16_cifar10",
)
# This is how to measure baseline accuracy -- {} means no approximation
baseline, _ = app.measure_qos_cost({}, False)
# Get a tuner object and start tuning!
tuner = app.get_tuner()
tuner.tune(
max_iter=500, # TODO: In practice, use at least 5000, or 10000
qos_tuner_threshold=2.1, # QoS threshold to guide tuner into
qos_keep_threshold=3.0, # QoS threshold for which we actually keep the configurations
is_threshold_relative=True, # Thresholds are relative to baseline -- baseline_acc - 2.1
cost_model="cost_linear", # Use linear performance predictor
qos_model="qos_p1", # Use P1 QoS predictor
)
# Save configs here when you're done
tuner.dump_configs("tuner_results/vgg16_cifar10_configs.json")
fig = tuner.plot_configs(show_qos_loss=True)
fig.savefig("tuner_results/vgg16_cifar10_configs.png")
\ No newline at end of file
@@ -116,10 +116,6 @@ class Config:
@property
def speedup(self):
return 1 / self.cost
@property
def qos_speedup(self):
return self.qos, self.speedup
T = TypeVar("T", bound=Config)
@@ -151,7 +147,7 @@ class ApproxTuner(Generic[T]):
is_threshold_relative: bool = False,
take_best_n: Optional[int] = None,
test_configs: bool = True,
**kwargs
app_kwargs: dict = None
# TODO: more parameters + opentuner param forwarding
) -> List[T]:
from opentuner.tuningrunmain import TuningRunMain
@@ -173,7 +169,7 @@
qos_tuner_threshold,
qos_keep_threshold,
is_threshold_relative,
self._get_app_kwargs(**kwargs),
app_kwargs or {},
)
assert self.keep_threshold is not None
trm = TuningRunMain(tuner, opentuner_args)
@@ -206,27 +202,41 @@
len(self.best_configs),
)
if test_configs:
msg_logger.info("Checking configurations on test inputs")
self.test_configs_(self.best_configs)
msg_logger.info("Calibrating configurations on test inputs")
self.best_configs = self.test_configs(self.best_configs)
return self.best_configs
def test_configs_(self, configs: List[T]):
def test_configs(self, configs: List[Config]):
from copy import deepcopy
from tqdm import tqdm
assert self.keep_threshold is not None
if not configs:
return []
ret_configs = []
total_error = 0
for cfg in tqdm(configs, leave=False):
cfg: T
if cfg.test_qos is not None:
continue
cfg = deepcopy(cfg)
assert cfg.test_qos is None
cfg.test_qos, _ = self.app.measure_qos_cost(cfg.knobs, True)
msg_logger.debug(f"Calibration: {cfg.qos} (mean) -> {cfg.test_qos} (mean)")
total_error += abs(cfg.qos - cfg.test_qos)
if cfg.test_qos > self.keep_threshold:
ret_configs.append(cfg)
else:
msg_logger.debug("Config removed")
mean_err = total_error / len(configs)
msg_logger.info("QoS mean abs difference of calibration: %f", mean_err)
return ret_configs
@staticmethod
def take_best_configs(configs: List[T], n: Optional[int] = None) -> List[T]:
points = np.array([c.qos_speedup for c in configs])
points = np.array([(c.qos, c.speedup) for c in configs])
taken_idx = is_pareto_efficient(points, take_n=n)
return [configs[i] for i in taken_idx]
def dump_configs(self, filepath: PathLike):
def dump_configs(self, filepath: PathLike, best_only: bool = True):
import os
from jsonpickle import encode
@@ -237,20 +247,27 @@
)
filepath = Path(filepath)
os.makedirs(filepath.parent, exist_ok=True)
confs = self.best_configs if best_only else self.kept_configs
with filepath.open("w") as f:
f.write(encode(self.best_configs, indent=2))
f.write(encode(confs, indent=2))
def plot_configs(
self, show_qos_loss: bool = False, connect_best_points: bool = False
self,
show_qos_loss: bool = False,
connect_best_points: bool = False,
use_test_qos: bool = False,
) -> plt.Figure:
if not self.tuned:
raise RuntimeError(
f"No tuning session has been run; call self.tune() first."
)
def qos_speedup(conf):
return conf.test_qos if use_test_qos else conf.qos, conf.speedup
def get_points(confs):
sorted_points = np.array(
sorted([c.qos_speedup for c in confs], key=lambda p: p[0])
sorted([qos_speedup(c) for c in confs], key=lambda p: p[0])
).T
if show_qos_loss:
sorted_points[0] = self.baseline_qos - sorted_points[0]
@@ -297,9 +314,6 @@ class ApproxTuner(Generic[T]):
**app_kwargs,
)
def _get_app_kwargs(self, **kwargs):
return {}
@classmethod
def _get_config_class(cls) -> Type[Config]:
return Config
@@ -5,6 +5,7 @@ import pickle
from pathlib import Path
from typing import Callable, Dict, Iterator, List, Optional, Tuple, Type, Union
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import torch
@@ -192,6 +193,8 @@ class QoSModelP1(IQoSModel):
qos_metric: Callable[[torch.Tensor], float],
storage: PathLike = None,
) -> None:
from torch.nn.functional import softmax
super().__init__()
self.app = app
self.output_f = tensor_output_getter
@@ -387,32 +390,89 @@ class ApproxModeledTuner(ApproxTuner):
qos_keep_threshold=qos_keep_threshold,
is_threshold_relative=is_threshold_relative,
take_best_n=take_best_n,
test_configs=test_configs,
cost_model=cost_model,
qos_model=qos_model,
test_configs=False, # Test configs below by ourselves
app_kwargs={"cost_model": cost_model, "qos_model": qos_model},
)
if validate_configs is None and qos_model != "none":
msg_logger.info(
'Validating configurations due to using qos model "%s"', qos_model
)
self.validate_configs_(self.best_configs)
self.best_configs = self._update_configs(self.best_configs, False)
elif validate_configs:
msg_logger.info("Validating configurations as user requested")
self.validate_configs_(self.best_configs)
self.best_configs = self._update_configs(self.best_configs, False)
if test_configs:
msg_logger.info("Calibrating configurations on test inputs")
self.best_configs = self._update_configs(self.best_configs, True)
return ret
def validate_configs_(self, configs: List[ValConfig]):
def _update_configs(self, configs: List[ValConfig], test_mode: bool):
from copy import deepcopy
from tqdm import tqdm
assert self.keep_threshold is not None
if not configs:
msg_logger.info("No configurations found.")
return []
ret_configs = []
total_error = 0
for cfg in tqdm(configs, leave=False):
cfg: ValConfig
if cfg.validated_qos is not None:
continue
cfg.validated_qos, _ = self.app.measure_qos_cost(cfg.knobs, False)
msg_logger.debug(f"Validation: {cfg.qos} (mean) -> {cfg.test_qos} (mean)")
cfg = deepcopy(cfg)
qos, _ = self.app.measure_qos_cost(cfg.knobs, test_mode)
if test_mode:
assert cfg.test_qos is None
cfg.test_qos = qos
msg_logger.debug(f"Calibration: {cfg.qos} (mean) -> {qos} (mean)")
else:
assert cfg.validated_qos is None
cfg.validated_qos = qos
msg_logger.debug(f"Validation: {cfg.qos} (mean) -> {qos} (mean)")
total_error += abs(cfg.qos - qos)
if qos > self.keep_threshold:
ret_configs.append(cfg)
else:
msg_logger.debug("Config removed")
mean_err = total_error / len(configs)
if test_mode:
msg_logger.info("QoS mean abs difference of calibration: %f", mean_err)
else:
msg_logger.info("QoS mean abs difference of validation: %f", mean_err)
msg_logger.info("%d of %d configs remain", len(ret_configs), len(configs))
return ret_configs
def plot_configs(
self, show_qos_loss: bool = False, connect_best_points: bool = False
) -> plt.Figure:
if not self.tuned:
raise RuntimeError(
f"No tuning session has been run; call self.tune() first."
)
def _get_app_kwargs(self, cost_model: str, qos_model: str):
return {"cost_model": cost_model, "qos_model": qos_model}
def get_points(confs, validated):
def qos_speedup(conf):
return conf.validated_qos if validated else conf.qos, conf.speedup
sorted_points = np.array(
sorted([qos_speedup(c) for c in confs], key=lambda p: p[0])
).T
if show_qos_loss:
sorted_points[0] = self.baseline_qos - sorted_points[0]
return sorted_points
fig, ax = plt.subplots()
kept_confs = get_points(self.kept_configs, False)
best_confs = get_points(self.best_configs, False)
best_confs_val = get_points(self.best_configs, True)
ax.plot(kept_confs[0], kept_confs[1], "o", label="valid")
mode = "-o" if connect_best_points else "o"
ax.plot(best_confs[0], best_confs[1], mode, label="best")
mode = "-o" if connect_best_points else "o"
ax.plot(best_confs_val[0], best_confs_val[1], mode, label="best_validated")
ax.set_xlabel("QoS Loss" if show_qos_loss else "QoS")
ax.set_ylabel("Speedup (x)")
ax.legend()
return fig
@classmethod
def _get_config_class(cls) -> Type[Config]:
@@ -153,6 +153,7 @@ class TorchApp(ModeledApp, abc.ABC):
target = move_to_device_recursively(target, self.device)
qos = self.tensor_to_qos(tensor_output[begin:end], target)
qoses.append(qos)
begin = end
return self.combine_qos(np.array(qoses))
p1_storage = self.model_storage / "p1.pkl" if self.model_storage else None