# Autotuning and Predictive Autotuning
`predtuner` performs autotuning on program approximation knobs using an error-predictive proxy
in place of the original program, greatly speeding up autotuning while obtaining results of
comparable quality.
Work in progress.
## Requirements
Prerequisite packages are listed in `./env.yaml`. Conda is the validated and recommended way to set
up a working environment, and `pip` is needed for installing this package.
At the root directory of this repo, do:
```bash
conda env create -n predtuner -f env.yaml
conda activate predtuner
pip install -e .
```
`-e` can be omitted if you don't intend to modify the code in this package.
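If the installation succeeded, the package's main entry points should import cleanly.
A minimal sanity check (the names below are the ones used throughout the tutorial):

```python
# Minimal install check for predtuner.
import predtuner as pt

print(pt.TorchApp, pt.accuracy, pt.get_knobs_from_file, pt.config_pylogger)
```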
## Getting Started
The documentation page contains a full tutorial.
To build the documentation:
```bash
pip install sphinx sphinx_rtd_theme sphinx_autodoc_typehints
cd doc
make html
```
The documentation page will be created at `doc/build/html/index.html`.
Open it in a browser and go to the "Getting Started" section.
### Model Data for Example / Testing
`predtuner` contains 10 demo models which are also used in tests.
- Download and extract [this](https://drive.google.com/file/d/1V_yd9sKcZQ7zhnO5YhRpOsaBPLEEvM9u/view?usp=sharing) file containing all 10 models, for testing purposes.
- The "Getting Started" example on the documentation page only uses VGG16-CIFAR10.
If you don't need the other models, get the data for VGG16-CIFAR10
[here](https://drive.google.com/file/d/1Z84z-nsv_nbrr8t9i28UoxSJg-Sd_Ddu/view?usp=sharing).
In either case, there should be a `model_params/` folder at the root of the repo after extraction.
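To confirm the layout, here is a quick check of the paths that the tutorial and examples below expect
(VGG16-CIFAR10 only):

```python
# Verify that the extracted data matches the layout used in the examples.
from pathlib import Path

prefix = Path("model_params/vgg16_cifar10")
for name in ["tune_input.bin", "tune_labels.bin", "test_input.bin", "test_labels.bin"]:
    assert (prefix / name).is_file(), f"missing {prefix / name}"
assert Path("model_params/vgg16_cifar10.pth.tar").is_file(), "missing VGG16 checkpoint"
print("model_params/ looks good")
```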
Getting Started
===================
This guide can help you start working with PredTuner.
Installation
------------
Install PredTuner from source using `pip`:
.. code-block:: shell

   pip install -e .

PredTuner will also be available on PyPI in the future, after we publish the first release.
Tuning a PyTorch DNN
--------------------
PredTuner can tune any user-defined application,
but it is optimized for tuning DNN applications defined in PyTorch.
We will use models predefined in PredTuner for demonstration purposes.
Download pretrained VGG16 model parameters and CIFAR10 dataset from `here
<https://drive.google.com/file/d/1Z84z-nsv_nbrr8t9i28UoxSJg-Sd_Ddu/view?usp=sharing>`_.
After extraction, there should be a `model_params/` folder in the current directory.
Load the tuning and test subsets of the CIFAR10 dataset, and create a pretrained VGG16 model:
.. code-block:: python

   from pathlib import Path

   import torch
   from torch.utils.data.dataloader import DataLoader

   import predtuner as pt
   from predtuner.model_zoo import CIFAR, VGG16Cifar10

   prefix = Path("model_params/vgg16_cifar10")
   tune_set = CIFAR.from_file(prefix / "tune_input.bin", prefix / "tune_labels.bin")
   tune_loader = DataLoader(tune_set, batch_size=500)
   test_set = CIFAR.from_file(prefix / "test_input.bin", prefix / "test_labels.bin")
   test_loader = DataLoader(test_set, batch_size=500)

   module = VGG16Cifar10()
   module.load_state_dict(torch.load("model_params/vgg16_cifar10.pth.tar"))
PredTuner provides a logging mechanism.
While not required, it is recommended that you set up the logger to output into a file:
.. code-block:: python

   msg_logger = pt.config_pylogger(output_dir="vgg16_cifar10/", verbose=True)
For each tuning task, both a tuning dataset and a test dataset are required.
The tuning dataset is used to evaluate the accuracy of the application in the autotuning stage,
while the test dataset is used to evaluate the configurations found by autotuning.
This is similar to the split between training and validation sets in machine learning tasks.
In this case, both the tuning and test datasets contain 5000 images.
Create an instance of `TorchApp` for tuning a PyTorch DNN:
.. code-block:: python

   app = pt.TorchApp(
       "TestTorchApp",  # Application name -- can be anything
       module,
       tune_loader,
       test_loader,
       knobs=pt.get_knobs_from_file(),
       tensor_to_qos=pt.accuracy,
       model_storage_folder="vgg16_cifar10/",
   )
PredTuner provides `TorchApp`, which is specialized for the use case of tuning PyTorch DNNs.
In addition, two more functions from PredTuner are used:

- `pt.accuracy` is the *classification accuracy* metric:
  it receives the probability distribution output from the VGG16 model,
  compares it to the ground truth in the dataset,
  and returns a scalar between 0 and 100 for the classification accuracy
  (see the sketch below).
- `pt.get_knobs_from_file()` returns a set of approximations preloaded in PredTuner,
  which are applied to `torch.nn.Conv2d` layers.
See ??? for these approximations and how to define custom approximations.
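As a concrete illustration of the metric (a sketch only; the tensor values are made up),
`pt.accuracy` takes the batched model output and the ground-truth labels and returns a percentage:

.. code-block:: python

   import torch

   # 3 samples, 2 classes; the second prediction is wrong.
   outputs = torch.tensor([[0.9, 0.1], [0.8, 0.2], [0.3, 0.7]])
   labels = torch.tensor([0, 1, 1])
   print(pt.accuracy(outputs, labels))  # roughly 66.7 -- 2 of 3 predictions correct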
Now we can obtain a tuner object from the application and start tuning.
We will keep configurations that don't exceed 3% loss of accuracy,
but encourage the tuner to find configurations with loss of accuracy below 2.1%.
.. code-block:: python

   tuner = app.get_tuner()
   tuner.tune(
       max_iter=500,
       qos_tuner_threshold=2.1,  # QoS threshold to guide the tuner toward
       qos_keep_threshold=3.0,  # QoS threshold above which configurations are kept
       is_threshold_relative=True,  # Thresholds are relative to baseline, e.g. baseline_acc - 2.1
       take_best_n=50,
       cost_model="cost_linear",  # Use the linear cost predictor
   )
**QoS** (quality of service) is a general term for the quality of the application after approximations are applied;
e.g., here it refers to the accuracy of the DNN over the given datasets.
We will be using the term QoS throughout the tutorials.
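Because `is_threshold_relative=True`, both thresholds are interpreted relative to the baseline QoS.
The baseline can be measured directly from the app, as in the bundled example script
(an empty configuration `{}` means no approximation):

.. code-block:: python

   baseline_qos, _ = app.measure_qos_cost({}, False)
   print(f"Baseline accuracy: {baseline_qos:.2f}")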
`max_iter` defines the number of iterations to use in autotuning.
Within 500 iterations, PredTuner should find about 200 valid configurations.
PredTuner will also automatically mark out `Pareto-optimal
<https://en.wikipedia.org/wiki/Pareto_efficiency>`_
configurations.
These are called "best" configurations (`tuner.best_configs`),
in contrast to "valid" configurations, which are those that satisfy our accuracy requirement
(`tuner.kept_configs`).
`take_best_n` allows taking some extra close-optimal configurations in addition to Pareto-optimal ones.
500 iterations is for demonstration; in practice,
at least 10000 iterations are necessary on VGG16-sized models to converge to a set of good configurations.
Depending on hardware performance, this tuning should take several minutes to several tens of minutes.
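Once tuning finishes, the configurations can be inspected programmatically.
A short sketch, assuming each configuration exposes `qos`, `speedup`, and `knobs`
as the `Config` class in PredTuner does:

.. code-block:: python

   print(len(tuner.kept_configs), "valid configs,", len(tuner.best_configs), "best configs")
   for cfg in tuner.best_configs:
       print(f"speedup {cfg.speedup:.2f}x, QoS {cfg.qos:.2f}, {len(cfg.knobs)} knobs applied")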
Saving Tuning Results
---------------------
Now that the `tuner` object holds the tuning results,
we can export them into a JSON file
and visualize all configurations in a figure:
.. code-block:: python

   tuner.dump_configs("vgg16_cifar10/configs.json", best_only=False)
   fig = tuner.plot_configs(show_qos_loss=True)
   fig.savefig("vgg16_cifar10/configs.png")
The generated figure should look like this:

.. image:: tuning_result.png

where the blue points show the QoS and speedup of all valid configurations,
and the "best" configurations are marked out in orange.
Autotuning with a QoS Model
-------------------------------------
The tuning session shown above is already slow,
and it will be much slower with larger models, more iterations, and multiple tuning thresholds.
Instead, we can use a *QoS prediction model*, which predicts the QoS
with some inaccuracy but runs much faster than the application itself.
To do that, simply use the argument `qos_model` when calling `tuner.tune()`:
.. code-block:: python

   tuner = app.get_tuner()
   tuner.tune(
       max_iter=500,
       qos_tuner_threshold=2.1,  # QoS threshold to guide the tuner toward
       qos_keep_threshold=3.0,  # QoS threshold above which configurations are kept
       is_threshold_relative=True,  # Thresholds are relative to baseline, e.g. baseline_acc - 2.1
       take_best_n=50,
       cost_model="cost_linear",  # Use the linear cost predictor
       qos_model="qos_p1",  # Use the P1 QoS predictor
   )
The QoS model first undergoes an initialization stage (which takes a bit of time),
during which it learns the behavior of each knob on each operator (DNN layer).
Because the configurations end up with predicted QoS values after tuning,
a *validation* stage is added at the end of tuning, where the QoS of the best configurations
is empirically measured and the bad ones are removed.
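After validation, each remaining best configuration carries both the predicted QoS and the
empirically validated QoS (stored as `validated_qos` in PredTuner's `ValConfig`).
A short sketch for comparing the two:

.. code-block:: python

   for cfg in tuner.best_configs:
       if cfg.validated_qos is not None:
           print(f"predicted QoS {cfg.qos:.2f} -> validated QoS {cfg.validated_qos:.2f}")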
@@ -9,27 +9,21 @@ PredTuner performs autotuning on approximation choices for a program
using an error-predictive proxy instead of executing the program,
to greatly speedup autotuning while getting results of comparable quality.
PredTuner is a contribution of [ApproxTuner](https://ppopp21.sigplan.org/details/PPoPP-2021-main-conference/41/ApproxTuner-A-Compiler-and-Runtime-System-for-Adaptive-Approximations).
Short-term Goals
- Measure accuracy impact of approximations
- Obtain a tuned, approximated CNN in <5 lines of code
- Easy to manage multiple approximation configs
- Easy to load and manage prior tuning results
- Flexible retraining support
Possible Long-term Goals
- High-performance implementations of approximate layers
- Allow users to register their own approximations
- Support for other frameworks: TF, ONNX, JAX
PredTuner is a main component of `ApproxTuner
<https://ppopp21.sigplan.org/details/PPoPP-2021-main-conference/41/ApproxTuner-A-Compiler-and-Runtime-System-for-Adaptive-Approximations>`_.
Documentation
-------------
.. only:: html
Solution for Efficient Approximation Autotuning
-----------------------------------------------
- Start a tuning session in 10 lines of code
- Deep integration with PyTorch for DNN supports
- Multiple levels of APIs for generality and ease-of-use
- Effective accuracy prediction models
- Easily store and visualize tuning results in many formats
:Release: |version|
:Date: |today|
Documentation
-------------
.. toctree::
:maxdepth: 1
@@ -41,6 +35,3 @@ Indices and tables
------------------
* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
* :ref:`glossary`
doc/tuning_result.png (image, 26.3 KiB)
import site
from pathlib import Path
import torch
from torch.utils.data.dataloader import DataLoader
from torch.utils.data.dataset import Subset
site.addsitedir(Path(__file__).parent.parent.absolute().as_posix())
from predtuner import TorchApp, accuracy, config_pylogger, get_knobs_from_file
from predtuner.model_zoo import CIFAR, VGG16Cifar10
# Set up logger to put log file in /tmp
msg_logger = config_pylogger(output_dir="/tmp", verbose=True)
# Load "tuning" dataset and "test" dataset,
# and use only the first 500 images from each dataset as an example
# TODO: you should use all (5000) images for actual tuning.
prefix = Path("model_params/vgg16_cifar10")
tune_set = CIFAR.from_file(prefix / "tune_input.bin", prefix / "tune_labels.bin")
tune_loader = DataLoader(Subset(tune_set, range(500)), batch_size=500)
test_set = CIFAR.from_file(prefix / "test_input.bin", prefix / "test_labels.bin")
test_loader = DataLoader(Subset(test_set, range(500)), batch_size=500)
# Load checkpoint for VGG16 (CIFAR10)
module = VGG16Cifar10()
module.load_state_dict(torch.load("model_params/vgg16_cifar10.pth.tar"))
app = TorchApp(
"TestTorchApp", # name -- can be anything
module,
tune_loader,
test_loader,
get_knobs_from_file(), # default knobs -- see "predtuner/approxes/default_approx_params.json"
accuracy, # the QoS metric to use -- classification accuracy
# Where to serialize prediction models if they are used
# For example, if you use p1 (see below), this will leave you a
# tuner_results/vgg16_cifar10/p1.pkl
# which can be quickly reloaded the next time you run tuning with the same model
model_storage_folder="tuner_results/vgg16_cifar10",
)
# This is how to measure baseline accuracy -- {} means no approximation
baseline, _ = app.measure_qos_cost({}, False)
# Get a tuner object and start tuning!
tuner = app.get_tuner()
tuner.tune(
max_iter=500, # TODO: In practice, use at least 5000, or 10000
qos_tuner_threshold=2.1, # QoS threshold to guide tuner into
qos_keep_threshold=3.0, # QoS threshold for which we actually keep the configurations
is_threshold_relative=True, # Thresholds are relative to baseline -- baseline_acc - 2.1
cost_model="cost_linear", # Use linear performance predictor
qos_model="qos_p1", # Use P1 QoS predictor
)
# Save configs here when you're done
tuner.dump_configs("tuner_results/vgg16_cifar10_configs.json")
fig = tuner.plot_configs(show_qos_loss=True)
fig.savefig("tuner_results/vgg16_cifar10_configs.png")
\ No newline at end of file
@@ -116,10 +116,6 @@ class Config:
@property
def speedup(self):
return 1 / self.cost
@property
def qos_speedup(self):
return self.qos, self.speedup
T = TypeVar("T", bound=Config)
@@ -151,7 +147,7 @@ class ApproxTuner(Generic[T]):
is_threshold_relative: bool = False,
take_best_n: Optional[int] = None,
test_configs: bool = True,
**kwargs
app_kwargs: dict = None
# TODO: more parameters + opentuner param forwarding
) -> List[T]:
from opentuner.tuningrunmain import TuningRunMain
@@ -173,7 +169,7 @@
qos_tuner_threshold,
qos_keep_threshold,
is_threshold_relative,
self._get_app_kwargs(**kwargs),
app_kwargs or {},
)
assert self.keep_threshold is not None
trm = TuningRunMain(tuner, opentuner_args)
@@ -206,27 +202,41 @@
len(self.best_configs),
)
if test_configs:
msg_logger.info("Checking configurations on test inputs")
self.test_configs_(self.best_configs)
msg_logger.info("Calibrating configurations on test inputs")
self.best_configs = self.test_configs(self.best_configs)
return self.best_configs
def test_configs_(self, configs: List[T]):
def test_configs(self, configs: List[Config]):
from copy import deepcopy
from tqdm import tqdm
assert self.keep_threshold is not None
if not configs:
return []
ret_configs = []
total_error = 0
for cfg in tqdm(configs, leave=False):
cfg: T
if cfg.test_qos is not None:
continue
cfg = deepcopy(cfg)
assert cfg.test_qos is None
cfg.test_qos, _ = self.app.measure_qos_cost(cfg.knobs, True)
msg_logger.debug(f"Calibration: {cfg.qos} (mean) -> {cfg.test_qos} (mean)")
total_error += abs(cfg.qos - cfg.test_qos)
if cfg.test_qos > self.keep_threshold:
ret_configs.append(cfg)
else:
msg_logger.debug("Config removed")
mean_err = total_error / len(configs)
msg_logger.info("QoS mean abs difference of calibration: %f", mean_err)
return ret_configs
@staticmethod
def take_best_configs(configs: List[T], n: Optional[int] = None) -> List[T]:
points = np.array([c.qos_speedup for c in configs])
points = np.array([(c.qos, c.speedup) for c in configs])
taken_idx = is_pareto_efficient(points, take_n=n)
return [configs[i] for i in taken_idx]
def dump_configs(self, filepath: PathLike):
def dump_configs(self, filepath: PathLike, best_only: bool = True):
import os
from jsonpickle import encode
@@ -237,20 +247,27 @@
)
filepath = Path(filepath)
os.makedirs(filepath.parent, exist_ok=True)
confs = self.best_configs if best_only else self.kept_configs
with filepath.open("w") as f:
f.write(encode(self.best_configs, indent=2))
f.write(encode(confs, indent=2))
def plot_configs(
self, show_qos_loss: bool = False, connect_best_points: bool = False
self,
show_qos_loss: bool = False,
connect_best_points: bool = False,
use_test_qos: bool = False,
) -> plt.Figure:
if not self.tuned:
raise RuntimeError(
f"No tuning session has been run; call self.tune() first."
)
def qos_speedup(conf):
return conf.test_qos if use_test_qos else conf.qos, conf.speedup
def get_points(confs):
sorted_points = np.array(
sorted([c.qos_speedup for c in confs], key=lambda p: p[0])
sorted([qos_speedup(c) for c in confs], key=lambda p: p[0])
).T
if show_qos_loss:
sorted_points[0] = self.baseline_qos - sorted_points[0]
@@ -297,9 +314,6 @@ class ApproxTuner(Generic[T]):
**app_kwargs,
)
def _get_app_kwargs(self, **kwargs):
return {}
@classmethod
def _get_config_class(cls) -> Type[Config]:
return Config
@@ -5,6 +5,7 @@ import pickle
from pathlib import Path
from typing import Callable, Dict, Iterator, List, Optional, Tuple, Type, Union
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import torch
@@ -192,6 +193,8 @@ class QoSModelP1(IQoSModel):
qos_metric: Callable[[torch.Tensor], float],
storage: PathLike = None,
) -> None:
from torch.nn.functional import softmax
super().__init__()
self.app = app
self.output_f = tensor_output_getter
@@ -387,32 +390,89 @@ class ApproxModeledTuner(ApproxTuner):
qos_keep_threshold=qos_keep_threshold,
is_threshold_relative=is_threshold_relative,
take_best_n=take_best_n,
test_configs=test_configs,
cost_model=cost_model,
qos_model=qos_model,
test_configs=False, # Test configs below by ourselves
app_kwargs={"cost_model": cost_model, "qos_model": qos_model},
)
if validate_configs is None and qos_model != "none":
msg_logger.info(
'Validating configurations due to using qos model "%s"', qos_model
)
self.validate_configs_(self.best_configs)
self.best_configs = self._update_configs(self.best_configs, False)
elif validate_configs:
msg_logger.info("Validating configurations as user requested")
self.validate_configs_(self.best_configs)
self.best_configs = self._update_configs(self.best_configs, False)
if test_configs:
msg_logger.info("Calibrating configurations on test inputs")
self.best_configs = self._update_configs(self.best_configs, True)
return ret
def validate_configs_(self, configs: List[ValConfig]):
def _update_configs(self, configs: List[ValConfig], test_mode: bool):
from copy import deepcopy
from tqdm import tqdm
assert self.keep_threshold is not None
if not configs:
msg_logger.info("No configurations found.")
return []
ret_configs = []
total_error = 0
for cfg in tqdm(configs, leave=False):
cfg: ValConfig
if cfg.validated_qos is not None:
continue
cfg.validated_qos, _ = self.app.measure_qos_cost(cfg.knobs, False)
msg_logger.debug(f"Validation: {cfg.qos} (mean) -> {cfg.test_qos} (mean)")
cfg = deepcopy(cfg)
qos, _ = self.app.measure_qos_cost(cfg.knobs, test_mode)
if test_mode:
assert cfg.test_qos is None
cfg.test_qos = qos
msg_logger.debug(f"Calibration: {cfg.qos} (mean) -> {qos} (mean)")
else:
assert cfg.validated_qos is None
cfg.validated_qos = qos
msg_logger.debug(f"Validation: {cfg.qos} (mean) -> {qos} (mean)")
total_error += abs(cfg.qos - qos)
if qos > self.keep_threshold:
ret_configs.append(cfg)
else:
msg_logger.debug("Config removed")
mean_err = total_error / len(configs)
if test_mode:
msg_logger.info("QoS mean abs difference of calibration: %f", mean_err)
else:
msg_logger.info("QoS mean abs difference of validation: %f", mean_err)
msg_logger.info("%d of %d configs remain", len(ret_configs), len(configs))
return ret_configs
def plot_configs(
self, show_qos_loss: bool = False, connect_best_points: bool = False
) -> plt.Figure:
if not self.tuned:
raise RuntimeError(
f"No tuning session has been run; call self.tune() first."
)
def _get_app_kwargs(self, cost_model: str, qos_model: str):
return {"cost_model": cost_model, "qos_model": qos_model}
def get_points(confs, validated):
def qos_speedup(conf):
return conf.validated_qos if validated else conf.qos, conf.speedup
sorted_points = np.array(
sorted([qos_speedup(c) for c in confs], key=lambda p: p[0])
).T
if show_qos_loss:
sorted_points[0] = self.baseline_qos - sorted_points[0]
return sorted_points
fig, ax = plt.subplots()
kept_confs = get_points(self.kept_configs, False)
best_confs = get_points(self.best_configs, False)
best_confs_val = get_points(self.best_configs, True)
ax.plot(kept_confs[0], kept_confs[1], "o", label="valid")
mode = "-o" if connect_best_points else "o"
ax.plot(best_confs[0], best_confs[1], mode, label="best")
mode = "-o" if connect_best_points else "o"
ax.plot(best_confs_val[0], best_confs_val[1], mode, label="best_validated")
ax.set_xlabel("QoS Loss" if show_qos_loss else "QoS")
ax.set_ylabel("Speedup (x)")
ax.legend()
return fig
@classmethod
def _get_config_class(cls) -> Type[Config]:
@@ -153,6 +153,7 @@ class TorchApp(ModeledApp, abc.ABC):
target = move_to_device_recursively(target, self.device)
qos = self.tensor_to_qos(tensor_output[begin:end], target)
qoses.append(qos)
begin = end
return self.combine_qos(np.array(qoses))
p1_storage = self.model_storage / "p1.pkl" if self.model_storage else None