llvm / predtuner

Commit 73298315, authored 4 years ago by Yifan Zhao
Parent: 7102f5b1

Updated documentation
Showing 3 changed files with 58 additions and 27 deletions:

- doc/getting_started.rst (+57 −26)
- doc/tuning_result.png (+0 −0)
- examples/tune_vgg16_cifar10.py (+1 −1)
doc/getting_started.rst (+57 −26)
@@ -23,7 +23,7 @@ but it is optimized for tuning DNN applications defined in PyTorch.

 We will use models predefined in PredTuner for demonstration purposes.
 Download pretrained VGG16 model parameters and the CIFAR10 dataset from `here
 <https://drive.google.com/file/d/1Z84z-nsv_nbrr8t9i28UoxSJg-Sd_Ddu/view?usp=sharing>`_.
-After extraction, there should be a :code:`model_params/` folder in the current directory.
+After extraction, there should be a `model_params/` folder in the current directory.
 Load the tuning and test subsets of the CIFAR10 dataset, and create a pretrained VGG16 model:
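The loading code itself is collapsed in this hunk. For orientation, a minimal sketch of the step it describes, assuming torchvision and an invented parameter file name; PredTuner's actual example code may differ:

.. code-block:: python

    # Hypothetical sketch: the real loading code is collapsed in the diff.
    # The 5000/5000 split matches the text; the file name below is assumed.
    import torch
    from torch.utils.data import Subset
    from torchvision import datasets, models, transforms

    dataset = datasets.CIFAR10(
        "data", train=False, download=True, transform=transforms.ToTensor()
    )
    tune_set = Subset(dataset, range(0, 5000))      # 5000 images for tuning
    test_set = Subset(dataset, range(5000, 10000))  # 5000 images for testing

    model = models.vgg16(num_classes=10)
    # "model_params/vgg16_cifar10.pth.tar" is an assumed path under the
    # extracted model_params/ folder.
    model.load_state_dict(torch.load("model_params/vgg16_cifar10.pth.tar"))
    model.eval()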
@@ -55,7 +55,7 @@ while the test dataset is used to evaluate configurations found in autotuning.

 This is similar to the split between training and validation sets in machine learning tasks.
 In this case, both the tuning and test datasets contain 5000 images.
-Create an instance of :code:`TorchApp` for tuning a PyTorch DNN:
+Create an instance of `TorchApp` for tuning a PyTorch DNN:

 .. code-block:: python
@@ -69,17 +69,17 @@ Create an instance of :code:`TorchApp` for tuning PyTorch DNN:

         model_storage_folder="vgg16_cifar10/",
     )

-PredTuner provides :code:`TorchApp`, which is specialized for the use case of tuning PyTorch DNNs.
+PredTuner provides `TorchApp`, which is specialized for the use case of tuning PyTorch DNNs.
 In addition, two more functions from PredTuner are used:
-:code:`pt.accuracy` is the *classification accuracy* metric,
+`pt.accuracy` is the *classification accuracy* metric,
 which receives the probability distribution output from the VGG16 model,
 compares it to the ground truth in the dataset,
 and returns a scalar between 0 and 100 for the classification accuracy.
-:code:`pt.get_knobs_from_file()` returns a set of approximations preloaded in PredTuner,
-which are applied to :code:`torch.nn.Conv2d` layers.
-See ??? for these approximations and how to define your own approximations.
+`pt.get_knobs_from_file()` returns a set of approximations preloaded in PredTuner,
+which are applied to `torch.nn.Conv2d` layers.
+See ??? for these approximations and how to define custom approximations.
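Only the tail of the `TorchApp` construction is visible in this hunk. A rough sketch of its overall shape, inferred from the surrounding text and reusing the objects from the loading sketch above; every argument name except `model_storage_folder`, and the argument order, are assumptions:

.. code-block:: python

    import predtuner as pt

    # Sketch only: apart from model_storage_folder, which appears in the
    # diff, the argument names here are inferred, not confirmed.
    app = pt.TorchApp(
        "vgg16_cifar10",                  # application name (assumed)
        model,                            # pretrained VGG16 module
        tune_set,                         # dataset used during tuning
        test_set,                         # dataset used for final evaluation
        knobs=pt.get_knobs_from_file(),   # preloaded Conv2d approximations
        tensor_to_qos=pt.accuracy,        # classification accuracy metric
        model_storage_folder="vgg16_cifar10/",
    )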

 Now we can obtain a tuner object from the application and start tuning.
 We will keep configurations that don't exceed 3% loss of accuracy,

@@ -89,21 +89,36 @@ but encourage the tuner to find configurations with loss of accuracy below 2.1%.

     tuner = app.get_tuner()
     tuner.tune(
-        max_iter=100,
+        max_iter=500,
         qos_tuner_threshold=2.1,  # QoS threshold to guide tuner into
-        qos_keep_threshold=3.0,  # QoS threshold for which we actually keep the thresholds
+        qos_keep_threshold=3.0,  # QoS threshold for which we actually keep the configurations
         is_threshold_relative=True,  # Thresholds are relative to baseline -- baseline_acc - 2.1
-        perf_model="perf_linear",  # Use linear performance predictor
+        take_best_n=50,
+        cost_model="cost_linear",  # Use linear cost predictor
     )

-:code:`max_iter` defines the number of iterations to use in autotuning.
-100 iterations is for demonstration; in practice,
+**QoS** (quality of service) is a general term for the quality of the application after approximations are applied;
+e.g., here it refers to the accuracy of the DNN over the given datasets.
+We will be using the term QoS throughout the tutorials.
+
+`max_iter` defines the number of iterations to use in autotuning.
+Within 500 iterations, PredTuner should find about 200 valid configurations.
+PredTuner will also automatically mark out `Pareto-optimal
+<https://en.wikipedia.org/wiki/Pareto_efficiency>`_ configurations.
+These are called "best" configurations (`tuner.best_configs`),
+in contrast to "valid" configurations, which are the configurations that satisfy our accuracy requirements
+(`tuner.kept_configs`).
+`take_best_n` allows taking some extra close-to-optimal configurations in addition to the Pareto-optimal ones.
+500 iterations is for demonstration; in practice,
 at least 10000 iterations are necessary on VGG16-sized models to converge to a set of good configurations.
+Depending on hardware performance, this tuning should take several minutes to several tens of minutes.
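As a concrete reading of the `is_threshold_relative` comment above (the baseline accuracy of 88.0 is an invented number, used only to illustrate the subtraction):

.. code-block:: python

    # Invented baseline value; only the arithmetic mirrors the comments above.
    baseline_acc = 88.0
    tuner_goal = baseline_acc - 2.1  # 85.9: tuner is steered toward configs above this
    keep_bound = baseline_acc - 3.0  # 85.0: configs above this are kept as "valid"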

 Saving Tuning Results
 ---------------------

-Now the :code:`tuner` object holds the tuning results;
+Now the `tuner` object holds the tuning results;
 we can export them into a JSON file
 and visualize all configurations in a figure:
@@ -113,21 +128,37 @@ and visualize all configurations in a figure:

     fig = tuner.plot_configs(show_qos_loss=True)
     fig.savefig("vgg16_cifar10/configs.png")

-PredTuner will also automatically mark out `Pareto-optimal
-<https://en.wikipedia.org/wiki/Pareto_efficiency>`_ configurations.
-These are called "best" configurations (:code:`tuner.best_configs`),
-in contrast to "valid" configurations which are the configurations that satisfy our accuracy requirements
-(:code:`tuner.kept_configs`).
-Within 100 iterations, PredTuner should find 30~50 valid configurations.
 The generated figure should look like this:

 .. image:: tuning_result.png

-Loading Tuning Results
-----------------------
-TODO: TODO
+where the blue points show the QoS and speedup of all valid configurations,
+and the "best" configurations are marked out in orange.
+
+Autotuning with a QoS Model
+---------------------------
+
+The tuning session shown above is already slow,
+and will be much slower with larger models, more iterations, and multiple tuning thresholds.
+Instead, we can use a *QoS prediction model*, which predicts the QoS
+with some inaccuracy, but much faster than running the application.
+To do that, simply use the argument `qos_model` when calling `tuner.tune()`:
+
+.. code-block:: python
+
+    tuner = app.get_tuner()
+    tuner.tune(
+        max_iter=500,
+        qos_tuner_threshold=2.1,  # QoS threshold to guide tuner into
+        qos_keep_threshold=3.0,  # QoS threshold for which we actually keep the configurations
+        is_threshold_relative=True,  # Thresholds are relative to baseline -- baseline_acc - 2.1
+        take_best_n=50,
+        cost_model="cost_linear",  # Use linear cost predictor
+        qos_model="qos_p1"
+    )
+
+The QoS model will first undergo an initialization stage (which takes a bit of time),
+during which it learns the behavior of each knob on each operator (DNN layer).
+Because the configurations will end up with predicted QoS values after tuning,
+this adds a *validation* stage at the end of tuning,
+where the QoS of the best configurations is empirically measured and the bad ones are removed.
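After either tuning run, the two result sets named above can be inspected directly. A small usage sketch; `tuner.kept_configs` and `tuner.best_configs` are named in the text, but treating them as ordinary sized Python collections is an assumption:

.. code-block:: python

    # kept_configs / best_configs come from the text above; that they
    # support len() is an assumption of this sketch.
    print(f"valid configurations: {len(tuner.kept_configs)}")
    print(f"best configurations:  {len(tuner.best_configs)}")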
doc/tuning_result.png (+0 −0)

Binary image replaced: 25.3 KiB → 26.3 KiB.
examples/tune_vgg16_cifar10.py (+1 −1)
@@ -44,7 +44,7 @@ tuner = app.get_tuner()

 tuner.tune(
     max_iter=500,  # TODO: In practice, use at least 5000, or 10000
     qos_tuner_threshold=2.1,  # QoS threshold to guide tuner into
-    qos_keep_threshold=3.0,  # QoS threshold for which we actually keep the thresholds
+    qos_keep_threshold=3.0,  # QoS threshold for which we actually keep the configurations
     is_threshold_relative=True,  # Thresholds are relative to baseline -- baseline_acc - 2.1
     cost_model="cost_linear",  # Use linear performance predictor
     qos_model="qos_p1",  # Use P1 QoS predictor