Commit 3dfefcee, authored 3 years ago by Yifan Zhao
Fixed getting-started doc
Parent: 3229f04d
Showing 1 changed file: hpvm/docs/getting-started.rst (24 additions, 24 deletions)
@@ -23,7 +23,7 @@ This package lives at ``projects/torch2hpvm`` and should have been installed by
 The Keras frontend serves a similar purpose, and its usage can be found in the
 :doc:`documentation </components/keras-frontend>`.
-*Note* that below we'll be working under directory ``test/dnn_benchmarks``,
+* **Note** that below we'll be working under directory ``test/dnn_benchmarks``,
 for easier access to ``test/dnn_benchmarks/model_params/``.
 You can also symlink it to other locations -- don't move it: it's used in test cases --
 and adjust the paths below accordingly.
@@ -45,7 +45,7 @@ where ``tune`` and ``test`` prefixes signify tuning and testing set.
 `BinDataset` is a utility `torch2hpvm` provides for creating dataset over binary files.
 Any instance of `torch.utils.data.Dataset` can be used here.
-*Note* that each `module` is bound to 2 datasets: a "tune" and a "test" set.
+* **Note** that each `module` is bound to 2 datasets: a "tune" and a "test" set.
 The generated binary accepts an argument to be either the string "tune" or "test",
 and performs inference over a dataset accordingly.
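For readers following along: the guide builds these two datasets with `BinDataset` roughly as sketched below. The constructor signature ``BinDataset(input_file, label_file, shape)`` and the concrete paths/shapes are assumptions drawn from the surrounding guide, not part of this diff.

.. code-block:: python

   from pathlib import Path
   from torch2hpvm import BinDataset

   # Hypothetical dataset location; adjust to wherever model_params/ lives.
   prefix = Path("model_params/vgg16_cifar10")
   # One "tune" set and one "test" set, as the note above requires.
   tuneset = BinDataset(prefix / "tune_input.bin", prefix / "tune_labels.bin", (5000, 3, 32, 32))
   testset = BinDataset(prefix / "test_input.bin", prefix / "test_labels.bin", (5000, 3, 32, 32))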
@@ -55,10 +55,10 @@ Create a DNN `module` and load the checkpoint:
 
   import torch
   from torch.nn import Module
-  import dnn  # Defined at `hpvm/test/dnn_benchmarks/pytorch`
+  from pytorch import dnn  # Defined at `hpvm/test/dnn_benchmarks/pytorch/dnn`
 
-  model: Module = dnn.VGG16()
-  checkpoint = "model_params/vgg16_cifar10.pth.tar"
+  model: Module = dnn.VGG16Cifar10()
+  checkpoint = "model_params/pytorch/vgg16_cifar10.pth.tar"
   model.load_state_dict(torch.load(checkpoint))
 
 Any `torch.nn.Module` can be similarly used,
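Reassembled from the hunk above, the corrected loading snippet now reads:

.. code-block:: python

   import torch
   from torch.nn import Module
   from pytorch import dnn  # Defined at `hpvm/test/dnn_benchmarks/pytorch/dnn`

   model: Module = dnn.VGG16Cifar10()
   checkpoint = "model_params/pytorch/vgg16_cifar10.pth.tar"
   model.load_state_dict(torch.load(checkpoint))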
@@ -84,8 +84,7 @@ Now we are ready to export the model. The main functioning class of `torch2hpvm`
 and path to the compiled binary respectively.
 `batch_size` is the batch size the binary uses during inference.
 
-*
-*Note* that `conf_file` is the path to an HPVM approximation configuration file.
+* **Note** that `conf_file` is the path to an HPVM approximation configuration file.
 This file decides what approximation the binary will use during inference.
 This path is hardcoded into the binary and is only read when the binary starts,
 so it's fine to have `conf_file` point to a non-existing path.
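For context, the export step this note belongs to looks roughly like the sketch below. The `ModelExporter` call and the `generate(...).compile(...)` chain appear later in this diff; the keyword name `config_file` and the concrete paths are assumptions to be checked against the full guide.

.. code-block:: python

   from pathlib import Path
   from torch2hpvm import ModelExporter

   output_dir = Path("./vgg16_cifar10")
   build_dir = output_dir / "build"
   target_binary = build_dir / "vgg16_cifar10"
   # May point to a non-existing path; it is only read when the binary starts.
   conf_file = Path("./vgg16_cifar10.conf").absolute()
   exporter = ModelExporter(model, tuneset, testset, output_dir, config_file=conf_file)
   exporter.generate(batch_size=128).compile(target_binary, build_dir)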
@@ -95,7 +94,8 @@ and path to the compiled binary respectively.
 a helper that invokes the HPVM compiler for you.
 Now there should be a binary at ``./vgg16_cifar10/build/vgg16_cifar10``.
-Try running ``./vgg16_cifar10/build/vgg16_cifar10 test`` for inference over the test set.
+Running it without argument will perform an inference over the test set.
+(The accuracy of inference is written to the file ``./final_accuracy``.)
 
 Compiling a Tuner Binary
 ------------------------
@@ -105,7 +105,7 @@ To use the autotuner, we need a slightly different binary that can talk with the
 The following code is almost identical to the last code block,
 but it adds `target="hpvm_tensor_inspect"` to `ModelExporter`,
 to require an autotuner binary.
-It also doesn't define a `conf_file`.
+It also doesn't require a `conf_file` argument.
 
 .. code-block:: python
@@ -115,11 +115,12 @@ It also doesn't define a `conf_file`.
   tuner_build_dir = tuner_output_dir / "build"
   tuner_binary = tuner_build_dir / "vgg16_cifar10"
   exporter = ModelExporter(model, tuneset, testset, tuner_output_dir, target="hpvm_tensor_inspect")
+  metadata_file = tuner_output_dir / exporter.metadata_file_name
   exporter.generate(batch_size=500).compile(tuner_binary, tuner_build_dir)
 
 This binary is generated at ``vgg16_cifar10_tuner/build/vgg16_cifar10``.
 It waits for autotuner signal and doesn't run on its own, so don't run it by yourself.
-Instead, import and use the tuner `predtuner`:
+Instead, import the tuner `predtuner` and pass it the path to the binary (`tuner_binary`):
 
 .. code-block:: python
@@ -128,7 +129,6 @@ Instead, import and use the tuner `predtuner`:
   # Set up logger to put log file in /tmp
   msg_logger = config_pylogger(output_dir="/tmp", verbose=True)
-  metadata_file = output_dir / exporter.metadata_file_name
   # Create a `PipedBinaryApp` that communicates with HPVM bin.
   # "TestHPVMApp" is an identifier of this app (used in logging, etc.) and can be anything.
   # Other arguments:
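Putting the tuner pieces together, a minimal sketch: the `PipedBinaryApp` constructor, logger setup, and the plotting/dumping calls are taken from this diff, while `app.get_tuner()` and the `tune(...)` keyword names follow predtuner's documented API and should be verified against your installed version.

.. code-block:: python

   from predtuner import PipedBinaryApp, config_pylogger

   # Set up logger to put log file in /tmp
   msg_logger = config_pylogger(output_dir="/tmp", verbose=True)
   # `tuner_binary` and `metadata_file` come from the export step above.
   app = PipedBinaryApp("TestHPVMApp", tuner_binary, metadata_file)
   tuner = app.get_tuner()
   tuner.tune(
       max_iter=100,                # tuning iterations; larger explores more configs
       qos_tuner_threshold=3.0,     # allowed QoS (accuracy) loss during tuning
       is_threshold_relative=True,  # threshold is relative to the baseline QoS
       cost_model="cost_linear",    # FLOPs-based performance estimate (see notes below)
       qos_model="qos_p1",          # predictive QoS model; omit for empirical runs
   )
   fig = tuner.plot_configs(show_qos_loss=True)
   fig.savefig("configs.png", dpi=300)
   app.dump_hpvm_configs(tuner.best_configs, "hpvm_confs.txt")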
@@ -160,20 +160,22 @@ Instead, import and use the tuner `predtuner`:
   fig.savefig("configs.png", dpi=300)
   app.dump_hpvm_configs(tuner.best_configs, "hpvm_confs.txt")
 
-*
-*Note* that the performance shown here is estimated.
+* **Note** that the performance shown here is estimated.
   ``cost_model="cost_linear"`` estimates the performance of a configuration
   using the FLOPs of each operator and the FLOPs reduction of each approximation.
   The next section talks about profiling (on a different machine),
   which shows the real performance.
-  If you are tuning on the end device that you wish to run the inference on, (which is a rare case),
-  then removing this argument will make the tuner measure real performance instead.
-  In that case, you may skip the profiling step.
 * Arguments `cost_model` and `qos_model` controls the models used in tuning.
   No models are used when the argument is omitted.
   For example, you can do an empirical tuning run by removing `qos_model="qos_p1"`.
+* `cost_model="cost_linear"` estimates the performance of a configuration
+  using the FLOPs of each operator and the FLOPs reduction of each approximation.
+  If you are tuning on the end device that you wish to run the inference on, (which is a rare case),
+  then removing this argument will make the tuner use real performance instead.
+  In that case, you may skip the profiling step.
+* The `metadata_file` variable passed to the tuner is the path to a metadata file generated by the frontend;
+  the tuner reads it to know how many operators are there and what are the applicable knobs to each operator.
 
 This tuning process should take a few minutes to half an hour,
 depending on your GPU performance.
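As the bullets above describe, dropping `qos_model` turns the run into an empirical one; a hedged sketch, reusing the hypothetical `tune(...)` call from earlier:

.. code-block:: python

   # Empirical tuning: with `qos_model` omitted, each candidate configuration
   # is actually executed to measure QoS (slower, but no model error).
   tuner.tune(
       max_iter=100,
       qos_tuner_threshold=3.0,
       is_threshold_relative=True,
       cost_model="cost_linear",  # drop this too to measure real performance on-device
   )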
@@ -188,20 +190,18 @@ It is also possible to save the configuration in other formats
 
 Profiling the Configurations
 ----------------------------
-We will use `hpvm_profiler`, another Python package for profiling the ``./hpvm_confs.txt``
+We will use `hpvm_profiler` (a Python package) for profiling the ``./hpvm_confs.txt``
 we obtained in the tuning step.
 
-* The profiler uses the *plain* binary generated in the beginning (its path is `target_binary`)
+* The profiler uses the *plain* binary generated in the beginning (``./vgg16_cifar10/build/vgg16_cifar10``)
   instead of the tuner binary.
-*
-*Note* that you may want to run this profiling step on the edge device
+* **Note** that you may want to run this profiling step on the edge device
   where the performance gain is desired.
   As the compiled binary is usually not compatible across architectures,
   you need to install HPVM on the edge device and recompile the model.
-*
-*Also note* that currently,
+* **Also note** that currently,
   the approximation implementations in the tensor runtime are tuned for Jetson TX2,
   and speedup may be less for other architectures.
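A sketch of the profiling step itself, assuming `hpvm_profiler` exposes `profile_config_file` and `plot_hpvm_configs` helpers as in the HPVM component docs; verify the names against your install.

.. code-block:: python

   from hpvm_profiler import profile_config_file, plot_hpvm_configs

   # The *plain* binary from the beginning, not the tuner binary.
   target_binary = "./vgg16_cifar10/build/vgg16_cifar10"
   # Configs from the tuning step; profiled results go to a new file.
   profile_config_file(target_binary, "hpvm_confs.txt", "hpvm_confs_profiled.txt")
   plot_hpvm_configs("hpvm_confs_profiled.txt", "configs_profiled.png")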