Commit 3dfefcee, authored 3 years ago by Yifan Zhao
Fixed getting-started doc
Parent: 3229f04d
Showing 1 changed file: hpvm/docs/getting-started.rst (24 additions, 24 deletions)
@@ -23,7 +23,7 @@ This package lives at ``projects/torch2hpvm`` and should have been installed by
 The Keras frontend serves a similar purpose, and its usage can be found in the
 :doc:`documentation </components/keras-frontend>`.
-*Note* that below we'll be working under directory ``test/dnn_benchmarks``,
+* **Note** that below we'll be working under directory ``test/dnn_benchmarks``,
 for easier access to ``test/dnn_benchmarks/model_params/``.
 You can also symlink it to other locations -- don't move it: it's used in test cases --
 and adjust the paths below accordingly.
@@ -45,7 +45,7 @@ where ``tune`` and ``test`` prefixes signify tuning and testing set.
 `BinDataset` is a utility `torch2hpvm` provides for creating dataset over binary files.
 Any instance of `torch.utils.data.Dataset` can be used here.
-*Note* that each `module` is bound to 2 datasets: a "tune" and a "test" set.
+* **Note** that each `module` is bound to 2 datasets: a "tune" and a "test" set.
 The generated binary accepts an argument to be either the string "tune" or "test",
 and performs inference over a dataset accordingly.
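For readers following along: the guide builds these two datasets with `BinDataset` roughly as sketched below. The constructor signature ``BinDataset(input_file, label_file, shape)`` and the concrete paths/shapes are assumptions drawn from the surrounding guide, not part of this diff.

.. code-block:: python

   from pathlib import Path
   from torch2hpvm import BinDataset

   # Hypothetical dataset location; adjust to wherever model_params/ lives.
   prefix = Path("model_params/vgg16_cifar10")
   # One "tune" set and one "test" set, as the note above requires.
   tuneset = BinDataset(prefix / "tune_input.bin", prefix / "tune_labels.bin", (5000, 3, 32, 32))
   testset = BinDataset(prefix / "test_input.bin", prefix / "test_labels.bin", (5000, 3, 32, 32))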
@@ -55,10 +55,10 @@ Create a DNN `module` and load the checkpoint:
 
   import torch
   from torch.nn import Module
-  import dnn  # Defined at `hpvm/test/dnn_benchmarks/pytorch`
+  from pytorch import dnn  # Defined at `hpvm/test/dnn_benchmarks/pytorch/dnn`
 
-  model: Module = dnn.VGG16()
-  checkpoint = "model_params/vgg16_cifar10.pth.tar"
+  model: Module = dnn.VGG16Cifar10()
+  checkpoint = "model_params/pytorch/vgg16_cifar10.pth.tar"
   model.load_state_dict(torch.load(checkpoint))
 
 Any `torch.nn.Module` can be similarly used,
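Reassembled from the hunk above, the corrected loading snippet now reads:

.. code-block:: python

   import torch
   from torch.nn import Module
   from pytorch import dnn  # Defined at `hpvm/test/dnn_benchmarks/pytorch/dnn`

   model: Module = dnn.VGG16Cifar10()
   checkpoint = "model_params/pytorch/vgg16_cifar10.pth.tar"
   model.load_state_dict(torch.load(checkpoint))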
@@ -84,8 +84,7 @@ Now we are ready to export the model. The main functioning class of `torch2hpvm`
 and path to the compiled binary respectively.
 `batch_size` is the batch size the binary uses during inference.
 
-*
-*Note* that `conf_file` is the path to an HPVM approximation configuration file.
+* **Note** that `conf_file` is the path to an HPVM approximation configuration file.
 This file decides what approximation the binary will use during inference.
 This path is hardcoded into the binary and is only read when the binary starts,
 so it's fine to have `conf_file` point to a non-existing path.
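For context, the export step this note belongs to looks roughly like the sketch below. The `ModelExporter` call and the `generate(...).compile(...)` chain appear later in this diff; the keyword name `config_file` and the concrete paths are assumptions to be checked against the full guide.

.. code-block:: python

   from pathlib import Path
   from torch2hpvm import ModelExporter

   output_dir = Path("./vgg16_cifar10")
   build_dir = output_dir / "build"
   target_binary = build_dir / "vgg16_cifar10"
   # May point to a non-existing path; it is only read when the binary starts.
   conf_file = Path("./vgg16_cifar10.conf").absolute()
   exporter = ModelExporter(model, tuneset, testset, output_dir, config_file=conf_file)
   exporter.generate(batch_size=128).compile(target_binary, build_dir)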
@@ -95,7 +94,8 @@ and path to the compiled binary respectively.
 a helper that invokes the HPVM compiler for you.
 Now there should be a binary at ``./vgg16_cifar10/build/vgg16_cifar10``.
-Try running ``./vgg16_cifar10/build/vgg16_cifar10 test`` for inference over the test set.
+Running it without argument will perform an inference over the test set.
+(The accuracy of inference is written to the file ``./final_accuracy``.)
 
 Compiling a Tuner Binary
 ------------------------
@@ -105,7 +105,7 @@ To use the autotuner, we need a slightly different binary that can talk with the
 The following code is almost identical to the last code block,
 but it adds `target="hpvm_tensor_inspect"` to `ModelExporter`,
 to require an autotuner binary.
-It also doesn't define a `conf_file`.
+It also doesn't require a `conf_file` argument.
 
 .. code-block:: python
@@ -115,11 +115,12 @@ It also doesn't define a `conf_file`.
   tuner_build_dir = tuner_output_dir / "build"
   tuner_binary = tuner_build_dir / "vgg16_cifar10"
   exporter = ModelExporter(model, tuneset, testset, tuner_output_dir, target="hpvm_tensor_inspect")
+  metadata_file = tuner_output_dir / exporter.metadata_file_name
   exporter.generate(batch_size=500).compile(tuner_binary, tuner_build_dir)
 
 This binary is generated at ``vgg16_cifar10_tuner/build/vgg16_cifar10``.
 It waits for autotuner signal and doesn't run on its own, so don't run it by yourself.
-Instead, import and use the tuner `predtuner`:
+Instead, import the tuner `predtuner` and pass it the path to the binary (`tuner_binary`):
 
 .. code-block:: python
@@ -128,7 +129,6 @@ Instead, import and use the tuner `predtuner`:
   # Set up logger to put log file in /tmp
   msg_logger = config_pylogger(output_dir="/tmp", verbose=True)
-  metadata_file = output_dir / exporter.metadata_file_name
   # Create a `PipedBinaryApp` that communicates with HPVM bin.
   # "TestHPVMApp" is an identifier of this app (used in logging, etc.) and can be anything.
   # Other arguments:
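Putting the tuner pieces together, a minimal sketch: the `PipedBinaryApp` constructor, logger setup, and the plotting/dumping calls are taken from this diff, while `app.get_tuner()` and the `tune(...)` keyword names follow predtuner's documented API and should be verified against your installed version.

.. code-block:: python

   from predtuner import PipedBinaryApp, config_pylogger

   # Set up logger to put log file in /tmp
   msg_logger = config_pylogger(output_dir="/tmp", verbose=True)
   # `tuner_binary` and `metadata_file` come from the export step above.
   app = PipedBinaryApp("TestHPVMApp", tuner_binary, metadata_file)
   tuner = app.get_tuner()
   tuner.tune(
       max_iter=100,                # tuning iterations; larger explores more configs
       qos_tuner_threshold=3.0,     # allowed QoS (accuracy) loss during tuning
       is_threshold_relative=True,  # threshold is relative to the baseline QoS
       cost_model="cost_linear",    # FLOPs-based performance estimate (see notes below)
       qos_model="qos_p1",          # predictive QoS model; omit for empirical runs
   )
   fig = tuner.plot_configs(show_qos_loss=True)
   fig.savefig("configs.png", dpi=300)
   app.dump_hpvm_configs(tuner.best_configs, "hpvm_confs.txt")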
@@ -160,20 +160,22 @@ Instead, import and use the tuner `predtuner`:
   fig.savefig("configs.png", dpi=300)
   app.dump_hpvm_configs(tuner.best_configs, "hpvm_confs.txt")
 
-*
-*Note* that the performance shown here is estimated.
+* **Note** that the performance shown here is estimated.
   ``cost_model="cost_linear"`` estimates the performance of a configuration
   using the FLOPs of each operator and the FLOPs reduction of each approximation.
   The next section talks about profiling (on a different machine),
   which shows the real performance.
-  If you are tuning on the end device that you wish to run the inference on, (which is a rare case),
-  then removing this argument will make the tuner measure real performance instead.
-  In that case, you may skip the profiling step.
 * Arguments `cost_model` and `qos_model` controls the models used in tuning.
   No models are used when the argument is omitted.
   For example, you can do an empirical tuning run by removing `qos_model="qos_p1"`.
+* `cost_model="cost_linear"` estimates the performance of a configuration
+  using the FLOPs of each operator and the FLOPs reduction of each approximation.
+  If you are tuning on the end device that you wish to run the inference on, (which is a rare case),
+  then removing this argument will make the tuner use real performance instead.
+  In that case, you may skip the profiling step.
+* The `metadata_file` variable passed to the tuner is the path to a metadata file generated by the frontend;
+  the tuner reads it to know how many operators are there and what are the applicable knobs to each operator.
 
 This tuning process should take a few minutes to half an hour,
 depending on your GPU performance.
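As the bullets above describe, dropping `qos_model` turns the run into an empirical one; a hedged sketch, reusing the hypothetical `tune(...)` call from earlier:

.. code-block:: python

   # Empirical tuning: with `qos_model` omitted, each candidate configuration
   # is actually executed to measure QoS (slower, but no model error).
   tuner.tune(
       max_iter=100,
       qos_tuner_threshold=3.0,
       is_threshold_relative=True,
       cost_model="cost_linear",  # drop this too to measure real performance on-device
   )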
@@ -188,20 +190,18 @@ It is also possible to save the configuration in other formats
 
 Profiling the Configurations
 ----------------------------
-We will use `hpvm_profiler`, another Python package for profiling the ``./hpvm_confs.txt``
+We will use `hpvm_profiler` (a Python package) for profiling the ``./hpvm_confs.txt``
 we obtained in the tuning step.
 
-* The profiler uses the *plain* binary generated in the beginning (its path is `target_binary`)
+* The profiler uses the *plain* binary generated in the beginning (``./vgg16_cifar10/build/vgg16_cifar10``)
   instead of the tuner binary.
-*
-*Note* that you may want to run this profiling step on the edge device
+* **Note** that you may want to run this profiling step on the edge device
   where the performance gain is desired.
   As the compiled binary is usually not compatible across architectures,
   you need to install HPVM on the edge device and recompile the model.
-*
-*Also note* that currently,
+* **Also note** that currently,
   the approximation implementations in the tensor runtime are tuned for Jetson TX2,
   and speedup may be less for other architectures.
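A sketch of the profiling step itself, assuming `hpvm_profiler` exposes `profile_config_file` and `plot_hpvm_configs` helpers as in the HPVM component docs; verify the names against your install.

.. code-block:: python

   from hpvm_profiler import profile_config_file, plot_hpvm_configs

   # The *plain* binary from the beginning, not the tuner binary.
   target_binary = "./vgg16_cifar10/build/vgg16_cifar10"
   # Configs from the tuning step; profiled results go to a new file.
   profile_config_file(target_binary, "hpvm_confs.txt", "hpvm_confs_profiled.txt")
   plot_hpvm_configs("hpvm_confs_profiled.txt", "configs_profiled.png")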