From 734eb22fec9011db92d89289098fb054a357b0e9 Mon Sep 17 00:00:00 2001 From: Yifan Zhao <yifanz16@illinois.edu> Date: Fri, 26 Mar 2021 05:43:19 -0500 Subject: [PATCH] A framework for documentation on website --- README.md | 173 ++---------- hpvm/docs/KerasFrontend.md | 191 ------------- hpvm/docs/Makefile | 109 +++++++ hpvm/docs/README.md | 28 ++ hpvm/docs/compilation.md | 15 - hpvm/docs/components/index.rst | 30 ++ hpvm/docs/components/keras-benchmarks.rst | 177 ++++++++++++ hpvm/docs/components/keras-frontend.rst | 43 +++ hpvm/docs/components/torch2hpvm.rst | 1 + hpvm/docs/conf.py | 156 ++++++++++ hpvm/docs/getting-started.rst | 4 + hpvm/docs/hpvm-c.md | 131 --------- hpvm/docs/hpvm-specification.md | 221 --------------- hpvm/docs/index.rst | 40 +++ hpvm/docs/install.rst | 153 ++++++++++ hpvm/docs/references/compilation-process.rst | 21 ++ hpvm/docs/references/hpvm-c.rst | 151 ++++++++++ hpvm/docs/references/hpvm-specification.rst | 266 ++++++++++++++++++ hpvm/docs/references/index.rst | 11 + hpvm/docs/tests.rst | 1 + .../{alexnet2.pdf => alexnet2_cifar10.pdf} | Bin .../{alexnet.pdf => alexnet_cifar10.pdf} | Bin .../docs/tradeoff-curves/alexnet_imagenet.pdf | Bin 14150 -> 14159 bytes .../alexnet_imagenet_tradeoff.pdf | Bin 14159 -> 0 bytes hpvm/docs/tradeoff-curves/index.rst | 6 + .../{mobilenet.pdf => mobilenet_cifar10.pdf} | Bin .../{resnet18.pdf => resnet18_cifar10.pdf} | Bin .../tradeoff-curves/resnet50_imagenet.pdf | Bin 14144 -> 14155 bytes .../resnet50_imagenet_tradeoff.pdf | Bin 14155 -> 0 bytes hpvm/projects/torch2hpvm/README.md | 111 -------- hpvm/projects/torch2hpvm/README.rst | 148 ++++++++++ hpvm/test/README.md | 91 ------ hpvm/test/README.rst | 129 +++++++++ 33 files changed, 1490 insertions(+), 917 deletions(-) delete mode 100644 hpvm/docs/KerasFrontend.md create mode 100644 hpvm/docs/Makefile create mode 100644 hpvm/docs/README.md delete mode 100644 hpvm/docs/compilation.md create mode 100644 hpvm/docs/components/index.rst create mode 100644 
hpvm/docs/components/keras-benchmarks.rst create mode 100644 hpvm/docs/components/keras-frontend.rst create mode 120000 hpvm/docs/components/torch2hpvm.rst create mode 100644 hpvm/docs/conf.py create mode 100644 hpvm/docs/getting-started.rst delete mode 100644 hpvm/docs/hpvm-c.md delete mode 100644 hpvm/docs/hpvm-specification.md create mode 100644 hpvm/docs/index.rst create mode 100644 hpvm/docs/install.rst create mode 100644 hpvm/docs/references/compilation-process.rst create mode 100644 hpvm/docs/references/hpvm-c.rst create mode 100644 hpvm/docs/references/hpvm-specification.rst create mode 100644 hpvm/docs/references/index.rst create mode 120000 hpvm/docs/tests.rst rename hpvm/docs/tradeoff-curves/{alexnet2.pdf => alexnet2_cifar10.pdf} (100%) rename hpvm/docs/tradeoff-curves/{alexnet.pdf => alexnet_cifar10.pdf} (100%) delete mode 100644 hpvm/docs/tradeoff-curves/alexnet_imagenet_tradeoff.pdf create mode 100644 hpvm/docs/tradeoff-curves/index.rst rename hpvm/docs/tradeoff-curves/{mobilenet.pdf => mobilenet_cifar10.pdf} (100%) rename hpvm/docs/tradeoff-curves/{resnet18.pdf => resnet18_cifar10.pdf} (100%) delete mode 100644 hpvm/docs/tradeoff-curves/resnet50_imagenet_tradeoff.pdf delete mode 100644 hpvm/projects/torch2hpvm/README.md create mode 100644 hpvm/projects/torch2hpvm/README.rst delete mode 100644 hpvm/test/README.md create mode 100644 hpvm/test/README.rst diff --git a/README.md b/README.md index d1a28fae2e..c902d71c5f 100644 --- a/README.md +++ b/README.md @@ -2,167 +2,26 @@ This repository contains the source code and documentation for the HPVM Compiler Infrastructure. -The README briefly describes how to get started with building and installing HPVM. It also provides a -benchmark suite to test the compiler infrastructure. +HPVM is a compiler for heterogeneous parallel systems. 
+For more about what HPVM is, see [our website](https://publish.illinois.edu/hpvm-project/) +and publications: +[PPoPP'18 paper](https://dl.acm.org/doi/pdf/10.1145/3200691.3178493), +[OOPSLA'19 paper](https://dl.acm.org/doi/10.1145/3360612), +[PPoPP'21 paper](https://dl.acm.org/doi/10.1145/3437801.3446108). -HPVM is currently at **version 1.0**. For more about what HPVM is, see [our website](https://publish.illinois.edu/hpvm-project/). +HPVM is currently at **version 1.0**. -## Papers +For instructions on how to build and install HPVM, see [here](/hpvm/docs/install.rst); +for how to use HPVM, see [here](/hpvm/docs/getting-started.rst). -[PPoPP'18 paper](https://dl.acm.org/doi/pdf/10.1145/3200691.3178493) - -[OOPSLA'19 paper](https://dl.acm.org/doi/10.1145/3360612) - -[PPoPP'21 paper](https://dl.acm.org/doi/10.1145/3437801.3446108) - -## Resources - -[HPVM IR Specification](/hpvm/docs/hpvm-specification.md) - -[HPVM-C Language Specification](/hpvm/docs/hpvm-c.md) - -[HPVM Compilation Process](/hpvm/docs/compilation.md) - -## Dependencies - -The following components are required to be installed on your machine to build HPVM. - -* GCC (>=5.1) - * In addition, each version of CUDA-nvcc requires GCC to be not newer than a certain version. - See [here](https://gist.github.com/ax3l/9489132) for the support matrix. - -* CMake (>=3.17) -* GNU Make (>=3.79) -* OpenCL (>=1.0.0) -* CUDA (>=9.1) -* Python (==3.6) with pip (>=20) - -Python must be strictly 3.6 (any subversion between 3.6.0~3.6.13). 
-Alternatively, if you use Anaconda for package management, -we provide a conda environment file that covers all Python and Python package requirements: - -```bash -conda env create -n hpvm -f hpvm/env.yaml -``` - -## Supported Targets - -Supported/tested CPU architectures: - -* Intel Xeon E5-2640 -* Intel Xeon W-2135 -* ARM Cortex A-57 - -Supported/tested GPU architectures for OpenCL backend: - -* Nvidia Quadro P1000 -* Nvidia GeForce GTX 1080 - -Supported/tested GPU architectures for Tensor Backend: - -* Nvidia Jetson TX2 -* Nvidia GeForce GTX 1080 - -HPVM has not been tested but might work on other CPUs supported by LLVM Backend, and GPUs supported by OpenCL such as Intel, AMD, etc. - -**NOTE**: Approximations are tuned for Jetson TX2 and same speedups may not exist for other architectures. - -## Getting Started - -### Getting source code and setting up environment - -Checkout HPVM and go to directory `./hpvm` under project root: - -```shell -git clone --recursive -b approx_hpvm_reorg --single-branch https://gitlab.engr.illinois.edu/llvm/hpvm.git -cd hpvm/ -``` - -HPVM needs to be able to find CUDA. -If CUDA is installed in your system's $PATH (e.g. if it was installed at the default location), -HPVM can find CUDA automatically. -Otherwise, some environment variables are required: - -* `CUDA_TOOLKIT_PATH` --- Path to the CUDA toolkit -* `CUDA_INCLUDE_PATH` --- Path to the CUDA headers -* `CUDA_LIB_PATH` --- Path to CUDA libraries - -`set_paths.sh` can be used for this. -Modify the values of these variables in `set_paths.sh` according to your system, and source the script: - -```shell -source set_paths.sh -``` - -HPVM installer script can be used to download, configure and build HPVM along with LLVM and Clang. - -```shell -bash install.sh -``` - -On launch, the installer asks whether it should also build HPVM. -If HPVM is to be built, the installer asks the number of threads to be used. -The default number of threads used to build HPVM is two (2). 
- -If you use this automatic build, skip the next section. - -* Specifically, the HPVM installer downloads LLVM, and Clang, copies HPVM source into - llvm/tools and builds the entire tree. It also builds a modified LLVM C-Backend, - based on the one maintained by [Julia Computing](https://github.com/JuliaComputing/llvm-cbe), - as a part of HPVM and is currently used to generate OpenCL kernels for GPUs. - -### Manually Build HPVM - -Alternatively, you can manually build HPVM with CMake. -Please note that in this case, -the installer script still *must* be executed to obtain some required components, -but without the build step. - -In current directory (`hpvm/`), do - -```shell -mkdir build -cd build -cmake ../llvm [options] -export PATH=$(realpath ./bin):$PATH -``` - -Some common options that can be used with CMake are: - -* -DCMAKE_INSTALL_PREFIX=directory --- Specify for directory the full pathname of where you want the HPVM tools and libraries to be installed. -* -DCMAKE_BUILD_TYPE=type --- Valid options for type are Debug, Release, RelWithDebInfo, and MinSizeRel. Default is Debug. -* -DLLVM_ENABLE_ASSERTIONS=On --- Compile with assertion checks enabled (default is Yes for Debug builds, No for all other build types). - -**Note** that if the installer script was not used, -you must _manually add `build/bin` directory to your $PATH variable_ as absolute path (as shown above). - -Now, compile the HPVM Compilation Tool `approxhpvm.py` using: - -```shell -make -j<number of threads> approxhpvm.py -``` - -With all the aforementioned steps, HPVM should be built, installed, tested and ready to use. -In particular, `approxhpvm.py` should be an executable command from your command line. - -When not using the installer, you may want to run the regression tests using this script (outside of build directory): - -```shell -cd .. 
-bash scripts/automate_tests.sh -``` - -## Benchmarks and Tests - -We are providing the following [HPVM benchmarks](/hpvm/test/benchmarks): - -* Select benchmarks from the [Parboil](http://impact.crhc.illinois.edu/parboil/parboil.aspx) benchmark suite, located under [test/benchmarks/parboil](/hpvm/test/benchmarks/parboil). -* An edge detection pipeline benchmark, located under [test/benchmarks/pipeline](/hpvm/test/benchmarks/pipeline). -* A Camera ISP pipeline, located under [test/benchmarks/hpvm-cava](/hpvm/test/benchmarks/hpvm-cava), adapted from C code provided from our collaborators at [Harvard](http://vlsiarch.eecs.harvard.edu). +## Support -Benchmark descriptions and instructions on how to compile and run them are [here](/hpvm/test/benchmarks). +All questions can be directed to [hpvm-dev@lists.cs.illinois.edu](mailto:hpvm-dev@lists.cs.illinois.edu). -We are also providing [unit tests](/hpvm/test/unitTests) and [regression tests](/hpvm/test/regressionTests). +## References -## Support +Some documents on technical details and the internal workings of HPVM: -All questions can be directed to [hpvm-dev@lists.cs.illinois.edu](mailto:hpvm-dev@lists.cs.illinois.edu). +* [HPVM IR Specification](/hpvm/docs/references/hpvm-specification.rst) +* [HPVM-C Language Specification](/hpvm/docs/references/hpvm-c.rst) +* [HPVM Compilation Process](/hpvm/docs/references/compilation-process.rst) diff --git a/hpvm/docs/KerasFrontend.md b/hpvm/docs/KerasFrontend.md deleted file mode 100644 index 3225b112ad..0000000000 --- a/hpvm/docs/KerasFrontend.md +++ /dev/null @@ -1,191 +0,0 @@ -# Keras Frontend - -Install Keras Frontend after moving to directory `/hpvm/hpvm/projects/keras` - -## Requirements - -* python == 3.6.x -* pip >= 18 - -If your system uses a different Python version, we recommend using the conda environment `keras_python36.yml`. 
Install this using: - -``` -conda env create -f keras_python36.yml --name keras_python36 -``` - -Activate the conda environment before installing the pip package (below) using: - -``` -conda activate keras_python36 -``` - -**NOTE:** This step must be performed each time (for each shell process) the frontend is to be used. - - -## Installing the Keras Frontend Package - -At the root of this project (`/projects/keras/`) install the Keras frontend pip package as: - -``` -pip3 install -e ./ -``` - -**NOTE:** If you are using the conda environment, activate it prior to this step. - -## Suppported Operations - -List of supported operations and limitations are documented [here](../projects/keras/docs/Support.md) - - - -# Keras Benchmarks - -Run the Keras benchmarks under `hpvm/hpvm/test/dnn_benchmarks/keras` - -## Download CNN Model Files - -Prior to running the benchmarks, ensure you download the CNN model data (inputs and weights) if not done in automatic build script. - -``` -wget https://databank.illinois.edu/datafiles/o3izd/download -O model_params.tar.gz -tar -xf model_params.tar.gz -``` - -Move extracted `model_params` directory to `/test/dnn_benchmarks/model_params` (Benchmarks expect data at this location) - - -## Running Benchmaks - -List of benchmarks and the expected accuracies: - -| Benchmark | Accuracy | -| ----------- | ----------- | -| alexnet.py | 79.28 | -| alexnet2.py | 84.98 | -| alexnet_imagenet.py | 56.30 | -| lenet.py | 98.70 | -| mobilenet_cifar10.py | 84.42 | -| resnet18_cifar10.py | 89.56 | -| resnet50_imagenet.py | 75.10 | -| vgg16_cifar10.py | 89.96 | -| vgg16_cifar100.py | 66.50 | -| vgg16_imagenet.py | 69.46 | - - -### Synopsis - -``` -python3 ${BENCH_NAME}.py [hpvm_reload|keras_reload] [frontend] [compile] - -``` - - -**Command-line Parameters** - -`hpvm_reload` : Reloads HPVM weights (`.bin` binary format used by HPVM weights - present in `model_params` download directory) from directory path specified in the `reload_dir` parameter set in 
code - this is described in "Parameters to Change in Code" (below). - -`keras_reload`: Alternatively, reload weights in Keras `.h5` file format with path to file specified in `keras_model_file` described in "Parameters to Change in Code" (below). - -`frontend`: Invokes the HPVM frontend and dumps weights (in HPVM `.bin` format) in the output directory specified. The parameters that control where data and source files are dumped are specified by parameters `data_dir` and `src_dir`, respectively. These are described below. - -`compile`: Optional Parameter. When specified, it compiles the HPVM-C code generated by the frontend into an HPVM binary under the directory specified by `src_dir` (described below). If `src_dir` path exists, a unique directory (which appends a unique ID) is created. -The binary is built with the name `HPVM_binary`. - -**NOTE:** Before running `HPVM_binary` necessary to set CUDA and CUDNN paths with: - -``` -source ${PATH_TO_YOUR_HPVM_ROOT}/hpvm/set_paths.sh -``` - -**Parameters to Change in Code** - -The AlexNet source is commented with explanations on how to use the Keras frontend interface. AlexNet source is [here](https://gitlab.engr.illinois.edu/llvm/hpvm/-/blob/approx_hpvm_reorg_keras/hpvm/projects/keras/src/alexnet.py). - -* `NAME`: Benchmark Name - Can be set to any desired value - -* `reload_dir`: Path to directory from where to reload weights in HPVM format. This directory is used to reload weights if `hpvm_reload` command-line option is used. - -* `keras_model_file`: Path to Keras .h5 model file to reload weigths from. Either of `reload_dir` or `keras_model_file` can be used. -`keras_model_file` is used when `keras_reload` commad-line parameter is used with the Benchmark script. 
- -* `data_dir`: Directory to dump weights specified specified in [constructor](https://gitlab.engr.illinois.edu/llvm/hpvm/-/blob/approx_hpvm_reorg_keras/hpvm/projects/keras/src/Benchmark.py#L21) - -* `src_dir`: Directory to dump ApproxHPVM sources in HPVM-C (C with HPVM compiler intrinsics) specified in [constructor](https://gitlab.engr.illinois.edu/llvm/hpvm/-/blob/approx_hpvm_reorg_keras/hpvm/projects/keras/src/Benchmark.py#L22) - -* `num_classes`: number of output classes - dependent on the dataset used. For CIFAR10, `num_classes` is 10, CIFAR100 has 100 classes, - for ImageNet, number of classes is 1000. - -* `batch_size`: This parameter controls the size of each batch that is processed in HPVM. The batch size should be kept as large as the GPU memory -can support. This parameter should be adapted according to the memory size of the deployed device. - - - -### Using the Frontend with Custom (New) Benchmarks - -Any new benchmarks must inherit from the commom parent `Benchmark` class -and override the virtual functions for building the model, training, -and data preprocessing. These methods are described below: - - -`def buildModel(self)`: -Constructs and returns a keras model - -`def data_preprocess(self)`: -returns X_train, y_train, X_test, y_test, X_tuner, and y_tuner data (in that order): -These are described here: - -* `X_train:` Training data (fp32) in NCHW format -* `y_train:` Training labels (int32) - -* `X_test:` Testing/Evaluation data in NCHW format -* `y_test:` Testing/Evaluation labels - -* `X_tuner:` Data to be used for autotuning -* `y_tuner:` Labels corresponding to tuning data - - -`def trainModel(self, model, X_train, y_train, X_test, y_test)`: -Trains the Keras model constructed in `buildModel` and is expected to return the -trained keras model - training parameters should be tuned here. - -### Directly using Keras Frontend API - -Alternate to extending the `Benchmark` class, users may directly invoke the Keras Frontend API. 
This can be done as: - -```python - -from keras_frontend.approxhpvm_translator import translate_to_approxhpvm - -# Construct and train your Keras Model (or load pre-trained weights) - -translate_to_approxhpvm(model, data_dir, src_dir, test_data, test_labels, tune_data, tune_labels, batch_size, num_classes) - -``` - -## Running HPVM Binary - -Run the `HPVM_binary` generated under the directory specified by `src_dir` (described above). Usage: - -``` -./HPVM_binary -t {test|tune} -c ${config_file_path} -``` - -`test|tune`: Runs with either tune (autotuning data) or test set (for evaluation) - -`config_file_path`: Path to an HPVM tensor configuration file (includes approximation settings) - -**NOTE:** The accuracy of the bennchmarks is dumped into a file named `final_accuracy` in the current working directory - this includes accuracy averaged across batches - -## Automated Tests - -`scripts/test_benchmarks.py` is an automated test script that evaluates the accuracy of each Benchmark in Keras and HPVM (after comilation using HPVM Compiler) and compares the accuracy of each binary to the known correct accuracy. Run from root of `/test/dnn_benchmarks/keras`: - -``` -python test_benchmarks.py -``` - - - - - - diff --git a/hpvm/docs/Makefile b/hpvm/docs/Makefile new file mode 100644 index 0000000000..b7000de485 --- /dev/null +++ b/hpvm/docs/Makefile @@ -0,0 +1,109 @@ +# Makefile for Sphinx documentation +# + +# You can set these variables from the command line. +SPHINXOPTS = +SPHINXBUILD = sphinx-build +PAPER = + +# Internal variables. +PAPEROPT_a4 = -D latex_paper_size=a4 +PAPEROPT_letter = -D latex_paper_size=letter +ALLSPHINXOPTS = -d build/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) . 
+ +.PHONY: help clean html dirhtml pickle json htmlhelp qthelp latex changes linkcheck doctest epub + +help: + @echo "Please use \`make <target>' where <target> is one of" + @echo " html to make standalone HTML files" + @echo " dirhtml to make HTML files named index.html in directories" + @echo " pickle to make pickle files" + @echo " epub to make an epub" + @echo " json to make JSON files" + @echo " htmlhelp to make HTML files and a HTML help project" + @echo " qthelp to make HTML files and a qthelp project" + @echo " latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter" + @echo " changes to make an overview of all changed/added/deprecated items" + @echo " linkcheck to check all external links for integrity" + @echo " doctest to run all doctests embedded in the documentation (if enabled)" + @echo " gitwash to update the gitwash documentation" + + +clean: + -rm -rf build/* + +dist: html + test -d build/latex || make latex + make -C build/latex all-pdf + -rm -rf build/dist + (cd build/html; cp -r . ../../build/dist) + (cd build/dist && tar czf ../dist.tar.gz .) + +html: + $(SPHINXBUILD) -b html $(ALLSPHINXOPTS) build/html + @echo + @echo "Build finished. The HTML pages are in build/html." + +dirhtml: + $(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) build/dirhtml + @echo + @echo "Build finished. The HTML pages are in build/dirhtml." + +pickle: + $(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) build/pickle + @echo + @echo "Build finished; now you can process the pickle files." + +json: + $(SPHINXBUILD) -b json $(ALLSPHINXOPTS) build/json + @echo + @echo "Build finished; now you can process the JSON files." + +htmlhelp: + $(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) build/htmlhelp + @echo + @echo "Build finished; now you can run HTML Help Workshop with the" \ + ".hhp project file in build/htmlhelp." 
+ +qthelp: + $(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) build/qthelp + @echo + @echo "Build finished; now you can run "qcollectiongenerator" with the" \ + ".qhcp project file in build/qthelp, like this:" + @echo "# qcollectiongenerator build/qthelp/test.qhcp" + @echo "To view the help file:" + @echo "# assistant -collectionFile build/qthelp/test.qhc" + +epub: + $(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) build/epub + @echo + @echo "Build finished. The epub file is in $(BUILDDIR)/epub." + + +latex: + $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) build/latex + @echo + @echo "Build finished; the LaTeX files are in build/latex." + @echo "Run \`make all-pdf' or \`make all-ps' in that directory to" \ + "run these through (pdf)latex." + +changes: + $(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) build/changes + @echo + @echo "The overview file is in build/changes." + +linkcheck: + $(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) build/linkcheck + @echo + @echo "Link check complete; look for any errors in the above output " \ + "or in build/linkcheck/output.txt." + +doctest: + $(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) build/doctest + @echo "Testing of doctests in the sources finished, look at the " \ + "results in build/doctest/output.txt." + +latexpdf: latex + @echo "Running LaTeX files through latexmk..." + $(MAKE) -C build/latex all-pdf + @echo "latexmk finished; the PDF files are in build/latex." diff --git a/hpvm/docs/README.md b/hpvm/docs/README.md new file mode 100644 index 0000000000..7ea7dea380 --- /dev/null +++ b/hpvm/docs/README.md @@ -0,0 +1,28 @@ +# Building docs + +We use Sphinx for generating the API and reference documentation. + +## Instructions + +Install the following Python packages needed to build the documentation by entering: + +```bash +pip install sphinx sphinx-autodoc-typehints sphinx-rtd-theme numpydoc +``` + +To build the HTML documentation, enter: + +```bash +make html +``` + +in the ``docs/`` directory. 
This will generate a ``build/html`` subdirectory +containing the built documentation. + +To build the PDF documentation, enter: + +```bash +make latexpdf +``` + +You will need to have LaTeX installed for this. diff --git a/hpvm/docs/compilation.md b/hpvm/docs/compilation.md deleted file mode 100644 index 6381fec7d8..0000000000 --- a/hpvm/docs/compilation.md +++ /dev/null @@ -1,15 +0,0 @@ -# HPVM Compilation Process -Compilation of an HPVM program involves the following steps: - -1. `clang` takes an HPVM-C/C++ program (e.g. `main.c`) and produces an LLVM IR (`main.ll`) file that contains the HPVM-C function calls. The declarations of these functions are defined in `test/benchmark/include/hpvm.h`, which must be included in the program. -2. `opt` takes (`main.ll`) and invoke the GenHPVM pass on it, which converts the HPVM-C function calls to HPVM intrinsics. This generates the HPVM textual representation (`main.hpvm.ll`). -3. `opt` takes the HPVM textual representation (`main.hpvm.ll`) and invokes the following passes in sequence: - * BuildDFG: Converts the textual representation to the internal HPVM representation. - * LocalMem and DFG2LLVM_OpenCL: Invoked only when GPU target is selected. Generates the kernel module (`main.kernels.ll`) and the portion of the host code that invokes the kernel into the host module (`main.host.ll`). - * DFG2LLVM_CPU: Generates either all, or the remainder of the host module (`main.host.ll`) depending on the chosen target. - * ClearDFG: Deletes the internal HPVM representation from memory. -4. `clang` is used to to compile any remaining project files that would be later linked with the host module. -5. `llvm-link` takes the host module and all the other generate `ll` files, and links them with the HPVM runtime module (`hpvm-rt.bc`), to generate the linked host module (`main.host.linked.ll`). -6. 
Generate the executable code from the generated `ll` files for all parts of the program: - * GPU target: `llvm-cbe` takes the kernel module (`main.kernels.ll`) and generates an OpenCL representation of the kernels that will be invoked by the host. - * CPU target: `clang` takes the linked host module (`main.host.linked.ll`) and generates the CPU binary. diff --git a/hpvm/docs/components/index.rst b/hpvm/docs/components/index.rst new file mode 100644 index 0000000000..774c1ef1cd --- /dev/null +++ b/hpvm/docs/components/index.rst @@ -0,0 +1,30 @@ +Components +================================ + +HPVM consists of a few relatively independent key components. + +* Patched LLVM: provides HPVM IR and a compilation infrastructure, including ``clang`` and ``opt``. +* HPVM code generator: a few ``opt`` passes that lower HPVM IR to LLVM IR, + which is then compiled into object code and binary. + +`Compilation process of HPVM <../references/compilation-process.html>`_ +shows how these two components work together. +In addition, there are: + +* Frontends (Keras/PyTorch): code generators in Python for lowering Keras and PyTorch + DNN models into HPVM-C format. +* Predictive tuner: an autotuner library in Python for finding approximation choices (configurations) + with the best performance gain within some loss of Quality of Service (QoS, such as accuracy). +* HPVM profiler: an API in Python for measuring the real performance of configurations. +* Tensor runtime: a backend which holds implementations for some common tensor operators + (such as convolution) that HPVM-C functions can be converted into. + +The documentation of these components is listed below, +explaining their role, usage, and other details. + +.. 
toctree:: + :maxdepth: 1 + + keras-frontend + keras-benchmarks + torch2hpvm diff --git a/hpvm/docs/components/keras-benchmarks.rst b/hpvm/docs/components/keras-benchmarks.rst new file mode 100644 index 0000000000..31a8d7bc92 --- /dev/null +++ b/hpvm/docs/components/keras-benchmarks.rst @@ -0,0 +1,177 @@ +Keras Benchmarks +================ + +TODO: some of this belongs to `test/`. + +Run the Keras benchmarks under ``hpvm/hpvm/test/dnn_benchmarks/keras`` + +Download CNN Model Files +------------------------ + +Prior to running the benchmarks, ensure you download the CNN model data (inputs and weights) if this was not done by the automatic build script. + +.. code-block:: + + wget https://databank.illinois.edu/datafiles/o3izd/download -O model_params.tar.gz + tar -xf model_params.tar.gz + +Move the extracted ``model_params`` directory to ``/test/dnn_benchmarks/model_params`` (benchmarks expect data at this location). + +Running Benchmarks +------------------ + +List of benchmarks and the expected accuracies: + +.. list-table:: + :header-rows: 1 + + * - Benchmark + - Accuracy + * - alexnet.py + - 79.28 + * - alexnet2.py + - 84.98 + * - alexnet_imagenet.py + - 56.30 + * - lenet.py + - 98.70 + * - mobilenet_cifar10.py + - 84.42 + * - resnet18_cifar10.py + - 89.56 + * - resnet50_imagenet.py + - 75.10 + * - vgg16_cifar10.py + - 89.96 + * - vgg16_cifar100.py + - 66.50 + * - vgg16_imagenet.py + - 69.46 + + +Synopsis +^^^^^^^^ + +.. code-block:: + + python3 ${BENCH_NAME}.py [hpvm_reload|keras_reload] [frontend] [compile] + +**Command-line Parameters** + +``hpvm_reload`` : Reloads HPVM weights (``.bin`` binary format used by HPVM weights - present in ``model_params`` download directory) from the directory path specified in the ``reload_dir`` parameter set in code - this is described in "Parameters to Change in Code" (below). 
+ +``keras_reload``: Alternatively, reload weights in Keras ``.h5`` file format with the path to the file specified in ``keras_model_file`` described in "Parameters to Change in Code" (below). + +``frontend``: Invokes the HPVM frontend and dumps weights (in HPVM ``.bin`` format) in the output directory specified. The parameters that control where data and source files are dumped are specified by parameters ``data_dir`` and ``src_dir``, respectively. These are described below. + +``compile``: Optional Parameter. When specified, it compiles the HPVM-C code generated by the frontend into an HPVM binary under the directory specified by ``src_dir`` (described below). If the ``src_dir`` path exists, a unique directory (which appends a unique ID) is created. +The binary is built with the name ``HPVM_binary``. + +**NOTE:** Before running ``HPVM_binary``, it is necessary to set CUDA and CUDNN paths with: + +.. code-block:: + + source ${PATH_TO_YOUR_HPVM_ROOT}/hpvm/set_paths.sh + +**Parameters to Change in Code** + +The AlexNet source is commented with explanations on how to use the Keras frontend interface. The AlexNet source is `here <https://gitlab.engr.illinois.edu/llvm/hpvm/-/blob/approx_hpvm_reorg_keras/hpvm/projects/keras/src/alexnet.py>`_. + + +* + ``NAME``: Benchmark Name - Can be set to any desired value + +* + ``reload_dir``: Path to directory from where to reload weights in HPVM format. This directory is used to reload weights if the ``hpvm_reload`` command-line option is used. + +* + ``keras_model_file``: Path to Keras .h5 model file to reload weights from. Either of ``reload_dir`` or ``keras_model_file`` can be used. + ``keras_model_file`` is used when the ``keras_reload`` command-line parameter is used with the Benchmark script. + +* + ``data_dir``: Directory to dump weights, specified in + `constructor <https://gitlab.engr.illinois.edu/llvm/hpvm/-/blob/approx_hpvm_reorg_keras/hpvm/projects/keras/src/Benchmark.py#L21>`_. 
+ +* + ``src_dir``: Directory to dump ApproxHPVM sources in HPVM-C (C with HPVM compiler intrinsics), specified in + `constructor <https://gitlab.engr.illinois.edu/llvm/hpvm/-/blob/approx_hpvm_reorg_keras/hpvm/projects/keras/src/Benchmark.py#L22>`_. + +* + ``num_classes``: number of output classes - dependent on the dataset used. For CIFAR10, ``num_classes`` is 10, CIFAR100 has 100 classes, + for ImageNet, the number of classes is 1000. + +* + ``batch_size``: This parameter controls the size of each batch that is processed in HPVM. The batch size should be kept as large as the GPU memory + can support. This parameter should be adapted according to the memory size of the deployed device. + +Using the Frontend with Custom (New) Benchmarks +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Any new benchmarks must inherit from the common parent ``Benchmark`` class +and override the virtual functions for building the model, training, +and data preprocessing. These methods are described below: + +``def buildModel(self)``: +Constructs and returns a Keras model + +``def data_preprocess(self)``: +returns X_train, y_train, X_test, y_test, X_tuner, and y_tuner data (in that order): +These are described here: + + +* ``X_train:`` Training data (fp32) in NCHW format +* + ``y_train:`` Training labels (int32) + +* + ``X_test:`` Testing/Evaluation data in NCHW format + +* + ``y_test:`` Testing/Evaluation labels + +* + ``X_tuner:`` Data to be used for autotuning + +* ``y_tuner:`` Labels corresponding to tuning data + +``def trainModel(self, model, X_train, y_train, X_test, y_test)``: +Trains the Keras model constructed in ``buildModel`` and is expected to return the +trained Keras model - training parameters should be tuned here. + +Directly using Keras Frontend API +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +As an alternative to extending the ``Benchmark`` class, users may directly invoke the Keras Frontend API. This can be done as: + +.. 
code-block:: python + + + from keras_frontend.approxhpvm_translator import translate_to_approxhpvm + + # Construct and train your Keras Model (or load pre-trained weights) + + translate_to_approxhpvm(model, data_dir, src_dir, test_data, test_labels, tune_data, tune_labels, batch_size, num_classes) + +Running HPVM Binary +------------------- + +Run the ``HPVM_binary`` generated under the directory specified by ``src_dir`` (described above). Usage: + +.. code-block:: + + ./HPVM_binary -t {test|tune} -c ${config_file_path} + +``test|tune``: Runs with either the tune (autotuning data) or test set (for evaluation) + +``config_file_path``: Path to an HPVM tensor configuration file (includes approximation settings) + +**NOTE:** The accuracy of the benchmarks is dumped into a file named ``final_accuracy`` in the current working directory - this includes accuracy averaged across batches + +Automated Tests +--------------- + +``scripts/test_benchmarks.py`` is an automated test script that evaluates the accuracy of each benchmark in Keras and HPVM (after compilation using the HPVM Compiler) and compares the accuracy of each binary to the known correct accuracy. Run from the root of ``/test/dnn_benchmarks/keras``: + +.. code-block:: + + python test_benchmarks.py diff --git a/hpvm/docs/components/keras-frontend.rst b/hpvm/docs/components/keras-frontend.rst new file mode 100644 index 0000000000..51dc06ae62 --- /dev/null +++ b/hpvm/docs/components/keras-frontend.rst @@ -0,0 +1,43 @@ + +Keras Frontend +============== + +Install the Keras Frontend after moving to directory ``/hpvm/hpvm/projects/keras`` + +Requirements +------------ + + +* python == 3.6.x +* pip >= 18 + +If your system uses a different Python version, we recommend using the conda environment ``keras_python36.yml``. Install this using: + +.. 
code-block::
+
+   conda activate keras_python36
+
+**NOTE:** This step must be repeated in each new shell session before using the frontend.
+
+Installing the Keras Frontend Package
+-------------------------------------
+
+At the root of this project (``/projects/keras/``), install the Keras frontend pip package:
+
+.. code-block::
+
+   pip3 install -e ./
+
+**NOTE:** If you are using the conda environment, activate it prior to this step.
+
+Supported Operations
+--------------------
+
+The list of supported operations and their limitations is documented `here <../projects/keras/docs/Support.md>`_.
+TODO: move that Support.md in here as well, otherwise the link will fail when we publish to a website.
diff --git a/hpvm/docs/components/torch2hpvm.rst b/hpvm/docs/components/torch2hpvm.rst
new file mode 120000
index 0000000000..54d5dbb580
--- /dev/null
+++ b/hpvm/docs/components/torch2hpvm.rst
@@ -0,0 +1 @@
+../../projects/torch2hpvm/README.rst
\ No newline at end of file
diff --git a/hpvm/docs/conf.py b/hpvm/docs/conf.py
new file mode 100644
index 0000000000..8e65a2b358
--- /dev/null
+++ b/hpvm/docs/conf.py
@@ -0,0 +1,156 @@
+from datetime import date
+import sphinx_rtd_theme
+
+# If extensions (or modules to document with autodoc) are in another directory,
+# add these directories to sys.path here. If the directory is relative to the
+# documentation root, use os.path.abspath to make it absolute, like shown here.
+#
+import os
+import sys
+
+sys.path.insert(0, os.path.abspath(".."))
+
+# General configuration
+# ---------------------
+
+# Add any Sphinx extension module names here, as strings. They can be extensions
+# coming with Sphinx (named 'sphinx.ext.*') or your custom ones.
+extensions = [
+    "sphinx.ext.autosummary",
+    "sphinx.ext.autodoc",
+    "sphinx_autodoc_typehints",
+    "sphinx.ext.coverage",
+    "sphinx.ext.doctest",
+    "sphinx.ext.intersphinx",
+    "sphinx.ext.mathjax",
+    "sphinx.ext.todo",
+    "sphinx.ext.viewcode",
+    "numpydoc",
+]
+always_document_param_types = True
+
+# generate autosummary pages
+autosummary_generate = True
+
+# Add any paths that contain templates here, relative to this directory.
+templates_path = ["_templates"]
+
+# The suffix of source filenames.
+source_suffix = ".rst"
+
+# The encoding of source files.
+source_encoding = "utf-8"
+
+# The master toctree document.
+master_doc = "index"
+
+# General substitutions.
+project = "HPVM"
+copyright = f"2020-{date.today().year}, University of Illinois"
+
+# There are two options for replacing |today|: either, you set today to some
+# non-false value, then it is used:
+# today = ''
+# Else, today_fmt is used as the format for a strftime call.
+# today_fmt = '%B %d, %Y'
+
+# List of documents that shouldn't be included in the build.
+# unused_docs = ['']
+
+# If true, '()' will be appended to :func: etc. cross-reference text.
+# add_function_parentheses = True
+
+# If true, the current module name will be prepended to all description
+# unit titles (such as .. function::).
+add_module_names = False
+
+# show_authors = True
+
+# The name of the Pygments (syntax highlighting) style to use.
+# pygments_style = 'friendly'
+pygments_style = "sphinx"
+
+# A list of prefixes that are ignored when creating the module index.
(new in Sphinx 0.6)
+# modindex_common_prefix = []
+
+# doctest_global_setup = ""
+
+# Options for HTML output
+# -----------------------
+
+
+html_theme = "sphinx_rtd_theme"
+html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]
+
+html_theme_options = {
+    # "canonical_url" is left unset; fill it in once the docs have a stable public URL.
+    "navigation_depth": 3,
+    "logo_only": True,
+}
+
+# html_logo = "_static/logo.svg"
+
+# The style sheet to use for HTML and HTML Help pages. A file of that name
+# must exist either in Sphinx' static/ path, or in one of the custom paths
+# given in html_static_path.
+# html_style = ''
+
+# Add any paths that contain custom static files (such as style sheets) here,
+# relative to this directory. They are copied after the builtin static files,
+# so a file named "default.css" will overwrite the builtin "default.css".
+html_static_path = []
+
+# If not '', a 'Last updated on:' timestamp is inserted at every page bottom,
+# using the given strftime format.
+html_last_updated_fmt = "%b %d, %Y"
+
+# If true, SmartyPants will be used to convert quotes and dashes to
+# typographically correct entities.
+# html_use_smartypants = True
+
+# Content template for the index page.
+# html_index = 'index.html'
+
+# Custom sidebar templates, maps page names to templates.
+# html_sidebars = {}
+
+# Additional templates that should be rendered to pages, maps page names to
+# templates.
+# html_additional_pages = {'': ''}
+
+# If true, the reST sources are included in the HTML build as _sources/<name>.
+html_copy_source = False
+
+# Options for LaTeX output
+# ------------------------
+
+# Use a latex engine that allows for unicode characters in docstrings
+latex_engine = "xelatex"
+# The paper size ('letter' or 'a4').
+latex_paper_size = "letter"
+
+# The font size ('10pt', '11pt' or '12pt').
+# latex_font_size = '10pt'
+
+# latex_appendices = ["tutorial"]  # disabled: no "tutorial" document exists in this doc set yet
+
+# Intersphinx mapping
+intersphinx_mapping = {
+    "python": ("https://docs.python.org/3/", None),
+    "numpy": ("https://numpy.org/doc/stable/", None),
+    "matplotlib": ("https://matplotlib.org", None),
+    "scipy": ("https://docs.scipy.org/doc/scipy/reference", None),
+    "pandas": ("https://pandas.pydata.org/pandas-docs/stable", None),
+    "pytorch": ("https://pytorch.org/docs/stable", None),
+}
+
+# The reST default role (used for this markup: `text`) to use for all
+# documents.
+default_role = "obj"
+
+numpydoc_show_class_members = False
+
+
+def setup(app):
+    app.add_css_file("custom.css")
+    app.add_js_file("copybutton.js")
diff --git a/hpvm/docs/getting-started.rst b/hpvm/docs/getting-started.rst
new file mode 100644
index 0000000000..f2f0c84525
--- /dev/null
+++ b/hpvm/docs/getting-started.rst
@@ -0,0 +1,4 @@
+Getting Started
+===============
+
+TODO: this is the system-wide tour Sasa was suggesting. Finish this.
diff --git a/hpvm/docs/hpvm-c.md b/hpvm/docs/hpvm-c.md
deleted file mode 100644
index 76cfde58c0..0000000000
--- a/hpvm/docs/hpvm-c.md
+++ /dev/null
@@ -1,131 +0,0 @@
-# HPVM-C Language Specification
-An HPVM program is a combination of host code and one or more data flow graphs (DFG) at the IR level. We provide C function declarations representing the HPVM intrinsics that allow creating, querying, and interacting with the DFGs. More details about the HPVM IR intrinsics can be found in [the HPVM IR Specification.](/hpvm/docs/hpvm-specification.md).
-
-An HPVM-C program contains both the host and the DFG code. Each HPVM kernel, represented by a leaf node in the DFG, can be compiled to multiple different targets (e.g. CPU and GPU) as described below.
-
-This document describes all the API calls that can be used in an HPVM-C program.
-
-## Host API
-
-```void __hpvm__init()```
-Used before all other HPVM calls to initialize the HPVM runtime.
- -```void __hpvm__cleanup()``` -Used at the end of HPVM program to clean up all remaining runtime-created HPVM objects. - -```void llvm_hpvm_track_mem(void* ptr, size_t sz)``` -Insert memory starting at ```ptr``` of size ```sz``` in the memory tracker of HPVM runtime. - -```void llvm_hpvm_untrack_mem(void* ptr)``` -Stop tracking the memory object identified by ```ptr```. - -```void llvm_hpvm_request_mem(void* ptr, size_t sz)``` -If the memory object identified by ```ptr``` is not in host memory, copy it to host memory. - -```void* __hpvm__launch(unsigned isStream, void* rootGraph, void* args)``` -Launches the execution of the dataflow graph with node function ```rootGraph```. ```args``` is a pointer to a packed struct, containing one field per argument of the RootGraph function, consecutively. For non-streaming DFGs with a non empty result type, ```args``` must contain an additional field of the type ```RootGraph.returnTy```, where the result of the graph will be returned. ```isStream``` chooses between a non streaming (0) or streaming (1) graph execution. Returns a handle to the executing graph. - -```void __hpvm__wait(void* G)``` -Waits for completion of execution of the dataflow graph with handle ```G```. - -```void __hpvm__push(void* G, void* args)``` -Push set of input data items, ```args```, (same as type included in launch) to streaming DFG with handle ```G```. - -```void* __hpvm__pop(void* G)``` -Pop and return data produced from one execution of streaming DFG with handle ```G```. The return type is a struct containing a field for every output of DFG. - -## Internal Node API - -```void* __hpvm__createNodeND(unsigned dims, void* F, ...)``` -Creates a static dataflow node replicated in ```dims``` dimensions (0 to 3), each executing node function ```F```. The arguments following ```F``` are the size of each dimension, respectively, passed in as a ```size_t```. Returns a handle to the created dataflow node. 
- -```void* __hpvm__edge(void* src, void* dst, unsigned replType, unsigned sp, unsigned dp, unsigned isStream)``` -Creates an edge from output ```sp``` of node ```src``` to input ```dp``` of node ```dst```. If ```replType``` is 0, the edge is a one-to-one edge, otherwise it is an all-to-all edge. ```isStream``` defines whether or not the edge is streaming. Returns a handle to the created edge. - -```void __hpvm__bindIn(void* N, unsigned ip, unsigned ic, unsigned isStream)``` -Binds the input ```ip``` of the current node to input ```ic``` of child node function ```N```. ```isStream``` defines whether or not the input bind is streaming. - -```void __hpvm__bindOut(void* N, unsigned op, unsigned oc, unsigned isStream)``` -Binds the output ```op``` of the current node to output ```oc``` of child node function ```N```. ```isStream``` defines whether or not the output bind is streaming. - -```void __hpvm__hint(enum Target target)``` (C\) -```void __hpvm__hint(hpvm::Target target)``` (C++) -Must be called once in each node function. Indicates which hardware target the current function should run in. - -```void __hpvm__attributes(unsigned ni, …, unsigned no, …)``` -Must be called once at the beginning of each node function. Defines the properties of the pointer arguments to the current function. ```ni``` represents the number of input arguments, and ```no``` the number of output arguments. The arguments following ```ni``` are the input arguments, and the arguments following ```no``` are the output arguments. Arguments can be marked as both input and output. All pointer arguments must be included. - -## Leaf Node API -```void __hpvm__hint(enum Target target)``` (C\) -```void __hpvm__hint(hpvm::Target target)``` (C++) -As described in internal node API. - -```void __hpvm__attributes(unsigned ni, …, unsigned no, …)``` -As described in internal node API. - -```void __hpvm__return(unsigned n, ...)``` -Returns ```n``` values from a leaf node function. 
The remaining arguments are the values to be returned. All ```__hpvm__return``` statements within the same function must return the same number of values. - -```void* __hpvm__getNode()``` -Returns a handle to the current leaf node. - -```void* __hpvm__getParentNode(void* N)``` -Returns a handle to the parent node of node ```N```. - -```long __hpvm__getNodeInstanceID_{x,y,z}(void* N)``` -Returns the dynamic ID of the current instance of node ```N``` in the x, y, or z dimension respectively. The dimension must be one of the dimensions in which the node is replicated. - -```long __hpvm__getNumNodeInstances_{x,y,z}(void* N)``` -Returns the number of dynamic instances of node ```N``` in the x, y, or z dimension respectively. The dimension must be one of the dimensions in which the node is replicated. - -```void* __hpvm__malloc(long nBytes)``` -Allocate a block of memory of size ```nBytes``` and returns a pointer to it. The allocated object can be shared by all nodes. *Note that the returned pointer must somehow be communicated explicitly for use by other nodes.* - -```int __hpvm__atomic_add(int* m, int v)``` -Atomically adds ```v``` to the value stored at memory location ```[m]``` w.r.t. the dynamic instances of the current leaf node and stores the result back into ```[m]```. Returns the value previously stored at ```[m]```. - -```int __hpvm__atomic_sub(int* m, int v)``` -Atomically subtracts ```v``` from the value stored at memory location ```[m]``` w.r.t. the dynamic instances of the current leaf node and stores the result back into ```[m]```. Returns the value previously stored at ```[m]```. - -```int __hpvm__atomic_min(int* m, int v)``` -Atomically computes the min of ```v``` and the value stored at memory location ```[m]``` w.r.t. the dynamic instances of the current leaf node and stores the result back into ```[m]```. Returns the value previously stored at ```[m]```. 
- -```int __hpvm__atomic_max(int* m, int v)``` -Atomically computes the max of ```v``` and the value stored at memory location ```[m]``` w.r.t. the dynamic instances of the current leaf node and stores the result back into ```[m]```. Returns the value previously stored at ```[m]```. - -```int __hpvm__atomic_xchg(int* m, int v)``` -Atomically swaps ```v``` with the value stored at memory location ```[m]``` w.r.t. the dynamic instances of the current leaf node and stores the result back into ```[m]```. Returns the value previously stored at ```[m]```. - -```int __hpvm__atomic_and(int* m, int v)``` -Atomically computes the bitwise AND of ```v``` and the value stored at memory location ```[m]``` w.r.t. the dynamic instances of the current leaf node and stores the result back into ```[m]```. Returns the value previously stored at ```[m]```. - -```int __hpvm__atomic_or(int* m, int v)``` -Atomically computes the bitwise OR of ```v``` and the value stored at memory location ```[m]``` w.r.t. the dynamic instances of the current leaf node and stores the result back into ```[m]```. Returns the value previously stored at ```[m]```. - -```int __hpvm__atomic_xor(int* m, int v)``` -Atomically computes the bitwise XOR of ```v``` and the value stored at memory location ```[m]``` w.r.t. the dynamic instances of the current leaf node and stores the result back into ```[m]```. Returns the value previously stored at ```[m]```. - -```void __hpvm__barrier()``` -Local synchronization barrier across dynamic instances of current leaf node. - -# Porting a Program from C to HPVM-C - -The following represents the required steps to port a regular C program into an HPVM program with HPVM-C. These steps are described at a high level; for more detail, please see [hpvm-cava](/hpvm/test/benchmarks/hpvm-cava) provided in [benchmarks](/hpvm/test/benchmarks). -* Separate the computation that will become a kernel into its own (leaf node) function and add the attributes and target hint. 
-* Create a level 1 wrapper node function that will describe the thread-level parallelism (for the GPU). The node will: - * Use the ```createNode[ND]()``` method to create a kernel node and specify how many threads will execute it. - * Bind its arguments to the kernel arguments. -* If desired, create a level 2 wrapper node function which will describe the threadblock-level parallalism (for the GPU). This node will: - * Use the ```createNode[ND]()``` method to create a level 1 wrapper node and specify how many threadblocks will execute it. - * Bind its arguments to its child node's arguments. -* A root node function that creates all the top-level wrapper nodes, binds their arguments, and connects their edges. - * Each root node represents a DFG. -* All the above node functions have the combined arguments of all the kernels that are nested at each level. -* The host code will have to include the following: - * Initialize the HPVM runtime using the ```init()``` method. - * Create an argument struct for each DFG and assign its member variables. - * Add all the memory that is required by the kernel into the memory tracker. - * Launch the DFG by calling the ```launch()``` method on the root node function, and passing the corresponding argument struct. - * Wait for the DFG to complete execution. - * Read out any generated memory using the ```request_mem()``` method. - * Remove all the tracked memory from the memory tracker. 
diff --git a/hpvm/docs/hpvm-specification.md b/hpvm/docs/hpvm-specification.md deleted file mode 100644 index 54023fc9ed..0000000000 --- a/hpvm/docs/hpvm-specification.md +++ /dev/null @@ -1,221 +0,0 @@ -# Table of Contents -* [HPVM Abstraction](#abstraction) - * [Dataflow Node](#node) - * [Dataflow Edge](#edge) - * [Input and Output Bind](#bind) - * [Host Code](#host) -* [HPVM Implementation](#implementation) - * [Intrinsics for Describing Graphs](#describing) - * [Intrinsics for Querying Graphs](#querying) - * [Intrinsics for Memory Allocation and Synchronization](#memory) - * [Intrinsics for Graph Interaction](#interaction) -* [Implementation Limitations](#limitations) - -<a name="abstraction"></a> -# HPVM Abstraction -An HPVM program is a combination of host code plus a set of one or more distinct dataflow graphs. Each dataflow graph (DFG) is a hierarchical graph with side effects. The DFG must be acyclic. Nodes represent units of execution, and edges between nodes describe the explicit data transfer requirements. A node can begin execution once a data item becomes available on every one of its input edges. Repeated transfer of data items between nodes (if more inputs are provided) yields a pipelined execution of different nodes in the graph. The execution of a DFG is initiated and terminated by host code that launches the graph. Nodes may access globally shared memory through load and store instructions (side-effects). - -<a name="node"></a> -## Dataflow Node -A *dataflow node* represents unit of computation in the DFG. A node can begin execution once a data item becomes available on every one of its input edges. - -A single static dataflow node represents multiple dynamic instances of the node, each executing the same computation with different index values used to uniquely identify each dynamic instance w.r.t. the others. 
The dynamic instances of a node may be executed concurrently, and any required synchronization must imposed using HPVM synchronization operations. - -Each dataflow node in a DFG can either be a *leaf node* or an *internal node*. An internal node contains a complete DFG, called a *child graph*, and the child graph itself can have internal nodes and/or leaf nodes. - -Internal nodes only create the structure of the child graph, and cannot include actual computation. - -Leaf nodes contain code expressing actual computations. Leaf nodes may contain instructions to query the structure of the underlying DFG, and any non host side HPVM operation for synchronization and memory allocation. - -Note that the graph is fully interpreted at compile-time and cannot be modified at runtime except for the number of dynamic instances, which can be data dependent. - -<a name="edge"></a> -## Dataflow Edge -A *dataflow edge* from the output ```out``` of a source dataflow node ```Src``` to the input ```in``` of a sink dataflow node ```Dst``` describes the explicit data transfer requirements. ```Src``` and ```Dst``` node must belong to the same child graph, i.e. must be children of the same internal node. - -An edge from source to sink has the semantics of copying the specified data from the source to the sink after the source node has completed execution. The pairs ```(Src, out)``` and ```(Dst, in)```, representing source and sink respectively, must be unique w.r.t. every other edge in the same child graph, i.e. two dataflow edges in the same child graph cannot have the same source or destination. - -A static edge also represents multiple dynamic instances of that edge between the dynamic instances of the source and the sink nodes. 
- -An edge can be instantiated at runtime using one of two replication mechanisms: -- *All-to-all*, where all dynamic instances of the source node are connected to all dynamic instances of the sink node, thus expressing a synchronization barrier between the two groups of nodes, or -- *One-to-one*, where each dynamic instance of the source node is connected to a single corresponding instance of the sink node. One-to-one replication requires that the grid structure (number of dimensions and the extents in each dimension) of the source and sink nodes be identical. - -<a name="bind"></a> -## Input and Output Bind -An internal node is responsible for mapping its inputs, provided by incoming dataflow edges, to the inputs to one or more nodes of the child graph. - -An internal node binds its input ```ip``` to input ```ic``` of its child node ```Dst``` using an *input bind*. -The pair ```(Dst, ic)``` must be unique, i.e. no two input binds in the same graph can have the same destination, as that would create a conflict. Semantically, these represent name bindings of input values and not data movement. - -Conversely, an internal node binds output ```oc``` of its child node ```Src``` to its output ```op``` using an *output bind*. The pair ```(Src, oc)``` and destination ```op``` must be unique, i.e. no two output binds in the same graph can have the same source destination, as that would create a conflict. - -A bind is always ***all-to-all***. - -<a name="host"></a> -## Host Code -In an HPVM program, the host code is responsible for setting up, initiating the execution and blocking for completion of a DFG. The host can interact with the DFG to sustain a streaming computation by sending all data required for, and receiving all data produced by, one execution of the DFG. The list of actions that can be performed by the host is described below: - -- **Initialization and Cleanup**: -All HPVM operations must be enclosed by the HPVM initialization and cleanup. 
These operations perform initialization and cleanup of runtime constructs that provide the runtime support for HPVM. -- **Track Memory**: -Memory objects that are passed to dataflow graphs need to be managed by the HPVM runtime. Our memory model assumes two separate address spaces for host and device memory. The HPVM runtime includes a memory tracker for tracking the location of HPVM-managed memory objects between these address spaces. Track memory inserts the specified memory object in the memory tracker and starts tracking it. -- **Untrack Memory**: -Stop tracking specified memory object and remove it from memory tracker. -- **Request Memory**: -If the specified memory object is not present in host memory, copy it to host memory. -- **Launch**: -The host code initiates execution of specified DFG, either streaming or non streaming. - - Non streaming DFG: The host provides all data items required for execution of the DFG at the time of the launch. - - Streaming DFG: No data is provided by the launch operation. Streaming execution is sustained by push and pop operations, described below. -- **Push**: -Push a set of data items required for one graph execution to the specified DFG. The DFG must have been launched using a streaming launch operation. This is a blocking operation. -- **Pop**: -Read data produced from one execution of the specified DFG. The DFG must have been launched using a streaming launch operation. This is a blocking operation. -- **Wait**: -The host code blocks for completion of specified DFG. - - For a non-streaming DFG, the data produced by the DFG are ready to be read by the host. - - For a streaming DFG, no more data may be provided for processing by the DFG. - -<a name="implementation"></a> -# HPVM Implementation - -This section describes the implementation of HPVM on top of LLVM IR. - -iN is the N-bit integer type in LLVM. - -We use intrinsic functions to implement the HPVM IR. 
- -The code for each dataflow node is given as a separate LLVM function, called the node function. The node function may call additional, auxiliary functions. However, the auxiliary functions are not allowed to include any HPVM intrinsics, as they are not considered to be part of the HPVM node hierarchy. - -The incoming dataflow edges and their data types are denoted by the parameters to the node function. The outgoing dataflow edges are represented by the return type of the node function, which must be an LLVM struct type with zero or more fields (one per outgoing edge). - -Each top-level DFG in an HPVM program is defined by its own *root node function* which creates the underlying DFG structure. The DFG is the (internal) root node's child graph. Unlike regular internal nodes, the root node only has one dynamic instance because it instantiates the top-level DFG. The DFG is launched by the host code using the root node function, as described below. - -We represent nodes with opaque handles (pointers of LLVM type i8\*). We represent input edges of a node as integer indices into the list of function arguments, and output edges of a node as integer indices into the return struct type. - -Pointer arguments of node functions are required to be annotated with attributes in, and/or out, depending on their expected use (read only, write only, read write). - -<a name="describing"></a> -## Intrinsics for Describing Graphs - -The intrinsics for describing graphs can only be used by internal nodes. Also, internal nodes are only allowed to have these intrinsics as part of their node function, with the exception of a return statement of the appropriate type, in order to return the result of the outgoing dataflow edges. - - -```i8* llvm.hpvm.createNode(i8* F)``` -Create a static dataflow node with one dynamic instance executing node function ```F```. Return a handle to the created node. 
- -```i8* llvm.hpvm.createNode1D(i8* F, i64 n1)``` -Create a static dataflow node replicated in one dimension, namely ```x```, with ```n1``` dynamic instances executing node function ```F```. Return a handle to the created node. - -```i8* llvm.hpvm.createNode2D(i8* F, i64 n1, i64 n2)``` -Create a static dataflow node replicated in two dimensions, namely ```x``` and ```y```, with ```n1``` and ```n2``` dynamic instances in each dimension respectively, executing node function ```F```. Return a handle to the created node. - -```i8* llvm.hpvm.createNode3D(i8* F, i64 n1, i64 n2, i64 n3)``` -Create a static dataflow node replicated in three dimensions, namely ```x```, ```y``` and ```z```, with ```n1```, ```n2``` and ```n3``` dynamic instances in each dimension respectively, executing node function ```F```. Return a handle to the created node. - -```i8* llvm.hpvm.createEdge(i8* Src, i8* Dst, i1 ReplType, i32 sp, i32 dp, i1 isStream)``` -Create edge from output ```sp``` of node ```Src``` to input ```dp``` of node ```Dst```. Argument ```dp``` of ```Dst```'s node function and field ```sp``` of the return struct in ```Src```'s node function must have matching types. ```ReplType``` chooses between a one-to-one (0) or all-to-all (1) edge. ```isStream``` chooses a streaming (1) or non streaming (0) edge. Return a handle to the created edge. - -```void llvm.hpvm.bind.input(i8* N, i32 ip, i32 ic, i1 isStream)``` -Bind input ```ip``` of current node to input ```ic``` of child node ```N```. Argument ```ic``` of ```N```'s node function and argument ```ip``` of the current node function must have matching types. ```isStream``` chooses a streaming (1) or non streaming (0) bind. - -```void llvm.hpvm.bind.output(i8* N, i32 oc, i32 op, i1 isStream)``` -Bind output ```oc``` of child node ```N``` to output ```op``` of current node. Field ```oc``` of the return struct in ```N```'s node function and field ```op``` of the return struct in the current node function must have matching types. 
```isStream``` chooses a streaming (1) or non streaming (0) bind. - -<a name="querying"></a> -## Intrinsics for Querying Graphs - -The following intrinsics are used to query the structure of the DFG. They can only be used by leaf nodes. - -```i8* llvm.hpvm.getNode()``` -Return a handle to the current leaf node. - -```i8* llvm.hpvm.getParentNode(i8* N)``` -Return a handle to the parent in the hierarchy of node ```N```. - -```i32 llvm.hpvm.getNumDims(i8* N)``` -Get the number of dimensions of node ```N```. - -```i64 llvm.hpvm.getNodeInstanceID.{x,y,z}(i8* N)``` -Get index of current dynamic node instance of node ```N``` in dimension x, y or z respectively. The dimension must be one of the dimensions in which the node is replicated. - -```i64 llvm.hpvm.getNumNodeInstances.{x,y,z}(i8* N)``` -Get number of dynamic instances of node ```N``` in dimension x, y or z respectively. The dimension must be one of the dimensions in which the node is replicated. - -<a name="memory"></a> -## Intrinsics for Memory Allocation and Synchronization - -The following intrinsics are used for memory allocation and synchronization. They can only be used by leaf nodes. - -```i8* llvm.hpvm.malloc(i64 nBytes)``` -Allocate a block of memory of size ```nBytes``` and return pointer to it. The allocated object can be shared by all nodes. -*Note that the returned pointer must somehow be communicated explicitly for use by other nodes.* - -```i32 llvm.hpvm.atomic.add(i8* m, i32 v)``` -Atomically computes the bitwise ADD of ```v``` and the value stored at memory location ```[m]``` w.r.t. the dynamic instances of the current leaf node and stores the result back into ```[m]```. Returns the value previously stored at ```[m]```. - -```i32 llvm.hpvm.atomic.sub(i8* m, i32 v)``` -Atomically computes the bitwise SUB of ```v``` and the value stored at memory location ```[m]``` w.r.t. the dynamic instances of the current leaf node and stores the result back into ```[m]```. 
Returns the value previously stored at ```[m]```. - -```i32 llvm.hpvm.atomic.min(i8* m, i32 v)``` -Atomically computes the bitwise MIN of ```v``` and the value stored at memory location ```[m]``` w.r.t. the dynamic instances of the current leaf node and stores the result back into ```[m]```. Returns the value previously stored at ```[m]```. - -```i32 llvm.hpvm.atomic.max(i8* m, i32 v)``` -Atomically computes the bitwise MAX of ```v``` and the value stored at memory location ```[m]``` w.r.t. the dynamic instances of the current leaf node and stores the result back into ```[m]```. Returns the value previously stored at ```[m]```. - -```i32 llvm.hpvm.atomic.xchg(i8* m, i32 v)``` -Atomically computes the bitwise XCHG of ```v``` and the value stored at memory location ```[m]``` w.r.t. the dynamic instances of the current leaf node and stores the result back into ```[m]```. Returns the value previously stored at ```[m]```. - -```i32 llvm.hpvm.atomic.and(i8* m, i32 v)``` -Atomically computes the bitwise AND of ```v``` and the value stored at memory location ```[m]``` w.r.t. the dynamic instances of the current leaf node and stores the result back into ```[m]```. Returns the value previously stored at ```[m]```. - -```i32 llvm.hpvm.atomic.or(i8* m, i32 v)``` -Atomically computes the bitwise OR of ```v``` and the value stored at memory location ```[m]``` w.r.t. the dynamic instances of the current leaf node and stores the result back into ```[m]```. Returns the value previously stored at ```[m]```. - -```i32 llvm.hpvm.atomic.xor(i8* m, i32 v)``` -Atomically computes the bitwise XOR of ```v``` and the value stored at memory location ```[m]``` w.r.t. the dynamic instances of the current leaf node and stores the result back into ```[m]```. Returns the value previously stored at ```[m]```. - -```void llvm.hpvm.barrier()``` -Local synchronization barrier across dynamic instances of current leaf node. 
- -<a name="interaction"></a> -## Intrinsics for Graph Interaction - -The following intrinsics are for graph initialization/termination and interaction with the host code, and can be used only by the host code. - -```void llvm.hpvm.init()``` -Initialization of HPVM runtime. - -```void llvm.hpvm.cleanup()``` -Cleanup of HPVM runtime created objects. - -```void llvm.hpvm.trackMemory(i8* ptr, i64 sz)``` -Insert memory starting at ```ptr``` of size ```sz``` in the memory tracker. ```ptr``` becomes the key for identifying this memory object. As soon as a memory object is inserted in the memory tracker it starts being tracked, and can be passed as a data item to a DFG. - -```void llvm.hpvm.untrackMemory(i8* ptr)``` -Stop tracking memory object with key ```ptr```, and remove it from memory tracker. - -```void llvm.hpvm.requestMemory(i8* ptr, i64 sz)``` -If memory object with key ```ptr``` is not located in host memory, copy it to host memory. - -```i8* llvm.hpvm.launch(i8* RootGraph, i8* Args, i1 isStream)``` -Launch the execution of a top-level DFG with root node function ```RootGraph```. ```Args``` is a pointer to a packed struct, containing one field per argument of the ```RootGraph``` function, consecutively. For non-streaming DFGs with a non empty result type, ```Args``` must contain an additional field of the type ```RootGraph.returnTy```, where the result of the graph will be returned. ```isStream``` chooses between a non streaming (0) or streaming (1) graph execution. Return a handle to the invoked DFG. - -```void llvm.hpvm.wait(i8* GraphID)``` -Wait for completion of execution of DFG with handle ```GraphID```. - -```void llvm.hpvm.push(i8* GraphID, i8* args)``` -Push set of input data ```args``` (same as type included in launch) to streaming DFG with handle ```GraphID```. - -```i8* llvm.hpvm.pop(i8* GraphID)``` -Pop and return data from streaming DFG with handle ```GraphID```. The return type is a struct containing a field for every output of DFG. 
-
-<a name="limitations"></a>
-## Implementation Limitations
-Due to limitations of our current prototype implementation, the following restrictions are imposed:
-
-- In HPVM, a memory object is represented as a (pointer, size) pair that includes the address of memory object, and the size (in bytes) of the pointed-to object. Therefore, when an edge/bind carries a pointer, it must be followed by an i64 size value.
-- Pointers cannot be transferred between nodes using dataflow edges. Instead, they should be passed using the bind operation from the (common) parent of the source and sink nodes.
-
-- Instantiation of dataflow nodes is supported in up to three dimensions.
diff --git a/hpvm/docs/index.rst b/hpvm/docs/index.rst
new file mode 100644
index 0000000000..f5031e7b43
--- /dev/null
+++ b/hpvm/docs/index.rst
@@ -0,0 +1,40 @@
+.. _contents:
+
+The HPVM Compiler Infrastructure
+================================
+
+HPVM is a compiler for heterogeneous parallel systems.
+For more about what HPVM is, see `our website <https://publish.illinois.edu/hpvm-project/>`_
+and publications:
+`PPoPP'18 paper <https://dl.acm.org/doi/pdf/10.1145/3200691.3178493>`_,
+`OOPSLA'19 paper <https://dl.acm.org/doi/10.1145/3360612>`_,
+`PPoPP'21 paper <https://dl.acm.org/doi/10.1145/3437801.3446108>`_.
+
+This is the documentation of HPVM at **version 1.0**.
+
+Audience
+--------
+
+TODO: write something here.
+
+Documentation
+-------------
+
+.. toctree::
+   :maxdepth: 1
+
+   install
+   getting-started
+   tests
+   components/index
+   references/index
+
+Indices and tables
+------------------
+
+* :ref:`genindex`
+
+Support
+-------
+
+All questions can be directed to `hpvm-dev@lists.cs.illinois.edu <mailto:hpvm-dev@lists.cs.illinois.edu>`_.
diff --git a/hpvm/docs/install.rst b/hpvm/docs/install.rst
new file mode 100644
index 0000000000..16bfc7f5cc
--- /dev/null
+++ b/hpvm/docs/install.rst
@@ -0,0 +1,153 @@
+Install
+===============
+
+Dependencies
+------------
+
+The following components must be installed on your machine to build HPVM.
+
+* GCC (>=5.1)
+
+  * In addition, each version of CUDA-nvcc requires GCC to be no newer than a certain version.
+    See `here <https://gist.github.com/ax3l/9489132>`_ for the support matrix.
+
+* CMake (>=3.17)
+* GNU Make (>=3.79)
+* OpenCL (>=1.0.0)
+* CUDA (>=9.1)
+* Python (==3.6) with pip (>=20)
+
+Python must be strictly 3.6 (any subversion from 3.6.0 to 3.6.13).
+Alternatively, if you use Anaconda for package management,
+we provide a conda environment file that covers all Python and package requirements:
+
+.. code-block:: bash
+
+   conda env create -n hpvm -f hpvm/env.yaml
+
+
+Supported Architectures
+-----------------------
+
+Supported/tested CPU architectures:
+
+* Intel Xeon E5-2640
+* Intel Xeon W-2135
+* ARM Cortex A-57
+
+Supported/tested GPU architectures for OpenCL backend:
+
+* Nvidia Quadro P1000
+* Nvidia GeForce GTX 1080
+
+Supported/tested GPU architectures for Tensor Backend:
+
+* Nvidia Jetson TX2
+* Nvidia GeForce GTX 1080
+
+HPVM has not been tested on, but might work with, other CPUs supported by the LLVM backend,
+and GPUs supported by OpenCL such as Intel, AMD, etc.
+
+**NOTE**: Approximations are tuned for the Jetson TX2, and the same speedups may not exist for other architectures.
+
+
+Installing from Source
+----------------------
+
+Check out HPVM and go to the ``./hpvm`` directory under the project root:
+
+.. code-block:: shell

+   git clone --recursive -b approx_hpvm_reorg --single-branch https://gitlab.engr.illinois.edu/llvm/hpvm.git
+   cd hpvm/
+
+HPVM needs to be able to find CUDA.
+If CUDA is on your system's ``$PATH`` (e.g. if it was installed at the default location),
+HPVM can find CUDA automatically.
+
+Use the HPVM installer script to download, configure, and build HPVM along with LLVM and Clang:
+
+.. code-block:: shell
+
+   ./install.sh
+
+
+* Without arguments, this script will interactively prompt you for some parameters.
+  Alternatively, use ``./install.sh -h`` for a list of available arguments
+  and pass arguments as required.
+
+After configuring HPVM,
+the installer will also compile HPVM by default, which you can opt out of.
+If you do so, follow the next section "Manually Build HPVM" to manually compile HPVM,
+and "Benchmarks and Tests" to manually run test cases if you wish.
+Otherwise, you can skip the next two sections.
+
+* Specifically, the HPVM installer downloads LLVM and Clang, copies the HPVM source into
+  llvm/tools, and builds the entire tree. It also builds a modified LLVM C-Backend,
+  based on the one maintained by `Julia Computing <https://github.com/JuliaComputing/llvm-cbe>`_,
+  as a part of HPVM; it is currently used to generate OpenCL kernels for GPUs.
+
+Troubleshooting
+^^^^^^^^^^^^^^^
+
+If CMake cannot find your CUDA installation, the following environment variables can help:
+
+* ``CUDA_TOOLKIT_PATH`` --- Path to the CUDA toolkit
+* ``CUDA_INCLUDE_PATH`` --- Path to the CUDA headers
+* ``CUDA_LIB_PATH`` --- Path to the CUDA libraries
+
+You can use ``set_paths.sh`` for this purpose: modify the values of these variables
+in ``set_paths.sh`` according to your system, and source the script:
+
+.. code-block:: shell
+
+   source set_paths.sh
+
+Manually Build HPVM
+-------------------
+
+Alternatively, you can manually build HPVM with CMake.
+Please note that in this case,
+the installer script still *must* be executed to obtain some required components,
+but without the build step.
+In the current directory (``hpvm/``), run
+
+.. code-block:: shell
+
+   mkdir build
+   cd build
+   cmake ../llvm [options]
+   export PATH=$(realpath ./bin):$PATH
+
+**Note** that you must manually add the ``build/bin`` directory to your ``$PATH``
+as an absolute path (as shown above).
+
+Some common options that can be used with CMake are:
+
+* ``-DCMAKE_INSTALL_PREFIX=directory`` --- Specify the full pathname (``directory``) where the HPVM tools and libraries should be installed.
+* ``-DCMAKE_BUILD_TYPE=type`` --- Valid options for ``type`` are Debug, Release, RelWithDebInfo, and MinSizeRel. Default is Debug.
+* ``-DLLVM_ENABLE_ASSERTIONS=On`` --- Compile with assertion checks enabled (default is Yes for Debug builds, No for all other build types).
+
+Now, compile the HPVM Compilation Tool ``approxhpvm.py`` using:
+
+.. code-block:: shell
+
+   make -j<number of threads> approxhpvm.py
+
+With all the aforementioned steps, HPVM should be built, installed, tested, and ready to use.
+In particular, ``approxhpvm.py`` should be an executable command from your command line.
+
+Benchmarks and Tests
+--------------------
+
+We provide a number of general benchmarks, DNN benchmarks, and test cases, written in HPVM.
+
+The ``make`` targets ``check-hpvm-pass``, ``check-hpvm-dnn``, and ``check-hpvm-profiler``
+test various components of HPVM and are increasingly time-consuming.
+You can run these tests the same way ``approxhpvm.py`` is compiled; for example,
+
+.. code-block:: shell
+
+   make -j<number of threads> check-hpvm-pass
+
+runs ``check-hpvm-pass`` tests. See TODO for details on benchmarks and test cases.
diff --git a/hpvm/docs/references/compilation-process.rst b/hpvm/docs/references/compilation-process.rst
new file mode 100644
index 0000000000..ab3f392f4f
--- /dev/null
+++ b/hpvm/docs/references/compilation-process.rst
@@ -0,0 +1,21 @@
+HPVM Compilation Process
+========================
+
+Compilation of an HPVM program involves the following steps:
+
+
+#. ``clang`` takes an HPVM-C/C++ program (e.g. ``main.c``) and produces an LLVM IR (``main.ll``) file that contains the HPVM-C function calls. The declarations of these functions are defined in ``test/benchmark/include/hpvm.h``, which must be included in the program.
+#. 
``opt`` takes ``main.ll`` and invokes the GenHPVM pass on it, which converts the HPVM-C function calls to HPVM intrinsics. This generates the HPVM textual representation (``main.hpvm.ll``).
+#. ``opt`` takes the HPVM textual representation (``main.hpvm.ll``) and invokes the following passes in sequence:
+
+   * BuildDFG: Converts the textual representation to the internal HPVM representation.
+   * LocalMem and DFG2LLVM_OpenCL: Invoked only when the GPU target is selected. Generates the kernel module (``main.kernels.ll``) and the portion of the host code that invokes the kernel into the host module (``main.host.ll``).
+   * DFG2LLVM_CPU: Generates either all, or the remainder, of the host module (``main.host.ll``) depending on the chosen target.
+   * ClearDFG: Deletes the internal HPVM representation from memory.
+
+#. ``clang`` is used to compile any remaining project files that would be later linked with the host module.
+#. ``llvm-link`` takes the host module and all the other generated ``ll`` files, and links them with the HPVM runtime module (``hpvm-rt.bc``), to generate the linked host module (``main.host.linked.ll``).
+#. Generate the executable code from the generated ``ll`` files for all parts of the program:
+
+   * GPU target: ``llvm-cbe`` takes the kernel module (``main.kernels.ll``) and generates an OpenCL representation of the kernels that will be invoked by the host.
+   * CPU target: ``clang`` takes the linked host module (``main.host.linked.ll``) and generates the CPU binary.
diff --git a/hpvm/docs/references/hpvm-c.rst b/hpvm/docs/references/hpvm-c.rst
new file mode 100644
index 0000000000..8956bf0c87
--- /dev/null
+++ b/hpvm/docs/references/hpvm-c.rst
@@ -0,0 +1,151 @@
+.. role:: raw-html-m2r(raw)
+   :format: html
+
+
+HPVM-C Language Specification
+=============================
+
+An HPVM program is a combination of host code and one or more data flow graphs (DFG) at the IR level.
We provide C function declarations representing the HPVM intrinsics that allow creating, querying, and interacting with the DFGs. More details about the HPVM IR intrinsics can be found in `the HPVM IR Specification <hpvm-specification.html>`_.
+
+An HPVM-C program contains both the host and the DFG code. Each HPVM kernel, represented by a leaf node in the DFG, can be compiled to multiple different targets (e.g. CPU and GPU) as described below.
+
+This document describes all the API calls that can be used in an HPVM-C program.
+
+Host API
+--------
+
+``void __hpvm__init()``:raw-html-m2r:`<br>`
+Used before all other HPVM calls to initialize the HPVM runtime.
+
+``void __hpvm__cleanup()``:raw-html-m2r:`<br>`
+Used at the end of an HPVM program to clean up all remaining runtime-created HPVM objects.
+
+``void llvm_hpvm_track_mem(void* ptr, size_t sz)``:raw-html-m2r:`<br>`
+Insert memory starting at ``ptr`` of size ``sz`` in the memory tracker of the HPVM runtime.
+
+``void llvm_hpvm_untrack_mem(void* ptr)``:raw-html-m2r:`<br>`
+Stop tracking the memory object identified by ``ptr``.
+
+``void llvm_hpvm_request_mem(void* ptr, size_t sz)``:raw-html-m2r:`<br>`
+If the memory object identified by ``ptr`` is not in host memory, copy it to host memory.
+
+``void* __hpvm__launch(unsigned isStream, void* rootGraph, void* args)``:raw-html-m2r:`<br>`
+Launches the execution of the dataflow graph with root node function ``rootGraph``. ``args`` is a pointer to a packed struct, containing one field per argument of the ``rootGraph`` function, consecutively. For non-streaming DFGs with a non-empty result type, ``args`` must contain an additional field of the type ``RootGraph.returnTy``, where the result of the graph will be returned. ``isStream`` chooses between a non-streaming (0) or streaming (1) graph execution. Returns a handle to the executing graph.
+
+``void __hpvm__wait(void* G)``:raw-html-m2r:`<br>`
+Waits for completion of execution of the dataflow graph with handle ``G``.
+
+``void __hpvm__push(void* G, void* args)``:raw-html-m2r:`<br>`
+Push a set of input data items, ``args`` (of the same type as in the launch), to the streaming DFG with handle ``G``.
+
+``void* __hpvm__pop(void* G)``:raw-html-m2r:`<br>`
+Pop and return data produced from one execution of the streaming DFG with handle ``G``. The return type is a struct containing a field for every output of the DFG.
+
+Internal Node API
+-----------------
+
+``void* __hpvm__createNodeND(unsigned dims, void* F, ...)``:raw-html-m2r:`<br>`
+Creates a static dataflow node replicated in ``dims`` dimensions (0 to 3), each executing node function ``F``. The arguments following ``F`` are the size of each dimension, respectively, passed in as a ``size_t``. Returns a handle to the created dataflow node.
+
+``void* __hpvm__edge(void* src, void* dst, unsigned replType, unsigned sp, unsigned dp, unsigned isStream)``:raw-html-m2r:`<br>`
+Creates an edge from output ``sp`` of node ``src`` to input ``dp`` of node ``dst``. If ``replType`` is 0, the edge is a one-to-one edge, otherwise it is an all-to-all edge. ``isStream`` defines whether or not the edge is streaming. Returns a handle to the created edge.
+
+``void __hpvm__bindIn(void* N, unsigned ip, unsigned ic, unsigned isStream)``:raw-html-m2r:`<br>`
+Binds the input ``ip`` of the current node to input ``ic`` of child node function ``N``. ``isStream`` defines whether or not the input bind is streaming.
+
+``void __hpvm__bindOut(void* N, unsigned op, unsigned oc, unsigned isStream)``:raw-html-m2r:`<br>`
+Binds the output ``op`` of the current node to output ``oc`` of child node function ``N``. ``isStream`` defines whether or not the output bind is streaming.
+
+``void __hpvm__hint(enum Target target)`` (C):raw-html-m2r:`<br>`
+``void __hpvm__hint(hpvm::Target target)`` (C++):raw-html-m2r:`<br>`
+Must be called once in each node function. Indicates which hardware target the current function should run on.
+
+``void __hpvm__attributes(unsigned ni, …, unsigned no, …)``:raw-html-m2r:`<br>`
+Must be called once at the beginning of each node function. Defines the properties of the pointer arguments to the current function. ``ni`` represents the number of input arguments, and ``no`` the number of output arguments. The arguments following ``ni`` are the input arguments, and the arguments following ``no`` are the output arguments. Arguments can be marked as both input and output. All pointer arguments must be included.
+
+Leaf Node API
+-------------
+
+``void __hpvm__hint(enum Target target)`` (C):raw-html-m2r:`<br>`
+``void __hpvm__hint(hpvm::Target target)`` (C++):raw-html-m2r:`<br>`
+As described in the internal node API.
+
+``void __hpvm__attributes(unsigned ni, …, unsigned no, …)``:raw-html-m2r:`<br>`
+As described in the internal node API.
+
+``void __hpvm__return(unsigned n, ...)``:raw-html-m2r:`<br>`
+Returns ``n`` values from a leaf node function. The remaining arguments are the values to be returned. All ``__hpvm__return`` statements within the same function must return the same number of values.
+
+``void* __hpvm__getNode()``:raw-html-m2r:`<br>`
+Returns a handle to the current leaf node.
+
+``void* __hpvm__getParentNode(void* N)``:raw-html-m2r:`<br>`
+Returns a handle to the parent node of node ``N``.
+
+``long __hpvm__getNodeInstanceID_{x,y,z}(void* N)``:raw-html-m2r:`<br>`
+Returns the dynamic ID of the current instance of node ``N`` in the x, y, or z dimension respectively. The dimension must be one of the dimensions in which the node is replicated.
+
+``long __hpvm__getNumNodeInstances_{x,y,z}(void* N)``:raw-html-m2r:`<br>`
+Returns the number of dynamic instances of node ``N`` in the x, y, or z dimension respectively. The dimension must be one of the dimensions in which the node is replicated.
+
+``void* __hpvm__malloc(long nBytes)``:raw-html-m2r:`<br>`
+Allocates a block of memory of size ``nBytes`` and returns a pointer to it.
The allocated object can be shared by all nodes. *Note that the returned pointer must somehow be communicated explicitly for use by other nodes.* + +``int __hpvm__atomic_add(int* m, int v)``:raw-html-m2r:`<br>` +Atomically adds ``v`` to the value stored at memory location ``[m]`` w.r.t. the dynamic instances of the current leaf node and stores the result back into ``[m]``. Returns the value previously stored at ``[m]``. + +``int __hpvm__atomic_sub(int* m, int v)``:raw-html-m2r:`<br>` +Atomically subtracts ``v`` from the value stored at memory location ``[m]`` w.r.t. the dynamic instances of the current leaf node and stores the result back into ``[m]``. Returns the value previously stored at ``[m]``. + +``int __hpvm__atomic_min(int* m, int v)``:raw-html-m2r:`<br>` +Atomically computes the min of ``v`` and the value stored at memory location ``[m]`` w.r.t. the dynamic instances of the current leaf node and stores the result back into ``[m]``. Returns the value previously stored at ``[m]``. + +``int __hpvm__atomic_max(int* m, int v)``:raw-html-m2r:`<br>` +Atomically computes the max of ``v`` and the value stored at memory location ``[m]`` w.r.t. the dynamic instances of the current leaf node and stores the result back into ``[m]``. Returns the value previously stored at ``[m]``. + +``int __hpvm__atomic_xchg(int* m, int v)``:raw-html-m2r:`<br>` +Atomically swaps ``v`` with the value stored at memory location ``[m]`` w.r.t. the dynamic instances of the current leaf node and stores the result back into ``[m]``. Returns the value previously stored at ``[m]``. + +``int __hpvm__atomic_and(int* m, int v)``:raw-html-m2r:`<br>` +Atomically computes the bitwise AND of ``v`` and the value stored at memory location ``[m]`` w.r.t. the dynamic instances of the current leaf node and stores the result back into ``[m]``. Returns the value previously stored at ``[m]``. 
+
+``int __hpvm__atomic_or(int* m, int v)``:raw-html-m2r:`<br>`
+Atomically computes the bitwise OR of ``v`` and the value stored at memory location ``[m]`` w.r.t. the dynamic instances of the current leaf node and stores the result back into ``[m]``. Returns the value previously stored at ``[m]``.
+
+``int __hpvm__atomic_xor(int* m, int v)``:raw-html-m2r:`<br>`
+Atomically computes the bitwise XOR of ``v`` and the value stored at memory location ``[m]`` w.r.t. the dynamic instances of the current leaf node and stores the result back into ``[m]``. Returns the value previously stored at ``[m]``.
+
+``void __hpvm__barrier()``:raw-html-m2r:`<br>`
+Local synchronization barrier across dynamic instances of the current leaf node.
+
+Porting a Program from C to HPVM-C
+==================================
+
+The following are the required steps to port a regular C program into an HPVM program with HPVM-C. These steps are described at a high level; for more detail, please see `hpvm-cava </hpvm/test/benchmarks/hpvm-cava>`_ provided in `benchmarks </hpvm/test/benchmarks>`_.
+
+
+* Separate the computation that will become a kernel into its own (leaf node) function and add the attributes and target hint.
+* Create a level 1 wrapper node function that will describe the thread-level parallelism (for the GPU). The node will:
+
+  * Use the ``createNode[ND]()`` method to create a kernel node and specify how many threads will execute it.
+  * Bind its arguments to the kernel arguments.
+
+* If desired, create a level 2 wrapper node function which will describe the threadblock-level parallelism (for the GPU). This node will:
+
+  * Use the ``createNode[ND]()`` method to create a level 1 wrapper node and specify how many threadblocks will execute it.
+  * Bind its arguments to its child node's arguments.
+
+* Create a root node function that creates all the top-level wrapper nodes, binds their arguments, and connects their edges.
+
+  * Each root node represents a DFG.
+ +* All the above node functions have the combined arguments of all the kernels that are nested at each level. +* The host code will have to include the following: + + * Initialize the HPVM runtime using the ``init()`` method. + * Create an argument struct for each DFG and assign its member variables. + * Add all the memory that is required by the kernel into the memory tracker. + * Launch the DFG by calling the ``launch()`` method on the root node function, and passing the corresponding argument struct. + * Wait for the DFG to complete execution. + * Read out any generated memory using the ``request_mem()`` method. + * Remove all the tracked memory from the memory tracker. diff --git a/hpvm/docs/references/hpvm-specification.rst b/hpvm/docs/references/hpvm-specification.rst new file mode 100644 index 0000000000..90226b333d --- /dev/null +++ b/hpvm/docs/references/hpvm-specification.rst @@ -0,0 +1,266 @@ +.. role:: raw-html-m2r(raw) + :format: html + +HPVM Abstraction +================ + +Table of Contents +------------------ + +* `HPVM Abstraction <#abstraction>`_ + + * `Dataflow Node <#node>`_ + * `Dataflow Edge <#edge>`_ + * `Input and Output Bind <#bind>`_ + * `Host Code <#host>`_ + +* `HPVM Implementation <#implementation>`_ + + * `Intrinsics for Describing Graphs <#describing>`_ + * `Intrinsics for Querying Graphs <#querying>`_ + * `Intrinsics for Memory Allocation and Synchronization <#memory>`_ + * `Intrinsics for Graph Interaction <#interaction>`_ + +* `Implementation Limitations <#limitations>`_ + +:raw-html-m2r:`<a name="abstraction"></a>` + +An HPVM program is a combination of host code plus a set of one or more distinct dataflow graphs. Each dataflow graph (DFG) is a hierarchical graph with side effects. The DFG must be acyclic. Nodes represent units of execution, and edges between nodes describe the explicit data transfer requirements. A node can begin execution once a data item becomes available on every one of its input edges. 
Repeated transfer of data items between nodes (if more inputs are provided) yields a pipelined execution of different nodes in the graph. The execution of a DFG is initiated and terminated by host code that launches the graph. Nodes may access globally shared memory through load and store instructions (side-effects).
+
+:raw-html-m2r:`<a name="node"></a>`
+
+Dataflow Node
+-------------
+
+A *dataflow node* represents a unit of computation in the DFG. A node can begin execution once a data item becomes available on every one of its input edges.
+
+A single static dataflow node represents multiple dynamic instances of the node, each executing the same computation with different index values used to uniquely identify each dynamic instance w.r.t. the others. The dynamic instances of a node may be executed concurrently, and any required synchronization must be imposed using HPVM synchronization operations.
+
+Each dataflow node in a DFG can either be a *leaf node* or an *internal node*. An internal node contains a complete DFG, called a *child graph*, and the child graph itself can have internal nodes and/or leaf nodes.
+
+Internal nodes only create the structure of the child graph, and cannot include actual computation.
+
+Leaf nodes contain code expressing actual computations. Leaf nodes may contain instructions to query the structure of the underlying DFG, and any non-host-side HPVM operation for synchronization and memory allocation.
+
+Note that the graph is fully interpreted at compile-time and cannot be modified at runtime except for the number of dynamic instances, which can be data dependent.
+
+:raw-html-m2r:`<a name="edge"></a>`
+
+Dataflow Edge
+-------------
+
+A *dataflow edge* from the output ``out`` of a source dataflow node ``Src`` to the input ``in`` of a sink dataflow node ``Dst`` describes the explicit data transfer requirements. The ``Src`` and ``Dst`` nodes must belong to the same child graph, i.e. must be children of the same internal node.
+
+An edge from source to sink has the semantics of copying the specified data from the source to the sink after the source node has completed execution. The pairs ``(Src, out)`` and ``(Dst, in)``, representing source and sink respectively, must be unique w.r.t. every other edge in the same child graph, i.e. two dataflow edges in the same child graph cannot have the same source or destination.
+
+A static edge also represents multiple dynamic instances of that edge between the dynamic instances of the source and the sink nodes.
+
+An edge can be instantiated at runtime using one of two replication mechanisms:
+
+
+* *All-to-all*, where all dynamic instances of the source node are connected to all dynamic instances of the sink node, thus expressing a synchronization barrier between the two groups of nodes, or
+* *One-to-one*, where each dynamic instance of the source node is connected to a single corresponding instance of the sink node. One-to-one replication requires that the grid structure (number of dimensions and the extents in each dimension) of the source and sink nodes be identical.
+
+:raw-html-m2r:`<a name="bind"></a>`
+
+Input and Output Bind
+---------------------
+
+An internal node is responsible for mapping its inputs, provided by incoming dataflow edges, to the inputs of one or more nodes of the child graph.
+
+An internal node binds its input ``ip`` to input ``ic`` of its child node ``Dst`` using an *input bind*.
+The pair ``(Dst, ic)`` must be unique, i.e. no two input binds in the same graph can have the same destination, as that would create a conflict. Semantically, these represent name bindings of input values and not data movement.
+
+Conversely, an internal node binds output ``oc`` of its child node ``Src`` to its output ``op`` using an *output bind*. The pair ``(Src, oc)`` and destination ``op`` must be unique, i.e. no two output binds in the same graph can have the same source or destination, as that would create a conflict.
+ +A bind is always **all-to-all**. + +:raw-html-m2r:`<a name="host"></a>` + +Host Code +--------- + +In an HPVM program, the host code is responsible for setting up, initiating the execution and blocking for completion of a DFG. The host can interact with the DFG to sustain a streaming computation by sending all data required for, and receiving all data produced by, one execution of the DFG. The list of actions that can be performed by the host is described below: + + +* **Initialization and Cleanup**: + All HPVM operations must be enclosed by the HPVM initialization and cleanup. These operations perform initialization and cleanup of runtime constructs that provide the runtime support for HPVM. +* **Track Memory**: + Memory objects that are passed to dataflow graphs need to be managed by the HPVM runtime. Our memory model assumes two separate address spaces for host and device memory. The HPVM runtime includes a memory tracker for tracking the location of HPVM-managed memory objects between these address spaces. Track memory inserts the specified memory object in the memory tracker and starts tracking it. +* **Untrack Memory**: + Stop tracking specified memory object and remove it from memory tracker. +* **Request Memory**: + If the specified memory object is not present in host memory, copy it to host memory. +* **Launch**: + The host code initiates execution of specified DFG, either streaming or non streaming. + + * Non streaming DFG: The host provides all data items required for execution of the DFG at the time of the launch. + * Streaming DFG: No data is provided by the launch operation. Streaming execution is sustained by push and pop operations, described below. + +* **Push**: + Push a set of data items required for one graph execution to the specified DFG. The DFG must have been launched using a streaming launch operation. This is a blocking operation. +* **Pop**: + Read data produced from one execution of the specified DFG. 
The DFG must have been launched using a streaming launch operation. This is a blocking operation.
+* **Wait**:
+  The host code blocks for completion of the specified DFG.
+
+  * For a non-streaming DFG, the data produced by the DFG are ready to be read by the host.
+  * For a streaming DFG, no more data may be provided for processing by the DFG.
+
+:raw-html-m2r:`<a name="implementation"></a>`
+
+HPVM Implementation
+===================
+
+This section describes the implementation of HPVM on top of LLVM IR.
+
+``iN`` denotes the N-bit integer type in LLVM.
+
+We use intrinsic functions to implement the HPVM IR.
+
+The code for each dataflow node is given as a separate LLVM function, called the node function. The node function may call additional, auxiliary functions. However, the auxiliary functions are not allowed to include any HPVM intrinsics, as they are not considered to be part of the HPVM node hierarchy.
+
+The incoming dataflow edges and their data types are denoted by the parameters to the node function. The outgoing dataflow edges are represented by the return type of the node function, which must be an LLVM struct type with zero or more fields (one per outgoing edge).
+
+Each top-level DFG in an HPVM program is defined by its own *root node function* which creates the underlying DFG structure. The DFG is the (internal) root node's child graph. Unlike regular internal nodes, the root node only has one dynamic instance because it instantiates the top-level DFG. The DFG is launched by the host code using the root node function, as described below.
+
+We represent nodes with opaque handles (pointers of LLVM type ``i8*``). We represent input edges of a node as integer indices into the list of function arguments, and output edges of a node as integer indices into the return struct type.
+
+Pointer arguments of node functions are required to be annotated with the attributes ``in`` and/or ``out``, depending on their expected use (read only, write only, read write).
+ +:raw-html-m2r:`<a name="describing"></a>` + +Intrinsics for Describing Graphs +-------------------------------- + +The intrinsics for describing graphs can only be used by internal nodes. Also, internal nodes are only allowed to have these intrinsics as part of their node function, with the exception of a return statement of the appropriate type, in order to return the result of the outgoing dataflow edges. + +``i8* llvm.hpvm.createNode(i8* F)``:raw-html-m2r:`<br>` +Create a static dataflow node with one dynamic instance executing node function ``F``. Return a handle to the created node. + +``i8* llvm.hpvm.createNode1D(i8* F, i64 n1)``:raw-html-m2r:`<br>` +Create a static dataflow node replicated in one dimension, namely ``x``, with ``n1`` dynamic instances executing node function ``F``. Return a handle to the created node. + +``i8* llvm.hpvm.createNode2D(i8* F, i64 n1, i64 n2)``:raw-html-m2r:`<br>` +Create a static dataflow node replicated in two dimensions, namely ``x`` and ``y``, with ``n1`` and ``n2`` dynamic instances in each dimension respectively, executing node function ``F``. Return a handle to the created node. + +``i8* llvm.hpvm.createNode3D(i8* F, i64 n1, i64 n2, i64 n3)``:raw-html-m2r:`<br>` +Create a static dataflow node replicated in three dimensions, namely ``x``, ``y`` and ``z``, with ``n1``, ``n2`` and ``n3`` dynamic instances in each dimension respectively, executing node function ``F``. Return a handle to the created node. + +``i8* llvm.hpvm.createEdge(i8* Src, i8* Dst, i1 ReplType, i32 sp, i32 dp, i1 isStream)``:raw-html-m2r:`<br>` +Create edge from output ``sp`` of node ``Src`` to input ``dp`` of node ``Dst``. Argument ``dp`` of ``Dst``'s node function and field ``sp`` of the return struct in ``Src``'s node function must have matching types. ``ReplType`` chooses between a one-to-one (0) or all-to-all (1) edge. ``isStream`` chooses a streaming (1) or non streaming (0) edge. Return a handle to the created edge. 
+ +``void llvm.hpvm.bind.input(i8* N, i32 ip, i32 ic, i1 isStream)``:raw-html-m2r:`<br>` +Bind input ``ip`` of current node to input ``ic`` of child node ``N``. Argument ``ic`` of ``N``'s node function and argument ``ip`` of the current node function must have matching types. ``isStream`` chooses a streaming (1) or non streaming (0) bind. + +``void llvm.hpvm.bind.output(i8* N, i32 oc, i32 op, i1 isStream)``:raw-html-m2r:`<br>` +Bind output ``oc`` of child node ``N`` to output ``op`` of current node. Field ``oc`` of the return struct in ``N``'s node function and field ``op`` of the return struct in the current node function must have matching types. ``isStream`` chooses a streaming (1) or non streaming (0) bind. + +:raw-html-m2r:`<a name="querying"></a>` + +Intrinsics for Querying Graphs +------------------------------ + +The following intrinsics are used to query the structure of the DFG. They can only be used by leaf nodes. + +``i8* llvm.hpvm.getNode()``:raw-html-m2r:`<br>` +Return a handle to the current leaf node. + +``i8* llvm.hpvm.getParentNode(i8* N)``:raw-html-m2r:`<br>` +Return a handle to the parent in the hierarchy of node ``N``. + +``i32 llvm.hpvm.getNumDims(i8* N)``:raw-html-m2r:`<br>` +Get the number of dimensions of node ``N``. + +``i64 llvm.hpvm.getNodeInstanceID.{x,y,z}(i8* N)``:raw-html-m2r:`<br>` +Get index of current dynamic node instance of node ``N`` in dimension x, y or z respectively. The dimension must be one of the dimensions in which the node is replicated. + +``i64 llvm.hpvm.getNumNodeInstances.{x,y,z}(i8* N)``:raw-html-m2r:`<br>` +Get number of dynamic instances of node ``N`` in dimension x, y or z respectively. The dimension must be one of the dimensions in which the node is replicated. + +:raw-html-m2r:`<a name="memory"></a>` + +Intrinsics for Memory Allocation and Synchronization +---------------------------------------------------- + +The following intrinsics are used for memory allocation and synchronization. 
They can only be used by leaf nodes.
+
+``i8* llvm.hpvm.malloc(i64 nBytes)``:raw-html-m2r:`<br>`
+Allocate a block of memory of size ``nBytes`` and return a pointer to it. The allocated object can be shared by all nodes.:raw-html-m2r:`<br>`
+*Note that the returned pointer must somehow be communicated explicitly for use by other nodes.*
+
+``i32 llvm.hpvm.atomic.add(i8* m, i32 v)``:raw-html-m2r:`<br>`
+Atomically computes the sum of ``v`` and the value stored at memory location ``[m]``, w.r.t. the dynamic instances of the current leaf node, and stores the result back into ``[m]``. Returns the value previously stored at ``[m]``.
+
+``i32 llvm.hpvm.atomic.sub(i8* m, i32 v)``:raw-html-m2r:`<br>`
+Atomically computes the difference of the value stored at memory location ``[m]`` and ``v``, w.r.t. the dynamic instances of the current leaf node, and stores the result back into ``[m]``. Returns the value previously stored at ``[m]``.
+
+``i32 llvm.hpvm.atomic.min(i8* m, i32 v)``:raw-html-m2r:`<br>`
+Atomically computes the minimum of ``v`` and the value stored at memory location ``[m]``, w.r.t. the dynamic instances of the current leaf node, and stores the result back into ``[m]``. Returns the value previously stored at ``[m]``.
+
+``i32 llvm.hpvm.atomic.max(i8* m, i32 v)``:raw-html-m2r:`<br>`
+Atomically computes the maximum of ``v`` and the value stored at memory location ``[m]``, w.r.t. the dynamic instances of the current leaf node, and stores the result back into ``[m]``. Returns the value previously stored at ``[m]``.
+
+``i32 llvm.hpvm.atomic.xchg(i8* m, i32 v)``:raw-html-m2r:`<br>`
+Atomically stores ``v`` into memory location ``[m]``, w.r.t. the dynamic instances of the current leaf node. Returns the value previously stored at ``[m]``.
+
+``i32 llvm.hpvm.atomic.and(i8* m, i32 v)``:raw-html-m2r:`<br>`
+Atomically computes the bitwise AND of ``v`` and the value stored at memory location ``[m]``, w.r.t. the dynamic instances of the current leaf node, and stores the result back into ``[m]``. Returns the value previously stored at ``[m]``.
+
+``i32 llvm.hpvm.atomic.or(i8* m, i32 v)``:raw-html-m2r:`<br>`
+Atomically computes the bitwise OR of ``v`` and the value stored at memory location ``[m]``, w.r.t. the dynamic instances of the current leaf node, and stores the result back into ``[m]``. Returns the value previously stored at ``[m]``.
+
+``i32 llvm.hpvm.atomic.xor(i8* m, i32 v)``:raw-html-m2r:`<br>`
+Atomically computes the bitwise XOR of ``v`` and the value stored at memory location ``[m]``, w.r.t. the dynamic instances of the current leaf node, and stores the result back into ``[m]``. Returns the value previously stored at ``[m]``.
+
+``void llvm.hpvm.barrier()``:raw-html-m2r:`<br>`
+Local synchronization barrier across the dynamic instances of the current leaf node.
+
+:raw-html-m2r:`<a name="interaction"></a>`
+
+Intrinsics for Graph Interaction
+--------------------------------
+
+The following intrinsics are for graph initialization/termination and interaction with the host code, and can be used only by the host code.
+
+``void llvm.hpvm.init()``:raw-html-m2r:`<br>`
+Initialize the HPVM runtime.
+
+``void llvm.hpvm.cleanup()``:raw-html-m2r:`<br>`
+Clean up objects created by the HPVM runtime.
+
+``void llvm.hpvm.trackMemory(i8* ptr, i64 sz)``:raw-html-m2r:`<br>`
+Insert the memory region starting at ``ptr`` of size ``sz`` into the memory tracker. ``ptr`` becomes the key for identifying this memory object. As soon as a memory object is inserted into the memory tracker it starts being tracked, and can be passed as a data item to a DFG.
+
+``void llvm.hpvm.untrackMemory(i8* ptr)``:raw-html-m2r:`<br>`
+Stop tracking the memory object with key ``ptr``, and remove it from the memory tracker.
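All of the ``llvm.hpvm.atomic.*`` intrinsics share one contract: each combines ``v`` with the value at ``[m]``, stores the result back, and returns the *old* value. A minimal Python model of that read-modify-write contract (illustrative only; ``atomic_rmw`` is an invented helper, not HPVM code):

```python
# Read-modify-write model of llvm.hpvm.atomic.*: store op(old, v) into
# [m] and return the previously stored value. Semantics sketch only.

def atomic_rmw(mem, addr, op, v):
    old = mem[addr]
    mem[addr] = op(old, v)
    return old

mem = {0: 0b1100}
assert atomic_rmw(mem, 0, lambda o, v: o & v, 0b1010) == 0b1100  # atomic.and
assert mem[0] == 0b1000
assert atomic_rmw(mem, 0, lambda o, v: o | v, 0b0001) == 0b1000  # atomic.or
assert mem[0] == 0b1001
assert atomic_rmw(mem, 0, lambda o, v: v, 7) == 0b1001           # atomic.xchg
assert mem[0] == 7
```

In HPVM the read-modify-write is atomic only with respect to the dynamic instances of the current leaf node; this single-threaded sketch cannot show that, but the return-old-value behavior is the same.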
+
+``void llvm.hpvm.requestMemory(i8* ptr, i64 sz)``:raw-html-m2r:`<br>`
+If the memory object with key ``ptr`` is not located in host memory, copy it to host memory.
+
+``i8* llvm.hpvm.launch(i8* RootGraph, i8* Args, i1 isStream)``:raw-html-m2r:`<br>`
+Launch the execution of a top-level DFG with root node function ``RootGraph``. ``Args`` is a pointer to a packed struct, containing one field per argument of the ``RootGraph`` function, consecutively. For non-streaming DFGs with a non-empty result type, ``Args`` must contain an additional field of the type ``RootGraph.returnTy``, where the result of the graph will be returned. ``isStream`` chooses between a non-streaming (0) or streaming (1) graph execution. Return a handle to the invoked DFG.
+
+``void llvm.hpvm.wait(i8* GraphID)``:raw-html-m2r:`<br>`
+Wait for completion of execution of the DFG with handle ``GraphID``.
+
+``void llvm.hpvm.push(i8* GraphID, i8* args)``:raw-html-m2r:`<br>`
+Push a set of input data ``args`` (of the same type as the ``Args`` passed to ``launch``) to the streaming DFG with handle ``GraphID``.
+
+``i8* llvm.hpvm.pop(i8* GraphID)``:raw-html-m2r:`<br>`
+Pop and return data from the streaming DFG with handle ``GraphID``. The return type is a struct containing a field for every output of the DFG.
+
+:raw-html-m2r:`<a name="limitations"></a>`
+
+Implementation Limitations
+--------------------------
+
+Due to limitations of our current prototype implementation, the following restrictions are imposed:
+
+* In HPVM, a memory object is represented as a (pointer, size) pair that includes the address of the memory object and the size (in bytes) of the pointed-to object. Therefore, when an edge/bind carries a pointer, it must be followed by an i64 size value.
+* Pointers cannot be transferred between nodes using dataflow edges. Instead, they should be passed using the bind operation from the (common) parent of the source and sink nodes.
+* Instantiation of dataflow nodes is supported in up to three dimensions.
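The host-side streaming interface can be pictured as a queue-driven pipeline: ``launch`` with ``isStream = 1`` starts the graph, each ``push`` feeds it one input record, and each ``pop`` retrieves one output record. A conceptual Python model follows; the ``StreamingGraph`` class is invented for illustration, and unlike the real runtime (which executes the DFG asynchronously on the chosen targets) this sketch computes each result on demand:

```python
# Conceptual model of streaming graph execution (launch/push/pop).
# Invented names; NOT the HPVM runtime API.
from collections import deque

class StreamingGraph:
    def __init__(self, root_fn):
        self.root_fn = root_fn     # stands in for the RootGraph node function
        self.inputs = deque()
    def push(self, args):          # models llvm.hpvm.push(GraphID, args)
        self.inputs.append(args)
    def pop(self):                 # models llvm.hpvm.pop(GraphID)
        return self.root_fn(self.inputs.popleft())

g = StreamingGraph(lambda x: x * 2)   # models launch(root, ..., isStream=1)
g.push(3)
g.push(4)
assert g.pop() == 6                   # outputs come back in push order
assert g.pop() == 8
```

The key property the model preserves is that every ``pop`` returns the result for the matching earlier ``push``, in order.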
diff --git a/hpvm/docs/references/index.rst b/hpvm/docs/references/index.rst
new file mode 100644
index 0000000000..e2650fb9e2
--- /dev/null
+++ b/hpvm/docs/references/index.rst
@@ -0,0 +1,11 @@
+References
+============
+
+Below are some technical details of the HPVM system and the HPVM-C language.
+
+.. toctree::
+   :maxdepth: 1
+
+   hpvm-c
+   hpvm-specification
+   compilation-process
diff --git a/hpvm/docs/tests.rst b/hpvm/docs/tests.rst
new file mode 120000
index 0000000000..729ecaa5ca
--- /dev/null
+++ b/hpvm/docs/tests.rst
@@ -0,0 +1 @@
+../test/README.rst
\ No newline at end of file
diff --git a/hpvm/docs/tradeoff-curves/alexnet2.pdf b/hpvm/docs/tradeoff-curves/alexnet2_cifar10.pdf
similarity index 100%
rename from hpvm/docs/tradeoff-curves/alexnet2.pdf
rename to hpvm/docs/tradeoff-curves/alexnet2_cifar10.pdf
diff --git a/hpvm/docs/tradeoff-curves/alexnet.pdf b/hpvm/docs/tradeoff-curves/alexnet_cifar10.pdf
similarity index 100%
rename from hpvm/docs/tradeoff-curves/alexnet.pdf
rename to hpvm/docs/tradeoff-curves/alexnet_cifar10.pdf
diff --git a/hpvm/docs/tradeoff-curves/alexnet_imagenet.pdf b/hpvm/docs/tradeoff-curves/alexnet_imagenet.pdf
index b0e0ec99473c01e4e2809fc16a298fe81e993376..b3bcc6e53b091d0c8ac317fe8f3d9cfb6a79fb80 100644
GIT binary patch
(binary deltas omitted)
diff --git a/hpvm/docs/tradeoff-curves/alexnet_imagenet_tradeoff.pdf b/hpvm/docs/tradeoff-curves/alexnet_imagenet_tradeoff.pdf
deleted file mode 100644
index b3bcc6e53b091d0c8ac317fe8f3d9cfb6a79fb80..0000000000000000000000000000000000000000
GIT binary patch
zCb!73oeagu>E^#f(6Z--vgWmb`7h1>*R`*TP^hKkdQ~vtay0()^8kdSJ-Ft8g0d=l zxf|;L-VN&cn+0ge--!o*i7SleFA>_^a*D#=*emVn<`n=!;%~tk0FSaH$G-!~gyoUT z7y%=LgG)pImdpQm>j~~jR)qQg_1e=jNtO9HoJn(DNfYr=S5K2T`Ov*SOMm!h9hI+& z{tN8=8;ap3zfwLLmfl(rX3#UN83}om^srGse(TtiYVo!L6Y0L*0J-ve_kA4wPhQ?q z_PB2%s5s{>ekgU<x2k;tTD$m}FC7o%QWv$BK+(p$WJC-$FS6y?KEAc1C?VW&zxvOA z!6qb?Wg)102LMO`6r|_o1OZ3#k1R}~7r#qkf9Z!Je<g^64E4`o@Jj`MH*YTu=yxIF z8uDoH3K@w;V$k4KFIG+(i4#L2#Ux+|b*LY}S>m4d{yrYw{vK`)K-g#*v=rRc-`__b zf%yA{j5o;z3;{^jDZml@T4LoGA17zHgT13WFqozB5-ZFnh(vUTL7pE5LhtWC5Lt2H zadn3Op+VRg`h$D@rlHVskkeR0!$1yTEe#;uwKOaqJQ!>GVL|v>OT!`8t_ynnGanL( z0g-+6SR@cAc!r_BKRhGB+wQeA6dDK6!|HwjS>xBPi$dbpp`ic*Su++zfZQ?kw{l&` zfvus*68@o~p%dJierOB`?`vtYkf&Nb9|j3Ir!_Qi;Qxn)2e-0o`w_qiWDO0A`gbfA z2SWVXezFkUhyMQV7YiYsH8dOwa=5Gefk6K+4T`R7`^iC(Va-?w&8?x~(Eu5&r9lvA zElm~*BhcSpo@LR{ab^t-0|l)$G~B=TCrkKOM{=ltbt#AXSLbqAfLPb8At(1QF9g)T zybv$|fvy<~(E2~~5%756yjJgnASb)FA9$u$K7;-wdp8dviTuMiT`y;E049MG1CYVg k+Z*zI<hGex2sqXLzGku~`IFrV3WLNDU?L)#Mq04{1M6eD1poj5 diff --git a/hpvm/docs/tradeoff-curves/index.rst b/hpvm/docs/tradeoff-curves/index.rst new file mode 100644 index 0000000000..551ac34089 --- /dev/null +++ b/hpvm/docs/tradeoff-curves/index.rst @@ -0,0 +1,6 @@ +Gallery +======= + +This gallery contains example tradeoff curves for the 10 DNN benchmarks in HPVM. + +.. 
image:: alexnet_cifar10.pdf diff --git a/hpvm/docs/tradeoff-curves/mobilenet.pdf b/hpvm/docs/tradeoff-curves/mobilenet_cifar10.pdf similarity index 100% rename from hpvm/docs/tradeoff-curves/mobilenet.pdf rename to hpvm/docs/tradeoff-curves/mobilenet_cifar10.pdf diff --git a/hpvm/docs/tradeoff-curves/resnet18.pdf b/hpvm/docs/tradeoff-curves/resnet18_cifar10.pdf similarity index 100% rename from hpvm/docs/tradeoff-curves/resnet18.pdf rename to hpvm/docs/tradeoff-curves/resnet18_cifar10.pdf diff --git a/hpvm/docs/tradeoff-curves/resnet50_imagenet.pdf b/hpvm/docs/tradeoff-curves/resnet50_imagenet.pdf index 6d3be5c8ad19198902b15ff818f2984644026ae0..67d3651bf15ab0b7c1350988846072fd712f4b4f 100644 GIT binary patch delta 2298 zcmZuv30M<%9yK22kV8};$Q6|=QYD!)nSgSL0s<n0wrEu32q0G!L{Y9GL{y-FGEyp7 zITY9xP!ugkfMUErEPNHY8X~x`93BWp(XK0OP<N}D&iBoH^Syb$_ul;8zgf^GsJcbL zjo3>XI=m;BHruT0lJVt{*B)=OOEL;y9GQfvE<94Xy^6on@OH~US)TlX%dUup&q6V& zSH-2os^hY*=7|TV#FA0*JK1~Ttf`UME1sbuc@ldfHU37|==j&0UaPI;U7A^OEbC_L z;`LWWvhhNL8i%P{t>;F^s_#GTXv-@Ow!mw_JZ9Um>0f0kYTK)@Ev@j(fIwy)Gvw~$ zEY-2SQkreHRu9~YAE6a&aXnSYl|84&*Gqz5%LfU;i&l2o&Dm=WjYduOo}C%S`2|Ii z-9oX3_r^+z@an>t$mi~fE8je6^LH1lw#nT^=Z^H&&`eL?ut!HkdZ~z+rKgo1l3!?( zf7+)gD9!VPb&%T<Bib1U-Q0!?s{dS>lNq7A=X5YT!IB<QLamoBV7Z(s2J1Cfog*gZ z>l;`7LNzPN^kevV<x!izOL756&#C8!Yxq}Om&i4VRSV2C+e;kBf4fzad*ZOu?Iyx& zfhB%#t0>t#;2|U94n}k47^FM5*sedR?Sl3dBRMeLy(pjl?q^<T_6q2rSI*8E9j;+x zYZ_4z{jJwn4znli;kFYSLnczrdOWL;HTV+-E$5?DnXanW?dn7Z*6G62mO5R=|1Ns8 zy=+sbg{s?CGlyCGsIb^#gNi?Pd!$qL?C!=7q!jf}Io17;m|Bx)cQ9L1Q;3XGxjl9K z9m4f?q00}r`+nE#6#K`a%aW_S(iFk-@LrDtX9I+MXW@lXlM!xaH;me)iwrxg)Ru10 zamf}Z_%7R;gk7iJ*`ZJU{+iUES9`Bz`J=_r$pOFH(pnshDLU56;tW?W3R|_tl-D;8 zcMY@8dSR8x^<{K(E47nLqQ*<Nl#7b@G6K#7e<QH!wdeh`wtTa?Wf4zQc}1F@@1f_y z=uS_`Hk<rcBD2Z6o_5r0Nor?Dm}G~2XJ24EJX|w<xTOZ6hq)$a2e`*2OuwG|Jz;05 zCc9mSr6$Wa1Hi|*Dnv(Znvb*}Zn%YC8YrXqbtY>UJl09KQX3SSWJDa#4W1Y_FK5S0 zM>ap4*I1-6q7^gkclb_+tgmZw$RKHObnI@Fuxy)8#N~FpF)l8#TDl9buaj<NXk<(^ 
zb2r$hr3^<6Q93s06^@;9_RG3>Wo5unbI>OJjVg|G=2%;X`EOy4k8|&i{b!yrew)|r z`7*X~YPr?P4Y5-J(S1(Z;9(?HH=#}ejZp4i7zbl}odyb$d;am(BK(`=9_OqpL*F+t z1BB=ME6VSl$+#EV&15hp1ReVrGY1QPd3I>xeYNQ$+X3@8v<NV_awyiD%cno2*t}ir zeQqd4@|3pP1(TZ(Mn-C|15NMZN2!sUt!sZ|H@xmuJK|$<GicZ>Eq+mtNr+aTK5uVF z+xbC{qd$62_6KU|0&eRDJ72EB_%~kJid^>ygj$E@yA8@;QfqTA9;;7|L34cv;OL!3 zi?k>YQgmy}Yxw#7xxVcRiIWQ*o~S?Sic1uqCE||st<RS6&%arEfB&`@c%%6pk@kCU zH}q*&cm!qN_S>P8^^IE$CDO~%=f}mD|9)ejd&TlM^0(?uF&7H6X657l??zsgh<gue z|In3Mqatlf$S>Q|rZ3uj(c~-i(ofzYMv2vC5Ar<Qn?mKb06K5k;32WNZ>I9qXxUSF zn<XPy$47gXLM!6gs0Rx82Z))ksPhr1nyMnTAf|zhOJD{{vz&W;x(PdQHDRo~mdFHO zF(C*6P{0yE%>lsNLY?Ul73deeFN(G%!Y_J%_`c}y5P!uS#DLb(_C`nVCn_Nmqt8C= z=LgaJ{Q|-i#vf_K803f}APtziXabu0hZhin5e^x{$UFoCNIVoE^H2y<;wg++=m(6b zgjp08F%16==P0v4fb|y`h<%*=EHTJIlraFrAhxnA$iYY$1aq(tJp4~V07B7^;O7_z zCYitxOya@lUtk!@QVxTF>@P5UZu<yK9!A)tJ^%#6lsi%!4d4sGIXx7H5oKc(1;`j1 z#z}v|QtUx_m<6&)D`X)U$(Y3^hf%ai@?aytr`qxVpFn_(La?$in~jn%3?$ctL7)2k ztU3%P_l}{QFHdTiVqF~Ie`)><!%>v19EO9kNL%2rlmzGWI3zs~;s78q$#POdQFS_f J^EOBI{{pcaN|gWr delta 2321 zcmZuvd0bOx7A?>g2|LIp0%3__WqB_zFE0-wB50UacC1xdLQ(=n*@;Amn6M*MD@34d z7AYvzB49xb3?f?@iV!J-;F6h8HWdjJs;mQrjx$AG-}~qLo$q}2-gD0F4!a*#-7drP zM%2Xru994{zpb_p8l9<OUwUMpV3n=wq9Y^Q=J8B7OK>dm!fL!|#xCIT<xk~2+Y5(> z^C8t<7H3~|W7O!aUxl1ky)(ic+iv!#X)Y$x>DTGTH{6c(%}!1;7N|u<%x>+Nprk;m zN{8m|r=q@~{6Tsid`RRu<4>aJDo^p!xZ5<U{UfupgI2hWG1KA;wrEa_$!aE?V|KhI zJ$R{GZqfNf#``%h!>T0xu-$wb=WdfhxF~fkIF<{q@Ds1;*pk;AQVL}eJ*(ssDRL|Y zY1D}8Dt><HDbqEKV3+4Z@Ejki#~WR4IRK9*X`d&VS$f!L-z>2!O!m9V^w01}w+(bw z+kww_)XMvN*|uMdb1sHz(Ju!f@s@<(a@~ebIr1l2rC@`WN%7sp)b=N!_M7`f`~l5R zQ*4gZ%2PqIOvF81v+|1!D?hIS-TO!RpMVora5qDgnBUj6&S?ogk~358DyO~ZYp>5s zKjYLP6nM*7@)}%{5{to;d9=@Ws+ARedB|+kh&-C)>2k;1Wz6}!*Ym)xdzHBbWvR~M zBV)O|<bTrKZvSGo*LPrh+*o&=$!pD0_X4i*?(MdxkgB184`(ZOwsvbcSh`F7f9z%A zS%(jvPCHle*(2jVHcnPx(e;pb=~=;*;13&DtSUPARaTQcVXM1z;ai`>&F$X%OKiGQ zQ|$|)mA;$wx&6JpAs)^^cLRGRrcGzL?u_G>{SNgpM?)&m-Wq#inZ2i%YV;nb$^0E9 z9W6;g#qXcLvW?IswyVzAWnXkcY2>1!675MSXz9ZWz)gQ((*jP>)kd#g<fn^Mnv${- 
zVfD+Zl_7jyd0j~-$G~Y1pmXZGNAfM6LUw=IzyS|ZUd;pOBgEhpzOpxGz$7Kty|U%e z&{T(uxgtSlvcutHjfJRYo&Dy%YOR*(?Jp+a;aepyq{MrAv))|lsCPkj9pOM1c)iJP zIq=NLISa2;{R#O(zBKfwn%!URDvs1@x`PItN$HpJuDAFPB&yJd6*6rUC`L_soyt|c z;;!(cLDH2uez1&<qxu7V=}P}hBE4a5a(FQMy_V*zwOP5wn$xf2;^Tjv5I1HPPMw?^ z8xOOsG1~Dsp*$`Trzje0F}=Y(c>ib9v{plJ(}T2sBs@&FN*>QN@q3sjFDf1UvcTG{ zZ&BUmC9n3?way-COZ0R5H(#{5^|6fhwM+VpVxD&=O*{l~{z@)xeQkD2tCB|j<Zs$& z>Sr$ew%xXOXql)@`KhNN&S%j4MqqqgY%GTx8NFDwycF7a+Q@HZc6iKA=#-zC6@pyl z5DqaN3?xa-5o4%dg;0f9nml<<)=;N2!s|_>GcR5Jv)ZIgk2Y^^@ui6|Fofn>dXFP{ zV%lT8?4URm8@qjHhT5D8m6PxI!2@;0g<2|3<v6e)lKh<KU6KT^DV+5)l&kj~aOX&T zdk91GN@8uN1V=ubA^&W^`?m7bVPIm+BjRs~i){IS4RFt0EfR2Jo2l920IzwC2Q&85 zye5N@;xuiGYXx<u?%d($-DJXdgRMoWGt%X_|Gbdeo}8)jRnAzBKl?PSzN>5U^?th5 zV7SOn*d03W`Z$Znsjj5f#;z)#1&^7%WVnwkNGkQ_#TE&*<KJrCLi)abVmCJ*)6{f) z?a0ysD>}07N6}aN2Mecv7zs~Zd(XArODL$3*A%GA<76T92Yvacd*o=KrQi+VfTQ3o zDfo@wSHRFS(ttp`wbyATJ2rxax2A<hh2jK&mcAet{B19Qz#s`au?J`E7{y|+!=v!# zE(~@=NH{wrn2C>ptRO4U0{;b@9T6=M5)BPDT}A{1;+c%kPqD%R-Vz0KL|PyO;em}0 zABM;MaR7-BiHyY%3I-z~2o^)JJQzd;-#~6MkYF0x#iO8G9|7dQz(5LdD-T3TTQOjr zM%uyyK$L>Ph!6_@!Q=g@6d=Ozf5FWd3SpW+L<qx!;J?5q@K!eg5{SHm@it5*LD<J6 z1Y<iuqENO*0l)z8PQi_OFtoli8;4CoU>Lw+2*krEAp=B=Vls$eEF_aCn8suTI}8~m zVtODD`Zjjn{}*k2mx;jEsSyOmU=$D=6NUIT&rOOc5H@!TOnG-9H~god|C{E|Vb){E xG)7@E#s!qTrQmu!biLpfAs~^0ZgPtq#Rv{zMd7GafhqzikVrTJ;gA~*_d7!DUa|lH diff --git a/hpvm/docs/tradeoff-curves/resnet50_imagenet_tradeoff.pdf b/hpvm/docs/tradeoff-curves/resnet50_imagenet_tradeoff.pdf deleted file mode 100644 index 67d3651bf15ab0b7c1350988846072fd712f4b4f..0000000000000000000000000000000000000000 GIT binary patch literal 0 HcmV?d00001 literal 14155 zcmb_@2{={X_ctM2WC|IoV@~1DxTehWOemD;ip<wFM~0B1%rZoXN>QkcmCRF=q9lz% zNhKAH{`*`h<oW(Q@AG<|ZSS+r9@buKowYt^pLIn|G_<79(l~_Z#mCTvIs_U*LEerB z5Q>Tr(!^mu(GS7^iW!8o2=*aDNOcFYgNOHiNLd*{^m2wf$}Jh78AR4LCp(ad5N=_R zmbVuf!Y;H;9LQuM$qT~6mk6W@$=k`CNQP{{IvQGFeqs<ALh5>gHPwFC>c4AU$PPl9 zyE-_#dF_W}7KU3JJ02i9ks(>i<bT;gqX1!X1FBwL-ef<p1Q@RlSRs0WE5gDA+9Yp( 
z9|+#2x;cc@AO^TO5zVy0(%{O1<lyD!<3J*MIR!87e4#&B2x&-kc5_hk4g&MSi{WHY z(sB?MhnJRt*D-;Ikr$<AM)dRcCpiHz!gBf}Mp&{-WC*jg$g`H42M`m4)bapw)F3)} zI|H#B5WV)3T_H5A9W!{KADKjS@I(aVCR%lS3COZF^jUr2XjGOQ7;&EQz4BZ`B|^Z1 z$u~ldfi}3z`m(bPe{09nv)MEKdbXalZx6@~n_nyszZ^F*(D<%DxxQ(<{`<(!$oZ`T z_2vhq>6_k!9Em@8Yhe8Kc8$-BJNA_S;)^}e+{vU^89nm)G;g)~bW3O6_(WCDWM6l7 z;eLr4bT+a(_RH-NdPeKZa=M+^Ur)V86arrw?bR7(-&s+TBC>-UZFndc%GcFTK5aHK zC3>)~Y5(V$XQ^KAJ7v!{&tGY{E^tHD^j_lTocw|&yU2QGE48ww$V%Ek-zlST!Pz(6 zjz(U>%9*yJW@8Vlp{;4RRPkfc-0?UODN}jwoYTrVX?qJ?j+!1(a4}#LfU?xNG8@h? zybw%3?#Z<;ZNF@ol&D+rrn+Go8NK8}bRCOOUQJYN%Of=Rm&V+P0nS}9@)lR<711)( zmv3$oDL(EXz1KW@Q=?^s9(p_vk>kPac=2k}42v-1HjV9sPvnznR~habXsOlF@`)Wr zSkM#;xIgG3b4*+kc5Lh|)_Q%nr8@ISkap`0?Z$_x+Xd75^()Xyk{a>Z?U1#eWxrR1 zL3dY&rDqfNFmt4JD7zY7Cw>pEsLL=qO5EwbbYhzvq(k66rqi)gF^N?VKU9cI$BG&i z<cNO1w&y?!AI9H2-R2j&8UOXpqp26kyUZujkq?gEH$I}~_AVyZ<n6_g2FFz1jv^8R zQlH_fN^LZ+!m-FSDfWTQ-wK{spVhc0!C+7+qCT%mat|rwz4+D6<QVlnyFtI0g2&U^ zwHe{@)!{1sDJ(3JxbaP94{DuvM=Giu*yv|uZ`r6F@-^U6Q{|qL7_TXhhbDfxPLa+! zk!MP_jF}xD#Nt(sMf3N`Gjb@g>!s9(*>f63$X(skwwq^D!j)miJvCh&8=o+dqnsY? 
zggVqWQ?n~@hVl#3xeIOEx@U+=-(B^@d->yL_R@o9Wvo%fq}L_77orPYrJb_&TX@Mo zRNZrG#|2G<RKcF;vWnqjIVRkC(u2ohQbazy3P;|ss)G)jv&G-*b8p(MdT)q$5Ib7^ zC8(nsC+e;rmEvR+8usP$heu&HB`mVN>@tiaIU*?Z^Gy2GzM7+ZhaY1bTFRfErFOU% z#hU+|{g^!Cv&b!po^hG`-;Ii2kPZCe+jyVqdI9s;hQKclL2Z2_LjxaP@<u!xpJ*pV zo;BX<d8zl%_0Z7ps$tthb+yBW(#(m|jb=)!M`K1wFRA;KxKB?c>o}Y!tq^p2+32Fd zqeibKik#?96u;}P^*pnE;s@2{);)u!A49HBZ<J3`3Ym5y4{5WagK+U&VYPrXM%{Df zH9F*>_S5{R2j}M`JS?If=$xo{nQ$HH6j}86;)V9C#I6H_NNMSJUVYxuzx?xWz72Ty zvuf*;olnJQAy0H>*~<_sGiTBJ)XH;AR(UUDnkFG(J-He2XTHA7vc#?Jhd23ZD%6~m zZTS3<F?jEmW|vWsqX+38Y;oH##Iwh>ue<1($>Eb$A087oaG?w&pWXYoP0sn&r=8LI z`@AAI1gtZ7HuG^)O?qi;T~r`G)BY)zY$HIofx172tL8$rbI#*T`(7|b60Q0h#FK&0 z@cP`;(BL7(lo98mS&kkb<M-Fa+oD;2S{dLc2RuBfy{%ghe5kvw%ilD7X?QBG{?fTy zPY3xn&d$ssZUmk=ePVv*wd41(PsQ~Q{aG^x;;ZS0yTfwM?(61>)-2t!9$$XS%1t1= zN~AyAwD-n=nVl#+73Z^n`ud??WuL~+PR?{oN$+Rh%Q{aD73`seu-}gKcQB(kSlAPz zu{g|PghRtIW-abnPHvfN#wXOXG@=+ZLQ87c=j0-0<IR$P4zt-v*M#<SjuRR?lv%YW zTBMxSpO*yb5(<<}aeKVwuj%;RY;eZ%aQ(zS4vU!@i*Uy2jtbs=MS8VcB1Ad!5q~($ z)e5GUhT#7SA6V=P2F>(SY*ZPBZ<<f6D?jk-c5rU?!04}GhPM|_M{;u{am#hI=rfxp zl3KS0M@`E|-kf~p_Cxte%>AgdLO&a?y79kn&&Hcx#yLv#D1Z66O|aek<x{qaS{+Uo zXABAAJ@Wmk@}a>T1HzB>XWab0ZPN{5lX^J97jSTQl4^e{U&h<fIDe_jbqt0l`Q3$h z30E=rzV^K5n7mEGssoacZhnn1SAq_9i^6`X;H2k4r98;+>xmCAC(dW?qQhKc9J5!* zVm!M|i0$TSk=por*EidayR>Bw$Hb-x&5kJL=6<TXnX5s!m+`seAjfgukS>h40*l`1 z1FvXLI$Zw}<C`s`S7Ux}%C6$l)BR!&kH;=onPFIZeSMHQJ?tW(BWMTCEpaDj-?1mO zG<>!5V7IO9sTI3-;^($UI$JkB7`HXdD%<vaYw2h_*MqtFA5Z%Fb|%S&Rji_#wMO<8 zrX!VD0h<nOsPQd^XY>icbXvbx-+IV$NdEd3R$(!pEwaUT#g+O6G%v*W6<)e^^1e{# zolAZ#qD0XIQf*Uxu_+^+stn_gP305V`L?Xu9q(C_Yxy1i<qxwIE&ZqIpm57<TPZ~$ zr5VMn9(pQ@rz2r5b#%sAJbm`plrpB%GD0_=Y8%o&tFlB9auZ{mzx({Tqr$7ehk3Vq z8n0>~IlCqL3U;(S*8EzTl~`9P-&)#?I%V_eT<)=0!_@5Y?fSgcOnJ>$<Hs|?T3BUq z=ji<t^_y5m23&U|Zu|JpD_qwe3dAe&<XF79&vSyulxys(pmAZ`YrPi-6}Ji6hN85% zIY!OzsvNJfE&I4v?aPL0zRD)m1G?i?;))ObLyV=pcy6<?I@emv8JWJFx`Vy(R*zAM z<JxoUcVP<I6|ebn<lctA*Bb0fD0dYM@30b|6t#bBE~d%hY;?`q>)1fvyMk>cezAVM 
z(3_#7v1VhARu>L2D|kyHha@}FdYCX;XASe@f;z$zvp)r=j|csv_L%WgSVizlarS=+ z9)m?K6FlBj*9!=~VUlYmlIn%@&30AesrgHr`x))+UG;Nsv-9(_2gTp&Qyp!y8BEBY z%jvY6u2<yR!dM+8Zc4j1Hgt$l+Iv&xPr06hANKAcr1BYKAfBO5A2d4bGgT%U-}2pe z$Z-?@7JInwxDK9wx6wZRJNg&cwiyqFpJmd!Qud~As!ICGl)JIq(A0VHbYm#brgB|o zqLl%+6YnRDSJJJYR7<O3@(zD2Jh-9q>p4LVyTaT^@%8jYDhij!ujR*E#hmu{&%c&f zz$dZUcwPpF*A*J&6wMv#ud$i?l^6Q+2Th5EdHyQmf?q<`;({?4+2!IgR;Z*AWS%6% zNGI_oc-r?>WmzC+z4M(bSvY>_eEg)LEw*>8gW=-P^fhkYBfN9m=F^vyrkW4GQL2e! zc=bZ<q=QH<ezGR#?90}hcCWC5x9VALIA-wBUPGmRPH)ZX4!C8_C+=%h$?x<)!^6pL z^7Y0+c7ts+f~TfXq8$QU5@D4^2?Tkg(fD!4(pKr4?;XN!m7oio1gz+c?>L)eD7_au zy_x6J(}BdNy0~L|*I|<F9Zrdi_3k#B8*tZdH$5hV7C*x~iec~1Eq6o|Ft*AoAhvVS zWxnz28d08iwX-o?MON_p)|zC6!!MJSrb&IGI+}S>iH?WC{-<PUUZ76;@J%H>EuR>{ zJv&qOaGX5SsLx!H=KVrjOI3;KCQr-@Y)c<cjB4-UGb*{CdD*gpVjpRFO$7M#HZ)@s z2<v!S_xP0D%lcyba5R@*oUPY0Z8|}%PS=9V*COB7V7j!jFY!TS@l}tid&zGF6A$a_ zii%~1n{S`LB4M+A;*lojWu4K;R72cz$(`ycjy<Q-OlMJ7kAL#5OxE_l(>at&kjff* zcXrC)Agi@lmgy^rQ{xzA@$)mbMq%%&Hs2jBEZ+4!_ve%3jE_QBhYJQ?TZx)xB@UcT zs!^Oz$e;d2bNuw~RS*CMesf)$^p;zip=rIR29Vy<jkeFICf$Scn9S_Hc&m3ZBsD<= zNQtwe++W)CnV-|!u-+I=-lcwM%TDxLtfW%s=>45zwVf6lZ*Z07oHXHaeE!pwwt<_B z6xz{6|MUcr*{^r@SrJ>Ny%^5^qPMOX&p-p#xhcNjc(ZDb#il^RjgsYDx3L{lma>)d zRgg!JDCdhdDh=8po}}CMMF|z@Nv)(U=MX7$d+Z;)<~KbSwIgkikz6>~QrGkB!uXc} zS#{*cXXomI!k8!ZFE>pThe{qSSKHe2$v`1cSNQse5>9p(Rywn+Q_1P_N{(UGD)afo z9dqA1`(}PJac#5RzKZq1ulH+_A7Qy)GfY(FXoMJO-Zuytn`qpbCx`UU@4Ik+!}Ug^ zb@AMqQJ?S!^il6>FkJ2?5+~C#8+V~>nI63>?D~b#H8>S}FSME_x^mh#x?lJE7vB)7 z3qM2FkFfR&2wAVfA$)?c77ka!B~3GZQY(sKczmNRgdN-ea;Cw^>h;g+bRYQ<#ZSx( zj2I=NPRV1L4Clk60w0U5oR177<!c?+XkmLQQP5cav9=+j#kRvMUT?Z!^Ulm-S<LP7 zJEhgyXJw9Q;dcm+L+Xb^EGK&2nGQQQPX-9SmN(~)^O2lar>ji!?~+KP^;CWOP<X<d zTao?o`l>NfRjq;7=bP_rnDQRT+f;bct1xExJ+WU;)zZxPh^poBqbJ-F+qUkiJF44M z_R&^c!Q|SVQgVZA1TXz2f_-yJ*@g$M-c-%Xte*<CTEB3eJ|_wN>K$^q^#t9ib6I?B zy^)fyd~33gu3XzXm7`@ZHr7pVe)s~(Q+_1-1Eaq@(XgQ@@6n5PwN!@pX1?n_2=RN_ zZPvW%VLZ-wW#{%3tmBz2Eq57{@2B*#Z_Hf(`3~Rrnt0wzi7jJ!H%4|g5%MR$^KjQh 
zKU+mLYem{J%YiHwQ*DAWxs0Rch~*YTjJd2wNW|Kz+&CX!t0v8(!`#wEUBw)6MKm|+ zh>aRfH7&}Tt=L7(g-utEMksSVRp%A1SF7-{R#d&hi>}H`2AicckAWn8&$_!AJ_w<_ z^;G#`%vV=gbgg(1gJ0phll4<*AO?-uyeQ#_AESy{*}F*dQ@E1`G69+Aeje5nF5!?I z-x!Sg+41qV^R?Gk1umRQTz6DxW6$XJPu7tGrzBqVT-CVxxdKCa{Cacr3zqTnT#L#Y z(>D21c`mt<@vd*B?|#%scHIz;!_avQ5;n;Ayt?L)z*%(LGBEDDM$*~T^F=QLl04?3 zQ`>HI4RF;M$$Wn!x`{q2ynxZ*qv2`)Se2J!5+NkUFVnvDGb)#>8yL@cFpPGvC2UZw zH!45mJNe_uynOSi@kh*7#{;Jo4BYEqBGH{kYa=w@ZM;9D#8WoOy!m}+mJ{h-nEJTw z<?a#HlD;2XCudnrj=CKASB?NQ{e8yO7S8IV&?pQVlETSgA+#(32ca=oEPP=1U+4=f zi$I$DJCZ49pIY7|&&AWg#pB1NM{zGPIFva`2EVK06-HKOT@|TOQrJ%zViGxKuj+h{ zih3Ucu?nYa1%hZ9^zvw(q|ea|F{{t!s%d}FUVku;T&MT`W|^R2Rm9l}mGyz*wCAtV zao(jvdg|9PDcdmiwgpYdc;PulY5gNC&%AU^GYhM2bxS8tyUO_OU@GI>?6M&wt%&zD zn>^Q(slriG$-CaK+CHE6eb%mcmwUUV(fQSe!!DmC-W|-@Y-%^+e5l{aYta1=*O$`E zS1uO6Iy~HKhdoXE?!3U%PrCDd@`qR1>6$Sddd0hjNj<wdngQSvrqSMmQ?h;C#ofO= zMRL!m$*6IpWgB#+#mRT2T$C{0tgP=C*AVy0%UMQryCyRAvgqs`#B^Bv*v8qBfs873 zT&61_oC&+Vlz;O*eVJ^Z?VNR|A2(yav3QbcEaSFR*g$vA*gv=ik=j`|G2hDKJSN1f zXSja8j5KTZ_`Q;Vcj1vV@`v)TuReA6Ni`TL9oMM4{`IK4Z;3oY!$<4D$D?G2*2)g` zGXu1EB^vjuROL@XzRJ34lX)ZFa>l>J6$_}(57x9#(!?D#`?QL9)(YF?mIqGFKc@`# zwBced`P<T~4)H^Jk9~~mLTltXFy>DjV&iD@m?dM)r`Kl{+2d=p%o0z%ExPQr&p&c3 z#QehNS^Dk90Rg^Bx4U^)<7>@;2rIM9Kn+dwy)*=why9M)KGPKur0!K8*fsuZxT4u& zYJ2RaLprRlD|c;~G<WomfBUi6FtZC2%ysUS@$kGU^E$8jI~}T5suNB*j&TgOHi;;{ z!U^+>KakvdLcy5dC8n(<@#QILUi*A*YP?PQ?MQ;BysS!H5?^&`)e~nEW}!D~x9L-* zbI;z*JVMAjXRjA1b~TCLR(YU+#&^$kM6J`YRPPpn-35&fcMN<7_fZd?>#W?}0qMpH z_J$ROrXdo~bp`L>YGO?-VE-IDaC&`uZA{_$S1CU>KU#;PyO!YpnprQhl<L;g?PtSo zQi~+onXjVEwW1(2ZaIonGOlHzS-{|OjUkdYGqy=@Di85@X_+wysZ1vmwc8Q4j_Xp7 z6F87=JLlZ^BCXfswLcv`>R|oMD`g#Pw4|O@p449E<VR|DI&*_;Ps$9O3l%T1EBJ`> z6wxbN@bBIdqN1R|wYyxm*?Pdz?!c+b>jWx$oBWk}N@-u+6?d1|YGR>wojji(`sh0o zl{a#7mA$PMKcR7iWd?yip}zsVi>c2NVvs1a@WVb^O3O=TpUODwVPs?)M&9C%PEOD_ zuz!2f>A9$dahbA+@QFHI0^4~LSz2RP17@s8zvzc9@>5KnO+&8bn<yil(bpGc_zDwT z_BP)&??vSN>V7P8fT)#4gPF9b=B%u=ryHZYsyK^%cg}X=aNMb`%M(WE!pAOJeBtxM 
zpMI?i#h{#4+1*+p2oAN}ALGsRD+SOD!?O}KDy2Q`oAS?^dw(X)dy2f>+jE*XiS49P z9CMcWuXFlS5<3ijy{Ynu&*{1;KI@gXqqRMwN6^yziMVhJ2d^>uKK1||R)ftk=J<hL zT@~7s*``fm4DYM#BW^x$4Dj4J5EF1Fvpi!m2&3X6aq+T#MXyQotpX9xL(vWU&yW(X zSmL+0x)Sat2z$Meo|vj)j1F!aPVPujZIw^kG)1d&u7jnLURYd7!7y;tZ%!h=TQ|RI zPnpvs=WZx^ojmk1sBoQ~S#tSJ!jtyXNqAIp)8^!;Q}I%U9vJJQNct%4n68V8=!X?| zspz?U3BGbBjHJ_i(K%m^Psn}#QB}CZ<3#s#a@h;MvPTu)aoe}_d5Bwn52>2WC?(I| zj?Th<<CwOb8ylZRyZRKV6iI)36CTPFr|PXyUgN?v-FTa;uO~Q+Gb{0)z}@K(+M{ag zxt_<ptS2DSV&f~q8d$%hNcxX-!pJeCH)9v?3WeQo^U0>VyrU9#>!Z`LbvQj$zIm<S z{QUV2_OgUkG`nU%x7=*8GFJe&MBz~MDO#~A8OiZ#M+JFws9JhnMKO*=>|U?JS13nc zE*7_gg{p0WMQoLY*9z*;_~p((GpQC1d>VecR2exgGrXy2zZ*Dzp69FH6>lo3xJ#nu zrCY^?N*+IO{e%kMkZ&kzDfv+BS}GrjpIcQAeKN=heW9qjAS>?vWX-wK8h@Acdv4ub zt%qh@t+1&|P53wB9}Fkv@)fU~pRhT7i772i<*M8%T@!TJCCA93+<-$uxtpFT^J-^F z2>UvEA>PjNQn}o)s%d*kH$p8N{O>MtOCi^PVk#_VnOl%DBODh&Gk5<)gxgb1ew@zR zVsQWaJXv=a9bKtMx8f+NSwl2jGHKY<{lx<<m(xujE?ww}d9O3vA%DcZL({f1qiMf} z<S4pDu_fyL@Eq@!uNQkdtLWR1L$kb=Z_hJ!tF;}&UTNNbym@;^h5e*ZX=sr7ndb8% z4__XeS8ny=iQ>K??AnO@I(|Vo=*Ah%M@lZkS8P}_+_cj-W<*c9^>&ZyPfA*~zx)af zJ>gPNJ75)2hze{AmOhU0Bx<-j?2kUv*7{}a;fZ2eyNNd98M$>|v$i+U=w<hK6hgl` zUJIVZu$7d*))TKb<qCY$X<Jp>a7q_rGpl*@_Qu}HrmW9_CovapD+Fh778_~K@rGWb zWjXco&5(Omj8-+_)(+8wG65l^kJqzzvv!`>(c!vRhcGJY*r@+7V=nt>cK`kE4^J~^ zc@ND~V_zJkUPUQuMVBjIsIW5T;Drjcj-#X`7BQxkL3yE4K9b%|$JtF?)g##xDJjg9 zlp!Fk&g6vhbo4l5@8Y=Lc{hZH>iVfja~#R5@Um9?2fnu~h1HbjFQR#)x-=DE+Pb%^ zOuSa)gvKs+*_tQ1S!h8(dT;TZ<0G%~`M80+hrcdOqOw)h=2V3grcn$RXFkf@Y5j1w zuvzw(%)kYwbjhuCA!e4Wuan{$K6073p-fw1b{lL~cj!IY)scaz=6`<8F#4iLx<y%Z zfZ}@AYiEfU7{bqcI=qpbW^UTGx2BzCXOezd;PqWM4yl==GFRElTG7_Z!wI|@p?pMz zc`^)P2So(j&K;*aAO7xXY}!7A`W4Hg*~hul-M$h9cS)s6-<;C*jOYJ2+lh$hOtBJu zRLq=g@>Ma$;^zMDh8u&q?2vvsA~iLoU6kn;bLJCifpI7IohO6?l|t-yB^>6U?%v2* z<i1<JzBxPbhhho!&KK;Ya_edVCoUb9I^RcJ@6~65JNO`%lNm=5c9!wn;i}W8wadb0 ztu3sfyHf(XJ_(5?$q@mOk93#`RowOGkl%Kvon$0W9TR$2*vO&Vlj~?F|BY3dEP3vx z%**ae$MwGH>~J@D6j-F|)+3Z!o~h|A>m8<4qcS7ib!hKxNY#9Bo7=%VrKg;>W%Cka 
z0tK>}aqR+eJ)DYhSdO$$VcWv@>5J8EOWwJGrQp!MEky~r>%FP{?&e=RO2acf<ZwyA zKBm7ldo)aCJ7<XfxA_Ni!M}dc2waP>SVc8!#dH|l^3w?^Q$h-jDns`;SBRS5z4sN} z$+PpX5na^LvFeb#l&?vfF<ZutCm%a@U3vaKYtya3TLn+M{Q0>FY+O41n=A~^*)?pr zP9GrPvTo)KUNV|@l4hq9LN>%V$e?X#%AXfc8~VWA*u@01gU7eCh0fgnVMkW8aY+Kx zlx-hNM=kpxanqTc^tOb<!iIuoIt6dU&SO=RXp1hG79|R_hQIC>KIO1W_yF^FTm9h& zew@90I0fztiiBcqtK9C+R;C^FKO1g0YFhAJP-NrqxjC%<GxzPN?34c5s@6|)d;KaK zP+Zp2V^{f;46n|~B45lYy`22|gXUbZ4fiSnTr+T#UH+UJCK+x}6J(2?%`N0X70l3_ zm~piQ2X6r%zP^b1q%Wv`wwY$L_6sfT*xret%bn;A4NMrZy@@mV36mnn8Q*Yu-m1Hz z@b0!GT3nJ-*xo(uagO%Jx#uUDRHNR0Z06adamM|mH$LA#H9Wljp56Uo|Ji%Oh}@3j zCd~XkbPw+-ZFtPl`#xLjriBD=j$9h61y6G25M#`_V#iGLU9Jf)ES26L+dK7@`qmeP z@2l*2t#A^*JSbxE0CJezxoU9-kCEAh!qZl=+&17cW3GMn)|P=&B)^%?oY5ijj33)} zznXq3i%izdtE|0Nz=K}#U0CUa>IND#IExLA*~ummpl1D~Y8QAh+ozvBviA-{0-9S* z-TT~u4LMOP?Z=As&kcT%I2y0Fp(D$FL-CpZ&HIx>9OX7W-g1S%_eI6)Z|wCcmSK0s zwCY4%qqN_eyFJeE<!qtB)4h>+V_Vd~R-1p`>LD&=<Vt0VyIjfB>*MDq=kM)$vTxs0 z=~XZsc%%4xuJd0Y0|WBFmIe{LNjD3CQuX^0+~+o~i*(MjyzW2m?s-SSfqh=hx+9Gt z_<U!t#A|(Jc>~sK(l;xFn9BS`-*#m~c{|=0WK~w&efYxOhfi#81h=gHgj*sXp8Pzj zL^@q9vQ_j$)<OT!i5teeK?itA0krJuo%W0e>bMQ9uPEjx8BxEO4{v@aHCm7|)0M~n zl|OgtgF=<pbMXTKLU&1AnEO?@_Du|&np+fuHA{S6R1F3W;5hEPFd1b~mmBR&*g$13 zw0C#J+j0$pe*f{}eD8+#SymM)^DF;q8@<#$sX6!&DY<tbvF(6^rN24IX9J0Dt_~zP z?amKE|IVV*^aA;1a8{k7A_A%5=HfzxbKXF<nhgX{8WQBdIT0bOEQItSf+RMO)&@`> zl9CwbL39D>dXy%~ZNDqHBSY!s=Ismt|A!1AlDr|Lw-*tCBnN^P*_A{DiEJ()8xH(; z3m`(^1QY2;3?O<zGN7dj$zkE#JMbYKX$=to#ox=BNb+;?CJ`Y5;J^VQ5D-Ww2<Z$V zi4YnEdhZ7n=;#J2Fv<f$g5x<auqa>zq{`tSBv=Y;8}R85PW#XhG8nQ${7z*2vvXi2 z;0Tc9M~0;XtP2Ut?SCB^f-V|FKPQr#580apfu$%ZjqpfA2QtZxk{T$D0&j>gC`SEh zfzw6<hu<r~C%8x>2TvfBzdBlS+6d=cf;}#zT`uU-%FUVV3Y52yPze}c%vxCr8dUu# ziXvbLK;aHPIz)<gz`kIHVHHYYz(Eb1Ubx>6C>d;iZTnrxD#*<Qsi1C7s$Tm&h!6^- zMLGmg7zF&`fbE!*iJq3QJpb-siO7)Fe;8Y`k3}G71IeIJ|33=i|8*WLS`OGc8cyy6 zhA9WhU}X>lJPMKpr*9ZpJcI?Gui%0Qe8TtP{#bBAOR2#F1N#40K`h?Kg1&Hp=U6Ow z08`N@SO9W3P{2e4IY?F(T!Uv44;Gh&3kVn(0v-Vhcz$HS(*z~~N87UC;ggjE41foO 
zQp5MLpn%jSSuh@+lORU`^#Tp9DVqUD-Ea*v0dEQ}GQiT{v3RiYg*iZ9ur8&I#Q{Oc zQ3_s`K&c6`fL}nDL1VxM7uLmrb@AZY!xG?eI0z4FIUp>!rW6d^Plh4{xQ5rKC=C`M z-~k8}0k4S%V<<vdWDc$=mIkkcgD+69h+wwit-&=Y6n((UEffU)x6b}dwa@?zE=+*| zI|jq>Fjs(ccsOhZi|Sgefyxkzj8JC40muTjO<^ClgFo}bz2H4jj0wyHU;mi}W(tKu zQ11PnfFMJF77J_$3pN1T!SX_pHLwfVCYBV6J-{}xSSYsg+a_RpSXyAdDA!;P*ftgm zymJ{89_SG83fsUcg&+rX0^0?MJ4?%=O)S>1P=MYMlp1a?DHL5V+=u)8DX?9@T7_%7 zKTd8bdQ=6UaA2PcM+_iDO3NBT!{`9e93V<w6KFX>K(iE@GlYTPKLL#h0VRVLVj&Ds zKKKJNB{BiEQ~K<OFq8-h2N#f0LZNv;nBOaT0_iT$ynux)%;gOSOiDix^MExkw0%Hc z`0o~+7ETG8U}KbiOJXOmbwEP=iL$^UEOa4~;CKquM7bn^r)Z(&2khy0ixNR`zqwcl z*uec#`h%eSn~#O)i~G$B9FW0ROG+PqIB<g&Vrjeuhtiar!Eh+2v?$@6g1#u}=f4Bb zqQ8c-=(T{+FU<MZp)ZR|=!Il^RWRXV4F2<T0i=@yI68nsu_|V<8;I8nq2uqb3KqM; zJ%6(RFZugp{$Jt}Q~67T_q6!g|8K;V@^tg^2Z8XnI1PaxS&-h}ffT~x$VFs;mBvF- z@V~|KKfWY_vxg<Y{eOK*^h{FSa2R6NoK@09e$ds|Bu@6b*JK&I_*qTkt70&MyK_S^ z-1Jw<2czP~IT1#EqpH!6drAFu1__(S1}e9;pEi{m?DdyB*LTOqiF~YLW6Ax&iJ+pK z*D_C1cYeFPTTpAKz=lhQL%G$(Y$ed^<DN1hhnwfw^Y-3v+*+6z?zBh!=fC(8mdc_K z)V=)ykN^tOcXNiZBIQRHX7I<<1&uBIaOAH9fuC4182oa9?B?yI0sl^9n}$3Fe3nIF zP*@E3e2Pa&q444;l(+-}sSfu8=xUp%1KG#Jo9yA{2!xH1#z;c0WU`Mu68ZOrv^Qx# z7y`hpv%eGg^~BOKKF%(Xql1$>Fqnn$5=+b{h(vTj0Dq50fPnk^55!YEct~BKKQtIF z!+((1ZyFjS2S0KvXju3GSxJ-ow;v9Gh?V{DFj!nM77x3uWi)UB!178O7R2+FGz@^m zD`;pr)W7?HWB2lY089hFyOJie3Jnbc?TUVA0_<4ezdy1<q2*u~x`IXkv1kPig9Y(y zB@K`NmqvhH=<@ln7}!a!py5}c$pNgiq8|<|`!5YXUt8Hv_8<MCaB~039)$FN=EGy* zK(L%Y5I+B<$^N^p3<?fQ%g4%~W&e?%3=Uw$mHlM@<prqnpS{WAP%Cwg0uU28_~q*Y zIPnin4*jo=<S_s05`_DIc$UKxR@yy^fclpg0#NWuUI=)Zf93<w>EHPX|ICNR5SBh~ zWRioM2a!bi(VMQ9i#I@Jz+nMUVCL-&`#DP6+${vWwEjMsav+f@t^<ul;qeGjQB7kl G#Qy<qY|G#P diff --git a/hpvm/projects/torch2hpvm/README.md b/hpvm/projects/torch2hpvm/README.md deleted file mode 100644 index 1f06142f52..0000000000 --- a/hpvm/projects/torch2hpvm/README.md +++ /dev/null @@ -1,111 +0,0 @@ -# PyTorch Frontend for HPVM - -`torch2hpvm` is a PyTorch frontend for HPVM. 
It provides a set of API that - -- Generates a PyTorch `module` into HPVM-C code; -- Exports a PyTorch dataset to ApproxHPVM dataset format; -- Compiles the generated code into binary by invoking HPVM automatically. - -## Installation - -`pip` is the recommended package manager (also available within `conda`). -Using `pip`: - -```bash -pip install -e ./ -``` - -## Getting Started - -Let's look at an example that uses DNNs and weights pre-shipped with HPVM. -This is found at `hpvm/test/dnn_benchmarks/pytorch/test_frontend.py`. -*Note* that below we'll be working under directory `hpvm/test/dnn_benchmarks/pytorch`. - -We'll be generating ResNet-18 into an HPVM-compiled binary. -First, prepare 2 datasets for autotuning and testing. - -```python -from torch2hpvm import BinDataset -from pathlib import Path - -data_dir = Path(__file__).parent / "../model_params/resnet18_cifar10" -dataset_shape = 5000, 3, 32, 32 -tuneset = BinDataset(data_dir / "tune_input.bin", data_dir / "tune_labels.bin", dataset_shape) -testset = BinDataset(data_dir / "test_input.bin", data_dir / "test_labels.bin", dataset_shape) -``` - -`BinDataset` is a dataset created over files of ApproxHPVM dataset format. -Any instance `torch.utils.data.Dataset` can be used here. - -*Note* that each `module` is bound to 2 datasets: a "tune" and a "test" set. -The generated binary accepts an argument to be either the string "tune" or "test", -and performs inference over a dataset accordingly. -This is because the dataset can contain arbitrary Python code which cannot yet be exported into HPVM-C; -instead the frontend has to export some predefined datasets for the model to use. -See TODOs (1). 
- -Create a DNN `module` and load the checkpoint: - -```python -import torch -from torch.nn import Module -import dnn # Defined at `hpvm/test/dnn_benchmarks/pytorch` - -model: Module = dnn.ResNet18() -checkpoint = Path(__file__).parent / "../model_params/resnet18_cifar10.pth.tar" -model.load_state_dict(torch.load(checkpoint)) -``` - -Any `torch.nn.Module` can be similarly used, -as long as they only contain the tensor operators supported in HPVM -(see "Supported Operators" and TODOs (2)). - -Now we are ready to export the model. The main functioning class of `torch2hpvm` is `ModelExporter`: - -```python -from torch2hpvm import ModelExporter - -output_dir = Path("./resnet18_hpvm") -build_dir = output_dir / "build" -target_binary = build_dir / "resnet18" -batch_size = 500 -conf_file = "" # TODO: points to your configuration file. -exporter = ModelExporter(model, tuneset, testset, output_dir, config_file=conf_file) -exporter.generate(batch_size=batch_size).compile(target_binary, build_dir) -``` - -`output_dir`, `build_dir`, and `target_binary` define the folder for code generation, compilation, -and path to the compiled binary respectively. -`batch_size` is the batch size the binary uses during inference. - -*Note* that `conf_file` is the path to an HPVM approximation configuration file. -This file decides what approximation the binary will use during inference. -This path is hardcoded into the binary and is only read when the binary starts, -so it's fine to have `conf_file` point to a non-existing path. -An example can be found at `test/dnn_benchmarks/hpvm-c/benchmarks/resnet18_cifar10/data/tuner_confs.txt`. 
-
-## Supported Operators
-
-Any builtin and custom PyTorch `Module` are supported
-*as long as* the generated ONNX model consists of only the following operators
-when the Module is exported into ONNX:
-
-| Convolution | Linear | Pooling | Pointwise | Other |
-|-------------|--------|-------------------|--------------------|----------|
-| Conv | MatMul | GlobalAveragePool | BatchNormalization | Flatten |
-| | Gemm | AveragePool | Relu | Softmax |
-| | | MaxPool | Tanh | Identity |
-| | | | | Pad |
-| | | | | Add |
-
-This choice of operators is largely constrained by backend (tensor_runtime) supports.
-
-## TODOs
-
-1. Optionally insert a Python-C interface in the generated binary to
-   call back into a Dataset class and read the data.
-   - Needs pybind11, hardcoding of Python environment, and some fiddling with import mechanism.
-1. Expand the list of operators supported in the frontend.
-   - Most ideally, create a high-level description of operators that can tie
-     HPVM-C intrinsics and the frontend list of operators together.
-
diff --git a/hpvm/projects/torch2hpvm/README.rst b/hpvm/projects/torch2hpvm/README.rst
new file mode 100644
index 0000000000..7d1aeae3a8
--- /dev/null
+++ b/hpvm/projects/torch2hpvm/README.rst
@@ -0,0 +1,148 @@
+
+PyTorch Frontend for HPVM
+=========================
+
+``torch2hpvm`` is a PyTorch frontend for HPVM. It provides a set of APIs that
+
+
+* generate HPVM-C code from a PyTorch ``module``;
+* export a PyTorch dataset to the ApproxHPVM dataset format;
+* compile the generated code into a binary by invoking HPVM automatically.
+
+Installation
+------------
+
+``pip`` is the recommended package manager (also available within ``conda``).
+Using ``pip``:
+
+.. code-block:: bash
+
+   pip install -e ./
+
+Getting Started
+---------------
+
+Let's look at an example that uses DNNs and weights pre-shipped with HPVM.
+This is found at ``hpvm/test/dnn_benchmarks/pytorch/test_frontend.py``.
+*Note* that below we'll be working under the directory ``hpvm/test/dnn_benchmarks/pytorch``.
+
+We'll be compiling ResNet-18 into an HPVM binary.
+First, prepare two datasets, one for autotuning and one for testing.
+
+.. code-block:: python
+
+   from torch2hpvm import BinDataset
+   from pathlib import Path
+
+   data_dir = Path(__file__).parent / "../model_params/resnet18_cifar10"
+   dataset_shape = 5000, 3, 32, 32
+   tuneset = BinDataset(data_dir / "tune_input.bin", data_dir / "tune_labels.bin", dataset_shape)
+   testset = BinDataset(data_dir / "test_input.bin", data_dir / "test_labels.bin", dataset_shape)
+
+``BinDataset`` is a dataset backed by files in the ApproxHPVM dataset format.
+Any instance of ``torch.utils.data.Dataset`` can be used here.
+
+*Note* that each ``module`` is bound to two datasets: a "tune" and a "test" set.
+The generated binary accepts a single argument, either the string "tune" or "test",
+and performs inference over the corresponding dataset.
+This is because the dataset can contain arbitrary Python code which cannot yet be exported into HPVM-C;
+instead, the frontend has to export some predefined datasets for the model to use.
+See TODOs (1).
+
+Create a DNN ``module`` and load the checkpoint:
+
+.. code-block:: python
+
+   import torch
+   from torch.nn import Module
+   import dnn  # Defined at `hpvm/test/dnn_benchmarks/pytorch`
+
+   model: Module = dnn.ResNet18()
+   checkpoint = Path(__file__).parent / "../model_params/resnet18_cifar10.pth.tar"
+   model.load_state_dict(torch.load(checkpoint))
+
+Any ``torch.nn.Module`` can be used in the same way,
+as long as it only contains the tensor operators supported in HPVM
+(see "Supported Operators" and TODOs (2)).
+
+Now we are ready to export the model. The central class of ``torch2hpvm`` is ``ModelExporter``:
+
+..
code-block:: python

+   from torch2hpvm import ModelExporter
+
+   output_dir = Path("./resnet18_hpvm")
+   build_dir = output_dir / "build"
+   target_binary = build_dir / "resnet18"
+   batch_size = 500
+   conf_file = ""  # Change this to point to your configuration file.
+   exporter = ModelExporter(model, tuneset, testset, output_dir, config_file=conf_file)
+   exporter.generate(batch_size=batch_size).compile(target_binary, build_dir)
+
+``output_dir``, ``build_dir``, and ``target_binary`` define the code generation folder, the build folder,
+and the path to the compiled binary, respectively.
+``batch_size`` is the batch size the binary uses during inference.
+
+*Note* that ``conf_file`` is the path to an HPVM approximation configuration file.
+This file decides what approximation the binary will use during inference.
+This path is hardcoded into the binary and is only read when the binary starts,
+so it's fine to have ``conf_file`` point to a nonexistent path.
+An example can be found at ``test/dnn_benchmarks/hpvm-c/benchmarks/resnet18_cifar10/data/tuner_confs.txt``.
+
+Supported Operators
+-------------------
+
+Any builtin or custom PyTorch ``Module`` is supported
+*as long as* the generated ONNX model consists of only the following operators
+when the Module is exported into ONNX:
+
+.. list-table::
+   :header-rows: 1
+
+   * - Convolution
+     - Linear
+     - Pooling
+     - Pointwise
+     - Other
+   * - Conv
+     - MatMul
+     - GlobalAveragePool
+     - BatchNormalization
+     - Flatten
+   * -
+     - Gemm
+     - AveragePool
+     - Relu
+     - Softmax
+   * -
+     -
+     - MaxPool
+     - Tanh
+     - Identity
+   * -
+     -
+     -
+     -
+     - Pad
+   * -
+     -
+     -
+     -
+     - Add
+
+
+This choice of operators is largely constrained by what the backend (tensor_runtime) supports.
+
+TODOs
+-----
+
+
+#. Optionally insert a Python-C interface in the generated binary to
+   call back into a Dataset class and read the data.
+
+   * Needs pybind11, hardcoding of the Python environment, and some fiddling with the import mechanism.
+
+#.
Expand the list of operators supported in the frontend. + + * Most ideally, create a high-level description of operators that can tie + HPVM-C intrinsics and the frontend list of operators together. diff --git a/hpvm/test/README.md b/hpvm/test/README.md deleted file mode 100644 index 18cb05b833..0000000000 --- a/hpvm/test/README.md +++ /dev/null @@ -1,91 +0,0 @@ -# HPVM Test and Benchmarks - -## Directory Organization - -This directory is organized as follows: - -* `unitTests/` and `regressionTests/`: unit and regression tests for HPVM. - These are LLVM-bitcode test cases for HPVM passes. - -* `benchmarks/`: includes a few applications written in HPVM-C, a template, and directions for compiling and running these benchmarks. - -* `dnn_benchmarks/`: ten (10) DNN benchmarks in HPVM-C, Keras and PyTorch, supported by ApproxHPVM. - This tests HPVM as well as the Keras and PyTorch frontends. - - * `dnn_benchmarks/hpvm-c` contains the HPVM-C version of these DNNs. - Their organization and usage are similar to the benchmarks under `benchmarks/`. - - Each subfolder contains a DNN with 2 versions (2 `.cpp` files): - the `tensor`-targeted version which compiles to `tensor_runtime`, - and the `cudnn`-targeted version which compiles to operators in `cuDNN` - (has `_cudnn` in name). - - * `dnn_benchmarks/keras` contains these DNNs implemented in Keras, - and code for generating them down to HPVM-C (testing Keras frontend). - * `dnn_benchmarks/pytorch` contains these DNNs in PyTorch - and code for generating them down to HPVM-C (testing PyTorch/ONNX frontend). - - The code generated from Keras and PyTorch frontend should be largely similar and functionally equivalent. - -## Running Test Cases and Benchmarks - -The easiest way to run tests is to use `make` targets, -which will also take care of all compilation of test cases and test fixtures. -The following targets runs these tests respectively: - -* `make -j check-hpvm-pass` runs tests in `hpvm_pass`: `hpvm_pass/**/*.ll`. 
- These are regression and unit tests for HPVM passes. -* `make -j check-hpvm-dnn` runs all 20 DNN benchmarks under `dnn_benchmarks/hpvm-c` - (10 DNNs x 2 versions) and validates their accuracy. - - *Note* that this can take quite long due to the size of DNNs and datasets. - Depending on your hardware capability, this test can take 5-30 minutes. - Also, this is set to run sequentially out of GPU memory concerns. - -* `make -j check-hpvm-profiler` runs `hpvm-profiler` on some smaller networks - (as it is extremely time-consuming) and presents the tradeoff curve with profiled speedup. - - *Note* that if you're on an NVIDIA Jetson TX2, you may want to run - `bash dnn_benchmarks/profiling/jetson_clocks.sh` - to ensure that the clocks are running at the maximum frequency - -Underneath, `llvm-lit` is used to discover and run the tests. - -`benchmarks/` can only be compiled in-source with `make`. -We are working to migrate it into the `cmake` system. - -## Compiling Benchmarks - -This section explains how to compile the benchmarks without running them as tests. - -### HPVM-C DNN Benchmarks - -To build (not run) all `dnn_benchmarks/hpvm-c`, use `make -j dnn_benchmarks`. -For each benchmark `${bench_name}`, the binary is generated at -`${build_dir}/tools/hpvm/test/dnn_benchmarks/hpvm-c/${bench_name}`. - -Alternatively, it's possible to build just 1 DNN benchmark. -The output of CMake shows a list of these benchmarks as target names, starting with -> List of test dnn benchmarks: alexnet2_cifar10;alexnet2_cifar10... - -Currently, there are 20 of them. 
These are:
-
-| | |
-|-------------------|-------------------------|
-| lenet_mnist | lenet_mnist_cudnn |
-| alexnet_cifar10 | alexnet_cifar10_cudnn |
-| alexnet2_cifar10 | alexnet2_cifar10_cudnn |
-| vgg16_cifar10 | vgg16_cifar10_cudnn |
-| vgg16_cifar100 | vgg16_cifar100_cudnn |
-| mobilenet_cifar10 | mobilenet_cifar10_cudnn |
-| resnet18_cifar10 | resnet18_cifar10_cudnn |
-| alexnet_imagenet | alexnet_imagenet_cudnn |
-| vgg16_imagenet | vgg16_imagenet_cudnn |
-| resnet50_imagenet | resnet50_imagenet_cudnn |
-
-`_cudnn` suffix indicates the code is generated onto cuDNN functions.
-Otherwise they are generated to `tensor_runtime` DNN functions which are hand-written in CUDA.
-
-### TODO: figure out how to
-
-1. Auto run Keras and PyTorch tests (generating, compiling and running all DNNs)
diff --git a/hpvm/test/README.rst b/hpvm/test/README.rst
new file mode 100644
index 0000000000..a2f94baaac
--- /dev/null
+++ b/hpvm/test/README.rst
@@ -0,0 +1,129 @@
+Test and Benchmarks
+========================
+
+Directory Organization
+----------------------
+
+The ``hpvm/test`` directory holds all tests and benchmarks in HPVM and is organized as follows:
+
+*
+  ``unitTests/`` and ``regressionTests/``: unit and regression tests for HPVM.
+  These are LLVM-bitcode test cases for HPVM passes.
+
+*
+  ``benchmarks/``: includes a few applications written in HPVM-C, a template, and directions for compiling and running these benchmarks.
+
+  * ``benchmarks/parboil``: Selected benchmarks from the `Parboil <http://impact.crhc.illinois.edu/parboil/parboil.aspx>`_ benchmark suite.
+  * ``benchmarks/pipeline``: An edge detection pipeline benchmark.
+  * ``benchmarks/hpvm-cava``: A camera ISP pipeline, adapted from C code provided by our collaborators at `Harvard <http://vlsiarch.eecs.harvard.edu>`_.
+
+*
+  ``dnn_benchmarks/``: ten (10) DNN benchmarks in HPVM-C, Keras, and PyTorch, supported by ApproxHPVM.
+  This tests HPVM as well as the Keras and PyTorch frontends.
+
+  *
+    ``dnn_benchmarks/hpvm-c`` contains the HPVM-C version of these DNNs.
+    Their organization and usage are similar to the benchmarks under ``benchmarks/``.
+
+    Each subfolder contains a DNN with 2 versions (2 ``.cpp`` files):
+    the ``tensor``-targeted version which compiles to ``tensor_runtime``,
+    and the ``cudnn``-targeted version which compiles to operators in ``cuDNN``
+    (has ``_cudnn`` in name).
+
+  *
+    ``dnn_benchmarks/keras`` contains these DNNs implemented in Keras,
+    and code for generating them down to HPVM-C (testing the Keras frontend).
+
+  * ``dnn_benchmarks/pytorch`` contains these DNNs in PyTorch
+    and code for generating them down to HPVM-C (testing the PyTorch/ONNX frontend).
+
+  The code generated from the Keras and PyTorch frontends should be largely similar and functionally equivalent.
+
+Running Test Cases and Benchmarks
+---------------------------------
+
+The easiest way to run tests is to use ``make`` targets,
+which will also take care of all compilation of test cases and test fixtures.
+The following targets run these tests, respectively:
+
+
+* ``make -j check-hpvm-pass`` runs tests in ``hpvm_pass``: ``hpvm_pass/**/*.ll``.
+  These are regression and unit tests for HPVM passes.
+*
+  ``make -j check-hpvm-dnn`` runs all 20 DNN benchmarks under ``dnn_benchmarks/hpvm-c``
+  (10 DNNs x 2 versions) and validates their accuracy.
+
+  *Note* that this can take quite a while due to the size of the DNNs and datasets.
+  Depending on your hardware capability, this test can take 5-30 minutes.
+  Also, this is set to run sequentially out of GPU memory concerns.
+
+*
+  ``make -j check-hpvm-profiler`` runs ``hpvm-profiler`` on some smaller networks
+  (as it is extremely time-consuming) and presents the tradeoff curve with profiled speedup.
+
+  *Note* that if you're on an NVIDIA Jetson TX2, you may want to run
+  ``bash dnn_benchmarks/profiling/jetson_clocks.sh``
+  to ensure that the clocks are running at the maximum frequency.
+
+Underneath, ``llvm-lit`` is used to discover and run the tests.
+
+``benchmarks/`` can only be compiled in-source with ``make``.
+We are working to migrate it into the ``cmake`` system.
+
+Compiling Benchmarks
+--------------------
+
+This section explains how to compile the benchmarks without running them as tests.
+
+HPVM-C DNN Benchmarks
+^^^^^^^^^^^^^^^^^^^^^
+
+To build (not run) all ``dnn_benchmarks/hpvm-c``, use ``make -j dnn_benchmarks``.
+For each benchmark ``${bench_name}``, the binary is generated at
+``${build_dir}/tools/hpvm/test/dnn_benchmarks/hpvm-c/${bench_name}``.
+
+Alternatively, it's possible to build just one DNN benchmark.
+The output of CMake shows a list of these benchmarks as target names, starting with
+
+..
+
+   List of test dnn benchmarks: alexnet2_cifar10;alexnet2_cifar10...
+
+
+Currently, there are 20 of them. These are:
+
+.. list-table::
+   :header-rows: 1
+
+   * -
+     -
+   * - lenet_mnist
+     - lenet_mnist_cudnn
+   * - alexnet_cifar10
+     - alexnet_cifar10_cudnn
+   * - alexnet2_cifar10
+     - alexnet2_cifar10_cudnn
+   * - vgg16_cifar10
+     - vgg16_cifar10_cudnn
+   * - vgg16_cifar100
+     - vgg16_cifar100_cudnn
+   * - mobilenet_cifar10
+     - mobilenet_cifar10_cudnn
+   * - resnet18_cifar10
+     - resnet18_cifar10_cudnn
+   * - alexnet_imagenet
+     - alexnet_imagenet_cudnn
+   * - vgg16_imagenet
+     - vgg16_imagenet_cudnn
+   * - resnet50_imagenet
+     - resnet50_imagenet_cudnn
+
+
+The ``_cudnn`` suffix indicates the code is generated onto cuDNN functions;
+otherwise, it is generated to ``tensor_runtime`` DNN functions, which are hand-written in CUDA.
+
+TODO: figure out how to
+^^^^^^^^^^^^^^^^^^^^^^^
+
+
+#. Automatically run Keras and PyTorch tests (generating, compiling, and running all DNNs)
-- 
GitLab
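The torch2hpvm README added in this patch notes that any instance of ``torch.utils.data.Dataset`` can stand in for ``BinDataset``. As a rough, hypothetical sketch of the interface such a dataset must provide (``__len__`` and ``__getitem__`` over (input, label) pairs), without depending on torch itself — ``ZeroDataset`` and its arguments are illustrative, not part of torch2hpvm:

```python
# Hypothetical stand-in illustrating the Dataset interface the frontend expects:
# any object with __len__ and __getitem__ returning (input, label) pairs.
class ZeroDataset:
    def __init__(self, shape, n_classes=10):
        # `shape` follows the (n_images, channels, height, width) convention
        # used by BinDataset in the README above.
        self.n_images, self.item_shape = shape[0], shape[1:]
        self.n_classes = n_classes

    def __len__(self):
        return self.n_images

    def __getitem__(self, idx):
        if not 0 <= idx < self.n_images:
            raise IndexError(idx)
        c, h, w = self.item_shape
        # An all-zero image and a dummy label; a real dataset would read
        # actual tensors and labels here.
        image = [[[0.0] * w for _ in range(h)] for _ in range(c)]
        label = idx % self.n_classes
        return image, label

ds = ZeroDataset((5000, 3, 32, 32))
print(len(ds))  # 5000
image, label = ds[7]
print(label)    # 7
```

A dataset shaped like this could be passed where the README passes ``tuneset`` and ``testset``, since the exporter only iterates it for (input, label) pairs.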