Commit 734eb22f authored by Yifan Zhao

A framework for documentation on website

This repository contains the source code and documentation for the HPVM Compiler Infrastructure.
The README briefly describes how to get started with building and installing HPVM. It also provides a
benchmark suite to test the compiler infrastructure.

HPVM is a compiler for heterogeneous parallel systems.
For more about what HPVM is, see [our website](https://publish.illinois.edu/hpvm-project/)
and the papers listed below.
HPVM is currently at **version 1.0**.

For instructions on how to build and install HPVM, see [here](/hpvm/docs/install.rst);
for how to use HPVM, see [here](/hpvm/docs/getting_started.rst).

## Papers

* [PPoPP'18 paper](https://dl.acm.org/doi/pdf/10.1145/3200691.3178493)
* [OOPSLA'19 paper](https://dl.acm.org/doi/10.1145/3360612)
* [PPoPP'21 paper](https://dl.acm.org/doi/10.1145/3437801.3446108)

## Resources

* [HPVM IR Specification](/hpvm/docs/hpvm-specification.md)
* [HPVM-C Language Specification](/hpvm/docs/hpvm-c.md)
* [HPVM Compilation Process](/hpvm/docs/compilation.md)
## Dependencies
The following components are required to be installed on your machine to build HPVM.
* GCC (>=5.1)
* In addition, each version of CUDA-nvcc requires GCC to be not newer than a certain version.
See [here](https://gist.github.com/ax3l/9489132) for the support matrix.
* CMake (>=3.17)
* GNU Make (>=3.79)
* OpenCL (>=1.0.0)
* CUDA (>=9.1)
* Python (==3.6) with pip (>=20)
  * Python must be strictly 3.6 (any subversion from 3.6.0 to 3.6.13).
Alternatively, if you use Anaconda for package management,
we provide a conda environment file that covers all Python and Python package requirements:
```bash
conda env create -n hpvm -f hpvm/env.yaml
```
## Supported Targets
Supported/tested CPU architectures:
* Intel Xeon E5-2640
* Intel Xeon W-2135
* ARM Cortex A-57
Supported/tested GPU architectures for OpenCL backend:
* Nvidia Quadro P1000
* Nvidia GeForce GTX 1080
Supported/tested GPU architectures for Tensor Backend:
* Nvidia Jetson TX2
* Nvidia GeForce GTX 1080
HPVM has not been tested on, but might work with, other CPUs supported by the LLVM backend and other GPUs supported by OpenCL (e.g., Intel, AMD).

**NOTE**: Approximations are tuned for the Jetson TX2, and the same speedups may not be achievable on other architectures.
## Getting Started
### Getting source code and setting up environment
Checkout HPVM and go to directory `./hpvm` under project root:
```shell
git clone --recursive -b approx_hpvm_reorg --single-branch https://gitlab.engr.illinois.edu/llvm/hpvm.git
cd hpvm/
```
HPVM needs to be able to find CUDA.
If CUDA is installed in your system's $PATH (e.g. if it was installed at the default location),
HPVM can find CUDA automatically.
Otherwise, some environment variables are required:
* `CUDA_TOOLKIT_PATH` --- Path to the CUDA toolkit
* `CUDA_INCLUDE_PATH` --- Path to the CUDA headers
* `CUDA_LIB_PATH` --- Path to CUDA libraries
`set_paths.sh` can be used for this.
Modify the values of these variables in `set_paths.sh` according to your system, and source the script:
```shell
source set_paths.sh
```
The HPVM installer script can be used to download, configure, and build HPVM along with LLVM and Clang.
```shell
bash install.sh
```
On launch, the installer asks whether it should also build HPVM.
If HPVM is to be built, the installer asks the number of threads to be used.
The default number of threads used to build HPVM is two (2).
If you use this automatic build, skip the next section.
* Specifically, the HPVM installer downloads LLVM and Clang, copies the HPVM source into
  llvm/tools, and builds the entire tree. It also builds a modified LLVM C-Backend,
  based on the one maintained by [Julia Computing](https://github.com/JuliaComputing/llvm-cbe),
  as a part of HPVM; this is currently used to generate OpenCL kernels for GPUs.
### Manually Build HPVM
Alternatively, you can manually build HPVM with CMake.
Please note that in this case,
the installer script still *must* be executed to obtain some required components,
but without the build step.
In the current directory (`hpvm/`), do
```shell
mkdir build
cd build
cmake ../llvm [options]
export PATH=$(realpath ./bin):$PATH
```
Some common options that can be used with CMake are:
* -DCMAKE_INSTALL_PREFIX=directory --- Specify for directory the full pathname of where you want the HPVM tools and libraries to be installed.
* -DCMAKE_BUILD_TYPE=type --- Valid options for type are Debug, Release, RelWithDebInfo, and MinSizeRel. Default is Debug.
* -DLLVM_ENABLE_ASSERTIONS=On --- Compile with assertion checks enabled (default is Yes for Debug builds, No for all other build types).
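For example, a configuration that installs into a custom prefix and builds an optimized compiler might look like the following (the install path is a placeholder; adjust it to your system):

```shell
cmake ../llvm \
  -DCMAKE_INSTALL_PREFIX=/opt/hpvm \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLVM_ENABLE_ASSERTIONS=On
```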
**Note** that if the installer script was not used,
you must _manually add the `build/bin` directory to your $PATH_ as an absolute path (as shown above).
Now, compile the HPVM Compilation Tool `approxhpvm.py` using:
```shell
make -j<number of threads> approxhpvm.py
```
With all the aforementioned steps, HPVM should be built, installed, tested and ready to use.
In particular, `approxhpvm.py` should be an executable command from your command line.
When not using the installer, you may want to run the regression tests using this script (outside of build directory):
```shell
cd ..
bash scripts/automate_tests.sh
```
## Benchmarks and Tests
We are providing the following [HPVM benchmarks](/hpvm/test/benchmarks):
* Select benchmarks from the [Parboil](http://impact.crhc.illinois.edu/parboil/parboil.aspx) benchmark suite, located under [test/benchmarks/parboil](/hpvm/test/benchmarks/parboil).
* An edge detection pipeline benchmark, located under [test/benchmarks/pipeline](/hpvm/test/benchmarks/pipeline).
* A Camera ISP pipeline, located under [test/benchmarks/hpvm-cava](/hpvm/test/benchmarks/hpvm-cava), adapted from C code provided from our collaborators at [Harvard](http://vlsiarch.eecs.harvard.edu).
Benchmark descriptions and instructions on how to compile and run them are [here](/hpvm/test/benchmarks).
We also provide [unit tests](/hpvm/test/unitTests) and [regression tests](/hpvm/test/regressionTests).

## References

Some documents on technical details and the internal working of HPVM:

* [HPVM IR Specification](/hpvm/docs/references/hpvm-specification.md)
* [HPVM-C Language Specification](/hpvm/docs/references/hpvm-c.md)
* [HPVM Compilation Process](/hpvm/docs/references/compilation-process.rst)

## Support

All questions can be directed to [hpvm-dev@lists.cs.illinois.edu](mailto:hpvm-dev@lists.cs.illinois.edu).
# Keras Frontend
Install Keras Frontend after moving to directory `/hpvm/hpvm/projects/keras`
## Requirements
* python == 3.6.x
* pip >= 18
If your system uses a different Python version, we recommend using the conda environment `keras_python36.yml`. Install this using:
```
conda env create -f keras_python36.yml --name keras_python36
```
Activate the conda environment before installing the pip package (below) using:
```
conda activate keras_python36
```
**NOTE:** This step must be performed each time (for each shell process) the frontend is to be used.
## Installing the Keras Frontend Package
At the root of this project (`/projects/keras/`) install the Keras frontend pip package as:
```
pip3 install -e ./
```
**NOTE:** If you are using the conda environment, activate it prior to this step.
## Supported Operations
The list of supported operations and limitations is documented [here](../projects/keras/docs/Support.md).
# Keras Benchmarks
Run the Keras benchmarks under `hpvm/hpvm/test/dnn_benchmarks/keras`
## Download CNN Model Files
Prior to running the benchmarks, ensure you download the CNN model data (inputs and weights) if this was not already done by the automatic build script.
```
wget https://databank.illinois.edu/datafiles/o3izd/download -O model_params.tar.gz
tar -xf model_params.tar.gz
```
Move the extracted `model_params` directory to `/test/dnn_benchmarks/model_params` (benchmarks expect the data at this location).
## Running Benchmarks
List of benchmarks and the expected accuracies:
| Benchmark | Accuracy |
| ----------- | ----------- |
| alexnet.py | 79.28 |
| alexnet2.py | 84.98 |
| alexnet_imagenet.py | 56.30 |
| lenet.py | 98.70 |
| mobilenet_cifar10.py | 84.42 |
| resnet18_cifar10.py | 89.56 |
| resnet50_imagenet.py | 75.10 |
| vgg16_cifar10.py | 89.96 |
| vgg16_cifar100.py | 66.50 |
| vgg16_imagenet.py | 69.46 |
### Synopsis
```
python3 ${BENCH_NAME}.py [hpvm_reload|keras_reload] [frontend] [compile]
```
**Command-line Parameters**
`hpvm_reload`: Reloads HPVM weights (the `.bin` binary format used by HPVM for weights, present in the `model_params` download directory) from the directory path specified in the `reload_dir` parameter set in code - this is described in "Parameters to Change in Code" (below).
`keras_reload`: Alternatively, reload weights in Keras `.h5` file format with path to file specified in `keras_model_file` described in "Parameters to Change in Code" (below).
`frontend`: Invokes the HPVM frontend and dumps weights (in HPVM `.bin` format) in the output directory specified. The parameters that control where data and source files are dumped are specified by parameters `data_dir` and `src_dir`, respectively. These are described below.
`compile`: Optional Parameter. When specified, it compiles the HPVM-C code generated by the frontend into an HPVM binary under the directory specified by `src_dir` (described below). If `src_dir` path exists, a unique directory (which appends a unique ID) is created.
The binary is built with the name `HPVM_binary`.
**NOTE:** Before running `HPVM_binary`, it is necessary to set the CUDA and cuDNN paths with:
```
source ${PATH_TO_YOUR_HPVM_ROOT}/hpvm/set_paths.sh
```
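For example, a typical first invocation that reloads the Keras weights, runs the frontend, and compiles the generated HPVM-C code is:

```
python3 alexnet.py keras_reload frontend compile
```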
**Parameters to Change in Code**
The AlexNet source is commented with explanations on how to use the Keras frontend interface. AlexNet source is [here](https://gitlab.engr.illinois.edu/llvm/hpvm/-/blob/approx_hpvm_reorg_keras/hpvm/projects/keras/src/alexnet.py).
* `NAME`: Benchmark Name - Can be set to any desired value
* `reload_dir`: Path to directory from where to reload weights in HPVM format. This directory is used to reload weights if `hpvm_reload` command-line option is used.
* `keras_model_file`: Path to the Keras .h5 model file to reload weights from. Either `reload_dir` or `keras_model_file` can be used.
`keras_model_file` is used when the `keras_reload` command-line parameter is used with the benchmark script.
* `data_dir`: Directory to dump weights, specified in the [constructor](https://gitlab.engr.illinois.edu/llvm/hpvm/-/blob/approx_hpvm_reorg_keras/hpvm/projects/keras/src/Benchmark.py#L21)
* `src_dir`: Directory to dump ApproxHPVM sources in HPVM-C (C with HPVM compiler intrinsics) specified in [constructor](https://gitlab.engr.illinois.edu/llvm/hpvm/-/blob/approx_hpvm_reorg_keras/hpvm/projects/keras/src/Benchmark.py#L22)
* `num_classes`: Number of output classes, dependent on the dataset used: 10 for CIFAR10, 100 for CIFAR100, and 1000 for ImageNet.
* `batch_size`: This parameter controls the size of each batch that is processed in HPVM. The batch size should be kept as large as the GPU memory
can support. This parameter should be adapted according to the memory size of the deployed device.
### Using the Frontend with Custom (New) Benchmarks
Any new benchmark must inherit from the common parent `Benchmark` class
and override the virtual functions for building the model, training,
and data preprocessing. These methods are described below:
`def buildModel(self)`:
Constructs and returns a Keras model.
`def data_preprocess(self)`:
Returns `X_train, y_train, X_test, y_test, X_tuner, y_tuner` data (in that order).
These are described here:
* `X_train:` Training data (fp32) in NCHW format
* `y_train:` Training labels (int32)
* `X_test:` Testing/Evaluation data in NCHW format
* `y_test:` Testing/Evaluation labels
* `X_tuner:` Data to be used for autotuning
* `y_tuner:` Labels corresponding to tuning data
`def trainModel(self, model, X_train, y_train, X_test, y_test)`:
Trains the Keras model constructed in `buildModel` and is expected to return the
trained Keras model - training parameters should be tuned here.
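To make this concrete, below is a minimal sketch of a hypothetical new benchmark. The import path of the `Benchmark` class and the dataset handling here are assumptions for illustration; see the AlexNet source linked above for the authoritative interface.

```python
import numpy as np
from keras import backend as K
from keras.datasets import cifar10
from keras.layers import Conv2D, Dense, Flatten
from keras.models import Sequential

from Benchmark import Benchmark  # assumed import path of the common parent class

K.set_image_data_format("channels_first")  # HPVM expects NCHW data


class MiniNet(Benchmark):  # hypothetical benchmark
    def buildModel(self):
        # Construct and return an (untrained) Keras model.
        return Sequential([
            Conv2D(32, (3, 3), activation="relu", input_shape=(3, 32, 32)),
            Flatten(),
            Dense(10, activation="softmax"),
        ])

    def data_preprocess(self):
        # Return fp32 NCHW data and int32 labels, in the documented order.
        (X_train, y_train), (X_test, y_test) = cifar10.load_data()
        X_train = X_train.transpose(0, 3, 1, 2).astype("float32") / 255
        X_test = X_test.transpose(0, 3, 1, 2).astype("float32") / 255
        y_train, y_test = y_train.astype("int32"), y_test.astype("int32")
        # This sketch simply reuses part of the test set for autotuning.
        X_tuner, y_tuner = X_test[:5000], y_test[:5000]
        return X_train, y_train, X_test, y_test, X_tuner, y_tuner

    def trainModel(self, model, X_train, y_train, X_test, y_test):
        # Tune training parameters (optimizer, epochs, ...) per benchmark.
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        model.fit(X_train, y_train, epochs=10,
                  validation_data=(X_test, y_test))
        return model
```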
### Directly using Keras Frontend API
As an alternative to extending the `Benchmark` class, users may directly invoke the Keras frontend API. This can be done as:
```python
from keras_frontend.approxhpvm_translator import translate_to_approxhpvm
# Construct and train your Keras Model (or load pre-trained weights)
translate_to_approxhpvm(model, data_dir, src_dir, test_data, test_labels, tune_data, tune_labels, batch_size, num_classes)
```
## Running HPVM Binary
Run the `HPVM_binary` generated under the directory specified by `src_dir` (described above). Usage:
```
./HPVM_binary -t {test|tune} -c ${config_file_path}
```
`test|tune`: Runs with either tune (autotuning data) or test set (for evaluation)
`config_file_path`: Path to an HPVM tensor configuration file (includes approximation settings)
**NOTE:** The accuracy of the benchmarks is dumped into a file named `final_accuracy` in the current working directory - this includes the accuracy averaged across batches.
## Automated Tests
`scripts/test_benchmarks.py` is an automated test script that evaluates the accuracy of each benchmark in Keras and HPVM (after compilation using the HPVM compiler) and compares the accuracy of each binary to the known correct accuracy. Run it from the root of `/test/dnn_benchmarks/keras`:
```
python test_benchmarks.py
```
# Makefile for Sphinx documentation
#
# You can set these variables from the command line.
SPHINXOPTS =
SPHINXBUILD = sphinx-build
PAPER =
# Internal variables.
PAPEROPT_a4 = -D latex_paper_size=a4
PAPEROPT_letter = -D latex_paper_size=letter
ALLSPHINXOPTS = -d build/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
.PHONY: help clean html dirhtml pickle json htmlhelp qthelp latex changes linkcheck doctest epub
help:
	@echo "Please use \`make <target>' where <target> is one of"
	@echo "  html      to make standalone HTML files"
	@echo "  dirhtml   to make HTML files named index.html in directories"
	@echo "  pickle    to make pickle files"
	@echo "  epub      to make an epub"
	@echo "  json      to make JSON files"
	@echo "  htmlhelp  to make HTML files and a HTML help project"
	@echo "  qthelp    to make HTML files and a qthelp project"
	@echo "  latex     to make LaTeX files, you can set PAPER=a4 or PAPER=letter"
	@echo "  changes   to make an overview of all changed/added/deprecated items"
	@echo "  linkcheck to check all external links for integrity"
	@echo "  doctest   to run all doctests embedded in the documentation (if enabled)"
	@echo "  gitwash   to update the gitwash documentation"

clean:
	-rm -rf build/*

dist: html
	test -d build/latex || make latex
	make -C build/latex all-pdf
	-rm -rf build/dist
	(cd build/html; cp -r . ../../build/dist)
	(cd build/dist && tar czf ../dist.tar.gz .)

html:
	$(SPHINXBUILD) -b html $(ALLSPHINXOPTS) build/html
	@echo
	@echo "Build finished. The HTML pages are in build/html."

dirhtml:
	$(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) build/dirhtml
	@echo
	@echo "Build finished. The HTML pages are in build/dirhtml."

pickle:
	$(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) build/pickle
	@echo
	@echo "Build finished; now you can process the pickle files."

json:
	$(SPHINXBUILD) -b json $(ALLSPHINXOPTS) build/json
	@echo
	@echo "Build finished; now you can process the JSON files."

htmlhelp:
	$(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) build/htmlhelp
	@echo
	@echo "Build finished; now you can run HTML Help Workshop with the" \
	      ".hhp project file in build/htmlhelp."

qthelp:
	$(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) build/qthelp
	@echo
	@echo "Build finished; now you can run "qcollectiongenerator" with the" \
	      ".qhcp project file in build/qthelp, like this:"
	@echo "# qcollectiongenerator build/qthelp/test.qhcp"
	@echo "To view the help file:"
	@echo "# assistant -collectionFile build/qthelp/test.qhc"

epub:
	$(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) build/epub
	@echo
	@echo "Build finished. The epub file is in $(BUILDDIR)/epub."

latex:
	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) build/latex
	@echo
	@echo "Build finished; the LaTeX files are in build/latex."
	@echo "Run \`make all-pdf' or \`make all-ps' in that directory to" \
	      "run these through (pdf)latex."

changes:
	$(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) build/changes
	@echo
	@echo "The overview file is in build/changes."

linkcheck:
	$(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) build/linkcheck
	@echo
	@echo "Link check complete; look for any errors in the above output " \
	      "or in build/linkcheck/output.txt."

doctest:
	$(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) build/doctest
	@echo "Testing of doctests in the sources finished, look at the " \
	      "results in build/doctest/output.txt."

latexpdf: latex
	@echo "Running LaTeX files through latexmk..."
	$(MAKE) -C build/latex all-pdf
	@echo "latexmk finished; the PDF files are in build/latex."
# Building docs
We use Sphinx for generating the API and reference documentation.
## Instructions
Install the following Python packages needed to build the documentation by entering:
```bash
pip install sphinx sphinx-autodoc-typehints sphinx-rtd-theme numpydoc
```
To build the HTML documentation, enter:
```bash
make html
```
in the `doc/` directory. This will generate a `build/html` subdirectory
containing the built documentation.
To build the PDF documentation, enter:
```bash
make latexpdf
```
You will need to have LaTeX installed for this.
# HPVM Compilation Process
Compilation of an HPVM program involves the following steps:
1. `clang` takes an HPVM-C/C++ program (e.g. `main.c`) and produces an LLVM IR (`main.ll`) file that contains the HPVM-C function calls. The declarations of these functions are defined in `test/benchmark/include/hpvm.h`, which must be included in the program.
2. `opt` takes `main.ll` and invokes the GenHPVM pass on it, which converts the HPVM-C function calls to HPVM intrinsics. This generates the HPVM textual representation (`main.hpvm.ll`).
3. `opt` takes the HPVM textual representation (`main.hpvm.ll`) and invokes the following passes in sequence:
* BuildDFG: Converts the textual representation to the internal HPVM representation.
* LocalMem and DFG2LLVM_OpenCL: Invoked only when GPU target is selected. Generates the kernel module (`main.kernels.ll`) and the portion of the host code that invokes the kernel into the host module (`main.host.ll`).
* DFG2LLVM_CPU: Generates either all, or the remainder of the host module (`main.host.ll`) depending on the chosen target.
* ClearDFG: Deletes the internal HPVM representation from memory.
4. `clang` is used to compile any remaining project files that would be later linked with the host module.
5. `llvm-link` takes the host module and all the other generated `ll` files, and links them with the HPVM runtime module (`hpvm-rt.bc`), to generate the linked host module (`main.host.linked.ll`).
6. Generate the executable code from the generated `ll` files for all parts of the program:
* GPU target: `llvm-cbe` takes the kernel module (`main.kernels.ll`) and generates an OpenCL representation of the kernels that will be invoked by the host.
* CPU target: `clang` takes the linked host module (`main.host.linked.ll`) and generates the CPU binary.
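The sketch below illustrates this sequence for a CPU-only build. The pass-library names and `opt` flags shown are illustrative assumptions (the authoritative flags are defined by the HPVM build); in normal use, this whole sequence is driven by the `approxhpvm.py` tool.

```bash
# Hypothetical manual walk-through of the steps above (CPU target only).
# Pass-library names and flags are assumptions; approxhpvm.py normally does this for you.

# 1. HPVM-C source -> LLVM IR containing HPVM-C calls (hpvm.h lives in test/benchmark/include).
clang -S -emit-llvm -I${HPVM_ROOT}/test/benchmark/include main.c -o main.ll

# 2. GenHPVM: HPVM-C calls -> HPVM intrinsics.
opt -load LLVMGenHPVM.so -genhpvm -S main.ll -o main.hpvm.ll

# 3. BuildDFG + DFG2LLVM_CPU + ClearDFG: lower the DFG and emit the host module.
opt -load LLVMBuildDFG.so -load LLVMDFG2LLVM_CPU.so -load LLVMClearDFG.so \
    -buildDFG -dfg2llvm-cpu -clearDFG -S main.hpvm.ll -o main.host.ll

# 5. Link the host module with the HPVM runtime (step 4, compiling other project
#    files, is omitted in this single-file sketch).
llvm-link main.host.ll hpvm-rt.bc -S -o main.host.linked.ll

# 6. Generate the CPU binary from the linked host module.
clang main.host.linked.ll -lpthread -o main
```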
Components
================================
HPVM consists of a few relatively independent key components.
* Patched LLVM: provides HPVM IR and a compilation infrastructure, including ``clang`` and ``opt``.
* HPVM code generator: a few ``opt`` passes that lower HPVM IR to LLVM IR,
  which is then compiled into object code and binary.

`Compilation process of HPVM <../references/hpvm-specification.html>`_
shows how these two components work together.
In addition, there are:

* Frontends (Keras/PyTorch): code generators in Python for lowering Keras and PyTorch
  DNN models into HPVM-C format.
* Predictive tuner: an autotuner library in Python for finding approximation choices (configurations)
  with the best performance gain within some loss of Quality of Service (QoS, such as accuracy).
* HPVM profiler: an API in Python for measuring the real performance of configurations.
* Tensor runtime: a backend which holds implementations of some common tensor operators
  (such as convolution) that HPVM-C functions can be converted into.

The documentation for these components is listed below;
it explains their role, usage, and other details.
.. toctree::
   :maxdepth: 1

   keras-frontend
   keras-benchmarks
   torch2hpvm
Keras Benchmarks
================
TODO: some of this belongs to `test/`.
Run the Keras benchmarks under ``hpvm/hpvm/test/dnn_benchmarks/keras``
Download CNN Model Files
------------------------
Prior to running the benchmarks, ensure you download the CNN model data (inputs and weights) if this was not already done by the automatic build script.
.. code-block::

   wget https://databank.illinois.edu/datafiles/o3izd/download -O model_params.tar.gz
   tar -xf model_params.tar.gz
Move the extracted ``model_params`` directory to ``/test/dnn_benchmarks/model_params`` (benchmarks expect the data at this location).

Running Benchmarks
------------------
List of benchmarks and the expected accuracies:
.. list-table::
   :header-rows: 1

   * - Benchmark
     - Accuracy
   * - alexnet.py
     - 79.28
   * - alexnet2.py
     - 84.98
   * - alexnet_imagenet.py
     - 56.30
   * - lenet.py
     - 98.70
   * - mobilenet_cifar10.py
     - 84.42
   * - resnet18_cifar10.py
     - 89.56
   * - resnet50_imagenet.py
     - 75.10
   * - vgg16_cifar10.py
     - 89.96
   * - vgg16_cifar100.py
     - 66.50
   * - vgg16_imagenet.py
     - 69.46
Synopsis
^^^^^^^^
.. code-block::

   python3 ${BENCH_NAME}.py [hpvm_reload|keras_reload] [frontend] [compile]
**Command-line Parameters**
``hpvm_reload`` : Reloads HPVM weights (``.bin`` binary format used by HPVM weights - present in ``model_params`` download directory) from directory path specified in the ``reload_dir`` parameter set in code - this is described in "Parameters to Change in Code" (below).
``keras_reload``: Alternatively, reload weights in Keras ``.h5`` file format with path to file specified in ``keras_model_file`` described in "Parameters to Change in Code" (below).
``frontend``: Invokes the HPVM frontend and dumps weights (in HPVM ``.bin`` format) in the output directory specified. The parameters that control where data and source files are dumped are specified by parameters ``data_dir`` and ``src_dir``, respectively. These are described below.
``compile``: Optional Parameter. When specified, it compiles the HPVM-C code generated by the frontend into an HPVM binary under the directory specified by ``src_dir`` (described below). If ``src_dir`` path exists, a unique directory (which appends a unique ID) is created.
The binary is built with the name ``HPVM_binary``.
**NOTE:** Before running ``HPVM_binary``, it is necessary to set the CUDA and cuDNN paths with:

.. code-block::

   source ${PATH_TO_YOUR_HPVM_ROOT}/hpvm/set_paths.sh
**Parameters to Change in Code**
The AlexNet source is commented with explanations on how to use the Keras frontend interface. AlexNet source is `here <https://gitlab.engr.illinois.edu/llvm/hpvm/-/blob/approx_hpvm_reorg_keras/hpvm/projects/keras/src/alexnet.py>`_.
* ``NAME``: Benchmark name - can be set to any desired value.
* ``reload_dir``: Path to the directory from which to reload weights in HPVM format. This directory is used to reload weights if the ``hpvm_reload`` command-line option is used.
* ``keras_model_file``: Path to the Keras .h5 model file to reload weights from. Either ``reload_dir`` or ``keras_model_file`` can be used.
  ``keras_model_file`` is used when the ``keras_reload`` command-line parameter is used with the benchmark script.
* ``data_dir``: Directory to dump weights, specified in the
  `constructor <https://gitlab.engr.illinois.edu/llvm/hpvm/-/blob/approx_hpvm_reorg_keras/hpvm/projects/keras/src/Benchmark.py#L21>`_.
* ``src_dir``: Directory to dump ApproxHPVM sources in HPVM-C (C with HPVM compiler intrinsics), specified in the
  `constructor <https://gitlab.engr.illinois.edu/llvm/hpvm/-/blob/approx_hpvm_reorg_keras/hpvm/projects/keras/src/Benchmark.py#L22>`_.
* ``num_classes``: Number of output classes, dependent on the dataset used: 10 for CIFAR10, 100 for CIFAR100, and 1000 for ImageNet.
* ``batch_size``: Controls the size of each batch that is processed in HPVM. The batch size should be kept as large as the GPU memory
  can support and should be adapted to the memory size of the deployed device.
Using the Frontend with Custom (New) Benchmarks
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Any new benchmark must inherit from the common parent ``Benchmark`` class
and override the virtual functions for building the model, training,
and data preprocessing. These methods are described below:
``def buildModel(self)``:
Constructs and returns a Keras model.

``def data_preprocess(self)``:
Returns ``X_train, y_train, X_test, y_test, X_tuner, y_tuner`` data (in that order).
These are described here:

* ``X_train``: Training data (fp32) in NCHW format
* ``y_train``: Training labels (int32)
* ``X_test``: Testing/Evaluation data in NCHW format
* ``y_test``: Testing/Evaluation labels
* ``X_tuner``: Data to be used for autotuning
* ``y_tuner``: Labels corresponding to tuning data
``def trainModel(self, model, X_train, y_train, X_test, y_test)``:
Trains the Keras model constructed in ``buildModel`` and is expected to return the
trained Keras model - training parameters should be tuned here.
Directly using Keras Frontend API
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
As an alternative to extending the ``Benchmark`` class, users may directly invoke the Keras frontend API. This can be done as:

.. code-block:: python

   from keras_frontend.approxhpvm_translator import translate_to_approxhpvm

   # Construct and train your Keras Model (or load pre-trained weights)

   translate_to_approxhpvm(model, data_dir, src_dir, test_data, test_labels, tune_data, tune_labels, batch_size, num_classes)
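For instance, a complete (hypothetical) call might look like the sketch below; the toy model, random data, and directory names are placeholders used only to illustrate the documented argument order:

.. code-block:: python

   import numpy as np
   from keras.models import Sequential
   from keras.layers import Dense, Flatten
   from keras_frontend.approxhpvm_translator import translate_to_approxhpvm

   # A toy model and random NCHW data, just to show the call signature.
   model = Sequential([Flatten(input_shape=(3, 32, 32)), Dense(10, activation="softmax")])
   X_test = np.random.rand(1000, 3, 32, 32).astype("float32")
   y_test = np.random.randint(0, 10, 1000).astype("int32")
   X_tune = np.random.rand(1000, 3, 32, 32).astype("float32")
   y_tune = np.random.randint(0, 10, 1000).astype("int32")

   # (model, data_dir, src_dir, test_data, test_labels, tune_data, tune_labels,
   #  batch_size, num_classes)
   translate_to_approxhpvm(model, "data/toy_model/", "src/toy_model/",
                           X_test, y_test, X_tune, y_tune, 500, 10)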
Running HPVM Binary
-------------------
Run the ``HPVM_binary`` generated under the directory specified by ``src_dir`` (described above). Usage:
.. code-block::

   ./HPVM_binary -t {test|tune} -c ${config_file_path}

``test|tune``: Runs with either tune (autotuning data) or test set (for evaluation)
``config_file_path``: Path to an HPVM tensor configuration file (includes approximation settings)

**NOTE:** The accuracy of the benchmarks is dumped into a file named ``final_accuracy`` in the current working directory - this includes the accuracy averaged across batches.
Automated Tests
---------------
``scripts/test_benchmarks.py`` is an automated test script that evaluates the accuracy of each benchmark in Keras and HPVM (after compilation using the HPVM compiler) and compares the accuracy of each binary to the known correct accuracy. Run it from the root of ``/test/dnn_benchmarks/keras``:

.. code-block::

   python test_benchmarks.py
Keras Frontend
==============
Install Keras Frontend after moving to directory ``/hpvm/hpvm/projects/keras``
Requirements
------------
* python == 3.6.x
* pip >= 18
If your system uses a different Python version, we recommend using the conda environment ``keras_python36.yml``. Install this using:
.. code-block::

   conda env create -f keras_python36.yml --name keras_python36
Activate the conda environment before installing the pip package (below) using:
.. code-block::

   conda activate keras_python36
**NOTE:** This step must be performed each time (for each shell process) the frontend is to be used.
Installing the Keras Frontend Package
-------------------------------------
At the root of this project (``/projects/keras/``) install the Keras frontend pip package as:
.. code-block::

   pip3 install -e ./
**NOTE:** If you are using the conda environment, activate it prior to this step.
Supported Operations
--------------------
The list of supported operations and limitations is documented `here <../projects/keras/docs/Support.md>`_.
TODO: move that Support.md in here as well, otherwise the link will fail when we publish to a website.
../../projects/torch2hpvm/README.rst
from datetime import date
import sphinx_rtd_theme
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
import os
import sys
sys.path.insert(0, os.path.abspath(".."))
# General configuration
# ---------------------
# Add any Sphinx extension module names here, as strings. They can be extensions
# coming with Sphinx (named 'sphinx.ext.*') or your custom ones.
extensions = [
"sphinx.ext.autosummary",
"sphinx.ext.autodoc",
"sphinx_autodoc_typehints",
"sphinx.ext.coverage",
"sphinx.ext.doctest",
"sphinx.ext.intersphinx",
"sphinx.ext.mathjax",
"sphinx.ext.todo",
"sphinx.ext.viewcode",
"numpydoc",
]
always_document_param_types = True
# generate autosummary pages
autosummary_generate = True
# Add any paths that contain templates here, relative to this directory.
templates_path = ["_templates"]
# The suffix of source filenames.
source_suffix = ".rst"
# The encoding of source files.
source_encoding = "utf-8"
# The master toctree document.
master_doc = "index"
# General substitutions.
project = "HPVM"
copyright = f"2020-{date.today().year}, University of Illinois"
# There are two options for replacing |today|: either, you set today to some
# non-false value, then it is used:
# today = ''
# Else, today_fmt is used as the format for a strftime call.
# today_fmt = '%B %d, %Y'
# List of documents that shouldn't be included in the build.
# unused_docs = ['']
# If true, '()' will be appended to :func: etc. cross-reference text.
# add_function_parentheses = True
# If true, the current module name will be prepended to all description
# unit titles (such as .. function::).
add_module_names = False
# show_authors = True
# The name of the Pygments (syntax highlighting) style to use.
# pygments_style = 'friendly'
pygments_style = "sphinx"
# A list of prefixs that are ignored when creating the module index. (new in Sphinx 0.6)
# modindex_common_prefix = ["networkx."]
# doctest_global_setup = "import networkx as nx"
# Options for HTML output
# -----------------------
html_theme = "sphinx_rtd_theme"
html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]
html_theme_options = {
"canonical_url": "https://networkx.org/documentation/stable/",
"navigation_depth": 3,
"logo_only": True,
}
# html_logo = "_static/networkx_logo.svg"
# The style sheet to use for HTML and HTML Help pages. A file of that name
# must exist either in Sphinx' static/ path, or in one of the custom paths
# given in html_static_path.
# html_style = ''
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = []
# If not '', a 'Last updated on:' timestamp is inserted at every page bottom,
# using the given strftime format.
html_last_updated_fmt = "%b %d, %Y"
# If true, SmartyPants will be used to convert quotes and dashes to
# typographically correct entities.
# html_use_smartypants = True
# Content template for the index page.
# html_index = 'index.html'
# Custom sidebar templates, maps page names to templates.
# html_sidebars = {}
# Additional templates that should be rendered to pages, maps page names to
# templates.
# html_additional_pages = {'': ''}
# If true, the reST sources are included in the HTML build as _sources/<name>.
html_copy_source = False
# Options for LaTeX output
# ------------------------
# Use a latex engine that allows for unicode characters in docstrings
latex_engine = "xelatex"
# The paper size ('letter' or 'a4').
latex_paper_size = "letter"
# The font size ('10pt', '11pt' or '12pt').
# latex_font_size = '10pt'
latex_appendices = ["tutorial"]
# Intersphinx mapping
intersphinx_mapping = {
"python": ("https://docs.python.org/3/", None),
"numpy": ("https://numpy.org/doc/stable/", None),
"matplotlib": ("https://matplotlib.org", None),
"scipy": ("https://docs.scipy.org/doc/scipy/reference", None),
"pandas": ("https://pandas.pydata.org/pandas-docs/stable", None),
"pytorch": ("https://pytorch.org/docs/stable", None),
}
# The reST default role (used for this markup: `text`) to use for all
# documents.
default_role = "obj"
numpydoc_show_class_members = False
def setup(app):
    app.add_css_file("custom.css")
    app.add_js_file("copybutton.js")
Getting Started
===============
TODO: this is the system-wide tour Sasa was suggesting. Finish this.
# HPVM-C Language Specification
An HPVM program is a combination of host code and one or more data flow graphs (DFG) at the IR level. We provide C function declarations representing the HPVM intrinsics that allow creating, querying, and interacting with the DFGs. More details about the HPVM IR intrinsics can be found in [the HPVM IR Specification](/hpvm/docs/hpvm-specification.md).
An HPVM-C program contains both the host and the DFG code. Each HPVM kernel, represented by a leaf node in the DFG, can be compiled to multiple different targets (e.g. CPU and GPU) as described below.
This document describes all the API calls that can be used in an HPVM-C program.
## Host API
```void __hpvm__init()```
Used before all other HPVM calls to initialize the HPVM runtime.
```void __hpvm__cleanup()```
Used at the end of HPVM program to clean up all remaining runtime-created HPVM objects.
```void llvm_hpvm_track_mem(void* ptr, size_t sz)```
Insert memory starting at ```ptr``` of size ```sz``` in the memory tracker of HPVM runtime.
```void llvm_hpvm_untrack_mem(void* ptr)```
Stop tracking the memory object identified by ```ptr```.
```void llvm_hpvm_request_mem(void* ptr, size_t sz)```
If the memory object identified by ```ptr``` is not in host memory, copy it to host memory.
```void* __hpvm__launch(unsigned isStream, void* rootGraph, void* args)```
Launches the execution of the dataflow graph with node function ```rootGraph```. ```args``` is a pointer to a packed struct, containing one field per argument of the RootGraph function, consecutively. For non-streaming DFGs with a non empty result type, ```args``` must contain an additional field of the type ```RootGraph.returnTy```, where the result of the graph will be returned. ```isStream``` chooses between a non streaming (0) or streaming (1) graph execution. Returns a handle to the executing graph.
```void __hpvm__wait(void* G)```
Waits for completion of execution of the dataflow graph with handle ```G```.
```void __hpvm__push(void* G, void* args)```
Push set of input data items, ```args```, (same as type included in launch) to streaming DFG with handle ```G```.
```void* __hpvm__pop(void* G)```
Pop and return data produced from one execution of streaming DFG with handle ```G```. The return type is a struct containing a field for every output of DFG.
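As a short illustration, the sketch below shows host code that launches a non-streaming DFG rooted at a hypothetical node function `RootFunc`; the argument struct simply packs the root function's parameters in order. The struct and function names here are placeholders, not part of the API.

```c
#include "hpvm.h"   /* HPVM-C declarations (under test/benchmark/include) */

/* Hypothetical root node function, defined elsewhere with HPVM-C node calls. */
void RootFunc(float *In, size_t InSize, float *Out, size_t OutSize);

typedef struct {
    float *In;  size_t InSize;
    float *Out; size_t OutSize;
    /* For a non-streaming DFG with a non-empty result type, a field of type
       RootFunc.returnTy would follow here. */
} RootIn;

int main() {
    static float In[1024], Out[1024];

    __hpvm__init();                                   /* initialize the HPVM runtime */
    llvm_hpvm_track_mem(In, sizeof(In));              /* register memory with the tracker */
    llvm_hpvm_track_mem(Out, sizeof(Out));

    RootIn args = { In, sizeof(In), Out, sizeof(Out) };
    void *dfg = __hpvm__launch(0, RootFunc, (void *)&args);  /* 0 = non-streaming */
    __hpvm__wait(dfg);                                /* block until the DFG finishes */

    llvm_hpvm_request_mem(Out, sizeof(Out));          /* bring results back to host memory */
    llvm_hpvm_untrack_mem(In);
    llvm_hpvm_untrack_mem(Out);
    __hpvm__cleanup();
    return 0;
}
```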
## Internal Node API
```void* __hpvm__createNodeND(unsigned dims, void* F, ...)```
Creates a static dataflow node replicated in ```dims``` dimensions (0 to 3), each executing node function ```F```. The arguments following ```F``` are the size of each dimension, respectively, passed in as a ```size_t```. Returns a handle to the created dataflow node.
```void* __hpvm__edge(void* src, void* dst, unsigned replType, unsigned sp, unsigned dp, unsigned isStream)```
Creates an edge from output ```sp``` of node ```src``` to input ```dp``` of node ```dst```. If ```replType``` is 0, the edge is a one-to-one edge, otherwise it is an all-to-all edge. ```isStream``` defines whether or not the edge is streaming. Returns a handle to the created edge.
```void __hpvm__bindIn(void* N, unsigned ip, unsigned ic, unsigned isStream)```
Binds the input ```ip``` of the current node to input ```ic``` of child node function ```N```. ```isStream``` defines whether or not the input bind is streaming.
```void __hpvm__bindOut(void* N, unsigned op, unsigned oc, unsigned isStream)```
Binds the output ```op``` of the current node to output ```oc``` of child node function ```N```. ```isStream``` defines whether or not the output bind is streaming.
```void __hpvm__hint(enum Target target)``` (C\)
```void __hpvm__hint(hpvm::Target target)``` (C++)
Must be called once in each node function. Indicates which hardware target the current function should run in.
```void __hpvm__attributes(unsigned ni, …, unsigned no, …)```
Must be called once at the beginning of each node function. Defines the properties of the pointer arguments to the current function. ```ni``` represents the number of input arguments, and ```no``` the number of output arguments. The arguments following ```ni``` are the input arguments, and the arguments following ```no``` are the output arguments. Arguments can be marked as both input and output. All pointer arguments must be included.
## Leaf Node API
```void __hpvm__hint(enum Target target)``` (C\)
```void __hpvm__hint(hpvm::Target target)``` (C++)
As described in internal node API.
```void __hpvm__attributes(unsigned ni, …, unsigned no, …)```
As described in internal node API.
```void __hpvm__return(unsigned n, ...)```
Returns ```n``` values from a leaf node function. The remaining arguments are the values to be returned. All ```__hpvm__return``` statements within the same function must return the same number of values.
```void* __hpvm__getNode()```
Returns a handle to the current leaf node.
```void* __hpvm__getParentNode(void* N)```
Returns a handle to the parent node of node ```N```.
```long __hpvm__getNodeInstanceID_{x,y,z}(void* N)```
Returns the dynamic ID of the current instance of node ```N``` in the x, y, or z dimension respectively. The dimension must be one of the dimensions in which the node is replicated.
```long __hpvm__getNumNodeInstances_{x,y,z}(void* N)```
Returns the number of dynamic instances of node ```N``` in the x, y, or z dimension respectively. The dimension must be one of the dimensions in which the node is replicated.
```void* __hpvm__malloc(long nBytes)```
Allocate a block of memory of size ```nBytes``` and returns a pointer to it. The allocated object can be shared by all nodes. *Note that the returned pointer must somehow be communicated explicitly for use by other nodes.*
```int __hpvm__atomic_add(int* m, int v)```
Atomically adds ```v``` to the value stored at memory location ```[m]``` w.r.t. the dynamic instances of the current leaf node and stores the result back into ```[m]```. Returns the value previously stored at ```[m]```.
```int __hpvm__atomic_sub(int* m, int v)```
Atomically subtracts ```v``` from the value stored at memory location ```[m]``` w.r.t. the dynamic instances of the current leaf node and stores the result back into ```[m]```. Returns the value previously stored at ```[m]```.
```int __hpvm__atomic_min(int* m, int v)```
Atomically computes the min of ```v``` and the value stored at memory location ```[m]``` w.r.t. the dynamic instances of the current leaf node and stores the result back into ```[m]```. Returns the value previously stored at ```[m]```.
```int __hpvm__atomic_max(int* m, int v)```
Atomically computes the max of ```v``` and the value stored at memory location ```[m]``` w.r.t. the dynamic instances of the current leaf node and stores the result back into ```[m]```. Returns the value previously stored at ```[m]```.
```int __hpvm__atomic_xchg(int* m, int v)```
Atomically swaps ```v``` with the value stored at memory location ```[m]``` w.r.t. the dynamic instances of the current leaf node and stores the result back into ```[m]```. Returns the value previously stored at ```[m]```.
```int __hpvm__atomic_and(int* m, int v)```
Atomically computes the bitwise AND of ```v``` and the value stored at memory location ```[m]``` w.r.t. the dynamic instances of the current leaf node and stores the result back into ```[m]```. Returns the value previously stored at ```[m]```.
```int __hpvm__atomic_or(int* m, int v)```
Atomically computes the bitwise OR of ```v``` and the value stored at memory location ```[m]``` w.r.t. the dynamic instances of the current leaf node and stores the result back into ```[m]```. Returns the value previously stored at ```[m]```.
```int __hpvm__atomic_xor(int* m, int v)```
Atomically computes the bitwise XOR of ```v``` and the value stored at memory location ```[m]``` w.r.t. the dynamic instances of the current leaf node and stores the result back into ```[m]```. Returns the value previously stored at ```[m]```.
```void __hpvm__barrier()```
Local synchronization barrier across dynamic instances of current leaf node.
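As an example, a leaf node that performs an element-wise vector addition could use these intrinsics as sketched below. The target enumerator name `GPU_TARGET` is assumed here (it comes from the `hpvm.h` header shipped with the benchmarks, not from this document).

```c
#include "hpvm.h"

/* Sketch of a leaf node: each dynamic instance adds one element. */
void VecAddLeaf(float *A, size_t ASize, float *B, size_t BSize,
                float *C, size_t CSize) {
    __hpvm__hint(GPU_TARGET);              /* assumed enumerator name for the GPU target */
    __hpvm__attributes(3, A, B, C, 1, C);  /* 3 pointer inputs (A, B, C); 1 output (C) */

    void *self = __hpvm__getNode();
    long i = __hpvm__getNodeInstanceID_x(self);  /* index of this dynamic instance */
    C[i] = A[i] + B[i];
}
```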
# Porting a Program from C to HPVM-C
The following represents the required steps to port a regular C program into an HPVM program with HPVM-C. These steps are described at a high level; for more detail, please see [hpvm-cava](/hpvm/test/benchmarks/hpvm-cava) provided in [benchmarks](/hpvm/test/benchmarks).
* Separate the computation that will become a kernel into its own (leaf node) function and add the attributes and target hint.
* Create a level 1 wrapper node function that will describe the thread-level parallelism (for the GPU). The node will:
* Use the ```createNode[ND]()``` method to create a kernel node and specify how many threads will execute it.
* Bind its arguments to the kernel arguments.
* If desired, create a level 2 wrapper node function which will describe the threadblock-level parallelism (for the GPU). This node will:
* Use the ```createNode[ND]()``` method to create a level 1 wrapper node and specify how many threadblocks will execute it.
* Bind its arguments to its child node's arguments.
* Create a root node function that creates all the top-level wrapper nodes, binds their arguments, and connects their edges.
* Each root node represents a DFG.
* All the above node functions have the combined arguments of all the kernels that are nested at each level.
* The host code will have to include the following:
* Initialize the HPVM runtime using the ```init()``` method.
* Create an argument struct for each DFG and assign its member variables.
* Add all the memory that is required by the kernel into the memory tracker.
* Launch the DFG by calling the ```launch()``` method on the root node function, and passing the corresponding argument struct.
* Wait for the DFG to complete execution.
* Read out any generated memory using the ```request_mem()``` method.
* Remove all the tracked memory from the memory tracker.
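Continuing the vector-addition sketch from the HPVM-C specification above, hypothetical level 1 wrapper and root node functions following these steps might look like this (the `CPU_TARGET` enumerator name is assumed from the benchmark headers):

```c
#include "hpvm.h"

/* VecAddLeaf is the leaf node sketched in the HPVM-C specification above. */
void VecAddLeaf(float *A, size_t ASize, float *B, size_t BSize,
                float *C, size_t CSize);

/* Level 1 wrapper: creates N dynamic instances of the leaf kernel
   (thread-level parallelism for the GPU). */
void VecAddWrapper(float *A, size_t ASize, float *B, size_t BSize,
                   float *C, size_t CSize, long N) {
    __hpvm__hint(CPU_TARGET);              /* assumed enumerator name */
    __hpvm__attributes(3, A, B, C, 1, C);

    void *kernel = __hpvm__createNodeND(1, VecAddLeaf, (size_t)N);
    /* Bind this node's inputs 0..5 to the kernel's inputs 0..5 (non-streaming). */
    for (unsigned i = 0; i < 6; i++)
        __hpvm__bindIn(kernel, i, i, 0);
}

/* Root node: creates the top-level wrapper node and binds all arguments through. */
void VecAddRoot(float *A, size_t ASize, float *B, size_t BSize,
                float *C, size_t CSize, long N) {
    __hpvm__hint(CPU_TARGET);
    __hpvm__attributes(3, A, B, C, 1, C);

    void *wrapper = __hpvm__createNodeND(0, VecAddWrapper);
    for (unsigned i = 0; i < 7; i++)
        __hpvm__bindIn(wrapper, i, i, 0);
}
```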
.. _contents:
The HPVM Compiler Infrastructure
================================
HPVM is a compiler for heterogeneous parallel systems.
For more about what HPVM is, see `our website <https://publish.illinois.edu/hpvm-project/>`_
and publications:
`PPoPP'18 paper <https://dl.acm.org/doi/pdf/10.1145/3200691.3178493>`_,
`OOPSLA'19 paper <https://dl.acm.org/doi/10.1145/3360612>`_,
`PPoPP'21 paper <https://dl.acm.org/doi/10.1145/3437801.3446108>`_.
This is the documentation of HPVM at **version 1.0**.
Audience
--------
TODO: write something here.
Documentation
-------------
.. toctree::
   :maxdepth: 1

   install
   getting-started
   tests
   components/index
   references/index
Indices and tables
------------------
* :ref:`genindex`
Support
-------
All questions can be directed to `hpvm-dev@lists.cs.illinois.edu <mailto:hpvm-dev@lists.cs.illinois.edu>`_.
Install
===============
Dependencies
------------
The following components are required to be installed on your machine to build HPVM.
* GCC (>=5.1)
* In addition, each version of CUDA-nvcc requires GCC to be not newer than a certain version.
See `here <https://gist.github.com/ax3l/9489132>`_ for the support matrix.
* CMake (>=3.17)
* GNU Make (>=3.79)
* OpenCL (>=1.0.0)
* CUDA (>=9.1)
* Python (==3.6) with pip (>=20)
Python must be strictly 3.6 (any subversion from 3.6.0 to 3.6.13).
Alternatively, if you use Anaconda for package management,
we provide a conda environment file that covers all Python and package requirements:
.. code-block:: bash

   conda env create -n hpvm -f hpvm/env.yaml
Supported Architectures
-----------------------
Supported/tested CPU architectures:
* Intel Xeon E5-2640
* Intel Xeon W-2135
* ARM Cortex A-57
Supported/tested GPU architectures for OpenCL backend:
* Nvidia Quadro P1000
* Nvidia GeForce GTX 1080
Supported/tested GPU architectures for Tensor Backend:
* Nvidia Jetson TX2
* Nvidia GeForce GTX 1080
HPVM has not been tested on, but might work with, other CPUs supported by the LLVM backend
and other GPUs supported by OpenCL (e.g., Intel, AMD).

**NOTE**: Approximations are tuned for the Jetson TX2, and the same speedups may not be achievable on other architectures.
Installing from Source
----------------------
Checkout HPVM and go to directory ``./hpvm`` under project root:
.. code-block:: shell

   git clone --recursive -b approx_hpvm_reorg --single-branch https://gitlab.engr.illinois.edu/llvm/hpvm.git
   cd hpvm/
HPVM needs to be able to find CUDA.
If CUDA is installed in your system's $PATH (e.g. if it was installed at the default location),
HPVM can find CUDA automatically.
Use HPVM installer script to download, configure and build HPVM along with LLVM and Clang:
.. code-block:: shell

   ./install.sh
* Without arguments, this script will interactively prompt you for some parameters.
  Alternatively, use ``./install.sh -h`` for a list of available arguments
  and pass arguments as required.

After configuring HPVM,
the installer will also compile HPVM by default, which you can opt out of.
If you do so, follow the next section "Manually Build HPVM" to manually compile HPVM,
and "Benchmarks and Tests" to manually run test cases if you wish.
Otherwise, you can skip the next two sections.
* Specifically, the HPVM installer downloads LLVM and Clang, copies the HPVM source into
  llvm/tools, and builds the entire tree. It also builds a modified LLVM C-Backend,
  based on the one maintained by `Julia Computing <https://github.com/JuliaComputing/llvm-cbe>`_,
  as a part of HPVM; this is currently used to generate OpenCL kernels for GPUs.
Troubleshooting
^^^^^^^^^^^^^^^
If CMake cannot find your CUDA installation, the following environment variables will help it:
* ``CUDA_TOOLKIT_PATH`` --- Path to the CUDA toolkit
* ``CUDA_INCLUDE_PATH`` --- Path to the CUDA headers
* ``CUDA_LIB_PATH`` --- Path to CUDA libraries
You can use ``set_paths.sh`` for this purpose: modify the values of these variables
in ``set_paths.sh`` according to your system, and source the script:
.. code-block:: shell

   source set_paths.sh
Manually Build HPVM
-------------------
Alternatively, you can manually build HPVM with CMake.
Please note that in this case,
the installer script still *must* be executed to obtain some required components,
but without the build step.
In the current directory (``hpvm/``), do
.. code-block:: shell

   mkdir build
   cd build
   cmake ../llvm [options]
   export PATH=$(realpath ./bin):$PATH
**Note** that you must manually add the ``build/bin`` directory to your ``$PATH``
as an absolute path (as shown above).
Some common options that can be used with CMake are:
* ``-DCMAKE_INSTALL_PREFIX=directory`` --- Specify for directory the full pathname of where you want the HPVM tools and libraries to be installed.
* ``-DCMAKE_BUILD_TYPE=type`` --- Valid options for type are Debug, Release, RelWithDebInfo, and MinSizeRel. Default is Debug.
* ``-DLLVM_ENABLE_ASSERTIONS=On`` --- Compile with assertion checks enabled (default is Yes for Debug builds, No for all other build types).
Now, compile the HPVM Compilation Tool ``approxhpvm.py`` using:
.. code-block:: shell

   make -j<number of threads> approxhpvm.py
With all the aforementioned steps, HPVM should be built, installed, tested and ready to use.
In particular, ``approxhpvm.py`` should be an executable command from your command line.
Benchmarks and Tests
--------------------
We provide a number of general benchmarks, DNN benchmarks, and test cases, written in HPVM.
``make`` targets ``check-hpvm-pass``, ``check-hpvm-dnn``, and ``check-hpvm-profiler``
test various components of HPVM and are increasingly time-consuming.
You can run these tests similarly to how ``approxhpvm.py`` is compiled; for example,
.. code-block:: shell

   make -j<number of threads> check-hpvm-pass
runs ``check-hpvm-pass`` tests. See TODO for details on benchmarks and test cases.
HPVM Compilation Process
========================
Compilation of an HPVM program involves the following steps:
#. ``clang`` takes an HPVM-C/C++ program (e.g. ``main.c``) and produces an LLVM IR (``main.ll``) file that contains the HPVM-C function calls. The declarations of these functions are defined in ``test/benchmark/include/hpvm.h``, which must be included in the program.
#. ``opt`` takes ``main.ll`` and invokes the GenHPVM pass on it, which converts the HPVM-C function calls to HPVM intrinsics. This generates the HPVM textual representation (``main.hpvm.ll``).
#. ``opt`` takes the HPVM textual representation (``main.hpvm.ll``) and invokes the following passes in sequence:
* BuildDFG: Converts the textual representation to the internal HPVM representation.
* LocalMem and DFG2LLVM_OpenCL: Invoked only when GPU target is selected. Generates the kernel module (``main.kernels.ll``) and the portion of the host code that invokes the kernel into the host module (``main.host.ll``).
* DFG2LLVM_CPU: Generates either all, or the remainder of the host module (``main.host.ll``) depending on the chosen target.
* ClearDFG: Deletes the internal HPVM representation from memory.
#. ``clang`` is used to compile any remaining project files that would be later linked with the host module.
#. ``llvm-link`` takes the host module and all the other generated ``ll`` files, and links them with the HPVM runtime module (``hpvm-rt.bc``), to generate the linked host module (``main.host.linked.ll``).
#. Generate the executable code from the generated ``ll`` files for all parts of the program:
* GPU target: ``llvm-cbe`` takes the kernel module (``main.kernels.ll``) and generates an OpenCL representation of the kernels that will be invoked by the host.
* CPU target: ``clang`` takes the linked host module (``main.host.linked.ll``) and generates the CPU binary.
.. role:: raw-html-m2r(raw)
   :format: html
HPVM-C Language Specification
=============================
An HPVM program is a combination of host code and one or more data flow graphs (DFG) at the IR level. We provide C function declarations representing the HPVM intrinsics that allow creating, querying, and interacting with the DFGs. More details about the HPVM IR intrinsics can be found in `the HPVM IR Specification <hpvm-specification.html>`_.
An HPVM-C program contains both the host and the DFG code. Each HPVM kernel, represented by a leaf node in the DFG, can be compiled to multiple different targets (e.g. CPU and GPU) as described below.
This document describes all the API calls that can be used in an HPVM-C program.
Host API
--------
``void __hpvm__init()``:raw-html-m2r:`<br>`
Used before all other HPVM calls to initialize the HPVM runtime.
``void __hpvm__cleanup()``:raw-html-m2r:`<br>`
Used at the end of HPVM program to clean up all remaining runtime-created HPVM objects.
``void llvm_hpvm_track_mem(void* ptr, size_t sz)``:raw-html-m2r:`<br>`
Insert memory starting at ``ptr`` of size ``sz`` in the memory tracker of HPVM runtime.
``void llvm_hpvm_untrack_mem(void* ptr)``:raw-html-m2r:`<br>`
Stop tracking the memory object identified by ``ptr``.
``void llvm_hpvm_request_mem(void* ptr, size_t sz)``:raw-html-m2r:`<br>`
If the memory object identified by ``ptr`` is not in host memory, copy it to host memory.
``void* __hpvm__launch(unsigned isStream, void* rootGraph, void* args)``:raw-html-m2r:`<br>`
Launches the execution of the dataflow graph with node function ``rootGraph``. ``args`` is a pointer to a packed struct, containing one field per argument of the RootGraph function, consecutively. For non-streaming DFGs with a non empty result type, ``args`` must contain an additional field of the type ``RootGraph.returnTy``, where the result of the graph will be returned. ``isStream`` chooses between a non streaming (0) or streaming (1) graph execution. Returns a handle to the executing graph.
``void __hpvm__wait(void* G)``:raw-html-m2r:`<br>`
Waits for completion of execution of the dataflow graph with handle ``G``.
``void __hpvm__push(void* G, void* args)``:raw-html-m2r:`<br>`
Push set of input data items, ``args``, (same as type included in launch) to streaming DFG with handle ``G``.
``void* __hpvm__pop(void* G)``:raw-html-m2r:`<br>`
Pop and return data produced from one execution of streaming DFG with handle ``G``. The return type is a struct containing a field for every output of DFG.
Internal Node API
-----------------
``void* __hpvm__createNodeND(unsigned dims, void* F, ...)``:raw-html-m2r:`<br>`
Creates a static dataflow node replicated in ``dims`` dimensions (0 to 3), each executing node function ``F``. The arguments following ``F`` are the size of each dimension, respectively, passed in as a ``size_t``. Returns a handle to the created dataflow node.
``void* __hpvm__edge(void* src, void* dst, unsigned replType, unsigned sp, unsigned dp, unsigned isStream)``:raw-html-m2r:`<br>`
Creates an edge from output ``sp`` of node ``src`` to input ``dp`` of node ``dst``. If ``replType`` is 0, the edge is a one-to-one edge, otherwise it is an all-to-all edge. ``isStream`` defines whether or not the edge is streaming. Returns a handle to the created edge.
``void __hpvm__bindIn(void* N, unsigned ip, unsigned ic, unsigned isStream)``:raw-html-m2r:`<br>`
Binds the input ``ip`` of the current node to input ``ic`` of child node function ``N``. ``isStream`` defines whether or not the input bind is streaming.
``void __hpvm__bindOut(void* N, unsigned op, unsigned oc, unsigned isStream)``:raw-html-m2r:`<br>`
Binds the output ``op`` of the current node to output ``oc`` of child node function ``N``. ``isStream`` defines whether or not the output bind is streaming.
``void __hpvm__hint(enum Target target)`` (C):raw-html-m2r:`<br>`
``void __hpvm__hint(hpvm::Target target)`` (C++):raw-html-m2r:`<br>`
Must be called once in each node function. Indicates which hardware target the current function should run in.
``void __hpvm__attributes(unsigned ni, …, unsigned no, …)``:raw-html-m2r:`<br>`
Must be called once at the beginning of each node function. Defines the properties of the pointer arguments to the current function. ``ni`` represents the number of input arguments, and ``no`` the number of output arguments. The arguments following ``ni`` are the input arguments, and the arguments following ``no`` are the output arguments. Arguments can be marked as both input and output. All pointer arguments must be included.
Leaf Node API
-------------
``void __hpvm__hint(enum Target target)`` (C):raw-html-m2r:`<br>`
``void __hpvm__hint(hpvm::Target target)`` (C++):raw-html-m2r:`<br>`
As described in internal node API.
``void __hpvm__attributes(unsigned ni, …, unsigned no, …)``:raw-html-m2r:`<br>`
As described in internal node API.
``void __hpvm__return(unsigned n, ...)``:raw-html-m2r:`<br>`
Returns ``n`` values from a leaf node function. The remaining arguments are the values to be returned. All ``__hpvm__return`` statements within the same function must return the same number of values.
``void* __hpvm__getNode()``:raw-html-m2r:`<br>`
Returns a handle to the current leaf node.
``void* __hpvm__getParentNode(void* N)``:raw-html-m2r:`<br>`
Returns a handle to the parent node of node ``N``.
``long __hpvm__getNodeInstanceID_{x,y,z}(void* N)``:raw-html-m2r:`<br>`
Returns the dynamic ID of the current instance of node ``N`` in the x, y, or z dimension respectively. The dimension must be one of the dimensions in which the node is replicated.
``long __hpvm__getNumNodeInstances_{x,y,z}(void* N)``:raw-html-m2r:`<br>`
Returns the number of dynamic instances of node ``N`` in the x, y, or z dimension respectively. The dimension must be one of the dimensions in which the node is replicated.
``void* __hpvm__malloc(long nBytes)``:raw-html-m2r:`<br>`
Allocate a block of memory of size ``nBytes`` and returns a pointer to it. The allocated object can be shared by all nodes. *Note that the returned pointer must somehow be communicated explicitly for use by other nodes.*
``int __hpvm__atomic_add(int* m, int v)``:raw-html-m2r:`<br>`
Atomically adds ``v`` to the value stored at memory location ``[m]`` w.r.t. the dynamic instances of the current leaf node and stores the result back into ``[m]``. Returns the value previously stored at ``[m]``.
``int __hpvm__atomic_sub(int* m, int v)``:raw-html-m2r:`<br>`
Atomically subtracts ``v`` from the value stored at memory location ``[m]`` w.r.t. the dynamic instances of the current leaf node and stores the result back into ``[m]``. Returns the value previously stored at ``[m]``.
``int __hpvm__atomic_min(int* m, int v)``:raw-html-m2r:`<br>`
Atomically computes the min of ``v`` and the value stored at memory location ``[m]`` w.r.t. the dynamic instances of the current leaf node and stores the result back into ``[m]``. Returns the value previously stored at ``[m]``.
``int __hpvm__atomic_max(int* m, int v)``:raw-html-m2r:`<br>`
Atomically computes the max of ``v`` and the value stored at memory location ``[m]`` w.r.t. the dynamic instances of the current leaf node and stores the result back into ``[m]``. Returns the value previously stored at ``[m]``.
``int __hpvm__atomic_xchg(int* m, int v)``:raw-html-m2r:`<br>`
Atomically swaps ``v`` with the value stored at memory location ``[m]`` w.r.t. the dynamic instances of the current leaf node and stores the result back into ``[m]``. Returns the value previously stored at ``[m]``.
``int __hpvm__atomic_and(int* m, int v)``:raw-html-m2r:`<br>`
Atomically computes the bitwise AND of ``v`` and the value stored at memory location ``[m]`` w.r.t. the dynamic instances of the current leaf node and stores the result back into ``[m]``. Returns the value previously stored at ``[m]``.
``int __hpvm__atomic_or(int* m, int v)``:raw-html-m2r:`<br>`
Atomically computes the bitwise OR of ``v`` and the value stored at memory location ``[m]`` w.r.t. the dynamic instances of the current leaf node and stores the result back into ``[m]``. Returns the value previously stored at ``[m]``.
``int __hpvm__atomic_xor(int* m, int v)``:raw-html-m2r:`<br>`
Atomically computes the bitwise XOR of ``v`` and the value stored at memory location ``[m]`` w.r.t. the dynamic instances of the current leaf node and stores the result back into ``[m]``. Returns the value previously stored at ``[m]``.
``void __hpvm__barrier()``:raw-html-m2r:`<br>`
Local synchronization barrier across dynamic instances of current leaf node.
Porting a Program from C to HPVM-C
==================================
The following represents the required steps to port a regular C program into an HPVM program with HPVM-C. These steps are described at a high level; for more detail, please see `hpvm-cava </hpvm/test/benchmarks/hpvm-cava>`_ provided in `benchmarks </hpvm/test/benchmarks>`_.
* Separate the computation that will become a kernel into its own (leaf node) function and add the attributes and target hint.
* Create a level 1 wrapper node function that will describe the thread-level parallelism (for the GPU). The node will:
* Use the ``createNode[ND]()`` method to create a kernel node and specify how many threads will execute it.
* Bind its arguments to the kernel arguments.
* If desired, create a level 2 wrapper node function which will describe the threadblock-level parallelism (for the GPU). This node will:
* Use the ``createNode[ND]()`` method to create a level 1 wrapper node and specify how many threadblocks will execute it.
* Bind its arguments to its child node's arguments.
* Create a root node function that creates all the top-level wrapper nodes, binds their arguments, and connects their edges.
* Each root node represents a DFG.
* All the above node functions have the combined arguments of all the kernels that are nested at each level.
* The host code will have to include the following:
* Initialize the HPVM runtime using the ``init()`` method.
* Create an argument struct for each DFG and assign its member variables.
* Add all the memory that is required by the kernel into the memory tracker.
* Launch the DFG by calling the ``launch()`` method on the root node function, and passing the corresponding argument struct.
* Wait for the DFG to complete execution.
* Read out any generated memory using the ``request_mem()`` method.
* Remove all the tracked memory from the memory tracker.
References
============
Below are some technical details of the HPVM system and the HPVM-C language.
.. toctree::
   :maxdepth: 1

   hpvm-c
   hpvm-specification
   compilation-process
../test/README.rst