Commit f6e1bf5c authored by Yifan Zhao

Small improvements to the doc

parent 1d52149a
Dependencies
------------
* GNU Make (>=3.79) or Ninja (>=1.10)
* Python (==3.6) with pip (>=20)

  * Python must be strictly 3.6 (any subversion from 3.6.0 to 3.6.13).
    This is required by some Python packages in HPVM.
  * If you choose not to install these packages, any Python >= 3.6 will work.
    See :ref:`how to skip installing Python packages in the installer <skip-pypkg>`.
* OpenCL (>=1.0.0) is required for compiling HPVM-C code on GPU; otherwise, only CPU is available.
* OpenMP (>= 4.0)

  * GCC comes with OpenMP support; OpenMP 4.0 is supported by GCC 4.9 onward.
    See `here <https://gcc.gnu.org/wiki/openmp>`__ for the OpenMP version supported by each GCC version.
  * In addition, each version of CUDA ``nvcc`` requires GCC to be no newer than a certain version.
    See `here <https://gist.github.com/ax3l/9489132>`__ for the support matrix.
Python Environment
------------------
If the submodules were not cloned, the directory ``hpvm/projects/predtuner`` will be empty;
this can be fixed with ``git submodule update --recursive --init``.
HPVM needs to be able to find CUDA.
If CUDA is installed in your system's ``$PATH`` (e.g. if it was installed at the default location),
HPVM can find CUDA automatically.
Use the HPVM installer script to download extra components, then configure and build HPVM:
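
A typical invocation might look like the following. This is only a sketch: the
script name ``install.sh`` and the ``-j`` flag are assumptions based on common
HPVM setups, so consult your checkout for the actual installer location and options.

.. code-block:: bash

   # From the HPVM source root (script name is an assumption --
   # check your checkout for the actual installer).
   bash install.sh -j4

The installer prompts for the components to download and the build options
before it configures and builds HPVM.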
The HPVM installer performs the following tasks:

* While running tests is recommended, it is not turned on by default as it is very time-consuming.
.. _skip-pypkg:
Skipping Python Package installation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If you are installing HPVM on a "target" device that is only used for
:ref:`profiling <target-profiling>`,
you may not need to install the frontend and the tuner packages.
These packages also have a Python version requirement and package dependencies
that may be hard to meet on some devices, especially edge computing devices with ARM CPUs.
You can skip their installation by either passing the ``--no-pypkg`` flag to
the installer, or answering yes ("y") when it prompts the following:
.. code-block:: text

   Install HPVM Python Packages (recommended)? [y/n]
In this case, any Python >= 3.6 will work.
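
For example, on a target-only device (a sketch; the installer script name is an
assumption, so adjust it to wherever the installer lives in your checkout):

.. code-block:: bash

   # Skip the frontend/tuner Python packages; any Python >= 3.6 then works
   bash install.sh --no-pypkg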
Troubleshooting
^^^^^^^^^^^^^^^
.. _hpvm-comp-process:
HPVM Compilation Process
========================
Compilation of an HPVM program involves the following steps:
#. ``clang`` takes an HPVM-C/C++ program (e.g. ``main.c``) and produces an LLVM IR file (``main.ll``) that contains the HPVM-C function calls. These functions are declared in ``test/benchmark/include/hpvm.h``, which must be included in the program.
#. ``opt`` takes ``main.ll`` and invokes the GenHPVM pass on it, which converts the HPVM-C function calls to HPVM intrinsics. This generates the HPVM textual representation (``main.hpvm.ll``).
#. ``opt`` takes the HPVM textual representation (``main.hpvm.ll``) and invokes the following passes in sequence:
* BuildDFG: Converts the textual representation to the internal HPVM representation.
* LocalMem and DFG2LLVM_OpenCL: Invoked only when GPU target is selected. Generates the kernel module (``main.kernels.ll``) and the portion of the host code that invokes the kernel into the host module (``main.host.ll``).
* DFG2LLVM_CPU: Generates either all, or the remainder of the host module (``main.host.ll``) depending on the chosen target.
* ClearDFG: Deletes the internal HPVM representation from memory.
#. ``clang`` is used to compile any remaining project files that will later be linked with the host module.
#. ``llvm-link`` takes the host module and all the other generated ``.ll`` files, and links them with the HPVM runtime module (``hpvm-rt.bc``) to generate the linked host module (``main.host.linked.ll``).
#. Generate the executable code from the generated ``.ll`` files for all parts of the program:
* GPU target: ``llvm-cbe`` takes the kernel module (``main.kernels.ll``) and generates an OpenCL representation of the kernels that will be invoked by the host.
* CPU target: ``clang`` takes the linked host module (``main.host.linked.ll``) and generates the CPU binary.
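
The steps above can be sketched as a command sequence. This is illustrative
only: the pass plugin names, library paths, and flags are assumptions and vary
with the HPVM build, so treat it as a rough outline of the pipeline rather than
exact commands.

.. code-block:: bash

   # 1. HPVM-C source -> LLVM IR containing HPVM-C function calls
   clang -S -emit-llvm -I test/benchmark/include main.c -o main.ll

   # 2. HPVM-C calls -> HPVM intrinsics (plugin/pass names are assumptions)
   opt -load LLVMGenHPVM.so -genhpvm main.ll -S -o main.hpvm.ll

   # 3. BuildDFG, backend code generation, and ClearDFG (GPU target shown)
   opt -load LLVMBuildDFG.so -load LLVMDFG2LLVM_OpenCL.so -load LLVMDFG2LLVM_CPU.so \
       -buildDFG -localmem -dfg2llvm-opencl -dfg2llvm-cpu -clearDFG \
       main.hpvm.ll -S -o main.host.ll

   # 5. Link the host module with the HPVM runtime module
   llvm-link main.host.ll hpvm-rt.bc -S -o main.host.linked.ll

   # 6. Produce the CPU binary from the linked host module
   clang main.host.linked.ll -o main -lpthread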
Developer Documents
===================

.. toctree::
   :maxdepth: 1

   approximation-implementation
   backend-passes
   cnn-models
   compilation-process
   configuration-format
   dynamic-approximation
   port-to-hpvm-c
It is also possible to save the configuration in other formats
(see the `predtuner documentation <https://predtuner.readthedocs.io/en/latest/index.html>`_).
.. _target-profiling:

Profiling the Configurations on Target Device
---------------------------------------------
We will use ``hpvm_profiler`` (a Python package) for profiling the ``./hpvm_confs.txt``
we obtained in the tuning step.
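
A minimal profiling script might look like the following. This is a sketch:
the function names ``profile_config_file`` and ``plot_hpvm_configs``, their
signatures, and the binary path are assumptions, so check the ``hpvm_profiler``
package for the actual API.

.. code-block:: python

   # Sketch only: function names and signatures are assumptions --
   # consult the hpvm_profiler package documentation for the actual API.
   from hpvm_profiler import profile_config_file, plot_hpvm_configs

   # Run each configuration in ./hpvm_confs.txt against the compiled
   # binary and write back the measured performance numbers.
   profile_config_file("./hpvm_binary", "./hpvm_confs.txt", "./hpvm_confs.profiled.txt")

   # Visualize the tradeoff of the profiled configurations.
   plot_hpvm_configs("./hpvm_confs.profiled.txt", "configs.png")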
where the performance gain is desired.
As the compiled binary is usually not compatible across architectures,
you need to install HPVM on the edge device and recompile the model.
You may also want to :ref:`skip Python packages in the installation <skip-pypkg>`
to reduce some constraints on Python version and Python packages.
* **Also note** that currently,
the approximation implementations in the tensor runtime are tuned for Jetson TX2,
Please refer to :doc:`getting-started` for how to build and use HPVM.
.. toctree::
   :maxdepth: 1

   getting-started
   build-hpvm
   components/index
   specifications/index
   developerdocs/index
   gallery
   FAQs <faqs>
Indices and tables
------------------
Alternatively, it is possible to build just one DNN benchmark.
The output of CMake shows a list of these benchmarks as target names, starting with
.. code-block:: text

   List of test dnn benchmarks: alexnet2_cifar10;alexnet2_cifar10...
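
For example, to build a single benchmark by its target name (a sketch; the
exact target name comes from the CMake listing above, and the build tool
depends on your configuration):

.. code-block:: bash

   # In the build directory; use ninja instead of make if so configured
   make -j alexnet2_cifar10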