Commit b9f93499 authored by Yifan Zhao

Merge remote-tracking branch 'origin/hpvm-release-docs' into hpvm-release-exp

parents 32fe95a7 05778aae
@@ -44,7 +44,7 @@ Python Environment
 It is strongly recommended to use a Python virtual environment,
 as HPVM will install a few Python packages during this installation process.
-* Some HPVM Python packages contains executables. If you don't use a virtual environment,
+* Some HPVM Python packages contain executables. If you don't use a virtual environment,
 these executables are installed to your local ``bin`` directory, usually ``$HOME/.local/bin``.
 Please ensure this directory is in your ``$PATH`` variable.
 Below it is assumed that these executables are visible through ``$PATH``.
......
@@ -33,7 +33,7 @@ Consider a 3 dimensional Leaf node function with the following body:
 p[index] = p[index] + q[index];
-__hpvm_return(4, p, pSize, q, qSize);
+__hpvm__return(4, p, pSize, q, qSize);
 }
@@ -62,7 +62,7 @@ The above example will illustrate the steps in transforming the leaf node in thi
 p[index] = p[index] + q[index];
-__hpvm_return(4, p, pSize, q, qSize);
+__hpvm__return(4, p, pSize, q, qSize);
 }
@@ -89,7 +89,7 @@ The above example will illustrate the steps in transforming the leaf node in thi
 p[index] = p[index] + q[index];
-__hpvm_return(4, p, pSize, q, qSize);
+__hpvm__return(4, p, pSize, q, qSize);
 }
......
.. role:: raw-html-m2r(raw)
   :format: html

.. |br| raw:: html

   <br/>

HPVM Language Reference
=======================
@@ -105,25 +106,25 @@ Intrinsics for Describing Graphs
 The intrinsics for describing graphs can only be used by internal nodes. Also, internal nodes are only allowed to have these intrinsics as part of their node function, with the exception of a return statement of the appropriate type, in order to return the result of the outgoing dataflow edges.
-``i8* llvm.hpvm.createNode(i8* F)``:raw-html-m2r:`<br>`
+``i8* llvm.hpvm.createNode(i8* F)`` |br|
 Create a static dataflow node with one dynamic instance executing node function ``F``. Return a handle to the created node.
-``i8* llvm.hpvm.createNode1D(i8* F, i64 n1)``:raw-html-m2r:`<br>`
+``i8* llvm.hpvm.createNode1D(i8* F, i64 n1)`` |br|
 Create a static dataflow node replicated in one dimension, namely ``x``, with ``n1`` dynamic instances executing node function ``F``. Return a handle to the created node.
-``i8* llvm.hpvm.createNode2D(i8* F, i64 n1, i64 n2)``:raw-html-m2r:`<br>`
+``i8* llvm.hpvm.createNode2D(i8* F, i64 n1, i64 n2)`` |br|
 Create a static dataflow node replicated in two dimensions, namely ``x`` and ``y``, with ``n1`` and ``n2`` dynamic instances in each dimension respectively, executing node function ``F``. Return a handle to the created node.
-``i8* llvm.hpvm.createNode3D(i8* F, i64 n1, i64 n2, i64 n3)``:raw-html-m2r:`<br>`
+``i8* llvm.hpvm.createNode3D(i8* F, i64 n1, i64 n2, i64 n3)`` |br|
 Create a static dataflow node replicated in three dimensions, namely ``x``, ``y`` and ``z``, with ``n1``, ``n2`` and ``n3`` dynamic instances in each dimension respectively, executing node function ``F``. Return a handle to the created node.
-``i8* llvm.hpvm.createEdge(i8* Src, i8* Dst, i1 ReplType, i32 sp, i32 dp, i1 isStream)``:raw-html-m2r:`<br>`
+``i8* llvm.hpvm.createEdge(i8* Src, i8* Dst, i1 ReplType, i32 sp, i32 dp, i1 isStream)`` |br|
 Create an edge from output ``sp`` of node ``Src`` to input ``dp`` of node ``Dst``. Argument ``dp`` of ``Dst``'s node function and field ``sp`` of the return struct in ``Src``'s node function must have matching types. ``ReplType`` chooses between a one-to-one (0) or all-to-all (1) edge. ``isStream`` chooses a streaming (1) or non-streaming (0) edge. Return a handle to the created edge.
-``void llvm.hpvm.bind.input(i8* N, i32 ip, i32 ic, i1 isStream)``:raw-html-m2r:`<br>`
+``void llvm.hpvm.bind.input(i8* N, i32 ip, i32 ic, i1 isStream)`` |br|
 Bind input ``ip`` of current node to input ``ic`` of child node ``N``. Argument ``ic`` of ``N``'s node function and argument ``ip`` of the current node function must have matching types. ``isStream`` chooses a streaming (1) or non-streaming (0) bind.
-``void llvm.hpvm.bind.output(i8* N, i32 oc, i32 op, i1 isStream)``:raw-html-m2r:`<br>`
+``void llvm.hpvm.bind.output(i8* N, i32 oc, i32 op, i1 isStream)`` |br|
 Bind output ``oc`` of child node ``N`` to output ``op`` of current node. Field ``oc`` of the return struct in ``N``'s node function and field ``op`` of the return struct in the current node function must have matching types. ``isStream`` chooses a streaming (1) or non-streaming (0) bind.
Intrinsics for Querying Graphs
@@ -131,19 +132,19 @@ Intrinsics for Querying Graphs
 The following intrinsics are used to query the structure of the DFG. They can only be used by leaf nodes.
-``i8* llvm.hpvm.getNode()``:raw-html-m2r:`<br>`
+``i8* llvm.hpvm.getNode()`` |br|
 Return a handle to the current leaf node.
-``i8* llvm.hpvm.getParentNode(i8* N)``:raw-html-m2r:`<br>`
+``i8* llvm.hpvm.getParentNode(i8* N)`` |br|
 Return a handle to the parent in the hierarchy of node ``N``.
-``i32 llvm.hpvm.getNumDims(i8* N)``:raw-html-m2r:`<br>`
+``i32 llvm.hpvm.getNumDims(i8* N)`` |br|
 Get the number of dimensions of node ``N``.
-``i64 llvm.hpvm.getNodeInstanceID.{x,y,z}(i8* N)``:raw-html-m2r:`<br>`
+``i64 llvm.hpvm.getNodeInstanceID.{x,y,z}(i8* N)`` |br|
 Get index of current dynamic node instance of node ``N`` in dimension x, y or z respectively. The dimension must be one of the dimensions in which the node is replicated.
-``i64 llvm.hpvm.getNumNodeInstances.{x,y,z}(i8* N)``:raw-html-m2r:`<br>`
+``i64 llvm.hpvm.getNumNodeInstances.{x,y,z}(i8* N)`` |br|
 Get number of dynamic instances of node ``N`` in dimension x, y or z respectively. The dimension must be one of the dimensions in which the node is replicated.
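As a rough illustration of how a leaf node might combine these queries, the plain-C sketch below flattens a three-dimensional instance ID into a single array index such as the ``index`` used in the leaf node body above. It does not call the intrinsics: ``dim3_t`` and ``linear_index`` are hypothetical names, the struct fields stand in for the values returned by ``llvm.hpvm.getNodeInstanceID.{x,y,z}`` and ``llvm.hpvm.getNumNodeInstances.{x,y,z}``, and the x-fastest layout is an assumption rather than something HPVM prescribes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-dimension values a leaf node would
 * obtain from the query intrinsics; not part of HPVM. */
typedef struct {
    int64_t x, y, z;
} dim3_t;

/* Flatten a 3D instance ID into a linear index, with x varying fastest.
 * The layout is an assumption chosen for illustration only. */
static int64_t linear_index(dim3_t id, dim3_t n) {
    /* Each coordinate must lie inside the replication range. */
    assert(id.x >= 0 && id.x < n.x);
    assert(id.y >= 0 && id.y < n.y);
    assert(id.z >= 0 && id.z < n.z);
    return id.x + n.x * (id.y + n.y * id.z);
}
```

With ``n = {4, 3, 2}`` there are 24 dynamic instances, and each one maps to a distinct index in the range 0 to 23.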
Intrinsics for Memory Allocation and Synchronization
@@ -151,35 +152,35 @@ Intrinsics for Memory Allocation and Synchronization
 The following intrinsics are used for memory allocation and synchronization. They can only be used by leaf nodes.
-``i8* llvm.hpvm.malloc(i64 nBytes)``:raw-html-m2r:`<br>`
-Allocate a block of memory of size ``nBytes`` and return pointer to it. The allocated object can be shared by all nodes.:raw-html-m2r:`<br>`
+``i8* llvm.hpvm.malloc(i64 nBytes)`` |br|
+Allocate a block of memory of size ``nBytes`` and return pointer to it. The allocated object can be shared by all nodes. |br|
 *Note that the returned pointer must somehow be communicated explicitly for use by other nodes.*
-``i32 llvm.hpvm.atomic.add(i8* m, i32 v)``:raw-html-m2r:`<br>`
+``i32 llvm.hpvm.atomic.add(i8* m, i32 v)`` |br|
 Atomically adds ``v`` to the value stored at memory location ``[m]`` w.r.t. the dynamic instances of the current leaf node and stores the result back into ``[m]``. Returns the value previously stored at ``[m]``.
-``i32 llvm.hpvm.atomic.sub(i8* m, i32 v)``:raw-html-m2r:`<br>`
+``i32 llvm.hpvm.atomic.sub(i8* m, i32 v)`` |br|
 Atomically subtracts ``v`` from the value stored at memory location ``[m]`` w.r.t. the dynamic instances of the current leaf node and stores the result back into ``[m]``. Returns the value previously stored at ``[m]``.
-``i32 llvm.hpvm.atomic.min(i8* m, i32 v)``:raw-html-m2r:`<br>`
+``i32 llvm.hpvm.atomic.min(i8* m, i32 v)`` |br|
 Atomically computes the minimum of ``v`` and the value stored at memory location ``[m]`` w.r.t. the dynamic instances of the current leaf node and stores the result back into ``[m]``. Returns the value previously stored at ``[m]``.
-``i32 llvm.hpvm.atomic.max(i8* m, i32 v)``:raw-html-m2r:`<br>`
+``i32 llvm.hpvm.atomic.max(i8* m, i32 v)`` |br|
 Atomically computes the maximum of ``v`` and the value stored at memory location ``[m]`` w.r.t. the dynamic instances of the current leaf node and stores the result back into ``[m]``. Returns the value previously stored at ``[m]``.
-``i32 llvm.hpvm.atomic.xchg(i8* m, i32 v)``:raw-html-m2r:`<br>`
+``i32 llvm.hpvm.atomic.xchg(i8* m, i32 v)`` |br|
 Atomically replaces the value stored at memory location ``[m]`` with ``v`` w.r.t. the dynamic instances of the current leaf node. Returns the value previously stored at ``[m]``.
-``i32 llvm.hpvm.atomic.and(i8* m, i32 v)``:raw-html-m2r:`<br>`
+``i32 llvm.hpvm.atomic.and(i8* m, i32 v)`` |br|
 Atomically computes the bitwise AND of ``v`` and the value stored at memory location ``[m]`` w.r.t. the dynamic instances of the current leaf node and stores the result back into ``[m]``. Returns the value previously stored at ``[m]``.
-``i32 llvm.hpvm.atomic.or(i8* m, i32 v)``:raw-html-m2r:`<br>`
+``i32 llvm.hpvm.atomic.or(i8* m, i32 v)`` |br|
 Atomically computes the bitwise OR of ``v`` and the value stored at memory location ``[m]`` w.r.t. the dynamic instances of the current leaf node and stores the result back into ``[m]``. Returns the value previously stored at ``[m]``.
-``i32 llvm.hpvm.atomic.xor(i8* m, i32 v)``:raw-html-m2r:`<br>`
+``i32 llvm.hpvm.atomic.xor(i8* m, i32 v)`` |br|
 Atomically computes the bitwise XOR of ``v`` and the value stored at memory location ``[m]`` w.r.t. the dynamic instances of the current leaf node and stores the result back into ``[m]``. Returns the value previously stored at ``[m]``.
-``void llvm.hpvm.barrier()``:raw-html-m2r:`<br>`
+``void llvm.hpvm.barrier()`` |br|
 Local synchronization barrier across dynamic instances of current leaf node.
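The atomic intrinsics above all share one contract: update ``[m]`` and return the value stored there before the update. The sketch below mimics that contract with standard C11 atomics, purely to make the return-value semantics concrete. It is not how HPVM implements these intrinsics; the ``sketch_*`` function names are hypothetical, and it uses ``_Atomic int32_t *`` where the intrinsics take an ``i8*``.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Fetch-and-add: add v to *m and return the value stored before the update. */
static int32_t sketch_atomic_add(_Atomic int32_t *m, int32_t v) {
    assert(m != NULL);
    return atomic_fetch_add(m, v);
}

/* Fetch-and-min: C11 has no atomic_fetch_min, so emulate it with a
 * compare-and-swap loop. Returns the value stored before the update. */
static int32_t sketch_atomic_min(_Atomic int32_t *m, int32_t v) {
    assert(m != NULL);
    int32_t old = atomic_load(m);
    /* Only write when v is smaller; a failed CAS refreshes `old`. */
    while (old > v && !atomic_compare_exchange_weak(m, &old, v)) {
        /* retry with the refreshed value */
    }
    return old;
}

/* Exchange: store v and return the value it replaced. */
static int32_t sketch_atomic_xchg(_Atomic int32_t *m, int32_t v) {
    assert(m != NULL);
    return atomic_exchange(m, v);
}
```

Starting from a cell holding 10, ``sketch_atomic_add(&cell, 5)`` returns 10 and leaves 15 in the cell. A later ``sketch_atomic_min(&cell, 7)`` returns 15 and leaves 7.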
Intrinsics for Graph Interaction
@@ -187,31 +188,31 @@ Intrinsics for Graph Interaction
 The following intrinsics are for graph initialization/termination and interaction with the host code, and can be used only by the host code.
-``void llvm.hpvm.init()``:raw-html-m2r:`<br>`
+``void llvm.hpvm.init()`` |br|
 Initialization of HPVM runtime.
-``void llvm.hpvm.cleanup()``:raw-html-m2r:`<br>`
+``void llvm.hpvm.cleanup()`` |br|
 Cleanup of HPVM runtime created objects.
-``void llvm.hpvm.trackMemory(i8* ptr, i64 sz)``:raw-html-m2r:`<br>`
+``void llvm.hpvm.trackMemory(i8* ptr, i64 sz)`` |br|
 Insert memory starting at ``ptr`` of size ``sz`` in the memory tracker. ``ptr`` becomes the key for identifying this memory object. As soon as a memory object is inserted in the memory tracker it starts being tracked, and can be passed as a data item to a DFG.
-``void llvm.hpvm.untrackMemory(i8* ptr)``:raw-html-m2r:`<br>`
+``void llvm.hpvm.untrackMemory(i8* ptr)`` |br|
 Stop tracking memory object with key ``ptr``, and remove it from memory tracker.
-``void llvm.hpvm.requestMemory(i8* ptr, i64 sz)``:raw-html-m2r:`<br>`
+``void llvm.hpvm.requestMemory(i8* ptr, i64 sz)`` |br|
 If memory object with key ``ptr`` is not located in host memory, copy it to host memory.
-``i8* llvm.hpvm.launch(i8* RootGraph, i8* Args, i1 isStream)``:raw-html-m2r:`<br>`
+``i8* llvm.hpvm.launch(i8* RootGraph, i8* Args, i1 isStream)`` |br|
 Launch the execution of a top-level DFG with root node function ``RootGraph``. ``Args`` is a pointer to a packed struct, containing one field per argument of the ``RootGraph`` function, consecutively. For non-streaming DFGs with a non-empty result type, ``Args`` must contain an additional field of the type ``RootGraph.returnTy``, where the result of the graph will be returned. ``isStream`` chooses between a non-streaming (0) or streaming (1) graph execution. Return a handle to the invoked DFG.
-``void llvm.hpvm.wait(i8* GraphID)``:raw-html-m2r:`<br>`
+``void llvm.hpvm.wait(i8* GraphID)`` |br|
 Wait for completion of execution of DFG with handle ``GraphID``.
-``void llvm.hpvm.push(i8* GraphID, i8* args)``:raw-html-m2r:`<br>`
+``void llvm.hpvm.push(i8* GraphID, i8* args)`` |br|
 Push set of input data ``args`` (same as type included in launch) to streaming DFG with handle ``GraphID``.
-``i8* llvm.hpvm.pop(i8* GraphID)``:raw-html-m2r:`<br>`
+``i8* llvm.hpvm.pop(i8* GraphID)`` |br|
 Pop and return data from streaming DFG with handle ``GraphID``. The return type is a struct containing a field for every output of DFG.
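The ``Args`` layout described for ``llvm.hpvm.launch`` can be pictured with a small host-side sketch. It assumes a hypothetical root node function taking ``(float *p, size_t pSize, int32_t scale)`` and returning a struct with a single ``size_t`` field. The names ``root_ret_t`` and ``root_args_t`` are invented for illustration, and ``__attribute__((packed))`` is a GCC/Clang extension used here as one possible way to obtain a packed struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical return struct of the root node function. */
typedef struct {
    size_t bytes;
} root_ret_t;

/* Hypothetical argument struct: one field per root-function argument, in
 * order, followed by a field of the return type where the result of a
 * non-streaming graph would be written. Packed, so no padding is inserted. */
typedef struct __attribute__((packed)) {
    float *p;
    size_t pSize;
    int32_t scale;
    root_ret_t ret;
} root_args_t;

/* With packing, the return field sits directly after the last argument. */
static_assert(offsetof(root_args_t, p) == 0, "first argument starts the struct");
static_assert(offsetof(root_args_t, ret) ==
                  sizeof(float *) + sizeof(size_t) + sizeof(int32_t),
              "result field follows the arguments without padding");
```

Without the packed attribute, a typical 64-bit ABI would insert four bytes of padding after ``scale``, and the field offsets would no longer be consecutive.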
Implementation Limitations
......
@@ -128,17 +128,5 @@ when the Module is exported into ONNX:
 - Add
-This choice of operators is largely constrained by backend (tensor_runtime) supports.
+This choice of operators is largely constrained by the operators supported by the current backends (tensor_runtime).
-TODOs
------
-#. Optionally insert a Python-C interface in the generated binary to
-call back into a Dataset class and read the data.
-* Needs pybind11, hardcoding of Python environment, and some fiddling with import mechanism.
-#. Expand the list of operators supported in the frontend.
-* Most ideally, create a high-level description of operators that can tie
-HPVM-C intrinsics and the frontend list of operators together.