Commit c3d3c694 authored by Yifan Zhao

Merge remote-tracking branch 'origin/hpvm-reorg-9-doc' into hpvm-reorg-9

parents b8dc6fc1 0e059c6e
...@@ -37,7 +37,7 @@ Pop and return data produced from one execution of streaming DFG with handle ``` ...@@ -37,7 +37,7 @@ Pop and return data produced from one execution of streaming DFG with handle ```
```void* __hpvm__createNodeND(unsigned dims, void* F, ...)``` ```void* __hpvm__createNodeND(unsigned dims, void* F, ...)```
Creates a static dataflow node replicated in ```dims``` dimensions (0 to 3), each executing node function ```F```. The arguments following ```F``` are the size of each dimension, respectively, passed in as a ```size_t```. Returns a handle to the created dataflow node. Creates a static dataflow node replicated in ```dims``` dimensions (0 to 3), each executing node function ```F```. The arguments following ```F``` are the size of each dimension, respectively, passed in as a ```size_t```. Returns a handle to the created dataflow node.
```void* __hpvm__edge(void* src, void* dst, unsigned replType, unsigned sp, unsigned dp, unsigned stream)``` ```void* __hpvm__edge(void* src, void* dst, unsigned replType, unsigned sp, unsigned dp, unsigned isStream)```
Creates an edge from output ```sp``` of node ```src``` to input ```dp``` of node ```dst```. If ```replType``` is 0, the edge is a one-to-one edge, otherwise it is an all-to-all edge. ```isStream``` defines whether or not the edge is streaming. Returns a handle to the created edge. Creates an edge from output ```sp``` of node ```src``` to input ```dp``` of node ```dst```. If ```replType``` is 0, the edge is a one-to-one edge, otherwise it is an all-to-all edge. ```isStream``` defines whether or not the edge is streaming. Returns a handle to the created edge.
```void __hpvm__bindIn(void* N, unsigned ip, unsigned ic, unsigned isStream)``` ```void __hpvm__bindIn(void* N, unsigned ip, unsigned ic, unsigned isStream)```
...@@ -46,14 +46,16 @@ Binds the input ```ip``` of the current node to input ```ic``` of child node fun ...@@ -46,14 +46,16 @@ Binds the input ```ip``` of the current node to input ```ic``` of child node fun
```void __hpvm__bindOut(void* N, unsigned op, unsigned oc, unsigned isStream)``` ```void __hpvm__bindOut(void* N, unsigned op, unsigned oc, unsigned isStream)```
Binds the output ```op``` of the current node to output ```oc``` of child node function ```N```. ```isStream``` defines whether or not the output bind is streaming. Binds the output ```op``` of the current node to output ```oc``` of child node function ```N```. ```isStream``` defines whether or not the output bind is streaming.
```void __hpvm__hint(enum Target target)``` (C\) / ```void __hpvm__hint(hpvm::Target target)``` (C++) ```void __hpvm__hint(enum Target target)``` (C\)
```void __hpvm__hint(hpvm::Target target)``` (C++)
Must be called once in each node function. Indicates which hardware target the current function should run in Must be called once in each node function. Indicates which hardware target the current function should run in
```void __hpvm__attributes(unsigned ni, …, unsigned no, …)``` ```void __hpvm__attributes(unsigned ni, …, unsigned no, …)```
Must be called once at the beginning of each node function. Defines the properties of the pointer arguments to the current function. ```ni``` represents the number of input arguments, and ```no``` the number of output arguments. The arguments following ```ni``` are the input arguments, and the arguments following ```no``` are the output arguments. Arguments can be marked as both input and output. All pointer arguments must be included. Must be called once at the beginning of each node function. Defines the properties of the pointer arguments to the current function. ```ni``` represents the number of input arguments, and ```no``` the number of output arguments. The arguments following ```ni``` are the input arguments, and the arguments following ```no``` are the output arguments. Arguments can be marked as both input and output. All pointer arguments must be included.
## Leaf Node API ## Leaf Node API
```void __hpvm__hint(enum Target target)``` (C\) / ```void __hpvm__hint(hpvm::Target target)``` (C++) ```void __hpvm__hint(enum Target target)``` (C\)
```void __hpvm__hint(hpvm::Target target)``` (C++)
As described in internal node API. As described in internal node API.
```void __hpvm__attributes(unsigned ni, …, unsigned no, …)``` ```void __hpvm__attributes(unsigned ni, …, unsigned no, …)```
......
...@@ -2,32 +2,34 @@ ...@@ -2,32 +2,34 @@
An HPVM program is a combination of host code plus a set of one or more distinct dataflow graphs. Each dataflow graph (DFG) is a hierarchical graph with side effects. Nodes represent units of execution, and edges between nodes describe the explicit data transfer requirements. A node can begin execution once a data item becomes available on every one of its input edges. Repeated transfer of data items between nodes (if more inputs are provided) yields a pipelined execution of different nodes in the graph. The execution of a DFG is initiated and terminated by host code that launches the graph. Nodes may access globally shared memory through load and store instructions (side-effects). An HPVM program is a combination of host code plus a set of one or more distinct dataflow graphs. Each dataflow graph (DFG) is a hierarchical graph with side effects. Nodes represent units of execution, and edges between nodes describe the explicit data transfer requirements. A node can begin execution once a data item becomes available on every one of its input edges. Repeated transfer of data items between nodes (if more inputs are provided) yields a pipelined execution of different nodes in the graph. The execution of a DFG is initiated and terminated by host code that launches the graph. Nodes may access globally shared memory through load and store instructions (side-effects).
## Dataflow Node ## Dataflow Node
A dataflow node represents a unit of computation in the DFG. A node can begin execution once a data item becomes available on every one of its input edges. A *dataflow node* represents a unit of computation in the DFG. A node can begin execution once a data item becomes available on every one of its input edges.
A single static dataflow node represents multiple dynamic instances of the node, each executing the same computation with different index values. The dynamic instances of a node are required to be independent of each other except on HPVM synchronization operations. A single static dataflow node represents multiple dynamic instances of the node, each executing the same computation with different index values used to uniquely identify each dynamic instance w.r.t. the others. The dynamic instances of a node may be executed concurrently, and any required synchronization must be imposed using HPVM synchronization operations.
Each dataflow node in a DFG can either be a leaf node or an internal node. An internal node contains a complete DFG, called a child graph, and the child graph itself can have internal nodes and/or leaf nodes. Each dataflow node in a DFG can either be a *leaf node* or an *internal node*. An internal node contains a complete DFG, called a *child graph*, and the child graph itself can have internal nodes and/or leaf nodes.
Internal nodes only create the structure of the child graph, and cannot include actual computation. The DFG cannot be modified at runtime except for the number of dynamic instances, which can be data dependent. Internal nodes only create the structure of the child graph, and cannot include actual computation. The DFG cannot be modified at runtime except for the number of dynamic instances, which can be data dependent.
Leaf nodes contain code expressing actual computations. Leaf nodes may contain instructions to query the structure of the underlying DFG, and any non host side HPVM operation for synchronization and memory allocation. Leaf nodes contain code expressing actual computations. Leaf nodes may contain instructions to query the structure of the underlying DFG, and any non host side HPVM operation for synchronization and memory allocation.
## Dataflow Edge ## Dataflow Edge
A dataflow edge from the output out of a source dataflow node ```Src``` to the input in of a sink dataflow node ```Dst``` describes the explicit data transfer requirements. ```Src``` and ```Dst``` node must belong to the same child graph, i.e. must be children of the same internal node. A *dataflow edge* from the output ```out``` of a source dataflow node ```Src``` to the input ```in``` of a sink dataflow node ```Dst``` describes the explicit data transfer requirements. ```Src``` and ```Dst``` node must belong to the same child graph, i.e. must be children of the same internal node.
An edge from source to sink has the semantics of copying the specified data from the source to the sink after the source node has completed execution. The pairs ```(Src, out)```, and ```(Dst, in)``` must be unique, i.e. no two dataflow edges in the same graph can have the same source or destination. An edge from source to sink has the semantics of copying the specified data from the source to the sink after the source node has completed execution. The pairs ```(Src, out)```, and ```(Dst, in)``` must be unique w.r.t. every other edge in the same child graph, i.e. two dataflow edges in the same child graph cannot have the same source or destination.
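The uniqueness rule for ```(Src, out)``` and ```(Dst, in)``` pairs can be checked mechanically. The following is a minimal sketch in plain C, not part of HPVM: the ```struct edge``` record, the integer node ids standing in for opaque node handles, and the function name ```edges_are_unique``` are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical edge record; integer ids stand in for opaque node handles. */
struct edge {
    int src;      /* id of the source node */
    unsigned out; /* output index on the source node */
    int dst;      /* id of the sink node */
    unsigned in;  /* input index on the sink node */
};

/* Returns true if no two edges of one child graph share a (src, out) pair
   or a (dst, in) pair, following the rule stated in the text. */
bool edges_are_unique(const struct edge *edges, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = i + 1; j < n; ++j) {
            bool same_source = edges[i].src == edges[j].src &&
                               edges[i].out == edges[j].out;
            bool same_sink = edges[i].dst == edges[j].dst &&
                             edges[i].in == edges[j].in;
            if (same_source || same_sink) {
                return false;
            }
        }
    }
    return true;
}
```

The pairwise comparison is quadratic in the number of edges, which is likely acceptable for the small child graphs this kind of check would be applied to.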
A static edge also represents multiple dynamic instances of that edge between the dynamic instances of the source and the sink nodes. A static edge also represents multiple dynamic instances of that edge between the dynamic instances of the source and the sink nodes.
An edge can be instantiated at runtime using one of two replication mechanisms: all-to-all, where all dynamic instances of the source node are connected to all dynamic instances of the sink node, thus expressing a synchronization barrier between the two groups of nodes, or one-to-one, where each dynamic instance of the source node is connected to a single corresponding instance of the sink node. One-to-one replication requires that the grid structure (number of dimensions and the extents in each dimension) of the source and sink nodes be identical. An edge can be instantiated at runtime using one of two replication mechanisms:
- *All-to-all*, where all dynamic instances of the source node are connected to all dynamic instances of the sink node, thus expressing a synchronization barrier between the two groups of nodes, or
- *One-to-one*, where each dynamic instance of the source node is connected to a single corresponding instance of the sink node. One-to-one replication requires that the grid structure (number of dimensions and the extents in each dimension) of the source and sink nodes be identical.
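To make the two replication mechanisms concrete, the sketch below counts the dynamic edge instances a static edge would stand for. It is plain C rather than HPVM code, and it simplifies by treating each node's grid as a flat instance count; the constant names and ```count_dynamic_edges``` are hypothetical. The 0 / nonzero encoding follows the ```replType``` convention described for ```__hpvm__edge```.

```c
#include <stddef.h>

/* Hypothetical names for the replType encoding: 0 is one-to-one,
   any other value is all-to-all. */
enum { REPL_ONE_TO_ONE = 0, REPL_ALL_TO_ALL = 1 };

/* Returns the number of dynamic edge instances between a source node with
   n_src dynamic instances and a sink node with n_dst dynamic instances.
   Returns -1 when one-to-one replication is requested but the instance
   counts differ (a flattened stand-in for "identical grid structure"). */
long count_dynamic_edges(unsigned repl_type, size_t n_src, size_t n_dst) {
    if (repl_type == REPL_ONE_TO_ONE) {
        if (n_src != n_dst) {
            return -1;
        }
        return (long)n_src; /* instance i of Src feeds instance i of Dst */
    }
    return (long)(n_src * n_dst); /* every Src instance feeds every Dst instance */
}
```

Under these assumptions, a one-to-one edge between two 4-instance nodes stands for 4 dynamic edges, while an all-to-all edge from a 4-instance node to a 3-instance node stands for 12.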
## Input and Output Bind ## Input and Output Bind
An internal node is responsible for mapping its inputs, provided by incoming dataflow edges, to the inputs to one or more nodes of the child graph. An internal node is responsible for mapping its inputs, provided by incoming dataflow edges, to the inputs to one or more nodes of the child graph.
An internal node binds its input ```ip``` to input ```ic``` of its child node ```Dst``` using an input bind. An internal node binds its input ```ip``` to input ```ic``` of its child node ```Dst``` using an *input bind*.
The pair ```(Dst, ic)``` must be unique, i.e. no two input binds in the same graph can have the same destination, as that would create a conflict. Semantically, these represent name bindings of input values and not data movement. The pair ```(Dst, ic)``` must be unique, i.e. no two input binds in the same graph can have the same destination, as that would create a conflict. Semantically, these represent name bindings of input values and not data movement.
Conversely, an internal node binds output ```oc``` of its child node ```Src``` to its output ```op``` using an output bind. The pair ```(Src, oc)``` and destination ```op``` must be unique, i.e. no two output binds in the same graph can have the same source or destination, as that would create a conflict. Conversely, an internal node binds output ```oc``` of its child node ```Src``` to its output ```op``` using an *output bind*. The pair ```(Src, oc)``` and destination ```op``` must be unique, i.e. no two output binds in the same graph can have the same source or destination, as that would create a conflict.
A bind is always all-to-all. A bind is always all-to-all.
...@@ -37,13 +39,13 @@ In an HPVM program, the host code is responsible for setting up, initiating the ...@@ -37,13 +39,13 @@ In an HPVM program, the host code is responsible for setting up, initiating the
- **Initialization and Cleanup**: - **Initialization and Cleanup**:
All HPVM operations must be enclosed by the HPVM initialization and cleanup. These operations perform initialization and cleanup of runtime constructs that provide the runtime support for HPVM. All HPVM operations must be enclosed by the HPVM initialization and cleanup. These operations perform initialization and cleanup of runtime constructs that provide the runtime support for HPVM.
- **Track Memory**: - **Track Memory**:
Memory objects that are passed to dataflow graphs need to be managed by the HPVM runtime. The HPVM runtime includes a memory tracker for recording the location of HPVM-managed memory objects. Track memory starts tracking specified memory object. Memory objects that are passed to dataflow graphs need to be managed by the HPVM runtime. The HPVM runtime includes a memory tracker for recording the location, or tracking, of HPVM-managed memory objects. Track memory starts tracking specified memory object.
- **Untrack Memory**: - **Untrack Memory**:
Stop tracking specified memory object. Stop tracking specified memory object.
- **Request Memory**: - **Request Memory**:
If the specified memory object is not present in host memory, copy it to host memory. If the specified memory object is not present in host memory, copy it to host memory.
- **Launch**: - **Launch**:
The host code initiates execution of specified DFG, either streaming or non streaming, and passes initial data. All data for one graph execution must be provided. The host code initiates execution of specified DFG, either streaming or non streaming, and provides initial data. All data for one graph execution must be provided.
- **Wait**: - **Wait**:
The host code blocks for completion of specified DFG. The host code blocks for completion of specified DFG.
- **Push**: - **Push**:
...@@ -61,7 +63,7 @@ The code for each dataflow node is given as a separate LLVM function, called the ...@@ -61,7 +63,7 @@ The code for each dataflow node is given as a separate LLVM function, called the
The incoming dataflow edges and their data types are denoted by the parameters to the node function. The outgoing dataflow edges are represented by the return type of the node function, which must be an LLVM struct type with zero or more fields (one per outgoing edge). The incoming dataflow edges and their data types are denoted by the parameters to the node function. The outgoing dataflow edges are represented by the return type of the node function, which must be an LLVM struct type with zero or more fields (one per outgoing edge).
We represent nodes with opaque handles (pointers of LLVM type i8*). We represent input edges of a node as integer indices into the list of function arguments, and output edges of a node as integer indices into the return struct type. We represent nodes with opaque handles (pointers of LLVM type i8\*). We represent input edges of a node as integer indices into the list of function arguments, and output edges of a node as integer indices into the return struct type.
Pointer arguments of node functions are required to be annotated with attributes in, and/or out, depending on their expected use (read only, write only, read write). Pointer arguments of node functions are required to be annotated with attributes in, and/or out, depending on their expected use (read only, write only, read write).
...@@ -71,19 +73,19 @@ The intrinsics for describing graphs can only be used by internal nodes. Also, i ...@@ -71,19 +73,19 @@ The intrinsics for describing graphs can only be used by internal nodes. Also, i
```i8* llvm.hpvm.createNode(i8* F)``` ```i8* llvm.hpvm.createNode(i8* F)```
Create a static dataflow node with no dynamic instances executing node function ```F```. Return a handle to the created node. Create a static dataflow node with one dynamic instance executing node function ```F```. Return a handle to the created node.
```i8* llvm.hpvm.createNode1D(i8* F, i64 n1)``` ```i8* llvm.hpvm.createNode1D(i8* F, i64 n1)```
Create a static dataflow node with ```n1``` dynamic instances executing node function ```F```. Return a handle to the created node. Create a static dataflow node replicated in one dimension, namely ```x```, with ```n1``` dynamic instances executing node function ```F```. Return a handle to the created node.
```i8* llvm.hpvm.createNode2D(i8* F, i64 n1, i64 n2)``` ```i8* llvm.hpvm.createNode2D(i8* F, i64 n1, i64 n2)```
Create a static dataflow node replicated in two dimensions, with ```n1``` and ```n2``` dynamic instances in each dimension respectively, executing node function ```F```. Return a handle to the created node. Create a static dataflow node replicated in two dimensions, namely ```x``` and ```y```, with ```n1``` and ```n2``` dynamic instances in each dimension respectively, executing node function ```F```. Return a handle to the created node.
```i8* llvm.hpvm.createNode3D(i8* F, i64 n1, i64 n2, i64 n3)``` ```i8* llvm.hpvm.createNode3D(i8* F, i64 n1, i64 n2, i64 n3)```
Create a static dataflow node replicated in three dimensions, with ```n1```, ```n2``` and ```n3``` dynamic instances in each dimension respectively, executing node function ```F```. Return a handle to the created node. Create a static dataflow node replicated in three dimensions, namely ```x```, ```y``` and ```z```, with ```n1```, ```n2``` and ```n3``` dynamic instances in each dimension respectively, executing node function ```F```. Return a handle to the created node.
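As a rough illustration of how the per-dimension sizes combine, the sketch below computes the total number of dynamic instances of a node from its extents. It is plain C, not an HPVM intrinsic; the name ```total_instances``` and the array-of-extents interface are assumptions. It treats an unreplicated node as having a single dynamic instance.

```c
#include <stddef.h>

/* Returns the total number of dynamic instances of a node replicated in
   `dims` dimensions (0 to 3), where extents[0..dims-1] hold n1, n2, n3.
   With dims == 0 the extents pointer is not read and the result is 1.
   Returns 0 if dims is larger than 3. */
size_t total_instances(unsigned dims, const size_t *extents) {
    if (dims > 3) {
        return 0;
    }
    size_t total = 1;
    for (unsigned d = 0; d < dims; ++d) {
        total *= extents[d];
    }
    return total;
}
```

For example, a node created with sizes 2, 3 and 5 in the three dimensions would have 30 dynamic instances under this reading.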
```i8* llvm.hpvm.createEdge(i8* Src, i8* Dst, i1 ReplType, i32 sp, i32 dp, i1 isStream)``` ```i8* llvm.hpvm.createEdge(i8* Src, i8* Dst, i1 ReplType, i32 sp, i32 dp, i1 isStream)```
Create edge from output ```sp``` of node ```Src``` to input ```dp``` of node ```Dst```. ```ReplType``` chooses between a 1-1 or all-to-all edge. ```isStream``` chooses a streaming (1) or non streaming (0) edge. Return a handle to the created edge. Create edge from output ```sp``` of node ```Src``` to input ```dp``` of node ```Dst```. ```ReplType``` chooses between a one-to-one (0) or all-to-all (1) edge. ```isStream``` chooses a streaming (1) or non streaming (0) edge. Return a handle to the created edge.
```void llvm.hpvm.bind.input(i8* N, i32 ip, i32 ic, i1 isStream)``` ```void llvm.hpvm.bind.input(i8* N, i32 ip, i32 ic, i1 isStream)```
Bind input ```ip``` of current node to input ```ic``` of child node ```N```. ```isStream``` chooses a streaming (1) or non streaming (0) bind. Bind input ```ip``` of current node to input ```ic``` of child node ```N```. ```isStream``` chooses a streaming (1) or non streaming (0) bind.
...@@ -96,7 +98,7 @@ Bind output ```oc``` of child node ```N``` to output ```op``` of current node. ` ...@@ -96,7 +98,7 @@ Bind output ```oc``` of child node ```N``` to output ```op``` of current node. `
The following intrinsics are used to query the structure of the DFG. They can only be used by leaf nodes. The following intrinsics are used to query the structure of the DFG. They can only be used by leaf nodes.
```i8* llvm.hpvm.getNode()``` ```i8* llvm.hpvm.getNode()```
Return a handle to the current dataflow node. Return a handle to the current leaf node.
```i8* llvm.hpvm.getParentNode(i8* N)``` ```i8* llvm.hpvm.getParentNode(i8* N)```
Return a handle to the parent in the hierarchy of node ```N```. Return a handle to the parent in the hierarchy of node ```N```.
...@@ -113,32 +115,34 @@ Get number of dynamic instances of node ```N``` in dimension x, y or z respectiv ...@@ -113,32 +115,34 @@ Get number of dynamic instances of node ```N``` in dimension x, y or z respectiv
## Intrinsics for Memory Allocation and Synchronization ## Intrinsics for Memory Allocation and Synchronization
The following intrinsics are used for memory allocation and synchronization. They can only be used by leaf nodes. The following intrinsics are used for memory allocation and synchronization. They can only be used by leaf nodes.
```i8* llvm.hpvm.malloc(i64 nBytes)``` ```i8* llvm.hpvm.malloc(i64 nBytes)```
Allocate a block of memory of size ```nBytes``` and return pointer to it. The allocated object can be shared by all nodes, although the pointer returned must somehow be communicated explicitly for use by other nodes. Allocate a block of memory of size ```nBytes``` and return pointer to it. The allocated object can be shared by all nodes.
*Note that the pointer returned must somehow be communicated explicitly for use by other nodes.*
```i32 llvm.hpvm.atomic.add(i8* m, i32 v)``` ```i32 llvm.hpvm.atomic.add(i8* m, i32 v)```
Atomically computes the bitwise ADD of ```v``` and the value stored at memory location ```[m]```. Returns the value previously stored at ```[m]```. Atomically computes the bitwise ADD of ```v``` and the value stored at memory location ```[m]``` w.r.t. the dynamic instances of the current leaf node and stores the result back into ```[m]```. Returns the value previously stored at ```[m]```.
```i32 llvm.hpvm.atomic.sub(i8* m, i32 v)``` ```i32 llvm.hpvm.atomic.sub(i8* m, i32 v)```
Atomically computes the bitwise SUB of ```v``` and the value stored at memory location ```[m]```. Returns the value previously stored at ```[m]```. Atomically computes the bitwise SUB of ```v``` and the value stored at memory location ```[m]``` w.r.t. the dynamic instances of the current leaf node and stores the result back into ```[m]```. Returns the value previously stored at ```[m]```.
```i32 llvm.hpvm.atomic.min(i8* m, i32 v)``` ```i32 llvm.hpvm.atomic.min(i8* m, i32 v)```
Atomically computes the bitwise MIN of ```v``` and the value stored at memory location ```[m]```. Returns the value previously stored at ```[m]```. Atomically computes the bitwise MIN of ```v``` and the value stored at memory location ```[m]``` w.r.t. the dynamic instances of the current leaf node and stores the result back into ```[m]```. Returns the value previously stored at ```[m]```.
```i32 llvm.hpvm.atomic.max(i8* m, i32 v)``` ```i32 llvm.hpvm.atomic.max(i8* m, i32 v)```
Atomically computes the bitwise MAX of ```v``` and the value stored at memory location ```[m]```. Returns the value previously stored at ```[m]```. Atomically computes the bitwise MAX of ```v``` and the value stored at memory location ```[m]``` w.r.t. the dynamic instances of the current leaf node and stores the result back into ```[m]```. Returns the value previously stored at ```[m]```.
```i32 llvm.hpvm.atomic.xchg(i8* m, i32 v)``` ```i32 llvm.hpvm.atomic.xchg(i8* m, i32 v)```
Atomically computes the bitwise XCHG of ```v``` and the value stored at memory location ```[m]```. Returns the value previously stored at ```[m]```. Atomically computes the bitwise XCHG of ```v``` and the value stored at memory location ```[m]``` w.r.t. the dynamic instances of the current leaf node and stores the result back into ```[m]```. Returns the value previously stored at ```[m]```.
```i32 llvm.hpvm.atomic.and(i8* m, i32 v)``` ```i32 llvm.hpvm.atomic.and(i8* m, i32 v)```
Atomically computes the bitwise AND of ```v``` and the value stored at memory location ```[m]```. Returns the value previously stored at ```[m]```. Atomically computes the bitwise AND of ```v``` and the value stored at memory location ```[m]``` w.r.t. the dynamic instances of the current leaf node and stores the result back into ```[m]```. Returns the value previously stored at ```[m]```.
```i32 llvm.hpvm.atomic.or(i8* m, i32 v)``` ```i32 llvm.hpvm.atomic.or(i8* m, i32 v)```
Atomically computes the bitwise OR of ```v``` and the value stored at memory location ```[m]```. Returns the value previously stored at ```[m]```. Atomically computes the bitwise OR of ```v``` and the value stored at memory location ```[m]``` w.r.t. the dynamic instances of the current leaf node and stores the result back into ```[m]```. Returns the value previously stored at ```[m]```.
```i32 llvm.hpvm.atomic.xor(i8* m, i32 v)``` ```i32 llvm.hpvm.atomic.xor(i8* m, i32 v)```
Atomically computes the bitwise XOR of ```v``` and the value stored at memory location ```[m]```. Returns the value previously stored at ```[m]```. Atomically computes the bitwise XOR of ```v``` and the value stored at memory location ```[m]``` w.r.t. the dynamic instances of the current leaf node and stores the result back into ```[m]```. Returns the value previously stored at ```[m]```.
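The "returns the value previously stored" behaviour of these intrinsics matches the familiar fetch-and-op pattern. The sketch below shows that pattern with standard C11 atomics as an analogy only; it is not how HPVM implements its intrinsics, and the wrapper names ```fetch_add``` and ```fetch_min``` are hypothetical. C11 has no built-in atomic minimum, so the sketch builds one from a compare-and-swap loop.

```c
#include <stdatomic.h>

/* Fetch-and-add: stores *m + v into *m and returns the previous value. */
int fetch_add(atomic_int *m, int v) {
    return atomic_fetch_add(m, v);
}

/* Fetch-and-min: stores min(*m, v) into *m and returns the previous value.
   Built from a compare-and-swap loop. */
int fetch_min(atomic_int *m, int v) {
    int old = atomic_load(m);
    /* Only attempt a store while v is smaller than the observed value.
       On a failed exchange, `old` is refreshed with the current contents
       of *m and the condition is re-evaluated. */
    while (old > v && !atomic_compare_exchange_weak(m, &old, v)) {
        /* retry with the refreshed value of old */
    }
    return old;
}
```

With a cell initialised to 10, ```fetch_add(&cell, 5)``` returns 10 and leaves 15 in the cell; a subsequent ```fetch_min(&cell, 7)``` returns 15 and leaves 7.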
```void llvm.hpvm.barrier()``` ```void llvm.hpvm.barrier()```
Local synchronization barrier across dynamic instances of current leaf node. Local synchronization barrier across dynamic instances of current leaf node.
......