```void __hpvm__barrier()```
Local synchronization barrier across dynamic instances of current leaf node.
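A barrier is typically needed when one dynamic instance reads a value that a different instance of the same leaf node wrote. The sketch below is a plain-C, sequential model of that ordering guarantee, not HPVM-C code. The instances are simulated by loops over an instance ID, and the hypothetical function `shift_with_barrier` splits the leaf body into two phases at the point where `__hpvm__barrier()` would be called. It illustrates what the barrier guarantees, not how HPVM implements it.

```c
#include <assert.h>
#include <stddef.h>

/* Sequential model of a leaf node with n dynamic instances that calls a
 * barrier between two phases. Hypothetical example, not HPVM-C: it only
 * shows the ordering rule that no instance runs the code after the barrier
 * until every instance has finished the code before it. */
void shift_with_barrier(const int *in, int *scratch, int *out, size_t n) {
    assert(n > 0);

    /* Phase 1: each instance id writes only its own slot. */
    for (size_t id = 0; id < n; ++id)
        scratch[id] = in[id] * 2;

    /* Barrier point: every write above completes before any read below. */

    /* Phase 2: each instance reads a slot written by a different instance. */
    for (size_t id = 0; id < n; ++id)
        out[id] = scratch[(id + 1) % n];
}
```

Without the barrier, an instance in phase 2 could read `scratch[id + 1]` before its neighbour had written it; running the two phases as separate loops is what makes the read safe in this model.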
# Porting a Program from C to HPVM-C
The following are the steps required to port a regular C program to an HPVM program written in HPVM-C. These steps are described at a high level; for more detail, please see the benchmarks provided in [benchmarks](/hpvm/test/benchmarks).
* Separate the computation that will become a kernel into its own (leaf node) function, and add the attributes and target hint (`__hpvm__attributes()` and `__hpvm__hint()`).
* Create a level 1 wrapper node function that describes the thread-level parallelism. This node will:
    * Instantiate one or more kernels and specify how many threads will execute each kernel using the createNode\[ND]() method.
    * Bind its arguments to the kernel arguments.
    * If the wrapper node contains more than one kernel and the kernels depend on each other, create an edge that connects the output of one kernel to the input of the other.
* If desired, create a level 2 wrapper node function that describes the threadblock-level parallelism. This node will:
    * Instantiate one or more level 1 wrapper nodes and specify how many threadblocks will execute them using the createNode\[ND]() method. These become its child nodes.
    * Bind its arguments to its child nodes' arguments.
    * If two or more of its children depend on each other, create an edge that connects the output of one node to the input of the other.
* Create a root node function that instantiates all the top-level wrapper nodes, binds their arguments, and connects their edges. Each root node represents one DFG.
* Each of the node functions above takes the combined arguments of all the kernels nested within it.
* The host code must do the following:
    * Initialize the HPVM runtime using the init() method.
    * Create an argument struct for each DFG and assign its member variables.
    * Add all the memory required by the kernels to the memory tracker.
    * Launch each DFG by calling the launch() method on its root node function and passing the corresponding argument struct.
    * Wait for the DFG to complete execution.
    * Read out any generated memory using the request_mem() method.
    * Remove all the tracked memory from the memory tracker.
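The node hierarchy described by these steps can be sketched in plain C for a vector addition. This is a sequential model under stated assumptions, not HPVM-C code and not the structure of the HPVM benchmarks: each createNode\[ND]() instantiation is modelled as a loop over dynamic instance IDs, argument binding as ordinary parameter passing, and the root argument struct as a C struct. The functions `vadd_leaf`, `vadd_level1`, `vadd_level2`, and `vadd_root`, and the type `RootArgs`, are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Sequential model of the HPVM node hierarchy (hypothetical, not HPVM-C).
 * Each "createNode[ND]" is modelled as a loop over instance IDs. */

/* Leaf node: the computation for ONE dynamic instance (one "thread"). */
void vadd_leaf(const float *a, const float *b, float *c, size_t n,
               size_t block_id, size_t block_size, size_t thread_id) {
    size_t i = block_id * block_size + thread_id; /* global element index */
    if (i < n)                                    /* guard the last block */
        c[i] = a[i] + b[i];
}

/* Level 1 wrapper: runs block_size instances of the leaf node. */
void vadd_level1(const float *a, const float *b, float *c, size_t n,
                 size_t block_id, size_t block_size) {
    for (size_t t = 0; t < block_size; ++t)
        vadd_leaf(a, b, c, n, block_id, block_size, t);
}

/* Level 2 wrapper: runs num_blocks instances of the level 1 wrapper. */
void vadd_level2(const float *a, const float *b, float *c, size_t n,
                 size_t num_blocks, size_t block_size) {
    for (size_t blk = 0; blk < num_blocks; ++blk)
        vadd_level1(a, b, c, n, blk, block_size);
}

/* Argument struct: the combined arguments of every nested node. */
typedef struct {
    const float *a;
    const float *b;
    float *c;
    size_t n;
    size_t block_size;
} RootArgs;

/* Root node (one DFG): derives the block count and instantiates level 2. */
void vadd_root(RootArgs *args) {
    assert(args->block_size > 0);
    size_t num_blocks = (args->n + args->block_size - 1) / args->block_size;
    vadd_level2(args->a, args->b, args->c, args->n, num_blocks,
                args->block_size);
}
```

For example, with `n = 5` and `block_size = 2`, the root runs three blocks of two instances each, and the bounds check in `vadd_leaf` skips the sixth index. The host-side steps (init(), memory tracking, launch(), waiting, and request_mem()) have no counterpart in this model, because everything runs synchronously in host memory.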