Commit 9477ccd9 authored by Adel Ejjeh

Update hpvm-c.md

parent 8fcc3454
@@ -110,23 +110,22 @@ Local synchronization barrier across dynamic instances of current leaf node.
# Porting a Program from C to HPVM-C
The follwing represents the required steps to port a regular C program into an HPVM program with HPVM-C. These steps are described at a high level; for more detail, please see the benchmarks provided in [benchmarks](/hpvm/test/benchmarks).
* Separate the computation that will become a kernel into its own (leaf node) function and add the attributes and target hint
* Create a level 1 wrapper node function that will describe the thread-level parallelism. The node will:
* Instantiate one (or more) kernels and specify how many threads will execute the kernel using the createNode\[ND]() method.
* Binds its arguments to the kernel arguments.
* If more than one kernel are in the wrapper node and depend on each other, an edge is created to connect the output of one kernel to the input of the other.
* If desired, create a level 2 wrapper node function which will describe the threadblock-level pallalism. This node will:
* Instantiate 1 or more level 1 wrapper nodes and specify how many threadblocks will ecexute them using the createNode\[ND]() method. These will become its child nodes.
* Bind its arguments to its child nodes' arguments.
* If more than one of its children depend on each other, an edge is created to connect the output of one node, to the input of the other.
The following represents the required steps to port a regular C program into an HPVM program with HPVM-C. These steps are described at a high level; for more detail, please see the benchmarks provided in [benchmarks](/hpvm/test/benchmarks).
* Separate the computation that will become a kernel into its own (leaf node) function and add the attributes and target hint.
* Create a level 1 wrapper node function that will describe the thread-level parallelism (for the GPU). The node will:
* Use the ```createNode[ND]()``` method to create a kernel node and specify how many threads will execute it.
* Bind its arguments to the kernel arguments.
* If desired, create a level 2 wrapper node function which will describe the threadblock-level parallelism (for the GPU). This node will:
* Use the ```createNode[ND]()``` method to create a level 1 wrapper node and specify how many threadblocks will execute it.
* Bind its arguments to its child node's arguments.
* Create a root node function that creates all the top-level wrapper nodes, binds their arguments, and connects their edges.
* All the above node functions have the combined arguments of all the kernel that are nested at each level. Each root node represents a DFG.
* Each root node represents a DFG.
* All the above node functions have the combined arguments of all the kernels that are nested at each level.
* The host code will have to include the following:
* Initialize the HPVM runtime using the init() method.
* Initialize the HPVM runtime using the ```init()``` method.
* Create an argument struct for each DFG and assign its member variables.
* Add all the memory that is required by the kernel into the memory tracker.
* Launch the DFG by calling the launch() method on the root node function, and passing the corresponding argument struct.
* Wait for the DFG to complete execytion.
* Read out any generated memory using the request_mem() method
* Launch the DFG by calling the ```launch()``` method on the root node function, and passing the corresponding argument struct.
* Wait for the DFG to complete execution.
* Read out any generated memory using the ```request_mem()``` method.
* Remove all the tracked memory from the memory tracker.
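
The host-side steps above can be sketched in plain C as follows. This is an illustration only, under stated assumptions: every `mock_*` function, the `RootArgs` struct, the `MockDfg` handle, and the vector-add kernel are hypothetical stand-ins chosen so the example builds with an ordinary C compiler. They are not the HPVM-C intrinsics, and the root function here simply loops over the leaf function instead of building a real dataflow graph. See the benchmarks for actual HPVM-C code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the HPVM runtime (NOT the real HPVM-C API). */
#define MAX_TRACKED 8
static const void *tracked[MAX_TRACKED]; /* toy memory tracker */
static size_t n_tracked = 0;
static int runtime_ready = 0;

static void mock_init(void) { runtime_ready = 1; n_tracked = 0; }
static void mock_cleanup(void) { runtime_ready = 0; }

static void mock_track_mem(const void *p) {
    assert(n_tracked < MAX_TRACKED);
    tracked[n_tracked++] = p;
}

static int mock_is_tracked(const void *p) {
    for (size_t i = 0; i < n_tracked; ++i)
        if (tracked[i] == p) return 1;
    return 0;
}

static void mock_untrack_mem(const void *p) {
    for (size_t i = 0; i < n_tracked; ++i) {
        if (tracked[i] == p) {
            tracked[i] = tracked[n_tracked - 1]; /* fill the gap */
            --n_tracked;
            return;
        }
    }
}

/* A real runtime would copy device data back to the host here. */
static void mock_request_mem(const void *p) { assert(mock_is_tracked(p)); }

/* Argument struct for one DFG: the combined arguments of its kernels. */
typedef struct {
    const float *a;
    const float *b;
    float *c;
    size_t n;
} RootArgs;

/* Leaf node stand-in: the work done by one dynamic instance (index i). */
static void vec_add_leaf(const float *a, const float *b, float *c, size_t i) {
    c[i] = a[i] + b[i];
}

/* Root node stand-in: runs n instances of the leaf sequentially. */
static void vec_add_root(const RootArgs *args) {
    for (size_t i = 0; i < args->n; ++i)
        vec_add_leaf(args->a, args->b, args->c, i);
}

typedef struct { int done; } MockDfg; /* hypothetical DFG handle */

static MockDfg mock_launch(void (*root)(const RootArgs *), const RootArgs *args) {
    assert(runtime_ready);
    root(args); /* the mock executes synchronously */
    MockDfg dfg = { 1 };
    return dfg;
}

static void mock_wait(const MockDfg *dfg) { assert(dfg->done); }

/* Host code following the listed steps.
   Returns the number of buffers still tracked at the end (0 if none). */
size_t run_vec_add(const float *a, const float *b, float *c, size_t n) {
    mock_init();                                    /* initialize the runtime */
    RootArgs args = { a, b, c, n };                 /* fill the argument struct */
    mock_track_mem(a);                              /* track kernel memory */
    mock_track_mem(b);
    mock_track_mem(c);
    MockDfg dfg = mock_launch(vec_add_root, &args); /* launch the DFG */
    mock_wait(&dfg);                                /* wait for completion */
    mock_request_mem(c);                            /* read out the result */
    mock_untrack_mem(a);                            /* untrack everything */
    mock_untrack_mem(b);
    mock_untrack_mem(c);
    size_t leftover = n_tracked;
    mock_cleanup();
    return leftover;
}
```

Calling `run_vec_add(a, b, c, n)` on three `float` arrays of length `n` fills `c` with the element-wise sum. The order of the calls mirrors the list above; in a real HPVM-C program the mock calls would be replaced by the corresponding runtime calls and the root function would create and bind nodes rather than loop.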