diff --git a/CREDITS.txt b/CREDITS.txt
new file mode 100644
index 0000000000000000000000000000000000000000..1c996e47ee6ddd6c7b6e7b2c1ed31ab9fe23cf8a
--- /dev/null
+++ b/CREDITS.txt
@@ -0,0 +1,36 @@
+This file is a partial list of people who have contributed to the HPVM-CAVA
+pilot project.
+
+As with LLVM's credits, the list is sorted by surname and formatted to allow
+easy grepping and beautification by scripts.  The fields are: name (N) and
+email (E).
+
+N: Vikram Adve
+E: vadve@illinois.edu
+
+N: Sarita Adve
+E: sadve@illinois.edu
+
+N: Adel Ejjeh
+E: aejjeh@illinois.edu
+
+N: Kapil Kanwar
+E: kkanwar2@illinois.edu
+
+N: Akash Kothari
+E: akashk4@illinois.edu
+
+N: Maria Kotsifakou
+E: kotsifa2@illinois.edu
+
+N: Sasa Misailovic
+E: misailo@illinois.edu
+
+N: Benjamin Schreiber
+E: bjschre2@illinois.edu
+
+N: Hashim Sharif
+E: hsharif3@illinois.edu
+
+N: Yifan Zhao
+E: yifanz16@illinois.edu
diff --git a/LICENSE b/LICENSE
new file mode 100644
index 0000000000000000000000000000000000000000..7eeac83a8c557b8cc45979a47e1238c6be38b653
--- /dev/null
+++ b/LICENSE
@@ -0,0 +1,34 @@
+University of Illinois/NCSA Open Source License
+
+Copyright (c) 2020 Illinois LLVM Group. All rights reserved.
+
+Developed by: The Illinois LLVM Group
+              University of Illinois at Urbana Champaign
+              https://hpvm.cs.illinois.edu
+
+Permission is hereby granted, free of charge, to any person
+obtaining a copy of this software and associated documentation files
+(the "Software"), to deal with the Software without restriction,
+including without limitation the rights to use, copy, modify, merge,
+publish, distribute, sublicense, and/or sell copies of the Software,
+and to permit persons to whom the Software is furnished to do so,
+subject to the following conditions:
+
+* Redistributions of source code must retain the above copyright notice,
+  this list of conditions and the following disclaimers.
+
+* Redistributions in binary form must reproduce the above copyright
+  notice, this list of conditions and the following disclaimers in the
+  documentation and/or other materials provided with the distribution.
+
+* Neither the names of [fullname], [project] nor the names of its
+  contributors may be used to endorse or promote products derived from
+  this Software without specific prior written permission.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS WITH
+THE SOFTWARE.
diff --git a/README.md b/README.md
index 0ea66d71920e43417184d1d48f8bc9f3b805cd08..51e6de9d74f7176c75e2c6dacc71f546573ec55c 100644
--- a/README.md
+++ b/README.md
@@ -5,9 +5,11 @@ This repository contains the source code and documentation for the HPVM Compiler
 The README briefly describes how to get started with building and installing HPVM. It also provides a
 benchmark suite to test the compiler infrastructure.
 
+HPVM is currently at version 0.5. For more about what HPVM is, see [our website](https://publish.illinois.edu/hpvm-project/).
+
 ## Paper
 
-[PPoPP'18 paper](http://rsim.cs.illinois.edu/Pubs/17-PPOPP-HPVM.pdf)
+[PPoPP'18 paper](https://dl.acm.org/doi/pdf/10.1145/3200691.3178493)
 
 ## Docs
 [HPVM IR Specification](/hpvm/docs/hpvm-specification.md)
@@ -65,7 +67,15 @@ In order to manually build and install HPVM, GNU Make can be run using the follo
 make -j<number of threads>
 make install
 ```
-With all the aforementioned steps, HPVM should be built, installed and ready to use.
+
+At the end of the installation process, the installer automatically runs all the regression tests to ensure that the installation is
+successful. If HPVM is built and installed manually, the tests can be run by executing the following command from the
+current directory.
+```shell
+bash scripts/automate_tests.sh
+```
+
+With all the aforementioned steps, HPVM should be built, installed, tested and ready to use.
 
 ## Benchmarks and Tests
 We are providing the following [HPVM benchmarks](/hpvm/test/benchmarks):
diff --git a/hpvm/docs/compilation.md b/hpvm/docs/compilation.md
index 8e68d00174b6fb63bfb647a0dbee1aa5dbd10b6a..6381fec7d856c79fdd2ed31bc23fe02990c9e38d 100644
--- a/hpvm/docs/compilation.md
+++ b/hpvm/docs/compilation.md
@@ -5,11 +5,11 @@ Compilation of an HPVM program involves the following steps:
 2. `opt` takes (`main.ll`) and invoke the GenHPVM pass on it, which converts the HPVM-C function calls to HPVM intrinsics. This generates the HPVM textual representation (`main.hpvm.ll`).
 3. `opt` takes the HPVM textual representation (`main.hpvm.ll`) and invokes the following passes in sequence: 
     * BuildDFG: Converts the textual representation to the internal HPVM representation.
-    * LocalMem and DFG2LLVM_NVPTX: Invoked only when GPU target is selected. Generates the kernel module (`main.kernels.ll`) and the portion of the host code that invokes the kernel into the host module (`main.host.ll`).
-    * DFG2LLVM_X86: Generates either all, or the remainder of the host module (`main.host.ll`) depending on the chosen target.
+    * LocalMem and DFG2LLVM_OpenCL: Invoked only when GPU target is selected. Generates the kernel module (`main.kernels.ll`) and the portion of the host code that invokes the kernel into the host module (`main.host.ll`).
+    * DFG2LLVM_CPU: Generates either all, or the remainder of the host module (`main.host.ll`) depending on the chosen target.
     * ClearDFG: Deletes the internal HPVM representation from memory.
 4. `clang` is used to to compile any remaining project files that would be later linked with the host module.
 5. `llvm-link` takes the host module and all the other generate `ll` files, and links them with the HPVM runtime module (`hpvm-rt.bc`), to generate the linked host module (`main.host.linked.ll`). 
 6. Generate the executable code from the generated `ll` files for all parts of the program:
     * GPU target: `llvm-cbe` takes the kernel module (`main.kernels.ll`) and generates an OpenCL representation of the kernels that will be invoked by the host.
-    * X86 target: `clang` takes the linked  host module (`main.host.linked.ll`) and generates the X86 binary.
+    * CPU target: `clang` takes the linked host module (`main.host.linked.ll`) and generates the CPU binary.
diff --git a/hpvm/docs/hpvm-c.md b/hpvm/docs/hpvm-c.md
index 77bc684b16eb6462d7d61cffbc50f258b454b1f6..76cfde58c0e406896606bc9703e88d8a9bf27fa7 100644
--- a/hpvm/docs/hpvm-c.md
+++ b/hpvm/docs/hpvm-c.md
@@ -1,4 +1,9 @@
-# HPVM-C Language
+# HPVM-C Language Specification
+An HPVM program is a combination of host code and one or more data flow graphs (DFGs) at the IR level. We provide C function declarations representing the HPVM intrinsics that allow creating, querying, and interacting with the DFGs. More details about the HPVM IR intrinsics can be found in [the HPVM IR Specification](/hpvm/docs/hpvm-specification.md).
+
+An HPVM-C program contains both the host and the DFG code. Each HPVM kernel, represented by a leaf node in the DFG, can be compiled to multiple different targets (e.g. CPU and GPU) as described below.
+
+This document describes all the API calls that can be used in an HPVM-C program.
 
 ## Host API
 
@@ -102,3 +107,25 @@ Atomically computes the bitwise XOR of ```v``` and the value stored at memory lo
 
 ```void __hpvm__barrier()```  
 Local synchronization barrier across dynamic instances of current leaf node.
+
+# Porting a Program from C to HPVM-C
+
+The following steps are required to port a regular C program to an HPVM program using HPVM-C. They are described at a high level; for more detail, please see [hpvm-cava](/hpvm/test/benchmarks/hpvm-cava) provided in [benchmarks](/hpvm/test/benchmarks).
+* Separate the computation that will become a kernel into its own (leaf node) function and add the attributes and target hint.
+* Create a level 1 wrapper node function that will describe the thread-level parallelism (for the GPU). The node will:
+    * Use the ```createNode[ND]()``` method to create a kernel node and specify how many threads will execute it.
+    * Bind its arguments to the kernel arguments.
+* If desired, create a level 2 wrapper node function which will describe the threadblock-level parallelism (for the GPU). This node will:
+    * Use the ```createNode[ND]()``` method to create a level 1 wrapper node and specify how many threadblocks will execute it.
+    * Bind its arguments to its child node's arguments.
+* Create a root node function that creates all the top-level wrapper nodes, binds their arguments, and connects their edges.
+    * Each root node represents a DFG.
+* All of the above node functions take the combined arguments of all the kernels that are nested at each level.
+* The host code will have to include the following:
+    * Initialize the HPVM runtime using the ```init()``` method.
+    * Create an argument struct for each DFG and assign its member variables.
+    * Add all the memory that is required by the kernel into the memory tracker.
+    * Launch the DFG by calling the ```launch()``` method on the root node function, and passing the corresponding argument struct.
+    * Wait for the DFG to complete execution.
+    * Read out any generated memory using the ```request_mem()``` method.
+    * Remove all the tracked memory from the memory tracker.
diff --git a/hpvm/docs/hpvm-specification.md b/hpvm/docs/hpvm-specification.md
index cd61d95b4e3d4f4068a985bd5f3bac4578f6e14d..54023fc9eddc4d9f317ac1b4cc585e52b98b8ae5 100644
--- a/hpvm/docs/hpvm-specification.md
+++ b/hpvm/docs/hpvm-specification.md
@@ -1,7 +1,22 @@
-# HPVM Abstraction
+# Table of Contents
+* [HPVM Abstraction](#abstraction)
+    * [Dataflow Node](#node)
+    * [Dataflow Edge](#edge)
+    * [Input and Output Bind](#bind)
+    * [Host Code](#host)
+* [HPVM Implementation](#implementation)
+    * [Intrinsics for Describing Graphs](#describing)
+    * [Intrinsics for Querying Graphs](#querying)
+    * [Intrinsics for Memory Allocation and Synchronization](#memory)
+    * [Intrinsics for Graph Interaction](#interaction)
+* [Implementation Limitations](#limitations)
+
+<a name="abstraction"></a>
+# HPVM Abstraction
 An HPVM program is a combination of host code plus a set of one or more distinct dataflow graphs. Each dataflow graph (DFG) is a hierarchical graph with side effects. The DFG must be acyclic. Nodes represent units of execution, and edges between nodes describe the explicit data transfer requirements. A node can begin execution once a data item becomes available on every one of its input edges. Repeated transfer of data items between nodes (if more inputs are provided) yields a pipelined execution of different nodes in the graph. The execution of a DFG is initiated and terminated by host code that launches the graph. Nodes may access globally shared memory through load and store instructions (side-effects).
 
-## Dataflow Node
+<a name="node"></a>
+## Dataflow Node
 A *dataflow node* represents unit of computation in the DFG. A node can begin execution once a data item becomes available on every one of its input edges.
 
 A single static dataflow node represents multiple dynamic instances of the node, each executing the same computation with different index values used to uniquely identify each dynamic instance w.r.t. the others. The dynamic instances of a node may be executed concurrently, and any required synchronization must imposed using HPVM synchronization operations.
@@ -14,8 +29,8 @@ Leaf nodes contain code expressing actual computations. Leaf nodes may contain i
 
 Note that the graph is fully interpreted at compile-time and  cannot be modified at runtime except for the number of dynamic instances, which can be data dependent.
 
-
-## Dataflow Edge
+<a name="edge"></a>
+## Dataflow Edge
 A *dataflow edge* from the output ```out``` of a source dataflow node ```Src``` to the input ```in``` of a sink dataflow node ```Dst``` describes the explicit data transfer requirements. ```Src``` and ```Dst``` node must belong to the same child graph, i.e. must be children of the same internal node.
 
 An edge from source to sink has the semantics of copying the specified data from the source to the sink after the source node has completed execution. The pairs ```(Src, out)``` and ```(Dst, in)```, representing source and sink respectively, must be unique w.r.t. every other edge in the same child graph, i.e. two dataflow edges in the same child graph cannot have the same source or destination.
@@ -26,7 +41,8 @@ An edge can be instantiated at runtime using one of two replication mechanisms:
 - *All-to-all*, where all dynamic instances of the source node are connected to all dynamic instances of the sink node, thus expressing a synchronization barrier between the two groups of nodes, or
 - *One-to-one*, where each dynamic instance of the source node is connected to a single corresponding instance of the sink node. One-to-one replication requires that the grid structure (number of dimensions and the extents in each dimension) of the source and sink nodes be identical.
 
-## Input and Output Bind
+<a name="bind"></a>
+## Input and Output Bind
 An internal node is responsible for mapping its inputs, provided by incoming dataflow edges, to the inputs to one or more nodes of the child graph.
 
 An internal node binds its input ```ip``` to input ```ic``` of its child node ```Dst``` using an *input bind*.
@@ -36,7 +52,8 @@ Conversely, an internal node binds output ```oc``` of its child node ```Src``` t
 
 A bind is always ***all-to-all***.
 
-## Host Code
+<a name="host"></a>
+## Host Code
 In an HPVM program, the host code is responsible for setting up, initiating the execution and blocking for completion of a DFG. The host can interact with the DFG to sustain a streaming computation by sending all data required for, and receiving all data produced by, one execution of the DFG. The list of actions that can be performed by the host is described below:
 
 - **Initialization and Cleanup**:
@@ -60,7 +77,8 @@ The host code blocks for completion of specified DFG.
     - For a non-streaming DFG, the data produced by the DFG are ready to be read by the host.
     - For a streaming DFG, no more data may be provided for processing by the DFG.
 
-# HPVM Implementation
+<a name="implementation"></a>
+# HPVM Implementation
 
 This section describes the implementation of HPVM on top of LLVM IR.
 
@@ -78,7 +96,8 @@ We represent nodes with opaque handles (pointers of LLVM type i8\*). We represen
 
 Pointer arguments of node functions are required to be annotated with attributes in, and/or out, depending on their expected use (read only, write only, read write).
 
-## Intrinsics for Describing Graphs
+<a name="describing"></a>
+## Intrinsics for Describing Graphs
 
 The intrinsics for describing graphs can only be used by internal nodes. Also, internal nodes are only allowed to have these intrinsics as part of their node function, with the exception of a return statement of the appropriate type, in order to return the result of the outgoing dataflow edges.
 
@@ -104,7 +123,8 @@ Bind input ```ip``` of current node to input ```ic``` of child node ```N```. Arg
 ```void llvm.hpvm.bind.output(i8* N, i32 oc, i32 op, i1 isStream)```  
 Bind output ```oc``` of child node ```N``` to output ```op``` of current node. Field ```oc``` of the return struct in ```N```'s node function and field ```op``` of the return struct in the current node function must have matching types. ```isStream``` chooses a streaming (1) or non streaming (0) bind.
 
-## Intrinsics for Querying Graphs
+<a name="querying"></a>
+## Intrinsics for Querying Graphs
 
 The following intrinsics are used to query the structure of the DFG. They can only be used by leaf nodes.
 
@@ -123,6 +143,7 @@ Get index of current dynamic node instance of node ```N``` in dimension x, y or
 ```i64 llvm.hpvm.getNumNodeInstances.{x,y,z}(i8* N)```  
 Get number of dynamic instances of node ```N``` in dimension x, y or z respectively. The dimension must be one of the dimensions in which the node is replicated.
 
+<a name="memory"></a>
 ## Intrinsics for Memory Allocation and Synchronization
 
 The following intrinsics are used for memory allocation and synchronization. They can only be used by leaf nodes.
@@ -158,6 +179,7 @@ Atomically computes the bitwise XOR of ```v``` and the value stored at memory lo
 ```void llvm.hpvm.barrier()```  
 Local synchronization barrier across dynamic instances of current leaf node.
 
+<a name="interaction"></a>
 ## Intrinsics for Graph Interaction
 
 The following intrinsics are for graph initialization/termination and interaction with the host code, and can be used only by the host code.
@@ -189,6 +211,7 @@ Push set of input data ```args``` (same as type included in launch) to streaming
 ```i8* llvm.hpvm.pop(i8* GraphID)```  
 Pop and return data from streaming DFG with handle ```GraphID```. The return type is a struct containing a field for every output of DFG. 
 
+<a name="limitations"></a>
 ## Implementation Limitations
 Due to limitations of our current prototype implementation, the following restrictions are imposed:
 
diff --git a/hpvm/include/BuildDFG/BuildDFG.h b/hpvm/include/BuildDFG/BuildDFG.h
index ca4c616da5f4076528b1294992ec8ad3ab768809..fd36a2593e4f0200e70a2244a2c3b95c6a6d0823 100644
--- a/hpvm/include/BuildDFG/BuildDFG.h
+++ b/hpvm/include/BuildDFG/BuildDFG.h
@@ -9,6 +9,12 @@
 // License. See LICENSE.TXT for details.
 //
 //===----------------------------------------------------------------------===//
+//
+// This file defines the BuildDFG pass, which uses LLVM IR with HPVM intrinsics
+// to infer information about the dataflow graph hierarchy and structure in
+// order to construct HPVM IR.
+//
+//===----------------------------------------------------------------------===//
 
 #include "SupportHPVM/DFGraph.h"
 #include "llvm/IR/Function.h"
diff --git a/hpvm/include/GenHPVM/GenHPVM.h b/hpvm/include/GenHPVM/GenHPVM.h
index 24798bc2740e2299f67cc7f515437339f2fe8310..f61d4a7c90dff2e4bff5f781c6c9cc92e3246232 100644
--- a/hpvm/include/GenHPVM/GenHPVM.h
+++ b/hpvm/include/GenHPVM/GenHPVM.h
@@ -1,4 +1,4 @@
-//== GenHPVM.h - Header file for "LLVM IR to HPVM IR Pass" =//
+//==------- GenHPVM.h - Header file for "LLVM IR to HPVM IR Pass" ----------==//
 //
 //                     The LLVM Compiler Infrastructure
 //
@@ -6,6 +6,12 @@
 // License. See LICENSE.TXT for details.
 //
 //===----------------------------------------------------------------------===//
+//
+// This file defines the GenHPVM pass responsible for converting HPVM-C to
+// HPVM intrinsics. Note that this pass relies on the memory-to-register
+// optimization pass having executed before it runs.
+//
+//===----------------------------------------------------------------------===//
 
 #include "SupportHPVM/HPVMTimer.h"
 #include "llvm/IR/DerivedTypes.h"
diff --git a/hpvm/include/SupportHPVM/DFG2LLVM.h b/hpvm/include/SupportHPVM/DFG2LLVM.h
index 07147c6d909f5352dd886b5f8bc1a2b0ae434ffe..e517a7d542ccde98d4e212c59803226045fce632 100644
--- a/hpvm/include/SupportHPVM/DFG2LLVM.h
+++ b/hpvm/include/SupportHPVM/DFG2LLVM.h
@@ -9,6 +9,11 @@
 // License. See LICENSE.TXT for details.
 //
 //===----------------------------------------------------------------------===//
+//
+// This file defines classes for traversing the dataflow graph to generate
+// code for different nodes and different targets.
+//
+//===----------------------------------------------------------------------===//
 
 #include "BuildDFG/BuildDFG.h"
 #include "SupportHPVM/HPVMHint.h"
diff --git a/hpvm/include/SupportHPVM/DFGraph.h b/hpvm/include/SupportHPVM/DFGraph.h
index d904e2401d7e9a58a38e9bca024de1a437cd56d1..2deb2ca8f5c17620da0ddf60e1ef269acde52235 100644
--- a/hpvm/include/SupportHPVM/DFGraph.h
+++ b/hpvm/include/SupportHPVM/DFGraph.h
@@ -51,11 +51,11 @@ struct TargetGenFunctions {
 };
 
 struct TargetGenFuncInfo {
-  bool cpu_hasX86Func;
-  bool gpu_hasX86Func;
-  bool spir_hasX86Func;
-  bool cudnn_hasX86Func;
-  bool promise_hasX86Func;
+  bool cpu_hasCPUFunc;
+  bool gpu_hasCPUFunc;
+  bool spir_hasCPUFunc;
+  bool cudnn_hasCPUFunc;
+  bool promise_hasCPUFunc;
 };
 
 class DFGraph {
@@ -191,7 +191,7 @@ private:
   ///< (if multiple are available)
   struct TargetGenFuncInfo GenFuncInfo;
   ///< True for each target generated function
-  ///< if the associated genFunc is an x86 function
+  ///< if the associated genFunc is a CPU function
   DFInternalNode *Parent;         ///< Pointer to parent dataflow Node
   unsigned NumOfDim;              ///< Number of dimensions
   std::vector<Value *> DimLimits; ///< Number of instances in each dimension
@@ -349,15 +349,15 @@ public:
 
   Function *getGenFunc() const { return GenFunc; }
 
-  void setHasX86FuncForTarget(hpvm::Target T, bool isX86Func) {
+  void setHasCPUFuncForTarget(hpvm::Target T, bool isCPUFunc) {
     switch (T) {
     case hpvm::None:
       return; // Do nothing.
     case hpvm::CPU_TARGET:
-      GenFuncInfo.cpu_hasX86Func = isX86Func;
+      GenFuncInfo.cpu_hasCPUFunc = isCPUFunc;
       break;
     case hpvm::GPU_TARGET:
-      GenFuncInfo.gpu_hasX86Func = isX86Func;
+      GenFuncInfo.gpu_hasCPUFunc = isCPUFunc;
       break;
     case hpvm::CPU_OR_GPU_TARGET:
       break;
@@ -368,14 +368,14 @@ public:
     return;
   }
 
-  bool hasX86GenFuncForTarget(hpvm::Target T) const {
+  bool hasCPUGenFuncForTarget(hpvm::Target T) const {
     switch (T) {
     case hpvm::None:
       return false;
     case hpvm::CPU_TARGET:
-      return GenFuncInfo.cpu_hasX86Func;
+      return GenFuncInfo.cpu_hasCPUFunc;
     case hpvm::GPU_TARGET:
-      return GenFuncInfo.gpu_hasX86Func;
+      return GenFuncInfo.gpu_hasCPUFunc;
     case hpvm::CPU_OR_GPU_TARGET:
       assert(false && "Single target expected (CPU/GPU/SPIR/CUDNN/PROMISE)\n");
     default:
@@ -384,7 +384,7 @@ public:
     return false;
   }
 
-  void addGenFunc(Function *F, hpvm::Target T, bool isX86Func) {
+  void addGenFunc(Function *F, hpvm::Target T, bool isCPUFunc) {
 
     switch (T) {
     case hpvm::CPU_TARGET:
@@ -393,7 +393,7 @@ public:
                      << FuncPointer->getName() << "\n");
       }
       GenFuncs.CPUGenFunc = F;
-      GenFuncInfo.cpu_hasX86Func = isX86Func;
+      GenFuncInfo.cpu_hasCPUFunc = isCPUFunc;
       break;
     case hpvm::GPU_TARGET:
       if (GenFuncs.GPUGenFunc != NULL) {
@@ -401,7 +401,7 @@ public:
                      << FuncPointer->getName() << "\n");
       }
       GenFuncs.GPUGenFunc = F;
-      GenFuncInfo.gpu_hasX86Func = isX86Func;
+      GenFuncInfo.gpu_hasCPUFunc = isCPUFunc;
       break;
     case hpvm::CPU_OR_GPU_TARGET:
       assert(false && "A node function should be set with a tag specifying its \
@@ -437,11 +437,11 @@ public:
       return;
     case hpvm::CPU_TARGET:
       GenFuncs.CPUGenFunc = NULL;
-      GenFuncInfo.cpu_hasX86Func = false;
+      GenFuncInfo.cpu_hasCPUFunc = false;
       break;
     case hpvm::GPU_TARGET:
       GenFuncs.GPUGenFunc = NULL;
-      GenFuncInfo.gpu_hasX86Func = false;
+      GenFuncInfo.gpu_hasCPUFunc = false;
       break;
     case hpvm::CPU_OR_GPU_TARGET:
       assert(false &&
@@ -690,11 +690,11 @@ DFNode::DFNode(IntrinsicInst *_II, Function *_FuncPointer, hpvm::Target _Hint,
   GenFuncs.CUDNNGenFunc = NULL;
   GenFuncs.PROMISEGenFunc = NULL;
 
-  GenFuncInfo.cpu_hasX86Func = false;
-  GenFuncInfo.gpu_hasX86Func = false;
-  GenFuncInfo.spir_hasX86Func = false;
-  GenFuncInfo.cudnn_hasX86Func = false;
-  GenFuncInfo.cudnn_hasX86Func = false;
+  GenFuncInfo.cpu_hasCPUFunc = false;
+  GenFuncInfo.gpu_hasCPUFunc = false;
+  GenFuncInfo.spir_hasCPUFunc = false;
+  GenFuncInfo.cudnn_hasCPUFunc = false;
+  GenFuncInfo.promise_hasCPUFunc = false;
 }
 
 void DFNode::setRank(unsigned r) {
diff --git a/hpvm/include/SupportHPVM/HPVMUtils.h b/hpvm/include/SupportHPVM/HPVMUtils.h
index 25b9880180f2cb4590f5b5fcbb3f3f2fbe025f8f..537a92caec4a5c63b0fcc06b3714bddefc0a3fde 100644
--- a/hpvm/include/SupportHPVM/HPVMUtils.h
+++ b/hpvm/include/SupportHPVM/HPVMUtils.h
@@ -7,6 +7,11 @@
 // License. See LICENSE.TXT for details.
 //
 //===----------------------------------------------------------------------===//
+//
+// This file defines utility functions used for target-specific code generation
+// for different nodes of dataflow graphs.
+//
+//===----------------------------------------------------------------------===//
 
 #ifndef HPVM_UTILS_HEADER
 #define HPVM_UTILS_HEADER
diff --git a/hpvm/install.sh b/hpvm/install.sh
index 23f5db63b2ad6366b9db501791ff02b37dac35c6..776b8aa6d11c0d92af507a161fb902b183da7672 100644
--- a/hpvm/install.sh
+++ b/hpvm/install.sh
@@ -1,2 +1,11 @@
+#!/bin/bash
+
+SCRIPTS_DIR=scripts
+
+BASH=/bin/bash
+
 # Run installer script
-/bin/bash llvm_installer/llvm_installer.sh
+$BASH $SCRIPTS_DIR/llvm_installer.sh
+
+# Run the tests
+$BASH $SCRIPTS_DIR/automate_tests.sh
diff --git a/hpvm/lib/Transforms/BuildDFG/BuildDFG.cpp b/hpvm/lib/Transforms/BuildDFG/BuildDFG.cpp
index be3e6cae3dae775716fc3e2206879e978febddb0..3177f860057b48dc1b22bc61940b53da848dca09 100644
--- a/hpvm/lib/Transforms/BuildDFG/BuildDFG.cpp
+++ b/hpvm/lib/Transforms/BuildDFG/BuildDFG.cpp
@@ -6,6 +6,15 @@
 // License. See LICENSE.TXT for details.
 //
 //===----------------------------------------------------------------------===//
+//
+// The BuildDFG pass constructs a dataflow graph from the textual
+// representation of HPVM IR with HPVM intrinsics produced by the GenHPVM pass.
+// It uses three crucial abstractions: the graph itself, dataflow nodes
+// representing functions, and data edges representing the transfer of data
+// between the functions (or nodes in the graph). This pass is part of the HPVM
+// frontend and does not change the textual representation of the IR.
+//
+//===----------------------------------------------------------------------===//
 
 #define DEBUG_TYPE "buildDFG"
 #include "BuildDFG/BuildDFG.h"
diff --git a/hpvm/lib/Transforms/CMakeLists.txt b/hpvm/lib/Transforms/CMakeLists.txt
index 5c9b8b9fe026ea5612caa124535e02d28d619c53..74917773b04146456b84db9b2bbf0814cd9bf387 100644
--- a/hpvm/lib/Transforms/CMakeLists.txt
+++ b/hpvm/lib/Transforms/CMakeLists.txt
@@ -1,6 +1,6 @@
 add_subdirectory(BuildDFG)
 add_subdirectory(ClearDFG)
-add_subdirectory(DFG2LLVM_NVPTX)
-add_subdirectory(DFG2LLVM_X86)
+add_subdirectory(DFG2LLVM_OpenCL)
+add_subdirectory(DFG2LLVM_CPU)
 add_subdirectory(GenHPVM)
 add_subdirectory(LocalMem)
diff --git a/hpvm/lib/Transforms/ClearDFG/ClearDFG.cpp b/hpvm/lib/Transforms/ClearDFG/ClearDFG.cpp
index c23043e7829a8947a995f7ad97688091c46cf23d..c905f745e658426951ae06fbe2a1d85685dfec74 100644
--- a/hpvm/lib/Transforms/ClearDFG/ClearDFG.cpp
+++ b/hpvm/lib/Transforms/ClearDFG/ClearDFG.cpp
@@ -6,6 +6,12 @@
 // License. See LICENSE.TXT for details.
 //
 //===----------------------------------------------------------------------===//
+//
+// This pass removes HPVM intrinsics from HPVM IR. It is the final pass, run as
+// part of cleanup after construction of the dataflow graph and LLVM code
+// generation for different targets from the dataflow graph.
+//
+//===----------------------------------------------------------------------===//
 
 #define DEBUG_TYPE "ClearDFG"
 #include "BuildDFG/BuildDFG.h"
diff --git a/hpvm/lib/Transforms/DFG2LLVM_X86/CMakeLists.txt b/hpvm/lib/Transforms/DFG2LLVM_CPU/CMakeLists.txt
similarity index 79%
rename from hpvm/lib/Transforms/DFG2LLVM_X86/CMakeLists.txt
rename to hpvm/lib/Transforms/DFG2LLVM_CPU/CMakeLists.txt
index 0a3a225f1967dd73d44d1401a2bc45cb8d43ee69..b4e129ba01837cf328912f7787b861f843f4f581 100644
--- a/hpvm/lib/Transforms/DFG2LLVM_X86/CMakeLists.txt
+++ b/hpvm/lib/Transforms/DFG2LLVM_CPU/CMakeLists.txt
@@ -4,9 +4,9 @@ endif()
 
 set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DLLVM_BUILD_DIR=${PROJECT_BINARY_DIR}")
 
-add_llvm_library( LLVMDFG2LLVM_X86
+add_llvm_library( LLVMDFG2LLVM_CPU
   MODULE
-  DFG2LLVM_X86.cpp
+  DFG2LLVM_CPU.cpp
 
   DEPENDS intrinsics_gen
   PLUGIN_TOOL
diff --git a/hpvm/lib/Transforms/DFG2LLVM_X86/DFG2LLVM_X86.cpp b/hpvm/lib/Transforms/DFG2LLVM_CPU/DFG2LLVM_CPU.cpp
similarity index 87%
rename from hpvm/lib/Transforms/DFG2LLVM_X86/DFG2LLVM_X86.cpp
rename to hpvm/lib/Transforms/DFG2LLVM_CPU/DFG2LLVM_CPU.cpp
index c0e2b715fa9a7a14f6c728a3c58728742f80d77c..3655279a99ceecf462ca4aab46c25f82cf238ef0 100644
--- a/hpvm/lib/Transforms/DFG2LLVM_X86/DFG2LLVM_X86.cpp
+++ b/hpvm/lib/Transforms/DFG2LLVM_CPU/DFG2LLVM_CPU.cpp
@@ -1,4 +1,4 @@
-//===-------------------------- DFG2LLVM_X86.cpp --------------------------===//
+//===-------------------------- DFG2LLVM_CPU.cpp --------------------------===//
 //
 //                     The LLVM Compiler Infrastructure
 //
@@ -6,8 +6,13 @@
 // License. See LICENSE.TXT for details.
 //
 //===----------------------------------------------------------------------===//
+//
+// This pass is responsible for generating host code and kernel code for the
+// CPU target using the HPVM dataflow graph.
+//
+//===----------------------------------------------------------------------===//
 
-#define DEBUG_TYPE "DFG2LLVM_X86"
+#define DEBUG_TYPE "DFG2LLVM_CPU"
 #include "SupportHPVM/DFG2LLVM.h"
 #include "llvm/IR/Constant.h"
 #include "llvm/IR/Constants.h"
@@ -34,15 +39,15 @@ using namespace builddfg;
 using namespace dfg2llvm;
 
 // HPVM Command line option to use timer or not
-static cl::opt<bool> HPVMTimer_X86("hpvm-timers-x86",
+static cl::opt<bool> HPVMTimer_CPU("hpvm-timers-cpu",
                                    cl::desc("Enable hpvm timers"));
 
 namespace {
 
-// DFG2LLVM_X86 - The first implementation.
-struct DFG2LLVM_X86 : public DFG2LLVM {
+// DFG2LLVM_CPU - The first implementation.
+struct DFG2LLVM_CPU : public DFG2LLVM {
   static char ID; // Pass identification, replacement for typeid
-  DFG2LLVM_X86() : DFG2LLVM(ID) {}
+  DFG2LLVM_CPU() : DFG2LLVM(ID) {}
 
 private:
   // Member variables
@@ -54,16 +59,16 @@ public:
 };
 
 // Visitor for Code generation traversal (tree traversal for now)
-class CGT_X86 : public CodeGenTraversal {
+class CGT_CPU : public CodeGenTraversal {
 
 private:
   // Member variables
 
   FunctionCallee malloc;
   // HPVM Runtime API
-  FunctionCallee llvm_hpvm_x86_launch;
-  FunctionCallee llvm_hpvm_x86_wait;
-  FunctionCallee llvm_hpvm_x86_argument_ptr;
+  FunctionCallee llvm_hpvm_cpu_launch;
+  FunctionCallee llvm_hpvm_cpu_wait;
+  FunctionCallee llvm_hpvm_cpu_argument_ptr;
 
   FunctionCallee llvm_hpvm_streamLaunch;
   FunctionCallee llvm_hpvm_streamPush;
@@ -76,10 +81,10 @@ private:
   FunctionCallee llvm_hpvm_createThread;
   FunctionCallee llvm_hpvm_bufferPush;
   FunctionCallee llvm_hpvm_bufferPop;
-  FunctionCallee llvm_hpvm_x86_dstack_push;
-  FunctionCallee llvm_hpvm_x86_dstack_pop;
-  FunctionCallee llvm_hpvm_x86_getDimLimit;
-  FunctionCallee llvm_hpvm_x86_getDimInstance;
+  FunctionCallee llvm_hpvm_cpu_dstack_push;
+  FunctionCallee llvm_hpvm_cpu_dstack_pop;
+  FunctionCallee llvm_hpvm_cpu_getDimLimit;
+  FunctionCallee llvm_hpvm_cpu_getDimInstance;
 
   // Functions
   std::vector<IntrinsicInst *> *getUseList(Value *LI);
@@ -87,11 +92,11 @@ private:
   void addWhileLoop(Instruction *, Instruction *, Instruction *, Value *);
   Instruction *addWhileLoopCounter(BasicBlock *, BasicBlock *, BasicBlock *);
   Argument *getArgumentFromEnd(Function *F, unsigned offset);
-  Value *getInValueAt(DFNode *Child, unsigned i, Function *ParentF_X86,
+  Value *getInValueAt(DFNode *Child, unsigned i, Function *ParentF_CPU,
                       Instruction *InsertBefore);
-  void invokeChild_X86(DFNode *C, Function *F_X86, ValueToValueMapTy &VMap,
+  void invokeChild_CPU(DFNode *C, Function *F_CPU, ValueToValueMapTy &VMap,
                        Instruction *InsertBefore);
-  void invokeChild_PTX(DFNode *C, Function *F_X86, ValueToValueMapTy &VMap,
+  void invokeChild_PTX(DFNode *C, Function *F_CPU, ValueToValueMapTy &VMap,
                        Instruction *InsertBefore);
   StructType *getArgumentListStructTy(DFNode *);
   Function *createFunctionFilter(DFNode *C);
@@ -102,8 +107,8 @@ private:
 
   // Virtual Functions
   void init() {
-    HPVMTimer = HPVMTimer_X86;
-    TargetName = "X86";
+    HPVMTimer = HPVMTimer_CPU;
+    TargetName = "CPU";
   }
   void initRuntimeAPI();
   void codeGen(DFInternalNode *N);
@@ -113,7 +118,7 @@ private:
 
 public:
   // Constructor
-  CGT_X86(Module &_M, BuildDFG &_DFG) : CodeGenTraversal(_M, _DFG) {
+  CGT_CPU(Module &_M, BuildDFG &_DFG) : CodeGenTraversal(_M, _DFG) {
     init();
     initRuntimeAPI();
   }
@@ -122,8 +127,8 @@ public:
   void codeGenLaunchStreaming(DFInternalNode *Root);
 };
 
-bool DFG2LLVM_X86::runOnModule(Module &M) {
-  errs() << "\nDFG2LLVM_X86 PASS\n";
+bool DFG2LLVM_CPU::runOnModule(Module &M) {
+  DEBUG(errs() << "\nDFG2LLVM_CPU PASS\n");
 
   // Get the BuildDFG Analysis Results:
   // - Dataflow graph
@@ -136,7 +141,7 @@ bool DFG2LLVM_X86::runOnModule(Module &M) {
   // BuildDFG::HandleToDFEdge &HandleToDFEdgeMap = DFG.getHandleToDFEdgeMap();
 
   // Visitor for Code Generation Graph Traversal
-  CGT_X86 *CGTVisitor = new CGT_X86(M, DFG);
+  CGT_CPU *CGTVisitor = new CGT_CPU(M, DFG);
 
   // Iterate over all the DFGs and produce code for each one of them
   for (auto &rootNode : Roots) {
@@ -160,7 +165,7 @@ bool DFG2LLVM_X86::runOnModule(Module &M) {
 }
 
 // Initialize the HPVM runtime API. This makes it easier to insert these calls
-void CGT_X86::initRuntimeAPI() {
+void CGT_CPU::initRuntimeAPI() {
 
   // Load Runtime API Module
   SMDiagnostic Err;
@@ -176,10 +181,10 @@ void CGT_X86::initRuntimeAPI() {
     DEBUG(errs() << "Successfully loaded hpvm-rt API module\n");
 
   // Get or insert the global declarations for launch/wait functions
-  DECLARE(llvm_hpvm_x86_launch);
+  DECLARE(llvm_hpvm_cpu_launch);
   DECLARE(malloc);
-  DECLARE(llvm_hpvm_x86_wait);
-  DECLARE(llvm_hpvm_x86_argument_ptr);
+  DECLARE(llvm_hpvm_cpu_wait);
+  DECLARE(llvm_hpvm_cpu_argument_ptr);
   DECLARE(llvm_hpvm_streamLaunch);
   DECLARE(llvm_hpvm_streamPush);
   DECLARE(llvm_hpvm_streamPop);
@@ -191,10 +196,10 @@ void CGT_X86::initRuntimeAPI() {
   DECLARE(llvm_hpvm_createThread);
   DECLARE(llvm_hpvm_bufferPush);
   DECLARE(llvm_hpvm_bufferPop);
-  DECLARE(llvm_hpvm_x86_dstack_push);
-  DECLARE(llvm_hpvm_x86_dstack_pop);
-  DECLARE(llvm_hpvm_x86_getDimLimit);
-  DECLARE(llvm_hpvm_x86_getDimInstance);
+  DECLARE(llvm_hpvm_cpu_dstack_push);
+  DECLARE(llvm_hpvm_cpu_dstack_pop);
+  DECLARE(llvm_hpvm_cpu_getDimLimit);
+  DECLARE(llvm_hpvm_cpu_getDimInstance);
 
   // Get or insert timerAPI functions as well if you plan to use timers
   initTimerAPI();
@@ -202,7 +207,7 @@ void CGT_X86::initRuntimeAPI() {
   // Insert init context in main
   Function *VI = M.getFunction("llvm.hpvm.init");
   assert(VI->getNumUses() == 1 && "__hpvm__init should only be used once");
-  DEBUG(errs() << "Inserting x86 timer initialization\n");
+  DEBUG(errs() << "Inserting CPU timer initialization\n");
   Instruction *I = cast<Instruction>(*VI->user_begin());
   initializeTimerSet(I);
   switchToTimer(hpvm_TimerID_NONE, I);
@@ -210,13 +215,13 @@ void CGT_X86::initRuntimeAPI() {
   Function *VC = M.getFunction("llvm.hpvm.cleanup");
   assert(VC->getNumUses() == 1 && "__hpvm__cleanup should only be used once");
 
-  DEBUG(errs() << "Inserting x86 timer print\n");
+  DEBUG(errs() << "Inserting CPU timer print\n");
   printTimerSet(I);
 }
 
 /* Returns vector of all wait instructions
  */
-std::vector<IntrinsicInst *> *CGT_X86::getUseList(Value *GraphID) {
+std::vector<IntrinsicInst *> *CGT_CPU::getUseList(Value *GraphID) {
   std::vector<IntrinsicInst *> *UseList = new std::vector<IntrinsicInst *>();
   // It must have been loaded from memory somewhere
   for (Value::user_iterator ui = GraphID->user_begin(),
@@ -234,7 +239,7 @@ std::vector<IntrinsicInst *> *CGT_X86::getUseList(Value *GraphID) {
 /* Traverse the function argument list in reverse order to get argument at a
  * distance offset fromt he end of argument list of function F
  */
-Argument *CGT_X86::getArgumentFromEnd(Function *F, unsigned offset) {
+Argument *CGT_CPU::getArgumentFromEnd(Function *F, unsigned offset) {
   assert((F->getFunctionType()->getNumParams() >= offset && offset > 0) &&
          "Invalid offset to access arguments!");
   Function::arg_iterator e = F->arg_end();
@@ -259,7 +264,7 @@ Argument *CGT_X86::getArgumentFromEnd(Function *F, unsigned offset) {
  * which loops over bidy if true and goes to end if false
  * (5) Update phi node of body
  */
-void CGT_X86::addWhileLoop(Instruction *CondBlockStart, Instruction *BodyStart,
+void CGT_CPU::addWhileLoop(Instruction *CondBlockStart, Instruction *BodyStart,
                            Instruction *BodyEnd, Value *TerminationCond) {
   BasicBlock *Entry = CondBlockStart->getParent();
   BasicBlock *CondBlock = Entry->splitBasicBlock(CondBlockStart, "condition");
@@ -276,7 +281,7 @@ void CGT_X86::addWhileLoop(Instruction *CondBlockStart, Instruction *BodyStart,
   ReplaceInstWithInst(WhileBody->getTerminator(), UnconditionalBranch);
 }
 
-Instruction *CGT_X86::addWhileLoopCounter(BasicBlock *Entry, BasicBlock *Cond,
+Instruction *CGT_CPU::addWhileLoopCounter(BasicBlock *Entry, BasicBlock *Cond,
                                           BasicBlock *Body) {
   Module *M = Entry->getParent()->getParent();
   Type *Int64Ty = Type::getInt64Ty(M->getContext());
@@ -311,7 +316,7 @@ Instruction *CGT_X86::addWhileLoopCounter(BasicBlock *Entry, BasicBlock *Cond,
  * which loops over bidy if true and goes to end if false
  * (5) Update phi node of body
  */
-Value *CGT_X86::addLoop(Instruction *I, Value *limit, const Twine &indexName) {
+Value *CGT_CPU::addLoop(Instruction *I, Value *limit, const Twine &indexName) {
   BasicBlock *Entry = I->getParent();
   BasicBlock *ForBody = Entry->splitBasicBlock(I, "for.body");
 
@@ -356,7 +361,7 @@ Value *CGT_X86::addLoop(Instruction *I, Value *limit, const Twine &indexName) {
 // types, output types and isLastInput buffer type. All the streaming
 // inputs/outputs are converted to i8*, since this is the type of buffer
 // handles.
-StructType *CGT_X86::getArgumentListStructTy(DFNode *C) {
+StructType *CGT_CPU::getArgumentListStructTy(DFNode *C) {
   std::vector<Type *> TyList;
   // Input types
   Function *CF = C->getFuncPointer();
@@ -384,7 +389,7 @@ StructType *CGT_X86::getArgumentListStructTy(DFNode *C) {
   return STy;
 }
 
-void CGT_X86::startNodeThread(DFNode *C, std::vector<Value *> Args,
+void CGT_CPU::startNodeThread(DFNode *C, std::vector<Value *> Args,
                               DenseMap<DFEdge *, Value *> EdgeBufferMap,
                               Value *isLastInputBuffer, Value *graphID,
                               Instruction *IB) {
@@ -495,7 +500,7 @@ void CGT_X86::startNodeThread(DFNode *C, std::vector<Value *> Args,
                    ArrayRef<Value *>(CreateThreadArgs, 3), "", IB);
 }
 
-Function *CGT_X86::createLaunchFunction(DFInternalNode *N) {
+Function *CGT_CPU::createLaunchFunction(DFInternalNode *N) {
   DEBUG(errs() << "Generating Streaming Launch Function\n");
   // Get Function associated with Node N
   Function *NF = N->getFuncPointer();
@@ -643,7 +648,7 @@ Function *CGT_X86::createLaunchFunction(DFInternalNode *N) {
  * Modify each of the instrinsic in host code
  * Launch, Push, Pop, Wait
  */
-void CGT_X86::codeGenLaunchStreaming(DFInternalNode *Root) {
+void CGT_CPU::codeGenLaunchStreaming(DFInternalNode *Root) {
   IntrinsicInst *LI = Root->getInstruction();
   Function *RootLaunch = createLaunchFunction(Root);
   // Substitute launch intrinsic main
@@ -654,7 +659,7 @@ void CGT_X86::codeGenLaunchStreaming(DFInternalNode *Root) {
       "graph" + Root->getFuncPointer()->getName(), LI);
 
   DEBUG(errs() << *LaunchInst << "\n");
-  // Replace all wait instructions with x86 specific wait instructions
+  // Replace all wait instructions with CPU-specific wait instructions
   DEBUG(errs() << "Substitute wait, push, pop intrinsics\n");
   std::vector<IntrinsicInst *> *UseList = getUseList(LI);
   for (unsigned i = 0; i < UseList->size(); ++i) {
@@ -684,7 +689,7 @@ void CGT_X86::codeGenLaunchStreaming(DFInternalNode *Root) {
   }
 }
 
-void CGT_X86::codeGenLaunch(DFInternalNode *Root) {
+void CGT_CPU::codeGenLaunch(DFInternalNode *Root) {
   // TODO: Place an assert to check if the constant passed by launch intrinsic
   // as the number of arguments to DFG is same as the number of arguments of the
   // root of DFG
@@ -725,28 +730,28 @@ void CGT_X86::codeGenLaunch(DFInternalNode *Root) {
   switchToTimer(hpvm_TimerID_ARG_UNPACK, RI);
 
   DEBUG(errs() << "Created Empty Launch Function\n");
-  // Find the X86 function generated for Root and
-  //  Function* RootF_X86 = Root->getGenFunc();
-  Function *RootF_X86 = Root->getGenFuncForTarget(hpvm::CPU_TARGET);
-  assert(RootF_X86 && "Error: No generated CPU function for Root node\n");
-  assert(Root->hasX86GenFuncForTarget(hpvm::CPU_TARGET) &&
-         "Error: Generated Function for Root node with no x86 wrapper\n");
-
-  // Generate a call to RootF_X86 with null parameters for now
+  // Find the CPU function generated for Root and
+  //  Function* RootF_CPU = Root->getGenFunc();
+  Function *RootF_CPU = Root->getGenFuncForTarget(hpvm::CPU_TARGET);
+  assert(RootF_CPU && "Error: No generated CPU function for Root node\n");
+  assert(Root->hasCPUGenFuncForTarget(hpvm::CPU_TARGET) &&
+         "Error: Generated Function for Root node with no CPU wrapper\n");
+
+  // Generate a call to RootF_CPU with null parameters for now
   std::vector<Value *> Args;
-  for (unsigned i = 0; i < RootF_X86->getFunctionType()->getNumParams(); i++) {
+  for (unsigned i = 0; i < RootF_CPU->getFunctionType()->getNumParams(); i++) {
     Args.push_back(
-        Constant::getNullValue(RootF_X86->getFunctionType()->getParamType(i)));
+        Constant::getNullValue(RootF_CPU->getFunctionType()->getParamType(i)));
   }
   CallInst *CI =
-      CallInst::Create(RootF_X86, Args, RootF_X86->getName() + ".output", RI);
+      CallInst::Create(RootF_CPU, Args, RootF_CPU->getName() + ".output", RI);
 
   // Extract input data from i8* data.addr and patch them to correct argument of
-  // call to RootF_X86. For each argument
+  // call to RootF_CPU. For each argument
   std::vector<Type *> TyList;
   std::vector<std::string> names;
-  for (Function::arg_iterator ai = RootF_X86->arg_begin(),
-                              ae = RootF_X86->arg_end();
+  for (Function::arg_iterator ai = RootF_CPU->arg_begin(),
+                              ae = RootF_CPU->arg_end();
        ai != ae; ++ai) {
     TyList.push_back(ai->getType());
     names.push_back(ai->getName());
@@ -756,19 +761,19 @@ void CGT_X86::codeGenLaunch(DFInternalNode *Root) {
   for (unsigned i = 0; i < CI->getNumArgOperands(); i++)
     CI->setArgOperand(i, elements[i]);
 
-  // Add timers around Call to RootF_X86 function
+  // Add timers around Call to RootF_CPU function
   switchToTimer(hpvm_TimerID_COMPUTATION, CI);
   switchToTimer(hpvm_TimerID_OUTPUT_PACK, RI);
 
   StructType *RootRetTy =
-      cast<StructType>(RootF_X86->getFunctionType()->getReturnType());
+      cast<StructType>(RootF_CPU->getFunctionType()->getReturnType());
 
   // if Root has non empty return
   if (RootRetTy->getNumElements()) {
     // We can't access the type of the arg struct - build it
     std::vector<Type *> TyList;
-    for (Function::arg_iterator ai = RootF_X86->arg_begin(),
-                                ae = RootF_X86->arg_end();
+    for (Function::arg_iterator ai = RootF_CPU->arg_begin(),
+                                ae = RootF_CPU->arg_end();
          ai != ae; ++ai) {
       TyList.push_back(ai->getType());
     }
@@ -776,7 +781,7 @@ void CGT_X86::codeGenLaunch(DFInternalNode *Root) {
 
     StructType *ArgStructTy = StructType::create(
         M.getContext(), ArrayRef<Type *>(TyList),
-        (RootF_X86->getName() + ".arg.struct.ty").str(), true);
+        (RootF_CPU->getName() + ".arg.struct.ty").str(), true);
 
     // Cast the data pointer to the type of the arg struct
     CastInst *OutputAddrCast = CastInst::CreatePointerCast(
@@ -816,19 +821,19 @@ void CGT_X86::codeGenLaunch(DFInternalNode *Root) {
   // Substitute launch intrinsic main
   Value *LaunchInstArgs[] = {AppFunc, LI->getArgOperand(1)};
   CallInst *LaunchInst = CallInst::Create(
-      llvm_hpvm_x86_launch, ArrayRef<Value *>(LaunchInstArgs, 2),
+      llvm_hpvm_cpu_launch, ArrayRef<Value *>(LaunchInstArgs, 2),
       "graph" + Root->getFuncPointer()->getName(), LI);
   // ReplaceInstWithInst(LI, LaunchInst);
 
   DEBUG(errs() << *LaunchInst << "\n");
-  // Replace all wait instructions with x86 specific wait instructions
+  // Replace all wait instructions with CPU-specific wait instructions
   std::vector<IntrinsicInst *> *UseList = getUseList(LI);
   for (unsigned i = 0; i < UseList->size(); ++i) {
     IntrinsicInst *II = UseList->at(i);
     CallInst *CI;
     switch (II->getIntrinsicID()) {
     case Intrinsic::hpvm_wait:
-      CI = CallInst::Create(llvm_hpvm_x86_wait, ArrayRef<Value *>(LaunchInst),
+      CI = CallInst::Create(llvm_hpvm_cpu_wait, ArrayRef<Value *>(LaunchInst),
                             "");
       break;
     case Intrinsic::hpvm_push:
@@ -848,7 +853,7 @@ void CGT_X86::codeGenLaunch(DFInternalNode *Root) {
   }
 }
 
-Value *CGT_X86::getInValueAt(DFNode *Child, unsigned i, Function *ParentF_X86,
+Value *CGT_CPU::getInValueAt(DFNode *Child, unsigned i, Function *ParentF_CPU,
                              Instruction *InsertBefore) {
   // TODO: Assumption is that each input port of a node has just one
   // incoming edge. May change later on.
@@ -863,7 +868,7 @@ Value *CGT_X86::getInValueAt(DFNode *Child, unsigned i, Function *ParentF_X86,
   // argument from argument list of this internal node
   Value *inputVal;
   if (SrcDF->isEntryNode()) {
-    inputVal = getArgumentAt(ParentF_X86, E->getSourcePosition());
+    inputVal = getArgumentAt(ParentF_CPU, E->getSourcePosition());
     DEBUG(errs() << "Argument " << i << " = " << *inputVal << "\n");
   } else {
     // edge is from a sibling
@@ -885,38 +890,38 @@ Value *CGT_X86::getInValueAt(DFNode *Child, unsigned i, Function *ParentF_X86,
   return inputVal;
 }
 
-void CGT_X86::invokeChild_X86(DFNode *C, Function *F_X86,
+void CGT_CPU::invokeChild_CPU(DFNode *C, Function *F_CPU,
                               ValueToValueMapTy &VMap, Instruction *IB) {
   Function *CF = C->getFuncPointer();
 
-  //  Function* CF_X86 = C->getGenFunc();
-  Function *CF_X86 = C->getGenFuncForTarget(hpvm::CPU_TARGET);
-  assert(CF_X86 != NULL &&
+  //  Function* CF_CPU = C->getGenFunc();
+  Function *CF_CPU = C->getGenFuncForTarget(hpvm::CPU_TARGET);
+  assert(CF_CPU != NULL &&
          "Found leaf node for which code generation has not happened yet!\n");
-  assert(C->hasX86GenFuncForTarget(hpvm::CPU_TARGET) &&
-         "The generated function to be called from x86 backend is not an x86 "
+  assert(C->hasCPUGenFuncForTarget(hpvm::CPU_TARGET) &&
+         "The generated function to be called from CPU backend is not a CPU "
          "function\n");
-  DEBUG(errs() << "Invoking child node" << CF_X86->getName() << "\n");
+  DEBUG(errs() << "Invoking child node" << CF_CPU->getName() << "\n");
 
   std::vector<Value *> Args;
   // Create argument list to pass to call instruction
   // First find the correct values using the edges
   // The remaing six values are inserted as constants for now.
   for (unsigned i = 0; i < CF->getFunctionType()->getNumParams(); i++) {
-    Args.push_back(getInValueAt(C, i, F_X86, IB));
+    Args.push_back(getInValueAt(C, i, F_CPU, IB));
   }
 
-  Value *I64Zero = ConstantInt::get(Type::getInt64Ty(F_X86->getContext()), 0);
+  Value *I64Zero = ConstantInt::get(Type::getInt64Ty(F_CPU->getContext()), 0);
   for (unsigned j = 0; j < 6; j++)
     Args.push_back(I64Zero);
 
-  errs() << "Gen Function type: " << *CF_X86->getType() << "\n";
-  errs() << "Node Function type: " << *CF->getType() << "\n";
-  errs() << "Arguments: " << Args.size() << "\n";
+  DEBUG(errs() << "Gen Function type: " << *CF_CPU->getType() << "\n");
+  DEBUG(errs() << "Node Function type: " << *CF->getType() << "\n");
+  DEBUG(errs() << "Arguments: " << Args.size() << "\n");
 
-  // Call the F_X86 function associated with this node
+  // Call the F_CPU function associated with this node
   CallInst *CI =
-      CallInst::Create(CF_X86, Args, CF_X86->getName() + "_output", IB);
+      CallInst::Create(CF_CPU, Args, CF_CPU->getName() + "_output", IB);
   DEBUG(errs() << *CI << "\n");
   OutputMap[C] = CI;
 
@@ -928,7 +933,7 @@ void CGT_X86::invokeChild_X86(DFNode *C, Function *F_X86,
     Value *indexLimit = NULL;
     // Limit can either be a constant or an arguement of the internal node.
     // In case of constant we can use that constant value directly in the
-    // new F_X86 function. In case of an argument, we need to get the mapped
+    // new F_CPU function. In case of an argument, we need to get the mapped
     // value using VMap
     if (isa<Constant>(C->getDimLimits()[j])) {
       indexLimit = C->getDimLimits()[j];
@@ -960,7 +965,7 @@ void CGT_X86::invokeChild_X86(DFNode *C, Function *F_X86,
       CI->getArgOperand(numArgs - 6 + 2)  // iZ
   };
 
-  CallInst *Push = CallInst::Create(llvm_hpvm_x86_dstack_push,
+  CallInst *Push = CallInst::Create(llvm_hpvm_cpu_dstack_push,
                                     ArrayRef<Value *>(args, 7), "", CI);
   DEBUG(errs() << "Push on stack: " << *Push << "\n");
   // Insert call to runtime to pop the dim limits and instanceID from the depth
@@ -973,7 +978,7 @@ void CGT_X86::invokeChild_X86(DFNode *C, Function *F_X86,
   assert(NextI->getParent() == CI->getParent() &&
          "Next Instruction should also belong to the same basic block!");
 
-  CallInst *Pop = CallInst::Create(llvm_hpvm_x86_dstack_pop, None, "", NextI);
+  CallInst *Pop = CallInst::Create(llvm_hpvm_cpu_dstack_pop, None, "", NextI);
   DEBUG(errs() << "Pop from stack: " << *Pop << "\n");
   DEBUG(errs() << *CI->getParent()->getParent());
 }
@@ -994,7 +999,7 @@ void CGT_X86::invokeChild_X86(DFNode *C, Function *F_X86,
 // Add runtime API calls to push output for each of the streaming outputs
 // Add loop around the basic block, which exits the loop if isLastInput is false
 
-Function *CGT_X86::createFunctionFilter(DFNode *C) {
+Function *CGT_CPU::createFunctionFilter(DFNode *C) {
   DEBUG(errs() << "*********Creating Function filter for "
                << C->getFuncPointer()->getName() << "*****\n");
 
@@ -1160,7 +1165,7 @@ Function *CGT_X86::createFunctionFilter(DFNode *C) {
   return CF_Pipeline;
 }
 
-void CGT_X86::codeGen(DFInternalNode *N) {
+void CGT_CPU::codeGen(DFInternalNode *N) {
   // Check if N is root node and its graph is streaming. We do not do codeGen
   // for Root in such a case
   if (N->isRoot() && N->isChildGraphStreaming())
@@ -1182,7 +1187,7 @@ void CGT_X86::codeGen(DFInternalNode *N) {
   // Sort children in topological order before code generation
   N->getChildGraph()->sortChildren();
 
-  // Only process if all children have a CPU x86 function
+  // Only process if all children have a CPU function
   // Otherwise skip to end
   bool codeGen = true;
   for (DFGraph::children_iterator ci = N->getChildGraph()->begin(),
@@ -1193,11 +1198,11 @@ void CGT_X86::codeGen(DFInternalNode *N) {
     if (C->isDummyNode())
       continue;
 
-    if (!(C->hasX86GenFuncForTarget(hpvm::CPU_TARGET))) {
-      errs() << "No CPU x86 version for child node "
-             << C->getFuncPointer()->getName()
-             << "\n  Skip code gen for parent node "
-             << N->getFuncPointer()->getName() << "\n";
+    if (!(C->hasCPUGenFuncForTarget(hpvm::CPU_TARGET))) {
+      DEBUG(errs() << "No CPU version for child node "
+                   << C->getFuncPointer()->getName()
+                   << "\n  Skip code gen for parent node "
+                   << N->getFuncPointer()->getName() << "\n");
       codeGen = false;
     }
   }
@@ -1206,18 +1211,18 @@ void CGT_X86::codeGen(DFInternalNode *N) {
     Function *F = N->getFuncPointer();
     // Create of clone of F with no instructions. Only the type is the same as F
     // without the extra arguments.
-    Function *F_X86;
+    Function *F_CPU;
 
     // Clone the function, if we are seeing this function for the first time. We
     // only need a clone in terms of type.
     ValueToValueMapTy VMap;
 
     // Create new function with the same type
-    F_X86 = Function::Create(F->getFunctionType(), F->getLinkage(),
+    F_CPU = Function::Create(F->getFunctionType(), F->getLinkage(),
                              F->getName(), &M);
 
     // Loop over the arguments, copying the names of arguments over.
-    Function::arg_iterator dest_iterator = F_X86->arg_begin();
+    Function::arg_iterator dest_iterator = F_CPU->arg_begin();
     for (Function::const_arg_iterator i = F->arg_begin(), e = F->arg_end();
          i != e; ++i) {
       dest_iterator->setName(i->getName()); // Copy the name over...
@@ -1226,24 +1231,24 @@ void CGT_X86::codeGen(DFInternalNode *N) {
     }
 
     // Add a basic block to this empty function
-    BasicBlock *BB = BasicBlock::Create(F_X86->getContext(), "entry", F_X86);
+    BasicBlock *BB = BasicBlock::Create(F_CPU->getContext(), "entry", F_CPU);
     ReturnInst *RI = ReturnInst::Create(
-        F_X86->getContext(), UndefValue::get(F_X86->getReturnType()), BB);
+        F_CPU->getContext(), UndefValue::get(F_CPU->getReturnType()), BB);
 
     // Add Index and Dim arguments except for the root node and the child graph
     // of parent node is not streaming
     if (!N->isRoot() && !N->getParent()->isChildGraphStreaming())
-      F_X86 = addIdxDimArgs(F_X86);
+      F_CPU = addIdxDimArgs(F_CPU);
 
-    BB = &*F_X86->begin();
+    BB = &*F_CPU->begin();
     RI = cast<ReturnInst>(BB->getTerminator());
 
     // Add generated function info to DFNode
-    //    N->setGenFunc(F_X86, hpvm::CPU_TARGET);
-    N->addGenFunc(F_X86, hpvm::CPU_TARGET, true);
+    //    N->setGenFunc(F_CPU, hpvm::CPU_TARGET);
+    N->addGenFunc(F_CPU, hpvm::CPU_TARGET, true);
 
     // Loop over the arguments, to create the VMap.
-    dest_iterator = F_X86->arg_begin();
+    dest_iterator = F_CPU->arg_begin();
     for (Function::const_arg_iterator i = F->arg_begin(), e = F->arg_end();
          i != e; ++i) {
       // Add mapping and increment dest iterator
@@ -1261,7 +1266,7 @@ void CGT_X86::codeGen(DFInternalNode *N) {
         continue;
 
       // Create calls to CPU function of child node
-      invokeChild_X86(C, F_X86, VMap, RI);
+      invokeChild_CPU(C, F_CPU, VMap, RI);
     }
 
     DEBUG(errs() << "*** Generating epilogue code for the function****\n");
@@ -1270,7 +1275,7 @@ void CGT_X86::codeGen(DFInternalNode *N) {
     DFNode *C = N->getChildGraph()->getExit();
     // Get OutputType of this node
     StructType *OutTy = N->getOutputType();
-    Value *retVal = UndefValue::get(F_X86->getReturnType());
+    Value *retVal = UndefValue::get(F_CPU->getReturnType());
     // Find all the input edges to exit node
     for (unsigned i = 0; i < OutTy->getNumElements(); i++) {
       DEBUG(errs() << "Output Edge " << i << "\n");
@@ -1288,7 +1293,7 @@ void CGT_X86::codeGen(DFInternalNode *N) {
       // argument from argument list of this internal node
       Value *inputVal;
       if (SrcDF->isEntryNode()) {
-        inputVal = getArgumentAt(F_X86, i);
+        inputVal = getArgumentAt(F_CPU, i);
         DEBUG(errs() << "Argument " << i << " = " << *inputVal << "\n");
       } else {
         // edge is from a internal node
@@ -1313,14 +1318,14 @@ void CGT_X86::codeGen(DFInternalNode *N) {
     }
     DEBUG(errs() << "Extracted all\n");
     retVal->setName("output");
-    ReturnInst *newRI = ReturnInst::Create(F_X86->getContext(), retVal);
+    ReturnInst *newRI = ReturnInst::Create(F_CPU->getContext(), retVal);
     ReplaceInstWithInst(RI, newRI);
   }
 
   //-------------------------------------------------------------------------//
   // Here, we need to check if this node (N) has more than one versions
   // If so, we query the policy and have a call to each version
-  // If not, we see which version exists, check that it is in fact an x86
+  // If not, we see which version exists, check that it is in fact a CPU
   // function and save it as the CPU_TARGET function
 
   // TODO: hpvm_id per node, so we can use this for id for policies
@@ -1328,16 +1333,16 @@ void CGT_X86::codeGen(DFInternalNode *N) {
   Function *CF = N->getGenFuncForTarget(hpvm::CPU_TARGET);
   Function *GF = N->getGenFuncForTarget(hpvm::GPU_TARGET);
 
-  bool CFx86 = N->hasX86GenFuncForTarget(hpvm::CPU_TARGET);
-  bool GFx86 = N->hasX86GenFuncForTarget(hpvm::GPU_TARGET);
+  bool CFcpu = N->hasCPUGenFuncForTarget(hpvm::CPU_TARGET);
+  bool GFcpu = N->hasCPUGenFuncForTarget(hpvm::GPU_TARGET);
 
   DEBUG(errs() << "Before editing\n");
   DEBUG(errs() << "Node: " << N->getFuncPointer()->getName() << " with tag "
                << N->getTag() << "\n");
   DEBUG(errs() << "CPU Fun: " << (CF ? CF->getName() : "null") << "\n");
-  DEBUG(errs() << "hasx86GenFuncForCPU : " << CFx86 << "\n");
+  DEBUG(errs() << "hasCPUGenFuncForCPU : " << CFcpu << "\n");
   DEBUG(errs() << "GPU Fun: " << (GF ? GF->getName() : "null") << "\n");
-  DEBUG(errs() << "hasx86GenFuncForGPU : " << GFx86 << "\n");
+  DEBUG(errs() << "hasCPUGenFuncForGPU : " << GFcpu << "\n");
 
   if (N->getTag() == hpvm::None) {
     // No code is available for this node. This (usually) means that this
@@ -1357,15 +1362,15 @@ void CGT_X86::codeGen(DFInternalNode *N) {
     switch (N->getTag()) {
     case hpvm::CPU_TARGET:
       assert(N->getGenFuncForTarget(hpvm::CPU_TARGET) && "");
-      assert(N->hasX86GenFuncForTarget(hpvm::CPU_TARGET) && "");
+      assert(N->hasCPUGenFuncForTarget(hpvm::CPU_TARGET) && "");
       assert(!(N->getGenFuncForTarget(hpvm::GPU_TARGET)) && "");
-      assert(!(N->hasX86GenFuncForTarget(hpvm::GPU_TARGET)) && "");
+      assert(!(N->hasCPUGenFuncForTarget(hpvm::GPU_TARGET)) && "");
       break;
     case hpvm::GPU_TARGET:
       assert(!(N->getGenFuncForTarget(hpvm::CPU_TARGET)) && "");
-      assert(!(N->hasX86GenFuncForTarget(hpvm::CPU_TARGET)) && "");
+      assert(!(N->hasCPUGenFuncForTarget(hpvm::CPU_TARGET)) && "");
       assert(N->getGenFuncForTarget(hpvm::GPU_TARGET) && "");
-      assert(N->hasX86GenFuncForTarget(hpvm::GPU_TARGET) && "");
+      assert(N->hasCPUGenFuncForTarget(hpvm::GPU_TARGET) && "");
       break;
     default:
       assert(false && "Unreachable: we checked that tag was single target!\n");
@@ -1380,16 +1385,16 @@ void CGT_X86::codeGen(DFInternalNode *N) {
     CF = N->getGenFuncForTarget(hpvm::CPU_TARGET);
     GF = N->getGenFuncForTarget(hpvm::GPU_TARGET);
 
-    CFx86 = N->hasX86GenFuncForTarget(hpvm::CPU_TARGET);
-    GFx86 = N->hasX86GenFuncForTarget(hpvm::GPU_TARGET);
+    CFcpu = N->hasCPUGenFuncForTarget(hpvm::CPU_TARGET);
+    GFcpu = N->hasCPUGenFuncForTarget(hpvm::GPU_TARGET);
 
     DEBUG(errs() << "After editing\n");
     DEBUG(errs() << "Node: " << N->getFuncPointer()->getName() << " with tag "
                  << N->getTag() << "\n");
     DEBUG(errs() << "CPU Fun: " << (CF ? CF->getName() : "null") << "\n");
-    DEBUG(errs() << "hasx86GenFuncForCPU : " << CFx86 << "\n");
+    DEBUG(errs() << "hasCPUGenFuncForCPU : " << CFcpu << "\n");
     DEBUG(errs() << "GPU Fun: " << (GF ? GF->getName() : "null") << "\n");
-    DEBUG(errs() << "hasx86GenFuncForGPU : " << GFx86 << "\n");
+    DEBUG(errs() << "hasCPUGenFuncForGPU : " << GFcpu << "\n");
 
   } else {
     assert(false && "Multiple tags unsupported!");
@@ -1397,17 +1402,17 @@ void CGT_X86::codeGen(DFInternalNode *N) {
 }
 
 // Code generation for leaf nodes
-void CGT_X86::codeGen(DFLeafNode *N) {
+void CGT_CPU::codeGen(DFLeafNode *N) {
   // Skip code generation if it is a dummy node
   if (N->isDummyNode()) {
     DEBUG(errs() << "Skipping dummy node\n");
     return;
   }
 
-  // At this point, the X86 backend does not support code generation for
+  // At this point, the CPU backend does not support code generation for
   // the case where allocation node is used, so we skip. This means that a
   // CPU version will not be created, and therefore code generation will
-  // only succeed if another backend (nvptx or spir) has been invoked to
+  // only succeed if another backend (opencl or spir) has been invoked to
   // generate a node function for the node including the allocation node.
   if (N->isAllocationNode()) {
     DEBUG(errs() << "Skipping allocation node\n");
@@ -1420,14 +1425,14 @@ void CGT_X86::codeGen(DFLeafNode *N) {
   //    return;
 
   if (!preferredTargetIncludes(N, hpvm::CPU_TARGET)) {
-    errs() << "No CPU hint for node " << N->getFuncPointer()->getName()
-           << " : skipping it\n";
+    DEBUG(errs() << "No CPU hint for node " << N->getFuncPointer()->getName()
+                 << " : skipping it\n");
 
     switch (N->getTag()) {
     case hpvm::GPU_TARGET:
-      // A leaf node should not have an x86 function for GPU
-      // by design of DFG2LLVM_NVPTX backend
-      assert(!(N->hasX86GenFuncForTarget(hpvm::GPU_TARGET)) &&
+      // A leaf node should not have a CPU function for GPU
+      // by design of DFG2LLVM_OpenCL backend
+      assert(!(N->hasCPUGenFuncForTarget(hpvm::GPU_TARGET)) &&
              "Leaf node not expected to have GPU GenFunc");
       break;
     default:
@@ -1448,34 +1453,34 @@ void CGT_X86::codeGen(DFLeafNode *N) {
   Function *F = N->getFuncPointer();
 
   // Clone the function, if we are seeing this function for the first time.
-  Function *F_X86;
+  Function *F_CPU;
   ValueToValueMapTy VMap;
-  F_X86 = CloneFunction(F, VMap);
-  F_X86->removeFromParent();
+  F_CPU = CloneFunction(F, VMap);
+  F_CPU->removeFromParent();
   // Insert the cloned function into the module
-  M.getFunctionList().push_back(F_X86);
+  M.getFunctionList().push_back(F_CPU);
 
   // Add the new argument to the argument list. Add arguments only if the cild
   // graph of parent node is not streaming
   if (!N->getParent()->isChildGraphStreaming())
-    F_X86 = addIdxDimArgs(F_X86);
+    F_CPU = addIdxDimArgs(F_CPU);
 
   // Add generated function info to DFNode
-  //  N->setGenFunc(F_X86, hpvm::CPU_TARGET);
-  N->addGenFunc(F_X86, hpvm::CPU_TARGET, true);
+  //  N->setGenFunc(F_CPU, hpvm::CPU_TARGET);
+  N->addGenFunc(F_CPU, hpvm::CPU_TARGET, true);
 
   // Go through the arguments, and any pointer arguments with in attribute need
-  // to have x86_argument_ptr call to get the x86 ptr of the argument
+  // to have a cpu_argument_ptr call to get the CPU ptr of the argument
   // Insert these calls in a new BB which would dominate all other BBs
   // Create new BB
-  BasicBlock *EntryBB = &*F_X86->begin();
+  BasicBlock *EntryBB = &*F_CPU->begin();
   BasicBlock *BB =
-      BasicBlock::Create(M.getContext(), "getHPVMPtrArgs", F_X86, EntryBB);
+      BasicBlock::Create(M.getContext(), "getHPVMPtrArgs", F_CPU, EntryBB);
   BranchInst *Terminator = BranchInst::Create(EntryBB, BB);
   // Insert calls
-  for (Function::arg_iterator ai = F_X86->arg_begin(), ae = F_X86->arg_end();
+  for (Function::arg_iterator ai = F_CPU->arg_begin(), ae = F_CPU->arg_end();
        ai != ae; ++ai) {
-    if (F_X86->getAttributes().hasAttribute(ai->getArgNo() + 1,
+    if (F_CPU->getAttributes().hasAttribute(ai->getArgNo() + 1,
                                             Attribute::In)) {
       assert(ai->getType()->isPointerTy() &&
              "Only pointer arguments can have hpvm in/out attributes ");
@@ -1488,14 +1493,14 @@ void CGT_X86::codeGen(DFLeafNode *N) {
           &*ai, Type::getInt8PtrTy(M.getContext()), ai->getName() + ".i8ptr",
           Terminator);
       Value *ArgPtrCallArgs[] = {BI, size};
-      CallInst::Create(llvm_hpvm_x86_argument_ptr,
+      CallInst::Create(llvm_hpvm_cpu_argument_ptr,
                        ArrayRef<Value *>(ArgPtrCallArgs, 2), "", Terminator);
     }
   }
-  errs() << *BB << "\n";
+  DEBUG(errs() << *BB << "\n");
 
   // Go through all the instructions
-  for (inst_iterator i = inst_begin(F_X86), e = inst_end(F_X86); i != e; ++i) {
+  for (inst_iterator i = inst_begin(F_CPU), e = inst_end(F_CPU); i != e; ++i) {
     Instruction *I = &(*i);
     DEBUG(errs() << *I << "\n");
     // Leaf nodes should not contain HPVM graph intrinsics or launch
@@ -1572,19 +1577,19 @@ void CGT_X86::codeGen(DFLeafNode *N) {
                "ID!");
 
         // For immediate ancestor, use the extra argument introduced in
-        // F_X86
+        // F_CPU
         int numParamsF = F->getFunctionType()->getNumParams();
-        int numParamsF_X86 = F_X86->getFunctionType()->getNumParams();
+        int numParamsF_CPU = F_CPU->getFunctionType()->getNumParams();
         assert(
-            (numParamsF_X86 - numParamsF == 6) &&
+            (numParamsF_CPU - numParamsF == 6) &&
             "Difference of arguments between function and its clone is not 6!");
 
         if (parentLevel == 0) {
           // Case when the query is for this node itself
           unsigned offset = 3 + (3 - dim);
-          // Traverse argument list of F_X86 in reverse order to find the
+          // Traverse argument list of F_CPU in reverse order to find the
           // correct index or dim argument.
-          Argument *indexVal = getArgumentFromEnd(F_X86, offset);
+          Argument *indexVal = getArgumentFromEnd(F_CPU, offset);
           assert(indexVal && "Index argument not found. Invalid offset!");
 
           DEBUG(errs() << *II << " replaced with " << *indexVal << "\n");
@@ -1596,7 +1601,7 @@ void CGT_X86::codeGen(DFLeafNode *N) {
           Value *args[] = {
               ConstantInt::get(Type::getInt32Ty(II->getContext()), parentLevel),
               ConstantInt::get(Type::getInt32Ty(II->getContext()), dim)};
-          CallInst *CI = CallInst::Create(llvm_hpvm_x86_getDimInstance,
+          CallInst *CI = CallInst::Create(llvm_hpvm_cpu_getDimInstance,
                                           ArrayRef<Value *>(args, 2),
                                           "nodeInstanceID", II);
           DEBUG(errs() << *II << " replaced with " << *CI << "\n");
@@ -1630,19 +1635,19 @@ void CGT_X86::codeGen(DFLeafNode *N) {
                "Intrinsic ID!");
 
         // For immediate ancestor, use the extra argument introduced in
-        // F_X86
+        // F_CPU
         int numParamsF = F->getFunctionType()->getNumParams();
-        int numParamsF_X86 = F_X86->getFunctionType()->getNumParams();
+        int numParamsF_CPU = F_CPU->getFunctionType()->getNumParams();
         assert(
-            (numParamsF_X86 - numParamsF == 6) &&
+            (numParamsF_CPU - numParamsF == 6) &&
             "Difference of arguments between function and its clone is not 6!");
 
         if (parentLevel == 0) {
           // Case when the query is for this node itself
           unsigned offset = 3 - dim;
-          // Traverse argument list of F_X86 in reverse order to find the
+          // Traverse argument list of F_CPU in reverse order to find the
           // correct index or dim argument.
-          Argument *limitVal = getArgumentFromEnd(F_X86, offset);
+          Argument *limitVal = getArgumentFromEnd(F_CPU, offset);
           assert(limitVal && "Limit argument not found. Invalid offset!");
 
           DEBUG(errs() << *II << " replaced with " << *limitVal << "\n");
@@ -1654,7 +1659,7 @@ void CGT_X86::codeGen(DFLeafNode *N) {
           Value *args[] = {
               ConstantInt::get(Type::getInt32Ty(II->getContext()), parentLevel),
               ConstantInt::get(Type::getInt32Ty(II->getContext()), dim)};
-          CallInst *CI = CallInst::Create(llvm_hpvm_x86_getDimLimit,
+          CallInst *CI = CallInst::Create(llvm_hpvm_cpu_getDimLimit,
                                           ArrayRef<Value *>(args, 2),
                                           "numNodeInstances", II);
           DEBUG(errs() << *II << " replaced with " << *CI << "\n");
@@ -1682,13 +1687,13 @@ void CGT_X86::codeGen(DFLeafNode *N) {
     (*i)->eraseFromParent();
   }
 
-  DEBUG(errs() << *F_X86);
+  DEBUG(errs() << *F_CPU);
 }
 
 } // End of namespace
 
-char DFG2LLVM_X86::ID = 0;
-static RegisterPass<DFG2LLVM_X86>
-    X("dfg2llvm-x86", "Dataflow Graph to LLVM for X86 backend",
+char DFG2LLVM_CPU::ID = 0;
+static RegisterPass<DFG2LLVM_CPU>
+    X("dfg2llvm-cpu", "Dataflow Graph to LLVM for CPU backend",
       false /* does not modify the CFG */,
       true /* transformation, not just analysis */);
diff --git a/hpvm/lib/Transforms/DFG2LLVM_NVPTX/DFG2LLVM_NVPTX.exports b/hpvm/lib/Transforms/DFG2LLVM_CPU/DFG2LLVM_CPU.exports
similarity index 100%
rename from hpvm/lib/Transforms/DFG2LLVM_NVPTX/DFG2LLVM_NVPTX.exports
rename to hpvm/lib/Transforms/DFG2LLVM_CPU/DFG2LLVM_CPU.exports
diff --git a/hpvm/lib/Transforms/DFG2LLVM_X86/LLVMBuild.txt b/hpvm/lib/Transforms/DFG2LLVM_CPU/LLVMBuild.txt
similarity index 87%
rename from hpvm/lib/Transforms/DFG2LLVM_X86/LLVMBuild.txt
rename to hpvm/lib/Transforms/DFG2LLVM_CPU/LLVMBuild.txt
index 1e82065bf06fe059cbd081b42a9f83e37352b703..30ba8a76365d02ca8fcdcb34948442ef89f5755e 100644
--- a/hpvm/lib/Transforms/DFG2LLVM_X86/LLVMBuild.txt
+++ b/hpvm/lib/Transforms/DFG2LLVM_CPU/LLVMBuild.txt
@@ -1,4 +1,4 @@
-;===- ./lib/Transforms/DFG2LLVM_X86/LLVMBuild.txt --------------*- Conf -*--===;
+;===- ./lib/Transforms/DFG2LLVM_CPU/LLVMBuild.txt --------------*- Conf -*--===;
 ;
 ;                     The LLVM Compiler Infrastructure
 ;
@@ -17,5 +17,5 @@
 
 [component_0]
 type = Library
-name = DFG2LLVM_X86
+name = DFG2LLVM_CPU
 parent = Transforms
diff --git a/hpvm/lib/Transforms/DFG2LLVM_NVPTX/CMakeLists.txt b/hpvm/lib/Transforms/DFG2LLVM_OpenCL/CMakeLists.txt
similarity index 78%
rename from hpvm/lib/Transforms/DFG2LLVM_NVPTX/CMakeLists.txt
rename to hpvm/lib/Transforms/DFG2LLVM_OpenCL/CMakeLists.txt
index 832f6334a4bc048992ee545844941f44ef2c8fe0..00c651eaa250fc114f229f30e0cb7c121154ff96 100644
--- a/hpvm/lib/Transforms/DFG2LLVM_NVPTX/CMakeLists.txt
+++ b/hpvm/lib/Transforms/DFG2LLVM_OpenCL/CMakeLists.txt
@@ -4,9 +4,9 @@ endif()
 
 set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DLLVM_BUILD_DIR=${PROJECT_BINARY_DIR}")
 
-add_llvm_library( LLVMDFG2LLVM_NVPTX
+add_llvm_library( LLVMDFG2LLVM_OpenCL
   MODULE
-  DFG2LLVM_NVPTX.cpp
+  DFG2LLVM_OpenCL.cpp
 
   DEPENDS
   intrinsics_gen
diff --git a/hpvm/lib/Transforms/DFG2LLVM_NVPTX/DFG2LLVM_NVPTX.cpp b/hpvm/lib/Transforms/DFG2LLVM_OpenCL/DFG2LLVM_OpenCL.cpp
similarity index 93%
rename from hpvm/lib/Transforms/DFG2LLVM_NVPTX/DFG2LLVM_NVPTX.cpp
rename to hpvm/lib/Transforms/DFG2LLVM_OpenCL/DFG2LLVM_OpenCL.cpp
index d250562043b633aa69b4ac6bf77ba2bf51167093..2d9a07500f355f7fd805f74c668814d905842fed 100644
--- a/hpvm/lib/Transforms/DFG2LLVM_NVPTX/DFG2LLVM_NVPTX.cpp
+++ b/hpvm/lib/Transforms/DFG2LLVM_OpenCL/DFG2LLVM_OpenCL.cpp
@@ -1,4 +1,4 @@
-//=== DFG2LLVM_NVPTX.cpp ===//
+//=== DFG2LLVM_OpenCL.cpp ===//
 //
 //                     The LLVM Compiler Infrastructure
 //
@@ -6,6 +6,14 @@
 // License. See LICENSE.TXT for details.
 //
 //===----------------------------------------------------------------------===//
+//
+// This pass generates kernel code, and the host code that launches those
+// kernels, for the GPU target from the HPVM dataflow graph. The kernels are
+// written to a separate file, which the C-Backend uses to generate the
+// OpenCL kernels.
+//
+//===----------------------------------------------------------------------===//
+
 
 #define ENABLE_ASSERTS
 #define TARGET_PTX 64
@@ -14,7 +22,7 @@
 #define CONSTANT_ADDRSPACE 4
 #define SHARED_ADDRSPACE 3
 
-#define DEBUG_TYPE "DFG2LLVM_NVPTX"
+#define DEBUG_TYPE "DFG2LLVM_OpenCL"
 #include "SupportHPVM/DFG2LLVM.h"
 #include "SupportHPVM/HPVMTimer.h"
 #include "SupportHPVM/HPVMUtils.h"
@@ -54,8 +62,8 @@ using namespace dfg2llvm;
 using namespace hpvmUtils;
 
 // HPVM Command line option to use timer or not
-static cl::opt<bool> HPVMTimer_NVPTX("hpvm-timers-ptx",
-                                     cl::desc("Enable hpvm timers"));
+static cl::opt<bool> HPVMTimer_OpenCL("hpvm-timers-ptx",
+                                      cl::desc("Enable hpvm timers"));
 
 namespace {
 // Helper class declarations
@@ -149,10 +157,10 @@ static void findIntrinsicInst(Function *, Intrinsic::ID,
 static AtomicRMWInst::BinOp getAtomicOp(Intrinsic::ID);
 static std::string getAtomicOpName(Intrinsic::ID);
 
-// DFG2LLVM_NVPTX - The first implementation.
-struct DFG2LLVM_NVPTX : public DFG2LLVM {
+// DFG2LLVM_OpenCL - The first implementation.
+struct DFG2LLVM_OpenCL : public DFG2LLVM {
   static char ID; // Pass identification, replacement for typeid
-  DFG2LLVM_NVPTX() : DFG2LLVM(ID) {}
+  DFG2LLVM_OpenCL() : DFG2LLVM(ID) {}
 
 private:
 public:
@@ -160,7 +168,7 @@ public:
 };
 
 // Visitor for Code generation traversal (tree traversal for now)
-class CGT_NVPTX : public CodeGenTraversal {
+class CGT_OpenCL : public CodeGenTraversal {
 
 private:
   // Member variables
@@ -194,8 +202,8 @@ private:
 
   // Virtual Functions
   void init() {
-    HPVMTimer = HPVMTimer_NVPTX;
-    TargetName = "NVPTX";
+    HPVMTimer = HPVMTimer_OpenCL;
+    TargetName = "OpenCL";
   }
   void initRuntimeAPI();
   void codeGen(DFInternalNode *N);
@@ -203,7 +211,7 @@ private:
 
 public:
   // Constructor
-  CGT_NVPTX(Module &_M, BuildDFG &_DFG)
+  CGT_OpenCL(Module &_M, BuildDFG &_DFG)
       : CodeGenTraversal(_M, _DFG), KernelM(CloneModule(_M)) {
     init();
     initRuntimeAPI();
@@ -257,7 +265,7 @@ public:
 };
 
 // Initialize the HPVM runtime API. This makes it easier to insert these calls
-void CGT_NVPTX::initRuntimeAPI() {
+void CGT_OpenCL::initRuntimeAPI() {
 
   // Load Runtime API Module
   SMDiagnostic Err;
@@ -289,7 +297,7 @@ void CGT_NVPTX::initRuntimeAPI() {
   initTimerAPI();
 
   // Insert init context in main
-  DEBUG(errs() << "Gen Code to initialize NVPTX Timer\n");
+  DEBUG(errs() << "Gen Code to initialize OpenCL Timer\n");
   Function *VI = M.getFunction("llvm.hpvm.init");
   assert(VI->getNumUses() == 1 && "__hpvm__init should only be used once");
 
@@ -302,7 +310,7 @@ void CGT_NVPTX::initRuntimeAPI() {
   switchToTimer(hpvm_TimerID_NONE, InitCall);
 
   // Insert print instruction at hpvm exit
-  DEBUG(errs() << "Gen Code to print NVPTX Timer\n");
+  DEBUG(errs() << "Gen Code to print OpenCL Timer\n");
   Function *VC = M.getFunction("llvm.hpvm.cleanup");
   DEBUG(errs() << *VC << "\n");
   assert(VC->getNumUses() == 1 && "__hpvm__clear should only be used once");
@@ -316,8 +324,8 @@ void CGT_NVPTX::initRuntimeAPI() {
 // used to generate a function to associate with this leaf node. The function
 // is responsible for all the memory allocation/transfer and invoking the
 // kernel call on the device
-void CGT_NVPTX::insertRuntimeCalls(DFInternalNode *N, Kernel *K,
-                                   const Twine &FileName) {
+void CGT_OpenCL::insertRuntimeCalls(DFInternalNode *N, Kernel *K,
+                                    const Twine &FileName) {
   // Check if clone already exists. If it does, it means we have visited this
   // function before.
   //  assert(N->getGenFunc() == NULL && "Code already generated for this node");
@@ -338,18 +346,18 @@ void CGT_NVPTX::insertRuntimeCalls(DFInternalNode *N, Kernel *K,
 
   // Create of clone of F with no instructions. Only the type is the same as F
   // without the extra arguments.
-  Function *F_X86;
+  Function *F_CPU;
 
   // Clone the function, if we are seeing this function for the first time. We
   // only need a clone in terms of type.
   ValueToValueMapTy VMap;
 
   // Create new function with the same type
-  F_X86 =
+  F_CPU =
       Function::Create(F->getFunctionType(), F->getLinkage(), F->getName(), &M);
 
   // Loop over the arguments, copying the names of arguments over.
-  Function::arg_iterator dest_iterator = F_X86->arg_begin();
+  Function::arg_iterator dest_iterator = F_CPU->arg_begin();
   for (Function::const_arg_iterator i = F->arg_begin(), e = F->arg_end();
        i != e; ++i) {
     dest_iterator->setName(i->getName()); // Copy the name over...
@@ -358,29 +366,29 @@ void CGT_NVPTX::insertRuntimeCalls(DFInternalNode *N, Kernel *K,
   }
 
   // Add a basic block to this empty function
-  BasicBlock *BB = BasicBlock::Create(M.getContext(), "entry", F_X86);
+  BasicBlock *BB = BasicBlock::Create(M.getContext(), "entry", F_CPU);
   ReturnInst *RI = ReturnInst::Create(
-      M.getContext(), UndefValue::get(F_X86->getReturnType()), BB);
+      M.getContext(), UndefValue::get(F_CPU->getReturnType()), BB);
 
   // FIXME: Adding Index and Dim arguments are probably not required except
-  // for consistency purpose (DFG2LLVM_X86 does assume that all leaf nodes do
+  // for consistency purpose (DFG2LLVM_CPU does assume that all leaf nodes do
   // have those arguments)
 
   // Add Index and Dim arguments except for the root node
   if (!N->isRoot() && !N->getParent()->isChildGraphStreaming())
-    F_X86 = addIdxDimArgs(F_X86);
+    F_CPU = addIdxDimArgs(F_CPU);
 
-  BB = &*F_X86->begin();
+  BB = &*F_CPU->begin();
   RI = cast<ReturnInst>(BB->getTerminator());
 
   // Add the generated function info to DFNode
-  //  N->setGenFunc(F_X86, hpvm::CPU_TARGET);
-  N->addGenFunc(F_X86, hpvm::GPU_TARGET, true);
-  DEBUG(errs() << "Added GPUGenFunc: " << F_X86->getName() << " for node "
+  //  N->setGenFunc(F_CPU, hpvm::CPU_TARGET);
+  N->addGenFunc(F_CPU, hpvm::GPU_TARGET, true);
+  DEBUG(errs() << "Added GPUGenFunc: " << F_CPU->getName() << " for node "
                << N->getFuncPointer()->getName() << "\n");
 
   // Loop over the arguments, to create the VMap
-  dest_iterator = F_X86->arg_begin();
+  dest_iterator = F_CPU->arg_begin();
   for (Function::const_arg_iterator i = F->arg_begin(), e = F->arg_end();
        i != e; ++i) {
     // Add mapping to VMap and increment dest iterator
@@ -435,16 +443,16 @@ void CGT_NVPTX::insertRuntimeCalls(DFInternalNode *N, Kernel *K,
 
   DEBUG(errs() << "Inserting launch call"
                << "\n");
-  CallInst *NVPTX_Ctx = CallInst::Create(llvm_hpvm_ocl_launch,
-                                         ArrayRef<Value *>(LaunchInstArgs, 2),
-                                         "graph" + KF->getName(), InitCall);
-  DEBUG(errs() << *NVPTX_Ctx << "\n");
-  GraphIDAddr = new GlobalVariable(M, NVPTX_Ctx->getType(), false,
-                                   GlobalValue::CommonLinkage,
-                                   Constant::getNullValue(NVPTX_Ctx->getType()),
-                                   "graph" + KF->getName() + ".addr");
+  CallInst *OpenCL_Ctx = CallInst::Create(llvm_hpvm_ocl_launch,
+                                          ArrayRef<Value *>(LaunchInstArgs, 2),
+                                          "graph" + KF->getName(), InitCall);
+  DEBUG(errs() << *OpenCL_Ctx << "\n");
+  GraphIDAddr = new GlobalVariable(
+      M, OpenCL_Ctx->getType(), false, GlobalValue::CommonLinkage,
+      Constant::getNullValue(OpenCL_Ctx->getType()),
+      "graph" + KF->getName() + ".addr");
   DEBUG(errs() << "Store at: " << *GraphIDAddr << "\n");
-  StoreInst *SI = new StoreInst(NVPTX_Ctx, GraphIDAddr, InitCall);
+  StoreInst *SI = new StoreInst(OpenCL_Ctx, GraphIDAddr, InitCall);
   DEBUG(errs() << *SI << "\n");
   switchToTimer(hpvm_TimerID_NONE, InitCall);
   switchToTimer(hpvm_TimerID_SETUP, RI);
@@ -463,14 +471,14 @@ void CGT_NVPTX::insertRuntimeCalls(DFInternalNode *N, Kernel *K,
     for(unsigned i=0; i<KF->getFunctionType()->getNumParams(); i++) {
 
       // The kernel object gives us the mapping of arguments from kernel launch
-      // node function (F_X86) to kernel (kernel->KF)
-      Value* inputVal = getArgumentAt(F_X86, K->getInArgMap()[i]);
+      // node function (F_CPU) to kernel (kernel->KF)
+      Value* inputVal = getArgumentAt(F_CPU, K->getInArgMap()[i]);
 
   */
 
   for (auto &InArgMapPair : kernelInArgMap) {
     unsigned i = InArgMapPair.first;
-    Value *inputVal = getArgumentAt(F_X86, InArgMapPair.second);
+    Value *inputVal = getArgumentAt(F_CPU, InArgMapPair.second);
     DEBUG(errs() << "\tArgument " << i << " = " << *inputVal << "\n");
 
     // input value has been obtained.
@@ -504,7 +512,7 @@ void CGT_NVPTX::insertRuntimeCalls(DFInternalNode *N, Kernel *K,
       // Assert that the pointer argument size (next argument) is in the map
       assert(kernelInArgMap.find(i + 1) != kernelInArgMap.end());
 
-      Value *inputSize = getArgumentAt(F_X86, kernelInArgMap[i + 1]);
+      Value *inputSize = getArgumentAt(F_CPU, kernelInArgMap[i + 1]);
       assert(
           inputSize->getType() == Type::getInt64Ty(M.getContext()) &&
           "Pointer type input must always be followed by size (integer type)");
@@ -606,7 +614,7 @@ void CGT_NVPTX::insertRuntimeCalls(DFInternalNode *N, Kernel *K,
 
     std::vector<Value *> AllocInputArgs;
     for (unsigned i = 0; i < K->allocInArgMap.size(); i++) {
-      AllocInputArgs.push_back(getArgumentAt(F_X86, K->allocInArgMap.at(i)));
+      AllocInputArgs.push_back(getArgumentAt(F_CPU, K->allocInArgMap.at(i)));
     }
 
     CallInst *CI = CallInst::Create(F_alloc, AllocInputArgs, "", RI);
@@ -759,7 +767,7 @@ void CGT_NVPTX::insertRuntimeCalls(DFInternalNode *N, Kernel *K,
   DFNode *C = N->getChildGraph()->getExit();
   // Get OutputType of this node
   StructType *OutTy = N->getOutputType();
-  Value *retVal = UndefValue::get(F_X86->getReturnType());
+  Value *retVal = UndefValue::get(F_CPU->getReturnType());
   // Find the kernel's output arg map, to use instead of the bindings
   std::vector<unsigned> outArgMap = kernel->getOutArgMap();
   // Find all the input edges to exit node
@@ -779,7 +787,7 @@ void CGT_NVPTX::insertRuntimeCalls(DFInternalNode *N, Kernel *K,
     // argument from argument list of this internal node
     Value *inputVal;
     if (SrcDF->isEntryNode()) {
-      inputVal = getArgumentAt(F_X86, i);
+      inputVal = getArgumentAt(F_CPU, i);
       DEBUG(errs() << "Argument " << i << " = " << *inputVal << "\n");
     } else {
       // edge is from a internal node
@@ -812,13 +820,13 @@ void CGT_NVPTX::insertRuntimeCalls(DFInternalNode *N, Kernel *K,
   DEBUG(errs() << "Extracted all\n");
   switchToTimer(hpvm_TimerID_NONE, RI);
   retVal->setName("output");
-  ReturnInst *newRI = ReturnInst::Create(F_X86->getContext(), retVal);
+  ReturnInst *newRI = ReturnInst::Create(F_CPU->getContext(), retVal);
   ReplaceInstWithInst(RI, newRI);
 }
 
 // Right now, only targeting the one level case. In general, device functions
 // can return values so we don't need to change them
-void CGT_NVPTX::codeGen(DFInternalNode *N) {
+void CGT_OpenCL::codeGen(DFInternalNode *N) {
   DEBUG(errs() << "Inside internal node: " << N->getFuncPointer()->getName()
                << "\n");
   if (KernelLaunchNode == NULL)
@@ -910,7 +918,7 @@ void CGT_NVPTX::codeGen(DFInternalNode *N) {
   }
 }
 
-void CGT_NVPTX::codeGen(DFLeafNode *N) {
+void CGT_OpenCL::codeGen(DFLeafNode *N) {
   DEBUG(errs() << "Inside leaf node: " << N->getFuncPointer()->getName()
                << "\n");
 
@@ -991,49 +999,49 @@ void CGT_NVPTX::codeGen(DFLeafNode *N) {
   // Look up if we have visited this function before. If we have, then just
   // get the cloned function pointer from DFNode. Otherwise, create the cloned
   // function and add it to the DFNode GenFunc.
-  //  Function *F_nvptx = N->getGenFunc();
-  Function *F_nvptx = N->getGenFuncForTarget(hpvm::GPU_TARGET);
+  //  Function *F_opencl = N->getGenFunc();
+  Function *F_opencl = N->getGenFuncForTarget(hpvm::GPU_TARGET);
 
-  assert(F_nvptx == NULL &&
+  assert(F_opencl == NULL &&
          "Error: Visiting a node for which code already generated");
   // Clone the function
   ValueToValueMapTy VMap;
 
-  // F_nvptx->setName(FName+"_nvptx");
+  // F_opencl->setName(FName+"_opencl");
 
   Twine FName = F->getName();
   StringRef fStr = FName.getSingleStringRef();
-  Twine newFName = Twine(fStr, "_nvptx");
-  F_nvptx = CloneFunction(F, VMap);
-  F_nvptx->setName(newFName);
+  Twine newFName = Twine(fStr, "_opencl");
+  F_opencl = CloneFunction(F, VMap);
+  F_opencl->setName(newFName);
 
   //  errs() << "Old Function Name: " << F->getName() << "\n";
-  //  errs() << "New Function Name: " << F_nvptx->getName() << "\n";
+  //  errs() << "New Function Name: " << F_opencl->getName() << "\n";
 
-  F_nvptx->removeFromParent();
+  F_opencl->removeFromParent();
 
   // Insert the cloned function into the kernels module
-  KernelM->getFunctionList().push_back(F_nvptx);
+  KernelM->getFunctionList().push_back(F_opencl);
 
-  // TODO: Iterate over all the instructions of F_nvptx and identify the
+  // TODO: Iterate over all the instructions of F_opencl and identify the
   // callees and clone them into this module.
-  DEBUG(errs() << *F_nvptx->getType());
-  DEBUG(errs() << *F_nvptx);
+  DEBUG(errs() << *F_opencl->getType());
+  DEBUG(errs() << *F_opencl);
 
   // Transform  the function to void and remove all target dependent attributes
   // from the function
-  F_nvptx = transformFunctionToVoid(F_nvptx);
+  F_opencl = transformFunctionToVoid(F_opencl);
 
   // Add generated function info to DFNode
-  //  N->setGenFunc(F_nvptx, hpvm::GPU_TARGET);
-  N->addGenFunc(F_nvptx, hpvm::GPU_TARGET, false);
+  //  N->setGenFunc(F_opencl, hpvm::GPU_TARGET);
+  N->addGenFunc(F_opencl, hpvm::GPU_TARGET, false);
 
   DEBUG(
       errs()
       << "Removing all attributes from Kernel Function and adding nounwind\n");
-  F_nvptx->removeAttributes(AttributeList::FunctionIndex,
-                            F_nvptx->getAttributes().getFnAttributes());
-  F_nvptx->addAttribute(AttributeList::FunctionIndex, Attribute::NoUnwind);
+  F_opencl->removeAttributes(AttributeList::FunctionIndex,
+                             F_opencl->getAttributes().getFnAttributes());
+  F_opencl->addAttribute(AttributeList::FunctionIndex, Attribute::NoUnwind);
 
   // FIXME: For now, assume only one allocation node
   kernel->AllocationNode = NULL;
@@ -1111,8 +1119,8 @@ void CGT_NVPTX::codeGen(DFLeafNode *N) {
   // global address space
   unsigned argIndex = 0;
   std::vector<unsigned> GlobalMemArgs;
-  for (Function::arg_iterator ai = F_nvptx->arg_begin(),
-                              ae = F_nvptx->arg_end();
+  for (Function::arg_iterator ai = F_opencl->arg_begin(),
+                              ae = F_opencl->arg_end();
        ai != ae; ++ai) {
     if (ai->getType()->isPointerTy()) {
       // If the arguement is already chosen for shared memory arguemnt list,
@@ -1133,11 +1141,11 @@ void CGT_NVPTX::codeGen(DFLeafNode *N) {
   // loads are not dependent on node id of current node, should be moved to
   // constant memory, subject to size of course
   std::vector<unsigned> ConstantMemArgs =
-      globalToConstantMemoryOpt(&GlobalMemArgs, F_nvptx);
+      globalToConstantMemoryOpt(&GlobalMemArgs, F_opencl);
 
-  F_nvptx = changeArgAddrspace(F_nvptx, ConstantMemArgs, GLOBAL_ADDRSPACE);
-  F_nvptx = changeArgAddrspace(F_nvptx, SharedMemArgs, SHARED_ADDRSPACE);
-  F_nvptx = changeArgAddrspace(F_nvptx, GlobalMemArgs, GLOBAL_ADDRSPACE);
+  F_opencl = changeArgAddrspace(F_opencl, ConstantMemArgs, GLOBAL_ADDRSPACE);
+  F_opencl = changeArgAddrspace(F_opencl, SharedMemArgs, SHARED_ADDRSPACE);
+  F_opencl = changeArgAddrspace(F_opencl, GlobalMemArgs, GLOBAL_ADDRSPACE);
 
   // Function to replace call instructions to functions in the kernel
   std::map<Function *, Function *> OrgToClonedFuncMap;
@@ -1168,7 +1176,7 @@ void CGT_NVPTX::codeGen(DFLeafNode *N) {
   };
 
   // Go through all the instructions
-  for (inst_iterator i = inst_begin(F_nvptx), e = inst_end(F_nvptx); i != e;
+  for (inst_iterator i = inst_begin(F_opencl), e = inst_end(F_opencl); i != e;
        ++i) {
     Instruction *I = &(*i);
     // Leaf nodes should not contain HPVM graph intrinsics or launch
@@ -1189,7 +1197,7 @@ void CGT_NVPTX::codeGen(DFLeafNode *N) {
       /**************************** llvm.hpvm.getNode()
        * *****************************/
       case Intrinsic::hpvm_getNode: {
-        DEBUG(errs() << F_nvptx->getName() << "\t: Handling getNode\n");
+        DEBUG(errs() << F_opencl->getName() << "\t: Handling getNode\n");
         // add mapping <intrinsic, this node> to the node-specific map
         Leaf_HandleToDFNodeMap[II] = N;
         IItoRemove.push_back(II);
@@ -1197,7 +1205,7 @@ void CGT_NVPTX::codeGen(DFLeafNode *N) {
       /************************* llvm.hpvm.getParentNode()
        * **************************/
       case Intrinsic::hpvm_getParentNode: {
-        DEBUG(errs() << F_nvptx->getName() << "\t: Handling getParentNode\n");
+        DEBUG(errs() << F_opencl->getName() << "\t: Handling getParentNode\n");
         // get the parent node of the arg node
         // get argument node
         ArgII = cast<IntrinsicInst>((II->getOperand(0))->stripPointerCasts());
@@ -1213,7 +1221,7 @@ void CGT_NVPTX::codeGen(DFLeafNode *N) {
       /*************************** llvm.hpvm.getNumDims()
        * ***************************/
       case Intrinsic::hpvm_getNumDims: {
-        DEBUG(errs() << F_nvptx->getName() << "\t: Handling getNumDims\n");
+        DEBUG(errs() << F_opencl->getName() << "\t: Handling getNumDims\n");
         // get node from map
         // get the appropriate field
         ArgII = cast<IntrinsicInst>((II->getOperand(0))->stripPointerCasts());
@@ -1234,7 +1242,8 @@ void CGT_NVPTX::codeGen(DFLeafNode *N) {
       case Intrinsic::hpvm_getNodeInstanceID_x:
       case Intrinsic::hpvm_getNodeInstanceID_y:
       case Intrinsic::hpvm_getNodeInstanceID_z: {
-        DEBUG(errs() << F_nvptx->getName() << "\t: Handling getNodeInstanceID\n"
+        DEBUG(errs() << F_opencl->getName()
+                     << "\t: Handling getNodeInstanceID\n"
                      << "\t: " << *II << "\n");
         ArgII = cast<IntrinsicInst>((II->getOperand(0))->stripPointerCasts());
         ArgDFNode = Leaf_HandleToDFNodeMap[ArgII];
@@ -1318,7 +1327,7 @@ void CGT_NVPTX::codeGen(DFLeafNode *N) {
         // then, why do we need to keep that info in the graph?  (only for the
         // kernel configuration during the call)
 
-        DEBUG(errs() << F_nvptx->getName()
+        DEBUG(errs() << F_opencl->getName()
                      << "\t: Handling getNumNodeInstances\n");
         ArgII = cast<IntrinsicInst>((II->getOperand(0))->stripPointerCasts());
         ArgDFNode = Leaf_HandleToDFNodeMap[ArgII];
@@ -1376,7 +1385,7 @@ void CGT_NVPTX::codeGen(DFLeafNode *N) {
         IItoRemove.push_back(II);
       } break;
       case Intrinsic::hpvm_barrier: {
-        DEBUG(errs() << F_nvptx->getName() << "\t: Handling barrier\n");
+        DEBUG(errs() << F_opencl->getName() << "\t: Handling barrier\n");
         DEBUG(errs() << "Substitute with barrier()\n");
         DEBUG(errs() << *II << "\n");
         FunctionType *FT = FunctionType::get(
@@ -1587,7 +1596,7 @@ void CGT_NVPTX::codeGen(DFLeafNode *N) {
   // search for pattern where float is being casted to int and loaded/stored and
   // change it.
   DEBUG(errs() << "finding pattern for replacement!\n");
-  for (inst_iterator i = inst_begin(F_nvptx), e = inst_end(F_nvptx); i != e;
+  for (inst_iterator i = inst_begin(F_opencl), e = inst_end(F_opencl); i != e;
        ++i) {
     bool cont = false;
     bool keepGEPI = false;
@@ -1625,7 +1634,7 @@ void CGT_NVPTX::codeGen(DFLeafNode *N) {
     // check that addressspace is 1
     //	  if (GEPIaddrspace != 1) {
     //			// does not fit this pattern - addrspace of pointer
-    //argument is not global 			continue;
+    // argument is not global 			continue;
     //		}
     if (!(GEPI->hasOneUse())) {
       // does not fit this pattern - more than one uses
@@ -1867,8 +1876,8 @@ void CGT_NVPTX::codeGen(DFLeafNode *N) {
     KernelM->getFunctionList().push_back(F);
   }
 
-  addCLMetadata(F_nvptx);
-  kernel->KernelFunction = F_nvptx;
+  addCLMetadata(F_opencl);
+  kernel->KernelFunction = F_opencl;
   DEBUG(errs() << "Identified kernel - " << kernel->KernelFunction->getName()
                << "\n");
   DEBUG(errs() << *KernelM);
@@ -1876,8 +1885,8 @@ void CGT_NVPTX::codeGen(DFLeafNode *N) {
   return;
 }
 
-bool DFG2LLVM_NVPTX::runOnModule(Module &M) {
-  DEBUG(errs() << "\nDFG2LLVM_NVPTX PASS\n");
+bool DFG2LLVM_OpenCL::runOnModule(Module &M) {
+  DEBUG(errs() << "\nDFG2LLVM_OpenCL PASS\n");
 
   // Get the BuildDFG Analysis Results:
   // - Dataflow graph
@@ -1891,7 +1900,7 @@ bool DFG2LLVM_NVPTX::runOnModule(Module &M) {
   //    = DFG.getHandleToDFEdgeMap();
 
   // Visitor for Code Generation Graph Traversal
-  CGT_NVPTX *CGTVisitor = new CGT_NVPTX(M, DFG);
+  CGT_OpenCL *CGTVisitor = new CGT_OpenCL(M, DFG);
 
   // Iterate over all the DFGs and produce code for each one of them
   for (auto rootNode : Roots) {
@@ -1907,7 +1916,7 @@ bool DFG2LLVM_NVPTX::runOnModule(Module &M) {
   return true;
 }
 
-std::string CGT_NVPTX::getKernelsModuleName(Module &M) {
+std::string CGT_OpenCL::getKernelsModuleName(Module &M) {
   /*SmallString<128> currentDir;
           llvm::sys::fs::current_path(currentDir);
           std::string fileName = getFilenameFromModule(M);
@@ -1917,7 +1926,7 @@ std::string CGT_NVPTX::getKernelsModuleName(Module &M) {
   return mid.append(".kernels.ll");
 }
 
-void CGT_NVPTX::fixValueAddrspace(Value *V, unsigned addrspace) {
+void CGT_OpenCL::fixValueAddrspace(Value *V, unsigned addrspace) {
   assert(isa<PointerType>(V->getType()) && "Value should be of Pointer Type!");
   PointerType *OldTy = cast<PointerType>(V->getType());
   PointerType *NewTy = PointerType::get(OldTy->getElementType(), addrspace);
@@ -1935,8 +1944,8 @@ void CGT_NVPTX::fixValueAddrspace(Value *V, unsigned addrspace) {
 }
 
 std::vector<unsigned>
-CGT_NVPTX::globalToConstantMemoryOpt(std::vector<unsigned> *GlobalMemArgs,
-                                     Function *F) {
+CGT_OpenCL::globalToConstantMemoryOpt(std::vector<unsigned> *GlobalMemArgs,
+                                      Function *F) {
   std::vector<unsigned> ConstantMemArgs;
   for (Function::arg_iterator ai = F->arg_begin(), ae = F->arg_end(); ai != ae;
        ++ai) {
@@ -1959,9 +1968,9 @@ CGT_NVPTX::globalToConstantMemoryOpt(std::vector<unsigned> *GlobalMemArgs,
   return ConstantMemArgs;
 }
 
-Function *CGT_NVPTX::changeArgAddrspace(Function *F,
-                                        std::vector<unsigned> &Args,
-                                        unsigned addrspace) {
+Function *CGT_OpenCL::changeArgAddrspace(Function *F,
+                                         std::vector<unsigned> &Args,
+                                         unsigned addrspace) {
   unsigned idx = 0;
   std::vector<Type *> ArgTypes;
   for (Function::arg_iterator ai = F->arg_begin(), ae = F->arg_end(); ai != ae;
@@ -1986,7 +1995,7 @@ Function *CGT_NVPTX::changeArgAddrspace(Function *F,
 }
 
 /* Add metadata to module KernelM, for OpenCL kernels */
-void CGT_NVPTX::addCLMetadata(Function *F) {
+void CGT_OpenCL::addCLMetadata(Function *F) {
 
   IRBuilder<> Builder(&*F->begin());
 
@@ -2013,7 +2022,7 @@ void CGT_NVPTX::addCLMetadata(Function *F) {
   MDN_annotations->addOperand(MDNvvmAnnotationsNode);
 }
 
-void CGT_NVPTX::writeKernelsModule() {
+void CGT_OpenCL::writeKernelsModule() {
 
   // In addition to deleting all other functions, we also want to spiff it
   // up a little bit.  Do this now.
@@ -2035,7 +2044,7 @@ void CGT_NVPTX::writeKernelsModule() {
   Out.keep();
 }
 
-Function *CGT_NVPTX::transformFunctionToVoid(Function *F) {
+Function *CGT_OpenCL::transformFunctionToVoid(Function *F) {
 
   DEBUG(errs() << "Transforming function to void: " << F->getName() << "\n");
   // FIXME: Maybe do that using the Node?
@@ -2361,16 +2370,16 @@ static std::string getFilenameFromModule(const Module &M) {
   return moduleID.substr(moduleID.find_last_of("/") + 1);
 }
 
-// Changes the data layout of the Module to be compiled with NVPTX backend
+// Changes the data layout of the Module to be compiled with the OpenCL backend
 // TODO: Figure out when to call it, probably after duplicating the modules
 static void changeDataLayout(Module &M) {
-  std::string nvptx32_layoutStr = "e-p:32:32-i64:64-v16:16-v32:32-n16:32:64";
-  std::string nvptx64_layoutStr = "e-i64:64-v16:16-v32:32-n16:32:64";
+  std::string opencl32_layoutStr = "e-p:32:32-i64:64-v16:16-v32:32-n16:32:64";
+  std::string opencl64_layoutStr = "e-i64:64-v16:16-v32:32-n16:32:64";
 
   if (TARGET_PTX == 32)
-    M.setDataLayout(StringRef(nvptx32_layoutStr));
+    M.setDataLayout(StringRef(opencl32_layoutStr));
   else if (TARGET_PTX == 64)
-    M.setDataLayout(StringRef(nvptx64_layoutStr));
+    M.setDataLayout(StringRef(opencl64_layoutStr));
   else
     assert(false && "Invalid PTX target");
 
@@ -2378,13 +2387,13 @@ static void changeDataLayout(Module &M) {
 }
 
 static void changeTargetTriple(Module &M) {
-  std::string nvptx32_TargetTriple = "nvptx--nvidiacl";
-  std::string nvptx64_TargetTriple = "nvptx64--nvidiacl";
+  std::string opencl32_TargetTriple = "opencl--nvidiacl";
+  std::string opencl64_TargetTriple = "opencl64--nvidiacl";
 
   if (TARGET_PTX == 32)
-    M.setTargetTriple(StringRef(nvptx32_TargetTriple));
+    M.setTargetTriple(StringRef(opencl32_TargetTriple));
   else if (TARGET_PTX == 64)
-    M.setTargetTriple(StringRef(nvptx64_TargetTriple));
+    M.setTargetTriple(StringRef(opencl64_TargetTriple));
   else
     assert(false && "Invalid PTX target");
 
@@ -2464,9 +2473,9 @@ static std::string getAtomicOpName(Intrinsic::ID ID) {
 
 } // End of namespace
 
-char DFG2LLVM_NVPTX::ID = 0;
-static RegisterPass<DFG2LLVM_NVPTX> X("dfg2llvm-nvptx",
-		"Dataflow Graph to LLVM for NVPTX Pass",
+char DFG2LLVM_OpenCL::ID = 0;
+static RegisterPass<DFG2LLVM_OpenCL> X("dfg2llvm-opencl",
+		"Dataflow Graph to LLVM for OpenCL Pass",
 		false /* does not modify the CFG */,
 		true /* transformation,   *
 					* not just analysis */);
diff --git a/hpvm/lib/Transforms/DFG2LLVM_X86/DFG2LLVM_X86.exports b/hpvm/lib/Transforms/DFG2LLVM_OpenCL/DFG2LLVM_OpenCL.exports
similarity index 100%
rename from hpvm/lib/Transforms/DFG2LLVM_X86/DFG2LLVM_X86.exports
rename to hpvm/lib/Transforms/DFG2LLVM_OpenCL/DFG2LLVM_OpenCL.exports
diff --git a/hpvm/lib/Transforms/DFG2LLVM_NVPTX/LLVMBuild.txt b/hpvm/lib/Transforms/DFG2LLVM_OpenCL/LLVMBuild.txt
similarity index 84%
rename from hpvm/lib/Transforms/DFG2LLVM_NVPTX/LLVMBuild.txt
rename to hpvm/lib/Transforms/DFG2LLVM_OpenCL/LLVMBuild.txt
index fb7cae49f8452ee6f207e6f0ed87d9ea9d3e65e6..08d8b9d98d66c63cb02b4be8395b57c448482906 100644
--- a/hpvm/lib/Transforms/DFG2LLVM_NVPTX/LLVMBuild.txt
+++ b/hpvm/lib/Transforms/DFG2LLVM_OpenCL/LLVMBuild.txt
@@ -1,4 +1,4 @@
-;===- ./lib/Transforms/DFG2LLVM_NVPTX/LLVMBuild.txt ------------*- Conf -*--===;
+;===- ./lib/Transforms/DFG2LLVM_OpenCL/LLVMBuild.txt -----------*- Conf -*--===;
 ;
 ;                     The LLVM Compiler Infrastructure
 ;
@@ -17,5 +17,5 @@
 
 [component_0]
 type = Library
-name = DFG2LLVM_NVPTX
+name = DFG2LLVM_OpenCL
 parent = Transforms
diff --git a/hpvm/lib/Transforms/GenHPVM/GenHPVM.cpp b/hpvm/lib/Transforms/GenHPVM/GenHPVM.cpp
index d6ad357a33dee7014ef703c7abba00c28b325378..864dad58cb507fb49ddf3bdc9fcafefa57f866d7 100644
--- a/hpvm/lib/Transforms/GenHPVM/GenHPVM.cpp
+++ b/hpvm/lib/Transforms/GenHPVM/GenHPVM.cpp
@@ -6,6 +6,12 @@
 // License. See LICENSE.TXT for details.
 //
 //===----------------------------------------------------------------------===//
+//
+// This pass takes LLVM IR with HPVM-C functions and generates the textual
+// representation of HPVM IR, which consists of HPVM intrinsics. The
+// memory-to-register optimization pass is expected to run before this pass.
+//
+//===----------------------------------------------------------------------===//
 
 #define DEBUG_TYPE "genhpvm"
 #include "GenHPVM/GenHPVM.h"
diff --git a/hpvm/lib/Transforms/LocalMem/LocalMem.cpp b/hpvm/lib/Transforms/LocalMem/LocalMem.cpp
index fc33ebee71123d89c5f931901dd213c82a401941..c0823274266feba5c78df3cead41c86e80b8a542 100644
--- a/hpvm/lib/Transforms/LocalMem/LocalMem.cpp
+++ b/hpvm/lib/Transforms/LocalMem/LocalMem.cpp
@@ -6,6 +6,12 @@
 // License. See LICENSE.TXT for details.
 //
 //===----------------------------------------------------------------------===//
+//
+// This pass traverses the dataflow graph to recognize the allocation nodes
+// which allocate scratch memory. This pass does not make changes to the textual
+// representation of HPVM IR.
+//
+//===----------------------------------------------------------------------===//
 
 #define DEBUG_TYPE "LocalMem"
 #include "SupportHPVM/DFG2LLVM.h"
diff --git a/hpvm/projects/hpvm-rt/hpvm-rt.cpp b/hpvm/projects/hpvm-rt/hpvm-rt.cpp
index e0e017c03e017edef7e6c1dfed17ceb8db9d2ba5..b6273ec2cca712469269f68f538ce437e9b062ec 100644
--- a/hpvm/projects/hpvm-rt/hpvm-rt.cpp
+++ b/hpvm/projects/hpvm-rt/hpvm-rt.cpp
@@ -39,7 +39,7 @@ typedef struct {
   std::vector<CircularBuffer<uint64_t> *> *BindOutputBuffers;
   std::vector<CircularBuffer<uint64_t> *> *EdgeBuffers;
   std::vector<CircularBuffer<uint64_t> *> *isLastInputBuffers;
-} DFNodeContext_X86;
+} DFNodeContext_CPU;
 
 typedef struct {
   cl_context clOCLContext;
@@ -212,7 +212,7 @@ static inline void checkErr(cl_int err, cl_int success, const char *name) {
 
 /************************* Depth Stack Routines ***************************/
 
-void llvm_hpvm_x86_dstack_push(unsigned n, uint64_t limitX, uint64_t iX,
+void llvm_hpvm_cpu_dstack_push(unsigned n, uint64_t limitX, uint64_t iX,
                                uint64_t limitY, uint64_t iY, uint64_t limitZ,
                                uint64_t iZ) {
   DEBUG(cout << "Pushing node information on stack:\n");
@@ -226,7 +226,7 @@ void llvm_hpvm_x86_dstack_push(unsigned n, uint64_t limitX, uint64_t iX,
   pthread_mutex_unlock(&ocl_mtx);
 }
 
-void llvm_hpvm_x86_dstack_pop() {
+void llvm_hpvm_cpu_dstack_pop() {
   DEBUG(cout << "Popping from depth stack\n");
   pthread_mutex_lock(&ocl_mtx);
   DStack.pop_back();
@@ -234,7 +234,7 @@ void llvm_hpvm_x86_dstack_pop() {
   pthread_mutex_unlock(&ocl_mtx);
 }
 
-uint64_t llvm_hpvm_x86_getDimLimit(unsigned level, unsigned dim) {
+uint64_t llvm_hpvm_cpu_getDimLimit(unsigned level, unsigned dim) {
   DEBUG(cout << "Request limit for dim " << dim << " of ancestor " << level
              << flush << "\n");
   pthread_mutex_lock(&ocl_mtx);
@@ -246,7 +246,7 @@ uint64_t llvm_hpvm_x86_getDimLimit(unsigned level, unsigned dim) {
   return result;
 }
 
-uint64_t llvm_hpvm_x86_getDimInstance(unsigned level, unsigned dim) {
+uint64_t llvm_hpvm_cpu_getDimInstance(unsigned level, unsigned dim) {
   DEBUG(cout << "Request instance id for dim " << dim << " of ancestor "
              << level << flush << "\n");
   pthread_mutex_lock(&ocl_mtx);
@@ -350,13 +350,13 @@ static void *llvm_hpvm_ocl_request_mem(void *ptr, size_t size,
   return d_input;
 }
 
-void *llvm_hpvm_x86_argument_ptr(void *ptr, size_t size) {
+void *llvm_hpvm_cpu_argument_ptr(void *ptr, size_t size) {
   return llvm_hpvm_request_mem(ptr, size);
 }
 
 void *llvm_hpvm_request_mem(void *ptr, size_t size) {
   pthread_mutex_lock(&ocl_mtx);
-  DEBUG(cout << "[X86] Request memory: " << ptr << flush << "\n");
+  DEBUG(cout << "[CPU] Request memory: " << ptr << flush << "\n");
   MemTrackerEntry *MTE = MTracker.lookup(ptr);
   if (MTE == NULL) {
     cout << "ERROR: Requesting memory not present in Table\n";
@@ -1152,8 +1152,8 @@ void hpvm_DestroyTimerSet(struct hpvm_TimerSet *timers) {
 
 // Launch API for a streaming dataflow graph
 void *llvm_hpvm_streamLaunch(void (*LaunchFunc)(void *, void *), void *args) {
-  DFNodeContext_X86 *Context =
-      (DFNodeContext_X86 *)malloc(sizeof(DFNodeContext_X86));
+  DFNodeContext_CPU *Context =
+      (DFNodeContext_CPU *)malloc(sizeof(DFNodeContext_CPU));
 
   Context->threads = new std::vector<pthread_t>();
   Context->ArgInPortSizeMap = new std::map<unsigned, uint64_t>();
@@ -1176,7 +1176,7 @@ void *llvm_hpvm_streamLaunch(void (*LaunchFunc)(void *, void *), void *args) {
 void llvm_hpvm_streamPush(void *graphID, void *args) {
   DEBUG(cout << "StreamPush -- Graph: " << graphID << ", Arguments: " << args
              << flush << "\n");
-  DFNodeContext_X86 *Ctx = (DFNodeContext_X86 *)graphID;
+  DFNodeContext_CPU *Ctx = (DFNodeContext_CPU *)graphID;
   unsigned offset = 0;
   for (unsigned i = 0; i < Ctx->ArgInPortSizeMap->size(); i++) {
     uint64_t element;
@@ -1198,7 +1198,7 @@ void llvm_hpvm_streamPush(void *graphID, void *args) {
 // Pop API for a streaming dataflow graph
 void *llvm_hpvm_streamPop(void *graphID) {
   DEBUG(cout << "StreamPop -- Graph: " << graphID << flush << "\n");
-  DFNodeContext_X86 *Ctx = (DFNodeContext_X86 *)graphID;
+  DFNodeContext_CPU *Ctx = (DFNodeContext_CPU *)graphID;
   unsigned totalBytes = 0;
   for (uint64_t size : *(Ctx->BindOutSizes))
     totalBytes += size;
@@ -1216,7 +1216,7 @@ void *llvm_hpvm_streamPop(void *graphID) {
 // Wait API for a streaming dataflow graph
 void llvm_hpvm_streamWait(void *graphID) {
   DEBUG(cout << "StreamWait -- Graph: " << graphID << flush << "\n");
-  DFNodeContext_X86 *Ctx = (DFNodeContext_X86 *)graphID;
+  DFNodeContext_CPU *Ctx = (DFNodeContext_CPU *)graphID;
   // Push garbage to all other input buffers
   for (unsigned i = 0; i < Ctx->BindInputBuffers->size(); i++) {
     uint64_t element = 0;
@@ -1235,7 +1235,7 @@ void *llvm_hpvm_createBindInBuffer(void *graphID, uint64_t size,
                                    unsigned inArgPort) {
   DEBUG(cout << "Create BindInBuffer -- Graph: " << graphID
              << ", Size: " << size << flush << "\n");
-  DFNodeContext_X86 *Context = (DFNodeContext_X86 *)graphID;
+  DFNodeContext_CPU *Context = (DFNodeContext_CPU *)graphID;
   CircularBuffer<uint64_t> *bufferID =
       new CircularBuffer<uint64_t>(BUFFER_SIZE, "BindIn");
   DEBUG(cout << "\tNew Buffer: " << bufferID << flush << "\n");
@@ -1249,7 +1249,7 @@ void *llvm_hpvm_createBindInBuffer(void *graphID, uint64_t size,
 void *llvm_hpvm_createBindOutBuffer(void *graphID, uint64_t size) {
   DEBUG(cout << "Create BindOutBuffer -- Graph: " << graphID
              << ", Size: " << size << flush << "\n");
-  DFNodeContext_X86 *Context = (DFNodeContext_X86 *)graphID;
+  DFNodeContext_CPU *Context = (DFNodeContext_CPU *)graphID;
   // Twine name = Twine("Bind.Out.")+Twine(Context->BindOutputBuffers->size());
   CircularBuffer<uint64_t> *bufferID =
       new CircularBuffer<uint64_t>(BUFFER_SIZE, "BindOut");
@@ -1261,7 +1261,7 @@ void *llvm_hpvm_createBindOutBuffer(void *graphID, uint64_t size) {
 void *llvm_hpvm_createEdgeBuffer(void *graphID, uint64_t size) {
   DEBUG(cout << "Create EdgeBuffer -- Graph: " << graphID << ", Size: " << size
              << flush << "\n");
-  DFNodeContext_X86 *Context = (DFNodeContext_X86 *)graphID;
+  DFNodeContext_CPU *Context = (DFNodeContext_CPU *)graphID;
   // Twine name = Twine("Edge.")+Twine(Context->EdgeBuffers->size());
   CircularBuffer<uint64_t> *bufferID =
       new CircularBuffer<uint64_t>(BUFFER_SIZE, "Edge");
@@ -1274,7 +1274,7 @@ void *llvm_hpvm_createEdgeBuffer(void *graphID, uint64_t size) {
 void *llvm_hpvm_createLastInputBuffer(void *graphID, uint64_t size) {
   DEBUG(cout << "Create isLastInputBuffer -- Graph: " << graphID
              << ", Size: " << size << flush << "\n");
-  DFNodeContext_X86 *Context = (DFNodeContext_X86 *)graphID;
+  DFNodeContext_CPU *Context = (DFNodeContext_CPU *)graphID;
   // Twine name = Twine("isLastInput.")+Twine(Context->EdgeBuffers->size());
   CircularBuffer<uint64_t> *bufferID =
       new CircularBuffer<uint64_t>(BUFFER_SIZE, "LastInput");
@@ -1286,7 +1286,7 @@ void *llvm_hpvm_createLastInputBuffer(void *graphID, uint64_t size) {
 // Free buffers
 void llvm_hpvm_freeBuffers(void *graphID) {
   DEBUG(cout << "Free all buffers -- Graph: " << graphID << flush << "\n");
-  DFNodeContext_X86 *Context = (DFNodeContext_X86 *)graphID;
+  DFNodeContext_CPU *Context = (DFNodeContext_CPU *)graphID;
   for (CircularBuffer<uint64_t> *bufferID : *(Context->BindInputBuffers))
     delete bufferID;
   for (CircularBuffer<uint64_t> *bufferID : *(Context->BindOutputBuffers))
@@ -1314,7 +1314,7 @@ void llvm_hpvm_createThread(void *graphID, void *(*Func)(void *),
                             void *arguments) {
   DEBUG(cout << "Create Thread -- Graph: " << graphID << ", Func: " << Func
              << ", Args: " << arguments << flush << "\n");
-  DFNodeContext_X86 *Ctx = (DFNodeContext_X86 *)graphID;
+  DFNodeContext_CPU *Ctx = (DFNodeContext_CPU *)graphID;
   int err;
   pthread_t threadID;
   if ((err = pthread_create(&threadID, NULL, Func, arguments)) != 0)
@@ -1326,16 +1326,16 @@ void llvm_hpvm_createThread(void *graphID, void *(*Func)(void *),
 // Wait for thread to finish
 void llvm_hpvm_freeThreads(void *graphID) {
   DEBUG(cout << "Free Threads -- Graph: " << graphID << flush << "\n");
-  DFNodeContext_X86 *Ctx = (DFNodeContext_X86 *)graphID;
+  DFNodeContext_CPU *Ctx = (DFNodeContext_CPU *)graphID;
   for (pthread_t thread : *(Ctx->threads))
     pthread_join(thread, NULL);
 }
 
 /************************ OPENCL & PTHREAD API ********************************/
 
-void *llvm_hpvm_x86_launch(void *(*rootFunc)(void *), void *arguments) {
-  DFNodeContext_X86 *Context =
-      (DFNodeContext_X86 *)malloc(sizeof(DFNodeContext_X86));
+void *llvm_hpvm_cpu_launch(void *(*rootFunc)(void *), void *arguments) {
+  DFNodeContext_CPU *Context =
+      (DFNodeContext_CPU *)malloc(sizeof(DFNodeContext_CPU));
   // int err;
   // if((err = pthread_create(&Context->threadID, NULL, rootFunc, arguments)) !=
   // 0) cout << "Failed to create pthread. Error code = " << err << flush <<
@@ -1344,9 +1344,9 @@ void *llvm_hpvm_x86_launch(void *(*rootFunc)(void *), void *arguments) {
   return Context;
 }
 
-void llvm_hpvm_x86_wait(void *graphID) {
+void llvm_hpvm_cpu_wait(void *graphID) {
   DEBUG(cout << "Waiting for pthread to finish ...\n");
-  // DFNodeContext_X86* Context = (DFNodeContext_X86*) graphID;
+  // DFNodeContext_CPU* Context = (DFNodeContext_CPU*) graphID;
   // pthread_join(Context->threadID, NULL);
   free(graphID);
   DEBUG(cout << "\t... pthread Done!\n");
@@ -1451,8 +1451,7 @@ void *llvm_hpvm_ocl_initContext(enum hpvm::Target T) {
     DEBUG(cout << "\tNAME = " << buffer << flush << "\n");
     clGetPlatformInfo(platformId, CL_PLATFORM_VENDOR, 10240, buffer, NULL);
     DEBUG(cout << "\tVENDOR = " << buffer << flush << "\n");
-    clGetPlatformInfo(platformId, CL_PLATFORM_EXTENSIONS, 10240, buffer,
-                      NULL);
+    clGetPlatformInfo(platformId, CL_PLATFORM_EXTENSIONS, 10240, buffer, NULL);
     DEBUG(cout << "\tEXTENSIONS = " << buffer << flush << "\n");
   } else {
     platformId = findPlatform("intel");
@@ -1466,8 +1465,7 @@ void *llvm_hpvm_ocl_initContext(enum hpvm::Target T) {
     DEBUG(cout << "\tNAME = " << buffer << flush << "\n");
     clGetPlatformInfo(platformId, CL_PLATFORM_VENDOR, 10240, buffer, NULL);
     DEBUG(cout << "\tVENDOR = " << buffer << flush << "\n");
-    clGetPlatformInfo(platformId, CL_PLATFORM_EXTENSIONS, 10240, buffer,
-                      NULL);
+    clGetPlatformInfo(platformId, CL_PLATFORM_EXTENSIONS, 10240, buffer, NULL);
     DEBUG(cout << "\tEXTENSIONS = " << buffer << flush << "\n");
   }
   DEBUG(cout << "Found plarform with id: " << platformId << "\n");
@@ -1483,7 +1481,7 @@ void *llvm_hpvm_ocl_initContext(enum hpvm::Target T) {
   errcode = clGetContextInfo(globalOCLContext, CL_CONTEXT_DEVICES, 0, NULL,
                              &dataBytes);
   checkErr(errcode, CL_SUCCESS, "Failure to get context info length");
-  
+
   DEBUG(cout << "Got databytes: " << dataBytes << "\n");
 
   clDevices = (cl_device_id *)malloc(dataBytes);
diff --git a/hpvm/projects/hpvm-rt/hpvm-rt.h b/hpvm/projects/hpvm-rt/hpvm-rt.h
index 519b467c9047fbbdeea3a4610bedda3a77c36fe2..94fe5b5ef0d82aca9f7556f7022aa513b9d2cc28 100644
--- a/hpvm/projects/hpvm-rt/hpvm-rt.h
+++ b/hpvm/projects/hpvm-rt/hpvm-rt.h
@@ -64,12 +64,12 @@ public:
   unsigned getNumDim() const { return numDim; }
 };
 
-void llvm_hpvm_x86_dstack_push(unsigned n, uint64_t limitX = 0, uint64_t iX = 0,
+void llvm_hpvm_cpu_dstack_push(unsigned n, uint64_t limitX = 0, uint64_t iX = 0,
                                uint64_t limitY = 0, uint64_t iY = 0,
                                uint64_t limitZ = 0, uint64_t iZ = 0);
-void llvm_hpvm_x86_dstack_pop();
-uint64_t llvm_hpvm_x86_getDimLimit(unsigned level, unsigned dim);
-uint64_t llvm_hpvm_x86_getDimInstance(unsigned level, unsigned dim);
+void llvm_hpvm_cpu_dstack_pop();
+uint64_t llvm_hpvm_cpu_getDimLimit(unsigned level, unsigned dim);
+uint64_t llvm_hpvm_cpu_getDimInstance(unsigned level, unsigned dim);
 
 /********************* Memory Tracker **********************************/
 class MemTrackerEntry {
@@ -148,11 +148,11 @@ void llvm_hpvm_untrack_mem(void *);
 void *llvm_hpvm_request_mem(void *, size_t);
 
 /*********************** OPENCL & PTHREAD API **************************/
-void *llvm_hpvm_x86_launch(void *(void *), void *);
-void llvm_hpvm_x86_wait(void *);
+void *llvm_hpvm_cpu_launch(void *(void *), void *);
+void llvm_hpvm_cpu_wait(void *);
 void *llvm_hpvm_ocl_initContext(enum hpvm::Target);
 
-void *llvm_hpvm_x86_argument_ptr(void *, size_t);
+void *llvm_hpvm_cpu_argument_ptr(void *, size_t);
 
 void llvm_hpvm_ocl_clearContext(void *);
 void llvm_hpvm_ocl_argument_shared(void *, int, size_t);
diff --git a/hpvm/projects/llvm-cbe/lib/Target/CBackend/CBackend.cpp b/hpvm/projects/llvm-cbe/lib/Target/CBackend/CBackend.cpp
index 50a7e6848350ef99c96f56cb5ac6d2d75308f398..7e05ac5ee2ab09b282b4b17c17f8e514e223c427 100644
--- a/hpvm/projects/llvm-cbe/lib/Target/CBackend/CBackend.cpp
+++ b/hpvm/projects/llvm-cbe/lib/Target/CBackend/CBackend.cpp
@@ -29,15 +29,8 @@
 
 #include <iostream>
 
-//#include "PHINodePass.h"
+#define DEBUG_TYPE "cbe"
 
-// Jackson Korba 9/29/14
-#ifndef DEBUG_TYPE
-#define DEBUG_TYPE ""
-#endif
-// End Modification
-
-#define DEBUG(x) x
 // Some ms header decided to define setjmp as _setjmp, undo this for this file
 // since we don't need it
 #ifdef setjmp
@@ -149,7 +142,7 @@ bool CWriter::isInlineAsm(Instruction &I) const {
 bool CWriter::runOnFunction(Function &F) {
   // Do not codegen any 'available_externally' functions at all, they have
   // definitions outside the translation unit.
-  errs() << "Running CBE on function: " << F.getName() << "\n";
+  DEBUG(errs() << "Running CBE on function: " << F.getName() << "\n");
   if (F.hasAvailableExternallyLinkage())
     return false;
 
@@ -262,9 +255,7 @@ raw_ostream &CWriter::printTypeString(raw_ostream &Out, Type *Ty,
   }
 
   default:
-#ifndef NDEBUG
-    errs() << "Unknown primitive type: " << *Ty << "\n";
-#endif
+    DEBUG(errs() << "Unknown primitive type: " << *Ty << "\n");
     llvm_unreachable(0);
   }
 }
@@ -367,9 +358,7 @@ static const std::string getCmpPredicateName(CmpInst::Predicate P) {
   case ICmpInst::ICMP_SGT:
     return "sgt";
   default:
-#ifndef NDEBUG
-    errs() << "Invalid icmp predicate!" << P;
-#endif
+    DEBUG(errs() << "Invalid icmp predicate!" << P);
     llvm_unreachable(0);
   }
 }
@@ -414,9 +403,7 @@ raw_ostream &CWriter::printSimpleType(raw_ostream &Out, Type *Ty,
                << " __attribute__((vector_size(8)))";
 
   default:
-#ifndef NDEBUG
-    errs() << "Unknown primitive type: " << *Ty << "\n";
-#endif
+    DEBUG(errs() << "Unknown primitive type: " << *Ty << "\n");
     llvm_unreachable(0);
   }
 }
@@ -462,9 +449,7 @@ CWriter::printTypeName(raw_ostream &Out, Type *Ty, bool isSigned,
   }
 
   default:
-#ifndef NDEBUG
-    errs() << "Unexpected type: " << *Ty << "\n";
-#endif
+    DEBUG(errs() << "Unexpected type: " << *Ty << "\n");
     llvm_unreachable(0);
   }
 }
@@ -1067,9 +1052,8 @@ void CWriter::printConstant(Constant *CPV, enum OperandContext Context) {
       return;
     }
     default:
-#ifndef NDEBUG
-      errs() << "CWriter Error: Unhandled constant expression: " << *CE << "\n";
-#endif
+      DEBUG(errs() << "CWriter Error: Unhandled constant expression: " << *CE
+                   << "\n");
       llvm_unreachable(0);
     }
   } else if (isa<UndefValue>(CPV) && CPV->getType()->isSingleValueType()) {
@@ -1344,9 +1328,7 @@ void CWriter::printConstant(Constant *CPV, enum OperandContext Context) {
     }
     // FALL THROUGH
   default:
-#ifndef NDEBUG
-    errs() << "Unknown constant type: " << *CPV << "\n";
-#endif
+    DEBUG(errs() << "Unknown constant type: " << *CPV << "\n");
     llvm_unreachable(0);
   }
 }
@@ -2447,9 +2429,7 @@ void CWriter::generateHeader(Module &M) {
           Out << " > ";
           break;
         default:
-#ifndef NDEBUG
-          errs() << "Invalid icmp predicate!" << (*it).first;
-#endif
+          DEBUG(errs() << "Invalid icmp predicate!" << (*it).first);
           llvm_unreachable(0);
         }
         Out << "r.vector[" << n << "];\n";
@@ -2688,9 +2668,7 @@ void CWriter::generateHeader(Module &M) {
             Out << " >> ";
             break;
           default:
-#ifndef NDEBUG
-            errs() << "Invalid operator type!" << opcode;
-#endif
+            DEBUG(errs() << "Invalid operator type!" << opcode);
             llvm_unreachable(0);
           }
           Out << "b.vector[" << n << "]";
@@ -2746,9 +2724,7 @@ void CWriter::generateHeader(Module &M) {
           Out << " >> ";
           break;
         default:
-#ifndef NDEBUG
-          errs() << "Invalid operator type!" << opcode;
-#endif
+          DEBUG(errs() << "Invalid operator type!" << opcode);
           llvm_unreachable(0);
         }
         Out << "b;\n";
@@ -2821,9 +2797,7 @@ void CWriter::generateHeader(Module &M) {
           Out << "AShr";
           break;
         default:
-#ifndef NDEBUG
-          errs() << "Invalid operator type!" << opcode;
-#endif
+          DEBUG(errs() << "Invalid operator type!" << opcode);
           llvm_unreachable(0);
         }
         Out << "(16, &a, &b, &r);\n";
@@ -2888,9 +2862,7 @@ void CWriter::generateHeader(Module &M) {
           Out << " >> ";
           break;
         default:
-#ifndef NDEBUG
-          errs() << "Invalid operator type!" << opcode;
-#endif
+          DEBUG(errs() << "Invalid operator type!" << opcode);
           llvm_unreachable(0);
         }
         Out << "b";
@@ -3569,7 +3541,7 @@ void CWriter::printLoop(Loop *L) {
                       Loop::LoopBounds::Direction::Increasing)
                          ? "increasing"
                          : "decreasing")
-                 << "\n";)
+                 << "\n");
 
     std::string startStr;
     if (ConstantInt *startConst = dyn_cast<ConstantInt>(StartValue)) {
@@ -3592,7 +3564,7 @@ void CWriter::printLoop(Loop *L) {
 
     DEBUG(errs() << "\n  for ( " << IVName << " = " << startStr << "; "
                  << IVName << BranchPredicate << finalStr << "; " << IVName
-                 << " = " << IVName << " + " << stepStr << ") {\n";)
+                 << " = " << IVName << " + " << stepStr << ") {\n");
 
     Out << "\n  for ( " << IVName << " = " << startStr << "; " << IVName
         << BranchPredicate << finalStr << "; " << IVName << " = " << IVName
@@ -4039,7 +4011,7 @@ bool CWriter::findMatch(BasicBlock *CurrBlock, BasicBlock *CompBlock,
 // that immediately succeeds the current one.
 //
 void CWriter::visitBranchInst(BranchInst &I) {
-  errs() << "Visiting Branch Instruction: " << I << "\n";
+  DEBUG(errs() << "Visiting Branch Instruction: " << I << "\n");
   Out << "\n/* Branch: " << I << " */\n";
 
   if (I.isConditional()) {
@@ -4055,12 +4027,13 @@ void CWriter::visitBranchInst(BranchInst &I) {
     }
     if (Loop *L = LI->getLoopFor(I.getParent())) {
       if (L == LI->getLoopFor(BB0) && !(L == LI->getLoopFor(BB1))) {
-        errs() << "This is a loop branch!\n";
+        DEBUG(errs() << "This is a loop branch!\n");
         Out << "/* This is a loop branch! */\n";
         // BB0 is in the loop. Print it if it hsn't been printed
         if (VisitedBlocks.find(BB0) != VisitedBlocks.end()) {
-          errs() << "Branching back to header: " << BB0->getName() << "\n";
-          errs() << "This is the end of the loop, closing!\n";
+          DEBUG(errs() << "Branching back to header: " << BB0->getName()
+                       << "\n");
+          DEBUG(errs() << "This is the end of the loop, closing!\n");
           Out << "/* Branching back to header: " << BB0->getName() << " */\n";
           Out << "/* Closing loop! */\n";
           // BB0 is the loop header. CLose the loop then print BB1.
@@ -4069,18 +4042,19 @@ void CWriter::visitBranchInst(BranchInst &I) {
           printPHICopiesForSuccessor(I.getParent(), BB1, 2);
           printBBorLoop(BB1);
         } else {
-          errs() << "Not branching to header! Branching to: " << BB0->getName()
-                 << "\n";
+          DEBUG(errs() << "Not branching to header! Branching to: "
+                       << BB0->getName() << "\n");
           // BB0 is not the loop header. That means we are entering loop body
 
           llvm_unreachable("loop branch unhandled!\n");
         }
       } else if (L == LI->getLoopFor(BB1) && !(L == LI->getLoopFor(BB0))) {
-        errs() << "This is a loop branch!\n";
+        DEBUG(errs() << "This is a loop branch!\n");
         Out << "/* This is a loop branch! */\n";
         if (VisitedBlocks.find(BB1) != VisitedBlocks.end()) {
-          errs() << "Branching back to header: " << BB1->getName() << "\n";
-          errs() << "This is the end of the loop, closing!\n";
+          DEBUG(errs() << "Branching back to header: " << BB1->getName()
+                       << "\n");
+          DEBUG(errs() << "This is the end of the loop, closing!\n");
           Out << "/* Branching back to header: " << BB1->getName() << " */\n";
           Out << "/* Closing loop! */\n";
           // BB0 is the loop header. CLose the loop then print BB1.
@@ -4089,22 +4063,23 @@ void CWriter::visitBranchInst(BranchInst &I) {
           printPHICopiesForSuccessor(I.getParent(), BB0, 2);
           printBBorLoop(BB0);
         } else {
-          errs() << "Not branching to header! Branching to: " << BB1->getName()
-                 << "\n";
+          DEBUG(errs() << "Not branching to header! Branching to: "
+                       << BB1->getName() << "\n");
           // BB1 is not the loop header. That means we are entering loop body
           llvm_unreachable("loop branch unhandled!\n");
         }
       } else {
-        errs() << "This is a conditional statement within a loop!\n";
+        DEBUG(errs() << "This is a conditional statement within a loop!\n");
         Out << "/* This is a conditional statement within a loop! */\n";
-        errs() << ImmPostDomm->getName()
-               << " is the immediate post dominator of " << BB0->getName()
-               << " and " << BB1->getName() << "\n";
+        DEBUG(errs() << ImmPostDomm->getName()
+                     << " is the immediate post dominator of " << BB0->getName()
+                     << " and " << BB1->getName() << "\n");
         if (VisitedBlocks.find(ImmPostDomm) != VisitedBlocks.end()) {
-          errs() << "Not pushing " << ImmPostDomm->getName()
-                 << " because it has already been visited!\n";
+          DEBUG(errs() << "Not pushing " << ImmPostDomm->getName()
+                       << " because it has already been visited!\n");
         } else {
-          errs() << "Pushing " << ImmPostDomm->getName() << " onto stack!\n";
+          DEBUG(errs() << "Pushing " << ImmPostDomm->getName()
+                       << " onto stack!\n");
           ImmPostDommBlocks.push(ImmPostDomm);
         }
 
@@ -4117,25 +4092,25 @@ void CWriter::visitBranchInst(BranchInst &I) {
         Out << ") { /* " << I << "*/\n";
         printPHICopiesForSuccessor(I.getParent(), BB0, 2);
         printBBorLoop(BB0);
-        errs() << "Back to handling " << I.getParent()->getName() << ": " << I
-               << "\n";
+        DEBUG(errs() << "Back to handling " << I.getParent()->getName() << ": "
+                     << I << "\n");
         Out << "/* Back to handling " << I.getParent()->getName() << ": " << I
             << " */\n";
         if (!noElse) {
-          errs() << "Printing else!\n";
+          DEBUG(errs() << "Printing else!\n");
           Out << "  } else { /*" << I << "*/\n";
           printPHICopiesForSuccessor(I.getParent(), BB1, 2);
           ElseBlocks.push(BB1);
           ElseBranches.push(&I);
           printBBorLoop(BB1);
-          errs() << "Back to handling " << I.getParent()->getName() << ": " << I
-                 << "\n";
-          errs() << "Check to see if else block is closed!\n";
+          DEBUG(errs() << "Back to handling " << I.getParent()->getName()
+                       << ": " << I << "\n");
+          DEBUG(errs() << "Check to see if else block is closed!\n");
           Out << "/* Back to handling " << I.getParent()->getName() << ": " << I
               << " */\n";
           Out << "/* Check to see if else block is closed! */\n";
           if (!ElseBlocks.empty() && ElseBlocks.top() == BB1) {
-            errs() << "Else block not closed, need to close braces!\n";
+            DEBUG(errs() << "Else block not closed, need to close braces!\n");
             Out << "/* Else block not closed, need to close braces! */\n";
             Out << "} /* closing " << *(ElseBranches.top()) << " */\n";
             ElseBranches.pop();
@@ -4143,20 +4118,20 @@ void CWriter::visitBranchInst(BranchInst &I) {
           }
           if (!ImmPostDommBlocks.empty() &&
               ImmPostDommBlocks.top() == ImmPostDomm) {
-            errs() << "Will now pop post dom them handle it!\n";
+            DEBUG(errs() << "Will now pop post dom then handle it!\n");
             ImmPostDommBlocks.pop();
             printBBorLoop(ImmPostDomm);
           } else {
-            errs()
-                << "*!*!*!*!*!*!Not sure what is happening here!*!*!*!*!*!*!\n";
+            DEBUG(errs() << "*!*!*!*!*!*!Not sure what is happening "
+                            "here!*!*!*!*!*!*!\n");
           }
         } else {
-          errs() << "No else block. Adding one for phis, then moving to "
-                 << BB1->getName() << "!\n";
+          DEBUG(errs() << "No else block. Adding one for phis, then moving to "
+                       << BB1->getName() << "!\n");
           Out << "/* (3913) No else block. Adding one for phis, then moving to "
               << BB1->getName() << "! */\n";
           Out << "  } /* closing " << I << "*/\n";
-          errs() << "Will now pop post dom them handle it!\n";
+          DEBUG(errs() << "Will now pop post dom then handle it!\n");
           ImmPostDommBlocks.pop();
           Out << "else {\n";
           printPHICopiesForSuccessor(I.getParent(), BB1, 2);
@@ -4165,14 +4140,16 @@ void CWriter::visitBranchInst(BranchInst &I) {
         }
       }
     } else {
-      errs() << "This is a conditional statement!\n";
-      errs() << ImmPostDomm->getName() << " is the immediate post dominator of "
-             << BB0->getName() << " and " << BB1->getName() << "\n";
+      DEBUG(errs() << "This is a conditional statement!\n");
+      DEBUG(errs() << ImmPostDomm->getName()
+                   << " is the immediate post dominator of " << BB0->getName()
+                   << " and " << BB1->getName() << "\n");
       if (VisitedBlocks.find(ImmPostDomm) != VisitedBlocks.end()) {
-        errs() << "Not pushing " << ImmPostDomm->getName()
-               << " because it has already been visited!\n";
+        DEBUG(errs() << "Not pushing " << ImmPostDomm->getName()
+                     << " because it has already been visited!\n");
       } else {
-        errs() << "Pushing " << ImmPostDomm->getName() << " onto stack!\n";
+        DEBUG(errs() << "Pushing " << ImmPostDomm->getName()
+                     << " onto stack!\n");
         ImmPostDommBlocks.push(ImmPostDomm);
       }
       bool noElse = false;
@@ -4184,26 +4161,26 @@ void CWriter::visitBranchInst(BranchInst &I) {
       Out << ") { /* " << I << "*/\n";
       printPHICopiesForSuccessor(I.getParent(), BB0, 2);
       printBBorLoop(BB0);
-      errs() << "Back to handling " << I.getParent()->getName() << ": " << I
-             << "\n";
+      DEBUG(errs() << "Back to handling " << I.getParent()->getName() << ": "
+                   << I << "\n");
       Out << "/* Back to handling " << I.getParent()->getName() << ": " << I
           << " */\n";
       if (!noElse) {
-        errs() << "Printing else!\n";
+        DEBUG(errs() << "Printing else!\n");
         Out << "/* Printing else! */\n";
         Out << "  } else { /*" << I << "*/\n";
         printPHICopiesForSuccessor(I.getParent(), BB1, 2);
         ElseBlocks.push(BB1);
         ElseBranches.push(&I);
         printBBorLoop(BB1);
-        errs() << "Back to handling " << I.getParent()->getName() << ": " << I
-               << "\n";
-        errs() << "Check to see if else block is closed!\n";
+        DEBUG(errs() << "Back to handling " << I.getParent()->getName() << ": "
+                     << I << "\n");
+        DEBUG(errs() << "Check to see if else block is closed!\n");
         Out << "/* Back to handling " << I.getParent()->getName() << ": " << I
             << " */\n";
         Out << "/* Check to see if else block is closed! */\n";
         if (!ElseBlocks.empty() && ElseBlocks.top() == BB1) {
-          errs() << "Else block not closed, need to close braces!\n";
+          DEBUG(errs() << "Else block not closed, need to close braces!\n");
           Out << "/* Else block not closed, need to close braces! */\n";
           Out << "} /* closing " << *(ElseBranches.top()) << " */\n";
           ElseBranches.pop();
@@ -4211,20 +4188,21 @@ void CWriter::visitBranchInst(BranchInst &I) {
         }
         if (!ImmPostDommBlocks.empty() &&
             ImmPostDommBlocks.top() == ImmPostDomm) {
-          errs() << "Will now pop post dom them handle it!\n";
+          DEBUG(errs() << "Will now pop post dom then handle it!\n");
           ImmPostDommBlocks.pop();
           printBBorLoop(ImmPostDomm);
         } else {
-          errs()
-              << "*!*!*!*!*!*!Not sure what is happening here!*!*!*!*!*!*!\n";
+          DEBUG(
+              errs()
+              << "*!*!*!*!*!*!Not sure what is happening here!*!*!*!*!*!*!\n");
         }
       } else {
-        errs() << "No else block. Adding one for phis, then moving to "
-               << BB1->getName() << "!\n";
+        DEBUG(errs() << "No else block. Adding one for phis, then moving to "
+                     << BB1->getName() << "!\n");
         Out << "/* (3985) No else block. Adding one for phis, then moving to "
             << BB1->getName() << "! */\n";
         Out << "  } /* closing " << I << "*/\n";
-        errs() << "Will now pop post dom them handle it!\n";
+        DEBUG(errs() << "Will now pop post dom then handle it!\n");
         ImmPostDommBlocks.pop();
         Out << "else {\n";
         printPHICopiesForSuccessor(I.getParent(), BB1, 2);
@@ -4233,11 +4211,12 @@ void CWriter::visitBranchInst(BranchInst &I) {
       }
     }
   } else {
-    errs() << "This is an unconditional branch!\n";
+    DEBUG(errs() << "This is an unconditional branch!\n");
     BasicBlock *BB = I.getSuccessor(0);
     printPHICopiesForSuccessor(I.getParent(), BB, 2);
     if (!ElseBlocks.empty() && I.getParent() == ElseBlocks.top()) {
-      errs() << "Branch marks end of else block, need to close braces!\n";
+      DEBUG(
+          errs() << "Branch marks end of else block, need to close braces!\n");
       Out << "/* Branch marks end of else block, need to close braces! */\n";
       Out << "} /* closing " << *(ElseBranches.top()) << " */\n";
       ElseBranches.pop();
@@ -4470,9 +4449,7 @@ void CWriter::visitBinaryOperator(BinaryOperator &I) {
       Out << " >> ";
       break;
     default:
-#ifndef NDEBUG
-      errs() << "Invalid operator type!" << I;
-#endif
+      DEBUG(errs() << "Invalid operator type!" << I);
       llvm_unreachable(0);
     }
 
@@ -4534,9 +4511,7 @@ void CWriter::visitICmpInst(ICmpInst &I) {
     Out << " > ";
     break;
   default:
-#ifndef NDEBUG
-    errs() << "Invalid icmp predicate!" << I;
-#endif
+    DEBUG(errs() << "Invalid icmp predicate!" << I);
     llvm_unreachable(0);
   }
 
@@ -4774,9 +4749,7 @@ void CWriter::printIntrinsicDefinition(FunctionType *funT, unsigned Opcode,
            "CBackend does not support arbitrary size integers.");
     switch (Opcode) {
     default:
-#ifndef NDEBUG
-      errs() << "Unsupported Intrinsic!" << Opcode;
-#endif
+      DEBUG(errs() << "Unsupported Intrinsic!" << Opcode);
       llvm_unreachable(0);
 
     case Intrinsic::uadd_with_overflow:
@@ -4878,17 +4851,13 @@ void CWriter::printIntrinsicDefinition(FunctionType *funT, unsigned Opcode,
     } else if (elemT->isPPC_FP128Ty()) {
       suffix = "l";
     } else {
-#ifndef NDEBUG
-      errs() << "Unsupported Intrinsic!" << Opcode;
-#endif
+      DEBUG(errs() << "Unsupported Intrinsic!" << Opcode);
       llvm_unreachable(0);
     }
 
     switch (Opcode) {
     default:
-#ifndef NDEBUG
-      errs() << "Unsupported Intrinsic!" << Opcode;
-#endif
+      DEBUG(errs() << "Unsupported Intrinsic!" << Opcode);
       llvm_unreachable(0);
 
     case Intrinsic::ceil:
@@ -5138,9 +5107,7 @@ bool CWriter::visitBuiltinCall(CallInst &I, Intrinsic::ID ID) {
 
   switch (ID) {
   default: {
-#ifndef NDEBUG
-    errs() << "Unknown LLVM intrinsic! " << I;
-#endif
+    DEBUG(errs() << "Unknown LLVM intrinsic! " << I);
     llvm_unreachable(0);
     return false;
   }
diff --git a/hpvm/scripts/automate_tests.sh b/hpvm/scripts/automate_tests.sh
new file mode 100644
index 0000000000000000000000000000000000000000..565d20754732b77f58acc96b08cd96bfb29ba6e2
--- /dev/null
+++ b/hpvm/scripts/automate_tests.sh
@@ -0,0 +1,29 @@
+#!/bin/bash
+
+CURRENT_DIR=`pwd`
+BUILD_DIR=$CURRENT_DIR/build
+
+if [ -x $BUILD_DIR/tools/hpvm/projects/$HPVM_RT ]; then
+    true
+else
+    echo HPVM not installed! Exiting without running tests!
+    exit 0
+fi
+
+LIT_DIR=$BUILD_DIR/bin/
+
+LIT_TOOL=$LIT_DIR/llvm-lit
+
+TEST_DIR=$CURRENT_DIR/test
+REG_TEST_DIR=$TEST_DIR/regressionTests
+UNIT_TEST_DIR=$TEST_DIR/unitTests
+
+echo
+echo Running tests ...
+echo
+
+# Run regression tests
+$LIT_TOOL -v $REG_TEST_DIR
+
+# Run unit tests
+#$LIT_TOOL -v $UNIT_TEST_DIR
diff --git a/hpvm/llvm_installer/llvm_installer.sh b/hpvm/scripts/llvm_installer.sh
similarity index 91%
rename from hpvm/llvm_installer/llvm_installer.sh
rename to hpvm/scripts/llvm_installer.sh
index a972fb62adf907147d747d9003bc3524ef7dec3d..f878ae077c043dbc03b3bae3be16494ce676f9a2 100755
--- a/hpvm/llvm_installer/llvm_installer.sh
+++ b/hpvm/scripts/llvm_installer.sh
@@ -8,7 +8,7 @@ WGET=wget
 
 CURRENT_DIR=`pwd`
 INSTALL_DIR=`pwd`/install
-BUILD_DIR=$CURRENT_DIR/$LLVM_SRC/build
+BUILD_DIR=$CURRENT_DIR/build
 
 # Using 2 threads by default
 NUM_THREADS=2
@@ -19,6 +19,8 @@ LLVM_SRC="llvm-$VERSION.src"
 LIBCXX_SRC="libcxx-$VERSION.src"
 LIBCXXABI_SRC="libcxxabi-$VERSION.src"
 
+HPVM_RT=hpvm-rt.bc
+
 AUTOMATE="y"
 
 read -p "Build and install HPVM automatically? (y or n): " AUTOMATE   
@@ -151,7 +153,7 @@ echo Patches applied.
 
 if [ ! $AUTOMATE == "y" ]; then
   echo
-  echo HPMV not installed. Exiting. 
+  echo HPVM not installed. Exiting. 
   exit  
 fi
 
@@ -179,21 +181,13 @@ echo make -j$NUM_THREADS
 make -j$NUM_THREADS
 #make install
 
-# echo Building HPVM runtime
-# HPVM_RT_DIR=$HPVM_DIR/projects/hpvm-rt
-# cd $HPVM_RT_DIR
-# make
-
-#cp -r $CURRENT_DIR/projects $HPVM_DIR/
-#make -j$NUM_THREADS
-
 
-#if [ -x $INSTALL_DIR/bin/clang ]; then
-#    true
-#else
-#    echo LLVM not installed properly.
-#    exit 0
-#fi
+if [ -x $BUILD_DIR/tools/hpvm/projects/$HPVM_RT ]; then
+    true
+else
+    echo HPVM not installed properly.
+    exit 0
+fi
 
 cd $CURRENT_DIR
 
diff --git a/hpvm/test/benchmarks/hpvm-cava/Makefile b/hpvm/test/benchmarks/hpvm-cava/Makefile
index 07bb7f06c0544dc87c8c4947bf04501e5e410e29..58dfa72aacb172252ea1c13ed3331322fb600861 100644
--- a/hpvm/test/benchmarks/hpvm-cava/Makefile
+++ b/hpvm/test/benchmarks/hpvm-cava/Makefile
@@ -61,12 +61,12 @@ TESTGEN_OPTFLAGS = -load LLVMGenHPVM.so -genhpvm -globaldce
 
 ifeq ($(TARGET),seq)
   DEVICE = CPU_TARGET
-  HPVM_OPTFLAGS = -load LLVMBuildDFG.so -load LLVMDFG2LLVM_X86.so -load LLVMClearDFG.so -dfg2llvm-x86 -clearDFG
-  HPVM_OPTFLAGS += -hpvm-timers-x86
+  HPVM_OPTFLAGS = -load LLVMBuildDFG.so -load LLVMDFG2LLVM_CPU.so -load LLVMClearDFG.so -dfg2llvm-cpu -clearDFG
+  HPVM_OPTFLAGS += -hpvm-timers-cpu
 else
   DEVICE = GPU_TARGET
-  HPVM_OPTFLAGS = -load LLVMBuildDFG.so -load LLVMLocalMem.so -load LLVMDFG2LLVM_NVPTX.so -load LLVMDFG2LLVM_X86.so -load LLVMClearDFG.so -localmem -dfg2llvm-nvptx -dfg2llvm-x86 -clearDFG
-  HPVM_OPTFLAGS += -hpvm-timers-x86 -hpvm-timers-ptx
+  HPVM_OPTFLAGS = -load LLVMBuildDFG.so -load LLVMLocalMem.so -load LLVMDFG2LLVM_OpenCL.so -load LLVMDFG2LLVM_CPU.so -load LLVMClearDFG.so -localmem -dfg2llvm-opencl -dfg2llvm-cpu -clearDFG
+  HPVM_OPTFLAGS += -hpvm-timers-cpu -hpvm-timers-ptx
 endif
   TESTGEN_OPTFLAGS += -hpvm-timers-gen
 
diff --git a/hpvm/test/benchmarks/parboil/common/mk/hpvm.mk b/hpvm/test/benchmarks/parboil/common/mk/hpvm.mk
index 9e0318600a3a2d43ed60922e2f48e7e23ea290a7..5938ca87582de4f1212a2b6acd3f5819a708c4b2 100755
--- a/hpvm/test/benchmarks/parboil/common/mk/hpvm.mk
+++ b/hpvm/test/benchmarks/parboil/common/mk/hpvm.mk
@@ -15,14 +15,14 @@ HPVM_RT_PATH = $(LLVM_BUILD_DIR)/tools/hpvm/projects/hpvm-rt
 HPVM_RT_LIB = $(HPVM_RT_PATH)/hpvm-rt.bc
 
 TESTGEN_OPTFLAGS = -load LLVMGenHPVM.so -genhpvm -globaldce
-KERNEL_GEN_FLAGS = -O3 -target nvptx64-nvidia-nvcl
+KERNEL_GEN_FLAGS = -O3 -target nvptx64-nvidia-nvcl
 
 ifeq ($(TARGET),seq)
   DEVICE = CPU_TARGET
-  HPVM_OPTFLAGS = -load LLVMBuildDFG.so -load LLVMDFG2LLVM_X86.so -load LLVMClearDFG.so -dfg2llvm-x86 -clearDFG
+  HPVM_OPTFLAGS = -load LLVMBuildDFG.so -load LLVMDFG2LLVM_CPU.so -load LLVMClearDFG.so -dfg2llvm-cpu -clearDFG
 else
   DEVICE = GPU_TARGET
-  HPVM_OPTFLAGS = -load LLVMBuildDFG.so -load LLVMLocalMem.so -load LLVMDFG2LLVM_NVPTX.so -load LLVMDFG2LLVM_X86.so -load LLVMClearDFG.so -localmem -dfg2llvm-nvptx -dfg2llvm-x86 -clearDFG
+  HPVM_OPTFLAGS = -load LLVMBuildDFG.so -load LLVMLocalMem.so -load LLVMDFG2LLVM_OpenCL.so -load LLVMDFG2LLVM_CPU.so -load LLVMClearDFG.so -localmem -dfg2llvm-opencl -dfg2llvm-cpu -clearDFG
 endif
 
 CFLAGS += -DDEVICE=$(DEVICE)
@@ -30,16 +30,16 @@ CXXFLAGS += -DDEVICE=$(DEVICE)
 
 HOST_LINKFLAGS =
 
-ifeq ($(TIMER),x86)
-  HPVM_OPTFLAGS += -hpvm-timers-x86
+ifeq ($(TIMER),cpu)
+  HPVM_OPTFLAGS += -hpvm-timers-cpu
 else ifeq ($(TIMER),gen)
   TESTGEN_OPTFLAGS += -hpvm-timers-gen
 else ifeq ($(TIMER),no)
 else
   ifeq ($(TARGET),seq)
-    HPVM_OPTFLAGS += -hpvm-timers-x86
+    HPVM_OPTFLAGS += -hpvm-timers-cpu
   else
-    HPVM_OPTFLAGS += -hpvm-timers-x86 -hpvm-timers-ptx
+    HPVM_OPTFLAGS += -hpvm-timers-cpu -hpvm-timers-ptx
   endif
   TESTGEN_OPTFLAGS += -hpvm-timers-gen
 endif
diff --git a/hpvm/test/benchmarks/pipeline/Makefile b/hpvm/test/benchmarks/pipeline/Makefile
index 7a246a651a06ea67246578371d8797682aea5bfd..8a55393f241f30d840cbf85c31488e652c2023a0 100644
--- a/hpvm/test/benchmarks/pipeline/Makefile
+++ b/hpvm/test/benchmarks/pipeline/Makefile
@@ -48,12 +48,12 @@ TESTGEN_OPTFLAGS = -load LLVMGenHPVM.so -genhpvm -globaldce
 
 ifeq ($(TARGET),seq)
   DEVICE = CPU_TARGET
-  HPVM_OPTFLAGS = -load LLVMBuildDFG.so -load LLVMDFG2LLVM_X86.so -load LLVMClearDFG.so -dfg2llvm-x86 -clearDFG
-  HPVM_OPTFLAGS += -hpvm-timers-x86
+  HPVM_OPTFLAGS = -load LLVMBuildDFG.so -load LLVMDFG2LLVM_CPU.so -load LLVMClearDFG.so -dfg2llvm-cpu -clearDFG
+  HPVM_OPTFLAGS += -hpvm-timers-cpu
 else
   DEVICE = GPU_TARGET
-  HPVM_OPTFLAGS = -load LLVMBuildDFG.so -load LLVMLocalMem.so -load LLVMDFG2LLVM_NVPTX.so -load LLVMDFG2LLVM_X86.so -load LLVMClearDFG.so -localmem -dfg2llvm-nvptx -dfg2llvm-x86 -clearDFG
-  HPVM_OPTFLAGS += -hpvm-timers-x86 -hpvm-timers-ptx
+  HPVM_OPTFLAGS = -load LLVMBuildDFG.so -load LLVMLocalMem.so -load LLVMDFG2LLVM_OpenCL.so -load LLVMDFG2LLVM_CPU.so -load LLVMClearDFG.so -localmem -dfg2llvm-opencl -dfg2llvm-cpu -clearDFG
+  HPVM_OPTFLAGS += -hpvm-timers-cpu -hpvm-timers-ptx
 endif
   TESTGEN_OPTFLAGS += -hpvm-timers-gen
 
diff --git a/hpvm/test/benchmarks/pipeline/src/main.cc b/hpvm/test/benchmarks/pipeline/src/main.cc
index cda1d975a63fc07c174ed57ddef1e72f0973f033..057c13b62745ba618b13b9f2c1443fb41ca45bdb 100644
--- a/hpvm/test/benchmarks/pipeline/src/main.cc
+++ b/hpvm/test/benchmarks/pipeline/src/main.cc
@@ -143,7 +143,7 @@ void packData(struct InStruct *args, float *I, size_t bytesI, float *Is,
  * Need 2D grid, a thread per pixel
  * No use of separable algorithm because we need to do this in one kernel
  * No use of shared memory because
- * - we don't handle it in the X86 pass
+ * - we don't handle it in the CPU pass
  */
 
 #define GAUSSIAN_SIZE 7
@@ -452,7 +452,7 @@ void WrapperComputeZeroCrossings(float *L, size_t bytesL, float *B,
  * Need 2D grid, a thread per pixel
  * No use of separable algorithm because we need to do this in one kernel
  * No use of shared memory because
- * - we don't handle it in the X86 pass
+ * - we don't handle it in the CPU pass
  */
 
 #define SOBEL_SIZE 3
@@ -834,7 +834,7 @@ int main(int argc, char *argv[]) {
   resize(E, out, Size(HEIGHT, WIDTH));
   imshow(input_window, in);
   imshow(output_window, out);
-//  waitKey(0);
+  //  waitKey(0);
 
   struct InStruct *args = (struct InStruct *)malloc(sizeof(InStruct));
   packData(args, (float *)src.data, I_sz, (float *)Is.data, I_sz,
@@ -873,7 +873,7 @@ int main(int argc, char *argv[]) {
         __hpvm__push(DFG, args);
         void *ret = __hpvm__pop(DFG);
         // This is reading the result of the streaming graph
-        size_t framesize =  ((OutStruct *)ret)->ret;
+        size_t framesize = ((OutStruct *)ret)->ret;
 
         llvm_hpvm_request_mem(maxG, bytesMaxG);
         llvm_hpvm_request_mem(E.data, I_sz);
diff --git a/hpvm/test/benchmarks/template/Makefile b/hpvm/test/benchmarks/template/Makefile
index 5524f05286be7fb8bea1aac163f5732e1f31c966..fed129b32d723116b20886ea1f5a414d08192b85 100644
--- a/hpvm/test/benchmarks/template/Makefile
+++ b/hpvm/test/benchmarks/template/Makefile
@@ -52,12 +52,12 @@ TESTGEN_OPTFLAGS = -load LLVMGenHPVM.so -genhpvm -globaldce
 
 ifeq ($(TARGET),seq)
   DEVICE = CPU_TARGET
-  HPVM_OPTFLAGS = -load LLVMBuildDFG.so -load LLVMDFG2LLVM_X86.so -load LLVMClearDFG.so -dfg2llvm-x86 -clearDFG
-  HPVM_OPTFLAGS += -hpvm-timers-x86
+  HPVM_OPTFLAGS = -load LLVMBuildDFG.so -load LLVMDFG2LLVM_CPU.so -load LLVMClearDFG.so -dfg2llvm-cpu -clearDFG
+  HPVM_OPTFLAGS += -hpvm-timers-cpu
 else
   DEVICE = GPU_TARGET
-  HPVM_OPTFLAGS = -load LLVMBuildDFG.so -load LLVMLocalMem.so -load LLVMDFG2LLVM_NVPTX.so -load LLVMDFG2LLVM_X86.so -load LLVMClearDFG.so -localmem -dfg2llvm-nvptx -dfg2llvm-x86 -clearDFG
-  HPVM_OPTFLAGS += -hpvm-timers-x86 -hpvm-timers-ptx
+  HPVM_OPTFLAGS = -load LLVMBuildDFG.so -load LLVMLocalMem.so -load LLVMDFG2LLVM_OpenCL.so -load LLVMDFG2LLVM_CPU.so -load LLVMClearDFG.so -localmem -dfg2llvm-opencl -dfg2llvm-cpu -clearDFG
+  HPVM_OPTFLAGS += -hpvm-timers-cpu -hpvm-timers-ptx
 endif
   TESTGEN_OPTFLAGS += -hpvm-timers-gen
 
diff --git a/hpvm/test/regressionTests/ClearDFG/ThreeLevel.CPU.ll b/hpvm/test/regressionTests/ClearDFG/ThreeLevel.CPU.ll
new file mode 100644
index 0000000000000000000000000000000000000000..460dd15b6b1f6dd38483a18e899f0d96b68cac08
--- /dev/null
+++ b/hpvm/test/regressionTests/ClearDFG/ThreeLevel.CPU.ll
@@ -0,0 +1,225 @@
+; RUN: opt -load LLVMBuildDFG.so -load LLVMDFG2LLVM_CPU.so -load LLVMClearDFG.so -S -dfg2llvm-cpu -clearDFG <  %s | FileCheck %s
+; ModuleID = 'ThreeLevel.ll'
+source_filename = "ThreeLevel.c"
+target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+%struct.Root = type { i32*, i64, i32*, i64, i32*, i64 }
+%struct.out.Func1 = type <{ i32* }>
+%struct.out.Func3 = type <{ i32* }>
+%struct.out.Func2 = type <{ i32* }>
+%struct.out.PipeRoot = type <{ i32* }>
+
+
+; CHECK-LABEL: i32 @main(
+; CHECK-NOT: call void @llvm.hpvm.init()
+; CHECK: call i8* @llvm_hpvm_cpu_launch(i8* (i8*)* @LaunchDataflowGraph, i8*
+; CHECK-NOT: call i8* @llvm.hpvm.launch(i8*
+; CHECK: call void @llvm_hpvm_cpu_wait(i8*
+
+; CHECK-LABEL: @Func1_cloned.1_cloned_cloned_cloned_cloned_cloned_cloned
+; CHECK: call i8* @llvm_hpvm_cpu_argument_ptr(
+
+; CHECK-LABEL: @Func3_cloned.2_cloned_cloned_cloned_cloned_cloned_cloned(
+; CHECK-LABEL: for.body1:
+; CHECK: %index.y = phi i64 [ 0, %for.body ], [ %index.y.inc, %for.body1 ]
+; CHECK-NEXT: call void @llvm_hpvm_cpu_dstack_push(
+; CHECK-NEXT: @Func1_cloned.1_cloned_cloned_cloned_cloned_cloned_cloned(
+; CHECK-NEXT: call void @llvm_hpvm_cpu_dstack_pop()
+
+; CHECK-LABEL: @Func2_cloned.3_cloned_cloned_cloned_cloned_cloned_cloned(
+; CHECK-LABEL: for.body:
+; CHECK-NEXT: %index.x = phi i64 [ 0, %entry ], [ %index.x.inc, %for.body ]
+; CHECK-NEXT: call void @llvm_hpvm_cpu_dstack_push(
+; CHECK-NEXT: @Func3_cloned.2_cloned_cloned_cloned_cloned_cloned_cloned(
+; CHECK-NEXT: call void @llvm_hpvm_cpu_dstack_pop()
+
+; CHECK-LABEL: @PipeRoot_cloned.4(
+; CHECK: call void @llvm_hpvm_cpu_dstack_push(
+; CHECK-NEXT: @Func2_cloned.3_cloned_cloned_cloned_cloned_cloned_cloned(
+; CHECK-NEXT: call void @llvm_hpvm_cpu_dstack_pop()
+
+; CHECK-LABEL: @LaunchDataflowGraph(
+; CHECK: call %struct.out.PipeRoot @PipeRoot_cloned.4(
+
+declare dso_local void @__hpvm__hint(i32) local_unnamed_addr #0
+
+declare dso_local void @__hpvm__attributes(i32, ...) local_unnamed_addr #0
+
+declare dso_local void @__hpvm__return(i32, ...) local_unnamed_addr #0
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.lifetime.start.p0i8(i64 immarg, i8* nocapture) #1
+
+declare dso_local i8* @__hpvm__createNodeND(i32, ...) local_unnamed_addr #0
+
+declare dso_local void @__hpvm__bindIn(i8*, i32, i32, i32) local_unnamed_addr #0
+
+declare dso_local void @__hpvm__bindOut(i8*, i32, i32, i32) local_unnamed_addr #0
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.lifetime.end.p0i8(i64 immarg, i8* nocapture) #1
+
+; Function Attrs: nounwind uwtable
+define dso_local i32 @main() local_unnamed_addr #2 {
+entry:
+  %In1 = alloca i32, align 4
+  %In2 = alloca i32, align 4
+  %Out = alloca i32, align 4
+  %RootArgs = alloca %struct.Root, align 8
+  %0 = bitcast i32* %In1 to i8*
+  call void @llvm.lifetime.start.p0i8(i64 4, i8* nonnull %0) #3
+  store i32 1, i32* %In1, align 4, !tbaa !6
+  %1 = bitcast i32* %In2 to i8*
+  call void @llvm.lifetime.start.p0i8(i64 4, i8* nonnull %1) #3
+  store i32 2, i32* %In2, align 4, !tbaa !6
+  %2 = bitcast i32* %Out to i8*
+  call void @llvm.lifetime.start.p0i8(i64 4, i8* nonnull %2) #3
+  store i32 0, i32* %Out, align 4, !tbaa !6
+  %3 = bitcast %struct.Root* %RootArgs to i8*
+  call void @llvm.lifetime.start.p0i8(i64 48, i8* nonnull %3) #3
+  %input1 = getelementptr inbounds %struct.Root, %struct.Root* %RootArgs, i64 0, i32 0
+  store i32* %In1, i32** %input1, align 8, !tbaa !10
+  %Insize1 = getelementptr inbounds %struct.Root, %struct.Root* %RootArgs, i64 0, i32 1
+  store i64 32, i64* %Insize1, align 8, !tbaa !14
+  %input2 = getelementptr inbounds %struct.Root, %struct.Root* %RootArgs, i64 0, i32 2
+  store i32* %In2, i32** %input2, align 8, !tbaa !15
+  %Insize2 = getelementptr inbounds %struct.Root, %struct.Root* %RootArgs, i64 0, i32 3
+  store i64 32, i64* %Insize2, align 8, !tbaa !16
+  %output = getelementptr inbounds %struct.Root, %struct.Root* %RootArgs, i64 0, i32 4
+  store i32* %Out, i32** %output, align 8, !tbaa !17
+  %Outsize = getelementptr inbounds %struct.Root, %struct.Root* %RootArgs, i64 0, i32 5
+  store i64 32, i64* %Outsize, align 8, !tbaa !18
+  call void @llvm.hpvm.init()
+  %4 = bitcast %struct.Root* %RootArgs to i8*
+  %graphID = call i8* @llvm.hpvm.launch(i8* bitcast (%struct.out.PipeRoot (i32*, i64, i32*, i64, i32*, i64)* @PipeRoot_cloned to i8*), i8* %4, i1 false)
+  call void @llvm.hpvm.wait(i8* %graphID)
+  call void @llvm.hpvm.cleanup()
+  call void @llvm.lifetime.end.p0i8(i64 48, i8* nonnull %3) #3
+  call void @llvm.lifetime.end.p0i8(i64 4, i8* nonnull %2) #3
+  call void @llvm.lifetime.end.p0i8(i64 4, i8* nonnull %1) #3
+  call void @llvm.lifetime.end.p0i8(i64 4, i8* nonnull %0) #3
+  ret i32 0
+}
+
+declare dso_local void @__hpvm__init(...) local_unnamed_addr #0
+
+declare dso_local i8* @__hpvm__launch(i32, ...) local_unnamed_addr #0
+
+declare dso_local void @__hpvm__wait(i8*) local_unnamed_addr #0
+
+declare dso_local void @__hpvm__cleanup(...) local_unnamed_addr #0
+
+declare i8* @llvm_hpvm_initializeTimerSet()
+
+declare void @llvm_hpvm_switchToTimer(i8**, i32)
+
+declare void @llvm_hpvm_printTimerSet(i8**, i8*)
+
+; Function Attrs: nounwind uwtable
+define dso_local %struct.out.Func1 @Func1_cloned(i32* in %In, i64 %Insize, i32* out %Out, i64 %Outsize) #2 {
+entry:
+  %returnStruct = insertvalue %struct.out.Func1 undef, i32* %Out, 0
+  ret %struct.out.Func1 %returnStruct
+}
+
+; Function Attrs: nounwind
+declare i8* @llvm.hpvm.createNode2D(i8*, i64, i64) #3
+
+; Function Attrs: nounwind
+declare void @llvm.hpvm.bind.input(i8*, i32, i32, i1) #3
+
+; Function Attrs: nounwind
+declare void @llvm.hpvm.bind.output(i8*, i32, i32, i1) #3
+
+; Function Attrs: nounwind uwtable
+define dso_local %struct.out.Func3 @Func3_cloned(i32* in %In, i64 %Insize, i32* out %Out, i64 %Outsize) #2 {
+; CHECK-NOT: @Func3_cloned
+entry:
+  %Func1_cloned.node = call i8* @llvm.hpvm.createNode2D(i8* bitcast (%struct.out.Func1 (i32*, i64, i32*, i64)* @Func1_cloned to i8*), i64 3, i64 5)
+  call void @llvm.hpvm.bind.input(i8* %Func1_cloned.node, i32 0, i32 0, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func1_cloned.node, i32 1, i32 1, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func1_cloned.node, i32 2, i32 2, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func1_cloned.node, i32 3, i32 3, i1 false)
+  call void @llvm.hpvm.bind.output(i8* %Func1_cloned.node, i32 0, i32 0, i1 false)
+  ret %struct.out.Func3 undef
+}
+
+; Function Attrs: nounwind
+declare i8* @llvm.hpvm.createNode1D(i8*, i64) #3
+
+; Function Attrs: nounwind uwtable
+define dso_local %struct.out.Func2 @Func2_cloned(i32* in %In, i64 %Insize, i32* out %Out, i64 %Outsize) #2 {
+; CHECK-NOT: @Func2_cloned
+entry:
+  %Func3_cloned.node = call i8* @llvm.hpvm.createNode1D(i8* bitcast (%struct.out.Func3 (i32*, i64, i32*, i64)* @Func3_cloned to i8*), i64 3)
+  call void @llvm.hpvm.bind.input(i8* %Func3_cloned.node, i32 0, i32 0, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func3_cloned.node, i32 1, i32 1, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func3_cloned.node, i32 2, i32 2, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func3_cloned.node, i32 3, i32 3, i1 false)
+  call void @llvm.hpvm.bind.output(i8* %Func3_cloned.node, i32 0, i32 0, i1 false)
+  ret %struct.out.Func2 undef
+}
+
+; Function Attrs: nounwind
+declare i8* @llvm.hpvm.createNode(i8*) #3
+
+; Function Attrs: nounwind uwtable
+define dso_local %struct.out.PipeRoot @PipeRoot_cloned(i32* in %In1, i64 %Insize1, i32* in %In2, i64 %InSize2, i32* out %Out, i64 %Outsize) #2 {
+; CHECK-NOT: @PipeRoot_cloned
+entry:
+  %Func2_cloned.node = call i8* @llvm.hpvm.createNode(i8* bitcast (%struct.out.Func2 (i32*, i64, i32*, i64)* @Func2_cloned to i8*))
+  call void @llvm.hpvm.bind.input(i8* %Func2_cloned.node, i32 0, i32 0, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func2_cloned.node, i32 1, i32 1, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func2_cloned.node, i32 2, i32 2, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func2_cloned.node, i32 3, i32 3, i1 false)
+  call void @llvm.hpvm.bind.output(i8* %Func2_cloned.node, i32 0, i32 0, i1 false)
+  ret %struct.out.PipeRoot undef
+}
+
+; Function Attrs: nounwind
+declare void @llvm.hpvm.init() #3
+
+; Function Attrs: nounwind
+declare i8* @llvm.hpvm.launch(i8*, i8*, i1) #3
+
+; Function Attrs: nounwind
+declare void @llvm.hpvm.wait(i8*) #3
+
+; Function Attrs: nounwind
+declare void @llvm.hpvm.cleanup() #3
+
+attributes #0 = { "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="true" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="true" "use-soft-float"="false" }
+attributes #1 = { argmemonly nounwind }
+attributes #2 = { nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-frame-pointer-elim"="false" "no-infs-fp-math"="true" "no-jump-tables"="false" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="true" "use-soft-float"="false" }
+attributes #3 = { nounwind }
+
+!llvm.module.flags = !{!0}
+!llvm.ident = !{!1}
+!hpvm_hint_cpu = !{!2, !3, !4, !5}
+!hpvm_hint_gpu = !{}
+!hpvm_hint_spir = !{}
+!hpvm_hint_cudnn = !{}
+!hpvm_hint_promise = !{}
+!hpvm_hint_cpu_gpu = !{}
+!hpvm_hint_cpu_spir = !{}
+
+!0 = !{i32 1, !"wchar_size", i32 4}
+!1 = !{!"clang version 9.0.0 (https://gitlab.engr.illinois.edu/llvm/hpvm.git 6690f9e7e8b46b96aea222d3e85315cd63545953)"}
+!2 = !{%struct.out.Func1 (i32*, i64, i32*, i64)* @Func1_cloned}
+!3 = !{%struct.out.Func3 (i32*, i64, i32*, i64)* @Func3_cloned}
+!4 = !{%struct.out.Func2 (i32*, i64, i32*, i64)* @Func2_cloned}
+!5 = !{%struct.out.PipeRoot (i32*, i64, i32*, i64, i32*, i64)* @PipeRoot_cloned}
+!6 = !{!7, !7, i64 0}
+!7 = !{!"int", !8, i64 0}
+!8 = !{!"omnipotent char", !9, i64 0}
+!9 = !{!"Simple C/C++ TBAA"}
+!10 = !{!11, !12, i64 0}
+!11 = !{!"Root", !12, i64 0, !13, i64 8, !12, i64 16, !13, i64 24, !12, i64 32, !13, i64 40}
+!12 = !{!"any pointer", !8, i64 0}
+!13 = !{!"long", !8, i64 0}
+!14 = !{!11, !13, i64 8}
+!15 = !{!11, !12, i64 16}
+!16 = !{!11, !13, i64 24}
+!17 = !{!11, !12, i64 32}
+!18 = !{!11, !13, i64 40}
diff --git a/hpvm/test/regressionTests/ClearDFG/ThreeLevel.OpenCLToCPU.ll b/hpvm/test/regressionTests/ClearDFG/ThreeLevel.OpenCLToCPU.ll
new file mode 100644
index 0000000000000000000000000000000000000000..5fc96483927bb2328fa63517ff094de3ff3bbcee
--- /dev/null
+++ b/hpvm/test/regressionTests/ClearDFG/ThreeLevel.OpenCLToCPU.ll
@@ -0,0 +1,256 @@
+; RUN: opt -load LLVMBuildDFG.so -load LLVMLocalMem.so -load LLVMDFG2LLVM_OpenCL.so -load LLVMDFG2LLVM_CPU.so -load LLVMClearDFG.so -S -localmem -dfg2llvm-opencl -dfg2llvm-cpu -clearDFG <  %s | FileCheck %s
+; ModuleID = 'ThreeLevel.ll'
+source_filename = "ThreeLevel.c"
+target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+%struct.Root = type { i32*, i64, i32*, i64 }
+%emptyStruct = type <{}>
+%emptyStruct.0 = type <{}>
+%emptyStruct.1 = type <{}>
+%emptyStruct.2 = type <{}>
+
+declare dso_local void @__hpvm__hint(i32) local_unnamed_addr #0
+
+declare dso_local void @__hpvm__attributes(i32, ...) local_unnamed_addr #0
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.lifetime.start.p0i8(i64 immarg, i8* nocapture) #1
+
+declare dso_local i8* @__hpvm__getNode(...) local_unnamed_addr #0
+
+declare dso_local i8* @__hpvm__getParentNode(i8*) local_unnamed_addr #0
+
+declare dso_local i64 @__hpvm__getNodeInstanceID_x(i8*) local_unnamed_addr #0
+
+declare dso_local i64 @__hpvm__getNodeInstanceID_y(i8*) local_unnamed_addr #0
+
+declare dso_local i64 @__hpvm__getNumNodeInstances_x(i8*) local_unnamed_addr #0
+
+declare dso_local i64 @__hpvm__getNumNodeInstances_y(i8*) local_unnamed_addr #0
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.lifetime.end.p0i8(i64 immarg, i8* nocapture) #1
+
+declare dso_local i8* @__hpvm__createNodeND(i32, ...) local_unnamed_addr #0
+
+declare dso_local void @__hpvm__bindIn(i8*, i32, i32, i32) local_unnamed_addr #0
+
+; CHECK-LABEL: @Launch(
+; CHECK: call i8* @llvm_hpvm_cpu_launch(i8*
+; CHECK-NOT: call i8* @llvm.hpvm.launch(i8*
+; CHECK: call void @llvm_hpvm_cpu_wait(i8*
+
+; Function Attrs: noinline nounwind uwtable
+define dso_local void @Launch() local_unnamed_addr #2 {
+entry:
+  %RootArgs = alloca %struct.Root, align 8
+  %0 = bitcast %struct.Root* %RootArgs to i8*
+  call void @llvm.lifetime.start.p0i8(i64 32, i8* nonnull %0) #6
+  %call = tail call noalias i8* @malloc(i64 1024) #6
+  %1 = bitcast %struct.Root* %RootArgs to i8**
+  store i8* %call, i8** %1, align 8, !tbaa !6
+  %Insize = getelementptr inbounds %struct.Root, %struct.Root* %RootArgs, i64 0, i32 1
+  store i64 1024, i64* %Insize, align 8, !tbaa !12
+  %output = getelementptr inbounds %struct.Root, %struct.Root* %RootArgs, i64 0, i32 2
+  %call1 = tail call noalias i8* @malloc(i64 1024) #6
+  %2 = bitcast i32** %output to i8**
+  store i8* %call1, i8** %2, align 8, !tbaa !13
+  %Outsize = getelementptr inbounds %struct.Root, %struct.Root* %RootArgs, i64 0, i32 3
+  store i64 1024, i64* %Outsize, align 8, !tbaa !14
+  %3 = bitcast %struct.Root* %RootArgs to i8*
+  %graphID = call i8* @llvm.hpvm.launch(i8* bitcast (%emptyStruct.2 (i32*, i64, i32*, i64)* @PipeRoot_cloned to i8*), i8* %3, i1 false)
+  call void @llvm.hpvm.wait(i8* %graphID)
+  call void @llvm.lifetime.end.p0i8(i64 32, i8* nonnull %0) #6
+  ret void
+}
+
+; Function Attrs: nofree nounwind
+declare dso_local noalias i8* @malloc(i64) local_unnamed_addr #3
+
+declare dso_local i8* @__hpvm__launch(i32, ...) local_unnamed_addr #0
+
+declare dso_local void @__hpvm__wait(i8*) local_unnamed_addr #0
+
+; CHECK-LABEL: @main(
+; CHECK-NOT: call void @llvm.hpvm.init(
+; CHECK-NOT: call void @llvm.hpvm.cleanup(
+; CHECK: call i8* @llvm_hpvm_ocl_initContext(i32
+; CHECK: call i8* @llvm_hpvm_ocl_launch(i8*
+; CHECK: call void @llvm_hpvm_ocl_clearContext(i8*
+
+; CHECK-LABEL: @Func2_cloned.2_cloned_cloned_cloned_cloned_cloned_cloned(
+; CHECK: call i8* @llvm_hpvm_ocl_argument_ptr(i8*
+; CHECK: call void @llvm_hpvm_ocl_argument_scalar(i8*
+; CHECK: call i8* @llvm_hpvm_ocl_argument_ptr(i8*
+; CHECK: call void @llvm_hpvm_ocl_argument_scalar(i8*
+; CHECK: call i8* @llvm_hpvm_ocl_executeNode(i8*
+; CHECK-NEXT: call void @llvm_hpvm_ocl_wait(i8*
+; CHECK-NEXT: call void @llvm_hpvm_ocl_free(i8*
+; CHECK-NEXT: call void @llvm_hpvm_ocl_free(i8*
+
+; CHECK-LABEL: @PipeRoot_cloned.3(
+; CHECK: call void @llvm_hpvm_cpu_dstack_push(
+; CHECK-NEXT: @Func2_cloned.2_cloned_cloned_cloned_cloned_cloned_cloned(
+; CHECK-NEXT: call void @llvm_hpvm_cpu_dstack_pop(
+
+; CHECK-LABEL: define i8* @LaunchDataflowGraph(i8*
+
+; Function Attrs: nounwind uwtable
+define dso_local i32 @main() local_unnamed_addr #4 {
+entry:
+  call void @llvm.hpvm.init()
+  tail call void @Launch()
+  call void @llvm.hpvm.cleanup()
+  ret i32 0
+}
+
+declare dso_local void @__hpvm__init(...) local_unnamed_addr #0
+
+declare dso_local void @__hpvm__cleanup(...) local_unnamed_addr #0
+
+declare i8* @llvm_hpvm_initializeTimerSet()
+
+declare void @llvm_hpvm_switchToTimer(i8**, i32)
+
+declare void @llvm_hpvm_printTimerSet(i8**, i8*)
+
+; Function Attrs: nounwind readnone
+declare i8* @llvm.hpvm.getNode() #5
+
+; Function Attrs: nounwind readnone
+declare i8* @llvm.hpvm.getParentNode(i8*) #5
+
+; Function Attrs: nounwind readnone
+declare i64 @llvm.hpvm.getNodeInstanceID.x(i8*) #5
+
+; Function Attrs: nounwind readnone
+declare i64 @llvm.hpvm.getNodeInstanceID.y(i8*) #5
+
+; Function Attrs: nounwind readnone
+declare i64 @llvm.hpvm.getNumNodeInstances.x(i8*) #5
+
+; Function Attrs: nounwind readnone
+declare i64 @llvm.hpvm.getNumNodeInstances.y(i8*) #5
+
+; Function Attrs: nounwind
+declare i8* @llvm.hpvm.createNode2D(i8*, i64, i64) #6
+
+; Function Attrs: nounwind uwtable
+define dso_local %emptyStruct @Func1_cloned(i32* in %In, i64 %Insize, i32* in out %Out, i64 %Outsize) #4 {
+; CHECK-NOT: @Func1_cloned(
+entry:
+  %call4 = call i8* @llvm.hpvm.getNode()
+  %call15 = call i8* @llvm.hpvm.getParentNode(i8* %call4)
+  %call26 = call i64 @llvm.hpvm.getNodeInstanceID.x(i8* %call4)
+  %call37 = call i64 @llvm.hpvm.getNodeInstanceID.y(i8* %call4)
+  %call58 = call i64 @llvm.hpvm.getNodeInstanceID.x(i8* %call15)
+  %call79 = call i64 @llvm.hpvm.getNodeInstanceID.y(i8* %call15)
+  %call910 = call i64 @llvm.hpvm.getNumNodeInstances.x(i8* %call4)
+  %call1111 = call i64 @llvm.hpvm.getNumNodeInstances.y(i8* %call4)
+  %mul = mul i64 %call910, %call58
+  %add = add i64 %mul, %call26
+  %mul13 = mul i64 %call1111, %call79
+  %add14 = add i64 %mul13, %call37
+  %sext = shl i64 %add14, 32
+  %idxprom = ashr exact i64 %sext, 32
+  %arrayidx = getelementptr inbounds i32, i32* %In, i64 %idxprom
+  %0 = load i32, i32* %arrayidx, align 4, !tbaa !15
+  %sext36 = shl i64 %add, 32
+  %idxprom15 = ashr exact i64 %sext36, 32
+  %arrayidx16 = getelementptr inbounds i32, i32* %Out, i64 %idxprom15
+  %1 = load i32, i32* %arrayidx16, align 4, !tbaa !15
+  %add17 = add nsw i32 %1, %0
+  store i32 %add17, i32* %arrayidx16, align 4, !tbaa !15
+  ret %emptyStruct undef
+}
+
+; Function Attrs: nounwind
+declare void @llvm.hpvm.bind.input(i8*, i32, i32, i1) #6
+
+; Function Attrs: nounwind uwtable
+define dso_local %emptyStruct.0 @Func3_cloned(i32* in %In, i64 %Insize, i32* in out %Out, i64 %Outsize) #4 {
+; CHECK-NOT: @Func3_cloned(
+entry:
+  %Func1_cloned.node = call i8* @llvm.hpvm.createNode2D(i8* bitcast (%emptyStruct (i32*, i64, i32*, i64)* @Func1_cloned to i8*), i64 3, i64 5)
+  call void @llvm.hpvm.bind.input(i8* %Func1_cloned.node, i32 0, i32 0, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func1_cloned.node, i32 1, i32 1, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func1_cloned.node, i32 2, i32 2, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func1_cloned.node, i32 3, i32 3, i1 false)
+  ret %emptyStruct.0 undef
+}
+
+; Function Attrs: nounwind
+declare i8* @llvm.hpvm.createNode(i8*) #6
+
+; Function Attrs: nounwind uwtable
+define dso_local %emptyStruct.1 @Func2_cloned(i32* in %In, i64 %Insize, i32* in out %Out, i64 %Outsize) #4 {
+; CHECK-NOT: @Func2_cloned(
+entry:
+  %Func3_cloned.node = call i8* @llvm.hpvm.createNode2D(i8* bitcast (%emptyStruct.0 (i32*, i64, i32*, i64)* @Func3_cloned to i8*), i64 3, i64 5)
+  call void @llvm.hpvm.bind.input(i8* %Func3_cloned.node, i32 0, i32 0, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func3_cloned.node, i32 1, i32 1, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func3_cloned.node, i32 2, i32 2, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func3_cloned.node, i32 3, i32 3, i1 false)
+  ret %emptyStruct.1 undef
+}
+
+; Function Attrs: nounwind
+declare i8* @llvm.hpvm.launch(i8*, i8*, i1) #6
+
+; Function Attrs: nounwind uwtable
+define dso_local %emptyStruct.2 @PipeRoot_cloned(i32* in %In, i64 %Insize, i32* in out %Out, i64 %Outsize) #4 {
+; CHECK-NOT: @PipeRoot_cloned(
+entry:
+  %Func2_cloned.node = call i8* @llvm.hpvm.createNode(i8* bitcast (%emptyStruct.1 (i32*, i64, i32*, i64)* @Func2_cloned to i8*))
+  call void @llvm.hpvm.bind.input(i8* %Func2_cloned.node, i32 0, i32 0, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func2_cloned.node, i32 1, i32 1, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func2_cloned.node, i32 2, i32 2, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func2_cloned.node, i32 3, i32 3, i1 false)
+  ret %emptyStruct.2 undef
+}
+
+; Function Attrs: nounwind
+declare void @llvm.hpvm.wait(i8*) #6
+
+; Function Attrs: nounwind
+declare void @llvm.hpvm.init() #6
+
+; Function Attrs: nounwind
+declare void @llvm.hpvm.cleanup() #6
+
+attributes #0 = { "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="true" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="true" "use-soft-float"="false" }
+attributes #1 = { argmemonly nounwind }
+attributes #2 = { noinline nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-frame-pointer-elim"="false" "no-infs-fp-math"="true" "no-jump-tables"="false" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="true" "use-soft-float"="false" }
+attributes #3 = { nofree nounwind "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="true" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="true" "use-soft-float"="false" }
+attributes #4 = { nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-frame-pointer-elim"="false" "no-infs-fp-math"="true" "no-jump-tables"="false" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="true" "use-soft-float"="false" }
+attributes #5 = { nounwind readnone }
+attributes #6 = { nounwind }
+
+!llvm.module.flags = !{!0}
+!llvm.ident = !{!1}
+!hpvm_hint_gpu = !{!2}
+!hpvm_hint_cpu = !{!3, !4, !5}
+!hpvm_hint_spir = !{}
+!hpvm_hint_cudnn = !{}
+!hpvm_hint_promise = !{}
+!hpvm_hint_cpu_gpu = !{}
+!hpvm_hint_cpu_spir = !{}
+
+!0 = !{i32 1, !"wchar_size", i32 4}
+!1 = !{!"clang version 9.0.0 (https://gitlab.engr.illinois.edu/llvm/hpvm.git 6690f9e7e8b46b96aea222d3e85315cd63545953)"}
+!2 = !{%emptyStruct (i32*, i64, i32*, i64)* @Func1_cloned}
+!3 = !{%emptyStruct.0 (i32*, i64, i32*, i64)* @Func3_cloned}
+!4 = !{%emptyStruct.1 (i32*, i64, i32*, i64)* @Func2_cloned}
+!5 = !{%emptyStruct.2 (i32*, i64, i32*, i64)* @PipeRoot_cloned}
+!6 = !{!7, !8, i64 0}
+!7 = !{!"Root", !8, i64 0, !11, i64 8, !8, i64 16, !11, i64 24}
+!8 = !{!"any pointer", !9, i64 0}
+!9 = !{!"omnipotent char", !10, i64 0}
+!10 = !{!"Simple C/C++ TBAA"}
+!11 = !{!"long", !9, i64 0}
+!12 = !{!7, !11, i64 8}
+!13 = !{!7, !8, i64 16}
+!14 = !{!7, !11, i64 24}
+!15 = !{!16, !16, i64 0}
+!16 = !{!"int", !9, i64 0}
diff --git a/hpvm/test/regressionTests/ClearDFG/ThreeLevel.constmem.OpenCLToCPU.ll b/hpvm/test/regressionTests/ClearDFG/ThreeLevel.constmem.OpenCLToCPU.ll
new file mode 100644
index 0000000000000000000000000000000000000000..af3c2cf9ffef654c002f7d261a622db9f09d6a58
--- /dev/null
+++ b/hpvm/test/regressionTests/ClearDFG/ThreeLevel.constmem.OpenCLToCPU.ll
@@ -0,0 +1,276 @@
+; RUN: opt -load LLVMBuildDFG.so -load LLVMLocalMem.so -load LLVMDFG2LLVM_OpenCL.so -load LLVMDFG2LLVM_CPU.so -load LLVMClearDFG.so -S -localmem -dfg2llvm-opencl -dfg2llvm-cpu -clearDFG <  %s | FileCheck %s
+; ModuleID = 'ThreeLevel.opt.ll'
+source_filename = "ThreeLevel.opt.c"
+target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+%struct.Root = type { i32*, i64, i32*, i64 }
+%struct.out.Allocation = type <{ i8*, i64 }>
+%emptyStruct = type <{}>
+%emptyStruct.0 = type <{}>
+%emptyStruct.1 = type <{}>
+%emptyStruct.2 = type <{}>
+
+declare dso_local void @__hpvm__hint(i32) local_unnamed_addr #0
+
+declare dso_local void @__hpvm__attributes(i32, ...) local_unnamed_addr #0
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.lifetime.start.p0i8(i64 immarg, i8* nocapture) #1
+
+declare dso_local i8* @__hpvm__getNode(...) local_unnamed_addr #0
+
+declare dso_local i8* @__hpvm__getParentNode(i8*) local_unnamed_addr #0
+
+declare dso_local i64 @__hpvm__getNodeInstanceID_x(i8*) local_unnamed_addr #0
+
+declare dso_local i64 @__hpvm__getNodeInstanceID_y(i8*) local_unnamed_addr #0
+
+declare dso_local i64 @__hpvm__getNumNodeInstances_x(i8*) local_unnamed_addr #0
+
+declare dso_local i64 @__hpvm__getNumNodeInstances_y(i8*) local_unnamed_addr #0
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.lifetime.end.p0i8(i64 immarg, i8* nocapture) #1
+
+declare dso_local i8* @__hpvm__malloc(i64) local_unnamed_addr #0
+
+declare dso_local void @__hpvm__return(i32, ...) local_unnamed_addr #0
+
+declare dso_local i8* @__hpvm__createNodeND(i32, ...) local_unnamed_addr #0
+
+declare dso_local void @__hpvm__bindIn(i8*, i32, i32, i32) local_unnamed_addr #0
+
+declare dso_local i8* @__hpvm__edge(i8*, i8*, i32, i32, i32, i32) local_unnamed_addr #0
+
+; CHECK-LABEL: @Launch(
+; CHECK: call i8* @llvm_hpvm_cpu_launch(i8*
+; CHECK-NOT: call i8* @llvm.hpvm.launch(i8*
+; CHECK: call void @llvm_hpvm_cpu_wait(i8*
+
+; Function Attrs: noinline nounwind uwtable
+define dso_local void @Launch() local_unnamed_addr #2 {
+entry:
+  %RootArgs = alloca %struct.Root, align 8
+  %0 = bitcast %struct.Root* %RootArgs to i8*
+  call void @llvm.lifetime.start.p0i8(i64 32, i8* nonnull %0) #6
+  %call = tail call noalias i8* @malloc(i64 1024) #6
+  %1 = bitcast %struct.Root* %RootArgs to i8**
+  store i8* %call, i8** %1, align 8, !tbaa !6
+  %Insize = getelementptr inbounds %struct.Root, %struct.Root* %RootArgs, i64 0, i32 1
+  store i64 1024, i64* %Insize, align 8, !tbaa !12
+  %output = getelementptr inbounds %struct.Root, %struct.Root* %RootArgs, i64 0, i32 2
+  %call1 = tail call noalias i8* @malloc(i64 1024) #6
+  %2 = bitcast i32** %output to i8**
+  store i8* %call1, i8** %2, align 8, !tbaa !13
+  %Outsize = getelementptr inbounds %struct.Root, %struct.Root* %RootArgs, i64 0, i32 3
+  store i64 1024, i64* %Outsize, align 8, !tbaa !14
+  %3 = bitcast %struct.Root* %RootArgs to i8*
+  %graphID = call i8* @llvm.hpvm.launch(i8* bitcast (%emptyStruct.2 (i32*, i64, i32*, i64)* @PipeRoot_cloned to i8*), i8* %3, i1 false)
+  call void @llvm.hpvm.wait(i8* %graphID)
+  call void @llvm.lifetime.end.p0i8(i64 32, i8* nonnull %0) #6
+  ret void
+}
+
+; Function Attrs: nofree nounwind
+declare dso_local noalias i8* @malloc(i64) local_unnamed_addr #3
+
+declare dso_local i8* @__hpvm__launch(i32, ...) local_unnamed_addr #0
+
+declare dso_local void @__hpvm__wait(i8*) local_unnamed_addr #0
+
+; CHECK-LABEL: @main(
+; CHECK-NOT: call void @llvm.hpvm.init(
+; CHECK-NOT: call void @llvm.hpvm.cleanup(
+; CHECK: call i8* @llvm_hpvm_ocl_initContext(i32
+; CHECK: call i8* @llvm_hpvm_ocl_launch(i8*
+; CHECK: call void @llvm_hpvm_ocl_clearContext(i8*
+
+; CHECK-LABEL: @Func2_cloned.3_cloned_cloned_cloned_cloned_cloned_cloned(
+; CHECK: call i8* @llvm_hpvm_ocl_argument_ptr(i8*
+; CHECK: call void @llvm_hpvm_ocl_argument_scalar(i8*
+; CHECK: call void @llvm_hpvm_ocl_argument_shared(i8*
+; CHECK: call void @llvm_hpvm_ocl_argument_scalar(i8*
+; CHECK: call i8* @llvm_hpvm_ocl_executeNode(i8*
+; CHECK-NEXT: call void @llvm_hpvm_ocl_wait(i8*
+; CHECK-NEXT: call void @llvm_hpvm_ocl_free(i8*
+
+; CHECK-LABEL: @PipeRoot_cloned.4(
+; CHECK: call void @llvm_hpvm_cpu_dstack_push(
+; CHECK-NEXT: @Func2_cloned.3_cloned_cloned_cloned_cloned_cloned_cloned(
+; CHECK-NEXT: call void @llvm_hpvm_cpu_dstack_pop(
+
+; CHECK-LABEL: define i8* @LaunchDataflowGraph(i8*
+
+
+; Function Attrs: nounwind uwtable
+define dso_local i32 @main() local_unnamed_addr #4 {
+entry:
+  call void @llvm.hpvm.init()
+  tail call void @Launch()
+  call void @llvm.hpvm.cleanup()
+  ret i32 0
+}
+
+declare dso_local void @__hpvm__init(...) local_unnamed_addr #0
+
+declare dso_local void @__hpvm__cleanup(...) local_unnamed_addr #0
+
+declare i8* @llvm_hpvm_initializeTimerSet()
+
+declare void @llvm_hpvm_switchToTimer(i8**, i32)
+
+declare void @llvm_hpvm_printTimerSet(i8**, i8*)
+
+; Function Attrs: nounwind readnone
+declare i8* @llvm.hpvm.getNode() #5
+
+; Function Attrs: nounwind readnone
+declare i8* @llvm.hpvm.getParentNode(i8*) #5
+
+; Function Attrs: nounwind readnone
+declare i64 @llvm.hpvm.getNodeInstanceID.x(i8*) #5
+
+; Function Attrs: nounwind readnone
+declare i64 @llvm.hpvm.getNodeInstanceID.y(i8*) #5
+
+; Function Attrs: nounwind readnone
+declare i64 @llvm.hpvm.getNumNodeInstances.x(i8*) #5
+
+; Function Attrs: nounwind readnone
+declare i64 @llvm.hpvm.getNumNodeInstances.y(i8*) #5
+
+; Function Attrs: nounwind
+declare i8* @llvm.hpvm.malloc(i64) #6
+
+; Function Attrs: nounwind uwtable
+define dso_local %struct.out.Allocation @Allocation_cloned(i64 %block) #4 {
+entry:
+  %call1 = call i8* @llvm.hpvm.malloc(i64 %block)
+  %returnStruct = insertvalue %struct.out.Allocation undef, i8* %call1, 0
+  %returnStruct2 = insertvalue %struct.out.Allocation %returnStruct, i64 %block, 1
+  ret %struct.out.Allocation %returnStruct2
+}
+
+; Function Attrs: nounwind
+declare i8* @llvm.hpvm.createNode2D(i8*, i64, i64) #6
+
+; Function Attrs: nounwind uwtable
+define dso_local %emptyStruct @Func1_cloned(i32* in %In, i64 %Insize, i32* in out %Out, i64 %Outsize) #4 {
+; CHECK-NOT: @Func1_cloned(
+entry:
+  %call4 = call i8* @llvm.hpvm.getNode()
+  %call15 = call i8* @llvm.hpvm.getParentNode(i8* %call4)
+  %call26 = call i64 @llvm.hpvm.getNodeInstanceID.x(i8* %call4)
+  %call37 = call i64 @llvm.hpvm.getNodeInstanceID.y(i8* %call4)
+  %call58 = call i64 @llvm.hpvm.getNodeInstanceID.x(i8* %call15)
+  %call79 = call i64 @llvm.hpvm.getNodeInstanceID.y(i8* %call15)
+  %call910 = call i64 @llvm.hpvm.getNumNodeInstances.x(i8* %call4)
+  %call1111 = call i64 @llvm.hpvm.getNumNodeInstances.y(i8* %call4)
+  %mul = mul i64 %call910, %call58
+  %add = add i64 %mul, %call26
+  %arrayidx = getelementptr inbounds i32, i32* %In, i64 3
+  %0 = load i32, i32* %arrayidx, align 4, !tbaa !15
+  %sext = shl i64 %add, 32
+  %idxprom = ashr exact i64 %sext, 32
+  %arrayidx15 = getelementptr inbounds i32, i32* %Out, i64 %idxprom
+  %1 = load i32, i32* %arrayidx15, align 4, !tbaa !15
+  %add16 = add nsw i32 %1, %0
+  store i32 %add16, i32* %arrayidx15, align 4, !tbaa !15
+  ret %emptyStruct undef
+}
+
+; Function Attrs: nounwind
+declare i8* @llvm.hpvm.createNode(i8*) #6
+
+; Function Attrs: nounwind
+declare void @llvm.hpvm.bind.input(i8*, i32, i32, i1) #6
+
+; Function Attrs: nounwind
+declare i8* @llvm.hpvm.createEdge(i8*, i8*, i1, i32, i32, i1) #6
+
+; Function Attrs: nounwind uwtable
+define dso_local %emptyStruct.0 @Func3_cloned(i32* in %In, i64 %Insize, i32* in out %Out, i64 %Outsize) #4 {
+; CHECK-NOT: @Func3_cloned(
+entry:
+  %Func1_cloned.node = call i8* @llvm.hpvm.createNode2D(i8* bitcast (%emptyStruct (i32*, i64, i32*, i64)* @Func1_cloned to i8*), i64 3, i64 5)
+  %Allocation_cloned.node = call i8* @llvm.hpvm.createNode(i8* bitcast (%struct.out.Allocation (i64)* @Allocation_cloned to i8*))
+  call void @llvm.hpvm.bind.input(i8* %Allocation_cloned.node, i32 1, i32 0, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func1_cloned.node, i32 2, i32 2, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func1_cloned.node, i32 3, i32 3, i1 false)
+  %output = call i8* @llvm.hpvm.createEdge(i8* %Allocation_cloned.node, i8* %Func1_cloned.node, i1 true, i32 0, i32 0, i1 false)
+  %output1 = call i8* @llvm.hpvm.createEdge(i8* %Allocation_cloned.node, i8* %Func1_cloned.node, i1 true, i32 1, i32 1, i1 false)
+  ret %emptyStruct.0 undef
+}
+
+; Function Attrs: nounwind uwtable
+define dso_local %emptyStruct.1 @Func2_cloned(i32* in %In, i64 %Insize, i32* in out %Out, i64 %Outsize) #4 {
+; CHECK-NOT: @Func2_cloned(
+entry:
+  %Func3_cloned.node = call i8* @llvm.hpvm.createNode2D(i8* bitcast (%emptyStruct.0 (i32*, i64, i32*, i64)* @Func3_cloned to i8*), i64 3, i64 5)
+  call void @llvm.hpvm.bind.input(i8* %Func3_cloned.node, i32 0, i32 0, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func3_cloned.node, i32 1, i32 1, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func3_cloned.node, i32 2, i32 2, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func3_cloned.node, i32 3, i32 3, i1 false)
+  ret %emptyStruct.1 undef
+}
+
+; Function Attrs: nounwind
+declare i8* @llvm.hpvm.launch(i8*, i8*, i1) #6
+
+; Function Attrs: nounwind uwtable
+define dso_local %emptyStruct.2 @PipeRoot_cloned(i32* in %In, i64 %Insize, i32* in out %Out, i64 %Outsize) #4 {
+; CHECK-NOT: @PipeRoot_cloned(
+entry:
+  %Func2_cloned.node = call i8* @llvm.hpvm.createNode(i8* bitcast (%emptyStruct.1 (i32*, i64, i32*, i64)* @Func2_cloned to i8*))
+  call void @llvm.hpvm.bind.input(i8* %Func2_cloned.node, i32 0, i32 0, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func2_cloned.node, i32 1, i32 1, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func2_cloned.node, i32 2, i32 2, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func2_cloned.node, i32 3, i32 3, i1 false)
+  ret %emptyStruct.2 undef
+}
+
+; Function Attrs: nounwind
+declare void @llvm.hpvm.wait(i8*) #6
+
+; Function Attrs: nounwind
+declare void @llvm.hpvm.init() #6
+
+; Function Attrs: nounwind
+declare void @llvm.hpvm.cleanup() #6
+
+attributes #0 = { "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="true" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="true" "use-soft-float"="false" }
+attributes #1 = { argmemonly nounwind }
+attributes #2 = { noinline nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-frame-pointer-elim"="false" "no-infs-fp-math"="true" "no-jump-tables"="false" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="true" "use-soft-float"="false" }
+attributes #3 = { nofree nounwind "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="true" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="true" "use-soft-float"="false" }
+attributes #4 = { nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-frame-pointer-elim"="false" "no-infs-fp-math"="true" "no-jump-tables"="false" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="true" "use-soft-float"="false" }
+attributes #5 = { nounwind readnone }
+attributes #6 = { nounwind }
+
+!llvm.module.flags = !{!0}
+!llvm.ident = !{!1}
+!hpvm_hint_gpu = !{!2}
+!hpvm_hint_cpu = !{!3, !4, !5}
+!hpvm_hint_spir = !{}
+!hpvm_hint_cudnn = !{}
+!hpvm_hint_promise = !{}
+!hpvm_hint_cpu_gpu = !{}
+!hpvm_hint_cpu_spir = !{}
+
+!0 = !{i32 1, !"wchar_size", i32 4}
+!1 = !{!"clang version 9.0.0 (https://gitlab.engr.illinois.edu/llvm/hpvm.git 6690f9e7e8b46b96aea222d3e85315cd63545953)"}
+!2 = !{%emptyStruct (i32*, i64, i32*, i64)* @Func1_cloned}
+!3 = !{%emptyStruct.0 (i32*, i64, i32*, i64)* @Func3_cloned}
+!4 = !{%emptyStruct.1 (i32*, i64, i32*, i64)* @Func2_cloned}
+!5 = !{%emptyStruct.2 (i32*, i64, i32*, i64)* @PipeRoot_cloned}
+!6 = !{!7, !8, i64 0}
+!7 = !{!"Root", !8, i64 0, !11, i64 8, !8, i64 16, !11, i64 24}
+!8 = !{!"any pointer", !9, i64 0}
+!9 = !{!"omnipotent char", !10, i64 0}
+!10 = !{!"Simple C/C++ TBAA"}
+!11 = !{!"long", !9, i64 0}
+!12 = !{!7, !11, i64 8}
+!13 = !{!7, !8, i64 16}
+!14 = !{!7, !11, i64 24}
+!15 = !{!16, !16, i64 0}
+!16 = !{!"int", !9, i64 0}
diff --git a/hpvm/test/regressionTests/DFG2LLVM_X86/OneLevel.codeGen.ll b/hpvm/test/regressionTests/DFG2LLVM_CPU/OneLevel.codeGen.ll
similarity index 94%
rename from hpvm/test/regressionTests/DFG2LLVM_X86/OneLevel.codeGen.ll
rename to hpvm/test/regressionTests/DFG2LLVM_CPU/OneLevel.codeGen.ll
index 1373d13159ee90421d75a2f16e99e3d4a9a24bdd..85dedaaaab83edf19b96efca97dbed58c4403731 100644
--- a/hpvm/test/regressionTests/DFG2LLVM_X86/OneLevel.codeGen.ll
+++ b/hpvm/test/regressionTests/DFG2LLVM_CPU/OneLevel.codeGen.ll
@@ -1,4 +1,4 @@
-; RUN: opt -load LLVMBuildDFG.so -load LLVMDFG2LLVM_X86.so -S -dfg2llvm-x86 <  %s | FileCheck %s
+; RUN: opt -load LLVMBuildDFG.so -load LLVMDFG2LLVM_CPU.so -S -dfg2llvm-cpu <  %s | FileCheck %s
 ; ModuleID = 'CreateNode.ll'
 source_filename = "CreateNode.c"
 target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
@@ -10,9 +10,9 @@ target triple = "x86_64-unknown-linux-gnu"
 
 ; CHECK-LABEL: i32 @main(
 ; CHECK: call void @llvm.hpvm.init()
-; CHECK: call i8* @llvm_hpvm_x86_launch(i8* (i8*)* @LaunchDataflowGraph, i8*
+; CHECK: call i8* @llvm_hpvm_cpu_launch(i8* (i8*)* @LaunchDataflowGraph, i8*
 ; CHECK-NEXT: call i8* @llvm.hpvm.launch(i8*
-; CHECK-NEXT: call void @llvm_hpvm_x86_wait(i8*
+; CHECK-NEXT: call void @llvm_hpvm_cpu_wait(i8*
 
 ; CHECK-LABEL: @PipeRoot_cloned(
 ; CHECK: call i8* @llvm.hpvm.createNode(
@@ -23,12 +23,12 @@ target triple = "x86_64-unknown-linux-gnu"
 ; CHECK-NEXT: call void @llvm.hpvm.bind.output(i8* %Func_cloned.node
 
 ; CHECK-LABEL: @Func_cloned.1_cloned_cloned_cloned_cloned_cloned_cloned
-; CHECK: call i8* @llvm_hpvm_x86_argument_ptr(
+; CHECK: call i8* @llvm_hpvm_cpu_argument_ptr(
 
 ; CHECK-LABEL: @PipeRoot_cloned.2(
-; CHECK: call void @llvm_hpvm_x86_dstack_push(
+; CHECK: call void @llvm_hpvm_cpu_dstack_push(
 ; CHECK-NEXT: @Func_cloned.1_cloned_cloned_cloned_cloned_cloned_cloned(
-; CHECK-NEXT: call void @llvm_hpvm_x86_dstack_pop()
+; CHECK-NEXT: call void @llvm_hpvm_cpu_dstack_pop()
 
 ; CHECK-LABEL: @LaunchDataflowGraph(i8*
 ; call %struct.out.PipeRoot @PipeRoot_cloned.2(
@@ -148,9 +148,9 @@ declare void @llvm.hpvm.wait(i8*) #3
 ; Function Attrs: nounwind
 declare void @llvm.hpvm.cleanup() #3
 
-attributes #0 = { "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="true" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="true" "use-soft-float"="false" }
+attributes #0 = { "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="true" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="cpu-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="true" "use-soft-float"="false" }
 attributes #1 = { argmemonly nounwind }
-attributes #2 = { nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-frame-pointer-elim"="false" "no-infs-fp-math"="true" "no-jump-tables"="false" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="true" "use-soft-float"="false" }
+attributes #2 = { nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-frame-pointer-elim"="false" "no-infs-fp-math"="true" "no-jump-tables"="false" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="cpu-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="true" "use-soft-float"="false" }
 attributes #3 = { nounwind }
 
 !llvm.module.flags = !{!0}
diff --git a/hpvm/test/regressionTests/DFG2LLVM_X86/OneRootBasic.ll b/hpvm/test/regressionTests/DFG2LLVM_CPU/OneRootBasic.ll
similarity index 93%
rename from hpvm/test/regressionTests/DFG2LLVM_X86/OneRootBasic.ll
rename to hpvm/test/regressionTests/DFG2LLVM_CPU/OneRootBasic.ll
index a0f0f6ecfc4b68cbc3f86272fb11cf3702f9b54e..5ff1dffba26f6a92029e914016bd7a07e0619159 100644
--- a/hpvm/test/regressionTests/DFG2LLVM_X86/OneRootBasic.ll
+++ b/hpvm/test/regressionTests/DFG2LLVM_CPU/OneRootBasic.ll
@@ -1,4 +1,4 @@
-; RUN: opt -load LLVMBuildDFG.so -load LLVMDFG2LLVM_X86.so -S -dfg2llvm-x86 <  %s | FileCheck %s
+; RUN: opt -load LLVMBuildDFG.so -load LLVMDFG2LLVM_CPU.so -S -dfg2llvm-cpu <  %s | FileCheck %s
 ; ModuleID = 'oneLaunchAlloca.ll'
 source_filename = "oneLaunchAlloca.c"
 target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
@@ -13,9 +13,9 @@ declare dso_local void @__hpvm__attributes(i32, ...) local_unnamed_addr #0
 
 ; CHECK-LABEL: i32 @main(
 ; CHECK: call void @llvm.hpvm.init()
-; CHECK: call i8* @llvm_hpvm_x86_launch(i8* (i8*)* @LaunchDataflowGraph, i8*
+; CHECK: call i8* @llvm_hpvm_cpu_launch(i8* (i8*)* @LaunchDataflowGraph, i8*
 ; CHECK-NEXT: call i8* @llvm.hpvm.launch(i8*
-; CHECK-NEXT: call void @llvm_hpvm_x86_wait(i8*
+; CHECK-NEXT: call void @llvm_hpvm_cpu_wait(i8*
 
 ; CHECK-LABEL: @PipeRoot_cloned.1(
 
@@ -92,8 +92,8 @@ declare void @llvm.hpvm.wait(i8*) #3
 ; Function Attrs: nounwind
 declare void @llvm.hpvm.cleanup() #3
 
-attributes #0 = { "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="true" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="true" "use-soft-float"="false" }
-attributes #1 = { nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-frame-pointer-elim"="false" "no-infs-fp-math"="true" "no-jump-tables"="false" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="true" "use-soft-float"="false" }
+attributes #0 = { "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="true" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="cpu-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="true" "use-soft-float"="false" }
+attributes #1 = { nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-frame-pointer-elim"="false" "no-infs-fp-math"="true" "no-jump-tables"="false" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="cpu-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="true" "use-soft-float"="false" }
 attributes #2 = { argmemonly nounwind }
 attributes #3 = { nounwind }
 
diff --git a/hpvm/test/regressionTests/DFG2LLVM_CPU/ThreeLevel.OpenCL.ll b/hpvm/test/regressionTests/DFG2LLVM_CPU/ThreeLevel.OpenCL.ll
new file mode 100644
index 0000000000000000000000000000000000000000..aea2545e44b6ffaad40f28a1b0f1bad570fae775
--- /dev/null
+++ b/hpvm/test/regressionTests/DFG2LLVM_CPU/ThreeLevel.OpenCL.ll
@@ -0,0 +1,250 @@
+; RUN: opt -load LLVMBuildDFG.so -load LLVMLocalMem.so -load LLVMDFG2LLVM_OpenCL.so -load LLVMDFG2LLVM_CPU.so -S -localmem -dfg2llvm-opencl -dfg2llvm-cpu <  %s | FileCheck %s
+; ModuleID = 'ThreeLevel.ll'
+source_filename = "ThreeLevel.c"
+target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+%struct.Root = type { i32*, i64, i32*, i64 }
+%emptyStruct = type <{}>
+%emptyStruct.0 = type <{}>
+%emptyStruct.1 = type <{}>
+%emptyStruct.2 = type <{}>
+
+declare dso_local void @__hpvm__hint(i32) local_unnamed_addr #0
+
+declare dso_local void @__hpvm__attributes(i32, ...) local_unnamed_addr #0
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.lifetime.start.p0i8(i64 immarg, i8* nocapture) #1
+
+declare dso_local i8* @__hpvm__getNode(...) local_unnamed_addr #0
+
+declare dso_local i8* @__hpvm__getParentNode(i8*) local_unnamed_addr #0
+
+declare dso_local i64 @__hpvm__getNodeInstanceID_x(i8*) local_unnamed_addr #0
+
+declare dso_local i64 @__hpvm__getNodeInstanceID_y(i8*) local_unnamed_addr #0
+
+declare dso_local i64 @__hpvm__getNumNodeInstances_x(i8*) local_unnamed_addr #0
+
+declare dso_local i64 @__hpvm__getNumNodeInstances_y(i8*) local_unnamed_addr #0
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.lifetime.end.p0i8(i64 immarg, i8* nocapture) #1
+
+declare dso_local i8* @__hpvm__createNodeND(i32, ...) local_unnamed_addr #0
+
+declare dso_local void @__hpvm__bindIn(i8*, i32, i32, i32) local_unnamed_addr #0
+
+; CHECK-LABEL: @Launch(
+; CHECK: call i8* @llvm_hpvm_cpu_launch(i8*
+; CHECK-NEXT: call i8* @llvm.hpvm.launch(i8*
+; CHECK-NEXT: call void @llvm_hpvm_cpu_wait(i8*
+
+; Function Attrs: noinline nounwind uwtable
+define dso_local void @Launch() local_unnamed_addr #2 {
+entry:
+  %RootArgs = alloca %struct.Root, align 8
+  %0 = bitcast %struct.Root* %RootArgs to i8*
+  call void @llvm.lifetime.start.p0i8(i64 32, i8* nonnull %0) #6
+  %call = tail call noalias i8* @malloc(i64 1024) #6
+  %1 = bitcast %struct.Root* %RootArgs to i8**
+  store i8* %call, i8** %1, align 8, !tbaa !6
+  %Insize = getelementptr inbounds %struct.Root, %struct.Root* %RootArgs, i64 0, i32 1
+  store i64 1024, i64* %Insize, align 8, !tbaa !12
+  %output = getelementptr inbounds %struct.Root, %struct.Root* %RootArgs, i64 0, i32 2
+  %call1 = tail call noalias i8* @malloc(i64 1024) #6
+  %2 = bitcast i32** %output to i8**
+  store i8* %call1, i8** %2, align 8, !tbaa !13
+  %Outsize = getelementptr inbounds %struct.Root, %struct.Root* %RootArgs, i64 0, i32 3
+  store i64 1024, i64* %Outsize, align 8, !tbaa !14
+  %3 = bitcast %struct.Root* %RootArgs to i8*
+  %graphID = call i8* @llvm.hpvm.launch(i8* bitcast (%emptyStruct.2 (i32*, i64, i32*, i64)* @PipeRoot_cloned to i8*), i8* %3, i1 false)
+  call void @llvm.hpvm.wait(i8* %graphID)
+  call void @llvm.lifetime.end.p0i8(i64 32, i8* nonnull %0) #6
+  ret void
+}
+
+; Function Attrs: nofree nounwind
+declare dso_local noalias i8* @malloc(i64) local_unnamed_addr #3
+
+declare dso_local i8* @__hpvm__launch(i32, ...) local_unnamed_addr #0
+
+declare dso_local void @__hpvm__wait(i8*) local_unnamed_addr #0
+
+; CHECK-LABEL: @main(
+; CHECK: call i8* @llvm_hpvm_ocl_initContext(i32
+; CHECK: call i8* @llvm_hpvm_ocl_launch(i8*
+; CHECK: call void @llvm_hpvm_ocl_clearContext(i8*
+
+; CHECK-LABEL: @Func2_cloned.2_cloned_cloned_cloned_cloned_cloned_cloned(
+; CHECK: call i8* @llvm_hpvm_ocl_argument_ptr(i8*
+; CHECK: call void @llvm_hpvm_ocl_argument_scalar(i8*
+; CHECK: call i8* @llvm_hpvm_ocl_argument_ptr(i8*
+; CHECK: call void @llvm_hpvm_ocl_argument_scalar(i8*
+; CHECK: call i8* @llvm_hpvm_ocl_executeNode(i8*
+; CHECK-NEXT: call void @llvm_hpvm_ocl_wait(i8*
+; CHECK-NEXT: call void @llvm_hpvm_ocl_free(i8*
+; CHECK-NEXT: call void @llvm_hpvm_ocl_free(i8*
+
+; CHECK-LABEL: @PipeRoot_cloned.3(
+; CHECK: call void @llvm_hpvm_cpu_dstack_push(
+; CHECK-NEXT: @Func2_cloned.2_cloned_cloned_cloned_cloned_cloned_cloned(
+; CHECK-NEXT: call void @llvm_hpvm_cpu_dstack_pop(
+
+; CHECK-LABEL: define i8* @LaunchDataflowGraph(i8*
+
+; Function Attrs: nounwind uwtable
+define dso_local i32 @main() local_unnamed_addr #4 {
+entry:
+  call void @llvm.hpvm.init()
+  tail call void @Launch()
+  call void @llvm.hpvm.cleanup()
+  ret i32 0
+}
+
+declare dso_local void @__hpvm__init(...) local_unnamed_addr #0
+
+declare dso_local void @__hpvm__cleanup(...) local_unnamed_addr #0
+
+declare i8* @llvm_hpvm_initializeTimerSet()
+
+declare void @llvm_hpvm_switchToTimer(i8**, i32)
+
+declare void @llvm_hpvm_printTimerSet(i8**, i8*)
+
+; Function Attrs: nounwind readnone
+declare i8* @llvm.hpvm.getNode() #5
+
+; Function Attrs: nounwind readnone
+declare i8* @llvm.hpvm.getParentNode(i8*) #5
+
+; Function Attrs: nounwind readnone
+declare i64 @llvm.hpvm.getNodeInstanceID.x(i8*) #5
+
+; Function Attrs: nounwind readnone
+declare i64 @llvm.hpvm.getNodeInstanceID.y(i8*) #5
+
+; Function Attrs: nounwind readnone
+declare i64 @llvm.hpvm.getNumNodeInstances.x(i8*) #5
+
+; Function Attrs: nounwind readnone
+declare i64 @llvm.hpvm.getNumNodeInstances.y(i8*) #5
+
+; Function Attrs: nounwind
+declare i8* @llvm.hpvm.createNode2D(i8*, i64, i64) #6
+
+; Function Attrs: nounwind uwtable
+define dso_local %emptyStruct @Func1_cloned(i32* in %In, i64 %Insize, i32* in out %Out, i64 %Outsize) #4 {
+entry:
+  %call4 = call i8* @llvm.hpvm.getNode()
+  %call15 = call i8* @llvm.hpvm.getParentNode(i8* %call4)
+  %call26 = call i64 @llvm.hpvm.getNodeInstanceID.x(i8* %call4)
+  %call37 = call i64 @llvm.hpvm.getNodeInstanceID.y(i8* %call4)
+  %call58 = call i64 @llvm.hpvm.getNodeInstanceID.x(i8* %call15)
+  %call79 = call i64 @llvm.hpvm.getNodeInstanceID.y(i8* %call15)
+  %call910 = call i64 @llvm.hpvm.getNumNodeInstances.x(i8* %call4)
+  %call1111 = call i64 @llvm.hpvm.getNumNodeInstances.y(i8* %call4)
+  %mul = mul i64 %call910, %call58
+  %add = add i64 %mul, %call26
+  %mul13 = mul i64 %call1111, %call79
+  %add14 = add i64 %mul13, %call37
+  %sext = shl i64 %add14, 32
+  %idxprom = ashr exact i64 %sext, 32
+  %arrayidx = getelementptr inbounds i32, i32* %In, i64 %idxprom
+  %0 = load i32, i32* %arrayidx, align 4, !tbaa !15
+  %sext36 = shl i64 %add, 32
+  %idxprom15 = ashr exact i64 %sext36, 32
+  %arrayidx16 = getelementptr inbounds i32, i32* %Out, i64 %idxprom15
+  %1 = load i32, i32* %arrayidx16, align 4, !tbaa !15
+  %add17 = add nsw i32 %1, %0
+  store i32 %add17, i32* %arrayidx16, align 4, !tbaa !15
+  ret %emptyStruct undef
+}
+
+; Function Attrs: nounwind
+declare void @llvm.hpvm.bind.input(i8*, i32, i32, i1) #6
+
+; Function Attrs: nounwind uwtable
+define dso_local %emptyStruct.0 @Func3_cloned(i32* in %In, i64 %Insize, i32* in out %Out, i64 %Outsize) #4 {
+entry:
+  %Func1_cloned.node = call i8* @llvm.hpvm.createNode2D(i8* bitcast (%emptyStruct (i32*, i64, i32*, i64)* @Func1_cloned to i8*), i64 3, i64 5)
+  call void @llvm.hpvm.bind.input(i8* %Func1_cloned.node, i32 0, i32 0, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func1_cloned.node, i32 1, i32 1, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func1_cloned.node, i32 2, i32 2, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func1_cloned.node, i32 3, i32 3, i1 false)
+  ret %emptyStruct.0 undef
+}
+
+; Function Attrs: nounwind
+declare i8* @llvm.hpvm.createNode(i8*) #6
+
+; Function Attrs: nounwind uwtable
+define dso_local %emptyStruct.1 @Func2_cloned(i32* in %In, i64 %Insize, i32* in out %Out, i64 %Outsize) #4 {
+entry:
+  %Func3_cloned.node = call i8* @llvm.hpvm.createNode2D(i8* bitcast (%emptyStruct.0 (i32*, i64, i32*, i64)* @Func3_cloned to i8*), i64 3, i64 5)
+  call void @llvm.hpvm.bind.input(i8* %Func3_cloned.node, i32 0, i32 0, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func3_cloned.node, i32 1, i32 1, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func3_cloned.node, i32 2, i32 2, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func3_cloned.node, i32 3, i32 3, i1 false)
+  ret %emptyStruct.1 undef
+}
+
+; Function Attrs: nounwind
+declare i8* @llvm.hpvm.launch(i8*, i8*, i1) #6
+
+; Function Attrs: nounwind uwtable
+define dso_local %emptyStruct.2 @PipeRoot_cloned(i32* in %In, i64 %Insize, i32* in out %Out, i64 %Outsize) #4 {
+entry:
+  %Func2_cloned.node = call i8* @llvm.hpvm.createNode(i8* bitcast (%emptyStruct.1 (i32*, i64, i32*, i64)* @Func2_cloned to i8*))
+  call void @llvm.hpvm.bind.input(i8* %Func2_cloned.node, i32 0, i32 0, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func2_cloned.node, i32 1, i32 1, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func2_cloned.node, i32 2, i32 2, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func2_cloned.node, i32 3, i32 3, i1 false)
+  ret %emptyStruct.2 undef
+}
+
+; Function Attrs: nounwind
+declare void @llvm.hpvm.wait(i8*) #6
+
+; Function Attrs: nounwind
+declare void @llvm.hpvm.init() #6
+
+; Function Attrs: nounwind
+declare void @llvm.hpvm.cleanup() #6
+
+attributes #0 = { "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="true" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="true" "use-soft-float"="false" }
+attributes #1 = { argmemonly nounwind }
+attributes #2 = { noinline nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-frame-pointer-elim"="false" "no-infs-fp-math"="true" "no-jump-tables"="false" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="true" "use-soft-float"="false" }
+attributes #3 = { nofree nounwind "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="true" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="true" "use-soft-float"="false" }
+attributes #4 = { nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-frame-pointer-elim"="false" "no-infs-fp-math"="true" "no-jump-tables"="false" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="true" "use-soft-float"="false" }
+attributes #5 = { nounwind readnone }
+attributes #6 = { nounwind }
+
+!llvm.module.flags = !{!0}
+!llvm.ident = !{!1}
+!hpvm_hint_gpu = !{!2}
+!hpvm_hint_cpu = !{!3, !4, !5}
+!hpvm_hint_spir = !{}
+!hpvm_hint_cudnn = !{}
+!hpvm_hint_promise = !{}
+!hpvm_hint_cpu_gpu = !{}
+!hpvm_hint_cpu_spir = !{}
+
+!0 = !{i32 1, !"wchar_size", i32 4}
+!1 = !{!"clang version 9.0.0 (https://gitlab.engr.illinois.edu/llvm/hpvm.git 6690f9e7e8b46b96aea222d3e85315cd63545953)"}
+!2 = !{%emptyStruct (i32*, i64, i32*, i64)* @Func1_cloned}
+!3 = !{%emptyStruct.0 (i32*, i64, i32*, i64)* @Func3_cloned}
+!4 = !{%emptyStruct.1 (i32*, i64, i32*, i64)* @Func2_cloned}
+!5 = !{%emptyStruct.2 (i32*, i64, i32*, i64)* @PipeRoot_cloned}
+!6 = !{!7, !8, i64 0}
+!7 = !{!"Root", !8, i64 0, !11, i64 8, !8, i64 16, !11, i64 24}
+!8 = !{!"any pointer", !9, i64 0}
+!9 = !{!"omnipotent char", !10, i64 0}
+!10 = !{!"Simple C/C++ TBAA"}
+!11 = !{!"long", !9, i64 0}
+!12 = !{!7, !11, i64 8}
+!13 = !{!7, !8, i64 16}
+!14 = !{!7, !11, i64 24}
+!15 = !{!16, !16, i64 0}
+!16 = !{!"int", !9, i64 0}
diff --git a/hpvm/test/regressionTests/DFG2LLVM_X86/ThreeLevel.codeGen.ll b/hpvm/test/regressionTests/DFG2LLVM_CPU/ThreeLevel.codeGen.ll
similarity index 94%
rename from hpvm/test/regressionTests/DFG2LLVM_X86/ThreeLevel.codeGen.ll
rename to hpvm/test/regressionTests/DFG2LLVM_CPU/ThreeLevel.codeGen.ll
index a60f28a08a3bad2272687169bb1f4778f1bb8b6e..457d0c2bc3538e7e8cd94c1481e513da07b48602 100644
--- a/hpvm/test/regressionTests/DFG2LLVM_X86/ThreeLevel.codeGen.ll
+++ b/hpvm/test/regressionTests/DFG2LLVM_CPU/ThreeLevel.codeGen.ll
@@ -1,4 +1,4 @@
-; RUN: opt -load LLVMBuildDFG.so -load LLVMDFG2LLVM_X86.so -S -dfg2llvm-x86 <  %s | FileCheck %s
+; RUN: opt -load LLVMBuildDFG.so -load LLVMDFG2LLVM_CPU.so -S -dfg2llvm-cpu <  %s | FileCheck %s
 ; ModuleID = 'ThreeLevel.ll'
 source_filename = "ThreeLevel.c"
 target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
@@ -13,9 +13,9 @@ target triple = "x86_64-unknown-linux-gnu"
 
 ; CHECK-LABEL: i32 @main(
 ; CHECK: call void @llvm.hpvm.init()
-; CHECK: call i8* @llvm_hpvm_x86_launch(i8* (i8*)* @LaunchDataflowGraph, i8*
+; CHECK: call i8* @llvm_hpvm_cpu_launch(i8* (i8*)* @LaunchDataflowGraph, i8*
 ; CHECK-NEXT: call i8* @llvm.hpvm.launch(i8*
-; CHECK-NEXT: call void @llvm_hpvm_x86_wait(i8*
+; CHECK-NEXT: call void @llvm_hpvm_cpu_wait(i8*
 
 ; CHECK-LABEL: @Func3_cloned(
 ; CHECK: call i8* @llvm.hpvm.createNode2D(
@@ -42,26 +42,26 @@ target triple = "x86_64-unknown-linux-gnu"
 ; CHECK-NEXT: call void @llvm.hpvm.bind.output(i8* %Func2_cloned.node
 
 ; CHECK-LABEL: @Func1_cloned.1_cloned_cloned_cloned_cloned_cloned_cloned
-; CHECK: call i8* @llvm_hpvm_x86_argument_ptr(
+; CHECK: call i8* @llvm_hpvm_cpu_argument_ptr(
 
 ; CHECK-LABEL: @Func3_cloned.2_cloned_cloned_cloned_cloned_cloned_cloned(
 ; CHECK-LABEL: for.body1:
 ; CHECK: %index.y = phi i64 [ 0, %for.body ], [ %index.y.inc, %for.body1 ]
-; CHECK-NEXT: call void @llvm_hpvm_x86_dstack_push(
+; CHECK-NEXT: call void @llvm_hpvm_cpu_dstack_push(
 ; CHECK-NEXT: @Func1_cloned.1_cloned_cloned_cloned_cloned_cloned_cloned(
-; CHECK-NEXT: call void @llvm_hpvm_x86_dstack_pop()
+; CHECK-NEXT: call void @llvm_hpvm_cpu_dstack_pop()
 
 ; CHECK-LABEL: @Func2_cloned.3_cloned_cloned_cloned_cloned_cloned_cloned(
 ; CHECK-LABEL: for.body:
 ; CHECK-NEXT: %index.x = phi i64 [ 0, %entry ], [ %index.x.inc, %for.body ]
-; CHECK-NEXT: call void @llvm_hpvm_x86_dstack_push(
+; CHECK-NEXT: call void @llvm_hpvm_cpu_dstack_push(
 ; CHECK-NEXT: @Func3_cloned.2_cloned_cloned_cloned_cloned_cloned_cloned(
-; CHECK-NEXT: call void @llvm_hpvm_x86_dstack_pop()
+; CHECK-NEXT: call void @llvm_hpvm_cpu_dstack_pop()
 
 ; CHECK-LABEL: @PipeRoot_cloned.4(
-; CHECK: call void @llvm_hpvm_x86_dstack_push(
+; CHECK: call void @llvm_hpvm_cpu_dstack_push(
 ; CHECK-NEXT: @Func2_cloned.3_cloned_cloned_cloned_cloned_cloned_cloned(
-; CHECK-NEXT: call void @llvm_hpvm_x86_dstack_pop()
+; CHECK-NEXT: call void @llvm_hpvm_cpu_dstack_pop()
 
 ; CHECK-LABEL: @LaunchDataflowGraph(
 ; CHECK: call %struct.out.PipeRoot @PipeRoot_cloned.4(
@@ -210,9 +210,9 @@ declare void @llvm.hpvm.wait(i8*) #3
 ; Function Attrs: nounwind
 declare void @llvm.hpvm.cleanup() #3
 
-attributes #0 = { "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="true" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="true" "use-soft-float"="false" }
+attributes #0 = { "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="true" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="cpu-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="true" "use-soft-float"="false" }
 attributes #1 = { argmemonly nounwind }
-attributes #2 = { nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-frame-pointer-elim"="false" "no-infs-fp-math"="true" "no-jump-tables"="false" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="true" "use-soft-float"="false" }
+attributes #2 = { nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-frame-pointer-elim"="false" "no-infs-fp-math"="true" "no-jump-tables"="false" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="cpu-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="true" "use-soft-float"="false" }
 attributes #3 = { nounwind }
 
 !llvm.module.flags = !{!0}
diff --git a/hpvm/test/regressionTests/DFG2LLVM_CPU/ThreeLevel.cond.ll b/hpvm/test/regressionTests/DFG2LLVM_CPU/ThreeLevel.cond.ll
new file mode 100644
index 0000000000000000000000000000000000000000..a96357bac630d04ead246e059cc170647fa7a043
--- /dev/null
+++ b/hpvm/test/regressionTests/DFG2LLVM_CPU/ThreeLevel.cond.ll
@@ -0,0 +1,300 @@
+; RUN: opt -load LLVMBuildDFG.so -load LLVMDFG2LLVM_CPU.so -S -dfg2llvm-cpu <  %s | FileCheck %s
+; ModuleID = 'ThreeLevel.cond.ll'
+source_filename = "ThreeLevel.cond.c"
+target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+%struct.Root = type { i32*, i64, i32*, i64 }
+%emptyStruct = type <{}>
+%emptyStruct.0 = type <{}>
+%emptyStruct.1 = type <{}>
+%emptyStruct.2 = type <{}>
+
+; CHECK-LABEL: @Launch(
+; CHECK: call i8* @llvm_hpvm_cpu_launch(i8* (i8*)* @LaunchDataflowGraph, i8*
+; CHECK-NEXT: call i8* @llvm.hpvm.launch(i8*
+; CHECK-NEXT: call void @llvm_hpvm_cpu_wait(i8*
+
+; CHECK-LABEL: i32 @main(
+; CHECK: call void @llvm.hpvm.init()
+; CHECK: call void @llvm.hpvm.cleanup()
+
+; CHECK-LABEL: @Func3_cloned(
+; CHECK: call i8* @llvm.hpvm.createNode2D(
+; CHECK-NEXT: call void @llvm.hpvm.bind.input(i8* %Func1_cloned.node
+; CHECK-NEXT: call void @llvm.hpvm.bind.input(i8* %Func1_cloned.node
+; CHECK-NEXT: call void @llvm.hpvm.bind.input(i8* %Func1_cloned.node
+; CHECK-NEXT: call void @llvm.hpvm.bind.input(i8* %Func1_cloned.node
+
+; CHECK-LABEL: @Func2_cloned(
+; CHECK: call i8* @llvm.hpvm.createNode2D(
+; CHECK-NEXT: call void @llvm.hpvm.bind.input(i8* %Func3_cloned.node
+; CHECK-NEXT: call void @llvm.hpvm.bind.input(i8* %Func3_cloned.node
+; CHECK-NEXT: call void @llvm.hpvm.bind.input(i8* %Func3_cloned.node
+; CHECK-NEXT: call void @llvm.hpvm.bind.input(i8* %Func3_cloned.node
+
+; CHECK-LABEL: @PipeRoot_cloned(
+; CHECK: call i8* @llvm.hpvm.createNode(
+; CHECK-NEXT: call void @llvm.hpvm.bind.input(i8* %Func2_cloned.node
+; CHECK-NEXT: call void @llvm.hpvm.bind.input(i8* %Func2_cloned.node
+; CHECK-NEXT: call void @llvm.hpvm.bind.input(i8* %Func2_cloned.node
+; CHECK-NEXT: call void @llvm.hpvm.bind.input(i8* %Func2_cloned.node
+
+; CHECK-LABEL: @Func1_cloned.1_cloned_cloned_cloned_cloned_cloned_cloned
+; CHECK: call i8* @llvm_hpvm_cpu_argument_ptr(
+; CHECK: call i8* @llvm_hpvm_cpu_argument_ptr(
+; CHECK: call i64 @llvm_hpvm_cpu_getDimInstance(
+; CHECK: call i64 @llvm_hpvm_cpu_getDimInstance(
+
+
+; CHECK-LABEL: @Func3_cloned.2_cloned_cloned_cloned_cloned_cloned_cloned(
+; CHECK: call void @llvm_hpvm_cpu_dstack_push(
+; CHECK-NEXT: @Func1_cloned.1_cloned_cloned_cloned_cloned_cloned_cloned(
+; CHECK-NEXT: call void @llvm_hpvm_cpu_dstack_pop()
+; CHECK: br i1 %cond.y, label %for.body1, label %for.end2
+; CHECK-LABEL: for.end2:
+; CHECK: br i1 %cond.x, label %for.body, label %for.end
+; CHECK-LABEL: for.end:
+
+; CHECK-LABEL: @Func2_cloned.3_cloned_cloned_cloned_cloned_cloned_cloned(
+; CHECK-LABEL: for.body:
+; CHECK-NEXT: %index.x = phi i64 [ 0, %entry ], [ %index.x.inc, %for.end2 ]
+; CHECK-LABEL: for.body1:
+; CHECK-NEXT: %index.y = phi i64 [ 0, %for.body ], [ %index.y.inc, %for.body1 ]
+; CHECK-NEXT: call void @llvm_hpvm_cpu_dstack_push(
+; CHECK-NEXT: @Func3_cloned.2_cloned_cloned_cloned_cloned_cloned_cloned(
+; CHECK-NEXT: call void @llvm_hpvm_cpu_dstack_pop()
+; CHECK: br i1 %cond.y, label %for.body1, label %for.end2
+; CHECK-LABEL: for.end2:
+; CHECK: br i1 %cond.x, label %for.body, label %for.end
+; CHECK-LABEL: for.end:
+
+; CHECK-LABEL: @PipeRoot_cloned.4(
+; CHECK: call void @llvm_hpvm_cpu_dstack_push(
+; CHECK-NEXT: @Func2_cloned.3_cloned_cloned_cloned_cloned_cloned_cloned(
+; CHECK-NEXT: call void @llvm_hpvm_cpu_dstack_pop()
+
+; CHECK-LABEL: @LaunchDataflowGraph(
+; CHECK: call %emptyStruct.2 @PipeRoot_cloned.4(
+
+declare dso_local void @__hpvm__hint(i32) local_unnamed_addr #0
+
+declare dso_local void @__hpvm__attributes(i32, ...) local_unnamed_addr #0
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.lifetime.start.p0i8(i64 immarg, i8* nocapture) #1
+
+declare dso_local i8* @__hpvm__getNode(...) local_unnamed_addr #0
+
+declare dso_local i8* @__hpvm__getParentNode(i8*) local_unnamed_addr #0
+
+declare dso_local i64 @__hpvm__getNodeInstanceID_x(i8*) local_unnamed_addr #0
+
+declare dso_local i64 @__hpvm__getNodeInstanceID_y(i8*) local_unnamed_addr #0
+
+declare dso_local i64 @__hpvm__getNumNodeInstances_x(i8*) local_unnamed_addr #0
+
+declare dso_local i64 @__hpvm__getNumNodeInstances_y(i8*) local_unnamed_addr #0
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.lifetime.end.p0i8(i64 immarg, i8* nocapture) #1
+
+declare dso_local i8* @__hpvm__createNodeND(i32, ...) local_unnamed_addr #0
+
+declare dso_local void @__hpvm__bindIn(i8*, i32, i32, i32) local_unnamed_addr #0
+
+; Function Attrs: noinline nounwind uwtable
+define dso_local void @Launch() local_unnamed_addr #2 {
+entry:
+  %RootArgs = alloca %struct.Root, align 8
+  %0 = bitcast %struct.Root* %RootArgs to i8*
+  call void @llvm.lifetime.start.p0i8(i64 32, i8* nonnull %0) #6
+  %call = tail call noalias i8* @malloc(i64 1024) #6
+  %1 = bitcast %struct.Root* %RootArgs to i8**
+  store i8* %call, i8** %1, align 8, !tbaa !6
+  %Insize = getelementptr inbounds %struct.Root, %struct.Root* %RootArgs, i64 0, i32 1
+  store i64 1024, i64* %Insize, align 8, !tbaa !12
+  %output = getelementptr inbounds %struct.Root, %struct.Root* %RootArgs, i64 0, i32 2
+  %call1 = tail call noalias i8* @malloc(i64 1024) #6
+  %2 = bitcast i32** %output to i8**
+  store i8* %call1, i8** %2, align 8, !tbaa !13
+  %Outsize = getelementptr inbounds %struct.Root, %struct.Root* %RootArgs, i64 0, i32 3
+  store i64 1024, i64* %Outsize, align 8, !tbaa !14
+  %3 = bitcast %struct.Root* %RootArgs to i8*
+  %graphID = call i8* @llvm.hpvm.launch(i8* bitcast (%emptyStruct.2 (i32*, i64, i32*, i64)* @PipeRoot_cloned to i8*), i8* %3, i1 false)
+  call void @llvm.hpvm.wait(i8* %graphID)
+  call void @llvm.lifetime.end.p0i8(i64 32, i8* nonnull %0) #6
+  ret void
+}
+
+; Function Attrs: nofree nounwind
+declare dso_local noalias i8* @malloc(i64) local_unnamed_addr #3
+
+declare dso_local i8* @__hpvm__launch(i32, ...) local_unnamed_addr #0
+
+declare dso_local void @__hpvm__wait(i8*) local_unnamed_addr #0
+
+; Function Attrs: nounwind uwtable
+define dso_local i32 @main() local_unnamed_addr #4 {
+entry:
+  call void @llvm.hpvm.init()
+  tail call void @Launch()
+  call void @llvm.hpvm.cleanup()
+  ret i32 0
+}
+
+declare dso_local void @__hpvm__init(...) local_unnamed_addr #0
+
+declare dso_local void @__hpvm__cleanup(...) local_unnamed_addr #0
+
+declare i8* @llvm_hpvm_initializeTimerSet()
+
+declare void @llvm_hpvm_switchToTimer(i8**, i32)
+
+declare void @llvm_hpvm_printTimerSet(i8**, i8*)
+
+; Function Attrs: nounwind readnone
+declare i8* @llvm.hpvm.getNode() #5
+
+; Function Attrs: nounwind readnone
+declare i8* @llvm.hpvm.getParentNode(i8*) #5
+
+; Function Attrs: nounwind readnone
+declare i64 @llvm.hpvm.getNodeInstanceID.x(i8*) #5
+
+; Function Attrs: nounwind readnone
+declare i64 @llvm.hpvm.getNodeInstanceID.y(i8*) #5
+
+; Function Attrs: nounwind readnone
+declare i64 @llvm.hpvm.getNumNodeInstances.x(i8*) #5
+
+; Function Attrs: nounwind readnone
+declare i64 @llvm.hpvm.getNumNodeInstances.y(i8*) #5
+
+; Function Attrs: nounwind
+declare i8* @llvm.hpvm.createNode2D(i8*, i64, i64) #6
+
+; Function Attrs: nounwind uwtable
+define dso_local %emptyStruct @Func1_cloned(i32* in %In, i64 %Insize, i32* in out %Out, i64 %Outsize) #4 {
+entry:
+  %call4 = call i8* @llvm.hpvm.getNode()
+  %call15 = call i8* @llvm.hpvm.getParentNode(i8* %call4)
+  %call26 = call i64 @llvm.hpvm.getNodeInstanceID.x(i8* %call4)
+  %conv = trunc i64 %call26 to i32
+  %call37 = call i64 @llvm.hpvm.getNodeInstanceID.y(i8* %call4)
+  %conv4 = trunc i64 %call37 to i32
+  %call58 = call i64 @llvm.hpvm.getNodeInstanceID.x(i8* %call15)
+  %conv6 = trunc i64 %call58 to i32
+  %call79 = call i64 @llvm.hpvm.getNodeInstanceID.y(i8* %call15)
+  %conv8 = trunc i64 %call79 to i32
+  %call910 = call i64 @llvm.hpvm.getNumNodeInstances.x(i8* %call4)
+  %conv10 = trunc i64 %call910 to i32
+  %call1111 = call i64 @llvm.hpvm.getNumNodeInstances.y(i8* %call4)
+  %conv12 = trunc i64 %call1111 to i32
+  %mul = mul nsw i32 %conv10, %conv6
+  %add = add nsw i32 %mul, %conv
+  %mul13 = mul nsw i32 %conv12, %conv8
+  %add14 = add nsw i32 %mul13, %conv4
+  %cmp = icmp eq i32 %add, %add14
+  br i1 %cmp, label %if.end, label %if.then
+
+if.then:                                          ; preds = %entry
+  %arrayidx = getelementptr inbounds i32, i32* %In, i64 3
+  %0 = load i32, i32* %arrayidx, align 4, !tbaa !15
+  %idxprom = sext i32 %add to i64
+  %arrayidx16 = getelementptr inbounds i32, i32* %Out, i64 %idxprom
+  %1 = load i32, i32* %arrayidx16, align 4, !tbaa !15
+  %add17 = add nsw i32 %1, %0
+  store i32 %add17, i32* %arrayidx16, align 4, !tbaa !15
+  br label %if.end
+
+if.end:                                           ; preds = %if.then, %entry
+  ret %emptyStruct undef
+}
+
+; Function Attrs: nounwind
+declare void @llvm.hpvm.bind.input(i8*, i32, i32, i1) #6
+
+; Function Attrs: nounwind uwtable
+define dso_local %emptyStruct.0 @Func3_cloned(i32* in %In, i64 %Insize, i32* in out %Out, i64 %Outsize) #4 {
+entry:
+  %Func1_cloned.node = call i8* @llvm.hpvm.createNode2D(i8* bitcast (%emptyStruct (i32*, i64, i32*, i64)* @Func1_cloned to i8*), i64 3, i64 5)
+  call void @llvm.hpvm.bind.input(i8* %Func1_cloned.node, i32 0, i32 0, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func1_cloned.node, i32 1, i32 1, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func1_cloned.node, i32 2, i32 2, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func1_cloned.node, i32 3, i32 3, i1 false)
+  ret %emptyStruct.0 undef
+}
+
+; Function Attrs: nounwind
+declare i8* @llvm.hpvm.createNode(i8*) #6
+
+; Function Attrs: nounwind uwtable
+define dso_local %emptyStruct.1 @Func2_cloned(i32* in %In, i64 %Insize, i32* in out %Out, i64 %Outsize) #4 {
+entry:
+  %Func3_cloned.node = call i8* @llvm.hpvm.createNode2D(i8* bitcast (%emptyStruct.0 (i32*, i64, i32*, i64)* @Func3_cloned to i8*), i64 3, i64 5)
+  call void @llvm.hpvm.bind.input(i8* %Func3_cloned.node, i32 0, i32 0, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func3_cloned.node, i32 1, i32 1, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func3_cloned.node, i32 2, i32 2, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func3_cloned.node, i32 3, i32 3, i1 false)
+  ret %emptyStruct.1 undef
+}
+
+; Function Attrs: nounwind
+declare i8* @llvm.hpvm.launch(i8*, i8*, i1) #6
+
+; Function Attrs: nounwind uwtable
+define dso_local %emptyStruct.2 @PipeRoot_cloned(i32* in %In, i64 %Insize, i32* in out %Out, i64 %Outsize) #4 {
+entry:
+  %Func2_cloned.node = call i8* @llvm.hpvm.createNode(i8* bitcast (%emptyStruct.1 (i32*, i64, i32*, i64)* @Func2_cloned to i8*))
+  call void @llvm.hpvm.bind.input(i8* %Func2_cloned.node, i32 0, i32 0, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func2_cloned.node, i32 1, i32 1, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func2_cloned.node, i32 2, i32 2, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func2_cloned.node, i32 3, i32 3, i1 false)
+  ret %emptyStruct.2 undef
+}
+
+; Function Attrs: nounwind
+declare void @llvm.hpvm.wait(i8*) #6
+
+; Function Attrs: nounwind
+declare void @llvm.hpvm.init() #6
+
+; Function Attrs: nounwind
+declare void @llvm.hpvm.cleanup() #6
+
+attributes #0 = { "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
+attributes #1 = { argmemonly nounwind }
+attributes #2 = { noinline nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
+attributes #3 = { nofree nounwind "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
+attributes #4 = { nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
+attributes #5 = { nounwind readnone }
+attributes #6 = { nounwind }
+
+!llvm.module.flags = !{!0}
+!llvm.ident = !{!1}
+!hpvm_hint_cpu = !{!2, !3, !4, !5}
+!hpvm_hint_gpu = !{}
+!hpvm_hint_spir = !{}
+!hpvm_hint_cudnn = !{}
+!hpvm_hint_promise = !{}
+!hpvm_hint_cpu_gpu = !{}
+!hpvm_hint_cpu_spir = !{}
+
+!0 = !{i32 1, !"wchar_size", i32 4}
+!1 = !{!"clang version 9.0.0 (https://gitlab.engr.illinois.edu/llvm/hpvm.git 3551132592a00cab6c966df508ab511598269f78)"}
+!2 = !{%emptyStruct (i32*, i64, i32*, i64)* @Func1_cloned}
+!3 = !{%emptyStruct.0 (i32*, i64, i32*, i64)* @Func3_cloned}
+!4 = !{%emptyStruct.1 (i32*, i64, i32*, i64)* @Func2_cloned}
+!5 = !{%emptyStruct.2 (i32*, i64, i32*, i64)* @PipeRoot_cloned}
+!6 = !{!7, !8, i64 0}
+!7 = !{!"Root", !8, i64 0, !11, i64 8, !8, i64 16, !11, i64 24}
+!8 = !{!"any pointer", !9, i64 0}
+!9 = !{!"omnipotent char", !10, i64 0}
+!10 = !{!"Simple C/C++ TBAA"}
+!11 = !{!"long", !9, i64 0}
+!12 = !{!7, !11, i64 8}
+!13 = !{!7, !8, i64 16}
+!14 = !{!7, !11, i64 24}
+!15 = !{!16, !16, i64 0}
+!16 = !{!"int", !9, i64 0}
diff --git a/hpvm/test/regressionTests/DFG2LLVM_CPU/ThreeLevel.constmem.OpenCL.ll b/hpvm/test/regressionTests/DFG2LLVM_CPU/ThreeLevel.constmem.OpenCL.ll
new file mode 100644
index 0000000000000000000000000000000000000000..c3fd32fd5f9c6ccee12834871d11212e39f70bc7
--- /dev/null
+++ b/hpvm/test/regressionTests/DFG2LLVM_CPU/ThreeLevel.constmem.OpenCL.ll
@@ -0,0 +1,270 @@
+; RUN: opt -load LLVMBuildDFG.so -load LLVMLocalMem.so -load LLVMDFG2LLVM_OpenCL.so -load LLVMDFG2LLVM_CPU.so -S -localmem -dfg2llvm-opencl -dfg2llvm-cpu <  %s | FileCheck %s
+; ModuleID = 'ThreeLevel.opt.ll'
+source_filename = "ThreeLevel.opt.c"
+target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+%struct.Root = type { i32*, i64, i32*, i64 }
+%struct.out.Allocation = type <{ i8*, i64 }>
+%emptyStruct = type <{}>
+%emptyStruct.0 = type <{}>
+%emptyStruct.1 = type <{}>
+%emptyStruct.2 = type <{}>
+
+declare dso_local void @__hpvm__hint(i32) local_unnamed_addr #0
+
+declare dso_local void @__hpvm__attributes(i32, ...) local_unnamed_addr #0
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.lifetime.start.p0i8(i64 immarg, i8* nocapture) #1
+
+declare dso_local i8* @__hpvm__getNode(...) local_unnamed_addr #0
+
+declare dso_local i8* @__hpvm__getParentNode(i8*) local_unnamed_addr #0
+
+declare dso_local i64 @__hpvm__getNodeInstanceID_x(i8*) local_unnamed_addr #0
+
+declare dso_local i64 @__hpvm__getNodeInstanceID_y(i8*) local_unnamed_addr #0
+
+declare dso_local i64 @__hpvm__getNumNodeInstances_x(i8*) local_unnamed_addr #0
+
+declare dso_local i64 @__hpvm__getNumNodeInstances_y(i8*) local_unnamed_addr #0
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.lifetime.end.p0i8(i64 immarg, i8* nocapture) #1
+
+declare dso_local i8* @__hpvm__malloc(i64) local_unnamed_addr #0
+
+declare dso_local void @__hpvm__return(i32, ...) local_unnamed_addr #0
+
+declare dso_local i8* @__hpvm__createNodeND(i32, ...) local_unnamed_addr #0
+
+declare dso_local void @__hpvm__bindIn(i8*, i32, i32, i32) local_unnamed_addr #0
+
+declare dso_local i8* @__hpvm__edge(i8*, i8*, i32, i32, i32, i32) local_unnamed_addr #0
+
+; CHECK-LABEL: @Launch(
+; CHECK: call i8* @llvm_hpvm_cpu_launch(i8*
+; CHECK-NEXT: call i8* @llvm.hpvm.launch(i8*
+; CHECK-NEXT: call void @llvm_hpvm_cpu_wait(i8*
+
+; Function Attrs: noinline nounwind uwtable
+define dso_local void @Launch() local_unnamed_addr #2 {
+entry:
+  %RootArgs = alloca %struct.Root, align 8
+  %0 = bitcast %struct.Root* %RootArgs to i8*
+  call void @llvm.lifetime.start.p0i8(i64 32, i8* nonnull %0) #6
+  %call = tail call noalias i8* @malloc(i64 1024) #6
+  %1 = bitcast %struct.Root* %RootArgs to i8**
+  store i8* %call, i8** %1, align 8, !tbaa !6
+  %Insize = getelementptr inbounds %struct.Root, %struct.Root* %RootArgs, i64 0, i32 1
+  store i64 1024, i64* %Insize, align 8, !tbaa !12
+  %output = getelementptr inbounds %struct.Root, %struct.Root* %RootArgs, i64 0, i32 2
+  %call1 = tail call noalias i8* @malloc(i64 1024) #6
+  %2 = bitcast i32** %output to i8**
+  store i8* %call1, i8** %2, align 8, !tbaa !13
+  %Outsize = getelementptr inbounds %struct.Root, %struct.Root* %RootArgs, i64 0, i32 3
+  store i64 1024, i64* %Outsize, align 8, !tbaa !14
+  %3 = bitcast %struct.Root* %RootArgs to i8*
+  %graphID = call i8* @llvm.hpvm.launch(i8* bitcast (%emptyStruct.2 (i32*, i64, i32*, i64)* @PipeRoot_cloned to i8*), i8* %3, i1 false)
+  call void @llvm.hpvm.wait(i8* %graphID)
+  call void @llvm.lifetime.end.p0i8(i64 32, i8* nonnull %0) #6
+  ret void
+}
+
+; Function Attrs: nofree nounwind
+declare dso_local noalias i8* @malloc(i64) local_unnamed_addr #3
+
+declare dso_local i8* @__hpvm__launch(i32, ...) local_unnamed_addr #0
+
+declare dso_local void @__hpvm__wait(i8*) local_unnamed_addr #0
+
+; CHECK-LABEL: @main(
+; CHECK: call i8* @llvm_hpvm_ocl_initContext(i32
+; CHECK: call i8* @llvm_hpvm_ocl_launch(i8*
+; CHECK: call void @llvm_hpvm_ocl_clearContext(i8*
+
+; CHECK-LABEL: @Func2_cloned.3_cloned_cloned_cloned_cloned_cloned_cloned(
+; CHECK: call i8* @llvm_hpvm_ocl_argument_ptr(i8*
+; CHECK: call void @llvm_hpvm_ocl_argument_scalar(i8*
+; CHECK: call void @llvm_hpvm_ocl_argument_shared(i8*
+; CHECK: call void @llvm_hpvm_ocl_argument_scalar(i8*
+; CHECK: call i8* @llvm_hpvm_ocl_executeNode(i8*
+; CHECK-NEXT: call void @llvm_hpvm_ocl_wait(i8*
+; CHECK-NEXT: call void @llvm_hpvm_ocl_free(i8*
+
+; CHECK-LABEL: @PipeRoot_cloned.4(
+; CHECK: call void @llvm_hpvm_cpu_dstack_push(
+; CHECK-NEXT: @Func2_cloned.3_cloned_cloned_cloned_cloned_cloned_cloned(
+; CHECK-NEXT: call void @llvm_hpvm_cpu_dstack_pop(
+
+; CHECK-LABEL: define i8* @LaunchDataflowGraph(i8*
+
+
+; Function Attrs: nounwind uwtable
+define dso_local i32 @main() local_unnamed_addr #4 {
+entry:
+  call void @llvm.hpvm.init()
+  tail call void @Launch()
+  call void @llvm.hpvm.cleanup()
+  ret i32 0
+}
+
+declare dso_local void @__hpvm__init(...) local_unnamed_addr #0
+
+declare dso_local void @__hpvm__cleanup(...) local_unnamed_addr #0
+
+declare i8* @llvm_hpvm_initializeTimerSet()
+
+declare void @llvm_hpvm_switchToTimer(i8**, i32)
+
+declare void @llvm_hpvm_printTimerSet(i8**, i8*)
+
+; Function Attrs: nounwind readnone
+declare i8* @llvm.hpvm.getNode() #5
+
+; Function Attrs: nounwind readnone
+declare i8* @llvm.hpvm.getParentNode(i8*) #5
+
+; Function Attrs: nounwind readnone
+declare i64 @llvm.hpvm.getNodeInstanceID.x(i8*) #5
+
+; Function Attrs: nounwind readnone
+declare i64 @llvm.hpvm.getNodeInstanceID.y(i8*) #5
+
+; Function Attrs: nounwind readnone
+declare i64 @llvm.hpvm.getNumNodeInstances.x(i8*) #5
+
+; Function Attrs: nounwind readnone
+declare i64 @llvm.hpvm.getNumNodeInstances.y(i8*) #5
+
+; Function Attrs: nounwind
+declare i8* @llvm.hpvm.malloc(i64) #6
+
+; Function Attrs: nounwind uwtable
+define dso_local %struct.out.Allocation @Allocation_cloned(i64 %block) #4 {
+entry:
+  %call1 = call i8* @llvm.hpvm.malloc(i64 %block)
+  %returnStruct = insertvalue %struct.out.Allocation undef, i8* %call1, 0
+  %returnStruct2 = insertvalue %struct.out.Allocation %returnStruct, i64 %block, 1
+  ret %struct.out.Allocation %returnStruct2
+}
+
+; Function Attrs: nounwind
+declare i8* @llvm.hpvm.createNode2D(i8*, i64, i64) #6
+
+; Function Attrs: nounwind uwtable
+define dso_local %emptyStruct @Func1_cloned(i32* in %In, i64 %Insize, i32* in out %Out, i64 %Outsize) #4 {
+entry:
+  %call4 = call i8* @llvm.hpvm.getNode()
+  %call15 = call i8* @llvm.hpvm.getParentNode(i8* %call4)
+  %call26 = call i64 @llvm.hpvm.getNodeInstanceID.x(i8* %call4)
+  %call37 = call i64 @llvm.hpvm.getNodeInstanceID.y(i8* %call4)
+  %call58 = call i64 @llvm.hpvm.getNodeInstanceID.x(i8* %call15)
+  %call79 = call i64 @llvm.hpvm.getNodeInstanceID.y(i8* %call15)
+  %call910 = call i64 @llvm.hpvm.getNumNodeInstances.x(i8* %call4)
+  %call1111 = call i64 @llvm.hpvm.getNumNodeInstances.y(i8* %call4)
+  %mul = mul i64 %call910, %call58
+  %add = add i64 %mul, %call26
+  %arrayidx = getelementptr inbounds i32, i32* %In, i64 3
+  %0 = load i32, i32* %arrayidx, align 4, !tbaa !15
+  %sext = shl i64 %add, 32
+  %idxprom = ashr exact i64 %sext, 32
+  %arrayidx15 = getelementptr inbounds i32, i32* %Out, i64 %idxprom
+  %1 = load i32, i32* %arrayidx15, align 4, !tbaa !15
+  %add16 = add nsw i32 %1, %0
+  store i32 %add16, i32* %arrayidx15, align 4, !tbaa !15
+  ret %emptyStruct undef
+}
+
+; Function Attrs: nounwind
+declare i8* @llvm.hpvm.createNode(i8*) #6
+
+; Function Attrs: nounwind
+declare void @llvm.hpvm.bind.input(i8*, i32, i32, i1) #6
+
+; Function Attrs: nounwind
+declare i8* @llvm.hpvm.createEdge(i8*, i8*, i1, i32, i32, i1) #6
+
+; Function Attrs: nounwind uwtable
+define dso_local %emptyStruct.0 @Func3_cloned(i32* in %In, i64 %Insize, i32* in out %Out, i64 %Outsize) #4 {
+entry:
+  %Func1_cloned.node = call i8* @llvm.hpvm.createNode2D(i8* bitcast (%emptyStruct (i32*, i64, i32*, i64)* @Func1_cloned to i8*), i64 3, i64 5)
+  %Allocation_cloned.node = call i8* @llvm.hpvm.createNode(i8* bitcast (%struct.out.Allocation (i64)* @Allocation_cloned to i8*))
+  call void @llvm.hpvm.bind.input(i8* %Allocation_cloned.node, i32 1, i32 0, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func1_cloned.node, i32 2, i32 2, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func1_cloned.node, i32 3, i32 3, i1 false)
+  %output = call i8* @llvm.hpvm.createEdge(i8* %Allocation_cloned.node, i8* %Func1_cloned.node, i1 true, i32 0, i32 0, i1 false)
+  %output1 = call i8* @llvm.hpvm.createEdge(i8* %Allocation_cloned.node, i8* %Func1_cloned.node, i1 true, i32 1, i32 1, i1 false)
+  ret %emptyStruct.0 undef
+}
+
+; Function Attrs: nounwind uwtable
+define dso_local %emptyStruct.1 @Func2_cloned(i32* in %In, i64 %Insize, i32* in out %Out, i64 %Outsize) #4 {
+entry:
+  %Func3_cloned.node = call i8* @llvm.hpvm.createNode2D(i8* bitcast (%emptyStruct.0 (i32*, i64, i32*, i64)* @Func3_cloned to i8*), i64 3, i64 5)
+  call void @llvm.hpvm.bind.input(i8* %Func3_cloned.node, i32 0, i32 0, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func3_cloned.node, i32 1, i32 1, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func3_cloned.node, i32 2, i32 2, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func3_cloned.node, i32 3, i32 3, i1 false)
+  ret %emptyStruct.1 undef
+}
+
+; Function Attrs: nounwind
+declare i8* @llvm.hpvm.launch(i8*, i8*, i1) #6
+
+; Function Attrs: nounwind uwtable
+define dso_local %emptyStruct.2 @PipeRoot_cloned(i32* in %In, i64 %Insize, i32* in out %Out, i64 %Outsize) #4 {
+entry:
+  %Func2_cloned.node = call i8* @llvm.hpvm.createNode(i8* bitcast (%emptyStruct.1 (i32*, i64, i32*, i64)* @Func2_cloned to i8*))
+  call void @llvm.hpvm.bind.input(i8* %Func2_cloned.node, i32 0, i32 0, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func2_cloned.node, i32 1, i32 1, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func2_cloned.node, i32 2, i32 2, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func2_cloned.node, i32 3, i32 3, i1 false)
+  ret %emptyStruct.2 undef
+}
+
+; Function Attrs: nounwind
+declare void @llvm.hpvm.wait(i8*) #6
+
+; Function Attrs: nounwind
+declare void @llvm.hpvm.init() #6
+
+; Function Attrs: nounwind
+declare void @llvm.hpvm.cleanup() #6
+
+attributes #0 = { "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="true" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="true" "use-soft-float"="false" }
+attributes #1 = { argmemonly nounwind }
+attributes #2 = { noinline nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-frame-pointer-elim"="false" "no-infs-fp-math"="true" "no-jump-tables"="false" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="true" "use-soft-float"="false" }
+attributes #3 = { nofree nounwind "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="true" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="true" "use-soft-float"="false" }
+attributes #4 = { nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-frame-pointer-elim"="false" "no-infs-fp-math"="true" "no-jump-tables"="false" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="true" "use-soft-float"="false" }
+attributes #5 = { nounwind readnone }
+attributes #6 = { nounwind }
+
+!llvm.module.flags = !{!0}
+!llvm.ident = !{!1}
+!hpvm_hint_gpu = !{!2}
+!hpvm_hint_cpu = !{!3, !4, !5}
+!hpvm_hint_spir = !{}
+!hpvm_hint_cudnn = !{}
+!hpvm_hint_promise = !{}
+!hpvm_hint_cpu_gpu = !{}
+!hpvm_hint_cpu_spir = !{}
+
+!0 = !{i32 1, !"wchar_size", i32 4}
+!1 = !{!"clang version 9.0.0 (https://gitlab.engr.illinois.edu/llvm/hpvm.git 6690f9e7e8b46b96aea222d3e85315cd63545953)"}
+!2 = !{%emptyStruct (i32*, i64, i32*, i64)* @Func1_cloned}
+!3 = !{%emptyStruct.0 (i32*, i64, i32*, i64)* @Func3_cloned}
+!4 = !{%emptyStruct.1 (i32*, i64, i32*, i64)* @Func2_cloned}
+!5 = !{%emptyStruct.2 (i32*, i64, i32*, i64)* @PipeRoot_cloned}
+!6 = !{!7, !8, i64 0}
+!7 = !{!"Root", !8, i64 0, !11, i64 8, !8, i64 16, !11, i64 24}
+!8 = !{!"any pointer", !9, i64 0}
+!9 = !{!"omnipotent char", !10, i64 0}
+!10 = !{!"Simple C/C++ TBAA"}
+!11 = !{!"long", !9, i64 0}
+!12 = !{!7, !11, i64 8}
+!13 = !{!7, !8, i64 16}
+!14 = !{!7, !11, i64 24}
+!15 = !{!16, !16, i64 0}
+!16 = !{!"int", !9, i64 0}
diff --git a/hpvm/test/regressionTests/DFG2LLVM_X86/TwoLevel.codeGen.ll b/hpvm/test/regressionTests/DFG2LLVM_CPU/TwoLevel.codeGen.ll
similarity index 94%
rename from hpvm/test/regressionTests/DFG2LLVM_X86/TwoLevel.codeGen.ll
rename to hpvm/test/regressionTests/DFG2LLVM_CPU/TwoLevel.codeGen.ll
index b218b70fd0e32b6e6222e7a14e88ab3a09f57977..f8eed46c26eec0b383695ba6da9d81426f2abec7 100644
--- a/hpvm/test/regressionTests/DFG2LLVM_X86/TwoLevel.codeGen.ll
+++ b/hpvm/test/regressionTests/DFG2LLVM_CPU/TwoLevel.codeGen.ll
@@ -1,4 +1,4 @@
-; RUN: opt -load LLVMBuildDFG.so -load LLVMDFG2LLVM_X86.so -S -dfg2llvm-x86 <  %s | FileCheck %s
+; RUN: opt -load LLVMBuildDFG.so -load LLVMDFG2LLVM_CPU.so -S -dfg2llvm-cpu <  %s | FileCheck %s
 ; ModuleID = 'TwoLevel.ll'
 source_filename = "TwoLevel.c"
 target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
@@ -11,9 +11,9 @@ target triple = "x86_64-unknown-linux-gnu"
 
 ; CHECK-LABEL: i32 @main(
 ; CHECK: call void @llvm.hpvm.init()
-; CHECK: call i8* @llvm_hpvm_x86_launch(i8* (i8*)* @LaunchDataflowGraph, i8*
+; CHECK: call i8* @llvm_hpvm_cpu_launch(i8* (i8*)* @LaunchDataflowGraph, i8*
 ; CHECK-NEXT: call i8* @llvm.hpvm.launch(i8* 
-; CHECK-NEXT: call void @llvm_hpvm_x86_wait(i8*
+; CHECK-NEXT: call void @llvm_hpvm_cpu_wait(i8*
 
 ; CHECK-LABEL: @Func2_cloned(
 ; CHECK: call i8* @llvm.hpvm.createNode1D(
@@ -32,19 +32,19 @@ target triple = "x86_64-unknown-linux-gnu"
 ; CHECK-NEXT: call void @llvm.hpvm.bind.output(i8* %Func2_cloned.node
 
 ; CHECK-LABEL: @Func1_cloned.1_cloned_cloned_cloned_cloned_cloned_cloned(
-; CHECK: call i8* @llvm_hpvm_x86_argument_ptr(
+; CHECK: call i8* @llvm_hpvm_cpu_argument_ptr(
 
 ; CHECK-LABEL: @Func2_cloned.2_cloned_cloned_cloned_cloned_cloned_cloned(
 ; CHECK-LABEL: for.body
 ; CHECK: %index.x = phi i64 [ 0, %entry ], [ %index.x.inc, %for.body ]
-; CHECK-NEXT: call void @llvm_hpvm_x86_dstack_push(
+; CHECK-NEXT: call void @llvm_hpvm_cpu_dstack_push(
 ; CHECK-NEXT: @Func1_cloned.1_cloned_cloned_cloned_cloned_cloned_cloned(
-; CHECK-NEXT: call void @llvm_hpvm_x86_dstack_pop()
+; CHECK-NEXT: call void @llvm_hpvm_cpu_dstack_pop()
 
 ; CHECK-LABEL: @PipeRoot_cloned.3(
-; CHECK: call void @llvm_hpvm_x86_dstack_push(
+; CHECK: call void @llvm_hpvm_cpu_dstack_push(
 ; CHECK-NEXT: @Func2_cloned.2_cloned_cloned_cloned_cloned_cloned_cloned(
-; CHECK-NEXT: call void @llvm_hpvm_x86_dstack_pop()
+; CHECK-NEXT: call void @llvm_hpvm_cpu_dstack_pop()
 
 ; CHECK-LABEL: @LaunchDataflowGraph(i8*
 ; call %struct.out.PipeRoot @PipeRoot_cloned.3(
@@ -178,9 +178,9 @@ declare void @llvm.hpvm.wait(i8*) #3
 ; Function Attrs: nounwind
 declare void @llvm.hpvm.cleanup() #3
 
-attributes #0 = { "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="true" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="true" "use-soft-float"="false" }
+attributes #0 = { "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="true" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="cpu-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="true" "use-soft-float"="false" }
 attributes #1 = { argmemonly nounwind }
-attributes #2 = { nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-frame-pointer-elim"="false" "no-infs-fp-math"="true" "no-jump-tables"="false" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="true" "use-soft-float"="false" }
+attributes #2 = { nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-frame-pointer-elim"="false" "no-infs-fp-math"="true" "no-jump-tables"="false" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="cpu-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="true" "use-soft-float"="false" }
 attributes #3 = { nounwind }
 
 !llvm.module.flags = !{!0}
diff --git a/hpvm/test/regressionTests/DFG2LLVM_X86/TwoRoot.ll b/hpvm/test/regressionTests/DFG2LLVM_CPU/TwoRoot.ll
similarity index 92%
rename from hpvm/test/regressionTests/DFG2LLVM_X86/TwoRoot.ll
rename to hpvm/test/regressionTests/DFG2LLVM_CPU/TwoRoot.ll
index 5ce7a58e2189d1a00806979af6bab0cbe1029852..c562ab141ea7276977a10fadf37c1bc49da9288c 100644
--- a/hpvm/test/regressionTests/DFG2LLVM_X86/TwoRoot.ll
+++ b/hpvm/test/regressionTests/DFG2LLVM_CPU/TwoRoot.ll
@@ -1,4 +1,4 @@
-; RUN: opt -load LLVMBuildDFG.so -load LLVMDFG2LLVM_X86.so -S -dfg2llvm-x86 <  %s | FileCheck %s
+; RUN: opt -load LLVMBuildDFG.so -load LLVMDFG2LLVM_CPU.so -S -dfg2llvm-cpu <  %s | FileCheck %s
 ; ModuleID = 'TwoLaunch.ll'
 source_filename = "TwoLaunch.c"
 target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
@@ -11,12 +11,12 @@ target triple = "x86_64-unknown-linux-gnu"
 
 ; CHECK-LABEL: i32 @main(
 ; CHECK: call void @llvm.hpvm.init()
-; CHECK: @llvm_hpvm_x86_launch(i8* (i8*)* @LaunchDataflowGraph, i8*
+; CHECK: @llvm_hpvm_cpu_launch(i8* (i8*)* @LaunchDataflowGraph, i8*
 ; CHECK-NEXT: call i8* @llvm.hpvm.launch(i8*
-; CHECK: @llvm_hpvm_x86_launch(i8* (i8*)* @LaunchDataflowGraph.7, i8*
+; CHECK: @llvm_hpvm_cpu_launch(i8* (i8*)* @LaunchDataflowGraph.7, i8*
 ; CHECK-NEXT: call i8* @llvm.hpvm.launch(i8*
-; CHECK-NEXT: call void @llvm_hpvm_x86_wait(i8*
-; CHECK-NEXT: call void @llvm_hpvm_x86_wait(i8*
+; CHECK-NEXT: call void @llvm_hpvm_cpu_wait(i8*
+; CHECK-NEXT: call void @llvm_hpvm_cpu_wait(i8*
 
 ; CHECK-LABEL: @Func2_cloned(
 ; CHECK: call i8* @llvm.hpvm.createNode1D(
@@ -35,30 +35,30 @@ target triple = "x86_64-unknown-linux-gnu"
 ; CHECK-NEXT: call void @llvm.hpvm.bind.output(i8* %Func2_cloned.node
 
 ; CHECK-LABEL: @Func1_cloned.1_cloned_cloned_cloned_cloned_cloned_cloned(
-; CHECK: call i8* @llvm_hpvm_x86_argument_ptr(
+; CHECK: call i8* @llvm_hpvm_cpu_argument_ptr(
 
 ; CHECK-LABEL: @Func2_cloned.2_cloned_cloned_cloned_cloned_cloned_cloned(
 ; CHECK: %index.x = phi i64 [ 0, %entry ], [ %index.x.inc, %for.body ]
-; CHECK-NEXT: call void @llvm_hpvm_x86_dstack_push(
+; CHECK-NEXT: call void @llvm_hpvm_cpu_dstack_push(
 ; CHECK-NEXT: @Func1_cloned.1_cloned_cloned_cloned_cloned_cloned_cloned(
-; CHECK-NEXT: call void @llvm_hpvm_x86_dstack_pop()
+; CHECK-NEXT: call void @llvm_hpvm_cpu_dstack_pop()
 
 ; CHECK-LABEL: @PipeRoot_cloned.3(
-; CHECK: call void @llvm_hpvm_x86_dstack_push(
+; CHECK: call void @llvm_hpvm_cpu_dstack_push(
 ; CHECK-NEXT: @Func2_cloned.2_cloned_cloned_cloned_cloned_cloned_cloned(
-; CHECK-NEXT: call void @llvm_hpvm_x86_dstack_pop()
+; CHECK-NEXT: call void @llvm_hpvm_cpu_dstack_pop()
 
 ; CHECK-LABEL: @LaunchDataflowGraph(i8*
 ; CHECK: call %struct.out.PipeRoot @PipeRoot_cloned.3(
 
 ; CHECK-LABEL: @Func1_cloned.4_cloned_cloned_cloned_cloned_cloned_cloned(
-; CHECK: @llvm_hpvm_x86_argument_ptr(
+; CHECK: @llvm_hpvm_cpu_argument_ptr(
 
 ; CHECK-LABEL: @Func2_cloned.5_cloned_cloned_cloned_cloned_cloned_cloned(
 ; CHECK: %index.x = phi i64 [ 0, %entry ], [ %index.x.inc, %for.body ]
-; CHECK-NEXT: call void @llvm_hpvm_x86_dstack_push(
+; CHECK-NEXT: call void @llvm_hpvm_cpu_dstack_push(
 ; CHECK-NEXT: @Func1_cloned.4_cloned_cloned_cloned_cloned_cloned_cloned(
-; CHECK-NEXT: call void @llvm_hpvm_x86_dstack_pop()
+; CHECK-NEXT: call void @llvm_hpvm_cpu_dstack_pop()
 
 ; CHECK-LABEL: @LaunchDataflowGraph.7(i8*
 ; call %struct.out.PipeRoot @PipeRoot_cloned.6(
@@ -195,9 +195,9 @@ declare void @llvm.hpvm.wait(i8*) #3
 ; Function Attrs: nounwind
 declare void @llvm.hpvm.cleanup() #3
 
-attributes #0 = { "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="true" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="true" "use-soft-float"="false" }
+attributes #0 = { "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="true" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="cpu-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="true" "use-soft-float"="false" }
 attributes #1 = { argmemonly nounwind }
-attributes #2 = { nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-frame-pointer-elim"="false" "no-infs-fp-math"="true" "no-jump-tables"="false" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="true" "use-soft-float"="false" }
+attributes #2 = { nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-frame-pointer-elim"="false" "no-infs-fp-math"="true" "no-jump-tables"="false" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="cpu-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="true" "use-soft-float"="false" }
 attributes #3 = { nounwind }
 
 !llvm.module.flags = !{!0}
diff --git a/hpvm/test/regressionTests/DFG2LLVM_NVPTX/ThreeLevel.atomic.ll b/hpvm/test/regressionTests/DFG2LLVM_OPENCL/ThreeLevel.atomic.ll
similarity index 99%
rename from hpvm/test/regressionTests/DFG2LLVM_NVPTX/ThreeLevel.atomic.ll
rename to hpvm/test/regressionTests/DFG2LLVM_OPENCL/ThreeLevel.atomic.ll
index 451035b21ede68a4796ebd1a0baa3645a77a31ef..ea6ec14d1069d45fcf7b434cfd08308c8f7c9158 100644
--- a/hpvm/test/regressionTests/DFG2LLVM_NVPTX/ThreeLevel.atomic.ll
+++ b/hpvm/test/regressionTests/DFG2LLVM_OPENCL/ThreeLevel.atomic.ll
@@ -1,4 +1,4 @@
-; RUN: opt -load LLVMBuildDFG.so -load LLVMLocalMem.so -load LLVMDFG2LLVM_NVPTX.so -S -localmem -dfg2llvm-nvptx <  %s | FileCheck %s
+; RUN: opt -load LLVMBuildDFG.so -load LLVMLocalMem.so -load LLVMDFG2LLVM_OpenCL.so -S -localmem -dfg2llvm-opencl <  %s | FileCheck %s
 ; ModuleID = 'ThreeLevel.atomic.ll'
 source_filename = "ThreeLevel.constmem.c"
 target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
diff --git a/hpvm/test/regressionTests/DFG2LLVM_OPENCL/ThreeLevel.cond.const.ll b/hpvm/test/regressionTests/DFG2LLVM_OPENCL/ThreeLevel.cond.const.ll
new file mode 100644
index 0000000000000000000000000000000000000000..04fd2ca0b5787990bb866151672e472ff009118c
--- /dev/null
+++ b/hpvm/test/regressionTests/DFG2LLVM_OPENCL/ThreeLevel.cond.const.ll
@@ -0,0 +1,259 @@
+; RUN: opt -load LLVMBuildDFG.so -load LLVMLocalMem.so -load LLVMDFG2LLVM_OpenCL.so -S -localmem -dfg2llvm-opencl <  %s | FileCheck %s
+; ModuleID = 'ThreeLevel.cond.genhpvm.gpu.ll'
+source_filename = "ThreeLevel.cond.c"
+target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+%struct.Root = type { i32*, i64, i32*, i64 }
+%emptyStruct = type <{}>
+%emptyStruct.0 = type <{}>
+%emptyStruct.1 = type <{}>
+%emptyStruct.2 = type <{}>
+
+declare dso_local void @__hpvm__hint(i32) local_unnamed_addr #0
+
+declare dso_local void @__hpvm__attributes(i32, ...) local_unnamed_addr #0
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.lifetime.start.p0i8(i64 immarg, i8* nocapture) #1
+
+declare dso_local i8* @__hpvm__getNode(...) local_unnamed_addr #0
+
+declare dso_local i8* @__hpvm__getParentNode(i8*) local_unnamed_addr #0
+
+declare dso_local i64 @__hpvm__getNodeInstanceID_x(i8*) local_unnamed_addr #0
+
+declare dso_local i64 @__hpvm__getNodeInstanceID_y(i8*) local_unnamed_addr #0
+
+declare dso_local i64 @__hpvm__getNumNodeInstances_x(i8*) local_unnamed_addr #0
+
+declare dso_local i64 @__hpvm__getNumNodeInstances_y(i8*) local_unnamed_addr #0
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.lifetime.end.p0i8(i64 immarg, i8* nocapture) #1
+
+declare dso_local i8* @__hpvm__createNodeND(i32, ...) local_unnamed_addr #0
+
+declare dso_local void @__hpvm__bindIn(i8*, i32, i32, i32) local_unnamed_addr #0
+
+
+; Function Attrs: noinline nounwind uwtable
+define dso_local void @Launch() local_unnamed_addr #2 {
+entry:
+  %RootArgs = alloca %struct.Root, align 8
+  %0 = bitcast %struct.Root* %RootArgs to i8*
+  call void @llvm.lifetime.start.p0i8(i64 32, i8* nonnull %0) #6
+  %call = tail call noalias i8* @malloc(i64 1024) #6
+  %1 = bitcast %struct.Root* %RootArgs to i8**
+  store i8* %call, i8** %1, align 8, !tbaa !6
+  %Insize = getelementptr inbounds %struct.Root, %struct.Root* %RootArgs, i64 0, i32 1
+  store i64 1024, i64* %Insize, align 8, !tbaa !12
+  %output = getelementptr inbounds %struct.Root, %struct.Root* %RootArgs, i64 0, i32 2
+  %call1 = tail call noalias i8* @malloc(i64 1024) #6
+  %2 = bitcast i32** %output to i8**
+  store i8* %call1, i8** %2, align 8, !tbaa !13
+  %Outsize = getelementptr inbounds %struct.Root, %struct.Root* %RootArgs, i64 0, i32 3
+  store i64 1024, i64* %Outsize, align 8, !tbaa !14
+  %3 = bitcast %struct.Root* %RootArgs to i8*
+  %graphID = call i8* @llvm.hpvm.launch(i8* bitcast (%emptyStruct.2 (i32*, i64, i32*, i64)* @PipeRoot_cloned to i8*), i8* %3, i1 false)
+  call void @llvm.hpvm.wait(i8* %graphID)
+  call void @llvm.lifetime.end.p0i8(i64 32, i8* nonnull %0) #6
+  ret void
+}
+
+; Function Attrs: nofree nounwind
+declare dso_local noalias i8* @malloc(i64) local_unnamed_addr #3
+
+declare dso_local i8* @__hpvm__launch(i32, ...) local_unnamed_addr #0
+
+declare dso_local void @__hpvm__wait(i8*) local_unnamed_addr #0
+
+; CHECK-LABEL: @main(
+; CHECK: call i8* @llvm_hpvm_ocl_initContext(i32
+; CHECK: call i8* @llvm_hpvm_ocl_launch(i8*
+; CHECK: call void @llvm.hpvm.init(
+; CHECK: call void @llvm_hpvm_ocl_clearContext(i8*
+; CHECK: call void @llvm.hpvm.cleanup(
+
+; CHECK-LABEL: @Func1_cloned(
+; CHECK: br i1 %cmp, label %if.end, label %if.then
+; CHECK-LABEL: if.then:
+; CHECK-LABEL: if.end:
+
+; CHECK-LABEL: @Func2_cloned.2_cloned_cloned_cloned_cloned_cloned_cloned
+; CHECK-NOT: call void @llvm_hpvm_ocl_argument_shared(i8*
+; CHECK: call i8* @llvm_hpvm_ocl_argument_ptr(i8*
+; CHECK: call void @llvm_hpvm_ocl_argument_scalar(i8*
+; CHECK-NOT: call void @llvm_hpvm_ocl_argument_shared(i8*
+; CHECK: call i8* @llvm_hpvm_ocl_argument_ptr(i8*
+; CHECK: call void @llvm_hpvm_ocl_argument_scalar(i8*
+; CHECK: call i8* @llvm_hpvm_ocl_executeNode(i8*
+; CHECK-NEXT: call void @llvm_hpvm_ocl_wait(i8*
+; CHECK-NEXT: call void @llvm_hpvm_ocl_free(i8*
+; CHECK-NEXT: call void @llvm_hpvm_ocl_free(i8*
+
+
+
+define dso_local i32 @main() local_unnamed_addr #4 {
+entry:
+  call void @llvm.hpvm.init()
+  tail call void @Launch()
+  call void @llvm.hpvm.cleanup()
+  ret i32 0
+}
+
+declare dso_local void @__hpvm__init(...) local_unnamed_addr #0
+
+declare dso_local void @__hpvm__cleanup(...) local_unnamed_addr #0
+
+declare i8* @llvm_hpvm_initializeTimerSet()
+
+declare void @llvm_hpvm_switchToTimer(i8**, i32)
+
+declare void @llvm_hpvm_printTimerSet(i8**, i8*)
+
+; Function Attrs: nounwind readnone
+declare i8* @llvm.hpvm.getNode() #5
+
+; Function Attrs: nounwind readnone
+declare i8* @llvm.hpvm.getParentNode(i8*) #5
+
+; Function Attrs: nounwind readnone
+declare i64 @llvm.hpvm.getNodeInstanceID.x(i8*) #5
+
+; Function Attrs: nounwind readnone
+declare i64 @llvm.hpvm.getNodeInstanceID.y(i8*) #5
+
+; Function Attrs: nounwind readnone
+declare i64 @llvm.hpvm.getNumNodeInstances.x(i8*) #5
+
+; Function Attrs: nounwind readnone
+declare i64 @llvm.hpvm.getNumNodeInstances.y(i8*) #5
+
+; Function Attrs: nounwind
+declare i8* @llvm.hpvm.createNode2D(i8*, i64, i64) #6
+
+; Function Attrs: nounwind uwtable
+define dso_local %emptyStruct @Func1_cloned(i32* in %In, i64 %Insize, i32* in out %Out, i64 %Outsize) #4 {
+entry:
+  %call4 = call i8* @llvm.hpvm.getNode()
+  %call15 = call i8* @llvm.hpvm.getParentNode(i8* %call4)
+  %call26 = call i64 @llvm.hpvm.getNodeInstanceID.x(i8* %call4)
+  %conv = trunc i64 %call26 to i32
+  %call37 = call i64 @llvm.hpvm.getNodeInstanceID.y(i8* %call4)
+  %conv4 = trunc i64 %call37 to i32
+  %call58 = call i64 @llvm.hpvm.getNodeInstanceID.x(i8* %call15)
+  %conv6 = trunc i64 %call58 to i32
+  %call79 = call i64 @llvm.hpvm.getNodeInstanceID.y(i8* %call15)
+  %conv8 = trunc i64 %call79 to i32
+  %call910 = call i64 @llvm.hpvm.getNumNodeInstances.x(i8* %call4)
+  %conv10 = trunc i64 %call910 to i32
+  %call1111 = call i64 @llvm.hpvm.getNumNodeInstances.y(i8* %call4)
+  %conv12 = trunc i64 %call1111 to i32
+  %mul = mul nsw i32 %conv10, %conv6
+  %add = add nsw i32 %mul, %conv
+  %mul13 = mul nsw i32 %conv12, %conv8
+  %add14 = add nsw i32 %mul13, %conv4
+  %cmp = icmp eq i32 %add, %add14
+  br i1 %cmp, label %if.end, label %if.then
+
+if.then:                                          ; preds = %entry
+  %arrayidx = getelementptr inbounds i32, i32* %In, i64 3
+  %0 = load i32, i32* %arrayidx, align 4, !tbaa !15
+  %idxprom = sext i32 %add to i64
+  %arrayidx16 = getelementptr inbounds i32, i32* %Out, i64 %idxprom
+  %1 = load i32, i32* %arrayidx16, align 4, !tbaa !15
+  %add17 = add nsw i32 %1, %0
+  store i32 %add17, i32* %arrayidx16, align 4, !tbaa !15
+  br label %if.end
+
+if.end:                                           ; preds = %if.then, %entry
+  ret %emptyStruct undef
+}
+
+; Function Attrs: nounwind
+declare void @llvm.hpvm.bind.input(i8*, i32, i32, i1) #6
+
+; Function Attrs: nounwind uwtable
+define dso_local %emptyStruct.0 @Func3_cloned(i32* in %In, i64 %Insize, i32* in out %Out, i64 %Outsize) #4 {
+entry:
+  %Func1_cloned.node = call i8* @llvm.hpvm.createNode2D(i8* bitcast (%emptyStruct (i32*, i64, i32*, i64)* @Func1_cloned to i8*), i64 3, i64 5)
+  call void @llvm.hpvm.bind.input(i8* %Func1_cloned.node, i32 0, i32 0, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func1_cloned.node, i32 1, i32 1, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func1_cloned.node, i32 2, i32 2, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func1_cloned.node, i32 3, i32 3, i1 false)
+  ret %emptyStruct.0 undef
+}
+
+; Function Attrs: nounwind
+declare i8* @llvm.hpvm.createNode(i8*) #6
+
+; Function Attrs: nounwind uwtable
+define dso_local %emptyStruct.1 @Func2_cloned(i32* in %In, i64 %Insize, i32* in out %Out, i64 %Outsize) #4 {
+entry:
+  %Func3_cloned.node = call i8* @llvm.hpvm.createNode2D(i8* bitcast (%emptyStruct.0 (i32*, i64, i32*, i64)* @Func3_cloned to i8*), i64 3, i64 5)
+  call void @llvm.hpvm.bind.input(i8* %Func3_cloned.node, i32 0, i32 0, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func3_cloned.node, i32 1, i32 1, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func3_cloned.node, i32 2, i32 2, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func3_cloned.node, i32 3, i32 3, i1 false)
+  ret %emptyStruct.1 undef
+}
+
+; Function Attrs: nounwind
+declare i8* @llvm.hpvm.launch(i8*, i8*, i1) #6
+
+; Function Attrs: nounwind uwtable
+define dso_local %emptyStruct.2 @PipeRoot_cloned(i32* in %In, i64 %Insize, i32* in out %Out, i64 %Outsize) #4 {
+entry:
+  %Func2_cloned.node = call i8* @llvm.hpvm.createNode(i8* bitcast (%emptyStruct.1 (i32*, i64, i32*, i64)* @Func2_cloned to i8*))
+  call void @llvm.hpvm.bind.input(i8* %Func2_cloned.node, i32 0, i32 0, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func2_cloned.node, i32 1, i32 1, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func2_cloned.node, i32 2, i32 2, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func2_cloned.node, i32 3, i32 3, i1 false)
+  ret %emptyStruct.2 undef
+}
+
+; Function Attrs: nounwind
+declare void @llvm.hpvm.wait(i8*) #6
+
+; Function Attrs: nounwind
+declare void @llvm.hpvm.init() #6
+
+; Function Attrs: nounwind
+declare void @llvm.hpvm.cleanup() #6
+
+attributes #0 = { "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="true" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="true" "use-soft-float"="false" }
+attributes #1 = { argmemonly nounwind }
+attributes #2 = { noinline nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-frame-pointer-elim"="false" "no-infs-fp-math"="true" "no-jump-tables"="false" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="true" "use-soft-float"="false" }
+attributes #3 = { nofree nounwind "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="true" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="true" "use-soft-float"="false" }
+attributes #4 = { nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-frame-pointer-elim"="false" "no-infs-fp-math"="true" "no-jump-tables"="false" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="true" "use-soft-float"="false" }
+attributes #5 = { nounwind readnone }
+attributes #6 = { nounwind }
+
+!llvm.module.flags = !{!0}
+!llvm.ident = !{!1}
+!hpvm_hint_gpu = !{!2}
+!hpvm_hint_cpu = !{!3, !4, !5}
+!hpvm_hint_spir = !{}
+!hpvm_hint_cudnn = !{}
+!hpvm_hint_promise = !{}
+!hpvm_hint_cpu_gpu = !{}
+!hpvm_hint_cpu_spir = !{}
+
+!0 = !{i32 1, !"wchar_size", i32 4}
+!1 = !{!"clang version 9.0.0 (https://gitlab.engr.illinois.edu/llvm/hpvm.git 3551132592a00cab6c966df508ab511598269f78)"}
+!2 = !{%emptyStruct (i32*, i64, i32*, i64)* @Func1_cloned}
+!3 = !{%emptyStruct.0 (i32*, i64, i32*, i64)* @Func3_cloned}
+!4 = !{%emptyStruct.1 (i32*, i64, i32*, i64)* @Func2_cloned}
+!5 = !{%emptyStruct.2 (i32*, i64, i32*, i64)* @PipeRoot_cloned}
+!6 = !{!7, !8, i64 0}
+!7 = !{!"Root", !8, i64 0, !11, i64 8, !8, i64 16, !11, i64 24}
+!8 = !{!"any pointer", !9, i64 0}
+!9 = !{!"omnipotent char", !10, i64 0}
+!10 = !{!"Simple C/C++ TBAA"}
+!11 = !{!"long", !9, i64 0}
+!12 = !{!7, !11, i64 8}
+!13 = !{!7, !8, i64 16}
+!14 = !{!7, !11, i64 24}
+!15 = !{!16, !16, i64 0}
+!16 = !{!"int", !9, i64 0}
diff --git a/hpvm/test/regressionTests/DFG2LLVM_OPENCL/ThreeLevel.cond.ll b/hpvm/test/regressionTests/DFG2LLVM_OPENCL/ThreeLevel.cond.ll
new file mode 100644
index 0000000000000000000000000000000000000000..1b1c589a4b0bdd9e1893d4fe51b37a03f6ff95b7
--- /dev/null
+++ b/hpvm/test/regressionTests/DFG2LLVM_OPENCL/ThreeLevel.cond.ll
@@ -0,0 +1,258 @@
+; RUN: opt -load LLVMBuildDFG.so -load LLVMLocalMem.so -load LLVMDFG2LLVM_OpenCL.so -S -localmem -dfg2llvm-opencl <  %s | FileCheck %s
+; ModuleID = 'ThreeLevel.cond.ll'
+source_filename = "ThreeLevel.cond.c"
+target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+%struct.Root = type { i32*, i64, i32*, i64 }
+%emptyStruct = type <{}>
+%emptyStruct.0 = type <{}>
+%emptyStruct.1 = type <{}>
+%emptyStruct.2 = type <{}>
+
+declare dso_local void @__hpvm__hint(i32) local_unnamed_addr #0
+
+declare dso_local void @__hpvm__attributes(i32, ...) local_unnamed_addr #0
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.lifetime.start.p0i8(i64 immarg, i8* nocapture) #1
+
+declare dso_local i8* @__hpvm__getNode(...) local_unnamed_addr #0
+
+declare dso_local i8* @__hpvm__getParentNode(i8*) local_unnamed_addr #0
+
+declare dso_local i64 @__hpvm__getNodeInstanceID_x(i8*) local_unnamed_addr #0
+
+declare dso_local i64 @__hpvm__getNodeInstanceID_y(i8*) local_unnamed_addr #0
+
+declare dso_local i64 @__hpvm__getNumNodeInstances_x(i8*) local_unnamed_addr #0
+
+declare dso_local i64 @__hpvm__getNumNodeInstances_y(i8*) local_unnamed_addr #0
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.lifetime.end.p0i8(i64 immarg, i8* nocapture) #1
+
+declare dso_local i8* @__hpvm__createNodeND(i32, ...) local_unnamed_addr #0
+
+declare dso_local void @__hpvm__bindIn(i8*, i32, i32, i32) local_unnamed_addr #0
+
+; Function Attrs: noinline nounwind uwtable
+define dso_local void @Launch() local_unnamed_addr #2 {
+entry:
+  %RootArgs = alloca %struct.Root, align 8
+  %0 = bitcast %struct.Root* %RootArgs to i8*
+  call void @llvm.lifetime.start.p0i8(i64 32, i8* nonnull %0) #6
+  %call = tail call noalias i8* @malloc(i64 1024) #6
+  %1 = bitcast %struct.Root* %RootArgs to i8**
+  store i8* %call, i8** %1, align 8, !tbaa !6
+  %Insize = getelementptr inbounds %struct.Root, %struct.Root* %RootArgs, i64 0, i32 1
+  store i64 1024, i64* %Insize, align 8, !tbaa !12
+  %output = getelementptr inbounds %struct.Root, %struct.Root* %RootArgs, i64 0, i32 2
+  %call1 = tail call noalias i8* @malloc(i64 1024) #6
+  %2 = bitcast i32** %output to i8**
+  store i8* %call1, i8** %2, align 8, !tbaa !13
+  %Outsize = getelementptr inbounds %struct.Root, %struct.Root* %RootArgs, i64 0, i32 3
+  store i64 1024, i64* %Outsize, align 8, !tbaa !14
+  %3 = bitcast %struct.Root* %RootArgs to i8*
+  %graphID = call i8* @llvm.hpvm.launch(i8* bitcast (%emptyStruct.2 (i32*, i64, i32*, i64)* @PipeRoot_cloned to i8*), i8* %3, i1 false)
+  call void @llvm.hpvm.wait(i8* %graphID)
+  call void @llvm.lifetime.end.p0i8(i64 32, i8* nonnull %0) #6
+  ret void
+}
+
+; Function Attrs: nofree nounwind
+declare dso_local noalias i8* @malloc(i64) local_unnamed_addr #3
+
+declare dso_local i8* @__hpvm__launch(i32, ...) local_unnamed_addr #0
+
+declare dso_local void @__hpvm__wait(i8*) local_unnamed_addr #0
+
+; CHECK-LABEL: @main(
+; CHECK: call i8* @llvm_hpvm_ocl_initContext(i32
+; CHECK: call i8* @llvm_hpvm_ocl_launch(i8*
+; CHECK: call void @llvm.hpvm.init(
+; CHECK: call void @llvm_hpvm_ocl_clearContext(i8*
+; CHECK: call void @llvm.hpvm.cleanup(
+
+; CHECK-LABEL: @Func1_cloned(
+; CHECK: br i1 %cmp, label %if.end, label %if.then
+; CHECK-LABEL: if.then:
+; CHECK-LABEL: if.end:
+
+; CHECK-LABEL: @Func2_cloned.2_cloned_cloned_cloned_cloned_cloned_cloned
+; CHECK-NOT: call void @llvm_hpvm_ocl_argument_shared(i8*
+; CHECK: call i8* @llvm_hpvm_ocl_argument_ptr(i8*
+; CHECK: call void @llvm_hpvm_ocl_argument_scalar(i8*
+; CHECK-NOT: call void @llvm_hpvm_ocl_argument_shared(i8*
+; CHECK: call i8* @llvm_hpvm_ocl_argument_ptr(i8*
+; CHECK: call void @llvm_hpvm_ocl_argument_scalar(i8*
+; CHECK: call i8* @llvm_hpvm_ocl_executeNode(i8*
+; CHECK-NEXT: call void @llvm_hpvm_ocl_wait(i8*
+; CHECK-NEXT: call void @llvm_hpvm_ocl_free(i8*
+; CHECK-NEXT: call void @llvm_hpvm_ocl_free(i8*
+
+; Function Attrs: nounwind uwtable
+define dso_local i32 @main() local_unnamed_addr #4 {
+entry:
+  call void @llvm.hpvm.init()
+  tail call void @Launch()
+  call void @llvm.hpvm.cleanup()
+  ret i32 0
+}
+
+declare dso_local void @__hpvm__init(...) local_unnamed_addr #0
+
+declare dso_local void @__hpvm__cleanup(...) local_unnamed_addr #0
+
+declare i8* @llvm_hpvm_initializeTimerSet()
+
+declare void @llvm_hpvm_switchToTimer(i8**, i32)
+
+declare void @llvm_hpvm_printTimerSet(i8**, i8*)
+
+; Function Attrs: nounwind readnone
+declare i8* @llvm.hpvm.getNode() #5
+
+; Function Attrs: nounwind readnone
+declare i8* @llvm.hpvm.getParentNode(i8*) #5
+
+; Function Attrs: nounwind readnone
+declare i64 @llvm.hpvm.getNodeInstanceID.x(i8*) #5
+
+; Function Attrs: nounwind readnone
+declare i64 @llvm.hpvm.getNodeInstanceID.y(i8*) #5
+
+; Function Attrs: nounwind readnone
+declare i64 @llvm.hpvm.getNumNodeInstances.x(i8*) #5
+
+; Function Attrs: nounwind readnone
+declare i64 @llvm.hpvm.getNumNodeInstances.y(i8*) #5
+
+; Function Attrs: nounwind
+declare i8* @llvm.hpvm.createNode2D(i8*, i64, i64) #6
+
+; Function Attrs: nounwind uwtable
+define dso_local %emptyStruct @Func1_cloned(i32* in %In, i64 %Insize, i32* in out %Out, i64 %Outsize) #4 {
+entry:
+  %call4 = call i8* @llvm.hpvm.getNode()
+  %call15 = call i8* @llvm.hpvm.getParentNode(i8* %call4)
+  %call26 = call i64 @llvm.hpvm.getNodeInstanceID.x(i8* %call4)
+  %conv = trunc i64 %call26 to i32
+  %call37 = call i64 @llvm.hpvm.getNodeInstanceID.y(i8* %call4)
+  %conv4 = trunc i64 %call37 to i32
+  %call58 = call i64 @llvm.hpvm.getNodeInstanceID.x(i8* %call15)
+  %conv6 = trunc i64 %call58 to i32
+  %call79 = call i64 @llvm.hpvm.getNodeInstanceID.y(i8* %call15)
+  %conv8 = trunc i64 %call79 to i32
+  %call910 = call i64 @llvm.hpvm.getNumNodeInstances.x(i8* %call4)
+  %conv10 = trunc i64 %call910 to i32
+  %call1111 = call i64 @llvm.hpvm.getNumNodeInstances.y(i8* %call4)
+  %conv12 = trunc i64 %call1111 to i32
+  %mul = mul nsw i32 %conv10, %conv6
+  %add = add nsw i32 %mul, %conv
+  %mul13 = mul nsw i32 %conv12, %conv8
+  %add14 = add nsw i32 %mul13, %conv4
+  %cmp = icmp eq i32 %add, %add14
+  br i1 %cmp, label %if.end, label %if.then
+
+if.then:                                          ; preds = %entry
+  %idxprom = sext i32 %add14 to i64
+  %arrayidx = getelementptr inbounds i32, i32* %In, i64 %idxprom
+  %0 = load i32, i32* %arrayidx, align 4, !tbaa !15
+  %idxprom16 = sext i32 %add to i64
+  %arrayidx17 = getelementptr inbounds i32, i32* %Out, i64 %idxprom16
+  %1 = load i32, i32* %arrayidx17, align 4, !tbaa !15
+  %add18 = add nsw i32 %1, %0
+  store i32 %add18, i32* %arrayidx17, align 4, !tbaa !15
+  br label %if.end
+
+if.end:                                           ; preds = %if.then, %entry
+  ret %emptyStruct undef
+}
+
+; Function Attrs: nounwind
+declare void @llvm.hpvm.bind.input(i8*, i32, i32, i1) #6
+
+; Function Attrs: nounwind uwtable
+define dso_local %emptyStruct.0 @Func3_cloned(i32* in %In, i64 %Insize, i32* in out %Out, i64 %Outsize) #4 {
+entry:
+  %Func1_cloned.node = call i8* @llvm.hpvm.createNode2D(i8* bitcast (%emptyStruct (i32*, i64, i32*, i64)* @Func1_cloned to i8*), i64 3, i64 5)
+  call void @llvm.hpvm.bind.input(i8* %Func1_cloned.node, i32 0, i32 0, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func1_cloned.node, i32 1, i32 1, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func1_cloned.node, i32 2, i32 2, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func1_cloned.node, i32 3, i32 3, i1 false)
+  ret %emptyStruct.0 undef
+}
+
+; Function Attrs: nounwind
+declare i8* @llvm.hpvm.createNode(i8*) #6
+
+; Function Attrs: nounwind uwtable
+define dso_local %emptyStruct.1 @Func2_cloned(i32* in %In, i64 %Insize, i32* in out %Out, i64 %Outsize) #4 {
+entry:
+  %Func3_cloned.node = call i8* @llvm.hpvm.createNode2D(i8* bitcast (%emptyStruct.0 (i32*, i64, i32*, i64)* @Func3_cloned to i8*), i64 3, i64 5)
+  call void @llvm.hpvm.bind.input(i8* %Func3_cloned.node, i32 0, i32 0, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func3_cloned.node, i32 1, i32 1, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func3_cloned.node, i32 2, i32 2, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func3_cloned.node, i32 3, i32 3, i1 false)
+  ret %emptyStruct.1 undef
+}
+
+; Function Attrs: nounwind
+declare i8* @llvm.hpvm.launch(i8*, i8*, i1) #6
+
+; Function Attrs: nounwind uwtable
+define dso_local %emptyStruct.2 @PipeRoot_cloned(i32* in %In, i64 %Insize, i32* in out %Out, i64 %Outsize) #4 {
+entry:
+  %Func2_cloned.node = call i8* @llvm.hpvm.createNode(i8* bitcast (%emptyStruct.1 (i32*, i64, i32*, i64)* @Func2_cloned to i8*))
+  call void @llvm.hpvm.bind.input(i8* %Func2_cloned.node, i32 0, i32 0, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func2_cloned.node, i32 1, i32 1, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func2_cloned.node, i32 2, i32 2, i1 false)
+  call void @llvm.hpvm.bind.input(i8* %Func2_cloned.node, i32 3, i32 3, i1 false)
+  ret %emptyStruct.2 undef
+}
+
+; Function Attrs: nounwind
+declare void @llvm.hpvm.wait(i8*) #6
+
+; Function Attrs: nounwind
+declare void @llvm.hpvm.init() #6
+
+; Function Attrs: nounwind
+declare void @llvm.hpvm.cleanup() #6
+
+attributes #0 = { "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
+attributes #1 = { argmemonly nounwind }
+attributes #2 = { noinline nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
+attributes #3 = { nofree nounwind "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
+attributes #4 = { nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
+attributes #5 = { nounwind readnone }
+attributes #6 = { nounwind }
+
+!llvm.module.flags = !{!0}
+!llvm.ident = !{!1}
+!hpvm_hint_gpu = !{!2}
+!hpvm_hint_cpu = !{!3, !4, !5}
+!hpvm_hint_spir = !{}
+!hpvm_hint_cudnn = !{}
+!hpvm_hint_promise = !{}
+!hpvm_hint_cpu_gpu = !{}
+!hpvm_hint_cpu_spir = !{}
+
+!0 = !{i32 1, !"wchar_size", i32 4}
+!1 = !{!"clang version 9.0.0 (https://gitlab.engr.illinois.edu/llvm/hpvm.git 3551132592a00cab6c966df508ab511598269f78)"}
+!2 = !{%emptyStruct (i32*, i64, i32*, i64)* @Func1_cloned}
+!3 = !{%emptyStruct.0 (i32*, i64, i32*, i64)* @Func3_cloned}
+!4 = !{%emptyStruct.1 (i32*, i64, i32*, i64)* @Func2_cloned}
+!5 = !{%emptyStruct.2 (i32*, i64, i32*, i64)* @PipeRoot_cloned}
+!6 = !{!7, !8, i64 0}
+!7 = !{!"Root", !8, i64 0, !11, i64 8, !8, i64 16, !11, i64 24}
+!8 = !{!"any pointer", !9, i64 0}
+!9 = !{!"omnipotent char", !10, i64 0}
+!10 = !{!"Simple C/C++ TBAA"}
+!11 = !{!"long", !9, i64 0}
+!12 = !{!7, !11, i64 8}
+!13 = !{!7, !8, i64 16}
+!14 = !{!7, !11, i64 24}
+!15 = !{!16, !16, i64 0}
+!16 = !{!"int", !9, i64 0}
diff --git a/hpvm/test/regressionTests/DFG2LLVM_NVPTX/ThreeLevel.constmem.ll b/hpvm/test/regressionTests/DFG2LLVM_OPENCL/ThreeLevel.constmem.ll
similarity index 99%
rename from hpvm/test/regressionTests/DFG2LLVM_NVPTX/ThreeLevel.constmem.ll
rename to hpvm/test/regressionTests/DFG2LLVM_OPENCL/ThreeLevel.constmem.ll
index 060608fdc5ae28ff52382fd722e7288c5531874f..5de9fd4e33adf8bfa819c8d4c2eb2a52f7a3a15c 100644
--- a/hpvm/test/regressionTests/DFG2LLVM_NVPTX/ThreeLevel.constmem.ll
+++ b/hpvm/test/regressionTests/DFG2LLVM_OPENCL/ThreeLevel.constmem.ll
@@ -1,4 +1,4 @@
-; RUN: opt -load LLVMBuildDFG.so -load LLVMLocalMem.so -load LLVMDFG2LLVM_NVPTX.so -S -localmem -dfg2llvm-nvptx <  %s | FileCheck %s
+; RUN: opt -load LLVMBuildDFG.so -load LLVMLocalMem.so -load LLVMDFG2LLVM_OpenCL.so -S -localmem -dfg2llvm-opencl <  %s | FileCheck %s
 ; ModuleID = 'ThreeLevel.opt.ll'
 source_filename = "ThreeLevel.opt.c"
 target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
diff --git a/hpvm/test/regressionTests/DFG2LLVM_NVPTX/ThreeLevel.ll b/hpvm/test/regressionTests/DFG2LLVM_OPENCL/ThreeLevel.ll
similarity index 99%
rename from hpvm/test/regressionTests/DFG2LLVM_NVPTX/ThreeLevel.ll
rename to hpvm/test/regressionTests/DFG2LLVM_OPENCL/ThreeLevel.ll
index ed99bee9f704b3dff96abcbd50982ec64a38c2d5..9af2e48674ae0867e74eb46da2a399d3e1d6df44 100644
--- a/hpvm/test/regressionTests/DFG2LLVM_NVPTX/ThreeLevel.ll
+++ b/hpvm/test/regressionTests/DFG2LLVM_OPENCL/ThreeLevel.ll
@@ -1,4 +1,4 @@
-; RUN: opt -load LLVMBuildDFG.so -load LLVMLocalMem.so -load LLVMDFG2LLVM_NVPTX.so -S -localmem -dfg2llvm-nvptx <  %s | FileCheck %s
+; RUN: opt -load LLVMBuildDFG.so -load LLVMLocalMem.so -load LLVMDFG2LLVM_OpenCL.so -S -localmem -dfg2llvm-opencl <  %s | FileCheck %s
 ; ModuleID = 'ThreeLevel.ll'
 source_filename = "ThreeLevel.c"
 target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"