diff --git a/hpvm/docs/developerdocs/backend-passes.rst b/hpvm/docs/developerdocs/backend-passes.rst
index 3d975dfe1f2e44dd974d42ab513512dd5e139f85..d8e2c2bffa2eaee025e64187311050e6bcb1c78e 100644
--- a/hpvm/docs/developerdocs/backend-passes.rst
+++ b/hpvm/docs/developerdocs/backend-passes.rst
@@ -1,15 +1,16 @@
 HPVM Backend Passes
 ====================
-HPVM and ApproxHPVM support numerous backend targets for code generation. The transformations to target these devices utilise a common utility class `CodeGenTraversal`, where each node in the HPVM Dataflow graph in a bottom up  topological ordering. For many target backends, some different logic is required when processing Internal Nodes from Leaf Nodes. As such the `CodeGenTraversal` utility class provides 2 virtual functions, namely `codeGen(DFInternalNode* )` and `codeGen(DFLeafNode* )` to generate target specific LLVM IR for each node function as well as adding HPVM runtime calls where needed.  
+
+HPVM includes multiple backend targets for code generation. The transformations targeting these devices share a common utility class `CodeGenTraversal`, which visits each node in the HPVM Dataflow Graph in a bottom-up topological ordering. For many target backends, Internal Nodes require different handling than Leaf Nodes, so the `CodeGenTraversal` utility class provides two virtual functions, `codeGen(DFInternalNode* )` and `codeGen(DFLeafNode* )`, to generate target-specific LLVM IR for each node function and to insert HPVM runtime calls where needed.
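+
+As a rough sketch, a backend pass hooks into this traversal by subclassing `CodeGenTraversal` and overriding the two `codeGen` methods. The sketch below omits the HPVM headers and namespaces this needs, and the constructor arguments shown are only indicative of the real helper class:
+
+.. code-block:: cpp
+
+  // Simplified sketch of a backend pass; not the full interface.
+  class CGT_MyTarget : public CodeGenTraversal {
+  public:
+    CGT_MyTarget(Module &M, BuildDFG &DFG) : CodeGenTraversal(M, DFG) {}
+
+    // Invoked for every internal node, children first (bottom-up order).
+    void codeGen(DFInternalNode *N) override {
+      // e.g. generate host-side code that wires up the child subgraph
+    }
+
+    // Invoked for every leaf node, where the actual computation lives.
+    void codeGen(DFLeafNode *N) override {
+      // e.g. clone the node function, lower it to target-specific LLVM IR,
+      // and insert HPVM runtime calls where needed
+    }
+  };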
 
 DFG2LLVM_CPU
 ^^^^^^^^^^^^
 Description
 -----------
 
-The CPU backend plays a vital role in HPVM. It firstly enables targeting devices where only CPUs are available. Additionally, it is used in various ApproxHPVM backends for code generation and dataflow between HPVM Nodes. Finally, it is used to generate the launch code that launches the DFG.
+The CPU backend plays an important role in HPVM code generation. It enables targeting devices that only have a CPU. It is also used in the compilation pipelines of the other backends: for instance, it inserts the runtime calls for data movement and for launching compute kernels.
 
-For Leaf Nodes that are targeting the CPU, this pass generates sequential code that uses loops to represent the dynamic replication of those nodes.. We are currently working on another CPU backend where multi-dimensional nodes can be executed in parallel instead of sequentially.
+For Leaf Nodes targeting the CPU, this backend generates sequential code that uses loops to represent the dynamic replication of those nodes. We are currently working on an extension to this backend where multi-dimensional nodes can be executed in parallel.
 
 codeGen(DFLeafNode* )
 ----------------------
@@ -39,7 +40,7 @@ Consider a 3 dimensional Leaf node function with the following body:
 The above example will illustrate the steps in transforming the leaf node in this pass. 
 
 
-1. A clone of the function is made, with 6 additional arguments added to its argument list which will correspond to the current dimension indices and dimension sizes along the x,y,z axes. Any node querying intrinsics on the current node will be updated to refer to these newly added arguments instead, and these calls will be removed.
+1. A clone of the function is made, with 6 additional arguments appended to its argument list; these correspond to the current dimension indices and the dimension sizes along the x, y, and z axes. Any intrinsics querying the current node are updated to refer to these newly added arguments instead, and those intrinsic calls are then removed.
 
 .. code-block:: c
 
@@ -92,7 +93,7 @@ The above example will illustrate the steps in transforming the leaf node in thi
 
   }
 
-3. Later on, when the parent (internal) node for this leaf node will process, it will create nested for-loops over each of the dimensions. Inside the loop body, the leaf node function will be called with the appropriate arguments.
+3. Later on, when the parent (internal) node of this leaf node is processed, it creates nested for-loops over each of the dimensions. Inside the loop body, the leaf node function is called with the appropriate arguments.
 
 If the original call was:
 
@@ -115,11 +116,11 @@ codeGen(DFInternalNode* )
 
-For Internal nodes in the CPU backends, the CPU version of that node is only generated if all its children nodes have CPU as it’s target device. 
+For Internal Nodes in the CPU backend, the CPU version of a node is only generated if all of its child nodes have CPU as their target device.
 
-1. First we check if the immediate child nodes of the current internal node are also CPU.
+1. First, it checks if the immediate child nodes of the current internal node are also targeted for CPU.
 
-2. If so, then we create a cloned version of the internal node function. This node function will have 6 additional arguments added to it, similar to leaf node case.
+2. If so, it creates a cloned version of the internal node function, with 6 additional arguments added to it, similar to the leaf node case.
 
-3. As stated in the leaf node case, the internal node is responsible for converting each of the multi-dimensional child nodes it has into for loops along each axis.
+3. As in the leaf node case, the internal node is responsible for converting each of its multi-dimensional child nodes into for-loops along each axis (see the sketch after this list).
 
-4. Each Internal Node has an entry node for the Child Subgraph as well as an exit node for the child subgraph. The exit node return values are appropriately updated to reflect the newly cloned nodes.
+4. Each Internal Node has an entry node and an exit node for its child subgraph. The return values of the exit node are updated appropriately to reflect the newly cloned nodes.
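+
+Putting these steps together, the CPU code generated around a 3-dimensional child node has roughly the following shape (a C-style sketch of the emitted LLVM IR; the function names and argument lists here are illustrative, not the pass's actual output):
+
+.. code-block:: cpp
+
+  #include <cstddef>
+
+  // Cloned leaf function: the original arguments followed by the six extra
+  // index/limit arguments added in the leaf node case.
+  void leaf_clone(void *in, size_t in_bytes,
+                  long idx_x, long idx_y, long idx_z,
+                  long dim_x, long dim_y, long dim_z);
+
+  // Loop nest generated in the cloned internal node function for a child
+  // node created with dimensions (dim_x, dim_y, dim_z): one loop per axis,
+  // passing the loop indices and the dimension sizes as the extra arguments.
+  void call_child(void *in, size_t in_bytes,
+                  long dim_x, long dim_y, long dim_z) {
+    for (long z = 0; z < dim_z; ++z)
+      for (long y = 0; y < dim_y; ++y)
+        for (long x = 0; x < dim_x; ++x)
+          leaf_clone(in, in_bytes, x, y, z, dim_x, dim_y, dim_z);
+  }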
 
@@ -127,19 +128,19 @@ Depending on whether the rootNode’s subgraph is streaming or non-streaming, th
 
 * For non-streaming launches:
 
-1.  a wrapper root node function is created which takes an opaque pointer to the packed struct and returns an i8*. 
+1. A wrapper root node function is created which takes an opaque pointer to the packed struct and returns an i8*. 
 
 2. Inside the wrapper root node function, the actual root node function is invoked by extracting the correct arguments from the input struct.
 
-3. Similarly the output struct (if the node returns) is the last element of the input struct and appropriate store instructions are generated for that. The uses of the original launch function are also updated to use this wrapper root function.
+3. Similarly, the output struct (if the node returns a value) is the last element of the input struct, and the appropriate store instructions are generated for it. The uses of the original launch function are also updated to use the new wrapper root function, which is sketched below.
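+
+Conceptually, the generated wrapper behaves like the sketch below. The pass emits LLVM IR directly; the struct layout, field names, and the value returned here are illustrative assumptions, not the exact generated code:
+
+.. code-block:: cpp
+
+  #include <cstddef>
+
+  struct RootRet { void *tensor; size_t bytes; };  // the node's return struct
+
+  // Packed argument struct built at the launch site; the output struct
+  // (if the node returns one) is its last element.
+  struct RootArgs {
+    void   *input;
+    size_t  input_bytes;
+    RootRet ret;
+  };
+
+  // The root node function generated earlier by this pass.
+  RootRet root_node(void *input, size_t input_bytes);
+
+  // Wrapper root function: takes an opaque pointer to the packed struct,
+  // unpacks the arguments, calls the root node function, and stores the
+  // result back into the last element of the struct.
+  extern "C" void *root_wrapper(void *opaque_args) {
+    RootArgs *args = static_cast<RootArgs *>(opaque_args);
+    args->ret = root_node(args->input, args->input_bytes);
+    return opaque_args;  // returned as an i8*; simplified here
+  }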
 
 * For streaming launches:
 
 1. Similar to the non-streaming case, a wrapper root node function is created which takes the packed struct of arguments and the graph handle identifier.
 
-2. For each incoming argument to that function, we first check if that edge is streaming. If so we extract that value from the struct and create a buffer for it in the newly created function.
+2. For each incoming argument to that function, it checks whether that edge is streaming. If so, it extracts the value from the struct and creates a buffer for it in the newly created function.
 
-3. For each type of edge in the child graph of the root node function, we create appropriate bindIn_buffer, bindOut_buffer, edge_buffer calls based on the extracted buffers. 
+3. For each type of edge in the child graph of the root node function, it creates the appropriate `bindIn_buffer`, `bindOut_buffer`, and `edge_buffer` calls based on the extracted buffers.
 
 4. An additional buffer is created which is used to indicate whether the following buffer input is the last input.
 
@@ -153,7 +154,8 @@ DFG2LLVM_CUDNN
 Description
 -----------
 
-This pass represents code generation for NVIDIA Backends using CuDNN library calls for tensor operations. DFG2LLVM_CUDNN is specifically utilized for the ApproxHPVM tensor intrinsics. On a high level, each ApproxHPVM intrinsic gets mapped to a particular CuDNN library api call. 
+This backend generates calls into the tensor runtime (`hpvm-tensor-rt`), which in turn invokes tensor operations from the cuDNN library.
+It converts tensor intrinsic calls (in most cases one-to-one) into the corresponding runtime calls.
 
 codeGen(DFLeafNode* )
 ----------------------
@@ -213,9 +215,9 @@ Consider the following leaf node function which performs a tensor convolution:
   __hpvm__return(2, r, (size_t)0);
   }
 
-4. Most definitions of the intrinsics arguments map almost identically to their CUDNN implementations. For some intrinsics, these are mapped to a single runtime call with different function arguments. For example, max pooling and mean pooling are separate intrinsics in ApproxHPVM, but they both get mapped to tensorPooling in CUDNN, with an integer specifying the type of pooling. 
+4. The arguments of most intrinsics map almost identically onto their cuDNN implementations. Some intrinsics, however, are mapped to a single runtime call and distinguished by an extra function argument. For example, max pooling and mean pooling are separate intrinsics in HPVM, but both get mapped to `tensorPooling`, with an integer argument specifying the type of pooling.
 
-5. The run-times for ApproxHPVM are currently separate from the HPVM runtime, and as such this pass inserts the initTensorRuntime before the HPVM runtime init call. Similarly, it inserts the tensor runtimes cleanup call before the HPVM runtime cleanup call. (Assertion that both runtime calls can only be used once in the entire module).
+5. The tensor runtime (`hpvm-tensor-rt`) and the HPVM runtime (`hpvm-rt`) are currently not integrated, so this pass inserts the `initTensorRuntime` call (which initializes `hpvm-tensor-rt`) before the `hpvm-rt` init call. Similarly, it inserts the tensor runtime's cleanup call before the HPVM runtime's cleanup call; this ordering is sketched below. (The pass asserts that each of these runtime calls is used only once in the entire module.)
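+
+Schematically, the resulting call ordering in the host code looks as follows. Apart from `initTensorRuntime`, the function names below are placeholders standing in for the actual `hpvm-rt` and `hpvm-tensor-rt` entry points:
+
+.. code-block:: cpp
+
+  // Placeholder declarations for the runtime entry points referenced below.
+  void initTensorRuntime();       // hpvm-tensor-rt init, inserted by this pass
+  void cleanupTensorRuntime();    // hpvm-tensor-rt cleanup (placeholder name)
+  void hpvm_rt_init();            // existing hpvm-rt init call (placeholder name)
+  void hpvm_rt_cleanup();         // existing hpvm-rt cleanup call (placeholder name)
+
+  void host_main() {
+    initTensorRuntime();          // inserted before the hpvm-rt init call
+    hpvm_rt_init();
+    // ... launch and wait on the dataflow graph ...
+    cleanupTensorRuntime();       // inserted before the hpvm-rt cleanup call
+    hpvm_rt_cleanup();
+  }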
 
 codeGen(DFInternalNode* )
 -------------------------
@@ -226,11 +228,14 @@ FuseHPVMTensorNodes
 ^^^^^^^^^^^^^^^^^^^
 Description
 -----------
-When a user writes ApproxHPVM code through the various frontends (e.g. Keras, C++, PyTorch), each tensor operation gets mapped to its own HPVM Dataflow (Leaf) Node, with appropriate HPVM edge bindings feeding the output of one layer into the next. The FuseHPVMTensorNodes pass combines specific patterns of tensor operations from multiple separate nodes into a single HPVM leaf node. 
+
+For users writing tensor code through our frontends (e.g. Keras, C++, PyTorch), each tensor operation is mapped to its own HPVM Dataflow (Leaf) Node, with appropriate HPVM edge bindings feeding the output of one layer into the next. The FuseHPVMTensorNodes pass combines specific patterns of tensor operations from multiple separate nodes into a single HPVM leaf node.
 
 codeGen(DFLeafNode* )
 ---------------------
-While the pass is generic, we only support TENSOR_TARGET nodes for fusion in this pass. Additionally each leaf node is first identified as being a valid HPVM tensor node (i.e. contains ApproxHPVM intrinsics as the first intrinsic). 
+
+While the pass is generic, only nodes hinted with `TENSOR_TARGET` (the hint marking HPVM nodes that contain tensor operations) are supported for fusion.
+Additionally, each leaf node is first checked to be a valid HPVM tensor node (i.e., its first intrinsic call is an HPVM tensor intrinsic).
 
 Consider the following consecutive leaf nodes:
 
@@ -283,14 +288,14 @@ The exhaustive list of patterns which are fused are:
 
-According to the list above, the nodes satisfy the pattern 1.
+According to the list above, these nodes satisfy pattern 1.
 
-2. We then check if each node in the pattern belongs to the same target. As we can see all nodes are labelled as `TENSOR`.
+2. It checks if each node in the pattern belongs to the same target. Note that all nodes above are labelled as `TENSOR_TARGET`.
 
 3. The pass collects these node handles into a fusion target as:
 
 * `conv_node -> add_node -> relu_node -> pool_max_node`
 
 
-4. Once the pass has collected a list of all fusion targets it goes on to fuse them iteratively. 
+4. Once the pass has collected the list of all fusion targets (sets of HPVM nodes to fuse), it fuses these iteratively. 
 
 5. Each pair of nodes is fused together into a single node and then reinserted into the beginning of the fusion target list. For example, first the `conv_node` and `add_node` will be fused creating `fused_node_1` and then the state of the list will be:
 
@@ -411,7 +416,7 @@ codeGen(DFLeafNode* )
   }
 
 
-2. For each of the patterns listed previously, a specific ‘wrapper’ function exists which when invoked by the runtime carries out all of the operations. For our pattern, the corresponding wrapper call is `wrapper_ConvLayer2`. The first argument to these wrapper functions is the name of the HPVM node from where it is invoked.
+2. For each of the patterns listed previously, a specific ‘wrapper’ function exists which, when invoked by the runtime, carries out all of the operations. For the pattern above (a convolution layer), the corresponding wrapper call is `wrapper_ConvLayer2`. The first argument to these wrapper functions is the ID of the HPVM node; this ID is added by the frontend to assign a linear ordering to HPVM nodes.
 
 .. code-block:: c
 
@@ -458,7 +463,7 @@ DFG2LLVM_OpenCL
 ^^^^^^^^^^^^^^^^^^^
 Description
 -----------
-This pass is responsible for generating code for kernel code and code for 
-launching kernels for the GPU target using HPVM dataflow graph. The kernels are
-generated into a separate file which is the C-Backend uses to generate 
-OpenCL kernels with.
+This backend generates the GPU kernel code, and the host code that launches
+those kernels, for the GPU target from the HPVM dataflow graph. The kernels
+are generated into a separate file, which the C-Backend then uses to generate
+OpenCL kernels.