diff --git a/hpvm/docs/hpvm-specification.md b/hpvm/docs/hpvm-specification.md
index cd61d95b4e3d4f4068a985bd5f3bac4578f6e14d..54023fc9eddc4d9f317ac1b4cc585e52b98b8ae5 100644
--- a/hpvm/docs/hpvm-specification.md
+++ b/hpvm/docs/hpvm-specification.md
@@ -1,7 +1,22 @@
-# HPVM Abstraction
+# Table of Contents
+* [HPVM Abstraction](#abstraction)
+    * [Dataflow Node](#node)
+    * [Dataflow Edge](#edge)
+    * [Input and Output Bind](#bind)
+    * [Host Code](#host)
+* [HPVM Implementation](#implementation)
+    * [Intrinsics for Describing Graphs](#describing)
+    * [Intrinsics for Querying Graphs](#querying)
+    * [Intrinsics for Memory Allocation and Synchronization](#memory)
+    * [Intrinsics for Graph Interaction](#interaction)
+    * [Implementation Limitations](#limitations)
+
+<a name="abstraction"></a>
+# HPVM Abstraction
 An HPVM program is a combination of host code plus a set of one or more distinct dataflow graphs. Each dataflow graph (DFG) is a hierarchical graph with side effects. The DFG must be acyclic. Nodes represent units of execution, and edges between nodes describe the explicit data transfer requirements. A node can begin execution once a data item becomes available on every one of its input edges. Repeated transfer of data items between nodes (if more inputs are provided) yields a pipelined execution of different nodes in the graph. The execution of a DFG is initiated and terminated by host code that launches the graph. Nodes may access globally shared memory through load and store instructions (side-effects).
 
-## Dataflow Node
+<a name="node"></a>
+## Dataflow Node
 A *dataflow node* represents a unit of computation in the DFG. A node can begin execution once a data item becomes available on every one of its input edges.
 
 A single static dataflow node represents multiple dynamic instances of the node, each executing the same computation with different index values that uniquely identify each dynamic instance w.r.t. the others. The dynamic instances of a node may be executed concurrently, and any required synchronization must be imposed using HPVM synchronization operations.
@@ -14,8 +29,8 @@ Leaf nodes contain code expressing actual computations. Leaf nodes may contain i
 
 Note that the graph is fully interpreted at compile time and cannot be modified at runtime except for the number of dynamic instances, which can be data dependent.
 
-
-## Dataflow Edge
+<a name="edge"></a>
+## Dataflow Edge
 A *dataflow edge* from the output ```out``` of a source dataflow node ```Src``` to the input ```in``` of a sink dataflow node ```Dst``` describes the explicit data transfer requirements. The ```Src``` and ```Dst``` nodes must belong to the same child graph, i.e. must be children of the same internal node.
 
 An edge from source to sink has the semantics of copying the specified data from the source to the sink after the source node has completed execution. The pairs ```(Src, out)``` and ```(Dst, in)```, representing source and sink respectively, must be unique w.r.t. every other edge in the same child graph, i.e. two dataflow edges in the same child graph cannot have the same source or destination.
@@ -26,7 +41,8 @@ An edge can be instantiated at runtime using one of two replication mechanisms:
 - *All-to-all*, where all dynamic instances of the source node are connected to all dynamic instances of the sink node, thus expressing a synchronization barrier between the two groups of nodes, or
 - *One-to-one*, where each dynamic instance of the source node is connected to a single corresponding instance of the sink node. One-to-one replication requires that the grid structure (number of dimensions and the extents in each dimension) of the source and sink nodes be identical.
 
-## Input and Output Bind
+<a name="bind"></a>
+## Input and Output Bind
 An internal node is responsible for mapping its inputs, provided by incoming dataflow edges, to the inputs of one or more nodes of its child graph.
 
 An internal node binds its input ```ip``` to input ```ic``` of its child node ```Dst``` using an *input bind*.
@@ -36,7 +52,8 @@ Conversely, an internal node binds output ```oc``` of its child node ```Src``` t
 
 A bind is always ***all-to-all***.
 
-## Host Code
+<a name="host"></a>
+## Host Code
 In an HPVM program, the host code is responsible for setting up, initiating the execution of, and blocking for the completion of a DFG. The host can interact with the DFG to sustain a streaming computation by sending all data required for, and receiving all data produced by, one execution of the DFG. The list of actions that can be performed by the host is described below, followed by a sketch of a minimal host program:
 
 - **Initialization and Cleanup**:
@@ -60,7 +77,8 @@ The host code blocks for completion of specified DFG.
     - For a non-streaming DFG, the data produced by the DFG are ready to be read by the host.
     - For a streaming DFG, no more data may be provided for processing by the DFG.
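+
+Putting these actions together, the following is a minimal, hypothetical sketch of a host program for one non-streaming DFG execution, written against the graph-interaction intrinsics defined in [Intrinsics for Graph Interaction](#interaction) below; the root node function ```Root```, its argument-pack type ```RootIn```, and the host function itself are placeholders.
+
+```llvm
+; Placeholders: @Root is the DFG's root (internal) node function and
+; %RootIn packs one field per input of @Root.
+%empty  = type {}
+%RootIn = type { float*, i64 }
+
+declare %empty @Root(float*, i64)
+
+declare void @llvm.hpvm.init()
+declare i8*  @llvm.hpvm.launch(i8*, i8*, i1)
+declare void @llvm.hpvm.wait(i8*)
+declare void @llvm.hpvm.cleanup()
+
+define void @host(float* %A, i64 %n) {
+entry:
+  call void @llvm.hpvm.init()           ; initialize the HPVM runtime
+  ; pack the root node's inputs into a single struct
+  %args = alloca %RootIn
+  %pA = getelementptr inbounds %RootIn, %RootIn* %args, i32 0, i32 0
+  store float* %A, float** %pA
+  %pn = getelementptr inbounds %RootIn, %RootIn* %args, i32 0, i32 1
+  store i64 %n, i64* %pn
+  %raw = bitcast %RootIn* %args to i8*
+  ; launch a non-streaming (isStream = 0) execution of the DFG
+  %graph = call i8* @llvm.hpvm.launch(i8* bitcast (%empty (float*, i64)* @Root to i8*), i8* %raw, i1 0)
+  call void @llvm.hpvm.wait(i8* %graph) ; block until the outputs are ready
+  call void @llvm.hpvm.cleanup()        ; tear down the runtime
+  ret void
+}
+```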
 
-# HPVM Implementation
+<a name="implementation"></a>
+# HPVM Implementation
 
 This section describes the implementation of HPVM on top of LLVM IR.
 
@@ -78,7 +96,8 @@ We represent nodes with opaque handles (pointers of LLVM type i8\*). We represen
 
 Pointer arguments of node functions are required to be annotated with the attributes ```in``` and/or ```out```, depending on their expected use (read-only, write-only, or read-write).
 
-## Intrinsics for Describing Graphs
+<a name="describing"></a>
+## Intrinsics for Describing Graphs
 
 The intrinsics for describing graphs can only be used by internal nodes. Moreover, an internal node's function may contain only these intrinsics, with the exception of a return statement of the appropriate type, which supplies the results for the node's outgoing dataflow edges.
 
@@ -104,7 +123,8 @@ Bind input ```ip``` of current node to input ```ic``` of child node ```N```. Arg
 ```void llvm.hpvm.bind.output(i8* N, i32 oc, i32 op, i1 isStream)```  
 Bind output ```oc``` of child node ```N``` to output ```op``` of current node. Field ```oc``` of the return struct in ```N```'s node function and field ```op``` of the return struct in the current node function must have matching types. ```isStream``` chooses a streaming (1) or non-streaming (0) bind.
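+
+As an illustration, the hypothetical internal-node function below wires up a two-stage pipeline with ```n``` dynamic instances per stage. ```Producer``` and ```Consumer``` are placeholder leaf-node functions, and the ```createNode1D```/```createEdge``` signatures, including the encoding of ```ReplType``` (0 assumed to select one-to-one), are those defined earlier in this section.
+
+```llvm
+; Placeholder leaf nodes: Producer's single output field (a float*)
+; feeds Consumer's input 0. Per the convention above, their pointer
+; arguments carry in/out attributes (omitted in these declares).
+%empty   = type {}
+%ProdOut = type { float* }
+
+declare %ProdOut @Producer(float*, i64)
+declare %ProdOut @Consumer(float*, i64)
+
+declare i8*  @llvm.hpvm.createNode1D(i8*, i64)
+declare i8*  @llvm.hpvm.createEdge(i8*, i8*, i1, i32, i32, i1)
+declare void @llvm.hpvm.bind.input(i8*, i32, i32, i1)
+declare void @llvm.hpvm.bind.output(i8*, i32, i32, i1)
+
+define %ProdOut @Pipeline(float* %In, i64 %n) {
+entry:
+  %prod = call i8* @llvm.hpvm.createNode1D(i8* bitcast (%ProdOut (float*, i64)* @Producer to i8*), i64 %n)
+  %cons = call i8* @llvm.hpvm.createNode1D(i8* bitcast (%ProdOut (float*, i64)* @Consumer to i8*), i64 %n)
+  ; one-to-one, non-streaming edge: output 0 of Producer -> input 0 of Consumer
+  %e = call i8* @llvm.hpvm.createEdge(i8* %prod, i8* %cons, i1 0, i32 0, i32 0, i1 0)
+  ; forward this node's inputs (0: %In, 1: %n) to the children
+  call void @llvm.hpvm.bind.input(i8* %prod, i32 0, i32 0, i1 0)
+  call void @llvm.hpvm.bind.input(i8* %prod, i32 1, i32 1, i1 0)
+  call void @llvm.hpvm.bind.input(i8* %cons, i32 1, i32 1, i1 0)
+  ; expose Consumer's output 0 as this node's output 0
+  call void @llvm.hpvm.bind.output(i8* %cons, i32 0, i32 0, i1 0)
+  ret %ProdOut undef  ; required return; output values are routed by bind.output
+}
+```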
 
-## Intrinsics for Querying Graphs
+<a name="querying"></a>
+## Intrinsics for Querying Graphs
 
 The following intrinsics are used to query the structure of the DFG. They can only be used by leaf nodes.
 
@@ -123,6 +143,7 @@ Get index of current dynamic node instance of node ```N``` in dimension x, y or
 ```i64 llvm.hpvm.getNumNodeInstances.{x,y,z}(i8* N)```  
 Get number of dynamic instances of node ```N``` in dimension x, y or z respectively. The dimension must be one of the dimensions in which the node is replicated.
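+
+For example, the hypothetical leaf node below uses these queries to process a length-```n``` array in a grid-stride loop, assuming the node was created with ```createNode1D```; ```llvm.hpvm.getNode```, which returns the handle of the current node, is defined earlier in this section.
+
+```llvm
+%empty = type {}
+
+declare i8* @llvm.hpvm.getNode()
+declare i64 @llvm.hpvm.getNodeInstanceID.x(i8*)
+declare i64 @llvm.hpvm.getNumNodeInstances.x(i8*)
+
+; in and out together mark %A as read-write, per the attribute
+; convention stated above
+define %empty @Scale(float* in out %A, i64 %n) {
+entry:
+  %self = call i8* @llvm.hpvm.getNode()
+  %id   = call i64 @llvm.hpvm.getNodeInstanceID.x(i8* %self)
+  %num  = call i64 @llvm.hpvm.getNumNodeInstances.x(i8* %self)
+  br label %head
+head:                               ; instance %id handles elements
+  %i = phi i64 [ %id, %entry ], [ %inext, %body ] ; %id, %id+%num, ...
+  %done = icmp uge i64 %i, %n
+  br i1 %done, label %exit, label %body
+body:
+  %p  = getelementptr inbounds float, float* %A, i64 %i
+  %v  = load float, float* %p
+  %v2 = fmul float %v, 2.0
+  store float %v2, float* %p
+  %inext = add i64 %i, %num
+  br label %head
+exit:
+  ret %empty undef
+}
+```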
 
+<a name="memory"></a>
 ## Intrinsics for Memory Allocation and Synchronization
 
 The following intrinsics are used for memory allocation and synchronization. They can only be used by leaf nodes.
@@ -158,6 +179,7 @@ Atomically computes the bitwise XOR of ```v``` and the value stored at memory lo
 ```void llvm.hpvm.barrier()```  
 Local synchronization barrier across dynamic instances of current leaf node.
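+
+A minimal sketch of the barrier in use: each dynamic instance of the hypothetical leaf node below fills its own slot of a shared buffer and then reads its neighbour's slot, which is only safe once every instance has reached the barrier.
+
+```llvm
+%empty = type {}
+
+declare i8* @llvm.hpvm.getNode()
+declare i64 @llvm.hpvm.getNodeInstanceID.x(i8*)
+declare i64 @llvm.hpvm.getNumNodeInstances.x(i8*)
+declare void @llvm.hpvm.barrier()
+
+define %empty @Rotate(i32* in out %buf, i32* out %res) {
+entry:
+  %self = call i8* @llvm.hpvm.getNode()
+  %i    = call i64 @llvm.hpvm.getNodeInstanceID.x(i8* %self)
+  %num  = call i64 @llvm.hpvm.getNumNodeInstances.x(i8* %self)
+  ; phase 1: every instance writes its own slot of %buf
+  %pi = getelementptr inbounds i32, i32* %buf, i64 %i
+  %iv = trunc i64 %i to i32
+  store i32 %iv, i32* %pi
+  ; all phase-1 stores complete before any instance continues
+  call void @llvm.hpvm.barrier()
+  ; phase 2: read the (wrapped-around) neighbour's slot
+  %isucc = add i64 %i, 1
+  %j  = urem i64 %isucc, %num
+  %pj = getelementptr inbounds i32, i32* %buf, i64 %j
+  %nv = load i32, i32* %pj
+  %pr = getelementptr inbounds i32, i32* %res, i64 %i
+  store i32 %nv, i32* %pr
+  ret %empty undef
+}
+```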
 
+<a name="interaction"></a>
 ## Intrinsics for Graph Interaction
 
 The following intrinsics are used for graph initialization and termination, and for interaction between the DFG and the host; they can only be used by the host code.
@@ -189,6 +211,7 @@ Push set of input data ```args``` (same as type included in launch) to streaming
 ```i8* llvm.hpvm.pop(i8* GraphID)```  
 Pop and return data from the streaming DFG with handle ```GraphID```. The return type is a struct containing a field for every output of the DFG.
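+
+To illustrate, the hypothetical host loop below drives a streaming DFG that was launched with ```isStream = 1```: it pushes one input packet and pops one output packet per iteration. ```graph``` is the handle returned by ```llvm.hpvm.launch```, and ```argsPacket``` points to a packed struct of the same type that was passed at launch.
+
+```llvm
+declare void @llvm.hpvm.push(i8*, i8*)
+declare i8*  @llvm.hpvm.pop(i8*)
+
+define void @stream(i8* %graph, i8* %argsPacket, i64 %count) {
+entry:
+  br label %loop
+loop:
+  %k = phi i64 [ 0, %entry ], [ %knext, %loop ]
+  ; a real host would refill the packet with data item %k here
+  call void @llvm.hpvm.push(i8* %graph, i8* %argsPacket)
+  ; pop blocks until one set of DFG outputs is available
+  %out = call i8* @llvm.hpvm.pop(i8* %graph)
+  %knext = add i64 %k, 1
+  %more = icmp ult i64 %knext, %count
+  br i1 %more, label %loop, label %exit
+exit:
+  ret void
+}
+```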
 
+<a name="limitations"></a>
 ## Implementation Limitations
 Due to limitations of our current prototype implementation, the following restrictions are imposed: