diff --git a/docs-src/docs/quantization.md b/docs-src/docs/quantization.md
index 80c12f550ff54ac1ff12e295742bee2dfe39e982..e1bb673ad212072a912a275aea6e4e7b436d2da4 100644
--- a/docs-src/docs/quantization.md
+++ b/docs-src/docs/quantization.md
@@ -21,7 +21,22 @@ Note that very aggressive quantization can yield even more efficiency. If weight
 ## Integer vs. FP32
 
 There are two main attributes when discussing a numerical format. The first is **dynamic range**, which refers to the range of representable numbers. The second one is how many values can be represented within the dynamic range, which in turn determines the **precision / resolution** of the format (the distance between two numbers).  
-For all integer formats, the dynamic range is \([-2^{n-1} .. 2^{n-1}-1]\), where \(n\) is the number of bits. So for INT8 the range is \([-128 .. 127]\), and for INT4 it is \([-16 .. 15]\) (we're limiting ourselves to signed integers for now). The number of representable values is \(2^n\).
+For all integer formats, the dynamic range is \([-2^{n-1} .. 2^{n-1}-1]\), where \(n\) is the number of bits. So for INT8 the range is \([-128 .. 127]\), and for INT4 it is \([-8 .. 7]\) (we're limiting ourselves to signed integers for now). The number of representable values is \(2^n\).
-Contrast that with FP32, where the dynamic range is \(\pm 3.4\ x\ 10^{38}\), and approximately \(4.2\ x\ 10^9\) values can be represented.  
+Contrast that with FP32, where the dynamic range is \(\pm 3.4 \times 10^{38}\), and approximately \(4.2 \times 10^9\) values can be represented.  
 We can immediately see that FP32 is much more **versatile**, in that it is able to represent a wide range of distributions accurately. This is a nice property for deep learning models, where the distributions of weights and activations are usually very different (at least in dynamic range). In addition the dynamic range can differ between layers in the model.  
-In order to be able to represent these different distributions with an integer format, a **scale factor** is used to map the dynamic range of the tensor to the integer format range. But still we remain with the issue of having a significantly lower number of representable values, that is - much lower resolution.  
+In order to represent these different distributions with an integer format, a **scale factor** is used to map the dynamic range of the tensor to the integer format range. Even so, we are left with far fewer representable values, that is, much lower resolution.  
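+As an illustration of the scale-factor idea, here is a minimal sketch of symmetric linear quantization (a hypothetical standalone helper, not Distiller's API):
+
+```python
+import numpy as np
+
+def quantize_symmetric(t, n_bits=8):
+    """Map a float tensor onto the signed n-bit integer grid."""
+    q_max = 2 ** (n_bits - 1) - 1        # 127 for INT8, 7 for INT4
+    scale = q_max / np.abs(t).max()      # scale factor: tensor range -> integer range
+    t_q = np.clip(np.round(t * scale), -q_max - 1, q_max).astype(np.int32)
+    return t_q, scale
+
+t_q, scale = quantize_symmetric(np.random.randn(1000), n_bits=8)
+t_hat = t_q / scale                      # dequantize; resolution is 1 / scale
+```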
diff --git a/docs/404.html b/docs/404.html
index 621190a52e853db796b6f1679a8e76bf8eb4c2b9..50b22817962b0c2a219ab98b440035c389513485 100644
--- a/docs/404.html
+++ b/docs/404.html
@@ -13,12 +13,13 @@
 
   <link rel="stylesheet" href="/css/theme.css" type="text/css" />
   <link rel="stylesheet" href="/css/theme_extra.css" type="text/css" />
-  <link rel="stylesheet" href="/css/highlight.css">
+  <link rel="stylesheet" href="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/github.min.css">
   <link href="/extra.css" rel="stylesheet">
   
-  <script src="/js/jquery-2.1.1.min.js"></script>
-  <script src="/js/modernizr-2.8.3.min.js"></script>
-  <script type="text/javascript" src="/js/highlight.pack.js"></script> 
+  <script src="/js/jquery-2.1.1.min.js" defer></script>
+  <script src="/js/modernizr-2.8.3.min.js" defer></script>
+  <script src="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/highlight.min.js"></script>
+  <script>hljs.initHighlightingOnLoad();</script> 
   
 </head>
 
@@ -31,8 +32,8 @@
       <div class="wy-side-nav-search">
         <a href="/index.html" class="icon icon-home"> Neural Network Distiller</a>
         <div role="search">
-  <form id ="rtd-search-form" class="wy-form" action="/search.html" method="get">
-    <input type="text" name="q" placeholder="Search docs" />
+  <form id ="rtd-search-form" class="wy-form" action="//search.html" method="get">
+    <input type="text" name="q" placeholder="Search docs" title="Type search term here" />
   </form>
 </div>
       </div>
@@ -48,17 +49,17 @@
           
             <li class="toctree-l1">
 		
-    <a class="" href="/install/index.html">Installation</a>
+    <a class="" href="/install.html">Installation</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="/usage/index.html">Usage</a>
+    <a class="" href="/usage.html">Usage</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="/schedule/index.html">Compression Scheduling</a>
+    <a class="" href="/schedule.html">Compression Scheduling</a>
 	    </li>
           
             <li class="toctree-l1">
@@ -67,23 +68,23 @@
     <ul class="subnav">
                 <li class="">
                     
-    <a class="" href="/pruning/index.html">Pruning</a>
+    <a class="" href="/pruning.html">Pruning</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="/regularization/index.html">Regularization</a>
+    <a class="" href="/regularization.html">Regularization</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="/quantization/index.html">Quantization</a>
+    <a class="" href="/quantization.html">Quantization</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="/knowledge_distillation/index.html">Knowledge Distillation</a>
+    <a class="" href="/knowledge_distillation.html">Knowledge Distillation</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="/conditional_computation/index.html">Conditional Computation</a>
+    <a class="" href="/conditional_computation.html">Conditional Computation</a>
                 </li>
     </ul>
 	    </li>
@@ -94,32 +95,32 @@
     <ul class="subnav">
                 <li class="">
                     
-    <a class="" href="/algo_pruning/index.html">Pruning</a>
+    <a class="" href="/algo_pruning.html">Pruning</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="/algo_quantization/index.html">Quantization</a>
+    <a class="" href="/algo_quantization.html">Quantization</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="/algo_earlyexit/index.html">Early Exit</a>
+    <a class="" href="/algo_earlyexit.html">Early Exit</a>
                 </li>
     </ul>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="/model_zoo/index.html">Model Zoo</a>
+    <a class="" href="/model_zoo.html">Model Zoo</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="/jupyter/index.html">Jupyter Notebooks</a>
+    <a class="" href="/jupyter.html">Jupyter Notebooks</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="/design/index.html">Design</a>
+    <a class="" href="/design.html">Design</a>
 	    </li>
           
             <li class="toctree-l1">
@@ -128,11 +129,11 @@
     <ul class="subnav">
                 <li class="">
                     
-    <a class="" href="/tutorial-struct_pruning/index.html">Pruning Filters and Channels</a>
+    <a class="" href="/tutorial-struct_pruning.html">Pruning Filters and Channels</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="/tutorial-lang_model/index.html">Pruning a Language Model</a>
+    <a class="" href="/tutorial-lang_model.html">Pruning a Language Model</a>
                 </li>
     </ul>
 	    </li>
@@ -202,11 +203,10 @@
       
     </span>
 </div>
-    <script>var base_url = '';</script>
-    <script src="/js/theme.js"></script>
-      <script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML"></script>
-      <script src="/search/require.js"></script>
-      <script src="/search/search.js"></script>
+    <script>var base_url = '/';</script>
+    <script src="/js/theme.js" defer></script>
+      <script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML" defer></script>
+      <script src="/search/main.js" defer></script>
 
 </body>
 </html>
diff --git a/docs/algo_earlyexit/index.html b/docs/algo_earlyexit.html
similarity index 80%
rename from docs/algo_earlyexit/index.html
rename to docs/algo_earlyexit.html
index 65066e3b9b6fa5d765e70c88191b152dae67b65f..3f03df7947d2e82b9c5d2101b99bbb55165fe6e8 100644
--- a/docs/algo_earlyexit/index.html
+++ b/docs/algo_earlyexit.html
@@ -7,25 +7,26 @@
   <meta name="viewport" content="width=device-width, initial-scale=1.0">
   
   
-  <link rel="shortcut icon" href="../img/favicon.ico">
+  <link rel="shortcut icon" href="img/favicon.ico">
   <title>Early Exit - Neural Network Distiller</title>
   <link href='https://fonts.googleapis.com/css?family=Lato:400,700|Roboto+Slab:400,700|Inconsolata:400,700' rel='stylesheet' type='text/css'>
 
-  <link rel="stylesheet" href="../css/theme.css" type="text/css" />
-  <link rel="stylesheet" href="../css/theme_extra.css" type="text/css" />
-  <link rel="stylesheet" href="../css/highlight.css">
-  <link href="../extra.css" rel="stylesheet">
+  <link rel="stylesheet" href="css/theme.css" type="text/css" />
+  <link rel="stylesheet" href="css/theme_extra.css" type="text/css" />
+  <link rel="stylesheet" href="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/github.min.css">
+  <link href="extra.css" rel="stylesheet">
   
   <script>
     // Current page data
     var mkdocs_page_name = "Early Exit";
     var mkdocs_page_input_path = "algo_earlyexit.md";
-    var mkdocs_page_url = "/algo_earlyexit/index.html";
+    var mkdocs_page_url = null;
   </script>
   
-  <script src="../js/jquery-2.1.1.min.js"></script>
-  <script src="../js/modernizr-2.8.3.min.js"></script>
-  <script type="text/javascript" src="../js/highlight.pack.js"></script> 
+  <script src="js/jquery-2.1.1.min.js" defer></script>
+  <script src="js/modernizr-2.8.3.min.js" defer></script>
+  <script src="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/highlight.min.js"></script>
+  <script>hljs.initHighlightingOnLoad();</script> 
   
 </head>
 
@@ -36,10 +37,10 @@
     
     <nav data-toggle="wy-nav-shift" class="wy-nav-side stickynav">
       <div class="wy-side-nav-search">
-        <a href="../index.html" class="icon icon-home"> Neural Network Distiller</a>
+        <a href="index.html" class="icon icon-home"> Neural Network Distiller</a>
         <div role="search">
-  <form id ="rtd-search-form" class="wy-form" action="../search.html" method="get">
-    <input type="text" name="q" placeholder="Search docs" />
+  <form id ="rtd-search-form" class="wy-form" action="./search.html" method="get">
+    <input type="text" name="q" placeholder="Search docs" title="Type search term here" />
   </form>
 </div>
       </div>
@@ -50,22 +51,22 @@
           
             <li class="toctree-l1">
 		
-    <a class="" href="../index.html">Home</a>
+    <a class="" href="index.html">Home</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../install/index.html">Installation</a>
+    <a class="" href="install.html">Installation</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../usage/index.html">Usage</a>
+    <a class="" href="usage.html">Usage</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../schedule/index.html">Compression Scheduling</a>
+    <a class="" href="schedule.html">Compression Scheduling</a>
 	    </li>
           
             <li class="toctree-l1">
@@ -74,23 +75,23 @@
     <ul class="subnav">
                 <li class="">
                     
-    <a class="" href="../pruning/index.html">Pruning</a>
+    <a class="" href="pruning.html">Pruning</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../regularization/index.html">Regularization</a>
+    <a class="" href="regularization.html">Regularization</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../quantization/index.html">Quantization</a>
+    <a class="" href="quantization.html">Quantization</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../knowledge_distillation/index.html">Knowledge Distillation</a>
+    <a class="" href="knowledge_distillation.html">Knowledge Distillation</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../conditional_computation/index.html">Conditional Computation</a>
+    <a class="" href="conditional_computation.html">Conditional Computation</a>
                 </li>
     </ul>
 	    </li>
@@ -101,15 +102,15 @@
     <ul class="subnav">
                 <li class="">
                     
-    <a class="" href="../algo_pruning/index.html">Pruning</a>
+    <a class="" href="algo_pruning.html">Pruning</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../algo_quantization/index.html">Quantization</a>
+    <a class="" href="algo_quantization.html">Quantization</a>
                 </li>
                 <li class=" current">
                     
-    <a class="current" href="index.html">Early Exit</a>
+    <a class="current" href="algo_earlyexit.html">Early Exit</a>
     <ul class="subnav">
             
     <li class="toctree-l3"><a href="#early-exit-inference">Early Exit Inference</a></li>
@@ -132,17 +133,17 @@
           
             <li class="toctree-l1">
 		
-    <a class="" href="../model_zoo/index.html">Model Zoo</a>
+    <a class="" href="model_zoo.html">Model Zoo</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../jupyter/index.html">Jupyter Notebooks</a>
+    <a class="" href="jupyter.html">Jupyter Notebooks</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../design/index.html">Design</a>
+    <a class="" href="design.html">Design</a>
 	    </li>
           
             <li class="toctree-l1">
@@ -151,11 +152,11 @@
     <ul class="subnav">
                 <li class="">
                     
-    <a class="" href="../tutorial-struct_pruning/index.html">Pruning Filters and Channels</a>
+    <a class="" href="tutorial-struct_pruning.html">Pruning Filters and Channels</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../tutorial-lang_model/index.html">Pruning a Language Model</a>
+    <a class="" href="tutorial-lang_model.html">Pruning a Language Model</a>
                 </li>
     </ul>
 	    </li>
@@ -170,7 +171,7 @@
       
       <nav class="wy-nav-top" role="navigation" aria-label="top navigation">
         <i data-toggle="wy-nav-top" class="fa fa-bars"></i>
-        <a href="../index.html">Neural Network Distiller</a>
+        <a href="index.html">Neural Network Distiller</a>
       </nav>
 
       
@@ -178,7 +179,7 @@
         <div class="rst-content">
           <div role="navigation" aria-label="breadcrumbs navigation">
   <ul class="wy-breadcrumbs">
-    <li><a href="../index.html">Docs</a> &raquo;</li>
+    <li><a href="index.html">Docs</a> &raquo;</li>
     
       
         
@@ -200,7 +201,19 @@
-<p>While Deep Neural Networks benefit from a large number of layers, it's often the case that many data points in classification tasks can be classified accurately with much less work. There have been several studies recently regarding the idea of exiting before the normal endpoint of the neural network. Panda et al in <a href="#panda">Conditional Deep Learning for Energy-Efficient and Enhanced Pattern Recognition</a> points out that a lot of data points can be classified easily and require less processing than some more difficult points and they view this in terms of power savings. Surat et al in <a href="#branchynet">BranchyNet: Fast Inference via Early Exiting from Deep Neural Networks</a> look at a selective approach to exit placement and criteria for exiting early.</p>
+<p>While Deep Neural Networks benefit from a large number of layers, it's often the case that many data points in classification tasks can be classified accurately with much less work. There have been several recent studies on exiting before the normal endpoint of the neural network. Panda et al. in <a href="#panda">Conditional Deep Learning for Energy-Efficient and Enhanced Pattern Recognition</a> point out that many data points can be classified easily, requiring less processing than more difficult points, and view this in terms of power savings. Surat et al. in <a href="#branchynet">BranchyNet: Fast Inference via Early Exiting from Deep Neural Networks</a> look at a selective approach to exit placement and criteria for exiting early.</p>
 <h2 id="why-does-early-exit-work">Why Does Early Exit Work?</h2>
-<p>Early Exit is a strategy with a straightforward and easy to understand concept Figure #fig(boundaries) shows a simple example in a 2-D feature space. While deep networks can represent more complex and expressive boundaries between classes (assuming we’re confident of avoiding over-fitting the data), it’s also clear that much of the data can be properly classified with even the simplest of classification boundaries.</p>
+<p>Early Exit is a strategy with a straightforward and easy-to-understand concept. Figure #fig(boundaries) shows a simple example in a 2-D feature space. While deep networks can represent more complex and expressive boundaries between classes (assuming we’re confident of avoiding over-fitting the data), it’s also clear that much of the data can be properly classified with even the simplest of classification boundaries.</p>
-<p><img alt="Figure !fig(boundaries): Simple and more expressive classification boundaries" src="../imgs/decision_boundary.png" /></p>
+<p><img alt="Figure !fig(boundaries): Simple and more expressive classification boundaries" src="imgs/decision_boundary.png" /></p>
-<p>Data points far from the boundary can be considered "easy to classify" and achieve a high degree of confidence quicker than do data points close to the boundary. In fact, we can think of the area between the outer straight lines as being the region that is "difficult to classify" and require the full expressiveness of the neural network to accurately classify it.</p>
+<p>Data points far from the boundary can be considered "easy to classify" and achieve a high degree of confidence more quickly than data points close to the boundary. In fact, we can think of the area between the outer straight lines as the region that is "difficult to classify" and requires the full expressiveness of the neural network to classify accurately.</p>
 <h2 id="example-code-for-early-exit">Example code for Early Exit</h2>
 <p>Both CIFAR10 and ImageNet code comes directly from publicly available examples from PyTorch. The only edits are the exits that are inserted in a methodology similar to BranchyNet work.</p>
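+<p>To make the exit criterion concrete, here is a BranchyNet-style sketch based on the entropy of the branch classifier's output (a hypothetical helper, not the code used in these examples; the threshold value is illustrative):</p>
+<pre><code class="language-python">import torch
+import torch.nn.functional as F
+
+def should_exit(logits, threshold=0.5):
+    # Take the early exit when the normalized entropy of the softmax
+    # output is low, i.e. the branch is confident about this sample.
+    probs = F.softmax(logits, dim=-1)
+    entropy = -(probs * torch.log(probs + 1e-12)).sum(dim=-1)
+    max_entropy = torch.log(torch.tensor(float(logits.shape[-1])))
+    return entropy / max_entropy &lt; threshold
+</code></pre>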
@@ -251,10 +264,10 @@ Deeper networks can benefit from multiple exits. Our examples illustrate both a
   
     <div class="rst-footer-buttons" role="navigation" aria-label="footer navigation">
       
-        <a href="../model_zoo/index.html" class="btn btn-neutral float-right" title="Model Zoo">Next <span class="icon icon-circle-arrow-right"></span></a>
+        <a href="model_zoo.html" class="btn btn-neutral float-right" title="Model Zoo">Next <span class="icon icon-circle-arrow-right"></span></a>
       
       
-        <a href="../algo_quantization/index.html" class="btn btn-neutral" title="Quantization"><span class="icon icon-circle-arrow-left"></span> Previous</a>
+        <a href="algo_quantization.html" class="btn btn-neutral" title="Quantization"><span class="icon icon-circle-arrow-left"></span> Previous</a>
       
     </div>
   
@@ -280,18 +293,17 @@ Deeper networks can benefit from multiple exits. Our examples illustrate both a
     <span class="rst-current-version" data-toggle="rst-current-version">
       
       
-        <span><a href="../algo_quantization/index.html" style="color: #fcfcfc;">&laquo; Previous</a></span>
+        <span><a href="algo_quantization.html" style="color: #fcfcfc;">&laquo; Previous</a></span>
       
       
-        <span style="margin-left: 15px"><a href="../model_zoo/index.html" style="color: #fcfcfc">Next &raquo;</a></span>
+        <span style="margin-left: 15px"><a href="model_zoo.html" style="color: #fcfcfc">Next &raquo;</a></span>
       
     </span>
 </div>
-    <script>var base_url = '..';</script>
-    <script src="../js/theme.js"></script>
-      <script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML"></script>
-      <script src="../search/require.js"></script>
-      <script src="../search/search.js"></script>
+    <script>var base_url = '.';</script>
+    <script src="js/theme.js" defer></script>
+      <script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML" defer></script>
+      <script src="search/main.js" defer></script>
 
 </body>
 </html>
diff --git a/docs/algo_pruning/index.html b/docs/algo_pruning.html
similarity index 85%
rename from docs/algo_pruning/index.html
rename to docs/algo_pruning.html
index 6e5579ce9f44954c774bc9e9cb6e69c3fa1eabd5..c9ba916a2595ec7b6890abe5b21f7ba863bea6a8 100644
--- a/docs/algo_pruning/index.html
+++ b/docs/algo_pruning.html
@@ -7,25 +7,26 @@
   <meta name="viewport" content="width=device-width, initial-scale=1.0">
   
   
-  <link rel="shortcut icon" href="../img/favicon.ico">
+  <link rel="shortcut icon" href="img/favicon.ico">
   <title>Pruning - Neural Network Distiller</title>
   <link href='https://fonts.googleapis.com/css?family=Lato:400,700|Roboto+Slab:400,700|Inconsolata:400,700' rel='stylesheet' type='text/css'>
 
-  <link rel="stylesheet" href="../css/theme.css" type="text/css" />
-  <link rel="stylesheet" href="../css/theme_extra.css" type="text/css" />
-  <link rel="stylesheet" href="../css/highlight.css">
-  <link href="../extra.css" rel="stylesheet">
+  <link rel="stylesheet" href="css/theme.css" type="text/css" />
+  <link rel="stylesheet" href="css/theme_extra.css" type="text/css" />
+  <link rel="stylesheet" href="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/github.min.css">
+  <link href="extra.css" rel="stylesheet">
   
   <script>
     // Current page data
     var mkdocs_page_name = "Pruning";
     var mkdocs_page_input_path = "algo_pruning.md";
-    var mkdocs_page_url = "/algo_pruning/index.html";
+    var mkdocs_page_url = null;
   </script>
   
-  <script src="../js/jquery-2.1.1.min.js"></script>
-  <script src="../js/modernizr-2.8.3.min.js"></script>
-  <script type="text/javascript" src="../js/highlight.pack.js"></script> 
+  <script src="js/jquery-2.1.1.min.js" defer></script>
+  <script src="js/modernizr-2.8.3.min.js" defer></script>
+  <script src="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/highlight.min.js"></script>
+  <script>hljs.initHighlightingOnLoad();</script> 
   
 </head>
 
@@ -36,10 +37,10 @@
     
     <nav data-toggle="wy-nav-shift" class="wy-nav-side stickynav">
       <div class="wy-side-nav-search">
-        <a href="../index.html" class="icon icon-home"> Neural Network Distiller</a>
+        <a href="index.html" class="icon icon-home"> Neural Network Distiller</a>
         <div role="search">
-  <form id ="rtd-search-form" class="wy-form" action="../search.html" method="get">
-    <input type="text" name="q" placeholder="Search docs" />
+  <form id ="rtd-search-form" class="wy-form" action="./search.html" method="get">
+    <input type="text" name="q" placeholder="Search docs" title="Type search term here" />
   </form>
 </div>
       </div>
@@ -50,22 +51,22 @@
           
             <li class="toctree-l1">
 		
-    <a class="" href="../index.html">Home</a>
+    <a class="" href="index.html">Home</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../install/index.html">Installation</a>
+    <a class="" href="install.html">Installation</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../usage/index.html">Usage</a>
+    <a class="" href="usage.html">Usage</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../schedule/index.html">Compression Scheduling</a>
+    <a class="" href="schedule.html">Compression Scheduling</a>
 	    </li>
           
             <li class="toctree-l1">
@@ -74,23 +75,23 @@
     <ul class="subnav">
                 <li class="">
                     
-    <a class="" href="../pruning/index.html">Pruning</a>
+    <a class="" href="pruning.html">Pruning</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../regularization/index.html">Regularization</a>
+    <a class="" href="regularization.html">Regularization</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../quantization/index.html">Quantization</a>
+    <a class="" href="quantization.html">Quantization</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../knowledge_distillation/index.html">Knowledge Distillation</a>
+    <a class="" href="knowledge_distillation.html">Knowledge Distillation</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../conditional_computation/index.html">Conditional Computation</a>
+    <a class="" href="conditional_computation.html">Conditional Computation</a>
                 </li>
     </ul>
 	    </li>
@@ -101,7 +102,7 @@
     <ul class="subnav">
                 <li class=" current">
                     
-    <a class="current" href="index.html">Pruning</a>
+    <a class="current" href="algo_pruning.html">Pruning</a>
     <ul class="subnav">
             
     <li class="toctree-l3"><a href="#weights-pruning-algorithms">Weights Pruning Algorithms</a></li>
@@ -140,28 +141,28 @@
                 </li>
                 <li class="">
                     
-    <a class="" href="../algo_quantization/index.html">Quantization</a>
+    <a class="" href="algo_quantization.html">Quantization</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../algo_earlyexit/index.html">Early Exit</a>
+    <a class="" href="algo_earlyexit.html">Early Exit</a>
                 </li>
     </ul>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../model_zoo/index.html">Model Zoo</a>
+    <a class="" href="model_zoo.html">Model Zoo</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../jupyter/index.html">Jupyter Notebooks</a>
+    <a class="" href="jupyter.html">Jupyter Notebooks</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../design/index.html">Design</a>
+    <a class="" href="design.html">Design</a>
 	    </li>
           
             <li class="toctree-l1">
@@ -170,11 +171,11 @@
     <ul class="subnav">
                 <li class="">
                     
-    <a class="" href="../tutorial-struct_pruning/index.html">Pruning Filters and Channels</a>
+    <a class="" href="tutorial-struct_pruning.html">Pruning Filters and Channels</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../tutorial-lang_model/index.html">Pruning a Language Model</a>
+    <a class="" href="tutorial-lang_model.html">Pruning a Language Model</a>
                 </li>
     </ul>
 	    </li>
@@ -189,7 +190,7 @@
       
       <nav class="wy-nav-top" role="navigation" aria-label="top navigation">
         <i data-toggle="wy-nav-top" class="fa fa-bars"></i>
-        <a href="../index.html">Neural Network Distiller</a>
+        <a href="index.html">Neural Network Distiller</a>
       </nav>
 
       
@@ -197,7 +198,7 @@
         <div class="rst-content">
           <div role="navigation" aria-label="breadcrumbs navigation">
   <ul class="wy-breadcrumbs">
-    <li><a href="../index.html">Docs</a> &raquo;</li>
+    <li><a href="index.html">Docs</a> &raquo;</li>
     
       
         
@@ -216,7 +217,7 @@
             <div class="section">
               
                 <h1 id="weights-pruning-algorithms">Weights Pruning Algorithms</h1>
-<p><center><img alt="algorithms" src="../imgs/algorithms-pruning.png" /></center></p>
+<p><center><img alt="algorithms" src="imgs/algorithms-pruning.png" /></center></p>
 <h2 id="magnitude-pruner">Magnitude Pruner</h2>
 <p>This is the most basic pruner: it applies a thresholding function, \(thresh(.)\), on each element, \(w_i\), of a weights tensor.  A different threshold can be used for each layer's weights tensor.<br>
 Because the threshold is applied on individual elements, this pruner belongs to the element-wise pruning algorithm family.</p>
@@ -227,7 +228,7 @@ Because the threshold is applied on individual elements, this pruner belongs to
 <p>Finding a threshold magnitude per layer is daunting, especially since each layer's elements have different average absolute values.  We can take advantage of the fact that the weights of convolutional and fully connected layers exhibit a Gaussian distribution with a mean value roughly zero, to avoid using a direct threshold based on the values of each specific tensor.
 <br>
-The diagram below shows the distribution the weights tensor of the first convolutional layer, and first fully-connected layer in TorchVision's pre-trained Alexnet model.  You can see that they have an approximate Gaussian distribution.<br>
+The diagram below shows the distributions of the weights tensors of the first convolutional layer and the first fully-connected layer in TorchVision's pre-trained Alexnet model.  You can see that they have an approximate Gaussian distribution.<br>
-<center><img alt="conv1" src="../imgs/alexnet-conv1-hist.png" /> <img alt="fc1" src="../imgs/alexnet-fc1-hist.png" /></center>
+<center><img alt="conv1" src="imgs/alexnet-conv1-hist.png" /> <img alt="fc1" src="imgs/alexnet-fc1-hist.png" /></center>
 <center>The distributions of Alexnet conv1 and fc1 layers</center></p>
 <p>We use the standard deviation of the weights tensor as a sort of normalizing factor between the different weights tensors.  For example, if a tensor is Normally distributed, then about 68% of the elements have an absolute value less than the standard deviation (\(\sigma\)) of the tensor.  Thus, if we set the threshold to \(s*\sigma\), then basically we are thresholding \(s * 68\%\) of the tensor elements.  </p>
 <p>\[ thresh(w_i)=\left\lbrace
@@ -250,7 +251,7 @@ The diagram below shows the distribution the weights tensor of the first convolu
 <h3 id="schedule">Schedule</h3>
-<p>In their <a href="https://arxiv.org/abs/1506.02626">paper</a> Song Han et al. use iterative pruning and change the value of the \(s\) multiplier at each pruning step.  Distiller's <code>SensitivityPruner</code> works differently: the value \(s\) is set once based on a one-time calculation of the standard-deviation of the tensor (the first time we prune), and relies on the fact that as the tensor is pruned, more elements are "pulled" toward the center of the distribution and thus more elements gets pruned.</p>
+<p>In their <a href="https://arxiv.org/abs/1506.02626">paper</a> Song Han et al. use iterative pruning and change the value of the \(s\) multiplier at each pruning step.  Distiller's <code>SensitivityPruner</code> works differently: the value \(s\) is set once based on a one-time calculation of the standard-deviation of the tensor (the first time we prune), and relies on the fact that as the tensor is pruned, more elements are "pulled" toward the center of the distribution and thus more elements get pruned.</p>
 <p>This actually works quite well as we can see in the diagram below.  This is a TensorBoard screen-capture from Alexnet training, which shows how this method starts off pruning very aggressively, but then slowly reduces the pruning rate.
-<center><img alt="conv1" src="../imgs/alexnet-fc1-training-plot.png" /></center></p>
+<center><img alt="conv1" src="imgs/alexnet-fc1-training-plot.png" /></center></p>
 <p>We use a simple iterative-pruning schedule such as: <em>Prune every second epoch starting at epoch 0, and ending at epoch 38.</em>  This excerpt from <code>alexnet.schedule_sensitivity.yaml</code> shows how this iterative schedule is conveyed in Distiller scheduling configuration YAML:</p>
 <pre><code>pruners:
   my_pruner:
@@ -287,13 +288,23 @@ This pruner is much more stable compared to <code>SensitivityPruner</code> becau
-<p>In <a href="https://arxiv.org/abs/1600.604493">Dynamic Network Surgery for Efficient DNNs</a> Guo et. al propose that network pruning and splicing work in tandem.  A <code>SpilicingPruner</code> is a pruner that both prunes and splices connections and works best with a Dynamic Network Surgery schedule, which, for example, configures the <code>PruningPolicy</code> to mask weights only during the forward pass.</p>
+<p>In <a href="https://arxiv.org/abs/1608.04493">Dynamic Network Surgery for Efficient DNNs</a> Guo et al. propose that network pruning and splicing work in tandem.  A <code>SplicingPruner</code> is a pruner that both prunes and splices connections and works best with a Dynamic Network Surgery schedule, which, for example, configures the <code>PruningPolicy</code> to mask weights only during the forward pass.</p>
 <h2 id="automated-gradual-pruner-agp">Automated Gradual Pruner (AGP)</h2>
 <p>In <a href="https://arxiv.org/abs/1710.01878">To prune, or not to prune: exploring the efficacy of pruning for model compression</a>, authors Michael Zhu and Suyog Gupta provide an algorithm to schedule a Level Pruner which Distiller implements in <code>AutomatedGradualPruner</code>.
-<center><img alt="agp formula" src="../imgs/agp_formula.png" /></center></p>
+<center><img alt="agp formula" src="imgs/agp_formula.png" /></center></p>
 <blockquote>
 <p>"We introduce a new automated gradual pruning algorithm in which the sparsity is increased from an initial sparsity value \(s_i\) (usually 0) to a final sparsity value \(s_f\) over a span of n pruning steps.
 The intuition behind this sparsity function in equation (1)  is to prune the network rapidly in the initial phase when the redundant connections are
-abundant and gradually reduce the number of weights being pruned each time as there are fewer and fewer weights remaining in the network.""</p>
+abundant and gradually reduce the number of weights being pruned each time as there are fewer and fewer weights remaining in the network."</p>
 </blockquote>
-<p><center><img alt="Gradual Pruning" src="../imgs/gradual_pruning.png" /></center></p>
+<p><center><img alt="Gradual Pruning" src="imgs/gradual_pruning.png" /></center></p>
 <p>You can play with the scheduling parameters in the <a href="https://github.com/NervanaSystems/distiller/blob/master/jupyter/agp_schedule.ipynb">agp_schedule.ipynb notebook</a>.</p>
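+<p>For reference, equation (1) can be written as a small helper (a sketch of the formula only, not Distiller's <code>AutomatedGradualPruner</code> implementation):</p>
+<pre><code class="language-python">def agp_sparsity(t, t0, n, delta_t, s_i=0.0, s_f=0.9):
+    # Cubic ramp from initial sparsity s_i to final sparsity s_f,
+    # evaluated at pruning step t in {t0, t0 + delta_t, ..., t0 + n * delta_t}.
+    progress = (t - t0) / (n * delta_t)
+    return s_f + (s_i - s_f) * (1 - progress) ** 3
+
+# Prunes rapidly at first, then more gently as fewer weights remain:
+print([round(agp_sparsity(t, 0, 10, 2), 3) for t in range(0, 21, 2)])
+</code></pre>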
 <p>The authors describe AGP:</p>
 <blockquote>
@@ -307,7 +318,7 @@ abundant and gradually reduce the number of weights being pruned each time as th
 <h2 id="rnn-pruner">RNN Pruner</h2>
 <p>The authors of <a href="https://arxiv.org/abs/1704.05119">Exploring Sparsity in Recurrent Neural Networks</a>, Sharan Narang, Erich Elsen, Gregory Diamos, and Shubho Sengupta, "propose a technique to reduce the parameters of a network by pruning weights during the initial training of the network."  They use a gradual pruning schedule which is reminiscent of the schedule used in AGP, for element-wise pruning of RNNs, which they also employ during training.  They show pruning of RNN, GRU, LSTM and embedding layers.</p>
 <p>Distiller's distiller.pruning.BaiduRNNPruner class implements this pruning algorithm.</p>
-<p><center><img alt="Baidu RNN Pruning" src="../imgs/baidu_rnn_pruning.png" /></center></p>
+<p><center><img alt="Baidu RNN Pruning" src="imgs/baidu_rnn_pruning.png" /></center></p>
 <h1 id="structure-pruners">Structure Pruners</h1>
-<p>Element-wise pruning can create very sparse models which can be compressed to consume less memory footprint and bandwidth, but without specialized hardware that can compute using the sparse representation of the tensors, we don't gain any speedup of the computation.  Structure pruners, remove entire "structures", such as kernels, filters, and even entire feature-maps.</p>
+<p>Element-wise pruning can create very sparse models, which can be compressed to reduce memory footprint and bandwidth, but without specialized hardware that can compute using the sparse representation of the tensors, we don't gain any speedup of the computation.  Structure pruners remove entire "structures", such as kernels, filters, and even entire feature-maps.</p>
 <h2 id="structure-ranking-pruners">Structure Ranking Pruners</h2>
@@ -347,10 +358,10 @@ This method is called <em>Network Trimming</em> from the research paper:
   
     <div class="rst-footer-buttons" role="navigation" aria-label="footer navigation">
       
-        <a href="../algo_quantization/index.html" class="btn btn-neutral float-right" title="Quantization">Next <span class="icon icon-circle-arrow-right"></span></a>
+        <a href="algo_quantization.html" class="btn btn-neutral float-right" title="Quantization">Next <span class="icon icon-circle-arrow-right"></span></a>
       
       
-        <a href="../conditional_computation/index.html" class="btn btn-neutral" title="Conditional Computation"><span class="icon icon-circle-arrow-left"></span> Previous</a>
+        <a href="conditional_computation.html" class="btn btn-neutral" title="Conditional Computation"><span class="icon icon-circle-arrow-left"></span> Previous</a>
       
     </div>
   
@@ -376,18 +387,17 @@ This method is called <em>Network Trimming</em> from the research paper:
     <span class="rst-current-version" data-toggle="rst-current-version">
       
       
-        <span><a href="../conditional_computation/index.html" style="color: #fcfcfc;">&laquo; Previous</a></span>
+        <span><a href="conditional_computation.html" style="color: #fcfcfc;">&laquo; Previous</a></span>
       
       
-        <span style="margin-left: 15px"><a href="../algo_quantization/index.html" style="color: #fcfcfc">Next &raquo;</a></span>
+        <span style="margin-left: 15px"><a href="algo_quantization.html" style="color: #fcfcfc">Next &raquo;</a></span>
       
     </span>
 </div>
-    <script>var base_url = '..';</script>
-    <script src="../js/theme.js"></script>
-      <script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML"></script>
-      <script src="../search/require.js"></script>
-      <script src="../search/search.js"></script>
+    <script>var base_url = '.';</script>
+    <script src="js/theme.js" defer></script>
+      <script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML" defer></script>
+      <script src="search/main.js" defer></script>
 
 </body>
 </html>
diff --git a/docs/algo_quantization/index.html b/docs/algo_quantization.html
similarity index 81%
rename from docs/algo_quantization/index.html
rename to docs/algo_quantization.html
index 21eebdbf76b62a7b2c7fd09598ae3de5e5360660..3cd4ac6c3a4a1961b00b76923ce7bbd0b6712b5d 100644
--- a/docs/algo_quantization/index.html
+++ b/docs/algo_quantization.html
@@ -7,25 +7,26 @@
   <meta name="viewport" content="width=device-width, initial-scale=1.0">
   
   
-  <link rel="shortcut icon" href="../img/favicon.ico">
+  <link rel="shortcut icon" href="img/favicon.ico">
   <title>Quantization - Neural Network Distiller</title>
   <link href='https://fonts.googleapis.com/css?family=Lato:400,700|Roboto+Slab:400,700|Inconsolata:400,700' rel='stylesheet' type='text/css'>
 
-  <link rel="stylesheet" href="../css/theme.css" type="text/css" />
-  <link rel="stylesheet" href="../css/theme_extra.css" type="text/css" />
-  <link rel="stylesheet" href="../css/highlight.css">
-  <link href="../extra.css" rel="stylesheet">
+  <link rel="stylesheet" href="css/theme.css" type="text/css" />
+  <link rel="stylesheet" href="css/theme_extra.css" type="text/css" />
+  <link rel="stylesheet" href="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/github.min.css">
+  <link href="extra.css" rel="stylesheet">
   
   <script>
     // Current page data
     var mkdocs_page_name = "Quantization";
     var mkdocs_page_input_path = "algo_quantization.md";
-    var mkdocs_page_url = "/algo_quantization/index.html";
+    var mkdocs_page_url = null;
   </script>
   
-  <script src="../js/jquery-2.1.1.min.js"></script>
-  <script src="../js/modernizr-2.8.3.min.js"></script>
-  <script type="text/javascript" src="../js/highlight.pack.js"></script> 
+  <script src="js/jquery-2.1.1.min.js" defer></script>
+  <script src="js/modernizr-2.8.3.min.js" defer></script>
+  <script src="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/highlight.min.js"></script>
+  <script>hljs.initHighlightingOnLoad();</script> 
   
 </head>
 
@@ -36,10 +37,10 @@
     
     <nav data-toggle="wy-nav-shift" class="wy-nav-side stickynav">
       <div class="wy-side-nav-search">
-        <a href="../index.html" class="icon icon-home"> Neural Network Distiller</a>
+        <a href="index.html" class="icon icon-home"> Neural Network Distiller</a>
         <div role="search">
-  <form id ="rtd-search-form" class="wy-form" action="../search.html" method="get">
-    <input type="text" name="q" placeholder="Search docs" />
+  <form id ="rtd-search-form" class="wy-form" action="./search.html" method="get">
+    <input type="text" name="q" placeholder="Search docs" title="Type search term here" />
   </form>
 </div>
       </div>
@@ -50,22 +51,22 @@
           
             <li class="toctree-l1">
 		
-    <a class="" href="../index.html">Home</a>
+    <a class="" href="index.html">Home</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../install/index.html">Installation</a>
+    <a class="" href="install.html">Installation</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../usage/index.html">Usage</a>
+    <a class="" href="usage.html">Usage</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../schedule/index.html">Compression Scheduling</a>
+    <a class="" href="schedule.html">Compression Scheduling</a>
 	    </li>
           
             <li class="toctree-l1">
@@ -74,23 +75,23 @@
     <ul class="subnav">
                 <li class="">
                     
-    <a class="" href="../pruning/index.html">Pruning</a>
+    <a class="" href="pruning.html">Pruning</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../regularization/index.html">Regularization</a>
+    <a class="" href="regularization.html">Regularization</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../quantization/index.html">Quantization</a>
+    <a class="" href="quantization.html">Quantization</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../knowledge_distillation/index.html">Knowledge Distillation</a>
+    <a class="" href="knowledge_distillation.html">Knowledge Distillation</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../conditional_computation/index.html">Conditional Computation</a>
+    <a class="" href="conditional_computation.html">Conditional Computation</a>
                 </li>
     </ul>
 	    </li>
@@ -101,11 +102,11 @@
     <ul class="subnav">
                 <li class="">
                     
-    <a class="" href="../algo_pruning/index.html">Pruning</a>
+    <a class="" href="algo_pruning.html">Pruning</a>
                 </li>
                 <li class=" current">
                     
-    <a class="current" href="index.html">Quantization</a>
+    <a class="current" href="algo_quantization.html">Quantization</a>
     <ul class="subnav">
             
     <li class="toctree-l3"><a href="#quantization-algorithms">Quantization Algorithms</a></li>
@@ -127,24 +128,24 @@
                 </li>
                 <li class="">
                     
-    <a class="" href="../algo_earlyexit/index.html">Early Exit</a>
+    <a class="" href="algo_earlyexit.html">Early Exit</a>
                 </li>
     </ul>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../model_zoo/index.html">Model Zoo</a>
+    <a class="" href="model_zoo.html">Model Zoo</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../jupyter/index.html">Jupyter Notebooks</a>
+    <a class="" href="jupyter.html">Jupyter Notebooks</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../design/index.html">Design</a>
+    <a class="" href="design.html">Design</a>
 	    </li>
           
             <li class="toctree-l1">
@@ -153,11 +154,11 @@
     <ul class="subnav">
                 <li class="">
                     
-    <a class="" href="../tutorial-struct_pruning/index.html">Pruning Filters and Channels</a>
+    <a class="" href="tutorial-struct_pruning.html">Pruning Filters and Channels</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../tutorial-lang_model/index.html">Pruning a Language Model</a>
+    <a class="" href="tutorial-lang_model.html">Pruning a Language Model</a>
                 </li>
     </ul>
 	    </li>
@@ -172,7 +173,7 @@
       
       <nav class="wy-nav-top" role="navigation" aria-label="top navigation">
         <i data-toggle="wy-nav-top" class="fa fa-bars"></i>
-        <a href="../index.html">Neural Network Distiller</a>
+        <a href="index.html">Neural Network Distiller</a>
       </nav>
 
       
@@ -180,7 +181,7 @@
         <div class="rst-content">
           <div role="navigation" aria-label="breadcrumbs navigation">
   <ul class="wy-breadcrumbs">
-    <li><a href="../index.html">Docs</a> &raquo;</li>
+    <li><a href="index.html">Docs</a> &raquo;</li>
     
       
         
@@ -200,7 +201,7 @@
               
                 <h1 id="quantization-algorithms">Quantization Algorithms</h1>
 <p><strong>Note:</strong><br />
-For any of the methods below that require quantization-aware training, please see <a href="../schedule/index.html#quantization">here</a> for details on how to invoke it using Distiller's scheduling mechanism.</p>
+For any of the methods below that require quantization-aware training, please see <a href="schedule.html#quantization">here</a> for details on how to invoke it using Distiller's scheduling mechanism.</p>
 <h2 id="range-based-linear-quantization">Range-Based Linear Quantization</h2>
 <p>Let's break down the terminology we use here:</p>
 <ul>
@@ -261,7 +262,7 @@ For any of the methods below that require quantization-aware training, please se
 </ul>
 <h3 id="other-features">Other Features</h3>
 <ul>
-<li><strong>Removing Outliers:</strong> As discussed <a href="../quantization/index.html#outliers-removal">here</a>, in some cases the float range of activations contains outliers. Spending dynamic range on these outliers hurts our ability to represent the values we actually care about accurately.
+<li><strong>Removing Outliers:</strong> As discussed <a href="quantization.html#outliers-removal">here</a>, in some cases the float range of activations contains outliers. Spending dynamic range on these outliers hurts our ability to represent the values we actually care about accurately.
    <p align="center">
        <img src="../imgs/quant_clipped.png"/>
    </p>
@@ -281,9 +282,9 @@ For any of the methods below that require quantization-aware training, please se
 </ul>
 </li>
 <li>All other layers are unaffected and are executed using their original FP32 implementation.</li>
-<li>To automatically transform an existing model to a quantized model using this method, use the <code>PostTrainLinearQuantizer</code> class. For details on ways to invoke the quantizer see <a href="../schedule/index.html#post-training-quantization">here</a>.</li>
+<li>To automatically transform an existing model to a quantized model using this method, use the <code>PostTrainLinearQuantizer</code> class. For details on ways to invoke the quantizer see <a href="schedule.html#post-training-quantization">here</a>.</li>
+<li>The transform performed by the Quantizer only works on sub-classes of <code>torch.nn.Module</code>. But operations such as element-wise addition / multiplication and concatenation do not have associated Modules in PyTorch. They are either overloaded operators or simple functions in the <code>torch</code> namespace. To be able to quantize these operations, we've implemented very simple modules that wrap these operations <a href="https://github.com/NervanaSystems/distiller/blob/master/distiller/modules">here</a>. It is necessary to manually modify your model and replace any existing operator with a corresponding module. For an example, see our slightly modified <a href="https://github.com/NervanaSystems/distiller/blob/quantization_updates/models/imagenet/resnet.py">ResNet implementation</a>.</li>
-<li>For weights and bias the scale factor and zero-point are determined once at quantization setup ("offline" / "static"). For activations, both "static" and "dynamic" quantization is supported. Static quantizaton of activations requires that statistics be collected beforehand. See details on how to do that <a href="../schedule/index.html#collecting-statistics-for-quantization">here</a>.</li>
+<li>For weights and biases the scale factor and zero-point are determined once at quantization setup ("offline" / "static"). For activations, both "static" and "dynamic" quantization are supported. Static quantization of activations requires that statistics be collected beforehand. See details on how to do that <a href="schedule.html#collecting-statistics-for-quantization">here</a>.</li>
 <li>The calculated quantization parameters are stored as buffers within the module, so they are automatically serialized when the model checkpoint is saved.</li>
 </ul>
 <h4 id="quantization-aware-training">Quantization-Aware Training</h4>
@@ -309,7 +310,7 @@ Note that the current implementation of <code>QuantAwareTrainRangeLinearQuantize
 <p>
 <script type="math/tex; mode=display">w_q = 2 quantize_k \left( f(w_f) \right) - 1</script>
 </p>
-<p>This method requires training the model with quantization-aware training, as discussed <a href="../quantization/index.html#quantization-aware-training">here</a>. Use the <code>DorefaQuantizer</code> class to transform an existing model to a model suitable for training with quantization using DoReFa.</p>
+<p>This method requires training the model with quantization-aware training, as discussed <a href="quantization.html#quantization-aware-training">here</a>. Use the <code>DorefaQuantizer</code> class to transform an existing model to a model suitable for training with quantization using DoReFa.</p>
 <h3 id="notes">Notes:</h3>
 <ul>
 <li>Gradients quantization as proposed in the paper is not supported yet.</li>
@@ -318,7 +319,7 @@ Note that the current implementation of <code>QuantAwareTrainRangeLinearQuantize
 <h2 id="pact">PACT</h2>
 <p>(As proposed in <a href="https://arxiv.org/abs/1805.06085">PACT: Parameterized Clipping Activation for Quantized Neural Networks</a>)</p>
 <p>This method is similar to DoReFa, but the upper clipping values, <script type="math/tex">\alpha</script>, of the activation functions are learned parameters instead of hard coded to 1. Note that per the paper's recommendation, <script type="math/tex">\alpha</script> is shared per layer.</p>
-<p>This method requires training the model with quantization-aware training, as discussed <a href="../quantization/#quantization-aware-training">here</a>. Use the <code>PACTQuantizer</code> class to transform an existing model to a model suitable for training with quantization using PACT.</p>
+<p>This method requires training the model with quantization-aware training, as discussed <a href="quantization.html#quantization-aware-training">here</a>. Use the <code>PACTQuantizer</code> class to transform an existing model to a model suitable for training with quantization using PACT.</p>
 <h2 id="wrpn">WRPN</h2>
 <p>(As proposed in <a href="https://arxiv.org/abs/1709.01134">WRPN: Wide Reduced-Precision Networks</a>)  </p>
 <p>In this method, activations are clipped to <script type="math/tex">[0, 1]</script> and quantized as follows (<script type="math/tex">k</script> is the number of bits used for quantization):</p>
@@ -330,7 +331,16 @@ Note that the current implementation of <code>QuantAwareTrainRangeLinearQuantize
 <script type="math/tex; mode=display">w_q = \frac{1}{2^{k-1}-1} round \left( \left(2^{k-1} - 1 \right)w_f \right)</script>
 </p>
 <p>Note that <script type="math/tex">k-1</script> bits are used to quantize weights, leaving one bit for sign.</p>
-<p>This method requires training the model with quantization-aware training, as discussed <a href="../quantization/#quantization-aware-training">here</a>. Use the <code>WRPNQuantizer</code> class to transform an existing model to a model suitable for training with quantization using WRPN.</p>
+<p>This method requires training the model with quantization-aware training, as discussed <a href="quantization.html#quantization-aware-training">here</a>. Use the <code>WRPNQuantizer</code> class to transform an existing model to a model suitable for training with quantization using WRPN.</p>
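+<p>As a quick sanity check of the weight formula above (a sketch, not the actual <code>WRPNQuantizer</code> code):</p>
+<pre><code class="language-python">def wrpn_quantize_weight(w_f, k=4):
+    # k - 1 bits quantize the magnitude; one bit is left for the sign.
+    levels = 2 ** (k - 1) - 1           # 7 levels on each side for k = 4
+    w_f = max(-1.0, min(1.0, w_f))      # weights assumed clipped to [-1, 1]
+    return round(levels * w_f) / levels
+
+print(wrpn_quantize_weight(0.33))       # snaps 0.33 to 2/7 = 0.2857...
+</code></pre>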
 <h3 id="notes_1">Notes:</h3>
 <ul>
 <li>The paper proposed widening of layers as a means to reduce accuracy loss. This isn't implemented as part of <code>WRPNQuantizer</code> at the moment. To experiment with this, modify your model implementation to have wider layers.</li>
@@ -343,10 +353,10 @@ Note that the current implementation of <code>QuantAwareTrainRangeLinearQuantize
   
     <div class="rst-footer-buttons" role="navigation" aria-label="footer navigation">
       
-        <a href="../algo_earlyexit/index.html" class="btn btn-neutral float-right" title="Early Exit">Next <span class="icon icon-circle-arrow-right"></span></a>
+        <a href="algo_earlyexit.html" class="btn btn-neutral float-right" title="Early Exit">Next <span class="icon icon-circle-arrow-right"></span></a>
       
       
-        <a href="../algo_pruning/index.html" class="btn btn-neutral" title="Pruning"><span class="icon icon-circle-arrow-left"></span> Previous</a>
+        <a href="algo_pruning.html" class="btn btn-neutral" title="Pruning"><span class="icon icon-circle-arrow-left"></span> Previous</a>
       
     </div>
   
@@ -372,18 +382,17 @@ Note that the current implementation of <code>QuantAwareTrainRangeLinearQuantize
     <span class="rst-current-version" data-toggle="rst-current-version">
       
       
-        <span><a href="../algo_pruning/index.html" style="color: #fcfcfc;">&laquo; Previous</a></span>
+        <span><a href="algo_pruning.html" style="color: #fcfcfc;">&laquo; Previous</a></span>
       
       
-        <span style="margin-left: 15px"><a href="../algo_earlyexit/index.html" style="color: #fcfcfc">Next &raquo;</a></span>
+        <span style="margin-left: 15px"><a href="algo_earlyexit.html" style="color: #fcfcfc">Next &raquo;</a></span>
       
     </span>
 </div>
-    <script>var base_url = '..';</script>
-    <script src="../js/theme.js"></script>
-      <script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML"></script>
-      <script src="../search/require.js"></script>
-      <script src="../search/search.js"></script>
+    <script>var base_url = '.';</script>
+    <script src="js/theme.js" defer></script>
+      <script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML" defer></script>
+      <script src="search/main.js" defer></script>
 
 </body>
 </html>
diff --git a/docs/conditional_computation/index.html b/docs/conditional_computation.html
similarity index 75%
rename from docs/conditional_computation/index.html
rename to docs/conditional_computation.html
index 5ca30fb3fa7ab5c9839a81e474da8f91aa205617..650556a8c802f02cfee815a44b32fd488f7e4169 100644
--- a/docs/conditional_computation/index.html
+++ b/docs/conditional_computation.html
@@ -7,25 +7,26 @@
   <meta name="viewport" content="width=device-width, initial-scale=1.0">
   
   
-  <link rel="shortcut icon" href="../img/favicon.ico">
+  <link rel="shortcut icon" href="img/favicon.ico">
   <title>Conditional Computation - Neural Network Distiller</title>
   <link href='https://fonts.googleapis.com/css?family=Lato:400,700|Roboto+Slab:400,700|Inconsolata:400,700' rel='stylesheet' type='text/css'>
 
-  <link rel="stylesheet" href="../css/theme.css" type="text/css" />
-  <link rel="stylesheet" href="../css/theme_extra.css" type="text/css" />
-  <link rel="stylesheet" href="../css/highlight.css">
-  <link href="../extra.css" rel="stylesheet">
+  <link rel="stylesheet" href="css/theme.css" type="text/css" />
+  <link rel="stylesheet" href="css/theme_extra.css" type="text/css" />
+  <link rel="stylesheet" href="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/github.min.css">
+  <link href="extra.css" rel="stylesheet">
   
   <script>
     // Current page data
     var mkdocs_page_name = "Conditional Computation";
     var mkdocs_page_input_path = "conditional_computation.md";
-    var mkdocs_page_url = "/conditional_computation/index.html";
+    var mkdocs_page_url = null;
   </script>
   
-  <script src="../js/jquery-2.1.1.min.js"></script>
-  <script src="../js/modernizr-2.8.3.min.js"></script>
-  <script type="text/javascript" src="../js/highlight.pack.js"></script> 
+  <script src="js/jquery-2.1.1.min.js" defer></script>
+  <script src="js/modernizr-2.8.3.min.js" defer></script>
+  <script src="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/highlight.min.js"></script>
+  <script>hljs.initHighlightingOnLoad();</script> 
   
 </head>
 
@@ -36,10 +37,10 @@
     
     <nav data-toggle="wy-nav-shift" class="wy-nav-side stickynav">
       <div class="wy-side-nav-search">
-        <a href="../index.html" class="icon icon-home"> Neural Network Distiller</a>
+        <a href="index.html" class="icon icon-home"> Neural Network Distiller</a>
         <div role="search">
-  <form id ="rtd-search-form" class="wy-form" action="../search.html" method="get">
-    <input type="text" name="q" placeholder="Search docs" />
+  <form id ="rtd-search-form" class="wy-form" action="./search.html" method="get">
+    <input type="text" name="q" placeholder="Search docs" title="Type search term here" />
   </form>
 </div>
       </div>
@@ -50,22 +51,22 @@
           
             <li class="toctree-l1">
 		
-    <a class="" href="../index.html">Home</a>
+    <a class="" href="index.html">Home</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../install/index.html">Installation</a>
+    <a class="" href="install.html">Installation</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../usage/index.html">Usage</a>
+    <a class="" href="usage.html">Usage</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../schedule/index.html">Compression Scheduling</a>
+    <a class="" href="schedule.html">Compression Scheduling</a>
 	    </li>
           
             <li class="toctree-l1">
@@ -74,23 +75,23 @@
     <ul class="subnav">
                 <li class="">
                     
-    <a class="" href="../pruning/index.html">Pruning</a>
+    <a class="" href="pruning.html">Pruning</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../regularization/index.html">Regularization</a>
+    <a class="" href="regularization.html">Regularization</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../quantization/index.html">Quantization</a>
+    <a class="" href="quantization.html">Quantization</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../knowledge_distillation/index.html">Knowledge Distillation</a>
+    <a class="" href="knowledge_distillation.html">Knowledge Distillation</a>
                 </li>
                 <li class=" current">
                     
-    <a class="current" href="index.html">Conditional Computation</a>
+    <a class="current" href="conditional_computation.html">Conditional Computation</a>
     <ul class="subnav">
             
     <li class="toctree-l3"><a href="#conditional-computation">Conditional Computation</a></li>
@@ -113,32 +114,32 @@
     <ul class="subnav">
                 <li class="">
                     
-    <a class="" href="../algo_pruning/index.html">Pruning</a>
+    <a class="" href="algo_pruning.html">Pruning</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../algo_quantization/index.html">Quantization</a>
+    <a class="" href="algo_quantization.html">Quantization</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../algo_earlyexit/index.html">Early Exit</a>
+    <a class="" href="algo_earlyexit.html">Early Exit</a>
                 </li>
     </ul>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../model_zoo/index.html">Model Zoo</a>
+    <a class="" href="model_zoo.html">Model Zoo</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../jupyter/index.html">Jupyter Notebooks</a>
+    <a class="" href="jupyter.html">Jupyter Notebooks</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../design/index.html">Design</a>
+    <a class="" href="design.html">Design</a>
 	    </li>
           
             <li class="toctree-l1">
@@ -147,11 +148,11 @@
     <ul class="subnav">
                 <li class="">
                     
-    <a class="" href="../tutorial-struct_pruning/index.html">Pruning Filters and Channels</a>
+    <a class="" href="tutorial-struct_pruning.html">Pruning Filters and Channels</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../tutorial-lang_model/index.html">Pruning a Language Model</a>
+    <a class="" href="tutorial-lang_model.html">Pruning a Language Model</a>
                 </li>
     </ul>
 	    </li>
@@ -166,7 +167,7 @@
       
       <nav class="wy-nav-top" role="navigation" aria-label="top navigation">
         <i data-toggle="wy-nav-top" class="fa fa-bars"></i>
-        <a href="../index.html">Neural Network Distiller</a>
+        <a href="index.html">Neural Network Distiller</a>
       </nav>
 
       
@@ -174,7 +175,7 @@
         <div class="rst-content">
           <div role="navigation" aria-label="breadcrumbs navigation">
   <ul class="wy-breadcrumbs">
-    <li><a href="../index.html">Docs</a> &raquo;</li>
+    <li><a href="index.html">Docs</a> &raquo;</li>
     
       
         
@@ -236,10 +237,10 @@ In Distiller we currently have implemented a variant of Early Exit.</p>
   
     <div class="rst-footer-buttons" role="navigation" aria-label="footer navigation">
       
-        <a href="../algo_pruning/index.html" class="btn btn-neutral float-right" title="Pruning">Next <span class="icon icon-circle-arrow-right"></span></a>
+        <a href="algo_pruning.html" class="btn btn-neutral float-right" title="Pruning">Next <span class="icon icon-circle-arrow-right"></span></a>
       
       
-        <a href="../knowledge_distillation/index.html" class="btn btn-neutral" title="Knowledge Distillation"><span class="icon icon-circle-arrow-left"></span> Previous</a>
+        <a href="knowledge_distillation.html" class="btn btn-neutral" title="Knowledge Distillation"><span class="icon icon-circle-arrow-left"></span> Previous</a>
       
     </div>
   
@@ -265,18 +266,17 @@ In Distiller we currently have implemented a variant of Early Exit.</p>
     <span class="rst-current-version" data-toggle="rst-current-version">
       
       
-        <span><a href="../knowledge_distillation/index.html" style="color: #fcfcfc;">&laquo; Previous</a></span>
+        <span><a href="knowledge_distillation.html" style="color: #fcfcfc;">&laquo; Previous</a></span>
       
       
-        <span style="margin-left: 15px"><a href="../algo_pruning/index.html" style="color: #fcfcfc">Next &raquo;</a></span>
+        <span style="margin-left: 15px"><a href="algo_pruning.html" style="color: #fcfcfc">Next &raquo;</a></span>
       
     </span>
 </div>
-    <script>var base_url = '..';</script>
-    <script src="../js/theme.js"></script>
-      <script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML"></script>
-      <script src="../search/require.js"></script>
-      <script src="../search/search.js"></script>
+    <script>var base_url = '.';</script>
+    <script src="js/theme.js" defer></script>
+      <script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML" defer></script>
+      <script src="search/main.js" defer></script>
 
 </body>
 </html>
diff --git a/docs/css/highlight.css b/docs/css/highlight.css
deleted file mode 100644
index 0ae40a72f603e6f5edab5ade2ddf329fd44eb991..0000000000000000000000000000000000000000
--- a/docs/css/highlight.css
+++ /dev/null
@@ -1,124 +0,0 @@
-/*
-This is the GitHub theme for highlight.js
-
-github.com style (c) Vasily Polovnyov <vast@whiteants.net>
-
-*/
-
-.hljs {
-  display: block;
-  overflow-x: auto;
-  color: #333;
-  -webkit-text-size-adjust: none;
-}
-
-.hljs-comment,
-.diff .hljs-header,
-.hljs-javadoc {
-  color: #998;
-  font-style: italic;
-}
-
-.hljs-keyword,
-.css .rule .hljs-keyword,
-.hljs-winutils,
-.nginx .hljs-title,
-.hljs-subst,
-.hljs-request,
-.hljs-status {
-  color: #333;
-  font-weight: bold;
-}
-
-.hljs-number,
-.hljs-hexcolor,
-.ruby .hljs-constant {
-  color: #008080;
-}
-
-.hljs-string,
-.hljs-tag .hljs-value,
-.hljs-phpdoc,
-.hljs-dartdoc,
-.tex .hljs-formula {
-  color: #d14;
-}
-
-.hljs-title,
-.hljs-id,
-.scss .hljs-preprocessor {
-  color: #900;
-  font-weight: bold;
-}
-
-.hljs-list .hljs-keyword,
-.hljs-subst {
-  font-weight: normal;
-}
-
-.hljs-class .hljs-title,
-.hljs-type,
-.vhdl .hljs-literal,
-.tex .hljs-command {
-  color: #458;
-  font-weight: bold;
-}
-
-.hljs-tag,
-.hljs-tag .hljs-title,
-.hljs-rule .hljs-property,
-.django .hljs-tag .hljs-keyword {
-  color: #000080;
-  font-weight: normal;
-}
-
-.hljs-attribute,
-.hljs-variable,
-.lisp .hljs-body,
-.hljs-name {
-  color: #008080;
-}
-
-.hljs-regexp {
-  color: #009926;
-}
-
-.hljs-symbol,
-.ruby .hljs-symbol .hljs-string,
-.lisp .hljs-keyword,
-.clojure .hljs-keyword,
-.scheme .hljs-keyword,
-.tex .hljs-special,
-.hljs-prompt {
-  color: #990073;
-}
-
-.hljs-built_in {
-  color: #0086b3;
-}
-
-.hljs-preprocessor,
-.hljs-pragma,
-.hljs-pi,
-.hljs-doctype,
-.hljs-shebang,
-.hljs-cdata {
-  color: #999;
-  font-weight: bold;
-}
-
-.hljs-deletion {
-  background: #fdd;
-}
-
-.hljs-addition {
-  background: #dfd;
-}
-
-.diff .hljs-change {
-  background: #0086b3;
-}
-
-.hljs-chunk {
-  color: #aaa;
-}
diff --git a/docs/css/theme_extra.css b/docs/css/theme_extra.css
index cf8123e35af37af334233a673327563a522a7186..ab107ba645f33d3de2742a676a04ab4d57c60974 100644
--- a/docs/css/theme_extra.css
+++ b/docs/css/theme_extra.css
@@ -128,8 +128,11 @@ pre .cs, pre .c {
  * Additions specific to the search functionality provided by MkDocs
  */
 
-.search-results article {
+.search-results {
     margin-top: 23px;
+}
+
+.search-results article {
     border-top: 1px solid #E1E4E5;
     padding-top: 24px;
 }
diff --git a/docs/design/index.html b/docs/design.html
similarity index 79%
rename from docs/design/index.html
rename to docs/design.html
index 822af8178fd190ba97e05822f2ccf09c3f8792d4..b8a3c6d1a05560279faf046216f3598033fce948 100644
--- a/docs/design/index.html
+++ b/docs/design.html
@@ -7,25 +7,26 @@
   <meta name="viewport" content="width=device-width, initial-scale=1.0">
   
   
-  <link rel="shortcut icon" href="../img/favicon.ico">
+  <link rel="shortcut icon" href="img/favicon.ico">
   <title>Design - Neural Network Distiller</title>
   <link href='https://fonts.googleapis.com/css?family=Lato:400,700|Roboto+Slab:400,700|Inconsolata:400,700' rel='stylesheet' type='text/css'>
 
-  <link rel="stylesheet" href="../css/theme.css" type="text/css" />
-  <link rel="stylesheet" href="../css/theme_extra.css" type="text/css" />
-  <link rel="stylesheet" href="../css/highlight.css">
-  <link href="../extra.css" rel="stylesheet">
+  <link rel="stylesheet" href="css/theme.css" type="text/css" />
+  <link rel="stylesheet" href="css/theme_extra.css" type="text/css" />
+  <link rel="stylesheet" href="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/github.min.css">
+  <link href="extra.css" rel="stylesheet">
   
   <script>
     // Current page data
     var mkdocs_page_name = "Design";
     var mkdocs_page_input_path = "design.md";
-    var mkdocs_page_url = "/design/index.html";
+    var mkdocs_page_url = null;
   </script>
   
-  <script src="../js/jquery-2.1.1.min.js"></script>
-  <script src="../js/modernizr-2.8.3.min.js"></script>
-  <script type="text/javascript" src="../js/highlight.pack.js"></script> 
+  <script src="js/jquery-2.1.1.min.js" defer></script>
+  <script src="js/modernizr-2.8.3.min.js" defer></script>
+  <script src="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/highlight.min.js"></script>
+  <script>hljs.initHighlightingOnLoad();</script> 
   
 </head>
 
@@ -36,10 +37,10 @@
     
     <nav data-toggle="wy-nav-shift" class="wy-nav-side stickynav">
       <div class="wy-side-nav-search">
-        <a href="../index.html" class="icon icon-home"> Neural Network Distiller</a>
+        <a href="index.html" class="icon icon-home"> Neural Network Distiller</a>
         <div role="search">
-  <form id ="rtd-search-form" class="wy-form" action="../search.html" method="get">
-    <input type="text" name="q" placeholder="Search docs" />
+  <form id ="rtd-search-form" class="wy-form" action="./search.html" method="get">
+    <input type="text" name="q" placeholder="Search docs" title="Type search term here" />
   </form>
 </div>
       </div>
@@ -50,22 +51,22 @@
           
             <li class="toctree-l1">
 		
-    <a class="" href="../index.html">Home</a>
+    <a class="" href="index.html">Home</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../install/index.html">Installation</a>
+    <a class="" href="install.html">Installation</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../usage/index.html">Usage</a>
+    <a class="" href="usage.html">Usage</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../schedule/index.html">Compression Scheduling</a>
+    <a class="" href="schedule.html">Compression Scheduling</a>
 	    </li>
           
             <li class="toctree-l1">
@@ -74,23 +75,23 @@
     <ul class="subnav">
                 <li class="">
                     
-    <a class="" href="../pruning/index.html">Pruning</a>
+    <a class="" href="pruning.html">Pruning</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../regularization/index.html">Regularization</a>
+    <a class="" href="regularization.html">Regularization</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../quantization/index.html">Quantization</a>
+    <a class="" href="quantization.html">Quantization</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../knowledge_distillation/index.html">Knowledge Distillation</a>
+    <a class="" href="knowledge_distillation.html">Knowledge Distillation</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../conditional_computation/index.html">Conditional Computation</a>
+    <a class="" href="conditional_computation.html">Conditional Computation</a>
                 </li>
     </ul>
 	    </li>
@@ -101,32 +102,32 @@
     <ul class="subnav">
                 <li class="">
                     
-    <a class="" href="../algo_pruning/index.html">Pruning</a>
+    <a class="" href="algo_pruning.html">Pruning</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../algo_quantization/index.html">Quantization</a>
+    <a class="" href="algo_quantization.html">Quantization</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../algo_earlyexit/index.html">Early Exit</a>
+    <a class="" href="algo_earlyexit.html">Early Exit</a>
                 </li>
     </ul>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../model_zoo/index.html">Model Zoo</a>
+    <a class="" href="model_zoo.html">Model Zoo</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../jupyter/index.html">Jupyter Notebooks</a>
+    <a class="" href="jupyter.html">Jupyter Notebooks</a>
 	    </li>
           
             <li class="toctree-l1 current">
 		
-    <a class="current" href="index.html">Design</a>
+    <a class="current" href="design.html">Design</a>
     <ul class="subnav">
             
     <li class="toctree-l2"><a href="#distiller-design">Distiller design</a></li>
@@ -149,11 +150,11 @@
     <ul class="subnav">
                 <li class="">
                     
-    <a class="" href="../tutorial-struct_pruning/index.html">Pruning Filters and Channels</a>
+    <a class="" href="tutorial-struct_pruning.html">Pruning Filters and Channels</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../tutorial-lang_model/index.html">Pruning a Language Model</a>
+    <a class="" href="tutorial-lang_model.html">Pruning a Language Model</a>
                 </li>
     </ul>
 	    </li>
@@ -168,7 +169,7 @@
       
       <nav class="wy-nav-top" role="navigation" aria-label="top navigation">
         <i data-toggle="wy-nav-top" class="fa fa-bars"></i>
-        <a href="../index.html">Neural Network Distiller</a>
+        <a href="index.html">Neural Network Distiller</a>
       </nav>
 
       
@@ -176,7 +177,7 @@
         <div class="rst-content">
           <div role="navigation" aria-label="breadcrumbs navigation">
   <ul class="wy-breadcrumbs">
-    <li><a href="../index.html">Docs</a> &raquo;</li>
+    <li><a href="index.html">Docs</a> &raquo;</li>
     
       
     
@@ -214,7 +215,7 @@ train():
 </code></pre>
 
 <p>These callbacks can be seen in the diagram below, as the arrow pointing from the Training Loop and into Distiller's <em>Scheduler</em>, which invokes the correct algorithm.  The application also uses Distiller services to collect statistics in <em>Summaries</em> and logs files, which can be queried at a later time, from Jupyter notebooks or TensorBoard.</p>
-<p><center><img alt="Distiller design" src="../imgs/distiller-design.png" /></center><br></p>
+<p><center><img alt="Distiller design" src="imgs/distiller-design.png" /></center><br></p>
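+<p>As a rough sketch of this flow (the callback names follow Distiller's sample application, but the signatures are simplified here and should be treated as illustrative), the training loop delegates to the scheduler at each stage:</p>
+<pre><code class="python">def train(model, criterion, optimizer, scheduler, train_loader, num_epochs):
+    # `scheduler` is assumed to be a Distiller CompressionScheduler.
+    num_minibatches = len(train_loader)
+    for epoch in range(num_epochs):
+        scheduler.on_epoch_begin(epoch)
+        for minibatch_id, (inputs, labels) in enumerate(train_loader):
+            scheduler.on_minibatch_begin(epoch, minibatch_id, num_minibatches)
+            loss = criterion(model(inputs), labels)
+            # Lets the scheduler wrap the loss, e.g. to add regularization.
+            loss = scheduler.before_backward_pass(epoch, minibatch_id,
+                                                  num_minibatches, loss)
+            optimizer.zero_grad()
+            loss.backward()
+            optimizer.step()
+            scheduler.on_minibatch_end(epoch, minibatch_id, num_minibatches)
+        scheduler.on_epoch_end(epoch)
+</code></pre>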
 <h2 id="sparsification-and-fine-tuning">Sparsification and fine-tuning</h2>
 <ul>
 <li>The application sets up a model as normally done in PyTorch.</li>
@@ -261,7 +262,7 @@ To execute the model transformation, call the <code>prepare_model</code> functio
 <p>The <code>Quantizer</code> class supports quantization-aware training, that is - training with quantization in the loop. This requires handling of a couple of flows / scenarios:</p>
 <ol>
 <li>
-<p>Maintaining a full precision copy of the weights, as described <a href="../quantization/index.html#quantization-aware-training">here</a>. This is enabled by setting <code>train_with_fp_copy=True</code> in the <code>Quantizer</code> constructor. At model transformation, in each module that has parameters that should be quantized, a new <code>torch.nn.Parameter</code> is added, which will maintain the required full precision copy of the parameters. Note that this is done in-place - a new module <strong>is not</strong> created. We preferred not to sub-class the existing PyTorch modules for this purpose. In order to this in-place, and also guarantee proper back-propagation through the weights quantization function, we employ the following "hack": </p>
+<p>Maintaining a full precision copy of the weights, as described <a href="quantization.html#quantization-aware-training">here</a>. This is enabled by setting <code>train_with_fp_copy=True</code> in the <code>Quantizer</code> constructor. At model transformation, in each module that has parameters that should be quantized, a new <code>torch.nn.Parameter</code> is added, which will maintain the required full precision copy of the parameters. Note that this is done in-place - a new module <strong>is not</strong> created. We preferred not to sub-class the existing PyTorch modules for this purpose. In order to do this in-place, and also to guarantee proper back-propagation through the weights quantization function, we employ the following "hack" (see the sketch after this list): </p>
 <ol>
 <li>The existing <code>torch.nn.Parameter</code>, e.g. <code>weights</code>, is replaced by a <code>torch.nn.Parameter</code> named <code>float_weight</code>.</li>
 <li>To maintain the existing functionality of the module, we then register a <code>buffer</code> in the module with the original name - <code>weights</code>.</li>
@@ -269,7 +270,7 @@ To execute the model transformation, call the <code>prepare_model</code> functio
 </ol>
 </li>
 <li>
-<p>In addition, some quantization methods may introduce additional learned parameters to the model. For example, in the <a href="../algo_quantization/index.html#PACT">PACT</a> method, acitvations are clipped to a value <script type="math/tex">\alpha</script>, which is a learned parameter per-layer</p>
+<p>In addition, some quantization methods may introduce additional learned parameters to the model. For example, in the <a href="algo_quantization.html#PACT">PACT</a> method, activations are clipped to a value <script type="math/tex">\alpha</script>, which is a learned per-layer parameter.</p>
 </li>
 </ol>
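+<p>As a rough sketch (simplified and illustrative, not the actual <code>Quantizer</code> code; <code>add_fp_copy</code> and <code>quantize_fn</code> are assumed names), the in-place swap described in the first case could look like this:</p>
+<pre><code class="python">import torch
+import torch.nn as nn
+
+def add_fp_copy(module, param_name, quantize_fn):
+    # 1. Replace the existing parameter (e.g. 'weight') with a full precision
+    #    copy registered under a new name, e.g. 'float_weight'.
+    fp_tensor = getattr(module, param_name).data
+    delattr(module, param_name)
+    module.register_parameter('float_' + param_name, nn.Parameter(fp_tensor))
+    # 2. Register a buffer under the original name, so the module's existing
+    #    forward pass keeps working unchanged.
+    module.register_buffer(param_name, torch.zeros_like(fp_tensor))
+    # On each forward pass, the quantized weights would then be refreshed with
+    # something like: module.weight = quantize_fn(module.float_weight),
+    # so that back-propagation flows through quantize_fn to the fp copy.
+</code></pre>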
 <p>To support these two cases, the <code>Quantizer</code> class also accepts an instance of a <code>torch.optim.Optimizer</code> (normally this would be an instance of one of its sub-classes). The quantizer will take care of modifying the optimizer according to the changes made to the parameters.</p>
@@ -288,10 +289,10 @@ In <code>distiller/quantization/clipped_linear.py</code> there are examples of l
   
     <div class="rst-footer-buttons" role="navigation" aria-label="footer navigation">
       
-        <a href="../tutorial-struct_pruning/index.html" class="btn btn-neutral float-right" title="Pruning Filters and Channels">Next <span class="icon icon-circle-arrow-right"></span></a>
+        <a href="tutorial-struct_pruning.html" class="btn btn-neutral float-right" title="Pruning Filters and Channels">Next <span class="icon icon-circle-arrow-right"></span></a>
       
       
-        <a href="../jupyter/index.html" class="btn btn-neutral" title="Jupyter Notebooks"><span class="icon icon-circle-arrow-left"></span> Previous</a>
+        <a href="jupyter.html" class="btn btn-neutral" title="Jupyter Notebooks"><span class="icon icon-circle-arrow-left"></span> Previous</a>
       
     </div>
   
@@ -317,18 +318,17 @@ In <code>distiller/quantization/clipped_linear.py</code> there are examples of l
     <span class="rst-current-version" data-toggle="rst-current-version">
       
       
-        <span><a href="../jupyter/index.html" style="color: #fcfcfc;">&laquo; Previous</a></span>
+        <span><a href="jupyter.html" style="color: #fcfcfc;">&laquo; Previous</a></span>
       
       
-        <span style="margin-left: 15px"><a href="../tutorial-struct_pruning/index.html" style="color: #fcfcfc">Next &raquo;</a></span>
+        <span style="margin-left: 15px"><a href="tutorial-struct_pruning.html" style="color: #fcfcfc">Next &raquo;</a></span>
       
     </span>
 </div>
-    <script>var base_url = '..';</script>
-    <script src="../js/theme.js"></script>
-      <script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML"></script>
-      <script src="../search/require.js"></script>
-      <script src="../search/search.js"></script>
+    <script>var base_url = '.';</script>
+    <script src="js/theme.js" defer></script>
+      <script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML" defer></script>
+      <script src="search/main.js" defer></script>
 
 </body>
 </html>
diff --git a/docs/earlyexit.html b/docs/earlyexit.html
new file mode 100644
index 0000000000000000000000000000000000000000..248004587fdb8515456e6f08fc77a1d281c75c2e
--- /dev/null
+++ b/docs/earlyexit.html
@@ -0,0 +1,253 @@
+<!DOCTYPE html>
+<!--[if IE 8]><html class="no-js lt-ie9" lang="en" > <![endif]-->
+<!--[if gt IE 8]><!--> <html class="no-js" lang="en" > <!--<![endif]-->
+<head>
+  <meta charset="utf-8">
+  <meta http-equiv="X-UA-Compatible" content="IE=edge">
+  <meta name="viewport" content="width=device-width, initial-scale=1.0">
+  
+  
+  <link rel="shortcut icon" href="img/favicon.ico">
+  <title>Early Exit Inference - Neural Network Distiller</title>
+  <link href='https://fonts.googleapis.com/css?family=Lato:400,700|Roboto+Slab:400,700|Inconsolata:400,700' rel='stylesheet' type='text/css'>
+
+  <link rel="stylesheet" href="css/theme.css" type="text/css" />
+  <link rel="stylesheet" href="css/theme_extra.css" type="text/css" />
+  <link rel="stylesheet" href="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/github.min.css">
+  <link href="extra.css" rel="stylesheet">
+  
+  <script>
+    // Current page data
+    var mkdocs_page_name = "Early Exit Inference";
+    var mkdocs_page_input_path = "earlyexit.md";
+    var mkdocs_page_url = null;
+  </script>
+  
+  <script src="js/jquery-2.1.1.min.js" defer></script>
+  <script src="js/modernizr-2.8.3.min.js" defer></script>
+  <script src="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/highlight.min.js"></script>
+  <script>hljs.initHighlightingOnLoad();</script> 
+  
+</head>
+
+<body class="wy-body-for-nav" role="document">
+
+  <div class="wy-grid-for-nav">
+
+    
+    <nav data-toggle="wy-nav-shift" class="wy-nav-side stickynav">
+      <div class="wy-side-nav-search">
+        <a href="index.html" class="icon icon-home"> Neural Network Distiller</a>
+        <div role="search">
+  <form id ="rtd-search-form" class="wy-form" action="./search.html" method="get">
+    <input type="text" name="q" placeholder="Search docs" title="Type search term here" />
+  </form>
+</div>
+      </div>
+
+      <div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+	<ul class="current">
+	  
+          
+            <li class="toctree-l1">
+		
+    <a class="" href="index.html">Home</a>
+	    </li>
+          
+            <li class="toctree-l1">
+		
+    <a class="" href="install.html">Installation</a>
+	    </li>
+          
+            <li class="toctree-l1">
+		
+    <a class="" href="usage.html">Usage</a>
+	    </li>
+          
+            <li class="toctree-l1">
+		
+    <a class="" href="schedule.html">Compression Scheduling</a>
+	    </li>
+          
+            <li class="toctree-l1">
+		
+    <span class="caption-text">Compressing Models</span>
+    <ul class="subnav">
+                <li class="">
+                    
+    <a class="" href="pruning.html">Pruning</a>
+                </li>
+                <li class="">
+                    
+    <a class="" href="regularization.html">Regularization</a>
+                </li>
+                <li class="">
+                    
+    <a class="" href="quantization.html">Quantization</a>
+                </li>
+                <li class="">
+                    
+    <a class="" href="knowledge_distillation.html">Knowledge Distillation</a>
+                </li>
+                <li class="">
+                    
+    <a class="" href="conditional_computation.html">Conditional Computation</a>
+                </li>
+    </ul>
+	    </li>
+          
+            <li class="toctree-l1">
+		
+    <span class="caption-text">Algorithms</span>
+    <ul class="subnav">
+                <li class="">
+                    
+    <a class="" href="algo_pruning.html">Pruning</a>
+                </li>
+                <li class="">
+                    
+    <a class="" href="algo_quantization.html">Quantization</a>
+                </li>
+                <li class="">
+                    
+    <a class="" href="algo_earlyexit.html">Early Exit</a>
+                </li>
+    </ul>
+	    </li>
+          
+            <li class="toctree-l1">
+		
+    <a class="" href="model_zoo.html">Model Zoo</a>
+	    </li>
+          
+            <li class="toctree-l1">
+		
+    <a class="" href="jupyter.html">Jupyter Notebooks</a>
+	    </li>
+          
+            <li class="toctree-l1">
+		
+    <a class="" href="design.html">Design</a>
+	    </li>
+          
+            <li class="toctree-l1">
+		
+    <span class="caption-text">Tutorials</span>
+    <ul class="subnav">
+                <li class="">
+                    
+    <a class="" href="tutorial-struct_pruning.html">Pruning Filters and Channels</a>
+                </li>
+                <li class="">
+                    
+    <a class="" href="tutorial-lang_model.html">Pruning a Language Model</a>
+                </li>
+    </ul>
+	    </li>
+          
+        </ul>
+      </div>
+      &nbsp;
+    </nav>
+
+    <section data-toggle="wy-nav-shift" class="wy-nav-content-wrap">
+
+      
+      <nav class="wy-nav-top" role="navigation" aria-label="top navigation">
+        <i data-toggle="wy-nav-top" class="fa fa-bars"></i>
+        <a href="index.html">Neural Network Distiller</a>
+      </nav>
+
+      
+      <div class="wy-nav-content">
+        <div class="rst-content">
+          <div role="navigation" aria-label="breadcrumbs navigation">
+  <ul class="wy-breadcrumbs">
+    <li><a href="index.html">Docs</a> &raquo;</li>
+    
+      
+    
+    <li>Early Exit Inference</li>
+    <li class="wy-breadcrumbs-aside">
+      
+    </li>
+  </ul>
+  <hr/>
+</div>
+          <div role="main">
+            <div class="section">
+              
+                <h1 id="early-exit-inference">Early Exit Inference</h1>
+<p>While deep neural networks benefit from a large number of layers, it's often the case that many data points in classification tasks can be classified accurately with much less work. There have been several recent studies on the idea of exiting before the normal endpoint of the neural network. Panda et al. in <a href="#panda">Conditional Deep Learning for Energy-Efficient and Enhanced Pattern Recognition</a> point out that many data points can be classified easily, requiring less processing than the more difficult points, and view this in terms of power savings. Teerapittayanon et al. in <a href="#branchynet">BranchyNet: Fast Inference via Early Exiting from Deep Neural Networks</a> look at a selective approach to exit placement and the criteria for exiting early.</p>
+<h2 id="why-does-early-exit-work">Why Does Early Exit Work?</h2>
+<p>Early Exit is a strategy with a straightforward and easy-to-understand concept. The figure below shows a simple example in a 2-D feature space. While deep networks can represent more complex and expressive boundaries between classes (assuming we're confident of avoiding over-fitting the data), it's also clear that much of the data can be properly classified with even the simplest of classification boundaries.</p>
+<p><img alt="Figure !fig(boundaries): Simple and more expressive classification boundaries" src="/docs-src/docs/imgs/decision_boundary.png" /></p>
+<p>Data points far from the boundary can be considered "easy to classify" and achieve a high degree of confidence more quickly than data points close to the boundary. In fact, we can think of the area between the outer straight lines as the region that is "difficult to classify", requiring the full expressiveness of the neural network to classify accurately.</p>
+<h2 id="example-code-for-early-exit">Example code for Early Exit</h2>
+<p>Both the CIFAR10 and ImageNet code come directly from publicly available PyTorch examples. The only edits are the exits, which are inserted in a methodology similar to the BranchyNet work.</p>
+<p>Deeper networks can benefit from multiple exits. Our examples illustrate a single early exit for CIFAR10 and a pair of early exits for ImageNet.</p>
+<p>Note that this code does not actually take the exits. What it does is compute statistics of loss and accuracy assuming the exits were taken whenever the criteria are met. Actually taking the exits at inference time can be tricky and architecture-dependent, and we plan to address these issues.</p>
+<h3 id="heuristics">Heuristics</h3>
+<p>The insertion of the exits is ad hoc, but some heuristic principles guide their placement and parameters. The earlier an exit is placed, the more aggressive it is, as it essentially prunes the rest of the network at a very early stage and thus saves a lot of work. However, a diminishing percentage of the data can be directed through such an exit if we are to preserve accuracy.</p>
+<p>Adding exits has another benefit: during training, the modified network receives backpropagation losses from the exits, which affect the earlier layers more substantially than the loss from the final exit alone. This effect mitigates problems such as vanishing gradients.</p>
+<h3 id="early-exit-hyperparameters">Early Exit Hyperparameters</h3>
+<p>Two parameters are required to enable Early Exit. Leave them undefined if you are not enabling it:</p>
+<ol>
+<li>
+<p><strong>--earlyexit_thresholds</strong> defines the thresholds for each of the early exits. The cross-entropy measure must be <strong>less than</strong> the specified threshold to take a specific exit; otherwise the data continues along the regular path. For example, specifying "--earlyexit_thresholds 0.9 1.2" implies two early exits with thresholds of 0.9 and 1.2, respectively.</p>
+</li>
+<li>
+<p><strong>--earlyexit_lossweights</strong> provides the weights for the linear combination of losses used during training to compute a single, overall loss. We only specify weights for the early exits, and assume that the weights (including the final exit) sum to 1.0. So, for example, "--earlyexit_lossweights 0.2 0.3" implies two early exits weighted 0.2 and 0.3, respectively, with the final exit weighted 1.0 - (0.2 + 0.3) = 0.5 (see the sketch after this list). Studies have shown that weighting the early exits more heavily creates more aggressive early exits, but perhaps with a slight negative effect on accuracy.</p>
+</li>
+</ol>
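+<p>To make the loss-weight arithmetic concrete, here is a minimal sketch (not Distiller's actual implementation; <code>combine_exit_losses</code> and <code>exit_losses</code> are illustrative names) of how the overall training loss can be formed from the per-exit losses:</p>
+<pre><code class="python">def combine_exit_losses(exit_losses, final_loss, lossweights=(0.2, 0.3)):
+    # One weight per early exit; the final exit implicitly gets the remainder.
+    assert len(exit_losses) == len(lossweights)
+    final_weight = 1.0 - sum(lossweights)   # e.g. 1.0 - (0.2 + 0.3) = 0.5
+    total = final_weight * final_loss
+    for weight, loss in zip(lossweights, exit_losses):
+        total += weight * loss
+    return total
+</code></pre>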
+<h3 id="output-stats">Output Stats</h3>
+<p>The example code outputs various statistics regarding the loss and accuracy at each of the exits. During training, the Top1 and Top5 stats represent the accuracy as if all of the data were forced out through that exit (in order to compute the loss at that exit). During inference (i.e. the validation and test stages), the Top1 and Top5 stats represent the accuracy for those data points that could exit, because the calculated entropy at that exit was lower than the specified threshold for that exit.</p>
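+<p>For reference, a minimal sketch of the exit criterion described above (assuming a single sample's logits at an exit; <code>should_take_exit</code> is an illustrative name, not Distiller's API):</p>
+<pre><code class="python">import torch
+import torch.nn.functional as F
+
+def should_take_exit(exit_logits, threshold):
+    # Entropy of the predicted class distribution at this exit; the sample
+    # exits early only if the entropy is below the exit's threshold.
+    probs = F.softmax(exit_logits, dim=-1)
+    entropy = -(probs * torch.log(probs + 1e-12)).sum()
+    return entropy.item() &lt; threshold
+</code></pre>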
+<h3 id="cifar10">CIFAR10</h3>
+<p>In the case of CIFAR10, we have inserted a single exit after the first full layer grouping. The exit path itself includes a convolutional layer and a fully connected layer. If you move the exit, be sure to match the input and output sizes of the exit layers; the sketch below illustrates the shape of such a branch.</p>
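+<p>For illustration only, an exit branch of this shape might look like the following sketch (the layer sizes are hypothetical and must be matched to wherever the exit is attached):</p>
+<pre><code class="python">import torch.nn as nn
+
+class ExitBranch(nn.Module):
+    # Hypothetical early-exit path: one conv layer, then a classifier.
+    def __init__(self, in_channels, num_classes=10):
+        super(ExitBranch, self).__init__()
+        self.conv = nn.Conv2d(in_channels, 32, kernel_size=3, padding=1)
+        self.pool = nn.AdaptiveAvgPool2d(1)
+        self.fc = nn.Linear(32, num_classes)
+
+    def forward(self, x):
+        x = self.pool(self.conv(x))
+        return self.fc(x.view(x.size(0), -1))
+</code></pre>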
+<h3 id="imagenet">Imagenet</h3>
+<p>This example supports training and inference on the ImageNet dataset via several well-known deep architectures. ResNet-50 is the architecture of interest in this study; however, the exit is defined in the generic ResNet code and could be used with other ResNet sizes. Two exits are inserted in this example. Again, the exit layers must have properly matching input and output sizes.</p>
+<h2 id="references">References</h2>
+<p><div id="panda"></div> <strong>Priyadarshini Panda, Abhronil Sengupta, Kaushik Roy</strong>.
+    <a href="https://arxiv.org/abs/1509.08971v6"><em>Conditional Deep Learning for Energy-Efficient and Enhanced Pattern Recognition</em></a>, arXiv:1509.08971v6, 2017.</p>
+<div id="branchynet"></div>
+
+<p><strong>Surat Teerapittayanon, Bradley McDanel, H. T. Kung</strong>.
+    <a href="http://arxiv.org/abs/1709.01686"><em>BranchyNet: Fast Inference via Early Exiting from Deep Neural Networks</em></a>, arXiv:1709.01686, 2017.</p>
+              
+            </div>
+          </div>
+          <footer>
+  
+
+  <hr/>
+
+  <div role="contentinfo">
+    <!-- Copyright etc -->
+    
+  </div>
+
+  Built with <a href="http://www.mkdocs.org">MkDocs</a> using a <a href="https://github.com/snide/sphinx_rtd_theme">theme</a> provided by <a href="https://readthedocs.org">Read the Docs</a>.
+</footer>
+      
+        </div>
+      </div>
+
+    </section>
+
+  </div>
+
+  <div class="rst-versions" role="note" style="cursor: pointer">
+    <span class="rst-current-version" data-toggle="rst-current-version">
+      
+      
+      
+    </span>
+</div>
+    <script>var base_url = '.';</script>
+    <script src="js/theme.js" defer></script>
+      <script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML" defer></script>
+      <script src="search/main.js" defer></script>
+
+</body>
+</html>
diff --git a/docs/index.html b/docs/index.html
index 184335c78e6fb573edba19d18d9285ac7202a16c..7375a9b17b2df266721ff2ab362a98d021fd61ae 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -7,25 +7,26 @@
   <meta name="viewport" content="width=device-width, initial-scale=1.0">
   <meta name="description" content="Distiller Documentation by Intel AI">
   
-  <link rel="shortcut icon" href="./img/favicon.ico">
+  <link rel="shortcut icon" href="img/favicon.ico">
   <title>Home - Neural Network Distiller</title>
   <link href='https://fonts.googleapis.com/css?family=Lato:400,700|Roboto+Slab:400,700|Inconsolata:400,700' rel='stylesheet' type='text/css'>
 
-  <link rel="stylesheet" href="./css/theme.css" type="text/css" />
-  <link rel="stylesheet" href="./css/theme_extra.css" type="text/css" />
-  <link rel="stylesheet" href="./css/highlight.css">
-  <link href="./extra.css" rel="stylesheet">
+  <link rel="stylesheet" href="css/theme.css" type="text/css" />
+  <link rel="stylesheet" href="css/theme_extra.css" type="text/css" />
+  <link rel="stylesheet" href="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/github.min.css">
+  <link href="extra.css" rel="stylesheet">
   
   <script>
     // Current page data
     var mkdocs_page_name = "Home";
     var mkdocs_page_input_path = "index.md";
-    var mkdocs_page_url = "/index.html";
+    var mkdocs_page_url = null;
   </script>
   
-  <script src="./js/jquery-2.1.1.min.js"></script>
-  <script src="./js/modernizr-2.8.3.min.js"></script>
-  <script type="text/javascript" src="./js/highlight.pack.js"></script> 
+  <script src="js/jquery-2.1.1.min.js" defer></script>
+  <script src="js/modernizr-2.8.3.min.js" defer></script>
+  <script src="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/highlight.min.js"></script>
+  <script>hljs.initHighlightingOnLoad();</script> 
   
 </head>
 
@@ -39,7 +40,7 @@
         <a href="index.html" class="icon icon-home"> Neural Network Distiller</a>
         <div role="search">
   <form id ="rtd-search-form" class="wy-form" action="./search.html" method="get">
-    <input type="text" name="q" placeholder="Search docs" />
+    <input type="text" name="q" placeholder="Search docs" title="Type search term here" />
   </form>
 </div>
       </div>
@@ -69,17 +70,17 @@
           
             <li class="toctree-l1">
 		
-    <a class="" href="install/index.html">Installation</a>
+    <a class="" href="install.html">Installation</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="usage/index.html">Usage</a>
+    <a class="" href="usage.html">Usage</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="schedule/index.html">Compression Scheduling</a>
+    <a class="" href="schedule.html">Compression Scheduling</a>
 	    </li>
           
             <li class="toctree-l1">
@@ -88,23 +89,23 @@
     <ul class="subnav">
                 <li class="">
                     
-    <a class="" href="pruning/index.html">Pruning</a>
+    <a class="" href="pruning.html">Pruning</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="regularization/index.html">Regularization</a>
+    <a class="" href="regularization.html">Regularization</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="quantization/index.html">Quantization</a>
+    <a class="" href="quantization.html">Quantization</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="knowledge_distillation/index.html">Knowledge Distillation</a>
+    <a class="" href="knowledge_distillation.html">Knowledge Distillation</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="conditional_computation/index.html">Conditional Computation</a>
+    <a class="" href="conditional_computation.html">Conditional Computation</a>
                 </li>
     </ul>
 	    </li>
@@ -115,32 +116,32 @@
     <ul class="subnav">
                 <li class="">
                     
-    <a class="" href="algo_pruning/index.html">Pruning</a>
+    <a class="" href="algo_pruning.html">Pruning</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="algo_quantization/index.html">Quantization</a>
+    <a class="" href="algo_quantization.html">Quantization</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="algo_earlyexit/index.html">Early Exit</a>
+    <a class="" href="algo_earlyexit.html">Early Exit</a>
                 </li>
     </ul>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="model_zoo/index.html">Model Zoo</a>
+    <a class="" href="model_zoo.html">Model Zoo</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="jupyter/index.html">Jupyter Notebooks</a>
+    <a class="" href="jupyter.html">Jupyter Notebooks</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="design/index.html">Design</a>
+    <a class="" href="design.html">Design</a>
 	    </li>
           
             <li class="toctree-l1">
@@ -149,11 +150,11 @@
     <ul class="subnav">
                 <li class="">
                     
-    <a class="" href="tutorial-struct_pruning/index.html">Pruning Filters and Channels</a>
+    <a class="" href="tutorial-struct_pruning.html">Pruning Filters and Channels</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="tutorial-lang_model/index.html">Pruning a Language Model</a>
+    <a class="" href="tutorial-lang_model.html">Pruning a Language Model</a>
                 </li>
     </ul>
 	    </li>
@@ -230,7 +231,7 @@ And of course, if we used a sparse or compressed representation, then we are red
   
     <div class="rst-footer-buttons" role="navigation" aria-label="footer navigation">
       
-        <a href="install/index.html" class="btn btn-neutral float-right" title="Installation">Next <span class="icon icon-circle-arrow-right"></span></a>
+        <a href="install.html" class="btn btn-neutral float-right" title="Installation">Next <span class="icon icon-circle-arrow-right"></span></a>
       
       
     </div>
@@ -258,20 +259,19 @@ And of course, if we used a sparse or compressed representation, then we are red
       
       
       
-        <span style="margin-left: 15px"><a href="install/index.html" style="color: #fcfcfc">Next &raquo;</a></span>
+        <span style="margin-left: 15px"><a href="install.html" style="color: #fcfcfc">Next &raquo;</a></span>
       
     </span>
 </div>
     <script>var base_url = '.';</script>
-    <script src="./js/theme.js"></script>
-      <script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML"></script>
-      <script src="./search/require.js"></script>
-      <script src="./search/search.js"></script>
+    <script src="js/theme.js" defer></script>
+      <script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML" defer></script>
+      <script src="search/main.js" defer></script>
 
 </body>
 </html>
 
 <!--
-MkDocs version : 0.17.2
-Build Date UTC : 2019-02-24 16:27:36
+MkDocs version : 1.0.4
+Build Date UTC : 2019-03-28 17:45:12
 -->
diff --git a/docs/install/index.html b/docs/install.html
similarity index 74%
rename from docs/install/index.html
rename to docs/install.html
index 5c7e6b0d618baff97ec48626634ba1a338ba16e1..f03207db079c62a3c334e918680b573c2d9b6446 100644
--- a/docs/install/index.html
+++ b/docs/install.html
@@ -7,25 +7,26 @@
   <meta name="viewport" content="width=device-width, initial-scale=1.0">
   
   
-  <link rel="shortcut icon" href="../img/favicon.ico">
+  <link rel="shortcut icon" href="img/favicon.ico">
   <title>Installation - Neural Network Distiller</title>
   <link href='https://fonts.googleapis.com/css?family=Lato:400,700|Roboto+Slab:400,700|Inconsolata:400,700' rel='stylesheet' type='text/css'>
 
-  <link rel="stylesheet" href="../css/theme.css" type="text/css" />
-  <link rel="stylesheet" href="../css/theme_extra.css" type="text/css" />
-  <link rel="stylesheet" href="../css/highlight.css">
-  <link href="../extra.css" rel="stylesheet">
+  <link rel="stylesheet" href="css/theme.css" type="text/css" />
+  <link rel="stylesheet" href="css/theme_extra.css" type="text/css" />
+  <link rel="stylesheet" href="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/github.min.css">
+  <link href="extra.css" rel="stylesheet">
   
   <script>
     // Current page data
     var mkdocs_page_name = "Installation";
     var mkdocs_page_input_path = "install.md";
-    var mkdocs_page_url = "/install/index.html";
+    var mkdocs_page_url = null;
   </script>
   
-  <script src="../js/jquery-2.1.1.min.js"></script>
-  <script src="../js/modernizr-2.8.3.min.js"></script>
-  <script type="text/javascript" src="../js/highlight.pack.js"></script> 
+  <script src="js/jquery-2.1.1.min.js" defer></script>
+  <script src="js/modernizr-2.8.3.min.js" defer></script>
+  <script src="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/highlight.min.js"></script>
+  <script>hljs.initHighlightingOnLoad();</script> 
   
 </head>
 
@@ -36,10 +37,10 @@
     
     <nav data-toggle="wy-nav-shift" class="wy-nav-side stickynav">
       <div class="wy-side-nav-search">
-        <a href="../index.html" class="icon icon-home"> Neural Network Distiller</a>
+        <a href="index.html" class="icon icon-home"> Neural Network Distiller</a>
         <div role="search">
-  <form id ="rtd-search-form" class="wy-form" action="../search.html" method="get">
-    <input type="text" name="q" placeholder="Search docs" />
+  <form id ="rtd-search-form" class="wy-form" action="./search.html" method="get">
+    <input type="text" name="q" placeholder="Search docs" title="Type search term here" />
   </form>
 </div>
       </div>
@@ -50,12 +51,12 @@
           
             <li class="toctree-l1">
 		
-    <a class="" href="../index.html">Home</a>
+    <a class="" href="index.html">Home</a>
 	    </li>
           
             <li class="toctree-l1 current">
 		
-    <a class="current" href="index.html">Installation</a>
+    <a class="current" href="install.html">Installation</a>
     <ul class="subnav">
             
     <li class="toctree-l2"><a href="#distiller-installation">Distiller Installation</a></li>
@@ -76,12 +77,12 @@
           
             <li class="toctree-l1">
 		
-    <a class="" href="../usage/index.html">Usage</a>
+    <a class="" href="usage.html">Usage</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../schedule/index.html">Compression Scheduling</a>
+    <a class="" href="schedule.html">Compression Scheduling</a>
 	    </li>
           
             <li class="toctree-l1">
@@ -90,23 +91,23 @@
     <ul class="subnav">
                 <li class="">
                     
-    <a class="" href="../pruning/index.html">Pruning</a>
+    <a class="" href="pruning.html">Pruning</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../regularization/index.html">Regularization</a>
+    <a class="" href="regularization.html">Regularization</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../quantization/index.html">Quantization</a>
+    <a class="" href="quantization.html">Quantization</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../knowledge_distillation/index.html">Knowledge Distillation</a>
+    <a class="" href="knowledge_distillation.html">Knowledge Distillation</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../conditional_computation/index.html">Conditional Computation</a>
+    <a class="" href="conditional_computation.html">Conditional Computation</a>
                 </li>
     </ul>
 	    </li>
@@ -117,32 +118,32 @@
     <ul class="subnav">
                 <li class="">
                     
-    <a class="" href="../algo_pruning/index.html">Pruning</a>
+    <a class="" href="algo_pruning.html">Pruning</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../algo_quantization/index.html">Quantization</a>
+    <a class="" href="algo_quantization.html">Quantization</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../algo_earlyexit/index.html">Early Exit</a>
+    <a class="" href="algo_earlyexit.html">Early Exit</a>
                 </li>
     </ul>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../model_zoo/index.html">Model Zoo</a>
+    <a class="" href="model_zoo.html">Model Zoo</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../jupyter/index.html">Jupyter Notebooks</a>
+    <a class="" href="jupyter.html">Jupyter Notebooks</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../design/index.html">Design</a>
+    <a class="" href="design.html">Design</a>
 	    </li>
           
             <li class="toctree-l1">
@@ -151,11 +152,11 @@
     <ul class="subnav">
                 <li class="">
                     
-    <a class="" href="../tutorial-struct_pruning/index.html">Pruning Filters and Channels</a>
+    <a class="" href="tutorial-struct_pruning.html">Pruning Filters and Channels</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../tutorial-lang_model/index.html">Pruning a Language Model</a>
+    <a class="" href="tutorial-lang_model.html">Pruning a Language Model</a>
                 </li>
     </ul>
 	    </li>
@@ -170,7 +171,7 @@
       
       <nav class="wy-nav-top" role="navigation" aria-label="top navigation">
         <i data-toggle="wy-nav-top" class="fa fa-bars"></i>
-        <a href="../index.html">Neural Network Distiller</a>
+        <a href="index.html">Neural Network Distiller</a>
       </nav>
 
       
@@ -178,7 +179,7 @@
         <div class="rst-content">
           <div role="navigation" aria-label="breadcrumbs navigation">
   <ul class="wy-breadcrumbs">
-    <li><a href="../index.html">Docs</a> &raquo;</li>
+    <li><a href="index.html">Docs</a> &raquo;</li>
     
       
     
@@ -251,10 +252,10 @@ $ pip3 install -e .
   
     <div class="rst-footer-buttons" role="navigation" aria-label="footer navigation">
       
-        <a href="../usage/index.html" class="btn btn-neutral float-right" title="Usage">Next <span class="icon icon-circle-arrow-right"></span></a>
+        <a href="usage.html" class="btn btn-neutral float-right" title="Usage">Next <span class="icon icon-circle-arrow-right"></span></a>
       
       
-        <a href="../index.html" class="btn btn-neutral" title="Home"><span class="icon icon-circle-arrow-left"></span> Previous</a>
+        <a href="index.html" class="btn btn-neutral" title="Home"><span class="icon icon-circle-arrow-left"></span> Previous</a>
       
     </div>
   
@@ -280,18 +281,17 @@ $ pip3 install -e .
     <span class="rst-current-version" data-toggle="rst-current-version">
       
       
-        <span><a href="../index.html" style="color: #fcfcfc;">&laquo; Previous</a></span>
+        <span><a href="index.html" style="color: #fcfcfc;">&laquo; Previous</a></span>
       
       
-        <span style="margin-left: 15px"><a href="../usage/index.html" style="color: #fcfcfc">Next &raquo;</a></span>
+        <span style="margin-left: 15px"><a href="usage.html" style="color: #fcfcfc">Next &raquo;</a></span>
       
     </span>
 </div>
-    <script>var base_url = '..';</script>
-    <script src="../js/theme.js"></script>
-      <script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML"></script>
-      <script src="../search/require.js"></script>
-      <script src="../search/search.js"></script>
+    <script>var base_url = '.';</script>
+    <script src="js/theme.js" defer></script>
+      <script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML" defer></script>
+      <script src="search/main.js" defer></script>
 
 </body>
 </html>
diff --git a/docs/js/highlight.pack.js b/docs/js/highlight.pack.js
deleted file mode 100644
index a5818dfb2fc8fdbc582d65fc1ca4c20ebeb532c1..0000000000000000000000000000000000000000
--- a/docs/js/highlight.pack.js
+++ /dev/null
@@ -1,2 +0,0 @@
-!function(e){"undefined"!=typeof exports?e(exports):(window.hljs=e({}),"function"==typeof define&&define.amd&&define([],function(){return window.hljs}))}(function(e){function n(e){return e.replace(/&/gm,"&amp;").replace(/</gm,"&lt;").replace(/>/gm,"&gt;")}function t(e){return e.nodeName.toLowerCase()}function r(e,n){var t=e&&e.exec(n);return t&&0==t.index}function a(e){var n=(e.className+" "+(e.parentNode?e.parentNode.className:"")).split(/\s+/);return n=n.map(function(e){return e.replace(/^lang(uage)?-/,"")}),n.filter(function(e){return N(e)||/no(-?)highlight|plain|text/.test(e)})[0]}function i(e,n){var t,r={};for(t in e)r[t]=e[t];if(n)for(t in n)r[t]=n[t];return r}function o(e){var n=[];return function r(e,a){for(var i=e.firstChild;i;i=i.nextSibling)3==i.nodeType?a+=i.nodeValue.length:1==i.nodeType&&(n.push({event:"start",offset:a,node:i}),a=r(i,a),t(i).match(/br|hr|img|input/)||n.push({event:"stop",offset:a,node:i}));return a}(e,0),n}function u(e,r,a){function i(){return e.length&&r.length?e[0].offset!=r[0].offset?e[0].offset<r[0].offset?e:r:"start"==r[0].event?e:r:e.length?e:r}function o(e){function r(e){return" "+e.nodeName+'="'+n(e.value)+'"'}l+="<"+t(e)+Array.prototype.map.call(e.attributes,r).join("")+">"}function u(e){l+="</"+t(e)+">"}function c(e){("start"==e.event?o:u)(e.node)}for(var s=0,l="",f=[];e.length||r.length;){var g=i();if(l+=n(a.substr(s,g[0].offset-s)),s=g[0].offset,g==e){f.reverse().forEach(u);do c(g.splice(0,1)[0]),g=i();while(g==e&&g.length&&g[0].offset==s);f.reverse().forEach(o)}else"start"==g[0].event?f.push(g[0].node):f.pop(),c(g.splice(0,1)[0])}return l+n(a.substr(s))}function c(e){function n(e){return e&&e.source||e}function t(t,r){return new RegExp(n(t),"m"+(e.cI?"i":"")+(r?"g":""))}function r(a,o){if(!a.compiled){if(a.compiled=!0,a.k=a.k||a.bK,a.k){var u={},c=function(n,t){e.cI&&(t=t.toLowerCase()),t.split(" ").forEach(function(e){var t=e.split("|");u[t[0]]=[n,t[1]?Number(t[1]):1]})};"string"==typeof a.k?c("keyword",a.k):Object.keys(a.k).forEach(function(e){c(e,a.k[e])}),a.k=u}a.lR=t(a.l||/\b\w+\b/,!0),o&&(a.bK&&(a.b="\\b("+a.bK.split(" ").join("|")+")\\b"),a.b||(a.b=/\B|\b/),a.bR=t(a.b),a.e||a.eW||(a.e=/\B|\b/),a.e&&(a.eR=t(a.e)),a.tE=n(a.e)||"",a.eW&&o.tE&&(a.tE+=(a.e?"|":"")+o.tE)),a.i&&(a.iR=t(a.i)),void 0===a.r&&(a.r=1),a.c||(a.c=[]);var s=[];a.c.forEach(function(e){e.v?e.v.forEach(function(n){s.push(i(e,n))}):s.push("self"==e?a:e)}),a.c=s,a.c.forEach(function(e){r(e,a)}),a.starts&&r(a.starts,o);var l=a.c.map(function(e){return e.bK?"\\.?("+e.b+")\\.?":e.b}).concat([a.tE,a.i]).map(n).filter(Boolean);a.t=l.length?t(l.join("|"),!0):{exec:function(){return null}}}}r(e)}function s(e,t,a,i){function o(e,n){for(var t=0;t<n.c.length;t++)if(r(n.c[t].bR,e))return n.c[t]}function u(e,n){if(r(e.eR,n)){for(;e.endsParent&&e.parent;)e=e.parent;return e}return e.eW?u(e.parent,n):void 0}function f(e,n){return!a&&r(n.iR,e)}function g(e,n){var t=E.cI?n[0].toLowerCase():n[0];return e.k.hasOwnProperty(t)&&e.k[t]}function p(e,n,t,r){var a=r?"":x.classPrefix,i='<span class="'+a,o=t?"":"</span>";return i+=e+'">',i+n+o}function d(){if(!L.k)return n(y);var e="",t=0;L.lR.lastIndex=0;for(var r=L.lR.exec(y);r;){e+=n(y.substr(t,r.index-t));var a=g(L,r);a?(B+=a[1],e+=p(a[0],n(r[0]))):e+=n(r[0]),t=L.lR.lastIndex,r=L.lR.exec(y)}return e+n(y.substr(t))}function h(){if(L.sL&&!w[L.sL])return n(y);var e=L.sL?s(L.sL,y,!0,M[L.sL]):l(y);return L.r>0&&(B+=e.r),"continuous"==L.subLanguageMode&&(M[L.sL]=e.top),p(e.language,e.value,!1,!0)}function b(){return void 0!==L.sL?h():d()}function 
v(e,t){var r=e.cN?p(e.cN,"",!0):"";e.rB?(k+=r,y=""):e.eB?(k+=n(t)+r,y=""):(k+=r,y=t),L=Object.create(e,{parent:{value:L}})}function m(e,t){if(y+=e,void 0===t)return k+=b(),0;var r=o(t,L);if(r)return k+=b(),v(r,t),r.rB?0:t.length;var a=u(L,t);if(a){var i=L;i.rE||i.eE||(y+=t),k+=b();do L.cN&&(k+="</span>"),B+=L.r,L=L.parent;while(L!=a.parent);return i.eE&&(k+=n(t)),y="",a.starts&&v(a.starts,""),i.rE?0:t.length}if(f(t,L))throw new Error('Illegal lexeme "'+t+'" for mode "'+(L.cN||"<unnamed>")+'"');return y+=t,t.length||1}var E=N(e);if(!E)throw new Error('Unknown language: "'+e+'"');c(E);var R,L=i||E,M={},k="";for(R=L;R!=E;R=R.parent)R.cN&&(k=p(R.cN,"",!0)+k);var y="",B=0;try{for(var C,j,I=0;;){if(L.t.lastIndex=I,C=L.t.exec(t),!C)break;j=m(t.substr(I,C.index-I),C[0]),I=C.index+j}for(m(t.substr(I)),R=L;R.parent;R=R.parent)R.cN&&(k+="</span>");return{r:B,value:k,language:e,top:L}}catch(S){if(-1!=S.message.indexOf("Illegal"))return{r:0,value:n(t)};throw S}}function l(e,t){t=t||x.languages||Object.keys(w);var r={r:0,value:n(e)},a=r;return t.forEach(function(n){if(N(n)){var t=s(n,e,!1);t.language=n,t.r>a.r&&(a=t),t.r>r.r&&(a=r,r=t)}}),a.language&&(r.second_best=a),r}function f(e){return x.tabReplace&&(e=e.replace(/^((<[^>]+>|\t)+)/gm,function(e,n){return n.replace(/\t/g,x.tabReplace)})),x.useBR&&(e=e.replace(/\n/g,"<br>")),e}function g(e,n,t){var r=n?E[n]:t,a=[e.trim()];return e.match(/\bhljs\b/)||a.push("hljs"),-1===e.indexOf(r)&&a.push(r),a.join(" ").trim()}function p(e){var n=a(e);if(!/no(-?)highlight|plain|text/.test(n)){var t;x.useBR?(t=document.createElementNS("http://www.w3.org/1999/xhtml","div"),t.innerHTML=e.innerHTML.replace(/\n/g,"").replace(/<br[ \/]*>/g,"\n")):t=e;var r=t.textContent,i=n?s(n,r,!0):l(r),c=o(t);if(c.length){var p=document.createElementNS("http://www.w3.org/1999/xhtml","div");p.innerHTML=i.value,i.value=u(c,o(p),r)}i.value=f(i.value),e.innerHTML=i.value,e.className=g(e.className,n,i.language),e.result={language:i.language,re:i.r},i.second_best&&(e.second_best={language:i.second_best.language,re:i.second_best.r})}}function d(e){x=i(x,e)}function h(){if(!h.called){h.called=!0;var e=document.querySelectorAll("pre code");Array.prototype.forEach.call(e,p)}}function b(){addEventListener("DOMContentLoaded",h,!1),addEventListener("load",h,!1)}function v(n,t){var r=w[n]=t(e);r.aliases&&r.aliases.forEach(function(e){E[e]=n})}function m(){return Object.keys(w)}function N(e){return w[e]||w[E[e]]}var x={classPrefix:"hljs-",tabReplace:null,useBR:!1,languages:void 0},w={},E={};return e.highlight=s,e.highlightAuto=l,e.fixMarkup=f,e.highlightBlock=p,e.configure=d,e.initHighlighting=h,e.initHighlightingOnLoad=b,e.registerLanguage=v,e.listLanguages=m,e.getLanguage=N,e.inherit=i,e.IR="[a-zA-Z]\\w*",e.UIR="[a-zA-Z_]\\w*",e.NR="\\b\\d+(\\.\\d+)?",e.CNR="\\b(0[xX][a-fA-F0-9]+|(\\d+(\\.\\d*)?|\\.\\d+)([eE][-+]?\\d+)?)",e.BNR="\\b(0b[01]+)",e.RSR="!|!=|!==|%|%=|&|&&|&=|\\*|\\*=|\\+|\\+=|,|-|-=|/=|/|:|;|<<|<<=|<=|<|===|==|=|>>>=|>>=|>=|>>>|>>|>|\\?|\\[|\\{|\\(|\\^|\\^=|\\||\\|=|\\|\\||~",e.BE={b:"\\\\[\\s\\S]",r:0},e.ASM={cN:"string",b:"'",e:"'",i:"\\n",c:[e.BE]},e.QSM={cN:"string",b:'"',e:'"',i:"\\n",c:[e.BE]},e.PWM={b:/\b(a|an|the|are|I|I'm|isn't|don't|doesn't|won't|but|just|should|pretty|simply|enough|gonna|going|wtf|so|such)\b/},e.C=function(n,t,r){var a=e.inherit({cN:"comment",b:n,e:t,c:[]},r||{});return 
a.c.push(e.PWM),a},e.CLCM=e.C("//","$"),e.CBCM=e.C("/\\*","\\*/"),e.HCM=e.C("#","$"),e.NM={cN:"number",b:e.NR,r:0},e.CNM={cN:"number",b:e.CNR,r:0},e.BNM={cN:"number",b:e.BNR,r:0},e.CSSNM={cN:"number",b:e.NR+"(%|em|ex|ch|rem|vw|vh|vmin|vmax|cm|mm|in|pt|pc|px|deg|grad|rad|turn|s|ms|Hz|kHz|dpi|dpcm|dppx)?",r:0},e.RM={cN:"regexp",b:/\//,e:/\/[gimuy]*/,i:/\n/,c:[e.BE,{b:/\[/,e:/\]/,r:0,c:[e.BE]}]},e.TM={cN:"title",b:e.IR,r:0},e.UTM={cN:"title",b:e.UIR,r:0},e});hljs.registerLanguage("objectivec",function(e){var t={cN:"built_in",b:"(AV|CA|CF|CG|CI|MK|MP|NS|UI)\\w+"},i={keyword:"int float while char export sizeof typedef const struct for union unsigned long volatile static bool mutable if do return goto void enum else break extern asm case short default double register explicit signed typename this switch continue wchar_t inline readonly assign readwrite self @synchronized id typeof nonatomic super unichar IBOutlet IBAction strong weak copy in out inout bycopy byref oneway __strong __weak __block __autoreleasing @private @protected @public @try @property @end @throw @catch @finally @autoreleasepool @synthesize @dynamic @selector @optional @required",literal:"false true FALSE TRUE nil YES NO NULL",built_in:"BOOL dispatch_once_t dispatch_queue_t dispatch_sync dispatch_async dispatch_once"},o=/[a-zA-Z@][a-zA-Z0-9_]*/,n="@interface @class @protocol @implementation";return{aliases:["m","mm","objc","obj-c"],k:i,l:o,i:"</",c:[t,e.CLCM,e.CBCM,e.CNM,e.QSM,{cN:"string",v:[{b:'@"',e:'"',i:"\\n",c:[e.BE]},{b:"'",e:"[^\\\\]'",i:"[^\\\\][^']"}]},{cN:"preprocessor",b:"#",e:"$",c:[{cN:"title",v:[{b:'"',e:'"'},{b:"<",e:">"}]}]},{cN:"class",b:"("+n.split(" ").join("|")+")\\b",e:"({|$)",eE:!0,k:n,l:o,c:[e.UTM]},{cN:"variable",b:"\\."+e.UIR,r:0}]}});hljs.registerLanguage("sql",function(e){var t=e.C("--","$");return{cI:!0,i:/[<>]/,c:[{cN:"operator",bK:"begin end start commit rollback savepoint lock alter create drop rename call delete do handler insert load replace select truncate update set show pragma grant merge describe use explain help declare prepare execute deallocate savepoint release unlock purge reset change stop analyze cache flush optimize repair kill install uninstall checksum restore check backup revoke",e:/;/,eW:!0,k:{keyword:"abs absolute acos action add adddate addtime aes_decrypt aes_encrypt after aggregate all allocate alter analyze and any are as asc ascii asin assertion at atan atan2 atn2 authorization authors avg backup before begin benchmark between bin binlog bit_and bit_count bit_length bit_or bit_xor both by cache call cascade cascaded case cast catalog ceil ceiling chain change changed char_length character_length charindex charset check checksum checksum_agg choose close coalesce coercibility collate collation collationproperty column columns columns_updated commit compress concat concat_ws concurrent connect connection connection_id consistent constraint constraints continue contributors conv convert convert_tz corresponding cos cot count count_big crc32 create cross cume_dist curdate current current_date current_time current_timestamp current_user cursor curtime data database databases datalength date_add date_format date_sub dateadd datediff datefromparts datename datepart datetime2fromparts datetimeoffsetfromparts day dayname dayofmonth dayofweek dayofyear deallocate declare decode default deferrable deferred degrees delayed delete des_decrypt des_encrypt des_key_file desc describe descriptor diagnostics difference disconnect distinct distinctrow div do domain double drop dumpfile each 
else elt enclosed encode encrypt end end-exec engine engines eomonth errors escape escaped event eventdata events except exception exec execute exists exp explain export_set extended external extract fast fetch field fields find_in_set first first_value floor flush for force foreign format found found_rows from from_base64 from_days from_unixtime full function get get_format get_lock getdate getutcdate global go goto grant grants greatest group group_concat grouping grouping_id gtid_subset gtid_subtract handler having help hex high_priority hosts hour ident_current ident_incr ident_seed identified identity if ifnull ignore iif ilike immediate in index indicator inet6_aton inet6_ntoa inet_aton inet_ntoa infile initially inner innodb input insert install instr intersect into is is_free_lock is_ipv4 is_ipv4_compat is_ipv4_mapped is_not is_not_null is_used_lock isdate isnull isolation join key kill language last last_day last_insert_id last_value lcase lead leading least leaves left len lenght level like limit lines ln load load_file local localtime localtimestamp locate lock log log10 log2 logfile logs low_priority lower lpad ltrim make_set makedate maketime master master_pos_wait match matched max md5 medium merge microsecond mid min minute mod mode module month monthname mutex name_const names national natural nchar next no no_write_to_binlog not now nullif nvarchar oct octet_length of old_password on only open optimize option optionally or ord order outer outfile output pad parse partial partition password patindex percent_rank percentile_cont percentile_disc period_add period_diff pi plugin position pow power pragma precision prepare preserve primary prior privileges procedure procedure_analyze processlist profile profiles public publishingservername purge quarter query quick quote quotename radians rand read references regexp relative relaylog release release_lock rename repair repeat replace replicate reset restore restrict return returns reverse revoke right rlike rollback rollup round row row_count rows rpad rtrim savepoint schema scroll sec_to_time second section select serializable server session session_user set sha sha1 sha2 share show sign sin size slave sleep smalldatetimefromparts snapshot some soname soundex sounds_like space sql sql_big_result sql_buffer_result sql_cache sql_calc_found_rows sql_no_cache sql_small_result sql_variant_property sqlstate sqrt square start starting status std stddev stddev_pop stddev_samp stdev stdevp stop str str_to_date straight_join strcmp string stuff subdate substr substring subtime subtring_index sum switchoffset sysdate sysdatetime sysdatetimeoffset system_user sysutcdatetime table tables tablespace tan temporary terminated tertiary_weights then time time_format time_to_sec timediff timefromparts timestamp timestampadd timestampdiff timezone_hour timezone_minute to to_base64 to_days to_seconds todatetimeoffset trailing transaction translation trigger trigger_nestlevel triggers trim truncate try_cast try_convert try_parse ucase uncompress uncompressed_length unhex unicode uninstall union unique unix_timestamp unknown unlock update upgrade upped upper usage use user user_resources using utc_date utc_time utc_timestamp uuid uuid_short validate_password_strength value values var var_pop var_samp variables variance varp version view warnings week weekday weekofyear weight_string when whenever where with work write xml xor year yearweek zon",literal:"true false null",built_in:"array bigint binary bit blob boolean char character date dec decimal 
float int integer interval number numeric real serial smallint varchar varying int8 serial8 text"},c:[{cN:"string",b:"'",e:"'",c:[e.BE,{b:"''"}]},{cN:"string",b:'"',e:'"',c:[e.BE,{b:'""'}]},{cN:"string",b:"`",e:"`",c:[e.BE]},e.CNM,e.CBCM,t]},e.CBCM,t]}});hljs.registerLanguage("javascript",function(e){return{aliases:["js"],k:{keyword:"in of if for while finally var new function do return void else break catch instanceof with throw case default try this switch continue typeof delete let yield const export super debugger as await",literal:"true false null undefined NaN Infinity",built_in:"eval isFinite isNaN parseFloat parseInt decodeURI decodeURIComponent encodeURI encodeURIComponent escape unescape Object Function Boolean Error EvalError InternalError RangeError ReferenceError StopIteration SyntaxError TypeError URIError Number Math Date String RegExp Array Float32Array Float64Array Int16Array Int32Array Int8Array Uint16Array Uint32Array Uint8Array Uint8ClampedArray ArrayBuffer DataView JSON Intl arguments require module console window document Symbol Set Map WeakSet WeakMap Proxy Reflect Promise"},c:[{cN:"pi",r:10,v:[{b:/^\s*('|")use strict('|")/},{b:/^\s*('|")use asm('|")/}]},e.ASM,e.QSM,{cN:"string",b:"`",e:"`",c:[e.BE,{cN:"subst",b:"\\$\\{",e:"\\}"}]},e.CLCM,e.CBCM,{cN:"number",b:"\\b(0[xXbBoO][a-fA-F0-9]+|(\\d+(\\.\\d*)?|\\.\\d+)([eE][-+]?\\d+)?)",r:0},{b:"("+e.RSR+"|\\b(case|return|throw)\\b)\\s*",k:"return throw case",c:[e.CLCM,e.CBCM,e.RM,{b:/</,e:/>\s*[);\]]/,r:0,sL:"xml"}],r:0},{cN:"function",bK:"function",e:/\{/,eE:!0,c:[e.inherit(e.TM,{b:/[A-Za-z$_][0-9A-Za-z$_]*/}),{cN:"params",b:/\(/,e:/\)/,c:[e.CLCM,e.CBCM],i:/["'\(]/}],i:/\[|%/},{b:/\$[(.]/},{b:"\\."+e.IR,r:0},{bK:"import",e:"[;$]",k:"import from as",c:[e.ASM,e.QSM]},{cN:"class",bK:"class",e:/[{;=]/,eE:!0,i:/[:"\[\]]/,c:[{bK:"extends"},e.UTM]}]}});hljs.registerLanguage("scss",function(e){{var 
t="[a-zA-Z-][a-zA-Z0-9_-]*",i={cN:"variable",b:"(\\$"+t+")\\b"},r={cN:"function",b:t+"\\(",rB:!0,eE:!0,e:"\\("},o={cN:"hexcolor",b:"#[0-9A-Fa-f]+"};({cN:"attribute",b:"[A-Z\\_\\.\\-]+",e:":",eE:!0,i:"[^\\s]",starts:{cN:"value",eW:!0,eE:!0,c:[r,o,e.CSSNM,e.QSM,e.ASM,e.CBCM,{cN:"important",b:"!important"}]}})}return{cI:!0,i:"[=/|']",c:[e.CLCM,e.CBCM,r,{cN:"id",b:"\\#[A-Za-z0-9_-]+",r:0},{cN:"class",b:"\\.[A-Za-z0-9_-]+",r:0},{cN:"attr_selector",b:"\\[",e:"\\]",i:"$"},{cN:"tag",b:"\\b(a|abbr|acronym|address|area|article|aside|audio|b|base|big|blockquote|body|br|button|canvas|caption|cite|code|col|colgroup|command|datalist|dd|del|details|dfn|div|dl|dt|em|embed|fieldset|figcaption|figure|footer|form|frame|frameset|(h[1-6])|head|header|hgroup|hr|html|i|iframe|img|input|ins|kbd|keygen|label|legend|li|link|map|mark|meta|meter|nav|noframes|noscript|object|ol|optgroup|option|output|p|param|pre|progress|q|rp|rt|ruby|samp|script|section|select|small|span|strike|strong|style|sub|sup|table|tbody|td|textarea|tfoot|th|thead|time|title|tr|tt|ul|var|video)\\b",r:0},{cN:"pseudo",b:":(visited|valid|root|right|required|read-write|read-only|out-range|optional|only-of-type|only-child|nth-of-type|nth-last-of-type|nth-last-child|nth-child|not|link|left|last-of-type|last-child|lang|invalid|indeterminate|in-range|hover|focus|first-of-type|first-line|first-letter|first-child|first|enabled|empty|disabled|default|checked|before|after|active)"},{cN:"pseudo",b:"::(after|before|choices|first-letter|first-line|repeat-index|repeat-item|selection|value)"},i,{cN:"attribute",b:"\\b(z-index|word-wrap|word-spacing|word-break|width|widows|white-space|visibility|vertical-align|unicode-bidi|transition-timing-function|transition-property|transition-duration|transition-delay|transition|transform-style|transform-origin|transform|top|text-underline-position|text-transform|text-shadow|text-rendering|text-overflow|text-indent|text-decoration-style|text-decoration-line|text-decoration-color|text-decoration|text-align-last|text-align|tab-size|table-layout|right|resize|quotes|position|pointer-events|perspective-origin|perspective|page-break-inside|page-break-before|page-break-after|padding-top|padding-right|padding-left|padding-bottom|padding|overflow-y|overflow-x|overflow-wrap|overflow|outline-width|outline-style|outline-offset|outline-color|outline|orphans|order|opacity|object-position|object-fit|normal|none|nav-up|nav-right|nav-left|nav-index|nav-down|min-width|min-height|max-width|max-height|mask|marks|margin-top|margin-right|margin-left|margin-bottom|margin|list-style-type|list-style-position|list-style-image|list-style|line-height|letter-spacing|left|justify-content|initial|inherit|ime-mode|image-orientation|image-resolution|image-rendering|icon|hyphens|height|font-weight|font-variant-ligatures|font-variant|font-style|font-stretch|font-size-adjust|font-size|font-language-override|font-kerning|font-feature-settings|font-family|font|float|flex-wrap|flex-shrink|flex-grow|flex-flow|flex-direction|flex-basis|flex|filter|empty-cells|display|direction|cursor|counter-reset|counter-increment|content|column-width|column-span|column-rule-width|column-rule-style|column-rule-color|column-rule|column-gap|column-fill|column-count|columns|color|clip-path|clip|clear|caption-side|break-inside|break-before|break-after|box-sizing|box-shadow|box-decoration-break|bottom|border-width|border-top-width|border-top-style|border-top-right-radius|border-top-left-radius|border-top-color|border-top|border-style|border-spacing|border-right-width|border-right-style|bo
rder-right-color|border-right|border-radius|border-left-width|border-left-style|border-left-color|border-left|border-image-width|border-image-source|border-image-slice|border-image-repeat|border-image-outset|border-image|border-color|border-collapse|border-bottom-width|border-bottom-style|border-bottom-right-radius|border-bottom-left-radius|border-bottom-color|border-bottom|border|background-size|background-repeat|background-position|background-origin|background-image|background-color|background-clip|background-attachment|background-blend-mode|background|backface-visibility|auto|animation-timing-function|animation-play-state|animation-name|animation-iteration-count|animation-fill-mode|animation-duration|animation-direction|animation-delay|animation|align-self|align-items|align-content)\\b",i:"[^\\s]"},{cN:"value",b:"\\b(whitespace|wait|w-resize|visible|vertical-text|vertical-ideographic|uppercase|upper-roman|upper-alpha|underline|transparent|top|thin|thick|text|text-top|text-bottom|tb-rl|table-header-group|table-footer-group|sw-resize|super|strict|static|square|solid|small-caps|separate|se-resize|scroll|s-resize|rtl|row-resize|ridge|right|repeat|repeat-y|repeat-x|relative|progress|pointer|overline|outside|outset|oblique|nowrap|not-allowed|normal|none|nw-resize|no-repeat|no-drop|newspaper|ne-resize|n-resize|move|middle|medium|ltr|lr-tb|lowercase|lower-roman|lower-alpha|loose|list-item|line|line-through|line-edge|lighter|left|keep-all|justify|italic|inter-word|inter-ideograph|inside|inset|inline|inline-block|inherit|inactive|ideograph-space|ideograph-parenthesis|ideograph-numeric|ideograph-alpha|horizontal|hidden|help|hand|groove|fixed|ellipsis|e-resize|double|dotted|distribute|distribute-space|distribute-letter|distribute-all-lines|disc|disabled|default|decimal|dashed|crosshair|collapse|col-resize|circle|char|center|capitalize|break-word|break-all|bottom|both|bolder|bold|block|bidi-override|below|baseline|auto|always|all-scroll|absolute|table|table-cell)\\b"},{cN:"value",b:":",e:";",c:[r,i,o,e.CSSNM,e.QSM,e.ASM,{cN:"important",b:"!important"}]},{cN:"at_rule",b:"@",e:"[{;]",k:"mixin include extend for if else each while charset import debug media page content font-face namespace warn",c:[r,i,e.QSM,e.ASM,o,e.CSSNM,{cN:"preprocessor",b:"\\s[A-Za-z0-9_.-]+",r:0}]}]}});hljs.registerLanguage("mel",function(e){return{k:"int float string vector matrix if else switch case default while do for in break continue global proc return about abs addAttr addAttributeEditorNodeHelp addDynamic addNewShelfTab addPP addPanelCategory addPrefixToName advanceToNextDrivenKey affectedNet affects aimConstraint air alias aliasAttr align alignCtx alignCurve alignSurface allViewFit ambientLight angle angleBetween animCone animCurveEditor animDisplay animView annotate appendStringArray applicationName applyAttrPreset applyTake arcLenDimContext arcLengthDimension arclen arrayMapper art3dPaintCtx artAttrCtx artAttrPaintVertexCtx artAttrSkinPaintCtx artAttrTool artBuildPaintMenu artFluidAttrCtx artPuttyCtx artSelectCtx artSetPaintCtx artUserPaintCtx assignCommand assignInputDevice assignViewportFactories attachCurve attachDeviceAttr attachSurface attrColorSliderGrp attrCompatibility attrControlGrp attrEnumOptionMenu attrEnumOptionMenuGrp attrFieldGrp attrFieldSliderGrp attrNavigationControlGrp attrPresetEditWin attributeExists attributeInfo attributeMenu attributeQuery autoKeyframe autoPlace bakeClip bakeFluidShading bakePartialHistory bakeResults bakeSimulation basename basenameEx batchRender bessel bevel bevelPlus 
binMembership bindSkin blend2 blendShape blendShapeEditor blendShapePanel blendTwoAttr blindDataType boneLattice boundary boxDollyCtx boxZoomCtx bufferCurve buildBookmarkMenu buildKeyframeMenu button buttonManip CBG cacheFile cacheFileCombine cacheFileMerge cacheFileTrack camera cameraView canCreateManip canvas capitalizeString catch catchQuiet ceil changeSubdivComponentDisplayLevel changeSubdivRegion channelBox character characterMap characterOutlineEditor characterize chdir checkBox checkBoxGrp checkDefaultRenderGlobals choice circle circularFillet clamp clear clearCache clip clipEditor clipEditorCurrentTimeCtx clipSchedule clipSchedulerOutliner clipTrimBefore closeCurve closeSurface cluster cmdFileOutput cmdScrollFieldExecuter cmdScrollFieldReporter cmdShell coarsenSubdivSelectionList collision color colorAtPoint colorEditor colorIndex colorIndexSliderGrp colorSliderButtonGrp colorSliderGrp columnLayout commandEcho commandLine commandPort compactHairSystem componentEditor compositingInterop computePolysetVolume condition cone confirmDialog connectAttr connectControl connectDynamic connectJoint connectionInfo constrain constrainValue constructionHistory container containsMultibyte contextInfo control convertFromOldLayers convertIffToPsd convertLightmap convertSolidTx convertTessellation convertUnit copyArray copyFlexor copyKey copySkinWeights cos cpButton cpCache cpClothSet cpCollision cpConstraint cpConvClothToMesh cpForces cpGetSolverAttr cpPanel cpProperty cpRigidCollisionFilter cpSeam cpSetEdit cpSetSolverAttr cpSolver cpSolverTypes cpTool cpUpdateClothUVs createDisplayLayer createDrawCtx createEditor createLayeredPsdFile createMotionField createNewShelf createNode createRenderLayer createSubdivRegion cross crossProduct ctxAbort ctxCompletion ctxEditMode ctxTraverse currentCtx currentTime currentTimeCtx currentUnit curve curveAddPtCtx curveCVCtx curveEPCtx curveEditorCtx curveIntersect curveMoveEPCtx curveOnSurface curveSketchCtx cutKey cycleCheck cylinder dagPose date defaultLightListCheckBox defaultNavigation defineDataServer defineVirtualDevice deformer deg_to_rad delete deleteAttr deleteShadingGroupsAndMaterials deleteShelfTab deleteUI deleteUnusedBrushes delrandstr detachCurve detachDeviceAttr detachSurface deviceEditor devicePanel dgInfo dgdirty dgeval dgtimer dimWhen directKeyCtx directionalLight dirmap dirname disable disconnectAttr disconnectJoint diskCache displacementToPoly displayAffected displayColor displayCull displayLevelOfDetail displayPref displayRGBColor displaySmoothness displayStats displayString displaySurface distanceDimContext distanceDimension doBlur dolly dollyCtx dopeSheetEditor dot dotProduct doubleProfileBirailSurface drag dragAttrContext draggerContext dropoffLocator duplicate duplicateCurve duplicateSurface dynCache dynControl dynExport dynExpression dynGlobals dynPaintEditor dynParticleCtx dynPref dynRelEdPanel dynRelEditor dynamicLoad editAttrLimits editDisplayLayerGlobals editDisplayLayerMembers editRenderLayerAdjustment editRenderLayerGlobals editRenderLayerMembers editor editorTemplate effector emit emitter enableDevice encodeString endString endsWith env equivalent equivalentTol erf error eval evalDeferred evalEcho event exactWorldBoundingBox exclusiveLightCheckBox exec executeForEachObject exists exp expression expressionEditorListen extendCurve extendSurface extrude fcheck fclose feof fflush fgetline fgetword file fileBrowserDialog fileDialog fileExtension fileInfo filetest filletCurve filter filterCurve filterExpand filterStudioImport 
findAllIntersections findAnimCurves findKeyframe findMenuItem findRelatedSkinCluster finder firstParentOf fitBspline flexor floatEq floatField floatFieldGrp floatScrollBar floatSlider floatSlider2 floatSliderButtonGrp floatSliderGrp floor flow fluidCacheInfo fluidEmitter fluidVoxelInfo flushUndo fmod fontDialog fopen formLayout format fprint frameLayout fread freeFormFillet frewind fromNativePath fwrite gamma gauss geometryConstraint getApplicationVersionAsFloat getAttr getClassification getDefaultBrush getFileList getFluidAttr getInputDeviceRange getMayaPanelTypes getModifiers getPanel getParticleAttr getPluginResource getenv getpid glRender glRenderEditor globalStitch gmatch goal gotoBindPose grabColor gradientControl gradientControlNoAttr graphDollyCtx graphSelectContext graphTrackCtx gravity grid gridLayout group groupObjectsByName HfAddAttractorToAS HfAssignAS HfBuildEqualMap HfBuildFurFiles HfBuildFurImages HfCancelAFR HfConnectASToHF HfCreateAttractor HfDeleteAS HfEditAS HfPerformCreateAS HfRemoveAttractorFromAS HfSelectAttached HfSelectAttractors HfUnAssignAS hardenPointCurve hardware hardwareRenderPanel headsUpDisplay headsUpMessage help helpLine hermite hide hilite hitTest hotBox hotkey hotkeyCheck hsv_to_rgb hudButton hudSlider hudSliderButton hwReflectionMap hwRender hwRenderLoad hyperGraph hyperPanel hyperShade hypot iconTextButton iconTextCheckBox iconTextRadioButton iconTextRadioCollection iconTextScrollList iconTextStaticLabel ikHandle ikHandleCtx ikHandleDisplayScale ikSolver ikSplineHandleCtx ikSystem ikSystemInfo ikfkDisplayMethod illustratorCurves image imfPlugins inheritTransform insertJoint insertJointCtx insertKeyCtx insertKnotCurve insertKnotSurface instance instanceable instancer intField intFieldGrp intScrollBar intSlider intSliderGrp interToUI internalVar intersect iprEngine isAnimCurve isConnected isDirty isParentOf isSameObject isTrue isValidObjectName isValidString isValidUiName isolateSelect itemFilter itemFilterAttr itemFilterRender itemFilterType joint jointCluster jointCtx jointDisplayScale jointLattice keyTangent keyframe keyframeOutliner keyframeRegionCurrentTimeCtx keyframeRegionDirectKeyCtx keyframeRegionDollyCtx keyframeRegionInsertKeyCtx keyframeRegionMoveKeyCtx keyframeRegionScaleKeyCtx keyframeRegionSelectKeyCtx keyframeRegionSetKeyCtx keyframeRegionTrackCtx keyframeStats lassoContext lattice latticeDeformKeyCtx launch launchImageEditor layerButton layeredShaderPort layeredTexturePort layout layoutDialog lightList lightListEditor lightListPanel lightlink lineIntersection linearPrecision linstep listAnimatable listAttr listCameras listConnections listDeviceAttachments listHistory listInputDeviceAxes listInputDeviceButtons listInputDevices listMenuAnnotation listNodeTypes listPanelCategories listRelatives listSets listTransforms listUnselected listerEditor loadFluid loadNewShelf loadPlugin loadPluginLanguageResources loadPrefObjects localizedPanelLabel lockNode loft log longNameOf lookThru ls lsThroughFilter lsType lsUI Mayatomr mag makeIdentity makeLive makePaintable makeRoll makeSingleSurface makeTubeOn makebot manipMoveContext manipMoveLimitsCtx manipOptions manipRotateContext manipRotateLimitsCtx manipScaleContext manipScaleLimitsCtx marker match max memory menu menuBarLayout menuEditor menuItem menuItemToShelf menuSet menuSetPref messageLine min minimizeApp mirrorJoint modelCurrentTimeCtx modelEditor modelPanel mouse movIn movOut move moveIKtoFK moveKeyCtx moveVertexAlongDirection multiProfileBirailSurface mute nParticle nameCommand nameField 
namespace namespaceInfo newPanelItems newton nodeCast nodeIconButton nodeOutliner nodePreset nodeType noise nonLinear normalConstraint normalize nurbsBoolean nurbsCopyUVSet nurbsCube nurbsEditUV nurbsPlane nurbsSelect nurbsSquare nurbsToPoly nurbsToPolygonsPref nurbsToSubdiv nurbsToSubdivPref nurbsUVSet nurbsViewDirectionVector objExists objectCenter objectLayer objectType objectTypeUI obsoleteProc oceanNurbsPreviewPlane offsetCurve offsetCurveOnSurface offsetSurface openGLExtension openMayaPref optionMenu optionMenuGrp optionVar orbit orbitCtx orientConstraint outlinerEditor outlinerPanel overrideModifier paintEffectsDisplay pairBlend palettePort paneLayout panel panelConfiguration panelHistory paramDimContext paramDimension paramLocator parent parentConstraint particle particleExists particleInstancer particleRenderInfo partition pasteKey pathAnimation pause pclose percent performanceOptions pfxstrokes pickWalk picture pixelMove planarSrf plane play playbackOptions playblast plugAttr plugNode pluginInfo pluginResourceUtil pointConstraint pointCurveConstraint pointLight pointMatrixMult pointOnCurve pointOnSurface pointPosition poleVectorConstraint polyAppend polyAppendFacetCtx polyAppendVertex polyAutoProjection polyAverageNormal polyAverageVertex polyBevel polyBlendColor polyBlindData polyBoolOp polyBridgeEdge polyCacheMonitor polyCheck polyChipOff polyClipboard polyCloseBorder polyCollapseEdge polyCollapseFacet polyColorBlindData polyColorDel polyColorPerVertex polyColorSet polyCompare polyCone polyCopyUV polyCrease polyCreaseCtx polyCreateFacet polyCreateFacetCtx polyCube polyCut polyCutCtx polyCylinder polyCylindricalProjection polyDelEdge polyDelFacet polyDelVertex polyDuplicateAndConnect polyDuplicateEdge polyEditUV polyEditUVShell polyEvaluate polyExtrudeEdge polyExtrudeFacet polyExtrudeVertex polyFlipEdge polyFlipUV polyForceUV polyGeoSampler polyHelix polyInfo polyInstallAction polyLayoutUV polyListComponentConversion polyMapCut polyMapDel polyMapSew polyMapSewMove polyMergeEdge polyMergeEdgeCtx polyMergeFacet polyMergeFacetCtx polyMergeUV polyMergeVertex polyMirrorFace polyMoveEdge polyMoveFacet polyMoveFacetUV polyMoveUV polyMoveVertex polyNormal polyNormalPerVertex polyNormalizeUV polyOptUvs polyOptions polyOutput polyPipe polyPlanarProjection polyPlane polyPlatonicSolid polyPoke polyPrimitive polyPrism polyProjection polyPyramid polyQuad polyQueryBlindData polyReduce polySelect polySelectConstraint polySelectConstraintMonitor polySelectCtx polySelectEditCtx polySeparate polySetToFaceNormal polySewEdge polyShortestPathCtx polySmooth polySoftEdge polySphere polySphericalProjection polySplit polySplitCtx polySplitEdge polySplitRing polySplitVertex polyStraightenUVBorder polySubdivideEdge polySubdivideFacet polyToSubdiv polyTorus polyTransfer polyTriangulate polyUVSet polyUnite polyWedgeFace popen popupMenu pose pow preloadRefEd print progressBar progressWindow projFileViewer projectCurve projectTangent projectionContext projectionManip promptDialog propModCtx propMove psdChannelOutliner psdEditTextureFile psdExport psdTextureFile putenv pwd python querySubdiv quit rad_to_deg radial radioButton radioButtonGrp radioCollection radioMenuItemCollection rampColorPort rand randomizeFollicles randstate rangeControl readTake rebuildCurve rebuildSurface recordAttr recordDevice redo reference referenceEdit referenceQuery refineSubdivSelectionList refresh refreshAE registerPluginResource rehash reloadImage removeJoint removeMultiInstance removePanelCategory rename renameAttr 
renameSelectionList renameUI render renderGlobalsNode renderInfo renderLayerButton renderLayerParent renderLayerPostProcess renderLayerUnparent renderManip renderPartition renderQualityNode renderSettings renderThumbnailUpdate renderWindowEditor renderWindowSelectContext renderer reorder reorderDeformers requires reroot resampleFluid resetAE resetPfxToPolyCamera resetTool resolutionNode retarget reverseCurve reverseSurface revolve rgb_to_hsv rigidBody rigidSolver roll rollCtx rootOf rot rotate rotationInterpolation roundConstantRadius rowColumnLayout rowLayout runTimeCommand runup sampleImage saveAllShelves saveAttrPreset saveFluid saveImage saveInitialState saveMenu savePrefObjects savePrefs saveShelf saveToolSettings scale scaleBrushBrightness scaleComponents scaleConstraint scaleKey scaleKeyCtx sceneEditor sceneUIReplacement scmh scriptCtx scriptEditorInfo scriptJob scriptNode scriptTable scriptToShelf scriptedPanel scriptedPanelType scrollField scrollLayout sculpt searchPathArray seed selLoadSettings select selectContext selectCurveCV selectKey selectKeyCtx selectKeyframeRegionCtx selectMode selectPref selectPriority selectType selectedNodes selectionConnection separator setAttr setAttrEnumResource setAttrMapping setAttrNiceNameResource setConstraintRestPosition setDefaultShadingGroup setDrivenKeyframe setDynamic setEditCtx setEditor setFluidAttr setFocus setInfinity setInputDeviceMapping setKeyCtx setKeyPath setKeyframe setKeyframeBlendshapeTargetWts setMenuMode setNodeNiceNameResource setNodeTypeFlag setParent setParticleAttr setPfxToPolyCamera setPluginResource setProject setStampDensity setStartupMessage setState setToolTo setUITemplate setXformManip sets shadingConnection shadingGeometryRelCtx shadingLightRelCtx shadingNetworkCompare shadingNode shapeCompare shelfButton shelfLayout shelfTabLayout shellField shortNameOf showHelp showHidden showManipCtx showSelectionInTitle showShadingGroupAttrEditor showWindow sign simplify sin singleProfileBirailSurface size sizeBytes skinCluster skinPercent smoothCurve smoothTangentSurface smoothstep snap2to2 snapKey snapMode snapTogetherCtx snapshot soft softMod softModCtx sort sound soundControl source spaceLocator sphere sphrand spotLight spotLightPreviewPort spreadSheetEditor spring sqrt squareSurface srtContext stackTrace startString startsWith stitchAndExplodeShell stitchSurface stitchSurfacePoints strcmp stringArrayCatenate stringArrayContains stringArrayCount stringArrayInsertAtIndex stringArrayIntersector stringArrayRemove stringArrayRemoveAtIndex stringArrayRemoveDuplicates stringArrayRemoveExact stringArrayToString stringToStringArray strip stripPrefixFromName stroke subdAutoProjection subdCleanTopology subdCollapse subdDuplicateAndConnect subdEditUV subdListComponentConversion subdMapCut subdMapSewMove subdMatchTopology subdMirror subdToBlind subdToPoly subdTransferUVsToCache subdiv subdivCrease subdivDisplaySmoothness substitute substituteAllString substituteGeometry substring surface surfaceSampler surfaceShaderList swatchDisplayPort switchTable symbolButton symbolCheckBox sysFile system tabLayout tan tangentConstraint texLatticeDeformContext texManipContext texMoveContext texMoveUVShellContext texRotateContext texScaleContext texSelectContext texSelectShortestPathCtx texSmudgeUVContext texWinToolCtx text textCurves textField textFieldButtonGrp textFieldGrp textManip textScrollList textToShelf textureDisplacePlane textureHairColor texturePlacementContext textureWindow threadCount threePointArcCtx timeControl timePort timerX 
toNativePath toggle toggleAxis toggleWindowVisibility tokenize tokenizeList tolerance tolower toolButton toolCollection toolDropped toolHasOptions toolPropertyWindow torus toupper trace track trackCtx transferAttributes transformCompare transformLimits translator trim trunc truncateFluidCache truncateHairCache tumble tumbleCtx turbulence twoPointArcCtx uiRes uiTemplate unassignInputDevice undo undoInfo ungroup uniform unit unloadPlugin untangleUV untitledFileName untrim upAxis updateAE userCtx uvLink uvSnapshot validateShelfName vectorize view2dToolCtx viewCamera viewClipPlane viewFit viewHeadOn viewLookAt viewManip viewPlace viewSet visor volumeAxis vortex waitCursor warning webBrowser webBrowserPrefs whatIs window windowPref wire wireContext workspace wrinkle wrinkleContext writeTake xbmLangPathList xform",i:"</",c:[e.CNM,e.ASM,e.QSM,{cN:"string",b:"`",e:"`",c:[e.BE]},{cN:"variable",v:[{b:"\\$\\d"},{b:"[\\$\\%\\@](\\^\\w\\b|#\\w+|[^\\s\\w{]|{\\w+}|\\w+)"},{b:"\\*(\\^\\w\\b|#\\w+|[^\\s\\w{]|{\\w+}|\\w+)",r:0}]},e.CLCM,e.CBCM]}});hljs.registerLanguage("d",function(e){var r={keyword:"abstract alias align asm assert auto body break byte case cast catch class const continue debug default delete deprecated do else enum export extern final finally for foreach foreach_reverse|10 goto if immutable import in inout int interface invariant is lazy macro mixin module new nothrow out override package pragma private protected public pure ref return scope shared static struct super switch synchronized template this throw try typedef typeid typeof union unittest version void volatile while with __FILE__ __LINE__ __gshared|10 __thread __traits __DATE__ __EOF__ __TIME__ __TIMESTAMP__ __VENDOR__ __VERSION__",built_in:"bool cdouble cent cfloat char creal dchar delegate double dstring float function idouble ifloat ireal long real short string ubyte ucent uint ulong ushort wchar wstring",literal:"false null true"},t="(0|[1-9][\\d_]*)",a="(0|[1-9][\\d_]*|\\d[\\d_]*|[\\d_]+?\\d)",i="0[bB][01_]+",n="([\\da-fA-F][\\da-fA-F_]*|_[\\da-fA-F][\\da-fA-F_]*)",c="0[xX]"+n,_="([eE][+-]?"+a+")",d="("+a+"(\\.\\d*|"+_+")|\\d+\\."+a+a+"|\\."+t+_+"?)",o="(0[xX]("+n+"\\."+n+"|\\.?"+n+")[pP][+-]?"+a+")",s="("+t+"|"+i+"|"+c+")",l="("+o+"|"+d+")",u="\\\\(['\"\\?\\\\abfnrtv]|u[\\dA-Fa-f]{4}|[0-7]{1,3}|x[\\dA-Fa-f]{2}|U[\\dA-Fa-f]{8})|&[a-zA-Z\\d]{2,};",b={cN:"number",b:"\\b"+s+"(L|u|U|Lu|LU|uL|UL)?",r:0},f={cN:"number",b:"\\b("+l+"([fF]|L|i|[fF]i|Li)?|"+s+"(i|[fF]i|Li))",r:0},g={cN:"string",b:"'("+u+"|.)",e:"'",i:"."},h={b:u,r:0},p={cN:"string",b:'"',c:[h],e:'"[cwd]?'},w={cN:"string",b:'[rq]"',e:'"[cwd]?',r:5},N={cN:"string",b:"`",e:"`[cwd]?"},A={cN:"string",b:'x"[\\da-fA-F\\s\\n\\r]*"[cwd]?',r:10},F={cN:"string",b:'q"\\{',e:'\\}"'},m={cN:"shebang",b:"^#!",e:"$",r:5},y={cN:"preprocessor",b:"#(line)",e:"$",r:5},L={cN:"keyword",b:"@[a-zA-Z_][a-zA-Z_\\d]*"},v=e.C("\\/\\+","\\+\\/",{c:["self"],r:10});return{l:e.UIR,k:r,c:[e.CLCM,e.CBCM,v,A,p,w,N,F,f,b,g,m,y,L]}});hljs.registerLanguage("ruleslanguage",function(T){return{k:{keyword:"BILL_PERIOD BILL_START BILL_STOP RS_EFFECTIVE_START RS_EFFECTIVE_STOP RS_JURIS_CODE RS_OPCO_CODE INTDADDATTRIBUTE|5 INTDADDVMSG|5 INTDBLOCKOP|5 INTDBLOCKOPNA|5 INTDCLOSE|5 INTDCOUNT|5 INTDCOUNTSTATUSCODE|5 INTDCREATEMASK|5 INTDCREATEDAYMASK|5 INTDCREATEFACTORMASK|5 INTDCREATEHANDLE|5 INTDCREATEOVERRIDEDAYMASK|5 INTDCREATEOVERRIDEMASK|5 INTDCREATESTATUSCODEMASK|5 INTDCREATETOUPERIOD|5 INTDDELETE|5 INTDDIPTEST|5 INTDEXPORT|5 INTDGETERRORCODE|5 INTDGETERRORMESSAGE|5 INTDISEQUAL|5 INTDJOIN|5 INTDLOAD|5 
INTDLOADACTUALCUT|5 INTDLOADDATES|5 INTDLOADHIST|5 INTDLOADLIST|5 INTDLOADLISTDATES|5 INTDLOADLISTENERGY|5 INTDLOADLISTHIST|5 INTDLOADRELATEDCHANNEL|5 INTDLOADSP|5 INTDLOADSTAGING|5 INTDLOADUOM|5 INTDLOADUOMDATES|5 INTDLOADUOMHIST|5 INTDLOADVERSION|5 INTDOPEN|5 INTDREADFIRST|5 INTDREADNEXT|5 INTDRECCOUNT|5 INTDRELEASE|5 INTDREPLACE|5 INTDROLLAVG|5 INTDROLLPEAK|5 INTDSCALAROP|5 INTDSCALE|5 INTDSETATTRIBUTE|5 INTDSETDSTPARTICIPANT|5 INTDSETSTRING|5 INTDSETVALUE|5 INTDSETVALUESTATUS|5 INTDSHIFTSTARTTIME|5 INTDSMOOTH|5 INTDSORT|5 INTDSPIKETEST|5 INTDSUBSET|5 INTDTOU|5 INTDTOURELEASE|5 INTDTOUVALUE|5 INTDUPDATESTATS|5 INTDVALUE|5 STDEV INTDDELETEEX|5 INTDLOADEXACTUAL|5 INTDLOADEXCUT|5 INTDLOADEXDATES|5 INTDLOADEX|5 INTDLOADEXRELATEDCHANNEL|5 INTDSAVEEX|5 MVLOAD|5 MVLOADACCT|5 MVLOADACCTDATES|5 MVLOADACCTHIST|5 MVLOADDATES|5 MVLOADHIST|5 MVLOADLIST|5 MVLOADLISTDATES|5 MVLOADLISTHIST|5 IF FOR NEXT DONE SELECT END CALL ABORT CLEAR CHANNEL FACTOR LIST NUMBER OVERRIDE SET WEEK DISTRIBUTIONNODE ELSE WHEN THEN OTHERWISE IENUM CSV INCLUDE LEAVE RIDER SAVE DELETE NOVALUE SECTION WARN SAVE_UPDATE DETERMINANT LABEL REPORT REVENUE EACH IN FROM TOTAL CHARGE BLOCK AND OR CSV_FILE RATE_CODE AUXILIARY_DEMAND UIDACCOUNT RS BILL_PERIOD_SELECT HOURS_PER_MONTH INTD_ERROR_STOP SEASON_SCHEDULE_NAME ACCOUNTFACTOR ARRAYUPPERBOUND CALLSTOREDPROC GETADOCONNECTION GETCONNECT GETDATASOURCE GETQUALIFIER GETUSERID HASVALUE LISTCOUNT LISTOP LISTUPDATE LISTVALUE PRORATEFACTOR RSPRORATE SETBINPATH SETDBMONITOR WQ_OPEN BILLINGHOURS DATE DATEFROMFLOAT DATETIMEFROMSTRING DATETIMETOSTRING DATETOFLOAT DAY DAYDIFF DAYNAME DBDATETIME HOUR MINUTE MONTH MONTHDIFF MONTHHOURS MONTHNAME ROUNDDATE SAMEWEEKDAYLASTYEAR SECOND WEEKDAY WEEKDIFF YEAR YEARDAY YEARSTR COMPSUM HISTCOUNT HISTMAX HISTMIN HISTMINNZ HISTVALUE MAXNRANGE MAXRANGE MINRANGE COMPIKVA COMPKVA COMPKVARFROMKQKW COMPLF IDATTR FLAG LF2KW LF2KWH MAXKW POWERFACTOR READING2USAGE AVGSEASON MAXSEASON MONTHLYMERGE SEASONVALUE SUMSEASON ACCTREADDATES ACCTTABLELOAD CONFIGADD CONFIGGET CREATEOBJECT CREATEREPORT EMAILCLIENT EXPBLKMDMUSAGE EXPMDMUSAGE EXPORT_USAGE FACTORINEFFECT GETUSERSPECIFIEDSTOP INEFFECT ISHOLIDAY RUNRATE SAVE_PROFILE SETREPORTTITLE USEREXIT WATFORRUNRATE TO TABLE ACOS ASIN ATAN ATAN2 BITAND CEIL COS COSECANT COSH COTANGENT DIVQUOT DIVREM EXP FABS FLOOR FMOD FREPM FREXPN LOG LOG10 MAX MAXN MIN MINNZ MODF POW ROUND ROUND2VALUE ROUNDINT SECANT SIN SINH SQROOT TAN TANH FLOAT2STRING FLOAT2STRINGNC INSTR LEFT LEN LTRIM MID RIGHT RTRIM STRING STRINGNC TOLOWER TOUPPER TRIM NUMDAYS READ_DATE STAGING",built_in:"IDENTIFIER OPTIONS XML_ELEMENT XML_OP XML_ELEMENT_OF DOMDOCCREATE DOMDOCLOADFILE DOMDOCLOADXML DOMDOCSAVEFILE DOMDOCGETROOT DOMDOCADDPI DOMNODEGETNAME DOMNODEGETTYPE DOMNODEGETVALUE DOMNODEGETCHILDCT DOMNODEGETFIRSTCHILD DOMNODEGETSIBLING DOMNODECREATECHILDELEMENT DOMNODESETATTRIBUTE DOMNODEGETCHILDELEMENTCT DOMNODEGETFIRSTCHILDELEMENT DOMNODEGETSIBLINGELEMENT DOMNODEGETATTRIBUTECT DOMNODEGETATTRIBUTEI DOMNODEGETATTRIBUTEBYNAME DOMNODEGETBYNAME"},c:[T.CLCM,T.CBCM,T.ASM,T.QSM,T.CNM,{cN:"array",b:"#[a-zA-Z .]+"}]}});hljs.registerLanguage("actionscript",function(e){var a="[a-zA-Z_$][a-zA-Z0-9_$]*",c="([*]|[a-zA-Z_$][a-zA-Z0-9_$]*)",t={cN:"rest_arg",b:"[.]{3}",e:a,r:10};return{aliases:["as"],k:{keyword:"as break case catch class const continue default delete do dynamic each else extends final finally for function get if implements import in include instanceof interface internal is namespace native new override package private protected public return set static super switch 
this throw try typeof use var void while with",literal:"true false null undefined"},c:[e.ASM,e.QSM,e.CLCM,e.CBCM,e.CNM,{cN:"package",bK:"package",e:"{",c:[e.TM]},{cN:"class",bK:"class interface",e:"{",eE:!0,c:[{bK:"extends implements"},e.TM]},{cN:"preprocessor",bK:"import include",e:";"},{cN:"function",bK:"function",e:"[{;]",eE:!0,i:"\\S",c:[e.TM,{cN:"params",b:"\\(",e:"\\)",c:[e.ASM,e.QSM,e.CLCM,e.CBCM,t]},{cN:"type",b:":",e:c,r:10}]}]}});hljs.registerLanguage("coffeescript",function(e){var c={keyword:"in if for while finally new do return else break catch instanceof throw try this switch continue typeof delete debugger super then unless until loop of by when and or is isnt not",literal:"true false null undefined yes no on off",reserved:"case default function var void with const let enum export import native __hasProp __extends __slice __bind __indexOf",built_in:"npm require console print module global window document"},n="[A-Za-z$_][0-9A-Za-z$_]*",t={cN:"subst",b:/#\{/,e:/}/,k:c},r=[e.BNM,e.inherit(e.CNM,{starts:{e:"(\\s*/)?",r:0}}),{cN:"string",v:[{b:/'''/,e:/'''/,c:[e.BE]},{b:/'/,e:/'/,c:[e.BE]},{b:/"""/,e:/"""/,c:[e.BE,t]},{b:/"/,e:/"/,c:[e.BE,t]}]},{cN:"regexp",v:[{b:"///",e:"///",c:[t,e.HCM]},{b:"//[gim]*",r:0},{b:/\/(?![ *])(\\\/|.)*?\/[gim]*(?=\W|$)/}]},{cN:"property",b:"@"+n},{b:"`",e:"`",eB:!0,eE:!0,sL:"javascript"}];t.c=r;var i=e.inherit(e.TM,{b:n}),s="(\\(.*\\))?\\s*\\B[-=]>",o={cN:"params",b:"\\([^\\(]",rB:!0,c:[{b:/\(/,e:/\)/,k:c,c:["self"].concat(r)}]};return{aliases:["coffee","cson","iced"],k:c,i:/\/\*/,c:r.concat([e.C("###","###"),e.HCM,{cN:"function",b:"^\\s*"+n+"\\s*=\\s*"+s,e:"[-=]>",rB:!0,c:[i,o]},{b:/[:\(,=]\s*/,r:0,c:[{cN:"function",b:s,e:"[-=]>",rB:!0,c:[o]}]},{cN:"class",bK:"class",e:"$",i:/[:="\[\]]/,c:[{bK:"extends",eW:!0,i:/[:="\[\]]/,c:[i]},i]},{cN:"attribute",b:n+":",e:":",rB:!0,rE:!0,r:0}])}});hljs.registerLanguage("tex",function(c){var e={cN:"command",b:"\\\\[a-zA-Zа-яА-я]+[\\*]?"},m={cN:"command",b:"\\\\[^a-zA-Zа-яА-я0-9]"},r={cN:"special",b:"[{}\\[\\]\\&#~]",r:0};return{c:[{b:"\\\\[a-zA-Zа-яА-я]+[\\*]? 
*= *-?\\d*\\.?\\d+(pt|pc|mm|cm|in|dd|cc|ex|em)?",rB:!0,c:[e,m,{cN:"number",b:" *=",e:"-?\\d*\\.?\\d+(pt|pc|mm|cm|in|dd|cc|ex|em)?",eB:!0}],r:10},e,m,r,{cN:"formula",b:"\\$\\$",e:"\\$\\$",c:[e,m,r],r:0},{cN:"formula",b:"\\$",e:"\\$",c:[e,m,r],r:0},c.C("%","$",{r:0})]}});hljs.registerLanguage("go",function(e){var t={keyword:"break default func interface select case map struct chan else goto package switch const fallthrough if range type continue for import return var go defer",constant:"true false iota nil",typename:"bool byte complex64 complex128 float32 float64 int8 int16 int32 int64 string uint8 uint16 uint32 uint64 int uint uintptr rune",built_in:"append cap close complex copy imag len make new panic print println real recover delete"};return{aliases:["golang"],k:t,i:"</",c:[e.CLCM,e.CBCM,e.QSM,{cN:"string",b:"'",e:"[^\\\\]'"},{cN:"string",b:"`",e:"`"},{cN:"number",b:e.CNR+"[dflsi]?",r:0},e.CNM]}});hljs.registerLanguage("vbscript-html",function(s){return{sL:"xml",subLanguageMode:"continuous",c:[{b:"<%",e:"%>",sL:"vbscript"}]}});hljs.registerLanguage("haskell",function(e){var c=[e.C("--","$"),e.C("{-","-}",{c:["self"]})],a={cN:"pragma",b:"{-#",e:"#-}"},i={cN:"preprocessor",b:"^#",e:"$"},n={cN:"type",b:"\\b[A-Z][\\w']*",r:0},t={cN:"container",b:"\\(",e:"\\)",i:'"',c:[a,i,{cN:"type",b:"\\b[A-Z][\\w]*(\\((\\.\\.|,|\\w+)\\))?"},e.inherit(e.TM,{b:"[_a-z][\\w']*"})].concat(c)},l={cN:"container",b:"{",e:"}",c:t.c};return{aliases:["hs"],k:"let in if then else case of where do module import hiding qualified type data newtype deriving class instance as default infix infixl infixr foreign export ccall stdcall cplusplus jvm dotnet safe unsafe family forall mdo proc rec",c:[{cN:"module",b:"\\bmodule\\b",e:"where",k:"module where",c:[t].concat(c),i:"\\W\\.|;"},{cN:"import",b:"\\bimport\\b",e:"$",k:"import|0 qualified as hiding",c:[t].concat(c),i:"\\W\\.|;"},{cN:"class",b:"^(\\s*)?(class|instance)\\b",e:"where",k:"class family instance where",c:[n,t].concat(c)},{cN:"typedef",b:"\\b(data|(new)?type)\\b",e:"$",k:"data family type newtype deriving",c:[a,n,t,l].concat(c)},{cN:"default",bK:"default",e:"$",c:[n,t].concat(c)},{cN:"infix",bK:"infix infixl infixr",e:"$",c:[e.CNM].concat(c)},{cN:"foreign",b:"\\bforeign\\b",e:"$",k:"foreign import export ccall stdcall cplusplus jvm dotnet safe unsafe",c:[n,e.QSM].concat(c)},{cN:"shebang",b:"#!\\/usr\\/bin\\/env runhaskell",e:"$"},a,i,e.QSM,e.CNM,n,e.inherit(e.TM,{b:"^[_a-z][\\w']*"}),{b:"->|<-"}].concat(c)}});hljs.registerLanguage("scilab",function(e){var n=[e.CNM,{cN:"string",b:"'|\"",e:"'|\"",c:[e.BE,{b:"''"}]}];return{aliases:["sci"],k:{keyword:"abort break case clear catch continue do elseif else endfunction end for functionglobal if pause return resume select try then while%f %F %t %T %pi %eps %inf %nan %e %i %z %s",built_in:"abs and acos asin atan ceil cd chdir clearglobal cosh cos cumprod deff disp errorexec execstr exists exp eye gettext floor fprintf fread fsolve imag isdef isemptyisinfisnan isvector lasterror length load linspace list listfiles log10 log2 logmax min msprintf mclose mopen ones or pathconvert poly printf prod pwd rand realround sinh sin size gsort sprintf sqrt strcat strcmps tring sum system tanh tantype typename warning zeros matrix"},i:'("|#|/\\*|\\s+/\\w+)',c:[{cN:"function",bK:"function endfunction",e:"$",k:"function 
endfunction|10",c:[e.UTM,{cN:"params",b:"\\(",e:"\\)"}]},{cN:"transposed_variable",b:"[a-zA-Z_][a-zA-Z_0-9]*('+[\\.']*|[\\.']+)",e:"",r:0},{cN:"matrix",b:"\\[",e:"\\]'*[\\.']*",r:0,c:n},e.C("//","$")].concat(n)}});hljs.registerLanguage("profile",function(e){return{c:[e.CNM,{cN:"built_in",b:"{",e:"}$",eB:!0,eE:!0,c:[e.ASM,e.QSM],r:0},{cN:"filename",b:"[a-zA-Z_][\\da-zA-Z_]+\\.[\\da-zA-Z_]{1,3}",e:":",eE:!0},{cN:"header",b:"(ncalls|tottime|cumtime)",e:"$",k:"ncalls tottime|10 cumtime|10 filename",r:10},{cN:"summary",b:"function calls",e:"$",c:[e.CNM],r:10},e.ASM,e.QSM,{cN:"function",b:"\\(",e:"\\)$",c:[e.UTM],r:0}]}});hljs.registerLanguage("thrift",function(e){var t="bool byte i16 i32 i64 double string binary";return{k:{keyword:"namespace const typedef struct enum service exception void oneway set list map required optional",built_in:t,literal:"true false"},c:[e.QSM,e.NM,e.CLCM,e.CBCM,{cN:"class",bK:"struct enum service exception",e:/\{/,i:/\n/,c:[e.inherit(e.TM,{starts:{eW:!0,eE:!0}})]},{b:"\\b(set|list|map)\\s*<",e:">",k:t,c:["self"]}]}});hljs.registerLanguage("matlab",function(e){var a=[e.CNM,{cN:"string",b:"'",e:"'",c:[e.BE,{b:"''"}]}],s={r:0,c:[{cN:"operator",b:/'['\.]*/}]};return{k:{keyword:"break case catch classdef continue else elseif end enumerated events for function global if methods otherwise parfor persistent properties return spmd switch try while",built_in:"sin sind sinh asin asind asinh cos cosd cosh acos acosd acosh tan tand tanh atan atand atan2 atanh sec secd sech asec asecd asech csc cscd csch acsc acscd acsch cot cotd coth acot acotd acoth hypot exp expm1 log log1p log10 log2 pow2 realpow reallog realsqrt sqrt nthroot nextpow2 abs angle complex conj imag real unwrap isreal cplxpair fix floor ceil round mod rem sign airy besselj bessely besselh besseli besselk beta betainc betaln ellipj ellipke erf erfc erfcx erfinv expint gamma gammainc gammaln psi legendre cross dot factor isprime primes gcd lcm rat rats perms nchoosek factorial cart2sph cart2pol pol2cart sph2cart hsv2rgb rgb2hsv zeros ones eye repmat rand randn linspace logspace freqspace meshgrid accumarray size length ndims numel disp isempty isequal isequalwithequalnans cat reshape diag blkdiag tril triu fliplr flipud flipdim rot90 find sub2ind ind2sub bsxfun ndgrid permute ipermute shiftdim circshift squeeze isscalar isvector ans eps realmax realmin pi i inf nan isnan isinf isfinite j why compan gallery hadamard hankel hilb invhilb magic pascal rosser toeplitz vander wilkinson"},i:'(//|"|#|/\\*|\\s+/\\w+)',c:[{cN:"function",bK:"function",e:"$",c:[e.UTM,{cN:"params",b:"\\(",e:"\\)"},{cN:"params",b:"\\[",e:"\\]"}]},{b:/[a-zA-Z_][a-zA-Z_0-9]*'['\.]*/,rB:!0,r:0,c:[{b:/[a-zA-Z_][a-zA-Z_0-9]*/,r:0},s.c[0]]},{cN:"matrix",b:"\\[",e:"\\]",c:a,r:0,starts:s},{cN:"cell",b:"\\{",e:/}/,c:a,r:0,starts:s},{b:/\)/,r:0,starts:s},e.C("^\\s*\\%\\{\\s*$","^\\s*\\%\\}\\s*$"),e.C("\\%","$")].concat(a)}});hljs.registerLanguage("vbscript",function(e){return{aliases:["vbs"],cI:!0,k:{keyword:"call class const dim do loop erase execute executeglobal exit for each next function if then else on error option explicit new private property let get public randomize redim rem select case set stop sub while wend with end to elseif is or xor and not class_initialize class_terminate default preserve in me byval byref step resume goto",built_in:"lcase month vartype instrrev ubound setlocale getobject rgb getref string weekdayname rnd dateadd monthname now day minute isarray cbool round formatcurrency conversions csng timevalue second year space abs 
clng timeserial fixs len asc isempty maths dateserial atn timer isobject filter weekday datevalue ccur isdate instr datediff formatdatetime replace isnull right sgn array snumeric log cdbl hex chr lbound msgbox ucase getlocale cos cdate cbyte rtrim join hour oct typename trim strcomp int createobject loadpicture tan formatnumber mid scriptenginebuildversion scriptengine split scriptengineminorversion cint sin datepart ltrim sqr scriptenginemajorversion time derived eval date formatpercent exp inputbox left ascw chrw regexp server response request cstr err",literal:"true false null nothing empty"},i:"//",c:[e.inherit(e.QSM,{c:[{b:'""'}]}),e.C(/'/,/$/,{r:0}),e.CNM]}});hljs.registerLanguage("capnproto",function(t){return{aliases:["capnp"],k:{keyword:"struct enum interface union group import using const annotation extends in of on as with from fixed",built_in:"Void Bool Int8 Int16 Int32 Int64 UInt8 UInt16 UInt32 UInt64 Float32 Float64 Text Data AnyPointer AnyStruct Capability List",literal:"true false"},c:[t.QSM,t.NM,t.HCM,{cN:"shebang",b:/@0x[\w\d]{16};/,i:/\n/},{cN:"number",b:/@\d+\b/},{cN:"class",bK:"struct enum",e:/\{/,i:/\n/,c:[t.inherit(t.TM,{starts:{eW:!0,eE:!0}})]},{cN:"class",bK:"interface",e:/\{/,i:/\n/,c:[t.inherit(t.TM,{starts:{eW:!0,eE:!0}})]}]}});hljs.registerLanguage("xl",function(e){var t="ObjectLoader Animate MovieCredits Slides Filters Shading Materials LensFlare Mapping VLCAudioVideo StereoDecoder PointCloud NetworkAccess RemoteControl RegExp ChromaKey Snowfall NodeJS Speech Charts",o={keyword:"if then else do while until for loop import with is as where when by data constant",literal:"true false nil",type:"integer real text name boolean symbol infix prefix postfix block tree",built_in:"in mod rem and or xor not abs sign floor ceil sqrt sin cos tan asin acos atan exp expm1 log log2 log10 log1p pi at",module:t,id:"text_length text_range text_find text_replace contains page slide basic_slide title_slide title subtitle fade_in fade_out fade_at clear_color color line_color line_width texture_wrap texture_transform texture scale_?x scale_?y scale_?z? translate_?x translate_?y translate_?z? rotate_?x rotate_?y rotate_?z? 
rectangle circle ellipse sphere path line_to move_to quad_to curve_to theme background contents locally time mouse_?x mouse_?y mouse_buttons"},a={cN:"constant",b:"[A-Z][A-Z_0-9]+",r:0},r={cN:"variable",b:"([A-Z][a-z_0-9]+)+",r:0},i={cN:"id",b:"[a-z][a-z_0-9]+",r:0},l={cN:"string",b:'"',e:'"',i:"\\n"},n={cN:"string",b:"'",e:"'",i:"\\n"},s={cN:"string",b:"<<",e:">>"},c={cN:"number",b:"[0-9]+#[0-9A-Z_]+(\\.[0-9-A-Z_]+)?#?([Ee][+-]?[0-9]+)?",r:10},_={cN:"import",bK:"import",e:"$",k:{keyword:"import",module:t},r:0,c:[l]},d={cN:"function",b:"[a-z].*->"};return{aliases:["tao"],l:/[a-zA-Z][a-zA-Z0-9_?]*/,k:o,c:[e.CLCM,e.CBCM,l,n,s,d,_,a,r,i,c,e.NM]}});hljs.registerLanguage("scala",function(e){var t={cN:"annotation",b:"@[A-Za-z]+"},a={cN:"string",b:'u?r?"""',e:'"""',r:10},r={cN:"symbol",b:"'\\w[\\w\\d_]*(?!')"},c={cN:"type",b:"\\b[A-Z][A-Za-z0-9_]*",r:0},i={cN:"title",b:/[^0-9\n\t "'(),.`{}\[\]:;][^\n\t "'(),.`{}\[\]:;]+|[^0-9\n\t "'(),.`{}\[\]:;=]/,r:0},l={cN:"class",bK:"class object trait type",e:/[:={\[(\n;]/,c:[{cN:"keyword",bK:"extends with",r:10},i]},n={cN:"function",bK:"def val",e:/[:={\[(\n;]/,c:[i]};return{k:{literal:"true false null",keyword:"type yield lazy override def with val var sealed abstract private trait object if forSome for while throw finally protected extends import final return else break new catch super class case package default try this match continue throws implicit"},c:[e.CLCM,e.CBCM,a,e.QSM,r,c,n,l,e.CNM,t]}});hljs.registerLanguage("elixir",function(e){var n="[a-zA-Z_][a-zA-Z0-9_]*(\\!|\\?)?",r="[a-zA-Z_]\\w*[!?=]?|[-+~]\\@|<<|>>|=~|===?|<=>|[<>]=?|\\*\\*|[-/+%^&*~`|]|\\[\\]=?",b="and false then defined module in return redo retry end for true self when next until do begin unless nil break not case cond alias while ensure or include use alias fn quote",c={cN:"subst",b:"#\\{",e:"}",l:n,k:b},a={cN:"string",c:[e.BE,c],v:[{b:/'/,e:/'/},{b:/"/,e:/"/}]},i={cN:"function",bK:"def defp defmacro",e:/\B\b/,c:[e.inherit(e.TM,{b:n,endsParent:!0})]},s=e.inherit(i,{cN:"class",bK:"defmodule defrecord",e:/\bdo\b|$|;/}),l=[a,e.HCM,s,i,{cN:"constant",b:"(\\b[A-Z_]\\w*(.)?)+",r:0},{cN:"symbol",b:":",c:[a,{b:r}],r:0},{cN:"symbol",b:n+":",r:0},{cN:"number",b:"(\\b0[0-7_]+)|(\\b0x[0-9a-fA-F_]+)|(\\b[1-9][0-9_]*(\\.[0-9_]+)?)|[0_]\\b",r:0},{cN:"variable",b:"(\\$\\W)|((\\$|\\@\\@?)(\\w+))"},{b:"->"},{b:"("+e.RSR+")\\s*",c:[e.HCM,{cN:"regexp",i:"\\n",c:[e.BE,c],v:[{b:"/",e:"/[a-z]*"},{b:"%r\\[",e:"\\][a-z]*"}]}],r:0}];return c.c=l,{l:n,k:b,c:l}});hljs.registerLanguage("sml",function(e){return{aliases:["ml"],k:{keyword:"abstype and andalso as case datatype do else end eqtype exception fn fun functor handle if in include infix infixr let local nonfix of op open orelse raise rec sharing sig signature struct structure then type val with withtype where while",built_in:"array bool char exn int list option order real ref string substring vector unit word",literal:"true false NONE SOME LESS EQUAL GREATER nil"},i:/\/\/|>>/,l:"[a-z_]\\w*!?",c:[{cN:"literal",b:"\\[(\\|\\|)?\\]|\\(\\)"},e.C("\\(\\*","\\*\\)",{c:["self"]}),{cN:"symbol",b:"'[A-Za-z_](?!')[\\w']*"},{cN:"tag",b:"`[A-Z][\\w']*"},{cN:"type",b:"\\b[A-Z][\\w']*",r:0},{b:"[a-z_]\\w*'[\\w']*"},e.inherit(e.ASM,{cN:"char",r:0}),e.inherit(e.QSM,{i:null}),{cN:"number",b:"\\b(0[xX][a-fA-F0-9_]+[Lln]?|0[oO][0-7_]+[Lln]?|0[bB][01_]+[Lln]?|[0-9][0-9_]*([Lln]|(\\.[0-9_]*)?([eE][-+]?[0-9_]+)?)?)",r:0},{b:/[-=]>/}]}});hljs.registerLanguage("apache",function(e){var 
r={cN:"number",b:"[\\$%]\\d+"};return{aliases:["apacheconf"],cI:!0,c:[e.HCM,{cN:"tag",b:"</?",e:">"},{cN:"keyword",b:/\w+/,r:0,k:{common:"order deny allow setenv rewriterule rewriteengine rewritecond documentroot sethandler errordocument loadmodule options header listen serverroot servername"},starts:{e:/$/,r:0,k:{literal:"on off all"},c:[{cN:"sqbracket",b:"\\s\\[",e:"\\]$"},{cN:"cbracket",b:"[\\$%]\\{",e:"\\}",c:["self",r]},r,e.QSM]}}],i:/\S/}});hljs.registerLanguage("dockerfile",function(n){return{aliases:["docker"],cI:!0,k:{built_ins:"from maintainer cmd expose add copy entrypoint volume user workdir onbuild run env"},c:[n.HCM,{k:{built_in:"run cmd entrypoint volume add copy workdir onbuild"},b:/^ *(onbuild +)?(run|cmd|entrypoint|volume|add|copy|workdir) +/,starts:{e:/[^\\]\n/,sL:"bash",subLanguageMode:"continuous"}},{k:{built_in:"from maintainer expose env user onbuild"},b:/^ *(onbuild +)?(from|maintainer|expose|env|user|onbuild) +/,e:/[^\\]\n/,c:[n.ASM,n.QSM,n.NM,n.HCM]}]}});hljs.registerLanguage("markdown",function(e){return{aliases:["md","mkdown","mkd"],c:[{cN:"header",v:[{b:"^#{1,6}",e:"$"},{b:"^.+?\\n[=-]{2,}$"}]},{b:"<",e:">",sL:"xml",r:0},{cN:"bullet",b:"^([*+-]|(\\d+\\.))\\s+"},{cN:"strong",b:"[*_]{2}.+?[*_]{2}"},{cN:"emphasis",v:[{b:"\\*.+?\\*"},{b:"_.+?_",r:0}]},{cN:"blockquote",b:"^>\\s+",e:"$"},{cN:"code",v:[{b:"`.+?`"},{b:"^( {4}|	)",e:"$",r:0}]},{cN:"horizontal_rule",b:"^[-\\*]{3,}",e:"$"},{b:"\\[.+?\\][\\(\\[].*?[\\)\\]]",rB:!0,c:[{cN:"link_label",b:"\\[",e:"\\]",eB:!0,rE:!0,r:0},{cN:"link_url",b:"\\]\\(",e:"\\)",eB:!0,eE:!0},{cN:"link_reference",b:"\\]\\[",e:"\\]",eB:!0,eE:!0}],r:10},{b:"^\\[.+\\]:",rB:!0,c:[{cN:"link_reference",b:"\\[",e:"\\]:",eB:!0,eE:!0,starts:{cN:"link_url",e:"$"}}]}]}});hljs.registerLanguage("haml",function(s){return{cI:!0,c:[{cN:"doctype",b:"^!!!( (5|1\\.1|Strict|Frameset|Basic|Mobile|RDFa|XML\\b.*))?$",r:10},s.C("^\\s*(!=#|=#|-#|/).*$",!1,{r:0}),{b:"^\\s*(-|=|!=)(?!#)",starts:{e:"\\n",sL:"ruby"}},{cN:"tag",b:"^\\s*%",c:[{cN:"title",b:"\\w+"},{cN:"value",b:"[#\\.]\\w+"},{b:"{\\s*",e:"\\s*}",eE:!0,c:[{b:":\\w+\\s*=>",e:",\\s+",rB:!0,eW:!0,c:[{cN:"symbol",b:":\\w+"},{cN:"string",b:'"',e:'"'},{cN:"string",b:"'",e:"'"},{b:"\\w+",r:0}]}]},{b:"\\(\\s*",e:"\\s*\\)",eE:!0,c:[{b:"\\w+\\s*=",e:"\\s+",rB:!0,eW:!0,c:[{cN:"attribute",b:"\\w+",r:0},{cN:"string",b:'"',e:'"'},{cN:"string",b:"'",e:"'"},{b:"\\w+",r:0}]}]}]},{cN:"bullet",b:"^\\s*[=~]\\s*",r:0},{b:"#{",starts:{e:"}",sL:"ruby"}}]}});hljs.registerLanguage("fortran",function(e){var t={cN:"params",b:"\\(",e:"\\)"},n={constant:".False. .True.",type:"integer real character complex logical dimension allocatable|10 parameter external implicit|10 none double precision assign intent optional pointer target in out common equivalence data",keyword:"kind do while private call intrinsic where elsewhere type endtype endmodule endselect endinterface end enddo endif if forall endforall only contains default return stop then public subroutine|10 function program .and. .or. .not. .le. .eq. .ge. .gt. .lt. 
goto save else use module select case access blank direct exist file fmt form formatted iostat name named nextrec number opened rec recl sequential status unformatted unit continue format pause cycle exit c_null_char c_alert c_backspace c_form_feed flush wait decimal round iomsg synchronous nopass non_overridable pass protected volatile abstract extends import non_intrinsic value deferred generic final enumerator class associate bind enum c_int c_short c_long c_long_long c_signed_char c_size_t c_int8_t c_int16_t c_int32_t c_int64_t c_int_least8_t c_int_least16_t c_int_least32_t c_int_least64_t c_int_fast8_t c_int_fast16_t c_int_fast32_t c_int_fast64_t c_intmax_t C_intptr_t c_float c_double c_long_double c_float_complex c_double_complex c_long_double_complex c_bool c_char c_null_ptr c_null_funptr c_new_line c_carriage_return c_horizontal_tab c_vertical_tab iso_c_binding c_loc c_funloc c_associated  c_f_pointer c_ptr c_funptr iso_fortran_env character_storage_size error_unit file_storage_size input_unit iostat_end iostat_eor numeric_storage_size output_unit c_f_procpointer ieee_arithmetic ieee_support_underflow_control ieee_get_underflow_mode ieee_set_underflow_mode newunit contiguous pad position action delim readwrite eor advance nml interface procedure namelist include sequence elemental pure",built_in:"alog alog10 amax0 amax1 amin0 amin1 amod cabs ccos cexp clog csin csqrt dabs dacos dasin datan datan2 dcos dcosh ddim dexp dint dlog dlog10 dmax1 dmin1 dmod dnint dsign dsin dsinh dsqrt dtan dtanh float iabs idim idint idnint ifix isign max0 max1 min0 min1 sngl algama cdabs cdcos cdexp cdlog cdsin cdsqrt cqabs cqcos cqexp cqlog cqsin cqsqrt dcmplx dconjg derf derfc dfloat dgamma dimag dlgama iqint qabs qacos qasin qatan qatan2 qcmplx qconjg qcos qcosh qdim qerf qerfc qexp qgamma qimag qlgama qlog qlog10 qmax1 qmin1 qmod qnint qsign qsin qsinh qsqrt qtan qtanh abs acos aimag aint anint asin atan atan2 char cmplx conjg cos cosh exp ichar index int log log10 max min nint sign sin sinh sqrt tan tanh print write dim lge lgt lle llt mod nullify allocate deallocate adjustl adjustr all allocated any associated bit_size btest ceiling count cshift date_and_time digits dot_product eoshift epsilon exponent floor fraction huge iand ibclr ibits ibset ieor ior ishft ishftc lbound len_trim matmul maxexponent maxloc maxval merge minexponent minloc minval modulo mvbits nearest pack present product radix random_number random_seed range repeat reshape rrspacing scale scan selected_int_kind selected_real_kind set_exponent shape size spacing spread sum system_clock tiny transpose trim ubound unpack verify achar iachar transfer dble entry dprod cpu_time command_argument_count get_command get_command_argument get_environment_variable is_iostat_end ieee_arithmetic ieee_support_underflow_control ieee_get_underflow_mode ieee_set_underflow_mode is_iostat_eor move_alloc new_line selected_char_kind same_type_as extends_type_ofacosh asinh atanh bessel_j0 bessel_j1 bessel_jn bessel_y0 bessel_y1 bessel_yn erf erfc erfc_scaled gamma log_gamma hypot norm2 atomic_define atomic_ref execute_command_line leadz trailz storage_size merge_bits bge bgt ble blt dshiftl dshiftr findloc iall iany iparity image_index lcobound ucobound maskl maskr num_images parity popcnt poppar shifta shiftl shiftr this_image"};return{cI:!0,aliases:["f90","f95"],k:n,c:[e.inherit(e.ASM,{cN:"string",r:0}),e.inherit(e.QSM,{cN:"string",r:0}),{cN:"function",bK:"subroutine function 
program",i:"[${=\\n]",c:[e.UTM,t]},e.C("!","$",{r:0}),{cN:"number",b:"(?=\\b|\\+|\\-|\\.)(?=\\.\\d|\\d)(?:\\d+)?(?:\\.?\\d*)(?:[de][+-]?\\d+)?\\b\\.?",r:0}]}});hljs.registerLanguage("smali",function(r){var t=["add","and","cmp","cmpg","cmpl","const","div","double","float","goto","if","int","long","move","mul","neg","new","nop","not","or","rem","return","shl","shr","sput","sub","throw","ushr","xor"],n=["aget","aput","array","check","execute","fill","filled","goto/16","goto/32","iget","instance","invoke","iput","monitor","packed","sget","sparse"],s=["transient","constructor","abstract","final","synthetic","public","private","protected","static","bridge","system"];return{aliases:["smali"],c:[{cN:"string",b:'"',e:'"',r:0},r.C("#","$",{r:0}),{cN:"keyword",b:"\\s*\\.end\\s[a-zA-Z0-9]*",r:1},{cN:"keyword",b:"^[ ]*\\.[a-zA-Z]*",r:0},{cN:"keyword",b:"\\s:[a-zA-Z_0-9]*",r:0},{cN:"keyword",b:"\\s("+s.join("|")+")",r:1},{cN:"keyword",b:"\\[",r:0},{cN:"instruction",b:"\\s("+t.join("|")+")\\s",r:1},{cN:"instruction",b:"\\s("+t.join("|")+")((\\-|/)[a-zA-Z0-9]+)+\\s",r:10},{cN:"instruction",b:"\\s("+n.join("|")+")((\\-|/)[a-zA-Z0-9]+)*\\s",r:10},{cN:"class",b:"L[^(;:\n]*;",r:0},{cN:"function",b:'( |->)[^(\n ;"]*\\(',r:0},{cN:"function",b:"\\)",r:0},{cN:"variable",b:"[vp][0-9]+",r:0}]}});hljs.registerLanguage("julia",function(r){var e={keyword:"in abstract baremodule begin bitstype break catch ccall const continue do else elseif end export finally for function global if immutable import importall let local macro module quote return try type typealias using while",literal:"true false ANY ARGS CPU_CORES C_NULL DL_LOAD_PATH DevNull ENDIAN_BOM ENV I|0 Inf Inf16 Inf32 InsertionSort JULIA_HOME LOAD_PATH MS_ASYNC MS_INVALIDATE MS_SYNC MergeSort NaN NaN16 NaN32 OS_NAME QuickSort RTLD_DEEPBIND RTLD_FIRST RTLD_GLOBAL RTLD_LAZY RTLD_LOCAL RTLD_NODELETE RTLD_NOLOAD RTLD_NOW RoundDown RoundFromZero RoundNearest RoundToZero RoundUp STDERR STDIN STDOUT VERSION WORD_SIZE catalan cglobal e eu eulergamma golden im nothing pi γ π φ",built_in:"ASCIIString AbstractArray AbstractRNG AbstractSparseArray Any ArgumentError Array Associative Base64Pipe Bidiagonal BigFloat BigInt BitArray BitMatrix BitVector Bool BoundsError Box CFILE Cchar Cdouble Cfloat Char CharString Cint Clong Clonglong ClusterManager Cmd Coff_t Colon Complex Complex128 Complex32 Complex64 Condition Cptrdiff_t Cshort Csize_t Cssize_t Cuchar Cuint Culong Culonglong Cushort Cwchar_t DArray DataType DenseArray Diagonal Dict DimensionMismatch DirectIndexString Display DivideError DomainError EOFError EachLine Enumerate ErrorException Exception Expr Factorization FileMonitor FileOffset Filter Float16 Float32 Float64 FloatRange FloatingPoint Function GetfieldNode GotoNode Hermitian IO IOBuffer IOStream IPv4 IPv6 InexactError Int Int128 Int16 Int32 Int64 Int8 IntSet Integer InterruptException IntrinsicFunction KeyError LabelNode LambdaStaticData LineNumberNode LoadError LocalProcess MIME MathConst MemoryError MersenneTwister Method MethodError MethodTable Module NTuple NewvarNode Nothing Number ObjectIdDict OrdinalRange OverflowError ParseError PollingFileWatcher ProcessExitedException ProcessGroup Ptr QuoteNode Range Range1 Ranges Rational RawFD Real Regex RegexMatch RemoteRef RepString RevString RopeString RoundingMode Set SharedArray Signed SparseMatrixCSC StackOverflowError Stat StatStruct StepRange String SubArray SubString SymTridiagonal Symbol SymbolNode Symmetric SystemError Task TextDisplay Timer TmStruct TopNode Triangular Tridiagonal Type TypeConstructor 
TypeError TypeName TypeVar UTF16String UTF32String UTF8String UdpSocket Uint Uint128 Uint16 Uint32 Uint64 Uint8 UndefRefError UndefVarError UniformScaling UnionType UnitRange Unsigned Vararg VersionNumber WString WeakKeyDict WeakRef Woodbury Zip"},t="[A-Za-z_\\u00A1-\\uFFFF][A-Za-z_0-9\\u00A1-\\uFFFF]*",o={l:t,k:e},n={cN:"type-annotation",b:/::/},a={cN:"subtype",b:/<:/},i={cN:"number",b:/(\b0x[\d_]*(\.[\d_]*)?|0x\.\d[\d_]*)p[-+]?\d+|\b0[box][a-fA-F0-9][a-fA-F0-9_]*|(\b\d[\d_]*(\.[\d_]*)?|\.\d[\d_]*)([eEfF][-+]?\d+)?/,r:0},l={cN:"char",b:/'(.|\\[xXuU][a-zA-Z0-9]+)'/},c={cN:"subst",b:/\$\(/,e:/\)/,k:e},u={cN:"variable",b:"\\$"+t},d={cN:"string",c:[r.BE,c,u],v:[{b:/\w*"/,e:/"\w*/},{b:/\w*"""/,e:/"""\w*/}]},g={cN:"string",c:[r.BE,c,u],b:"`",e:"`"},s={cN:"macrocall",b:"@"+t},S={cN:"comment",v:[{b:"#=",e:"=#",r:10},{b:"#",e:"$"}]};return o.c=[i,l,n,a,d,g,s,S,r.HCM],c.c=o.c,o});hljs.registerLanguage("delphi",function(e){var r="exports register file shl array record property for mod while set ally label uses raise not stored class safecall var interface or private static exit index inherited to else stdcall override shr asm far resourcestring finalization packed virtual out and protected library do xorwrite goto near function end div overload object unit begin string on inline repeat until destructor write message program with read initialization except default nil if case cdecl in downto threadvar of try pascal const external constructor type public then implementation finally published procedure",t=[e.CLCM,e.C(/\{/,/\}/,{r:0}),e.C(/\(\*/,/\*\)/,{r:10})],i={cN:"string",b:/'/,e:/'/,c:[{b:/''/}]},c={cN:"string",b:/(#\d+)+/},o={b:e.IR+"\\s*=\\s*class\\s*\\(",rB:!0,c:[e.TM]},n={cN:"function",bK:"function constructor destructor procedure",e:/[:;]/,k:"function constructor|10 destructor|10 procedure|10",c:[e.TM,{cN:"params",b:/\(/,e:/\)/,k:r,c:[i,c]}].concat(t)};return{cI:!0,k:r,i:/"|\$[G-Zg-z]|\/\*|<\/|\|/,c:[i,c,e.NM,o,n].concat(t)}});hljs.registerLanguage("brainfuck",function(r){var n={cN:"literal",b:"[\\+\\-]",r:0};return{aliases:["bf"],c:[r.C("[^\\[\\]\\.,\\+\\-<> \r\n]","[\\[\\]\\.,\\+\\-<> \r\n]",{rE:!0,r:0}),{cN:"title",b:"[\\[\\]]",r:0},{cN:"string",b:"[\\.,]",r:0},{b:/\+\+|\-\-/,rB:!0,c:[n]},n]}});hljs.registerLanguage("ini",function(e){return{cI:!0,i:/\S/,c:[e.C(";","$"),{cN:"title",b:"^\\[",e:"\\]"},{cN:"setting",b:"^[a-z0-9\\[\\]_-]+[ \\t]*=[ \\t]*",e:"$",c:[{cN:"value",eW:!0,k:"on off true false yes no",c:[e.QSM,e.NM],r:0}]}]}});hljs.registerLanguage("json",function(e){var t={literal:"true false null"},i=[e.QSM,e.CNM],l={cN:"value",e:",",eW:!0,eE:!0,c:i,k:t},c={b:"{",e:"}",c:[{cN:"attribute",b:'\\s*"',e:'"\\s*:\\s*',eB:!0,eE:!0,c:[e.BE],i:"\\n",starts:l}],i:"\\S"},n={b:"\\[",e:"\\]",c:[e.inherit(l,{cN:null})],i:"\\S"};return i.splice(i.length,0,c,n),{c:i,k:t,i:"\\S"}});hljs.registerLanguage("powershell",function(e){var t={b:"`[\\s\\S]",r:0},r={cN:"variable",v:[{b:/\$[\w\d][\w\d_:]*/}]},o={cN:"string",b:/"/,e:/"/,c:[t,r,{cN:"variable",b:/\$[A-z]/,e:/[^A-z]/}]},a={cN:"string",b:/'/,e:/'/};return{aliases:["ps"],l:/-?[A-z\.\-]+/,cI:!0,k:{keyword:"if else foreach return function do while until elseif begin for trap data dynamicparam end break throw param continue finally in switch exit filter try process catch",literal:"$null $true $false",built_in:"Add-Content Add-History Add-Member Add-PSSnapin Clear-Content Clear-Item Clear-Item Property Clear-Variable Compare-Object ConvertFrom-SecureString Convert-Path ConvertTo-Html ConvertTo-SecureString Copy-Item Copy-ItemProperty Export-Alias 
Export-Clixml Export-Console Export-Csv ForEach-Object Format-Custom Format-List Format-Table Format-Wide Get-Acl Get-Alias Get-AuthenticodeSignature Get-ChildItem Get-Command Get-Content Get-Credential Get-Culture Get-Date Get-EventLog Get-ExecutionPolicy Get-Help Get-History Get-Host Get-Item Get-ItemProperty Get-Location Get-Member Get-PfxCertificate Get-Process Get-PSDrive Get-PSProvider Get-PSSnapin Get-Service Get-TraceSource Get-UICulture Get-Unique Get-Variable Get-WmiObject Group-Object Import-Alias Import-Clixml Import-Csv Invoke-Expression Invoke-History Invoke-Item Join-Path Measure-Command Measure-Object Move-Item Move-ItemProperty New-Alias New-Item New-ItemProperty New-Object New-PSDrive New-Service New-TimeSpan New-Variable Out-Default Out-File Out-Host Out-Null Out-Printer Out-String Pop-Location Push-Location Read-Host Remove-Item Remove-ItemProperty Remove-PSDrive Remove-PSSnapin Remove-Variable Rename-Item Rename-ItemProperty Resolve-Path Restart-Service Resume-Service Select-Object Select-String Set-Acl Set-Alias Set-AuthenticodeSignature Set-Content Set-Date Set-ExecutionPolicy Set-Item Set-ItemProperty Set-Location Set-PSDebug Set-Service Set-TraceSource Set-Variable Sort-Object Split-Path Start-Service Start-Sleep Start-Transcript Stop-Process Stop-Service Stop-Transcript Suspend-Service Tee-Object Test-Path Trace-Command Update-FormatData Update-TypeData Where-Object Write-Debug Write-Error Write-Host Write-Output Write-Progress Write-Verbose Write-Warning",operator:"-ne -eq -lt -gt -ge -le -not -like -notlike -match -notmatch -contains -notcontains -in -notin -replace"},c:[e.HCM,e.NM,o,a,r]}});hljs.registerLanguage("gradle",function(e){return{cI:!0,k:{keyword:"task project allprojects subprojects artifacts buildscript configurations dependencies repositories sourceSets description delete from into include exclude source classpath destinationDir includes options sourceCompatibility targetCompatibility group flatDir doLast doFirst flatten todir fromdir ant def abstract break case catch continue default do else extends final finally for if implements instanceof native new private protected public return static switch synchronized throw throws transient try volatile while strictfp package import false null super this true antlrtask checkstyle codenarc copy boolean byte char class double float int interface long short void compile runTime file fileTree abs any append asList asWritable call collect compareTo count div dump each eachByte eachFile eachLine every find findAll flatten getAt getErr getIn getOut getText grep immutable inject inspect intersect invokeMethods isCase join leftShift minus multiply newInputStream newOutputStream newPrintWriter newReader newWriter next plus pop power previous print println push putAt read readBytes readLines reverse reverseEach round size sort splitEachLine step subMap times toInteger toList tokenize upto waitForOrKill withPrintWriter withReader withStream withWriter withWriterAppend write writeLine"},c:[e.CLCM,e.CBCM,e.ASM,e.QSM,e.NM,e.RM]}});hljs.registerLanguage("erb",function(e){return{sL:"xml",subLanguageMode:"continuous",c:[e.C("<%#","%>"),{b:"<%[%=-]?",e:"[%-]?%>",sL:"ruby",eB:!0,eE:!0}]}});hljs.registerLanguage("swift",function(e){var i={keyword:"class deinit enum extension func import init let protocol static struct subscript typealias var break case continue default do else fallthrough if in for return switch where while as dynamicType is new super self Self Type __COLUMN__ __FILE__ __FUNCTION__ __LINE__ associativity 
didSet get infix inout left mutating none nonmutating operator override postfix precedence prefix right set unowned unowned safe unsafe weak willSet",literal:"true false nil",built_in:"abs advance alignof alignofValue assert bridgeFromObjectiveC bridgeFromObjectiveCUnconditional bridgeToObjectiveC bridgeToObjectiveCUnconditional c contains count countElements countLeadingZeros debugPrint debugPrintln distance dropFirst dropLast dump encodeBitsAsWords enumerate equal false filter find getBridgedObjectiveCType getVaList indices insertionSort isBridgedToObjectiveC isBridgedVerbatimToObjectiveC isUniquelyReferenced join lexicographicalCompare map max maxElement min minElement nil numericCast partition posix print println quickSort reduce reflect reinterpretCast reverse roundUpToAlignment sizeof sizeofValue sort split startsWith strideof strideofValue swap swift toString transcode true underestimateCount unsafeReflect withExtendedLifetime withObjectAtPlusZero withUnsafePointer withUnsafePointerToObject withUnsafePointers withVaList"},t={cN:"type",b:"\\b[A-Z][\\w']*",r:0},n=e.C("/\\*","\\*/",{c:["self"]}),r={cN:"subst",b:/\\\(/,e:"\\)",k:i,c:[]},s={cN:"number",b:"\\b([\\d_]+(\\.[\\deE_]+)?|0x[a-fA-F0-9_]+(\\.[a-fA-F0-9p_]+)?|0b[01_]+|0o[0-7_]+)\\b",r:0},o=e.inherit(e.QSM,{c:[r,e.BE]});return r.c=[s],{k:i,c:[o,e.CLCM,n,t,s,{cN:"func",bK:"func",e:"{",eE:!0,c:[e.inherit(e.TM,{b:/[A-Za-z$_][0-9A-Za-z$_]*/,i:/\(/}),{cN:"generics",b:/</,e:/>/,i:/>/},{cN:"params",b:/\(/,e:/\)/,endsParent:!0,k:i,c:["self",s,o,e.CBCM,{b:":"}],i:/["']/}],i:/\[|%/},{cN:"class",bK:"struct protocol class extension enum",k:i,e:"\\{",eE:!0,c:[e.inherit(e.TM,{b:/[A-Za-z$_][0-9A-Za-z$_]*/})]},{cN:"preprocessor",b:"(@assignment|@class_protocol|@exported|@final|@lazy|@noreturn|@NSCopying|@NSManaged|@objc|@optional|@required|@auto_closure|@noreturn|@IBAction|@IBDesignable|@IBInspectable|@IBOutlet|@infix|@prefix|@postfix)"}]}});hljs.registerLanguage("lisp",function(b){var e="[a-zA-Z_\\-\\+\\*\\/\\<\\=\\>\\&\\#][a-zA-Z0-9_\\-\\+\\*\\/\\<\\=\\>\\&\\#!]*",c="\\|[^]*?\\|",r="(\\-|\\+)?\\d+(\\.\\d+|\\/\\d+)?((d|e|f|l|s|D|E|F|L|S)(\\+|\\-)?\\d+)?",a={cN:"shebang",b:"^#!",e:"$"},i={cN:"literal",b:"\\b(t{1}|nil)\\b"},l={cN:"number",v:[{b:r,r:0},{b:"#(b|B)[0-1]+(/[0-1]+)?"},{b:"#(o|O)[0-7]+(/[0-7]+)?"},{b:"#(x|X)[0-9a-fA-F]+(/[0-9a-fA-F]+)?"},{b:"#(c|C)\\("+r+" +"+r,e:"\\)"}]},t=b.inherit(b.QSM,{i:null}),d=b.C(";","$",{r:0}),n={cN:"variable",b:"\\*",e:"\\*"},u={cN:"keyword",b:"[:&]"+e},N={b:e,r:0},o={b:c},s={b:"\\(",e:"\\)",c:["self",i,t,l,N]},v={cN:"quoted",c:[l,t,n,u,s,N],v:[{b:"['`]\\(",e:"\\)"},{b:"\\(quote ",e:"\\)",k:"quote"},{b:"'"+c}]},f={cN:"quoted",v:[{b:"'"+e},{b:"#'"+e+"(::"+e+")*"}]},g={cN:"list",b:"\\(\\s*",e:"\\)"},q={eW:!0,r:0};return g.c=[{cN:"keyword",v:[{b:e},{b:c}]},q],q.c=[v,f,g,i,l,t,d,n,u,o,N],{i:/\S/,c:[l,a,i,t,d,v,f,g,N]}});hljs.registerLanguage("rsl",function(e){return{k:{keyword:"float color point normal vector matrix while for if do return else break extern continue",built_in:"abs acos ambient area asin atan atmosphere attribute calculatenormal ceil cellnoise clamp comp concat cos degrees depth Deriv diffuse distance Du Dv environment exp faceforward filterstep floor format fresnel incident length lightsource log match max min mod noise normalize ntransform opposite option phong pnoise pow printf ptlined radians random reflect refract renderinfo round setcomp setxcomp setycomp setzcomp shadow sign sin smoothstep specular specularbrdf spline sqrt step tan texture textureinfo trace transform vtransform xcomp ycomp 
zcomp"},i:"</",c:[e.CLCM,e.CBCM,e.QSM,e.ASM,e.CNM,{cN:"preprocessor",b:"#",e:"$"},{cN:"shader",bK:"surface displacement light volume imager",e:"\\("},{cN:"shading",bK:"illuminate illuminance gather",e:"\\("}]}});hljs.registerLanguage("scheme",function(e){var t="[^\\(\\)\\[\\]\\{\\}\",'`;#|\\\\\\s]+",r="(\\-|\\+)?\\d+([./]\\d+)?",i=r+"[+\\-]"+r+"i",a={built_in:"case-lambda call/cc class define-class exit-handler field import inherit init-field interface let*-values let-values let/ec mixin opt-lambda override protect provide public rename require require-for-syntax syntax syntax-case syntax-error unit/sig unless when with-syntax and begin call-with-current-continuation call-with-input-file call-with-output-file case cond define define-syntax delay do dynamic-wind else for-each if lambda let let* let-syntax letrec letrec-syntax map or syntax-rules ' * + , ,@ - ... / ; < <= = => > >= ` abs acos angle append apply asin assoc assq assv atan boolean? caar cadr call-with-input-file call-with-output-file call-with-values car cdddar cddddr cdr ceiling char->integer char-alphabetic? char-ci<=? char-ci<? char-ci=? char-ci>=? char-ci>? char-downcase char-lower-case? char-numeric? char-ready? char-upcase char-upper-case? char-whitespace? char<=? char<? char=? char>=? char>? char? close-input-port close-output-port complex? cons cos current-input-port current-output-port denominator display eof-object? eq? equal? eqv? eval even? exact->inexact exact? exp expt floor force gcd imag-part inexact->exact inexact? input-port? integer->char integer? interaction-environment lcm length list list->string list->vector list-ref list-tail list? load log magnitude make-polar make-rectangular make-string make-vector max member memq memv min modulo negative? newline not null-environment null? number->string number? numerator odd? open-input-file open-output-file output-port? pair? peek-char port? positive? procedure? quasiquote quote quotient rational? rationalize read read-char real-part real? remainder reverse round scheme-report-environment set! set-car! set-cdr! sin sqrt string string->list string->number string->symbol string-append string-ci<=? string-ci<? string-ci=? string-ci>=? string-ci>? string-copy string-fill! string-length string-ref string-set! string<=? string<? string=? string>=? string>? string? substring symbol->string symbol? tan transcript-off transcript-on truncate values vector vector->list vector-fill! vector-length vector-ref vector-set! 
with-input-from-file with-output-to-file write write-char zero?"},n={cN:"shebang",b:"^#!",e:"$"},c={cN:"literal",b:"(#t|#f|#\\\\"+t+"|#\\\\.)"},l={cN:"number",v:[{b:r,r:0},{b:i,r:0},{b:"#b[0-1]+(/[0-1]+)?"},{b:"#o[0-7]+(/[0-7]+)?"},{b:"#x[0-9a-f]+(/[0-9a-f]+)?"}]},s=e.QSM,o=[e.C(";","$",{r:0}),e.C("#\\|","\\|#")],u={b:t,r:0},p={cN:"variable",b:"'"+t},d={eW:!0,r:0},g={cN:"list",v:[{b:"\\(",e:"\\)"},{b:"\\[",e:"\\]"}],c:[{cN:"keyword",b:t,l:t,k:a},d]};return d.c=[c,l,s,u,p,g].concat(o),{i:/\S/,c:[n,l,s,p,g].concat(o)}});hljs.registerLanguage("stata",function(e){return{aliases:["do","ado"],cI:!0,k:"if else in foreach for forv forva forval forvalu forvalue forvalues by bys bysort xi quietly qui capture about ac ac_7 acprplot acprplot_7 adjust ado adopath adoupdate alpha ameans an ano anov anova anova_estat anova_terms anovadef aorder ap app appe appen append arch arch_dr arch_estat arch_p archlm areg areg_p args arima arima_dr arima_estat arima_p as asmprobit asmprobit_estat asmprobit_lf asmprobit_mfx__dlg asmprobit_p ass asse asser assert avplot avplot_7 avplots avplots_7 bcskew0 bgodfrey binreg bip0_lf biplot bipp_lf bipr_lf bipr_p biprobit bitest bitesti bitowt blogit bmemsize boot bootsamp bootstrap bootstrap_8 boxco_l boxco_p boxcox boxcox_6 boxcox_p bprobit br break brier bro brow brows browse brr brrstat bs bs_7 bsampl_w bsample bsample_7 bsqreg bstat bstat_7 bstat_8 bstrap bstrap_7 ca ca_estat ca_p cabiplot camat canon canon_8 canon_8_p canon_estat canon_p cap caprojection capt captu captur capture cat cc cchart cchart_7 cci cd censobs_table centile cf char chdir checkdlgfiles checkestimationsample checkhlpfiles checksum chelp ci cii cl class classutil clear cli clis clist clo clog clog_lf clog_p clogi clogi_sw clogit clogit_lf clogit_p clogitp clogl_sw cloglog clonevar clslistarray cluster cluster_measures cluster_stop cluster_tree cluster_tree_8 clustermat cmdlog cnr cnre cnreg cnreg_p cnreg_sw cnsreg codebook collaps4 collapse colormult_nb colormult_nw compare compress conf confi confir confirm conren cons const constr constra constrai constrain constraint continue contract copy copyright copysource cor corc corr corr2data corr_anti corr_kmo corr_smc corre correl correla correlat correlate corrgram cou coun count cox cox_p cox_sw coxbase coxhaz coxvar cprplot cprplot_7 crc cret cretu cretur creturn cross cs cscript cscript_log csi ct ct_is ctset ctst_5 ctst_st cttost cumsp cumsp_7 cumul cusum cusum_7 cutil d datasig datasign datasigna datasignat datasignatu datasignatur datasignature datetof db dbeta de dec deco decod decode deff des desc descr descri describ describe destring dfbeta dfgls dfuller di di_g dir dirstats dis discard disp disp_res disp_s displ displa display distinct do doe doed doedi doedit dotplot dotplot_7 dprobit drawnorm drop ds ds_util dstdize duplicates durbina dwstat dydx e ed edi edit egen eivreg emdef en enc enco encod encode eq erase ereg ereg_lf ereg_p ereg_sw ereghet ereghet_glf ereghet_glf_sh ereghet_gp ereghet_ilf ereghet_ilf_sh ereghet_ip eret eretu eretur ereturn err erro error est est_cfexist est_cfname est_clickable est_expand est_hold est_table est_unhold est_unholdok estat estat_default estat_summ estat_vce_only esti estimates etodow etof etomdy ex exi exit expand expandcl fac fact facto factor factor_estat factor_p factor_pca_rotated factor_rotate factormat fcast fcast_compute fcast_graph fdades fdadesc fdadescr fdadescri fdadescrib fdadescribe fdasav fdasave fdause fh_st file open file read file close file filefilter fillin find_hlp_file findfile 
findit findit_7 fit fl fli flis flist for5_0 form forma format fpredict frac_154 frac_adj frac_chk frac_cox frac_ddp frac_dis frac_dv frac_in frac_mun frac_pp frac_pq frac_pv frac_wgt frac_xo fracgen fracplot fracplot_7 fracpoly fracpred fron_ex fron_hn fron_p fron_tn fron_tn2 frontier ftodate ftoe ftomdy ftowdate g gamhet_glf gamhet_gp gamhet_ilf gamhet_ip gamma gamma_d2 gamma_p gamma_sw gammahet gdi_hexagon gdi_spokes ge gen gene gener genera generat generate genrank genstd genvmean gettoken gl gladder gladder_7 glim_l01 glim_l02 glim_l03 glim_l04 glim_l05 glim_l06 glim_l07 glim_l08 glim_l09 glim_l10 glim_l11 glim_l12 glim_lf glim_mu glim_nw1 glim_nw2 glim_nw3 glim_p glim_v1 glim_v2 glim_v3 glim_v4 glim_v5 glim_v6 glim_v7 glm glm_6 glm_p glm_sw glmpred glo glob globa global glogit glogit_8 glogit_p gmeans gnbre_lf gnbreg gnbreg_5 gnbreg_p gomp_lf gompe_sw gomper_p gompertz gompertzhet gomphet_glf gomphet_glf_sh gomphet_gp gomphet_ilf gomphet_ilf_sh gomphet_ip gphdot gphpen gphprint gprefs gprobi_p gprobit gprobit_8 gr gr7 gr_copy gr_current gr_db gr_describe gr_dir gr_draw gr_draw_replay gr_drop gr_edit gr_editviewopts gr_example gr_example2 gr_export gr_print gr_qscheme gr_query gr_read gr_rename gr_replay gr_save gr_set gr_setscheme gr_table gr_undo gr_use graph graph7 grebar greigen greigen_7 greigen_8 grmeanby grmeanby_7 gs_fileinfo gs_filetype gs_graphinfo gs_stat gsort gwood h hadimvo hareg hausman haver he heck_d2 heckma_p heckman heckp_lf heckpr_p heckprob hel help hereg hetpr_lf hetpr_p hetprob hettest hexdump hilite hist hist_7 histogram hlogit hlu hmeans hotel hotelling hprobit hreg hsearch icd9 icd9_ff icd9p iis impute imtest inbase include inf infi infil infile infix inp inpu input ins insheet insp inspe inspec inspect integ inten intreg intreg_7 intreg_p intrg2_ll intrg_ll intrg_ll2 ipolate iqreg ir irf irf_create irfm iri is_svy is_svysum isid istdize ivprob_1_lf ivprob_lf ivprobit ivprobit_p ivreg ivreg_footnote ivtob_1_lf ivtob_lf ivtobit ivtobit_p jackknife jacknife jknife jknife_6 jknife_8 jkstat joinby kalarma1 kap kap_3 kapmeier kappa kapwgt kdensity kdensity_7 keep ksm ksmirnov ktau kwallis l la lab labe label labelbook ladder levels levelsof leverage lfit lfit_p li lincom line linktest lis list lloghet_glf lloghet_glf_sh lloghet_gp lloghet_ilf lloghet_ilf_sh lloghet_ip llogi_sw llogis_p llogist llogistic llogistichet lnorm_lf lnorm_sw lnorma_p lnormal lnormalhet lnormhet_glf lnormhet_glf_sh lnormhet_gp lnormhet_ilf lnormhet_ilf_sh lnormhet_ip lnskew0 loadingplot loc loca local log logi logis_lf logistic logistic_p logit logit_estat logit_p loglogs logrank loneway lookfor lookup lowess lowess_7 lpredict lrecomp lroc lroc_7 lrtest ls lsens lsens_7 lsens_x lstat ltable ltable_7 ltriang lv lvr2plot lvr2plot_7 m ma mac macr macro makecns man manova manova_estat manova_p manovatest mantel mark markin markout marksample mat mat_capp mat_order mat_put_rr mat_rapp mata mata_clear mata_describe mata_drop mata_matdescribe mata_matsave mata_matuse mata_memory mata_mlib mata_mosave mata_rename mata_which matalabel matcproc matlist matname matr matri matrix matrix_input__dlg matstrik mcc mcci md0_ md1_ md1debug_ md2_ md2debug_ mds mds_estat mds_p mdsconfig mdslong mdsmat mdsshepard mdytoe mdytof me_derd mean means median memory memsize meqparse mer merg merge mfp mfx mhelp mhodds minbound mixed_ll mixed_ll_reparm mkassert mkdir mkmat mkspline ml ml_5 ml_adjs ml_bhhhs ml_c_d ml_check ml_clear ml_cnt ml_debug ml_defd ml_e0 ml_e0_bfgs ml_e0_cycle ml_e0_dfp ml_e0i ml_e1 ml_e1_bfgs 
ml_e1_bhhh ml_e1_cycle ml_e1_dfp ml_e2 ml_e2_cycle ml_ebfg0 ml_ebfr0 ml_ebfr1 ml_ebh0q ml_ebhh0 ml_ebhr0 ml_ebr0i ml_ecr0i ml_edfp0 ml_edfr0 ml_edfr1 ml_edr0i ml_eds ml_eer0i ml_egr0i ml_elf ml_elf_bfgs ml_elf_bhhh ml_elf_cycle ml_elf_dfp ml_elfi ml_elfs ml_enr0i ml_enrr0 ml_erdu0 ml_erdu0_bfgs ml_erdu0_bhhh ml_erdu0_bhhhq ml_erdu0_cycle ml_erdu0_dfp ml_erdu0_nrbfgs ml_exde ml_footnote ml_geqnr ml_grad0 ml_graph ml_hbhhh ml_hd0 ml_hold ml_init ml_inv ml_log ml_max ml_mlout ml_mlout_8 ml_model ml_nb0 ml_opt ml_p ml_plot ml_query ml_rdgrd ml_repor ml_s_e ml_score ml_searc ml_technique ml_unhold mleval mlf_ mlmatbysum mlmatsum mlog mlogi mlogit mlogit_footnote mlogit_p mlopts mlsum mlvecsum mnl0_ mor more mov move mprobit mprobit_lf mprobit_p mrdu0_ mrdu1_ mvdecode mvencode mvreg mvreg_estat n nbreg nbreg_al nbreg_lf nbreg_p nbreg_sw nestreg net newey newey_7 newey_p news nl nl_7 nl_9 nl_9_p nl_p nl_p_7 nlcom nlcom_p nlexp2 nlexp2_7 nlexp2a nlexp2a_7 nlexp3 nlexp3_7 nlgom3 nlgom3_7 nlgom4 nlgom4_7 nlinit nllog3 nllog3_7 nllog4 nllog4_7 nlog_rd nlogit nlogit_p nlogitgen nlogittree nlpred no nobreak noi nois noisi noisil noisily note notes notes_dlg nptrend numlabel numlist odbc old_ver olo olog ologi ologi_sw ologit ologit_p ologitp on one onew onewa oneway op_colnm op_comp op_diff op_inv op_str opr opro oprob oprob_sw oprobi oprobi_p oprobit oprobitp opts_exclusive order orthog orthpoly ou out outf outfi outfil outfile outs outsh outshe outshee outsheet ovtest pac pac_7 palette parse parse_dissim pause pca pca_8 pca_display pca_estat pca_p pca_rotate pcamat pchart pchart_7 pchi pchi_7 pcorr pctile pentium pergram pergram_7 permute permute_8 personal peto_st pkcollapse pkcross pkequiv pkexamine pkexamine_7 pkshape pksumm pksumm_7 pl plo plot plugin pnorm pnorm_7 poisgof poiss_lf poiss_sw poisso_p poisson poisson_estat post postclose postfile postutil pperron pr prais prais_e prais_e2 prais_p predict predictnl preserve print pro prob probi probit probit_estat probit_p proc_time procoverlay procrustes procrustes_estat procrustes_p profiler prog progr progra program prop proportion prtest prtesti pwcorr pwd q\\s qby qbys qchi qchi_7 qladder qladder_7 qnorm qnorm_7 qqplot qqplot_7 qreg qreg_c qreg_p qreg_sw qu quadchk quantile quantile_7 que quer query range ranksum ratio rchart rchart_7 rcof recast reclink recode reg reg3 reg3_p regdw regr regre regre_p2 regres regres_p regress regress_estat regriv_p remap ren rena renam rename renpfix repeat replace report reshape restore ret retu retur return rm rmdir robvar roccomp roccomp_7 roccomp_8 rocf_lf rocfit rocfit_8 rocgold rocplot rocplot_7 roctab roctab_7 rolling rologit rologit_p rot rota rotat rotate rotatemat rreg rreg_p ru run runtest rvfplot rvfplot_7 rvpplot rvpplot_7 sa safesum sample sampsi sav save savedresults saveold sc sca scal scala scalar scatter scm_mine sco scob_lf scob_p scobi_sw scobit scor score scoreplot scoreplot_help scree screeplot screeplot_help sdtest sdtesti se search separate seperate serrbar serrbar_7 serset set set_defaults sfrancia sh she shel shell shewhart shewhart_7 signestimationsample signrank signtest simul simul_7 simulate simulate_8 sktest sleep slogit slogit_d2 slogit_p smooth snapspan so sor sort spearman spikeplot spikeplot_7 spikeplt spline_x split sqreg sqreg_p sret sretu sretur sreturn ssc st st_ct st_hc st_hcd st_hcd_sh st_is st_issys st_note st_promo st_set st_show st_smpl st_subid stack statsby statsby_8 stbase stci stci_7 stcox stcox_estat stcox_fr stcox_fr_ll stcox_p stcox_sw stcoxkm stcoxkm_7 stcstat 
stcurv stcurve stcurve_7 stdes stem stepwise stereg stfill stgen stir stjoin stmc stmh stphplot stphplot_7 stphtest stphtest_7 stptime strate strate_7 streg streg_sw streset sts sts_7 stset stsplit stsum sttocc sttoct stvary stweib su suest suest_8 sum summ summa summar summari summariz summarize sunflower sureg survcurv survsum svar svar_p svmat svy svy_disp svy_dreg svy_est svy_est_7 svy_estat svy_get svy_gnbreg_p svy_head svy_header svy_heckman_p svy_heckprob_p svy_intreg_p svy_ivreg_p svy_logistic_p svy_logit_p svy_mlogit_p svy_nbreg_p svy_ologit_p svy_oprobit_p svy_poisson_p svy_probit_p svy_regress_p svy_sub svy_sub_7 svy_x svy_x_7 svy_x_p svydes svydes_8 svygen svygnbreg svyheckman svyheckprob svyintreg svyintreg_7 svyintrg svyivreg svylc svylog_p svylogit svymarkout svymarkout_8 svymean svymlog svymlogit svynbreg svyolog svyologit svyoprob svyoprobit svyopts svypois svypois_7 svypoisson svyprobit svyprobt svyprop svyprop_7 svyratio svyreg svyreg_p svyregress svyset svyset_7 svyset_8 svytab svytab_7 svytest svytotal sw sw_8 swcnreg swcox swereg swilk swlogis swlogit swologit swoprbt swpois swprobit swqreg swtobit swweib symmetry symmi symplot symplot_7 syntax sysdescribe sysdir sysuse szroeter ta tab tab1 tab2 tab_or tabd tabdi tabdis tabdisp tabi table tabodds tabodds_7 tabstat tabu tabul tabula tabulat tabulate te tempfile tempname tempvar tes test testnl testparm teststd tetrachoric time_it timer tis tob tobi tobit tobit_p tobit_sw token tokeni tokeniz tokenize tostring total translate translator transmap treat_ll treatr_p treatreg trim trnb_cons trnb_mean trpoiss_d2 trunc_ll truncr_p truncreg tsappend tset tsfill tsline tsline_ex tsreport tsrevar tsrline tsset tssmooth tsunab ttest ttesti tut_chk tut_wait tutorial tw tware_st two twoway twoway__fpfit_serset twoway__function_gen twoway__histogram_gen twoway__ipoint_serset twoway__ipoints_serset twoway__kdensity_gen twoway__lfit_serset twoway__normgen_gen twoway__pci_serset twoway__qfit_serset twoway__scatteri_serset twoway__sunflower_gen twoway_ksm_serset ty typ type typeof u unab unabbrev unabcmd update us use uselabel var var_mkcompanion var_p varbasic varfcast vargranger varirf varirf_add varirf_cgraph varirf_create varirf_ctable varirf_describe varirf_dir varirf_drop varirf_erase varirf_graph varirf_ograph varirf_rename varirf_set varirf_table varlist varlmar varnorm varsoc varstable varstable_w varstable_w2 varwle vce vec vec_fevd vec_mkphi vec_p vec_p_w vecirf_create veclmar veclmar_w vecnorm vecnorm_w vecrank vecstable verinst vers versi versio version view viewsource vif vwls wdatetof webdescribe webseek webuse weib1_lf weib2_lf weib_lf weib_lf0 weibhet_glf weibhet_glf_sh weibhet_glfa weibhet_glfa_sh weibhet_gp weibhet_ilf weibhet_ilf_sh weibhet_ilfa weibhet_ilfa_sh weibhet_ip weibu_sw weibul_p weibull weibull_c weibull_s weibullhet wh whelp whi which whil while wilc_st wilcoxon win wind windo window winexec wntestb wntestb_7 wntestq xchart xchart_7 xcorr xcorr_7 xi xi_6 xmlsav xmlsave xmluse xpose xsh xshe xshel xshell xt_iis xt_tis xtab_p xtabond xtbin_p xtclog xtcloglog xtcloglog_8 xtcloglog_d2 xtcloglog_pa_p xtcloglog_re_p xtcnt_p xtcorr xtdata xtdes xtfront_p xtfrontier xtgee xtgee_elink xtgee_estat xtgee_makeivar xtgee_p xtgee_plink xtgls xtgls_p xthaus xthausman xtht_p xthtaylor xtile xtint_p xtintreg xtintreg_8 xtintreg_d2 xtintreg_p xtivp_1 xtivp_2 xtivreg xtline xtline_ex xtlogit xtlogit_8 xtlogit_d2 xtlogit_fe_p xtlogit_pa_p xtlogit_re_p xtmixed xtmixed_estat xtmixed_p xtnb_fe xtnb_lf xtnbreg xtnbreg_pa_p 
xtnbreg_refe_p xtpcse xtpcse_p xtpois xtpoisson xtpoisson_d2 xtpoisson_pa_p xtpoisson_refe_p xtpred xtprobit xtprobit_8 xtprobit_d2 xtprobit_re_p xtps_fe xtps_lf xtps_ren xtps_ren_8 xtrar_p xtrc xtrc_p xtrchh xtrefe_p xtreg xtreg_be xtreg_fe xtreg_ml xtreg_pa_p xtreg_re xtregar xtrere_p xtset xtsf_ll xtsf_llti xtsum xttab xttest0 xttobit xttobit_8 xttobit_p xttrans yx yxview__barlike_draw yxview_area_draw yxview_bar_draw yxview_dot_draw yxview_dropline_draw yxview_function_draw yxview_iarrow_draw yxview_ilabels_draw yxview_normal_draw yxview_pcarrow_draw yxview_pcbarrow_draw yxview_pccapsym_draw yxview_pcscatter_draw yxview_pcspike_draw yxview_rarea_draw yxview_rbar_draw yxview_rbarm_draw yxview_rcap_draw yxview_rcapsym_draw yxview_rconnected_draw yxview_rline_draw yxview_rscatter_draw yxview_rspike_draw yxview_spike_draw yxview_sunflower_draw zap_s zinb zinb_llf zinb_plf zip zip_llf zip_p zip_plf zt_ct_5 zt_hc_5 zt_hcd_5 zt_is_5 zt_iss_5 zt_sho_5 zt_smp_5 ztbase_5 ztcox_5 ztdes_5 ztereg_5 ztfill_5 ztgen_5 ztir_5 ztjoin_5 ztnb ztnb_p ztp ztp_p zts_5 ztset_5 ztspli_5 ztsum_5 zttoct_5 ztvary_5 ztweib_5",c:[{cN:"label",v:[{b:"\\$\\{?[a-zA-Z0-9_]+\\}?"},{b:"`[a-zA-Z0-9_]+'"}]},{cN:"string",v:[{b:'`"[^\r\n]*?"\''},{b:'"[^\r\n"]*"'}]},{cN:"literal",v:[{b:"\\b(abs|acos|asin|atan|atan2|atanh|ceil|cloglog|comb|cos|digamma|exp|floor|invcloglog|invlogit|ln|lnfact|lnfactorial|lngamma|log|log10|max|min|mod|reldif|round|sign|sin|sqrt|sum|tan|tanh|trigamma|trunc|betaden|Binomial|binorm|binormal|chi2|chi2tail|dgammapda|dgammapdada|dgammapdadx|dgammapdx|dgammapdxdx|F|Fden|Ftail|gammaden|gammap|ibeta|invbinomial|invchi2|invchi2tail|invF|invFtail|invgammap|invibeta|invnchi2|invnFtail|invnibeta|invnorm|invnormal|invttail|nbetaden|nchi2|nFden|nFtail|nibeta|norm|normal|normalden|normd|npnchi2|tden|ttail|uniform|abbrev|char|index|indexnot|length|lower|ltrim|match|plural|proper|real|regexm|regexr|regexs|reverse|rtrim|string|strlen|strlower|strltrim|strmatch|strofreal|strpos|strproper|strreverse|strrtrim|strtrim|strupper|subinstr|subinword|substr|trim|upper|word|wordcount|_caller|autocode|byteorder|chop|clip|cond|e|epsdouble|epsfloat|group|inlist|inrange|irecode|matrix|maxbyte|maxdouble|maxfloat|maxint|maxlong|mi|minbyte|mindouble|minfloat|minint|minlong|missing|r|recode|replay|return|s|scalar|d|date|day|dow|doy|halfyear|mdy|month|quarter|week|year|d|daily|dofd|dofh|dofm|dofq|dofw|dofy|h|halfyearly|hofd|m|mofd|monthly|q|qofd|quarterly|tin|twithin|w|weekly|wofd|y|yearly|yh|ym|yofd|yq|yw|cholesky|colnumb|colsof|corr|det|diag|diag0cnt|el|get|hadamard|I|inv|invsym|issym|issymmetric|J|matmissing|matuniform|mreldif|nullmat|rownumb|rowsof|sweep|syminv|trace|vec|vecdiag)(?=\\(|$)"}]},e.C("^[ 	]*\\*.*$",!1),e.CLCM,e.CBCM]}});hljs.registerLanguage("asciidoc",function(e){return{aliases:["adoc"],c:[e.C("^/{4,}\\n","\\n/{4,}$",{r:10}),e.C("^//","$",{r:0}),{cN:"title",b:"^\\.\\w.*$"},{b:"^[=\\*]{4,}\\n",e:"\\n^[=\\*]{4,}$",r:10},{cN:"header",b:"^(={1,5}) .+?( 
\\1)?$",r:10},{cN:"header",b:"^[^\\[\\]\\n]+?\\n[=\\-~\\^\\+]{2,}$",r:10},{cN:"attribute",b:"^:.+?:",e:"\\s",eE:!0,r:10},{cN:"attribute",b:"^\\[.+?\\]$",r:0},{cN:"blockquote",b:"^_{4,}\\n",e:"\\n_{4,}$",r:10},{cN:"code",b:"^[\\-\\.]{4,}\\n",e:"\\n[\\-\\.]{4,}$",r:10},{b:"^\\+{4,}\\n",e:"\\n\\+{4,}$",c:[{b:"<",e:">",sL:"xml",r:0}],r:10},{cN:"bullet",b:"^(\\*+|\\-+|\\.+|[^\\n]+?::)\\s+"},{cN:"label",b:"^(NOTE|TIP|IMPORTANT|WARNING|CAUTION):\\s+",r:10},{cN:"strong",b:"\\B\\*(?![\\*\\s])",e:"(\\n{2}|\\*)",c:[{b:"\\\\*\\w",r:0}]},{cN:"emphasis",b:"\\B'(?!['\\s])",e:"(\\n{2}|')",c:[{b:"\\\\'\\w",r:0}],r:0},{cN:"emphasis",b:"_(?![_\\s])",e:"(\\n{2}|_)",r:0},{cN:"smartquote",v:[{b:"``.+?''"},{b:"`.+?'"}]},{cN:"code",b:"(`.+?`|\\+.+?\\+)",r:0},{cN:"code",b:"^[ \\t]",e:"$",r:0},{cN:"horizontal_rule",b:"^'{3,}[ \\t]*$",r:10},{b:"(link:)?(http|https|ftp|file|irc|image:?):\\S+\\[.*?\\]",rB:!0,c:[{b:"(link|image:?):",r:0},{cN:"link_url",b:"\\w",e:"[^\\[]+",r:0},{cN:"link_label",b:"\\[",e:"\\]",eB:!0,eE:!0,r:0}],r:10}]}});hljs.registerLanguage("php",function(e){var c={cN:"variable",b:"\\$+[a-zA-Z_-ÿ][a-zA-Z0-9_-ÿ]*"},i={cN:"preprocessor",b:/<\?(php)?|\?>/},a={cN:"string",c:[e.BE,i],v:[{b:'b"',e:'"'},{b:"b'",e:"'"},e.inherit(e.ASM,{i:null}),e.inherit(e.QSM,{i:null})]},n={v:[e.BNM,e.CNM]};return{aliases:["php3","php4","php5","php6"],cI:!0,k:"and include_once list abstract global private echo interface as static endswitch array null if endwhile or const for endforeach self var while isset public protected exit foreach throw elseif include __FILE__ empty require_once do xor return parent clone use __CLASS__ __LINE__ else break print eval new catch __METHOD__ case exception default die require __FUNCTION__ enddeclare final try switch continue endfor endif declare unset true false trait goto instanceof insteadof __DIR__ __NAMESPACE__ yield finally",c:[e.CLCM,e.HCM,e.C("/\\*","\\*/",{c:[{cN:"phpdoc",b:"\\s@[A-Za-z]+"},i]}),e.C("__halt_compiler.+?;",!1,{eW:!0,k:"__halt_compiler",l:e.UIR}),{cN:"string",b:"<<<['\"]?\\w+['\"]?$",e:"^\\w+;",c:[e.BE]},i,c,{b:/(::|->)+[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*/},{cN:"function",bK:"function",e:/[;{]/,eE:!0,i:"\\$|\\[|%",c:[e.UTM,{cN:"params",b:"\\(",e:"\\)",c:["self",c,e.CBCM,a,n]}]},{cN:"class",bK:"class interface",e:"{",eE:!0,i:/[:\(\$"]/,c:[{bK:"extends implements"},e.UTM]},{bK:"namespace",e:";",i:/[\.']/,c:[e.UTM]},{bK:"use",e:";",c:[e.UTM]},{b:"=>"},a,n]}});hljs.registerLanguage("java",function(e){var a=e.UIR+"(<"+e.UIR+">)?",t="false synchronized int abstract float private char boolean static null if const for true while long strictfp finally protected import native final void enum else break transient catch instanceof byte super volatile case assert short package default double public try this switch continue throws protected public private",c="(\\b(0b[01_]+)|\\b0[xX][a-fA-F0-9_]+|(\\b[\\d_]+(\\.[\\d_]*)?|\\.[\\d_]+)([eE][-+]?\\d+)?)[lLfF]?",r={cN:"number",b:c,r:0};return{aliases:["jsp"],k:t,i:/<\//,c:[{cN:"javadoc",b:"/\\*\\*",e:"\\*/",r:0,c:[{cN:"javadoctag",b:"(^|\\s)@[A-Za-z]+"}]},e.CLCM,e.CBCM,e.ASM,e.QSM,{cN:"class",bK:"class interface",e:/[{;=]/,eE:!0,k:"class interface",i:/[:"\[\]]/,c:[{bK:"extends implements"},e.UTM]},{bK:"new throw return",r:0},{cN:"function",b:"("+a+"\\s+)+"+e.UIR+"\\s*\\(",rB:!0,e:/[{;=]/,eE:!0,k:t,c:[{b:e.UIR+"\\s*\\(",rB:!0,r:0,c:[e.UTM]},{cN:"params",b:/\(/,e:/\)/,k:t,r:0,c:[e.ASM,e.QSM,e.CNM,e.CBCM]},e.CLCM,e.CBCM]},r,{cN:"annotation",b:"@[A-Za-z]+"}]}});hljs.registerLanguage("glsl",function(e){return{k:{keyword:"atomic_uint 
attribute bool break bvec2 bvec3 bvec4 case centroid coherent const continue default discard dmat2 dmat2x2 dmat2x3 dmat2x4 dmat3 dmat3x2 dmat3x3 dmat3x4 dmat4 dmat4x2 dmat4x3 dmat4x4 do double dvec2 dvec3 dvec4 else flat float for highp if iimage1D iimage1DArray iimage2D iimage2DArray iimage2DMS iimage2DMSArray iimage2DRect iimage3D iimageBuffer iimageCube iimageCubeArray image1D image1DArray image2D image2DArray image2DMS image2DMSArray image2DRect image3D imageBuffer imageCube imageCubeArray in inout int invariant isampler1D isampler1DArray isampler2D isampler2DArray isampler2DMS isampler2DMSArray isampler2DRect isampler3D isamplerBuffer isamplerCube isamplerCubeArray ivec2 ivec3 ivec4 layout lowp mat2 mat2x2 mat2x3 mat2x4 mat3 mat3x2 mat3x3 mat3x4 mat4 mat4x2 mat4x3 mat4x4 mediump noperspective out patch precision readonly restrict return sample sampler1D sampler1DArray sampler1DArrayShadow sampler1DShadow sampler2D sampler2DArray sampler2DArrayShadow sampler2DMS sampler2DMSArray sampler2DRect sampler2DRectShadow sampler2DShadow sampler3D samplerBuffer samplerCube samplerCubeArray samplerCubeArrayShadow samplerCubeShadow smooth struct subroutine switch uimage1D uimage1DArray uimage2D uimage2DArray uimage2DMS uimage2DMSArray uimage2DRect uimage3D uimageBuffer uimageCube uimageCubeArray uint uniform usampler1D usampler1DArray usampler2D usampler2DArray usampler2DMS usampler2DMSArray usampler2DRect usampler3D usamplerBuffer usamplerCube usamplerCubeArray uvec2 uvec3 uvec4 varying vec2 vec3 vec4 void volatile while writeonly",built_in:"gl_BackColor gl_BackLightModelProduct gl_BackLightProduct gl_BackMaterial gl_BackSecondaryColor gl_ClipDistance gl_ClipPlane gl_ClipVertex gl_Color gl_DepthRange gl_EyePlaneQ gl_EyePlaneR gl_EyePlaneS gl_EyePlaneT gl_Fog gl_FogCoord gl_FogFragCoord gl_FragColor gl_FragCoord gl_FragData gl_FragDepth gl_FrontColor gl_FrontFacing gl_FrontLightModelProduct gl_FrontLightProduct gl_FrontMaterial gl_FrontSecondaryColor gl_InstanceID gl_InvocationID gl_Layer gl_LightModel gl_LightSource gl_MaxAtomicCounterBindings gl_MaxAtomicCounterBufferSize gl_MaxClipDistances gl_MaxClipPlanes gl_MaxCombinedAtomicCounterBuffers gl_MaxCombinedAtomicCounters gl_MaxCombinedImageUniforms gl_MaxCombinedImageUnitsAndFragmentOutputs gl_MaxCombinedTextureImageUnits gl_MaxDrawBuffers gl_MaxFragmentAtomicCounterBuffers gl_MaxFragmentAtomicCounters gl_MaxFragmentImageUniforms gl_MaxFragmentInputComponents gl_MaxFragmentUniformComponents gl_MaxFragmentUniformVectors gl_MaxGeometryAtomicCounterBuffers gl_MaxGeometryAtomicCounters gl_MaxGeometryImageUniforms gl_MaxGeometryInputComponents gl_MaxGeometryOutputComponents gl_MaxGeometryOutputVertices gl_MaxGeometryTextureImageUnits gl_MaxGeometryTotalOutputComponents gl_MaxGeometryUniformComponents gl_MaxGeometryVaryingComponents gl_MaxImageSamples gl_MaxImageUnits gl_MaxLights gl_MaxPatchVertices gl_MaxProgramTexelOffset gl_MaxTessControlAtomicCounterBuffers gl_MaxTessControlAtomicCounters gl_MaxTessControlImageUniforms gl_MaxTessControlInputComponents gl_MaxTessControlOutputComponents gl_MaxTessControlTextureImageUnits gl_MaxTessControlTotalOutputComponents gl_MaxTessControlUniformComponents gl_MaxTessEvaluationAtomicCounterBuffers gl_MaxTessEvaluationAtomicCounters gl_MaxTessEvaluationImageUniforms gl_MaxTessEvaluationInputComponents gl_MaxTessEvaluationOutputComponents gl_MaxTessEvaluationTextureImageUnits gl_MaxTessEvaluationUniformComponents gl_MaxTessGenLevel gl_MaxTessPatchComponents gl_MaxTextureCoords gl_MaxTextureImageUnits 
gl_MaxTextureUnits gl_MaxVaryingComponents gl_MaxVaryingFloats gl_MaxVaryingVectors gl_MaxVertexAtomicCounterBuffers gl_MaxVertexAtomicCounters gl_MaxVertexAttribs gl_MaxVertexImageUniforms gl_MaxVertexOutputComponents gl_MaxVertexTextureImageUnits gl_MaxVertexUniformComponents gl_MaxVertexUniformVectors gl_MaxViewports gl_MinProgramTexelOffsetgl_ModelViewMatrix gl_ModelViewMatrixInverse gl_ModelViewMatrixInverseTranspose gl_ModelViewMatrixTranspose gl_ModelViewProjectionMatrix gl_ModelViewProjectionMatrixInverse gl_ModelViewProjectionMatrixInverseTranspose gl_ModelViewProjectionMatrixTranspose gl_MultiTexCoord0 gl_MultiTexCoord1 gl_MultiTexCoord2 gl_MultiTexCoord3 gl_MultiTexCoord4 gl_MultiTexCoord5 gl_MultiTexCoord6 gl_MultiTexCoord7 gl_Normal gl_NormalMatrix gl_NormalScale gl_ObjectPlaneQ gl_ObjectPlaneR gl_ObjectPlaneS gl_ObjectPlaneT gl_PatchVerticesIn gl_PerVertex gl_Point gl_PointCoord gl_PointSize gl_Position gl_PrimitiveID gl_PrimitiveIDIn gl_ProjectionMatrix gl_ProjectionMatrixInverse gl_ProjectionMatrixInverseTranspose gl_ProjectionMatrixTranspose gl_SampleID gl_SampleMask gl_SampleMaskIn gl_SamplePosition gl_SecondaryColor gl_TessCoord gl_TessLevelInner gl_TessLevelOuter gl_TexCoord gl_TextureEnvColor gl_TextureMatrixInverseTranspose gl_TextureMatrixTranspose gl_Vertex gl_VertexID gl_ViewportIndex gl_in gl_out EmitStreamVertex EmitVertex EndPrimitive EndStreamPrimitive abs acos acosh all any asin asinh atan atanh atomicCounter atomicCounterDecrement atomicCounterIncrement barrier bitCount bitfieldExtract bitfieldInsert bitfieldReverse ceil clamp cos cosh cross dFdx dFdy degrees determinant distance dot equal exp exp2 faceforward findLSB findMSB floatBitsToInt floatBitsToUint floor fma fract frexp ftransform fwidth greaterThan greaterThanEqual imageAtomicAdd imageAtomicAnd imageAtomicCompSwap imageAtomicExchange imageAtomicMax imageAtomicMin imageAtomicOr imageAtomicXor imageLoad imageStore imulExtended intBitsToFloat interpolateAtCentroid interpolateAtOffset interpolateAtSample inverse inversesqrt isinf isnan ldexp length lessThan lessThanEqual log log2 matrixCompMult max memoryBarrier min mix mod modf noise1 noise2 noise3 noise4 normalize not notEqual outerProduct packDouble2x32 packHalf2x16 packSnorm2x16 packSnorm4x8 packUnorm2x16 packUnorm4x8 pow radians reflect refract round roundEven shadow1D shadow1DLod shadow1DProj shadow1DProjLod shadow2D shadow2DLod shadow2DProj shadow2DProjLod sign sin sinh smoothstep sqrt step tan tanh texelFetch texelFetchOffset texture texture1D texture1DLod texture1DProj texture1DProjLod texture2D texture2DLod texture2DProj texture2DProjLod texture3D texture3DLod texture3DProj texture3DProjLod textureCube textureCubeLod textureGather textureGatherOffset textureGatherOffsets textureGrad textureGradOffset textureLod textureLodOffset textureOffset textureProj textureProjGrad textureProjGradOffset textureProjLod textureProjLodOffset textureProjOffset textureQueryLod textureSize transpose trunc uaddCarry uintBitsToFloat umulExtended unpackDouble2x32 unpackHalf2x16 unpackSnorm2x16 unpackSnorm4x8 unpackUnorm2x16 unpackUnorm4x8 usubBorrow gl_TextureMatrix gl_TextureMatrixInverse",literal:"true false"},i:'"',c:[e.CLCM,e.CBCM,e.CNM,{cN:"preprocessor",b:"#",e:"$"}]}});hljs.registerLanguage("lua",function(e){var t="\\[=*\\[",a="\\]=*\\]",r={b:t,e:a,c:["self"]},n=[e.C("--(?!"+t+")","$"),e.C("--"+t,a,{c:[r],r:10})];return{l:e.UIR,k:{keyword:"and break do else elseif end false for if in local nil not or repeat return then true until while",built_in:"_G _VERSION 
assert collectgarbage dofile error getfenv getmetatable ipairs load loadfile loadstring module next pairs pcall print rawequal rawget rawset require select setfenv setmetatable tonumber tostring type unpack xpcall coroutine debug io math os package string table"},c:n.concat([{cN:"function",bK:"function",e:"\\)",c:[e.inherit(e.TM,{b:"([_a-zA-Z]\\w*\\.)*([_a-zA-Z]\\w*:)?[_a-zA-Z]\\w*"}),{cN:"params",b:"\\(",eW:!0,c:n}].concat(n)},e.CNM,e.ASM,e.QSM,{cN:"string",b:t,e:a,c:[r],r:5}])}});hljs.registerLanguage("protobuf",function(e){return{k:{keyword:"package import option optional required repeated group",built_in:"double float int32 int64 uint32 uint64 sint32 sint64 fixed32 fixed64 sfixed32 sfixed64 bool string bytes",literal:"true false"},c:[e.QSM,e.NM,e.CLCM,{cN:"class",bK:"message enum service",e:/\{/,i:/\n/,c:[e.inherit(e.TM,{starts:{eW:!0,eE:!0}})]},{cN:"function",bK:"rpc",e:/;/,eE:!0,k:"rpc returns"},{cN:"constant",b:/^\s*[A-Z_]+/,e:/\s*=/,eE:!0}]}});hljs.registerLanguage("gcode",function(e){var N="[A-Z_][A-Z0-9_.]*",i="\\%",c={literal:"",built_in:"",keyword:"IF DO WHILE ENDWHILE CALL ENDIF SUB ENDSUB GOTO REPEAT ENDREPEAT EQ LT GT NE GE LE OR XOR"},r={cN:"preprocessor",b:"([O])([0-9]+)"},l=[e.CLCM,e.CBCM,e.C(/\(/,/\)/),e.inherit(e.CNM,{b:"([-+]?([0-9]*\\.?[0-9]+\\.?))|"+e.CNR}),e.inherit(e.ASM,{i:null}),e.inherit(e.QSM,{i:null}),{cN:"keyword",b:"([G])([0-9]+\\.?[0-9]?)"},{cN:"title",b:"([M])([0-9]+\\.?[0-9]?)"},{cN:"title",b:"(VC|VS|#)",e:"(\\d+)"},{cN:"title",b:"(VZOFX|VZOFY|VZOFZ)"},{cN:"built_in",b:"(ATAN|ABS|ACOS|ASIN|SIN|COS|EXP|FIX|FUP|ROUND|LN|TAN)(\\[)",e:"([-+]?([0-9]*\\.?[0-9]+\\.?))(\\])"},{cN:"label",v:[{b:"N",e:"\\d+",i:"\\W"}]}];return{aliases:["nc"],cI:!0,l:N,k:c,c:[{cN:"preprocessor",b:i},r].concat(l)}});hljs.registerLanguage("vim",function(e){return{l:/[!#@\w]+/,k:{keyword:"N|0 P|0 X|0 a|0 ab abc abo al am an|0 ar arga argd arge argdo argg argl argu as au aug aun b|0 bN ba bad bd be bel bf bl bm bn bo bp br brea breaka breakd breakl bro bufdo buffers bun bw c|0 cN cNf ca cabc caddb cad caddf cal cat cb cc ccl cd ce cex cf cfir cgetb cgete cg changes chd che checkt cl cla clo cm cmapc cme cn cnew cnf cno cnorea cnoreme co col colo com comc comp con conf cope cp cpf cq cr cs cst cu cuna cunme cw d|0 delm deb debugg delc delf dif diffg diffo diffp diffpu diffs diffthis dig di dl dell dj dli do doautoa dp dr ds dsp e|0 ea ec echoe echoh echom echon el elsei em en endfo endf endt endw ene ex exe exi exu f|0 files filet fin fina fini fir fix fo foldc foldd folddoc foldo for fu g|0 go gr grepa gu gv ha h|0 helpf helpg helpt hi hid his i|0 ia iabc if ij il im imapc ime ino inorea inoreme int is isp iu iuna iunme j|0 ju k|0 keepa kee keepj lN lNf l|0 lad laddb laddf la lan lat lb lc lch lcl lcs le lefta let lex lf lfir lgetb lgete lg lgr lgrepa lh ll lla lli lmak lm lmapc lne lnew lnf ln loadk lo loc lockv lol lope lp lpf lr ls lt lu lua luad luaf lv lvimgrepa lw m|0 ma mak map mapc marks mat me menut mes mk mks mksp mkv mkvie mod mz mzf nbc nb nbs n|0 new nm nmapc nme nn nnoreme noa no noh norea noreme norm nu nun nunme ol o|0 om omapc ome on ono onoreme opt ou ounme ow p|0 profd prof pro promptr pc ped pe perld po popu pp pre prev ps pt ptN ptf ptj ptl ptn ptp ptr pts pu pw py3 python3 py3d py3f py pyd pyf q|0 quita qa r|0 rec red redi redr redraws reg res ret retu rew ri rightb rub rubyd rubyf rund ru rv s|0 sN san sa sal sav sb sbN sba sbf sbl sbm sbn sbp sbr scrip scripte scs se setf setg setl sf sfir sh sim sig sil sl sla sm smap smapc sme sn sni sno snor snoreme sor so 
spelld spe spelli spellr spellu spellw sp spr sre st sta startg startr star stopi stj sts sun sunm sunme sus sv sw sy synti sync t|0 tN tabN tabc tabdo tabe tabf tabfir tabl tabm tabnew tabn tabo tabp tabr tabs tab ta tags tc tcld tclf te tf th tj tl tm tn to tp tr try ts tu u|0 undoj undol una unh unl unlo unm unme uns up v|0 ve verb vert vim vimgrepa vi viu vie vm vmapc vme vne vn vnoreme vs vu vunme windo w|0 wN wa wh wi winc winp wn wp wq wqa ws wu wv x|0 xa xmapc xm xme xn xnoreme xu xunme y|0 z|0 ~ Next Print append abbreviate abclear aboveleft all amenu anoremenu args argadd argdelete argedit argglobal arglocal argument ascii autocmd augroup aunmenu buffer bNext ball badd bdelete behave belowright bfirst blast bmodified bnext botright bprevious brewind break breakadd breakdel breaklist browse bunload bwipeout change cNext cNfile cabbrev cabclear caddbuffer caddexpr caddfile call catch cbuffer cclose center cexpr cfile cfirst cgetbuffer cgetexpr cgetfile chdir checkpath checktime clist clast close cmap cmapclear cmenu cnext cnewer cnfile cnoremap cnoreabbrev cnoremenu copy colder colorscheme command comclear compiler continue confirm copen cprevious cpfile cquit crewind cscope cstag cunmap cunabbrev cunmenu cwindow delete delmarks debug debuggreedy delcommand delfunction diffupdate diffget diffoff diffpatch diffput diffsplit digraphs display deletel djump dlist doautocmd doautoall deletep drop dsearch dsplit edit earlier echo echoerr echohl echomsg else elseif emenu endif endfor endfunction endtry endwhile enew execute exit exusage file filetype find finally finish first fixdel fold foldclose folddoopen folddoclosed foldopen function global goto grep grepadd gui gvim hardcopy help helpfind helpgrep helptags highlight hide history insert iabbrev iabclear ijump ilist imap imapclear imenu inoremap inoreabbrev inoremenu intro isearch isplit iunmap iunabbrev iunmenu join jumps keepalt keepmarks keepjumps lNext lNfile list laddexpr laddbuffer laddfile last language later lbuffer lcd lchdir lclose lcscope left leftabove lexpr lfile lfirst lgetbuffer lgetexpr lgetfile lgrep lgrepadd lhelpgrep llast llist lmake lmap lmapclear lnext lnewer lnfile lnoremap loadkeymap loadview lockmarks lockvar lolder lopen lprevious lpfile lrewind ltag lunmap luado luafile lvimgrep lvimgrepadd lwindow move mark make mapclear match menu menutranslate messages mkexrc mksession mkspell mkvimrc mkview mode mzscheme mzfile nbclose nbkey nbsart next nmap nmapclear nmenu nnoremap nnoremenu noautocmd noremap nohlsearch noreabbrev noremenu normal number nunmap nunmenu oldfiles open omap omapclear omenu only onoremap onoremenu options ounmap ounmenu ownsyntax print profdel profile promptfind promptrepl pclose pedit perl perldo pop popup ppop preserve previous psearch ptag ptNext ptfirst ptjump ptlast ptnext ptprevious ptrewind ptselect put pwd py3do py3file python pydo pyfile quit quitall qall read recover redo redir redraw redrawstatus registers resize retab return rewind right rightbelow ruby rubydo rubyfile rundo runtime rviminfo substitute sNext sandbox sargument sall saveas sbuffer sbNext sball sbfirst sblast sbmodified sbnext sbprevious sbrewind scriptnames scriptencoding scscope set setfiletype setglobal setlocal sfind sfirst shell simalt sign silent sleep slast smagic smapclear smenu snext sniff snomagic snoremap snoremenu sort source spelldump spellgood spellinfo spellrepall spellundo spellwrong split sprevious srewind stop stag startgreplace startreplace startinsert stopinsert stjump stselect sunhide sunmap 
",r:0}]}});hljs.registerLanguage("http",function(t){return{aliases:["https"],i:"\\S",c:[{cN:"status",b:"^HTTP/[0-9\\.]+",e:"$",c:[{cN:"number",b:"\\b\\d{3}\\b"}]},{cN:"request",b:"^[A-Z]+ (.*?) HTTP/[0-9\\.]+$",rB:!0,e:"$",c:[{cN:"string",b:" ",e:" ",eB:!0,eE:!0}]},{cN:"attribute",b:"^\\w",e:": ",eE:!0,i:"\\n|\\s|=",starts:{cN:"string",e:"$"}},{b:"\\n\\n",starts:{sL:"",eW:!0}}]}});hljs.registerLanguage("avrasm",function(r){return{cI:!0,l:"\\.?"+r.IR,k:{keyword:"adc add adiw and andi asr bclr bld brbc brbs brcc brcs break breq brge brhc brhs brid brie brlo brlt brmi brne brpl brsh brtc brts brvc brvs bset bst call cbi cbr clc clh cli cln clr cls clt clv clz com cp cpc cpi cpse dec eicall eijmp elpm eor fmul fmuls fmulsu icall ijmp in inc jmp ld ldd ldi lds lpm lsl lsr mov movw mul muls mulsu neg nop or ori out pop push rcall ret reti rjmp rol ror sbc sbr sbrc sbrs sec seh sbi sbci sbic sbis sbiw sei sen ser ses set sev sez sleep spm st std sts sub subi swap tst wdr",built_in:"r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r13 r14 r15 r16 r17 r18 r19 r20 r21 r22 r23 r24 r25 r26 r27 r28 r29 r30 r31 x|0 xh xl y|0 yh yl z|0 zh zl ucsr1c udr1 ucsr1a ucsr1b ubrr1l ubrr1h ucsr0c ubrr0h tccr3c tccr3a tccr3b tcnt3h tcnt3l ocr3ah ocr3al ocr3bh ocr3bl ocr3ch ocr3cl icr3h icr3l etimsk etifr tccr1c ocr1ch ocr1cl twcr twdr twar twsr twbr osccal xmcra xmcrb eicra spmcsr spmcr portg ddrg ping portf ddrf sreg sph spl xdiv rampz eicrb eimsk gimsk gicr eifr gifr timsk tifr mcucr mcucsr tccr0 tcnt0 ocr0 assr tccr1a tccr1b tcnt1h tcnt1l ocr1ah ocr1al ocr1bh ocr1bl icr1h icr1l tccr2 tcnt2 ocr2 ocdr wdtcr sfior eearh eearl eedr eecr porta ddra pina portb ddrb pinb portc ddrc pinc portd ddrd pind spdr spsr spcr udr0 ucsr0a ucsr0b ubrr0l acsr admux adcsr adch adcl porte ddre pine pinf",preprocessor:".byte .cseg .db .def .device .dseg .dw .endmacro .equ .eseg .exit .include .list .listmac .macro .nolist .org .set"},c:[r.CBCM,r.C(";","$",{r:0}),r.CNM,r.BNM,{cN:"number",b:"\\b(\\$[a-zA-Z0-9]+|0o[0-7]+)"},r.QSM,{cN:"string",b:"'",e:"[^\\\\]'",i:"[^\\\\][^']"},{cN:"label",b:"^[A-Za-z0-9_.$]+:"},{cN:"preprocessor",b:"#",e:"$"},{cN:"localvars",b:"@[0-9]+"}]}});hljs.registerLanguage("aspectj",function(e){var t="false synchronized int abstract float private char boolean static null if const for true while long throw strictfp finally protected import native final return void enum else extends implements break transient new catch instanceof byte super volatile case assert short package default double public try this switch continue throws privileged aspectOf adviceexecution proceed cflowbelow cflow initialization preinitialization staticinitialization withincode target within execution getWithinTypeName handler thisJoinPoint thisJoinPointStaticPart thisEnclosingJoinPointStaticPart declare parents warning error soft precedence thisAspectInstance",i="get set args call";return{k:t,i:/<\//,c:[{cN:"javadoc",b:"/\\*\\*",e:"\\*/",r:0,c:[{cN:"javadoctag",b:"(^|\\s)@[A-Za-z]+"}]},e.CLCM,e.CBCM,e.ASM,e.QSM,{cN:"aspect",bK:"aspect",e:/[{;=]/,eE:!0,i:/[:;"\[\]]/,c:[{bK:"extends implements pertypewithin perthis pertarget percflowbelow percflow issingleton"},e.UTM,{b:/\([^\)]*/,e:/[)]+/,k:t+" "+i,eE:!1}]},{cN:"class",bK:"class interface",e:/[{;=]/,eE:!0,r:0,k:"class interface",i:/[:"\[\]]/,c:[{bK:"extends implements"},e.UTM]},{bK:"pointcut after before around throwing returning",e:/[)]/,eE:!1,i:/["\[\]]/,c:[{b:e.UIR+"\\s*\\(",rB:!0,c:[e.UTM]}]},{b:/[:]/,rB:!0,e:/[{;]/,r:0,eE:!1,k:t,i:/["\[\]]/,c:[{b:e.UIR+"\\s*\\(",k:t+" "+i},e.QSM]},{bK:"new 
throw",r:0},{cN:"function",b:/\w+ +\w+(\.)?\w+\s*\([^\)]*\)\s*((throws)[\w\s,]+)?[\{;]/,rB:!0,e:/[{;=]/,k:t,eE:!0,c:[{b:e.UIR+"\\s*\\(",rB:!0,r:0,c:[e.UTM]},{cN:"params",b:/\(/,e:/\)/,r:0,k:t,c:[e.ASM,e.QSM,e.CNM,e.CBCM]},e.CLCM,e.CBCM]},e.CNM,{cN:"annotation",b:"@[A-Za-z]+"}]}});hljs.registerLanguage("rib",function(e){return{k:"ArchiveRecord AreaLightSource Atmosphere Attribute AttributeBegin AttributeEnd Basis Begin Blobby Bound Clipping ClippingPlane Color ColorSamples ConcatTransform Cone CoordinateSystem CoordSysTransform CropWindow Curves Cylinder DepthOfField Detail DetailRange Disk Displacement Display End ErrorHandler Exposure Exterior Format FrameAspectRatio FrameBegin FrameEnd GeneralPolygon GeometricApproximation Geometry Hider Hyperboloid Identity Illuminate Imager Interior LightSource MakeCubeFaceEnvironment MakeLatLongEnvironment MakeShadow MakeTexture Matte MotionBegin MotionEnd NuPatch ObjectBegin ObjectEnd ObjectInstance Opacity Option Orientation Paraboloid Patch PatchMesh Perspective PixelFilter PixelSamples PixelVariance Points PointsGeneralPolygons PointsPolygons Polygon Procedural Projection Quantize ReadArchive RelativeDetail ReverseOrientation Rotate Scale ScreenWindow ShadingInterpolation ShadingRate Shutter Sides Skew SolidBegin SolidEnd Sphere SubdivisionMesh Surface TextureCoordinates Torus Transform TransformBegin TransformEnd TransformPoints Translate TrimCurve WorldBegin WorldEnd",i:"</",c:[e.HCM,e.CNM,e.ASM,e.QSM]}});hljs.registerLanguage("python",function(e){var r={cN:"prompt",b:/^(>>>|\.\.\.) /},b={cN:"string",c:[e.BE],v:[{b:/(u|b)?r?'''/,e:/'''/,c:[r],r:10},{b:/(u|b)?r?"""/,e:/"""/,c:[r],r:10},{b:/(u|r|ur)'/,e:/'/,r:10},{b:/(u|r|ur)"/,e:/"/,r:10},{b:/(b|br)'/,e:/'/},{b:/(b|br)"/,e:/"/},e.ASM,e.QSM]},l={cN:"number",r:0,v:[{b:e.BNR+"[lLjJ]?"},{b:"\\b(0o[0-7]+)[lLjJ]?"},{b:e.CNR+"[lLjJ]?"}]},c={cN:"params",b:/\(/,e:/\)/,c:["self",r,l,b]};return{aliases:["py","gyp"],k:{keyword:"and elif is global as in if from raise for except finally print import pass return exec else break not with class assert yield try while continue del or def lambda nonlocal|10 None True False",built_in:"Ellipsis NotImplemented"},i:/(<\/|->|\?)/,c:[r,l,b,e.HCM,{v:[{cN:"function",bK:"def",r:10},{cN:"class",bK:"class"}],e:/:/,i:/[${=;\n,]/,c:[e.UTM,c]},{cN:"decorator",b:/@/,e:/$/},{b:/\b(print|exec)\(/}]}});hljs.registerLanguage("axapta",function(e){return{k:"false int abstract private char boolean static null if for true while long throw finally protected final return void enum else break new catch byte super case short default double public try this switch continue reverse firstfast firstonly forupdate nofetch sum avg minof maxof count order group by asc desc index hint like dispaly edit client server ttsbegin ttscommit str real date container anytype common div mod",c:[e.CLCM,e.CBCM,e.ASM,e.QSM,e.CNM,{cN:"preprocessor",b:"#",e:"$"},{cN:"class",bK:"class interface",e:"{",eE:!0,i:":",c:[{bK:"extends implements"},e.UTM]}]}});hljs.registerLanguage("nix",function(e){var t={keyword:"rec with let in inherit assert if else then",constant:"true false or and null",built_in:"import abort baseNameOf dirOf isNull builtins map removeAttrs throw toString derivation"},i={cN:"subst",b:/\$\{/,e:/}/,k:t},r={cN:"variable",b:/[a-zA-Z0-9-_]+(\s*=)/},n={cN:"string",b:"''",e:"''",c:[i]},s={cN:"string",b:'"',e:'"',c:[i]},a=[e.NM,e.HCM,e.CBCM,n,s,r];return i.c=a,{aliases:["nixos"],k:t,c:a}});hljs.registerLanguage("diff",function(e){return{aliases:["patch"],c:[{cN:"chunk",r:10,v:[{b:/^@@ +\-\d+,\d+ +\+\d+,\d+ 
+@@$/},{b:/^\*\*\* +\d+,\d+ +\*\*\*\*$/},{b:/^\-\-\- +\d+,\d+ +\-\-\-\-$/}]},{cN:"header",v:[{b:/Index: /,e:/$/},{b:/=====/,e:/=====$/},{b:/^\-\-\-/,e:/$/},{b:/^\*{3} /,e:/$/},{b:/^\+\+\+/,e:/$/},{b:/\*{5}/,e:/\*{5}$/}]},{cN:"addition",b:"^\\+",e:"$"},{cN:"deletion",b:"^\\-",e:"$"},{cN:"change",b:"^\\!",e:"$"}]}});hljs.registerLanguage("parser3",function(r){var e=r.C("{","}",{c:["self"]});return{sL:"xml",r:0,c:[r.C("^#","$"),r.C("\\^rem{","}",{r:10,c:[e]}),{cN:"preprocessor",b:"^@(?:BASE|USE|CLASS|OPTIONS)$",r:10},{cN:"title",b:"@[\\w\\-]+\\[[\\w^;\\-]*\\](?:\\[[\\w^;\\-]*\\])?(?:.*)$"},{cN:"variable",b:"\\$\\{?[\\w\\-\\.\\:]+\\}?"},{cN:"keyword",b:"\\^[\\w\\-\\.\\:]+"},{cN:"number",b:"\\^#[0-9a-fA-F]+"},r.CNM]}});hljs.registerLanguage("django",function(e){var t={cN:"filter",b:/\|[A-Za-z]+:?/,k:"truncatewords removetags linebreaksbr yesno get_digit timesince random striptags filesizeformat escape linebreaks length_is ljust rjust cut urlize fix_ampersands title floatformat capfirst pprint divisibleby add make_list unordered_list urlencode timeuntil urlizetrunc wordcount stringformat linenumbers slice date dictsort dictsortreversed default_if_none pluralize lower join center default truncatewords_html upper length phone2numeric wordwrap time addslashes slugify first escapejs force_escape iriencode last safe safeseq truncatechars localize unlocalize localtime utc timezone",c:[{cN:"argument",b:/"/,e:/"/},{cN:"argument",b:/'/,e:/'/}]};return{aliases:["jinja"],cI:!0,sL:"xml",subLanguageMode:"continuous",c:[e.C(/\{%\s*comment\s*%}/,/\{%\s*endcomment\s*%}/),e.C(/\{#/,/#}/),{cN:"template_tag",b:/\{%/,e:/%}/,k:"comment endcomment load templatetag ifchanged endifchanged if endif firstof for endfor in ifnotequal endifnotequal widthratio extends include spaceless endspaceless regroup by as ifequal endifequal ssi now with cycle url filter endfilter debug block endblock else autoescape endautoescape csrf_token empty elif endwith static trans blocktrans endblocktrans get_static_prefix get_media_prefix plural get_current_language language get_available_languages get_current_language_bidi get_language_info get_language_info_list localize endlocalize localtime endlocaltime timezone endtimezone get_current_timezone verbatim",c:[t]},{cN:"variable",b:/\{\{/,e:/}}/,c:[t]}]}});hljs.registerLanguage("rust",function(e){var t=e.inherit(e.CBCM);return t.c.push("self"),{aliases:["rs"],k:{keyword:"alignof as be box break const continue crate do else enum extern false fn for if impl in let loop match mod mut offsetof once priv proc pub pure ref return self sizeof static struct super trait true type typeof unsafe unsized use virtual while yield int i8 i16 i32 i64 uint u8 u32 u64 float f32 f64 str char bool",built_in:"assert! assert_eq! bitflags! bytes! cfg! col! concat! concat_idents! debug_assert! debug_assert_eq! env! panic! file! format! format_args! include_bin! include_str! line! local_data_key! module_path! option_env! print! println! select! stringify! try! unimplemented! unreachable! vec! write! 
writeln!"},l:e.IR+"!?",i:"</",c:[e.CLCM,t,e.inherit(e.QSM,{i:null}),{cN:"string",b:/r(#*)".*?"\1(?!#)/},{cN:"string",b:/'\\?(x\w{2}|u\w{4}|U\w{8}|.)'/},{b:/'[a-zA-Z_][a-zA-Z0-9_]*/},{cN:"number",b:/\b(0[xbo][A-Fa-f0-9_]+|\d[\d_]*(\.[0-9_]+)?([eE][+-]?[0-9_]+)?)([uif](8|16|32|64|size))?/,r:0},{cN:"function",bK:"fn",e:"(\\(|<)",eE:!0,c:[e.UTM]},{cN:"preprocessor",b:"#\\!?\\[",e:"\\]"},{bK:"type",e:"(=|<)",c:[e.UTM],i:"\\S"},{bK:"trait enum",e:"({|<)",c:[e.UTM],i:"\\S"},{b:e.IR+"::"},{b:"->"}]}});hljs.registerLanguage("vhdl",function(e){var t="\\d(_|\\d)*",r="[eE][-+]?"+t,n=t+"(\\."+t+")?("+r+")?",o="\\w+",i=t+"#"+o+"(\\."+o+")?#("+r+")?",a="\\b("+i+"|"+n+")";return{cI:!0,k:{keyword:"abs access after alias all and architecture array assert attribute begin block body buffer bus case component configuration constant context cover disconnect downto default else elsif end entity exit fairness file for force function generate generic group guarded if impure in inertial inout is label library linkage literal loop map mod nand new next nor not null of on open or others out package port postponed procedure process property protected pure range record register reject release rem report restrict restrict_guarantee return rol ror select sequence severity shared signal sla sll sra srl strong subtype then to transport type unaffected units until use variable vmode vprop vunit wait when while with xnor xor",typename:"boolean bit character severity_level integer time delay_length natural positive string bit_vector file_open_kind file_open_status std_ulogic std_ulogic_vector std_logic std_logic_vector unsigned signed boolean_vector integer_vector real_vector time_vector"},i:"{",c:[e.CBCM,e.C("--","$"),e.QSM,{cN:"number",b:a,r:0},{cN:"literal",b:"'(U|X|0|1|Z|W|L|H|-)'",c:[e.BE]},{cN:"attribute",b:"'[A-Za-z](_?[A-Za-z0-9])*",c:[e.BE]}]}});hljs.registerLanguage("ocaml",function(e){return{aliases:["ml"],k:{keyword:"and as assert asr begin class constraint do done downto else end exception external for fun function functor if in include inherit! inherit initializer land lazy let lor lsl lsr lxor match method!|10 method mod module mutable new object of open! open or private rec sig struct then to try type val! 
val virtual when while with parser value",built_in:"array bool bytes char exn|5 float int int32 int64 list lazy_t|5 nativeint|5 string unit in_channel out_channel ref",literal:"true false"},i:/\/\/|>>/,l:"[a-z_]\\w*!?",c:[{cN:"literal",b:"\\[(\\|\\|)?\\]|\\(\\)"},e.C("\\(\\*","\\*\\)",{c:["self"]}),{cN:"symbol",b:"'[A-Za-z_](?!')[\\w']*"},{cN:"tag",b:"`[A-Z][\\w']*"},{cN:"type",b:"\\b[A-Z][\\w']*",r:0},{b:"[a-z_]\\w*'[\\w']*"},e.inherit(e.ASM,{cN:"char",r:0}),e.inherit(e.QSM,{i:null}),{cN:"number",b:"\\b(0[xX][a-fA-F0-9_]+[Lln]?|0[oO][0-7_]+[Lln]?|0[bB][01_]+[Lln]?|[0-9][0-9_]*([Lln]|(\\.[0-9_]*)?([eE][-+]?[0-9_]+)?)?)",r:0},{b:/[-=]>/}]}});hljs.registerLanguage("cmake",function(e){return{aliases:["cmake.in"],cI:!0,k:{keyword:"add_custom_command add_custom_target add_definitions add_dependencies add_executable add_library add_subdirectory add_test aux_source_directory break build_command cmake_minimum_required cmake_policy configure_file create_test_sourcelist define_property else elseif enable_language enable_testing endforeach endfunction endif endmacro endwhile execute_process export find_file find_library find_package find_path find_program fltk_wrap_ui foreach function get_cmake_property get_directory_property get_filename_component get_property get_source_file_property get_target_property get_test_property if include include_directories include_external_msproject include_regular_expression install link_directories load_cache load_command macro mark_as_advanced message option output_required_files project qt_wrap_cpp qt_wrap_ui remove_definitions return separate_arguments set set_directory_properties set_property set_source_files_properties set_target_properties set_tests_properties site_name source_group string target_link_libraries try_compile try_run unset variable_watch while build_name exec_program export_library_dependencies install_files install_programs install_targets link_libraries make_directory remove subdir_depends subdirs use_mangled_mesa utility_source variable_requires write_file qt5_use_modules qt5_use_package qt5_wrap_cpp on off true false and or",operator:"equal less greater strless strgreater strequal matches"},c:[{cN:"envvar",b:"\\${",e:"}"},e.HCM,e.QSM,e.NM]}});hljs.registerLanguage("1c",function(c){var e="[a-zA-Zа-яА-Я][a-zA-Z0-9_а-яА-Я]*",r="возврат дата для если и или иначе иначеесли исключение конецесли конецпопытки конецпроцедуры конецфункции конеццикла константа не перейти перем перечисление по пока попытка прервать продолжить процедура строка тогда фс функция цикл число экспорт",t="ansitooem oemtoansi ввестивидсубконто ввестидату ввестизначение ввестиперечисление ввестипериод ввестиплансчетов ввестистроку ввестичисло вопрос восстановитьзначение врег выбранныйплансчетов вызватьисключение датагод датамесяц датачисло добавитьмесяц завершитьработусистемы заголовоксистемы записьжурналарегистрации запуститьприложение зафиксироватьтранзакцию значениевстроку значениевстрокувнутр значениевфайл значениеизстроки значениеизстрокивнутр значениеизфайла имякомпьютера имяпользователя каталогвременныхфайлов каталогиб каталогпользователя каталогпрограммы кодсимв командасистемы конгода конецпериодаби конецрассчитанногопериодаби конецстандартногоинтервала конквартала конмесяца коннедели лев лог лог10 макс максимальноеколичествосубконто мин монопольныйрежим названиеинтерфейса названиенабораправ назначитьвид назначитьсчет найти найтипомеченныенаудаление найтиссылки началопериодаби началостандартногоинтервала начатьтранзакцию начгода начквартала начмесяца начнедели номерднягода 
номерднянедели номернеделигода нрег обработкаожидания окр описаниеошибки основнойжурналрасчетов основнойплансчетов основнойязык открытьформу открытьформумодально отменитьтранзакцию очиститьокносообщений периодстр полноеимяпользователя получитьвремята получитьдатута получитьдокументта получитьзначенияотбора получитьпозициюта получитьпустоезначение получитьта прав праводоступа предупреждение префиксавтонумерации пустаястрока пустоезначение рабочаядаттьпустоезначение рабочаядата разделительстраниц разделительстрок разм разобратьпозициюдокумента рассчитатьрегистрына рассчитатьрегистрыпо сигнал симв символтабуляции создатьобъект сокрл сокрлп сокрп сообщить состояние сохранитьзначение сред статусвозврата стрдлина стрзаменить стрколичествострок стрполучитьстроку  стрчисловхождений сформироватьпозициюдокумента счетпокоду текущаядата текущеевремя типзначения типзначениястр удалитьобъекты установитьтана установитьтапо фиксшаблон формат цел шаблон",i={cN:"dquote",b:'""'},n={cN:"string",b:'"',e:'"|$',c:[i]},a={cN:"string",b:"\\|",e:'"|$',c:[i]};return{cI:!0,l:e,k:{keyword:r,built_in:t},c:[c.CLCM,c.NM,n,a,{cN:"function",b:"(процедура|функция)",e:"$",l:e,k:"процедура функция",c:[c.inherit(c.TM,{b:e}),{cN:"tail",eW:!0,c:[{cN:"params",b:"\\(",e:"\\)",l:e,k:"знач",c:[n,a]},{cN:"export",b:"экспорт",eW:!0,l:e,k:"экспорт",c:[c.CLCM]}]},c.CLCM]},{cN:"preprocessor",b:"#",e:"$"},{cN:"date",b:"'\\d{2}\\.\\d{2}\\.(\\d{2}|\\d{4})'"}]}});hljs.registerLanguage("tcl",function(e){return{aliases:["tk"],k:"after append apply array auto_execok auto_import auto_load auto_mkindex auto_mkindex_old auto_qualify auto_reset bgerror binary break catch cd chan clock close concat continue dde dict encoding eof error eval exec exit expr fblocked fconfigure fcopy file fileevent filename flush for foreach format gets glob global history http if incr info interp join lappend|10 lassign|10 lindex|10 linsert|10 list llength|10 load lrange|10 lrepeat|10 lreplace|10 lreverse|10 lsearch|10 lset|10 lsort|10 mathfunc mathop memory msgcat namespace open package parray pid pkg::create pkg_mkIndex platform platform::shell proc puts pwd read refchan regexp registry regsub|10 rename return safe scan seek set socket source split string subst switch tcl_endOfWord tcl_findLibrary tcl_startOfNextWord tcl_startOfPreviousWord tcl_wordBreakAfter tcl_wordBreakBefore tcltest tclvars tell time tm trace unknown unload unset update uplevel upvar variable vwait while",c:[e.C(";[ \\t]*#","$"),e.C("^[ \\t]*#","$"),{bK:"proc",e:"[\\{]",eE:!0,c:[{cN:"symbol",b:"[ \\t\\n\\r]+(::)?[a-zA-Z_]((::)?[a-zA-Z0-9_])*",e:"[ \\t\\n\\r]",eW:!0,eE:!0}]},{cN:"variable",eE:!0,v:[{b:"\\$(\\{)?(::)?[a-zA-Z_]((::)?[a-zA-Z0-9_])*\\(([a-zA-Z0-9_])*\\)",e:"[^a-zA-Z0-9_\\}\\$]"},{b:"\\$(\\{)?(::)?[a-zA-Z_]((::)?[a-zA-Z0-9_])*",e:"(\\))?[^a-zA-Z0-9_\\}\\$]"}]},{cN:"string",c:[e.BE],v:[e.inherit(e.ASM,{i:null}),e.inherit(e.QSM,{i:null})]},{cN:"number",v:[e.BNM,e.CNM]}]}});hljs.registerLanguage("groovy",function(e){return{k:{typename:"byte short char int long boolean float double void",literal:"true false null",keyword:"def as in assert trait super this abstract static volatile transient public private protected synchronized final class interface enum if else for while switch case break default continue throw throws try catch finally implements extends new import package return 
instanceof"},c:[e.CLCM,{cN:"javadoc",b:"/\\*\\*",e:"\\*//*",r:0,c:[{cN:"javadoctag",b:"(^|\\s)@[A-Za-z]+"}]},e.CBCM,{cN:"string",b:'"""',e:'"""'},{cN:"string",b:"'''",e:"'''"},{cN:"string",b:"\\$/",e:"/\\$",r:10},e.ASM,{cN:"regexp",b:/~?\/[^\/\n]+\//,c:[e.BE]},e.QSM,{cN:"shebang",b:"^#!/usr/bin/env",e:"$",i:"\n"},e.BNM,{cN:"class",bK:"class interface trait enum",e:"{",i:":",c:[{bK:"extends implements"},e.UTM]},e.CNM,{cN:"annotation",b:"@[A-Za-z]+"},{cN:"string",b:/[^\?]{0}[A-Za-z0-9_$]+ *:/},{b:/\?/,e:/\:/},{cN:"label",b:"^\\s*[A-Za-z0-9_$]+:",r:0}]}});hljs.registerLanguage("erlang-repl",function(r){return{k:{special_functions:"spawn spawn_link self",reserved:"after and andalso|10 band begin bnot bor bsl bsr bxor case catch cond div end fun if let not of or orelse|10 query receive rem try when xor"},c:[{cN:"prompt",b:"^[0-9]+> ",r:10},r.C("%","$"),{cN:"number",b:"\\b(\\d+#[a-fA-F0-9]+|\\d+(\\.\\d+)?([eE][-+]?\\d+)?)",r:0},r.ASM,r.QSM,{cN:"constant",b:"\\?(::)?([A-Z]\\w*(::)?)+"},{cN:"arrow",b:"->"},{cN:"ok",b:"ok"},{cN:"exclamation_mark",b:"!"},{cN:"function_or_atom",b:"(\\b[a-z'][a-zA-Z0-9_']*:[a-z'][a-zA-Z0-9_']*)|(\\b[a-z'][a-zA-Z0-9_']*)",r:0},{cN:"variable",b:"[A-Z][a-zA-Z0-9_']*",r:0}]}});hljs.registerLanguage("nginx",function(e){var r={cN:"variable",v:[{b:/\$\d+/},{b:/\$\{/,e:/}/},{b:"[\\$\\@]"+e.UIR}]},b={eW:!0,l:"[a-z/_]+",k:{built_in:"on off yes no true false none blocked debug info notice warn error crit select break last permanent redirect kqueue rtsig epoll poll /dev/poll"},r:0,i:"=>",c:[e.HCM,{cN:"string",c:[e.BE,r],v:[{b:/"/,e:/"/},{b:/'/,e:/'/}]},{cN:"url",b:"([a-z]+):/",e:"\\s",eW:!0,eE:!0,c:[r]},{cN:"regexp",c:[e.BE,r],v:[{b:"\\s\\^",e:"\\s|{|;",rE:!0},{b:"~\\*?\\s+",e:"\\s|{|;",rE:!0},{b:"\\*(\\.[a-z\\-]+)+"},{b:"([a-z\\-]+\\.)+\\*"}]},{cN:"number",b:"\\b\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}(:\\d{1,5})?\\b"},{cN:"number",b:"\\b\\d+[kKmMgGdshdwy]*\\b",r:0},r]};return{aliases:["nginxconf"],c:[e.HCM,{b:e.UIR+"\\s",e:";|{",rB:!0,c:[{cN:"title",b:e.UIR,starts:b}],r:0}],i:"[^\\s\\}]"}});hljs.registerLanguage("mathematica",function(e){return{aliases:["mma"],l:"(\\$|\\b)"+e.IR+"\\b",k:"AbelianGroup Abort AbortKernels AbortProtect Above Abs Absolute AbsoluteCorrelation AbsoluteCorrelationFunction AbsoluteCurrentValue AbsoluteDashing AbsoluteFileName AbsoluteOptions AbsolutePointSize AbsoluteThickness AbsoluteTime AbsoluteTiming AccountingForm Accumulate Accuracy AccuracyGoal ActionDelay ActionMenu ActionMenuBox ActionMenuBoxOptions Active ActiveItem ActiveStyle AcyclicGraphQ AddOnHelpPath AddTo AdjacencyGraph AdjacencyList AdjacencyMatrix AdjustmentBox AdjustmentBoxOptions AdjustTimeSeriesForecast AffineTransform After AiryAi AiryAiPrime AiryAiZero AiryBi AiryBiPrime AiryBiZero AlgebraicIntegerQ AlgebraicNumber AlgebraicNumberDenominator AlgebraicNumberNorm AlgebraicNumberPolynomial AlgebraicNumberTrace AlgebraicRules AlgebraicRulesData Algebraics AlgebraicUnitQ Alignment AlignmentMarker AlignmentPoint All AllowedDimensions AllowGroupClose AllowInlineCells AllowKernelInitialization AllowReverseGroupClose AllowScriptLevelChange AlphaChannel AlternatingGroup AlternativeHypothesis Alternatives AmbientLight Analytic AnchoredSearch And AndersonDarlingTest AngerJ AngleBracket AngularGauge Animate AnimationCycleOffset AnimationCycleRepetitions AnimationDirection AnimationDisplayTime AnimationRate AnimationRepetitions AnimationRunning Animator AnimatorBox AnimatorBoxOptions AnimatorElements Annotation Annuity AnnuityDue Antialiasing Antisymmetric Apart ApartSquareFree Appearance 
AppearanceElements AppellF1 Append AppendTo Apply ArcCos ArcCosh ArcCot ArcCoth ArcCsc ArcCsch ArcSec ArcSech ArcSin ArcSinDistribution ArcSinh ArcTan ArcTanh Arg ArgMax ArgMin ArgumentCountQ ARIMAProcess ArithmeticGeometricMean ARMAProcess ARProcess Array ArrayComponents ArrayDepth ArrayFlatten ArrayPad ArrayPlot ArrayQ ArrayReshape ArrayRules Arrays Arrow Arrow3DBox ArrowBox Arrowheads AspectRatio AspectRatioFixed Assert Assuming Assumptions AstronomicalData Asynchronous AsynchronousTaskObject AsynchronousTasks AtomQ Attributes AugmentedSymmetricPolynomial AutoAction AutoDelete AutoEvaluateEvents AutoGeneratedPackage AutoIndent AutoIndentSpacings AutoItalicWords AutoloadPath AutoMatch Automatic AutomaticImageSize AutoMultiplicationSymbol AutoNumberFormatting AutoOpenNotebooks AutoOpenPalettes AutorunSequencing AutoScaling AutoScroll AutoSpacing AutoStyleOptions AutoStyleWords Axes AxesEdge AxesLabel AxesOrigin AxesStyle Axis BabyMonsterGroupB Back Background BackgroundTasksSettings Backslash Backsubstitution Backward Band BandpassFilter BandstopFilter BarabasiAlbertGraphDistribution BarChart BarChart3D BarLegend BarlowProschanImportance BarnesG BarOrigin BarSpacing BartlettHannWindow BartlettWindow BaseForm Baseline BaselinePosition BaseStyle BatesDistribution BattleLemarieWavelet Because BeckmannDistribution Beep Before Begin BeginDialogPacket BeginFrontEndInteractionPacket BeginPackage BellB BellY Below BenfordDistribution BeniniDistribution BenktanderGibratDistribution BenktanderWeibullDistribution BernoulliB BernoulliDistribution BernoulliGraphDistribution BernoulliProcess BernsteinBasis BesselFilterModel BesselI BesselJ BesselJZero BesselK BesselY BesselYZero Beta BetaBinomialDistribution BetaDistribution BetaNegativeBinomialDistribution BetaPrimeDistribution BetaRegularized BetweennessCentrality BezierCurve BezierCurve3DBox BezierCurve3DBoxOptions BezierCurveBox BezierCurveBoxOptions BezierFunction BilateralFilter Binarize BinaryFormat BinaryImageQ BinaryRead BinaryReadList BinaryWrite BinCounts BinLists Binomial BinomialDistribution BinomialProcess BinormalDistribution BiorthogonalSplineWavelet BipartiteGraphQ BirnbaumImportance BirnbaumSaundersDistribution BitAnd BitClear BitGet BitLength BitNot BitOr BitSet BitShiftLeft BitShiftRight BitXor Black BlackmanHarrisWindow BlackmanNuttallWindow BlackmanWindow Blank BlankForm BlankNullSequence BlankSequence Blend Block BlockRandom BlomqvistBeta BlomqvistBetaTest Blue Blur BodePlot BohmanWindow Bold Bookmarks Boole BooleanConsecutiveFunction BooleanConvert BooleanCountingFunction BooleanFunction BooleanGraph BooleanMaxterms BooleanMinimize BooleanMinterms Booleans BooleanTable BooleanVariables BorderDimensions BorelTannerDistribution Bottom BottomHatTransform BoundaryStyle Bounds Box BoxBaselineShift BoxData BoxDimensions Boxed Boxes BoxForm BoxFormFormatTypes BoxFrame BoxID BoxMargins BoxMatrix BoxRatios BoxRotation BoxRotationPoint BoxStyle BoxWhiskerChart Bra BracketingBar BraKet BrayCurtisDistance BreadthFirstScan Break Brown BrownForsytheTest BrownianBridgeProcess BrowserCategory BSplineBasis BSplineCurve BSplineCurve3DBox BSplineCurveBox BSplineCurveBoxOptions BSplineFunction BSplineSurface BSplineSurface3DBox BubbleChart BubbleChart3D BubbleScale BubbleSizes BulletGauge BusinessDayQ ButterflyGraph ButterworthFilterModel Button ButtonBar ButtonBox ButtonBoxOptions ButtonCell ButtonContents ButtonData ButtonEvaluator ButtonExpandable ButtonFrame ButtonFunction ButtonMargins ButtonMinHeight ButtonNote ButtonNotebook ButtonSource 
ButtonStyle ButtonStyleMenuListing Byte ByteCount ByteOrdering C CachedValue CacheGraphics CalendarData CalendarType CallPacket CanberraDistance Cancel CancelButton CandlestickChart Cap CapForm CapitalDifferentialD CardinalBSplineBasis CarmichaelLambda Cases Cashflow Casoratian Catalan CatalanNumber Catch CauchyDistribution CauchyWindow CayleyGraph CDF CDFDeploy CDFInformation CDFWavelet Ceiling Cell CellAutoOverwrite CellBaseline CellBoundingBox CellBracketOptions CellChangeTimes CellContents CellContext CellDingbat CellDynamicExpression CellEditDuplicate CellElementsBoundingBox CellElementSpacings CellEpilog CellEvaluationDuplicate CellEvaluationFunction CellEventActions CellFrame CellFrameColor CellFrameLabelMargins CellFrameLabels CellFrameMargins CellGroup CellGroupData CellGrouping CellGroupingRules CellHorizontalScrolling CellID CellLabel CellLabelAutoDelete CellLabelMargins CellLabelPositioning CellMargins CellObject CellOpen CellPrint CellProlog Cells CellSize CellStyle CellTags CellularAutomaton CensoredDistribution Censoring Center CenterDot CentralMoment CentralMomentGeneratingFunction CForm ChampernowneNumber ChanVeseBinarize Character CharacterEncoding CharacterEncodingsPath CharacteristicFunction CharacteristicPolynomial CharacterRange Characters ChartBaseStyle ChartElementData ChartElementDataFunction ChartElementFunction ChartElements ChartLabels ChartLayout ChartLegends ChartStyle Chebyshev1FilterModel Chebyshev2FilterModel ChebyshevDistance ChebyshevT ChebyshevU Check CheckAbort CheckAll Checkbox CheckboxBar CheckboxBox CheckboxBoxOptions ChemicalData ChessboardDistance ChiDistribution ChineseRemainder ChiSquareDistribution ChoiceButtons ChoiceDialog CholeskyDecomposition Chop Circle CircleBox CircleDot CircleMinus CirclePlus CircleTimes CirculantGraph CityData Clear ClearAll ClearAttributes ClearSystemCache ClebschGordan ClickPane Clip ClipboardNotebook ClipFill ClippingStyle ClipPlanes ClipRange Clock ClockGauge ClockwiseContourIntegral Close Closed CloseKernels ClosenessCentrality Closing ClosingAutoSave ClosingEvent ClusteringComponents CMYKColor Coarse Coefficient CoefficientArrays CoefficientDomain CoefficientList CoefficientRules CoifletWavelet Collect Colon ColonForm ColorCombine ColorConvert ColorData ColorDataFunction ColorFunction ColorFunctionScaling Colorize ColorNegate ColorOutput ColorProfileData ColorQuantize ColorReplace ColorRules ColorSelectorSettings ColorSeparate ColorSetter ColorSetterBox ColorSetterBoxOptions ColorSlider ColorSpace Column ColumnAlignments ColumnBackgrounds ColumnForm ColumnLines ColumnsEqual ColumnSpacings ColumnWidths CommonDefaultFormatTypes Commonest CommonestFilter CommonUnits CommunityBoundaryStyle CommunityGraphPlot CommunityLabels CommunityRegionStyle CompatibleUnitQ CompilationOptions CompilationTarget Compile Compiled CompiledFunction Complement CompleteGraph CompleteGraphQ CompleteKaryTree CompletionsListPacket Complex Complexes ComplexExpand ComplexInfinity ComplexityFunction ComponentMeasurements ComponentwiseContextMenu Compose ComposeList ComposeSeries Composition CompoundExpression CompoundPoissonDistribution CompoundPoissonProcess CompoundRenewalProcess Compress CompressedData Condition ConditionalExpression Conditioned Cone ConeBox ConfidenceLevel ConfidenceRange ConfidenceTransform ConfigurationPath Congruent Conjugate ConjugateTranspose Conjunction Connect ConnectedComponents ConnectedGraphQ ConnesWindow ConoverTest ConsoleMessage ConsoleMessagePacket ConsolePrint Constant ConstantArray Constants ConstrainedMax 
ConstrainedMin ContentPadding ContentsBoundingBox ContentSelectable ContentSize Context ContextMenu Contexts ContextToFilename ContextToFileName Continuation Continue ContinuedFraction ContinuedFractionK ContinuousAction ContinuousMarkovProcess ContinuousTimeModelQ ContinuousWaveletData ContinuousWaveletTransform ContourDetect ContourGraphics ContourIntegral ContourLabels ContourLines ContourPlot ContourPlot3D Contours ContourShading ContourSmoothing ContourStyle ContraharmonicMean Control ControlActive ControlAlignment ControllabilityGramian ControllabilityMatrix ControllableDecomposition ControllableModelQ ControllerDuration ControllerInformation ControllerInformationData ControllerLinking ControllerManipulate ControllerMethod ControllerPath ControllerState ControlPlacement ControlsRendering ControlType Convergents ConversionOptions ConversionRules ConvertToBitmapPacket ConvertToPostScript ConvertToPostScriptPacket Convolve ConwayGroupCo1 ConwayGroupCo2 ConwayGroupCo3 CoordinateChartData CoordinatesToolOptions CoordinateTransform CoordinateTransformData CoprimeQ Coproduct CopulaDistribution Copyable CopyDirectory CopyFile CopyTag CopyToClipboard CornerFilter CornerNeighbors Correlation CorrelationDistance CorrelationFunction CorrelationTest Cos Cosh CoshIntegral CosineDistance CosineWindow CosIntegral Cot Coth Count CounterAssignments CounterBox CounterBoxOptions CounterClockwiseContourIntegral CounterEvaluator CounterFunction CounterIncrements CounterStyle CounterStyleMenuListing CountRoots CountryData Covariance CovarianceEstimatorFunction CovarianceFunction CoxianDistribution CoxIngersollRossProcess CoxModel CoxModelFit CramerVonMisesTest CreateArchive CreateDialog CreateDirectory CreateDocument CreateIntermediateDirectories CreatePalette CreatePalettePacket CreateScheduledTask CreateTemporary CreateWindow CriticalityFailureImportance CriticalitySuccessImportance CriticalSection Cross CrossingDetect CrossMatrix Csc Csch CubeRoot Cubics Cuboid CuboidBox Cumulant CumulantGeneratingFunction Cup CupCap Curl CurlyDoubleQuote CurlyQuote CurrentImage CurrentlySpeakingPacket CurrentValue CurvatureFlowFilter CurveClosed Cyan CycleGraph CycleIndexPolynomial Cycles CyclicGroup Cyclotomic Cylinder CylinderBox CylindricalDecomposition D DagumDistribution DamerauLevenshteinDistance DampingFactor Darker Dashed Dashing DataCompression DataDistribution DataRange DataReversed Date DateDelimiters DateDifference DateFunction DateList DateListLogPlot DateListPlot DatePattern DatePlus DateRange DateString DateTicksFormat DaubechiesWavelet DavisDistribution DawsonF DayCount DayCountConvention DayMatchQ DayName DayPlus DayRange DayRound DeBruijnGraph Debug DebugTag Decimal DeclareKnownSymbols DeclarePackage Decompose Decrement DedekindEta Default DefaultAxesStyle DefaultBaseStyle DefaultBoxStyle DefaultButton DefaultColor DefaultControlPlacement DefaultDuplicateCellStyle DefaultDuration DefaultElement DefaultFaceGridsStyle DefaultFieldHintStyle DefaultFont DefaultFontProperties DefaultFormatType DefaultFormatTypeForStyle DefaultFrameStyle DefaultFrameTicksStyle DefaultGridLinesStyle DefaultInlineFormatType DefaultInputFormatType DefaultLabelStyle DefaultMenuStyle DefaultNaturalLanguage DefaultNewCellStyle DefaultNewInlineCellStyle DefaultNotebook DefaultOptions DefaultOutputFormatType DefaultStyle DefaultStyleDefinitions DefaultTextFormatType DefaultTextInlineFormatType DefaultTicksStyle DefaultTooltipStyle DefaultValues Defer DefineExternal DefineInputStreamMethod DefineOutputStreamMethod Definition Degree 
DegreeCentrality DegreeGraphDistribution DegreeLexicographic DegreeReverseLexicographic Deinitialization Del Deletable Delete DeleteBorderComponents DeleteCases DeleteContents DeleteDirectory DeleteDuplicates DeleteFile DeleteSmallComponents DeleteWithContents DeletionWarning Delimiter DelimiterFlashTime DelimiterMatching Delimiters Denominator DensityGraphics DensityHistogram DensityPlot DependentVariables Deploy Deployed Depth DepthFirstScan Derivative DerivativeFilter DescriptorStateSpace DesignMatrix Det DGaussianWavelet DiacriticalPositioning Diagonal DiagonalMatrix Dialog DialogIndent DialogInput DialogLevel DialogNotebook DialogProlog DialogReturn DialogSymbols Diamond DiamondMatrix DiceDissimilarity DictionaryLookup DifferenceDelta DifferenceOrder DifferenceRoot DifferenceRootReduce Differences DifferentialD DifferentialRoot DifferentialRootReduce DifferentiatorFilter DigitBlock DigitBlockMinimum DigitCharacter DigitCount DigitQ DihedralGroup Dilation Dimensions DiracComb DiracDelta DirectedEdge DirectedEdges DirectedGraph DirectedGraphQ DirectedInfinity Direction Directive Directory DirectoryName DirectoryQ DirectoryStack DirichletCharacter DirichletConvolve DirichletDistribution DirichletL DirichletTransform DirichletWindow DisableConsolePrintPacket DiscreteChirpZTransform DiscreteConvolve DiscreteDelta DiscreteHadamardTransform DiscreteIndicator DiscreteLQEstimatorGains DiscreteLQRegulatorGains DiscreteLyapunovSolve DiscreteMarkovProcess DiscretePlot DiscretePlot3D DiscreteRatio DiscreteRiccatiSolve DiscreteShift DiscreteTimeModelQ DiscreteUniformDistribution DiscreteVariables DiscreteWaveletData DiscreteWaveletPacketTransform DiscreteWaveletTransform Discriminant Disjunction Disk DiskBox DiskMatrix Dispatch DispersionEstimatorFunction Display DisplayAllSteps DisplayEndPacket DisplayFlushImagePacket DisplayForm DisplayFunction DisplayPacket DisplayRules DisplaySetSizePacket DisplayString DisplayTemporary DisplayWith DisplayWithRef DisplayWithVariable DistanceFunction DistanceTransform Distribute Distributed DistributedContexts DistributeDefinitions DistributionChart DistributionDomain DistributionFitTest DistributionParameterAssumptions DistributionParameterQ Dithering Div Divergence Divide DivideBy Dividers Divisible Divisors DivisorSigma DivisorSum DMSList DMSString Do DockedCells DocumentNotebook DominantColors DOSTextFormat Dot DotDashed DotEqual Dotted DoubleBracketingBar DoubleContourIntegral DoubleDownArrow DoubleLeftArrow DoubleLeftRightArrow DoubleLeftTee DoubleLongLeftArrow DoubleLongLeftRightArrow DoubleLongRightArrow DoubleRightArrow DoubleRightTee DoubleUpArrow DoubleUpDownArrow DoubleVerticalBar DoublyInfinite Down DownArrow DownArrowBar DownArrowUpArrow DownLeftRightVector DownLeftTeeVector DownLeftVector DownLeftVectorBar DownRightTeeVector DownRightVector DownRightVectorBar Downsample DownTee DownTeeArrow DownValues DragAndDrop DrawEdges DrawFrontFaces DrawHighlighted Drop DSolve Dt DualLinearProgramming DualSystemsModel DumpGet DumpSave DuplicateFreeQ Dynamic DynamicBox DynamicBoxOptions DynamicEvaluationTimeout DynamicLocation DynamicModule DynamicModuleBox DynamicModuleBoxOptions DynamicModuleParent DynamicModuleValues DynamicName DynamicNamespace DynamicReference DynamicSetting DynamicUpdating DynamicWrapper DynamicWrapperBox DynamicWrapperBoxOptions E EccentricityCentrality EdgeAdd EdgeBetweennessCentrality EdgeCapacity EdgeCapForm EdgeColor EdgeConnectivity EdgeCost EdgeCount EdgeCoverQ EdgeDashing EdgeDelete EdgeDetect EdgeForm EdgeIndex EdgeJoinForm 
EdgeLabeling EdgeLabels EdgeLabelStyle EdgeList EdgeOpacity EdgeQ EdgeRenderingFunction EdgeRules EdgeShapeFunction EdgeStyle EdgeThickness EdgeWeight Editable EditButtonSettings EditCellTagsSettings EditDistance EffectiveInterest Eigensystem Eigenvalues EigenvectorCentrality Eigenvectors Element ElementData Eliminate EliminationOrder EllipticE EllipticExp EllipticExpPrime EllipticF EllipticFilterModel EllipticK EllipticLog EllipticNomeQ EllipticPi EllipticReducedHalfPeriods EllipticTheta EllipticThetaPrime EmitSound EmphasizeSyntaxErrors EmpiricalDistribution Empty EmptyGraphQ EnableConsolePrintPacket Enabled Encode End EndAdd EndDialogPacket EndFrontEndInteractionPacket EndOfFile EndOfLine EndOfString EndPackage EngineeringForm Enter EnterExpressionPacket EnterTextPacket Entropy EntropyFilter Environment Epilog Equal EqualColumns EqualRows EqualTilde EquatedTo Equilibrium EquirippleFilterKernel Equivalent Erf Erfc Erfi ErlangB ErlangC ErlangDistribution Erosion ErrorBox ErrorBoxOptions ErrorNorm ErrorPacket ErrorsDialogSettings EstimatedDistribution EstimatedProcess EstimatorGains EstimatorRegulator EuclideanDistance EulerE EulerGamma EulerianGraphQ EulerPhi Evaluatable Evaluate Evaluated EvaluatePacket EvaluationCell EvaluationCompletionAction EvaluationElements EvaluationMode EvaluationMonitor EvaluationNotebook EvaluationObject EvaluationOrder Evaluator EvaluatorNames EvenQ EventData EventEvaluator EventHandler EventHandlerTag EventLabels ExactBlackmanWindow ExactNumberQ ExactRootIsolation ExampleData Except ExcludedForms ExcludePods Exclusions ExclusionsStyle Exists Exit ExitDialog Exp Expand ExpandAll ExpandDenominator ExpandFileName ExpandNumerator Expectation ExpectationE ExpectedValue ExpGammaDistribution ExpIntegralE ExpIntegralEi Exponent ExponentFunction ExponentialDistribution ExponentialFamily ExponentialGeneratingFunction ExponentialMovingAverage ExponentialPowerDistribution ExponentPosition ExponentStep Export ExportAutoReplacements ExportPacket ExportString Expression ExpressionCell ExpressionPacket ExpToTrig ExtendedGCD Extension ExtentElementFunction ExtentMarkers ExtentSize ExternalCall ExternalDataCharacterEncoding Extract ExtractArchive ExtremeValueDistribution FaceForm FaceGrids FaceGridsStyle Factor FactorComplete Factorial Factorial2 FactorialMoment FactorialMomentGeneratingFunction FactorialPower FactorInteger FactorList FactorSquareFree FactorSquareFreeList FactorTerms FactorTermsList Fail FailureDistribution False FARIMAProcess FEDisableConsolePrintPacket FeedbackSector FeedbackSectorStyle FeedbackType FEEnableConsolePrintPacket Fibonacci FieldHint FieldHintStyle FieldMasked FieldSize File FileBaseName FileByteCount FileDate FileExistsQ FileExtension FileFormat FileHash FileInformation FileName FileNameDepth FileNameDialogSettings FileNameDrop FileNameJoin FileNames FileNameSetter FileNameSplit FileNameTake FilePrint FileType FilledCurve FilledCurveBox Filling FillingStyle FillingTransform FilterRules FinancialBond FinancialData FinancialDerivative FinancialIndicator Find FindArgMax FindArgMin FindClique FindClusters FindCurvePath FindDistributionParameters FindDivisions FindEdgeCover FindEdgeCut FindEulerianCycle FindFaces FindFile FindFit FindGeneratingFunction FindGeoLocation FindGeometricTransform FindGraphCommunities FindGraphIsomorphism FindGraphPartition FindHamiltonianCycle FindIndependentEdgeSet FindIndependentVertexSet FindInstance FindIntegerNullVector FindKClan FindKClique FindKClub FindKPlex FindLibrary FindLinearRecurrence FindList FindMaximum 
FindMaximumFlow FindMaxValue FindMinimum FindMinimumCostFlow FindMinimumCut FindMinValue FindPermutation FindPostmanTour FindProcessParameters FindRoot FindSequenceFunction FindSettings FindShortestPath FindShortestTour FindThreshold FindVertexCover FindVertexCut Fine FinishDynamic FiniteAbelianGroupCount FiniteGroupCount FiniteGroupData First FirstPassageTimeDistribution FischerGroupFi22 FischerGroupFi23 FischerGroupFi24Prime FisherHypergeometricDistribution FisherRatioTest FisherZDistribution Fit FitAll FittedModel FixedPoint FixedPointList FlashSelection Flat Flatten FlattenAt FlatTopWindow FlipView Floor FlushPrintOutputPacket Fold FoldList Font FontColor FontFamily FontForm FontName FontOpacity FontPostScriptName FontProperties FontReencoding FontSize FontSlant FontSubstitutions FontTracking FontVariations FontWeight For ForAll Format FormatRules FormatType FormatTypeAutoConvert FormatValues FormBox FormBoxOptions FortranForm Forward ForwardBackward Fourier FourierCoefficient FourierCosCoefficient FourierCosSeries FourierCosTransform FourierDCT FourierDCTFilter FourierDCTMatrix FourierDST FourierDSTMatrix FourierMatrix FourierParameters FourierSequenceTransform FourierSeries FourierSinCoefficient FourierSinSeries FourierSinTransform FourierTransform FourierTrigSeries FractionalBrownianMotionProcess FractionalPart FractionBox FractionBoxOptions FractionLine Frame FrameBox FrameBoxOptions Framed FrameInset FrameLabel Frameless FrameMargins FrameStyle FrameTicks FrameTicksStyle FRatioDistribution FrechetDistribution FreeQ FrequencySamplingFilterKernel FresnelC FresnelS Friday FrobeniusNumber FrobeniusSolve FromCharacterCode FromCoefficientRules FromContinuedFraction FromDate FromDigits FromDMS Front FrontEndDynamicExpression FrontEndEventActions FrontEndExecute FrontEndObject FrontEndResource FrontEndResourceString FrontEndStackSize FrontEndToken FrontEndTokenExecute FrontEndValueCache FrontEndVersion FrontFaceColor FrontFaceOpacity Full FullAxes FullDefinition FullForm FullGraphics FullOptions FullSimplify Function FunctionExpand FunctionInterpolation FunctionSpace FussellVeselyImportance GaborFilter GaborMatrix GaborWavelet GainMargins GainPhaseMargins Gamma GammaDistribution GammaRegularized GapPenalty Gather GatherBy GaugeFaceElementFunction GaugeFaceStyle GaugeFrameElementFunction GaugeFrameSize GaugeFrameStyle GaugeLabels GaugeMarkers GaugeStyle GaussianFilter GaussianIntegers GaussianMatrix GaussianWindow GCD GegenbauerC General GeneralizedLinearModelFit GenerateConditions GeneratedCell GeneratedParameters GeneratingFunction Generic GenericCylindricalDecomposition GenomeData GenomeLookup GeodesicClosing GeodesicDilation GeodesicErosion GeodesicOpening GeoDestination GeodesyData GeoDirection GeoDistance GeoGridPosition GeometricBrownianMotionProcess GeometricDistribution GeometricMean GeometricMeanFilter GeometricTransformation GeometricTransformation3DBox GeometricTransformation3DBoxOptions GeometricTransformationBox GeometricTransformationBoxOptions GeoPosition GeoPositionENU GeoPositionXYZ GeoProjectionData GestureHandler GestureHandlerTag Get GetBoundingBoxSizePacket GetContext GetEnvironment GetFileName GetFrontEndOptionsDataPacket GetLinebreakInformationPacket GetMenusPacket GetPageBreakInformationPacket Glaisher GlobalClusteringCoefficient GlobalPreferences GlobalSession Glow GoldenRatio GompertzMakehamDistribution GoodmanKruskalGamma GoodmanKruskalGammaTest Goto Grad Gradient GradientFilter GradientOrientationFilter Graph GraphAssortativity GraphCenter GraphComplement 
[minified bundle elided: the remainder of the vendored docs/js/highlight.pack.js file, whose wrapped long lines lost their diff markers in extraction — a Mathematica built-in symbol list followed by hljs.registerLanguage grammar definitions for Mathematica, F#, Verilog, DOS batch, Gherkin, XML, AutoHotkey, R, C#, NSIS, Less, pf.conf, Lasso, Prolog, Oxygene, AppleScript, Makefile, Dust, clojure-repl, and Dart; the bundle is superseded by the CDN-hosted highlight.js 9.12.0 referenced elsewhere in this patch]
\ No newline at end of file
diff --git a/docs/js/theme.js b/docs/js/theme.js
index dda9975ea19d35335d8e34fef5083f411e74ec55..aecbb8611c841d1eb3ff7fee16b080ef81be824d 100644
--- a/docs/js/theme.js
+++ b/docs/js/theme.js
@@ -13,19 +13,27 @@ $( document ).ready(function() {
 
     // Keyboard navigation
     document.addEventListener("keydown", function(e) {
-        if ($(e.target).is(':input')) return true;
-        var key = e.which || e.keyCode || window.event && window.event.keyCode;
-        var page;
-        switch (key) {
-            case 39:  // right arrow
-                page = $('[role="navigation"] a:contains(Next):first').prop('href');
-                break;
-            case 37:  // left arrow
-                page = $('[role="navigation"] a:contains(Previous):first').prop('href');
-                break;
-            default: break;
-        }
-        if (page) window.location.href = page;
+      var key = e.which || e.keyCode || window.event && window.event.keyCode;
+      var page;
+      switch (key) {
+          case 78:  // n
+              page = $('[role="navigation"] a:contains(Next):first').prop('href');
+              break;
+          case 80:  // p
+              page = $('[role="navigation"] a:contains(Previous):first').prop('href');
+              break;
+          case 13:  // enter
+              if (e.target === document.getElementById('mkdocs-search-query')) {
+                e.preventDefault();
+              }
+              break;
+          default: break;
+      }
+      if ($(e.target).is(':input')) {
+        return true;
+      } else if (page) {
+        window.location.href = page;
+      }
     });
 
     $(document).on('click', "[data-toggle='rst-current-version']", function() {
@@ -35,8 +43,6 @@ $( document ).ready(function() {
     // Make tables responsive
     $("table.docutils:not(.field-list)").wrap("<div class='wy-table-responsive'></div>");
 
-    hljs.initHighlightingOnLoad();
-
     $('table').addClass('docutils');
 });
 
diff --git a/docs/jupyter/index.html b/docs/jupyter.html
similarity index 77%
rename from docs/jupyter/index.html
rename to docs/jupyter.html
index ffe5102a4fab11fed644bb3224934a892f8fe6a3..d555c69af32d49640452c64752ccf37ea16ccbff 100644
--- a/docs/jupyter/index.html
+++ b/docs/jupyter.html
@@ -7,25 +7,26 @@
   <meta name="viewport" content="width=device-width, initial-scale=1.0">
   
   
-  <link rel="shortcut icon" href="../img/favicon.ico">
+  <link rel="shortcut icon" href="img/favicon.ico">
   <title>Jupyter Notebooks - Neural Network Distiller</title>
   <link href='https://fonts.googleapis.com/css?family=Lato:400,700|Roboto+Slab:400,700|Inconsolata:400,700' rel='stylesheet' type='text/css'>
 
-  <link rel="stylesheet" href="../css/theme.css" type="text/css" />
-  <link rel="stylesheet" href="../css/theme_extra.css" type="text/css" />
-  <link rel="stylesheet" href="../css/highlight.css">
-  <link href="../extra.css" rel="stylesheet">
+  <link rel="stylesheet" href="css/theme.css" type="text/css" />
+  <link rel="stylesheet" href="css/theme_extra.css" type="text/css" />
+  <link rel="stylesheet" href="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/github.min.css">
+  <link href="extra.css" rel="stylesheet">
   
   <script>
     // Current page data
     var mkdocs_page_name = "Jupyter Notebooks";
     var mkdocs_page_input_path = "jupyter.md";
-    var mkdocs_page_url = "/jupyter/index.html";
+    var mkdocs_page_url = null;
   </script>
   
-  <script src="../js/jquery-2.1.1.min.js"></script>
-  <script src="../js/modernizr-2.8.3.min.js"></script>
-  <script type="text/javascript" src="../js/highlight.pack.js"></script> 
+  <script src="js/jquery-2.1.1.min.js" defer></script>
+  <script src="js/modernizr-2.8.3.min.js" defer></script>
+  <script src="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/highlight.min.js"></script>
+  <script>hljs.initHighlightingOnLoad();</script> 
   
 </head>
 
@@ -36,10 +37,10 @@
     
     <nav data-toggle="wy-nav-shift" class="wy-nav-side stickynav">
       <div class="wy-side-nav-search">
-        <a href="../index.html" class="icon icon-home"> Neural Network Distiller</a>
+        <a href="index.html" class="icon icon-home"> Neural Network Distiller</a>
         <div role="search">
-  <form id ="rtd-search-form" class="wy-form" action="../search.html" method="get">
-    <input type="text" name="q" placeholder="Search docs" />
+  <form id ="rtd-search-form" class="wy-form" action="./search.html" method="get">
+    <input type="text" name="q" placeholder="Search docs" title="Type search term here" />
   </form>
 </div>
       </div>
@@ -50,22 +51,22 @@
           
             <li class="toctree-l1">
 		
-    <a class="" href="../index.html">Home</a>
+    <a class="" href="index.html">Home</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../install/index.html">Installation</a>
+    <a class="" href="install.html">Installation</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../usage/index.html">Usage</a>
+    <a class="" href="usage.html">Usage</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../schedule/index.html">Compression Scheduling</a>
+    <a class="" href="schedule.html">Compression Scheduling</a>
 	    </li>
           
             <li class="toctree-l1">
@@ -74,23 +75,23 @@
     <ul class="subnav">
                 <li class="">
                     
-    <a class="" href="../pruning/index.html">Pruning</a>
+    <a class="" href="pruning.html">Pruning</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../regularization/index.html">Regularization</a>
+    <a class="" href="regularization.html">Regularization</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../quantization/index.html">Quantization</a>
+    <a class="" href="quantization.html">Quantization</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../knowledge_distillation/index.html">Knowledge Distillation</a>
+    <a class="" href="knowledge_distillation.html">Knowledge Distillation</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../conditional_computation/index.html">Conditional Computation</a>
+    <a class="" href="conditional_computation.html">Conditional Computation</a>
                 </li>
     </ul>
 	    </li>
@@ -101,27 +102,27 @@
     <ul class="subnav">
                 <li class="">
                     
-    <a class="" href="../algo_pruning/index.html">Pruning</a>
+    <a class="" href="algo_pruning.html">Pruning</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../algo_quantization/index.html">Quantization</a>
+    <a class="" href="algo_quantization.html">Quantization</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../algo_earlyexit/index.html">Early Exit</a>
+    <a class="" href="algo_earlyexit.html">Early Exit</a>
                 </li>
     </ul>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../model_zoo/index.html">Model Zoo</a>
+    <a class="" href="model_zoo.html">Model Zoo</a>
 	    </li>
           
             <li class="toctree-l1 current">
 		
-    <a class="current" href="index.html">Jupyter Notebooks</a>
+    <a class="current" href="jupyter.html">Jupyter Notebooks</a>
     <ul class="subnav">
             
     <li class="toctree-l2"><a href="#jupyter-environment">Jupyter environment</a></li>
@@ -142,7 +143,7 @@
           
             <li class="toctree-l1">
 		
-    <a class="" href="../design/index.html">Design</a>
+    <a class="" href="design.html">Design</a>
 	    </li>
           
             <li class="toctree-l1">
@@ -151,11 +152,11 @@
     <ul class="subnav">
                 <li class="">
                     
-    <a class="" href="../tutorial-struct_pruning/index.html">Pruning Filters and Channels</a>
+    <a class="" href="tutorial-struct_pruning.html">Pruning Filters and Channels</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../tutorial-lang_model/index.html">Pruning a Language Model</a>
+    <a class="" href="tutorial-lang_model.html">Pruning a Language Model</a>
                 </li>
     </ul>
 	    </li>
@@ -170,7 +171,7 @@
       
       <nav class="wy-nav-top" role="navigation" aria-label="top navigation">
         <i data-toggle="wy-nav-top" class="fa fa-bars"></i>
-        <a href="../index.html">Neural Network Distiller</a>
+        <a href="index.html">Neural Network Distiller</a>
       </nav>
 
       
@@ -178,7 +179,7 @@
         <div class="rst-content">
           <div role="navigation" aria-label="breadcrumbs navigation">
   <ul class="wy-breadcrumbs">
-    <li><a href="../index.html">Docs</a> &raquo;</li>
+    <li><a href="index.html">Docs</a> &raquo;</li>
     
       
     
@@ -241,10 +242,10 @@ We welcome new ideas and implementations of Jupyter.</p>
   
     <div class="rst-footer-buttons" role="navigation" aria-label="footer navigation">
       
-        <a href="../design/index.html" class="btn btn-neutral float-right" title="Design">Next <span class="icon icon-circle-arrow-right"></span></a>
+        <a href="design.html" class="btn btn-neutral float-right" title="Design">Next <span class="icon icon-circle-arrow-right"></span></a>
       
       
-        <a href="../model_zoo/index.html" class="btn btn-neutral" title="Model Zoo"><span class="icon icon-circle-arrow-left"></span> Previous</a>
+        <a href="model_zoo.html" class="btn btn-neutral" title="Model Zoo"><span class="icon icon-circle-arrow-left"></span> Previous</a>
       
     </div>
   
@@ -270,18 +271,17 @@ We welcome new ideas and implementations of Jupyter.</p>
     <span class="rst-current-version" data-toggle="rst-current-version">
       
       
-        <span><a href="../model_zoo/index.html" style="color: #fcfcfc;">&laquo; Previous</a></span>
+        <span><a href="model_zoo.html" style="color: #fcfcfc;">&laquo; Previous</a></span>
       
       
-        <span style="margin-left: 15px"><a href="../design/index.html" style="color: #fcfcfc">Next &raquo;</a></span>
+        <span style="margin-left: 15px"><a href="design.html" style="color: #fcfcfc">Next &raquo;</a></span>
       
     </span>
 </div>
-    <script>var base_url = '..';</script>
-    <script src="../js/theme.js"></script>
-      <script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML"></script>
-      <script src="../search/require.js"></script>
-      <script src="../search/search.js"></script>
+    <script>var base_url = '.';</script>
+    <script src="js/theme.js" defer></script>
+      <script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML" defer></script>
+      <script src="search/main.js" defer></script>
 
 </body>
 </html>
diff --git a/docs/knowledge_distillation/index.html b/docs/knowledge_distillation.html
similarity index 81%
rename from docs/knowledge_distillation/index.html
rename to docs/knowledge_distillation.html
index f74bbbfaf539acb8a9efe0db68841c97f8ecae1c..ccb1c41b45414592fc85c6d54a5a5def27a74aaf 100644
--- a/docs/knowledge_distillation/index.html
+++ b/docs/knowledge_distillation.html
@@ -7,25 +7,26 @@
   <meta name="viewport" content="width=device-width, initial-scale=1.0">
   
   
-  <link rel="shortcut icon" href="../img/favicon.ico">
+  <link rel="shortcut icon" href="img/favicon.ico">
   <title>Knowledge Distillation - Neural Network Distiller</title>
   <link href='https://fonts.googleapis.com/css?family=Lato:400,700|Roboto+Slab:400,700|Inconsolata:400,700' rel='stylesheet' type='text/css'>
 
-  <link rel="stylesheet" href="../css/theme.css" type="text/css" />
-  <link rel="stylesheet" href="../css/theme_extra.css" type="text/css" />
-  <link rel="stylesheet" href="../css/highlight.css">
-  <link href="../extra.css" rel="stylesheet">
+  <link rel="stylesheet" href="css/theme.css" type="text/css" />
+  <link rel="stylesheet" href="css/theme_extra.css" type="text/css" />
+  <link rel="stylesheet" href="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/github.min.css">
+  <link href="extra.css" rel="stylesheet">
   
   <script>
     // Current page data
     var mkdocs_page_name = "Knowledge Distillation";
     var mkdocs_page_input_path = "knowledge_distillation.md";
-    var mkdocs_page_url = "/knowledge_distillation/index.html";
+    var mkdocs_page_url = null;
   </script>
   
-  <script src="../js/jquery-2.1.1.min.js"></script>
-  <script src="../js/modernizr-2.8.3.min.js"></script>
-  <script type="text/javascript" src="../js/highlight.pack.js"></script> 
+  <script src="js/jquery-2.1.1.min.js" defer></script>
+  <script src="js/modernizr-2.8.3.min.js" defer></script>
+  <script src="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/highlight.min.js"></script>
+  <script>hljs.initHighlightingOnLoad();</script> 
   
 </head>
 
@@ -36,10 +37,10 @@
     
     <nav data-toggle="wy-nav-shift" class="wy-nav-side stickynav">
       <div class="wy-side-nav-search">
-        <a href="../index.html" class="icon icon-home"> Neural Network Distiller</a>
+        <a href="index.html" class="icon icon-home"> Neural Network Distiller</a>
         <div role="search">
-  <form id ="rtd-search-form" class="wy-form" action="../search.html" method="get">
-    <input type="text" name="q" placeholder="Search docs" />
+  <form id ="rtd-search-form" class="wy-form" action="./search.html" method="get">
+    <input type="text" name="q" placeholder="Search docs" title="Type search term here" />
   </form>
 </div>
       </div>
@@ -50,22 +51,22 @@
           
             <li class="toctree-l1">
 		
-    <a class="" href="../index.html">Home</a>
+    <a class="" href="index.html">Home</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../install/index.html">Installation</a>
+    <a class="" href="install.html">Installation</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../usage/index.html">Usage</a>
+    <a class="" href="usage.html">Usage</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../schedule/index.html">Compression Scheduling</a>
+    <a class="" href="schedule.html">Compression Scheduling</a>
 	    </li>
           
             <li class="toctree-l1">
@@ -74,19 +75,19 @@
     <ul class="subnav">
                 <li class="">
                     
-    <a class="" href="../pruning/index.html">Pruning</a>
+    <a class="" href="pruning.html">Pruning</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../regularization/index.html">Regularization</a>
+    <a class="" href="regularization.html">Regularization</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../quantization/index.html">Quantization</a>
+    <a class="" href="quantization.html">Quantization</a>
                 </li>
                 <li class=" current">
                     
-    <a class="current" href="index.html">Knowledge Distillation</a>
+    <a class="current" href="knowledge_distillation.html">Knowledge Distillation</a>
     <ul class="subnav">
             
     <li class="toctree-l3"><a href="#knowledge-distillation">Knowledge Distillation</a></li>
@@ -104,7 +105,7 @@
                 </li>
                 <li class="">
                     
-    <a class="" href="../conditional_computation/index.html">Conditional Computation</a>
+    <a class="" href="conditional_computation.html">Conditional Computation</a>
                 </li>
     </ul>
 	    </li>
@@ -115,32 +116,32 @@
     <ul class="subnav">
                 <li class="">
                     
-    <a class="" href="../algo_pruning/index.html">Pruning</a>
+    <a class="" href="algo_pruning.html">Pruning</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../algo_quantization/index.html">Quantization</a>
+    <a class="" href="algo_quantization.html">Quantization</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../algo_earlyexit/index.html">Early Exit</a>
+    <a class="" href="algo_earlyexit.html">Early Exit</a>
                 </li>
     </ul>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../model_zoo/index.html">Model Zoo</a>
+    <a class="" href="model_zoo.html">Model Zoo</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../jupyter/index.html">Jupyter Notebooks</a>
+    <a class="" href="jupyter.html">Jupyter Notebooks</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../design/index.html">Design</a>
+    <a class="" href="design.html">Design</a>
 	    </li>
           
             <li class="toctree-l1">
@@ -149,11 +150,11 @@
     <ul class="subnav">
                 <li class="">
                     
-    <a class="" href="../tutorial-struct_pruning/index.html">Pruning Filters and Channels</a>
+    <a class="" href="tutorial-struct_pruning.html">Pruning Filters and Channels</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../tutorial-lang_model/index.html">Pruning a Language Model</a>
+    <a class="" href="tutorial-lang_model.html">Pruning a Language Model</a>
                 </li>
     </ul>
 	    </li>
@@ -168,7 +169,7 @@
       
       <nav class="wy-nav-top" role="navigation" aria-label="top navigation">
         <i data-toggle="wy-nav-top" class="fa fa-bars"></i>
-        <a href="../index.html">Neural Network Distiller</a>
+        <a href="index.html">Neural Network Distiller</a>
       </nav>
 
       
@@ -176,7 +177,7 @@
         <div class="rst-content">
           <div role="navigation" aria-label="breadcrumbs navigation">
   <ul class="wy-breadcrumbs">
-    <li><a href="../index.html">Docs</a> &raquo;</li>
+    <li><a href="index.html">Docs</a> &raquo;</li>
     
       
         
@@ -195,7 +196,7 @@
             <div class="section">
               
                 <h1 id="knowledge-distillation">Knowledge Distillation</h1>
-<p>(For details on how to train a model with knowledge distillation in Distiller, see <a href="../schedule/index.html#knowledge-distillation">here</a>)</p>
+<p>(For details on how to train a model with knowledge distillation in Distiller, see <a href="schedule.html#knowledge-distillation">here</a>)</p>
 <p>Knowledge distillation is a model compression method in which a small model is trained to mimic a pre-trained, larger model (or ensemble of models). This training setting is sometimes referred to as "teacher-student", where the large model is the teacher and the small model is the student (we'll be using these terms interchangeably).</p>
 <p>The method was first proposed by <a href="#bucila-et-al-2006">Bucila et al., 2006</a> and generalized by <a href="#hinton-et-al-2015">Hinton et al., 2015</a>. The implementation in Distiller is based on the latter publication. Here we'll provide a summary of the method. For more information, the reader may refer to the paper (a <a href="https://www.youtube.com/watch?v=EK61htlw8hY">video lecture</a> with <a href="http://www.ttic.edu/dl/dark14.pdf">slides</a> is also available).</p>
 <p>In distillation, knowledge is transferred from the teacher model to the student by minimizing a loss function in which the target is the distribution of class probabilities predicted by the teacher model. That is - the output of a softmax function on the teacher model's logits. However, in many cases, this probability distribution has the correct class at a very high probability, with all other class probabilities very close to 0. As such, it doesn't provide much information beyond the ground truth labels already provided in the dataset. To tackle this issue, <a href="#hinton-et-al-2015">Hinton et al., 2015</a> introduced the concept of "softmax temperature". The probability <script type="math/tex">p_i</script> of class <script type="math/tex">i</script> is calculated from the logits <script type="math/tex">z</script> as:</p>
@@ -209,7 +210,7 @@
 <script type="math/tex; mode=display">\mathcal{L}(x;W) = \alpha * \mathcal{H}(y, \sigma(z_s; T=1)) + \beta * \mathcal{H}(\sigma(z_t; T=\tau), \sigma(z_s, T=\tau))</script>
 </p>
 <p>where <script type="math/tex">x</script> is the input, <script type="math/tex">W</script> are the student model parameters, <script type="math/tex">y</script> is the ground truth label, <script type="math/tex">\mathcal{H}</script> is the cross-entropy loss function, <script type="math/tex">\sigma</script> is the softmax function parameterized by the temperature <script type="math/tex">T</script>, and <script type="math/tex">\alpha</script> and <script type="math/tex">\beta</script> are coefficients. <script type="math/tex">z_s</script> and <script type="math/tex">z_t</script> are the logits of the student and teacher respectively.</p>
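To make the combined loss concrete, here is a minimal PyTorch-style sketch. The function name, default coefficients, and temperature value are illustrative assumptions, not Distiller's API; the two terms directly implement the hard-label and soft-label cross-entropies defined above.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5, beta=0.5):
    # Hard-label term: standard cross-entropy with the ground truth, at T=1.
    hard = F.cross_entropy(student_logits, labels)
    # Soft-label term: cross-entropy between the teacher's and student's
    # temperature-softened distributions, -sum(p_teacher * log p_student).
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    soft = -(p_teacher * log_p_student).sum(dim=1).mean()
    return alpha * hard + beta * soft

# Toy usage with random logits for a batch of 4 samples and 10 classes.
loss = kd_loss(torch.randn(4, 10), torch.randn(4, 10), torch.randint(0, 10, (4,)))
```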
-<p><img alt="Knowledge Distillation" src="../imgs/knowledge_distillation.png" /></p>
+<p><img alt="Knowledge Distillation" src="imgs/knowledge_distillation.png" /></p>
 <h2 id="new-hyper-parameters">New Hyper-Parameters</h2>
 <p>In general, <script type="math/tex">\tau</script>, <script type="math/tex">\alpha</script> and <script type="math/tex">\beta</script> are hyper-parameters.</p>
 <p>In their experiments, <a href="#hinton-et-al-2015">Hinton et al., 2015</a> use temperature values ranging from 1 to 20. They note that empirically, when the student model is very small compared to the teacher model, lower temperatures work better. This makes sense if we consider that as we raise the temperature, the resulting soft-labels distribution becomes richer in information, and a very small model might not be able to capture all of this information. However, there's no clear way to predict up front what kind of capacity for information the student model will have.</p>
@@ -245,10 +246,10 @@
   
     <div class="rst-footer-buttons" role="navigation" aria-label="footer navigation">
       
-        <a href="../conditional_computation/index.html" class="btn btn-neutral float-right" title="Conditional Computation">Next <span class="icon icon-circle-arrow-right"></span></a>
+        <a href="conditional_computation.html" class="btn btn-neutral float-right" title="Conditional Computation">Next <span class="icon icon-circle-arrow-right"></span></a>
       
       
-        <a href="../quantization/index.html" class="btn btn-neutral" title="Quantization"><span class="icon icon-circle-arrow-left"></span> Previous</a>
+        <a href="quantization.html" class="btn btn-neutral" title="Quantization"><span class="icon icon-circle-arrow-left"></span> Previous</a>
       
     </div>
   
@@ -274,18 +275,17 @@
     <span class="rst-current-version" data-toggle="rst-current-version">
       
       
-        <span><a href="../quantization/index.html" style="color: #fcfcfc;">&laquo; Previous</a></span>
+        <span><a href="quantization.html" style="color: #fcfcfc;">&laquo; Previous</a></span>
       
       
-        <span style="margin-left: 15px"><a href="../conditional_computation/index.html" style="color: #fcfcfc">Next &raquo;</a></span>
+        <span style="margin-left: 15px"><a href="conditional_computation.html" style="color: #fcfcfc">Next &raquo;</a></span>
       
     </span>
 </div>
-    <script>var base_url = '..';</script>
-    <script src="../js/theme.js"></script>
-      <script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML"></script>
-      <script src="../search/require.js"></script>
-      <script src="../search/search.js"></script>
+    <script>var base_url = '.';</script>
+    <script src="js/theme.js" defer></script>
+      <script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML" defer></script>
+      <script src="search/main.js" defer></script>
 
 </body>
 </html>
diff --git a/docs/model_zoo/index.html b/docs/model_zoo.html
similarity index 92%
rename from docs/model_zoo/index.html
rename to docs/model_zoo.html
index 87b6d8b3d2f946657dcddd794a19726b6743b433..95ab3224bb660d77322092bd30609389e7c7bb28 100644
--- a/docs/model_zoo/index.html
+++ b/docs/model_zoo.html
@@ -7,25 +7,26 @@
   <meta name="viewport" content="width=device-width, initial-scale=1.0">
   
   
-  <link rel="shortcut icon" href="../img/favicon.ico">
+  <link rel="shortcut icon" href="img/favicon.ico">
   <title>Model Zoo - Neural Network Distiller</title>
   <link href='https://fonts.googleapis.com/css?family=Lato:400,700|Roboto+Slab:400,700|Inconsolata:400,700' rel='stylesheet' type='text/css'>
 
-  <link rel="stylesheet" href="../css/theme.css" type="text/css" />
-  <link rel="stylesheet" href="../css/theme_extra.css" type="text/css" />
-  <link rel="stylesheet" href="../css/highlight.css">
-  <link href="../extra.css" rel="stylesheet">
+  <link rel="stylesheet" href="css/theme.css" type="text/css" />
+  <link rel="stylesheet" href="css/theme_extra.css" type="text/css" />
+  <link rel="stylesheet" href="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/github.min.css">
+  <link href="extra.css" rel="stylesheet">
   
   <script>
     // Current page data
     var mkdocs_page_name = "Model Zoo";
     var mkdocs_page_input_path = "model_zoo.md";
-    var mkdocs_page_url = "/model_zoo/index.html";
+    var mkdocs_page_url = null;
   </script>
   
-  <script src="../js/jquery-2.1.1.min.js"></script>
-  <script src="../js/modernizr-2.8.3.min.js"></script>
-  <script type="text/javascript" src="../js/highlight.pack.js"></script> 
+  <script src="js/jquery-2.1.1.min.js" defer></script>
+  <script src="js/modernizr-2.8.3.min.js" defer></script>
+  <script src="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/highlight.min.js"></script>
+  <script>hljs.initHighlightingOnLoad();</script> 
   
 </head>
 
@@ -36,10 +37,10 @@
     
     <nav data-toggle="wy-nav-shift" class="wy-nav-side stickynav">
       <div class="wy-side-nav-search">
-        <a href="../index.html" class="icon icon-home"> Neural Network Distiller</a>
+        <a href="index.html" class="icon icon-home"> Neural Network Distiller</a>
         <div role="search">
-  <form id ="rtd-search-form" class="wy-form" action="../search.html" method="get">
-    <input type="text" name="q" placeholder="Search docs" />
+  <form id ="rtd-search-form" class="wy-form" action="./search.html" method="get">
+    <input type="text" name="q" placeholder="Search docs" title="Type search term here" />
   </form>
 </div>
       </div>
@@ -50,22 +51,22 @@
           
             <li class="toctree-l1">
 		
-    <a class="" href="../index.html">Home</a>
+    <a class="" href="index.html">Home</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../install/index.html">Installation</a>
+    <a class="" href="install.html">Installation</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../usage/index.html">Usage</a>
+    <a class="" href="usage.html">Usage</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../schedule/index.html">Compression Scheduling</a>
+    <a class="" href="schedule.html">Compression Scheduling</a>
 	    </li>
           
             <li class="toctree-l1">
@@ -74,23 +75,23 @@
     <ul class="subnav">
                 <li class="">
                     
-    <a class="" href="../pruning/index.html">Pruning</a>
+    <a class="" href="pruning.html">Pruning</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../regularization/index.html">Regularization</a>
+    <a class="" href="regularization.html">Regularization</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../quantization/index.html">Quantization</a>
+    <a class="" href="quantization.html">Quantization</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../knowledge_distillation/index.html">Knowledge Distillation</a>
+    <a class="" href="knowledge_distillation.html">Knowledge Distillation</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../conditional_computation/index.html">Conditional Computation</a>
+    <a class="" href="conditional_computation.html">Conditional Computation</a>
                 </li>
     </ul>
 	    </li>
@@ -101,22 +102,22 @@
     <ul class="subnav">
                 <li class="">
                     
-    <a class="" href="../algo_pruning/index.html">Pruning</a>
+    <a class="" href="algo_pruning.html">Pruning</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../algo_quantization/index.html">Quantization</a>
+    <a class="" href="algo_quantization.html">Quantization</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../algo_earlyexit/index.html">Early Exit</a>
+    <a class="" href="algo_earlyexit.html">Early Exit</a>
                 </li>
     </ul>
 	    </li>
           
             <li class="toctree-l1 current">
 		
-    <a class="current" href="index.html">Model Zoo</a>
+    <a class="current" href="model_zoo.html">Model Zoo</a>
     <ul class="subnav">
             
     <li class="toctree-l2"><a href="#distiller-model-zoo">Distiller Model Zoo</a></li>
@@ -143,12 +144,12 @@
           
             <li class="toctree-l1">
 		
-    <a class="" href="../jupyter/index.html">Jupyter Notebooks</a>
+    <a class="" href="jupyter.html">Jupyter Notebooks</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../design/index.html">Design</a>
+    <a class="" href="design.html">Design</a>
 	    </li>
           
             <li class="toctree-l1">
@@ -157,11 +158,11 @@
     <ul class="subnav">
                 <li class="">
                     
-    <a class="" href="../tutorial-struct_pruning/index.html">Pruning Filters and Channels</a>
+    <a class="" href="tutorial-struct_pruning.html">Pruning Filters and Channels</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../tutorial-lang_model/index.html">Pruning a Language Model</a>
+    <a class="" href="tutorial-lang_model.html">Pruning a Language Model</a>
                 </li>
     </ul>
 	    </li>
@@ -176,7 +177,7 @@
       
       <nav class="wy-nav-top" role="navigation" aria-label="top navigation">
         <i data-toggle="wy-nav-top" class="fa fa-bars"></i>
-        <a href="../index.html">Neural Network Distiller</a>
+        <a href="index.html">Neural Network Distiller</a>
       </nav>
 
       
@@ -184,7 +185,7 @@
         <div class="rst-content">
           <div role="navigation" aria-label="breadcrumbs navigation">
   <ul class="wy-breadcrumbs">
-    <li><a href="../index.html">Docs</a> &raquo;</li>
+    <li><a href="index.html">Docs</a> &raquo;</li>
     
       
     
@@ -380,7 +381,7 @@ Distiller schedule: <code>distiller/examples/ssl/ssl_4D-removal_4L_training.yaml
 2. The data-loss of the regularized training follows the same shape as the un-regularized training (baseline), and eventually the two seem to merge.
 3. We see similar behavior in the validation Top1 and Top5 accuracy results, but the regularized training eventually performs better.
 4. In the top right corner we see the behavior of the regularization loss (<code>Reg Loss</code>), which actually increases for some time, until the data-loss has a sharp drop (after ~16K mini-batches), at which point the regularization loss also starts dropping.
-<center><img alt="ReseNet20 SSL" src="../imgs/resnet20_ssl.png" /></center><br></p>
+<center><img alt="ResNet20 SSL" src="imgs/resnet20_ssl.png" /></center><br></p>
 <p>This <strong>regularization</strong> yields 5 layers with zeroed weight tensors.  We load this model, remove the 5 layers, and start fine-tuning the weights.  This process of layer removal is specific to ResNet for CIFAR, which we altered by adding code to skip over layers during the forward path.  When you export to ONNX, the removed layers do not participate in the forward path, so they don't get instantiated.  </p>
 <p>We managed to remove 5 of the 16 3x3 convolution layers which dominate the computation time.  It's not bad, but we probably could have done better.</p>
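The forward-path alteration mentioned above can be pictured with a small sketch. The wrapper below is illustrative only, under the assumption that each residual block preserves its input shape; it is not the actual Distiller ResNet-CIFAR code.

```python
import torch.nn as nn

class SkippableBlock(nn.Module):
    """Wraps a residual block so the forward pass can bypass it once SSL
    regularization has driven its weights to zero (illustrative sketch)."""
    def __init__(self, block):
        super().__init__()
        self.block = block
        self.skip = False  # set True for the zeroed layers before export

    def forward(self, x):
        # A skipped block contributes nothing to the forward path, so it
        # does not get instantiated in the exported ONNX graph.
        return x if self.skip else self.block(x)
```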
 <h3 id="fine-tuning">Fine-tuning</h3>
@@ -482,10 +483,10 @@ Top1: 92.830 and Top5: 99.760</p>
   
     <div class="rst-footer-buttons" role="navigation" aria-label="footer navigation">
       
-        <a href="../jupyter/index.html" class="btn btn-neutral float-right" title="Jupyter Notebooks">Next <span class="icon icon-circle-arrow-right"></span></a>
+        <a href="jupyter.html" class="btn btn-neutral float-right" title="Jupyter Notebooks">Next <span class="icon icon-circle-arrow-right"></span></a>
       
       
-        <a href="../algo_earlyexit/index.html" class="btn btn-neutral" title="Early Exit"><span class="icon icon-circle-arrow-left"></span> Previous</a>
+        <a href="algo_earlyexit.html" class="btn btn-neutral" title="Early Exit"><span class="icon icon-circle-arrow-left"></span> Previous</a>
       
     </div>
   
@@ -511,18 +512,17 @@ Top1: 92.830 and Top5: 99.760</p>
     <span class="rst-current-version" data-toggle="rst-current-version">
       
       
-        <span><a href="../algo_earlyexit/index.html" style="color: #fcfcfc;">&laquo; Previous</a></span>
+        <span><a href="algo_earlyexit.html" style="color: #fcfcfc;">&laquo; Previous</a></span>
       
       
-        <span style="margin-left: 15px"><a href="../jupyter/index.html" style="color: #fcfcfc">Next &raquo;</a></span>
+        <span style="margin-left: 15px"><a href="jupyter.html" style="color: #fcfcfc">Next &raquo;</a></span>
       
     </span>
 </div>
-    <script>var base_url = '..';</script>
-    <script src="../js/theme.js"></script>
-      <script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML"></script>
-      <script src="../search/require.js"></script>
-      <script src="../search/search.js"></script>
+    <script>var base_url = '.';</script>
+    <script src="js/theme.js" defer></script>
+      <script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML" defer></script>
+      <script src="search/main.js" defer></script>
 
 </body>
 </html>
diff --git a/docs/pruning/index.html b/docs/pruning.html
similarity index 83%
rename from docs/pruning/index.html
rename to docs/pruning.html
index a9a30b16962e689704166f1216bff53a5cbefff4..3f69fae1344c2d5a330aa6059fb69a41df0a3f65 100644
--- a/docs/pruning/index.html
+++ b/docs/pruning.html
@@ -7,25 +7,26 @@
   <meta name="viewport" content="width=device-width, initial-scale=1.0">
   
   
-  <link rel="shortcut icon" href="../img/favicon.ico">
+  <link rel="shortcut icon" href="img/favicon.ico">
   <title>Pruning - Neural Network Distiller</title>
   <link href='https://fonts.googleapis.com/css?family=Lato:400,700|Roboto+Slab:400,700|Inconsolata:400,700' rel='stylesheet' type='text/css'>
 
-  <link rel="stylesheet" href="../css/theme.css" type="text/css" />
-  <link rel="stylesheet" href="../css/theme_extra.css" type="text/css" />
-  <link rel="stylesheet" href="../css/highlight.css">
-  <link href="../extra.css" rel="stylesheet">
+  <link rel="stylesheet" href="css/theme.css" type="text/css" />
+  <link rel="stylesheet" href="css/theme_extra.css" type="text/css" />
+  <link rel="stylesheet" href="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/github.min.css">
+  <link href="extra.css" rel="stylesheet">
   
   <script>
     // Current page data
     var mkdocs_page_name = "Pruning";
     var mkdocs_page_input_path = "pruning.md";
-    var mkdocs_page_url = "/pruning/index.html";
+    var mkdocs_page_url = null;
   </script>
   
-  <script src="../js/jquery-2.1.1.min.js"></script>
-  <script src="../js/modernizr-2.8.3.min.js"></script>
-  <script type="text/javascript" src="../js/highlight.pack.js"></script> 
+  <script src="js/jquery-2.1.1.min.js" defer></script>
+  <script src="js/modernizr-2.8.3.min.js" defer></script>
+  <script src="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/highlight.min.js"></script>
+  <script>hljs.initHighlightingOnLoad();</script> 
   
 </head>
 
@@ -36,10 +37,10 @@
     
     <nav data-toggle="wy-nav-shift" class="wy-nav-side stickynav">
       <div class="wy-side-nav-search">
-        <a href="../index.html" class="icon icon-home"> Neural Network Distiller</a>
+        <a href="index.html" class="icon icon-home"> Neural Network Distiller</a>
         <div role="search">
-  <form id ="rtd-search-form" class="wy-form" action="../search.html" method="get">
-    <input type="text" name="q" placeholder="Search docs" />
+  <form id ="rtd-search-form" class="wy-form" action="./search.html" method="get">
+    <input type="text" name="q" placeholder="Search docs" title="Type search term here" />
   </form>
 </div>
       </div>
@@ -50,22 +51,22 @@
           
             <li class="toctree-l1">
 		
-    <a class="" href="../index.html">Home</a>
+    <a class="" href="index.html">Home</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../install/index.html">Installation</a>
+    <a class="" href="install.html">Installation</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../usage/index.html">Usage</a>
+    <a class="" href="usage.html">Usage</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../schedule/index.html">Compression Scheduling</a>
+    <a class="" href="schedule.html">Compression Scheduling</a>
 	    </li>
           
             <li class="toctree-l1">
@@ -74,7 +75,7 @@
     <ul class="subnav">
                 <li class=" current">
                     
-    <a class="current" href="index.html">Pruning</a>
+    <a class="current" href="pruning.html">Pruning</a>
     <ul class="subnav">
             
     <li class="toctree-l3"><a href="#pruning">Pruning</a></li>
@@ -100,19 +101,19 @@
                 </li>
                 <li class="">
                     
-    <a class="" href="../regularization/index.html">Regularization</a>
+    <a class="" href="regularization.html">Regularization</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../quantization/index.html">Quantization</a>
+    <a class="" href="quantization.html">Quantization</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../knowledge_distillation/index.html">Knowledge Distillation</a>
+    <a class="" href="knowledge_distillation.html">Knowledge Distillation</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../conditional_computation/index.html">Conditional Computation</a>
+    <a class="" href="conditional_computation.html">Conditional Computation</a>
                 </li>
     </ul>
 	    </li>
@@ -123,32 +124,32 @@
     <ul class="subnav">
                 <li class="">
                     
-    <a class="" href="../algo_pruning/index.html">Pruning</a>
+    <a class="" href="algo_pruning.html">Pruning</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../algo_quantization/index.html">Quantization</a>
+    <a class="" href="algo_quantization.html">Quantization</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../algo_earlyexit/index.html">Early Exit</a>
+    <a class="" href="algo_earlyexit.html">Early Exit</a>
                 </li>
     </ul>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../model_zoo/index.html">Model Zoo</a>
+    <a class="" href="model_zoo.html">Model Zoo</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../jupyter/index.html">Jupyter Notebooks</a>
+    <a class="" href="jupyter.html">Jupyter Notebooks</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../design/index.html">Design</a>
+    <a class="" href="design.html">Design</a>
 	    </li>
           
             <li class="toctree-l1">
@@ -157,11 +158,11 @@
     <ul class="subnav">
                 <li class="">
                     
-    <a class="" href="../tutorial-struct_pruning/index.html">Pruning Filters and Channels</a>
+    <a class="" href="tutorial-struct_pruning.html">Pruning Filters and Channels</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../tutorial-lang_model/index.html">Pruning a Language Model</a>
+    <a class="" href="tutorial-lang_model.html">Pruning a Language Model</a>
                 </li>
     </ul>
 	    </li>
@@ -176,7 +177,7 @@
       
       <nav class="wy-nav-top" role="navigation" aria-label="top navigation">
         <i data-toggle="wy-nav-top" class="fa fa-bars"></i>
-        <a href="../index.html">Neural Network Distiller</a>
+        <a href="index.html">Neural Network Distiller</a>
       </nav>
 
       
@@ -184,7 +185,7 @@
         <div class="rst-content">
           <div role="navigation" aria-label="breadcrumbs navigation">
   <ul class="wy-breadcrumbs">
-    <li><a href="../index.html">Docs</a> &raquo;</li>
+    <li><a href="index.html">Docs</a> &raquo;</li>
     
       
         
@@ -251,7 +252,7 @@ Much as we can prune structures, we can also perform sensitivity analysis on str
 <p>The diagram below shows the results of running an element-wise sensitivity analysis on Alexnet, using Distiller's <code>perform_sensitivity_analysis</code> utility function.
 <br>
 As reported by Song Han, and exhibited in the diagram, in Alexnet the feature-detecting layers (convolution layers) are more sensitive to pruning, and their sensitivity drops the deeper they are.  The fully-connected layers are much less sensitive, which is great, because that's where most of the parameters are.</p>
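For readers who want to reproduce this analysis, a hedged usage sketch follows. The call signature, the <code>test_func</code> return convention, and the <code>sensitivities_to_csv</code> helper are assumptions about Distiller's API and may differ between versions.

```python
import numpy as np
from torchvision import models
import distiller

model = models.alexnet()  # the network analyzed in the diagram

def test_func(model):
    # Stand-in evaluation; a real test function runs the validation set
    # and is assumed to return (top1, top5, loss).
    return 0.0, 0.0, 0.0

weights = [name for name, _ in model.named_parameters() if name.endswith("weight")]
sensitivities = distiller.perform_sensitivity_analysis(
    model,
    net_params=weights,
    sparsities=np.arange(0.0, 0.95, 0.05),  # sparsity levels to sweep per tensor
    test_func=test_func,
    group="element",  # element-wise analysis, as in the diagram
)
distiller.sensitivities_to_csv(sensitivities, "sensitivity.csv")  # assumed helper
```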
-<p><center><img alt="Alexnet sensitivity graph" src="../imgs/alexnet_top5_sensitivity.png" /></center><br></p>
+<p><center><img alt="Alexnet sensitivity graph" src="imgs/alexnet_top5_sensitivity.png" /></center><br></p>
 <h2 id="references">References</h2>
 <p><div id="han-et-al-2015"></div> <strong>Song Han, Jeff Pool, John Tran, William J. Dally</strong>.
     <a href="https://arxiv.org/abs/1506.02626"><em>Learning both Weights and Connections for Efficient Neural Networks</em></a>,
@@ -270,10 +271,10 @@ As reported by Song Han, and exhibited in the diagram, in Alexnet the feature de
   
     <div class="rst-footer-buttons" role="navigation" aria-label="footer navigation">
       
-        <a href="../regularization/index.html" class="btn btn-neutral float-right" title="Regularization">Next <span class="icon icon-circle-arrow-right"></span></a>
+        <a href="regularization.html" class="btn btn-neutral float-right" title="Regularization">Next <span class="icon icon-circle-arrow-right"></span></a>
       
       
-        <a href="../schedule/index.html" class="btn btn-neutral" title="Compression Scheduling"><span class="icon icon-circle-arrow-left"></span> Previous</a>
+        <a href="schedule.html" class="btn btn-neutral" title="Compression Scheduling"><span class="icon icon-circle-arrow-left"></span> Previous</a>
       
     </div>
   
@@ -299,18 +300,17 @@ As reported by Song Han, and exhibited in the diagram, in Alexnet the feature de
     <span class="rst-current-version" data-toggle="rst-current-version">
       
       
-        <span><a href="../schedule/index.html" style="color: #fcfcfc;">&laquo; Previous</a></span>
+        <span><a href="schedule.html" style="color: #fcfcfc;">&laquo; Previous</a></span>
       
       
-        <span style="margin-left: 15px"><a href="../regularization/index.html" style="color: #fcfcfc">Next &raquo;</a></span>
+        <span style="margin-left: 15px"><a href="regularization.html" style="color: #fcfcfc">Next &raquo;</a></span>
       
     </span>
 </div>
-    <script>var base_url = '..';</script>
-    <script src="../js/theme.js"></script>
-      <script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML"></script>
-      <script src="../search/require.js"></script>
-      <script src="../search/search.js"></script>
+    <script>var base_url = '.';</script>
+    <script src="js/theme.js" defer></script>
+      <script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML" defer></script>
+      <script src="search/main.js" defer></script>
 
 </body>
 </html>
diff --git a/docs/quantization/index.html b/docs/quantization.html
similarity index 88%
rename from docs/quantization/index.html
rename to docs/quantization.html
index bcec5522752a810ae72e7986e78d9e6c65684eaa..795485f2bbb0f8c0e74436d8bf8d2a8a4c922999 100644
--- a/docs/quantization/index.html
+++ b/docs/quantization.html
@@ -7,25 +7,26 @@
   <meta name="viewport" content="width=device-width, initial-scale=1.0">
   
   
-  <link rel="shortcut icon" href="../img/favicon.ico">
+  <link rel="shortcut icon" href="img/favicon.ico">
   <title>Quantization - Neural Network Distiller</title>
   <link href='https://fonts.googleapis.com/css?family=Lato:400,700|Roboto+Slab:400,700|Inconsolata:400,700' rel='stylesheet' type='text/css'>
 
-  <link rel="stylesheet" href="../css/theme.css" type="text/css" />
-  <link rel="stylesheet" href="../css/theme_extra.css" type="text/css" />
-  <link rel="stylesheet" href="../css/highlight.css">
-  <link href="../extra.css" rel="stylesheet">
+  <link rel="stylesheet" href="css/theme.css" type="text/css" />
+  <link rel="stylesheet" href="css/theme_extra.css" type="text/css" />
+  <link rel="stylesheet" href="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/github.min.css">
+  <link href="extra.css" rel="stylesheet">
   
   <script>
     // Current page data
     var mkdocs_page_name = "Quantization";
     var mkdocs_page_input_path = "quantization.md";
-    var mkdocs_page_url = "/quantization/index.html";
+    var mkdocs_page_url = null;
   </script>
   
-  <script src="../js/jquery-2.1.1.min.js"></script>
-  <script src="../js/modernizr-2.8.3.min.js"></script>
-  <script type="text/javascript" src="../js/highlight.pack.js"></script> 
+  <script src="js/jquery-2.1.1.min.js" defer></script>
+  <script src="js/modernizr-2.8.3.min.js" defer></script>
+  <script src="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/highlight.min.js"></script>
+  <script>hljs.initHighlightingOnLoad();</script> 
   
 </head>
 
@@ -36,10 +37,10 @@
     
     <nav data-toggle="wy-nav-shift" class="wy-nav-side stickynav">
       <div class="wy-side-nav-search">
-        <a href="../index.html" class="icon icon-home"> Neural Network Distiller</a>
+        <a href="index.html" class="icon icon-home"> Neural Network Distiller</a>
         <div role="search">
-  <form id ="rtd-search-form" class="wy-form" action="../search.html" method="get">
-    <input type="text" name="q" placeholder="Search docs" />
+  <form id ="rtd-search-form" class="wy-form" action="./search.html" method="get">
+    <input type="text" name="q" placeholder="Search docs" title="Type search term here" />
   </form>
 </div>
       </div>
@@ -50,22 +51,22 @@
           
             <li class="toctree-l1">
 		
-    <a class="" href="../index.html">Home</a>
+    <a class="" href="index.html">Home</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../install/index.html">Installation</a>
+    <a class="" href="install.html">Installation</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../usage/index.html">Usage</a>
+    <a class="" href="usage.html">Usage</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../schedule/index.html">Compression Scheduling</a>
+    <a class="" href="schedule.html">Compression Scheduling</a>
 	    </li>
           
             <li class="toctree-l1">
@@ -74,15 +75,15 @@
     <ul class="subnav">
                 <li class="">
                     
-    <a class="" href="../pruning/index.html">Pruning</a>
+    <a class="" href="pruning.html">Pruning</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../regularization/index.html">Regularization</a>
+    <a class="" href="regularization.html">Regularization</a>
                 </li>
                 <li class=" current">
                     
-    <a class="current" href="index.html">Quantization</a>
+    <a class="current" href="quantization.html">Quantization</a>
     <ul class="subnav">
             
     <li class="toctree-l3"><a href="#quantization">Quantization</a></li>
@@ -108,11 +109,11 @@
                 </li>
                 <li class="">
                     
-    <a class="" href="../knowledge_distillation/index.html">Knowledge Distillation</a>
+    <a class="" href="knowledge_distillation.html">Knowledge Distillation</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../conditional_computation/index.html">Conditional Computation</a>
+    <a class="" href="conditional_computation.html">Conditional Computation</a>
                 </li>
     </ul>
 	    </li>
@@ -123,32 +124,32 @@
     <ul class="subnav">
                 <li class="">
                     
-    <a class="" href="../algo_pruning/index.html">Pruning</a>
+    <a class="" href="algo_pruning.html">Pruning</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../algo_quantization/index.html">Quantization</a>
+    <a class="" href="algo_quantization.html">Quantization</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../algo_earlyexit/index.html">Early Exit</a>
+    <a class="" href="algo_earlyexit.html">Early Exit</a>
                 </li>
     </ul>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../model_zoo/index.html">Model Zoo</a>
+    <a class="" href="model_zoo.html">Model Zoo</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../jupyter/index.html">Jupyter Notebooks</a>
+    <a class="" href="jupyter.html">Jupyter Notebooks</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../design/index.html">Design</a>
+    <a class="" href="design.html">Design</a>
 	    </li>
           
             <li class="toctree-l1">
@@ -157,11 +158,11 @@
     <ul class="subnav">
                 <li class="">
                     
-    <a class="" href="../tutorial-struct_pruning/index.html">Pruning Filters and Channels</a>
+    <a class="" href="tutorial-struct_pruning.html">Pruning Filters and Channels</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../tutorial-lang_model/index.html">Pruning a Language Model</a>
+    <a class="" href="tutorial-lang_model.html">Pruning a Language Model</a>
                 </li>
     </ul>
 	    </li>
@@ -176,7 +177,7 @@
       
       <nav class="wy-nav-top" role="navigation" aria-label="top navigation">
         <i data-toggle="wy-nav-top" class="fa fa-bars"></i>
-        <a href="../index.html">Neural Network Distiller</a>
+        <a href="index.html">Neural Network Distiller</a>
       </nav>
 
       
@@ -184,7 +185,7 @@
         <div class="rst-content">
           <div role="navigation" aria-label="breadcrumbs navigation">
   <ul class="wy-breadcrumbs">
-    <li><a href="../index.html">Docs</a> &raquo;</li>
+    <li><a href="index.html">Docs</a> &raquo;</li>
     
       
         
@@ -233,7 +234,7 @@ Additionally integer compute is <strong>faster</strong> than floating point comp
 <p>Note that very aggressive quantization can yield even more efficiency. If weights are binary (-1, 1) or ternary (-1, 0, 1 using 2-bits), then convolution and fully-connected layers can be computed with additions and subtractions only, removing multiplications completely. If activations are binary as well, then additions can also be removed, in favor of bitwise operations (<a href="#rastegari-et-al-2016">Rastegari et al., 2016</a>).</p>
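To see why binary weights remove multiplications, consider a dot product with weights restricted to {-1, +1}. This is a toy illustration of the arithmetic, not an optimized bitwise kernel:

```python
import torch

def binary_dot(x, w):
    # With weights restricted to {-1, +1}, the dot product x @ w reduces
    # to additions and subtractions: add x where w is +1, subtract x
    # where w is -1. No multiplications are needed.
    return torch.where(w > 0, x, -x).sum()

x = torch.randn(8)
w = torch.where(torch.randn(8) > 0, torch.tensor(1.0), torch.tensor(-1.0))
assert torch.allclose(binary_dot(x, w), x @ w)
```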
 <h2 id="integer-vs-fp32">Integer vs. FP32</h2>
 <p>There are two main attributes when discussing a numerical format. The first is <strong>dynamic range</strong>, which refers to the range of representable numbers. The second one is how many values can be represented within the dynamic range, which in turn determines the <strong>precision / resolution</strong> of the format (the distance between two numbers).<br />
-For all integer formats, the dynamic range is <script type="math/tex">[-2^{n-1} .. 2^{n-1}-1]</script>, where <script type="math/tex">n</script> is the number of bits. So for INT8 the range is <script type="math/tex">[-128 .. 127]</script>, and for INT4 it is <script type="math/tex">[-16 .. 15]</script> (we're limiting ourselves to signed integers for now). The number of representable values is <script type="math/tex">2^n</script>.
+For all integer formats, the dynamic range is <script type="math/tex">[-2^{n-1} .. 2^{n-1}-1]</script>, where <script type="math/tex">n</script> is the number of bits. So for INT8 the range is <script type="math/tex">[-128 .. 127]</script>, and for INT4 it is <script type="math/tex">[-8 .. 7]</script> (we're limiting ourselves to signed integers for now). The number of representable values is <script type="math/tex">2^n</script>.
 Contrast that with FP32, where the dynamic range is <script type="math/tex">\pm 3.4\ x\ 10^{38}</script>, and approximately <script type="math/tex">4.2\ x\ 10^9</script> values can be represented.<br />
 We can immediately see that FP32 is much more <strong>versatile</strong>, in that it is able to represent a wide range of distributions accurately. This is a nice property for deep learning models, where the distributions of weights and activations are usually very different (at least in dynamic range). In addition the dynamic range can differ between layers in the model.<br />
 In order to be able to represent these different distributions with an integer format, a <strong>scale factor</strong> is used to map the dynamic range of the tensor to the integer format range. But still we remain with the issue of having a significantly lower number of representable values, that is - much lower resolution.<br />
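A minimal sketch of this mapping, assuming per-tensor symmetric quantization (Distiller implements several richer schemes; this shows only the basic idea):

```python
import torch

def symmetric_quantize(x, num_bits=8):
    # Map the tensor's dynamic range onto the signed integer grid using
    # a single per-tensor scale factor.
    qmax = 2 ** (num_bits - 1) - 1           # e.g. 127 for INT8, 7 for INT4
    scale = x.abs().max() / qmax
    q = torch.round(x / scale).clamp(-qmax - 1, qmax)
    return q, scale                          # dequantize with q * scale

x = torch.randn(1000)
q, scale = symmetric_quantize(x, num_bits=4)
print((x - q * scale).abs().max())           # rounding error bounded by ~scale / 2
```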
@@ -257,7 +258,7 @@ As mentioned above, a scale factor is used to adapt the dynamic range of the ten
 <p>Naively quantizing an FP32 model to INT4 and lower usually incurs significant accuracy degradation. Many works have tried to mitigate this effect. They usually employ one or more of the following concepts in order to improve model accuracy:</p>
 <ul>
 <li><strong>Training / Re-Training</strong>: For INT4 and lower, training is required in order to obtain reasonable accuracy. The training loop is modified to take quantization into account. See details in the <a href="#quantization-aware-training">next section</a>.<br />
-<a href="#zhou-et-al-2016">Zhou S et al., 2016</a> have shown that bootstrapping the quantized model with trained FP32 weights leads to higher accuracy, as opposed to training from scratch. Other methods <em>require</em> a trained FP32 model, either as a starting point (<a href="#zhou-et-al-2017">Zhou A et al., 2017</a>), or as a teacher network in a knowledge distillation training setup (see <a href="../knowledge_distillation/index.html#combining">here</a>).</li>
+<a href="#zhou-et-al-2016">Zhou S et al., 2016</a> have shown that bootstrapping the quantized model with trained FP32 weights leads to higher accuracy, as opposed to training from scratch. Other methods <em>require</em> a trained FP32 model, either as a starting point (<a href="#zhou-et-al-2017">Zhou A et al., 2017</a>), or as a teacher network in a knowledge distillation training setup (see <a href="knowledge_distillation.html#combining">here</a>).</li>
 <li><strong>Replacing the activation function</strong>: The most common activation function in vision models is ReLU, which is unbounded. That is - its dynamic range is not limited for positive inputs. This is very problematic for INT4 and below due to the very limited range and resolution. Therefore, most methods replace ReLU with another function which is bounded. In some cases a clipping function with hard-coded values is used (<a href="#zhou-et-al-2016">Zhou S et al., 2016</a>, <a href="#mishra-et-al-2018">Mishra et al., 2018</a>). Another method learns the clipping value per layer, with better results (<a href="#choi-et-al-2018">Choi et al., 2018</a>); a minimal sketch of this idea appears after this list. Once the clipping value is set, the scale factor used for quantization is also set, and no further calibration steps are required (as opposed to INT8 methods described above).</li>
 <li><strong>Modifying network structure</strong>: <a href="#mishra-et-al-2018">Mishra et al., 2018</a> try to compensate for the loss of information due to quantization by using wider layers (more channels). <a href="#lin-et-al-2017">Lin et al., 2017</a> proposed a binary quantization method in which a single FP32 convolution is replaced with multiple binary convolutions, each scaled to represent a different "base", covering a larger dynamic range overall.</li>
 <li><strong>First and last layer</strong>: Many methods do not quantize the first and last layer of the model. It has been observed by <a href="#han-et-al-2015">Han et al., 2015</a> that the first convolutional layer is more sensitive to weights pruning, and some quantization works cite the same reason and show it empirically (<a href="#zhou-et-al-2016">Zhou S et al., 2016</a>, <a href="#choi-et-al-2018">Choi et al., 2018</a>). Some works also note that these layers usually constitute a very small portion of the overall computation within the model, further reducing the motivation to quantize them (<a href="#rastegari-et-al-2016">Rastegari et al., 2016</a>). Most methods keep the first and last layers at FP32. However, <a href="#choi-et-al-2018">Choi et al., 2018</a> showed that "conservative" quantization of these layers, e.g. to INT8, does not reduce accuracy.</li>
@@ -266,7 +267,7 @@ As mentioned above, a scale factor is used to adapt the dynamic range of the ten
 </ul>
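+<p>Following the activation-replacement item above, here is a minimal sketch of a bounded activation with a learned, per-layer clipping value, in the spirit of <a href="#choi-et-al-2018">Choi et al., 2018</a> (assuming PyTorch; the module name is illustrative and is not Distiller's implementation):</p>
+<pre><code class="python">import torch
+import torch.nn as nn
+
+class LearnedClippedReLU(nn.Module):
+    def __init__(self, init_clip=6.0):
+        super(LearnedClippedReLU, self).__init__()
+        # The clipping value is a trainable parameter, learned per layer
+        self.clip_val = nn.Parameter(torch.tensor(init_clip))
+
+    def forward(self, x):
+        # Equivalent to clip(x, 0, clip_val), written so that gradients also
+        # flow to clip_val; the activation's dynamic range (and hence its
+        # quantization scale factor) is now fixed by a learned value
+        return torch.clamp(x, min=0.0) - torch.clamp(x - self.clip_val, min=0.0)
+</code></pre>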
 <h2 id="quantization-aware-training">Quantization-Aware Training</h2>
 <p>As mentioned above, in order to minimize the loss of accuracy from "aggressive" quantization, many methods that target INT4 and lower (and in some cases for INT8 as well) involve training the model in a way that considers the quantization. This means training with quantization of weights and activations "baked" into the training procedure. The training graph usually looks like this:</p>
-<p><img alt="Quantization-Aware Training" src="../imgs/training_quant_flow.png" /></p>
+<p><img alt="Quantization-Aware Training" src="imgs/training_quant_flow.png" /></p>
 <p>A full-precision copy of the weights is maintained throughout the training process ("weights_fp" in the diagram). Its purpose is to accumulate the small changes from the gradients without loss of precision (note that the quantization of the weights is an integral part of the training graph, meaning that we back-propagate through it as well). Once the model is trained, only the quantized weights are used for inference.<br />
 In the diagram we show "layer N" as the conv + batch-norm + activation combination, but the same applies to fully-connected layers, element-wise operations, etc. During training, the operations within "layer N" can still run in full precision, with the "quantize" operations in the boundaries ensuring discrete-valued weights and activations. This is sometimes called "simulated quantization".  </p>
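+<p>Concretely, one training step of this "simulated quantization" scheme might look like the sketch below (illustrative only; it reuses the <code>linear_quantize_symmetric</code> sketch from above, and assumes <code>weights_fp</code>, <code>x</code>, <code>target</code>, <code>criterion</code> and <code>optimizer</code> are defined elsewhere):</p>
+<pre><code class="python">import torch.nn.functional as F
+
+w_q, scale = linear_quantize_symmetric(weights_fp, num_bits=4)   # "quantize" op
+w_hat = w_q / scale                    # discrete-valued weights, stored as FP32
+out = F.relu(F.conv2d(x, w_hat))       # "layer N" runs on the quantized copy
+loss = criterion(out, target)
+loss.backward()    # as written, round() zeroes the gradient; the straight-through
+                   # estimator described in the next section is used to fix this
+optimizer.step()   # small updates accumulate in weights_fp at full precision
+</code></pre>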
 <h3 id="straight-through-estimator">Straight-Through Estimator</h3>
@@ -332,10 +333,10 @@ In the diagram we show "layer N" as the conv + batch-norm + activation combinati
   
     <div class="rst-footer-buttons" role="navigation" aria-label="footer navigation">
       
-        <a href="../knowledge_distillation/index.html" class="btn btn-neutral float-right" title="Knowledge Distillation">Next <span class="icon icon-circle-arrow-right"></span></a>
+        <a href="knowledge_distillation.html" class="btn btn-neutral float-right" title="Knowledge Distillation">Next <span class="icon icon-circle-arrow-right"></span></a>
       
       
-        <a href="../regularization/index.html" class="btn btn-neutral" title="Regularization"><span class="icon icon-circle-arrow-left"></span> Previous</a>
+        <a href="regularization.html" class="btn btn-neutral" title="Regularization"><span class="icon icon-circle-arrow-left"></span> Previous</a>
       
     </div>
   
@@ -361,18 +362,17 @@ In the diagram we show "layer N" as the conv + batch-norm + activation combinati
     <span class="rst-current-version" data-toggle="rst-current-version">
       
       
-        <span><a href="../regularization/index.html" style="color: #fcfcfc;">&laquo; Previous</a></span>
+        <span><a href="regularization.html" style="color: #fcfcfc;">&laquo; Previous</a></span>
       
       
-        <span style="margin-left: 15px"><a href="../knowledge_distillation/index.html" style="color: #fcfcfc">Next &raquo;</a></span>
+        <span style="margin-left: 15px"><a href="knowledge_distillation.html" style="color: #fcfcfc">Next &raquo;</a></span>
       
     </span>
 </div>
-    <script>var base_url = '..';</script>
-    <script src="../js/theme.js"></script>
-      <script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML"></script>
-      <script src="../search/require.js"></script>
-      <script src="../search/search.js"></script>
+    <script>var base_url = '.';</script>
+    <script src="js/theme.js" defer></script>
+      <script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML" defer></script>
+      <script src="search/main.js" defer></script>
 
 </body>
 </html>
diff --git a/docs/regularization/index.html b/docs/regularization.html
similarity index 81%
rename from docs/regularization/index.html
rename to docs/regularization.html
index 319d868c863e10588961ddb1795884fe37197699..ab472f5c5fddb77edaf9e93ec2853396ec7a8df2 100644
--- a/docs/regularization/index.html
+++ b/docs/regularization.html
@@ -7,25 +7,26 @@
   <meta name="viewport" content="width=device-width, initial-scale=1.0">
   
   
-  <link rel="shortcut icon" href="../img/favicon.ico">
+  <link rel="shortcut icon" href="img/favicon.ico">
   <title>Regularization - Neural Network Distiller</title>
   <link href='https://fonts.googleapis.com/css?family=Lato:400,700|Roboto+Slab:400,700|Inconsolata:400,700' rel='stylesheet' type='text/css'>
 
-  <link rel="stylesheet" href="../css/theme.css" type="text/css" />
-  <link rel="stylesheet" href="../css/theme_extra.css" type="text/css" />
-  <link rel="stylesheet" href="../css/highlight.css">
-  <link href="../extra.css" rel="stylesheet">
+  <link rel="stylesheet" href="css/theme.css" type="text/css" />
+  <link rel="stylesheet" href="css/theme_extra.css" type="text/css" />
+  <link rel="stylesheet" href="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/github.min.css">
+  <link href="extra.css" rel="stylesheet">
   
   <script>
     // Current page data
     var mkdocs_page_name = "Regularization";
     var mkdocs_page_input_path = "regularization.md";
-    var mkdocs_page_url = "/regularization/index.html";
+    var mkdocs_page_url = null;
   </script>
   
-  <script src="../js/jquery-2.1.1.min.js"></script>
-  <script src="../js/modernizr-2.8.3.min.js"></script>
-  <script type="text/javascript" src="../js/highlight.pack.js"></script> 
+  <script src="js/jquery-2.1.1.min.js" defer></script>
+  <script src="js/modernizr-2.8.3.min.js" defer></script>
+  <script src="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/highlight.min.js"></script>
+  <script>hljs.initHighlightingOnLoad();</script> 
   
 </head>
 
@@ -36,10 +37,10 @@
     
     <nav data-toggle="wy-nav-shift" class="wy-nav-side stickynav">
       <div class="wy-side-nav-search">
-        <a href="../index.html" class="icon icon-home"> Neural Network Distiller</a>
+        <a href="index.html" class="icon icon-home"> Neural Network Distiller</a>
         <div role="search">
-  <form id ="rtd-search-form" class="wy-form" action="../search.html" method="get">
-    <input type="text" name="q" placeholder="Search docs" />
+  <form id ="rtd-search-form" class="wy-form" action="./search.html" method="get">
+    <input type="text" name="q" placeholder="Search docs" title="Type search term here" />
   </form>
 </div>
       </div>
@@ -50,22 +51,22 @@
           
             <li class="toctree-l1">
 		
-    <a class="" href="../index.html">Home</a>
+    <a class="" href="index.html">Home</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../install/index.html">Installation</a>
+    <a class="" href="install.html">Installation</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../usage/index.html">Usage</a>
+    <a class="" href="usage.html">Usage</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../schedule/index.html">Compression Scheduling</a>
+    <a class="" href="schedule.html">Compression Scheduling</a>
 	    </li>
           
             <li class="toctree-l1">
@@ -74,11 +75,11 @@
     <ul class="subnav">
                 <li class="">
                     
-    <a class="" href="../pruning/index.html">Pruning</a>
+    <a class="" href="pruning.html">Pruning</a>
                 </li>
                 <li class=" current">
                     
-    <a class="current" href="index.html">Regularization</a>
+    <a class="current" href="regularization.html">Regularization</a>
     <ul class="subnav">
             
     <li class="toctree-l3"><a href="#regularization">Regularization</a></li>
@@ -98,15 +99,15 @@
                 </li>
                 <li class="">
                     
-    <a class="" href="../quantization/index.html">Quantization</a>
+    <a class="" href="quantization.html">Quantization</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../knowledge_distillation/index.html">Knowledge Distillation</a>
+    <a class="" href="knowledge_distillation.html">Knowledge Distillation</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../conditional_computation/index.html">Conditional Computation</a>
+    <a class="" href="conditional_computation.html">Conditional Computation</a>
                 </li>
     </ul>
 	    </li>
@@ -117,32 +118,32 @@
     <ul class="subnav">
                 <li class="">
                     
-    <a class="" href="../algo_pruning/index.html">Pruning</a>
+    <a class="" href="algo_pruning.html">Pruning</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../algo_quantization/index.html">Quantization</a>
+    <a class="" href="algo_quantization.html">Quantization</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../algo_earlyexit/index.html">Early Exit</a>
+    <a class="" href="algo_earlyexit.html">Early Exit</a>
                 </li>
     </ul>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../model_zoo/index.html">Model Zoo</a>
+    <a class="" href="model_zoo.html">Model Zoo</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../jupyter/index.html">Jupyter Notebooks</a>
+    <a class="" href="jupyter.html">Jupyter Notebooks</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../design/index.html">Design</a>
+    <a class="" href="design.html">Design</a>
 	    </li>
           
             <li class="toctree-l1">
@@ -151,11 +152,11 @@
     <ul class="subnav">
                 <li class="">
                     
-    <a class="" href="../tutorial-struct_pruning/index.html">Pruning Filters and Channels</a>
+    <a class="" href="tutorial-struct_pruning.html">Pruning Filters and Channels</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../tutorial-lang_model/index.html">Pruning a Language Model</a>
+    <a class="" href="tutorial-lang_model.html">Pruning a Language Model</a>
                 </li>
     </ul>
 	    </li>
@@ -170,7 +171,7 @@
       
       <nav class="wy-nav-top" role="navigation" aria-label="top navigation">
         <i data-toggle="wy-nav-top" class="fa fa-bars"></i>
-        <a href="../index.html">Neural Network Distiller</a>
+        <a href="index.html">Neural Network Distiller</a>
       </nav>
 
       
@@ -178,7 +179,7 @@
         <div class="rst-content">
           <div role="navigation" aria-label="breadcrumbs navigation">
   <ul class="wy-breadcrumbs">
-    <li><a href="../index.html">Docs</a> &raquo;</li>
+    <li><a href="index.html">Docs</a> &raquo;</li>
     
       
         
@@ -302,10 +303,10 @@ it can be beneficial to improve inference speed.</p>
   
     <div class="rst-footer-buttons" role="navigation" aria-label="footer navigation">
       
-        <a href="../quantization/index.html" class="btn btn-neutral float-right" title="Quantization">Next <span class="icon icon-circle-arrow-right"></span></a>
+        <a href="quantization.html" class="btn btn-neutral float-right" title="Quantization">Next <span class="icon icon-circle-arrow-right"></span></a>
       
       
-        <a href="../pruning/index.html" class="btn btn-neutral" title="Pruning"><span class="icon icon-circle-arrow-left"></span> Previous</a>
+        <a href="pruning.html" class="btn btn-neutral" title="Pruning"><span class="icon icon-circle-arrow-left"></span> Previous</a>
       
     </div>
   
@@ -331,18 +332,17 @@ it can be beneficial to improve inference speed.</p>
     <span class="rst-current-version" data-toggle="rst-current-version">
       
       
-        <span><a href="../pruning/index.html" style="color: #fcfcfc;">&laquo; Previous</a></span>
+        <span><a href="pruning.html" style="color: #fcfcfc;">&laquo; Previous</a></span>
       
       
-        <span style="margin-left: 15px"><a href="../quantization/index.html" style="color: #fcfcfc">Next &raquo;</a></span>
+        <span style="margin-left: 15px"><a href="quantization.html" style="color: #fcfcfc">Next &raquo;</a></span>
       
     </span>
 </div>
-    <script>var base_url = '..';</script>
-    <script src="../js/theme.js"></script>
-      <script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML"></script>
-      <script src="../search/require.js"></script>
-      <script src="../search/search.js"></script>
+    <script>var base_url = '.';</script>
+    <script src="js/theme.js" defer></script>
+      <script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML" defer></script>
+      <script src="search/main.js" defer></script>
 
 </body>
 </html>
diff --git a/docs/schedule/index.html b/docs/schedule.html
similarity index 87%
rename from docs/schedule/index.html
rename to docs/schedule.html
index 5789fd21059ec36a301e5cdf46761590236b6841..1b00a67b3bd7515203ce313f095a38e75625e698 100644
--- a/docs/schedule/index.html
+++ b/docs/schedule.html
@@ -7,25 +7,26 @@
   <meta name="viewport" content="width=device-width, initial-scale=1.0">
   
   
-  <link rel="shortcut icon" href="../img/favicon.ico">
+  <link rel="shortcut icon" href="img/favicon.ico">
   <title>Compression Scheduling - Neural Network Distiller</title>
   <link href='https://fonts.googleapis.com/css?family=Lato:400,700|Roboto+Slab:400,700|Inconsolata:400,700' rel='stylesheet' type='text/css'>
 
-  <link rel="stylesheet" href="../css/theme.css" type="text/css" />
-  <link rel="stylesheet" href="../css/theme_extra.css" type="text/css" />
-  <link rel="stylesheet" href="../css/highlight.css">
-  <link href="../extra.css" rel="stylesheet">
+  <link rel="stylesheet" href="css/theme.css" type="text/css" />
+  <link rel="stylesheet" href="css/theme_extra.css" type="text/css" />
+  <link rel="stylesheet" href="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/github.min.css">
+  <link href="extra.css" rel="stylesheet">
   
   <script>
     // Current page data
     var mkdocs_page_name = "Compression Scheduling";
     var mkdocs_page_input_path = "schedule.md";
-    var mkdocs_page_url = "/schedule/index.html";
+    var mkdocs_page_url = null;
   </script>
   
-  <script src="../js/jquery-2.1.1.min.js"></script>
-  <script src="../js/modernizr-2.8.3.min.js"></script>
-  <script type="text/javascript" src="../js/highlight.pack.js"></script> 
+  <script src="js/jquery-2.1.1.min.js" defer></script>
+  <script src="js/modernizr-2.8.3.min.js" defer></script>
+  <script src="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/highlight.min.js"></script>
+  <script>hljs.initHighlightingOnLoad();</script> 
   
 </head>
 
@@ -36,10 +37,10 @@
     
     <nav data-toggle="wy-nav-shift" class="wy-nav-side stickynav">
       <div class="wy-side-nav-search">
-        <a href="../index.html" class="icon icon-home"> Neural Network Distiller</a>
+        <a href="index.html" class="icon icon-home"> Neural Network Distiller</a>
         <div role="search">
-  <form id ="rtd-search-form" class="wy-form" action="../search.html" method="get">
-    <input type="text" name="q" placeholder="Search docs" />
+  <form id ="rtd-search-form" class="wy-form" action="./search.html" method="get">
+    <input type="text" name="q" placeholder="Search docs" title="Type search term here" />
   </form>
 </div>
       </div>
@@ -50,22 +51,22 @@
           
             <li class="toctree-l1">
 		
-    <a class="" href="../index.html">Home</a>
+    <a class="" href="index.html">Home</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../install/index.html">Installation</a>
+    <a class="" href="install.html">Installation</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../usage/index.html">Usage</a>
+    <a class="" href="usage.html">Usage</a>
 	    </li>
           
             <li class="toctree-l1 current">
 		
-    <a class="current" href="index.html">Compression Scheduling</a>
+    <a class="current" href="schedule.html">Compression Scheduling</a>
     <ul class="subnav">
             
     <li class="toctree-l2"><a href="#compression-scheduler">Compression scheduler</a></li>
@@ -98,23 +99,23 @@
     <ul class="subnav">
                 <li class="">
                     
-    <a class="" href="../pruning/index.html">Pruning</a>
+    <a class="" href="pruning.html">Pruning</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../regularization/index.html">Regularization</a>
+    <a class="" href="regularization.html">Regularization</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../quantization/index.html">Quantization</a>
+    <a class="" href="quantization.html">Quantization</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../knowledge_distillation/index.html">Knowledge Distillation</a>
+    <a class="" href="knowledge_distillation.html">Knowledge Distillation</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../conditional_computation/index.html">Conditional Computation</a>
+    <a class="" href="conditional_computation.html">Conditional Computation</a>
                 </li>
     </ul>
 	    </li>
@@ -125,32 +126,32 @@
     <ul class="subnav">
                 <li class="">
                     
-    <a class="" href="../algo_pruning/index.html">Pruning</a>
+    <a class="" href="algo_pruning.html">Pruning</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../algo_quantization/index.html">Quantization</a>
+    <a class="" href="algo_quantization.html">Quantization</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../algo_earlyexit/index.html">Early Exit</a>
+    <a class="" href="algo_earlyexit.html">Early Exit</a>
                 </li>
     </ul>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../model_zoo/index.html">Model Zoo</a>
+    <a class="" href="model_zoo.html">Model Zoo</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../jupyter/index.html">Jupyter Notebooks</a>
+    <a class="" href="jupyter.html">Jupyter Notebooks</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../design/index.html">Design</a>
+    <a class="" href="design.html">Design</a>
 	    </li>
           
             <li class="toctree-l1">
@@ -159,11 +160,11 @@
     <ul class="subnav">
                 <li class="">
                     
-    <a class="" href="../tutorial-struct_pruning/index.html">Pruning Filters and Channels</a>
+    <a class="" href="tutorial-struct_pruning.html">Pruning Filters and Channels</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../tutorial-lang_model/index.html">Pruning a Language Model</a>
+    <a class="" href="tutorial-lang_model.html">Pruning a Language Model</a>
                 </li>
     </ul>
 	    </li>
@@ -178,7 +179,7 @@
       
       <nav class="wy-nav-top" role="navigation" aria-label="top navigation">
         <i data-toggle="wy-nav-top" class="fa fa-bars"></i>
-        <a href="../index.html">Neural Network Distiller</a>
+        <a href="index.html">Neural Network Distiller</a>
       </nav>
 
       
@@ -186,7 +187,7 @@
         <div class="rst-content">
           <div role="navigation" aria-label="breadcrumbs navigation">
   <ul class="wy-breadcrumbs">
-    <li><a href="../index.html">Docs</a> &raquo;</li>
+    <li><a href="index.html">Docs</a> &raquo;</li>
     
       
     
@@ -432,7 +433,7 @@ policies:
 </code></pre>
 
 <h2 id="quantization-aware-training">Quantization-Aware Training</h2>
-<p>Similarly to pruners and regularizers, specifying a quantizer in the scheduler YAML follows the constructor arguments of the <code>Quantizer</code> class (see details <a href="../design/index.html#quantization">here</a>). <strong>Note</strong> that only a single quantizer instance may be defined per YAML.</p>
+<p>Similarly to pruners and regularizers, specifying a quantizer in the scheduler YAML follows the constructor arguments of the <code>Quantizer</code> class (see details <a href="design.html#quantization">here</a>). <strong>Note</strong> that only a single quantizer instance may be defined per YAML.</p>
 <p>Let's see an example:</p>
 <pre><code>quantizers:
   dorefa_quantizer:
@@ -494,9 +495,9 @@ policies:
       frequency: 1
 </code></pre>
 
-<p><strong>Important Note</strong>: As mentioned <a href="../design/index.html#quantization-aware-training">here</a>, since the quantizer modifies the model's parameters (assuming training with quantization in the loop is used), the call to <code>prepare_model()</code> must be performed before an optimizer is called. Therefore, currently, the starting epoch for a quantization policy must be 0, otherwise the quantization process will not work as expected. If one wishes to do a "warm-startup" (or "boot-strapping"), training for a few epochs with full precision and only then starting to quantize, the only way to do this right now is to execute a separate run to generate the boot-strapped weights, and execute a second which will resume the checkpoint with the boot-strapped weights.</p>
+<p><strong>Important Note</strong>: As mentioned <a href="design.html#quantization-aware-training">here</a>, since the quantizer modifies the model's parameters (assuming training with quantization in the loop is used), the call to <code>prepare_model()</code> must be performed before an optimizer is created. Therefore, currently, the starting epoch for a quantization policy must be 0, otherwise the quantization process will not work as expected. If one wishes to do a "warm-startup" (or "boot-strapping"), training for a few epochs with full precision and only then starting to quantize, the only way to do this right now is to execute a separate run to generate the boot-strapped weights, and then execute a second run that resumes from the checkpoint with the boot-strapped weights.</p>
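+<p>In code, the required ordering looks roughly like this (a sketch only; the quantizer class matches the DoReFa example above, but the exact constructor arguments are assumptions and may differ between Distiller versions):</p>
+<pre><code class="python">quantizer = distiller.quantization.DorefaQuantizer(model, bits_activations=8, bits_weights=4)
+quantizer.prepare_model()  # must run first, since it modifies the model's parameters
+optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # create the optimizer only afterwards
+</code></pre>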
 <h2 id="post-training-quantization">Post-Training Quantization</h2>
-<p>Post-training quantization differs from the other techniques described here. Since it is not executed during training, it does not require any Policies nor a Scheduler. Currently, the only method implemented for post-training quantization is <a href="../algo_quantization/index.html#range-based-linear-quantization">range-based linear quantization</a>. Quantizing a model using this method, requires adding 2 lines of code:</p>
+<p>Post-training quantization differs from the other techniques described here. Since it is not executed during training, it does not require any Policies or a Scheduler. Currently, the only method implemented for post-training quantization is <a href="algo_quantization.html#range-based-linear-quantization">range-based linear quantization</a>. Quantizing a model using this method requires adding two lines of code:</p>
 <pre><code class="python">quantizer = distiller.quantization.PostTrainLinearQuantizer(model, &lt;quantizer arguments&gt;)
 quantizer.prepare_model()
 # Execute evaluation on model as usual
@@ -579,7 +580,7 @@ args = parser.parse_args()
 
 <p>The generated YAML stats file can then be provided using the <code>--qe-stats-file</code> argument. An example of a generated stats file can be found <a href="https://github.com/NervanaSystems/distiller/blob/master/examples/quantization/post_training_quant/stats/resnet18_quant_stats.yaml">here</a>.</p>
 <h2 id="knowledge-distillation">Knowledge Distillation</h2>
-<p>Knowledge distillation (see <a href="../knowledge_distillation/index.html">here</a>) is also implemented as a <code>Policy</code>, which should be added to the scheduler. However, with the current implementation, it cannot be defined within the YAML file like the rest of the policies described above.</p>
+<p>Knowledge distillation (see <a href="knowledge_distillation.html">here</a>) is also implemented as a <code>Policy</code>, which should be added to the scheduler. However, with the current implementation, it cannot be defined within the YAML file like the rest of the policies described above.</p>
 <p>To make the integration of this method into applications a bit easier, a helper function can be used that will add a set of command-line arguments related to knowledge distillation:</p>
 <pre><code>import argparse
 import distiller
@@ -644,10 +645,10 @@ else:
   
     <div class="rst-footer-buttons" role="navigation" aria-label="footer navigation">
       
-        <a href="../pruning/index.html" class="btn btn-neutral float-right" title="Pruning">Next <span class="icon icon-circle-arrow-right"></span></a>
+        <a href="pruning.html" class="btn btn-neutral float-right" title="Pruning">Next <span class="icon icon-circle-arrow-right"></span></a>
       
       
-        <a href="../usage/index.html" class="btn btn-neutral" title="Usage"><span class="icon icon-circle-arrow-left"></span> Previous</a>
+        <a href="usage.html" class="btn btn-neutral" title="Usage"><span class="icon icon-circle-arrow-left"></span> Previous</a>
       
     </div>
   
@@ -673,18 +674,17 @@ else:
     <span class="rst-current-version" data-toggle="rst-current-version">
       
       
-        <span><a href="../usage/index.html" style="color: #fcfcfc;">&laquo; Previous</a></span>
+        <span><a href="usage.html" style="color: #fcfcfc;">&laquo; Previous</a></span>
       
       
-        <span style="margin-left: 15px"><a href="../pruning/index.html" style="color: #fcfcfc">Next &raquo;</a></span>
+        <span style="margin-left: 15px"><a href="pruning.html" style="color: #fcfcfc">Next &raquo;</a></span>
       
     </span>
 </div>
-    <script>var base_url = '..';</script>
-    <script src="../js/theme.js"></script>
-      <script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML"></script>
-      <script src="../search/require.js"></script>
-      <script src="../search/search.js"></script>
+    <script>var base_url = '.';</script>
+    <script src="js/theme.js" defer></script>
+      <script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML" defer></script>
+      <script src="search/main.js" defer></script>
 
 </body>
 </html>
diff --git a/docs/search.html b/docs/search.html
index 6f0ca52182ce165de78b00668d1d277ab8b684e7..cbd1f82955205016b44fef03927538f8bb8fea14 100644
--- a/docs/search.html
+++ b/docs/search.html
@@ -13,12 +13,13 @@
 
   <link rel="stylesheet" href="./css/theme.css" type="text/css" />
   <link rel="stylesheet" href="./css/theme_extra.css" type="text/css" />
-  <link rel="stylesheet" href="./css/highlight.css">
+  <link rel="stylesheet" href="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/github.min.css">
   <link href="./extra.css" rel="stylesheet">
   
-  <script src="./js/jquery-2.1.1.min.js"></script>
-  <script src="./js/modernizr-2.8.3.min.js"></script>
-  <script type="text/javascript" src="./js/highlight.pack.js"></script> 
+  <script src="./js/jquery-2.1.1.min.js" defer></script>
+  <script src="./js/modernizr-2.8.3.min.js" defer></script>
+  <script src="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/highlight.min.js"></script>
+  <script>hljs.initHighlightingOnLoad();</script> 
   
 </head>
 
@@ -29,10 +30,10 @@
     
     <nav data-toggle="wy-nav-shift" class="wy-nav-side stickynav">
       <div class="wy-side-nav-search">
-        <a href="index.html" class="icon icon-home"> Neural Network Distiller</a>
+        <a href="./index.html" class="icon icon-home"> Neural Network Distiller</a>
         <div role="search">
   <form id ="rtd-search-form" class="wy-form" action="./search.html" method="get">
-    <input type="text" name="q" placeholder="Search docs" />
+    <input type="text" name="q" placeholder="Search docs" title="Type search term here" />
   </form>
 </div>
       </div>
@@ -43,22 +44,22 @@
           
             <li class="toctree-l1">
 		
-    <a class="" href="index.html">Home</a>
+    <a class="" href="./index.html">Home</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="install/index.html">Installation</a>
+    <a class="" href="./install.html">Installation</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="usage/index.html">Usage</a>
+    <a class="" href="./usage.html">Usage</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="schedule/index.html">Compression Scheduling</a>
+    <a class="" href="./schedule.html">Compression Scheduling</a>
 	    </li>
           
             <li class="toctree-l1">
@@ -67,23 +68,23 @@
     <ul class="subnav">
                 <li class="">
                     
-    <a class="" href="pruning/index.html">Pruning</a>
+    <a class="" href="./pruning.html">Pruning</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="regularization/index.html">Regularization</a>
+    <a class="" href="./regularization.html">Regularization</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="quantization/index.html">Quantization</a>
+    <a class="" href="./quantization.html">Quantization</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="knowledge_distillation/index.html">Knowledge Distillation</a>
+    <a class="" href="./knowledge_distillation.html">Knowledge Distillation</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="conditional_computation/index.html">Conditional Computation</a>
+    <a class="" href="./conditional_computation.html">Conditional Computation</a>
                 </li>
     </ul>
 	    </li>
@@ -94,32 +95,32 @@
     <ul class="subnav">
                 <li class="">
                     
-    <a class="" href="algo_pruning/index.html">Pruning</a>
+    <a class="" href="./algo_pruning.html">Pruning</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="algo_quantization/index.html">Quantization</a>
+    <a class="" href="./algo_quantization.html">Quantization</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="algo_earlyexit/index.html">Early Exit</a>
+    <a class="" href="./algo_earlyexit.html">Early Exit</a>
                 </li>
     </ul>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="model_zoo/index.html">Model Zoo</a>
+    <a class="" href="./model_zoo.html">Model Zoo</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="jupyter/index.html">Jupyter Notebooks</a>
+    <a class="" href="./jupyter.html">Jupyter Notebooks</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="design/index.html">Design</a>
+    <a class="" href="./design.html">Design</a>
 	    </li>
           
             <li class="toctree-l1">
@@ -128,11 +129,11 @@
     <ul class="subnav">
                 <li class="">
                     
-    <a class="" href="tutorial-struct_pruning/index.html">Pruning Filters and Channels</a>
+    <a class="" href="./tutorial-struct_pruning.html">Pruning Filters and Channels</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="tutorial-lang_model/index.html">Pruning a Language Model</a>
+    <a class="" href="./tutorial-lang_model.html">Pruning a Language Model</a>
                 </li>
     </ul>
 	    </li>
@@ -147,7 +148,7 @@
       
       <nav class="wy-nav-top" role="navigation" aria-label="top navigation">
         <i data-toggle="wy-nav-top" class="fa fa-bars"></i>
-        <a href="index.html">Neural Network Distiller</a>
+        <a href="./index.html">Neural Network Distiller</a>
       </nav>
 
       
@@ -155,7 +156,7 @@
         <div class="rst-content">
           <div role="navigation" aria-label="breadcrumbs navigation">
   <ul class="wy-breadcrumbs">
-    <li><a href="index.html">Docs</a> &raquo;</li>
+    <li><a href="./index.html">Docs</a> &raquo;</li>
     
     
     <li class="wy-breadcrumbs-aside">
@@ -172,7 +173,7 @@
 
   <form id="content_search" action="search.html">
     <span role="status" aria-live="polite" class="ui-helper-hidden-accessible"></span>
-    <input name="q" id="mkdocs-search-query" type="text" class="search_input search-query ui-autocomplete-input" placeholder="Search the Docs" autocomplete="off" autofocus>
+    <input name="q" id="mkdocs-search-query" type="text" class="search_input search-query ui-autocomplete-input" placeholder="Search the Docs" autocomplete="off" autofocus title="Type search term here">
   </form>
 
   <div id="mkdocs-search-results" class="search-results">
@@ -210,10 +211,9 @@
     </span>
 </div>
     <script>var base_url = '.';</script>
-    <script src="./js/theme.js"></script>
-      <script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML"></script>
-      <script src="./search/require.js"></script>
-      <script src="./search/search.js"></script>
+    <script src="./js/theme.js" defer></script>
+      <script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML" defer></script>
+      <script src="./search/main.js" defer></script>
 
 </body>
 </html>
diff --git a/docs/search/lunr.js b/docs/search/lunr.js
new file mode 100644
index 0000000000000000000000000000000000000000..c218cc8979980f691a88ca2fbf52bccaec40a46f
--- /dev/null
+++ b/docs/search/lunr.js
@@ -0,0 +1,2986 @@
+/**
+ * lunr - http://lunrjs.com - A bit like Solr, but much smaller and not as bright - 2.1.6
+ * Copyright (C) 2018 Oliver Nightingale
+ * @license MIT
+ */
+
+;(function(){
+
+/**
+ * A convenience function for configuring and constructing
+ * a new lunr Index.
+ *
+ * A lunr.Builder instance is created and the pipeline setup
+ * with a trimmer, stop word filter and stemmer.
+ *
+ * This builder object is yielded to the configuration function
+ * that is passed as a parameter, allowing the list of fields
+ * and other builder parameters to be customised.
+ *
+ * All documents _must_ be added within the passed config function.
+ *
+ * @example
+ * var idx = lunr(function () {
+ *   this.field('title')
+ *   this.field('body')
+ *   this.ref('id')
+ *
+ *   documents.forEach(function (doc) {
+ *     this.add(doc)
+ *   }, this)
+ * })
+ *
+ * @see {@link lunr.Builder}
+ * @see {@link lunr.Pipeline}
+ * @see {@link lunr.trimmer}
+ * @see {@link lunr.stopWordFilter}
+ * @see {@link lunr.stemmer}
+ * @namespace {function} lunr
+ */
+var lunr = function (config) {
+  var builder = new lunr.Builder
+
+  builder.pipeline.add(
+    lunr.trimmer,
+    lunr.stopWordFilter,
+    lunr.stemmer
+  )
+
+  builder.searchPipeline.add(
+    lunr.stemmer
+  )
+
+  config.call(builder, builder)
+  return builder.build()
+}
+
+lunr.version = "2.1.6"
+/*!
+ * lunr.utils
+ * Copyright (C) 2018 Oliver Nightingale
+ */
+
+/**
+ * A namespace containing utils for the rest of the lunr library
+ */
+lunr.utils = {}
+
+/**
+ * Print a warning message to the console.
+ *
+ * @param {String} message The message to be printed.
+ * @memberOf Utils
+ */
+lunr.utils.warn = (function (global) {
+  /* eslint-disable no-console */
+  return function (message) {
+    if (global.console && console.warn) {
+      console.warn(message)
+    }
+  }
+  /* eslint-enable no-console */
+})(this)
+
+/**
+ * Convert an object to a string.
+ *
+ * In the case of `null` and `undefined` the function returns
+ * the empty string, in all other cases the result of calling
+ * `toString` on the passed object is returned.
+ *
+ * @param {Any} obj The object to convert to a string.
+ * @return {String} string representation of the passed object.
+ * @memberOf Utils
+ */
+lunr.utils.asString = function (obj) {
+  if (obj === void 0 || obj === null) {
+    return ""
+  } else {
+    return obj.toString()
+  }
+}
+lunr.FieldRef = function (docRef, fieldName, stringValue) {
+  this.docRef = docRef
+  this.fieldName = fieldName
+  this._stringValue = stringValue
+}
+
+lunr.FieldRef.joiner = "/"
+
+lunr.FieldRef.fromString = function (s) {
+  var n = s.indexOf(lunr.FieldRef.joiner)
+
+  if (n === -1) {
+    throw "malformed field ref string"
+  }
+
+  var fieldRef = s.slice(0, n),
+      docRef = s.slice(n + 1)
+
+  return new lunr.FieldRef (docRef, fieldRef, s)
+}
+
+lunr.FieldRef.prototype.toString = function () {
+  if (this._stringValue == undefined) {
+    this._stringValue = this.fieldName + lunr.FieldRef.joiner + this.docRef
+  }
+
+  return this._stringValue
+}
+/**
+ * A function to calculate the inverse document frequency for
+ * a posting. This is shared between the builder and the index
+ *
+ * @private
+ * @param {object} posting - The posting for a given term
+ * @param {number} documentCount - The total number of documents.
+ */
+lunr.idf = function (posting, documentCount) {
+  var documentsWithTerm = 0
+
+  for (var fieldName in posting) {
+    if (fieldName == '_index') continue // Ignore the term index, it's not a field
+    documentsWithTerm += Object.keys(posting[fieldName]).length
+  }
+
+  var x = (documentCount - documentsWithTerm + 0.5) / (documentsWithTerm + 0.5)
+
+  return Math.log(1 + Math.abs(x))
+}
+
+/**
+ * A token wraps a string representation of a token
+ * as it is passed through the text processing pipeline.
+ *
+ * @constructor
+ * @param {string} [str=''] - The string token being wrapped.
+ * @param {object} [metadata={}] - Metadata associated with this token.
+ */
+lunr.Token = function (str, metadata) {
+  this.str = str || ""
+  this.metadata = metadata || {}
+}
+
+/**
+ * Returns the token string that is being wrapped by this object.
+ *
+ * @returns {string}
+ */
+lunr.Token.prototype.toString = function () {
+  return this.str
+}
+
+/**
+ * A token update function is used when updating or optionally
+ * when cloning a token.
+ *
+ * @callback lunr.Token~updateFunction
+ * @param {string} str - The string representation of the token.
+ * @param {Object} metadata - All metadata associated with this token.
+ */
+
+/**
+ * Applies the given function to the wrapped string token.
+ *
+ * @example
+ * token.update(function (str, metadata) {
+ *   return str.toUpperCase()
+ * })
+ *
+ * @param {lunr.Token~updateFunction} fn - A function to apply to the token string.
+ * @returns {lunr.Token}
+ */
+lunr.Token.prototype.update = function (fn) {
+  this.str = fn(this.str, this.metadata)
+  return this
+}
+
+/**
+ * Creates a clone of this token. Optionally a function can be
+ * applied to the cloned token.
+ *
+ * @param {lunr.Token~updateFunction} [fn] - An optional function to apply to the cloned token.
+ * @returns {lunr.Token}
+ */
+lunr.Token.prototype.clone = function (fn) {
+  fn = fn || function (s) { return s }
+  return new lunr.Token (fn(this.str, this.metadata), this.metadata)
+}
+/*!
+ * lunr.tokenizer
+ * Copyright (C) 2018 Oliver Nightingale
+ */
+
+/**
+ * A function for splitting a string into tokens ready to be inserted into
+ * the search index. Uses `lunr.tokenizer.separator` to split strings, change
+ * the value of this property to change how strings are split into tokens.
+ *
+ * This tokenizer will convert its parameter to a string by calling `toString` and
+ * then will split this string on the character in `lunr.tokenizer.separator`.
+ * Arrays will have their elements converted to strings and wrapped in a lunr.Token.
+ *
+ * @static
+ * @param {?(string|object|object[])} obj - The object to convert into tokens
+ * @returns {lunr.Token[]}
+ */
+lunr.tokenizer = function (obj) {
+  if (obj == null || obj == undefined) {
+    return []
+  }
+
+  if (Array.isArray(obj)) {
+    return obj.map(function (t) {
+      return new lunr.Token(lunr.utils.asString(t).toLowerCase())
+    })
+  }
+
+  var str = obj.toString().trim().toLowerCase(),
+      len = str.length,
+      tokens = []
+
+  for (var sliceEnd = 0, sliceStart = 0; sliceEnd <= len; sliceEnd++) {
+    var char = str.charAt(sliceEnd),
+        sliceLength = sliceEnd - sliceStart
+
+    if ((char.match(lunr.tokenizer.separator) || sliceEnd == len)) {
+
+      if (sliceLength > 0) {
+        tokens.push(
+          new lunr.Token (str.slice(sliceStart, sliceEnd), {
+            position: [sliceStart, sliceLength],
+            index: tokens.length
+          })
+        )
+      }
+
+      sliceStart = sliceEnd + 1
+    }
+
+  }
+
+  return tokens
+}
+
+/**
+ * The separator used to split a string into tokens. Override this property to change the behaviour of
+ * `lunr.tokenizer` when tokenizing strings. By default this splits on whitespace and hyphens.
+ *
+ * @static
+ * @see lunr.tokenizer
+ */
+lunr.tokenizer.separator = /[\s\-]+/
+/*!
+ * lunr.Pipeline
+ * Copyright (C) 2018 Oliver Nightingale
+ */
+
+/**
+ * lunr.Pipelines maintain an ordered list of functions to be applied to all
+ * tokens in documents entering the search index and queries being ran against
+ * the index.
+ *
+ * An instance of lunr.Index created with the lunr shortcut will contain a
+ * pipeline with a stop word filter and an English language stemmer. Extra
+ * functions can be added before or after either of these functions or these
+ * default functions can be removed.
+ *
+ * When run the pipeline will call each function in turn, passing a token, the
+ * index of that token in the original list of all tokens and finally a list of
+ * all the original tokens.
+ *
+ * The output of functions in the pipeline will be passed to the next function
+ * in the pipeline. To exclude a token from entering the index the function
+ * should return undefined, the rest of the pipeline will not be called with
+ * this token.
+ *
+ * For serialisation of pipelines to work, all functions used in an instance of
+ * a pipeline should be registered with lunr.Pipeline. Registered functions can
+ * then be loaded. If trying to load a serialised pipeline that uses functions
+ * that are not registered an error will be thrown.
+ *
+ * If not planning on serialising the pipeline then registering pipeline functions
+ * is not necessary.
+ *
+ * @constructor
+ */
+lunr.Pipeline = function () {
+  this._stack = []
+}
+
+lunr.Pipeline.registeredFunctions = Object.create(null)
+
+/**
+ * A pipeline function maps lunr.Token to lunr.Token. A lunr.Token contains the token
+ * string as well as all known metadata. A pipeline function can mutate the token string
+ * or mutate (or add) metadata for a given token.
+ *
+ * A pipeline function can indicate that the passed token should be discarded by returning
+ * null. This token will not be passed to any downstream pipeline functions and will not be
+ * added to the index.
+ *
+ * Multiple tokens can be returned by returning an array of tokens. Each token will be passed
+ * to any downstream pipeline functions and all returned tokens will be added to the index.
+ *
+ * Any number of pipeline functions may be chained together using a lunr.Pipeline.
+ *
+ * @interface lunr.PipelineFunction
+ * @param {lunr.Token} token - A token from the document being processed.
+ * @param {number} i - The index of this token in the complete list of tokens for this document/field.
+ * @param {lunr.Token[]} tokens - All tokens for this document/field.
+ * @returns {(?lunr.Token|lunr.Token[])}
+ */
+
+/**
+ * Register a function with the pipeline.
+ *
+ * Functions that are used in the pipeline should be registered if the pipeline
+ * needs to be serialised, or a serialised pipeline needs to be loaded.
+ *
+ * Registering a function does not add it to a pipeline, functions must still be
+ * added to instances of the pipeline for them to be used when running a pipeline.
+ *
+ * @param {lunr.PipelineFunction} fn - The function to check for.
+ * @param {String} label - The label to register this function with
+ */
+lunr.Pipeline.registerFunction = function (fn, label) {
+  if (label in this.registeredFunctions) {
+    lunr.utils.warn('Overwriting existing registered function: ' + label)
+  }
+
+  fn.label = label
+  lunr.Pipeline.registeredFunctions[fn.label] = fn
+}
+
+/**
+ * Warns if the function is not registered as a Pipeline function.
+ *
+ * @param {lunr.PipelineFunction} fn - The function to check for.
+ * @private
+ */
+lunr.Pipeline.warnIfFunctionNotRegistered = function (fn) {
+  var isRegistered = fn.label && (fn.label in this.registeredFunctions)
+
+  if (!isRegistered) {
+    lunr.utils.warn('Function is not registered with pipeline. This may cause problems when serialising the index.\n', fn)
+  }
+}
+
+/**
+ * Loads a previously serialised pipeline.
+ *
+ * All functions to be loaded must already be registered with lunr.Pipeline.
+ * If any function from the serialised data has not been registered then an
+ * error will be thrown.
+ *
+ * @param {Object} serialised - The serialised pipeline to load.
+ * @returns {lunr.Pipeline}
+ */
+lunr.Pipeline.load = function (serialised) {
+  var pipeline = new lunr.Pipeline
+
+  serialised.forEach(function (fnName) {
+    var fn = lunr.Pipeline.registeredFunctions[fnName]
+
+    if (fn) {
+      pipeline.add(fn)
+    } else {
+      throw new Error('Cannot load unregistered function: ' + fnName)
+    }
+  })
+
+  return pipeline
+}
+
+/**
+ * Adds new functions to the end of the pipeline.
+ *
+ * Logs a warning if the function has not been registered.
+ *
+ * @param {lunr.PipelineFunction[]} functions - Any number of functions to add to the pipeline.
+ */
+lunr.Pipeline.prototype.add = function () {
+  var fns = Array.prototype.slice.call(arguments)
+
+  fns.forEach(function (fn) {
+    lunr.Pipeline.warnIfFunctionNotRegistered(fn)
+    this._stack.push(fn)
+  }, this)
+}
+
+/**
+ * Adds a single function after a function that already exists in the
+ * pipeline.
+ *
+ * Logs a warning if the function has not been registered.
+ *
+ * @param {lunr.PipelineFunction} existingFn - A function that already exists in the pipeline.
+ * @param {lunr.PipelineFunction} newFn - The new function to add to the pipeline.
+ */
+lunr.Pipeline.prototype.after = function (existingFn, newFn) {
+  lunr.Pipeline.warnIfFunctionNotRegistered(newFn)
+
+  var pos = this._stack.indexOf(existingFn)
+  if (pos == -1) {
+    throw new Error('Cannot find existingFn')
+  }
+
+  pos = pos + 1
+  this._stack.splice(pos, 0, newFn)
+}
+
+/**
+ * Adds a single function before a function that already exists in the
+ * pipeline.
+ *
+ * Logs a warning if the function has not been registered.
+ *
+ * @param {lunr.PipelineFunction} existingFn - A function that already exists in the pipeline.
+ * @param {lunr.PipelineFunction} newFn - The new function to add to the pipeline.
+ */
+lunr.Pipeline.prototype.before = function (existingFn, newFn) {
+  lunr.Pipeline.warnIfFunctionNotRegistered(newFn)
+
+  var pos = this._stack.indexOf(existingFn)
+  if (pos == -1) {
+    throw new Error('Cannot find existingFn')
+  }
+
+  this._stack.splice(pos, 0, newFn)
+}
+
+/**
+ * Removes a function from the pipeline.
+ *
+ * @param {lunr.PipelineFunction} fn The function to remove from the pipeline.
+ */
+lunr.Pipeline.prototype.remove = function (fn) {
+  var pos = this._stack.indexOf(fn)
+  if (pos == -1) {
+    return
+  }
+
+  this._stack.splice(pos, 1)
+}
+
+/**
+ * Runs the current list of functions that make up the pipeline against the
+ * passed tokens.
+ *
+ * @param {Array} tokens The tokens to run through the pipeline.
+ * @returns {Array}
+ */
+lunr.Pipeline.prototype.run = function (tokens) {
+  var stackLength = this._stack.length
+
+  for (var i = 0; i < stackLength; i++) {
+    var fn = this._stack[i]
+    var memo = []
+
+    for (var j = 0; j < tokens.length; j++) {
+      var result = fn(tokens[j], j, tokens)
+
+      if (result === void 0 || result === '') continue
+
+      if (result instanceof Array) {
+        for (var k = 0; k < result.length; k++) {
+          memo.push(result[k])
+        }
+      } else {
+        memo.push(result)
+      }
+    }
+
+    tokens = memo
+  }
+
+  return tokens
+}
+
+/**
+ * Convenience method for passing a string through a pipeline and getting
+ * strings out. This method takes care of wrapping the passed string in a
+ * token and mapping the resulting tokens back to strings.
+ *
+ * @param {string} str - The string to pass through the pipeline.
+ * @returns {string[]}
+ */
+lunr.Pipeline.prototype.runString = function (str) {
+  var token = new lunr.Token (str)
+
+  return this.run([token]).map(function (t) {
+    return t.toString()
+  })
+}
+
+/**
+ * Resets the pipeline by removing any existing processors.
+ *
+ */
+lunr.Pipeline.prototype.reset = function () {
+  this._stack = []
+}
+
+/**
+ * Returns a representation of the pipeline ready for serialisation.
+ *
+ * Logs a warning if the function has not been registered.
+ *
+ * @returns {Array}
+ */
+lunr.Pipeline.prototype.toJSON = function () {
+  return this._stack.map(function (fn) {
+    lunr.Pipeline.warnIfFunctionNotRegistered(fn)
+
+    return fn.label
+  })
+}
+/*!
+ * lunr.Vector
+ * Copyright (C) 2018 Oliver Nightingale
+ */
+
+/**
+ * A vector is used to construct the vector space of documents and queries. These
+ * vectors support operations to determine the similarity between two documents or
+ * a document and a query.
+ *
+ * Normally no parameters are required for initializing a vector, but in the case of
+ * loading a previously dumped vector the raw elements can be provided to the constructor.
+ *
+ * For performance reasons vectors are implemented with a flat array, where an elements
+ * index is immediately followed by its value. E.g. [index, value, index, value]. This
+ * allows the underlying array to be as sparse as possible and still offer decent
+ * performance when being used for vector calculations.
+ *
+ * @constructor
+ * @param {Number[]} [elements] - The flat list of element index and element value pairs.
+ */
+lunr.Vector = function (elements) {
+  this._magnitude = 0
+  this.elements = elements || []
+}
+
+
+/**
+ * Calculates the position within the vector to insert a given index.
+ *
+ * This is used internally by insert and upsert. If there are duplicate indexes then
+ * the position is returned as if the value for that index were to be updated, but it
+ * is the caller's responsibility to check whether there is a duplicate at that index
+ *
+ * @param {Number} insertIdx - The index at which the element should be inserted.
+ * @returns {Number}
+ */
+lunr.Vector.prototype.positionForIndex = function (index) {
+  // For an empty vector the tuple can be inserted at the beginning
+  if (this.elements.length == 0) {
+    return 0
+  }
+
+  var start = 0,
+      end = this.elements.length / 2,
+      sliceLength = end - start,
+      pivotPoint = Math.floor(sliceLength / 2),
+      pivotIndex = this.elements[pivotPoint * 2]
+
+  while (sliceLength > 1) {
+    if (pivotIndex < index) {
+      start = pivotPoint
+    }
+
+    if (pivotIndex > index) {
+      end = pivotPoint
+    }
+
+    if (pivotIndex == index) {
+      break
+    }
+
+    sliceLength = end - start
+    pivotPoint = start + Math.floor(sliceLength / 2)
+    pivotIndex = this.elements[pivotPoint * 2]
+  }
+
+  if (pivotIndex == index) {
+    return pivotPoint * 2
+  }
+
+  if (pivotIndex > index) {
+    return pivotPoint * 2
+  }
+
+  if (pivotIndex < index) {
+    return (pivotPoint + 1) * 2
+  }
+}
+
+/**
+ * Inserts an element at an index within the vector.
+ *
+ * Does not allow duplicates, will throw an error if there is already an entry
+ * for this index.
+ *
+ * @param {Number} insertIdx - The index at which the element should be inserted.
+ * @param {Number} val - The value to be inserted into the vector.
+ */
+lunr.Vector.prototype.insert = function (insertIdx, val) {
+  this.upsert(insertIdx, val, function () {
+    throw "duplicate index"
+  })
+}
+
+/**
+ * Inserts or updates an existing index within the vector.
+ *
+ * @param {Number} insertIdx - The index at which the element should be inserted.
+ * @param {Number} val - The value to be inserted into the vector.
+ * @param {function} fn - A function that is called for updates, the existing value and the
+ * requested value are passed as arguments
+ */
+lunr.Vector.prototype.upsert = function (insertIdx, val, fn) {
+  this._magnitude = 0
+  var position = this.positionForIndex(insertIdx)
+
+  if (this.elements[position] == insertIdx) {
+    this.elements[position + 1] = fn(this.elements[position + 1], val)
+  } else {
+    this.elements.splice(position, 0, insertIdx, val)
+  }
+}
+
+/**
+ * Calculates the magnitude of this vector.
+ *
+ * @returns {Number}
+ */
+lunr.Vector.prototype.magnitude = function () {
+  if (this._magnitude) return this._magnitude
+
+  var sumOfSquares = 0,
+      elementsLength = this.elements.length
+
+  for (var i = 1; i < elementsLength; i += 2) {
+    var val = this.elements[i]
+    sumOfSquares += val * val
+  }
+
+  return this._magnitude = Math.sqrt(sumOfSquares)
+}
+
+/**
+ * Calculates the dot product of this vector and another vector.
+ *
+ * @param {lunr.Vector} otherVector - The vector to compute the dot product with.
+ * @returns {Number}
+ */
+lunr.Vector.prototype.dot = function (otherVector) {
+  var dotProduct = 0,
+      a = this.elements, b = otherVector.elements,
+      aLen = a.length, bLen = b.length,
+      aVal = 0, bVal = 0,
+      i = 0, j = 0
+
+  while (i < aLen && j < bLen) {
+    aVal = a[i], bVal = b[j]
+    if (aVal < bVal) {
+      i += 2
+    } else if (aVal > bVal) {
+      j += 2
+    } else if (aVal == bVal) {
+      dotProduct += a[i + 1] * b[j + 1]
+      i += 2
+      j += 2
+    }
+  }
+
+  return dotProduct
+}
+
+/**
+ * Calculates the cosine similarity between this vector and another
+ * vector.
+ *
+ * @param {lunr.Vector} otherVector - The other vector to calculate the
+ * similarity with.
+ * @returns {Number}
+ */
+lunr.Vector.prototype.similarity = function (otherVector) {
+  return this.dot(otherVector) / (this.magnitude() * otherVector.magnitude())
+}
+
+/**
+ * Converts the vector to an array of the elements within the vector.
+ *
+ * @returns {Number[]}
+ */
+lunr.Vector.prototype.toArray = function () {
+  var output = new Array (this.elements.length / 2)
+
+  for (var i = 1, j = 0; i < this.elements.length; i += 2, j++) {
+    output[j] = this.elements[i]
+  }
+
+  return output
+}
+
+/**
+ * A JSON serializable representation of the vector.
+ *
+ * @returns {Number[]}
+ */
+lunr.Vector.prototype.toJSON = function () {
+  return this.elements
+}
+/* eslint-disable */
+/*!
+ * lunr.stemmer
+ * Copyright (C) 2018 Oliver Nightingale
+ * Includes code from - http://tartarus.org/~martin/PorterStemmer/js.txt
+ */
+
+/**
+ * lunr.stemmer is an english language stemmer, this is a JavaScript
+ * implementation of the PorterStemmer taken from http://tartarus.org/~martin
+ *
+ * @static
+ * @implements {lunr.PipelineFunction}
+ * @param {lunr.Token} token - The string to stem
+ * @returns {lunr.Token}
+ * @see {@link lunr.Pipeline}
+ */
+lunr.stemmer = (function(){
+  var step2list = {
+      "ational" : "ate",
+      "tional" : "tion",
+      "enci" : "ence",
+      "anci" : "ance",
+      "izer" : "ize",
+      "bli" : "ble",
+      "alli" : "al",
+      "entli" : "ent",
+      "eli" : "e",
+      "ousli" : "ous",
+      "ization" : "ize",
+      "ation" : "ate",
+      "ator" : "ate",
+      "alism" : "al",
+      "iveness" : "ive",
+      "fulness" : "ful",
+      "ousness" : "ous",
+      "aliti" : "al",
+      "iviti" : "ive",
+      "biliti" : "ble",
+      "logi" : "log"
+    },
+
+    step3list = {
+      "icate" : "ic",
+      "ative" : "",
+      "alize" : "al",
+      "iciti" : "ic",
+      "ical" : "ic",
+      "ful" : "",
+      "ness" : ""
+    },
+
+    c = "[^aeiou]",          // consonant
+    v = "[aeiouy]",          // vowel
+    C = c + "[^aeiouy]*",    // consonant sequence
+    V = v + "[aeiou]*",      // vowel sequence
+
+    mgr0 = "^(" + C + ")?" + V + C,               // [C]VC... is m>0
+    meq1 = "^(" + C + ")?" + V + C + "(" + V + ")?$",  // [C]VC[V] is m=1
+    mgr1 = "^(" + C + ")?" + V + C + V + C,       // [C]VCVC... is m>1
+    s_v = "^(" + C + ")?" + v;                   // vowel in stem
+
+  var re_mgr0 = new RegExp(mgr0);
+  var re_mgr1 = new RegExp(mgr1);
+  var re_meq1 = new RegExp(meq1);
+  var re_s_v = new RegExp(s_v);
+
+  var re_1a = /^(.+?)(ss|i)es$/;
+  var re2_1a = /^(.+?)([^s])s$/;
+  var re_1b = /^(.+?)eed$/;
+  var re2_1b = /^(.+?)(ed|ing)$/;
+  var re_1b_2 = /.$/;
+  var re2_1b_2 = /(at|bl|iz)$/;
+  var re3_1b_2 = new RegExp("([^aeiouylsz])\\1$");
+  var re4_1b_2 = new RegExp("^" + C + v + "[^aeiouwxy]$");
+
+  var re_1c = /^(.+?[^aeiou])y$/;
+  var re_2 = /^(.+?)(ational|tional|enci|anci|izer|bli|alli|entli|eli|ousli|ization|ation|ator|alism|iveness|fulness|ousness|aliti|iviti|biliti|logi)$/;
+
+  var re_3 = /^(.+?)(icate|ative|alize|iciti|ical|ful|ness)$/;
+
+  var re_4 = /^(.+?)(al|ance|ence|er|ic|able|ible|ant|ement|ment|ent|ou|ism|ate|iti|ous|ive|ize)$/;
+  var re2_4 = /^(.+?)(s|t)(ion)$/;
+
+  var re_5 = /^(.+?)e$/;
+  var re_5_1 = /ll$/;
+  var re3_5 = new RegExp("^" + C + v + "[^aeiouwxy]$");
+
+  var porterStemmer = function porterStemmer(w) {
+    var stem,
+      suffix,
+      firstch,
+      re,
+      re2,
+      re3,
+      re4;
+
+    if (w.length < 3) { return w; }
+
+    firstch = w.substr(0,1);
+    if (firstch == "y") {
+      w = firstch.toUpperCase() + w.substr(1);
+    }
+
+    // Step 1a
+    re = re_1a
+    re2 = re2_1a;
+
+    if (re.test(w)) { w = w.replace(re,"$1$2"); }
+    else if (re2.test(w)) { w = w.replace(re2,"$1$2"); }
+
+    // Step 1b
+    re = re_1b;
+    re2 = re2_1b;
+    if (re.test(w)) {
+      var fp = re.exec(w);
+      re = re_mgr0;
+      if (re.test(fp[1])) {
+        re = re_1b_2;
+        w = w.replace(re,"");
+      }
+    } else if (re2.test(w)) {
+      var fp = re2.exec(w);
+      stem = fp[1];
+      re2 = re_s_v;
+      if (re2.test(stem)) {
+        w = stem;
+        re2 = re2_1b_2;
+        re3 = re3_1b_2;
+        re4 = re4_1b_2;
+        if (re2.test(w)) { w = w + "e"; }
+        else if (re3.test(w)) { re = re_1b_2; w = w.replace(re,""); }
+        else if (re4.test(w)) { w = w + "e"; }
+      }
+    }
+
+    // Step 1c - replace suffix y or Y by i if preceded by a non-vowel which is not the first letter of the word (so cry -> cri, by -> by, say -> say)
+    re = re_1c;
+    if (re.test(w)) {
+      var fp = re.exec(w);
+      stem = fp[1];
+      w = stem + "i";
+    }
+
+    // Step 2
+    re = re_2;
+    if (re.test(w)) {
+      var fp = re.exec(w);
+      stem = fp[1];
+      suffix = fp[2];
+      re = re_mgr0;
+      if (re.test(stem)) {
+        w = stem + step2list[suffix];
+      }
+    }
+
+    // Step 3
+    re = re_3;
+    if (re.test(w)) {
+      var fp = re.exec(w);
+      stem = fp[1];
+      suffix = fp[2];
+      re = re_mgr0;
+      if (re.test(stem)) {
+        w = stem + step3list[suffix];
+      }
+    }
+
+    // Step 4
+    re = re_4;
+    re2 = re2_4;
+    if (re.test(w)) {
+      var fp = re.exec(w);
+      stem = fp[1];
+      re = re_mgr1;
+      if (re.test(stem)) {
+        w = stem;
+      }
+    } else if (re2.test(w)) {
+      var fp = re2.exec(w);
+      stem = fp[1] + fp[2];
+      re2 = re_mgr1;
+      if (re2.test(stem)) {
+        w = stem;
+      }
+    }
+
+    // Step 5
+    re = re_5;
+    if (re.test(w)) {
+      var fp = re.exec(w);
+      stem = fp[1];
+      re = re_mgr1;
+      re2 = re_meq1;
+      re3 = re3_5;
+      if (re.test(stem) || (re2.test(stem) && !(re3.test(stem)))) {
+        w = stem;
+      }
+    }
+
+    re = re_5_1;
+    re2 = re_mgr1;
+    if (re.test(w) && re2.test(w)) {
+      re = re_1b_2;
+      w = w.replace(re,"");
+    }
+
+    // and turn initial Y back to y
+
+    if (firstch == "y") {
+      w = firstch.toLowerCase() + w.substr(1);
+    }
+
+    return w;
+  };
+
+  return function (token) {
+    return token.update(porterStemmer);
+  }
+})();
+
+lunr.Pipeline.registerFunction(lunr.stemmer, 'stemmer')
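+
+// Illustrative example: the stemmer reduces inflected forms to a common stem.
+//
+//   lunr.stemmer(new lunr.Token("caresses")).toString() // => "caress"
+//   lunr.stemmer(new lunr.Token("fishing")).toString()  // => "fish"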
+/*!
+ * lunr.stopWordFilter
+ * Copyright (C) 2018 Oliver Nightingale
+ */
+
+/**
+ * lunr.generateStopWordFilter builds a stopWordFilter function from the provided
+ * list of stop words.
+ *
+ * The built-in lunr.stopWordFilter is built using this generator and can be used
+ * to generate custom stopWordFilters for applications or non-English languages.
+ *
+ * @param {Array} stopWords - The list of stop words to filter out.
+ * @returns {lunr.PipelineFunction}
+ * @see lunr.Pipeline
+ * @see lunr.stopWordFilter
+ */
+lunr.generateStopWordFilter = function (stopWords) {
+  var words = stopWords.reduce(function (memo, stopWord) {
+    memo[stopWord] = stopWord
+    return memo
+  }, {})
+
+  return function (token) {
+    if (token && words[token.toString()] !== token.toString()) return token
+  }
+}
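+
+// Illustrative example: building and registering a custom filter for two
+// made up stop words.
+//
+//   var customFilter = lunr.generateStopWordFilter(['foo', 'bar'])
+//   lunr.Pipeline.registerFunction(customFilter, 'customFilter')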
+
+/**
+ * lunr.stopWordFilter is an English language stop word list filter, any words
+ * contained in the list will not be passed through the filter.
+ *
+ * This is intended to be used in the Pipeline. If the token does not pass the
+ * filter then undefined will be returned.
+ *
+ * @implements {lunr.PipelineFunction}
+ * @param {lunr.Token} token - A token to check for being a stop word.
+ * @returns {lunr.Token}
+ * @see {@link lunr.Pipeline}
+ */
+lunr.stopWordFilter = lunr.generateStopWordFilter([
+  'a',
+  'able',
+  'about',
+  'across',
+  'after',
+  'all',
+  'almost',
+  'also',
+  'am',
+  'among',
+  'an',
+  'and',
+  'any',
+  'are',
+  'as',
+  'at',
+  'be',
+  'because',
+  'been',
+  'but',
+  'by',
+  'can',
+  'cannot',
+  'could',
+  'dear',
+  'did',
+  'do',
+  'does',
+  'either',
+  'else',
+  'ever',
+  'every',
+  'for',
+  'from',
+  'get',
+  'got',
+  'had',
+  'has',
+  'have',
+  'he',
+  'her',
+  'hers',
+  'him',
+  'his',
+  'how',
+  'however',
+  'i',
+  'if',
+  'in',
+  'into',
+  'is',
+  'it',
+  'its',
+  'just',
+  'least',
+  'let',
+  'like',
+  'likely',
+  'may',
+  'me',
+  'might',
+  'most',
+  'must',
+  'my',
+  'neither',
+  'no',
+  'nor',
+  'not',
+  'of',
+  'off',
+  'often',
+  'on',
+  'only',
+  'or',
+  'other',
+  'our',
+  'own',
+  'rather',
+  'said',
+  'say',
+  'says',
+  'she',
+  'should',
+  'since',
+  'so',
+  'some',
+  'than',
+  'that',
+  'the',
+  'their',
+  'them',
+  'then',
+  'there',
+  'these',
+  'they',
+  'this',
+  'tis',
+  'to',
+  'too',
+  'twas',
+  'us',
+  'wants',
+  'was',
+  'we',
+  'were',
+  'what',
+  'when',
+  'where',
+  'which',
+  'while',
+  'who',
+  'whom',
+  'why',
+  'will',
+  'with',
+  'would',
+  'yet',
+  'you',
+  'your'
+])
+
+lunr.Pipeline.registerFunction(lunr.stopWordFilter, 'stopWordFilter')
+/*!
+ * lunr.trimmer
+ * Copyright (C) 2018 Oliver Nightingale
+ */
+
+/**
+ * lunr.trimmer is a pipeline function for trimming non word
+ * characters from the beginning and end of tokens before they
+ * enter the index.
+ *
+ * This implementation may not work correctly for non-Latin
+ * characters and should either be removed or adapted for use
+ * with languages with non-Latin characters.
+ *
+ * @static
+ * @implements {lunr.PipelineFunction}
+ * @param {lunr.Token} token The token to pass through the filter
+ * @returns {lunr.Token}
+ * @see lunr.Pipeline
+ */
+lunr.trimmer = function (token) {
+  return token.update(function (s) {
+    return s.replace(/^\W+/, '').replace(/\W+$/, '')
+  })
+}
+
+lunr.Pipeline.registerFunction(lunr.trimmer, 'trimmer')
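+
+// Illustrative example: punctuation at the token boundaries is removed.
+//
+//   lunr.trimmer(new lunr.Token('"hello!"')).toString() // => 'hello'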
+/*!
+ * lunr.TokenSet
+ * Copyright (C) 2018 Oliver Nightingale
+ */
+
+/**
+ * A token set is used to store the unique list of all tokens
+ * within an index. Token sets are also used to represent an
+ * incoming query to the index, this query token set and index
+ * token set are then intersected to find which tokens to look
+ * up in the inverted index.
+ *
+ * A token set can hold multiple tokens, as in the case of the
+ * index token set, or it can hold a single token as in the
+ * case of a simple query token set.
+ *
+ * Additionally token sets are used to perform wildcard matching.
+ * Leading, contained and trailing wildcards are supported, and
+ * from this, edit distance matching can also be provided.
+ *
+ * Token sets are implemented as a minimal finite state automaton,
+ * where both common prefixes and suffixes are shared between tokens.
+ * This helps to reduce the space used for storing the token set.
+ *
+ * @constructor
+ */
+lunr.TokenSet = function () {
+  this.final = false
+  this.edges = {}
+  this.id = lunr.TokenSet._nextId
+  lunr.TokenSet._nextId += 1
+}
+
+/**
+ * Keeps track of the next, auto increment, identifier to assign
+ * to a new tokenSet.
+ *
+ * TokenSets require a unique identifier to be correctly minimised.
+ *
+ * @private
+ */
+lunr.TokenSet._nextId = 1
+
+/**
+ * Creates a TokenSet instance from the given sorted array of words.
+ *
+ * @param {String[]} arr - A sorted array of strings to create the set from.
+ * @returns {lunr.TokenSet}
+ * @throws Will throw an error if the input array is not sorted.
+ */
+lunr.TokenSet.fromArray = function (arr) {
+  var builder = new lunr.TokenSet.Builder
+
+  for (var i = 0, len = arr.length; i < len; i++) {
+    builder.insert(arr[i])
+  }
+
+  builder.finish()
+  return builder.root
+}
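+
+// Illustrative example (the input array must already be sorted):
+//
+//   lunr.TokenSet.fromArray(['cat', 'cats']).toArray() // => ['cat', 'cats']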
+
+/**
+ * Creates a token set from a query clause.
+ *
+ * @private
+ * @param {Object} clause - A single clause from lunr.Query.
+ * @param {string} clause.term - The query clause term.
+ * @param {number} [clause.editDistance] - The optional edit distance for the term.
+ * @returns {lunr.TokenSet}
+ */
+lunr.TokenSet.fromClause = function (clause) {
+  if ('editDistance' in clause) {
+    return lunr.TokenSet.fromFuzzyString(clause.term, clause.editDistance)
+  } else {
+    return lunr.TokenSet.fromString(clause.term)
+  }
+}
+
+/**
+ * Creates a token set representing a single string with a specified
+ * edit distance.
+ *
+ * Insertions, deletions, substitutions and transpositions are each
+ * treated as an edit distance of 1.
+ *
+ * Increasing the allowed edit distance will have a dramatic impact
+ * on the performance of both creating and intersecting these TokenSets.
+ * It is advised to keep the edit distance less than 3.
+ *
+ * @param {string} str - The string to create the token set from.
+ * @param {number} editDistance - The allowed edit distance to match.
+ * @returns {lunr.TokenSet}
+ */
+lunr.TokenSet.fromFuzzyString = function (str, editDistance) {
+  var root = new lunr.TokenSet
+
+  var stack = [{
+    node: root,
+    editsRemaining: editDistance,
+    str: str
+  }]
+
+  while (stack.length) {
+    var frame = stack.pop()
+
+    // no edit
+    if (frame.str.length > 0) {
+      var char = frame.str.charAt(0),
+          noEditNode
+
+      if (char in frame.node.edges) {
+        noEditNode = frame.node.edges[char]
+      } else {
+        noEditNode = new lunr.TokenSet
+        frame.node.edges[char] = noEditNode
+      }
+
+      if (frame.str.length == 1) {
+        noEditNode.final = true
+      } else {
+        stack.push({
+          node: noEditNode,
+          editsRemaining: frame.editsRemaining,
+          str: frame.str.slice(1)
+        })
+      }
+    }
+
+    // deletion
+    // can only do a deletion if we have enough edits remaining
+    // and if there are characters left to delete in the string
+    if (frame.editsRemaining > 0 && frame.str.length > 1) {
+      var char = frame.str.charAt(1),
+          deletionNode
+
+      if (char in frame.node.edges) {
+        deletionNode = frame.node.edges[char]
+      } else {
+        deletionNode = new lunr.TokenSet
+        frame.node.edges[char] = deletionNode
+      }
+
+      if (frame.str.length <= 2) {
+        deletionNode.final = true
+      } else {
+        stack.push({
+          node: deletionNode,
+          editsRemaining: frame.editsRemaining - 1,
+          str: frame.str.slice(2)
+        })
+      }
+    }
+
+    // deletion
+    // just removing the last character from the str
+    if (frame.editsRemaining > 0 && frame.str.length == 1) {
+      frame.node.final = true
+    }
+
+    // substitution
+    // can only do a substitution if we have enough edits remaining
+    // and if there are characters left to substitute
+    if (frame.editsRemaining > 0 && frame.str.length >= 1) {
+      if ("*" in frame.node.edges) {
+        var substitutionNode = frame.node.edges["*"]
+      } else {
+        var substitutionNode = new lunr.TokenSet
+        frame.node.edges["*"] = substitutionNode
+      }
+
+      if (frame.str.length == 1) {
+        substitutionNode.final = true
+      } else {
+        stack.push({
+          node: substitutionNode,
+          editsRemaining: frame.editsRemaining - 1,
+          str: frame.str.slice(1)
+        })
+      }
+    }
+
+    // insertion
+    // can only do insertion if there are edits remaining
+    if (frame.editsRemaining > 0) {
+      if ("*" in frame.node.edges) {
+        var insertionNode = frame.node.edges["*"]
+      } else {
+        var insertionNode = new lunr.TokenSet
+        frame.node.edges["*"] = insertionNode
+      }
+
+      if (frame.str.length == 0) {
+        insertionNode.final = true
+      } else {
+        stack.push({
+          node: insertionNode,
+          editsRemaining: frame.editsRemaining - 1,
+          str: frame.str
+        })
+      }
+    }
+
+    // transposition
+    // can only do a transposition if there are edits remaining
+    // and there are enough characters to transpose
+    if (frame.editsRemaining > 0 && frame.str.length > 1) {
+      var charA = frame.str.charAt(0),
+          charB = frame.str.charAt(1),
+          transposeNode
+
+      if (charB in frame.node.edges) {
+        transposeNode = frame.node.edges[charB]
+      } else {
+        transposeNode = new lunr.TokenSet
+        frame.node.edges[charB] = transposeNode
+      }
+
+      if (frame.str.length == 1) {
+        transposeNode.final = true
+      } else {
+        stack.push({
+          node: transposeNode,
+          editsRemaining: frame.editsRemaining - 1,
+          str: charA + frame.str.slice(2)
+        })
+      }
+    }
+  }
+
+  return root
+}
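+
+// Illustrative example: "hat" is within an edit distance of 1 of "cat"
+// (a single substitution), so it survives the intersection.
+//
+//   var exact = lunr.TokenSet.fromString('hat')
+//   var fuzzy = lunr.TokenSet.fromFuzzyString('cat', 1)
+//   exact.intersect(fuzzy).toArray() // => ['hat']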
+
+/**
+ * Creates a TokenSet from a string.
+ *
+ * The string may contain one or more wildcard characters (*)
+ * that will allow wildcard matching when intersecting with
+ * another TokenSet.
+ *
+ * @param {string} str - The string to create a TokenSet from.
+ * @returns {lunr.TokenSet}
+ */
+lunr.TokenSet.fromString = function (str) {
+  var node = new lunr.TokenSet,
+      root = node,
+      wildcardFound = false
+
+  /*
+   * Iterates through all characters within the passed string
+   * appending a node for each character.
+   *
+   * As soon as a wildcard character is found then a self
+   * referencing edge is introduced to continually match
+   * any number of any characters.
+   */
+  for (var i = 0, len = str.length; i < len; i++) {
+    var char = str[i],
+        final = (i == len - 1)
+
+    if (char == "*") {
+      wildcardFound = true
+      node.edges[char] = node
+      node.final = final
+
+    } else {
+      var next = new lunr.TokenSet
+      next.final = final
+
+      node.edges[char] = next
+      node = next
+
+      // TODO: is this needed anymore?
+      if (wildcardFound) {
+        node.edges["*"] = root
+      }
+    }
+  }
+
+  return root
+}
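+
+// Illustrative example: a trailing wildcard matches any number of characters
+// when intersecting with another token set.
+//
+//   lunr.TokenSet.fromString('hello')
+//     .intersect(lunr.TokenSet.fromString('he*'))
+//     .toArray() // => ['hello']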
+
+/**
+ * Converts this TokenSet into an array of strings
+ * contained within the TokenSet.
+ *
+ * @returns {string[]}
+ */
+lunr.TokenSet.prototype.toArray = function () {
+  var words = []
+
+  var stack = [{
+    prefix: "",
+    node: this
+  }]
+
+  while (stack.length) {
+    var frame = stack.pop(),
+        edges = Object.keys(frame.node.edges),
+        len = edges.length
+
+    if (frame.node.final) {
+      words.push(frame.prefix)
+    }
+
+    for (var i = 0; i < len; i++) {
+      var edge = edges[i]
+
+      stack.push({
+        prefix: frame.prefix.concat(edge),
+        node: frame.node.edges[edge]
+      })
+    }
+  }
+
+  return words
+}
+
+/**
+ * Generates a string representation of a TokenSet.
+ *
+ * This is intended to allow TokenSets to be used as keys
+ * in objects, largely to aid the construction and minimisation
+ * of a TokenSet. As such it is not designed to be a human
+ * friendly representation of the TokenSet.
+ *
+ * @returns {string}
+ */
+lunr.TokenSet.prototype.toString = function () {
+  // NOTE: Using Object.keys here as this.edges is very likely
+  // to enter 'hash-mode' with many keys being added
+  //
+  // avoiding a for-in loop here as it leads to the function
+  // being de-optimised (at least in V8). From some simple
+  // benchmarks the performance is comparable, but allowing
+  // V8 to optimize may mean easy performance wins in the future.
+
+  if (this._str) {
+    return this._str
+  }
+
+  var str = this.final ? '1' : '0',
+      labels = Object.keys(this.edges).sort(),
+      len = labels.length
+
+  for (var i = 0; i < len; i++) {
+    var label = labels[i],
+        node = this.edges[label]
+
+    str = str + label + node.id
+  }
+
+  return str
+}
+
+/**
+ * Returns a new TokenSet that is the intersection of
+ * this TokenSet and the passed TokenSet.
+ *
+ * This intersection will take into account any wildcards
+ * contained within the TokenSet.
+ *
+ * @param {lunr.TokenSet} b - Another TokenSet to intersect with.
+ * @returns {lunr.TokenSet}
+ */
+lunr.TokenSet.prototype.intersect = function (b) {
+  var output = new lunr.TokenSet,
+      frame = undefined
+
+  var stack = [{
+    qNode: b,
+    output: output,
+    node: this
+  }]
+
+  while (stack.length) {
+    frame = stack.pop()
+
+    // NOTE: As with the #toString method, we are using
+    // Object.keys and a for loop instead of a for-in loop
+    // as both of these objects enter 'hash' mode, causing
+    // the function to be de-optimised in V8
+    var qEdges = Object.keys(frame.qNode.edges),
+        qLen = qEdges.length,
+        nEdges = Object.keys(frame.node.edges),
+        nLen = nEdges.length
+
+    for (var q = 0; q < qLen; q++) {
+      var qEdge = qEdges[q]
+
+      for (var n = 0; n < nLen; n++) {
+        var nEdge = nEdges[n]
+
+        if (nEdge == qEdge || qEdge == '*') {
+          var node = frame.node.edges[nEdge],
+              qNode = frame.qNode.edges[qEdge],
+              final = node.final && qNode.final,
+              next = undefined
+
+          if (nEdge in frame.output.edges) {
+            // an edge already exists for this character
+            // no need to create a new node, just set the finality
+            // bit unless this node is already final
+            next = frame.output.edges[nEdge]
+            next.final = next.final || final
+
+          } else {
+            // no edge exists yet, must create one
+            // set the finality bit and insert it
+            // into the output
+            next = new lunr.TokenSet
+            next.final = final
+            frame.output.edges[nEdge] = next
+          }
+
+          stack.push({
+            qNode: qNode,
+            output: next,
+            node: node
+          })
+        }
+      }
+    }
+  }
+
+  return output
+}
+lunr.TokenSet.Builder = function () {
+  this.previousWord = ""
+  this.root = new lunr.TokenSet
+  this.uncheckedNodes = []
+  this.minimizedNodes = {}
+}
+
+lunr.TokenSet.Builder.prototype.insert = function (word) {
+  var node,
+      commonPrefix = 0
+
+  if (word < this.previousWord) {
+    throw new Error ("Out of order word insertion")
+  }
+
+  for (var i = 0; i < word.length && i < this.previousWord.length; i++) {
+    if (word[i] != this.previousWord[i]) break
+    commonPrefix++
+  }
+
+  this.minimize(commonPrefix)
+
+  if (this.uncheckedNodes.length == 0) {
+    node = this.root
+  } else {
+    node = this.uncheckedNodes[this.uncheckedNodes.length - 1].child
+  }
+
+  for (var i = commonPrefix; i < word.length; i++) {
+    var nextNode = new lunr.TokenSet,
+        char = word[i]
+
+    node.edges[char] = nextNode
+
+    this.uncheckedNodes.push({
+      parent: node,
+      char: char,
+      child: nextNode
+    })
+
+    node = nextNode
+  }
+
+  node.final = true
+  this.previousWord = word
+}
+
+lunr.TokenSet.Builder.prototype.finish = function () {
+  this.minimize(0)
+}
+
+lunr.TokenSet.Builder.prototype.minimize = function (downTo) {
+  for (var i = this.uncheckedNodes.length - 1; i >= downTo; i--) {
+    var node = this.uncheckedNodes[i],
+        childKey = node.child.toString()
+
+    if (childKey in this.minimizedNodes) {
+      node.parent.edges[node.char] = this.minimizedNodes[childKey]
+    } else {
+      // Cache the key for this node since
+      // we know it can't change anymore
+      node.child._str = childKey
+
+      this.minimizedNodes[childKey] = node.child
+    }
+
+    this.uncheckedNodes.pop()
+  }
+}
+/*!
+ * lunr.Index
+ * Copyright (C) 2018 Oliver Nightingale
+ */
+
+/**
+ * An index contains the built index of all documents and provides a query interface
+ * to the index.
+ *
+ * Usually instances of lunr.Index will not be created using this constructor, instead
+ * lunr.Builder should be used to construct new indexes, or lunr.Index.load should be
+ * used to load previously built and serialized indexes.
+ *
+ * @constructor
+ * @param {Object} attrs - The attributes of the built search index.
+ * @param {Object} attrs.invertedIndex - An index of term/field to document reference.
+ * @param {Object<string, lunr.Vector>} attrs.fieldVectors - Field vectors keyed by field reference.
+ * @param {lunr.TokenSet} attrs.tokenSet - A set of all corpus tokens.
+ * @param {string[]} attrs.fields - The names of indexed document fields.
+ * @param {lunr.Pipeline} attrs.pipeline - The pipeline to use for search terms.
+ */
+lunr.Index = function (attrs) {
+  this.invertedIndex = attrs.invertedIndex
+  this.fieldVectors = attrs.fieldVectors
+  this.tokenSet = attrs.tokenSet
+  this.fields = attrs.fields
+  this.pipeline = attrs.pipeline
+}
+
+/**
+ * A result contains details of a document matching a search query.
+ * @typedef {Object} lunr.Index~Result
+ * @property {string} ref - The reference of the document this result represents.
+ * @property {number} score - A number between 0 and 1 representing how similar this document is to the query.
+ * @property {lunr.MatchData} matchData - Contains metadata about this match including which term(s) caused the match.
+ */
+
+/**
+ * Although lunr provides the ability to create queries using lunr.Query, it also provides a simple
+ * query language which itself is parsed into an instance of lunr.Query.
+ *
+ * For programmatically building queries it is advised to directly use lunr.Query, the query language
+ * is best used for human entered text rather than program generated text.
+ *
+ * At its simplest queries can just be a single term, e.g. `hello`, multiple terms are also supported
+ * and will be combined with OR, e.g. `hello world` will match documents that contain either 'hello'
+ * or 'world', though those that contain both will rank higher in the results.
+ *
+ * Wildcards can be included in terms to match one or more unspecified characters, these wildcards can
+ * be inserted anywhere within the term, and more than one wildcard can exist in a single term. Adding
+ * wildcards will increase the number of documents that will be found but can also have a negative
+ * impact on query performance, especially with wildcards at the beginning of a term.
+ *
+ * Terms can be restricted to specific fields, e.g. `title:hello`, only documents with the term
+ * hello in the title field will match this query. Using a field not present in the index will lead
+ * to an error being thrown.
+ *
+ * Modifiers can also be added to terms, lunr supports edit distance and boost modifiers on terms. A term
+ * boost will make documents matching that term score higher, e.g. `foo^5`. Edit distance is also supported
+ * to provide fuzzy matching, e.g. 'hello~2' will match documents containing 'hello' within an edit distance of 2.
+ * Avoid large values for edit distance to improve query performance.
+ *
+ * To escape special characters the backslash character '\' can be used, this allows searches to include
+ * characters that would normally be considered modifiers, e.g. `foo\~2` will search for a term "foo~2" instead
+ * of attempting to apply a boost of 2 to the search term "foo".
+ *
+ * @typedef {string} lunr.Index~QueryString
+ * @example <caption>Simple single term query</caption>
+ * hello
+ * @example <caption>Multiple term query</caption>
+ * hello world
+ * @example <caption>term scoped to a field</caption>
+ * title:hello
+ * @example <caption>term with a boost of 10</caption>
+ * hello^10
+ * @example <caption>term with an edit distance of 2</caption>
+ * hello~2
+ */
+
+/**
+ * Performs a search against the index using lunr query syntax.
+ *
+ * Results will be returned sorted by their score, the most relevant results
+ * will be returned first.
+ *
+ * For more programmatic querying use lunr.Index#query.
+ *
+ * @param {lunr.Index~QueryString} queryString - A string containing a lunr query.
+ * @throws {lunr.QueryParseError} If the passed query string cannot be parsed.
+ * @returns {lunr.Index~Result[]}
+ */
+lunr.Index.prototype.search = function (queryString) {
+  return this.query(function (query) {
+    var parser = new lunr.QueryParser(queryString, query)
+    parser.parse()
+  })
+}
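+
+// Illustrative example, assuming idx is a built lunr.Index:
+//
+//   idx.search('title:hello^10') // boosted, field-scoped query string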
+
+/**
+ * A query builder callback provides a query object to be used to express
+ * the query to perform on the index.
+ *
+ * @callback lunr.Index~queryBuilder
+ * @param {lunr.Query} query - The query object to build up.
+ * @this lunr.Query
+ */
+
+/**
+ * Performs a query against the index using the yielded lunr.Query object.
+ *
+ * If performing programmatic queries against the index, this method is preferred
+ * over lunr.Index#search so as to avoid the additional query parsing overhead.
+ *
+ * A query object is yielded to the supplied function which should be used to
+ * express the query to be run against the index.
+ *
+ * Note that although this function takes a callback parameter it is _not_ an
+ * asynchronous operation, the callback is just yielded a query object to be
+ * customized.
+ *
+ * @param {lunr.Index~queryBuilder} fn - A function that is used to build the query.
+ * @returns {lunr.Index~Result[]}
+ */
+lunr.Index.prototype.query = function (fn) {
+  // for each query clause
+  // * process terms
+  // * expand terms from token set
+  // * find matching documents and metadata
+  // * get document vectors
+  // * score documents
+
+  var query = new lunr.Query(this.fields),
+      matchingFields = Object.create(null),
+      queryVectors = Object.create(null),
+      termFieldCache = Object.create(null)
+
+  fn.call(query, query)
+
+  for (var i = 0; i < query.clauses.length; i++) {
+    /*
+     * Unless the pipeline has been disabled for this term, which is
+     * the case for terms with wildcards, we need to pass the clause
+     * term through the search pipeline. A pipeline returns an array
+     * of processed terms. Pipeline functions may expand the passed
+     * term, which means we may end up performing multiple index lookups
+     * for a single query term.
+     */
+    var clause = query.clauses[i],
+        terms = null
+
+    if (clause.usePipeline) {
+      terms = this.pipeline.runString(clause.term)
+    } else {
+      terms = [clause.term]
+    }
+
+    for (var m = 0; m < terms.length; m++) {
+      var term = terms[m]
+
+      /*
+       * Each term returned from the pipeline needs to use the same query
+       * clause object, e.g. the same boost and or edit distance. The
+       * simplest way to do this is to re-use the clause object but mutate
+       * its term property.
+       */
+      clause.term = term
+
+      /*
+       * From the term in the clause we create a token set which will then
+       * be used to intersect the indexes token set to get a list of terms
+       * to lookup in the inverted index
+       */
+      var termTokenSet = lunr.TokenSet.fromClause(clause),
+          expandedTerms = this.tokenSet.intersect(termTokenSet).toArray()
+
+      for (var j = 0; j < expandedTerms.length; j++) {
+        /*
+         * For each term get the posting and termIndex, this is required for
+         * building the query vector.
+         */
+        var expandedTerm = expandedTerms[j],
+            posting = this.invertedIndex[expandedTerm],
+            termIndex = posting._index
+
+        for (var k = 0; k < clause.fields.length; k++) {
+          /*
+           * For each field that this query term is scoped by (by default
+           * all fields are in scope) we need to get all the document refs
+           * that have this term in that field.
+           *
+           * The posting is the entry in the invertedIndex for the matching
+           * term from above.
+           */
+          var field = clause.fields[k],
+              fieldPosting = posting[field],
+              matchingDocumentRefs = Object.keys(fieldPosting),
+              termField = expandedTerm + "/" + field
+
+          /*
+           * To support field level boosts a query vector is created per
+           * field. This vector is populated using the termIndex found for
+           * the term and a unit value with the appropriate boost applied.
+           *
+           * If the query vector for this field does not exist yet it needs
+           * to be created.
+           */
+          if (queryVectors[field] === undefined) {
+            queryVectors[field] = new lunr.Vector
+          }
+
+          /*
+           * Using upsert because there could already be an entry in the vector
+           * for the term we are working with. In that case we just add the scores
+           * together.
+           */
+          queryVectors[field].upsert(termIndex, 1 * clause.boost, function (a, b) { return a + b })
+
+          /**
+           * If we've already seen this term, field combo then we've already collected
+           * the matching documents and metadata, no need to go through all that again
+           */
+          if (termFieldCache[termField]) {
+            continue
+          }
+
+          for (var l = 0; l < matchingDocumentRefs.length; l++) {
+            /*
+             * All metadata for this term/field/document triple
+             * are then extracted and collected into an instance
+             * of lunr.MatchData ready to be returned in the query
+             * results
+             */
+            var matchingDocumentRef = matchingDocumentRefs[l],
+                matchingFieldRef = new lunr.FieldRef (matchingDocumentRef, field),
+                metadata = fieldPosting[matchingDocumentRef],
+                fieldMatch
+
+            if ((fieldMatch = matchingFields[matchingFieldRef]) === undefined) {
+              matchingFields[matchingFieldRef] = new lunr.MatchData (expandedTerm, field, metadata)
+            } else {
+              fieldMatch.add(expandedTerm, field, metadata)
+            }
+
+          }
+
+          termFieldCache[termField] = true
+        }
+      }
+    }
+  }
+
+  var matchingFieldRefs = Object.keys(matchingFields),
+      results = [],
+      matches = Object.create(null)
+
+  for (var i = 0; i < matchingFieldRefs.length; i++) {
+    /*
+     * Currently we have document fields that match the query, but we
+     * need to return documents. The matchData and scores are combined
+     * from multiple fields belonging to the same document.
+     *
+     * Scores are calculated by field, using the query vectors created
+     * above, and combined into a final document score using addition.
+     */
+    var fieldRef = lunr.FieldRef.fromString(matchingFieldRefs[i]),
+        docRef = fieldRef.docRef,
+        fieldVector = this.fieldVectors[fieldRef],
+        score = queryVectors[fieldRef.fieldName].similarity(fieldVector),
+        docMatch
+
+    if ((docMatch = matches[docRef]) !== undefined) {
+      docMatch.score += score
+      docMatch.matchData.combine(matchingFields[fieldRef])
+    } else {
+      var match = {
+        ref: docRef,
+        score: score,
+        matchData: matchingFields[fieldRef]
+      }
+      matches[docRef] = match
+      results.push(match)
+    }
+  }
+
+  /*
+   * Sort the results objects by score, highest first.
+   */
+  return results.sort(function (a, b) {
+    return b.score - a.score
+  })
+}
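+
+// Illustrative example, assuming idx is a built lunr.Index; this is
+// equivalent to idx.search('hello world') without the parsing overhead.
+//
+//   idx.query(function (q) {
+//     q.term('hello')
+//     q.term('world')
+//   })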
+
+/**
+ * Prepares the index for JSON serialization.
+ *
+ * The schema for this JSON blob will be described in a
+ * separate JSON schema file.
+ *
+ * @returns {Object}
+ */
+lunr.Index.prototype.toJSON = function () {
+  var invertedIndex = Object.keys(this.invertedIndex)
+    .sort()
+    .map(function (term) {
+      return [term, this.invertedIndex[term]]
+    }, this)
+
+  var fieldVectors = Object.keys(this.fieldVectors)
+    .map(function (ref) {
+      return [ref, this.fieldVectors[ref].toJSON()]
+    }, this)
+
+  return {
+    version: lunr.version,
+    fields: this.fields,
+    fieldVectors: fieldVectors,
+    invertedIndex: invertedIndex,
+    pipeline: this.pipeline.toJSON()
+  }
+}
+
+/**
+ * Loads a previously serialized lunr.Index
+ *
+ * @param {Object} serializedIndex - A previously serialized lunr.Index
+ * @returns {lunr.Index}
+ */
+lunr.Index.load = function (serializedIndex) {
+  var attrs = {},
+      fieldVectors = {},
+      serializedVectors = serializedIndex.fieldVectors,
+      invertedIndex = {},
+      serializedInvertedIndex = serializedIndex.invertedIndex,
+      tokenSetBuilder = new lunr.TokenSet.Builder,
+      pipeline = lunr.Pipeline.load(serializedIndex.pipeline)
+
+  if (serializedIndex.version != lunr.version) {
+    lunr.utils.warn("Version mismatch when loading serialised index. Current version of lunr '" + lunr.version + "' does not match serialized index '" + serializedIndex.version + "'")
+  }
+
+  for (var i = 0; i < serializedVectors.length; i++) {
+    var tuple = serializedVectors[i],
+        ref = tuple[0],
+        elements = tuple[1]
+
+    fieldVectors[ref] = new lunr.Vector(elements)
+  }
+
+  for (var i = 0; i < serializedInvertedIndex.length; i++) {
+    var tuple = serializedInvertedIndex[i],
+        term = tuple[0],
+        posting = tuple[1]
+
+    tokenSetBuilder.insert(term)
+    invertedIndex[term] = posting
+  }
+
+  tokenSetBuilder.finish()
+
+  attrs.fields = serializedIndex.fields
+
+  attrs.fieldVectors = fieldVectors
+  attrs.invertedIndex = invertedIndex
+  attrs.tokenSet = tokenSetBuilder.root
+  attrs.pipeline = pipeline
+
+  return new lunr.Index(attrs)
+}
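+
+// Illustrative example, assuming idx is a built lunr.Index: a round trip
+// through JSON serialisation yields an equivalent, queryable index.
+//
+//   var loaded = lunr.Index.load(JSON.parse(JSON.stringify(idx)))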
+/*!
+ * lunr.Builder
+ * Copyright (C) 2018 Oliver Nightingale
+ */
+
+/**
+ * lunr.Builder performs indexing on a set of documents and
+ * returns instances of lunr.Index ready for querying.
+ *
+ * All configuration of the index is done via the builder, the
+ * fields to index, the document reference, the text processing
+ * pipeline and document scoring parameters are all set on the
+ * builder before indexing.
+ *
+ * @constructor
+ * @property {string} _ref - Internal reference to the document reference field.
+ * @property {string[]} _fields - Internal reference to the document fields to index.
+ * @property {object} invertedIndex - The inverted index maps terms to document fields.
+ * @property {object} fieldTermFrequencies - Keeps track of term frequencies per field for each indexed document.
+ * @property {object} fieldLengths - Keeps track of the length of each field added to the index.
+ * @property {lunr.tokenizer} tokenizer - Function for splitting strings into tokens for indexing.
+ * @property {lunr.Pipeline} pipeline - The pipeline performs text processing on tokens before indexing.
+ * @property {lunr.Pipeline} searchPipeline - A pipeline for processing search terms before querying the index.
+ * @property {number} documentCount - Keeps track of the total number of documents indexed.
+ * @property {number} _b - A parameter to control field length normalization, setting this to 0 disables normalization, 1 fully normalizes field lengths, the default value is 0.75.
+ * @property {number} _k1 - A parameter to control how quickly an increase in term frequency results in term frequency saturation, the default value is 1.2.
+ * @property {number} termIndex - A counter incremented for each unique term, used to identify a terms position in the vector space.
+ * @property {array} metadataWhitelist - A list of metadata keys that have been whitelisted for entry in the index.
+ */
+lunr.Builder = function () {
+  this._ref = "id"
+  this._fields = []
+  this.invertedIndex = Object.create(null)
+  this.fieldTermFrequencies = {}
+  this.fieldLengths = {}
+  this.tokenizer = lunr.tokenizer
+  this.pipeline = new lunr.Pipeline
+  this.searchPipeline = new lunr.Pipeline
+  this.documentCount = 0
+  this._b = 0.75
+  this._k1 = 1.2
+  this.termIndex = 0
+  this.metadataWhitelist = []
+}
+
+/**
+ * Sets the document field used as the document reference. Every document must have this field.
+ * The type of this field in the document should be a string, if it is not a string it will be
+ * coerced into a string by calling toString.
+ *
+ * The default ref is 'id'.
+ *
+ * The ref should _not_ be changed during indexing, it should be set before any documents are
+ * added to the index. Changing it during indexing can lead to inconsistent results.
+ *
+ * @param {string} ref - The name of the reference field in the document.
+ */
+lunr.Builder.prototype.ref = function (ref) {
+  this._ref = ref
+}
+
+/**
+ * Adds a field to the list of document fields that will be indexed. Every document being
+ * indexed should have this field. Null values for this field in indexed documents will
+ * not cause errors but will limit the chance of that document being retrieved by searches.
+ *
+ * All fields should be added before adding documents to the index. Adding fields after
+ * a document has been indexed will have no effect on already indexed documents.
+ *
+ * @param {string} field - The name of a field to index in all documents.
+ */
+lunr.Builder.prototype.field = function (field) {
+  this._fields.push(field)
+}
+
+/**
+ * A parameter to tune the amount of field length normalisation that is applied when
+ * calculating relevance scores. A value of 0 will completely disable any normalisation
+ * and a value of 1 will fully normalise field lengths. The default is 0.75. Values of b
+ * will be clamped to the range 0 - 1.
+ *
+ * @param {number} number - The value to set for this tuning parameter.
+ */
+lunr.Builder.prototype.b = function (number) {
+  if (number < 0) {
+    this._b = 0
+  } else if (number > 1) {
+    this._b = 1
+  } else {
+    this._b = number
+  }
+}
+
+/**
+ * A parameter that controls the speed at which a rise in term frequency results in term
+ * frequency saturation. The default value is 1.2. Setting this to a higher value will give
+ * slower saturation levels, a lower value will result in quicker saturation.
+ *
+ * @param {number} number - The value to set for this tuning parameter.
+ */
+lunr.Builder.prototype.k1 = function (number) {
+  this._k1 = number
+}
+
+/**
+ * Adds a document to the index.
+ *
+ * Before adding fields to the index the index should have been fully setup, with the document
+ * ref and all fields to index already having been specified.
+ *
+ * The document must have a field name as specified by the ref (by default this is 'id') and
+ * it should have all fields defined for indexing, though null or undefined values will not
+ * cause errors.
+ *
+ * @param {object} doc - The document to add to the index.
+ */
+lunr.Builder.prototype.add = function (doc) {
+  var docRef = doc[this._ref]
+
+  this.documentCount += 1
+
+  for (var i = 0; i < this._fields.length; i++) {
+    var fieldName = this._fields[i],
+        field = doc[fieldName],
+        tokens = this.tokenizer(field),
+        terms = this.pipeline.run(tokens),
+        fieldRef = new lunr.FieldRef (docRef, fieldName),
+        fieldTerms = Object.create(null)
+
+    this.fieldTermFrequencies[fieldRef] = fieldTerms
+    this.fieldLengths[fieldRef] = 0
+
+    // store the length of this field for this document
+    this.fieldLengths[fieldRef] += terms.length
+
+    // calculate term frequencies for this field
+    for (var j = 0; j < terms.length; j++) {
+      var term = terms[j]
+
+      if (fieldTerms[term] == undefined) {
+        fieldTerms[term] = 0
+      }
+
+      fieldTerms[term] += 1
+
+      // add to inverted index
+      // create an initial posting if one doesn't exist
+      if (this.invertedIndex[term] == undefined) {
+        var posting = Object.create(null)
+        posting["_index"] = this.termIndex
+        this.termIndex += 1
+
+        for (var k = 0; k < this._fields.length; k++) {
+          posting[this._fields[k]] = Object.create(null)
+        }
+
+        this.invertedIndex[term] = posting
+      }
+
+      // add an entry for this term/fieldName/docRef to the invertedIndex
+      if (this.invertedIndex[term][fieldName][docRef] == undefined) {
+        this.invertedIndex[term][fieldName][docRef] = Object.create(null)
+      }
+
+      // store all whitelisted metadata about this token in the
+      // inverted index
+      for (var l = 0; l < this.metadataWhitelist.length; l++) {
+        var metadataKey = this.metadataWhitelist[l],
+            metadata = term.metadata[metadataKey]
+
+        if (this.invertedIndex[term][fieldName][docRef][metadataKey] == undefined) {
+          this.invertedIndex[term][fieldName][docRef][metadataKey] = []
+        }
+
+        this.invertedIndex[term][fieldName][docRef][metadataKey].push(metadata)
+      }
+    }
+
+  }
+}
+
+/**
+ * Calculates the average document length for this index
+ *
+ * @private
+ */
+lunr.Builder.prototype.calculateAverageFieldLengths = function () {
+
+  var fieldRefs = Object.keys(this.fieldLengths),
+      numberOfFields = fieldRefs.length,
+      accumulator = {},
+      documentsWithField = {}
+
+  for (var i = 0; i < numberOfFields; i++) {
+    var fieldRef = lunr.FieldRef.fromString(fieldRefs[i]),
+        field = fieldRef.fieldName
+
+    documentsWithField[field] || (documentsWithField[field] = 0)
+    documentsWithField[field] += 1
+
+    accumulator[field] || (accumulator[field] = 0)
+    accumulator[field] += this.fieldLengths[fieldRef]
+  }
+
+  for (var i = 0; i < this._fields.length; i++) {
+    var field = this._fields[i]
+    accumulator[field] = accumulator[field] / documentsWithField[field]
+  }
+
+  this.averageFieldLength = accumulator
+}
+
+/**
+ * Builds a vector space model of every document using lunr.Vector
+ *
+ * @private
+ */
+lunr.Builder.prototype.createFieldVectors = function () {
+  var fieldVectors = {},
+      fieldRefs = Object.keys(this.fieldTermFrequencies),
+      fieldRefsLength = fieldRefs.length,
+      termIdfCache = Object.create(null)
+
+  for (var i = 0; i < fieldRefsLength; i++) {
+    var fieldRef = lunr.FieldRef.fromString(fieldRefs[i]),
+        field = fieldRef.fieldName,
+        fieldLength = this.fieldLengths[fieldRef],
+        fieldVector = new lunr.Vector,
+        termFrequencies = this.fieldTermFrequencies[fieldRef],
+        terms = Object.keys(termFrequencies),
+        termsLength = terms.length
+
+    for (var j = 0; j < termsLength; j++) {
+      var term = terms[j],
+          tf = termFrequencies[term],
+          termIndex = this.invertedIndex[term]._index,
+          idf, score, scoreWithPrecision
+
+      if (termIdfCache[term] === undefined) {
+        idf = lunr.idf(this.invertedIndex[term], this.documentCount)
+        termIdfCache[term] = idf
+      } else {
+        idf = termIdfCache[term]
+      }
+
+      score = idf * ((this._k1 + 1) * tf) / (this._k1 * (1 - this._b + this._b * (fieldLength / this.averageFieldLength[field])) + tf)
+      scoreWithPrecision = Math.round(score * 1000) / 1000
+      // Converts 1.23456789 to 1.234.
+      // Reducing the precision so that the vectors take up less
+      // space when serialised. Doing it now so that they behave
+      // the same before and after serialisation. Also, this is
+      // the fastest approach to reducing a number's precision in
+      // JavaScript.
+
+      fieldVector.insert(termIndex, scoreWithPrecision)
+    }
+
+    fieldVectors[fieldRef] = fieldVector
+  }
+
+  this.fieldVectors = fieldVectors
+}
+
+/**
+ * Creates a token set of all tokens in the index using lunr.TokenSet
+ *
+ * @private
+ */
+lunr.Builder.prototype.createTokenSet = function () {
+  this.tokenSet = lunr.TokenSet.fromArray(
+    Object.keys(this.invertedIndex).sort()
+  )
+}
+
+/**
+ * Builds the index, creating an instance of lunr.Index.
+ *
+ * This completes the indexing process and should only be called
+ * once all documents have been added to the index.
+ *
+ * @returns {lunr.Index}
+ */
+lunr.Builder.prototype.build = function () {
+  this.calculateAverageFieldLengths()
+  this.createFieldVectors()
+  this.createTokenSet()
+
+  return new lunr.Index({
+    invertedIndex: this.invertedIndex,
+    fieldVectors: this.fieldVectors,
+    tokenSet: this.tokenSet,
+    fields: this._fields,
+    pipeline: this.searchPipeline
+  })
+}
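+
+// Illustrative example: minimal end-to-end indexing with the builder. The
+// lunr(...) convenience function performs similar setup with default
+// pipelines.
+//
+//   var builder = new lunr.Builder
+//   builder.ref('id')
+//   builder.field('title')
+//   builder.add({ id: '1', title: 'hello world' })
+//   var idx = builder.build()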
+
+/**
+ * Applies a plugin to the index builder.
+ *
+ * A plugin is a function that encapsulates custom behaviour to be applied
+ * when building the index; it can be used to customise or extend the
+ * behaviour of the index in some way.
+ *
+ * The plugin function is called with the index builder as both its context
+ * and its first argument. Any additional arguments passed to use are
+ * forwarded to the plugin after the builder.
+ *
+ * @param {Function} fn The plugin to apply.
+ */
+lunr.Builder.prototype.use = function (fn) {
+  var args = Array.prototype.slice.call(arguments, 1)
+  args.unshift(this)
+  fn.apply(this, args)
+}
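+
+// Illustrative example, assuming builder is a lunr.Builder as above: a
+// plugin that adds the stemmer to the indexing pipeline. Arguments after
+// the plugin are forwarded to it.
+//
+//   builder.use(function (b, lang) {
+//     b.pipeline.add(lunr.stemmer)
+//   }, 'en')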
+/**
+ * Contains and collects metadata about a matching document.
+ * A single instance of lunr.MatchData is returned as part of every
+ * lunr.Index~Result.
+ *
+ * @constructor
+ * @param {string} term - The term this match data is associated with
+ * @param {string} field - The field in which the term was found
+ * @param {object} metadata - The metadata recorded about this term in this field
+ * @property {object} metadata - A cloned collection of metadata associated with this document.
+ * @see {@link lunr.Index~Result}
+ */
+lunr.MatchData = function (term, field, metadata) {
+  var clonedMetadata = Object.create(null),
+      metadataKeys = Object.keys(metadata)
+
+  // Cloning the metadata to prevent the original
+  // being mutated during match data combination.
+  // Metadata is kept in an array within the inverted
+  // index so cloning the data can be done with
+  // Array#slice
+  for (var i = 0; i < metadataKeys.length; i++) {
+    var key = metadataKeys[i]
+    clonedMetadata[key] = metadata[key].slice()
+  }
+
+  this.metadata = Object.create(null)
+  this.metadata[term] = Object.create(null)
+  this.metadata[term][field] = clonedMetadata
+}
+
+/**
+ * An instance of lunr.MatchData will be created for every term that matches a
+ * document. However only one instance is required in a lunr.Index~Result. This
+ * method combines metadata from another instance of lunr.MatchData with this
+ * object's metadata.
+ *
+ * @param {lunr.MatchData} otherMatchData - Another instance of match data to merge with this one.
+ * @see {@link lunr.Index~Result}
+ */
+lunr.MatchData.prototype.combine = function (otherMatchData) {
+  var terms = Object.keys(otherMatchData.metadata)
+
+  for (var i = 0; i < terms.length; i++) {
+    var term = terms[i],
+        fields = Object.keys(otherMatchData.metadata[term])
+
+    if (this.metadata[term] == undefined) {
+      this.metadata[term] = Object.create(null)
+    }
+
+    for (var j = 0; j < fields.length; j++) {
+      var field = fields[j],
+          keys = Object.keys(otherMatchData.metadata[term][field])
+
+      if (this.metadata[term][field] == undefined) {
+        this.metadata[term][field] = Object.create(null)
+      }
+
+      for (var k = 0; k < keys.length; k++) {
+        var key = keys[k]
+
+        if (this.metadata[term][field][key] == undefined) {
+          this.metadata[term][field][key] = otherMatchData.metadata[term][field][key]
+        } else {
+          this.metadata[term][field][key] = this.metadata[term][field][key].concat(otherMatchData.metadata[term][field][key])
+        }
+
+      }
+    }
+  }
+}
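+
+// Illustrative example: metadata values must be arrays, and combining two
+// instances merges their term/field entries.
+//
+//   var a = new lunr.MatchData('foo', 'title', { position: [[0, 3]] })
+//   var b = new lunr.MatchData('bar', 'title', { position: [[5, 3]] })
+//   a.combine(b) // a.metadata now has entries for both 'foo' and 'bar'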
+
+/**
+ * Add metadata for a term/field pair to this instance of match data.
+ *
+ * @param {string} term - The term this match data is associated with
+ * @param {string} field - The field in which the term was found
+ * @param {object} metadata - The metadata recorded about this term in this field
+ */
+lunr.MatchData.prototype.add = function (term, field, metadata) {
+  if (!(term in this.metadata)) {
+    this.metadata[term] = Object.create(null)
+    this.metadata[term][field] = metadata
+    return
+  }
+
+  if (!(field in this.metadata[term])) {
+    this.metadata[term][field] = metadata
+    return
+  }
+
+  var metadataKeys = Object.keys(metadata)
+
+  for (var i = 0; i < metadataKeys.length; i++) {
+    var key = metadataKeys[i]
+
+    if (key in this.metadata[term][field]) {
+      this.metadata[term][field][key] = this.metadata[term][field][key].concat(metadata[key])
+    } else {
+      this.metadata[term][field][key] = metadata[key]
+    }
+  }
+}
+/**
+ * A lunr.Query provides a programmatic way of defining queries to be performed
+ * against a {@link lunr.Index}.
+ *
+ * Prefer constructing a lunr.Query using the {@link lunr.Index#query} method
+ * so the query object is pre-initialized with the right index fields.
+ *
+ * @constructor
+ * @property {lunr.Query~Clause[]} clauses - An array of query clauses.
+ * @property {string[]} allFields - An array of all available fields in a lunr.Index.
+ */
+lunr.Query = function (allFields) {
+  this.clauses = []
+  this.allFields = allFields
+}
+
+/**
+ * Constants for indicating what kind of automatic wildcard insertion will be used when constructing a query clause.
+ *
+ * This allows wildcards to be added to the beginning and end of a term without having to manually do any string
+ * concatenation.
+ *
+ * The wildcard constants can be bitwise combined to select both leading and trailing wildcards.
+ *
+ * @constant
+ * @default
+ * @property {number} wildcard.NONE - The term will have no wildcards inserted, this is the default behaviour
+ * @property {number} wildcard.LEADING - Prepend the term with a wildcard, unless a leading wildcard already exists
+ * @property {number} wildcard.TRAILING - Append a wildcard to the term, unless a trailing wildcard already exists
+ * @see lunr.Query~Clause
+ * @see lunr.Query#clause
+ * @see lunr.Query#term
+ * @example <caption>query term with trailing wildcard</caption>
+ * query.term('foo', { wildcard: lunr.Query.wildcard.TRAILING })
+ * @example <caption>query term with leading and trailing wildcard</caption>
+ * query.term('foo', {
+ *   wildcard: lunr.Query.wildcard.LEADING | lunr.Query.wildcard.TRAILING
+ * })
+ */
+lunr.Query.wildcard = new String ("*")
+lunr.Query.wildcard.NONE = 0
+lunr.Query.wildcard.LEADING = 1
+lunr.Query.wildcard.TRAILING = 2
+
+/**
+ * A single clause in a {@link lunr.Query} contains a term and details on how to
+ * match that term against a {@link lunr.Index}.
+ *
+ * @typedef {Object} lunr.Query~Clause
+ * @property {string[]} fields - The fields in an index this clause should be matched against.
+ * @property {number} [boost=1] - Any boost that should be applied when matching this clause.
+ * @property {number} [editDistance] - Whether the term should have fuzzy matching applied, and how fuzzy the match should be.
+ * @property {boolean} [usePipeline] - Whether the term should be passed through the search pipeline.
+ * @property {number} [wildcard=0] - Whether the term should have wildcards appended or prepended.
+ */
+
+/**
+ * Adds a {@link lunr.Query~Clause} to this query.
+ *
+ * Unless the clause contains the fields to be matched all fields will be matched. In addition
+ * a default boost of 1 is applied to the clause.
+ *
+ * @param {lunr.Query~Clause} clause - The clause to add to this query.
+ * @see lunr.Query~Clause
+ * @returns {lunr.Query}
+ */
+lunr.Query.prototype.clause = function (clause) {
+  if (!('fields' in clause)) {
+    clause.fields = this.allFields
+  }
+
+  if (!('boost' in clause)) {
+    clause.boost = 1
+  }
+
+  if (!('usePipeline' in clause)) {
+    clause.usePipeline = true
+  }
+
+  if (!('wildcard' in clause)) {
+    clause.wildcard = lunr.Query.wildcard.NONE
+  }
+
+  if ((clause.wildcard & lunr.Query.wildcard.LEADING) && (clause.term.charAt(0) != lunr.Query.wildcard)) {
+    clause.term = "*" + clause.term
+  }
+
+  if ((clause.wildcard & lunr.Query.wildcard.TRAILING) && (clause.term.slice(-1) != lunr.Query.wildcard)) {
+    clause.term = "" + clause.term + "*"
+  }
+
+  this.clauses.push(clause)
+
+  return this
+}
+
+/**
+ * Adds a term to the current query, under the covers this will create a {@link lunr.Query~Clause}
+ * and add it to the list of clauses that make up this query.
+ *
+ * @param {string} term - The term to add to the query.
+ * @param {Object} [options] - Any additional properties to add to the query clause.
+ * @returns {lunr.Query}
+ * @see lunr.Query#clause
+ * @see lunr.Query~Clause
+ * @example <caption>adding a single term to a query</caption>
+ * query.term("foo")
+ * @example <caption>adding a single term to a query and specifying search fields, term boost and automatic trailing wildcard</caption>
+ * query.term("foo", {
+ *   fields: ["title"],
+ *   boost: 10,
+ *   wildcard: lunr.Query.wildcard.TRAILING
+ * })
+ */
+lunr.Query.prototype.term = function (term, options) {
+  var clause = options || {}
+  clause.term = term
+
+  this.clause(clause)
+
+  return this
+}
+lunr.QueryParseError = function (message, start, end) {
+  this.name = "QueryParseError"
+  this.message = message
+  this.start = start
+  this.end = end
+}
+
+lunr.QueryParseError.prototype = new Error
+lunr.QueryLexer = function (str) {
+  this.lexemes = []
+  this.str = str
+  this.length = str.length
+  this.pos = 0
+  this.start = 0
+  this.escapeCharPositions = []
+}
+
+lunr.QueryLexer.prototype.run = function () {
+  var state = lunr.QueryLexer.lexText
+
+  while (state) {
+    state = state(this)
+  }
+}
+
+lunr.QueryLexer.prototype.sliceString = function () {
+  var subSlices = [],
+      sliceStart = this.start,
+      sliceEnd = this.pos
+
+  for (var i = 0; i < this.escapeCharPositions.length; i++) {
+    sliceEnd = this.escapeCharPositions[i]
+    subSlices.push(this.str.slice(sliceStart, sliceEnd))
+    sliceStart = sliceEnd + 1
+  }
+
+  subSlices.push(this.str.slice(sliceStart, this.pos))
+  this.escapeCharPositions.length = 0
+
+  return subSlices.join('')
+}
+
+lunr.QueryLexer.prototype.emit = function (type) {
+  this.lexemes.push({
+    type: type,
+    str: this.sliceString(),
+    start: this.start,
+    end: this.pos
+  })
+
+  this.start = this.pos
+}
+
+lunr.QueryLexer.prototype.escapeCharacter = function () {
+  this.escapeCharPositions.push(this.pos - 1)
+  this.pos += 1
+}
+
+lunr.QueryLexer.prototype.next = function () {
+  if (this.pos >= this.length) {
+    return lunr.QueryLexer.EOS
+  }
+
+  var char = this.str.charAt(this.pos)
+  this.pos += 1
+  return char
+}
+
+lunr.QueryLexer.prototype.width = function () {
+  return this.pos - this.start
+}
+
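+// Discards the pending input: moves start up to pos, first skipping one
+// character if nothing has been consumed yet.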
+lunr.QueryLexer.prototype.ignore = function () {
+  if (this.start == this.pos) {
+    this.pos += 1
+  }
+
+  this.start = this.pos
+}
+
+lunr.QueryLexer.prototype.backup = function () {
+  this.pos -= 1
+}
+
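+// Consumes a run of ASCII digits ('0'-'9', char codes 48-57); used by the
+// edit distance and boost states.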
+lunr.QueryLexer.prototype.acceptDigitRun = function () {
+  var char, charCode
+
+  do {
+    char = this.next()
+    charCode = char.charCodeAt(0)
+  } while (charCode > 47 && charCode < 58)
+
+  if (char != lunr.QueryLexer.EOS) {
+    this.backup()
+  }
+}
+
+lunr.QueryLexer.prototype.more = function () {
+  return this.pos < this.length
+}
+
+lunr.QueryLexer.EOS = 'EOS'
+lunr.QueryLexer.FIELD = 'FIELD'
+lunr.QueryLexer.TERM = 'TERM'
+lunr.QueryLexer.EDIT_DISTANCE = 'EDIT_DISTANCE'
+lunr.QueryLexer.BOOST = 'BOOST'
+
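+// lexField is entered after a ':' has been consumed; the characters before
+// the ':' are emitted as a FIELD lexeme and the ':' itself is skipped.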
+lunr.QueryLexer.lexField = function (lexer) {
+  lexer.backup()
+  lexer.emit(lunr.QueryLexer.FIELD)
+  lexer.ignore()
+  return lunr.QueryLexer.lexText
+}
+
+lunr.QueryLexer.lexTerm = function (lexer) {
+  if (lexer.width() > 1) {
+    lexer.backup()
+    lexer.emit(lunr.QueryLexer.TERM)
+  }
+
+  lexer.ignore()
+
+  if (lexer.more()) {
+    return lunr.QueryLexer.lexText
+  }
+}
+
+lunr.QueryLexer.lexEditDistance = function (lexer) {
+  lexer.ignore()
+  lexer.acceptDigitRun()
+  lexer.emit(lunr.QueryLexer.EDIT_DISTANCE)
+  return lunr.QueryLexer.lexText
+}
+
+lunr.QueryLexer.lexBoost = function (lexer) {
+  lexer.ignore()
+  lexer.acceptDigitRun()
+  lexer.emit(lunr.QueryLexer.BOOST)
+  return lunr.QueryLexer.lexText
+}
+
+lunr.QueryLexer.lexEOS = function (lexer) {
+  if (lexer.width() > 0) {
+    lexer.emit(lunr.QueryLexer.TERM)
+  }
+}
+
+// This matches the separator used when tokenising fields
+// within a document. The two should match, otherwise it is
+// not possible to search for some tokens within a document.
+//
+// It is possible for the user to change the separator on the
+// tokenizer, so it _might_ clash with any of the other special
+// characters already used within the search string, e.g. ':'.
+//
+// This means that it is possible to change the separator in
+// such a way that it makes some words unsearchable using a
+// search string.
+lunr.QueryLexer.termSeparator = lunr.tokenizer.separator
+
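+// The default state: scans characters one at a time, switching to the
+// field, edit distance and boost states on ':', '~' and '^' respectively,
+// and emitting a TERM at each separator character.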
+lunr.QueryLexer.lexText = function (lexer) {
+  while (true) {
+    var char = lexer.next()
+
+    if (char == lunr.QueryLexer.EOS) {
+      return lunr.QueryLexer.lexEOS
+    }
+
+    // Escape character is '\'
+    if (char.charCodeAt(0) == 92) {
+      lexer.escapeCharacter()
+      continue
+    }
+
+    if (char == ":") {
+      return lunr.QueryLexer.lexField
+    }
+
+    if (char == "~") {
+      lexer.backup()
+      if (lexer.width() > 0) {
+        lexer.emit(lunr.QueryLexer.TERM)
+      }
+      return lunr.QueryLexer.lexEditDistance
+    }
+
+    if (char == "^") {
+      lexer.backup()
+      if (lexer.width() > 0) {
+        lexer.emit(lunr.QueryLexer.TERM)
+      }
+      return lunr.QueryLexer.lexBoost
+    }
+
+    if (char.match(lunr.QueryLexer.termSeparator)) {
+      return lunr.QueryLexer.lexTerm
+    }
+  }
+}
+
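+/**
+ * lunr.QueryParser drives the QueryLexer over a query string and translates
+ * the resulting lexemes into clauses on the supplied lunr.Query. Like the
+ * lexer it is a state machine: each parse* function returns the next state,
+ * or undefined once the lexeme stream is exhausted.
+ */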
+lunr.QueryParser = function (str, query) {
+  this.lexer = new lunr.QueryLexer (str)
+  this.query = query
+  this.currentClause = {}
+  this.lexemeIdx = 0
+}
+
+lunr.QueryParser.prototype.parse = function () {
+  this.lexer.run()
+  this.lexemes = this.lexer.lexemes
+
+  var state = lunr.QueryParser.parseFieldOrTerm
+
+  while (state) {
+    state = state(this)
+  }
+
+  return this.query
+}
+
+lunr.QueryParser.prototype.peekLexeme = function () {
+  return this.lexemes[this.lexemeIdx]
+}
+
+lunr.QueryParser.prototype.consumeLexeme = function () {
+  var lexeme = this.peekLexeme()
+  this.lexemeIdx += 1
+  return lexeme
+}
+
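+// Pushes the clause currently being built onto the query and starts a
+// fresh one.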
+lunr.QueryParser.prototype.nextClause = function () {
+  var completedClause = this.currentClause
+  this.query.clause(completedClause)
+  this.currentClause = {}
+}
+
+lunr.QueryParser.parseFieldOrTerm = function (parser) {
+  var lexeme = parser.peekLexeme()
+
+  if (lexeme == undefined) {
+    return
+  }
+
+  switch (lexeme.type) {
+    case lunr.QueryLexer.FIELD:
+      return lunr.QueryParser.parseField
+    case lunr.QueryLexer.TERM:
+      return lunr.QueryParser.parseTerm
+    default:
+      var errorMessage = "expected either a field or a term, found " + lexeme.type
+
+      if (lexeme.str.length >= 1) {
+        errorMessage += " with value '" + lexeme.str + "'"
+      }
+
+      throw new lunr.QueryParseError (errorMessage, lexeme.start, lexeme.end)
+  }
+}
+
+lunr.QueryParser.parseField = function (parser) {
+  var lexeme = parser.consumeLexeme()
+
+  if (lexeme == undefined) {
+    return
+  }
+
+  if (parser.query.allFields.indexOf(lexeme.str) == -1) {
+    var possibleFields = parser.query.allFields.map(function (f) { return "'" + f + "'" }).join(', '),
+        errorMessage = "unrecognised field '" + lexeme.str + "', possible fields: " + possibleFields
+
+    throw new lunr.QueryParseError (errorMessage, lexeme.start, lexeme.end)
+  }
+
+  parser.currentClause.fields = [lexeme.str]
+
+  var nextLexeme = parser.peekLexeme()
+
+  if (nextLexeme == undefined) {
+    var errorMessage = "expecting term, found nothing"
+    throw new lunr.QueryParseError (errorMessage, lexeme.start, lexeme.end)
+  }
+
+  switch (nextLexeme.type) {
+    case lunr.QueryLexer.TERM:
+      return lunr.QueryParser.parseTerm
+    default:
+      var errorMessage = "expecting term, found '" + nextLexeme.type + "'"
+      throw new lunr.QueryParseError (errorMessage, nextLexeme.start, nextLexeme.end)
+  }
+}
+
+lunr.QueryParser.parseTerm = function (parser) {
+  var lexeme = parser.consumeLexeme()
+
+  if (lexeme == undefined) {
+    return
+  }
+
+  parser.currentClause.term = lexeme.str.toLowerCase()
+
+  if (lexeme.str.indexOf("*") != -1) {
+    parser.currentClause.usePipeline = false
+  }
+
+  var nextLexeme = parser.peekLexeme()
+
+  if (nextLexeme == undefined) {
+    parser.nextClause()
+    return
+  }
+
+  switch (nextLexeme.type) {
+    case lunr.QueryLexer.TERM:
+      parser.nextClause()
+      return lunr.QueryParser.parseTerm
+    case lunr.QueryLexer.FIELD:
+      parser.nextClause()
+      return lunr.QueryParser.parseField
+    case lunr.QueryLexer.EDIT_DISTANCE:
+      return lunr.QueryParser.parseEditDistance
+    case lunr.QueryLexer.BOOST:
+      return lunr.QueryParser.parseBoost
+    default:
+      var errorMessage = "Unexpected lexeme type '" + nextLexeme.type + "'"
+      throw new lunr.QueryParseError (errorMessage, nextLexeme.start, nextLexeme.end)
+  }
+}
+
+lunr.QueryParser.parseEditDistance = function (parser) {
+  var lexeme = parser.consumeLexeme()
+
+  if (lexeme == undefined) {
+    return
+  }
+
+  var editDistance = parseInt(lexeme.str, 10)
+
+  if (isNaN(editDistance)) {
+    var errorMessage = "edit distance must be numeric"
+    throw new lunr.QueryParseError (errorMessage, lexeme.start, lexeme.end)
+  }
+
+  parser.currentClause.editDistance = editDistance
+
+  var nextLexeme = parser.peekLexeme()
+
+  if (nextLexeme == undefined) {
+    parser.nextClause()
+    return
+  }
+
+  switch (nextLexeme.type) {
+    case lunr.QueryLexer.TERM:
+      parser.nextClause()
+      return lunr.QueryParser.parseTerm
+    case lunr.QueryLexer.FIELD:
+      parser.nextClause()
+      return lunr.QueryParser.parseField
+    case lunr.QueryLexer.EDIT_DISTANCE:
+      return lunr.QueryParser.parseEditDistance
+    case lunr.QueryLexer.BOOST:
+      return lunr.QueryParser.parseBoost
+    default:
+      var errorMessage = "Unexpected lexeme type '" + nextLexeme.type + "'"
+      throw new lunr.QueryParseError (errorMessage, nextLexeme.start, nextLexeme.end)
+  }
+}
+
+lunr.QueryParser.parseBoost = function (parser) {
+  var lexeme = parser.consumeLexeme()
+
+  if (lexeme == undefined) {
+    return
+  }
+
+  var boost = parseInt(lexeme.str, 10)
+
+  if (isNaN(boost)) {
+    var errorMessage = "boost must be numeric"
+    throw new lunr.QueryParseError (errorMessage, lexeme.start, lexeme.end)
+  }
+
+  parser.currentClause.boost = boost
+
+  var nextLexeme = parser.peekLexeme()
+
+  if (nextLexeme == undefined) {
+    parser.nextClause()
+    return
+  }
+
+  switch (nextLexeme.type) {
+    case lunr.QueryLexer.TERM:
+      parser.nextClause()
+      return lunr.QueryParser.parseTerm
+    case lunr.QueryLexer.FIELD:
+      parser.nextClause()
+      return lunr.QueryParser.parseField
+    case lunr.QueryLexer.EDIT_DISTANCE:
+      return lunr.QueryParser.parseEditDistance
+    case lunr.QueryLexer.BOOST:
+      return lunr.QueryParser.parseBoost
+    default:
+      var errorMessage = "Unexpected lexeme type '" + nextLexeme.type + "'"
+      throw new lunr.QueryParseError (errorMessage, nextLexeme.start, nextLexeme.end)
+  }
+}
+
+  /**
+   * export the module via AMD, CommonJS or as a browser global
+   * Export code from https://github.com/umdjs/umd/blob/master/returnExports.js
+   */
+  ;(function (root, factory) {
+    if (typeof define === 'function' && define.amd) {
+      // AMD. Register as an anonymous module.
+      define(factory)
+    } else if (typeof exports === 'object') {
+      /**
+       * Node. Does not work with strict CommonJS, but
+       * only CommonJS-like environments that support module.exports,
+       * like Node.
+       */
+      module.exports = factory()
+    } else {
+      // Browser globals (root is window)
+      root.lunr = factory()
+    }
+  }(this, function () {
+    /**
+     * Just return a value to define the module export.
+     * Here that value is the lunr function itself.
+     */
+    return lunr
+  }))
+})();
diff --git a/docs/search/lunr.min.js b/docs/search/lunr.min.js
deleted file mode 100644
index b0198dff91fb2d8a6ceebe8c704c3ceb1ea49f01..0000000000000000000000000000000000000000
--- a/docs/search/lunr.min.js
+++ /dev/null
@@ -1,7 +0,0 @@
-/**
- * lunr - http://lunrjs.com - A bit like Solr, but much smaller and not as bright - 0.7.0
- * Copyright (C) 2016 Oliver Nightingale
- * MIT Licensed
- * @license
- */
-!function(){var t=function(e){var n=new t.Index;return n.pipeline.add(t.trimmer,t.stopWordFilter,t.stemmer),e&&e.call(n,n),n};t.version="0.7.0",t.utils={},t.utils.warn=function(t){return function(e){t.console&&console.warn&&console.warn(e)}}(this),t.utils.asString=function(t){return void 0===t||null===t?"":t.toString()},t.EventEmitter=function(){this.events={}},t.EventEmitter.prototype.addListener=function(){var t=Array.prototype.slice.call(arguments),e=t.pop(),n=t;if("function"!=typeof e)throw new TypeError("last argument must be a function");n.forEach(function(t){this.hasHandler(t)||(this.events[t]=[]),this.events[t].push(e)},this)},t.EventEmitter.prototype.removeListener=function(t,e){if(this.hasHandler(t)){var n=this.events[t].indexOf(e);this.events[t].splice(n,1),this.events[t].length||delete this.events[t]}},t.EventEmitter.prototype.emit=function(t){if(this.hasHandler(t)){var e=Array.prototype.slice.call(arguments,1);this.events[t].forEach(function(t){t.apply(void 0,e)})}},t.EventEmitter.prototype.hasHandler=function(t){return t in this.events},t.tokenizer=function(e){return arguments.length&&null!=e&&void 0!=e?Array.isArray(e)?e.map(function(e){return t.utils.asString(e).toLowerCase()}):e.toString().trim().toLowerCase().split(t.tokenizer.seperator):[]},t.tokenizer.seperator=/[\s\-]+/,t.tokenizer.load=function(t){var e=this.registeredFunctions[t];if(!e)throw new Error("Cannot load un-registered function: "+t);return e},t.tokenizer.label="default",t.tokenizer.registeredFunctions={"default":t.tokenizer},t.tokenizer.registerFunction=function(e,n){n in this.registeredFunctions&&t.utils.warn("Overwriting existing tokenizer: "+n),e.label=n,this.registeredFunctions[n]=e},t.Pipeline=function(){this._stack=[]},t.Pipeline.registeredFunctions={},t.Pipeline.registerFunction=function(e,n){n in this.registeredFunctions&&t.utils.warn("Overwriting existing registered function: "+n),e.label=n,t.Pipeline.registeredFunctions[e.label]=e},t.Pipeline.warnIfFunctionNotRegistered=function(e){var n=e.label&&e.label in this.registeredFunctions;n||t.utils.warn("Function is not registered with pipeline. 
This may cause problems when serialising the index.\n",e)},t.Pipeline.load=function(e){var n=new t.Pipeline;return e.forEach(function(e){var i=t.Pipeline.registeredFunctions[e];if(!i)throw new Error("Cannot load un-registered function: "+e);n.add(i)}),n},t.Pipeline.prototype.add=function(){var e=Array.prototype.slice.call(arguments);e.forEach(function(e){t.Pipeline.warnIfFunctionNotRegistered(e),this._stack.push(e)},this)},t.Pipeline.prototype.after=function(e,n){t.Pipeline.warnIfFunctionNotRegistered(n);var i=this._stack.indexOf(e);if(-1==i)throw new Error("Cannot find existingFn");i+=1,this._stack.splice(i,0,n)},t.Pipeline.prototype.before=function(e,n){t.Pipeline.warnIfFunctionNotRegistered(n);var i=this._stack.indexOf(e);if(-1==i)throw new Error("Cannot find existingFn");this._stack.splice(i,0,n)},t.Pipeline.prototype.remove=function(t){var e=this._stack.indexOf(t);-1!=e&&this._stack.splice(e,1)},t.Pipeline.prototype.run=function(t){for(var e=[],n=t.length,i=this._stack.length,r=0;n>r;r++){for(var o=t[r],s=0;i>s&&(o=this._stack[s](o,r,t),void 0!==o&&""!==o);s++);void 0!==o&&""!==o&&e.push(o)}return e},t.Pipeline.prototype.reset=function(){this._stack=[]},t.Pipeline.prototype.toJSON=function(){return this._stack.map(function(e){return t.Pipeline.warnIfFunctionNotRegistered(e),e.label})},t.Vector=function(){this._magnitude=null,this.list=void 0,this.length=0},t.Vector.Node=function(t,e,n){this.idx=t,this.val=e,this.next=n},t.Vector.prototype.insert=function(e,n){this._magnitude=void 0;var i=this.list;if(!i)return this.list=new t.Vector.Node(e,n,i),this.length++;if(e<i.idx)return this.list=new t.Vector.Node(e,n,i),this.length++;for(var r=i,o=i.next;void 0!=o;){if(e<o.idx)return r.next=new t.Vector.Node(e,n,o),this.length++;r=o,o=o.next}return r.next=new t.Vector.Node(e,n,o),this.length++},t.Vector.prototype.magnitude=function(){if(this._magnitude)return this._magnitude;for(var t,e=this.list,n=0;e;)t=e.val,n+=t*t,e=e.next;return this._magnitude=Math.sqrt(n)},t.Vector.prototype.dot=function(t){for(var e=this.list,n=t.list,i=0;e&&n;)e.idx<n.idx?e=e.next:e.idx>n.idx?n=n.next:(i+=e.val*n.val,e=e.next,n=n.next);return i},t.Vector.prototype.similarity=function(t){return this.dot(t)/(this.magnitude()*t.magnitude())},t.SortedSet=function(){this.length=0,this.elements=[]},t.SortedSet.load=function(t){var e=new this;return e.elements=t,e.length=t.length,e},t.SortedSet.prototype.add=function(){var t,e;for(t=0;t<arguments.length;t++)e=arguments[t],~this.indexOf(e)||this.elements.splice(this.locationFor(e),0,e);this.length=this.elements.length},t.SortedSet.prototype.toArray=function(){return this.elements.slice()},t.SortedSet.prototype.map=function(t,e){return this.elements.map(t,e)},t.SortedSet.prototype.forEach=function(t,e){return this.elements.forEach(t,e)},t.SortedSet.prototype.indexOf=function(t){for(var e=0,n=this.elements.length,i=n-e,r=e+Math.floor(i/2),o=this.elements[r];i>1;){if(o===t)return r;t>o&&(e=r),o>t&&(n=r),i=n-e,r=e+Math.floor(i/2),o=this.elements[r]}return o===t?r:-1},t.SortedSet.prototype.locationFor=function(t){for(var e=0,n=this.elements.length,i=n-e,r=e+Math.floor(i/2),o=this.elements[r];i>1;)t>o&&(e=r),o>t&&(n=r),i=n-e,r=e+Math.floor(i/2),o=this.elements[r];return o>t?r:t>o?r+1:void 0},t.SortedSet.prototype.intersect=function(e){for(var n=new t.SortedSet,i=0,r=0,o=this.length,s=e.length,a=this.elements,h=e.elements;;){if(i>o-1||r>s-1)break;a[i]!==h[r]?a[i]<h[r]?i++:a[i]>h[r]&&r++:(n.add(a[i]),i++,r++)}return n},t.SortedSet.prototype.clone=function(){var e=new 
t.SortedSet;return e.elements=this.toArray(),e.length=e.elements.length,e},t.SortedSet.prototype.union=function(t){var e,n,i;this.length>=t.length?(e=this,n=t):(e=t,n=this),i=e.clone();for(var r=0,o=n.toArray();r<o.length;r++)i.add(o[r]);return i},t.SortedSet.prototype.toJSON=function(){return this.toArray()},t.Index=function(){this._fields=[],this._ref="id",this.pipeline=new t.Pipeline,this.documentStore=new t.Store,this.tokenStore=new t.TokenStore,this.corpusTokens=new t.SortedSet,this.eventEmitter=new t.EventEmitter,this.tokenizerFn=t.tokenizer,this._idfCache={},this.on("add","remove","update",function(){this._idfCache={}}.bind(this))},t.Index.prototype.on=function(){var t=Array.prototype.slice.call(arguments);return this.eventEmitter.addListener.apply(this.eventEmitter,t)},t.Index.prototype.off=function(t,e){return this.eventEmitter.removeListener(t,e)},t.Index.load=function(e){e.version!==t.version&&t.utils.warn("version mismatch: current "+t.version+" importing "+e.version);var n=new this;return n._fields=e.fields,n._ref=e.ref,n.tokenizer=t.tokenizer.load(e.tokenizer),n.documentStore=t.Store.load(e.documentStore),n.tokenStore=t.TokenStore.load(e.tokenStore),n.corpusTokens=t.SortedSet.load(e.corpusTokens),n.pipeline=t.Pipeline.load(e.pipeline),n},t.Index.prototype.field=function(t,e){var e=e||{},n={name:t,boost:e.boost||1};return this._fields.push(n),this},t.Index.prototype.ref=function(t){return this._ref=t,this},t.Index.prototype.tokenizer=function(e){var n=e.label&&e.label in t.tokenizer.registeredFunctions;return n||t.utils.warn("Function is not a registered tokenizer. This may cause problems when serialising the index"),this.tokenizerFn=e,this},t.Index.prototype.add=function(e,n){var i={},r=new t.SortedSet,o=e[this._ref],n=void 0===n?!0:n;this._fields.forEach(function(t){var n=this.pipeline.run(this.tokenizerFn(e[t.name]));i[t.name]=n;for(var o=0;o<n.length;o++){var s=n[o];r.add(s),this.corpusTokens.add(s)}},this),this.documentStore.set(o,r);for(var s=0;s<r.length;s++){for(var a=r.elements[s],h=0,u=0;u<this._fields.length;u++){var l=this._fields[u],c=i[l.name],f=c.length;if(f){for(var d=0,p=0;f>p;p++)c[p]===a&&d++;h+=d/f*l.boost}}this.tokenStore.add(a,{ref:o,tf:h})}n&&this.eventEmitter.emit("add",e,this)},t.Index.prototype.remove=function(t,e){var n=t[this._ref],e=void 0===e?!0:e;if(this.documentStore.has(n)){var i=this.documentStore.get(n);this.documentStore.remove(n),i.forEach(function(t){this.tokenStore.remove(t,n)},this),e&&this.eventEmitter.emit("remove",t,this)}},t.Index.prototype.update=function(t,e){var e=void 0===e?!0:e;this.remove(t,!1),this.add(t,!1),e&&this.eventEmitter.emit("update",t,this)},t.Index.prototype.idf=function(t){var e="@"+t;if(Object.prototype.hasOwnProperty.call(this._idfCache,e))return this._idfCache[e];var n=this.tokenStore.count(t),i=1;return n>0&&(i=1+Math.log(this.documentStore.length/n)),this._idfCache[e]=i},t.Index.prototype.search=function(e){var n=this.pipeline.run(this.tokenizerFn(e)),i=new t.Vector,r=[],o=this._fields.reduce(function(t,e){return t+e.boost},0),s=n.some(function(t){return this.tokenStore.has(t)},this);if(!s)return[];n.forEach(function(e,n,s){var a=1/s.length*this._fields.length*o,h=this,u=this.tokenStore.expand(e).reduce(function(n,r){var o=h.corpusTokens.indexOf(r),s=h.idf(r),u=1,l=new t.SortedSet;if(r!==e){var c=Math.max(3,r.length-e.length);u=1/Math.log(c)}o>-1&&i.insert(o,a*s*u);for(var f=h.tokenStore.get(r),d=Object.keys(f),p=d.length,v=0;p>v;v++)l.add(f[d[v]].ref);return n.union(l)},new t.SortedSet);r.push(u)},this);var 
a=r.reduce(function(t,e){return t.intersect(e)});return a.map(function(t){return{ref:t,score:i.similarity(this.documentVector(t))}},this).sort(function(t,e){return e.score-t.score})},t.Index.prototype.documentVector=function(e){for(var n=this.documentStore.get(e),i=n.length,r=new t.Vector,o=0;i>o;o++){var s=n.elements[o],a=this.tokenStore.get(s)[e].tf,h=this.idf(s);r.insert(this.corpusTokens.indexOf(s),a*h)}return r},t.Index.prototype.toJSON=function(){return{version:t.version,fields:this._fields,ref:this._ref,tokenizer:this.tokenizerFn.label,documentStore:this.documentStore.toJSON(),tokenStore:this.tokenStore.toJSON(),corpusTokens:this.corpusTokens.toJSON(),pipeline:this.pipeline.toJSON()}},t.Index.prototype.use=function(t){var e=Array.prototype.slice.call(arguments,1);e.unshift(this),t.apply(this,e)},t.Store=function(){this.store={},this.length=0},t.Store.load=function(e){var n=new this;return n.length=e.length,n.store=Object.keys(e.store).reduce(function(n,i){return n[i]=t.SortedSet.load(e.store[i]),n},{}),n},t.Store.prototype.set=function(t,e){this.has(t)||this.length++,this.store[t]=e},t.Store.prototype.get=function(t){return this.store[t]},t.Store.prototype.has=function(t){return t in this.store},t.Store.prototype.remove=function(t){this.has(t)&&(delete this.store[t],this.length--)},t.Store.prototype.toJSON=function(){return{store:this.store,length:this.length}},t.stemmer=function(){var t={ational:"ate",tional:"tion",enci:"ence",anci:"ance",izer:"ize",bli:"ble",alli:"al",entli:"ent",eli:"e",ousli:"ous",ization:"ize",ation:"ate",ator:"ate",alism:"al",iveness:"ive",fulness:"ful",ousness:"ous",aliti:"al",iviti:"ive",biliti:"ble",logi:"log"},e={icate:"ic",ative:"",alize:"al",iciti:"ic",ical:"ic",ful:"",ness:""},n="[^aeiou]",i="[aeiouy]",r=n+"[^aeiouy]*",o=i+"[aeiou]*",s="^("+r+")?"+o+r,a="^("+r+")?"+o+r+"("+o+")?$",h="^("+r+")?"+o+r+o+r,u="^("+r+")?"+i,l=new RegExp(s),c=new RegExp(h),f=new RegExp(a),d=new RegExp(u),p=/^(.+?)(ss|i)es$/,v=/^(.+?)([^s])s$/,g=/^(.+?)eed$/,m=/^(.+?)(ed|ing)$/,y=/.$/,S=/(at|bl|iz)$/,w=new RegExp("([^aeiouylsz])\\1$"),k=new RegExp("^"+r+i+"[^aeiouwxy]$"),x=/^(.+?[^aeiou])y$/,b=/^(.+?)(ational|tional|enci|anci|izer|bli|alli|entli|eli|ousli|ization|ation|ator|alism|iveness|fulness|ousness|aliti|iviti|biliti|logi)$/,E=/^(.+?)(icate|ative|alize|iciti|ical|ful|ness)$/,F=/^(.+?)(al|ance|ence|er|ic|able|ible|ant|ement|ment|ent|ou|ism|ate|iti|ous|ive|ize)$/,_=/^(.+?)(s|t)(ion)$/,z=/^(.+?)e$/,O=/ll$/,P=new RegExp("^"+r+i+"[^aeiouwxy]$"),T=function(n){var i,r,o,s,a,h,u;if(n.length<3)return n;if(o=n.substr(0,1),"y"==o&&(n=o.toUpperCase()+n.substr(1)),s=p,a=v,s.test(n)?n=n.replace(s,"$1$2"):a.test(n)&&(n=n.replace(a,"$1$2")),s=g,a=m,s.test(n)){var T=s.exec(n);s=l,s.test(T[1])&&(s=y,n=n.replace(s,""))}else if(a.test(n)){var T=a.exec(n);i=T[1],a=d,a.test(i)&&(n=i,a=S,h=w,u=k,a.test(n)?n+="e":h.test(n)?(s=y,n=n.replace(s,"")):u.test(n)&&(n+="e"))}if(s=x,s.test(n)){var T=s.exec(n);i=T[1],n=i+"i"}if(s=b,s.test(n)){var T=s.exec(n);i=T[1],r=T[2],s=l,s.test(i)&&(n=i+t[r])}if(s=E,s.test(n)){var T=s.exec(n);i=T[1],r=T[2],s=l,s.test(i)&&(n=i+e[r])}if(s=F,a=_,s.test(n)){var T=s.exec(n);i=T[1],s=c,s.test(i)&&(n=i)}else if(a.test(n)){var T=a.exec(n);i=T[1]+T[2],a=c,a.test(i)&&(n=i)}if(s=z,s.test(n)){var T=s.exec(n);i=T[1],s=c,a=f,h=P,(s.test(i)||a.test(i)&&!h.test(i))&&(n=i)}return s=O,a=c,s.test(n)&&a.test(n)&&(s=y,n=n.replace(s,"")),"y"==o&&(n=o.toLowerCase()+n.substr(1)),n};return T}(),t.Pipeline.registerFunction(t.stemmer,"stemmer"),t.generateStopWordFilter=function(t){var 
e=t.reduce(function(t,e){return t[e]=e,t},{});return function(t){return t&&e[t]!==t?t:void 0}},t.stopWordFilter=t.generateStopWordFilter(["a","able","about","across","after","all","almost","also","am","among","an","and","any","are","as","at","be","because","been","but","by","can","cannot","could","dear","did","do","does","either","else","ever","every","for","from","get","got","had","has","have","he","her","hers","him","his","how","however","i","if","in","into","is","it","its","just","least","let","like","likely","may","me","might","most","must","my","neither","no","nor","not","of","off","often","on","only","or","other","our","own","rather","said","say","says","she","should","since","so","some","than","that","the","their","them","then","there","these","they","this","tis","to","too","twas","us","wants","was","we","were","what","when","where","which","while","who","whom","why","will","with","would","yet","you","your"]),t.Pipeline.registerFunction(t.stopWordFilter,"stopWordFilter"),t.trimmer=function(t){return t.replace(/^\W+/,"").replace(/\W+$/,"")},t.Pipeline.registerFunction(t.trimmer,"trimmer"),t.TokenStore=function(){this.root={docs:{}},this.length=0},t.TokenStore.load=function(t){var e=new this;return e.root=t.root,e.length=t.length,e},t.TokenStore.prototype.add=function(t,e,n){var n=n||this.root,i=t.charAt(0),r=t.slice(1);return i in n||(n[i]={docs:{}}),0===r.length?(n[i].docs[e.ref]=e,void(this.length+=1)):this.add(r,e,n[i])},t.TokenStore.prototype.has=function(t){if(!t)return!1;for(var e=this.root,n=0;n<t.length;n++){if(!e[t.charAt(n)])return!1;e=e[t.charAt(n)]}return!0},t.TokenStore.prototype.getNode=function(t){if(!t)return{};for(var e=this.root,n=0;n<t.length;n++){if(!e[t.charAt(n)])return{};e=e[t.charAt(n)]}return e},t.TokenStore.prototype.get=function(t,e){return this.getNode(t,e).docs||{}},t.TokenStore.prototype.count=function(t,e){return Object.keys(this.get(t,e)).length},t.TokenStore.prototype.remove=function(t,e){if(t){for(var n=this.root,i=0;i<t.length;i++){if(!(t.charAt(i)in n))return;n=n[t.charAt(i)]}delete n.docs[e]}},t.TokenStore.prototype.expand=function(t,e){var n=this.getNode(t),i=n.docs||{},e=e||[];return Object.keys(i).length&&e.push(t),Object.keys(n).forEach(function(n){"docs"!==n&&e.concat(this.expand(t+n,e))},this),e},t.TokenStore.prototype.toJSON=function(){return{root:this.root,length:this.length}},function(t,e){"function"==typeof define&&define.amd?define(e):"object"==typeof exports?module.exports=e():t.lunr=e()}(this,function(){return t})}();
diff --git a/docs/search/main.js b/docs/search/main.js
new file mode 100644
index 0000000000000000000000000000000000000000..0a82ab56a6a378734d0354e8e59147951a829b3e
--- /dev/null
+++ b/docs/search/main.js
@@ -0,0 +1,103 @@
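+// Search front-end for the docs: reads the initial query from the URL,
+// delegates indexing and searching to search/worker.js, and renders results.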
+function getSearchTermFromLocation() {
+  var sPageURL = window.location.search.substring(1);
+  var sURLVariables = sPageURL.split('&');
+  for (var i = 0; i < sURLVariables.length; i++) {
+    var sParameterName = sURLVariables[i].split('=');
+    if (sParameterName[0] == 'q') {
+      return decodeURIComponent(sParameterName[1].replace(/\+/g, '%20'));
+    }
+  }
+}
+
+function joinUrl (base, path) {
+  if (path.substring(0, 1) === "/") {
+    // path starts with `/`. Thus it is absolute.
+    return path;
+  }
+  if (base.substring(base.length-1) === "/") {
+    // base ends with `/`
+    return base + path;
+  }
+  return base + "/" + path;
+}
+
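+// Render a single search hit as an HTML string. base_url is assumed to be
+// a global injected by the page template.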
+function formatResult (location, title, summary) {
+  return '<article><h3><a href="' + joinUrl(base_url, location) + '">'+ title + '</a></h3><p>' + summary +'</p></article>';
+}
+
+function displayResults (results) {
+  var search_results = document.getElementById("mkdocs-search-results");
+  while (search_results.firstChild) {
+    search_results.removeChild(search_results.firstChild);
+  }
+  if (results.length > 0){
+    for (var i=0; i < results.length; i++){
+      var result = results[i];
+      var html = formatResult(result.location, result.title, result.summary);
+      search_results.insertAdjacentHTML('beforeend', html);
+    }
+  } else {
+    search_results.insertAdjacentHTML('beforeend', "<p>No results found</p>");
+  }
+}
+
+function doSearch () {
+  var query = document.getElementById('mkdocs-search-query').value;
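+  // Only search once the query is longer than two characters.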
+  if (query.length > 2) {
+    if (!window.Worker) {
+      displayResults(search(query));
+    } else {
+      searchWorker.postMessage({query: query});
+    }
+  } else {
+    // Clear results for short queries
+    displayResults([]);
+  }
+}
+
+function initSearch () {
+  var search_input = document.getElementById('mkdocs-search-query');
+  if (search_input) {
+    search_input.addEventListener("keyup", doSearch);
+  }
+  var term = getSearchTermFromLocation();
+  if (term) {
+    search_input.value = term;
+    doSearch();
+  }
+}
+
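+// Handles messages from the search worker: an initial {allowSearch: true}
+// handshake enables the UI, subsequent messages carry result lists.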
+function onWorkerMessage (e) {
+  if (e.data.allowSearch) {
+    initSearch();
+  } else if (e.data.results) {
+    var results = e.data.results;
+    displayResults(results);
+  }
+}
+
+if (!window.Worker) {
+  console.log('Web Worker API not supported');
+  // Load the search index in the main thread; worker.js is expected to define init() and search().
+  $.getScript(joinUrl(base_url, "search/worker.js")).done(function () {
+    console.log('Loaded worker');
+    init();
+    window.postMessage = function (msg) {
+      onWorkerMessage({data: msg});
+    };
+  }).fail(function (jqxhr, settings, exception) {
+    console.error('Could not load worker.js');
+  });
+} else {
+  // Wrap search in a web worker
+  var searchWorker = new Worker(joinUrl(base_url, "search/worker.js"));
+  searchWorker.postMessage({init: true});
+  searchWorker.onmessage = onWorkerMessage;
+}
diff --git a/docs/search/mustache.min.js b/docs/search/mustache.min.js
deleted file mode 100644
index 7fc6da86b89b8574a9e41ca74bd553de8043d7ed..0000000000000000000000000000000000000000
--- a/docs/search/mustache.min.js
+++ /dev/null
@@ -1 +0,0 @@
-(function(global,factory){if(typeof exports==="object"&&exports){factory(exports)}else if(typeof define==="function"&&define.amd){define(["exports"],factory)}else{factory(global.Mustache={})}})(this,function(mustache){var Object_toString=Object.prototype.toString;var isArray=Array.isArray||function(object){return Object_toString.call(object)==="[object Array]"};function isFunction(object){return typeof object==="function"}function escapeRegExp(string){return string.replace(/[\-\[\]{}()*+?.,\\\^$|#\s]/g,"\\$&")}var RegExp_test=RegExp.prototype.test;function testRegExp(re,string){return RegExp_test.call(re,string)}var nonSpaceRe=/\S/;function isWhitespace(string){return!testRegExp(nonSpaceRe,string)}var entityMap={"&":"&amp;","<":"&lt;",">":"&gt;",'"':"&quot;","'":"&#39;","/":"&#x2F;"};function escapeHtml(string){return String(string).replace(/[&<>"'\/]/g,function(s){return entityMap[s]})}var whiteRe=/\s*/;var spaceRe=/\s+/;var equalsRe=/\s*=/;var curlyRe=/\s*\}/;var tagRe=/#|\^|\/|>|\{|&|=|!/;function parseTemplate(template,tags){if(!template)return[];var sections=[];var tokens=[];var spaces=[];var hasTag=false;var nonSpace=false;function stripSpace(){if(hasTag&&!nonSpace){while(spaces.length)delete tokens[spaces.pop()]}else{spaces=[]}hasTag=false;nonSpace=false}var openingTagRe,closingTagRe,closingCurlyRe;function compileTags(tags){if(typeof tags==="string")tags=tags.split(spaceRe,2);if(!isArray(tags)||tags.length!==2)throw new Error("Invalid tags: "+tags);openingTagRe=new RegExp(escapeRegExp(tags[0])+"\\s*");closingTagRe=new RegExp("\\s*"+escapeRegExp(tags[1]));closingCurlyRe=new RegExp("\\s*"+escapeRegExp("}"+tags[1]))}compileTags(tags||mustache.tags);var scanner=new Scanner(template);var start,type,value,chr,token,openSection;while(!scanner.eos()){start=scanner.pos;value=scanner.scanUntil(openingTagRe);if(value){for(var i=0,valueLength=value.length;i<valueLength;++i){chr=value.charAt(i);if(isWhitespace(chr)){spaces.push(tokens.length)}else{nonSpace=true}tokens.push(["text",chr,start,start+1]);start+=1;if(chr==="\n")stripSpace()}}if(!scanner.scan(openingTagRe))break;hasTag=true;type=scanner.scan(tagRe)||"name";scanner.scan(whiteRe);if(type==="="){value=scanner.scanUntil(equalsRe);scanner.scan(equalsRe);scanner.scanUntil(closingTagRe)}else if(type==="{"){value=scanner.scanUntil(closingCurlyRe);scanner.scan(curlyRe);scanner.scanUntil(closingTagRe);type="&"}else{value=scanner.scanUntil(closingTagRe)}if(!scanner.scan(closingTagRe))throw new Error("Unclosed tag at "+scanner.pos);token=[type,value,start,scanner.pos];tokens.push(token);if(type==="#"||type==="^"){sections.push(token)}else if(type==="/"){openSection=sections.pop();if(!openSection)throw new Error('Unopened section "'+value+'" at '+start);if(openSection[1]!==value)throw new Error('Unclosed section "'+openSection[1]+'" at '+start)}else if(type==="name"||type==="{"||type==="&"){nonSpace=true}else if(type==="="){compileTags(value)}}openSection=sections.pop();if(openSection)throw new Error('Unclosed section "'+openSection[1]+'" at '+scanner.pos);return nestTokens(squashTokens(tokens))}function squashTokens(tokens){var squashedTokens=[];var token,lastToken;for(var i=0,numTokens=tokens.length;i<numTokens;++i){token=tokens[i];if(token){if(token[0]==="text"&&lastToken&&lastToken[0]==="text"){lastToken[1]+=token[1];lastToken[3]=token[3]}else{squashedTokens.push(token);lastToken=token}}}return squashedTokens}function nestTokens(tokens){var nestedTokens=[];var collector=nestedTokens;var sections=[];var token,section;for(var 
i=0,numTokens=tokens.length;i<numTokens;++i){token=tokens[i];switch(token[0]){case"#":case"^":collector.push(token);sections.push(token);collector=token[4]=[];break;case"/":section=sections.pop();section[5]=token[2];collector=sections.length>0?sections[sections.length-1][4]:nestedTokens;break;default:collector.push(token)}}return nestedTokens}function Scanner(string){this.string=string;this.tail=string;this.pos=0}Scanner.prototype.eos=function(){return this.tail===""};Scanner.prototype.scan=function(re){var match=this.tail.match(re);if(!match||match.index!==0)return"";var string=match[0];this.tail=this.tail.substring(string.length);this.pos+=string.length;return string};Scanner.prototype.scanUntil=function(re){var index=this.tail.search(re),match;switch(index){case-1:match=this.tail;this.tail="";break;case 0:match="";break;default:match=this.tail.substring(0,index);this.tail=this.tail.substring(index)}this.pos+=match.length;return match};function Context(view,parentContext){this.view=view;this.cache={".":this.view};this.parent=parentContext}Context.prototype.push=function(view){return new Context(view,this)};Context.prototype.lookup=function(name){var cache=this.cache;var value;if(name in cache){value=cache[name]}else{var context=this,names,index,lookupHit=false;while(context){if(name.indexOf(".")>0){value=context.view;names=name.split(".");index=0;while(value!=null&&index<names.length){if(index===names.length-1&&value!=null)lookupHit=typeof value==="object"&&value.hasOwnProperty(names[index]);value=value[names[index++]]}}else if(context.view!=null&&typeof context.view==="object"){value=context.view[name];lookupHit=context.view.hasOwnProperty(name)}if(lookupHit)break;context=context.parent}cache[name]=value}if(isFunction(value))value=value.call(this.view);return value};function Writer(){this.cache={}}Writer.prototype.clearCache=function(){this.cache={}};Writer.prototype.parse=function(template,tags){var cache=this.cache;var tokens=cache[template];if(tokens==null)tokens=cache[template]=parseTemplate(template,tags);return tokens};Writer.prototype.render=function(template,view,partials){var tokens=this.parse(template);var context=view instanceof Context?view:new Context(view);return this.renderTokens(tokens,context,partials,template)};Writer.prototype.renderTokens=function(tokens,context,partials,originalTemplate){var buffer="";var token,symbol,value;for(var i=0,numTokens=tokens.length;i<numTokens;++i){value=undefined;token=tokens[i];symbol=token[0];if(symbol==="#")value=this._renderSection(token,context,partials,originalTemplate);else if(symbol==="^")value=this._renderInverted(token,context,partials,originalTemplate);else if(symbol===">")value=this._renderPartial(token,context,partials,originalTemplate);else if(symbol==="&")value=this._unescapedValue(token,context);else if(symbol==="name")value=this._escapedValue(token,context);else if(symbol==="text")value=this._rawValue(token);if(value!==undefined)buffer+=value}return buffer};Writer.prototype._renderSection=function(token,context,partials,originalTemplate){var self=this;var buffer="";var value=context.lookup(token[1]);function subRender(template){return self.render(template,context,partials)}if(!value)return;if(isArray(value)){for(var j=0,valueLength=value.length;j<valueLength;++j){buffer+=this.renderTokens(token[4],context.push(value[j]),partials,originalTemplate)}}else if(typeof value==="object"||typeof value==="string"||typeof value==="number"){buffer+=this.renderTokens(token[4],context.push(value),partials,originalTemplate)}else 
if(isFunction(value)){if(typeof originalTemplate!=="string")throw new Error("Cannot use higher-order sections without the original template");value=value.call(context.view,originalTemplate.slice(token[3],token[5]),subRender);if(value!=null)buffer+=value}else{buffer+=this.renderTokens(token[4],context,partials,originalTemplate)}return buffer};Writer.prototype._renderInverted=function(token,context,partials,originalTemplate){var value=context.lookup(token[1]);if(!value||isArray(value)&&value.length===0)return this.renderTokens(token[4],context,partials,originalTemplate)};Writer.prototype._renderPartial=function(token,context,partials){if(!partials)return;var value=isFunction(partials)?partials(token[1]):partials[token[1]];if(value!=null)return this.renderTokens(this.parse(value),context,partials,value)};Writer.prototype._unescapedValue=function(token,context){var value=context.lookup(token[1]);if(value!=null)return value};Writer.prototype._escapedValue=function(token,context){var value=context.lookup(token[1]);if(value!=null)return mustache.escape(value)};Writer.prototype._rawValue=function(token){return token[1]};mustache.name="mustache.js";mustache.version="2.0.0";mustache.tags=["{{","}}"];var defaultWriter=new Writer;mustache.clearCache=function(){return defaultWriter.clearCache()};mustache.parse=function(template,tags){return defaultWriter.parse(template,tags)};mustache.render=function(template,view,partials){return defaultWriter.render(template,view,partials)};mustache.to_html=function(template,view,partials,send){var result=mustache.render(template,view,partials);if(isFunction(send)){send(result)}else{return result}};mustache.escape=escapeHtml;mustache.Scanner=Scanner;mustache.Context=Context;mustache.Writer=Writer});
\ No newline at end of file
diff --git a/docs/search/require.js b/docs/search/require.js
deleted file mode 100644
index 8638a3101b565f5d4a8eaef8352c4759a9643030..0000000000000000000000000000000000000000
--- a/docs/search/require.js
+++ /dev/null
@@ -1,36 +0,0 @@
-/*
- RequireJS 2.1.16 Copyright (c) 2010-2015, The Dojo Foundation All Rights Reserved.
- Available via the MIT or new BSD license.
- see: http://github.com/jrburke/requirejs for details
-*/
-var requirejs,require,define;
-(function(ba){function G(b){return"[object Function]"===K.call(b)}function H(b){return"[object Array]"===K.call(b)}function v(b,c){if(b){var d;for(d=0;d<b.length&&(!b[d]||!c(b[d],d,b));d+=1);}}function T(b,c){if(b){var d;for(d=b.length-1;-1<d&&(!b[d]||!c(b[d],d,b));d-=1);}}function t(b,c){return fa.call(b,c)}function m(b,c){return t(b,c)&&b[c]}function B(b,c){for(var d in b)if(t(b,d)&&c(b[d],d))break}function U(b,c,d,e){c&&B(c,function(c,g){if(d||!t(b,g))e&&"object"===typeof c&&c&&!H(c)&&!G(c)&&!(c instanceof
-RegExp)?(b[g]||(b[g]={}),U(b[g],c,d,e)):b[g]=c});return b}function u(b,c){return function(){return c.apply(b,arguments)}}function ca(b){throw b;}function da(b){if(!b)return b;var c=ba;v(b.split("."),function(b){c=c[b]});return c}function C(b,c,d,e){c=Error(c+"\nhttp://requirejs.org/docs/errors.html#"+b);c.requireType=b;c.requireModules=e;d&&(c.originalError=d);return c}function ga(b){function c(a,k,b){var f,l,c,d,e,g,i,p,k=k&&k.split("/"),h=j.map,n=h&&h["*"];if(a){a=a.split("/");l=a.length-1;j.nodeIdCompat&&
-Q.test(a[l])&&(a[l]=a[l].replace(Q,""));"."===a[0].charAt(0)&&k&&(l=k.slice(0,k.length-1),a=l.concat(a));l=a;for(c=0;c<l.length;c++)if(d=l[c],"."===d)l.splice(c,1),c-=1;else if(".."===d&&!(0===c||1==c&&".."===l[2]||".."===l[c-1])&&0<c)l.splice(c-1,2),c-=2;a=a.join("/")}if(b&&h&&(k||n)){l=a.split("/");c=l.length;a:for(;0<c;c-=1){e=l.slice(0,c).join("/");if(k)for(d=k.length;0<d;d-=1)if(b=m(h,k.slice(0,d).join("/")))if(b=m(b,e)){f=b;g=c;break a}!i&&(n&&m(n,e))&&(i=m(n,e),p=c)}!f&&i&&(f=i,g=p);f&&(l.splice(0,
-g,f),a=l.join("/"))}return(f=m(j.pkgs,a))?f:a}function d(a){z&&v(document.getElementsByTagName("script"),function(k){if(k.getAttribute("data-requiremodule")===a&&k.getAttribute("data-requirecontext")===i.contextName)return k.parentNode.removeChild(k),!0})}function e(a){var k=m(j.paths,a);if(k&&H(k)&&1<k.length)return k.shift(),i.require.undef(a),i.makeRequire(null,{skipMap:!0})([a]),!0}function n(a){var k,c=a?a.indexOf("!"):-1;-1<c&&(k=a.substring(0,c),a=a.substring(c+1,a.length));return[k,a]}function p(a,
-k,b,f){var l,d,e=null,g=k?k.name:null,j=a,p=!0,h="";a||(p=!1,a="_@r"+(K+=1));a=n(a);e=a[0];a=a[1];e&&(e=c(e,g,f),d=m(r,e));a&&(e?h=d&&d.normalize?d.normalize(a,function(a){return c(a,g,f)}):-1===a.indexOf("!")?c(a,g,f):a:(h=c(a,g,f),a=n(h),e=a[0],h=a[1],b=!0,l=i.nameToUrl(h)));b=e&&!d&&!b?"_unnormalized"+(O+=1):"";return{prefix:e,name:h,parentMap:k,unnormalized:!!b,url:l,originalName:j,isDefine:p,id:(e?e+"!"+h:h)+b}}function s(a){var k=a.id,b=m(h,k);b||(b=h[k]=new i.Module(a));return b}function q(a,
-k,b){var f=a.id,c=m(h,f);if(t(r,f)&&(!c||c.defineEmitComplete))"defined"===k&&b(r[f]);else if(c=s(a),c.error&&"error"===k)b(c.error);else c.on(k,b)}function w(a,b){var c=a.requireModules,f=!1;if(b)b(a);else if(v(c,function(b){if(b=m(h,b))b.error=a,b.events.error&&(f=!0,b.emit("error",a))}),!f)g.onError(a)}function x(){R.length&&(ha.apply(A,[A.length,0].concat(R)),R=[])}function y(a){delete h[a];delete V[a]}function F(a,b,c){var f=a.map.id;a.error?a.emit("error",a.error):(b[f]=!0,v(a.depMaps,function(f,
-d){var e=f.id,g=m(h,e);g&&(!a.depMatched[d]&&!c[e])&&(m(b,e)?(a.defineDep(d,r[e]),a.check()):F(g,b,c))}),c[f]=!0)}function D(){var a,b,c=(a=1E3*j.waitSeconds)&&i.startTime+a<(new Date).getTime(),f=[],l=[],g=!1,h=!0;if(!W){W=!0;B(V,function(a){var i=a.map,j=i.id;if(a.enabled&&(i.isDefine||l.push(a),!a.error))if(!a.inited&&c)e(j)?g=b=!0:(f.push(j),d(j));else if(!a.inited&&(a.fetched&&i.isDefine)&&(g=!0,!i.prefix))return h=!1});if(c&&f.length)return a=C("timeout","Load timeout for modules: "+f,null,
-f),a.contextName=i.contextName,w(a);h&&v(l,function(a){F(a,{},{})});if((!c||b)&&g)if((z||ea)&&!X)X=setTimeout(function(){X=0;D()},50);W=!1}}function E(a){t(r,a[0])||s(p(a[0],null,!0)).init(a[1],a[2])}function I(a){var a=a.currentTarget||a.srcElement,b=i.onScriptLoad;a.detachEvent&&!Y?a.detachEvent("onreadystatechange",b):a.removeEventListener("load",b,!1);b=i.onScriptError;(!a.detachEvent||Y)&&a.removeEventListener("error",b,!1);return{node:a,id:a&&a.getAttribute("data-requiremodule")}}function J(){var a;
-for(x();A.length;){a=A.shift();if(null===a[0])return w(C("mismatch","Mismatched anonymous define() module: "+a[a.length-1]));E(a)}}var W,Z,i,L,X,j={waitSeconds:7,baseUrl:"./",paths:{},bundles:{},pkgs:{},shim:{},config:{}},h={},V={},$={},A=[],r={},S={},aa={},K=1,O=1;L={require:function(a){return a.require?a.require:a.require=i.makeRequire(a.map)},exports:function(a){a.usingExports=!0;if(a.map.isDefine)return a.exports?r[a.map.id]=a.exports:a.exports=r[a.map.id]={}},module:function(a){return a.module?
-a.module:a.module={id:a.map.id,uri:a.map.url,config:function(){return m(j.config,a.map.id)||{}},exports:a.exports||(a.exports={})}}};Z=function(a){this.events=m($,a.id)||{};this.map=a;this.shim=m(j.shim,a.id);this.depExports=[];this.depMaps=[];this.depMatched=[];this.pluginMaps={};this.depCount=0};Z.prototype={init:function(a,b,c,f){f=f||{};if(!this.inited){this.factory=b;if(c)this.on("error",c);else this.events.error&&(c=u(this,function(a){this.emit("error",a)}));this.depMaps=a&&a.slice(0);this.errback=
-c;this.inited=!0;this.ignore=f.ignore;f.enabled||this.enabled?this.enable():this.check()}},defineDep:function(a,b){this.depMatched[a]||(this.depMatched[a]=!0,this.depCount-=1,this.depExports[a]=b)},fetch:function(){if(!this.fetched){this.fetched=!0;i.startTime=(new Date).getTime();var a=this.map;if(this.shim)i.makeRequire(this.map,{enableBuildCallback:!0})(this.shim.deps||[],u(this,function(){return a.prefix?this.callPlugin():this.load()}));else return a.prefix?this.callPlugin():this.load()}},load:function(){var a=
-this.map.url;S[a]||(S[a]=!0,i.load(this.map.id,a))},check:function(){if(this.enabled&&!this.enabling){var a,b,c=this.map.id;b=this.depExports;var f=this.exports,l=this.factory;if(this.inited)if(this.error)this.emit("error",this.error);else{if(!this.defining){this.defining=!0;if(1>this.depCount&&!this.defined){if(G(l)){if(this.events.error&&this.map.isDefine||g.onError!==ca)try{f=i.execCb(c,l,b,f)}catch(d){a=d}else f=i.execCb(c,l,b,f);this.map.isDefine&&void 0===f&&((b=this.module)?f=b.exports:this.usingExports&&
-(f=this.exports));if(a)return a.requireMap=this.map,a.requireModules=this.map.isDefine?[this.map.id]:null,a.requireType=this.map.isDefine?"define":"require",w(this.error=a)}else f=l;this.exports=f;if(this.map.isDefine&&!this.ignore&&(r[c]=f,g.onResourceLoad))g.onResourceLoad(i,this.map,this.depMaps);y(c);this.defined=!0}this.defining=!1;this.defined&&!this.defineEmitted&&(this.defineEmitted=!0,this.emit("defined",this.exports),this.defineEmitComplete=!0)}}else this.fetch()}},callPlugin:function(){var a=
-this.map,b=a.id,d=p(a.prefix);this.depMaps.push(d);q(d,"defined",u(this,function(f){var l,d;d=m(aa,this.map.id);var e=this.map.name,P=this.map.parentMap?this.map.parentMap.name:null,n=i.makeRequire(a.parentMap,{enableBuildCallback:!0});if(this.map.unnormalized){if(f.normalize&&(e=f.normalize(e,function(a){return c(a,P,!0)})||""),f=p(a.prefix+"!"+e,this.map.parentMap),q(f,"defined",u(this,function(a){this.init([],function(){return a},null,{enabled:!0,ignore:!0})})),d=m(h,f.id)){this.depMaps.push(f);
-if(this.events.error)d.on("error",u(this,function(a){this.emit("error",a)}));d.enable()}}else d?(this.map.url=i.nameToUrl(d),this.load()):(l=u(this,function(a){this.init([],function(){return a},null,{enabled:!0})}),l.error=u(this,function(a){this.inited=!0;this.error=a;a.requireModules=[b];B(h,function(a){0===a.map.id.indexOf(b+"_unnormalized")&&y(a.map.id)});w(a)}),l.fromText=u(this,function(f,c){var d=a.name,e=p(d),P=M;c&&(f=c);P&&(M=!1);s(e);t(j.config,b)&&(j.config[d]=j.config[b]);try{g.exec(f)}catch(h){return w(C("fromtexteval",
-"fromText eval for "+b+" failed: "+h,h,[b]))}P&&(M=!0);this.depMaps.push(e);i.completeLoad(d);n([d],l)}),f.load(a.name,n,l,j))}));i.enable(d,this);this.pluginMaps[d.id]=d},enable:function(){V[this.map.id]=this;this.enabling=this.enabled=!0;v(this.depMaps,u(this,function(a,b){var c,f;if("string"===typeof a){a=p(a,this.map.isDefine?this.map:this.map.parentMap,!1,!this.skipMap);this.depMaps[b]=a;if(c=m(L,a.id)){this.depExports[b]=c(this);return}this.depCount+=1;q(a,"defined",u(this,function(a){this.defineDep(b,
-a);this.check()}));this.errback?q(a,"error",u(this,this.errback)):this.events.error&&q(a,"error",u(this,function(a){this.emit("error",a)}))}c=a.id;f=h[c];!t(L,c)&&(f&&!f.enabled)&&i.enable(a,this)}));B(this.pluginMaps,u(this,function(a){var b=m(h,a.id);b&&!b.enabled&&i.enable(a,this)}));this.enabling=!1;this.check()},on:function(a,b){var c=this.events[a];c||(c=this.events[a]=[]);c.push(b)},emit:function(a,b){v(this.events[a],function(a){a(b)});"error"===a&&delete this.events[a]}};i={config:j,contextName:b,
-registry:h,defined:r,urlFetched:S,defQueue:A,Module:Z,makeModuleMap:p,nextTick:g.nextTick,onError:w,configure:function(a){a.baseUrl&&"/"!==a.baseUrl.charAt(a.baseUrl.length-1)&&(a.baseUrl+="/");var b=j.shim,c={paths:!0,bundles:!0,config:!0,map:!0};B(a,function(a,b){c[b]?(j[b]||(j[b]={}),U(j[b],a,!0,!0)):j[b]=a});a.bundles&&B(a.bundles,function(a,b){v(a,function(a){a!==b&&(aa[a]=b)})});a.shim&&(B(a.shim,function(a,c){H(a)&&(a={deps:a});if((a.exports||a.init)&&!a.exportsFn)a.exportsFn=i.makeShimExports(a);
-b[c]=a}),j.shim=b);a.packages&&v(a.packages,function(a){var b,a="string"===typeof a?{name:a}:a;b=a.name;a.location&&(j.paths[b]=a.location);j.pkgs[b]=a.name+"/"+(a.main||"main").replace(ia,"").replace(Q,"")});B(h,function(a,b){!a.inited&&!a.map.unnormalized&&(a.map=p(b))});if(a.deps||a.callback)i.require(a.deps||[],a.callback)},makeShimExports:function(a){return function(){var b;a.init&&(b=a.init.apply(ba,arguments));return b||a.exports&&da(a.exports)}},makeRequire:function(a,e){function j(c,d,m){var n,
-q;e.enableBuildCallback&&(d&&G(d))&&(d.__requireJsBuild=!0);if("string"===typeof c){if(G(d))return w(C("requireargs","Invalid require call"),m);if(a&&t(L,c))return L[c](h[a.id]);if(g.get)return g.get(i,c,a,j);n=p(c,a,!1,!0);n=n.id;return!t(r,n)?w(C("notloaded",'Module name "'+n+'" has not been loaded yet for context: '+b+(a?"":". Use require([])"))):r[n]}J();i.nextTick(function(){J();q=s(p(null,a));q.skipMap=e.skipMap;q.init(c,d,m,{enabled:!0});D()});return j}e=e||{};U(j,{isBrowser:z,toUrl:function(b){var d,
-e=b.lastIndexOf("."),k=b.split("/")[0];if(-1!==e&&(!("."===k||".."===k)||1<e))d=b.substring(e,b.length),b=b.substring(0,e);return i.nameToUrl(c(b,a&&a.id,!0),d,!0)},defined:function(b){return t(r,p(b,a,!1,!0).id)},specified:function(b){b=p(b,a,!1,!0).id;return t(r,b)||t(h,b)}});a||(j.undef=function(b){x();var c=p(b,a,!0),e=m(h,b);d(b);delete r[b];delete S[c.url];delete $[b];T(A,function(a,c){a[0]===b&&A.splice(c,1)});e&&(e.events.defined&&($[b]=e.events),y(b))});return j},enable:function(a){m(h,a.id)&&
-s(a).enable()},completeLoad:function(a){var b,c,d=m(j.shim,a)||{},g=d.exports;for(x();A.length;){c=A.shift();if(null===c[0]){c[0]=a;if(b)break;b=!0}else c[0]===a&&(b=!0);E(c)}c=m(h,a);if(!b&&!t(r,a)&&c&&!c.inited){if(j.enforceDefine&&(!g||!da(g)))return e(a)?void 0:w(C("nodefine","No define call for "+a,null,[a]));E([a,d.deps||[],d.exportsFn])}D()},nameToUrl:function(a,b,c){var d,e,h;(d=m(j.pkgs,a))&&(a=d);if(d=m(aa,a))return i.nameToUrl(d,b,c);if(g.jsExtRegExp.test(a))d=a+(b||"");else{d=j.paths;
-a=a.split("/");for(e=a.length;0<e;e-=1)if(h=a.slice(0,e).join("/"),h=m(d,h)){H(h)&&(h=h[0]);a.splice(0,e,h);break}d=a.join("/");d+=b||(/^data\:|\?/.test(d)||c?"":".js");d=("/"===d.charAt(0)||d.match(/^[\w\+\.\-]+:/)?"":j.baseUrl)+d}return j.urlArgs?d+((-1===d.indexOf("?")?"?":"&")+j.urlArgs):d},load:function(a,b){g.load(i,a,b)},execCb:function(a,b,c,d){return b.apply(d,c)},onScriptLoad:function(a){if("load"===a.type||ja.test((a.currentTarget||a.srcElement).readyState))N=null,a=I(a),i.completeLoad(a.id)},
-onScriptError:function(a){var b=I(a);if(!e(b.id))return w(C("scripterror","Script error for: "+b.id,a,[b.id]))}};i.require=i.makeRequire();return i}var g,x,y,D,I,E,N,J,s,O,ka=/(\/\*([\s\S]*?)\*\/|([^:]|^)\/\/(.*)$)/mg,la=/[^.]\s*require\s*\(\s*["']([^'"\s]+)["']\s*\)/g,Q=/\.js$/,ia=/^\.\//;x=Object.prototype;var K=x.toString,fa=x.hasOwnProperty,ha=Array.prototype.splice,z=!!("undefined"!==typeof window&&"undefined"!==typeof navigator&&window.document),ea=!z&&"undefined"!==typeof importScripts,ja=
-z&&"PLAYSTATION 3"===navigator.platform?/^complete$/:/^(complete|loaded)$/,Y="undefined"!==typeof opera&&"[object Opera]"===opera.toString(),F={},q={},R=[],M=!1;if("undefined"===typeof define){if("undefined"!==typeof requirejs){if(G(requirejs))return;q=requirejs;requirejs=void 0}"undefined"!==typeof require&&!G(require)&&(q=require,require=void 0);g=requirejs=function(b,c,d,e){var n,p="_";!H(b)&&"string"!==typeof b&&(n=b,H(c)?(b=c,c=d,d=e):b=[]);n&&n.context&&(p=n.context);(e=m(F,p))||(e=F[p]=g.s.newContext(p));
-n&&e.configure(n);return e.require(b,c,d)};g.config=function(b){return g(b)};g.nextTick="undefined"!==typeof setTimeout?function(b){setTimeout(b,4)}:function(b){b()};require||(require=g);g.version="2.1.16";g.jsExtRegExp=/^\/|:|\?|\.js$/;g.isBrowser=z;x=g.s={contexts:F,newContext:ga};g({});v(["toUrl","undef","defined","specified"],function(b){g[b]=function(){var c=F._;return c.require[b].apply(c,arguments)}});if(z&&(y=x.head=document.getElementsByTagName("head")[0],D=document.getElementsByTagName("base")[0]))y=
-x.head=D.parentNode;g.onError=ca;g.createNode=function(b){var c=b.xhtml?document.createElementNS("http://www.w3.org/1999/xhtml","html:script"):document.createElement("script");c.type=b.scriptType||"text/javascript";c.charset="utf-8";c.async=!0;return c};g.load=function(b,c,d){var e=b&&b.config||{};if(z)return e=g.createNode(e,c,d),e.setAttribute("data-requirecontext",b.contextName),e.setAttribute("data-requiremodule",c),e.attachEvent&&!(e.attachEvent.toString&&0>e.attachEvent.toString().indexOf("[native code"))&&
-!Y?(M=!0,e.attachEvent("onreadystatechange",b.onScriptLoad)):(e.addEventListener("load",b.onScriptLoad,!1),e.addEventListener("error",b.onScriptError,!1)),e.src=d,J=e,D?y.insertBefore(e,D):y.appendChild(e),J=null,e;if(ea)try{importScripts(d),b.completeLoad(c)}catch(m){b.onError(C("importscripts","importScripts failed for "+c+" at "+d,m,[c]))}};z&&!q.skipDataMain&&T(document.getElementsByTagName("script"),function(b){y||(y=b.parentNode);if(I=b.getAttribute("data-main"))return s=I,q.baseUrl||(E=s.split("/"),
-s=E.pop(),O=E.length?E.join("/")+"/":"./",q.baseUrl=O),s=s.replace(Q,""),g.jsExtRegExp.test(s)&&(s=I),q.deps=q.deps?q.deps.concat(s):[s],!0});define=function(b,c,d){var e,g;"string"!==typeof b&&(d=c,c=b,b=null);H(c)||(d=c,c=null);!c&&G(d)&&(c=[],d.length&&(d.toString().replace(ka,"").replace(la,function(b,d){c.push(d)}),c=(1===d.length?["require"]:["require","exports","module"]).concat(c)));if(M){if(!(e=J))N&&"interactive"===N.readyState||T(document.getElementsByTagName("script"),function(b){if("interactive"===
-b.readyState)return N=b}),e=N;e&&(b||(b=e.getAttribute("data-requiremodule")),g=F[e.getAttribute("data-requirecontext")])}(g?g.defQueue:R).push([b,c,d])};define.amd={jQuery:!0};g.exec=function(b){return eval(b)};g(q)}})(this);
diff --git a/docs/search/search-results-template.mustache b/docs/search/search-results-template.mustache
deleted file mode 100644
index a8b3862f2005ca405eaa19386dc32e82731238ac..0000000000000000000000000000000000000000
--- a/docs/search/search-results-template.mustache
+++ /dev/null
@@ -1,4 +0,0 @@
-<article>
-  <h3><a href="{{location}}">{{title}}</a></h3>
-  <p>{{summary}}</p>
-</article>
diff --git a/docs/search/search.js b/docs/search/search.js
deleted file mode 100644
index 2283930c833e25410fe35a52b6dcb87c13fe2eb1..0000000000000000000000000000000000000000
--- a/docs/search/search.js
+++ /dev/null
@@ -1,92 +0,0 @@
-require.config({
-   baseUrl: base_url + "/search/"
-});
-
-require([
-    'mustache.min',
-    'lunr.min',
-    'text!search-results-template.mustache',
-    'text!search_index.json',
-], function (Mustache, lunr, results_template, data) {
-   "use strict";
-
-    function getSearchTerm()
-    {
-        var sPageURL = window.location.search.substring(1);
-        var sURLVariables = sPageURL.split('&');
-        for (var i = 0; i < sURLVariables.length; i++)
-        {
-            var sParameterName = sURLVariables[i].split('=');
-            if (sParameterName[0] == 'q')
-            {
-                return decodeURIComponent(sParameterName[1].replace(/\+/g, '%20'));
-            }
-        }
-    }
-
-    var index = lunr(function () {
-        this.field('title', {boost: 10});
-        this.field('text');
-        this.ref('location');
-    });
-
-    data = JSON.parse(data);
-    var documents = {};
-
-    for (var i=0; i < data.docs.length; i++){
-        var doc = data.docs[i];
-        doc.location = base_url + doc.location;
-        index.add(doc);
-        documents[doc.location] = doc;
-    }
-
-    var search = function(){
-
-        var query = document.getElementById('mkdocs-search-query').value;
-        var search_results = document.getElementById("mkdocs-search-results");
-        while (search_results.firstChild) {
-            search_results.removeChild(search_results.firstChild);
-        }
-
-        if(query === ''){
-            return;
-        }
-
-        var results = index.search(query);
-
-        if (results.length > 0){
-            for (var i=0; i < results.length; i++){
-                var result = results[i];
-                doc = documents[result.ref];
-                doc.base_url = base_url;
-                doc.summary = doc.text.substring(0, 200);
-                var html = Mustache.to_html(results_template, doc);
-                search_results.insertAdjacentHTML('beforeend', html);
-            }
-        } else {
-            search_results.insertAdjacentHTML('beforeend', "<p>No results found</p>");
-        }
-
-        if(jQuery){
-            /*
-             * We currently only automatically hide bootstrap models. This
-             * requires jQuery to work.
-             */
-            jQuery('#mkdocs_search_modal a').click(function(){
-                jQuery('#mkdocs_search_modal').modal('hide');
-            });
-        }
-
-    };
-
-    var search_input = document.getElementById('mkdocs-search-query');
-
-    var term = getSearchTerm();
-    if (term){
-        search_input.value = term;
-        search();
-    }
-
-    if (search_input){search_input.addEventListener("keyup", search);}
-
-});
diff --git a/docs/search/search_index.json b/docs/search/search_index.json
index 0ab63076ffa9a360742ebcb7a159ebdcbdbd4c51..105f5b1bbb2b53c9c4826deef18a241cb3a489be 100644
--- a/docs/search/search_index.json
+++ b/docs/search/search_index.json
@@ -1,854 +1 @@
-{
-    "docs": [
-        {
-            "location": "/index.html", 
-            "text": "Distiller Documentation\n\n\nWhat is Distiller\n\n\nDistiller\n is an open-source Python package for neural network compression research.\n\n\nNetwork compression can reduce the footprint of a neural network, increase its inference speed and save energy. Distiller provides a \nPyTorch\n environment for prototyping and analyzing compression algorithms, such as sparsity-inducing methods and low precision arithmetic.\n\n\nDistiller contains:\n\n\n\n\nA framework for integrating pruning, regularization and quantization algorithms.\n\n\nA set of tools for analyzing and evaluating compression performance.\n\n\nExample implementations of state-of-the-art compression algorithms.\n\n\n\n\nMotivation\n\n\nA sparse tensor is any tensor that contains some zeros, but sparse tensors are usually only interesting if they contain a significant number of zeros.  A sparse neural network performs computations using some sparse tensors (preferably many).  These tensors can be parameters (weights and biases) or activations (feature maps).\n\n\nWhy do we care about sparsity?\n\nPresent day neural networks tend to be deep, with millions of weights and activations.  Refer to GoogLeNet or ResNet50, for a couple of examples.\nThese large models are compute-intensive which means that even with dedicated acceleration hardware, the inference pass (network evaluation) will take time.  You might think that latency is an issue only in certain cases, such as autonomous driving systems, but in fact, whenever we humans interact with our phones and computers, we are sensitive to the latency of the interaction.  We don't like to wait for search results or for an application or web-page to load, and we are especially sensitive in realtime interactions such as speech recognition.  So inference latency is often something we want to minimize.\n\n\nLarge models are also memory-intensive with millions of parameters.  Moving around all of the data required to compute inference results consumes energy, which is a problem on a mobile device as well as in a server environment.  Data center server-racks are limited by their power-envelope and their ToC (total cost of ownership) is correlated to their power consumption and thermal characteristics.  In the mobile device environment, we are obviously always aware of the implications of power consumption on the device battery.\nInference performance in the data center is often measured using a KPI (key performance indicator) which folds latency and power considerations: inferences per second, per Watt (inferences/sec/watt).\n\n\nThe storage and transfer of large neural networks is also a challenge in mobile device environments, because of limitations on application sizes and long application download times.\n\n\nFor these reasons, we wish to compress the network as much as possible, to reduce the amount of bandwidth and compute required.  Inducing sparseness, through regularization or pruning, in neural-network models, is one way to compress the network (quantization is another method).\nSparse neural networks hold the promise of speed, small size, and energy efficiency.  \n\n\nSmaller\n\n\nSparse NN model representations can be compressed by taking advantage of the fact that the tensor elements are dominated by zeros.  The compression format, if any, is very HW and SW specific, and the optimal format may be different per tensor (an obvious example: largely dense tensors should not be compressed).  
The compute hardware needs to support the compressions formats, for representation compression to be meaningful.  Compression representation decisions might interact with algorithms such as the use of tiles for memory accesses.  Data such as a parameter tensor is read/written from/to main system memory compressed, but the computation can be dense or sparse.  In dense compute we use dense operators, so the compressed data eventually needs to be decompressed into its full, dense size.  The best we can do is bring the compressed representation as close as possible to the compute engine.\n\nSparse compute, on the other hand, operates on the sparse representation which never requires decompression (we therefore distinguish between sparse representation and compressed representation).  This is not a simple matter to implement in HW, and often means lower utilization of the vectorized compute engines.  Therefore, there is a third class of representations, which take advantage of specific hardware characteristics.  For example, for a vectorized compute engine we can remove an entire zero-weights vector and skip its computation (this uses structured pruning or regularization).\n\n\nFaster\n\n\nMany of the layers in modern neural-networks are bandwidth-bound, which means that the execution latency is dominated by the available bandwidth. In essence, the hardware spends more time bringing data close to the compute engines, than actually performing the computations.  Fully-connected layers, RNNs and LSTMs are some examples of bandwidth-dominated operations.\n\nReducing the bandwidth required by these layers, will immediately speed them up.\n\nSome pruning algorithms prune entire kernels, filters and even layers from the network without adversely impacting the final accuracy.  Depending on the hardware implementation, these methods can be leveraged to skip computations, thus reducing latency and power.\n\n\nMore energy efficient\n\n\nBecause we pay two orders-of-magnitude more energy to access off-chip memory (e.g. DDR) compared to on-chip memory (e.g. SRAM or cache), many hardware designs employ a multi-layered cache hierarchy.  Fitting the parameters and activations of a network in these on-chip caches can make a big difference on the required bandwidth, the total inference latency, and off course reduce power consumption.\n\nAnd of course, if we used a sparse or compressed representation, then we are reducing the data throughput and therefore the energy consumption.", 
-            "title": "Home"
-        }, 
-        {
-            "location": "/index.html#distiller-documentation", 
-            "text": "", 
-            "title": "Distiller Documentation"
-        }, 
-        {
-            "location": "/index.html#what-is-distiller", 
-            "text": "Distiller  is an open-source Python package for neural network compression research.  Network compression can reduce the footprint of a neural network, increase its inference speed and save energy. Distiller provides a  PyTorch  environment for prototyping and analyzing compression algorithms, such as sparsity-inducing methods and low precision arithmetic.  Distiller contains:   A framework for integrating pruning, regularization and quantization algorithms.  A set of tools for analyzing and evaluating compression performance.  Example implementations of state-of-the-art compression algorithms.", 
-            "title": "What is Distiller"
-        }, 
-        {
-            "location": "/index.html#motivation", 
-            "text": "A sparse tensor is any tensor that contains some zeros, but sparse tensors are usually only interesting if they contain a significant number of zeros.  A sparse neural network performs computations using some sparse tensors (preferably many).  These tensors can be parameters (weights and biases) or activations (feature maps).  Why do we care about sparsity? \nPresent day neural networks tend to be deep, with millions of weights and activations.  Refer to GoogLeNet or ResNet50, for a couple of examples.\nThese large models are compute-intensive which means that even with dedicated acceleration hardware, the inference pass (network evaluation) will take time.  You might think that latency is an issue only in certain cases, such as autonomous driving systems, but in fact, whenever we humans interact with our phones and computers, we are sensitive to the latency of the interaction.  We don't like to wait for search results or for an application or web-page to load, and we are especially sensitive in realtime interactions such as speech recognition.  So inference latency is often something we want to minimize. \nLarge models are also memory-intensive with millions of parameters.  Moving around all of the data required to compute inference results consumes energy, which is a problem on a mobile device as well as in a server environment.  Data center server-racks are limited by their power-envelope and their ToC (total cost of ownership) is correlated to their power consumption and thermal characteristics.  In the mobile device environment, we are obviously always aware of the implications of power consumption on the device battery.\nInference performance in the data center is often measured using a KPI (key performance indicator) which folds latency and power considerations: inferences per second, per Watt (inferences/sec/watt). \nThe storage and transfer of large neural networks is also a challenge in mobile device environments, because of limitations on application sizes and long application download times. \nFor these reasons, we wish to compress the network as much as possible, to reduce the amount of bandwidth and compute required.  Inducing sparseness, through regularization or pruning, in neural-network models, is one way to compress the network (quantization is another method).\nSparse neural networks hold the promise of speed, small size, and energy efficiency.", 
-            "title": "Motivation"
-        }, 
-        {
-            "location": "/index.html#smaller", 
-            "text": "Sparse NN model representations can be compressed by taking advantage of the fact that the tensor elements are dominated by zeros.  The compression format, if any, is very HW and SW specific, and the optimal format may be different per tensor (an obvious example: largely dense tensors should not be compressed).  The compute hardware needs to support the compressions formats, for representation compression to be meaningful.  Compression representation decisions might interact with algorithms such as the use of tiles for memory accesses.  Data such as a parameter tensor is read/written from/to main system memory compressed, but the computation can be dense or sparse.  In dense compute we use dense operators, so the compressed data eventually needs to be decompressed into its full, dense size.  The best we can do is bring the compressed representation as close as possible to the compute engine. \nSparse compute, on the other hand, operates on the sparse representation which never requires decompression (we therefore distinguish between sparse representation and compressed representation).  This is not a simple matter to implement in HW, and often means lower utilization of the vectorized compute engines.  Therefore, there is a third class of representations, which take advantage of specific hardware characteristics.  For example, for a vectorized compute engine we can remove an entire zero-weights vector and skip its computation (this uses structured pruning or regularization).", 
-            "title": "Smaller"
-        }, 
-        {
-            "location": "/index.html#faster", 
-            "text": "Many of the layers in modern neural-networks are bandwidth-bound, which means that the execution latency is dominated by the available bandwidth. In essence, the hardware spends more time bringing data close to the compute engines, than actually performing the computations.  Fully-connected layers, RNNs and LSTMs are some examples of bandwidth-dominated operations. \nReducing the bandwidth required by these layers, will immediately speed them up. \nSome pruning algorithms prune entire kernels, filters and even layers from the network without adversely impacting the final accuracy.  Depending on the hardware implementation, these methods can be leveraged to skip computations, thus reducing latency and power.", 
-            "title": "Faster"
-        }, 
-        {
-            "location": "/index.html#more-energy-efficient", 
-            "text": "Because we pay two orders-of-magnitude more energy to access off-chip memory (e.g. DDR) compared to on-chip memory (e.g. SRAM or cache), many hardware designs employ a multi-layered cache hierarchy.  Fitting the parameters and activations of a network in these on-chip caches can make a big difference on the required bandwidth, the total inference latency, and off course reduce power consumption. \nAnd of course, if we used a sparse or compressed representation, then we are reducing the data throughput and therefore the energy consumption.", 
-            "title": "More energy efficient"
-        }, 
-        {
-            "location": "/install/index.html", 
-            "text": "Distiller Installation\n\n\nThese instructions will help get Distiller up and running on your local machine.\n\n\nYou may also want to refer to these resources:\n\n\n\n\nDataset installation\n instructions.\n\n\nJupyter installation\n instructions.\n\n\n\n\nNotes:\n- Distiller has only been tested on Ubuntu 16.04 LTS, and with Python 3.5.\n- If you are not using a GPU, you might need to make small adjustments to the code.\n\n\nClone Distiller\n\n\nClone the Distiller code repository from github:\n\n\n$ git clone https://github.com/NervanaSystems/distiller.git\n\n\n\n\nThe rest of the documentation that follows, assumes that you have cloned your repository to a directory called \ndistiller\n. \n\n\nCreate a Python virtual environment\n\n\nWe recommend using a \nPython virtual environment\n, but that of course, is up to you.\nThere's nothing special about using Distiller in a virtual environment, but we provide some instructions, for completeness.\n\nBefore creating the virtual environment, make sure you are located in directory \ndistiller\n.  After creating the environment, you should see a directory called \ndistiller/env\n.\n\n\n\nUsing virtualenv\n\n\nIf you don't have virtualenv installed, you can find the installation instructions \nhere\n.\n\n\nTo create the environment, execute:\n\n\n$ python3 -m virtualenv env\n\n\n\n\nThis creates a subdirectory named \nenv\n where the python virtual environment is stored, and configures the current shell to use it as the default python environment.\n\n\nUsing venv\n\n\nIf you prefer to use \nvenv\n, then begin by installing it:\n\n\n$ sudo apt-get install python3-venv\n\n\n\n\nThen create the environment:\n\n\n$ python3 -m venv env\n\n\n\n\nAs with virtualenv, this creates a directory called \ndistiller/env\n.\n\n\nActivate the environment\n\n\nThe environment activation and deactivation commands for \nvenv\n and \nvirtualenv\n are the same.\n\n\n!NOTE: Make sure to activate the environment, before proceeding with the installation of the dependency packages:\n\n\n$ source env/bin/activate\n\n\n\n\nInstall the package\n\n\nFinally, install the Distiller package and its dependencies using \npip3\n:\n\n\n$ cd distiller\n$ pip3 install -e .\n\n\n\n\nThis installs Distiller in \"development mode\", meaning any changes made in the code are reflected in the environment without re-running the install command (so no need to re-install after pulling changes from the Git repository).\n\n\nPyTorch is included in the \nrequirements.txt\n file, and will currently download PyTorch version 1.0.1 for CUDA 9.0.  This is the setup we've used for testing Distiller.", 
-            "title": "Installation"
-        }, 
-        {
-            "location": "/install/index.html#distiller-installation", 
-            "text": "These instructions will help get Distiller up and running on your local machine.  You may also want to refer to these resources:   Dataset installation  instructions.  Jupyter installation  instructions.   Notes:\n- Distiller has only been tested on Ubuntu 16.04 LTS, and with Python 3.5.\n- If you are not using a GPU, you might need to make small adjustments to the code.", 
-            "title": "Distiller Installation"
-        }, 
-        {
-            "location": "/install/index.html#clone-distiller", 
-            "text": "Clone the Distiller code repository from github:  $ git clone https://github.com/NervanaSystems/distiller.git  The rest of the documentation that follows, assumes that you have cloned your repository to a directory called  distiller .", 
-            "title": "Clone Distiller"
-        }, 
-        {
-            "location": "/install/index.html#create-a-python-virtual-environment", 
-            "text": "We recommend using a  Python virtual environment , but that of course, is up to you.\nThere's nothing special about using Distiller in a virtual environment, but we provide some instructions, for completeness. \nBefore creating the virtual environment, make sure you are located in directory  distiller .  After creating the environment, you should see a directory called  distiller/env .", 
-            "title": "Create a Python virtual environment"
-        }, 
-        {
-            "location": "/install/index.html#using-virtualenv", 
-            "text": "If you don't have virtualenv installed, you can find the installation instructions  here .  To create the environment, execute:  $ python3 -m virtualenv env  This creates a subdirectory named  env  where the python virtual environment is stored, and configures the current shell to use it as the default python environment.", 
-            "title": "Using virtualenv"
-        }, 
-        {
-            "location": "/install/index.html#using-venv", 
-            "text": "If you prefer to use  venv , then begin by installing it:  $ sudo apt-get install python3-venv  Then create the environment:  $ python3 -m venv env  As with virtualenv, this creates a directory called  distiller/env .", 
-            "title": "Using venv"
-        }, 
-        {
-            "location": "/install/index.html#activate-the-environment", 
-            "text": "The environment activation and deactivation commands for  venv  and  virtualenv  are the same.  !NOTE: Make sure to activate the environment, before proceeding with the installation of the dependency packages:  $ source env/bin/activate", 
-            "title": "Activate the environment"
-        }, 
-        {
-            "location": "/install/index.html#install-the-package", 
-            "text": "Finally, install the Distiller package and its dependencies using  pip3 :  $ cd distiller\n$ pip3 install -e .  This installs Distiller in \"development mode\", meaning any changes made in the code are reflected in the environment without re-running the install command (so no need to re-install after pulling changes from the Git repository).  PyTorch is included in the  requirements.txt  file, and will currently download PyTorch version 1.0.1 for CUDA 9.0.  This is the setup we've used for testing Distiller.", 
-            "title": "Install the package"
-        }, 
-        {
-            "location": "/usage/index.html", 
-            "text": "Using the sample application\n\n\nThe Distiller repository contains a sample application, \ndistiller/examples/classifier_compression/compress_classifier.py\n, and a set of scheduling files which demonstrate Distiller's features.  Following is a brief discussion of how to use this application and the accompanying schedules.\n\n\nYou might also want to refer to the following resources:\n\n\n\n\nAn \nexplanation\n of the scheduler file format.\n\n\nAn in-depth \ndiscussion\n of how we used these schedule files to implement several state-of-the-art DNN compression research papers.\n\n\n\n\nThe sample application supports various features for compression of image classification DNNs, and gives an example of how to integrate distiller in your own application.  The code is documented and should be considered the best source of documentation, but we provide some elaboration here.\n\n\nThis diagram shows how where \ncompress_classifier.py\n fits in the compression workflow, and how we integrate the Jupyter notebooks as part of our research work.\n\n\n\nCommand line arguments\n\n\nTo get help on the command line arguments, invoke:\n\n\n$ python3 compress_classifier.py --help\n\n\n\n\nFor example:\n\n\n$ time python3 compress_classifier.py -a alexnet --lr 0.005 -p 50 ../../../data.imagenet -j 44 --epochs 90 --pretrained --compress=../sensitivity-pruning/alexnet.schedule_sensitivity.yaml\n\nParameters:\n +----+---------------------------+------------------+---------------+----------------+------------+------------+----------+----------+----------+------------+---------+----------+------------+\n |    | Name                      | Shape            |   NNZ (dense) |   NNZ (sparse) |   Cols (%) |   Rows (%) |   Ch (%) |   2D (%) |   3D (%) |   Fine (%) |     Std |     Mean |   Abs-Mean |\n |----+---------------------------+------------------+---------------+----------------+------------+------------+----------+----------+----------+------------+---------+----------+------------|\n |  0 | features.module.0.weight  | (64, 3, 11, 11)  |         23232 |          13411 |    0.00000 |    0.00000 |  0.00000 |  0.00000 |  0.00000 |   42.27359 | 0.14391 | -0.00002 |    0.08805 |\n |  1 | features.module.3.weight  | (192, 64, 5, 5)  |        307200 |         115560 |    0.00000 |    0.00000 |  0.00000 |  1.91243 |  0.00000 |   62.38281 | 0.04703 | -0.00250 |    0.02289 |\n |  2 | features.module.6.weight  | (384, 192, 3, 3) |        663552 |         256565 |    0.00000 |    0.00000 |  0.00000 |  6.18490 |  0.00000 |   61.33445 | 0.03354 | -0.00184 |    0.01803 |\n |  3 | features.module.8.weight  | (256, 384, 3, 3) |        884736 |         315065 |    0.00000 |    0.00000 |  0.00000 |  6.96411 |  0.00000 |   64.38881 | 0.02646 | -0.00168 |    0.01422 |\n |  4 | features.module.10.weight | (256, 256, 3, 3) |        589824 |         186938 |    0.00000 |    0.00000 |  0.00000 | 15.49225 |  0.00000 |   68.30614 | 0.02714 | -0.00246 |    0.01409 |\n |  5 | classifier.1.weight       | (4096, 9216)     |      37748736 |        3398881 |    0.00000 |    0.21973 |  0.00000 |  0.21973 |  0.00000 |   90.99604 | 0.00589 | -0.00020 |    0.00168 |\n |  6 | classifier.4.weight       | (4096, 4096)     |      16777216 |        1782769 |    0.21973 |    3.46680 |  0.00000 |  3.46680 |  0.00000 |   89.37387 | 0.00849 | -0.00066 |    0.00263 |\n |  7 | classifier.6.weight       | (1000, 4096)     |       4096000 |         994738 |    3.36914 |    0.00000 |  0.00000 |  0.00000 |  0.00000 |   75.71440 | 
0.01718 |  0.00030 |    0.00778 |\n |  8 | Total sparsity:           | -                |      61090496 |        7063928 |    0.00000 |    0.00000 |  0.00000 |  0.00000 |  0.00000 |   88.43694 | 0.00000 |  0.00000 |    0.00000 |\n +----+---------------------------+------------------+---------------+----------------+------------+------------+----------+----------+----------+------------+---------+----------+------------+\n 2018-04-04 21:30:52,499 - Total sparsity: 88.44\n\n 2018-04-04 21:30:52,499 - --- validate (epoch=89)-----------\n 2018-04-04 21:30:52,499 - 128116 samples (256 per mini-batch)\n 2018-04-04 21:31:04,646 - Epoch: [89][   50/  500]    Loss 2.175988    Top1 51.289063    Top5 74.023438\n 2018-04-04 21:31:06,427 - Epoch: [89][  100/  500]    Loss 2.171564    Top1 51.175781    Top5 74.308594\n 2018-04-04 21:31:11,432 - Epoch: [89][  150/  500]    Loss 2.159347    Top1 51.546875    Top5 74.473958\n 2018-04-04 21:31:14,364 - Epoch: [89][  200/  500]    Loss 2.156857    Top1 51.585938    Top5 74.568359\n 2018-04-04 21:31:18,381 - Epoch: [89][  250/  500]    Loss 2.152790    Top1 51.707813    Top5 74.681250\n 2018-04-04 21:31:22,195 - Epoch: [89][  300/  500]    Loss 2.149962    Top1 51.791667    Top5 74.755208\n 2018-04-04 21:31:25,508 - Epoch: [89][  350/  500]    Loss 2.150936    Top1 51.827009    Top5 74.767857\n 2018-04-04 21:31:29,538 - Epoch: [89][  400/  500]    Loss 2.150853    Top1 51.781250    Top5 74.763672\n 2018-04-04 21:31:32,842 - Epoch: [89][  450/  500]    Loss 2.150156    Top1 51.828125    Top5 74.821181\n 2018-04-04 21:31:35,338 - Epoch: [89][  500/  500]    Loss 2.150417    Top1 51.833594    Top5 74.817187\n 2018-04-04 21:31:35,357 - ==\n Top1: 51.838    Top5: 74.817    Loss: 2.150\n\n 2018-04-04 21:31:35,364 - Saving checkpoint\n 2018-04-04 21:31:39,251 - --- test ---------------------\n 2018-04-04 21:31:39,252 - 50000 samples (256 per mini-batch)\n 2018-04-04 21:31:51,512 - Test: [   50/  195]    Loss 1.487607    Top1 63.273438    Top5 85.695312\n 2018-04-04 21:31:55,015 - Test: [  100/  195]    Loss 1.638043    Top1 60.636719    Top5 83.664062\n 2018-04-04 21:31:58,732 - Test: [  150/  195]    Loss 1.833214    Top1 57.619792    Top5 80.447917\n 2018-04-04 21:32:01,274 - ==\n Top1: 56.606    Top5: 79.446    Loss: 1.893\n\n\n\n\nLet's look at the command line again:\n\n\n$ time python3 compress_classifier.py -a alexnet --lr 0.005 -p 50 ../../../data.imagenet -j 44 --epochs 90 --pretrained --compress=../sensitivity-pruning/alexnet.schedule_sensitivity.yaml\n\n\n\n\nIn this example, we prune a TorchVision pre-trained AlexNet network, using the following configuration:\n\n\n\n\nLearning-rate of 0.005\n\n\nPrint progress every 50 mini-batches.\n\n\nUse 44 worker threads to load data (make sure to use something suitable for your machine).\n\n\nRun for 90 epochs.  Torchvision's pre-trained models did not store the epoch metadata, so pruning starts at epoch 0.  When you train and prune your own networks, the last training epoch is saved as a metadata with the model.  
Therefore, when you load such models, the first epoch is not 0, but it is the last training epoch.\n\n\nThe pruning schedule is provided in \nalexnet.schedule_sensitivity.yaml\n\n\nLog files are written to directory \nlogs\n.\n\n\n\n\nExamples\n\n\nDistiller comes with several example schedules which can be used together with \ncompress_classifier.py\n.\nThese example schedules (YAML) files, contain the command line that is used in order to invoke the schedule (so that you can easily recreate the results in your environment), together with the results of the pruning or regularization.  The results usually contain a table showing the sparsity of  each of the model parameters, together with the validation and test top1, top5 and loss scores.\n\n\nFor more details on the example schedules, you can refer to the coverage of the \nModel Zoo\n.\n\n\n\n\nexamples/agp-pruning\n:\n\n\nAutomated Gradual Pruning (AGP) on MobileNet and ResNet18 (ImageNet dataset)\n\n\n\n\n\n\n\nexamples/hybrid\n:\n\n\nAlexNet AGP with 2D (kernel) regularization (ImageNet dataset)\n\n\nAlexNet sensitivity pruning with 2D regularization\n\n\n\n\n\n\n\nexamples/network_slimming\n:\n\n\nResNet20 Network Slimming (this is work-in-progress)\n\n\n\n\n\n\n\nexamples/pruning_filters_for_efficient_convnets\n:\n\n\nResNet56 baseline training (CIFAR10 dataset)\n\n\nResNet56 filter removal using filter ranking\n\n\n\n\n\n\n\nexamples/sensitivity_analysis\n:\n\n\nElement-wise pruning sensitivity-analysis:\n\n\nAlexNet (ImageNet)\n\n\nMobileNet (ImageNet)\n\n\nResNet18 (ImageNet)\n\n\nResNet20 (CIFAR10)\n\n\nResNet34 (ImageNet)\n\n\nFilter-wise pruning sensitivity-analysis:\n\n\nResNet20 (CIFAR10)\n\n\nResNet56 (CIFAR10)\n\n\n\n\n\n\n\nexamples/sensitivity-pruning\n:\n\n\nAlexNet sensitivity pruning with Iterative Pruning\n\n\nAlexNet sensitivity pruning with One-Shot Pruning\n\n\n\n\n\n\n\nexamples/ssl\n:\n\n\nResNet20 baseline training (CIFAR10 dataset)\n\n\nStructured Sparsity Learning (SSL) with layer removal on ResNet20\n\n\nSSL with channels removal on ResNet20\n\n\n\n\n\n\n\nexamples/quantization\n:\n\n\nAlexNet w. Batch-Norm (base FP32 + DoReFa)\n\n\nPre-activation ResNet20 on CIFAR10 (base FP32 + DoReFa)\n\n\nPre-activation ResNet18 on ImageNEt (base FP32 + DoReFa)\n\n\n\n\n\n\n\n\nExperiment reproducibility\n\n\nExperiment reproducibility is sometimes important.  Pete Warden recently expounded about this in his \nblog\n.\n\nPyTorch's support for deterministic execution requires us to use only one thread for loading data (other wise the multi-threaded execution of the data loaders can create random order and change the results), and to set the seed of the CPU and GPU PRNGs.  Using the \n--deterministic\n command-line flag and setting \nj=1\n will produce reproducible results (for the same PyTorch version).\n\n\nPerforming pruning sensitivity analysis\n\n\nDistiller supports element-wise and filter-wise pruning sensitivity analysis.  In both cases, L1-norm is used to rank which elements or filters to prune.  For example, when running filter-pruning sensitivity analysis, the L1-norm of the filters of each layer's weights tensor are calculated, and the bottom x% are set to zero.  \n\nThe analysis process is quite long, because currently we use the entire test dataset to assess the accuracy performance at each pruning level of each weights tensor.  
Using a small dataset for this would save much time and we plan on assessing if this will provide sufficient results.\n\nResults are output as a CSV file (\nsensitivity.csv\n) and PNG file (\nsensitivity.png\n).  The implementation is in \ndistiller/sensitivity.py\n and it contains further details about process and the format of the CSV file.\n\n\nThe example below performs element-wise pruning sensitivity analysis on ResNet20 for CIFAR10:\n\n\n$ python3 compress_classifier.py -a resnet20_cifar ../../../data.cifar10/ -j=1 --resume=../cifar10/resnet20/checkpoint_trained_dense.pth.tar --sense=element\n\n\n\n\nThe \nsense\n command-line argument can be set to either \nelement\n or \nfilter\n, depending on the type of analysis you want done.\n\n\nThere is also a \nJupyter notebook\n with example invocations, outputs and explanations.\n\n\nPost-Training Quantization\n\n\nThe following example qunatizes ResNet18 for ImageNet:\n\n\n$ python3 compress_classifier.py -a resnet18 ../../../data.imagenet  --pretrained --quantize-eval --evaluate\n\n\n\n\nSee \nhere\n for more details on how to invoke post-training quantization from the command line.\n\n\nA checkpoint with the quantized model will be dumped in the run directory. It will contain the quantized model parameters (the data type will still be FP32, but the values will be integers). The calculated quantization parameters (scale and zero-point) are stored as well in each quantized layer.\n\n\nFor more examples of post-training quantization see \nhere\n.\n\n\nSummaries\n\n\nYou can use the sample compression application to generate model summary reports, such as the attributes and compute summary report (see screen capture below).\nYou can log sparsity statistics (written to console and CSV file), performance, optimizer and model information, and also create a PNG image of the DNN.\nCreating a PNG image is an experimental feature (it relies on features which are not available on PyTorch 3.1 and that we hope will be available in PyTorch's next release), so to use it you will need to compile the PyTorch master branch, and hope for the best ;-).\n\n\n$ python3 compress_classifier.py --resume=../ssl/checkpoints/checkpoint_trained_ch_regularized_dense.pth.tar -a=resnet20_cifar ../../../data.cifar10 --summary=compute\n\n\n\n\nGenerates:\n\n\n+----+------------------------------+--------+----------+-----------------+--------------+-----------------+--------------+------------------+---------+\n|    | Name                         | Type   | Attrs    | IFM             |   IFM volume | OFM             |   OFM volume |   Weights volume |    MACs |\n|----+------------------------------+--------+----------+-----------------+--------------+-----------------+--------------+------------------+---------|\n|  0 | module.conv1                 | Conv2d | k=(3, 3) | (1, 3, 32, 32)  |         3072 | (1, 16, 32, 32) |        16384 |              432 |  442368 |\n|  1 | module.layer1.0.conv1        | Conv2d | k=(3, 3) | (1, 16, 32, 32) |        16384 | (1, 16, 32, 32) |        16384 |             2304 | 2359296 |\n|  2 | module.layer1.0.conv2        | Conv2d | k=(3, 3) | (1, 16, 32, 32) |        16384 | (1, 16, 32, 32) |        16384 |             2304 | 2359296 |\n|  3 | module.layer1.1.conv1        | Conv2d | k=(3, 3) | (1, 16, 32, 32) |        16384 | (1, 16, 32, 32) |        16384 |             2304 | 2359296 |\n|  4 | module.layer1.1.conv2        | Conv2d | k=(3, 3) | (1, 16, 32, 32) |        16384 | (1, 16, 32, 32) |        16384 |             2304 | 2359296 |\n|  
5 | module.layer1.2.conv1        | Conv2d | k=(3, 3) | (1, 16, 32, 32) |        16384 | (1, 16, 32, 32) |        16384 |             2304 | 2359296 |\n|  6 | module.layer1.2.conv2        | Conv2d | k=(3, 3) | (1, 16, 32, 32) |        16384 | (1, 16, 32, 32) |        16384 |             2304 | 2359296 |\n|  7 | module.layer2.0.conv1        | Conv2d | k=(3, 3) | (1, 16, 32, 32) |        16384 | (1, 32, 16, 16) |         8192 |             4608 | 1179648 |\n|  8 | module.layer2.0.conv2        | Conv2d | k=(3, 3) | (1, 32, 16, 16) |         8192 | (1, 32, 16, 16) |         8192 |             9216 | 2359296 |\n|  9 | module.layer2.0.downsample.0 | Conv2d | k=(1, 1) | (1, 16, 32, 32) |        16384 | (1, 32, 16, 16) |         8192 |              512 |  131072 |\n| 10 | module.layer2.1.conv1        | Conv2d | k=(3, 3) | (1, 32, 16, 16) |         8192 | (1, 32, 16, 16) |         8192 |             9216 | 2359296 |\n| 11 | module.layer2.1.conv2        | Conv2d | k=(3, 3) | (1, 32, 16, 16) |         8192 | (1, 32, 16, 16) |         8192 |             9216 | 2359296 |\n| 12 | module.layer2.2.conv1        | Conv2d | k=(3, 3) | (1, 32, 16, 16) |         8192 | (1, 32, 16, 16) |         8192 |             9216 | 2359296 |\n| 13 | module.layer2.2.conv2        | Conv2d | k=(3, 3) | (1, 32, 16, 16) |         8192 | (1, 32, 16, 16) |         8192 |             9216 | 2359296 |\n| 14 | module.layer3.0.conv1        | Conv2d | k=(3, 3) | (1, 32, 16, 16) |         8192 | (1, 64, 8, 8)   |         4096 |            18432 | 1179648 |\n| 15 | module.layer3.0.conv2        | Conv2d | k=(3, 3) | (1, 64, 8, 8)   |         4096 | (1, 64, 8, 8)   |         4096 |            36864 | 2359296 |\n| 16 | module.layer3.0.downsample.0 | Conv2d | k=(1, 1) | (1, 32, 16, 16) |         8192 | (1, 64, 8, 8)   |         4096 |             2048 |  131072 |\n| 17 | module.layer3.1.conv1        | Conv2d | k=(3, 3) | (1, 64, 8, 8)   |         4096 | (1, 64, 8, 8)   |         4096 |            36864 | 2359296 |\n| 18 | module.layer3.1.conv2        | Conv2d | k=(3, 3) | (1, 64, 8, 8)   |         4096 | (1, 64, 8, 8)   |         4096 |            36864 | 2359296 |\n| 19 | module.layer3.2.conv1        | Conv2d | k=(3, 3) | (1, 64, 8, 8)   |         4096 | (1, 64, 8, 8)   |         4096 |            36864 | 2359296 |\n| 20 | module.layer3.2.conv2        | Conv2d | k=(3, 3) | (1, 64, 8, 8)   |         4096 | (1, 64, 8, 8)   |         4096 |            36864 | 2359296 |\n| 21 | module.fc                    | Linear |          | (1, 64)         |           64 | (1, 10)         |           10 |              640 |     640 |\n+----+------------------------------+--------+----------+-----------------+--------------+-----------------+--------------+------------------+---------+\nTotal MACs: 40,813,184\n\n\n\n\nUsing TensorBoard\n\n\nGoogle's \nTensorBoard\n is an excellent tool for visualizing the progress of DNN training.  Distiller's logger supports writing performance indicators and parameter statistics in a file format that can be read by TensorBoard (Distiller uses TensorFlow's APIs in order to do this, which is why Distiller requires the installation of TensorFlow).\n\nTo view the graphs, invoke the TensorBoard server.  For example:\n\n\n$ tensorboard --logdir=logs\n\n\n\n\nDistillers's setup (requirements.txt) installs TensorFlow for CPU. 
If you want a different installation, please follow the \nTensorFlow installation instructions\n.\n\n\nCollecting activations statistics\n\n\nIn CNNs with ReLU layers, ReLU activations (feature-maps) also exhibit a nice level of sparsity (50-60% sparsity is typical). \n\nYou can collect activation statistics using the \n--act_stats\n command-line flag.\n\nFor example:\n\n\n$ python3 compress_classifier.py -a=resnet56_cifar -p=50 ../../../data.cifar10  --resume=checkpoint.resnet56_cifar_baseline.pth.tar --act-stats=test -e\n\n\n\n\nThe \ntest\n parameter indicates that, in this example, we want to collect activation statistics during the \ntest\n phase.  Note that we also used the \n-e\n command-line argument to indicate that we want to run a \ntest\n phase. The other two legal parameter values are \ntrain\n and \nvalid\n which collect activation statistics during the \ntraining\n and \nvalidation\n phases, respectively.  \n\n\nCollectors and their collaterals\n\n\nAn instance of a subclass of \nActivationStatsCollector\n can be used to collect activation statistics.  Currently, \nActivationStatsCollector\n has two types of subclasses: \nSummaryActivationStatsCollector\n and \nRecordsActivationStatsCollector\n.\n\nInstances of \nSummaryActivationStatsCollector\n compute the mean of some statistic of the activation.  It is rather\nlight-weight and quicker than collecting a record per activation.  The statistic function is configured in the constructor.\n\nIn the sample compression application, \ncompress_classifier.py\n, we create a dictionary of collectors.  For example:\n\n\nSummaryActivationStatsCollector(model,\n                                \nsparsity\n,\n                                lambda t: 100 * distiller.utils.sparsity(t))\n\n\n\n\nThe lambda expression is invoked per activation encountered during forward passes, and the value it returns (in this case, the sparsity of the activation tensors, multiplied by 100) is stored in \nmodule.sparsity\n (\n\"sparsity\"\n is this collector's name).  To access the statistics, you can invoke \ncollector.value()\n, or you can access each module's data directly.\n\n\nAnother type of collector is \nRecordsActivationStatsCollector\n which computes a hard-coded set of activations statistics and collects a\n\nrecord per activation\n.  For obvious reasons, this is slower than instances of \nSummaryActivationStatsCollector\n.\nActivationStatsCollector\n default to collecting activations statistics only on the output activations of ReLU layers, but we can choose any layer type we want.  In the example below we collect statistics from outputs of \ntorch.nn.Conv2d\n layers.\n\n\nRecordsActivationStatsCollector(model, classes=[torch.nn.Conv2d])\n\n\n\n\nCollectors can write their data to Excel workbooks (which are named using the collector's name), by invoking \ncollector.to_xlsx(path_to_workbook)\n.  In \ncompress_classifier.py\n we currently create four different collectors which you can selectively disable.  
You can also add other statistics collectors and use a different function to compute your new statistic.\n\n\ncollectors = missingdict({\n    \nsparsity\n:      SummaryActivationStatsCollector(model, \nsparsity\n,\n                                                     lambda t: 100 * distiller.utils.sparsity(t)),\n    \nl1_channels\n:   SummaryActivationStatsCollector(model, \nl1_channels\n,\n                                                     distiller.utils.activation_channels_l1),\n    \napoz_channels\n: SummaryActivationStatsCollector(model, \napoz_channels\n,\n                                                     distiller.utils.activation_channels_apoz),\n    \nrecords\n:       RecordsActivationStatsCollector(model, classes=[torch.nn.Conv2d])})\n\n\n\n\nBy default, these Collectors write their data to files in the active log directory.\n\n\nYou can use a utility function, \ndistiller.log_activation_statsitics\n, to log the data of an \nActivationStatsCollector\n instance to one of the backend-loggers.  For an example, the code below logs the \n\"sparsity\"\n collector to a TensorBoard log file.\n\n\ndistiller.log_activation_statsitics(epoch, \ntrain\n, loggers=[tflogger],\n                                    collector=collectors[\nsparsity\n])\n\n\n\n\nCaveats\n\n\nDistiller collects activations statistics using PyTorch's forward-hooks mechanism.  Collectors iteratively register the modules' forward-hooks, and collectors are called during the forward traversal and get exposed to activation data.  Registering for forward callbacks is performed like this:\n\n\nmodule.register_forward_hook\n\n\n\n\nThis makes apparent two limitations of this mechanism:\n\n\n\n\nWe can only register on PyTorch modules.  This means that we can't register on the forward hook of a functionals such as \ntorch.nn.functional.relu\n and \ntorch.nn.functional.max_pool2d\n.\n\n   Therefore, you may need to replace functionals with their module alternative.  For example:  \n\n\n\n\nclass MadeUpNet(nn.Module):\n    def __init__(self):\n        super().__init__()\n        self.conv1 = nn.Conv2d(3, 6, 5)\n\n    def forward(self, x):\n        x = F.relu(self.conv1(x))\n        return x\n\n\n\n\nCan be changed to:  \n\n\nclass MadeUpNet(nn.Module):\n    def __init__(self):\n        super().__init__()\n        self.conv1 = nn.Conv2d(3, 6, 5)\n        self.relu = nn.ReLU(inplace=True)\n\n    def forward(self, x):\n        x = self.relu(self.conv1(x))\n        return x\n\n\n\n\n\n\nWe can only use a module instance once in our models.  
If we use the same module several times, then we can't determine which node in the graph has invoked the callback, because the PyTorch callback signature \ndef hook(module, input, output)\n doesn't provide enough contextual information.\n\nTorchVision's \nResNet\n is an example of a model that uses the same instance of nn.ReLU multiple times:  \n\n\n\n\nclass BasicBlock(nn.Module):\n    expansion = 1\n    def __init__(self, inplanes, planes, stride=1, downsample=None):\n        super(BasicBlock, self).__init__()\n        self.conv1 = conv3x3(inplanes, planes, stride)\n        self.bn1 = nn.BatchNorm2d(planes)\n        self.relu = nn.ReLU(inplace=True)\n        self.conv2 = conv3x3(planes, planes)\n        self.bn2 = nn.BatchNorm2d(planes)\n        self.downsample = downsample\n        self.stride = stride\n\n    def forward(self, x):\n        residual = x\n        out = self.conv1(x)\n        out = self.bn1(out)\n        out = self.relu(out)                    # \n================\n        out = self.conv2(out)\n        out = self.bn2(out)\n        if self.downsample is not None:\n            residual = self.downsample(x)\n        out += residual\n        out = self.relu(out)                    # \n================\n        return out\n\n\n\n\nIn Distiller we changed \nResNet\n to use multiple instances of nn.ReLU, and each instance is used only once:  \n\n\nclass BasicBlock(nn.Module):\n    expansion = 1\n    def __init__(self, inplanes, planes, stride=1, downsample=None):\n        super(BasicBlock, self).__init__()\n        self.conv1 = conv3x3(inplanes, planes, stride)\n        self.bn1 = nn.BatchNorm2d(planes)\n        self.relu1 = nn.ReLU(inplace=True)\n        self.conv2 = conv3x3(planes, planes)\n        self.bn2 = nn.BatchNorm2d(planes)\n        self.relu2 = nn.ReLU(inplace=True)\n        self.downsample = downsample\n        self.stride = stride\n\n    def forward(self, x):\n        residual = x\n        out = self.conv1(x)\n        out = self.bn1(out)\n        out = self.relu1(out)                   # \n================\n        out = self.conv2(out)\n        out = self.bn2(out)\n        if self.downsample is not None:\n            residual = self.downsample(x)\n        out += residual\n        out = self.relu2(out)                   # \n================\n        return out\n\n\n\n\nUsing the Jupyter notebooks\n\n\nThe Jupyter notebooks contain many examples of how to use the statistics summaries generated by Distiller.  They are explained in a separate page.\n\n\nGenerating this documentation\n\n\nInstall mkdocs and the required packages by executing:\n\n\n$ pip3 install -r doc-requirements.txt\n\n\n\n\nTo build the project documentation run:\n\n\n$ cd distiller/docs-src\n$ mkdocs build --clean\n\n\n\n\nThis will create a folder named 'site' which contains the documentation website.\nOpen distiller/docs/site/index.html to view the documentation home page.", 
-            "title": "Usage"
-        }, 
-        {
-            "location": "/usage/index.html#using-the-sample-application", 
-            "text": "The Distiller repository contains a sample application,  distiller/examples/classifier_compression/compress_classifier.py , and a set of scheduling files which demonstrate Distiller's features.  Following is a brief discussion of how to use this application and the accompanying schedules.  You might also want to refer to the following resources:   An  explanation  of the scheduler file format.  An in-depth  discussion  of how we used these schedule files to implement several state-of-the-art DNN compression research papers.   The sample application supports various features for compression of image classification DNNs, and gives an example of how to integrate distiller in your own application.  The code is documented and should be considered the best source of documentation, but we provide some elaboration here.  This diagram shows how where  compress_classifier.py  fits in the compression workflow, and how we integrate the Jupyter notebooks as part of our research work.", 
-            "title": "Using the sample application"
-        }, 
-        {
-            "location": "/usage/index.html#command-line-arguments", 
-            "text": "To get help on the command line arguments, invoke:  $ python3 compress_classifier.py --help  For example:  $ time python3 compress_classifier.py -a alexnet --lr 0.005 -p 50 ../../../data.imagenet -j 44 --epochs 90 --pretrained --compress=../sensitivity-pruning/alexnet.schedule_sensitivity.yaml\n\nParameters:\n +----+---------------------------+------------------+---------------+----------------+------------+------------+----------+----------+----------+------------+---------+----------+------------+\n |    | Name                      | Shape            |   NNZ (dense) |   NNZ (sparse) |   Cols (%) |   Rows (%) |   Ch (%) |   2D (%) |   3D (%) |   Fine (%) |     Std |     Mean |   Abs-Mean |\n |----+---------------------------+------------------+---------------+----------------+------------+------------+----------+----------+----------+------------+---------+----------+------------|\n |  0 | features.module.0.weight  | (64, 3, 11, 11)  |         23232 |          13411 |    0.00000 |    0.00000 |  0.00000 |  0.00000 |  0.00000 |   42.27359 | 0.14391 | -0.00002 |    0.08805 |\n |  1 | features.module.3.weight  | (192, 64, 5, 5)  |        307200 |         115560 |    0.00000 |    0.00000 |  0.00000 |  1.91243 |  0.00000 |   62.38281 | 0.04703 | -0.00250 |    0.02289 |\n |  2 | features.module.6.weight  | (384, 192, 3, 3) |        663552 |         256565 |    0.00000 |    0.00000 |  0.00000 |  6.18490 |  0.00000 |   61.33445 | 0.03354 | -0.00184 |    0.01803 |\n |  3 | features.module.8.weight  | (256, 384, 3, 3) |        884736 |         315065 |    0.00000 |    0.00000 |  0.00000 |  6.96411 |  0.00000 |   64.38881 | 0.02646 | -0.00168 |    0.01422 |\n |  4 | features.module.10.weight | (256, 256, 3, 3) |        589824 |         186938 |    0.00000 |    0.00000 |  0.00000 | 15.49225 |  0.00000 |   68.30614 | 0.02714 | -0.00246 |    0.01409 |\n |  5 | classifier.1.weight       | (4096, 9216)     |      37748736 |        3398881 |    0.00000 |    0.21973 |  0.00000 |  0.21973 |  0.00000 |   90.99604 | 0.00589 | -0.00020 |    0.00168 |\n |  6 | classifier.4.weight       | (4096, 4096)     |      16777216 |        1782769 |    0.21973 |    3.46680 |  0.00000 |  3.46680 |  0.00000 |   89.37387 | 0.00849 | -0.00066 |    0.00263 |\n |  7 | classifier.6.weight       | (1000, 4096)     |       4096000 |         994738 |    3.36914 |    0.00000 |  0.00000 |  0.00000 |  0.00000 |   75.71440 | 0.01718 |  0.00030 |    0.00778 |\n |  8 | Total sparsity:           | -                |      61090496 |        7063928 |    0.00000 |    0.00000 |  0.00000 |  0.00000 |  0.00000 |   88.43694 | 0.00000 |  0.00000 |    0.00000 |\n +----+---------------------------+------------------+---------------+----------------+------------+------------+----------+----------+----------+------------+---------+----------+------------+\n 2018-04-04 21:30:52,499 - Total sparsity: 88.44\n\n 2018-04-04 21:30:52,499 - --- validate (epoch=89)-----------\n 2018-04-04 21:30:52,499 - 128116 samples (256 per mini-batch)\n 2018-04-04 21:31:04,646 - Epoch: [89][   50/  500]    Loss 2.175988    Top1 51.289063    Top5 74.023438\n 2018-04-04 21:31:06,427 - Epoch: [89][  100/  500]    Loss 2.171564    Top1 51.175781    Top5 74.308594\n 2018-04-04 21:31:11,432 - Epoch: [89][  150/  500]    Loss 2.159347    Top1 51.546875    Top5 74.473958\n 2018-04-04 21:31:14,364 - Epoch: [89][  200/  500]    Loss 2.156857    Top1 51.585938    Top5 74.568359\n 2018-04-04 21:31:18,381 - Epoch: [89][  250/  500]    Loss 2.152790    Top1 
51.707813    Top5 74.681250\n 2018-04-04 21:31:22,195 - Epoch: [89][  300/  500]    Loss 2.149962    Top1 51.791667    Top5 74.755208\n 2018-04-04 21:31:25,508 - Epoch: [89][  350/  500]    Loss 2.150936    Top1 51.827009    Top5 74.767857\n 2018-04-04 21:31:29,538 - Epoch: [89][  400/  500]    Loss 2.150853    Top1 51.781250    Top5 74.763672\n 2018-04-04 21:31:32,842 - Epoch: [89][  450/  500]    Loss 2.150156    Top1 51.828125    Top5 74.821181\n 2018-04-04 21:31:35,338 - Epoch: [89][  500/  500]    Loss 2.150417    Top1 51.833594    Top5 74.817187\n 2018-04-04 21:31:35,357 - ==  Top1: 51.838    Top5: 74.817    Loss: 2.150\n\n 2018-04-04 21:31:35,364 - Saving checkpoint\n 2018-04-04 21:31:39,251 - --- test ---------------------\n 2018-04-04 21:31:39,252 - 50000 samples (256 per mini-batch)\n 2018-04-04 21:31:51,512 - Test: [   50/  195]    Loss 1.487607    Top1 63.273438    Top5 85.695312\n 2018-04-04 21:31:55,015 - Test: [  100/  195]    Loss 1.638043    Top1 60.636719    Top5 83.664062\n 2018-04-04 21:31:58,732 - Test: [  150/  195]    Loss 1.833214    Top1 57.619792    Top5 80.447917\n 2018-04-04 21:32:01,274 - ==  Top1: 56.606    Top5: 79.446    Loss: 1.893  Let's look at the command line again:  $ time python3 compress_classifier.py -a alexnet --lr 0.005 -p 50 ../../../data.imagenet -j 44 --epochs 90 --pretrained --compress=../sensitivity-pruning/alexnet.schedule_sensitivity.yaml  In this example, we prune a TorchVision pre-trained AlexNet network, using the following configuration:   Learning-rate of 0.005  Print progress every 50 mini-batches.  Use 44 worker threads to load data (make sure to use something suitable for your machine).  Run for 90 epochs.  Torchvision's pre-trained models did not store the epoch metadata, so pruning starts at epoch 0.  When you train and prune your own networks, the last training epoch is saved as metadata with the model.  Therefore, when you load such models, the first epoch is not 0, but it is the last training epoch.  The pruning schedule is provided in  alexnet.schedule_sensitivity.yaml  Log files are written to directory  logs .", 
-            "title": "Command line arguments"
-        }, 
-        {
-            "location": "/usage/index.html#examples", 
-            "text": "Distiller comes with several example schedules which can be used together with  compress_classifier.py .\nThese example schedules (YAML) files, contain the command line that is used in order to invoke the schedule (so that you can easily recreate the results in your environment), together with the results of the pruning or regularization.  The results usually contain a table showing the sparsity of  each of the model parameters, together with the validation and test top1, top5 and loss scores.  For more details on the example schedules, you can refer to the coverage of the  Model Zoo .   examples/agp-pruning :  Automated Gradual Pruning (AGP) on MobileNet and ResNet18 (ImageNet dataset)    examples/hybrid :  AlexNet AGP with 2D (kernel) regularization (ImageNet dataset)  AlexNet sensitivity pruning with 2D regularization    examples/network_slimming :  ResNet20 Network Slimming (this is work-in-progress)    examples/pruning_filters_for_efficient_convnets :  ResNet56 baseline training (CIFAR10 dataset)  ResNet56 filter removal using filter ranking    examples/sensitivity_analysis :  Element-wise pruning sensitivity-analysis:  AlexNet (ImageNet)  MobileNet (ImageNet)  ResNet18 (ImageNet)  ResNet20 (CIFAR10)  ResNet34 (ImageNet)  Filter-wise pruning sensitivity-analysis:  ResNet20 (CIFAR10)  ResNet56 (CIFAR10)    examples/sensitivity-pruning :  AlexNet sensitivity pruning with Iterative Pruning  AlexNet sensitivity pruning with One-Shot Pruning    examples/ssl :  ResNet20 baseline training (CIFAR10 dataset)  Structured Sparsity Learning (SSL) with layer removal on ResNet20  SSL with channels removal on ResNet20    examples/quantization :  AlexNet w. Batch-Norm (base FP32 + DoReFa)  Pre-activation ResNet20 on CIFAR10 (base FP32 + DoReFa)  Pre-activation ResNet18 on ImageNEt (base FP32 + DoReFa)", 
-            "title": "Examples"
-        }, 
-        {
-            "location": "/usage/index.html#experiment-reproducibility", 
-            "text": "Experiment reproducibility is sometimes important.  Pete Warden recently expounded about this in his  blog . \nPyTorch's support for deterministic execution requires us to use only one thread for loading data (other wise the multi-threaded execution of the data loaders can create random order and change the results), and to set the seed of the CPU and GPU PRNGs.  Using the  --deterministic  command-line flag and setting  j=1  will produce reproducible results (for the same PyTorch version).", 
-            "title": "Experiment reproducibility"
-        }, 
-        {
-            "location": "/usage/index.html#performing-pruning-sensitivity-analysis", 
-            "text": "Distiller supports element-wise and filter-wise pruning sensitivity analysis.  In both cases, L1-norm is used to rank which elements or filters to prune.  For example, when running filter-pruning sensitivity analysis, the L1-norm of the filters of each layer's weights tensor are calculated, and the bottom x% are set to zero.   \nThe analysis process is quite long, because currently we use the entire test dataset to assess the accuracy performance at each pruning level of each weights tensor.  Using a small dataset for this would save much time and we plan on assessing if this will provide sufficient results. \nResults are output as a CSV file ( sensitivity.csv ) and PNG file ( sensitivity.png ).  The implementation is in  distiller/sensitivity.py  and it contains further details about process and the format of the CSV file.  The example below performs element-wise pruning sensitivity analysis on ResNet20 for CIFAR10:  $ python3 compress_classifier.py -a resnet20_cifar ../../../data.cifar10/ -j=1 --resume=../cifar10/resnet20/checkpoint_trained_dense.pth.tar --sense=element  The  sense  command-line argument can be set to either  element  or  filter , depending on the type of analysis you want done.  There is also a  Jupyter notebook  with example invocations, outputs and explanations.", 
-            "title": "Performing pruning sensitivity analysis"
-        }, 
-        {
-            "location": "/usage/index.html#post-training-quantization", 
-            "text": "The following example qunatizes ResNet18 for ImageNet:  $ python3 compress_classifier.py -a resnet18 ../../../data.imagenet  --pretrained --quantize-eval --evaluate  See  here  for more details on how to invoke post-training quantization from the command line.  A checkpoint with the quantized model will be dumped in the run directory. It will contain the quantized model parameters (the data type will still be FP32, but the values will be integers). The calculated quantization parameters (scale and zero-point) are stored as well in each quantized layer.  For more examples of post-training quantization see  here .", 
-            "title": "Post-Training Quantization"
-        }, 
-        {
-            "location": "/usage/index.html#summaries", 
-            "text": "You can use the sample compression application to generate model summary reports, such as the attributes and compute summary report (see screen capture below).\nYou can log sparsity statistics (written to console and CSV file), performance, optimizer and model information, and also create a PNG image of the DNN.\nCreating a PNG image is an experimental feature (it relies on features which are not available on PyTorch 3.1 and that we hope will be available in PyTorch's next release), so to use it you will need to compile the PyTorch master branch, and hope for the best ;-).  $ python3 compress_classifier.py --resume=../ssl/checkpoints/checkpoint_trained_ch_regularized_dense.pth.tar -a=resnet20_cifar ../../../data.cifar10 --summary=compute  Generates:  +----+------------------------------+--------+----------+-----------------+--------------+-----------------+--------------+------------------+---------+\n|    | Name                         | Type   | Attrs    | IFM             |   IFM volume | OFM             |   OFM volume |   Weights volume |    MACs |\n|----+------------------------------+--------+----------+-----------------+--------------+-----------------+--------------+------------------+---------|\n|  0 | module.conv1                 | Conv2d | k=(3, 3) | (1, 3, 32, 32)  |         3072 | (1, 16, 32, 32) |        16384 |              432 |  442368 |\n|  1 | module.layer1.0.conv1        | Conv2d | k=(3, 3) | (1, 16, 32, 32) |        16384 | (1, 16, 32, 32) |        16384 |             2304 | 2359296 |\n|  2 | module.layer1.0.conv2        | Conv2d | k=(3, 3) | (1, 16, 32, 32) |        16384 | (1, 16, 32, 32) |        16384 |             2304 | 2359296 |\n|  3 | module.layer1.1.conv1        | Conv2d | k=(3, 3) | (1, 16, 32, 32) |        16384 | (1, 16, 32, 32) |        16384 |             2304 | 2359296 |\n|  4 | module.layer1.1.conv2        | Conv2d | k=(3, 3) | (1, 16, 32, 32) |        16384 | (1, 16, 32, 32) |        16384 |             2304 | 2359296 |\n|  5 | module.layer1.2.conv1        | Conv2d | k=(3, 3) | (1, 16, 32, 32) |        16384 | (1, 16, 32, 32) |        16384 |             2304 | 2359296 |\n|  6 | module.layer1.2.conv2        | Conv2d | k=(3, 3) | (1, 16, 32, 32) |        16384 | (1, 16, 32, 32) |        16384 |             2304 | 2359296 |\n|  7 | module.layer2.0.conv1        | Conv2d | k=(3, 3) | (1, 16, 32, 32) |        16384 | (1, 32, 16, 16) |         8192 |             4608 | 1179648 |\n|  8 | module.layer2.0.conv2        | Conv2d | k=(3, 3) | (1, 32, 16, 16) |         8192 | (1, 32, 16, 16) |         8192 |             9216 | 2359296 |\n|  9 | module.layer2.0.downsample.0 | Conv2d | k=(1, 1) | (1, 16, 32, 32) |        16384 | (1, 32, 16, 16) |         8192 |              512 |  131072 |\n| 10 | module.layer2.1.conv1        | Conv2d | k=(3, 3) | (1, 32, 16, 16) |         8192 | (1, 32, 16, 16) |         8192 |             9216 | 2359296 |\n| 11 | module.layer2.1.conv2        | Conv2d | k=(3, 3) | (1, 32, 16, 16) |         8192 | (1, 32, 16, 16) |         8192 |             9216 | 2359296 |\n| 12 | module.layer2.2.conv1        | Conv2d | k=(3, 3) | (1, 32, 16, 16) |         8192 | (1, 32, 16, 16) |         8192 |             9216 | 2359296 |\n| 13 | module.layer2.2.conv2        | Conv2d | k=(3, 3) | (1, 32, 16, 16) |         8192 | (1, 32, 16, 16) |         8192 |             9216 | 2359296 |\n| 14 | module.layer3.0.conv1        | Conv2d | k=(3, 3) | (1, 32, 16, 16) |         8192 | (1, 64, 8, 8)   |         4096 |            18432 | 1179648 
|\n| 15 | module.layer3.0.conv2        | Conv2d | k=(3, 3) | (1, 64, 8, 8)   |         4096 | (1, 64, 8, 8)   |         4096 |            36864 | 2359296 |\n| 16 | module.layer3.0.downsample.0 | Conv2d | k=(1, 1) | (1, 32, 16, 16) |         8192 | (1, 64, 8, 8)   |         4096 |             2048 |  131072 |\n| 17 | module.layer3.1.conv1        | Conv2d | k=(3, 3) | (1, 64, 8, 8)   |         4096 | (1, 64, 8, 8)   |         4096 |            36864 | 2359296 |\n| 18 | module.layer3.1.conv2        | Conv2d | k=(3, 3) | (1, 64, 8, 8)   |         4096 | (1, 64, 8, 8)   |         4096 |            36864 | 2359296 |\n| 19 | module.layer3.2.conv1        | Conv2d | k=(3, 3) | (1, 64, 8, 8)   |         4096 | (1, 64, 8, 8)   |         4096 |            36864 | 2359296 |\n| 20 | module.layer3.2.conv2        | Conv2d | k=(3, 3) | (1, 64, 8, 8)   |         4096 | (1, 64, 8, 8)   |         4096 |            36864 | 2359296 |\n| 21 | module.fc                    | Linear |          | (1, 64)         |           64 | (1, 10)         |           10 |              640 |     640 |\n+----+------------------------------+--------+----------+-----------------+--------------+-----------------+--------------+------------------+---------+\nTotal MACs: 40,813,184", 
-            "title": "Summaries"
-        }, 
-        {
-            "location": "/usage/index.html#using-tensorboard", 
-            "text": "Google's  TensorBoard  is an excellent tool for visualizing the progress of DNN training.  Distiller's logger supports writing performance indicators and parameter statistics in a file format that can be read by TensorBoard (Distiller uses TensorFlow's APIs in order to do this, which is why Distiller requires the installation of TensorFlow). \nTo view the graphs, invoke the TensorBoard server.  For example:  $ tensorboard --logdir=logs  Distillers's setup (requirements.txt) installs TensorFlow for CPU. If you want a different installation, please follow the  TensorFlow installation instructions .", 
-            "title": "Using TensorBoard"
-        }, 
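The entry above only shows how to launch the TensorBoard server. As a complement, here is a minimal sketch of producing TensorBoard-readable scalar logs, assuming the third-party tensorboardX package (Distiller's own logger wraps TensorFlow's summary APIs instead, so treat the package choice and the 'performance/top1' tag as illustrative assumptions):

```python
# Minimal sketch: write scalars in the event-file format TensorBoard reads.
from tensorboardX import SummaryWriter

writer = SummaryWriter('logs/demo')          # event files land under logs/demo
for epoch in range(10):
    fake_top1 = 50.0 + epoch                 # placeholder metric for the sketch
    writer.add_scalar('performance/top1', fake_top1, global_step=epoch)
writer.close()
# View with:  $ tensorboard --logdir=logs
```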
-        {
-            "location": "/usage/index.html#collecting-activations-statistics", 
-            "text": "In CNNs with ReLU layers, ReLU activations (feature-maps) also exhibit a nice level of sparsity (50-60% sparsity is typical).  \nYou can collect activation statistics using the  --act_stats  command-line flag. \nFor example:  $ python3 compress_classifier.py -a=resnet56_cifar -p=50 ../../../data.cifar10  --resume=checkpoint.resnet56_cifar_baseline.pth.tar --act-stats=test -e  The  test  parameter indicates that, in this example, we want to collect activation statistics during the  test  phase.  Note that we also used the  -e  command-line argument to indicate that we want to run a  test  phase. The other two legal parameter values are  train  and  valid  which collect activation statistics during the  training  and  validation  phases, respectively.", 
-            "title": "Collecting activations statistics"
-        }, 
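For intuition about what is being measured, here is a rough stand-in in plain PyTorch: activation sparsity is simply the fraction of exact zeros in a feature-map tensor (distiller.utils.sparsity is the real helper; this sketch only mirrors the idea):

```python
import torch

def sparsity(t: torch.Tensor) -> float:
    """Fraction of elements that are exactly zero."""
    return 1.0 - float((t != 0).sum()) / t.numel()

acts = torch.relu(torch.randn(1, 16, 32, 32))   # ReLU zeroes negative values...
print('activation sparsity: {:.1%}'.format(sparsity(acts)))  # ...typically ~50%
```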
-        {
-            "location": "/usage/index.html#collectors-and-their-collaterals", 
-            "text": "An instance of a subclass of  ActivationStatsCollector  can be used to collect activation statistics.  Currently,  ActivationStatsCollector  has two types of subclasses:  SummaryActivationStatsCollector  and  RecordsActivationStatsCollector . \nInstances of  SummaryActivationStatsCollector  compute the mean of some statistic of the activation.  It is rather\nlight-weight and quicker than collecting a record per activation.  The statistic function is configured in the constructor. \nIn the sample compression application,  compress_classifier.py , we create a dictionary of collectors.  For example:  SummaryActivationStatsCollector(model,\n                                 sparsity ,\n                                lambda t: 100 * distiller.utils.sparsity(t))  The lambda expression is invoked per activation encountered during forward passes, and the value it returns (in this case, the sparsity of the activation tensors, multiplied by 100) is stored in  module.sparsity  ( \"sparsity\"  is this collector's name).  To access the statistics, you can invoke  collector.value() , or you can access each module's data directly.  Another type of collector is  RecordsActivationStatsCollector  which computes a hard-coded set of activations statistics and collects a record per activation .  For obvious reasons, this is slower than instances of  SummaryActivationStatsCollector . ActivationStatsCollector  default to collecting activations statistics only on the output activations of ReLU layers, but we can choose any layer type we want.  In the example below we collect statistics from outputs of  torch.nn.Conv2d  layers.  RecordsActivationStatsCollector(model, classes=[torch.nn.Conv2d])  Collectors can write their data to Excel workbooks (which are named using the collector's name), by invoking  collector.to_xlsx(path_to_workbook) .  In  compress_classifier.py  we currently create four different collectors which you can selectively disable.  You can also add other statistics collectors and use a different function to compute your new statistic.  collectors = missingdict({\n     sparsity :      SummaryActivationStatsCollector(model,  sparsity ,\n                                                     lambda t: 100 * distiller.utils.sparsity(t)),\n     l1_channels :   SummaryActivationStatsCollector(model,  l1_channels ,\n                                                     distiller.utils.activation_channels_l1),\n     apoz_channels : SummaryActivationStatsCollector(model,  apoz_channels ,\n                                                     distiller.utils.activation_channels_apoz),\n     records :       RecordsActivationStatsCollector(model, classes=[torch.nn.Conv2d])})  By default, these Collectors write their data to files in the active log directory.  You can use a utility function,  distiller.log_activation_statsitics , to log the data of an  ActivationStatsCollector  instance to one of the backend-loggers.  For an example, the code below logs the  \"sparsity\"  collector to a TensorBoard log file.  distiller.log_activation_statsitics(epoch,  train , loggers=[tflogger],\n                                    collector=collectors[ sparsity ])", 
-            "title": "Collectors and their collaterals"
-        }, 
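Putting the pieces quoted above together, a collector can be driven manually along these lines. This is only a sketch based on the constructor, collector_context, value() and to_xlsx() usages mentioned in the text; the import path is assumed from the sample application, and create_model and evaluate are placeholders for your own code:

```python
import torch
import distiller
from distiller.data_loggers import SummaryActivationStatsCollector, collector_context

model = create_model(...)                      # placeholder: your (wrapped) model
collector = SummaryActivationStatsCollector(
    model, "sparsity", lambda t: 100 * distiller.utils.sparsity(t))

with collector_context(collector):             # registers/removes the forward-hooks
    evaluate(model)                            # placeholder: your evaluation loop

print(collector.value())                       # per-module mean of the statistic
collector.to_xlsx('sparsity_stats')            # workbook named after the collector
```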
-        {
-            "location": "/usage/index.html#caveats", 
-            "text": "Distiller collects activations statistics using PyTorch's forward-hooks mechanism.  Collectors iteratively register the modules' forward-hooks, and collectors are called during the forward traversal and get exposed to activation data.  Registering for forward callbacks is performed like this:  module.register_forward_hook  This makes apparent two limitations of this mechanism:   We can only register on PyTorch modules.  This means that we can't register on the forward hook of a functionals such as  torch.nn.functional.relu  and  torch.nn.functional.max_pool2d . \n   Therefore, you may need to replace functionals with their module alternative.  For example:     class MadeUpNet(nn.Module):\n    def __init__(self):\n        super().__init__()\n        self.conv1 = nn.Conv2d(3, 6, 5)\n\n    def forward(self, x):\n        x = F.relu(self.conv1(x))\n        return x  Can be changed to:    class MadeUpNet(nn.Module):\n    def __init__(self):\n        super().__init__()\n        self.conv1 = nn.Conv2d(3, 6, 5)\n        self.relu = nn.ReLU(inplace=True)\n\n    def forward(self, x):\n        x = self.relu(self.conv1(x))\n        return x   We can only use a module instance once in our models.  If we use the same module several times, then we can't determine which node in the graph has invoked the callback, because the PyTorch callback signature  def hook(module, input, output)  doesn't provide enough contextual information. \nTorchVision's  ResNet  is an example of a model that uses the same instance of nn.ReLU multiple times:     class BasicBlock(nn.Module):\n    expansion = 1\n    def __init__(self, inplanes, planes, stride=1, downsample=None):\n        super(BasicBlock, self).__init__()\n        self.conv1 = conv3x3(inplanes, planes, stride)\n        self.bn1 = nn.BatchNorm2d(planes)\n        self.relu = nn.ReLU(inplace=True)\n        self.conv2 = conv3x3(planes, planes)\n        self.bn2 = nn.BatchNorm2d(planes)\n        self.downsample = downsample\n        self.stride = stride\n\n    def forward(self, x):\n        residual = x\n        out = self.conv1(x)\n        out = self.bn1(out)\n        out = self.relu(out)                    #  ================\n        out = self.conv2(out)\n        out = self.bn2(out)\n        if self.downsample is not None:\n            residual = self.downsample(x)\n        out += residual\n        out = self.relu(out)                    #  ================\n        return out  In Distiller we changed  ResNet  to use multiple instances of nn.ReLU, and each instance is used only once:    class BasicBlock(nn.Module):\n    expansion = 1\n    def __init__(self, inplanes, planes, stride=1, downsample=None):\n        super(BasicBlock, self).__init__()\n        self.conv1 = conv3x3(inplanes, planes, stride)\n        self.bn1 = nn.BatchNorm2d(planes)\n        self.relu1 = nn.ReLU(inplace=True)\n        self.conv2 = conv3x3(planes, planes)\n        self.bn2 = nn.BatchNorm2d(planes)\n        self.relu2 = nn.ReLU(inplace=True)\n        self.downsample = downsample\n        self.stride = stride\n\n    def forward(self, x):\n        residual = x\n        out = self.conv1(x)\n        out = self.bn1(out)\n        out = self.relu1(out)                   #  ================\n        out = self.conv2(out)\n        out = self.bn2(out)\n        if self.downsample is not None:\n            residual = self.downsample(x)\n        out += residual\n        out = self.relu2(out)                   #  ================\n        return out", 
-            "title": "Caveats"
-        }, 
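Both caveats follow from the forward-hook signature itself. The self-contained PyTorch sketch below registers a hook on a module instance; note that the callback only receives the module object, so a reused instance cannot be told apart by call site:

```python
import torch
import torch.nn as nn

def hook(module, input, output):
    # 'module' identifies the instance, not where in forward() it was invoked,
    # which is why reusing one nn.ReLU instance makes the records ambiguous.
    print(type(module).__name__, 'output sparsity:',
          float((output == 0).float().mean()))

net = nn.Sequential(nn.Conv2d(3, 6, 5), nn.ReLU())
for m in net.modules():
    if isinstance(m, nn.ReLU):
        m.register_forward_hook(hook)

net(torch.randn(1, 3, 32, 32))   # triggers the hook once per ReLU instance
```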
-        {
-            "location": "/usage/index.html#using-the-jupyter-notebooks", 
-            "text": "The Jupyter notebooks contain many examples of how to use the statistics summaries generated by Distiller.  They are explained in a separate page.", 
-            "title": "Using the Jupyter notebooks"
-        }, 
-        {
-            "location": "/usage/index.html#generating-this-documentation", 
-            "text": "Install mkdocs and the required packages by executing:  $ pip3 install -r doc-requirements.txt  To build the project documentation run:  $ cd distiller/docs-src\n$ mkdocs build --clean  This will create a folder named 'site' which contains the documentation website.\nOpen distiller/docs/site/index.html to view the documentation home page.", 
-            "title": "Generating this documentation"
-        }, 
-        {
-            "location": "/schedule/index.html", 
-            "text": "Compression scheduler\n\n\nIn iterative pruning, we create some kind of pruning regimen that specifies how to prune, and what to prune at every stage of the pruning and training stages. This motivated the design of \nCompressionScheduler\n: it needed to be part of the training loop, and to be able to make and implement pruning, regularization and quantization decisions.  We wanted to be able to change the particulars of the compression schedule, w/o touching the code, and settled on using YAML as a container for this specification.  We found that when we make many experiments on the same code base, it is easier to maintain all of these experiments if we decouple the differences from the code-base.  Therefore, we added to the scheduler support for learning-rate decay scheduling because, again, we wanted the freedom to change the LR-decay policy without changing code.  \n\n\nHigh level overview\n\n\nLet's briefly discuss the main mechanisms and abstractions: A schedule specification is composed of a list of sections defining instances of Pruners, Regularizers, Quantizers, LR-scheduler and Policies.\n\n\n\n\nPruners, Regularizers and Quantizers are very similar: They implement either a Pruning/Regularization/Quantization algorithm, respectively. \n\n\nAn LR-scheduler specifies the LR-decay algorithm.  \n\n\n\n\nThese define the \nwhat\n part of the schedule.  \n\n\nThe Policies define the \nwhen\n part of the schedule: at which epoch to start applying the Pruner/Regularizer/Quantizer/LR-decay, the epoch to end, and how often to invoke the policy (frequency of application).  A policy also defines the instance of Pruner/Regularizer/Quantizer/LR-decay it is managing.\n\nThe \nCompressionScheduler\n is configured from a YAML file or from a dictionary, but you can also manually create Policies, Pruners, Regularizers and Quantizers from code.\n\n\nSyntax through example\n\n\nWe'll use \nalexnet.schedule_agp.yaml\n to explain some of the YAML syntax for configuring Sensitivity Pruning of Alexnet.\n\n\nversion: 1\npruners:\n  my_pruner:\n    class: 'SensitivityPruner'\n    sensitivities:\n      'features.module.0.weight': 0.25\n      'features.module.3.weight': 0.35\n      'features.module.6.weight': 0.40\n      'features.module.8.weight': 0.45\n      'features.module.10.weight': 0.55\n      'classifier.1.weight': 0.875\n      'classifier.4.weight': 0.875\n      'classifier.6.weight': 0.625\n\nlr_schedulers:\n   pruning_lr:\n     class: ExponentialLR\n     gamma: 0.9\n\npolicies:\n  - pruner:\n      instance_name : 'my_pruner'\n    starting_epoch: 0\n    ending_epoch: 38\n    frequency: 2\n\n  - lr_scheduler:\n      instance_name: pruning_lr\n    starting_epoch: 24\n    ending_epoch: 200\n    frequency: 1\n\n\n\n\nThere is only one version of the YAML syntax, and the version number is not verified at the moment.  However, to be future-proof it is probably better to let the YAML parser know that you are using version-1 syntax, in case there is ever a version 2.\n\n\nversion: 1\n\n\n\n\nIn the \npruners\n section, we define the instances of pruners we want the scheduler to instantiate and use.\n\nWe define a single pruner instance, named \nmy_pruner\n, of algorithm \nSensitivityPruner\n.  We will refer to this instance in the \nPolicies\n section.\n\nThen we list the sensitivity multipliers, \\(s\\), of each of the weight tensors.\n\nYou may list as many Pruners as you want in this section, as long as each has a unique name.  
You can use several types of pruners in one schedule.\n\n\npruners:\n  my_pruner:\n    class: 'SensitivityPruner'\n    sensitivities:\n      'features.module.0.weight': 0.25\n      'features.module.3.weight': 0.35\n      'features.module.6.weight': 0.40\n      'features.module.8.weight': 0.45\n      'features.module.10.weight': 0.55\n      'classifier.1.weight': 0.875\n      'classifier.4.weight': 0.875\n      'classifier.6.weight': 0.6\n\n\n\n\nNext, we want to specify the learning-rate decay scheduling in the \nlr_schedulers\n section.  We assign a name to this instance: \npruning_lr\n.  As in the \npruners\n section, you may use any name, as long as all LR-schedulers have a unique name.  At the moment, only one instance of LR-scheduler is allowed.  The LR-scheduler must be a subclass of PyTorch's \n_LRScheduler\n.  You can use any of the schedulers defined in \ntorch.optim.lr_scheduler\n (see \nhere\n).  In addition, we've implemented some additional schedulers in Distiller (see \nhere\n). The keyword arguments (kwargs) are passed directly to the LR-scheduler's constructor, so that as new LR-schedulers are added to \ntorch.optim.lr_scheduler\n, they can be used without changing the application code.\n\n\nlr_schedulers:\n   pruning_lr:\n     class: ExponentialLR\n     gamma: 0.9\n\n\n\n\nFinally, we define the \npolicies\n section which defines the actual scheduling.  A \nPolicy\n manages an instance of a \nPruner\n, \nRegularizer\n, \nQuantizer\n, or \nLRScheduler\n, by naming the instance.  In the example below, a \nPruningPolicy\n uses the pruner instance named \nmy_pruner\n: it activates it at a frequency of 2 epochs (i.e. every other epoch), starting at epoch 0, and ending at epoch 38.  \n\n\npolicies:\n  - pruner:\n      instance_name : 'my_pruner'\n    starting_epoch: 0\n    ending_epoch: 38\n    frequency: 2\n\n  - lr_scheduler:\n      instance_name: pruning_lr\n    starting_epoch: 24\n    ending_epoch: 200\n    frequency: 1\n\n\n\n\nThis is \niterative pruning\n:\n\n\n\n\n\n\nTrain Connectivity\n\n\n\n\n\n\nPrune Connections\n\n\n\n\n\n\nRetrain Weights\n\n\n\n\n\n\nGoto 2\n\n\n\n\n\n\nIt is described  in \nLearning both Weights and Connections for Efficient Neural Networks\n:\n\n\n\n\n\"Our method prunes redundant connections using a three-step method. First, we train the network to learn which connections are important. Next, we prune the unimportant connections. Finally, we retrain the network to fine tune the weights of the remaining connections...After an initial training phase, we remove all connections whose weight is lower than a threshold. This pruning converts a dense, fully-connected layer to a sparse layer. This first phase learns the topology of the networks \u2014 learning which connections are important and removing the unimportant connections. We then retrain the sparse network so the remaining connections can compensate for the connections that have been removed. 
The phases of pruning and retraining may be repeated iteratively to further reduce network complexity.\"\n\n\n\n\nRegularization\n\n\nYou can also define and schedule regularization.\n\n\nL1 regularization\n\n\nFormat (this is an informal specification, not a valid \nABNF\n specification):\n\n\nregularizers:\n  \nREGULARIZER_NAME_STR\n:\n    class: L1Regularizer\n    reg_regims:\n      \nPYTORCH_PARAM_NAME_STR\n: \nSTRENGTH_FLOAT\n\n      ...\n      \nPYTORCH_PARAM_NAME_STR\n: \nSTRENGTH_FLOAT\n\n    threshold_criteria: [Mean_Abs | Max]\n\n\n\n\nFor example:\n\n\nversion: 1\n\nregularizers:\n  my_L1_reg:\n    class: L1Regularizer\n    reg_regims:\n      'module.layer3.1.conv1.weight': 0.000002\n      'module.layer3.1.conv2.weight': 0.000002\n      'module.layer3.1.conv3.weight': 0.000002\n      'module.layer3.2.conv1.weight': 0.000002\n    threshold_criteria: Mean_Abs\n\npolicies:\n  - regularizer:\n      instance_name: my_L1_reg\n    starting_epoch: 0\n    ending_epoch: 60\n    frequency: 1\n\n\n\n\nGroup regularization\n\n\nFormat (informal specification):\n\n\nFormat:\n  regularizers:\n    \nREGULARIZER_NAME_STR\n:\n      class: L1Regularizer\n      reg_regims:\n        \nPYTORCH_PARAM_NAME_STR\n: [\nSTRENGTH_FLOAT\n, \n'2D' | '3D' | '4D' | 'Channels' | 'Cols' | 'Rows'\n]\n        \nPYTORCH_PARAM_NAME_STR\n: [\nSTRENGTH_FLOAT\n, \n'2D' | '3D' | '4D' | 'Channels' | 'Cols' | 'Rows'\n]\n      threshold_criteria: [Mean_Abs | Max]\n\n\n\n\nFor example:\n\n\nversion: 1\n\nregularizers:\n  my_filter_regularizer:\n    class: GroupLassoRegularizer\n    reg_regims:\n      'module.layer3.1.conv1.weight': [0.00005, '3D']\n      'module.layer3.1.conv2.weight': [0.00005, '3D']\n      'module.layer3.1.conv3.weight': [0.00005, '3D']\n      'module.layer3.2.conv1.weight': [0.00005, '3D']\n    threshold_criteria: Mean_Abs\n\npolicies:\n  - regularizer:\n      instance_name: my_filter_regularizer\n    starting_epoch: 0\n    ending_epoch: 60\n    frequency: 1\n\n\n\n\nMixing it up\n\n\nYou can mix pruning and regularization.\n\n\nversion: 1\npruners:\n  my_pruner:\n    class: 'SensitivityPruner'\n    sensitivities:\n      'features.module.0.weight': 0.25\n      'features.module.3.weight': 0.35\n      'features.module.6.weight': 0.40\n      'features.module.8.weight': 0.45\n      'features.module.10.weight': 0.55\n      'classifier.1.weight': 0.875\n      'classifier.4.weight': 0.875\n      'classifier.6.weight': 0.625\n\nregularizers:\n  2d_groups_regularizer:\n    class: GroupLassoRegularizer\n    reg_regims:\n      'features.module.0.weight': [0.000012, '2D']\n      'features.module.3.weight': [0.000012, '2D']\n      'features.module.6.weight': [0.000012, '2D']\n      'features.module.8.weight': [0.000012, '2D']\n      'features.module.10.weight': [0.000012, '2D']\n\n\nlr_schedulers:\n  # Learning rate decay scheduler\n   pruning_lr:\n     class: ExponentialLR\n     gamma: 0.9\n\npolicies:\n  - pruner:\n      instance_name : 'my_pruner'\n    starting_epoch: 0\n    ending_epoch: 38\n    frequency: 2\n\n  - regularizer:\n      instance_name: '2d_groups_regularizer'\n    starting_epoch: 0\n    ending_epoch: 38\n    frequency: 1\n\n  - lr_scheduler:\n      instance_name: pruning_lr\n    starting_epoch: 24\n    ending_epoch: 200\n    frequency: 1\n\n\n\n\n\nQuantization-Aware Training\n\n\nSimilarly to pruners and regularizers, specifying a quantizer in the scheduler YAML follows the constructor arguments of the \nQuantizer\n class (see details \nhere\n). 
\nNote\n that only a single quantizer instance may be defined per YAML.\n\n\nLet's see an example:\n\n\nquantizers:\n  dorefa_quantizer:\n    class: DorefaQuantizer\n    bits_activations: 8\n    bits_weights: 4\n    bits_overrides:\n      conv1:\n        wts: null\n        acts: null\n      relu1:\n        wts: null\n        acts: null\n      final_relu:\n        wts: null\n        acts: null\n      fc:\n        wts: null\n        acts: null\n\n\n\n\n\n\nThe specific quantization method we're instantiating here is \nDorefaQuantizer\n.\n\n\nThen we define the default bit-widths for activations and weights, in this case 8 and 4-bits, respectively. \n\n\nThen, we define the \nbits_overrides\n mapping. In the example above, we choose not to quantize the first and last layer of the model. In the case of \nDorefaQuantizer\n, the weights are quantized as part of the convolution / FC layers, but the activations are quantized in separate layers, which replace the ReLU layers in the original model (remember - even though we replaced the ReLU modules with our own quantization modules, the name of the modules isn't changed). So, in all, we need to reference the first layer with parameters \nconv1\n, the first activation layer \nrelu1\n, the last activation layer \nfinal_relu\n and the last layer with parameters \nfc\n.\n\n\nSpecifying \nnull\n means \"do not quantize\".\n\n\nNote that for quantizers, we reference names of modules, not names of parameters as we do for pruners and regularizers.\n\n\n\n\nDefining overrides for \ngroups of layers\n using regular expressions\n\n\nSuppose we have a sub-module in our model named \nblock1\n, which contains multiple convolution layers which we would like to quantize to, say, 2-bits. The convolution layers are named \nconv1\n, \nconv2\n and so on. In that case we would define the following:\n\n\nbits_overrides:\n  'block1\\.conv*':\n    wts: 2\n    acts: null\n\n\n\n\n\n\nRegEx Note\n: Remember that the dot (\n.\n) is a meta-character (i.e. a reserved character) in regular expressions. So, to match the actual dot characters which separate sub-modules in PyTorch module names, we need to escape it: \n\\.\n\n\n\n\nOverlapping patterns\n are also possible, which allows us to define an override for a group of layers and also \"single out\" specific layers for different overrides. For example, let's take the last example and configure a different override for \nblock1.conv1\n:\n\n\nbits_overrides:\n  'block1\\.conv1':\n    wts: 4\n    acts: null\n  'block1\\.conv*':\n    wts: 2\n    acts: null\n\n\n\n\n\n\nImportant Note\n: The patterns are evaluated eagerly - first match wins. So, to properly quantize a model using \"broad\" patterns and more \"specific\" patterns as just shown, make sure the specific pattern is listed \nbefore\n the broad one.\n\n\n\n\nThe \nQuantizationPolicy\n, which controls the quantization procedure during training, is actually quite simplistic. All it does is call the \nprepare_model()\n function of the \nQuantizer\n when it's initialized, followed by the first call to \nquantize_params()\n. 
Then, at the end of each epoch, after the float copy of the weights has been updated, it calls the \nquantize_params()\n function again.\n\n\npolicies:\n  - quantizer:\n      instance_name: dorefa_quantizer\n      starting_epoch: 0\n      ending_epoch: 200\n      frequency: 1\n\n\n\n\nImportant Note\n: As mentioned \nhere\n, since the quantizer modifies the model's parameters (assuming training with quantization in the loop is used), the call to \nprepare_model()\n must be performed before an optimizer is called. Therefore, currently, the starting epoch for a quantization policy must be 0, otherwise the quantization process will not work as expected. If one wishes to do a \"warm-startup\" (or \"boot-strapping\"), training for a few epochs with full precision and only then starting to quantize, the only way to do this right now is to execute a separate run to generate the boot-strapped weights, and execute a second run which will resume the checkpoint with the boot-strapped weights.\n\n\nPost-Training Quantization\n\n\nPost-training quantization differs from the other techniques described here. Since it is not executed during training, it does not require any Policies nor a Scheduler. Currently, the only method implemented for post-training quantization is \nrange-based linear quantization\n. Quantizing a model using this method, requires adding 2 lines of code:\n\n\nquantizer = distiller.quantization.PostTrainLinearQuantizer(model, \nquantizer arguments\n)\nquantizer.prepare_model()\n# Execute evaluation on model as usual\n\n\n\n\nSee the documentation for \nPostTrainLinearQuantizer\n in \nrange_linear.py\n for details on the available arguments.\n\nIn addition to directly instantiating the quantizer with arguments, it can also be configured from a YAML file. The syntax for the YAML file is exactly the same as seen in the quantization-aware training section above. Not surprisingly, the \nclass\n defined must be \nPostTrainLinearQuantizer\n, and any other components or policies defined in the YAML file are ignored. We'll see how to create the quantizer in this manner below.\n\n\nIf more configurability is needed, a helper function can be used that will add a set of command-line arguments to configure the quantizer:\n\n\nparser = argparse.ArgumentParser()\ndistiller.quantization.add_post_train_quant_args(parser)\nargs = parser.parse_args()\n\n\n\n\nThese are the available command line arguments:\n\n\nArguments controlling quantization at evaluation time (\npost-training quantization\n):\n  --quantize-eval, --qe\n                        Apply linear quantization to model before evaluation.\n                        Applicable only if --evaluate is also set\n  --qe-calibration PORTION_OF_TEST_SET\n                        Run the model in evaluation mode on the specified\n                        portion of the test dataset and collect statistics.\n                        Ignores all other 'qe--*' arguments\n  --qe-mode QE_MODE, --qem QE_MODE\n                        Linear quantization mode. 
Choices: sym | asym_s |\n                        asym_u\n  --qe-bits-acts NUM_BITS, --qeba NUM_BITS\n                        Number of bits for quantization of activations\n  --qe-bits-wts NUM_BITS, --qebw NUM_BITS\n                        Number of bits for quantization of weights\n  --qe-bits-accum NUM_BITS\n                        Number of bits for quantization of the accumulator\n  --qe-clip-acts, --qeca\n                        Enable clipping of activations using min/max values\n                        averaging over batch\n  --qe-no-clip-layers LAYER_NAME [LAYER_NAME ...], --qencl LAYER_NAME [LAYER_NAME ...]\n                        List of layer names for which not to clip activations.\n                        Applicable only if --qe-clip-acts is also set\n  --qe-per-channel, --qepc\n                        Enable per-channel quantization of weights (per output\n                        channel)\n  --qe-stats-file PATH  Path to YAML file with calibration stats. If not\n                        given, dynamic quantization will be run (Note that not\n                        all layer types are supported for dynamic\n                        quantization)\n  --qe-config-file PATH\n                        Path to YAML file containing configuration for\n                        PostTrainLinearQuantizer (if present, all other --qe*\n                        arguments are ignored)\n\n\n\n\n(Note that \n--quantize-eval\n and \n--qe-calibration\n are mutually exclusive.)\n\n\nWhen using these command line arguments, the quantizer can be invoked as follows:\n\n\nif args.quantize_eval:\n    if args.qe_config_file:\n        quantizer = distiller.config_component_from_file_by_class(model, args.qe_config_file,\n                                                                  'PostTrainLinearQuantizer')\n    else:\n        quantizer = quantization.PostTrainLinearQuantizer(model, args.qe_bits_acts, args.qe_bits_wts,\n                                                          args.qe_bits_accum, None, args.qe_mode, args.qe_clip_acts,\n                                                          args.qe_no_clip_layers, args.qe_per_channel,\n                                                          args.qe_stats_file)\n    quantizer.prepare_model()\n    # Execute evaluation on model as usual\n\n\n\n\nNote that the command-line arguments don't expose the \nbits_overrides\n parameter of the quantizer, which allows fine-grained control over how each layer is quantized. To utilize this functionality, configure with a YAML file.\n\n\nTo see integration of these command line arguments in use, see the \nimage classification example\n. 
For example invocations of post-training quantization see \nhere\n.\n\n\nCollecting Statistics for Quantization\n\n\nTo generate statistics that can be used for static quantization of activations, do the following (shown here assuming the command line argument \n--qe-calibration\n shown above is used, which specifies the number of batches to use for statistics generation):\n\n\nif args.qe_calibration:\n    distiller.utils.assign_layer_fq_names(model)\n    msglogger.info(\nGenerating quantization calibration stats based on {0} users\n.format(args.qe_calibration))\n    collector = distiller.data_loggers.QuantCalibrationStatsCollector(model)\n    with collector_context(collector):\n        # Here call your model evaluation function, making sure to execute only\n        # the portion of the dataset specified by the qe_calibration argument\n    yaml_path = 'some/dir/quantization_stats.yaml'\n    collector.save(yaml_path)\n\n\n\n\nThe generated YAML stats file can then be provided using the \n--qe-stats-file\n argument. An example of a generated stats file can be found \nhere\n.\n\n\nKnowledge Distillation\n\n\nKnowledge distillation (see \nhere\n) is also implemented as a \nPolicy\n, which should be added to the scheduler. However, with the current implementation, it cannot be defined within the YAML file like the rest of the policies described above.\n\n\nTo make the integration of this method into applications a bit easier, a helper function can be used that will add a set of command-line arguments related to knowledge distillation:\n\n\nimport argparse\nimport distiller\n\nparser = argparse.ArgumentParser()\ndistiller.knowledge_distillation.add_distillation_args(parser)\n\n\n\n\n(The \nadd_distillation_args\n function accepts some optional arguments, see its implementation at \ndistiller/knowledge_distillation.py\n for details)\n\n\nThese are the command line arguments exposed by this function:\n\n\nKnowledge Distillation Training Arguments:\n  --kd-teacher ARCH     Model architecture for teacher model\n  --kd-pretrained       Use pre-trained model for teacher\n  --kd-resume PATH      Path to checkpoint from which to load teacher weights\n  --kd-temperature TEMP, --kd-temp TEMP\n                        Knowledge distillation softmax temperature\n  --kd-distill-wt WEIGHT, --kd-dw WEIGHT\n                        Weight for distillation loss (student vs. teacher soft\n                        targets)\n  --kd-student-wt WEIGHT, --kd-sw WEIGHT\n                        Weight for student vs. labels loss\n  --kd-teacher-wt WEIGHT, --kd-tw WEIGHT\n                        Weight for teacher vs. 
labels loss\n  --kd-start-epoch EPOCH_NUM\n                        Epoch from which to enable distillation\n\n\n\n\n\nOnce arguments have been parsed, some initialization code is required, similar to the following:\n\n\n# Assuming:\n# \nargs\n variable holds command line arguments\n# \nmodel\n variable holds the model we're going to train, that is - the student model\n# \ncompression_scheduler\n variable holds a CompressionScheduler instance\n\nargs.kd_policy = None\nif args.kd_teacher:\n    # Create teacher model - replace this with your model creation code\n    teacher = create_model(args.kd_pretrained, args.dataset, args.kd_teacher, device_ids=args.gpus)\n    if args.kd_resume:\n        teacher, _, _ = apputils.load_checkpoint(teacher, chkpt_file=args.kd_resume)\n\n    # Create policy and add to scheduler\n    dlw = distiller.DistillationLossWeights(args.kd_distill_wt, args.kd_student_wt, args.kd_teacher_wt)\n    args.kd_policy = distiller.KnowledgeDistillationPolicy(model, teacher, args.kd_temp, dlw)\n    compression_scheduler.add_policy(args.kd_policy, starting_epoch=args.kd_start_epoch, ending_epoch=args.epochs,\n                                     frequency=1)\n\n\n\n\nFinally, during the training loop, we need to perform forward propagation through the teacher model as well. The \nKnowledgeDistillationPolicy\n class keeps a reference to both the student and teacher models, and exposes a \nforward\n function that performs forward propagation on both of them. Since this is not one of the standard policy callbacks, we need to call this function manually from our training loop, as follows:\n\n\nif args.kd_policy is None:\n    # Revert to a \nnormal\n forward-prop call if no knowledge distillation policy is present\n    output = model(input_var)\nelse:\n    output = args.kd_policy.forward(input_var)\n\n\n\n\nTo see this integration in action, take a look at the image classification sample at \nexamples/classifier_compression/compress_classifier.py\n.", 
-            "title": "Compression Scheduling"
-        }, 
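To make the "part of the training loop" design concrete, here is a schematic loop showing where the scheduler's callbacks fit. The callback names follow the sample application, but the argument lists are simplified and should be treated as assumptions, and model, criterion, optimizer, train_loader, steps_per_epoch and num_epochs are placeholders; compress_classifier.py is the authoritative reference:

```python
# Schematic: how a CompressionScheduler instance brackets the training loop.
for epoch in range(num_epochs):
    compression_scheduler.on_epoch_begin(epoch)          # e.g. update pruning masks
    for batch_id, (inputs, target) in enumerate(train_loader):
        compression_scheduler.on_minibatch_begin(epoch, batch_id, steps_per_epoch)
        output = model(inputs)
        loss = criterion(output, target)
        # Gives regularizers a chance to add their penalty terms to the loss
        compression_scheduler.before_backward_pass(epoch, batch_id, steps_per_epoch, loss)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        compression_scheduler.on_minibatch_end(epoch, batch_id, steps_per_epoch)
    compression_scheduler.on_epoch_end(epoch)
```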
-        {
-            "location": "/schedule/index.html#compression-scheduler", 
-            "text": "In iterative pruning, we create some kind of pruning regimen that specifies how to prune, and what to prune at every stage of the pruning and training stages. This motivated the design of  CompressionScheduler : it needed to be part of the training loop, and to be able to make and implement pruning, regularization and quantization decisions.  We wanted to be able to change the particulars of the compression schedule, w/o touching the code, and settled on using YAML as a container for this specification.  We found that when we make many experiments on the same code base, it is easier to maintain all of these experiments if we decouple the differences from the code-base.  Therefore, we added to the scheduler support for learning-rate decay scheduling because, again, we wanted the freedom to change the LR-decay policy without changing code.", 
-            "title": "Compression scheduler"
-        }, 
-        {
-            "location": "/schedule/index.html#high-level-overview", 
-            "text": "Let's briefly discuss the main mechanisms and abstractions: A schedule specification is composed of a list of sections defining instances of Pruners, Regularizers, Quantizers, LR-scheduler and Policies.   Pruners, Regularizers and Quantizers are very similar: They implement either a Pruning/Regularization/Quantization algorithm, respectively.   An LR-scheduler specifies the LR-decay algorithm.     These define the  what  part of the schedule.    The Policies define the  when  part of the schedule: at which epoch to start applying the Pruner/Regularizer/Quantizer/LR-decay, the epoch to end, and how often to invoke the policy (frequency of application).  A policy also defines the instance of Pruner/Regularizer/Quantizer/LR-decay it is managing. \nThe  CompressionScheduler  is configured from a YAML file or from a dictionary, but you can also manually create Policies, Pruners, Regularizers and Quantizers from code.", 
-            "title": "High level overview"
-        }, 
-        {
-            "location": "/schedule/index.html#syntax-through-example", 
-            "text": "We'll use  alexnet.schedule_agp.yaml  to explain some of the YAML syntax for configuring Sensitivity Pruning of Alexnet.  version: 1\npruners:\n  my_pruner:\n    class: 'SensitivityPruner'\n    sensitivities:\n      'features.module.0.weight': 0.25\n      'features.module.3.weight': 0.35\n      'features.module.6.weight': 0.40\n      'features.module.8.weight': 0.45\n      'features.module.10.weight': 0.55\n      'classifier.1.weight': 0.875\n      'classifier.4.weight': 0.875\n      'classifier.6.weight': 0.625\n\nlr_schedulers:\n   pruning_lr:\n     class: ExponentialLR\n     gamma: 0.9\n\npolicies:\n  - pruner:\n      instance_name : 'my_pruner'\n    starting_epoch: 0\n    ending_epoch: 38\n    frequency: 2\n\n  - lr_scheduler:\n      instance_name: pruning_lr\n    starting_epoch: 24\n    ending_epoch: 200\n    frequency: 1  There is only one version of the YAML syntax, and the version number is not verified at the moment.  However, to be future-proof it is probably better to let the YAML parser know that you are using version-1 syntax, in case there is ever a version 2.  version: 1  In the  pruners  section, we define the instances of pruners we want the scheduler to instantiate and use. \nWe define a single pruner instance, named  my_pruner , of algorithm  SensitivityPruner .  We will refer to this instance in the  Policies  section. \nThen we list the sensitivity multipliers, \\(s\\), of each of the weight tensors. \nYou may list as many Pruners as you want in this section, as long as each has a unique name.  You can several types of pruners in one schedule.  pruners:\n  my_pruner:\n    class: 'SensitivityPruner'\n    sensitivities:\n      'features.module.0.weight': 0.25\n      'features.module.3.weight': 0.35\n      'features.module.6.weight': 0.40\n      'features.module.8.weight': 0.45\n      'features.module.10.weight': 0.55\n      'classifier.1.weight': 0.875\n      'classifier.4.weight': 0.875\n      'classifier.6.weight': 0.6  Next, we want to specify the learning-rate decay scheduling in the  lr_schedulers  section.  We assign a name to this instance:  pruning_lr .  As in the  pruners  section, you may use any name, as long as all LR-schedulers have a unique name.  At the moment, only one instance of LR-scheduler is allowed.  The LR-scheduler must be a subclass of PyTorch's  _LRScheduler .  You can use any of the schedulers defined in  torch.optim.lr_scheduler  (see  here ).  In addition, we've implemented some additional schedulers in Distiller (see  here ). The keyword arguments (kwargs) are passed directly to the LR-scheduler's constructor, so that as new LR-schedulers are added to  torch.optim.lr_scheduler , they can be used without changing the application code.  lr_schedulers:\n   pruning_lr:\n     class: ExponentialLR\n     gamma: 0.9  Finally, we define the  policies  section which defines the actual scheduling.  A  Policy  manages an instance of a  Pruner ,  Regularizer ,  Quantizer , or  LRScheduler , by naming the instance.  In the example below, a  PruningPolicy  uses the pruner instance named  my_pruner : it activates it at a frequency of 2 epochs (i.e. every other epoch), starting at epoch 0, and ending at epoch 38.    
policies:\n  - pruner:\n      instance_name : 'my_pruner'\n    starting_epoch: 0\n    ending_epoch: 38\n    frequency: 2\n\n  - lr_scheduler:\n      instance_name: pruning_lr\n    starting_epoch: 24\n    ending_epoch: 200\n    frequency: 1  This is  iterative pruning :    Train Connectivity    Prune Connections    Retrain Weights    Goto 2    It is described  in  Learning both Weights and Connections for Efficient Neural Networks :   \"Our method prunes redundant connections using a three-step method. First, we train the network to learn which connections are important. Next, we prune the unimportant connections. Finally, we retrain the network to fine tune the weights of the remaining connections...After an initial training phase, we remove all connections whose weight is lower than a threshold. This pruning converts a dense, fully-connected layer to a sparse layer. This first phase learns the topology of the networks \u2014 learning which connections are important and removing the unimportant connections. We then retrain the sparse network so the remaining connections can compensate for the connections that have been removed. The phases of pruning and retraining may be repeated iteratively to further reduce network complexity.\"", 
-            "title": "Syntax through example"
-        }, 
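Since the schedule is ordinary YAML, it can be inspected programmatically before handing it to the scheduler. A small sketch using PyYAML, with the schedule abbreviated from the example above:

```python
import yaml

schedule = yaml.safe_load("""
version: 1
pruners:
  my_pruner:
    class: 'SensitivityPruner'
    sensitivities:
      'features.module.0.weight': 0.25
policies:
  - pruner:
      instance_name: 'my_pruner'
    starting_epoch: 0
    ending_epoch: 38
    frequency: 2
""")

print(schedule['version'])                 # -> 1
print(list(schedule['pruners'].keys()))    # -> ['my_pruner']
print(schedule['policies'][0]['pruner'])   # -> {'instance_name': 'my_pruner'}
```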
-        {
-            "location": "/schedule/index.html#regularization", 
-            "text": "You can also define and schedule regularization.", 
-            "title": "Regularization"
-        }, 
-        {
-            "location": "/schedule/index.html#l1-regularization", 
-            "text": "Format (this is an informal specification, not a valid  ABNF  specification):  regularizers:\n   REGULARIZER_NAME_STR :\n    class: L1Regularizer\n    reg_regims:\n       PYTORCH_PARAM_NAME_STR :  STRENGTH_FLOAT \n      ...\n       PYTORCH_PARAM_NAME_STR :  STRENGTH_FLOAT \n    threshold_criteria: [Mean_Abs | Max]  For example:  version: 1\n\nregularizers:\n  my_L1_reg:\n    class: L1Regularizer\n    reg_regims:\n      'module.layer3.1.conv1.weight': 0.000002\n      'module.layer3.1.conv2.weight': 0.000002\n      'module.layer3.1.conv3.weight': 0.000002\n      'module.layer3.2.conv1.weight': 0.000002\n    threshold_criteria: Mean_Abs\n\npolicies:\n  - regularizer:\n      instance_name: my_L1_reg\n    starting_epoch: 0\n    ending_epoch: 60\n    frequency: 1", 
-            "title": "L1 regularization"
-        }, 
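Conceptually, what an L1 regularizer contributes is a strength-weighted sum of absolute weight values added to the task loss, which pushes weights toward exact zero. A minimal PyTorch sketch, not Distiller's implementation; the reg_regims dict mirrors the YAML mapping above and the nn.Linear model is a placeholder:

```python
import torch
import torch.nn as nn

def l1_penalty(model, reg_regims):
    """Sum of strength * |w| over the parameters named in reg_regims."""
    penalty = torch.zeros(())
    for name, param in model.named_parameters():
        if name in reg_regims:
            penalty = penalty + reg_regims[name] * param.abs().sum()
    return penalty

net = nn.Linear(10, 10)
print(l1_penalty(net, {'weight': 2e-6}))   # add this term to the task loss
```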
-        {
-            "location": "/schedule/index.html#group-regularization", 
-            "text": "Format (informal specification):  Format:\n  regularizers:\n     REGULARIZER_NAME_STR :\n      class: L1Regularizer\n      reg_regims:\n         PYTORCH_PARAM_NAME_STR : [ STRENGTH_FLOAT ,  '2D' | '3D' | '4D' | 'Channels' | 'Cols' | 'Rows' ]\n         PYTORCH_PARAM_NAME_STR : [ STRENGTH_FLOAT ,  '2D' | '3D' | '4D' | 'Channels' | 'Cols' | 'Rows' ]\n      threshold_criteria: [Mean_Abs | Max]  For example:  version: 1\n\nregularizers:\n  my_filter_regularizer:\n    class: GroupLassoRegularizer\n    reg_regims:\n      'module.layer3.1.conv1.weight': [0.00005, '3D']\n      'module.layer3.1.conv2.weight': [0.00005, '3D']\n      'module.layer3.1.conv3.weight': [0.00005, '3D']\n      'module.layer3.2.conv1.weight': [0.00005, '3D']\n    threshold_criteria: Mean_Abs\n\npolicies:\n  - regularizer:\n      instance_name: my_filter_regularizer\n    starting_epoch: 0\n    ending_epoch: 60\n    frequency: 1", 
-            "title": "Group regularization"
-        }, 
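Group Lasso penalizes the l2 norm of whole groups, so that entire structures are driven to zero together. The sketch below shows the penalty for the '3D' (per-filter) grouping of a 4D convolution weight tensor; the strength value is illustrative and the function is a conceptual stand-in, not GroupLassoRegularizer itself:

```python
import torch

def group_lasso_3d(weight: torch.Tensor, strength: float) -> torch.Tensor:
    # weight: (out_channels, in_channels, kH, kW); one group per output filter
    filters = weight.view(weight.size(0), -1)
    return strength * filters.norm(p=2, dim=1).sum()

w = torch.randn(64, 32, 3, 3, requires_grad=True)
penalty = group_lasso_3d(w, 0.00005)   # add this to the task loss
print(penalty)
```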
-        {
-            "location": "/schedule/index.html#mixing-it-up", 
-            "text": "You can mix pruning and regularization.  version: 1\npruners:\n  my_pruner:\n    class: 'SensitivityPruner'\n    sensitivities:\n      'features.module.0.weight': 0.25\n      'features.module.3.weight': 0.35\n      'features.module.6.weight': 0.40\n      'features.module.8.weight': 0.45\n      'features.module.10.weight': 0.55\n      'classifier.1.weight': 0.875\n      'classifier.4.weight': 0.875\n      'classifier.6.weight': 0.625\n\nregularizers:\n  2d_groups_regularizer:\n    class: GroupLassoRegularizer\n    reg_regims:\n      'features.module.0.weight': [0.000012, '2D']\n      'features.module.3.weight': [0.000012, '2D']\n      'features.module.6.weight': [0.000012, '2D']\n      'features.module.8.weight': [0.000012, '2D']\n      'features.module.10.weight': [0.000012, '2D']\n\n\nlr_schedulers:\n  # Learning rate decay scheduler\n   pruning_lr:\n     class: ExponentialLR\n     gamma: 0.9\n\npolicies:\n  - pruner:\n      instance_name : 'my_pruner'\n    starting_epoch: 0\n    ending_epoch: 38\n    frequency: 2\n\n  - regularizer:\n      instance_name: '2d_groups_regularizer'\n    starting_epoch: 0\n    ending_epoch: 38\n    frequency: 1\n\n  - lr_scheduler:\n      instance_name: pruning_lr\n    starting_epoch: 24\n    ending_epoch: 200\n    frequency: 1", 
-            "title": "Mixing it up"
-        }, 
-        {
-            "location": "/schedule/index.html#quantization-aware-training", 
-            "text": "Similarly to pruners and regularizers, specifying a quantizer in the scheduler YAML follows the constructor arguments of the  Quantizer  class (see details  here ).  Note  that only a single quantizer instance may be defined per YAML.  Let's see an example:  quantizers:\n  dorefa_quantizer:\n    class: DorefaQuantizer\n    bits_activations: 8\n    bits_weights: 4\n    bits_overrides:\n      conv1:\n        wts: null\n        acts: null\n      relu1:\n        wts: null\n        acts: null\n      final_relu:\n        wts: null\n        acts: null\n      fc:\n        wts: null\n        acts: null   The specific quantization method we're instantiating here is  DorefaQuantizer .  Then we define the default bit-widths for activations and weights, in this case 8 and 4-bits, respectively.   Then, we define the  bits_overrides  mapping. In the example above, we choose not to quantize the first and last layer of the model. In the case of  DorefaQuantizer , the weights are quantized as part of the convolution / FC layers, but the activations are quantized in separate layers, which replace the ReLU layers in the original model (remember - even though we replaced the ReLU modules with our own quantization modules, the name of the modules isn't changed). So, in all, we need to reference the first layer with parameters  conv1 , the first activation layer  relu1 , the last activation layer  final_relu  and the last layer with parameters  fc .  Specifying  null  means \"do not quantize\".  Note that for quantizers, we reference names of modules, not names of parameters as we do for pruners and regularizers.", 
-            "title": "Quantization-Aware Training"
-        }, 
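For reference, the DoReFa weight-quantization rule that DorefaQuantizer is named after can be sketched in a few lines of PyTorch. This shows only the forward computation from the DoReFa-Net paper; the straight-through-estimator machinery and Distiller's actual implementation details are omitted:

```python
import torch

def quantize_k(x: torch.Tensor, k: int) -> torch.Tensor:
    n = 2 ** k - 1
    return torch.round(x * n) / n          # uniform k-bit quantization on [0, 1]

def dorefa_weights(w: torch.Tensor, k: int) -> torch.Tensor:
    w = torch.tanh(w)
    w = w / (2 * w.abs().max()) + 0.5      # map into [0, 1]
    return 2 * quantize_k(w, k) - 1        # back to [-1, 1], quantized

print(dorefa_weights(torch.randn(8), k=4))
```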
-        {
-            "location": "/schedule/index.html#defining-overrides-for-groups-of-layers-using-regular-expressions", 
-            "text": "Suppose we have a sub-module in our model named  block1 , which contains multiple convolution layers which we would like to quantize to, say, 2-bits. The convolution layers are named  conv1 ,  conv2  and so on. In that case we would define the following:  bits_overrides:\n  'block1\\.conv*':\n    wts: 2\n    acts: null   RegEx Note : Remember that the dot ( . ) is a meta-character (i.e. a reserved character) in regular expressions. So, to match the actual dot characters which separate sub-modules in PyTorch module names, we need to escape it:  \\.   Overlapping patterns  are also possible, which allows to define some override for a groups of layers and also \"single-out\" specific layers for different overrides. For example, let's take the last example and configure a different override for  block1.conv1 :  bits_overrides:\n  'block1\\.conv1':\n    wts: 4\n    acts: null\n  'block1\\.conv*':\n    wts: 2\n    acts: null   Important Note : The patterns are evaluated eagerly - first match wins. So, to properly quantize a model using \"broad\" patterns and more \"specific\" patterns as just shown, make sure the specific pattern is listed  before  the broad one.   The  QuantizationPolicy , which controls the quantization procedure during training, is actually quite simplistic. All it does is call the  prepare_model()  function of the  Quantizer  when it's initialized, followed by the first call to  quantize_params() . Then, at the end of each epoch, after the float copy of the weights has been updated, it calls the  quantize_params()  function again.  policies:\n  - quantizer:\n      instance_name: dorefa_quantizer\n      starting_epoch: 0\n      ending_epoch: 200\n      frequency: 1  Important Note : As mentioned  here , since the quantizer modifies the model's parameters (assuming training with quantization in the loop is used), the call to  prepare_model()  must be performed before an optimizer is called. Therefore, currently, the starting epoch for a quantization policy must be 0, otherwise the quantization process will not work as expected. If one wishes to do a \"warm-startup\" (or \"boot-strapping\"), training for a few epochs with full precision and only then starting to quantize, the only way to do this right now is to execute a separate run to generate the boot-strapped weights, and execute a second which will resume the checkpoint with the boot-strapped weights.", 
-            "title": "Defining overrides for groups of layers using regular expressions"
-        }, 
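The "first match wins" behavior described above can be mimicked with Python's re module and an ordered mapping. This imitates the evaluation order, not Distiller's internal implementation:

```python
import re
from collections import OrderedDict

bits_overrides = OrderedDict([
    (r'block1\.conv1', {'wts': 4, 'acts': None}),   # specific pattern first
    (r'block1\.conv*', {'wts': 2, 'acts': None}),   # broad pattern second
])

def resolve(module_name):
    for pattern, override in bits_overrides.items():
        if re.match(pattern, module_name):
            return override                          # eager: first match wins
    return None

print(resolve('block1.conv1'))   # -> {'wts': 4, 'acts': None}
print(resolve('block1.conv2'))   # -> {'wts': 2, 'acts': None}
```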
-        {
-            "location": "/schedule/index.html#post-training-quantization", 
-            "text": "Post-training quantization differs from the other techniques described here. Since it is not executed during training, it does not require any Policies nor a Scheduler. Currently, the only method implemented for post-training quantization is  range-based linear quantization . Quantizing a model using this method, requires adding 2 lines of code:  quantizer = distiller.quantization.PostTrainLinearQuantizer(model,  quantizer arguments )\nquantizer.prepare_model()\n# Execute evaluation on model as usual  See the documentation for  PostTrainLinearQuantizer  in  range_linear.py  for details on the available arguments. \nIn addition to directly instantiating the quantizer with arguments, it can also be configured from a YAML file. The syntax for the YAML file is exactly the same as seen in the quantization-aware training section above. Not surprisingly, the  class  defined must be  PostTrainLinearQuantizer , and any other components or policies defined in the YAML file are ignored. We'll see how to create the quantizer in this manner below.  If more configurability is needed, a helper function can be used that will add a set of command-line arguments to configure the quantizer:  parser = argparse.ArgumentParser()\ndistiller.quantization.add_post_train_quant_args(parser)\nargs = parser.parse_args()  These are the available command line arguments:  Arguments controlling quantization at evaluation time ( post-training quantization ):\n  --quantize-eval, --qe\n                        Apply linear quantization to model before evaluation.\n                        Applicable only if --evaluate is also set\n  --qe-calibration PORTION_OF_TEST_SET\n                        Run the model in evaluation mode on the specified\n                        portion of the test dataset and collect statistics.\n                        Ignores all other 'qe--*' arguments\n  --qe-mode QE_MODE, --qem QE_MODE\n                        Linear quantization mode. Choices: sym | asym_s |\n                        asym_u\n  --qe-bits-acts NUM_BITS, --qeba NUM_BITS\n                        Number of bits for quantization of activations\n  --qe-bits-wts NUM_BITS, --qebw NUM_BITS\n                        Number of bits for quantization of weights\n  --qe-bits-accum NUM_BITS\n                        Number of bits for quantization of the accumulator\n  --qe-clip-acts, --qeca\n                        Enable clipping of activations using min/max values\n                        averaging over batch\n  --qe-no-clip-layers LAYER_NAME [LAYER_NAME ...], --qencl LAYER_NAME [LAYER_NAME ...]\n                        List of layer names for which not to clip activations.\n                        Applicable only if --qe-clip-acts is also set\n  --qe-per-channel, --qepc\n                        Enable per-channel quantization of weights (per output\n                        channel)\n  --qe-stats-file PATH  Path to YAML file with calibration stats. If not\n                        given, dynamic quantization will be run (Note that not\n                        all layer types are supported for dynamic\n                        quantization)\n  --qe-config-file PATH\n                        Path to YAML file containing configuration for\n                        PostTrainLinearQuantizer (if present, all other --qe*\n                        arguments are ignored)  (Note that  --quantize-eval  and  --qe-calibration  are mutually exclusive.)  
When using these command line arguments, the quantizer can be invoked as follows:  if args.quantize_eval:\n    if args.qe_config_file:\n        quantizer = distiller.config_component_from_file_by_class(model, args.qe_config_file,\n                                                                  'PostTrainLinearQuantizer')\n    else:\n        quantizer = quantization.PostTrainLinearQuantizer(model, args.qe_bits_acts, args.qe_bits_wts,\n                                                          args.qe_bits_accum, None, args.qe_mode, args.qe_clip_acts,\n                                                          args.qe_no_clip_layers, args.qe_per_channel,\n                                                          args.qe_stats_file)\n    quantizer.prepare_model()\n    # Execute evaluation on model as usual  Note that the command-line arguments don't expose the  bits_overrides  parameter of the quantizer, which allows fine-grained control over how each layer is quantized. To utilize this functionality, configure with a YAML file.  To see integration of these command line arguments in use, see the  image classification example . For example invocations of post-training quantization see  here .", 
-            "title": "Post-Training Quantization"
-        }, 
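The arithmetic behind range-based linear quantization is simple enough to sketch. Below is the asymmetric-unsigned case: the tensor's [min, max] range is mapped onto [0, 2^n - 1] with a scale and zero-point. This is illustrative only; range_linear.py is the authoritative implementation:

```python
import torch

def linear_quantize_asym(x: torch.Tensor, num_bits: int):
    qmax = 2 ** num_bits - 1
    scale = qmax / (x.max() - x.min())
    zero_point = torch.round(-x.min() * scale)
    q = torch.clamp(torch.round(x * scale + zero_point), 0, qmax)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return (q - zero_point) / scale

x = torch.randn(5)
q, s, zp = linear_quantize_asym(x, num_bits=8)
print(x)
print(dequantize(q, s, zp))    # reconstruction is close, but lossy
```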
-        {
-            "location": "/schedule/index.html#collecting-statistics-for-quantization", 
-            "text": "To collect generate statistics that can be used for static quantization of activations, do the following (shown here assuming the command line argument  --qe-calibration  shown above is used, which specifies the number of batches to use for statistics generation):  if args.qe_calibration:\n    distiller.utils.assign_layer_fq_names(model)\n    msglogger.info( Generating quantization calibration stats based on {0} users .format(args.qe_calibration))\n    collector = distiller.data_loggers.QuantCalibrationStatsCollector(model)\n    with collector_context(collector):\n        # Here call your model evaluation function, making sure to execute only\n        # the portion of the dataset specified by the qe_calibration argument\n    yaml_path = 'some/dir/quantization_stats.yaml'\n    collector.save(yaml_path)  The genreated YAML stats file can then be provided using the  `--qe-stats-file  argument. An example of a generated stats file can be found  here .", 
-            "title": "Collecting Statistics for Quantization"
-        }, 
-        {
-            "location": "/schedule/index.html#knowledge-distillation", 
-            "text": "Knowledge distillation (see  here ) is also implemented as a  Policy , which should be added to the scheduler. However, with the current implementation, it cannot be defined within the YAML file like the rest of the policies described above.  To make the integration of this method into applications a bit easier, a helper function can be used that will add a set of command-line arguments related to knowledge distillation:  import argparse\nimport distiller\n\nparser = argparse.ArgumentParser()\ndistiller.knowledge_distillation.add_distillation_args(parser)  (The  add_distillation_args  function accepts some optional arguments, see its implementation at  distiller/knowledge_distillation.py  for details)  These are the command line arguments exposed by this function:  Knowledge Distillation Training Arguments:\n  --kd-teacher ARCH     Model architecture for teacher model\n  --kd-pretrained       Use pre-trained model for teacher\n  --kd-resume PATH      Path to checkpoint from which to load teacher weights\n  --kd-temperature TEMP, --kd-temp TEMP\n                        Knowledge distillation softmax temperature\n  --kd-distill-wt WEIGHT, --kd-dw WEIGHT\n                        Weight for distillation loss (student vs. teacher soft\n                        targets)\n  --kd-student-wt WEIGHT, --kd-sw WEIGHT\n                        Weight for student vs. labels loss\n  --kd-teacher-wt WEIGHT, --kd-tw WEIGHT\n                        Weight for teacher vs. labels loss\n  --kd-start-epoch EPOCH_NUM\n                        Epoch from which to enable distillation  Once arguments have been parsed, some initialization code is required, similar to the following:  # Assuming:\n#  args  variable holds command line arguments\n#  model  variable holds the model we're going to train, that is - the student model\n#  compression_scheduler  variable holds a CompressionScheduler instance\n\nargs.kd_policy = None\nif args.kd_teacher:\n    # Create teacher model - replace this with your model creation code\n    teacher = create_model(args.kd_pretrained, args.dataset, args.kd_teacher, device_ids=args.gpus)\n    if args.kd_resume:\n        teacher, _, _ = apputils.load_checkpoint(teacher, chkpt_file=args.kd_resume)\n\n    # Create policy and add to scheduler\n    dlw = distiller.DistillationLossWeights(args.kd_distill_wt, args.kd_student_wt, args.kd_teacher_wt)\n    args.kd_policy = distiller.KnowledgeDistillationPolicy(model, teacher, args.kd_temp, dlw)\n    compression_scheduler.add_policy(args.kd_policy, starting_epoch=args.kd_start_epoch, ending_epoch=args.epochs,\n                                     frequency=1)  Finally, during the training loop, we need to perform forward propagation through the teacher model as well. The  KnowledgeDistillationPolicy  class keeps a reference to both the student and teacher models, and exposes a  forward  function that performs forward propagation on both of them. Since this is not one of the standard policy callbacks, we need to call this function manually from our training loop, as follows:  if args.kd_policy is None:\n    # Revert to a  normal  forward-prop call if no knowledge distillation policy is present\n    output = model(input_var)\nelse:\n    output = args.kd_policy.forward(input_var)  To see this integration in action, take a look at the image classification sample at  examples/classifier_compression/compress_classifier.py .", 
-            "title": "Knowledge Distillation"
-        }, 
-        {
-            "location": "/pruning/index.html", 
-            "text": "Pruning\n\n\nA common methodology for inducing sparsity in weights and activations is called \npruning\n.  Pruning is the application of a binary criteria to decide which weights to prune: weights which match the pruning criteria are assigned a value of zero.  Pruned elements are \"trimmed\" from the model: we zero their values and also make sure they don't take part in the back-propagation process.\n\n\nWe can prune weights, biases, and activations.  Biases are few and their contribution to a layer's output is relatively large, so there is little incentive to prune them.  We usually see sparse activations following a ReLU layer, because ReLU quenches negative activations to exact zero (\\(ReLU(x): max(0,x)\\)).   Sparsity in weights is less common, as weights tend to be very small, but are often not exact zeros.\n\n\n\nLet's define sparsity\n\n\nSparsity is a a measure of how many elements in a tensor are exact zeros, relative to the tensor size.  A tensor is considered sparse if \"most\" of its elements are zero.  How much is \"most\", is not strictly defined, but when you see a sparse tensor you know it ;-)\n\nThe \n\\(l_0\\)-\"norm\" function\n measures how many zero-elements are in a tensor \nx\n:\n\\[\\lVert x \\rVert_0\\;=\\;|x_1|^0 + |x_2|^0 + ... + |x_n|^0 \\]\nIn other words, an element contributes either a value of 1 or 0 to \\(l_0\\).  Anything but an exact zero contributes a value of 1 - that's pretty cool.\n\nSometimes it helps to think about density, the number of non-zero elements (NNZ) and sparsity's complement:\n\\[\ndensity = 1 - sparsity\n\\]\nYou can use \ndistiller.sparsity\n and \ndistiller.density\n to query a PyTorch tensor's sparsity and density.\n\n\nWhat is weights pruning?\n\n\nWeights pruning, or model pruning, is a set of methods to increase the sparsity (amount of zero-valued elements in a tensor) of a network's weights.  In general, the term 'parameters' refers to both weights and bias tensors of a model.  Biases are rarely, if ever, pruned because there are very few bias elements compared to weights elements, and it is just not worth the trouble.\n\n\nPruning requires a criteria for choosing which elements to prune - this is called the \npruning criteria\n.  The most common pruning criteria is the absolute value of each element: the element's absolute value is compared to some threshold value, and if it is below the threshold the element is set to zero (i.e. pruned) .  This is implemented by the \ndistiller.MagnitudeParameterPruner\n class.  The idea behind this method, is that weights with small \\(l_1\\)-norms (absolute value) contribute little to the final result (low saliency), so they are less important and can be removed.\n\n\nA related idea motivating pruning, is that models are over-parametrized and contain redundant logic and features.  Therefore, some of these redundancies can be removed by setting their weights to zero.\n\n\nAnd yet another way to think of pruning is to phrase it as a search for a set of weights with as many zeros as possible, which still produces acceptable inference accuracies compared to the dense-model (non-pruned model).  Another way to look at it, is to imagine that because of the very high-dimensionality of the parameter space, the immediate space around the dense-model's solution likely contains some sparse solutions, and we want to use find these sparse solutions. 
\n\n\n\n\nPruning schedule\n\n\nThe most straight-forward to prune is to take a trained model and prune it once; also called \none-shot pruning\n.  In \nLearning both Weights and Connections for Efficient Neural Networks\n Song Han et. al show that this is surprisingly effective, but also leaves a lot of potential sparsity untapped.  The surprise is what they call the \"free lunch\" effect: \n\"reducing 2x the connections without losing accuracy even without retraining.\"\n\nHowever, they also note that when employing a pruning-followed-by-retraining regimen, they can achieve much better results (higher sparsity at no accuracy loss).  This is called \niterative pruning\n, and the retraining that follows pruning is often referred to as \nfine-tuning\n. How the pruning criteria changes between iterations, how many iterations we perform and how often, and which tensors are pruned - this is collectively called the \npruning schedule\n.\n\n\nWe can think of iterative pruning as repeatedly learning which weights are important, removing the least important ones based on some importance criteria, and then retraining the model to let it \"recover\" from the pruning by adjusting the remaining weights.  At each iteration, we prune more weights.\n\nThe decision of when to stop pruning is also expressed in the schedule, and it depends on the pruning algorithm.  For example, if we are trying to achieve a specific sparsity level, then we stop when the pruning achieves that level.  And if we are pruning weights structures in order to reduce the required compute budget, then we stop the pruning when this compute reduction is achieved.\n\n\nDistiller supports expressing the pruning schedule as a YAML file (which is then executed by an instance of a PruningScheduler).\n\n\nPruning granularity\n\n\nPruning individual weight elements is called \nelement-wise pruning\n, and it is also sometimes referred to as \nfine-grained\n pruning.\n\n\nCoarse-grained pruning\n - also referred to as \nstructured pruning\n, \ngroup pruning\n, or \nblock pruning\n - is pruning entire groups of elements which have some significance.  Groups come in various shapes and sizes, but an easy to visualize group-pruning is filter-pruning, in which entire filters are removed.\n\n\nSensitivity analysis\n\n\nThe hard part about inducing sparsity via pruning is determining what threshold, or sparsity level, to use for each layer's tensors.  Sensitivity analysis is a method that tries to help us rank the tensors by their sensitivity to pruning.  \n\nThe idea is to set the pruning level (percentage) of a specific layer, and then to prune once, run an evaluation on the test dataset and record the accuracy score.  We do this for all of the parameterized layers, and for each layer we examine several sparsity levels.  This should teach us about the \"sensitivity\" of each of the layers to pruning.\n\n\nThe evaluated model should be trained to maximum accuracy before running the analysis, because we aim to understand the behavior of the trained model's performance in relation to pruning of a specific weights tensor.\n\n\nMuch as we can prune structures, we can also perform sensitivity analysis on structures.  
Distiller implements element-wise pruning sensitivity analysis using the \\(l_1\\)-norm of individual elements; and filter-wise pruning sensitivity analysis using the mean \\(l_1\\)-norm of filters.\n\n\n\nThe authors of \nPruning Filters for Efficient ConvNets\n describe how they do sensitivity analysis:\n\n\n\n\n\"To understand the sensitivity of each layer, we prune each layer independently and evaluate the resulting pruned network\u2019s accuracy on the validation set. Figure 2(b) shows that layers that maintain their accuracy as filters are pruned away correspond to layers with larger slopes in Figure 2(a). On the contrary, layers with relatively flat slopes are more sensitive to pruning. We empirically determine the number of filters to prune for each layer based on their sensitivity to pruning. For deep networks such as VGG-16 or ResNets, we observe that layers in the same stage (with the same feature map size) have a similar sensitivity to pruning. To avoid introducing layer-wise meta-parameters, we use the same pruning ratio for all layers in the same stage. For layers that are sensitive to pruning, we prune a smaller percentage of these layers or completely skip pruning them.\"\n\n\n\n\nThe diagram below shows the results of running an element-wise sensitivity analysis on Alexnet, using Distillers's \nperform_sensitivity_analysis\n utility function.\n\n\nAs reported by Song Han, and exhibited in the diagram, in Alexnet the feature detecting layers (convolution layers) are more sensitive to pruning, and their sensitivity drops, the deeper they are.  The fully-connected layers are much less sensitive, which is great, because that's where most of the parameters are.\n\n\n\n\nReferences\n\n\n \nSong Han, Jeff Pool, John Tran, William J. Dally\n.\n    \nLearning both Weights and Connections for Efficient Neural Networks\n,\n     arXiv:1607.04381v2,\n    2015.\n\n\n\n\n\nHao Li, Asim Kadav, Igor Durdanovic, Hanan Samet, Hans Peter Graf\n.\n    \nPruning Filters for Efficient ConvNets\n,\n     arXiv:1608.08710v3,\n    2017.", 
-            "title": "Pruning"
-        }, 
-        {
-            "location": "/pruning/index.html#pruning", 
-            "text": "A common methodology for inducing sparsity in weights and activations is called  pruning .  Pruning is the application of a binary criteria to decide which weights to prune: weights which match the pruning criteria are assigned a value of zero.  Pruned elements are \"trimmed\" from the model: we zero their values and also make sure they don't take part in the back-propagation process.  We can prune weights, biases, and activations.  Biases are few and their contribution to a layer's output is relatively large, so there is little incentive to prune them.  We usually see sparse activations following a ReLU layer, because ReLU quenches negative activations to exact zero (\\(ReLU(x): max(0,x)\\)).   Sparsity in weights is less common, as weights tend to be very small, but are often not exact zeros.", 
-            "title": "Pruning"
-        }, 
-        {
-            "location": "/pruning/index.html#lets-define-sparsity", 
-            "text": "Sparsity is a a measure of how many elements in a tensor are exact zeros, relative to the tensor size.  A tensor is considered sparse if \"most\" of its elements are zero.  How much is \"most\", is not strictly defined, but when you see a sparse tensor you know it ;-) \nThe  \\(l_0\\)-\"norm\" function  measures how many zero-elements are in a tensor  x :\n\\[\\lVert x \\rVert_0\\;=\\;|x_1|^0 + |x_2|^0 + ... + |x_n|^0 \\]\nIn other words, an element contributes either a value of 1 or 0 to \\(l_0\\).  Anything but an exact zero contributes a value of 1 - that's pretty cool. \nSometimes it helps to think about density, the number of non-zero elements (NNZ) and sparsity's complement:\n\\[\ndensity = 1 - sparsity\n\\]\nYou can use  distiller.sparsity  and  distiller.density  to query a PyTorch tensor's sparsity and density.", 
-            "title": "Let's define sparsity"
-        }, 
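To make the sparsity and density definitions concrete, here is a minimal sketch. It assumes PyTorch and Distiller are importable; `distiller.sparsity` and `distiller.density` are the utilities named in the text above, though their exact signatures should be checked against the source.

```python
import torch
import distiller

# A 2x3 tensor in which 2 of the 6 elements are exact zeros
t = torch.tensor([[0.0, 1.5, 0.0],
                  [2.0, -3.0, 0.5]])

print(t.numel() - int((t == 0).sum()))  # 4: the l0 "norm" (non-zero count)
print(distiller.sparsity(t))            # ~0.333: fraction of exact zeros
print(distiller.density(t))             # ~0.667: density = 1 - sparsity
```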
-        {
-            "location": "/pruning/index.html#what-is-weights-pruning", 
-            "text": "Weights pruning, or model pruning, is a set of methods to increase the sparsity (amount of zero-valued elements in a tensor) of a network's weights.  In general, the term 'parameters' refers to both weights and bias tensors of a model.  Biases are rarely, if ever, pruned because there are very few bias elements compared to weights elements, and it is just not worth the trouble. \nPruning requires a criteria for choosing which elements to prune - this is called the  pruning criteria .  The most common pruning criteria is the absolute value of each element: the element's absolute value is compared to some threshold value, and if it is below the threshold the element is set to zero (i.e. pruned) .  This is implemented by the  distiller.MagnitudeParameterPruner  class.  The idea behind this method, is that weights with small \\(l_1\\)-norms (absolute value) contribute little to the final result (low saliency), so they are less important and can be removed. \nA related idea motivating pruning, is that models are over-parametrized and contain redundant logic and features.  Therefore, some of these redundancies can be removed by setting their weights to zero. \nAnd yet another way to think of pruning is to phrase it as a search for a set of weights with as many zeros as possible, which still produces acceptable inference accuracies compared to the dense-model (non-pruned model).  Another way to look at it, is to imagine that because of the very high-dimensionality of the parameter space, the immediate space around the dense-model's solution likely contains some sparse solutions, and we want to use find these sparse solutions.", 
-            "title": "What is weights pruning?"
-        }, 
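To illustrate the magnitude criterion described above, here is a minimal element-wise thresholding sketch. It shows only the idea; Distiller's actual `distiller.MagnitudeParameterPruner` class has its own API and threshold configuration.

```python
import torch

def magnitude_prune(weights: torch.Tensor, threshold: float) -> torch.Tensor:
    """Zero out every element whose absolute value falls below `threshold`.

    A real pruner would also keep the resulting mask and re-apply it after
    each optimizer step, so pruned weights stay at zero during fine-tuning.
    """
    mask = (weights.abs() >= threshold).to(weights.dtype)
    return weights * mask

pruned = magnitude_prune(torch.randn(64, 128), threshold=0.1)
```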
-        {
-            "location": "/pruning/index.html#pruning-schedule", 
-            "text": "The most straight-forward to prune is to take a trained model and prune it once; also called  one-shot pruning .  In  Learning both Weights and Connections for Efficient Neural Networks  Song Han et. al show that this is surprisingly effective, but also leaves a lot of potential sparsity untapped.  The surprise is what they call the \"free lunch\" effect:  \"reducing 2x the connections without losing accuracy even without retraining.\" \nHowever, they also note that when employing a pruning-followed-by-retraining regimen, they can achieve much better results (higher sparsity at no accuracy loss).  This is called  iterative pruning , and the retraining that follows pruning is often referred to as  fine-tuning . How the pruning criteria changes between iterations, how many iterations we perform and how often, and which tensors are pruned - this is collectively called the  pruning schedule . \nWe can think of iterative pruning as repeatedly learning which weights are important, removing the least important ones based on some importance criteria, and then retraining the model to let it \"recover\" from the pruning by adjusting the remaining weights.  At each iteration, we prune more weights. \nThe decision of when to stop pruning is also expressed in the schedule, and it depends on the pruning algorithm.  For example, if we are trying to achieve a specific sparsity level, then we stop when the pruning achieves that level.  And if we are pruning weights structures in order to reduce the required compute budget, then we stop the pruning when this compute reduction is achieved. \nDistiller supports expressing the pruning schedule as a YAML file (which is then executed by an instance of a PruningScheduler).", 
-            "title": "Pruning schedule"
-        }, 
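The iterative prune-then-fine-tune regimen can be summarized in a few lines. This is a sketch under stated assumptions: `prune_fn` and `finetune_fn` are hypothetical application-supplied callbacks, not Distiller APIs (in Distiller itself this logic would be expressed in the YAML schedule).

```python
def iterative_prune(model, prune_fn, finetune_fn, target_sparsity, step=0.05):
    """Alternate pruning with fine-tuning until the target sparsity is reached.

    `prune_fn(model, level)` prunes the model's weights to the given sparsity
    level; `finetune_fn(model)` runs a few epochs of retraining so that the
    remaining weights can "recover".
    """
    level = step
    while level <= target_sparsity:
        prune_fn(model, level)   # remove the least important weights
        finetune_fn(model)       # retrain to recover accuracy
        level += step
```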
-        {
-            "location": "/pruning/index.html#pruning-granularity", 
-            "text": "Pruning individual weight elements is called  element-wise pruning , and it is also sometimes referred to as  fine-grained  pruning.  Coarse-grained pruning  - also referred to as  structured pruning ,  group pruning , or  block pruning  - is pruning entire groups of elements which have some significance.  Groups come in various shapes and sizes, but an easy to visualize group-pruning is filter-pruning, in which entire filters are removed.", 
-            "title": "Pruning granularity"
-        }, 
-        {
-            "location": "/pruning/index.html#sensitivity-analysis", 
-            "text": "The hard part about inducing sparsity via pruning is determining what threshold, or sparsity level, to use for each layer's tensors.  Sensitivity analysis is a method that tries to help us rank the tensors by their sensitivity to pruning.   \nThe idea is to set the pruning level (percentage) of a specific layer, and then to prune once, run an evaluation on the test dataset and record the accuracy score.  We do this for all of the parameterized layers, and for each layer we examine several sparsity levels.  This should teach us about the \"sensitivity\" of each of the layers to pruning. \nThe evaluated model should be trained to maximum accuracy before running the analysis, because we aim to understand the behavior of the trained model's performance in relation to pruning of a specific weights tensor. \nMuch as we can prune structures, we can also perform sensitivity analysis on structures.  Distiller implements element-wise pruning sensitivity analysis using the \\(l_1\\)-norm of individual elements; and filter-wise pruning sensitivity analysis using the mean \\(l_1\\)-norm of filters.  The authors of  Pruning Filters for Efficient ConvNets  describe how they do sensitivity analysis:   \"To understand the sensitivity of each layer, we prune each layer independently and evaluate the resulting pruned network\u2019s accuracy on the validation set. Figure 2(b) shows that layers that maintain their accuracy as filters are pruned away correspond to layers with larger slopes in Figure 2(a). On the contrary, layers with relatively flat slopes are more sensitive to pruning. We empirically determine the number of filters to prune for each layer based on their sensitivity to pruning. For deep networks such as VGG-16 or ResNets, we observe that layers in the same stage (with the same feature map size) have a similar sensitivity to pruning. To avoid introducing layer-wise meta-parameters, we use the same pruning ratio for all layers in the same stage. For layers that are sensitive to pruning, we prune a smaller percentage of these layers or completely skip pruning them.\"   The diagram below shows the results of running an element-wise sensitivity analysis on Alexnet, using Distillers's  perform_sensitivity_analysis  utility function. \nAs reported by Song Han, and exhibited in the diagram, in Alexnet the feature detecting layers (convolution layers) are more sensitive to pruning, and their sensitivity drops, the deeper they are.  The fully-connected layers are much less sensitive, which is great, because that's where most of the parameters are.", 
-            "title": "Sensitivity analysis"
-        }, 
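The per-layer sensitivity loop described above is easy to sketch. Here, `prune_fn` and `eval_fn` are hypothetical callbacks standing in for a single-layer pruner and a test-set evaluation; Distiller's own `perform_sensitivity_analysis` packages this kind of logic.

```python
import copy

def sensitivity_analysis(model, layer_names, sparsity_levels, prune_fn, eval_fn):
    """Prune one layer at a time, at several sparsity levels, and record accuracy.

    `prune_fn(model, layer_name, level)` prunes a single layer's weights;
    `eval_fn(model)` returns the accuracy score on the test dataset.
    """
    results = {}
    for name in layer_names:
        for level in sparsity_levels:
            candidate = copy.deepcopy(model)  # never mutate the trained model
            prune_fn(candidate, name, level)
            results[(name, level)] = eval_fn(candidate)
    return results
```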
-        {
-            "location": "/pruning/index.html#references", 
-            "text": "Song Han, Jeff Pool, John Tran, William J. Dally .\n     Learning both Weights and Connections for Efficient Neural Networks ,\n     arXiv:1607.04381v2,\n    2015.   Hao Li, Asim Kadav, Igor Durdanovic, Hanan Samet, Hans Peter Graf .\n     Pruning Filters for Efficient ConvNets ,\n     arXiv:1608.08710v3,\n    2017.", 
-            "title": "References"
-        }, 
-        {
-            "location": "/regularization/index.html", 
-            "text": "Regularization\n\n\nIn their book \nDeep Learning\n Ian Goodfellow et al. define regularization as\n\n\n\n\n\"any modification we make to a learning algorithm that is intended to reduce its generalization error, but not its training error.\"\n\n\n\n\nPyTorch's \noptimizers\n use \\(l_2\\) parameter regularization to limit the capacity of models (i.e. reduce the variance).\n\n\nIn general, we can write this as:\n\\[\nloss(W;x;y) = loss_D(W;x;y) + \\lambda_R R(W)\n\\]\nAnd specifically,\n\\[\nloss(W;x;y) = loss_D(W;x;y) + \\lambda_R \\lVert W \\rVert_2^2\n\\]\nWhere W is the collection of all weight elements in the network (i.e. this is model.parameters()), \\(loss(W;x;y)\\) is the total training loss, and \\(loss_D(W)\\) is the data loss (i.e. the error of the objective function, also called the loss function, or \ncriterion\n in the Distiller sample image classifier compression application).\n\n\noptimizer = optim.SGD(model.parameters(), lr = 0.01, momentum=0.9, weight_decay=0.0001)\ncriterion = nn.CrossEntropyLoss()\n...\nfor input, target in dataset:\n    optimizer.zero_grad()\n    output = model(input)\n    loss = criterion(output, target)\n    loss.backward()\n    optimizer.step()\n\n\n\n\n\\(\\lambda_R\\) is a scalar called the \nregularization strength\n, and it balances the data error and the regularization error.  In PyTorch, this is the \nweight_decay\n argument.\n\n\n\\(\\lVert W \\rVert_2^2\\) is the square of the \\(l_2\\)-norm of W, and as such it is a \nmagnitude\n, or sizing, of the weights tensor.\n\\[\n\\lVert W \\rVert_2^2 = \\sum_{l=1}^{L}  \\sum_{i=1}^{n} |w_{l,i}|^2 \\;\\;where \\;n = torch.numel(w_l)\n\\]\n\n\n\\(L\\) is the number of layers in the network; and the notation about used 1-based numbering to simplify the notation.\n\n\nThe qualitative differences between the \\(l_2\\)-norm, and the squared \\(l_2\\)-norm is explained in \nDeep Learning\n.\n\n\nSparsity and Regularization\n\n\nWe mention regularization because there is an interesting interaction between regularization and some DNN sparsity-inducing methods.\n\n\nIn \nDense-Sparse-Dense (DSD)\n, Song Han et al. use pruning as a regularizer to improve a model's accuracy:\n\n\n\n\n\"Sparsity is a powerful form of regularization. Our intuition is that, once the network arrives at a local minimum given the sparsity constraint, relaxing the constraint gives the network more freedom to escape the saddle point and arrive at a higher-accuracy local minimum.\"\n\n\n\n\nRegularization can also be used to induce sparsity.  To induce element-wise sparsity we can use the \\(l_1\\)-norm, \\(\\lVert W \\rVert_1\\).\n\\[\n\\lVert W \\rVert_1 = l_1(W) = \\sum_{i=1}^{|W|} |w_i|\n\\]\n\n\n\\(l_2\\)-norm regularization reduces overfitting and improves a model's accuracy by shrinking large parameters, but it does not force these parameters to absolute zero.  \\(l_1\\)-norm regularization sets some of the parameter elements to zero, therefore limiting the model's capacity while making the model simpler.  
This is sometimes referred to as \nfeature selection\n and gives us another interpretation of pruning.\n\n\nOne\n of Distiller's Jupyter notebooks explains how the \\(l_1\\)-norm regularizer induces sparsity, and how it interacts with \\(l_2\\)-norm regularization.\n\n\nIf we configure \nweight_decay\n to zero and use \\(l_1\\)-norm regularization, then we have:\n\\[\nloss(W;x;y) = loss_D(W;x;y) + \\lambda_R \\lVert W \\rVert_1\n\\]\nIf we use both regularizers, we have:\n\\[\nloss(W;x;y) = loss_D(W;x;y) + \\lambda_{R_2} \\lVert W \\rVert_2^2  + \\lambda_{R_1} \\lVert W \\rVert_1\n\\]\n\n\nClass \ndistiller.L1Regularizer\n implements \\(l_1\\)-norm regularization, and of course, you can also schedule regularization.\n\n\nl1_regularizer = distiller.s(model.parameters())\n...\nloss = criterion(output, target) + lambda * l1_regularizer()\n\n\n\n\nGroup Regularization\n\n\nIn Group Regularization, we penalize entire groups of parameter elements, instead of individual elements.  Therefore, entire groups are either sparsified (i.e. all of the group elements have a value of zero) or not.  The group structures have to be pre-defined.\n\n\nTo the data loss, and the element-wise regularization (if any), we can add group-wise regularization penalty.  We represent all of the parameter groups in layer \\(l\\) as \\( W_l^{(G)} \\), and we add the penalty of all groups for all layers.  It gets a bit messy, but not overly complicated:\n\\[\nloss(W;x;y) = loss_D(W;x;y) + \\lambda_R R(W) + \\lambda_g \\sum_{l=1}^{L} R_g(W_l^{(G)})\n\\]\n\n\nLet's denote all of the weight elements in group \\(g\\) as \\(w^{(g)}\\).\n\n\n\\[\nR_g(w^{(g)}) = \\sum_{g=1}^{G} \\lVert w^{(g)} \\rVert_g = \\sum_{g=1}^{G} \\sum_{i=1}^{|w^{(g)}|} {(w_i^{(g)})}^2\n\\]\nwhere \\(w^{(g)} \\in w^{(l)} \\) and \\( |w^{(g)}| \\) is the number of elements in \\( w^{(g)} \\).\n\n\n\\( \\lambda_g \\sum_{l=1}^{L} R_g(W_l^{(G)}) \\) is called the Group Lasso regularizer.  Much as in \\(l_1\\)-norm regularization we sum the magnitudes of all tensor elements, in Group Lasso we sum the magnitudes of element structures (i.e. groups).\n\n\n\nGroup Regularization is also called Block Regularization, Structured Regularization, or coarse-grained sparsity (remember that element-wise sparsity is sometimes referred to as fine-grained sparsity).  Group sparsity exhibits regularity (i.e. its shape is regular), and therefore\nit can be beneficial to improve inference speed.\n\n\nHuizi-et-al-2017\n provides an overview of some of the different groups: kernel, channel, filter, layers.  Fiber structures such as matrix columns and rows, as well as various shaped structures (block sparsity), and even \nintra kernel strided sparsity\n can also be used.\n\n\ndistiller.GroupLassoRegularizer\n currently implements most of these groups, and you can easily add new groups.\n\n\nReferences\n\n\n \nIan Goodfellow and Yoshua Bengio and Aaron Courville\n.\n    \nDeep Learning\n,\n     arXiv:1607.04381v2,\n    2017.\n\n\n\n\n\nSong Han, Jeff Pool, Sharan Narang, Huizi Mao, Enhao Gong, Shijian Tang, Erich Elsen, Peter Vajda, Manohar Paluri, John Tran, Bryan Catanzaro, William J. Dally\n.\n    \nDSD: Dense-Sparse-Dense Training for Deep Neural Networks\n,\n     arXiv:1607.04381v2,\n    2017.\n\n\n\n\n\nHuizi Mao, Song Han, Jeff Pool, Wenshuo Li, Xingyu Liu, Yu Wang, William J. 
Dally\n.\n    \nExploring the Regularity of Sparse Structure in Convolutional Neural Networks\n,\n    arXiv:1705.08922v3,\n    2017.\n\n\n\n\n\nSajid Anwar, Kyuyeon Hwang, and Wonyong Sung\n.\n    \nStructured pruning of deep convolutional neural networks\n,\n    arXiv:1512.08571,\n    2015", 
-            "title": "Regularization"
-        }, 
-        {
-            "location": "/regularization/index.html#regularization", 
-            "text": "In their book  Deep Learning  Ian Goodfellow et al. define regularization as   \"any modification we make to a learning algorithm that is intended to reduce its generalization error, but not its training error.\"   PyTorch's  optimizers  use \\(l_2\\) parameter regularization to limit the capacity of models (i.e. reduce the variance).  In general, we can write this as:\n\\[\nloss(W;x;y) = loss_D(W;x;y) + \\lambda_R R(W)\n\\]\nAnd specifically,\n\\[\nloss(W;x;y) = loss_D(W;x;y) + \\lambda_R \\lVert W \\rVert_2^2\n\\]\nWhere W is the collection of all weight elements in the network (i.e. this is model.parameters()), \\(loss(W;x;y)\\) is the total training loss, and \\(loss_D(W)\\) is the data loss (i.e. the error of the objective function, also called the loss function, or  criterion  in the Distiller sample image classifier compression application).  optimizer = optim.SGD(model.parameters(), lr = 0.01, momentum=0.9, weight_decay=0.0001)\ncriterion = nn.CrossEntropyLoss()\n...\nfor input, target in dataset:\n    optimizer.zero_grad()\n    output = model(input)\n    loss = criterion(output, target)\n    loss.backward()\n    optimizer.step()  \\(\\lambda_R\\) is a scalar called the  regularization strength , and it balances the data error and the regularization error.  In PyTorch, this is the  weight_decay  argument.  \\(\\lVert W \\rVert_2^2\\) is the square of the \\(l_2\\)-norm of W, and as such it is a  magnitude , or sizing, of the weights tensor.\n\\[\n\\lVert W \\rVert_2^2 = \\sum_{l=1}^{L}  \\sum_{i=1}^{n} |w_{l,i}|^2 \\;\\;where \\;n = torch.numel(w_l)\n\\]  \\(L\\) is the number of layers in the network; and the notation about used 1-based numbering to simplify the notation.  The qualitative differences between the \\(l_2\\)-norm, and the squared \\(l_2\\)-norm is explained in  Deep Learning .", 
-            "title": "Regularization"
-        }, 
-        {
-            "location": "/regularization/index.html#sparsity-and-regularization", 
-            "text": "We mention regularization because there is an interesting interaction between regularization and some DNN sparsity-inducing methods.  In  Dense-Sparse-Dense (DSD) , Song Han et al. use pruning as a regularizer to improve a model's accuracy:   \"Sparsity is a powerful form of regularization. Our intuition is that, once the network arrives at a local minimum given the sparsity constraint, relaxing the constraint gives the network more freedom to escape the saddle point and arrive at a higher-accuracy local minimum.\"   Regularization can also be used to induce sparsity.  To induce element-wise sparsity we can use the \\(l_1\\)-norm, \\(\\lVert W \\rVert_1\\).\n\\[\n\\lVert W \\rVert_1 = l_1(W) = \\sum_{i=1}^{|W|} |w_i|\n\\]  \\(l_2\\)-norm regularization reduces overfitting and improves a model's accuracy by shrinking large parameters, but it does not force these parameters to absolute zero.  \\(l_1\\)-norm regularization sets some of the parameter elements to zero, therefore limiting the model's capacity while making the model simpler.  This is sometimes referred to as  feature selection  and gives us another interpretation of pruning.  One  of Distiller's Jupyter notebooks explains how the \\(l_1\\)-norm regularizer induces sparsity, and how it interacts with \\(l_2\\)-norm regularization.  If we configure  weight_decay  to zero and use \\(l_1\\)-norm regularization, then we have:\n\\[\nloss(W;x;y) = loss_D(W;x;y) + \\lambda_R \\lVert W \\rVert_1\n\\]\nIf we use both regularizers, we have:\n\\[\nloss(W;x;y) = loss_D(W;x;y) + \\lambda_{R_2} \\lVert W \\rVert_2^2  + \\lambda_{R_1} \\lVert W \\rVert_1\n\\]  Class  distiller.L1Regularizer  implements \\(l_1\\)-norm regularization, and of course, you can also schedule regularization.  l1_regularizer = distiller.s(model.parameters())\n...\nloss = criterion(output, target) + lambda * l1_regularizer()", 
-            "title": "Sparsity and Regularization"
-        }, 
-        {
-            "location": "/regularization/index.html#group-regularization", 
-            "text": "In Group Regularization, we penalize entire groups of parameter elements, instead of individual elements.  Therefore, entire groups are either sparsified (i.e. all of the group elements have a value of zero) or not.  The group structures have to be pre-defined.  To the data loss, and the element-wise regularization (if any), we can add group-wise regularization penalty.  We represent all of the parameter groups in layer \\(l\\) as \\( W_l^{(G)} \\), and we add the penalty of all groups for all layers.  It gets a bit messy, but not overly complicated:\n\\[\nloss(W;x;y) = loss_D(W;x;y) + \\lambda_R R(W) + \\lambda_g \\sum_{l=1}^{L} R_g(W_l^{(G)})\n\\]  Let's denote all of the weight elements in group \\(g\\) as \\(w^{(g)}\\).  \\[\nR_g(w^{(g)}) = \\sum_{g=1}^{G} \\lVert w^{(g)} \\rVert_g = \\sum_{g=1}^{G} \\sum_{i=1}^{|w^{(g)}|} {(w_i^{(g)})}^2\n\\]\nwhere \\(w^{(g)} \\in w^{(l)} \\) and \\( |w^{(g)}| \\) is the number of elements in \\( w^{(g)} \\).  \\( \\lambda_g \\sum_{l=1}^{L} R_g(W_l^{(G)}) \\) is called the Group Lasso regularizer.  Much as in \\(l_1\\)-norm regularization we sum the magnitudes of all tensor elements, in Group Lasso we sum the magnitudes of element structures (i.e. groups).  \nGroup Regularization is also called Block Regularization, Structured Regularization, or coarse-grained sparsity (remember that element-wise sparsity is sometimes referred to as fine-grained sparsity).  Group sparsity exhibits regularity (i.e. its shape is regular), and therefore\nit can be beneficial to improve inference speed.  Huizi-et-al-2017  provides an overview of some of the different groups: kernel, channel, filter, layers.  Fiber structures such as matrix columns and rows, as well as various shaped structures (block sparsity), and even  intra kernel strided sparsity  can also be used.  distiller.GroupLassoRegularizer  currently implements most of these groups, and you can easily add new groups.", 
-            "title": "Group Regularization"
-        }, 
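As a concrete instance of the Group Lasso formula above, with convolution filters as the groups: each filter contributes its \(l_2\) magnitude, so the penalty pushes entire filters toward zero together. This is a minimal sketch, not Distiller's `distiller.GroupLassoRegularizer`.

```python
import torch

def group_lasso_penalty(conv_weight: torch.Tensor) -> torch.Tensor:
    """Group Lasso over the filters of a 4-D conv weight
    (out_channels, in_channels, k, k): sum the l2 norm of each filter."""
    per_filter = conv_weight.view(conv_weight.size(0), -1)  # one row per filter
    return per_filter.norm(p=2, dim=1).sum()

# The resulting scalar is added to the data loss, scaled by lambda_g:
# loss = criterion(output, target) + lambda_g * group_lasso_penalty(w)
```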
-        {
-            "location": "/regularization/index.html#references", 
-            "text": "Ian Goodfellow and Yoshua Bengio and Aaron Courville .\n     Deep Learning ,\n     arXiv:1607.04381v2,\n    2017.   Song Han, Jeff Pool, Sharan Narang, Huizi Mao, Enhao Gong, Shijian Tang, Erich Elsen, Peter Vajda, Manohar Paluri, John Tran, Bryan Catanzaro, William J. Dally .\n     DSD: Dense-Sparse-Dense Training for Deep Neural Networks ,\n     arXiv:1607.04381v2,\n    2017.   Huizi Mao, Song Han, Jeff Pool, Wenshuo Li, Xingyu Liu, Yu Wang, William J. Dally .\n     Exploring the Regularity of Sparse Structure in Convolutional Neural Networks ,\n    arXiv:1705.08922v3,\n    2017.   Sajid Anwar, Kyuyeon Hwang, and Wonyong Sung .\n     Structured pruning of deep convolutional neural networks ,\n    arXiv:1512.08571,\n    2015", 
-            "title": "References"
-        }, 
-        {
-            "location": "/quantization/index.html", 
-            "text": "Quantization\n\n\nQuantization refers to the process of reducing the number of bits that represent a number. In the context of deep learning, the predominant numerical format used for research and for deployment has so far been 32-bit floating point, or FP32. However, the desire for reduced bandwidth and compute requirements of deep learning models has driven research into using lower-precision numerical formats. It has been extensively demonstrated that weights and activations can be represented using 8-bit integers (or INT8) without incurring significant loss in accuracy. The use of even lower bit-widths, such as 4/2/1-bits, is an active field of research that has also shown great progress.\n\n\nNote that this discussion is on quantization only in the context of more efficient inference. Using lower-precision numerics for more efficient training is currently out of scope.\n\n\nMotivation: Overall Efficiency\n\n\nThe more obvious benefit from quantization is \nsignificantly reduced bandwidth and storage\n. For instance, using INT8 for weights and activations consumes 4x less overall bandwidth compared to FP32.\n\nAdditionally integer compute is \nfaster\n than floating point compute. It is also much more \narea and energy efficient\n: \n\n\n\n\n\n\n\n\nINT8 Operation\n\n\nEnergy Saving vs FP32\n\n\nArea Saving vs FP32\n\n\n\n\n\n\n\n\n\n\nAdd\n\n\n30x\n\n\n116x\n\n\n\n\n\n\nMultiply\n\n\n18.5x\n\n\n27x\n\n\n\n\n\n\n\n\n(\nDally, 2015\n)\n\n\nNote that very aggressive quantization can yield even more efficiency. If weights are binary (-1, 1) or ternary (-1, 0, 1 using 2-bits), then convolution and fully-connected layers can be computed with additions and subtractions only, removing multiplications completely. If activations are binary as well, then additions can also be removed, in favor of bitwise operations (\nRastegari et al., 2016\n).\n\n\nInteger vs. FP32\n\n\nThere are two main attributes when discussing a numerical format. The first is \ndynamic range\n, which refers to the range of representable numbers. The second one is how many values can be represented within the dynamic range, which in turn determines the \nprecision / resolution\n of the format (the distance between two numbers).\n\nFor all integer formats, the dynamic range is \n[-2^{n-1} .. 2^{n-1}-1]\n, where \nn\n is the number of bits. So for INT8 the range is \n[-128 .. 127]\n, and for INT4 it is \n[-16 .. 15]\n (we're limiting ourselves to signed integers for now). The number of representable values is \n2^n\n.\nContrast that with FP32, where the dynamic range is \n\\pm 3.4\\ x\\ 10^{38}\n, and approximately \n4.2\\ x\\ 10^9\n values can be represented.\n\nWe can immediately see that FP32 is much more \nversatile\n, in that it is able to represent a wide range of distributions accurately. This is a nice property for deep learning models, where the distributions of weights and activations are usually very different (at least in dynamic range). In addition the dynamic range can differ between layers in the model.\n\nIn order to be able to represent these different distributions with an integer format, a \nscale factor\n is used to map the dynamic range of the tensor to the integer format range. But still we remain with the issue of having a significantly lower number of representable values, that is - much lower resolution.\n\nNote that this scale factor is, in most cases, a floating-point number. Hence, even when using integer numerics, some floating-point computations remain. 
\nCourbariaux et al., 2014\n scale using only shifts, eliminating the floating point operation. In \nGEMMLWOP\n, the FP32 scale factor is approximated using an integer or fixed-point multiplication followed by a shift operation. In many cases the effect of this approximation on accuracy is negligible.\n\n\nAvoiding Overflows\n\n\nConvolution and fully connected layers involve the storing of intermediate results in accumulators. Due to the limited dynamic range of integer formats, if we would use the same bit-width for the weights and activation, \nand\n for the accumulators, we would likely overflow very quickly. Therefore, accumulators are usually implemented with higher bit-widths.\n\nThe result of multiplying two \nn\n-bit integers is, at most, a \n2n\n-bit number. In convolution layers, such multiplications are accumulated \nc\\cdot k^2\n times, where \nc\n is the number of input channels and \nk\n is the kernel width (assuming a square kernel). Hence, to avoid overflowing, the accumulator should be \n2n + M\n-bits wide, where M is at least \nlog_2(c\\cdot k^2)\n. In many cases 32-bit accumulators are used, however for INT4 and lower it might be possible to use less than 32 -bits, depending on the expected use cases and layer widths.\n\n\n\"Conservative\" Quantization: INT8\n\n\nIn many cases, taking a model trained for FP32 and directly quantizing it to INT8, without any re-training, can result in a relatively low loss of accuracy (which may or may not be acceptable, depending on the use case). Some fine-tuning can further improve the accuracy (\nGysel at al., 2018\n).\n\nAs mentioned above, a scale factor is used to adapt the dynamic range of the tensor at hand to that of the integer format. This scale factor needs to be calculated per-layer per-tensor. The simplest way is to map the min/max values of the float tensor to the min/max of the integer format. For weights and biases this is easy, as they are set once training is complete. For activations, the min/max float values can be obtained \"online\" during inference, or \"offline\".\n\n\n\n\nOffline\n means gathering activations statistics before deploying the model, either during training or by running a few \"calibration\" batches on the trained FP32 model. Based on these gathered statistics, the scaled factors are calculated and are fixed once the model is deployed. This method has the risk of encountering values outside the previously observed ranges at runtime. These values will be clipped, which might lead to accuracy degradation.\n\n\nOnline\n means calculating the min/max values for each tensor dynamically during runtime. In this method clipping cannot occur, however the added computation resources required to calculate the min/max values at runtime might be prohibitive.\n\n\n\n\n\n\n\nIt is important to note, however, that the full float range of an activations tensor usually includes elements which are statistically outliers. These values can be discarded by using a narrower min/max range, effectively allowing some clipping to occur in favor of increasing the resolution provided to the part of the distribution containing most of the information. A simple method which can yield nice results is to simply use an average of the observed min/max values instead of the actual values. Alternatively, statistical measures can be used to intelligently select where to clip the original range in order to preserve as much information as possible (\nMigacz, 2017\n). 
Going further, \nBanner et al., 2018\n have proposed a method for analytically computing the clipping value under certain conditions.\n\n\nAnother possible optimization point is \nscale-factor scope\n. The most common way is use a single scale-factor per-layer, but it is also possible to calculate a scale-factor per-channel. This can be beneficial if the weight distributions vary greatly between channels.\n\n\nWhen used to directly quantize a model without re-training, as described so far, this method is commonly referred to as \npost-training quantization\n. However, recent publications have shown that there are cases where post-training quantization to INT8 doesn't preserve accuracy (\nBenoit et al., 2018\n, \nKrishnamoorthi, 2018\n). Namely, smaller models such as MobileNet seem to not respond as well to post-training quantization, presumabley due to their smaller representational capacity. In such cases, \nquantization-aware training\n is used.\n\n\n\"Aggressive\" Quantization: INT4 and Lower\n\n\nNaively quantizing a FP32 model to INT4 and lower usually incurs significant accuracy degradation. Many works have tried to mitigate this effect. They usually employ one or more of the following concepts in order to improve model accuracy:\n\n\n\n\nTraining / Re-Training\n: For INT4 and lower, training is required in order to obtain reasonable accuracy. The training loop is modified to take quantization into account. See details in the \nnext section\n.\n\n\nZhou S et al., 2016\n have shown that bootstrapping the quantized model with trained FP32 weights leads to higher accuracy, as opposed to training from scratch. Other methods \nrequire\n a trained FP32 model, either as a starting point (\nZhou A et al., 2017\n), or as a teacher network in a knowledge distillation training setup (see \nhere\n).\n\n\nReplacing the activation function\n: The most common activation function in vision models is ReLU, which is unbounded. That is - its dynamic range is not limited for positive inputs. This is very problematic for INT4 and below due to the very limited range and resolution. Therefore, most methods replace ReLU with another function which is bounded. In some cases a clipping function with hard coded values is used (\nZhou S et al., 2016\n, \nMishra et al., 2018\n). Another method learns the clipping value per layer, with better results (\nChoi et al., 2018\n). Once the clipping value is set, the scale factor used for quantization is also set, and no further calibration steps are required (as opposed to INT8 methods described above).\n\n\nModifying network structure\n: \nMishra et al., 2018\n try to compensate for the loss of information due to quantization by using wider layers (more channels). \nLin et al., 2017\n proposed a binary quantization method in which a single FP32 convolution is replaced with multiple binary convolutions, each scaled to represent a different \"base\", covering a larger dynamic range overall.\n\n\nFirst and last layer\n: Many methods do not quantize the first and last layer of the model. It has been observed by \nHan et al., 2015\n that the first convolutional layer is more sensitive to weights pruning, and some quantization works cite the same reason and show it empirically (\nZhou S et al., 2016\n, \nChoi et al., 2018\n). Some works also note that these layers usually constitute a very small portion of the overall computation within the model, further reducing the motivation to quantize them (\nRastegari et al., 2016\n). 
Most methods keep the first and last layers at FP32. However, \nChoi et al., 2018\n showed that \"conservative\" quantization of these layers, e.g. to INT8, does not reduce accuracy.\n\n\nIterative quantization\n: Most methods quantize the entire model at once. \nZhou A et al., 2017\n employ an iterative method, which starts with a trained FP32 baseline, and quantizes only a portion of the model at the time followed by several epochs of re-training to recover the accuracy loss from quantization.\n\n\nMixed Weights and Activations Precision\n: It has been observed that activations are more sensitive to quantization than weights (\nZhou S et al., 2016\n). Hence it is not uncommon to see experiments with activations quantized to a higher precision compared to weights. Some works have focused solely on quantizing weights, keeping the activations at FP32 (\nLi et al., 2016\n, \nZhu et al., 2016\n).\n\n\n\n\nQuantization-Aware Training\n\n\nAs mentioned above, in order to minimize the loss of accuracy from \"aggressive\" quantization, many methods that target INT4 and lower (and in some cases for INT8 as well) involve training the model in a way that considers the quantization. This means training with quantization of weights and activations \"baked\" into the training procedure. The training graph usually looks like this:\n\n\n\n\nA full precision copy of the weights is maintained throughout the training process (\"weights_fp\" in the diagram). Its purpose is to accumulate the small changes from the gradients without loss of precision (Note that the quantization of the weights is an integral part of the training graph, meaning that we back-propagate through it as well). Once the model is trained, only the quantized weights are used for inference.\n\nIn the diagram we show \"layer N\" as the conv + batch-norm + activation combination, but the same applies to fully-connected layers, element-wise operations, etc. During training, the operations within \"layer N\" can still run in full precision, with the \"quantize\" operations in the boundaries ensuring discrete-valued weights and activations. This is sometimes called \"simulated quantization\".  \n\n\nStraight-Through Estimator\n\n\nAn important question in this context is how to back-propagate through the quantization functions. These functions are discrete-valued, hence their derivative is 0 almost everywhere. So, using their gradients as-is would severely hinder the learning process. An approximation commonly used to overcome this issue is the \"straight-through estimator\" (STE) (\nHinton et al., 2012\n, \nBengio, 2013\n), which simply passes the gradient through these functions as-is.  \n\n\nReferences\n\n\n\n\nWilliam Dally\n. High-Performance Hardware for Machine Learning. \nTutorial, NIPS, 2015\n\n\n\n\n\nMohammad Rastegari, Vicente Ordone, Joseph Redmon and Ali Farhadi\n. XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks. \nECCV, 2016\n\n\n\n\n\nMatthieu Courbariaux, Yoshua Bengio and Jean-Pierre David\n. Training deep neural networks with low precision multiplications. \narxiv:1412.7024\n\n\n\n\n\nPhilipp Gysel, Jon Pimentel, Mohammad Motamedi and Soheil Ghiasi\n. Ristretto: A Framework for Empirical Study of Resource-Efficient Inference in Convolutional Neural Networks. \nIEEE Transactions on Neural Networks and Learning Systems, 2018\n\n\n\n\n\nSzymon Migacz\n. 8-bit Inference with TensorRT. \nGTC San Jose, 2017\n\n\n\n\n\nShuchang Zhou, Zekun Ni, Xinyu Zhou, He Wen, Yuxin Wu and Yuheng Zou\n. 
DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients. \narxiv:1606.06160\n\n\n\n\n\nAojun Zhou, Anbang Yao, Yiwen Guo, Lin Xu and Yurong Chen\n. Incremental Network Quantization: Towards Lossless CNNs with Low-precision Weights. \nICLR, 2017\n\n\n\n\n\nAsit Mishra, Eriko Nurvitadhi, Jeffrey J Cook and Debbie Marr\n. WRPN: Wide Reduced-Precision Networks. \nICLR, 2018\n\n\n\n\n\nJungwook Choi, Zhuo Wang, Swagath Venkataramani, Pierce I-Jen Chuang, Vijayalakshmi Srinivasan and Kailash Gopalakrishnan\n. PACT: Parameterized Clipping Activation for Quantized Neural Networks. \narxiv:1805.06085\n\n\n\n\n\nXiaofan Lin, Cong Zhao and Wei Pan\n. Towards Accurate Binary Convolutional Neural Network. \nNIPS, 2017\n\n\n\n\n\nSong Han, Jeff Pool, John Tran and William Dally\n. Learning both Weights and Connections for Efficient Neural Network. \nNIPS, 2015\n\n\n\n\n\nFengfu Li, Bo Zhang and Bin Liu\n. Ternary Weight Networks. \narxiv:1605.04711\n\n\n\n\n\nChenzhuo Zhu, Song Han, Huizi Mao and William J. Dally\n. Trained Ternary Quantization. \narxiv:1612.01064\n\n\n\n\n\nYoshua Bengio, Nicholas Leonard and Aaron Courville\n. Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation. \narxiv:1308.3432\n\n\n\n\n\nGeoffrey Hinton, Nitish Srivastava, Kevin Swersky, Tijmen Tieleman and Abdelrahman Mohamed\n. Neural Networks for Machine Learning. \nCoursera, video lectures, 2012\n\n\n\n\n\nBenoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam and Dmitry Kalenichenko\n. Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference. \nCVPR, 2018\n\n\n\n\n\nRaghuraman Krishnamoorthi\n. Quantizing deep convolutional networks for efficient inference: A whitepaper. \narxiv:1806.08342\n\n\n\n\n\nRon Banner, Yury Nahshan, Elad Hoffer and Daniel Soudry\n. ACIQ: Analytical Clipping for Integer Quantization of neural networks. \narxiv:1810.05723", 
-            "title": "Quantization"
-        }, 
-        {
-            "location": "/quantization/index.html#quantization", 
-            "text": "Quantization refers to the process of reducing the number of bits that represent a number. In the context of deep learning, the predominant numerical format used for research and for deployment has so far been 32-bit floating point, or FP32. However, the desire for reduced bandwidth and compute requirements of deep learning models has driven research into using lower-precision numerical formats. It has been extensively demonstrated that weights and activations can be represented using 8-bit integers (or INT8) without incurring significant loss in accuracy. The use of even lower bit-widths, such as 4/2/1-bits, is an active field of research that has also shown great progress.  Note that this discussion is on quantization only in the context of more efficient inference. Using lower-precision numerics for more efficient training is currently out of scope.", 
-            "title": "Quantization"
-        }, 
-        {
-            "location": "/quantization/index.html#motivation-overall-efficiency", 
-            "text": "The more obvious benefit from quantization is  significantly reduced bandwidth and storage . For instance, using INT8 for weights and activations consumes 4x less overall bandwidth compared to FP32. \nAdditionally integer compute is  faster  than floating point compute. It is also much more  area and energy efficient :      INT8 Operation  Energy Saving vs FP32  Area Saving vs FP32      Add  30x  116x    Multiply  18.5x  27x     ( Dally, 2015 )  Note that very aggressive quantization can yield even more efficiency. If weights are binary (-1, 1) or ternary (-1, 0, 1 using 2-bits), then convolution and fully-connected layers can be computed with additions and subtractions only, removing multiplications completely. If activations are binary as well, then additions can also be removed, in favor of bitwise operations ( Rastegari et al., 2016 ).", 
-            "title": "Motivation: Overall Efficiency"
-        }, 
-        {
-            "location": "/quantization/index.html#integer-vs-fp32", 
-            "text": "There are two main attributes when discussing a numerical format. The first is  dynamic range , which refers to the range of representable numbers. The second one is how many values can be represented within the dynamic range, which in turn determines the  precision / resolution  of the format (the distance between two numbers). \nFor all integer formats, the dynamic range is  [-2^{n-1} .. 2^{n-1}-1] , where  n  is the number of bits. So for INT8 the range is  [-128 .. 127] , and for INT4 it is  [-16 .. 15]  (we're limiting ourselves to signed integers for now). The number of representable values is  2^n .\nContrast that with FP32, where the dynamic range is  \\pm 3.4\\ x\\ 10^{38} , and approximately  4.2\\ x\\ 10^9  values can be represented. \nWe can immediately see that FP32 is much more  versatile , in that it is able to represent a wide range of distributions accurately. This is a nice property for deep learning models, where the distributions of weights and activations are usually very different (at least in dynamic range). In addition the dynamic range can differ between layers in the model. \nIn order to be able to represent these different distributions with an integer format, a  scale factor  is used to map the dynamic range of the tensor to the integer format range. But still we remain with the issue of having a significantly lower number of representable values, that is - much lower resolution. \nNote that this scale factor is, in most cases, a floating-point number. Hence, even when using integer numerics, some floating-point computations remain.  Courbariaux et al., 2014  scale using only shifts, eliminating the floating point operation. In  GEMMLWOP , the FP32 scale factor is approximated using an integer or fixed-point multiplication followed by a shift operation. In many cases the effect of this approximation on accuracy is negligible.", 
-            "title": "Integer vs. FP32"
-        }, 
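A worked example of the scale-factor mapping discussed above: symmetric linear quantization maps the tensor's largest absolute value onto the top of the signed integer range. This is a minimal sketch of one possible mapping choice, not Distiller's quantizer.

```python
import torch

def linear_quantize(x: torch.Tensor, n_bits: int = 8):
    """Symmetric linear quantization of a float tensor to signed integers."""
    qmax = 2 ** (n_bits - 1) - 1                  # 127 for INT8, 7 for INT4
    scale = x.abs().max().clamp(min=1e-8) / qmax  # FP32 scale factor
    q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax)
    return q, scale                               # dequantize with q * scale
```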
-        {
-            "location": "/quantization/index.html#avoiding-overflows", 
-            "text": "Convolution and fully connected layers involve the storing of intermediate results in accumulators. Due to the limited dynamic range of integer formats, if we would use the same bit-width for the weights and activation,  and  for the accumulators, we would likely overflow very quickly. Therefore, accumulators are usually implemented with higher bit-widths. \nThe result of multiplying two  n -bit integers is, at most, a  2n -bit number. In convolution layers, such multiplications are accumulated  c\\cdot k^2  times, where  c  is the number of input channels and  k  is the kernel width (assuming a square kernel). Hence, to avoid overflowing, the accumulator should be  2n + M -bits wide, where M is at least  log_2(c\\cdot k^2) . In many cases 32-bit accumulators are used, however for INT4 and lower it might be possible to use less than 32 -bits, depending on the expected use cases and layer widths.", 
-            "title": "Avoiding Overflows"
-        }, 
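The accumulator sizing rule above is simple arithmetic. For example, an INT8 3x3 convolution over 64 input channels needs \(2n + \lceil log_2(c\cdot k^2) \rceil = 16 + 10 = 26\) bits:

```python
import math

n = 8         # bit-width of weights and activations (INT8)
c, k = 64, 3  # input channels and kernel width

# Each product is at most 2n bits; we accumulate c*k^2 of them.
acc_bits = 2 * n + math.ceil(math.log2(c * k * k))
print(acc_bits)  # 26 -> a 32-bit accumulator is safe for this layer
```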
-        {
-            "location": "/quantization/index.html#conservative-quantization-int8", 
-            "text": "In many cases, taking a model trained for FP32 and directly quantizing it to INT8, without any re-training, can result in a relatively low loss of accuracy (which may or may not be acceptable, depending on the use case). Some fine-tuning can further improve the accuracy ( Gysel at al., 2018 ). \nAs mentioned above, a scale factor is used to adapt the dynamic range of the tensor at hand to that of the integer format. This scale factor needs to be calculated per-layer per-tensor. The simplest way is to map the min/max values of the float tensor to the min/max of the integer format. For weights and biases this is easy, as they are set once training is complete. For activations, the min/max float values can be obtained \"online\" during inference, or \"offline\".   Offline  means gathering activations statistics before deploying the model, either during training or by running a few \"calibration\" batches on the trained FP32 model. Based on these gathered statistics, the scaled factors are calculated and are fixed once the model is deployed. This method has the risk of encountering values outside the previously observed ranges at runtime. These values will be clipped, which might lead to accuracy degradation.  Online  means calculating the min/max values for each tensor dynamically during runtime. In this method clipping cannot occur, however the added computation resources required to calculate the min/max values at runtime might be prohibitive.    It is important to note, however, that the full float range of an activations tensor usually includes elements which are statistically outliers. These values can be discarded by using a narrower min/max range, effectively allowing some clipping to occur in favor of increasing the resolution provided to the part of the distribution containing most of the information. A simple method which can yield nice results is to simply use an average of the observed min/max values instead of the actual values. Alternatively, statistical measures can be used to intelligently select where to clip the original range in order to preserve as much information as possible ( Migacz, 2017 ). Going further,  Banner et al., 2018  have proposed a method for analytically computing the clipping value under certain conditions.  Another possible optimization point is  scale-factor scope . The most common way is use a single scale-factor per-layer, but it is also possible to calculate a scale-factor per-channel. This can be beneficial if the weight distributions vary greatly between channels.  When used to directly quantize a model without re-training, as described so far, this method is commonly referred to as  post-training quantization . However, recent publications have shown that there are cases where post-training quantization to INT8 doesn't preserve accuracy ( Benoit et al., 2018 ,  Krishnamoorthi, 2018 ). Namely, smaller models such as MobileNet seem to not respond as well to post-training quantization, presumabley due to their smaller representational capacity. In such cases,  quantization-aware training  is used.", 
-            "title": "\"Conservative\" Quantization: INT8"
-        }, 
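The "average of the observed min/max values" idea mentioned above can be sketched as a tiny observer. `AveragingMinMaxObserver` is a hypothetical helper for illustration, not a Distiller class.

```python
import torch

class AveragingMinMaxObserver:
    """Running average of per-batch min/max activation values: a simple,
    outlier-tolerant alternative to the absolute observed min/max."""

    def __init__(self):
        self.avg_min = 0.0
        self.avg_max = 0.0
        self.batches = 0

    def update(self, t: torch.Tensor):
        self.batches += 1
        self.avg_min += (t.min().item() - self.avg_min) / self.batches
        self.avg_max += (t.max().item() - self.avg_max) / self.batches
```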
-        {
-            "location": "/quantization/index.html#aggressive-quantization-int4-and-lower", 
-            "text": "Naively quantizing a FP32 model to INT4 and lower usually incurs significant accuracy degradation. Many works have tried to mitigate this effect. They usually employ one or more of the following concepts in order to improve model accuracy:   Training / Re-Training : For INT4 and lower, training is required in order to obtain reasonable accuracy. The training loop is modified to take quantization into account. See details in the  next section .  Zhou S et al., 2016  have shown that bootstrapping the quantized model with trained FP32 weights leads to higher accuracy, as opposed to training from scratch. Other methods  require  a trained FP32 model, either as a starting point ( Zhou A et al., 2017 ), or as a teacher network in a knowledge distillation training setup (see  here ).  Replacing the activation function : The most common activation function in vision models is ReLU, which is unbounded. That is - its dynamic range is not limited for positive inputs. This is very problematic for INT4 and below due to the very limited range and resolution. Therefore, most methods replace ReLU with another function which is bounded. In some cases a clipping function with hard coded values is used ( Zhou S et al., 2016 ,  Mishra et al., 2018 ). Another method learns the clipping value per layer, with better results ( Choi et al., 2018 ). Once the clipping value is set, the scale factor used for quantization is also set, and no further calibration steps are required (as opposed to INT8 methods described above).  Modifying network structure :  Mishra et al., 2018  try to compensate for the loss of information due to quantization by using wider layers (more channels).  Lin et al., 2017  proposed a binary quantization method in which a single FP32 convolution is replaced with multiple binary convolutions, each scaled to represent a different \"base\", covering a larger dynamic range overall.  First and last layer : Many methods do not quantize the first and last layer of the model. It has been observed by  Han et al., 2015  that the first convolutional layer is more sensitive to weights pruning, and some quantization works cite the same reason and show it empirically ( Zhou S et al., 2016 ,  Choi et al., 2018 ). Some works also note that these layers usually constitute a very small portion of the overall computation within the model, further reducing the motivation to quantize them ( Rastegari et al., 2016 ). Most methods keep the first and last layers at FP32. However,  Choi et al., 2018  showed that \"conservative\" quantization of these layers, e.g. to INT8, does not reduce accuracy.  Iterative quantization : Most methods quantize the entire model at once.  Zhou A et al., 2017  employ an iterative method, which starts with a trained FP32 baseline, and quantizes only a portion of the model at the time followed by several epochs of re-training to recover the accuracy loss from quantization.  Mixed Weights and Activations Precision : It has been observed that activations are more sensitive to quantization than weights ( Zhou S et al., 2016 ). Hence it is not uncommon to see experiments with activations quantized to a higher precision compared to weights. Some works have focused solely on quantizing weights, keeping the activations at FP32 ( Li et al., 2016 ,  Zhu et al., 2016 ).", 
-            "title": "\"Aggressive\" Quantization: INT4 and Lower"
-        }, 
-        {
-            "location": "/quantization/index.html#quantization-aware-training", 
-            "text": "As mentioned above, in order to minimize the loss of accuracy from \"aggressive\" quantization, many methods that target INT4 and lower (and in some cases for INT8 as well) involve training the model in a way that considers the quantization. This means training with quantization of weights and activations \"baked\" into the training procedure. The training graph usually looks like this:   A full precision copy of the weights is maintained throughout the training process (\"weights_fp\" in the diagram). Its purpose is to accumulate the small changes from the gradients without loss of precision (Note that the quantization of the weights is an integral part of the training graph, meaning that we back-propagate through it as well). Once the model is trained, only the quantized weights are used for inference. \nIn the diagram we show \"layer N\" as the conv + batch-norm + activation combination, but the same applies to fully-connected layers, element-wise operations, etc. During training, the operations within \"layer N\" can still run in full precision, with the \"quantize\" operations in the boundaries ensuring discrete-valued weights and activations. This is sometimes called \"simulated quantization\".", 
-            "title": "Quantization-Aware Training"
-        }, 
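
A toy PyTorch sketch of simulated quantization (illustrative only, not Distiller's implementation; it uses the symmetric scale factor defined later in this document): the tensor is rounded to the integer grid and immediately de-quantized, so downstream operations still run in float but only see discrete values:

```python
import torch

def fake_quantize(x, n_bits=8):
    """Quantize to the symmetric n-bit grid, then de-quantize immediately,
    so the surrounding ops keep running in float ("simulated quantization")."""
    q = (2 ** (n_bits - 1) - 1) / x.abs().max()  # symmetric-mode scale factor
    return torch.round(q * x) / q

x = torch.randn(6)
print(fake_quantize(x, n_bits=4))  # values collapse onto a few discrete levels
```
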
-        {
-            "location": "/quantization/index.html#straight-through-estimator", 
-            "text": "An important question in this context is how to back-propagate through the quantization functions. These functions are discrete-valued, hence their derivative is 0 almost everywhere. So, using their gradients as-is would severely hinder the learning process. An approximation commonly used to overcome this issue is the \"straight-through estimator\" (STE) ( Hinton et al., 2012 ,  Bengio, 2013 ), which simply passes the gradient through these functions as-is.", 
-            "title": "Straight-Through Estimator"
-        }, 
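
A minimal PyTorch sketch of an STE for rounding (`RoundSTE` is an illustrative name, not a Distiller class): the forward pass rounds, while the backward pass behaves as if the function were the identity:

```python
import torch

class RoundSTE(torch.autograd.Function):
    """Round in the forward pass; pass gradients through unchanged in backward."""
    @staticmethod
    def forward(ctx, x):
        return torch.round(x)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output  # straight-through: act as if round() were the identity

x = torch.randn(5, requires_grad=True)
RoundSTE.apply(x).sum().backward()
print(x.grad)  # all ones, as if no rounding had happened
```
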
-        {
-            "location": "/quantization/index.html#references", 
-            "text": "William Dally . High-Performance Hardware for Machine Learning.  Tutorial, NIPS, 2015   Mohammad Rastegari, Vicente Ordone, Joseph Redmon and Ali Farhadi . XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks.  ECCV, 2016   Matthieu Courbariaux, Yoshua Bengio and Jean-Pierre David . Training deep neural networks with low precision multiplications.  arxiv:1412.7024   Philipp Gysel, Jon Pimentel, Mohammad Motamedi and Soheil Ghiasi . Ristretto: A Framework for Empirical Study of Resource-Efficient Inference in Convolutional Neural Networks.  IEEE Transactions on Neural Networks and Learning Systems, 2018   Szymon Migacz . 8-bit Inference with TensorRT.  GTC San Jose, 2017   Shuchang Zhou, Zekun Ni, Xinyu Zhou, He Wen, Yuxin Wu and Yuheng Zou . DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients.  arxiv:1606.06160   Aojun Zhou, Anbang Yao, Yiwen Guo, Lin Xu and Yurong Chen . Incremental Network Quantization: Towards Lossless CNNs with Low-precision Weights.  ICLR, 2017   Asit Mishra, Eriko Nurvitadhi, Jeffrey J Cook and Debbie Marr . WRPN: Wide Reduced-Precision Networks.  ICLR, 2018   Jungwook Choi, Zhuo Wang, Swagath Venkataramani, Pierce I-Jen Chuang, Vijayalakshmi Srinivasan and Kailash Gopalakrishnan . PACT: Parameterized Clipping Activation for Quantized Neural Networks.  arxiv:1805.06085   Xiaofan Lin, Cong Zhao and Wei Pan . Towards Accurate Binary Convolutional Neural Network.  NIPS, 2017   Song Han, Jeff Pool, John Tran and William Dally . Learning both Weights and Connections for Efficient Neural Network.  NIPS, 2015   Fengfu Li, Bo Zhang and Bin Liu . Ternary Weight Networks.  arxiv:1605.04711   Chenzhuo Zhu, Song Han, Huizi Mao and William J. Dally . Trained Ternary Quantization.  arxiv:1612.01064   Yoshua Bengio, Nicholas Leonard and Aaron Courville . Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation.  arxiv:1308.3432   Geoffrey Hinton, Nitish Srivastava, Kevin Swersky, Tijmen Tieleman and Abdelrahman Mohamed . Neural Networks for Machine Learning.  Coursera, video lectures, 2012   Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam and Dmitry Kalenichenko . Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference.  ECCV, 2018   Raghuraman Krishnamoorthi . Quantizing deep convolutional networks for efficient inference: A whitepaper  arxiv:1806.08342   Ron Banner, Yury Nahshan, Elad Hoffer and Daniel Soudry . ACIQ: Analytical Clipping for Integer Quantization of neural networks  arxiv:1810.05723", 
-            "title": "References"
-        }, 
-        {
-            "location": "/knowledge_distillation/index.html", 
-            "text": "Knowledge Distillation\n\n\n(For details on how to train a model with knowledge distillation in Distiller, see \nhere\n)\n\n\nKnowledge distillation is model compression method in which a small model is trained to mimic a pre-trained, larger model (or ensemble of models). This training setting is sometimes referred to as \"teacher-student\", where the large model is the teacher and the small model is the student (we'll be using these terms interchangeably).\n\n\nThe method was first proposed by \nBucila et al., 2006\n and generalized by \nHinton et al., 2015\n. The implementation in Distiller is based on the latter publication. Here we'll provide a summary of the method. For more information the reader may refer to the paper (a \nvideo lecture\n with \nslides\n is also available).\n\n\nIn distillation, knowledge is transferred from the teacher model to the student by minimizing a loss function in which the target is the distribution of class probabilities predicted by the teacher model. That is - the output of a softmax function on the teacher model's logits. However, in many cases, this probability distribution has the correct class at a very high probability, with all other class probabilities very close to 0. As such, it doesn't provide much information beyond the ground truth labels already provided in the dataset. To tackle this issue, \nHinton et al., 2015\n introduced the concept of \"softmax temperature\". The probability \np_i\n of class \ni\n is calculated from the logits \nz\n as:\n\n\n\n\np_i = \\frac{exp\\left(\\frac{z_i}{T}\\right)}{\\sum_{j} \\exp\\left(\\frac{z_j}{T}\\right)}\n\n\n\n\nwhere \nT\n is the temperature parameter. When \nT=1\n we get the standard softmax function. As \nT\n grows, the probability distribution generated by the softmax function becomes softer, providing more information as to which classes the teacher found more similar to the predicted class. Hinton calls this the \"dark knowledge\" embedded in the teacher model, and it is this dark knowledge that we are transferring to the student model in the distillation process. When computing the loss function vs. the teacher's soft targets, we use the same value of \nT\n to compute the softmax on the student's logits. We call this loss the \"distillation loss\".\n\n\nHinton et al., 2015\n found that it is also beneficial to train the distilled model to produce the correct labels (based on the ground truth) in addition to the teacher's soft-labels. Hence, we also calculate the \"standard\" loss between the student's predicted class probabilities and the ground-truth labels (also called \"hard labels/targets\"). We dub this loss the \"student loss\". When calculating the class probabilities for the student loss we use \nT = 1\n. \n\n\nThe overall loss function, incorporating both distillation and student losses, is calculated as:\n\n\n\n\n\\mathcal{L}(x;W) = \\alpha * \\mathcal{H}(y, \\sigma(z_s; T=1)) + \\beta * \\mathcal{H}(\\sigma(z_t; T=\\tau), \\sigma(z_s, T=\\tau))\n\n\n\n\nwhere \nx\n is the input, \nW\n are the student model parameters, \ny\n is the ground truth label, \n\\mathcal{H}\n is the cross-entropy loss function, \n\\sigma\n is the softmax function parameterized by the temperature \nT\n, and \n\\alpha\n and \n\\beta\n are coefficients. 
\nz_s\n and \nz_t\n are the logits of the student and teacher respectively.\n\n\n\n\nNew Hyper-Parameters\n\n\nIn general \n\\tau\n, \n\\alpha\n and \n\\beta\n are hyper parameters.\n\n\nIn their experiments, \nHinton et al., 2015\n use temperature values ranging from 1 to 20. They note that empirically, when the student model is very small compared to the teacher model, lower temperatures work better. This makes sense if we consider that as we raise the temperature, the resulting soft-labels distribution becomes richer in information, and a very small model might not be able to capture all of this information. However, there's no clear way to predict up front what kind of capacity for information the student model will have.\n\n\nWith regards to \n\\alpha\n and \n\\beta\n, \nHinton et al., 2015\n use a weighted average between the distillation loss and the student loss. That is, \n\\beta = 1 - \\alpha\n. They note that in general, they obtained the best results when setting \n\\alpha\n to be much smaller than \n\\beta\n (although in one of their experiments they use \n\\alpha = \\beta = 0.5\n).  Other works which utilize knowledge distillation don't use a weighted average. Some set \n\\alpha = 1\n while leaving \n\\beta\n tunable, while others don't set any constraints.\n\n\nCombining with Other Model Compression Techniques\n\n\nIn the \"basic\" scenario, the smaller (student) model is a pre-defined architecture which just has a smaller number of parameters compared to the teacher model. For example, we could train ResNet-18 by distilling knowledge from ResNet-34. But, a model with smaller capacity can also be obtained by other model compression techniques - sparsification and/or quantization. So, for example, we could train a 4-bit ResNet-18 model with some method using quantization-aware training, and use a distillation loss function as described above. In that case, the teacher model can even be a FP32 ResNet-18 model. Same goes for pruning and regularization.\n\n\nTann et al., 2017\n, \nMishra and Marr, 2018\n and \nPolino et al., 2018\n are some works that combine knowledge distillation with \nquantization\n. \nTheis et al., 2018\n and \nAshok et al., 2018\n combine distillation with \npruning\n.\n\n\nReferences\n\n\n\n\nCristian Bucila, Rich Caruana, and Alexandru Niculescu-Mizil\n. Model Compression. \nKDD, 2006\n\n\n\n\n\nGeoffrey Hinton, Oriol Vinyals and Jeff Dean\n. Distilling the Knowledge in a Neural Network. \narxiv:1503.02531\n\n\n\n\n\nHokchhay Tann, Soheil Hashemi, Iris Bahar and Sherief Reda\n. Hardware-Software Codesign of Accurate, Multiplier-free Deep Neural Networks. \nDAC, 2017\n\n\n\n\n\nAsit Mishra and Debbie Marr\n. Apprentice: Using Knowledge Distillation Techniques To Improve Low-Precision Network Accuracy. \nICLR, 2018\n\n\n\n\n\nAntonio Polino, Razvan Pascanu and Dan Alistarh\n. Model compression via distillation and quantization. \nICLR, 2018\n\n\n\n\n\nAnubhav Ashok, Nicholas Rhinehart, Fares Beainy and Kris M. Kitani\n. N2N learning: Network to Network Compression via Policy Gradient Reinforcement Learning. \nICLR, 2018\n\n\n\n\n\nLucas Theis, Iryna Korshunova, Alykhan Tejani and Ferenc Husz\u00e1r\n. Faster gaze prediction with dense networks and Fisher pruning. \narxiv:1801.05787", 
-            "title": "Knowledge Distillation"
-        }, 
-        {
-            "location": "/knowledge_distillation/index.html#knowledge-distillation", 
-            "text": "(For details on how to train a model with knowledge distillation in Distiller, see  here )  Knowledge distillation is model compression method in which a small model is trained to mimic a pre-trained, larger model (or ensemble of models). This training setting is sometimes referred to as \"teacher-student\", where the large model is the teacher and the small model is the student (we'll be using these terms interchangeably).  The method was first proposed by  Bucila et al., 2006  and generalized by  Hinton et al., 2015 . The implementation in Distiller is based on the latter publication. Here we'll provide a summary of the method. For more information the reader may refer to the paper (a  video lecture  with  slides  is also available).  In distillation, knowledge is transferred from the teacher model to the student by minimizing a loss function in which the target is the distribution of class probabilities predicted by the teacher model. That is - the output of a softmax function on the teacher model's logits. However, in many cases, this probability distribution has the correct class at a very high probability, with all other class probabilities very close to 0. As such, it doesn't provide much information beyond the ground truth labels already provided in the dataset. To tackle this issue,  Hinton et al., 2015  introduced the concept of \"softmax temperature\". The probability  p_i  of class  i  is calculated from the logits  z  as:   p_i = \\frac{exp\\left(\\frac{z_i}{T}\\right)}{\\sum_{j} \\exp\\left(\\frac{z_j}{T}\\right)}   where  T  is the temperature parameter. When  T=1  we get the standard softmax function. As  T  grows, the probability distribution generated by the softmax function becomes softer, providing more information as to which classes the teacher found more similar to the predicted class. Hinton calls this the \"dark knowledge\" embedded in the teacher model, and it is this dark knowledge that we are transferring to the student model in the distillation process. When computing the loss function vs. the teacher's soft targets, we use the same value of  T  to compute the softmax on the student's logits. We call this loss the \"distillation loss\".  Hinton et al., 2015  found that it is also beneficial to train the distilled model to produce the correct labels (based on the ground truth) in addition to the teacher's soft-labels. Hence, we also calculate the \"standard\" loss between the student's predicted class probabilities and the ground-truth labels (also called \"hard labels/targets\"). We dub this loss the \"student loss\". When calculating the class probabilities for the student loss we use  T = 1 .   The overall loss function, incorporating both distillation and student losses, is calculated as:   \\mathcal{L}(x;W) = \\alpha * \\mathcal{H}(y, \\sigma(z_s; T=1)) + \\beta * \\mathcal{H}(\\sigma(z_t; T=\\tau), \\sigma(z_s, T=\\tau))   where  x  is the input,  W  are the student model parameters,  y  is the ground truth label,  \\mathcal{H}  is the cross-entropy loss function,  \\sigma  is the softmax function parameterized by the temperature  T , and  \\alpha  and  \\beta  are coefficients.  z_s  and  z_t  are the logits of the student and teacher respectively.", 
-            "title": "Knowledge Distillation"
-        }, 
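
The overall loss above translates almost directly into PyTorch. The sketch below is illustrative only (`kd_loss` is a hypothetical helper, not Distiller's API); the distillation term is the cross-entropy between the teacher's and student's temperature-softened distributions:

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T, alpha, beta):
    # "Student loss": standard cross-entropy against the hard labels (T = 1)
    student_loss = F.cross_entropy(student_logits, labels)
    # "Distillation loss": cross-entropy H(sigma(z_t; T), sigma(z_s; T))
    soft_targets = F.softmax(teacher_logits / T, dim=1)
    log_probs = F.log_softmax(student_logits / T, dim=1)
    distillation_loss = -(soft_targets * log_probs).sum(dim=1).mean()
    return alpha * student_loss + beta * distillation_loss

student_logits = torch.randn(8, 10)
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
print(kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.1, beta=0.9))
```
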
-        {
-            "location": "/knowledge_distillation/index.html#new-hyper-parameters", 
-            "text": "In general  \\tau ,  \\alpha  and  \\beta  are hyper parameters.  In their experiments,  Hinton et al., 2015  use temperature values ranging from 1 to 20. They note that empirically, when the student model is very small compared to the teacher model, lower temperatures work better. This makes sense if we consider that as we raise the temperature, the resulting soft-labels distribution becomes richer in information, and a very small model might not be able to capture all of this information. However, there's no clear way to predict up front what kind of capacity for information the student model will have.  With regards to  \\alpha  and  \\beta ,  Hinton et al., 2015  use a weighted average between the distillation loss and the student loss. That is,  \\beta = 1 - \\alpha . They note that in general, they obtained the best results when setting  \\alpha  to be much smaller than  \\beta  (although in one of their experiments they use  \\alpha = \\beta = 0.5 ).  Other works which utilize knowledge distillation don't use a weighted average. Some set  \\alpha = 1  while leaving  \\beta  tunable, while others don't set any constraints.", 
-            "title": "New Hyper-Parameters"
-        }, 
-        {
-            "location": "/knowledge_distillation/index.html#references", 
-            "text": "Cristian Bucila, Rich Caruana, and Alexandru Niculescu-Mizil . Model Compression.  KDD, 2006   Geoffrey Hinton, Oriol Vinyals and Jeff Dean . Distilling the Knowledge in a Neural Network.  arxiv:1503.02531   Hokchhay Tann, Soheil Hashemi, Iris Bahar and Sherief Reda . Hardware-Software Codesign of Accurate, Multiplier-free Deep Neural Networks.  DAC, 2017   Asit Mishra and Debbie Marr . Apprentice: Using Knowledge Distillation Techniques To Improve Low-Precision Network Accuracy.  ICLR, 2018   Antonio Polino, Razvan Pascanu and Dan Alistarh . Model compression via distillation and quantization.  ICLR, 2018   Anubhav Ashok, Nicholas Rhinehart, Fares Beainy and Kris M. Kitani . N2N learning: Network to Network Compression via Policy Gradient Reinforcement Learning.  ICLR, 2018   Lucas Theis, Iryna Korshunova, Alykhan Tejani and Ferenc Husz\u00e1r . Faster gaze prediction with dense networks and Fisher pruning.  arxiv:1801.05787", 
-            "title": "References"
-        }, 
-        {
-            "location": "/conditional_computation/index.html", 
-            "text": "Conditional Computation\n\n\nConditional Computation refers to a class of algorithms in which each input sample uses a different part of the model, such that on average the compute, latency or power (depending on our objective) is reduced.\nTo quote \nBengio et. al\n\n\n\n\n\"Conditional computation refers to activating only some of the units in a network, in an input-dependent fashion. For example, if we think we\u2019re looking at a car, we only need to compute the activations of the vehicle detecting units, not of all features that a network could possible compute. The immediate effect of activating fewer units is that propagating information through the network will be faster, both at training as well as at test time. However, one needs to be able to decide in an intelligent fashion which units to turn on and off, depending on the input data. This is typically achieved with some form of gating structure, learned in parallel with the original network.\"\n\n\n\n\nAs usual, there are several approaches to implement Conditional Computation:\n\n\n\n\nSun et. al\n use several expert CNN, each trained on a different task, and combine them to one large network.\n\n\nZheng et. al\n use cascading, an idea which may be familiar to you from Viola-Jones face detection.\n\n\nTheodorakopoulos et. al\n add small layers that learn which filters to use per input sample, and then enforce that during inference (LKAM module).\n\n\nIoannou et. al\n introduce Conditional Networks: that \"can be thought of as: i) decision trees augmented with data transformation\noperators, or ii) CNNs, with block-diagonal sparse weight matrices, and explicit data routing functions\"\n\n\nBolukbasi et. al\n \"learn a system to adaptively choose the components of a deep network to be evaluated for each example. By allowing examples correctly classified using early layers of the system to exit, we avoid the computational time associated with full evaluation of the network. We extend this to learn a network selection system that adaptively selects the network to be evaluated for each example.\"\n\n\n\n\nConditional Computation is especially useful for real-time, latency-sensitive applicative.\n\nIn Distiller we currently have implemented a variant of Early Exit.\n\n\nReferences\n\n\n \nEmmanuel Bengio, Pierre-Luc Bacon, Joelle Pineau, Doina Precup.\n\n    \nConditional Deep Learning for Energy-Efficient and Enhanced Pattern Recognition\n, arXiv:1511.06297v2, 2016.\n\n\n\n\n\nY. Sun, X.Wang, and X. Tang.\n\n    \nDeep Convolutional Network Cascade for Facial Point Detection\n. In Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2014\n\n\n\n\n\nX. Zheng, W.Ouyang, and X.Wang.\n \nMulti-Stage Contextual Deep Learning for Pedestrian Detection.\n In Proc. IEEE Intl Conf. on Computer Vision (ICCV), 2014.\n\n\n\n\n\nI. Theodorakopoulos, V. Pothos, D. Kastaniotis and N. Fragoulis1.\n \nParsimonious Inference on Convolutional Neural Networks: Learning and applying on-line kernel activation rules.\n Irida Labs S.A, January 2017\n\n\n\n\n\nTolga Bolukbasi, Joseph Wang, Ofer Dekel, Venkatesh Saligrama\n \nAdaptive Neural Networks for Efficient Inference\n.  Proceedings of the 34th International Conference on Machine Learning, PMLR 70:527-536, 2017.\n\n\n\n\n\nYani Ioannou, Duncan Robertson, Darko Zikic, Peter Kontschieder, Jamie Shotton, Matthew Brown, Antonio Criminisi\n.\n    \nDecision Forests, Convolutional Networks and the Models in-Between\n, arXiv:1511.06297v2, 2016.", 
-            "title": "Conditional Computation"
-        }, 
-        {
-            "location": "/conditional_computation/index.html#conditional-computation", 
-            "text": "Conditional Computation refers to a class of algorithms in which each input sample uses a different part of the model, such that on average the compute, latency or power (depending on our objective) is reduced.\nTo quote  Bengio et. al   \"Conditional computation refers to activating only some of the units in a network, in an input-dependent fashion. For example, if we think we\u2019re looking at a car, we only need to compute the activations of the vehicle detecting units, not of all features that a network could possible compute. The immediate effect of activating fewer units is that propagating information through the network will be faster, both at training as well as at test time. However, one needs to be able to decide in an intelligent fashion which units to turn on and off, depending on the input data. This is typically achieved with some form of gating structure, learned in parallel with the original network.\"   As usual, there are several approaches to implement Conditional Computation:   Sun et. al  use several expert CNN, each trained on a different task, and combine them to one large network.  Zheng et. al  use cascading, an idea which may be familiar to you from Viola-Jones face detection.  Theodorakopoulos et. al  add small layers that learn which filters to use per input sample, and then enforce that during inference (LKAM module).  Ioannou et. al  introduce Conditional Networks: that \"can be thought of as: i) decision trees augmented with data transformation\noperators, or ii) CNNs, with block-diagonal sparse weight matrices, and explicit data routing functions\"  Bolukbasi et. al  \"learn a system to adaptively choose the components of a deep network to be evaluated for each example. By allowing examples correctly classified using early layers of the system to exit, we avoid the computational time associated with full evaluation of the network. We extend this to learn a network selection system that adaptively selects the network to be evaluated for each example.\"   Conditional Computation is especially useful for real-time, latency-sensitive applicative. \nIn Distiller we currently have implemented a variant of Early Exit.", 
-            "title": "Conditional Computation"
-        }, 
-        {
-            "location": "/conditional_computation/index.html#references", 
-            "text": "Emmanuel Bengio, Pierre-Luc Bacon, Joelle Pineau, Doina Precup. \n     Conditional Deep Learning for Energy-Efficient and Enhanced Pattern Recognition , arXiv:1511.06297v2, 2016.   Y. Sun, X.Wang, and X. Tang. \n     Deep Convolutional Network Cascade for Facial Point Detection . In Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2014   X. Zheng, W.Ouyang, and X.Wang.   Multi-Stage Contextual Deep Learning for Pedestrian Detection.  In Proc. IEEE Intl Conf. on Computer Vision (ICCV), 2014.   I. Theodorakopoulos, V. Pothos, D. Kastaniotis and N. Fragoulis1.   Parsimonious Inference on Convolutional Neural Networks: Learning and applying on-line kernel activation rules.  Irida Labs S.A, January 2017   Tolga Bolukbasi, Joseph Wang, Ofer Dekel, Venkatesh Saligrama   Adaptive Neural Networks for Efficient Inference .  Proceedings of the 34th International Conference on Machine Learning, PMLR 70:527-536, 2017.   Yani Ioannou, Duncan Robertson, Darko Zikic, Peter Kontschieder, Jamie Shotton, Matthew Brown, Antonio Criminisi .\n     Decision Forests, Convolutional Networks and the Models in-Between , arXiv:1511.06297v2, 2016.", 
-            "title": "References"
-        }, 
-        {
-            "location": "/algo_pruning/index.html", 
-            "text": "Weights Pruning Algorithms\n\n\n\n\nMagnitude Pruner\n\n\nThis is the most basic pruner: it applies a thresholding function, \\(thresh(.)\\), on each element, \\(w_i\\), of a weights tensor.  A different threshold can be used for each layer's weights tensor.\n\nBecause the threshold is applied on individual elements, this pruner belongs to the element-wise pruning algorithm family.\n\n\n\\[ thresh(w_i)=\\left\\lbrace\n\\matrix{{{w_i: \\; if \\;|w_i| \\; \\gt}\\;\\lambda}\\cr {0: \\; if \\; |w_i| \\leq \\lambda} }\n\\right\\rbrace \\]\n\n\nSensitivity Pruner\n\n\nFinding a threshold magnitude per layer is daunting, especially since each layer's elements have different average absolute values.  We can take advantage of the fact that the weights of convolutional and fully connected layers exhibit a Gaussian distribution with a mean value roughly zero, to avoid using a direct threshold based on the values of each specific tensor.\n\n\nThe diagram below shows the distribution the weights tensor of the first convolutional layer, and first fully-connected layer in TorchVision's pre-trained Alexnet model.  You can see that they have an approximate Gaussian distribution.\n\n\n \n\n\nThe distributions of Alexnet conv1 and fc1 layers\n\n\nWe use the standard deviation of the weights tensor as a sort of normalizing factor between the different weights tensors.  For example, if a tensor is Normally distributed, then about 68% of the elements have an absolute value less than the standard deviation (\\(\\sigma\\)) of the tensor.  Thus, if we set the threshold to \\(s*\\sigma\\), then basically we are thresholding \\(s * 68\\%\\) of the tensor elements.  \n\n\n\\[ thresh(w_i)=\\left\\lbrace\n\\matrix{{{w_i: \\; if \\;|w_i| \\; \\gt}\\;\\lambda}\\cr {0: \\; if \\; |w_i| \\leq \\lambda} }\n\\right\\rbrace \\]\n\n\n\\[\n\\lambda = s * \\sigma_l \\;\\;\\; where\\; \\sigma_l\\; is \\;the \\;std \\;of \\;layer \\;l \\;as \\;measured \\;on \\;the \\;dense \\;model\n\\]\n\n\nHow do we choose this \\(s\\) multiplier?\n\n\nIn \nLearning both Weights and Connections for Efficient Neural Networks\n the authors write:\n\n\n\n\n\"We used the sensitivity results to find each layer\u2019s threshold: for example, the smallest threshold was applied to the most sensitive layer, which is the first convolutional layer... The pruning threshold is chosen as a quality parameter multiplied by the standard deviation of a layer\u2019s weights\n\n\n\n\nSo the results of executing pruning sensitivity analysis on the tensor, gives us a good starting guess at \\(s\\).  Sensitivity analysis is an empirical method, and we still have to spend time to hone in on the exact multiplier value.\n\n\nMethod of Operation\n\n\n\n\nStart by running a pruning sensitivity analysis on the model.  \n\n\nThen use the results to set and tune the threshold of each layer, but instead of using a direct threshold use a sensitivity parameter which is multiplied by the standard-deviation of the initial weight-tensor's distribution.\n\n\n\n\nSchedule\n\n\nIn their \npaper\n Song Han et al. use iterative pruning and change the value of the \\(s\\) multiplier at each pruning step.  
Distiller's \nSensitivityPruner\n works differently: the value \\(s\\) is set once based on a one-time calculation of the standard-deviation of the tensor (the first time we prune), and relies on the fact that as the tensor is pruned, more elements are \"pulled\" toward the center of the distribution and thus more elements get pruned.\n\n\nThis actually works quite well as we can see in the diagram below.  This is a TensorBoard screen-capture from Alexnet training, which shows how this method starts off pruning very aggressively, but then slowly reduces the pruning rate.\n\n\n\nWe use a simple iterative-pruning schedule such as: \nPrune every second epoch starting at epoch 0, and ending at epoch 38.\n  This excerpt from \nalexnet.schedule_sensitivity.yaml\n shows how this iterative schedule is conveyed in Distiller scheduling configuration YAML:\n\n\npruners:\n  my_pruner:\n    class: 'SensitivityPruner'\n    sensitivities:\n      'features.module.0.weight': 0.25\n      'features.module.3.weight': 0.35\n      'features.module.6.weight': 0.40\n      'features.module.8.weight': 0.45\n      'features.module.10.weight': 0.55\n      'classifier.1.weight': 0.875\n      'classifier.4.weight': 0.875\n      'classifier.6.weight': 0.625\n\npolicies:\n  - pruner:\n      instance_name : 'my_pruner'\n    starting_epoch: 0\n    ending_epoch: 38\n    frequency: 2\n\n\n\n\nLevel Pruner\n\n\nClass \nSparsityLevelParameterPruner\n uses a similar method to get around specifying specific thresholding magnitudes.\nInstead of specifying a threshold magnitude, you specify a target sparsity level (expressed as a fraction, so 0.5 means 50% sparsity).  Essentially this pruner also uses a pruning criterion based on the magnitude of each tensor element, but it has the advantage that you can aim for an exact and specific sparsity level.\n\nThis pruner is much more stable compared to \nSensitivityPruner\n because the target sparsity level is not coupled to the actual magnitudes of the elements. Distiller's \nSensitivityPruner\n is unstable because the final sparsity level depends on the convergence pattern of the tensor distribution.  Song Han's methodology of using several different values for the multiplier \\(s\\), and the recalculation of the standard-deviation at each pruning phase, probably gives it stability, but requires many more hyper-parameters (this is the reason we have not implemented it thus far).  \n\n\nTo set the target sparsity levels, you can once again use pruning sensitivity analysis to make better guesses at the correct sparsity level of each layer.\n\n\nMethod of Operation\n\n\n\n\nSort the weights in the specified layer by their absolute values. \n\n\nMask to zero the smallest magnitude weights until the desired sparsity level is reached.\n\n\n\n\nSplicing Pruner\n\n\nIn \nDynamic Network Surgery for Efficient DNNs\n Guo et al. propose that network pruning and splicing work in tandem.  
A \nSplicingPruner\n is a pruner that both prunes and splices connections and works best with a Dynamic Network Surgery schedule, which, for example, configures the \nPruningPolicy\n to mask weights only during the forward pass.\n\n\nAutomated Gradual Pruner (AGP)\n\n\nIn \nTo prune, or not to prune: exploring the efficacy of pruning for model compression\n, authors Michael Zhu and Suyog Gupta provide an algorithm to schedule a Level Pruner which Distiller implements in \nAutomatedGradualPruner\n.\n\n\n\n\n\"We introduce a new automated gradual pruning algorithm in which the sparsity is increased from an initial sparsity value \\(s_i\\) (usually 0) to a final sparsity value \\(s_f\\) over a span of n pruning steps.\nThe intuition behind this sparsity function in equation (1)  is to prune the network rapidly in the initial phase when the redundant connections are\nabundant and gradually reduce the number of weights being pruned each time as there are fewer and fewer weights remaining in the network.\"\n\n\n\n\n\n\nYou can play with the scheduling parameters in the \nagp_schedule.ipynb notebook\n.\n\n\nThe authors describe AGP:\n\n\n\n\n\n\nOur automated gradual pruning algorithm prunes the smallest magnitude weights to achieve a preset level of network sparsity.\n\n\nDoesn't require much hyper-parameter tuning\n\n\nShown to perform well across different models\n\n\nDoes not make any assumptions about the structure of the network or its constituent layers, and is therefore more generally applicable.\n\n\n\n\n\n\nRNN Pruner\n\n\nThe authors of \nExploring Sparsity in Recurrent Neural Networks\n, Sharan Narang, Erich Elsen, Gregory Diamos, and Shubho Sengupta, \"propose a technique to reduce the parameters of a network by pruning weights during the initial training of the network.\"  They use a gradual pruning schedule which is reminiscent of the schedule used in AGP, for element-wise pruning of RNNs, which they also employ during training.  They show pruning of RNN, GRU, LSTM and embedding layers.\n\n\nDistiller's distiller.pruning.BaiduRNNPruner class implements this pruning algorithm.\n\n\n\n\nStructure Pruners\n\n\nElement-wise pruning can create very sparse models which can be compressed to consume a smaller memory footprint and less bandwidth, but without specialized hardware that can compute using the sparse representation of the tensors, we don't gain any speedup of the computation.  Structure pruners remove entire \"structures\", such as kernels, filters, and even entire feature-maps.\n\n\nStructure Ranking Pruners\n\n\nRanking pruners use some criterion to rank the structures in a tensor, and then prune the tensor to a specified level. In principle, these pruners perform one-shot pruning, but can be combined with automatic pruning-level scheduling, such as AGP (see below).\nIn \nPruning Filters for Efficient ConvNets\n the authors use filter ranking, with \none-shot pruning\n followed by fine-tuning.  The authors of \nExploiting Sparseness in Deep Neural Networks for Large Vocabulary Speech Recognition\n also use a one-shot pruning schedule, for fully-connected layers, and they provide an explanation:\n\n\n\n\nFirst, after sweeping through the full training set several times the weights become relatively stable \u2014 they tend to remain either large or small magnitudes. 
Second, in a stabilized model, the importance of the connection is approximated well by the magnitudes of the weights (times the magnitudes of the corresponding input values, but these are relatively uniform within each layer since on the input layer, features are normalized to zero-mean and unit-variance, and hidden-layer values are probabilities)\n\n\n\n\nL1RankedStructureParameterPruner\n\n\nThe \nL1RankedStructureParameterPruner\n pruner calculates the magnitude of some \"structure\", orders all of the structures based on some magnitude function, and prunes away the \nm\n lowest-ranking structures.  This pruner performs ranking of structures using the mean of the absolute value of the structure as the representative of the structure magnitude.  The absolute mean does not depend on the size of the structure, so it is easier to use compared to just using the \\(L_1\\)-norm of the structure, and at the same time it is a good proxy of the \\(L_1\\)-norm.  Basically, you can think of \nmean(abs(t))\n as a form of normalization of the structure L1-norm by the length of the structure.  \nL1RankedStructureParameterPruner\n currently prunes weight filters, channels, and rows (for linear layers).\n\n\nActivationAPoZRankedFilterPruner\n\n\nThe \nActivationAPoZRankedFilterPruner\n pruner uses the activation channels' mean APoZ (average percentage of zeros) to rank weight filters and prune a specified percentage of filters.\nThis method is called \nNetwork Trimming\n and comes from the research paper:\n\"Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures\",\n    Hengyuan Hu, Rui Peng, Yu-Wing Tai, Chi-Keung Tang, ICLR 2016\n    https://arxiv.org/abs/1607.03250  \n\n\nGradientRankedFilterPruner\n\n\nThe \nGradientRankedFilterPruner\n tries to assess the importance of weight filters using the product of their gradients and the filter value.  \n\n\nRandomRankedFilterPruner\n\n\nFor research purposes we may want to compare the results of some structure-ranking pruner to a random structure-ranking.  The \nRandomRankedFilterPruner\n pruner can be used for this purpose.\n\n\nAutomated Gradual Pruner (AGP) for Structures\n\n\nThe idea of a mathematical formula controlling the sparsity level growth is very useful and \nStructuredAGP\n extends the implementation to structured pruning.\n\n\nPruner Compositions\n\n\nPruners can be combined to create new pruning schemes.  Specifically, with a few lines of code we currently marry the AGP sparsity level scheduler with our filter-ranking classes to create pruner compositions.  For each of these, we use AGP to decide how many filters to prune at each step, and we choose the filters to remove using one of the filter-ranking methods:\n\n\n\n\nL1RankedStructureParameterPruner_AGP\n\n\nActivationAPoZRankedFilterPruner_AGP\n\n\nGradientRankedFilterPruner_AGP\n\n\nRandomRankedFilterPruner_AGP\n\n\n\n\nHybrid Pruning\n\n\nIn a single schedule we can mix different pruning techniques.  For example, we might mix pruning and regularization.  Or structured pruning and element-wise pruning.  We can even apply different methods on the same tensor.  For example, we might want to perform filter pruning for a few epochs, then perform \nthinning\n and continue with element-wise pruning of the smaller network tensors.  We call this technique of mixing different methods Hybrid Pruning, and Distiller has a few example schedules.", 
-            "title": "Pruning"
-        }, 
-        {
-            "location": "/algo_pruning/index.html#weights-pruning-algorithms", 
-            "text": "", 
-            "title": "Weights Pruning Algorithms"
-        }, 
-        {
-            "location": "/algo_pruning/index.html#magnitude-pruner", 
-            "text": "This is the most basic pruner: it applies a thresholding function, \\(thresh(.)\\), on each element, \\(w_i\\), of a weights tensor.  A different threshold can be used for each layer's weights tensor. \nBecause the threshold is applied on individual elements, this pruner belongs to the element-wise pruning algorithm family.  \\[ thresh(w_i)=\\left\\lbrace\n\\matrix{{{w_i: \\; if \\;|w_i| \\; \\gt}\\;\\lambda}\\cr {0: \\; if \\; |w_i| \\leq \\lambda} }\n\\right\\rbrace \\]", 
-            "title": "Magnitude Pruner"
-        }, 
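
A minimal PyTorch sketch of this thresholding function (illustrative only, not the Distiller implementation):

```python
import torch

def magnitude_prune(weights, threshold):
    """Element-wise magnitude pruning: zero every weight with |w_i| <= threshold."""
    return weights * (weights.abs() > threshold)

w = torch.randn(4, 4)
print(magnitude_prune(w, threshold=0.5))
```
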
-        {
-            "location": "/algo_pruning/index.html#sensitivity-pruner", 
-            "text": "Finding a threshold magnitude per layer is daunting, especially since each layer's elements have different average absolute values.  We can take advantage of the fact that the weights of convolutional and fully connected layers exhibit a Gaussian distribution with a mean value roughly zero, to avoid using a direct threshold based on the values of each specific tensor. \nThe diagram below shows the distribution the weights tensor of the first convolutional layer, and first fully-connected layer in TorchVision's pre-trained Alexnet model.  You can see that they have an approximate Gaussian distribution.     The distributions of Alexnet conv1 and fc1 layers  We use the standard deviation of the weights tensor as a sort of normalizing factor between the different weights tensors.  For example, if a tensor is Normally distributed, then about 68% of the elements have an absolute value less than the standard deviation (\\(\\sigma\\)) of the tensor.  Thus, if we set the threshold to \\(s*\\sigma\\), then basically we are thresholding \\(s * 68\\%\\) of the tensor elements.    \\[ thresh(w_i)=\\left\\lbrace\n\\matrix{{{w_i: \\; if \\;|w_i| \\; \\gt}\\;\\lambda}\\cr {0: \\; if \\; |w_i| \\leq \\lambda} }\n\\right\\rbrace \\]  \\[\n\\lambda = s * \\sigma_l \\;\\;\\; where\\; \\sigma_l\\; is \\;the \\;std \\;of \\;layer \\;l \\;as \\;measured \\;on \\;the \\;dense \\;model\n\\]  How do we choose this \\(s\\) multiplier?  In  Learning both Weights and Connections for Efficient Neural Networks  the authors write:   \"We used the sensitivity results to find each layer\u2019s threshold: for example, the smallest threshold was applied to the most sensitive layer, which is the first convolutional layer... The pruning threshold is chosen as a quality parameter multiplied by the standard deviation of a layer\u2019s weights   So the results of executing pruning sensitivity analysis on the tensor, gives us a good starting guess at \\(s\\).  Sensitivity analysis is an empirical method, and we still have to spend time to hone in on the exact multiplier value.", 
-            "title": "Sensitivity Pruner"
-        }, 
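
A toy PyTorch version of this scheme (illustrative only): the threshold is the sensitivity multiplier \(s\) times the standard deviation of the dense weights tensor:

```python
import torch

def sensitivity_prune(weights, s):
    """Threshold at lambda = s * std(weights), per the formula above."""
    lam = s * weights.std()
    return weights * (weights.abs() > lam)

w = torch.randn(10000)
pruned = sensitivity_prune(w, s=0.68)
print((pruned == 0).float().mean())  # roughly half of a Normal tensor falls
                                     # within +/- 0.68 standard deviations
```
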
-        {
-            "location": "/algo_pruning/index.html#method-of-operation", 
-            "text": "Start by running a pruning sensitivity analysis on the model.    Then use the results to set and tune the threshold of each layer, but instead of using a direct threshold use a sensitivity parameter which is multiplied by the standard-deviation of the initial weight-tensor's distribution.", 
-            "title": "Method of Operation"
-        }, 
-        {
-            "location": "/algo_pruning/index.html#schedule", 
-            "text": "In their  paper  Song Han et al. use iterative pruning and change the value of the \\(s\\) multiplier at each pruning step.  Distiller's  SensitivityPruner  works differently: the value \\(s\\) is set once based on a one-time calculation of the standard-deviation of the tensor (the first time we prune), and relies on the fact that as the tensor is pruned, more elements are \"pulled\" toward the center of the distribution and thus more elements gets pruned.  This actually works quite well as we can see in the diagram below.  This is a TensorBoard screen-capture from Alexnet training, which shows how this method starts off pruning very aggressively, but then slowly reduces the pruning rate.  We use a simple iterative-pruning schedule such as:  Prune every second epoch starting at epoch 0, and ending at epoch 38.   This excerpt from  alexnet.schedule_sensitivity.yaml  shows how this iterative schedule is conveyed in Distiller scheduling configuration YAML:  pruners:\n  my_pruner:\n    class: 'SensitivityPruner'\n    sensitivities:\n      'features.module.0.weight': 0.25\n      'features.module.3.weight': 0.35\n      'features.module.6.weight': 0.40\n      'features.module.8.weight': 0.45\n      'features.module.10.weight': 0.55\n      'classifier.1.weight': 0.875\n      'classifier.4.weight': 0.875\n      'classifier.6.weight': 0.625\n\npolicies:\n  - pruner:\n      instance_name : 'my_pruner'\n    starting_epoch: 0\n    ending_epoch: 38\n    frequency: 2", 
-            "title": "Schedule"
-        }, 
-        {
-            "location": "/algo_pruning/index.html#level-pruner", 
-            "text": "Class  SparsityLevelParameterPruner  uses a similar method to go around specifying specific thresholding magnitudes.\nInstead of specifying a threshold magnitude, you specify a target sparsity level (expressed as a fraction, so 0.5 means 50% sparsity).  Essentially this pruner also uses a pruning criteria based on the magnitude of each tensor element, but it has the advantage that you can aim for an exact and specific sparsity level. \nThis pruner is much more stable compared to  SensitivityPruner  because the target sparsity level is not coupled to the actual magnitudes of the elements. Distiller's  SensitivityPruner  is unstable because the final sparsity level depends on the convergence pattern of the tensor distribution.  Song Han's methodology of using several different values for the multiplier \\(s\\), and the recalculation of the standard-deviation at each pruning phase, probably gives it stability, but requires much more hyper-parameters (this is the reason we have not implemented it thus far).    To set the target sparsity levels, you can once again use pruning sensitivity analysis to make better guesses at the correct sparsity level of each", 
-            "title": "Level Pruner"
-        }, 
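
Sorting by magnitude and masking the smallest weights until the target sparsity is reached (see the method of operation below) can be sketched in a few lines of PyTorch (illustrative only, not the Distiller implementation):

```python
import torch

def level_prune(weights, sparsity):
    """Zero the smallest-magnitude weights until `sparsity` fraction are zero."""
    k = int(sparsity * weights.numel())
    if k == 0:
        return weights
    threshold = weights.abs().flatten().kthvalue(k).values  # k-th smallest |w|
    return weights * (weights.abs() > threshold)

w = torch.randn(8, 8)
print((level_prune(w, sparsity=0.5) == 0).float().mean())  # -> 0.5
```
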
-        {
-            "location": "/algo_pruning/index.html#method-of-operation_1", 
-            "text": "Sort the weights in the specified layer by their absolute values.   Mask to zero the smallest magnitude weights until the desired sparsity level is reached.", 
-            "title": "Method of Operation"
-        }, 
-        {
-            "location": "/algo_pruning/index.html#splicing-pruner", 
-            "text": "In  Dynamic Network Surgery for Efficient DNNs  Guo et. al propose that network pruning and splicing work in tandem.  A  SpilicingPruner  is a pruner that both prunes and splices connections and works best with a Dynamic Network Surgery schedule, which, for example, configures the  PruningPolicy  to mask weights only during the forward pass.", 
-            "title": "Splicing Pruner"
-        }, 
-        {
-            "location": "/algo_pruning/index.html#automated-gradual-pruner-agp", 
-            "text": "In  To prune, or not to prune: exploring the efficacy of pruning for model compression , authors Michael Zhu and Suyog Gupta provide an algorithm to schedule a Level Pruner which Distiller implements in  AutomatedGradualPruner .   \"We introduce a new automated gradual pruning algorithm in which the sparsity is increased from an initial sparsity value \\(s_i\\) (usually 0) to a \ufb01nal sparsity value \\(s_f\\) over a span of n pruning steps.\nThe intuition behind this sparsity function in equation (1)  is to prune the network rapidly in the initial phase when the redundant connections are\nabundant and gradually reduce the number of weights being pruned each time as there are fewer and fewer weights remaining in the network.\"\"    You can play with the scheduling parameters in the  agp_schedule.ipynb notebook .  The authors describe AGP:    Our automated gradual pruning algorithm prunes the smallest magnitude weights to achieve a preset level of network sparsity.  Doesn't require much hyper-parameter tuning  Shown to perform well across different models  Does not make any assumptions about the structure of the network or its constituent layers, and is therefore more generally applicable.", 
-            "title": "Automated Gradual Pruner (AGP)"
-        }, 
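
For reference, the sparsity function from the paper (its equation (1)) ramps the sparsity from \(s_i\) to \(s_f\) with a cubic schedule. A small sketch, assuming that formula (`agp_sparsity` is a hypothetical helper, not Distiller's API):

```python
def agp_sparsity(t, t0, n, delta_t, s_i=0.0, s_f=0.9):
    """Sparsity at training step t: cubic ramp from s_i to s_f
    over n pruning steps of delta_t each, starting at step t0."""
    progress = min(max((t - t0) / (n * delta_t), 0.0), 1.0)
    return s_f + (s_i - s_f) * (1.0 - progress) ** 3

for epoch in range(0, 31, 10):
    print(epoch, round(agp_sparsity(epoch, t0=0, n=30, delta_t=1), 3))
# 0.0 -> 0.633 -> 0.867 -> 0.9: rapid pruning early, gradual pruning later
```
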
-        {
-            "location": "/algo_pruning/index.html#rnn-pruner", 
-            "text": "The authors of  Exploring Sparsity in Recurrent Neural Networks , Sharan Narang, Erich Elsen, Gregory Diamos, and Shubho Sengupta, \"propose a technique to reduce the parameters of a network by pruning weights during the initial training of the network.\"  They use a gradual pruning schedule which is reminiscent of the schedule used in AGP, for element-wise pruning of RNNs, which they also employ during training.  They show pruning of RNN, GRU, LSTM and embedding layers.  Distiller's distiller.pruning.BaiduRNNPruner class implements this pruning algorithm.", 
-            "title": "RNN Pruner"
-        }, 
-        {
-            "location": "/algo_pruning/index.html#structure-pruners", 
-            "text": "Element-wise pruning can create very sparse models which can be compressed to consume less memory footprint and bandwidth, but without specialized hardware that can compute using the sparse representation of the tensors, we don't gain any speedup of the computation.  Structure pruners, remove entire \"structures\", such as kernels, filters, and even entire feature-maps.", 
-            "title": "Structure Pruners"
-        }, 
-        {
-            "location": "/algo_pruning/index.html#structure-ranking-pruners", 
-            "text": "Ranking pruners use some criterion to rank the structures in a tensor, and then prune the tensor to a specified level. In principle, these pruners perform one-shot pruning, but can be combined with automatic pruning-level scheduling, such as AGP (see below).\nIn  Pruning Filters for Efficient ConvNets  the authors use filter ranking, with  one-shot pruning  followed by fine-tuning.  The authors of  Exploiting Sparseness in Deep Neural Networks for Large Vocabulary Speech Recognition  also use a one-shot pruning schedule, for fully-connected layers, and they provide an explanation:   First, after sweeping through the full training set several times the weights become relatively stable \u2014 they tend to remain either large or small magnitudes. Second, in a stabilized model, the importance of the connection is approximated well by the magnitudes of the weights (times the magnitudes of the corresponding input values, but these are relatively uniform within each layer since on the input layer, features are normalized to zero-mean and unit-variance, and hidden-layer values are probabilities)", 
-            "title": "Structure Ranking Pruners"
-        }, 
-        {
-            "location": "/algo_pruning/index.html#l1rankedstructureparameterpruner", 
-            "text": "The  L1RankedStructureParameterPruner  pruner calculates the magnitude of some \"structure\", orders all of the structures based on some magnitude function and the  m  lowest ranking structures are pruned away.  This pruner performs ranking of structures using the mean of the absolute value of the structure as the representative of the structure magnitude.  The absolute mean does not depend on the size of the structure, so it is easier to use compared to just using the \\(L_1\\)-norm of the structure, and at the same time it is a good proxy of the \\(L_1\\)-norm.  Basically, you can think of  mean(abs(t))  as a form of normalization of the structure L1-norm by the length of the structure.   L1RankedStructureParameterPruner  currently prunes weight filters, channels, and rows (for linear layers).", 
-            "title": "L1RankedStructureParameterPruner"
-        }, 
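
A compact sketch of this ranking criterion for 4D convolution weights (illustrative only, not the actual class):

```python
import torch

def l1_rank_filters(conv_weight, num_to_prune):
    """Score each filter by mean(abs(filter)) and zero the lowest-ranked ones."""
    scores = conv_weight.abs().mean(dim=(1, 2, 3))   # one score per output filter
    prune_idx = scores.argsort()[:num_to_prune]      # the m lowest-ranking structures
    mask = torch.ones_like(conv_weight)
    mask[prune_idx] = 0.0
    return conv_weight * mask

w = torch.randn(16, 3, 3, 3)   # (out_channels, in_channels, k, k)
pruned = l1_rank_filters(w, num_to_prune=4)
```
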
-        {
-            "location": "/algo_pruning/index.html#activationapozrankedfilterpruner", 
-            "text": "The  ActivationAPoZRankedFilterPruner  pruner uses the activation channels mean APoZ (average percentage of zeros) to rank weight filters and prune a specified percentage of filters.\nThis method is called  Network Trimming  from the research paper:\n\"Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures\",\n    Hengyuan Hu, Rui Peng, Yu-Wing Tai, Chi-Keung Tang, ICLR 2016\n    https://arxiv.org/abs/1607.03250", 
-            "title": "ActivationAPoZRankedFilterPruner"
-        }, 
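
The APoZ criterion itself is simple to compute. A sketch (illustrative only) over a batch of post-ReLU activations:

```python
import torch

def apoz(activations):
    """Average Percentage of Zeros per channel for activations shaped (N, C, H, W)."""
    return (activations == 0).float().mean(dim=(0, 2, 3))

acts = torch.relu(torch.randn(8, 16, 14, 14))
print(apoz(acts))  # higher APoZ -> the filter fires less often -> prune candidate
```
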
-        {
-            "location": "/algo_pruning/index.html#gradientrankedfilterpruner", 
-            "text": "The  GradientRankedFilterPruner  tries to asses the importance of weight filters using the product of their gradients and the filter value.", 
-            "title": "GradientRankedFilterPruner"
-        }, 
-        {
-            "location": "/algo_pruning/index.html#randomrankedfilterpruner", 
-            "text": "For research purposes we may want to compare the results of some structure-ranking pruner to a random structure-ranking.  The  RandomRankedFilterPruner  pruner can be used for this purpose.", 
-            "title": "RandomRankedFilterPruner"
-        }, 
-        {
-            "location": "/algo_pruning/index.html#automated-gradual-pruner-agp-for-structures", 
-            "text": "The idea of a mathematical formula controlling the sparsity level growth is very useful and  StructuredAGP  extends the implementation to structured pruning.", 
-            "title": "Automated Gradual Pruner (AGP) for Structures"
-        }, 
-        {
-            "location": "/algo_pruning/index.html#pruner-compositions", 
-            "text": "Pruners can be combined to create new pruning schemes.  Specifically, with a few lines of code we currently marry the AGP sparsity level scheduler with our filter-ranking classes to create pruner compositions.  For each of these, we use AGP to decided how many filters to prune at each step, and we choose the filters to remove using one of the filter-ranking methods:   L1RankedStructureParameterPruner_AGP  ActivationAPoZRankedFilterPruner_AGP  GradientRankedFilterPruner_AGP  RandomRankedFilterPruner_AGP", 
-            "title": "Pruner Compositions"
-        }, 
-        {
-            "location": "/algo_pruning/index.html#hybrid-pruning", 
-            "text": "In a single schedule we can mix different pruning techniques.  For example, we might mix pruning and regularization.  Or structured pruning and element-wise pruning.  We can even apply different methods on the same tensor.  For example, we might want to perform filter pruning for a few epochs, then perform  thinning  and continue with element-wise pruning of the smaller network tensors.  This technique of mixing different methods we call Hybrid Pruning, and Distiller has a few example schedules.", 
-            "title": "Hybrid Pruning"
-        }, 
-        {
-            "location": "/algo_quantization/index.html", 
-            "text": "Quantization Algorithms\n\n\nNote:\n\nFor any of the methods below that require quantization-aware training, please see \nhere\n for details on how to invoke it using Distiller's scheduling mechanism.\n\n\nRange-Based Linear Quantization\n\n\nLet's break down the terminology we use here:\n\n\n\n\nLinear:\n Means a float value is quantized by multiplying with a numeric constant (the \nscale factor\n).\n\n\nRange-Based:\n Means that in order to calculate the scale factor, we look at the actual range of the tensor's values. In the most naive implementation, we use the actual min/max values of the tensor. Alternatively, we use some derivation based on the tensor's range / distribution to come up with a narrower min/max range, in order to remove possible outliers. This is in contrast to the other methods described here, which we could call \nclipping-based\n, as they impose an explicit clipping function on the tensors (using either a hard-coded value or a learned value).\n\n\n\n\nAsymmetric vs. Symmetric\n\n\nIn this method we can use two modes - \nasymmetric\n and \nsymmetric\n.\n\n\nAsymmetric Mode\n\n\n\n    \n\n\n\n\n\nIn \nasymmetric\n mode, we map the min/max in the float range to the min/max of the integer range. This is done by using a \nzero-point\n (also called \nquantization bias\n, or \noffset\n) in addition to the scale factor.\n\n\nLet us denote the original floating-point tensor by \nx_f\n, the quantized tensor by \nx_q\n, the scale factor by \nq_x\n, the zero-point by \nzp_x\n and the number of bits used for quantization by \nn\n. Then, we get:\n\n\n\n\nx_q = round\\left ((x_f - min_{x_f})\\underbrace{\\frac{2^n - 1}{max_{x_f} - min_{x_f}}}_{q_x} \\right) = round(q_x x_f - \\underbrace{min_{x_f}q_x)}_{zp_x} = round(q_x x_f - zp_x)\n\n\n\n\nIn practice, we actually use \nzp_x = round(min_{x_f}q_x)\n. This means that zero is exactly representable by an integer in the quantized range. This is important, for example, for layers that have zero-padding. By rounding the zero-point, we effectively \"nudge\" the min/max values in the float range a little bit, in order to gain this exact quantization of zero.\n\n\nNote that in the derivation above we use unsigned integer to represent the quantized range. That is, \nx_q \\in [0, 2^n-1]\n. One could use signed integer if necessary (perhaps due to HW considerations). This can be achieved by subtracting \n2^{n-1}\n.\n\n\nLet's see how a \nconvolution\n or \nfully-connected (FC)\n layer is quantized in asymmetric mode: (we denote input, output, weights and bias with  \nx, y, w\n and \nb\n respectively)\n\n\n\n\ny_f = \\sum{x_f w_f} + b_f = \\sum{\\frac{x_q + zp_x}{q_x} \\frac{w_q + zp_w}{q_w}} + \\frac{b_q + zp_b}{q_b} =\n\n\n = \\frac{1}{q_x q_w} \\left( \\sum { (x_q + zp_x) (w_q + zp_w) + \\frac{q_x q_w}{q_b}(b_q + zp_b) } \\right)\n\n\n\n\nTherefore:\n\n\n\n\ny_q = round(q_y y_f) = round\\left(\\frac{q_y}{q_x q_w} \\left( \\sum { (x_q+zp_x) (w_q+zp_w) + \\frac{q_x q_w}{q_b}(b_q+zp_b) } \\right) \\right) \n\n\n\n\nNotes:\n\n\n\n\nWe can see that the bias has to be re-scaled to match the scale of the summation.\n\n\nIn a proper integer-only HW pipeline, we would like our main accumulation term to simply be \n\\sum{x_q w_q}\n. In order to achieve this, one needs to further develop the expression we derived above. 
For further details please refer to the \ngemmlowp documentation\n\n\n\n\nSymmetric Mode\n\n\n\n    \n\n\n\n\n\nIn \nsymmetric\n mode, instead of mapping the exact min/max of the float range to the quantized range, we choose the maximum absolute value between min/max. In addition, we don't use a zero-point. So, the floating-point range we're effectively quantizing is symmetric with respect to zero, and so is the quantized range.\n\n\nUsing the same notations as above, we get:\n\n\n\n\nx_q = round\\left (x_f \\underbrace{\\frac{2^{n-1} - 1}{\\max|x_f|}}_{q_x} \\right) = round(q_x x_f)\n\n\n\n\nAgain, let's see how a \nconvolution\n or \nfully-connected (FC)\n layer is quantized, this time in symmetric mode:\n\n\n\n\ny_f = \\sum{x_f w_f} + b_f = \\sum{\\frac{x_q}{q_x} \\frac{w_q}{q_w}} + \\frac{b_q}{q_b} = \\frac{1}{q_x q_w} \\left( \\sum { x_q w_q + \\frac{q_x q_w}{q_b}b_q } \\right)\n\n\n\n\nTherefore:\n\n\n\n\ny_q = round(q_y y_f) = round\\left(\\frac{q_y}{q_x q_w} \\left( \\sum { x_q w_q + \\frac{q_x q_w}{q_b}b_q } \\right) \\right) \n\n\n\n\nComparing the Two Modes\n\n\nThe main trade-off between these two modes is simplicity vs. utilization of the quantized range.\n\n\n\n\nWhen using asymmetric quantization, the quantized range is fully utilized. That is because we exactly map the min/max values from the float range to the min/max of the quantized range. In symmetric mode, if the float range is biased towards one side, this could result in a quantized range where significant dynamic range is dedicated to values that we'll never see. The most extreme example of this is after ReLU, where the entire tensor is positive. Quantizing it in symmetric mode means we're effectively losing 1 bit.\n\n\nOn the other hand, if we look at the derivations for convolution / FC layers above, we can see that the actual implementation of symmetric mode is much simpler. In asymmetric mode, the zero-points require additional logic in HW. The cost of this extra logic in terms of latency and/or power and/or area will of course depend on the exact implementation.\n\n\n\n\nOther Features\n\n\n\n\nRemoving Outliers:\n As discussed \nhere\n, in some cases the float range of activations contains outliers. Spending dynamic range on these outliers hurts our ability to represent the values we actually care about accurately.\n\n\nCurrently, Distiller supports clipping of activations with averaging during post-training quantization. That is - for each batch, instead of calculating global min/max values, we use an average of the min/max values of each sample in the batch.\n\n\nScale factor scope:\n For weight tensors, Distiller supports per-channel quantization (per output channel).\n\n\n\n\nImplementation in Distiller\n\n\nPost-Training\n\n\nFor post-training quantization, this method is implemented by wrapping existing modules with quantization and de-quantization operations. The wrapper implementations are in \nrange_linear.py\n.\n\n\n\n\nThe operations currently supported are:\n\n\nConvolution\n\n\nFully connected\n\n\nElement-wise addition\n\n\nElement-wise multiplication\n\n\nConcatenation\n\n\n\n\n\n\nAll other layers are unaffected and are executed using their original FP32 implementation.\n\n\nTo automatically transform an existing model to a quantized model using this method, use the \nPostTrainLinearQuantizer\n class. For details on ways to invoke the quantizer see \nhere\n.\n\n\nThe transform performed by the Quantizer only works on sub-classes of \ntorch.nn.Module\n. 
But operations such as element-wise addition / multiplication and concatenation do not have associated Modules in PyTorch. They are either overloaded operators, or simple functions in the \ntorch\n namespace. To be able to quantize these operations, we've implemented very simple modules that wrap these operations \nhere\n. It is necessary to manually modify your model and replace any existing operator with a corresponding module. For an example, see our slightly modified \nResNet implementation\n.\n\n\nFor weights and bias, the scale factor and zero-point are determined once at quantization setup (\"offline\" / \"static\"). For activations, both \"static\" and \"dynamic\" quantization are supported. Static quantization of activations requires that statistics be collected beforehand. See details on how to do that \nhere\n.\n\n\nThe calculated quantization parameters are stored as buffers within the module, so they are automatically serialized when the model checkpoint is saved.\n\n\n\n\nQuantization-Aware Training\n\n\nTo apply range-based linear quantization in training, use the \nQuantAwareTrainRangeLinearQuantizer\n class. As it is now, it will apply weight quantization to convolution, FC and embedding modules. For activation quantization, it will insert instances of the \nFakeLinearQuantization\n module after ReLUs. This module follows the methodology described in \nBenoit et al., 2018\n and uses exponential moving averages to track activation ranges.\n\nNote that the current implementation of \nQuantAwareTrainRangeLinearQuantizer\n supports training with a \nsingle GPU only\n.\n\n\nSimilarly to post-training, the calculated quantization parameters (scale factors, zero-points, tracked activation ranges) are stored as buffers within their respective modules, so they're saved when a checkpoint is created.\n\n\nNote that converting from a quantization-aware training model to a post-training quantization model is not yet supported. Once it is, such a conversion will use the activation ranges tracked during training, so no additional offline or online calculation of quantization parameters will be required.\n\n\nDoReFa\n\n\n(As proposed in \nDoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients\n)  \n\n\nIn this method, we first define the quantization function \nquantize_k\n, which takes a real value \na_f \\in [0, 1]\n and outputs a discrete-valued \na_q \\in \\left\\{ \\frac{0}{2^k-1}, \\frac{1}{2^k-1}, ... , \\frac{2^k-1}{2^k-1} \\right\\}\n, where \nk\n is the number of bits used for quantization.\n\n\n\n\na_q = quantize_k(a_f) = \\frac{1}{2^k-1} round \\left( \\left(2^k - 1 \\right) a_f \\right)\n\n\n\n\nActivations are clipped to the \n[0, 1]\n range and then quantized as follows:\n\n\n\n\nx_q = quantize_k(x_f)\n\n\n\n\nFor weights, we define the following function \nf\n, which takes an unbounded real-valued input and outputs a real value in \n[0, 1]\n:\n\n\n\n\nf(w) = \\frac{tanh(w)}{2 max(|tanh(w)|)} + \\frac{1}{2} \n\n\n\n\nNow we can use \nquantize_k\n to get quantized weight values, as follows:\n\n\n\n\nw_q = 2 quantize_k \\left( f(w_f) \\right) - 1\n\n\n\n\nThis method requires training the model with quantization-aware training, as discussed \nhere\n. 
Use the \nDorefaQuantizer\n class to transform an existing model to a model suitable for training with quantization using DoReFa.\n\n\nNotes:\n\n\n\n\nGradient quantization as proposed in the paper is not supported yet.\n\n\nThe paper defines special handling for binary weights which isn't supported in Distiller yet.\n\n\n\n\nPACT\n\n\n(As proposed in \nPACT: Parameterized Clipping Activation for Quantized Neural Networks\n)\n\n\nThis method is similar to DoReFa, but the upper clipping values, \n\\alpha\n, of the activation functions are learned parameters instead of being hard-coded to 1. Note that per the paper's recommendation, \n\\alpha\n is shared per layer.\n\n\nThis method requires training the model with quantization-aware training, as discussed \nhere\n. Use the \nPACTQuantizer\n class to transform an existing model to a model suitable for training with quantization using PACT.\n\n\nWRPN\n\n\n(As proposed in \nWRPN: Wide Reduced-Precision Networks\n)  \n\n\nIn this method, activations are clipped to \n[0, 1]\n and quantized as follows (\nk\n is the number of bits used for quantization):\n\n\n\n\nx_q = \\frac{1}{2^k-1} round \\left( \\left(2^k - 1 \\right) x_f \\right)\n\n\n\n\nWeights are clipped to \n[-1, 1]\n and quantized as follows:\n\n\n\n\nw_q = \\frac{1}{2^{k-1}-1} round \\left( \\left(2^{k-1} - 1 \\right)w_f \\right)\n\n\n\n\nNote that \nk-1\n bits are used to quantize weights, leaving one bit for sign.\n\n\nThis method requires training the model with quantization-aware training, as discussed \nhere\n. Use the \nWRPNQuantizer\n class to transform an existing model to a model suitable for training with quantization using WRPN.\n\n\nNotes:\n\n\n\n\nThe paper proposed widening of layers as a means to reduce accuracy loss. This isn't implemented as part of \nWRPNQuantizer\n at the moment. To experiment with this, modify your model implementation to have wider layers.\n\n\nThe paper defines special handling for binary weights which isn't supported in Distiller yet.", 
-            "title": "Quantization"
-        }, 
-        {
-            "location": "/algo_quantization/index.html#quantization-algorithms", 
-            "text": "Note: \nFor any of the methods below that require quantization-aware training, please see  here  for details on how to invoke it using Distiller's scheduling mechanism.", 
-            "title": "Quantization Algorithms"
-        }, 
-        {
-            "location": "/algo_quantization/index.html#range-based-linear-quantization", 
-            "text": "Let's break down the terminology we use here:   Linear:  Means a float value is quantized by multiplying with a numeric constant (the  scale factor ).  Range-Based:  Means that in order to calculate the scale factor, we look at the actual range of the tensor's values. In the most naive implementation, we use the actual min/max values of the tensor. Alternatively, we use some derivation based on the tensor's range / distribution to come up with a narrower min/max range, in order to remove possible outliers. This is in contrast to the other methods described here, which we could call  clipping-based , as they impose an explicit clipping function on the tensors (using either a hard-coded value or a learned value).", 
-            "title": "Range-Based Linear Quantization"
-        }, 
-        {
-            "location": "/algo_quantization/index.html#asymmetric-vs-symmetric", 
-            "text": "In this method we can use two modes -  asymmetric  and  symmetric .", 
-            "title": "Asymmetric vs. Symmetric"
-        }, 
-        {
-            "location": "/algo_quantization/index.html#asymmetric-mode", 
-            "text": "In  asymmetric  mode, we map the min/max in the float range to the min/max of the integer range. This is done by using a  zero-point  (also called  quantization bias , or  offset ) in addition to the scale factor.  Let us denote the original floating-point tensor by  x_f , the quantized tensor by  x_q , the scale factor by  q_x , the zero-point by  zp_x  and the number of bits used for quantization by  n . Then, we get:   x_q = round\\left ((x_f - min_{x_f})\\underbrace{\\frac{2^n - 1}{max_{x_f} - min_{x_f}}}_{q_x} \\right) = round(q_x x_f - \\underbrace{min_{x_f}q_x)}_{zp_x} = round(q_x x_f - zp_x)   In practice, we actually use  zp_x = round(min_{x_f}q_x) . This means that zero is exactly representable by an integer in the quantized range. This is important, for example, for layers that have zero-padding. By rounding the zero-point, we effectively \"nudge\" the min/max values in the float range a little bit, in order to gain this exact quantization of zero.  Note that in the derivation above we use unsigned integer to represent the quantized range. That is,  x_q \\in [0, 2^n-1] . One could use signed integer if necessary (perhaps due to HW considerations). This can be achieved by subtracting  2^{n-1} .  Let's see how a  convolution  or  fully-connected (FC)  layer is quantized in asymmetric mode: (we denote input, output, weights and bias with   x, y, w  and  b  respectively)   y_f = \\sum{x_f w_f} + b_f = \\sum{\\frac{x_q + zp_x}{q_x} \\frac{w_q + zp_w}{q_w}} + \\frac{b_q + zp_b}{q_b} =   = \\frac{1}{q_x q_w} \\left( \\sum { (x_q + zp_x) (w_q + zp_w) + \\frac{q_x q_w}{q_b}(b_q + zp_b) } \\right)   Therefore:   y_q = round(q_y y_f) = round\\left(\\frac{q_y}{q_x q_w} \\left( \\sum { (x_q+zp_x) (w_q+zp_w) + \\frac{q_x q_w}{q_b}(b_q+zp_b) } \\right) \\right)    Notes:   We can see that the bias has to be re-scaled to match the scale of the summation.  In a proper integer-only HW pipeline, we would like our main accumulation term to simply be  \\sum{x_q w_q} . In order to achieve this, one needs to further develop the expression we derived above. For further details please refer to the  gemmlowp documentation", 
-            "title": "Asymmetric Mode"
-        }, 
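To make the asymmetric mapping above concrete, here is a minimal NumPy sketch of the quantize / de-quantize round trip. It is illustrative only: the function names are ours, and this is not Distiller's range_linear.py implementation.

```python
import numpy as np

def asym_quantize(x_f, n=8):
    # Scale factor maps the full float range [min, max] onto [0, 2^n - 1].
    q_x = (2 ** n - 1) / (x_f.max() - x_f.min())
    # Rounding the zero-point "nudges" the range so that 0.0 maps to an exact integer.
    zp_x = np.round(x_f.min() * q_x)
    x_q = np.clip(np.round(q_x * x_f - zp_x), 0, 2 ** n - 1)
    return x_q, q_x, zp_x

def asym_dequantize(x_q, q_x, zp_x):
    return (x_q + zp_x) / q_x

x = np.random.randn(1000)
x_q, q_x, zp_x = asym_quantize(x)
print(np.abs(x - asym_dequantize(x_q, q_x, zp_x)).max())  # worst-case rounding error
```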
-        {
-            "location": "/algo_quantization/index.html#symmetric-mode", 
-            "text": "In  symmetric  mode, instead of mapping the exact min/max of the float range to the quantized range, we choose the maximum absolute value between min/max. In addition, we don't use a zero-point. So, the floating-point range we're effectively quantizing is symmetric with respect to zero, and so is the quantized range.  Using the same notations as above, we get:   x_q = round\\left (x_f \\underbrace{\\frac{2^{n-1} - 1}{\\max|x_f|}}_{q_x} \\right) = round(q_x x_f)   Again, let's see how a  convolution  or  fully-connected (FC)  layer is quantized, this time in symmetric mode:   y_f = \\sum{x_f w_f} + b_f = \\sum{\\frac{x_q}{q_x} \\frac{w_q}{q_w}} + \\frac{b_q}{q_b} = \\frac{1}{q_x q_w} \\left( \\sum { x_q w_q + \\frac{q_x q_w}{q_b}b_q } \\right)   Therefore:   y_q = round(q_y y_f) = round\\left(\\frac{q_y}{q_x q_w} \\left( \\sum { x_q w_q + \\frac{q_x q_w}{q_b}b_q } \\right) \\right)", 
-            "title": "Symmetric Mode"
-        }, 
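For comparison, the symmetric variant needs no zero-point. A minimal sketch under the same assumptions (again our function names, not Distiller's code):

```python
import numpy as np

def sym_quantize(x_f, n=8):
    # Scale by max|x_f| so the float range is symmetric around zero;
    # the integer range is [-(2^(n-1) - 1), 2^(n-1) - 1], with no zero-point.
    q_x = (2 ** (n - 1) - 1) / np.abs(x_f).max()
    return np.round(q_x * x_f), q_x

def sym_dequantize(x_q, q_x):
    return x_q / q_x
```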
-        {
-            "location": "/algo_quantization/index.html#comparing-the-two-modes", 
-            "text": "The main trade-off between these two modes is simplicity vs. utilization of the quantized range.   When using asymmetric quantization, the quantized range is fully utilized. That is because we exactly map the min/max values from the float range to the min/max of the quantized range. Using symmetric mode, if the float range is biased towards one side, could result in a quantized range where significant dynamic range is dedicated to values that we'll never see. The most extreme example of this is after ReLU, where the entire tensor is positive. Quantizing it in symmetric mode means we're effectively losing 1 bit.  On the other hand, if we look at the derviations for convolution / FC layers above, we can see that the actual implementation of symmetric mode is much simpler. In asymmetric mode, the zero-points require additional logic in HW. The cost of this extra logic in terms of latency and/or power and/or area will of course depend on the exact implementation.", 
-            "title": "Comparing the Two Modes"
-        }, 
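The "losing 1 bit after ReLU" point is easy to verify numerically. In this hypothetical snippet, a strictly positive tensor reaches only about half of the symmetric INT8 codes, while the asymmetric mapping can use all of them:

```python
import numpy as np

x = np.abs(np.random.randn(100000))  # post-ReLU-like tensor: all values positive

q_sym = 127 / np.abs(x).max()         # symmetric scale, INT8
q_asym = 255 / (x.max() - x.min())    # asymmetric scale, INT8
zp = np.round(x.min() * q_asym)

print(np.unique(np.round(q_sym * x)).size)        # ~128 codes used (half the range)
print(np.unique(np.round(q_asym * x - zp)).size)  # ~256 codes used (full range)
```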
-        {
-            "location": "/algo_quantization/index.html#other-features", 
-            "text": "Removing Outliers:  As discussed  here , in some cases the float range of activations contains outliers. Spending dynamic range on these outliers hurts our ability to represent the values we actually care about accurately.\n    \n        \n    \n  Currently, Distiller supports clipping of activations with averaging during post-training quantization. That is - for each batch, instead of calculating global min/max values, an average of the min/max values of each sample in the batch.  Scale factor scope:  For weight tensors, Distiller supports per-channel quantization (per output channel).", 
-            "title": "Other Features"
-        }, 
-        {
-            "location": "/algo_quantization/index.html#implementation-in-distiller", 
-            "text": "", 
-            "title": "Implementation in Distiller"
-        }, 
-        {
-            "location": "/algo_quantization/index.html#post-training", 
-            "text": "For post-training quantization, this method is implemented by wrapping existing modules with quantization and de-quantization operations. The wrapper implementations are in  range_linear.py .   The operations currently supported are:  Convolution  Fully connected  Element-wise addition  Element-wise multiplication  Concatenation    All other layers are unaffected and are executed using their original FP32 implementation.  To automatically transform an existing model to a quantized model using this method, use the  PostTrainLinearQuantizer  class. For details on ways to invoke the quantizer see  here .  The transform performed by the Quantizer only works on sub-classes of  torch.nn.Module . But operations such as element-wise addition / multiplication and concatenation do not have associated Modules in PyTorch. They are either overloaded operators, or simple functions in the  torch  namespace. To be able to quantize these operations, we've implemented very simple modules that wrap these operations  here . It is necessary to manually modify your model and replace any existing operator with a corresponding module. For an example, see our slightly modified  ResNet implementation .  For weights and bias the scale factor and zero-point are determined once at quantization setup (\"offline\" / \"static\"). For activations, both \"static\" and \"dynamic\" quantization is supported. Static quantizaton of activations requires that statistics be collected beforehand. See details on how to do that  here .  The calculated quantization parameters are stored as buffers within the module, so they are automatically serialized when the model checkpoint is saved.", 
-            "title": "Post-Training"
-        }, 
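A hypothetical usage sketch, showing both the manual operator-to-module replacement and the automatic transform. The EltwiseAdd and PostTrainLinearQuantizer names are taken from the text above; constructor options and exact signatures may differ between Distiller versions, so treat this as an outline rather than the definitive API:

```python
import torch.nn as nn
from distiller.modules import EltwiseAdd
from distiller.quantization import PostTrainLinearQuantizer

class Residual(nn.Module):
    def __init__(self, channels):
        super(Residual, self).__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.add = EltwiseAdd()   # a module stand-in for the bare '+' operator
        self.relu = nn.ReLU()

    def forward(self, x):
        # Using self.add instead of 'x + self.conv(x)' lets the quantizer wrap it
        return self.relu(self.add(x, self.conv(x)))

model = Residual(16)
quantizer = PostTrainLinearQuantizer(model)
quantizer.prepare_model()   # wraps supported modules with quantize/de-quantize ops
```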
-        {
-            "location": "/algo_quantization/index.html#quantization-aware-training", 
-            "text": "To apply range-based linear quantization in training, use the  QuantAwareTrainRangeLinearQuantizer  class. As it is now, it will apply weights quantization to convolution, FC and embedding modules. For activations quantization, it will insert instances  FakeLinearQuantization  module after ReLUs. This module follows the methodology described in  Benoit et al., 2018  and uses exponential moving averages to track activation ranges. \nNote that the current implementation of  QuantAwareTrainRangeLinearQuantizer  supports training with  single GPU only .  Similarly to post-training, the calculated quantization parameters (scale factors, zero-points, tracked activation ranges) are stored as buffers within their respective modules, so they're saved when a checkpoint is created.  Note that converting from a quantization-aware training model to a post-training quantization model is not yet supported. Such a conversion will use the activation ranges tracked during training, so additional offline or online calculation of quantization parameters will not be required.", 
-            "title": "Quantization-Aware Training"
-        }, 
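A sketch of the exponential-moving-average range tracking described above. This is our own illustration, not the FakeLinearQuantization module itself:

```python
import torch

class EmaRangeTracker:
    """Track activation min/max with an exponential moving average."""
    def __init__(self, momentum=0.99):
        self.momentum = momentum
        self.min = None
        self.max = None

    def update(self, x):
        mn, mx = x.min().item(), x.max().item()
        if self.min is None:      # the first batch initializes the ranges
            self.min, self.max = mn, mx
        else:                     # later batches are smoothed into the EMA
            self.min = self.momentum * self.min + (1 - self.momentum) * mn
            self.max = self.momentum * self.max + (1 - self.momentum) * mx
        return self.min, self.max
```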
-        {
-            "location": "/algo_quantization/index.html#dorefa", 
-            "text": "(As proposed in  DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients )    In this method, we first define the quantization function  quantize_k , which takes a real value  a_f \\in [0, 1]  and outputs a discrete-valued  a_q \\in \\left\\{ \\frac{0}{2^k-1}, \\frac{1}{2^k-1}, ... , \\frac{2^k-1}{2^k-1} \\right\\} , where  k  is the number of bits used for quantization.   a_q = quantize_k(a_f) = \\frac{1}{2^k-1} round \\left( \\left(2^k - 1 \\right) a_f \\right)   Activations are clipped to the  [0, 1]  range and then quantized as follows:   x_q = quantize_k(x_f)   For weights, we define the following function  f , which takes an unbounded real valued input and outputs a real value in  [0, 1] :   f(w) = \\frac{tanh(w)}{2 max(|tanh(w)|)} + \\frac{1}{2}    Now we can use  quantize_k  to get quantized weight values, as follows:   w_q = 2 quantize_k \\left( f(w_f) \\right) - 1   This method requires training the model with quantization-aware training, as discussed  here . Use the  DorefaQuantizer  class to transform an existing model to a model suitable for training with quantization using DoReFa.", 
-            "title": "DoReFa"
-        }, 
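The DoReFa formulas above translate directly into code. A sketch follows; gradient handling during training (the straight-through estimator) is omitted:

```python
import torch

def quantize_k(a_f, k):
    # a_f is assumed to lie in [0, 1]; the output is one of 2^k evenly spaced levels
    n = 2 ** k - 1
    return torch.round(n * a_f) / n

def dorefa_activations(x_f, k):
    return quantize_k(x_f.clamp(0, 1), k)

def dorefa_weights(w_f, k):
    # f(w) maps unbounded weights into [0, 1]; the result is mapped back to [-1, 1]
    f_w = torch.tanh(w_f) / (2 * torch.tanh(w_f).abs().max()) + 0.5
    return 2 * quantize_k(f_w, k) - 1
```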
-        {
-            "location": "/algo_quantization/index.html#notes", 
-            "text": "Gradients quantization as proposed in the paper is not supported yet.  The paper defines special handling for binary weights which isn't supported in Distiller yet.", 
-            "title": "Notes:"
-        }, 
-        {
-            "location": "/algo_quantization/index.html#pact", 
-            "text": "(As proposed in  PACT: Parameterized Clipping Activation for Quantized Neural Networks )  This method is similar to DoReFa, but the upper clipping values,  \\alpha , of the activation functions are learned parameters instead of hard coded to 1. Note that per the paper's recommendation,  \\alpha  is shared per layer.  This method requires training the model with quantization-aware training, as discussed  here . Use the  PACTQuantizer  class to transform an existing model to a model suitable for training with quantization using PACT.", 
-            "title": "PACT"
-        }, 
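A sketch of the learnable clipping at the heart of PACT, written as clip(x, 0, alpha) = ReLU(x) - ReLU(x - alpha) so that gradients flow to alpha. Quantization of the clipped value and the straight-through estimator are omitted, and the initial value 6.0 is an arbitrary choice of ours:

```python
import torch
import torch.nn as nn

class PACTClip(nn.Module):
    def __init__(self, alpha_init=6.0):
        super(PACTClip, self).__init__()
        # One learnable clipping value, shared per layer as the paper recommends
        self.alpha = nn.Parameter(torch.tensor(alpha_init))

    def forward(self, x):
        # Identical to clamp(x, 0, alpha), but differentiable w.r.t. alpha
        return torch.relu(x) - torch.relu(x - self.alpha)
```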
-        {
-            "location": "/algo_quantization/index.html#wrpn", 
-            "text": "(As proposed in  WRPN: Wide Reduced-Precision Networks )    In this method, activations are clipped to  [0, 1]  and quantized as follows ( k  is the number of bits used for quantization):   x_q = \\frac{1}{2^k-1} round \\left( \\left(2^k - 1 \\right) x_f \\right)   Weights are clipped to  [-1, 1]  and quantized as follows:   w_q = \\frac{1}{2^{k-1}-1} round \\left( \\left(2^{k-1} - 1 \\right)w_f \\right)   Note that  k-1  bits are used to quantize weights, leaving one bit for sign.  This method requires training the model with quantization-aware training, as discussed  here . Use the  WRPNQuantizer  class to transform an existing model to a model suitable for training with quantization using WRPN.", 
-            "title": "WRPN"
-        }, 
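The WRPN quantizers are the same round-to-grid idea, with k-1 bits reserved for weight magnitudes. A sketch (training-time gradient handling omitted):

```python
import torch

def wrpn_activations(x_f, k):
    n = 2 ** k - 1
    return torch.round(n * x_f.clamp(0, 1)) / n

def wrpn_weights(w_f, k):
    # k-1 bits quantize the magnitude; the remaining bit carries the sign
    n = 2 ** (k - 1) - 1
    return torch.round(n * w_f.clamp(-1, 1)) / n
```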
-        {
-            "location": "/algo_quantization/index.html#notes_1", 
-            "text": "The paper proposed widening of layers as a means to reduce accuracy loss. This isn't implemented as part of  WRPNQuantizer  at the moment. To experiment with this, modify your model implementation to have wider layers.  The paper defines special handling for binary weights which isn't supported in Distiller yet.", 
-            "title": "Notes:"
-        }, 
-        {
-            "location": "/algo_earlyexit/index.html", 
-            "text": "Early Exit Inference\n\n\nWhile Deep Neural Networks benefit from a large number of layers, it's often the case that many data points in classification tasks can be classified accurately with much less work. There have been several studies recently regarding the idea of exiting before the normal endpoint of the neural network. Panda et al in \nConditional Deep Learning for Energy-Efficient and Enhanced Pattern Recognition\n points out that a lot of data points can be classified easily and require less processing than some more difficult points and they view this in terms of power savings. Surat et al in \nBranchyNet: Fast Inference via Early Exiting from Deep Neural Networks\n look at a selective approach to exit placement and criteria for exiting early.\n\n\nWhy Does Early Exit Work?\n\n\nEarly Exit is a strategy with a straightforward and easy to understand concept Figure #fig(boundaries) shows a simple example in a 2-D feature space. While deep networks can represent more complex and expressive boundaries between classes (assuming we\u2019re confident of avoiding over-fitting the data), it\u2019s also clear that much of the data can be properly classified with even the simplest of classification boundaries.\n\n\n\n\nData points far from the boundary can be considered \"easy to classify\" and achieve a high degree of confidence quicker than do data points close to the boundary. In fact, we can think of the area between the outer straight lines as being the region that is \"difficult to classify\" and require the full expressiveness of the neural network to accurately classify it.\n\n\nExample code for Early Exit\n\n\nBoth CIFAR10 and ImageNet code comes directly from publicly available examples from PyTorch. The only edits are the exits that are inserted in a methodology similar to BranchyNet work.\n\n\nNote:\n the sample code provided for ResNet models with Early Exits has exactly one early exit for the CIFAR10 example and exactly two early exits for the ImageNet example. If you want to modify the number of early exits, you will need to make sure that the model code is updated to have a corresponding number of exits.\nDeeper networks can benefit from multiple exits. Our examples illustrate both a single and a pair of early exits for CIFAR10 and ImageNet, respectively.\n\n\nNote that this code does not actually take exits. What it does is to compute statistics of loss and accuracy assuming exits were taken when criteria are met. Actually implementing exits can be tricky and architecture dependent and we plan to address these issues.\n\n\nExample command lines\n\n\nWe have provided examples for ResNets of varying sizes for both CIFAR10 and ImageNet datasets. An example command line for training for CIFAR10 is:\n\n\npython compress_classifier.py --arch=resnet32_cifar_earlyexit --epochs=20 -b 128 \\\n    --lr=0.003 --earlyexit_thresholds 0.4 --earlyexit_lossweights 0.4 -j 30 \\\n    --out-dir /home/ -n earlyexit /home/pcifar10\n\n\n\n\nAnd an example command line for ImageNet is:\n\n\npython compress_classifier.py --arch=resnet50_earlyexit --epochs=120 -b 128 \\\n    --lr=0.003 --earlyexit_thresholds 1.2 0.9 --earlyexit_lossweights 0.1 0.3 \\\n    -j 30 --out-dir /home/ -n earlyexit /home/I1K/i1k-extracted/\n\n\n\n\nHeuristics\n\n\nThe insertion of the exits are ad-hoc, but there are some heuristic principals guiding their placement and parameters. 
The earlier exits are placed, the more aggressive the exit, as it essentially prunes the rest of the network at a very early stage, thus saving a lot of work. However, a diminishing percentage of data will be directed through the exit if we are to preserve accuracy.\n\n\nThere are other benefits to adding exits in that training the modified network now has back-propagation losses coming from the exits that affect the earlier layers more substantially than the last exit. This effect mitigates problems such as vanishing gradient.\n\n\nEarly Exit Hyper-Parameters\n\n\nThere are two parameters that are required to enable early exit. Leave them undefined if you are not enabling Early Exit:\n\n\n\n\n--earlyexit_thresholds\n defines the thresholds for each of the early exits. The cross entropy measure must be \nless than\n the specified threshold to take a specific exit, otherwise the data continues along the regular path. For example, you could specify \"--earlyexit_thresholds 0.9 1.2\" and this implies two early exits with corresponding thresholds of 0.9 and 1.2, respectively, to take those exits.\n\n\n\n\n\n--earlyexit_lossweights\n provides the weights for the linear combination of losses during training to compute a single, overall loss. We only specify weights for the early exits and assume that the sum of the weights (including the final exit) is equal to 1.0. So an example of \"--earlyexit_lossweights 0.2 0.3\" implies two early exits weighted with values of 0.2 and 0.3, respectively, and that the final exit has a weight of 1.0-(0.2+0.3) = 0.5. Studies have shown that weighting the early exits more heavily will create more aggressive early exits, but perhaps with a slight negative effect on accuracy.\n\n\nOutput Stats\n\n\nThe example code outputs various statistics regarding the loss and accuracy at each of the exits. During training, the Top1 and Top5 stats represent the accuracy should all of the data be forced out of that exit (in order to compute the loss at that exit). During inference (i.e. validation and test stages), the Top1 and Top5 stats represent the accuracy for those data points that could exit because the calculated entropy at that exit was lower than the specified threshold for that exit.\n\n\nCIFAR10\n\n\nIn the case of CIFAR10, we have inserted a single exit after the first full layer grouping. The exit path itself includes a convolutional layer and a fully connected layer. If you move the exit, be sure to match the proper sizes for inputs and outputs to the exit layers.\n\n\nImageNet\n\n\nThis supports training and inference of the ImageNet dataset via several well-known deep architectures. ResNet-50 is the architecture of interest in this study; however, the exit is defined in the generic ResNet code and could be used with ResNets of other sizes. There are two exits inserted in this example. Again, exit layers must have their sizes match properly.\n\n\nReferences\n\n\n\n\n\nPriyadarshini Panda, Abhronil Sengupta, Kaushik Roy\n.\n    \nConditional Deep Learning for Energy-Efficient and Enhanced Pattern Recognition\n, arXiv:1509.08971v6, 2017.\n\n\n\n\n\nSurat Teerapittayanon, Bradley McDanel, H. T. Kung\n.\n    \nBranchyNet: Fast Inference via Early Exiting from Deep Neural Networks\n, arXiv:1709.01686, 2017.", 
-            "title": "Early Exit"
-        }, 
-        {
-            "location": "/algo_earlyexit/index.html#early-exit-inference", 
-            "text": "While Deep Neural Networks benefit from a large number of layers, it's often the case that many data points in classification tasks can be classified accurately with much less work. There have been several studies recently regarding the idea of exiting before the normal endpoint of the neural network. Panda et al in  Conditional Deep Learning for Energy-Efficient and Enhanced Pattern Recognition  points out that a lot of data points can be classified easily and require less processing than some more difficult points and they view this in terms of power savings. Surat et al in  BranchyNet: Fast Inference via Early Exiting from Deep Neural Networks  look at a selective approach to exit placement and criteria for exiting early.", 
-            "title": "Early Exit Inference"
-        }, 
-        {
-            "location": "/algo_earlyexit/index.html#why-does-early-exit-work", 
-            "text": "Early Exit is a strategy with a straightforward and easy to understand concept Figure #fig(boundaries) shows a simple example in a 2-D feature space. While deep networks can represent more complex and expressive boundaries between classes (assuming we\u2019re confident of avoiding over-fitting the data), it\u2019s also clear that much of the data can be properly classified with even the simplest of classification boundaries.   Data points far from the boundary can be considered \"easy to classify\" and achieve a high degree of confidence quicker than do data points close to the boundary. In fact, we can think of the area between the outer straight lines as being the region that is \"difficult to classify\" and require the full expressiveness of the neural network to accurately classify it.", 
-            "title": "Why Does Early Exit Work?"
-        }, 
-        {
-            "location": "/algo_earlyexit/index.html#example-code-for-early-exit", 
-            "text": "Both CIFAR10 and ImageNet code comes directly from publicly available examples from PyTorch. The only edits are the exits that are inserted in a methodology similar to BranchyNet work.  Note:  the sample code provided for ResNet models with Early Exits has exactly one early exit for the CIFAR10 example and exactly two early exits for the ImageNet example. If you want to modify the number of early exits, you will need to make sure that the model code is updated to have a corresponding number of exits.\nDeeper networks can benefit from multiple exits. Our examples illustrate both a single and a pair of early exits for CIFAR10 and ImageNet, respectively.  Note that this code does not actually take exits. What it does is to compute statistics of loss and accuracy assuming exits were taken when criteria are met. Actually implementing exits can be tricky and architecture dependent and we plan to address these issues.", 
-            "title": "Example code for Early Exit"
-        }, 
-        {
-            "location": "/algo_earlyexit/index.html#example-command-lines", 
-            "text": "We have provided examples for ResNets of varying sizes for both CIFAR10 and ImageNet datasets. An example command line for training for CIFAR10 is:  python compress_classifier.py --arch=resnet32_cifar_earlyexit --epochs=20 -b 128 \\\n    --lr=0.003 --earlyexit_thresholds 0.4 --earlyexit_lossweights 0.4 -j 30 \\\n    --out-dir /home/ -n earlyexit /home/pcifar10  And an example command line for ImageNet is:  python compress_classifier.py --arch=resnet50_earlyexit --epochs=120 -b 128 \\\n    --lr=0.003 --earlyexit_thresholds 1.2 0.9 --earlyexit_lossweights 0.1 0.3 \\\n    -j 30 --out-dir /home/ -n earlyexit /home/I1K/i1k-extracted/", 
-            "title": "Example command lines"
-        }, 
-        {
-            "location": "/algo_earlyexit/index.html#heuristics", 
-            "text": "The insertion of the exits are ad-hoc, but there are some heuristic principals guiding their placement and parameters. The earlier exits are placed, the more aggressive the exit as it essentially prunes the rest of the network at a very early stage, thus saving a lot of work. However, a diminishing percentage of data will be directed through the exit if we are to preserve accuracy.  There are other benefits to adding exits in that training the modified network now has back-propagation losses coming from the exits that affect the earlier layers more substantially than the last exit. This effect mitigates problems such as vanishing gradient.", 
-            "title": "Heuristics"
-        }, 
-        {
-            "location": "/algo_earlyexit/index.html#early-exit-hyper-parameters", 
-            "text": "There are two parameters that are required to enable early exit. Leave them undefined if you are not enabling Early Exit:   --earlyexit_thresholds  defines the thresholds for each of the early exits. The cross entropy measure must be  less than  the specified threshold to take a specific exit, otherwise the data continues along the regular path. For example, you could specify \"--earlyexit_thresholds 0.9 1.2\" and this implies two early exits with corresponding thresholds of 0.9 and 1.2, respectively to take those exits.   12  --earlyexit_lossweights  provide the weights for the linear combination of losses during training to compute a single, overall loss. We only specify weights for the early exits and assume that the sum of the weights (including final exit) are equal to 1.0. So an example of \"--earlyexit_lossweights 0.2 0.3\" implies two early exits weighted with values of 0.2 and 0.3, respectively and that the final exit has a value of 1.0-(0.2+0.3) = 0.5. Studies have shown that weighting the early exits more heavily will create more agressive early exits, but perhaps with a slight negative effect on accuracy.", 
-            "title": "Early Exit Hyper-Parameters"
-        }, 
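To illustrate the semantics of the two flags, here is a hypothetical sketch of how the loss weighting and the exit criterion could be computed. The function names are ours, and the example code's exact bookkeeping may differ:

```python
import torch.nn.functional as F

def overall_loss(exit_outputs, target, lossweights):
    # exit_outputs: logits of each early exit, with the final exit last;
    # lossweights mirrors --earlyexit_lossweights, so the final exit
    # implicitly gets a weight of 1.0 - sum(lossweights)
    losses = [F.cross_entropy(out, target) for out in exit_outputs]
    final_w = 1.0 - sum(lossweights)
    return sum(w * l for w, l in zip(lossweights, losses[:-1])) + final_w * losses[-1]

def takes_exit(output, threshold):
    # Mirrors --earlyexit_thresholds: exit when the per-sample entropy
    # at this exit falls below the exit's threshold
    p = F.softmax(output, dim=1)
    entropy = -(p * p.clamp(min=1e-12).log()).sum(dim=1)
    return entropy < threshold
```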
-        {
-            "location": "/algo_earlyexit/index.html#output-stats", 
-            "text": "The example code outputs various statistics regarding the loss and accuracy at each of the exits. During training, the Top1 and Top5 stats represent the accuracy should all of the data be forced out that exit (in order to compute the loss at that exit). During inference (i.e. validation and test stages), the Top1 and Top5 stats represent the accuracy for those data points that could exit because the calculated entropy at that exit was lower than the specified threshold for that exit.", 
-            "title": "Output Stats"
-        }, 
-        {
-            "location": "/algo_earlyexit/index.html#cifar10", 
-            "text": "In the case of CIFAR10, we have inserted a single exit after the first full layer grouping. The layers on the exit path itself includes a convolutional layer and a fully connected layer. If you move the exit, be sure to match the proper sizes for inputs and outputs to the exit layers.", 
-            "title": "CIFAR10"
-        }, 
-        {
-            "location": "/algo_earlyexit/index.html#imagenet", 
-            "text": "This supports training and inference of the ImageNet dataset via several well known deep architectures. ResNet-50 is the architecture of interest in this study, however the exit is defined in the generic ResNet code and could be used with other size ResNets. There are two exits inserted in this example. Again, exit layers must have their sizes match properly.", 
-            "title": "ImageNet"
-        }, 
-        {
-            "location": "/algo_earlyexit/index.html#references", 
-            "text": "Priyadarshini Panda, Abhronil Sengupta, Kaushik Roy .\n     Conditional Deep Learning for Energy-Efficient and Enhanced Pattern Recognition , arXiv:1509.08971v6, 2017.   Surat Teerapittayanon, Bradley McDanel, H. T. Kung .\n     BranchyNet: Fast Inference via Early Exiting from Deep Neural Networks , arXiv:1709.01686, 2017.", 
-            "title": "References"
-        }, 
-        {
-            "location": "/model_zoo/index.html", 
-            "text": "Distiller Model Zoo\n\n\nHow to contribute models to the Model Zoo\n\n\nWe encourage you to contribute new models to the Model Zoo.  We welcome implementations of published papers or of your own work.  To assure that models and algorithms shared with others are high-quality, please commit your models with the following:\n\n\n\n\nCommand-line arguments\n\n\nLog files\n\n\nPyTorch model\n\n\n\n\nContents\n\n\nThe Distiller model zoo is not a \"traditional\" model-zoo, because it does not necessarily contain best-in-class compressed models.  Instead, the model-zoo contains a number of deep learning models that have been compressed using Distiller following some well-known research papers.  These are meant to serve as examples of how Distiller can be used.\n\n\nEach model contains a Distiller schedule detailing how the model was compressed, a PyTorch checkpoint, text logs and TensorBoard logs.\n\n\n\n\ntable, th, td {\n    border: 1px solid black;\n}\n\n\n\n\n  \n\n    \nPaper\n\n    \nDataset\n\n    \nNetwork\n\n    \nMethod \n Granularity\n\n    \nSchedule\n\n    \nFeatures\n\n  \n\n  \n\n    \nLearning both Weights and Connections for Efficient Neural Networks\n\n    \nImageNet\n\n    \nAlexnet\n\n    \nElement-wise pruning\n\n    \nIterative; Manual\n\n    \nMagnitude thresholding based on a sensitivity quantifier.\nElement-wise sparsity sensitivity analysis\n\n  \n\n  \n\n    \nTo prune, or not to prune: exploring the efficacy of pruning for model compression\n\n    \nImageNet\n\n    \nMobileNet\n\n    \nElement-wise pruning\n\n    \nAutomated gradual; Iterative\n\n    \nMagnitude thresholding based on target level\n\n  \n\n  \n\n    \nLearning Structured Sparsity in Deep Neural Networks\n\n    \nCIFAR10\n\n    \nResNet20\n\n    \nGroup regularization\n\n    \n1.Train with group-lasso\n2.Remove zero groups and fine-tune\n\n    \nGroup Lasso regularization. Groups: kernels (2D), channels, filters (3D), layers (4D), vectors (rows, cols)\n\n  \n\n  \n\n    \nPruning Filters for Efficient ConvNets\n\n    \nCIFAR10\n\n    \nResNet56\n\n    \nFilter ranking; guided by sensitivity analysis\n\n    \n1.Rank filters\n2. Remove filters and channels\n3.Fine-tune\n\n    \nOne-shot ranking and pruning of filters; with network thinning\n  \n\n\n\n\nLearning both Weights and Connections for Efficient Neural Networks\n\n\nThis schedule is an example of \"Iterative Pruning\" for Alexnet/Imagent, as described in chapter 3 of Song Han's PhD dissertation: \nEfficient Methods and Hardware for Deep Learning\n and in his paper \nLearning both Weights and Connections for Efficient Neural Networks\n.  \n\n\nThe Distiller schedule uses SensitivityPruner which is similar to MagnitudeParameterPruner, but instead of specifying \"raw\" thresholds, it uses a \"sensitivity parameter\".  Song Han's paper says that \"the pruning threshold is chosen as a quality parameter multiplied by the standard deviation of a layers weights,\" and this is not explained much further.  In Distiller, the \"quality parameter\" is referred to as \"sensitivity\" and\nis based on the values learned from performing sensitivity analysis.  Using a parameter that is related to the standard deviation is very helpful: under the assumption that the weights tensors are distributed normally, the standard deviation acts as a threshold normalizer.\n\n\nNote that Distiller's implementation deviates slightly from the algorithm Song Han describes in his PhD dissertation, in that the threshold value is set only once.  
In his PhD dissertation, Song Han describes a growing threshold, at each iteration.  This requires n+1 hyper-parameters (n being the number of pruning iterations we use): the threshold and the threshold increase (delta) at each pruning iteration.  Distiller's implementation takes advantage of the fact that as pruning progresses, more weights are pulled toward zero, and therefore the threshold \"traps\" more weights.  Thus, we can use less hyper-parameters and achieve the same results.\n\n\n\n\nDistiller schedule: \ndistiller/examples/sensitivity-pruning/alexnet.schedule_sensitivity.yaml\n\n\nCheckpoint file: \nalexnet.checkpoint.89.pth.tar\n\n\n\n\nResults\n\n\nOur reference is TorchVision's pretrained Alexnet model which has a Top1 accuracy of 56.55 and Top5=79.09.  We prune away 88.44% of the parameters and achieve  Top1=56.61 and Top5=79.45.\nSong Han prunes 89% of the parameters, which is slightly better than our results.\n\n\nParameters:\n+----+---------------------------+------------------+---------------+----------------+------------+------------+----------+----------+----------+------------+---------+----------+------------+\n|    | Name                      | Shape            |   NNZ (dense) |   NNZ (sparse) |   Cols (%) |   Rows (%) |   Ch (%) |   2D (%) |   3D (%) |   Fine (%) |     Std |     Mean |   Abs-Mean\n|----+---------------------------+------------------+---------------+----------------+------------+------------+----------+----------+----------+------------+---------+----------+------------|\n|  0 | features.module.0.weight  | (64, 3, 11, 11)  |         23232 |          13411 |    0.00000 |    0.00000 |  0.00000 |  0.00000 |  0.00000 |   42.27359 | 0.14391 | -0.00002 |    0.08805 |\n|  1 | features.module.3.weight  | (192, 64, 5, 5)  |        307200 |         115560 |    0.00000 |    0.00000 |  0.00000 |  1.91243 |  0.00000 |   62.38281 | 0.04703 | -0.00250 |    0.02289 |\n|  2 | features.module.6.weight  | (384, 192, 3, 3) |        663552 |         256565 |    0.00000 |    0.00000 |  0.00000 |  6.18490 |  0.00000 |   61.33445 | 0.03354 | -0.00184 |    0.01803 |\n|  3 | features.module.8.weight  | (256, 384, 3, 3) |        884736 |         315065 |    0.00000 |    0.00000 |  0.00000 |  6.96411 |  0.00000 |   64.38881 | 0.02646 | -0.00168 |    0.01422 |\n|  4 | features.module.10.weight | (256, 256, 3, 3) |        589824 |         186938 |    0.00000 |    0.00000 |  0.00000 | 15.49225 |  0.00000 |   68.30614 | 0.02714 | -0.00246 |    0.01409 |\n|  5 | classifier.1.weight       | (4096, 9216)     |      37748736 |        3398881 |    0.00000 |    0.21973 |  0.00000 |  0.21973 |  0.00000 |   90.99604 | 0.00589 | -0.00020 |    0.00168 |\n|  6 | classifier.4.weight       | (4096, 4096)     |      16777216 |        1782769 |    0.21973 |    3.46680 |  0.00000 |  3.46680 |  0.00000 |   89.37387 | 0.00849 | -0.00066 |    0.00263 |\n|  7 | classifier.6.weight       | (1000, 4096)     |       4096000 |         994738 |    3.36914 |    0.00000 |  0.00000 |  0.00000 |  0.00000 |   75.71440 | 0.01718 |  0.00030 |    0.00778 |\n|  8 | Total sparsity:           | -                |      61090496 |        7063928 |    0.00000 |    0.00000 |  0.00000 |  0.00000 |  0.00000 |   88.43694 | 0.00000 |  0.00000 |    0.00000 |\n+----+---------------------------+------------------+---------------+----------------+------------+------------+----------+----------+----------+------------+---------+----------+------------+\n 2018-04-04 21:30:52,499 - Total sparsity: 88.44\n\n 2018-04-04 21:30:52,499 
- --- validate (epoch=89)-----------\n 2018-04-04 21:30:52,499 - 128116 samples (256 per mini-batch)\n 2018-04-04 21:31:35,357 - ==\n Top1: 51.838    Top5: 74.817    Loss: 2.150\n\n 2018-04-04 21:31:39,251 - --- test ---------------------\n 2018-04-04 21:31:39,252 - 50000 samples (256 per mini-batch)\n 2018-04-04 21:32:01,274 - ==\n Top1: 56.606    Top5: 79.446    Loss: 1.893\n\n\n\n\nTo prune, or not to prune: exploring the efficacy of pruning for model compression\n\n\nIn their paper, Zhu and Gupta \"compare the accuracy of large, but pruned models (large-sparse) and their\nsmaller, but dense (small-dense) counterparts with identical memory footprint.\"\nThey also \"propose a new gradual pruning technique that is simple and straightforward to apply across a variety of models/datasets with\nminimal tuning.\"\n\n\nThis pruning schedule is implemented by distiller.AutomatedGradualPruner, which increases the sparsity level (expressed as a percentage of zero-valued elements) gradually over several pruning steps.  Distiller's implementation only prunes elements once in an epoch (the model is fine-tuned in between pruning events), which is a small deviation from Zhu and Gupta's paper.  The research paper specifies the schedule in terms of mini-batches, while our implementation specifies the schedule in terms of epochs.  We feel that using epochs performs well, and is more \"stable\", since the number of mini-batches will change if you change the batch size.\n\n\nImageNet files:\n\n\n\n\nDistiller schedule: \ndistiller/examples/agp-pruning/mobilenet.imagenet.schedule_agp.yaml\n\n\nCheckpoint file: \ncheckpoint.pth.tar\n\n\n\n\nResNet18 files:\n\n\n\n\nDistiller schedule: \ndistiller/examples/agp-pruning/resnet18.schedule_agp.yaml\n\n\nCheckpoint file: \ncheckpoint.pth.tar\n\n\n\n\nResults\n\n\nAs our baseline we used a \npretrained PyTorch MobileNet model\n (width=1) which has Top1=68.848 and Top5=88.740.\n\nIn their paper, Zhu and Gupta prune 50% of the elements of MobileNet (width=1) with a 1.1% drop in accuracy.  We pruned about 51.6% of the elements, with virtually no change in the accuracies (Top1: 68.808 and Top5: 88.656).  We didn't try to prune more than this, but we do note that the baseline accuracy that we used is almost 2% lower than the accuracy published in the paper.  
\n\n\n+----+--------------------------+--------------------+---------------+----------------+------------+------------+----------+----------+----------+------------+---------+----------+------------+\n|    | Name                     | Shape              |   NNZ (dense) |   NNZ (sparse) |   Cols (%) |   Rows (%) |   Ch (%) |   2D (%) |   3D (%) |   Fine (%) |     Std |     Mean |   Abs-Mean |\n|----+--------------------------+--------------------+---------------+----------------+------------+------------+----------+----------+----------+------------+---------+----------+------------|\n|  0 | module.model.0.0.weight  | (32, 3, 3, 3)      |           864 |            864 |    0.00000 |    0.00000 |  0.00000 |  0.00000 |  0.00000 |    0.00000 | 0.14466 |  0.00103 |    0.06508 |\n|  1 | module.model.1.0.weight  | (32, 1, 3, 3)      |           288 |            288 |    0.00000 |    0.00000 |  0.00000 |  0.00000 |  0.00000 |    0.00000 | 0.32146 |  0.01020 |    0.12932 |\n|  2 | module.model.1.3.weight  | (64, 32, 1, 1)     |          2048 |           2048 |    0.00000 |    0.00000 |  0.00000 |  0.00000 |  0.00000 |    0.00000 | 0.11942 |  0.00024 |    0.03627 |\n|  3 | module.model.2.0.weight  | (64, 1, 3, 3)      |           576 |            576 |    0.00000 |    0.00000 |  0.00000 |  0.00000 |  0.00000 |    0.00000 | 0.15809 |  0.00543 |    0.11513 |\n|  4 | module.model.2.3.weight  | (128, 64, 1, 1)    |          8192 |           8192 |    0.00000 |    0.00000 |  0.00000 |  0.00000 |  0.00000 |    0.00000 | 0.08442 | -0.00031 |    0.04182 |\n|  5 | module.model.3.0.weight  | (128, 1, 3, 3)     |          1152 |           1152 |    0.00000 |    0.00000 |  0.00000 |  0.00000 |  0.00000 |    0.00000 | 0.16780 |  0.00125 |    0.10545 |\n|  6 | module.model.3.3.weight  | (128, 128, 1, 1)   |         16384 |          16384 |    0.00000 |    0.00000 |  0.00000 |  0.00000 |  0.00000 |    0.00000 | 0.07126 | -0.00197 |    0.04123 |\n|  7 | module.model.4.0.weight  | (128, 1, 3, 3)     |          1152 |           1152 |    0.00000 |    0.00000 |  0.00000 |  0.00000 |  0.00000 |    0.00000 | 0.10182 |  0.00171 |    0.08719 |\n|  8 | module.model.4.3.weight  | (256, 128, 1, 1)   |         32768 |          13108 |    0.00000 |    0.00000 | 10.15625 | 59.99756 | 12.50000 |   59.99756 | 0.05543 | -0.00002 |    0.02760 |\n|  9 | module.model.5.0.weight  | (256, 1, 3, 3)     |          2304 |           2304 |    0.00000 |    0.00000 |  0.00000 |  0.00000 |  0.00000 |    0.00000 | 0.12516 | -0.00288 |    0.08058 |\n| 10 | module.model.5.3.weight  | (256, 256, 1, 1)   |         65536 |          26215 |    0.00000 |    0.00000 | 12.50000 | 59.99908 | 23.82812 |   59.99908 | 0.04453 |  0.00002 |    0.02271 |\n| 11 | module.model.6.0.weight  | (256, 1, 3, 3)     |          2304 |           2304 |    0.00000 |    0.00000 |  0.00000 |  0.00000 |  0.00000 |    0.00000 | 0.08024 |  0.00252 |    0.06377 |\n| 12 | module.model.6.3.weight  | (512, 256, 1, 1)   |        131072 |          52429 |    0.00000 |    0.00000 | 23.82812 | 59.99985 | 14.25781 |   59.99985 | 0.03561 | -0.00057 |    0.01779 |\n| 13 | module.model.7.0.weight  | (512, 1, 3, 3)     |          4608 |           4608 |    0.00000 |    0.00000 |  0.00000 |  0.00000 |  0.00000 |    0.00000 | 0.11008 | -0.00018 |    0.06829 |\n| 14 | module.model.7.3.weight  | (512, 512, 1, 1)   |        262144 |         104858 |    0.00000 |    0.00000 | 14.25781 | 59.99985 | 21.28906 |   59.99985 | 0.02944 | -0.00060 |    0.01515 |\n| 15 | module.model.8.0.weight  | 
(512, 1, 3, 3)     |          4608 |           4608 |    0.00000 |    0.00000 |  0.00000 |  0.00000 |  0.00000 |    0.00000 | 0.08258 |  0.00370 |    0.04905 |\n| 16 | module.model.8.3.weight  | (512, 512, 1, 1)   |        262144 |         104858 |    0.00000 |    0.00000 | 21.28906 | 59.99985 | 28.51562 |   59.99985 | 0.02865 | -0.00046 |    0.01465 |\n| 17 | module.model.9.0.weight  | (512, 1, 3, 3)     |          4608 |           4608 |    0.00000 |    0.00000 |  0.00000 |  0.00000 |  0.00000 |    0.00000 | 0.07578 |  0.00468 |    0.04201 |\n| 18 | module.model.9.3.weight  | (512, 512, 1, 1)   |        262144 |         104858 |    0.00000 |    0.00000 | 28.51562 | 59.99985 | 23.43750 |   59.99985 | 0.02939 | -0.00044 |    0.01511 |\n| 19 | module.model.10.0.weight | (512, 1, 3, 3)     |          4608 |           4608 |    0.00000 |    0.00000 |  0.00000 |  0.00000 |  0.00000 |    0.00000 | 0.07091 |  0.00014 |    0.04306 |\n| 20 | module.model.10.3.weight | (512, 512, 1, 1)   |        262144 |         104858 |    0.00000 |    0.00000 | 24.60938 | 59.99985 | 20.89844 |   59.99985 | 0.03095 | -0.00059 |    0.01672 |\n| 21 | module.model.11.0.weight | (512, 1, 3, 3)     |          4608 |           4608 |    0.00000 |    0.00000 |  0.00000 |  0.00000 |  0.00000 |    0.00000 | 0.05729 | -0.00518 |    0.04267 |\n| 22 | module.model.11.3.weight | (512, 512, 1, 1)   |        262144 |         104858 |    0.00000 |    0.00000 | 20.89844 | 59.99985 | 17.57812 |   59.99985 | 0.03229 | -0.00044 |    0.01797 |\n| 23 | module.model.12.0.weight | (512, 1, 3, 3)     |          4608 |           4608 |    0.00000 |    0.00000 |  0.00000 |  0.00000 |  0.00000 |    0.00000 | 0.04981 | -0.00136 |    0.03967 |\n| 24 | module.model.12.3.weight | (1024, 512, 1, 1)  |        524288 |         209716 |    0.00000 |    0.00000 | 16.01562 | 59.99985 | 44.23828 |   59.99985 | 0.02514 | -0.00106 |    0.01278 |\n| 25 | module.model.13.0.weight | (1024, 1, 3, 3)    |          9216 |           9216 |    0.00000 |    0.00000 |  0.00000 |  0.00000 |  0.00000 |    0.00000 | 0.02396 | -0.00949 |    0.01549 |\n| 26 | module.model.13.3.weight | (1024, 1024, 1, 1) |       1048576 |         419431 |    0.00000 |    0.00000 | 44.72656 | 59.99994 |  1.46484 |   59.99994 | 0.01801 | -0.00017 |    0.00931 |\n| 27 | module.fc.weight         | (1000, 1024)       |       1024000 |         409600 |    1.46484 |    0.00000 |  0.00000 |  0.00000 |  0.00000 |   60.00000 | 0.05078 |  0.00271 |    0.02734 |\n| 28 | Total sparsity:          | -                  |       4209088 |        1726917 |    0.00000 |    0.00000 |  0.00000 |  0.00000 |  0.00000 |   58.97171 | 0.00000 |  0.00000 |    0.00000 |\n+----+--------------------------+--------------------+---------------+----------------+------------+------------+----------+----------+----------+------------+---------+----------+------------+\nTotal sparsity: 58.97\n\n--- validate (epoch=199)-----------\n128116 samples (256 per mini-batch)\n==\n Top1: 65.337    Top5: 84.984    Loss: 1.494\n\n--- test ---------------------\n50000 samples (256 per mini-batch)\n==\n Top1: 68.810    Top5: 88.626    Loss: 1.282\n\n\n\n\n\nLearning Structured Sparsity in Deep Neural Networks\n\n\nThis research paper from the University of Pittsburgh, \"proposes a Structured Sparsity Learning (SSL) method to regularize the structures (i.e., filters, channels, filter shapes, and layer depth) of DNNs. 
SSL can: (1) learn a compact structure from a bigger DNN to reduce computation cost; (2) obtain a hardware-friendly structured sparsity of DNN to efficiently accelerate the DNN\u2019s evaluation.\"\n\n\nNote that this paper does not use pruning, but instead uses group regularization during the training to force weights towards zero, as a group.  We used a schedule which thresholds the regularized elements at a magnitude equal to the regularization strength.  At the end of the regularization phase, we save the final sparsity masks generated by the regularization, and exit.  Then we load this regularized model and remove the layers corresponding to the zeroed weight tensors (tensors in which all of the elements have a zero value).    \n\n\nBaseline training\n\n\nWe started by training the baseline ResNet20-Cifar dense network since we didn't have a pre-trained model.\n\n\n\n\nDistiller schedule: \ndistiller/examples/ssl/resnet20_cifar_baseline_training.yaml\n\n\nCheckpoint files: \ndistiller/examples/ssl/checkpoints/\n\n\n\n\n$ time python3 compress_classifier.py --arch resnet20_cifar  ../data.cifar10 -p=50 --lr=0.3 --epochs=180 --compress=../cifar10/resnet20/baseline_training.yaml -j=1 --deterministic\n\n\n\n\nRegularization\n\n\nThen we started training from scratch again, but this time we used Group Lasso regularization on entire layers:\n\nDistiller schedule: \ndistiller/examples/ssl/ssl_4D-removal_4L_training.yaml\n\n\n$ time python3 compress_classifier.py --arch resnet20_cifar  ../data.cifar10 -p=50 --lr=0.4 --epochs=180 --compress=../ssl/ssl_4D-removal_training.yaml -j=1 --deterministic\n\n\n\n\nThe diagram below shows the training of Resnet20/CIFAR10 using Group Lasso regularization on entire layers (in blue) vs. training Resnet20/CIFAR10  baseline (in red).  You may notice several interesting things:\n1. The LR-decay policy is the same, but the two sessions start with different initial LR values.\n2. The data-loss of the regularized training follows the same shape as the un-regularized training (baseline), and eventually the two seem to merge.\n3. We see similar behavior in the validation Top1 and Top5 accuracy results, but the regularized training eventually performs better.\n4. In the top right corner we see the behavior of the regularization loss (\nReg Loss\n), which actually increases for some time, until the data-loss has a sharp drop (after ~16K mini-batches), at which point the regularization loss also starts dropping.\n\n\n\nThis \nregularization\n yields 5 layers with zeroed weight tensors.  We load this model, remove the 5 layers, and start the fine-tuning of the weights.  This process of layer removal is specific to ResNet for CIFAR, which we altered by adding code to skip over layers during the forward path.  When you export to ONNX, the removed layers do not participate in the forward path, so they are not instantiated.  \n\n\nWe managed to remove 5 of the 16 3x3 convolution layers which dominate the computation time.  
It's not bad, but we probably could have done better.\n\n\nFine-tuning\n\n\nDuring the \nfine-tuning\n process, because the removed layers do not participate in the forward path, they do not appear in the backward path and are not backpropagated; therefore they are completely disconnected from the network.\n\nWe copy the checkpoint file of the regularized model to \ncheckpoint_trained_4D_regularized_5Lremoved.pth.tar\n.\n\nDistiller schedule: \ndistiller/examples/ssl/ssl_4D-removal_finetuning.yaml\n\n\n$ time python3 compress_classifier.py --arch resnet20_cifar  ../data.cifar10 -p=50 --lr=0.1 --epochs=250 --resume=../cifar10/resnet20/checkpoint_trained_4D_regularized_5Lremoved.pth.tar --compress=../ssl/ssl_4D-removal_finetuning.yaml  -j=1 --deterministic\n\n\n\n\nResults\n\n\nOur baseline results for ResNet20 Cifar are: Top1=91.450 and  Top5=99.750\n\n\nWe used Distiller's GroupLassoRegularizer to remove 5 layers from Resnet20 (CIFAR10) with no degradation of the accuracies.\n\nThe regularized model exhibits really poor classification abilities: \n\n\n$ time python3 compress_classifier.py --arch resnet20_cifar  ../data.cifar10 -p=50 --resume=../cifar10/resnet20/checkpoint_trained_4D_regularized_5Lremoved.pth.tar --evaluate\n\n=\n loading checkpoint ../cifar10/resnet20/checkpoint_trained_4D_regularized_5Lremoved.pth.tar\n   best top@1: 90.620\nLoaded compression schedule from checkpoint (epoch 179)\nRemoving layer: module.layer1.0.conv1 [layer=0 block=0 conv=0]\nRemoving layer: module.layer1.0.conv2 [layer=0 block=0 conv=1]\nRemoving layer: module.layer1.1.conv1 [layer=0 block=1 conv=0]\nRemoving layer: module.layer1.1.conv2 [layer=0 block=1 conv=1]\nRemoving layer: module.layer2.2.conv2 [layer=1 block=2 conv=1]\nFiles already downloaded and verified\nFiles already downloaded and verified\nDataset sizes:\n        training=45000\n        validation=5000\n        test=10000\n--- test ---------------------\n10000 samples (256 per mini-batch)\n==\n Top1: 22.290    Top5: 68.940    Loss: 5.172\n\n\n\n\nHowever, after fine-tuning, we recovered most of the accuracy loss, but not quite all of it: Top1=91.020 and Top5=99.670\n\n\nWe didn't spend time trying to wrestle with this network, and therefore didn't achieve SSL's published results (which showed that they managed to remove 6 layers and at the same time increase accuracies).\n\n\nPruning Filters for Efficient ConvNets\n\n\nQuoting the authors directly:\n\n\n\n\nWe present an acceleration method for CNNs, where we prune filters from CNNs that are identified as having a small effect on the output accuracy. By removing whole filters in the network together with their connecting feature maps, the computation costs are reduced significantly.\nIn contrast to pruning weights, this approach does not result in sparse connectivity patterns. Hence, it does not need the support of sparse convolution libraries and can work with existing efficient BLAS libraries for dense matrix multiplications.\n\n\n\n\nThe implementation of the research by Hao et al. required us to add filter-pruning sensitivity analysis, and support for \"network thinning\".\n\n\nAfter performing filter-pruning sensitivity analysis to assess which layers are more sensitive to the pruning of filters, we execute distiller.L1RankedStructureParameterPruner once in order to rank the filters of each layer by their L1-norm values, and then we prune to the schedule-prescribed sparsity level.  
\n\n\n\n\nDistiller schedule: \ndistiller/examples/pruning_filters_for_efficient_convnets/resnet56_cifar_filter_rank.yaml\n\n\nCheckpoint files: \ncheckpoint_finetuned.pth.tar\n\n\n\n\nThe excerpt from the schedule, displayed below, shows how we declare the L1RankedStructureParameterPruner.  This class currently ranks filters only, but because in the future this class may support ranking of various structures, you need to specify for each parameter both the target sparsity level, and the structure type ('3D' is filter-wise pruning).\n\n\npruners:\n  filter_pruner:\n    class: 'L1RankedStructureParameterPruner'\n    reg_regims:\n      'module.layer1.0.conv1.weight': [0.6, '3D']\n      'module.layer1.1.conv1.weight': [0.6, '3D']\n      'module.layer1.2.conv1.weight': [0.6, '3D']\n      'module.layer1.3.conv1.weight': [0.6, '3D']\n\n\n\n\nIn the policy, we specify that we want to invoke this pruner once, at epoch 180.  Because we are starting from a network which was trained for 180 epochs (see Baseline training below), the filter ranking is performed right at the outset of this schedule.\n\n\npolicies:\n  - pruner:\n      instance_name: filter_pruner\n    epochs: [180]\n\n\n\n\n\nFollowing the pruning, we want to \"physically\" remove the pruned filters from the network, which involves reconfiguring the Convolutional layers and the parameter tensors.  When we remove filters from Convolution layer \nn\n we need to perform several changes to the network (a code sketch of these steps follows this entry):\n1. Shrink layer \nn\n's weights tensor, leaving only the \"important\" filters.\n2. Configure layer \nn\n's \n.out_channels\n member to its new, smaller, value.\n3. If a BN layer follows layer \nn\n, then it also needs to be reconfigured and its scale and shift parameter vectors need to be shrunk.\n4. If a Convolution layer follows the BN layer, then it will have fewer input channels, which requires reconfiguring it and shrinking its weights.\n\n\nAll of this is performed by distiller.ResnetCifarFilterRemover which is also scheduled at epoch 180.  We call this process \"network thinning\".\n\n\nextensions:\n  net_thinner:\n      class: 'FilterRemover'\n      thinning_func_str: remove_filters\n      arch: 'resnet56_cifar'\n      dataset: 'cifar10'\n\n\n\n\nNetwork thinning requires us to understand the layer connectivity and data-dependency of the DNN, and we are working on a robust method to perform this.  On networks with topologies similar to ResNet (residuals) and GoogLeNet (inception), which have several inputs and outputs to/from Convolution layers, there are extra details to consider.\n\nOur current implementation is specific to certain layers in ResNet and is a bit fragile.  We will continue to improve and generalize this.\n\n\nBaseline training\n\n\nWe started by training the baseline ResNet56-Cifar dense network (180 epochs) since we didn't have a pre-trained model.\n\n\n\n\nDistiller schedule: \ndistiller/examples/pruning_filters_for_efficient_convnets/resnet56_cifar_baseline_training.yaml\n\n\nCheckpoint files: \ncheckpoint.resnet56_cifar_baseline.pth.tar\n\n\n\n\nResults\n\n\nWe trained a ResNet56-Cifar10 network and achieved accuracy results which are on-par with published results:\nTop1: 92.970 and Top5: 99.740.\n\n\nWe used Li et al.'s algorithm to remove 37.3% of the original convolution MACs, while maintaining virtually the same accuracy as the baseline:\nTop1: 92.830 and Top5: 99.760", 
-            "title": "Model Zoo"
-        }, 
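To make the "network thinning" steps listed above concrete, here is a minimal PyTorch sketch for a simple conv -> BN -> conv chain. The function name `thin_conv_bn_conv` is hypothetical and this is not Distiller's `FilterRemover`: the real implementation must also handle residual connections and the other topological details mentioned above.

```python
import torch
import torch.nn as nn

def thin_conv_bn_conv(conv1: nn.Conv2d, bn: nn.BatchNorm2d,
                      conv2: nn.Conv2d, keep: torch.Tensor) -> None:
    """Physically remove pruned filters from conv1, propagating the change
    through the following BN layer and into the next convolution.
    `keep` holds the indices of conv1's surviving filters."""
    # 1. Shrink conv1's weight tensor, leaving only the "important" filters.
    conv1.weight = nn.Parameter(conv1.weight.data[keep].clone())
    if conv1.bias is not None:
        conv1.bias = nn.Parameter(conv1.bias.data[keep].clone())
    # 2. Configure conv1's .out_channels member to its new, smaller, value.
    conv1.out_channels = len(keep)
    # 3. Shrink the BN layer's scale/shift vectors and running statistics.
    bn.weight = nn.Parameter(bn.weight.data[keep].clone())
    bn.bias = nn.Parameter(bn.bias.data[keep].clone())
    bn.running_mean = bn.running_mean[keep].clone()
    bn.running_var = bn.running_var[keep].clone()
    bn.num_features = len(keep)
    # 4. The following convolution now receives fewer input channels.
    conv2.weight = nn.Parameter(conv2.weight.data[:, keep].clone())
    conv2.in_channels = len(keep)

conv1, bn, conv2 = nn.Conv2d(3, 16, 3), nn.BatchNorm2d(16), nn.Conv2d(16, 32, 3)
thin_conv_bn_conv(conv1, bn, conv2, keep=torch.arange(8))  # keep the first 8 filters
```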
-        {
-            "location": "/model_zoo/index.html#distiller-model-zoo", 
-            "text": "", 
-            "title": "Distiller Model Zoo"
-        }, 
-        {
-            "location": "/model_zoo/index.html#how-to-contribute-models-to-the-model-zoo", 
-            "text": "We encourage you to contribute new models to the Model Zoo.  We welcome implementations of published papers or of your own work.  To assure that models and algorithms shared with others are high-quality, please commit your models with the following:   Command-line arguments  Log files  PyTorch model", 
-            "title": "How to contribute models to the Model Zoo"
-        }, 
-        {
-            "location": "/model_zoo/index.html#contents", 
-            "text": "The Distiller model zoo is not a \"traditional\" model-zoo, because it does not necessarily contain best-in-class compressed models.  Instead, the model-zoo contains a number of deep learning models that have been compressed using Distiller following some well-known research papers.  These are meant to serve as examples of how Distiller can be used.  Each model contains a Distiller schedule detailing how the model was compressed, a PyTorch checkpoint, text logs and TensorBoard logs.  \ntable, th, td {\n    border: 1px solid black;\n}  \n   \n     Paper \n     Dataset \n     Network \n     Method   Granularity \n     Schedule \n     Features \n   \n   \n     Learning both Weights and Connections for Efficient Neural Networks \n     ImageNet \n     Alexnet \n     Element-wise pruning \n     Iterative; Manual \n     Magnitude thresholding based on a sensitivity quantifier. Element-wise sparsity sensitivity analysis \n   \n   \n     To prune, or not to prune: exploring the efficacy of pruning for model compression \n     ImageNet \n     MobileNet \n     Element-wise pruning \n     Automated gradual; Iterative \n     Magnitude thresholding based on target level \n   \n   \n     Learning Structured Sparsity in Deep Neural Networks \n     CIFAR10 \n     ResNet20 \n     Group regularization \n     1.Train with group-lasso 2.Remove zero groups and fine-tune \n     Group Lasso regularization. Groups: kernels (2D), channels, filters (3D), layers (4D), vectors (rows, cols) \n   \n   \n     Pruning Filters for Efficient ConvNets \n     CIFAR10 \n     ResNet56 \n     Filter ranking; guided by sensitivity analysis \n     1.Rank filters 2. Remove filters and channels 3.Fine-tune \n     One-shot ranking and pruning of filters; with network thinning", 
-            "title": "Contents"
-        }, 
-        {
-            "location": "/model_zoo/index.html#learning-both-weights-and-connections-for-efficient-neural-networks", 
-            "text": "This schedule is an example of \"Iterative Pruning\" for Alexnet/Imagent, as described in chapter 3 of Song Han's PhD dissertation:  Efficient Methods and Hardware for Deep Learning  and in his paper  Learning both Weights and Connections for Efficient Neural Networks .    The Distiller schedule uses SensitivityPruner which is similar to MagnitudeParameterPruner, but instead of specifying \"raw\" thresholds, it uses a \"sensitivity parameter\".  Song Han's paper says that \"the pruning threshold is chosen as a quality parameter multiplied by the standard deviation of a layers weights,\" and this is not explained much further.  In Distiller, the \"quality parameter\" is referred to as \"sensitivity\" and\nis based on the values learned from performing sensitivity analysis.  Using a parameter that is related to the standard deviation is very helpful: under the assumption that the weights tensors are distributed normally, the standard deviation acts as a threshold normalizer.  Note that Distiller's implementation deviates slightly from the algorithm Song Han describes in his PhD dissertation, in that the threshold value is set only once.  In his PhD dissertation, Song Han describes a growing threshold, at each iteration.  This requires n+1 hyper-parameters (n being the number of pruning iterations we use): the threshold and the threshold increase (delta) at each pruning iteration.  Distiller's implementation takes advantage of the fact that as pruning progresses, more weights are pulled toward zero, and therefore the threshold \"traps\" more weights.  Thus, we can use less hyper-parameters and achieve the same results.   Distiller schedule:  distiller/examples/sensitivity-pruning/alexnet.schedule_sensitivity.yaml  Checkpoint file:  alexnet.checkpoint.89.pth.tar", 
-            "title": "Learning both Weights and Connections for Efficient Neural Networks"
-        }, 
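To make the thresholding idea above concrete, here is a minimal PyTorch sketch of sensitivity-based magnitude pruning. The function name `sensitivity_prune` is illustrative, not Distiller's API.

```python
import torch

def sensitivity_prune(weights: torch.Tensor, sensitivity: float) -> torch.Tensor:
    # Threshold = sensitivity * std(weights): under a roughly normal weight
    # distribution, the standard deviation acts as a per-layer normalizer.
    threshold = sensitivity * weights.std()
    mask = (weights.abs() > threshold).float()
    return weights * mask  # zero out the weights below the threshold

# Example: prune a conv-shaped tensor with sensitivity 0.6 (a value one
# might pick after running sensitivity analysis on that layer).
w = torch.randn(64, 3, 11, 11)
pruned = sensitivity_prune(w, sensitivity=0.6)
print(1.0 - pruned.count_nonzero().item() / pruned.numel())  # achieved sparsity
```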
-        {
-            "location": "/model_zoo/index.html#results", 
-            "text": "Our reference is TorchVision's pretrained Alexnet model which has a Top1 accuracy of 56.55 and Top5=79.09.  We prune away 88.44% of the parameters and achieve  Top1=56.61 and Top5=79.45.\nSong Han prunes 89% of the parameters, which is slightly better than our results.  Parameters:\n+----+---------------------------+------------------+---------------+----------------+------------+------------+----------+----------+----------+------------+---------+----------+------------+\n|    | Name                      | Shape            |   NNZ (dense) |   NNZ (sparse) |   Cols (%) |   Rows (%) |   Ch (%) |   2D (%) |   3D (%) |   Fine (%) |     Std |     Mean |   Abs-Mean\n|----+---------------------------+------------------+---------------+----------------+------------+------------+----------+----------+----------+------------+---------+----------+------------|\n|  0 | features.module.0.weight  | (64, 3, 11, 11)  |         23232 |          13411 |    0.00000 |    0.00000 |  0.00000 |  0.00000 |  0.00000 |   42.27359 | 0.14391 | -0.00002 |    0.08805 |\n|  1 | features.module.3.weight  | (192, 64, 5, 5)  |        307200 |         115560 |    0.00000 |    0.00000 |  0.00000 |  1.91243 |  0.00000 |   62.38281 | 0.04703 | -0.00250 |    0.02289 |\n|  2 | features.module.6.weight  | (384, 192, 3, 3) |        663552 |         256565 |    0.00000 |    0.00000 |  0.00000 |  6.18490 |  0.00000 |   61.33445 | 0.03354 | -0.00184 |    0.01803 |\n|  3 | features.module.8.weight  | (256, 384, 3, 3) |        884736 |         315065 |    0.00000 |    0.00000 |  0.00000 |  6.96411 |  0.00000 |   64.38881 | 0.02646 | -0.00168 |    0.01422 |\n|  4 | features.module.10.weight | (256, 256, 3, 3) |        589824 |         186938 |    0.00000 |    0.00000 |  0.00000 | 15.49225 |  0.00000 |   68.30614 | 0.02714 | -0.00246 |    0.01409 |\n|  5 | classifier.1.weight       | (4096, 9216)     |      37748736 |        3398881 |    0.00000 |    0.21973 |  0.00000 |  0.21973 |  0.00000 |   90.99604 | 0.00589 | -0.00020 |    0.00168 |\n|  6 | classifier.4.weight       | (4096, 4096)     |      16777216 |        1782769 |    0.21973 |    3.46680 |  0.00000 |  3.46680 |  0.00000 |   89.37387 | 0.00849 | -0.00066 |    0.00263 |\n|  7 | classifier.6.weight       | (1000, 4096)     |       4096000 |         994738 |    3.36914 |    0.00000 |  0.00000 |  0.00000 |  0.00000 |   75.71440 | 0.01718 |  0.00030 |    0.00778 |\n|  8 | Total sparsity:           | -                |      61090496 |        7063928 |    0.00000 |    0.00000 |  0.00000 |  0.00000 |  0.00000 |   88.43694 | 0.00000 |  0.00000 |    0.00000 |\n+----+---------------------------+------------------+---------------+----------------+------------+------------+----------+----------+----------+------------+---------+----------+------------+\n 2018-04-04 21:30:52,499 - Total sparsity: 88.44\n\n 2018-04-04 21:30:52,499 - --- validate (epoch=89)-----------\n 2018-04-04 21:30:52,499 - 128116 samples (256 per mini-batch)\n 2018-04-04 21:31:35,357 - ==  Top1: 51.838    Top5: 74.817    Loss: 2.150\n\n 2018-04-04 21:31:39,251 - --- test ---------------------\n 2018-04-04 21:31:39,252 - 50000 samples (256 per mini-batch)\n 2018-04-04 21:32:01,274 - ==  Top1: 56.606    Top5: 79.446    Loss: 1.893", 
-            "title": "Results"
-        }, 
-        {
-            "location": "/model_zoo/index.html#to-prune-or-not-to-prune-exploring-the-efficacy-of-pruning-for-model-compression", 
-            "text": "In their paper Zhu and Gupta, \"compare the accuracy of large, but pruned models (large-sparse) and their\nsmaller, but dense (small-dense) counterparts with identical memory footprint.\"\nThey also \"propose a new gradual pruning technique that is simple and straightforward to apply across a variety of models/datasets with\nminimal tuning.\"  This pruning schedule is implemented by distiller.AutomatedGradualPruner, which increases the sparsity level (expressed as a percentage of zero-valued elements) gradually over several pruning steps.  Distiller's implementation only prunes elements once in an epoch (the model is fine-tuned in between pruning events), which is a small deviation from Zhu and Gupta's paper.  The research paper specifies the schedule in terms of mini-batches, while our implementation specifies the schedule in terms of epochs.  We feel that using epochs performs well, and is more \"stable\", since the number of mini-batches will change, if you change the batch size.  ImageNet files:   Distiller schedule:  distiller/examples/agp-pruning/mobilenet.imagenet.schedule_agp.yaml  Checkpoint file:  checkpoint.pth.tar   ResNet18 files:   Distiller schedule:  distiller/examples/agp-pruning/resnet18.schedule_agp.yaml  Checkpoint file:  checkpoint.pth.tar", 
-            "title": "To prune, or not to prune: exploring the efficacy of pruning for model compression"
-        }, 
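For reference, Zhu and Gupta's automated gradual pruning raises the sparsity target along a cubic curve, \(s_t = s_f + (s_i - s_f)(1 - t)^3\) for normalized progress \(t\). A minimal sketch, with the schedule expressed per epoch as Distiller does (the function name is illustrative):

```python
def agp_sparsity(epoch: int, start_epoch: int, end_epoch: int,
                 initial_sparsity: float, final_sparsity: float) -> float:
    """Cubic sparsity schedule from Zhu & Gupta, computed per epoch
    (Distiller schedules in epochs rather than mini-batches)."""
    if epoch < start_epoch:
        return initial_sparsity
    if epoch >= end_epoch:
        return final_sparsity
    progress = (epoch - start_epoch) / (end_epoch - start_epoch)
    return final_sparsity + (initial_sparsity - final_sparsity) * (1 - progress) ** 3

# The target sparsity rises quickly at first, then flattens out:
for e in range(0, 100, 10):
    print(e, round(agp_sparsity(e, 0, 90, 0.05, 0.50), 3))
```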
-        {
-            "location": "/model_zoo/index.html#results_1", 
-            "text": "As our baseline we used a  pretrained PyTorch MobileNet model  (width=1) which has Top1=68.848 and Top5=88.740. \nIn their paper, Zhu and Gupta prune 50% of the elements of MobileNet (width=1) with a 1.1% drop in accuracy.  We pruned about 51.6% of the elements, with virtually no change in the accuracies (Top1: 68.808 and Top5: 88.656).  We didn't try to prune more than this, but we do note that the baseline accuracy that we used is almost 2% lower than the accuracy published in the paper.    +----+--------------------------+--------------------+---------------+----------------+------------+------------+----------+----------+----------+------------+---------+----------+------------+\n|    | Name                     | Shape              |   NNZ (dense) |   NNZ (sparse) |   Cols (%) |   Rows (%) |   Ch (%) |   2D (%) |   3D (%) |   Fine (%) |     Std |     Mean |   Abs-Mean |\n|----+--------------------------+--------------------+---------------+----------------+------------+------------+----------+----------+----------+------------+---------+----------+------------|\n|  0 | module.model.0.0.weight  | (32, 3, 3, 3)      |           864 |            864 |    0.00000 |    0.00000 |  0.00000 |  0.00000 |  0.00000 |    0.00000 | 0.14466 |  0.00103 |    0.06508 |\n|  1 | module.model.1.0.weight  | (32, 1, 3, 3)      |           288 |            288 |    0.00000 |    0.00000 |  0.00000 |  0.00000 |  0.00000 |    0.00000 | 0.32146 |  0.01020 |    0.12932 |\n|  2 | module.model.1.3.weight  | (64, 32, 1, 1)     |          2048 |           2048 |    0.00000 |    0.00000 |  0.00000 |  0.00000 |  0.00000 |    0.00000 | 0.11942 |  0.00024 |    0.03627 |\n|  3 | module.model.2.0.weight  | (64, 1, 3, 3)      |           576 |            576 |    0.00000 |    0.00000 |  0.00000 |  0.00000 |  0.00000 |    0.00000 | 0.15809 |  0.00543 |    0.11513 |\n|  4 | module.model.2.3.weight  | (128, 64, 1, 1)    |          8192 |           8192 |    0.00000 |    0.00000 |  0.00000 |  0.00000 |  0.00000 |    0.00000 | 0.08442 | -0.00031 |    0.04182 |\n|  5 | module.model.3.0.weight  | (128, 1, 3, 3)     |          1152 |           1152 |    0.00000 |    0.00000 |  0.00000 |  0.00000 |  0.00000 |    0.00000 | 0.16780 |  0.00125 |    0.10545 |\n|  6 | module.model.3.3.weight  | (128, 128, 1, 1)   |         16384 |          16384 |    0.00000 |    0.00000 |  0.00000 |  0.00000 |  0.00000 |    0.00000 | 0.07126 | -0.00197 |    0.04123 |\n|  7 | module.model.4.0.weight  | (128, 1, 3, 3)     |          1152 |           1152 |    0.00000 |    0.00000 |  0.00000 |  0.00000 |  0.00000 |    0.00000 | 0.10182 |  0.00171 |    0.08719 |\n|  8 | module.model.4.3.weight  | (256, 128, 1, 1)   |         32768 |          13108 |    0.00000 |    0.00000 | 10.15625 | 59.99756 | 12.50000 |   59.99756 | 0.05543 | -0.00002 |    0.02760 |\n|  9 | module.model.5.0.weight  | (256, 1, 3, 3)     |          2304 |           2304 |    0.00000 |    0.00000 |  0.00000 |  0.00000 |  0.00000 |    0.00000 | 0.12516 | -0.00288 |    0.08058 |\n| 10 | module.model.5.3.weight  | (256, 256, 1, 1)   |         65536 |          26215 |    0.00000 |    0.00000 | 12.50000 | 59.99908 | 23.82812 |   59.99908 | 0.04453 |  0.00002 |    0.02271 |\n| 11 | module.model.6.0.weight  | (256, 1, 3, 3)     |          2304 |           2304 |    0.00000 |    0.00000 |  0.00000 |  0.00000 |  0.00000 |    0.00000 | 0.08024 |  0.00252 |    0.06377 |\n| 12 | module.model.6.3.weight  | (512, 256, 1, 1)   |        131072 |          52429 |    0.00000 |    
0.00000 | 23.82812 | 59.99985 | 14.25781 |   59.99985 | 0.03561 | -0.00057 |    0.01779 |\n| 13 | module.model.7.0.weight  | (512, 1, 3, 3)     |          4608 |           4608 |    0.00000 |    0.00000 |  0.00000 |  0.00000 |  0.00000 |    0.00000 | 0.11008 | -0.00018 |    0.06829 |\n| 14 | module.model.7.3.weight  | (512, 512, 1, 1)   |        262144 |         104858 |    0.00000 |    0.00000 | 14.25781 | 59.99985 | 21.28906 |   59.99985 | 0.02944 | -0.00060 |    0.01515 |\n| 15 | module.model.8.0.weight  | (512, 1, 3, 3)     |          4608 |           4608 |    0.00000 |    0.00000 |  0.00000 |  0.00000 |  0.00000 |    0.00000 | 0.08258 |  0.00370 |    0.04905 |\n| 16 | module.model.8.3.weight  | (512, 512, 1, 1)   |        262144 |         104858 |    0.00000 |    0.00000 | 21.28906 | 59.99985 | 28.51562 |   59.99985 | 0.02865 | -0.00046 |    0.01465 |\n| 17 | module.model.9.0.weight  | (512, 1, 3, 3)     |          4608 |           4608 |    0.00000 |    0.00000 |  0.00000 |  0.00000 |  0.00000 |    0.00000 | 0.07578 |  0.00468 |    0.04201 |\n| 18 | module.model.9.3.weight  | (512, 512, 1, 1)   |        262144 |         104858 |    0.00000 |    0.00000 | 28.51562 | 59.99985 | 23.43750 |   59.99985 | 0.02939 | -0.00044 |    0.01511 |\n| 19 | module.model.10.0.weight | (512, 1, 3, 3)     |          4608 |           4608 |    0.00000 |    0.00000 |  0.00000 |  0.00000 |  0.00000 |    0.00000 | 0.07091 |  0.00014 |    0.04306 |\n| 20 | module.model.10.3.weight | (512, 512, 1, 1)   |        262144 |         104858 |    0.00000 |    0.00000 | 24.60938 | 59.99985 | 20.89844 |   59.99985 | 0.03095 | -0.00059 |    0.01672 |\n| 21 | module.model.11.0.weight | (512, 1, 3, 3)     |          4608 |           4608 |    0.00000 |    0.00000 |  0.00000 |  0.00000 |  0.00000 |    0.00000 | 0.05729 | -0.00518 |    0.04267 |\n| 22 | module.model.11.3.weight | (512, 512, 1, 1)   |        262144 |         104858 |    0.00000 |    0.00000 | 20.89844 | 59.99985 | 17.57812 |   59.99985 | 0.03229 | -0.00044 |    0.01797 |\n| 23 | module.model.12.0.weight | (512, 1, 3, 3)     |          4608 |           4608 |    0.00000 |    0.00000 |  0.00000 |  0.00000 |  0.00000 |    0.00000 | 0.04981 | -0.00136 |    0.03967 |\n| 24 | module.model.12.3.weight | (1024, 512, 1, 1)  |        524288 |         209716 |    0.00000 |    0.00000 | 16.01562 | 59.99985 | 44.23828 |   59.99985 | 0.02514 | -0.00106 |    0.01278 |\n| 25 | module.model.13.0.weight | (1024, 1, 3, 3)    |          9216 |           9216 |    0.00000 |    0.00000 |  0.00000 |  0.00000 |  0.00000 |    0.00000 | 0.02396 | -0.00949 |    0.01549 |\n| 26 | module.model.13.3.weight | (1024, 1024, 1, 1) |       1048576 |         419431 |    0.00000 |    0.00000 | 44.72656 | 59.99994 |  1.46484 |   59.99994 | 0.01801 | -0.00017 |    0.00931 |\n| 27 | module.fc.weight         | (1000, 1024)       |       1024000 |         409600 |    1.46484 |    0.00000 |  0.00000 |  0.00000 |  0.00000 |   60.00000 | 0.05078 |  0.00271 |    0.02734 |\n| 28 | Total sparsity:          | -                  |       4209088 |        1726917 |    0.00000 |    0.00000 |  0.00000 |  0.00000 |  0.00000 |   58.97171 | 0.00000 |  0.00000 |    0.00000 |\n+----+--------------------------+--------------------+---------------+----------------+------------+------------+----------+----------+----------+------------+---------+----------+------------+\nTotal sparsity: 58.97\n\n--- validate (epoch=199)-----------\n128116 samples (256 per mini-batch)\n==  Top1: 65.337    Top5: 84.984    Loss: 
1.494\n\n--- test ---------------------\n50000 samples (256 per mini-batch)\n==  Top1: 68.810    Top5: 88.626    Loss: 1.282", 
-            "title": "Results"
-        }, 
-        {
-            "location": "/model_zoo/index.html#learning-structured-sparsity-in-deep-neural-networks", 
-            "text": "This research paper from the University of Pittsburgh, \"proposes a Structured Sparsity Learning (SSL) method to regularize the structures (i.e., filters, channels, filter shapes, and layer depth) of DNNs. SSL can: (1) learn a compact structure from a bigger DNN to reduce computation cost; (2) obtain a hardware-friendly structured sparsity of DNN to efficiently accelerate the DNN\u2019s evaluation.\"  Note that this paper does not use pruning, but instead uses group regularization during the training to force weights towards zero, as a group.  We used a schedule which thresholds the regularized elements at a magnitude equal to the regularization strength.  At the end of the regularization phase, we save the final sparsity masks generated by the regularization, and exit.  Then we load this regularized model, remove the layers corresponding to the zeroed weight tensors (all of a layer's elements have a zero value).", 
-            "title": "Learning Structured Sparsity in Deep Neural Networks"
-        }, 
-        {
-            "location": "/model_zoo/index.html#baseline-training", 
-            "text": "We started by training the baseline ResNet20-Cifar dense network since we didn't have a pre-trained model.   Distiller schedule:  distiller/examples/ssl/resnet20_cifar_baseline_training.yaml  Checkpoint files:  distiller/examples/ssl/checkpoints/   $ time python3 compress_classifier.py --arch resnet20_cifar  ../data.cifar10 -p=50 --lr=0.3 --epochs=180 --compress=../cifar10/resnet20/baseline_training.yaml -j=1 --deterministic", 
-            "title": "Baseline training"
-        }, 
-        {
-            "location": "/model_zoo/index.html#regularization", 
-            "text": "Then we started training from scratch again, but this time we used Group Lasso regularization on entire layers: \nDistiller schedule:  distiller/examples/ssl/ssl_4D-removal_4L_training.yaml  $ time python3 compress_classifier.py --arch resnet20_cifar  ../data.cifar10 -p=50 --lr=0.4 --epochs=180 --compress=../ssl/ssl_4D-removal_training.yaml -j=1 --deterministic  The diagram below shows the training of Resnet20/CIFAR10 using Group Lasso regularization on entire layers (in blue) vs. training Resnet20/CIFAR10  baseline (in red).  You may notice several interesting things:\n1. The LR-decay policy is the same, but the two sessions start with different initial LR values.\n2. The data-loss of the regularized training follows the same shape as the un-regularized training (baseline), and eventually the two seem to merge.\n3. We see similar behavior in the validation Top1 and Top5 accuracy results, but the regularized training eventually performs better.\n4. In the top right corner we see the behavior of the regularization loss ( Reg Loss ), which actually increases for some time, until the data-loss has a sharp drop (after ~16K mini-batches), at which point the regularization loss also starts dropping.  This  regularization  yields 5 layers with zeroed weight tensors.  We load this model, remove the 5 layers, and start the fine tuning of the weights.  This process of layer removal is specific to ResNet for CIFAR, which we altered by adding code to skip over layers during the forward path.  When you export to ONNX, the removed layers do not participate in the forward path, so they don't get incarnated.    We managed to remove 5 of the 16 3x3 convolution layers which dominate the computation time.  It's not bad, but we probably could have done better.", 
-            "title": "Regularization"
-        }, 
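A minimal sketch of what layer-wise ("4D") Group Lasso regularization looks like in PyTorch: each layer's whole weight tensor forms one group, and penalizing the group's L2 norm pulls all of its elements toward zero together. The helper name and the regularization strength are illustrative, not Distiller's GroupLassoRegularizer.

```python
import torch
import torch.nn as nn

def group_lasso_4d(weight: torch.Tensor) -> torch.Tensor:
    # 4D ("layer") group Lasso: the whole weight tensor is one group, so the
    # penalty is simply its L2 norm; driving it to zero zeroes the entire
    # layer, which can then be removed from the network.
    return weight.norm(p=2)

conv = nn.Conv2d(16, 16, kernel_size=3)
lambda_reg = 4e-3                   # illustrative regularization strength
data_loss = torch.tensor(0.0)       # stand-in for the classification loss
loss = data_loss + lambda_reg * group_lasso_4d(conv.weight)
loss.backward()  # the penalty's gradient pulls all of conv's weights toward zero
```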
-        {
-            "location": "/model_zoo/index.html#fine-tuning", 
-            "text": "During the  fine-tuning  process, because the removed layers do not participate in the forward path, they do not appear in the backward path and are not backpropogated: therefore they are completely disconnected from the network. \nWe copy the checkpoint file of the regularized model to  checkpoint_trained_4D_regularized_5Lremoved.pth.tar . \nDistiller schedule:  distiller/examples/ssl/ssl_4D-removal_finetuning.yaml  $ time python3 compress_classifier.py --arch resnet20_cifar  ../data.cifar10 -p=50 --lr=0.1 --epochs=250 --resume=../cifar10/resnet20/checkpoint_trained_4D_regularized_5Lremoved.pth.tar --compress=../ssl/ssl_4D-removal_finetuning.yaml  -j=1 --deterministic", 
-            "title": "Fine-tuning"
-        }, 
-        {
-            "location": "/model_zoo/index.html#results_2", 
-            "text": "Our baseline results for ResNet20 Cifar are: Top1=91.450 and  Top5=99.750  We used Distiller's GroupLassoRegularizer to remove 5 layers from Resnet20 (CIFAR10) with no degradation of the accuracies. \nThe regularized model exhibits really poor classification abilities:   $ time python3 compress_classifier.py --arch resnet20_cifar  ../data.cifar10 -p=50 --resume=../cifar10/resnet20/checkpoint_trained_4D_regularized_5Lremoved.pth.tar --evaluate\n\n=  loading checkpoint ../cifar10/resnet20/checkpoint_trained_4D_regularized_5Lremoved.pth.tar\n   best top@1: 90.620\nLoaded compression schedule from checkpoint (epoch 179)\nRemoving layer: module.layer1.0.conv1 [layer=0 block=0 conv=0]\nRemoving layer: module.layer1.0.conv2 [layer=0 block=0 conv=1]\nRemoving layer: module.layer1.1.conv1 [layer=0 block=1 conv=0]\nRemoving layer: module.layer1.1.conv2 [layer=0 block=1 conv=1]\nRemoving layer: module.layer2.2.conv2 [layer=1 block=2 conv=1]\nFiles already downloaded and verified\nFiles already downloaded and verified\nDataset sizes:\n        training=45000\n        validation=5000\n        test=10000\n--- test ---------------------\n10000 samples (256 per mini-batch)\n==  Top1: 22.290    Top5: 68.940    Loss: 5.172  However, after fine-tuning, we recovered most of the accuracies loss, but not quite all of it: Top1=91.020 and Top5=99.670  We didn't spend time trying to wrestle with this network, and therefore didn't achieve SSL's published results (which showed that they managed to remove 6 layers and at the same time increase accuracies).", 
-            "title": "Results"
-        }, 
-        {
-            "location": "/model_zoo/index.html#pruning-filters-for-efficient-convnets", 
-            "text": "Quoting the authors directly:   We present an acceleration method for CNNs, where we prune filters from CNNs that are identified as having a small effect on the output accuracy. By removing whole filters in the network together with their connecting feature maps, the computation costs are reduced significantly.\nIn contrast to pruning weights, this approach does not result in sparse connectivity patterns. Hence, it does not need the support of sparse convolution libraries and can work with existing efficient BLAS libraries for dense matrix multiplications.   The implementation of the research by Hao et al. required us to add filter-pruning sensitivity analysis, and support for \"network thinning\".  After performing filter-pruning sensitivity analysis to assess which layers are more sensitive to the pruning of filters, we execute distiller.L1RankedStructureParameterPruner once in order to rank the filters of each layer by their L1-norm values, and then we prune the schedule-prescribed sparsity level.     Distiller schedule:  distiller/examples/pruning_filters_for_efficient_convnets/resnet56_cifar_filter_rank.yaml  Checkpoint files:  checkpoint_finetuned.pth.tar   The excerpt from the schedule, displayed below, shows how we declare the L1RankedStructureParameterPruner.  This class currently ranks filters only, but because in the future this class may support ranking of various structures, you need to specify for each parameter both the target sparsity level, and the structure type ('3D' is filter-wise pruning).  pruners:\n  filter_pruner:\n    class: 'L1RankedStructureParameterPruner'\n    reg_regims:\n      'module.layer1.0.conv1.weight': [0.6, '3D']\n      'module.layer1.1.conv1.weight': [0.6, '3D']\n      'module.layer1.2.conv1.weight': [0.6, '3D']\n      'module.layer1.3.conv1.weight': [0.6, '3D']  In the policy, we specify that we want to invoke this pruner once, at epoch 180.  Because we are starting from a network which was trained for 180 epochs (see Baseline training below), the filter ranking is performed right at the outset of this schedule.  policies:\n  - pruner:\n      instance_name: filter_pruner\n    epochs: [180]  Following the pruning, we want to \"physically\" remove the pruned filters from the network, which involves reconfiguring the Convolutional layers and the parameter tensors.  When we remove filters from Convolution layer  n  we need to perform several changes to the network:\n1. Shrink layer  n 's weights tensor, leaving only the \"important\" filters.\n2. Configure layer  n 's  .out_channels  member to its new, smaller, value.\n3. If a BN layer follows layer  n , then it also needs to be reconfigured and its scale and shift parameter vectors need to be shrunk.\n4. If a Convolution layer follows the BN layer, then it will have less input channels which requires reconfiguration and shrinking of its weights.  All of this is performed by distiller.ResnetCifarFilterRemover which is also scheduled at epoch 180.  We call this process \"network thinning\".  extensions:\n  net_thinner:\n      class: 'FilterRemover'\n      thinning_func_str: remove_filters\n      arch: 'resnet56_cifar'\n      dataset: 'cifar10'  Network thinning requires us to understand the layer connectivity and data-dependency of the DNN, and we are working on a robust method to perform this.  On networks with topologies similar to ResNet (residuals) and GoogLeNet (inception), which have several inputs and outputs to/from Convolution layers, there is extra details to consider. 
\nOur current implementation is specific to certain layers in ResNet and is a bit fragile.  We will continue to improve and generalize this.", 
-            "title": "Pruning Filters for Efficient ConvNets"
-        }, 
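A minimal sketch of the ranking step described above: compute one L1 norm per filter of a convolution's 4D weight tensor and mask out the weakest fraction ('3D' structure pruning). `l1_filter_mask` is an illustrative helper, not Distiller's L1RankedStructureParameterPruner.

```python
import torch

def l1_filter_mask(weight: torch.Tensor, fraction_to_prune: float) -> torch.Tensor:
    """Rank the filters of a 4D conv weight (out_ch, in_ch, kH, kW) by their
    L1 norms and mask out the weakest `fraction_to_prune` of them."""
    n_filters = weight.size(0)
    l1_norms = weight.abs().sum(dim=(1, 2, 3))       # one L1 norm per filter
    n_prune = int(round(fraction_to_prune * n_filters))
    prune_idx = torch.argsort(l1_norms)[:n_prune]    # weakest filters first
    mask = torch.ones(n_filters)
    mask[prune_idx] = 0.0
    return mask.view(-1, 1, 1, 1)                    # broadcast over each filter

w = torch.randn(16, 8, 3, 3)
w_pruned = w * l1_filter_mask(w, 0.6)  # 60% '3D' sparsity, as in the YAML above
```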
-        {
-            "location": "/model_zoo/index.html#baseline-training_1", 
-            "text": "We started by training the baseline ResNet56-Cifar dense network (180 epochs) since we didn't have a pre-trained model.   Distiller schedule:  distiller/examples/pruning_filters_for_efficient_convnets/resnet56_cifar_baseline_training.yaml  Checkpoint files:  checkpoint.resnet56_cifar_baseline.pth.tar", 
-            "title": "Baseline training"
-        }, 
-        {
-            "location": "/model_zoo/index.html#results_3", 
-            "text": "We trained a ResNet56-Cifar10 network and achieve accuracy results which are on-par with published results:\nTop1: 92.970 and Top5: 99.740.  We used Hao et al.'s algorithm to remove 37.3% of the original convolution MACs, while maintaining virtually the same accuracy as the baseline:\nTop1: 92.830 and Top5: 99.760", 
-            "title": "Results"
-        }, 
-        {
-            "location": "/jupyter/index.html", 
-            "text": "Jupyter environment\n\n\nThe Jupyter notebooks environment allows us to plan our compression session and load Distiller data summaries to study and analyze compression results.\n\n\nEach notebook has embedded instructions and explanations, so here we provide only a brief description of each notebook.\n\n\nInstallation\n\n\nJupyter and its dependencies are included as part of the main \nrequirements.txt\n file, so there is no need for a dedicated installation step.\n\nHowever, to use the ipywidgets extension, you will need to enable it:\n\n\n$ jupyter nbextension enable --py widgetsnbextension --sys-prefix\n\n\n\n\nYou may want to refer to the \nipywidgets extension installation documentation\n.\n\n\nAnother extension which requires special installation handling is \nQgrid\n.  Qgrid is a Jupyter notebook widget that adds interactive features, such as sorting, to Panadas DataFrames rendering.  To enable Qgrid:\n\n\n$ jupyter nbextension enable --py --sys-prefix qgrid\n\n\n\n\nLaunching the Jupyter server\n\n\nThere are all kinds of options to use when launching Jupyter which you can use.  The example below tells the server to listen to connections from any IP address, and not to launch the browser window, but of course, you are free to launch Jupyter any way you want.\n\nConsult the \nuser's guide\n for more details.\n\n\n$ jupyter-notebook --ip=0.0.0.0 --no-browser\n\n\n\n\nUsing the Distiller notebooks\n\n\nThe Distiller Jupyter notebooks are located in the \ndistiller/jupyter\n directory.\n\nThey are provided as tools that you can use to prepare your compression experiments and study their results.\nWe welcome new ideas and implementations of Jupyter.\n\n\nRoughly, the notebooks can be divided into three categories.\n\n\nTheory\n\n\n\n\njupyter/L1-regularization.ipynb\n: Experience hands-on how L1 and L2 regularization affect the solution of a toy loss-minimization problem, to get a better grasp on the interaction between regularization and sparsity.\n\n\njupyter/alexnet_insights.ipynb\n: This notebook reviews and compares a couple of pruning sessions on Alexnet.  We compare distributions, performance, statistics and show some visualizations of the weights tensors.\n\n\n\n\nPreparation for compression\n\n\n\n\njupyter/model_summary.ipynb\n: Begin by getting familiar with your model.  Examine the sizes and properties of layers and connections.  Study which layers are compute-bound, and which are bandwidth-bound, and decide how to prune or regularize the model.\n\n\njupyter/sensitivity_analysis.ipynb\n: If you performed pruning sensitivity analysis on your model, this notebook can help you load the results and graphically study how the layers behave.\n\n\njupyter/interactive_lr_scheduler.ipynb\n: The learning rate decay policy affects pruning results, perhaps as much as it affects training results.  Graph a few LR-decay policies to see how they behave.\n\n\njupyter/jupyter/agp_schedule.ipynb\n: If you are using the Automated Gradual Pruner, this notebook can help you tune the schedule.\n\n\n\n\nReviewing experiment results\n\n\n\n\njupyter/compare_executions.ipynb\n: This is a simple notebook to help you graphically compare the results of executions of several experiments.\n\n\njupyter/compression_insights.ipynb\n: This notebook is packed with code, tables and graphs to us understand the results of a compression session.  Distiller provides \nsummaries\n, which are Pandas dataframes, which contain statistical information about you model.  
We chose to use Pandas dataframes because they can be sliced, queried, summarized and graphed with a few lines of code.", 
-            "title": "Jupyter Notebooks"
-        }, 
-        {
-            "location": "/jupyter/index.html#jupyter-environment", 
-            "text": "The Jupyter notebooks environment allows us to plan our compression session and load Distiller data summaries to study and analyze compression results.  Each notebook has embedded instructions and explanations, so here we provide only a brief description of each notebook.", 
-            "title": "Jupyter environment"
-        }, 
-        {
-            "location": "/jupyter/index.html#installation", 
-            "text": "Jupyter and its dependencies are included as part of the main  requirements.txt  file, so there is no need for a dedicated installation step. \nHowever, to use the ipywidgets extension, you will need to enable it:  $ jupyter nbextension enable --py widgetsnbextension --sys-prefix  You may want to refer to the  ipywidgets extension installation documentation .  Another extension which requires special installation handling is  Qgrid .  Qgrid is a Jupyter notebook widget that adds interactive features, such as sorting, to Panadas DataFrames rendering.  To enable Qgrid:  $ jupyter nbextension enable --py --sys-prefix qgrid", 
-            "title": "Installation"
-        }, 
-        {
-            "location": "/jupyter/index.html#launching-the-jupyter-server", 
-            "text": "There are all kinds of options to use when launching Jupyter which you can use.  The example below tells the server to listen to connections from any IP address, and not to launch the browser window, but of course, you are free to launch Jupyter any way you want. \nConsult the  user's guide  for more details.  $ jupyter-notebook --ip=0.0.0.0 --no-browser", 
-            "title": "Launching the Jupyter server"
-        }, 
-        {
-            "location": "/jupyter/index.html#using-the-distiller-notebooks", 
-            "text": "The Distiller Jupyter notebooks are located in the  distiller/jupyter  directory. \nThey are provided as tools that you can use to prepare your compression experiments and study their results.\nWe welcome new ideas and implementations of Jupyter.  Roughly, the notebooks can be divided into three categories.", 
-            "title": "Using the Distiller notebooks"
-        }, 
-        {
-            "location": "/jupyter/index.html#theory", 
-            "text": "jupyter/L1-regularization.ipynb : Experience hands-on how L1 and L2 regularization affect the solution of a toy loss-minimization problem, to get a better grasp on the interaction between regularization and sparsity.  jupyter/alexnet_insights.ipynb : This notebook reviews and compares a couple of pruning sessions on Alexnet.  We compare distributions, performance, statistics and show some visualizations of the weights tensors.", 
-            "title": "Theory"
-        }, 
-        {
-            "location": "/jupyter/index.html#preparation-for-compression", 
-            "text": "jupyter/model_summary.ipynb : Begin by getting familiar with your model.  Examine the sizes and properties of layers and connections.  Study which layers are compute-bound, and which are bandwidth-bound, and decide how to prune or regularize the model.  jupyter/sensitivity_analysis.ipynb : If you performed pruning sensitivity analysis on your model, this notebook can help you load the results and graphically study how the layers behave.  jupyter/interactive_lr_scheduler.ipynb : The learning rate decay policy affects pruning results, perhaps as much as it affects training results.  Graph a few LR-decay policies to see how they behave.  jupyter/jupyter/agp_schedule.ipynb : If you are using the Automated Gradual Pruner, this notebook can help you tune the schedule.", 
-            "title": "Preparation for compression"
-        }, 
-        {
-            "location": "/jupyter/index.html#reviewing-experiment-results", 
-            "text": "jupyter/compare_executions.ipynb : This is a simple notebook to help you graphically compare the results of executions of several experiments.  jupyter/compression_insights.ipynb : This notebook is packed with code, tables and graphs to us understand the results of a compression session.  Distiller provides  summaries , which are Pandas dataframes, which contain statistical information about you model.  We chose to use Pandas dataframes because they can be sliced, queried, summarized and graphed with a few lines of code.", 
-            "title": "Reviewing experiment results"
-        }, 
-        {
-            "location": "/design/index.html", 
-            "text": "Distiller design\n\n\nDistiller is designed to be easily integrated into your own PyTorch research applications.\n\nIt is easiest to understand this integration by examining the code of the sample application for compressing image classification models (\ncompress_classifier.py\n).\n\n\nThe application borrows its main flow code from torchvision's ImageNet classification training sample application (https://github.com/pytorch/examples/tree/master/imagenet). We tried to keep it similar, in order to make it familiar and easy to understand.\n\n\nIntegrating compression is very simple: simply add invocations of the appropriate compression_scheduler callbacks, for each stage in the training.  The training skeleton looks like the pseudo code below.  The boiler-plate Pytorch classification training is speckled with invocations of CompressionScheduler.\n\n\nFor each epoch:\n    compression_scheduler.on_epoch_begin(epoch)\n    train()\n    validate()\n    save_checkpoint()\n    compression_scheduler.on_epoch_end(epoch)\n\ntrain():\n    For each training step:\n        compression_scheduler.on_minibatch_begin(epoch)\n        output = model(input_var)\n        loss = criterion(output, target_var)\n        compression_scheduler.before_backward_pass(epoch)\n        loss.backward()\n        optimizer.step()\n        compression_scheduler.on_minibatch_end(epoch)\n\n\n\n\nThese callbacks can be seen in the diagram below, as the arrow pointing from the Training Loop and into Distiller's \nScheduler\n, which invokes the correct algorithm.  The application also uses Distiller services to collect statistics in \nSummaries\n and logs files, which can be queried at a later time, from Jupyter notebooks or TensorBoard.\n\n\n\n\nSparsification and fine-tuning\n\n\n\n\nThe application sets up a model as normally done in PyTorch.\n\n\nAnd then instantiates a Scheduler and configures it:\n\n\nScheduler configuration is defined in a YAML file\n\n\nThe configuration specifies Policies. Each Policy is tied to a specific algorithm which controls some aspect of the training.\n\n\nSome types of algorithms control the actual sparsification of the model. Such types are \"pruner\" and \"regularizer\".\n\n\nSome algorithms control some parameter of the training process, such as the learning-rate decay scheduler (\nlr_scheduler\n).\n\n\nThe parameters of each algorithm are also specified in the configuration.\n\n\n\n\n\n\n\n\n\n\nIn addition to specifying the algorithm, each Policy specifies scheduling parameters which control when the algorithm is executed: start epoch, end epoch and frequency.\n\n\nThe Scheduler exposes callbacks for relevant training stages: epoch start/end, mini-batch start/end and pre-backward pass. Each scheduler callback activates the policies that were defined according the schedule that was defined.\n\n\nThese callbacks are placed the training loop.\n\n\n\n\nQuantization\n\n\nA quantized model is obtained by replacing existing operations with quantized versions. The quantized versions can be either complete replacements, or wrappers. A wrapper will use the existing modules internally and add quantization and de-quantization operations before/after as necessary.\n\n\nIn Distiller we will provide a set of quantized versions of common operations which will enable implementation of different quantization methods. 
The user can write a quantized model from scratch, using the quantized operations provided.\n\n\nWe also provide a mechanism which takes an existing model and automatically replaces required operations with quantized versions. This mechanism is exposed by the \nQuantizer\n class. \nQuantizer\n should be sub-classed for each quantization method.\n\n\nModel Transformation\n\n\nThe high-level flow is as follows:\n\n\n\n\nDefine a \nmapping\n from the module types to be replaced (e.g. Conv2D, Linear, etc.) to a function which generates the replacement module. The mapping is defined in the \nreplacement_factory\n attribute of the \nQuantizer\n class.\n\n\nIterate over the modules defined in the model. For each module, if its type is in the mapping, call the replacement generation function. We pass the existing module to this function to allow wrapping of it.\n\n\nReplace the existing module with the module returned by the function. It is important to note that the \nname\n of the module \ndoes not\n change, as that could break the \nforward\n function of the parent module.\n\n\n\n\nDifferent quantization methods may, obviously, use different quantized operations. In addition, different methods may employ different \"strategies\" of replacing / wrapping existing modules. For instance, some methods replace ReLU with another activation function, while others keep it. Hence, for each quantization method, a different \nmapping\n will likely be defined.\n\nEach sub-class of \nQuantizer\n should populate the \nreplacement_factory\n dictionary attribute with the appropriate mapping.\n\nTo execute the model transformation, call the \nprepare_model\n function of the \nQuantizer\n instance.\n\n\nFlexible Bit-Widths\n\n\n\n\nEach instance of \nQuantizer\n is parameterized by the number of bits to be used for quantization of different tensor types. The default ones are activations and weights. These are the \nbits_activations\n and \nbits_weights\n parameters in \nQuantizer\n's constructor. Sub-classes may define bit-widths for other tensor types as needed.\n\n\nWe also want to be able to override the default number of bits mentioned in the bullet above for certain layers. These could be very specific layers. However, many models are comprised of building blocks (\"container\" modules, such as Sequential) which contain several modules, and it is likely we'll want to override settings for entire blocks, or for a certain module across different blocks. When such building blocks are used, the names of the internal modules usually follow some pattern.\n\n\nSo, for this purpose, Quantizer also accepts a mapping of regular expressions to number of bits. This allows the user to override specific layers using their exact name, or a group of layers via a regular expression. This mapping is passed via the \nbits_overrides\n parameter in the constructor.\n\n\nThe \nbits_overrides\n mapping is required to be an instance of \ncollections.OrderedDict\n (as opposed to just a simple Python \ndict\n). This is done in order to enable handling of overlapping name patterns.\n\n     So, for example, one could define certain override parameters for a group of layers, e.g. 'conv*', but also define different parameters for specific layers in that group, e.g. 'conv1'.\n\n     The patterns are evaluated eagerly - the first match wins. 
Therefore, the more specific patterns must come before the broad patterns.\n\n\n\n\nWeights Quantization\n\n\nThe \nQuantizer\n class also provides an API to quantize the weights of all layers at once. To use it, the \nparam_quantization_fn\n attribute needs to point to a function that accepts a tensor and the number of bits. During model transformation, the \nQuantizer\n class will build a list of all model parameters that need to be quantized along with their bit-width. Then, the \nquantize_params\n function can be called, which will iterate over all parameters and quantize them using \nparam_quantization_fn\n.\n\n\nQuantization-Aware Training\n\n\nThe \nQuantizer\n class supports quantization-aware training, that is - training with quantization in the loop. This requires handling of a couple of flows / scenarios:\n\n\n\n\n\n\nMaintaining a full precision copy of the weights, as described \nhere\n. This is enabled by setting \ntrain_with_fp_copy=True\n in the \nQuantizer\n constructor. At model transformation, in each module that has parameters that should be quantized, a new \ntorch.nn.Parameter\n is added, which will maintain the required full precision copy of the parameters. Note that this is done in-place - a new module \nis not\n created. We preferred not to sub-class the existing PyTorch modules for this purpose. In order to do this in-place, and also guarantee proper back-propagation through the weights quantization function, we employ the following \"hack\": \n\n\n\n\nThe existing \ntorch.nn.Parameter\n, e.g. \nweight\n, is replaced by a \ntorch.nn.Parameter\n named \nfloat_weight\n.\n\n\nTo maintain the existing functionality of the module, we then register a \nbuffer\n in the module with the original name - \nweight\n.\n\n\nDuring training, \nfloat_weight\n will be passed to \nparam_quantization_fn\n and the result will be stored in \nweight\n.\n\n\n\n\n\n\n\n\nIn addition, some quantization methods may introduce additional learned parameters to the model. For example, in the \nPACT\n method, activations are clipped to a value \n\\alpha\n, which is a learned parameter per-layer.\n\n\n\n\n\n\nTo support these two cases, the \nQuantizer\n class also accepts an instance of a \ntorch.optim.Optimizer\n (normally this would be an instance of one of its sub-classes). The quantizer will take care of modifying the optimizer according to the changes made to the parameters.   \n\n\n\n\nOptimizing New Parameters\n\n\nIn cases where new parameters are required by the scheme, it is likely that they'll need to be optimized separately from the main model parameters. In that case, the sub-class for the specific method should override \nQuantizer._get_updated_optimizer_params_groups()\n, and return the proper groups plus any desired hyper-parameter overrides.\n\n\n\n\nExamples\n\n\nThe base \nQuantizer\n class is implemented in \ndistiller/quantization/quantizer.py\n.\n\nFor a simple sub-class implementing symmetric linear quantization, see \nSymmetricLinearQuantizer\n in \ndistiller/quantization/range_linear.py\n.\n\nIn \ndistiller/quantization/clipped_linear.py\n there are examples of lower-precision methods which use training with quantization. Specifically, see \nPACTQuantizer\n for an example of overriding \nQuantizer._get_updated_optimizer_params_groups()\n.", 
-            "title": "Design"
-        }, 
-        {
-            "location": "/design/index.html#distiller-design", 
-            "text": "Distiller is designed to be easily integrated into your own PyTorch research applications. \nIt is easiest to understand this integration by examining the code of the sample application for compressing image classification models ( compress_classifier.py ).  The application borrows its main flow code from torchvision's ImageNet classification training sample application (https://github.com/pytorch/examples/tree/master/imagenet). We tried to keep it similar, in order to make it familiar and easy to understand.  Integrating compression is very simple: simply add invocations of the appropriate compression_scheduler callbacks, for each stage in the training.  The training skeleton looks like the pseudo code below.  The boiler-plate Pytorch classification training is speckled with invocations of CompressionScheduler.  For each epoch:\n    compression_scheduler.on_epoch_begin(epoch)\n    train()\n    validate()\n    save_checkpoint()\n    compression_scheduler.on_epoch_end(epoch)\n\ntrain():\n    For each training step:\n        compression_scheduler.on_minibatch_begin(epoch)\n        output = model(input_var)\n        loss = criterion(output, target_var)\n        compression_scheduler.before_backward_pass(epoch)\n        loss.backward()\n        optimizer.step()\n        compression_scheduler.on_minibatch_end(epoch)  These callbacks can be seen in the diagram below, as the arrow pointing from the Training Loop and into Distiller's  Scheduler , which invokes the correct algorithm.  The application also uses Distiller services to collect statistics in  Summaries  and logs files, which can be queried at a later time, from Jupyter notebooks or TensorBoard.", 
-            "title": "Distiller design"
-        }, 
-        {
-            "location": "/design/index.html#sparsification-and-fine-tuning", 
-            "text": "The application sets up a model as normally done in PyTorch.  And then instantiates a Scheduler and configures it:  Scheduler configuration is defined in a YAML file  The configuration specifies Policies. Each Policy is tied to a specific algorithm which controls some aspect of the training.  Some types of algorithms control the actual sparsification of the model. Such types are \"pruner\" and \"regularizer\".  Some algorithms control some parameter of the training process, such as the learning-rate decay scheduler ( lr_scheduler ).  The parameters of each algorithm are also specified in the configuration.      In addition to specifying the algorithm, each Policy specifies scheduling parameters which control when the algorithm is executed: start epoch, end epoch and frequency.  The Scheduler exposes callbacks for relevant training stages: epoch start/end, mini-batch start/end and pre-backward pass. Each scheduler callback activates the policies that were defined according the schedule that was defined.  These callbacks are placed the training loop.", 
-            "title": "Sparsification and fine-tuning"
-        }, 
-        {
-            "location": "/design/index.html#quantization", 
-            "text": "A quantized model is obtained by replacing existing operations with quantized versions. The quantized versions can be either complete replacements, or wrappers. A wrapper will use the existing modules internally and add quantization and de-quantization operations before/after as necessary.  In Distiller we will provide a set of quantized versions of common operations which will enable implementation of different quantization methods. The user can write a quantized model from scratch, using the quantized operations provided.  We also provide a mechanism which takes an existing model and automatically replaces required operations with quantized versions. This mechanism is exposed by the  Quantizer  class.  Quantizer  should be sub-classed for each quantization method.", 
-            "title": "Quantization"
-        }, 
-        {
-            "location": "/design/index.html#model-transformation", 
-            "text": "The high-level flow is as follows:   Define a  mapping  between the module types to be replaced (e.g. Conv2D, Linear, etc.) to a function which generates the replacement module. The mapping is defined in the  replacement_factory  attribute of the  Quantizer  class.  Iterate over the modules defined in the model. For each module, if its type is in the mapping, call the replacement generation function. We pass the existing module to this function to allow wrapping of it.  Replace the existing module with the module returned by the function. It is important to note that the  name  of the module  does not  change, as that could break the  forward  function of the parent module.   Different quantization methods may, obviously, use different quantized operations. In addition, different methods may employ different \"strategies\" of replacing / wrapping existing modules. For instance, some methods replace ReLU with another activation function, while others keep it. Hence, for each quantization method, a different  mapping  will likely be defined. \nEach sub-class of  Quantizer  should populate the  replacement_factory  dictionary attribute with the appropriate mapping. \nTo execute the model transformation, call the  prepare_model  function of the  Quantizer  instance.", 
-            "title": "Model Transformation"
-        }, 
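A minimal sketch of the transformation loop this section describes: recurse over the module tree and, for each module whose type appears in the mapping, replace it in its parent under the same name. `replace_modules` and the example factory are illustrative; Distiller's Quantizer implements this via its replacement_factory attribute and prepare_model.

```python
import torch.nn as nn

def replace_modules(model: nn.Module, replacement_factory: dict) -> None:
    """Walk the module tree and swap modules whose type is in the mapping,
    keeping each module's name so the parent's forward() still works."""
    for name, child in model.named_children():
        if type(child) in replacement_factory:
            # Pass the existing module to the factory so it can be wrapped.
            setattr(model, name, replacement_factory[type(child)](child))
        else:
            replace_modules(child, replacement_factory)  # recurse into containers

# Hypothetical factory: swap every ReLU for a clipped activation.
factory = {nn.ReLU: lambda old: nn.Hardtanh(min_val=0.0, max_val=6.0)}
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), nn.Conv2d(8, 8, 3), nn.ReLU())
replace_modules(model, factory)
```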
-        {
-            "location": "/design/index.html#flexible-bit-widths", 
-            "text": "Each instance of  Quantizer  is parameterized by the number of bits to be used for quantization of different tensor types. The default ones are activations and weights. These are the  bits_activations  and  bits_weights  parameters in  Quantizer 's constructor. Sub-classes may define bit-widths for other tensor types as needed.  We also want to be able to override the default number of bits mentioned in the bullet above for certain layers. These could be very specific layers. However, many models are comprised of building blocks (\"container\" modules, such as Sequential) which contain several modules, and it is likely we'll want to override settings for entire blocks, or for a certain module across different blocks. When such building blocks are used, the names of the internal modules usually follow some pattern.  So, for this purpose, Quantizer also accepts a mapping of regular expressions to number of bits. This allows the user to override specific layers using they're exact name, or a group of layers via a regular expression. This mapping is passed via the  bits_overrides  parameter in the constructor.  The  bits_overrides  mapping is required to be an instance of  collections.OrderedDict  (as opposed to just a simple Python  dict ). This is done in order to enable handling of overlapping name patterns. \n     So, for example, one could define certain override parameters for a group of layers, e.g. 'conv*', but also define different parameters for specific layers in that group, e.g. 'conv1'. \n     The patterns are evaluated eagerly - the first match wins. Therefore, the more specific patterns must come before the broad patterns.", 
-            "title": "Flexible Bit-Widths"
-        }, 
-        {
-            "location": "/design/index.html#weights-quantization", 
-            "text": "The  Quantizer  class also provides an API to quantize the weights of all layers at once. To use it, the  param_quantization_fn  attribute needs to point to a function that accepts a tensor and the number of bits. During model transformation, the  Quantizer  class will build a list of all model parameters that need to be quantized along with their bit-width. Then, the  quantize_params  function can be called, which will iterate over all parameters and quantize them using  params_quantization_fn .", 
-            "title": "Weights Quantization"
-        }, 
-        {
-            "location": "/design/index.html#quantization-aware-training", 
-            "text": "The  Quantizer  class supports quantization-aware training, that is - training with quantization in the loop. This requires handling of a couple of flows / scenarios:    Maintaining a full precision copy of the weights, as described  here . This is enabled by setting  train_with_fp_copy=True  in the  Quantizer  constructor. At model transformation, in each module that has parameters that should be quantized, a new  torch.nn.Parameter  is added, which will maintain the required full precision copy of the parameters. Note that this is done in-place - a new module  is not  created. We preferred not to sub-class the existing PyTorch modules for this purpose. In order to this in-place, and also guarantee proper back-propagation through the weights quantization function, we employ the following \"hack\":    The existing  torch.nn.Parameter , e.g.  weights , is replaced by a  torch.nn.Parameter  named  float_weight .  To maintain the existing functionality of the module, we then register a  buffer  in the module with the original name -  weights .  During training,  float_weight  will be passed to  param_quantization_fn  and the result will be stored in  weight .     In addition, some quantization methods may introduce additional learned parameters to the model. For example, in the  PACT  method, acitvations are clipped to a value  \\alpha , which is a learned parameter per-layer    To support these two cases, the  Quantizer  class also accepts an instance of a  torch.optim.Optimizer  (normally this would be one an instance of its sub-classes). The quantizer will take care of modifying the optimizer according to the changes made to the parameters.      Optimizing New Parameters  In cases where new parameters are required by the scheme, it is likely that they'll need to be optimized separately from the main model parameters. In that case, the sub-class for the speicifc method should override  Quantizer._get_updated_optimizer_params_groups() , and return the proper groups plus any desired hyper-parameter overrides.", 
-            "title": "Quantization-Aware Training"
-        }, 
-        {
-            "location": "/design/index.html#examples", 
-            "text": "The base  Quantizer  class is implemented in  distiller/quantization/quantizer.py . \nFor a simple sub-class implementing symmetric linear quantization, see  SymmetricLinearQuantizer  in  distiller/quantization/range_linear.py . \nIn  distiller/quantization/clipped_linear.py  there are examples of lower-precision methods which use training with quantization. Specifically, see  PACTQuantizer  for an example of overriding  Quantizer._get_updated_optimizer_params_groups() .", 
-            "title": "Examples"
-        }, 
-        {
-            "location": "/tutorial-struct_pruning/index.html", 
-            "text": "Pruning Filters \n Channels\n\n\nIntroduction\n\n\nChannel and filter pruning are examples of structured-pruning which create compressed models that do not require special hardware to execute.  This latter fact makes this form of structured pruning particularly interesting and popular.\nIn networks that have serial data dependencies, it is pretty straight-forward to understand and define how to prune channels and filters.  However, in more complex models,  with parallel-data dependencies (paths) - such as ResNets (skip connections) and GoogLeNet (Inception layers) \u2013 things become increasingly more complex and require a deeper understanding of the data flow in the model, in order to define the pruning schedule.\n\nThis post explains channel and filter pruning, the challenges, and how to define a Distiller pruning schedule for these structures.  The details of the implementation are left for a separate post.\n\n\nBefore we dive into pruning, let\u2019s level-set on the terminology, because different people (and even research papers) do not always agree on the nomenclature.  This reflects my understanding of the nomenclature, and therefore these are the names used in Distiller.  I\u2019ll restrict this discussion to Convolution layers in CNNs, to contain the scope of the topic I\u2019ll be covering, although Distiller supports pruning of other structures such as matrix columns and rows.\nPyTorch describes \ntorch.nn.Conv2d\n as applying \u201ca 2D convolution over an input signal composed of several input planes.\u201d  We call each of these input planes a \nfeature-map\n (or FM, for short).  Another name is \ninput channel\n, as in the R/G/B channels of an image.  Some people refer to feature-maps as \nactivations\n (i.e. the activation of neurons), although I think strictly speaking \nactivations\n are the output of an activation layer that was fed a group of feature-maps.  Because it is very common, and because the use of an activation is orthogonal to our discussion, I will use \nactivations\n to refer to the output of a Convolution layer (i.e. 3D stack of feature-maps).\n\n\nIn the PyTorch documentation Convolution outputs have shape (N, C\nout\n, H\nout\n, W\nout\n) where N is a batch size, C\nout\n denotes a number of output channels, H\nout\n is a height of output planes in pixels, and W\nout\n is width in pixels.  We won\u2019t be paying much attention to the batch-size since it\u2019s not important to our discussion, so without loss of generality we can set N=1.  I\u2019m also assuming the most common Convolutions having \ngroups==1\n.\nConvolution weights are 4D: (F, C, K, K) where F is the number of filters, C is the number of channels, and K is the kernel size (we can assume the kernel height and width are equal for simplicity).  A \nkernel\n is a 2D matrix (K, K) that is part of a 3D feature detector.  This feature detector is called a \nfilter\n and it is basically a stack of 2D \nkernels\n.  Each kernel is convolved with a 2D input channel (i.e. feature-map) so if there are C\nin\n channels in the input, then there are C\nin\n kernels in a filter (C == C\nin\n).  Each filter is convolved with the entire input to create a single output channel (i.e. feature-map).  
If there are C\nout\n output channels, then there are C\nout\n filters (F == C\nout\n).\n\n\nFilter Pruning\n\n\nFilter pruning and channel pruning are very similar, and I\u2019ll expand on that similarity later on \u2013 but for now let\u2019s focus on filter pruning.\n\nIn filter pruning we use some criterion to determine which filters are \nimportant\n and which are not.  Researchers came up with all sorts of pruning criteria: the L1-magnitude of the filters (citation), the entropy of the activations (citation), and the classification accuracy reduction (citation) are just some examples.  Disregarding how we chose the filters to prune, let\u2019s imagine that in the diagram below, we chose to prune (remove) the green and orange filters (the circle with the \u201c*\u201d designates a Convolution operation).\n\n\nSince we have two less filters operating on the input, we must have two less output feature-maps.  So when we prune filters, besides changing the physical size of the weight tensors, we also need to reconfigure the immediate Convolution layer (change its \nout_channels\n) and the following Convolution layer (change its \nin_channels\n).  And finally, because the next layer\u2019s input is now smaller (has fewer channels),  we should also shrink the next layer\u2019s weights tensors, by removing the channels corresponding to the filters we pruned.  We say that there is a \ndata-dependency\n between the two Convolution layers.  I didn\u2019t make any mention of the activation function that usually follows Convolution, because these functions are parameter-less and are not sensitive to the shape of their input.\nThere are some other dependencies that Distiller resolves (such as Optimizer parameters tightly-coupled to the weights) that I won\u2019t discuss here, because they are implementation details.\n\n\n\nThe scheduler YAML syntax for this example is pasted below.  We use L1-norm ranking of weight filters, and the pruning-rate is set by the AGP algorithm (Automatic Gradual Pruning).  The Convolution layers are conveniently named \nconv1\n and \nconv2\n in this example.\n\n\npruners:\n  example_pruner:\n    class: L1RankedStructureParameterPruner_AGP\n    initial_sparsity : 0.10\n    final_sparsity: 0.50\n    group_type: Filters\n    weights: [module.conv1.weight]\n\n\n\n\nNow let\u2019s add a Batch Normalization layer between the two convolutions:\n\n\n\nThe Batch Normalization layer is parameterized by a couple of tensors that contain information per input-channel (i.e. scale and shift).  Because our Convolution produces less output FMs, and these are the input to the Batch Normalization layer, we also need to reconfigure the Batch Normalization layer.  And we also need to physically shrink the Batch Normalization layer\u2019s scale and shift tensors, which are coefficients in the BN input transformation.  Moreover, the scale and shift coefficients that we remove from the tensors, must correspond to the filters (or output feature-maps channels) that we removed from the Convolution weight tensors.  This small nuance will prove to be a large pain, but we\u2019ll get to that in later examples.\nThe presence of a Batch Normalization layer in the example above is transparent to us, and in fact, the YAML schedule does not change.  Distiller detects the presence of Batch Normalization layers and adjusts their parameters automatically.\n\n\nLet\u2019s look at another example, with non-serial data-dependencies.  Here, the output of \nconv1\n is the input for \nconv2\n and \nconv3\n.  
This is an example of parallel data-dependency, since both \nconv2\n and \nconv3\n depend on \nconv1\n.\n\n\n\nNote that the Distiller YAML schedule is unchanged from the previous two examples, since we are still only explicitly pruning the weight filters of \nconv1\n.  The weight channels of \nconv2\n and \nconv3\n are pruned implicitly by Distiller in a process called \u201cThinning\u201d (on which I will expand in a different post).\n\n\nNext, let\u2019s look at another example also involving three Convolutions, but this time we want to prune the filters of two convolutional layers, whose outputs are element-wise-summed and fed into a third Convolution.\nIn this example \nconv3\n is dependent on both \nconv1\n and \nconv2\n, and there are two implications to this dependency.  The first, and more obvious implication, is that we need to prune the same number of filters from both \nconv1\n and \nconv2\n.  Since we apply element-wise addition on the outputs of \nconv1\n and \nconv2\n, they must have the same shape - and they can only have the same shape if \nconv1\n and \nconv2\n prune the same number of filters.  The second implication of this triangular data-dependency is that both \nconv1\n and \nconv2\n must prune the \nsame\n filters!  Let\u2019s imagine for a moment, that we ignore this second constraint.  The diagram below illustrates the dilemma that arises: how should we prune the channels of the weights of \nconv3\n?  Obviously, we can\u2019t.\n\n\n\nWe must apply the second constraint \u2013 and that means that we now need to be proactive: we need to decide whether to use the prune \nconv1\n and \nconv2\n according to the filter-pruning choices of \nconv1\n or of \nconv2\n.  The diagram below illustrates the pruning scheme after deciding to follow the pruning choices of \nconv1\n.\n\n\n\nThe YAML compression schedule syntax needs to be able to express the two dependencies (or constraints) discussed above.  First we need to tell the Filter Pruner that we there is a dependency of type \nLeader\n.  This means that all of the tensors listed in the \nweights\n field are pruned together, to the same extent at each iteration, and that to prune the filters we will use the pruning decisions of the first tensor listed.  In the example below \nmodule.conv1.weight\n and \nmodule.conv2.weight\n are pruned together according to the pruning choices for \nmodule.conv1.weight\n.\n\n\npruners:\n  example_pruner:\n    class: L1RankedStructureParameterPruner_AGP\n    initial_sparsity : 0.10\n    final_sparsity: 0.50\n    group_type: Filters\n    group_dependency: Leader\n    weights: [module.conv1.weight, module.conv2.weight]\n\n\n\n\nWhen we turn to filter-pruning ResNets we see some pretty long dependency chains because of the skip-connections.  If you don\u2019t pay attention, you can easily under-specify (or mis-specify) dependency chains and Distiller will exit with an exception.  The exception does not explain the specification error and this needs to be improved.\n\n\nChannel Pruning\n\n\nChannel pruning is very similar to Filter pruning with all the details of dependencies reversed.  
Look again at example #1, but this time imagine that we\u2019ve changed our schedule to prune the \nchannels\n of \nmodule.conv2.weight\n.\n\n\npruners:\n  example_pruner:\n    class: L1RankedStructureParameterPruner_AGP\n    initial_sparsity : 0.10\n    final_sparsity: 0.50\n    group_type: Channels\n    weights: [module.conv2.weight]\n\n\n\n\nAs the diagram shows, \nconv1\n is now dependent on \nconv2\n and its weights filters will be implicitly pruned according to the channels removed from the weights of \nconv2\n.\n\n\n\nGeek On.", 
-            "title": "Pruning Filters and Channels"
-        }, 
-        {
-            "location": "/tutorial-struct_pruning/index.html#pruning-filters-channels", 
-            "text": "", 
-            "title": "Pruning Filters &amp; Channels"
-        }, 
-        {
-            "location": "/tutorial-struct_pruning/index.html#introduction", 
-            "text": "Channel and filter pruning are examples of structured-pruning which create compressed models that do not require special hardware to execute.  This latter fact makes this form of structured pruning particularly interesting and popular.\nIn networks that have serial data dependencies, it is pretty straight-forward to understand and define how to prune channels and filters.  However, in more complex models,  with parallel-data dependencies (paths) - such as ResNets (skip connections) and GoogLeNet (Inception layers) \u2013 things become increasingly more complex and require a deeper understanding of the data flow in the model, in order to define the pruning schedule. \nThis post explains channel and filter pruning, the challenges, and how to define a Distiller pruning schedule for these structures.  The details of the implementation are left for a separate post.  Before we dive into pruning, let\u2019s level-set on the terminology, because different people (and even research papers) do not always agree on the nomenclature.  This reflects my understanding of the nomenclature, and therefore these are the names used in Distiller.  I\u2019ll restrict this discussion to Convolution layers in CNNs, to contain the scope of the topic I\u2019ll be covering, although Distiller supports pruning of other structures such as matrix columns and rows.\nPyTorch describes  torch.nn.Conv2d  as applying \u201ca 2D convolution over an input signal composed of several input planes.\u201d  We call each of these input planes a  feature-map  (or FM, for short).  Another name is  input channel , as in the R/G/B channels of an image.  Some people refer to feature-maps as  activations  (i.e. the activation of neurons), although I think strictly speaking  activations  are the output of an activation layer that was fed a group of feature-maps.  Because it is very common, and because the use of an activation is orthogonal to our discussion, I will use  activations  to refer to the output of a Convolution layer (i.e. 3D stack of feature-maps).  In the PyTorch documentation Convolution outputs have shape (N, C out , H out , W out ) where N is a batch size, C out  denotes a number of output channels, H out  is a height of output planes in pixels, and W out  is width in pixels.  We won\u2019t be paying much attention to the batch-size since it\u2019s not important to our discussion, so without loss of generality we can set N=1.  I\u2019m also assuming the most common Convolutions having  groups==1 .\nConvolution weights are 4D: (F, C, K, K) where F is the number of filters, C is the number of channels, and K is the kernel size (we can assume the kernel height and width are equal for simplicity).  A  kernel  is a 2D matrix (K, K) that is part of a 3D feature detector.  This feature detector is called a  filter  and it is basically a stack of 2D  kernels .  Each kernel is convolved with a 2D input channel (i.e. feature-map) so if there are C in  channels in the input, then there are C in  kernels in a filter (C == C in ).  Each filter is convolved with the entire input to create a single output channel (i.e. feature-map).  If there are C out  output channels, then there are C out  filters (F == C out ).", 
-            "title": "Introduction"
-        }, 
-        {
-            "location": "/tutorial-struct_pruning/index.html#filter-pruning", 
-            "text": "Filter pruning and channel pruning are very similar, and I\u2019ll expand on that similarity later on \u2013 but for now let\u2019s focus on filter pruning. \nIn filter pruning we use some criterion to determine which filters are  important  and which are not.  Researchers came up with all sorts of pruning criteria: the L1-magnitude of the filters (citation), the entropy of the activations (citation), and the classification accuracy reduction (citation) are just some examples.  Disregarding how we chose the filters to prune, let\u2019s imagine that in the diagram below, we chose to prune (remove) the green and orange filters (the circle with the \u201c*\u201d designates a Convolution operation).  Since we have two less filters operating on the input, we must have two less output feature-maps.  So when we prune filters, besides changing the physical size of the weight tensors, we also need to reconfigure the immediate Convolution layer (change its  out_channels ) and the following Convolution layer (change its  in_channels ).  And finally, because the next layer\u2019s input is now smaller (has fewer channels),  we should also shrink the next layer\u2019s weights tensors, by removing the channels corresponding to the filters we pruned.  We say that there is a  data-dependency  between the two Convolution layers.  I didn\u2019t make any mention of the activation function that usually follows Convolution, because these functions are parameter-less and are not sensitive to the shape of their input.\nThere are some other dependencies that Distiller resolves (such as Optimizer parameters tightly-coupled to the weights) that I won\u2019t discuss here, because they are implementation details.  The scheduler YAML syntax for this example is pasted below.  We use L1-norm ranking of weight filters, and the pruning-rate is set by the AGP algorithm (Automatic Gradual Pruning).  The Convolution layers are conveniently named  conv1  and  conv2  in this example.  pruners:\n  example_pruner:\n    class: L1RankedStructureParameterPruner_AGP\n    initial_sparsity : 0.10\n    final_sparsity: 0.50\n    group_type: Filters\n    weights: [module.conv1.weight]  Now let\u2019s add a Batch Normalization layer between the two convolutions:  The Batch Normalization layer is parameterized by a couple of tensors that contain information per input-channel (i.e. scale and shift).  Because our Convolution produces less output FMs, and these are the input to the Batch Normalization layer, we also need to reconfigure the Batch Normalization layer.  And we also need to physically shrink the Batch Normalization layer\u2019s scale and shift tensors, which are coefficients in the BN input transformation.  Moreover, the scale and shift coefficients that we remove from the tensors, must correspond to the filters (or output feature-maps channels) that we removed from the Convolution weight tensors.  This small nuance will prove to be a large pain, but we\u2019ll get to that in later examples.\nThe presence of a Batch Normalization layer in the example above is transparent to us, and in fact, the YAML schedule does not change.  Distiller detects the presence of Batch Normalization layers and adjusts their parameters automatically.  Let\u2019s look at another example, with non-serial data-dependencies.  Here, the output of  conv1  is the input for  conv2  and  conv3 .  This is an example of parallel data-dependency, since both  conv2  and  conv3  depend on  conv1 .  
Note that the Distiller YAML schedule is unchanged from the previous two examples, since we are still only explicitly pruning the weight filters of  conv1 .  The weight channels of  conv2  and  conv3  are pruned implicitly by Distiller in a process called \u201cThinning\u201d (on which I will expand in a different post).  Next, let\u2019s look at another example also involving three Convolutions, but this time we want to prune the filters of two convolutional layers, whose outputs are element-wise-summed and fed into a third Convolution.\nIn this example  conv3  is dependent on both  conv1  and  conv2 , and there are two implications to this dependency.  The first, and more obvious implication, is that we need to prune the same number of filters from both  conv1  and  conv2 .  Since we apply element-wise addition on the outputs of  conv1  and  conv2 , they must have the same shape - and they can only have the same shape if  conv1  and  conv2  prune the same number of filters.  The second implication of this triangular data-dependency is that both  conv1  and  conv2  must prune the  same  filters!  Let\u2019s imagine for a moment, that we ignore this second constraint.  The diagram below illustrates the dilemma that arises: how should we prune the channels of the weights of  conv3 ?  Obviously, we can\u2019t.  We must apply the second constraint \u2013 and that means that we now need to be proactive: we need to decide whether to use the prune  conv1  and  conv2  according to the filter-pruning choices of  conv1  or of  conv2 .  The diagram below illustrates the pruning scheme after deciding to follow the pruning choices of  conv1 .  The YAML compression schedule syntax needs to be able to express the two dependencies (or constraints) discussed above.  First we need to tell the Filter Pruner that we there is a dependency of type  Leader .  This means that all of the tensors listed in the  weights  field are pruned together, to the same extent at each iteration, and that to prune the filters we will use the pruning decisions of the first tensor listed.  In the example below  module.conv1.weight  and  module.conv2.weight  are pruned together according to the pruning choices for  module.conv1.weight .  pruners:\n  example_pruner:\n    class: L1RankedStructureParameterPruner_AGP\n    initial_sparsity : 0.10\n    final_sparsity: 0.50\n    group_type: Filters\n    group_dependency: Leader\n    weights: [module.conv1.weight, module.conv2.weight]  When we turn to filter-pruning ResNets we see some pretty long dependency chains because of the skip-connections.  If you don\u2019t pay attention, you can easily under-specify (or mis-specify) dependency chains and Distiller will exit with an exception.  The exception does not explain the specification error and this needs to be improved.", 
-            "title": "Filter Pruning"
-        }, 
-        {
-            "location": "/tutorial-struct_pruning/index.html#channel-pruning", 
-            "text": "Channel pruning is very similar to Filter pruning with all the details of dependencies reversed.  Look again at example #1, but this time imagine that we\u2019ve changed our schedule to prune the  channels  of  module.conv2.weight .  pruners:\n  example_pruner:\n    class: L1RankedStructureParameterPruner_AGP\n    initial_sparsity : 0.10\n    final_sparsity: 0.50\n    group_type: Channels\n    weights: [module.conv2.weight]  As the diagram shows,  conv1  is now dependent on  conv2  and its weights filters will be implicitly pruned according to the channels removed from the weights of  conv2 .  Geek On.", 
-            "title": "Channel Pruning"
-        }, 
-        {
-            "location": "/tutorial-lang_model/index.html", 
-            "text": "Using Distiller to prune a PyTorch language model\n\n\nContents\n\n\n\n\nIntroduction\n\n\nSetup\n\n\nPreparing the code\n\n\nTraining-loop\n\n\nCreating compression baselines\n\n\nCompressing the language model\n\n\nWhat are we compressing?\n\n\nHow are we compressing?\n\n\nWhen are we compressing?\n\n\nUntil next time\n\n\n\n\nIntroduction\n\n\nIn this tutorial I'll show you how to compress a word-level language model using \nDistiller\n.  Specifically, we use PyTorch\u2019s \nword-level language model sample code\n as the code-base of our example, weave in some Distiller code, and show how we compress the model using two different element-wise pruning algorithms.  To make things manageable, I've divided the tutorial to two parts: in the first we will setup the sample application and prune using \nAGP\n.  In the second part I'll show how I've added Baidu's RNN pruning algorithm and then use it to prune the same word-level language model.  The completed code is available \nhere\n.\n\n\nThe results are displayed below and the code is available \nhere\n.\nNote that we can improve the results by training longer, since the loss curves are usually still decreasing at the end of epoch 40.  However, for demonstration purposes we don\u2019t need to do this.\n\n\n\n\n\n\n\n\nType\n\n\nSparsity\n\n\nNNZ\n\n\nValidation\n\n\nTest\n\n\nCommand line\n\n\n\n\n\n\n\n\n\n\nSmall\n\n\n0%\n\n\n7,135,600\n\n\n101.13\n\n\n96.29\n\n\ntime python3 main.py --cuda --epochs 40 --tied --wd=1e-6\n\n\n\n\n\n\nMedium\n\n\n0%\n\n\n28,390,700\n\n\n88.17\n\n\n84.21\n\n\ntime python3 main.py --cuda --emsize 650 --nhid 650 --dropout 0.5 --epochs 40 --tied,--wd=1e-6\n\n\n\n\n\n\nLarge\n\n\n0%\n\n\n85,917,000\n\n\n87.49\n\n\n83.85\n\n\ntime python3 main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --tied --wd=1e-6\n\n\n\n\n\n\nLarge\n\n\n70%\n\n\n25,487,550\n\n\n90.67\n\n\n85.96\n\n\ntime python3 main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --tied --compress=../../examples/agp-pruning/word_lang_model.LARGE_70.schedule_agp.yaml\n\n\n\n\n\n\nLarge\n\n\n70%\n\n\n25,487,550\n\n\n90.59\n\n\n85.84\n\n\ntime python3 main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --tied --compress=../../examples/agp-pruning/word_lang_model.LARGE_70.schedule_agp.yaml --wd=1e-6\n\n\n\n\n\n\nLarge\n\n\n70%\n\n\n25,487,550\n\n\n87.40\n\n\n82.93\n\n\ntime python3 main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --tied --compress=../../examples/agp-pruning/word_lang_model.LARGE_70B.schedule_agp.yaml --wd=1e-6\n\n\n\n\n\n\nLarge\n\n\n80.4%\n\n\n16,847,550\n\n\n89.31\n\n\n83.64\n\n\ntime python3 main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --tied --compress=../../examples/agp-pruning/word_lang_model.LARGE_80.schedule_agp.yaml --wd=1e-6\n\n\n\n\n\n\nLarge\n\n\n90%\n\n\n8,591,700\n\n\n90.70\n\n\n85.67\n\n\ntime python3 main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --tied --compress=../../examples/agp-pruning/word_lang_model.LARGE_90.schedule_agp.yaml --wd=1e-6\n\n\n\n\n\n\nLarge\n\n\n95%\n\n\n4,295,850\n\n\n98.42\n\n\n92.79\n\n\ntime python3 main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --tied --compress=../../examples/agp-pruning/word_lang_model.LARGE_95.schedule_agp.yaml --wd=1e-6\n\n\n\n\n\n\n\n\nTable 1: AGP language model pruning results. 
\nNNZ stands for number of non-zero coefficients (embeddings are counted once, because they are tied).\n\n\n\n\n\n\n  \nFigure 1: Perplexity vs model size (lower perplexity is better).\n\n\n\n\nThe model is composed of an Encoder embedding, two LSTMs, and a Decoder embedding.  The Encoder and decoder embeddings (projections) are tied to improve perplexity results (per https://arxiv.org/pdf/1611.01462.pdf), so in the sparsity statistics we account for only one of the encoder/decoder embeddings.  We used the WikiText2 dataset (twice as large as PTB).\n\n\nWe compared three model sizes: small (7.1M; 14M), medium (28M; 50M), large: (86M; 136M) \u2013 reported as (#parameters net/tied; #parameters gross).\nThe results reported below use a preset seed (for reproducibility), and we expect results can be improved if we allow \u201ctrue\u201d pseudo-randomness.  We limited our tests to 40 epochs, even though validation perplexity was still trending down.\n\n\nEssentially, this recreates the language model experiment in the AGP paper, and validates its conclusions:\n\n \u201cWe see that sparse models are able to outperform dense models which have significantly more parameters.\u201d\n\n The 80% sparse large model (which has 16.9M parameters and a perplexity of 83.64) is able to outperform the dense medium (which has 28.4M parameters and a perplexity of 84.21), a model which has 1.7 times more parameters.  It also outperform the dense large model, which exemplifies how pruning can act as a regularizer.\n* \u201cOur results show that pruning works very well not only on the dense LSTM weights and dense softmax layer but also the dense embedding matrix. This suggests that during the optimization procedure the neural network can find a good sparse embedding for the words in the vocabulary that works well together with the sparse connectivity structure of the LSTM weights and softmax layer.\u201d\n\n\nSetup\n\n\nWe start by cloning Pytorch\u2019s example \nrepository\n. I\u2019ve copied the language model code to distiller\u2019s examples/word_language_model directory, so I\u2019ll use that for the rest of the tutorial.\nNext, let\u2019s create and activate a virtual environment, as explained in Distiller's \nREADME\n file.\nNow we can turn our attention to \nmain.py\n, which contains the training application.\n\n\nPreparing the code\n\n\nWe begin by adding code to invoke Distiller in file \nmain.py\n.  This involves a bit of mechanics, because we did not \npip install\n Distiller in our environment (we don\u2019t have a \nsetup.py\n script for Distiller as of yet).  To make Distiller library functions accessible from \nmain.py\n, we modify \nsys.path\n to include the distiller root directory by taking the current directory and pointing two directories up.  This is very specific to the location of this example code, and it will break if you\u2019ve placed the code elsewhere \u2013 so be aware.\n\n\nimport os\nimport sys\nscript_dir = os.path.dirname(__file__)\nmodule_path = os.path.abspath(os.path.join(script_dir, '..', '..'))\nif module_path not in sys.path:\n    sys.path.append(module_path)\nimport distiller\nimport apputils\nfrom distiller.data_loggers import TensorBoardLogger, PythonLogger\n\n\n\n\nNext, we augment the application arguments with two Distiller-specific arguments.  The first, \n--summary\n, gives us the ability to do simple compression instrumentation (e.g. log sparsity statistics).  
The second argument, \n--compress\n, is how we tell the application where the compression scheduling file is located.\nWe also add two arguments - momentum and weight-decay - for the SGD optimizer.  As I explain later, I replaced the original code's optimizer with SGD, so we need these extra arguments.\n\n\n# Distiller-related arguments\nSUMMARY_CHOICES = ['sparsity', 'model', 'modules', 'png', 'percentile']\nparser.add_argument('--summary', type=str, choices=SUMMARY_CHOICES,\n                    help='print a summary of the model, and exit - options: ' +\n                    ' | '.join(SUMMARY_CHOICES))\nparser.add_argument('--compress', dest='compress', type=str, nargs='?', action='store',\n                    help='configuration file for pruning the model (default is to use hard-coded schedule)')\nparser.add_argument('--momentum', default=0., type=float, metavar='M',\n                    help='momentum')\nparser.add_argument('--weight-decay', '--wd', default=0., type=float,\n                    metavar='W', help='weight decay (default: 1e-4)')\n\n\n\n\nWe add code to handle the \n--summary\n application argument.  It can be as simple as forwarding to \ndistiller.model_summary\n or more complex, as in the Distiller sample.\n\n\nif args.summary:\n    distiller.model_summary(model, None, args.summary, 'wikitext2')\n    exit(0)\n\n\n\n\nSimilarly, we add code to handle the \n--compress\n argument, which creates a CompressionScheduler and configures it from a YAML schedule file:\n\n\nif args.compress:\n    source = args.compress\n    compression_scheduler = distiller.CompressionScheduler(model)\n    distiller.config.fileConfig(model, None, compression_scheduler, args.compress, msglogger)\n\n\n\n\nWe also create the optimizer, and the learning-rate decay policy scheduler.  The original PyTorch example manually manages the optimization and LR decay process, but I think that having a standard optimizer and LR-decay schedule gives us the flexibility to experiment with these during the training process.  Using an \nSGD optimizer\n configured with \nmomentum=0\n and \nweight_decay=0\n, and a \nReduceLROnPlateau LR-decay policy\n with \npatience=0\n and \nfactor=0.5\n will give the same behavior as in the original PyTorch example.  From there, we can experiment with the optimizer and LR-decay configuration.\n\n\noptimizer = torch.optim.SGD(model.parameters(), args.lr,\n                            momentum=args.momentum,\n                            weight_decay=args.weight_decay)\nlr_scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min',\n                                                          patience=0, verbose=True, factor=0.5)\n\n\n\n\nNext, we add code to setup the logging backends: a Python logger backend which reads its configuration from file and logs messages to the console and log file (\npylogger\n); and a TensorBoard backend logger which logs statistics to a TensorBoard data file (\ntflogger\n).  
I configured the TensorBoard backend to log gradients because RNNs suffer from vanishing and exploding gradients, so we might want to take a look in case the training experiences a sudden failure.\nThis code is not strictly required, but it is quite useful to be able to log the session progress, and to export logs to TensorBoard for realtime visualization of the training progress.\n\n\n# Distiller loggers\nmsglogger = apputils.config_pylogger('logging.conf', None)\ntflogger = TensorBoardLogger(msglogger.logdir)\ntflogger.log_gradients = True\npylogger = PythonLogger(msglogger)\n\n\n\n\nTraining loop\n\n\nNow we scroll down all the way to the train() function.  We'll change its signature to include the \nepoch\n, \noptimizer\n, and \ncompression_schdule\n.   We'll soon see why we need these.\n\n\ndef train(epoch, optimizer, compression_scheduler=None)\n\n\n\n\nFunction \ntrain()\n is responsible for training the network in batches for one epoch, and in its epoch loop we want to perform compression.   The \nCompressionScheduler\n invokes \nScheduledTrainingPolicy\n instances per the scheduling specification that was programmed in the \nCompressionScheduler\n instance.  There are four main \nSchedulingPolicy\n types: \nPruningPolicy\n, \nRegularizationPolicy\n, \nLRPolicy\n, and \nQuantizationPolicy\n.  We'll be using \nPruningPolicy\n, which is triggered \non_epoch_begin\n (to invoke the \nPruners\n, and \non_minibatch_begin\n (to mask the weights).   Later we will create a YAML scheduling file, and specify the schedule of \nAutomatedGradualPruner\n instances.  \n\n\nBecause we are writing a single application, which can be used with various Policies in the future (e.g. group-lasso regularization), we should add code to invoke all of the \nCompressionScheduler\n's callbacks, not just the mandatory \non_epoch_begin\n callback.    
We invoke \non_minibatch_begin\n before running the forward-pass, \nbefore_backward_pass\n after computing the loss, and \non_minibatch_end\n after completing the backward-pass.\n\n\n\ndef train(epoch, optimizer, compression_scheduler=None):\n    ...\n\n    # The line below was fixed as per: https://github.com/pytorch/examples/issues/214\n    for batch, i in enumerate(range(0, train_data.size(0), args.bptt)):\n        data, targets = get_batch(train_data, i)\n        # Starting each batch, we detach the hidden state from how it was previously produced.\n        # If we didn't, the model would try backpropagating all the way to start of the dataset.\n        hidden = repackage_hidden(hidden)\n\n        \nif compression_scheduler:\n            compression_scheduler.on_minibatch_begin(epoch, minibatch_id=batch, minibatches_per_epoch=steps_per_epoch)\n\n        output, hidden = model(data, hidden)\n        loss = criterion(output.view(-1, ntokens), targets)\n\n        \nif compression_scheduler:\n            compression_scheduler.before_backward_pass(epoch, minibatch_id=batch,\n                                                       minibatches_per_epoch=steps_per_epoch,\n                                                       loss=loss)\n\n        optimizer.zero_grad()\n        loss.backward()\n\n        # `clip_grad_norm` helps prevent the exploding gradient problem in RNNs / LSTMs.\n        torch.nn.utils.clip_grad_norm_(model.parameters(), args.clip)\n        optimizer.step()\n\n        total_loss += loss.item()\n\n        \nif compression_scheduler:\n            compression_scheduler.on_minibatch_end(epoch, minibatch_id=batch, minibatches_per_epoch=steps_per_epoch)\n\n\n\n\n\nThe rest of the code could stay as in the original PyTorch sample, but I wanted to use an SGD optimizer, so I replaced:\n\n\nfor p in model.parameters():\n    p.data.add_(-lr, p.grad.data)\n\n\n\n\nwith:\n\n\noptimizer.step()\n\n\n\n\nThe rest of the code in function \ntrain()\n logs to a text file and a \nTensorBoard\n backend.  Again, such code is not mandatory, but a few lines give us a lot of visibility: we have training progress information saved to log, and we can monitor the training progress in realtime on TensorBoard.  That's a lot for a few lines of code ;-)\n\n\n\nif batch % args.log_interval == 0 and batch > 0:\n    cur_loss = total_loss / args.log_interval\n    elapsed = time.time() - start_time\n    lr = optimizer.param_groups[0]['lr']\n    msglogger.info(\n            '| epoch {:3d} | {:5d}/{:5d} batches | lr {:02.4f} | ms/batch {:5.2f} '\n            '| loss {:5.2f} | ppl {:8.2f}'.format(\n        epoch, batch, len(train_data) // args.bptt, lr,\n        elapsed * 1000 / args.log_interval, cur_loss, math.exp(cur_loss)))\n    total_loss = 0\n    start_time = time.time()\n    stats = ('Peformance/Training/',\n        OrderedDict([\n            ('Loss', cur_loss),\n            ('Perplexity', math.exp(cur_loss)),\n            ('LR', lr),\n            ('Batch Time', elapsed * 1000)])\n        )\n    steps_completed = batch + 1\n    distiller.log_training_progress(stats, model.named_parameters(), epoch, steps_completed,\n                                    steps_per_epoch, args.log_interval, [tflogger])\n\n\n\n\nFinally we get to the outer training-loop which loops on \nargs.epochs\n.  
We add the two final \nCompressionScheduler\n callbacks: \non_epoch_begin\n, at the start of the loop, and \non_epoch_end\n after running \nevaluate\n on the model and updating the learning-rate.\n\n\n\ntry:\n    for epoch in range(0, args.epochs):\n        epoch_start_time = time.time()\n        \nif compression_scheduler:\n            compression_scheduler.on_epoch_begin(epoch)\n\n\n        train(epoch, optimizer, compression_scheduler)\n        val_loss = evaluate(val_data)\n        lr_scheduler.step(val_loss)\n\n        \nif compression_scheduler:\n            compression_scheduler.on_epoch_end(epoch)\n\n\n\n\n\nAnd that's it!  The language model sample is ready for compression.  \n\n\nCreating compression baselines\n\n\nIn \nTo prune, or not to prune: exploring the efficacy of pruning for model compression\n Zhu and Gupta, \"compare the accuracy of large, but pruned models (large-sparse) and their smaller, but dense (small-dense) counterparts with identical memory footprint.\" They also \"propose a new gradual pruning technique that is simple and straightforward to apply across a variety of models/datasets with minimal tuning.\"\n\nThis pruning schedule is implemented by distiller.AutomatedGradualPruner (AGP), which increases the sparsity level (expressed as a percentage of zero-valued elements) gradually over several pruning steps. Distiller's implementation only prunes elements once in an epoch (the model is fine-tuned in between pruning events), which is a small deviation from Zhu and Gupta's paper. The research paper specifies the schedule in terms of mini-batches, while our implementation specifies the schedule in terms of epochs. We feel that using epochs performs well, and is more \"stable\", since the number of mini-batches will change, if you change the batch size.\n\n\nBefore we start compressing stuff ;-), we need to create baselines so we have something to benchmark against.  Let's prepare small, medium, and large baseline models, like Table 3 of \nTo prune, or Not to Prune\n.  These will provide baseline perplexity results that we'll compare the compressed models against.  \n\nI chose to use tied input/output embeddings, and constrained the training to 40 epochs.  The table below shows the model sizes, where we are interested in the tied version (biases are ignored due to their small size and because we don't prune them).\n\n\n\n\n\n\n\n\nSize\n\n\nNumber of Weights (untied)\n\n\nNumber of Weights (tied)\n\n\n\n\n\n\n\n\n\n\nSmall\n\n\n13,951,200\n\n\n7,295,600\n\n\n\n\n\n\nMedium\n\n\n50,021,400\n\n\n28,390,700\n\n\n\n\n\n\nLarge\n\n\n135,834,000\n\n\n85,917,000\n\n\n\n\n\n\n\n\nI started experimenting with the optimizer setup like in the PyTorch example, but I added some L2 regularization when I noticed that the training was overfitting.  The two right columns show the perplexity results (lower is better) of each of the models with no L2 regularization and with 1e-5 and 1e-6.\nIn all three model sizes using the smaller L2 regularization (1e-6) gave the best results.  
BTW, I'm not showing here experiments with even lower regularization because that did not help.\n\n\n\n\n\n\n\n\nType\n\n\nCommand line\n\n\nValidation\n\n\nTest\n\n\n\n\n\n\n\n\n\n\nSmall\n\n\ntime python3 main.py --cuda --epochs 40 --tied\n\n\n105.23\n\n\n99.53\n\n\n\n\n\n\nSmall\n\n\ntime python3 main.py --cuda --epochs 40 --tied --wd=1e-6\n\n\n101.13\n\n\n96.29\n\n\n\n\n\n\nSmall\n\n\ntime python3 main.py --cuda --epochs 40 --tied --wd=1e-5\n\n\n109.49\n\n\n103.53\n\n\n\n\n\n\nMedium\n\n\ntime python3 main.py --cuda --emsize 650 --nhid 650 --dropout 0.5 --epochs 40 --tied\n\n\n90.93\n\n\n86.20\n\n\n\n\n\n\nMedium\n\n\ntime python3 main.py --cuda --emsize 650 --nhid 650 --dropout 0.5 --epochs 40 --tied --wd=1e-6\n\n\n88.17\n\n\n84.21\n\n\n\n\n\n\nMedium\n\n\ntime python3 main.py --cuda --emsize 650 --nhid 650 --dropout 0.5 --epochs 40 --tied --wd=1e-5\n\n\n97.75\n\n\n93.06\n\n\n\n\n\n\nLarge\n\n\ntime python3 main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --tied\n\n\n88.23\n\n\n84.21\n\n\n\n\n\n\nLarge\n\n\ntime python3 main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --tied --wd=1e-6\n\n\n87.49\n\n\n83.85\n\n\n\n\n\n\nLarge\n\n\ntime python3 main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --tied --wd=1e-5\n\n\n99.22\n\n\n94.28\n\n\n\n\n\n\n\n\nCompressing the language model\n\n\nOK, so now let's recreate the results of the language model experiment from section 4.2 of paper.  We're using PyTorch's sample, so the language model we implement is not exactly like the one in the AGP paper (and uses a different dataset), but it's close enough, so if everything goes well, we should see similar compression results.\n\n\nWhat are we compressing?\n\n\nTo gain insight about the model parameters, we can use the command-line to produce a weights-sparsity table:\n\n\n$ python3 main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --tied --summary=sparsity\n\nParameters:\n+---------+------------------+---------------+---------------+----------------+------------+------------+----------+----------+----------+------------+---------+----------+------------+\n|         | Name             | Shape         |   NNZ (dense) |   NNZ (sparse) |   Cols (%) |   Rows (%) |   Ch (%) |   2D (%) |   3D (%) |   Fine (%) |     Std |     Mean |   Abs-Mean |\n|---------+------------------+---------------+---------------+----------------+------------+------------+----------+----------+----------+------------+---------+----------+------------|\n| 0.00000 | encoder.weight   | (33278, 1500) |      49917000 |       49916999 |    0.00000 |    0.00000 |        0 |  0.00000 |        0 |    0.00000 | 0.05773 | -0.00000 |    0.05000 |\n| 1.00000 | rnn.weight_ih_l0 | (6000, 1500)  |       9000000 |        9000000 |    0.00000 |    0.00000 |        0 |  0.00000 |        0 |    0.00000 | 0.01491 |  0.00001 |    0.01291 |\n| 2.00000 | rnn.weight_hh_l0 | (6000, 1500)  |       9000000 |        8999999 |    0.00000 |    0.00000 |        0 |  0.00000 |        0 |    0.00001 | 0.01491 |  0.00000 |    0.01291 |\n| 3.00000 | rnn.weight_ih_l1 | (6000, 1500)  |       9000000 |        8999999 |    0.00000 |    0.00000 |        0 |  0.00000 |        0 |    0.00001 | 0.01490 | -0.00000 |    0.01291 |\n| 4.00000 | rnn.weight_hh_l1 | (6000, 1500)  |       9000000 |        9000000 |    0.00000 |    0.00000 |        0 |  0.00000 |        0 |    0.00000 | 0.01491 | -0.00000 |    0.01291 |\n| 5.00000 | decoder.weight   | (33278, 1500) |      49917000 |       49916999 |    0.00000 |    0.00000 |        0 |  0.00000 |        0 | 
   0.00000 | 0.05773 | -0.00000 |    0.05000 |\n| 6.00000 | Total sparsity:  | -             |     135834000 |      135833996 |    0.00000 |    0.00000 |        0 |  0.00000 |        0 |    0.00000 | 0.00000 |  0.00000 |    0.00000 |\n+---------+------------------+---------------+---------------+----------------+------------+------------+----------+----------+----------+------------+---------+----------+------------+\nTotal sparsity: 0.00\n\n\n\n\nSo what's going on here?\n\nencoder.weight\n and \ndecoder.weight\n are the input and output embeddings, respectively.  Remember that in the configuration I chose for the three model sizes these embeddings are tied, which means that we only have one copy of parameters, that is shared between the encoder and decoder.\nWe also have two pairs of RNN (LSTM really) parameters.  There is a pair because the model uses the command-line argument \nargs.nlayers\n to decide how many instances of RNN (or LSTM or GRU) cells to use, and it defaults to 2.  The recurrent cells are LSTM cells, because this is the default of \nargs.model\n, which is used in the initialization of \nRNNModel\n.  Let's look at the parameters of the first RNN: \nrnn.weight_ih_l0\n and \nrnn.weight_hh_l0\n: what are these?\n\nRecall the \nLSTM equations\n that PyTorch implements.  In the equations, there are 8 instances of vector-matrix multiplication (when batch=1).  These can be combined into a single matrix-matrix multiplication (GEMM), but PyTorch groups these into two GEMM operations: one GEMM multiplies the inputs (\nrnn.weight_ih_l0\n), and the other multiplies the hidden-state (\nrnn.weight_hh_l0\n).  \n\n\nHow are we compressing?\n\n\nLet's turn to the configurations of the Large language model compression schedule to 70%, 80%, 90% and 95% sparsity. Using AGP it is easy to configure the pruning schedule to produce an exact sparsity of the compressed model.  I'll use the \n70% schedule\n to show a concrete example.\n\n\nThe YAML file has two sections: \npruners\n and \npolicies\n.  Section \npruners\n defines instances of \nParameterPruner\n - in our case we define three instances of \nAutomatedGradualPruner\n: for the weights of the first RNN (\nl0_rnn_pruner\n), the second RNN (\nl1_rnn_pruner\n) and the embedding layer (\nembedding_pruner\n).  These names are arbitrary, and serve are name-handles which bind Policies to Pruners - so you can use whatever names you want.\nEach \nAutomatedGradualPruner\n is configured with an \ninitial_sparsity\n and \nfinal_sparsity\n.  For examples, the \nl0_rnn_pruner\n below is configured to prune 5% of the weights as soon as it starts working, and finish when 70% of the weights have been pruned.  The \nweights\n parameter tells the Pruner which weight tensors to prune.\n\n\npruners:\n  l0_rnn_pruner:\n    class: AutomatedGradualPruner\n    initial_sparsity : 0.05\n    final_sparsity: 0.70\n    weights: [rnn.weight_ih_l0, rnn.weight_hh_l0]\n\n  l1_rnn_pruner:\n    class: AutomatedGradualPruner\n    initial_sparsity : 0.05\n    final_sparsity: 0.70\n    weights: [rnn.weight_ih_l1, rnn.weight_hh_l1]\n\n  embedding_pruner:\n    class: AutomatedGradualPruner\n    initial_sparsity : 0.05\n    final_sparsity: 0.70\n    weights: [encoder.weight]\n\n\n\n\nWhen are we compressing?\n\n\nIf the \npruners\n section defines \"what-to-do\", the \npolicies\n section defines \"when-to-do\".  
This part is harder, because we define the pruning schedule, which requires us to try a few different schedules until we understand which schedule works best.\nBelow we define three \nPruningPolicy\n instances.  The first two instances start operating at epoch 2 (\nstarting_epoch\n), end at epoch 20 (\nending_epoch\n), and operate once every epoch (\nfrequency\n; as I explained above, Distiller's Pruning scheduling operates only at \non_epoch_begin\n).  In between pruning operations, the pruned model is fine-tuned.\n\n\npolicies:\n  - pruner:\n      instance_name : l0_rnn_pruner\n    starting_epoch: 2\n    ending_epoch: 20  \n    frequency: 1\n\n  - pruner:\n      instance_name : l1_rnn_pruner\n    starting_epoch: 2\n    ending_epoch: 20\n    frequency: 1\n\n  - pruner:\n      instance_name : embedding_pruner\n    starting_epoch: 3\n    ending_epoch: 21\n    frequency: 1\n\n\n\n\nWe invoke the compression as follows:\n\n\n$ time python3 main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --tied --compress=../../examples/agp-pruning/word_lang_model.LARGE_70.schedule_agp.yaml\n\n\n\n\nTable 1\n above shows that we can make a negligible improvement when adding L2 regularization.  I did some experimenting with the sparsity distribution between the layers, and the scheduling frequency and noticed that the embedding layers are much less sensitive to pruning than the RNN cells.  I didn't notice any difference between the RNN cells, but I also didn't invest in this exploration.\nA new \n70% sparsity schedule\n, prunes the RNNs only to 50% sparsity, but prunes the embedding to 85% sparsity, and achieves almost a 3 points improvement in the test perplexity results.\n\n\nWe provide \nsimilar pruning schedules\n for the other compression rates.\n\n\nUntil next time\n\n\nThis concludes the first part of the tutorial on pruning a PyTorch language model.\n\nIn the next installment, I'll explain how we added an implementation of Baidu Research's \nExploring Sparsity in Recurrent Neural Networks\n paper, and applied to this language model.\n\n\nGeek On.", 
-            "title": "Pruning a Language Model"
-        }, 
-        {
-            "location": "/tutorial-lang_model/index.html#using-distiller-to-prune-a-pytorch-language-model", 
-            "text": "", 
-            "title": "Using Distiller to prune a PyTorch language model"
-        }, 
-        {
-            "location": "/tutorial-lang_model/index.html#contents", 
-            "text": "Introduction  Setup  Preparing the code  Training-loop  Creating compression baselines  Compressing the language model  What are we compressing?  How are we compressing?  When are we compressing?  Until next time", 
-            "title": "Contents"
-        }, 
-        {
-            "location": "/tutorial-lang_model/index.html#introduction", 
-            "text": "In this tutorial I'll show you how to compress a word-level language model using  Distiller .  Specifically, we use PyTorch\u2019s  word-level language model sample code  as the code-base of our example, weave in some Distiller code, and show how we compress the model using two different element-wise pruning algorithms.  To make things manageable, I've divided the tutorial to two parts: in the first we will setup the sample application and prune using  AGP .  In the second part I'll show how I've added Baidu's RNN pruning algorithm and then use it to prune the same word-level language model.  The completed code is available  here .  The results are displayed below and the code is available  here .\nNote that we can improve the results by training longer, since the loss curves are usually still decreasing at the end of epoch 40.  However, for demonstration purposes we don\u2019t need to do this.     Type  Sparsity  NNZ  Validation  Test  Command line      Small  0%  7,135,600  101.13  96.29  time python3 main.py --cuda --epochs 40 --tied --wd=1e-6    Medium  0%  28,390,700  88.17  84.21  time python3 main.py --cuda --emsize 650 --nhid 650 --dropout 0.5 --epochs 40 --tied,--wd=1e-6    Large  0%  85,917,000  87.49  83.85  time python3 main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --tied --wd=1e-6    Large  70%  25,487,550  90.67  85.96  time python3 main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --tied --compress=../../examples/agp-pruning/word_lang_model.LARGE_70.schedule_agp.yaml    Large  70%  25,487,550  90.59  85.84  time python3 main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --tied --compress=../../examples/agp-pruning/word_lang_model.LARGE_70.schedule_agp.yaml --wd=1e-6    Large  70%  25,487,550  87.40  82.93  time python3 main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --tied --compress=../../examples/agp-pruning/word_lang_model.LARGE_70B.schedule_agp.yaml --wd=1e-6    Large  80.4%  16,847,550  89.31  83.64  time python3 main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --tied --compress=../../examples/agp-pruning/word_lang_model.LARGE_80.schedule_agp.yaml --wd=1e-6    Large  90%  8,591,700  90.70  85.67  time python3 main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --tied --compress=../../examples/agp-pruning/word_lang_model.LARGE_90.schedule_agp.yaml --wd=1e-6    Large  95%  4,295,850  98.42  92.79  time python3 main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --tied --compress=../../examples/agp-pruning/word_lang_model.LARGE_95.schedule_agp.yaml --wd=1e-6     Table 1: AGP language model pruning results.  NNZ stands for number of non-zero coefficients (embeddings are counted once, because they are tied).   \n   Figure 1: Perplexity vs model size (lower perplexity is better).   The model is composed of an Encoder embedding, two LSTMs, and a Decoder embedding.  The Encoder and decoder embeddings (projections) are tied to improve perplexity results (per https://arxiv.org/pdf/1611.01462.pdf), so in the sparsity statistics we account for only one of the encoder/decoder embeddings.  We used the WikiText2 dataset (twice as large as PTB).  We compared three model sizes: small (7.1M; 14M), medium (28M; 50M), large: (86M; 136M) \u2013 reported as (#parameters net/tied; #parameters gross).\nThe results reported below use a preset seed (for reproducibility), and we expect results can be improved if we allow \u201ctrue\u201d pseudo-randomness.  
We limited our tests to 40 epochs, even though validation perplexity was still trending down.  Essentially, this recreates the language model experiment in the AGP paper, and validates its conclusions:  \u201cWe see that sparse models are able to outperform dense models which have significantly more parameters.\u201d  The 80% sparse large model (which has 16.9M parameters and a perplexity of 83.64) is able to outperform the dense medium (which has 28.4M parameters and a perplexity of 84.21), a model which has 1.7 times more parameters.  It also outperform the dense large model, which exemplifies how pruning can act as a regularizer.\n* \u201cOur results show that pruning works very well not only on the dense LSTM weights and dense softmax layer but also the dense embedding matrix. This suggests that during the optimization procedure the neural network can find a good sparse embedding for the words in the vocabulary that works well together with the sparse connectivity structure of the LSTM weights and softmax layer.\u201d", 
-            "title": "Introduction"
-        }, 
-        {
-            "location": "/tutorial-lang_model/index.html#setup", 
-            "text": "We start by cloning Pytorch\u2019s example  repository . I\u2019ve copied the language model code to distiller\u2019s examples/word_language_model directory, so I\u2019ll use that for the rest of the tutorial.\nNext, let\u2019s create and activate a virtual environment, as explained in Distiller's  README  file.\nNow we can turn our attention to  main.py , which contains the training application.", 
-            "title": "Setup"
-        }, 
-        {
-            "location": "/tutorial-lang_model/index.html#preparing-the-code", 
-            "text": "We begin by adding code to invoke Distiller in file  main.py .  This involves a bit of mechanics, because we did not  pip install  Distiller in our environment (we don\u2019t have a  setup.py  script for Distiller as of yet).  To make Distiller library functions accessible from  main.py , we modify  sys.path  to include the distiller root directory by taking the current directory and pointing two directories up.  This is very specific to the location of this example code, and it will break if you\u2019ve placed the code elsewhere \u2013 so be aware.  import os\nimport sys\nscript_dir = os.path.dirname(__file__)\nmodule_path = os.path.abspath(os.path.join(script_dir, '..', '..'))\nif module_path not in sys.path:\n    sys.path.append(module_path)\nimport distiller\nimport apputils\nfrom distiller.data_loggers import TensorBoardLogger, PythonLogger  Next, we augment the application arguments with two Distiller-specific arguments.  The first,  --summary , gives us the ability to do simple compression instrumentation (e.g. log sparsity statistics).  The second argument,  --compress , is how we tell the application where the compression scheduling file is located.\nWe also add two arguments - momentum and weight-decay - for the SGD optimizer.  As I explain later, I replaced the original code's optimizer with SGD, so we need these extra arguments.  # Distiller-related arguments\nSUMMARY_CHOICES = ['sparsity', 'model', 'modules', 'png', 'percentile']\nparser.add_argument('--summary', type=str, choices=SUMMARY_CHOICES,\n                    help='print a summary of the model, and exit - options: ' +\n                    ' | '.join(SUMMARY_CHOICES))\nparser.add_argument('--compress', dest='compress', type=str, nargs='?', action='store',\n                    help='configuration file for pruning the model (default is to use hard-coded schedule)')\nparser.add_argument('--momentum', default=0., type=float, metavar='M',\n                    help='momentum')\nparser.add_argument('--weight-decay', '--wd', default=0., type=float,\n                    metavar='W', help='weight decay (default: 1e-4)')  We add code to handle the  --summary  application argument.  It can be as simple as forwarding to  distiller.model_summary  or more complex, as in the Distiller sample.  if args.summary:\n    distiller.model_summary(model, None, args.summary, 'wikitext2')\n    exit(0)  Similarly, we add code to handle the  --compress  argument, which creates a CompressionScheduler and configures it from a YAML schedule file:  if args.compress:\n    source = args.compress\n    compression_scheduler = distiller.CompressionScheduler(model)\n    distiller.config.fileConfig(model, None, compression_scheduler, args.compress, msglogger)  We also create the optimizer, and the learning-rate decay policy scheduler.  The original PyTorch example manually manages the optimization and LR decay process, but I think that having a standard optimizer and LR-decay schedule gives us the flexibility to experiment with these during the training process.  Using an  SGD optimizer  configured with  momentum=0  and  weight_decay=0 , and a  ReduceLROnPlateau LR-decay policy  with  patience=0  and  factor=0.5  will give the same behavior as in the original PyTorch example.  From there, we can experiment with the optimizer and LR-decay configuration.  
optimizer = torch.optim.SGD(model.parameters(), args.lr,\n                            momentum=args.momentum,\n                            weight_decay=args.weight_decay)\nlr_scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min',\n                                                          patience=0, verbose=True, factor=0.5)  Next, we add code to set up the logging backends: a Python logger backend which reads its configuration from a file and logs messages to the console and log file ( pylogger ); and a TensorBoard backend logger which logs statistics to a TensorBoard data file ( tflogger ).  I configured the TensorBoard backend to log gradients because RNNs suffer from vanishing and exploding gradients, so we might want to take a look in case the training experiences a sudden failure.\nThis code is not strictly required, but it is quite useful to be able to log the session progress, and to export logs to TensorBoard for realtime visualization of the training progress.  # Distiller loggers\nmsglogger = apputils.config_pylogger('logging.conf', None)\ntflogger = TensorBoardLogger(msglogger.logdir)\ntflogger.log_gradients = True\npylogger = PythonLogger(msglogger)", 
-            "title": "Preparing the code"
-        }, 
-        {
-            "location": "/tutorial-lang_model/index.html#training-loop", 
-            "text": "Now we scroll down all the way to the train() function.  We'll change its signature to include the  epoch ,  optimizer , and  compression_schdule .   We'll soon see why we need these.  def train(epoch, optimizer, compression_scheduler=None)  Function  train()  is responsible for training the network in batches for one epoch, and in its epoch loop we want to perform compression.   The  CompressionScheduler  invokes  ScheduledTrainingPolicy  instances per the scheduling specification that was programmed in the  CompressionScheduler  instance.  There are four main  SchedulingPolicy  types:  PruningPolicy ,  RegularizationPolicy ,  LRPolicy , and  QuantizationPolicy .  We'll be using  PruningPolicy , which is triggered  on_epoch_begin  (to invoke the  Pruners , and  on_minibatch_begin  (to mask the weights).   Later we will create a YAML scheduling file, and specify the schedule of  AutomatedGradualPruner  instances.    Because we are writing a single application, which can be used with various Policies in the future (e.g. group-lasso regularization), we should add code to invoke all of the  CompressionScheduler 's callbacks, not just the mandatory  on_epoch_begin  callback.    We invoke  on_minibatch_begin  before running the forward-pass,  before_backward_pass  after computing the loss, and  on_minibatch_end  after completing the backward-pass.  \ndef train(epoch, optimizer, compression_scheduler=None):\n    ...\n\n    # The line below was fixed as per: https://github.com/pytorch/examples/issues/214\n    for batch, i in enumerate(range(0, train_data.size(0), args.bptt)):\n        data, targets = get_batch(train_data, i)\n        # Starting each batch, we detach the hidden state from how it was previously produced.\n        # If we didn't, the model would try backpropagating all the way to start of the dataset.\n        hidden = repackage_hidden(hidden)\n\n         if compression_scheduler:\n            compression_scheduler.on_minibatch_begin(epoch, minibatch_id=batch, minibatches_per_epoch=steps_per_epoch) \n        output, hidden = model(data, hidden)\n        loss = criterion(output.view(-1, ntokens), targets)\n\n         if compression_scheduler:\n            compression_scheduler.before_backward_pass(epoch, minibatch_id=batch,\n                                                       minibatches_per_epoch=steps_per_epoch,\n                                                       loss=loss) \n        optimizer.zero_grad()\n        loss.backward()\n\n        # `clip_grad_norm` helps prevent the exploding gradient problem in RNNs / LSTMs.\n        torch.nn.utils.clip_grad_norm_(model.parameters(), args.clip)\n        optimizer.step()\n\n        total_loss += loss.item()\n\n         if compression_scheduler:\n            compression_scheduler.on_minibatch_end(epoch, minibatch_id=batch, minibatches_per_epoch=steps_per_epoch)   The rest of the code could stay as in the original PyTorch sample, but I wanted to use an SGD optimizer, so I replaced:  for p in model.parameters():\n    p.data.add_(-lr, p.grad.data)  with:  optimizer.step()  The rest of the code in function  train()  logs to a text file and a  TensorBoard  backend.  Again, such code is not mandatory, but a few lines give us a lot of visibility: we have training progress information saved to log, and we can monitor the training progress in realtime on TensorBoard.  
That's a lot for a few lines of code ;-)  \nif batch % args.log_interval == 0 and batch > 0:\n    cur_loss = total_loss / args.log_interval\n    elapsed = time.time() - start_time\n    lr = optimizer.param_groups[0]['lr']\n    msglogger.info(\n            '| epoch {:3d} | {:5d}/{:5d} batches | lr {:02.4f} | ms/batch {:5.2f} '\n            '| loss {:5.2f} | ppl {:8.2f}'.format(\n        epoch, batch, len(train_data) // args.bptt, lr,\n        elapsed * 1000 / args.log_interval, cur_loss, math.exp(cur_loss)))\n    total_loss = 0\n    start_time = time.time()\n    stats = ('Performance/Training/',\n        OrderedDict([\n            ('Loss', cur_loss),\n            ('Perplexity', math.exp(cur_loss)),\n            ('LR', lr),\n            ('Batch Time', elapsed * 1000)])\n        )\n    steps_completed = batch + 1\n    distiller.log_training_progress(stats, model.named_parameters(), epoch, steps_completed,\n                                    steps_per_epoch, args.log_interval, [tflogger])  Finally, we get to the outer training-loop, which loops on  args.epochs .  We add the two final  CompressionScheduler  callbacks:  on_epoch_begin , at the start of the loop, and  on_epoch_end  after running  evaluate  on the model and updating the learning-rate.  \ntry:\n    for epoch in range(0, args.epochs):\n        epoch_start_time = time.time()\n         if compression_scheduler:\n            compression_scheduler.on_epoch_begin(epoch) \n\n        train(epoch, optimizer, compression_scheduler)\n        val_loss = evaluate(val_data)\n        lr_scheduler.step(val_loss)\n\n         if compression_scheduler:\n            compression_scheduler.on_epoch_end(epoch)   And that's it!  The language model sample is ready for compression.", 
-            "title": "Training loop"
-        }, 
-        {
-            "location": "/tutorial-lang_model/index.html#creating-compression-baselines", 
-            "text": "In  To prune, or not to prune: exploring the efficacy of pruning for model compression  Zhu and Gupta, \"compare the accuracy of large, but pruned models (large-sparse) and their smaller, but dense (small-dense) counterparts with identical memory footprint.\" They also \"propose a new gradual pruning technique that is simple and straightforward to apply across a variety of models/datasets with minimal tuning.\" \nThis pruning schedule is implemented by distiller.AutomatedGradualPruner (AGP), which increases the sparsity level (expressed as a percentage of zero-valued elements) gradually over several pruning steps. Distiller's implementation only prunes elements once in an epoch (the model is fine-tuned in between pruning events), which is a small deviation from Zhu and Gupta's paper. The research paper specifies the schedule in terms of mini-batches, while our implementation specifies the schedule in terms of epochs. We feel that using epochs performs well, and is more \"stable\", since the number of mini-batches will change, if you change the batch size.  Before we start compressing stuff ;-), we need to create baselines so we have something to benchmark against.  Let's prepare small, medium, and large baseline models, like Table 3 of  To prune, or Not to Prune .  These will provide baseline perplexity results that we'll compare the compressed models against.   \nI chose to use tied input/output embeddings, and constrained the training to 40 epochs.  The table below shows the model sizes, where we are interested in the tied version (biases are ignored due to their small size and because we don't prune them).     Size  Number of Weights (untied)  Number of Weights (tied)      Small  13,951,200  7,295,600    Medium  50,021,400  28,390,700    Large  135,834,000  85,917,000     I started experimenting with the optimizer setup like in the PyTorch example, but I added some L2 regularization when I noticed that the training was overfitting.  The two right columns show the perplexity results (lower is better) of each of the models with no L2 regularization and with 1e-5 and 1e-6.\nIn all three model sizes using the smaller L2 regularization (1e-6) gave the best results.  BTW, I'm not showing here experiments with even lower regularization because that did not help.     Type  Command line  Validation  Test      Small  time python3 main.py --cuda --epochs 40 --tied  105.23  99.53    Small  time python3 main.py --cuda --epochs 40 --tied --wd=1e-6  101.13  96.29    Small  time python3 main.py --cuda --epochs 40 --tied --wd=1e-5  109.49  103.53    Medium  time python3 main.py --cuda --emsize 650 --nhid 650 --dropout 0.5 --epochs 40 --tied  90.93  86.20    Medium  time python3 main.py --cuda --emsize 650 --nhid 650 --dropout 0.5 --epochs 40 --tied --wd=1e-6  88.17  84.21    Medium  time python3 main.py --cuda --emsize 650 --nhid 650 --dropout 0.5 --epochs 40 --tied --wd=1e-5  97.75  93.06    Large  time python3 main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --tied  88.23  84.21    Large  time python3 main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --tied --wd=1e-6  87.49  83.85    Large  time python3 main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --tied --wd=1e-5  99.22  94.28", 
-            "title": "Creating compression baselines"
-        }, 
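The cubic sparsity ramp behind AGP is worth seeing in code. Below is a minimal sketch of the schedule from Zhu and Gupta's paper (an illustration, not Distiller's actual implementation; the epoch bounds mirror the 70% schedule discussed later in this tutorial):

```python
# Sketch of the AGP sparsity ramp from Zhu & Gupta (2017):
#   s_t = s_f + (s_i - s_f) * (1 - (t - t0) / (n * dt))**3
# Illustration only; Distiller's AutomatedGradualPruner implements this internally.

def agp_sparsity(epoch, initial_sparsity=0.05, final_sparsity=0.70,
                 starting_epoch=2, ending_epoch=20):
    """Return the target sparsity for a given epoch (clamped to the ramp)."""
    if epoch <= starting_epoch:
        return initial_sparsity
    if epoch >= ending_epoch:
        return final_sparsity
    progress = (epoch - starting_epoch) / (ending_epoch - starting_epoch)
    return final_sparsity + (initial_sparsity - final_sparsity) * (1 - progress) ** 3

for epoch in range(0, 22, 2):
    print(f"epoch {epoch:2d}: target sparsity = {agp_sparsity(epoch):.3f}")
```

Note how the ramp starts at the initial sparsity and flattens as it approaches the final sparsity: most of the pruning happens early, while plenty of fine-tuning epochs remain.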
-        {
-            "location": "/tutorial-lang_model/index.html#compressing-the-language-model", 
-            "text": "OK, so now let's recreate the results of the language model experiment from section 4.2 of paper.  We're using PyTorch's sample, so the language model we implement is not exactly like the one in the AGP paper (and uses a different dataset), but it's close enough, so if everything goes well, we should see similar compression results.", 
-            "title": "Compressing the language model"
-        }, 
-        {
-            "location": "/tutorial-lang_model/index.html#what-are-we-compressing", 
-            "text": "To gain insight about the model parameters, we can use the command-line to produce a weights-sparsity table:  $ python3 main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --tied --summary=sparsity\n\nParameters:\n+---------+------------------+---------------+---------------+----------------+------------+------------+----------+----------+----------+------------+---------+----------+------------+\n|         | Name             | Shape         |   NNZ (dense) |   NNZ (sparse) |   Cols (%) |   Rows (%) |   Ch (%) |   2D (%) |   3D (%) |   Fine (%) |     Std |     Mean |   Abs-Mean |\n|---------+------------------+---------------+---------------+----------------+------------+------------+----------+----------+----------+------------+---------+----------+------------|\n| 0.00000 | encoder.weight   | (33278, 1500) |      49917000 |       49916999 |    0.00000 |    0.00000 |        0 |  0.00000 |        0 |    0.00000 | 0.05773 | -0.00000 |    0.05000 |\n| 1.00000 | rnn.weight_ih_l0 | (6000, 1500)  |       9000000 |        9000000 |    0.00000 |    0.00000 |        0 |  0.00000 |        0 |    0.00000 | 0.01491 |  0.00001 |    0.01291 |\n| 2.00000 | rnn.weight_hh_l0 | (6000, 1500)  |       9000000 |        8999999 |    0.00000 |    0.00000 |        0 |  0.00000 |        0 |    0.00001 | 0.01491 |  0.00000 |    0.01291 |\n| 3.00000 | rnn.weight_ih_l1 | (6000, 1500)  |       9000000 |        8999999 |    0.00000 |    0.00000 |        0 |  0.00000 |        0 |    0.00001 | 0.01490 | -0.00000 |    0.01291 |\n| 4.00000 | rnn.weight_hh_l1 | (6000, 1500)  |       9000000 |        9000000 |    0.00000 |    0.00000 |        0 |  0.00000 |        0 |    0.00000 | 0.01491 | -0.00000 |    0.01291 |\n| 5.00000 | decoder.weight   | (33278, 1500) |      49917000 |       49916999 |    0.00000 |    0.00000 |        0 |  0.00000 |        0 |    0.00000 | 0.05773 | -0.00000 |    0.05000 |\n| 6.00000 | Total sparsity:  | -             |     135834000 |      135833996 |    0.00000 |    0.00000 |        0 |  0.00000 |        0 |    0.00000 | 0.00000 |  0.00000 |    0.00000 |\n+---------+------------------+---------------+---------------+----------------+------------+------------+----------+----------+----------+------------+---------+----------+------------+\nTotal sparsity: 0.00  So what's going on here? encoder.weight  and  decoder.weight  are the input and output embeddings, respectively.  Remember that in the configuration I chose for the three model sizes these embeddings are tied, which means that we only have one copy of parameters, that is shared between the encoder and decoder.\nWe also have two pairs of RNN (LSTM really) parameters.  There is a pair because the model uses the command-line argument  args.nlayers  to decide how many instances of RNN (or LSTM or GRU) cells to use, and it defaults to 2.  The recurrent cells are LSTM cells, because this is the default of  args.model , which is used in the initialization of  RNNModel .  Let's look at the parameters of the first RNN:  rnn.weight_ih_l0  and  rnn.weight_hh_l0 : what are these? \nRecall the  LSTM equations  that PyTorch implements.  In the equations, there are 8 instances of vector-matrix multiplication (when batch=1).  These can be combined into a single matrix-matrix multiplication (GEMM), but PyTorch groups these into two GEMM operations: one GEMM multiplies the inputs ( rnn.weight_ih_l0 ), and the other multiplies the hidden-state ( rnn.weight_hh_l0 ).", 
-            "title": "What are we compressing?"
-        }, 
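To see where the (6000, 1500) shapes in the table above come from, we can build the same two-layer LSTM stand-alone and list its parameters (a quick sketch; PyTorch stacks the four gate matrices into a single tensor with 4 * hidden_size rows):

```python
import torch.nn as nn

# Same dimensions as the Large model: emsize = nhid = 1500, two layers.
lstm = nn.LSTM(input_size=1500, hidden_size=1500, num_layers=2)

for name, param in lstm.named_parameters():
    print(f"{name:15s} {tuple(param.shape)}")
# weight_ih_l0   (6000, 1500)   <- 4 gate matrices stacked: 4 * 1500 = 6000 rows
# weight_hh_l0   (6000, 1500)
# ... the biases and the weight_ih_l1/weight_hh_l1 pair follow
```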
-        {
-            "location": "/tutorial-lang_model/index.html#how-are-we-compressing", 
-            "text": "Let's turn to the configurations of the Large language model compression schedule to 70%, 80%, 90% and 95% sparsity. Using AGP it is easy to configure the pruning schedule to produce an exact sparsity of the compressed model.  I'll use the  70% schedule  to show a concrete example.  The YAML file has two sections:  pruners  and  policies .  Section  pruners  defines instances of  ParameterPruner  - in our case we define three instances of  AutomatedGradualPruner : for the weights of the first RNN ( l0_rnn_pruner ), the second RNN ( l1_rnn_pruner ) and the embedding layer ( embedding_pruner ).  These names are arbitrary, and serve are name-handles which bind Policies to Pruners - so you can use whatever names you want.\nEach  AutomatedGradualPruner  is configured with an  initial_sparsity  and  final_sparsity .  For examples, the  l0_rnn_pruner  below is configured to prune 5% of the weights as soon as it starts working, and finish when 70% of the weights have been pruned.  The  weights  parameter tells the Pruner which weight tensors to prune.  pruners:\n  l0_rnn_pruner:\n    class: AutomatedGradualPruner\n    initial_sparsity : 0.05\n    final_sparsity: 0.70\n    weights: [rnn.weight_ih_l0, rnn.weight_hh_l0]\n\n  l1_rnn_pruner:\n    class: AutomatedGradualPruner\n    initial_sparsity : 0.05\n    final_sparsity: 0.70\n    weights: [rnn.weight_ih_l1, rnn.weight_hh_l1]\n\n  embedding_pruner:\n    class: AutomatedGradualPruner\n    initial_sparsity : 0.05\n    final_sparsity: 0.70\n    weights: [encoder.weight]", 
-            "title": "How are we compressing?"
-        }, 
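The `weights` lists refer to tensors by their names in the model. Conceptually, a pruner resolves those names and applies a binary mask to each tensor; here is a minimal sketch of that mechanic (not Distiller's actual code; `model` is assumed to be the RNNModel instance):

```python
import torch

def apply_magnitude_mask(model, weight_names, sparsity):
    """Zero the smallest-magnitude elements of the named tensors
    until `sparsity` (the fraction of zeros) is reached."""
    params = dict(model.named_parameters())
    for name in weight_names:
        w = params[name].data
        k = int(sparsity * w.numel())
        if k == 0:
            continue
        threshold = w.abs().flatten().kthvalue(k).values
        w.mul_((w.abs() > threshold).float())

# e.g. the l0_rnn_pruner's tensors, taken straight to the final 70% sparsity:
# apply_magnitude_mask(model, ['rnn.weight_ih_l0', 'rnn.weight_hh_l0'], 0.70)
```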
-        {
-            "location": "/tutorial-lang_model/index.html#when-are-we-compressing", 
-            "text": "If the  pruners  section defines \"what-to-do\", the  policies  section defines \"when-to-do\".  This part is harder, because we define the pruning schedule, which requires us to try a few different schedules until we understand which schedule works best.\nBelow we define three  PruningPolicy  instances.  The first two instances start operating at epoch 2 ( starting_epoch ), end at epoch 20 ( ending_epoch ), and operate once every epoch ( frequency ; as I explained above, Distiller's Pruning scheduling operates only at  on_epoch_begin ).  In between pruning operations, the pruned model is fine-tuned.  policies:\n  - pruner:\n      instance_name : l0_rnn_pruner\n    starting_epoch: 2\n    ending_epoch: 20  \n    frequency: 1\n\n  - pruner:\n      instance_name : l1_rnn_pruner\n    starting_epoch: 2\n    ending_epoch: 20\n    frequency: 1\n\n  - pruner:\n      instance_name : embedding_pruner\n    starting_epoch: 3\n    ending_epoch: 21\n    frequency: 1  We invoke the compression as follows:  $ time python3 main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --tied --compress=../../examples/agp-pruning/word_lang_model.LARGE_70.schedule_agp.yaml  Table 1  above shows that we can make a negligible improvement when adding L2 regularization.  I did some experimenting with the sparsity distribution between the layers, and the scheduling frequency and noticed that the embedding layers are much less sensitive to pruning than the RNN cells.  I didn't notice any difference between the RNN cells, but I also didn't invest in this exploration.\nA new  70% sparsity schedule , prunes the RNNs only to 50% sparsity, but prunes the embedding to 85% sparsity, and achieves almost a 3 points improvement in the test perplexity results.  We provide  similar pruning schedules  for the other compression rates.", 
-            "title": "When are we compressing?"
-        }, 
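A quick sanity check of the sparsity arithmetic behind these schedules, using the parameter counts from the weights-sparsity table above: the uneven split of the improved schedule (RNNs to 50%, tied embedding to 85%) lands exactly on the 25,487,550 NNZ reported in Table 1's 70% rows.

```python
# Parameter counts from the weights-sparsity table (Large model, tied).
embedding = 49_917_000          # encoder.weight (decoder.weight is tied)
rnns = 4 * 9_000_000            # weight_ih/weight_hh for two LSTM layers

# Uniform 70% sparsity everywhere leaves 30% of the weights:
print(f"{(embedding + rnns) * 30 // 100:,}")              # 25,775,100

# Uneven split: RNNs to 50% sparsity, embedding to 85% sparsity:
print(f"{rnns * 50 // 100 + embedding * 15 // 100:,}")    # 25,487,550
```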
-        {
-            "location": "/tutorial-lang_model/index.html#until-next-time", 
-            "text": "This concludes the first part of the tutorial on pruning a PyTorch language model. \nIn the next installment, I'll explain how we added an implementation of Baidu Research's  Exploring Sparsity in Recurrent Neural Networks  paper, and applied to this language model.  Geek On.", 
-            "title": "Until next time"
-        }
-    ]
-}
\ No newline at end of file
+{"config":{"lang":["en"],"prebuild_index":false,"separator":"[\\s\\-]+"},"docs":[{"location":"index.html","text":"Distiller Documentation What is Distiller Distiller is an open-source Python package for neural network compression research. Network compression can reduce the footprint of a neural network, increase its inference speed and save energy. Distiller provides a PyTorch environment for prototyping and analyzing compression algorithms, such as sparsity-inducing methods and low precision arithmetic. Distiller contains: A framework for integrating pruning, regularization and quantization algorithms. A set of tools for analyzing and evaluating compression performance. Example implementations of state-of-the-art compression algorithms. Motivation A sparse tensor is any tensor that contains some zeros, but sparse tensors are usually only interesting if they contain a significant number of zeros. A sparse neural network performs computations using some sparse tensors (preferably many). These tensors can be parameters (weights and biases) or activations (feature maps). Why do we care about sparsity? Present day neural networks tend to be deep, with millions of weights and activations. Refer to GoogLeNet or ResNet50, for a couple of examples. These large models are compute-intensive which means that even with dedicated acceleration hardware, the inference pass (network evaluation) will take time. You might think that latency is an issue only in certain cases, such as autonomous driving systems, but in fact, whenever we humans interact with our phones and computers, we are sensitive to the latency of the interaction. We don't like to wait for search results or for an application or web-page to load, and we are especially sensitive in realtime interactions such as speech recognition. So inference latency is often something we want to minimize. Large models are also memory-intensive with millions of parameters. Moving around all of the data required to compute inference results consumes energy, which is a problem on a mobile device as well as in a server environment. Data center server-racks are limited by their power-envelope and their ToC (total cost of ownership) is correlated to their power consumption and thermal characteristics. In the mobile device environment, we are obviously always aware of the implications of power consumption on the device battery. Inference performance in the data center is often measured using a KPI (key performance indicator) which folds latency and power considerations: inferences per second, per Watt (inferences/sec/watt). The storage and transfer of large neural networks is also a challenge in mobile device environments, because of limitations on application sizes and long application download times. For these reasons, we wish to compress the network as much as possible, to reduce the amount of bandwidth and compute required. Inducing sparseness, through regularization or pruning, in neural-network models, is one way to compress the network (quantization is another method). Sparse neural networks hold the promise of speed, small size, and energy efficiency. Smaller Sparse NN model representations can be compressed by taking advantage of the fact that the tensor elements are dominated by zeros. The compression format, if any, is very HW and SW specific, and the optimal format may be different per tensor (an obvious example: largely dense tensors should not be compressed). 
The compute hardware needs to support the compression formats, for representation compression to be meaningful. Compression representation decisions might interact with algorithms such as the use of tiles for memory accesses. Data such as a parameter tensor is read/written from/to main system memory compressed, but the computation can be dense or sparse. In dense compute we use dense operators, so the compressed data eventually needs to be decompressed into its full, dense size. The best we can do is bring the compressed representation as close as possible to the compute engine. Sparse compute, on the other hand, operates on the sparse representation which never requires decompression (we therefore distinguish between sparse representation and compressed representation). This is not a simple matter to implement in HW, and often means lower utilization of the vectorized compute engines. Therefore, there is a third class of representations, which take advantage of specific hardware characteristics. For example, for a vectorized compute engine we can remove an entire zero-weights vector and skip its computation (this uses structured pruning or regularization). Faster Many of the layers in modern neural-networks are bandwidth-bound, which means that the execution latency is dominated by the available bandwidth. In essence, the hardware spends more time bringing data close to the compute engines than actually performing the computations. Fully-connected layers, RNNs and LSTMs are some examples of bandwidth-dominated operations. Reducing the bandwidth required by these layers will immediately speed them up. Some pruning algorithms prune entire kernels, filters and even layers from the network without adversely impacting the final accuracy. Depending on the hardware implementation, these methods can be leveraged to skip computations, thus reducing latency and power. More energy efficient Because we pay two orders-of-magnitude more energy to access off-chip memory (e.g. DDR) compared to on-chip memory (e.g. SRAM or cache), many hardware designs employ a multi-layered cache hierarchy. Fitting the parameters and activations of a network in these on-chip caches can make a big difference on the required bandwidth, the total inference latency, and of course reduce power consumption. And of course, if we used a sparse or compressed representation, then we are reducing the data throughput and therefore the energy consumption.","title":"Home"},{"location":"index.html#distiller-documentation","text":"","title":"Distiller Documentation"},{"location":"index.html#what-is-distiller","text":"Distiller is an open-source Python package for neural network compression research. Network compression can reduce the footprint of a neural network, increase its inference speed and save energy. Distiller provides a PyTorch environment for prototyping and analyzing compression algorithms, such as sparsity-inducing methods and low precision arithmetic. Distiller contains: A framework for integrating pruning, regularization and quantization algorithms. A set of tools for analyzing and evaluating compression performance. Example implementations of state-of-the-art compression algorithms.","title":"What is Distiller"},{"location":"index.html#motivation","text":"A sparse tensor is any tensor that contains some zeros, but sparse tensors are usually only interesting if they contain a significant number of zeros. A sparse neural network performs computations using some sparse tensors (preferably many). 
These tensors can be parameters (weights and biases) or activations (feature maps). Why do we care about sparsity? Present day neural networks tend to be deep, with millions of weights and activations. Refer to GoogLeNet or ResNet50, for a couple of examples. These large models are compute-intensive which means that even with dedicated acceleration hardware, the inference pass (network evaluation) will take time. You might think that latency is an issue only in certain cases, such as autonomous driving systems, but in fact, whenever we humans interact with our phones and computers, we are sensitive to the latency of the interaction. We don't like to wait for search results or for an application or web-page to load, and we are especially sensitive in realtime interactions such as speech recognition. So inference latency is often something we want to minimize. Large models are also memory-intensive with millions of parameters. Moving around all of the data required to compute inference results consumes energy, which is a problem on a mobile device as well as in a server environment. Data center server-racks are limited by their power-envelope and their TCO (total cost of ownership) is correlated to their power consumption and thermal characteristics. In the mobile device environment, we are obviously always aware of the implications of power consumption on the device battery. Inference performance in the data center is often measured using a KPI (key performance indicator) which folds latency and power considerations: inferences per second, per Watt (inferences/sec/watt). The storage and transfer of large neural networks is also a challenge in mobile device environments, because of limitations on application sizes and long application download times. For these reasons, we wish to compress the network as much as possible, to reduce the amount of bandwidth and compute required. Inducing sparseness, through regularization or pruning, in neural-network models, is one way to compress the network (quantization is another method). Sparse neural networks hold the promise of speed, small size, and energy efficiency.","title":"Motivation"},{"location":"index.html#smaller","text":"Sparse NN model representations can be compressed by taking advantage of the fact that the tensor elements are dominated by zeros. The compression format, if any, is very HW and SW specific, and the optimal format may be different per tensor (an obvious example: largely dense tensors should not be compressed). The compute hardware needs to support the compression formats, for representation compression to be meaningful. Compression representation decisions might interact with algorithms such as the use of tiles for memory accesses. Data such as a parameter tensor is read/written from/to main system memory compressed, but the computation can be dense or sparse. In dense compute we use dense operators, so the compressed data eventually needs to be decompressed into its full, dense size. The best we can do is bring the compressed representation as close as possible to the compute engine. Sparse compute, on the other hand, operates on the sparse representation which never requires decompression (we therefore distinguish between sparse representation and compressed representation). This is not a simple matter to implement in HW, and often means lower utilization of the vectorized compute engines. Therefore, there is a third class of representations, which take advantage of specific hardware characteristics. 
For example, for a vectorized compute engine we can remove an entire zero-weights vector and skip its computation (this uses structured pruning or regularization).","title":"Smaller"},{"location":"index.html#faster","text":"Many of the layers in modern neural-networks are bandwidth-bound, which means that the execution latency is dominated by the available bandwidth. In essence, the hardware spends more time bringing data close to the compute engines than actually performing the computations. Fully-connected layers, RNNs and LSTMs are some examples of bandwidth-dominated operations. Reducing the bandwidth required by these layers will immediately speed them up. Some pruning algorithms prune entire kernels, filters and even layers from the network without adversely impacting the final accuracy. Depending on the hardware implementation, these methods can be leveraged to skip computations, thus reducing latency and power.","title":"Faster"},{"location":"index.html#more-energy-efficient","text":"Because we pay two orders-of-magnitude more energy to access off-chip memory (e.g. DDR) compared to on-chip memory (e.g. SRAM or cache), many hardware designs employ a multi-layered cache hierarchy. Fitting the parameters and activations of a network in these on-chip caches can make a big difference on the required bandwidth, the total inference latency, and of course reduce power consumption. And of course, if we used a sparse or compressed representation, then we are reducing the data throughput and therefore the energy consumption.","title":"More energy efficient"},{"location":"algo_earlyexit.html","text":"Early Exit Inference While Deep Neural Networks benefit from a large number of layers, it's often the case that many data points in classification tasks can be classified accurately with much less work. There have been several studies recently regarding the idea of exiting before the normal endpoint of the neural network. Panda et al in Conditional Deep Learning for Energy-Efficient and Enhanced Pattern Recognition point out that a lot of data points can be classified easily and require less processing than some more difficult points and they view this in terms of power savings. Teerapittayanon et al in BranchyNet: Fast Inference via Early Exiting from Deep Neural Networks look at a selective approach to exit placement and criteria for exiting early. Why Does Early Exit Work? Early Exit is a strategy with a straightforward and easy to understand concept. Figure #fig(boundaries) shows a simple example in a 2-D feature space. While deep networks can represent more complex and expressive boundaries between classes (assuming we\u2019re confident of avoiding over-fitting the data), it\u2019s also clear that much of the data can be properly classified with even the simplest of classification boundaries. Data points far from the boundary can be considered \"easy to classify\" and achieve a high degree of confidence quicker than do data points close to the boundary. In fact, we can think of the area between the outer straight lines as being the region that is \"difficult to classify\" and requires the full expressiveness of the neural network to accurately classify it. Example code for Early Exit Both the CIFAR10 and ImageNet code come directly from publicly available examples from PyTorch. The only edits are the exits that are inserted in a methodology similar to BranchyNet work. 
Note: the sample code provided for ResNet models with Early Exits has exactly one early exit for the CIFAR10 example and exactly two early exits for the ImageNet example. If you want to modify the number of early exits, you will need to make sure that the model code is updated to have a corresponding number of exits. Deeper networks can benefit from multiple exits. Our examples illustrate both a single and a pair of early exits for CIFAR10 and ImageNet, respectively. Note that this code does not actually take exits. What it does is to compute statistics of loss and accuracy assuming exits were taken when criteria are met. Actually implementing exits can be tricky and architecture-dependent, and we plan to address these issues. Example command lines We have provided examples for ResNets of varying sizes for both CIFAR10 and ImageNet datasets. An example command line for training for CIFAR10 is: python compress_classifier.py --arch=resnet32_cifar_earlyexit --epochs=20 -b 128 \\ --lr=0.003 --earlyexit_thresholds 0.4 --earlyexit_lossweights 0.4 -j 30 \\ --out-dir /home/ -n earlyexit /home/pcifar10 And an example command line for ImageNet is: python compress_classifier.py --arch=resnet50_earlyexit --epochs=120 -b 128 \\ --lr=0.003 --earlyexit_thresholds 1.2 0.9 --earlyexit_lossweights 0.1 0.3 \\ -j 30 --out-dir /home/ -n earlyexit /home/I1K/i1k-extracted/ Heuristics The insertion of the exits is ad-hoc, but there are some heuristic principles guiding their placement and parameters. The earlier an exit is placed, the more aggressive it is, as it essentially prunes the rest of the network at a very early stage, thus saving a lot of work. However, a diminishing percentage of data will be directed through the exit if we are to preserve accuracy. There are other benefits to adding exits in that training the modified network now has back-propagation losses coming from the exits that affect the earlier layers more substantially than the last exit. This effect mitigates problems such as vanishing gradient. Early Exit Hyper-Parameters There are two parameters that are required to enable early exit. Leave them undefined if you are not enabling Early Exit: --earlyexit_thresholds defines the thresholds for each of the early exits. The cross entropy measure must be less than the specified threshold to take a specific exit, otherwise the data continues along the regular path. For example, you could specify \"--earlyexit_thresholds 0.9 1.2\" and this implies two early exits with corresponding thresholds of 0.9 and 1.2, respectively, to take those exits. --earlyexit_lossweights provides the weights for the linear combination of losses during training to compute a single, overall loss. We only specify weights for the early exits and assume that the sum of the weights (including the final exit) is equal to 1.0. So an example of \"--earlyexit_lossweights 0.2 0.3\" implies two early exits weighted with values of 0.2 and 0.3, respectively, and that the final exit has a value of 1.0-(0.2+0.3) = 0.5. Studies have shown that weighting the early exits more heavily will create more aggressive early exits, but perhaps with a slight negative effect on accuracy. Output Stats The example code outputs various statistics regarding the loss and accuracy at each of the exits. During training, the Top1 and Top5 stats represent the accuracy if all of the data were forced out at that exit (in order to compute the loss at that exit). During inference (i.e. 
validation and test stages), the Top1 and Top5 stats represent the accuracy for those data points that could exit because the calculated entropy at that exit was lower than the specified threshold for that exit. CIFAR10 In the case of CIFAR10, we have inserted a single exit after the first full layer grouping. The layers on the exit path itself include a convolutional layer and a fully connected layer. If you move the exit, be sure to match the proper sizes for inputs and outputs to the exit layers. ImageNet This supports training and inference of the ImageNet dataset via several well-known deep architectures. ResNet-50 is the architecture of interest in this study, however the exit is defined in the generic ResNet code and could be used with other size ResNets. There are two exits inserted in this example. Again, exit layers must have their sizes match properly. References Priyadarshini Panda, Abhronil Sengupta, Kaushik Roy . Conditional Deep Learning for Energy-Efficient and Enhanced Pattern Recognition , arXiv:1509.08971v6, 2017. Surat Teerapittayanon, Bradley McDanel, H. T. Kung . BranchyNet: Fast Inference via Early Exiting from Deep Neural Networks , arXiv:1709.01686, 2017.","title":"Early Exit"},{"location":"algo_earlyexit.html#early-exit-inference","text":"While Deep Neural Networks benefit from a large number of layers, it's often the case that many data points in classification tasks can be classified accurately with much less work. There have been several studies recently regarding the idea of exiting before the normal endpoint of the neural network. Panda et al in Conditional Deep Learning for Energy-Efficient and Enhanced Pattern Recognition point out that a lot of data points can be classified easily and require less processing than some more difficult points and they view this in terms of power savings. Teerapittayanon et al in BranchyNet: Fast Inference via Early Exiting from Deep Neural Networks look at a selective approach to exit placement and criteria for exiting early.","title":"Early Exit Inference"},{"location":"algo_earlyexit.html#why-does-early-exit-work","text":"Early Exit is a strategy with a straightforward and easy to understand concept. Figure #fig(boundaries) shows a simple example in a 2-D feature space. While deep networks can represent more complex and expressive boundaries between classes (assuming we\u2019re confident of avoiding over-fitting the data), it\u2019s also clear that much of the data can be properly classified with even the simplest of classification boundaries. Data points far from the boundary can be considered \"easy to classify\" and achieve a high degree of confidence quicker than do data points close to the boundary. In fact, we can think of the area between the outer straight lines as being the region that is \"difficult to classify\" and requires the full expressiveness of the neural network to accurately classify it.","title":"Why Does Early Exit Work?"},{"location":"algo_earlyexit.html#example-code-for-early-exit","text":"Both the CIFAR10 and ImageNet code come directly from publicly available examples from PyTorch. The only edits are the exits that are inserted in a methodology similar to BranchyNet work. Note: the sample code provided for ResNet models with Early Exits has exactly one early exit for the CIFAR10 example and exactly two early exits for the ImageNet example. If you want to modify the number of early exits, you will need to make sure that the model code is updated to have a corresponding number of exits. 
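As a rough sketch of how the two early-exit flags described above interact (hypothetical tensors; the weights mirror the "--earlyexit_lossweights 0.2 0.3" example and the threshold mirrors "--earlyexit_thresholds 0.9 1.2"):

```python
import torch
import torch.nn.functional as F

# Hypothetical logits from two early exits and the final exit for one batch.
logits_exit0, logits_exit1, logits_final = torch.randn(3, 8, 10)
target = torch.randint(0, 10, (8,))

# Training: one overall loss, weighted as in "--earlyexit_lossweights 0.2 0.3".
weights = [0.2, 0.3, 1.0 - (0.2 + 0.3)]            # the final exit gets 0.5
losses = [F.cross_entropy(l, target)
          for l in (logits_exit0, logits_exit1, logits_final)]
overall_loss = sum(w * l for w, l in zip(weights, losses))

# Inference: take an exit when the entropy of its output is below its threshold.
def entropy(logits):
    p = F.softmax(logits, dim=-1)
    return -(p * p.log()).sum(dim=-1)

take_exit0 = entropy(logits_exit0) < 0.9           # per-sample decision
```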
Deeper networks can benefit from multiple exits. Our examples illustrate both a single and a pair of early exits for CIFAR10 and ImageNet, respectively. Note that this code does not actually take exits. What it does is to compute statistics of loss and accuracy assuming exits were taken when criteria are met. Actually implementing exits can be tricky and architecture-dependent, and we plan to address these issues.","title":"Example code for Early Exit"},{"location":"algo_earlyexit.html#example-command-lines","text":"We have provided examples for ResNets of varying sizes for both CIFAR10 and ImageNet datasets. An example command line for training for CIFAR10 is: python compress_classifier.py --arch=resnet32_cifar_earlyexit --epochs=20 -b 128 \\ --lr=0.003 --earlyexit_thresholds 0.4 --earlyexit_lossweights 0.4 -j 30 \\ --out-dir /home/ -n earlyexit /home/pcifar10 And an example command line for ImageNet is: python compress_classifier.py --arch=resnet50_earlyexit --epochs=120 -b 128 \\ --lr=0.003 --earlyexit_thresholds 1.2 0.9 --earlyexit_lossweights 0.1 0.3 \\ -j 30 --out-dir /home/ -n earlyexit /home/I1K/i1k-extracted/","title":"Example command lines"},{"location":"algo_earlyexit.html#heuristics","text":"The insertion of the exits is ad-hoc, but there are some heuristic principles guiding their placement and parameters. The earlier an exit is placed, the more aggressive it is, as it essentially prunes the rest of the network at a very early stage, thus saving a lot of work. However, a diminishing percentage of data will be directed through the exit if we are to preserve accuracy. There are other benefits to adding exits in that training the modified network now has back-propagation losses coming from the exits that affect the earlier layers more substantially than the last exit. This effect mitigates problems such as vanishing gradient.","title":"Heuristics"},{"location":"algo_earlyexit.html#early-exit-hyper-parameters","text":"There are two parameters that are required to enable early exit. Leave them undefined if you are not enabling Early Exit: --earlyexit_thresholds defines the thresholds for each of the early exits. The cross entropy measure must be less than the specified threshold to take a specific exit, otherwise the data continues along the regular path. For example, you could specify \"--earlyexit_thresholds 0.9 1.2\" and this implies two early exits with corresponding thresholds of 0.9 and 1.2, respectively, to take those exits. --earlyexit_lossweights provides the weights for the linear combination of losses during training to compute a single, overall loss. We only specify weights for the early exits and assume that the sum of the weights (including the final exit) is equal to 1.0. So an example of \"--earlyexit_lossweights 0.2 0.3\" implies two early exits weighted with values of 0.2 and 0.3, respectively, and that the final exit has a value of 1.0-(0.2+0.3) = 0.5. Studies have shown that weighting the early exits more heavily will create more aggressive early exits, but perhaps with a slight negative effect on accuracy.","title":"Early Exit Hyper-Parameters"},{"location":"algo_earlyexit.html#output-stats","text":"The example code outputs various statistics regarding the loss and accuracy at each of the exits. During training, the Top1 and Top5 stats represent the accuracy if all of the data were forced out at that exit (in order to compute the loss at that exit). During inference (i.e. 
validation and test stages), the Top1 and Top5 stats represent the accuracy for those data points that could exit because the calculated entropy at that exit was lower than the specified threshold for that exit.","title":"Output Stats"},{"location":"algo_earlyexit.html#cifar10","text":"In the case of CIFAR10, we have inserted a single exit after the first full layer grouping. The layers on the exit path itself include a convolutional layer and a fully connected layer. If you move the exit, be sure to match the proper sizes for inputs and outputs to the exit layers.","title":"CIFAR10"},{"location":"algo_earlyexit.html#imagenet","text":"This supports training and inference of the ImageNet dataset via several well-known deep architectures. ResNet-50 is the architecture of interest in this study, however the exit is defined in the generic ResNet code and could be used with other size ResNets. There are two exits inserted in this example. Again, exit layers must have their sizes match properly.","title":"ImageNet"},{"location":"algo_earlyexit.html#references","text":"Priyadarshini Panda, Abhronil Sengupta, Kaushik Roy . Conditional Deep Learning for Energy-Efficient and Enhanced Pattern Recognition , arXiv:1509.08971v6, 2017. Surat Teerapittayanon, Bradley McDanel, H. T. Kung . BranchyNet: Fast Inference via Early Exiting from Deep Neural Networks , arXiv:1709.01686, 2017.","title":"References"},{"location":"algo_pruning.html","text":"Weights Pruning Algorithms Magnitude Pruner This is the most basic pruner: it applies a thresholding function, \\(thresh(.)\\), on each element, \\(w_i\\), of a weights tensor. A different threshold can be used for each layer's weights tensor. Because the threshold is applied on individual elements, this pruner belongs to the element-wise pruning algorithm family. \\[ thresh(w_i)=\\left\\lbrace \\matrix{{{w_i: \\; if \\;|w_i| \\; \\gt}\\;\\lambda}\\cr {0: \\; if \\; |w_i| \\leq \\lambda} } \\right\\rbrace \\] Sensitivity Pruner Finding a threshold magnitude per layer is daunting, especially since each layer's elements have different average absolute values. We can take advantage of the fact that the weights of convolutional and fully connected layers exhibit a Gaussian distribution with a mean value roughly zero, to avoid using a direct threshold based on the values of each specific tensor. The diagram below shows the distributions of the weights tensors of the first convolutional layer and the first fully-connected layer in TorchVision's pre-trained Alexnet model. You can see that they have an approximate Gaussian distribution. The distributions of Alexnet conv1 and fc1 layers We use the standard deviation of the weights tensor as a sort of normalizing factor between the different weights tensors. For example, if a tensor is Normally distributed, then about 68% of the elements have an absolute value less than the standard deviation (\\(\\sigma\\)) of the tensor. Thus, if we set the threshold to \\(s*\\sigma\\), then basically we are thresholding \\(s * 68\\%\\) of the tensor elements. \\[ thresh(w_i)=\\left\\lbrace \\matrix{{{w_i: \\; if \\;|w_i| \\; \\gt}\\;\\lambda}\\cr {0: \\; if \\; |w_i| \\leq \\lambda} } \\right\\rbrace \\] \\[ \\lambda = s * \\sigma_l \\;\\;\\; where\\; \\sigma_l\\; is \\;the \\;std \\;of \\;layer \\;l \\;as \\;measured \\;on \\;the \\;dense \\;model \\] How do we choose this \\(s\\) multiplier? 
In Learning both Weights and Connections for Efficient Neural Networks the authors write: \"We used the sensitivity results to find each layer\u2019s threshold: for example, the smallest threshold was applied to the most sensitive layer, which is the first convolutional layer... The pruning threshold is chosen as a quality parameter multiplied by the standard deviation of a layer\u2019s weights.\" So the results of executing a pruning sensitivity analysis on the tensor give us a good starting guess at \\(s\\). Sensitivity analysis is an empirical method, and we still have to spend time to home in on the exact multiplier value. Method of Operation Start by running a pruning sensitivity analysis on the model. Then use the results to set and tune the threshold of each layer, but instead of using a direct threshold use a sensitivity parameter which is multiplied by the standard-deviation of the initial weight-tensor's distribution. Schedule In their paper Song Han et al. use iterative pruning and change the value of the \\(s\\) multiplier at each pruning step. Distiller's SensitivityPruner works differently: the value \\(s\\) is set once based on a one-time calculation of the standard-deviation of the tensor (the first time we prune), and relies on the fact that as the tensor is pruned, more elements are \"pulled\" toward the center of the distribution and thus more elements get pruned. This actually works quite well as we can see in the diagram below. This is a TensorBoard screen-capture from Alexnet training, which shows how this method starts off pruning very aggressively, but then slowly reduces the pruning rate. We use a simple iterative-pruning schedule such as: Prune every second epoch starting at epoch 0, and ending at epoch 38. This excerpt from alexnet.schedule_sensitivity.yaml shows how this iterative schedule is conveyed in Distiller scheduling configuration YAML: pruners: my_pruner: class: 'SensitivityPruner' sensitivities: 'features.module.0.weight': 0.25 'features.module.3.weight': 0.35 'features.module.6.weight': 0.40 'features.module.8.weight': 0.45 'features.module.10.weight': 0.55 'classifier.1.weight': 0.875 'classifier.4.weight': 0.875 'classifier.6.weight': 0.625 policies: - pruner: instance_name : 'my_pruner' starting_epoch: 0 ending_epoch: 38 frequency: 2 Level Pruner Class SparsityLevelParameterPruner uses a similar method to go around specifying specific thresholding magnitudes. Instead of specifying a threshold magnitude, you specify a target sparsity level (expressed as a fraction, so 0.5 means 50% sparsity). Essentially this pruner also uses a pruning criterion based on the magnitude of each tensor element, but it has the advantage that you can aim for an exact and specific sparsity level. This pruner is much more stable compared to SensitivityPruner because the target sparsity level is not coupled to the actual magnitudes of the elements. Distiller's SensitivityPruner is unstable because the final sparsity level depends on the convergence pattern of the tensor distribution. Song Han's methodology of using several different values for the multiplier \\(s\\), and the recalculation of the standard-deviation at each pruning phase, probably gives it stability, but requires many more hyper-parameters (this is the reason we have not implemented it thus far). 
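To make the \(\lambda = s * \sigma\) rule described above concrete, here is a small sketch on a synthetic, roughly Normal tensor (an illustration only; Distiller's SensitivityPruner computes \(\sigma\) from the dense model as discussed):

```python
import torch

w = torch.randn(6000, 1500)      # synthetic weights, approximately Normal
s = 0.7                          # the sensitivity multiplier we tune
lam = s * w.std()                # threshold: lambda = s * sigma

mask = (w.abs() > lam).float()
print(f"fraction pruned: {1.0 - mask.mean().item():.3f}")
# With s = 1.0, roughly 68% of a Normal tensor falls below the threshold.
```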
To set the target sparsity levels, you can once again use pruning sensitivity analysis to make better guesses at the correct sparsity level of each layer. Method of Operation Sort the weights in the specified layer by their absolute values. Mask to zero the smallest magnitude weights until the desired sparsity level is reached. Splicing Pruner In Dynamic Network Surgery for Efficient DNNs Guo et al. propose that network pruning and splicing work in tandem. A SplicingPruner is a pruner that both prunes and splices connections and works best with a Dynamic Network Surgery schedule, which, for example, configures the PruningPolicy to mask weights only during the forward pass. Automated Gradual Pruner (AGP) In To prune, or not to prune: exploring the efficacy of pruning for model compression , authors Michael Zhu and Suyog Gupta provide an algorithm to schedule a Level Pruner which Distiller implements in AutomatedGradualPruner . \"We introduce a new automated gradual pruning algorithm in which the sparsity is increased from an initial sparsity value \\(s_i\\) (usually 0) to a final sparsity value \\(s_f\\) over a span of n pruning steps. The intuition behind this sparsity function in equation (1) is to prune the network rapidly in the initial phase when the redundant connections are abundant and gradually reduce the number of weights being pruned each time as there are fewer and fewer weights remaining in the network.\" You can play with the scheduling parameters in the agp_schedule.ipynb notebook . The authors describe AGP: Our automated gradual pruning algorithm prunes the smallest magnitude weights to achieve a preset level of network sparsity. Doesn't require much hyper-parameter tuning Shown to perform well across different models Does not make any assumptions about the structure of the network or its constituent layers, and is therefore more generally applicable. RNN Pruner The authors of Exploring Sparsity in Recurrent Neural Networks , Sharan Narang, Erich Elsen, Gregory Diamos, and Shubho Sengupta, \"propose a technique to reduce the parameters of a network by pruning weights during the initial training of the network.\" They use a gradual pruning schedule which is reminiscent of the schedule used in AGP, for element-wise pruning of RNNs, which they also employ during training. They show pruning of RNN, GRU, LSTM and embedding layers. Distiller's distiller.pruning.BaiduRNNPruner class implements this pruning algorithm. Structure Pruners Element-wise pruning can create very sparse models which can be compressed to consume a smaller memory footprint and less bandwidth, but without specialized hardware that can compute using the sparse representation of the tensors, we don't gain any speedup of the computation. Structure pruners remove entire \"structures\", such as kernels, filters, and even entire feature-maps. Structure Ranking Pruners Ranking pruners use some criterion to rank the structures in a tensor, and then prune the tensor to a specified level. In principle, these pruners perform one-shot pruning, but can be combined with automatic pruning-level scheduling, such as AGP (see below). In Pruning Filters for Efficient ConvNets the authors use filter ranking, with one-shot pruning followed by fine-tuning. 
The authors of Exploiting Sparseness in Deep Neural Networks for Large Vocabulary Speech Recognition also use a one-shot pruning schedule, for fully-connected layers, and they provide an explanation: First, after sweeping through the full training set several times the weights become relatively stable \u2014 they tend to remain either large or small magnitudes. Second, in a stabilized model, the importance of the connection is approximated well by the magnitudes of the weights (times the magnitudes of the corresponding input values, but these are relatively uniform within each layer since on the input layer, features are normalized to zero-mean and unit-variance, and hidden-layer values are probabilities) L1RankedStructureParameterPruner The L1RankedStructureParameterPruner pruner calculates the magnitude of some \"structure\", orders all of the structures based on some magnitude function and the m lowest-ranking structures are pruned away. This pruner performs ranking of structures using the mean of the absolute value of the structure as the representative of the structure magnitude. The absolute mean does not depend on the size of the structure, so it is easier to use compared to just using the \\(L_1\\)-norm of the structure, and at the same time it is a good proxy of the \\(L_1\\)-norm. Basically, you can think of mean(abs(t)) as a form of normalization of the structure L1-norm by the length of the structure. L1RankedStructureParameterPruner currently prunes weight filters, channels, and rows (for linear layers). ActivationAPoZRankedFilterPruner The ActivationAPoZRankedFilterPruner pruner uses the activation channels mean APoZ (average percentage of zeros) to rank weight filters and prune a specified percentage of filters. This method is called Network Trimming from the research paper: \"Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures\", Hengyuan Hu, Rui Peng, Yu-Wing Tai, Chi-Keung Tang, ICLR 2016 https://arxiv.org/abs/1607.03250 GradientRankedFilterPruner The GradientRankedFilterPruner tries to assess the importance of weight filters using the product of their gradients and the filter value. RandomRankedFilterPruner For research purposes we may want to compare the results of some structure-ranking pruner to a random structure-ranking. The RandomRankedFilterPruner pruner can be used for this purpose. Automated Gradual Pruner (AGP) for Structures The idea of a mathematical formula controlling the sparsity level growth is very useful and StructuredAGP extends the implementation to structured pruning. Pruner Compositions Pruners can be combined to create new pruning schemes. Specifically, with a few lines of code we currently marry the AGP sparsity level scheduler with our filter-ranking classes to create pruner compositions. For each of these, we use AGP to decide how many filters to prune at each step, and we choose the filters to remove using one of the filter-ranking methods: L1RankedStructureParameterPruner_AGP ActivationAPoZRankedFilterPruner_AGP GradientRankedFilterPruner_AGP RandomRankedFilterPruner_AGP Hybrid Pruning In a single schedule we can mix different pruning techniques. For example, we might mix pruning and regularization. Or structured pruning and element-wise pruning. We can even apply different methods on the same tensor. For example, we might want to perform filter pruning for a few epochs, then perform thinning and continue with element-wise pruning of the smaller network tensors. 
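A rough sketch of the APoZ ranking idea mentioned above, on synthetic activations (an illustration only; the real ActivationAPoZRankedFilterPruner collects these statistics from actual forward passes):

```python
import torch

# Synthetic post-ReLU feature maps: (batch, channels, height, width).
fmaps = torch.relu(torch.randn(32, 64, 14, 14))

# APoZ per channel: the average fraction of zero-valued activations.
apoz = (fmaps == 0).float().mean(dim=(0, 2, 3))   # shape: (64,)

# Channels that are mostly zero are the strongest candidates for pruning.
prune_idx = apoz.topk(16).indices                 # e.g. 16 filters to remove
```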
We call this technique of mixing different methods Hybrid Pruning, and Distiller has a few example schedules.","title":"Pruning"},{"location":"algo_pruning.html#weights-pruning-algorithms","text":"","title":"Weights Pruning Algorithms"},{"location":"algo_pruning.html#magnitude-pruner","text":"This is the most basic pruner: it applies a thresholding function, \(thresh(.)\), on each element, \(w_i\), of a weights tensor. A different threshold can be used for each layer's weights tensor. Because the threshold is applied on individual elements, this pruner belongs to the element-wise pruning algorithm family. \[ thresh(w_i)=\left\lbrace \matrix{{{w_i: \; if \;|w_i| \; \gt}\;\lambda}\cr {0: \; if \; |w_i| \leq \lambda} } \right\rbrace \]","title":"Magnitude Pruner"},{"location":"algo_pruning.html#sensitivity-pruner","text":"Finding a threshold magnitude per layer is daunting, especially since each layer's elements have different average absolute values. We can take advantage of the fact that the weights of convolutional and fully connected layers exhibit a Gaussian distribution with a mean value of roughly zero, to avoid using a direct threshold based on the values of each specific tensor. The diagram below shows the distributions of the weights tensors of the first convolutional layer and the first fully-connected layer in TorchVision's pre-trained Alexnet model. You can see that they have an approximate Gaussian distribution. The distributions of Alexnet conv1 and fc1 layers We use the standard deviation of the weights tensor as a sort of normalizing factor between the different weights tensors. For example, if a tensor is Normally distributed, then about 68% of the elements have an absolute value less than the standard deviation (\(\sigma\)) of the tensor. Thus, if we set the threshold to \(s*\sigma\), then basically we are thresholding \(s * 68\%\) of the tensor elements. \[ thresh(w_i)=\left\lbrace \matrix{{{w_i: \; if \;|w_i| \; \gt}\;\lambda}\cr {0: \; if \; |w_i| \leq \lambda} } \right\rbrace \] \[ \lambda = s * \sigma_l \;\;\; where\; \sigma_l\; is \;the \;std \;of \;layer \;l \;as \;measured \;on \;the \;dense \;model \] How do we choose this \(s\) multiplier? In Learning both Weights and Connections for Efficient Neural Networks the authors write: "We used the sensitivity results to find each layer\u2019s threshold: for example, the smallest threshold was applied to the most sensitive layer, which is the first convolutional layer... The pruning threshold is chosen as a quality parameter multiplied by the standard deviation of a layer\u2019s weights." So the results of executing pruning sensitivity analysis on the tensor give us a good starting guess at \(s\). Sensitivity analysis is an empirical method, and we still have to spend time to home in on the exact multiplier value.","title":"Sensitivity Pruner"},{"location":"algo_pruning.html#method-of-operation","text":"Start by running a pruning sensitivity analysis on the model. Then use the results to set and tune the threshold of each layer, but instead of using a direct threshold, use a sensitivity parameter which is multiplied by the standard-deviation of the initial weight-tensor's distribution.","title":"Method of Operation"},{"location":"algo_pruning.html#schedule","text":"In their paper, Song Han et al. use iterative pruning and change the value of the \(s\) multiplier at each pruning step.
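Whichever schedule is used, the per-step masking itself is simple. Here is a minimal sketch of the sensitivity-based mask from the formulas above (illustrative only, not Distiller's actual SensitivityPruner code; the layer name and s value in the usage comment are made up):

```python
import torch

def sensitivity_mask(weights: torch.Tensor, s: float) -> torch.Tensor:
    # lambda = s * sigma_l, where sigma_l is measured on the dense tensor
    lam = s * weights.std().item()
    return (weights.abs() > lam).float()

# Illustrative usage: zero all elements of conv1 smaller than 0.25 * sigma
# conv1.weight.data.mul_(sensitivity_mask(conv1.weight.data, s=0.25))
```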
Distiller's SensitivityPruner works differently: the value \(s\) is set once, based on a one-time calculation of the standard-deviation of the tensor (the first time we prune), and relies on the fact that as the tensor is pruned, more elements are "pulled" toward the center of the distribution and thus more elements get pruned. This actually works quite well, as we can see in the diagram below. This is a TensorBoard screen-capture from Alexnet training, which shows how this method starts off pruning very aggressively, but then slowly reduces the pruning rate. We use a simple iterative-pruning schedule such as: Prune every second epoch, starting at epoch 0 and ending at epoch 38. This excerpt from alexnet.schedule_sensitivity.yaml shows how this iterative schedule is conveyed in Distiller scheduling configuration YAML: pruners: my_pruner: class: 'SensitivityPruner' sensitivities: 'features.module.0.weight': 0.25 'features.module.3.weight': 0.35 'features.module.6.weight': 0.40 'features.module.8.weight': 0.45 'features.module.10.weight': 0.55 'classifier.1.weight': 0.875 'classifier.4.weight': 0.875 'classifier.6.weight': 0.625 policies: - pruner: instance_name: 'my_pruner' starting_epoch: 0 ending_epoch: 38 frequency: 2","title":"Schedule"},{"location":"algo_pruning.html#level-pruner","text":"Class SparsityLevelParameterPruner uses a similar method to get around specifying specific thresholding magnitudes. Instead of specifying a threshold magnitude, you specify a target sparsity level (expressed as a fraction, so 0.5 means 50% sparsity). Essentially this pruner also uses a pruning criterion based on the magnitude of each tensor element, but it has the advantage that you can aim for an exact and specific sparsity level. This pruner is much more stable compared to SensitivityPruner because the target sparsity level is not coupled to the actual magnitudes of the elements. Distiller's SensitivityPruner is unstable because the final sparsity level depends on the convergence pattern of the tensor distribution. Song Han's methodology of using several different values for the multiplier \(s\), and the recalculation of the standard-deviation at each pruning phase, probably gives it stability, but requires many more hyper-parameters (this is the reason we have not implemented it thus far). To set the target sparsity levels, you can once again use pruning sensitivity analysis to make better guesses at the correct sparsity level of each layer.","title":"Level Pruner"},{"location":"algo_pruning.html#method-of-operation_1","text":"Sort the weights in the specified layer by their absolute values. Mask to zero the smallest-magnitude weights until the desired sparsity level is reached.","title":"Method of Operation"},{"location":"algo_pruning.html#splicing-pruner","text":"In Dynamic Network Surgery for Efficient DNNs Guo et al. propose that network pruning and splicing work in tandem. A SplicingPruner is a pruner that both prunes and splices connections, and works best with a Dynamic Network Surgery schedule, which, for example, configures the PruningPolicy to mask weights only during the forward pass.","title":"Splicing Pruner"},{"location":"algo_pruning.html#automated-gradual-pruner-agp","text":"In To prune, or not to prune: exploring the efficacy of pruning for model compression , authors Michael Zhu and Suyog Gupta provide an algorithm to schedule a Level Pruner, which Distiller implements in AutomatedGradualPruner .
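The gradual schedule from equation (1), combined with level-style masking, can be sketched as follows (a simplified illustration that ignores the starting epoch and pruning frequency; s_f = 0.9 is an arbitrary example value):

```python
import torch

def agp_sparsity(step: int, n_steps: int, s_i: float = 0.0, s_f: float = 0.9) -> float:
    # s_t = s_f + (s_i - s_f) * (1 - t/n)^3: prune fast early, taper off later
    t = min(step, n_steps)
    return s_f + (s_i - s_f) * (1.0 - t / n_steps) ** 3

def level_mask(weights: torch.Tensor, sparsity: float) -> torch.Tensor:
    # Level-pruner step: zero the smallest-magnitude fraction of the tensor
    k = int(sparsity * weights.numel())
    if k == 0:
        return torch.ones_like(weights)
    threshold = weights.abs().flatten().kthvalue(k).values
    return (weights.abs() > threshold).float()
```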
\"We introduce a new automated gradual pruning algorithm in which the sparsity is increased from an initial sparsity value \\(s_i\\) (usually 0) to a \ufb01nal sparsity value \\(s_f\\) over a span of n pruning steps. The intuition behind this sparsity function in equation (1) is to prune the network rapidly in the initial phase when the redundant connections are abundant and gradually reduce the number of weights being pruned each time as there are fewer and fewer weights remaining in the network.\"\" You can play with the scheduling parameters in the agp_schedule.ipynb notebook . The authors describe AGP: Our automated gradual pruning algorithm prunes the smallest magnitude weights to achieve a preset level of network sparsity. Doesn't require much hyper-parameter tuning Shown to perform well across different models Does not make any assumptions about the structure of the network or its constituent layers, and is therefore more generally applicable.","title":"Automated Gradual Pruner (AGP)"},{"location":"algo_pruning.html#rnn-pruner","text":"The authors of Exploring Sparsity in Recurrent Neural Networks , Sharan Narang, Erich Elsen, Gregory Diamos, and Shubho Sengupta, \"propose a technique to reduce the parameters of a network by pruning weights during the initial training of the network.\" They use a gradual pruning schedule which is reminiscent of the schedule used in AGP, for element-wise pruning of RNNs, which they also employ during training. They show pruning of RNN, GRU, LSTM and embedding layers. Distiller's distiller.pruning.BaiduRNNPruner class implements this pruning algorithm.","title":"RNN Pruner"},{"location":"algo_pruning.html#structure-pruners","text":"Element-wise pruning can create very sparse models which can be compressed to consume less memory footprint and bandwidth, but without specialized hardware that can compute using the sparse representation of the tensors, we don't gain any speedup of the computation. Structure pruners, remove entire \"structures\", such as kernels, filters, and even entire feature-maps.","title":"Structure Pruners"},{"location":"algo_pruning.html#structure-ranking-pruners","text":"Ranking pruners use some criterion to rank the structures in a tensor, and then prune the tensor to a specified level. In principle, these pruners perform one-shot pruning, but can be combined with automatic pruning-level scheduling, such as AGP (see below). In Pruning Filters for Efficient ConvNets the authors use filter ranking, with one-shot pruning followed by fine-tuning. The authors of Exploiting Sparseness in Deep Neural Networks for Large Vocabulary Speech Recognition also use a one-shot pruning schedule, for fully-connected layers, and they provide an explanation: First, after sweeping through the full training set several times the weights become relatively stable \u2014 they tend to remain either large or small magnitudes. 
Second, in a stabilized model, the importance of the connection is approximated well by the magnitudes of the weights (times the magnitudes of the corresponding input values, but these are relatively uniform within each layer since on the input layer, features are normalized to zero-mean and unit-variance, and hidden-layer values are probabilities).","title":"Structure Ranking Pruners"},{"location":"algo_pruning.html#l1rankedstructureparameterpruner","text":"The L1RankedStructureParameterPruner pruner calculates the magnitude of some \"structure\", orders all of the structures by this magnitude, and prunes away the m lowest-ranking structures. This pruner performs the ranking of structures using the mean of the absolute value of the structure as the representative of the structure magnitude. The absolute mean does not depend on the size of the structure, so it is easier to use compared to just using the \(L_1\)-norm of the structure, and at the same time it is a good proxy of the \(L_1\)-norm. Basically, you can think of mean(abs(t)) as a form of normalization of the structure L1-norm by the length of the structure. L1RankedStructureParameterPruner currently prunes weight filters, channels, and rows (for linear layers).","title":"L1RankedStructureParameterPruner"},{"location":"algo_pruning.html#activationapozrankedfilterpruner","text":"The ActivationAPoZRankedFilterPruner pruner uses the mean APoZ (average percentage of zeros) of the activation channels to rank weight filters and prune a specified percentage of filters. This method is called Network Trimming, from the research paper: \"Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures\", Hengyuan Hu, Rui Peng, Yu-Wing Tai, Chi-Keung Tang, ICLR 2016 https://arxiv.org/abs/1607.03250","title":"ActivationAPoZRankedFilterPruner"},{"location":"algo_pruning.html#gradientrankedfilterpruner","text":"The GradientRankedFilterPruner tries to assess the importance of weight filters using the product of their gradients and the filter value.","title":"GradientRankedFilterPruner"},{"location":"algo_pruning.html#randomrankedfilterpruner","text":"For research purposes we may want to compare the results of some structure-ranking pruner to a random structure-ranking. The RandomRankedFilterPruner pruner can be used for this purpose.","title":"RandomRankedFilterPruner"},{"location":"algo_pruning.html#automated-gradual-pruner-agp-for-structures","text":"The idea of a mathematical formula controlling the sparsity level growth is very useful, and StructuredAGP extends the implementation to structured pruning.","title":"Automated Gradual Pruner (AGP) for Structures"},{"location":"algo_pruning.html#pruner-compositions","text":"Pruners can be combined to create new pruning schemes. Specifically, with a few lines of code we currently marry the AGP sparsity level scheduler with our filter-ranking classes to create pruner compositions. For each of these, we use AGP to decide how many filters to prune at each step, and we choose the filters to remove using one of the filter-ranking methods: L1RankedStructureParameterPruner_AGP ActivationAPoZRankedFilterPruner_AGP GradientRankedFilterPruner_AGP RandomRankedFilterPruner_AGP","title":"Pruner Compositions"},
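To make the ranking idea concrete, here is a hedged sketch (not Distiller's implementation) of ranking 4-D convolution weight filters by mean(abs(.)) and zeroing the lowest-ranked fraction:

```python
import torch

def prune_filters_by_mean_abs(weights: torch.Tensor, fraction: float) -> torch.Tensor:
    # weights: (out_channels, in_channels, k, k); score each filter by mean |w|
    scores = weights.abs().mean(dim=(1, 2, 3))
    n_prune = int(fraction * scores.numel())
    mask = torch.ones_like(weights)
    if n_prune > 0:
        _, lowest = scores.topk(n_prune, largest=False)
        mask[lowest] = 0.0  # zero entire filters (structured sparsity)
    return mask
```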
{"location":"algo_pruning.html#hybrid-pruning","text":"In a single schedule we can mix different pruning techniques. For example, we might mix pruning and regularization, or structured pruning and element-wise pruning. We can even apply different methods on the same tensor. For example, we might want to perform filter pruning for a few epochs, then perform thinning and continue with element-wise pruning of the smaller network tensors. We call this technique of mixing different methods Hybrid Pruning, and Distiller has a few example schedules.","title":"Hybrid Pruning"},{"location":"algo_quantization.html","text":"Quantization Algorithms Note: For any of the methods below that require quantization-aware training, please see here for details on how to invoke it using Distiller's scheduling mechanism.
","title":"Quantization"},{"location":"algo_quantization.html#quantization-algorithms","text":"Note: For any of the methods below that require quantization-aware training, please see here for details on how to invoke it using Distiller's scheduling mechanism.","title":"Quantization Algorithms"},{"location":"algo_quantization.html#range-based-linear-quantization","text":"Let's break down the terminology we use here: Linear: Means a float value is quantized by multiplying with a numeric constant (the scale factor ). Range-Based: Means that in order to calculate the scale factor, we look at the actual range of the tensor's values. In the most naive implementation, we use the actual min/max values of the tensor. Alternatively, we use some derivation based on the tensor's range / distribution to come up with a narrower min/max range, in order to remove possible outliers. This is in contrast to the other methods described here, which we could call clipping-based , as they impose an explicit clipping function on the tensors (using either a hard-coded value or a learned value).","title":"Range-Based Linear Quantization"},{"location":"algo_quantization.html#asymmetric-vs-symmetric","text":"In this method we can use two modes - asymmetric and symmetric .","title":"Asymmetric vs. Symmetric"},{"location":"algo_quantization.html#asymmetric-mode","text":"In asymmetric mode, we map the min/max in the float range to the min/max of the integer range. This is done by using a zero-point (also called quantization bias , or offset ) in addition to the scale factor. Let us denote the original floating-point tensor by x_f , the quantized tensor by x_q , the scale factor by q_x , the zero-point by zp_x and the number of bits used for quantization by n . Then, we get: x_q = round\left ((x_f - min_{x_f})\underbrace{\frac{2^n - 1}{max_{x_f} - min_{x_f}}}_{q_x} \right) = round(q_x x_f - \underbrace{min_{x_f}q_x}_{zp_x}) = round(q_x x_f - zp_x) In practice, we actually use zp_x = round(min_{x_f}q_x) . This means that zero is exactly representable by an integer in the quantized range. This is important, for example, for layers that have zero-padding. By rounding the zero-point, we effectively \"nudge\" the min/max values in the float range a little bit, in order to gain this exact quantization of zero. Note that in the derivation above we use an unsigned integer to represent the quantized range. That is, x_q \in [0, 2^n-1] . One could use a signed integer if necessary (perhaps due to HW considerations). This can be achieved by subtracting 2^{n-1} .
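Before moving on to whole layers, the scalar formulas above translate directly into code. A minimal sketch (illustrative only; Distiller's reference implementation lives in range_linear.py):

```python
import torch

def asymmetric_quantize(x_f: torch.Tensor, n_bits: int = 8):
    # q_x = (2^n - 1) / (max - min);  zp_x = round(min * q_x)
    x_min, x_max = x_f.min(), x_f.max()
    q_x = (2 ** n_bits - 1) / (x_max - x_min)
    zp_x = torch.round(x_min * q_x)
    x_q = torch.clamp(torch.round(q_x * x_f - zp_x), 0, 2 ** n_bits - 1)
    return x_q, q_x, zp_x

def asymmetric_dequantize(x_q, q_x, zp_x):
    # x_f ~= (x_q + zp_x) / q_x
    return (x_q + zp_x) / q_x
```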
Let's see how a convolution or fully-connected (FC) layer is quantized in asymmetric mode: (we denote input, output, weights and bias with x, y, w and b respectively) y_f = \sum{x_f w_f} + b_f = \sum{\frac{x_q + zp_x}{q_x} \frac{w_q + zp_w}{q_w}} + \frac{b_q + zp_b}{q_b} = \frac{1}{q_x q_w} \left( \sum { (x_q + zp_x) (w_q + zp_w) + \frac{q_x q_w}{q_b}(b_q + zp_b) } \right) Therefore: y_q = round(q_y y_f) = round\left(\frac{q_y}{q_x q_w} \left( \sum { (x_q+zp_x) (w_q+zp_w) + \frac{q_x q_w}{q_b}(b_q+zp_b) } \right) \right) Notes: We can see that the bias has to be re-scaled to match the scale of the summation. In a proper integer-only HW pipeline, we would like our main accumulation term to simply be \sum{x_q w_q} . In order to achieve this, one needs to further develop the expression we derived above. For further details please refer to the gemmlowp documentation","title":"Asymmetric Mode"},{"location":"algo_quantization.html#symmetric-mode","text":"In symmetric mode, instead of mapping the exact min/max of the float range to the quantized range, we choose the maximum absolute value between min/max. In addition, we don't use a zero-point. So, the floating-point range we're effectively quantizing is symmetric with respect to zero, and so is the quantized range. Using the same notations as above, we get: x_q = round\left (x_f \underbrace{\frac{2^{n-1} - 1}{\max|x_f|}}_{q_x} \right) = round(q_x x_f) Again, let's see how a convolution or fully-connected (FC) layer is quantized, this time in symmetric mode: y_f = \sum{x_f w_f} + b_f = \sum{\frac{x_q}{q_x} \frac{w_q}{q_w}} + \frac{b_q}{q_b} = \frac{1}{q_x q_w} \left( \sum { x_q w_q + \frac{q_x q_w}{q_b}b_q } \right) Therefore: y_q = round(q_y y_f) = round\left(\frac{q_y}{q_x q_w} \left( \sum { x_q w_q + \frac{q_x q_w}{q_b}b_q } \right) \right)","title":"Symmetric Mode"},{"location":"algo_quantization.html#comparing-the-two-modes","text":"The main trade-off between these two modes is simplicity vs. utilization of the quantized range. When using asymmetric quantization, the quantized range is fully utilized. That is because we exactly map the min/max values from the float range to the min/max of the quantized range. Using symmetric mode, if the float range is biased towards one side, the result is a quantized range in which a significant portion of the dynamic range is dedicated to values that we'll never see. The most extreme example of this is after ReLU, where the entire tensor is positive. Quantizing it in symmetric mode means we're effectively losing 1 bit. On the other hand, if we look at the derivations for convolution / FC layers above, we can see that the actual implementation of symmetric mode is much simpler. In asymmetric mode, the zero-points require additional logic in HW. The cost of this extra logic in terms of latency and/or power and/or area will of course depend on the exact implementation.","title":"Comparing the Two Modes"},
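For contrast with the asymmetric sketch above, symmetric mode reduces to a single scale factor (again an illustrative sketch, assuming a signed n-bit integer range):

```python
import torch

def symmetric_quantize(x_f: torch.Tensor, n_bits: int = 8):
    # q_x = (2^(n-1) - 1) / max|x_f|; no zero-point is needed
    q_max = 2 ** (n_bits - 1) - 1
    q_x = q_max / x_f.abs().max()
    x_q = torch.clamp(torch.round(q_x * x_f), -q_max, q_max)
    return x_q, q_x
```

Note how a post-ReLU (all-positive) tensor would leave the entire negative half of this range unused, which is the 1-bit loss mentioned above.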
{"location":"algo_quantization.html#other-features","text":"Removing Outliers: As discussed here , in some cases the float range of activations contains outliers. Spending dynamic range on these outliers hurts our ability to represent the values we actually care about accurately. Currently, Distiller supports clipping of activations with averaging during post-training quantization. That is - for each batch, instead of calculating global min/max values, we use an average of the min/max values of each sample in the batch. Scale factor scope: For weight tensors, Distiller supports per-channel quantization (per output channel).","title":"Other Features"},{"location":"algo_quantization.html#implementation-in-distiller","text":"","title":"Implementation in Distiller"},{"location":"algo_quantization.html#post-training","text":"For post-training quantization, this method is implemented by wrapping existing modules with quantization and de-quantization operations. The wrapper implementations are in range_linear.py . The operations currently supported are: Convolution Fully connected Element-wise addition Element-wise multiplication Concatenation All other layers are unaffected and are executed using their original FP32 implementation. To automatically transform an existing model to a quantized model using this method, use the PostTrainLinearQuantizer class. For details on ways to invoke the quantizer see here . The transform performed by the Quantizer only works on sub-classes of torch.nn.Module . But operations such as element-wise addition / multiplication and concatenation do not have associated Modules in PyTorch. They are either overloaded operators, or simple functions in the torch namespace. To be able to quantize these operations, we've implemented very simple modules that wrap these operations here . It is necessary to manually modify your model and replace any existing operator with a corresponding module. For an example, see our slightly modified ResNet implementation . For weights and biases, the scale factor and zero-point are determined once at quantization setup (\"offline\" / \"static\"). For activations, both \"static\" and \"dynamic\" quantization is supported. Static quantization of activations requires that statistics be collected beforehand. See details on how to do that here . The calculated quantization parameters are stored as buffers within the module, so they are automatically serialized when the model checkpoint is saved.","title":"Post-Training"},
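Invoking the post-training flow is short. A hedged sketch (the exact PostTrainLinearQuantizer constructor arguments and prepare_model signature vary between Distiller versions, so treat the details here as assumptions):

```python
from distiller.quantization import PostTrainLinearQuantizer

model = build_model()  # hypothetical helper returning your trained FP32 model
quantizer = PostTrainLinearQuantizer(model)  # default 8-bit linear quantization
quantizer.prepare_model()  # wraps supported modules with quant/de-quant operations
# ...run evaluation; quantization parameters are stored as module buffers
```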
{"location":"algo_quantization.html#quantization-aware-training","text":"To apply range-based linear quantization in training, use the QuantAwareTrainRangeLinearQuantizer class. As it is now, it will apply weights quantization to convolution, FC and embedding modules. For activations quantization, it will insert instances of the FakeLinearQuantization module after ReLUs. This module follows the methodology described in Benoit et al., 2018 and uses exponential moving averages to track activation ranges. Note that the current implementation of QuantAwareTrainRangeLinearQuantizer supports training with a single GPU only . Similarly to post-training, the calculated quantization parameters (scale factors, zero-points, tracked activation ranges) are stored as buffers within their respective modules, so they're saved when a checkpoint is created. Note that converting from a quantization-aware training model to a post-training quantization model is not yet supported. When this conversion is supported, it will use the activation ranges tracked during training, so additional offline or online calculation of quantization parameters will not be required.","title":"Quantization-Aware Training"},{"location":"algo_quantization.html#dorefa","text":"(As proposed in DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients ) In this method, we first define the quantization function quantize_k , which takes a real value a_f \in [0, 1] and outputs a discrete-valued a_q \in \left\{ \frac{0}{2^k-1}, \frac{1}{2^k-1}, ... , \frac{2^k-1}{2^k-1} \right\} , where k is the number of bits used for quantization. a_q = quantize_k(a_f) = \frac{1}{2^k-1} round \left( \left(2^k - 1 \right) a_f \right) Activations are clipped to the [0, 1] range and then quantized as follows: x_q = quantize_k(x_f) For weights, we define the following function f , which takes an unbounded real-valued input and outputs a real value in [0, 1] : f(w) = \frac{tanh(w)}{2 max(|tanh(w)|)} + \frac{1}{2} Now we can use quantize_k to get quantized weight values, as follows: w_q = 2 quantize_k \left( f(w_f) \right) - 1 This method requires training the model with quantization-aware training, as discussed here . Use the DorefaQuantizer class to transform an existing model to a model suitable for training with quantization using DoReFa.","title":"DoReFa"},{"location":"algo_quantization.html#notes","text":"Gradients quantization as proposed in the paper is not supported yet. The paper defines special handling for binary weights which isn't supported in Distiller yet.","title":"Notes:"},{"location":"algo_quantization.html#pact","text":"(As proposed in PACT: Parameterized Clipping Activation for Quantized Neural Networks ) This method is similar to DoReFa, but the upper clipping values, \alpha , of the activation functions are learned parameters instead of being hard-coded to 1. Note that per the paper's recommendation, \alpha is shared per layer. This method requires training the model with quantization-aware training, as discussed here . Use the PACTQuantizer class to transform an existing model to a model suitable for training with quantization using PACT.","title":"PACT"},{"location":"algo_quantization.html#wrpn","text":"(As proposed in WRPN: Wide Reduced-Precision Networks ) In this method, activations are clipped to [0, 1] and quantized as follows ( k is the number of bits used for quantization): x_q = \frac{1}{2^k-1} round \left( \left(2^k - 1 \right) x_f \right) Weights are clipped to [-1, 1] and quantized as follows: w_q = \frac{1}{2^{k-1}-1} round \left( \left(2^{k-1} - 1 \right)w_f \right) Note that k-1 bits are used to quantize weights, leaving one bit for sign. This method requires training the model with quantization-aware training, as discussed here . Use the WRPNQuantizer class to transform an existing model to a model suitable for training with quantization using WRPN.","title":"WRPN"},{"location":"algo_quantization.html#notes_1","text":"The paper proposed widening of layers as a means to reduce accuracy loss. This isn't implemented as part of WRPNQuantizer at the moment. To experiment with this, modify your model implementation to have wider layers. The paper defines special handling for binary weights which isn't supported in Distiller yet.
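The quantize_k building block shared by DoReFa and WRPN is easy to sketch (illustrative only; the straight-through gradient estimator needed for actual quantization-aware training is omitted):

```python
import torch

def quantize_k(a_f: torch.Tensor, k: int) -> torch.Tensor:
    # Maps [0, 1] onto the grid {0/(2^k - 1), ..., (2^k - 1)/(2^k - 1)}
    n = float(2 ** k - 1)
    return torch.round(a_f * n) / n

def dorefa_weight_quantize(w_f: torch.Tensor, k: int) -> torch.Tensor:
    # f(w) = tanh(w) / (2 * max|tanh(w)|) + 1/2, then w_q = 2 * quantize_k(f(w)) - 1
    f_w = torch.tanh(w_f) / (2 * torch.tanh(w_f).abs().max()) + 0.5
    return 2 * quantize_k(f_w, k) - 1
```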
","title":"Notes:"},{"location":"conditional_computation.html","text":"Conditional Computation Conditional Computation refers to a class of algorithms in which each input sample uses a different part of the model, such that on average the compute, latency or power (depending on our objective) is reduced.
","title":"Conditional Computation"},{"location":"conditional_computation.html#conditional-computation","text":"Conditional Computation refers to a class of algorithms in which each input sample uses a different part of the model, such that on average the compute, latency or power (depending on our objective) is reduced. To quote Bengio et al.: \"Conditional computation refers to activating only some of the units in a network, in an input-dependent fashion. For example, if we think we\u2019re looking at a car, we only need to compute the activations of the vehicle detecting units, not of all features that a network could possibly compute. The immediate effect of activating fewer units is that propagating information through the network will be faster, both at training as well as at test time. However, one needs to be able to decide in an intelligent fashion which units to turn on and off, depending on the input data. This is typically achieved with some form of gating structure, learned in parallel with the original network.\" As usual, there are several approaches to implement Conditional Computation: Sun et al. use several expert CNNs, each trained on a different task, and combine them into one large network. Zheng et al. use cascading, an idea which may be familiar to you from Viola-Jones face detection. Theodorakopoulos et al. add small layers that learn which filters to use per input sample, and then enforce that during inference (LKAM module). Ioannou et al. introduce Conditional Networks that \"can be thought of as: i) decision trees augmented with data transformation operators, or ii) CNNs, with block-diagonal sparse weight matrices, and explicit data routing functions\" Bolukbasi et al. \"learn a system to adaptively choose the components of a deep network to be evaluated for each example. By allowing examples correctly classified using early layers of the system to exit, we avoid the computational time associated with full evaluation of the network. We extend this to learn a network selection system that adaptively selects the network to be evaluated for each example.\" Conditional Computation is especially useful for real-time, latency-sensitive applications. In Distiller we currently have implemented a variant of Early Exit.","title":"Conditional Computation"},{"location":"conditional_computation.html#references","text":"Emmanuel Bengio, Pierre-Luc Bacon, Joelle Pineau, Doina Precup. Conditional Computation in Neural Networks for Faster Models , arXiv:1511.06297v2, 2016. Y. Sun, X. Wang, and X. Tang. Deep Convolutional Network Cascade for Facial Point Detection . In Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2014. X. Zheng, W. Ouyang, and X. Wang. Multi-Stage Contextual Deep Learning for Pedestrian Detection. In Proc. IEEE Intl Conf. on Computer Vision (ICCV), 2014. I. Theodorakopoulos, V. Pothos, D. Kastaniotis and N. Fragoulis. Parsimonious Inference on Convolutional Neural Networks: Learning and applying on-line kernel activation rules. Irida Labs S.A., January 2017. Tolga Bolukbasi, Joseph Wang, Ofer Dekel, Venkatesh Saligrama. Adaptive Neural Networks for Efficient Inference . Proceedings of the 34th International Conference on Machine Learning, PMLR 70:527-536, 2017. Yani Ioannou, Duncan Robertson, Darko Zikic, Peter Kontschieder, Jamie Shotton, Matthew Brown, Antonio Criminisi. Decision Forests, Convolutional Networks and the Models in-Between , arXiv:1603.01250, 2016.
","title":"References"},{"location":"design.html","text":"Distiller design Distiller is designed to be easily integrated into your own PyTorch research applications.
","title":"Design"},{"location":"design.html#distiller-design","text":"Distiller is designed to be easily integrated into your own PyTorch research applications. It is easiest to understand this integration by examining the code of the sample application for compressing image classification models ( compress_classifier.py ). The application borrows its main flow code from torchvision's ImageNet classification training sample application (https://github.com/pytorch/examples/tree/master/imagenet). We tried to keep it similar, in order to make it familiar and easy to understand. Integrating compression is very simple: just add invocations of the appropriate compression_scheduler callbacks for each stage in the training. The training skeleton looks like the pseudo code below. The boilerplate PyTorch classification training is speckled with invocations of CompressionScheduler. For each epoch: compression_scheduler.on_epoch_begin(epoch) train() validate() save_checkpoint() compression_scheduler.on_epoch_end(epoch) train(): For each training step: compression_scheduler.on_minibatch_begin(epoch) output = model(input_var) loss = criterion(output, target_var) compression_scheduler.before_backward_pass(epoch) loss.backward() optimizer.step() compression_scheduler.on_minibatch_end(epoch) These callbacks can be seen in the diagram below, as the arrow pointing from the Training Loop and into Distiller's Scheduler , which invokes the correct algorithm. The application also uses Distiller services to collect statistics in Summaries and logs files, which can be queried at a later time, from Jupyter notebooks or TensorBoard.
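Wiring this up takes only a few lines. A hedged sketch (distiller.file_config is the YAML entry point in the versions we know of, but check your version's API; train_one_epoch and validate are hypothetical helpers, with train_one_epoch containing the minibatch callbacks from the skeleton above):

```python
import distiller

# Build a CompressionScheduler from a YAML schedule (pruners, policies, etc.)
compression_scheduler = distiller.file_config(model, optimizer, 'schedule.yaml')

for epoch in range(num_epochs):
    compression_scheduler.on_epoch_begin(epoch)
    train_one_epoch(model, compression_scheduler, epoch)
    validate(model)
    compression_scheduler.on_epoch_end(epoch)
```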
","title":"Distiller design"},{"location":"design.html#sparsification-and-fine-tuning","text":"The application sets up a model as normally done in PyTorch. It then instantiates a Scheduler and configures it: Scheduler configuration is defined in a YAML file The configuration specifies Policies. Each Policy is tied to a specific algorithm which controls some aspect of the training. Some types of algorithms control the actual sparsification of the model. Such types are \"pruner\" and \"regularizer\". Some algorithms control some parameter of the training process, such as the learning-rate decay scheduler ( lr_scheduler ). The parameters of each algorithm are also specified in the configuration. In addition to specifying the algorithm, each Policy specifies scheduling parameters which control when the algorithm is executed: start epoch, end epoch and frequency. The Scheduler exposes callbacks for relevant training stages: epoch start/end, mini-batch start/end and pre-backward pass. Each scheduler callback activates the policies that were defined, according to the schedule that was defined. These callbacks are placed in the training loop.","title":"Sparsification and fine-tuning"},{"location":"design.html#quantization","text":"A quantized model is obtained by replacing existing operations with quantized versions. The quantized versions can be either complete replacements, or wrappers. A wrapper will use the existing modules internally and add quantization and de-quantization operations before/after as necessary. In Distiller we will provide a set of quantized versions of common operations which will enable implementation of different quantization methods. The user can write a quantized model from scratch, using the quantized operations provided. We also provide a mechanism which takes an existing model and automatically replaces required operations with quantized versions. This mechanism is exposed by the Quantizer class. Quantizer should be sub-classed for each quantization method.","title":"Quantization"},{"location":"design.html#model-transformation","text":"The high-level flow is as follows: Define a mapping from the module types to be replaced (e.g. Conv2D, Linear, etc.) to a function which generates the replacement module. The mapping is defined in the replacement_factory attribute of the Quantizer class. Iterate over the modules defined in the model. For each module, if its type is in the mapping, call the replacement generation function. We pass the existing module to this function to allow wrapping of it. Replace the existing module with the module returned by the function. It is important to note that the name of the module does not change, as that could break the forward function of the parent module. Different quantization methods may, obviously, use different quantized operations. In addition, different methods may employ different \"strategies\" of replacing / wrapping existing modules. For instance, some methods replace ReLU with another activation function, while others keep it. Hence, for each quantization method, a different mapping will likely be defined. Each sub-class of Quantizer should populate the replacement_factory dictionary attribute with the appropriate mapping. To execute the model transformation, call the prepare_model function of the Quantizer instance.
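A hedged sketch of what a sub-class's mapping could look like (MyQuantConv2dWrapper is hypothetical, and the exact signature of the replacement-generation function may differ between Distiller versions):

```python
import torch.nn as nn
from distiller.quantization import Quantizer

class MyQuantizer(Quantizer):
    def __init__(self, model, bits_activations=8, bits_weights=8, **kwargs):
        super().__init__(model, bits_activations=bits_activations,
                         bits_weights=bits_weights, **kwargs)
        # Map module types to functions that generate their replacements
        self.replacement_factory[nn.Conv2d] = self.replace_conv

    def replace_conv(self, module, name, qbits_map):
        # Wrap (rather than fully replace) the existing module
        return MyQuantConv2dWrapper(module, qbits_map[name])

quantizer = MyQuantizer(model)
quantizer.prepare_model()
```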
Different quantization methods may, obviously, use different quantized operations. In addition, different methods may employ different \"strategies\" of replacing / wrapping existing modules. For instance, some methods replace ReLU with another activation function, while others keep it. Hence, for each quantization method, a different mapping will likely be defined. Each sub-class of Quantizer should populate the replacement_factory dictionary attribute with the appropriate mapping. To execute the model transformation, call the prepare_model function of the Quantizer instance.","title":"Model Transformation"},{"location":"design.html#flexible-bit-widths","text":"Each instance of Quantizer is parameterized by the number of bits to be used for quantization of different tensor types. The default ones are activations and weights. These are the bits_activations and bits_weights parameters in Quantizer 's constructor. Sub-classes may define bit-widths for other tensor types as needed. We also want to be able to override the default number of bits mentioned in the bullet above for certain layers. These could be very specific layers. However, many models are comprised of building blocks (\"container\" modules, such as Sequential) which contain several modules, and it is likely we'll want to override settings for entire blocks, or for a certain module across different blocks. When such building blocks are used, the names of the internal modules usually follow some pattern. So, for this purpose, Quantizer also accepts a mapping of regular expressions to number of bits. This allows the user to override specific layers using their exact name, or a group of layers via a regular expression. This mapping is passed via the bits_overrides parameter in the constructor. The bits_overrides mapping is required to be an instance of collections.OrderedDict (as opposed to just a simple Python dict ). This is done in order to enable handling of overlapping name patterns. So, for example, one could define certain override parameters for a group of layers, e.g. 'conv*', but also define different parameters for specific layers in that group, e.g. 'conv1'. The patterns are evaluated eagerly - the first match wins. Therefore, the more specific patterns must come before the broad patterns.","title":"Flexible Bit-Widths"},{"location":"design.html#weights-quantization","text":"The Quantizer class also provides an API to quantize the weights of all layers at once. To use it, the param_quantization_fn attribute needs to point to a function that accepts a tensor and the number of bits. During model transformation, the Quantizer class will build a list of all model parameters that need to be quantized along with their bit-width. Then, the quantize_params function can be called, which will iterate over all parameters and quantize them using param_quantization_fn .","title":"Weights Quantization"},{"location":"design.html#quantization-aware-training","text":"The Quantizer class supports quantization-aware training, that is - training with quantization in the loop. This requires handling of a couple of flows / scenarios: Maintaining a full precision copy of the weights, as described here . This is enabled by setting train_with_fp_copy=True in the Quantizer constructor. At model transformation, in each module that has parameters that should be quantized, a new torch.nn.Parameter is added, which will maintain the required full precision copy of the parameters. Note that this is done in-place - a new module is not created. 
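A minimal sketch of this in-place parameter replacement, whose mechanics are described in the text that follows (illustrative names; not Distiller's actual code):

import torch
import torch.nn as nn

def add_fp_weight_copy(module, param_name='weight'):
    # Replace the existing Parameter with a full precision copy named
    # 'float_<name>', and re-register the original name as a buffer that
    # will hold the quantized values (illustrative, not Distiller's code).
    fp_tensor = getattr(module, param_name).data
    delattr(module, param_name)
    module.register_parameter('float_' + param_name, nn.Parameter(fp_tensor))
    module.register_buffer(param_name, torch.zeros_like(fp_tensor))

# On every forward pass, before the module's own computation, one would do:
#   module.weight = param_quantization_fn(module.float_weight, num_bits)
# so that gradients flow back to float_weight through the quantization
# function.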
We preferred not to sub-class the existing PyTorch modules for this purpose. In order to do this in-place, and also guarantee proper back-propagation through the weights quantization function, we employ the following \"hack\": The existing torch.nn.Parameter , e.g. weight , is replaced by a torch.nn.Parameter named float_weight . To maintain the existing functionality of the module, we then register a buffer in the module with the original name - weight . During training, float_weight will be passed to param_quantization_fn and the result will be stored in weight . In addition, some quantization methods may introduce additional learned parameters to the model. For example, in the PACT method, activations are clipped to a value \\alpha , which is a learned parameter per-layer. To support these two cases, the Quantizer class also accepts an instance of a torch.optim.Optimizer (normally this would be an instance of one of its sub-classes). The quantizer will take care of modifying the optimizer according to the changes made to the parameters. Optimizing New Parameters In cases where new parameters are required by the scheme, it is likely that they'll need to be optimized separately from the main model parameters. In that case, the sub-class for the specific method should override Quantizer._get_updated_optimizer_params_groups() , and return the proper groups plus any desired hyper-parameter overrides.","title":"Quantization-Aware Training"},{"location":"design.html#examples","text":"The base Quantizer class is implemented in distiller/quantization/quantizer.py . For a simple sub-class implementing symmetric linear quantization, see SymmetricLinearQuantizer in distiller/quantization/range_linear.py . In distiller/quantization/clipped_linear.py there are examples of lower-precision methods which use training with quantization. Specifically, see PACTQuantizer for an example of overriding Quantizer._get_updated_optimizer_params_groups() .","title":"Examples"},{"location":"earlyexit.html","text":"Early Exit Inference While Deep Neural Networks benefit from a large number of layers, it's often the case that many data points in classification tasks can be classified accurately with much less work. There have been several studies recently regarding the idea of exiting before the normal endpoint of the neural network. Panda et al. in Conditional Deep Learning for Energy-Efficient and Enhanced Pattern Recognition point out that many data points can be classified easily and require less processing than more difficult points, and they view this in terms of power savings. Teerapittayanon et al. in BranchyNet: Fast Inference via Early Exiting from Deep Neural Networks look at a selective approach to exit placement and criteria for exiting early. Why Does Early Exit Work? Early Exit is a strategy with a straightforward and easy-to-understand concept. Figure #fig(boundaries) shows a simple example in a 2-D feature space. While deep networks can represent more complex and expressive boundaries between classes (assuming we\u2019re confident of avoiding over-fitting the data), it\u2019s also clear that much of the data can be properly classified with even the simplest of classification boundaries. Data points far from the boundary can be considered \"easy to classify\" and achieve a high degree of confidence more quickly than data points close to the boundary. 
In fact, we can think of the area between the outer straight lines as being the region that is \"difficult to classify\" and requires the full expressiveness of the neural network to accurately classify it. Example code for Early Exit Both the CIFAR10 and ImageNet code comes directly from publicly available examples from PyTorch. The only edits are the exits that are inserted in a methodology similar to the BranchyNet work. Deeper networks can benefit from multiple exits. Our examples illustrate both a single and a pair of early exits for CIFAR10 and ImageNet, respectively. Note that this code does not actually take exits. What it does is compute statistics of loss and accuracy assuming exits were taken when criteria are met. Actually implementing exits can be tricky and architecture-dependent, and we plan to address these issues. Heuristics The insertion of the exits is ad hoc, but there are some heuristic principles guiding their placement and parameters. The earlier exits are placed, the more aggressive the exit is, as it essentially prunes the rest of the network at a very early stage, thus saving a lot of work. However, a diminishing percentage of data will be directed through the exit if we are to preserve accuracy. There are other benefits to adding exits in that training the modified network now has backpropagation losses coming from the exits that affect the earlier layers more substantially than the last exit. This effect mitigates problems such as vanishing gradients. Early Exit Hyperparameters There are two parameters that are required to enable early exit. Leave them undefined if you are not enabling Early Exit: --earlyexit_thresholds defines the thresholds for each of the early exits. The cross entropy measure must be less than the specified threshold to take a specific exit, otherwise the data continues along the regular path. For example, you could specify \"--earlyexit_thresholds 0.9 1.2\" and this implies two early exits with corresponding thresholds of 0.9 and 1.2, respectively, for taking those exits. --earlyexit_lossweights provides the weights for the linear combination of losses during training to compute a single, overall loss. We only specify weights for the early exits and assume that the sum of the weights (including the final exit) is equal to 1.0. So an example of \"--earlyexit_lossweights 0.2 0.3\" implies two early exits weighted with values of 0.2 and 0.3, respectively, and that the final exit has a weight of 1.0-(0.2+0.3) = 0.5. Studies have shown that weighting the early exits more heavily will create more aggressive early exits, but perhaps with a slight negative effect on accuracy. Output Stats The example code outputs various statistics regarding the loss and accuracy at each of the exits. During training, the Top1 and Top5 stats represent the accuracy should all of the data be forced out of that exit (in order to compute the loss at that exit). During inference (i.e. validation and test stages), the Top1 and Top5 stats represent the accuracy for those data points that could exit because the calculated entropy at that exit was lower than the specified threshold for that exit. CIFAR10 In the case of CIFAR10, we have inserted a single exit after the first full layer grouping. The layers on the exit path itself include a convolutional layer and a fully connected layer. If you move the exit, be sure to match the proper sizes for inputs and outputs to the exit layers. Imagenet This supports training and inference on the ImageNet dataset via several well-known deep architectures. 
ResNet-50 is the architecture of interest in this study; however, the exit is defined in the generic ResNet code and could be used with ResNets of other sizes. There are two exits inserted in this example. Again, the exit layers must have their sizes matched properly. References Priyadarshini Panda, Abhronil Sengupta, Kaushik Roy . Conditional Deep Learning for Energy-Efficient and Enhanced Pattern Recognition , arXiv:1509.08971v6, 2017. Surat Teerapittayanon, Bradley McDanel, H. T. Kung . BranchyNet: Fast Inference via Early Exiting from Deep Neural Networks , arXiv:1709.01686, 2017.","title":"Early Exit Inference"},{"location":"earlyexit.html#early-exit-inference","text":"While Deep Neural Networks benefit from a large number of layers, it's often the case that many data points in classification tasks can be classified accurately with much less work. There have been several studies recently regarding the idea of exiting before the normal endpoint of the neural network. Panda et al. in Conditional Deep Learning for Energy-Efficient and Enhanced Pattern Recognition point out that many data points can be classified easily and require less processing than more difficult points, and they view this in terms of power savings. Teerapittayanon et al. in BranchyNet: Fast Inference via Early Exiting from Deep Neural Networks look at a selective approach to exit placement and criteria for exiting early.","title":"Early Exit Inference"},{"location":"earlyexit.html#why-does-early-exit-work","text":"Early Exit is a strategy with a straightforward and easy-to-understand concept. Figure #fig(boundaries) shows a simple example in a 2-D feature space. While deep networks can represent more complex and expressive boundaries between classes (assuming we\u2019re confident of avoiding over-fitting the data), it\u2019s also clear that much of the data can be properly classified with even the simplest of classification boundaries. Data points far from the boundary can be considered \"easy to classify\" and achieve a high degree of confidence more quickly than data points close to the boundary. In fact, we can think of the area between the outer straight lines as being the region that is \"difficult to classify\" and requires the full expressiveness of the neural network to accurately classify it.","title":"Why Does Early Exit Work?"},{"location":"earlyexit.html#example-code-for-early-exit","text":"Both the CIFAR10 and ImageNet code comes directly from publicly available examples from PyTorch. The only edits are the exits that are inserted in a methodology similar to the BranchyNet work. Deeper networks can benefit from multiple exits. Our examples illustrate both a single and a pair of early exits for CIFAR10 and ImageNet, respectively. Note that this code does not actually take exits. What it does is compute statistics of loss and accuracy assuming exits were taken when criteria are met. Actually implementing exits can be tricky and architecture-dependent, and we plan to address these issues.","title":"Example code for Early Exit"},{"location":"earlyexit.html#heuristics","text":"The insertion of the exits is ad hoc, but there are some heuristic principles guiding their placement and parameters. The earlier exits are placed, the more aggressive the exit is, as it essentially prunes the rest of the network at a very early stage, thus saving a lot of work. However, a diminishing percentage of data will be directed through the exit if we are to preserve accuracy. 
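As a concrete illustration of the loss weighting described in the Early Exit Hyperparameters discussion above, here is a minimal sketch (hypothetical names such as early_exit_training_loss; not the sample application's actual code):

import torch.nn.functional as F

def early_exit_training_loss(exit_outputs, target, loss_weights):
    # exit_outputs: logits from each early exit plus the final exit, in order;
    # loss_weights: e.g. [0.2, 0.3] for --earlyexit_lossweights 0.2 0.3.
    weights = list(loss_weights) + [1.0 - sum(loss_weights)]  # final exit weight
    return sum(w * F.cross_entropy(out, target)
               for w, out in zip(weights, exit_outputs))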
There are other benefits to adding exits in that training the modified network now has backpropagation losses coming from the exits that affect the earlier layers more substantially than the last exit. This effect mitigates problems such as vanishing gradients.","title":"Heuristics"},{"location":"earlyexit.html#early-exit-hyperparameters","text":"There are two parameters that are required to enable early exit. Leave them undefined if you are not enabling Early Exit: --earlyexit_thresholds defines the thresholds for each of the early exits. The cross entropy measure must be less than the specified threshold to take a specific exit, otherwise the data continues along the regular path. For example, you could specify \"--earlyexit_thresholds 0.9 1.2\" and this implies two early exits with corresponding thresholds of 0.9 and 1.2, respectively, for taking those exits. --earlyexit_lossweights provides the weights for the linear combination of losses during training to compute a single, overall loss. We only specify weights for the early exits and assume that the sum of the weights (including the final exit) is equal to 1.0. So an example of \"--earlyexit_lossweights 0.2 0.3\" implies two early exits weighted with values of 0.2 and 0.3, respectively, and that the final exit has a weight of 1.0-(0.2+0.3) = 0.5. Studies have shown that weighting the early exits more heavily will create more aggressive early exits, but perhaps with a slight negative effect on accuracy.","title":"Early Exit Hyperparameters"},{"location":"earlyexit.html#output-stats","text":"The example code outputs various statistics regarding the loss and accuracy at each of the exits. During training, the Top1 and Top5 stats represent the accuracy should all of the data be forced out of that exit (in order to compute the loss at that exit). During inference (i.e. validation and test stages), the Top1 and Top5 stats represent the accuracy for those data points that could exit because the calculated entropy at that exit was lower than the specified threshold for that exit.","title":"Output Stats"},{"location":"earlyexit.html#cifar10","text":"In the case of CIFAR10, we have inserted a single exit after the first full layer grouping. The layers on the exit path itself include a convolutional layer and a fully connected layer. If you move the exit, be sure to match the proper sizes for inputs and outputs to the exit layers.","title":"CIFAR10"},{"location":"earlyexit.html#imagenet","text":"This supports training and inference on the ImageNet dataset via several well-known deep architectures. ResNet-50 is the architecture of interest in this study; however, the exit is defined in the generic ResNet code and could be used with ResNets of other sizes. There are two exits inserted in this example. Again, the exit layers must have their sizes matched properly.","title":"Imagenet"},{"location":"earlyexit.html#references","text":"Priyadarshini Panda, Abhronil Sengupta, Kaushik Roy . Conditional Deep Learning for Energy-Efficient and Enhanced Pattern Recognition , arXiv:1509.08971v6, 2017. Surat Teerapittayanon, Bradley McDanel, H. T. Kung . BranchyNet: Fast Inference via Early Exiting from Deep Neural Networks , arXiv:1709.01686, 2017.","title":"References"},{"location":"install.html","text":"Distiller Installation These instructions will help get Distiller up and running on your local machine. You may also want to refer to these resources: Dataset installation instructions. Jupyter installation instructions. 
Notes: - Distiller has only been tested on Ubuntu 16.04 LTS, and with Python 3.5. - If you are not using a GPU, you might need to make small adjustments to the code. Clone Distiller Clone the Distiller code repository from GitHub: $ git clone https://github.com/NervanaSystems/distiller.git The rest of the documentation that follows assumes that you have cloned the repository to a directory called distiller . Create a Python virtual environment We recommend using a Python virtual environment , but that, of course, is up to you. There's nothing special about using Distiller in a virtual environment, but we provide some instructions for completeness. Before creating the virtual environment, make sure you are located in the distiller directory. After creating the environment, you should see a directory called distiller/env . Using virtualenv If you don't have virtualenv installed, you can find the installation instructions here . To create the environment, execute: $ python3 -m virtualenv env This creates a subdirectory named env where the Python virtual environment is stored, and configures the current shell to use it as the default Python environment. Using venv If you prefer to use venv , then begin by installing it: $ sudo apt-get install python3-venv Then create the environment: $ python3 -m venv env As with virtualenv, this creates a directory called distiller/env . Activate the environment The environment activation and deactivation commands for venv and virtualenv are the same. !NOTE: Make sure to activate the environment before proceeding with the installation of the dependency packages: $ source env/bin/activate Install the package Finally, install the Distiller package and its dependencies using pip3 : $ cd distiller $ pip3 install -e . This installs Distiller in \"development mode\", meaning any changes made in the code are reflected in the environment without re-running the install command (so no need to re-install after pulling changes from the Git repository). PyTorch is included in the requirements.txt file, and will currently download PyTorch version 1.0.1 for CUDA 9.0. This is the setup we've used for testing Distiller.","title":"Installation"},{"location":"install.html#distiller-installation","text":"These instructions will help get Distiller up and running on your local machine. You may also want to refer to these resources: Dataset installation instructions. Jupyter installation instructions. Notes: - Distiller has only been tested on Ubuntu 16.04 LTS, and with Python 3.5. - If you are not using a GPU, you might need to make small adjustments to the code.","title":"Distiller Installation"},{"location":"install.html#clone-distiller","text":"Clone the Distiller code repository from GitHub: $ git clone https://github.com/NervanaSystems/distiller.git The rest of the documentation that follows assumes that you have cloned the repository to a directory called distiller .","title":"Clone Distiller"},{"location":"install.html#create-a-python-virtual-environment","text":"We recommend using a Python virtual environment , but that, of course, is up to you. There's nothing special about using Distiller in a virtual environment, but we provide some instructions for completeness. Before creating the virtual environment, make sure you are located in the distiller directory. 
After creating the environment, you should see a directory called distiller/env .","title":"Create a Python virtual environment"},{"location":"install.html#using-virtualenv","text":"If you don't have virtualenv installed, you can find the installation instructions here . To create the environment, execute: $ python3 -m virtualenv env This creates a subdirectory named env where the Python virtual environment is stored, and configures the current shell to use it as the default Python environment.","title":"Using virtualenv"},{"location":"install.html#using-venv","text":"If you prefer to use venv , then begin by installing it: $ sudo apt-get install python3-venv Then create the environment: $ python3 -m venv env As with virtualenv, this creates a directory called distiller/env .","title":"Using venv"},{"location":"install.html#activate-the-environment","text":"The environment activation and deactivation commands for venv and virtualenv are the same. !NOTE: Make sure to activate the environment before proceeding with the installation of the dependency packages: $ source env/bin/activate","title":"Activate the environment"},{"location":"install.html#install-the-package","text":"Finally, install the Distiller package and its dependencies using pip3 : $ cd distiller $ pip3 install -e . This installs Distiller in \"development mode\", meaning any changes made in the code are reflected in the environment without re-running the install command (so no need to re-install after pulling changes from the Git repository). PyTorch is included in the requirements.txt file, and will currently download PyTorch version 1.0.1 for CUDA 9.0. This is the setup we've used for testing Distiller.","title":"Install the package"},{"location":"jupyter.html","text":"Jupyter environment The Jupyter notebooks environment allows us to plan our compression session and load Distiller data summaries to study and analyze compression results. Each notebook has embedded instructions and explanations, so here we provide only a brief description of each notebook. Installation Jupyter and its dependencies are included as part of the main requirements.txt file, so there is no need for a dedicated installation step. However, to use the ipywidgets extension, you will need to enable it: $ jupyter nbextension enable --py widgetsnbextension --sys-prefix You may want to refer to the ipywidgets extension installation documentation . Another extension which requires special installation handling is Qgrid . Qgrid is a Jupyter notebook widget that adds interactive features, such as sorting, to the rendering of Pandas DataFrames. To enable Qgrid: $ jupyter nbextension enable --py --sys-prefix qgrid Launching the Jupyter server There are many options you can use when launching Jupyter. The example below tells the server to listen to connections from any IP address, and not to launch the browser window, but of course, you are free to launch Jupyter any way you want. Consult the user's guide for more details. $ jupyter-notebook --ip=0.0.0.0 --no-browser Using the Distiller notebooks The Distiller Jupyter notebooks are located in the distiller/jupyter directory. They are provided as tools that you can use to prepare your compression experiments and study their results. We welcome new ideas and implementations of Jupyter notebooks. Roughly, the notebooks can be divided into three categories. 
Theory jupyter/L1-regularization.ipynb : Experience hands-on how L1 and L2 regularization affect the solution of a toy loss-minimization problem, to get a better grasp on the interaction between regularization and sparsity. jupyter/alexnet_insights.ipynb : This notebook reviews and compares a couple of pruning sessions on AlexNet. We compare distributions, performance and statistics, and show some visualizations of the weight tensors. Preparation for compression jupyter/model_summary.ipynb : Begin by getting familiar with your model. Examine the sizes and properties of layers and connections. Study which layers are compute-bound, and which are bandwidth-bound, and decide how to prune or regularize the model. jupyter/sensitivity_analysis.ipynb : If you performed pruning sensitivity analysis on your model, this notebook can help you load the results and graphically study how the layers behave. jupyter/interactive_lr_scheduler.ipynb : The learning rate decay policy affects pruning results, perhaps as much as it affects training results. Graph a few LR-decay policies to see how they behave. jupyter/agp_schedule.ipynb : If you are using the Automated Gradual Pruner, this notebook can help you tune the schedule. Reviewing experiment results jupyter/compare_executions.ipynb : This is a simple notebook to help you graphically compare the results of executions of several experiments. jupyter/compression_insights.ipynb : This notebook is packed with code, tables and graphs to help us understand the results of a compression session. Distiller provides summaries , which are Pandas dataframes that contain statistical information about your model. We chose to use Pandas dataframes because they can be sliced, queried, summarized and graphed with a few lines of code.","title":"Jupyter Notebooks"},{"location":"jupyter.html#jupyter-environment","text":"The Jupyter notebooks environment allows us to plan our compression session and load Distiller data summaries to study and analyze compression results. Each notebook has embedded instructions and explanations, so here we provide only a brief description of each notebook.","title":"Jupyter environment"},{"location":"jupyter.html#installation","text":"Jupyter and its dependencies are included as part of the main requirements.txt file, so there is no need for a dedicated installation step. However, to use the ipywidgets extension, you will need to enable it: $ jupyter nbextension enable --py widgetsnbextension --sys-prefix You may want to refer to the ipywidgets extension installation documentation . Another extension which requires special installation handling is Qgrid . Qgrid is a Jupyter notebook widget that adds interactive features, such as sorting, to the rendering of Pandas DataFrames. To enable Qgrid: $ jupyter nbextension enable --py --sys-prefix qgrid","title":"Installation"},{"location":"jupyter.html#launching-the-jupyter-server","text":"There are many options you can use when launching Jupyter. The example below tells the server to listen to connections from any IP address, and not to launch the browser window, but of course, you are free to launch Jupyter any way you want. Consult the user's guide for more details. $ jupyter-notebook --ip=0.0.0.0 --no-browser","title":"Launching the Jupyter server"},{"location":"jupyter.html#using-the-distiller-notebooks","text":"The Distiller Jupyter notebooks are located in the distiller/jupyter directory. 
They are provided as tools that you can use to prepare your compression experiments and study their results. We welcome new ideas and implementations of Jupyter notebooks. Roughly, the notebooks can be divided into three categories.","title":"Using the Distiller notebooks"},{"location":"jupyter.html#theory","text":"jupyter/L1-regularization.ipynb : Experience hands-on how L1 and L2 regularization affect the solution of a toy loss-minimization problem, to get a better grasp on the interaction between regularization and sparsity. jupyter/alexnet_insights.ipynb : This notebook reviews and compares a couple of pruning sessions on AlexNet. We compare distributions, performance and statistics, and show some visualizations of the weight tensors.","title":"Theory"},{"location":"jupyter.html#preparation-for-compression","text":"jupyter/model_summary.ipynb : Begin by getting familiar with your model. Examine the sizes and properties of layers and connections. Study which layers are compute-bound, and which are bandwidth-bound, and decide how to prune or regularize the model. jupyter/sensitivity_analysis.ipynb : If you performed pruning sensitivity analysis on your model, this notebook can help you load the results and graphically study how the layers behave. jupyter/interactive_lr_scheduler.ipynb : The learning rate decay policy affects pruning results, perhaps as much as it affects training results. Graph a few LR-decay policies to see how they behave. jupyter/agp_schedule.ipynb : If you are using the Automated Gradual Pruner, this notebook can help you tune the schedule.","title":"Preparation for compression"},{"location":"jupyter.html#reviewing-experiment-results","text":"jupyter/compare_executions.ipynb : This is a simple notebook to help you graphically compare the results of executions of several experiments. jupyter/compression_insights.ipynb : This notebook is packed with code, tables and graphs to help us understand the results of a compression session. Distiller provides summaries , which are Pandas dataframes that contain statistical information about your model. We chose to use Pandas dataframes because they can be sliced, queried, summarized and graphed with a few lines of code.","title":"Reviewing experiment results"},{"location":"knowledge_distillation.html","text":"Knowledge Distillation (For details on how to train a model with knowledge distillation in Distiller, see here ) Knowledge distillation is a model compression method in which a small model is trained to mimic a pre-trained, larger model (or ensemble of models). This training setting is sometimes referred to as \"teacher-student\", where the large model is the teacher and the small model is the student (we'll be using these terms interchangeably). The method was first proposed by Bucila et al., 2006 and generalized by Hinton et al., 2015 . The implementation in Distiller is based on the latter publication. Here we'll provide a summary of the method. For more information, the reader may refer to the paper (a video lecture with slides is also available). In distillation, knowledge is transferred from the teacher model to the student by minimizing a loss function in which the target is the distribution of class probabilities predicted by the teacher model. That is - the output of a softmax function on the teacher model's logits. However, in many cases, this probability distribution has the correct class at a very high probability, with all other class probabilities very close to 0. 
As such, it doesn't provide much information beyond the ground truth labels already provided in the dataset. To tackle this issue, Hinton et al., 2015 introduced the concept of \"softmax temperature\". The probability p_i of class i is calculated from the logits z as: p_i = \\frac{\\exp\\left(\\frac{z_i}{T}\\right)}{\\sum_{j} \\exp\\left(\\frac{z_j}{T}\\right)} where T is the temperature parameter. When T=1 we get the standard softmax function. As T grows, the probability distribution generated by the softmax function becomes softer, providing more information as to which classes the teacher found more similar to the predicted class. Hinton calls this the \"dark knowledge\" embedded in the teacher model, and it is this dark knowledge that we are transferring to the student model in the distillation process. When computing the loss function vs. the teacher's soft targets, we use the same value of T to compute the softmax on the student's logits. We call this loss the \"distillation loss\". Hinton et al., 2015 found that it is also beneficial to train the distilled model to produce the correct labels (based on the ground truth) in addition to the teacher's soft-labels. Hence, we also calculate the \"standard\" loss between the student's predicted class probabilities and the ground-truth labels (also called \"hard labels/targets\"). We dub this loss the \"student loss\". When calculating the class probabilities for the student loss we use T = 1 . The overall loss function, incorporating both distillation and student losses, is calculated as: \\mathcal{L}(x;W) = \\alpha * \\mathcal{H}(y, \\sigma(z_s; T=1)) + \\beta * \\mathcal{H}(\\sigma(z_t; T=\\tau), \\sigma(z_s, T=\\tau)) where x is the input, W are the student model parameters, y is the ground truth label, \\mathcal{H} is the cross-entropy loss function, \\sigma is the softmax function parameterized by the temperature T , and \\alpha and \\beta are coefficients. z_s and z_t are the logits of the student and teacher respectively. New Hyper-Parameters In general \\tau , \\alpha and \\beta are hyper-parameters. In their experiments, Hinton et al., 2015 use temperature values ranging from 1 to 20. They note that empirically, when the student model is very small compared to the teacher model, lower temperatures work better. This makes sense if we consider that as we raise the temperature, the resulting soft-labels distribution becomes richer in information, and a very small model might not be able to capture all of this information. However, there's no clear way to predict up front what kind of capacity for information the student model will have. With regard to \\alpha and \\beta , Hinton et al., 2015 use a weighted average between the distillation loss and the student loss. That is, \\beta = 1 - \\alpha . They note that in general, they obtained the best results when setting \\alpha to be much smaller than \\beta (although in one of their experiments they use \\alpha = \\beta = 0.5 ). Other works which utilize knowledge distillation don't use a weighted average. Some set \\alpha = 1 while leaving \\beta tunable, while others don't set any constraints. Combining with Other Model Compression Techniques In the \"basic\" scenario, the smaller (student) model is a pre-defined architecture which just has a smaller number of parameters compared to the teacher model. For example, we could train ResNet-18 by distilling knowledge from ResNet-34. 
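A minimal sketch of the combined loss defined above (hypothetical function name, not Distiller's implementation; note that some implementations additionally scale the distillation term by T^2, which is omitted here to match the formula):

import torch.nn.functional as F

def knowledge_distillation_loss(student_logits, teacher_logits, target,
                                T, alpha, beta):
    # Student loss: standard cross-entropy with the hard labels (T = 1).
    student_loss = F.cross_entropy(student_logits, target)
    # Distillation loss: cross-entropy between the teacher's and the
    # student's softened distributions, both computed with temperature T.
    soft_teacher = F.softmax(teacher_logits / T, dim=1)
    log_soft_student = F.log_softmax(student_logits / T, dim=1)
    distillation_loss = -(soft_teacher * log_soft_student).sum(dim=1).mean()
    return alpha * student_loss + beta * distillation_loss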
But, a model with smaller capacity can also be obtained by other model compression techniques - sparsification and/or quantization. So, for example, we could train a 4-bit ResNet-18 model with some method using quantization-aware training, and use a distillation loss function as described above. In that case, the teacher model can even be an FP32 ResNet-18 model. The same goes for pruning and regularization. Tann et al., 2017 , Mishra and Marr, 2018 and Polino et al., 2018 are some works that combine knowledge distillation with quantization . Theis et al., 2018 and Ashok et al., 2018 combine distillation with pruning . References Cristian Bucila, Rich Caruana, and Alexandru Niculescu-Mizil . Model Compression. KDD, 2006 Geoffrey Hinton, Oriol Vinyals and Jeff Dean . Distilling the Knowledge in a Neural Network. arxiv:1503.02531 Hokchhay Tann, Soheil Hashemi, Iris Bahar and Sherief Reda . Hardware-Software Codesign of Accurate, Multiplier-free Deep Neural Networks. DAC, 2017 Asit Mishra and Debbie Marr . Apprentice: Using Knowledge Distillation Techniques To Improve Low-Precision Network Accuracy. ICLR, 2018 Antonio Polino, Razvan Pascanu and Dan Alistarh . Model compression via distillation and quantization. ICLR, 2018 Anubhav Ashok, Nicholas Rhinehart, Fares Beainy and Kris M. Kitani . N2N learning: Network to Network Compression via Policy Gradient Reinforcement Learning. ICLR, 2018 Lucas Theis, Iryna Korshunova, Alykhan Tejani and Ferenc Husz\u00e1r . Faster gaze prediction with dense networks and Fisher pruning. arxiv:1801.05787","title":"Knowledge Distillation"},{"location":"knowledge_distillation.html#knowledge-distillation","text":"(For details on how to train a model with knowledge distillation in Distiller, see here ) Knowledge distillation is a model compression method in which a small model is trained to mimic a pre-trained, larger model (or ensemble of models). This training setting is sometimes referred to as \"teacher-student\", where the large model is the teacher and the small model is the student (we'll be using these terms interchangeably). The method was first proposed by Bucila et al., 2006 and generalized by Hinton et al., 2015 . The implementation in Distiller is based on the latter publication. Here we'll provide a summary of the method. For more information, the reader may refer to the paper (a video lecture with slides is also available). In distillation, knowledge is transferred from the teacher model to the student by minimizing a loss function in which the target is the distribution of class probabilities predicted by the teacher model. That is - the output of a softmax function on the teacher model's logits. However, in many cases, this probability distribution has the correct class at a very high probability, with all other class probabilities very close to 0. As such, it doesn't provide much information beyond the ground truth labels already provided in the dataset. To tackle this issue, Hinton et al., 2015 introduced the concept of \"softmax temperature\". The probability p_i of class i is calculated from the logits z as: p_i = \\frac{\\exp\\left(\\frac{z_i}{T}\\right)}{\\sum_{j} \\exp\\left(\\frac{z_j}{T}\\right)} where T is the temperature parameter. When T=1 we get the standard softmax function. As T grows, the probability distribution generated by the softmax function becomes softer, providing more information as to which classes the teacher found more similar to the predicted class. 
Hinton calls this the \"dark knowledge\" embedded in the teacher model, and it is this dark knowledge that we are transferring to the student model in the distillation process. When computing the loss function vs. the teacher's soft targets, we use the same value of T to compute the softmax on the student's logits. We call this loss the \"distillation loss\". Hinton et al., 2015 found that it is also beneficial to train the distilled model to produce the correct labels (based on the ground truth) in addition to the teacher's soft-labels. Hence, we also calculate the \"standard\" loss between the student's predicted class probabilities and the ground-truth labels (also called \"hard labels/targets\"). We dub this loss the \"student loss\". When calculating the class probabilities for the student loss we use T = 1 . The overall loss function, incorporating both distillation and student losses, is calculated as: \\mathcal{L}(x;W) = \\alpha * \\mathcal{H}(y, \\sigma(z_s; T=1)) + \\beta * \\mathcal{H}(\\sigma(z_t; T=\\tau), \\sigma(z_s, T=\\tau)) where x is the input, W are the student model parameters, y is the ground truth label, \\mathcal{H} is the cross-entropy loss function, \\sigma is the softmax function parameterized by the temperature T , and \\alpha and \\beta are coefficients. z_s and z_t are the logits of the student and teacher respectively.","title":"Knowledge Distillation"},{"location":"knowledge_distillation.html#new-hyper-parameters","text":"In general \\tau , \\alpha and \\beta are hyper-parameters. In their experiments, Hinton et al., 2015 use temperature values ranging from 1 to 20. They note that empirically, when the student model is very small compared to the teacher model, lower temperatures work better. This makes sense if we consider that as we raise the temperature, the resulting soft-labels distribution becomes richer in information, and a very small model might not be able to capture all of this information. However, there's no clear way to predict up front what kind of capacity for information the student model will have. With regard to \\alpha and \\beta , Hinton et al., 2015 use a weighted average between the distillation loss and the student loss. That is, \\beta = 1 - \\alpha . They note that in general, they obtained the best results when setting \\alpha to be much smaller than \\beta (although in one of their experiments they use \\alpha = \\beta = 0.5 ). Other works which utilize knowledge distillation don't use a weighted average. Some set \\alpha = 1 while leaving \\beta tunable, while others don't set any constraints.","title":"New Hyper-Parameters"},{"location":"knowledge_distillation.html#references","text":"Cristian Bucila, Rich Caruana, and Alexandru Niculescu-Mizil . Model Compression. KDD, 2006 Geoffrey Hinton, Oriol Vinyals and Jeff Dean . Distilling the Knowledge in a Neural Network. arxiv:1503.02531 Hokchhay Tann, Soheil Hashemi, Iris Bahar and Sherief Reda . Hardware-Software Codesign of Accurate, Multiplier-free Deep Neural Networks. DAC, 2017 Asit Mishra and Debbie Marr . Apprentice: Using Knowledge Distillation Techniques To Improve Low-Precision Network Accuracy. ICLR, 2018 Antonio Polino, Razvan Pascanu and Dan Alistarh . Model compression via distillation and quantization. ICLR, 2018 Anubhav Ashok, Nicholas Rhinehart, Fares Beainy and Kris M. Kitani . N2N learning: Network to Network Compression via Policy Gradient Reinforcement Learning. ICLR, 2018 Lucas Theis, Iryna Korshunova, Alykhan Tejani and Ferenc Husz\u00e1r . 
Faster gaze prediction with dense networks and Fisher pruning. arxiv:1801.05787","title":"References"},{"location":"model_zoo.html","text":"Distiller Model Zoo How to contribute models to the Model Zoo We encourage you to contribute new models to the Model Zoo. We welcome implementations of published papers or of your own work. To ensure that models and algorithms shared with others are of high quality, please commit your models with the following: Command-line arguments Log files PyTorch model Contents The Distiller model zoo is not a \"traditional\" model-zoo, because it does not necessarily contain best-in-class compressed models. Instead, the model-zoo contains a number of deep learning models that have been compressed using Distiller following some well-known research papers. These are meant to serve as examples of how Distiller can be used. Each model contains a Distiller schedule detailing how the model was compressed, a PyTorch checkpoint, text logs and TensorBoard logs.
Paper | Dataset | Network | Method & Granularity | Schedule | Features
Learning both Weights and Connections for Efficient Neural Networks | ImageNet | Alexnet | Element-wise pruning | Iterative; Manual | Magnitude thresholding based on a sensitivity quantifier. Element-wise sparsity sensitivity analysis
To prune, or not to prune: exploring the efficacy of pruning for model compression | ImageNet | MobileNet | Element-wise pruning | Automated gradual; Iterative | Magnitude thresholding based on target level
Learning Structured Sparsity in Deep Neural Networks | CIFAR10 | ResNet20 | Group regularization | 1. Train with group-lasso 2. Remove zero groups and fine-tune | Group Lasso regularization. Groups: kernels (2D), channels, filters (3D), layers (4D), vectors (rows, cols)
Pruning Filters for Efficient ConvNets | CIFAR10 | ResNet56 | Filter ranking; guided by sensitivity analysis | 1. Rank filters 2. Remove filters and channels 3. Fine-tune | One-shot ranking and pruning of filters; with network thinning
Learning both Weights and Connections for Efficient Neural Networks This schedule is an example of \"Iterative Pruning\" for Alexnet/ImageNet, as described in chapter 3 of Song Han's PhD dissertation: Efficient Methods and Hardware for Deep Learning and in his paper Learning both Weights and Connections for Efficient Neural Networks . The Distiller schedule uses SensitivityPruner which is similar to MagnitudeParameterPruner, but instead of specifying \"raw\" thresholds, it uses a \"sensitivity parameter\". Song Han's paper says that \"the pruning threshold is chosen as a quality parameter multiplied by the standard deviation of a layer's weights,\" and this is not explained much further. In Distiller, the \"quality parameter\" is referred to as \"sensitivity\" and is based on the values learned from performing sensitivity analysis. Using a parameter that is related to the standard deviation is very helpful: under the assumption that the weights tensors are distributed normally, the standard deviation acts as a threshold normalizer. Note that Distiller's implementation deviates slightly from the algorithm Song Han describes in his PhD dissertation, in that the threshold value is set only once. In his PhD dissertation, Song Han describes a threshold that grows at each iteration. This requires n+1 hyper-parameters (n being the number of pruning iterations we use): the threshold and the threshold increase (delta) at each pruning iteration. 
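To make the sensitivity-based thresholding concrete, here is a minimal sketch of a pruning mask computed this way (illustrative only; not Distiller's SensitivityPruner implementation):

def sensitivity_mask(weights, sensitivity):
    # weights: a torch.Tensor of layer weights.
    # Keep weights whose magnitude exceeds sensitivity * std(weights);
    # under the normal-distribution assumption discussed above, the
    # standard deviation normalizes the threshold across layers.
    threshold = sensitivity * weights.std()
    return (weights.abs() > threshold).float()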
Distiller's implementation takes advantage of the fact that as pruning progresses, more weights are pulled toward zero, and therefore the threshold \"traps\" more weights. Thus, we can use fewer hyper-parameters and achieve the same results. Distiller schedule: distiller/examples/sensitivity-pruning/alexnet.schedule_sensitivity.yaml Checkpoint file: alexnet.checkpoint.89.pth.tar Results Our reference is TorchVision's pretrained Alexnet model which has a Top1 accuracy of 56.55 and Top5=79.09. We prune away 88.44% of the parameters and achieve Top1=56.61 and Top5=79.45. Song Han prunes 89% of the parameters, which is slightly better than our results. Parameters:
+----+---------------------------+------------------+---------------+----------------+------------+------------+----------+----------+----------+------------+---------+----------+------------+
| | Name | Shape | NNZ (dense) | NNZ (sparse) | Cols (%) | Rows (%) | Ch (%) | 2D (%) | 3D (%) | Fine (%) | Std | Mean | Abs-Mean |
|----+---------------------------+------------------+---------------+----------------+------------+------------+----------+----------+----------+------------+---------+----------+------------|
| 0 | features.module.0.weight | (64, 3, 11, 11) | 23232 | 13411 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 42.27359 | 0.14391 | -0.00002 | 0.08805 |
| 1 | features.module.3.weight | (192, 64, 5, 5) | 307200 | 115560 | 0.00000 | 0.00000 | 0.00000 | 1.91243 | 0.00000 | 62.38281 | 0.04703 | -0.00250 | 0.02289 |
| 2 | features.module.6.weight | (384, 192, 3, 3) | 663552 | 256565 | 0.00000 | 0.00000 | 0.00000 | 6.18490 | 0.00000 | 61.33445 | 0.03354 | -0.00184 | 0.01803 |
| 3 | features.module.8.weight | (256, 384, 3, 3) | 884736 | 315065 | 0.00000 | 0.00000 | 0.00000 | 6.96411 | 0.00000 | 64.38881 | 0.02646 | -0.00168 | 0.01422 |
| 4 | features.module.10.weight | (256, 256, 3, 3) | 589824 | 186938 | 0.00000 | 0.00000 | 0.00000 | 15.49225 | 0.00000 | 68.30614 | 0.02714 | -0.00246 | 0.01409 |
| 5 | classifier.1.weight | (4096, 9216) | 37748736 | 3398881 | 0.00000 | 0.21973 | 0.00000 | 0.21973 | 0.00000 | 90.99604 | 0.00589 | -0.00020 | 0.00168 |
| 6 | classifier.4.weight | (4096, 4096) | 16777216 | 1782769 | 0.21973 | 3.46680 | 0.00000 | 3.46680 | 0.00000 | 89.37387 | 0.00849 | -0.00066 | 0.00263 |
| 7 | classifier.6.weight | (1000, 4096) | 4096000 | 994738 | 3.36914 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 75.71440 | 0.01718 | 0.00030 | 0.00778 |
| 8 | Total sparsity: | - | 61090496 | 7063928 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 88.43694 | 0.00000 | 0.00000 | 0.00000 |
+----+---------------------------+------------------+---------------+----------------+------------+------------+----------+----------+----------+------------+---------+----------+------------+
2018-04-04 21:30:52,499 - Total sparsity: 88.44
2018-04-04 21:30:52,499 - --- validate (epoch=89)-----------
2018-04-04 21:30:52,499 - 128116 samples (256 per mini-batch)
2018-04-04 21:31:35,357 - == Top1: 51.838 Top5: 74.817 Loss: 2.150
2018-04-04 21:31:39,251 - --- test ---------------------
2018-04-04 21:31:39,252 - 50000 samples (256 per mini-batch)
2018-04-04 21:32:01,274 - == Top1: 56.606 Top5: 79.446 Loss: 1.893
To prune, or not to prune: exploring the efficacy of pruning for model compression In their paper, Zhu and Gupta \"compare the accuracy of large, but pruned models (large-sparse) and their smaller, but dense (small-dense) counterparts with identical memory footprint.\" They also \"propose a new gradual pruning technique that is simple and 
straightforward to apply across a variety of models/datasets with minimal tuning.\" This pruning schedule is implemented by distiller.AutomatedGradualPruner, which increases the sparsity level (expressed as a percentage of zero-valued elements) gradually over several pruning steps. Distiller's implementation only prunes elements once in an epoch (the model is fine-tuned in between pruning events), which is a small deviation from Zhu and Gupta's paper. The research paper specifies the schedule in terms of mini-batches, while our implementation specifies the schedule in terms of epochs. We feel that using epochs performs well, and is more \"stable\", since the number of mini-batches will change if you change the batch size. ImageNet files: Distiller schedule: distiller/examples/agp-pruning/mobilenet.imagenet.schedule_agp.yaml Checkpoint file: checkpoint.pth.tar ResNet18 files: Distiller schedule: distiller/examples/agp-pruning/resnet18.schedule_agp.yaml Checkpoint file: checkpoint.pth.tar Results As our baseline we used a pretrained PyTorch MobileNet model (width=1) which has Top1=68.848 and Top5=88.740. In their paper, Zhu and Gupta prune 50% of the elements of MobileNet (width=1) with a 1.1% drop in accuracy. We pruned about 51.6% of the elements, with virtually no change in the accuracies (Top1: 68.808 and Top5: 88.656). We didn't try to prune more than this, but we do note that the baseline accuracy that we used is almost 2% lower than the accuracy published in the paper.
+----+--------------------------+--------------------+---------------+----------------+------------+------------+----------+----------+----------+------------+---------+----------+------------+
| | Name | Shape | NNZ (dense) | NNZ (sparse) | Cols (%) | Rows (%) | Ch (%) | 2D (%) | 3D (%) | Fine (%) | Std | Mean | Abs-Mean |
|----+--------------------------+--------------------+---------------+----------------+------------+------------+----------+----------+----------+------------+---------+----------+------------|
| 0 | module.model.0.0.weight | (32, 3, 3, 3) | 864 | 864 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.14466 | 0.00103 | 0.06508 |
| 1 | module.model.1.0.weight | (32, 1, 3, 3) | 288 | 288 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.32146 | 0.01020 | 0.12932 |
| 2 | module.model.1.3.weight | (64, 32, 1, 1) | 2048 | 2048 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.11942 | 0.00024 | 0.03627 |
| 3 | module.model.2.0.weight | (64, 1, 3, 3) | 576 | 576 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.15809 | 0.00543 | 0.11513 |
| 4 | module.model.2.3.weight | (128, 64, 1, 1) | 8192 | 8192 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.08442 | -0.00031 | 0.04182 |
| 5 | module.model.3.0.weight | (128, 1, 3, 3) | 1152 | 1152 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.16780 | 0.00125 | 0.10545 |
| 6 | module.model.3.3.weight | (128, 128, 1, 1) | 16384 | 16384 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.07126 | -0.00197 | 0.04123 |
| 7 | module.model.4.0.weight | (128, 1, 3, 3) | 1152 | 1152 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.10182 | 0.00171 | 0.08719 |
| 8 | module.model.4.3.weight | (256, 128, 1, 1) | 32768 | 13108 | 0.00000 | 0.00000 | 10.15625 | 59.99756 | 12.50000 | 59.99756 | 0.05543 | -0.00002 | 0.02760 |
| 9 | module.model.5.0.weight | (256, 1, 3, 3) | 2304 | 2304 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.12516 | -0.00288 | 0.08058 |
| 10 | module.model.5.3.weight | (256, 256, 1, 1) | 65536 | 26215 | 0.00000 | 0.00000 | 12.50000 | 59.99908 | 23.82812 | 59.99908 | 0.04453 | 0.00002 | 0.02271 |
| 11 | module.model.6.0.weight | (256, 1, 3, 3) | 2304 | 2304 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.08024 | 0.00252 | 0.06377 |
| 12 | module.model.6.3.weight | (512, 256, 1, 1) | 131072 | 52429 | 0.00000 | 0.00000 | 23.82812 | 59.99985 | 14.25781 | 59.99985 | 0.03561 | -0.00057 | 0.01779 |
| 13 | module.model.7.0.weight | (512, 1, 3, 3) | 4608 | 4608 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.11008 | -0.00018 | 0.06829 |
| 14 | module.model.7.3.weight | (512, 512, 1, 1) | 262144 | 104858 | 0.00000 | 0.00000 | 14.25781 | 59.99985 | 21.28906 | 59.99985 | 0.02944 | -0.00060 | 0.01515 |
| 15 | module.model.8.0.weight | (512, 1, 3, 3) | 4608 | 4608 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.08258 | 0.00370 | 0.04905 |
| 16 | module.model.8.3.weight | (512, 512, 1, 1) | 262144 | 104858 | 0.00000 | 0.00000 | 21.28906 | 59.99985 | 28.51562 | 59.99985 | 0.02865 | -0.00046 | 0.01465 |
| 17 | module.model.9.0.weight | (512, 1, 3, 3) | 4608 | 4608 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.07578 | 0.00468 | 0.04201 |
| 18 | module.model.9.3.weight | (512, 512, 1, 1) | 262144 | 104858 | 0.00000 | 0.00000 | 28.51562 | 59.99985 | 23.43750 | 59.99985 | 0.02939 | -0.00044 | 0.01511 |
| 19 | module.model.10.0.weight | (512, 1, 3, 3) | 4608 | 4608 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.07091 | 0.00014 | 0.04306 |
| 20 | module.model.10.3.weight | (512, 512, 1, 1) | 262144 | 104858 | 0.00000 | 0.00000 | 24.60938 | 59.99985 | 20.89844 | 59.99985 | 0.03095 | -0.00059 | 0.01672 |
| 21 | module.model.11.0.weight | (512, 1, 3, 3) | 4608 | 4608 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.05729 | -0.00518 | 0.04267 |
| 22 | module.model.11.3.weight | (512, 512, 1, 1) | 262144 | 104858 | 0.00000 | 0.00000 | 20.89844 | 59.99985 | 17.57812 | 59.99985 | 0.03229 | -0.00044 | 0.01797 |
| 23 | module.model.12.0.weight | (512, 1, 3, 3) | 4608 | 4608 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.04981 | -0.00136 | 0.03967 |
| 24 | module.model.12.3.weight | (1024, 512, 1, 1) | 524288 | 209716 | 0.00000 | 0.00000 | 16.01562 | 59.99985 | 44.23828 | 59.99985 | 0.02514 | -0.00106 | 0.01278 |
| 25 | module.model.13.0.weight | (1024, 1, 3, 3) | 9216 | 9216 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.02396 | -0.00949 | 0.01549 |
| 26 | module.model.13.3.weight | (1024, 1024, 1, 1) | 1048576 | 419431 | 0.00000 | 0.00000 | 44.72656 | 59.99994 | 1.46484 | 59.99994 | 0.01801 | -0.00017 | 0.00931 |
| 27 | module.fc.weight | (1000, 1024) | 1024000 | 409600 | 1.46484 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 60.00000 | 0.05078 | 0.00271 | 0.02734 |
| 28 | Total sparsity: | - | 4209088 | 1726917 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 58.97171 | 0.00000 | 0.00000 | 0.00000 |
+----+--------------------------+--------------------+---------------+----------------+------------+------------+----------+----------+----------+------------+---------+----------+------------+
Total sparsity: 58.97
--- validate (epoch=199)-----------
128116 samples (256 per mini-batch)
== Top1: 65.337 Top5: 84.984 Loss: 1.494
--- test ---------------------
50000 samples (256 per mini-batch)
== Top1: 68.810 Top5: 88.626 Loss: 1.282
Learning Structured Sparsity in Deep Neural Networks This research paper from the 
University of Pittsburgh, \"proposes a Structured Sparsity Learning (SSL) method to regularize the structures (i.e., filters, channels, filter shapes, and layer depth) of DNNs. SSL can: (1) learn a compact structure from a bigger DNN to reduce computation cost; (2) obtain a hardware-friendly structured sparsity of DNN to efficiently accelerate the DNN\u2019s evaluation.\" Note that this paper does not use pruning, but instead uses group regularization during the training to force weights towards zero, as a group. We used a schedule which thresholds the regularized elements at a magnitude equal to the regularization strength. At the end of the regularization phase, we save the final sparsity masks generated by the regularization, and exit. Then we load this regularized model and remove the layers corresponding to the zeroed weight tensors (those in which all of a layer's elements have a zero value). Baseline training We started by training the baseline ResNet20-Cifar dense network since we didn't have a pre-trained model. Distiller schedule: distiller/examples/ssl/resnet20_cifar_baseline_training.yaml Checkpoint files: distiller/examples/ssl/checkpoints/ $ time python3 compress_classifier.py --arch resnet20_cifar ../data.cifar10 -p=50 --lr=0.3 --epochs=180 --compress=../cifar10/resnet20/baseline_training.yaml -j=1 --deterministic Regularization Then we started training from scratch again, but this time we used Group Lasso regularization on entire layers: Distiller schedule: distiller/examples/ssl/ssl_4D-removal_4L_training.yaml $ time python3 compress_classifier.py --arch resnet20_cifar ../data.cifar10 -p=50 --lr=0.4 --epochs=180 --compress=../ssl/ssl_4D-removal_training.yaml -j=1 --deterministic The diagram below shows the training of Resnet20/CIFAR10 using Group Lasso regularization on entire layers (in blue) vs. training Resnet20/CIFAR10 baseline (in red). You may notice several interesting things: 1. The LR-decay policy is the same, but the two sessions start with different initial LR values. 2. The data-loss of the regularized training follows the same shape as the un-regularized training (baseline), and eventually the two seem to merge. 3. We see similar behavior in the validation Top1 and Top5 accuracy results, but the regularized training eventually performs better. 4. In the top right corner we see the behavior of the regularization loss ( Reg Loss ), which actually increases for some time, until the data-loss has a sharp drop (after ~16K mini-batches), at which point the regularization loss also starts dropping. This regularization yields 5 layers with zeroed weight tensors. We load this model, remove the 5 layers, and start the fine-tuning of the weights. This process of layer removal is specific to ResNet for CIFAR, which we altered by adding code to skip over layers during the forward path. When you export to ONNX, the removed layers do not participate in the forward path, so they don't get instantiated. We managed to remove 5 of the 16 3x3 convolution layers which dominate the computation time. It's not bad, but we probably could have done better. Fine-tuning During the fine-tuning process, because the removed layers do not participate in the forward path, they do not appear in the backward path and are not back-propagated through: therefore they are completely disconnected from the network. We copy the checkpoint file of the regularized model to checkpoint_trained_4D_regularized_5Lremoved.pth.tar . 
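For reference, a minimal sketch of the layer-wise Group Lasso term used in this kind of regularization (illustrative only; Distiller's GroupLassoRegularizer supports additional group types, and the strength value shown is hypothetical):

import torch.nn as nn

def group_lasso_layers(model, lambda_reg):
    # Group Lasso with each convolution layer's entire weight tensor as one
    # group (the layer-wise "4D" case used here): driving a whole group's
    # norm to zero allows the layer to be removed afterwards.
    reg = sum(m.weight.norm(p=2) for m in model.modules()
              if isinstance(m, nn.Conv2d))
    return lambda_reg * reg

# During the regularization phase this term is added to the data loss, e.g.:
#   loss = criterion(output, target) + group_lasso_layers(model, 0.0004)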
Distiller schedule: distiller/examples/ssl/ssl_4D-removal_finetuning.yaml $ time python3 compress_classifier.py --arch resnet20_cifar ../data.cifar10 -p=50 --lr=0.1 --epochs=250 --resume=../cifar10/resnet20/checkpoint_trained_4D_regularized_5Lremoved.pth.tar --compress=../ssl/ssl_4D-removal_finetuning.yaml -j=1 --deterministic Results Our baseline results for ResNet20 Cifar are: Top1=91.450 and Top5=99.750 We used Distiller's GroupLassoRegularizer to remove 5 layers from Resnet20 (CIFAR10) with virtually no degradation in accuracy. The regularized model exhibits really poor classification abilities: $ time python3 compress_classifier.py --arch resnet20_cifar ../data.cifar10 -p=50 --resume=../cifar10/resnet20/checkpoint_trained_4D_regularized_5Lremoved.pth.tar --evaluate = loading checkpoint ../cifar10/resnet20/checkpoint_trained_4D_regularized_5Lremoved.pth.tar best top@1: 90.620 Loaded compression schedule from checkpoint (epoch 179) Removing layer: module.layer1.0.conv1 [layer=0 block=0 conv=0] Removing layer: module.layer1.0.conv2 [layer=0 block=0 conv=1] Removing layer: module.layer1.1.conv1 [layer=0 block=1 conv=0] Removing layer: module.layer1.1.conv2 [layer=0 block=1 conv=1] Removing layer: module.layer2.2.conv2 [layer=1 block=2 conv=1] Files already downloaded and verified Files already downloaded and verified Dataset sizes: training=45000 validation=5000 test=10000 --- test --------------------- 10000 samples (256 per mini-batch) == Top1: 22.290 Top5: 68.940 Loss: 5.172 However, after fine-tuning, we recovered most of the accuracy loss, but not quite all of it: Top1=91.020 and Top5=99.670 We didn't spend time trying to wrestle with this network, and therefore didn't achieve SSL's published results (which showed that they managed to remove 6 layers and at the same time increase accuracy). Pruning Filters for Efficient ConvNets Quoting the authors directly: We present an acceleration method for CNNs, where we prune filters from CNNs that are identified as having a small effect on the output accuracy. By removing whole filters in the network together with their connecting feature maps, the computation costs are reduced significantly. In contrast to pruning weights, this approach does not result in sparse connectivity patterns. Hence, it does not need the support of sparse convolution libraries and can work with existing efficient BLAS libraries for dense matrix multiplications. The implementation of the research by Hao Li et al. required us to add filter-pruning sensitivity analysis and support for \"network thinning\". After performing filter-pruning sensitivity analysis to assess which layers are more sensitive to the pruning of filters, we execute distiller.L1RankedStructureParameterPruner once in order to rank the filters of each layer by their L1-norm values, and then we prune to the sparsity level prescribed by the schedule. Distiller schedule: distiller/examples/pruning_filters_for_efficient_convnets/resnet56_cifar_filter_rank.yaml Checkpoint files: checkpoint_finetuned.pth.tar The excerpt from the schedule, displayed below, shows how we declare the L1RankedStructureParameterPruner. This class currently ranks filters only, but because in the future it may support ranking of other structures as well, you need to specify, for each parameter, both the target sparsity level and the structure type ('3D' is filter-wise pruning).
pruners: filter_pruner: class: 'L1RankedStructureParameterPruner' reg_regims: 'module.layer1.0.conv1.weight': [0.6, '3D'] 'module.layer1.1.conv1.weight': [0.6, '3D'] 'module.layer1.2.conv1.weight': [0.6, '3D'] 'module.layer1.3.conv1.weight': [0.6, '3D'] In the policy, we specify that we want to invoke this pruner once, at epoch 180. Because we are starting from a network which was trained for 180 epochs (see Baseline training below), the filter ranking is performed right at the outset of this schedule. policies: - pruner: instance_name: filter_pruner epochs: [180] Following the pruning, we want to \"physically\" remove the pruned filters from the network, which involves reconfiguring the Convolutional layers and the parameter tensors. When we remove filters from Convolution layer n we need to perform several changes to the network: 1. Shrink layer n 's weights tensor, leaving only the \"important\" filters. 2. Configure layer n 's .out_channels member to its new, smaller value. 3. If a BN layer follows layer n , then it also needs to be reconfigured and its scale and shift parameter vectors need to be shrunk. 4. If a Convolution layer follows the BN layer, then it will have fewer input channels, which requires reconfiguration and shrinking of its weights. All of this is performed by distiller.ResnetCifarFilterRemover, which is also scheduled at epoch 180. We call this process \"network thinning\". extensions: net_thinner: class: 'FilterRemover' thinning_func_str: remove_filters arch: 'resnet56_cifar' dataset: 'cifar10' Network thinning requires us to understand the layer connectivity and data-dependency of the DNN, and we are working on a robust method to perform this. On networks with topologies similar to ResNet (residuals) and GoogLeNet (inception), which have several inputs and outputs to/from Convolution layers, there are extra details to consider. Our current implementation is specific to certain layers in ResNet and is a bit fragile. We will continue to improve and generalize this. Baseline training We started by training the baseline ResNet56-Cifar dense network (180 epochs) since we didn't have a pre-trained model. Distiller schedule: distiller/examples/pruning_filters_for_efficient_convnets/resnet56_cifar_baseline_training.yaml Checkpoint files: checkpoint.resnet56_cifar_baseline.pth.tar Results We trained a ResNet56-Cifar10 network and achieved accuracy results which are on par with published results: Top1: 92.970 and Top5: 99.740. We used Hao Li et al.'s algorithm to remove 37.3% of the original convolution MACs, while maintaining virtually the same accuracy as the baseline: Top1: 92.830 and Top5: 99.760","title":"Model Zoo"},{"location":"model_zoo.html#distiller-model-zoo","text":"","title":"Distiller Model Zoo"},{"location":"model_zoo.html#how-to-contribute-models-to-the-model-zoo","text":"We encourage you to contribute new models to the Model Zoo. We welcome implementations of published papers or of your own work. To assure that models and algorithms shared with others are high-quality, please commit your models with the following: Command-line arguments Log files PyTorch model","title":"How to contribute models to the Model Zoo"},{"location":"model_zoo.html#contents","text":"The Distiller model zoo is not a \"traditional\" model-zoo, because it does not necessarily contain best-in-class compressed models. Instead, the model-zoo contains a number of deep learning models that have been compressed using Distiller following some well-known research papers.
These are meant to serve as examples of how Distiller can be used. Each model contains a Distiller schedule detailing how the model was compressed, a PyTorch checkpoint, text logs and TensorBoard logs. Paper Dataset Network Method Granularity Schedule Features Learning both Weights and Connections for Efficient Neural Networks ImageNet Alexnet Element-wise pruning Iterative; Manual Magnitude thresholding based on a sensitivity quantifier. Element-wise sparsity sensitivity analysis To prune, or not to prune: exploring the efficacy of pruning for model compression ImageNet MobileNet Element-wise pruning Automated gradual; Iterative Magnitude thresholding based on target level Learning Structured Sparsity in Deep Neural Networks CIFAR10 ResNet20 Group regularization 1. Train with group-lasso 2. Remove zero groups and fine-tune Group Lasso regularization. Groups: kernels (2D), channels, filters (3D), layers (4D), vectors (rows, cols) Pruning Filters for Efficient ConvNets CIFAR10 ResNet56 Filter ranking; guided by sensitivity analysis 1. Rank filters 2. Remove filters and channels 3. Fine-tune One-shot ranking and pruning of filters; with network thinning","title":"Contents"},{"location":"model_zoo.html#learning-both-weights-and-connections-for-efficient-neural-networks","text":"This schedule is an example of \"Iterative Pruning\" for Alexnet/ImageNet, as described in chapter 3 of Song Han's PhD dissertation: Efficient Methods and Hardware for Deep Learning and in his paper Learning both Weights and Connections for Efficient Neural Networks . The Distiller schedule uses SensitivityPruner which is similar to MagnitudeParameterPruner, but instead of specifying \"raw\" thresholds, it uses a \"sensitivity parameter\". Song Han's paper says that \"the pruning threshold is chosen as a quality parameter multiplied by the standard deviation of a layer's weights,\" and this is not explained much further. In Distiller, the \"quality parameter\" is referred to as \"sensitivity\" and is based on the values learned from performing sensitivity analysis. Using a parameter that is related to the standard deviation is very helpful: under the assumption that the weights tensors are distributed normally, the standard deviation acts as a threshold normalizer. Note that Distiller's implementation deviates slightly from the algorithm Song Han describes in his PhD dissertation, in that the threshold value is set only once. In his PhD dissertation, Song Han describes a threshold that grows at each iteration. This requires n+1 hyper-parameters (n being the number of pruning iterations we use): the threshold and the threshold increase (delta) at each pruning iteration. Distiller's implementation takes advantage of the fact that as pruning progresses, more weights are pulled toward zero, and therefore the threshold \"traps\" more weights. Thus, we can use fewer hyper-parameters and achieve the same results. Distiller schedule: distiller/examples/sensitivity-pruning/alexnet.schedule_sensitivity.yaml Checkpoint file: alexnet.checkpoint.89.pth.tar","title":"Learning both Weights and Connections for Efficient Neural Networks"},{"location":"model_zoo.html#results","text":"Our reference is TorchVision's pretrained Alexnet model which has a Top1 accuracy of 56.55 and Top5=79.09. We prune away 88.44% of the parameters and achieve Top1=56.61 and Top5=79.45. Song Han prunes 89% of the parameters, which is slightly better than our results.
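To make the thresholding rule above concrete, here is a minimal sketch of the idea in PyTorch. It is an illustration only, not Distiller's actual SensitivityPruner (which manages pruning masks and is driven by the scheduler); the function name is ours.

```python
import torch

def sensitivity_prune(weights: torch.Tensor, sensitivity: float) -> torch.Tensor:
    """Zero all elements whose magnitude falls below sensitivity * std(weights)."""
    threshold = sensitivity * weights.std()
    mask = (weights.abs() > threshold).float()
    return weights * mask

# Example: prune a random "layer" with sensitivity 0.5.
w = torch.randn(64, 3, 11, 11)
w_pruned = sensitivity_prune(w, sensitivity=0.5)
print(f"sparsity: {(w_pruned == 0).float().mean().item():.2%}")  # ~38% for normally-distributed weights
```

Because the threshold is expressed in units of the tensor's standard deviation, the same sensitivity value induces comparable sparsity levels across layers with very different weight magnitudes.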
Parameters: +----+---------------------------+------------------+---------------+----------------+------------+------------+----------+----------+----------+------------+---------+----------+------------+ | | Name | Shape | NNZ (dense) | NNZ (sparse) | Cols (%) | Rows (%) | Ch (%) | 2D (%) | 3D (%) | Fine (%) | Std | Mean | Abs-Mean |----+---------------------------+------------------+---------------+----------------+------------+------------+----------+----------+----------+------------+---------+----------+------------| | 0 | features.module.0.weight | (64, 3, 11, 11) | 23232 | 13411 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 42.27359 | 0.14391 | -0.00002 | 0.08805 | | 1 | features.module.3.weight | (192, 64, 5, 5) | 307200 | 115560 | 0.00000 | 0.00000 | 0.00000 | 1.91243 | 0.00000 | 62.38281 | 0.04703 | -0.00250 | 0.02289 | | 2 | features.module.6.weight | (384, 192, 3, 3) | 663552 | 256565 | 0.00000 | 0.00000 | 0.00000 | 6.18490 | 0.00000 | 61.33445 | 0.03354 | -0.00184 | 0.01803 | | 3 | features.module.8.weight | (256, 384, 3, 3) | 884736 | 315065 | 0.00000 | 0.00000 | 0.00000 | 6.96411 | 0.00000 | 64.38881 | 0.02646 | -0.00168 | 0.01422 | | 4 | features.module.10.weight | (256, 256, 3, 3) | 589824 | 186938 | 0.00000 | 0.00000 | 0.00000 | 15.49225 | 0.00000 | 68.30614 | 0.02714 | -0.00246 | 0.01409 | | 5 | classifier.1.weight | (4096, 9216) | 37748736 | 3398881 | 0.00000 | 0.21973 | 0.00000 | 0.21973 | 0.00000 | 90.99604 | 0.00589 | -0.00020 | 0.00168 | | 6 | classifier.4.weight | (4096, 4096) | 16777216 | 1782769 | 0.21973 | 3.46680 | 0.00000 | 3.46680 | 0.00000 | 89.37387 | 0.00849 | -0.00066 | 0.00263 | | 7 | classifier.6.weight | (1000, 4096) | 4096000 | 994738 | 3.36914 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 75.71440 | 0.01718 | 0.00030 | 0.00778 | | 8 | Total sparsity: | - | 61090496 | 7063928 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 88.43694 | 0.00000 | 0.00000 | 0.00000 | +----+---------------------------+------------------+---------------+----------------+------------+------------+----------+----------+----------+------------+---------+----------+------------+ 2018-04-04 21:30:52,499 - Total sparsity: 88.44 2018-04-04 21:30:52,499 - --- validate (epoch=89)----------- 2018-04-04 21:30:52,499 - 128116 samples (256 per mini-batch) 2018-04-04 21:31:35,357 - == Top1: 51.838 Top5: 74.817 Loss: 2.150 2018-04-04 21:31:39,251 - --- test --------------------- 2018-04-04 21:31:39,252 - 50000 samples (256 per mini-batch) 2018-04-04 21:32:01,274 - == Top1: 56.606 Top5: 79.446 Loss: 1.893","title":"Results"},{"location":"model_zoo.html#to-prune-or-not-to-prune-exploring-the-efficacy-of-pruning-for-model-compression","text":"In their paper Zhu and Gupta, \"compare the accuracy of large, but pruned models (large-sparse) and their smaller, but dense (small-dense) counterparts with identical memory footprint.\" They also \"propose a new gradual pruning technique that is simple and straightforward to apply across a variety of models/datasets with minimal tuning.\" This pruning schedule is implemented by distiller.AutomatedGradualPruner, which increases the sparsity level (expressed as a percentage of zero-valued elements) gradually over several pruning steps. Distiller's implementation only prunes elements once in an epoch (the model is fine-tuned in between pruning events), which is a small deviation from Zhu and Gupta's paper. The research paper specifies the schedule in terms of mini-batches, while our implementation specifies the schedule in terms of epochs. 
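For reference, the gradual schedule in Zhu and Gupta's paper raises the sparsity from an initial value to the final target along a cubic polynomial. Below is a small sketch of that schedule, expressed in epochs to match Distiller's deviation noted above; the function name and the example values are ours.

```python
def agp_target_sparsity(epoch, start_epoch, end_epoch, initial_sparsity, final_sparsity):
    """Polynomial sparsity schedule from Zhu & Gupta:
    s_t = s_f + (s_i - s_f) * (1 - progress)**3."""
    if epoch <= start_epoch:
        return initial_sparsity
    if epoch >= end_epoch:
        return final_sparsity
    progress = (epoch - start_epoch) / (end_epoch - start_epoch)
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - progress) ** 3

# Ramping from 0% to 60% sparsity between epochs 0 and 30:
for e in (0, 10, 20, 30):
    print(e, round(agp_target_sparsity(e, 0, 30, 0.0, 0.60), 3))  # 0.0, 0.422, 0.578, 0.6
```

The cubic shape prunes aggressively early on, when the network still has plenty of redundancy, and slows down as the target sparsity is approached.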
We feel that using epochs performs well and is more \"stable\", since the number of mini-batches changes when you change the batch size. ImageNet files: Distiller schedule: distiller/examples/agp-pruning/mobilenet.imagenet.schedule_agp.yaml Checkpoint file: checkpoint.pth.tar ResNet18 files: Distiller schedule: distiller/examples/agp-pruning/resnet18.schedule_agp.yaml Checkpoint file: checkpoint.pth.tar","title":"To prune, or not to prune: exploring the efficacy of pruning for model compression"},{"location":"model_zoo.html#results_1","text":"As our baseline we used a pretrained PyTorch MobileNet model (width=1) which has Top1=68.848 and Top5=88.740. In their paper, Zhu and Gupta prune 50% of the elements of MobileNet (width=1) with a 1.1% drop in accuracy. We pruned about 51.6% of the elements, with virtually no change in accuracy (Top1: 68.808 and Top5: 88.656). We didn't try to prune more than this, but we do note that the baseline accuracy that we used is almost 2% lower than the accuracy published in the paper. +----+--------------------------+--------------------+---------------+----------------+------------+------------+----------+----------+----------+------------+---------+----------+------------+ | | Name | Shape | NNZ (dense) | NNZ (sparse) | Cols (%) | Rows (%) | Ch (%) | 2D (%) | 3D (%) | Fine (%) | Std | Mean | Abs-Mean | |----+--------------------------+--------------------+---------------+----------------+------------+------------+----------+----------+----------+------------+---------+----------+------------| | 0 | module.model.0.0.weight | (32, 3, 3, 3) | 864 | 864 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.14466 | 0.00103 | 0.06508 | | 1 | module.model.1.0.weight | (32, 1, 3, 3) | 288 | 288 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.32146 | 0.01020 | 0.12932 | | 2 | module.model.1.3.weight | (64, 32, 1, 1) | 2048 | 2048 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.11942 | 0.00024 | 0.03627 | | 3 | module.model.2.0.weight | (64, 1, 3, 3) | 576 | 576 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.15809 | 0.00543 | 0.11513 | | 4 | module.model.2.3.weight | (128, 64, 1, 1) | 8192 | 8192 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.08442 | -0.00031 | 0.04182 | | 5 | module.model.3.0.weight | (128, 1, 3, 3) | 1152 | 1152 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.16780 | 0.00125 | 0.10545 | | 6 | module.model.3.3.weight | (128, 128, 1, 1) | 16384 | 16384 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.07126 | -0.00197 | 0.04123 | | 7 | module.model.4.0.weight | (128, 1, 3, 3) | 1152 | 1152 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.10182 | 0.00171 | 0.08719 | | 8 | module.model.4.3.weight | (256, 128, 1, 1) | 32768 | 13108 | 0.00000 | 0.00000 | 10.15625 | 59.99756 | 12.50000 | 59.99756 | 0.05543 | -0.00002 | 0.02760 | | 9 | module.model.5.0.weight | (256, 1, 3, 3) | 2304 | 2304 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.12516 | -0.00288 | 0.08058 | | 10 | module.model.5.3.weight | (256, 256, 1, 1) | 65536 | 26215 | 0.00000 | 0.00000 | 12.50000 | 59.99908 | 23.82812 | 59.99908 | 0.04453 | 0.00002 | 0.02271 | | 11 | module.model.6.0.weight | (256, 1, 3, 3) | 2304 | 2304 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.08024 | 0.00252 | 0.06377 | | 12 | module.model.6.3.weight | (512, 256, 1, 1) | 131072 | 52429 | 0.00000 | 0.00000 | 23.82812 | 59.99985 | 14.25781 | 59.99985 | 
0.03561 | -0.00057 | 0.01779 | | 13 | module.model.7.0.weight | (512, 1, 3, 3) | 4608 | 4608 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.11008 | -0.00018 | 0.06829 | | 14 | module.model.7.3.weight | (512, 512, 1, 1) | 262144 | 104858 | 0.00000 | 0.00000 | 14.25781 | 59.99985 | 21.28906 | 59.99985 | 0.02944 | -0.00060 | 0.01515 | | 15 | module.model.8.0.weight | (512, 1, 3, 3) | 4608 | 4608 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.08258 | 0.00370 | 0.04905 | | 16 | module.model.8.3.weight | (512, 512, 1, 1) | 262144 | 104858 | 0.00000 | 0.00000 | 21.28906 | 59.99985 | 28.51562 | 59.99985 | 0.02865 | -0.00046 | 0.01465 | | 17 | module.model.9.0.weight | (512, 1, 3, 3) | 4608 | 4608 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.07578 | 0.00468 | 0.04201 | | 18 | module.model.9.3.weight | (512, 512, 1, 1) | 262144 | 104858 | 0.00000 | 0.00000 | 28.51562 | 59.99985 | 23.43750 | 59.99985 | 0.02939 | -0.00044 | 0.01511 | | 19 | module.model.10.0.weight | (512, 1, 3, 3) | 4608 | 4608 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.07091 | 0.00014 | 0.04306 | | 20 | module.model.10.3.weight | (512, 512, 1, 1) | 262144 | 104858 | 0.00000 | 0.00000 | 24.60938 | 59.99985 | 20.89844 | 59.99985 | 0.03095 | -0.00059 | 0.01672 | | 21 | module.model.11.0.weight | (512, 1, 3, 3) | 4608 | 4608 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.05729 | -0.00518 | 0.04267 | | 22 | module.model.11.3.weight | (512, 512, 1, 1) | 262144 | 104858 | 0.00000 | 0.00000 | 20.89844 | 59.99985 | 17.57812 | 59.99985 | 0.03229 | -0.00044 | 0.01797 | | 23 | module.model.12.0.weight | (512, 1, 3, 3) | 4608 | 4608 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.04981 | -0.00136 | 0.03967 | | 24 | module.model.12.3.weight | (1024, 512, 1, 1) | 524288 | 209716 | 0.00000 | 0.00000 | 16.01562 | 59.99985 | 44.23828 | 59.99985 | 0.02514 | -0.00106 | 0.01278 | | 25 | module.model.13.0.weight | (1024, 1, 3, 3) | 9216 | 9216 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.02396 | -0.00949 | 0.01549 | | 26 | module.model.13.3.weight | (1024, 1024, 1, 1) | 1048576 | 419431 | 0.00000 | 0.00000 | 44.72656 | 59.99994 | 1.46484 | 59.99994 | 0.01801 | -0.00017 | 0.00931 | | 27 | module.fc.weight | (1000, 1024) | 1024000 | 409600 | 1.46484 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 60.00000 | 0.05078 | 0.00271 | 0.02734 | | 28 | Total sparsity: | - | 4209088 | 1726917 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 58.97171 | 0.00000 | 0.00000 | 0.00000 | +----+--------------------------+--------------------+---------------+----------------+------------+------------+----------+----------+----------+------------+---------+----------+------------+ Total sparsity: 58.97 --- validate (epoch=199)----------- 128116 samples (256 per mini-batch) == Top1: 65.337 Top5: 84.984 Loss: 1.494 --- test --------------------- 50000 samples (256 per mini-batch) == Top1: 68.810 Top5: 88.626 Loss: 1.282","title":"Results"},{"location":"model_zoo.html#learning-structured-sparsity-in-deep-neural-networks","text":"This research paper from the University of Pittsburgh, \"proposes a Structured Sparsity Learning (SSL) method to regularize the structures (i.e., filters, channels, filter shapes, and layer depth) of DNNs. 
SSL can: (1) learn a compact structure from a bigger DNN to reduce computation cost; (2) obtain a hardware-friendly structured sparsity of DNN to efficiently accelerate the DNN\u2019s evaluation.\" Note that this paper does not use pruning, but instead uses group regularization during the training to force weights towards zero, as a group. We used a schedule which thresholds the regularized elements at a magnitude equal to the regularization strength. At the end of the regularization phase, we save the final sparsity masks generated by the regularization, and exit. Then we load this regularized model, remove the layers whose weight tensors were zeroed (that is, layers in which every element is zero), and fine-tune the remaining weights.","title":"Learning Structured Sparsity in Deep Neural Networks"},{"location":"model_zoo.html#baseline-training","text":"We started by training the baseline ResNet20-Cifar dense network since we didn't have a pre-trained model. Distiller schedule: distiller/examples/ssl/resnet20_cifar_baseline_training.yaml Checkpoint files: distiller/examples/ssl/checkpoints/ $ time python3 compress_classifier.py --arch resnet20_cifar ../data.cifar10 -p=50 --lr=0.3 --epochs=180 --compress=../cifar10/resnet20/baseline_training.yaml -j=1 --deterministic","title":"Baseline training"},{"location":"model_zoo.html#regularization","text":"Then we started training from scratch again, but this time we used Group Lasso regularization on entire layers: Distiller schedule: distiller/examples/ssl/ssl_4D-removal_4L_training.yaml $ time python3 compress_classifier.py --arch resnet20_cifar ../data.cifar10 -p=50 --lr=0.4 --epochs=180 --compress=../ssl/ssl_4D-removal_training.yaml -j=1 --deterministic The diagram below shows the training of Resnet20/CIFAR10 using Group Lasso regularization on entire layers (in blue) vs. training the Resnet20/CIFAR10 baseline (in red). You may notice several interesting things: 1. The LR-decay policy is the same, but the two sessions start with different initial LR values. 2. The data-loss of the regularized training follows the same shape as the un-regularized training (baseline), and eventually the two seem to merge. 3. We see similar behavior in the validation Top1 and Top5 accuracy results, but the regularized training eventually performs better. 4. In the top right corner we see the behavior of the regularization loss ( Reg Loss ), which actually increases for some time, until the data-loss has a sharp drop (after ~16K mini-batches), at which point the regularization loss also starts dropping. This regularization yields 5 layers with zeroed weight tensors. We load this model, remove the 5 layers, and start fine-tuning the weights. This process of layer removal is specific to ResNet for CIFAR, which we altered by adding code to skip over layers during the forward path. When you export to ONNX, the removed layers do not participate in the forward path, so they don't get instantiated. We managed to remove 5 of the 16 3x3 convolution layers which dominate the computation time. It's not bad, but we probably could have done better.","title":"Regularization"},{"location":"model_zoo.html#fine-tuning","text":"During the fine-tuning process, because the removed layers do not participate in the forward path, they do not appear in the backward path and are not back-propagated: therefore they are completely disconnected from the network. We copy the checkpoint file of the regularized model to checkpoint_trained_4D_regularized_5Lremoved.pth.tar .
Distiller schedule: distiller/examples/ssl/ssl_4D-removal_finetuning.yaml $ time python3 compress_classifier.py --arch resnet20_cifar ../data.cifar10 -p=50 --lr=0.1 --epochs=250 --resume=../cifar10/resnet20/checkpoint_trained_4D_regularized_5Lremoved.pth.tar --compress=../ssl/ssl_4D-removal_finetuning.yaml -j=1 --deterministic","title":"Fine-tuning"},{"location":"model_zoo.html#results_2","text":"Our baseline results for ResNet20 Cifar are: Top1=91.450 and Top5=99.750 We used Distiller's GroupLassoRegularizer to remove 5 layers from Resnet20 (CIFAR10) with virtually no degradation in accuracy. The regularized model exhibits really poor classification abilities: $ time python3 compress_classifier.py --arch resnet20_cifar ../data.cifar10 -p=50 --resume=../cifar10/resnet20/checkpoint_trained_4D_regularized_5Lremoved.pth.tar --evaluate = loading checkpoint ../cifar10/resnet20/checkpoint_trained_4D_regularized_5Lremoved.pth.tar best top@1: 90.620 Loaded compression schedule from checkpoint (epoch 179) Removing layer: module.layer1.0.conv1 [layer=0 block=0 conv=0] Removing layer: module.layer1.0.conv2 [layer=0 block=0 conv=1] Removing layer: module.layer1.1.conv1 [layer=0 block=1 conv=0] Removing layer: module.layer1.1.conv2 [layer=0 block=1 conv=1] Removing layer: module.layer2.2.conv2 [layer=1 block=2 conv=1] Files already downloaded and verified Files already downloaded and verified Dataset sizes: training=45000 validation=5000 test=10000 --- test --------------------- 10000 samples (256 per mini-batch) == Top1: 22.290 Top5: 68.940 Loss: 5.172 However, after fine-tuning, we recovered most of the accuracy loss, but not quite all of it: Top1=91.020 and Top5=99.670 We didn't spend time trying to wrestle with this network, and therefore didn't achieve SSL's published results (which showed that they managed to remove 6 layers and at the same time increase accuracy).","title":"Results"},{"location":"model_zoo.html#pruning-filters-for-efficient-convnets","text":"Quoting the authors directly: We present an acceleration method for CNNs, where we prune filters from CNNs that are identified as having a small effect on the output accuracy. By removing whole filters in the network together with their connecting feature maps, the computation costs are reduced significantly. In contrast to pruning weights, this approach does not result in sparse connectivity patterns. Hence, it does not need the support of sparse convolution libraries and can work with existing efficient BLAS libraries for dense matrix multiplications. The implementation of the research by Hao Li et al. required us to add filter-pruning sensitivity analysis and support for \"network thinning\". After performing filter-pruning sensitivity analysis to assess which layers are more sensitive to the pruning of filters, we execute distiller.L1RankedStructureParameterPruner once in order to rank the filters of each layer by their L1-norm values, and then we prune to the sparsity level prescribed by the schedule. Distiller schedule: distiller/examples/pruning_filters_for_efficient_convnets/resnet56_cifar_filter_rank.yaml Checkpoint files: checkpoint_finetuned.pth.tar The excerpt from the schedule, displayed below, shows how we declare the L1RankedStructureParameterPruner. This class currently ranks filters only, but because in the future it may support ranking of other structures as well, you need to specify, for each parameter, both the target sparsity level and the structure type ('3D' is filter-wise pruning).
pruners: filter_pruner: class: 'L1RankedStructureParameterPruner' reg_regims: 'module.layer1.0.conv1.weight': [0.6, '3D'] 'module.layer1.1.conv1.weight': [0.6, '3D'] 'module.layer1.2.conv1.weight': [0.6, '3D'] 'module.layer1.3.conv1.weight': [0.6, '3D'] In the policy, we specify that we want to invoke this pruner once, at epoch 180. Because we are starting from a network which was trained for 180 epochs (see Baseline training below), the filter ranking is performed right at the outset of this schedule. policies: - pruner: instance_name: filter_pruner epochs: [180] Following the pruning, we want to \"physically\" remove the pruned filters from the network, which involves reconfiguring the Convolutional layers and the parameter tensors. When we remove filters from Convolution layer n we need to perform several changes to the network: 1. Shrink layer n 's weights tensor, leaving only the \"important\" filters. 2. Configure layer n 's .out_channels member to its new, smaller value. 3. If a BN layer follows layer n , then it also needs to be reconfigured and its scale and shift parameter vectors need to be shrunk. 4. If a Convolution layer follows the BN layer, then it will have fewer input channels, which requires reconfiguration and shrinking of its weights. All of this is performed by distiller.ResnetCifarFilterRemover, which is also scheduled at epoch 180. We call this process \"network thinning\". extensions: net_thinner: class: 'FilterRemover' thinning_func_str: remove_filters arch: 'resnet56_cifar' dataset: 'cifar10' Network thinning requires us to understand the layer connectivity and data-dependency of the DNN, and we are working on a robust method to perform this. On networks with topologies similar to ResNet (residuals) and GoogLeNet (inception), which have several inputs and outputs to/from Convolution layers, there are extra details to consider. Our current implementation is specific to certain layers in ResNet and is a bit fragile. We will continue to improve and generalize this.","title":"Pruning Filters for Efficient ConvNets"},{"location":"model_zoo.html#baseline-training_1","text":"We started by training the baseline ResNet56-Cifar dense network (180 epochs) since we didn't have a pre-trained model. Distiller schedule: distiller/examples/pruning_filters_for_efficient_convnets/resnet56_cifar_baseline_training.yaml Checkpoint files: checkpoint.resnet56_cifar_baseline.pth.tar","title":"Baseline training"},{"location":"model_zoo.html#results_3","text":"We trained a ResNet56-Cifar10 network and achieved accuracy results which are on par with published results: Top1: 92.970 and Top5: 99.740. We used Hao Li et al.'s algorithm to remove 37.3% of the original convolution MACs, while maintaining virtually the same accuracy as the baseline: Top1: 92.830 and Top5: 99.760","title":"Results"},{"location":"pruning.html","text":"Pruning A common methodology for inducing sparsity in weights and activations is called pruning . Pruning is the application of a binary criterion to decide which weights to prune: weights which match the pruning criterion are assigned a value of zero. Pruned elements are \"trimmed\" from the model: we zero their values and also make sure they don't take part in the back-propagation process. We can prune weights, biases, and activations. Biases are few and their contribution to a layer's output is relatively large, so there is little incentive to prune them.
We usually see sparse activations following a ReLU layer, because ReLU quenches negative activations to exact zero (\\(ReLU(x) = max(0,x)\\)). Sparsity in weights is less common, as weights tend to be very small, but are often not exact zeros. Let's define sparsity Sparsity is a measure of how many elements in a tensor are exact zeros, relative to the tensor size. A tensor is considered sparse if \"most\" of its elements are zero. How much is \"most\" is not strictly defined, but when you see a sparse tensor you know it ;-) The \\(l_0\\)-\"norm\" function measures how many non-zero elements are in a tensor x : \\[\\lVert x \\rVert_0\\;=\\;|x_1|^0 + |x_2|^0 + ... + |x_n|^0 \\] In other words, an element contributes either a value of 1 or 0 to \\(l_0\\). Anything but an exact zero contributes a value of 1 - that's pretty cool. Sometimes it helps to think about density, sparsity's complement, which corresponds to the number of non-zero elements (NNZ): \\[ density = 1 - sparsity \\] You can use distiller.sparsity and distiller.density to query a PyTorch tensor's sparsity and density. What is weights pruning? Weights pruning, or model pruning, is a set of methods to increase the sparsity (amount of zero-valued elements in a tensor) of a network's weights. In general, the term 'parameters' refers to both weights and bias tensors of a model. Biases are rarely, if ever, pruned because there are very few bias elements compared to weights elements, and it is just not worth the trouble. Pruning requires a criterion for choosing which elements to prune - this is called the pruning criterion . The most common pruning criterion is the absolute value of each element: the element's absolute value is compared to some threshold value, and if it is below the threshold the element is set to zero (i.e. pruned). This is implemented by the distiller.MagnitudeParameterPruner class. The idea behind this method is that weights with small \\(l_1\\)-norms (absolute value) contribute little to the final result (low saliency), so they are less important and can be removed. A related idea motivating pruning is that models are over-parametrized and contain redundant logic and features. Therefore, some of these redundancies can be removed by setting their weights to zero. And yet another way to think of pruning is to phrase it as a search for a set of weights with as many zeros as possible, which still produces acceptable inference accuracies compared to the dense-model (non-pruned model). Another way to look at it is to imagine that because of the very high-dimensionality of the parameter space, the immediate space around the dense-model's solution likely contains some sparse solutions, and we want to find these sparse solutions. Pruning schedule The most straightforward way to prune is to take a trained model and prune it once; this is also called one-shot pruning . In Learning both Weights and Connections for Efficient Neural Networks Song Han et al. show that this is surprisingly effective, but also leaves a lot of potential sparsity untapped. The surprise is what they call the \"free lunch\" effect: \"reducing 2x the connections without losing accuracy even without retraining.\" However, they also note that when employing a pruning-followed-by-retraining regimen, they can achieve much better results (higher sparsity at no accuracy loss). This is called iterative pruning , and the retraining that follows pruning is often referred to as fine-tuning .
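Schematically, iterative pruning alternates between pruning and retraining. The sketch below is a toy illustration of that loop, not Distiller's scheduler; the model is a stand-in, and for brevity it omits both the fine-tuning step and the masking that keeps pruned weights at zero while retraining.

```python
import torch
import torch.nn as nn

def magnitude_prune_(weights: torch.Tensor, sparsity: float) -> None:
    """In-place magnitude pruning of a tensor to a target sparsity level."""
    k = int(sparsity * weights.numel())
    if k > 0:
        threshold = weights.abs().flatten().kthvalue(k).values
        weights.mul_((weights.abs() > threshold).float())

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), nn.Conv2d(8, 8, 3))  # toy stand-in

for target in (0.3, 0.5, 0.7):          # prune more weights at each iteration
    with torch.no_grad():
        for name, param in model.named_parameters():
            if name.endswith("weight"):
                magnitude_prune_(param, target)
    # ... fine-tune for a few epochs here, so the remaining weights can "recover" ...
```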
How the pruning criterion changes between iterations, how many iterations we perform and how often, and which tensors are pruned - this is collectively called the pruning schedule . We can think of iterative pruning as repeatedly learning which weights are important, removing the least important ones based on some importance criterion, and then retraining the model to let it \"recover\" from the pruning by adjusting the remaining weights. At each iteration, we prune more weights. The decision of when to stop pruning is also expressed in the schedule, and it depends on the pruning algorithm. For example, if we are trying to achieve a specific sparsity level, then we stop when the pruning achieves that level. And if we are pruning weight structures in order to reduce the required compute budget, then we stop the pruning when this compute reduction is achieved. Distiller supports expressing the pruning schedule as a YAML file (which is then executed by an instance of a PruningScheduler). Pruning granularity Pruning individual weight elements is called element-wise pruning , and it is also sometimes referred to as fine-grained pruning. Coarse-grained pruning - also referred to as structured pruning , group pruning , or block pruning - is pruning entire groups of elements which have some significance. Groups come in various shapes and sizes, but an easy-to-visualize example of group pruning is filter pruning, in which entire filters are removed. Sensitivity analysis The hard part about inducing sparsity via pruning is determining what threshold, or sparsity level, to use for each layer's tensors. Sensitivity analysis is a method that tries to help us rank the tensors by their sensitivity to pruning. The idea is to set the pruning level (percentage) of a specific layer, and then to prune once, run an evaluation on the test dataset and record the accuracy score. We do this for all of the parameterized layers, and for each layer we examine several sparsity levels. This should teach us about the \"sensitivity\" of each of the layers to pruning. The evaluated model should be trained to maximum accuracy before running the analysis, because we aim to understand the behavior of the trained model's performance in relation to pruning of a specific weights tensor. Much as we can prune structures, we can also perform sensitivity analysis on structures. Distiller implements element-wise pruning sensitivity analysis using the \\(l_1\\)-norm of individual elements; and filter-wise pruning sensitivity analysis using the mean \\(l_1\\)-norm of filters. The authors of Pruning Filters for Efficient ConvNets describe how they do sensitivity analysis: \"To understand the sensitivity of each layer, we prune each layer independently and evaluate the resulting pruned network\u2019s accuracy on the validation set. Figure 2(b) shows that layers that maintain their accuracy as filters are pruned away correspond to layers with larger slopes in Figure 2(a). On the contrary, layers with relatively flat slopes are more sensitive to pruning. We empirically determine the number of filters to prune for each layer based on their sensitivity to pruning. For deep networks such as VGG-16 or ResNets, we observe that layers in the same stage (with the same feature map size) have a similar sensitivity to pruning. To avoid introducing layer-wise meta-parameters, we use the same pruning ratio for all layers in the same stage.
For layers that are sensitive to pruning, we prune a smaller percentage of these layers or completely skip pruning them.\" The diagram below shows the results of running an element-wise sensitivity analysis on Alexnet, using Distiller's perform_sensitivity_analysis utility function. As reported by Song Han, and exhibited in the diagram, in Alexnet the feature-detecting layers (convolution layers) are more sensitive to pruning, and their sensitivity drops the deeper they are. The fully-connected layers are much less sensitive, which is great, because that's where most of the parameters are. References Song Han, Jeff Pool, John Tran, William J. Dally . Learning both Weights and Connections for Efficient Neural Networks , arXiv:1607.04381v2, 2015. Hao Li, Asim Kadav, Igor Durdanovic, Hanan Samet, Hans Peter Graf . Pruning Filters for Efficient ConvNets , arXiv:1608.08710v3, 2017.","title":"Pruning"},{"location":"pruning.html#pruning","text":"A common methodology for inducing sparsity in weights and activations is called pruning . Pruning is the application of a binary criterion to decide which weights to prune: weights which match the pruning criterion are assigned a value of zero. Pruned elements are \"trimmed\" from the model: we zero their values and also make sure they don't take part in the back-propagation process. We can prune weights, biases, and activations. Biases are few and their contribution to a layer's output is relatively large, so there is little incentive to prune them. We usually see sparse activations following a ReLU layer, because ReLU quenches negative activations to exact zero (\\(ReLU(x) = max(0,x)\\)). Sparsity in weights is less common, as weights tend to be very small, but are often not exact zeros.","title":"Pruning"},{"location":"pruning.html#lets-define-sparsity","text":"Sparsity is a measure of how many elements in a tensor are exact zeros, relative to the tensor size. A tensor is considered sparse if \"most\" of its elements are zero. How much is \"most\" is not strictly defined, but when you see a sparse tensor you know it ;-) The \\(l_0\\)-\"norm\" function measures how many non-zero elements are in a tensor x : \\[\\lVert x \\rVert_0\\;=\\;|x_1|^0 + |x_2|^0 + ... + |x_n|^0 \\] In other words, an element contributes either a value of 1 or 0 to \\(l_0\\). Anything but an exact zero contributes a value of 1 - that's pretty cool. Sometimes it helps to think about density, sparsity's complement, which corresponds to the number of non-zero elements (NNZ): \\[ density = 1 - sparsity \\] You can use distiller.sparsity and distiller.density to query a PyTorch tensor's sparsity and density.","title":"Let's define sparsity"},{"location":"pruning.html#what-is-weights-pruning","text":"Weights pruning, or model pruning, is a set of methods to increase the sparsity (amount of zero-valued elements in a tensor) of a network's weights. In general, the term 'parameters' refers to both weights and bias tensors of a model. Biases are rarely, if ever, pruned because there are very few bias elements compared to weights elements, and it is just not worth the trouble. Pruning requires a criterion for choosing which elements to prune - this is called the pruning criterion . The most common pruning criterion is the absolute value of each element: the element's absolute value is compared to some threshold value, and if it is below the threshold the element is set to zero (i.e. pruned). This is implemented by the distiller.MagnitudeParameterPruner class.
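As an aside, the \\(l_0\\)-based definitions above translate directly into a few lines of PyTorch. Here is a sketch along the lines of distiller.sparsity and distiller.density (not their exact implementation):

```python
import torch

def density(t: torch.Tensor) -> float:
    """Fraction of non-zero elements: NNZ / numel."""
    return (t != 0).sum().item() / t.numel()

def sparsity(t: torch.Tensor) -> float:
    """Fraction of exact zeros: 1 - density."""
    return 1.0 - density(t)

t = torch.tensor([0.0, 1.5, 0.0, -0.2])
print(sparsity(t), density(t))  # 0.5 0.5
```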
The idea behind this method is that weights with small \\(l_1\\)-norms (absolute value) contribute little to the final result (low saliency), so they are less important and can be removed. A related idea motivating pruning is that models are over-parametrized and contain redundant logic and features. Therefore, some of these redundancies can be removed by setting their weights to zero. And yet another way to think of pruning is to phrase it as a search for a set of weights with as many zeros as possible, which still produces acceptable inference accuracies compared to the dense-model (non-pruned model). Another way to look at it is to imagine that because of the very high-dimensionality of the parameter space, the immediate space around the dense-model's solution likely contains some sparse solutions, and we want to find these sparse solutions.","title":"What is weights pruning?"},{"location":"pruning.html#pruning-schedule","text":"The most straightforward way to prune is to take a trained model and prune it once; this is also called one-shot pruning . In Learning both Weights and Connections for Efficient Neural Networks Song Han et al. show that this is surprisingly effective, but also leaves a lot of potential sparsity untapped. The surprise is what they call the \"free lunch\" effect: \"reducing 2x the connections without losing accuracy even without retraining.\" However, they also note that when employing a pruning-followed-by-retraining regimen, they can achieve much better results (higher sparsity at no accuracy loss). This is called iterative pruning , and the retraining that follows pruning is often referred to as fine-tuning . How the pruning criterion changes between iterations, how many iterations we perform and how often, and which tensors are pruned - this is collectively called the pruning schedule . We can think of iterative pruning as repeatedly learning which weights are important, removing the least important ones based on some importance criterion, and then retraining the model to let it \"recover\" from the pruning by adjusting the remaining weights. At each iteration, we prune more weights. The decision of when to stop pruning is also expressed in the schedule, and it depends on the pruning algorithm. For example, if we are trying to achieve a specific sparsity level, then we stop when the pruning achieves that level. And if we are pruning weight structures in order to reduce the required compute budget, then we stop the pruning when this compute reduction is achieved. Distiller supports expressing the pruning schedule as a YAML file (which is then executed by an instance of a PruningScheduler).","title":"Pruning schedule"},{"location":"pruning.html#pruning-granularity","text":"Pruning individual weight elements is called element-wise pruning , and it is also sometimes referred to as fine-grained pruning. Coarse-grained pruning - also referred to as structured pruning , group pruning , or block pruning - is pruning entire groups of elements which have some significance. Groups come in various shapes and sizes, but an easy-to-visualize example of group pruning is filter pruning, in which entire filters are removed.","title":"Pruning granularity"},{"location":"pruning.html#sensitivity-analysis","text":"The hard part about inducing sparsity via pruning is determining what threshold, or sparsity level, to use for each layer's tensors. Sensitivity analysis is a method that tries to help us rank the tensors by their sensitivity to pruning.
The idea is to set the pruning level (percentage) of a specific layer, and then to prune once, run an evaluation on the test dataset and record the accuracy score. We do this for all of the parameterized layers, and for each layer we examine several sparsity levels. This should teach us about the \"sensitivity\" of each of the layers to pruning. The evaluated model should be trained to maximum accuracy before running the analysis, because we aim to understand the behavior of the trained model's performance in relation to pruning of a specific weights tensor. Much as we can prune structures, we can also perform sensitivity analysis on structures. Distiller implements element-wise pruning sensitivity analysis using the \\(l_1\\)-norm of individual elements; and filter-wise pruning sensitivity analysis using the mean \\(l_1\\)-norm of filters. The authors of Pruning Filters for Efficient ConvNets describe how they do sensitivity analysis: \"To understand the sensitivity of each layer, we prune each layer independently and evaluate the resulting pruned network\u2019s accuracy on the validation set. Figure 2(b) shows that layers that maintain their accuracy as filters are pruned away correspond to layers with larger slopes in Figure 2(a). On the contrary, layers with relatively flat slopes are more sensitive to pruning. We empirically determine the number of filters to prune for each layer based on their sensitivity to pruning. For deep networks such as VGG-16 or ResNets, we observe that layers in the same stage (with the same feature map size) have a similar sensitivity to pruning. To avoid introducing layer-wise meta-parameters, we use the same pruning ratio for all layers in the same stage. For layers that are sensitive to pruning, we prune a smaller percentage of these layers or completely skip pruning them.\" The diagram below shows the results of running an element-wise sensitivity analysis on Alexnet, using Distiller's perform_sensitivity_analysis utility function. As reported by Song Han, and exhibited in the diagram, in Alexnet the feature-detecting layers (convolution layers) are more sensitive to pruning, and their sensitivity drops the deeper they are. The fully-connected layers are much less sensitive, which is great, because that's where most of the parameters are.","title":"Sensitivity analysis"},{"location":"pruning.html#references","text":"Song Han, Jeff Pool, John Tran, William J. Dally . Learning both Weights and Connections for Efficient Neural Networks , arXiv:1607.04381v2, 2015. Hao Li, Asim Kadav, Igor Durdanovic, Hanan Samet, Hans Peter Graf . Pruning Filters for Efficient ConvNets , arXiv:1608.08710v3, 2017.","title":"References"},{"location":"quantization.html","text":"Quantization Quantization refers to the process of reducing the number of bits that represent a number. In the context of deep learning, the predominant numerical format used for research and for deployment has so far been 32-bit floating point, or FP32. However, the desire for reduced bandwidth and compute requirements of deep learning models has driven research into using lower-precision numerical formats. It has been extensively demonstrated that weights and activations can be represented using 8-bit integers (or INT8) without incurring significant loss in accuracy. The use of even lower bit-widths, such as 4/2/1-bits, is an active field of research that has also shown great progress. Note that this discussion is on quantization only in the context of more efficient inference.
Using lower-precision numerics for more efficient training is currently out of scope. Motivation: Overall Efficiency The more obvious benefit from quantization is significantly reduced bandwidth and storage . For instance, using INT8 for weights and activations consumes 4x less overall bandwidth compared to FP32. Additionally, integer compute is faster than floating-point compute. It is also much more area and energy efficient : INT8 Operation Energy Saving vs FP32 Area Saving vs FP32 Add 30x 116x Multiply 18.5x 27x ( Dally, 2015 ) Note that very aggressive quantization can yield even more efficiency. If weights are binary (-1, 1) or ternary (-1, 0, 1 using 2-bits), then convolution and fully-connected layers can be computed with additions and subtractions only, removing multiplications completely. If activations are binary as well, then additions can also be removed, in favor of bitwise operations ( Rastegari et al., 2016 ). Integer vs. FP32 There are two main attributes when discussing a numerical format. The first is dynamic range , which refers to the range of representable numbers. The second one is how many values can be represented within the dynamic range, which in turn determines the precision / resolution of the format (the distance between two numbers). For all integer formats, the dynamic range is \\([-2^{n-1} .. 2^{n-1}-1]\\), where \\(n\\) is the number of bits. So for INT8 the range is \\([-128 .. 127]\\), and for INT4 it is \\([-8 .. 7]\\) (we're limiting ourselves to signed integers for now). The number of representable values is \\(2^n\\). Contrast that with FP32, where the dynamic range is \\(\\pm 3.4\\ x\\ 10^{38}\\), and approximately \\(4.2\\ x\\ 10^9\\) values can be represented. We can immediately see that FP32 is much more versatile , in that it is able to represent a wide range of distributions accurately. This is a nice property for deep learning models, where the distributions of weights and activations are usually very different (at least in dynamic range). In addition, the dynamic range can differ between layers in the model. In order to be able to represent these different distributions with an integer format, a scale factor is used to map the dynamic range of the tensor to the integer format range. But still we remain with the issue of having a significantly lower number of representable values, that is - much lower resolution. Note that this scale factor is, in most cases, a floating-point number. Hence, even when using integer numerics, some floating-point computations remain. Courbariaux et al., 2014 scale using only shifts, eliminating the floating point operation. In gemmlowp , the FP32 scale factor is approximated using an integer or fixed-point multiplication followed by a shift operation. In many cases the effect of this approximation on accuracy is negligible. Avoiding Overflows Convolution and fully connected layers involve storing intermediate results in accumulators. Due to the limited dynamic range of integer formats, if we used the same bit-width for the weights and activations as for the accumulators, we would likely overflow very quickly. Therefore, accumulators are usually implemented with higher bit-widths. The result of multiplying two \\(n\\)-bit integers is, at most, a \\(2n\\)-bit number. In convolution layers, such multiplications are accumulated \\(c\\cdot k^2\\) times, where \\(c\\) is the number of input channels and \\(k\\) is the kernel width (assuming a square kernel). Hence, to avoid overflowing, the accumulator should be \\(2n + M\\) bits wide, where \\(M\\) is at least \\(log_2(c\\cdot k^2)\\).
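As a worked example of this bound (the layer dimensions below are made up for illustration):

```python
import math

def min_accumulator_bits(n_bits: int, in_channels: int, kernel_size: int) -> int:
    """Minimum accumulator width to avoid overflow: 2n + ceil(log2(c * k^2))."""
    m = math.ceil(math.log2(in_channels * kernel_size ** 2))
    return 2 * n_bits + m

# INT8 weights/activations, 3x3 kernel, 256 input channels:
# log2(256 * 9) ~= 11.17, so M = 12 and the accumulator needs 16 + 12 = 28 bits.
print(min_accumulator_bits(8, 256, 3))  # 28
print(min_accumulator_bits(4, 256, 3))  # 20 -- why INT4 may get away with fewer than 32 bits
```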
In many cases 32-bit accumulators are used; however, for INT4 and lower it might be possible to use fewer than 32 bits, depending on the expected use cases and layer widths. \"Conservative\" Quantization: INT8 In many cases, taking a model trained for FP32 and directly quantizing it to INT8, without any re-training, can result in a relatively low loss of accuracy (which may or may not be acceptable, depending on the use case). Some fine-tuning can further improve the accuracy ( Gysel et al., 2018 ). As mentioned above, a scale factor is used to adapt the dynamic range of the tensor at hand to that of the integer format. This scale factor needs to be calculated per-layer per-tensor. The simplest way is to map the min/max values of the float tensor to the min/max of the integer format. For weights and biases this is easy, as they are set once training is complete. For activations, the min/max float values can be obtained \"online\" during inference, or \"offline\". Offline means gathering activations statistics before deploying the model, either during training or by running a few \"calibration\" batches on the trained FP32 model. Based on these gathered statistics, the scale factors are calculated and fixed once the model is deployed. This method has the risk of encountering values outside the previously observed ranges at runtime. These values will be clipped, which might lead to accuracy degradation. Online means calculating the min/max values for each tensor dynamically during runtime. In this method clipping cannot occur; however, the added computation resources required to calculate the min/max values at runtime might be prohibitive. It is important to note, however, that the full float range of an activations tensor usually includes elements which are statistically outliers. These values can be discarded by using a narrower min/max range, effectively allowing some clipping to occur in favor of increasing the resolution provided to the part of the distribution containing most of the information. A simple method which can yield nice results is to simply use an average of the observed min/max values instead of the actual values. Alternatively, statistical measures can be used to intelligently select where to clip the original range in order to preserve as much information as possible ( Migacz, 2017 ). Going further, Banner et al., 2018 have proposed a method for analytically computing the clipping value under certain conditions. Another possible optimization point is scale-factor scope . The most common way is to use a single scale-factor per layer, but it is also possible to calculate a scale-factor per channel. This can be beneficial if the weight distributions vary greatly between channels. When used to directly quantize a model without re-training, as described so far, this method is commonly referred to as post-training quantization . However, recent publications have shown that there are cases where post-training quantization to INT8 doesn't preserve accuracy ( Jacob et al., 2018 , Krishnamoorthi, 2018 ). Namely, smaller models such as MobileNet seem to not respond as well to post-training quantization, presumably due to their smaller representational capacity. In such cases, quantization-aware training is used. \"Aggressive\" Quantization: INT4 and Lower Naively quantizing an FP32 model to INT4 and lower usually incurs significant accuracy degradation. Many works have tried to mitigate this effect.
They usually employ one or more of the following concepts in order to improve model accuracy: Training / Re-Training : For INT4 and lower, training is required in order to obtain reasonable accuracy. The training loop is modified to take quantization into account. See details in the next section . Zhou S et al., 2016 have shown that bootstrapping the quantized model with trained FP32 weights leads to higher accuracy, as opposed to training from scratch. Other methods require a trained FP32 model, either as a starting point ( Zhou A et al., 2017 ), or as a teacher network in a knowledge distillation training setup (see here ). Replacing the activation function : The most common activation function in vision models is ReLU, which is unbounded. That is - its dynamic range is not limited for positive inputs. This is very problematic for INT4 and below due to the very limited range and resolution. Therefore, most methods replace ReLU with another function which is bounded. In some cases a clipping function with hard-coded values is used ( Zhou S et al., 2016 , Mishra et al., 2018 ). Another method learns the clipping value per layer, with better results ( Choi et al., 2018 ). Once the clipping value is set, the scale factor used for quantization is also set, and no further calibration steps are required (as opposed to INT8 methods described above). Modifying network structure : Mishra et al., 2018 try to compensate for the loss of information due to quantization by using wider layers (more channels). Lin et al., 2017 proposed a binary quantization method in which a single FP32 convolution is replaced with multiple binary convolutions, each scaled to represent a different \"base\", covering a larger dynamic range overall. First and last layer : Many methods do not quantize the first and last layer of the model. It has been observed by Han et al., 2015 that the first convolutional layer is more sensitive to weights pruning, and some quantization works cite the same reason and show it empirically ( Zhou S et al., 2016 , Choi et al., 2018 ). Some works also note that these layers usually constitute a very small portion of the overall computation within the model, further reducing the motivation to quantize them ( Rastegari et al., 2016 ). Most methods keep the first and last layers at FP32. However, Choi et al., 2018 showed that \"conservative\" quantization of these layers, e.g. to INT8, does not reduce accuracy. Iterative quantization : Most methods quantize the entire model at once. Zhou A et al., 2017 employ an iterative method, which starts with a trained FP32 baseline, and quantizes only a portion of the model at a time, followed by several epochs of re-training to recover the accuracy loss from quantization. Mixed Weights and Activations Precision : It has been observed that activations are more sensitive to quantization than weights ( Zhou S et al., 2016 ). Hence it is not uncommon to see experiments with activations quantized to a higher precision compared to weights. Some works have focused solely on quantizing weights, keeping the activations at FP32 ( Li et al., 2016 , Zhu et al., 2016 ). Quantization-Aware Training As mentioned above, in order to minimize the loss of accuracy from \"aggressive\" quantization, many methods that target INT4 and lower (and in some cases for INT8 as well) involve training the model in a way that considers the quantization. This means training with quantization of weights and activations \"baked\" into the training procedure.
The training graph usually looks like this: A full precision copy of the weights is maintained throughout the training process (\"weights_fp\" in the diagram). Its purpose is to accumulate the small changes from the gradients without loss of precision (Note that the quantization of the weights is an integral part of the training graph, meaning that we back-propagate through it as well). Once the model is trained, only the quantized weights are used for inference. In the diagram we show \"layer N\" as the conv + batch-norm + activation combination, but the same applies to fully-connected layers, element-wise operations, etc. During training, the operations within \"layer N\" can still run in full precision, with the \"quantize\" operations at the boundaries ensuring discrete-valued weights and activations. This is sometimes called \"simulated quantization\". Straight-Through Estimator An important question in this context is how to back-propagate through the quantization functions. These functions are discrete-valued, hence their derivative is 0 almost everywhere. So, using their gradients as-is would severely hinder the learning process. An approximation commonly used to overcome this issue is the \"straight-through estimator\" (STE) ( Hinton et al., 2012 , Bengio, 2013 ), which simply passes the gradient through these functions as-is. References William Dally . High-Performance Hardware for Machine Learning. Tutorial, NIPS, 2015 Mohammad Rastegari, Vicente Ordonez, Joseph Redmon and Ali Farhadi . XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks. ECCV, 2016 Matthieu Courbariaux, Yoshua Bengio and Jean-Pierre David . Training deep neural networks with low precision multiplications. arxiv:1412.7024 Philipp Gysel, Jon Pimentel, Mohammad Motamedi and Soheil Ghiasi . Ristretto: A Framework for Empirical Study of Resource-Efficient Inference in Convolutional Neural Networks. IEEE Transactions on Neural Networks and Learning Systems, 2018 Szymon Migacz . 8-bit Inference with TensorRT. GTC San Jose, 2017 Shuchang Zhou, Zekun Ni, Xinyu Zhou, He Wen, Yuxin Wu and Yuheng Zou . DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients. arxiv:1606.06160 Aojun Zhou, Anbang Yao, Yiwen Guo, Lin Xu and Yurong Chen . Incremental Network Quantization: Towards Lossless CNNs with Low-precision Weights. ICLR, 2017 Asit Mishra, Eriko Nurvitadhi, Jeffrey J Cook and Debbie Marr . WRPN: Wide Reduced-Precision Networks. ICLR, 2018 Jungwook Choi, Zhuo Wang, Swagath Venkataramani, Pierce I-Jen Chuang, Vijayalakshmi Srinivasan and Kailash Gopalakrishnan . PACT: Parameterized Clipping Activation for Quantized Neural Networks. arxiv:1805.06085 Xiaofan Lin, Cong Zhao and Wei Pan . Towards Accurate Binary Convolutional Neural Network. NIPS, 2017 Song Han, Jeff Pool, John Tran and William Dally . Learning both Weights and Connections for Efficient Neural Network. NIPS, 2015 Fengfu Li, Bo Zhang and Bin Liu . Ternary Weight Networks. arxiv:1605.04711 Chenzhuo Zhu, Song Han, Huizi Mao and William J. Dally . Trained Ternary Quantization. arxiv:1612.01064 Yoshua Bengio, Nicholas Leonard and Aaron Courville . Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation. arxiv:1308.3432 Geoffrey Hinton, Nitish Srivastava, Kevin Swersky, Tijmen Tieleman and Abdelrahman Mohamed . Neural Networks for Machine Learning.
Coursera, video lectures, 2012 Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam and Dmitry Kalenichenko . Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference. ECCV, 2018 Raghuraman Krishnamoorthi . Quantizing deep convolutional networks for efficient inference: A whitepaper. arxiv:1806.08342 Ron Banner, Yury Nahshan, Elad Hoffer and Daniel Soudry . ACIQ: Analytical Clipping for Integer Quantization of neural networks. arxiv:1810.05723","title":"Quantization"},{"location":"quantization.html#quantization","text":"Quantization refers to the process of reducing the number of bits that represent a number. In the context of deep learning, the predominant numerical format used for research and for deployment has so far been 32-bit floating point, or FP32. However, the desire for reduced bandwidth and compute requirements of deep learning models has driven research into using lower-precision numerical formats. It has been extensively demonstrated that weights and activations can be represented using 8-bit integers (or INT8) without incurring significant loss in accuracy. The use of even lower bit-widths, such as 4/2/1-bits, is an active field of research that has also shown great progress. Note that this discussion is on quantization only in the context of more efficient inference. Using lower-precision numerics for more efficient training is currently out of scope.","title":"Quantization"},{"location":"quantization.html#motivation-overall-efficiency","text":"The more obvious benefit from quantization is significantly reduced bandwidth and storage . For instance, using INT8 for weights and activations consumes 4x less overall bandwidth compared to FP32. Additionally, integer compute is faster than floating point compute. It is also much more area and energy efficient : INT8 Operation Energy Saving vs FP32 Area Saving vs FP32 Add 30x 116x Multiply 18.5x 27x ( Dally, 2015 ) Note that very aggressive quantization can yield even more efficiency. If weights are binary (-1, 1) or ternary (-1, 0, 1 using 2-bits), then convolution and fully-connected layers can be computed with additions and subtractions only, removing multiplications completely. If activations are binary as well, then additions can also be removed, in favor of bitwise operations ( Rastegari et al., 2016 ).","title":"Motivation: Overall Efficiency"},{"location":"quantization.html#integer-vs-fp32","text":"There are two main attributes when discussing a numerical format. The first is dynamic range , which refers to the range of representable numbers. The second one is how many values can be represented within the dynamic range, which in turn determines the precision / resolution of the format (the distance between two numbers). For all integer formats, the dynamic range is [-2^{n-1} .. 2^{n-1}-1] , where n is the number of bits. So for INT8 the range is [-128 .. 127] , and for INT4 it is [-8 .. 7] (we're limiting ourselves to signed integers for now). The number of representable values is 2^n . Contrast that with FP32, where the dynamic range is \pm 3.4\ x\ 10^{38} , and approximately 4.2\ x\ 10^9 values can be represented. We can immediately see that FP32 is much more versatile , in that it is able to represent a wide range of distributions accurately. This is a nice property for deep learning models, where the distributions of weights and activations are usually very different (at least in dynamic range).
In addition, the dynamic range can differ between layers in the model. In order to be able to represent these different distributions with an integer format, a scale factor is used to map the dynamic range of the tensor to the integer format range. But we still remain with the issue of having a significantly lower number of representable values, that is - much lower resolution. Note that this scale factor is, in most cases, a floating-point number. Hence, even when using integer numerics, some floating-point computations remain. Courbariaux et al., 2014 scale using only shifts, eliminating the floating point operation. In gemmlowp , the FP32 scale factor is approximated using an integer or fixed-point multiplication followed by a shift operation. In many cases the effect of this approximation on accuracy is negligible.","title":"Integer vs. FP32"},{"location":"quantization.html#avoiding-overflows","text":"Convolution and fully connected layers involve the storing of intermediate results in accumulators. Due to the limited dynamic range of integer formats, if we were to use the same bit-width for the weights and activations, and for the accumulators, we would likely overflow very quickly. Therefore, accumulators are usually implemented with higher bit-widths. The result of multiplying two n -bit integers is, at most, a 2n -bit number. In convolution layers, such multiplications are accumulated c\cdot k^2 times, where c is the number of input channels and k is the kernel width (assuming a square kernel). Hence, to avoid overflowing, the accumulator should be 2n + M bits wide, where M is at least log_2(c\cdot k^2) . For example, with INT8 (n = 8) and a 3x3 kernel over 256 input channels, M = \lceil log_2(256 \cdot 9) \rceil = 12, so the accumulator should be at least 28 bits wide. In many cases 32-bit accumulators are used; however, for INT4 and lower it might be possible to use fewer than 32 bits, depending on the expected use cases and layer widths.","title":"Avoiding Overflows"},{"location":"quantization.html#conservative-quantization-int8","text":"In many cases, taking a model trained for FP32 and directly quantizing it to INT8, without any re-training, can result in a relatively low loss of accuracy (which may or may not be acceptable, depending on the use case). Some fine-tuning can further improve the accuracy ( Gysel et al., 2018 ). As mentioned above, a scale factor is used to adapt the dynamic range of the tensor at hand to that of the integer format. This scale factor needs to be calculated per-layer per-tensor. The simplest way is to map the min/max values of the float tensor to the min/max of the integer format. For weights and biases this is easy, as they are set once training is complete. For activations, the min/max float values can be obtained \"online\" during inference, or \"offline\". Offline means gathering activation statistics before deploying the model, either during training or by running a few \"calibration\" batches on the trained FP32 model. Based on these gathered statistics, the scale factors are calculated and are fixed once the model is deployed. This method has the risk of encountering values outside the previously observed ranges at runtime. These values will be clipped, which might lead to accuracy degradation. Online means calculating the min/max values for each tensor dynamically during runtime. In this method clipping cannot occur; however, the added computation resources required to calculate the min/max values at runtime might be prohibitive.
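A minimal sketch of the \"offline\" approach just described, accumulating averaged min/max statistics over a few calibration batches, might look like the following. The class and its interface are illustrative assumptions, not Distiller's statistics collector:

import torch

class MinMaxCalibrator:
    # Tracks a running average of per-batch min/max values for one
    # activation tensor, observed over the calibration batches.
    def __init__(self):
        self.avg_min, self.avg_max, self.count = 0.0, 0.0, 0

    def update(self, t):
        # Averaging the observed extrema, rather than taking the global
        # min/max, smooths out rare extreme batches.
        self.count += 1
        self.avg_min += (t.min().item() - self.avg_min) / self.count
        self.avg_max += (t.max().item() - self.avg_max) / self.count

    def scale_factor(self, num_bits=8):
        # Map the averaged range onto the signed integer grid.
        return max(abs(self.avg_min), abs(self.avg_max)) / (2 ** (num_bits - 1) - 1)

Once calibration is done, the resulting scale factors are frozen for deployment, which is exactly why values outside the observed ranges get clipped at runtime.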
It is important to note, however, that the full float range of an activations tensor usually includes elements which are statistically outliers. These values can be discarded by using a narrower min/max range, effectively allowing some clipping to occur in favor of increasing the resolution provided to the part of the distribution containing most of the information. A simple method which can yield good results is to use an average of the observed min/max values instead of the actual values. Alternatively, statistical measures can be used to intelligently select where to clip the original range in order to preserve as much information as possible ( Migacz, 2017 ). Going further, Banner et al., 2018 have proposed a method for analytically computing the clipping value under certain conditions. Another possible optimization point is scale-factor scope . The most common way is to use a single scale factor per layer, but it is also possible to calculate a scale factor per channel. This can be beneficial if the weight distributions vary greatly between channels. When used to directly quantize a model without re-training, as described so far, this method is commonly referred to as post-training quantization . However, recent publications have shown that there are cases where post-training quantization to INT8 doesn't preserve accuracy ( Benoit et al., 2018 , Krishnamoorthi, 2018 ). Namely, smaller models such as MobileNet seem to not respond as well to post-training quantization, presumably due to their smaller representational capacity. In such cases, quantization-aware training is used.","title":"\"Conservative\" Quantization: INT8"},{"location":"quantization.html#aggressive-quantization-int4-and-lower","text":"Naively quantizing an FP32 model to INT4 and lower usually incurs significant accuracy degradation. Many works have tried to mitigate this effect. They usually employ one or more of the following concepts in order to improve model accuracy: Training / Re-Training : For INT4 and lower, training is required in order to obtain reasonable accuracy. The training loop is modified to take quantization into account. See details in the next section . Zhou S et al., 2016 have shown that bootstrapping the quantized model with trained FP32 weights leads to higher accuracy, as opposed to training from scratch. Other methods require a trained FP32 model, either as a starting point ( Zhou A et al., 2017 ), or as a teacher network in a knowledge distillation training setup (see here ). Replacing the activation function : The most common activation function in vision models is ReLU, which is unbounded. That is - its dynamic range is not limited for positive inputs. This is very problematic for INT4 and below due to the very limited range and resolution. Therefore, most methods replace ReLU with another function which is bounded. In some cases a clipping function with hard-coded values is used ( Zhou S et al., 2016 , Mishra et al., 2018 ). Another method learns the clipping value per layer, with better results ( Choi et al., 2018 ). Once the clipping value is set, the scale factor used for quantization is also set, and no further calibration steps are required (as opposed to INT8 methods described above). Modifying network structure : Mishra et al., 2018 try to compensate for the loss of information due to quantization by using wider layers (more channels). Lin et al., 2017 proposed a binary quantization method in which a single FP32 convolution is replaced with multiple binary convolutions, each scaled to represent a different \"base\", covering a larger dynamic range overall.
First and last layer : Many methods do not quantize the first and last layer of the model. It has been observed by Han et al., 2015 that the first convolutional layer is more sensitive to weight pruning, and some quantization works cite the same reason and show it empirically ( Zhou S et al., 2016 , Choi et al., 2018 ). Some works also note that these layers usually constitute a very small portion of the overall computation within the model, further reducing the motivation to quantize them ( Rastegari et al., 2016 ). Most methods keep the first and last layers at FP32. However, Choi et al., 2018 showed that \"conservative\" quantization of these layers, e.g. to INT8, does not reduce accuracy. Iterative quantization : Most methods quantize the entire model at once. Zhou A et al., 2017 employ an iterative method, which starts with a trained FP32 baseline, and quantizes only a portion of the model at a time, followed by several epochs of re-training to recover the accuracy loss from quantization. Mixed Weights and Activations Precision : It has been observed that activations are more sensitive to quantization than weights ( Zhou S et al., 2016 ). Hence it is not uncommon to see experiments with activations quantized to a higher precision compared to weights. Some works have focused solely on quantizing weights, keeping the activations at FP32 ( Li et al., 2016 , Zhu et al., 2016 ).","title":"\"Aggressive\" Quantization: INT4 and Lower"},{"location":"quantization.html#quantization-aware-training","text":"As mentioned above, in order to minimize the loss of accuracy from \"aggressive\" quantization, many methods that target INT4 and lower (and in some cases INT8 as well) involve training the model in a way that considers the quantization. This means training with quantization of weights and activations \"baked\" into the training procedure. The training graph usually looks like this: A full precision copy of the weights is maintained throughout the training process (\"weights_fp\" in the diagram). Its purpose is to accumulate the small changes from the gradients without loss of precision (Note that the quantization of the weights is an integral part of the training graph, meaning that we back-propagate through it as well). Once the model is trained, only the quantized weights are used for inference. In the diagram we show \"layer N\" as the conv + batch-norm + activation combination, but the same applies to fully-connected layers, element-wise operations, etc. During training, the operations within \"layer N\" can still run in full precision, with the \"quantize\" operations at the boundaries ensuring discrete-valued weights and activations. This is sometimes called \"simulated quantization\".","title":"Quantization-Aware Training"},{"location":"quantization.html#straight-through-estimator","text":"An important question in this context is how to back-propagate through the quantization functions. These functions are discrete-valued, hence their derivative is 0 almost everywhere. So, using their gradients as-is would severely hinder the learning process. An approximation commonly used to overcome this issue is the \"straight-through estimator\" (STE) ( Hinton et al., 2012 , Bengio, 2013 ), which simply passes the gradient through these functions as-is.","title":"Straight-Through Estimator"},{"location":"quantization.html#references","text":"William Dally . High-Performance Hardware for Machine Learning. Tutorial, NIPS, 2015 Mohammad Rastegari, Vicente Ordonez, Joseph Redmon and Ali Farhadi .
XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks. ECCV, 2016 Matthieu Courbariaux, Yoshua Bengio and Jean-Pierre David . Training deep neural networks with low precision multiplications. arxiv:1412.7024 Philipp Gysel, Jon Pimentel, Mohammad Motamedi and Soheil Ghiasi . Ristretto: A Framework for Empirical Study of Resource-Efficient Inference in Convolutional Neural Networks. IEEE Transactions on Neural Networks and Learning Systems, 2018 Szymon Migacz . 8-bit Inference with TensorRT. GTC San Jose, 2017 Shuchang Zhou, Zekun Ni, Xinyu Zhou, He Wen, Yuxin Wu and Yuheng Zou . DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients. arxiv:1606.06160 Aojun Zhou, Anbang Yao, Yiwen Guo, Lin Xu and Yurong Chen . Incremental Network Quantization: Towards Lossless CNNs with Low-precision Weights. ICLR, 2017 Asit Mishra, Eriko Nurvitadhi, Jeffrey J Cook and Debbie Marr . WRPN: Wide Reduced-Precision Networks. ICLR, 2018 Jungwook Choi, Zhuo Wang, Swagath Venkataramani, Pierce I-Jen Chuang, Vijayalakshmi Srinivasan and Kailash Gopalakrishnan . PACT: Parameterized Clipping Activation for Quantized Neural Networks. arxiv:1805.06085 Xiaofan Lin, Cong Zhao and Wei Pan . Towards Accurate Binary Convolutional Neural Network. NIPS, 2017 Song Han, Jeff Pool, John Tran and William Dally . Learning both Weights and Connections for Efficient Neural Network. NIPS, 2015 Fengfu Li, Bo Zhang and Bin Liu . Ternary Weight Networks. arxiv:1605.04711 Chenzhuo Zhu, Song Han, Huizi Mao and William J. Dally . Trained Ternary Quantization. arxiv:1612.01064 Yoshua Bengio, Nicholas Leonard and Aaron Courville . Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation. arxiv:1308.3432 Geoffrey Hinton, Nitish Srivastava, Kevin Swersky, Tijmen Tieleman and Abdelrahman Mohamed . Neural Networks for Machine Learning. Coursera, video lectures, 2012 Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam and Dmitry Kalenichenko . Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference. ECCV, 2018 Raghuraman Krishnamoorthi . Quantizing deep convolutional networks for efficient inference: A whitepaper. arxiv:1806.08342 Ron Banner, Yury Nahshan, Elad Hoffer and Daniel Soudry . ACIQ: Analytical Clipping for Integer Quantization of neural networks. arxiv:1810.05723","title":"References"},{"location":"regularization.html","text":"Regularization In their book Deep Learning , Ian Goodfellow et al. define regularization as \"any modification we make to a learning algorithm that is intended to reduce its generalization error, but not its training error.\" PyTorch's optimizers use \(l_2\) parameter regularization to limit the capacity of models (i.e. reduce the variance). In general, we can write this as: \[ loss(W;x;y) = loss_D(W;x;y) + \lambda_R R(W) \] And specifically, \[ loss(W;x;y) = loss_D(W;x;y) + \lambda_R \lVert W \rVert_2^2 \] Where W is the collection of all weight elements in the network (i.e. this is model.parameters()), \(loss(W;x;y)\) is the total training loss, and \(loss_D(W)\) is the data loss (i.e. the error of the objective function, also called the loss function, or criterion in the Distiller sample image classifier compression application). optimizer = optim.SGD(model.parameters(), lr = 0.01, momentum=0.9, weight_decay=0.0001) criterion = nn.CrossEntropyLoss() ...
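# (Clarifying comments added to the sample above, not part of the original:)
# weight_decay is PyTorch's handle on the regularization strength lambda_R
# described below. SGD folds the gradient of the lambda_R * ||W||_2^2
# penalty, a term proportional to W itself, directly into each update.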
for input, target in dataset: optimizer.zero_grad() output = model(input) loss = criterion(output, target) loss.backward() optimizer.step() \(\lambda_R\) is a scalar called the regularization strength , and it balances the data error and the regularization error. In PyTorch, this is the weight_decay argument. \(\lVert W \rVert_2^2\) is the square of the \(l_2\)-norm of W, and as such it is a magnitude , or sizing, of the weights tensor. \[ \lVert W \rVert_2^2 = \sum_{l=1}^{L} \sum_{i=1}^{n} |w_{l,i}|^2 \;\;where \;n = torch.numel(w_l) \] \(L\) is the number of layers in the network; the notation above uses 1-based numbering to simplify the presentation. The qualitative differences between the \(l_2\)-norm and the squared \(l_2\)-norm are explained in Deep Learning . Sparsity and Regularization We mention regularization because there is an interesting interaction between regularization and some DNN sparsity-inducing methods. In Dense-Sparse-Dense (DSD) , Song Han et al. use pruning as a regularizer to improve a model's accuracy: \"Sparsity is a powerful form of regularization. Our intuition is that, once the network arrives at a local minimum given the sparsity constraint, relaxing the constraint gives the network more freedom to escape the saddle point and arrive at a higher-accuracy local minimum.\" Regularization can also be used to induce sparsity. To induce element-wise sparsity we can use the \(l_1\)-norm, \(\lVert W \rVert_1\). \[ \lVert W \rVert_1 = l_1(W) = \sum_{i=1}^{|W|} |w_i| \] \(l_2\)-norm regularization reduces overfitting and improves a model's accuracy by shrinking large parameters, but it does not force these parameters to absolute zero. \(l_1\)-norm regularization sets some of the parameter elements to zero, therefore limiting the model's capacity while making the model simpler. This is sometimes referred to as feature selection and gives us another interpretation of pruning. One of Distiller's Jupyter notebooks explains how the \(l_1\)-norm regularizer induces sparsity, and how it interacts with \(l_2\)-norm regularization. If we configure weight_decay to zero and use \(l_1\)-norm regularization, then we have: \[ loss(W;x;y) = loss_D(W;x;y) + \lambda_R \lVert W \rVert_1 \] If we use both regularizers, we have: \[ loss(W;x;y) = loss_D(W;x;y) + \lambda_{R_2} \lVert W \rVert_2^2 + \lambda_{R_1} \lVert W \rVert_1 \] Class distiller.L1Regularizer implements \(l_1\)-norm regularization, and of course, you can also schedule regularization. l1_regularizer = distiller.L1Regularizer(model.parameters()) ... loss = criterion(output, target) + lambda_r * l1_regularizer() Group Regularization In Group Regularization, we penalize entire groups of parameter elements, instead of individual elements. Therefore, entire groups are either sparsified (i.e. all of the group elements have a value of zero) or not. The group structures have to be pre-defined. To the data loss, and the element-wise regularization (if any), we can add a group-wise regularization penalty. We represent all of the parameter groups in layer \(l\) as \( W_l^{(G)} \), and we add the penalty of all groups for all layers. It gets a bit messy, but not overly complicated: \[ loss(W;x;y) = loss_D(W;x;y) + \lambda_R R(W) + \lambda_g \sum_{l=1}^{L} R_g(W_l^{(G)}) \] Let's denote all of the weight elements in group \(g\) as \(w^{(g)}\).
\[ R_g(w^{(g)}) = \sum_{g=1}^{G} \lVert w^{(g)} \rVert_g = \sum_{g=1}^{G} \sum_{i=1}^{|w^{(g)}|} {(w_i^{(g)})}^2 \] where \(w^{(g)} \in w^{(l)} \) and \( |w^{(g)}| \) is the number of elements in \( w^{(g)} \). \( \lambda_g \sum_{l=1}^{L} R_g(W_l^{(G)}) \) is called the Group Lasso regularizer. Much as in \(l_1\)-norm regularization we sum the magnitudes of all tensor elements, in Group Lasso we sum the magnitudes of element structures (i.e. groups). Group Regularization is also called Block Regularization, Structured Regularization, or coarse-grained sparsity (remember that element-wise sparsity is sometimes referred to as fine-grained sparsity). Group sparsity exhibits regularity (i.e. its shape is regular), and therefore it can be beneficial for improving inference speed. Huizi et al., 2017 provides an overview of some of the different groups: kernel, channel, filter, layers. Fiber structures such as matrix columns and rows, as well as various shaped structures (block sparsity), and even intra-kernel strided sparsity can also be used. distiller.GroupLassoRegularizer currently implements most of these groups, and you can easily add new groups. References Ian Goodfellow and Yoshua Bengio and Aaron Courville . Deep Learning , MIT Press, 2016. Song Han, Jeff Pool, Sharan Narang, Huizi Mao, Enhao Gong, Shijian Tang, Erich Elsen, Peter Vajda, Manohar Paluri, John Tran, Bryan Catanzaro, William J. Dally . DSD: Dense-Sparse-Dense Training for Deep Neural Networks , arXiv:1607.04381v2, 2017. Huizi Mao, Song Han, Jeff Pool, Wenshuo Li, Xingyu Liu, Yu Wang, William J. Dally . Exploring the Regularity of Sparse Structure in Convolutional Neural Networks , arXiv:1705.08922v3, 2017. Sajid Anwar, Kyuyeon Hwang, and Wonyong Sung . Structured pruning of deep convolutional neural networks , arXiv:1512.08571, 2015","title":"Regularization"},{"location":"regularization.html#regularization","text":"In their book Deep Learning , Ian Goodfellow et al. define regularization as \"any modification we make to a learning algorithm that is intended to reduce its generalization error, but not its training error.\" PyTorch's optimizers use \(l_2\) parameter regularization to limit the capacity of models (i.e. reduce the variance). In general, we can write this as: \[ loss(W;x;y) = loss_D(W;x;y) + \lambda_R R(W) \] And specifically, \[ loss(W;x;y) = loss_D(W;x;y) + \lambda_R \lVert W \rVert_2^2 \] Where W is the collection of all weight elements in the network (i.e. this is model.parameters()), \(loss(W;x;y)\) is the total training loss, and \(loss_D(W)\) is the data loss (i.e. the error of the objective function, also called the loss function, or criterion in the Distiller sample image classifier compression application). optimizer = optim.SGD(model.parameters(), lr = 0.01, momentum=0.9, weight_decay=0.0001) criterion = nn.CrossEntropyLoss() ... for input, target in dataset: optimizer.zero_grad() output = model(input) loss = criterion(output, target) loss.backward() optimizer.step() \(\lambda_R\) is a scalar called the regularization strength , and it balances the data error and the regularization error. In PyTorch, this is the weight_decay argument. \(\lVert W \rVert_2^2\) is the square of the \(l_2\)-norm of W, and as such it is a magnitude , or sizing, of the weights tensor.
\[ \lVert W \rVert_2^2 = \sum_{l=1}^{L} \sum_{i=1}^{n} |w_{l,i}|^2 \;\;where \;n = torch.numel(w_l) \] \(L\) is the number of layers in the network; the notation above uses 1-based numbering to simplify the presentation. The qualitative differences between the \(l_2\)-norm and the squared \(l_2\)-norm are explained in Deep Learning .","title":"Regularization"},{"location":"regularization.html#sparsity-and-regularization","text":"We mention regularization because there is an interesting interaction between regularization and some DNN sparsity-inducing methods. In Dense-Sparse-Dense (DSD) , Song Han et al. use pruning as a regularizer to improve a model's accuracy: \"Sparsity is a powerful form of regularization. Our intuition is that, once the network arrives at a local minimum given the sparsity constraint, relaxing the constraint gives the network more freedom to escape the saddle point and arrive at a higher-accuracy local minimum.\" Regularization can also be used to induce sparsity. To induce element-wise sparsity we can use the \(l_1\)-norm, \(\lVert W \rVert_1\). \[ \lVert W \rVert_1 = l_1(W) = \sum_{i=1}^{|W|} |w_i| \] \(l_2\)-norm regularization reduces overfitting and improves a model's accuracy by shrinking large parameters, but it does not force these parameters to absolute zero. \(l_1\)-norm regularization sets some of the parameter elements to zero, therefore limiting the model's capacity while making the model simpler. This is sometimes referred to as feature selection and gives us another interpretation of pruning. One of Distiller's Jupyter notebooks explains how the \(l_1\)-norm regularizer induces sparsity, and how it interacts with \(l_2\)-norm regularization. If we configure weight_decay to zero and use \(l_1\)-norm regularization, then we have: \[ loss(W;x;y) = loss_D(W;x;y) + \lambda_R \lVert W \rVert_1 \] If we use both regularizers, we have: \[ loss(W;x;y) = loss_D(W;x;y) + \lambda_{R_2} \lVert W \rVert_2^2 + \lambda_{R_1} \lVert W \rVert_1 \] Class distiller.L1Regularizer implements \(l_1\)-norm regularization, and of course, you can also schedule regularization. l1_regularizer = distiller.L1Regularizer(model.parameters()) ... loss = criterion(output, target) + lambda_r * l1_regularizer()","title":"Sparsity and Regularization"},{"location":"regularization.html#group-regularization","text":"In Group Regularization, we penalize entire groups of parameter elements, instead of individual elements. Therefore, entire groups are either sparsified (i.e. all of the group elements have a value of zero) or not. The group structures have to be pre-defined. To the data loss, and the element-wise regularization (if any), we can add a group-wise regularization penalty. We represent all of the parameter groups in layer \(l\) as \( W_l^{(G)} \), and we add the penalty of all groups for all layers. It gets a bit messy, but not overly complicated: \[ loss(W;x;y) = loss_D(W;x;y) + \lambda_R R(W) + \lambda_g \sum_{l=1}^{L} R_g(W_l^{(G)}) \] Let's denote all of the weight elements in group \(g\) as \(w^{(g)}\). \[ R_g(w^{(g)}) = \sum_{g=1}^{G} \lVert w^{(g)} \rVert_g = \sum_{g=1}^{G} \sum_{i=1}^{|w^{(g)}|} {(w_i^{(g)})}^2 \] where \(w^{(g)} \in w^{(l)} \) and \( |w^{(g)}| \) is the number of elements in \( w^{(g)} \). \( \lambda_g \sum_{l=1}^{L} R_g(W_l^{(G)}) \) is called the Group Lasso regularizer.
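As a concrete sketch: in its conventional form, the Group Lasso penalty sums the \(l_2\) magnitude (the square root of the sum of squares) of each group. Below, each group is assumed to be one 3D filter of a convolution weight tensor; this is an illustration under that assumption, not the distiller.GroupLassoRegularizer implementation:

import torch

def group_lasso_filters(weight, eps=1e-8):
    # weight has shape [out_channels, in_channels, k, k]; treat each
    # output filter (one 3D slice) as a group g.
    groups = weight.view(weight.size(0), -1)
    # sum_g ||w^(g)||_2 : the l2 magnitude of each group, then summed.
    # eps keeps the gradient finite when a group is already all-zero.
    return torch.sqrt((groups ** 2).sum(dim=1) + eps).sum()

Added to the data loss with a strength \(\lambda_g\), this drives whole filters, rather than individual elements, toward zero.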
Much as in \(l_1\)-norm regularization we sum the magnitudes of all tensor elements, in Group Lasso we sum the magnitudes of element structures (i.e. groups). Group Regularization is also called Block Regularization, Structured Regularization, or coarse-grained sparsity (remember that element-wise sparsity is sometimes referred to as fine-grained sparsity). Group sparsity exhibits regularity (i.e. its shape is regular), and therefore it can be beneficial for improving inference speed. Huizi et al., 2017 provides an overview of some of the different groups: kernel, channel, filter, layers. Fiber structures such as matrix columns and rows, as well as various shaped structures (block sparsity), and even intra-kernel strided sparsity can also be used. distiller.GroupLassoRegularizer currently implements most of these groups, and you can easily add new groups.","title":"Group Regularization"},{"location":"regularization.html#references","text":"Ian Goodfellow and Yoshua Bengio and Aaron Courville . Deep Learning , MIT Press, 2016. Song Han, Jeff Pool, Sharan Narang, Huizi Mao, Enhao Gong, Shijian Tang, Erich Elsen, Peter Vajda, Manohar Paluri, John Tran, Bryan Catanzaro, William J. Dally . DSD: Dense-Sparse-Dense Training for Deep Neural Networks , arXiv:1607.04381v2, 2017. Huizi Mao, Song Han, Jeff Pool, Wenshuo Li, Xingyu Liu, Yu Wang, William J. Dally . Exploring the Regularity of Sparse Structure in Convolutional Neural Networks , arXiv:1705.08922v3, 2017. Sajid Anwar, Kyuyeon Hwang, and Wonyong Sung . Structured pruning of deep convolutional neural networks , arXiv:1512.08571, 2015","title":"References"},{"location":"schedule.html","text":"Compression scheduler In iterative pruning, we create some kind of pruning regimen that specifies how to prune, and what to prune at every stage of the pruning and training process. This motivated the design of CompressionScheduler : it needed to be part of the training loop, and to be able to make and implement pruning, regularization and quantization decisions. We wanted to be able to change the particulars of the compression schedule, without touching the code, and settled on using YAML as a container for this specification. We found that when we make many experiments on the same code base, it is easier to maintain all of these experiments if we decouple the differences from the code-base. Therefore, we added to the scheduler support for learning-rate decay scheduling because, again, we wanted the freedom to change the LR-decay policy without changing code. High level overview Let's briefly discuss the main mechanisms and abstractions: A schedule specification is composed of a list of sections defining instances of Pruners, Regularizers, Quantizers, LR-scheduler and Policies. Pruners, Regularizers and Quantizers are very similar: They implement a pruning, regularization, or quantization algorithm, respectively. An LR-scheduler specifies the LR-decay algorithm. These define the what part of the schedule. The Policies define the when part of the schedule: at which epoch to start applying the Pruner/Regularizer/Quantizer/LR-decay, the epoch to end, and how often to invoke the policy (frequency of application). A policy also defines the instance of Pruner/Regularizer/Quantizer/LR-decay it is managing. The CompressionScheduler is configured from a YAML file or from a dictionary, but you can also manually create Policies, Pruners, Regularizers and Quantizers from code.
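For instance, a hypothetical in-code construction might look like the sketch below. The class names come from this page, but the import paths and constructor signatures are assumptions; check the Distiller sources for the exact APIs (only add_policy's keyword arguments are shown verbatim later on this page):

import distiller

# Assumed constructor signatures - illustrative only.
pruner = distiller.pruning.SensitivityPruner(
    'my_pruner', sensitivities={'features.module.0.weight': 0.25})
policy = distiller.PruningPolicy(pruner, pruner_args=None)

scheduler = distiller.CompressionScheduler(model)
scheduler.add_policy(policy, starting_epoch=0, ending_epoch=38, frequency=2)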
Syntax through example We'll use alexnet.schedule_agp.yaml to explain some of the YAML syntax for configuring Sensitivity Pruning of Alexnet. version: 1 pruners: my_pruner: class: 'SensitivityPruner' sensitivities: 'features.module.0.weight': 0.25 'features.module.3.weight': 0.35 'features.module.6.weight': 0.40 'features.module.8.weight': 0.45 'features.module.10.weight': 0.55 'classifier.1.weight': 0.875 'classifier.4.weight': 0.875 'classifier.6.weight': 0.625 lr_schedulers: pruning_lr: class: ExponentialLR gamma: 0.9 policies: - pruner: instance_name : 'my_pruner' starting_epoch: 0 ending_epoch: 38 frequency: 2 - lr_scheduler: instance_name: pruning_lr starting_epoch: 24 ending_epoch: 200 frequency: 1 There is only one version of the YAML syntax, and the version number is not verified at the moment. However, to be future-proof it is probably better to let the YAML parser know that you are using version-1 syntax, in case there is ever a version 2. version: 1 In the pruners section, we define the instances of pruners we want the scheduler to instantiate and use. We define a single pruner instance, named my_pruner , of algorithm SensitivityPruner . We will refer to this instance in the Policies section. Then we list the sensitivity multipliers, \(s\), of each of the weight tensors. You may list as many Pruners as you want in this section, as long as each has a unique name. You can use several types of pruners in one schedule. pruners: my_pruner: class: 'SensitivityPruner' sensitivities: 'features.module.0.weight': 0.25 'features.module.3.weight': 0.35 'features.module.6.weight': 0.40 'features.module.8.weight': 0.45 'features.module.10.weight': 0.55 'classifier.1.weight': 0.875 'classifier.4.weight': 0.875 'classifier.6.weight': 0.625 Next, we want to specify the learning-rate decay scheduling in the lr_schedulers section. We assign a name to this instance: pruning_lr . As in the pruners section, you may use any name, as long as all LR-schedulers have a unique name. At the moment, only one instance of LR-scheduler is allowed. The LR-scheduler must be a subclass of PyTorch's _LRScheduler . You can use any of the schedulers defined in torch.optim.lr_scheduler (see here ). We have also implemented some additional schedulers in Distiller (see here ). The keyword arguments (kwargs) are passed directly to the LR-scheduler's constructor, so that as new LR-schedulers are added to torch.optim.lr_scheduler , they can be used without changing the application code. lr_schedulers: pruning_lr: class: ExponentialLR gamma: 0.9 Finally, we define the policies section, which specifies the actual scheduling. A Policy manages an instance of a Pruner , Regularizer , Quantizer , or LRScheduler , by naming the instance. In the example below, a PruningPolicy uses the pruner instance named my_pruner : it activates it at a frequency of 2 epochs (i.e. every other epoch), starting at epoch 0, and ending at epoch 38. policies: - pruner: instance_name : 'my_pruner' starting_epoch: 0 ending_epoch: 38 frequency: 2 - lr_scheduler: instance_name: pruning_lr starting_epoch: 24 ending_epoch: 200 frequency: 1 This is iterative pruning : Train Connectivity Prune Connections Retrain Weights Goto 2 It is described in Learning both Weights and Connections for Efficient Neural Networks : \"Our method prunes redundant connections using a three-step method. First, we train the network to learn which connections are important. Next, we prune the unimportant connections.
Finally, we retrain the network to fine tune the weights of the remaining connections...After an initial training phase, we remove all connections whose weight is lower than a threshold. This pruning converts a dense, fully-connected layer to a sparse layer. This first phase learns the topology of the networks \u2014 learning which connections are important and removing the unimportant connections. We then retrain the sparse network so the remaining connections can compensate for the connections that have been removed. The phases of pruning and retraining may be repeated iteratively to further reduce network complexity.\" Regularization You can also define and schedule regularization. L1 regularization Format (this is an informal specification, not a valid ABNF specification): regularizers: REGULARIZER_NAME_STR : class: L1Regularizer reg_regims: PYTORCH_PARAM_NAME_STR : STRENGTH_FLOAT ... PYTORCH_PARAM_NAME_STR : STRENGTH_FLOAT threshold_criteria: [Mean_Abs | Max] For example: version: 1 regularizers: my_L1_reg: class: L1Regularizer reg_regims: 'module.layer3.1.conv1.weight': 0.000002 'module.layer3.1.conv2.weight': 0.000002 'module.layer3.1.conv3.weight': 0.000002 'module.layer3.2.conv1.weight': 0.000002 threshold_criteria: Mean_Abs policies: - regularizer: instance_name: my_L1_reg starting_epoch: 0 ending_epoch: 60 frequency: 1 Group regularization Format (informal specification): regularizers: REGULARIZER_NAME_STR : class: GroupLassoRegularizer reg_regims: PYTORCH_PARAM_NAME_STR : [ STRENGTH_FLOAT , '2D' | '3D' | '4D' | 'Channels' | 'Cols' | 'Rows' ] PYTORCH_PARAM_NAME_STR : [ STRENGTH_FLOAT , '2D' | '3D' | '4D' | 'Channels' | 'Cols' | 'Rows' ] threshold_criteria: [Mean_Abs | Max] For example: version: 1 regularizers: my_filter_regularizer: class: GroupLassoRegularizer reg_regims: 'module.layer3.1.conv1.weight': [0.00005, '3D'] 'module.layer3.1.conv2.weight': [0.00005, '3D'] 'module.layer3.1.conv3.weight': [0.00005, '3D'] 'module.layer3.2.conv1.weight': [0.00005, '3D'] threshold_criteria: Mean_Abs policies: - regularizer: instance_name: my_filter_regularizer starting_epoch: 0 ending_epoch: 60 frequency: 1 Mixing it up You can mix pruning and regularization. version: 1 pruners: my_pruner: class: 'SensitivityPruner' sensitivities: 'features.module.0.weight': 0.25 'features.module.3.weight': 0.35 'features.module.6.weight': 0.40 'features.module.8.weight': 0.45 'features.module.10.weight': 0.55 'classifier.1.weight': 0.875 'classifier.4.weight': 0.875 'classifier.6.weight': 0.625 regularizers: 2d_groups_regularizer: class: GroupLassoRegularizer reg_regims: 'features.module.0.weight': [0.000012, '2D'] 'features.module.3.weight': [0.000012, '2D'] 'features.module.6.weight': [0.000012, '2D'] 'features.module.8.weight': [0.000012, '2D'] 'features.module.10.weight': [0.000012, '2D'] lr_schedulers: # Learning rate decay scheduler pruning_lr: class: ExponentialLR gamma: 0.9 policies: - pruner: instance_name : 'my_pruner' starting_epoch: 0 ending_epoch: 38 frequency: 2 - regularizer: instance_name: '2d_groups_regularizer' starting_epoch: 0 ending_epoch: 38 frequency: 1 - lr_scheduler: instance_name: pruning_lr starting_epoch: 24 ending_epoch: 200 frequency: 1 Quantization-Aware Training Similarly to pruners and regularizers, specifying a quantizer in the scheduler YAML follows the constructor arguments of the Quantizer class (see details here ). Note that only a single quantizer instance may be defined per YAML.
Let's see an example: quantizers: dorefa_quantizer: class: DorefaQuantizer bits_activations: 8 bits_weights: 4 bits_overrides: conv1: wts: null acts: null relu1: wts: null acts: null final_relu: wts: null acts: null fc: wts: null acts: null The specific quantization method we're instantiating here is DorefaQuantizer . Then we define the default bit-widths for activations and weights, in this case 8 and 4-bits, respectively. Then, we define the bits_overrides mapping. In the example above, we choose not to quantize the first and last layer of the model. In the case of DorefaQuantizer , the weights are quantized as part of the convolution / FC layers, but the activations are quantized in separate layers, which replace the ReLU layers in the original model (remember - even though we replaced the ReLU modules with our own quantization modules, the name of the modules isn't changed). So, in all, we need to reference the first layer with parameters conv1 , the first activation layer relu1 , the last activation layer final_relu and the last layer with parameters fc . Specifying null means \"do not quantize\". Note that for quantizers, we reference names of modules, not names of parameters as we do for pruners and regularizers. Defining overrides for groups of layers using regular expressions Suppose we have a sub-module in our model named block1 , which contains multiple convolution layers which we would like to quantize to, say, 2-bits. The convolution layers are named conv1 , conv2 and so on. In that case we would define the following: bits_overrides: 'block1\.conv*': wts: 2 acts: null RegEx Note : Remember that the dot ( . ) is a meta-character (i.e. a reserved character) in regular expressions. So, to match the actual dot characters which separate sub-modules in PyTorch module names, we need to escape it: \. Overlapping patterns are also possible, which allows defining an override for a group of layers while also singling out specific layers for different overrides. For example, let's take the last example and configure a different override for block1.conv1 : bits_overrides: 'block1\.conv1': wts: 4 acts: null 'block1\.conv*': wts: 2 acts: null Important Note : The patterns are evaluated eagerly - first match wins. So, to properly quantize a model using \"broad\" patterns and more \"specific\" patterns as just shown, make sure the specific pattern is listed before the broad one. The QuantizationPolicy , which controls the quantization procedure during training, is actually quite simplistic. All it does is call the prepare_model() function of the Quantizer when it's initialized, followed by the first call to quantize_params() . Then, at the end of each epoch, after the float copy of the weights has been updated, it calls the quantize_params() function again. policies: - quantizer: instance_name: dorefa_quantizer starting_epoch: 0 ending_epoch: 200 frequency: 1 Important Note : As mentioned here , since the quantizer modifies the model's parameters (assuming training with quantization in the loop is used), the call to prepare_model() must be performed before an optimizer is called. Therefore, currently, the starting epoch for a quantization policy must be 0, otherwise the quantization process will not work as expected.
If one wishes to do a \"warm-startup\" (or \"boot-strapping\"), training for a few epochs with full precision and only then starting to quantize, the only way to do this right now is to execute a separate run to generate the boot-strapped weights, and execute a second which will resume the checkpoint with the boot-strapped weights. Post-Training Quantization Post-training quantization differs from the other techniques described here. Since it is not executed during training, it does not require any Policies nor a Scheduler. Currently, the only method implemented for post-training quantization is range-based linear quantization . Quantizing a model using this method, requires adding 2 lines of code: quantizer = distiller.quantization.PostTrainLinearQuantizer(model, quantizer arguments ) quantizer.prepare_model() # Execute evaluation on model as usual See the documentation for PostTrainLinearQuantizer in range_linear.py for details on the available arguments. In addition to directly instantiating the quantizer with arguments, it can also be configured from a YAML file. The syntax for the YAML file is exactly the same as seen in the quantization-aware training section above. Not surprisingly, the class defined must be PostTrainLinearQuantizer , and any other components or policies defined in the YAML file are ignored. We'll see how to create the quantizer in this manner below. If more configurability is needed, a helper function can be used that will add a set of command-line arguments to configure the quantizer: parser = argparse.ArgumentParser() distiller.quantization.add_post_train_quant_args(parser) args = parser.parse_args() These are the available command line arguments: Arguments controlling quantization at evaluation time ( post-training quantization ): --quantize-eval, --qe Apply linear quantization to model before evaluation. Applicable only if --evaluate is also set --qe-calibration PORTION_OF_TEST_SET Run the model in evaluation mode on the specified portion of the test dataset and collect statistics. Ignores all other 'qe--*' arguments --qe-mode QE_MODE, --qem QE_MODE Linear quantization mode. Choices: sym | asym_s | asym_u --qe-bits-acts NUM_BITS, --qeba NUM_BITS Number of bits for quantization of activations --qe-bits-wts NUM_BITS, --qebw NUM_BITS Number of bits for quantization of weights --qe-bits-accum NUM_BITS Number of bits for quantization of the accumulator --qe-clip-acts, --qeca Enable clipping of activations using min/max values averaging over batch --qe-no-clip-layers LAYER_NAME [LAYER_NAME ...], --qencl LAYER_NAME [LAYER_NAME ...] List of layer names for which not to clip activations. Applicable only if --qe-clip-acts is also set --qe-per-channel, --qepc Enable per-channel quantization of weights (per output channel) --qe-stats-file PATH Path to YAML file with calibration stats. If not given, dynamic quantization will be run (Note that not all layer types are supported for dynamic quantization) --qe-config-file PATH Path to YAML file containing configuration for PostTrainLinearQuantizer (if present, all other --qe* arguments are ignored) (Note that --quantize-eval and --qe-calibration are mutually exclusive.) 
When using these command line arguments, the quantizer can be invoked as follows: if args.quantize_eval: if args.qe_config_file: quantizer = distiller.config_component_from_file_by_class(model, args.qe_config_file, 'PostTrainLinearQuantizer') else: quantizer = quantization.PostTrainLinearQuantizer(model, args.qe_bits_acts, args.qe_bits_wts, args.qe_bits_accum, None, args.qe_mode, args.qe_clip_acts, args.qe_no_clip_layers, args.qe_per_channel, args.qe_stats_file) quantizer.prepare_model() # Execute evaluation on model as usual Note that the command-line arguments don't expose the bits_overrides parameter of the quantizer, which allows fine-grained control over how each layer is quantized. To utilize this functionality, configure with a YAML file. To see integration of these command line arguments in use, see the image classification example . For example invocations of post-training quantization, see here . Collecting Statistics for Quantization To generate statistics that can be used for static quantization of activations, do the following (shown here assuming the command line argument --qe-calibration shown above is used, which specifies the number of batches to use for statistics generation): if args.qe_calibration: distiller.utils.assign_layer_fq_names(model) msglogger.info('Generating quantization calibration stats based on {0} users'.format(args.qe_calibration)) collector = distiller.data_loggers.QuantCalibrationStatsCollector(model) with collector_context(collector): # Here call your model evaluation function, making sure to execute only # the portion of the dataset specified by the qe_calibration argument yaml_path = 'some/dir/quantization_stats.yaml' collector.save(yaml_path) The generated YAML stats file can then be provided using the --qe-stats-file argument. An example of a generated stats file can be found here . Knowledge Distillation Knowledge distillation (see here ) is also implemented as a Policy , which should be added to the scheduler. However, with the current implementation, it cannot be defined within the YAML file like the rest of the policies described above. To make the integration of this method into applications a bit easier, a helper function can be used that will add a set of command-line arguments related to knowledge distillation: import argparse import distiller parser = argparse.ArgumentParser() distiller.knowledge_distillation.add_distillation_args(parser) (The add_distillation_args function accepts some optional arguments, see its implementation at distiller/knowledge_distillation.py for details) These are the command line arguments exposed by this function: Knowledge Distillation Training Arguments: --kd-teacher ARCH Model architecture for teacher model --kd-pretrained Use pre-trained model for teacher --kd-resume PATH Path to checkpoint from which to load teacher weights --kd-temperature TEMP, --kd-temp TEMP Knowledge distillation softmax temperature --kd-distill-wt WEIGHT, --kd-dw WEIGHT Weight for distillation loss (student vs. teacher soft targets) --kd-student-wt WEIGHT, --kd-sw WEIGHT Weight for student vs. labels loss --kd-teacher-wt WEIGHT, --kd-tw WEIGHT Weight for teacher vs.
labels loss --kd-start-epoch EPOCH_NUM Epoch from which to enable distillation Once arguments have been parsed, some initialization code is required, similar to the following: # Assuming: # args variable holds command line arguments # model variable holds the model we're going to train, that is - the student model # compression_scheduler variable holds a CompressionScheduler instance args.kd_policy = None if args.kd_teacher: # Create teacher model - replace this with your model creation code teacher = create_model(args.kd_pretrained, args.dataset, args.kd_teacher, device_ids=args.gpus) if args.kd_resume: teacher, _, _ = apputils.load_checkpoint(teacher, chkpt_file=args.kd_resume) # Create policy and add to scheduler dlw = distiller.DistillationLossWeights(args.kd_distill_wt, args.kd_student_wt, args.kd_teacher_wt) args.kd_policy = distiller.KnowledgeDistillationPolicy(model, teacher, args.kd_temp, dlw) compression_scheduler.add_policy(args.kd_policy, starting_epoch=args.kd_start_epoch, ending_epoch=args.epochs, frequency=1) Finally, during the training loop, we need to perform forward propagation through the teacher model as well. The KnowledgeDistillationPolicy class keeps a reference to both the student and teacher models, and exposes a forward function that performs forward propagation on both of them. Since this is not one of the standard policy callbacks, we need to call this function manually from our training loop, as follows: if args.kd_policy is None: # Revert to a normal forward-prop call if no knowledge distillation policy is present output = model(input_var) else: output = args.kd_policy.forward(input_var) To see this integration in action, take a look at the image classification sample at examples/classifier_compression/compress_classifier.py .","title":"Compression Scheduling"},{"location":"schedule.html#compression-scheduler","text":"In iterative pruning, we create some kind of pruning regimen that specifies how to prune, and what to prune at every stage of the pruning and training process. This motivated the design of CompressionScheduler : it needed to be part of the training loop, and to be able to make and implement pruning, regularization and quantization decisions. We wanted to be able to change the particulars of the compression schedule, without touching the code, and settled on using YAML as a container for this specification. We found that when we make many experiments on the same code base, it is easier to maintain all of these experiments if we decouple the differences from the code-base. Therefore, we added to the scheduler support for learning-rate decay scheduling because, again, we wanted the freedom to change the LR-decay policy without changing code.","title":"Compression scheduler"},{"location":"schedule.html#high-level-overview","text":"Let's briefly discuss the main mechanisms and abstractions: A schedule specification is composed of a list of sections defining instances of Pruners, Regularizers, Quantizers, LR-scheduler and Policies. Pruners, Regularizers and Quantizers are very similar: They implement a pruning, regularization, or quantization algorithm, respectively. An LR-scheduler specifies the LR-decay algorithm. These define the what part of the schedule. The Policies define the when part of the schedule: at which epoch to start applying the Pruner/Regularizer/Quantizer/LR-decay, the epoch to end, and how often to invoke the policy (frequency of application). A policy also defines the instance of Pruner/Regularizer/Quantizer/LR-decay it is managing.
The CompressionScheduler is configured from a YAML file or from a dictionary, but you can also manually create Policies, Pruners, Regularizers and Quantizers from code.","title":"High level overview"},{"location":"schedule.html#syntax-through-example","text":"We'll use alexnet.schedule_agp.yaml to explain some of the YAML syntax for configuring Sensitivity Pruning of Alexnet. version: 1 pruners: my_pruner: class: 'SensitivityPruner' sensitivities: 'features.module.0.weight': 0.25 'features.module.3.weight': 0.35 'features.module.6.weight': 0.40 'features.module.8.weight': 0.45 'features.module.10.weight': 0.55 'classifier.1.weight': 0.875 'classifier.4.weight': 0.875 'classifier.6.weight': 0.625 lr_schedulers: pruning_lr: class: ExponentialLR gamma: 0.9 policies: - pruner: instance_name : 'my_pruner' starting_epoch: 0 ending_epoch: 38 frequency: 2 - lr_scheduler: instance_name: pruning_lr starting_epoch: 24 ending_epoch: 200 frequency: 1 There is only one version of the YAML syntax, and the version number is not verified at the moment. However, to be future-proof it is probably better to let the YAML parser know that you are using version-1 syntax, in case there is ever a version 2. version: 1 In the pruners section, we define the instances of pruners we want the scheduler to instantiate and use. We define a single pruner instance, named my_pruner , of algorithm SensitivityPruner . We will refer to this instance in the Policies section. Then we list the sensitivity multipliers, \(s\), of each of the weight tensors. You may list as many Pruners as you want in this section, as long as each has a unique name. You can use several types of pruners in one schedule. pruners: my_pruner: class: 'SensitivityPruner' sensitivities: 'features.module.0.weight': 0.25 'features.module.3.weight': 0.35 'features.module.6.weight': 0.40 'features.module.8.weight': 0.45 'features.module.10.weight': 0.55 'classifier.1.weight': 0.875 'classifier.4.weight': 0.875 'classifier.6.weight': 0.625 Next, we want to specify the learning-rate decay scheduling in the lr_schedulers section. We assign a name to this instance: pruning_lr . As in the pruners section, you may use any name, as long as all LR-schedulers have a unique name. At the moment, only one instance of LR-scheduler is allowed. The LR-scheduler must be a subclass of PyTorch's _LRScheduler . You can use any of the schedulers defined in torch.optim.lr_scheduler (see here ). We have also implemented some additional schedulers in Distiller (see here ). The keyword arguments (kwargs) are passed directly to the LR-scheduler's constructor, so that as new LR-schedulers are added to torch.optim.lr_scheduler , they can be used without changing the application code. lr_schedulers: pruning_lr: class: ExponentialLR gamma: 0.9 Finally, we define the policies section, which specifies the actual scheduling. A Policy manages an instance of a Pruner , Regularizer , Quantizer , or LRScheduler , by naming the instance. In the example below, a PruningPolicy uses the pruner instance named my_pruner : it activates it at a frequency of 2 epochs (i.e. every other epoch), starting at epoch 0, and ending at epoch 38.
policies: - pruner: instance_name : 'my_pruner' starting_epoch: 0 ending_epoch: 38 frequency: 2 - lr_scheduler: instance_name: pruning_lr starting_epoch: 24 ending_epoch: 200 frequency: 1 This is iterative pruning : 1. Train Connectivity 2. Prune Connections 3. Retrain Weights 4. Goto 2 It is described in Learning both Weights and Connections for Efficient Neural Networks : \"Our method prunes redundant connections using a three-step method. First, we train the network to learn which connections are important. Next, we prune the unimportant connections. Finally, we retrain the network to fine tune the weights of the remaining connections...After an initial training phase, we remove all connections whose weight is lower than a threshold. This pruning converts a dense, fully-connected layer to a sparse layer. This first phase learns the topology of the networks \u2014 learning which connections are important and removing the unimportant connections. We then retrain the sparse network so the remaining connections can compensate for the connections that have been removed. The phases of pruning and retraining may be repeated iteratively to further reduce network complexity.\"","title":"Syntax through example"},{"location":"schedule.html#regularization","text":"You can also define and schedule regularization.","title":"Regularization"},{"location":"schedule.html#l1-regularization","text":"Format (this is an informal specification, not a valid ABNF specification): regularizers: REGULARIZER_NAME_STR : class: L1Regularizer reg_regims: PYTORCH_PARAM_NAME_STR : STRENGTH_FLOAT ... PYTORCH_PARAM_NAME_STR : STRENGTH_FLOAT threshold_criteria: [Mean_Abs | Max] For example: version: 1 regularizers: my_L1_reg: class: L1Regularizer reg_regims: 'module.layer3.1.conv1.weight': 0.000002 'module.layer3.1.conv2.weight': 0.000002 'module.layer3.1.conv3.weight': 0.000002 'module.layer3.2.conv1.weight': 0.000002 threshold_criteria: Mean_Abs policies: - regularizer: instance_name: my_L1_reg starting_epoch: 0 ending_epoch: 60 frequency: 1","title":"L1 regularization"},{"location":"schedule.html#group-regularization","text":"Format (informal specification): regularizers: REGULARIZER_NAME_STR : class: GroupLassoRegularizer reg_regims: PYTORCH_PARAM_NAME_STR : [ STRENGTH_FLOAT , '2D' | '3D' | '4D' | 'Channels' | 'Cols' | 'Rows' ] PYTORCH_PARAM_NAME_STR : [ STRENGTH_FLOAT , '2D' | '3D' | '4D' | 'Channels' | 'Cols' | 'Rows' ] threshold_criteria: [Mean_Abs | Max] For example: version: 1 regularizers: my_filter_regularizer: class: GroupLassoRegularizer reg_regims: 'module.layer3.1.conv1.weight': [0.00005, '3D'] 'module.layer3.1.conv2.weight': [0.00005, '3D'] 'module.layer3.1.conv3.weight': [0.00005, '3D'] 'module.layer3.2.conv1.weight': [0.00005, '3D'] threshold_criteria: Mean_Abs policies: - regularizer: instance_name: my_filter_regularizer starting_epoch: 0 ending_epoch: 60 frequency: 1","title":"Group regularization"},{"location":"schedule.html#mixing-it-up","text":"You can mix pruning and regularization.
version: 1 pruners: my_pruner: class: 'SensitivityPruner' sensitivities: 'features.module.0.weight': 0.25 'features.module.3.weight': 0.35 'features.module.6.weight': 0.40 'features.module.8.weight': 0.45 'features.module.10.weight': 0.55 'classifier.1.weight': 0.875 'classifier.4.weight': 0.875 'classifier.6.weight': 0.625 regularizers: 2d_groups_regularizer: class: GroupLassoRegularizer reg_regims: 'features.module.0.weight': [0.000012, '2D'] 'features.module.3.weight': [0.000012, '2D'] 'features.module.6.weight': [0.000012, '2D'] 'features.module.8.weight': [0.000012, '2D'] 'features.module.10.weight': [0.000012, '2D'] lr_schedulers: # Learning rate decay scheduler pruning_lr: class: ExponentialLR gamma: 0.9 policies: - pruner: instance_name : 'my_pruner' starting_epoch: 0 ending_epoch: 38 frequency: 2 - regularizer: instance_name: '2d_groups_regularizer' starting_epoch: 0 ending_epoch: 38 frequency: 1 - lr_scheduler: instance_name: pruning_lr starting_epoch: 24 ending_epoch: 200 frequency: 1","title":"Mixing it up"},{"location":"schedule.html#quantization-aware-training","text":"Similarly to pruners and regularizers, specifying a quantizer in the scheduler YAML follows the constructor arguments of the Quantizer class (see details here ). Note that only a single quantizer instance may be defined per YAML. Let's see an example: quantizers: dorefa_quantizer: class: DorefaQuantizer bits_activations: 8 bits_weights: 4 bits_overrides: conv1: wts: null acts: null relu1: wts: null acts: null final_relu: wts: null acts: null fc: wts: null acts: null The specific quantization method we're instantiating here is DorefaQuantizer . Then we define the default bit-widths for activations and weights, in this case 8 and 4 bits, respectively. Next, we define the bits_overrides mapping. In the example above, we choose not to quantize the first and last layer of the model. In the case of DorefaQuantizer , the weights are quantized as part of the convolution / FC layers, but the activations are quantized in separate layers, which replace the ReLU layers in the original model (remember - even though we replaced the ReLU modules with our own quantization modules, the names of the modules aren't changed). So, in all, we need to reference the first layer with parameters conv1 , the first activation layer relu1 , the last activation layer final_relu and the last layer with parameters fc . Specifying null means \"do not quantize\". Note that for quantizers, we reference names of modules, not names of parameters as we do for pruners and regularizers.","title":"Quantization-Aware Training"},{"location":"schedule.html#defining-overrides-for-groups-of-layers-using-regular-expressions","text":"Suppose we have a sub-module in our model named block1 , which contains multiple convolution layers which we would like to quantize to, say, 2 bits. The convolution layers are named conv1 , conv2 and so on. In that case we would define the following: bits_overrides: 'block1\.conv*': wts: 2 acts: null RegEx Note : Remember that the dot ( . ) is a meta-character (i.e. a reserved character) in regular expressions. So, to match the actual dot characters which separate sub-modules in PyTorch module names, we need to escape it: \. Overlapping patterns are also possible, which makes it possible to define an override for a group of layers and also \"single out\" specific layers for different overrides.
For example, let's take the last example and configure a different override for block1.conv1 : bits_overrides: 'block1\.conv1': wts: 4 acts: null 'block1\.conv*': wts: 2 acts: null Important Note : The patterns are evaluated eagerly - first match wins. So, to properly quantize a model using \"broad\" patterns and more \"specific\" patterns as just shown, make sure the specific pattern is listed before the broad one. The QuantizationPolicy , which controls the quantization procedure during training, is actually quite simplistic. All it does is call the prepare_model() function of the Quantizer when it's initialized, followed by the first call to quantize_params() . Then, at the end of each epoch, after the float copy of the weights has been updated, it calls the quantize_params() function again. policies: - quantizer: instance_name: dorefa_quantizer starting_epoch: 0 ending_epoch: 200 frequency: 1 Important Note : As mentioned here , since the quantizer modifies the model's parameters (assuming training with quantization in the loop is used), the call to prepare_model() must be performed before an optimizer is called. Therefore, currently, the starting epoch for a quantization policy must be 0, otherwise the quantization process will not work as expected. If one wishes to do a \"warm-startup\" (or \"boot-strapping\"), training for a few epochs with full precision and only then starting to quantize, the only way to do this right now is to execute a separate run to generate the boot-strapped weights, and then execute a second run that resumes from the checkpoint with the boot-strapped weights.","title":"Defining overrides for groups of layers using regular expressions"},{"location":"schedule.html#post-training-quantization","text":"Post-training quantization differs from the other techniques described here. Since it is not executed during training, it does not require any Policies or a Scheduler. Currently, the only method implemented for post-training quantization is range-based linear quantization . Quantizing a model using this method requires adding two lines of code: quantizer = distiller.quantization.PostTrainLinearQuantizer(model, quantizer arguments ) quantizer.prepare_model() # Execute evaluation on model as usual See the documentation for PostTrainLinearQuantizer in range_linear.py for details on the available arguments. In addition to directly instantiating the quantizer with arguments, it can also be configured from a YAML file. The syntax for the YAML file is exactly the same as seen in the quantization-aware training section above. Not surprisingly, the class defined must be PostTrainLinearQuantizer , and any other components or policies defined in the YAML file are ignored. We'll see how to create the quantizer in this manner below. If more configurability is needed, a helper function can be used that will add a set of command-line arguments to configure the quantizer: parser = argparse.ArgumentParser() distiller.quantization.add_post_train_quant_args(parser) args = parser.parse_args() These are the available command line arguments: Arguments controlling quantization at evaluation time ( post-training quantization ): --quantize-eval, --qe Apply linear quantization to model before evaluation. Applicable only if --evaluate is also set --qe-calibration PORTION_OF_TEST_SET Run the model in evaluation mode on the specified portion of the test dataset and collect statistics. Ignores all other '--qe-*' arguments --qe-mode QE_MODE, --qem QE_MODE Linear quantization mode.
Choices: sym | asym_s | asym_u --qe-bits-acts NUM_BITS, --qeba NUM_BITS Number of bits for quantization of activations --qe-bits-wts NUM_BITS, --qebw NUM_BITS Number of bits for quantization of weights --qe-bits-accum NUM_BITS Number of bits for quantization of the accumulator --qe-clip-acts, --qeca Enable clipping of activations using min/max values averaging over batch --qe-no-clip-layers LAYER_NAME [LAYER_NAME ...], --qencl LAYER_NAME [LAYER_NAME ...] List of layer names for which not to clip activations. Applicable only if --qe-clip-acts is also set --qe-per-channel, --qepc Enable per-channel quantization of weights (per output channel) --qe-stats-file PATH Path to YAML file with calibration stats. If not given, dynamic quantization will be run (Note that not all layer types are supported for dynamic quantization) --qe-config-file PATH Path to YAML file containing configuration for PostTrainLinearQuantizer (if present, all other --qe* arguments are ignored) (Note that --quantize-eval and --qe-calibration are mutually exclusive.) When using these command line arguments, the quantizer can be invoked as follows: if args.quantize_eval: if args.qe_config_file: quantizer = distiller.config_component_from_file_by_class(model, args.qe_config_file, 'PostTrainLinearQuantizer') else: quantizer = quantization.PostTrainLinearQuantizer(model, args.qe_bits_acts, args.qe_bits_wts, args.qe_bits_accum, None, args.qe_mode, args.qe_clip_acts, args.qe_no_clip_layers, args.qe_per_channel, args.qe_stats_file) quantizer.prepare_model() # Execute evaluation on model as usual Note that the command-line arguments don't expose the bits_overrides parameter of the quantizer, which allows fine-grained control over how each layer is quantized. To utilize this functionality, configure the quantizer with a YAML file. To see integration of these command line arguments in use, see the image classification example . For example invocations of post-training quantization, see here .","title":"Post-Training Quantization"},{"location":"schedule.html#collecting-statistics-for-quantization","text":"To generate statistics that can be used for static quantization of activations, do the following (shown here assuming the --qe-calibration command line argument described above is used, which specifies the portion of the test set to use for statistics generation): if args.qe_calibration: distiller.utils.assign_layer_fq_names(model) msglogger.info('Generating quantization calibration stats based on {0} users'.format(args.qe_calibration)) collector = distiller.data_loggers.QuantCalibrationStatsCollector(model) with collector_context(collector): # Here call your model evaluation function, making sure to execute only # the portion of the dataset specified by the qe_calibration argument yaml_path = 'some/dir/quantization_stats.yaml' collector.save(yaml_path) The generated YAML stats file can then be provided using the --qe-stats-file argument. An example of a generated stats file can be found here .","title":"Collecting Statistics for Quantization"},{"location":"schedule.html#knowledge-distillation","text":"Knowledge distillation (see here ) is also implemented as a Policy , which should be added to the scheduler. However, with the current implementation, it cannot be defined within the YAML file like the rest of the policies described above.
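Before wiring this up, it helps to see what the distillation weights and temperature actually control. As a hedged sketch, this is the standard Hinton-style formulation; consult distiller/knowledge_distillation.py for the exact loss Distiller computes, including whether the \(T^2\) factor is applied and how the teacher-vs-labels weight is used: \[ \mathcal{L}_{total} = w_{student} \cdot \mathcal{L}_{CE}\big(y, \sigma(z_s)\big) + w_{distill} \cdot T^2 \cdot D_{KL}\big(\sigma(z_t / T) \,\big\|\, \sigma(z_s / T)\big) \] where \(z_s\) and \(z_t\) are the student and teacher logits, \(\sigma\) is the softmax, \(T\) is the temperature ( --kd-temp below), and \(w_{student}\), \(w_{distill}\) correspond to --kd-student-wt and --kd-distill-wt .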
To make the integration of this method into applications a bit easier, a helper function can be used that will add a set of command-line arguments related to knowledge distillation: import argparse import distiller parser = argparse.ArgumentParser() distiller.knowledge_distillation.add_distillation_args(parser) (The add_distillation_args function accepts some optional arguments, see its implementation at distiller/knowledge_distillation.py for details) These are the command line arguments exposed by this function: Knowledge Distillation Training Arguments: --kd-teacher ARCH Model architecture for teacher model --kd-pretrained Use pre-trained model for teacher --kd-resume PATH Path to checkpoint from which to load teacher weights --kd-temperature TEMP, --kd-temp TEMP Knowledge distillation softmax temperature --kd-distill-wt WEIGHT, --kd-dw WEIGHT Weight for distillation loss (student vs. teacher soft targets) --kd-student-wt WEIGHT, --kd-sw WEIGHT Weight for student vs. labels loss --kd-teacher-wt WEIGHT, --kd-tw WEIGHT Weight for teacher vs. labels loss --kd-start-epoch EPOCH_NUM Epoch from which to enable distillation Once arguments have been parsed, some initialization code is required, similar to the following: # Assuming: # args variable holds command line arguments # model variable holds the model we're going to train, that is - the student model # compression_scheduler variable holds a CompressionScheduler instance args.kd_policy = None if args.kd_teacher: # Create teacher model - replace this with your model creation code teacher = create_model(args.kd_pretrained, args.dataset, args.kd_teacher, device_ids=args.gpus) if args.kd_resume: teacher, _, _ = apputils.load_checkpoint(teacher, chkpt_file=args.kd_resume) # Create policy and add to scheduler dlw = distiller.DistillationLossWeights(args.kd_distill_wt, args.kd_student_wt, args.kd_teacher_wt) args.kd_policy = distiller.KnowledgeDistillationPolicy(model, teacher, args.kd_temp, dlw) compression_scheduler.add_policy(args.kd_policy, starting_epoch=args.kd_start_epoch, ending_epoch=args.epochs, frequency=1) Finally, during the training loop, we need to perform forward propagation through the teacher model as well. The KnowledgeDistillationPolicy class keeps a reference to both the student and teacher models, and exposes a forward function that performs forward propagation on both of them. Since this is not one of the standard policy callbacks, we need to call this function manually from our training loop, as follows: if args.kd_policy is None: # Revert to a normal forward-prop call if no knowledge distillation policy is present output = model(input_var) else: output = args.kd_policy.forward(input_var) To see this integration in action, take a look at the image classification sample at examples/classifier_compression/compress_classifier.py .","title":"Knowledge Distillation"},{"location":"tutorial-lang_model.html","text":"Using Distiller to prune a PyTorch language model Contents Introduction Setup Preparing the code Training-loop Creating compression baselines Compressing the language model What are we compressing? How are we compressing? When are we compressing? Until next time Introduction In this tutorial I'll show you how to compress a word-level language model using Distiller . Specifically, we use PyTorch\u2019s word-level language model sample code as the code-base of our example, weave in some Distiller code, and show how we compress the model using two different element-wise pruning algorithms. 
To make things manageable, I've divided the tutorial into two parts: in the first we will set up the sample application and prune using AGP . In the second part I'll show how I've added Baidu's RNN pruning algorithm and then used it to prune the same word-level language model. The completed code is available here . The results are displayed below and the code is available here . Note that we can improve the results by training longer, since the loss curves are usually still decreasing at the end of epoch 40. However, for demonstration purposes we don\u2019t need to do this. Type Sparsity NNZ Validation Test Command line Small 0% 7,135,600 101.13 96.29 time python3 main.py --cuda --epochs 40 --tied --wd=1e-6 Medium 0% 28,390,700 88.17 84.21 time python3 main.py --cuda --emsize 650 --nhid 650 --dropout 0.5 --epochs 40 --tied --wd=1e-6 Large 0% 85,917,000 87.49 83.85 time python3 main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --tied --wd=1e-6 Large 70% 25,487,550 90.67 85.96 time python3 main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --tied --compress=../../examples/agp-pruning/word_lang_model.LARGE_70.schedule_agp.yaml Large 70% 25,487,550 90.59 85.84 time python3 main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --tied --compress=../../examples/agp-pruning/word_lang_model.LARGE_70.schedule_agp.yaml --wd=1e-6 Large 70% 25,487,550 87.40 82.93 time python3 main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --tied --compress=../../examples/agp-pruning/word_lang_model.LARGE_70B.schedule_agp.yaml --wd=1e-6 Large 80.4% 16,847,550 89.31 83.64 time python3 main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --tied --compress=../../examples/agp-pruning/word_lang_model.LARGE_80.schedule_agp.yaml --wd=1e-6 Large 90% 8,591,700 90.70 85.67 time python3 main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --tied --compress=../../examples/agp-pruning/word_lang_model.LARGE_90.schedule_agp.yaml --wd=1e-6 Large 95% 4,295,850 98.42 92.79 time python3 main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --tied --compress=../../examples/agp-pruning/word_lang_model.LARGE_95.schedule_agp.yaml --wd=1e-6 Table 1: AGP language model pruning results. NNZ stands for number of non-zero coefficients (embeddings are counted once, because they are tied). Figure 1: Perplexity vs model size (lower perplexity is better). The model is composed of an Encoder embedding, two LSTMs, and a Decoder embedding. The Encoder and decoder embeddings (projections) are tied to improve perplexity results (per https://arxiv.org/pdf/1611.01462.pdf), so in the sparsity statistics we account for only one of the encoder/decoder embeddings. We used the WikiText2 dataset (twice as large as PTB). We compared three model sizes: small (7.1M; 14M), medium (28M; 50M), large (86M; 136M) \u2013 reported as (#parameters net/tied; #parameters gross). The results reported below use a preset seed (for reproducibility), and we expect results can be improved if we allow \u201ctrue\u201d pseudo-randomness. We limited our tests to 40 epochs, even though validation perplexity was still trending down.
Essentially, this recreates the language model experiment in the AGP paper, and validates its conclusions: \u201cWe see that sparse models are able to outperform dense models which have significantly more parameters.\u201d The 80% sparse large model (which has 16.9M parameters and a perplexity of 83.64) is able to outperform the dense medium (which has 28.4M parameters and a perplexity of 84.21), a model which has 1.7 times more parameters. It also outperforms the dense large model, which exemplifies how pruning can act as a regularizer. * \u201cOur results show that pruning works very well not only on the dense LSTM weights and dense softmax layer but also the dense embedding matrix. This suggests that during the optimization procedure the neural network can find a good sparse embedding for the words in the vocabulary that works well together with the sparse connectivity structure of the LSTM weights and softmax layer.\u201d Setup We start by cloning PyTorch\u2019s example repository . I\u2019ve copied the language model code to distiller\u2019s examples/word_language_model directory, so I\u2019ll use that for the rest of the tutorial. Next, let\u2019s create and activate a virtual environment, as explained in Distiller's README file. Now we can turn our attention to main.py , which contains the training application. Preparing the code We begin by adding code to invoke Distiller in file main.py . This involves a bit of mechanics, because we did not pip install Distiller in our environment (we don\u2019t have a setup.py script for Distiller as of yet). To make Distiller library functions accessible from main.py , we modify sys.path to include the distiller root directory by taking the current directory and pointing two directories up. This is very specific to the location of this example code, and it will break if you\u2019ve placed the code elsewhere \u2013 so be aware. import os import sys script_dir = os.path.dirname(__file__) module_path = os.path.abspath(os.path.join(script_dir, '..', '..')) if module_path not in sys.path: sys.path.append(module_path) import distiller import apputils from distiller.data_loggers import TensorBoardLogger, PythonLogger Next, we augment the application arguments with two Distiller-specific arguments. The first, --summary , gives us the ability to do simple compression instrumentation (e.g. log sparsity statistics). The second argument, --compress , is how we tell the application where the compression scheduling file is located. We also add two arguments - momentum and weight-decay - for the SGD optimizer. As I explain later, I replaced the original code's optimizer with SGD, so we need these extra arguments. # Distiller-related arguments SUMMARY_CHOICES = ['sparsity', 'model', 'modules', 'png', 'percentile'] parser.add_argument('--summary', type=str, choices=SUMMARY_CHOICES, help='print a summary of the model, and exit - options: ' + ' | '.join(SUMMARY_CHOICES)) parser.add_argument('--compress', dest='compress', type=str, nargs='?', action='store', help='configuration file for pruning the model (default is to use hard-coded schedule)') parser.add_argument('--momentum', default=0., type=float, metavar='M', help='momentum') parser.add_argument('--weight-decay', '--wd', default=0., type=float, metavar='W', help='weight decay (default: 0.)') We add code to handle the --summary application argument. It can be as simple as forwarding to distiller.model_summary or more complex, as in the Distiller sample.
if args.summary: distiller.model_summary(model, None, args.summary, 'wikitext2') exit(0) Similarly, we add code to handle the --compress argument, which creates a CompressionScheduler and configures it from a YAML schedule file: if args.compress: source = args.compress compression_scheduler = distiller.CompressionScheduler(model) distiller.config.fileConfig(model, None, compression_scheduler, args.compress, msglogger) We also create the optimizer, and the learning-rate decay policy scheduler. The original PyTorch example manually manages the optimization and LR decay process, but I think that having a standard optimizer and LR-decay schedule gives us the flexibility to experiment with these during the training process. Using an SGD optimizer configured with momentum=0 and weight_decay=0 , and a ReduceLROnPlateau LR-decay policy with patience=0 and factor=0.5 will give the same behavior as in the original PyTorch example. From there, we can experiment with the optimizer and LR-decay configuration. optimizer = torch.optim.SGD(model.parameters(), args.lr, momentum=args.momentum, weight_decay=args.weight_decay) lr_scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', patience=0, verbose=True, factor=0.5) Next, we add code to set up the logging backends: a Python logger backend which reads its configuration from file and logs messages to the console and log file ( pylogger ); and a TensorBoard backend logger which logs statistics to a TensorBoard data file ( tflogger ). I configured the TensorBoard backend to log gradients because RNNs suffer from vanishing and exploding gradients, so we might want to take a look in case the training experiences a sudden failure. This code is not strictly required, but it is quite useful to be able to log the session progress, and to export logs to TensorBoard for realtime visualization of the training progress. # Distiller loggers msglogger = apputils.config_pylogger('logging.conf', None) tflogger = TensorBoardLogger(msglogger.logdir) tflogger.log_gradients = True pylogger = PythonLogger(msglogger) Training loop Now we scroll down all the way to the train() function. We'll change its signature to include the epoch , optimizer , and compression_scheduler . We'll soon see why we need these. def train(epoch, optimizer, compression_scheduler=None) Function train() is responsible for training the network in batches for one epoch, and in its epoch loop we want to perform compression. The CompressionScheduler invokes ScheduledTrainingPolicy instances per the scheduling specification that was programmed in the CompressionScheduler instance. There are four main SchedulingPolicy types: PruningPolicy , RegularizationPolicy , LRPolicy , and QuantizationPolicy . We'll be using PruningPolicy , which is triggered on_epoch_begin (to invoke the Pruners ), and on_minibatch_begin (to mask the weights). Later we will create a YAML scheduling file, and specify the schedule of AutomatedGradualPruner instances. Because we are writing a single application, which can be used with various Policies in the future (e.g. group-lasso regularization), we should add code to invoke all of the CompressionScheduler 's callbacks, not just the mandatory on_epoch_begin callback. We invoke on_minibatch_begin before running the forward-pass, before_backward_pass after computing the loss, and on_minibatch_end after completing the backward-pass. def train(epoch, optimizer, compression_scheduler=None): ...
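# (Editor's sketch, not from the original sample: the scheduler callbacks
# below need steps_per_epoch, passed as minibatches_per_epoch, which the
# snippets never show being computed. With bptt-sized slices of the batched
# training data it can be derived as follows; the sample already imports math.)
steps_per_epoch = math.ceil(train_data.size(0) / args.bptt)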
# The line below was fixed as per: https://github.com/pytorch/examples/issues/214 for batch, i in enumerate(range(0, train_data.size(0), args.bptt)): data, targets = get_batch(train_data, i) # Starting each batch, we detach the hidden state from how it was previously produced. # If we didn't, the model would try backpropagating all the way to start of the dataset. hidden = repackage_hidden(hidden) if compression_scheduler: compression_scheduler.on_minibatch_begin(epoch, minibatch_id=batch, minibatches_per_epoch=steps_per_epoch) output, hidden = model(data, hidden) loss = criterion(output.view(-1, ntokens), targets) if compression_scheduler: compression_scheduler.before_backward_pass(epoch, minibatch_id=batch, minibatches_per_epoch=steps_per_epoch, loss=loss) optimizer.zero_grad() loss.backward() # `clip_grad_norm` helps prevent the exploding gradient problem in RNNs / LSTMs. torch.nn.utils.clip_grad_norm_(model.parameters(), args.clip) optimizer.step() total_loss += loss.item() if compression_scheduler: compression_scheduler.on_minibatch_end(epoch, minibatch_id=batch, minibatches_per_epoch=steps_per_epoch) The rest of the code could stay as in the original PyTorch sample, but I wanted to use an SGD optimizer, so I replaced: for p in model.parameters(): p.data.add_(-lr, p.grad.data) with: optimizer.step() The rest of the code in function train() logs to a text file and a TensorBoard backend. Again, such code is not mandatory, but a few lines give us a lot of visibility: we have training progress information saved to log, and we can monitor the training progress in realtime on TensorBoard. That's a lot for a few lines of code ;-) if batch % args.log_interval == 0 and batch > 0: cur_loss = total_loss / args.log_interval elapsed = time.time() - start_time lr = optimizer.param_groups[0]['lr'] msglogger.info( '| epoch {:3d} | {:5d}/{:5d} batches | lr {:02.4f} | ms/batch {:5.2f} ' '| loss {:5.2f} | ppl {:8.2f}'.format( epoch, batch, len(train_data) // args.bptt, lr, elapsed * 1000 / args.log_interval, cur_loss, math.exp(cur_loss))) total_loss = 0 start_time = time.time() stats = ('Performance/Training/', OrderedDict([ ('Loss', cur_loss), ('Perplexity', math.exp(cur_loss)), ('LR', lr), ('Batch Time', elapsed * 1000)]) ) steps_completed = batch + 1 distiller.log_training_progress(stats, model.named_parameters(), epoch, steps_completed, steps_per_epoch, args.log_interval, [tflogger]) Finally, we get to the outer training-loop which loops on args.epochs . We add the two final CompressionScheduler callbacks: on_epoch_begin , at the start of the loop, and on_epoch_end after running evaluate on the model and updating the learning-rate. try: for epoch in range(0, args.epochs): epoch_start_time = time.time() if compression_scheduler: compression_scheduler.on_epoch_begin(epoch) train(epoch, optimizer, compression_scheduler) val_loss = evaluate(val_data) lr_scheduler.step(val_loss) if compression_scheduler: compression_scheduler.on_epoch_end(epoch) And that's it! The language model sample is ready for compression.
Creating compression baselines In To prune, or not to prune: exploring the efficacy of pruning for model compression Zhu and Gupta, \"compare the accuracy of large, but pruned models (large-sparse) and their smaller, but dense (small-dense) counterparts with identical memory footprint.\" They also \"propose a new gradual pruning technique that is simple and straightforward to apply across a variety of models/datasets with minimal tuning.\" This pruning schedule is implemented by distiller.AutomatedGradualPruner (AGP), which increases the sparsity level (expressed as a percentage of zero-valued elements) gradually over several pruning steps. Distiller's implementation only prunes elements once in an epoch (the model is fine-tuned in between pruning events), which is a small deviation from Zhu and Gupta's paper. The research paper specifies the schedule in terms of mini-batches, while our implementation specifies the schedule in terms of epochs. We feel that using epochs performs well, and is more \"stable\", since the number of mini-batches will change, if you change the batch size. Before we start compressing stuff ;-), we need to create baselines so we have something to benchmark against. Let's prepare small, medium, and large baseline models, like Table 3 of To prune, or Not to Prune . These will provide baseline perplexity results that we'll compare the compressed models against. I chose to use tied input/output embeddings, and constrained the training to 40 epochs. The table below shows the model sizes, where we are interested in the tied version (biases are ignored due to their small size and because we don't prune them). Size Number of Weights (untied) Number of Weights (tied) Small 13,951,200 7,295,600 Medium 50,021,400 28,390,700 Large 135,834,000 85,917,000 I started experimenting with the optimizer setup like in the PyTorch example, but I added some L2 regularization when I noticed that the training was overfitting. The two right columns show the perplexity results (lower is better) of each of the models with no L2 regularization and with 1e-5 and 1e-6. In all three model sizes using the smaller L2 regularization (1e-6) gave the best results. BTW, I'm not showing here experiments with even lower regularization because that did not help. Type Command line Validation Test Small time python3 main.py --cuda --epochs 40 --tied 105.23 99.53 Small time python3 main.py --cuda --epochs 40 --tied --wd=1e-6 101.13 96.29 Small time python3 main.py --cuda --epochs 40 --tied --wd=1e-5 109.49 103.53 Medium time python3 main.py --cuda --emsize 650 --nhid 650 --dropout 0.5 --epochs 40 --tied 90.93 86.20 Medium time python3 main.py --cuda --emsize 650 --nhid 650 --dropout 0.5 --epochs 40 --tied --wd=1e-6 88.17 84.21 Medium time python3 main.py --cuda --emsize 650 --nhid 650 --dropout 0.5 --epochs 40 --tied --wd=1e-5 97.75 93.06 Large time python3 main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --tied 88.23 84.21 Large time python3 main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --tied --wd=1e-6 87.49 83.85 Large time python3 main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --tied --wd=1e-5 99.22 94.28 Compressing the language model OK, so now let's recreate the results of the language model experiment from section 4.2 of the paper. We're using PyTorch's sample, so the language model we implement is not exactly like the one in the AGP paper (and uses a different dataset), but it's close enough, so if everything goes well, we should see similar compression results.
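Before diving in, it helps to have AGP's sparsity ramp in mind, since the pruners we configure below expose exactly its knobs. From Zhu and Gupta's paper (with the pruning step \(t\) counted in epochs in Distiller's implementation, per the deviation noted above): \[ s_t = s_f + (s_i - s_f)\left(1 - \frac{t - t_0}{n\Delta t}\right)^3 \quad \text{for } t \in \{t_0,\, t_0 + \Delta t,\, \dots,\, t_0 + n\Delta t\} \] Here \(t_0\) is the epoch at which pruning starts, \(\Delta t\) is the pruning frequency, \(n\) is the number of pruning steps, and \(s_i\) and \(s_f\) correspond to the initial_sparsity and final_sparsity fields we set in the YAML below.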
What are we compressing? To gain insight about the model parameters, we can use the command-line to produce a weights-sparsity table: $ python3 main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --tied --summary=sparsity Parameters: +---------+------------------+---------------+---------------+----------------+------------+------------+----------+----------+----------+------------+---------+----------+------------+ | | Name | Shape | NNZ (dense) | NNZ (sparse) | Cols (%) | Rows (%) | Ch (%) | 2D (%) | 3D (%) | Fine (%) | Std | Mean | Abs-Mean | |---------+------------------+---------------+---------------+----------------+------------+------------+----------+----------+----------+------------+---------+----------+------------| | 0.00000 | encoder.weight | (33278, 1500) | 49917000 | 49916999 | 0.00000 | 0.00000 | 0 | 0.00000 | 0 | 0.00000 | 0.05773 | -0.00000 | 0.05000 | | 1.00000 | rnn.weight_ih_l0 | (6000, 1500) | 9000000 | 9000000 | 0.00000 | 0.00000 | 0 | 0.00000 | 0 | 0.00000 | 0.01491 | 0.00001 | 0.01291 | | 2.00000 | rnn.weight_hh_l0 | (6000, 1500) | 9000000 | 8999999 | 0.00000 | 0.00000 | 0 | 0.00000 | 0 | 0.00001 | 0.01491 | 0.00000 | 0.01291 | | 3.00000 | rnn.weight_ih_l1 | (6000, 1500) | 9000000 | 8999999 | 0.00000 | 0.00000 | 0 | 0.00000 | 0 | 0.00001 | 0.01490 | -0.00000 | 0.01291 | | 4.00000 | rnn.weight_hh_l1 | (6000, 1500) | 9000000 | 9000000 | 0.00000 | 0.00000 | 0 | 0.00000 | 0 | 0.00000 | 0.01491 | -0.00000 | 0.01291 | | 5.00000 | decoder.weight | (33278, 1500) | 49917000 | 49916999 | 0.00000 | 0.00000 | 0 | 0.00000 | 0 | 0.00000 | 0.05773 | -0.00000 | 0.05000 | | 6.00000 | Total sparsity: | - | 135834000 | 135833996 | 0.00000 | 0.00000 | 0 | 0.00000 | 0 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | +---------+------------------+---------------+---------------+----------------+------------+------------+----------+----------+----------+------------+---------+----------+------------+ Total sparsity: 0.00 So what's going on here? encoder.weight and decoder.weight are the input and output embeddings, respectively. Remember that in the configuration I chose for the three model sizes these embeddings are tied, which means that we only have one copy of parameters, that is shared between the encoder and decoder. We also have two pairs of RNN (LSTM really) parameters. There is a pair because the model uses the command-line argument args.nlayers to decide how many instances of RNN (or LSTM or GRU) cells to use, and it defaults to 2. The recurrent cells are LSTM cells, because this is the default of args.model , which is used in the initialization of RNNModel . Let's look at the parameters of the first RNN: rnn.weight_ih_l0 and rnn.weight_hh_l0 : what are these? Recall the LSTM equations that PyTorch implements. In the equations, there are 8 instances of vector-matrix multiplication (when batch=1). These can be combined into a single matrix-matrix multiplication (GEMM), but PyTorch groups these into two GEMM operations: one GEMM multiplies the inputs ( rnn.weight_ih_l0 ), and the other multiplies the hidden-state ( rnn.weight_hh_l0 ). How are we compressing? Let's turn to the configurations of the Large language model compression schedule to 70%, 80%, 90% and 95% sparsity. Using AGP it is easy to configure the pruning schedule to produce an exact sparsity of the compressed model. I'll use the 70% schedule to show a concrete example. The YAML file has two sections: pruners and policies . 
Section pruners defines instances of ParameterPruner - in our case we define three instances of AutomatedGradualPruner : for the weights of the first RNN ( l0_rnn_pruner ), the second RNN ( l1_rnn_pruner ) and the embedding layer ( embedding_pruner ). These names are arbitrary, and serve as name-handles which bind Policies to Pruners - so you can use whatever names you want. Each AutomatedGradualPruner is configured with an initial_sparsity and final_sparsity . For example, the l0_rnn_pruner below is configured to prune 5% of the weights as soon as it starts working, and finish when 70% of the weights have been pruned. The weights parameter tells the Pruner which weight tensors to prune. pruners: l0_rnn_pruner: class: AutomatedGradualPruner initial_sparsity : 0.05 final_sparsity: 0.70 weights: [rnn.weight_ih_l0, rnn.weight_hh_l0] l1_rnn_pruner: class: AutomatedGradualPruner initial_sparsity : 0.05 final_sparsity: 0.70 weights: [rnn.weight_ih_l1, rnn.weight_hh_l1] embedding_pruner: class: AutomatedGradualPruner initial_sparsity : 0.05 final_sparsity: 0.70 weights: [encoder.weight] When are we compressing? If the pruners section defines \"what-to-do\", the policies section defines \"when-to-do\". This part is harder, because we define the pruning schedule, which requires us to try a few different schedules until we understand which schedule works best. Below we define three PruningPolicy instances. The first two instances start operating at epoch 2 ( starting_epoch ), end at epoch 20 ( ending_epoch ), and operate once every epoch ( frequency ; as I explained above, Distiller's Pruning scheduling operates only at on_epoch_begin ). In between pruning operations, the pruned model is fine-tuned. policies: - pruner: instance_name : l0_rnn_pruner starting_epoch: 2 ending_epoch: 20 frequency: 1 - pruner: instance_name : l1_rnn_pruner starting_epoch: 2 ending_epoch: 20 frequency: 1 - pruner: instance_name : embedding_pruner starting_epoch: 3 ending_epoch: 21 frequency: 1 We invoke the compression as follows: $ time python3 main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --tied --compress=../../examples/agp-pruning/word_lang_model.LARGE_70.schedule_agp.yaml Table 1 above shows that we can make a negligible improvement when adding L2 regularization. I did some experimenting with the sparsity distribution between the layers and the scheduling frequency, and noticed that the embedding layers are much less sensitive to pruning than the RNN cells. I didn't notice any difference between the RNN cells, but I also didn't invest in this exploration. A new 70% sparsity schedule prunes the RNNs only to 50% sparsity, but prunes the embedding to 85% sparsity, and achieves almost a 3-point improvement in the test perplexity results. We provide similar pruning schedules for the other compression rates. Until next time This concludes the first part of the tutorial on pruning a PyTorch language model. In the next installment, I'll explain how we added an implementation of Baidu Research's Exploring Sparsity in Recurrent Neural Networks paper, and applied it to this language model. Geek On.","title":"Pruning a Language Model"},{"location":"tutorial-lang_model.html#using-distiller-to-prune-a-pytorch-language-model","text":"","title":"Using Distiller to prune a PyTorch language model"},{"location":"tutorial-lang_model.html#contents","text":"Introduction Setup Preparing the code Training-loop Creating compression baselines Compressing the language model What are we compressing? How are we compressing?
When are we compressing? Until next time","title":"Contents"},{"location":"tutorial-lang_model.html#introduction","text":"In this tutorial I'll show you how to compress a word-level language model using Distiller . Specifically, we use PyTorch\u2019s word-level language model sample code as the code-base of our example, weave in some Distiller code, and show how we compress the model using two different element-wise pruning algorithms. To make things manageable, I've divided the tutorial into two parts: in the first we will set up the sample application and prune using AGP . In the second part I'll show how I've added Baidu's RNN pruning algorithm and then used it to prune the same word-level language model. The completed code is available here . The results are displayed below and the code is available here . Note that we can improve the results by training longer, since the loss curves are usually still decreasing at the end of epoch 40. However, for demonstration purposes we don\u2019t need to do this. Type Sparsity NNZ Validation Test Command line Small 0% 7,135,600 101.13 96.29 time python3 main.py --cuda --epochs 40 --tied --wd=1e-6 Medium 0% 28,390,700 88.17 84.21 time python3 main.py --cuda --emsize 650 --nhid 650 --dropout 0.5 --epochs 40 --tied --wd=1e-6 Large 0% 85,917,000 87.49 83.85 time python3 main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --tied --wd=1e-6 Large 70% 25,487,550 90.67 85.96 time python3 main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --tied --compress=../../examples/agp-pruning/word_lang_model.LARGE_70.schedule_agp.yaml Large 70% 25,487,550 90.59 85.84 time python3 main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --tied --compress=../../examples/agp-pruning/word_lang_model.LARGE_70.schedule_agp.yaml --wd=1e-6 Large 70% 25,487,550 87.40 82.93 time python3 main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --tied --compress=../../examples/agp-pruning/word_lang_model.LARGE_70B.schedule_agp.yaml --wd=1e-6 Large 80.4% 16,847,550 89.31 83.64 time python3 main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --tied --compress=../../examples/agp-pruning/word_lang_model.LARGE_80.schedule_agp.yaml --wd=1e-6 Large 90% 8,591,700 90.70 85.67 time python3 main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --tied --compress=../../examples/agp-pruning/word_lang_model.LARGE_90.schedule_agp.yaml --wd=1e-6 Large 95% 4,295,850 98.42 92.79 time python3 main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --tied --compress=../../examples/agp-pruning/word_lang_model.LARGE_95.schedule_agp.yaml --wd=1e-6 Table 1: AGP language model pruning results. NNZ stands for number of non-zero coefficients (embeddings are counted once, because they are tied). Figure 1: Perplexity vs model size (lower perplexity is better). The model is composed of an Encoder embedding, two LSTMs, and a Decoder embedding. The Encoder and decoder embeddings (projections) are tied to improve perplexity results (per https://arxiv.org/pdf/1611.01462.pdf), so in the sparsity statistics we account for only one of the encoder/decoder embeddings. We used the WikiText2 dataset (twice as large as PTB). We compared three model sizes: small (7.1M; 14M), medium (28M; 50M), large (86M; 136M) \u2013 reported as (#parameters net/tied; #parameters gross). The results reported below use a preset seed (for reproducibility), and we expect results can be improved if we allow \u201ctrue\u201d pseudo-randomness.
We limited our tests to 40 epochs, even though validation perplexity was still trending down. Essentially, this recreates the language model experiment in the AGP paper, and validates its conclusions: \u201cWe see that sparse models are able to outperform dense models which have significantly more parameters.\u201d The 80% sparse large model (which has 16.9M parameters and a perplexity of 83.64) is able to outperform the dense medium (which has 28.4M parameters and a perplexity of 84.21), a model which has 1.7 times more parameters. It also outperforms the dense large model, which exemplifies how pruning can act as a regularizer. * \u201cOur results show that pruning works very well not only on the dense LSTM weights and dense softmax layer but also the dense embedding matrix. This suggests that during the optimization procedure the neural network can find a good sparse embedding for the words in the vocabulary that works well together with the sparse connectivity structure of the LSTM weights and softmax layer.\u201d","title":"Introduction"},{"location":"tutorial-lang_model.html#setup","text":"We start by cloning PyTorch\u2019s example repository . I\u2019ve copied the language model code to distiller\u2019s examples/word_language_model directory, so I\u2019ll use that for the rest of the tutorial. Next, let\u2019s create and activate a virtual environment, as explained in Distiller's README file. Now we can turn our attention to main.py , which contains the training application.","title":"Setup"},{"location":"tutorial-lang_model.html#preparing-the-code","text":"We begin by adding code to invoke Distiller in file main.py . This involves a bit of mechanics, because we did not pip install Distiller in our environment (we don\u2019t have a setup.py script for Distiller as of yet). To make Distiller library functions accessible from main.py , we modify sys.path to include the distiller root directory by taking the current directory and pointing two directories up. This is very specific to the location of this example code, and it will break if you\u2019ve placed the code elsewhere \u2013 so be aware. import os import sys script_dir = os.path.dirname(__file__) module_path = os.path.abspath(os.path.join(script_dir, '..', '..')) if module_path not in sys.path: sys.path.append(module_path) import distiller import apputils from distiller.data_loggers import TensorBoardLogger, PythonLogger Next, we augment the application arguments with two Distiller-specific arguments. The first, --summary , gives us the ability to do simple compression instrumentation (e.g. log sparsity statistics). The second argument, --compress , is how we tell the application where the compression scheduling file is located. We also add two arguments - momentum and weight-decay - for the SGD optimizer. As I explain later, I replaced the original code's optimizer with SGD, so we need these extra arguments.
# Distiller-related arguments SUMMARY_CHOICES = ['sparsity', 'model', 'modules', 'png', 'percentile'] parser.add_argument('--summary', type=str, choices=SUMMARY_CHOICES, help='print a summary of the model, and exit - options: ' + ' | '.join(SUMMARY_CHOICES)) parser.add_argument('--compress', dest='compress', type=str, nargs='?', action='store', help='configuration file for pruning the model (default is to use hard-coded schedule)') parser.add_argument('--momentum', default=0., type=float, metavar='M', help='momentum') parser.add_argument('--weight-decay', '--wd', default=0., type=float, metavar='W', help='weight decay (default: 0.)') We add code to handle the --summary application argument. It can be as simple as forwarding to distiller.model_summary or more complex, as in the Distiller sample. if args.summary: distiller.model_summary(model, None, args.summary, 'wikitext2') exit(0) Similarly, we add code to handle the --compress argument, which creates a CompressionScheduler and configures it from a YAML schedule file: if args.compress: source = args.compress compression_scheduler = distiller.CompressionScheduler(model) distiller.config.fileConfig(model, None, compression_scheduler, args.compress, msglogger) We also create the optimizer, and the learning-rate decay policy scheduler. The original PyTorch example manually manages the optimization and LR decay process, but I think that having a standard optimizer and LR-decay schedule gives us the flexibility to experiment with these during the training process. Using an SGD optimizer configured with momentum=0 and weight_decay=0 , and a ReduceLROnPlateau LR-decay policy with patience=0 and factor=0.5 will give the same behavior as in the original PyTorch example. From there, we can experiment with the optimizer and LR-decay configuration. optimizer = torch.optim.SGD(model.parameters(), args.lr, momentum=args.momentum, weight_decay=args.weight_decay) lr_scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', patience=0, verbose=True, factor=0.5) Next, we add code to set up the logging backends: a Python logger backend which reads its configuration from file and logs messages to the console and log file ( pylogger ); and a TensorBoard backend logger which logs statistics to a TensorBoard data file ( tflogger ). I configured the TensorBoard backend to log gradients because RNNs suffer from vanishing and exploding gradients, so we might want to take a look in case the training experiences a sudden failure. This code is not strictly required, but it is quite useful to be able to log the session progress, and to export logs to TensorBoard for realtime visualization of the training progress. # Distiller loggers msglogger = apputils.config_pylogger('logging.conf', None) tflogger = TensorBoardLogger(msglogger.logdir) tflogger.log_gradients = True pylogger = PythonLogger(msglogger)","title":"Preparing the code"},{"location":"tutorial-lang_model.html#training-loop","text":"Now we scroll down all the way to the train() function. We'll change its signature to include the epoch , optimizer , and compression_scheduler . We'll soon see why we need these. def train(epoch, optimizer, compression_scheduler=None) Function train() is responsible for training the network in batches for one epoch, and in its epoch loop we want to perform compression. The CompressionScheduler invokes ScheduledTrainingPolicy instances per the scheduling specification that was programmed in the CompressionScheduler instance.
There are four main SchedulingPolicy types: PruningPolicy , RegularizationPolicy , LRPolicy , and QuantizationPolicy . We'll be using PruningPolicy , which is triggered on_epoch_begin (to invoke the Pruners ), and on_minibatch_begin (to mask the weights). Later we will create a YAML scheduling file, and specify the schedule of AutomatedGradualPruner instances. Because we are writing a single application, which can be used with various Policies in the future (e.g. group-lasso regularization), we should add code to invoke all of the CompressionScheduler 's callbacks, not just the mandatory on_epoch_begin callback. We invoke on_minibatch_begin before running the forward-pass, before_backward_pass after computing the loss, and on_minibatch_end after completing the backward-pass. def train(epoch, optimizer, compression_scheduler=None): ... # The line below was fixed as per: https://github.com/pytorch/examples/issues/214 for batch, i in enumerate(range(0, train_data.size(0), args.bptt)): data, targets = get_batch(train_data, i) # Starting each batch, we detach the hidden state from how it was previously produced. # If we didn't, the model would try backpropagating all the way to start of the dataset. hidden = repackage_hidden(hidden) if compression_scheduler: compression_scheduler.on_minibatch_begin(epoch, minibatch_id=batch, minibatches_per_epoch=steps_per_epoch) output, hidden = model(data, hidden) loss = criterion(output.view(-1, ntokens), targets) if compression_scheduler: compression_scheduler.before_backward_pass(epoch, minibatch_id=batch, minibatches_per_epoch=steps_per_epoch, loss=loss) optimizer.zero_grad() loss.backward() # `clip_grad_norm` helps prevent the exploding gradient problem in RNNs / LSTMs. torch.nn.utils.clip_grad_norm_(model.parameters(), args.clip) optimizer.step() total_loss += loss.item() if compression_scheduler: compression_scheduler.on_minibatch_end(epoch, minibatch_id=batch, minibatches_per_epoch=steps_per_epoch) The rest of the code could stay as in the original PyTorch sample, but I wanted to use an SGD optimizer, so I replaced: for p in model.parameters(): p.data.add_(-lr, p.grad.data) with: optimizer.step() The rest of the code in function train() logs to a text file and a TensorBoard backend. Again, such code is not mandatory, but a few lines give us a lot of visibility: we have training progress information saved to log, and we can monitor the training progress in realtime on TensorBoard. That's a lot for a few lines of code ;-) if batch % args.log_interval == 0 and batch > 0: cur_loss = total_loss / args.log_interval elapsed = time.time() - start_time lr = optimizer.param_groups[0]['lr'] msglogger.info( '| epoch {:3d} | {:5d}/{:5d} batches | lr {:02.4f} | ms/batch {:5.2f} ' '| loss {:5.2f} | ppl {:8.2f}'.format( epoch, batch, len(train_data) // args.bptt, lr, elapsed * 1000 / args.log_interval, cur_loss, math.exp(cur_loss))) total_loss = 0 start_time = time.time() stats = ('Performance/Training/', OrderedDict([ ('Loss', cur_loss), ('Perplexity', math.exp(cur_loss)), ('LR', lr), ('Batch Time', elapsed * 1000)]) ) steps_completed = batch + 1 distiller.log_training_progress(stats, model.named_parameters(), epoch, steps_completed, steps_per_epoch, args.log_interval, [tflogger]) Finally, we get to the outer training-loop which loops on args.epochs . We add the two final CompressionScheduler callbacks: on_epoch_begin , at the start of the loop, and on_epoch_end after running evaluate on the model and updating the learning-rate.
try: for epoch in range(0, args.epochs): epoch_start_time = time.time() if compression_scheduler: compression_scheduler.on_epoch_begin(epoch) train(epoch, optimizer, compression_scheduler) val_loss = evaluate(val_data) lr_scheduler.step(val_loss) if compression_scheduler: compression_scheduler.on_epoch_end(epoch) And that's it! The language model sample is ready for compression.","title":"Training loop"},{"location":"tutorial-lang_model.html#creating-compression-baselines","text":"In To prune, or not to prune: exploring the efficacy of pruning for model compression Zhu and Gupta, \"compare the accuracy of large, but pruned models (large-sparse) and their smaller, but dense (small-dense) counterparts with identical memory footprint.\" They also \"propose a new gradual pruning technique that is simple and straightforward to apply across a variety of models/datasets with minimal tuning.\" This pruning schedule is implemented by distiller.AutomatedGradualPruner (AGP), which increases the sparsity level (expressed as a percentage of zero-valued elements) gradually over several pruning steps. Distiller's implementation only prunes elements once in an epoch (the model is fine-tuned in between pruning events), which is a small deviation from Zhu and Gupta's paper. The research paper specifies the schedule in terms of mini-batches, while our implementation specifies the schedule in terms of epochs. We feel that using epochs performs well, and is more \"stable\", since the number of mini-batches will change, if you change the batch size. Before we start compressing stuff ;-), we need to create baselines so we have something to benchmark against. Let's prepare small, medium, and large baseline models, like Table 3 of To prune, or Not to Prune . These will provide baseline perplexity results that we'll compare the compressed models against. I chose to use tied input/output embeddings, and constrained the training to 40 epochs. The table below shows the model sizes, where we are interested in the tied version (biases are ignored due to their small size and because we don't prune them). Size Number of Weights (untied) Number of Weights (tied) Small 13,951,200 7,295,600 Medium 50,021,400 28,390,700 Large 135,834,000 85,917,000 I started experimenting with the optimizer setup like in the PyTorch example, but I added some L2 regularization when I noticed that the training was overfitting. The two right columns show the perplexity results (lower is better) of each of the models with no L2 regularization and with 1e-5 and 1e-6. In all three model sizes using the smaller L2 regularization (1e-6) gave the best results. BTW, I'm not showing here experiments with even lower regularization because that did not help. 
Type Command line Validation Test Small time python3 main.py --cuda --epochs 40 --tied 105.23 99.53 Small time python3 main.py --cuda --epochs 40 --tied --wd=1e-6 101.13 96.29 Small time python3 main.py --cuda --epochs 40 --tied --wd=1e-5 109.49 103.53 Medium time python3 main.py --cuda --emsize 650 --nhid 650 --dropout 0.5 --epochs 40 --tied 90.93 86.20 Medium time python3 main.py --cuda --emsize 650 --nhid 650 --dropout 0.5 --epochs 40 --tied --wd=1e-6 88.17 84.21 Medium time python3 main.py --cuda --emsize 650 --nhid 650 --dropout 0.5 --epochs 40 --tied --wd=1e-5 97.75 93.06 Large time python3 main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --tied 88.23 84.21 Large time python3 main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --tied --wd=1e-6 87.49 83.85 Large time python3 main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --tied --wd=1e-5 99.22 94.28","title":"Creating compression baselines"},{"location":"tutorial-lang_model.html#compressing-the-language-model","text":"OK, so now let's recreate the results of the language model experiment from section 4.2 of the paper. We're using PyTorch's sample, so the language model we implement is not exactly like the one in the AGP paper (and uses a different dataset), but it's close enough, so if everything goes well, we should see similar compression results.","title":"Compressing the language model"},{"location":"tutorial-lang_model.html#what-are-we-compressing","text":"To gain insight about the model parameters, we can use the command-line to produce a weights-sparsity table: $ python3 main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --tied --summary=sparsity Parameters: +---------+------------------+---------------+---------------+----------------+------------+------------+----------+----------+----------+------------+---------+----------+------------+ | | Name | Shape | NNZ (dense) | NNZ (sparse) | Cols (%) | Rows (%) | Ch (%) | 2D (%) | 3D (%) | Fine (%) | Std | Mean | Abs-Mean | |---------+------------------+---------------+---------------+----------------+------------+------------+----------+----------+----------+------------+---------+----------+------------| | 0.00000 | encoder.weight | (33278, 1500) | 49917000 | 49916999 | 0.00000 | 0.00000 | 0 | 0.00000 | 0 | 0.00000 | 0.05773 | -0.00000 | 0.05000 | | 1.00000 | rnn.weight_ih_l0 | (6000, 1500) | 9000000 | 9000000 | 0.00000 | 0.00000 | 0 | 0.00000 | 0 | 0.00000 | 0.01491 | 0.00001 | 0.01291 | | 2.00000 | rnn.weight_hh_l0 | (6000, 1500) | 9000000 | 8999999 | 0.00000 | 0.00000 | 0 | 0.00000 | 0 | 0.00001 | 0.01491 | 0.00000 | 0.01291 | | 3.00000 | rnn.weight_ih_l1 | (6000, 1500) | 9000000 | 8999999 | 0.00000 | 0.00000 | 0 | 0.00000 | 0 | 0.00001 | 0.01490 | -0.00000 | 0.01291 | | 4.00000 | rnn.weight_hh_l1 | (6000, 1500) | 9000000 | 9000000 | 0.00000 | 0.00000 | 0 | 0.00000 | 0 | 0.00000 | 0.01491 | -0.00000 | 0.01291 | | 5.00000 | decoder.weight | (33278, 1500) | 49917000 | 49916999 | 0.00000 | 0.00000 | 0 | 0.00000 | 0 | 0.00000 | 0.05773 | -0.00000 | 0.05000 | | 6.00000 | Total sparsity: | - | 135834000 | 135833996 | 0.00000 | 0.00000 | 0 | 0.00000 | 0 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | +---------+------------------+---------------+---------------+----------------+------------+------------+----------+----------+----------+------------+---------+----------+------------+ Total sparsity: 0.00 So what's going on here? encoder.weight and decoder.weight are the input and output embeddings, respectively.
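In fact, in the tied configuration these two names refer to a single tensor. As a pointer (a sketch assuming the standard PyTorch word_language_model sample's model.py is used), the tying amounts to one assignment in the model's constructor:

if tie_weights:
    # Reuse the encoder's weight tensor as the decoder's projection,
    # so there is only one embedding matrix to train - and to prune
    self.decoder.weight = self.encoder.weight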
Remember that in the configuration I chose for the three model sizes these embeddings are tied, which means that we only have one copy of parameters that is shared between the encoder and decoder. We also have two pairs of RNN (LSTM really) parameters. There are two pairs because the model uses the command-line argument args.nlayers to decide how many instances of RNN (or LSTM or GRU) cells to use, and it defaults to 2. The recurrent cells are LSTM cells, because this is the default of args.model , which is used in the initialization of RNNModel . Let's look at the parameters of the first RNN: rnn.weight_ih_l0 and rnn.weight_hh_l0 : what are these? Recall the LSTM equations that PyTorch implements. In the equations, there are 8 instances of vector-matrix multiplication (when batch=1). These can be combined into a single matrix-matrix multiplication (GEMM), but PyTorch groups them into two GEMM operations: one GEMM multiplies the inputs ( rnn.weight_ih_l0 ), and the other multiplies the hidden-state ( rnn.weight_hh_l0 ).","title":"What are we compressing?"},{"location":"tutorial-lang_model.html#how-are-we-compressing","text":"Let's turn to the configurations that compress the Large language model to 70%, 80%, 90% and 95% sparsity. Using AGP it is easy to configure the pruning schedule to produce an exact sparsity of the compressed model. I'll use the 70% schedule to show a concrete example. The YAML file has two sections: pruners and policies . Section pruners defines instances of ParameterPruner - in our case we define three instances of AutomatedGradualPruner : for the weights of the first RNN ( l0_rnn_pruner ), the second RNN ( l1_rnn_pruner ) and the embedding layer ( embedding_pruner ). These names are arbitrary, and serve as name-handles that bind Policies to Pruners - so you can use whatever names you want. Each AutomatedGradualPruner is configured with an initial_sparsity and final_sparsity . For example, the l0_rnn_pruner below is configured to prune 5% of the weights as soon as it starts working, and finish when 70% of the weights have been pruned. The weights parameter tells the Pruner which weight tensors to prune. pruners: l0_rnn_pruner: class: AutomatedGradualPruner initial_sparsity: 0.05 final_sparsity: 0.70 weights: [rnn.weight_ih_l0, rnn.weight_hh_l0] l1_rnn_pruner: class: AutomatedGradualPruner initial_sparsity: 0.05 final_sparsity: 0.70 weights: [rnn.weight_ih_l1, rnn.weight_hh_l1] embedding_pruner: class: AutomatedGradualPruner initial_sparsity: 0.05 final_sparsity: 0.70 weights: [encoder.weight]","title":"How are we compressing?"},{"location":"tutorial-lang_model.html#when-are-we-compressing","text":"If the pruners section defines \"what-to-do\", the policies section defines \"when-to-do\". This part is harder, because we define the pruning schedule, which requires us to try a few different schedules until we understand which schedule works best. Below we define three PruningPolicy instances. The first two instances start operating at epoch 2 ( starting_epoch ), end at epoch 20 ( ending_epoch ), and operate once every epoch ( frequency ; as I explained above, Distiller's Pruning scheduling operates only at on_epoch_begin ). In between pruning operations, the pruned model is fine-tuned.
policies: - pruner: instance_name: l0_rnn_pruner starting_epoch: 2 ending_epoch: 20 frequency: 1 - pruner: instance_name: l1_rnn_pruner starting_epoch: 2 ending_epoch: 20 frequency: 1 - pruner: instance_name: embedding_pruner starting_epoch: 3 ending_epoch: 21 frequency: 1 We invoke the compression as follows: $ time python3 main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --tied --compress=../../examples/agp-pruning/word_lang_model.LARGE_70.schedule_agp.yaml Table 1 above shows that we can make a negligible improvement when adding L2 regularization. I did some experimenting with the sparsity distribution between the layers and with the scheduling frequency, and noticed that the embedding layers are much less sensitive to pruning than the RNN cells. I didn't notice any difference between the RNN cells, but I also didn't invest in this exploration. A new 70% sparsity schedule prunes the RNNs only to 50% sparsity, but prunes the embedding to 85% sparsity, and achieves almost a 3-point improvement in the test perplexity results. We provide similar pruning schedules for the other compression rates.","title":"When are we compressing?"},{"location":"tutorial-lang_model.html#until-next-time","text":"This concludes the first part of the tutorial on pruning a PyTorch language model. In the next installment, I'll explain how we added an implementation of Baidu Research's Exploring Sparsity in Recurrent Neural Networks paper, and applied it to this language model. Geek On.","title":"Until next time"},{"location":"tutorial-struct_pruning.html","text":"Pruning Filters & Channels Introduction Channel and filter pruning are examples of structured pruning, which creates compressed models that do not require special hardware to execute. This latter fact makes this form of structured pruning particularly interesting and popular. In networks that have serial data dependencies, it is pretty straightforward to understand and define how to prune channels and filters. However, in more complex models, with parallel-data dependencies (paths) - such as ResNets (skip connections) and GoogLeNet (Inception layers) \u2013 things become increasingly complex and require a deeper understanding of the data flow in the model, in order to define the pruning schedule. This post explains channel and filter pruning, the challenges, and how to define a Distiller pruning schedule for these structures. The details of the implementation are left for a separate post. Before we dive into pruning, let\u2019s level-set on the terminology, because different people (and even research papers) do not always agree on the nomenclature. This reflects my understanding of the nomenclature, and therefore these are the names used in Distiller. I\u2019ll restrict this discussion to Convolution layers in CNNs, to contain the scope of the topic I\u2019ll be covering, although Distiller supports pruning of other structures such as matrix columns and rows. PyTorch describes torch.nn.Conv2d as applying \u201ca 2D convolution over an input signal composed of several input planes.\u201d We call each of these input planes a feature-map (or FM, for short). Another name is input channel , as in the R/G/B channels of an image. Some people refer to feature-maps as activations (i.e. the activation of neurons), although I think strictly speaking activations are the output of an activation layer that was fed a group of feature-maps.
Because it is very common, and because the use of an activation is orthogonal to our discussion, I will use activations to refer to the output of a Convolution layer (i.e. 3D stack of feature-maps). In the PyTorch documentation Convolution outputs have shape (N, C out , H out , W out ) where N is a batch size, C out denotes a number of output channels, H out is a height of output planes in pixels, and W out is width in pixels. We won\u2019t be paying much attention to the batch-size since it\u2019s not important to our discussion, so without loss of generality we can set N=1. I\u2019m also assuming the most common case of Convolutions with groups==1 . Convolution weights are 4D: (F, C, K, K) where F is the number of filters, C is the number of channels, and K is the kernel size (we can assume the kernel height and width are equal for simplicity). A kernel is a 2D matrix (K, K) that is part of a 3D feature detector. This feature detector is called a filter and it is basically a stack of 2D kernels . Each kernel is convolved with a 2D input channel (i.e. feature-map) so if there are C in channels in the input, then there are C in kernels in a filter (C == C in ). Each filter is convolved with the entire input to create a single output channel (i.e. feature-map). If there are C out output channels, then there are C out filters (F == C out ). Filter Pruning Filter pruning and channel pruning are very similar, and I\u2019ll expand on that similarity later on \u2013 but for now let\u2019s focus on filter pruning. In filter pruning we use some criterion to determine which filters are important and which are not. Researchers came up with all sorts of pruning criteria: the L1-magnitude of the filters (citation), the entropy of the activations (citation), and the classification accuracy reduction (citation) are just some examples. Disregarding how we choose the filters to prune, let\u2019s imagine that in the diagram below, we chose to prune (remove) the green and orange filters (the circle with the \u201c*\u201d designates a Convolution operation). Since we have two fewer filters operating on the input, we must have two fewer output feature-maps. So when we prune filters, besides changing the physical size of the weight tensors, we also need to reconfigure the immediate Convolution layer (change its out_channels ) and the following Convolution layer (change its in_channels ). And finally, because the next layer\u2019s input is now smaller (has fewer channels), we should also shrink the next layer\u2019s weights tensors, by removing the channels corresponding to the filters we pruned. We say that there is a data-dependency between the two Convolution layers. I didn\u2019t make any mention of the activation function that usually follows Convolution, because these functions are parameter-less and are not sensitive to the shape of their input. There are some other dependencies that Distiller resolves (such as Optimizer parameters tightly-coupled to the weights) that I won\u2019t discuss here, because they are implementation details. The scheduler YAML syntax for this example is pasted below. We use L1-norm ranking of weight filters, and the pruning-rate is set by the AGP algorithm (Automated Gradual Pruning). The Convolution layers are conveniently named conv1 and conv2 in this example.
pruners: example_pruner: class: L1RankedStructureParameterPruner_AGP initial_sparsity: 0.10 final_sparsity: 0.50 group_type: Filters weights: [module.conv1.weight] Now let\u2019s add a Batch Normalization layer between the two convolutions: The Batch Normalization layer is parameterized by a couple of tensors that contain information per input-channel (i.e. scale and shift). Because our Convolution produces fewer output FMs, and these are the input to the Batch Normalization layer, we also need to reconfigure the Batch Normalization layer. And we also need to physically shrink the Batch Normalization layer\u2019s scale and shift tensors, which are coefficients in the BN input transformation. Moreover, the scale and shift coefficients that we remove from the tensors must correspond to the filters (or output feature-maps channels) that we removed from the Convolution weight tensors. This small nuance will prove to be a large pain, but we\u2019ll get to that in later examples. The presence of a Batch Normalization layer in the example above is transparent to us, and in fact, the YAML schedule does not change. Distiller detects the presence of Batch Normalization layers and adjusts their parameters automatically. Let\u2019s look at another example, with non-serial data-dependencies. Here, the output of conv1 is the input for conv2 and conv3 . This is an example of parallel data-dependency, since both conv2 and conv3 depend on conv1 . Note that the Distiller YAML schedule is unchanged from the previous two examples, since we are still only explicitly pruning the weight filters of conv1 . The weight channels of conv2 and conv3 are pruned implicitly by Distiller in a process called \u201cThinning\u201d (on which I will expand in a different post). Next, let\u2019s look at another example also involving three Convolutions, but this time we want to prune the filters of two convolutional layers, whose outputs are element-wise-summed and fed into a third Convolution. In this example conv3 is dependent on both conv1 and conv2 , and there are two implications to this dependency. The first, and more obvious, implication is that we need to prune the same number of filters from both conv1 and conv2 . Since we apply element-wise addition on the outputs of conv1 and conv2 , they must have the same shape - and they can only have the same shape if conv1 and conv2 prune the same number of filters. The second implication of this triangular data-dependency is that both conv1 and conv2 must prune the same filters! Let\u2019s imagine for a moment that we ignore this second constraint. The diagram below illustrates the dilemma that arises: how should we prune the channels of the weights of conv3 ? Obviously, we can\u2019t. We must apply the second constraint \u2013 and that means that we now need to be proactive: we need to decide whether to prune conv1 and conv2 according to the filter-pruning choices of conv1 or of conv2 . The diagram below illustrates the pruning scheme after deciding to follow the pruning choices of conv1 . The YAML compression schedule syntax needs to be able to express the two dependencies (or constraints) discussed above. First we need to tell the Filter Pruner that there is a dependency of type Leader . This means that all of the tensors listed in the weights field are pruned together, to the same extent at each iteration, and that to prune the filters we will use the pruning decisions of the first tensor listed.
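To see concretely why the two tensors must be pruned in lock-step, here is a tiny self-contained sketch. The layers and sizes are toy examples of my own, not Distiller code:

```python
import torch
import torch.nn as nn

# Toy stand-ins for the three Convolutions discussed above.
conv1 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3, padding=1)
conv2 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3, padding=1)
conv3 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, padding=1)

x = torch.randn(1, 16, 8, 8)
y = conv1(x) + conv2(x)  # element-wise add: both outputs must keep the same shape,
                         # so both layers must shed the same number of filters
out = conv3(y)           # conv3's in_channels must track that shared filter count
```

If we pruned, say, filter 3 from the toy conv1 but filter 7 from conv2 , the sum would still be shape-legal after thinning, but each output channel would now add two unrelated feature-maps - which is exactly the dilemma described above.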
In the example below module.conv1.weight and module.conv2.weight are pruned together according to the pruning choices for module.conv1.weight . pruners: example_pruner: class: L1RankedStructureParameterPruner_AGP initial_sparsity: 0.10 final_sparsity: 0.50 group_type: Filters group_dependency: Leader weights: [module.conv1.weight, module.conv2.weight] When we turn to filter-pruning ResNets we see some pretty long dependency chains because of the skip-connections. If you don\u2019t pay attention, you can easily under-specify (or mis-specify) dependency chains and Distiller will exit with an exception. The exception does not explain the specification error and this needs to be improved. Channel Pruning Channel pruning is very similar to Filter pruning with all the details of dependencies reversed. Look again at example #1, but this time imagine that we\u2019ve changed our schedule to prune the channels of module.conv2.weight . pruners: example_pruner: class: L1RankedStructureParameterPruner_AGP initial_sparsity: 0.10 final_sparsity: 0.50 group_type: Channels weights: [module.conv2.weight] As the diagram shows, conv1 is now dependent on conv2 and its weight filters will be implicitly pruned according to the channels removed from the weights of conv2 . Geek On.","title":"Pruning Filters and Channels"},{"location":"tutorial-struct_pruning.html#pruning-filters-channels","text":"","title":"Pruning Filters & Channels"},{"location":"tutorial-struct_pruning.html#introduction","text":"Channel and filter pruning are examples of structured pruning, which creates compressed models that do not require special hardware to execute. This latter fact makes this form of structured pruning particularly interesting and popular. In networks that have serial data dependencies, it is pretty straightforward to understand and define how to prune channels and filters. However, in more complex models, with parallel-data dependencies (paths) - such as ResNets (skip connections) and GoogLeNet (Inception layers) \u2013 things become increasingly complex and require a deeper understanding of the data flow in the model, in order to define the pruning schedule. This post explains channel and filter pruning, the challenges, and how to define a Distiller pruning schedule for these structures. The details of the implementation are left for a separate post. Before we dive into pruning, let\u2019s level-set on the terminology, because different people (and even research papers) do not always agree on the nomenclature. This reflects my understanding of the nomenclature, and therefore these are the names used in Distiller. I\u2019ll restrict this discussion to Convolution layers in CNNs, to contain the scope of the topic I\u2019ll be covering, although Distiller supports pruning of other structures such as matrix columns and rows. PyTorch describes torch.nn.Conv2d as applying \u201ca 2D convolution over an input signal composed of several input planes.\u201d We call each of these input planes a feature-map (or FM, for short). Another name is input channel , as in the R/G/B channels of an image. Some people refer to feature-maps as activations (i.e. the activation of neurons), although I think strictly speaking activations are the output of an activation layer that was fed a group of feature-maps. Because it is very common, and because the use of an activation is orthogonal to our discussion, I will use activations to refer to the output of a Convolution layer (i.e. 3D stack of feature-maps).
In the PyTorch documentation Convolution outputs have shape (N, C out , H out , W out ) where N is a batch size, C out denotes a number of output channels, H out is a height of output planes in pixels, and W out is width in pixels. We won\u2019t be paying much attention to the batch-size since it\u2019s not important to our discussion, so without loss of generality we can set N=1. I\u2019m also assuming the most common case of Convolutions with groups==1 . Convolution weights are 4D: (F, C, K, K) where F is the number of filters, C is the number of channels, and K is the kernel size (we can assume the kernel height and width are equal for simplicity). A kernel is a 2D matrix (K, K) that is part of a 3D feature detector. This feature detector is called a filter and it is basically a stack of 2D kernels . Each kernel is convolved with a 2D input channel (i.e. feature-map) so if there are C in channels in the input, then there are C in kernels in a filter (C == C in ). Each filter is convolved with the entire input to create a single output channel (i.e. feature-map). If there are C out output channels, then there are C out filters (F == C out ).","title":"Introduction"},{"location":"tutorial-struct_pruning.html#filter-pruning","text":"Filter pruning and channel pruning are very similar, and I\u2019ll expand on that similarity later on \u2013 but for now let\u2019s focus on filter pruning. In filter pruning we use some criterion to determine which filters are important and which are not. Researchers came up with all sorts of pruning criteria: the L1-magnitude of the filters (citation), the entropy of the activations (citation), and the classification accuracy reduction (citation) are just some examples. Disregarding how we choose the filters to prune, let\u2019s imagine that in the diagram below, we chose to prune (remove) the green and orange filters (the circle with the \u201c*\u201d designates a Convolution operation). Since we have two fewer filters operating on the input, we must have two fewer output feature-maps. So when we prune filters, besides changing the physical size of the weight tensors, we also need to reconfigure the immediate Convolution layer (change its out_channels ) and the following Convolution layer (change its in_channels ). And finally, because the next layer\u2019s input is now smaller (has fewer channels), we should also shrink the next layer\u2019s weights tensors, by removing the channels corresponding to the filters we pruned. We say that there is a data-dependency between the two Convolution layers. I didn\u2019t make any mention of the activation function that usually follows Convolution, because these functions are parameter-less and are not sensitive to the shape of their input. There are some other dependencies that Distiller resolves (such as Optimizer parameters tightly-coupled to the weights) that I won\u2019t discuss here, because they are implementation details. The scheduler YAML syntax for this example is pasted below. We use L1-norm ranking of weight filters, and the pruning-rate is set by the AGP algorithm (Automated Gradual Pruning). The Convolution layers are conveniently named conv1 and conv2 in this example.
pruners: example_pruner: class: L1RankedStructureParameterPruner_AGP initial_sparsity: 0.10 final_sparsity: 0.50 group_type: Filters weights: [module.conv1.weight] Now let\u2019s add a Batch Normalization layer between the two convolutions: The Batch Normalization layer is parameterized by a couple of tensors that contain information per input-channel (i.e. scale and shift). Because our Convolution produces fewer output FMs, and these are the input to the Batch Normalization layer, we also need to reconfigure the Batch Normalization layer. And we also need to physically shrink the Batch Normalization layer\u2019s scale and shift tensors, which are coefficients in the BN input transformation. Moreover, the scale and shift coefficients that we remove from the tensors must correspond to the filters (or output feature-maps channels) that we removed from the Convolution weight tensors. This small nuance will prove to be a large pain, but we\u2019ll get to that in later examples. The presence of a Batch Normalization layer in the example above is transparent to us, and in fact, the YAML schedule does not change. Distiller detects the presence of Batch Normalization layers and adjusts their parameters automatically. Let\u2019s look at another example, with non-serial data-dependencies. Here, the output of conv1 is the input for conv2 and conv3 . This is an example of parallel data-dependency, since both conv2 and conv3 depend on conv1 . Note that the Distiller YAML schedule is unchanged from the previous two examples, since we are still only explicitly pruning the weight filters of conv1 . The weight channels of conv2 and conv3 are pruned implicitly by Distiller in a process called \u201cThinning\u201d (on which I will expand in a different post). Next, let\u2019s look at another example also involving three Convolutions, but this time we want to prune the filters of two convolutional layers, whose outputs are element-wise-summed and fed into a third Convolution. In this example conv3 is dependent on both conv1 and conv2 , and there are two implications to this dependency. The first, and more obvious, implication is that we need to prune the same number of filters from both conv1 and conv2 . Since we apply element-wise addition on the outputs of conv1 and conv2 , they must have the same shape - and they can only have the same shape if conv1 and conv2 prune the same number of filters. The second implication of this triangular data-dependency is that both conv1 and conv2 must prune the same filters! Let\u2019s imagine for a moment that we ignore this second constraint. The diagram below illustrates the dilemma that arises: how should we prune the channels of the weights of conv3 ? Obviously, we can\u2019t. We must apply the second constraint \u2013 and that means that we now need to be proactive: we need to decide whether to prune conv1 and conv2 according to the filter-pruning choices of conv1 or of conv2 . The diagram below illustrates the pruning scheme after deciding to follow the pruning choices of conv1 . The YAML compression schedule syntax needs to be able to express the two dependencies (or constraints) discussed above. First we need to tell the Filter Pruner that there is a dependency of type Leader . This means that all of the tensors listed in the weights field are pruned together, to the same extent at each iteration, and that to prune the filters we will use the pruning decisions of the first tensor listed.
In the example below module.conv1.weight and module.conv2.weight are pruned together according to the pruning choices for module.conv1.weight . pruners: example_pruner: class: L1RankedStructureParameterPruner_AGP initial_sparsity: 0.10 final_sparsity: 0.50 group_type: Filters group_dependency: Leader weights: [module.conv1.weight, module.conv2.weight] When we turn to filter-pruning ResNets we see some pretty long dependency chains because of the skip-connections. If you don\u2019t pay attention, you can easily under-specify (or mis-specify) dependency chains and Distiller will exit with an exception. The exception does not explain the specification error and this needs to be improved.","title":"Filter Pruning"},{"location":"tutorial-struct_pruning.html#channel-pruning","text":"Channel pruning is very similar to Filter pruning with all the details of dependencies reversed. Look again at example #1, but this time imagine that we\u2019ve changed our schedule to prune the channels of module.conv2.weight . pruners: example_pruner: class: L1RankedStructureParameterPruner_AGP initial_sparsity: 0.10 final_sparsity: 0.50 group_type: Channels weights: [module.conv2.weight] As the diagram shows, conv1 is now dependent on conv2 and its weight filters will be implicitly pruned according to the channels removed from the weights of conv2 . Geek On.","title":"Channel Pruning"},{"location":"usage.html","text":"Using the sample application The Distiller repository contains a sample application, distiller/examples/classifier_compression/compress_classifier.py , and a set of scheduling files which demonstrate Distiller's features. Following is a brief discussion of how to use this application and the accompanying schedules. You might also want to refer to the following resources: An explanation of the scheduler file format. An in-depth discussion of how we used these schedule files to implement several state-of-the-art DNN compression research papers. The sample application supports various features for compression of image classification DNNs, and gives an example of how to integrate distiller in your own application. The code is documented and should be considered the best source of documentation, but we provide some elaboration here. This diagram shows where compress_classifier.py fits in the compression workflow, and how we integrate the Jupyter notebooks as part of our research work.
Command line arguments To get help on the command line arguments, invoke: $ python3 compress_classifier.py --help For example: $ time python3 compress_classifier.py -a alexnet --lr 0.005 -p 50 ../../../data.imagenet -j 44 --epochs 90 --pretrained --compress=../sensitivity-pruning/alexnet.schedule_sensitivity.yaml Parameters: +----+---------------------------+------------------+---------------+----------------+------------+------------+----------+----------+----------+------------+---------+----------+------------+ | | Name | Shape | NNZ (dense) | NNZ (sparse) | Cols (%) | Rows (%) | Ch (%) | 2D (%) | 3D (%) | Fine (%) | Std | Mean | Abs-Mean | |----+---------------------------+------------------+---------------+----------------+------------+------------+----------+----------+----------+------------+---------+----------+------------| | 0 | features.module.0.weight | (64, 3, 11, 11) | 23232 | 13411 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 42.27359 | 0.14391 | -0.00002 | 0.08805 | | 1 | features.module.3.weight | (192, 64, 5, 5) | 307200 | 115560 | 0.00000 | 0.00000 | 0.00000 | 1.91243 | 0.00000 | 62.38281 | 0.04703 | -0.00250 | 0.02289 | | 2 | features.module.6.weight | (384, 192, 3, 3) | 663552 | 256565 | 0.00000 | 0.00000 | 0.00000 | 6.18490 | 0.00000 | 61.33445 | 0.03354 | -0.00184 | 0.01803 | | 3 | features.module.8.weight | (256, 384, 3, 3) | 884736 | 315065 | 0.00000 | 0.00000 | 0.00000 | 6.96411 | 0.00000 | 64.38881 | 0.02646 | -0.00168 | 0.01422 | | 4 | features.module.10.weight | (256, 256, 3, 3) | 589824 | 186938 | 0.00000 | 0.00000 | 0.00000 | 15.49225 | 0.00000 | 68.30614 | 0.02714 | -0.00246 | 0.01409 | | 5 | classifier.1.weight | (4096, 9216) | 37748736 | 3398881 | 0.00000 | 0.21973 | 0.00000 | 0.21973 | 0.00000 | 90.99604 | 0.00589 | -0.00020 | 0.00168 | | 6 | classifier.4.weight | (4096, 4096) | 16777216 | 1782769 | 0.21973 | 3.46680 | 0.00000 | 3.46680 | 0.00000 | 89.37387 | 0.00849 | -0.00066 | 0.00263 | | 7 | classifier.6.weight | (1000, 4096) | 4096000 | 994738 | 3.36914 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 75.71440 | 0.01718 | 0.00030 | 0.00778 | | 8 | Total sparsity: | - | 61090496 | 7063928 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 88.43694 | 0.00000 | 0.00000 | 0.00000 | +----+---------------------------+------------------+---------------+----------------+------------+------------+----------+----------+----------+------------+---------+----------+------------+ 2018-04-04 21:30:52,499 - Total sparsity: 88.44 2018-04-04 21:30:52,499 - --- validate (epoch=89)----------- 2018-04-04 21:30:52,499 - 128116 samples (256 per mini-batch) 2018-04-04 21:31:04,646 - Epoch: [89][ 50/ 500] Loss 2.175988 Top1 51.289063 Top5 74.023438 2018-04-04 21:31:06,427 - Epoch: [89][ 100/ 500] Loss 2.171564 Top1 51.175781 Top5 74.308594 2018-04-04 21:31:11,432 - Epoch: [89][ 150/ 500] Loss 2.159347 Top1 51.546875 Top5 74.473958 2018-04-04 21:31:14,364 - Epoch: [89][ 200/ 500] Loss 2.156857 Top1 51.585938 Top5 74.568359 2018-04-04 21:31:18,381 - Epoch: [89][ 250/ 500] Loss 2.152790 Top1 51.707813 Top5 74.681250 2018-04-04 21:31:22,195 - Epoch: [89][ 300/ 500] Loss 2.149962 Top1 51.791667 Top5 74.755208 2018-04-04 21:31:25,508 - Epoch: [89][ 350/ 500] Loss 2.150936 Top1 51.827009 Top5 74.767857 2018-04-04 21:31:29,538 - Epoch: [89][ 400/ 500] Loss 2.150853 Top1 51.781250 Top5 74.763672 2018-04-04 21:31:32,842 - Epoch: [89][ 450/ 500] Loss 2.150156 Top1 51.828125 Top5 74.821181 2018-04-04 21:31:35,338 - Epoch: [89][ 500/ 500] Loss 2.150417 Top1 51.833594 Top5 74.817187 
2018-04-04 21:31:35,357 - == Top1: 51.838 Top5: 74.817 Loss: 2.150 2018-04-04 21:31:35,364 - Saving checkpoint 2018-04-04 21:31:39,251 - --- test --------------------- 2018-04-04 21:31:39,252 - 50000 samples (256 per mini-batch) 2018-04-04 21:31:51,512 - Test: [ 50/ 195] Loss 1.487607 Top1 63.273438 Top5 85.695312 2018-04-04 21:31:55,015 - Test: [ 100/ 195] Loss 1.638043 Top1 60.636719 Top5 83.664062 2018-04-04 21:31:58,732 - Test: [ 150/ 195] Loss 1.833214 Top1 57.619792 Top5 80.447917 2018-04-04 21:32:01,274 - == Top1: 56.606 Top5: 79.446 Loss: 1.893 Let's look at the command line again: $ time python3 compress_classifier.py -a alexnet --lr 0.005 -p 50 ../../../data.imagenet -j 44 --epochs 90 --pretrained --compress=../sensitivity-pruning/alexnet.schedule_sensitivity.yaml In this example, we prune a TorchVision pre-trained AlexNet network, using the following configuration: Learning-rate of 0.005 Print progress every 50 mini-batches. Use 44 worker threads to load data (make sure to use something suitable for your machine). Run for 90 epochs. Torchvision's pre-trained models did not store the epoch metadata, so pruning starts at epoch 0. When you train and prune your own networks, the last training epoch is saved as metadata with the model. Therefore, when you load such models, the first epoch is not 0, but it is the last training epoch. The pruning schedule is provided in alexnet.schedule_sensitivity.yaml Log files are written to directory logs . Examples Distiller comes with several example schedules which can be used together with compress_classifier.py . These example schedule (YAML) files contain the command line used to invoke the schedule (so that you can easily recreate the results in your environment), together with the results of the pruning or regularization. The results usually contain a table showing the sparsity of each of the model parameters, together with the validation and test top1, top5 and loss scores. For more details on the example schedules, you can refer to the coverage of the Model Zoo . examples/agp-pruning : Automated Gradual Pruning (AGP) on MobileNet and ResNet18 (ImageNet dataset) examples/hybrid : AlexNet AGP with 2D (kernel) regularization (ImageNet dataset) AlexNet sensitivity pruning with 2D regularization examples/network_slimming : ResNet20 Network Slimming (this is work-in-progress) examples/pruning_filters_for_efficient_convnets : ResNet56 baseline training (CIFAR10 dataset) ResNet56 filter removal using filter ranking examples/sensitivity_analysis : Element-wise pruning sensitivity-analysis: AlexNet (ImageNet) MobileNet (ImageNet) ResNet18 (ImageNet) ResNet20 (CIFAR10) ResNet34 (ImageNet) Filter-wise pruning sensitivity-analysis: ResNet20 (CIFAR10) ResNet56 (CIFAR10) examples/sensitivity-pruning : AlexNet sensitivity pruning with Iterative Pruning AlexNet sensitivity pruning with One-Shot Pruning examples/ssl : ResNet20 baseline training (CIFAR10 dataset) Structured Sparsity Learning (SSL) with layer removal on ResNet20 SSL with channels removal on ResNet20 examples/quantization : AlexNet w. Batch-Norm (base FP32 + DoReFa) Pre-activation ResNet20 on CIFAR10 (base FP32 + DoReFa) Pre-activation ResNet18 on ImageNet (base FP32 + DoReFa) Experiment reproducibility Experiment reproducibility is sometimes important. Pete Warden recently expounded on this in his blog .
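Reproducibility in PyTorch largely boils down to seeding every PRNG and avoiding nondeterministic execution. The sketch below is a generic illustration of that idea - it is not the exact code behind the flag discussed next:

```python
import random
import numpy as np
import torch

def make_deterministic(seed=0):
    # Generic reproducibility boilerplate; Distiller's actual --deterministic
    # implementation may differ in its details.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)           # seed the CPU PRNG
    torch.cuda.manual_seed_all(seed)  # seed all GPU PRNGs
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```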
PyTorch's support for deterministic execution requires us to use only one thread for loading data (otherwise the multi-threaded execution of the data loaders can create a random ordering and change the results), and to set the seed of the CPU and GPU PRNGs. Using the --deterministic command-line flag and setting j=1 will produce reproducible results (for the same PyTorch version). Performing pruning sensitivity analysis Distiller supports element-wise and filter-wise pruning sensitivity analysis. In both cases, L1-norm is used to rank which elements or filters to prune. For example, when running filter-pruning sensitivity analysis, the L1-norms of the filters of each layer's weights tensor are calculated, and the bottom x% are set to zero. The analysis process is quite long, because currently we use the entire test dataset to assess the accuracy performance at each pruning level of each weights tensor. Using a small dataset for this would save much time, and we plan on assessing whether this will provide sufficient results. Results are output as a CSV file ( sensitivity.csv ) and PNG file ( sensitivity.png ). The implementation is in distiller/sensitivity.py and it contains further details about the process and the format of the CSV file. The example below performs element-wise pruning sensitivity analysis on ResNet20 for CIFAR10: $ python3 compress_classifier.py -a resnet20_cifar ../../../data.cifar10/ -j=1 --resume=../cifar10/resnet20/checkpoint_trained_dense.pth.tar --sense=element The sense command-line argument can be set to either element or filter , depending on the type of analysis you want done. There is also a Jupyter notebook with example invocations, outputs and explanations. Post-Training Quantization The following example quantizes ResNet18 for ImageNet: $ python3 compress_classifier.py -a resnet18 ../../../data.imagenet --pretrained --quantize-eval --evaluate See here for more details on how to invoke post-training quantization from the command line. A checkpoint with the quantized model will be dumped in the run directory. It will contain the quantized model parameters (the data type will still be FP32, but the values will be integers). The calculated quantization parameters (scale and zero-point) are stored as well in each quantized layer. For more examples of post-training quantization see here . Summaries You can use the sample compression application to generate model summary reports, such as the attributes and compute summary report (see screen capture below). You can log sparsity statistics (written to console and CSV file), performance, optimizer and model information, and also create a PNG image of the DNN. Creating a PNG image is an experimental feature (it relies on features which are not available on PyTorch 0.3.1 and that we hope will be available in PyTorch's next release), so to use it you will need to compile the PyTorch master branch, and hope for the best ;-).
$ python3 compress_classifier.py --resume=../ssl/checkpoints/checkpoint_trained_ch_regularized_dense.pth.tar -a=resnet20_cifar ../../../data.cifar10 --summary=compute Generates: +----+------------------------------+--------+----------+-----------------+--------------+-----------------+--------------+------------------+---------+ | | Name | Type | Attrs | IFM | IFM volume | OFM | OFM volume | Weights volume | MACs | |----+------------------------------+--------+----------+-----------------+--------------+-----------------+--------------+------------------+---------| | 0 | module.conv1 | Conv2d | k=(3, 3) | (1, 3, 32, 32) | 3072 | (1, 16, 32, 32) | 16384 | 432 | 442368 | | 1 | module.layer1.0.conv1 | Conv2d | k=(3, 3) | (1, 16, 32, 32) | 16384 | (1, 16, 32, 32) | 16384 | 2304 | 2359296 | | 2 | module.layer1.0.conv2 | Conv2d | k=(3, 3) | (1, 16, 32, 32) | 16384 | (1, 16, 32, 32) | 16384 | 2304 | 2359296 | | 3 | module.layer1.1.conv1 | Conv2d | k=(3, 3) | (1, 16, 32, 32) | 16384 | (1, 16, 32, 32) | 16384 | 2304 | 2359296 | | 4 | module.layer1.1.conv2 | Conv2d | k=(3, 3) | (1, 16, 32, 32) | 16384 | (1, 16, 32, 32) | 16384 | 2304 | 2359296 | | 5 | module.layer1.2.conv1 | Conv2d | k=(3, 3) | (1, 16, 32, 32) | 16384 | (1, 16, 32, 32) | 16384 | 2304 | 2359296 | | 6 | module.layer1.2.conv2 | Conv2d | k=(3, 3) | (1, 16, 32, 32) | 16384 | (1, 16, 32, 32) | 16384 | 2304 | 2359296 | | 7 | module.layer2.0.conv1 | Conv2d | k=(3, 3) | (1, 16, 32, 32) | 16384 | (1, 32, 16, 16) | 8192 | 4608 | 1179648 | | 8 | module.layer2.0.conv2 | Conv2d | k=(3, 3) | (1, 32, 16, 16) | 8192 | (1, 32, 16, 16) | 8192 | 9216 | 2359296 | | 9 | module.layer2.0.downsample.0 | Conv2d | k=(1, 1) | (1, 16, 32, 32) | 16384 | (1, 32, 16, 16) | 8192 | 512 | 131072 | | 10 | module.layer2.1.conv1 | Conv2d | k=(3, 3) | (1, 32, 16, 16) | 8192 | (1, 32, 16, 16) | 8192 | 9216 | 2359296 | | 11 | module.layer2.1.conv2 | Conv2d | k=(3, 3) | (1, 32, 16, 16) | 8192 | (1, 32, 16, 16) | 8192 | 9216 | 2359296 | | 12 | module.layer2.2.conv1 | Conv2d | k=(3, 3) | (1, 32, 16, 16) | 8192 | (1, 32, 16, 16) | 8192 | 9216 | 2359296 | | 13 | module.layer2.2.conv2 | Conv2d | k=(3, 3) | (1, 32, 16, 16) | 8192 | (1, 32, 16, 16) | 8192 | 9216 | 2359296 | | 14 | module.layer3.0.conv1 | Conv2d | k=(3, 3) | (1, 32, 16, 16) | 8192 | (1, 64, 8, 8) | 4096 | 18432 | 1179648 | | 15 | module.layer3.0.conv2 | Conv2d | k=(3, 3) | (1, 64, 8, 8) | 4096 | (1, 64, 8, 8) | 4096 | 36864 | 2359296 | | 16 | module.layer3.0.downsample.0 | Conv2d | k=(1, 1) | (1, 32, 16, 16) | 8192 | (1, 64, 8, 8) | 4096 | 2048 | 131072 | | 17 | module.layer3.1.conv1 | Conv2d | k=(3, 3) | (1, 64, 8, 8) | 4096 | (1, 64, 8, 8) | 4096 | 36864 | 2359296 | | 18 | module.layer3.1.conv2 | Conv2d | k=(3, 3) | (1, 64, 8, 8) | 4096 | (1, 64, 8, 8) | 4096 | 36864 | 2359296 | | 19 | module.layer3.2.conv1 | Conv2d | k=(3, 3) | (1, 64, 8, 8) | 4096 | (1, 64, 8, 8) | 4096 | 36864 | 2359296 | | 20 | module.layer3.2.conv2 | Conv2d | k=(3, 3) | (1, 64, 8, 8) | 4096 | (1, 64, 8, 8) | 4096 | 36864 | 2359296 | | 21 | module.fc | Linear | | (1, 64) | 64 | (1, 10) | 10 | 640 | 640 | +----+------------------------------+--------+----------+-----------------+--------------+-----------------+--------------+------------------+---------+ Total MACs: 40,813,184 Using TensorBoard Google's TensorBoard is an excellent tool for visualizing the progress of DNN training. 
Distiller's logger supports writing performance indicators and parameter statistics in a file format that can be read by TensorBoard (Distiller uses TensorFlow's APIs in order to do this, which is why Distiller requires the installation of TensorFlow). To view the graphs, invoke the TensorBoard server. For example: $ tensorboard --logdir=logs Distiller's setup (requirements.txt) installs TensorFlow for CPU. If you want a different installation, please follow the TensorFlow installation instructions . Collecting activations statistics In CNNs with ReLU layers, ReLU activations (feature-maps) also exhibit a nice level of sparsity (50-60% sparsity is typical). You can collect activation statistics using the --act_stats command-line flag. For example: $ python3 compress_classifier.py -a=resnet56_cifar -p=50 ../../../data.cifar10 --resume=checkpoint.resnet56_cifar_baseline.pth.tar --act-stats=test -e The test parameter indicates that, in this example, we want to collect activation statistics during the test phase. Note that we also used the -e command-line argument to indicate that we want to run a test phase. The other two legal parameter values are train and valid , which collect activation statistics during the training and validation phases, respectively. Collectors and their collaterals An instance of a subclass of ActivationStatsCollector can be used to collect activation statistics. Currently, ActivationStatsCollector has two types of subclasses: SummaryActivationStatsCollector and RecordsActivationStatsCollector . Instances of SummaryActivationStatsCollector compute the mean of some statistic of the activation. It is rather lightweight and quicker than collecting a record per activation. The statistic function is configured in the constructor. In the sample compression application, compress_classifier.py , we create a dictionary of collectors. For example: SummaryActivationStatsCollector(model, \"sparsity\", lambda t: 100 * distiller.utils.sparsity(t)) The lambda expression is invoked per activation encountered during forward passes, and the value it returns (in this case, the sparsity of the activation tensors, multiplied by 100) is stored in module.sparsity ( \"sparsity\" is this collector's name). To access the statistics, you can invoke collector.value() , or you can access each module's data directly. Another type of collector is RecordsActivationStatsCollector , which computes a hard-coded set of activation statistics and collects a record per activation . For obvious reasons, this is slower than instances of SummaryActivationStatsCollector . ActivationStatsCollector defaults to collecting activation statistics only on the output activations of ReLU layers, but we can choose any layer type we want. In the example below we collect statistics from outputs of torch.nn.Conv2d layers. RecordsActivationStatsCollector(model, classes=[torch.nn.Conv2d]) Collectors can write their data to Excel workbooks (which are named using the collector's name) by invoking collector.to_xlsx(path_to_workbook) . In compress_classifier.py we currently create four different collectors which you can selectively disable. You can also add other statistics collectors and use a different function to compute your new statistic.
collectors = missingdict({ \"sparsity\": SummaryActivationStatsCollector(model, \"sparsity\", lambda t: 100 * distiller.utils.sparsity(t)), \"l1_channels\": SummaryActivationStatsCollector(model, \"l1_channels\", distiller.utils.activation_channels_l1), \"apoz_channels\": SummaryActivationStatsCollector(model, \"apoz_channels\", distiller.utils.activation_channels_apoz), \"records\": RecordsActivationStatsCollector(model, classes=[torch.nn.Conv2d])}) By default, these Collectors write their data to files in the active log directory. You can use a utility function, distiller.log_activation_statsitics , to log the data of an ActivationStatsCollector instance to one of the backend-loggers. For example, the code below logs the \"sparsity\" collector to a TensorBoard log file. distiller.log_activation_statsitics(epoch, \"train\", loggers=[tflogger], collector=collectors[\"sparsity\"]) Caveats Distiller collects activation statistics using PyTorch's forward-hooks mechanism. Collectors iteratively register the modules' forward-hooks, and collectors are called during the forward traversal and get exposed to activation data. Registering for forward callbacks is performed like this: module.register_forward_hook This makes apparent two limitations of this mechanism: We can only register on PyTorch modules. This means that we can't register forward hooks on functionals such as torch.nn.functional.relu and torch.nn.functional.max_pool2d . Therefore, you may need to replace functionals with their module alternative. For example: class MadeUpNet(nn.Module): def __init__(self): super().__init__() self.conv1 = nn.Conv2d(3, 6, 5) def forward(self, x): x = F.relu(self.conv1(x)) return x Can be changed to: class MadeUpNet(nn.Module): def __init__(self): super().__init__() self.conv1 = nn.Conv2d(3, 6, 5) self.relu = nn.ReLU(inplace=True) def forward(self, x): x = self.relu(self.conv1(x)) return x We can only use a module instance once in our models. If we use the same module several times, then we can't determine which node in the graph has invoked the callback, because the PyTorch callback signature def hook(module, input, output) doesn't provide enough contextual information.
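A toy demonstration of this second limitation (my own example, not Distiller code):

```python
import torch
import torch.nn as nn

relu = nn.ReLU()

def hook(module, input, output):
    # 'module' is the very same ReLU object on both invocations, so the hook
    # cannot tell which of the two call sites below produced this output.
    print(id(module), output.shape)

relu.register_forward_hook(hook)
x = torch.randn(4)
y = relu(relu(x))  # the hook fires twice with an identical 'module' argument
```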
TorchVision's ResNet is an example of a model that uses the same instance of nn.ReLU multiple times: class BasicBlock(nn.Module): expansion = 1 def __init__(self, inplanes, planes, stride=1, downsample=None): super(BasicBlock, self).__init__() self.conv1 = conv3x3(inplanes, planes, stride) self.bn1 = nn.BatchNorm2d(planes) self.relu = nn.ReLU(inplace=True) self.conv2 = conv3x3(planes, planes) self.bn2 = nn.BatchNorm2d(planes) self.downsample = downsample self.stride = stride def forward(self, x): residual = x out = self.conv1(x) out = self.bn1(out) out = self.relu(out) # ================ out = self.conv2(out) out = self.bn2(out) if self.downsample is not None: residual = self.downsample(x) out += residual out = self.relu(out) # ================ return out In Distiller we changed ResNet to use multiple instances of nn.ReLU, and each instance is used only once: class BasicBlock(nn.Module): expansion = 1 def __init__(self, inplanes, planes, stride=1, downsample=None): super(BasicBlock, self).__init__() self.conv1 = conv3x3(inplanes, planes, stride) self.bn1 = nn.BatchNorm2d(planes) self.relu1 = nn.ReLU(inplace=True) self.conv2 = conv3x3(planes, planes) self.bn2 = nn.BatchNorm2d(planes) self.relu2 = nn.ReLU(inplace=True) self.downsample = downsample self.stride = stride def forward(self, x): residual = x out = self.conv1(x) out = self.bn1(out) out = self.relu1(out) # ================ out = self.conv2(out) out = self.bn2(out) if self.downsample is not None: residual = self.downsample(x) out += residual out = self.relu2(out) # ================ return out Using the Jupyter notebooks The Jupyter notebooks contain many examples of how to use the statistics summaries generated by Distiller. They are explained in a separate page. Generating this documentation Install mkdocs and the required packages by executing: $ pip3 install -r doc-requirements.txt To build the project documentation run: $ cd distiller/docs-src $ mkdocs build --clean This will create a folder named 'site' which contains the documentation website. Open distiller/docs/site/index.html to view the documentation home page.","title":"Usage"},{"location":"usage.html#using-the-sample-application","text":"The Distiller repository contains a sample application, distiller/examples/classifier_compression/compress_classifier.py , and a set of scheduling files which demonstrate Distiller's features. Following is a brief discussion of how to use this application and the accompanying schedules. You might also want to refer to the following resources: An explanation of the scheduler file format. An in-depth discussion of how we used these schedule files to implement several state-of-the-art DNN compression research papers. The sample application supports various features for compression of image classification DNNs, and gives an example of how to integrate distiller in your own application. The code is documented and should be considered the best source of documentation, but we provide some elaboration here. 
This diagram shows where compress_classifier.py fits in the compression workflow, and how we integrate the Jupyter notebooks as part of our research work.","title":"Using the sample application"},{"location":"usage.html#command-line-arguments","text":"To get help on the command line arguments, invoke: $ python3 compress_classifier.py --help For example: $ time python3 compress_classifier.py -a alexnet --lr 0.005 -p 50 ../../../data.imagenet -j 44 --epochs 90 --pretrained --compress=../sensitivity-pruning/alexnet.schedule_sensitivity.yaml Parameters: +----+---------------------------+------------------+---------------+----------------+------------+------------+----------+----------+----------+------------+---------+----------+------------+ | | Name | Shape | NNZ (dense) | NNZ (sparse) | Cols (%) | Rows (%) | Ch (%) | 2D (%) | 3D (%) | Fine (%) | Std | Mean | Abs-Mean | |----+---------------------------+------------------+---------------+----------------+------------+------------+----------+----------+----------+------------+---------+----------+------------| | 0 | features.module.0.weight | (64, 3, 11, 11) | 23232 | 13411 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 42.27359 | 0.14391 | -0.00002 | 0.08805 | | 1 | features.module.3.weight | (192, 64, 5, 5) | 307200 | 115560 | 0.00000 | 0.00000 | 0.00000 | 1.91243 | 0.00000 | 62.38281 | 0.04703 | -0.00250 | 0.02289 | | 2 | features.module.6.weight | (384, 192, 3, 3) | 663552 | 256565 | 0.00000 | 0.00000 | 0.00000 | 6.18490 | 0.00000 | 61.33445 | 0.03354 | -0.00184 | 0.01803 | | 3 | features.module.8.weight | (256, 384, 3, 3) | 884736 | 315065 | 0.00000 | 0.00000 | 0.00000 | 6.96411 | 0.00000 | 64.38881 | 0.02646 | -0.00168 | 0.01422 | | 4 | features.module.10.weight | (256, 256, 3, 3) | 589824 | 186938 | 0.00000 | 0.00000 | 0.00000 | 15.49225 | 0.00000 | 68.30614 | 0.02714 | -0.00246 | 0.01409 | | 5 | classifier.1.weight | (4096, 9216) | 37748736 | 3398881 | 0.00000 | 0.21973 | 0.00000 | 0.21973 | 0.00000 | 90.99604 | 0.00589 | -0.00020 | 0.00168 | | 6 | classifier.4.weight | (4096, 4096) | 16777216 | 1782769 | 0.21973 | 3.46680 | 0.00000 | 3.46680 | 0.00000 | 89.37387 | 0.00849 | -0.00066 | 0.00263 | | 7 | classifier.6.weight | (1000, 4096) | 4096000 | 994738 | 3.36914 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 75.71440 | 0.01718 | 0.00030 | 0.00778 | | 8 | Total sparsity: | - | 61090496 | 7063928 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 88.43694 | 0.00000 | 0.00000 | 0.00000 | +----+---------------------------+------------------+---------------+----------------+------------+------------+----------+----------+----------+------------+---------+----------+------------+ 2018-04-04 21:30:52,499 - Total sparsity: 88.44 2018-04-04 21:30:52,499 - --- validate (epoch=89)----------- 2018-04-04 21:30:52,499 - 128116 samples (256 per mini-batch) 2018-04-04 21:31:04,646 - Epoch: [89][ 50/ 500] Loss 2.175988 Top1 51.289063 Top5 74.023438 2018-04-04 21:31:06,427 - Epoch: [89][ 100/ 500] Loss 2.171564 Top1 51.175781 Top5 74.308594 2018-04-04 21:31:11,432 - Epoch: [89][ 150/ 500] Loss 2.159347 Top1 51.546875 Top5 74.473958 2018-04-04 21:31:14,364 - Epoch: [89][ 200/ 500] Loss 2.156857 Top1 51.585938 Top5 74.568359 2018-04-04 21:31:18,381 - Epoch: [89][ 250/ 500] Loss 2.152790 Top1 51.707813 Top5 74.681250 2018-04-04 21:31:22,195 - Epoch: [89][ 300/ 500] Loss 2.149962 Top1 51.791667 Top5 74.755208 2018-04-04 21:31:25,508 - Epoch: [89][ 350/ 500] Loss 2.150936 Top1 51.827009 Top5 74.767857 2018-04-04 21:31:29,538 - Epoch: [89][ 400/ 500]
Loss 2.150853 Top1 51.781250 Top5 74.763672 2018-04-04 21:31:32,842 - Epoch: [89][ 450/ 500] Loss 2.150156 Top1 51.828125 Top5 74.821181 2018-04-04 21:31:35,338 - Epoch: [89][ 500/ 500] Loss 2.150417 Top1 51.833594 Top5 74.817187 2018-04-04 21:31:35,357 - == Top1: 51.838 Top5: 74.817 Loss: 2.150 2018-04-04 21:31:35,364 - Saving checkpoint 2018-04-04 21:31:39,251 - --- test --------------------- 2018-04-04 21:31:39,252 - 50000 samples (256 per mini-batch) 2018-04-04 21:31:51,512 - Test: [ 50/ 195] Loss 1.487607 Top1 63.273438 Top5 85.695312 2018-04-04 21:31:55,015 - Test: [ 100/ 195] Loss 1.638043 Top1 60.636719 Top5 83.664062 2018-04-04 21:31:58,732 - Test: [ 150/ 195] Loss 1.833214 Top1 57.619792 Top5 80.447917 2018-04-04 21:32:01,274 - == Top1: 56.606 Top5: 79.446 Loss: 1.893 Let's look at the command line again: $ time python3 compress_classifier.py -a alexnet --lr 0.005 -p 50 ../../../data.imagenet -j 44 --epochs 90 --pretrained --compress=../sensitivity-pruning/alexnet.schedule_sensitivity.yaml In this example, we prune a TorchVision pre-trained AlexNet network, using the following configuration: Learning-rate of 0.005 Print progress every 50 mini-batches. Use 44 worker threads to load data (make sure to use something suitable for your machine). Run for 90 epochs. Torchvision's pre-trained models did not store the epoch metadata, so pruning starts at epoch 0. When you train and prune your own networks, the last training epoch is saved as metadata with the model. Therefore, when you load such models, the first epoch is not 0, but it is the last training epoch. The pruning schedule is provided in alexnet.schedule_sensitivity.yaml Log files are written to directory logs .","title":"Command line arguments"},{"location":"usage.html#examples","text":"Distiller comes with several example schedules which can be used together with compress_classifier.py . These example schedule (YAML) files contain the command line used to invoke the schedule (so that you can easily recreate the results in your environment), together with the results of the pruning or regularization. The results usually contain a table showing the sparsity of each of the model parameters, together with the validation and test top1, top5 and loss scores. For more details on the example schedules, you can refer to the coverage of the Model Zoo . examples/agp-pruning : Automated Gradual Pruning (AGP) on MobileNet and ResNet18 (ImageNet dataset) examples/hybrid : AlexNet AGP with 2D (kernel) regularization (ImageNet dataset) AlexNet sensitivity pruning with 2D regularization examples/network_slimming : ResNet20 Network Slimming (this is work-in-progress) examples/pruning_filters_for_efficient_convnets : ResNet56 baseline training (CIFAR10 dataset) ResNet56 filter removal using filter ranking examples/sensitivity_analysis : Element-wise pruning sensitivity-analysis: AlexNet (ImageNet) MobileNet (ImageNet) ResNet18 (ImageNet) ResNet20 (CIFAR10) ResNet34 (ImageNet) Filter-wise pruning sensitivity-analysis: ResNet20 (CIFAR10) ResNet56 (CIFAR10) examples/sensitivity-pruning : AlexNet sensitivity pruning with Iterative Pruning AlexNet sensitivity pruning with One-Shot Pruning examples/ssl : ResNet20 baseline training (CIFAR10 dataset) Structured Sparsity Learning (SSL) with layer removal on ResNet20 SSL with channels removal on ResNet20 examples/quantization : AlexNet w.
Batch-Norm (base FP32 + DoReFa) Pre-activation ResNet20 on CIFAR10 (base FP32 + DoReFa) Pre-activation ResNet18 on ImageNet (base FP32 + DoReFa)","title":"Examples"},{"location":"usage.html#experiment-reproducibility","text":"Experiment reproducibility is sometimes important. Pete Warden recently expounded on this in his blog . PyTorch's support for deterministic execution requires us to use only one thread for loading data (otherwise the multi-threaded execution of the data loaders can create a random ordering and change the results), and to set the seed of the CPU and GPU PRNGs. Using the --deterministic command-line flag and setting j=1 will produce reproducible results (for the same PyTorch version).","title":"Experiment reproducibility"},{"location":"usage.html#performing-pruning-sensitivity-analysis","text":"Distiller supports element-wise and filter-wise pruning sensitivity analysis. In both cases, L1-norm is used to rank which elements or filters to prune. For example, when running filter-pruning sensitivity analysis, the L1-norms of the filters of each layer's weights tensor are calculated, and the bottom x% are set to zero. The analysis process is quite long, because currently we use the entire test dataset to assess the accuracy performance at each pruning level of each weights tensor. Using a small dataset for this would save much time, and we plan on assessing whether this will provide sufficient results. Results are output as a CSV file ( sensitivity.csv ) and PNG file ( sensitivity.png ). The implementation is in distiller/sensitivity.py and it contains further details about the process and the format of the CSV file. The example below performs element-wise pruning sensitivity analysis on ResNet20 for CIFAR10: $ python3 compress_classifier.py -a resnet20_cifar ../../../data.cifar10/ -j=1 --resume=../cifar10/resnet20/checkpoint_trained_dense.pth.tar --sense=element The sense command-line argument can be set to either element or filter , depending on the type of analysis you want done. There is also a Jupyter notebook with example invocations, outputs and explanations.","title":"Performing pruning sensitivity analysis"},{"location":"usage.html#post-training-quantization","text":"The following example quantizes ResNet18 for ImageNet: $ python3 compress_classifier.py -a resnet18 ../../../data.imagenet --pretrained --quantize-eval --evaluate See here for more details on how to invoke post-training quantization from the command line. A checkpoint with the quantized model will be dumped in the run directory. It will contain the quantized model parameters (the data type will still be FP32, but the values will be integers). The calculated quantization parameters (scale and zero-point) are stored as well in each quantized layer. For more examples of post-training quantization see here .","title":"Post-Training Quantization"},{"location":"usage.html#summaries","text":"You can use the sample compression application to generate model summary reports, such as the attributes and compute summary report (see screen capture below). You can log sparsity statistics (written to console and CSV file), performance, optimizer and model information, and also create a PNG image of the DNN. Creating a PNG image is an experimental feature (it relies on features which are not available on PyTorch 0.3.1 and that we hope will be available in PyTorch's next release), so to use it you will need to compile the PyTorch master branch, and hope for the best ;-).
$ python3 compress_classifier.py --resume=../ssl/checkpoints/checkpoint_trained_ch_regularized_dense.pth.tar -a=resnet20_cifar ../../../data.cifar10 --summary=compute Generates: +----+------------------------------+--------+----------+-----------------+--------------+-----------------+--------------+------------------+---------+ | | Name | Type | Attrs | IFM | IFM volume | OFM | OFM volume | Weights volume | MACs | |----+------------------------------+--------+----------+-----------------+--------------+-----------------+--------------+------------------+---------| | 0 | module.conv1 | Conv2d | k=(3, 3) | (1, 3, 32, 32) | 3072 | (1, 16, 32, 32) | 16384 | 432 | 442368 | | 1 | module.layer1.0.conv1 | Conv2d | k=(3, 3) | (1, 16, 32, 32) | 16384 | (1, 16, 32, 32) | 16384 | 2304 | 2359296 | | 2 | module.layer1.0.conv2 | Conv2d | k=(3, 3) | (1, 16, 32, 32) | 16384 | (1, 16, 32, 32) | 16384 | 2304 | 2359296 | | 3 | module.layer1.1.conv1 | Conv2d | k=(3, 3) | (1, 16, 32, 32) | 16384 | (1, 16, 32, 32) | 16384 | 2304 | 2359296 | | 4 | module.layer1.1.conv2 | Conv2d | k=(3, 3) | (1, 16, 32, 32) | 16384 | (1, 16, 32, 32) | 16384 | 2304 | 2359296 | | 5 | module.layer1.2.conv1 | Conv2d | k=(3, 3) | (1, 16, 32, 32) | 16384 | (1, 16, 32, 32) | 16384 | 2304 | 2359296 | | 6 | module.layer1.2.conv2 | Conv2d | k=(3, 3) | (1, 16, 32, 32) | 16384 | (1, 16, 32, 32) | 16384 | 2304 | 2359296 | | 7 | module.layer2.0.conv1 | Conv2d | k=(3, 3) | (1, 16, 32, 32) | 16384 | (1, 32, 16, 16) | 8192 | 4608 | 1179648 | | 8 | module.layer2.0.conv2 | Conv2d | k=(3, 3) | (1, 32, 16, 16) | 8192 | (1, 32, 16, 16) | 8192 | 9216 | 2359296 | | 9 | module.layer2.0.downsample.0 | Conv2d | k=(1, 1) | (1, 16, 32, 32) | 16384 | (1, 32, 16, 16) | 8192 | 512 | 131072 | | 10 | module.layer2.1.conv1 | Conv2d | k=(3, 3) | (1, 32, 16, 16) | 8192 | (1, 32, 16, 16) | 8192 | 9216 | 2359296 | | 11 | module.layer2.1.conv2 | Conv2d | k=(3, 3) | (1, 32, 16, 16) | 8192 | (1, 32, 16, 16) | 8192 | 9216 | 2359296 | | 12 | module.layer2.2.conv1 | Conv2d | k=(3, 3) | (1, 32, 16, 16) | 8192 | (1, 32, 16, 16) | 8192 | 9216 | 2359296 | | 13 | module.layer2.2.conv2 | Conv2d | k=(3, 3) | (1, 32, 16, 16) | 8192 | (1, 32, 16, 16) | 8192 | 9216 | 2359296 | | 14 | module.layer3.0.conv1 | Conv2d | k=(3, 3) | (1, 32, 16, 16) | 8192 | (1, 64, 8, 8) | 4096 | 18432 | 1179648 | | 15 | module.layer3.0.conv2 | Conv2d | k=(3, 3) | (1, 64, 8, 8) | 4096 | (1, 64, 8, 8) | 4096 | 36864 | 2359296 | | 16 | module.layer3.0.downsample.0 | Conv2d | k=(1, 1) | (1, 32, 16, 16) | 8192 | (1, 64, 8, 8) | 4096 | 2048 | 131072 | | 17 | module.layer3.1.conv1 | Conv2d | k=(3, 3) | (1, 64, 8, 8) | 4096 | (1, 64, 8, 8) | 4096 | 36864 | 2359296 | | 18 | module.layer3.1.conv2 | Conv2d | k=(3, 3) | (1, 64, 8, 8) | 4096 | (1, 64, 8, 8) | 4096 | 36864 | 2359296 | | 19 | module.layer3.2.conv1 | Conv2d | k=(3, 3) | (1, 64, 8, 8) | 4096 | (1, 64, 8, 8) | 4096 | 36864 | 2359296 | | 20 | module.layer3.2.conv2 | Conv2d | k=(3, 3) | (1, 64, 8, 8) | 4096 | (1, 64, 8, 8) | 4096 | 36864 | 2359296 | | 21 | module.fc | Linear | | (1, 64) | 64 | (1, 10) | 10 | 640 | 640 | +----+------------------------------+--------+----------+-----------------+--------------+-----------------+--------------+------------------+---------+ Total MACs: 40,813,184","title":"Summaries"},{"location":"usage.html#using-tensorboard","text":"Google's TensorBoard is an excellent tool for visualizing the progress of DNN training. 
Distiller's logger supports writing performance indicators and parameter statistics in a file format that can be read by TensorBoard (Distiller uses TensorFlow's APIs in order to do this, which is why Distiller requires the installation of TensorFlow). To view the graphs, invoke the TensorBoard server. For example: $ tensorboard --logdir=logs Distiller's setup (requirements.txt) installs TensorFlow for CPU. If you want a different installation, please follow the TensorFlow installation instructions .","title":"Using TensorBoard"},{"location":"usage.html#collecting-activations-statistics","text":"In CNNs with ReLU layers, ReLU activations (feature-maps) also exhibit a nice level of sparsity (50-60% sparsity is typical). You can collect activation statistics using the --act-stats command-line flag. For example: $ python3 compress_classifier.py -a=resnet56_cifar -p=50 ../../../data.cifar10 --resume=checkpoint.resnet56_cifar_baseline.pth.tar --act-stats=test -e The test parameter indicates that, in this example, we want to collect activation statistics during the test phase. Note that we also used the -e command-line argument to indicate that we want to run a test phase. The other two legal parameter values are train and valid , which collect activation statistics during the training and validation phases, respectively.","title":"Collecting activations statistics"},{"location":"usage.html#collectors-and-their-collaterals","text":"An instance of a subclass of ActivationStatsCollector can be used to collect activation statistics. Currently, ActivationStatsCollector has two types of subclasses: SummaryActivationStatsCollector and RecordsActivationStatsCollector . Instances of SummaryActivationStatsCollector compute the mean of some statistic of the activation. It is rather light-weight and quicker than collecting a record per activation. The statistic function is configured in the constructor. In the sample compression application, compress_classifier.py , we create a dictionary of collectors. For example: SummaryActivationStatsCollector(model, sparsity , lambda t: 100 * distiller.utils.sparsity(t)) The lambda expression is invoked per activation encountered during forward passes, and the value it returns (in this case, the sparsity of the activation tensors, multiplied by 100) is stored in module.sparsity ( \"sparsity\" is this collector's name). To access the statistics, you can invoke collector.value() , or you can access each module's data directly. Another type of collector is RecordsActivationStatsCollector , which computes a hard-coded set of activation statistics and collects a record per activation . For obvious reasons, this is slower than instances of SummaryActivationStatsCollector . ActivationStatsCollector defaults to collecting activation statistics only on the output activations of ReLU layers, but we can choose any layer type we want. In the example below we collect statistics from the outputs of torch.nn.Conv2d layers. RecordsActivationStatsCollector(model, classes=[torch.nn.Conv2d]) Collectors can write their data to Excel workbooks (which are named using the collector's name) by invoking collector.to_xlsx(path_to_workbook) . In compress_classifier.py we currently create four different collectors, which you can selectively disable. You can also add other statistics collectors and use a different function to compute your new statistic.
collectors = missingdict({ sparsity : SummaryActivationStatsCollector(model, sparsity , lambda t: 100 * distiller.utils.sparsity(t)), l1_channels : SummaryActivationStatsCollector(model, l1_channels , distiller.utils.activation_channels_l1), apoz_channels : SummaryActivationStatsCollector(model, apoz_channels , distiller.utils.activation_channels_apoz), records : RecordsActivationStatsCollector(model, classes=[torch.nn.Conv2d])}) By default, these Collectors write their data to files in the active log directory. You can use a utility function, distiller.log_activation_statsitics , to log the data of an ActivationStatsCollector instance to one of the backend-loggers. For example, the code below logs the \"sparsity\" collector to a TensorBoard log file. distiller.log_activation_statsitics(epoch, train , loggers=[tflogger], collector=collectors[ sparsity ])","title":"Collectors and their collaterals"},{"location":"usage.html#caveats","text":"Distiller collects activation statistics using PyTorch's forward-hooks mechanism. Collectors iteratively register the modules' forward-hooks, and collectors are called during the forward traversal and get exposed to activation data. Registering for forward callbacks is performed like this: module.register_forward_hook This makes apparent two limitations of this mechanism: We can only register on PyTorch modules. This means that we can't register on the forward hook of functionals such as torch.nn.functional.relu and torch.nn.functional.max_pool2d . Therefore, you may need to replace functionals with their module alternative. For example: class MadeUpNet(nn.Module): def __init__(self): super().__init__() self.conv1 = nn.Conv2d(3, 6, 5) def forward(self, x): x = F.relu(self.conv1(x)) return x Can be changed to: class MadeUpNet(nn.Module): def __init__(self): super().__init__() self.conv1 = nn.Conv2d(3, 6, 5) self.relu = nn.ReLU(inplace=True) def forward(self, x): x = self.relu(self.conv1(x)) return x We can only use a module instance once in our models. If we use the same module several times, then we can't determine which node in the graph has invoked the callback, because the PyTorch callback signature def hook(module, input, output) doesn't provide enough contextual information.
TorchVision's ResNet is an example of a model that uses the same instance of nn.ReLU multiple times: class BasicBlock(nn.Module): expansion = 1 def __init__(self, inplanes, planes, stride=1, downsample=None): super(BasicBlock, self).__init__() self.conv1 = conv3x3(inplanes, planes, stride) self.bn1 = nn.BatchNorm2d(planes) self.relu = nn.ReLU(inplace=True) self.conv2 = conv3x3(planes, planes) self.bn2 = nn.BatchNorm2d(planes) self.downsample = downsample self.stride = stride def forward(self, x): residual = x out = self.conv1(x) out = self.bn1(out) out = self.relu(out) # ================ out = self.conv2(out) out = self.bn2(out) if self.downsample is not None: residual = self.downsample(x) out += residual out = self.relu(out) # ================ return out In Distiller we changed ResNet to use multiple instances of nn.ReLU, and each instance is used only once: class BasicBlock(nn.Module): expansion = 1 def __init__(self, inplanes, planes, stride=1, downsample=None): super(BasicBlock, self).__init__() self.conv1 = conv3x3(inplanes, planes, stride) self.bn1 = nn.BatchNorm2d(planes) self.relu1 = nn.ReLU(inplace=True) self.conv2 = conv3x3(planes, planes) self.bn2 = nn.BatchNorm2d(planes) self.relu2 = nn.ReLU(inplace=True) self.downsample = downsample self.stride = stride def forward(self, x): residual = x out = self.conv1(x) out = self.bn1(out) out = self.relu1(out) # ================ out = self.conv2(out) out = self.bn2(out) if self.downsample is not None: residual = self.downsample(x) out += residual out = self.relu2(out) # ================ return out","title":"Caveats"},{"location":"usage.html#using-the-jupyter-notebooks","text":"The Jupyter notebooks contain many examples of how to use the statistics summaries generated by Distiller. They are explained in a separate page.","title":"Using the Jupyter notebooks"},{"location":"usage.html#generating-this-documentation","text":"Install mkdocs and the required packages by executing: $ pip3 install -r doc-requirements.txt To build the project documentation run: $ cd distiller/docs-src $ mkdocs build --clean This will create a folder named 'site' which contains the documentation website. Open distiller/docs/site/index.html to view the documentation home page.","title":"Generating this documentation"}]}
\ No newline at end of file
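The collector mechanism described in the usage text above is built on PyTorch forward hooks. Below is a minimal, self-contained sketch of that mechanism, assuming nothing beyond stock PyTorch; it is an illustration only, not Distiller's `ActivationStatsCollector` implementation, and the `SparsityCollector` and `tensor_sparsity` names are hypothetical.

```python
import torch
import torch.nn as nn

def tensor_sparsity(t):
    # Percentage of zero elements in tensor t.
    return 100.0 * (t.numel() - torch.count_nonzero(t).item()) / t.numel()

class SparsityCollector:
    """Register a forward hook on every module of the given classes and
    record the sparsity of each output activation observed (a sketch of
    the idea behind SummaryActivationStatsCollector, not its code)."""
    def __init__(self, model, classes=(nn.ReLU,)):
        self.stats = {}    # module name -> list of per-batch sparsities
        self.handles = []  # hook handles, kept so hooks can be removed later
        for name, module in model.named_modules():
            if isinstance(module, tuple(classes)):
                self.stats[name] = []
                # Bind `name` per iteration via a default argument.
                self.handles.append(module.register_forward_hook(
                    lambda mod, inp, out, name=name:
                        self.stats[name].append(tensor_sparsity(out))))

    def remove(self):
        for handle in self.handles:
            handle.remove()

model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU())
collector = SparsityCollector(model)
model(torch.randn(1, 3, 32, 32))
print(collector.stats)  # e.g. {'1': [51.3]}
collector.remove()
```

Keying the statistics by module name, as above, is also why the module-reuse caveat matters: a hook only receives `(module, input, output)`, so if one `nn.ReLU` instance serves several call sites, their activations cannot be told apart.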
diff --git a/docs/search/text.js b/docs/search/text.js
deleted file mode 100644
index 17921b6e5e0e49f9da54e1635010b3e68ccab061..0000000000000000000000000000000000000000
--- a/docs/search/text.js
+++ /dev/null
@@ -1,390 +0,0 @@
-/**
- * @license RequireJS text 2.0.12 Copyright (c) 2010-2014, The Dojo Foundation All Rights Reserved.
- * Available via the MIT or new BSD license.
- * see: http://github.com/requirejs/text for details
- */
-/*jslint regexp: true */
-/*global require, XMLHttpRequest, ActiveXObject,
-  define, window, process, Packages,
-  java, location, Components, FileUtils */
-
-define(['module'], function (module) {
-    'use strict';
-
-    var text, fs, Cc, Ci, xpcIsWindows,
-        progIds = ['Msxml2.XMLHTTP', 'Microsoft.XMLHTTP', 'Msxml2.XMLHTTP.4.0'],
-        xmlRegExp = /^\s*<\?xml(\s)+version=[\'\"](\d)*.(\d)*[\'\"](\s)*\?>/im,
-        bodyRegExp = /<body[^>]*>\s*([\s\S]+)\s*<\/body>/im,
-        hasLocation = typeof location !== 'undefined' && location.href,
-        defaultProtocol = hasLocation && location.protocol && location.protocol.replace(/\:/, ''),
-        defaultHostName = hasLocation && location.hostname,
-        defaultPort = hasLocation && (location.port || undefined),
-        buildMap = {},
-        masterConfig = (module.config && module.config()) || {};
-
-    text = {
-        version: '2.0.12',
-
-        strip: function (content) {
-            //Strips <?xml ...?> declarations so that external SVG and XML
-            //documents can be added to a document without worry. Also, if the string
-            //is an HTML document, only the part inside the body tag is returned.
-            if (content) {
-                content = content.replace(xmlRegExp, "");
-                var matches = content.match(bodyRegExp);
-                if (matches) {
-                    content = matches[1];
-                }
-            } else {
-                content = "";
-            }
-            return content;
-        },
-
-        jsEscape: function (content) {
-            return content.replace(/(['\\])/g, '\\$1')
-                .replace(/[\f]/g, "\\f")
-                .replace(/[\b]/g, "\\b")
-                .replace(/[\n]/g, "\\n")
-                .replace(/[\t]/g, "\\t")
-                .replace(/[\r]/g, "\\r")
-                .replace(/[\u2028]/g, "\\u2028")
-                .replace(/[\u2029]/g, "\\u2029");
-        },
-
-        createXhr: masterConfig.createXhr || function () {
-            //Would love to dump the ActiveX crap in here. Need IE 6 to die first.
-            var xhr, i, progId;
-            if (typeof XMLHttpRequest !== "undefined") {
-                return new XMLHttpRequest();
-            } else if (typeof ActiveXObject !== "undefined") {
-                for (i = 0; i < 3; i += 1) {
-                    progId = progIds[i];
-                    try {
-                        xhr = new ActiveXObject(progId);
-                    } catch (e) {}
-
-                    if (xhr) {
-                        progIds = [progId];  // so faster next time
-                        break;
-                    }
-                }
-            }
-
-            return xhr;
-        },
-
-        /**
-         * Parses a resource name into its component parts. Resource names
-         * look like: module/name.ext!strip, where the !strip part is
-         * optional.
-         * @param {String} name the resource name
-         * @returns {Object} with properties "moduleName", "ext" and "strip"
-         * where strip is a boolean.
-         */
-        parseName: function (name) {
-            var modName, ext, temp,
-                strip = false,
-                index = name.indexOf("."),
-                isRelative = name.indexOf('./') === 0 ||
-                             name.indexOf('../') === 0;
-
-            if (index !== -1 && (!isRelative || index > 1)) {
-                modName = name.substring(0, index);
-                ext = name.substring(index + 1, name.length);
-            } else {
-                modName = name;
-            }
-
-            temp = ext || modName;
-            index = temp.indexOf("!");
-            if (index !== -1) {
-                //Pull off the strip arg.
-                strip = temp.substring(index + 1) === "strip";
-                temp = temp.substring(0, index);
-                if (ext) {
-                    ext = temp;
-                } else {
-                    modName = temp;
-                }
-            }
-
-            return {
-                moduleName: modName,
-                ext: ext,
-                strip: strip
-            };
-        },
-
-        xdRegExp: /^((\w+)\:)?\/\/([^\/\\]+)/,
-
-        /**
-         * Is an URL on another domain. Only works for browser use, returns
-         * false in non-browser environments. Only used to know if an
-         * optimized .js version of a text resource should be loaded
-         * instead.
-         * @param {String} url
-         * @returns Boolean
-         */
-        useXhr: function (url, protocol, hostname, port) {
-            var uProtocol, uHostName, uPort,
-                match = text.xdRegExp.exec(url);
-            if (!match) {
-                return true;
-            }
-            uProtocol = match[2];
-            uHostName = match[3];
-
-            uHostName = uHostName.split(':');
-            uPort = uHostName[1];
-            uHostName = uHostName[0];
-
-            return (!uProtocol || uProtocol === protocol) &&
-                   (!uHostName || uHostName.toLowerCase() === hostname.toLowerCase()) &&
-                   ((!uPort && !uHostName) || uPort === port);
-        },
-
-        finishLoad: function (name, strip, content, onLoad) {
-            content = strip ? text.strip(content) : content;
-            if (masterConfig.isBuild) {
-                buildMap[name] = content;
-            }
-            onLoad(content);
-        },
-
-        load: function (name, req, onLoad, config) {
-            //Name has format: some.module.filext!strip
-            //The strip part is optional.
-            //if strip is present, then that means only get the string contents
-            //inside a body tag in an HTML string. For XML/SVG content it means
-            //removing the <?xml ...?> declarations so the content can be inserted
-            //into the current doc without problems.
-
-            // Do not bother with the work if a build and text will
-            // not be inlined.
-            if (config && config.isBuild && !config.inlineText) {
-                onLoad();
-                return;
-            }
-
-            masterConfig.isBuild = config && config.isBuild;
-
-            var parsed = text.parseName(name),
-                nonStripName = parsed.moduleName +
-                    (parsed.ext ? '.' + parsed.ext : ''),
-                url = req.toUrl(nonStripName),
-                useXhr = (masterConfig.useXhr) ||
-                         text.useXhr;
-
-            // Do not load if it is an empty: url
-            if (url.indexOf('empty:') === 0) {
-                onLoad();
-                return;
-            }
-
-            //Load the text. Use XHR if possible and in a browser.
-            if (!hasLocation || useXhr(url, defaultProtocol, defaultHostName, defaultPort)) {
-                text.get(url, function (content) {
-                    text.finishLoad(name, parsed.strip, content, onLoad);
-                }, function (err) {
-                    if (onLoad.error) {
-                        onLoad.error(err);
-                    }
-                });
-            } else {
-                //Need to fetch the resource across domains. Assume
-                //the resource has been optimized into a JS module. Fetch
-                //by the module name + extension, but do not include the
-                //!strip part to avoid file system issues.
-                req([nonStripName], function (content) {
-                    text.finishLoad(parsed.moduleName + '.' + parsed.ext,
-                                    parsed.strip, content, onLoad);
-                });
-            }
-        },
-
-        write: function (pluginName, moduleName, write, config) {
-            if (buildMap.hasOwnProperty(moduleName)) {
-                var content = text.jsEscape(buildMap[moduleName]);
-                write.asModule(pluginName + "!" + moduleName,
-                               "define(function () { return '" +
-                                   content +
-                               "';});\n");
-            }
-        },
-
-        writeFile: function (pluginName, moduleName, req, write, config) {
-            var parsed = text.parseName(moduleName),
-                extPart = parsed.ext ? '.' + parsed.ext : '',
-                nonStripName = parsed.moduleName + extPart,
-                //Use a '.js' file name so that it indicates it is a
-                //script that can be loaded across domains.
-                fileName = req.toUrl(parsed.moduleName + extPart) + '.js';
-
-            //Leverage own load() method to load plugin value, but only
-            //write out values that do not have the strip argument,
-            //to avoid any potential issues with ! in file names.
-            text.load(nonStripName, req, function (value) {
-                //Use own write() method to construct full module value.
-                //But need to create shell that translates writeFile's
-                //write() to the right interface.
-                var textWrite = function (contents) {
-                    return write(fileName, contents);
-                };
-                textWrite.asModule = function (moduleName, contents) {
-                    return write.asModule(moduleName, fileName, contents);
-                };
-
-                text.write(pluginName, nonStripName, textWrite, config);
-            }, config);
-        }
-    };
-
-    if (masterConfig.env === 'node' || (!masterConfig.env &&
-            typeof process !== "undefined" &&
-            process.versions &&
-            !!process.versions.node &&
-            !process.versions['node-webkit'])) {
-        //Using special require.nodeRequire, something added by r.js.
-        fs = require.nodeRequire('fs');
-
-        text.get = function (url, callback, errback) {
-            try {
-                var file = fs.readFileSync(url, 'utf8');
-                //Remove BOM (Byte Mark Order) from utf8 files if it is there.
-                if (file.indexOf('\uFEFF') === 0) {
-                    file = file.substring(1);
-                }
-                callback(file);
-            } catch (e) {
-                if (errback) {
-                    errback(e);
-                }
-            }
-        };
-    } else if (masterConfig.env === 'xhr' || (!masterConfig.env &&
-            text.createXhr())) {
-        text.get = function (url, callback, errback, headers) {
-            var xhr = text.createXhr(), header;
-            xhr.open('GET', url, true);
-
-            //Allow plugins direct access to xhr headers
-            if (headers) {
-                for (header in headers) {
-                    if (headers.hasOwnProperty(header)) {
-                        xhr.setRequestHeader(header.toLowerCase(), headers[header]);
-                    }
-                }
-            }
-
-            //Allow overrides specified in config
-            if (masterConfig.onXhr) {
-                masterConfig.onXhr(xhr, url);
-            }
-
-            xhr.onreadystatechange = function (evt) {
-                var status, err;
-                //Do not explicitly handle errors, those should be
-                //visible via console output in the browser.
-                if (xhr.readyState === 4) {
-                    status = xhr.status || 0;
-                    if (status > 399 && status < 600) {
-                        //An http 4xx or 5xx error. Signal an error.
-                        err = new Error(url + ' HTTP status: ' + status);
-                        err.xhr = xhr;
-                        if (errback) {
-                            errback(err);
-                        }
-                    } else {
-                        callback(xhr.responseText);
-                    }
-
-                    if (masterConfig.onXhrComplete) {
-                        masterConfig.onXhrComplete(xhr, url);
-                    }
-                }
-            };
-            xhr.send(null);
-        };
-    } else if (masterConfig.env === 'rhino' || (!masterConfig.env &&
-            typeof Packages !== 'undefined' && typeof java !== 'undefined')) {
-        //Why Java, why is this so awkward?
-        text.get = function (url, callback) {
-            var stringBuffer, line,
-                encoding = "utf-8",
-                file = new java.io.File(url),
-                lineSeparator = java.lang.System.getProperty("line.separator"),
-                input = new java.io.BufferedReader(new java.io.InputStreamReader(new java.io.FileInputStream(file), encoding)),
-                content = '';
-            try {
-                stringBuffer = new java.lang.StringBuffer();
-                line = input.readLine();
-
-                // Byte Order Mark (BOM) - The Unicode Standard, version 3.0, page 324
-                // http://www.unicode.org/faq/utf_bom.html
-
-                // Note that when we use utf-8, the BOM should appear as "EF BB BF", but it doesn't due to this bug in the JDK:
-                // http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4508058
-                if (line && line.length() && line.charAt(0) === 0xfeff) {
-                    // Eat the BOM, since we've already found the encoding on this file,
-                    // and we plan to concatenating this buffer with others; the BOM should
-                    // only appear at the top of a file.
-                    line = line.substring(1);
-                }
-
-                if (line !== null) {
-                    stringBuffer.append(line);
-                }
-
-                while ((line = input.readLine()) !== null) {
-                    stringBuffer.append(lineSeparator);
-                    stringBuffer.append(line);
-                }
-                //Make sure we return a JavaScript string and not a Java string.
-                content = String(stringBuffer.toString()); //String
-            } finally {
-                input.close();
-            }
-            callback(content);
-        };
-    } else if (masterConfig.env === 'xpconnect' || (!masterConfig.env &&
-            typeof Components !== 'undefined' && Components.classes &&
-            Components.interfaces)) {
-        //Avert your gaze!
-        Cc = Components.classes;
-        Ci = Components.interfaces;
-        Components.utils['import']('resource://gre/modules/FileUtils.jsm');
-        xpcIsWindows = ('@mozilla.org/windows-registry-key;1' in Cc);
-
-        text.get = function (url, callback) {
-            var inStream, convertStream, fileObj,
-                readData = {};
-
-            if (xpcIsWindows) {
-                url = url.replace(/\//g, '\\');
-            }
-
-            fileObj = new FileUtils.File(url);
-
-            //XPCOM, you so crazy
-            try {
-                inStream = Cc['@mozilla.org/network/file-input-stream;1']
-                           .createInstance(Ci.nsIFileInputStream);
-                inStream.init(fileObj, 1, 0, false);
-
-                convertStream = Cc['@mozilla.org/intl/converter-input-stream;1']
-                                .createInstance(Ci.nsIConverterInputStream);
-                convertStream.init(inStream, "utf-8", inStream.available(),
-                Ci.nsIConverterInputStream.DEFAULT_REPLACEMENT_CHARACTER);
-
-                convertStream.readString(inStream.available(), readData);
-                convertStream.close();
-                inStream.close();
-                callback(readData.value);
-            } catch (e) {
-                throw new Error((fileObj && fileObj.path || '') + ': ' + e);
-            }
-        };
-    }
-    return text;
-});
diff --git a/docs/search/worker.js b/docs/search/worker.js
new file mode 100644
index 0000000000000000000000000000000000000000..a3ccc07f28275563550ecfec00580c3382193147
--- /dev/null
+++ b/docs/search/worker.js
@@ -0,0 +1,128 @@
+var base_path = 'function' === typeof importScripts ? '.' : '/search/';
+var allowSearch = false;
+var index;
+var documents = {};
+var lang = ['en'];
+var data;
+
+function getScript(script, callback) {
+  console.log('Loading script: ' + script);
+  $.getScript(base_path + script).done(function () {
+    callback();
+  }).fail(function (jqxhr, settings, exception) {
+    console.log('Error: ' + exception);
+  });
+}
+
+function getScriptsInOrder(scripts, callback) {
+  if (scripts.length === 0) {
+    callback();
+    return;
+  }
+  getScript(scripts[0], function() {
+    getScriptsInOrder(scripts.slice(1), callback);
+  });
+}
+
+function loadScripts(urls, callback) {
+  if( 'function' === typeof importScripts ) {
+    importScripts.apply(null, urls);
+    callback();
+  } else {
+    getScriptsInOrder(urls, callback);
+  }
+}
+
+function onJSONLoaded () {
+  data = JSON.parse(this.responseText);
+  var scriptsToLoad = ['lunr.js'];
+  if (data.config && data.config.lang && data.config.lang.length) {
+    lang = data.config.lang;
+  }
+  if (lang.length > 1 || lang[0] !== "en") {
+    scriptsToLoad.push('lunr.stemmer.support.js');
+    if (lang.length > 1) {
+      scriptsToLoad.push('lunr.multi.js');
+    }
+    for (var i=0; i < lang.length; i++) {
+      if (lang[i] != 'en') {
+        scriptsToLoad.push(['lunr', lang[i], 'js'].join('.'));
+      }
+    }
+  }
+  loadScripts(scriptsToLoad, onScriptsLoaded);
+}
+
+function onScriptsLoaded () {
+  console.log('All search scripts loaded, building Lunr index...');
+  if (data.config && data.config.separator && data.config.separator.length) {
+    lunr.tokenizer.separator = new RegExp(data.config.separator);
+  }
+  if (data.index) {
+    index = lunr.Index.load(data.index);
+    data.docs.forEach(function (doc) {
+      documents[doc.location] = doc;
+    });
+    console.log('Lunr pre-built index loaded, search ready');
+  } else {
+    index = lunr(function () {
+      if (lang.length === 1 && lang[0] !== "en" && lunr[lang[0]]) {
+        this.use(lunr[lang[0]]);
+      } else if (lang.length > 1) {
+        this.use(lunr.multiLanguage.apply(null, lang));  // spread operator not supported in all browsers: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Spread_operator#Browser_compatibility
+      }
+      this.field('title');
+      this.field('text');
+      this.ref('location');
+
+      for (var i=0; i < data.docs.length; i++) {
+        var doc = data.docs[i];
+        this.add(doc);
+        documents[doc.location] = doc;
+      }
+    });
+    console.log('Lunr index built, search ready');
+  }
+  allowSearch = true;
+  postMessage({allowSearch: allowSearch});
+}
+
+function init () {
+  var oReq = new XMLHttpRequest();
+  oReq.addEventListener("load", onJSONLoaded);
+  var index_path = base_path + '/search_index.json';
+  if( 'function' === typeof importScripts ){
+      index_path = 'search_index.json';
+  }
+  oReq.open("GET", index_path);
+  oReq.send();
+}
+
+function search (query) {
+  if (!allowSearch) {
+    console.error('Assets for search still loading');
+    return;
+  }
+
+  var resultDocuments = [];
+  var results = index.search(query);
+  for (var i=0; i < results.length; i++){
+    var result = results[i];
+    var doc = documents[result.ref];
+    doc.summary = doc.text.substring(0, 200);
+    resultDocuments.push(doc);
+  }
+  return resultDocuments;
+}
+
+if( 'function' === typeof importScripts ) {
+  onmessage = function (e) {
+    if (e.data.init) {
+      init();
+    } else if (e.data.query) {
+      postMessage({ results: search(e.data.query) });
+    } else {
+      console.error("Worker - Unrecognized message: " + e);
+    }
+  };
+}
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index 91902aac6f770d48e9cac6cb2a83e1ef9fd36fb8..c40de064ca22188bd612c8039babf39792f83a33 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -1,132 +1,88 @@
 <?xml version="1.0" encoding="UTF-8"?>
 <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
-
-    
     <url>
-     <loc>/index.html</loc>
-     <lastmod>2019-02-24</lastmod>
+     <loc>None</loc>
+     <lastmod>2019-03-28</lastmod>
      <changefreq>daily</changefreq>
     </url>
-    
-
-    
     <url>
-     <loc>/install/index.html</loc>
-     <lastmod>2019-02-24</lastmod>
+     <loc>None</loc>
+     <lastmod>2019-03-28</lastmod>
      <changefreq>daily</changefreq>
     </url>
-    
-
-    
     <url>
-     <loc>/usage/index.html</loc>
-     <lastmod>2019-02-24</lastmod>
+     <loc>None</loc>
+     <lastmod>2019-03-28</lastmod>
      <changefreq>daily</changefreq>
     </url>
-    
-
-    
     <url>
-     <loc>/schedule/index.html</loc>
-     <lastmod>2019-02-24</lastmod>
+     <loc>None</loc>
+     <lastmod>2019-03-28</lastmod>
      <changefreq>daily</changefreq>
     </url>
-    
-
-    
-        
     <url>
-     <loc>/pruning/index.html</loc>
-     <lastmod>2019-02-24</lastmod>
+     <loc>None</loc>
+     <lastmod>2019-03-28</lastmod>
      <changefreq>daily</changefreq>
     </url>
-        
     <url>
-     <loc>/regularization/index.html</loc>
-     <lastmod>2019-02-24</lastmod>
+     <loc>None</loc>
+     <lastmod>2019-03-28</lastmod>
      <changefreq>daily</changefreq>
     </url>
-        
     <url>
-     <loc>/quantization/index.html</loc>
-     <lastmod>2019-02-24</lastmod>
+     <loc>None</loc>
+     <lastmod>2019-03-28</lastmod>
      <changefreq>daily</changefreq>
     </url>
-        
     <url>
-     <loc>/knowledge_distillation/index.html</loc>
-     <lastmod>2019-02-24</lastmod>
+     <loc>None</loc>
+     <lastmod>2019-03-28</lastmod>
      <changefreq>daily</changefreq>
     </url>
-        
     <url>
-     <loc>/conditional_computation/index.html</loc>
-     <lastmod>2019-02-24</lastmod>
+     <loc>None</loc>
+     <lastmod>2019-03-28</lastmod>
      <changefreq>daily</changefreq>
     </url>
-        
-    
-
-    
-        
     <url>
-     <loc>/algo_pruning/index.html</loc>
-     <lastmod>2019-02-24</lastmod>
+     <loc>None</loc>
+     <lastmod>2019-03-28</lastmod>
      <changefreq>daily</changefreq>
     </url>
-        
     <url>
-     <loc>/algo_quantization/index.html</loc>
-     <lastmod>2019-02-24</lastmod>
+     <loc>None</loc>
+     <lastmod>2019-03-28</lastmod>
      <changefreq>daily</changefreq>
     </url>
-        
     <url>
-     <loc>/algo_earlyexit/index.html</loc>
-     <lastmod>2019-02-24</lastmod>
+     <loc>None</loc>
+     <lastmod>2019-03-28</lastmod>
      <changefreq>daily</changefreq>
     </url>
-        
-    
-
-    
     <url>
-     <loc>/model_zoo/index.html</loc>
-     <lastmod>2019-02-24</lastmod>
+     <loc>None</loc>
+     <lastmod>2019-03-28</lastmod>
      <changefreq>daily</changefreq>
     </url>
-    
-
-    
     <url>
-     <loc>/jupyter/index.html</loc>
-     <lastmod>2019-02-24</lastmod>
+     <loc>None</loc>
+     <lastmod>2019-03-28</lastmod>
      <changefreq>daily</changefreq>
     </url>
-    
-
-    
     <url>
-     <loc>/design/index.html</loc>
-     <lastmod>2019-02-24</lastmod>
+     <loc>None</loc>
+     <lastmod>2019-03-28</lastmod>
      <changefreq>daily</changefreq>
     </url>
-    
-
-    
-        
     <url>
-     <loc>/tutorial-struct_pruning/index.html</loc>
-     <lastmod>2019-02-24</lastmod>
+     <loc>None</loc>
+     <lastmod>2019-03-28</lastmod>
      <changefreq>daily</changefreq>
     </url>
-        
     <url>
-     <loc>/tutorial-lang_model/index.html</loc>
-     <lastmod>2019-02-24</lastmod>
+     <loc>None</loc>
+     <lastmod>2019-03-28</lastmod>
      <changefreq>daily</changefreq>
     </url>
-        
-    
-
 </urlset>
\ No newline at end of file
diff --git a/docs/sitemap.xml.gz b/docs/sitemap.xml.gz
new file mode 100644
index 0000000000000000000000000000000000000000..8ef639b6e8a04a14b3bb4bbe2bda045c7af2dc48
Binary files /dev/null and b/docs/sitemap.xml.gz differ
diff --git a/docs/tutorial-lang_model/index.html b/docs/tutorial-lang_model.html
similarity index 93%
rename from docs/tutorial-lang_model/index.html
rename to docs/tutorial-lang_model.html
index bf81346e7670079e775d6ab92587da28fbd366d6..1966666f1426a511e6bd2865887d2de4f843b8c6 100644
--- a/docs/tutorial-lang_model/index.html
+++ b/docs/tutorial-lang_model.html
@@ -7,25 +7,26 @@
   <meta name="viewport" content="width=device-width, initial-scale=1.0">
   
   
-  <link rel="shortcut icon" href="../img/favicon.ico">
+  <link rel="shortcut icon" href="img/favicon.ico">
   <title>Pruning a Language Model - Neural Network Distiller</title>
   <link href='https://fonts.googleapis.com/css?family=Lato:400,700|Roboto+Slab:400,700|Inconsolata:400,700' rel='stylesheet' type='text/css'>
 
-  <link rel="stylesheet" href="../css/theme.css" type="text/css" />
-  <link rel="stylesheet" href="../css/theme_extra.css" type="text/css" />
-  <link rel="stylesheet" href="../css/highlight.css">
-  <link href="../extra.css" rel="stylesheet">
+  <link rel="stylesheet" href="css/theme.css" type="text/css" />
+  <link rel="stylesheet" href="css/theme_extra.css" type="text/css" />
+  <link rel="stylesheet" href="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/github.min.css">
+  <link href="extra.css" rel="stylesheet">
   
   <script>
     // Current page data
     var mkdocs_page_name = "Pruning a Language Model";
     var mkdocs_page_input_path = "tutorial-lang_model.md";
-    var mkdocs_page_url = "/tutorial-lang_model/index.html";
+    var mkdocs_page_url = null;
   </script>
   
-  <script src="../js/jquery-2.1.1.min.js"></script>
-  <script src="../js/modernizr-2.8.3.min.js"></script>
-  <script type="text/javascript" src="../js/highlight.pack.js"></script> 
+  <script src="js/jquery-2.1.1.min.js" defer></script>
+  <script src="js/modernizr-2.8.3.min.js" defer></script>
+  <script src="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/highlight.min.js"></script>
+  <script>hljs.initHighlightingOnLoad();</script> 
   
 </head>
 
@@ -36,10 +37,10 @@
     
     <nav data-toggle="wy-nav-shift" class="wy-nav-side stickynav">
       <div class="wy-side-nav-search">
-        <a href="../index.html" class="icon icon-home"> Neural Network Distiller</a>
+        <a href="index.html" class="icon icon-home"> Neural Network Distiller</a>
         <div role="search">
-  <form id ="rtd-search-form" class="wy-form" action="../search.html" method="get">
-    <input type="text" name="q" placeholder="Search docs" />
+  <form id ="rtd-search-form" class="wy-form" action="./search.html" method="get">
+    <input type="text" name="q" placeholder="Search docs" title="Type search term here" />
   </form>
 </div>
       </div>
@@ -50,22 +51,22 @@
           
             <li class="toctree-l1">
 		
-    <a class="" href="../index.html">Home</a>
+    <a class="" href="index.html">Home</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../install/index.html">Installation</a>
+    <a class="" href="install.html">Installation</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../usage/index.html">Usage</a>
+    <a class="" href="usage.html">Usage</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../schedule/index.html">Compression Scheduling</a>
+    <a class="" href="schedule.html">Compression Scheduling</a>
 	    </li>
           
             <li class="toctree-l1">
@@ -74,23 +75,23 @@
     <ul class="subnav">
                 <li class="">
                     
-    <a class="" href="../pruning/index.html">Pruning</a>
+    <a class="" href="pruning.html">Pruning</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../regularization/index.html">Regularization</a>
+    <a class="" href="regularization.html">Regularization</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../quantization/index.html">Quantization</a>
+    <a class="" href="quantization.html">Quantization</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../knowledge_distillation/index.html">Knowledge Distillation</a>
+    <a class="" href="knowledge_distillation.html">Knowledge Distillation</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../conditional_computation/index.html">Conditional Computation</a>
+    <a class="" href="conditional_computation.html">Conditional Computation</a>
                 </li>
     </ul>
 	    </li>
@@ -101,32 +102,32 @@
     <ul class="subnav">
                 <li class="">
                     
-    <a class="" href="../algo_pruning/index.html">Pruning</a>
+    <a class="" href="algo_pruning.html">Pruning</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../algo_quantization/index.html">Quantization</a>
+    <a class="" href="algo_quantization.html">Quantization</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../algo_earlyexit/index.html">Early Exit</a>
+    <a class="" href="algo_earlyexit.html">Early Exit</a>
                 </li>
     </ul>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../model_zoo/index.html">Model Zoo</a>
+    <a class="" href="model_zoo.html">Model Zoo</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../jupyter/index.html">Jupyter Notebooks</a>
+    <a class="" href="jupyter.html">Jupyter Notebooks</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../design/index.html">Design</a>
+    <a class="" href="design.html">Design</a>
 	    </li>
           
             <li class="toctree-l1">
@@ -135,11 +136,11 @@
     <ul class="subnav">
                 <li class="">
                     
-    <a class="" href="../tutorial-struct_pruning/index.html">Pruning Filters and Channels</a>
+    <a class="" href="tutorial-struct_pruning.html">Pruning Filters and Channels</a>
                 </li>
                 <li class=" current">
                     
-    <a class="current" href="index.html">Pruning a Language Model</a>
+    <a class="current" href="tutorial-lang_model.html">Pruning a Language Model</a>
     <ul class="subnav">
             
     <li class="toctree-l3"><a href="#using-distiller-to-prune-a-pytorch-language-model">Using Distiller to prune a PyTorch language model</a></li>
@@ -176,7 +177,7 @@
       
       <nav class="wy-nav-top" role="navigation" aria-label="top navigation">
         <i data-toggle="wy-nav-top" class="fa fa-bars"></i>
-        <a href="../index.html">Neural Network Distiller</a>
+        <a href="index.html">Neural Network Distiller</a>
       </nav>
 
       
@@ -184,7 +185,7 @@
         <div class="rst-content">
           <div role="navigation" aria-label="breadcrumbs navigation">
   <ul class="wy-breadcrumbs">
-    <li><a href="../index.html">Docs</a> &raquo;</li>
+    <li><a href="index.html">Docs</a> &raquo;</li>
     
       
         
@@ -308,7 +309,7 @@ Note that we can improve the results by training longer, since the loss curves a
 </table>
 <p align="center"><b>Table 1: AGP language model pruning results. <br>NNZ stands for number of non-zero coefficients (embeddings are counted once, because they are tied).</b></p>
 
-<p><center><img alt="Example 1" src="../imgs/word_lang_model_performance.png" /></center>
+<p><center><img alt="Example 1" src="imgs/word_lang_model_performance.png" /></center>
 <p align="center">
   <b>Figure 1: Perplexity vs model size (lower perplexity is better).</b>
 </p></p>
@@ -662,7 +663,7 @@ In the next installment, I'll explain how we added an implementation of Baidu Re
     <div class="rst-footer-buttons" role="navigation" aria-label="footer navigation">
       
       
-        <a href="../tutorial-struct_pruning/index.html" class="btn btn-neutral" title="Pruning Filters and Channels"><span class="icon icon-circle-arrow-left"></span> Previous</a>
+        <a href="tutorial-struct_pruning.html" class="btn btn-neutral" title="Pruning Filters and Channels"><span class="icon icon-circle-arrow-left"></span> Previous</a>
       
     </div>
   
@@ -688,16 +689,15 @@ In the next installment, I'll explain how we added an implementation of Baidu Re
     <span class="rst-current-version" data-toggle="rst-current-version">
       
       
-        <span><a href="../tutorial-struct_pruning/index.html" style="color: #fcfcfc;">&laquo; Previous</a></span>
+        <span><a href="tutorial-struct_pruning.html" style="color: #fcfcfc;">&laquo; Previous</a></span>
       
       
     </span>
 </div>
-    <script>var base_url = '..';</script>
-    <script src="../js/theme.js"></script>
-      <script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML"></script>
-      <script src="../search/require.js"></script>
-      <script src="../search/search.js"></script>
+    <script>var base_url = '.';</script>
+    <script src="js/theme.js" defer></script>
+      <script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML" defer></script>
+      <script src="search/main.js" defer></script>
 
 </body>
 </html>
diff --git a/docs/tutorial-struct_pruning/index.html b/docs/tutorial-struct_pruning.html
similarity index 82%
rename from docs/tutorial-struct_pruning/index.html
rename to docs/tutorial-struct_pruning.html
index 680cf313a1da812cdaa897fc6256c48794f90561..a63b4975ba73e9d3d79ede6e33227d4f0a48ba95 100644
--- a/docs/tutorial-struct_pruning/index.html
+++ b/docs/tutorial-struct_pruning.html
@@ -7,25 +7,26 @@
   <meta name="viewport" content="width=device-width, initial-scale=1.0">
   
   
-  <link rel="shortcut icon" href="../img/favicon.ico">
+  <link rel="shortcut icon" href="img/favicon.ico">
   <title>Pruning Filters and Channels - Neural Network Distiller</title>
   <link href='https://fonts.googleapis.com/css?family=Lato:400,700|Roboto+Slab:400,700|Inconsolata:400,700' rel='stylesheet' type='text/css'>
 
-  <link rel="stylesheet" href="../css/theme.css" type="text/css" />
-  <link rel="stylesheet" href="../css/theme_extra.css" type="text/css" />
-  <link rel="stylesheet" href="../css/highlight.css">
-  <link href="../extra.css" rel="stylesheet">
+  <link rel="stylesheet" href="css/theme.css" type="text/css" />
+  <link rel="stylesheet" href="css/theme_extra.css" type="text/css" />
+  <link rel="stylesheet" href="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/github.min.css">
+  <link href="extra.css" rel="stylesheet">
   
   <script>
     // Current page data
     var mkdocs_page_name = "Pruning Filters and Channels";
     var mkdocs_page_input_path = "tutorial-struct_pruning.md";
-    var mkdocs_page_url = "/tutorial-struct_pruning/index.html";
+    var mkdocs_page_url = null;
   </script>
   
-  <script src="../js/jquery-2.1.1.min.js"></script>
-  <script src="../js/modernizr-2.8.3.min.js"></script>
-  <script type="text/javascript" src="../js/highlight.pack.js"></script> 
+  <script src="js/jquery-2.1.1.min.js" defer></script>
+  <script src="js/modernizr-2.8.3.min.js" defer></script>
+  <script src="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/highlight.min.js"></script>
+  <script>hljs.initHighlightingOnLoad();</script> 
   
 </head>
 
@@ -36,10 +37,10 @@
     
     <nav data-toggle="wy-nav-shift" class="wy-nav-side stickynav">
       <div class="wy-side-nav-search">
-        <a href="../index.html" class="icon icon-home"> Neural Network Distiller</a>
+        <a href="index.html" class="icon icon-home"> Neural Network Distiller</a>
         <div role="search">
-  <form id ="rtd-search-form" class="wy-form" action="../search.html" method="get">
-    <input type="text" name="q" placeholder="Search docs" />
+  <form id ="rtd-search-form" class="wy-form" action="./search.html" method="get">
+    <input type="text" name="q" placeholder="Search docs" title="Type search term here" />
   </form>
 </div>
       </div>
@@ -50,22 +51,22 @@
           
             <li class="toctree-l1">
 		
-    <a class="" href="../index.html">Home</a>
+    <a class="" href="index.html">Home</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../install/index.html">Installation</a>
+    <a class="" href="install.html">Installation</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../usage/index.html">Usage</a>
+    <a class="" href="usage.html">Usage</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../schedule/index.html">Compression Scheduling</a>
+    <a class="" href="schedule.html">Compression Scheduling</a>
 	    </li>
           
             <li class="toctree-l1">
@@ -74,23 +75,23 @@
     <ul class="subnav">
                 <li class="">
                     
-    <a class="" href="../pruning/index.html">Pruning</a>
+    <a class="" href="pruning.html">Pruning</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../regularization/index.html">Regularization</a>
+    <a class="" href="regularization.html">Regularization</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../quantization/index.html">Quantization</a>
+    <a class="" href="quantization.html">Quantization</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../knowledge_distillation/index.html">Knowledge Distillation</a>
+    <a class="" href="knowledge_distillation.html">Knowledge Distillation</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../conditional_computation/index.html">Conditional Computation</a>
+    <a class="" href="conditional_computation.html">Conditional Computation</a>
                 </li>
     </ul>
 	    </li>
@@ -101,32 +102,32 @@
     <ul class="subnav">
                 <li class="">
                     
-    <a class="" href="../algo_pruning/index.html">Pruning</a>
+    <a class="" href="algo_pruning.html">Pruning</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../algo_quantization/index.html">Quantization</a>
+    <a class="" href="algo_quantization.html">Quantization</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../algo_earlyexit/index.html">Early Exit</a>
+    <a class="" href="algo_earlyexit.html">Early Exit</a>
                 </li>
     </ul>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../model_zoo/index.html">Model Zoo</a>
+    <a class="" href="model_zoo.html">Model Zoo</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../jupyter/index.html">Jupyter Notebooks</a>
+    <a class="" href="jupyter.html">Jupyter Notebooks</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../design/index.html">Design</a>
+    <a class="" href="design.html">Design</a>
 	    </li>
           
             <li class="toctree-l1">
@@ -135,7 +136,7 @@
     <ul class="subnav">
                 <li class=" current">
                     
-    <a class="current" href="index.html">Pruning Filters and Channels</a>
+    <a class="current" href="tutorial-struct_pruning.html">Pruning Filters and Channels</a>
     <ul class="subnav">
             
     <li class="toctree-l3"><a href="#pruning-filters-channels">Pruning Filters &amp; Channels</a></li>
@@ -155,7 +156,7 @@
                 </li>
                 <li class="">
                     
-    <a class="" href="../tutorial-lang_model/index.html">Pruning a Language Model</a>
+    <a class="" href="tutorial-lang_model.html">Pruning a Language Model</a>
                 </li>
     </ul>
 	    </li>
@@ -170,7 +171,7 @@
       
       <nav class="wy-nav-top" role="navigation" aria-label="top navigation">
         <i data-toggle="wy-nav-top" class="fa fa-bars"></i>
-        <a href="../index.html">Neural Network Distiller</a>
+        <a href="index.html">Neural Network Distiller</a>
       </nav>
 
       
@@ -178,7 +179,7 @@
         <div class="rst-content">
           <div role="navigation" aria-label="breadcrumbs navigation">
   <ul class="wy-breadcrumbs">
-    <li><a href="../index.html">Docs</a> &raquo;</li>
+    <li><a href="index.html">Docs</a> &raquo;</li>
     
       
         
@@ -210,7 +211,7 @@ Convolution weights are 4D: (F, C, K, K) where F is the number of filters, C is
 In filter pruning we use some criterion to determine which filters are <strong>important</strong> and which are not.  Researchers came up with all sorts of pruning criteria: the L1-magnitude of the filters (citation), the entropy of the activations (citation), and the classification accuracy reduction (citation) are just some examples.  Disregarding how we chose the filters to prune, let’s imagine that in the diagram below, we chose to prune (remove) the green and orange filters (the circle with the “*” designates a Convolution operation).</p>
 <p>Since we have two fewer filters operating on the input, we must have two fewer output feature-maps.  So when we prune filters, besides changing the physical size of the weight tensors, we also need to reconfigure the immediate Convolution layer (change its <code>out_channels</code>) and the following Convolution layer (change its <code>in_channels</code>).  And finally, because the next layer’s input is now smaller (has fewer channels), we should also shrink the next layer’s weight tensors by removing the channels corresponding to the filters we pruned.  We say that there is a <strong>data-dependency</strong> between the two Convolution layers.  I didn’t make any mention of the activation function that usually follows Convolution, because these functions are parameter-less and are not sensitive to the shape of their input.
 There are some other dependencies that Distiller resolves (such as Optimizer parameters tightly-coupled to the weights) that I won’t discuss here, because they are implementation details.
-<center><img alt="Example 1" src="../imgs/pruning_structs_ex1.png" /></center></p>
+<center><img alt="Example 1" src="imgs/pruning_structs_ex1.png" /></center></p>
 <p>The scheduler YAML syntax for this example is pasted below.  We use L1-norm ranking of weight filters, and the pruning-rate is set by the AGP algorithm (Automatic Gradual Pruning).  The Convolution layers are conveniently named <code>conv1</code> and <code>conv2</code> in this example.</p>
 <pre><code>pruners:
   example_pruner:
@@ -222,17 +223,17 @@ There are some other dependencies that Distiller resolves (such as Optimizer par
 </code></pre>
 
 <p>Now let’s add a Batch Normalization layer between the two convolutions:
-<center><img alt="Example 2" src="../imgs/pruning_structs_ex2.png" /></center></p>
+<center><img alt="Example 2" src="imgs/pruning_structs_ex2.png" /></center></p>
 <p>The Batch Normalization layer is parameterized by a couple of tensors that contain information per input-channel (i.e. scale and shift).  Because our Convolution produces fewer output FMs, and these are the input to the Batch Normalization layer, we also need to reconfigure the Batch Normalization layer.  And we also need to physically shrink the Batch Normalization layer’s scale and shift tensors, which are coefficients in the BN input transformation.  Moreover, the scale and shift coefficients that we remove from the tensors must correspond to the filters (or output feature-map channels) that we removed from the Convolution weight tensors.  This small nuance will prove to be a large pain, but we’ll get to that in later examples.
 The presence of a Batch Normalization layer in the example above is transparent to us, and in fact, the YAML schedule does not change.  Distiller detects the presence of Batch Normalization layers and adjusts their parameters automatically.</p>
 <p>Let’s look at another example, with non-serial data-dependencies.  Here, the output of <code>conv1</code> is the input for <code>conv2</code> and <code>conv3</code>.  This is an example of parallel data-dependency, since both <code>conv2</code> and <code>conv3</code> depend on <code>conv1</code>.
-<center><img alt="Example 3" src="../imgs/pruning_structs_ex3.png" /></center></p>
+<center><img alt="Example 3" src="imgs/pruning_structs_ex3.png" /></center></p>
 <p>Note that the Distiller YAML schedule is unchanged from the previous two examples, since we are still only explicitly pruning the weight filters of <code>conv1</code>.  The weight channels of <code>conv2</code> and <code>conv3</code> are pruned implicitly by Distiller in a process called “Thinning” (on which I will expand in a different post).</p>
 <p>Next, let’s look at another example also involving three Convolutions, but this time we want to prune the filters of two convolutional layers, whose outputs are element-wise-summed and fed into a third Convolution.
 In this example <code>conv3</code> is dependent on both <code>conv1</code> and <code>conv2</code>, and this dependency has two implications.  The first, and more obvious, implication is that we need to prune the same number of filters from both <code>conv1</code> and <code>conv2</code>.  Since we apply element-wise addition to the outputs of <code>conv1</code> and <code>conv2</code>, they must have the same shape - and they can only have the same shape if we prune the same number of filters from <code>conv1</code> and <code>conv2</code>.  The second implication of this triangular data-dependency is that both <code>conv1</code> and <code>conv2</code> must prune the <strong>same</strong> filters!  Let’s imagine, for a moment, that we ignore this second constraint.  The diagram below illustrates the dilemma that arises: how should we prune the channels of the weights of <code>conv3</code>?  Obviously, we can’t.
-<center><img alt="Example 4" src="../imgs/pruning_structs_ex4.png" /></center></p>
+<center><img alt="Example 4" src="imgs/pruning_structs_ex4.png" /></center></p>
 <p>We must apply the second constraint – and that means that we now need to be proactive: we need to decide whether to prune <code>conv1</code> and <code>conv2</code> according to the filter-pruning choices of <code>conv1</code> or of <code>conv2</code>.  The diagram below illustrates the pruning scheme after deciding to follow the pruning choices of <code>conv1</code>.
-<center><img alt="Example 5" src="../imgs/pruning_structs_ex5.png" /></center></p>
+<center><img alt="Example 5" src="imgs/pruning_structs_ex5.png" /></center></p>
 <p>The YAML compression schedule syntax needs to be able to express the two dependencies (or constraints) discussed above.  First, we need to tell the Filter Pruner that there is a dependency of type <strong>Leader</strong>.  This means that all of the tensors listed in the <code>weights</code> field are pruned together, to the same extent at each iteration, and that to prune the filters we will use the pruning decisions of the first tensor listed.  In the example below <code>module.conv1.weight</code> and <code>module.conv2.weight</code> are pruned together according to the pruning choices for <code>module.conv1.weight</code>.</p>
 <pre><code>pruners:
   example_pruner:
@@ -257,7 +258,7 @@ In this example <code>conv3</code> is dependent on both <code>conv1</code> and <
 </code></pre>
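 <p>In code, the Leader dependency boils down to something like the sketch below (illustrative only, not Distiller’s implementation): rank filters once, on the leader, then apply the same keep-indices to every tensor the dependency reaches.</p>
 <pre><code class="python">import torch
 import torch.nn as nn
 
 conv1 = nn.Conv2d(3, 6, 3, padding=1)  # the leader
 conv2 = nn.Conv2d(3, 6, 3, padding=1)  # pruned by the leader's choices
 conv3 = nn.Conv2d(6, 8, 3, padding=1)  # consumes the element-wise sum
 
 # Rank filters on the leader only, keeping the 4 strongest by L1 norm
 l1 = conv1.weight.data.abs().sum(dim=(1, 2, 3))
 keep = torch.argsort(l1, descending=True)[:4]
 
 # Apply the *same* choice to the leader and to its follower...
 for conv in (conv1, conv2):
     conv.weight = nn.Parameter(conv.weight.data[keep])
     conv.bias = nn.Parameter(conv.bias.data[keep])
     conv.out_channels = len(keep)
 
 # ...so conv3's input channels can be thinned unambiguously
 conv3.weight = nn.Parameter(conv3.weight.data[:, keep])
 conv3.in_channels = len(keep)
 
 x = torch.randn(1, 3, 32, 32)
 assert conv3(conv1(x) + conv2(x)).shape == (1, 8, 32, 32)
 </code></pre>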
 
 <p>As the diagram shows, <code>conv1</code> is now dependent on <code>conv2</code>, and its weight filters will be implicitly pruned according to the channels removed from the weights of <code>conv2</code>.
-<center><img alt="Example 1" src="../imgs/pruning_structs_ex1.png" /></center></p>
+<center><img alt="Example 1" src="imgs/pruning_structs_ex1.png" /></center></p>
 <p>Geek On.</p>
               
             </div>
@@ -266,10 +267,10 @@ In this example <code>conv3</code> is dependent on both <code>conv1</code> and <
   
     <div class="rst-footer-buttons" role="navigation" aria-label="footer navigation">
       
-        <a href="../tutorial-lang_model/index.html" class="btn btn-neutral float-right" title="Pruning a Language Model">Next <span class="icon icon-circle-arrow-right"></span></a>
+        <a href="tutorial-lang_model.html" class="btn btn-neutral float-right" title="Pruning a Language Model">Next <span class="icon icon-circle-arrow-right"></span></a>
       
       
-        <a href="../design/index.html" class="btn btn-neutral" title="Design"><span class="icon icon-circle-arrow-left"></span> Previous</a>
+        <a href="design.html" class="btn btn-neutral" title="Design"><span class="icon icon-circle-arrow-left"></span> Previous</a>
       
     </div>
   
@@ -295,18 +296,17 @@ In this example <code>conv3</code> is dependent on both <code>conv1</code> and <
     <span class="rst-current-version" data-toggle="rst-current-version">
       
       
-        <span><a href="../design/index.html" style="color: #fcfcfc;">&laquo; Previous</a></span>
+        <span><a href="design.html" style="color: #fcfcfc;">&laquo; Previous</a></span>
       
       
-        <span style="margin-left: 15px"><a href="../tutorial-lang_model/index.html" style="color: #fcfcfc">Next &raquo;</a></span>
+        <span style="margin-left: 15px"><a href="tutorial-lang_model.html" style="color: #fcfcfc">Next &raquo;</a></span>
       
     </span>
 </div>
-    <script>var base_url = '..';</script>
-    <script src="../js/theme.js"></script>
-      <script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML"></script>
-      <script src="../search/require.js"></script>
-      <script src="../search/search.js"></script>
+    <script>var base_url = '.';</script>
+    <script src="js/theme.js" defer></script>
+      <script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML" defer></script>
+      <script src="search/main.js" defer></script>
 
 </body>
 </html>
diff --git a/docs/usage/index.html b/docs/usage.html
similarity index 91%
rename from docs/usage/index.html
rename to docs/usage.html
index 0856f77c8533029b24ffdac75d51fa6cdcde41d7..3095b44fd796af6de2da0ef4fb57fe83059d6389 100644
--- a/docs/usage/index.html
+++ b/docs/usage.html
@@ -7,25 +7,26 @@
   <meta name="viewport" content="width=device-width, initial-scale=1.0">
   
   
-  <link rel="shortcut icon" href="../img/favicon.ico">
+  <link rel="shortcut icon" href="img/favicon.ico">
   <title>Usage - Neural Network Distiller</title>
   <link href='https://fonts.googleapis.com/css?family=Lato:400,700|Roboto+Slab:400,700|Inconsolata:400,700' rel='stylesheet' type='text/css'>
 
-  <link rel="stylesheet" href="../css/theme.css" type="text/css" />
-  <link rel="stylesheet" href="../css/theme_extra.css" type="text/css" />
-  <link rel="stylesheet" href="../css/highlight.css">
-  <link href="../extra.css" rel="stylesheet">
+  <link rel="stylesheet" href="css/theme.css" type="text/css" />
+  <link rel="stylesheet" href="css/theme_extra.css" type="text/css" />
+  <link rel="stylesheet" href="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/github.min.css">
+  <link href="extra.css" rel="stylesheet">
   
   <script>
     // Current page data
     var mkdocs_page_name = "Usage";
     var mkdocs_page_input_path = "usage.md";
-    var mkdocs_page_url = "/usage/index.html";
+    var mkdocs_page_url = null;
   </script>
   
-  <script src="../js/jquery-2.1.1.min.js"></script>
-  <script src="../js/modernizr-2.8.3.min.js"></script>
-  <script type="text/javascript" src="../js/highlight.pack.js"></script> 
+  <script src="js/jquery-2.1.1.min.js" defer></script>
+  <script src="js/modernizr-2.8.3.min.js" defer></script>
+  <script src="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/highlight.min.js"></script>
+  <script>hljs.initHighlightingOnLoad();</script> 
   
 </head>
 
@@ -36,10 +37,10 @@
     
     <nav data-toggle="wy-nav-shift" class="wy-nav-side stickynav">
       <div class="wy-side-nav-search">
-        <a href="../index.html" class="icon icon-home"> Neural Network Distiller</a>
+        <a href="index.html" class="icon icon-home"> Neural Network Distiller</a>
         <div role="search">
-  <form id ="rtd-search-form" class="wy-form" action="../search.html" method="get">
-    <input type="text" name="q" placeholder="Search docs" />
+  <form id ="rtd-search-form" class="wy-form" action="./search.html" method="get">
+    <input type="text" name="q" placeholder="Search docs" title="Type search term here" />
   </form>
 </div>
       </div>
@@ -50,17 +51,17 @@
           
             <li class="toctree-l1">
 		
-    <a class="" href="../index.html">Home</a>
+    <a class="" href="index.html">Home</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../install/index.html">Installation</a>
+    <a class="" href="install.html">Installation</a>
 	    </li>
           
             <li class="toctree-l1 current">
 		
-    <a class="current" href="index.html">Usage</a>
+    <a class="current" href="usage.html">Usage</a>
     <ul class="subnav">
             
     <li class="toctree-l2"><a href="#using-the-sample-application">Using the sample application</a></li>
@@ -97,7 +98,7 @@
           
             <li class="toctree-l1">
 		
-    <a class="" href="../schedule/index.html">Compression Scheduling</a>
+    <a class="" href="schedule.html">Compression Scheduling</a>
 	    </li>
           
             <li class="toctree-l1">
@@ -106,23 +107,23 @@
     <ul class="subnav">
                 <li class="">
                     
-    <a class="" href="../pruning/index.html">Pruning</a>
+    <a class="" href="pruning.html">Pruning</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../regularization/index.html">Regularization</a>
+    <a class="" href="regularization.html">Regularization</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../quantization/index.html">Quantization</a>
+    <a class="" href="quantization.html">Quantization</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../knowledge_distillation/index.html">Knowledge Distillation</a>
+    <a class="" href="knowledge_distillation.html">Knowledge Distillation</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../conditional_computation/index.html">Conditional Computation</a>
+    <a class="" href="conditional_computation.html">Conditional Computation</a>
                 </li>
     </ul>
 	    </li>
@@ -133,32 +134,32 @@
     <ul class="subnav">
                 <li class="">
                     
-    <a class="" href="../algo_pruning/index.html">Pruning</a>
+    <a class="" href="algo_pruning.html">Pruning</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../algo_quantization/index.html">Quantization</a>
+    <a class="" href="algo_quantization.html">Quantization</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../algo_earlyexit/index.html">Early Exit</a>
+    <a class="" href="algo_earlyexit.html">Early Exit</a>
                 </li>
     </ul>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../model_zoo/index.html">Model Zoo</a>
+    <a class="" href="model_zoo.html">Model Zoo</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../jupyter/index.html">Jupyter Notebooks</a>
+    <a class="" href="jupyter.html">Jupyter Notebooks</a>
 	    </li>
           
             <li class="toctree-l1">
 		
-    <a class="" href="../design/index.html">Design</a>
+    <a class="" href="design.html">Design</a>
 	    </li>
           
             <li class="toctree-l1">
@@ -167,11 +168,11 @@
     <ul class="subnav">
                 <li class="">
                     
-    <a class="" href="../tutorial-struct_pruning/index.html">Pruning Filters and Channels</a>
+    <a class="" href="tutorial-struct_pruning.html">Pruning Filters and Channels</a>
                 </li>
                 <li class="">
                     
-    <a class="" href="../tutorial-lang_model/index.html">Pruning a Language Model</a>
+    <a class="" href="tutorial-lang_model.html">Pruning a Language Model</a>
                 </li>
     </ul>
 	    </li>
@@ -186,7 +187,7 @@
       
       <nav class="wy-nav-top" role="navigation" aria-label="top navigation">
         <i data-toggle="wy-nav-top" class="fa fa-bars"></i>
-        <a href="../index.html">Neural Network Distiller</a>
+        <a href="index.html">Neural Network Distiller</a>
       </nav>
 
       
@@ -194,7 +195,7 @@
         <div class="rst-content">
           <div role="navigation" aria-label="breadcrumbs navigation">
   <ul class="wy-breadcrumbs">
-    <li><a href="../index.html">Docs</a> &raquo;</li>
+    <li><a href="index.html">Docs</a> &raquo;</li>
     
       
     
@@ -217,7 +218,7 @@
 </ul>
 <p>The sample application supports various features for compression of image classification DNNs, and gives an example of how to integrate Distiller into your own application.  The code is documented and should be considered the best source of documentation, but we provide some elaboration here.</p>
 <p>This diagram shows where <code>compress_classifier.py</code> fits in the compression workflow, and how we integrate the Jupyter notebooks as part of our research work.
-<center><img alt="Using Distiller" src="../imgs/use-flow.png" /></center><br></p>
+<center><img alt="Using Distiller" src="imgs/use-flow.png" /></center><br></p>
 <h2 id="command-line-arguments">Command line arguments</h2>
 <p>To get help on the command line arguments, invoke:</p>
 <pre><code>$ python3 compress_classifier.py --help
@@ -281,7 +282,7 @@ Parameters:
 <h2 id="examples">Examples</h2>
 <p>Distiller comes with several example schedules which can be used together with <code>compress_classifier.py</code>.
 These example schedule (YAML) files contain the command line that is used to invoke the schedule (so that you can easily recreate the results in your environment), together with the results of the pruning or regularization.  The results usually contain a table showing the sparsity of each of the model parameters, together with the validation and test top1, top5 and loss scores.</p>
-<p>For more details on the example schedules, you can refer to the coverage of the <a href="../model_zoo/index.html">Model Zoo</a>.</p>
+<p>For more details on the example schedules, refer to the <a href="model_zoo.html">Model Zoo</a>.</p>
 <ul>
 <li><strong>examples/agp-pruning</strong>:<ul>
 <li>Automated Gradual Pruning (AGP) on MobileNet and ResNet18 (ImageNet dataset)
@@ -356,7 +357,7 @@ Results are output as a CSV file (<code>sensitivity.csv</code>) and PNG file (<c
 <pre><code class="bash">$ python3 compress_classifier.py -a resnet18 ../../../data.imagenet  --pretrained --quantize-eval --evaluate
 </code></pre>
 
-<p>See <a href="../schedule/index.html#post-training-quantization">here</a> for more details on how to invoke post-training quantization from the command line.</p>
+<p>See <a href="schedule.html#post-training-quantization">here</a> for more details on how to invoke post-training quantization from the command line.</p>
 <p>A checkpoint with the quantized model will be dumped in the run directory. It will contain the quantized model parameters (the data type will still be FP32, but the values will be integers). The calculated quantization parameters (scale and zero-point) are stored as well in each quantized layer.</p>
 <p>For more examples of post-training quantization see <a href="https://github.com/NervanaSystems/distiller/blob/master/examples/quantization/post_training_quant">here</a>.</p>
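 <p>If you want to poke at the result yourself, a quick way is to load the checkpoint and check which tensors hold integer values.  This is a hedged sketch: the checkpoint file name and the <code>state_dict</code> key layout are assumptions, so adjust them to what your run actually produced.</p>
 <pre><code class="python">import torch
 
 # File name and key layout are assumptions; adjust to your run's output
 ckpt = torch.load('quantized_checkpoint.pth.tar', map_location='cpu')
 state_dict = ckpt.get('state_dict', ckpt) if isinstance(ckpt, dict) else ckpt
 
 for name, tensor in state_dict.items():
     if name.endswith('weight'):
         # Stored as FP32, but post-training quantization leaves integer values
         integral = torch.equal(tensor, tensor.round())
         print('{}: shape={}, integral={}'.format(name, tuple(tensor.shape), integral))
 </code></pre>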
 <h2 id="summaries">Summaries</h2>
@@ -555,10 +556,10 @@ Open distiller/docs/site/index.html to view the documentation home page.</p>
   
     <div class="rst-footer-buttons" role="navigation" aria-label="footer navigation">
       
-        <a href="../schedule/index.html" class="btn btn-neutral float-right" title="Compression Scheduling">Next <span class="icon icon-circle-arrow-right"></span></a>
+        <a href="schedule.html" class="btn btn-neutral float-right" title="Compression Scheduling">Next <span class="icon icon-circle-arrow-right"></span></a>
       
       
-        <a href="../install/index.html" class="btn btn-neutral" title="Installation"><span class="icon icon-circle-arrow-left"></span> Previous</a>
+        <a href="install.html" class="btn btn-neutral" title="Installation"><span class="icon icon-circle-arrow-left"></span> Previous</a>
       
     </div>
   
@@ -584,18 +585,17 @@ Open distiller/docs/site/index.html to view the documentation home page.</p>
     <span class="rst-current-version" data-toggle="rst-current-version">
       
       
-        <span><a href="../install/index.html" style="color: #fcfcfc;">&laquo; Previous</a></span>
+        <span><a href="install.html" style="color: #fcfcfc;">&laquo; Previous</a></span>
       
       
-        <span style="margin-left: 15px"><a href="../schedule/index.html" style="color: #fcfcfc">Next &raquo;</a></span>
+        <span style="margin-left: 15px"><a href="schedule.html" style="color: #fcfcfc">Next &raquo;</a></span>
       
     </span>
 </div>
-    <script>var base_url = '..';</script>
-    <script src="../js/theme.js"></script>
-      <script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML"></script>
-      <script src="../search/require.js"></script>
-      <script src="../search/search.js"></script>
+    <script>var base_url = '.';</script>
+    <script src="js/theme.js" defer></script>
+      <script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML" defer></script>
+      <script src="search/main.js" defer></script>
 
 </body>
 </html>