diff --git a/docs-src/docs/algo_pruning.md b/docs-src/docs/algo_pruning.md index aa4552fcf4a4470969a5bdcae1b6c37004129a1a..aac65d506306c7543f4f6b3f79cad1bfff61f97f 100755 --- a/docs-src/docs/algo_pruning.md +++ b/docs-src/docs/algo_pruning.md @@ -103,9 +103,11 @@ The authors describe AGP: - Does not make any assumptions about the structure of the network or its constituent layers, and is therefore more generally applicable. ## RNN pruner -- <b>Reference:</b> [Exploring Sparsity in Recurrent Neural Networks](https://arxiv.org/abs/1704.05119) -- <b>Authors:</b> Sharan Narang, Erich Elsen, Gregory Diamos, Shubho Sengupta -- <b>Status: not implemented</b><br> +The authors of [Exploring Sparsity in Recurrent Neural Networks](https://arxiv.org/abs/1704.05119), Sharan Narang, Erich Elsen, Gregory Diamos, and Shubho Sengupta, "propose a technique to reduce the parameters of a network by pruning weights during the initial training of the network." They use a gradual pruning schedule which is reminiscent of the schedule used in AGP, for element-wise pruning of RNNs, which they also employ during training. They show pruning of RNN, GRU, LSTM and embedding layers. + +Distiller's distiller.pruning.BaiduRNNPruner class implements this pruning algorithm. + +<center>![Gradual Pruning](imgs/baidu_rnn_pruning.png)</center> # Structure pruners Element-wise pruning can create very sparse models which can be compressed to consume less memory footprint and bandwidth, but without specialized hardware that can compute using the sparse representation of the tensors, we don't gain any speedup of the computation. Structure pruners, remove entire "structures", such as kernels, filters, and even entire feature-maps. diff --git a/docs/algo_pruning/index.html b/docs/algo_pruning/index.html index 97e88d52e988ed3e2fb21b52af0b49a08323efcc..5ac80cfa5aa0745f9fc157b9832d2c4f734e5eca 100644 --- a/docs/algo_pruning/index.html +++ b/docs/algo_pruning/index.html @@ -272,11 +272,9 @@ abundant and gradually reduce the number of weights being pruned each time as th </ul> </blockquote> <h2 id="rnn-pruner">RNN pruner</h2> -<ul> -<li><b>Reference:</b> <a href="https://arxiv.org/abs/1704.05119">Exploring Sparsity in Recurrent Neural Networks</a></li> -<li><b>Authors:</b> Sharan Narang, Erich Elsen, Gregory Diamos, Shubho Sengupta</li> -<li><b>Status: not implemented</b><br></li> -</ul> +<p>The authors of <a href="https://arxiv.org/abs/1704.05119">Exploring Sparsity in Recurrent Neural Networks</a>, Sharan Narang, Erich Elsen, Gregory Diamos, and Shubho Sengupta, "propose a technique to reduce the parameters of a network by pruning weights during the initial training of the network." They use a gradual pruning schedule which is reminiscent of the schedule used in AGP, for element-wise pruning of RNNs, which they also employ during training. They show pruning of RNN, GRU, LSTM and embedding layers.</p> +<p>Distiller's distiller.pruning.BaiduRNNPruner class implements this pruning algorithm.</p> +<p><center><img alt="Gradual Pruning" src="../imgs/baidu_rnn_pruning.png" /></center></p> <h1 id="structure-pruners">Structure pruners</h1> <p>Element-wise pruning can create very sparse models which can be compressed to consume less memory footprint and bandwidth, but without specialized hardware that can compute using the sparse representation of the tensors, we don't gain any speedup of the computation. 
Structure pruners, remove entire "structures", such as kernels, filters, and even entire feature-maps.</p> <h2 id="ranked-structure-pruner">Ranked structure pruner</h2> diff --git a/docs/index.html b/docs/index.html index aaa4ac7955e9350f4fe7a6a71eff49edd701f819..b5ca0802c2de721855bfb233ab3ac87936fe155b 100644 --- a/docs/index.html +++ b/docs/index.html @@ -246,5 +246,5 @@ And of course, if we used a sparse or compressed representation, then we are red <!-- MkDocs version : 0.17.2 -Build Date UTC : 2018-05-22 09:40:34 +Build Date UTC : 2018-06-14 10:51:56 --> diff --git a/docs/search/search_index.json b/docs/search/search_index.json index 746c39b3994b0c0f7869484ecd0cc452e1b967d6..6e6344cd64ce82898d2a5d60e9e56cafc511b5c9 100644 --- a/docs/search/search_index.json +++ b/docs/search/search_index.json @@ -282,7 +282,7 @@ }, { "location": "/algo_pruning/index.html", - "text": "Weights pruning algorithms\n\n\n\n\nMagnitude pruner\n\n\nThis is the most basic pruner: it applies a thresholding function, \\(thresh(.)\\), on each element, \\(w_i\\), of a weights tensor. A different threshold can be used for each layer's weights tensor.\n\nBecause the threshold is applied on individual elements, this pruner belongs to the element-wise pruning algorithm family.\n\n\n\\[ thresh(w_i)=\\left\\lbrace\n\\matrix{{{w_i: \\; if \\;|w_i| \\; \\gt}\\;\\lambda}\\cr {0: \\; if \\; |w_i| \\leq \\lambda} }\n\\right\\rbrace \\]\n\n\nSensitivity pruner\n\n\nFinding a threshold magnitude per layer is daunting, especially since each layer's elements have different average absolute values. We can take advantage of the fact that the weights of convolutional and fully connected layers exhibit a Gaussian distribution with a mean value roughly zero, to avoid using a direct threshold based on the values of each specific tensor.\n\n\nThe diagram below shows the distribution the weights tensor of the first convolutional layer, and first fully-connected layer in TorchVision's pre-trained Alexnet model. You can see that they have an approximate Gaussian distribution.\n\n\n \n\n\nThe distributions of Alexnet conv1 and fc1 layers\n\n\nWe use the standard deviation of the weights tensor as a sort of normalizing factor between the different weights tensors. For example, if a tensor is Normally distributed, then about 68% of the elements have an absolute value less than the standard deviation (\\(\\sigma\\)) of the tensor. Thus, if we set the threshold to \\(s*\\sigma\\), then basically we are thresholding \\(s * 68\\%\\) of the tensor elements. \n\n\n\\[ thresh(w_i)=\\left\\lbrace\n\\matrix{{{w_i: \\; if \\;|w_i| \\; \\gt}\\;\\lambda}\\cr {0: \\; if \\; |w_i| \\leq \\lambda} }\n\\right\\rbrace \\]\n\n\n\\[\n\\lambda = s * \\sigma_l \\;\\;\\; where\\; \\sigma_l\\; is \\;the \\;std \\;of \\;layer \\;l \\;as \\;measured \\;on \\;the \\;dense \\;model\n\\]\n\n\nHow do we choose this \\(s\\) multiplier?\n\n\nIn \nLearning both Weights and Connections for Efficient Neural Networks\n the authors write:\n\n\n\n\n\"We used the sensitivity results to find each layer\u2019s threshold: for example, the smallest threshold was applied to the most sensitive layer, which is the first convolutional layer... The pruning threshold is chosen as a quality parameter multiplied by the standard deviation of a layer\u2019s weights\n\n\n\n\nSo the results of executing pruning sensitivity analysis on the tensor, gives us a good starting guess at \\(s\\). 
Sensitivity analysis is an empirical method, and we still have to spend time to hone in on the exact multiplier value.\n\n\nMethod of operation\n\n\n\n\nStart by running a pruning sensitivity analysis on the model. \n\n\nThen use the results to set and tune the threshold of each layer, but instead of using a direct threshold use a sensitivity parameter which is multiplied by the standard-deviation of the initial weight-tensor's distribution.\n\n\n\n\nSchedule\n\n\nIn their \npaper\n Song Han et al. use iterative pruning and change the value of the \\(s\\) multiplier at each pruning step. Distiller's \nSensitivityPruner\n works differently: the value \\(s\\) is set once based on a one-time calculation of the standard-deviation of the tensor (the first time we prune), and relies on the fact that as the tensor is pruned, more elements are \"pulled\" toward the center of the distribution and thus more elements gets pruned.\n\n\nThis actually works quite well as we can see in the diagram below. This is a TensorBoard screen-capture from Alexnet training, which shows how this method starts off pruning very aggressively, but then slowly reduces the pruning rate.\n\n\n\nWe use a simple iterative-pruning schedule such as: \nPrune every second epoch starting at epoch 0, and ending at epoch 38.\n This excerpt from \nalexnet.schedule_sensitivity.yaml\n shows how this iterative schedule is conveyed in Distiller scheduling configuration YAML:\n\n\npruners:\n my_pruner:\n class: 'SensitivityPruner'\n sensitivities:\n 'features.module.0.weight': 0.25\n 'features.module.3.weight': 0.35\n 'features.module.6.weight': 0.40\n 'features.module.8.weight': 0.45\n 'features.module.10.weight': 0.55\n 'classifier.1.weight': 0.875\n 'classifier.4.weight': 0.875\n 'classifier.6.weight': 0.625\n\npolicies:\n - pruner:\n instance_name : 'my_pruner'\n starting_epoch: 0\n ending_epoch: 38\n frequency: 2\n\n\n\n\nLevel pruner\n\n\nClass \nSparsityLevelParameterPruner\n uses a similar method to go around specifying specific thresholding magnitudes.\nInstead of specifying a threshold magnitude, you specify a target sparsity level (expressed as a fraction, so 0.5 means 50% sparsity). Essentially this pruner also uses a pruning criteria based on the magnitude of each tensor element, but it has the advantage that you can aim for an exact and specific sparsity level.\n\nThis pruner is much more stable compared to \nSensitivityPruner\n because the target sparsity level is not coupled to the actual magnitudes of the elements. Distiller's \nSensitivityPruner\n is unstable because the final sparsity level depends on the convergence pattern of the tensor distribution. Song Han's methodology of using several different values for the multiplier \\(s\\), and the recalculation of the standard-deviation at each pruning phase, probably gives it stability, but requires much more hyper-parameters (this is the reason we have not implemented it thus far). \n\n\nTo set the target sparsity levels, you can once again use pruning sensitivity analysis to make better guesses at the correct sparsity level of each\n\n\nMethod of operation\n\n\n\n\nSort the weights in the specified layer by their absolute values. 
\n\n\nMask to zero the smallest magnitude weights until the desired sparsity level is reached.\n\n\n\n\nAutomated gradual pruner (AGP)\n\n\nIn \nTo prune, or not to prune: exploring the efficacy of pruning for model compression\n, authors Michael Zhu and Suyog Gupta provide an algorithm to schedule a Level Pruner which Distiller implements in \nAutomatedGradualPruner\n.\n\n\n\n\n\n\"We introduce a new automated gradual pruning algorithm in which the sparsity is increased from an initial sparsity value \\(s_i\\) (usually 0) to a \ufb01nal sparsity value \\(s_f\\) over a span of n pruning steps.\nThe intuition behind this sparsity function in equation (1) is to prune the network rapidly in the initial phase when the redundant connections are\nabundant and gradually reduce the number of weights being pruned each time as there are fewer and fewer weights remaining in the network.\"\"\n\n\n\n\n\n\nYou can play with the scheduling parameters in the \nagp_schedule.ipynb notebook\n.\n\n\nThe authors describe AGP:\n\n\n\n\n\n\nOur automated gradual pruning algorithm prunes the smallest magnitude weights to achieve a preset level of network sparsity.\n\n\nDoesn't require much hyper-parameter tuning\n\n\nShown to perform well across different models\n\n\nDoes not make any assumptions about the structure of the network or its constituent layers, and is therefore more generally applicable.\n\n\n\n\n\n\nRNN pruner\n\n\n\n\nReference:\n \nExploring Sparsity in Recurrent Neural Networks\n\n\nAuthors:\n Sharan Narang, Erich Elsen, Gregory Diamos, Shubho Sengupta\n\n\nStatus: not implemented\n\n\n\n\nStructure pruners\n\n\nElement-wise pruning can create very sparse models which can be compressed to consume less memory footprint and bandwidth, but without specialized hardware that can compute using the sparse representation of the tensors, we don't gain any speedup of the computation. Structure pruners, remove entire \"structures\", such as kernels, filters, and even entire feature-maps.\n\n\nRanked structure pruner\n\n\nThe \nL1RankedStructureParameterPruner\n pruner calculates the magnitude of some \"structure\", orders all of the structures based on some magnitude function and the \nm\n lowest ranking structures are pruned away. Currently this pruner only performs ranking of filters (3D structures) and it uses the mean of the absolute value of the tensor as the representative of the filter magnitude. The absolute mean does not depend on the size of the filter, so it is easier to use compared to just using the \\(L_1\\)-norm of the structure, and at the same time it is a good proxy of the \\(L_1\\)-norm.\n\n\nIn \nPruning Filters for Efficient ConvNets\n the authors use filter ranking, with \none-shot pruning\n followed by fine-tuning. The authors of \nExploiting Sparseness in Deep Neural Networks for Large Vocabulary Speech Recognition\n also use a one-shot pruning schedule, for fully-connected layers, and they provide an explanation:\n\n\n\n\nFirst, after sweeping through the full training set several times the weights become relatively stable \u2014 they tend to remain either large or small magnitudes. 
Second, in a stabilized model, the importance of the connection is approximated well by the magnitudes of the weights (times the magnitudes of the corresponding input values, but these are relatively uniform within each layer since on the input layer, features are normalized to zero-mean and unit-variance, and hidden-layer values are probabilities)\n\n\n\n\nActivation-influenced pruner\n\n\nThe motivation for this pruner, is that if a feature-map produces very small activations, then this feature-map is not very important, and can be pruned away.\n- \nStatus: not implemented", + "text": "Weights pruning algorithms\n\n\n\n\nMagnitude pruner\n\n\nThis is the most basic pruner: it applies a thresholding function, \\(thresh(.)\\), on each element, \\(w_i\\), of a weights tensor. A different threshold can be used for each layer's weights tensor.\n\nBecause the threshold is applied on individual elements, this pruner belongs to the element-wise pruning algorithm family.\n\n\n\\[ thresh(w_i)=\\left\\lbrace\n\\matrix{{{w_i: \\; if \\;|w_i| \\; \\gt}\\;\\lambda}\\cr {0: \\; if \\; |w_i| \\leq \\lambda} }\n\\right\\rbrace \\]\n\n\nSensitivity pruner\n\n\nFinding a threshold magnitude per layer is daunting, especially since each layer's elements have different average absolute values. We can take advantage of the fact that the weights of convolutional and fully connected layers exhibit a Gaussian distribution with a mean value roughly zero, to avoid using a direct threshold based on the values of each specific tensor.\n\n\nThe diagram below shows the distribution the weights tensor of the first convolutional layer, and first fully-connected layer in TorchVision's pre-trained Alexnet model. You can see that they have an approximate Gaussian distribution.\n\n\n \n\n\nThe distributions of Alexnet conv1 and fc1 layers\n\n\nWe use the standard deviation of the weights tensor as a sort of normalizing factor between the different weights tensors. For example, if a tensor is Normally distributed, then about 68% of the elements have an absolute value less than the standard deviation (\\(\\sigma\\)) of the tensor. Thus, if we set the threshold to \\(s*\\sigma\\), then basically we are thresholding \\(s * 68\\%\\) of the tensor elements. \n\n\n\\[ thresh(w_i)=\\left\\lbrace\n\\matrix{{{w_i: \\; if \\;|w_i| \\; \\gt}\\;\\lambda}\\cr {0: \\; if \\; |w_i| \\leq \\lambda} }\n\\right\\rbrace \\]\n\n\n\\[\n\\lambda = s * \\sigma_l \\;\\;\\; where\\; \\sigma_l\\; is \\;the \\;std \\;of \\;layer \\;l \\;as \\;measured \\;on \\;the \\;dense \\;model\n\\]\n\n\nHow do we choose this \\(s\\) multiplier?\n\n\nIn \nLearning both Weights and Connections for Efficient Neural Networks\n the authors write:\n\n\n\n\n\"We used the sensitivity results to find each layer\u2019s threshold: for example, the smallest threshold was applied to the most sensitive layer, which is the first convolutional layer... The pruning threshold is chosen as a quality parameter multiplied by the standard deviation of a layer\u2019s weights\n\n\n\n\nSo the results of executing pruning sensitivity analysis on the tensor, gives us a good starting guess at \\(s\\). Sensitivity analysis is an empirical method, and we still have to spend time to hone in on the exact multiplier value.\n\n\nMethod of operation\n\n\n\n\nStart by running a pruning sensitivity analysis on the model. 
\n\n\nThen use the results to set and tune the threshold of each layer, but instead of using a direct threshold use a sensitivity parameter which is multiplied by the standard-deviation of the initial weight-tensor's distribution.\n\n\n\n\nSchedule\n\n\nIn their \npaper\n Song Han et al. use iterative pruning and change the value of the \\(s\\) multiplier at each pruning step. Distiller's \nSensitivityPruner\n works differently: the value \\(s\\) is set once based on a one-time calculation of the standard-deviation of the tensor (the first time we prune), and relies on the fact that as the tensor is pruned, more elements are \"pulled\" toward the center of the distribution and thus more elements gets pruned.\n\n\nThis actually works quite well as we can see in the diagram below. This is a TensorBoard screen-capture from Alexnet training, which shows how this method starts off pruning very aggressively, but then slowly reduces the pruning rate.\n\n\n\nWe use a simple iterative-pruning schedule such as: \nPrune every second epoch starting at epoch 0, and ending at epoch 38.\n This excerpt from \nalexnet.schedule_sensitivity.yaml\n shows how this iterative schedule is conveyed in Distiller scheduling configuration YAML:\n\n\npruners:\n my_pruner:\n class: 'SensitivityPruner'\n sensitivities:\n 'features.module.0.weight': 0.25\n 'features.module.3.weight': 0.35\n 'features.module.6.weight': 0.40\n 'features.module.8.weight': 0.45\n 'features.module.10.weight': 0.55\n 'classifier.1.weight': 0.875\n 'classifier.4.weight': 0.875\n 'classifier.6.weight': 0.625\n\npolicies:\n - pruner:\n instance_name : 'my_pruner'\n starting_epoch: 0\n ending_epoch: 38\n frequency: 2\n\n\n\n\nLevel pruner\n\n\nClass \nSparsityLevelParameterPruner\n uses a similar method to go around specifying specific thresholding magnitudes.\nInstead of specifying a threshold magnitude, you specify a target sparsity level (expressed as a fraction, so 0.5 means 50% sparsity). Essentially this pruner also uses a pruning criteria based on the magnitude of each tensor element, but it has the advantage that you can aim for an exact and specific sparsity level.\n\nThis pruner is much more stable compared to \nSensitivityPruner\n because the target sparsity level is not coupled to the actual magnitudes of the elements. Distiller's \nSensitivityPruner\n is unstable because the final sparsity level depends on the convergence pattern of the tensor distribution. Song Han's methodology of using several different values for the multiplier \\(s\\), and the recalculation of the standard-deviation at each pruning phase, probably gives it stability, but requires much more hyper-parameters (this is the reason we have not implemented it thus far). \n\n\nTo set the target sparsity levels, you can once again use pruning sensitivity analysis to make better guesses at the correct sparsity level of each\n\n\nMethod of operation\n\n\n\n\nSort the weights in the specified layer by their absolute values. 
\n\n\nMask to zero the smallest magnitude weights until the desired sparsity level is reached.\n\n\n\n\nAutomated gradual pruner (AGP)\n\n\nIn \nTo prune, or not to prune: exploring the efficacy of pruning for model compression\n, authors Michael Zhu and Suyog Gupta provide an algorithm to schedule a Level Pruner which Distiller implements in \nAutomatedGradualPruner\n.\n\n\n\n\n\n\"We introduce a new automated gradual pruning algorithm in which the sparsity is increased from an initial sparsity value \\(s_i\\) (usually 0) to a \ufb01nal sparsity value \\(s_f\\) over a span of n pruning steps.\nThe intuition behind this sparsity function in equation (1) is to prune the network rapidly in the initial phase when the redundant connections are\nabundant and gradually reduce the number of weights being pruned each time as there are fewer and fewer weights remaining in the network.\"\"\n\n\n\n\n\n\nYou can play with the scheduling parameters in the \nagp_schedule.ipynb notebook\n.\n\n\nThe authors describe AGP:\n\n\n\n\n\n\nOur automated gradual pruning algorithm prunes the smallest magnitude weights to achieve a preset level of network sparsity.\n\n\nDoesn't require much hyper-parameter tuning\n\n\nShown to perform well across different models\n\n\nDoes not make any assumptions about the structure of the network or its constituent layers, and is therefore more generally applicable.\n\n\n\n\n\n\nRNN pruner\n\n\nThe authors of \nExploring Sparsity in Recurrent Neural Networks\n, Sharan Narang, Erich Elsen, Gregory Diamos, and Shubho Sengupta, \"propose a technique to reduce the parameters of a network by pruning weights during the initial training of the network.\" They use a gradual pruning schedule which is reminiscent of the schedule used in AGP, for element-wise pruning of RNNs, which they also employ during training. They show pruning of RNN, GRU, LSTM and embedding layers.\n\n\nDistiller's distiller.pruning.BaiduRNNPruner class implements this pruning algorithm.\n\n\n\n\nStructure pruners\n\n\nElement-wise pruning can create very sparse models which can be compressed to consume less memory footprint and bandwidth, but without specialized hardware that can compute using the sparse representation of the tensors, we don't gain any speedup of the computation. Structure pruners, remove entire \"structures\", such as kernels, filters, and even entire feature-maps.\n\n\nRanked structure pruner\n\n\nThe \nL1RankedStructureParameterPruner\n pruner calculates the magnitude of some \"structure\", orders all of the structures based on some magnitude function and the \nm\n lowest ranking structures are pruned away. Currently this pruner only performs ranking of filters (3D structures) and it uses the mean of the absolute value of the tensor as the representative of the filter magnitude. The absolute mean does not depend on the size of the filter, so it is easier to use compared to just using the \\(L_1\\)-norm of the structure, and at the same time it is a good proxy of the \\(L_1\\)-norm.\n\n\nIn \nPruning Filters for Efficient ConvNets\n the authors use filter ranking, with \none-shot pruning\n followed by fine-tuning. The authors of \nExploiting Sparseness in Deep Neural Networks for Large Vocabulary Speech Recognition\n also use a one-shot pruning schedule, for fully-connected layers, and they provide an explanation:\n\n\n\n\nFirst, after sweeping through the full training set several times the weights become relatively stable \u2014 they tend to remain either large or small magnitudes. 
Second, in a stabilized model, the importance of the connection is approximated well by the magnitudes of the weights (times the magnitudes of the corresponding input values, but these are relatively uniform within each layer since on the input layer, features are normalized to zero-mean and unit-variance, and hidden-layer values are probabilities)\n\n\n\n\nActivation-influenced pruner\n\n\nThe motivation for this pruner, is that if a feature-map produces very small activations, then this feature-map is not very important, and can be pruned away.\n- \nStatus: not implemented", "title": "Pruning" }, { @@ -327,7 +327,7 @@ }, { "location": "/algo_pruning/index.html#rnn-pruner", - "text": "Reference: Exploring Sparsity in Recurrent Neural Networks Authors: Sharan Narang, Erich Elsen, Gregory Diamos, Shubho Sengupta Status: not implemented", + "text": "The authors of Exploring Sparsity in Recurrent Neural Networks , Sharan Narang, Erich Elsen, Gregory Diamos, and Shubho Sengupta, \"propose a technique to reduce the parameters of a network by pruning weights during the initial training of the network.\" They use a gradual pruning schedule which is reminiscent of the schedule used in AGP, for element-wise pruning of RNNs, which they also employ during training. They show pruning of RNN, GRU, LSTM and embedding layers. Distiller's distiller.pruning.BaiduRNNPruner class implements this pruning algorithm.", "title": "RNN pruner" }, { diff --git a/docs/sitemap.xml b/docs/sitemap.xml index 64ed47f90a3467338271eb8f9238ba481721edd5..2a7ed7825a5c6372a8fa6b5c519125699e5b35e4 100644 --- a/docs/sitemap.xml +++ b/docs/sitemap.xml @@ -4,7 +4,7 @@ <url> <loc>/index.html</loc> - <lastmod>2018-05-22</lastmod> + <lastmod>2018-06-14</lastmod> <changefreq>daily</changefreq> </url> @@ -12,7 +12,7 @@ <url> <loc>/install/index.html</loc> - <lastmod>2018-05-22</lastmod> + <lastmod>2018-06-14</lastmod> <changefreq>daily</changefreq> </url> @@ -20,7 +20,7 @@ <url> <loc>/usage/index.html</loc> - <lastmod>2018-05-22</lastmod> + <lastmod>2018-06-14</lastmod> <changefreq>daily</changefreq> </url> @@ -28,7 +28,7 @@ <url> <loc>/schedule/index.html</loc> - <lastmod>2018-05-22</lastmod> + <lastmod>2018-06-14</lastmod> <changefreq>daily</changefreq> </url> @@ -37,19 +37,19 @@ <url> <loc>/pruning/index.html</loc> - <lastmod>2018-05-22</lastmod> + <lastmod>2018-06-14</lastmod> <changefreq>daily</changefreq> </url> <url> <loc>/regularization/index.html</loc> - <lastmod>2018-05-22</lastmod> + <lastmod>2018-06-14</lastmod> <changefreq>daily</changefreq> </url> <url> <loc>/quantization/index.html</loc> - <lastmod>2018-05-22</lastmod> + <lastmod>2018-06-14</lastmod> <changefreq>daily</changefreq> </url> @@ -59,13 +59,13 @@ <url> <loc>/algo_pruning/index.html</loc> - <lastmod>2018-05-22</lastmod> + <lastmod>2018-06-14</lastmod> <changefreq>daily</changefreq> </url> <url> <loc>/algo_quantization/index.html</loc> - <lastmod>2018-05-22</lastmod> + <lastmod>2018-06-14</lastmod> <changefreq>daily</changefreq> </url> @@ -74,7 +74,7 @@ <url> <loc>/model_zoo/index.html</loc> - <lastmod>2018-05-22</lastmod> + <lastmod>2018-06-14</lastmod> <changefreq>daily</changefreq> </url> @@ -82,7 +82,7 @@ <url> <loc>/jupyter/index.html</loc> - <lastmod>2018-05-22</lastmod> + <lastmod>2018-06-14</lastmod> <changefreq>daily</changefreq> </url> @@ -90,7 +90,7 @@ <url> <loc>/design/index.html</loc> - <lastmod>2018-05-22</lastmod> + <lastmod>2018-06-14</lastmod> <changefreq>daily</changefreq> </url>
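The magnitude and sensitivity pruners documented above both come down to masking every element whose absolute value falls below a threshold \(\lambda\), with the sensitivity pruner deriving \(\lambda = s\sigma_l\) from the standard deviation of the dense layer. Below is a minimal PyTorch-style sketch of that criterion; the function names are illustrative only and are not Distiller's API.

```python
import torch

def magnitude_mask(weights: torch.Tensor, threshold: float) -> torch.Tensor:
    """Element-wise mask: 1 where |w_i| > threshold, 0 where |w_i| <= threshold."""
    return (weights.abs() > threshold).float()

def sensitivity_threshold(weights: torch.Tensor, s: float) -> float:
    """lambda = s * sigma_l, with sigma_l measured on the dense weights tensor."""
    return s * weights.std().item()

# Usage: prune a conv layer with the sensitivity multiplier s = 0.25
w = torch.randn(64, 3, 11, 11)                 # an AlexNet-conv1-shaped tensor
w = w * magnitude_mask(w, sensitivity_threshold(w, s=0.25))
```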
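The level pruner's method of operation (sort the weights by absolute value, then mask the smallest-magnitude ones until the target sparsity level is reached) can be sketched the same way. This is an illustration of the idea, not the SparsityLevelParameterPruner implementation.

```python
import torch

def level_prune_mask(weights: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Mask the smallest-magnitude elements so that `sparsity` (a fraction,
    e.g. 0.5 for 50%) of the tensor is zeroed."""
    num_to_prune = int(sparsity * weights.numel())
    if num_to_prune == 0:
        return torch.ones_like(weights)
    # The k-th smallest absolute value becomes the cut-off threshold.
    threshold = weights.abs().view(-1).kthvalue(num_to_prune).values
    return (weights.abs() > threshold).float()

# Usage: aim for 50% sparsity in a fully-connected weight tensor
fc_w = torch.randn(4096, 4096)
fc_w = fc_w * level_prune_mask(fc_w, sparsity=0.5)
```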
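The AGP schedule raises sparsity from \(s_i\) to \(s_f\) over a span of n pruning steps, pruning rapidly at first and tapering off as fewer weights remain; the RNN pruner added in this change uses a gradual schedule in the same spirit. Assuming the standard cubic form of equation (1) from Zhu and Gupta's paper, the target sparsity at a given step looks like this:

```python
def agp_target_sparsity(step: int, start_step: int, total_steps: int,
                        initial_sparsity: float = 0.0,
                        final_sparsity: float = 0.8) -> float:
    """Cubic AGP schedule: prune aggressively at first, then taper off as
    fewer weights remain."""
    if step <= start_step:
        return initial_sparsity
    if step >= start_step + total_steps:
        return final_sparsity
    progress = (step - start_step) / total_steps
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - progress) ** 3

# The schedule front-loads pruning: roughly half of the final sparsity is
# reached within the first fifth of the schedule.
print([round(agp_target_sparsity(s, 0, 100), 2) for s in range(0, 101, 20)])
```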
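Finally, the ranked structure pruner scores each filter by the mean of its absolute values and removes the m lowest-ranking filters. Here is a sketch of that ranking for a 4-D convolution weight tensor, again illustrative rather than the L1RankedStructureParameterPruner code.

```python
import torch

def rank_and_mask_filters(weights: torch.Tensor, num_filters_to_prune: int) -> torch.Tensor:
    """weights has shape (num_filters, channels, k, k); zero out the filters
    whose mean absolute value is lowest."""
    # One score per filter: the mean of the absolute values of its elements.
    scores = weights.abs().view(weights.size(0), -1).mean(dim=1)
    _, prune_idx = scores.topk(num_filters_to_prune, largest=False)
    mask = torch.ones_like(weights)
    mask[prune_idx] = 0.0
    return mask

# Usage: prune the 8 weakest filters of a 64-filter convolution layer
conv_w = torch.randn(64, 3, 3, 3)
pruned_conv_w = conv_w * rank_and_mask_filters(conv_w, num_filters_to_prune=8)
```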