Commit 7596a0a6 authored by Neta Zmora

Documentation: Early Exit documentation (2)

Add missing files from previous commit
parent 5681541f
<!DOCTYPE html>
<!--[if IE 8]><html class="no-js lt-ie9" lang="en" > <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js" lang="en" > <!--<![endif]-->
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link rel="shortcut icon" href="../img/favicon.ico">
<title>Early Exit - Neural Network Distiller</title>
<link href='https://fonts.googleapis.com/css?family=Lato:400,700|Roboto+Slab:400,700|Inconsolata:400,700' rel='stylesheet' type='text/css'>
<link rel="stylesheet" href="../css/theme.css" type="text/css" />
<link rel="stylesheet" href="../css/theme_extra.css" type="text/css" />
<link rel="stylesheet" href="../css/highlight.css">
<link href="../extra.css" rel="stylesheet">
<script>
// Current page data
var mkdocs_page_name = "Early Exit";
var mkdocs_page_input_path = "algo_earlyexit.md";
var mkdocs_page_url = "/algo_earlyexit/index.html";
</script>
<script src="../js/jquery-2.1.1.min.js"></script>
<script src="../js/modernizr-2.8.3.min.js"></script>
<script type="text/javascript" src="../js/highlight.pack.js"></script>
</head>
<body class="wy-body-for-nav" role="document">
<div class="wy-grid-for-nav">
<nav data-toggle="wy-nav-shift" class="wy-nav-side stickynav">
<div class="wy-side-nav-search">
<a href="../index.html" class="icon icon-home"> Neural Network Distiller</a>
<div role="search">
<form id ="rtd-search-form" class="wy-form" action="../search.html" method="get">
<input type="text" name="q" placeholder="Search docs" />
</form>
</div>
</div>
<div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
<ul class="current">
<li class="toctree-l1">
<a class="" href="../index.html">Home</a>
</li>
<li class="toctree-l1">
<a class="" href="../install/index.html">Installation</a>
</li>
<li class="toctree-l1">
<a class="" href="../usage/index.html">Usage</a>
</li>
<li class="toctree-l1">
<a class="" href="../schedule/index.html">Compression scheduling</a>
</li>
<li class="toctree-l1">
<span class="caption-text">Compressing models</span>
<ul class="subnav">
<li class="">
<a class="" href="../pruning/index.html">Pruning</a>
</li>
<li class="">
<a class="" href="../regularization/index.html">Regularization</a>
</li>
<li class="">
<a class="" href="../quantization/index.html">Quantization</a>
</li>
<li class="">
<a class="" href="../knowledge_distillation/index.html">Knowledge Distillation</a>
</li>
<li class="">
<a class="" href="../conditional_computation/index.html">Conditional Computation</a>
</li>
</ul>
</li>
<li class="toctree-l1">
<span class="caption-text">Algorithms</span>
<ul class="subnav">
<li class="">
<a class="" href="../algo_pruning/index.html">Pruning</a>
</li>
<li class="">
<a class="" href="../algo_quantization/index.html">Quantization</a>
</li>
<li class=" current">
<a class="current" href="index.html">Early Exit</a>
<ul class="subnav">
<li class="toctree-l3"><a href="#early-exit-inference">Early Exit Inference</a></li>
<ul>
<li><a class="toctree-l4" href="#why-does-early-exit-work">Why Does Early Exit Work?</a></li>
<li><a class="toctree-l4" href="#example-code-for-early-exit">Example code for Early Exit</a></li>
<li><a class="toctree-l4" href="#references">References</a></li>
</ul>
</ul>
</li>
</ul>
</li>
<li class="toctree-l1">
<a class="" href="../model_zoo/index.html">Model Zoo</a>
</li>
<li class="toctree-l1">
<a class="" href="../jupyter/index.html">Jupyter notebooks</a>
</li>
<li class="toctree-l1">
<a class="" href="../design/index.html">Design</a>
</li>
</ul>
</div>
&nbsp;
</nav>
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap">
<nav class="wy-nav-top" role="navigation" aria-label="top navigation">
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="../index.html">Neural Network Distiller</a>
</nav>
<div class="wy-nav-content">
<div class="rst-content">
<div role="navigation" aria-label="breadcrumbs navigation">
<ul class="wy-breadcrumbs">
<li><a href="../index.html">Docs</a> &raquo;</li>
<li>Algorithms &raquo;</li>
<li>Early Exit</li>
<li class="wy-breadcrumbs-aside">
</li>
</ul>
<hr/>
</div>
<div role="main">
<div class="section">
<h1 id="early-exit-inference">Early Exit Inference</h1>
<p>While Deep Neural Networks benefit from a large number of layers, many data points in classification tasks can often be classified accurately with much less work. Several recent studies explore the idea of exiting before the normal endpoint of the neural network. Panda et al. in <a href="#panda">Conditional Deep Learning for Energy-Efficient and Enhanced Pattern Recognition</a> point out that many data points can be classified easily and require less processing than more difficult points, and they view this in terms of power savings. Teerapittayanon et al. in <a href="#branchynet">BranchyNet: Fast Inference via Early Exiting from Deep Neural Networks</a> look at a selective approach to exit placement and criteria for exiting early.</p>
<h2 id="why-does-early-exit-work">Why Does Early Exit Work?</h2>
<p>Early Exit is a strategy built on a straightforward, easy-to-understand concept. The figure below shows a simple example in a 2-D feature space. While deep networks can represent more complex and expressive boundaries between classes (assuming we’re confident of avoiding over-fitting the data), it’s also clear that much of the data can be properly classified with even the simplest of classification boundaries.</p>
<p><img alt="Figure: Simple and more expressive classification boundaries" src="../imgs/decision_boundary.png" /></p>
<p>Data points far from the boundary can be considered "easy to classify" and reach a high degree of confidence sooner than data points close to the boundary. In fact, we can think of the area between the outer straight lines as the region that is "difficult to classify" and requires the full expressiveness of the neural network to classify accurately.</p>
<h2 id="example-code-for-early-exit">Example code for Early Exit</h2>
<p>Both the CIFAR10 and ImageNet code come directly from publicly available PyTorch examples. The only edits are the exits, which are inserted using a methodology similar to the BranchyNet work.</p>
<p>Deeper networks can benefit from multiple exits. Our examples illustrate both a single and a pair of early exits for CIFAR10 and ImageNet, respectively.</p>
<p>Note that this code does not actually take exits. Instead, it computes loss and accuracy statistics as if an exit were taken whenever its criterion is met. Actually implementing exits can be tricky and architecture-dependent, and we plan to address these issues.</p>
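<p>As a rough illustration of what "computing statistics assuming exits were taken" can look like, the following minimal sketch counts how many samples would leave at each exit and how many of those would be classified correctly. It is not Distiller's implementation: the function names, the nested-list data layout, and the use of the ground-truth label in the cross-entropy criterion (as in a validation setting) are assumptions made for this example.</p>

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Cross entropy of one sample against its (assumed known) label."""
    return -math.log(softmax(logits)[label])


def simulate_exits(per_exit_logits, labels, thresholds):
    """Simulate exit-taking without changing the forward pass.

    per_exit_logits[e][i] holds the logits of sample i at exit e;
    the last entry is the final (regular) exit.
    thresholds holds one value per early exit (hypothetical layout).
    Returns (taken, correct): per-exit sample counts and correct counts.
    """
    num_exits = len(per_exit_logits)
    if len(thresholds) != num_exits - 1:
        raise ValueError("need exactly one threshold per early exit")

    taken = [0] * num_exits
    correct = [0] * num_exits
    for i, label in enumerate(labels):
        chosen = num_exits - 1  # default: continue to the final exit
        for e, threshold in enumerate(thresholds):
            # Take the first early exit whose loss is below its threshold.
            if cross_entropy(per_exit_logits[e][i], label) < threshold:
                chosen = e
                break
        logits = per_exit_logits[chosen][i]
        prediction = max(range(len(logits)), key=logits.__getitem__)
        taken[chosen] += 1
        correct[chosen] += int(prediction == label)
    return taken, correct
```

<p>Every sample is still evaluated at every exit here, which mirrors the statement above that the exits are only simulated, not taken.</p>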
<h3 id="heuristics">Heuristics</h3>
<p>The insertion of the exits is ad hoc, but some heuristic principles guide their placement and parameters. The earlier an exit is placed, the more aggressive it is, because it essentially prunes the rest of the network at a very early stage and thus saves a lot of work. However, the earlier the exit, the smaller the percentage of data that can be directed through it if we are to preserve accuracy.</p>
<p>Adding exits has other benefits: when training the modified network, the losses backpropagated from the early exits affect the earlier layers more substantially than the loss from the last exit does. This effect mitigates problems such as vanishing gradients.</p>
<h3 id="early-exit-hyperparameters">Early Exit Hyperparameters</h3>
<p>There are two parameters that are required to enable early exit. Leave them undefined if you are not enabling Early Exit:</p>
<ol>
<li>
<p><strong>--earlyexit_thresholds</strong> defines the thresholds for each of the early exits. The cross entropy measure must be <strong>less than</strong> the specified threshold to take a specific exit; otherwise the data continues along the regular path. For example, specifying "--earlyexit_thresholds 0.9 1.2" implies two early exits with thresholds of 0.9 and 1.2, respectively.</p>
</li>
<li>
<p><strong>--earlyexit_lossweights</strong> provides the weights for the linear combination of losses used during training to compute a single, overall loss. We only specify weights for the early exits and assume that the sum of the weights (including the final exit) is equal to 1.0. So an example of "--earlyexit_lossweights 0.2 0.3" implies two early exits weighted with values of 0.2 and 0.3, respectively, and a final exit with a weight of 1.0-(0.2+0.3) = 0.5. Studies have shown that weighting the early exits more heavily will create more aggressive early exits, but perhaps with a slight negative effect on accuracy.</p>
</li>
</ol>
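<p>The loss-weight arithmetic described above can be sketched as follows. This is an illustrative helper, not Distiller code: the function name <code>combine_exit_losses</code> and its arguments are hypothetical, and the per-exit losses are plain floats rather than tensors.</p>

```python
def combine_exit_losses(exit_losses, final_loss, early_weights):
    """Linearly combine early-exit losses with the final-exit loss.

    exit_losses:   loss of each early exit, in exit order
    final_loss:    loss of the final (regular) exit
    early_weights: weights given on the command line, one per early exit
    Returns (overall_loss, final_weight).
    """
    if len(exit_losses) != len(early_weights):
        raise ValueError("need exactly one weight per early exit")

    # The final exit receives whatever weight is left so that all sum to 1.0.
    final_weight = 1.0 - sum(early_weights)
    if final_weight < 0.0:
        raise ValueError("early-exit weights must not sum to more than 1.0")

    overall = sum(w * loss for w, loss in zip(early_weights, exit_losses))
    overall += final_weight * final_loss
    return overall, final_weight
```

<p>With weights 0.2 and 0.3, early-exit losses of 2.0 and 1.0, and a final loss of 0.5, the final exit gets weight 0.5 and the overall loss is 0.2·2.0 + 0.3·1.0 + 0.5·0.5 = 0.95.</p>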
<h3 id="cifar10">CIFAR10</h3>
<p>In the case of CIFAR10, we have inserted a single exit after the first full layer grouping. The exit path itself includes a convolutional layer and a fully connected layer. If you move the exit, be sure to match the input and output sizes of the exit layers to the new location.</p>
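<p>To make the size-matching requirement concrete, the sketch below computes the number of input features that the exit's fully connected layer would need, given the feature map entering the exit's convolutional layer. It uses the standard convolution output-size formula (without dilation). The helper names and all numeric values in the usage note are hypothetical and are not taken from the Distiller example.</p>

```python
def conv2d_output_size(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension (no dilation)."""
    return (size + 2 * padding - kernel_size) // stride + 1


def exit_fc_in_features(in_size, conv_out_channels, kernel_size,
                        stride=1, padding=0):
    """Input features for a fully connected layer that follows the exit conv.

    in_size:           height/width of the (square) feature map at the exit
    conv_out_channels: output channels of the exit's convolutional layer
    """
    out_size = conv2d_output_size(in_size, kernel_size, stride, padding)
    # The conv output is flattened to channels * height * width.
    return conv_out_channels * out_size * out_size
```

<p>For instance, assuming a 32x32 feature map at the exit and an exit convolution with 8 output channels, a 3x3 kernel, stride 2 and padding 1, the convolution produces 16x16 maps and the fully connected layer would need 8 * 16 * 16 = 2048 input features. Moving the exit to a point with a different feature-map size changes this number.</p>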
<h3 id="imagenet">ImageNet</h3>
<p>This example supports training and inference on the ImageNet dataset with several well-known deep architectures. ResNet-50 is the architecture of interest in this study; however, the exits are defined in the generic ResNet code and could be used with ResNets of other sizes. There are two exits inserted in this example. Again, the sizes of the exit layers must match properly.</p>
<h2 id="references">References</h2>
<div id="panda"></div>
<p><strong>Priyadarshini Panda, Abhronil Sengupta, Kaushik Roy</strong>.
<a href="https://arxiv.org/abs/1509.08971v6"><em>Conditional Deep Learning for Energy-Efficient and Enhanced Pattern Recognition</em></a>, arXiv:1509.08971v6, 2017.</p>
<div id="branchynet"></div>
<p><strong>Surat Teerapittayanon, Bradley McDanel, H. T. Kung</strong>.
<a href="http://arxiv.org/abs/1709.01686"><em>BranchyNet: Fast Inference via Early Exiting from Deep Neural Networks</em></a>, arXiv:1709.01686, 2017.</p>
</div>
</div>
<footer>
<div class="rst-footer-buttons" role="navigation" aria-label="footer navigation">
<a href="../model_zoo/index.html" class="btn btn-neutral float-right" title="Model Zoo">Next <span class="icon icon-circle-arrow-right"></span></a>
<a href="../algo_quantization/index.html" class="btn btn-neutral" title="Quantization"><span class="icon icon-circle-arrow-left"></span> Previous</a>
</div>
<hr/>
<div role="contentinfo">
<!-- Copyright etc -->
</div>
Built with <a href="http://www.mkdocs.org">MkDocs</a> using a <a href="https://github.com/snide/sphinx_rtd_theme">theme</a> provided by <a href="https://readthedocs.org">Read the Docs</a>.
</footer>
</div>
</div>
</section>
</div>
<div class="rst-versions" role="note" style="cursor: pointer">
<span class="rst-current-version" data-toggle="rst-current-version">
<span><a href="../algo_quantization/index.html" style="color: #fcfcfc;">&laquo; Previous</a></span>
<span style="margin-left: 15px"><a href="../model_zoo/index.html" style="color: #fcfcfc">Next &raquo;</a></span>
</span>
</div>
<script>var base_url = '..';</script>
<script src="../js/theme.js"></script>
<script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML"></script>
<script src="../search/require.js"></script>
<script src="../search/search.js"></script>
</body>
</html>
<!DOCTYPE html>
<!--[if IE 8]><html class="no-js lt-ie9" lang="en" > <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js" lang="en" > <!--<![endif]-->
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link rel="shortcut icon" href="../img/favicon.ico">
<title>Conditional Computation - Neural Network Distiller</title>
<link href='https://fonts.googleapis.com/css?family=Lato:400,700|Roboto+Slab:400,700|Inconsolata:400,700' rel='stylesheet' type='text/css'>
<link rel="stylesheet" href="../css/theme.css" type="text/css" />
<link rel="stylesheet" href="../css/theme_extra.css" type="text/css" />
<link rel="stylesheet" href="../css/highlight.css">
<link href="../extra.css" rel="stylesheet">
<script>
// Current page data
var mkdocs_page_name = "Conditional Computation";
var mkdocs_page_input_path = "conditional_computation.md";
var mkdocs_page_url = "/conditional_computation/index.html";
</script>
<script src="../js/jquery-2.1.1.min.js"></script>
<script src="../js/modernizr-2.8.3.min.js"></script>
<script type="text/javascript" src="../js/highlight.pack.js"></script>
</head>
<body class="wy-body-for-nav" role="document">
<div class="wy-grid-for-nav">
<nav data-toggle="wy-nav-shift" class="wy-nav-side stickynav">
<div class="wy-side-nav-search">
<a href="../index.html" class="icon icon-home"> Neural Network Distiller</a>
<div role="search">
<form id ="rtd-search-form" class="wy-form" action="../search.html" method="get">
<input type="text" name="q" placeholder="Search docs" />
</form>
</div>
</div>
<div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
<ul class="current">
<li class="toctree-l1">
<a class="" href="../index.html">Home</a>
</li>
<li class="toctree-l1">
<a class="" href="../install/index.html">Installation</a>
</li>
<li class="toctree-l1">
<a class="" href="../usage/index.html">Usage</a>
</li>
<li class="toctree-l1">
<a class="" href="../schedule/index.html">Compression scheduling</a>
</li>
<li class="toctree-l1">
<span class="caption-text">Compressing models</span>
<ul class="subnav">
<li class="">
<a class="" href="../pruning/index.html">Pruning</a>
</li>
<li class="">
<a class="" href="../regularization/index.html">Regularization</a>
</li>
<li class="">
<a class="" href="../quantization/index.html">Quantization</a>
</li>
<li class="">
<a class="" href="../knowledge_distillation/index.html">Knowledge Distillation</a>
</li>
<li class=" current">
<a class="current" href="index.html">Conditional Computation</a>
<ul class="subnav">
<li class="toctree-l3"><a href="#conditional-computation">Conditional Computation</a></li>
<ul>
<li><a class="toctree-l4" href="#references">References</a></li>
</ul>
</ul>
</li>
</ul>
</li>
<li class="toctree-l1">
<span class="caption-text">Algorithms</span>
<ul class="subnav">
<li class="">
<a class="" href="../algo_pruning/index.html">Pruning</a>
</li>
<li class="">
<a class="" href="../algo_quantization/index.html">Quantization</a>
</li>
<li class="">
<a class="" href="../algo_earlyexit/index.html">Early Exit</a>
</li>
</ul>
</li>
<li class="toctree-l1">
<a class="" href="../model_zoo/index.html">Model Zoo</a>
</li>
<li class="toctree-l1">
<a class="" href="../jupyter/index.html">Jupyter notebooks</a>
</li>
<li class="toctree-l1">
<a class="" href="../design/index.html">Design</a>
</li>
</ul>
</div>
&nbsp;
</nav>
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap">
<nav class="wy-nav-top" role="navigation" aria-label="top navigation">
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="../index.html">Neural Network Distiller</a>
</nav>
<div class="wy-nav-content">
<div class="rst-content">
<div role="navigation" aria-label="breadcrumbs navigation">
<ul class="wy-breadcrumbs">
<li><a href="../index.html">Docs</a> &raquo;</li>
<li>Compressing models &raquo;</li>
<li>Conditional Computation</li>
<li class="wy-breadcrumbs-aside">
</li>
</ul>
<hr/>
</div>
<div role="main">
<div class="section">
<h1 id="conditional-computation">Conditional Computation</h1>
<p>Conditional Computation refers to a class of algorithms in which each input sample uses a different part of the model, such that on average the compute, latency or power (depending on our objective) is reduced.
To quote <a href="#bengio">Bengio et al.</a>:</p>
<blockquote>
<p>"Conditional computation refers to activating only some of the units in a network, in an input-dependent fashion. For example, if we think we’re looking at a car, we only need to compute the activations of the vehicle detecting units, not of all features that a network could possible compute. The immediate effect of activating fewer units is that propagating information through the network will be faster, both at training as well as at test time. However, one needs to be able to decide in an intelligent fashion which units to turn on and off, depending on the input data. This is typically achieved with some form of gating structure, learned in parallel with the original network."</p>
</blockquote>
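<p>The gating idea in the quote can be illustrated with a toy layer in which each unit is computed only when an input-dependent gate opens. This is a minimal sketch under assumptions of our own (a sigmoid gate per unit, a fixed 0.5 threshold, ReLU units, hypothetical function names); it is not the method of any of the cited papers. In a real network a gate would typically control a block that is far more expensive than the gate itself.</p>

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(weights, x):
    return sum(w * xi for w, xi in zip(weights, x))


def gated_layer(x, unit_weights, gate_weights, gate_threshold=0.5):
    """Evaluate only the units whose gate opens for this input.

    unit_weights[j] and gate_weights[j] are the weight vectors of unit j
    and of its gate. Returns (outputs, evaluated), where skipped units
    output 0.0 and evaluated counts the units actually computed.
    """
    outputs = []
    evaluated = 0
    for w_unit, w_gate in zip(unit_weights, gate_weights):
        if sigmoid(dot(w_gate, x)) > gate_threshold:
            outputs.append(max(0.0, dot(w_unit, x)))  # ReLU activation
            evaluated += 1
        else:
            outputs.append(0.0)  # unit skipped for this input
    return outputs, evaluated
```

<p>Different inputs open different gates, so the amount of work done varies per sample, which is the property the quote describes.</p>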
<p>There are several approaches to implementing Conditional Computation:</p>
<ul>
<li><a href="#sun">Sun et al.</a> use several expert CNNs, each trained on a different task, and combine them into one large network.</li>
<li><a href="#zheng">Zheng et al.</a> use cascading, an idea which may be familiar to you from Viola-Jones face detection.</li>
<li><a href="#theodorakopoulos">Theodorakopoulos et al.</a> add small layers that learn which filters to use per input sample, and then enforce that during inference (LKAM module).</li>
<li><a href="#ioannou">Ioannou et al.</a> introduce Conditional Networks, which "can be thought of as: i) decision trees augmented with data transformation operators, or ii) CNNs, with block-diagonal sparse weight matrices, and explicit data routing functions".</li>
<li><a href="#bolukbasi">Bolukbasi et al.</a> "learn a system to adaptively choose the components of a deep network to be evaluated for each example. By allowing examples correctly classified using early layers of the system to exit, we avoid the computational time associated with full evaluation of the network. We extend this to learn a network selection system that adaptively selects the network to be evaluated for each example."</li>
</ul>
<p>Conditional Computation is especially useful for real-time, latency-sensitive applications.<br>
Distiller currently implements a variant of Early Exit.</p>
<h2 id="references">References</h2>
<div id="bengio"></div>
<p><strong>Emmanuel Bengio, Pierre-Luc Bacon, Joelle Pineau, Doina Precup</strong>.
<a href="https://arxiv.org/abs/1511.06297"><em>Conditional Computation in Neural Networks for faster models</em></a>, arXiv:1511.06297v2, 2016.</p>
<div id="sun"></div>
<p><strong>Y. Sun, X. Wang, and X. Tang</strong>.
<em>Deep Convolutional Network Cascade for Facial Point Detection</em>. In Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2014.</p>
<div id="zheng"></div>
<p><strong>X. Zheng, W. Ouyang, and X. Wang</strong>. <em>Multi-Stage Contextual Deep Learning for Pedestrian Detection</em>. In Proc. IEEE Intl Conf. on Computer Vision (ICCV), 2014.</p>
<div id="theodorakopoulos"></div>
<p><strong>I. Theodorakopoulos, V. Pothos, D. Kastaniotis and N. Fragoulis</strong>. <em>Parsimonious Inference on Convolutional Neural Networks: Learning and applying on-line kernel activation rules</em>. Irida Labs S.A, January 2017.</p>
<div id="bolukbasi"></div>
<p><strong>Tolga Bolukbasi, Joseph Wang, Ofer Dekel, Venkatesh Saligrama</strong>. <a href="http://proceedings.mlr.press/v70/bolukbasi17a/bolukbasi17a.pdf"><em>Adaptive Neural Networks for Efficient Inference</em></a>. Proceedings of the 34th International Conference on Machine Learning, PMLR 70:527-536, 2017.</p>
<div id="ioannou"></div>
<p><strong>Yani Ioannou, Duncan Robertson, Darko Zikic, Peter Kontschieder, Jamie Shotton, Matthew Brown, Antonio Criminisi</strong>.
<a href="https://arxiv.org/abs/1603.01250"><em>Decision Forests, Convolutional Networks and the Models in-Between</em></a>, arXiv:1603.01250, 2016.</p>
</div>
</div>
<footer>
<div class="rst-footer-buttons" role="navigation" aria-label="footer navigation">
<a href="../algo_pruning/index.html" class="btn btn-neutral float-right" title="Pruning">Next <span class="icon icon-circle-arrow-right"></span></a>
<a href="../knowledge_distillation/index.html" class="btn btn-neutral" title="Knowledge Distillation"><span class="icon icon-circle-arrow-left"></span> Previous</a>
</div>
<hr/>
<div role="contentinfo">
<!-- Copyright etc -->
</div>
Built with <a href="http://www.mkdocs.org">MkDocs</a> using a <a href="https://github.com/snide/sphinx_rtd_theme">theme</a> provided by <a href="https://readthedocs.org">Read the Docs</a>.
</footer>
</div>
</div>
</section>
</div>
<div class="rst-versions" role="note" style="cursor: pointer">
<span class="rst-current-version" data-toggle="rst-current-version">
<span><a href="../knowledge_distillation/index.html" style="color: #fcfcfc;">&laquo; Previous</a></span>
<span style="margin-left: 15px"><a href="../algo_pruning/index.html" style="color: #fcfcfc">Next &raquo;</a></span>
</span>
</div>
<script>var base_url = '..';</script>
<script src="../js/theme.js"></script>
<script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML"></script>
<script src="../search/require.js"></script>
<script src="../search/search.js"></script>
</body>
</html>