    SPARK-2380: Support displaying accumulator values in the web UI · 74f82c71
    Patrick Wendell authored
    This patch adds support for giving accumulators user-visible names and displaying accumulator values in the web UI. This allows users to create custom counters that are displayed in the UI. The current approach displays both the accumulator deltas caused by each task and a "current" value of the accumulator totals for each stage, which is updated as tasks finish.
    
    Currently in Spark, developers have been extending the `TaskMetrics` functionality to provide custom instrumentation for RDDs. This patch provides a potentially nicer alternative: going through the existing accumulator framework (`TaskMetrics` and accumulators are in fact on an awkward collision course as we add more features to the former). The current patch demonstrates how the feature can be used to provide instrumentation for RDD input sizes. The nice thing about going through accumulators is that users can read the current value of the data being tracked in their programs. This could be useful, for example, to decide whether to short-circuit a Spark stage depending on how things are going.
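
The bookkeeping described above (a delta recorded per task, a running per-stage total that is updated as tasks finish, and a driver that reads the current value to decide whether to keep going) can be illustrated with a small toy model. This is only a sketch under stated assumptions: it does not use Spark, and `NamedAccumulatorSketch`, `StageAccumulators`, `TaskUpdate`, `onTaskEnd`, `runUntilLimit`, and the "bad records" counter are all hypothetical names invented for illustration, not classes or methods from this patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model only: none of these names are Spark classes. */
public class NamedAccumulatorSketch {

    /** One finished task's contribution (delta) to a named counter. */
    static final class TaskUpdate {
        final int taskId;
        final String name;
        final long delta;

        TaskUpdate(int taskId, String name, long delta) {
            this.taskId = taskId;
            this.name = name;
            this.delta = delta;
        }
    }

    /** Per-stage bookkeeping: the per-task deltas plus a running total per name. */
    static final class StageAccumulators {
        private final Map<String, Long> totals = new LinkedHashMap<>();
        private final List<TaskUpdate> updates = new ArrayList<>();

        /** Called when a task finishes: record its delta and fold it into the total. */
        void onTaskEnd(int taskId, String name, long delta) {
            updates.add(new TaskUpdate(taskId, name, delta));
            totals.merge(name, delta, Long::sum);
        }

        /** The "current" value of a named counter; 0 if no task has reported it yet. */
        long current(String name) {
            return totals.getOrDefault(name, 0L);
        }

        /** Read-only view of the per-task deltas, in completion order. */
        List<TaskUpdate> updates() {
            return Collections.unmodifiableList(updates);
        }
    }

    /**
     * Hypothetical driver loop: runs tasks one at a time and stops launching
     * new ones once the "bad records" total exceeds the limit.
     * Returns the number of tasks that actually ran.
     */
    static int runUntilLimit(StageAccumulators stage, long[] badRecordsPerTask, long limit) {
        int ran = 0;
        for (int taskId = 0; taskId < badRecordsPerTask.length; taskId++) {
            if (stage.current("bad records") > limit) {
                break; // short-circuit: the counter says things are going badly
            }
            stage.onTaskEnd(taskId, "bad records", badRecordsPerTask[taskId]);
            ran++;
        }
        return ran;
    }

    public static void main(String[] args) {
        StageAccumulators stage = new StageAccumulators();
        int ran = runUntilLimit(stage, new long[] {2, 5, 9, 1}, 6);
        System.out.println("tasks run: " + ran
                + ", bad records: " + stage.current("bad records"));
    }
}
```

In this model the list of `TaskUpdate` entries plays the role of the per-task deltas shown in the UI, and `current(name)` plays the role of the per-stage total. Real Spark tasks run in parallel on executors, so an actual driver would only see updates as tasks complete rather than in a strict sequential loop.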
    
    ![counters](https://cloud.githubusercontent.com/assets/320616/3488815/6ee7bc34-0505-11e4-84ce-e36d9886e2cf.png)
    
    Author: Patrick Wendell <pwendell@gmail.com>
    
    Closes #1309 from pwendell/metrics and squashes the following commits:
    
    8815308 [Patrick Wendell] Merge remote-tracking branch 'apache/master' into HEAD
    93fbe0f [Patrick Wendell] Other minor fixes
    cc43f68 [Patrick Wendell] Updating unit tests
    c991b1b [Patrick Wendell] Moving some code into the Accumulators class
    9a9ba3c [Patrick Wendell] More merge fixes
    c5ace9e [Patrick Wendell] More merge conflicts
    1da15e3 [Patrick Wendell] Merge remote-tracking branch 'apache/master' into metrics
    9860c55 [Patrick Wendell] Potential solution to posting listener events
    0bb0e33 [Patrick Wendell] Remove "display" variable and assume display = name.isDefined
    0ec4ac7 [Patrick Wendell] Java API's
    e95bf69 [Patrick Wendell] Stash
    be97261 [Patrick Wendell] Style fix
    8407308 [Patrick Wendell] Removing examples in Hadoop and RDD class
    64d405f [Patrick Wendell] Adding missing file
    5d8b156 [Patrick Wendell] Changes based on Kay's review.
    9f18bad [Patrick Wendell] Minor style changes and tests
    7a63abc [Patrick Wendell] Adding Json serialization and responding to Reynold's feedback
    ad85076 [Patrick Wendell] Example of using named accumulators for custom RDD metrics.
    0b72660 [Patrick Wendell] Initial WIP example of supporting globally named accumulators.