  1. Aug 30, 2014
    • [SPARK-2889] Create Hadoop config objects consistently. · b6cf1348
      Marcelo Vanzin authored
      Different places in the code were instantiating Configuration / YarnConfiguration objects in different ways. This could lead to confusion for people who actually expected "spark.hadoop.*" options to end up in the configs used by Spark code, since that would only happen for the SparkContext's config.
      
      This change modifies most places to use SparkHadoopUtil to initialize configs, and makes that method do the translation that was previously done only inside SparkContext.
      
      The places that were not changed fall in one of the following categories:
      - Test code where this doesn't really matter
      - Places deep in the code where plumbing SparkConf would be too difficult for very little gain
      - Default values for arguments - since the caller can provide their own config in that case
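
The translation described above can be pictured with a minimal sketch. It is written in Python purely for illustration and is not the SparkHadoopUtil code; the function name `new_hadoop_configuration` and the use of plain dicts are assumptions made for this example.

```python
SPARK_HADOOP_PREFIX = "spark.hadoop."


def new_hadoop_configuration(spark_conf: dict) -> dict:
    """Build a Hadoop-style config dict from a Spark-style one.

    Hypothetical helper: every "spark.hadoop.foo" entry is copied in as
    "foo"; keys without the prefix are ignored.
    """
    hadoop_conf = {}
    for key, value in spark_conf.items():
        if key.startswith(SPARK_HADOOP_PREFIX):
            # Strip the prefix so the Hadoop side sees its native key name.
            hadoop_conf[key[len(SPARK_HADOOP_PREFIX):]] = value
    return hadoop_conf
```

Routing every caller through one helper like this is what makes the `spark.hadoop.*` options show up consistently, instead of only in the config owned by the SparkContext.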
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #1843 from vanzin/SPARK-2889 and squashes the following commits:
      
      52daf35 [Marcelo Vanzin] Merge branch 'master' into SPARK-2889
      f179013 [Marcelo Vanzin] Merge branch 'master' into SPARK-2889
      51e71cf [Marcelo Vanzin] Add test to ensure that overriding Yarn configs works.
      53f9506 [Marcelo Vanzin] Add DeveloperApi annotation.
      3d345cb [Marcelo Vanzin] Restore old method for backwards compat.
      fc45067 [Marcelo Vanzin] Merge branch 'master' into SPARK-2889
      0ac3fdf [Marcelo Vanzin] Merge branch 'master' into SPARK-2889
      3f26760 [Marcelo Vanzin] Compilation fix.
      f16cadd [Marcelo Vanzin] Initialize config in SparkHadoopUtil.
      b8ab173 [Marcelo Vanzin] Update Utils API to take a Configuration argument.
      1e7003f [Marcelo Vanzin] Replace explicit Configuration instantiation with SparkHadoopUtil.
  2. Aug 29, 2014
  3. Aug 28, 2014
    • SPARK-3082. yarn.Client.logClusterResourceDetails throws NPE if requested queue doesn't exist · 92af2314
      Sandy Ryza authored
      
      Author: Sandy Ryza <sandy@cloudera.com>
      
      Closes #1984 from sryza/sandy-spark-3082 and squashes the following commits:
      
      fe08c37 [Sandy Ryza] Remove log message entirely
      85253ad [Sandy Ryza] SPARK-3082. yarn.Client.logClusterResourceDetails throws NPE if requested queue doesn't exist
  4. Aug 27, 2014
    • [SPARK-2933] [yarn] Refactor and cleanup Yarn AM code. · b92d823a
      Marcelo Vanzin authored
      This change modifies the Yarn module so that all the logic related
      to running the ApplicationMaster is localized. Instead of the previous
      four classes with mostly identical code, we now have:
      
      - A single, shared ApplicationMaster class, which can operate both in
        client and cluster mode, and substitutes the old ApplicationMaster
        (for cluster mode) and ExecutorLauncher (for client mode).
      
      The benefit here is that all different execution modes for all supported
      yarn versions use the same shared code for monitoring executor allocation,
      setting up configuration, and monitoring the process's lifecycle.
      
      - A new YarnRMClient interface, which defines basic RM functionality needed
        by the ApplicationMaster. This interface has concrete implementations for
        each supported Yarn version.
      
      - A new YarnAllocator interface, which just abstracts the existing interface
        of the YarnAllocationHandler class. This is to avoid having to touch the
        allocator code too much in this change, although it might benefit from a
        similar effort in the future.
      
      The end result is much easier to understand code, with much less duplication,
      making it much easier to fix bugs, add features, and test everything knowing
      that all supported versions will behave the same.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #2020 from vanzin/SPARK-2933 and squashes the following commits:
      
      3bbf3e7 [Marcelo Vanzin] Merge branch 'master' into SPARK-2933
      ff389ed [Marcelo Vanzin] Do not interrupt reporter thread from within itself.
      3a8ed37 [Marcelo Vanzin] Remote stale comment.
      0f5142c [Marcelo Vanzin] Review feedback.
      41f8c8a [Marcelo Vanzin] Fix app status reporting.
      c0794be [Marcelo Vanzin] Correctly clean up staging directory.
      92770cc [Marcelo Vanzin] Merge branch 'master' into SPARK-2933
      ecaf332 [Marcelo Vanzin] Small fix to shutdown code.
      f02d3f8 [Marcelo Vanzin] Merge branch 'master' into SPARK-2933
      f581122 [Marcelo Vanzin] Review feedback.
      557fdeb [Marcelo Vanzin] Cleanup a couple more constants.
      be6068d [Marcelo Vanzin] Restore shutdown hook to clean up staging dir.
      5150993 [Marcelo Vanzin] Some more cleanup.
      b6289ab [Marcelo Vanzin] Move cluster/client code to separate methods.
      ecb23cd [Marcelo Vanzin] More trivial cleanup.
      34f1e63 [Marcelo Vanzin] Fix some questionable error handling.
      5657c7d [Marcelo Vanzin] Finish app if SparkContext initialization times out.
      0e4be3d [Marcelo Vanzin] Keep "ExecutorLauncher" as the main class for client-mode AM.
      91beabb [Marcelo Vanzin] Fix UI filter registration.
      8c72239 [Marcelo Vanzin] Trivial cleanups.
      99a52d5 [Marcelo Vanzin] Changes to the yarn-alpha project to use common AM code.
      848ca6d [Marcelo Vanzin] [SPARK-2933] [yarn] Refactor and cleanup Yarn AM code.
  5. Aug 26, 2014
    • [SPARK-2886] Use more specific actor system name than "spark" · b21ae5bb
      Andrew Or authored
      As of #1777 we log the name of the actor system when it binds to a port. The current name "spark" is super general and does not convey any meaning. For instance, the following line is taken from my driver log after setting `spark.driver.port` to 5001.
      ```
      14/08/13 19:33:29 INFO Remoting: Remoting started; listening on addresses:
      [akka.tcp://sparkandrews-mbp:5001]
      14/08/13 19:33:29 INFO Remoting: Remoting now listens on addresses:
      [akka.tcp://sparkandrews-mbp:5001]
      14/08/06 13:40:05 INFO Utils: Successfully started service 'spark' on port 5001.
      ```
      This commit renames this to "sparkDriver" and "sparkExecutor". The goal of this unambitious PR is simply to make the logged information more explicit without introducing any change in functionality.
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #1810 from andrewor14/service-name and squashes the following commits:
      
      8c459ed [Andrew Or] Use a common variable for driver/executor actor system names
      3a92843 [Andrew Or] Change actor name to sparkDriver and sparkExecutor
      921363e [Andrew Or] Merge branch 'master' of github.com:apache/spark into service-name
      c8c6a62 [Andrew Or] Do not include hyphens in actor name
      1c1b42e [Andrew Or] Avoid spaces in akka system name
      f644b55 [Andrew Or] Use more specific service name
  6. Aug 22, 2014
  7. Aug 20, 2014
    • [SPARK-2974] [SPARK-2975] Fix two bugs related to spark.local.dirs · ebcb94f7
      Josh Rosen authored
      This PR fixes two bugs related to `spark.local.dirs` and `SPARK_LOCAL_DIRS`, one where `Utils.getLocalDir()` might return an invalid directory (SPARK-2974) and another where the `SPARK_LOCAL_DIRS` override didn't affect the driver, which could cause problems when running tasks in local mode (SPARK-2975).
      
      This patch fixes both issues: the new `Utils.getOrCreateLocalRootDirs(conf: SparkConf)` utility method manages the creation of local directories and handles the precedence among the different configuration options, so we should see the same behavior whether we're running in local mode or on a worker.
      
      It's kind of a pain to mock out environment variables in tests (no easy way to mock System.getenv), so I added a `private[spark]` method to SparkConf for accessing environment variables (by default, it just delegates to System.getenv).  By subclassing SparkConf and overriding this method, we can mock out SPARK_LOCAL_DIRS in tests.
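
The subclass-and-override trick can be sketched as follows. This is an illustrative Python analogue, not SparkConf itself: the classes `Conf` and `ConfWithEnv` are hypothetical, and the precedence shown (the environment variable wins over the `spark.local.dir` setting) is an assumption made for the example.

```python
import os


class Conf:
    """Hypothetical stand-in for a config object; not Spark's SparkConf."""

    def __init__(self, settings=None):
        self._settings = dict(settings or {})

    def getenv(self, name):
        # By default, delegate to the real process environment.
        return os.environ.get(name)

    def local_dirs(self):
        # Assumption for this sketch: SPARK_LOCAL_DIRS, when set, takes
        # precedence over the "spark.local.dir" setting.
        env_value = self.getenv("SPARK_LOCAL_DIRS")
        if env_value is not None:
            raw = env_value
        else:
            raw = self._settings.get("spark.local.dir", "/tmp")
        return [d for d in raw.split(",") if d]


class ConfWithEnv(Conf):
    """Test double that serves environment variables from a dict."""

    def __init__(self, settings=None, env=None):
        super().__init__(settings)
        self._env = dict(env or {})

    def getenv(self, name):
        # Never touches the real environment, so tests stay hermetic.
        return self._env.get(name)
```

Because all environment lookups go through one overridable method, a test can exercise the override logic without mutating the process environment.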
      
      I also fixed a typo in PySpark where we used `SPARK_LOCAL_DIR` instead of `SPARK_LOCAL_DIRS` (I think this was technically innocuous, but it seemed worth fixing).
      
      Author: Josh Rosen <joshrosen@apache.org>
      
      Closes #2002 from JoshRosen/local-dirs and squashes the following commits:
      
      efad8c6 [Josh Rosen] Address review comments:
      1dec709 [Josh Rosen] Minor updates to Javadocs.
      7f36999 [Josh Rosen] Use env vars to detect if running in YARN container.
      399ac25 [Josh Rosen] Update getLocalDir() documentation.
      bb3ad89 [Josh Rosen] Remove duplicated YARN getLocalDirs() code.
      3e92d44 [Josh Rosen] Move local dirs override logic into Utils; fix bugs:
      b2c4736 [Josh Rosen] Add failing tests for SPARK-2974 and SPARK-2975.
      007298b [Josh Rosen] Allow environment variables to be mocked in tests.
      6d9259b [Josh Rosen] Fix typo in PySpark: SPARK_LOCAL_DIR should be SPARK_LOCAL_DIRS
  8. Aug 19, 2014
    • [SPARK-3072] YARN - Exit when reach max number failed executors · 7eb9cbc2
      Thomas Graves authored
      In some cases on Hadoop 2.x, the Spark application master doesn't exit properly and hangs around for 10 minutes after it's really done. We should make sure it exits properly and stops the driver.
      
      Author: Thomas Graves <tgraves@apache.org>
      
      Closes #2022 from tgravescs/SPARK-3072 and squashes the following commits:
      
      665701d [Thomas Graves] Exit when reach max number failed executors
  9. Aug 18, 2014
    • [SPARK-2718] [yarn] Handle quotes and other characters in user args. · 6201b276
      Marcelo Vanzin authored
      Due to the way Yarn runs things through bash, normal quoting doesn't
      work as expected. This change applies the necessary voodoo to the user
      args to avoid issues with bash and special characters.
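
As a rough illustration of the quoting problem, the sketch below uses Python's standard `shlex.quote` as a stand-in for the escaping. It is not the escaping Spark applies; the helper name `build_launch_command` is hypothetical.

```python
import shlex


def build_launch_command(executable: str, user_args: list) -> str:
    """Join a command into one string that a POSIX shell parses back
    into exactly the original arguments (hypothetical helper)."""
    # shlex.quote wraps each part in single quotes when needed and
    # escapes embedded single quotes, so spaces, "$", backslashes and
    # double quotes survive a pass through the shell unchanged.
    return " ".join(shlex.quote(part) for part in [executable, *user_args])
```

Round-tripping the result through `shlex.split` is a convenient way to check that no argument was split or altered by the shell-style parsing.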
      
      The change also uncovered an issue with the event logger app name
      sanitizing code; it wasn't cleaning up all "bad" characters, so
      sometimes it would fail to create the log dirs. I just added some
      more bad character replacements.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #1724 from vanzin/SPARK-2718 and squashes the following commits:
      
      cc84b89 [Marcelo Vanzin] Review feedback.
      c1a257a [Marcelo Vanzin] Add test for backslashes.
      55571d4 [Marcelo Vanzin] Unbreak yarn-client.
      515613d [Marcelo Vanzin] [SPARK-2718] [yarn] Handle quotes and other characters in user args.
  10. Aug 09, 2014
    • [SPARK-2635] Fix race condition at SchedulerBackend.isReady in standalone mode · 28dbae85
      li-zhihui authored
      In SPARK-1946 (PR #900), the configuration `spark.scheduler.minRegisteredExecutorsRatio` was introduced. However, in standalone mode there is a race condition where isReady() can return true because totalExpectedExecutors has not been set correctly.
      
      Because the expected number of executors is uncertain in standalone mode, this PR uses CPU cores (`--total-executor-cores`) as the expected resources to judge whether the SchedulerBackend is ready.
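
A minimal sketch of a core-based readiness check, written in Python for illustration only. The function and parameter names are assumptions for this example, not the names used in the patch.

```python
def is_ready(registered_cores: int, total_expected_cores: int,
             min_registered_ratio: float) -> bool:
    """Return True once enough of the requested cores have registered.

    Hypothetical check: compares registered cores against the configured
    fraction of the expected total.
    """
    return registered_cores >= total_expected_cores * min_registered_ratio
```

The sketch also shows why an unset expected total is dangerous: with `total_expected_cores` still at 0, the check passes before anything has registered, which is the kind of premature "ready" that the race condition produced.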
      
      Author: li-zhihui <zhihui.li@intel.com>
      Author: Li Zhihui <zhihui.li@intel.com>
      
      Closes #1525 from li-zhihui/fixre4s and squashes the following commits:
      
      e9a630b [Li Zhihui] Rename variable totalExecutors and clean codes
      abf4860 [Li Zhihui] Push down variable totalExpectedResources to children classes
      ca54bd9 [li-zhihui] Format log with String interpolation
      88c7dc6 [li-zhihui] Few codes and docs refactor
      41cf47e [li-zhihui] Fix race condition at SchedulerBackend.isReady in standalone mode
  11. Aug 05, 2014
    • SPARK-1680: use configs for specifying environment variables on YARN · 41e0a21b
      Thomas Graves authored
      Note that this also documents spark.executorEnv.*, which to me means it's public. If we don't want that, please speak up.
      
      Author: Thomas Graves <tgraves@apache.org>
      
      Closes #1512 from tgravescs/SPARK-1680 and squashes the following commits:
      
      11525df [Thomas Graves] more doc changes
      553bad0 [Thomas Graves] fix documentation
      152bf7c [Thomas Graves] fix docs
      5382326 [Thomas Graves] try fix docs
      32f86a4 [Thomas Graves] use configs for specifying environment variables on YARN
    • SPARK-1890 and SPARK-1891- add admin and modify acls · 1c5555a2
      Thomas Graves authored
      It was easier to combine these two JIRAs since they touch many of the same places. This PR adds the following:
      
      - adds modify acls
      - adds admin acls (list of admins/users that get added to both view and modify acls)
      - modifies the Kill button on the UI to take modify acls into account
      - changes the config name from spark.ui.acls.enable to spark.acls.enable, since I chose poorly with the original name. We keep backwards compatibility, so people can still use spark.ui.acls.enable. The acls should apply to any web UI as well as any CLI interfaces.
      - sends view and modify acls information on to YARN so that YARN interfaces can use it (the yarn CLI for killing applications, for example).
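
The ACL rules in the list above can be sketched like this. The code is an illustrative Python model, not the SecurityManager implementation; the function names are hypothetical, and treating "ACLs disabled" as "everyone may modify" is an assumption of the sketch.

```python
def resolve_acls(view_acls, modify_acls, admin_acls):
    """Return (effective_view, effective_modify) sets.

    Admins are added to both lists, as described in the commit message.
    """
    admins = set(admin_acls)
    return set(view_acls) | admins, set(modify_acls) | admins


def acls_enabled(conf: dict) -> bool:
    """Prefer the new key and fall back to the old one for compatibility."""
    value = conf.get("spark.acls.enable",
                     conf.get("spark.ui.acls.enable", "false"))
    return value.lower() == "true"


def may_modify(user: str, effective_modify: set, enabled: bool) -> bool:
    # Assumption: with ACLs disabled, no restriction is applied.
    return (not enabled) or user in effective_modify
```

A Kill button handler would then call something like `may_modify` before acting, so that only users in the modify list (or admins) can kill a job when ACLs are on.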
      
      Author: Thomas Graves <tgraves@apache.org>
      
      Closes #1196 from tgravescs/SPARK-1890 and squashes the following commits:
      
      8292eb1 [Thomas Graves] review comments
      b92ec89 [Thomas Graves] remove unneeded variable from applistener
      4c765f4 [Thomas Graves] Add in admin acls
      72eb0ac [Thomas Graves] Add modify acls
    • SPARK-1528 - spark on yarn, add support for accessing remote HDFS · 2c0f705e
      Thomas Graves authored
      Add a config (spark.yarn.access.namenodes) to allow applications running on YARN to access other secure HDFS clusters. The user just specifies the namenodes of the other clusters, and we get tokens for those and ship them with the Spark application.
      
      Author: Thomas Graves <tgraves@apache.org>
      
      Closes #1159 from tgravescs/spark-1528 and squashes the following commits:
      
      ddbcd16 [Thomas Graves] review comments
      0ac8501 [Thomas Graves] SPARK-1528 - add support for accessing remote HDFS
  12. Jul 30, 2014
    • Required AM memory is "amMem", not "args.amMemory" · 118c1c42
      derek ma authored
      "ERROR yarn.Client: Required AM memory (1024) is above the max threshold (1048) of this cluster" appears if this code is not changed. obviously, 1024 is less than 1048, so change this
      
      Author: derek ma <maji3@asiainfo-linkage.com>
      
      Closes #1494 from maji2014/master and squashes the following commits:
      
      b0f6640 [derek ma] Required AM memory is "amMem", not "args.amMemory"
  13. Jul 28, 2014
    • [SPARK-2410][SQL] Merging Hive Thrift/JDBC server (with Maven profile fix) · a7a9d144
      Cheng Lian authored
      JIRA issue: [SPARK-2410](https://issues.apache.org/jira/browse/SPARK-2410)
      
      Another try for #1399 & #1600. Those two PRs broke Jenkins builds because we made a separate profile `hive-thriftserver` in sub-project `assembly`, but the `hive-thriftserver` module was defined outside the `hive-thriftserver` profile. Thus every pull request, even one that doesn't touch SQL code, also executes the test suites defined in `hive-thriftserver`, and those tests fail because the related .class files are not included in the assembly jar.
      
      In the most recent commit, module `hive-thriftserver` is moved into its own profile to fix this problem. All previous commits are squashed for clarity.
      
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      
      Closes #1620 from liancheng/jdbc-with-maven-fix and squashes the following commits:
      
      629988e [Cheng Lian] Moved hive-thriftserver module definition into its own profile
      ec3c7a7 [Cheng Lian] Cherry picked the Hive Thrift server
  14. Jul 27, 2014
    • Revert "[SPARK-2410][SQL] Merging Hive Thrift/JDBC server" · e5bbce9a
      Patrick Wendell authored
      This reverts commit f6ff2a61.
    • [SPARK-2410][SQL] Merging Hive Thrift/JDBC server · f6ff2a61
      Cheng Lian authored
      (This is a replacement of #1399, trying to fix potential `HiveThriftServer2` port collision between parallel builds. Please refer to [these comments](https://github.com/apache/spark/pull/1399#issuecomment-50212572) for details.)
      
      JIRA issue: [SPARK-2410](https://issues.apache.org/jira/browse/SPARK-2410)
      
      Merging the Hive Thrift/JDBC server from [branch-1.0-jdbc](https://github.com/apache/spark/tree/branch-1.0-jdbc).
      
      Thanks chenghao-intel for his initial contribution of the Spark SQL CLI.
      
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      
      Closes #1600 from liancheng/jdbc and squashes the following commits:
      
      ac4618b [Cheng Lian] Uses random port for HiveThriftServer2 to avoid collision with parallel builds
      090beea [Cheng Lian] Revert changes related to SPARK-2678, decided to move them to another PR
      21c6cf4 [Cheng Lian] Updated Spark SQL programming guide docs
      fe0af31 [Cheng Lian] Reordered spark-submit options in spark-shell[.cmd]
      199e3fb [Cheng Lian] Disabled MIMA for hive-thriftserver
      1083e9d [Cheng Lian] Fixed failed test suites
      7db82a1 [Cheng Lian] Fixed spark-submit application options handling logic
      9cc0f06 [Cheng Lian] Starts beeline with spark-submit
      cfcf461 [Cheng Lian] Updated documents and build scripts for the newly added hive-thriftserver profile
      061880f [Cheng Lian] Addressed all comments by @pwendell
      7755062 [Cheng Lian] Adapts test suites to spark-submit settings
      40bafef [Cheng Lian] Fixed more license header issues
      e214aab [Cheng Lian] Added missing license headers
      b8905ba [Cheng Lian] Fixed minor issues in spark-sql and start-thriftserver.sh
      f975d22 [Cheng Lian] Updated docs for Hive compatibility and Shark migration guide draft
      3ad4e75 [Cheng Lian] Starts spark-sql shell with spark-submit
      a5310d1 [Cheng Lian] Make HiveThriftServer2 play well with spark-submit
      61f39f4 [Cheng Lian] Starts Hive Thrift server via spark-submit
      2c4c539 [Cheng Lian] Cherry picked the Hive Thrift server
  15. Jul 25, 2014
    • Revert "[SPARK-2410][SQL] Merging Hive Thrift/JDBC server" · afd757a2
      Michael Armbrust authored
      This reverts commit 06dc0d2c.
      
      #1399 is making Jenkins fail. We should investigate and put this back after it's passing tests.
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #1594 from marmbrus/revertJDBC and squashes the following commits:
      
      59748da [Michael Armbrust] Revert "[SPARK-2410][SQL] Merging Hive Thrift/JDBC server"
    • [SPARK-2410][SQL] Merging Hive Thrift/JDBC server · 06dc0d2c
      Cheng Lian authored
      JIRA issue:
      
      - Main: [SPARK-2410](https://issues.apache.org/jira/browse/SPARK-2410)
      - Related: [SPARK-2678](https://issues.apache.org/jira/browse/SPARK-2678)
      
      Cherry picked the Hive Thrift/JDBC server from [branch-1.0-jdbc](https://github.com/apache/spark/tree/branch-1.0-jdbc).
      
      (Thanks chenghao-intel for his initial contribution of the Spark SQL CLI.)
      
      TODO
      
      - [x] Use `spark-submit` to launch the server, the CLI and beeline
      - [x] Migration guideline draft for Shark users
      
      ----
      
      Hit by a bug in `SparkSubmitArguments` while working on this PR: all application options that are recognized by `SparkSubmitArguments` are stolen as `SparkSubmit` options. For example:
      
      ```bash
      $ spark-submit --class org.apache.hive.beeline.BeeLine spark-internal --help
      ```
      
      This actually shows usage information of `SparkSubmit` rather than `BeeLine`.
      
      ~~Fixed this bug here since the `spark-internal` related stuff also touches `SparkSubmitArguments` and I'd like to avoid conflict.~~
      
      **UPDATE** The bug mentioned above is now tracked by [SPARK-2678](https://issues.apache.org/jira/browse/SPARK-2678). Decided to revert the changes for this bug, since it involves more subtle considerations and is worth a separate PR.
      
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      
      Closes #1399 from liancheng/thriftserver and squashes the following commits:
      
      090beea [Cheng Lian] Revert changes related to SPARK-2678, decided to move them to another PR
      21c6cf4 [Cheng Lian] Updated Spark SQL programming guide docs
      fe0af31 [Cheng Lian] Reordered spark-submit options in spark-shell[.cmd]
      199e3fb [Cheng Lian] Disabled MIMA for hive-thriftserver
      1083e9d [Cheng Lian] Fixed failed test suites
      7db82a1 [Cheng Lian] Fixed spark-submit application options handling logic
      9cc0f06 [Cheng Lian] Starts beeline with spark-submit
      cfcf461 [Cheng Lian] Updated documents and build scripts for the newly added hive-thriftserver profile
      061880f [Cheng Lian] Addressed all comments by @pwendell
      7755062 [Cheng Lian] Adapts test suites to spark-submit settings
      40bafef [Cheng Lian] Fixed more license header issues
      e214aab [Cheng Lian] Added missing license headers
      b8905ba [Cheng Lian] Fixed minor issues in spark-sql and start-thriftserver.sh
      f975d22 [Cheng Lian] Updated docs for Hive compatibility and Shark migration guide draft
      3ad4e75 [Cheng Lian] Starts spark-sql shell with spark-submit
      a5310d1 [Cheng Lian] Make HiveThriftServer2 play well with spark-submit
      61f39f4 [Cheng Lian] Starts Hive Thrift server via spark-submit
      2c4c539 [Cheng Lian] Cherry picked the Hive Thrift server
  16. Jul 24, 2014
    • [SPARK-2037]: yarn client mode doesn't support spark.yarn.max.executor.failures · 323a83c5
      GuoQiang Li authored
      Author: GuoQiang Li <witgo@qq.com>
      
      Closes #1180 from witgo/SPARK-2037 and squashes the following commits:
      
      3d52411 [GuoQiang Li] review commit
      7058f4d [GuoQiang Li] Correctly stop SparkContext
      6d0561f [GuoQiang Li] Fix: yarn client mode doesn't support spark.yarn.max.executor.failures
    • SPARK-2150: Provide direct link to finished application UI in yarn resource manager UI · 46e224aa
      Rahul Singhal authored
      
      Use the event logger directory to provide a direct link to finished
      application UI in yarn resourcemanager UI.
      
      Author: Rahul Singhal <rahul.singhal@guavus.com>
      
      Closes #1094 from rahulsinghaliitd/SPARK-2150 and squashes the following commits:
      
      95f230c [Rahul Singhal] SPARK-2150: Provide direct link to finished application UI in yarn resource manager UI
  17. Jul 22, 2014
    • [YARN] SPARK-2577: File upload to viewfs is broken due to mount point re... · 02e45729
      Gera Shegalov authored
      Opting for option 2 defined in SPARK-2577, i.e., retrieve and pass the correct file system object to addResource.
      
      Author: Gera Shegalov <gera@twitter.com>
      
      Closes #1483 from gerashegalov/master and squashes the following commits:
      
      90c9087 [Gera Shegalov] [YARN] SPARK-2577: File upload to viewfs is broken due to mount point resolution
  18. Jul 21, 2014
    • SPARK-1707. Remove unnecessary 3 second sleep in YarnClusterScheduler · f89cf65d
      Sandy Ryza authored
      Author: Sandy Ryza <sandy@cloudera.com>
      
      Closes #634 from sryza/sandy-spark-1707 and squashes the following commits:
      
      2f6e358 [Sandy Ryza] Default min registered executors ratio to .8 for YARN
      354c630 [Sandy Ryza] Remove outdated comments
      c744ef3 [Sandy Ryza] Take out waitForInitialAllocations
      2a4329b [Sandy Ryza] SPARK-1707. Remove unnecessary 3 second sleep in YarnClusterScheduler
  19. Jul 15, 2014
    • SPARK-1291: Link the spark UI to RM ui in yarn-client mode · 72ea56da
      witgo authored
      Author: witgo <witgo@qq.com>
      
      Closes #1112 from witgo/SPARK-1291 and squashes the following commits:
      
      6022bcd [witgo] review commit
      1fbb925 [witgo] add addAmIpFilter to yarn alpha
      210299c [witgo] review commit
      1b92a07 [witgo] review commit
      6896586 [witgo] Add comments to addWebUIFilter
      3e9630b [witgo] review commit
      142ee29 [witgo] review commit
      1fe7710 [witgo] Link the spark UI to RM ui in yarn-client mode
  20. Jul 14, 2014
    • [SPARK-1946] Submit tasks after (configured ratio) executors have been registered · 3dd8af7a
      li-zhihui authored
      Because submitting tasks and registering executors are asynchronous, in most situations the tasks of early stages run without preferred locality.
      
      A simple workaround is to sleep for a few seconds in the application so that executors have enough time to register.
      
      This PR adds 2 configuration properties that make the TaskScheduler submit tasks only after enough executors have been registered.
      
      # Submit tasks only after (registered executors / total executors) reaches this ratio; the default value is 0
      spark.scheduler.minRegisteredExecutorsRatio = 0.8
      
      # Whether or not minRegisteredExecutorsRatio has been reached, submit tasks after maxRegisteredExecutorsWaitingTime (in milliseconds); the default value is 30000
      spark.scheduler.maxRegisteredExecutorsWaitingTime = 5000
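
How the two settings interact can be sketched as a bounded wait. This is a Python illustration under stated assumptions, not the TaskSchedulerImpl code: the function name `wait_backend_ready`, the polling interval, and the injectable `clock` and `sleep` parameters are all hypothetical.

```python
import time


def wait_backend_ready(is_ready, max_wait_ms, poll_ms=100,
                       clock=time.monotonic, sleep=time.sleep):
    """Block until is_ready() is true or max_wait_ms has elapsed.

    Returns True if the backend became ready and False if the wait timed
    out. In both cases the caller goes on to submit tasks.
    """
    deadline = clock() + max_wait_ms / 1000.0
    while not is_ready():
        if clock() >= deadline:
            # The ratio was never reached; stop waiting anyway.
            return False
        sleep(poll_ms / 1000.0)
    return True
```

Here `is_ready` would be a callable that applies the registered-executor ratio, and `max_wait_ms` plays the role of the waiting-time setting. Passing the clock and sleep functions in makes the timeout path testable without real delays.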
      
      Author: li-zhihui <zhihui.li@intel.com>
      
      Closes #900 from li-zhihui/master and squashes the following commits:
      
      b9f8326 [li-zhihui] Add logs & edit docs
      1ac08b1 [li-zhihui] Add new configs to user docs
      22ead12 [li-zhihui] Move waitBackendReady to postStartHook
      c6f0522 [li-zhihui] Bug fix: numExecutors wasn't set & use constant DEFAULT_NUMBER_EXECUTORS
      4d6d847 [li-zhihui] Move waitBackendReady to TaskSchedulerImpl.start & some code refactor
      0ecee9a [li-zhihui] Move waitBackendReady from DAGScheduler.submitStage to TaskSchedulerImpl.submitTasks
      4261454 [li-zhihui] Add docs for new configs & code style
      ce0868a [li-zhihui] Code style, rename configuration property name of minRegisteredRatio & maxRegisteredWaitingTime
      6cfb9ec [li-zhihui] Code style, revert default minRegisteredRatio of yarn to 0, driver get --num-executors in yarn/alpha
      812c33c [li-zhihui] Fix driver lost --num-executors option in yarn-cluster mode
      e7b6272 [li-zhihui] support yarn-cluster
      37f7dc2 [li-zhihui] support yarn mode(percentage style)
      3f8c941 [li-zhihui] submit stage after (configured ratio of) executors have been registered
  21. Jul 10, 2014
    • [SPARK-1776] Have Spark's SBT build read dependencies from Maven. · 628932b8
      Prashant Sharma authored
      This patch introduces the new way of building while retaining the existing ways of doing things.
      
      For example, the build instruction for YARN in Maven is
      `mvn -Pyarn -Phadoop-2.2 clean package -DskipTests`
      In sbt it can become
      `MAVEN_PROFILES="yarn, hadoop-2.2" sbt/sbt clean assembly`
      This is also supported:
      `sbt/sbt -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 clean assembly`
      
      Author: Prashant Sharma <prashant.s@imaginea.com>
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #772 from ScrapCodes/sbt-maven and squashes the following commits:
      
      a8ac951 [Prashant Sharma] Updated sbt version.
      62b09bb [Prashant Sharma] Improvements.
      fa6221d [Prashant Sharma] Excluding sql from mima
      4b8875e [Prashant Sharma] Sbt assembly no longer builds tools by default.
      72651ca [Prashant Sharma] Addresses code reivew comments.
      acab73d [Prashant Sharma] Revert "Small fix to run-examples script."
      ac4312c [Prashant Sharma] Revert "minor fix"
      6af91ac [Prashant Sharma] Ported oldDeps back. + fixes issues with prev commit.
      65cf06c [Prashant Sharma] Servelet API jars mess up with the other servlet jars on the class path.
      446768e [Prashant Sharma] minor fix
      89b9777 [Prashant Sharma] Merge conflicts
      d0a02f2 [Prashant Sharma] Bumped up pom versions, Since the build now depends on pom it is better updated there. + general cleanups.
      dccc8ac [Prashant Sharma] updated mima to check against 1.0
      a49c61b [Prashant Sharma] Fix for tools jar
      a2f5ae1 [Prashant Sharma] Fixes a bug in dependencies.
      cf88758 [Prashant Sharma] cleanup
      9439ea3 [Prashant Sharma] Small fix to run-examples script.
      96cea1f [Prashant Sharma] SPARK-1776 Have Spark's SBT build read dependencies from Maven.
      36efa62 [Patrick Wendell] Set project name in pom files and added eclipse/intellij plugins.
      4973dbd [Patrick Wendell] Example build using pom reader.
  22. Jun 30, 2014
    • [SPARK-2318] When exiting on a signal, print the signal name first. · 5fccb567
      Reynold Xin authored
      Author: Reynold Xin <rxin@apache.org>
      
      Closes #1260 from rxin/signalhandler1 and squashes the following commits:
      
      8e73552 [Reynold Xin] Uh add Logging back in ApplicationMaster.
      0402ba8 [Reynold Xin] Synchronize SignalLogger.register.
      dc70705 [Reynold Xin] Added SignalLogger to YARN ApplicationMaster.
      79a21b4 [Reynold Xin] Added license header.
      0da052c [Reynold Xin] Added the SignalLogger itself.
      e587d2e [Reynold Xin] [SPARK-2318] When exiting on a signal, print the signal name first.
  23. Jun 26, 2014
    • Remove use of spark.worker.instances · 48a82a82
      Kay Ousterhout authored
      spark.worker.instances was added as part of this commit: https://github.com/apache/spark/commit/1617816090e7b20124a512a43860a21232ebf511
      
      My understanding is that SPARK_WORKER_INSTANCES is supported for backwards compatibility,
      but spark.worker.instances is never used (SparkSubmit.scala sets spark.executor.instances) so should
      not have been added.
      
      @sryza @pwendell @tgravescs LMK if I'm understanding this correctly
      
      Author: Kay Ousterhout <kayousterhout@gmail.com>
      
      Closes #1214 from kayousterhout/yarn_config and squashes the following commits:
      
      3d7c491 [Kay Ousterhout] Remove use of spark.worker.instances
  24. Jun 23, 2014
    • [SPARK-1395] Fix "local:" URI support in Yarn mode (again). · e380767d
      Marcelo Vanzin authored
      Recent changes ignored the fact that path may be defined with "local:"
      URIs, which means they need to be explicitly added to the classpath
      everywhere a remote process is started. This change fixes that by:
      
      - Using the correct methods to add paths to the classpath
      - Creating SparkConf settings for the Spark jar itself and for the
        user's jar
      - Propagating those two settings to the remote processes where needed
      
      This ensures that both in client and in cluster mode, the driver has
      the necessary info to build the executor's classpath and have things
      still work when they contain "local:" references.
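
The classpath rule described above can be sketched roughly as follows. This is an illustration in Python, not the Scala code from the change; `classpath_entry`, `link_name`, and the `$PWD` placeholder are hypothetical names, and the assumption is that any file without a "local:" scheme is distributed into the container's working directory.

```python
from urllib.parse import urlparse

LOCAL_SCHEME = "local"  # scheme named in the commit message


def classpath_entry(uri: str, link_name: str, container_pwd: str = "$PWD") -> str:
    """Return the classpath entry a remote process would use for `uri`.

    Hypothetical helper: `link_name` is the name the file would have in the
    container's working directory if it were uploaded and distributed.
    """
    parsed = urlparse(uri)
    if parsed.scheme == LOCAL_SCHEME:
        # The file is assumed to exist at the same path on every node,
        # so its path is added to the classpath as-is.
        return parsed.path
    # Assumption: everything else is shipped to the container and is
    # visible under the working directory.
    return f"{container_pwd}/{link_name}"
```

For example, `classpath_entry("local:/opt/spark/lib/app.jar", "app.jar")` yields the node-local path, while an `hdfs://` URI yields an entry relative to the container's working directory.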
      
      The change also fixes some confusion in ClientBase about whether
      to use SparkConf or system properties to propagate config options to
      the driver and executors, by standardizing on using data held by
      SparkConf.
      
      On the cleanup front, I removed the hacky way that log4j configuration
      was being propagated to handle the "local:" case. It's much more cleanly
      (and generically) handled by using spark-submit arguments (--files to
      upload a config file, or setting spark.executor.extraJavaOptions to pass
      JVM arguments and use a local file).
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #560 from vanzin/yarn-local-2 and squashes the following commits:
      
      4e7f066 [Marcelo Vanzin] Correctly propagate SPARK_JAVA_OPTS to driver/executor.
      6a454ea [Marcelo Vanzin] Use constants for PWD in test.
      6dd5943 [Marcelo Vanzin] Fix propagation of config options to driver / executor.
      b2e377f [Marcelo Vanzin] Review feedback.
      93c3f85 [Marcelo Vanzin] Fix ClassCastException in test.
      e5c682d [Marcelo Vanzin] Fix cluster mode, restore SPARK_LOG4J_CONF.
      1dfbb40 [Marcelo Vanzin] Add documentation for spark.yarn.jar.
      bbdce05 [Marcelo Vanzin] [SPARK-1395] Fix "local:" URI support in Yarn mode (again).
      e380767d
  25. Jun 19, 2014
    • witgo's avatar
      [SPARK-2051]In yarn.ClientBase spark.yarn.dist.* do not work · bce0897b
      witgo authored
      Author: witgo <witgo@qq.com>
      
      Closes #969 from witgo/yarn_ClientBase and squashes the following commits:
      
      8117765 [witgo] review commit
      3bdbc52 [witgo] Merge branch 'master' of https://github.com/apache/spark into yarn_ClientBase
      5261b6c [witgo] fix sys.props.get("SPARK_YARN_DIST_FILES")
      e3c1107 [witgo] update docs
      b6a9aa1 [witgo] merge master
      c8b4554 [witgo] review commit
      2f48789 [witgo] Merge branch 'master' of https://github.com/apache/spark into yarn_ClientBase
      8d7b82f [witgo] Merge branch 'master' of https://github.com/apache/spark into yarn_ClientBase
      1048549 [witgo] remove Utils.resolveURIs
      871f1db [witgo] add spark.yarn.dist.* documentation
      41bce59 [witgo] review commit
      35d6fa0 [witgo] move to ClientArguments
      55d72fc [witgo] Merge branch 'master' of https://github.com/apache/spark into yarn_ClientBase
      9cdff16 [witgo] review commit
      8bc2f4b [witgo] review commit
      20e667c [witgo] Merge branch 'master' into yarn_ClientBase
      0961151 [witgo] merge master
      ce609fc [witgo] Merge branch 'master' into yarn_ClientBase
      8362489 [witgo] yarn.ClientBase spark.yarn.dist.* do not work
      bce0897b
  26. Jun 16, 2014
    • witgo's avatar
      [SPARK-1930] The Container is running beyond physical memory limits, so as to be killed · cdf2b045
      witgo authored
      Author: witgo <witgo@qq.com>
      
      Closes #894 from witgo/SPARK-1930 and squashes the following commits:
      
      564307e [witgo] Update the running-on-yarn.md
      3747515 [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1930
      172647b [witgo] add memoryOverhead docs
      a0ff545 [witgo] leaving only two configs
      a17bda2 [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1930
      478ca15 [witgo] Merge branch 'master' into SPARK-1930
      d1244a1 [witgo] Merge branch 'master' into SPARK-1930
      8b967ae [witgo] Merge branch 'master' into SPARK-1930
      655a820 [witgo] review commit
      71859a7 [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1930
      e3c531d [witgo] review commit
      e16f190 [witgo] different memoryOverhead
      ffa7569 [witgo] review commit
      5c9581f [witgo] Merge branch 'master' into SPARK-1930
      9a6bcf2 [witgo] review commit
      8fae45a [witgo] fix NullPointerException
      e0dcc16 [witgo] Adding  configuration items
      b6a989c [witgo] Fix container memory beyond limit, were killed
      cdf2b045
  27. Jun 12, 2014
    • John Zhao's avatar
      [SPARK-1516]Throw exception in yarn client instead of run system.exit directly. · f95ac686
      John Zhao authored
      All the changes are in the "org.apache.spark.deploy.yarn" package:
          1) Throw an exception in ClientArguments and ClientBase instead of exiting directly.
          2) In Client's main method, exit with code 1 if an exception is caught; otherwise exit with code 0.
      
      After the fix, if users integrate the Spark YARN client into their own applications, a wrong argument or a finished run no longer terminates the host application.
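
The pattern described here (library code raises, only the entry point chooses an exit code) might look like the sketch below. It is written in Python for illustration, not taken from the Scala change; `ClientException`, `parse_args`, `run`, and `main` are hypothetical names.

```python
import sys


class ClientException(Exception):
    """Hypothetical error type raised instead of exiting the process."""


def parse_args(argv):
    # Library code: report bad input by raising, never by exiting.
    if not argv:
        raise ClientException("missing required argument: application jar")
    return {"jar": argv[0]}


def run(argv):
    # Callers that embed the client can catch ClientException themselves.
    return parse_args(argv)


def main(argv):
    # Only the command-line entry point turns failures into an exit code.
    try:
        run(argv)
    except ClientException as err:
        print(f"Error: {err}", file=sys.stderr)
        return 1
    return 0
```

A command-line wrapper would call `sys.exit(main(sys.argv[1:]))`, while an embedding application would call `run` directly and handle the exception without losing its own process.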
      
      Author: John Zhao <jzhao@alpinenow.com>
      
      Closes #490 from codeboyyong/jira_1516_systemexit_inyarnclient and squashes the following commits:
      
      138cb48 [John Zhao] [SPARK-1516]Throw exception in yarn client instead of run system.exit directly. All the changes are in the package of "org.apache.spark.deploy.yarn": 1) Add a ClientException with an exitCode 2) Throws exception in ClientArguments and ClientBase instead of exit directly 3) in Client's main method, catch exception and exit with the exitCode.
      f95ac686
    • Marcelo Vanzin's avatar
      [SPARK-2080] Yarn: report HS URL in client mode, correct user in cluster mode. · ecde5b83
      Marcelo Vanzin authored
      Yarn client mode was not setting the app's tracking URL to the
      History Server's URL when configured by the user. Now client mode
      behaves the same as cluster mode.
      
      In SparkContext.scala, the "user.name" system property had precedence
      over the SPARK_USER environment variable. This means that SPARK_USER
      was never used, since "user.name" is always set by the JVM. In Yarn
      cluster mode, this means the application always reported itself as
      being run by user "yarn" (or whatever user was running the Yarn NM).
      One could argue that the correct fix would be to use UGI.getCurrentUser()
      here, but at least for Yarn that will match what SPARK_USER is set
      to.
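
The precedence problem can be illustrated with a small sketch. It is Python rather than the Scala in SparkContext.scala, and it assumes the fix simply reverses the lookup order; `old_order`, `new_order`, and the `"<unknown>"` fallback are hypothetical.

```python
def old_order(env, props):
    # "user.name" is always set by the JVM, so SPARK_USER is never reached.
    return props.get("user.name") or env.get("SPARK_USER") or "<unknown>"


def new_order(env, props):
    # Assumed fix: prefer SPARK_USER and fall back to the JVM property.
    return env.get("SPARK_USER") or props.get("user.name") or "<unknown>"


# Cluster mode as described above: the JVM runs as the Yarn NM user.
env = {"SPARK_USER": "alice"}
props = {"user.name": "yarn"}
```

With these inputs `old_order(env, props)` reports the NM user, while `new_order(env, props)` reports the submitting user.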
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      This patch had conflicts when merged, resolved by
      Committer: Thomas Graves <tgraves@apache.org>
      
      Closes #1002 from vanzin/yarn-client-url and squashes the following commits:
      
      4046e04 [Marcelo Vanzin] Set HS link in yarn-alpha also.
      4c692d9 [Marcelo Vanzin] Yarn: report HS URL in client mode, correct user in cluster mode.
      ecde5b83
  28. Jun 11, 2014
    • Sandy Ryza's avatar
      SPARK-1639. Tidy up some Spark on YARN code · 2a4225dd
      Sandy Ryza authored
      This contains a bunch of small tidyings of the Spark on YARN code.
      
      I focused on the yarn stable code.  @tgravescs, let me know if you'd like me to make these for the alpha code as well.
      
      Author: Sandy Ryza <sandy@cloudera.com>
      
      Closes #561 from sryza/sandy-spark-1639 and squashes the following commits:
      
      72b6a02 [Sandy Ryza] Fix comment and set name on driver thread
      c2190b2 [Sandy Ryza] SPARK-1639. Tidy up some Spark on YARN code
      2a4225dd
  29. Jun 10, 2014
  30. Jun 09, 2014
    • Bernardo Gomez Palacio's avatar
      [SPARK-1522] : YARN ClientBase throws a NPE if there is no YARN Application CP · e2734476
      Bernardo Gomez Palacio authored
      The current implementation of ClientBase.getDefaultYarnApplicationClasspath inspects
      the MRJobConfig class for the field DEFAULT_YARN_APPLICATION_CLASSPATH when it should
      really be looking in YarnConfiguration. If the application configuration does not define
      yarn.application.classpath, an NPE will be thrown.
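
A defensive version of this lookup could be sketched as below. This is Python, not the Scala implementation; the two classes are stand-ins for the Hadoop classes named above, the default entries are placeholder values rather than Hadoop's real defaults, and the function names are hypothetical.

```python
class MRJobConfig:
    """Stand-in for the class that does NOT carry the field."""


class YarnConfiguration:
    """Stand-in for the class that does carry the field (placeholder values)."""
    DEFAULT_YARN_APPLICATION_CLASSPATH = ["$HADOOP_CONF_DIR", "$HADOOP_COMMON_HOME/lib/*"]


def default_yarn_application_classpath(holder=YarnConfiguration):
    # Look the field up by name and tolerate its absence instead of
    # letting a missing value propagate as a null.
    value = getattr(holder, "DEFAULT_YARN_APPLICATION_CLASSPATH", None)
    return list(value) if value is not None else []


def application_classpath(conf):
    # Prefer the configured value; fall back to the defaults when unset.
    configured = conf.get("yarn.application.classpath")
    if configured:
        return [entry.strip() for entry in configured.split(",") if entry.strip()]
    return default_yarn_application_classpath()
```

Looking in the wrong holder (`MRJobConfig` here) returns an empty list rather than failing, and an unset `yarn.application.classpath` falls back to the defaults.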
      
      Additional Changes include:
      * Test Suite for ClientBase added
      
      [ticket: SPARK-1522] : https://issues.apache.org/jira/browse/SPARK-1522
      
      Author      : bernardo.gomezpalacio@gmail.com
      Testing     : SPARK_HADOOP_VERSION=2.3.0 SPARK_YARN=true ./sbt/sbt test
      
      Author: Bernardo Gomez Palacio <bernardo.gomezpalacio@gmail.com>
      
      Closes #433 from berngp/feature/SPARK-1522 and squashes the following commits:
      
      2c2e118 [Bernardo Gomez Palacio] [SPARK-1522]: YARN ClientBase throws a NPE if there is no YARN Application specific CP
      e2734476
  31. Jun 08, 2014
  32. Jun 05, 2014
  33. May 31, 2014