  1. Sep 14, 2014
    • [SPARK-3452] Maven build should skip publishing artifacts people shouldn't depend on · f493f798
      Prashant Sharma authored
      
      In Maven terms, publishing locally is `install` and publishing remotely is `deploy`.
      
      Both are therefore disabled for the projects whose artifacts people shouldn't depend on.
      
      Author: Prashant Sharma <prashant.s@imaginea.com>
      
      Closes #2329 from ScrapCodes/SPARK-3452/maven-skip-install and squashes the following commits:
      
      257b79a [Prashant Sharma] [SPARK-3452] Maven build should skip publishing artifacts people shouldn't depend on
      f493f798
  2. Sep 12, 2014
  3. Sep 11, 2014
    • [SPARK-3490] Disable SparkUI for tests · 6324eb7b
      Andrew Or authored
      We currently open many ephemeral ports during the tests, and as a result we occasionally can't bind to new ones. This has caused the `DriverSuite` and the `SparkSubmitSuite` to fail intermittently.
      
      By disabling the `SparkUI` when it's not needed, we already cut down on the number of ports opened significantly, on the order of the number of `SparkContexts` ever created. We must keep it enabled for a few tests for the UI itself, however.
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #2363 from andrewor14/disable-ui-for-tests and squashes the following commits:
      
      332a7d5 [Andrew Or] No need to set spark.ui.port to 0 anymore
      30c93a2 [Andrew Or] Simplify streaming UISuite
      a431b84 [Andrew Or] Fix streaming test failures
      8f5ae53 [Andrew Or] Fix no new line at the end
      29c9b5b [Andrew Or] Disable SparkUI for tests
      6324eb7b
    • [SPARK-2140] Updating heap memory calculation for YARN stable and alpha. · ed1980ff
      Chris Cope authored
      Updated pull request, reflecting YARN stable and alpha states. I am getting intermittent test failures on my own test infrastructure. Is that tracked anywhere yet?
      
      Author: Chris Cope <ccope@resilientscience.com>
      
      Closes #2253 from copester/master and squashes the following commits:
      
      5ad89da [Chris Cope] [SPARK-2140] Removing calculateAMMemory functions since they are no longer needed.
      52b4e45 [Chris Cope] [SPARK-2140] Updating heap memory calculation for YARN stable and alpha.
      ed1980ff
  4. Sep 10, 2014
    • SPARK-1713. Use a thread pool for launching executors. · 1f4a648d
      Sandy Ryza authored
      This patch copies the approach used in the MapReduce application master for launching containers.
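
      As a rough illustration of the thread-pool idea, the sketch below submits one launch task per container to a fixed-size pool from `java.util.concurrent`. `LauncherPoolSketch`, `LaunchTask`, and `launchAll` are hypothetical names, and the task body only records the container ID; this is not the code from the patch or from MapReduce.

      ```scala
      import java.util.concurrent.{ConcurrentLinkedQueue, Executors, TimeUnit}

      object LauncherPoolSketch {
        // Hypothetical stand-in for the blocking work of starting one executor container.
        final class LaunchTask(containerId: String, launched: ConcurrentLinkedQueue[String])
            extends Runnable {
          override def run(): Unit = {
            // A real launcher would talk to a NodeManager here; the sketch only records the ID.
            launched.add(containerId)
            ()
          }
        }

        // Submit one task per container to a bounded pool, then wait for all of them.
        def launchAll(containerIds: Seq[String], poolSize: Int): Set[String] = {
          val launched = new ConcurrentLinkedQueue[String]()
          val pool = Executors.newFixedThreadPool(poolSize)
          containerIds.foreach(id => pool.execute(new LaunchTask(id, launched)))
          pool.shutdown()                             // accept no new tasks; queued ones still run
          pool.awaitTermination(30, TimeUnit.SECONDS) // block until the launches finish
          launched.toArray(new Array[String](0)).toSet
        }
      }
      ```

      With a pool, a slow launch on one node no longer delays the launches queued behind it, as long as a worker thread is free.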
      
      Author: Sandy Ryza <sandy@cloudera.com>
      
      Closes #663 from sryza/sandy-spark-1713 and squashes the following commits:
      
      036550d [Sandy Ryza] SPARK-1713. [YARN] Use a threadpool for launching executor containers
      1f4a648d
    • Josh Rosen's avatar
      26503fdf
    • [SPARK-3286] - Cannot view ApplicationMaster UI when Yarn’s url scheme is https · 6f7a7683
      Benoy Antony authored
      
      Author: Benoy Antony <benoy@apache.org>
      
      Closes #2276 from benoyantony/SPARK-3286 and squashes the following commits:
      
      c3d51ee [Benoy Antony] Use address with scheme, but Alpha version removes the scheme
      e82f94e [Benoy Antony] Use address with scheme, but Alpha version removes the scheme
      92127c9 [Benoy Antony] rebasing from master
      450c536 [Benoy Antony] [SPARK-3286] - Cannot view ApplicationMaster UI when Yarn’s url scheme is https
      f060c02 [Benoy Antony] [SPARK-3286] - Cannot view ApplicationMaster UI when Yarn’s url scheme is https
      6f7a7683
  5. Sep 06, 2014
  6. Sep 05, 2014
    • [SPARK-3375] spark on yarn container allocation issues · 62c55760
      Thomas Graves authored
      If YARN doesn't grant the containers immediately, Spark stops asking for them and the YARN application hangs without ever getting any executors.
      
      The issue is that we send a container count of 0 after sending the original request for X. On the YARN side, this clears out the original request.
      
      For a ping we should just send empty asks.
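
      The difference between the two kinds of heartbeat can be shown with a toy model. `ToyResourceManager` below is a hypothetical class, not the YARN API; it only assumes, as described above, that a later ask at the same priority replaces the earlier one.

      ```scala
      object AllocatePingSketch {
        // Toy resource manager: the latest ask for a priority replaces any earlier one.
        final class ToyResourceManager {
          private var pending = Map.empty[Int, Int] // priority -> containers still wanted

          // Each ask is a (priority, containerCount) pair; an empty list changes nothing.
          def allocate(asks: Seq[(Int, Int)]): Unit =
            asks.foreach { case (priority, count) => pending = pending.updated(priority, count) }

          def pendingFor(priority: Int): Int = pending.getOrElse(priority, 0)
        }
      }
      ```

      In this model an empty ask list leaves the outstanding request untouched, while an ask with a count of 0 overwrites it.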
      
      Author: Thomas Graves <tgraves@apache.org>
      
      Closes #2275 from tgravescs/SPARK-3375 and squashes the following commits:
      
      74b6820 [Thomas Graves] send empty resource requests when we aren't asking for containers
      62c55760
    • [SPARK-3260] yarn - pass acls along with executor launch · 51b53a75
      Thomas Graves authored
      Pass along the acl settings when we launch a container so that they can be applied to viewing the logs on a running NodeManager.
      
      Author: Thomas Graves <tgraves@apache.org>
      
      Closes #2185 from tgravescs/SPARK-3260 and squashes the following commits:
      
      6f94b5a [Thomas Graves] make unit test more robust
      28b9dd3 [Thomas Graves] yarn - pass acls along with executor launch
      51b53a75
  7. Sep 03, 2014
    • [SPARK-3388] Expose application ID in ApplicationStart event, use it in history server. · f2b5b619
      Marcelo Vanzin authored
      This change exposes the application ID generated by the Spark Master, Mesos or Yarn
      via the SparkListenerApplicationStart event. It then uses that information to expose the
      application via its ID in the history server, instead of using the internal directory name
      generated by the event logger as an application id. This allows someone who knows
      the application ID to easily figure out the URL for the application's entry in the HS, aside
      from looking better.
      
      In Yarn mode, this is used to generate a direct link from the RM application list to the
      Spark history server entry (thus providing a fix for SPARK-2150).
      
      Note this sort of assumes that the different managers will generate app ids that are
      sufficiently different from each other that clashes will not occur.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      This patch had conflicts when merged, resolved by
      Committer: Andrew Or <andrewor14@gmail.com>
      
      Closes #1218 from vanzin/yarn-hs-link-2 and squashes the following commits:
      
      2d19f3c [Marcelo Vanzin] Review feedback.
      6706d3a [Marcelo Vanzin] Implement applicationId() in base classes.
      56fe42e [Marcelo Vanzin] Fix cluster mode history address, plus a cleanup.
      44112a8 [Marcelo Vanzin] Merge branch 'master' into yarn-hs-link-2
      8278316 [Marcelo Vanzin] Merge branch 'master' into yarn-hs-link-2
      a86bbcf [Marcelo Vanzin] Merge branch 'master' into yarn-hs-link-2
      a0056e6 [Marcelo Vanzin] Unbreak test.
      4b10cfd [Marcelo Vanzin] Merge branch 'master' into yarn-hs-link-2
      cb0cab2 [Marcelo Vanzin] Merge branch 'master' into yarn-hs-link-2
      25f2826 [Marcelo Vanzin] Add MIMA excludes.
      f0ba90f [Marcelo Vanzin] Use BufferedIterator.
      c90a08d [Marcelo Vanzin] Remove unused code.
      3f8ec66 [Marcelo Vanzin] Review feedback.
      21aa71b [Marcelo Vanzin] Fix JSON test.
      b022bae [Marcelo Vanzin] Undo SparkContext cleanup.
      c6d7478 [Marcelo Vanzin] Merge branch 'master' into yarn-hs-link-2
      4e3483f [Marcelo Vanzin] Fix test.
      57517b8 [Marcelo Vanzin] Review feedback. Mostly, more consistent use of Scala's Option.
      311e49d [Marcelo Vanzin] Merge branch 'master' into yarn-hs-link-2
      d35d86f [Marcelo Vanzin] Fix yarn backend after rebase.
      36dc362 [Marcelo Vanzin] Don't use Iterator::takeWhile().
      0afd696 [Marcelo Vanzin] Wait until master responds before returning from start().
      abc4697 [Marcelo Vanzin] Make FsHistoryProvider keep a map of applications by id.
      26b266e [Marcelo Vanzin] Use Mesos framework ID as Spark application ID.
      b3f3664 [Marcelo Vanzin] [yarn] Make the RM link point to the app directly in the HS.
      2fb7de4 [Marcelo Vanzin] Expose the application ID in the ApplicationStart event.
      ed10348 [Marcelo Vanzin] Expose application id to spark context.
      f2b5b619
    • [SPARK-3187] [yarn] Cleanup allocator code. · 6a72a369
      Marcelo Vanzin authored
      Move all shared logic to the base YarnAllocator class, and leave
      the version-specific logic in the version-specific module.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #2169 from vanzin/SPARK-3187 and squashes the following commits:
      
      46c2826 [Marcelo Vanzin] Hide the privates.
      4dc9c83 [Marcelo Vanzin] Actually release containers.
      8b1a077 [Marcelo Vanzin] Changes to the Yarn alpha allocator.
      f3f5f1d [Marcelo Vanzin] [SPARK-3187] [yarn] Cleanup allocator code.
      6a72a369
  8. Sep 02, 2014
    • [SPARK-3347] [yarn] Fix yarn-alpha compilation. · 066f31a6
      Marcelo Vanzin authored
      Missing import. Oops.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #2236 from vanzin/SPARK-3347 and squashes the following commits:
      
      594fc39 [Marcelo Vanzin] [SPARK-3347] [yarn] Fix yarn-alpha compilation.
      066f31a6
  9. Aug 31, 2014
    • [SPARK-3010] fix redundant conditional · 725715cb
      scwf authored
      https://issues.apache.org/jira/browse/SPARK-3010
      
      This PR fixes redundant conditionals in Spark, such as:
      1.
      private[spark] def codegenEnabled: Boolean =
      if (getConf(CODEGEN_ENABLED, "false") == "true") true else false
      2.
      x => if (x == 2) true else false
      ...
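
      A before/after sketch of the first pattern follows. `getConf` here is a hypothetical stand-in that reads from a plain `Map`, and the key `codegen.enabled` is made up for the example.

      ```scala
      object RedundantConditionalSketch {
        // Hypothetical stand-in for a string-valued config lookup.
        def getConf(settings: Map[String, String], key: String, default: String): String =
          settings.getOrElse(key, default)

        // Before: the if/else only restates the Boolean it tests.
        def codegenEnabledVerbose(settings: Map[String, String]): Boolean =
          if (getConf(settings, "codegen.enabled", "false") == "true") true else false

        // After: return the comparison directly.
        def codegenEnabled(settings: Map[String, String]): Boolean =
          getConf(settings, "codegen.enabled", "false") == "true"
      }
      ```

      The comparison already yields a `Boolean`, so both forms return the same value for every input.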
      
      Author: scwf <wangfei1@huawei.com>
      Author: wangfei <wangfei_hello@126.com>
      
      Closes #1992 from scwf/condition and squashes the following commits:
      
      b2a044a [scwf] merge SecurityManager
      e16239c [scwf] fix conflict
      6811401 [scwf] fix merge conflict
      0824df4 [scwf] Merge branch 'master' of https://github.com/apache/spark into patch-4
      e274515 [scwf] fix redundant conditions
      d032bf9 [wangfei] [SQL]Excess judgment
      725715cb
  10. Aug 30, 2014
    • [SPARK-2889] Create Hadoop config objects consistently. · b6cf1348
      Marcelo Vanzin authored
      Different places in the code were instantiating Configuration / YarnConfiguration objects in different ways. This could lead to confusion for people who actually expected "spark.hadoop.*" options to end up in the configs used by Spark code, since that would only happen for the SparkContext's config.
      
      This change modifies most places to use SparkHadoopUtil to initialize configs, and makes that method do the translation that previously was only done inside SparkContext.
      
      The places that were not changed fall in one of the following categories:
      - Test code where this doesn't really matter
      - Places deep in the code where plumbing SparkConf would be too difficult for very little gain
      - Default values for arguments - since the caller can provide their own config in that case
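
      As a minimal sketch of the kind of translation involved, assuming it amounts to copying each "spark.hadoop.*" entry into the Hadoop-side settings with the prefix stripped, the example below works on plain `Map`s. `HadoopConfSketch` and `toHadoopSettings` are hypothetical names, not the SparkHadoopUtil API.

      ```scala
      object HadoopConfSketch {
        private val Prefix = "spark.hadoop."

        // Copy every "spark.hadoop.foo" entry into the Hadoop-side settings as "foo";
        // entries without the prefix are left out.
        def toHadoopSettings(sparkSettings: Map[String, String]): Map[String, String] =
          sparkSettings.collect {
            case (key, value) if key.startsWith(Prefix) => key.substring(Prefix.length) -> value
          }
      }
      ```

      Doing this in one shared helper means every caller sees the same options, rather than only the code that goes through SparkContext.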
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #1843 from vanzin/SPARK-2889 and squashes the following commits:
      
      52daf35 [Marcelo Vanzin] Merge branch 'master' into SPARK-2889
      f179013 [Marcelo Vanzin] Merge branch 'master' into SPARK-2889
      51e71cf [Marcelo Vanzin] Add test to ensure that overriding Yarn configs works.
      53f9506 [Marcelo Vanzin] Add DeveloperApi annotation.
      3d345cb [Marcelo Vanzin] Restore old method for backwards compat.
      fc45067 [Marcelo Vanzin] Merge branch 'master' into SPARK-2889
      0ac3fdf [Marcelo Vanzin] Merge branch 'master' into SPARK-2889
      3f26760 [Marcelo Vanzin] Compilation fix.
      f16cadd [Marcelo Vanzin] Initialize config in SparkHadoopUtil.
      b8ab173 [Marcelo Vanzin] Update Utils API to take a Configuration argument.
      1e7003f [Marcelo Vanzin] Replace explicit Configuration instantiation with SparkHadoopUtil.
      b6cf1348
  11. Aug 29, 2014
  12. Aug 28, 2014
    • SPARK-3082. yarn.Client.logClusterResourceDetails throws NPE if requested queue doesn't exist · 92af2314
      Sandy Ryza authored
      
      Author: Sandy Ryza <sandy@cloudera.com>
      
      Closes #1984 from sryza/sandy-spark-3082 and squashes the following commits:
      
      fe08c37 [Sandy Ryza] Remove log message entirely
      85253ad [Sandy Ryza] SPARK-3082. yarn.Client.logClusterResourceDetails throws NPE if requested queue doesn't exist
      92af2314
  13. Aug 27, 2014
    • [SPARK-2933] [yarn] Refactor and cleanup Yarn AM code. · b92d823a
      Marcelo Vanzin authored
      This change modifies the Yarn module so that all the logic related
      to running the ApplicationMaster is localized. Instead of the previous
      4 different classes with mostly identical code, we now have:
      
      - A single, shared ApplicationMaster class, which can operate both in
        client and cluster mode, and substitutes the old ApplicationMaster
        (for cluster mode) and ExecutorLauncher (for client mode).
      
      The benefit here is that all different execution modes for all supported
      yarn versions use the same shared code for monitoring executor allocation,
      setting up configuration, and monitoring the process's lifecycle.
      
      - A new YarnRMClient interface, which defines basic RM functionality needed
        by the ApplicationMaster. This interface has concrete implementations for
        each supported Yarn version.
      
      - A new YarnAllocator interface, which just abstracts the existing interface
        of the YarnAllocationHandler class. This is to avoid having to touch the
        allocator code too much in this change, although it might benefit from a
        similar effort in the future.
      
      The end result is much easier to understand code, with much less duplication,
      making it much easier to fix bugs, add features, and test everything knowing
      that all supported versions will behave the same.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #2020 from vanzin/SPARK-2933 and squashes the following commits:
      
      3bbf3e7 [Marcelo Vanzin] Merge branch 'master' into SPARK-2933
      ff389ed [Marcelo Vanzin] Do not interrupt reporter thread from within itself.
      3a8ed37 [Marcelo Vanzin] Remote stale comment.
      0f5142c [Marcelo Vanzin] Review feedback.
      41f8c8a [Marcelo Vanzin] Fix app status reporting.
      c0794be [Marcelo Vanzin] Correctly clean up staging directory.
      92770cc [Marcelo Vanzin] Merge branch 'master' into SPARK-2933
      ecaf332 [Marcelo Vanzin] Small fix to shutdown code.
      f02d3f8 [Marcelo Vanzin] Merge branch 'master' into SPARK-2933
      f581122 [Marcelo Vanzin] Review feedback.
      557fdeb [Marcelo Vanzin] Cleanup a couple more constants.
      be6068d [Marcelo Vanzin] Restore shutdown hook to clean up staging dir.
      5150993 [Marcelo Vanzin] Some more cleanup.
      b6289ab [Marcelo Vanzin] Move cluster/client code to separate methods.
      ecb23cd [Marcelo Vanzin] More trivial cleanup.
      34f1e63 [Marcelo Vanzin] Fix some questionable error handling.
      5657c7d [Marcelo Vanzin] Finish app if SparkContext initialization times out.
      0e4be3d [Marcelo Vanzin] Keep "ExecutorLauncher" as the main class for client-mode AM.
      91beabb [Marcelo Vanzin] Fix UI filter registration.
      8c72239 [Marcelo Vanzin] Trivial cleanups.
      99a52d5 [Marcelo Vanzin] Changes to the yarn-alpha project to use common AM code.
      848ca6d [Marcelo Vanzin] [SPARK-2933] [yarn] Refactor and cleanup Yarn AM code.
      b92d823a
  14. Aug 26, 2014
    • [SPARK-2886] Use more specific actor system name than "spark" · b21ae5bb
      Andrew Or authored
      As of #1777 we log the name of the actor system when it binds to a port. The current name "spark" is super general and does not convey any meaning. For instance, the following line is taken from my driver log after setting `spark.driver.port` to 5001.
      ```
      14/08/13 19:33:29 INFO Remoting: Remoting started; listening on addresses:
      [akka.tcp://sparkandrews-mbp:5001]
      14/08/13 19:33:29 INFO Remoting: Remoting now listens on addresses:
      [akka.tcp://sparkandrews-mbp:5001]
      14/08/06 13:40:05 INFO Utils: Successfully started service 'spark' on port 5001.
      ```
      This commit renames this to "sparkDriver" and "sparkExecutor". The goal of this unambitious PR is simply to make the logged information more explicit without introducing any change in functionality.
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #1810 from andrewor14/service-name and squashes the following commits:
      
      8c459ed [Andrew Or] Use a common variable for driver/executor actor system names
      3a92843 [Andrew Or] Change actor name to sparkDriver and sparkExecutor
      921363e [Andrew Or] Merge branch 'master' of github.com:apache/spark into service-name
      c8c6a62 [Andrew Or] Do not include hyphens in actor name
      1c1b42e [Andrew Or] Avoid spaces in akka system name
      f644b55 [Andrew Or] Use more specific service name
      b21ae5bb
  15. Aug 22, 2014
  16. Aug 20, 2014
    • [SPARK-2974] [SPARK-2975] Fix two bugs related to spark.local.dirs · ebcb94f7
      Josh Rosen authored
      This PR fixes two bugs related to `spark.local.dirs` and `SPARK_LOCAL_DIRS`, one where `Utils.getLocalDir()` might return an invalid directory (SPARK-2974) and another where the `SPARK_LOCAL_DIRS` override didn't affect the driver, which could cause problems when running tasks in local mode (SPARK-2975).
      
      This patch fixes both issues: the new `Utils.getOrCreateLocalRootDirs(conf: SparkConf)` utility method manages the creation of local directories and handles the precedence among the different configuration options, so we should see the same behavior whether we're running in local mode or on a worker.
      
      It's kind of a pain to mock out environment variables in tests (no easy way to mock System.getenv), so I added a `private[spark]` method to SparkConf for accessing environment variables (by default, it just delegates to System.getenv).  By subclassing SparkConf and overriding this method, we can mock out SPARK_LOCAL_DIRS in tests.
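
      The mocking approach can be sketched without Spark. `Conf`, `MockedEnvConf`, and `localRootDirs` below are hypothetical names, and the precedence shown (the `SPARK_LOCAL_DIRS` variable wins over the configured value) is a simplifying assumption for the example, not the full logic of `Utils.getOrCreateLocalRootDirs`.

      ```scala
      object EnvMockSketch {
        // Hypothetical config class: all environment access goes through one overridable method.
        class Conf(val configuredDirs: String) {
          def getenv(name: String): String = System.getenv(name)
        }

        // Test double: serves variables from a fixed map instead of the real environment.
        class MockedEnvConf(dirs: String, env: Map[String, String]) extends Conf(dirs) {
          override def getenv(name: String): String = env.getOrElse(name, null)
        }

        // Assumed precedence for this sketch: SPARK_LOCAL_DIRS, when set, overrides the config value.
        def localRootDirs(conf: Conf): Seq[String] = {
          val raw = Option(conf.getenv("SPARK_LOCAL_DIRS")).getOrElse(conf.configuredDirs)
          raw.split(",").map(_.trim).filter(_.nonEmpty).toSeq
        }
      }
      ```

      A test can then build a `MockedEnvConf` with whatever variables it needs and never touch the process environment.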
      
      I also fixed a typo in PySpark where we used `SPARK_LOCAL_DIR` instead of `SPARK_LOCAL_DIRS` (I think this was technically innocuous, but it seemed worth fixing).
      
      Author: Josh Rosen <joshrosen@apache.org>
      
      Closes #2002 from JoshRosen/local-dirs and squashes the following commits:
      
      efad8c6 [Josh Rosen] Address review comments:
      1dec709 [Josh Rosen] Minor updates to Javadocs.
      7f36999 [Josh Rosen] Use env vars to detect if running in YARN container.
      399ac25 [Josh Rosen] Update getLocalDir() documentation.
      bb3ad89 [Josh Rosen] Remove duplicated YARN getLocalDirs() code.
      3e92d44 [Josh Rosen] Move local dirs override logic into Utils; fix bugs:
      b2c4736 [Josh Rosen] Add failing tests for SPARK-2974 and SPARK-2975.
      007298b [Josh Rosen] Allow environment variables to be mocked in tests.
      6d9259b [Josh Rosen] Fix typo in PySpark: SPARK_LOCAL_DIR should be SPARK_LOCAL_DIRS
      ebcb94f7
  17. Aug 19, 2014
    • [SPARK-3072] YARN - Exit when reach max number failed executors · 7eb9cbc2
      Thomas Graves authored
      In some cases on Hadoop 2.x the Spark application master doesn't exit properly and hangs around for 10 minutes after it's really done. We should make sure it exits properly and stops the driver.
      
      Author: Thomas Graves <tgraves@apache.org>
      
      Closes #2022 from tgravescs/SPARK-3072 and squashes the following commits:
      
      665701d [Thomas Graves] Exit when reach max number failed executors
      7eb9cbc2
  18. Aug 18, 2014
    • [SPARK-2718] [yarn] Handle quotes and other characters in user args. · 6201b276
      Marcelo Vanzin authored
      Due to the way Yarn runs things through bash, normal quoting doesn't
      work as expected. This change applies the necessary voodoo to the user
      args to avoid issues with bash and special characters.
      
      The change also uncovered an issue with the event logger app name
      sanitizing code; it wasn't cleaning up all "bad" characters, so
      sometimes it would fail to create the log dirs. I just added some
      more bad character replacements.
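
      For readers unfamiliar with the problem, the sketch below shows the standard POSIX single-quoting technique for passing an arbitrary string through a shell. It is a generic illustration with a hypothetical helper name, `quoteForBash`, and is not necessarily the escaping this change applies.

      ```scala
      object ShellQuoteSketch {
        // Wrap the argument in single quotes so the shell treats it literally. An embedded
        // single quote is written as '\'' (close the quote, escaped quote, reopen the quote).
        def quoteForBash(arg: String): String =
          "'" + arg.replace("'", "'\\''") + "'"
      }
      ```

      Inside single quotes the shell expands nothing, so characters such as `$`, `"`, and backslashes reach the program unchanged.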
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #1724 from vanzin/SPARK-2718 and squashes the following commits:
      
      cc84b89 [Marcelo Vanzin] Review feedback.
      c1a257a [Marcelo Vanzin] Add test for backslashes.
      55571d4 [Marcelo Vanzin] Unbreak yarn-client.
      515613d [Marcelo Vanzin] [SPARK-2718] [yarn] Handle quotes and other characters in user args.
      6201b276
  19. Aug 09, 2014
    • [SPARK-2635] Fix race condition at SchedulerBackend.isReady in standalone mode · 28dbae85
      li-zhihui authored
      In SPARK-1946 (PR #900), the configuration `spark.scheduler.minRegisteredExecutorsRatio` was introduced. However, in standalone mode there is a race condition where isReady() can return true because totalExpectedExecutors has not been correctly set.
      
      Because the expected number of executors is uncertain in standalone mode, this PR uses CPU cores (`--total-executor-cores`) as the expected resources to judge whether the SchedulerBackend is ready.
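
      A minimal sketch of a ratio-based readiness check over cores follows. `ReadinessSketch` and its parameter names are hypothetical and the real scheduler backend tracks more state, but the sketch shows why an unset expected total makes the check pass trivially.

      ```scala
      object ReadinessSketch {
        // Ready once the registered cores reach the configured fraction of the expected total.
        // If the expected total is still 0 (not yet set), the check passes trivially.
        def isReady(registeredCores: Int, totalExpectedCores: Int, minRegisteredRatio: Double): Boolean =
          registeredCores >= totalExpectedCores * minRegisteredRatio
      }
      ```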
      
      Author: li-zhihui <zhihui.li@intel.com>
      Author: Li Zhihui <zhihui.li@intel.com>
      
      Closes #1525 from li-zhihui/fixre4s and squashes the following commits:
      
      e9a630b [Li Zhihui] Rename variable totalExecutors and clean codes
      abf4860 [Li Zhihui] Push down variable totalExpectedResources to children classes
      ca54bd9 [li-zhihui] Format log with String interpolation
      88c7dc6 [li-zhihui] Few codes and docs refactor
      41cf47e [li-zhihui] Fix race condition at SchedulerBackend.isReady in standalone mode
      28dbae85
  20. Aug 05, 2014
    • SPARK-1680: use configs for specifying environment variables on YARN · 41e0a21b
      Thomas Graves authored
      Note that this also documents spark.executorEnv.*, which to me means it's public. If we don't want that, please speak up.
      
      Author: Thomas Graves <tgraves@apache.org>
      
      Closes #1512 from tgravescs/SPARK-1680 and squashes the following commits:
      
      11525df [Thomas Graves] more doc changes
      553bad0 [Thomas Graves] fix documentation
      152bf7c [Thomas Graves] fix docs
      5382326 [Thomas Graves] try fix docs
      32f86a4 [Thomas Graves] use configs for specifying environment variables on YARN
      41e0a21b
    • SPARK-1890 and SPARK-1891- add admin and modify acls · 1c5555a2
      Thomas Graves authored
      It was easier to combine these two JIRAs since they touch many of the same places. This PR does the following:
      
      - adds modify acls
      - adds admin acls (list of admins/users that get added to both view and modify acls)
      - modifies the Kill button on the UI to take modify acls into account
      - changes the config name spark.ui.acls.enable to spark.acls.enable, since I chose the original name poorly. We keep backwards compatibility so people can still use spark.ui.acls.enable. The acls should apply to any web UI as well as any CLI interfaces.
      - sends view and modify acls information on to YARN so that YARN interfaces can use it (the yarn CLI for killing applications, for example).
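
      The list above can be sketched as a small model. `AclSketch`, `Acls`, and `aclsEnabled` are hypothetical names; only the two config keys come from the description, and real checks would also need to account for details such as the application owner.

      ```scala
      object AclSketch {
        // Admins are implicitly members of both the view and the modify lists.
        final case class Acls(
            enabled: Boolean,
            admins: Set[String],
            viewers: Set[String],
            modifiers: Set[String]) {
          def canView(user: String): Boolean = !enabled || (admins ++ viewers).contains(user)
          def canModify(user: String): Boolean = !enabled || (admins ++ modifiers).contains(user)
        }

        // Prefer the new key; fall back to the old one so existing configs keep working.
        def aclsEnabled(settings: Map[String, String]): Boolean =
          settings.get("spark.acls.enable")
            .orElse(settings.get("spark.ui.acls.enable"))
            .exists(_.toBoolean)
      }
      ```

      Keeping the fallback in a single lookup function means callers never need to know that the key was renamed.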
      
      Author: Thomas Graves <tgraves@apache.org>
      
      Closes #1196 from tgravescs/SPARK-1890 and squashes the following commits:
      
      8292eb1 [Thomas Graves] review comments
      b92ec89 [Thomas Graves] remove unneeded variable from applistener
      4c765f4 [Thomas Graves] Add in admin acls
      72eb0ac [Thomas Graves] Add modify acls
      1c5555a2
    • SPARK-1528 - spark on yarn, add support for accessing remote HDFS · 2c0f705e
      Thomas Graves authored
      Add a config (spark.yarn.access.namenodes) to allow applications running on YARN to access other secure HDFS clusters. The user just specifies the namenodes of the other clusters, and we get tokens for those and ship them with the Spark application.
      
      Author: Thomas Graves <tgraves@apache.org>
      
      Closes #1159 from tgravescs/spark-1528 and squashes the following commits:
      
      ddbcd16 [Thomas Graves] review comments
      0ac8501 [Thomas Graves] SPARK-1528 - add support for accessing remote HDFS
      2c0f705e
  21. Jul 30, 2014
    • Required AM memory is "amMem", not "args.amMemory" · 118c1c42
      derek ma authored
      "ERROR yarn.Client: Required AM memory (1024) is above the max threshold (1048) of this cluster" appears if this code is not changed. Obviously, 1024 is less than 1048, so the message should report `amMem` rather than `args.amMemory`.
      
      Author: derek ma <maji3@asiainfo-linkage.com>
      
      Closes #1494 from maji2014/master and squashes the following commits:
      
      b0f6640 [derek ma] Required AM memory is "amMem", not "args.amMemory"
      118c1c42
  22. Jul 28, 2014
    • [SPARK-2410][SQL] Merging Hive Thrift/JDBC server (with Maven profile fix) · a7a9d144
      Cheng Lian authored
      JIRA issue: [SPARK-2410](https://issues.apache.org/jira/browse/SPARK-2410)
      
      Another try for #1399 & #1600. Those two PRs broke Jenkins builds because we made a separate profile `hive-thriftserver` in sub-project `assembly`, but the `hive-thriftserver` module was defined outside the `hive-thriftserver` profile. Thus every pull request, even one that doesn't touch SQL code, also executes the test suites defined in `hive-thriftserver`, and those tests fail because the related .class files are not included in the assembly jar.
      
      In the most recent commit, module `hive-thriftserver` is moved into its own profile to fix this problem. All previous commits are squashed for clarity.
      
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      
      Closes #1620 from liancheng/jdbc-with-maven-fix and squashes the following commits:
      
      629988e [Cheng Lian] Moved hive-thriftserver module definition into its own profile
      ec3c7a7 [Cheng Lian] Cherry picked the Hive Thrift server
      a7a9d144
  23. Jul 27, 2014
    • Revert "[SPARK-2410][SQL] Merging Hive Thrift/JDBC server" · e5bbce9a
      Patrick Wendell authored
      This reverts commit f6ff2a61.
      e5bbce9a
    • [SPARK-2410][SQL] Merging Hive Thrift/JDBC server · f6ff2a61
      Cheng Lian authored
      (This is a replacement of #1399, trying to fix potential `HiveThriftServer2` port collision between parallel builds. Please refer to [these comments](https://github.com/apache/spark/pull/1399#issuecomment-50212572) for details.)
      
      JIRA issue: [SPARK-2410](https://issues.apache.org/jira/browse/SPARK-2410)
      
      Merging the Hive Thrift/JDBC server from [branch-1.0-jdbc](https://github.com/apache/spark/tree/branch-1.0-jdbc).
      
      Thanks chenghao-intel for his initial contribution of the Spark SQL CLI.
      
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      
      Closes #1600 from liancheng/jdbc and squashes the following commits:
      
      ac4618b [Cheng Lian] Uses random port for HiveThriftServer2 to avoid collision with parallel builds
      090beea [Cheng Lian] Revert changes related to SPARK-2678, decided to move them to another PR
      21c6cf4 [Cheng Lian] Updated Spark SQL programming guide docs
      fe0af31 [Cheng Lian] Reordered spark-submit options in spark-shell[.cmd]
      199e3fb [Cheng Lian] Disabled MIMA for hive-thriftserver
      1083e9d [Cheng Lian] Fixed failed test suites
      7db82a1 [Cheng Lian] Fixed spark-submit application options handling logic
      9cc0f06 [Cheng Lian] Starts beeline with spark-submit
      cfcf461 [Cheng Lian] Updated documents and build scripts for the newly added hive-thriftserver profile
      061880f [Cheng Lian] Addressed all comments by @pwendell
      7755062 [Cheng Lian] Adapts test suites to spark-submit settings
      40bafef [Cheng Lian] Fixed more license header issues
      e214aab [Cheng Lian] Added missing license headers
      b8905ba [Cheng Lian] Fixed minor issues in spark-sql and start-thriftserver.sh
      f975d22 [Cheng Lian] Updated docs for Hive compatibility and Shark migration guide draft
      3ad4e75 [Cheng Lian] Starts spark-sql shell with spark-submit
      a5310d1 [Cheng Lian] Make HiveThriftServer2 play well with spark-submit
      61f39f4 [Cheng Lian] Starts Hive Thrift server via spark-submit
      2c4c539 [Cheng Lian] Cherry picked the Hive Thrift server
      f6ff2a61
  24. Jul 25, 2014
    • Revert "[SPARK-2410][SQL] Merging Hive Thrift/JDBC server" · afd757a2
      Michael Armbrust authored
      This reverts commit 06dc0d2c.
      
      #1399 is making Jenkins fail. We should investigate and put this back after it's passing tests.
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #1594 from marmbrus/revertJDBC and squashes the following commits:
      
      59748da [Michael Armbrust] Revert "[SPARK-2410][SQL] Merging Hive Thrift/JDBC server"
      afd757a2
    • [SPARK-2410][SQL] Merging Hive Thrift/JDBC server · 06dc0d2c
      Cheng Lian authored
      JIRA issue:
      
      - Main: [SPARK-2410](https://issues.apache.org/jira/browse/SPARK-2410)
      - Related: [SPARK-2678](https://issues.apache.org/jira/browse/SPARK-2678)
      
      Cherry picked the Hive Thrift/JDBC server from [branch-1.0-jdbc](https://github.com/apache/spark/tree/branch-1.0-jdbc).
      
      (Thanks chenghao-intel for his initial contribution of the Spark SQL CLI.)
      
      TODO
      
      - [x] Use `spark-submit` to launch the server, the CLI and beeline
      - [x] Migration guideline draft for Shark users
      
      ----
      
      Hit by a bug in `SparkSubmitArguments` while working on this PR: all application options that are recognized by `SparkSubmitArguments` are stolen as `SparkSubmit` options. For example:
      
      ```bash
      $ spark-submit --class org.apache.hive.beeline.BeeLine spark-internal --help
      ```
      
      This actually shows usage information of `SparkSubmit` rather than `BeeLine`.
      
      ~~Fixed this bug here since the `spark-internal` related stuff also touches `SparkSubmitArguments` and I'd like to avoid conflict.~~
      
      **UPDATE** The bug mentioned above is now tracked by [SPARK-2678](https://issues.apache.org/jira/browse/SPARK-2678). Decided to revert the changes for this bug since it involves more subtle considerations and is worth a separate PR.
      
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      
      Closes #1399 from liancheng/thriftserver and squashes the following commits:
      
      090beea [Cheng Lian] Revert changes related to SPARK-2678, decided to move them to another PR
      21c6cf4 [Cheng Lian] Updated Spark SQL programming guide docs
      fe0af31 [Cheng Lian] Reordered spark-submit options in spark-shell[.cmd]
      199e3fb [Cheng Lian] Disabled MIMA for hive-thriftserver
      1083e9d [Cheng Lian] Fixed failed test suites
      7db82a1 [Cheng Lian] Fixed spark-submit application options handling logic
      9cc0f06 [Cheng Lian] Starts beeline with spark-submit
      cfcf461 [Cheng Lian] Updated documents and build scripts for the newly added hive-thriftserver profile
      061880f [Cheng Lian] Addressed all comments by @pwendell
      7755062 [Cheng Lian] Adapts test suites to spark-submit settings
      40bafef [Cheng Lian] Fixed more license header issues
      e214aab [Cheng Lian] Added missing license headers
      b8905ba [Cheng Lian] Fixed minor issues in spark-sql and start-thriftserver.sh
      f975d22 [Cheng Lian] Updated docs for Hive compatibility and Shark migration guide draft
      3ad4e75 [Cheng Lian] Starts spark-sql shell with spark-submit
      a5310d1 [Cheng Lian] Make HiveThriftServer2 play well with spark-submit
      61f39f4 [Cheng Lian] Starts Hive Thrift server via spark-submit
      2c4c539 [Cheng Lian] Cherry picked the Hive Thrift server
      06dc0d2c
  25. Jul 24, 2014
    • GuoQiang Li's avatar
      [SPARK-2037]: yarn client mode doesn't support spark.yarn.max.executor.failures · 323a83c5
      GuoQiang Li authored
      Author: GuoQiang Li <witgo@qq.com>
      
      Closes #1180 from witgo/SPARK-2037 and squashes the following commits:
      
      3d52411 [GuoQiang Li] review commit
      7058f4d [GuoQiang Li] Correctly stop SparkContext
      6d0561f [GuoQiang Li] Fix: yarn client mode doesn't support spark.yarn.max.executor.failures
      323a83c5
    • Rahul Singhal's avatar
      SPARK-2150: Provide direct link to finished application UI in yarn resou... · 46e224aa
      Rahul Singhal authored
      ...rce manager UI
      
      Use the event logger directory to provide a direct link to finished
      application UI in yarn resourcemanager UI.
      
      Author: Rahul Singhal <rahul.singhal@guavus.com>
      
      Closes #1094 from rahulsinghaliitd/SPARK-2150 and squashes the following commits:
      
      95f230c [Rahul Singhal] SPARK-2150: Provide direct link to finished application UI in yarn resource manager UI
      46e224aa
  26. Jul 22, 2014
    • Gera Shegalov's avatar
      [YARN] SPARK-2577: File upload to viewfs is broken due to mount point re... · 02e45729
      Gera Shegalov authored
      Opting for option 2 defined in SPARK-2577, i.e., retrieve and pass the correct file system object to addResource.
      
      Author: Gera Shegalov <gera@twitter.com>
      
      Closes #1483 from gerashegalov/master and squashes the following commits:
      
      90c9087 [Gera Shegalov] [YARN] SPARK-2577: File upload to viewfs is broken due to mount point resolution
      02e45729
  27. Jul 21, 2014
    • Sandy Ryza's avatar
      SPARK-1707. Remove unnecessary 3 second sleep in YarnClusterScheduler · f89cf65d
      Sandy Ryza authored
      Author: Sandy Ryza <sandy@cloudera.com>
      
      Closes #634 from sryza/sandy-spark-1707 and squashes the following commits:
      
      2f6e358 [Sandy Ryza] Default min registered executors ratio to .8 for YARN
      354c630 [Sandy Ryza] Remove outdated comments
      c744ef3 [Sandy Ryza] Take out waitForInitialAllocations
      2a4329b [Sandy Ryza] SPARK-1707. Remove unnecessary 3 second sleep in YarnClusterScheduler
      f89cf65d
  28. Jul 15, 2014
    • witgo's avatar
      SPARK-1291: Link the spark UI to RM ui in yarn-client mode · 72ea56da
      witgo authored
      Author: witgo <witgo@qq.com>
      
      Closes #1112 from witgo/SPARK-1291 and squashes the following commits:
      
      6022bcd [witgo] review commit
      1fbb925 [witgo] add addAmIpFilter to yarn alpha
      210299c [witgo] review commit
      1b92a07 [witgo] review commit
      6896586 [witgo] Add comments to addWebUIFilter
      3e9630b [witgo] review commit
      142ee29 [witgo] review commit
      1fe7710 [witgo] Link the spark UI to RM ui in yarn-client mode
      72ea56da
  29. Jul 14, 2014
    • li-zhihui's avatar
      [SPARK-1946] Submit tasks after (configured ratio) executors have been registered · 3dd8af7a
      li-zhihui authored
      Because submitting tasks and registering executors are asynchronous, in most situations the tasks of early stages run without their preferred locality.
      
      A simple workaround is to sleep for a few seconds in the application, so that executors have enough time to register.
      
      This PR adds 2 configuration properties that make TaskScheduler submit tasks only after a configured ratio of executors have been registered.
      
      # Submit tasks only after (registered executors / total executors) reaches this ratio; default value is 0
      spark.scheduler.minRegisteredExecutorsRatio = 0.8
      
      # Regardless of whether minRegisteredExecutorsRatio has been reached, submit tasks after the maxRegisteredWaitingTime (in milliseconds); default value is 30000
      spark.scheduler.maxRegisteredExecutorsWaitingTime = 5000
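
How the two properties interact can be sketched as a single readiness check. This is an illustrative sketch only, assuming the semantics stated in the comments above. The function name and parameters are hypothetical and do not come from Spark's scheduler code.

```python
def backend_ready(registered, total, min_ratio, waited_ms, max_wait_ms):
    """Return True once tasks may be submitted.

    registered / total -- executors registered so far vs. executors requested
    min_ratio          -- stands in for spark.scheduler.minRegisteredExecutorsRatio
    max_wait_ms        -- stands in for spark.scheduler.maxRegisteredExecutorsWaitingTime
    """
    if total <= 0:
        # Nothing to wait for when no executors were requested.
        return True
    if registered / total >= min_ratio:
        # Enough executors have registered.
        return True
    # Otherwise give up waiting once the timeout has elapsed.
    return waited_ms >= max_wait_ms


# Values from the example configuration above: ratio 0.8, timeout 5000 ms.
enough_executors = backend_ready(8, 10, 0.8, 0, 5000)
still_waiting = backend_ready(3, 10, 0.8, 1000, 5000)
timed_out = backend_ready(3, 10, 0.8, 5000, 5000)
default_ratio = backend_ready(0, 10, 0.0, 0, 30000)
```

With the default ratio of 0 the check passes immediately, which matches the previous behaviour of submitting tasks without waiting.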
      
      Author: li-zhihui <zhihui.li@intel.com>
      
      Closes #900 from li-zhihui/master and squashes the following commits:
      
      b9f8326 [li-zhihui] Add logs & edit docs
      1ac08b1 [li-zhihui] Add new configs to user docs
      22ead12 [li-zhihui] Move waitBackendReady to postStartHook
      c6f0522 [li-zhihui] Bug fix: numExecutors wasn't set & use constant DEFAULT_NUMBER_EXECUTORS
      4d6d847 [li-zhihui] Move waitBackendReady to TaskSchedulerImpl.start & some code refactor
      0ecee9a [li-zhihui] Move waitBackendReady from DAGScheduler.submitStage to TaskSchedulerImpl.submitTasks
      4261454 [li-zhihui] Add docs for new configs & code style
      ce0868a [li-zhihui] Code style, rename configuration property name of minRegisteredRatio & maxRegisteredWaitingTime
      6cfb9ec [li-zhihui] Code style, revert default minRegisteredRatio of yarn to 0, driver get --num-executors in yarn/alpha
      812c33c [li-zhihui] Fix driver lost --num-executors option in yarn-cluster mode
      e7b6272 [li-zhihui] support yarn-cluster
      37f7dc2 [li-zhihui] support yarn mode(percentage style)
      3f8c941 [li-zhihui] submit stage after (configured ratio of) executors have been registered
      3dd8af7a