  1. Mar 30, 2017
  2. Mar 17, 2017
    • Sital Kedia's avatar
      [SPARK-13369] Add config for number of consecutive fetch failures · 7b5d873a
      Sital Kedia authored
      The previously hardcoded maximum of 4 retries per stage is not suitable for all cluster configurations. Since Spark retries a stage at the first sign of a fetch failure, you can easily end up with many stage retries before discovering all the failures. In particular, two scenarios in which this value should change are (1) if there are more than 4 executors per node, in which case it may take 4 retries to discover the problem with each executor on the node, and (2) during cluster maintenance on large clusters, where multiple machines are serviced at once but you also cannot afford total cluster downtime. By making this value configurable, cluster managers can tune it to something more appropriate for their cluster configuration.
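      A minimal sketch of tuning this, assuming the new key is `spark.stage.maxConsecutiveAttempts` (an assumption here; verify against configuration.md):
      
      ```
      import org.apache.spark.SparkConf
      
      // Sketch: allow more consecutive stage attempts before aborting a stage,
      // instead of the previously hardcoded 4. The key name is an assumption.
      val conf = new SparkConf()
        .setAppName("fetch-failure-tuning")
        .set("spark.stage.maxConsecutiveAttempts", "10")
      ```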
      
      Unit tests
      
      Author: Sital Kedia <skedia@fb.com>
      
      Closes #17307 from sitalkedia/SPARK-13369.
      7b5d873a
  3. Feb 24, 2017
    • Shubham Chopra's avatar
      [SPARK-15355][CORE] Proactive block replication · fa7c582e
      Shubham Chopra authored
      ## What changes were proposed in this pull request?
      
      We are proposing the addition of proactive block replication in case of executor failures. BlockManagerMasterEndpoint does all the book-keeping to keep track of all the executors and the blocks they hold. It also keeps track of which executors are alive through heartbeats. When an executor is removed, all this book-keeping state is updated to reflect the lost executor. This step can be used to identify executors that still hold a copy of the cached data, and a message can be sent to them to use the existing "replicate" function to find and place new replicas on other suitable hosts. Blocks replicated this way let the master know of their existence.
      
      This happens when an executor is lost, and is therefore proactive, as opposed to being done at query time.
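      A minimal sketch of opting in, assuming the feature is gated by a key such as `spark.storage.replication.proactive` (an assumption; verify against configuration.md):
      
      ```
      import org.apache.spark.{SparkConf, SparkContext}
      
      // Sketch: re-replicate cached blocks when an executor holding a replica is lost.
      // Only blocks cached with a replication factor > 1 (e.g. StorageLevel.MEMORY_ONLY_2)
      // benefit. The key name is an assumption.
      val conf = new SparkConf()
        .setAppName("proactive-replication")
        .set("spark.storage.replication.proactive", "true")
      val sc = new SparkContext(conf)
      ```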
      ## How was this patch tested?
      
      This patch was tested with existing unit tests along with new unit tests added to test the functionality.
      
      Author: Shubham Chopra <schopra31@bloomberg.net>
      
      Closes #14412 from shubhamchopra/ProactiveBlockReplication.
      fa7c582e
  4. Feb 09, 2017
    • José Hiram Soltren's avatar
      [SPARK-16554][CORE] Automatically Kill Executors and Nodes when they are Blacklisted · 6287c94f
      José Hiram Soltren authored
      ## What changes were proposed in this pull request?
      
      In SPARK-8425, we introduced a mechanism for blacklisting executors and nodes (hosts). After a certain number of failures, these resources would be "blacklisted" and no further work would be assigned to them for some period of time.
      
      In some scenarios, it is better to fail fast and simply kill these unreliable resources. This change proposes to do so by having the BlacklistTracker kill unreliable resources when they would otherwise be "blacklisted".
      
      In order to be thread safe, this code depends on the CoarseGrainedSchedulerBackend sending a message to the driver backend in order to do the actual killing. This also helps to prevent a race which would permit work to begin on a resource (executor or node) between the time the resource is marked for killing and the time at which it is finally killed.
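      A minimal sketch of opting in, assuming the kill behaviour is gated by a key such as `spark.blacklist.killBlacklistedExecutors` (an assumption; verify against configuration.md):
      
      ```
      import org.apache.spark.SparkConf
      
      // Sketch: enable blacklisting and also kill resources once they are blacklisted.
      // Key names are assumptions based on this change.
      val conf = new SparkConf()
        .set("spark.blacklist.enabled", "true")
        .set("spark.blacklist.killBlacklistedExecutors", "true")
      ```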
      
      ## How was this patch tested?
      
      ./dev/run-tests
      Ran https://github.com/jsoltren/jose-utils/blob/master/blacklist/test-blacklist.sh, and checked logs to see executors and nodes being killed.
      
      Testing can likely be improved here; suggestions welcome.
      
      Author: José Hiram Soltren <jose@cloudera.com>
      
      Closes #16650 from jsoltren/SPARK-16554-submit.
      6287c94f
    • Marcelo Vanzin's avatar
      [SPARK-17874][CORE] Add SSL port configuration. · 3fc8e8ca
      Marcelo Vanzin authored
      Make the SSL port configuration explicit, instead of deriving it
      from the non-SSL port, but retain the existing functionality in
      case anyone depends on it.
      
      The change starts the HTTPS and HTTP connectors separately, so
      that it's possible to use independent ports for each. For that to
      work, the initialization of the server needs to be shuffled around
      a bit. The change also makes the initialization of both connectors
      similar, so both end up using the same Scheduler; previously only
      the HTTP connector would use the correct one.
      
      Also fixed some outdated documentation about a couple of services
      that were removed long ago.
      
      Tested with unit tests and by running spark-shell with SSL configs.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #16625 from vanzin/SPARK-17874.
      3fc8e8ca
  5. Jan 24, 2017
    • Marcelo Vanzin's avatar
      [SPARK-19139][CORE] New auth mechanism for transport library. · 8f3f73ab
      Marcelo Vanzin authored
      This change introduces a new auth mechanism to the transport library,
      to be used when users enable strong encryption. This auth mechanism
      has better security than the currently used DIGEST-MD5.
      
      The new protocol uses symmetric key encryption to mutually authenticate
      the endpoints, and is very loosely based on ISO/IEC 9798.
      
      The new protocol falls back to SASL when it thinks the remote end is old.
      SASL does not support asking the server for multiple auth protocols, which
      rules out re-using the existing SASL code by just adding a new SASL provider;
      the protocol is therefore implemented outside of the SASL API, which also
      avoids the boilerplate of adding a new provider.
      
      Details of the auth protocol are discussed in the included README.md
      file.
      
      This change partly undoes the changes added in SPARK-13331; AES encryption
      is now decoupled from SASL authentication. The encryption code itself,
      though, has been re-used as part of this change.
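      A minimal sketch of enabling the new mechanism with SASL fallback, assuming keys such as `spark.network.crypto.enabled` and `spark.network.crypto.saslFallback` (assumptions; the included README.md and configuration.md are authoritative):
      
      ```
      import org.apache.spark.SparkConf
      
      // Sketch: use the new auth protocol instead of DIGEST-MD5, keeping SASL
      // fallback for peers that only speak the old protocol. Key names are assumptions.
      val conf = new SparkConf()
        .set("spark.authenticate", "true")
        .set("spark.network.crypto.enabled", "true")
        .set("spark.network.crypto.saslFallback", "true")
      ```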
      
      ## How was this patch tested?
      
      - Unit tests
      - Tested Spark 2.2 against Spark 1.6 shuffle service with SASL enabled
      - Tested Spark 2.2 against Spark 2.2 shuffle service with SASL fallback disabled
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #16521 from vanzin/SPARK-19139.
      8f3f73ab
    • uncleGen's avatar
      [DOCS] Fix typo in docs · 7c61c2a1
      uncleGen authored
      ## What changes were proposed in this pull request?
      
      Fix typo in docs
      
      ## How was this patch tested?
      
      Author: uncleGen <hustyugm@gmail.com>
      
      Closes #16658 from uncleGen/typo-issue.
      7c61c2a1
  6. Jan 23, 2017
  7. Jan 11, 2017
    • Bryan Cutler's avatar
      [SPARK-17568][CORE][DEPLOY] Add spark-submit option to override ivy settings... · 3bc2eff8
      Bryan Cutler authored
      [SPARK-17568][CORE][DEPLOY] Add spark-submit option to override ivy settings used to resolve packages/artifacts
      
      ## What changes were proposed in this pull request?
      
      Adding an option to spark-submit to allow overriding the default IvySettings used to resolve artifacts as part of the Spark Packages functionality.  This will allow all artifact resolution to go through a centrally managed repository, such as Nexus or Artifactory, where site admins can better approve and control what is used with Spark apps.
      
      This change restructures the creation of the IvySettings object in two distinct ways.  First, if the `spark.ivy.settings` option is not defined then `buildIvySettings` will create a default settings instance, as before, with defined repositories (Maven Central) included.  Second, if the option is defined, the ivy settings file will be loaded from the given path and only repositories defined within will be used for artifact resolution.
      ## How was this patch tested?
      
      Existing tests for default behaviour; manual tests that load an ivysettings.xml file with local and Nexus repositories defined.  Added a new test that loads a simple Ivy settings file with a local filesystem resolver.
      
      Author: Bryan Cutler <cutlerb@gmail.com>
      Author: Ian Hummel <ian@themodernlife.net>
      
      Closes #15119 from BryanCutler/spark-custom-IvySettings.
      3bc2eff8
  8. Jan 07, 2017
    • Sean Owen's avatar
      [SPARK-19106][DOCS] Styling for the configuration docs is broken · 54138f6e
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
      configuration.html section headings were not specified correctly in markdown and weren't being rendered or recognized correctly. Removed extra p tags and pulled level 4 titles up to level 3, since level 3 had been skipped. This improves the TOC.
      
      ## How was this patch tested?
      
      Doc build, manual check.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #16490 from srowen/SPARK-19106.
      54138f6e
  9. Dec 27, 2016
    • Yuexin Zhang's avatar
      [SPARK-19006][DOCS] mention spark.kryoserializer.buffer.max must be less than 2048m in doc · 28ab0ec4
      Yuexin Zhang authored
      ## What changes were proposed in this pull request?
      
      On the configuration doc page, https://spark.apache.org/docs/latest/configuration.html,
      we mention spark.kryoserializer.buffer.max: "Maximum allowable size of Kryo serialization buffer. This must be larger than any object you attempt to serialize. Increase this if you get a "buffer limit exceeded" exception inside Kryo."
      In the source code it has a hard-coded upper limit:
      ```
      val maxBufferSizeMb = conf.getSizeAsMb("spark.kryoserializer.buffer.max", "64m").toInt
      if (maxBufferSizeMb >= ByteUnit.GiB.toMiB(2)) {
        throw new IllegalArgumentException("spark.kryoserializer.buffer.max must be less than " +
          s"2048 mb, got: + $maxBufferSizeMb mb.")
      }
      ```
      We should mention "this value must be less than 2048 mb" on the configuration doc page as well.
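      A minimal sketch of a setting that respects the limit discussed above:
      
      ```
      import org.apache.spark.SparkConf
      
      // Sketch: keep spark.kryoserializer.buffer.max strictly below 2048m,
      // per the hard-coded check shown above.
      val conf = new SparkConf()
        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        .set("spark.kryoserializer.buffer.max", "512m") // any value >= 2048m is rejected
      ```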
      
      ## How was this patch tested?
      
      None, since it's a minor doc change.
      
      Author: Yuexin Zhang <yxzhang@cloudera.com>
      
      Closes #16412 from cnZach/SPARK-19006.
      28ab0ec4
  10. Dec 19, 2016
    • Josh Rosen's avatar
      [SPARK-18761][CORE] Introduce "task reaper" to oversee task killing in executors · fa829ce2
      Josh Rosen authored
      ## What changes were proposed in this pull request?
      
      Spark's current task cancellation / task killing mechanism is "best effort" because some tasks may not be interruptible or may not respond to their "killed" flags being set. If a significant fraction of a cluster's task slots are occupied by tasks that have been marked as killed but remain running then this can lead to a situation where new jobs and tasks are starved of resources that are being used by these zombie tasks.
      
      This patch aims to address this problem by adding a "task reaper" mechanism to executors. At a high-level, task killing now launches a new thread which attempts to kill the task and then watches the task and periodically checks whether it has been killed. The TaskReaper will periodically re-attempt to call `TaskRunner.kill()` and will log warnings if the task keeps running. I modified TaskRunner to rename its thread at the start of the task, allowing TaskReaper to take a thread dump and filter it in order to log stacktraces from the exact task thread that we are waiting to finish. If the task has not stopped after a configurable timeout then the TaskReaper will throw an exception to trigger executor JVM death, thereby forcibly freeing any resources consumed by the zombie tasks.
      
      This feature is flagged off by default and is controlled by four new configurations under the `spark.task.reaper.*` namespace. See the updated `configuration.md` doc for details.
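      A minimal sketch of flagging the feature on, assuming keys under `spark.task.reaper.*` such as the ones below (assumptions; configuration.md is authoritative):
      
      ```
      import org.apache.spark.SparkConf
      
      // Sketch: enable the task reaper and escalate to executor JVM death if a
      // killed task is still running after two minutes. Key names are assumptions.
      val conf = new SparkConf()
        .set("spark.task.reaper.enabled", "true")
        .set("spark.task.reaper.pollingInterval", "10s") // how often to re-check the task
        .set("spark.task.reaper.killTimeout", "120s")    // force JVM death after this
      ```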
      
      ## How was this patch tested?
      
      Tested via a new test case in `JobCancellationSuite`, plus manual testing.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #16189 from JoshRosen/cancellation.
      fa829ce2
  11. Dec 18, 2016
  12. Dec 15, 2016
    • Imran Rashid's avatar
      [SPARK-8425][CORE] Application Level Blacklisting · 93cdb8a7
      Imran Rashid authored
      ## What changes were proposed in this pull request?
      
      This builds upon the blacklisting introduced in SPARK-17675 to add blacklisting of executors and nodes for an entire Spark application.  Resources are blacklisted based on tasks that fail, in tasksets that eventually complete successfully; they are automatically returned to the pool of active resources based on a timeout.  Full details are available in a design doc attached to the jira.
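      A minimal sketch of enabling it for an application, assuming keys such as `spark.blacklist.enabled` and `spark.blacklist.timeout` (assumptions; the design doc and configuration.md are authoritative):
      
      ```
      import org.apache.spark.SparkConf
      
      // Sketch: enable application-level blacklisting with an explicit recovery timeout.
      // Key names are assumptions based on this change.
      val conf = new SparkConf()
        .set("spark.blacklist.enabled", "true")
        .set("spark.blacklist.timeout", "1h") // blacklisted resources return after this
      ```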
      ## How was this patch tested?
      
      Added unit tests, ran them via Jenkins, also ran a handful of them in a loop to check for flakiness.
      
      The added tests include:
      - verifying BlacklistTracker works correctly
      - verifying TaskSchedulerImpl interacts with BlacklistTracker correctly (via a mock BlacklistTracker)
      - an integration test for the entire scheduler with blacklisting in a few different scenarios
      
      Author: Imran Rashid <irashid@cloudera.com>
      Author: mwws <wei.mao@intel.com>
      
      Closes #14079 from squito/blacklist-SPARK-8425.
      93cdb8a7
  13. Dec 12, 2016
    • Marcelo Vanzin's avatar
      [SPARK-18773][CORE] Make commons-crypto config translation consistent. · bc59951b
      Marcelo Vanzin authored
      This change moves the logic that translates Spark configuration to
      commons-crypto configuration to the network-common module. It also
      extends TransportConf and ConfigProvider to provide the necessary
      interfaces for the translation to work.
      
      As part of the change, I removed SystemPropertyConfigProvider, which
      was mostly used as an "empty config" in unit tests, and adjusted the
      very few tests that required a specific config.
      
      I also changed the config keys for AES encryption to live under the
      "spark.network." namespace, which is more correct than their previous
      names under "spark.authenticate.".
      
      Tested via existing unit test.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #16200 from vanzin/SPARK-18773.
      bc59951b
  14. Dec 05, 2016
  15. Nov 28, 2016
    • Marcelo Vanzin's avatar
      [SPARK-18547][CORE] Propagate I/O encryption key when executors register. · 8b325b17
      Marcelo Vanzin authored
      This change modifies the method used to propagate encryption keys used during
      shuffle. Instead of relying on YARN's UserGroupInformation credential propagation,
      this change explicitly distributes the key using the messages exchanged between
      driver and executor during registration. When RPC encryption is enabled, this means
      key propagation is also secure.
      
      This allows shuffle encryption to work in non-YARN mode, which means that it's
      easier to write unit tests for areas of the code that are affected by the feature.
      
      The key is stored in the SecurityManager; because there are many instances of
      that class used in the code, the key is only guaranteed to exist in the instance
      managed by the SparkEnv. This path was chosen to avoid storing the key in the
      SparkConf, which would risk having the key being written to disk as part of the
      configuration (as, for example, is done when starting YARN applications).
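      A minimal sketch of enabling shuffle encryption outside YARN, assuming the key is `spark.io.encryption.enabled` (an assumption; verify against the security docs):
      
      ```
      import org.apache.spark.SparkConf
      
      // Sketch: with the key propagated at executor registration, shuffle encryption
      // no longer depends on YARN credential propagation. Key names are assumptions.
      val conf = new SparkConf()
        .set("spark.authenticate", "true")
        .set("spark.io.encryption.enabled", "true")
      ```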
      
      Tested by new and existing unit tests (which were moved from the YARN module to
      core), and by running apps with shuffle encryption enabled.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #15981 from vanzin/SPARK-18547.
      8b325b17
    • Mark Grover's avatar
      [SPARK-18535][UI][YARN] Redact sensitive information from Spark logs and UI · 237c3b96
      Mark Grover authored
      ## What changes were proposed in this pull request?
      
      This patch adds a new property called `spark.secret.redactionPattern` that
      allows users to specify a Scala regex to decide which Spark configuration
      properties and environment variables in driver and executor environments
      contain sensitive information. When this regex matches the property or
      environment variable name, its value is redacted from the environment UI and
      various logs like YARN and event logs.
      
      This change uses this property to redact information from event logs and YARN
      logs. It also updates the UI code to adhere to this property instead of
      hardcoding the logic to decipher which properties are sensitive.
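      A minimal sketch using the property name from the description above (the released key name may differ, so treat this as illustrative):
      
      ```
      import org.apache.spark.SparkConf
      
      // Sketch: redact any property or environment variable whose name matches the regex.
      val conf = new SparkConf()
        .set("spark.secret.redactionPattern", "(?i)secret|password|credstore")
      ```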
      
      Here's an image of the UI post-redaction:
      ![image](https://cloud.githubusercontent.com/assets/1709451/20506215/4cc30654-b007-11e6-8aee-4cde253fba2f.png)
      
      Here's the text in the YARN logs, post-redaction:
      ``HADOOP_CREDSTORE_PASSWORD -> *********(redacted)``
      
      Here's the text in the event logs, post-redaction:
      ``...,"spark.executorEnv.HADOOP_CREDSTORE_PASSWORD":"*********(redacted)","spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD":"*********(redacted)",...``
      
      ## How was this patch tested?
      1. Unit tests are added to ensure that redaction works.
      2. A YARN job reading data off of S3, with confidential information
      (the hadoop credential provider password) provided in the environment
      variables of the driver and executor. Afterwards, logs were grepped to make
      sure that no mention of the secret password was present. It was also ensured
      that the job was able to read the data off of S3 correctly, thereby confirming
      that the sensitive information was trickled down to the right places to read
      the data.
      3. The event logs were checked to make sure no mention of the secret password
      was present.
      4. The UI environment tab was checked to make sure no secret information was
      being displayed.
      
      Author: Mark Grover <mark@apache.org>
      
      Closes #15971 from markgrover/master_redaction.
      237c3b96
  16. Nov 19, 2016
    • Sean Owen's avatar
      [SPARK-18353][CORE] spark.rpc.askTimeout default value is not 120s · 8b1e1088
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
      Avoid hard-coding spark.rpc.askTimeout to non-default in Client; fix doc about spark.rpc.askTimeout default
      
      ## How was this patch tested?
      
      Existing tests
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #15833 from srowen/SPARK-18353.
      8b1e1088
  17. Nov 16, 2016
    • Weiqing Yang's avatar
      [MINOR][DOC] Fix typos in the 'configuration', 'monitoring' and... · 241e04bc
      Weiqing Yang authored
      [MINOR][DOC] Fix typos in the 'configuration', 'monitoring' and 'sql-programming-guide' documentation
      
      ## What changes were proposed in this pull request?
      
      Fix typos in the 'configuration', 'monitoring' and 'sql-programming-guide' documentation.
      
      ## How was this patch tested?
      Manually.
      
      Author: Weiqing Yang <yangweiqing001@gmail.com>
      
      Closes #15886 from weiqingy/fixTypo.
      241e04bc
  18. Nov 11, 2016
    • Weiqing Yang's avatar
      [SPARK-16759][CORE] Add a configuration property to pass caller contexts of... · 3af89451
      Weiqing Yang authored
      [SPARK-16759][CORE] Add a configuration property to pass caller contexts of upstream applications into Spark
      
      ## What changes were proposed in this pull request?
      
      Many applications take Spark as a computing engine and run on it. This PR adds a configuration property `spark.log.callerContext` that can be used by Spark's upstream applications (e.g. Oozie) to set up their caller contexts into Spark. In the end, Spark will combine its own caller context with the caller contexts of its upstream applications, and write them into Yarn RM log and HDFS audit log.
      
      The audit log has a config to truncate the caller contexts passed in (default 128). The caller contexts will be sent over RPC, so they should be concise. The caller context written into the HDFS log and Yarn log consists of two parts: the information `A` specified by Spark itself and the value `B` of the `spark.log.callerContext` property.  Currently `A` typically takes 64 to 74 characters, so `B` can have up to 50 characters (mentioned in the doc `running-on-yarn.md`).
      ## How was this patch tested?
      
      Manual tests. I have run some Spark applications with `spark.log.callerContext` configuration in Yarn client/cluster mode, and verified that the caller contexts were written into Yarn RM log and HDFS audit log correctly.
      
      The ways to configure `spark.log.callerContext` property:
      - In spark-defaults.conf:
      
      ```
      spark.log.callerContext  infoSpecifiedByUpstreamApp
      ```
      - In app's source code:
      
      ```
      val spark = SparkSession
            .builder
            .appName("SparkKMeans")
            .config("spark.log.callerContext", "infoSpecifiedByUpstreamApp")
            .getOrCreate()
      ```
      
      When running on Spark Yarn cluster mode, the driver is unable to pass 'spark.log.callerContext' to Yarn client and AM since Yarn client and AM have already started before the driver performs `.config("spark.log.callerContext", "infoSpecifiedByUpstreamApp")`.
      
      The following  example shows the command line used to submit a SparkKMeans application and the corresponding records in Yarn RM log and HDFS audit log.
      
      Command:
      
      ```
      ./bin/spark-submit --verbose --executor-cores 3 --num-executors 1 --master yarn --deploy-mode client --class org.apache.spark.examples.SparkKMeans examples/target/original-spark-examples_2.11-2.1.0-SNAPSHOT.jar hdfs://localhost:9000/lr_big.txt 2 5
      ```
      
      Yarn RM log:
      
      <img width="1440" alt="screen shot 2016-10-19 at 9 12 03 pm" src="https://cloud.githubusercontent.com/assets/8546874/19547050/7d2f278c-9649-11e6-9df8-8d5ff12609f0.png">
      
      HDFS audit log:
      
      <img width="1400" alt="screen shot 2016-10-19 at 10 18 14 pm" src="https://cloud.githubusercontent.com/assets/8546874/19547102/096060ae-964a-11e6-981a-cb28efd5a058.png">
      
      Author: Weiqing Yang <yangweiqing001@gmail.com>
      
      Closes #15563 from weiqingy/SPARK-16759.
      3af89451
    • Junjie Chen's avatar
      [SPARK-13331] AES support for over-the-wire encryption · 4f15d94c
      Junjie Chen authored
      ## What changes were proposed in this pull request?
      
      The DIGEST-MD5 mechanism is used for SASL authentication and secure communication. DIGEST-MD5 supports the 3DES, DES, and RC4 ciphers; however, 3DES, DES and RC4 are relatively slow.
      
      AES provides better performance and security by design and is a replacement for 3DES according to NIST. Apache Commons Crypto is a cryptographic library optimized with AES-NI; this patch employs Apache Commons Crypto as the enc/dec backend for SASL authentication and the secure channel to improve Spark RPC.
      ## How was this patch tested?
      
      Unit tests and Integration test.
      
      Author: Junjie Chen <junjie.j.chen@intel.com>
      
      Closes #15172 from cjjnjust/shuffle_rpc_encrypt.
      4f15d94c
  19. Nov 07, 2016
    • fidato's avatar
      [SPARK-16575][CORE] partition calculation mismatch with sc.binaryFiles · 6f369713
      fidato authored
      ## What changes were proposed in this pull request?
      
      This pull request comprises the changes for the critical bug SPARK-16575. It rectifies the issue with BinaryFileRDD partition calculation: upon creating an RDD with sc.binaryFiles, the resulting RDD always consisted of just two partitions.
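      A minimal sketch of the check described above; the path is hypothetical:
      
      ```
      import org.apache.spark.{SparkConf, SparkContext}
      
      // Sketch: before the fix the resulting RDD always had two partitions regardless
      // of the input; afterwards the count reflects the files and the minPartitions hint.
      val sc = new SparkContext(new SparkConf().setAppName("binary-files-partitions"))
      val rdd = sc.binaryFiles("hdfs:///data/images", minPartitions = 64)
      println(s"partitions = ${rdd.getNumPartitions}")
      ```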
      ## How was this patch tested?
      
      The original issue, i.e. getNumPartitions on a binary files RDD always returning two partitions, was first replicated and then re-tested against the changes. The unit tests have also been checked and pass.
      
      This contribution is my original work and I licence the work to the project under the project's open source license
      
      srowen hvanhovell rxin vanzin skyluc kmader zsxwing datafarmer Please have a look .
      
      Author: fidato <fidato.july13@gmail.com>
      
      Closes #15327 from fidato13/SPARK-16575.
      6f369713
  20. Nov 01, 2016
    • Josh Rosen's avatar
      [SPARK-17350][SQL] Disable default use of KryoSerializer in Thrift Server · 6e629815
      Josh Rosen authored
      In SPARK-4761 / #3621 (December 2014) we enabled Kryo serialization by default in the Spark Thrift Server. However, I don't think that the original rationale for doing this still holds now that most Spark SQL serialization is performed via encoders and our UnsafeRow format.
      
      In addition, the use of Kryo as the default serializer can introduce performance problems because the creation of new KryoSerializer instances is expensive and we haven't performed instance-reuse optimizations in several code paths (including DirectTaskResult deserialization).
      
      Given all of this, I propose to revert to using JavaSerializer as the default serializer in the Thrift Server.
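      A minimal sketch for users who still want Kryo in the Thrift Server after this change; they can opt back in explicitly:
      
      ```
      import org.apache.spark.SparkConf
      
      // Sketch: once Kryo is no longer the default, it can still be selected per app.
      val conf = new SparkConf()
        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      ```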
      
      /cc liancheng
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #14906 from JoshRosen/disable-kryo-in-thriftserver.
      6e629815
  21. Oct 30, 2016
    • Hossein's avatar
      [SPARK-17919] Make timeout to RBackend configurable in SparkR · 2881a2d1
      Hossein authored
      ## What changes were proposed in this pull request?
      
      This patch makes RBackend connection timeout configurable by user.
      
      ## How was this patch tested?
      N/A
      
      Author: Hossein <hossein@databricks.com>
      
      Closes #15471 from falaki/SPARK-17919.
      2881a2d1
  22. Oct 26, 2016
    • Alex Bozarth's avatar
      [SPARK-4411][WEB UI] Add "kill" link for jobs in the UI · 5d0f81da
      Alex Bozarth authored
      ## What changes were proposed in this pull request?
      
      Currently users can kill stages via the web ui but not jobs directly (jobs are killed if one of their stages is). I've added the ability to kill jobs via the web ui. This code change is based on #4823 by lianhuiwang and updated to work with the latest code matching how stages are currently killed. In general I've copied the kill stage code warning and note comments and all. I also updated applicable tests and documentation.
      
      ## How was this patch tested?
      
      Manually tested and dev/run-tests
      
      ![screen shot 2016-10-11 at 4 49 43 pm](https://cloud.githubusercontent.com/assets/13952758/19292857/12f1b7c0-8fd4-11e6-8982-210249f7b697.png)
      
      Author: Alex Bozarth <ajbozart@us.ibm.com>
      Author: Lianhui Wang <lianhuiwang09@gmail.com>
      
      Closes #15441 from ajbozarth/spark4411.
      5d0f81da
  23. Oct 22, 2016
    • Sandeep Singh's avatar
      [SPARK-928][CORE] Add support for Unsafe-based serializer in Kryo · bc167a2a
      Sandeep Singh authored
      ## What changes were proposed in this pull request?
      Now that we have migrated to Kryo 3.0.0 in https://issues.apache.org/jira/browse/SPARK-11416, we can give users the option to use an Unsafe-based SerDe. It can be turned on by setting `spark.kryo.useUnsafe` to `true`.
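      A minimal sketch, using the key name from the description above (confirm the exact spelling in configuration.md):
      
      ```
      import org.apache.spark.SparkConf
      
      // Sketch: opt in to the Unsafe-based Kryo serializer. The key name follows the
      // description above and is an assumption here.
      val conf = new SparkConf()
        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        .set("spark.kryo.useUnsafe", "true")
      ```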
      
      ## How was this patch tested?
      Ran existing tests
      
      ```
           Benchmark Kryo Unsafe vs safe Serialization: Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
            ------------------------------------------------------------------------------------------------
            basicTypes: Int unsafe:true                    160 /  178         98.5          10.1       1.0X
            basicTypes: Long unsafe:true                   210 /  218         74.9          13.4       0.8X
            basicTypes: Float unsafe:true                  203 /  213         77.5          12.9       0.8X
            basicTypes: Double unsafe:true                 226 /  235         69.5          14.4       0.7X
            Array: Int unsafe:true                        1087 / 1101         14.5          69.1       0.1X
            Array: Long unsafe:true                       2758 / 2844          5.7         175.4       0.1X
            Array: Float unsafe:true                      1511 / 1552         10.4          96.1       0.1X
            Array: Double unsafe:true                     2942 / 2972          5.3         187.0       0.1X
            Map of string->Double unsafe:true             2645 / 2739          5.9         168.2       0.1X
            basicTypes: Int unsafe:false                   211 /  218         74.7          13.4       0.8X
            basicTypes: Long unsafe:false                  247 /  253         63.6          15.7       0.6X
            basicTypes: Float unsafe:false                 211 /  216         74.5          13.4       0.8X
            basicTypes: Double unsafe:false                227 /  233         69.2          14.4       0.7X
            Array: Int unsafe:false                       3012 / 3032          5.2         191.5       0.1X
            Array: Long unsafe:false                      4463 / 4515          3.5         283.8       0.0X
            Array: Float unsafe:false                     2788 / 2868          5.6         177.2       0.1X
            Array: Double unsafe:false                    3558 / 3752          4.4         226.2       0.0X
            Map of string->Double unsafe:false            2806 / 2933          5.6         178.4       0.1X
      ```
      
      Author: Sandeep Singh <sandeep@techaddict.me>
      Author: Sandeep Singh <sandeep@origamilogic.com>
      
      Closes #12913 from techaddict/SPARK-928.
      bc167a2a
  24. Oct 18, 2016
    • Yu Peng's avatar
      [SPARK-17711] Compress rolled executor log · 231f39e3
      Yu Peng authored
      ## What changes were proposed in this pull request?
      
      This PR adds support for executor log compression.
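      A minimal sketch of turning compression on next to an existing rolling strategy, assuming a key such as `spark.executor.logs.rolling.enableCompression` (an assumption; verify against configuration.md):
      
      ```
      import org.apache.spark.SparkConf
      
      // Sketch: roll executor logs by time and compress the rolled-over files.
      // Key names are assumptions based on this change.
      val conf = new SparkConf()
        .set("spark.executor.logs.rolling.strategy", "time")
        .set("spark.executor.logs.rolling.enableCompression", "true")
      ```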
      
      ## How was this patch tested?
      
      Unit tests
      
      cc: yhuai tdas mengxr
      
      Author: Yu Peng <loneknightpy@gmail.com>
      
      Closes #15285 from loneknightpy/compress-executor-log.
      231f39e3
  25. Oct 16, 2016
  26. Oct 15, 2016
    • Zhan Zhang's avatar
      [SPARK-17637][SCHEDULER] Packed scheduling for Spark tasks across executors · ed146334
      Zhan Zhang authored
      ## What changes were proposed in this pull request?
      
      Restructure the code and implement two new task assigners.
      PackedAssigner: tries to allocate tasks to the executors with the fewest available cores, so that Spark can release reserved executors when dynamic allocation is enabled.
      
      BalancedAssigner: tries to allocate tasks to the executors with the most available cores, in order to balance the workload across all executors.
      
      By default, the original round-robin assigner is used.
      
      We tested a pipeline, and the new PackedAssigner saves around 45% of reserved CPU and memory with dynamic allocation enabled.
      
      ## How was this patch tested?
      
      Both unit tests in TaskSchedulerImplSuite and manual tests in a production pipeline.
      
      Author: Zhan Zhang <zhanzhang@fb.com>
      
      Closes #15218 from zhzhan/packed-scheduler.
      ed146334
  27. Oct 12, 2016
    • Imran Rashid's avatar
      [SPARK-17675][CORE] Expand Blacklist for TaskSets · 9ce7d3e5
      Imran Rashid authored
      ## What changes were proposed in this pull request?
      
      This is a step along the way to SPARK-8425.
      
      To enable incremental review, the first step proposed here is to expand the blacklisting within tasksets. In particular, this will enable blacklisting for
      * (task, executor) pairs (this already exists via an undocumented config)
      * (task, node)
      * (taskset, executor)
      * (taskset, node)
      
      Adding (task, node) is critical to making Spark fault-tolerant to one bad disk in a cluster, without requiring careful tuning of "spark.task.maxFailures". The other additions are also important to avoid many misleading task failures and long scheduling delays when there is one bad node on a large cluster.
      
      Note that some of the code changes here aren't really required for just this -- they put pieces in place for SPARK-8425 even though they are not used yet (eg. the `BlacklistTracker` helper is a little out of place, `TaskSetBlacklist` holds onto a little more info than it needs to for just this change, and `ExecutorFailuresInTaskSet` is more complex than it needs to be).
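      A minimal sketch of the per-taskset knobs, assuming keys such as the ones below (assumptions; configuration.md is authoritative):
      
      ```
      import org.apache.spark.SparkConf
      
      // Sketch: tighten per-taskset blacklisting so a single bad disk does not
      // require tuning spark.task.maxFailures. Key names are assumptions.
      val conf = new SparkConf()
        .set("spark.blacklist.enabled", "true")
        .set("spark.blacklist.task.maxTaskAttemptsPerExecutor", "1")
        .set("spark.blacklist.task.maxTaskAttemptsPerNode", "2")
      ```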
      
      ## How was this patch tested?
      
      Added unit tests, ran tests via Jenkins.
      
      Author: Imran Rashid <irashid@cloudera.com>
      Author: mwws <wei.mao@intel.com>
      
      Closes #15249 from squito/taskset_blacklist_only.
      9ce7d3e5
  28. Sep 21, 2016
    • Marcelo Vanzin's avatar
      [SPARK-4563][CORE] Allow driver to advertise a different network address. · 2cd1bfa4
      Marcelo Vanzin authored
      The goal of this feature is to allow the Spark driver to run in an
      isolated environment, such as a docker container, and be able to use
      the host's port forwarding mechanism to be able to accept connections
      from the outside world.
      
      The change is restricted to the driver: there is no support for achieving
      the same thing on executors (or the YARN AM for that matter). Those still
      need full access to the outside world so that, for example, connections
      can be made to an executor's block manager.
      
      The core of the change is simple: add a new configuration that specifies
      the address the driver should bind to, which can be different from the address
      it advertises to executors (spark.driver.host). Everything else is plumbing
      the new configuration where it's needed.
      
      To use the feature, the host starting the container needs to set up the
      driver's port range to fall into a range that is being forwarded; this
      required a special, driver-only configuration for the block manager port,
      which falls back to the existing spark.blockManager.port when
      not set. This way, users can modify the driver settings without affecting
      the executors; it would theoretically be nice to also have different
      retry counts for driver and executors, but given that docker (at least)
      allows forwarding port ranges, we can probably live without that for now.
      
      Because of the nature of the feature it's kinda hard to add unit tests;
      I just added a simple one to make sure the configuration works.
      
      This was tested with a docker image running spark-shell with the following
      command:
      
       docker blah blah blah \
         -p 38000-38100:38000-38100 \
         [image] \
         spark-shell \
           --num-executors 3 \
           --conf spark.shuffle.service.enabled=false \
           --conf spark.dynamicAllocation.enabled=false \
           --conf spark.driver.host=[host's address] \
           --conf spark.driver.port=38000 \
           --conf spark.driver.blockManager.port=38020 \
           --conf spark.ui.port=38040
      
      Running on YARN; verified the driver works, executors start up and listen
      on ephemeral ports (instead of using the driver's config), and that caching
      and shuffling (without the shuffle service) works. Clicked through the UI
      to make sure all pages (including executor thread dumps) worked. Also tested
      apps without docker, and ran unit tests.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #15120 from vanzin/SPARK-4563.
      2cd1bfa4
  29. Sep 17, 2016
  30. Sep 14, 2016
  31. Sep 08, 2016
    • Gurvinder Singh's avatar
      [SPARK-15487][WEB UI] Spark Master UI to reverse proxy Application and Workers UI · 92ce8d48
      Gurvinder Singh authored
      ## What changes were proposed in this pull request?
      
      This pull request adds the functionality to access worker and application UIs through the master UI itself, which helps in accessing the Spark UI when running a Spark cluster in closed networks, e.g. Kubernetes. The cluster admin needs to expose only the Spark master UI; the rest of the UIs can remain in the private network, and the master UI will reverse-proxy connection requests to the corresponding resource. It adds the paths for worker/application UIs as
      
      WorkerUI: <http/https>://master-publicIP:<port>/target/workerID/
      ApplicationUI: <http/https>://master-publicIP:<port>/target/appID/
      
      This makes it easy for users to protect access to the Spark master cluster by putting a reverse proxy, e.g. https://github.com/bitly/oauth2_proxy, in front of it.
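      A minimal sketch of enabling the proxy on the master, assuming keys such as `spark.ui.reverseProxy` and `spark.ui.reverseProxyUrl` (assumptions; the example URL is hypothetical):
      
      ```
      import org.apache.spark.SparkConf
      
      // Sketch: proxy worker/application UIs through the master UI.
      // Key names and the URL below are assumptions based on this change.
      val conf = new SparkConf()
        .set("spark.ui.reverseProxy", "true")
        .set("spark.ui.reverseProxyUrl", "https://spark-master.example.com")
      ```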
      
      ## How was this patch tested?
      
      The functionality has been tested manually, and there is also a unit test for access to the worker UI through the reverse proxy address.
      
      pwendell bomeng BryanCutler can you please review it, thanks.
      
      Author: Gurvinder Singh <gurvinder.singh@uninett.no>
      
      Closes #13950 from gurvindersingh/rproxy.
      92ce8d48
  32. Aug 31, 2016
    • Jeff Zhang's avatar
      [SPARK-17178][SPARKR][SPARKSUBMIT] Allow to set sparkr shell command through --conf · fa634793
      Jeff Zhang authored
      ## What changes were proposed in this pull request?
      
      Allow user to set sparkr shell command through --conf spark.r.shell.command
      
      ## How was this patch tested?
      
      Unit test is added and also verify it manually through
      ```
      bin/sparkr --master yarn-client --conf spark.r.shell.command=/usr/local/bin/R
      ```
      
      Author: Jeff Zhang <zjffdu@apache.org>
      
      Closes #14744 from zjffdu/SPARK-17178.
      fa634793
  33. Aug 30, 2016
    • Ferdinand Xu's avatar
      [SPARK-5682][CORE] Add encrypted shuffle in spark · 4b4e329e
      Ferdinand Xu authored
      This patch uses the Apache Commons Crypto library to enable shuffle encryption support.
      
      Author: Ferdinand Xu <cheng.a.xu@intel.com>
      Author: kellyzly <kellyzly@126.com>
      
      Closes #8880 from winningsix/SPARK-10771.
      4b4e329e
  34. Aug 24, 2016
  35. Aug 21, 2016
    • wm624@hotmail.com's avatar
      [SPARK-17002][CORE] Document that spark.ssl.protocol is required for SSL · e328f577
      wm624@hotmail.com authored
      ## What changes were proposed in this pull request?
      
      Setting `spark.ssl.enabled=true` but failing to set `spark.ssl.protocol` will fail and throw a meaningless exception; `spark.ssl.protocol` is required when `spark.ssl.enabled` is true.
      
      Improvement: require `spark.ssl.protocol` when initializing the SSLContext, and otherwise throw an exception indicating this.
      
      Remove the OrElse("default").
      
      Document this requirement in configuration.md
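      A minimal sketch of a valid combination after this change (the protocol value is illustrative):
      
      ```
      import org.apache.spark.SparkConf
      
      // Sketch: with this change, enabling SSL without a protocol fails fast with a
      // clear message, so both keys must be set together.
      val conf = new SparkConf()
        .set("spark.ssl.enabled", "true")
        .set("spark.ssl.protocol", "TLSv1.2")
      ```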
      
      ## How was this patch tested?
      
      Manual tests:
      Build document and check document
      
      Configure `spark.ssl.enabled` only; it throws the exception below:
      6/08/16 16:04:37 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(mwang); groups with view permissions: Set(); users  with modify permissions: Set(mwang); groups with modify permissions: Set()
      Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: spark.ssl.protocol is required when enabling SSL connections.
      	at scala.Predef$.require(Predef.scala:224)
      	at org.apache.spark.SecurityManager.<init>(SecurityManager.scala:285)
      	at org.apache.spark.deploy.master.Master$.startRpcEnvAndEndpoint(Master.scala:1026)
      	at org.apache.spark.deploy.master.Master$.main(Master.scala:1011)
      	at org.apache.spark.deploy.master.Master.main(Master.scala)
      
      Configure both `spark.ssl.enabled` and `spark.ssl.protocol`:
      it works fine.
      
      Author: wm624@hotmail.com <wm624@hotmail.com>
      
      Closes #14674 from wangmiao1981/ssl.
      e328f577
  36. Aug 12, 2016
    • WeichenXu's avatar
      [DOC] add config option spark.ui.enabled into document · 91f2735a
      WeichenXu authored
      ## What changes were proposed in this pull request?
      
      The configuration doc is missing the config option `spark.ui.enabled` (default value: `true`).
      I think this option is important because in many cases we would like to turn it off,
      so I am adding it.
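      A minimal sketch of turning the UI off:
      
      ```
      import org.apache.spark.SparkConf
      
      // Sketch: disable the web UI entirely, the case the added documentation covers.
      val conf = new SparkConf()
        .set("spark.ui.enabled", "false")
      ```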
      
      ## How was this patch tested?
      
      N/A
      
      Author: WeichenXu <WeichenXu123@outlook.com>
      
      Closes #14604 from WeichenXu123/add_doc_param_spark_ui_enabled.
      91f2735a