[SPARK-14836][YARN] Zip all the jars before uploading to distributed cache
## What changes were proposed in this pull request?

(Copied from JIRA.) Currently, if neither `spark.yarn.jars` nor `spark.yarn.archive` is set (the default), the Spark on YARN code uploads every jar in the jars folder to the distributed cache separately. This is time consuming and produces very verbose output. Instead of uploading the jars one by one, this change zips all the jars first and then puts the single archive into the distributed cache. This should significantly reduce application startup time.

## How was this patch tested?

Unit tests and a local integration test were run. Verified with SparkPi in both cluster and client mode.

Author: jerryshao <sshao@hortonworks.com>

Closes #12597 from jerryshao/SPARK-14836.
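
The zipping step described above can be sketched roughly as follows. This is an illustrative sketch using only the JDK's `java.util.zip` classes, not the code from the patch itself: the `JarArchiver` object, the `zipJars` helper, and the `spark_libs_` temp-file prefix are hypothetical names, and the choice to store entries uncompressed is an assumption based on jars already being compressed.

```scala
import java.io.{File, FileOutputStream}
import java.nio.file.Files
import java.util.zip.{ZipEntry, ZipOutputStream}

object JarArchiver {
  // Hypothetical helper: bundles every readable *.jar found directly under
  // `jarsDir` into one temporary zip file and returns that file.
  def zipJars(jarsDir: File): File = {
    val archive = File.createTempFile("spark_libs_", ".zip")
    val out = new ZipOutputStream(new FileOutputStream(archive))
    try {
      // Assumption: jars are already compressed, so skip recompression.
      out.setLevel(0)
      // listFiles() returns null if the path is not a readable directory.
      val files = Option(jarsDir.listFiles()).getOrElse(Array.empty[File])
      files
        .filter(f => f.isFile && f.canRead && f.getName.toLowerCase.endsWith(".jar"))
        .sortBy(_.getName)
        .foreach { f =>
          // One flat entry per jar, named after the file.
          out.putNextEntry(new ZipEntry(f.getName))
          Files.copy(f.toPath, out)
          out.closeEntry()
        }
    } finally {
      out.close()
    }
    archive
  }
}
```

The resulting single archive is what would then be handed to the distributed cache, in place of one upload per jar.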
Showing
- yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala (18 additions, 3 deletions)
- yarn/src/test/scala/org/apache/spark/deploy/yarn/ClientSuite.scala (5 additions, 6 deletions)