  1. Jan 29, 2014
    • Erik Selin's avatar
      Merge pull request #494 from tyro89/worker_registration_issue · 0ff38c22
      Erik Selin authored
      Issue with failed worker registrations
      
      I've been going through the Spark source after having some odd issues with workers dying and not coming back. After some digging (I'm very new to Scala and Spark) I believe I've found a worker registration issue. It looks to me like a failed registration follows the same code path as a successful registration, which ends up with workers believing they are connected (since they received a `RegisteredWorker` event) even though they are not registered on the Master.
      
      This is a quick fix that I hope addresses this issue (assuming I didn't completely misread the code and I'm about to look like a silly person :P)
      
      I'm opening this PR now to start a chat with you guys while I do some more testing on my side :)
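The intended control flow of the fix can be sketched as follows. This is a hypothetical Python illustration, not Spark's actual Scala code; all names (`Master`, `register_worker`, the reply tuples) are illustrative only.

```python
# Sketch of the fix: a failed registration must not follow the success
# path (persist the worker, reply RegisteredWorker, run schedule), but
# instead reply RegisterWorkerFailed so the worker knows it is not
# registered. Names are illustrative, not Spark's actual API.

class Master:
    def __init__(self):
        self.workers = {}  # address -> worker id

    def register_worker(self, worker_id, address):
        if address in self.workers:
            # Before the fix, this branch still replied RegisteredWorker,
            # so the worker believed it was connected to the Master.
            return ("RegisterWorkerFailed",
                    "Attempted to re-register worker at same address: " + address)
        self.workers[address] = worker_id
        return ("RegisteredWorker", worker_id)
```

A first registration succeeds; a second attempt at the same address now gets an explicit failure instead of a spurious success.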
      
      Author: Erik Selin <erik.selin@jadedpixel.com>
      
      == Merge branch commits ==
      
      commit 973012f8a2dcf1ac1e68a69a2086a1b9a50f401b
      Author: Erik Selin <erik.selin@jadedpixel.com>
      Date:   Tue Jan 28 23:36:12 2014 -0500
      
          break logwarning into two lines to respect line character limit.
      
      commit e3754dc5b94730f37e9806974340e6dd93400f85
      Author: Erik Selin <erik.selin@jadedpixel.com>
      Date:   Tue Jan 28 21:16:21 2014 -0500
      
          add log warning when worker registration fails due to attempt to re-register on same address.
      
      commit 14baca241fa7823e1213cfc12a3ff2a9b865b1ed
      Author: Erik Selin <erik.selin@jadedpixel.com>
      Date:   Wed Jan 22 21:23:26 2014 -0500
      
          address code style comment
      
      commit 71c0d7e6f59cd378d4e24994c21140ab893954ee
      Author: Erik Selin <erik.selin@jadedpixel.com>
      Date:   Wed Jan 22 16:01:42 2014 -0500
      
          Make a failed registration not persist, not send a `RegisteredWorker` event and not run `schedule` but rather send a `RegisterWorkerFailed` message to the worker attempting to register.
      0ff38c22
  2. Jan 28, 2014
    • Tathagata Das's avatar
      Merge pull request #497 from tdas/docs-update · 79302096
      Tathagata Das authored
      Updated Spark Streaming Programming Guide
      
      Here is the updated version of the Spark Streaming Programming Guide. This is still a work in progress, but the major changes are in place. So feedback is most welcome.
      
      In general, I have tried to make the guide easier to understand even if the reader does not know much about Spark. The updated website is hosted here -
      
      http://www.eecs.berkeley.edu/~tdas/spark_docs/streaming-programming-guide.html
      
      The major changes are:
      - Overview illustrates the use cases of Spark Streaming - various input sources and various output sources
      - An example right after the overview to quickly give an idea of what a Spark Streaming program looks like
      - Made Java API and examples a first class citizen like Scala by using tabs to show both Scala and Java examples (similar to AMPCamp tutorial's code tabs)
      - Highlighted the DStream operations updateStateByKey and transform because of their powerful nature
      - Updated driver node failure recovery text to highlight automatic recovery in Spark standalone mode
      - Added information about linking and using the external input sources like Kafka and Flume
      - In general, reorganized the sections to better separate the Basics section from the more advanced sections like Tuning and Recovery.
      
      Todos:
      - Links to the docs of external Kafka, Flume, etc
      - Illustrate window operation with figure as well as example.
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      == Merge branch commits ==
      
      commit 18ff10556570b39d672beeb0a32075215cfcc944
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      Date:   Tue Jan 28 21:49:30 2014 -0800
      
          Fixed a lot of broken links.
      
      commit 34a5a6008dac2e107624c7ff0db0824ee5bae45f
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      Date:   Tue Jan 28 18:02:28 2014 -0800
      
          Updated github url to use SPARK_GITHUB_URL variable.
      
      commit f338a60ae8069e0a382d2cb170227e5757cc0b7a
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      Date:   Mon Jan 27 22:42:42 2014 -0800
      
          More updates based on Patrick and Harvey's comments.
      
      commit 89a81ff25726bf6d26163e0dd938290a79582c0f
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      Date:   Mon Jan 27 13:08:34 2014 -0800
      
          Updated docs based on Patrick's PR comments.
      
      commit d5b6196b532b5746e019b959a79ea0cc013a8fc3
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      Date:   Sun Jan 26 20:15:58 2014 -0800
      
          Added spark.streaming.unpersist config and info on StreamingListener interface.
      
      commit e3dcb46ab83d7071f611d9b5008ba6bc16c9f951
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      Date:   Sun Jan 26 18:41:12 2014 -0800
      
          Fixed docs on StreamingContext.getOrCreate.
      
      commit 6c29524639463f11eec721e4d17a9d7159f2944b
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      Date:   Thu Jan 23 18:49:39 2014 -0800
      
          Added example and figure for window operations, and links to Kafka and Flume API docs.
      
      commit f06b964a51bb3b21cde2ff8bdea7d9785f6ce3a9
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      Date:   Wed Jan 22 22:49:12 2014 -0800
      
          Fixed missing endhighlight tag in the MLlib guide.
      
      commit 036a7d46187ea3f2a0fb8349ef78f10d6c0b43a9
      Merge: eab351d a1cd1851
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      Date:   Wed Jan 22 22:17:42 2014 -0800
      
          Merge remote-tracking branch 'apache/master' into docs-update
      
      commit eab351d05c0baef1d4b549e1581310087158d78d
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      Date:   Wed Jan 22 22:17:15 2014 -0800
      
          Update Spark Streaming Programming Guide.
      79302096
    • Josh Rosen's avatar
      Merge pull request #523 from JoshRosen/SPARK-1043 · f8c742ce
      Josh Rosen authored
      Switch from MUTF8 to UTF8 in PySpark serializers.
      
      This fixes SPARK-1043, a bug introduced in 0.9.0 where PySpark couldn't serialize strings > 64kB.
      
      This fix was written by @tyro89 and @bouk in #512. This commit squashes and rebases their pull request in order to fix some merge conflicts.
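The 64 kB limit comes from the framing: Java's modified UTF-8 (`DataOutputStream.writeUTF`) prefixes the payload with a 2-byte unsigned length, so any string whose encoded form exceeds 65535 bytes cannot be written, while plain UTF-8 framed with a 4-byte length has no such cap. A minimal Python sketch of the framing difference (the MUTF-8 byte-level encoding details are deliberately elided):

```python
import struct

def write_mutf8_style(s):
    # Java's writeUTF uses a 2-byte unsigned length prefix, so payloads
    # over 65535 bytes cannot be represented -- the SPARK-1043 limit.
    data = s.encode("utf-8")  # ignoring MUTF-8's NUL/surrogate quirks
    if len(data) > 0xFFFF:
        raise ValueError("string too long for 2-byte length prefix")
    return struct.pack(">H", len(data)) + data

def write_utf8(s):
    # Plain UTF-8 with a 4-byte length prefix removes the 64 kB cap.
    data = s.encode("utf-8")
    return struct.pack(">i", len(data)) + data
```

A 70000-character string fails under the 2-byte framing but serializes fine under the 4-byte framing.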
      f8c742ce
    • Josh Rosen's avatar
      Switch from MUTF8 to UTF8 in PySpark serializers. · 1381fc72
      Josh Rosen authored
      This fixes SPARK-1043, a bug introduced in 0.9.0
      where PySpark couldn't serialize strings > 64kB.
      
      This fix was written by @tyro89 and @bouk in #512.
      This commit squashes and rebases their pull request
      in order to fix some merge conflicts.
      1381fc72
  3. Jan 27, 2014
    • Reynold Xin's avatar
      Merge pull request #466 from liyinan926/file-overwrite-new · 84670f27
      Reynold Xin authored
      Allow files added through SparkContext.addFile() to be overwritten
      
      This is useful for the cases when a file needs to be refreshed and downloaded by the executors periodically. For example, a possible use case is: the driver periodically renews a Hadoop delegation token and writes it to a token file. The token file needs to be downloaded by the executors whenever it gets renewed. However, the current implementation throws an exception when the target file exists and its contents do not match those of the new source. This PR adds an option to allow files to be overwritten to support use cases similar to the above.
      84670f27
    • Reynold Xin's avatar
      Merge pull request #516 from sarutak/master · 3d5c03e2
      Reynold Xin authored
      modified SparkPluginBuild.scala to use https protocol for accessing gith...
      
      We cannot build Spark behind a proxy even when we execute sbt with the -Dhttp(s).proxyHost -Dhttp(s).proxyPort -Dhttp(s).proxyUser -Dhttp(s).proxyPassword options.
      This is because the git protocol is used to clone junit_xml_listener.git.
      I could build after modifying SparkPluginBuild.scala.
      
      I reported this issue to JIRA.
      https://spark-project.atlassian.net/browse/SPARK-1046
      3d5c03e2
    • Reynold Xin's avatar
      Merge pull request #490 from hsaputra/modify_checkoption_with_isdefined · f16c21e2
      Reynold Xin authored
      Replace the check for None Option with isDefined and isEmpty in Scala code
      
      I propose to replace the Scala check for Option "!= None" with Option.isDefined and "=== None" with Option.isEmpty.
      
      I think using a method call where possible, rather than an operator plus argument, will make the Scala code easier to read and understand.
      
      Passes compile and tests.
      f16c21e2
    • Sean Owen's avatar
      Merge pull request #460 from srowen/RandomInitialALSVectors · f67ce3e2
      Sean Owen authored
      Choose initial user/item vectors uniformly on the unit sphere
      
      ...rather than within the unit square to possibly avoid bias in the initial state and improve convergence.
      
      The current implementation picks the N vector elements uniformly at random from [0,1). This means they all point into one quadrant of the vector space. As N gets just a little large, the vectors tend strongly to point into the "corner", towards (1,1,1,...,1). The vectors are not unit vectors either.
      
      I suggest choosing the elements as Gaussian ~ N(0,1) and normalizing. This gets you uniform random choices on the unit sphere which is more what's of interest here. It has worked a little better for me in the past.
      
      This is pretty minor, but I wanted to warm up by suggesting a few tweaks to ALS.
      Please excuse my Scala; I'm pretty new to it.
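The suggested initialization is easy to sketch: sample each component from N(0,1) and normalize, which gives a direction uniform on the unit sphere because the Gaussian is rotation-invariant. Per the second merge commit below, the merged version additionally keeps all components positive, so this sketch takes absolute values (restricting directions to the positive orthant). A hedged Python illustration, not the actual MLlib ALS code:

```python
import math
import random

def random_unit_factor(n, rng):
    # Gaussian components give a rotation-invariant direction; taking
    # absolute values restricts it to the positive orthant, per the PR
    # discussion ("generate factors with all positive components").
    v = [abs(rng.gauss(0.0, 1.0)) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]
```

Unlike uniform [0,1) sampling, the result is always a unit vector and does not cluster toward the (1,1,...,1) corner as n grows.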
      
      Author: Sean Owen <sowen@cloudera.com>
      
      == Merge branch commits ==
      
      commit 492b13a7469e5a4ed7591ee8e56d8bd7570dfab6
      Author: Sean Owen <sowen@cloudera.com>
      Date:   Mon Jan 27 08:05:25 2014 +0000
      
          Style: spaces around binary operators
      
      commit ce2b5b5a4fefa0356875701f668f01f02ba4d87e
      Author: Sean Owen <sowen@cloudera.com>
      Date:   Sun Jan 19 22:50:03 2014 +0000
      
          Generate factors with all positive components, per discussion in https://github.com/apache/incubator-spark/pull/460
      
      commit b6f7a8a61643a8209e8bc662e8e81f2d15c710c7
      Author: Sean Owen <sowen@cloudera.com>
      Date:   Sat Jan 18 15:54:42 2014 +0000
      
          Choose initial user/item vectors uniformly on the unit sphere rather than within the unit square to possibly avoid bias in the initial state and improve convergence
      f67ce3e2
  4. Jan 26, 2014
  5. Jan 25, 2014
    • Josh Rosen's avatar
      Fix ClassCastException in JavaPairRDD.collectAsMap() (SPARK-1040) · 740e865f
      Josh Rosen authored
      This fixes an issue where collectAsMap() could
      fail when called on a JavaPairRDD that was derived
      by transforming a non-JavaPairRDD.
      
      The root problem was that we were creating the
      JavaPairRDD's ClassTag by casting a
      ClassTag[AnyRef] to a ClassTag[Tuple2[K2, V2]].
      To fix this, I cast a ClassTag[Tuple2[_, _]]
      instead, since this actually produces a ClassTag
      of the appropriate type because ClassTags don't
      capture type parameters:
      
      scala> implicitly[ClassTag[Tuple2[_, _]]] == implicitly[ClassTag[Tuple2[Int, Int]]]
      res8: Boolean = true
      
      scala> implicitly[ClassTag[AnyRef]].asInstanceOf[ClassTag[Tuple2[Int, Int]]] == implicitly[ClassTag[Tuple2[Int, Int]]]
      res9: Boolean = false
      740e865f
    • Josh Rosen's avatar
      Increase JUnit test verbosity under SBT. · 531d9d75
      Josh Rosen authored
      Upgrade junit-interface plugin from 0.9 to 0.10.
      
      I noticed that the JavaAPISuite tests didn't
      appear to display any output locally or under
      Jenkins, making it difficult to know whether they
      were running.  This change increases the verbosity
      to more closely match the ScalaTest tests.
      531d9d75
  6. Jan 23, 2014
  7. Jan 22, 2014