  1. Dec 20, 2013
  2. Dec 19, 2013
  3. Dec 09, 2013
  4. Nov 29, 2013
  5. Nov 26, 2013
  6. Nov 10, 2013
  7. Nov 03, 2013
  8. Oct 22, 2013
    • Pass self to SparkContext._ensure_initialized. · 317a9eb1
      Ewen Cheslack-Postava authored
      The constructor for SparkContext should pass in self so that we track
      the current context and produce errors if another one is created. Add
      a doctest to make sure creating multiple contexts triggers the
      exception.
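      A minimal sketch of the behaviour described above, assuming the
      second context is rejected with a ValueError (the exact exception
      type and message are assumptions, not stated in the commit):

      from pyspark import SparkContext

      sc = SparkContext("local", "first")
      try:
          # A second context in the same process should now raise
          # instead of silently replacing the tracked context.
          sc2 = SparkContext("local", "second")
      except ValueError as err:
          print("second context rejected:", err)
      finally:
          sc.stop()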
    • Add classmethod to SparkContext to set system properties. · 56d230e6
      Ewen Cheslack-Postava authored
      Add a new classmethod to SparkContext to set system properties, as is
      possible in Scala/Java. Unlike the Java/Scala implementations, there's
      no access to System until the JVM bridge is created. Since
      SparkContext handles that, move the initialization of the JVM
      connection to a separate classmethod that can safely be called
      repeatedly as long as the same instance (or no instance) is provided.
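      Usage sketch for the new classmethod; the name setSystemProperty
      matches PySpark's later public API, but it is not spelled out in the
      commit message above, so treat it as an assumption:

      from pyspark import SparkContext

      # Must run before a SparkContext exists; the classmethod brings up
      # the JVM bridge on demand and sets the property through it.
      SparkContext.setSystemProperty("spark.executor.memory", "2g")

      sc = SparkContext("local", "props-example")
      # ... run jobs ...
      sc.stop()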
  9. Oct 19, 2013
    • Add an add() method to pyspark accumulators. · 7eaa56de
      Ewen Cheslack-Postava authored
      Add a regular method for adding a term to accumulators in
      pyspark. Currently if you have a non-global accumulator, adding to it
      is awkward. The += operator can't be used for non-global accumulators
      captured via closure because it involves an assignment. The only way
      to do it is to call __iadd__ directly.
      
      Adding this method lets you write code like this:
      
      from pyspark import SparkContext

      def main():
          sc = SparkContext()
          accum = sc.accumulator(0)

          rdd = sc.parallelize([1, 2, 3])
          def f(x):
              accum.add(x)  # works inside the closure, unlike accum += x
          rdd.foreach(f)
          print(accum.value)
      
      where using accum += x instead would have caused UnboundLocalError
      exceptions in workers. Currently it would have to be written as
      accum.__iadd__(x).
  10. Oct 09, 2013
  11. Oct 07, 2013
  12. Oct 04, 2013
    • Fixing SPARK-602: PythonPartitioner · c84946fe
      Andre Schumacher authored
      Currently PythonPartitioner determines partition ID by hashing a
      byte-array representation of PySpark's key. This PR lets
      PythonPartitioner use the actual partition ID, which is required e.g.
      for sorting via PySpark.
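      An illustrative PySpark snippet (not taken from the commit) of the
      kind of operation this fix enables: sortByKey has to place records
      by the actual key's partition ID rather than a hash of its
      serialized bytes.

      from pyspark import SparkContext

      sc = SparkContext("local", "sort-example")
      pairs = sc.parallelize([(3, "c"), (1, "a"), (2, "b")])
      # Range-partitioned sort; partition IDs come from the real keys.
      print(pairs.sortByKey().collect())  # [(1, 'a'), (2, 'b'), (3, 'c')]
      sc.stop()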
  13. Sep 24, 2013
  14. Sep 08, 2013
  15. Sep 07, 2013
  16. Sep 06, 2013
  17. Sep 02, 2013
  18. Sep 01, 2013
  19. Aug 30, 2013
  20. Aug 29, 2013
    • ab0e625d
      Matei Zaharia authored
    • Change build and run instructions to use assemblies · 53cd50c0
      Matei Zaharia authored
      This commit makes Spark invocation saner by using an assembly JAR to
      find all of Spark's dependencies instead of adding all the JARs in
      lib_managed. It also packages the examples into an assembly and uses
      that as SPARK_EXAMPLES_JAR. Finally, it replaces the old "run" script
      with two better-named scripts: "run-examples" for examples, and
      "spark-class" for Spark internal classes (e.g. REPL, master, etc). This
      is also designed to minimize the confusion people have in trying to use
      "run" to run their own classes; it's not meant to do that, but now at
      least if they look at it, they can modify run-examples to do a decent
      job for them.
      
      As part of this, Bagel's examples are also now properly moved to the
      examples package instead of bagel.
  21. Aug 28, 2013