  1. Oct 22, 2013
    • Pass self to SparkContext._ensure_initialized. · 317a9eb1
      Ewen Cheslack-Postava authored
      The constructor for SparkContext should pass in self so that we track
      the current context and produce errors if another one is created. Add
      a doctest to make sure creating multiple contexts triggers the
      exception.
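
      For illustration, a minimal sketch (not from the commit) of the
      behavior this guard enforces, assuming the guard raises ValueError
      as it does in later PySpark releases:

      from pyspark import SparkContext

      sc = SparkContext("local", "first")
      try:
          SparkContext("local", "second")  # a second active context is rejected
      except ValueError as err:
          print(err)  # e.g. "Cannot run multiple SparkContexts at once"
      finally:
          sc.stop()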
    • Add classmethod to SparkContext to set system properties. · 56d230e6
      Ewen Cheslack-Postava authored
      Add a new classmethod to SparkContext to set system properties, as is
      possible in Scala/Java. Unlike the Java/Scala implementations, there's
      no access to System until the JVM bridge is created. Since
      SparkContext handles that, move the initialization of the JVM
      connection to a separate classmethod that can safely be called
      repeatedly as long as the same instance (or no instance) is provided.
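
      A minimal usage sketch (not from the commit); setSystemProperty is
      the classmethod this change adds, and it must be called before the
      SparkContext is constructed:

      from pyspark import SparkContext

      # Sets a JVM system property over the Py4J bridge; this is safe even
      # though no context exists yet, because the JVM connection is
      # initialized by the separate classmethod described above.
      SparkContext.setSystemProperty("spark.executor.memory", "2g")
      sc = SparkContext("local", "props-example")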
  2. Oct 19, 2013
    • Add an add() method to pyspark accumulators. · 7eaa56de
      Ewen Cheslack-Postava authored
      Add a regular method for adding a term to accumulators in
      pyspark. Currently, if you have a non-global accumulator, adding to it
      is awkward. The += operator can't be used for non-global accumulators
      captured via closure because it involves an assignment. The only way
      to do it is to call __iadd__ directly.
      
      Adding this method lets you write code like this:
      
      from pyspark import SparkContext

      def main():
          sc = SparkContext()
          accum = sc.accumulator(0)

          rdd = sc.parallelize([1, 2, 3])
          def f(x):
              accum.add(x)  # add to the accumulator from inside the closure
          rdd.foreach(f)
          print(accum.value)  # prints 6 on the driver

      if __name__ == "__main__":
          main()
      
      where using accum += x instead would have caused UnboundLocalError
      exceptions on the workers. Without this method, it has to be written
      as accum.__iadd__(x).
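
      The failure mode is plain Python scoping, shown here without Spark:
      an augmented assignment inside a function makes the name local, so
      the read before the write raises UnboundLocalError:

      accum = 0

      def f(x):
          accum += x  # the assignment makes accum local to f

      try:
          f(1)
      except UnboundLocalError as err:
          print(err)  # local variable 'accum' referenced before assignment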
  3. Oct 04, 2013
    • Fixing SPARK-602: PythonPartitioner · c84946fe
      Andre Schumacher authored
      Currently, PythonPartitioner determines the partition ID by hashing a
      byte-array representation of PySpark's key. This PR lets
      PythonPartitioner use the actual partition ID, which is required,
      e.g., for sorting via PySpark.
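
      A hypothetical PySpark-side illustration (assumes a build that
      includes this fix and provides sortByKey): a sort only works if
      records land in the partitions the range partitioner chose, which
      requires the key's real partition ID rather than a hash of its
      serialized bytes.

      from pyspark import SparkContext

      sc = SparkContext("local", "sort-example")
      pairs = sc.parallelize([(3, "c"), (1, "a"), (2, "b")])
      print(pairs.sortByKey().collect())  # [(1, 'a'), (2, 'b'), (3, 'c')]
      sc.stop()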
  4. Aug 29, 2013
    • ab0e625d
      Matei Zaharia authored
    • Change build and run instructions to use assemblies · 53cd50c0
      Matei Zaharia authored
      This commit makes Spark invocation saner by using an assembly JAR to
      find all of Spark's dependencies instead of adding all the JARs in
      lib_managed. It also packages the examples into an assembly and uses
      that as SPARK_EXAMPLES_JAR. Finally, it replaces the old "run" script
      with two better-named scripts: "run-examples" for examples, and
      "spark-class" for Spark internal classes (e.g., the REPL, master,
      etc.). This is also designed to minimize the confusion people have
      when trying to use "run" to launch their own classes; it's not meant
      for that, but now at least they can look at run-examples and modify
      it to do a decent job for them.
      
      As part of this, Bagel's examples are also now properly moved to the
      examples package instead of the bagel package.
  5. Jul 30, 2013
    • Do not inherit master's PYTHONPATH on workers. · b9573263
      Josh Rosen authored
      This fixes SPARK-832, an issue where PySpark
      would not work when the master and workers used
      different SPARK_HOME paths.
      
      This change may break code that relied on the
      master's PYTHONPATH being used on the workers.
      To have custom PYTHONPATH additions used on the
      workers, users should set a custom PYTHONPATH in
      spark-env.sh rather than setting it in the shell.