- Aug 29, 2013
Matei Zaharia authored
Matei Zaharia authored
This commit makes Spark invocation saner by using an assembly JAR to find all of Spark's dependencies instead of adding all the JARs in lib_managed. It also packages the examples into an assembly and uses that as SPARK_EXAMPLES_JAR. Finally, it replaces the old "run" script with two better-named scripts: "run-examples" for examples, and "spark-class" for Spark internal classes (e.g. REPL, master, etc).

This is also designed to minimize the confusion people have in trying to use "run" to run their own classes; it's not meant to do that, but now at least if they look at it, they can modify run-examples to do a decent job for them. As part of this, Bagel's examples are also now properly moved to the examples package instead of bagel.
- Aug 28, 2013
Andre Schumacher authored
Josh Rosen authored
This addresses SPARK-885, a usability issue where PySpark's Java gateway process would be killed if the user hit ctrl-c. Note that SIGINT still won't cancel the running job. This fix is based on http://stackoverflow.com/questions/5045771
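A minimal sketch of the kind of fix this describes, assuming the gateway is started through Python's subprocess module (the command and helper below are illustrative, not the actual PySpark launch code): putting the child process in its own session keeps a ctrl-c typed in the interactive shell from reaching it.

    import os
    import signal
    import subprocess

    def launch_gateway(command):
        """Start a child that survives ctrl-c typed in the parent's terminal."""
        def preexec():
            os.setsid()  # new session: the shell's SIGINT no longer reaches the child
            signal.signal(signal.SIGINT, signal.SIG_IGN)  # extra safety
        return subprocess.Popen(command, preexec_fn=preexec)

    # Hypothetical usage:
    # gateway = launch_gateway(["java", "-cp", "spark-assembly.jar", "py4j.GatewayServer"])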
Andre Schumacher authored
- Aug 21, 2013
Andre Schumacher authored
- Aug 16, 2013
Andre Schumacher authored
Implements SPARK-878 for PySpark: zip and egg files can be added to the context and are passed down to the workers, which add them to their sys.path.
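A rough sketch of the worker-side half of this, assuming the shipped files arrive as local paths (names here are illustrative): zip and egg archives become importable simply by putting them on sys.path, since Python's import machinery can read from zip files directly.

    import os
    import sys

    def add_shipped_archives(paths):
        """Make shipped .zip and .egg files importable on a worker (sketch)."""
        for path in paths:
            if path.endswith((".zip", ".egg")) and os.path.exists(path):
                if path not in sys.path:
                    sys.path.insert(1, path)  # after '' so local modules still take precedence

    # add_shipped_archives(["/tmp/spark-files/deps.zip", "/tmp/spark-files/tool.egg"])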
- Aug 14, 2013
Josh Rosen authored
- Aug 12, 2013
Andre Schumacher authored
Now ADD_FILES uses a comma as the file name separator.
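For illustration only, the parsing this implies if ADD_FILES is read from the environment (a sketch, not the shipped code):

    import os

    # ADD_FILES is a comma-separated list, e.g. ADD_FILES=/path/a.py,/path/b.zip
    add_files = os.environ.get("ADD_FILES")
    file_list = add_files.split(",") if add_files else []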
- Aug 11, 2013
stayhf authored
- Aug 10, 2013
stayhf authored
- Aug 01, 2013
Matei Zaharia authored
- Jul 30, 2013
Josh Rosen authored
This fixes SPARK-832, an issue where PySpark would not work when the master and workers used different SPARK_HOME paths. This change may potentially break code that relied on the master's PYTHONPATH being used on workers. To have custom PYTHONPATH additions used on the workers, users should set a custom PYTHONPATH in spark-env.sh rather than setting it in the shell.
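A sketch of the idea, assuming each worker knows its own SPARK_HOME (the directory layout and variable names below are assumptions): derive the Python path from the local installation instead of reusing whatever the master exported.

    import os

    def worker_pythonpath(spark_home, user_additions=""):
        """Build PYTHONPATH from the worker's own SPARK_HOME (illustrative layout)."""
        paths = [os.path.join(spark_home, "python")]
        if user_additions:  # e.g. a custom PYTHONPATH set in spark-env.sh
            paths.append(user_additions)
        return os.pathsep.join(paths)

    # os.environ["PYTHONPATH"] = worker_pythonpath(os.environ["SPARK_HOME"],
    #                                              os.environ.get("PYTHONPATH", ""))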
- Jul 29, 2013
Matei Zaharia authored
Batch input records for more efficient NumPy computations.
Matei Zaharia authored
One unfortunate consequence of this fix is that we materialize any collections that are given to us as generators, but this seems necessary to get reasonable behavior on small collections. We could add a batchSize parameter later to bypass auto-computation of batch size if this becomes a problem (e.g. if users really want to parallelize big generators nicely).
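As a sketch of the batching behavior these two commits describe (the helper names and the batch-size heuristic are assumptions, not the actual implementation): records are grouped into fixed-size batches before serialization, and a generator has to be materialized into a list first so its length can be used to choose a batch size.

    def batched(iterable, batch_size):
        """Yield lists of up to batch_size items (illustrative batching helper)."""
        batch = []
        for item in iterable:
            batch.append(item)
            if len(batch) == batch_size:
                yield batch
                batch = []
        if batch:
            yield batch

    def parallelize_sketch(data, num_slices):
        data = list(data)  # materializes generators so a batch size can be computed
        batch_size = max(1, len(data) // (10 * num_slices))  # assumed heuristic
        return list(batched(data, batch_size))

    # parallelize_sketch((x * x for x in range(100)), num_slices=4)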
Matei Zaharia authored
Matei Zaharia authored
Matei Zaharia authored
Matei Zaharia authored
- Jul 27, 2013
Matei Zaharia authored
- Jul 16, 2013
Matei Zaharia authored
- Jul 01, 2013
root authored
Improves debuggability by letting "print" statements show up in the executor's stderr.
Conflicts: core/src/main/scala/spark/api/python/PythonRDD.scala
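One plausible mechanism behind this, shown purely as an assumption (the real change is on the JVM side in PythonRDD.scala): launch the Python worker with its stdout pointed at the executor's stderr, so print output ends up in the executor log.

    import subprocess
    import sys

    # "worker.py" is a stand-in for the worker entry point.
    worker = subprocess.Popen(
        [sys.executable, "worker.py"],
        stdin=subprocess.PIPE,
        stdout=sys.stderr,  # anything the worker print()s lands in the executor's stderr
    )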
- Jun 21, 2013
Jey Kottalam authored
Jey Kottalam authored
Jey Kottalam authored
Jey Kottalam authored
Jey Kottalam authored
- Apr 02, 2013
Jey Kottalam authored
Jey Kottalam authored
- Feb 24, 2013
Josh Rosen authored
- Feb 09, 2013
Mark Hamstra authored
- Feb 03, 2013
Josh Rosen authored
Josh Rosen authored
Josh Rosen authored
- Feb 01, 2013
Josh Rosen authored
Josh Rosen authored
The problem was that the gateway was being initialized whenever the pyspark.context module was loaded. The fix uses lazy initialization that occurs only when SparkContext instances are actually constructed. I also made the gateway and jvm variables private. This change results in a ~3-4x performance improvement when running the PySpark unit tests.
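A minimal sketch of the lazy-initialization pattern described here (class and attribute names are illustrative): the expensive gateway launch moves from module import time into the constructor, and the handles are kept on underscore-prefixed attributes.

    class LazyContext:
        _gateway = None  # shared handle, created on first construction rather than at import

        def __init__(self, master):
            self.master = master
            if LazyContext._gateway is None:
                LazyContext._gateway = self._launch_gateway()
            self._jvm = LazyContext._gateway

        @staticmethod
        def _launch_gateway():
            # Stand-in for the expensive JVM gateway startup.
            return object()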
Josh Rosen authored
- Jan 31, 2013
Patrick Wendell authored
This patch alters the Python <-> executor protocol to pass on exception data when exceptions occur in user Python code.
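Roughly what passing exception data over the worker protocol can look like, written as a hedged sketch (the sentinel value and length-prefixed framing are assumptions, not the actual wire format):

    import struct
    import traceback

    PYTHON_EXCEPTION_MARKER = -2  # assumed sentinel telling the JVM an error record follows

    def write_int(stream, value):
        stream.write(struct.pack("!i", value))  # 4-byte big-endian int framing

    def run_and_report(task, out_stream):
        """Run a task; on failure, ship the traceback instead of more data (sketch)."""
        try:
            for record in task():
                data = record if isinstance(record, bytes) else str(record).encode("utf-8")
                write_int(out_stream, len(data))
                out_stream.write(data)
        except Exception:
            err = traceback.format_exc().encode("utf-8")
            write_int(out_stream, PYTHON_EXCEPTION_MARKER)
            write_int(out_stream, len(err))
            out_stream.write(err)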
- Jan 30, 2013
Patrick Wendell authored
Also adds a line to the docs explaining how to use it.
- Jan 25, 2013
Stephen Haberman authored