- Aug 14, 2013
-
-
Josh Rosen authored
-
- Aug 12, 2013
-
-
Andre Schumacher authored
Now ADD_FILES uses a comma as file name separator.
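
A minimal sketch of comma-separated parsing, assuming ADD_FILES is read from the environment. The helper name `parse_add_files` and the whitespace trimming are illustrative assumptions, not taken from the commit.

```python
import os


def parse_add_files(env=None):
    """Return the file names listed in ADD_FILES, split on commas.

    `parse_add_files` is a hypothetical helper; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    raw = env.get("ADD_FILES")
    if not raw:
        return []
    # Drop empty entries and surrounding whitespace (an assumption).
    return [name.strip() for name in raw.split(",") if name.strip()]
```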
-
- Aug 11, 2013
-
-
stayhf authored
-
- Aug 10, 2013
-
-
stayhf authored
-
- Aug 01, 2013
-
-
Matei Zaharia authored
-
- Jul 30, 2013
-
-
Josh Rosen authored
This fixes SPARK-832, an issue where PySpark would not work when the master and workers used different SPARK_HOME paths. This change may break code that relied on the master's PYTHONPATH being used on workers. To have custom PYTHONPATH additions used on the workers, users should set a custom PYTHONPATH in spark-env.sh rather than setting it in the shell.
-
- Jul 29, 2013
-
-
Matei Zaharia authored
batch input records for more efficient NumPy computations
-
Matei Zaharia authored
One unfortunate consequence of this fix is that we materialize any collections that are given to us as generators, but this seems necessary to get reasonable behavior on small collections. We could add a batchSize parameter later to bypass auto-computation of the batch size if this becomes a problem (e.g. if users really want to parallelize big generators nicely).
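
The trade-off described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual PySpark code: the function name `batch_collection`, the formula for the automatic batch size, and the optional `batch_size` override are all hypothetical.

```python
def batch_collection(items, num_slices, batch_size=None):
    """Group items into batches, materializing generators first.

    Hypothetical helper: the batch-size formula is an assumption.
    """
    # A generator has no len(), so it is materialized into a list
    # before the batch size can be computed.
    items = list(items)
    if batch_size is None:
        # Auto-compute: aim for roughly one batch per slice, never below 1.
        batch_size = max(1, len(items) // max(1, num_slices))
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
```

Passing an explicit `batch_size` skips the automatic computation, which is the kind of bypass the commit message suggests could be added later.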
-
Matei Zaharia authored
-
Matei Zaharia authored
-
Matei Zaharia authored
-
Matei Zaharia authored
-
- Jul 27, 2013
-
-
Matei Zaharia authored
-
- Jul 16, 2013
-
-
Matei Zaharia authored
-
- Jul 01, 2013
-
-
root authored
debuggability by letting "print" statements show up in the executor's stderr
Conflicts: core/src/main/scala/spark/api/python/PythonRDD.scala
-
- Jun 21, 2013
-
-
Jey Kottalam authored
-
Jey Kottalam authored
-
Jey Kottalam authored
-
Jey Kottalam authored
-
Jey Kottalam authored
-
- Apr 02, 2013
-
-
Jey Kottalam authored
-
Jey Kottalam authored
-
- Feb 24, 2013
-
-
Josh Rosen authored
-
- Feb 09, 2013
-
-
Mark Hamstra authored
-
- Feb 03, 2013
-
-
Josh Rosen authored
-
Josh Rosen authored
-
Josh Rosen authored
-
- Feb 01, 2013
-
-
Josh Rosen authored
-
Josh Rosen authored
The problem was that the gateway was being initialized whenever the pyspark.context module was loaded. The fix uses lazy initialization that occurs only when SparkContext instances are actually constructed. I also made the gateway and jvm variables private. This change results in a ~3-4x performance improvement when running the PySpark unit tests.
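
A minimal sketch of the lazy-initialization pattern described above. The class `Context`, the `launches` counter, and the `object()` placeholder are hypothetical stand-ins; the real gateway launch is not shown.

```python
class Context:
    """Hypothetical stand-in for a context that needs a shared gateway."""

    _gateway = None   # private, created on first use rather than at import
    launches = 0      # counts how often the expensive launch ran

    @classmethod
    def _launch_gateway(cls):
        # Placeholder for an expensive operation such as starting a JVM.
        cls.launches += 1
        return object()

    @classmethod
    def _ensure_gateway(cls):
        # Launch only once, and only when an instance is constructed.
        if cls._gateway is None:
            cls._gateway = cls._launch_gateway()
        return cls._gateway

    def __init__(self):
        self._gw = self._ensure_gateway()
```

Importing a module that defines such a class costs nothing; the launch happens on the first constructor call and is reused afterwards.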
-
Josh Rosen authored
-
- Jan 31, 2013
-
-
Patrick Wendell authored
This patch alters the Python <-> executor protocol to pass exception data back when exceptions occur in user Python code.
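
One way such a protocol could carry exception data is sketched below. The framing is an assumption for illustration only: the sentinel value `EXCEPTION_MARKER`, the length-prefixed layout, and the function names are hypothetical and are not taken from the patch.

```python
import io
import struct
import traceback

EXCEPTION_MARKER = -2  # hypothetical sentinel sent in place of a length


def write_result(stream, func, arg):
    """Worker side: write a length-prefixed result, or a marked traceback."""
    try:
        payload = repr(func(arg)).encode("utf-8")
        stream.write(struct.pack("!i", len(payload)))
        stream.write(payload)
    except Exception:
        trace = traceback.format_exc().encode("utf-8")
        stream.write(struct.pack("!i", EXCEPTION_MARKER))
        stream.write(struct.pack("!i", len(trace)))
        stream.write(trace)


def read_result(stream):
    """Reader side: return the payload, or re-raise with the remote traceback."""
    (length,) = struct.unpack("!i", stream.read(4))
    if length == EXCEPTION_MARKER:
        (size,) = struct.unpack("!i", stream.read(4))
        raise RuntimeError(stream.read(size).decode("utf-8"))
    return stream.read(length).decode("utf-8")


def roundtrip(func, arg):
    """Run both sides over an in-memory buffer, for demonstration."""
    buf = io.BytesIO()
    write_result(buf, func, arg)
    buf.seek(0)
    return read_result(buf)
```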
-
- Jan 30, 2013
-
-
Patrick Wendell authored
Also adds a line to the docs explaining how to use it.
-
- Jan 25, 2013
-
-
Stephen Haberman authored
-
- Jan 23, 2013
-
-
Josh Rosen authored
cloudpickle runs into issues while pickling subclasses of AccumulatorParam, which may be related to this Python issue: http://bugs.python.org/issue7689. This seems hard to fix, and the ABCMeta wasn't necessary, so I removed it.
-
Josh Rosen authored
Fix minor documentation formatting issues.
-
- Jan 22, 2013
-
-
Josh Rosen authored
-
Josh Rosen authored
-
- Jan 21, 2013
-
-
Josh Rosen authored
This should avoid exceptions caused by existing files with different contents. I also removed some unused code.
-
- Jan 20, 2013
-
-
Josh Rosen authored
PythonPartitioner did not take the Python-side partitioning function into account when checking for equality, which might cause problems in the future.
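
The equality problem can be illustrated with a small Python sketch. The actual PythonPartitioner lives on the Scala side; the class below is a hypothetical analogue that only shows why the partitioning function must take part in the comparison.

```python
class Partitioner:
    """Hypothetical analogue: equal only if count AND function match."""

    def __init__(self, num_partitions, partition_func):
        self.num_partitions = num_partitions
        self.partition_func = partition_func

    def __eq__(self, other):
        return (
            isinstance(other, Partitioner)
            and self.num_partitions == other.num_partitions
            # Without this check, two partitioners that route keys
            # differently would wrongly compare as equal.
            and self.partition_func == other.partition_func
        )

    def __hash__(self):
        return hash((self.num_partitions, self.partition_func))

    def partition(self, key):
        return self.partition_func(key) % self.num_partitions
```

If two datasets were treated as co-partitioned based on the partition count alone, a shuffle could be skipped even though their keys are placed differently.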
-
Josh Rosen authored
-