- Feb 26, 2014
Bouke van der Bijl authored
This surrounds the complete worker code in a try/except block so that we catch any error that arrives; an example would be the depickling failing for some reason. @JoshRosen
Author: Bouke van der Bijl <boukevanderbijl@gmail.com>
Closes #644 from bouk/catch-depickling-errors and squashes the following commits:
f0f67cc [Bouke van der Bijl] Lol indentation
0e4d504 [Bouke van der Bijl] Surround the complete python worker with the try block
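A minimal, self-contained sketch of the pattern this commit describes; it is not PySpark's actual worker protocol, just the try-block-around-everything idea:

```python
import pickle
import sys
import traceback

def worker_main(infile, outfile):
    # The try block now surrounds the complete worker body, so an error
    # while depickling the shipped function is caught like any other.
    try:
        func = pickle.load(infile)            # may raise if depickling fails
        while True:
            try:
                record = pickle.load(infile)
            except EOFError:
                break                         # clean end of input
            pickle.dump(func(record), outfile)
        outfile.flush()
    except Exception:
        # Report the traceback downstream instead of dying silently.
        outfile.write(traceback.format_exc().encode("utf-8"))
        outfile.flush()
        sys.exit(-1)
```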
- Feb 22, 2014
jyotiska authored
Fixed minor typo in worker.py.
Author: jyotiska <jyotiska123@gmail.com>
Closes #630 from jyotiska/pyspark_code and squashes the following commits:
ee44201 [jyotiska] typo fixed in worker.py
- Jan 28, 2014
Josh Rosen authored
This fixes SPARK-1043, a bug introduced in 0.9.0 where PySpark couldn't serialize strings > 64kB. This fix was written by @tyro89 and @bouk in #512. This commit squashes and rebases their pull request in order to fix some merge conflicts.
- Jan 12, 2014
Matei Zaharia authored
This helps when an exception occurs while serializing a record to be sent to Java, which would otherwise leave the stream to Java in an inconsistent state where PythonRDD won't be able to read the error.
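An illustration of why a mid-record failure corrupts a length-prefixed stream like the Python-to-Java pipe; the framing below is a sketch, not Spark's exact wire format:

```python
import pickle
import struct

def write_record(stream, obj):
    # Serialize fully before touching the stream: if pickling raises
    # here, the stream is still clean and an error record can follow.
    payload = pickle.dumps(obj)
    stream.write(struct.pack(">i", len(payload)))   # 4-byte length header
    # A failure after the header is written would leave the reader
    # expecting payload bytes that never arrive, so it could not even
    # parse a later error record.
    stream.write(payload)
```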
- Nov 10, 2013
Josh Rosen authored

Josh Rosen authored

Josh Rosen authored
For now, this only adds MarshalSerializer, but it lays the groundwork for supporting other custom serializers. Many of these mechanisms can also be used to support deserialization of different data formats sent by Java, such as data encoded by MsgPack. This also fixes a bug in SparkContext.union().
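For reference, selecting the new serializer from user code looks like this (SparkContext's serializer argument and MarshalSerializer are the real PySpark entry points; marshal is faster than pickle but supports fewer data types):

```python
from pyspark import SparkContext
from pyspark.serializers import MarshalSerializer

# Use marshal instead of pickle for faster (de)serialization of
# simple types such as ints, strings, lists, and dicts.
sc = SparkContext("local", "serializer-demo", serializer=MarshalSerializer())
print(sc.parallelize(range(1000)).map(lambda x: 2 * x).take(10))
sc.stop()
```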
- Nov 03, 2013
Josh Rosen authored
If we support custom serializers, the Python worker will know what type of input to expect, so we won't need to wrap Tuple2 and Strings into pickled tuples and strings.
Josh Rosen authored
Write the length of the accumulators section up-front rather than terminating it with a negative length. I find this easier to read.
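A sketch to make the two framings concrete (illustrative code, not Spark's actual accumulator stream):

```python
import struct

# Old style: write updates, then a negative length as an end sentinel.
def write_updates_with_sentinel(stream, updates):
    for update in updates:
        stream.write(struct.pack(">i", len(update)))
        stream.write(update)
    stream.write(struct.pack(">i", -1))  # reader loops until it sees -1

# New style: write the count up-front, then exactly that many updates,
# so the reader knows in advance how many records to expect.
def write_updates_with_count(stream, updates):
    stream.write(struct.pack(">i", len(updates)))
    for update in updates:
        stream.write(struct.pack(">i", len(update)))
        stream.write(update)
```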
- Sep 01, 2013
Matei Zaharia authored
- Aug 16, 2013
Andre Schumacher authored
Implementing SPARK-878 for PySpark: adding zip and egg files to the context and passing them down to workers, which add these files to their sys.path.
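Driver-side usage looks like this; addPyFile and the pyFiles constructor argument are the real PySpark entry points, while the file names are placeholders:

```python
from pyspark import SparkContext

# Ship dependencies at construction time...
sc = SparkContext("local", "deps-demo", pyFiles=["deps.zip"])
# ...or add them afterwards; workers put these files on their sys.path.
sc.addPyFile("extra_package.egg")
```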
- Jul 16, 2013
Matei Zaharia authored
- Jun 21, 2013
Jey Kottalam authored

Jey Kottalam authored

Jey Kottalam authored

Jey Kottalam authored
- Feb 01, 2013
Josh Rosen authored
- Jan 31, 2013
Patrick Wendell authored
This patch alters the Python <-> executor protocol to pass on exception data when exceptions occur in user Python code.
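A sketch of the general idea, with a hypothetical marker value; the real protocol details differ:

```python
import pickle
import struct

# Hypothetical marker: a special negative "length" value tells the JVM
# that what follows is a pickled exception rather than a data record.
PYTHON_EXCEPTION_MARKER = -2

def report_exception(stream, traceback_text):
    payload = pickle.dumps(traceback_text)
    stream.write(struct.pack(">i", PYTHON_EXCEPTION_MARKER))
    stream.write(struct.pack(">i", len(payload)))
    stream.write(payload)
```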
- Jan 23, 2013
Josh Rosen authored
Fix minor documentation formatting issues.
- Jan 22, 2013
Josh Rosen authored
- Jan 21, 2013
Josh Rosen authored
This should avoid exceptions caused by existing files with different contents. I also removed some unused code.
- Jan 20, 2013
Matei Zaharia authored
- Jan 08, 2013
Josh Rosen authored
- Jan 01, 2013
Josh Rosen authored
- Dec 24, 2012
Josh Rosen authored
Passing large volumes of data through Py4J seems to be slow. It appears to be faster to write the data to the local filesystem and read it back from Python.
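The workaround in sketch form; the function name is hypothetical and the JVM-side reader is assumed:

```python
import pickle
import tempfile

def ship_partition(records):
    # Dump the records to a local temp file and pass only the short file
    # name across Py4J; the JVM side then reads the file directly.
    with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as f:
        for record in records:
            pickle.dump(record, f)
        return f.name
```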
- Oct 19, 2012
Josh Rosen authored
- Aug 27, 2012
Josh Rosen authored

Josh Rosen authored

Josh Rosen authored
- Aug 24, 2012
Josh Rosen authored
- Aug 22, 2012
Josh Rosen authored
- Aug 21, 2012
Josh Rosen authored
Objects serialized with JSON can be compared for equality, but JSON can be slow to serialize and only supports a limited range of data types.
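A quick illustration of the tradeoff (plain Python, nothing Spark-specific):

```python
import json
import pickle

a = {"key": [1, 2, 3]}
b = {"key": [1, 2, 3]}

# Equal objects serialize to equal JSON strings, so serialized forms
# can be compared directly.
assert json.dumps(a, sort_keys=True) == json.dumps(b, sort_keys=True)

# But JSON rejects types that pickle handles, such as sets or
# non-string dictionary keys.
try:
    json.dumps({("a", 1): {2, 3}})
except TypeError as err:
    print("json:", err)

print("pickle:", len(pickle.dumps({("a", 1): {2, 3}})), "bytes")
```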
- Aug 19, 2012
Josh Rosen authored

Josh Rosen authored