- Feb 26, 2014
Bouke van der Bijl authored
This surrounds the complete worker code in a try/except block so that we catch any error that arrives; an example would be the depickling failing for some reason. @JoshRosen
Author: Bouke van der Bijl <boukevanderbijl@gmail.com>
Closes #644 from bouk/catch-depickling-errors and squashes the following commits:
f0f67cc [Bouke van der Bijl] Lol indentation
0e4d504 [Bouke van der Bijl] Surround the complete python worker with the try block
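A minimal, self-contained sketch of the pattern this commit describes; it is not PySpark's actual worker protocol, just the try-block-around-everything idea:

```python
import pickle
import sys
import traceback

def worker_main(infile, outfile):
    # The try block now surrounds the complete worker body, so an error
    # while depickling the shipped function is caught like any other.
    try:
        func = pickle.load(infile)            # may raise if depickling fails
        while True:
            try:
                record = pickle.load(infile)
            except EOFError:
                break                         # clean end of input
            pickle.dump(func(record), outfile)
        outfile.flush()
    except Exception:
        # Report the traceback downstream instead of dying silently.
        outfile.write(traceback.format_exc().encode("utf-8"))
        outfile.flush()
        sys.exit(-1)
```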
- Feb 22, 2014
jyotiska authored
Fixed minor typo in worker.py.
Author: jyotiska <jyotiska123@gmail.com>
Closes #630 from jyotiska/pyspark_code and squashes the following commits:
ee44201 [jyotiska] typo fixed in worker.py
- Jan 28, 2014
Josh Rosen authored
This fixes SPARK-1043, a bug introduced in 0.9.0 where PySpark couldn't serialize strings > 64kB. This fix was written by @tyro89 and @bouk in #512. This commit squashes and rebases their pull request in order to fix some merge conflicts.
- Jan 12, 2014
Matei Zaharia authored
This helps when an exception occurs while serializing a record to be sent to Java, which would otherwise leave the stream to Java in an inconsistent state where PythonRDD won't be able to read the error.
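An illustration of why a mid-record failure corrupts a length-prefixed stream like the Python-to-Java pipe; the framing below is a sketch, not Spark's exact wire format:

```python
import pickle
import struct

def write_record(stream, obj):
    # Serialize fully before touching the stream: if pickling raises
    # here, the stream is still clean and an error record can follow.
    payload = pickle.dumps(obj)
    stream.write(struct.pack(">i", len(payload)))   # 4-byte length header
    # A failure after the header is written would leave the reader
    # expecting payload bytes that never arrive, so it could not even
    # parse a later error record.
    stream.write(payload)
```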
- Nov 10, 2013
Josh Rosen authored

Josh Rosen authored

Josh Rosen authored
For now, this only adds MarshalSerializer, but it lays the groundwork for supporting other custom serializers. Many of these mechanisms can also be used to support deserialization of different data formats sent by Java, such as data encoded by MsgPack. This also fixes a bug in SparkContext.union().
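For reference, selecting the new serializer from user code looks like this (SparkContext's serializer argument and MarshalSerializer are the real PySpark entry points; marshal is faster than pickle but supports fewer data types):

```python
from pyspark import SparkContext
from pyspark.serializers import MarshalSerializer

# Use marshal instead of pickle for faster (de)serialization of
# simple types such as ints, strings, lists, and dicts.
sc = SparkContext("local", "serializer-demo", serializer=MarshalSerializer())
print(sc.parallelize(range(1000)).map(lambda x: 2 * x).take(10))
sc.stop()
```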
- Nov 03, 2013
Josh Rosen authored
If we support custom serializers, the Python worker will know what type of input to expect, so we won't need to wrap Tuple2 and Strings into pickled tuples and strings.
Josh Rosen authored
Write the length of the accumulators section up-front rather than terminating it with a negative length. I find this easier to read.
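A sketch to make the two framings concrete (illustrative code, not Spark's actual accumulator stream):

```python
import struct

# Old style: write updates, then a negative length as an end sentinel.
def write_updates_with_sentinel(stream, updates):
    for update in updates:
        stream.write(struct.pack(">i", len(update)))
        stream.write(update)
    stream.write(struct.pack(">i", -1))  # reader loops until it sees -1

# New style: write the count up-front, then exactly that many updates,
# so the reader knows in advance how many records to expect.
def write_updates_with_count(stream, updates):
    stream.write(struct.pack(">i", len(updates)))
    for update in updates:
        stream.write(struct.pack(">i", len(update)))
        stream.write(update)
```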
- Sep 01, 2013
Matei Zaharia authored
- Aug 16, 2013
Andre Schumacher authored
Implementing SPARK-878 for PySpark: adding zip and egg files to the context and passing them down to workers, which add these files to their sys.path.
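Driver-side usage looks like this; addPyFile and the pyFiles constructor argument are the real PySpark entry points, while the file names are placeholders:

```python
from pyspark import SparkContext

# Ship dependencies at construction time...
sc = SparkContext("local", "deps-demo", pyFiles=["deps.zip"])
# ...or add them afterwards; workers put these files on their sys.path.
sc.addPyFile("extra_package.egg")
```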
- Jul 16, 2013
Matei Zaharia authored
- Jun 21, 2013
Jey Kottalam authored

Jey Kottalam authored

Jey Kottalam authored

Jey Kottalam authored
- Feb 01, 2013
Josh Rosen authored
- Jan 31, 2013
Patrick Wendell authored
This patch alters the Python <-> executor protocol to pass on exception data when exceptions occur in user Python code.
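A sketch of the general idea, with a hypothetical marker value; the real protocol details differ:

```python
import pickle
import struct

# Hypothetical marker: a special negative "length" value tells the JVM
# that what follows is a pickled exception rather than a data record.
PYTHON_EXCEPTION_MARKER = -2

def report_exception(stream, traceback_text):
    payload = pickle.dumps(traceback_text)
    stream.write(struct.pack(">i", PYTHON_EXCEPTION_MARKER))
    stream.write(struct.pack(">i", len(payload)))
    stream.write(payload)
```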
- Jan 23, 2013
Josh Rosen authored
Fix minor documentation formatting issues.
- Jan 22, 2013
Josh Rosen authored
- Jan 21, 2013
Josh Rosen authored
This should avoid exceptions caused by existing files with different contents. I also removed some unused code.
- Jan 20, 2013
Matei Zaharia authored
- Jan 08, 2013
Josh Rosen authored
- Jan 01, 2013
Josh Rosen authored
- Dec 24, 2012
Josh Rosen authored
Passing large volumes of data through Py4J seems to be slow. It appears to be faster to write the data to the local filesystem and read it back from Python.
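The workaround in sketch form; the function name is hypothetical and the JVM-side reader is assumed:

```python
import pickle
import tempfile

def ship_partition(records):
    # Dump the records to a local temp file and pass only the short file
    # name across Py4J; the JVM side then reads the file directly.
    with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as f:
        for record in records:
            pickle.dump(record, f)
        return f.name
```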
- Oct 19, 2012
Josh Rosen authored
- Aug 27, 2012
Josh Rosen authored

Josh Rosen authored

Josh Rosen authored
- Aug 24, 2012
Josh Rosen authored
- Aug 22, 2012
Josh Rosen authored
- Aug 21, 2012
Josh Rosen authored
Objects serialized with JSON can be compared for equality, but JSON can be slow to serialize and only supports a limited range of data types.
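A quick illustration of the tradeoff (plain Python, nothing Spark-specific):

```python
import json
import pickle

a = {"key": [1, 2, 3]}
b = {"key": [1, 2, 3]}

# Equal objects serialize to equal JSON strings, so serialized forms
# can be compared directly.
assert json.dumps(a, sort_keys=True) == json.dumps(b, sort_keys=True)

# But JSON rejects types that pickle handles, such as sets or
# non-string dictionary keys.
try:
    json.dumps({("a", 1): {2, 3}})
except TypeError as err:
    print("json:", err)

print("pickle:", len(pickle.dumps({("a", 1): {2, 3}})), "bytes")
```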
- Aug 19, 2012
Josh Rosen authored

Josh Rosen authored