Commits · 9dd15fe700ad8f52739cce58cbdf198fab8fd5d8 · cs525-sp18-g07 / spark

Aug 14, 2013
- Fix PySpark unit tests on Python 2.6. · 7a9abb9d
  Josh Rosen authored 11 years ago
  
  7a9abb9d
Aug 12, 2013
- Implementing SPARK-865: Add the equivalent of ADD_JARS to PySpark · 8fd5c7bc
  Andre Schumacher authored 11 years ago
  
  Now ADD_FILES uses a comma as file name separator.
  8fd5c7bc
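
The comma-separated convention noted above can be illustrated with a minimal sketch. The helper name `parse_add_files` and the sample file names are hypothetical; only the `ADD_FILES` variable name and the comma separator come from the commit message, and this is not the code from the commit itself.

```python
import os


def parse_add_files(environ=None):
    """Return the paths listed in a comma-separated ADD_FILES value.

    Hypothetical helper for illustration only.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get("ADD_FILES", "")
    # Skip empty pieces so an unset variable or a trailing comma yields no entries.
    return [path.strip() for path in raw.split(",") if path.strip()]


# Example with made-up file names.
files = parse_add_files({"ADD_FILES": "deps.zip,helpers.py"})
print(files)
```

Each resulting path could then be handed to whatever mechanism ships files to the workers.
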
Aug 11, 2013
- Code update for Matei's suggestions · 24f02082
  stayhf authored 11 years ago
  
  24f02082
Aug 10, 2013
- Simple PageRank algorithm implementation in Python for SPARK-760 · 55d9bde2
  stayhf authored 11 years ago
  
  55d9bde2
Aug 01, 2013
- Fix string parsing and style in LR · 5ac54839
  Matei Zaharia authored 11 years ago
  
  5ac54839
Jul 30, 2013

Do not inherit master's PYTHONPATH on workers. · b9573263

Josh Rosen authored 11 years ago

This fixes SPARK-832, an issue where PySpark
would not work when the master and workers used
different SPARK_HOME paths.

This change may break code that relied
on the master's PYTHONPATH being used on workers.
To have custom PYTHONPATH additions used on the
workers, users should set a custom PYTHONPATH in
spark-env.sh rather than setting it in the shell.

b9573263
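
The behavior described above can be pictured with a small sketch: the worker's environment is derived from the worker's own settings rather than copied from the master. The function `build_worker_env` and its parameters are assumptions made for illustration and are not taken from the commit.

```python
import os


def build_worker_env(master_env, worker_spark_home, worker_pythonpath=""):
    """Build a worker environment without inheriting the master's PYTHONPATH.

    Hypothetical helper: worker_pythonpath stands in for a value that would be
    configured on the worker side (for example via spark-env.sh).
    """
    env = dict(master_env)
    # Drop the master's value so a different SPARK_HOME on the worker still works.
    env.pop("PYTHONPATH", None)
    parts = [os.path.join(worker_spark_home, "python")]
    if worker_pythonpath:
        parts.append(worker_pythonpath)
    env["SPARK_HOME"] = worker_spark_home
    env["PYTHONPATH"] = os.pathsep.join(parts)
    return env


master_env = {"SPARK_HOME": "/opt/spark-master", "PYTHONPATH": "/opt/spark-master/python"}
worker_env = build_worker_env(master_env, "/srv/spark")
print(worker_env["PYTHONPATH"])
```
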

Jul 29, 2013
- Update the Python logistic regression example to read from a file and batch input records for more efficient NumPy computations · 01f94931
  Matei Zaharia authored 11 years ago
  
  01f94931
- SPARK-815. Python parallelize() should split lists before batching · feba7ee5
  Matei Zaharia authored 11 years ago
  
  One unfortunate consequence of this fix is that we materialize any collections that are given to us as generators, but this seems necessary to get reasonable behavior on small collections. We could add a batchSize parameter later to bypass auto-computation of batch size if this becomes a problem (e.g. if users really want to parallelize big generators nicely).
  feba7ee5
- Use None instead of empty string as it's slightly smaller/faster · d75c3086
  Matei Zaharia authored 11 years ago
  
  d75c3086
- Allow python/run-tests to run from any directory · 96b50e82
  Matei Zaharia authored 11 years ago
  
  96b50e82
- Optimize Python foreach() to not return as many objects · b5ec3556
  Matei Zaharia authored 11 years ago
  
  b5ec3556
- Optimize Python take() to not compute entire first partition · b9d6783f
  Matei Zaharia authored 11 years ago
  
  b9d6783f
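
The SPARK-815 entry above (feba7ee5) describes splitting a collection into slices before batching it. A minimal sketch of that ordering follows. The function name, the `batch_size` cap, and the way the effective batch size is chosen are assumptions for illustration, not the logic from the commit.

```python
def split_then_batch(items, num_slices, batch_size=1024):
    """Split items into num_slices slices, then batch within each slice.

    Hypothetical sketch. Generators are materialized because the length is
    needed to compute slice boundaries, as the commit message points out.
    """
    items = list(items)
    n = len(items)
    # Assumed auto-computation: never let one batch span more than a slice.
    effective = max(1, min(batch_size, n // num_slices))
    slices = []
    for i in range(num_slices):
        start = i * n // num_slices
        end = (i + 1) * n // num_slices
        chunk = items[start:end]
        batches = [chunk[j:j + effective] for j in range(0, len(chunk), effective)]
        slices.append(batches)
    return slices


# A generator of five elements split two ways, with a deliberately large cap.
result = split_then_batch((x for x in range(5)), 2, batch_size=10)
print(result)
```

Batching first with a large batch size would place a small collection into a single batch, leaving the other slices empty; splitting first avoids that.
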
Jul 27, 2013
- Some fixes to Python examples (style and package name for LR) · f11ad72d
  Matei Zaharia authored 11 years ago
  
  f11ad72d
Jul 16, 2013
- Add Apache license headers and LICENSE and NOTICE files · af3c9d50
  Matei Zaharia authored 11 years ago
  
  af3c9d50
Jul 01, 2013

Fixed PySpark perf regression by not using socket.makefile(), and improved debuggability by letting "print" statements show up in the executor's stderr · ec31e68d

root authored 11 years ago

Conflicts:
	core/src/main/scala/spark/api/python/PythonRDD.scala

ec31e68d

Jun 21, 2013
- Fix reporting of PySpark exceptions · c75bed0e
  Jey Kottalam authored 11 years ago
  
  c75bed0e
- PySpark daemon: fix deadlock, improve error handling · 7c5ff733
  Jey Kottalam authored 11 years ago
  
  7c5ff733
- Add tests and fixes for Python daemon shutdown · 62c47814
  Jey Kottalam authored 11 years ago
  
  62c47814
- Prefork Python worker processes · c79a6078
  Jey Kottalam authored 11 years ago
  
  c79a6078
- Add Python timing instrumentation · 40afe0d2
  Jey Kottalam authored 12 years ago
  
  40afe0d2
Apr 02, 2013
- Fix Python saveAsTextFile doctest to not expect order to be preserved · 9a731f5a
  Jey Kottalam authored 12 years ago
  
  9a731f5a
- Fix argv handling in Python transitive closure example · 20604001
  Jey Kottalam authored 12 years ago
  
  20604001
Feb 24, 2013
- Change numSplits to numPartitions in PySpark. · 2c966c98
  Josh Rosen authored 12 years ago
  
  2c966c98
Feb 09, 2013
- Add commutative requirement for 'reduce' to Python docstring. · b7a1fb5c
  Mark Hamstra authored 12 years ago
  
  b7a1fb5c
Feb 03, 2013
- Remove unnecessary doctest __main__ methods. · e6172911
  Josh Rosen authored 12 years ago
  
  e6172911
- Fetch fewer objects in PySpark's take() method. · 8fbd5380
  Josh Rosen authored 12 years ago
  
  8fbd5380
- Fix reporting of PySpark doctest failures. · 2415c18f
  Josh Rosen authored 12 years ago
  
  2415c18f
Feb 01, 2013

Use spark.local.dir for PySpark temp files (SPARK-580). · e211f405
Josh Rosen authored 12 years ago

e211f405

Do not launch JavaGateways on workers (SPARK-674). · 9cc6ff9c

Josh Rosen authored 12 years ago

The problem was that the gateway was being initialized whenever the
pyspark.context module was loaded.  The fix uses lazy initialization
that occurs only when SparkContext instances are actually constructed.

I also made the gateway and jvm variables private.

This change results in ~3-4x performance improvement when running the
PySpark unit tests.

9cc6ff9c
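
The lazy-initialization idea described in the entry above can be sketched as follows. The class names and the `launch_count` counter are hypothetical, and `object()` stands in for an expensive gateway; this is a sketch of the pattern, not the PySpark code.

```python
import threading


class LazyGateway(object):
    """Holds a process-wide gateway that is created on first use only."""

    _gateway = None
    _lock = threading.Lock()
    launch_count = 0  # illustration only: counts how often the launch ran

    @classmethod
    def _launch(cls):
        cls.launch_count += 1
        return object()  # stand-in for starting a real gateway process

    @classmethod
    def ensure_initialized(cls):
        with cls._lock:
            if cls._gateway is None:
                cls._gateway = cls._launch()
        return cls._gateway


class Context(object):
    """Hypothetical context: touching the gateway happens in the constructor."""

    def __init__(self):
        self._gateway = LazyGateway.ensure_initialized()


# Defining the classes launches nothing; constructing contexts launches once.
first = Context()
second = Context()
print(LazyGateway.launch_count)
```

With this shape, a worker that only imports the module never pays the launch cost.
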

Fix stdout redirection in PySpark. · 57b64d0d
Josh Rosen authored 12 years ago

57b64d0d

Jan 31, 2013

SPARK-673: Capture and re-throw Python exceptions · 3446d5c8

Patrick Wendell authored 12 years ago

This patch alters the Python <-> executor protocol to pass on
exception data when exceptions occur in user Python code.

3446d5c8
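
One way to picture passing exception data across such a protocol is a length-prefixed stream with a sentinel value for errors. Everything in this sketch is an assumption: the `ERROR_MARKER` value, the framing, and the `RemoteError` class are invented for illustration and do not reflect the actual wire format.

```python
import io
import struct
import traceback

ERROR_MARKER = -1  # hypothetical sentinel; a real payload length is never negative


class RemoteError(Exception):
    """Raised on the reading side when the writer reported a failure."""


def run_and_report(func, out):
    """Run func and write either its result or its traceback to out."""
    try:
        payload = repr(func()).encode("utf-8")
        out.write(struct.pack("!i", len(payload)))
        out.write(payload)
    except Exception:
        tb = traceback.format_exc().encode("utf-8")
        out.write(struct.pack("!i", ERROR_MARKER))
        out.write(struct.pack("!i", len(tb)))
        out.write(tb)


def read_reply(inp):
    """Read one reply; re-raise a reported failure as RemoteError."""
    (length,) = struct.unpack("!i", inp.read(4))
    if length == ERROR_MARKER:
        (tb_len,) = struct.unpack("!i", inp.read(4))
        raise RemoteError(inp.read(tb_len).decode("utf-8"))
    return inp.read(length).decode("utf-8")


# Round trip through an in-memory buffer standing in for the socket.
buf = io.BytesIO()
run_and_report(lambda: 1 + 2, buf)
buf.seek(0)
ok_reply = read_reply(buf)
print(ok_reply)
```

The reader sees the original traceback text instead of a bare connection failure.
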

Jan 30, 2013
- Make module help available in python shell. · 3f945e3b
  Patrick Wendell authored 12 years ago
  
  Also adds a line to the docs explaining how to use it.
  3f945e3b
Jan 25, 2013
- Replace old 'master' term with 'driver'. · 7dfb82a9
  Stephen Haberman authored 12 years ago
  
  7dfb82a9
Jan 23, 2013

Remove use of abc.ABCMeta due to cloudpickle issue. · b47d054c

Josh Rosen authored 12 years ago

cloudpickle runs into issues while pickling subclasses of AccumulatorParam,
which may be related to this Python issue:

    http://bugs.python.org/issue7689

This seems hard to fix and the ABCMeta wasn't necessary, so I removed it.

b47d054c
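
A base class can express the same contract without `abc.ABCMeta` by raising `NotImplementedError` from its methods. The sketch below shows that pattern under assumed names (`AccumulatorParamSketch`, `add_in_place`, `ListParam`); it is not the class from the commit.

```python
import pickle


class AccumulatorParamSketch(object):
    """Plain base class: no metaclass, subclasses override both methods."""

    def zero(self, value):
        raise NotImplementedError

    def add_in_place(self, value1, value2):
        raise NotImplementedError


class ListParam(AccumulatorParamSketch):
    """Hypothetical subclass that accumulates lists element-wise."""

    def zero(self, value):
        return [0] * len(value)

    def add_in_place(self, value1, value2):
        for i in range(len(value1)):
            value1[i] += value2[i]
        return value1


param = ListParam()
total = param.add_in_place(param.zero([1, 2]), [3, 4])
# A plain subclass instance round-trips through the standard pickle module.
restored = pickle.loads(pickle.dumps(param))
print(total)
```

The trade-off is that a missing override is reported when the method is called rather than when the subclass is instantiated.
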

Allow PySpark's SparkFiles to be used from driver · ae2ed294
Josh Rosen authored 12 years ago

Fix minor documentation formatting issues.

ae2ed294

Jan 22, 2013
- Fix sys.path bug in PySpark SparkContext.addPyFile · 35168d9c
  Josh Rosen authored 12 years ago
  
  35168d9c
- Make AccumulatorParam an abstract base class. · c75ae362
  Josh Rosen authored 12 years ago
  
  c75ae362
Jan 21, 2013

Don't download files to master's working directory. · ef711902

Josh Rosen authored 12 years ago

This should avoid exceptions caused by existing
files with different contents.

I also removed some unused code.

ef711902

Jan 20, 2013
- Fix PythonPartitioner equality; see SPARK-654. · 9f211dd3
  Josh Rosen authored 12 years ago
  
  PythonPartitioner did not take the Python-side partitioning function into account when checking for equality, which might cause problems in the future.
  9f211dd3
- Clean up setup code in PySpark checkpointing tests · 00d70cd6
  Josh Rosen authored 12 years ago
  
  00d70cd6
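
The SPARK-654 entry above (9f211dd3) concerns equality that ignored the partitioning function. A minimal sketch of equality that accounts for it follows. The class `PartitionerSketch` and the idea of identifying the function by an integer id are assumptions for illustration.

```python
class PartitionerSketch(object):
    """Two partitioners are equal only if count and function both match."""

    def __init__(self, num_partitions, partition_func_id):
        self.num_partitions = num_partitions
        # Hypothetical: an id that identifies the Python-side partition function.
        self.partition_func_id = partition_func_id

    def __eq__(self, other):
        if not isinstance(other, PartitionerSketch):
            return NotImplemented
        return (self.num_partitions == other.num_partitions
                and self.partition_func_id == other.partition_func_id)

    def __hash__(self):
        # Keep hashing consistent with equality.
        return hash((self.num_partitions, self.partition_func_id))


same = PartitionerSketch(4, 1) == PartitionerSketch(4, 1)
different = PartitionerSketch(4, 1) == PartitionerSketch(4, 2)
print(same, different)
```

Comparing on the partition count alone would treat data partitioned by different functions as co-partitioned.
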