Commits · a99fb3747a0bc9498cb1d19ae5b5bb0163e6f52b · cs525-sp18-g07 / spark

Mar 07, 2014

SPARK-1165: RDD.intersection in Python and Java · 6e730edc

Prashant Sharma authored 11 years ago

Author: Prashant Sharma <prashant.s@imaginea.com>
Author: Prashant Sharma <scrapcodes@gmail.com>

Closes #80 from ScrapCodes/SPARK-1165/RDD.intersection and squashes the following commits:

9b015e9 [Prashant Sharma] Added a note, shuffle is required for intersection.
1fea813 [Prashant Sharma] correct the lines wrapping
d0c71f3 [Prashant Sharma] SPARK-1165 RDD.intersection in java
d6effee [Prashant Sharma] SPARK-1165 Implemented RDD.intersection in python.
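
As a rough illustration of why a shuffle is needed for intersection: equal elements may start out in different partitions of the two inputs, so they have to be routed to the same place before they can be compared. The pure-Python sketch below simulates that routing with hash partitioning. The helper names `partition_by_hash` and `intersection` are hypothetical, and this is not the code from the commit.

```python
from collections import defaultdict


def partition_by_hash(items, num_partitions):
    """Simulated shuffle: route each item to a bucket chosen by its hash."""
    buckets = defaultdict(set)
    for item in items:
        buckets[hash(item) % num_partitions].add(item)
    return buckets


def intersection(left_partitions, right_partitions, num_partitions=4):
    """Intersect two datasets, each given as a list of partitions."""
    left = partition_by_hash(
        (x for part in left_partitions for x in part), num_partitions)
    right = partition_by_hash(
        (x for part in right_partitions for x in part), num_partitions)
    result = []
    for i in range(num_partitions):
        # After the shuffle, equal items share a bucket index, so a
        # bucket-local set intersection is sufficient.
        result.extend(left[i] & right[i])
    return result
```

Sets inside each bucket also remove duplicates, so each common element appears once in the result.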

Mar 06, 2014

SPARK-1187, Added missing Python APIs · 3d3acef0

Prabin Banka authored 11 years ago

The following Python APIs are added:
RDD.id()
SparkContext.setJobGroup()
SparkContext.setLocalProperty()
SparkContext.getLocalProperty()
SparkContext.sparkUser()

This was raised earlier as part of apache/incubator-spark#486.

Author: Prabin Banka <prabin.banka@imaginea.com>

Closes #75 from prabinb/python-api-backup and squashes the following commits:

cc3c6cd [Prabin Banka] Added missing Python APIs

Mar 04, 2014

SPARK-1109 wrong API docs for pyspark map function · 02836657

Prashant Sharma authored 11 years ago

Author: Prashant Sharma <prashant.s@imaginea.com>

Closes #73 from ScrapCodes/SPARK-1109/wrong-API-docs and squashes the following commits:

1a55b58 [Prashant Sharma] SPARK-1109 wrong API docs for pyspark map function

Feb 26, 2014

SPARK-1115: Catch depickling errors · 12738c1a

Bouke van der Bijl authored 11 years ago

This surrounds the complete worker code in a try/except block so we catch any error that occurs. An example would be the depickling failing for some reason.
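
A minimal sketch of the idea, assuming a worker that reads one pickled task from an input stream and writes a pickled reply to an output stream. The function `run_worker` and the `("ok", ...)` / `("error", ...)` reply tuples are hypothetical and do not reflect the actual worker protocol.

```python
import pickle
import traceback


def run_worker(infile, outfile):
    """Hypothetical worker: unpickle one (func, args) task and run it."""
    try:
        # Depickling sits inside the try block, so a corrupt or truncated
        # payload is reported like any other failure.
        func, args = pickle.load(infile)
        result = func(*args)
        pickle.dump(("ok", result), outfile)
    except Exception:
        # Send the traceback back instead of letting the worker die silently.
        pickle.dump(("error", traceback.format_exc()), outfile)
```

With the whole body inside the try block, a failure while reading the task is handled the same way as a failure inside the task itself.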

@JoshRosen

Author: Bouke van der Bijl <boukevanderbijl@gmail.com>

Closes #644 from bouk/catch-depickling-errors and squashes the following commits:

f0f67cc [Bouke van der Bijl] Lol indentation
0e4d504 [Bouke van der Bijl] Surround the complete python worker with the try block

Feb 22, 2014

doctest updated for mapValues, flatMapValues in rdd.py · 722199fa

jyotiska authored 11 years ago

Updated doctests for mapValues and flatMapValues in rdd.py

Author: jyotiska <jyotiska123@gmail.com>

Closes #621 from jyotiska/python_spark and squashes the following commits:

716f7cd [jyotiska] doctest updated for mapValues, flatMapValues in rdd.py

Fixed minor typo in worker.py · 3ff077d4

jyotiska authored 11 years ago

Fixed minor typo in worker.py

Author: jyotiska <jyotiska123@gmail.com>

Closes #630 from jyotiska/pyspark_code and squashes the following commits:

ee44201 [jyotiska] typo fixed in worker.py

Feb 20, 2014

SPARK-1114: Allow PySpark to use existing JVM and Gateway · 59b13795

Ahir Reddy authored 11 years ago

Patch to allow PySpark to use existing JVM and Gateway. Changes to PySpark implementation of SparkConf to take existing SparkConf JVM handle. Change to PySpark SparkContext to allow subclass specific context initialization.

Author: Ahir Reddy <ahirreddy@gmail.com>

Closes #622 from ahirreddy/pyspark-existing-jvm and squashes the following commits:

a86f457 [Ahir Reddy] Patch to allow PySpark to use existing JVM and Gateway. Changes to PySpark implementation of SparkConf to take existing SparkConf JVM handle. Change to PySpark SparkContext to allow subclass specific context initialization.

Feb 09, 2014

Merge pull request #562 from jyotiska/master. Closes #562. · 2ef37c93

jyotiska authored 11 years ago

Added example Python code for sort

I added example Python code for sort. Right now, PySpark has few examples for new people who want to use the project. This example code sorts integers stored in a file. I was able to sort 5 million, 10 million and 25 million integers with this code.

Author: jyotiska <jyotiska123@gmail.com>

== Merge branch commits ==

commit 8ad8faf6c8e02ae1cd68565d98524edf165f54df
Author: jyotiska <jyotiska123@gmail.com>
Date:   Sun Feb 9 11:00:41 2014 +0530

    Added comments in code on collect() method

commit 6f98f1e313f4472a7c2207d36c4f0fbcebc95a8c
Author: jyotiska <jyotiska123@gmail.com>
Date:   Sat Feb 8 13:12:37 2014 +0530

    Updated python example code sort.py

commit 945e39a5d68daa7e5bab0d96cbd35d7c4b04eafb
Author: jyotiska <jyotiska123@gmail.com>
Date:   Sat Feb 8 12:59:09 2014 +0530

    Added example python code for sort

Feb 08, 2014

Merge pull request #542 from markhamstra/versionBump. Closes #542. · c2341c92

Mark Hamstra authored 11 years ago

Version number to 1.0.0-SNAPSHOT

Since 0.9.0-incubating is done and out the door, we shouldn't be building 0.9.0-incubating-SNAPSHOT anymore.

@pwendell

Author: Mark Hamstra <markhamstra@gmail.com>

== Merge branch commits ==

commit 1b00a8a7c1a7f251b4bb3774b84b9e64758eaa71
Author: Mark Hamstra <markhamstra@gmail.com>
Date:   Wed Feb 5 09:30:32 2014 -0800

    Version number to 1.0.0-SNAPSHOT

Feb 06, 2014

Merge pull request #498 from ScrapCodes/python-api. Closes #498. · 084839ba

Prashant Sharma authored 11 years ago

Python api additions

Author: Prashant Sharma <prashant.s@imaginea.com>

== Merge branch commits ==

commit 8b51591f1a7a79a62c13ee66ff8d83040f7eccd8
Author: Prashant Sharma <prashant.s@imaginea.com>
Date:   Fri Jan 24 11:50:29 2014 +0530

    Josh's and Patrick's review comments.

commit d37f9677838e43bef6c18ef61fbf08055ba6d1ca
Author: Prashant Sharma <prashant.s@imaginea.com>
Date:   Thu Jan 23 17:27:17 2014 +0530

    fixed doc tests

commit 27cb54bf5c99b1ea38a73858c291d0a1c43d8b7c
Author: Prashant Sharma <prashant.s@imaginea.com>
Date:   Thu Jan 23 16:48:43 2014 +0530

    Added keys and values methods for PairFunctions in python

commit 4ce76b396fbaefef2386d7a36d611572bdef9b5d
Author: Prashant Sharma <prashant.s@imaginea.com>
Date:   Thu Jan 23 13:51:26 2014 +0530

    Added foreachPartition

commit 05f05341a187cba829ac0e6c2bdf30be49948c89
Author: Prashant Sharma <prashant.s@imaginea.com>
Date:   Thu Jan 23 13:02:59 2014 +0530

    Added coalesce function to python API

commit 6568d2c2fa14845dc56322c0f39ba2e13b3b26dd
Author: Prashant Sharma <prashant.s@imaginea.com>
Date:   Thu Jan 23 12:52:44 2014 +0530

    added repartition function to python API.

Jan 28, 2014

Switch from MUTF8 to UTF8 in PySpark serializers. · 1381fc72

Josh Rosen authored 11 years ago

This fixes SPARK-1043, a bug introduced in 0.9.0
where PySpark couldn't serialize strings > 64kB.

This fix was written by @tyro89 and @bouk in #512.
This commit squashes and rebases their pull request
in order to fix some merge conflicts.
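
For context, Java's `DataOutputStream.writeUTF` (modified UTF-8) stores the encoded length in two bytes, which caps a string at 65535 encoded bytes. The sketch below shows one way around that limit, assuming a 4-byte big-endian length prefix followed by standard UTF-8 bytes. The function names are hypothetical and the framing is an assumption, not necessarily the exact wire format used by this commit.

```python
import struct


def write_utf8(text, stream):
    """Write text as a 4-byte big-endian length followed by UTF-8 bytes."""
    data = text.encode("utf-8")
    stream.write(struct.pack("!i", len(data)))
    stream.write(data)


def read_utf8(stream):
    """Read one string written by write_utf8."""
    (length,) = struct.unpack("!i", stream.read(4))
    return stream.read(length).decode("utf-8")
```

A signed 32-bit prefix allows payloads of up to about 2 GiB, far beyond the 64 kB ceiling of a 2-byte prefix.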

Jan 23, 2014
- Deprecate mapPartitionsWithSplit in PySpark. · 4cebb79c
  Josh Rosen authored 11 years ago
  
  Also, replace the last reference to it in the docs. This fixes SPARK-1026.
- Fix for SPARK-1025: PySpark hang on missing files. · f8306849
  Josh Rosen authored 11 years ago
  
- Fix SPARK-978: ClassCastException in PySpark cartesian. · 61569906
  Josh Rosen authored 11 years ago
  
- Fix SPARK-1034: Py4JException on PySpark Cartesian Result · 0035dbbc
  Josh Rosen authored 11 years ago
  
Jan 18, 2014

Merge pull request #462 from mateiz/conf-file-fix · bf569954

Patrick Wendell authored 11 years ago

Remove Typesafe Config usage and conf files to fix nested property names

With Typesafe Config we had the subtle problem of no longer allowing
nested property names, which are used for a few of our properties:
http://apache-spark-developers-list.1001551.n3.nabble.com/Config-properties-broken-in-master-td208.html
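
One way to picture the problem, under the assumption that the conflict comes from a key being both a value and a prefix of other keys: a flat string-to-string map can hold both, while a tree-shaped config cannot. The key names `app.feature` and `app.feature.interval` and the helper `to_tree` below are hypothetical and only illustrate that assumption.

```python
def to_tree(flat):
    """Convert dotted keys to nested dicts, rejecting value/parent clashes."""
    tree = {}
    for key, value in flat.items():
        node = tree
        *parents, leaf = key.split(".")
        for part in parents:
            child = node.setdefault(part, {})
            if not isinstance(child, dict):
                raise ValueError(
                    f"{key!r} nests under a key that already holds a value")
            node = child
        if isinstance(node.get(leaf), dict):
            raise ValueError(
                f"{key!r} is both a value and a parent of other keys")
        node[leaf] = value
    return tree


# A flat map has no trouble holding both keys.
flat = {"app.feature": "true", "app.feature.interval": "100"}
```

Looking up `flat["app.feature"]` and `flat["app.feature.interval"]` both work, whereas `to_tree(flat)` raises `ValueError`.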



This PR is for branch 0.9 but should be added into master too.
(cherry picked from commit 34e911ce)

Signed-off-by: Patrick Wendell <pwendell@gmail.com>

Jan 14, 2014
- Complain if Python and NumPy versions are too old for MLlib · 5b3a3e28
  Matei Zaharia authored 11 years ago
  
- Re-enable Python MLlib tests (require Python 2.7 and NumPy 1.7+) · 938e4a0e
  Matei Zaharia authored 11 years ago
  
Jan 13, 2014
- Disable MLlib tests for now while Jenkins is still on Python 2.6 · cc93c2ab
  Matei Zaharia authored 11 years ago
  
Jan 12, 2014

Log Python exceptions to stderr as well · 5741078c

Matei Zaharia authored 11 years ago

This helps in case the exception happened while serializing a record to
be sent to Java, leaving the stream to Java in an inconsistent state
where PythonRDD won't be able to read the error.

Update some Python MLlib parameters to use camelCase, and tweak docs · 4c28a2ba

Matei Zaharia authored 11 years ago

We've used camel case in other Spark methods so it felt reasonable to
keep using it here and make the code match Scala/Java as much as
possible. Note that parameter names matter in Python because callers can
pass optional parameters by name.
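
To make the point about names concrete, here is a small sketch with a hypothetical `train` function (not an actual MLlib signature). Because callers may pass `regParam` by keyword, the parameter name is part of the public API, and renaming it breaks those callers.

```python
def train(data, iterations=100, regParam=1.0):
    """Hypothetical trainer; returns its settings so the call is observable."""
    return {"n": len(data), "iterations": iterations, "regParam": regParam}


# Optional parameters can be passed by name, skipping the ones before them.
settings = train([1.0, 2.0, 3.0], regParam=0.1)
```

Calling `train(data, reg_param=0.1)` would raise `TypeError`, which is why keeping the spelling consistent with the Scala/Java API matters.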

Add Naive Bayes to Python MLlib, and some API fixes · 9a0dfdf8

Matei Zaharia authored 11 years ago

- Added a Python wrapper for Naive Bayes
- Updated the Scala Naive Bayes to match the style of our other
  algorithms better and in particular make it easier to call from Java
  (added builder pattern, removed default value in train method)
- Updated Python MLlib functions to not require a SparkContext; we can
  get that from the RDD the user gives
- Added a toString method in LabeledPoint
- Made the Python MLlib tests run as part of run-tests as well (before
  they could only be run individually through each file)

Jan 06, 2014
- Added predictAll python function to MatrixFactorizationModel · 754f5300
  Hossein Falaki authored 11 years ago
  
- Added Rating deserializer · 04132ea9
  Hossein Falaki authored 11 years ago
  
Jan 04, 2014
- Added python binding for bulk recommendation · 8d0c2f73
  Hossein Falaki authored 11 years ago
  
Jan 03, 2014
- Changes on top of Prashant's patch. · 9e6f3bdc
  Patrick Wendell authored 11 years ago
  
  Closes #316
- sbin/spark-class* -> bin/spark-class* · 74ba97fc
  Prashant Sharma authored 11 years ago
  
- fixed review comments · 94f2fffa
  Prashant Sharma authored 11 years ago
  
Jan 02, 2014
- pyspark -> bin/pyspark · a3f90a2e
  Prashant Sharma authored 11 years ago
  
Jan 01, 2014
- Fix Python code after change of getOrElse · 7e8d2e8a
  Matei Zaharia authored 11 years ago
  
- Miscellaneous fixes from code review. · e2c68642
  Matei Zaharia authored 11 years ago
  
  Also replaced SparkConf.getOrElse with just a "get" that takes a default value, and added getInt, getLong, etc to make code that uses this simpler later on.
  e2c68642
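
The accessor pattern described in the commit above (a `get` that takes a default, plus typed helpers) can be sketched in Python as follows. `SimpleConf` and `get_int` are hypothetical names used for illustration; this is not the SparkConf implementation.

```python
class SimpleConf:
    """Hypothetical stand-in for a string-keyed configuration object."""

    def __init__(self, settings=None):
        self._settings = dict(settings or {})

    def set(self, key, value):
        # Values are stored as strings, as in a properties file.
        self._settings[key] = str(value)
        return self  # returning self allows chained set() calls

    def get(self, key, default=None):
        """Return the stored value, or the default if the key is absent."""
        return self._settings.get(key, default)

    def get_int(self, key, default):
        """Typed helper built on get(); saves callers an explicit int()."""
        return int(self.get(key, default))


conf = SimpleConf().set("app.retries", 3)
```

Callers then write `conf.get_int("app.retries", 1)` instead of combining a lookup, a fallback, and a conversion by hand.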
Dec 30, 2013
- Updated docs for SparkConf and handled review comments · 0fa58097
  Matei Zaharia authored 11 years ago
  
Dec 29, 2013
- Properly show Spark properties on web UI, and change app name property · 994f080f
  Matei Zaharia authored 11 years ago
  
- Fix some Python docs and make sure to unset SPARK_TESTING in Python tests so we don't get the test spark.conf on the classpath. · eaa8a68f
  Matei Zaharia authored 11 years ago
  
- Add Python docs about SparkConf · 58c6fa20
  Matei Zaharia authored 11 years ago
  
- Fix some other Python tests due to initializing JVM in a different way · 615fb649
  Matei Zaharia authored 11 years ago
  
  The test in context.py created two different instances of the SparkContext class by copying "globals", so that some tests can have a global "sc" object and others can try initializing their own contexts. This led to two JVM gateways being created since SparkConf also looked at pyspark.context.SparkContext to get the JVM.
  615fb649
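
A general illustration of how two copies of one class can lead to duplicated class-level state: if the same class definition is evaluated in two separate namespaces, each copy gets its own class attributes, so a "create once" check passes twice. The `Context` class and `load_copy` helper are hypothetical, and the sketch only assumes this is the kind of duplication the commit above describes.

```python
SOURCE = '''
class Context:
    gateway = None  # class-level slot, meant to be created only once

    @classmethod
    def ensure_gateway(cls, launches):
        if cls.gateway is None:
            launches.append(cls.__name__)  # record one simulated launch
            cls.gateway = object()
        return cls.gateway
'''


def load_copy():
    """Evaluate the class definition in a fresh namespace."""
    namespace = {}
    exec(SOURCE, namespace)
    return namespace["Context"]


first = load_copy()
second = load_copy()
launches = []
first.ensure_gateway(launches)
first.ensure_gateway(launches)   # reuses the first copy's gateway
second.ensure_gateway(launches)  # separate class object, so a second launch
```

After these calls `launches` holds two entries, one per class copy, even though each copy guards against creating its gateway twice.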
- Add SparkConf support in Python · cd00225d
  Matei Zaharia authored 11 years ago
  
Dec 28, 2013
- Fix Python use of getLocalDir · 1c11f54a
  Matei Zaharia authored 11 years ago
  
- Make Python function/line appear in the UI. · fec01664
  Tor Myklebust authored 11 years ago
  
Dec 25, 2013
- Remove commented code in __init__.py. · 9cbcf814
  Tor Myklebust authored 11 years ago
  
  9cbcf814