- Feb 26, 2014
-
-
Bouke van der Bijl authored
This surrounds the complete worker code in a try/except block so we catch any error that arrives. An example would be the depickling failing for some reason. @JoshRosen
Author: Bouke van der Bijl <boukevanderbijl@gmail.com>
Closes #644 from bouk/catch-depickling-errors and squashes the following commits:
f0f67cc [Bouke van der Bijl] Lol indentation
0e4d504 [Bouke van der Bijl] Surround the complete python worker with the try block
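The idea in this commit can be sketched roughly as follows. This is not the actual `worker.py` code: `run_worker`, the `("ok", ...)` / `("error", ...)` reply tuples, and the use of in-memory streams are all assumptions made for illustration. The point it shows is that deserialization sits inside the `try` block, so a failed unpickle is reported the same way as a failure in the task itself.

```python
import io
import pickle
import traceback


def run_worker(infile, outfile):
    """Hypothetical worker body: read one pickled (func, args) task, run it, reply."""
    try:
        # Unpickling happens inside the try block, so a corrupt payload
        # produces an error reply instead of killing the worker silently.
        func, args = pickle.load(infile)
        result = func(*args)
        pickle.dump(("ok", result), outfile)
    except Exception:
        # Send the formatted traceback back to the caller.
        pickle.dump(("error", traceback.format_exc()), outfile)


# A well-formed task and a payload that cannot be unpickled.
good_in = io.BytesIO(pickle.dumps((sum, ([1, 2, 3],))))
good_out = io.BytesIO()
run_worker(good_in, good_out)

bad_in = io.BytesIO(b"\x00not a pickle")
bad_out = io.BytesIO()
run_worker(bad_in, bad_out)
```

If only the call to `func` were wrapped, the second case would raise before any reply was written.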
-
- Feb 22, 2014
-
-
jyotiska authored
Updated doctests for mapValues and flatMapValues in rdd.py
Author: jyotiska <jyotiska123@gmail.com>
Closes #621 from jyotiska/python_spark and squashes the following commits:
716f7cd [jyotiska] doctest updated for mapValues, flatMapValues in rdd.py
-
jyotiska authored
Fixed minor typo in worker.py
Author: jyotiska <jyotiska123@gmail.com>
Closes #630 from jyotiska/pyspark_code and squashes the following commits:
ee44201 [jyotiska] typo fixed in worker.py
-
- Feb 20, 2014
-
-
Ahir Reddy authored
Patch to allow PySpark to use existing JVM and Gateway. Changes to PySpark implementation of SparkConf to take existing SparkConf JVM handle. Change to PySpark SparkContext to allow subclass specific context initialization.
Author: Ahir Reddy <ahirreddy@gmail.com>
Closes #622 from ahirreddy/pyspark-existing-jvm and squashes the following commits:
a86f457 [Ahir Reddy] Patch to allow PySpark to use existing JVM and Gateway. Changes to PySpark implementation of SparkConf to take existing SparkConf JVM handle. Change to PySpark SparkContext to allow subclass specific context initialization.
-
- Feb 09, 2014
-
-
jyotiska authored
Added example Python code for sort
I added an example Python code for sort. Right now, PySpark has limited examples for new people willing to use the project. This example code sorts integers stored in a file. I was able to sort 5 million, 10 million and 25 million integers with this code.
Author: jyotiska <jyotiska123@gmail.com>
== Merge branch commits ==
commit 8ad8faf6c8e02ae1cd68565d98524edf165f54df Author: jyotiska <jyotiska123@gmail.com> Date: Sun Feb 9 11:00:41 2014 +0530 Added comments in code on collect() method
commit 6f98f1e313f4472a7c2207d36c4f0fbcebc95a8c Author: jyotiska <jyotiska123@gmail.com> Date: Sat Feb 8 13:12:37 2014 +0530 Updated python example code sort.py
commit 945e39a5d68daa7e5bab0d96cbd35d7c4b04eafb Author: jyotiska <jyotiska123@gmail.com> Date: Sat Feb 8 12:59:09 2014 +0530 Added example python code for sort
-
- Feb 08, 2014
-
-
Mark Hamstra authored
Version number to 1.0.0-SNAPSHOT
Since 0.9.0-incubating is done and out the door, we shouldn't be building 0.9.0-incubating-SNAPSHOT anymore. @pwendell
Author: Mark Hamstra <markhamstra@gmail.com>
== Merge branch commits ==
commit 1b00a8a7c1a7f251b4bb3774b84b9e64758eaa71 Author: Mark Hamstra <markhamstra@gmail.com> Date: Wed Feb 5 09:30:32 2014 -0800 Version number to 1.0.0-SNAPSHOT
-
- Feb 06, 2014
-
-
Prashant Sharma authored
Python api additions
Author: Prashant Sharma <prashant.s@imaginea.com>
== Merge branch commits ==
commit 8b51591f1a7a79a62c13ee66ff8d83040f7eccd8 Author: Prashant Sharma <prashant.s@imaginea.com> Date: Fri Jan 24 11:50:29 2014 +0530 Josh's and Patrick's review comments.
commit d37f9677838e43bef6c18ef61fbf08055ba6d1ca Author: Prashant Sharma <prashant.s@imaginea.com> Date: Thu Jan 23 17:27:17 2014 +0530 fixed doc tests
commit 27cb54bf5c99b1ea38a73858c291d0a1c43d8b7c Author: Prashant Sharma <prashant.s@imaginea.com> Date: Thu Jan 23 16:48:43 2014 +0530 Added keys and values methods for PairFunctions in python
commit 4ce76b396fbaefef2386d7a36d611572bdef9b5d Author: Prashant Sharma <prashant.s@imaginea.com> Date: Thu Jan 23 13:51:26 2014 +0530 Added foreachPartition
commit 05f05341a187cba829ac0e6c2bdf30be49948c89 Author: Prashant Sharma <prashant.s@imaginea.com> Date: Thu Jan 23 13:02:59 2014 +0530 Added coalesce function to python API
commit 6568d2c2fa14845dc56322c0f39ba2e13b3b26dd Author: Prashant Sharma <prashant.s@imaginea.com> Date: Thu Jan 23 12:52:44 2014 +0530 added repartition function to python API.
-
- Jan 28, 2014
-
-
Josh Rosen authored
This fixes SPARK-1043, a bug introduced in 0.9.0 where PySpark couldn't serialize strings > 64kB. This fix was written by @tyro89 and @bouk in #512. This commit squashes and rebases their pull request in order to fix some merge conflicts.
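The commit message does not say where the 64 kB ceiling came from or how it was lifted. One common cause of such a ceiling is a 2-byte length prefix, which can describe at most 65,535 bytes of payload. The sketch below assumes that cause and shows a 4-byte length prefix instead; `write_framed` and `read_framed` are hypothetical names, not PySpark's serializer API.

```python
import io
import struct


def write_framed(data: bytes, stream) -> None:
    """Write a 4-byte big-endian length header followed by the payload."""
    stream.write(struct.pack("!i", len(data)))
    stream.write(data)


def read_framed(stream) -> bytes:
    """Read one length-prefixed payload written by write_framed."""
    header = stream.read(4)
    if len(header) < 4:
        raise EOFError("stream ended before a full length header")
    (length,) = struct.unpack("!i", header)
    payload = stream.read(length)
    if len(payload) < length:
        raise EOFError("stream ended before the full payload")
    return payload


# Round-trip a string well past the 65,535-byte mark.
big = ("x" * 100_000).encode("utf-8")
buf = io.BytesIO()
write_framed(big, buf)
buf.seek(0)
restored = read_framed(buf)
```

A signed 32-bit header raises the per-record limit to roughly 2 GB, at the cost of two extra bytes per record.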
-
- Jan 23, 2014
-
-
Josh Rosen authored
Also, replace the last reference to it in the docs. This fixes SPARK-1026.
-
Josh Rosen authored
-
Josh Rosen authored
-
Josh Rosen authored
-
- Jan 18, 2014
-
-
Patrick Wendell authored
Remove Typesafe Config usage and conf files to fix nested property names
With Typesafe Config we had the subtle problem of no longer allowing nested property names, which are used for a few of our properties: http://apache-spark-developers-list.1001551.n3.nabble.com/Config-properties-broken-in-master-td208.html
This PR is for branch 0.9 but should be added into master too.
(cherry picked from commit 34e911ce)
Signed-off-by: Patrick Wendell <pwendell@gmail.com>
-
- Jan 14, 2014
-
-
Matei Zaharia authored
-
Matei Zaharia authored
-
- Jan 13, 2014
-
-
Matei Zaharia authored
-
- Jan 12, 2014
-
-
Matei Zaharia authored
This helps in case the exception happened while serializing a record to be sent to Java, leaving the stream to Java in an inconsistent state where PythonRDD won't be able to read the error.
-
Matei Zaharia authored
We've used camel case in other Spark methods so it felt reasonable to keep using it here and make the code match Scala/Java as much as possible. Note that parameter names matter in Python because it allows passing optional parameters by name.
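To illustrate why parameter names are part of a Python function's public interface, here is a small sketch. The `train` function and its parameters are made up for this example and are not taken from PySpark; the point is only that callers who pass optional arguments by keyword depend on the exact spelling.

```python
def train(data, numIterations=100, stepSize=1.0):
    """Hypothetical training entry point with camelCase optional parameters."""
    return {
        "count": len(data),
        "numIterations": numIterations,
        "stepSize": stepSize,
    }


# Callers can skip numIterations and set only stepSize by name.
model = train([1.0, 2.0, 3.0], stepSize=0.5)

# A caller using a snake_case spelling that the function does not declare
# fails at call time, which is why the chosen names have to stay stable.
try:
    train([1.0, 2.0, 3.0], step_size=0.5)
except TypeError as exc:
    rename_error = str(exc)
```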
-
Matei Zaharia authored
  - Added a Python wrapper for Naive Bayes
  - Updated the Scala Naive Bayes to match the style of our other algorithms better and in particular make it easier to call from Java (added builder pattern, removed default value in train method)
  - Updated Python MLlib functions to not require a SparkContext; we can get that from the RDD the user gives
  - Added a toString method in LabeledPoint
  - Made the Python MLlib tests run as part of run-tests as well (before they could only be run individually through each file)
-
- Jan 06, 2014
-
-
Hossein Falaki authored
-
Hossein Falaki authored
-
- Jan 04, 2014
-
-
Hossein Falaki authored
-
- Jan 03, 2014
-
-
Patrick Wendell authored
Closes #316
-
Prashant Sharma authored
-
Prashant Sharma authored
-
- Jan 02, 2014
-
-
Prashant Sharma authored
-
- Jan 01, 2014
-
-
Matei Zaharia authored
-
Matei Zaharia authored
Also replaced SparkConf.getOrElse with just a "get" that takes a default value, and added getInt, getLong, etc to make code that uses this simpler later on.
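The accessor style described here (a `get` that takes a default, plus typed helpers) might look like the following in Python. `SimpleConf` and its method names are hypothetical stand-ins written for this sketch; they are not the real SparkConf class.

```python
class SimpleConf:
    """Hypothetical string-keyed configuration with default-aware getters."""

    def __init__(self):
        self._settings = {}

    def set(self, key, value):
        # Values are stored as strings, as they would be in a properties file.
        self._settings[key] = str(value)
        return self  # allow chained set() calls

    def get(self, key, default=None):
        # One accessor with a fallback, replacing a separate getOrElse.
        return self._settings.get(key, default)

    def get_int(self, key, default):
        # Typed helper so call sites do not repeat the conversion.
        return int(self.get(key, default))


conf = SimpleConf().set("spark.cores.max", 4)
cores = conf.get_int("spark.cores.max", 1)
retries = conf.get_int("spark.task.maxFailures", 4)
name = conf.get("spark.app.name", "unnamed")
```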
-
- Dec 30, 2013
-
-
Matei Zaharia authored
-
- Dec 29, 2013
-
-
Matei Zaharia authored
-
Matei Zaharia authored
tests so we don't get the test spark.conf on the classpath.
-
Matei Zaharia authored
-
Matei Zaharia authored
The test in context.py created two different instances of the SparkContext class by copying "globals", so that some tests can have a global "sc" object and others can try initializing their own contexts. This led to two JVM gateways being created since SparkConf also looked at pyspark.context.SparkContext to get the JVM.
-
Matei Zaharia authored
-
- Dec 28, 2013
-
-
Matei Zaharia authored
-
Tor Myklebust authored
-
- Dec 25, 2013
-
-
Tor Myklebust authored
-
Tor Myklebust authored
-
- Dec 24, 2013
-
-
Tor Myklebust authored
-
Tor Myklebust authored
-