  1. Mar 07, 2014
    • Spark 1165 rdd.intersection in python and java · 6e730edc
      Prashant Sharma authored
      Author: Prashant Sharma <prashant.s@imaginea.com>
      Author: Prashant Sharma <scrapcodes@gmail.com>
      
      Closes #80 from ScrapCodes/SPARK-1165/RDD.intersection and squashes the following commits:
      
      9b015e9 [Prashant Sharma] Added a note, shuffle is required for intersection.
      1fea813 [Prashant Sharma] correct the lines wrapping
      d0c71f3 [Prashant Sharma] SPARK-1165 RDD.intersection in java
      d6effee [Prashant Sharma] SPARK-1165 Implemented RDD.intersection in python.
      6e730edc
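      The note that a shuffle is required for intersection can be illustrated without a cluster. The following is a minimal pure-Python sketch, not the PySpark implementation: `intersection_sketch` and `num_partitions` are hypothetical names, and plain lists stand in for RDDs. Elements are routed to a bucket by hash (the stand-in for the shuffle) so that equal elements from both inputs end up in the same place and can be compared locally.

```python
from collections import defaultdict


def intersection_sketch(left, right, num_partitions=4):
    """Return the distinct elements present in both inputs.

    Hypothetical illustration only: each input stands in for an RDD,
    and hashing into buckets stands in for the shuffle.
    """
    # "Shuffle": route every element to a bucket chosen by its hash,
    # remembering which side (0 = left, 1 = right) it came from.
    buckets = [defaultdict(set) for _ in range(num_partitions)]
    for side, data in enumerate((left, right)):
        for item in data:
            buckets[hash(item) % num_partitions][item].add(side)

    # Within each bucket, keep only items that were seen on both sides.
    result = []
    for bucket in buckets:
        result.extend(item for item, sides in bucket.items() if sides == {0, 1})
    return result


common = intersection_sketch([1, 10, 2, 3, 4, 5], [1, 6, 2, 3, 7, 8])
print(sorted(common))
```

      Because matching happens per bucket, no bucket needs to see the whole of either input, which is why the data has to be repartitioned first.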
  2. Mar 06, 2014
    • SPARK-1187, Added missing Python APIs · 3d3acef0
      Prabin Banka authored
      The following Python APIs are added:
      RDD.id()
      SparkContext.setJobGroup()
      SparkContext.setLocalProperty()
      SparkContext.getLocalProperty()
      SparkContext.sparkUser()
      
      This was raised earlier as part of apache/incubator-spark#486.
      
      Author: Prabin Banka <prabin.banka@imaginea.com>
      
      Closes #75 from prabinb/python-api-backup and squashes the following commits:
      
      cc3c6cd [Prabin Banka] Added missing Python APIs
      3d3acef0
  3. Mar 04, 2014
  4. Feb 26, 2014
    • SPARK-1115: Catch depickling errors · 12738c1a
      Bouke van der Bijl authored
      This surrounds the complete worker code in a try/except block so that we catch any error that occurs. An example would be depickling failing for some reason.
      
      @JoshRosen
      
      Author: Bouke van der Bijl <boukevanderbijl@gmail.com>
      
      Closes #644 from bouk/catch-depickling-errors and squashes the following commits:
      
      f0f67cc [Bouke van der Bijl] Lol indentation
      0e4d504 [Bouke van der Bijl] Surround the complete python worker with the try block
      12738c1a
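      The idea of wrapping the whole worker in one try/except block can be sketched as follows. This is an illustration under stated assumptions, not the actual worker.py: `run_worker`, `failing_process`, and the "ERROR:" prefix are hypothetical, and in-memory streams stand in for the sockets to the JVM.

```python
import io
import traceback


def run_worker(process, infile, outfile):
    """Run `process` and report any failure on `outfile`.

    Hypothetical stand-in for a worker entry point; the "ERROR:" prefix
    is an invented marker, not PySpark's wire protocol.
    """
    try:
        # Everything, including reading and deserializing the input,
        # happens inside the try block so no error escapes unreported.
        process(infile, outfile)
        return True
    except Exception:
        outfile.write("ERROR:" + traceback.format_exc())
        return False


def failing_process(infile, outfile):
    # Simulates a depickling failure on malformed input.
    raise ValueError("could not unpickle record")


out = io.StringIO()
ok = run_worker(failing_process, io.StringIO(""), out)
print(ok, out.getvalue().startswith("ERROR:"))
```

      Keeping deserialization inside the try block is the point: a failure before the user function even runs is still reported rather than silently killing the worker.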
  5. Feb 22, 2014
    • doctest updated for mapValues, flatMapValues in rdd.py · 722199fa
      jyotiska authored
      Updated doctests for mapValues and flatMapValues in rdd.py
      
      Author: jyotiska <jyotiska123@gmail.com>
      
      Closes #621 from jyotiska/python_spark and squashes the following commits:
      
      716f7cd [jyotiska] doctest updated for mapValues, flatMapValues in rdd.py
      722199fa
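      For readers unfamiliar with the two methods, their behavior can be sketched with doctests in plain Python. `map_values` and `flat_map_values` below are hypothetical stand-ins that operate on lists of pairs; they are not the rdd.py code or its doctests.

```python
def map_values(pairs, f):
    """Apply f to each value, leaving the keys unchanged.

    >>> map_values([("a", ["apple", "banana", "lemon"]), ("b", ["grapes"])], len)
    [('a', 3), ('b', 1)]
    """
    return [(key, f(value)) for key, value in pairs]


def flat_map_values(pairs, f):
    """Apply f to each value and emit one pair per element f returns.

    >>> flat_map_values([("a", ["x", "y", "z"]), ("b", ["p", "r"])], lambda v: v)
    [('a', 'x'), ('a', 'y'), ('a', 'z'), ('b', 'p'), ('b', 'r')]
    """
    return [(key, item) for key, value in pairs for item in f(value)]


if __name__ == "__main__":
    import doctest

    # Runs the examples embedded in the docstrings above.
    doctest.testmod()
```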
    • Fixed minor typo in worker.py · 3ff077d4
      jyotiska authored
      Fixed minor typo in worker.py
      
      Author: jyotiska <jyotiska123@gmail.com>
      
      Closes #630 from jyotiska/pyspark_code and squashes the following commits:
      
      ee44201 [jyotiska] typo fixed in worker.py
      3ff077d4
  6. Feb 20, 2014
    • SPARK-1114: Allow PySpark to use existing JVM and Gateway · 59b13795
      Ahir Reddy authored
      Patch to allow PySpark to use existing JVM and Gateway. Changes to PySpark implementation of SparkConf to take existing SparkConf JVM handle. Change to PySpark SparkContext to allow subclass specific context initialization.
      
      Author: Ahir Reddy <ahirreddy@gmail.com>
      
      Closes #622 from ahirreddy/pyspark-existing-jvm and squashes the following commits:
      
      a86f457 [Ahir Reddy] Patch to allow PySpark to use existing JVM and Gateway. Changes to PySpark implementation of SparkConf to take existing SparkConf JVM handle. Change to PySpark SparkContext to allow subclass specific context initialization.
      59b13795
  7. Feb 09, 2014
    • Merge pull request #562 from jyotiska/master. Closes #562. · 2ef37c93
      jyotiska authored
      Added example Python code for sort
      
      I added example Python code for sorting. PySpark currently has few examples for newcomers to the project. This example sorts integers stored in a file; I was able to sort 5 million, 10 million, and 25 million integers with it.
      
      Author: jyotiska <jyotiska123@gmail.com>
      
      == Merge branch commits ==
      
      commit 8ad8faf6c8e02ae1cd68565d98524edf165f54df
      Author: jyotiska <jyotiska123@gmail.com>
      Date:   Sun Feb 9 11:00:41 2014 +0530
      
          Added comments in code on collect() method
      
      commit 6f98f1e313f4472a7c2207d36c4f0fbcebc95a8c
      Author: jyotiska <jyotiska123@gmail.com>
      Date:   Sat Feb 8 13:12:37 2014 +0530
      
          Updated python example code sort.py
      
      commit 945e39a5d68daa7e5bab0d96cbd35d7c4b04eafb
      Author: jyotiska <jyotiska123@gmail.com>
      Date:   Sat Feb 8 12:59:09 2014 +0530
      
          Added example python code for sort
      2ef37c93
  8. Feb 08, 2014
    • Merge pull request #542 from markhamstra/versionBump. Closes #542. · c2341c92
      Mark Hamstra authored
      Version number to 1.0.0-SNAPSHOT
      
      Since 0.9.0-incubating is done and out the door, we shouldn't be building 0.9.0-incubating-SNAPSHOT anymore.
      
      @pwendell
      
      Author: Mark Hamstra <markhamstra@gmail.com>
      
      == Merge branch commits ==
      
      commit 1b00a8a7c1a7f251b4bb3774b84b9e64758eaa71
      Author: Mark Hamstra <markhamstra@gmail.com>
      Date:   Wed Feb 5 09:30:32 2014 -0800
      
          Version number to 1.0.0-SNAPSHOT
      c2341c92
  9. Feb 06, 2014
    • Merge pull request #498 from ScrapCodes/python-api. Closes #498. · 084839ba
      Prashant Sharma authored
      Python api additions
      
      Author: Prashant Sharma <prashant.s@imaginea.com>
      
      == Merge branch commits ==
      
      commit 8b51591f1a7a79a62c13ee66ff8d83040f7eccd8
      Author: Prashant Sharma <prashant.s@imaginea.com>
      Date:   Fri Jan 24 11:50:29 2014 +0530
      
          Josh's and Patrick's review comments.
      
      commit d37f9677838e43bef6c18ef61fbf08055ba6d1ca
      Author: Prashant Sharma <prashant.s@imaginea.com>
      Date:   Thu Jan 23 17:27:17 2014 +0530
      
          fixed doc tests
      
      commit 27cb54bf5c99b1ea38a73858c291d0a1c43d8b7c
      Author: Prashant Sharma <prashant.s@imaginea.com>
      Date:   Thu Jan 23 16:48:43 2014 +0530
      
          Added keys and values methods for PairFunctions in python
      
      commit 4ce76b396fbaefef2386d7a36d611572bdef9b5d
      Author: Prashant Sharma <prashant.s@imaginea.com>
      Date:   Thu Jan 23 13:51:26 2014 +0530
      
          Added foreachPartition
      
      commit 05f05341a187cba829ac0e6c2bdf30be49948c89
      Author: Prashant Sharma <prashant.s@imaginea.com>
      Date:   Thu Jan 23 13:02:59 2014 +0530
      
          Added coalesce function to python API
      
      commit 6568d2c2fa14845dc56322c0f39ba2e13b3b26dd
      Author: Prashant Sharma <prashant.s@imaginea.com>
      Date:   Thu Jan 23 12:52:44 2014 +0530
      
          added repartition function to python API.
      084839ba
  10. Jan 28, 2014
    • Switch from MUTF8 to UTF8 in PySpark serializers. · 1381fc72
      Josh Rosen authored
      This fixes SPARK-1043, a bug introduced in 0.9.0
      where PySpark couldn't serialize strings > 64kB.
      
      This fix was written by @tyro89 and @bouk in #512.
      This commit squashes and rebases their pull request
      in order to fix some merge conflicts.
      1381fc72
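      For context on the 64 kB limit: Java's `DataOutputStream.writeUTF` writes modified UTF-8 behind a 2-byte length prefix, so the encoded string cannot exceed 65535 bytes. The sketch below shows one way around that, a 4-byte length prefix followed by standard UTF-8 bytes. The framing and the names `write_utf8` and `read_utf8` are assumptions for illustration, not a copy of the PySpark serializer.

```python
import io
import struct


def write_utf8(text, stream):
    """Write text as a 4-byte big-endian length followed by UTF-8 bytes."""
    data = text.encode("utf-8")
    stream.write(struct.pack("!i", len(data)))
    stream.write(data)


def read_utf8(stream):
    """Read one string written by write_utf8."""
    (length,) = struct.unpack("!i", stream.read(4))
    return stream.read(length).decode("utf-8")


buf = io.BytesIO()
big = "x" * 100000  # larger than a 2-byte length prefix could describe
write_utf8(big, buf)
buf.seek(0)
restored = read_utf8(buf)
print(len(restored) == len(big))
```

      The length counts encoded bytes rather than characters, so non-ASCII text round-trips correctly as well.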
  11. Jan 23, 2014
  12. Jan 18, 2014
  13. Jan 14, 2014
  14. Jan 13, 2014
  15. Jan 12, 2014
    • Log Python exceptions to stderr as well · 5741078c
      Matei Zaharia authored
      This helps in case the exception happened while serializing a record to
      be sent to Java, leaving the stream to Java in an inconsistent state
      where PythonRDD won't be able to read the error.
      5741078c
    • Update some Python MLlib parameters to use camelCase, and tweak docs · 4c28a2ba
      Matei Zaharia authored
      We've used camel case in other Spark methods so it felt reasonable to
      keep using it here and make the code match Scala/Java as much as
      possible. Note that parameter names matter in Python because it allows
      passing optional parameters by name.
      4c28a2ba
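      A small example shows why parameter names are part of the API in Python. The `train` function and its parameters below are hypothetical and do not reproduce any MLlib signature; the point is only that callers can pass optional arguments by name, so a differently spelled name is rejected.

```python
def train(data, iterations=100, step=1.0, miniBatchFraction=1.0):
    """Hypothetical trainer that only echoes the settings it received."""
    return {
        "iterations": iterations,
        "step": step,
        "miniBatchFraction": miniBatchFraction,
    }


# Optional parameters can be passed by name, skipping the ones in between.
settings = train([1.0, 2.0], miniBatchFraction=0.5)
print(settings)

# A differently spelled name is not accepted, so renaming a parameter
# breaks every caller that used the old keyword.
try:
    train([1.0, 2.0], mini_batch_fraction=0.5)
except TypeError as err:
    print("rejected:", err)
```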
    • Add Naive Bayes to Python MLlib, and some API fixes · 9a0dfdf8
      Matei Zaharia authored
      - Added a Python wrapper for Naive Bayes
      - Updated the Scala Naive Bayes to match the style of our other
        algorithms better and in particular make it easier to call from Java
        (added builder pattern, removed default value in train method)
      - Updated Python MLlib functions to not require a SparkContext; we can
        get that from the RDD the user gives
      - Added a toString method in LabeledPoint
      - Made the Python MLlib tests run as part of run-tests as well (before
        they could only be run individually through each file)
      9a0dfdf8
  16. Jan 06, 2014
  17. Jan 04, 2014
  18. Jan 03, 2014
  19. Jan 02, 2014
  20. Jan 01, 2014
  21. Dec 30, 2013
  22. Dec 29, 2013
  23. Dec 28, 2013
  24. Dec 25, 2013