  1. May 17, 2016
    • [SPARK-14906][ML] Copy linalg in PySpark to new ML package · 8ad9f08c
      Xiangrui Meng authored
      ## What changes were proposed in this pull request?
      
      Copy the linalg types (Vector/Matrix and VectorUDT/MatrixUDT) in PySpark to the new ML package.
      
      ## How was this patch tested?
      Existing tests.
      
      Author: Xiangrui Meng <meng@databricks.com>
      Author: Liang-Chi Hsieh <simonh@tw.ibm.com>
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #13099 from viirya/move-pyspark-vector-matrix-udt4.
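      A minimal sketch of the copied API (assuming Spark 2.0+, where `pyspark.ml.linalg` ships; it mirrors `pyspark.mllib.linalg`):

      ```python
      # Sketch, assuming Spark >= 2.0: the linalg types copied by this PR live
      # under pyspark.ml.linalg, mirroring the older pyspark.mllib.linalg API.
      from pyspark.ml.linalg import Vectors, Matrices

      dense = Vectors.dense([1.0, 0.0, 3.0])            # DenseVector
      sparse = Vectors.sparse(3, [0, 2], [1.0, 3.0])    # SparseVector: size, indices, values
      mat = Matrices.dense(2, 2, [1.0, 2.0, 3.0, 4.0])  # column-major DenseMatrix

      print(dense.dot(sparse))  # 10.0
      print(mat.toArray())
      ```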
  2. May 13, 2016
  3. Apr 14, 2016
    • [SPARK-14573][PYSPARK][BUILD] Fix PyDoc Makefile & highlighting issues · 478af2f4
      Holden Karau authored
      ## What changes were proposed in this pull request?
      
      The PyDoc Makefile used "=" rather than "?=" for setting env variables, so it overwrote the user's values. This ignored the environment variables we set for linting, allowing warnings through. This PR also fixes the warnings that had been introduced.
      
      ## How was this patch tested?
      
      manual local export & make
      
      Author: Holden Karau <holden@us.ibm.com>
      
      Closes #12336 from holdenk/SPARK-14573-fix-pydoc-makefile.
  4. Mar 26, 2016
    • [SPARK-13874][DOC] Remove docs of streaming-akka, streaming-zeromq, streaming-mqtt and streaming-twitter · d23ad7c1
      Shixiong Zhu authored
      
      ## What changes were proposed in this pull request?
      
      This PR removes all docs for the old streaming-akka, streaming-zeromq, streaming-mqtt and streaming-twitter projects, since I have already copied them to https://github.com/spark-packages

      Also removes mqtt_wordcount.py, which I forgot to remove previously.
      
      ## How was this patch tested?
      
      Jenkins PR Build.
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #11824 from zsxwing/remove-doc.
  5. Mar 14, 2016
  6. Feb 12, 2016
    • [SPARK-13154][PYTHON] Add linting for pydocs · 64515e5f
      Holden Karau authored
      We should have lint rules using sphinx to automatically catch the pydoc issues that are sometimes introduced.
      
      Right now ./dev/lint-python skips building the docs if sphinx isn't present, but it might make sense to fail hard; it's just a matter of whether we want to insist that all PySpark developers have sphinx present.
      
      Author: Holden Karau <holden@us.ibm.com>
      
      Closes #11109 from holdenk/SPARK-13154-add-pydoc-lint-for-docs.
  7. Jan 12, 2016
  8. Oct 20, 2015
  9. Sep 29, 2015
  10. Sep 05, 2015
  11. Aug 04, 2015
    • [SPARK-6485] [MLLIB] [PYTHON] Add CoordinateMatrix/RowMatrix/IndexedRowMatrix to PySpark. · 571d5b53
      Mike Dusenberry authored
      This PR adds the RowMatrix, IndexedRowMatrix, and CoordinateMatrix distributed matrices to PySpark.  Each distributed matrix class acts as a wrapper around the Scala/Java counterpart by maintaining a reference to the Java object.  New distributed matrices can be created using factory methods added to DistributedMatrices, which creates the Java distributed matrix and then wraps it with the corresponding PySpark class.  This design allows for simple conversion between the various distributed matrices, and lets us re-use the Scala code.  Serialization between Python and Java is implemented using DataFrames as needed for IndexedRowMatrix and CoordinateMatrix for simplicity.  Associated documentation and unit-tests have also been added.  To facilitate code review, this PR implements access to the rows/entries as RDDs, the number of rows & columns, and conversions between the various distributed matrices (not including BlockMatrix), and does not implement the other linear algebra functions of the matrices, although this will be very simple to add now.
      
      Author: Mike Dusenberry <mwdusenb@us.ibm.com>
      
      Closes #7554 from dusenberrymw/SPARK-6485_Add_CoordinateMatrix_RowMatrix_IndexedMatrix_to_PySpark and squashes the following commits:
      
      bb039cb [Mike Dusenberry] Minor documentation update.
      b887c18 [Mike Dusenberry] Updating the matrix conversion logic again to make it even cleaner.  Now, we allow the 'rows' parameter in the constructors to be either an RDD or the Java matrix object. If 'rows' is an RDD, we create a Java matrix object, wrap it, and then store that.  If 'rows' is a Java matrix object of the correct type, we just wrap and store that directly.  This is only for internal usage, and publicly, we still require 'rows' to be an RDD.  We no longer store the 'rows' RDD, and instead just compute it from the Java object when needed.  The point of this is that when we do matrix conversions, we do the conversion on the Scala/Java side, which returns a Java object, so we should use that directly, but exposing 'java_matrix' parameter in the public API is not ideal. This non-public feature of allowing 'rows' to be a Java matrix object is documented in the '__init__' constructor docstrings, which are not part of the generated public API, and doctests are also included.
      7f0dcb6 [Mike Dusenberry] Updating module docstring.
      cfc1be5 [Mike Dusenberry] Use 'new SQLContext(matrix.rows.sparkContext)' rather than 'SQLContext.getOrCreate', as the latter doesn't guarantee that the SparkContext will be the same as for the matrix.rows data.
      687e345 [Mike Dusenberry] Improving conversion performance.  This adds an optional 'java_matrix' parameter to the constructors, and pulls the conversion logic out into a '_create_from_java' function. Now, if the constructors are given a valid Java distributed matrix object as 'java_matrix', they will store those internally, rather than create a new one on the Scala/Java side.
      3e50b6e [Mike Dusenberry] Moving the distributed matrices to pyspark.mllib.linalg.distributed.
      308f197 [Mike Dusenberry] Using properties for better documentation.
      1633f86 [Mike Dusenberry] Minor documentation cleanup.
      f0c13a7 [Mike Dusenberry] CoordinateMatrix should inherit from DistributedMatrix.
      ffdd724 [Mike Dusenberry] Updating doctests to make documentation cleaner.
      3fd4016 [Mike Dusenberry] Updating docstrings.
      27cd5f6 [Mike Dusenberry] Simplifying input conversions in the constructors for each distributed matrix.
      a409cf5 [Mike Dusenberry] Updating doctests to be less verbose by using lists instead of DenseVectors explicitly.
      d19b0ba [Mike Dusenberry] Updating code and documentation to note that a vector-like object (numpy array, list, etc.) can be used in place of explicit Vector object, and adding conversions when necessary to RowMatrix construction.
      4bd756d [Mike Dusenberry] Adding param documentation to IndexedRow and MatrixEntry.
      c6bded5 [Mike Dusenberry] Move conversion logic from tuples to IndexedRow or MatrixEntry types from within the IndexedRowMatrix and CoordinateMatrix constructors to separate _convert_to_indexed_row and _convert_to_matrix_entry functions.
      329638b [Mike Dusenberry] Moving the Experimental tag to the top of each docstring.
      0be6826 [Mike Dusenberry] Simplifying doctests by removing duplicated rows/entries RDDs within the various tests.
      c0900df [Mike Dusenberry] Adding the colons that were accidentally not inserted.
      4ad6819 [Mike Dusenberry] Documenting the  and  parameters.
      3b854b9 [Mike Dusenberry] Minor updates to documentation.
      10046e8 [Mike Dusenberry] Updating documentation to use class constructors instead of the removed DistributedMatrices factory methods.
      119018d [Mike Dusenberry] Adding static  methods to each of the distributed matrix classes to consolidate conversion logic.
      4d7af86 [Mike Dusenberry] Adding type checks to the constructors.  Although it is slightly verbose, it is better for the user to have a good error message than a cryptic stacktrace.
      93b6a3d [Mike Dusenberry] Pulling the DistributedMatrices Python class out of this pull request.
      f6f3c68 [Mike Dusenberry] Pulling the DistributedMatrices Scala class out of this pull request.
      6a3ecb7 [Mike Dusenberry] Updating pattern matching.
      08f287b [Mike Dusenberry] Slight reformatting of the documentation.
      a245dc0 [Mike Dusenberry] Updating Python doctests for compatibility between Python 2 & 3. Since Python 3 removed the idea of a separate 'long' type, all values that would have been output as a 'long' (ex: '4L') will now be treated as an 'int' and output as one (ex: '4').  The doctests now explicitly convert to ints so that both Python 2 and 3 will have the same output.  This is fine since the values are all small, and thus can be easily represented as ints.
      4d3a37e [Mike Dusenberry] Reformatting a few long Python doctest lines.
      7e3ca16 [Mike Dusenberry] Fixing long lines.
      f721ead [Mike Dusenberry] Updating documentation for each of the distributed matrices.
      ab0e8b6 [Mike Dusenberry] Updating unit test to be more useful.
      dda2f89 [Mike Dusenberry] Added wrappers for the conversions between the various distributed matrices.  Added logic to be able to access the rows/entries of the distributed matrices, which requires serialization through DataFrames for IndexedRowMatrix and CoordinateMatrix types. Added unit tests.
      0cd7166 [Mike Dusenberry] Implemented the CoordinateMatrix API in PySpark, following the idea of the IndexedRowMatrix API, including using DataFrames for serialization.
      3c369cb [Mike Dusenberry] Updating the architecture a bit to make conversions between the various distributed matrix types easier.  The different distributed matrix classes are now only wrappers around the Java objects, and take the Java object as an argument during construction.  This way, we can call  for example on an , which returns a reference to a Java RowMatrix object, and then construct a PySpark RowMatrix object wrapped around the Java object.  This is analogous to the behavior of PySpark RDDs and DataFrames.  We now delegate creation of the various distributed matrices from scratch in PySpark to the factory methods on .
      4bdd09b [Mike Dusenberry] Implemented the IndexedRowMatrix API in PySpark, following the idea of the RowMatrix API.  Note that for the IndexedRowMatrix, we use DataFrames to serialize the data between Python and Scala/Java, so we accept PySpark RDDs, then convert to a DataFrame, then convert back to RDDs on the Scala/Java side before constructing the IndexedRowMatrix.
      23bf1ec [Mike Dusenberry] Updating documentation to add PySpark RowMatrix. Inserting newline above doctest so that it renders properly in API docs.
      b194623 [Mike Dusenberry] Updating design to have a PySpark RowMatrix simply create and keep a reference to a wrapper over a Java RowMatrix.  Updating DistributedMatrices factory methods to accept numRows and numCols with default values.  Updating PySpark DistributedMatrices factory method to simply create a PySpark RowMatrix. Adding additional doctests for numRows and numCols parameters.
      bc2d220 [Mike Dusenberry] Adding unit tests for RowMatrix methods.
      d7e316f [Mike Dusenberry] Implemented the RowMatrix API in PySpark by doing the following: Added a DistributedMatrices class to contain factory methods for creating the various distributed matrices.  Added a factory method for creating a RowMatrix from an RDD of Vectors.  Added a createRowMatrix function to the PythonMLlibAPI to interface with the factory method.  Added DistributedMatrix, DistributedMatrices, and RowMatrix classes to the pyspark.mllib.linalg api.
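      A short usage sketch of the API this entry adds (a sketch, assuming the Spark 1.5+ module path pyspark.mllib.linalg.distributed and a live SparkContext `sc`):

      ```python
      from pyspark.mllib.linalg.distributed import (
          RowMatrix, IndexedRowMatrix, IndexedRow, CoordinateMatrix, MatrixEntry)

      # RowMatrix: an RDD of vector-like rows without meaningful row indices.
      row_mat = RowMatrix(sc.parallelize([[1.0, 2.0], [3.0, 4.0]]))
      print(row_mat.numRows(), row_mat.numCols())  # 2 2

      # IndexedRowMatrix: rows carry long indices; convertible to a RowMatrix.
      indexed = IndexedRowMatrix(sc.parallelize(
          [IndexedRow(0, [1.0, 2.0]), IndexedRow(1, [3.0, 4.0])]))
      as_rows = indexed.toRowMatrix()

      # CoordinateMatrix: an RDD of (i, j, value) entries, for very sparse data.
      coord = CoordinateMatrix(sc.parallelize(
          [MatrixEntry(0, 0, 1.0), MatrixEntry(1, 1, 4.0)]))
      print(coord.toIndexedRowMatrix().numRows())  # 2
      ```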
  12. Jul 17, 2015
    • [SPARK-7879] [MLLIB] KMeans API for spark.ml Pipelines · 34a889db
      Yu ISHIKAWA authored
      I implemented the KMeans API for spark.ml Pipelines, but it doesn't include the clustering abstractions for spark.ml (SPARK-7610); those would fit better in another issue, and I'll try them later, since we are adding the hierarchical clustering algorithms in a separate issue. Thanks.
      
      [SPARK-7879] KMeans API for spark.ml Pipelines - ASF JIRA https://issues.apache.org/jira/browse/SPARK-7879
      
      Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
      
      Closes #6756 from yu-iskw/SPARK-7879 and squashes the following commits:
      
      be752de [Yu ISHIKAWA] Add assertions
      a14939b [Yu ISHIKAWA] Fix the dashed line's length in pyspark.ml.rst
      4c61693 [Yu ISHIKAWA] Remove the test about whether "features" and "prediction" columns exist or not in Python
      fb2417c [Yu ISHIKAWA] Use getInt, instead of get
      f397be4 [Yu ISHIKAWA] Switch the comparisons.
      ca78b7d [Yu ISHIKAWA] Add the Scala docs about the constraints of each parameter.
      effc650 [Yu ISHIKAWA] Using expertSetParam and expertGetParam
      c8dc6e6 [Yu ISHIKAWA] Remove an unnecessary test
      19a9d63 [Yu ISHIKAWA] Include spark.ml.clustering to python tests
      1abb19c [Yu ISHIKAWA] Add the statements about spark.ml.clustering into pyspark.ml.rst
      f8338bc [Yu ISHIKAWA] Add the placeholders in Python
      4a03003 [Yu ISHIKAWA] Test for contains in Python
      6566c8b [Yu ISHIKAWA] Use `get`, instead of `apply`
      288e8d5 [Yu ISHIKAWA] Using `contains` to check the column names
      5a7d574 [Yu ISHIKAWA] Rename `validateInitializationMode` to `validateInitMode` and remove throwing exception
      97cfae3 [Yu ISHIKAWA] Fix the type of return value of `KMeans.copy`
      e933723 [Yu ISHIKAWA] Remove the default value of seed from the Model class
      978ee2c [Yu ISHIKAWA] Modify the docs of KMeans, according to mllib's KMeans
      2ec80bc [Yu ISHIKAWA] Fit on 1 line
      e186be1 [Yu ISHIKAWA] Make a few variables, setters and getters be expert ones
      b2c205c [Yu ISHIKAWA] Rename the method `getInitializationSteps` to `getInitSteps` and `setInitializationSteps` to `setInitSteps` in Scala and Python
      f43f5b4 [Yu ISHIKAWA] Rename the method `getInitializationMode` to `getInitMode` and `setInitializationMode` to `setInitMode` in Scala and Python
      3cb5ba4 [Yu ISHIKAWA] Modify the description about epsilon and the validation
      4fa409b [Yu ISHIKAWA] Add a comment about the default value of epsilon
      2f392e1 [Yu ISHIKAWA] Make some variables `final` and Use `IntParam` and `DoubleParam`
      19326f8 [Yu ISHIKAWA] Use `udf`, instead of callUDF
      4d2ad1e [Yu ISHIKAWA] Modify the indentations
      0ae422f [Yu ISHIKAWA] Add a test for `setParams`
      4ff7913 [Yu ISHIKAWA] Add "ml.clustering" to `javacOptions` in SparkBuild.scala
      11ffdf1 [Yu ISHIKAWA] Use `===` and the variable
      220a176 [Yu ISHIKAWA] Set a random seed in the unit testing
      92c3efc [Yu ISHIKAWA] Make the points for a test be fewer
      c758692 [Yu ISHIKAWA] Modify the parameters of KMeans in Python
      6aca147 [Yu ISHIKAWA] Add some unit testings to validate the setter methods
      687cacc [Yu ISHIKAWA] Alias mllib.KMeans as MLlibKMeans in KMeansSuite.scala
      a4dfbef [Yu ISHIKAWA] Modify the last brace and indentations
      5bedc51 [Yu ISHIKAWA] Remove an extra new line
      444c289 [Yu ISHIKAWA] Add the validation for `runs`
      e41989c [Yu ISHIKAWA] Modify how to validate `initStep`
      7ea133a [Yu ISHIKAWA] Change how to validate `initMode`
      7991e15 [Yu ISHIKAWA] Add a validation for `k`
      c2df35d [Yu ISHIKAWA] Make `predict` private
      93aa2ff [Yu ISHIKAWA] Use `withColumn` in `transform`
      d3a79f7 [Yu ISHIKAWA] Remove the inherited docs
      e9532e1 [Yu ISHIKAWA] make `parentModel` of KMeansModel private
      8559772 [Yu ISHIKAWA] Remove the `paramMap` parameter of KMeans
      6684850 [Yu ISHIKAWA] Rename `initializationSteps` to `initSteps`
      99b1b96 [Yu ISHIKAWA] Rename `initializationMode` to `initMode`
      79ea82b [Yu ISHIKAWA] Modify the parameters of KMeans docs
      6569bcd [Yu ISHIKAWA] Change how to set the default values with `setDefault`
      20a795a [Yu ISHIKAWA] Change how to set the default values with `setDefault`
      11c2a12 [Yu ISHIKAWA] Limit the imports
      badb481 [Yu ISHIKAWA] Alias spark.mllib.{KMeans, KMeansModel}
      f80319a [Yu ISHIKAWA] Rebase master branch and add copy methods
      85d92b1 [Yu ISHIKAWA] Add `KMeans.setPredictionCol`
      aa9469d [Yu ISHIKAWA] Fix a python test suite error caused by python 3.x
      c2d6bcb [Yu ISHIKAWA] ADD Java test suites of the KMeans API for spark.ml Pipeline
      598ed2e [Yu ISHIKAWA] Implement the KMeans API for spark.ml Pipelines in Python
      63ad785 [Yu ISHIKAWA] Implement the KMeans API for spark.ml Pipelines in Scala
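      A usage sketch of the resulting Python API (parameter spellings per the pyspark.ml.clustering module in later releases; `sqlContext` is an existing SQLContext):

      ```python
      from pyspark.ml.clustering import KMeans
      from pyspark.mllib.linalg import Vectors

      df = sqlContext.createDataFrame(
          [(Vectors.dense([0.0, 0.0]),), (Vectors.dense([1.0, 1.0]),),
           (Vectors.dense([9.0, 8.0]),), (Vectors.dense([8.0, 9.0]),)],
          ["features"])

      kmeans = KMeans(k=2, seed=1)  # initMode/initSteps are the renamed params above
      model = kmeans.fit(df)        # Estimator -> KMeansModel
      model.transform(df).show()    # appends the default "prediction" column
      ```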
  13. May 14, 2015
    • [SPARK-7619] [PYTHON] fix docstring signature · 48fc38f5
      Xiangrui Meng authored
      Just realized that we need `\` at the end of the docstring. brkyvz
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #6161 from mengxr/SPARK-7619 and squashes the following commits:
      
      e44495f [Xiangrui Meng] fix docstring signature
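      For context: PySpark embeds the signature as the first line of a docstring, and the trailing `\` is a string-literal line continuation, so a wrapped signature still renders as one line in Sphinx. A hypothetical illustration (the class and params here are made up):

      ```python
      class ExampleEstimator(object):
          def setParams(self, minSupport=0.3, numPartitions=-1):
              """
              setParams(self, minSupport=0.3, \
                        numPartitions=-1)
              Sets params for this (hypothetical) estimator.
              """
              return self
      ```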
  14. May 12, 2015
    • [SPARK-7572] [MLLIB] do not import Param/Params under pyspark.ml · 77f64c73
      Xiangrui Meng authored
      Remove `Param` and `Params` from `pyspark.ml` and add a section in the doc. brkyvz
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #6094 from mengxr/SPARK-7572 and squashes the following commits:
      
      022abd6 [Xiangrui Meng] do not import Param/Params under spark.ml
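      After this change the classes are imported from the param module instead (a one-line sketch):

      ```python
      from pyspark.ml.param import Param, Params  # correct location
      # from pyspark.ml import Param, Params      # no longer works after SPARK-7572
      ```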
    • [SPARK-7487] [ML] Feature Parity in PySpark for ml.regression · 8e935b0a
      Burak Yavuz authored
      Added LinearRegression Python API
      
      Author: Burak Yavuz <brkyvz@gmail.com>
      
      Closes #6016 from brkyvz/ml-reg and squashes the following commits:
      
      11c9ef9 [Burak Yavuz] address comments
      1027a40 [Burak Yavuz] fix typo
      4c699ad [Burak Yavuz] added tree regressor api
      8afead2 [Burak Yavuz] made mixin for DT
      fa51c74 [Burak Yavuz] save additions
      0640d48 [Burak Yavuz] added ml.regression
      82aac48 [Burak Yavuz] added linear regression
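      A usage sketch of the added API (assuming the Spark 1.4-era pyspark.ml.regression module and an existing SQLContext `sqlContext`):

      ```python
      from pyspark.ml.regression import LinearRegression
      from pyspark.mllib.linalg import Vectors

      df = sqlContext.createDataFrame(
          [(1.0, Vectors.dense([0.0])), (3.0, Vectors.dense([2.0]))],
          ["label", "features"])

      lr = LinearRegression(maxIter=10, regParam=0.0)
      model = lr.fit(df)
      print(model.weights, model.intercept)  # "weights" became "coefficients" in 2.0
      model.transform(df).show()             # adds a "prediction" column
      ```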
  15. May 05, 2015
    • [SPARK-7333] [MLLIB] Add BinaryClassificationEvaluator to PySpark · ee374e89
      Xiangrui Meng authored
      This PR adds `BinaryClassificationEvaluator` to Python ML Pipelines API, which is a simple wrapper of the Scala implementation. oefirouz
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #5885 from mengxr/SPARK-7333 and squashes the following commits:
      
      25d7451 [Xiangrui Meng] fix tests in python 3
      babdde7 [Xiangrui Meng] fix doc
      cb51e6a [Xiangrui Meng] add BinaryClassificationEvaluator in PySpark
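      A usage sketch of the wrapper (assuming an existing SQLContext `sqlContext`; column names are the API defaults):

      ```python
      from pyspark.ml.evaluation import BinaryClassificationEvaluator
      from pyspark.mllib.linalg import Vectors

      scores = sqlContext.createDataFrame(
          [(Vectors.dense([0.1, 0.9]), 1.0), (Vectors.dense([0.8, 0.2]), 0.0)],
          ["rawPrediction", "label"])

      evaluator = BinaryClassificationEvaluator(metricName="areaUnderROC")
      print(evaluator.evaluate(scores))  # 1.0 for this perfectly separated toy set
      ```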
  16. Apr 09, 2015
    • [SPARK-6264] [MLLIB] Support FPGrowth algorithm in Python API · a0411aeb
      Yanbo Liang authored
      Support FPGrowth algorithm in Python API.
      Should we remove the "Experimental" annotations that mark FPGrowth and FPGrowthModel in Scala? jkbradley
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #5213 from yanboliang/spark-6264 and squashes the following commits:
      
      ed62ead [Yanbo Liang] trigger jenkins
      8ce0359 [Yanbo Liang] fix docstring style
      544c725 [Yanbo Liang] address comments
      a2d7cf7 [Yanbo Liang] add doc for FPGrowth.train()
      dcf7d73 [Yanbo Liang] add python doc
      b18fd07 [Yanbo Liang] trigger jenkins
      2c951b8 [Yanbo Liang] fix typos
      7f62c8f [Yanbo Liang] add fpm to __init__.py
      b96206a [Yanbo Liang] Support FPGrowth algorithm in Python API
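      A usage sketch (pyspark.mllib.fpm as of Spark 1.4; `sc` is an existing SparkContext):

      ```python
      from pyspark.mllib.fpm import FPGrowth

      transactions = sc.parallelize([["a", "b", "c"], ["a", "b"], ["a"]])
      model = FPGrowth.train(transactions, minSupport=0.6, numPartitions=2)
      for fi in model.freqItemsets().collect():
          print(fi.items, fi.freq)  # e.g. ['a'] 3, ['b'] 2, ['a', 'b'] 2
      ```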
  17. Apr 01, 2015
    • [SPARK-6657] [Python] [Docs] fixed python doc build warnings · fb25e8c7
      Joseph K. Bradley authored
      fixed python doc build warnings
      
      CC whomever wants to review: rxin mengxr davies
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #5317 from jkbradley/python-doc-warnings and squashes the following commits:
      
      4cd43c2 [Joseph K. Bradley] fixed python doc build warnings
  18. Mar 29, 2015
    • [DOC] Improvements to Python docs. · 5eef00d0
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #5238 from rxin/pyspark-docs and squashes the following commits:
      
      c285951 [Reynold Xin] Reset deprecation warning.
      8c1031e [Reynold Xin] inferSchema
      dd91b1a [Reynold Xin] [DOC] Improvements to Python docs.
  19. Mar 05, 2015
    • [SPARK-6090][MLLIB] add a basic BinaryClassificationMetrics to PySpark/MLlib · 0bfacd5c
      Xiangrui Meng authored
      A simple wrapper around the Scala implementation. `DataFrame` is used for serialization/deserialization. Methods that return `RDD`s are not supported in this PR.
      
      davies If we recognize Scala's `Product`s in Py4J, we can easily add wrappers for Scala methods that return `RDD[(Double, Double)]`. Is it easy to register a serializer for `Product` in PySpark?
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #4863 from mengxr/SPARK-6090 and squashes the following commits:
      
      009a3a3 [Xiangrui Meng] provide schema
      dcddab5 [Xiangrui Meng] add a basic BinaryClassificationMetrics to PySpark/MLlib
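      A usage sketch of the wrapper (pyspark.mllib.evaluation; `sc` is an existing SparkContext and the RDD holds (score, label) pairs):

      ```python
      from pyspark.mllib.evaluation import BinaryClassificationMetrics

      score_and_labels = sc.parallelize(
          [(0.1, 0.0), (0.4, 0.0), (0.6, 1.0), (0.8, 1.0)])
      metrics = BinaryClassificationMetrics(score_and_labels)
      print(metrics.areaUnderROC)  # 1.0 on this toy data
      print(metrics.areaUnderPR)
      ```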
  20. Mar 02, 2015
  21. Feb 25, 2015
    • [SPARK-5944] [PySpark] fix version in Python API docs · f3f4c87b
      Davies Liu authored
      use RELEASE_VERSION when building the Python API docs
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4731 from davies/api_version and squashes the following commits:
      
      c9744c9 [Davies Liu] Update create-release.sh
      08cbc3f [Davies Liu] fix python docs
  22. Feb 24, 2015
    • [SPARK-5994] [SQL] Python DataFrame documentation fixes · d641fbb3
      Davies Liu authored
      - select with no arguments should NOT be the same as select; make sure selectExpr behaves the same
      - join param documentation
      - link to source doesn't work in the jekyll-generated files
      - cross-reference columns (i.e. enable linking)
      - show(): move the df example before df.show()
      - move the tests in SQLContext out of the docstring, otherwise the doc is too long
      - Column.desc and .asc don't have any documentation
      - in the documentation, sort functions.*
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4756 from davies/df_docs and squashes the following commits:
      
      f30502c [Davies Liu] fix doc
      32f0d46 [Davies Liu] fix DataFrame docs
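      A few of the APIs whose docs are touched above, as a usage sketch (assuming an existing DataFrame `df` with an "age" column):

      ```python
      df.selectExpr("age * 2", "abs(age)")  # SQL expression strings, unlike select()
      df.sort(df.age.desc())                # Column.desc() / Column.asc()
      df.show()                             # prints the first rows to stdout
      ```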
  23. Feb 20, 2015
    • [SPARK-5867] [SPARK-5892] [doc] [ml] [mllib] Doc cleanups for 1.3 release · 4a17eedb
      Joseph K. Bradley authored
      For SPARK-5867:
      * The spark.ml programming guide needs to be updated to use the new SQL DataFrame API instead of the old SchemaRDD API.
      * It should also include Python examples now.
      
      For SPARK-5892:
      * Fix Python docs
      * Various other cleanups
      
      BTW, I accidentally merged this with master.  If you want to compile it on your own, use this branch which is based on spark/branch-1.3 and cherry-picks the commits from this PR: [https://github.com/jkbradley/spark/tree/doc-review-1.3-check]
      
      CC: mengxr  (ML),  davies  (Python docs)
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #4675 from jkbradley/doc-review-1.3 and squashes the following commits:
      
      f191bb0 [Joseph K. Bradley] small cleanups
      e786efa [Joseph K. Bradley] small doc corrections
      6b1ab4a [Joseph K. Bradley] fixed python lint test
      946affa [Joseph K. Bradley] Added sample data for ml.MovieLensALS example.  Changed spark.ml Java examples to use DataFrames API instead of sql()
      da81558 [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into doc-review-1.3
      629dbf5 [Joseph K. Bradley] Updated based on code review: * made new page for old migration guides * small fixes * moved inherit_doc in python
      b9df7c4 [Joseph K. Bradley] Small cleanups: toDF to toDF(), adding s for string interpolation
      34b067f [Joseph K. Bradley] small doc correction
      da16aef [Joseph K. Bradley] Fixed python mllib docs
      8cce91c [Joseph K. Bradley] GMM: removed old imports, added some doc
      695f3f6 [Joseph K. Bradley] partly done trying to fix inherit_doc for class hierarchies in python docs
      a72c018 [Joseph K. Bradley] made ChiSqTestResult appear in python docs
      b05a80d [Joseph K. Bradley] organize imports. doc cleanups
      e572827 [Joseph K. Bradley] updated programming guide for ml and mllib
  24. Feb 15, 2015
    • [SPARK-5769] Set params in constructors and in setParams in Python ML pipelines · cd4a1536
      Xiangrui Meng authored
      This PR allows Python users to set params in constructors and in setParams, where we use the decorator `keyword_only` to force keyword arguments. The trade-off is discussed in the design doc of SPARK-4586.
      
      Generated doc:
      ![screen shot 2015-02-12 at 3 06 58 am](https://cloud.githubusercontent.com/assets/829644/6166491/9cfcd06a-b265-11e4-99ea-473d866634fc.png)
      
      CC: davies rxin
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #4564 from mengxr/py-pipeline-kw and squashes the following commits:
      
      fedf720 [Xiangrui Meng] use toDF
      d565f2c [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into py-pipeline-kw
      cbc15d3 [Xiangrui Meng] fix style
      5032097 [Xiangrui Meng] update pipeline signature
      950774e [Xiangrui Meng] simplify keyword_only and update constructor/setParams signatures
      fdde5fc [Xiangrui Meng] fix style
      c9384b8 [Xiangrui Meng] fix sphinx doc
      8e59180 [Xiangrui Meng] add setParams and make constructors take params, where we force keyword args
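      The user-facing pattern this enables, as a sketch (using pyspark.ml.classification.LogisticRegression as the example):

      ```python
      from pyspark.ml.classification import LogisticRegression

      lr = LogisticRegression(maxIter=10, regParam=0.01)  # params as keyword args
      lr.setParams(maxIter=20)                            # update params later
      print(lr.getMaxIter())                              # 20
      # LogisticRegression(10, 0.01) raises: keyword_only rejects positional args.
      ```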
  25. Feb 14, 2015
    • [SPARK-5752][SQL] Don't implicitly convert RDDs directly to DataFrames · e98dfe62
      Reynold Xin authored
      - The old implicit would convert RDDs directly to DataFrames, and that added too many methods.
      - toDataFrame -> toDF
      - Dsl -> functions
      - implicits moved into SQLContext.implicits
      - addColumn -> withColumn
      - renameColumn -> withColumnRenamed
      
      Python changes:
      - toDataFrame -> toDF
      - Dsl -> functions package
      - addColumn -> withColumn
      - renameColumn -> withColumnRenamed
      - add toDF functions to RDD on SQLContext init
      - add flatMap to DataFrame
      
      Author: Reynold Xin <rxin@databricks.com>
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4556 from rxin/SPARK-5752 and squashes the following commits:
      
      5ef9910 [Reynold Xin] More fix
      61d3fca [Reynold Xin] Merge branch 'df5' of github.com:davies/spark into SPARK-5752
      ff5832c [Reynold Xin] Fix python
      749c675 [Reynold Xin] count(*) fixes.
      5806df0 [Reynold Xin] Fix build break again.
      d941f3d [Reynold Xin] Fixed explode compilation break.
      fe1267a [Davies Liu] flatMap
      c4afb8e [Reynold Xin] style
      d9de47f [Davies Liu] add comment
      b783994 [Davies Liu] add comment for toDF
      e2154e5 [Davies Liu] schema() -> schema
      3a1004f [Davies Liu] Dsl -> functions, toDF()
      fb256af [Reynold Xin] - toDataFrame -> toDF - Dsl -> functions - implicits moved into SQLContext.implicits - addColumn -> withColumn - renameColumn -> withColumnRenamed
      0dd74eb [Reynold Xin] [SPARK-5752][SQL] Don't implicitly convert RDDs directly to DataFrames
      97dd47c [Davies Liu] fix mistake
      6168f74 [Davies Liu] fix test
      1fc0199 [Davies Liu] fix test
      a075cd5 [Davies Liu] clean up, toPandas
      663d314 [Davies Liu] add test for agg('*')
      9e214d5 [Reynold Xin] count(*) fixes.
      1ed7136 [Reynold Xin] Fix build break again.
      921b2e3 [Reynold Xin] Fixed explode compilation break.
      14698d4 [Davies Liu] flatMap
      ba3e12d [Reynold Xin] style
      d08c92d [Davies Liu] add comment
      5c8b524 [Davies Liu] add comment for toDF
      a4e5e66 [Davies Liu] schema() -> schema
      d377fc9 [Davies Liu] Dsl -> functions, toDF()
      6b3086c [Reynold Xin] - toDataFrame -> toDF - Dsl -> functions - implicits moved into SQLContext.implicits - addColumn -> withColumn - renameColumn -> withColumnRenamed
      807e8b1 [Reynold Xin] [SPARK-5752][SQL] Don't implicitly convert RDDs directly to DataFrames
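      A sketch of the renamed Python APIs (assuming an existing SparkContext `sc` and SQLContext `sqlContext`, since SQLContext init installs toDF() on RDDs):

      ```python
      from pyspark.sql import functions as F           # was Dsl

      rdd = sc.parallelize([("Alice", 1), ("Bob", 2)])
      df = rdd.toDF(["name", "age"])                   # was toDataFrame
      df2 = df.withColumn("age2", df.age + 1)          # was addColumn
      df3 = df2.withColumnRenamed("age2", "age_next")  # was renameColumn
      df3.select(F.max("age_next")).show()
      ```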
  26. Feb 09, 2015
    • [SPARK-5469] restructure pyspark.sql into multiple files · 08488c17
      Davies Liu authored
      All the DataTypes moved into pyspark.sql.types
      
      The changes can be tracked by `--find-copies-harder -M25`
      ```
      davies@localhost:~/work/spark/python$ git diff --find-copies-harder -M25 --numstat master..
      2       5       python/docs/pyspark.ml.rst
      0       3       python/docs/pyspark.mllib.rst
      10      2       python/docs/pyspark.sql.rst
      1       1       python/pyspark/mllib/linalg.py
      21      14      python/pyspark/{mllib => sql}/__init__.py
      14      2108    python/pyspark/{sql.py => sql/context.py}
      10      1772    python/pyspark/{sql.py => sql/dataframe.py}
      7       6       python/pyspark/{sql_tests.py => sql/tests.py}
      8       1465    python/pyspark/{sql.py => sql/types.py}
      4       2       python/run-tests
      1       1       sql/core/src/main/scala/org/apache/spark/sql/test/ExamplePointUDT.scala
      ```
      
      Also `git blame -C -C python/pyspark/sql/context.py` to track the history.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4479 from davies/sql and squashes the following commits:
      
      1b5f0a5 [Davies Liu] Merge branch 'master' of github.com:apache/spark into sql
      2b2b983 [Davies Liu] restructure pyspark.sql
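      Where things live after the split, as an import sketch:

      ```python
      from pyspark.sql import SQLContext                                 # sql/context.py
      from pyspark.sql.dataframe import DataFrame                        # sql/dataframe.py
      from pyspark.sql.types import StructType, StructField, StringType  # sql/types.py
      ```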
  27. Jan 28, 2015
    • [SPARK-4586][MLLIB] Python API for ML pipeline and parameters · e80dc1c5
      Xiangrui Meng authored
      This PR adds Python API for ML pipeline and parameters. The design doc can be found on the JIRA page. It includes transformers and an estimator to demo the simple text classification example code.
      
      TODO:
      - [x] handle parameters in LRModel
      - [x] unit tests
      - [x] missing some docs
      
      CC: davies jkbradley
      
      Author: Xiangrui Meng <meng@databricks.com>
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4151 from mengxr/SPARK-4586 and squashes the following commits:
      
      415268e [Xiangrui Meng] remove inherit_doc from __init__
      edbd6fe [Xiangrui Meng] move Identifiable to ml.util
      44c2405 [Xiangrui Meng] Merge pull request #2 from davies/ml
      dd1256b [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-4586
      14ae7e2 [Davies Liu] fix docs
      54ca7df [Davies Liu] fix tests
      78638df [Davies Liu] Merge branch 'SPARK-4586' of github.com:mengxr/spark into ml
      fc59a02 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-4586
      1dca16a [Davies Liu] refactor
      090b3a3 [Davies Liu] Merge branch 'master' of github.com:apache/spark into ml
      0882513 [Xiangrui Meng] update doc style
      a4f4dbf [Xiangrui Meng] add unit test for LR
      7521d1c [Xiangrui Meng] add unit tests to HashingTF and Tokenizer
      ba0ba1e [Xiangrui Meng] add unit tests for pipeline
      0586c7b [Xiangrui Meng] add more comments to the example
      5153cff [Xiangrui Meng] simplify java models
      036ca04 [Xiangrui Meng] gen numFeatures
      46fa147 [Xiangrui Meng] update mllib/pom.xml to include python files in the assembly
      1dcc17e [Xiangrui Meng] update code gen and make param appear in the doc
      f66ba0c [Xiangrui Meng] make params a property
      d5efd34 [Xiangrui Meng] update doc conf and move embedded param map to instance attribute
      f4d0fe6 [Xiangrui Meng] use LabeledDocument and Document in example
      05e3e40 [Xiangrui Meng] update example
      d3e8dbe [Xiangrui Meng] more docs optimize pipeline.fit impl
      56de571 [Xiangrui Meng] fix style
      d0c5bb8 [Xiangrui Meng] a working copy
      bce72f4 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-4586
      17ecfb9 [Xiangrui Meng] code gen for shared params
      d9ea77c [Xiangrui Meng] update doc
      c18dca1 [Xiangrui Meng] make the example working
      dadd84e [Xiangrui Meng] add base classes and docs
      a3015cf [Xiangrui Meng] add Estimator and Transformer
      46eea43 [Xiangrui Meng] a pipeline in python
      33b68e0 [Xiangrui Meng] a working LR
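      A sketch of the simple text-classification demo this API enables (assuming an existing SQLContext `sqlContext`):

      ```python
      from pyspark.ml import Pipeline
      from pyspark.ml.classification import LogisticRegression
      from pyspark.ml.feature import HashingTF, Tokenizer

      training = sqlContext.createDataFrame(
          [(0, "a b c d e spark", 1.0), (1, "b d", 0.0),
           (2, "spark f g h", 1.0), (3, "hadoop mapreduce", 0.0)],
          ["id", "text", "label"])

      tokenizer = Tokenizer(inputCol="text", outputCol="words")
      hashingTF = HashingTF(inputCol="words", outputCol="features")
      lr = LogisticRegression(maxIter=10, regParam=0.01)
      pipeline = Pipeline(stages=[tokenizer, hashingTF, lr])

      model = pipeline.fit(training)  # Estimator -> PipelineModel
      model.transform(training).select("id", "prediction").show()
      ```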
  28. Dec 17, 2014
    • [SPARK-4821] [mllib] [python] [docs] Fix for pyspark.mllib.rand doc · affc3f46
      Joseph K. Bradley authored
      + small doc edit
      + include edit to make IntelliJ happy
      
      CC: davies  mengxr
      
      Note to davies  -- this does not fix the "WARNING: Literal block expected; none found." warnings since that seems to involve spacing which IntelliJ does not like.  (Those warnings occur when generating the Python docs.)
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #3669 from jkbradley/python-warnings and squashes the following commits:
      
      4587868 [Joseph K. Bradley] fixed warning
      8cb073c [Joseph K. Bradley] Updated based on davies recommendation
      c51eca4 [Joseph K. Bradley] Updated rst file for pyspark.mllib.rand doc.  Small doc edit.  Small include edit to make IntelliJ happy.
  29. Nov 20, 2014
    • [SPARK-4439] [MLlib] add python api for random forest · 1c53a5db
      Davies Liu authored
      ```
          class RandomForestModel
           |  A model trained by RandomForest
           |
           |  numTrees(self)
           |      Get number of trees in forest.
           |
           |  predict(self, x)
           |      Predict values for a single data point or an RDD of points using the model trained.
           |
           |  toDebugString(self)
           |      Full model
           |
           |  totalNumNodes(self)
           |      Get total number of nodes, summed over all trees in the forest.
           |
      
          class RandomForest
           |  trainClassifier(cls, data, numClassesForClassification, categoricalFeaturesInfo, numTrees, featureSubsetStrategy='auto', impurity='gini', maxDepth=4, maxBins=32, seed=None):
     |      Method to train a random forest model for binary or multiclass classification.
           |
           |      :param data: Training dataset: RDD of LabeledPoint.
           |                   Labels should take values {0, 1, ..., numClasses-1}.
           |      :param numClassesForClassification: number of classes for classification.
           |      :param categoricalFeaturesInfo: Map storing arity of categorical features.
           |                                  E.g., an entry (n -> k) indicates that feature n is categorical
           |                                  with k categories indexed from 0: {0, 1, ..., k-1}.
           |      :param numTrees: Number of trees in the random forest.
           |      :param featureSubsetStrategy: Number of features to consider for splits at each node.
           |                                Supported: "auto" (default), "all", "sqrt", "log2", "onethird".
           |                                If "auto" is set, this parameter is set based on numTrees:
           |                                  if numTrees == 1, set to "all";
           |                                  if numTrees > 1 (forest) set to "sqrt".
           |      :param impurity: Criterion used for information gain calculation.
           |                   Supported values: "gini" (recommended) or "entropy".
           |      :param maxDepth: Maximum depth of the tree. E.g., depth 0 means 1 leaf node; depth 1 means
           |                       1 internal node + 2 leaf nodes. (default: 4)
     |      :param maxBins: maximum number of bins used for splitting features (default: 32)
           |      :param seed:  Random seed for bootstrapping and choosing feature subsets.
           |      :return: RandomForestModel that can be used for prediction
           |
           |   trainRegressor(cls, data, categoricalFeaturesInfo, numTrees, featureSubsetStrategy='auto', impurity='variance', maxDepth=4, maxBins=32, seed=None):
     |      Method to train a random forest model for regression.
           |
           |      :param data: Training dataset: RDD of LabeledPoint.
           |                   Labels are real numbers.
           |      :param categoricalFeaturesInfo: Map storing arity of categorical features.
           |                                   E.g., an entry (n -> k) indicates that feature n is categorical
           |                                   with k categories indexed from 0: {0, 1, ..., k-1}.
           |      :param numTrees: Number of trees in the random forest.
           |      :param featureSubsetStrategy: Number of features to consider for splits at each node.
           |                                 Supported: "auto" (default), "all", "sqrt", "log2", "onethird".
           |                                 If "auto" is set, this parameter is set based on numTrees:
           |                                 if numTrees == 1, set to "all";
           |                                 if numTrees > 1 (forest) set to "onethird".
           |      :param impurity: Criterion used for information gain calculation.
           |                       Supported values: "variance".
           |      :param maxDepth: Maximum depth of the tree. E.g., depth 0 means 1 leaf node; depth 1 means
     |                       1 internal node + 2 leaf nodes. (default: 4)
     |      :param maxBins: maximum number of bins used for splitting features (default: 32)
           |      :param seed:  Random seed for bootstrapping and choosing feature subsets.
           |      :return: RandomForestModel that can be used for prediction
           |
      ```
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #3320 from davies/forest and squashes the following commits:
      
      8003dfc [Davies Liu] reorder
      53cf510 [Davies Liu] fix docs
      4ca593d [Davies Liu] fix docs
      e0df852 [Davies Liu] fix docs
      0431746 [Davies Liu] rebased
      2b6f239 [Davies Liu] Merge branch 'master' of github.com:apache/spark into forest
      885abee [Davies Liu] address comments
      dae7fc0 [Davies Liu] address comments
      89a000f [Davies Liu] fix docs
      565d476 [Davies Liu] add python api for random forest
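      A usage sketch matching the docstrings quoted above (pyspark.mllib.tree; `sc` is an existing SparkContext):

      ```python
      from pyspark.mllib.regression import LabeledPoint
      from pyspark.mllib.tree import RandomForest

      data = sc.parallelize([LabeledPoint(0.0, [0.0, 1.0]),
                             LabeledPoint(1.0, [1.0, 0.0])] * 10)

      model = RandomForest.trainClassifier(
          data, numClassesForClassification=2, categoricalFeaturesInfo={},
          numTrees=3, seed=42)
      print(model.numTrees(), model.totalNumNodes())
      print(model.predict([1.0, 0.0]))  # single point; RDDs are also accepted
      ```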
  30. Oct 31, 2014
    • [SPARK-3870] EOL character enforcement · 55ab7770
      Kousuke Saruta authored
      We have shell scripts and Windows batch files, so we should enforce the proper EOL characters.
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #2726 from sarutak/eol-enforcement and squashes the following commits:
      
      9748c3f [Kousuke Saruta] Fixed make.bat
      252de89 [Kousuke Saruta] Removed extra characters from make.bat
      5b81c00 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into eol-enforcement
      8633ed2 [Kousuke Saruta] merge branch 'master' of git://git.apache.org/spark into eol-enforcement
      5d630d8 [Kousuke Saruta] Merged
      ba10797 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into eol-enforcement
      7407515 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into eol-enforcement
      772fd4e [Kousuke Saruta] Normalized EOL characters in make.bat and compute-classpath.cmd
      ac7f873 [Kousuke Saruta] Added an entry for .gitattributes to .rat-excludes
      1570e77 [Kousuke Saruta] Added .gitattributes
  31. Oct 18, 2014
    • [SPARK-3952] [Streaming] [PySpark] add Python examples in Streaming Programming Guide · 05db2da7
      Davies Liu authored
      This adds Python examples to the Streaming Programming Guide.

      It also adds the RecoverableNetworkWordCount example.
      
      Author: Davies Liu <davies.liu@gmail.com>
      Author: Davies Liu <davies@databricks.com>
      
      Closes #2808 from davies/pyguide and squashes the following commits:
      
      8d4bec4 [Davies Liu] update readme
      26a7e37 [Davies Liu] fix format
      3821c4d [Davies Liu] address comments, add missing file
      7e4bb8a [Davies Liu] add Python examples in Streaming Programming Guide
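      The flavor of example the guide gains, as a sketch (a network word count; `sc` is an existing SparkContext):

      ```python
      from pyspark.streaming import StreamingContext

      ssc = StreamingContext(sc, 1)  # 1-second batches
      lines = ssc.socketTextStream("localhost", 9999)
      counts = (lines.flatMap(lambda line: line.split(" "))
                     .map(lambda word: (word, 1))
                     .reduceByKey(lambda a, b: a + b))
      counts.pprint()

      ssc.start()
      ssc.awaitTermination()
      ```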
  32. Oct 14, 2014
    • [SPARK-3943] Some scripts bin\*.cmd pollutes environment variables in Windows · 66af8e25
      Masayoshi TSUZUKI authored
      Modified the scripts so they no longer pollute environment variables: the main logic moves from `XXX.cmd` into `XXX2.cmd`, and `XXX.cmd` invokes `XXX2.cmd` via the cmd command.
      `pyspark.cmd` and `spark-class.cmd` already work this way, but `spark-shell.cmd`, `spark-submit.cmd` and `/python/docs/make.bat` do not.
      
      Author: Masayoshi TSUZUKI <tsudukim@oss.nttdata.co.jp>
      
      Closes #2797 from tsudukim/feature/SPARK-3943 and squashes the following commits:
      
      b397a7d [Masayoshi TSUZUKI] [SPARK-3943] Some scripts bin\*.cmd pollutes environment variables in Windows
  33. Oct 12, 2014
    • [SPARK-2377] Python API for Streaming · 69c67aba
      giwa authored
      This patch brings the Python API for Streaming.
      
      This patch is based on work from @giwa
      
      Author: giwa <ugw.gi.world@gmail.com>
      Author: Ken Takagiwa <ken@Kens-MacBook-Pro.local>
      Author: Davies Liu <davies.liu@gmail.com>
      Author: Ken Takagiwa <ken@kens-mbp.gateway.sonic.net>
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      Author: Ken <ugw.gi.world@gmail.com>
      Author: Ken Takagiwa <ugw.gi.world@gmail.com>
      Author: Matthew Farrellee <matt@redhat.com>
      
      Closes #2538 from davies/streaming and squashes the following commits:
      
      64561e4 [Davies Liu] fix tests
      331ecce [Davies Liu] fix example
      3e2492b [Davies Liu] change updateStateByKey() to an easy API
      182be73 [Davies Liu] Merge branch 'master' of github.com:apache/spark into streaming
      02d0575 [Davies Liu] add wrapper for foreachRDD()
      bebeb4a [Davies Liu] address all comments
      6db00da [Davies Liu] Merge branch 'master' of github.com:apache/spark into streaming
      8380064 [Davies Liu] Merge branch 'master' of github.com:apache/spark into streaming
      52c535b [Davies Liu] remove fix for sum()
      e108ec1 [Davies Liu]  address comments
      37fe06f [Davies Liu] use random port for callback server
      d05871e [Davies Liu] remove reuse of PythonRDD
      be5e5ff [Davies Liu] merge branch of env, make tests stable.
      8071541 [Davies Liu] Merge branch 'env' into streaming
      c7bbbce [Davies Liu] fix sphinx docs
      6bb9d91 [Davies Liu] Merge branch 'master' of github.com:apache/spark into streaming
      4d0ea8b [Davies Liu] clear reference of SparkEnv after stop
      54bd92b [Davies Liu] improve tests
      c2b31cb [Davies Liu] Merge branch 'master' of github.com:apache/spark into streaming
      7a88f9f [Davies Liu] rollback RDD.setContext(), use textFileStream() to test checkpointing
      bd8a4c2 [Davies Liu] fix scala style
      7797c70 [Davies Liu] refactor
      ff88bec [Davies Liu] rename RDDFunction to TransformFunction
      d328aca [Davies Liu] fix serializer in queueStream
      6f0da2f [Davies Liu] recover from checkpoint
      fa7261b [Davies Liu] refactor
      a13ff34 [Davies Liu] address comments
      8466916 [Davies Liu] support checkpoint
      9a16bd1 [Davies Liu] change number of partitions during tests
      b98d63f [Davies Liu] change private[spark] to private[python]
      eed6e2a [Davies Liu] rollback not needed changes
      e00136b [Davies Liu] address comments
      069a94c [Davies Liu] fix the number of partitions during window()
      338580a [Davies Liu] change _first(), _take(), _collect() to private API
      19797f9 [Davies Liu] clean up
      6ebceca [Davies Liu] add more tests
      c40c52d [Davies Liu] change first(), take(n) to have the same behavior as RDD
      98ac6c2 [Davies Liu] support ssc.transform()
      b983f0f [Davies Liu] address comments
      847f9b9 [Davies Liu] add more docs, add first(), take()
      e059ca2 [Davies Liu] move check of window into Python
      fce0ef5 [Davies Liu] refactor of foreachRDD()
      7001b51 [Davies Liu] refactor of queueStream()
      26ea396 [Davies Liu] refactor
      74df565 [Davies Liu] fix print and docs
      b32774c [Davies Liu] move java_import into streaming
      604323f [Davies Liu] enable streaming tests
      c499ba0 [Davies Liu] remove Time and Duration
      3f0fb4b [Davies Liu] refactor fix tests
      c28f520 [Davies Liu] support updateStateByKey
      d357b70 [Davies Liu] support windowed dstream
      bd13026 [Davies Liu] fix examples
      eec401e [Davies Liu] refactor, combine TransformedRDD, fix reuse PythonRDD, fix union
      9a57685 [Davies Liu] fix python style
      bd27874 [Davies Liu] fix scala style
      7339be0 [Davies Liu] delete tests
      7f53086 [Davies Liu] support transform(), refactor and cleanup
      df098fc [Davies Liu] Merge branch 'master' into giwa
      550dfd9 [giwa] WIP fixing 1.1 merge
      5cdb6fa [giwa] changed for SCCallSiteSync
      e685853 [giwa] merged with rebased 1.1 branch
      2d32a74 [giwa] added some StreamingContextTestSuite
      4a59e1e [giwa] WIP:added more test for StreamingContext
      8ffdbf1 [giwa] added atexit to handle callback server
      d5f5fcb [giwa] added comment for StreamingContext.sparkContext
      63c881a [giwa] added StreamingContext.sparkContext
      d39f102 [giwa] added StreamingContext.remember
      d542743 [giwa] clean up code
      2fdf0de [Matthew Farrellee] Fix scalastyle errors
      c0a06bc [giwa] delete not implemented functions
      f385976 [giwa] delete improper comments
      b0f2015 [giwa] added comment in dstream._test_output
      bebb3f3 [giwa] remove the last blank line
      fbed8da [giwa] revert pom.xml
      8ed93af [giwa] fixed explanation
      066ba90 [giwa] revert pom.xml
      fa4af88 [giwa] remove duplicated import
      6ae3caa [giwa] revert pom.xml
      7dc7391 [giwa] fixed typo
      62dc7a3 [giwa] clean up examples
      f04882c [giwa] clean up examples
      b171ec3 [giwa] fixed pep8 violation
      f198d14 [giwa] clean up code
      3166d31 [giwa] clean up
      c00e091 [giwa] change test case not to use awaitTermination
      e80647e [giwa] adopted the latest compression way of python command
      58e41ff [giwa] merge with master
      455e5af [giwa] removed wasted print in DStream
      af336b7 [giwa] add comments
      ddd4ee1 [giwa] added TODO comments
      99ce042 [giwa] added saveAsTextFiles and saveAsPickledFiles
      2a06cdb [giwa] remove waste duplicated code
      c5ecfc1 [giwa] basic function test cases are passed
      8dcda84 [giwa] all tests pass if numSlice is 2 and the number of each input is over 4
      795b2cd [giwa] broke something
      1e126bf [giwa] WIP: solved partitioned and None is not recognized
      f67cf57 [giwa] added mapValues and flatMapValues WIP for glom and mapPartitions test
      953deb0 [giwa] edited the comment to add more precise description
      af610d3 [giwa] removed unnecessary changes
      c1d546e [giwa] fixed PEP-008 violation
      99410be [giwa] delete waste file
      b3b0362 [giwa] added basic operation test cases
      9cde7c9 [giwa] WIP added test case
      bd3ba53 [giwa] WIP
      5c04a5f [giwa] WIP: added PythonTestInputStream
      019ef38 [giwa] WIP
      1934726 [giwa] update comment
      376e3ac [giwa] WIP
      932372a [giwa] clean up dstream.py
      0b09cff [giwa] added stop in StreamingContext
      92e333e [giwa] implemented reduce and count function in Dstream
      1b83354 [giwa] Removed the waste line
      88f7506 [Ken Takagiwa] Kill py4j callback server properly
      54b5358 [Ken Takagiwa] tried to restart callback server
      4f07163 [Tathagata Das] Implemented DStream.foreachRDD in the Python API using Py4J callback server.
      fe02547 [Ken Takagiwa] remove waste file
      2ad7bd3 [Ken Takagiwa] clean up codes
      6197a11 [Ken Takagiwa] clean up code
      eb4bf48 [Ken Takagiwa] fix map function
      98c2a00 [Ken Takagiwa] added count operation but this implementation needs double check
      58591d2 [Ken Takagiwa] reduceByKey is working
      0df7111 [Ken Takagiwa] delete old file
      f485b1d [Ken Takagiwa] fixed input of socketTextDStream
      dd6de81 [Ken Takagiwa] initial commit for socketTextStream
      247fd74 [Ken Takagiwa] modified the code base on comment in https://github.com/tdas/spark/pull/10
      4bcb318 [Ken Takagiwa] implementing transform function in Python
      38adf95 [Ken Takagiwa] added reduceByKey not working yet
      66fcfff [Ken Takagiwa] modify dstream.py to fix indent error
      41886c2 [Ken Takagiwa] comment PythonDStream.PairwiseDStream
      0b99bec [Ken] initial commit for pySparkStreaming
      c214199 [giwa] added testcase for combineByKey
      5625bdc [giwa] added groupByKey testcase
      10ab87b [giwa] added sparkContext as input parameter in StreamingContext
      10b5b04 [giwa] removed wasted print in DStream
      e54f986 [giwa] add comments
      16aa64f [giwa] added TODO comments
      74535d4 [giwa] added saveAsTextFiles and saveAsPickledFiles
      f76c182 [giwa] remove waste duplicated code
      18c8723 [giwa] modified streaming test case to add comment
      13fb44c [giwa] basic function test cases are passed
      3000b2b [giwa] all tests pass if numSlice is 2 and the number of each input is over 4
      ff14070 [giwa] broke something
      bcdec33 [giwa] WIP: solved partitioned and None is not recognized
      270a9e1 [giwa] added mapValues and flatMapValues WIP for glom and mapPartitions test
      bb10956 [giwa] edited the comment to add more precise description
      253a863 [giwa] removed unnecessary changes
      3d37822 [giwa] fixed PEP-008 violation
      f21cab3 [giwa] delete waste file
      878bad7 [giwa] added basic operation test cases
      ce2acd2 [giwa] WIP added test case
      9ad6855 [giwa] WIP
      1df77f5 [giwa] WIP: added PythonTestInputStream
      1523b66 [giwa] WIP
      8a0fbbc [giwa] update comment
      fe648e3 [giwa] WIP
      29c2bc5 [giwa] initial commit for testcase
      4d40d63 [giwa] clean up dstream.py
      c462bb3 [giwa] added stop in StreamingContext
      d2c01ba [giwa] clean up examples
      3c45cd2 [giwa] implemented reduce and count function in Dstream
      b349649 [giwa] Removed the waste line
      3b498e1 [Ken Takagiwa] Kill py4j callback server properly
      84a9668 [Ken Takagiwa] tried to restart callback server
      9ab8952 [Tathagata Das] Added extra line.
      05e991b [Tathagata Das] Added missing file
      b1d2a30 [Tathagata Das] Implemented DStream.foreachRDD in the Python API using Py4J callback server.
      678e854 [Ken Takagiwa] remove waste file
      0a8bbbb [Ken Takagiwa] clean up codes
      bab31c1 [Ken Takagiwa] clean up code
      72b9738 [Ken Takagiwa] fix map function
      d3ee86a [Ken Takagiwa] added count operation but this implementation needs double check
      15feea9 [Ken Takagiwa] edit python sparkstreaming example
      6f98e50 [Ken Takagiwa] reduceByKey is working
      c455c8d [Ken Takagiwa] added reduceByKey not working yet
      dc6995d [Ken Takagiwa] delete old file
      b31446a [Ken Takagiwa] fixed typo of network_workdcount.py
      ccfd214 [Ken Takagiwa] added doctest for pyspark.streaming.duration
      0d1b954 [Ken Takagiwa] fixed input of socketTextDStream
      f746109 [Ken Takagiwa] initial commit for socketTextStream
      bb7ccf3 [Ken Takagiwa] remove unused import in python
      224fc5e [Ken Takagiwa] add empty line
      d2099d8 [Ken Takagiwa] sorted the import following Spark coding convention
      5bac7ec [Ken Takagiwa] revert streaming/pom.xml
      e1df940 [Ken Takagiwa] revert pom.xml
      494cae5 [Ken Takagiwa] remove not implemented DStream functions in python
      17a74c6 [Ken Takagiwa] modified the code base on comment in https://github.com/tdas/spark/pull/10
      1a0f065 [Ken Takagiwa] implementing transform function in Python
      d7b4d6f [Ken Takagiwa] added reduceByKey not working yet
      87438e2 [Ken Takagiwa] modify dstream.py to fix indent error
      b406252 [Ken Takagiwa] comment PythonDStream.PairwiseDStream
      454981d [Ken] initial commit for pySparkStreaming
      150b94c [giwa] added some StreamingContextTestSuite
      f7bc8f9 [giwa] WIP:added more test for StreamingContext
      ee50c5a [giwa] added atexit to handle callback server
      fdc9125 [giwa] added comment for StreamingContext.sparkContext
      f5bfb70 [giwa] added StreamingContext.sparkContext
      da09768 [giwa] added StreamingContext.remember
      d68b568 [giwa] clean up code
      4afa390 [giwa] clean up code
      1fd6bc7 [Ken Takagiwa] Merge pull request #2 from mattf/giwa-master
      d9d59fe [Matthew Farrellee] Fix scalastyle errors
      67473a9 [giwa] delete not implemented functions
      c97377c [giwa] delete improper comments
      2ea769e [giwa] added comment in dstream._test_output
      3b27bd4 [giwa] remove the last blank line
      acfcaeb [giwa] revert pom.xml
      93f7637 [giwa] fixed explanation
      50fd6f9 [giwa] revert pom.xml
      4f82c89 [giwa] remove duplicated import
      9d1de23 [giwa] revert pom.xml
      7339df2 [giwa] fixed typo
      9c85e48 [giwa] clean up examples
      24f95db [giwa] clean up examples
      0d30109 [giwa] fixed pep8 violation
      b7dab85 [giwa] improve test case
      583e66d [giwa] move tests for streaming inside streaming directory
      1d84142 [giwa] remove unimplemented test
      f0ea311 [giwa] clean up code
      171edeb [giwa] clean up
      4dedd2d [giwa] change test case not to use awaitTermination
      268a6a5 [giwa] Changed awaitTermination not to call awaitTermination in Scala. Just use time.sleep instead
      09a28bf [giwa] improve testcases
      58150f5 [giwa] Changed the test case to focus the test operation
      199e37f [giwa] adopted the latest compression way of python command
      185fdbf [giwa] merge with master
      f1798c4 [giwa] merge with master
      e70f706 [giwa] added testcase for combineByKey
      e162822 [giwa] added groupByKey testcase
      97742fe [giwa] added sparkContext as input parameter in StreamingContext
      14d4c0e [giwa] removed wasted print in DStream
      6d8190a [giwa] add comments
      4aa99e4 [giwa] added TODO comments
      e9fab72 [giwa] added saveAsTextFiles and saveAsPickledFiles
      94f2b65 [giwa] remove waste duplicated code
      580fbc2 [giwa] modified streaming test case to add comment
      99e4bb3 [giwa] basic function test cases are passed
      7051a84 [giwa] all tests pass if numSlice is 2 and the number of each input is over 4
      35933e1 [giwa] broke something
      9767712 [giwa] WIP: solved partitioned and None is not recognized
      4f2d7e6 [giwa] added mapValues and flatMapValues WIP for glom and mapPartitions test
      33c0f94d [giwa] edited the comment to add more precise description
      774f18d [giwa] removed unnecessary changes
      3a671cc [giwa] remove export PYSPARK_PYTHON in spark submit
      8efa266 [giwa] fixed PEP-008 violation
      fa75d71 [giwa] delete waste file
      7f96294 [giwa] added basic operation test cases
      3dda31a [giwa] WIP added test case
      1f68b78 [giwa] WIP
      c05922c [giwa] WIP: added PythonTestInputStream
      1fd12ae [giwa] WIP
      c880a33 [giwa] update comment
      5d22c92 [giwa] WIP
      ea4b06b [giwa] initial commit for testcase
      5a9b525 [giwa] clean up dstream.py
      79c5809 [giwa] added stop in StreamingContext
      189dcea [giwa] clean up examples
      b8d7d24 [giwa] implemented reduce and count function in Dstream
      b6468e6 [giwa] Removed the waste line
      b47b5fd [Ken Takagiwa] Kill py4j callback server properly
      19ddcdd [Ken Takagiwa] tried to restart callback server
      c9fc124 [Tathagata Das] Added extra line.
      4caae3f [Tathagata Das] Added missing file
      4eff053 [Tathagata Das] Implemented DStream.foreachRDD in the Python API using Py4J callback server.
      5e822d4 [Ken Takagiwa] remove waste file
      aeaf8a5 [Ken Takagiwa] clean up codes
      9fa249b [Ken Takagiwa] clean up code
      05459c6 [Ken Takagiwa] fix map function
      a9f4ecb [Ken Takagiwa] added count operation, but this implementation needs a double check
      d1ee6ca [Ken Takagiwa] edit python sparkstreaming example
      0b8b7d0 [Ken Takagiwa] reduceByKey is working
      d25d5cf [Ken Takagiwa] added reduceByKey, not working yet
      7f7c5d1 [Ken Takagiwa] delete old file
      967dc26 [Ken Takagiwa] fixed typo of network_workdcount.py
      57fb740 [Ken Takagiwa] added doctest for pyspark.streaming.duration
      4b69fb1 [Ken Takagiwa] fixed input of socketTextDStream
      02f618a [Ken Takagiwa] initial commit for socketTextStream
      4ce4058 [Ken Takagiwa] remove unused import in python
      856d98e [Ken Takagiwa] add empty line
      490e338 [Ken Takagiwa] sorted the imports following the Spark coding convention
      5594bd4 [Ken Takagiwa] revert pom.xml
      2adca84 [Ken Takagiwa] remove unimplemented DStream functions in python
      e551e13 [Ken Takagiwa] add comment for hack why PYSPARK_PYTHON is needed in spark-submit
      3758175 [Ken Takagiwa] add comment for hack why PYSPARK_PYTHON is needed in spark-submit
      c5518b4 [Ken Takagiwa] modified the code based on the comments in https://github.com/tdas/spark/pull/10
      dcf243f [Ken Takagiwa] implementing transform function in Python
      9af03f4 [Ken Takagiwa] added reduceByKey, not working yet
      6e0d9c7 [Ken Takagiwa] modify dstream.py to fix indent error
      e497b9b [Ken Takagiwa] comment out PythonDStream.PairwiseDStream
      5c3a683 [Ken] initial commit for pySparkStreaming
      665bfdb [giwa] added testcase for combineByKey
      a3d2379 [giwa] added groupByKey testcase
      636090a [giwa] added sparkContext as input parameter in StreamingContext
      e7ebb08 [giwa] removed wasted print in DStream
      d8b593b [giwa] add comments
      ea9c873 [giwa] added TODO comments
      89ae38a [giwa] added saveAsTextFiles and saveAsPickledFiles
      e3033fc [giwa] remove wasted, duplicated code
      a14c7e1 [giwa] modified streaming test case to add comment
      536def4 [giwa] basic function test cases pass
      2112638 [giwa] all tests pass if numSlice is 2 and the number of each input is over 4
      080541a [giwa] broke something
      0704b86 [giwa] WIP: solved partitioned and None is not recognized
      90a6484 [giwa] added mapValues and flatMapValues, WIP for glom and mapPartitions test
      a65f302 [giwa] edited the comment to add more precise description
      bdde697 [giwa] removed unnecessary changes
      e8c7bfc [giwa] remove export PYSPARK_PYTHON in spark submit
      3334169 [giwa] fixed PEP 8 violation
      db0a303 [giwa] delete waste file
      2cfd3a0 [giwa] added basic operation test cases
      90ae568 [giwa] WIP added test case
      a120d07 [giwa] WIP
      f671cdb [giwa] WIP: added PythonTestInputStream
      56fae45 [giwa] WIP
      e35e101 [giwa] Merge branch 'master' into testcase
      ba5112d [giwa] update comment
      28aa56d [giwa] WIP
      fb08559 [giwa] initial commit for testcase
      a613b85 [giwa] clean up dstream.py
      c40c0ef [giwa] added stop in StreamingContext
      31e4260 [giwa] clean up examples
      d2127d6 [giwa] implemented reduce and count function in Dstream
      48f7746 [giwa] Removed the waste line
      0f83eaa [Ken Takagiwa] delete py4j 0.8.1
      1679808 [Ken Takagiwa] Kill py4j callback server properly
      f96cd4e [Ken Takagiwa] tried to restart callback server
      fe86198 [Ken Takagiwa] add py4j 0.8.2.1 but server is not launched
      1064fe0 [Ken Takagiwa] Merge branch 'master' of https://github.com/giwa/spark
      28c6620 [Ken Takagiwa] Implemented DStream.foreachRDD in the Python API using Py4J callback server
      85b0fe1 [Ken Takagiwa] Merge pull request #1 from tdas/python-foreach
      54e2e8c [Tathagata Das] Added extra line.
      e185338 [Tathagata Das] Added missing file
      a778d4b [Tathagata Das] Implemented DStream.foreachRDD in the Python API using Py4J callback server.
      cc2092b [Ken Takagiwa] remove waste file
      d042ac6 [Ken Takagiwa] clean up codes
      84a021f [Ken Takagiwa] clean up code
      bd20e17 [Ken Takagiwa] fix map function
      d01a125 [Ken Takagiwa] added count operation, but this implementation needs a double check
      7d05109 [Ken Takagiwa] merge with remote branch
      ae464e0 [Ken Takagiwa] edit python sparkstreaming example
      04af046 [Ken Takagiwa] reduceByKey is working
      3b6d7b0 [Ken Takagiwa] implementing transform function in Python
      571d52d [Ken Takagiwa] added reduceByKey, not working yet
      5720979 [Ken Takagiwa] delete old file
      e604fcb [Ken Takagiwa] fixed typo of network_workdcount.py
      4b7c08b [Ken Takagiwa] Merge branch 'master' of https://github.com/giwa/spark
      ce7d426 [Ken Takagiwa] added doctest for pyspark.streaming.duration
      a8c9fd5 [Ken Takagiwa] fixed for socketTextStream
      a61fa9e [Ken Takagiwa] fixed input of socketTextDStream
      1e84f41 [Ken Takagiwa] initial commit for socketTextStream
      6d012f7 [Ken Takagiwa] remove unused import in python
      25d30d5 [Ken Takagiwa] add empty line
      6e0a64a [Ken Takagiwa] sorted the imports following the Spark coding convention
      fa4a7fc [Ken Takagiwa] revert streaming/pom.xml
      8f8202b [Ken Takagiwa] revert streaming pom.xml
      c9d79dd [Ken Takagiwa] revert pom.xml
      57e3e52 [Ken Takagiwa] remove unimplemented DStream functions in python
      0a516f5 [Ken Takagiwa] add comment for hack why PYSPARK_PYTHON is needed in spark-submit
      a7a0b5c [Ken Takagiwa] add comment for hack why PYSPARK_PYTHON is needed in spark-submit
      72bfc66 [Ken Takagiwa] modified the code based on the comments in https://github.com/tdas/spark/pull/10
      69e9cd3 [Ken Takagiwa] implementing transform function in Python
      94a0787 [Ken Takagiwa] added reduceByKey, not working yet
      88068cf [Ken Takagiwa] modify dstream.py to fix indent error
      1367be5 [Ken Takagiwa] comment out PythonDStream.PairwiseDStream
      eb2b3ba [Ken] Merge remote-tracking branch 'upstream/master'
      d8e51f9 [Ken] initial commit for pySparkStreaming
      69c67aba
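      Taken together, the commits above build out the first cut of PySpark Streaming: a StreamingContext wrapping the SparkContext, DStream transformations such as map, reduceByKey, and combineByKey, and output actions such as foreachRDD driven through the Py4J callback server. A minimal usage sketch, assuming the pyspark.streaming API this work converges on (the host/port and the lambda bodies are illustrative, not from the patch itself):

      ```python
      from pyspark import SparkContext
      from pyspark.streaming import StreamingContext

      sc = SparkContext(appName="NetworkWordCount")
      ssc = StreamingContext(sc, 1)  # 1-second batch interval

      # socketTextStream plus the DStream transformations added above.
      lines = ssc.socketTextStream("localhost", 9999)
      counts = (lines.flatMap(lambda line: line.split(" "))
                     .map(lambda word: (word, 1))
                     .reduceByKey(lambda a, b: a + b))

      counts.pprint()  # print a sample of every batch

      def handle_batch(rdd):
          # foreachRDD hands each batch's RDD back to Python via the
          # Py4J callback server mentioned in the commits above.
          pass

      counts.foreachRDD(handle_batch)

      ssc.start()
      ssc.awaitTermination()
      ```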
  34. Oct 11, 2014
    • cocoatomo's avatar
      [SPARK-3909][PySpark][Doc] A corrupted format in Sphinx documents and building warnings · 7a3f589e
      cocoatomo authored
      The Sphinx documents contain corrupted ReST formatting and produce some warnings.
      
      The purpose of this issue is the same as https://issues.apache.org/jira/browse/SPARK-3773.
      
      commit: 0e8203f4
      
      Output:
      ```
      $ cd ./python/docs
      $ make clean html
      rm -rf _build/*
      sphinx-build -b html -d _build/doctrees   . _build/html
      Making output directory...
      Running Sphinx v1.2.3
      loading pickled environment... not yet created
      building [html]: targets for 4 source files that are out of date
      updating environment: 4 added, 0 changed, 0 removed
      reading sources... [100%] pyspark.sql
      /Users/<user>/MyRepos/Scala/spark/python/pyspark/mllib/feature.py:docstring of pyspark.mllib.feature.Word2VecModel.findSynonyms:4: WARNING: Field list ends without a blank line; unexpected unindent.
      /Users/<user>/MyRepos/Scala/spark/python/pyspark/mllib/feature.py:docstring of pyspark.mllib.feature.Word2VecModel.transform:3: WARNING: Field list ends without a blank line; unexpected unindent.
      /Users/<user>/MyRepos/Scala/spark/python/pyspark/sql.py:docstring of pyspark.sql:4: WARNING: Bullet list ends without a blank line; unexpected unindent.
      looking for now-outdated files... none found
      pickling environment... done
      checking consistency... done
      preparing documents... done
      writing output... [100%] pyspark.sql
      writing additional files... (12 module code pages) _modules/index search
      copying static files... WARNING: html_static_path entry u'/Users/<user>/MyRepos/Scala/spark/python/docs/_static' does not exist
      done
      copying extra files... done
      dumping search index... done
      dumping object inventory... done
      build succeeded, 4 warnings.
      
      Build finished. The HTML pages are in _build/html.
      ```
      
      Author: cocoatomo <cocoatomo77@gmail.com>
      
      Closes #2766 from cocoatomo/issues/3909-sphinx-build-warnings and squashes the following commits:
      
      2c7faa8 [cocoatomo] [SPARK-3909][PySpark][Doc] A corrupted format in Sphinx documents and building warnings
      7a3f589e
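      The two Word2VecModel warnings in the log are the classic field-list slip: prose resumes immediately after a :param:/:return: list with no separating blank line. A hypothetical before/after sketch of the fix (illustrative docstrings, not the actual Spark sources):

      ```python
      # Broken: the note starts right under the field list, so Sphinx reports
      # "Field list ends without a blank line; unexpected unindent."
      def transform_broken(word):
          """Transforms a word into its vector representation.

          :param word: a word
          :return: vector representation of the word
          Note: local use only
          """

      # Fixed: a blank line closes the field list before the trailing note.
      def transform_fixed(word):
          """Transforms a word into its vector representation.

          :param word: a word
          :return: vector representation of the word

          Note: local use only
          """
      ```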
  35. Oct 07, 2014
    • Davies Liu's avatar
      [SPARK-3412] [PySpark] Replace Epydoc with Sphinx to generate Python API docs · 798ed22c
      Davies Liu authored
      Retire Epydoc, use Sphinx to generate API docs.
      
      Refine Sphinx docs, also convert some docstrings into Sphinx style.
      
      It looks like:
      ![api doc](https://cloud.githubusercontent.com/assets/40902/4538272/9e2d4f10-4dec-11e4-8d96-6e45a8fe51f9.png)
      
      Author: Davies Liu <davies.liu@gmail.com>
      
      Closes #2689 from davies/docs and squashes the following commits:
      
      bf4a0a5 [Davies Liu] fix links
      3fb1572 [Davies Liu] fix _static in jekyll
      65a287e [Davies Liu] fix scripts and logo
      8524042 [Davies Liu] Merge branch 'master' of github.com:apache/spark into docs
      d5b874a [Davies Liu] Merge branch 'master' of github.com:apache/spark into docs
      4bc1c3c [Davies Liu] refactor
      746d0b6 [Davies Liu] @param -> :param
      240b393 [Davies Liu] replace epydoc with sphinx doc
      798ed22c
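      Commit 746d0b6 in the list above captures the mechanical half of the migration: Epydoc field markers become ReST field markers. A hypothetical example of the conversion (illustrative names, not a real Spark docstring):

      ```python
      # Epydoc style (retired):
      def train_epydoc(data, iterations=100):
          """
          Train a model.

          @param data: the training data
          @param iterations: the number of iterations to run
          @return: the trained model
          """

      # Sphinx/ReST style (adopted):
      def train_sphinx(data, iterations=100):
          """
          Train a model.

          :param data: the training data
          :param iterations: the number of iterations to run
          :return: the trained model
          """
      ```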
    • Liquan Pei's avatar
      [SPARK-3486][MLlib][PySpark] PySpark support for Word2Vec · 098c7344
      Liquan Pei authored
      mengxr
      Added PySpark support for Word2Vec
      Change list
      (1) PySpark support for Word2Vec
      (2) SerDe support for string sequences on both the Python side and the JVM side
      (3) Test for SerDe of string sequences on the JVM side
      
      Author: Liquan Pei <liquanpei@gmail.com>
      
      Closes #2356 from Ishiihara/Word2Vec-python and squashes the following commits:
      
      476ea34 [Liquan Pei] style fixes
      b13a0b9 [Liquan Pei] resolve merge conflicts and minor fixes
      8671eba [Liquan Pei] Merge remote-tracking branch 'upstream/master' into Word2Vec-python
      daf88a6 [Liquan Pei] modification according to feedback
      a73fa19 [Liquan Pei] clean up
      3d8007b [Liquan Pei] fix findSynonyms for vector
      1bdcd2e [Liquan Pei] minor fixes
      cdef9f4 [Liquan Pei] add missing comments
      b7447eb [Liquan Pei] modify according to feedback
      b9a7383 [Liquan Pei] cache words RDD in fit
      89490bf [Liquan Pei] add tests and Word2VecModelWrapper
      78bbb53 [Liquan Pei] use pickle for seq string SerDe
      a264b08 [Liquan Pei] Merge remote-tracking branch 'upstream/master' into Word2Vec-python
      ca1e5ff [Liquan Pei] fix test
      68e7276 [Liquan Pei] minor style fixes
      48d5e72 [Liquan Pei] Functionality improvement
      0ad3ac1 [Liquan Pei] minor fix
      c867fdf [Liquan Pei] add Word2Vec to pyspark
      098c7344
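      A minimal usage sketch of the Python API this adds, assuming the pyspark.mllib.feature wrapper described above (the toy corpus repeats one sentence so every word clears Word2Vec's default minimum word count):

      ```python
      from pyspark import SparkContext
      from pyspark.mllib.feature import Word2Vec

      sc = SparkContext(appName="Word2VecExample")

      # Each element of the RDD is one tokenized sentence.
      sentence = "apache spark provides a python api for word2vec"
      corpus = sc.parallelize([sentence.split(" ")] * 20, 2)

      model = Word2Vec().setVectorSize(10).setSeed(42).fit(corpus)

      vec = model.transform("spark")             # the vector for one word
      synonyms = model.findSynonyms("spark", 3)  # (word, similarity) pairs
      for word, similarity in synonyms:
          print(word, similarity)
      ```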
  36. Oct 06, 2014
    • cocoatomo's avatar
      [SPARK-3773][PySpark][Doc] Sphinx build warning · 2300eb58
      cocoatomo authored
      When building the Sphinx documents for PySpark, we get 12 warnings.
      Their causes are mostly docstrings in broken ReST format.
      
      To reproduce this issue, run the following commands on commit 6e27cb63.
      
      ```bash
      $ cd ./python/docs
      $ make clean html
      ...
      /Users/<user>/MyRepos/Scala/spark/python/pyspark/__init__.py:docstring of pyspark.SparkContext.sequenceFile:4: ERROR: Unexpected indentation.
      /Users/<user>/MyRepos/Scala/spark/python/pyspark/__init__.py:docstring of pyspark.RDD.saveAsSequenceFile:4: ERROR: Unexpected indentation.
      /Users/<user>/MyRepos/Scala/spark/python/pyspark/mllib/classification.py:docstring of pyspark.mllib.classification.LogisticRegressionWithSGD.train:14: ERROR: Unexpected indentation.
      /Users/<user>/MyRepos/Scala/spark/python/pyspark/mllib/classification.py:docstring of pyspark.mllib.classification.LogisticRegressionWithSGD.train:16: WARNING: Definition list ends without a blank line; unexpected unindent.
      /Users/<user>/MyRepos/Scala/spark/python/pyspark/mllib/classification.py:docstring of pyspark.mllib.classification.LogisticRegressionWithSGD.train:17: WARNING: Block quote ends without a blank line; unexpected unindent.
      /Users/<user>/MyRepos/Scala/spark/python/pyspark/mllib/classification.py:docstring of pyspark.mllib.classification.SVMWithSGD.train:14: ERROR: Unexpected indentation.
      /Users/<user>/MyRepos/Scala/spark/python/pyspark/mllib/classification.py:docstring of pyspark.mllib.classification.SVMWithSGD.train:16: WARNING: Definition list ends without a blank line; unexpected unindent.
      /Users/<user>/MyRepos/Scala/spark/python/pyspark/mllib/classification.py:docstring of pyspark.mllib.classification.SVMWithSGD.train:17: WARNING: Block quote ends without a blank line; unexpected unindent.
      /Users/<user>/MyRepos/Scala/spark/python/docs/pyspark.mllib.rst:50: WARNING: missing attribute mentioned in :members: or __all__: module pyspark.mllib.regression, attribute RidgeRegressionModelLinearRegressionWithSGD
      /Users/<user>/MyRepos/Scala/spark/python/pyspark/mllib/tree.py:docstring of pyspark.mllib.tree.DecisionTreeModel.predict:3: ERROR: Unexpected indentation.
      ...
      checking consistency... /Users/<user>/MyRepos/Scala/spark/python/docs/modules.rst:: WARNING: document isn't included in any toctree
      ...
      copying static files... WARNING: html_static_path entry u'/Users/<user>/MyRepos/Scala/spark/python/docs/_static' does not exist
      ...
      build succeeded, 12 warnings.
      ```
      
      Author: cocoatomo <cocoatomo77@gmail.com>
      
      Closes #2653 from cocoatomo/issues/3773-sphinx-build-warnings and squashes the following commits:
      
      6f65661 [cocoatomo] [SPARK-3773][PySpark][Doc] Sphinx build warning
      2300eb58
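      The "Unexpected indentation" errors above come from another common ReST slip: an indented block that follows a paragraph with no blank line in between. A hypothetical before/after sketch (not the actual classification.py docstrings):

      ```python
      # Broken: the indented lines follow the paragraph directly, so Sphinx
      # reports "ERROR: Unexpected indentation." and then "Block quote ends
      # without a blank line; unexpected unindent."
      def train_broken(data):
          """
          Train a model. Allowed regularizer values are:
              "l1" for using L1Updater
              "l2" for using SquaredL2Updater
          """

      # Fixed: a blank line turns the indented lines into a valid block quote.
      def train_fixed(data):
          """
          Train a model. Allowed regularizer values are:

              "l1" for using L1Updater
              "l2" for using SquaredL2Updater
          """
      ```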
  37. Sep 16, 2014
    • Davies Liu's avatar
      [SPARK-3430] [PySpark] [Doc] generate PySpark API docs using Sphinx · ec1adecb
      Davies Liu authored
      Using Sphinx to generate API docs for PySpark.
      
      requirement: Sphinx
      
      ```
      $ cd python/docs/
      $ make html
      ```
      
      The generated API docs will be located at python/docs/_build/html/index.html
      
      It can co-exist with the docs generated by Epydoc.
      
      This is the first working version; after it is merged in, we can continue to improve it and eventually replace Epydoc.
      
      Author: Davies Liu <davies.liu@gmail.com>
      
      Closes #2292 from davies/sphinx and squashes the following commits:
      
      425a3b1 [Davies Liu] cleanup
      1573298 [Davies Liu] move docs to python/docs/
      5fe3903 [Davies Liu] Merge branch 'master' into sphinx
      9468ab0 [Davies Liu] fix makefile
      b408f38 [Davies Liu] address all comments
      e2ccb1b [Davies Liu] update name and version
      9081ead [Davies Liu] generate PySpark API docs using Sphinx
      ec1adecb
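      The heart of such a setup is a small conf.py next to the Makefile in python/docs/. A minimal sketch, assuming that layout (the real Spark conf.py sets many more options, such as the theme and logo):

      ```python
      # python/docs/conf.py -- minimal Sphinx configuration sketch
      import os
      import sys

      # Make the pyspark package importable so autodoc can introspect it.
      sys.path.insert(0, os.path.abspath('..'))

      extensions = ['sphinx.ext.autodoc', 'sphinx.ext.viewcode']

      project = 'PySpark'
      version = '1.1'      # illustrative version string
      release = version
      master_doc = 'index'
      html_static_path = ['_static']
      ```

      With a configuration like this in place, `make html` runs sphinx-build over the .rst sources and writes the pages to _build/html, as described above.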