  1. Jun 02, 2016
    • [SPARK-15092][SPARK-15139][PYSPARK][ML] Pyspark TreeEnsemble missing methods · 72353311
      Holden Karau authored
      ## What changes were proposed in this pull request?
      
      Add `toDebugString` and `totalNumNodes` to `TreeEnsembleModels` and add `toDebugString` to `DecisionTreeModel`
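      A minimal usage sketch (the data, model choice, and parameter values below are illustrative, not from this patch):
      ```python
      from pyspark.ml.classification import RandomForestClassifier
      from pyspark.ml.linalg import Vectors
      from pyspark.sql import SparkSession

      spark = SparkSession.builder.getOrCreate()
      train = spark.createDataFrame(
          [(0.0, Vectors.dense(0.0, 1.0)), (1.0, Vectors.dense(1.0, 0.0))],
          ["label", "features"])

      model = RandomForestClassifier(numTrees=3).fit(train)
      print(model.totalNumNodes)   # total node count across all trees in the ensemble
      print(model.toDebugString)   # full text description of every tree
      ```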
      
      ## How was this patch tested?
      
      Extended doc tests.
      
      Author: Holden Karau <holden@us.ibm.com>
      
      Closes #12919 from holdenk/SPARK-15139-pyspark-treeEnsemble-missing-methods.
  2. Jun 01, 2016
    • [SPARK-15587][ML] ML 2.0 QA: Scala APIs audit for ml.feature · 07a98ca4
      Yanbo Liang authored
      ## What changes were proposed in this pull request?
      ML 2.0 QA: Scala APIs audit for ml.feature. Mainly include:
      * Remove seed for ```QuantileDiscretizer```, since we use ```approxQuantile``` to produce bins and ```seed``` is useless.
      * Scala API docs update.
      * Sync Scala and Python API docs for these changes.
      
      ## How was this patch tested?
      Existing tests.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #13410 from yanboliang/spark-15587.
    • [SPARK-15686][SQL] Move user-facing streaming classes into sql.streaming · a71d1364
      Reynold Xin authored
      ## What changes were proposed in this pull request?
      This patch moves all user-facing structured streaming classes into sql.streaming. As part of this, I also added Since version annotations to methods and classes that didn't have them.
      
      ## How was this patch tested?
      Updated tests to reflect the moves.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #13429 from rxin/SPARK-15686.
  3. May 31, 2016
    • [SPARK-15517][SQL][STREAMING] Add support for complete output mode in Structured Streaming · 90b11439
      Tathagata Das authored
      ## What changes were proposed in this pull request?
      Currently structured streaming only supports append output mode.  This PR adds the following.
      
      - Added support for Complete output mode in the internal state store, analyzer and planner.
      - Added public API in Scala and Python for users to specify the output mode (see the Python sketch after this list)
      - Added checks for unsupported combinations of output mode and DF operations
        - Plans with no aggregation should support only Append mode
        - Plans with aggregation should support only Update and Complete modes
        - Default output mode is Append mode (**Question: should we change this to automatically set to Complete mode when there is aggregation?**)
      - Added support for Complete output mode in the Memory Sink. The Memory Sink internally supports Append, Complete, and Update, but the public API exposes only the Complete and Append output modes.
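      A rough Python sketch of choosing the output mode (written against the present-day `writeStream` API; at the time of this PR the method was exposed on `write`, and the `rate` source used here is purely illustrative):
      ```python
      from pyspark.sql import SparkSession

      spark = SparkSession.builder.getOrCreate()

      # A streaming aggregation: per the rules above, it needs Complete (or Update) mode.
      counts = spark.readStream.format("rate").load().groupBy("value").count()

      query = (counts.writeStream
               .outputMode("complete")   # Append would be rejected for this plan
               .format("memory")
               .queryName("counts")
               .start())
      ```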
      
      ## How was this patch tested?
      Unit tests in various test suites
      - StreamingAggregationSuite: tests for complete mode
      - MemorySinkSuite: tests for checking behavior in Append and Complete modes.
      - UnsupportedOperationSuite: tests for checking unsupported combinations of DF ops and output modes
      - DataFrameReaderWriterSuite: tests for checking that output mode cannot be called on static DFs
      - Python doc test and existing unit tests modified to call write.outputMode.
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #13286 from tdas/complete-mode.
    • [MINOR][DOC][ML] ml.clustering scala & python api doc sync · 594484cd
      Yanbo Liang authored
      ## What changes were proposed in this pull request?
      Since we completed the Scala API audit for ml.clustering in #13148, this also fixes and updates the corresponding Python API docs to keep them in sync.
      
      ## How was this patch tested?
      Docs change, no tests.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #13291 from yanboliang/spark-15361-followup.
    • Revert "[SPARK-11753][SQL][TEST-HADOOP2.2] Make allowNonNumericNumbers option work · 9a74de18
      Shixiong Zhu authored
      ## What changes were proposed in this pull request?
      
      This reverts commit c24b6b67. A separate PR was needed to run the Jenkins tests because the revert conflicted in `dev/deps/spark-deps-hadoop*`.
      
      ## How was this patch tested?
      
      Jenkins unit tests, integration tests, manual tests.
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #13417 from zsxwing/revert-SPARK-11753.
  4. May 27, 2016
    • [SPARK-15008][ML][PYSPARK] Add integration test for OneVsRest · 130b8d07
      yinxusen authored
      ## What changes were proposed in this pull request?
      
      1. Add `_transfer_param_map_to/from_java` for OneVsRest.
      
      2. Add `_compare_params` in ml/tests.py to help compare params.
      
      3. Add `test_onevsrest` as the integration test for OneVsRest.
      
      ## How was this patch tested?
      
      Python unit test.
      
      Author: yinxusen <yinxusen@gmail.com>
      
      Closes #12875 from yinxusen/SPARK-15008.
    • [MINOR] Fix Typos 'a -> an' · 6b1a6180
      Zheng RuiFeng authored
      ## What changes were proposed in this pull request?
      
      `a` -> `an`
      
      I used a regex to find candidate lines:
      `grep -in ' a [aeiou]' mllib/src/main/scala/org/apache/spark/ml/*/*scala`
      and reviewed them line by line.
      
      ## How was this patch tested?
      
      local build
      `lint-java` checking
      
      Author: Zheng RuiFeng <ruifengz@foxmail.com>
      
      Closes #13317 from zhengruifeng/a_an.
  5. May 26, 2016
  6. May 25, 2016
  7. May 24, 2016
    • [SPARK-15433] [PYSPARK] PySpark core test should not use SerDe from PythonMLLibAPI · 695d9a0f
      Liang-Chi Hsieh authored
      ## What changes were proposed in this pull request?
      
      Currently the PySpark core tests use the `SerDe` from `PythonMLLibAPI`, which pulls in many MLlib things. They should use `SerDeUtil` instead.
      
      ## How was this patch tested?
      Existing tests.
      
      Author: Liang-Chi Hsieh <simonh@tw.ibm.com>
      
      Closes #13214 from viirya/pycore-use-serdeutil.
    • [SPARK-11753][SQL][TEST-HADOOP2.2] Make allowNonNumericNumbers option work · c24b6b67
      Liang-Chi Hsieh authored
      ## What changes were proposed in this pull request?
      
      Jackson supports the `allowNonNumericNumbers` option for parsing non-standard numeric tokens such as "NaN", "Infinity", and "INF". The currently used Jackson version (2.5.3) doesn't fully support it. This patch upgrades the library and makes the two previously ignored tests in `JsonParsingOptionsSuite` pass.
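      For reference, a hedged sketch of how the option is used from PySpark once the upgraded Jackson honours it (the file path is hypothetical):
      ```python
      from pyspark.sql import SparkSession

      spark = SparkSession.builder.getOrCreate()
      # With the option enabled, tokens such as NaN and Infinity parse as doubles
      # instead of being rejected.
      df = (spark.read
            .option("allowNonNumericNumbers", "true")
            .json("/tmp/measurements.json"))   # hypothetical path
      ```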
      
      ## How was this patch tested?
      
      `JsonParsingOptionsSuite`.
      
      Author: Liang-Chi Hsieh <simonh@tw.ibm.com>
      Author: Liang-Chi Hsieh <viirya@appier.com>
      
      Closes #9759 from viirya/fix-json-nonnumric.
    • [SPARK-15442][ML][PYSPARK] Add 'relativeError' param to PySpark QuantileDiscretizer · 6075f5b4
      Nick Pentreath authored
      This PR adds the `relativeError` param to PySpark's `QuantileDiscretizer` to match Scala.
      
      Also cleaned up a duplication of `numBuckets` where the param is both a class and instance attribute (I removed the instance attr to match the style of params throughout `ml`).
      
      Finally, cleaned up the docs for `QuantileDiscretizer` to reflect that it now uses `approxQuantile`.
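      A short usage sketch of the new parameter (the data and column names are illustrative):
      ```python
      from pyspark.ml.feature import QuantileDiscretizer
      from pyspark.sql import SparkSession

      spark = SparkSession.builder.getOrCreate()
      df = spark.createDataFrame([(float(x),) for x in range(10)], ["values"])

      qds = QuantileDiscretizer(numBuckets=3, inputCol="values", outputCol="buckets",
                                relativeError=0.01)   # the param this PR adds
      qds.fit(df).transform(df).show()
      ```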
      
      ## How was this patch tested?
      
      A little doctest and built API docs locally to check HTML doc generation.
      
      Author: Nick Pentreath <nickp@za.ibm.com>
      
      Closes #13228 from MLnick/SPARK-15442-py-relerror-param.
    • [SPARK-15397][SQL] fix string udf locate as hive · d642b273
      Daoyuan Wang authored
      ## What changes were proposed in this pull request?
      
      In Hive, `locate("aa", "aaa", 0)` yields 0, `locate("aa", "aaa", 1)` yields 1, and `locate("aa", "aaa", 2)` yields 2, while in Spark they yield 1, 2, and 0 respectively. The difference comes from how the third parameter of the `locate` UDF is interpreted: it is the starting position and is 1-based, so a start of 0 should always return 0.
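      A hedged illustration of the Hive-compatible behaviour after this fix:
      ```python
      from pyspark.sql import SparkSession

      spark = SparkSession.builder.getOrCreate()
      # The third argument is a 1-based start position, so 0 always yields 0.
      spark.sql(
          "SELECT locate('aa', 'aaa', 0), locate('aa', 'aaa', 1), locate('aa', 'aaa', 2)"
      ).show()
      # Expected per the description above: 0, 1, 2
      ```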
      
      ## How was this patch tested?
      
      tested with modified `StringExpressionsSuite` and `StringFunctionsSuite`
      
      Author: Daoyuan Wang <daoyuan.wang@intel.com>
      
      Closes #13186 from adrian-wang/locate.
  8. May 23, 2016
    • [SPARK-15464][ML][MLLIB][SQL][TESTS] Replace SQLContext and SparkContext with... · a15ca553
      WeichenXu authored
      [SPARK-15464][ML][MLLIB][SQL][TESTS] Replace SQLContext and SparkContext with SparkSession using builder pattern in python test code
      
      ## What changes were proposed in this pull request?
      
      Replace SQLContext and SparkContext with SparkSession using builder pattern in python test code.
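      A minimal sketch of the builder pattern the test code now uses (the master and app name are illustrative):
      ```python
      from pyspark.sql import SparkSession

      spark = (SparkSession.builder
               .master("local[2]")
               .appName("python-ml-doctests")
               .getOrCreate())
      sc = spark.sparkContext   # still available where a SparkContext is required
      ```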
      
      ## How was this patch tested?
      
      Existing test.
      
      Author: WeichenXu <WeichenXu123@outlook.com>
      
      Closes #13242 from WeichenXu123/python_doctest_update_sparksession.
    • [MINOR][SQL][DOCS] Add notes of the deterministic assumption on UDF functions · 37c617e4
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      Spark assumes that UDF functions are deterministic. This PR adds explicit notes about that.
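      A brief illustration of the documented assumption (the UDF itself is illustrative):
      ```python
      from pyspark.sql.functions import udf
      from pyspark.sql.types import IntegerType

      # The optimizer may invoke a UDF more than once per row (or eliminate duplicate
      # invocations), so a UDF should be deterministic and side-effect free.
      plus_one = udf(lambda x: x + 1, IntegerType())
      ```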
      
      ## How was this patch tested?
      
      It's only about docs.
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #13087 from dongjoon-hyun/SPARK-15282.
  9. May 20, 2016
    • [SPARK-15456][PYSPARK] Fixed PySpark shell context initialization when HiveConf not present · 021c1970
      Bryan Cutler authored
      ## What changes were proposed in this pull request?
      
      When the PySpark shell cannot find HiveConf, it falls back to creating a SparkSession from a SparkContext.  This fixes a bug caused by referencing the SparkContext variable before it was initialized.
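      A rough sketch of the fallback described above (the structure is paraphrased, not the exact shell startup code):
      ```python
      from pyspark.sql import SparkSession

      try:
          # Preferred path: Hive support, when HiveConf is on the classpath.
          spark = SparkSession.builder.enableHiveSupport().getOrCreate()
      except Exception:
          # Fallback: a plain SparkSession; take the SparkContext from the session
          # itself rather than from a variable that may not be initialized yet.
          spark = SparkSession.builder.getOrCreate()

      sc = spark.sparkContext
      ```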
      
      ## How was this patch tested?
      
      Manually starting PySpark shell and using the SparkContext
      
      Author: Bryan Cutler <cutlerb@gmail.com>
      
      Closes #13237 from BryanCutler/pyspark-shell-session-context-SPARK-15456.
    • [SPARK-15444][PYSPARK][ML][HOTFIX] Default value mismatch of param... · 4e739331
      Liang-Chi Hsieh authored
      [SPARK-15444][PYSPARK][ML][HOTFIX] Default value mismatch of param linkPredictionCol for GeneralizedLinearRegression
      
      ## What changes were proposed in this pull request?
      
      The default value of the param linkPredictionCol for GeneralizedLinearRegression differed between PySpark and Scala because of a conflict between #13106 and #13129. This caused ml.tests to fail.
      
      ## How was this patch tested?
      Existing tests.
      
      Author: Liang-Chi Hsieh <simonh@tw.ibm.com>
      
      Closes #13220 from viirya/hotfix-regresstion.
    • [SPARK-15417][SQL][PYTHON] PySpark shell always uses in-memory catalog · c32b1b16
      Andrew Or authored
      ## What changes were proposed in this pull request?
      
      There is no way to use the Hive catalog in `pyspark-shell`. This is because we used to create a `SparkContext` before calling `SparkSession.enableHiveSupport().getOrCreate()`, which just gets the existing `SparkContext` instead of creating a new one. As a result, `spark.sql.catalogImplementation` was never propagated.
      
      ## How was this patch tested?
      
      Manual.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #13203 from andrewor14/fix-pyspark-shell.
  10. May 19, 2016
    • [SPARK-15075][SPARK-15345][SQL] Clean up SparkSession builder and propagate... · f2ee0ed4
      Reynold Xin authored
      [SPARK-15075][SPARK-15345][SQL] Clean up SparkSession builder and propagate config options to existing sessions if specified
      
      ## What changes were proposed in this pull request?
      Currently SparkSession.Builder uses SQLContext.getOrCreate. It should be the other way around, i.e. all the core logic goes in SparkSession, and SQLContext just calls into that. This patch does that.
      
      This patch also makes sure config options specified in the builder are propagated to the existing (and of course the new) SparkSession.
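      A small sketch of the propagation behaviour from Python (the config key is just an example):
      ```python
      from pyspark.sql import SparkSession

      spark = SparkSession.builder.getOrCreate()

      # A later builder call returns the existing session but still applies its options.
      same = (SparkSession.builder
              .config("spark.sql.shuffle.partitions", "10")   # example option
              .getOrCreate())

      assert same is spark
      assert spark.conf.get("spark.sql.shuffle.partitions") == "10"
      ```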
      
      ## How was this patch tested?
      Updated tests to reflect the change, and also introduced a new SparkSessionBuilderSuite that should cover all the branches.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #13200 from rxin/SPARK-15075.
    • [MINOR][ML][PYSPARK] ml.evaluation Scala and Python API sync · 66436778
      Yanbo Liang authored
      ## What changes were proposed in this pull request?
      ```ml.evaluation``` Scala and Python API sync.
      
      ## How was this patch tested?
      Only API docs change, no new tests.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #13195 from yanboliang/evaluation-doc.
    • [SPARK-15392][SQL] fix default value of size estimation of logical plan · 5ccecc07
      Davies Liu authored
      ## What changes were proposed in this pull request?
      
      We use autoBroadcastJoinThreshold + 1L as the default value for size estimation. That is not good in 2.0, because we now calculate the size based on the schema, so the estimate can fall below autoBroadcastJoinThreshold if there is a SELECT on top of a DataFrame created from an RDD.
      
      This PR changes the default value to Long.MaxValue.
      
      ## How was this patch tested?
      
      Added regression tests.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #13183 from davies/fix_default_size.
    • [SPARK-15316][PYSPARK][ML] Add linkPredictionCol to GeneralizedLinearRegression · e71cd96b
      Holden Karau authored
      ## What changes were proposed in this pull request?
      
      Add linkPredictionCol to GeneralizedLinearRegression and fix the PyDoc to generate the bullet list
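      A hedged usage sketch of the new param (the training data is illustrative):
      ```python
      from pyspark.ml.linalg import Vectors
      from pyspark.ml.regression import GeneralizedLinearRegression
      from pyspark.sql import SparkSession

      spark = SparkSession.builder.getOrCreate()
      df = spark.createDataFrame(
          [(1.0, Vectors.dense(0.0)), (2.0, Vectors.dense(1.0))],
          ["label", "features"])

      glr = GeneralizedLinearRegression(family="gaussian", link="identity",
                                        linkPredictionCol="linkPrediction")
      model = glr.fit(df)
      model.transform(df).select("prediction", "linkPrediction").show()
      ```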
      
      ## How was this patch tested?
      
      doctests & built docs locally
      
      Author: Holden Karau <holden@us.ibm.com>
      
      Closes #13106 from holdenk/SPARK-15316-add-linkPredictionCol-toGeneralizedLinearRegression.
    • [SPARK-14603][SQL][FOLLOWUP] Verification of Metadata Operations by Session Catalog · ef7a5e0b
      gatorsmile authored
      #### What changes were proposed in this pull request?
      This follow-up PR is to address the remaining comments in https://github.com/apache/spark/pull/12385
      
      The major change in this PR is to issue better error messages in PySpark by using the mechanism that was proposed by davies in https://github.com/apache/spark/pull/7135
      
      For example, in PySpark, if we input the following statement:
      ```python
      >>> l = [('Alice', 1)]
      >>> df = sqlContext.createDataFrame(l)
      >>> df.createTempView("people")
      >>> df.createTempView("people")
      ```
      Before this PR, the exception we got looked like this:
      ```
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
        File "/Users/xiaoli/IdeaProjects/sparkDelivery/python/pyspark/sql/dataframe.py", line 152, in createTempView
          self._jdf.createTempView(name)
        File "/Users/xiaoli/IdeaProjects/sparkDelivery/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py", line 933, in __call__
        File "/Users/xiaoli/IdeaProjects/sparkDelivery/python/pyspark/sql/utils.py", line 63, in deco
          return f(*a, **kw)
        File "/Users/xiaoli/IdeaProjects/sparkDelivery/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py", line 312, in get_return_value
      py4j.protocol.Py4JJavaError: An error occurred while calling o35.createTempView.
      : org.apache.spark.sql.catalyst.analysis.TempTableAlreadyExistsException: Temporary table 'people' already exists;
          at org.apache.spark.sql.catalyst.catalog.SessionCatalog.createTempView(SessionCatalog.scala:324)
          at org.apache.spark.sql.SparkSession.createTempView(SparkSession.scala:523)
          at org.apache.spark.sql.Dataset.createTempView(Dataset.scala:2328)
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          at java.lang.reflect.Method.invoke(Method.java:606)
          at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
          at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
          at py4j.Gateway.invoke(Gateway.java:280)
          at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:128)
          at py4j.commands.CallCommand.execute(CallCommand.java:79)
          at py4j.GatewayConnection.run(GatewayConnection.java:211)
          at java.lang.Thread.run(Thread.java:745)
      ```
      After this PR, the exception we get becomes cleaner:
      ```
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
        File "/Users/xiaoli/IdeaProjects/sparkDelivery/python/pyspark/sql/dataframe.py", line 152, in createTempView
          self._jdf.createTempView(name)
        File "/Users/xiaoli/IdeaProjects/sparkDelivery/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py", line 933, in __call__
        File "/Users/xiaoli/IdeaProjects/sparkDelivery/python/pyspark/sql/utils.py", line 75, in deco
          raise AnalysisException(s.split(': ', 1)[1], stackTrace)
      pyspark.sql.utils.AnalysisException: u"Temporary table 'people' already exists;"
      ```
      
      #### How was this patch tested?
      Fixed an existing PySpark test case
      
      Author: gatorsmile <gatorsmile@gmail.com>
      
      Closes #13126 from gatorsmile/followup-14684.
  11. May 18, 2016
    • [DOC][MINOR] ml.feature Scala and Python API sync · b1bc5ebd
      Bryan Cutler authored
      ## What changes were proposed in this pull request?
      
      I reviewed Scala and Python APIs for ml.feature and corrected discrepancies.
      
      ## How was this patch tested?
      
      Built docs locally, ran style checks
      
      Author: Bryan Cutler <cutlerb@gmail.com>
      
      Closes #13159 from BryanCutler/ml.feature-api-sync.
    • [SPARK-14463][SQL] Document the semantics for read.text · 4987f39a
      Reynold Xin authored
      ## What changes were proposed in this pull request?
      This patch is a follow-up to https://github.com/apache/spark/pull/13104 and adds documentation to clarify the semantics of read.text with respect to partitioning.
      
      ## How was this patch tested?
      N/A
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #13184 from rxin/SPARK-14463.
    • [SPARK-14891][ML] Add schema validation for ALS · e8b79afa
      Nick Pentreath authored
      This PR adds schema validation to `ml`'s ALS and ALSModel. Previously, no schema validation was performed, as `transformSchema` was never called in `ALS.fit` or `ALSModel.transform`. Furthermore, because of the missing validation, if users passed in Long (or Float etc) ids, they were silently cast to Int with no warning or error thrown.
      
      With this PR, ALS now supports all numeric types for `user`, `item`, and `rating` columns. The rating column is cast to `Float` and the user and item cols are cast to `Int` (as is the case currently) - however for user/item, the cast throws an error if the value is outside integer range. Behavior for rating col is unchanged (as it is not an issue).
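      A brief Python-side sketch of the behaviour described above (the data is illustrative; the validation itself lives in the Scala ALS):
      ```python
      from pyspark.ml.recommendation import ALS
      from pyspark.sql import SparkSession

      spark = SparkSession.builder.getOrCreate()
      ratings = spark.createDataFrame(
          [(1, 10, 4.0), (1, 20, 3.0), (2, 10, 5.0)],
          ["user", "item", "rating"])   # any numeric type is accepted for these columns

      als = ALS(userCol="user", itemCol="item", ratingCol="rating", rank=2, maxIter=1)
      model = als.fit(ratings)   # user/item values outside Int range now raise an error
      ```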
      
      ## How was this patch tested?
      New test cases in `ALSSuite`.
      
      Author: Nick Pentreath <nickp@za.ibm.com>
      
      Closes #12762 from MLnick/SPARK-14891-als-validate-schema.
    • [SPARK-15342] [SQL] [PYSPARK] PySpark test for non ascii column name does not... · 3d1e67f9
      Liang-Chi Hsieh authored
      [SPARK-15342] [SQL] [PYSPARK] PySpark test for non ascii column name does not actually test with unicode column name
      
      ## What changes were proposed in this pull request?
      
      The PySpark SQL test `test_column_name_with_non_ascii` is meant to test a non-ASCII column name, but it doesn't actually do so. We need to construct a unicode string explicitly using `unicode` under Python 2.
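      A small sketch of the point (Python 2 semantics; the column name is arbitrary and a u"" literal stands in here for an explicit `unicode(...)` call):
      ```python
      from pyspark.sql import SparkSession

      spark = SparkSession.builder.getOrCreate()
      # Under Python 2 a bare literal is bytes, so the name must be built as a
      # unicode object to genuinely exercise a non-ASCII column name.
      col_name = u"\u6570\u91cf"   # "数量"
      df = spark.createDataFrame([(1,)], [col_name])
      print(df.columns)
      ```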
      
      ## How was this patch tested?
      
      Existing tests.
      
      Author: Liang-Chi Hsieh <simonh@tw.ibm.com>
      
      Closes #13134 from viirya/correct-non-ascii-colname-pytest.
    • [SPARK-14978][PYSPARK] PySpark TrainValidationSplitModel should support validationMetrics · 411c04ad
      Takuya Kuwahara authored
      ## What changes were proposed in this pull request?
      
      This pull request adds support for validationMetrics on TrainValidationSplitModel in Python, along with a test for it.
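      A hedged end-to-end sketch (the estimator, grid, and data are illustrative):
      ```python
      from pyspark.ml.evaluation import RegressionEvaluator
      from pyspark.ml.linalg import Vectors
      from pyspark.ml.regression import LinearRegression
      from pyspark.ml.tuning import ParamGridBuilder, TrainValidationSplit
      from pyspark.sql import SparkSession

      spark = SparkSession.builder.getOrCreate()
      df = spark.createDataFrame(
          [(float(i), Vectors.dense(float(i))) for i in range(20)],
          ["label", "features"])

      lr = LinearRegression()
      grid = ParamGridBuilder().addGrid(lr.regParam, [0.0, 0.1]).build()
      tvs = TrainValidationSplit(estimator=lr, estimatorParamMaps=grid,
                                 evaluator=RegressionEvaluator(), trainRatio=0.8)
      model = tvs.fit(df)
      print(model.validationMetrics)   # one metric per parameter map, now exposed in Python
      ```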
      
      ## How was this patch tested?
      
      test in `python/pyspark/ml/tests.py`
      
      Author: Takuya Kuwahara <taakuu19@gmail.com>
      
      Closes #12767 from taku-k/spark-14978.
  12. May 17, 2016
    • [SPARK-15171][SQL] Remove the references to deprecated method dataset.registerTempTable · 25b315e6
      Sean Zhong authored
      ## What changes were proposed in this pull request?
      
      Update the unit test code, examples, and documents to remove calls to deprecated method `dataset.registerTempTable`.
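      For context, the 2.0 replacement is `createOrReplaceTempView`; a tiny sketch:
      ```python
      from pyspark.sql import SparkSession

      spark = SparkSession.builder.getOrCreate()
      df = spark.createDataFrame([("Alice", 1)], ["name", "age"])

      # instead of the deprecated df.registerTempTable("people")
      df.createOrReplaceTempView("people")
      spark.sql("SELECT * FROM people").show()
      ```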
      
      ## How was this patch tested?
      
      This PR only changes the unit test code, examples, and comments. It should be safe.
      This is a follow up of PR https://github.com/apache/spark/pull/12945 which was merged.
      
      Author: Sean Zhong <seanzhong@databricks.com>
      
      Closes #13098 from clockfly/spark-15171-remove-deprecation.
    • [SPARK-15244] [PYTHON] Type of column name created with createDataFrame is not consistent. · 0f576a57
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      **createDataFrame** returns inconsistent types for column names.
      ```python
      >>> from pyspark.sql.types import StructType, StructField, StringType
      >>> schema = StructType([StructField(u"col", StringType())])
      >>> df1 = spark.createDataFrame([("a",)], schema)
      >>> df1.columns # "col" is str
      ['col']
      >>> df2 = spark.createDataFrame([("a",)], [u"col"])
      >>> df2.columns # "col" is unicode
      [u'col']
      ```
      
      The reason is that only **StructField** has the following code.
      ```
      if not isinstance(name, str):
          name = name.encode('utf-8')
      ```
      This PR adds the same logic into **createDataFrame** for consistency.
      ```
      if isinstance(schema, list):
          schema = [x.encode('utf-8') if not isinstance(x, str) else x for x in schema]
      ```
      
      ## How was this patch tested?
      
      Pass the Jenkins test (with new python doctest)
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #13097 from dongjoon-hyun/SPARK-15244.
    • [SPARK-14615][ML] Use the new ML Vector and Matrix in the ML pipeline based algorithms · e2efe052
      DB Tsai authored
      ## What changes were proposed in this pull request?
      
      Once SPARK-14487 and SPARK-14549 are merged, we will migrate to using the new vector and matrix types in the new ml pipeline-based APIs.
      
      ## How was this patch tested?
      
      Unit tests
      
      Author: DB Tsai <dbt@netflix.com>
      Author: Liang-Chi Hsieh <simonh@tw.ibm.com>
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #12627 from dbtsai/SPARK-14615-NewML.
    • [SPARK-14906][ML] Copy linalg in PySpark to new ML package · 8ad9f08c
      Xiangrui Meng authored
      ## What changes were proposed in this pull request?
      
      Copy the linalg classes (Vector/Matrix and VectorUDT/MatrixUDT) in PySpark to the new ML package.
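      A tiny sketch of the copied API in its new home:
      ```python
      # The classes now also live under pyspark.ml.linalg, mirroring pyspark.mllib.linalg.
      from pyspark.ml.linalg import Matrices, Vectors

      dv = Vectors.dense([1.0, 0.0, 3.0])
      sv = Vectors.sparse(3, [0, 2], [1.0, 3.0])
      m = Matrices.dense(2, 2, [1.0, 2.0, 3.0, 4.0])
      ```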
      
      ## How was this patch tested?
      Existing tests.
      
      Author: Xiangrui Meng <meng@databricks.com>
      Author: Liang-Chi Hsieh <simonh@tw.ibm.com>
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #13099 from viirya/move-pyspark-vector-matrix-udt4.
  13. May 13, 2016