  1. Jul 03, 2015
    • [SPARK-8226] [SQL] Add function shiftrightunsigned · ab535b9a
      zhichao.li authored
      Author: zhichao.li <zhichao.li@intel.com>
      
      Closes #7035 from zhichao-li/shiftRightUnsigned and squashes the following commits:
      
      6bcca5a [zhichao.li] change coding style
      3e9f5ae [zhichao.li] python style
      d85ae0b [zhichao.li] add shiftrightunsigned
      ab535b9a
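For readers unfamiliar with the operation, here is a minimal pure-Python sketch of an unsigned (logical) right shift. It assumes 32-bit integers and Java's `>>>` semantics; the helper names `to_int32` and `shift_right_unsigned` are illustrative and are not taken from the patch.

```python
MASK_32 = 0xFFFFFFFF


def to_int32(value):
    """Wrap an arbitrary Python int into the signed 32-bit range."""
    value &= MASK_32
    return value - 0x100000000 if value >= 0x80000000 else value


def shift_right_unsigned(value, num_bits):
    """Logical right shift: vacated high bits are filled with zeros.

    Assumes 32-bit inputs; the shift distance is taken modulo 32,
    as Java does for int operands.
    """
    return to_int32((value & MASK_32) >> (num_bits & 31))
```

Unlike an arithmetic shift, a negative input becomes non-negative as soon as it is shifted by at least one bit.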
  2. Jul 02, 2015
    • Revert "[SPARK-8784] [SQL] Add Python API for hex and unhex" · e589e71a
      Reynold Xin authored
      This reverts commit fc7aebd9.
      e589e71a
    • [SPARK-8784] [SQL] Add Python API for hex and unhex · fc7aebd9
      Davies Liu authored
      Also improve the performance of hex/unhex
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #7181 from davies/hex and squashes the following commits:
      
      f032fbb [Davies Liu] Merge branch 'hex' of github.com:davies/spark into hex
      49e325f [Davies Liu] Merge branch 'master' of github.com:apache/spark into hex
      b31fc9a [Davies Liu] Update math.scala
      25156b7 [Davies Liu] address comments and fix test
      c3af78c [Davies Liu] address commments
      1a24082 [Davies Liu] Add Python API for hex and unhex
      fc7aebd9
    • [SPARK-8407] [SQL] complex type constructors: struct and named_struct · 52302a80
      Yijie Shen authored
      This is a follow-up to [SPARK-8283](https://issues.apache.org/jira/browse/SPARK-8283) ([PR-6828](https://github.com/apache/spark/pull/6828)), to support both `struct` and `named_struct` in Spark SQL.
      
      After [#6725](https://github.com/apache/spark/pull/6828), the semantics of [`CreateStruct`](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypes.scala#L56) changed slightly: it is no longer limited to columns of `NamedExpression`s, and it names non-`NamedExpression` fields following the Hive convention (col1, col2, ...).
      
      This PR both loosens [`struct`](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L723) to take children of `Expression` type and adds `named_struct` support.
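The naming rule described above can be modelled in plain Python. This is only a sketch of the behaviour as the message describes it: `Named`, `create_struct`, and `create_named_struct` are hypothetical stand-ins, and the alternating name/value argument layout for the named variant is an assumption based on Hive's `named_struct`.

```python
from typing import Any, NamedTuple


class Named(NamedTuple):
    """Stand-in for an expression that already carries a name."""
    name: str
    value: Any


def create_struct(*children):
    """Keep existing names; call unnamed children col1, col2, ... by position."""
    struct = {}
    for position, child in enumerate(children, start=1):
        if isinstance(child, Named):
            struct[child.name] = child.value
        else:
            struct[f"col{position}"] = child
    return struct


def create_named_struct(*args):
    """Build a struct from alternating (name, value) arguments."""
    if len(args) % 2 != 0:
        raise ValueError("expected an even number of arguments: name1, value1, ...")
    return dict(zip(args[0::2], args[1::2]))
```

In this model, `create_struct(Named("a", 1), 2.0)` names the second field `col2` because the generated name depends on the field's position, not on how many unnamed fields came before it.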
      
      Author: Yijie Shen <henry.yijieshen@gmail.com>
      
      Closes #6874 from yijieshen/SPARK-8283 and squashes the following commits:
      
      4cd3375ac [Yijie Shen] change struct documentation
      d599d0b [Yijie Shen] rebase code
      9a7039e [Yijie Shen] fix reviews and regenerate golden answers
      b487354 [Yijie Shen] replace assert using checkAnswer
      f07e114 [Yijie Shen] tiny fix
      9613be9 [Yijie Shen] review fix
      7fef712 [Yijie Shen] Fix checkInputTypes' implementation using foldable and nullable
      60812a7 [Yijie Shen] Fix type check
      828d694 [Yijie Shen] remove unnecessary resolved assertion inside dataType method
      fd3cd8e [Yijie Shen] remove type check from eval
      7a71255 [Yijie Shen] tiny fix
      ccbbd86 [Yijie Shen] Fix reviews
      47da332 [Yijie Shen] remove nameStruct API from DataFrame
      917e680 [Yijie Shen] Fix reviews
      4bd75ad [Yijie Shen] loosen struct method in functions.scala to take Expression children
      0acb7be [Yijie Shen] Add CreateNamedStruct in both DataFrame function API and FunctionRegistery
      52302a80
    • [SPARK-8223] [SPARK-8224] [SQL] shift left and shift right · 5b333813
      Tarek Auel authored
      Jira:
      https://issues.apache.org/jira/browse/SPARK-8223
      https://issues.apache.org/jira/browse/SPARK-8224
      
      ~~I am aware of #7174 and will update this pr, if it's merged.~~ Done
      I don't know if #7034 can simplify this, but we can take a look at it if it gets merged.
      
      rxin In the Jira ticket the function has no second argument. I added a `numBits` argument that allows specifying the number of bits, which I think improves usability. I wanted to add `shiftleft(value)` as well, but the `selectExpr` DataFrame tests crash if I have both. In order to do this, I added the following to functions.scala: `def shiftRight(e: Column): Column = ShiftRight(e.expr, lit(1).expr)`, but as I mentioned, this doesn't pass tests like `df.selectExpr("shiftRight(a)", ...` (not enough arguments exception).
      
      If we need the bitwise shift in order to be Hive compatible, I suggest adding `shiftLeft` and something like `shiftLeftX`.
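To make the role of the `numBits` argument concrete, here is a small pure-Python sketch of signed left and right shifts. It assumes 32-bit integers with Java-style wrap-around; `to_int32`, `shift_left`, and `shift_right` are illustrative names, not the functions added by this PR.

```python
MASK_32 = 0xFFFFFFFF


def to_int32(value):
    """Wrap an arbitrary Python int into the signed 32-bit range."""
    value &= MASK_32
    return value - 0x100000000 if value >= 0x80000000 else value


def shift_left(value, num_bits):
    """Shift left by num_bits; bits pushed past bit 31 are discarded."""
    return to_int32(value << (num_bits & 31))


def shift_right(value, num_bits):
    """Arithmetic right shift: the sign bit is replicated into vacated bits."""
    return to_int32(value) >> (num_bits & 31)
```

A one-argument variant such as the `shiftRight(e)` mentioned above would correspond to calling these helpers with `num_bits=1`.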
      
      Author: Tarek Auel <tarek.auel@googlemail.com>
      
      Closes #7178 from tarekauel/8223 and squashes the following commits:
      
      8023bb5 [Tarek Auel] [SPARK-8223][SPARK-8224] fixed test
      f3f64e6 [Tarek Auel] [SPARK-8223][SPARK-8224] Integer -> Int
      f628706 [Tarek Auel] [SPARK-8223][SPARK-8224] removed toString; updated function description
      3b56f2a [Tarek Auel] Merge remote-tracking branch 'origin/master' into 8223
      5189690 [Tarek Auel] [SPARK-8223][SPARK-8224] minor fix and style fix
      9434a28 [Tarek Auel] Merge remote-tracking branch 'origin/master' into 8223
      44ee324 [Tarek Auel] [SPARK-8223][SPARK-8224] docu fix
      ac7fe9d [Tarek Auel] [SPARK-8223][SPARK-8224] right and left bit shift
      5b333813
  3. Jul 01, 2015
    • [SPARK-8770][SQL] Create BinaryOperator abstract class. · 9fd13d56
      Reynold Xin authored
      Our current BinaryExpression abstract class is not for generic binary expressions, i.e. it requires left/right children to have the same type. However, due to its name, contributors build new binary expressions that don't have that assumption (e.g. Sha) and still extend BinaryExpression.
      
      This patch creates a new BinaryOperator abstract class, and updates the analyzer to only apply the type casting rule there. This patch also adds the notion of "prettyName" to expressions, which defines the user-facing name for the expression.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #7174 from rxin/binary-opterator and squashes the following commits:
      
      f31900d [Reynold Xin] [SPARK-8770][SQL] Create BinaryOperator abstract class.
      fceb216 [Reynold Xin] Merge branch 'master' of github.com:apache/spark into binary-opterator
      d8518cf [Reynold Xin] Updated Python tests.
      9fd13d56
  4. Jun 30, 2015
    • [SPARK-8727] [SQL] Missing python api; md5, log2 · ccdb0522
      Tarek Auel authored
      Jira: https://issues.apache.org/jira/browse/SPARK-8727
      
      Author: Tarek Auel <tarek.auel@gmail.com>
      Author: Tarek Auel <tarek.auel@googlemail.com>
      
      Closes #7114 from tarekauel/missing-python and squashes the following commits:
      
      ef4c61b [Tarek Auel] [SPARK-8727] revert dataframe change
      4029d4d [Tarek Auel] removed dataframe pi and e unit test
      66f0d2b [Tarek Auel] removed pi and e from python api and dataframe api; added _to_java_column(col) for strlen
      4d07318 [Tarek Auel] fixed python unit test
      45f2bee [Tarek Auel] fixed result of pi and e
      c39f47b [Tarek Auel] add python api
      bd50a3a [Tarek Auel] add missing python functions
      ccdb0522
  5. Jun 29, 2015
    • [SPARK-8235] [SQL] misc function sha / sha1 · a5c2961c
      Tarek Auel authored
      Jira: https://issues.apache.org/jira/browse/SPARK-8235
      
      I added the support for sha1. If I understood rxin correctly, sha and sha1 should execute the same algorithm, shouldn't they?
      
      Please take a close look at the Python part. This is adapted from #6934.
      
      Author: Tarek Auel <tarek.auel@gmail.com>
      Author: Tarek Auel <tarek.auel@googlemail.com>
      
      Closes #6963 from tarekauel/SPARK-8235 and squashes the following commits:
      
      f064563 [Tarek Auel] change to shaHex
      7ce3cdc [Tarek Auel] rely on automatic cast
      a1251d6 [Tarek Auel] Merge remote-tracking branch 'upstream/master' into SPARK-8235
      68eb043 [Tarek Auel] added docstring
      be5aff1 [Tarek Auel] improved error message
      7336c96 [Tarek Auel] added type check
      cf23a80 [Tarek Auel] simplified example
      ebf75ef [Tarek Auel] [SPARK-8301] updated the python documentation. Removed sha in python and scala
      6d6ff0d [Tarek Auel] [SPARK-8233] added docstring
      ea191a9 [Tarek Auel] [SPARK-8233] fixed signatureof python function. Added expected type to misc
      e3fd7c3 [Tarek Auel] SPARK[8235] added sha to the list of __all__
      e5dad4e [Tarek Auel] SPARK[8235] sha / sha1
      a5c2961c
  6. Jun 26, 2015
    • [SPARK-8237] [SQL] Add misc function sha2 · 47c874ba
      Liang-Chi Hsieh authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-8237
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #6934 from viirya/expr_sha2 and squashes the following commits:
      
      35e0bb3 [Liang-Chi Hsieh] For comments.
      68b5284 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into expr_sha2
      8573aff [Liang-Chi Hsieh] Remove unnecessary Product.
      ee61e06 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into expr_sha2
      59e41aa [Liang-Chi Hsieh] Add misc function: sha2.
      47c874ba
  7. Jun 19, 2015
    • [SPARK-8207] [SQL] Add math function bin · 2c59d5c1
      Liang-Chi Hsieh authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-8207
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #6721 from viirya/expr_bin and squashes the following commits:
      
      07e1c8f [Liang-Chi Hsieh] Remove AbstractUnaryMathExpression and let BIN inherit UnaryExpression.
      0677f1a [Liang-Chi Hsieh] For comments.
      cf62b95 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into expr_bin
      0cf20f2 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into expr_bin
      dea9c12 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into expr_bin
      d4f4774 [Liang-Chi Hsieh] Add @ignore_unicode_prefix.
      7a0196f [Liang-Chi Hsieh] Fix python style.
      ac2bacd [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into expr_bin
      a0a2d0f [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into expr_bin
      4cb764d [Liang-Chi Hsieh] For comments.
      0f78682 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into expr_bin
      c0c3197 [Liang-Chi Hsieh] Add bin to FunctionRegistry.
      824f761 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into expr_bin
      50e0c3b [Liang-Chi Hsieh] Add math function bin(a: long): string.
      2c59d5c1
  8. Jun 18, 2015
    • [SPARK-8218][SQL] Binary log math function update. · dc413138
      Reynold Xin authored
      Some minor updates after merging #6725.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6871 from rxin/log and squashes the following commits:
      
      ab51542 [Reynold Xin] Use JVM log
      76fc8de [Reynold Xin] Fixed arg.
      a7c1522 [Reynold Xin] [SPARK-8218][SQL] Binary log math function update.
      dc413138
    • [SPARK-8218][SQL] Add binary log math function · fee3438a
      Liang-Chi Hsieh authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-8218
      
      Because a unary `log` function is already defined, the binary log function is called `logarithm` for now.
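As a quick illustration of the difference between the unary and binary forms, here is a pure-Python sketch. The `(base, value)` argument order and the one-argument fallback to the natural log are assumptions made for this example (the latter suggested by the "Let Logarithm accept one parameter too" commit below); the function is not the Spark implementation.

```python
import math


def logarithm(*args):
    """Natural log with one argument; log of `value` in `base` with two."""
    if len(args) == 1:
        (value,) = args
        return math.log(value)
    if len(args) == 2:
        base, value = args
        # Change of base: log_base(value) = ln(value) / ln(base)
        return math.log(value) / math.log(base)
    raise TypeError("logarithm() takes one or two arguments")
```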
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #6725 from viirya/expr_binary_log and squashes the following commits:
      
      bf96bd9 [Liang-Chi Hsieh] Compare log result in string.
      102070d [Liang-Chi Hsieh] Round log result to better comparing in python test.
      fd01863 [Liang-Chi Hsieh] For comments.
      beed631 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into expr_binary_log
      6089d11 [Liang-Chi Hsieh] Remove unnecessary override.
      8cf37b7 [Liang-Chi Hsieh] For comments.
      bc89597 [Liang-Chi Hsieh] For comments.
      db7dc38 [Liang-Chi Hsieh] Use ctor instead of companion object.
      0634ef7 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into expr_binary_log
      1750034 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into expr_binary_log
      3d75bfc [Liang-Chi Hsieh] Fix scala style.
      5b39c02 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into expr_binary_log
      23c54a3 [Liang-Chi Hsieh] Fix scala style.
      ebc9929 [Liang-Chi Hsieh] Let Logarithm accept one parameter too.
      605574d [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into expr_binary_log
      21c3bfd [Liang-Chi Hsieh] Fix scala style.
      c6c187f [Liang-Chi Hsieh] For comments.
      c795342 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into expr_binary_log
      f373bac [Liang-Chi Hsieh] Add binary log expression.
      fee3438a
  9. May 23, 2015
    • [SPARK-7322, SPARK-7836, SPARK-7822][SQL] DataFrame window function related updates · efe3bfdf
      Davies Liu authored
      1. ntile should take an integer as parameter.
      2. Added Python API (based on #6364)
      3. Update documentation of various DataFrame Python functions.
      
      Author: Davies Liu <davies@databricks.com>
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6374 from rxin/window-final and squashes the following commits:
      
      69004c7 [Reynold Xin] Style fix.
      288cea9 [Reynold Xin] Update documentaiton.
      7cb8985 [Reynold Xin] Merge pull request #6364 from davies/window
      66092b4 [Davies Liu] update docs
      ed73cb4 [Reynold Xin] [SPARK-7322][SQL] Improve DataFrame window function documentation.
      ef55132 [Davies Liu] Merge branch 'master' of github.com:apache/spark into window4
      8936ade [Davies Liu] fix maxint in python 3
      2649358 [Davies Liu] update docs
      778e2c0 [Davies Liu] SPARK-7836 and SPARK-7822: Python API of window functions
      efe3bfdf
  10. May 21, 2015
    • [SPARK-7606] [SQL] [PySpark] add version to Python SQL API docs · 8ddcb25b
      Davies Liu authored
      Add version info for public Python SQL API.
      
      cc rxin
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #6295 from davies/versions and squashes the following commits:
      
      cfd91e6 [Davies Liu] add more version for DataFrame API
      600834d [Davies Liu] add version to SQL API docs
      8ddcb25b
  11. May 18, 2015
    • [SPARK-6216] [PYSPARK] check python version of worker with driver · 32fbd297
      Davies Liu authored
      This PR reverts #5404 and instead passes the driver's Python version into the JVM; the worker checks it before deserializing the closure, so that it can work with different major versions of Python.
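The check described above can be sketched in a few lines of plain Python. This is an illustration under assumptions: the `major.minor` string format, the function names, and the exception type are chosen for the example and are not confirmed by the commit message (which only notes "use string for version").

```python
import sys


def driver_python_version():
    """Version string the driver would hand to the JVM (assumed format)."""
    return "%d.%d" % sys.version_info[:2]


def check_worker_version(driver_version):
    """Run on the worker before any closure is deserialized."""
    worker_version = "%d.%d" % sys.version_info[:2]
    if worker_version != driver_version:
        raise RuntimeError(
            "Python in worker has different version %s than that in driver %s"
            % (worker_version, driver_version)
        )
    return worker_version
```

Failing early with a clear message is the point of the design: a version mismatch would otherwise tend to surface later as an obscure unpickling error.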
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #6203 from davies/py_version and squashes the following commits:
      
      b8fb76e [Davies Liu] fix test
      6ce5096 [Davies Liu] use string for version
      47c6278 [Davies Liu] check python version of worker with driver
      32fbd297
  12. May 15, 2015
    • [SPARK-7543] [SQL] [PySpark] split dataframe.py into multiple files · d7b69946
      Davies Liu authored
      dataframe.py is split into column.py, group.py and dataframe.py:
      ```
         360 column.py
        1223 dataframe.py
         183 group.py
      ```
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #6201 from davies/split_df and squashes the following commits:
      
      fc8f5ab [Davies Liu] split dataframe.py into multiple files
      d7b69946
  13. May 14, 2015
    • [SPARK-7548] [SQL] Add explode function for DataFrames · 6d0633e3
      Michael Armbrust authored
      Add an `explode` function for DataFrames and modify the analyzer so that single table generating functions can be present in a select clause along with other expressions. There are currently the following restrictions:
       - only top level TGFs are allowed (i.e. no `select(explode('list) + 1)`)
       - only one may be present in a single select to avoid potentially confusing implicit Cartesian products.
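To show what a table generating function such as `explode` does to rows, here is a minimal pure-Python model. `explode_rows` and its parameters are hypothetical names for this sketch; it works on plain dictionaries and does not use the DataFrame API.

```python
def explode_rows(rows, list_column, output_column):
    """Emit one output row per element of row[list_column].

    The other columns are copied unchanged; a row whose list is empty
    contributes no output rows.
    """
    exploded = []
    for row in rows:
        for element in row[list_column]:
            new_row = {k: v for k, v in row.items() if k != list_column}
            new_row[output_column] = element
            exploded.append(new_row)
    return exploded
```

Because each input row can fan out into several output rows, two generators in the same select would multiply against each other, which is the implicit Cartesian product the second restriction avoids.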
      
      TODO:
       - [ ] Python
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #6107 from marmbrus/explodeFunction and squashes the following commits:
      
      7ee2c87 [Michael Armbrust] whitespace
      6f80ba3 [Michael Armbrust] Update dataframe.py
      c176c89 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into explodeFunction
      81b5da3 [Michael Armbrust] style
      d3faa05 [Michael Armbrust] fix self join case
      f9e1e3e [Michael Armbrust] fix python, add since
      4f0d0a9 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into explodeFunction
      e710fe4 [Michael Armbrust] add java and python
      52ca0dc [Michael Armbrust] [SPARK-7548][SQL] Add explode function for dataframes.
      6d0633e3
  14. May 12, 2015
    • [SPARK-7321][SQL] Add Column expression for conditional statements (when/otherwise) · 97dee313
      Reynold Xin authored
      This builds on https://github.com/apache/spark/pull/5932 and should close https://github.com/apache/spark/pull/5932 as well.
      
      As an example:
      ```python
      df.select(when(df['age'] == 2, 3).otherwise(4).alias("age")).collect()
      ```
      
      Author: Reynold Xin <rxin@databricks.com>
      Author: kaka1992 <kaka_1992@163.com>
      
      Closes #6072 from rxin/when-expr and squashes the following commits:
      
      8f49201 [Reynold Xin] Throw exception if otherwise is applied twice.
      0455eda [Reynold Xin] Reset run-tests.
      bfb9d9f [Reynold Xin] Updated documentation and test cases.
      762f6a5 [Reynold Xin] Merge pull request #5932 from kaka1992/IFCASE
      95724c6 [kaka1992] Update
      8218d0a [kaka1992] Update
      801009e [kaka1992] Update
      76d6346 [kaka1992] [SPARK-7321][SQL] Add Column expression for conditional statements (if, case)
      97dee313
  15. May 07, 2015
    • [SPARK-7118] [Python] Add the coalesce Spark SQL function available in PySpark · 068c3158
      Olivier Girardot authored
      This patch adds a proxy call from PySpark to the Spark SQL coalesce function. It comes out of a discussion on devspark with rxin.
      
      This contribution is my original work and I license the work to the project under the project's open source license.
      
      Olivier.
      
      Author: Olivier Girardot <o.girardot@lateral-thoughts.com>
      
      Closes #5698 from ogirardot/master and squashes the following commits:
      
      d9a4439 [Olivier Girardot] SPARK-7118 Add the coalesce Spark SQL function available in PySpark
      068c3158
    • [SPARK-7295][SQL] bitwise operations for DataFrame DSL · fa8fddff
      Shiti authored
      Author: Shiti <ssaxena.ece@gmail.com>
      
      Closes #5867 from Shiti/spark-7295 and squashes the following commits:
      
      71a9913 [Shiti] implementation for bitwise and,or, not and xor on Column with tests and docs
      fa8fddff
  16. May 06, 2015
    • [SPARK-7358][SQL] Move DataFrame mathfunctions into functions · ba2b5661
      Burak Yavuz authored
      After a discussion on the user mailing list, it was decided to put all UDFs under `o.a.s.sql.functions`.
      
      cc rxin
      
      Author: Burak Yavuz <brkyvz@gmail.com>
      
      Closes #5923 from brkyvz/move-math-funcs and squashes the following commits:
      
      a8dc3f7 [Burak Yavuz] address comments
      cf7a7bb [Burak Yavuz] [SPARK-7358] Move DataFrame mathfunctions into functions
      ba2b5661
  17. May 01, 2015
    • [SPARK-7274] [SQL] Create Column expression for array/struct creation. · 37537760
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #5802 from rxin/SPARK-7274 and squashes the following commits:
      
      19aecaa [Reynold Xin] Fixed unicode tests.
      bfc1538 [Reynold Xin] Export all Python functions.
      2517b8c [Reynold Xin] Code review.
      23da335 [Reynold Xin] Fixed Python bug.
      132002e [Reynold Xin] Fixed tests.
      56fce26 [Reynold Xin] Added Python support.
      b0d591a [Reynold Xin] Fixed debug error.
      86926a6 [Reynold Xin] Added test suite.
      7dbb9ab [Reynold Xin] Ok one more.
      470e2f5 [Reynold Xin] One more MLlib ...
      e2d14f0 [Reynold Xin] [SPARK-7274][SQL] Create Column expression for array/struct creation.
      37537760
  18. Apr 30, 2015
    • [SPARK-7248] implemented random number generators for DataFrames · b5347a46
      Burak Yavuz authored
      Adds the functions `rand` (uniform distribution) and `randn` (normal distribution) as expressions to DataFrames.
      
      cc mengxr rxin
      
      Author: Burak Yavuz <brkyvz@gmail.com>
      
      Closes #5819 from brkyvz/df-rng and squashes the following commits:
      
      50d69d4 [Burak Yavuz] add seed for test that failed
      4234c3a [Burak Yavuz] fix Rand expression
      13cad5c [Burak Yavuz] couple fixes
      7d53953 [Burak Yavuz] waiting for hive tests
      b453716 [Burak Yavuz] move radn with seed down
      03637f0 [Burak Yavuz] fix broken hive func
      c5909eb [Burak Yavuz] deleted old implementation of Rand
      6d43895 [Burak Yavuz] implemented random generators
      b5347a46
  19. Apr 29, 2015
    • [SPARK-7188] added python support for math DataFrame functions · fe917f5e
      Burak Yavuz authored
      Adds support for the math functions for DataFrames in PySpark.
      
      rxin I love Davies.
      
      Author: Burak Yavuz <brkyvz@gmail.com>
      
      Closes #5750 from brkyvz/python-math-udfs and squashes the following commits:
      
      7c4f563 [Burak Yavuz] removed is_math
      3c4adde [Burak Yavuz] cleanup imports
      d5dca3f [Burak Yavuz] moved math functions to mathfunctions
      25e6534 [Burak Yavuz] addressed comments v2.0
      d3f7e0f [Burak Yavuz] addressed comments and added tests
      7b7d7c4 [Burak Yavuz] remove tests for removed methods
      33c2c15 [Burak Yavuz] fixed python style
      3ee0c05 [Burak Yavuz] added python functions
      fe917f5e
  20. Apr 28, 2015
    • [SPARK-7135][SQL] DataFrame expression for monotonically increasing IDs. · d94cd1a7
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #5709 from rxin/inc-id and squashes the following commits:
      
      7853611 [Reynold Xin] private sql.
      a9fda0d [Reynold Xin] Missed a few numbers.
      343d896 [Reynold Xin] Self review feedback.
      a7136cb [Reynold Xin] [SPARK-7135][SQL] DataFrame expression for monotonically increasing IDs.
      d94cd1a7
  21. Apr 26, 2015
  22. Apr 17, 2015
    • [SPARK-6957] [SPARK-6958] [SQL] improve API compatibility to pandas · c84d9169
      Davies Liu authored
      ```
      select(['cola', 'colb'])
      
      groupby(['colA', 'colB'])
      groupby([df.colA, df.colB])
      
      df.sort('A', ascending=True)
      df.sort(['A', 'B'], ascending=True)
      df.sort(['A', 'B'], ascending=[1, 0])
      ```
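The `ascending` argument in the calls above accepts either a single flag or one flag per column. The following pure-Python sketch shows one way such an argument can be normalised and applied; `sort_rows` is a hypothetical helper that operates on a list of dictionaries, not on a DataFrame.

```python
def sort_rows(rows, columns, ascending=True):
    """Sort dict rows by one or more columns with per-column direction."""
    if isinstance(columns, str):
        columns = [columns]
    if isinstance(ascending, (bool, int)):
        flags = [bool(ascending)] * len(columns)
    else:
        flags = [bool(flag) for flag in ascending]
    if len(flags) != len(columns):
        raise ValueError("ascending must be a flag or a list matching columns")

    result = list(rows)
    # Python's sort is stable, so sorting by the least significant column
    # first and the most significant last yields a multi-column ordering.
    for column, is_ascending in reversed(list(zip(columns, flags))):
        result.sort(key=lambda row: row[column], reverse=not is_ascending)
    return result
```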
      
      cc rxin
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #5544 from davies/compatibility and squashes the following commits:
      
      4944058 [Davies Liu] add docstrings
      adb2816 [Davies Liu] Merge branch 'master' of github.com:apache/spark into compatibility
      bcbbcab [Davies Liu] support ascending as list
      8dabdf0 [Davies Liu] improve API compatibility to pandas
      c84d9169
  23. Apr 16, 2015
    • [SPARK-4897] [PySpark] Python 3 support · 04e44b37
      Davies Liu authored
      This PR updates PySpark to support Python 3 (tested with 3.4).
      
      Known issue: unpickling arrays from Pyrolite is broken in Python 3; those tests are skipped.
      
      TODO: ec2/spark_ec2.py is not fully tested with Python 3.
      
      Author: Davies Liu <davies@databricks.com>
      Author: twneale <twneale@gmail.com>
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #5173 from davies/python3 and squashes the following commits:
      
      d7d6323 [Davies Liu] fix tests
      6c52a98 [Davies Liu] fix mllib test
      99e334f [Davies Liu] update timeout
      b716610 [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      cafd5ec [Davies Liu] adddress comments from @mengxr
      bf225d7 [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      179fc8d [Davies Liu] tuning flaky tests
      8c8b957 [Davies Liu] fix ResourceWarning in Python 3
      5c57c95 [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      4006829 [Davies Liu] fix test
      2fc0066 [Davies Liu] add python3 path
      71535e9 [Davies Liu] fix xrange and divide
      5a55ab4 [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      125f12c [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      ed498c8 [Davies Liu] fix compatibility with python 3
      820e649 [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      e8ce8c9 [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      ad7c374 [Davies Liu] fix mllib test and warning
      ef1fc2f [Davies Liu] fix tests
      4eee14a [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      20112ff [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      59bb492 [Davies Liu] fix tests
      1da268c [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      ca0fdd3 [Davies Liu] fix code style
      9563a15 [Davies Liu] add imap back for python 2
      0b1ec04 [Davies Liu] make python examples work with Python 3
      d2fd566 [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      a716d34 [Davies Liu] test with python 3.4
      f1700e8 [Davies Liu] fix test in python3
      671b1db [Davies Liu] fix test in python3
      692ff47 [Davies Liu] fix flaky test
      7b9699f [Davies Liu] invalidate import cache for Python 3.3+
      9c58497 [Davies Liu] fix kill worker
      309bfbf [Davies Liu] keep compatibility
      5707476 [Davies Liu] cleanup, fix hash of string in 3.3+
      8662d5b [Davies Liu] Merge branch 'master' of github.com:apache/spark into python3
      f53e1f0 [Davies Liu] fix tests
      70b6b73 [Davies Liu] compile ec2/spark_ec2.py in python 3
      a39167e [Davies Liu] support customize class in __main__
      814c77b [Davies Liu] run unittests with python 3
      7f4476e [Davies Liu] mllib tests passed
      d737924 [Davies Liu] pass ml tests
      375ea17 [Davies Liu] SQL tests pass
      6cc42a9 [Davies Liu] rename
      431a8de [Davies Liu] streaming tests pass
      78901a7 [Davies Liu] fix hash of serializer in Python 3
      24b2f2e [Davies Liu] pass all RDD tests
      35f48fe [Davies Liu] run future again
      1eebac2 [Davies Liu] fix conflict in ec2/spark_ec2.py
      6e3c21d [Davies Liu] make cloudpickle work with Python3
      2fb2db3 [Josh Rosen] Guard more changes behind sys.version; still doesn't run
      1aa5e8f [twneale] Turned out `pickle.DictionaryType is dict` == True, so swapped it out
      7354371 [twneale] buffer --> memoryview  I'm not super sure if this a valid change, but the 2.7 docs recommend using memoryview over buffer where possible, so hoping it'll work.
      b69ccdf [twneale] Uses the pure python pickle._Pickler instead of c-extension _pickle.Pickler. It appears pyspark 2.7 uses the pure python pickler as well, so this shouldn't degrade pickling performance (?).
      f40d925 [twneale] xrange --> range
      e104215 [twneale] Replaces 2.7 types.InstsanceType with 3.4 `object`....could be horribly wrong depending on how types.InstanceType is used elsewhere in the package--see http://bugs.python.org/issue8206
      79de9d0 [twneale] Replaces python2.7 `file` with 3.4 _io.TextIOWrapper
      2adb42d [Josh Rosen] Fix up some import differences between Python 2 and 3
      854be27 [Josh Rosen] Run `futurize` on Python code:
      7c5b4ce [Josh Rosen] Remove Python 3 check in shell.py.
      04e44b37
  24. Apr 08, 2015
    • [SPARK-6781] [SQL] use sqlContext in python shell · 6ada4f6f
      Davies Liu authored
      Use `sqlContext` in the PySpark shell to make it consistent with the SQL programming guide. `sqlCtx` is also kept for compatibility.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #5425 from davies/sqlCtx and squashes the following commits:
      
      af67340 [Davies Liu] sqlCtx -> sqlContext
      15a278f [Davies Liu] use sqlContext in python shell
      6ada4f6f
  25. Apr 01, 2015
    • [SPARK-6553] [pyspark] Support functools.partial as UDF · 757b2e91
      ksonj authored
      
      Use `f.__repr__()` instead of `f.__name__` when instantiating `UserDefinedFunction`s, so `functools.partial`s may be used.
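The motivation can be reproduced with the standard library alone: `functools.partial` objects typically do not define `__name__`, while `repr()` works for any object. The sketch below is illustrative only; `callable_label` is a hypothetical helper and not the code in `UserDefinedFunction`.

```python
import functools


def power(base, exponent):
    return base ** exponent


# A partial object wraps `power` with `exponent` pre-bound.
square = functools.partial(power, exponent=2)


def callable_label(f):
    """Return a label that works for any callable.

    repr() is defined for every object, whereas __name__ is only
    reliably present on plain functions and classes.
    """
    return repr(f)
```

The same helper also covers callable class instances, which matches the "Makes UDFs work with all types of callables" commit listed below.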
      
      Author: ksonj <kson@siberie.de>
      
      Closes #5206 from ksonj/partials and squashes the following commits:
      
      ea66f3d [ksonj] Inserted blank lines for PEP8 compliance
      d81b02b [ksonj] added tests for udf with partial function and callable object
      2c76100 [ksonj] Makes UDFs work with all types of callables
      b814a12 [ksonj] support functools.partial as udf
      
      (cherry picked from commit 98f72dfc)
      Signed-off-by: Josh Rosen <joshrosen@databricks.com>
      757b2e91
  26. Mar 31, 2015
    • [Doc] Improve Python DataFrame documentation · 305abe1e
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #5287 from rxin/pyspark-df-doc-cleanup-context and squashes the following commits:
      
      1841b60 [Reynold Xin] Lint.
      f2007f1 [Reynold Xin] functions and types.
      bc3b72b [Reynold Xin] More improvements to DataFrame Python doc.
      ac1d4c0 [Reynold Xin] Bug fix.
      b163365 [Reynold Xin] Python fix. Added Experimental flag to DataFrameNaFunctions.
      608422d [Reynold Xin] [Doc] Cleanup context.py Python docs.
      305abe1e
  27. Feb 24, 2015
    • [SPARK-5994] [SQL] Python DataFrame documentation fixes · d641fbb3
      Davies Liu authored
      select empty should NOT be the same as select; make sure selectExpr behaves the same.
      join param documentation
      link to source doesn't work in jekyll generated file
      cross reference of columns (i.e. enabling linking)
      show(): move df example before df.show()
      move tests in SQLContext out of docstring, otherwise the doc is too long
      Column.desc and .asc don't have any documentation
      in documentation, sort functions.*
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4756 from davies/df_docs and squashes the following commits:
      
      f30502c [Davies Liu] fix doc
      32f0d46 [Davies Liu] fix DataFrame docs
      d641fbb3
    • [SPARK-5985][SQL] DataFrame sortBy -> orderBy in Python. · fba11c2f
      Reynold Xin authored
      Also added desc/asc function for constructing sorting expressions more conveniently. And added a small fix to lift alias out of cast expression.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #4752 from rxin/SPARK-5985 and squashes the following commits:
      
      aeda5ae [Reynold Xin] Added Experimental flag to ColumnName.
      047ad03 [Reynold Xin] Lift alias out of cast.
      c9cf17c [Reynold Xin] [SPARK-5985][SQL] DataFrame sortBy -> orderBy in Python.
  28. Feb 17, 2015
    • [SPARK-5859] [PySpark] [SQL] fix DataFrame Python API · d8adefef
      Davies Liu authored
      1. add explain()
      2. add isLocal()
      3. do not call show() in __repr__
      4. add foreach() and foreachPartition()
      5. add distinct()
      6. fix functions.col()/column()/lit()
      7. fix unit tests in sql/functions.py
      8. fix unicode in showString()
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4645 from davies/df6 and squashes the following commits:
      
      6b46a2c [Davies Liu] fix DataFrame Python API
  29. Feb 16, 2015
    • [SPARK-5799][SQL] Compute aggregation function on specified numeric columns · 5c78be7a
      Liang-Chi Hsieh authored
      Compute aggregation functions on specified numeric columns. For example:
      
          val df = Seq(("a", 1, 0, "b"), ("b", 2, 4, "c"), ("a", 2, 3, "d")).toDataFrame("key", "value1", "value2", "rest")
          df.groupBy("key").min("value2")
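
To make the expected result of the example concrete, here is a plain-Python sketch (not Spark code) that computes the per-key minimum of one named column over the same sample rows. The helper `group_min` and the `columns` tuple are hypothetical names introduced only for illustration; they are not part of the Spark API, and the sketch says nothing about how Spark evaluates the aggregation internally.

```python
# Same sample rows as the Scala example: (key, value1, value2, rest).
rows = [("a", 1, 0, "b"), ("b", 2, 4, "c"), ("a", 2, 3, "d")]
columns = ("key", "value1", "value2", "rest")


def group_min(rows, columns, key, target):
    """Hypothetical helper: return {key value: minimum of `target`}."""
    key_idx = columns.index(key)
    target_idx = columns.index(target)
    result = {}
    for row in rows:
        k, v = row[key_idx], row[target_idx]
        # Keep the smallest value seen so far for each key.
        result[k] = v if k not in result else min(result[k], v)
    return result


print(group_min(rows, columns, "key", "value2"))  # {'a': 0, 'b': 4}
```

Only the named column (`value2`) is aggregated; the other columns, including the non-numeric `rest`, are ignored, which is the point of restricting the aggregation to specified columns.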
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #4592 from viirya/specific_cols_agg and squashes the following commits:
      
      9446896 [Liang-Chi Hsieh] For comments.
      314c4cd [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into specific_cols_agg
      353fad7 [Liang-Chi Hsieh] For python unit tests.
      54ed0c4 [Liang-Chi Hsieh] Address comments.
      b079e6b [Liang-Chi Hsieh] Remove duplicate codes.
      55100fb [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into specific_cols_agg
      880c2ac [Liang-Chi Hsieh] Fix Python style checks.
      4c63a01 [Liang-Chi Hsieh] Fix pyspark.
      b1a24fc [Liang-Chi Hsieh] Address comments.
      2592f29 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into specific_cols_agg
      27069c3 [Liang-Chi Hsieh] Combine functions and add varargs annotation.
      371a3f7 [Liang-Chi Hsieh] Compute aggregation function on specified numeric columns.
  30. Feb 14, 2015
    • [SPARK-5752][SQL] Don't implicitly convert RDDs directly to DataFrames · e98dfe62
      Reynold Xin authored
      - The old implicit would convert RDDs directly to DataFrames, and that added too many methods.
      - toDataFrame -> toDF
      - Dsl -> functions
      - implicits moved into SQLContext.implicits
      - addColumn -> withColumn
      - renameColumn -> withColumnRenamed
      
      Python changes:
      - toDataFrame -> toDF
      - Dsl -> functions package
      - addColumn -> withColumn
      - renameColumn -> withColumnRenamed
      - add toDF functions to RDD on SQLContext init
      - add flatMap to DataFrame
      
      Author: Reynold Xin <rxin@databricks.com>
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4556 from rxin/SPARK-5752 and squashes the following commits:
      
      5ef9910 [Reynold Xin] More fix
      61d3fca [Reynold Xin] Merge branch 'df5' of github.com:davies/spark into SPARK-5752
      ff5832c [Reynold Xin] Fix python
      749c675 [Reynold Xin] count(*) fixes.
      5806df0 [Reynold Xin] Fix build break again.
      d941f3d [Reynold Xin] Fixed explode compilation break.
      fe1267a [Davies Liu] flatMap
      c4afb8e [Reynold Xin] style
      d9de47f [Davies Liu] add comment
      b783994 [Davies Liu] add comment for toDF
      e2154e5 [Davies Liu] schema() -> schema
      3a1004f [Davies Liu] Dsl -> functions, toDF()
      fb256af [Reynold Xin] - toDataFrame -> toDF - Dsl -> functions - implicits moved into SQLContext.implicits - addColumn -> withColumn - renameColumn -> withColumnRenamed
      0dd74eb [Reynold Xin] [SPARK-5752][SQL] Don't implicitly convert RDDs directly to DataFrames
      97dd47c [Davies Liu] fix mistake
      6168f74 [Davies Liu] fix test
      1fc0199 [Davies Liu] fix test
      a075cd5 [Davies Liu] clean up, toPandas
      663d314 [Davies Liu] add test for agg('*')
      9e214d5 [Reynold Xin] count(*) fixes.
      1ed7136 [Reynold Xin] Fix build break again.
      921b2e3 [Reynold Xin] Fixed explode compilation break.
      14698d4 [Davies Liu] flatMap
      ba3e12d [Reynold Xin] style
      d08c92d [Davies Liu] add comment
      5c8b524 [Davies Liu] add comment for toDF
      a4e5e66 [Davies Liu] schema() -> schema
      d377fc9 [Davies Liu] Dsl -> functions, toDF()
      6b3086c [Reynold Xin] - toDataFrame -> toDF - Dsl -> functions - implicits moved into SQLContext.implicits - addColumn -> withColumn - renameColumn -> withColumnRenamed
      807e8b1 [Reynold Xin] [SPARK-5752][SQL] Don't implicitly convert RDDs directly to DataFrames