    [SPARK-5097][SQL] DataFrame · 119f45d6
    Reynold Xin authored
    This pull request redesigns the existing Spark SQL DSL, which already provides data-frame-like functionality.
    
    TODOs:
    With the exception of Python support, other tasks can be done in separate, follow-up PRs.
    - [ ] Audit of the API
    - [ ] Documentation
    - [ ] More test cases to cover the new API
    - [x] Python support
    - [ ] Type alias SchemaRDD
    
    Author: Reynold Xin <rxin@databricks.com>
    Author: Davies Liu <davies@databricks.com>
    
    Closes #4173 from rxin/df1 and squashes the following commits:
    
    0a1a73b [Reynold Xin] Merge branch 'df1' of github.com:rxin/spark into df1
    23b4427 [Reynold Xin] Mima.
    828f70d [Reynold Xin] Merge pull request #7 from davies/df
    257b9e6 [Davies Liu] add repartition
    6bf2b73 [Davies Liu] fix collect with UDT and tests
    e971078 [Reynold Xin] Missing quotes.
    b9306b4 [Reynold Xin] Remove removeColumn/updateColumn for now.
    a728bf2 [Reynold Xin] Example rename.
    e8aa3d3 [Reynold Xin] groupby -> groupBy.
    9662c9e [Davies Liu] improve DataFrame Python API
    4ae51ea [Davies Liu] python API for dataframe
    1e5e454 [Reynold Xin] Fixed a bug with symbol conversion.
    2ca74db [Reynold Xin] Couple minor fixes.
    ea98ea1 [Reynold Xin] Documentation & literal expressions.
    2b22684 [Reynold Xin] Got rid of IntelliJ problems.
    02bbfbc [Reynold Xin] Tightening imports.
    ffbce66 [Reynold Xin] Fixed compilation error.
    59b6d8b [Reynold Xin] Style violation.
    b85edfb [Reynold Xin] ALS.
    8c37f0a [Reynold Xin] Made MLlib and examples compile
    6d53134 [Reynold Xin] Hive module.
    d35efd5 [Reynold Xin] Fixed compilation error.
    ce4a5d2 [Reynold Xin] Fixed test cases in SQL except ParquetIOSuite.
    66d5ef1 [Reynold Xin] SQLContext minor patch.
    c9bcdc0 [Reynold Xin] Checkpoint: SQL module compiles!