Skip to content
Snippets Groups Projects
  • Dongjoon Hyun's avatar
    d42bb7cc
    [SPARK-17982][SQL] SQLBuilder should wrap the generated SQL with parenthesis for LIMIT · d42bb7cc
    Dongjoon Hyun authored
    ## What changes were proposed in this pull request?
    
    Currently, `SQLBuilder` handles `LIMIT` by always adding `LIMIT` at the end of the generated subSQL. It makes `RuntimeException`s like the following. This PR adds a parenthesis always except `SubqueryAlias` is used together with `LIMIT`.
    
    **Before**
    
    ``` scala
    scala> sql("CREATE TABLE tbl(id INT)")
    scala> sql("CREATE VIEW v1(id2) AS SELECT id FROM tbl LIMIT 2")
    java.lang.RuntimeException: Failed to analyze the canonicalized SQL: ...
    ```
    
    **After**
    
    ``` scala
    scala> sql("CREATE TABLE tbl(id INT)")
    scala> sql("CREATE VIEW v1(id2) AS SELECT id FROM tbl LIMIT 2")
    scala> sql("SELECT id2 FROM v1")
    res4: org.apache.spark.sql.DataFrame = [id2: int]
    ```
    
    **Fixed cases in this PR**
    
    The following two cases are the detail query plans having problematic SQL generations.
    
    1. `SELECT * FROM (SELECT id FROM tbl LIMIT 2)`
    
        Please note that **FROM SELECT** part of the generated SQL in the below. When we don't use '()' for limit, this fails.
    
    ```scala
    # Original logical plan:
    Project [id#1]
    +- GlobalLimit 2
       +- LocalLimit 2
          +- Project [id#1]
             +- MetastoreRelation default, tbl
    
    # Canonicalized logical plan:
    Project [gen_attr_0#1 AS id#4]
    +- SubqueryAlias tbl
       +- Project [gen_attr_0#1]
          +- GlobalLimit 2
             +- LocalLimit 2
                +- Project [gen_attr_0#1]
                   +- SubqueryAlias gen_subquery_0
                      +- Project [id#1 AS gen_attr_0#1]
                         +- SQLTable default, tbl, [id#1]
    
    # Generated SQL:
    SELECT `gen_attr_0` AS `id` FROM (SELECT `gen_attr_0` FROM SELECT `gen_attr_0` FROM (SELECT `id` AS `gen_attr_0` FROM `default`.`tbl`) AS gen_subquery_0 LIMIT 2) AS tbl
    ```
    
    2. `SELECT * FROM (SELECT id FROM tbl TABLESAMPLE (2 ROWS))`
    
        Please note that **((~~~) AS gen_subquery_0 LIMIT 2)** in the below. When we use '()' for limit on `SubqueryAlias`, this fails.
    
    ```scala
    # Original logical plan:
    Project [id#1]
    +- Project [id#1]
       +- GlobalLimit 2
          +- LocalLimit 2
             +- MetastoreRelation default, tbl
    
    # Canonicalized logical plan:
    Project [gen_attr_0#1 AS id#4]
    +- SubqueryAlias tbl
       +- Project [gen_attr_0#1]
          +- GlobalLimit 2
             +- LocalLimit 2
                +- SubqueryAlias gen_subquery_0
                   +- Project [id#1 AS gen_attr_0#1]
                      +- SQLTable default, tbl, [id#1]
    
    # Generated SQL:
    SELECT `gen_attr_0` AS `id` FROM (SELECT `gen_attr_0` FROM ((SELECT `id` AS `gen_attr_0` FROM `default`.`tbl`) AS gen_subquery_0 LIMIT 2)) AS tbl
    ```
    
    ## How was this patch tested?
    
    Pass the Jenkins test with a newly added test case.
    
    Author: Dongjoon Hyun <dongjoon@apache.org>
    
    Closes #15546 from dongjoon-hyun/SPARK-17982.
    d42bb7cc
    History
    [SPARK-17982][SQL] SQLBuilder should wrap the generated SQL with parenthesis for LIMIT
    Dongjoon Hyun authored
    ## What changes were proposed in this pull request?
    
    Currently, `SQLBuilder` handles `LIMIT` by always adding `LIMIT` at the end of the generated subSQL. It makes `RuntimeException`s like the following. This PR adds a parenthesis always except `SubqueryAlias` is used together with `LIMIT`.
    
    **Before**
    
    ``` scala
    scala> sql("CREATE TABLE tbl(id INT)")
    scala> sql("CREATE VIEW v1(id2) AS SELECT id FROM tbl LIMIT 2")
    java.lang.RuntimeException: Failed to analyze the canonicalized SQL: ...
    ```
    
    **After**
    
    ``` scala
    scala> sql("CREATE TABLE tbl(id INT)")
    scala> sql("CREATE VIEW v1(id2) AS SELECT id FROM tbl LIMIT 2")
    scala> sql("SELECT id2 FROM v1")
    res4: org.apache.spark.sql.DataFrame = [id2: int]
    ```
    
    **Fixed cases in this PR**
    
    The following two cases are the detail query plans having problematic SQL generations.
    
    1. `SELECT * FROM (SELECT id FROM tbl LIMIT 2)`
    
        Please note that **FROM SELECT** part of the generated SQL in the below. When we don't use '()' for limit, this fails.
    
    ```scala
    # Original logical plan:
    Project [id#1]
    +- GlobalLimit 2
       +- LocalLimit 2
          +- Project [id#1]
             +- MetastoreRelation default, tbl
    
    # Canonicalized logical plan:
    Project [gen_attr_0#1 AS id#4]
    +- SubqueryAlias tbl
       +- Project [gen_attr_0#1]
          +- GlobalLimit 2
             +- LocalLimit 2
                +- Project [gen_attr_0#1]
                   +- SubqueryAlias gen_subquery_0
                      +- Project [id#1 AS gen_attr_0#1]
                         +- SQLTable default, tbl, [id#1]
    
    # Generated SQL:
    SELECT `gen_attr_0` AS `id` FROM (SELECT `gen_attr_0` FROM SELECT `gen_attr_0` FROM (SELECT `id` AS `gen_attr_0` FROM `default`.`tbl`) AS gen_subquery_0 LIMIT 2) AS tbl
    ```
    
    2. `SELECT * FROM (SELECT id FROM tbl TABLESAMPLE (2 ROWS))`
    
        Please note that **((~~~) AS gen_subquery_0 LIMIT 2)** in the below. When we use '()' for limit on `SubqueryAlias`, this fails.
    
    ```scala
    # Original logical plan:
    Project [id#1]
    +- Project [id#1]
       +- GlobalLimit 2
          +- LocalLimit 2
             +- MetastoreRelation default, tbl
    
    # Canonicalized logical plan:
    Project [gen_attr_0#1 AS id#4]
    +- SubqueryAlias tbl
       +- Project [gen_attr_0#1]
          +- GlobalLimit 2
             +- LocalLimit 2
                +- SubqueryAlias gen_subquery_0
                   +- Project [id#1 AS gen_attr_0#1]
                      +- SQLTable default, tbl, [id#1]
    
    # Generated SQL:
    SELECT `gen_attr_0` AS `id` FROM (SELECT `gen_attr_0` FROM ((SELECT `id` AS `gen_attr_0` FROM `default`.`tbl`) AS gen_subquery_0 LIMIT 2)) AS tbl
    ```
    
    ## How was this patch tested?
    
    Pass the Jenkins test with a newly added test case.
    
    Author: Dongjoon Hyun <dongjoon@apache.org>
    
    Closes #15546 from dongjoon-hyun/SPARK-17982.