    [SPARK-17153][SQL] Should read partition data when reading new files in filestream without globbing · 8135e0e5
    Liang-Chi Hsieh authored
    ## What changes were proposed in this pull request?
    
    When reading a file stream from a non-glob path, every returned row has `null` in the
    partition columns, because the partition values are not read for the new files. E.g.,
    
        import spark.implicits._  // Encoder for A; already imported in spark-shell

        case class A(id: Int, value: Int)
        val data = spark.createDataset(Seq(
          A(1, 1),
          A(2, 2),
          A(2, 3)
        ))
        val url = "/tmp/test"
        data.write.partitionBy("id").parquet(url)
        spark.read.parquet(url).show
    
        +-----+---+
        |value| id|
        +-----+---+
        |    2|  2|
        |    3|  2|
        |    1|  1|
        +-----+---+
    
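        // File stream sources need a user-specified schema by default; reuse the batch one (value, id)
        val s = spark.readStream.schema(spark.read.load(url).schema).parquet(url)
        val query = s.writeStream.queryName("test").format("memory").start()
        query.processAllAvailable()  // block until the files already under url are processed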
    
        sql("SELECT * FROM test").show
    
        +-----+----+
        |value|  id|
        +-----+----+
        |    2|null|
        |    3|null|
        |    1|null|
        +-----+----+
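    With `partitionBy("id")`, the `id` values are not stored inside the Parquet files. They are encoded in directory names such as `/tmp/test/id=2/`. The Scala sketch below illustrates that Hive-style layout only and is not Spark's partition-discovery code. The object `PartitionPathSketch`, its method `partitionValues`, and the file name `part-00000.parquet` are all hypothetical. The sketch shows that `id` can be recovered only when a file path is interpreted relative to the table root (`/tmp/test`). Relative to the partition directory itself, no `key=value` segment is left to parse, which is one way to picture the `null`s above.

    ```scala
    // Hypothetical illustration, not a Spark API: extracts Hive-style
    // key=value directory segments between a base path and a data file.
    object PartitionPathSketch {
      def partitionValues(basePath: String, filePath: String): Map[String, String] = {
        val base = basePath.stripSuffix("/")
        require(filePath.startsWith(base + "/"), s"$filePath is not under $base")
        filePath
          .stripPrefix(base + "/")
          .split("/")
          .dropRight(1) // the last segment is the data file name, not a partition
          .flatMap { segment =>
            segment.split("=", 2) match {
              case Array(k, v) => Some(k -> v) // a key=value partition directory
              case _           => None         // any other directory is ignored
            }
          }
          .toMap
      }

      def main(args: Array[String]): Unit = {
        val file = "/tmp/test/id=2/part-00000.parquet"
        println(partitionValues("/tmp/test", file))      // Map(id -> 2)
        println(partitionValues("/tmp/test/id=2", file)) // Map()
      }
    }
    ```

    Treating the root as the base path is what makes `id` recoverable. This is why the directory that partition values are resolved against matters more than the individual file paths.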
    
    ## How was this patch tested?
    
    Jenkins tests.
    
    Author: Liang-Chi Hsieh <simonh@tw.ibm.com>
    Author: Liang-Chi Hsieh <viirya@gmail.com>
    
    Closes #14803 from viirya/filestreamsource-option.