    4bac703e
    [SPARK-13667][SQL] Support for specifying custom date format for date and timestamp types at CSV datasource.
    
    ## What changes were proposed in this pull request?
    
    This PR adds support for specifying a custom date format for `DateType` and `TimestampType` in the CSV datasource.
    
    For `TimestampType`, the given format is used both to infer the schema and to convert the values.
    For `DateType`, the given format is used to convert the values.
    If `dateFormat` is not given, parsing falls back to `DateTimeUtils.stringToTime()` for backwards compatibility.
    When it is given, `SimpleDateFormat` is used to parse the data.
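
The two parsing paths described above can be sketched roughly as follows. This is a minimal standalone illustration, not the code in this patch: the class and method names are hypothetical, a `null` format stands in for "option not given", and `java.sql.Timestamp.valueOf` is only an assumed stand-in for the legacy `DateTimeUtils.stringToTime()` path, which lives inside Spark.

```java
import java.sql.Timestamp;
import java.text.ParseException;
import java.text.SimpleDateFormat;

// Hypothetical helper, for illustration only.
class DateFormatSketch {

    // Returns epoch milliseconds for the given text.
    // dateFormat == null models the case where the option is not set.
    static long parseTimestamp(String value, String dateFormat) {
        if (dateFormat != null) {
            try {
                // Custom format given: parse with SimpleDateFormat.
                return new SimpleDateFormat(dateFormat).parse(value).getTime();
            } catch (ParseException e) {
                throw new IllegalArgumentException("Cannot parse: " + value, e);
            }
        }
        // No format given: fall back to a default parser.
        // ASSUMPTION: Timestamp.valueOf stands in for DateTimeUtils.stringToTime().
        return Timestamp.valueOf(value).getTime();
    }

    public static void main(String[] args) {
        long custom = parseTimestamp("26/08/2015 18:00", "dd/MM/yyyy HH:mm");
        long legacy = parseTimestamp("2015-08-26 18:00:00", null);
        // Both calls describe the same instant in the JVM's default time zone.
        System.out.println(custom == legacy);
    }
}
```

Both paths use the JVM's default time zone here. Whether the real implementation handles time zones the same way is not stated in this message.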
    
    In addition, `IntegerType`, `DoubleType` and `LongType` take priority over `TimestampType` in type inference. This means that even if the given format is `yyyy` or `yyyy.MM`, a matching column is inferred as `IntegerType` or `DoubleType` rather than `TimestampType`. Since this is type inference, I think this precedence is acceptable.
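
To make the precedence concrete, here is a small standalone sketch of the order in which a single field might be tried. It is not the `InferSchema` code itself: the class and method names are hypothetical, type names are returned as plain strings, and the relative order of the numeric types is an assumption. Only "numeric types before `TimestampType`" comes from the description above.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;

// Hypothetical helper, for illustration only.
class InferencePrecedenceSketch {

    // Tries candidate types in order and returns the first that fits.
    static String inferType(String field, String dateFormat) {
        try {
            Integer.parseInt(field);
            return "IntegerType";
        } catch (NumberFormatException e) {
            // Not an int; try the next candidate.
        }
        try {
            Long.parseLong(field);
            return "LongType";
        } catch (NumberFormatException e) {
            // Not a long; try the next candidate.
        }
        try {
            Double.parseDouble(field);
            return "DoubleType";
        } catch (NumberFormatException e) {
            // Not a double; try the next candidate.
        }
        if (dateFormat != null) {
            try {
                // The timestamp check only runs after every numeric type has failed.
                new SimpleDateFormat(dateFormat).parse(field);
                return "TimestampType";
            } catch (ParseException e) {
                // Not a timestamp in the given format.
            }
        }
        return "StringType";
    }

    public static void main(String[] args) {
        System.out.println(inferType("2015", "yyyy"));             // IntegerType
        System.out.println(inferType("2015.08", "yyyy.MM"));       // DoubleType
        System.out.println(inferType("26/08/2015", "dd/MM/yyyy")); // TimestampType
    }
}
```

A format such as `dd/MM/yyyy` is not affected by the precedence, because its values cannot be read as numbers.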
    
    In addition, I renamed `csv.CSVInferSchema` to `csv.InferSchema`, as the JSON datasource has `json.InferSchema`. Although the two now share a name, the parent package name still distinguishes them. Accordingly, the suite was also renamed from `CSVInferSchemaSuite` to `InferSchemaSuite`.
    
    ## How was this patch tested?
    
    Tested with unit tests, and with `./dev/run_tests` for coding style checks.
    
    Author: hyukjinkwon <gurwls223@gmail.com>
    
    Closes #11550 from HyukjinKwon/SPARK-13667.