Commit 4bac703e authored by hyukjinkwon, committed by Reynold Xin

[SPARK-13667][SQL] Support for specifying custom date format for date and timestamp types at CSV datasource.

## What changes were proposed in this pull request?

This PR adds support for specifying a custom date format for `DateType` and `TimestampType`.

For `TimestampType`, the given format is used both to infer the schema and to convert the values.
For `DateType`, the given format is used to convert the values.
If `dateFormat` is not given, parsing falls back to `DateTimeUtils.stringToTime()` for backwards compatibility.
If it is given, `SimpleDateFormat` is used to parse the data.
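
The branch described above can be sketched in plain Java. This is a minimal illustration, not the code in this patch: `DateFormatSketch`, `parseMillis`, and `FALLBACK_PATTERN` are hypothetical names, the fixed fallback pattern merely stands in for `DateTimeUtils.stringToTime()`, and the UTC time zone, `Locale.US`, and non-lenient parsing are assumptions chosen to make the result reproducible.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

final class DateFormatSketch {
    // Stand-in for the default path. In Spark this is DateTimeUtils.stringToTime();
    // the fixed pattern below is only an assumption to keep the sketch self-contained.
    static final String FALLBACK_PATTERN = "yyyy-MM-dd HH:mm:ss";

    // Returns epoch milliseconds, or null when the value does not match the pattern.
    static Long parseMillis(String value, String dateFormat) {
        // Use the user-supplied pattern when present, otherwise the fallback.
        String pattern = (dateFormat != null) ? dateFormat : FALLBACK_PATTERN;
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.US);
        format.setTimeZone(TimeZone.getTimeZone("UTC")); // fixed zone so results are reproducible
        format.setLenient(false);                         // reject out-of-range fields such as month 13
        try {
            return format.parse(value).getTime();
        } catch (ParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Custom pattern supplied, as with the `dateFormat` option.
        System.out.println(parseMillis("26/08/2015 18:00", "dd/MM/yyyy HH:mm"));
        // No pattern supplied: the fallback path is taken.
        System.out.println(parseMillis("2015-08-26 18:00:00", null));
        // A value that does not match the pattern yields null.
        System.out.println(parseMillis("not a date", "dd/MM/yyyy HH:mm"));
    }
}
```

Returning `null` on a mismatch is a simplification for the sketch; how the datasource reports malformed values is outside what this description states.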

In addition, `IntegerType`, `DoubleType` and `LongType` have a higher priority than `TimestampType` in type inference. This means that even if the given format is `yyyy` or `yyyy.MM`, matching values will be inferred as `IntegerType` or `DoubleType` rather than `TimestampType`. Since this is type inference, I think it is okay to give the numeric types this precedence.
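
To make the precedence concrete, here is a small per-value sketch in Java. It is an illustration under stated assumptions rather than the logic in `InferSchema`: the class, enum, and method names are hypothetical, it looks at a single value instead of merging types across rows, and it skips the no-format fallback path entirely.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;

final class InferencePrecedenceSketch {
    enum FieldType { INTEGER, LONG, DOUBLE, TIMESTAMP, STRING }

    // Numeric types are tried first, so a timestamp pattern is consulted
    // only for values that are not valid numbers.
    static FieldType inferField(String value, String dateFormat) {
        if (isInt(value)) return FieldType.INTEGER;
        if (isLong(value)) return FieldType.LONG;
        if (isDouble(value)) return FieldType.DOUBLE;
        if (isTimestamp(value, dateFormat)) return FieldType.TIMESTAMP;
        return FieldType.STRING;
    }

    static boolean isInt(String v) {
        try { Integer.parseInt(v); return true; } catch (NumberFormatException e) { return false; }
    }

    static boolean isLong(String v) {
        try { Long.parseLong(v); return true; } catch (NumberFormatException e) { return false; }
    }

    static boolean isDouble(String v) {
        try { Double.parseDouble(v); return true; } catch (NumberFormatException e) { return false; }
    }

    static boolean isTimestamp(String v, String dateFormat) {
        if (dateFormat == null) return false; // fallback path omitted in this sketch
        try {
            new SimpleDateFormat(dateFormat, Locale.US).parse(v);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(inferField("2015", "yyyy"));
        System.out.println(inferField("2015.08", "yyyy.MM"));
        System.out.println(inferField("26/08/2015 18:00", "dd/MM/yyyy HH:mm"));
    }
}
```

With this ordering, `2015` under the format `yyyy` never reaches the timestamp check, which is the behavior the paragraph above describes.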

I also renamed `csv.CSVInferSchema` to `csv.InferSchema`, since the JSON datasource has `json.InferSchema`. The two classes now share a name, but the parent package still distinguishes them. Accordingly, the suite was renamed from `CSVInferSchemaSuite` to `InferSchemaSuite`.

## How was this patch tested?

Unit tests were added, and `./dev/run_tests` was run for the coding style checks.

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #11550 from HyukjinKwon/SPARK-13667.
parent ac41fc64
Showing with 173 additions and 66 deletions