Commit ca4e960a authored by aokolnychyi's avatar aokolnychyi Committed by Takuya UESHIN

[SPARK-17914][SQL] Fix parsing of timestamp strings with nanoseconds

This PR contains a tiny change to the way Spark parses string literals into timestamps. Currently, timestamps that contain nanoseconds are corrupted during the conversion from `UTF8String` into Spark's internal long-based timestamp representation.

Consider the following example:
```
spark.sql("SELECT cast('2015-01-02 00:00:00.000000001' as TIMESTAMP)").show(false)
+------------------------------------------------+
|CAST(2015-01-02 00:00:00.000000001 AS TIMESTAMP)|
+------------------------------------------------+
|2015-01-02 00:00:00.000001                      |
+------------------------------------------------+
```

The root cause is that the fractional part is parsed into a plain integer, so a 9-digit value such as `000000001` becomes `1`, and the old code (which only rewrote values greater than 999999) then treated it as 1 microsecond instead of truncating it to 0. The fix instead divides the value by 10 once for each digit beyond microsecond precision. The change was verified with the existing tests, and a new test covers the cases that previously failed.
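The truncation the fix performs can be sketched in isolation. This is illustrative code, not the Spark source; the object and names `TruncationSketch`, `truncateToMicros`, `fraction`, and `digits` are my own:

```scala
// Minimal sketch of the new truncation logic, assuming the fractional part
// of the timestamp has already been parsed into an integer value plus a
// count of how many digits it had in the source string.
object TruncationSketch {
  def truncateToMicros(fraction: Int, digits: Int): Int = {
    var f = fraction
    var d = digits
    // Drop one sub-microsecond digit per iteration, losing precision.
    while (d > 6) {
      f /= 10
      d -= 1
    }
    f
  }
}
```

With this logic, ".000000001" (value 1, 9 digits) truncates to 0 microseconds, whereas the old `toString.take(6)` approach only fired for values above 999999 and left it as 1 microsecond.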

Author: aokolnychyi <anton.okolnychyi@sap.com>

Closes #18252 from aokolnychyi/spark-17914.
parent 22dd65f5
Changes in `DateTimeUtils`:

```diff
@@ -32,7 +32,7 @@ import org.apache.spark.unsafe.types.UTF8String
  * Helper functions for converting between internal and external date and time representations.
  * Dates are exposed externally as java.sql.Date and are represented internally as the number of
  * dates since the Unix epoch (1970-01-01). Timestamps are exposed externally as java.sql.Timestamp
- * and are stored internally as longs, which are capable of storing timestamps with 100 nanosecond
+ * and are stored internally as longs, which are capable of storing timestamps with microsecond
  * precision.
  */
 object DateTimeUtils {
```
```diff
@@ -399,13 +399,14 @@ object DateTimeUtils {
       digitsMilli += 1
     }

-    if (!justTime && isInvalidDate(segments(0), segments(1), segments(2))) {
-      return None
+    // We are truncating the nanosecond part, which results in loss of precision
+    while (digitsMilli > 6) {
+      segments(6) /= 10
+      digitsMilli -= 1
     }

-    // Instead of return None, we truncate the fractional seconds to prevent inserting NULL
-    if (segments(6) > 999999) {
-      segments(6) = segments(6).toString.take(6).toInt
+    if (!justTime && isInvalidDate(segments(0), segments(1), segments(2))) {
+      return None
     }

     if (segments(3) < 0 || segments(3) > 23 || segments(4) < 0 || segments(4) > 59 ||
```
New test in `DateTimeUtilsSuite`:

```diff
@@ -34,6 +34,22 @@ class DateTimeUtilsSuite extends SparkFunSuite {
     ((timestamp + tz.getOffset(timestamp)) / MILLIS_PER_DAY).toInt
   }

+  test("nanoseconds truncation") {
+    def checkStringToTimestamp(originalTime: String, expectedParsedTime: String) {
+      val parsedTimestampOp = DateTimeUtils.stringToTimestamp(UTF8String.fromString(originalTime))
+      assert(parsedTimestampOp.isDefined, "timestamp with nanoseconds was not parsed correctly")
+      assert(DateTimeUtils.timestampToString(parsedTimestampOp.get) === expectedParsedTime)
+    }
+
+    checkStringToTimestamp("2015-01-02 00:00:00.123456789", "2015-01-02 00:00:00.123456")
+    checkStringToTimestamp("2015-01-02 00:00:00.100000009", "2015-01-02 00:00:00.1")
+    checkStringToTimestamp("2015-01-02 00:00:00.000050000", "2015-01-02 00:00:00.00005")
+    checkStringToTimestamp("2015-01-02 00:00:00.12005", "2015-01-02 00:00:00.12005")
+    checkStringToTimestamp("2015-01-02 00:00:00.100", "2015-01-02 00:00:00.1")
+    checkStringToTimestamp("2015-01-02 00:00:00.000456789", "2015-01-02 00:00:00.000456")
+    checkStringToTimestamp("1950-01-02 00:00:00.000456789", "1950-01-02 00:00:00.000456")
+  }
+
   test("timestamp and us") {
     val now = new Timestamp(System.currentTimeMillis())
     now.setNanos(1000)
```