Commit dbf3e298 authored by Shixiong Zhu

[SPARK-18764][CORE] Add a warning log when skipping a corrupted file

## What changes were proposed in this pull request?

Log a warning, naming the file, whenever a corrupted file is skipped. This helps when we want the job to finish first and then find the corrupted files in the log so we can fix them.

## How was this patch tested?

Jenkins

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #16192 from zsxwing/SPARK-18764.
parent f1fca81b
@@ -259,7 +259,9 @@ class HadoopRDD[K, V](
 try {
   finished = !reader.next(key, value)
 } catch {
-  case e: IOException if ignoreCorruptFiles => finished = true
+  case e: IOException if ignoreCorruptFiles =>
+    logWarning(s"Skipped the rest content in the corrupted file: ${split.inputSplit}", e)
+    finished = true
 }
 if (!finished) {
   inputMetrics.incRecordsRead(1)
@@ -189,7 +189,11 @@ class NewHadoopRDD[K, V](
 try {
   finished = !reader.nextKeyValue
 } catch {
-  case e: IOException if ignoreCorruptFiles => finished = true
+  case e: IOException if ignoreCorruptFiles =>
+    logWarning(
+      s"Skipped the rest content in the corrupted file: ${split.serializableHadoopSplit}",
+      e)
+    finished = true
 }
 if (finished) {
   // Close and release the reader here; close() will also be called when the task
@@ -139,6 +139,7 @@ class FileScanRDD(
   }
 } catch {
   case e: IOException =>
+    logWarning(s"Skipped the rest content in the corrupted file: $currentFile", e)
     finished = true
     null
 }
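All three hunks apply the same pattern: when reading the next record throws an `IOException` and corrupted files are being ignored, log a warning that names the file and mark the iterator as finished, so the rest of that file is skipped instead of failing the task. The sketch below shows that pattern in plain Scala with no Spark dependency. `SimpleReader`, `SkippingIterator`, and the injected `logWarning` function are hypothetical stand-ins, not Spark's `RecordReader`, `NextIterator`, or `Logging` APIs.

```scala
import java.io.IOException

// Hypothetical stand-in for a Hadoop record reader: returns the next record,
// None at end of input, or throws IOException when the file is corrupted.
trait SimpleReader {
  def next(): Option[String]
}

// Minimal sketch of the skip-and-warn pattern, assuming a caller-supplied
// logWarning function in place of Spark's Logging trait.
class SkippingIterator(
    reader: SimpleReader,
    fileName: String,
    ignoreCorruptFiles: Boolean,
    logWarning: (String, Throwable) => Unit) extends Iterator[String] {

  private var finished = false
  private var buffered: Option[String] = None

  // Pull one record from the reader unless one is already buffered or we are done.
  private def fetch(): Unit = {
    if (!finished && buffered.isEmpty) {
      try {
        buffered = reader.next()
        if (buffered.isEmpty) finished = true
      } catch {
        // Swallow the error only when the caller opted in; otherwise it propagates.
        case e: IOException if ignoreCorruptFiles =>
          logWarning(s"Skipped the rest content in the corrupted file: $fileName", e)
          finished = true
      }
    }
  }

  override def hasNext: Boolean = { fetch(); buffered.isDefined }

  override def next(): String = {
    if (!hasNext) throw new NoSuchElementException("end of input")
    val record = buffered.get
    buffered = None
    record
  }
}
```

For example, wrapping a reader that throws after two records with `ignoreCorruptFiles = true` yields those two records plus one warning naming the file. With `ignoreCorruptFiles = false`, the `IOException` propagates to the caller, which in Spark would fail the task.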