Commit ab4a6bfd authored by Rajesh Balamohan, committed by Reynold Xin

[SPARK-12898] Consider having dummyCallSite for HiveTableScan

Currently, HiveTableScan runs with getCallSite, which is really expensive and shows up when scanning a large partitioned table (e.g., TPC-DS), slowing down the overall runtime of the job. It would be good to use dummyCallSite in HiveTableScan instead.

Author: Rajesh Balamohan <rbalamohan@apache.org>

Closes #10825 from rajeshbalamohan/SPARK-12898.
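
For background: Spark captures a "call site" (the user-code location that created an RDD) for its UI and logs, and that capture materializes and scans the current thread's stack trace. Doing this once per RDD is cheap; doing it once per Hive partition while scanning a heavily partitioned table is not. The standalone Scala sketch below illustrates the cost being avoided; it is not Spark code, and the iteration count and frame-filtering heuristic are arbitrary choices for the example.

// Standalone sketch (not Spark code): the cost being avoided is roughly the
// cost of capturing a stack trace once per created RDD. With thousands of
// partition RDDs per scan, this adds up.
object CallSiteCost {
  def main(args: Array[String]): Unit = {
    val iterations = 100000

    // Simulates what a call-site helper has to do: materialize the current
    // stack trace and inspect its frames to find the interesting caller.
    def captureCallSite(): String = {
      val trace = Thread.currentThread.getStackTrace
      trace.dropWhile(_.getClassName.startsWith("java."))
        .headOption.map(_.toString).getOrElse("<unknown>")
    }

    val start = System.nanoTime()
    var i = 0
    while (i < iterations) {
      captureCallSite()
      i += 1
    }
    val elapsedMs = (System.nanoTime() - start) / 1e6
    println(f"$iterations%d stack-trace captures took $elapsedMs%.1f ms")
  }
}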
parent e75e340a
@@ -32,6 +32,7 @@ import org.apache.spark.sql.catalyst.expressions._
 import org.apache.spark.sql.execution._
 import org.apache.spark.sql.hive._
 import org.apache.spark.sql.types.{BooleanType, DataType}
+import org.apache.spark.util.Utils

 /**
  * The Hive table scan operator. Column and partition pruning are both handled.
@@ -133,11 +134,17 @@ case class HiveTableScan(
   }

   protected override def doExecute(): RDD[InternalRow] = {
+    // Using dummyCallSite, as getCallSite can turn out to be expensive with
+    // multiple partitions.
     val rdd = if (!relation.hiveQlTable.isPartitioned) {
-      hadoopReader.makeRDDForTable(relation.hiveQlTable)
+      Utils.withDummyCallSite(sqlContext.sparkContext) {
+        hadoopReader.makeRDDForTable(relation.hiveQlTable)
+      }
     } else {
-      hadoopReader.makeRDDForPartitionedTable(
-        prunePartitions(relation.getHiveQlPartitions(partitionPruningPred)))
+      Utils.withDummyCallSite(sqlContext.sparkContext) {
+        hadoopReader.makeRDDForPartitionedTable(
+          prunePartitions(relation.getHiveQlPartitions(partitionPruningPred)))
+      }
     }

     rdd.mapPartitionsInternal { iter =>
       val proj = UnsafeProjection.create(schema)
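
For reference, Utils.withDummyCallSite is a Spark-internal helper (private[spark], so usable only from Spark's own packages). The idea is to temporarily set the call-site local properties on the SparkContext so that RDDs created inside the block skip the stack walk, restoring the previous values afterwards. Below is a simplified sketch of that mechanism, not the verbatim implementation; the property names "callSite.short" and "callSite.long" and the assumption that Spark serves the call site from these local properties when set are stated assumptions for illustration.

import org.apache.spark.SparkContext

// Simplified sketch of the mechanism behind Utils.withDummyCallSite; the real
// helper lives in org.apache.spark.util.Utils and is private[spark]. The
// property names below are assumptions for illustration.
object DummyCallSiteSketch {
  def withDummyCallSite[T](sc: SparkContext)(body: => T): T = {
    val oldShort = sc.getLocalProperty("callSite.short")
    val oldLong = sc.getLocalProperty("callSite.long")
    try {
      // While these local properties are set (even to ""), Spark can serve
      // the call site from them instead of walking the stack for every RDD
      // created in this thread.
      sc.setLocalProperty("callSite.short", "")
      sc.setLocalProperty("callSite.long", "")
      body
    } finally {
      // Restore the previous values (possibly null) on the way out.
      sc.setLocalProperty("callSite.short", oldShort)
      sc.setLocalProperty("callSite.long", oldLong)
    }
  }
}

The trade-off is cosmetic: stages created inside the block show an empty call site in the Spark UI, in exchange for not paying one stack-trace capture per per-partition RDD.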