Commit 472f1618 authored by Reynold Xin, committed by Yin Huai

[SPARK-15636][SQL] Make aggregate expressions more concise in explain

## What changes were proposed in this pull request?
This patch reduces the verbosity of aggregate expressions in explain (but does not actually remove any information). As an example, for the following command:
```
spark.range(10).selectExpr("sum(id) + 1", "count(distinct id)").explain(true)
```

Output before this patch:
```
== Physical Plan ==
*TungstenAggregate(key=[], functions=[(sum(id#0L),mode=Final,isDistinct=false),(count(id#0L),mode=Final,isDistinct=true)], output=[(sum(id) + 1)#3L,count(DISTINCT id)#16L])
+- Exchange SinglePartition, None
   +- *TungstenAggregate(key=[], functions=[(sum(id#0L),mode=PartialMerge,isDistinct=false),(count(id#0L),mode=Partial,isDistinct=true)], output=[sum#18L,count#21L])
      +- *TungstenAggregate(key=[id#0L], functions=[(sum(id#0L),mode=PartialMerge,isDistinct=false)], output=[id#0L,sum#18L])
         +- Exchange hashpartitioning(id#0L, 5), None
            +- *TungstenAggregate(key=[id#0L], functions=[(sum(id#0L),mode=Partial,isDistinct=false)], output=[id#0L,sum#18L])
               +- *Range (0, 10, splits=2)
```

Output after this patch:
```
== Physical Plan ==
*TungstenAggregate(key=[], functions=[sum(id#0L),count(distinct id#0L)], output=[(sum(id) + 1)#3L,count(DISTINCT id)#16L])
+- Exchange SinglePartition, None
   +- *TungstenAggregate(key=[], functions=[merge_sum(id#0L),partial_count(distinct id#0L)], output=[sum#18L,count#21L])
      +- *TungstenAggregate(key=[id#0L], functions=[merge_sum(id#0L)], output=[id#0L,sum#18L])
         +- Exchange hashpartitioning(id#0L, 5), None
            +- *TungstenAggregate(key=[id#0L], functions=[partial_sum(id#0L)], output=[id#0L,sum#18L])
               +- *Range (0, 10, splits=2)
```

Note the change from `(sum(id#0L),mode=PartialMerge,isDistinct=false)` to `merge_sum(id#0L)`.
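For readers skimming the plan output, the rendering rule behind these prefixes (implemented in the diff further down) can be summarized with a small standalone sketch. `AggRenderSketch` and its `render` helper are hypothetical and exist only for illustration; the real code derives the function name and arguments from the expression tree rather than taking strings:

```scala
// Standalone sketch (not part of the patch): mirrors the rendering rule that
// AggregateExpression.toString / AggregateFunction.toAggString implement below.
object AggRenderSketch {
  sealed trait AggregateMode
  case object Partial extends AggregateMode
  case object PartialMerge extends AggregateMode
  case object Final extends AggregateMode
  case object Complete extends AggregateMode

  // Hypothetical helper: prefix the function name by mode, fold "distinct" into the argument list.
  def render(funcName: String, arg: String, mode: AggregateMode, isDistinct: Boolean): String = {
    val prefix = mode match {
      case Partial          => "partial_"
      case PartialMerge     => "merge_"
      case Final | Complete => ""
    }
    val start = if (isDistinct) "(distinct " else "("
    s"$prefix$funcName$start$arg)"
  }

  def main(args: Array[String]): Unit = {
    println(render("sum", "id#0L", PartialMerge, isDistinct = false)) // merge_sum(id#0L)
    println(render("count", "id#0L", Partial, isDistinct = true))     // partial_count(distinct id#0L)
    println(render("sum", "id#0L", Final, isDistinct = false))        // sum(id#0L)
  }
}
```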

In general, aggregate explain output is still quite verbose; further reductions will be made in follow-up pull requests.

## How was this patch tested?
Tested manually.
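The manual test amounts to re-running the command above and inspecting the plan. A rough spark-shell sketch of the same check (an assumption on my part, not an added regression test; it relies on the Spark 2.0-era `Dataset.queryExecution.executedPlan` API, and the exact plan text may vary by version) could look like:

```scala
// Rough verification sketch, not part of the patch: render the physical plan and
// check for the shortened aggregate descriptions.
val df = spark.range(10).selectExpr("sum(id) + 1", "count(distinct id)")
val plan = df.queryExecution.executedPlan.toString

assert(plan.contains("partial_sum(id"))     // previously: (sum(id#0L),mode=Partial,isDistinct=false)
assert(plan.contains("count(distinct id"))  // previously: (count(id#0L),mode=Final,isDistinct=true)
assert(!plan.contains("isDistinct="))       // the verbose form should no longer appear
```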

Author: Reynold Xin <rxin@databricks.com>

Closes #13367 from rxin/SPARK-15636.
parent 74c1b79f
```
@@ -185,7 +185,7 @@ abstract class Expression extends TreeNode[Expression] {
    */
   def prettyName: String = nodeName.toLowerCase
 
-  private def flatArguments = productIterator.flatMap {
+  protected def flatArguments = productIterator.flatMap {
     case t: Traversable[_] => t
     case single => single :: Nil
   }
```
```
@@ -126,7 +126,14 @@ private[sql] case class AggregateExpression(
     AttributeSet(childReferences)
   }
 
-  override def toString: String = s"($aggregateFunction,mode=$mode,isDistinct=$isDistinct)"
+  override def toString: String = {
+    val prefix = mode match {
+      case Partial => "partial_"
+      case PartialMerge => "merge_"
+      case Final | Complete => ""
+    }
+    prefix + aggregateFunction.toAggString(isDistinct)
+  }
 
   override def sql: String = aggregateFunction.sql(isDistinct)
 }
```
```
@@ -203,6 +210,12 @@ sealed abstract class AggregateFunction extends Expression with ImplicitCastInpu
     val distinct = if (isDistinct) "DISTINCT " else ""
     s"$prettyName($distinct${children.map(_.sql).mkString(", ")})"
   }
+
+  /** String representation used in explain plans. */
+  def toAggString(isDistinct: Boolean): String = {
+    val start = if (isDistinct) "(distinct " else "("
+    prettyName + flatArguments.mkString(start, ", ", ")")
+  }
 }
```
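Note that widening `flatArguments` from `private` to `protected` in `Expression` is what lets `AggregateFunction.toAggString` reuse the existing argument-flattening logic instead of duplicating it. Folding the distinct flag into the opening parenthesis produces, for example, `count(distinct id#0L)` for a distinct count, matching the plans shown above.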