From 20adf9aa1f42353432d356117e655e799ea1290b Mon Sep 17 00:00:00 2001
From: John O'Leary <jgoleary@gmail.com>
Date: Mon, 25 Sep 2017 09:16:27 +0900
Subject: [PATCH] [SPARK-22107] Change as to alias in python quickstart

## What changes were proposed in this pull request?

Updated docs so that a line of python in the quick start guide executes. Closes #19283

## How was this patch tested?

Existing tests.

Author: John O'Leary <jgoleary@gmail.com>

Closes #19326 from jgoleary/issues/22107.
---
 docs/quick-start.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/quick-start.md b/docs/quick-start.md
index a85e5b28a6..200b97230e 100644
--- a/docs/quick-start.md
+++ b/docs/quick-start.md
@@ -153,7 +153,7 @@ This first maps a line to an integer value and aliases it as "numWords", creatin
 One common data flow pattern is MapReduce, as popularized by Hadoop. Spark can implement MapReduce flows easily:
 
 {% highlight python %}
->>> wordCounts = textFile.select(explode(split(textFile.value, "\s+")).as("word")).groupBy("word").count()
+>>> wordCounts = textFile.select(explode(split(textFile.value, "\s+")).alias("word")).groupBy("word").count()
 {% endhighlight %}
 
 Here, we use the `explode` function in `select`, to transfrom a Dataset of lines to a Dataset of words, and then combine `groupBy` and `count` to compute the per-word counts in the file as a DataFrame of 2 columns: "word" and "count". To collect the word counts in our shell, we can call `collect`:
-- 
GitLab