Commit dfd9723d authored by Sandeep Singh, committed by Sean Owen

[MINOR][DOCS] Fix type Information in Quick Start and Programming Guide

Author: Sandeep Singh <sandeep@techaddict.me>

Closes #12841 from techaddict/improve_docs_1.
parent f10ae4b1
@@ -328,7 +328,7 @@ Text file RDDs can be created using `SparkContext`'s `textFile` method. This met...
 {% highlight scala %}
 scala> val distFile = sc.textFile("data.txt")
-distFile: RDD[String] = MappedRDD@1d4cee08
+distFile: org.apache.spark.rdd.RDD[String] = data.txt MapPartitionsRDD[10] at textFile at <console>:26
 {% endhighlight %}

 Once created, `distFile` can be acted on by dataset operations. For example, we can add up the sizes of all the lines using the `map` and `reduce` operations as follows: `distFile.map(s => s.length).reduce((a, b) => a + b)`.
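For readers following along (this is not part of the diff): the `map`/`reduce` expression referenced in the context line above can be run directly in the shell. A minimal sketch, assuming the same hypothetical `data.txt` input used by the guide:

{% highlight scala %}
// Illustrative only: sum the lengths of all lines in distFile.
val distFile = sc.textFile("data.txt")
val totalChars = distFile.map(s => s.length).reduce((a, b) => a + b)  // Int: total characters across all lines
{% endhighlight %}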
@@ -33,7 +33,7 @@ Spark's primary abstraction is a distributed collection of items called a Resili...
 {% highlight scala %}
 scala> val textFile = sc.textFile("README.md")
-textFile: spark.RDD[String] = spark.MappedRDD@2ee9b6e3
+textFile: org.apache.spark.rdd.RDD[String] = README.md MapPartitionsRDD[1] at textFile at <console>:25
 {% endhighlight %}

 RDDs have _[actions](programming-guide.html#actions)_, which return values, and _[transformations](programming-guide.html#transformations)_, which return pointers to new RDDs. Let's start with a few actions:
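As context for the "Let's start with a few actions" line (again, not part of the diff): two of the simplest actions on the `textFile` RDD created above are `count()` and `first()`. A minimal sketch; the returned values naturally depend on the `README.md` being read:

{% highlight scala %}
// Illustrative only: actions return values to the driver rather than new RDDs.
val textFile = sc.textFile("README.md")
val numLines = textFile.count()   // Long: number of lines in the file
val firstLine = textFile.first()  // String: the first line of the file
{% endhighlight %}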
@@ -50,7 +50,7 @@ Now let's use a transformation. We will use the [`filter`](programming-guide.htm...
 {% highlight scala %}
 scala> val linesWithSpark = textFile.filter(line => line.contains("Spark"))
-linesWithSpark: spark.RDD[String] = spark.FilteredRDD@7dd4af09
+linesWithSpark: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[2] at filter at <console>:27
 {% endhighlight %}

 We can chain together transformations and actions:
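A minimal sketch of the chaining the context line refers to (illustrative only, assuming the `textFile` RDD from the earlier snippet): a transformation followed immediately by an action in one expression.

{% highlight scala %}
// Illustrative only: filter (a transformation) chained with count (an action).
val sparkLineCount = textFile.filter(line => line.contains("Spark")).count()  // Long
{% endhighlight %}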
@@ -123,7 +123,7 @@ One common data flow pattern is MapReduce, as popularized by Hadoop. Spark can i...
 {% highlight scala %}
 scala> val wordCounts = textFile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey((a, b) => a + b)
-wordCounts: spark.RDD[(String, Int)] = spark.ShuffledAggregatedRDD@71f027b8
+wordCounts: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[8] at reduceByKey at <console>:28
 {% endhighlight %}

 Here, we combined the [`flatMap`](programming-guide.html#transformations), [`map`](programming-guide.html#transformations), and [`reduceByKey`](programming-guide.html#transformations) transformations to compute the per-word counts in the file as an RDD of (String, Int) pairs. To collect the word counts in our shell, we can use the [`collect`](programming-guide.html#actions) action:
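For reference, the `collect` action mentioned in the context line returns the RDD's elements to the driver as a local Scala collection. A minimal sketch using the `wordCounts` RDD from the snippet above (illustrative only, not part of the diff):

{% highlight scala %}
// Illustrative only: bring the (word, count) pairs back to the driver.
// Fine here because the result is small; avoid collect() on large RDDs.
val counts: Array[(String, Int)] = wordCounts.collect()
counts.take(5).foreach(println)  // print a few pairs
{% endhighlight %}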
@@ -181,7 +181,7 @@ Spark also supports pulling data sets into a cluster-wide in-memory cache. This ...
 {% highlight scala %}
 scala> linesWithSpark.cache()
-res7: spark.RDD[String] = spark.FilteredRDD@17e51082
+res7: linesWithSpark.type = MapPartitionsRDD[2] at filter at <console>:27

 scala> linesWithSpark.count()
 res8: Long = 19
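A usage note on the cached RDD above (again, not part of the diff): `cache()` only marks the RDD for in-memory storage; the data is actually materialized the first time an action such as `count()` computes it, so subsequent actions on `linesWithSpark` can be served from the cache. A minimal sketch:

{% highlight scala %}
// Illustrative only: cache() is lazy -- the RDD is materialized by the first action.
linesWithSpark.cache()
linesWithSpark.count()  // computes the RDD and populates the in-memory cache
linesWithSpark.count()  // subsequent actions read from the cache
{% endhighlight %}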