Commit 03e8dc68 authored by Tathagata Das

Changes function comments to make them more consistent.

parent 12b020b6
@@ -26,7 +26,7 @@ extends Serializable {
}
/**
* Create a new DStream by applying `groupByKey` to each RDD. Hash partitioning is used to
* Return a new DStream by applying `groupByKey` to each RDD. Hash partitioning is used to
* generate the RDDs with Spark's default number of partitions.
*/
def groupByKey(): DStream[(K, Seq[V])] = {
@@ -34,7 +34,7 @@ extends Serializable {
}
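As an aside, a minimal sketch of how the per-RDD `groupByKey` described above is typically invoked. The `pairs` stream, the imports, and the `spark.streaming` package layout are assumptions of this sketch, not part of the commit:

```scala
import spark.streaming.DStream
import spark.streaming.StreamingContext._  // implicit conversion to the pair-DStream functions

// Group all values seen for a key within each batch's RDD.
// Hash partitioning with Spark's default number of partitions, per the comment above.
def groupedPerBatch(pairs: DStream[(String, Int)]): DStream[(String, Seq[Int])] =
  pairs.groupByKey()
```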
/**
* Create a new DStream by applying `groupByKey` to each RDD. Hash partitioning is used to
* Return a new DStream by applying `groupByKey` to each RDD. Hash partitioning is used to
* generate the RDDs with `numPartitions` partitions.
*/
def groupByKey(numPartitions: Int): DStream[(K, Seq[V])] = {
@@ -42,7 +42,7 @@ extends Serializable {
}
/**
* Create a new DStream by applying `groupByKey` on each RDD. The supplied [[spark.Partitioner]]
* Return a new DStream by applying `groupByKey` on each RDD. The supplied [[spark.Partitioner]]
* is used to control the partitioning of each RDD.
*/
def groupByKey(partitioner: Partitioner): DStream[(K, Seq[V])] = {
@@ -54,7 +54,7 @@ extends Serializable {
}
/**
* Create a new DStream by applying `reduceByKey` to each RDD. The values for each key are
* Return a new DStream by applying `reduceByKey` to each RDD. The values for each key are
* merged using the associative reduce function. Hash partitioning is used to generate the RDDs
* with Spark's default number of partitions.
*/
@@ -63,7 +63,7 @@ extends Serializable {
}
/**
* Create a new DStream by applying `reduceByKey` to each RDD. The values for each key are
* Return a new DStream by applying `reduceByKey` to each RDD. The values for each key are
* merged using the supplied reduce function. Hash partitioning is used to generate the RDDs
* with `numPartitions` partitions.
*/
@@ -72,7 +72,7 @@ extends Serializable {
}
/**
* Create a new DStream by applying `reduceByKey` to each RDD. The values for each key are
* Return a new DStream by applying `reduceByKey` to each RDD. The values for each key are
* merged using the supplied reduce function. [[spark.Partitioner]] is used to control the
* partitioning of each RDD.
*/
@@ -82,7 +82,7 @@ extends Serializable {
}
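A hedged sketch of the `reduceByKey` variants documented above; the explicit `HashPartitioner` and its partition count are illustrative choices, not taken from this diff:

```scala
import spark.HashPartitioner
import spark.streaming.DStream
import spark.streaming.StreamingContext._

// Merge values per key with an associative function; passing a Partitioner
// controls how each generated RDD is partitioned.
def countsPerBatch(pairs: DStream[(String, Int)]): DStream[(String, Int)] =
  pairs.reduceByKey((a: Int, b: Int) => a + b, new HashPartitioner(4))
```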
/**
* Combine elements of each key in DStream's RDDs using custom function. This is similar to the
* Combine elements of each key in DStream's RDDs using custom functions. This is similar to the
* combineByKey for RDDs. Please refer to combineByKey in [[spark.PairRDDFunctions]] for more
* information.
*/
@@ -95,7 +95,7 @@ extends Serializable {
}
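For illustration, a sketch of `combineByKey` on a pair DStream that builds a per-key (sum, count) accumulator; the parameter list is assumed to mirror `combineByKey` in [[spark.PairRDDFunctions]] plus an explicit Partitioner, as the comment above suggests, and the partitioner choice is illustrative:

```scala
import spark.HashPartitioner
import spark.streaming.DStream
import spark.streaming.StreamingContext._

// Build a (sum, count) pair per key, from which a mean can be derived downstream.
def sumAndCount(pairs: DStream[(String, Int)]): DStream[(String, (Int, Int))] =
  pairs.combineByKey[(Int, Int)](
    (v: Int) => (v, 1),                                              // createCombiner
    (acc: (Int, Int), v: Int) => (acc._1 + v, acc._2 + 1),           // mergeValue
    (a: (Int, Int), b: (Int, Int)) => (a._1 + b._1, a._2 + b._2),    // mergeCombiners
    new HashPartitioner(4))
```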
/**
* Create a new DStream by counting the number of values of each key in each RDD. Hash
* Return a new DStream by counting the number of values of each key in each RDD. Hash
* partitioning is used to generate the RDDs with Spark's `numPartitions` partitions.
*/
def countByKey(numPartitions: Int = self.ssc.sc.defaultParallelism): DStream[(K, Long)] = {
@@ -103,7 +103,7 @@ extends Serializable {
}
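A small usage sketch of `countByKey`; calling it with no argument relies on the `defaultParallelism` default visible in the signature above (the `pairs` stream and imports are assumed):

```scala
import spark.streaming.DStream
import spark.streaming.StreamingContext._

// How many values each key has in every RDD of the stream.
def occurrencesPerBatch(pairs: DStream[(String, Int)]): DStream[(String, Long)] =
  pairs.countByKey()
```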
/**
* Creates a new DStream by applying `groupByKey` over a sliding window. This is similar to
* Return a new DStream by applying `groupByKey` over a sliding window. This is similar to
* `DStream.groupByKey()` but applies it over a sliding window. The new DStream generates RDDs
* with the same interval as this DStream. Hash partitioning is used to generate the RDDs with
* Spark's default number of partitions.
@@ -115,7 +115,7 @@ extends Serializable {
}
/**
* Create a new DStream by applying `groupByKey` over a sliding window. Similar to
* Return a new DStream by applying `groupByKey` over a sliding window. Similar to
* `DStream.groupByKey()`, but applies it over a sliding window. Hash partitioning is used to
* generate the RDDs with Spark's default number of partitions.
* @param windowDuration width of the window; must be a multiple of this DStream's
@@ -129,7 +129,7 @@ extends Serializable {
}
/**
* Create a new DStream by applying `groupByKey` over a sliding window on `this` DStream.
* Return a new DStream by applying `groupByKey` over a sliding window on `this` DStream.
* Similar to `DStream.groupByKey()`, but applies it over a sliding window.
* Hash partitioning is used to generate the RDDs with `numPartitions` partitions.
* @param windowDuration width of the window; must be a multiple of this DStream's
@@ -167,7 +167,7 @@ extends Serializable {
}
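Before moving on to the windowed reductions, a sketch of the windowed grouping described above; the 30-second window and 10-second slide are illustrative and must both be multiples of the stream's batching interval:

```scala
import spark.streaming.{DStream, Seconds}
import spark.streaming.StreamingContext._

// Group values per key over the last 30 seconds of data, recomputed every 10 seconds.
def windowedGroups(pairs: DStream[(String, Int)]): DStream[(String, Seq[Int])] =
  pairs.groupByKeyAndWindow(Seconds(30), Seconds(10))
```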
/**
* Create a new DStream by applying `reduceByKey` over a sliding window on `this` DStream.
* Return a new DStream by applying `reduceByKey` over a sliding window on `this` DStream.
* Similar to `DStream.reduceByKey()`, but applies it over a sliding window. The new DStream
* generates RDDs with the same interval as this DStream. Hash partitioning is used to generate
* the RDDs with Spark's default number of partitions.
@@ -183,7 +183,7 @@ extends Serializable {
}
/**
* Create a new DStream by applying `reduceByKey` over a sliding window. This is similar to
* Return a new DStream by applying `reduceByKey` over a sliding window. This is similar to
* `DStream.reduceByKey()` but applies it over a sliding window. Hash partitioning is used to
* generate the RDDs with Spark's default number of partitions.
* @param reduceFunc associative reduce function
@@ -202,7 +202,7 @@ extends Serializable {
}
/**
* Create a new DStream by applying `reduceByKey` over a sliding window. This is similar to
* Return a new DStream by applying `reduceByKey` over a sliding window. This is similar to
* `DStream.reduceByKey()` but applies it over a sliding window. Hash partitioning is used to
* generate the RDDs with `numPartitions` partitions.
* @param reduceFunc associative reduce function
@@ -223,7 +223,7 @@ extends Serializable {
}
/**
* Create a new DStream by applying `reduceByKey` over a sliding window. Similar to
* Return a new DStream by applying `reduceByKey` over a sliding window. Similar to
* `DStream.reduceByKey()`, but applies it over a sliding window.
* @param reduceFunc associative reduce function
* @param windowDuration width of the window; must be a multiple of this DStream's
@@ -247,7 +247,7 @@ extends Serializable {
}
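A hedged sketch of the plain (non-incremental) `reduceByKeyAndWindow` documented above, with illustrative durations:

```scala
import spark.streaming.{DStream, Seconds}
import spark.streaming.StreamingContext._

// Per-key sums over a sliding 60-second window that advances every 10 seconds;
// the whole window is re-reduced on each slide.
def windowedCounts(pairs: DStream[(String, Int)]): DStream[(String, Int)] =
  pairs.reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(60), Seconds(10))
```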
/**
* Create a new DStream by applying incremental `reduceByKey` over a sliding window.
* Return a new DStream by applying incremental `reduceByKey` over a sliding window.
* The reduced value over a new window is calculated using the old window's reduced value:
* 1. reduce the new values that entered the window (e.g., adding new counts)
* 2. "inverse reduce" the old values that left the window (e.g., subtracting old counts)
@@ -280,7 +280,7 @@ extends Serializable {
}
/**
* Create a new DStream by applying incremental `reduceByKey` over a sliding window.
* Return a new DStream by applying incremental `reduceByKey` over a sliding window.
* The reduced value over a new window is calculated using the old window's reduced value:
* 1. reduce the new values that entered the window (e.g., adding new counts)
* 2. "inverse reduce" the old values that left the window (e.g., subtracting old counts)
@@ -316,7 +316,7 @@ extends Serializable {
}
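A sketch of the incremental variant described above: new batches are folded in and batches leaving the window are removed via the inverse function, rather than re-reducing the whole window each interval. The durations and the subtraction-based inverse are illustrative assumptions:

```scala
import spark.streaming.{DStream, Seconds}
import spark.streaming.StreamingContext._

def incrementalWindowedCounts(pairs: DStream[(String, Int)]): DStream[(String, Int)] =
  pairs.reduceByKeyAndWindow(
    (a: Int, b: Int) => a + b,   // 1. reduce the new values entering the window
    (a: Int, b: Int) => a - b,   // 2. "inverse reduce" the old values leaving the window
    Seconds(60), Seconds(10))
```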
/**
* Create a new DStream by counting the number of values for each key over a window.
* Return a new DStream by counting the number of values for each key over a window.
* Hash partitioning is used to generate the RDDs with `numPartitions` partitions.
* @param windowDuration width of the window; must be a multiple of this DStream's
* batching interval
@@ -341,7 +341,7 @@ extends Serializable {
}
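And the counting analogue over a window, again with illustrative durations:

```scala
import spark.streaming.{DStream, Seconds}
import spark.streaming.StreamingContext._

// Number of values per key over the last 60 seconds, sliding every 10 seconds.
def windowedOccurrences(pairs: DStream[(String, Int)]): DStream[(String, Long)] =
  pairs.countByKeyAndWindow(Seconds(60), Seconds(10))
```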
/**
* Create a new "state" DStream where the state for each key is updated by applying
* Return a new "state" DStream where the state for each key is updated by applying
* the given function on the previous state of the key and the new values of each key.
* Hash partitioning is used to generate the RDDs with Spark's default number of partitions.
* @param updateFunc State update function. If `this` function returns None, then
@@ -355,7 +355,7 @@ extends Serializable {
}
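A minimal sketch of the stateful operation described above, keeping a running count per key across batches; the fold-style update function is an illustrative choice:

```scala
import spark.streaming.DStream
import spark.streaming.StreamingContext._

def runningCounts(pairs: DStream[(String, Int)]): DStream[(String, Int)] = {
  // Fold this batch's new values into the previous state; returning None would drop the key.
  val updateFunc = (newValues: Seq[Int], state: Option[Int]) =>
    Some(newValues.sum + state.getOrElse(0))
  pairs.updateStateByKey[Int](updateFunc)
}
```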
/**
* Create a new "state" DStream where the state for each key is updated by applying
* Return a new "state" DStream where the state for each key is updated by applying
* the given function on the previous state of the key and the new values of each key.
* Hash partitioning is used to generate the RDDs with `numPartitions` partitions.
* @param updateFunc State update function. If `this` function returns None, then
@@ -390,7 +390,7 @@ extends Serializable {
}
/**
* Create a new "state" DStream where the state for each key is updated by applying
* Return a new "state" DStream where the state for each key is updated by applying
* the given function on the previous state of the key and the new values of each key.
* [[spark.Partitioner]] is used to control the partitioning of each RDD.
* @param updateFunc State update function. If `this` function returns None, then
@@ -25,17 +25,17 @@ class JavaPairDStream[K, V](val dstream: DStream[(K, V)])(
// Methods common to all DStream's
// =======================================================================
/** Returns a new DStream containing only the elements that satisfy a predicate. */
/** Return a new DStream containing only the elements that satisfy a predicate. */
def filter(f: JFunction[(K, V), java.lang.Boolean]): JavaPairDStream[K, V] =
dstream.filter((x => f(x).booleanValue()))
/** Persists RDDs of this DStream with the default storage level (MEMORY_ONLY_SER) */
/** Persist RDDs of this DStream with the default storage level (MEMORY_ONLY_SER) */
def cache(): JavaPairDStream[K, V] = dstream.cache()
/** Persists RDDs of this DStream with the default storage level (MEMORY_ONLY_SER) */
/** Persist RDDs of this DStream with the default storage level (MEMORY_ONLY_SER) */
def persist(): JavaPairDStream[K, V] = dstream.cache()
/** Persists the RDDs of this DStream with the given storage level */
/** Persist the RDDs of this DStream with the given storage level */
def persist(storageLevel: StorageLevel): JavaPairDStream[K, V] = dstream.persist(storageLevel)
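These wrappers delegate to the underlying Scala `DStream`; a hedged sketch of the equivalent Scala-side calls (the predicate and storage level are illustrative):

```scala
import spark.storage.StorageLevel
import spark.streaming.DStream

// Keep only selected keys and persist the resulting RDDs at an explicit storage level.
def filteredAndPersisted(pairs: DStream[(String, Int)]): DStream[(String, Int)] =
  pairs.filter(kv => kv._1.startsWith("error")).persist(StorageLevel.MEMORY_ONLY_SER)
```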
/** Method that generates a RDD for the given Duration */
@@ -67,7 +67,7 @@ class JavaPairDStream[K, V](val dstream: DStream[(K, V)])(
dstream.window(windowDuration, slideDuration)
/**
* Returns a new DStream which is computed based on a tumbling window of this DStream.
* Return a new DStream which is computed based on a tumbling window of this DStream.
* This is equivalent to window(batchDuration, batchDuration).
* @param batchDuration tumbling window duration; must be a multiple of this DStream's interval
*/
@@ -75,7 +75,7 @@ class JavaPairDStream[K, V](val dstream: DStream[(K, V)])(
dstream.tumble(batchDuration)
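On the Scala side, a tumbling window is just a window whose slide equals its width, and `union` merges two streams with the same slide interval; a sketch with an assumed 30-second duration:

```scala
import spark.streaming.{DStream, Seconds}

// Equivalent to tumble(Seconds(30)): non-overlapping 30-second windows.
def tumbling(pairs: DStream[(String, Int)]): DStream[(String, Int)] =
  pairs.window(Seconds(30), Seconds(30))

// Merge the data of two DStreams that share the same slide interval.
def merged(a: DStream[(String, Int)], b: DStream[(String, Int)]): DStream[(String, Int)] =
  a.union(b)
```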
/**
* Returns a new DStream by unifying data of another DStream with this DStream.
* Return a new DStream by unifying data of another DStream with this DStream.
* @param that Another DStream having the same interval (i.e., slideDuration) as this DStream.
*/
def union(that: JavaPairDStream[K, V]): JavaPairDStream[K, V] =
@@ -86,21 +86,21 @@ class JavaPairDStream[K, V](val dstream: DStream[(K, V)])(
// =======================================================================
/**
* Create a new DStream by applying `groupByKey` to each RDD. Hash partitioning is used to
* Return a new DStream by applying `groupByKey` to each RDD. Hash partitioning is used to
* generate the RDDs with Spark's default number of partitions.
*/
def groupByKey(): JavaPairDStream[K, JList[V]] =
dstream.groupByKey().mapValues(seqAsJavaList _)
/**
* Create a new DStream by applying `groupByKey` to each RDD. Hash partitioning is used to
* Return a new DStream by applying `groupByKey` to each RDD. Hash partitioning is used to
* generate the RDDs with `numPartitions` partitions.
*/
def groupByKey(numPartitions: Int): JavaPairDStream[K, JList[V]] =
dstream.groupByKey(numPartitions).mapValues(seqAsJavaList _)
/**
* Creates a new DStream by applying `groupByKey` on each RDD of `this` DStream.
* Return a new DStream by applying `groupByKey` on each RDD of `this` DStream.
* Therefore, the values for each key in `this` DStream's RDDs are grouped into a
* single sequence to generate the RDDs of the new DStream. [[spark.Partitioner]]
* is used to control the partitioning of each RDD.
@@ -109,7 +109,7 @@ class JavaPairDStream[K, V](val dstream: DStream[(K, V)])(
dstream.groupByKey(partitioner).mapValues(seqAsJavaList _)
/**
* Create a new DStream by applying `reduceByKey` to each RDD. The values for each key are
* Return a new DStream by applying `reduceByKey` to each RDD. The values for each key are
* merged using the associative reduce function. Hash partitioning is used to generate the RDDs
* with Spark's default number of partitions.
*/
@@ -117,7 +117,7 @@ class JavaPairDStream[K, V](val dstream: DStream[(K, V)])(
dstream.reduceByKey(func)
/**
* Create a new DStream by applying `reduceByKey` to each RDD. The values for each key are
* Return a new DStream by applying `reduceByKey` to each RDD. The values for each key are
* merged using the supplied reduce function. Hash partitioning is used to generate the RDDs
* with `numPartitions` partitions.
*/
@@ -125,7 +125,7 @@ class JavaPairDStream[K, V](val dstream: DStream[(K, V)])(
dstream.reduceByKey(func, numPartitions)
/**
* Create a new DStream by applying `reduceByKey` to each RDD. The values for each key are
* Return a new DStream by applying `reduceByKey` to each RDD. The values for each key are
* merged using the supplied reduce function. [[spark.Partitioner]] is used to control the
* partitioning of each RDD.
*/
@@ -149,7 +149,7 @@ class JavaPairDStream[K, V](val dstream: DStream[(K, V)])(
}
/**
* Create a new DStream by counting the number of values of each key in each RDD. Hash
* Return a new DStream by counting the number of values of each key in each RDD. Hash
* partitioning is used to generate the RDDs with Spark's `numPartitions` partitions.
*/
def countByKey(numPartitions: Int): JavaPairDStream[K, JLong] = {
@@ -158,7 +158,7 @@ class JavaPairDStream[K, V](val dstream: DStream[(K, V)])(
/**
* Create a new DStream by counting the number of values of each key in each RDD. Hash
* Return a new DStream by counting the number of values of each key in each RDD. Hash
* partitioning is used to generate the RDDs with the default number of partitions.
*/
def countByKey(): JavaPairDStream[K, JLong] = {
......@@ -166,7 +166,7 @@ class JavaPairDStream[K, V](val dstream: DStream[(K, V)])(
}
/**
* Creates a new DStream by applying `groupByKey` over a sliding window. This is similar to
* Return a new DStream by applying `groupByKey` over a sliding window. This is similar to
* `DStream.groupByKey()` but applies it over a sliding window. The new DStream generates RDDs
* with the same interval as this DStream. Hash partitioning is used to generate the RDDs with
* Spark's default number of partitions.
......@@ -178,7 +178,7 @@ class JavaPairDStream[K, V](val dstream: DStream[(K, V)])(
}
/**
* Create a new DStream by applying `groupByKey` over a sliding window. Similar to
* Return a new DStream by applying `groupByKey` over a sliding window. Similar to
* `DStream.groupByKey()`, but applies it over a sliding window. Hash partitioning is used to
* generate the RDDs with Spark's default number of partitions.
* @param windowDuration width of the window; must be a multiple of this DStream's
......@@ -193,7 +193,7 @@ class JavaPairDStream[K, V](val dstream: DStream[(K, V)])(
}
/**
* Create a new DStream by applying `groupByKey` over a sliding window on `this` DStream.
* Return a new DStream by applying `groupByKey` over a sliding window on `this` DStream.
* Similar to `DStream.groupByKey()`, but applies it over a sliding window.
* Hash partitioning is used to generate the RDDs with `numPartitions` partitions.
* @param windowDuration width of the window; must be a multiple of this DStream's
......@@ -210,7 +210,7 @@ class JavaPairDStream[K, V](val dstream: DStream[(K, V)])(
}
/**
* Create a new DStream by applying `groupByKey` over a sliding window on `this` DStream.
* Return a new DStream by applying `groupByKey` over a sliding window on `this` DStream.
* Similar to `DStream.groupByKey()`, but applies it over a sliding window.
* @param windowDuration width of the window; must be a multiple of this DStream's
* batching interval
......@@ -243,7 +243,7 @@ class JavaPairDStream[K, V](val dstream: DStream[(K, V)])(
}
/**
* Create a new DStream by applying `reduceByKey` over a sliding window. This is similar to
* Return a new DStream by applying `reduceByKey` over a sliding window. This is similar to
* `DStream.reduceByKey()` but applies it over a sliding window. Hash partitioning is used to
* generate the RDDs with Spark's default number of partitions.
* @param reduceFunc associative reduce function
......@@ -262,7 +262,7 @@ class JavaPairDStream[K, V](val dstream: DStream[(K, V)])(
}
/**
* Create a new DStream by applying `reduceByKey` over a sliding window. This is similar to
* Return a new DStream by applying `reduceByKey` over a sliding window. This is similar to
* `DStream.reduceByKey()` but applies it over a sliding window. Hash partitioning is used to
* generate the RDDs with `numPartitions` partitions.
* @param reduceFunc associative reduce function
......@@ -283,7 +283,7 @@ class JavaPairDStream[K, V](val dstream: DStream[(K, V)])(
}
/**
* Create a new DStream by applying `reduceByKey` over a sliding window. Similar to
* Return a new DStream by applying `reduceByKey` over a sliding window. Similar to
* `DStream.reduceByKey()`, but applies it over a sliding window.
* @param reduceFunc associative reduce function
* @param windowDuration width of the window; must be a multiple of this DStream's
......@@ -303,7 +303,7 @@ class JavaPairDStream[K, V](val dstream: DStream[(K, V)])(
}
/**
* Create a new DStream by reducing over a sliding window using incremental computation.
* Return a new DStream by reducing over a sliding window using incremental computation.
* The reduced value over a new window is calculated using the old window's reduced value:
* 1. reduce the new values that entered the window (e.g., adding new counts)
* 2. "inverse reduce" the old values that left the window (e.g., subtracting old counts)
......@@ -328,7 +328,7 @@ class JavaPairDStream[K, V](val dstream: DStream[(K, V)])(
}
/**
* Create a new DStream by applying incremental `reduceByKey` over a sliding window.
* Return a new DStream by applying incremental `reduceByKey` over a sliding window.
* The reduced value over a new window is calculated using the old window's reduced value:
* 1. reduce the new values that entered the window (e.g., adding new counts)
* 2. "inverse reduce" the old values that left the window (e.g., subtracting old counts)
......@@ -366,7 +366,7 @@ class JavaPairDStream[K, V](val dstream: DStream[(K, V)])(
}
/**
* Create a new DStream by applying incremental `reduceByKey` over a sliding window.
* Return a new DStream by applying incremental `reduceByKey` over a sliding window.
* The reduced value over a new window is calculated using the old window's reduced value:
* 1. reduce the new values that entered the window (e.g., adding new counts)
* 2. "inverse reduce" the old values that left the window (e.g., subtracting old counts)