Commit c58d4ea3
Authored 11 years ago by Patrick Wendell
Response to Matei's review
Parent: 71010178
Showing 2 changed files with 22 additions and 21 deletions:

core/src/main/scala/org/apache/spark/SparkContext.scala (+14, -13)
core/src/main/scala/org/apache/spark/api/java/JavaSparkContext.scala (+8, -8)
core/src/main/scala/org/apache/spark/SparkContext.scala (+14, -13)
@@ -355,7 +355,7 @@ class SparkContext(
    * @param valueClass Class of the values
    * @param minSplits Minimum number of Hadoop Splits to generate.
    *
-   * Note: Because Hadoop's RecordReader class re-uses the same Writable object for each
+   * '''Note:''' Because Hadoop's RecordReader class re-uses the same Writable object for each
    * record, directly caching the returned RDD will create many references to the same object.
    * If you plan to directly cache Hadoop writable objects, you should first copy them using
    * a `map` function.
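
The advice in this scaladoc is easier to see with a concrete sketch. The snippet below is an editorial illustration, not part of the commit: it assumes an existing SparkContext named sc, a hypothetical HDFS path, and LongWritable/Text records read through the old-API TextInputFormat.

    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapred.TextInputFormat

    val raw = sc.hadoopFile[LongWritable, Text, TextInputFormat]("hdfs:///data/events.log")

    // Unsafe: the RecordReader reuses one Writable per record, so caching `raw`
    // directly would store many references to the same mutated object.
    // raw.cache()

    // Safe: copy each record into immutable values before caching.
    val copied = raw.map { case (offset, line) => (offset.get, line.toString) }
    copied.cache()

Mapping the Writables to plain Long/String values makes each cached record an independent copy, which is exactly the `map`-before-cache pattern the note recommends.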
@@ -374,7 +374,7 @@ class SparkContext(
  /** Get an RDD for a Hadoop file with an arbitrary InputFormat
    *
-   * Note: Because Hadoop's RecordReader class re-uses the same Writable object for each
+   * '''Note:''' Because Hadoop's RecordReader class re-uses the same Writable object for each
    * record, directly caching the returned RDD will create many references to the same object.
    * If you plan to directly cache Hadoop writable objects, you should first copy them using
    * a `map` function.
@@ -407,7 +407,7 @@ class SparkContext(
    * val file = sparkContext.hadoopFile[LongWritable, Text, TextInputFormat](path, minSplits)
    * }}}
    *
-   * Note: Because Hadoop's RecordReader class re-uses the same Writable object for each
+   * '''Note:''' Because Hadoop's RecordReader class re-uses the same Writable object for each
    * record, directly caching the returned RDD will create many references to the same object.
    * If you plan to directly cache Hadoop writable objects, you should first copy them using
    * a `map` function.
@@ -428,8 +428,9 @@ class SparkContext(
    * can just write, for example,
    * {{{
    * val file = sparkContext.hadoopFile[LongWritable, Text, TextInputFormat](path)
+   * }}}
    *
-   * Note: Because Hadoop's RecordReader class re-uses the same Writable object for each
+   * '''Note:''' Because Hadoop's RecordReader class re-uses the same Writable object for each
    * record, directly caching the returned RDD will create many references to the same object.
    * If you plan to directly cache Hadoop writable objects, you should first copy them using
    * a `map` function.
@@ -453,7 +454,7 @@ class SparkContext(
    * Get an RDD for a given Hadoop file with an arbitrary new API InputFormat
    * and extra configuration options to pass to the input format.
    *
-   * Note: Because Hadoop's RecordReader class re-uses the same Writable object for each
+   * '''Note:''' Because Hadoop's RecordReader class re-uses the same Writable object for each
    * record, directly caching the returned RDD will create many references to the same object.
    * If you plan to directly cache Hadoop writable objects, you should first copy them using
    * a `map` function.
@@ -474,7 +475,7 @@ class SparkContext(
    * Get an RDD for a given Hadoop file with an arbitrary new API InputFormat
    * and extra configuration options to pass to the input format.
    *
-   * Note: Because Hadoop's RecordReader class re-uses the same Writable object for each
+   * '''Note:''' Because Hadoop's RecordReader class re-uses the same Writable object for each
    * record, directly caching the returned RDD will create many references to the same object.
    * If you plan to directly cache Hadoop writable objects, you should first copy them using
    * a `map` function.
@@ -489,12 +490,12 @@ class SparkContext(
  /** Get an RDD for a Hadoop SequenceFile with given key and value types.
    *
-   * Note: Because Hadoop's RecordReader class re-uses the same Writable object for each
+   * '''Note:''' Because Hadoop's RecordReader class re-uses the same Writable object for each
    * record, directly caching the returned RDD will create many references to the same object.
    * If you plan to directly cache Hadoop writable objects, you should first copy them using
    * a `map` function.
    *
    */
-  def sequenceFile[K: ClassTag, V: ClassTag](path: String,
+  def sequenceFile[K, V](path: String,
       keyClass: Class[K],
       valueClass: Class[V],
       minSplits: Int
@@ -505,12 +506,12 @@ class SparkContext(
  /** Get an RDD for a Hadoop SequenceFile with given key and value types.
    *
-   * Note: Because Hadoop's RecordReader class re-uses the same Writable object for each
+   * '''Note:''' Because Hadoop's RecordReader class re-uses the same Writable object for each
    * record, directly caching the returned RDD will create many references to the same object.
    * If you plan to directly cache Hadoop writable objects, you should first copy them using
    * a `map` function.
    * */
-  def sequenceFile[K: ClassTag, V: ClassTag](path: String, keyClass: Class[K], valueClass: Class[V]
+  def sequenceFile[K, V](path: String, keyClass: Class[K], valueClass: Class[V]
      ): RDD[(K, V)] =
    sequenceFile(path, keyClass, valueClass, defaultMinSplits)
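
The two overloads touched above are the ones callers reach as sequenceFile(path, keyClass, valueClass[, minSplits]). A hedged usage sketch, with a hypothetical path and key/value types chosen for illustration:

    import org.apache.hadoop.io.{IntWritable, Text}

    // Reads a SequenceFile of IntWritable keys and Text values; the three-argument
    // form falls back to defaultMinSplits, as in the second overload above.
    val pairs = sc.sequenceFile("hdfs:///data/pairs.seq", classOf[IntWritable], classOf[Text])

    // Per the note above, copy the reused Writables before caching.
    val safe = pairs.map { case (k, v) => (k.get, v.toString) }.cache()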
@@ -530,7 +531,7 @@ class SparkContext(
    * for the appropriate type. In addition, we pass the converter a ClassTag of its type to
    * allow it to figure out the Writable class to use in the subclass case.
    *
-   * Note: Because Hadoop's RecordReader class re-uses the same Writable object for each
+   * '''Note:''' Because Hadoop's RecordReader class re-uses the same Writable object for each
    * record, directly caching the returned RDD will create many references to the same object.
    * If you plan to directly cache Hadoop writable objects, you should first copy them using
    * a `map` function.
@@ -1058,7 +1059,7 @@ object SparkContext {
  implicit def rddToAsyncRDDActions[T: ClassTag](rdd: RDD[T]) = new AsyncRDDActions(rdd)

  implicit def rddToSequenceFileRDDFunctions[K <% Writable: ClassTag, V <% Writable: ClassTag](
      rdd: RDD[(K, V)]) =
    new SequenceFileRDDFunctions(rdd)

  implicit def rddToOrderedRDDFunctions[K <% Ordered[K]: ClassTag, V: ClassTag](
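
The implicit conversions in this hunk are what let an RDD of pairs pick up SequenceFile-oriented methods. A small editorial sketch of what rddToSequenceFileRDDFunctions enables, with an illustrative output path:

    import org.apache.spark.SparkContext._   // brings the implicit RDD conversions into scope

    val counts = sc.parallelize(Seq(("a", 1), ("b", 2)))
    // String and Int are viewable as Writables, so the implicit conversion adds
    // saveAsSequenceFile to this RDD[(String, Int)].
    counts.saveAsSequenceFile("hdfs:///tmp/counts.seq")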
core/src/main/scala/org/apache/spark/api/java/JavaSparkContext.scala (+8, -8)
@@ -139,7 +139,7 @@ class JavaSparkContext(val sc: SparkContext) extends JavaSparkContextVarargsWork
  /** Get an RDD for a Hadoop SequenceFile with given key and value types.
    *
-   * Note: Because Hadoop's RecordReader class re-uses the same Writable object for each
+   * '''Note:''' Because Hadoop's RecordReader class re-uses the same Writable object for each
    * record, directly caching the returned RDD will create many references to the same object.
    * If you plan to directly cache Hadoop writable objects, you should first copy them using
    * a `map` function.
@@ -156,7 +156,7 @@ class JavaSparkContext(val sc: SparkContext) extends JavaSparkContextVarargsWork
  /** Get an RDD for a Hadoop SequenceFile.
    *
-   * Note: Because Hadoop's RecordReader class re-uses the same Writable object for each
+   * '''Note:''' Because Hadoop's RecordReader class re-uses the same Writable object for each
    * record, directly caching the returned RDD will create many references to the same object.
    * If you plan to directly cache Hadoop writable objects, you should first copy them using
    * a `map` function.
@@ -197,7 +197,7 @@ class JavaSparkContext(val sc: SparkContext) extends JavaSparkContextVarargsWork
    * other necessary info (e.g. file name for a filesystem-based dataset, table name for HyperTable,
    * etc).
    *
-   * Note: Because Hadoop's RecordReader class re-uses the same Writable object for each
+   * '''Note:''' Because Hadoop's RecordReader class re-uses the same Writable object for each
    * record, directly caching the returned RDD will create many references to the same object.
    * If you plan to directly cache Hadoop writable objects, you should first copy them using
    * a `map` function.
@@ -218,7 +218,7 @@ class JavaSparkContext(val sc: SparkContext) extends JavaSparkContextVarargsWork
    * Get an RDD for a Hadoop-readable dataset from a Hadooop JobConf giving its InputFormat and any
    * other necessary info (e.g. file name for a filesystem-based dataset, table name for HyperTable,
    *
-   * Note: Because Hadoop's RecordReader class re-uses the same Writable object for each
+   * '''Note:''' Because Hadoop's RecordReader class re-uses the same Writable object for each
    * record, directly caching the returned RDD will create many references to the same object.
    * If you plan to directly cache Hadoop writable objects, you should first copy them using
    * a `map` function.
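
The JobConf-based entry point documented here mirrors SparkContext.hadoopRDD. A hedged Scala sketch (paths and record types are illustrative, not from the commit) of building the JobConf and applying the same copy-before-cache advice:

    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapred.{FileInputFormat, JobConf, TextInputFormat}

    // Describe the dataset in a JobConf (file name, input format, and so on).
    val conf = new JobConf()
    FileInputFormat.setInputPaths(conf, "hdfs:///data/events.log")

    val records = sc.hadoopRDD(conf, classOf[TextInputFormat],
      classOf[LongWritable], classOf[Text])

    // Copy the reused Writables before caching, as the note advises.
    val cached = records.map { case (k, v) => (k.get, v.toString) }.cache()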
@@ -236,7 +236,7 @@ class JavaSparkContext(val sc: SparkContext) extends JavaSparkContextVarargsWork
  /** Get an RDD for a Hadoop file with an arbitrary InputFormat.
    *
-   * Note: Because Hadoop's RecordReader class re-uses the same Writable object for each
+   * '''Note:''' Because Hadoop's RecordReader class re-uses the same Writable object for each
    * record, directly caching the returned RDD will create many references to the same object.
    * If you plan to directly cache Hadoop writable objects, you should first copy them using
    * a `map` function.
@@ -255,7 +255,7 @@ class JavaSparkContext(val sc: SparkContext) extends JavaSparkContextVarargsWork
  /** Get an RDD for a Hadoop file with an arbitrary InputFormat
    *
-   * Note: Because Hadoop's RecordReader class re-uses the same Writable object for each
+   * '''Note:''' Because Hadoop's RecordReader class re-uses the same Writable object for each
    * record, directly caching the returned RDD will create many references to the same object.
    * If you plan to directly cache Hadoop writable objects, you should first copy them using
    * a `map` function.
@@ -276,7 +276,7 @@ class JavaSparkContext(val sc: SparkContext) extends JavaSparkContextVarargsWork
    * Get an RDD for a given Hadoop file with an arbitrary new API InputFormat
    * and extra configuration options to pass to the input format.
    *
-   * Note: Because Hadoop's RecordReader class re-uses the same Writable object for each
+   * '''Note:''' Because Hadoop's RecordReader class re-uses the same Writable object for each
    * record, directly caching the returned RDD will create many references to the same object.
    * If you plan to directly cache Hadoop writable objects, you should first copy them using
    * a `map` function.
@@ -296,7 +296,7 @@ class JavaSparkContext(val sc: SparkContext) extends JavaSparkContextVarargsWork
    * Get an RDD for a given Hadoop file with an arbitrary new API InputFormat
    * and extra configuration options to pass to the input format.
    *
-   * Note: Because Hadoop's RecordReader class re-uses the same Writable object for each
+   * '''Note:''' Because Hadoop's RecordReader class re-uses the same Writable object for each
    * record, directly caching the returned RDD will create many references to the same object.
    * If you plan to directly cache Hadoop writable objects, you should first copy them using
    * a `map` function.