Commits · e54a37fe15a8fa8daec6c00fde4d191680b004c4 · cs525-sp18-g07 / spark

Nov 01, 2013
- Document all the URIs for addJar/addFile · e54a37fe
  Evan Chan authored 11 years ago
  
  e54a37fe
Oct 24, 2013

Add a `repartition` operator. · 08c1a42d

Patrick Wendell authored 11 years ago

This patch adds an operator called repartition with more straightforward
semantics than the current `coalesce` operator. There are a few use cases
where this operator is useful:

1. If a user wants to increase the number of partitions in the RDD. This
is more common now with streaming. E.g. a user is ingesting data on one
node but they want to add more partitions to ensure parallelism of
subsequent operations across threads or the cluster.

Right now they have to call `rdd.coalesce(numSplits, shuffle=true)`, which is
confusing for an operation that grows the RDD.

2. If a user has input data where the number of partitions is not known. E.g.

> sc.textFile("some file").coalesce(50)....

This is semantically vague (am I growing or shrinking this RDD?) and also
may not work correctly if the base RDD has fewer than 50 partitions.

The new operator forces a shuffle every time, so it always produces exactly
the requested number of partitions. It also throws an exception rather than
silently doing nothing if a bad input is passed.
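
The difference between the two operators can be sketched with a toy model. The snippet below is not Spark code and not the patch's implementation: it represents an RDD as a plain list of partitions, and the functions `coalesce` and `repartition` are hypothetical stand-ins that only mimic the partition-count behavior described in this message. The round-robin placement and the `ValueError` for a non-positive count are assumptions made for illustration.

```python
from typing import List, TypeVar

T = TypeVar("T")


def repartition(parts: List[List[T]], n: int) -> List[List[T]]:
    """Toy model: always redistribute ("shuffle") into exactly n partitions."""
    if n <= 0:
        # Assumption: a non-positive count is the "bad input" that is rejected.
        raise ValueError("number of partitions must be positive")
    out: List[List[T]] = [[] for _ in range(n)]
    # Round-robin placement is an illustrative choice, not Spark's strategy.
    for i, item in enumerate(x for p in parts for x in p):
        out[i % n].append(item)
    return out


def coalesce(parts: List[List[T]], n: int, shuffle: bool = False) -> List[List[T]]:
    """Toy model: without a shuffle, partitions can be merged but never split."""
    if n <= 0:
        raise ValueError("number of partitions must be positive")
    if shuffle:
        return repartition(parts, n)
    if n >= len(parts):
        # Asking for more partitions than exist changes nothing.
        return [list(p) for p in parts]
    merged: List[List[T]] = [[] for _ in range(n)]
    for i, p in enumerate(parts):
        merged[i % n].extend(p)
    return merged


parts = [[1, 2, 3], [4, 5, 6]]          # a toy "RDD" with two partitions
grown_by_coalesce = coalesce(parts, 50)  # still two partitions
grown_by_repartition = repartition(parts, 50)  # exactly fifty partitions
```

In this model, `coalesce(parts, 50)` leaves the two partitions untouched, while `repartition(parts, 50)` and `coalesce(parts, 50, shuffle=True)` both return fifty partitions (most of them empty), which is the asymmetry the new operator is meant to hide.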

I am currently adding streaming tests (requires refactoring some of the test
suite to allow testing at partition granularity), so this is not ready for
merge yet. But feedback is welcome.

08c1a42d

Oct 23, 2013
- Fixing broken links in programming guide · 4e093b88
  Patrick Wendell authored 11 years ago
  
  4e093b88
Oct 22, 2013
- Add notes to python documentation about using SparkContext.setSystemProperty. · c8748c25
  Ewen Cheslack-Postava authored 11 years ago
  
  c8748c25
- Docs: Fix links to RDD API documentation · 962bec97
  Aaron Davidson authored 11 years ago
  
  962bec97
Oct 19, 2013

Clarify compression property. · 6b628362

Patrick Wendell authored 11 years ago

Clarifies that this governs compression of internal data, not input
data or output data.

6b628362

Oct 17, 2013
- Code styling. Updated doc. · 35b2415f
  Mosharaf Chowdhury authored 11 years ago
  
  35b2415f
Oct 10, 2013
- Minor clarification and cleanup to spark-standalone.md · 66c20635
  Aaron Davidson authored 11 years ago
  
  66c20635
- Address Matei's comments on documentation · 42d8b8ef
  Aaron Davidson authored 11 years ago
  
  Updates to the documentation and changing some logError()s to logWarning()s.
  42d8b8ef
Oct 09, 2013
- Fix PySpark docs and an overly long line of code after fdbae41e · 478b2b7e
  Matei Zaharia authored 11 years ago
  
  478b2b7e
Oct 08, 2013
- Add docs for standalone scheduler fault tolerance · 4ea8ee46
  Aaron Davidson authored 11 years ago
  
  Also fix a couple HTML/Markdown issues in other files.
  4ea8ee46
Oct 06, 2013
- Merging build changes in from 0.8 · aa9fb849
  Patrick Wendell authored 11 years ago
  
  aa9fb849
Oct 04, 2013
- Adding implicit feedback ALS to MLlib user guide · 93b96b44
  Nick Pentreath authored 11 years ago
  
  93b96b44
Oct 03, 2013
- Adding in the --addJars option to make SparkContext.addJar work on yarn and cleanup · 0fff4ee8
  tgravescs authored 11 years ago
  
  the classpaths
  0fff4ee8
Oct 02, 2013
- Allow users to set the application name for Spark on Yarn · bc3b20ab
  tgravescs authored 11 years ago
  
  bc3b20ab
Sep 24, 2013
- Update build version in master · 6079721f
  Patrick Wendell authored 11 years ago
  
  6079721f
Sep 23, 2013
- Support distributed cache files and archives on spark on yarn and attempt to... · 9d424686
  Y.CORP.YAHOO.COM\tgraves authored 11 years ago
  
  Support distributed cache files and archives on spark on yarn and attempt to cleanup the staging directory on exit
  9d424686
Sep 15, 2013
- Fix typo in Maven build docs · ac0dd993
  Jey Kottalam authored 11 years ago
  
  ac0dd993
- Bumping Mesos version to 0.13.0 · c856860c
  Patrick Wendell authored 11 years ago
  
  c856860c
- Explain yarn.version in Maven build docs · 362ea0c0
  Patrick Wendell authored 11 years ago
  
  362ea0c0
Sep 11, 2013
- More updates to Spark on Mesos documentation. · 8e2602dd
  Benjamin Hindman authored 11 years ago
  
  8e2602dd
- Updated Spark on Mesos documentation. · a0f0c1be
  Benjamin Hindman authored 11 years ago
  
  a0f0c1be
- Change port from 3030 to 4040 · bddf1356
  Patrick Wendell authored 11 years ago
  
  bddf1356
Sep 10, 2013
- Update Python API features · 2425eb85
  Matei Zaharia authored 11 years ago
  
  2425eb85
Sep 09, 2013
- Document fortran dependency for MLBase · cefee1ed
  Patrick Wendell authored 11 years ago
  
  cefee1ed
Sep 08, 2013
- Small tweaks to MLlib docs · 7a5c4b64
  Matei Zaharia authored 11 years ago
  
  7a5c4b64
- Fix some review comments · b4588549
  Matei Zaharia authored 11 years ago
  
  b4588549
- response to PR comments · 81a8bd46
  Ameet Talwalkar authored 11 years ago
  
  81a8bd46
- updates based on comments to PR · 5ac62dbb
  Ameet Talwalkar authored 11 years ago
  
  5ac62dbb
- Updated cluster diagram to show caches · 5a587fb9
  Matei Zaharia authored 11 years ago
  
  5a587fb9
- Adding more docs and some code cleanup · c190b48b
  Patrick Wendell authored 11 years ago
  
  c190b48b
- Review comments · af8ffdb7
  Matei Zaharia authored 11 years ago
  
  af8ffdb7
- Some tweaks to CDH/HDP doc · c0d37510
  Matei Zaharia authored 11 years ago
  
  c0d37510
- Added cluster overview doc, made logo higher-resolution, and added more · f261d2a6
  Matei Zaharia authored 11 years ago
  
  details on monitoring
  f261d2a6
- More fair scheduler docs and property names. · 651a96ad
  Matei Zaharia authored 11 years ago
  
  Also changed uses of "job" terminology to "application" when they referred to an entire Spark program, to avoid confusion.
  651a96ad
- Work in progress: · 98fb6982
  Matei Zaharia authored 11 years ago
  
  - Add job scheduling docs
  - Rename some fair scheduler properties
  - Organize intro page better
  - Link to Apache wiki for "contributing to Spark"
  98fb6982
Sep 07, 2013
- File rename · 22b982d2
  Patrick Wendell authored 11 years ago
  
  22b982d2
- Changes based on feedback · 61c4762d
  Patrick Wendell authored 11 years ago
  
  61c4762d
- CR feedback from Matei · be1ee28c
  Evan Chan authored 11 years ago
  
  be1ee28c
Sep 06, 2013
- Add references to make-distribution.sh · ff1dbf21
  Evan Chan authored 11 years ago
  
  ff1dbf21