- Oct 08, 2013
-
Prashant Sharma authored
Conflicts: bagel/pom.xml, core/pom.xml, core/src/test/scala/org/apache/spark/ui/UISuite.scala, examples/pom.xml, mllib/pom.xml, pom.xml, project/SparkBuild.scala, repl/pom.xml, streaming/pom.xml, tools/pom.xml. In Scala 2.10, a shorter representation (the binary version, e.g. 2.10 rather than 2.10.x) is used for naming artifacts, so the artifact names were switched to the short Scala version, which is now a property in the POM.
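For illustration, a minimal Scala sketch (names are illustrative, not the actual SparkBuild.scala change) of deriving the short binary version from a full Scala version and using it in artifact names:

```scala
object ArtifactNaming {
  val scalaFullVersion = "2.10.3" // assumed full compiler version

  // Scala 2.10+ artifacts are conventionally published under the binary
  // version only, e.g. spark-core_2.10 rather than spark-core_2.10.3.
  val scalaBinaryVersion: String =
    scalaFullVersion.split('.').take(2).mkString(".") // "2.10"

  def artifactName(module: String): String =
    s"${module}_$scalaBinaryVersion" // artifactName("spark-core") == "spark-core_2.10"
}
```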
-
- Oct 07, 2013
-
Reynold Xin authored
Fix inconsistent and incorrect log messages in shuffle read path The user-facing messages generated by the CacheManager are currently wrong and somewhat misleading. This patch makes the messages more accurate. It also uses a consistent representation of the partition being fetched (`rdd_xx_yy`) so that it's easier for users to trace what is going on when reading logs.
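As an illustration of the rdd_xx_yy form, a sketch (not the actual CacheManager code) of the tag and how a log line would use it:

```scala
object PartitionTag {
  // xx is the RDD id, yy the partition index, matching the rdd_xx_yy form.
  def apply(rddId: Int, partitionIndex: Int): String =
    s"rdd_${rddId}_$partitionIndex" // e.g. PartitionTag(3, 7) == "rdd_3_7"
}

// Every shuffle-read log line can then name the partition the same way:
// logInfo(s"Partition ${PartitionTag(rddId, split)} not found, computing it")
```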
-
Patrick Wendell authored
-
Patrick Wendell authored
-
Patrick Wendell authored
Adding Shark 0.7.1 to EC2 scripts This adds a newer version of Shark to the ec2 scripts. I've tested this for both Hadoop1 and Hadoop2 clusters.
-
Patrick Wendell authored
-
Reynold Xin authored
Resolving package conflicts with hadoop 0.23.9 Hadoop 0.23.9 has a package conflict with easymock's dependencies. (cherry picked from commit 023e3fdf) Signed-off-by: Reynold Xin <rxin@apache.org>
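A common shape for this kind of fix in sbt is excluding the clashing transitive artifact from the test dependency. This is a hypothetical sketch only; the organization and version shown are illustrative, not necessarily what this commit changed:

```scala
// In the build definition: keep easymock for tests, but drop the
// transitive artifact that clashes with Hadoop 0.23.9's packages.
libraryDependencies += "org.easymock" % "easymock" % "3.1" % "test" excludeAll
  ExclusionRule(organization = "cglib") // illustrative exclusion target
```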
-
- Oct 06, 2013
-
Patrick Wendell authored
merge in remaining changes from `branch-0.8` This merges in the following changes from `branch-0.8`:
- The Scala version is included in the published Maven artifact names
- A unit test which had non-deterministic failures is ignored (see SPARK-908)
- A minor documentation change shows the short version instead of the full version
- The Kafka jar is moved to be "provided"
- The default Spark EC2 version is changed
- Some spacing changes caused by Maven's release plugin
Note that I've squashed this into a single commit rather than pulling in the branch-0.8 history, since there are a bunch of release/revert commits there that make the history super ugly.
-
Patrick Wendell authored
-
- Oct 05, 2013
-
Matei Zaharia authored
Allow users to pass broadcasted Configurations and cache InputFormats across Hadoop file reads. Note: originally from https://github.com/mesos/spark/pull/942. Currently motivated by Shark queries on Hive-partitioned tables, where there is a JobConf broadcast for every Hive partition (i.e., every subdirectory read). The only thing different about those JobConfs is the input path; the Hadoop Configuration that the JobConfs are constructed from remains the same. This PR only modifies the old Hadoop API RDDs, but similar additions to the new API might reduce computation latencies a little for high-frequency FileInputDStreams (which only use the new API right now). As a small bonus, InputFormat caching was added to avoid reflection calls for every RDD#compute(). A few other notes:
- A general soft-reference hashmap was added in SparkHadoopUtil, to avoid adding another class to SparkEnv.
- SparkContext's default hadoopConfiguration isn't cached.
- There's no equals() method for Configuration, so there isn't a good way to determine when configuration properties have changed.
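A minimal sketch of such a soft-reference cache; SoftCache and getOrCompute are illustrative names, not SparkHadoopUtil's actual API:

```scala
import java.lang.ref.SoftReference
import scala.collection.mutable

// Values may be reclaimed by the GC under memory pressure, in which
// case a later lookup misses and the value is simply recomputed.
class SoftCache[K, V] {
  private val map = mutable.HashMap.empty[K, SoftReference[V]]

  def getOrCompute(key: K)(compute: => V): V = synchronized {
    map.get(key).flatMap(ref => Option(ref.get)) match {
      case Some(v) => v                 // still cached
      case None =>                      // missing or already collected
        val v = compute                 // e.g. reflective InputFormat creation
        map(key) = new SoftReference(v)
        v
    }
  }
}
```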
-
Harvey Feng authored
-
Reynold Xin authored
Bumping EC2 default version in master to . This change was already made on . This PR ports the change up to master.
-
Harvey Feng authored
-
Patrick Wendell authored
-
Harvey Feng authored
-
Matei Zaharia authored
SPARK-920/921 - JSON endpoint updates.
920 - Removed the duplicated scheme part of the Spark URI, which was appearing as spark://spark//host:port in the JSON field; the JSON now delivers url:spark://127.0.0.1:7077.
921 - Added the URL of the main application UI, which will allow custom interfaces (that use the JSON output) to redirect from the standalone UI.
-
Matei Zaharia authored
Fixing SPARK-602: PythonPartitioner Currently PythonPartitioner determines partition ID by hashing a byte-array representation of PySpark's key. This PR lets PythonPartitioner use the actual partition ID, which is required e.g. for sorting via PySpark.
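An illustrative sketch of the idea (not the exact PythonPartitioner source): keys that are already partition IDs computed on the Python side are used directly, while byte-array keys keep the old hashing behavior:

```scala
import org.apache.spark.Partitioner

class SketchPythonPartitioner(override val numPartitions: Int) extends Partitioner {
  private def nonNegativeMod(x: Int, mod: Int): Int = {
    val r = x % mod
    if (r < 0) r + mod else r
  }

  override def getPartition(key: Any): Int = key match {
    case null               => 0
    case id: Long           => nonNegativeMod(id.toInt, numPartitions) // actual partition id
    case bytes: Array[Byte] => nonNegativeMod(java.util.Arrays.hashCode(bytes), numPartitions)
    case other              => nonNegativeMod(other.hashCode, numPartitions)
  }
}
```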
-
Prashant Sharma authored
-
Prashant Sharma authored
Conflicts: core/src/test/scala/org/apache/spark/DistributedSuite.scala, project/SparkBuild.scala
-
- Oct 04, 2013
-
Andre Schumacher authored
Currently PythonPartitioner determines partition ID by hashing a byte-array representation of PySpark's key. This PR lets PythonPartitioner use the actual partition ID, which is required e.g. for sorting via PySpark.
-
- Oct 03, 2013
-
Matei Zaharia authored
fixed a wildcard bug in make-distribution.sh; ask sbt to check local maven repo in project/SparkBuild.scala (1) Fixed a wildcard bug in make-distribution.sh: with the wildcard * inside quotes, the cp command failed; it worked after moving the wildcard outside the quotes. (2) Ask sbt to check the local Maven repo in SparkBuild.scala: to build Spark (0.9.0-SNAPSHOT) against the HEAD of Mesos (0.15.0), I must run "make maven-install" under mesos/build, which publishes the Java .jar file under ~/.m2. However, when building Spark (after pointing Mesos to version 0.15.0), sbt uses Ivy, which by default only checks ~/.ivy2. This change tells sbt to also check ~/.m2.
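The standard sbt line for this, as a build-definition sketch (e.g. in project/SparkBuild.scala, with sbt's keys in scope):

```scala
// Also check the local Maven repository that `make maven-install`
// publishes to, in addition to Ivy's default ~/.ivy2 cache.
resolvers += "Local Maven Repository" at
  "file://" + Path.userHome.absolutePath + "/.m2/repository"
```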
-
Matei Zaharia authored
Update README: updated the link
-
Matei Zaharia authored
Allow users to set the application name for Spark on Yarn
-
tgravescs authored
-
- Oct 02, 2013
-
Matei Zaharia authored
Send task results through the block manager when they are larger than the Akka frame size (fixes SPARK-669). This change requires adding an extra failure mode: tasks can complete successfully, but the result gets lost or flushed from the block manager before it's been fetched. This change also moves the deserialization of tasks into a separate thread, so it's no longer part of the DAG scheduler's tight loop. This should improve scheduler throughput, particularly when tasks are sending back large results. Thanks Josh for writing the original version of this patch! This is duplicated from the mesos/spark repo: https://github.com/mesos/spark/pull/835
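A sketch of the dispatch decision; the result types, the store callback, and the 10 MB frame size are stand-ins for Spark's internals, not its actual API:

```scala
import java.nio.ByteBuffer

sealed trait SketchTaskResult
case class DirectResult(bytes: ByteBuffer) extends SketchTaskResult  // rides in the Akka message
case class IndirectResult(blockId: String) extends SketchTaskResult // fetched from the block manager

object ResultDispatch {
  val akkaFrameSize: Int = 10 * 1024 * 1024 // assumed frame limit

  // `store` stands in for putting bytes into the block manager.
  def choose(taskId: Long, result: ByteBuffer)
            (store: (String, ByteBuffer) => Unit): SketchTaskResult =
    if (result.limit() >= akkaFrameSize) {
      val blockId = s"taskresult_$taskId"
      store(blockId, result)  // park the large payload in the block manager
      IndirectResult(blockId) // ship only the block id over Akka
    } else {
      DirectResult(result)    // small enough for the message itself
    }
}
```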
-
tgravescs authored
-
David McCauley authored
-
David McCauley authored
-
- Oct 01, 2013
-
Du Li authored
-
Du Li authored
-
CruncherBigData authored
-
Prashant Sharma authored
Conflicts: core/src/main/scala/org/apache/spark/ui/jobs/JobProgressUI.scala, docs/_config.yml, project/SparkBuild.scala, repl/src/main/scala/org/apache/spark/repl/SparkILoop.scala
-
Kay Ousterhout authored
-
Kay Ousterhout authored
-
- Sep 30, 2013
-
Kay Ousterhout authored
Conflicts: core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterScheduler.scala, core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterTaskSetManager.scala, core/src/main/scala/org/apache/spark/scheduler/local/LocalTaskSetManager.scala
-
Kay Ousterhout authored
-
Prashant Sharma authored
-
- Sep 29, 2013
-
Harvey Feng authored
-
- Sep 26, 2013
-
Harvey Feng authored
-
Reynold Xin authored
Remove -optimize flag
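A build-definition sketch of the shape of this change; the remaining flags shown are illustrative, not the exact list in SparkBuild.scala:

```scala
// scalac flags with "-optimize" dropped (it previously appeared here).
scalacOptions ++= Seq("-unchecked", "-deprecation")
```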
-