- Oct 08, 2013
-
Prashant Sharma authored
Conflicts: bagel/pom.xml, core/pom.xml, core/src/test/scala/org/apache/spark/ui/UISuite.scala, examples/pom.xml, mllib/pom.xml, pom.xml, project/SparkBuild.scala, repl/pom.xml, streaming/pom.xml, tools/pom.xml. In Scala 2.10, a shorter representation (the binary version, e.g. 2.10 rather than 2.10.x) is used for naming artifacts, so the artifact names were switched to the short Scala version, which is now a property in the POM.
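For illustration, a minimal Scala sketch (names are illustrative, not the actual SparkBuild.scala change) of deriving the short binary version from a full Scala version and using it in artifact names:

```scala
object ArtifactNaming {
  val scalaFullVersion = "2.10.3" // assumed full compiler version

  // Scala 2.10+ artifacts are conventionally published under the binary
  // version only, e.g. spark-core_2.10 rather than spark-core_2.10.3.
  val scalaBinaryVersion: String =
    scalaFullVersion.split('.').take(2).mkString(".") // "2.10"

  def artifactName(module: String): String =
    s"${module}_$scalaBinaryVersion" // artifactName("spark-core") == "spark-core_2.10"
}
```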
-
- Oct 07, 2013
-
Reynold Xin authored
Fix inconsistent and incorrect log messages in shuffle read path The user-facing messages generated by the CacheManager are currently wrong and somewhat misleading. This patch makes the messages more accurate. It also uses a consistent representation of the partition being fetched (`rdd_xx_yy`) so that it's easier for users to trace what is going on when reading logs.
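As an illustration of the rdd_xx_yy form, a sketch (not the actual CacheManager code) of the tag and how a log line would use it:

```scala
object PartitionTag {
  // xx is the RDD id, yy the partition index, matching the rdd_xx_yy form.
  def apply(rddId: Int, partitionIndex: Int): String =
    s"rdd_${rddId}_$partitionIndex" // e.g. PartitionTag(3, 7) == "rdd_3_7"
}

// Every shuffle-read log line can then name the partition the same way:
// logInfo(s"Partition ${PartitionTag(rddId, split)} not found, computing it")
```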
-
Patrick Wendell authored
-
Patrick Wendell authored
-
Patrick Wendell authored
Adding Shark 0.7.1 to EC2 scripts This adds a newer version of Shark to the ec2 scripts. I've tested this for both Hadoop1 and Hadoop2 clusters.
-
Patrick Wendell authored
-
Reynold Xin authored
Resolving package conflicts with hadoop 0.23.9 Hadoop 0.23.9 has a package conflict with easymock's dependencies. (cherry picked from commit 023e3fdf) Signed-off-by: Reynold Xin <rxin@apache.org>
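A common shape for this kind of fix in sbt is excluding the clashing transitive artifact from the test dependency. This is a hypothetical sketch only; the organization and version shown are illustrative, not necessarily what this commit changed:

```scala
// In the build definition: keep easymock for tests, but drop the
// transitive artifact that clashes with Hadoop 0.23.9's packages.
libraryDependencies += "org.easymock" % "easymock" % "3.1" % "test" excludeAll
  ExclusionRule(organization = "cglib") // illustrative exclusion target
```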
-
- Oct 06, 2013
-
Patrick Wendell authored
merge in remaining changes from `branch-0.8` This merges in the following changes from `branch-0.8`:
- The Scala version is included in the published Maven artifact names
- A unit test which had non-deterministic failures is ignored (see SPARK-908)
- A minor documentation change shows the short version instead of the full version
- The Kafka jar is moved to be "provided"
- The default Spark EC2 version is changed
- Some spacing changes caused by Maven's release plugin
Note that I've squashed this into a single commit rather than pulling in the branch-0.8 history, since there are a bunch of release/revert commits there that make the history super ugly.
-
Patrick Wendell authored
-
- Oct 05, 2013
-
Matei Zaharia authored
Allow users to pass broadcasted Configurations and cache InputFormats across Hadoop file reads. Note: originally from https://github.com/mesos/spark/pull/942. Currently motivated by Shark queries on Hive-partitioned tables, where there is a JobConf broadcast for every Hive partition (i.e., every subdirectory read). The only thing different about those JobConfs is the input path; the Hadoop Configuration that the JobConfs are constructed from remains the same. This PR only modifies the old Hadoop API RDDs, but similar additions to the new API might reduce computation latencies a little for high-frequency FileInputDStreams (which only use the new API right now). As a small bonus, InputFormat caching was added to avoid reflection calls for every RDD#compute(). A few other notes:
- A general soft-reference hashmap was added in SparkHadoopUtil, to avoid adding another class to SparkEnv.
- SparkContext's default hadoopConfiguration isn't cached.
- There's no equals() method for Configuration, so there isn't a good way to determine when configuration properties have changed.
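A minimal sketch of such a soft-reference cache; SoftCache and getOrCompute are illustrative names, not SparkHadoopUtil's actual API:

```scala
import java.lang.ref.SoftReference
import scala.collection.mutable

// Values may be reclaimed by the GC under memory pressure, in which
// case a later lookup misses and the value is simply recomputed.
class SoftCache[K, V] {
  private val map = mutable.HashMap.empty[K, SoftReference[V]]

  def getOrCompute(key: K)(compute: => V): V = synchronized {
    map.get(key).flatMap(ref => Option(ref.get)) match {
      case Some(v) => v                 // still cached
      case None =>                      // missing or already collected
        val v = compute                 // e.g. reflective InputFormat creation
        map(key) = new SoftReference(v)
        v
    }
  }
}
```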
-
Harvey Feng authored
-
Reynold Xin authored
Bumping EC2 default version in master to . This change was already made on . This PR ports the change up to master.
-
Harvey Feng authored
-
Patrick Wendell authored
-
Harvey Feng authored
-
Matei Zaharia authored
SPARK-920/921 - JSON endpoint updates.
920 - Removed the duplicated scheme part of the Spark URI, which was appearing as spark://spark//host:port in the JSON field; the JSON now delivers url:spark://127.0.0.1:7077.
921 - Added the URL of the main application UI, which will allow custom interfaces (that use the JSON output) to redirect from the standalone UI.
-
Matei Zaharia authored
Fixing SPARK-602: PythonPartitioner Currently PythonPartitioner determines partition ID by hashing a byte-array representation of PySpark's key. This PR lets PythonPartitioner use the actual partition ID, which is required e.g. for sorting via PySpark.
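An illustrative sketch of the idea (not the exact PythonPartitioner source): keys that are already partition IDs computed on the Python side are used directly, while byte-array keys keep the old hashing behavior:

```scala
import org.apache.spark.Partitioner

class SketchPythonPartitioner(override val numPartitions: Int) extends Partitioner {
  private def nonNegativeMod(x: Int, mod: Int): Int = {
    val r = x % mod
    if (r < 0) r + mod else r
  }

  override def getPartition(key: Any): Int = key match {
    case null               => 0
    case id: Long           => nonNegativeMod(id.toInt, numPartitions) // actual partition id
    case bytes: Array[Byte] => nonNegativeMod(java.util.Arrays.hashCode(bytes), numPartitions)
    case other              => nonNegativeMod(other.hashCode, numPartitions)
  }
}
```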
-
Prashant Sharma authored
-
Prashant Sharma authored
Conflicts: core/src/test/scala/org/apache/spark/DistributedSuite.scala, project/SparkBuild.scala
-
- Oct 04, 2013
-
Andre Schumacher authored
Currently PythonPartitioner determines partition ID by hashing a byte-array representation of PySpark's key. This PR lets PythonPartitioner use the actual partition ID, which is required e.g. for sorting via PySpark.
-
- Oct 03, 2013
-
Matei Zaharia authored
fixed a wildcard bug in make-distribution.sh; ask sbt to check local maven repo in project/SparkBuild.scala (1) Fixed a wildcard bug in make-distribution.sh: with the wildcard * inside quotes, the cp command failed; it worked after moving the wildcard outside the quotes. (2) Ask sbt to check the local Maven repo in SparkBuild.scala: to build Spark (0.9.0-SNAPSHOT) against the HEAD of Mesos (0.15.0), I must run "make maven-install" under mesos/build, which publishes the Java .jar file under ~/.m2. However, when building Spark (after pointing Mesos to version 0.15.0), sbt uses Ivy, which by default only checks ~/.ivy2. This change tells sbt to also check ~/.m2.
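The standard sbt line for this, as a build-definition sketch (e.g. in project/SparkBuild.scala, with sbt's keys in scope):

```scala
// Also check the local Maven repository that `make maven-install`
// publishes to, in addition to Ivy's default ~/.ivy2 cache.
resolvers += "Local Maven Repository" at
  "file://" + Path.userHome.absolutePath + "/.m2/repository"
```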
-
Matei Zaharia authored
Update README: updated the link
-
Matei Zaharia authored
Allow users to set the application name for Spark on Yarn
-
tgravescs authored
-
- Oct 02, 2013
-
Matei Zaharia authored
Send task results through the block manager when they are larger than the Akka frame size (fixes SPARK-669). This change requires adding an extra failure mode: tasks can complete successfully, but the result gets lost or flushed from the block manager before it's been fetched. This change also moves the deserialization of tasks into a separate thread, so it's no longer part of the DAG scheduler's tight loop. This should improve scheduler throughput, particularly when tasks are sending back large results. Thanks Josh for writing the original version of this patch! This is duplicated from the mesos/spark repo: https://github.com/mesos/spark/pull/835
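A sketch of the dispatch decision; the result types, the store callback, and the 10 MB frame size are stand-ins for Spark's internals, not its actual API:

```scala
import java.nio.ByteBuffer

sealed trait SketchTaskResult
case class DirectResult(bytes: ByteBuffer) extends SketchTaskResult  // rides in the Akka message
case class IndirectResult(blockId: String) extends SketchTaskResult // fetched from the block manager

object ResultDispatch {
  val akkaFrameSize: Int = 10 * 1024 * 1024 // assumed frame limit

  // `store` stands in for putting bytes into the block manager.
  def choose(taskId: Long, result: ByteBuffer)
            (store: (String, ByteBuffer) => Unit): SketchTaskResult =
    if (result.limit() >= akkaFrameSize) {
      val blockId = s"taskresult_$taskId"
      store(blockId, result)  // park the large payload in the block manager
      IndirectResult(blockId) // ship only the block id over Akka
    } else {
      DirectResult(result)    // small enough for the message itself
    }
}
```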
-
tgravescs authored
-
David McCauley authored
-
David McCauley authored
-
- Oct 01, 2013
-
Du Li authored
-
Du Li authored
-
CruncherBigData authored
-
Prashant Sharma authored
Conflicts: core/src/main/scala/org/apache/spark/ui/jobs/JobProgressUI.scala, docs/_config.yml, project/SparkBuild.scala, repl/src/main/scala/org/apache/spark/repl/SparkILoop.scala
-
Kay Ousterhout authored
-
Kay Ousterhout authored
-
- Sep 30, 2013
-
Kay Ousterhout authored
Conflicts: core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterScheduler.scala, core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterTaskSetManager.scala, core/src/main/scala/org/apache/spark/scheduler/local/LocalTaskSetManager.scala
-
Kay Ousterhout authored
-
Prashant Sharma authored
-
- Sep 29, 2013
-
Harvey Feng authored
-
- Sep 26, 2013
-
Harvey Feng authored
-
Reynold Xin authored
Remove -optimize flag
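A build-definition sketch of the shape of this change; the remaining flags shown are illustrative, not the exact list in SparkBuild.scala:

```scala
// scalac flags with "-optimize" dropped (it previously appeared here).
scalacOptions ++= Seq("-unchecked", "-deprecation")
```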
-