- May 06, 2014
-
-
ArcherShao authored
Modify wrong comment of function addWithoutResize. Author: ArcherShao <ArcherShao@users.noreply.github.com> Closes #667 from ArcherShao/patch-3 and squashes the following commits: a607358 [ArcherShao] Update OpenHashSet.scala
-
Michael Armbrust authored
I also removed a println that I bumped into. Author: Michael Armbrust <michael@databricks.com> Closes #658 from marmbrus/nullPrimitives and squashes the following commits: a3ec4f3 [Michael Armbrust] Remove println. 695606b [Michael Armbrust] Support for null primatives from using scala and java reflection.
-
Andrew Or authored
73b0cbcc introduced a few special profiles that are not covered in the `make-distribution.sh`. This affects hadoop versions 2.2.x, 2.3.x, and 2.4.x. Without these special profiles, a java version error for protobufs is thrown at run time. I took the opportunity to rewrite the way we construct the maven command. Previously, the only hadoop version that triggered the `yarn-alpha` profile was 0.23.x, which was inconsistent with the [docs](https://github.com/apache/spark/blob/master/docs/building-with-maven.md). This is now generalized to hadoop versions from 0.23.x to 2.1.x. Author: Andrew Or <andrewor14@gmail.com> Closes #660 from andrewor14/hadoop-distribution and squashes the following commits: 6740126 [Andrew Or] Generalize the yarn profile to hadoop versions 2.2+ 88f192d [Andrew Or] Add the required special profiles to make-distribution.sh
-
- May 05, 2014
-
-
Cheng Lian authored
[SPARK-1678][SPARK-1679] In-memory compression bug fix and made compression configurable, disabled by default In-memory compression is now configurable in `SparkConf` by the `spark.sql.inMemoryCompression.enabled` property, and is disabled by default. To help code review, the bug fix is in [the first commit](https://github.com/liancheng/spark/commit/d537a367edf0bf24d0b925cc58b21d805ccbc11f), compression configuration is in [the second one](https://github.com/liancheng/spark/commit/4ce09aa8aa820bbbbbaa0f3f084a6cff1d4e6195). Author: Cheng Lian <lian.cs.zju@gmail.com> Closes #608 from liancheng/spark-1678 and squashes the following commits: 66c3a8d [Cheng Lian] Renamed in-memory compression configuration key f8fb3a0 [Cheng Lian] Added assertion for testing .hasNext of various decoder 4ce09aa [Cheng Lian] Made in-memory compression configurable via SparkConf d537a36 [Cheng Lian] Fixed SPARK-1678
-
Xiangrui Meng authored
Final pass before the v1.0 release. * Remove `VectorRDDs` * Move `BinaryClassificationMetrics` from `evaluation.binary` to `evaluation` * Change default value of `addIntercept` to false and allow to add intercept in Ridge and Lasso. * Clean `DecisionTree` package doc and test suite. * Mark model constructors `private[spark]` * Rename `loadLibSVMData` to `loadLibSVMFile` and hide `LabelParser` from users. * Add `saveAsLibSVMFile`. * Add `appendBias` to `MLUtils`. Author: Xiangrui Meng <meng@databricks.com> Closes #524 from mengxr/mllib-cleaning and squashes the following commits: 295dc8b [Xiangrui Meng] update loadLibSVMFile doc 1977ac1 [Xiangrui Meng] fix doc of appendBias 649fcf0 [Xiangrui Meng] rename loadLibSVMData to loadLibSVMFile; hide LabelParser from user APIs 54b812c [Xiangrui Meng] add appendBias a71e7d0 [Xiangrui Meng] add saveAsLibSVMFile d976295 [Xiangrui Meng] Merge branch 'master' into mllib-cleaning b7e5cec [Xiangrui Meng] remove some experimental annotations and make model constructors private[mllib] 9b02b93 [Xiangrui Meng] minor code style update a593ddc [Xiangrui Meng] fix python tests fc28c18 [Xiangrui Meng] mark more classes experimental f6cbbff [Xiangrui Meng] fix Java tests 0af70b0 [Xiangrui Meng] minor 6e139ef [Xiangrui Meng] Merge branch 'master' into mllib-cleaning 94e6dce [Xiangrui Meng] move BinaryLabelCounter and BinaryConfusionMatrixImpl to evaluation.binary df34907 [Xiangrui Meng] clean DecisionTreeSuite to use LocalSparkContext c81807f [Xiangrui Meng] set the default value of AddIntercept to false 03389c0 [Xiangrui Meng] allow to add intercept in Ridge and Lasso c66c56f [Xiangrui Meng] move tree md to package object doc a2695df [Xiangrui Meng] update guide for BinaryClassificationMetrics 9194f4c [Xiangrui Meng] move BinaryClassificationMetrics one level up 1c1a0e3 [Xiangrui Meng] remove VectorRDDs because it only contains one function that is not necessary for us to maintain
-
Andrew Or authored
Hopefully this can go into 1.0, as a few people on the user list have asked for this. Author: Andrew Or <andrewor14@gmail.com> Closes #648 from andrewor14/expose-listeners and squashes the following commits: e45e1ef [Andrew Or] Add missing colons (minor) 350d643 [Andrew Or] Expose SparkListeners and relevant classes as DeveloperApi
-
Sandy Ryza authored
Author: Sandy Ryza <sandy@cloudera.com> Closes #657 from sryza/sandy-spark-1728 and squashes the following commits: 4751443 [Sandy Ryza] SPARK-1728. JavaRDDLike.mapPartitionsWithIndex requires ClassTag
-
Andrew Or authored
This copies the datanucleus jars over from `lib_managed` into `dist/lib`, if any. The `CLASSPATH` must also be updated to reflect this change. Author: Andrew Or <andrewor14@gmail.com> Closes #610 from andrewor14/hive-distribution and squashes the following commits: a4bc96f [Andrew Or] Rename search path in jar error check fa205e1 [Andrew Or] Merge branch 'master' of github.com:apache/spark into hive-distribution 7855f58 [Andrew Or] Have jar command respect JAVA_HOME + check for jar errors both cases c16bbfd [Andrew Or] Merge branch 'master' of github.com:apache/spark into hive-distribution 32f6826 [Andrew Or] Leave the double colons 940a1bb [Andrew Or] Add back 2>/dev/null 58357cc [Andrew Or] Include datanucleus jars in Spark distribution built with Hive support
-
Tathagata Das authored
- SPARK-1558: Updated custom receiver guide to match it with the new API - SPARK-1504: Added deployment and monitoring subsection to streaming - SPARK-1505: Added migration guide for migrating from 0.9.x and below to Spark 1.0 - Updated various Java streaming examples to use JavaReceiverInputDStream to highlight the API change. - Removed the requirement for cleaner ttl from streaming guide Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #652 from tdas/doc-fix and squashes the following commits: cb4f4b7 [Tathagata Das] Possible fix for flaky graceful shutdown test. ab71f7f [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into doc-fix 8d6ff9b [Tathagata Das] Addded migration guide to Spark Streaming. 7d171df [Tathagata Das] Added reference to JavaReceiverInputStream in examples and streaming guide. 49edd7c [Tathagata Das] Change java doc links to use Java docs. 11528d7 [Tathagata Das] Updated links on index page. ff80970 [Tathagata Das] More updates to streaming guide. 4dc42e9 [Tathagata Das] Added monitoring and other documentation in the streaming guide. 14c6564 [Tathagata Das] Updated custom receiver guide.
-
Bouke van der Bijl authored
This is because Mesos calls it with a different environment or something, the result is that the Spark jar is missing and it can't load classes. This fixes http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-ClassNotFoundException-spark-on-mesos-td3510.html I have no idea whether this is the right fix, I can only confirm that it fixes the issue for us. The `registered` method is called from mesos (https://github.com/apache/mesos/blob/765ff9bc2ac5a12d4362f8235b572a37d646390a/src/java/jni/org_apache_mesos_MesosExecutorDriver.cpp) I am unsure which commit caused this regression Author: Bouke van der Bijl <boukevanderbijl@gmail.com> Closes #620 from bouk/mesos-classloader-fix and squashes the following commits: c13eae0 [Bouke van der Bijl] Use getContextOrSparkClassLoader in SparkEnv and CompressionCodec
-
Sean Owen authored
See related discussion at https://github.com/apache/spark/pull/468 This PR may still overstep what you have in mind, but let me put it on the table to start. Besides fixing the issue, it has one substantive change, and that is to manage Hadoop-specific things only in Hadoop-related profiles. This does _not_ remove `yarn.version`. - Moves the YARN and Hadoop profiles together in pom.xml. Sorry that this makes the diff a little hard to grok but the changes are only as follows. - Removes `hadoop.major.version` - Introduce `hadoop-2.2` and `hadoop-2.3` profiles to control Hadoop-specific changes: - like the protobuf version issue - this was only 'solved' now by enabling YARN for 2.2+, which is really an orthogonal issue - like the jets3t version issue now - Hadoop profiles set an appropriate default `hadoop.version`, that can be overridden - _(YARN profiles in the parent now only exist to add the sub-module)_ - Fixes the jets3t dependency issue - and makes it a runtime dependency - and centralizes config of this guy in the parent pom - Updates build docs - Updates SBT build too - and fixes a regex problem along the way Author: Sean Owen <sowen@cloudera.com> Closes #629 from srowen/SPARK-1556 and squashes the following commits: c3fa967 [Sean Owen] Fix hadoop-2.4 profile typo in doc a2105fd [Sean Owen] Add hadoop-2.4 profile and don't set hadoop.version in profiles 274f4f9 [Sean Owen] Make jets3t a runtime dependency, and bring its exclusion up into parent config bbed826 [Sean Owen] Use jets3t 0.9.0 for Hadoop 2.3+ (and correct similar regex issue in SBT build) f21f356 [Sean Owen] Build changes to set up for jets3t fix
-
Reynold Xin authored
See discussion from http://apache-spark-developers-list.1001551.n3.nabble.com/bug-using-kryo-as-closure-serializer-td6473.html Author: Reynold Xin <rxin@apache.org> Closes #642 from rxin/docs-ser and squashes the following commits: a507db5 [Reynold Xin] Use "Java" instead of default. 5eb8cdd [Reynold Xin] Updated doc for spark.closure.serializer to indicate only the default serializer work.
-
- May 04, 2014
-
-
msiddalingaiah authored
I tested the change locally with Spark 0.9.1, but I can't test with 1.0.0 because there was no AMI for it at the time. It's a trivial fix, so it shouldn't cause any problems. Author: msiddalingaiah <madhu@madhu.com> Closes #641 from msiddalingaiah/master and squashes the following commits: a4f7404 [msiddalingaiah] Address SPARK-1717
-
Sandeep authored
Catching the InvocationTargetException, printing getTargetException. Author: Sandeep <sandeep@techaddict.me> Closes #630 from techaddict/SPARK-1710 and squashes the following commits: 834d79b [Sandeep] changes from srowen suggestions 109d604 [Sandeep] SPARK-1710: spark-submit should print better errors than "InvocationTargetException"
-
Allan Douglas R. de Oliveira authored
This is specially import because some ssh errors are raised as UsageError, preventing an automated usage of the script from detecting the failure. Author: Allan Douglas R. de Oliveira <allan@chaordicsystems.com> Closes #638 from douglaz/ec2_exit_code_fix and squashes the following commits: 5915e6d [Allan Douglas R. de Oliveira] EC2 script should exit with non-zero code on UsageError
-
witgo authored
...park built for hadoop 2.3.0 , 2.4.0 Author: witgo <witgo@qq.com> Closes #628 from witgo/SPARK-1693_new and squashes the following commits: e3af968 [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1693_new dc63905 [witgo] SPARK-1693: Most of the tests throw a java.lang.SecurityException when spark built for hadoop 2.3.0 , 2.4.0
-
Sean Owen authored
SPARK-1629. Addendum: Depend on commons lang3 (already used by tachyon) as it's used in ReplSuite, and return to use lang3 utility in Utils.scala For consideration. This was proposed in related discussion: https://github.com/apache/spark/pull/569 Author: Sean Owen <sowen@cloudera.com> Closes #635 from srowen/SPARK-1629.2 and squashes the following commits: a442b98 [Sean Owen] Depend on commons lang3 (already used by tachyon) as it's used in ReplSuite, and return to use lang3 utility in Utils.scala
-
Patrick Wendell authored
This add some guards and good warning messages if users hit this issue. /cc @aarondav with whom I discussed parts of the design. Author: Patrick Wendell <pwendell@gmail.com> Closes #627 from pwendell/jdk6 and squashes the following commits: a38a958 [Patrick Wendell] Code review feedback 94e9f84 [Patrick Wendell] SPARK-1703 Warn users if Spark is run on JRE6 but compiled with JDK7.
-
Sean Owen authored
It appears that one of these methods doesn't use `org.apache.spark.api.java.function.Function2` like all the others, but uses Scala's `Function2`. Author: Sean Owen <sowen@cloudera.com> Closes #633 from srowen/SPARK-1663.2 and squashes the following commits: 1e0232d [Sean Owen] Fix signature of one version of reduceByKeyAndWindow to use Java API Function2, as apparently intended
-
Rahul Singhal authored
The current test is checking the exit code of "tail" rather than "mvn". This new check will make sure that mvn is installed and was able to execute the "version command". Author: Rahul Singhal <rahul.singhal@guavus.com> Closes #580 from rahulsinghaliitd/SPARK-1658 and squashes the following commits: 83c0313 [Rahul Singhal] SPARK-1658: Correctly identify if maven is installed and working bf821b9 [Rahul Singhal] SPARK-1658: Correctly identify if maven is installed and working
-
witgo authored
This is a part of [PR 590](https://github.com/apache/spark/pull/590) Author: witgo <witgo@qq.com> Closes #626 from witgo/yarn_version and squashes the following commits: c390631 [witgo] restore the yarn dependency declarations f8a4ad8 [witgo] revert remove the dependency of avro in yarn-alpha 2df6cf5 [witgo] review commit a1d876a [witgo] review commit 20e7e3e [witgo] review commit c76763b [witgo] The default value of yarn.version is equal to hadoop.version
-
Michael Armbrust authored
This is ready when Jenkins is. Author: Michael Armbrust <michael@databricks.com> Closes #596 from marmbrus/moreTests and squashes the following commits: 85be703 [Michael Armbrust] Blacklist MR required tests. 35bc311 [Michael Armbrust] Add hive golden answers. ede98fd [Michael Armbrust] More hive gitignore da096ea [Michael Armbrust] update whitelist
-
- May 03, 2014
-
-
Michael Armbrust authored
Author: Michael Armbrust <michael@databricks.com> Closes #616 from marmbrus/ruleLogging and squashes the following commits: 39c09fe [Michael Armbrust] Fix off by one error. 5af3537 [Michael Armbrust] Better logging when applying rules.
-
Allan Douglas R. de Oliveira authored
Added option to configure number of worker instances and to set SPARK_MASTER_OPTS Depends on: https://github.com/mesos/spark-ec2/pull/46 Author: Allan Douglas R. de Oliveira <allan@chaordicsystems.com> Closes #612 from douglaz/ec2_configurable_workers and squashes the following commits: d6c5d65 [Allan Douglas R. de Oliveira] Added master opts parameter 6c34671 [Allan Douglas R. de Oliveira] Use number of worker instances as string on template ba528b9 [Allan Douglas R. de Oliveira] Added SPARK_WORKER_INSTANCES parameter
-
Aaron Davidson authored
Previously, we indicated disconnected(), which keeps the application in a limbo state where it has no executors but thinks it will get them soon. This is a bug fix that hopefully can be included in 1.0. Author: Aaron Davidson <aaron@databricks.com> Closes #605 from aarondav/appremoved and squashes the following commits: bea02a2 [Aaron Davidson] SPARK-1689 AppClient should indicate app is dead() when removed
-
Cheng Lian authored
Should lookup `shutdownDeleteTachyonPaths` instead of `shutdownDeletePaths`. Together with a minor style clean up: `find {...}.isDefined` to `exists {...}`. Author: Cheng Lian <lian.cs.zju@gmail.com> Closes #575 from liancheng/tachyonFix and squashes the following commits: deb8f31 [Cheng Lian] Fixed logical error in when cleanup Tachyon files and minor style cleanup
-
Sean Owen authored
SPARK-1663. Corrections for several compile errors in streaming code examples, and updates to follow API changes I gave the Streaming code examples, both Scala and Java, a test run today. I turned up a number of small errors, mostly compile errors in the Java examples. There were a few typos in the Scala too. I also took the liberty of adding things like imports, since in several cases they are not obvious. Feel free to push back on some changes. There's one thing I haven't quite addressed in the changes. `JavaPairDStream` uses the Java API version of `Function2` in almost all cases, as `JFunction2`. However it uses `scala.Function2` in: ``` def reduceByKeyAndWindow(reduceFunc: Function2[V, V, V], windowDuration: Duration) :JavaPairDStream[K, V] = { dstream.reduceByKeyAndWindow(reduceFunc, windowDuration) } ``` Is that a typo? Also, in Scala, I could not get this to compile: ``` val windowedWordCounts = pairs.reduceByKeyAndWindow(_ + _, Seconds(30), Seconds(10)) error: missing parameter type for expanded function ((x$1, x$2) => x$1.$plus(x$2)) ``` You can see my fix below but am I missing something? Otherwise I can say these all worked for me! Author: Sean Owen <sowen@cloudera.com> Closes #589 from srowen/SPARK-1663 and squashes the following commits: 65a906b [Sean Owen] Corrections for several compile errors in streaming code examples, and updates to follow API changes
-
Thomas Graves authored
Move the doAs in Executor higher up so that we only have 1 ugi and aren't leaking filesystems. Fix spark on yarn to work when the cluster is running as user "yarn" but the clients are launched as the user and want to read/write to hdfs as the user. Note this hasn't been fully tested yet. Need to test in standalone mode. Putting this up for people to look at and possibly test. I don't have access to a mesos cluster. This is alternative to https://github.com/apache/spark/pull/607 Author: Thomas Graves <tgraves@apache.org> Closes #621 from tgravescs/SPARK-1676 and squashes the following commits: 244d55a [Thomas Graves] fix line length 44163d4 [Thomas Graves] Rework 9398853 [Thomas Graves] change to have doAs in executor higher up.
-
ArcherShao authored
Modify spelling errors Author: ArcherShao <ArcherShao@users.noreply.github.com> Closes #619 from ArcherShao/patch-1 and squashes the following commits: 2957195 [ArcherShao] Update SchemaRDD.scala
-
Aaron Davidson authored
This will ensure that sockets do not build up over the course of a job, and that cancellation successfully cleans up sockets. Tested in standalone mode. More file descriptors spawn than expected (around 1000ish rather than the expected 8ish) but they do not pile up between runs, or as high as before (where they went up to around 5k). Author: Aaron Davidson <aaron@databricks.com> Closes #623 from aarondav/pyspark2 and squashes the following commits: 0ca13bb [Aaron Davidson] SPARK-1700: Close socket file descriptors on task completion
-
- May 02, 2014
-
-
Sandy Ryza authored
Author: Sandy Ryza <sandy@cloudera.com> Closes #601 from sryza/sandy-spark-1492 and squashes the following commits: 5df1634 [Sandy Ryza] Address additional comments from Patrick. be46d1f [Sandy Ryza] Address feedback from Marcelo and Patrick 867a3ea [Sandy Ryza] SPARK-1492. Update Spark YARN docs to use spark-submit
-
wangfei authored
Author: wangfei <wangfei_hello@126.com> Closes #613 from scwf/masterIndex and squashes the following commits: 1463056 [wangfei] delete no use var: masterIndex
-
witgo authored
...llections does not exist Author: witgo <witgo@qq.com> Closes #611 from witgo/SPARK-1695 and squashes the following commits: d77a887 [witgo] Fix SPARK-1695: java8-tests compiler error: package com.google.common.collections does not exist
-
- May 01, 2014
-
-
Andrew Or authored
Modifications to Spark core are limited to exposing functionality to test files + minor style fixes. (728 / 769 lines are from tests) Author: Andrew Or <andrewor14@gmail.com> Closes #591 from andrewor14/event-log-tests and squashes the following commits: 2883837 [Andrew Or] Merge branch 'master' of github.com:apache/spark into event-log-tests c3afcea [Andrew Or] Compromise 2d5daf8 [Andrew Or] Use temp directory provided by the OS rather than /tmp 2b52151 [Andrew Or] Remove unnecessary file delete + add a comment 62010fd [Andrew Or] More cleanup (renaming variables, updating comments etc) ad2beff [Andrew Or] Clean up EventLoggingListenerSuite + modify a few comments 862e752 [Andrew Or] Merge branch 'master' of github.com:apache/spark into event-log-tests e0ba2f8 [Andrew Or] Fix test failures caused by race condition in processing/mutating events b990453 [Andrew Or] ReplayListenerBus suite - tests do not all pass yet ab66a84 [Andrew Or] Tests for FileLogger + delete file after tests 187bb25 [Andrew Or] Formatting and renaming variables 769336f [Andrew Or] Merge branch 'master' of github.com:apache/spark into event-log-tests 5d38ffe [Andrew Or] Clean up EventLoggingListenerSuite + add comments e12f4b1 [Andrew Or] Preliminary tests for EventLoggingListener (need major cleanup)
-
witgo authored
Author: witgo <witgo@qq.com> Closes #581 from witgo/SPARK-1659 and squashes the following commits: 0b2cf98 [witgo] Delete spark-submit obsolete usage: "--arg ARG"
-
wangfei authored
Author: wangfei <wangfei_hello@126.com> Closes #614 from scwf/pxcw and squashes the following commits: d1016ba [wangfei] fix spelling mistake
-
Michael Armbrust authored
The JIRA in question is actually reporting a bug with Shark, but I wanted to make sure Spark SQL did not have similar problems. This fixes a bug in our parsing code that was preventing the test from executing, but it looks like the RegexSerDe is working in Spark SQL. Author: Michael Armbrust <michael@databricks.com> Closes #595 from marmbrus/fixRegexSerdeTest and squashes the following commits: a4dc612 [Michael Armbrust] Add files created by hive to gitignore. efa6402 [Michael Armbrust] Fix Hive serde_regex test.
-
Patrick Wendell authored
This is a fairly straightforward fix. The bug was reported by @vanzin and the fix was proposed by @deanwampler and myself. Please take a look! Author: Patrick Wendell <pwendell@gmail.com> Closes #609 from pwendell/quotes and squashes the following commits: 8bed767 [Patrick Wendell] SPARK-1691: Support quoted arguments inside of spark-submit.
-
- Apr 30, 2014
-
-
witgo authored
...OS_WINDOWS` Author: witgo <witgo@qq.com> Closes #569 from witgo/SPARK-1629 and squashes the following commits: 31520eb [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1629 fcaafd7 [witgo] merge mastet 49e248e [witgo] Fix SPARK-1629: Spark should inline use of commons-lang `SystemUtils.IS_OS_WINDOWS`
-
Sandy Ryza authored
This reopens https://github.com/apache/incubator-spark/pull/640 against the new repo Author: Sandy Ryza <sandy@cloudera.com> Closes #30 from sryza/sandy-spark-1004 and squashes the following commits: 89889d4 [Sandy Ryza] Move unzipping py4j to the generate-resources phase so that it gets included in the jar the first time 5165a02 [Sandy Ryza] Fix docs fd0df79 [Sandy Ryza] PySpark on YARN
-