  1. Sep 03, 2014
    • Marcelo Vanzin's avatar
      [SPARK-3388] Expose application ID in ApplicationStart event, use it in history server. · f2b5b619
      Marcelo Vanzin authored
      This change exposes the application ID generated by the Spark Master, Mesos or Yarn
      via the SparkListenerApplicationStart event. It then uses that information to expose the
      application via its ID in the history server, instead of using the internal directory name
      generated by the event logger as an application id. This allows someone who knows
      the application ID to easily figure out the URL for the application's entry in the HS, aside
      from looking better.
      
      In Yarn mode, this is used to generate a direct link from the RM application list to the
      Spark history server entry (thus providing a fix for SPARK-2150).
      
      Note that this assumes the different cluster managers generate app IDs that are
      sufficiently different from each other that clashes will not occur.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      This patch had conflicts when merged, resolved by
      Committer: Andrew Or <andrewor14@gmail.com>
      
      Closes #1218 from vanzin/yarn-hs-link-2 and squashes the following commits:
      
      2d19f3c [Marcelo Vanzin] Review feedback.
      6706d3a [Marcelo Vanzin] Implement applicationId() in base classes.
      56fe42e [Marcelo Vanzin] Fix cluster mode history address, plus a cleanup.
      44112a8 [Marcelo Vanzin] Merge branch 'master' into yarn-hs-link-2
      8278316 [Marcelo Vanzin] Merge branch 'master' into yarn-hs-link-2
      a86bbcf [Marcelo Vanzin] Merge branch 'master' into yarn-hs-link-2
      a0056e6 [Marcelo Vanzin] Unbreak test.
      4b10cfd [Marcelo Vanzin] Merge branch 'master' into yarn-hs-link-2
      cb0cab2 [Marcelo Vanzin] Merge branch 'master' into yarn-hs-link-2
      25f2826 [Marcelo Vanzin] Add MIMA excludes.
      f0ba90f [Marcelo Vanzin] Use BufferedIterator.
      c90a08d [Marcelo Vanzin] Remove unused code.
      3f8ec66 [Marcelo Vanzin] Review feedback.
      21aa71b [Marcelo Vanzin] Fix JSON test.
      b022bae [Marcelo Vanzin] Undo SparkContext cleanup.
      c6d7478 [Marcelo Vanzin] Merge branch 'master' into yarn-hs-link-2
      4e3483f [Marcelo Vanzin] Fix test.
      57517b8 [Marcelo Vanzin] Review feedback. Mostly, more consistent use of Scala's Option.
      311e49d [Marcelo Vanzin] Merge branch 'master' into yarn-hs-link-2
      d35d86f [Marcelo Vanzin] Fix yarn backend after rebase.
      36dc362 [Marcelo Vanzin] Don't use Iterator::takeWhile().
      0afd696 [Marcelo Vanzin] Wait until master responds before returning from start().
      abc4697 [Marcelo Vanzin] Make FsHistoryProvider keep a map of applications by id.
      26b266e [Marcelo Vanzin] Use Mesos framework ID as Spark application ID.
      b3f3664 [Marcelo Vanzin] [yarn] Make the RM link point to the app directly in the HS.
      2fb7de4 [Marcelo Vanzin] Expose the application ID in the ApplicationStart event.
      ed10348 [Marcelo Vanzin] Expose application id to spark context.
      f2b5b619
    • Marcelo Vanzin's avatar
      [SPARK-2845] Add timestamps to block manager events. · ccc69e26
      Marcelo Vanzin authored
      These are not used by the UI but are useful when analysing the
      logs from a spark job.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #654 from vanzin/bm-event-tstamp and squashes the following commits:
      
      d5d6e66 [Marcelo Vanzin] Fix tests.
      ec06218 [Marcelo Vanzin] Review feedback.
      f134dbc [Marcelo Vanzin] Merge branch 'master' into bm-event-tstamp
      b495b7c [Marcelo Vanzin] Merge branch 'master' into bm-event-tstamp
      7d2fe9e [Marcelo Vanzin] Review feedback.
      d6f381c [Marcelo Vanzin] Update tests added after patch was created.
      45e3bf8 [Marcelo Vanzin] Fix unit test after merge.
      b37a10f [Marcelo Vanzin] Use === in test assertions.
      ef72824 [Marcelo Vanzin] Handle backwards compatibility with 1.0.0.
      aca1151 [Marcelo Vanzin] Fix unit test to check new fields.
      efdda8e [Marcelo Vanzin] Add timestamps to block manager events.
      ccc69e26
    • RJ Nowling's avatar
      [SPARK-3263][GraphX] Fix changes made to GraphGenerator.logNormalGraph in PR #720 · e5d37680
      RJ Nowling authored
      PR #720 made multiple changes to GraphGenerator.logNormalGraph including:
      
      * Replacing the call to functions for generating random vertices and edges with in-line implementations with different equations. Based on reading the Pregel paper, I believe the in-line functions are incorrect.
      * Hard-coding of RNG seeds so that method now generates the same graph for a given number of vertices, edges, mu, and sigma -- user is not able to override seed or specify that seed should be randomly generated.
      * Backwards-incompatible change to logNormalGraph signature with introduction of new required parameter.
      * Failed to update scala docs and programming guide for API changes
      * Added a Synthetic Benchmark in the examples.
      
      This PR:
      * Removes the in-line calls and calls original vertex / edge generation functions again
      * Adds an optional seed parameter for deterministic behavior (when desired)
      * Keeps the number of partitions parameter that was added.
      * Keeps compatibility with the synthetic benchmark example
      * Maintains backwards-compatible API
      
      Author: RJ Nowling <rnowling@gmail.com>
      Author: Ankur Dave <ankurdave@gmail.com>
      
      Closes #2168 from rnowling/graphgenrand and squashes the following commits:
      
      f1cd79f [Ankur Dave] Style fixes
      e11918e [RJ Nowling] Fix bad comparisons in unit tests
      785ac70 [RJ Nowling] Fix style error
      c70868d [RJ Nowling] Fix logNormalGraph scala doc for seed
      41fd1f8 [RJ Nowling] Fix logNormalGraph scala doc for seed
      799f002 [RJ Nowling] Added test for different seeds for sampleLogNormal
      43949ad [RJ Nowling] Added test for different seeds for generateRandomEdges
      2faf75f [RJ Nowling] Added unit test for logNormalGraph
      82f22397 [RJ Nowling] Add unit test for sampleLogNormal
      b99cba9 [RJ Nowling] Make sampleLogNormal private to Spark (vs private) for unit testing
      6803da1 [RJ Nowling] Add GraphGeneratorsSuite with test for generateRandomEdges
      1c8fc44 [RJ Nowling] Connected components part of SynthBenchmark was failing to call count on RDD before printing
      dfbb6dd [RJ Nowling] Fix parameter name in SynthBenchmark docs
      b5eeb80 [RJ Nowling] Add optional seed parameter to SynthBenchmark and set default to randomly generate a seed
      1ff8d30 [RJ Nowling] Fix bug in generateRandomEdges where numVertices instead of numEdges was used to control number of edges to generate
      98bb73c [RJ Nowling] Add documentation for logNormalGraph parameters
      d40141a [RJ Nowling] Fix style error
      684804d [RJ Nowling] revert PR #720 which introduce errors in logNormalGraph and messed up seeding of RNGs.  Add user-defined optional seed for deterministic behavior
      c183136 [RJ Nowling] Fix to deterministic GraphGenerators.logNormalGraph that allows generating graphs randomly using optional seed.
      015010c [RJ Nowling] Fixed GraphGenerator logNormalGraph API to make backward-incompatible change in commit 894ecde0
      e5d37680
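The optional seed described in the commit above can be illustrated with a small Python sketch. This is not the GraphX code: `sample_log_normal`, its parameters, and the rejection loop are assumptions chosen only to mirror the stated behaviour (repeatable when a seed is supplied, random otherwise).

```python
import math
import random


def sample_log_normal(mu, sigma, max_val, seed=None):
    """Draw one log-normal sample below max_val (hypothetical helper)."""
    # random.Random(None) seeds itself from system entropy, so omitting the
    # seed gives a fresh value per call; passing one makes the draw repeatable.
    rng = random.Random(seed)
    while True:
        x = math.exp(mu + sigma * rng.gauss(0.0, 1.0))
        if x < max_val:
            return x


# Two calls with the same seed return the same sample.
a = sample_log_normal(4.0, 1.3, 1000, seed=42)
b = sample_log_normal(4.0, 1.3, 1000, seed=42)
```

Keeping the seed optional with a default of `None` is what lets existing callers stay unchanged while tests pin a seed for deterministic results.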
    • Davies Liu's avatar
      [SPARK-3309] [PySpark] Put all public API in __all__ · 6481d274
      Davies Liu authored
      Put all public API in __all__, and also in pyspark.__init__.py, so that `pydoc pyspark` shows the documentation for the whole public API. Other programs (such as Sphinx or Epydoc) can also use it to generate documentation for public APIs only.
      
      Author: Davies Liu <davies.liu@gmail.com>
      
      Closes #2205 from davies/public and squashes the following commits:
      
      c6c5567 [Davies Liu] fix message
      f7b35be [Davies Liu] put SchemaRDD, Row in pyspark.sql module
      7e3016a [Davies Liu] add __all__ in mllib
      6281b48 [Davies Liu] fix doc for SchemaRDD
      6caab21 [Davies Liu] add public interfaces into pyspark.__init__.py
      6481d274
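As a rough illustration of what listing names in `__all__` does, the sketch below builds a throwaway module in memory and star-imports it. The module name `demo_pkg` and both functions are hypothetical and have nothing to do with PySpark's actual module layout.

```python
import sys
import types

# Build a throwaway module in memory (the name "demo_pkg" is hypothetical).
demo = types.ModuleType("demo_pkg")
exec(
    "__all__ = ['public_fn']\n"
    "def public_fn():\n"
    "    return 'public'\n"
    "def helper():\n"
    "    return 'internal'\n",
    demo.__dict__,
)
sys.modules["demo_pkg"] = demo

# A star-import only picks up the names listed in __all__.
namespace = {}
exec("from demo_pkg import *", namespace)
exported = sorted(name for name in namespace if not name.startswith("__"))
print(exported)  # ['public_fn']
```

Without `__all__`, `helper` would also be exported because its name does not start with an underscore; documentation tools that honour `__all__` apply the same filter.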
    • Marcelo Vanzin's avatar
      [SPARK-3187] [yarn] Cleanup allocator code. · 6a72a369
      Marcelo Vanzin authored
      Move all shared logic to the base YarnAllocator class, and leave
      the version-specific logic in the version-specific module.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #2169 from vanzin/SPARK-3187 and squashes the following commits:
      
      46c2826 [Marcelo Vanzin] Hide the privates.
      4dc9c83 [Marcelo Vanzin] Actually release containers.
      8b1a077 [Marcelo Vanzin] Changes to the Yarn alpha allocator.
      f3f5f1d [Marcelo Vanzin] [SPARK-3187] [yarn] Cleanup allocator code.
      6a72a369
  2. Sep 02, 2014
    • Patrick Wendell's avatar
      SPARK-3358: [EC2] Switch back to HVM instances for m3.X. · c64cc435
      Patrick Wendell authored
      During regression tests of Spark 1.1 we discovered perf issues with
      PVM instances when running PySpark. This reverts a change added in #1156
      which changed the default type for m3 instances to PVM.
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #2244 from pwendell/ec2-hvm and squashes the following commits:
      
      1342d7e [Patrick Wendell] SPARK-3358: [EC2] Switch back to HVM instances for m3.X.
      c64cc435
    • Liang-Chi Hsieh's avatar
      [SPARK-3300][SQL] No need to call clear() and shorten build() · 24ab3840
      Liang-Chi Hsieh authored
      The function `ensureFreeSpace` in object `ColumnBuilder` clears the old buffer before copying its content to the new buffer. This PR fixes that.
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #2195 from viirya/fix_buffer_clear and squashes the following commits:
      
      792f009 [Liang-Chi Hsieh] no need to call clear(). use flip() instead of calling limit(), position() and rewind().
      df2169f [Liang-Chi Hsieh] should clean old buffer after copying its content.
      24ab3840
    • Cheng Lian's avatar
      [SQL] Renamed ColumnStat to ColumnMetrics to avoid confusion between ColumnStats · 19d3e1e8
      Cheng Lian authored
      Class names of these two are just too similar.
      
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      
      Closes #2189 from liancheng/column-metrics and squashes the following commits:
      
      8bb3b21 [Cheng Lian] Renamed ColumnStat to ColumnMetrics to avoid confusion between ColumnStats
      19d3e1e8
    • Takuya UESHIN's avatar
      [SPARK-3341][SQL] The dataType of Sqrt expression should be DoubleType. · 0cd91f66
      Takuya UESHIN authored
      Author: Takuya UESHIN <ueshin@happy-camper.st>
      
      Closes #2233 from ueshin/issues/SPARK-3341 and squashes the following commits:
      
      e497320 [Takuya UESHIN] Fix data type of Sqrt expression.
      0cd91f66
    • luluorta's avatar
      [SPARK-2823][GraphX]fix GraphX EdgeRDD zipPartitions · 9b225ac3
      luluorta authored
      If users set "spark.default.parallelism" to a value that differs from the EdgeRDD partition number, GraphX jobs throw:
      java.lang.IllegalArgumentException: Can't zip RDDs with unequal numbers of partitions
      
      Author: luluorta <luluorta@gmail.com>
      
      Closes #1763 from luluorta/fix-graph-zip and squashes the following commits:
      
      8338961 [luluorta] fix GraphX EdgeRDD zipPartitions
      9b225ac3
    • Tathagata Das's avatar
      [SPARK-1981][Streaming][Hotfix] Fixed docs related to kinesis · e9bb12be
      Tathagata Das authored
      - Include kinesis in the unidocs
      - Hide non-public classes from docs
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #2239 from tdas/kinesis-doc-fix and squashes the following commits:
      
      156e20c [Tathagata Das] More fixes, based on PR comments.
      e9a6c01 [Tathagata Das] Fixed docs related to kinesis
      e9bb12be
    • Larry Xiao's avatar
      [SPARK-2981][GraphX] EdgePartition1D Int overflow · aa7de128
      Larry Xiao authored
      Minor fix. Details are here: https://issues.apache.org/jira/browse/SPARK-2981
      
      Author: Larry Xiao <xiaodi@sjtu.edu.cn>
      
      Closes #1902 from larryxiao/2981 and squashes the following commits:
      
      88059a2 [Larry Xiao] [SPARK-2981][GraphX] EdgePartition1D Int overflow
      aa7de128
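The commit message above does not spell out the bug, so the following Python sketch only illustrates the general class of problem its title names: narrowing a 64-bit product to a 32-bit Int before taking the modulus can yield a negative partition index. The constant, the function names, and both expressions are assumptions for illustration, not a quotation of the GraphX source.

```python
MIXING_PRIME = 1125899906842597  # assumed constant, equal to 2**50 - 27


def to_signed(value, bits):
    """Wrap an integer to a signed two's-complement value of the given width."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >= 1 << (bits - 1) else value


def jvm_rem(a, n):
    """Remainder that takes the sign of the dividend, like % on the JVM."""
    r = abs(a) % n
    return -r if a < 0 else r


def partition_narrow_first(src, num_parts):
    # Narrowing the 64-bit product to 32 bits before the modulus can go negative.
    product = to_signed(abs(src) * MIXING_PRIME, 64)
    return jvm_rem(to_signed(product, 32), num_parts)


def partition_mod_first(src, num_parts):
    # Taking the modulus on the 64-bit value first keeps the index non-negative.
    product = to_signed(src * MIXING_PRIME, 64)
    return jvm_rem(abs(product), num_parts)


print(partition_narrow_first(1, 256))  # -27
print(partition_mod_first(1, 256))     # 229
```

The low 32 bits of `2**50 - 27` read as the signed value -27, which is why the narrow-first variant produces an invalid index even for a vertex ID of 1.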
    • uncleGen's avatar
      [SPARK-3123][GraphX]: override the "setName" function to set EdgeRDD's name... · 7c9bbf17
      uncleGen authored
      [SPARK-3123][GraphX]: override the "setName" function to set EdgeRDD's name manually just as VertexRDD does.
      
      Author: uncleGen <hustyugm@gmail.com>
      
      Closes #2033 from uncleGen/master_origin and squashes the following commits:
      
      801994b [uncleGen] Update EdgeRDD.scala
      7c9bbf17
    • Larry Xiao's avatar
      [SPARK-1986][GraphX]move lib.Analytics to org.apache.spark.examples · 7c92b49d
      Larry Xiao authored
      to support ~/spark/bin/run-example GraphXAnalytics triangles /soc-LiveJournal1.txt --numEPart=256
      
      Author: Larry Xiao <xiaodi@sjtu.edu.cn>
      
      Closes #1766 from larryxiao/1986 and squashes the following commits:
      
      bb77cd9 [Larry Xiao] [SPARK-1986][GraphX]move lib.Analytics to org.apache.spark.examples
      7c92b49d
    • Prudhvi Krishna's avatar
      SPARK-3328 fixed make-distribution script --with-tachyon option. · 644e3152
      Prudhvi Krishna authored
      Directory path for dependencies jar and resources in Tachyon 0.5.0 has been changed.
      
      Author: Prudhvi Krishna <prudhvi953@gmail.com>
      
      Closes #2228 from prudhvije/SPARK-3328/make-dist-fix and squashes the following commits:
      
      d1d2c22 [Prudhvi Krishna] SPARK-3328 fixed make-distribution script --with-tachyon option.
      644e3152
    • Davies Liu's avatar
      [SPARK-2871] [PySpark] add countApproxDistinct() API · e2c901b4
      Davies Liu authored
      RDD.countApproxDistinct(relativeSD=0.05):
      
              :: Experimental ::
              Return approximate number of distinct elements in the RDD.
      
              The algorithm used is based on streamlib's implementation of
              "HyperLogLog in Practice: Algorithmic Engineering of a State
              of The Art Cardinality Estimation Algorithm", available
              <a href="http://dx.doi.org/10.1145/2452376.2452456">here</a>.
      
              This supports all the types of objects that are supported by
              Pyrolite, which covers nearly all builtin types.
      
              param relativeSD Relative accuracy. Smaller values create
                                 counters that require more space.
                                 It must be greater than 0.000017.
      
              >>> n = sc.parallelize(range(1000)).map(str).countApproxDistinct()
              >>> 950 < n < 1050
              True
              >>> n = sc.parallelize([i % 20 for i in range(1000)]).countApproxDistinct()
              >>> 18 < n < 22
              True
      
      Author: Davies Liu <davies.liu@gmail.com>
      
      Closes #2142 from davies/countApproxDistinct and squashes the following commits:
      
      e20da47 [Davies Liu] remove the correction in Python
      c38c4e4 [Davies Liu] fix doc tests
      2ab157c [Davies Liu] fix doc tests
      9d2565f [Davies Liu] add comments and link for hash collision correction
      d306492 [Davies Liu] change range of hash of tuple to [0, maxint]
      ded624f [Davies Liu] calculate hash in Python
      4cba98f [Davies Liu] add more tests
      a85a8c6 [Davies Liu] Merge branch 'master' into countApproxDistinct
      e97e342 [Davies Liu] add countApproxDistinct()
      e2c901b4
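To give a feel for how a HyperLogLog-style estimate such as `countApproxDistinct()` works, here is a minimal pure-Python sketch. It is plain HyperLogLog with the small-range (linear counting) correction, not the HyperLogLog++ variant from the cited paper and not the streamlib or PySpark implementation; the function name `approx_distinct`, the SHA-1 based hash, and the default precision `p=12` are all assumptions.

```python
import hashlib
import math


def approx_distinct(items, p=12):
    """Estimate the number of distinct items using 2**p registers."""
    m = 1 << p
    registers = [0] * m
    for item in items:
        digest = hashlib.sha1(repr(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")       # 64-bit hash value
        idx = h >> (64 - p)                         # first p bits pick a register
        rest = h & ((1 << (64 - p)) - 1)            # remaining 64 - p bits
        rank = (64 - p) - rest.bit_length() + 1     # position of the leftmost 1 bit
        registers[idx] = max(registers[idx], rank)
    alpha = 0.7213 / (1 + 1.079 / m)                # bias constant, valid for m >= 128
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:
        # Small-range correction: fall back to linear counting.
        return m * math.log(m / zeros)
    return raw


n_large = approx_distinct(str(i) for i in range(1000))
n_small = approx_distinct(i % 20 for i in range(1000))
```

More registers (a larger `p`) lower the relative error at the cost of memory, which is the same trade-off the `relativeSD` parameter exposes.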
    • Sandy Ryza's avatar
      SPARK-3052. Misleading and spurious FileSystem closed errors whenever a ... · 81b9d5b6
      Sandy Ryza authored
      ...job fails while reading from Hadoop
      
      Author: Sandy Ryza <sandy@cloudera.com>
      
      Closes #1956 from sryza/sandy-spark-3052 and squashes the following commits:
      
      815813a [Sandy Ryza] SPARK-3052. Misleading and spurious FileSystem closed errors whenever a job fails while reading from Hadoop
      81b9d5b6
    • Marcelo Vanzin's avatar
      [SPARK-3347] [yarn] Fix yarn-alpha compilation. · 066f31a6
      Marcelo Vanzin authored
      Missing import. Oops.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #2236 from vanzin/SPARK-3347 and squashes the following commits:
      
      594fc39 [Marcelo Vanzin] [SPARK-3347] [yarn] Fix yarn-alpha compilation.
      066f31a6
    • Andrew Or's avatar
      [SPARK-1919] Fix Windows spark-shell --jars · 8f1f9aaf
      Andrew Or authored
      We were trying to add `file:/C:/path/to/my.jar` to the class path. We should add `C:/path/to/my.jar` instead. Tested on Windows 8.1.
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #2211 from andrewor14/windows-shell-jars and squashes the following commits:
      
      262c6a2 [Andrew Or] Oops... Add the new code to the correct place
      0d5a0c1 [Andrew Or] Format jar path only for adding to shell classpath
      42bd626 [Andrew Or] Remove unnecessary code
      0049f1b [Andrew Or] Remove embarrassing log messages
      b1755a0 [Andrew Or] Format jar paths properly before adding them to the classpath
      8f1f9aaf
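The path rewrite described in the commit above (`file:/C:/path/to/my.jar` becoming `C:/path/to/my.jar`) can be sketched in Python as follows. The helper name and the regular expression are assumptions; the real change lives in Spark's Scala code and may handle more cases.

```python
import re


def format_windows_path(uri):
    """Strip a leading file: scheme in front of a Windows drive letter."""
    match = re.match(r"^file:/*([A-Za-z]:.*)$", uri)
    # Anything that is not a file: URI with a drive letter is returned unchanged.
    return match.group(1) if match else uri


print(format_windows_path("file:/C:/path/to/my.jar"))  # C:/path/to/my.jar
```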
    • Josh Rosen's avatar
      [SPARK-3061] Fix Maven build under Windows · 378b2315
      Josh Rosen authored
      The Maven build was failing on Windows because it tried to call the unix `unzip` utility to extract the Py4J files into core's build directory.  I've fixed this issue by using the `maven-antrun-plugin` to perform the unzipping.
      
      I also fixed an issue that prevented tests from running under Windows:
      
      In the Maven ScalaTest plugin, the filename listed in <filereports> is placed under the <reportsDirectory>; the current code places it in a subdirectory of reportsDirectory, e.g.
      
      ```
      ${project.build.directory}/surefire-reports/${project.build.directory}/SparkTestSuite.txt
      ```
      
      This caused problems under Windows because it would try to create a subdirectory named "c:\\".
      
      Note that the tests still fail under Windows (for other reasons); this PR just allows them to run and fail rather than crash when trying to create the test reports directory.
      
      Author: Josh Rosen <joshrosen@apache.org>
      Author: Josh Rosen <rosenville@gmail.com>
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #2165 from JoshRosen/windows-support and squashes the following commits:
      
      651d210 [Josh Rosen] Unzip to python/build instead of core/build
      fbf3e61 [Josh Rosen] 4 spaces -> 2 spaces
      e347668 [Josh Rosen] Fix Maven scalatest filereports path:
      4994af1 [Josh Rosen] [SPARK-3061] Use maven-antrun-plugin to unzip Py4J.
      378b2315
    • Sean Owen's avatar
      SPARK-3331 [BUILD] PEP8 tests fail because they check unzipped py4j code · 32ec0a8c
      Sean Owen authored
      PEP8 tests run on files under "./python", but unzipped py4j code is found at "./python/build/py4j". Py4J code fails style checks and can fail ./dev/run-tests if this code is present locally.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #2222 from srowen/SPARK-3331 and squashes the following commits:
      
      34711ec [Sean Owen] Restrict lint check to pyspark/, since the local directory can contain unzipped py4j code in build/py4j
      32ec0a8c
    • Reza Zadeh's avatar
      [MLlib] Squash bug in IndexedRowMatrix · 0f16b23c
      Reza Zadeh authored
      Kill this bug fast before it does damage.
      
      Author: Reza Zadeh <rizlar@gmail.com>
      
      Closes #2224 from rezazadeh/indexrmbug and squashes the following commits:
      
      53386d6 [Reza Zadeh] Squash bug in IndexedRowMatrix
      0f16b23c
    • lirui's avatar
      SPARK-2636: Expose job ID in JobWaiter API · fbf2678c
      lirui authored
      This PR adds the async actions to the Java API. Users can call these async actions to get the FutureAction and use JobWaiter (for SimpleFutureAction) to retrieve the job ID.
      
      Author: lirui <rui.li@intel.com>
      
      Closes #2176 from lirui-intel/SPARK-2636 and squashes the following commits:
      
      ccaafb7 [lirui] SPARK-2636: fix java doc
      5536d55 [lirui] SPARK-2636: mark the async API as experimental
      e2e01d5 [lirui] SPARK-2636: add mima exclude
      0ca320d [lirui] SPARK-2636: fix method name & javadoc
      3fa39f7 [lirui] SPARK-2636: refine the patch
      af4f5d9 [lirui] SPARK-2636: remove unused imports
      843276c [lirui] SPARK-2636: only keep foreachAsync in the java API
      fbf5744 [lirui] SPARK-2636: add more async actions for java api
      1b25abc [lirui] SPARK-2636: expose some fields in JobWaiter
      d09f732 [lirui] SPARK-2636: fix build
      eb1ee79 [lirui] SPARK-2636: change some parameters in SimpleFutureAction to member field
      6e2b87b [lirui] SPARK-2636: add java API for async actions
      fbf2678c
    • Daniel Darabos's avatar
      [SPARK-3342] Add SSDs to block device mapping · 44d3a6a7
      Daniel Darabos authored
      On `m3.2xlarge` instances the 2x80GB SSDs are inaccessible if not added to the block device mapping when the instance is created. They work when added with this patch. I have not tested this with other instance types, and I do not know much about this script and EC2 deployment in general. Maybe this code needs to depend on the instance type.
      
      The requirement for this mapping is described in the AWS docs at:
      http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/InstanceStorage.html#InstanceStore_UsageScenarios
      
      "For M3 instances, you must specify instance store volumes in the block
      device mapping for the instance. When you launch an M3 instance, we
      ignore any instance store volumes specified in the block device mapping
      for the AMI."
      
      Author: Daniel Darabos <darabos.daniel@gmail.com>
      
      Closes #2081 from darabos/patch-1 and squashes the following commits:
      
      1ceb2c8 [Daniel Darabos] Use %d string interpolation instead of {}.
      a1854d7 [Daniel Darabos] Only specify ephemeral device mapping for M3.
      e0d9e37 [Daniel Darabos] Create ephemeral device mapping based on get_num_disks().
      6b116a6 [Daniel Darabos] Add SSDs to block device mapping
      44d3a6a7
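The squashed commits above mention building the ephemeral device mapping from the number of disks, using `%d` interpolation. A minimal sketch of that idea, using plain dictionaries instead of the EC2 client library, might look like this. The function name and the `/dev/sdb`, `/dev/sdc` device naming are assumptions.

```python
import string


def ephemeral_block_device_map(num_disks):
    """Map device names to instance-store volumes (names are assumptions)."""
    mapping = {}
    for i in range(num_disks):
        # /dev/sdb, /dev/sdc, ...; /dev/sda is left for the root volume.
        device = "/dev/sd" + string.ascii_lowercase[i + 1]
        mapping[device] = "ephemeral%d" % i
    return mapping


# An instance type with two SSDs, such as the m3.2xlarge from the commit message.
print(ephemeral_block_device_map(2))
```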
  3. Sep 01, 2014
    • Reynold Xin's avatar
      [SPARK-3135] Avoid extra mem copy in TorrentBroadcast via ByteArrayChunkOutputStream · db160676
      Reynold Xin authored
      This also enables supporting broadcast variables larger than 2G.
      
      Author: Reynold Xin <rxin@apache.org>
      
      Closes #2054 from rxin/ByteArrayChunkOutputStream and squashes the following commits:
      
      618d9c8 [Reynold Xin] Code review.
      93f5a51 [Reynold Xin] Added comments.
      ee88e73 [Reynold Xin] to -> until
      bbd1cb1 [Reynold Xin] Renamed a variable.
      36f4d01 [Reynold Xin] Sort imports.
      8f1a8eb [Reynold Xin] [SPARK-3135] Created ByteArrayChunkOutputStream and used it to avoid memory copy in TorrentBroadcast.
      db160676
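The idea behind a chunked output stream can be sketched in a few lines of Python: bytes are written into fixed-size chunks as they arrive, so no single contiguous array ever has to hold the whole payload. The class below is a hypothetical illustration, not a port of Spark's `ByteArrayChunkOutputStream`.

```python
class ByteArrayChunkWriter:
    """Collect written bytes into chunks of at most chunk_size bytes."""

    def __init__(self, chunk_size):
        self.chunk_size = chunk_size
        self.chunks = []

    def write(self, data):
        view = memoryview(data)
        while len(view) > 0:
            # Start a new chunk when there is none or the last one is full.
            if not self.chunks or len(self.chunks[-1]) == self.chunk_size:
                self.chunks.append(bytearray())
            room = self.chunk_size - len(self.chunks[-1])
            self.chunks[-1] += view[:room]
            view = view[room:]

    def to_arrays(self):
        return [bytes(chunk) for chunk in self.chunks]


writer = ByteArrayChunkWriter(chunk_size=4)
writer.write(b"0123456789")
print(writer.to_arrays())  # [b'0123', b'4567', b'89']
```

Because each chunk is bounded, the total size is no longer tied to the maximum length of a single array, which on the JVM is limited to roughly 2^31 elements.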
    • Patrick Wendell's avatar
      MAINTENANCE: Automated closing of pull requests. · 1f98add9
      Patrick Wendell authored
      This commit exists to close the following pull requests on Github:
      
      Closes #1696 (close requested by 'pwendell')
      Closes #1384 (close requested by 'pwendell')
      Closes #845 (close requested by 'pwendell')
      Closes #81 (close requested by 'pwendell')
      Closes #1528 (close requested by 'pwendell')
      Closes #1018 (close requested by 'pwendell')
      1f98add9
  4. Aug 31, 2014
    • scwf's avatar
      [SPARK-3010] fix redundant conditional · 725715cb
      scwf authored
      https://issues.apache.org/jira/browse/SPARK-3010
      
      this pr is to fix redundant conditional in spark, such as
      1.
      private[spark] def codegenEnabled: Boolean =
      if (getConf(CODEGEN_ENABLED, "false") == "true") true else false
      2.
      x => if (x == 2) true else false
      ...
      
      Author: scwf <wangfei1@huawei.com>
      Author: wangfei <wangfei_hello@126.com>
      
      Closes #1992 from scwf/condition and squashes the following commits:
      
      b2a044a [scwf] merge SecurityManager
      e16239c [scwf] fix conflict
      6811401 [scwf] fix merge conflict
      0824df4 [scwf] Merge branch 'master' of https://github.com/apache/spark into patch-4
      e274515 [scwf] fix redundant conditions
      d032bf9 [wangfei] [SQL]Excess judgment
      725715cb
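The redundant-conditional pattern from the commit above is not specific to Scala. Transliterated to Python as an illustration (the function names and the `codegen.enabled` key are made up for the example), the before and after look like this:

```python
def is_two_verbose(x):
    # Redundant: the comparison already evaluates to a bool.
    return True if x == 2 else False


def is_two(x):
    return x == 2


def codegen_enabled(conf):
    # Same simplification applied to a string-valued setting.
    return conf.get("codegen.enabled", "false") == "true"
```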
  5. Aug 30, 2014
    • Nicholas Chammas's avatar
      [Spark QA] only check code files for new classes · c567a68a
      Nicholas Chammas authored
      Look only at code files (`.py`, `.java`, and `.scala`) for new classes.
      
      Should get rid of false alarms like [the one reported here](https://github.com/apache/spark/pull/2014#issuecomment-52912040).
      
      Author: Nicholas Chammas <nicholas.chammas@gmail.com>
      
      Closes #2184 from nchammas/jenkins-ignore-noncode and squashes the following commits:
      
      33786ac [Nicholas Chammas] break up long line
      3f91a14 [Nicholas Chammas] rename array of source files
      8b82a26 [Nicholas Chammas] [Spark QA] only check code files for new classes
      c567a68a
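Restricting a check to code files, as the commit above describes, amounts to filtering changed paths by extension. The actual change is in a QA script; the Python below is only a sketch with a hypothetical function name and sample paths.

```python
CODE_EXTENSIONS = (".py", ".java", ".scala")


def code_files(changed_paths):
    """Keep only the paths that end in one of the code extensions."""
    return [path for path in changed_paths if path.endswith(CODE_EXTENSIONS)]


changed = ["docs/index.md", "core/src/Foo.scala", "python/pyspark/rdd.py", "pom.xml"]
print(code_files(changed))  # ['core/src/Foo.scala', 'python/pyspark/rdd.py']
```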
    • Patrick Wendell's avatar
      MAINTENANCE: Automated closing of pull requests. · 9b8c2287
      Patrick Wendell authored
      This commit exists to close the following pull requests on Github:
      
      Closes #1922 (close requested by 'JoshRosen')
      Closes #1356 (close requested by 'pwendell')
      Closes #1698 (close requested by 'mengxr')
      Closes #254 (close requested by 'mateiz')
      Closes #2135 (close requested by 'andrewor14')
      9b8c2287
    • Holden Karau's avatar
      SPARK-3318: Documentation update in addFile on how to use SparkFiles.get · ba78383b
      Holden Karau authored
      Rather than specifying the path to SparkFiles we need to use the filename.
      
      Author: Holden Karau <holden@pigscanfly.ca>
      
      Closes #2210 from holdenk/SPARK-3318-documentation-for-addfiles-should-say-to-use-file-not-path and squashes the following commits:
      
      a25d27a [Holden Karau] Update the JavaSparkContext addFile method to be clear about using fileName with SparkFiles as well
      0ebcb05 [Holden Karau] Documentation update in addFile on how to use SparkFiles.get to specify filename rather than path
      ba78383b
    • Marcelo Vanzin's avatar
      [SPARK-2889] Create Hadoop config objects consistently. · b6cf1348
      Marcelo Vanzin authored
      Different places in the code were instantiating Configuration / YarnConfiguration objects in different ways. This could lead to confusion for people who actually expected "spark.hadoop.*" options to end up in the configs used by Spark code, since that would only happen for the SparkContext's config.
      
      This change modifies most places to use SparkHadoopUtil to initialize configs, and make that method do the translation that previously was only done inside SparkContext.
      
      The places that were not changed fall in one of the following categories:
      - Test code where this doesn't really matter
      - Places deep in the code where plumbing SparkConf would be too difficult for very little gain
      - Default values for arguments - since the caller can provide their own config in that case
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #1843 from vanzin/SPARK-2889 and squashes the following commits:
      
      52daf35 [Marcelo Vanzin] Merge branch 'master' into SPARK-2889
      f179013 [Marcelo Vanzin] Merge branch 'master' into SPARK-2889
      51e71cf [Marcelo Vanzin] Add test to ensure that overriding Yarn configs works.
      53f9506 [Marcelo Vanzin] Add DeveloperApi annotation.
      3d345cb [Marcelo Vanzin] Restore old method for backwards compat.
      fc45067 [Marcelo Vanzin] Merge branch 'master' into SPARK-2889
      0ac3fdf [Marcelo Vanzin] Merge branch 'master' into SPARK-2889
      3f26760 [Marcelo Vanzin] Compilation fix.
      f16cadd [Marcelo Vanzin] Initialize config in SparkHadoopUtil.
      b8ab173 [Marcelo Vanzin] Update Utils API to take a Configuration argument.
      1e7003f [Marcelo Vanzin] Replace explicit Configuration instantiation with SparkHadoopUtil.
      b6cf1348
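The translation of "spark.hadoop.*" options mentioned in the commit above can be pictured as copying every such key into the Hadoop configuration with the prefix removed. The sketch below uses plain dictionaries and a hypothetical function name; it is an assumption about the shape of the logic, not the `SparkHadoopUtil` code.

```python
def hadoop_conf_from_spark(spark_conf):
    """Return the spark.hadoop.* entries with the prefix stripped."""
    prefix = "spark.hadoop."
    return {
        key[len(prefix):]: value
        for key, value in spark_conf.items()
        if key.startswith(prefix)
    }


conf = {
    "spark.app.name": "demo",
    "spark.hadoop.fs.defaultFS": "hdfs://namenode:8020",
}
print(hadoop_conf_from_spark(conf))  # {'fs.defaultFS': 'hdfs://namenode:8020'}
```

Doing this in one shared helper is what makes every caller see the same options, instead of only the code paths that go through SparkContext.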
    • Reynold Xin's avatar
      Manually close old pull requests · d90434c0
      Reynold Xin authored
      Closes #1824
      d90434c0
    • Raymond Liu's avatar
      [SPARK-2288] Hide ShuffleBlockManager behind ShuffleManager · acea9280
      Raymond Liu authored
      By hiding ShuffleBlockManager behind ShuffleManager, we decouple the management of the shuffle data's block mapping from DiskBlockManager. This gives a clearer interface and makes it easier for other shuffle managers to implement their own block management logic. The JIRA ticket has more details.
      
      Author: Raymond Liu <raymond.liu@intel.com>
      
      Closes #1241 from colorant/shuffle and squashes the following commits:
      
      0e01ae3 [Raymond Liu] Move ShuffleBlockmanager behind shuffleManager
      acea9280
    • Kousuke Saruta's avatar
      [SPARK-3305] Remove unused import from UI classes. · 7e662af3
      Kousuke Saruta authored
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #2200 from sarutak/SPARK-3305 and squashes the following commits:
      
      3cbd6ee [Kousuke Saruta] Removed unused import from classes related to UI
      7e662af3
    • Patrick Wendell's avatar
      a004a8d8
  6. Aug 29, 2014
    • Cheng Lian's avatar
      [SPARK-3320][SQL] Made batched in-memory column buffer building work for... · 32b18dd5
      Cheng Lian authored
      [SPARK-3320][SQL] Made batched in-memory column buffer building work for SchemaRDDs with empty partitions
      
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      
      Closes #2213 from liancheng/spark-3320 and squashes the following commits:
      
      45a0139 [Cheng Lian] Fixed typo in InMemoryColumnarQuerySuite
      f67067d [Cheng Lian] Fixed SPARK-3320
      32b18dd5
    • wangfei's avatar
      [SPARK-3296][mllib] spark-example should be run-example in head notation of... · 13901764
      wangfei authored
      [SPARK-3296][mllib] spark-example should be run-example in head notation of DenseKMeans and SparseNaiveBayes
      
      `./bin/spark-example`  should be `./bin/run-example` in DenseKMeans and SparseNaiveBayes
      
      Author: wangfei <wangfei_hello@126.com>
      
      Closes #2193 from scwf/run-example and squashes the following commits:
      
      207eb3a [wangfei] spark-example should be run-example
      27a8999 [wangfei] ./bin/spark-example should be ./bin/run-example
      13901764
    • Zdenek Farana's avatar
      [SPARK-3173][SQL] Timestamp support in the parser · 98ddbe6c
      Zdenek Farana authored
      If a table has a TIMESTAMP column, that column can't be used properly in a WHERE clause because it is not evaluated correctly. [More](https://issues.apache.org/jira/browse/SPARK-3173)
      
      Motivation: http://www.aproint.com/aggregation-with-spark-sql/
      
      - [x] modify SqlParser so it supports casting to TIMESTAMP (workaround for item 2)
      - [x] the string literal should be converted into Timestamp if the column is Timestamp.
      
      Author: Zdenek Farana <zdenek.farana@gmail.com>
      Author: Zdenek Farana <zdenek.farana@aproint.com>
      
      Closes #2084 from byF/SPARK-3173 and squashes the following commits:
      
      442b59d [Zdenek Farana] Fixed test merge conflict
      2dbf4f6 [Zdenek Farana] Merge remote-tracking branch 'origin/SPARK-3173' into SPARK-3173
      65b6215 [Zdenek Farana] Fixed timezone sensitivity in the test
      47b27b4 [Zdenek Farana] Now works in the case of "StringLiteral=TimestampColumn"
      96a661b [Zdenek Farana] Code style change
      491dfcf [Zdenek Farana] Added test cases for SPARK-3173
      4446b1e [Zdenek Farana] A string literal is cast into Timestamp when the column is Timestamp.
      59af397 [Zdenek Farana] Added a new TIMESTAMP keyword; CAST to TIMESTAMP now can be used in SQL expression.
      98ddbe6c
    • qiping.lqp's avatar
      [SPARK-3291][SQL]TestcaseName in createQueryTest should not contain ":" · 634d04b8
      qiping.lqp authored
      ":" is not allowed to appear in a file name of Windows system. If file name contains ":", this file can't be checked out in a Windows system and developers using Windows must be careful to not commit the deletion of such files, Which is very inconvenient.
      
      Author: qiping.lqp <qiping.lqp@alibaba-inc.com>
      
      Closes #2191 from chouqin/querytest and squashes the following commits:
      
      0e943a1 [qiping.lqp] rename golden file
      60a863f [qiping.lqp] TestcaseName in createQueryTest should not contain ":"
      634d04b8
    • Cheng Lian's avatar
      [SPARK-3269][SQL] Decreases initial buffer size for row set to prevent OOM · d94a44d7
      Cheng Lian authored
      When a large batch size is specified, `SparkSQLOperationManager` OOMs even if the whole result set is much smaller than the batch size.
      
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      
      Closes #2171 from liancheng/jdbc-fetch-size and squashes the following commits:
      
      5e1623b [Cheng Lian] Decreases initial buffer size for row set to prevent OOM
      d94a44d7