  1. Oct 26, 2013
  2. Oct 25, 2013
    • Patrick Wendell's avatar
      Adding improved error message when multiple assembly jars are present. · 4ba32678
      Patrick Wendell authored
      This can happen easily if building different hadoop versions.
      4ba32678
    • Matei Zaharia's avatar
      Merge pull request #108 from alig/master · bab496c1
      Matei Zaharia authored
      Enables execution using HDFS as a synchronization point between the driver and executors, and ensures that executors exit properly.
      bab496c1
    • Matei Zaharia's avatar
      Merge pull request #102 from tdas/transform · d307db6e
      Matei Zaharia authored
      Added new Spark Streaming operations
      
      New operations
      - transformWith which allows arbitrary 2-to-1 DStream transform, added to Scala and Java API
      - StreamingContext.transform to allow arbitrary n-to-1 DStream
      - leftOuterJoin and rightOuterJoin between 2 DStreams, added to Scala and Java API
      - missing variations of join and cogroup added to Scala and Java API
      - missing JavaStreamingContext.union
      
      Updated a number of Java and Scala API docs
      d307db6e
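The 2-to-1 transform and the outer join listed above can be illustrated with a toy model. This is a minimal sketch in plain Scala, not the Spark Streaming API: a "stream" is modelled as a sequence of batches, and the names `TransformWithSketch`, `Batch`, `Stream`, and the sample data are all hypothetical. Only the operation names `transformWith` and `leftOuterJoin` come from the commit message.

```scala
// Toy model, NOT the Spark Streaming API: a stream is a sequence of batches.
object TransformWithSketch {
  type Batch[T]  = Seq[T]
  type Stream[T] = Seq[Batch[T]]

  // 2-to-1 transform: apply f to the two batches that belong to the same interval.
  def transformWith[A, B, C](left: Stream[A], right: Stream[B])(
      f: (Batch[A], Batch[B]) => Batch[C]): Stream[C] =
    left.zip(right).map { case (l, r) => f(l, r) }

  // Per-batch left outer join: left records without a match are kept with None.
  def leftOuterJoin[K, V, W](l: Batch[(K, V)], r: Batch[(K, W)]): Batch[(K, (V, Option[W]))] = {
    val rightByKey: Map[K, Seq[W]] =
      r.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }
    l.flatMap { case (k, v) =>
      val matches: Seq[Option[W]] = rightByKey.get(k) match {
        case Some(ws) => ws.map(w => (Some(w): Option[W]))
        case None     => Seq(None)
      }
      matches.map(m => (k, (v, m)))
    }
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample data: two batch intervals per stream.
    val clicks: Stream[(String, Int)]    = Seq(Seq("a" -> 1, "b" -> 2), Seq("c" -> 3))
    val names: Stream[(String, String)]  = Seq(Seq("a" -> "alice"), Seq.empty)
    val joined = transformWith(clicks, names)((l, r) => leftOuterJoin(l, r))
    joined.foreach(println)
  }
}
```

In this model an n-to-1 transform would take a sequence of streams instead of two, and a right outer join is the mirror image of `leftOuterJoin`.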
    • Ali Ghodsi's avatar
      fixing comments on PR · eef261c8
      Ali Ghodsi authored
      eef261c8
    • Matei Zaharia's avatar
      Merge pull request #111 from kayousterhout/ui_name · 85e2cab6
      Matei Zaharia authored
      Properly display the name of a stage in the UI.
      
      This fixes a bug introduced by the fix for SPARK-940, which
      changed the UI to display the RDD name rather than the stage
      name. As a result, no name for the stage was shown when
      using the Spark shell, which meant that there was no way to
      click on the stage to see more details (e.g., the running
      tasks). This commit changes the UI back to using the
      stage name.
      
      @pwendell -- let me know if this change was intentional
      85e2cab6
    • Tathagata Das's avatar
      dc957078
    • Kay Ousterhout's avatar
      Properly display the name of a stage in the UI. · a9c8d83a
      Kay Ousterhout authored
      This fixes a bug introduced by the fix for SPARK-940, which
      changed the UI to display the RDD name rather than the stage
      name. As a result, no name for the stage was shown when
      using the Spark shell, which meant that there was no way to
      click on the stage to see more details (e.g., the running
      tasks). This commit changes the UI back to using the
      stage name.
      a9c8d83a
    • Reynold Xin's avatar
      Merge pull request #110 from pwendell/master · ab35ec4f
      Reynold Xin authored
      Exclude jopt from kafka dependency.
      
      Kafka uses an older version of jopt that causes bad conflicts with the version
      used by spark-perf. It's not easy to remove this downstream because of the way
      that spark-perf uses Spark (by including a spark assembly as an unmanaged jar).
      This fixes the problem at its source by just never including it.
      ab35ec4f
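Excluding a transitive dependency at its source, as described above, might look like the following in an sbt build. This is a hedged sketch, not the change made by the commit: the Kafka artifact name and version are placeholders chosen for illustration, and only the jopt-simple coordinates (`net.sf.jopt-simple` / `jopt-simple`) are the library's published Maven coordinates.

```scala
// build.sbt fragment (sbt's Scala DSL).
// ASSUMPTION: the Kafka artifact name and version are illustrative placeholders,
// not the values used in the actual build.
libraryDependencies += ("org.apache.kafka" % "kafka_2.9.2" % "0.8.0-beta1")
  .exclude("net.sf.jopt-simple", "jopt-simple")
```

With the exclusion in place, the assembly no longer bundles the older jopt, so a downstream project that adds the assembly as an unmanaged jar can bring its own version.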
    • Patrick Wendell's avatar
      Exclude jopt from kafka dependency. · af4a529f
      Patrick Wendell authored
      Kafka uses an older version of jopt that causes bad conflicts with the version
      used by spark-perf. It's not easy to remove this downstream because of the way
      that spark-perf uses Spark (by including a spark assembly as an unmanaged jar).
      This fixes the problem at its source by just never including it.
      af4a529f
    • Reynold Xin's avatar
      Merge pull request #109 from pwendell/master · 4f2c9438
      Reynold Xin authored
      Adding Java/Java Streaming versions of `repartition` with associated tests
      4f2c9438
    • Patrick Wendell's avatar
      Style fixes · ad5f579c
      Patrick Wendell authored
      ad5f579c
    • Patrick Wendell's avatar
      Spacing fix · e5f6d569
      Patrick Wendell authored
      e5f6d569
  3. Oct 24, 2013
    • Patrick Wendell's avatar
      Small spacing fix · a351fd4a
      Patrick Wendell authored
      a351fd4a
    • Patrick Wendell's avatar
      31e92b72
    • Reynold Xin's avatar
      Merge pull request #106 from pwendell/master · 99ad4a61
      Reynold Xin authored
      Add a `repartition` operator.
      
      This patch adds an operator called repartition with more straightforward
      semantics than the current `coalesce` operator. There are a few use cases
      where this operator is useful:
      
      1. If a user wants to increase the number of partitions in the RDD. This
      is more common now with streaming. E.g. a user is ingesting data on one
      node but they want to add more partitions to ensure parallelism of
      subsequent operations across threads or the cluster.
      
      Right now they have to call rdd.coalesce(numSplits, shuffle=true) - that's
      super confusing.
      
      2. If a user has input data where the number of partitions is not known. E.g.
      
      > sc.textFile("some file").coalesce(50)....
      
      This is both semantically vague (am I growing or shrinking this RDD?) and
      may not work correctly if the base RDD has fewer than 50 partitions.
      
      The new operator forces a shuffle every time, so it always produces exactly
      the requested number of partitions. It also throws an exception rather than
      silently failing if a bad input is passed.
      
      I am currently adding streaming tests (requires refactoring some of the test
      suite to allow testing at partition granularity), so this is not ready for
      merge yet. But feedback is welcome.
      99ad4a61
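The difference between the two operators described above can be sketched with a toy model. This is plain Scala, not Spark code: an "RDD" is modelled as a vector of partitions, and the names `RepartitionSketch`, `Partitions`, and `coalesceNoShuffle` are hypothetical. It assumes that a coalesce without a shuffle can only merge existing partitions, which is why it cannot grow the partition count.

```scala
// Toy model, NOT Spark code: an "RDD" is just a vector of partitions.
object RepartitionSketch {
  type Partitions[T] = Vector[Vector[T]]

  // Models coalesce without a shuffle: neighbouring partitions are merged,
  // so the partition count can shrink but never grow.
  def coalesceNoShuffle[T](parts: Partitions[T], n: Int): Partitions[T] = {
    require(n > 0, "number of partitions must be positive")
    if (n >= parts.length) parts // cannot grow: input is returned unchanged
    else Vector.tabulate(n) { g =>
      parts.zipWithIndex.collect { case (p, i) if i * n / parts.length == g => p }.flatten
    }
  }

  // Models repartition: every element is redistributed (a "shuffle"),
  // so the result always has exactly n partitions. Bad input throws.
  def repartition[T](parts: Partitions[T], n: Int): Partitions[T] = {
    require(n > 0, "number of partitions must be positive")
    val all = parts.flatten
    Vector.tabulate(n) { g =>
      all.zipWithIndex.collect { case (x, i) if i % n == g => x }
    }
  }

  def main(args: Array[String]): Unit = {
    // All data ingested on one node, i.e. a single partition.
    val oneNode: Partitions[Int] = Vector((1 to 10).toVector)
    println(coalesceNoShuffle(oneNode, 4).length) // prints 1: still one partition
    println(repartition(oneNode, 4).length)       // prints 4: exactly as requested
  }
}
```

In the model, `repartition(parts, 0)` fails with an `IllegalArgumentException` instead of returning something unexpected, mirroring the fail-fast behaviour the commit message describes.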
    • Patrick Wendell's avatar
      Some clean-up of tests · 39f6f755
      Patrick Wendell authored
      39f6f755
    • Tathagata Das's avatar
      Fixed accidental bug. · e962a6e6
      Tathagata Das authored
      e962a6e6
    • Patrick Wendell's avatar
      Removing Java for now · 9423532f
      Patrick Wendell authored
      9423532f
    • Patrick Wendell's avatar
      Adding tests · 05ac9940
      Patrick Wendell authored
      05ac9940
    • Patrick Wendell's avatar
      Always use a shuffle · 2fda84fe
      Patrick Wendell authored
      2fda84fe
    • Patrick Wendell's avatar
      Add a `repartition` operator. · 08c1a42d
      Patrick Wendell authored
      This patch adds an operator called repartition with more straightforward
      semantics than the current `coalesce` operator. There are a few use cases
      where this operator is useful:
      
      1. If a user wants to increase the number of partitions in the RDD. This
      is more common now with streaming. E.g. a user is ingesting data on one
      node but they want to add more partitions to ensure parallelism of
      subsequent operations across threads or the cluster.
      
      Right now they have to call rdd.coalesce(numSplits, shuffle=true) - that's
      super confusing.
      
      2. If a user has input data where the number of partitions is not known. E.g.
      
      > sc.textFile("some file").coalesce(50)....
      
      This is both semantically vague (am I growing or shrinking this RDD?) and
      may not work correctly if the base RDD has fewer than 50 partitions.
      
      The new operator forces a shuffle every time, so it always produces exactly
      the requested number of partitions. It also throws an exception rather than
      silently failing if a bad input is passed.
      
      I am currently adding streaming tests (requires refactoring some of the test
      suite to allow testing at partition granularity), so this is not ready for
      merge yet. But feedback is welcome.
      08c1a42d
    • Ali Ghodsi's avatar
      Makes Spark SIMR ready. · 05a0df2b
      Ali Ghodsi authored
      05a0df2b
    • Tathagata Das's avatar
      0400aba1
    • Tathagata Das's avatar
      Added JavaStreamingContext.transform · bacfe5eb
      Tathagata Das authored
      bacfe5eb
    • Matei Zaharia's avatar
      Merge pull request #93 from kayousterhout/ui_new_state · 1dc776b8
      Matei Zaharia authored
      Show "GETTING_RESULTS" state in UI.
      
      This commit adds a set of calls using the SparkListener interface
      that indicate when a task is remotely fetching results, so that
      we can display this (potentially time-consuming) phase of execution
      to users through the UI.
      1dc776b8
  4. Oct 23, 2013