  1. Jul 11, 2016
• [SPARK-16477] Bump master version to 2.1.0-SNAPSHOT · ffcb6e05
      Reynold Xin authored
      ## What changes were proposed in this pull request?
      After SPARK-16476 (committed earlier today as #14128), we can finally bump the version number.
      
      ## How was this patch tested?
      N/A
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #14130 from rxin/SPARK-16477.
  2. May 17, 2016
  3. Apr 28, 2016
  4. Mar 16, 2016
• [SPARK-13823][SPARK-13397][SPARK-13395][CORE] More warnings, StandardCharset follow up · 3b461d9e
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
      Follow up to https://github.com/apache/spark/pull/11657
      
- Also update `String.getBytes("UTF-8")` to use `StandardCharsets.UTF_8` (see the sketch after this list)
      - And fix one last new Coverity warning that turned up (use of unguarded `wait()` replaced by simpler/more robust `java.util.concurrent` classes in tests)
      - And while we're here cleaning up Coverity warnings, just fix about 15 more build warnings
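
For illustration, a minimal Java sketch of the replacement pattern this item applies (the class name here is hypothetical):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CharsetMigration {
    public static void main(String[] args) throws Exception {
        byte[] before = "Spark".getBytes("UTF-8");                // charset named by string; throws checked UnsupportedEncodingException
        byte[] after  = "Spark".getBytes(StandardCharsets.UTF_8); // constant: no checked exception, no charset lookup by name
        System.out.println(Arrays.equals(before, after));         // true -- behavior is identical
    }
}
```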
      
      ## How was this patch tested?
      
      Jenkins tests
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #11725 from srowen/SPARK-13823.2.
  5. Mar 01, 2016
• [SPARK-13548][BUILD] Move tags and unsafe modules into common · b0ee7d43
      Reynold Xin authored
      ## What changes were proposed in this pull request?
This patch moves the tags and unsafe modules into the common directory to remove two top-level non-user-facing directories.
      
      ## How was this patch tested?
      Jenkins should suffice.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #11426 from rxin/SPARK-13548.
  6. Jan 30, 2016
• [SPARK-6363][BUILD] Make Scala 2.11 the default Scala version · 289373b2
      Josh Rosen authored
      This patch changes Spark's build to make Scala 2.11 the default Scala version. To be clear, this does not mean that Spark will stop supporting Scala 2.10: users will still be able to compile Spark for Scala 2.10 by following the instructions on the "Building Spark" page; however, it does mean that Scala 2.11 will be the default Scala version used by our CI builds (including pull request builds).
      
      The Scala 2.11 compiler is faster than 2.10, so I think we'll be able to look forward to a slight speedup in our CI builds (it looks like it's about 2X faster for the Maven compile-only builds, for instance).
      
      After this patch is merged, I'll update Jenkins to add new compile-only jobs to ensure that Scala 2.10 compilation doesn't break.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #10608 from JoshRosen/SPARK-6363.
  7. Dec 19, 2015
  8. Nov 17, 2015
  9. Oct 07, 2015
  10. Sep 15, 2015
  11. Jul 31, 2015
• [SPARK-9464][SQL] Property checks for UTF8String · 14f26344
      Josh Rosen authored
      This PR is based on the original work by JoshRosen in #7780, which adds ScalaCheck property-based tests for UTF8String.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      Author: Yijie Shen <henry.yijieshen@gmail.com>
      
      Closes #7830 from yjshen/utf8-property-checks and squashes the following commits:
      
      593da3a [Yijie Shen] resolve comments
      c0800e6 [Yijie Shen] Finish all todos in suite
      52f51a0 [Josh Rosen] Add some more failing tests
      49ed0697 [Josh Rosen] Rename suite
      9209c64 [Josh Rosen] UTF8String Property Checks.
  12. Jun 28, 2015
• [SPARK-8683] [BUILD] Depend on mockito-core instead of mockito-all · f5100451
      Josh Rosen authored
      Spark's tests currently depend on `mockito-all`, which bundles Hamcrest and Objenesis classes. Instead, it should depend on `mockito-core`, which declares those libraries as Maven dependencies. This is necessary in order to fix a dependency conflict that leads to a NoSuchMethodError when using certain Hamcrest matchers.
      
      See https://github.com/mockito/mockito/wiki/Declaring-mockito-dependency for more details.
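
For illustration, a hypothetical JUnit test of the kind that can hit this conflict: under `mockito-all`, the bundled Hamcrest classes can shadow the `hamcrest-library` version a matcher was compiled against, surfacing as a NoSuchMethodError at runtime, while under `mockito-core` Hamcrest resolves as an ordinary Maven dependency.

```java
import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.hasItem;

import java.util.Arrays;
import org.junit.Test;

public class MatcherConflictExample {
    @Test
    public void hamcrestMatcherWorks() {
        // With mockito-all on the classpath, this line could throw
        // NoSuchMethodError if the bundled Hamcrest classes win; with
        // mockito-core the declared Hamcrest dependency is used.
        assertThat(Arrays.asList("a", "b"), hasItem("a"));
    }
}
```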
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #7061 from JoshRosen/mockito-core-instead-of-all and squashes the following commits:
      
      70eccbe [Josh Rosen] Depend on mockito-core instead of mockito-all.
  13. Jun 25, 2015
• [SPARK-8574] org/apache/spark/unsafe doesn't honor the java source/target versions · e988adb5
Tom Graves authored
      
      I basically copied the compatibility rules from the top level pom.xml into here.  Someone more familiar with all the options in the top level pom may want to make sure nothing else should be copied on down.
      
With this it allows me to build with jdk8 and run with lower versions. The source shows it is compiled for jdk6, as it's supposed to be.
      
      Author: Tom Graves <tgraves@yahoo-inc.com>
      Author: Thomas Graves <tgraves@staydecay.corp.gq1.yahoo.com>
      
      Closes #6989 from tgravescs/SPARK-8574 and squashes the following commits:
      
      e1ea2d4 [Thomas Graves] Change to use combine.children="append"
      150d645 [Tom Graves] [SPARK-8574] org/apache/spark/unsafe doesn't honor the java source/target versions
  14. Jun 03, 2015
• [SPARK-7801] [BUILD] Updating versions to SPARK 1.5.0 · 2c4d550e
      Patrick Wendell authored
      Author: Patrick Wendell <patrick@databricks.com>
      
      Closes #6328 from pwendell/spark-1.5-update and squashes the following commits:
      
      2f42d02 [Patrick Wendell] A few more excludes
      4bebcf0 [Patrick Wendell] Update to RC4
      61aaf46 [Patrick Wendell] Using new release candidate
      55f1610 [Patrick Wendell] Another exclude
      04b4f04 [Patrick Wendell] More issues with transient 1.4 changes
      36f549b [Patrick Wendell] [SPARK-7801] [BUILD] Updating versions to SPARK 1.5.0
  15. May 20, 2015
• [SPARK-7251] Perform sequential scan when iterating over BytesToBytesMap · f2faa7af
      Josh Rosen authored
      This patch modifies `BytesToBytesMap.iterator()` to iterate through records in the order that they appear in the data pages rather than iterating through the hashtable pointer arrays. This results in fewer random memory accesses, significantly improving performance for scan-and-copy operations.
      
This is possible because our data pages are laid out as sequences of `[keyLength][data][valueLength][data]` entries.  In order to mark the end of a partially-filled data page, we write `-1` as a special end-of-page length (BytesToBytesMap supports empty/zero-length keys and values, which is why we had to use a negative length).
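
As a rough sketch of that layout (not the actual BytesToBytesMap code; the `ByteBuffer`-based page here is an assumption for illustration), iterating a page in insertion order looks like:

```java
import java.nio.ByteBuffer;

public class DataPageScanSketch {
    // Walk one data page laid out as [keyLength][key bytes][valueLength][value bytes]...
    // terminated by a -1 key length. Keys and values may be zero-length.
    static void scan(ByteBuffer page) {
        while (page.remaining() >= 4) {
            int keyLength = page.getInt();
            if (keyLength == -1) {
                break;                        // special end-of-page marker
            }
            byte[] key = new byte[keyLength];
            page.get(key);
            int valueLength = page.getInt();
            byte[] value = new byte[valueLength];
            page.get(value);
            // process(key, value): records are visited in page order,
            // which is what makes the scan sequential rather than random.
        }
    }
}
```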
      
      This patch incorporates / closes #5836.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #6159 from JoshRosen/SPARK-7251 and squashes the following commits:
      
      05bd90a [Josh Rosen] Compare capacity, not size, to MAX_CAPACITY
      2a20d71 [Josh Rosen] Fix maximum BytesToBytesMap capacity
      bc4854b [Josh Rosen] Guard against overflow when growing BytesToBytesMap
      f5feadf [Josh Rosen] Add test for iterating over an empty map
      273b842 [Josh Rosen] [SPARK-7251] Perform sequential scan when iterating over entries in BytesToBytesMap
  16. May 13, 2015
• [SPARK-7081] Faster sort-based shuffle path using binary processing cache-aware sort · 73bed408
      Josh Rosen authored
      This patch introduces a new shuffle manager that enhances the existing sort-based shuffle with a new cache-friendly sort algorithm that operates directly on binary data. The goals of this patch are to lower memory usage and Java object overheads during shuffle and to speed up sorting. It also lays groundwork for follow-up patches that will enable end-to-end processing of serialized records.
      
      The new shuffle manager, `UnsafeShuffleManager`, can be enabled by setting `spark.shuffle.manager=tungsten-sort` in SparkConf.
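
For example, a minimal Java snippet enabling it (the app name is arbitrary):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class TungstenSortExample {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
            .setAppName("tungsten-sort-demo")
            .set("spark.shuffle.manager", "tungsten-sort"); // opt in to UnsafeShuffleManager
        JavaSparkContext sc = new JavaSparkContext(conf);
        // ... run a job that shuffles, e.g. reduceByKey ...
        sc.stop();
    }
}
```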
      
      The new shuffle manager uses directly-managed memory to implement several performance optimizations for certain types of shuffles. In cases where the new performance optimizations cannot be applied, the new shuffle manager delegates to SortShuffleManager to handle those shuffles.
      
      UnsafeShuffleManager's optimizations will apply when _all_ of the following conditions hold:
      
       - The shuffle dependency specifies no aggregation or output ordering.
       - The shuffle serializer supports relocation of serialized values (this is currently supported
         by KryoSerializer and Spark SQL's custom serializers).
       - The shuffle produces fewer than 16777216 output partitions.
       - No individual record is larger than 128 MB when serialized.
      
In addition, extra spill-merging optimizations are automatically applied when the shuffle compression codec supports concatenation of serialized streams. This is currently supported by Spark's LZF compression codec.
      
At a high-level, UnsafeShuffleManager's design is similar to Spark's existing SortShuffleManager.  In sort-based shuffle, incoming records are sorted according to their target partition ids, then written to a single map output file. Reducers fetch contiguous regions of this file in order to read their portion of the map output. In cases where the map output data is too large to fit in memory, sorted subsets of the output are spilled to disk and those on-disk files are merged to produce the final output file.
      
      UnsafeShuffleManager optimizes this process in several ways:
      
       - Its sort operates on serialized binary data rather than Java objects, which reduces memory consumption and GC overheads. This optimization requires the record serializer to have certain properties to allow serialized records to be re-ordered without requiring deserialization.  See SPARK-4550, where this optimization was first proposed and implemented, for more details.
      
 - It uses a specialized cache-efficient sorter (UnsafeShuffleExternalSorter) that sorts arrays of compressed record pointers and partition ids. By using only 8 bytes of space per record in the sorting array, this fits more of the array into cache (see the bit-packing sketch after this list).
      
       - The spill merging procedure operates on blocks of serialized records that belong to the same partition and does not need to deserialize records during the merge.
      
       - When the spill compression codec supports concatenation of compressed data, the spill merge simply concatenates the serialized and compressed spill partitions to produce the final output partition.  This allows efficient data copying methods, like NIO's `transferTo`, to be used and avoids the need to allocate decompression or copying buffers during the merge.
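
As promised above, a rough illustration of the 8-bytes-per-record encoding: the 24-bit partition field matches the 16777216 = 2^24 partition limit listed earlier, while treating the remaining 40 bits as one opaque compressed record address is a simplification of this sketch, not the real PackedRecordPointer layout.

```java
public class PackedPointerSketch {
    private static final long ADDRESS_MASK = (1L << 40) - 1;

    // Pack a partition id (high 24 bits) and a compressed record address
    // (low 40 bits) into a single 8-byte word.
    static long pack(int partitionId, long compressedAddress) {
        return ((long) partitionId << 40) | (compressedAddress & ADDRESS_MASK);
    }

    static int partitionId(long packed) {
        return (int) (packed >>> 40);   // sorting by this key clusters records by partition
    }

    static long address(long packed) {
        return packed & ADDRESS_MASK;
    }
}
```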
      
      The shuffle read path is unchanged.
      
      This patch is similar to [SPARK-4550](http://issues.apache.org/jira/browse/SPARK-4550) / #4450 but uses a slightly different implementation. The `unsafe`-based implementation featured in this patch lays the groundwork for followup patches that will enable sorting to operate on serialized data pages that will be prepared by Spark SQL's new `unsafe` operators (such as the new aggregation operator introduced in #5725).
      
      ### Future work
      
      There are several tasks that build upon this patch, which will be left to future work:
      
      - [SPARK-7271](https://issues.apache.org/jira/browse/SPARK-7271) Redesign / extend the shuffle interfaces to accept binary data as input. The goal here is to let us bypass serialization steps in cases where the sort input is produced by an operator that operates directly on binary data.
      - Extension / redesign of the `Serializer` API. We can add new methods which allow serializers to determine the size requirements for serializing objects and for serializing objects directly to a specified memory address (similar to how `UnsafeRowConverter` works in Spark SQL).
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #5868 from JoshRosen/unsafe-sort and squashes the following commits:
      
      ef0a86e [Josh Rosen] Fix scalastyle errors
      7610f2f [Josh Rosen] Add tests for proper cleanup of shuffle data.
      d494ffe [Josh Rosen] Fix deserialization of JavaSerializer instances.
      52a9981 [Josh Rosen] Fix some bugs in the address packing code.
      51812a7 [Josh Rosen] Change shuffle manager sort name to tungsten-sort
      4023fa4 [Josh Rosen] Add @Private annotation to some Java classes.
      de40b9d [Josh Rosen] More comments to try to explain metrics code
      df07699 [Josh Rosen] Attempt to clarify confusing metrics update code
5e189c6 [Josh Rosen] Track time spent closing / flushing files; split TimeTrackingOutputStream into separate file.
      d5779c6 [Josh Rosen] Merge remote-tracking branch 'origin/master' into unsafe-sort
      c2ce78e [Josh Rosen] Fix a missed usage of MAX_PARTITION_ID
      e3b8855 [Josh Rosen] Cleanup in UnsafeShuffleWriter
      4a2c785 [Josh Rosen] rename 'sort buffer' to 'pointer array'
      6276168 [Josh Rosen] Remove ability to disable spilling in UnsafeShuffleExternalSorter.
      57312c9 [Josh Rosen] Clarify fileBufferSize units
      2d4e4f4 [Josh Rosen] Address some minor comments in UnsafeShuffleExternalSorter.
      fdcac08 [Josh Rosen] Guard against overflow when expanding sort buffer.
      85da63f [Josh Rosen] Cleanup in UnsafeShuffleSorterIterator.
      0ad34da [Josh Rosen] Fix off-by-one in nextInt() call
      56781a1 [Josh Rosen] Rename UnsafeShuffleSorter to UnsafeShuffleInMemorySorter
      e995d1a [Josh Rosen] Introduce MAX_SHUFFLE_OUTPUT_PARTITIONS.
      e58a6b4 [Josh Rosen] Add more tests for PackedRecordPointer encoding.
      4f0b770 [Josh Rosen] Attempt to implement proper shuffle write metrics.
      d4e6d89 [Josh Rosen] Update to bit shifting constants
      69d5899 [Josh Rosen] Remove some unnecessary override vals
      8531286 [Josh Rosen] Add tests that automatically trigger spills.
      7c953f9 [Josh Rosen] Add test that covers UnsafeShuffleSortDataFormat.swap().
      e1855e5 [Josh Rosen] Fix a handful of misc. IntelliJ inspections
      39434f9 [Josh Rosen] Avoid integer multiplication overflow in getMemoryUsage (thanks FindBugs!)
      1e3ad52 [Josh Rosen] Delete unused ByteBufferOutputStream class.
      ea4f85f [Josh Rosen] Roll back an unnecessary change in Spillable.
      ae538dc [Josh Rosen] Document UnsafeShuffleManager.
      ec6d626 [Josh Rosen] Add notes on maximum # of supported shuffle partitions.
      0d4d199 [Josh Rosen] Bump up shuffle.memoryFraction to make tests pass.
      b3b1924 [Josh Rosen] Properly implement close() and flush() in DummySerializerInstance.
      1ef56c7 [Josh Rosen] Revise compression codec support in merger; test cross product of configurations.
      b57c17f [Josh Rosen] Disable some overly-verbose logs that rendered DEBUG useless.
      f780fb1 [Josh Rosen] Add test demonstrating which compression codecs support concatenation.
      4a01c45 [Josh Rosen] Remove unnecessary log message
27b18b0 [Josh Rosen] Test for inserting records AT the max record size.
      fcd9a3c [Josh Rosen] Add notes + tests for maximum record / page sizes.
      9d1ee7c [Josh Rosen] Fix MiMa excludes for ShuffleWriter change
      fd4bb9e [Josh Rosen] Use own ByteBufferOutputStream rather than Kryo's
      67d25ba [Josh Rosen] Update Exchange operator's copying logic to account for new shuffle manager
      8f5061a [Josh Rosen] Strengthen assertion to check partitioning
      01afc74 [Josh Rosen] Actually read data in UnsafeShuffleWriterSuite
      1929a74 [Josh Rosen] Update to reflect upstream ShuffleBlockManager -> ShuffleBlockResolver rename.
      e8718dd [Josh Rosen] Merge remote-tracking branch 'origin/master' into unsafe-sort
      9b7ebed [Josh Rosen] More defensive programming RE: cleaning up spill files and memory after errors
      7cd013b [Josh Rosen] Begin refactoring to enable proper tests for spilling.
      722849b [Josh Rosen] Add workaround for transferTo() bug in merging code; refactor tests.
      9883e30 [Josh Rosen] Merge remote-tracking branch 'origin/master' into unsafe-sort
      b95e642 [Josh Rosen] Refactor and document logic that decides when to spill.
      1ce1300 [Josh Rosen] More minor cleanup
      5e8cf75 [Josh Rosen] More minor cleanup
      e67f1ea [Josh Rosen] Remove upper type bound in ShuffleWriter interface.
      cfe0ec4 [Josh Rosen] Address a number of minor review comments:
      8a6fe52 [Josh Rosen] Rename UnsafeShuffleSpillWriter to UnsafeShuffleExternalSorter
      11feeb6 [Josh Rosen] Update TODOs related to shuffle write metrics.
      b674412 [Josh Rosen] Merge remote-tracking branch 'origin/master' into unsafe-sort
      aaea17b [Josh Rosen] Add comments to UnsafeShuffleSpillWriter.
      4f70141 [Josh Rosen] Fix merging; now passes UnsafeShuffleSuite tests.
      133c8c9 [Josh Rosen] WIP towards testing UnsafeShuffleWriter.
      f480fb2 [Josh Rosen] WIP in mega-refactoring towards shuffle-specific sort.
      57f1ec0 [Josh Rosen] WIP towards packed record pointers for use in optimized shuffle sort.
      69232fd [Josh Rosen] Enable compressible address encoding for off-heap mode.
      7ee918e [Josh Rosen] Re-order imports in tests
      3aeaff7 [Josh Rosen] More refactoring and cleanup; begin cleaning iterator interfaces
      3490512 [Josh Rosen] Misc. cleanup
      f156a8f [Josh Rosen] Hacky metrics integration; refactor some interfaces.
      2776aca [Josh Rosen] First passing test for ExternalSorter.
      5e100b2 [Josh Rosen] Super-messy WIP on external sort
      595923a [Josh Rosen] Remove some unused variables.
      8958584 [Josh Rosen] Fix bug in calculating free space in current page.
      f17fa8f [Josh Rosen] Add missing newline
      c2fca17 [Josh Rosen] Small refactoring of SerializerPropertiesSuite to enable test re-use:
      b8a09fe [Josh Rosen] Back out accidental log4j.properties change
      bfc12d3 [Josh Rosen] Add tests for serializer relocation property.
      240864c [Josh Rosen] Remove PrefixComputer and require prefix to be specified as part of insert()
      1433b42 [Josh Rosen] Store record length as int instead of long.
      026b497 [Josh Rosen] Re-use a buffer in UnsafeShuffleWriter
      0748458 [Josh Rosen] Port UnsafeShuffleWriter to Java.
      87e721b [Josh Rosen] Renaming and comments
      d3cc310 [Josh Rosen] Flag that SparkSqlSerializer2 supports relocation
      e2d96ca [Josh Rosen] Expand serializer API and use new function to help control when new UnsafeShuffle path is used.
      e267cee [Josh Rosen] Fix compilation of UnsafeSorterSuite
      9c6cf58 [Josh Rosen] Refactor to use DiskBlockObjectWriter.
      253f13e [Josh Rosen] More cleanup
      8e3ec20 [Josh Rosen] Begin code cleanup.
      4d2f5e1 [Josh Rosen] WIP
      3db12de [Josh Rosen] Minor simplification and sanity checks in UnsafeSorter
      767d3ca [Josh Rosen] Fix invalid range in UnsafeSorter.
      e900152 [Josh Rosen] Add test for empty iterator in UnsafeSorter
      57a4ea0 [Josh Rosen] Make initialSize configurable in UnsafeSorter
      abf7bfe [Josh Rosen] Add basic test case.
      81d52c5 [Josh Rosen] WIP on UnsafeSorter
  17. Apr 30, 2015
• [SPARK-7288] Suppress compiler warnings due to use of sun.misc.Unsafe; add facade in front of Unsafe; remove use of Unsafe.setMemory · 07a86205
Josh Rosen authored
      
      This patch suppresses compiler warnings due to our use of `sun.misc.Unsafe` (introduced in #5725).  These warnings can only be suppressed via the `-XDignore.symbol.file` javac flag; the `SuppressWarnings` annotation won't work for these.
      
In order to restrict uses of this compiler flag to the `unsafe` module, I placed a facade in front of `Unsafe` so that other modules won't call it directly. This facade will also help us to avoid accidental usage of deprecated Unsafe methods or methods that aren't supported in Java 6.
      
      I also removed an unnecessary use of `Unsafe.setMemory`, which isn't present in certain versions of Java 6, and excluded the new `unsafe` module from Javadoc.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #5814 from JoshRosen/unsafe-compiler-warnings-fixes and squashes the following commits:
      
      9e8c483 [Josh Rosen] Exclude new unsafe module from Javadoc
      ba75ecf [Josh Rosen] Only apply -XDignore.symbol.file flag in unsafe project.
      7403345 [Josh Rosen] Put facade in front of Unsafe.
      50230c0 [Josh Rosen] Remove usage of Unsafe.setMemory
      96d41c9 [Josh Rosen] Use -XDignore.symbol.file to suppress warnings about sun.misc.Unsafe usage
  18. Apr 29, 2015
• [SPARK-7076][SPARK-7077][SPARK-7080][SQL] Use managed memory for aggregations · f49284b5
      Josh Rosen authored
      This patch adds managed-memory-based aggregation to Spark SQL / DataFrames. Instead of working with Java objects, this new aggregation path uses `sun.misc.Unsafe` to manipulate raw memory.  This reduces the memory footprint for aggregations, resulting in fewer spills, OutOfMemoryErrors, and garbage collection pauses.  As a result, this allows for higher memory utilization.  It can also result in better cache locality since objects will be stored closer together in memory.
      
This feature can be enabled by setting `spark.sql.unsafe.enabled=true`.  For now, this feature is only supported when codegen is enabled, and it only supports aggregations for which the grouping columns are primitive numeric types or strings and the aggregated values are numeric.
      
      ### Managing memory with sun.misc.Unsafe
      
      This patch supports both on- and off-heap managed memory.
      
      - In on-heap mode, memory addresses are identified by the combination of a base Object and an offset within that object.
      - In off-heap mode, memory is addressed directly with 64-bit long addresses.
      
      To support both modes, functions that manipulate memory accept both `baseObject` and `baseOffset` fields.  In off-heap mode, we simply pass `null` as `baseObject`.
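
A minimal sketch of that convention, using `sun.misc.Unsafe` directly (obtained via reflection since it is not a public API; illustrative only, not the patch's MemoryManager code):

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class AddressingSketch {
    public static void main(String[] args) throws Exception {
        Field f = Unsafe.class.getDeclaredField("theUnsafe");
        f.setAccessible(true);
        Unsafe unsafe = (Unsafe) f.get(null);

        // On-heap: address = (base object, offset within that object)
        long[] onHeap = new long[4];
        unsafe.putLong(onHeap, Unsafe.ARRAY_LONG_BASE_OFFSET, 42L);

        // Off-heap: address = (null, absolute 64-bit address)
        long addr = unsafe.allocateMemory(8);
        unsafe.putLong(null, addr, 42L);

        System.out.println(unsafe.getLong(onHeap, Unsafe.ARRAY_LONG_BASE_OFFSET)
            == unsafe.getLong(null, addr)); // true
        unsafe.freeMemory(addr);
    }
}
```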
      
      We allocate memory in large chunks, so memory fragmentation and allocation speed are not significant bottlenecks.
      
      By default, we use on-heap mode.  To enable off-heap mode, set `spark.unsafe.offHeap=true`.
      
      To track allocated memory, this patch extends `SparkEnv` with an `ExecutorMemoryManager` and supplies each `TaskContext` with a `TaskMemoryManager`.  These classes work together to track allocations and detect memory leaks.
      
      ### Compact tuple format
      
This patch introduces `UnsafeRow`, a compact row layout.  In this format, each tuple has three parts: a null bit set, fixed-length values, and variable-length values:
      
      ![image](https://cloud.githubusercontent.com/assets/50748/7328538/2fdb65ce-ea8b-11e4-9743-6c0f02bb7d1f.png)
      
      - Rows are always 8-byte word aligned (so their sizes will always be a multiple of 8 bytes)
      - The bit set is used for null tracking:
      	- Position _i_ is set if and only if field _i_ is null
      	- The bit set is aligned to an 8-byte word boundary.
      - Every field appears as an 8-byte word in the fixed-length values part:
      	- If a field is null, we zero out the values.
      	- If a field is variable-length, the word stores a relative offset (w.r.t. the base of the tuple) that points to the beginning of the field's data in the variable-length part.
      - Each variable-length data type can have its own encoding:
      	- For strings, the first word stores the length of the string and is followed by UTF-8 encoded bytes.  If necessary, the end of the string is padded with empty bytes in order to ensure word-alignment.
      
For example, a tuple that consists of 3 fields of type (int, string, string), with values (null, “data”, “bricks”), would look like this:
      
      ![image](https://cloud.githubusercontent.com/assets/50748/7328526/1e21959c-ea8b-11e4-9a28-a4350fe4a7b5.png)
      
      This format allows us to compare tuples for equality by directly comparing their raw bytes.  This also enables fast hashing of tuples.
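
Applying those layout rules to the example tuple above, a back-of-envelope size calculation (this helper is illustrative, not Spark code):

```java
import java.nio.charset.StandardCharsets;

public class UnsafeRowSizeSketch {
    static long align8(long n) { return ((n + 7) / 8) * 8; }

    public static void main(String[] args) {
        // (null, "data", "bricks"): 3 fields of type (int, string, string)
        long bitSet = align8((3 + 7) / 8);   // null-tracking bits, word aligned -> 8 bytes
        long fixed  = 3 * 8;                 // one 8-byte word per field -> 24 bytes
        long data   = 8 + align8("data".getBytes(StandardCharsets.UTF_8).length);   // length word + padded bytes -> 16
        long bricks = 8 + align8("bricks".getBytes(StandardCharsets.UTF_8).length); // length word + padded bytes -> 16
        System.out.println(bitSet + fixed + data + bricks);  // 64 bytes under these assumptions
    }
}
```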
      
      ### Hash map for performing aggregations
      
This patch introduces `UnsafeFixedWidthAggregationMap`, a hash map for performing aggregations where the aggregation result columns are fixed-width.  This map's keys and values are `Row` objects. `UnsafeFixedWidthAggregationMap` is implemented on top of `BytesToBytesMap`, an append-only map which supports byte-array keys and values.
      
`BytesToBytesMap` stores pointers to key and value tuples.  For each record with a new key, we copy the key, create the aggregation value buffer for that key, and put them in a buffer. The hash table then simply stores pointers to the key and value. For each record with an existing key, we simply run the aggregation function to update the values in place.
      
This map is implemented using open hashing with triangular sequence probing.  Each entry stores two words in a long array: the first word stores the address of the key and the second word stores the relative offset from the key tuple to the value tuple, as well as the key's 32-bit hashcode.  By storing the full hashcode, we reduce the number of equality checks that need to be performed to handle position collisions (since the chance of hashcode collision is much lower than the chance of position collision).
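
A small sketch of that probing scheme (the `long[]` table and the zero-means-empty convention are assumptions of the sketch, not the real BytesToBytesMap):

```java
public class TriangularProbingSketch {
    // Open addressing with triangular-sequence probing: on the i-th collision
    // the probe advances by i slots, so probe offsets from the initial slot
    // are the triangular numbers 0, 1, 3, 6, 10, ... With a power-of-two
    // capacity, this sequence visits every slot exactly once.
    static int findSlot(long[] table, int hash) {
        int mask = table.length - 1;          // table.length must be a power of two
        int pos = hash & mask;
        for (int step = 1; step <= table.length; step++) {
            if (table[pos] == 0) {
                return pos;                   // empty slot found
            }
            pos = (pos + step) & mask;        // triangular advance
        }
        return -1;                            // table is full
    }
}
```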
      
      `UnsafeFixedWidthAggregationMap` allows regular Spark SQL `Row` objects to be used when probing the map.  Internally, it encodes these rows into `UnsafeRow` format using `UnsafeRowConverter`.  This conversion has a small overhead that can be eliminated in the future once we use UnsafeRows in other operators.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #5725 from JoshRosen/unsafe and squashes the following commits:
      
      eeee512 [Josh Rosen] Add converters for Null, Boolean, Byte, and Short columns.
      81f34f8 [Josh Rosen] Follow 'place children last' convention for GeneratedAggregate
      1bc36cc [Josh Rosen] Refactor UnsafeRowConverter to avoid unnecessary boxing.
      017b2dc [Josh Rosen] Remove BytesToBytesMap.finalize()
      50e9671 [Josh Rosen] Throw memory leak warning even in case of error; add warning about code duplication
      70a39e4 [Josh Rosen] Split MemoryManager into ExecutorMemoryManager and TaskMemoryManager:
      6e4b192 [Josh Rosen] Remove an unused method from ByteArrayMethods.
      de5e001 [Josh Rosen] Fix debug vs. trace in logging message.
      a19e066 [Josh Rosen] Rename unsafe Java test suites to match Scala test naming convention.
      78a5b84 [Josh Rosen] Add logging to MemoryManager
      ce3c565 [Josh Rosen] More comments, formatting, and code cleanup.
      529e571 [Josh Rosen] Measure timeSpentResizing in nanoseconds instead of milliseconds.
      3ca84b2 [Josh Rosen] Only zero the used portion of groupingKeyConversionScratchSpace
      162caf7 [Josh Rosen] Fix test compilation
      b45f070 [Josh Rosen] Don't redundantly store the offset from key to value, since we can compute this from the key size.
      a8e4a3f [Josh Rosen] Introduce MemoryManager interface; add to SparkEnv.
      0925847 [Josh Rosen] Disable MiMa checks for new unsafe module
      cde4132 [Josh Rosen] Add missing pom.xml
      9c19fc0 [Josh Rosen] Add configuration options for heap vs. offheap
      6ffdaa1 [Josh Rosen] Null handling improvements in UnsafeRow.
      31eaabc [Josh Rosen] Lots of TODO and doc cleanup.
      a95291e [Josh Rosen] Cleanups to string handling code
      afe8dca [Josh Rosen] Some Javadoc cleanup
      f3dcbfe [Josh Rosen] More mod replacement
      854201a [Josh Rosen] Import and comment cleanup
      06e929d [Josh Rosen] More warning cleanup
      ef6b3d3 [Josh Rosen] Fix a bunch of FindBugs and IntelliJ inspections
      29a7575 [Josh Rosen] Remove debug logging
      49aed30 [Josh Rosen] More long -> int conversion.
      b26f1d3 [Josh Rosen] Fix bug in murmur hash implementation.
      765243d [Josh Rosen] Enable optional performance metrics for hash map.
      23a440a [Josh Rosen] Bump up default hash map size
628f936 [Josh Rosen] Use ints instead of longs for indexing.
      92d5a06 [Josh Rosen] Address a number of minor code review comments.
      1f4b716 [Josh Rosen] Merge Unsafe code into the regular GeneratedAggregate, guarded by a configuration flag; integrate planner support and re-enable all tests.
      d85eeff [Josh Rosen] Add basic sanity test for UnsafeFixedWidthAggregationMap
      bade966 [Josh Rosen] Comment update (bumping to refresh GitHub cache...)
      b3eaccd [Josh Rosen] Extract aggregation map into its own class.
      d2bb986 [Josh Rosen] Update to implement new Row methods added upstream
      58ac393 [Josh Rosen] Use UNSAFE allocator in GeneratedAggregate (TODO: make this configurable)
      7df6008 [Josh Rosen] Optimizations related to zeroing out memory:
      c1b3813 [Josh Rosen] Fix bug in UnsafeMemoryAllocator.free():
      738fa33 [Josh Rosen] Add feature flag to guard UnsafeGeneratedAggregate
      c55bf66 [Josh Rosen] Free buffer once iterator has been fully consumed.
      62ab054 [Josh Rosen] Optimize for fact that get() is only called on String columns.
      c7f0b56 [Josh Rosen] Reuse UnsafeRow pointer in UnsafeRowConverter
      ae39694 [Josh Rosen] Add finalizer as "cleanup method of last resort"
c754ae1 [Josh Rosen] Now that the store*() contract has been strengthened, we can remove an extra lookup
      f764d13 [Josh Rosen] Simplify address + length calculation in Location.
      079f1bf [Josh Rosen] Some clarification of the BytesToBytesMap.lookup() / set() contract.
      1a483c5 [Josh Rosen] First version that passes some aggregation tests:
      fc4c3a8 [Josh Rosen] Sketch how the converters will be used in UnsafeGeneratedAggregate
      53ba9b7 [Josh Rosen] Start prototyping Java Row -> UnsafeRow converters
      1ff814d [Josh Rosen] Add reminder to free memory on iterator completion
      8a8f9df [Josh Rosen] Add skeleton for GeneratedAggregate integration.
      5d55cef [Josh Rosen] Add skeleton for Row implementation.
      f03e9c1 [Josh Rosen] Play around with Unsafe implementations of more string methods.
      ab68e08 [Josh Rosen] Begin merging the UTF8String implementations.
      480a74a [Josh Rosen] Initial import of code from Databricks unsafe utils repo.
  19. Mar 20, 2015
• [SPARK-6371] [build] Update version to 1.4.0-SNAPSHOT. · a7456459
      Marcelo Vanzin authored
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #5056 from vanzin/SPARK-6371 and squashes the following commits:
      
      63220df [Marcelo Vanzin] Merge branch 'master' into SPARK-6371
      6506f75 [Marcelo Vanzin] Use more fine-grained exclusion.
      178ba71 [Marcelo Vanzin] Oops.
      75b2375 [Marcelo Vanzin] Exclude VertexRDD in MiMA.
      a45a62c [Marcelo Vanzin] Work around MIMA warning.
      1d8a670 [Marcelo Vanzin] Re-group jetty exclusion.
      0e8e909 [Marcelo Vanzin] Ignore ml, don't ignore graphx.
      cef4603 [Marcelo Vanzin] Indentation.
      296cf82 [Marcelo Vanzin] [SPARK-6371] [build] Update version to 1.4.0-SNAPSHOT.
  20. Mar 05, 2015
  21. Jan 08, 2015
• [SPARK-4048] Enhance and extend hadoop-provided profile. · 48cecf67
      Marcelo Vanzin authored
      This change does a few things to make the hadoop-provided profile more useful:
      
      - Create new profiles for other libraries / services that might be provided by the infrastructure
      - Simplify and fix the poms so that the profiles are only activated while building assemblies.
      - Fix tests so that they're able to run when the profiles are activated
      - Add a new env variable to be used by distributions that use these profiles to provide the runtime
        classpath for Spark jobs and daemons.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #2982 from vanzin/SPARK-4048 and squashes the following commits:
      
      82eb688 [Marcelo Vanzin] Add a comment.
      eb228c0 [Marcelo Vanzin] Fix borked merge.
      4e38f4e [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
      9ef79a3 [Marcelo Vanzin] Alternative way to propagate test classpath to child processes.
      371ebee [Marcelo Vanzin] Review feedback.
      52f366d [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
      83099fc [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
      7377e7b [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
      322f882 [Marcelo Vanzin] Fix merge fail.
      f24e9e7 [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
      8b00b6a [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
      9640503 [Marcelo Vanzin] Cleanup child process log message.
      115fde5 [Marcelo Vanzin] Simplify a comment (and make it consistent with another pom).
      e3ab2da [Marcelo Vanzin] Fix hive-thriftserver profile.
      7820d58 [Marcelo Vanzin] Fix CliSuite with provided profiles.
      1be73d4 [Marcelo Vanzin] Restore flume-provided profile.
      d1399ed [Marcelo Vanzin] Restore jetty dependency.
      82a54b9 [Marcelo Vanzin] Remove unused profile.
      5c54a25 [Marcelo Vanzin] Fix HiveThriftServer2Suite with *-provided profiles.
      1fc4d0b [Marcelo Vanzin] Update dependencies for hive-thriftserver.
      f7b3bbe [Marcelo Vanzin] Add snappy to hadoop-provided list.
      9e4e001 [Marcelo Vanzin] Remove duplicate hive profile.
      d928d62 [Marcelo Vanzin] Redirect child stderr to parent's log.
      4d67469 [Marcelo Vanzin] Propagate SPARK_DIST_CLASSPATH on Yarn.
      417d90e [Marcelo Vanzin] Introduce "SPARK_DIST_CLASSPATH".
      2f95f0d [Marcelo Vanzin] Propagate classpath to child processes during testing.
      1adf91c [Marcelo Vanzin] Re-enable maven-install-plugin for a few projects.
      284dda6 [Marcelo Vanzin] Rework the "hadoop-provided" profile, add new ones.
  22. Jan 06, 2015
• SPARK-4159 [CORE] Maven build doesn't run JUnit test suites · 4cba6eb4
      Sean Owen authored
      This PR:
      
      - Reenables `surefire`, and copies config from `scalatest` (which is itself an old fork of `surefire`, so similar)
      - Tells `surefire` to test only Java tests
      - Enables `surefire` and `scalatest` for all children, and in turn eliminates some duplication.
      
      For me this causes the Scala and Java tests to be run once each, it seems, as desired. It doesn't affect the SBT build but works for Maven. I still need to verify that all of the Scala tests and Java tests are being run.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #3651 from srowen/SPARK-4159 and squashes the following commits:
      
      2e8a0af [Sean Owen] Remove specialized SPARK_HOME setting for REPL, YARN tests as it appears to be obsolete
      12e4558 [Sean Owen] Append to unit-test.log instead of overwriting, so that both surefire and scalatest output is preserved. Also standardize/correct comments a bit.
      e6f8601 [Sean Owen] Reenable Java tests by reenabling surefire with config cloned from scalatest; centralize test config in the parent
  23. Nov 19, 2014
• SPARK-3962 Marked scope as provided for external projects. · 1c938413
      Prashant Sharma authored
Somehow the Maven shade plugin gets stuck in an infinite loop of creating the effective pom.
      
      Author: Prashant Sharma <prashant.s@imaginea.com>
      Author: Prashant Sharma <scrapcodes@gmail.com>
      
      Closes #2959 from ScrapCodes/SPARK-3962/scope-provided and squashes the following commits:
      
      994d1d3 [Prashant Sharma] Fixed failing flume tests
      270b4fb [Prashant Sharma] Removed most of the unused code.
      bb3bbfd [Prashant Sharma] SPARK-3962 Marked scope as provided for external.
  24. Nov 18, 2014
• Bumping version to 1.3.0-SNAPSHOT. · 397d3aae
      Marcelo Vanzin authored
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #3277 from vanzin/version-1.3 and squashes the following commits:
      
      7c3c396 [Marcelo Vanzin] Added temp repo to sbt build.
      5f404ff [Marcelo Vanzin] Add another exclusion.
      19457e7 [Marcelo Vanzin] Update old version to 1.2, add temporary 1.2 repo.
      3c8d705 [Marcelo Vanzin] Workaround for MIMA checks.
      e940810 [Marcelo Vanzin] Bumping version to 1.3.0-SNAPSHOT.
  25. Sep 06, 2014
  26. Jul 30, 2014
• SPARK-2749 [BUILD]. Spark SQL Java tests aren't compiling in Jenkins' Maven builds; missing junit:junit dep · 6ab96a6f
Sean Owen authored
      
      The Maven-based builds in the build matrix have been failing for a few days:
      
      https://amplab.cs.berkeley.edu/jenkins/view/Spark/
      
      On inspection, it looks like the Spark SQL Java tests don't compile:
      
      https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-Master-Maven-pre-YARN/hadoop.version=1.0.4,label=centos/244/consoleFull
      
      I confirmed it by repeating the command vs master:
      
      `mvn -Dhadoop.version=1.0.4 -Dlabel=centos -DskipTests clean package`
      
The problem is that this module doesn't depend on JUnit. In fact, none of the modules do, but `com.novocode:junit-interface` (the SBT-JUnit bridge) pulls it in, in most places. However, this module doesn't depend on `com.novocode:junit-interface`.
      
      Adding the `junit:junit` dependency fixes the compile problem. In fact, the other modules with Java tests should probably depend on it explicitly instead of happening to get it via `com.novocode:junit-interface`, since that is a bit SBT/Scala-specific (and I am not even sure it's needed).
      
      Author: Sean Owen <srowen@gmail.com>
      
      Closes #1660 from srowen/SPARK-2749 and squashes the following commits:
      
      858ff7c [Sean Owen] Add explicit junit dep to other modules with Java tests for robustness
      9636794 [Sean Owen] Add junit dep so that Spark SQL Java tests compile
  27. Jul 28, 2014
• [SPARK-2410][SQL] Merging Hive Thrift/JDBC server (with Maven profile fix) · a7a9d144
      Cheng Lian authored
      JIRA issue: [SPARK-2410](https://issues.apache.org/jira/browse/SPARK-2410)
      
Another try for #1399 & #1600. Those two PRs break Jenkins builds because we made a separate profile `hive-thriftserver` in sub-project `assembly`, but the `hive-thriftserver` module is defined outside the `hive-thriftserver` profile. Thus every pull request that doesn't touch SQL code also executes the test suites defined in `hive-thriftserver`, and the tests fail because the related .class files are not included in the assembly jar.
      
      In the most recent commit, module `hive-thriftserver` is moved into its own profile to fix this problem. All previous commits are squashed for clarity.
      
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      
      Closes #1620 from liancheng/jdbc-with-maven-fix and squashes the following commits:
      
      629988e [Cheng Lian] Moved hive-thriftserver module definition into its own profile
      ec3c7a7 [Cheng Lian] Cherry picked the Hive Thrift server
  28. Jul 27, 2014
• Revert "[SPARK-2410][SQL] Merging Hive Thrift/JDBC server" · e5bbce9a
      Patrick Wendell authored
      This reverts commit f6ff2a61.
• [SPARK-2410][SQL] Merging Hive Thrift/JDBC server · f6ff2a61
      Cheng Lian authored
      (This is a replacement of #1399, trying to fix potential `HiveThriftServer2` port collision between parallel builds. Please refer to [these comments](https://github.com/apache/spark/pull/1399#issuecomment-50212572) for details.)
      
      JIRA issue: [SPARK-2410](https://issues.apache.org/jira/browse/SPARK-2410)
      
      Merging the Hive Thrift/JDBC server from [branch-1.0-jdbc](https://github.com/apache/spark/tree/branch-1.0-jdbc).
      
      Thanks chenghao-intel for his initial contribution of the Spark SQL CLI.
      
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      
      Closes #1600 from liancheng/jdbc and squashes the following commits:
      
      ac4618b [Cheng Lian] Uses random port for HiveThriftServer2 to avoid collision with parallel builds
      090beea [Cheng Lian] Revert changes related to SPARK-2678, decided to move them to another PR
      21c6cf4 [Cheng Lian] Updated Spark SQL programming guide docs
      fe0af31 [Cheng Lian] Reordered spark-submit options in spark-shell[.cmd]
      199e3fb [Cheng Lian] Disabled MIMA for hive-thriftserver
      1083e9d [Cheng Lian] Fixed failed test suites
      7db82a1 [Cheng Lian] Fixed spark-submit application options handling logic
      9cc0f06 [Cheng Lian] Starts beeline with spark-submit
      cfcf461 [Cheng Lian] Updated documents and build scripts for the newly added hive-thriftserver profile
      061880f [Cheng Lian] Addressed all comments by @pwendell
      7755062 [Cheng Lian] Adapts test suites to spark-submit settings
      40bafef [Cheng Lian] Fixed more license header issues
      e214aab [Cheng Lian] Added missing license headers
      b8905ba [Cheng Lian] Fixed minor issues in spark-sql and start-thriftserver.sh
      f975d22 [Cheng Lian] Updated docs for Hive compatibility and Shark migration guide draft
      3ad4e75 [Cheng Lian] Starts spark-sql shell with spark-submit
      a5310d1 [Cheng Lian] Make HiveThriftServer2 play well with spark-submit
      61f39f4 [Cheng Lian] Starts Hive Thrift server via spark-submit
      2c4c539 [Cheng Lian] Cherry picked the Hive Thrift server
  29. Jul 25, 2014
• Revert "[SPARK-2410][SQL] Merging Hive Thrift/JDBC server" · afd757a2
      Michael Armbrust authored
      This reverts commit 06dc0d2c.
      
#1399 is making Jenkins fail.  We should investigate and put this back after it's passing tests.
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #1594 from marmbrus/revertJDBC and squashes the following commits:
      
      59748da [Michael Armbrust] Revert "[SPARK-2410][SQL] Merging Hive Thrift/JDBC server"
• [SPARK-2410][SQL] Merging Hive Thrift/JDBC server · 06dc0d2c
      Cheng Lian authored
      JIRA issue:
      
      - Main: [SPARK-2410](https://issues.apache.org/jira/browse/SPARK-2410)
      - Related: [SPARK-2678](https://issues.apache.org/jira/browse/SPARK-2678)
      
      Cherry picked the Hive Thrift/JDBC server from [branch-1.0-jdbc](https://github.com/apache/spark/tree/branch-1.0-jdbc).
      
      (Thanks chenghao-intel for his initial contribution of the Spark SQL CLI.)
      
      TODO
      
      - [x] Use `spark-submit` to launch the server, the CLI and beeline
      - [x] Migration guideline draft for Shark users
      
      ----
      
      Hit by a bug in `SparkSubmitArguments` while working on this PR: all application options that are recognized by `SparkSubmitArguments` are stolen as `SparkSubmit` options. For example:
      
      ```bash
      $ spark-submit --class org.apache.hive.beeline.BeeLine spark-internal --help
      ```
      
      This actually shows usage information of `SparkSubmit` rather than `BeeLine`.
      
      ~~Fixed this bug here since the `spark-internal` related stuff also touches `SparkSubmitArguments` and I'd like to avoid conflict.~~
      
      **UPDATE** The bug mentioned above is now tracked by [SPARK-2678](https://issues.apache.org/jira/browse/SPARK-2678). Decided to revert changes to this bug since it involves more subtle considerations and worth a separate PR.
      
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      
      Closes #1399 from liancheng/thriftserver and squashes the following commits:
      
      090beea [Cheng Lian] Revert changes related to SPARK-2678, decided to move them to another PR
      21c6cf4 [Cheng Lian] Updated Spark SQL programming guide docs
      fe0af31 [Cheng Lian] Reordered spark-submit options in spark-shell[.cmd]
      199e3fb [Cheng Lian] Disabled MIMA for hive-thriftserver
      1083e9d [Cheng Lian] Fixed failed test suites
      7db82a1 [Cheng Lian] Fixed spark-submit application options handling logic
      9cc0f06 [Cheng Lian] Starts beeline with spark-submit
      cfcf461 [Cheng Lian] Updated documents and build scripts for the newly added hive-thriftserver profile
      061880f [Cheng Lian] Addressed all comments by @pwendell
      7755062 [Cheng Lian] Adapts test suites to spark-submit settings
      40bafef [Cheng Lian] Fixed more license header issues
      e214aab [Cheng Lian] Added missing license headers
      b8905ba [Cheng Lian] Fixed minor issues in spark-sql and start-thriftserver.sh
      f975d22 [Cheng Lian] Updated docs for Hive compatibility and Shark migration guide draft
      3ad4e75 [Cheng Lian] Starts spark-sql shell with spark-submit
      a5310d1 [Cheng Lian] Make HiveThriftServer2 play well with spark-submit
      61f39f4 [Cheng Lian] Starts Hive Thrift server via spark-submit
      2c4c539 [Cheng Lian] Cherry picked the Hive Thrift server
  30. Jul 10, 2014
• [SPARK-1776] Have Spark's SBT build read dependencies from Maven. · 628932b8
      Prashant Sharma authored
This patch introduces the new way of working while retaining the existing ways of doing things.
      
For example, the build instruction for YARN in Maven is
`mvn -Pyarn -Phadoop-2.2 clean package -DskipTests`
In sbt it can become
`MAVEN_PROFILES="yarn, hadoop-2.2" sbt/sbt clean assembly`
It also supports
      `sbt/sbt -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 clean assembly`
      
      Author: Prashant Sharma <prashant.s@imaginea.com>
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #772 from ScrapCodes/sbt-maven and squashes the following commits:
      
      a8ac951 [Prashant Sharma] Updated sbt version.
      62b09bb [Prashant Sharma] Improvements.
      fa6221d [Prashant Sharma] Excluding sql from mima
      4b8875e [Prashant Sharma] Sbt assembly no longer builds tools by default.
72651ca [Prashant Sharma] Addresses code review comments.
      acab73d [Prashant Sharma] Revert "Small fix to run-examples script."
      ac4312c [Prashant Sharma] Revert "minor fix"
      6af91ac [Prashant Sharma] Ported oldDeps back. + fixes issues with prev commit.
65cf06c [Prashant Sharma] Servlet API jars mess up with the other servlet jars on the class path.
      446768e [Prashant Sharma] minor fix
      89b9777 [Prashant Sharma] Merge conflicts
      d0a02f2 [Prashant Sharma] Bumped up pom versions, Since the build now depends on pom it is better updated there. + general cleanups.
      dccc8ac [Prashant Sharma] updated mima to check against 1.0
      a49c61b [Prashant Sharma] Fix for tools jar
      a2f5ae1 [Prashant Sharma] Fixes a bug in dependencies.
      cf88758 [Prashant Sharma] cleanup
      9439ea3 [Prashant Sharma] Small fix to run-examples script.
      96cea1f [Prashant Sharma] SPARK-1776 Have Spark's SBT build read dependencies from Maven.
      36efa62 [Patrick Wendell] Set project name in pom files and added eclipse/intellij plugins.
      4973dbd [Patrick Wendell] Example build using pom reader.
  31. Jun 05, 2014
  32. May 10, 2014
• SPARK-1789. Multiple versions of Netty dependencies cause FlumeStreamSuite failure · 2b7bd29e
      Sean Owen authored
TL;DR: there is a bit of JAR hell trouble with Netty that can be mostly resolved, and resolving it fixes a test failure.
      
I hit the error described at http://apache-spark-user-list.1001560.n3.nabble.com/SparkContext-startup-time-out-td1753.html while running FlumeStreamSuite, and have for a short while (is it just me?)
      
velvia notes:
"I have found a workaround.  If you add akka 2.2.4 to your dependencies, then everything works, probably because akka 2.2.4 brings in a newer version of Netty."
      
      There are at least 3 versions of Netty in play in the build:
      
      - the new Flume 1.4.0 dependency brings in io.netty:netty:3.4.0.Final, and that is the immediate problem
      - the custom version of akka 2.2.3 depends on io.netty:netty:3.6.6.
      - but, Spark Core directly uses io.netty:netty-all:4.0.17.Final
      
      The POMs try to exclude other versions of netty, but are excluding org.jboss.netty:netty, when in fact older versions of io.netty:netty (not netty-all) are also an issue.
      
      The org.jboss.netty:netty excludes are largely unnecessary. I replaced many of them with io.netty:netty exclusions until everything agreed on io.netty:netty-all:4.0.17.Final.
      
      But this didn't work, since Akka 2.2.3 doesn't work with Netty 4.x. Down-grading to 3.6.6.Final across the board made some Spark code not compile.
      
      If the build *keeps* io.netty:netty:3.6.6.Final as well, everything seems to work. Part of the reason seems to be that Netty 3.x used the old `org.jboss.netty` packages. This is less than ideal, but is no worse than the current situation.
      
      So this PR resolves the issue and improves the JAR hell, even if it leaves the existing theoretical Netty 3-vs-4 conflict:
      
      - Remove org.jboss.netty excludes where possible, for clarity; they're not needed except with Hadoop artifacts
      - Add io.netty:netty excludes where needed -- except, let akka keep its io.netty:netty
      - Change a bit of test code that actually depended on Netty 3.x, to use 4.x equivalent
      - Update SBT build accordingly
      
      A better change would be to update Akka far enough such that it agrees on Netty 4.x, but I don't know if that's feasible.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #723 from srowen/SPARK-1789 and squashes the following commits:
      
      43661b7 [Sean Owen] Update and add Netty excludes to prevent some JAR conflicts that cause test issues
  33. Apr 29, 2014
• Improved build configuration · 030f2c21
      witgo authored
1. Fix SPARK-1441: compile spark core error with hadoop 0.23.x
2. Fix SPARK-1491: maven hadoop-provided profile fails to build
3. Fix inconsistent version dependencies on org.scala-lang:* and org.apache.avro:*
4. Reformatted sql/catalyst/pom.xml, sql/hive/pom.xml, and sql/core/pom.xml (four-space indentation changed to two spaces)
      
      Author: witgo <witgo@qq.com>
      
      Closes #480 from witgo/format_pom and squashes the following commits:
      
      03f652f [witgo] review commit
      b452680 [witgo] Merge branch 'master' of https://github.com/apache/spark into format_pom
      bee920d [witgo] revert fix SPARK-1629: Spark Core missing commons-lang dependence
      7382a07 [witgo] Merge branch 'master' of https://github.com/apache/spark into format_pom
      6902c91 [witgo] fix SPARK-1629: Spark Core missing commons-lang dependence
      0da4bc3 [witgo] merge master
      d1718ed [witgo] Merge branch 'master' of https://github.com/apache/spark into format_pom
      e345919 [witgo] add avro dependency to yarn-alpha
      77fad08 [witgo] Merge branch 'master' of https://github.com/apache/spark into format_pom
      62d0862 [witgo] Fix org.scala-lang: * inconsistent versions dependency
      1a162d7 [witgo] Merge branch 'master' of https://github.com/apache/spark into format_pom
      934f24d [witgo] review commit
      cf46edc [witgo] exclude jruby
      06e7328 [witgo] Merge branch 'SparkBuild' into format_pom
      99464d2 [witgo] fix maven hadoop-provided profile fails to build
      0c6c1fc [witgo] Fix compile spark core error with hadoop 0.23.x
      6851bec [witgo] Maintain consistent SparkBuild.scala, pom.xml
  34. Mar 08, 2014
• SPARK-1193. Fix indentation in pom.xmls · a99fb374
      Sandy Ryza authored
      Author: Sandy Ryza <sandy@cloudera.com>
      
      Closes #91 from sryza/sandy-spark-1193 and squashes the following commits:
      
      a878124 [Sandy Ryza] SPARK-1193. Fix indentation in pom.xmls
  35. Mar 02, 2014
• SPARK-1121: Include avro for yarn-alpha builds · c3f5e075
      Patrick Wendell authored
      This lets us explicitly include Avro based on a profile for 0.23.X
      builds. It makes me sad how convoluted it is to express this logic
      in Maven. @tgraves and @sryza curious if this works for you.
      
      I'm also considering just reverting to how it was before. The only
      real problem was that Spark advertised a dependency on Avro
      even though it only really depends transitively on Avro through
      other deps.
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #49 from pwendell/avro-build-fix and squashes the following commits:
      
      8d6ee92 [Patrick Wendell] SPARK-1121: Add avro to yarn-alpha profile