- Oct 02, 2017
-
-
Holden Karau authored
## What changes were proposed in this pull request? When zinc is running, the pwd might be the root of the project. A quick solution is to not go a level up in case we are already in the root rather than in root/core/. If we are in the root, everything works fine; if we are in core, add a script which goes up a level and runs from there. ## How was this patch tested? set -x in the SparkR install scripts. Author: Holden Karau <holden@us.ibm.com> Closes #19402 from holdenk/SPARK-22167-sparkr-packaging-issue-allow-zinc.
-
guoxiaolong authored
## What changes were proposed in this pull request? There is a problem with the table CSS style. 1. At present, the table CSS style is too crowded, and the table width cannot adapt itself. 2. The table CSS style differs from the job page, stage page, task page, master page, worker page, etc.; the Spark web UI needs to be consistent. Fix before / fix after: (screenshots omitted). ## How was this patch tested? Manual tests Please review http://spark.apache.org/contributing.html before opening a pull request. Author: guoxiaolong <guo.xiaolong1@zte.com.cn> Closes #19397 from guoxiaolongzte/SPARK-22173.
-
- Oct 01, 2017
-
-
Liang-Chi Hsieh authored
## What changes were proposed in this pull request? SPARK-21690 makes one-pass `Imputer` by parallelizing the computation of all input columns. When we transform dataset with `ImputerModel`, we do `withColumn` on all input columns sequentially. We can also do this on all input columns at once by adding a `withColumns` API to `Dataset`. The new `withColumns` API is for internal use only now. ## How was this patch tested? Existing tests for `ImputerModel`'s change. Added tests for `withColumns` API. Author: Liang-Chi Hsieh <viirya@gmail.com> Closes #19229 from viirya/SPARK-22001.
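A minimal, hedged sketch (not the PR's code) of what batching column additions replaces, assuming the internal API takes parallel sequences of column names and `Column` expressions; the real signature may differ:
```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.lit

// Hypothetical imputed output columns and their surrogate values.
val outputCols: Seq[String] = Seq("a_imputed", "b_imputed")
val surrogates: Seq[Column] = Seq(lit(1.5), lit(2.5))

// Sequential approach: one projection per withColumn call.
def addSequentially(df: DataFrame): DataFrame =
  outputCols.zip(surrogates).foldLeft(df) {
    case (cur, (name, value)) => cur.withColumn(name, value)
  }

// With an internal withColumns(names, cols) API, the same columns would be
// added in a single projection instead (hedged; the API is Spark-internal):
// df.withColumns(outputCols, surrogates)
```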
-
hyukjinkwon authored
## What changes were proposed in this pull request? Currently, we pin lintr to jimhester/lintr@a769c0b (see [this](https://github.com/apache/spark/commit/7d1175011c976756efcd4e4e4f70a8fd6f287026) and [SPARK-14074](https://issues.apache.org/jira/browse/SPARK-14074)). I first tested and checked lintr-1.0.1, but it appears many important fixes are missing (for example, checking the 100-character line length). So, I instead tried the latest commit, https://github.com/jimhester/lintr/commit/5431140ffea65071f1327625d4a8de9688fa7e72, locally and fixed the check failures. It appears to have fixed many bugs and now finds many instances that I have observed from time to time and thought should be caught; I filed [the results](https://gist.github.com/HyukjinKwon/4f59ddcc7b6487a02da81800baca533c). The downside is that it now takes about 7 minutes locally (it was about 2 minutes before). ## How was this patch tested? Manually, `./dev/lint-r` after manually updating the lintr package. Author: hyukjinkwon <gurwls223@gmail.com> Author: zuotingbing <zuo.tingbing9@zte.com.cn> Closes #19290 from HyukjinKwon/upgrade-r-lint.
-
- Sep 29, 2017
-
-
Takeshi Yamamuro authored
## What changes were proposed in this pull request? Since the current code ignores WITH clauses when checking input relations in TPC-DS queries, benchmark results get inaccurate per-row processing times. For example, in `q2`, this fix catches all the input relations: `web_sales`, `date_dim`, and `catalog_sales` (the current code catches `date_dim` only). About one-third of the TPC-DS queries use WITH clauses, so I think this is worth fixing. ## How was this patch tested? Manually checked. Author: Takeshi Yamamuro <yamamuro@apache.org> Closes #19344 from maropu/RespectWithInTPCDSBench.
-
gatorsmile authored
### What changes were proposed in this pull request? `tempTables` is not right. To be consistent, we need to rename the internal variable names/comments to tempViews in SessionCatalog too. ### How was this patch tested? N/A Author: gatorsmile <gatorsmile@gmail.com> Closes #19117 from gatorsmile/renameTempTablesToTempViews.
-
gatorsmile authored
This reverts commit a2516f41.
-
gatorsmile authored
## What changes were proposed in this pull request? Added IMPALA-modified TPCDS queries to TPC-DS query suites. - Ref: https://github.com/cloudera/impala-tpcds-kit/tree/master/queries ## How was this patch tested? N/A Author: gatorsmile <gatorsmile@gmail.com> Closes #19386 from gatorsmile/addImpalaQueries.
-
Holden Karau authored
## What changes were proposed in this pull request? Use the GPG_KEY param, use a non-hardcoded path for lsof, and remove the version swap since it wasn't really needed. Also export JAVA_HOME for downstream scripts. ## How was this patch tested? Rolled 2.1.2 RC2 Author: Holden Karau <holden@us.ibm.com> Closes #19359 from holdenk/SPARK-22129-fix-signing.
-
Sean Owen authored
## What changes were proposed in this pull request? Add 'flume' profile to enable Flume-related integration modules ## How was this patch tested? Existing tests; no functional change Author: Sean Owen <sowen@cloudera.com> Closes #19365 from srowen/SPARK-22142.
-
Wang Gengliang authored
## What changes were proposed in this pull request? Add comments for specifying the position of batch "Check Cartesian Products", as rxin suggested in https://github.com/apache/spark/pull/19362 . ## How was this patch tested? Unit test Author: Wang Gengliang <ltnwgl@gmail.com> Closes #19379 from gengliangwang/SPARK-22141-followup.
-
Marco Gaido authored
## What changes were proposed in this pull request? Reading ORC files containing special characters like '%' fails with a FileNotFoundException. This PR aims to fix the problem. ## How was this patch tested? Added UT. Author: Marco Gaido <marcogaido91@gmail.com> Author: Marco Gaido <mgaido@hortonworks.com> Closes #19368 from mgaido91/SPARK-22146.
-
- Sep 28, 2017
-
-
Reynold Xin authored
[SPARK-22160][SQL] Make sample points per partition (in range partitioner) configurable and bump the default value up to 100 ## What changes were proposed in this pull request? Spark's RangePartitioner hard-codes the number of sampling points per partition to 20. This is sometimes too low. This ticket makes it configurable, via spark.sql.execution.rangeExchange.sampleSizePerPartition, and raises the default in Spark SQL to 100. ## How was this patch tested? Added a pretty sophisticated test based on a chi-square test ... Author: Reynold Xin <rxin@databricks.com> Closes #19387 from rxin/SPARK-22160.
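A small sketch of using the configuration key named above, assuming a `SparkSession` called `spark` is in scope:
```scala
// Raise the per-partition sample size used by Spark SQL's range exchange
// (the PR raises the SQL default from 20 to 100; higher values give better
// range boundaries at the cost of a slightly larger sampling job).
spark.conf.set("spark.sql.execution.rangeExchange.sampleSizePerPartition", "200")

// Any range-partitioned shuffle, e.g. a global sort, picks up the setting.
val sorted = spark.range(0L, 1000000L).toDF("id").orderBy("id")
```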
-
Reynold Xin authored
## What changes were proposed in this pull request? Rename spark.sql.execution.arrow.enable and spark.sql.codegen.aggregate.map.twolevel.enable so that both end in "enabled" (i.e. `enable` -> `enabled`). ## How was this patch tested? N/A Author: Reynold Xin <rxin@databricks.com> Closes #19384 from rxin/SPARK-22159.
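The renamed keys as they would be set after this change (a sketch; assumes a `SparkSession` named `spark`):
```scala
// Both flags now end in "enabled" rather than "enable".
spark.conf.set("spark.sql.execution.arrow.enabled", "true")
spark.conf.set("spark.sql.codegen.aggregate.map.twolevel.enabled", "true")
```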
-
Reynold Xin authored
## What changes were proposed in this pull request? For some reason when we added the Exec suffix to all physical operators, we missed this one. I was looking for this physical operator today and couldn't find it, because I was looking for ExchangeExec. ## How was this patch tested? This is a simple rename and should be covered by existing tests. Author: Reynold Xin <rxin@databricks.com> Closes #19376 from rxin/SPARK-22153.
-
Sean Owen authored
[SPARK-22128][CORE] Update paranamer to 2.8 to avoid BytecodeReadingParanamer ArrayIndexOutOfBoundsException with Scala 2.12 + Java 8 lambda ## What changes were proposed in this pull request? Un-manage jackson-module-paranamer version to let it use the version desired by jackson-module-scala; manage paranamer up to 2.8 for jackson-module-scala 2.7.9, to override avro 1.7.7's desired paranamer 2.3 ## How was this patch tested? Existing tests Author: Sean Owen <sowen@cloudera.com> Closes #19352 from srowen/SPARK-22128.
-
Paul Mackles authored
## What changes were proposed in this pull request? Fix a trivial bug with how metrics are registered in the Mesos dispatcher. The bug resulted in a new registry being created each time the metricRegistry() method was called. ## How was this patch tested? Verified manually on a local Mesos setup Author: Paul Mackles <pmackles@adobe.com> Closes #19358 from pmackles/SPARK-22135.
-
- Sep 27, 2017
-
-
zhoukang authored
## What changes were proposed in this pull request? This patch adds the most recent failure reason to the task set blacklist abort message, so it is shown on the Spark UI and the user can see the failure reason directly. Until now, every job aborted because of a completed blacklist only logged a message like the following, with no further information: `Aborting $taskSet because task $indexInTaskSet (partition $partition) cannot run anywhere due to node and executor blacklist. Blacklisting behavior can be configured via spark.blacklist.*.` **After this change:**
```
Aborting TaskSet 0.0 because task 0 (partition 0) cannot run anywhere due to node and executor blacklist. Most recent failure: Some(Lost task 0.1 in stage 0.0 (TID 3,xxx, executor 1): java.lang.Exception: Fake error!
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:73)
  at org.apache.spark.scheduler.Task.run(Task.scala:99)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:305)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)
). Blacklisting behavior can be configured via spark.blacklist.*.
```
## How was this patch tested? Unit tests and manual testing. Author: zhoukang <zhoukang199191@gmail.com> Closes #19338 from caneGuy/zhoukang/improve-blacklist.
-
Bryan Cutler authored
## What changes were proposed in this pull request? Fixed some minor issues with pandas_udf related docs and formatting. ## How was this patch tested? NA Author: Bryan Cutler <cutlerb@gmail.com> Closes #19375 from BryanCutler/arrow-pandas_udf-cleanup-minor.
-
gatorsmile authored
## What changes were proposed in this pull request? Currently, we do not run TPC-DS queries as regular test cases. Thus, we need to add a test suite using empty tables to ensure that new code changes will not break them; for example, optimizer/analyzer batches should not exceed the max iteration. ## How was this patch tested? N/A Author: gatorsmile <gatorsmile@gmail.com> Closes #19361 from gatorsmile/tpcdsQuerySuite.
-
Herman van Hovell authored
## What changes were proposed in this pull request? `WritableColumnVector` does not close its child column vectors. This can create memory leaks for `OffHeapColumnVector`, where we do not clean up the memory allocated by a vector's children. This can be especially bad for string columns (which use a child byte column vector). ## How was this patch tested? I have updated the existing tests to always use both on-heap and off-heap vectors. Testing and diagnosis were done locally. Author: Herman van Hovell <hvanhovell@databricks.com> Closes #19367 from hvanhovell/SPARK-22143.
-
Sean Owen authored
## What changes were proposed in this pull request? Fix finalizer checkstyle violation by just turning it off; re-disable checkstyle as it won't be run by SBT PR builder. See https://github.com/apache/spark/pull/18887#issuecomment-332580700 ## How was this patch tested? `./dev/lint-java` runs successfully Author: Sean Owen <sowen@cloudera.com> Closes #19371 from srowen/HotfixFinalizerCheckstlye.
-
Takuya UESHIN authored
## What changes were proposed in this pull request? Currently we use the Arrow File format to communicate with the Python worker when invoking a vectorized UDF, but we can use the Arrow Stream format instead. This PR replaces the Arrow File format with the Arrow Stream format. ## How was this patch tested? Existing tests. Author: Takuya UESHIN <ueshin@databricks.com> Closes #19349 from ueshin/issues/SPARK-22125.
-
Kazuaki Ishizaki authored
## What changes were proposed in this pull request? This PR allows us to scan a string containing only white space (e.g. `" "`) once, while the current implementation scans it twice (right to left, and then left to right). ## How was this patch tested? Existing test suites Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com> Closes #19355 from kiszk/SPARK-22130.
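A minimal sketch of the idea (not Spark's `UTF8String` implementation): if the left-to-right scan reaches the end of the string, the input was all spaces and the second, right-to-left scan is skipped entirely:
```scala
// Single-pass handling of all-space input: one forward scan decides the
// result is empty, so no trailing-space scan is needed afterwards.
def trimSpaces(s: String): String = {
  var start = 0
  while (start < s.length && s.charAt(start) == ' ') start += 1
  if (start == s.length) return ""   // e.g. "   " is scanned exactly once
  var end = s.length - 1
  while (end > start && s.charAt(end) == ' ') end -= 1
  s.substring(start, end + 1)
}
```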
-
guoxiaolong authored
## What changes were proposed in this pull request? Proposal: provide links that jump to the Running Queries, Completed Queries and Failed Queries sections, and add a (count) for each of them. This is a small optimization in the SQL web UI. Fix before / fix after: (screenshots omitted). ## How was this patch tested? Manual tests Please review http://spark.apache.org/contributing.html before opening a pull request. Author: guoxiaolong <guo.xiaolong1@zte.com.cn> Closes #19346 from guoxiaolongzte/SPARK-20785.
-
Marcelo Vanzin authored
The application listing is still generated from event logs, but is now stored in a KVStore instance. By default an in-memory store is used, but a new config allows setting a local disk path to store the data, in which case a LevelDB store will be created. The provider stores things internally using the public REST API types; I believe this is better going forward since it will make it easier to get rid of the internal history server API which is mostly redundant at this point. I also added a finalizer to LevelDBIterator, to make sure that resources are eventually released. This helps when code iterates but does not exhaust the iterator, thus not triggering the auto-close code. HistoryServerSuite was modified to not re-start the history server unnecessarily; this makes the json validation tests run more quickly. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #18887 from vanzin/SPARK-20642.
-
Wang Gengliang authored
## What changes were proposed in this pull request? When inferring constraints from children, a Join's condition can be simplified to None. For example,
```
val testRelation = LocalRelation('a.int)
val x = testRelation.as("x")
val y = testRelation.where($"a" === 2 && !($"a" === 2)).as("y")
x.join(y).where($"x.a" === $"y.a")
```
The plan becomes
```
Join Inner
:- LocalRelation <empty>, [a#23]
+- LocalRelation <empty>, [a#224]
```
and the Cartesian products check throws an exception for this plan. Propagating the empty relation before checking for Cartesian products resolves the issue. ## How was this patch tested? Unit test Author: Wang Gengliang <ltnwgl@gmail.com> Closes #19362 from gengliangwang/MoveCheckCartesianProducts.
-
- Sep 26, 2017
-
-
goldmedal authored
## What changes were proposed in this pull request? We added a method to the Scala API for creating a `DataFrame` from a `Dataset[String]` storing CSV in [SPARK-15463](https://issues.apache.org/jira/browse/SPARK-15463), but PySpark doesn't have `Dataset` to support this feature. Therefore, I add an API to create a `DataFrame` from an `RDD[String]` storing CSV, which is also consistent with PySpark's `spark.read.json`. For example:
```
>>> rdd = sc.textFile('python/test_support/sql/ages.csv')
>>> df2 = spark.read.csv(rdd)
>>> df2.dtypes
[('_c0', 'string'), ('_c1', 'string')]
```
## How was this patch tested? Added unit test cases. Author: goldmedal <liugs963@gmail.com> Closes #19339 from goldmedal/SPARK-22112.
-
hyukjinkwon authored
Closes #13794 Closes #18474 Closes #18897 Closes #18978 Closes #19152 Closes #19238 Closes #19295 Closes #19334 Closes #19335 Closes #19347 Closes #19236 Closes #19244 Closes #19300 Closes #19315 Closes #19356 Closes #15009 Closes #18253 Author: hyukjinkwon <gurwls223@gmail.com> Closes #19348 from HyukjinKwon/stale-prs.
-
Juliusz Sompolski authored
## What changes were proposed in this pull request? Address PR comments that appeared post-merge, to rename `addExtraCode` to `addInnerClass`, and not count the size of the inner class to the size of the outer class. ## How was this patch tested? YOLO. Author: Juliusz Sompolski <julek@databricks.com> Closes #19353 from juliuszsompolski/SPARK-22103followup.
-
Liang-Chi Hsieh authored
## What changes were proposed in this pull request? We can override `usedInputs` to claim that an operator defers input evaluation. `Sample` and `Limit` are two operators which should claim it but don't. We should do it. ## How was this patch tested? Existing tests. Author: Liang-Chi Hsieh <viirya@gmail.com> Closes #19345 from viirya/SPARK-22124.
-
- Sep 25, 2017
-
-
Bryan Cutler authored
## What changes were proposed in this pull request? This change disables the use of 0-parameter pandas_udfs because the API is overly complex and awkward, and it can easily be worked around by using an index column as an input argument. Also added doctests for pandas_udfs, which revealed bugs in handling empty partitions and in using the pandas_udf decorator. ## How was this patch tested? Reworked the existing 0-parameter test to verify an error is raised, added a doctest for pandas_udf, added new tests for empty partitions and decorator usage. Author: Bryan Cutler <cutlerb@gmail.com> Closes #19325 from BryanCutler/arrow-pandas_udf-0-param-remove-SPARK-22106.
-
Greg Owen authored
## What changes were proposed in this pull request? During TestHiveSparkSession.reset(), which is called after each TestHiveSingleton suite, we now delete and recreate the Hive warehouse directory. ## How was this patch tested? Ran full suite of tests locally, verified that they pass. Author: Greg Owen <greg@databricks.com> Closes #19341 from GregOwen/SPARK-22120.
-
Juliusz Sompolski authored
## What changes were proposed in this pull request? HashAggregateExec codegen uses two paths for fast hash table and a generic one. It generates code paths for iterating over both, and both code paths generate the consume code of the parent operator, resulting in that code being expanded twice. This leads to a long generated function that might be an issue for the compiler (see e.g. SPARK-21603). I propose to remove the double expansion by generating the consume code in a helper function that can just be called from both iterating loops. An issue with separating the `consume` code to a helper function was that a number of places relied and assumed on being in the scope of an outside `produce` loop and e.g. use `continue` to jump out. I replaced such code flows with nested scopes. It is code that should be handled the same by compiler, while getting rid of depending on assumptions that are outside of the `consume`'s own scope. ## How was this patch tested? Existing test coverage. Author: Juliusz Sompolski <julek@databricks.com> Closes #19324 from juliuszsompolski/aggrconsumecodegen.
-
Imran Rashid authored
## What changes were proposed in this pull request? MemoryStore.evictBlocksToFreeSpace acquires write locks for all the blocks it intends to evict up front. If there is a failure to evict blocks (eg., some failure dropping a block to disk), then we have to release the lock. Otherwise the lock is never released and an executor trying to get the lock will wait forever. ## How was this patch tested? Added unit test. Author: Imran Rashid <irashid@cloudera.com> Closes #19311 from squito/SPARK-22083.
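A generic sketch of the pattern the fix applies (not the actual `MemoryStore` code; the lock/drop interfaces are illustrative): if dropping blocks fails partway through, the write locks already taken for the not-yet-dropped blocks must be released so other tasks do not wait forever:
```scala
// Illustrative interfaces; the real code goes through BlockInfoManager and
// assumes a successful drop releases that block's lock.
def evictBlocks(
    blockIds: Seq[String],
    lockForWriting: String => Unit,
    unlock: String => Unit,
    dropBlock: String => Unit): Unit = {
  blockIds.foreach(lockForWriting)   // acquire all write locks up front
  var dropped = 0
  try {
    blockIds.foreach { id =>
      dropBlock(id)                  // may throw, e.g. failure writing to disk
      dropped += 1
    }
  } finally {
    // Release the locks still held for blocks that were never dropped.
    blockIds.drop(dropped).foreach(unlock)
  }
}
```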
-
Zhenhua Wang authored
[SPARK-22100][SQL] Make percentile_approx support date/timestamp type and change the output type to be the same as input type ## What changes were proposed in this pull request? The `percentile_approx` function previously accepted numeric type input and output double type results. But since all numeric types, date and timestamp types are represented as numerics internally, `percentile_approx` can support them easily. After this PR, it supports date type, timestamp type and numeric types as input types. The result type is also changed to be the same as the input type, which is more reasonable for percentiles. This change is also required when we generate equi-height histograms for these types. ## How was this patch tested? Added a new test and modified some existing tests. Author: Zhenhua Wang <wangzhenhua@huawei.com> Closes #19321 from wzhfy/approx_percentile_support_types.
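An illustration of the new behavior (a sketch; the table and column names are hypothetical, and a `SparkSession` named `spark` is assumed):
```scala
// percentile_approx over a timestamp column: after this change the result is
// a timestamp (the same type as the input) rather than a double.
val medianEventTime = spark.sql(
  "SELECT percentile_approx(event_time, 0.5) AS median_ts FROM events")
```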
-
- Sep 24, 2017
-
-
John O'Leary authored
## What changes were proposed in this pull request? Updated docs so that a line of python in the quick start guide executes. Closes #19283 ## How was this patch tested? Existing tests. Author: John O'Leary <jgoleary@gmail.com> Closes #19326 from jgoleary/issues/22107.
-
Sean Owen authored
[SPARK-22087][SPARK-14650][WIP][BUILD][REPL][CORE] Compile Spark REPL for Scala 2.12 + other 2.12 fixes ## What changes were proposed in this pull request? Enable Scala 2.12 REPL. Fix most remaining issues with 2.12 compilation and warnings, including:
- Selecting Kafka 0.10.1+ for Scala 2.12 and patching over a minor API difference
- Fixing lots of "eta expansion of zero arg method deprecated" warnings
- Resolving the SparkContext.sequenceFile implicits compile problem
- Fixing an odd but valid jetty-server missing dependency in hive-thriftserver
## How was this patch tested? Existing tests Author: Sean Owen <sowen@cloudera.com> Closes #19307 from srowen/Scala212.
-
zuotingbing authored
## What changes were proposed in this pull request? EventLoggingListener uses `val in = new BufferedInputStream(fs.open(log))` and will close it if `codec.map(_.compressedInputStream(in)).getOrElse(in)` throws an exception. But if `CompressionCodec.createCodec(new SparkConf, c)` throws an exception, the BufferedInputStream `in` is never closed. ## How was this patch tested? Existing tests Author: zuotingbing <zuo.tingbing9@zte.com.cn> Closes #19277 from zuotingbing/SPARK-22058.
-
hyukjinkwon authored
## What changes were proposed in this pull request? This PR proposes to remove `assume` in `Utils.resolveURIs` and replace `assume` with `assert` in `Utils.resolveURI` in the test cases in `UtilsSuite`. `Utils.resolveURIs` appears to support both multiple and single paths as input, so it does not seem meaningful to check whether the input contains `,`. For the test of `Utils.resolveURI`, I replaced `assume` with `assert` because it takes a single path, and to prevent future mistakes when adding more tests here. The `assume` in `HiveDDLSuite` should be an `assert`, since it is the final check in the test. ## How was this patch tested? Fixed unit tests. Author: hyukjinkwon <gurwls223@gmail.com> Closes #19332 from HyukjinKwon/SPARK-22093.
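For context, a small ScalaTest sketch (not from this PR; the suite name and environment variable are made up) of the difference the change relies on: a false `assume` cancels the test, while a false `assert` fails it:
```scala
import org.scalatest.FunSuite

class ResolveURIExampleSuite extends FunSuite {
  test("assume cancels, assert fails") {
    // A false condition here marks the test as canceled, not failed.
    assume(sys.env.contains("OPTIONAL_TEST_ENV"))
    // A false condition here fails the test, which is what we want for
    // checks that must always hold.
    assert("a,b".split(",").length == 2)
  }
}
```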
-