Commits · d6ddfdf60e77340256873b5acf08e85f95cf3bc2 · cs525-sp18-g07 / spark

Mar 27, 2017

[MINOR][SPARKR] Move 'Data type mapping between R and Spark' to right place in SparkR doc. · 1d00761b

Yanbo Liang authored 8 years ago

Section ```Data type mapping between R and Spark``` was put in the wrong place in SparkR doc currently, we should move it to a separate section.

## What changes were proposed in this pull request?
Before this PR:
![image](https://cloud.githubusercontent.com/assets/1962026/24340911/bc01a532-126a-11e7-9a08-0d60d13a547c.png)

After this PR:
![image](https://cloud.githubusercontent.com/assets/1962026/24340938/d9d32a9a-126a-11e7-8891-d2f5b46e0c71.png)

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #17440 from yanboliang/sparkr-doc.

1d00761b

Mar 23, 2017

[SPARK-10849][SQL] Adds option to the JDBC data source write for user to... · c7911807

sureshthalamati authored 8 years ago

[SPARK-10849][SQL] Adds option to the JDBC data source write for user to specify database column type for the create table

## What changes were proposed in this pull request?
Currently JDBC data source creates tables in the target database using the default type mapping, and the JDBC dialect mechanism. If users want to specify different database data type for only some of columns, there is no option available. In scenarios where default mapping does not work, users are forced to create tables on the target database before writing. This workaround is probably not acceptable from a usability point of view. This PR is to provide a user-defined type mapping for specific columns.

The solution is to allow users to specify database column data type for the create table as JDBC datasource option(createTableColumnTypes) on write. Data type information can be specified in the same format as table schema DDL format (e.g: `name CHAR(64), comments VARCHAR(1024)`).

All supported target database types can not be specified , the data types has to be valid spark sql data types also. For example user can not specify target database CLOB data type. This will be supported in the follow-up PR.

Example:
```Scala
df.write
.option("createTableColumnTypes", "name CHAR(64), comments VARCHAR(1024)")
.jdbc(url, "TEST.DBCOLTYPETEST", properties)
```
## How was this patch tested?
Added new test cases to the JDBCWriteSuite

Author: sureshthalamati <suresh.thalamati@gmail.com>

Closes #16209 from sureshthalamati/jdbc_custom_dbtype_option_json-spark-10849.

c7911807

Mar 22, 2017

[SPARK-20021][PYSPARK] Miss backslash in python code · facfd608

uncleGen authored 8 years ago

## What changes were proposed in this pull request?

Add backslash for line continuation in python code.

## How was this patch tested?

Jenkins.

Author: uncleGen <hustyugm@gmail.com>
Author: dylon <hustyugm@gmail.com>

Closes #17352 from uncleGen/python-example-doc.

facfd608

Mar 21, 2017

[SPARK-20011][ML][DOCS] Clarify documentation for ALS 'rank' parameter · 7620aed8

christopher snow authored 8 years ago

## What changes were proposed in this pull request?

API documentation and collaborative filtering documentation page changes to clarify inconsistent description of ALS rank parameter.

 - [DOCS] was previously: "rank is the number of latent factors in the model."
 - [API] was previously:  "rank - number of features to use"

This change describes rank in both places consistently as:

 - "Number of features to use (also referred to as the number of latent factors)"

Author: Chris Snow <chris.snowuk.ibm.com>

Author: christopher snow <chsnow123@gmail.com>

Closes #17345 from snowch/SPARK-20011.

7620aed8

Mar 20, 2017

[SPARK-19906][SS][DOCS] Documentation describing how to write queries to Kafka · c2d1761a

Tyson Condie authored 8 years ago

## What changes were proposed in this pull request?

Add documentation that describes how to write streaming and batch queries to Kafka.

zsxwing tdas

Please review http://spark.apache.org/contributing.html before opening a pull request.

Author: Tyson Condie <tcondie@gmail.com>

Closes #17246 from tcondie/kafka-write-docs.

c2d1761a

Mar 17, 2017

[SPARK-13369] Add config for number of consecutive fetch failures · 7b5d873a

Sital Kedia authored 8 years ago

The previously hardcoded max 4 retries per stage is not suitable for all cluster configurations. Since spark retries a stage at the sign of the first fetch failure, you can easily end up with many stage retries to discover all the failures. In particular, two scenarios this value should change are (1) if there are more than 4 executors per node; in that case, it may take 4 retries to discover the problem with each executor on the node and (2) during cluster maintenance on large clusters, where multiple machines are serviced at once, but you also cannot afford total cluster downtime. By making this value configurable, cluster managers can tune this value to something more appropriate to their cluster configuration.

Unit tests

Author: Sital Kedia <skedia@fb.com>

Closes #17307 from sitalkedia/SPARK-13369.

7b5d873a

Mar 12, 2017

[DOCS][SS] fix structured streaming python example · e29a74d5

uncleGen authored 8 years ago

## What changes were proposed in this pull request?

- SS python example: `TypeError: 'xxx' object is not callable`
- some other doc issue.

## How was this patch tested?

Jenkins.

Author: uncleGen <hustyugm@gmail.com>

Closes #17257 from uncleGen/docs-ss-python.

e29a74d5

Mar 10, 2017

[SPARK-17979][SPARK-14453] Remove deprecated SPARK_YARN_USER_ENV and SPARK_JAVA_OPTS · 8f0490e2

Yong Tang authored 8 years ago

This fix removes deprecated support for config `SPARK_YARN_USER_ENV`, as is mentioned in SPARK-17979.
This fix also removes deprecated support for the following:
```
SPARK_YARN_USER_ENV
SPARK_JAVA_OPTS
SPARK_CLASSPATH
SPARK_WORKER_INSTANCES
```

Related JIRA:
[SPARK-14453]: https://issues.apache.org/jira/browse/SPARK-14453
[SPARK-12344]: https://issues.apache.org/jira/browse/SPARK-12344
[SPARK-15781]: https://issues.apache.org/jira/browse/SPARK-15781

Existing tests should pass.

Author: Yong Tang <yong.tang.github@outlook.com>

Closes #17212 from yongtang/SPARK-17979.

8f0490e2

Mar 09, 2017

[SPARK-19715][STRUCTURED STREAMING] Option to Strip Paths in FileSource · 40da4d18

Liwei Lin authored 8 years ago

## What changes were proposed in this pull request?

Today, we compare the whole path when deciding if a file is new in the FileSource for structured streaming. However, this would cause false negatives in the case where the path has changed in a cosmetic way (i.e. changing `s3n` to `s3a`).

This patch adds an option `fileNameOnly` that causes the new file check to be based only on the filename (but still store the whole path in the log).

## Usage

```scala
spark
  .readStream
  .option("fileNameOnly", true)
  .text("s3n://bucket/dir1/dir2")
  .writeStream
  ...
```
## How was this patch tested?

Added a test case

Author: Liwei Lin <lwlin7@gmail.com>

Closes #17120 from lw-lin/filename-only.

40da4d18

Mar 07, 2017

[SPARK-19516][DOC] update public doc to use SparkSession instead of SparkContext · d69aeeaf

Wenchen Fan authored 8 years ago

## What changes were proposed in this pull request?

After Spark 2.0, `SparkSession` becomes the new entry point for Spark applications. We should update the public documents to reflect this.

## How was this patch tested?

N/A

Author: Wenchen Fan <wenchen@databricks.com>

Closes #16856 from cloud-fan/doc.

d69aeeaf

[SPARK-17498][ML] StringIndexer enhancement for handling unseen labels · 4a9034b1

VinceShieh authored 8 years ago

## What changes were proposed in this pull request?
This PR is an enhancement to ML StringIndexer.
Before this PR, String Indexer only supports "skip"/"error" options to deal with unseen records.
But those unseen records might still be useful and user would like to keep the unseen labels in
certain use cases, This PR enables StringIndexer to support keeping unseen labels as
indices [numLabels].

'''Before
StringIndexer().setHandleInvalid("skip")
StringIndexer().setHandleInvalid("error")
'''After
support the third option "keep"
StringIndexer().setHandleInvalid("keep")

## How was this patch tested?
Test added in StringIndexerSuite

Signed-off-by: VinceShieh <vincent.xieintel.com>
(Please fill in changes proposed in this fix)

Author: VinceShieh <vincent.xie@intel.com>

Closes #16883 from VinceShieh/spark-17498.

4a9034b1

Mar 03, 2017

[MINOR][DOC] Fix doc for web UI https configuration · ba186a84

jerryshao authored 8 years ago

## What changes were proposed in this pull request?

Doc about enabling web UI https is not correct, "spark.ui.https.enabled" is not existed, actually enabling SSL is enough for https.

## How was this patch tested?

N/A

Author: jerryshao <sshao@hortonworks.com>

Closes #17147 from jerryshao/fix-doc-ssl.

ba186a84

[SPARK-19797][DOC] ML pipeline document correction · 0bac3e4c

Zhe Sun authored 8 years ago

## What changes were proposed in this pull request?
Description about pipeline in this paragraph is incorrect https://spark.apache.org/docs/latest/ml-pipeline.html#how-it-works

> If the Pipeline had more **stages**, it would call the LogisticRegressionModel’s transform() method on the DataFrame before passing the DataFrame to the next stage.

Reason: Transformer could also be a stage. But only another Estimator will invoke an transform call and pass the data to next stage. The description in the document misleads ML pipeline users.

## How was this patch tested?
This is a tiny modification of **docs/ml-pipelines.md**. I jekyll build the modification and check the compiled document.

Author: Zhe Sun <ymwdalex@gmail.com>

Closes #17137 from ymwdalex/SPARK-19797-ML-pipeline-document-correction.

0bac3e4c

Mar 02, 2017

[SPARK-19345][ML][DOC] Add doc for "coldStartStrategy" usage in ALS · 9cca3dbf

Nick Pentreath authored 8 years ago

[SPARK-14489](https://issues.apache.org/jira/browse/SPARK-14489) added the ability to skip `NaN` predictions during `ALSModel.transform`. This PR adds documentation for the `coldStartStrategy` param to the ALS user guide, and add code to the examples to illustrate usage.

## How was this patch tested?

Doc and example change only. Build HTML doc locally and verified example code builds, and runs in shell for Scala/Python.

Author: Nick Pentreath <nickp@za.ibm.com>

Closes #17102 from MLnick/SPARK-19345-coldstart-doc.

9cca3dbf

[SPARK-18352][DOCS] wholeFile JSON update doc and programming guide · 8d6ef895

Felix Cheung authored 8 years ago

## What changes were proposed in this pull request?

Update doc for R, programming guide. Clarify default behavior for all languages.

## How was this patch tested?

manually

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #17128 from felixcheung/jsonwholefiledoc.

8d6ef895

Feb 28, 2017

[SPARK-19769][DOCS] Update quickstart instructions · bf5987cb

Michael McCune authored 8 years ago

## What changes were proposed in this pull request?

This change addresses the renaming of the `simple.sbt` build file to
`build.sbt`. Newer versions of the sbt tool are not finding the older
named file and are looking for the `build.sbt`. The quickstart
instructions for self-contained applications is updated with this
change.

## How was this patch tested?

As this is a relatively minor change of a few words, the markdown was checked for syntax and spelling. Site was built with `SKIP_API=1 jekyll serve` for testing purposes.

Author: Michael McCune <msm@redhat.com>

Closes #17101 from elmiko/spark-19769.

bf5987cb

[MINOR][DOC] Update GLM doc to include tweedie distribution · d743ea4c

actuaryzhang authored 8 years ago

Update GLM documentation to include the Tweedie distribution. #16344

jkbradley yanboliang

Author: actuaryzhang <actuaryzhang10@gmail.com>

Closes #17103 from actuaryzhang/doc.

d743ea4c

[SPARK-19660][CORE][SQL] Replace the configuration property names that are... · 9b8eca65

Yuming Wang authored 8 years ago

[SPARK-19660][CORE][SQL] Replace the configuration property names that are deprecated in the version of Hadoop 2.6

## What changes were proposed in this pull request?

Replace all the Hadoop deprecated configuration property names according to [DeprecatedProperties](https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-common/DeprecatedProperties.html).

except:
https://github.com/apache/spark/blob/v2.1.0/python/pyspark/sql/tests.py#L1533
https://github.com/apache/spark/blob/v2.1.0/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala#L987
https://github.com/apache/spark/blob/v2.1.0/sql/core/src/main/scala/org/apache/spark/sql/execution/command/SetCommand.scala#L45
https://github.com/apache/spark/blob/v2.1.0/sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L614

## How was this patch tested?

Existing tests

Author: Yuming Wang <wgyumg@gmail.com>

Closes #16990 from wangyum/HadoopDeprecatedProperties.

9b8eca65

Feb 25, 2017

[MINOR][DOCS] Fixes two problems in the SQL programing guide page · 061bcfb8

Boaz Mohar authored 8 years ago

## What changes were proposed in this pull request?

Removed duplicated lines in sql python example and found a typo.

## How was this patch tested?

Searched for other typo's in the page to minimize PR's.

Author: Boaz Mohar <boazmohar@gmail.com>

Closes #17066 from boazmohar/doc-fix.

061bcfb8

Feb 24, 2017

[MINOR][DOCS] Fix few typos in structured streaming doc · 1b9ba258

Ramkumar Venkataraman authored 8 years ago

## What changes were proposed in this pull request?

Minor typo in `even-time`, which is changed to `event-time` and a couple of grammatical errors fix.

## How was this patch tested?

N/A - since this is a doc fix. I did a jekyll build locally though.

Author: Ramkumar Venkataraman <rvenkataraman@paypal.com>

Closes #17037 from ramkumarvenkat/doc-fix.

1b9ba258

[SPARK-15355][CORE] Proactive block replication · fa7c582e

Shubham Chopra authored 8 years ago

## What changes were proposed in this pull request?

We are proposing addition of pro-active block replication in case of executor failures. BlockManagerMasterEndpoint does all the book-keeping to keep a track of all the executors and the blocks they hold. It also keeps a track of which executors are alive through heartbeats. When an executor is removed, all this book-keeping state is updated to reflect the lost executor. This step can be used to identify executors that are still in possession of a copy of the cached data and a message could be sent to them to use the existing "replicate" function to find and place new replicas on other suitable hosts. Blocks replicated this way will let the master know of their existence.

This can happen when an executor is lost, and would that way be pro-active as opposed be being done at query time.
## How was this patch tested?

This patch was tested with existing unit tests along with new unit tests added to test the functionality.

Author: Shubham Chopra <schopra31@bloomberg.net>

Closes #14412 from shubhamchopra/ProactiveBlockReplication.

fa7c582e

Feb 23, 2017

[SPARK-16122][DOCS] application environment rest api · d0276245

uncleGen authored 8 years ago

## What changes were proposed in this pull request?

follow up pr of #16949.

## How was this patch tested?

jenkins

Author: uncleGen <hustyugm@gmail.com>

Closes #17033 from uncleGen/doc-restapi-environment.

d0276245

[SPARK-19684][DOCS] Remove developer info from docs. · f87a6a59

Kay Ousterhout authored 8 years ago

This commit moves developer-specific information from the release-
specific documentation in this repo to the developer tools page on
the main Spark website. This commit relies on this PR on the
Spark website: https://github.com/apache/spark-website/pull/33.

srowen

Author: Kay Ousterhout <kayousterhout@gmail.com>

Closes #17018 from kayousterhout/SPARK-19684.

f87a6a59

Feb 22, 2017

[SPARK-19554][UI,YARN] Allow SHS URL to be used for tracking in YARN RM. · 4661d30b

Marcelo Vanzin authored 8 years ago

Allow an application to use the History Server URL as the tracking
URL in the YARN RM, so there's still a link to the web UI somewhere
in YARN even if the driver's UI is disabled. This is useful, for
example, if an admin wants to disable the driver UI by default for
applications, since it's harder to secure it (since it involves non
trivial ssl certificate and auth management that admins may not want
to expose to user apps).

This needs to be opt-in, because of the way the YARN proxy works, so
a new configuration was added to enable the option.

The YARN RM will proxy requests to live AMs instead of redirecting
the client, so pages in the SHS UI will not render correctly since
they'll reference invalid paths in the RM UI. The proxy base support
in the SHS cannot be used since that would prevent direct access to
the SHS.

So, to solve this problem, for the feature to work end-to-end, a new
YARN-specific filter was added that detects whether the requests come
from the proxy and redirects the client appropriatly. The SHS admin has
to add this filter manually if they want the feature to work.

Tested with new unit test, and by running with the documented configuration
set in a test cluster. Also verified the driver UI is used when it's
enabled.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #16946 from vanzin/SPARK-19554.

4661d30b

Feb 21, 2017

[SPARK-19337][ML][DOC] Documentation and examples for LinearSVC · 280afe0e

Yuhao Yang authored 8 years ago

## What changes were proposed in this pull request?

Documentation and examples (Java, scala, python, R) for LinearSVC

## How was this patch tested?
local doc generation

Author: Yuhao Yang <yuhao.yang@intel.com>

Closes #16968 from hhbyyh/mlsvmdoc.

280afe0e

Feb 16, 2017

[SPARK-19550][BUILD][CORE][WIP] Remove Java 7 support · 0e240549

Sean Owen authored 8 years ago

- Move external/java8-tests tests into core, streaming, sql and remove
- Remove MaxPermGen and related options
- Fix some reflection / TODOs around Java 8+ methods
- Update doc references to 1.7/1.8 differences
- Remove Java 7/8 related build profiles
- Update some plugins for better Java 8 compatibility
- Fix a few Java-related warnings

For the future:

- Update Java 8 examples to fully use Java 8
- Update Java tests to use lambdas for simplicity
- Update Java internal implementations to use lambdas

## How was this patch tested?

Existing tests

Author: Sean Owen <sowen@cloudera.com>

Closes #16871 from srowen/SPARK-19493.

Unverified

0e240549

Feb 15, 2017

[SPARK-18080][ML][PYTHON] Python API & Examples for Locality Sensitive Hashing · 08c1972a

Yun Ni authored 8 years ago

## What changes were proposed in this pull request?
This pull request includes python API and examples for LSH. The API changes was based on yanboliang 's PR #15768 and resolved conflicts and API changes on the Scala API. The examples are consistent with Scala examples of MinHashLSH and BucketedRandomProjectionLSH.

## How was this patch tested?
API and examples are tested using spark-submit:
`bin/spark-submit examples/src/main/python/ml/min_hash_lsh.py`
`bin/spark-submit examples/src/main/python/ml/bucketed_random_projection_lsh.py`

User guide changes are generated and manually inspected:
`SKIP_API=1 jekyll build`

Author: Yun Ni <yunn@uber.com>
Author: Yanbo Liang <ybliang8@gmail.com>
Author: Yunni <Euler57721@gmail.com>

Closes #16715 from Yunni/spark-18080.

08c1972a

Feb 14, 2017

[SPARK-19584][SS][DOCS] update structured streaming documentation around batch mode · 447b2b53

Tyson Condie authored 8 years ago

## What changes were proposed in this pull request?

Revision to structured-streaming-kafka-integration.md to reflect new Batch query specification and options.

zsxwing tdas

Please review http://spark.apache.org/contributing.html before opening a pull request.

Author: Tyson Condie <tcondie@gmail.com>

Closes #16918 from tcondie/kafka-docs.

447b2b53

[SPARK-19585][DOC][SQL] Fix the cacheTable and uncacheTable api call in the doc · 9b5e460a

Sunitha Kambhampati authored 8 years ago

## What changes were proposed in this pull request?

https://spark.apache.org/docs/latest/sql-programming-guide.html#caching-data-in-memory
In the doc, the call spark.cacheTable(“tableName”) and spark.uncacheTable(“tableName”) actually needs to be spark.catalog.cacheTable and spark.catalog.uncacheTable

## How was this patch tested?
Built the docs and verified the change shows up fine.

Author: Sunitha Kambhampati <skambha@us.ibm.com>

Closes #16919 from skambha/docChange.

9b5e460a

Feb 13, 2017

[SPARK-19520][STREAMING] Do not encrypt data written to the WAL. · 0169360e

Marcelo Vanzin authored 8 years ago

Spark's I/O encryption uses an ephemeral key for each driver instance.
So driver B cannot decrypt data written by driver A since it doesn't
have the correct key.

The write ahead log is used for recovery, thus needs to be readable by
a different driver. So it cannot be encrypted by Spark's I/O encryption
code.

The BlockManager APIs used by the WAL code to write the data automatically
encrypt data, so changes are needed so that callers can to opt out of
encryption.

Aside from that, the "putBytes" API in the BlockManager does not do
encryption, so a separate situation arised where the WAL would write
unencrypted data to the BM and, when those blocks were read, decryption
would fail. So the WAL code needs to ask the BM to encrypt that data
when encryption is enabled; this code is not optimal since it results
in a (temporary) second copy of the data block in memory, but should be
OK for now until a more performant solution is added. The non-encryption
case should not be affected.

Tested with new unit tests, and by running streaming apps that do
recovery using the WAL data with I/O encryption turned on.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #16862 from vanzin/SPARK-19520.

0169360e

Feb 10, 2017

Encryption of shuffle files · c5a66356

Hervé authored 8 years ago

Hello

According to my understanding of commits 4b4e329e & 8b325b17, one may now encrypt shuffle files regardless of the cluster manager in use.

However I have limited understanding of the code, I'm not able to find out whether theses changes also comprise all "temporary local storage, such as shuffle files, cached data, and other application files".

Please feel free to amend or reject my PR if I'm wrong.

dud

Author: Hervé <dud225@users.noreply.github.com>

Closes #16885 from dud225/patch-1.

c5a66356

[SPARK-19545][YARN] Fix compile issue for Spark on Yarn when building against Hadoop 2.6.0~2.6.3 · 8e8afb3a

jerryshao authored 8 years ago

## What changes were proposed in this pull request?

Due to the newly added API in Hadoop 2.6.4+, Spark builds against Hadoop 2.6.0~2.6.3 will meet compile error. So here still reverting back to use reflection to handle this issue.

## How was this patch tested?

Manual verification.

Author: jerryshao <sshao@hortonworks.com>

Closes #16884 from jerryshao/SPARK-19545.

Unverified

8e8afb3a

Feb 09, 2017

[SPARK-16554][CORE] Automatically Kill Executors and Nodes when they are Blacklisted · 6287c94f

José Hiram Soltren authored 8 years ago

## What changes were proposed in this pull request?

In SPARK-8425, we introduced a mechanism for blacklisting executors and nodes (hosts). After a certain number of failures, these resources would be "blacklisted" and no further work would be assigned to them for some period of time.

In some scenarios, it is better to fail fast, and to simply kill these unreliable resources. This changes proposes to do so by having the BlacklistTracker kill unreliable resources when they would otherwise be "blacklisted".

In order to be thread safe, this code depends on the CoarseGrainedSchedulerBackend sending a message to the driver backend in order to do the actual killing. This also helps to prevent a race which would permit work to begin on a resource (executor or node), between the time the resource is marked for killing and the time at which it is finally killed.

## How was this patch tested?

./dev/run-tests
Ran https://github.com/jsoltren/jose-utils/blob/master/blacklist/test-blacklist.sh, and checked logs to see executors and nodes being killed.

Testing can likely be improved here; suggestions welcome.

Author: José Hiram Soltren <jose@cloudera.com>

Closes #16650 from jsoltren/SPARK-16554-submit.

6287c94f

[SPARK-17874][CORE] Add SSL port configuration. · 3fc8e8ca

Marcelo Vanzin authored 8 years ago

Make the SSL port configuration explicit, instead of deriving it
from the non-SSL port, but retain the existing functionality in
case anyone depends on it.

The change starts the HTTPS and HTTP connectors separately, so
that it's possible to use independent ports for each. For that to
work, the initialization of the server needs to be shuffled around
a bit. The change also makes it so the initialization of both
connectors is similar, and end up using the same Scheduler - previously
only the HTTP connector would use the correct one.

Also fixed some outdated documentation about a couple of services
that were removed long ago.

Tested with unit tests and by running spark-shell with SSL configs.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #16625 from vanzin/SPARK-17874.

3fc8e8ca

Feb 08, 2017

[SPARK-19464][CORE][YARN][TEST-HADOOP2.6] Remove support for Hadoop 2.5 and earlier · e8d3fca4

Sean Owen authored 8 years ago

## What changes were proposed in this pull request?

- Remove support for Hadoop 2.5 and earlier
- Remove reflection and code constructs only needed to support multiple versions at once
- Update docs to reflect newer versions
- Remove older versions' builds and profiles.

## How was this patch tested?

Existing tests

Author: Sean Owen <sowen@cloudera.com>

Closes #16810 from srowen/SPARK-19464.

Unverified

e8d3fca4

Feb 07, 2017

[MINOR][DOC] Remove parenthesis in readStream() on kafka structured streaming doc · 5a0569ce

manugarri authored 8 years ago

There is a typo in http://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html#creating-a-kafka-source-stream , python example n1 uses `readStream()` instead of `readStream`

Just removed the parenthesis.

Author: manugarri <manuel.garrido.pena@gmail.com>

Closes #16836 from manugarri/fix_kafka_python_doc.

5a0569ce

Feb 03, 2017

[SPARK-19386][SPARKR][DOC] Bisecting k-means in SparkR documentation · 48aafeda

krishnakalyan3 authored 8 years ago

## What changes were proposed in this pull request?
Update programming guide, example and vignette with Bisecting k-means.

Author: krishnakalyan3 <krishnakalyan3@gmail.com>

Closes #16767 from krishnakalyan3/bisecting-kmeans.

48aafeda

Feb 01, 2017

[SPARK-19410][DOC] Fix brokens links in ml-pipeline and ml-tuning · 04ee8cf6

Zheng RuiFeng authored 8 years ago

## What changes were proposed in this pull request?
Fix brokens links in ml-pipeline and ml-tuning
`<div data-lang="scala">`  ->   `<div data-lang="scala" markdown="1">`

## How was this patch tested?
manual tests

Author: Zheng RuiFeng <ruifengz@foxmail.com>

Closes #16754 from zhengruifeng/doc_api_fix.

Unverified

04ee8cf6

[SPARK-19402][DOCS] Support LaTex inline formula correctly and fix warnings in... · f1a1f260

hyukjinkwon authored 8 years ago

[SPARK-19402][DOCS] Support LaTex inline formula correctly and fix warnings in Scala/Java APIs generation

## What changes were proposed in this pull request?

This PR proposes three things as below:

- Support LaTex inline-formula, `\( ... \)` in Scala API documentation
  It seems currently,

  ```
  \( ... \)
  ```

  are rendered as they are, for example,

  <img width="345" alt="2017-01-30 10 01 13" src="https://cloud.githubusercontent.com/assets/6477701/22423960/ab37d54a-e737-11e6-9196-4f6229c0189c.png">

  It seems mistakenly more backslashes were added.

- Fix warnings Scaladoc/Javadoc generation
  This PR fixes t two types of warnings as below:

  ```
  [warn] .../spark/sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala:335: Could not find any member to link for "UnsupportedOperationException".
  [warn]   /**
  [warn]   ^
  ```

  ```
  [warn] .../spark/sql/core/src/main/scala/org/apache/spark/sql/internal/VariableSubstitution.scala:24: Variable var undefined in comment for class VariableSubstitution in class VariableSubstitution
  [warn]  * `${var}`, `${system:var}` and `${env:var}`.
  [warn]      ^
  ```

- Fix Javadoc8 break
  ```
  [error] .../spark/mllib/target/java/org/apache/spark/ml/PredictionModel.java:7: error: reference not found
  [error]  *                       E.g., {link VectorUDT} for vector features.
  [error]                                       ^
  [error] .../spark/mllib/target/java/org/apache/spark/ml/PredictorParams.java:12: error: reference not found
  [error]    *                          E.g., {link VectorUDT} for vector features.
  [error]                                            ^
  [error] .../spark/mllib/target/java/org/apache/spark/ml/Predictor.java:10: error: reference not found
  [error]  *                       E.g., {link VectorUDT} for vector features.
  [error]                                       ^
  [error] .../spark/sql/hive/target/java/org/apache/spark/sql/hive/HiveAnalysis.java:5: error: reference not found
  [error]  * Note that, this rule must be run after {link PreprocessTableInsertion}.
  [error]                                                  ^
  ```

## How was this patch tested?

Manually via `sbt unidoc` and `jeykil build`.

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #16741 from HyukjinKwon/warn-and-break.

Unverified

f1a1f260

Jan 30, 2017

[SPARK-19396][DOC] JDBC Options are Case In-sensitive · c0eda7e8

gatorsmile authored 8 years ago

### What changes were proposed in this pull request?
The case are not sensitive in JDBC options, after the PR https://github.com/apache/spark/pull/15884 is merged to Spark 2.1.

### How was this patch tested?
N/A

Author: gatorsmile <gatorsmile@gmail.com>

Closes #16734 from gatorsmile/fixDocCaseInsensitive.

c0eda7e8