diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index f10d7e277eea3e1ec2bf94904dce9a0a57ccddbd..1a8206abe38387d6069e31b8019fe1707199d4f0 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -6,7 +6,7 @@ It lists steps that are required before creating a PR. In particular, consider:
 
 - Is the change important and ready enough to ask the community to spend time reviewing?
 - Have you searched for existing, related JIRAs and pull requests?
-- Is this a new feature that can stand alone as a package on http://spark-packages.org ?
+- Is this a new feature that can stand alone as a [third party project](https://cwiki.apache.org/confluence/display/SPARK/Third+Party+Projects) ?
 - Is the change being proposed clearly explained and motivated?
 
 When you contribute code, you affirm that the contribution is your original work and that you
diff --git a/R/pkg/R/sparkR.R b/R/pkg/R/sparkR.R
index 15afe01c24ed2627dcd0c678ff290714a4c35292..06015362e6bc13626827e92efde155a4485a3ead 100644
--- a/R/pkg/R/sparkR.R
+++ b/R/pkg/R/sparkR.R
@@ -100,7 +100,7 @@ sparkR.stop <- function() {
 #' @param sparkEnvir Named list of environment variables to set on worker nodes
 #' @param sparkExecutorEnv Named list of environment variables to be used when launching executors
 #' @param sparkJars Character vector of jar files to pass to the worker nodes
-#' @param sparkPackages Character vector of packages from spark-packages.org
+#' @param sparkPackages Character vector of package coordinates
 #' @seealso \link{sparkR.session}
 #' @rdname sparkR.init-deprecated
 #' @export
@@ -327,7 +327,7 @@ sparkRHive.init <- function(jsc = NULL) {
 #' @param sparkHome Spark Home directory.
 #' @param sparkConfig named list of Spark configuration to set on worker nodes.
 #' @param sparkJars character vector of jar files to pass to the worker nodes.
-#' @param sparkPackages character vector of packages from spark-packages.org
+#' @param sparkPackages character vector of package coordinates
 #' @param enableHiveSupport enable support for Hive, fallback if not built with Hive support; once
 #'        set, this cannot be turned off on an existing session
 #' @param ... named Spark properties passed to the method.
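A note on the `sparkPackages` doc change above: "package coordinates" means standard Maven coordinates (`groupId:artifactId:version`) rather than names tied to spark-packages.org. A minimal Scala sketch of the same idea on the JVM side, using the real `spark.jars.packages` setting; the Avro coordinate is illustrative, not a pinned recommendation:

```scala
import org.apache.spark.sql.SparkSession

// Sketch only. "com.databricks:spark-avro_2.11:3.0.1" is an example Maven
// coordinate (groupId:artifactId:version); any resolvable coordinate works.
// Note spark.jars.packages must be set before the SparkContext starts.
val spark = SparkSession.builder()
  .appName("package-coordinates-example")
  .config("spark.jars.packages", "com.databricks:spark-avro_2.11:3.0.1")
  .getOrCreate()
```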
diff --git a/docs/_layouts/global.html b/docs/_layouts/global.html
index d3bf082aa751a78b783f06c3e4a673d1124735d6..ad5b5c9adfac878e2b9958db5891572caf20bd96 100755
--- a/docs/_layouts/global.html
+++ b/docs/_layouts/global.html
@@ -114,7 +114,7 @@
                                 <li class="divider"></li>
                                 <li><a href="building-spark.html">Building Spark</a></li>
                                 <li><a href="https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark">Contributing to Spark</a></li>
-                                <li><a href="https://cwiki.apache.org/confluence/display/SPARK/Supplemental+Spark+Projects">Supplemental Projects</a></li>
+                                <li><a href="https://cwiki.apache.org/confluence/display/SPARK/Third+Party+Projects">Third Party Projects</a></li>
                             </ul>
                         </li>
                     </ul>
diff --git a/docs/index.md b/docs/index.md
index 0cb8803783a0f98f2ec2e82cfa7be6552a7f4d66..a7a92f6c4f6d736ee7d974b0d85de3f772da127e 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -120,7 +120,7 @@ options for deployment:
 * [OpenStack Swift](storage-openstack-swift.html)
 * [Building Spark](building-spark.html): build Spark using the Maven system
 * [Contributing to Spark](https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark)
-* [Supplemental Projects](https://cwiki.apache.org/confluence/display/SPARK/Supplemental+Spark+Projects): related third party Spark projects
+* [Third Party Projects](https://cwiki.apache.org/confluence/display/SPARK/Third+Party+Projects): related third party Spark projects
 
 **External Resources:**
 
diff --git a/docs/sparkr.md b/docs/sparkr.md
index 4bbc362c52086cbacb4b80ee5fac9486ff1c5f7b..b881119731045e41de83b490e81a8e10bc12e442 100644
--- a/docs/sparkr.md
+++ b/docs/sparkr.md
@@ -110,7 +110,8 @@ head(df)
 
 SparkR supports operating on a variety of data sources through the `SparkDataFrame` interface. This section describes the general methods for loading and saving data using Data Sources. You can check the Spark SQL programming guide for more [specific options](sql-programming-guide.html#manually-specifying-options) that are available for the built-in data sources.
 
-The general method for creating SparkDataFrames from data sources is `read.df`. This method takes in the path for the file to load and the type of data source, and the currently active SparkSession will be used automatically. SparkR supports reading JSON, CSV and Parquet files natively and through [Spark Packages](http://spark-packages.org/) you can find data source connectors for popular file formats like [Avro](http://spark-packages.org/package/databricks/spark-avro). These packages can either be added by
+The general method for creating SparkDataFrames from data sources is `read.df`. This method takes in the path for the file to load and the type of data source, and the currently active SparkSession will be used automatically.
+SparkR supports reading JSON, CSV and Parquet files natively, and through packages available from sources like [Third Party Projects](https://cwiki.apache.org/confluence/display/SPARK/Third+Party+Projects), you can find data source connectors for popular file formats like Avro. These packages can either be added by
 specifying `--packages` with `spark-submit` or `sparkR` commands, or if initializing SparkSession with `sparkPackages` parameter when in an interactive R shell or from RStudio.
 
 <div data-lang="r" markdown="1">
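To make the `read.df` mechanism in the sparkr.md hunk concrete: the connector jar arrives via `--packages` (or `sparkPackages`), and the data source is then addressed by its provider name. A hedged Scala equivalent; the provider class and input path are illustrative:

```scala
import org.apache.spark.sql.SparkSession

// Assumes the Avro connector was supplied at launch time, e.g.
//   spark-submit --packages com.databricks:spark-avro_2.11:3.0.1 ...
val spark = SparkSession.builder().appName("read-df-equivalent").getOrCreate()

// Equivalent of SparkR's read.df(path, source): format() names the data
// source, load() supplies the path ("events.avro" is a placeholder).
val df = spark.read.format("com.databricks.spark.avro").load("events.avro")
df.printSchema()
```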
<div data-lang="r" markdown="1"> diff --git a/docs/streaming-programming-guide.md b/docs/streaming-programming-guide.md index 5392b4a9bcf4bcea932a31f8421474333b6881b1..43f1cf3e31871737d165a123fccdc8779d657f44 100644 --- a/docs/streaming-programming-guide.md +++ b/docs/streaming-programming-guide.md @@ -2382,7 +2382,7 @@ additional effort may be necessary to achieve exactly-once semantics. There are - [Kafka Integration Guide](streaming-kafka-integration.html) - [Kinesis Integration Guide](streaming-kinesis-integration.html) - [Custom Receiver Guide](streaming-custom-receivers.html) -* Third-party DStream data sources can be found in [Spark Packages](https://spark-packages.org/) +* Third-party DStream data sources can be found in [Third Party Projects](https://cwiki.apache.org/confluence/display/SPARK/Third+Party+Projects) * API documentation - Scala docs * [StreamingContext](api/scala/index.html#org.apache.spark.streaming.StreamingContext) and diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala index 71807b771a95f6309974c3e6a45a37a6ec0aa8ec..825c01365dd1e02ee206a42435b5e487552f6ad6 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala @@ -142,12 +142,13 @@ case class DataSource( } else if (provider.toLowerCase == "avro" || provider == "com.databricks.spark.avro") { throw new AnalysisException( - s"Failed to find data source: ${provider.toLowerCase}. Please use Spark " + - "package http://spark-packages.org/package/databricks/spark-avro") + s"Failed to find data source: ${provider.toLowerCase}. Please find an Avro " + + "package at " + + "https://cwiki.apache.org/confluence/display/SPARK/Third+Party+Projects") } else { throw new ClassNotFoundException( s"Failed to find data source: $provider. Please find packages at " + - "http://spark-packages.org", + "https://cwiki.apache.org/confluence/display/SPARK/Third+Party+Projects", error) } } diff --git a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala index a2164f9ae3d3ed3010d3f8625bbb5326637ee418..3cc3b319f5a57e91a52de4c40720ddadb84baceb 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala @@ -1645,21 +1645,18 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext { e = intercept[AnalysisException] { sql(s"select id from `com.databricks.spark.avro`.`file_path`") } - assert(e.message.contains("Failed to find data source: com.databricks.spark.avro. " + - "Please use Spark package http://spark-packages.org/package/databricks/spark-avro")) + assert(e.message.contains("Failed to find data source: com.databricks.spark.avro.")) // data source type is case insensitive e = intercept[AnalysisException] { sql(s"select id from Avro.`file_path`") } - assert(e.message.contains("Failed to find data source: avro. Please use Spark package " + - "http://spark-packages.org/package/databricks/spark-avro")) + assert(e.message.contains("Failed to find data source: avro.")) e = intercept[AnalysisException] { sql(s"select id from avro.`file_path`") } - assert(e.message.contains("Failed to find data source: avro. 
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
index a2164f9ae3d3ed3010d3f8625bbb5326637ee418..3cc3b319f5a57e91a52de4c40720ddadb84baceb 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
@@ -1645,21 +1645,18 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
     e = intercept[AnalysisException] {
       sql(s"select id from `com.databricks.spark.avro`.`file_path`")
     }
-    assert(e.message.contains("Failed to find data source: com.databricks.spark.avro. " +
-      "Please use Spark package http://spark-packages.org/package/databricks/spark-avro"))
+    assert(e.message.contains("Failed to find data source: com.databricks.spark.avro."))
 
     // data source type is case insensitive
     e = intercept[AnalysisException] {
       sql(s"select id from Avro.`file_path`")
     }
-    assert(e.message.contains("Failed to find data source: avro. Please use Spark package " +
-      "http://spark-packages.org/package/databricks/spark-avro"))
+    assert(e.message.contains("Failed to find data source: avro."))
 
     e = intercept[AnalysisException] {
       sql(s"select id from avro.`file_path`")
     }
-    assert(e.message.contains("Failed to find data source: avro. Please use Spark package " +
-      "http://spark-packages.org/package/databricks/spark-avro"))
+    assert(e.message.contains("Failed to find data source: avro."))
 
     e = intercept[AnalysisException] {
       sql(s"select id from `org.apache.spark.sql.sources.HadoopFsRelationProvider`.`file_path`")
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/sources/ResolvedDataSourceSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/sources/ResolvedDataSourceSuite.scala
index 5ea1f32433699962eb1cc241d67caa89ec2f54cf..76ffb949f12930ec37251dde71c375149323ecf9 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/sources/ResolvedDataSourceSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/sources/ResolvedDataSourceSuite.scala
@@ -74,16 +74,16 @@ class ResolvedDataSourceSuite extends SparkFunSuite {
     val error1 = intercept[AnalysisException] {
       getProvidingClass("avro")
     }
-    assert(error1.getMessage.contains("spark-packages"))
+    assert(error1.getMessage.contains("Failed to find data source: avro."))
 
     val error2 = intercept[AnalysisException] {
       getProvidingClass("com.databricks.spark.avro")
     }
-    assert(error2.getMessage.contains("spark-packages"))
+    assert(error2.getMessage.contains("Failed to find data source: com.databricks.spark.avro."))
 
     val error3 = intercept[ClassNotFoundException] {
       getProvidingClass("asfdwefasdfasdf")
     }
-    assert(error3.getMessage.contains("spark-packages"))
+    assert(error3.getMessage.contains("Failed to find data source: asfdwefasdfasdf."))
   }
 }
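For completeness on why these lookups fail: providers are discovered through Java's `ServiceLoader` via Spark's `DataSourceRegister` trait, so a third-party connector that registers a short name is found before either error branch above fires. A minimal hypothetical sketch; the class name, short name, and the required `META-INF/services/org.apache.spark.sql.sources.DataSourceRegister` entry (not shown) are all assumptions of this example:

```scala
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.sources.{BaseRelation, DataSourceRegister, RelationProvider}

// Hypothetical provider. Registering a short name lets users write
// spark.read.format("myformat") instead of the fully qualified class name.
class MyFormatProvider extends RelationProvider with DataSourceRegister {

  // Alias matched during data source resolution, ahead of the
  // "Failed to find data source" fallbacks changed in this patch.
  override def shortName(): String = "myformat"

  override def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String]): BaseRelation = {
    // A real connector would construct a BaseRelation over its storage here.
    throw new UnsupportedOperationException("sketch only")
  }
}
```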