  1. Jun 02, 2016
    • Liwei Lin
      [SPARK-15208][WIP][CORE][STREAMING][DOCS] Update Spark examples with AccumulatorV2 · a0eec8e8
      Liwei Lin authored
      ## What changes were proposed in this pull request?
      
      This patch updates the code and docs in the examples module, as well as the related docs module:
      
      - [ ] [docs] `streaming-programming-guide.md`
        - [x] scala code part
        - [ ] java code part
        - [ ] python code part
      - [x] [examples] `RecoverableNetworkWordCount.scala`
      - [ ] [examples] `JavaRecoverableNetworkWordCount.java`
      - [ ] [examples] `recoverable_network_wordcount.py`
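For context on what the examples are being migrated to: in Scala, a custom accumulator subclasses `AccumulatorV2[IN, OUT]` and implements `isZero`, `copy`, `reset`, `add`, `merge` and `value`. Built-in ones such as the accumulator returned by `sc.longAccumulator(...)` cover the common counting case. Below is a minimal pure-Python model of that contract. It is not a PySpark API, and the names `CountAccumulatorModel` and `count_blacklisted` are hypothetical. Each simulated task gets a zeroed copy, and the driver merges the partial results back.

```python
from typing import List, Set


class CountAccumulatorModel:
    """Pure-Python model of the AccumulatorV2 contract (hypothetical, not a PySpark API)."""

    def __init__(self) -> None:
        self._count = 0

    def is_zero(self) -> bool:
        # True while the accumulator still holds its initial ("zero") value.
        return self._count == 0

    def copy(self) -> "CountAccumulatorModel":
        # Spark ships a copy of the accumulator to each task.
        other = CountAccumulatorModel()
        other._count = self._count
        return other

    def reset(self) -> None:
        self._count = 0

    def add(self, v: int) -> None:
        # Called inside a task for each update.
        self._count += v

    def merge(self, other: "CountAccumulatorModel") -> None:
        # Called on the driver to fold a finished task's partial result in.
        self._count += other._count

    @property
    def value(self) -> int:
        return self._count


def count_blacklisted(partitions: List[List[str]], blacklist: Set[str]) -> int:
    """Simulate tasks that each start from a zeroed copy and merge back on the driver."""
    driver_acc = CountAccumulatorModel()
    for words in partitions:
        task_acc = driver_acc.copy()
        task_acc.reset()  # mirrors copyAndReset(): every task starts from zero
        for word in words:
            if word in blacklist:
                task_acc.add(1)
        driver_acc.merge(task_acc)
    return driver_acc.value
```

Keeping `merge` separate from `add` is what lets per-task updates be combined on the driver regardless of how the work was partitioned.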
      
      ## How was this patch tested?
      
      Ran the examples and verified results manually.
      
      Author: Liwei Lin <lwlin7@gmail.com>
      
      Closes #12981 from lw-lin/accumulatorV2-examples.
  2. Jun 01, 2016
    • WeichenXu
      [SPARK-15702][DOCUMENTATION] Update document programming-guide accumulator section · 2402b914
      WeichenXu authored
      ## What changes were proposed in this pull request?
      
      Update the accumulator section of the programming guide (Scala only).
      The Java and Python versions are left unchanged because their APIs are not finished yet.
      
      ## How was this patch tested?
      
      N/A
      
      Author: WeichenXu <WeichenXu123@outlook.com>
      
      Closes #13441 from WeichenXu123/update_doc_accumulatorV2_clean.
  3. May 30, 2016
    • Matthew Wise
      [DOCS] fix example code issues in documentation · 2d34183b
      Matthew Wise authored
      ## What changes were proposed in this pull request?
      
      Fixed broken Java code examples in the streaming documentation.
      
      Attn: tdas
      
      Author: Matthew Wise <matthew.rs.wise@gmail.com>
      
      Closes #13388 from mawise/fix_docs_java_streaming_example.
  4. May 27, 2016
    • Yanbo Liang
      [SPARK-11959][SPARK-15484][DOC][ML] Document WLS and IRLS · a3550e37
      Yanbo Liang authored
      ## What changes were proposed in this pull request?
      * Document `WeightedLeastSquares` (normal equation) and `IterativelyReweightedLeastSquares`.
      * Copy the `L-BFGS` documentation from `spark.mllib` to `spark.ml`.
      
      Because the section `Optimization of linear methods` is aimed at developers, I think we should provide a brief introduction to each optimization method, the necessary references, and how it is implemented in Spark. It is not necessary to reproduce every formula and derivation here; developers and users who want to learn more can follow the references.
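The normal-equation approach behind weighted least squares solves $(X^\top W X)\,\beta = X^\top W y$ for the coefficients $\beta$, where $W$ is the diagonal matrix of instance weights. The following is a minimal pure-Python sketch of that idea, assuming no regularization or standardization and an intercept column prepended to each row. It uses Gaussian elimination for clarity, whereas a library implementation would more typically use a Cholesky factorization of the symmetric matrix. The function names are hypothetical.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not mutated.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def weighted_least_squares(
    xs: Sequence[Sequence[float]], ys: Sequence[float], ws: Sequence[float]
) -> List[float]:
    """Coefficients (intercept first) minimizing sum_i w_i * (y_i - x_i . beta)^2."""
    rows = [[1.0] + list(x) for x in xs]  # prepend the intercept column
    p = len(rows[0])
    # Normal equations: (X^T W X) beta = X^T W y
    xtwx = [[sum(w * r[i] * r[j] for r, w in zip(rows, ws)) for j in range(p)] for i in range(p)]
    xtwy = [sum(w * r[i] * y for r, y, w in zip(rows, ys, ws)) for i in range(p)]
    return solve_linear_system(xtwx, xtwy)
```

With no features at all, the result reduces to the weighted mean of `ys`. That is a handy sanity check on the weighting.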
      
      ## How was this patch tested?
      Document update, no tests.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #13262 from yanboliang/spark-15484.
    • sethah
      [SPARK-15186][ML][DOCS] Add user guide for generalized linear regression · c96244f5
      sethah authored
      ## What changes were proposed in this pull request?
      
      This patch adds a user guide section for generalized linear regression and includes the examples from [#12754](https://github.com/apache/spark/pull/12754).
      
      ## How was this patch tested?
      
      Documentation only, no tests required.
      
      ## Approach
      
      In general, it is a bit unclear what level of detail ought to be included in the user guide since there is a lot of variability within the current user guide. I tried to give a fairly brief mathematical introduction to GLMs, and cover what types of problems they could be used for. Additionally, I included a brief blurb on the IRLS solver. The input/output columns are given in a table as is found elsewhere in the docs (though, again, these appear rather intermittently in the current docs), as well as a table providing the supported families and their link functions.
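The IRLS solver mentioned above fits a GLM by repeatedly solving a weighted least-squares problem. It starts from a linearized "working response" $z = \eta + (y - \mu)\,\frac{d\eta}{d\mu}$ with working weights $w = \left(\frac{d\mu}{d\eta}\right)^2 / V(\mu)$. The following is a minimal pure-Python sketch for one concrete case: the Poisson family with a log link and a single feature plus intercept. It illustrates the technique and is not Spark's implementation. The function names, starting point and tolerance are assumptions.

```python
import math
from typing import List, Tuple


def wls_line(xs: List[float], zs: List[float], ws: List[float]) -> Tuple[float, float]:
    """Weighted least squares for z ~ b0 + b1 * x via the 2x2 normal equations."""
    s_w = sum(ws)
    s_wx = sum(w * x for w, x in zip(ws, xs))
    s_wxx = sum(w * x * x for w, x in zip(ws, xs))
    s_wz = sum(w * z for w, z in zip(ws, zs))
    s_wxz = sum(w * x * z for w, x, z in zip(ws, xs, zs))
    det = s_w * s_wxx - s_wx * s_wx
    # Cramer's rule on [[s_w, s_wx], [s_wx, s_wxx]] @ [b0, b1] = [s_wz, s_wxz]
    b0 = (s_wz * s_wxx - s_wx * s_wxz) / det
    b1 = (s_w * s_wxz - s_wx * s_wz) / det
    return b0, b1


def poisson_irls(xs: List[float], ys: List[float], max_iter: int = 25, tol: float = 1e-10) -> Tuple[float, float]:
    """Fit log(E[y]) = b0 + b1 * x for Poisson-distributed y by IRLS."""
    b0, b1 = math.log(sum(ys) / len(ys)), 0.0  # start from the intercept-only fit
    for _ in range(max_iter):
        etas = [b0 + b1 * x for x in xs]
        mus = [math.exp(eta) for eta in etas]
        # Log link: d(mu)/d(eta) = mu and Poisson variance V(mu) = mu, so the
        # working response is eta + (y - mu) / mu and the working weight is mu.
        zs = [eta + (y - mu) / mu for eta, y, mu in zip(etas, ys, mus)]
        new_b0, new_b1 = wls_line(xs, zs, mus)
        done = abs(new_b0 - b0) < tol and abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if done:
            break
    return b0, b1
```

For a canonical link such as Poisson/log, IRLS coincides with Newton's method. At convergence the score equations $X^\top (y - \mu) = 0$ hold, which makes a convenient correctness check.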
      
      Author: sethah <seth.hendrickson16@gmail.com>
      
      Closes #13139 from sethah/SPARK-15186.
    • jerryshao
      [YARN][DOC][MINOR] Remove several obsolete env variables and update the doc · 1b98fa2e
      jerryshao authored
      ## What changes were proposed in this pull request?
      
      Remove several obsolete environment variables that are no longer supported for Spark on YARN, and update the docs to include several changes in 2.0.
      
      ## How was this patch tested?
      
      N/A
      
      CC vanzin tgravescs
      
      Author: jerryshao <sshao@hortonworks.com>
      
      Closes #13296 from jerryshao/yarn-doc.
    • Zheng RuiFeng
      [MINOR] Fix Typos 'a -> an' · 6b1a6180
      Zheng RuiFeng authored
      ## What changes were proposed in this pull request?
      
      `a` -> `an`
      
      I use regex to generate potential error lines:
      `grep -in ' a [aeiou]' mllib/src/main/scala/org/apache/spark/ml/*/*scala`
      and review them line by line.
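The same candidate search can be expressed in Python. The pattern and case-insensitivity mirror the `grep -in` command above, and each hit is returned with its 1-based line number. The function name is hypothetical. Like the grep, the pattern also flags correct phrases such as "a unit" or "a user", which is why every hit still needs a manual review.

```python
import re
from typing import Iterable, List, Tuple

# Same pattern as the grep command: " a " followed by a vowel; re.IGNORECASE plays the role of -i.
A_BEFORE_VOWEL = re.compile(r" a [aeiou]", re.IGNORECASE)


def find_a_an_candidates(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (1-based line number, line) pairs for suspicious lines, like `grep -n`."""
    return [(n, line) for n, line in enumerate(lines, start=1) if A_BEFORE_VOWEL.search(line)]
```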
      
      ## How was this patch tested?
      
      local build
      `lint-java` checking
      
      Author: Zheng RuiFeng <ruifengz@foxmail.com>
      
      Closes #13317 from zhengruifeng/a_an.
  5. May 26, 2016
    • felixcheung
      [SPARK-10903] followup - update API doc for SqlContext · c8288323
      felixcheung authored
      ## What changes were proposed in this pull request?
      
      Follow-up to the earlier PR: this fixes the roxygen2 doc examples.
      It also adds a note to the migration section of the programming guide.
      
      ## How was this patch tested?
      
      SparkR tests
      
      Author: felixcheung <felixcheung_m@hotmail.com>
      
      Closes #13340 from felixcheung/sqlcontextdoc.
    • Steve Loughran
      [SPARK-13148][YARN] document zero-keytab Oozie application launch; add diagnostics · 01b350a4
      Steve Loughran authored
      This patch documents what to do for keytab-less Oozie launches of Spark apps, and adds some debug-level diagnostics showing which credentials have been submitted.
      
      Author: Steve Loughran <stevel@hortonworks.com>
      Author: Steve Loughran <stevel@apache.org>
      
      Closes #11033 from steveloughran/stevel/feature/SPARK-13148-oozie.
  6. May 25, 2016
    • Krishna Kalyan
      [SPARK-12071][DOC] Document the behaviour of NA in R · 9082b796
      Krishna Kalyan authored
      ## What changes were proposed in this pull request?
      
      Added a note to the "Upgrading From SparkR 1.5.x to 1.6.x" section that Spark SQL converts `NA` in R to `null`.
      
      ## How was this patch tested?
      
      Document update, no tests.
      
      Author: Krishna Kalyan <krishnakalyan3@gmail.com>
      
      Closes #13268 from krishnakalyan3/spark-12071-1.
    • Holden Karau
      [SPARK-15412][PYSPARK][SPARKR][DOCS] Improve linear isotonic regression pydoc & doc build instructions · cd9f1690
      Holden Karau authored
      
      ## What changes were proposed in this pull request?
      
      PySpark: Add links to the predictors from the models in regression.py, improve linear and isotonic pydoc in minor ways.
      User guide / R: Change the list of installed packages so that it is sufficient to build the R docs on a fresh Ubuntu install, and add sudo to match the rest of the commands.
      User guide: Add a note about using gem2.0 on systems with both Ruby 1.9 and 2.0 installed (e.g. some Ubuntu releases, possibly others).
      
      ## How was this patch tested?
      
      built pydocs locally, tested new user build instructions
      
      Author: Holden Karau <holden@us.ibm.com>
      
      Closes #13199 from holdenk/SPARK-15412-improve-linear-isotonic-regression-pydoc.
  7. May 24, 2016
  8. May 23, 2016
  9. May 22, 2016
  10. May 20, 2016
    • sethah
      [SPARK-15394][ML][DOCS] User guide typos and grammar audit · 5e203505
      sethah authored
      ## What changes were proposed in this pull request?
      
      Correct some typos and incorrectly worded sentences.
      
      ## How was this patch tested?
      
      Doc changes only.
      
      Note that many of these changes were identified by whomfire01
      
      Author: sethah <seth.hendrickson16@gmail.com>
      
      Closes #13180 from sethah/ml_guide_audit.
  11. May 17, 2016
    • Sean Zhong
      [SPARK-15171][SQL] Remove the references to deprecated method dataset.registerTempTable · 25b315e6
      Sean Zhong authored
      ## What changes were proposed in this pull request?
      
      Update the unit test code, examples, and documents to remove calls to deprecated method `dataset.registerTempTable`.
      
      ## How was this patch tested?
      
      This PR only changes the unit test code, examples, and comments. It should be safe.
      This is a follow up of PR https://github.com/apache/spark/pull/12945 which was merged.
      
      Author: Sean Zhong <seanzhong@databricks.com>
      
      Closes #13098 from clockfly/spark-15171-remove-deprecation.
    • Yuhao Yang
      [SPARK-15182][ML] Copy MLlib doc to ML: ml.feature.tf, idf · 3308a862
      Yuhao Yang authored
      ## What changes were proposed in this pull request?
      
      We should now begin copying algorithm details from the spark.mllib guide to spark.ml as needed, rather than just linking back to the corresponding algorithms in the spark.mllib user guide.
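As an example of the algorithm detail being copied over: the spark.mllib guide defines $\mathrm{IDF}(t, D) = \log\frac{|D| + 1}{\mathrm{DF}(t, D) + 1}$ and $\mathrm{TFIDF}(t, d, D) = \mathrm{TF}(t, d) \cdot \mathrm{IDF}(t, D)$. The following is a minimal pure-Python sketch of those formulas on tokenized documents. It uses raw term counts for TF and omits the hashing trick that `HashingTF` applies. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def idf(docs: List[List[str]]) -> Dict[str, float]:
    """IDF(t, D) = log((|D| + 1) / (DF(t, D) + 1)), using the natural log."""
    n_docs = len(docs)
    # Document frequency: count each term at most once per document.
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((n_docs + 1) / (d + 1)) for t, d in df.items()}


def tf_idf(doc: List[str], idf_weights: Dict[str, float]) -> Dict[str, float]:
    """TF (raw count in this document) times IDF."""
    tf = Counter(doc)
    return {t: count * idf_weights.get(t, 0.0) for t, count in tf.items()}
```

A term that appears in every document gets an IDF of exactly 0. The "+1" smoothing in numerator and denominator also keeps the ratio finite for terms missing from the corpus.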
      
      ## How was this patch tested?
      
      manual review for doc.
      
      Author: Yuhao Yang <hhbyyh@gmail.com>
      Author: Yuhao Yang <yuhao.yang@intel.com>
      
      Closes #12957 from hhbyyh/tfidfdoc.
    • Sean Owen
      [SPARK-15333][DOCS] Reorganize building-spark.md; rationalize vs wiki · 932d8002
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
      See JIRA for the motivation. The changes are almost entirely movement of text and edits to sections. Minor changes to text include:
      
      - Copying in / merging text from the "Useful Developer Tools" wiki, in areas of
        - Docker
        - R
        - Running one test
      - standardizing on ./build/mvn not mvn, and likewise for ./build/sbt
      - correcting some typos
      - standardizing code block formatting
      
      No text has been removed from this doc; text has been imported from the https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools wiki
      
      ## How was this patch tested?
      
      Jekyll doc build and inspection of resulting HTML in browser.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #13124 from srowen/SPARK-15333.
    • wm624@hotmail.com
      [SPARK-14434][ML] User guide doc and examples for GaussianMixture in spark.ml · 4134ff0c
      wm624@hotmail.com authored
      ## What changes were proposed in this pull request?
      
      Add user guide docs and examples for GaussianMixture in spark.ml in Java, Scala and Python.
      
      ## How was this patch tested?
      
      Manually compiled and tested all examples.
      
      Author: wm624@hotmail.com <wm624@hotmail.com>
      
      Closes #12788 from wangmiao1981/example.
  12. May 16, 2016
  13. May 15, 2016
    • Zheng RuiFeng
      [MINOR] Fix Typos · c7efc56c
      Zheng RuiFeng authored
      ## What changes were proposed in this pull request?
      1. Rename the matrix args in BreezeUtil to uppercase to match the doc
      2. Fix several typos in ML and SQL
      
      ## How was this patch tested?
      manual tests
      
      Author: Zheng RuiFeng <ruifengz@foxmail.com>
      
      Closes #13078 from zhengruifeng/fix_ann.
  14. May 11, 2016
    • cody koeninger
      [SPARK-15085][STREAMING][KAFKA] Rename streaming-kafka artifact · 89e67d66
      cody koeninger authored
      ## What changes were proposed in this pull request?
      Rename the streaming-kafka artifact to include the Kafka version, in anticipation of needing a different artifact for later Kafka versions.
      
      ## How was this patch tested?
      Unit tests
      
      Author: cody koeninger <cody@koeninger.org>
      
      Closes #12946 from koeninger/SPARK-15085.
    • Zheng RuiFeng
      [SPARK-15150][EXAMPLE][DOC] Update LDA examples · d88afabd
      Zheng RuiFeng authored
      ## What changes were proposed in this pull request?
      1. Create a libsvm-format dataset for LDA: `data/mllib/sample_lda_libsvm_data.txt`
      2. Add a Python example
      3. Read the data file directly in the examples
      4. Also switch to `SparkSession` in `aft_survival_regression.py`
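The libsvm text format used by the new dataset stores one row per line as `label index1:value1 index2:value2 ...`, with 1-based feature indices. Spark's libsvm reader converts them to 0-based vector positions. The following is a minimal pure-Python parser sketch for such a line. The function names are hypothetical, and for LDA the feature values would presumably be per-document term counts.

```python
from typing import Dict, List, Tuple


def parse_libsvm_line(line: str) -> Tuple[float, Dict[int, float]]:
    """Parse 'label i1:v1 i2:v2 ...' (1-based indices) into (label, {0-based index: value})."""
    parts = line.split()
    label = float(parts[0])
    features: Dict[int, float] = {}
    for item in parts[1:]:
        idx, val = item.split(":")
        features[int(idx) - 1] = float(val)  # shift to 0-based positions
    return label, features


def to_dense(features: Dict[int, float], num_features: int) -> List[float]:
    """Expand the sparse {index: value} map into a dense list, filling gaps with 0.0."""
    vec = [0.0] * num_features
    for i, v in features.items():
        vec[i] = v
    return vec
```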
      
      ## How was this patch tested?
      manual tests
      `./bin/spark-submit examples/src/main/python/ml/lda_example.py`
      
      Author: Zheng RuiFeng <ruifengz@foxmail.com>
      
      Closes #12927 from zhengruifeng/lda_pe.
    • Nicholas Chammas
      [SPARK-15238] Clarify supported Python versions · fafc95af
      Nicholas Chammas authored
      This PR:
      * Clarifies that Spark *does* support Python 3, starting with Python 3.4.
      
      Author: Nicholas Chammas <nicholas.chammas@gmail.com>
      
      Closes #13017 from nchammas/supported-python-versions.
    • Zheng RuiFeng
      [SPARK-15149][EXAMPLE][DOC] update kmeans example · 8beae591
      Zheng RuiFeng authored
      ## What changes were proposed in this pull request?
      A Python example for ml.kmeans already exists, but it is not included in the user guide.
      1. Small changes such as `example_on` / `example_off`
      2. Add it to the user guide
      3. Update the examples to read the data file directly
      
      ## How was this patch tested?
      manual tests
      `./bin/spark-submit examples/src/main/python/ml/kmeans_example.py`
      
      Author: Zheng RuiFeng <ruifengz@foxmail.com>
      
      Closes #12925 from zhengruifeng/km_pe.
    • Zheng RuiFeng
      [SPARK-14340][EXAMPLE][DOC] Update Examples and User Guide for ml.BisectingKMeans · cef73b56
      Zheng RuiFeng authored
      ## What changes were proposed in this pull request?
      
      1. Add BisectingKMeans to ml-clustering.md
      2. Add the missing Scala BisectingKMeansExample
      3. Create a new data file `data/mllib/sample_kmeans_data.txt`
      
      ## How was this patch tested?
      
      manual tests
      
      Author: Zheng RuiFeng <ruifengz@foxmail.com>
      
      Closes #11844 from zhengruifeng/doc_bkm.
    • Zheng RuiFeng
      [SPARK-15141][EXAMPLE][DOC] Update OneVsRest Examples · ad1a8466
      Zheng RuiFeng authored
      ## What changes were proposed in this pull request?
      1. Add a Python example for OneVsRest
      2. Remove argument parsing
      
      ## How was this patch tested?
      manual tests
      `./bin/spark-submit examples/src/main/python/ml/one_vs_rest_example.py`
      
      Author: Zheng RuiFeng <ruifengz@foxmail.com>
      
      Closes #12920 from zhengruifeng/ovr_pe.
  15. May 10, 2016
  16. May 09, 2016
    • Philipp Hoffmann
      [SPARK-15223][DOCS] fix wrongly named config reference · 65b4ab28
      Philipp Hoffmann authored
      ## What changes were proposed in this pull request?
      
      The configuration setting `spark.executor.logs.rolling.size.maxBytes` was changed to `spark.executor.logs.rolling.maxSize` in 1.4 or so.
      
      This commit fixes a remaining reference to the old name in the documentation.
      
      Also the description for `spark.executor.logs.rolling.maxSize` was edited to clearly state that the unit for the size is bytes.
      
      ## How was this patch tested?
      
      no tests
      
      Author: Philipp Hoffmann <mail@philipphoffmann.de>
      
      Closes #13001 from philipphoffmann/patch-3.
    • Yanbo Liang
      [MINOR] [SPARKR] Update data-manipulation.R to use native csv reader · ee3b1715
      Yanbo Liang authored
      ## What changes were proposed in this pull request?
      * Since Spark now has a native CSV reader, it is no longer necessary to use the third-party `spark-csv` package in `examples/src/main/r/data-manipulation.R`. This change also removes all `spark-csv` usage in SparkR.
      * Running R applications through `sparkR` is not supported as of Spark 2.0, so the example is now run with `./bin/spark-submit`.
      
      ## How was this patch tested?
      Offline test.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #13005 from yanboliang/r-df-examples.
  17. May 07, 2016
    • Bryan Cutler
      [DOC][MINOR] Fixed minor errors in feature.ml user guide doc · 5d188a69
      Bryan Cutler authored
      ## What changes were proposed in this pull request?
      Fixed some minor errors found when reviewing feature.ml user guide
      
      ## How was this patch tested?
      built docs locally
      
      Author: Bryan Cutler <cutlerb@gmail.com>
      
      Closes #12940 from BryanCutler/feature.ml-doc_fixes-DOCS-MINOR.
  18. May 06, 2016
    • Zheng RuiFeng
      [SPARK-14512] [DOC] Add python example for QuantileDiscretizer · 76ad04d9
      Zheng RuiFeng authored
      ## What changes were proposed in this pull request?
      Add the missing python example for QuantileDiscretizer
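QuantileDiscretizer turns a continuous column into bucket indices whose split points come from quantiles of the data. Spark computes those quantiles approximately. The following is a minimal pure-Python sketch using exact nearest-rank quantiles instead, so its splits may differ from Spark's on the same data. The function names are hypothetical.

```python
import bisect
from typing import List


def quantile_splits(values: List[float], num_buckets: int) -> List[float]:
    """Interior split points at the i/num_buckets quantiles (exact, nearest-rank)."""
    ordered = sorted(values)
    n = len(ordered)
    splits = []
    for i in range(1, num_buckets):
        rank = (i * n + num_buckets - 1) // num_buckets  # ceil(i * n / num_buckets), 1-based
        splits.append(ordered[rank - 1])
    return sorted(set(splits))  # duplicate split points would create empty buckets


def bucketize(value: float, splits: List[float]) -> int:
    """Bucket index for buckets (-inf, s0), [s0, s1), ..., [s_last, +inf)."""
    return bisect.bisect_right(splits, value)
```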
      
      ## How was this patch tested?
      manual tests
      
      Author: Zheng RuiFeng <ruifengz@foxmail.com>
      
      Closes #12281 from zhengruifeng/discret_pe.
    • Luciano Resende
      [SPARK-14738][BUILD] Separate docker integration tests from main build · a03c5e68
      Luciano Resende authored
      ## What changes were proposed in this pull request?
      
      Create a maven profile for executing the docker integration tests using maven
      Remove docker integration tests from main sbt build
      Update documentation on how to run docker integration tests from sbt
      
      ## How was this patch tested?
      
      Manually ran the docker integration tests with:
      `mvn -Pdocker-integration-tests -pl :spark-docker-integration-tests_2.11 compile test`
      
      ## Other comments
      
      Note that the DB2 Docker tests are still disabled because of a kernel version issue on the AMPLab Jenkins slaves; they would need to be brought to the right level before those tests can be enabled. The tests do run fine locally with the updates from PR #12348.
      
      Author: Luciano Resende <lresende@apache.org>
      
      Closes #12508 from lresende/docker.
  19. May 04, 2016
    • Bryan Cutler
      [SPARK-12299][CORE] Remove history serving functionality from Master · cf2e9da6
      Bryan Cutler authored
      Remove history server functionality from standalone Master.  Previously, the Master process rebuilt a SparkUI once the application was completed which sometimes caused problems, such as OOM, when the application event log is large (see SPARK-6270).  Keeping this functionality out of the Master will help to simplify the process and increase stability.
      
      Testing for this change included running core unit tests and manually running an application on a standalone cluster to verify that it completed successfully and that the Master UI functioned correctly.  Also added 2 unit tests to verify killing an application and driver from MasterWebUI makes the correct request to the Master.
      
      Author: Bryan Cutler <cutlerb@gmail.com>
      
      Closes #10991 from BryanCutler/remove-history-master-SPARK-12299.
    • Dhruve Ashar
      [SPARK-4224][CORE][YARN] Support group acls · a4564774
      Dhruve Ashar authored
      ## What changes were proposed in this pull request?
      Currently only a list of users can be specified for view and modify acls. This change enables a group of admins/devs/users to be provisioned for viewing and modifying Spark jobs.
      
      **Changes Proposed in the fix**
      Three new corresponding config entries have been added where the user can specify the groups to be given access.
      
      ```
      spark.admin.acls.groups
      spark.modify.acls.groups
      spark.ui.view.acls.groups
      ```
      
      New config entries were added because specifying the users and groups explicitly is a better and cleaner way compared to specifying them in the existing config entry using a delimiter.
      
      A generic trait has been introduced to provide the user-to-group mapping. This makes the mapping pluggable, so that a variety of mapping protocols can be supported, similar to the approach used in Hadoop. A default Unix-shell-based implementation is provided.
      A custom user-to-group mapping can be specified and configured with the entry `spark.user.groups.mapping`.
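To make the group-ACL idea concrete, here is a minimal pure-Python model of a pluggable user-to-group mapping plus an access check. A user passes if they are listed explicitly or belong to one of the listed groups, e.g. the value of `spark.ui.view.acls.groups`. All class and function names are hypothetical, and the real trait is Scala. The shell provider assumes the usual `id -Gn <user>` command for listing a user's groups. Wildcard handling is deliberately left out.

```python
import subprocess
from abc import ABC, abstractmethod
from typing import Dict, Iterable, Set


class GroupMappingProvider(ABC):
    """Hypothetical pluggable user-to-group mapping."""

    @abstractmethod
    def get_groups(self, user: str) -> Set[str]:
        """Return the set of groups the user belongs to."""


class ShellGroupMappingProvider(GroupMappingProvider):
    """Unix-shell lookup: runs `id -Gn <user>` and splits the output into group names."""

    def get_groups(self, user: str) -> Set[str]:
        try:
            result = subprocess.run(["id", "-Gn", user], capture_output=True, text=True, check=True)
        except (OSError, subprocess.CalledProcessError):
            return set()  # unknown user or no `id` command: treat as no groups
        return set(result.stdout.split())


class StaticGroupMappingProvider(GroupMappingProvider):
    """Fixed in-memory mapping, handy for tests."""

    def __init__(self, mapping: Dict[str, Set[str]]) -> None:
        self._mapping = mapping

    def get_groups(self, user: str) -> Set[str]:
        return set(self._mapping.get(user, set()))


def has_access(
    user: str,
    acl_users: Iterable[str],
    acl_groups: Iterable[str],
    provider: GroupMappingProvider,
) -> bool:
    """Grant access if the user is listed explicitly or belongs to any listed group."""
    if user in set(acl_users):
        return True
    return bool(provider.get_groups(user) & set(acl_groups))
```

Keeping the lookup behind an interface is what lets an LDAP-style or static provider be swapped in without touching the ACL check itself.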
      
      **How the patch was Tested**
      We ran different spark jobs setting the config entries in combinations of admin, modify and ui acls. For modify acls we tried killing the job stages from the ui and using yarn commands. For view acls we tried accessing the UI tabs and the logs. Headless accounts were used to launch these jobs and different users tried to modify and view the jobs to ensure that the groups mapping applied correctly.
      
      Additional Unit tests have been added without modifying the existing ones. These test for different ways of setting the acls through configuration and/or API and validate the expected behavior.
      
      Author: Dhruve Ashar <dhruveashar@gmail.com>
      
      Closes #12760 from dhruve/impr/SPARK-4224.
  20. May 03, 2016
  21. May 02, 2016
  22. Apr 30, 2016
    • pshearer
      [SPARK-13973][PYSPARK] Make pyspark fail noisily if IPYTHON or IPYTHON_OPTS are set · 0368ff30
      pshearer authored
      ## What changes were proposed in this pull request?
      
      https://issues.apache.org/jira/browse/SPARK-13973
      
      Following discussion with srowen, the IPYTHON and IPYTHON_OPTS variables are removed. If either is set in the user's environment, pyspark will not run and instead prints an error message. Failing noisily forces users to remove these options and learn the new configuration scheme, which is more sustainable and less confusing.
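The "fail noisily" behaviour amounts to checking the environment before launch and exiting with an error if a removed variable is set. The actual `bin/pyspark` launcher is a shell script. What follows is only a Python model of that check, with hypothetical function names and error wording. The replacement variables it suggests, `PYSPARK_DRIVER_PYTHON` and `PYSPARK_DRIVER_PYTHON_OPTS`, are the ones the Spark docs use for launching with IPython.

```python
import os
import sys
from typing import Mapping, Optional

REMOVED_VARS = ("IPYTHON", "IPYTHON_OPTS")


def check_legacy_ipython_env(env: Mapping[str, str]) -> Optional[str]:
    """Return an error message if any removed variable is set (non-empty), else None."""
    found = [name for name in REMOVED_VARS if env.get(name)]
    if not found:
        return None
    return (
        "Error: no longer supported: " + ", ".join(found) + ". "
        "Use PYSPARK_DRIVER_PYTHON=ipython (and PYSPARK_DRIVER_PYTHON_OPTS for options) instead."
    )


def main() -> None:
    message = check_legacy_ipython_env(os.environ)
    if message:
        print(message, file=sys.stderr)
        sys.exit(1)  # fail noisily instead of silently ignoring the old variable
```

Calling `main()` at startup would abort before anything else runs, which is the point. Silently ignoring the old variables would leave users wondering why IPython never starts.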
      
      ## How was this patch tested?
      
      Manual testing; set IPYTHON=1 and verified that the error message prints.
      
      Author: pshearer <pshearer@massmutual.com>
      Author: shearerp <shearerp@umich.edu>
      
      Closes #12528 from shearerp/master.
  23. Apr 29, 2016
    • Sun Rui
      [SPARK-12919][SPARKR] Implement dapply() on DataFrame in SparkR. · 4ae9fe09
      Sun Rui authored
      ## What changes were proposed in this pull request?
      
      dapply() applies an R function on each partition of a DataFrame and returns a new DataFrame.
      
      The function signature is:
      
      	dapply(df, function(localDF) {}, schema = NULL)
      
      R function input: local data.frame from the partition on local node
      R function output: local data.frame
      
      Schema specifies the Row format of the resulting DataFrame. It must match the R function's output.
      If schema is not specified, each partition of the resulting DataFrame will be serialized in R into a single byte array. Such a DataFrame can be processed by subsequent calls to dapply().
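The semantics described above can be modeled outside of R. Each partition is handed to the user function as a local table, the function returns a local table, and the output is checked against the declared schema. The following is a minimal pure-Python model, not the SparkR API. Partitions are lists of dicts, the schema is reduced to an ordered list of column names (a real schema also carries types), and the schema-less byte-array mode is not modeled. `dapply_model` is a hypothetical name.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Partition = List[Row]


def dapply_model(
    partitions: List[Partition],
    func: Callable[[Partition], Partition],
    schema: List[str],
) -> List[Partition]:
    """Apply func to each partition independently and check its output against schema."""
    result = []
    for part in partitions:
        out = func(list(part))  # func only ever sees the rows of one partition
        for row in out:
            if list(row) != schema:
                raise ValueError(f"row columns {list(row)} do not match schema {schema}")
        result.append(out)
    return result
```

Because `func` only sees one partition at a time, it can add columns or drop rows locally. Anything requiring a global view, such as a total sort, needs a different operation.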
      
      ## How was this patch tested?
      SparkR unit tests.
      
      Author: Sun Rui <rui.sun@intel.com>
      Author: Sun Rui <sunrui2016@gmail.com>
      
      Closes #12493 from sun-rui/SPARK-12919.