  1. Apr 04, 2016
    • [SPARK-13579][BUILD] Stop building the main Spark assembly. · 24d7d2e4
      Marcelo Vanzin authored
      This change modifies the "assembly/" module to just copy needed
      dependencies to its build directory, and modifies the packaging
      script to pick those up (and remove duplicate jars packaged in the
      examples module).
      
      I also made some minor adjustments to dependencies to remove some
      test jars from the final packaging, and remove jars that conflict with each
      other when packaged separately (e.g. servlet api).
      
      Also note that this change restores guava in applications' classpaths, even
      though it's still shaded inside Spark. This is now needed for the Hadoop
      libraries that are packaged with Spark, which now are not processed by
      the shade plugin.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #11796 from vanzin/SPARK-13579.
  2. Mar 07, 2016
    • [SPARK-12243][BUILD][PYTHON] PySpark tests are slow in Jenkins. · e72914f3
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      In the Jenkins pull request builder, PySpark tests take around [962 seconds](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52530/console) of end-to-end time to run, even though we run four Python test suites in parallel. According to the log, the main reason is that the long-running tests start last because of the FIFO queue. As a first step, we try to reduce the test time by starting some long-running tests first, using a simple priority queue.
      
      ```
      ========================================================================
      Running PySpark tests
      ========================================================================
      ...
      Finished test(python3.4): pyspark.streaming.tests (213s)
      Finished test(pypy): pyspark.sql.tests (92s)
      Finished test(pypy): pyspark.streaming.tests (280s)
      Tests passed in 962 seconds
      ```
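
A minimal sketch of the scheduling idea described above, not the actual patch: tests are put on a priority queue keyed by expected runtime, so the longest ones are handed to workers first. The `EXPECTED_SECONDS` table, `build_task_queue`, and `run_all` are hypothetical names, and the durations are illustrative values rather than measurements.

```python
import queue
import threading

# Hypothetical expected runtimes in seconds (illustrative values only).
EXPECTED_SECONDS = {
    "pyspark.streaming.tests": 280,
    "pyspark.sql.tests": 92,
    "pyspark.tests": 60,
}


def build_task_queue(durations):
    """Queue tests so that the longest expected runtime is dequeued first."""
    tasks = queue.PriorityQueue()
    for name, seconds in durations.items():
        # PriorityQueue returns the smallest item first, so negate the duration.
        tasks.put((-seconds, name))
    return tasks


def run_all(durations, run_one, parallelism=4):
    """Drain the queue with worker threads and return the order tests were started."""
    tasks = build_task_queue(durations)
    started = []
    lock = threading.Lock()

    def worker():
        while True:
            # Dequeue and record under one lock so `started` matches dequeue order.
            with lock:
                try:
                    _, name = tasks.get_nowait()
                except queue.Empty:
                    return
                started.append(name)
            run_one(name)

    threads = [threading.Thread(target=worker) for _ in range(parallelism)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return started
```

With a FIFO queue the slowest suite can be the last one picked up, leaving the other workers idle while it finishes; starting it first lets the shorter suites fill the remaining time.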
      
      ## How was this patch tested?
      
      Manual check.
      Check 'Running PySpark tests' part of the Jenkins log.
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #11551 from dongjoon-hyun/SPARK-12243.
  3. Dec 16, 2015
  4. Oct 19, 2015
    • [SPARK-7018][BUILD] Refactor dev/run-tests-jenkins into Python · d3180c25
      Brennon York authored
      This commit refactors the `run-tests-jenkins` script into Python. This refactoring was done by brennonyork in #7401; this PR contains a few minor edits from joshrosen in order to bring it up to date with other recent changes.
      
      From the original PR description (by brennonyork):
      
      Currently a few things are left out that could, and I think should, become smaller follow-up JIRAs.
      
      1. There are still a few areas where we use environment variables where we don't need to (like `CURRENT_BLOCK`). I might get around to fixing this one in lieu of everything else, but wanted to point that out.
      2. The PR tests are still written in bash. I opted to not change those and just rewrite the runner into Python. This is a great follow-on JIRA IMO.
      3. All of the linting scripts are still in bash as well, and converting them would likely be best handled in follow-on JIRAs too.
      
      Closes #7401.
      
      Author: Brennon York <brennon.york@capitalone.com>
      
      Closes #9161 from JoshRosen/run-tests-jenkins-refactoring.
  5. Oct 13, 2015
  6. Aug 11, 2015
    • [SPARK-9572] [STREAMING] [PYSPARK] Added StreamingContext.getActiveOrCreate() in Python · 5b8bb1b2
      Tathagata Das authored
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #8080 from tdas/SPARK-9572 and squashes the following commits:
      
      64a231d [Tathagata Das] Fix based on comments
      741a0d0 [Tathagata Das] Fixed style
      f4f094c [Tathagata Das] Tweaked test
      9afcdbe [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into SPARK-9572
      e21488d [Tathagata Das] Minor update
      1a371d9 [Tathagata Das] Addressed comments.
      60479da [Tathagata Das] Fixed indent
      9c2da9c [Tathagata Das] Fixed bugs
      b5bd32c [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into SPARK-9572
      b55b348 [Tathagata Das] Removed prints
      5781728 [Tathagata Das] Fix style issues
      b711214 [Tathagata Das] Reverted run-tests.py
      643b59d [Tathagata Das] Revert unnecessary change
      150e58c [Tathagata Das] Added StreamingContext.getActiveOrCreate() in Python
  7. Jul 08, 2015
    • [SPARK-8450] [SQL] [PYSARK] cleanup type converter for Python DataFrame · 74d8d3d9
      Davies Liu authored
      This PR fixes the type converter for Python DataFrames, especially for DecimalType.
      
      Closes #7106
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #7131 from davies/decimal_python and squashes the following commits:
      
      4d3c234 [Davies Liu] Merge branch 'master' of github.com:apache/spark into decimal_python
      20531d6 [Davies Liu] Merge branch 'master' of github.com:apache/spark into decimal_python
      7d73168 [Davies Liu] fix conflit
      6cdd86a [Davies Liu] Merge branch 'master' of github.com:apache/spark into decimal_python
      7104e97 [Davies Liu] improve type infer
      9cd5a21 [Davies Liu] run python tests with SPARK_PREPEND_CLASSES
      829a05b [Davies Liu] fix UDT in python
      c99e8c5 [Davies Liu] fix mima
      c46814a [Davies Liu] convert decimal for Python DataFrames
  8. Jul 01, 2015
    • [SPARK-8763] [PYSPARK] executing run-tests.py with Python 2.6 fails with absence of subprocess.check_output function · fdcad6ef
      cocoatomo authored
      
      Running run-tests.py with Python 2.6 causes the following error:
      
      ```
      Running PySpark tests. Output is in python//Users/tomohiko/.jenkins/jobs/pyspark_test/workspace/python/unit-tests.log
      Will test against the following Python executables: ['python2.6', 'python3.4', 'pypy']
      Will test the following Python modules: ['pyspark-core', 'pyspark-ml', 'pyspark-mllib', 'pyspark-sql', 'pyspark-streaming']
      Traceback (most recent call last):
        File "./python/run-tests.py", line 196, in <module>
          main()
        File "./python/run-tests.py", line 159, in main
          python_implementation = subprocess.check_output(
      AttributeError: 'module' object has no attribute 'check_output'
      ...
      ```
      
      The error occurs because run-tests.py calls the subprocess.check_output function, which has only existed since Python 2.7.
      (ref. https://docs.python.org/2.7/library/subprocess.html#subprocess.check_output)
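
One way to work around the missing function is a small fallback built on `subprocess.Popen`. The sketch below is modeled on the behavior the standard library documents for `check_output` and is not necessarily the exact code in this commit; the alias name `subprocess_check_output` is a hypothetical choice.

```python
import subprocess
import sys


def check_output(*popenargs, **kwargs):
    """Run a command and return its stdout; raise CalledProcessError on a non-zero exit."""
    if "stdout" in kwargs:
        raise ValueError("stdout argument not allowed, it will be overridden.")
    process = subprocess.Popen(*popenargs, stdout=subprocess.PIPE, **kwargs)
    output, _ = process.communicate()
    retcode = process.poll()
    if retcode:
        cmd = kwargs.get("args")
        if cmd is None:
            cmd = popenargs[0]
        # Only the two-argument form is used here, since the `output`
        # attribute was added to CalledProcessError together with check_output.
        raise subprocess.CalledProcessError(retcode, cmd)
    return output


# Prefer the standard-library function when it exists, else use the fallback.
subprocess_check_output = getattr(subprocess, "check_output", check_output)

if __name__ == "__main__":
    print(check_output([sys.executable, "-c", "print('ok')"],
                       universal_newlines=True))
```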
      
      Author: cocoatomo <cocoatomo77@gmail.com>
      
      Closes #7161 from cocoatomo/issues/8763-test-fails-py26 and squashes the following commits:
      
      cf4f901 [cocoatomo] [SPARK-8763] backport process.check_output function from Python 2.7
  9. Jun 30, 2015
    • [SPARK-5161] [HOTFIX] Fix bug in Python test failure reporting · 6c5a6db4
      Josh Rosen authored
      This patch fixes a bug introduced in #7031 which can cause Jenkins to incorrectly report a build with failed Python tests as passing if an error occurred while printing the test failure message.
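
The class of bug described above can be guarded against by making the non-zero exit unconditional once a test has failed, so that an exception raised while printing cannot swallow the failure. This is a hedged sketch of that pattern, not the hotfix itself; `report_failure`, `finish_test`, and the injectable `exit_func` parameter are hypothetical names introduced for illustration.

```python
import logging
import os
import sys

LOGGER = logging.getLogger(__name__)


def report_failure(test_name, log_lines, out):
    """Print the captured output of a failed test; this step may itself raise."""
    out.write("Had test failures in %s; see logs.\n" % test_name)
    for line in log_lines:
        out.write(line)


def finish_test(test_name, retcode, log_lines, out=None, exit_func=os._exit):
    """Exit with a failure status whenever retcode is non-zero, even if reporting breaks."""
    if retcode == 0:
        return
    try:
        report_failure(test_name, log_lines, sys.stderr if out is None else out)
    except Exception:
        LOGGER.exception("Got an exception while reporting the failure of %s", test_name)
    finally:
        # Runs whether or not reporting succeeded, so the failure is never lost.
        # os._exit ends the whole process even when called from a worker thread.
        exit_func(-1)
```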
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #7112 from JoshRosen/python-tests-hotfix and squashes the following commits:
      
      c3f2961 [Josh Rosen] Hotfix for bug in Python test failure reporting
  10. Jun 29, 2015
    • [SPARK-5161] Parallelize Python test execution · 7bbbe380
      Josh Rosen authored
      This commit parallelizes the Python unit test execution, significantly reducing Jenkins build times.  Parallelism is now configurable by passing the `-p` or `--parallelism` flags to either `dev/run-tests` or `python/run-tests` (the default parallelism is 4, but I've successfully tested with higher parallelism).
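
As a rough illustration of a configurable parallelism flag, the sketch below parses `-p` / `--parallelism` (default 4, as stated above) and fans tasks out to a bounded worker pool. It uses `argparse` and `concurrent.futures` for brevity, which is an assumption and may differ from what the script actually uses; `parse_args` and `run_in_parallel` are hypothetical helper names.

```python
import argparse
from concurrent.futures import ThreadPoolExecutor


def parse_args(argv):
    """Parse the parallelism flag; the default of 4 follows the commit message."""
    parser = argparse.ArgumentParser(prog="run-tests")
    parser.add_argument(
        "-p", "--parallelism", type=int, default=4,
        help="The number of suites to test in parallel (default %(default)s)")
    args = parser.parse_args(argv)
    if args.parallelism < 1:
        parser.error("Parallelism cannot be less than 1")
    return args


def run_in_parallel(tasks, run_one, parallelism):
    """Run run_one over tasks with at most `parallelism` concurrent workers."""
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        # map() preserves input order in its results.
        return list(pool.map(run_one, tasks))
```

For example, `parse_args(["-p", "8"]).parallelism` yields 8, and that value can be passed straight to `run_in_parallel`.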
      
      To avoid flakiness, I've disabled the Spark Web UI for the Python tests, similar to what we've done for the JVM tests.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #7031 from JoshRosen/parallelize-python-tests and squashes the following commits:
      
      feb3763 [Josh Rosen] Re-enable other tests
      f87ea81 [Josh Rosen] Only log output from failed tests
      d4ded73 [Josh Rosen] Logging improvements
      a2717e1 [Josh Rosen] Make parallelism configurable via dev/run-tests
      1bacf1b [Josh Rosen] Merge remote-tracking branch 'origin/master' into parallelize-python-tests
      110cd9d [Josh Rosen] Fix universal_newlines for Python 3
      cd13db8 [Josh Rosen] Also log python_implementation
      9e31127 [Josh Rosen] Log Python --version output for each executable.
      a2b9094 [Josh Rosen] Bump up parallelism.
      5552380 [Josh Rosen] Python 3 fix
      866b5b9 [Josh Rosen] Fix lazy logging warnings in Prospector checks
      87cb988 [Josh Rosen] Skip MLLib tests for PyPy
      8309bfe [Josh Rosen] Temporarily disable parallelism to debug a failure
      9129027 [Josh Rosen] Disable Spark UI in Python tests
      037b686 [Josh Rosen] Temporarily disable JVM tests so we can test Python speedup in Jenkins.
      af4cef4 [Josh Rosen] Initial attempt at parallelizing Python test execution
  11. Jun 27, 2015
    • [SPARK-8583] [SPARK-5482] [BUILD] Refactor python/run-tests to integrate with dev/run-tests module system · 40648c56
      Josh Rosen authored
      
      This patch refactors the `python/run-tests` script:
      
      - It's now written in Python instead of Bash.
      - The descriptions of the tests to run are now stored in `dev/run-tests`'s modules.  This allows the pull request builder to skip Python test suites that were not affected by the pull request's changes.  For example, we can now skip the PySpark Streaming test cases when only SQL files are changed.
      - `python/run-tests` now supports command-line flags to make it easier to run individual test suites (this addresses SPARK-5482):
      
      ```
      Usage: run-tests [options]
      
      Options:
        -h, --help            show this help message and exit
        --python-executables=PYTHON_EXECUTABLES
                              A comma-separated list of Python executables to test
                              against (default: python2.6,python3.4,pypy)
        --modules=MODULES     A comma-separated list of Python modules to test
                              (default: pyspark-core,pyspark-ml,pyspark-mllib
                              ,pyspark-sql,pyspark-streaming)
      ```
      - `dev/run-tests` has been split into multiple files: the module definitions and test utility functions are now stored inside of a `dev/sparktestsupport` Python module, allowing them to be re-used from the Python test runner script.
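
The module-based skipping described in the list above can be pictured roughly as follows. This is a simplified sketch, not the contents of `dev/sparktestsupport`: the `Module` class shape, the path prefixes, and `modules_to_test` are assumptions made for illustration, and the sketch ignores dependencies between modules.

```python
class Module(object):
    """A testable unit: a name, the source paths it owns, and its Python test goals."""

    def __init__(self, name, source_prefixes, python_test_goals=()):
        self.name = name
        self.source_prefixes = list(source_prefixes)
        self.python_test_goals = list(python_test_goals)

    def contains_file(self, filename):
        return any(filename.startswith(prefix) for prefix in self.source_prefixes)


# Hypothetical module definitions; the path prefixes are illustrative.
PYSPARK_SQL = Module(
    "pyspark-sql", ["python/pyspark/sql"], ["pyspark.sql.tests"])
PYSPARK_STREAMING = Module(
    "pyspark-streaming", ["python/pyspark/streaming"], ["pyspark.streaming.tests"])
ALL_MODULES = [PYSPARK_SQL, PYSPARK_STREAMING]


def modules_to_test(changed_files, modules=ALL_MODULES):
    """Return the names of modules owning at least one changed file."""
    return [module.name for module in modules
            if any(module.contains_file(f) for f in changed_files)]
```

With these definitions, a change that only touches files under `python/pyspark/sql` selects `pyspark-sql` and leaves the streaming suite out of the run.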
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #6967 from JoshRosen/run-tests-python-modules and squashes the following commits:
      
      f578d6d [Josh Rosen] Fix print for Python 2.x
      8233d61 [Josh Rosen] Add python/run-tests.py to Python lint checks
      34c98d2 [Josh Rosen] Fix universal_newlines for Python 3
      8f65ed0 [Josh Rosen] Fix handling of  module in python/run-tests
      37aff00 [Josh Rosen] Python 3 fix
      27a389f [Josh Rosen] Skip MLLib tests for PyPy
      c364ccf [Josh Rosen] Use which() to convert PYSPARK_PYTHON to an absolute path before shelling out to run tests
      568a3fd [Josh Rosen] Fix hashbang
      3b852ae [Josh Rosen] Fall back to PYSPARK_PYTHON when sys.executable is None (fixes a test)
      f53db55 [Josh Rosen] Remove python2 flag, since the test runner script also works fine under Python 3
      9c80469 [Josh Rosen] Fix passing of PYSPARK_PYTHON
      d33e525 [Josh Rosen] Merge remote-tracking branch 'origin/master' into run-tests-python-modules
      4f8902c [Josh Rosen] Python lint fixes.
      8f3244c [Josh Rosen] Use universal_newlines to fix dev/run-tests doctest failures on Python 3.
      f542ac5 [Josh Rosen] Fix lint check for Python 3
      fff4d09 [Josh Rosen] Add dev/sparktestsupport to pep8 checks
      2efd594 [Josh Rosen] Update dev/run-tests to use new Python test runner flags
      b2ab027 [Josh Rosen] Add command-line options for running individual suites in python/run-tests
      caeb040 [Josh Rosen] Fixes to PySpark test module definitions
      d6a77d3 [Josh Rosen] Fix the tests of dev/run-tests
      def2d8a [Josh Rosen] Two minor fixes
      aec0b8f [Josh Rosen] Actually get the Kafka stuff to run properly
      04015b9 [Josh Rosen] First attempt at getting PySpark Kafka test to work in new runner script
      4c97136 [Josh Rosen] PYTHONPATH fixes
      dcc9c09 [Josh Rosen] Fix time division
      32660fc [Josh Rosen] Initial cut at Python test runner refactoring
      311c6a9 [Josh Rosen] Move shell utility functions to own module.
      1bdeb87 [Josh Rosen] Move module definitions to separate file.