Commit 9909efc1 authored 10 years ago by Aaron Davidson Committed by Reynold Xin 10 years ago

SPARK-1839: PySpark RDD#take() shouldn't always read from driver

This patch simply ports over the Scala implementation of RDD#take(), which reads the first partition at the driver, then decides how many more partitions it needs to read and will possibly start a real job if it's more than 1. (Note that SparkContext#runJob(allowLocal=true) only runs the job locally if there's 1 partition selected and no parent stages.)

Author: Aaron Davidson <aaron@databricks.com>

Closes #922 from aarondav/take and squashes the following commits:

fa06df9 [Aaron Davidson] SPARK-1839: PySpark RDD#take() shouldn't always read from driver

parent 7d52777e

No related branches found

No related tags found

No related merge requests found

Hide whitespace changes

Inline Side-by-side

Showing with 84 additions and 21 deletions

Please register or to comment