Skip to content
Snippets Groups Projects
Commit 4ae9fe09 authored by Sun Rui's avatar Sun Rui Committed by Shivaram Venkataraman
Browse files

[SPARK-12919][SPARKR] Implement dapply() on DataFrame in SparkR.

## What changes were proposed in this pull request?

dapply() applies an R function on each partition of a DataFrame and returns a new DataFrame.

The function signature is:

	dapply(df, function(localDF) {}, schema = NULL)

R function input: local data.frame from the partition on local node
R function output: local data.frame

Schema specifies the Row format of the resulting DataFrame. It must match the R function's output.
If schema is not specified, each partition of the result DataFrame will be serialized in R into a single byte array. Such resulting DataFrame can be processed by successive calls to dapply().

## How was this patch tested?
SparkR unit tests.

Author: Sun Rui <rui.sun@intel.com>
Author: Sun Rui <sunrui2016@gmail.com>

Closes #12493 from sun-rui/SPARK-12919.
parent d78fbcc3
No related branches found
No related tags found
No related merge requests found
Showing
with 337 additions and 15 deletions
Loading
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment