    [SPARK-12919][SPARKR] Implement dapply() on DataFrame in SparkR. · 4ae9fe09
    Sun Rui authored
    ## What changes were proposed in this pull request?
    
    dapply() applies an R function to each partition of a DataFrame and returns a new DataFrame.
    
    The function signature is:
    
    	dapply(df, function(localDF) {}, schema = NULL)
    
    R function input: a local data.frame holding the rows of one partition, on the node that processes it
    R function output: a local data.frame
    
    The schema specifies the Row format of the resulting DataFrame. It must match the output of the R function.
    If no schema is specified, each partition of the resulting DataFrame is serialized in R into a single byte array. Such a DataFrame can be processed by successive calls to dapply().
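
As an illustration of the schema case described above, here is a minimal sketch. It assumes a local Spark installation and the SparkR API as shipped in Spark 2.x releases (`sparkR.session()`, `createDataFrame()`), which may differ from the entry points available when this patch was written. The sample data and the column names `a`, `b`, and `sum` are hypothetical and do not come from the patch.

```r
library(SparkR)

# Assumption: Spark is installed locally and SparkR is on the library path.
sparkR.session(master = "local[2]")

# Hypothetical sample data, for illustration only.
df <- createDataFrame(data.frame(a = c(1L, 2L, 3L), b = c(1.5, 2.5, 3.5)))

# The schema has to describe the data.frame that the function returns:
# the two input columns plus the new "sum" column.
schema <- structType(
  structField("a", "integer"),
  structField("b", "double"),
  structField("sum", "double")
)

result <- dapply(df, function(localDF) {
  # localDF is an ordinary R data.frame holding the rows of one partition.
  localDF$sum <- localDF$a + localDF$b
  localDF
}, schema)

# Bring the distributed result back to the driver as a local data.frame.
out <- collect(result)

sparkR.session.stop()
```

The function body only uses base R operations on a local data.frame, so it can be tried on a plain data.frame first, before it is passed to dapply().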
    
    ## How was this patch tested?
    SparkR unit tests.
    
    Author: Sun Rui <rui.sun@intel.com>
    Author: Sun Rui <sunrui2016@gmail.com>
    
    Closes #12493 from sun-rui/SPARK-12919.