  1. Jul 10, 2017
    • [SPARK-21266][R][PYTHON] Support schema as a DDL-formatted string in dapply/gapply/from_json · 2bfd5acc
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
      This PR supports a schema given as a DDL-formatted string for `from_json` in R/Python and for `dapply` and `gapply` in R, which is commonly used and consistent with the Scala APIs.
      
      Additionally, this PR exposes `structType` in R to allow working around other possible corner cases.
      
      **Python**
      
      `from_json`
      
      ```python
      from pyspark.sql.functions import from_json
      
      data = [(1, '''{"a": 1}''')]
      df = spark.createDataFrame(data, ("key", "value"))
      df.select(from_json(df.value, "a INT").alias("json")).show()
      ```
      
      **R**
      
      `from_json`
      
      ```R
      df <- sql("SELECT named_struct('name', 'Bob') as people")
      df <- mutate(df, people_json = to_json(df$people))
      head(select(df, from_json(df$people_json, "name STRING")))
      ```
      
      `structType.character`
      
      ```R
      structType("a STRING, b INT")
      ```
      
      `dapply`
      
      ```R
      dapply(createDataFrame(list(list(1.0)), "a"), function(x) {x}, "a DOUBLE")
      ```
      
      `gapply`
      
      ```R
      gapply(createDataFrame(list(list(1.0)), "a"), "a", function(key, x) { x }, "a DOUBLE")
      ```
      
      ## How was this patch tested?
      
      Doctests for `from_json` in Python and unit tests in `test_sparkSQL.R` for R.
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #18498 from HyukjinKwon/SPARK-21266.
  2. Jun 28, 2017
    • [SPARK-21224][R] Specify a schema by using a DDL-formatted string when reading in R · db44f5f3
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
      This PR proposes to support a DDL-formatted string as the schema, as below:
      
      ```r
      mockLines <- c("{\"name\":\"Michael\"}",
                     "{\"name\":\"Andy\", \"age\":30}",
                     "{\"name\":\"Justin\", \"age\":19}")
      jsonPath <- tempfile(pattern = "sparkr-test", fileext = ".tmp")
      writeLines(mockLines, jsonPath)
      df <- read.df(jsonPath, "json", "name STRING, age DOUBLE")
      collect(df)
      ```
      
      ## How was this patch tested?
      
      Tests added in `test_streaming.R` and `test_sparkSQL.R` and manual tests.
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #18431 from HyukjinKwon/r-ddl-schema.
  3. Jun 18, 2017
  4. Jun 11, 2017
    • [SPARK-20877][SPARKR][FOLLOWUP] clean up after test move · 9f4ff955
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      Clean up after the big test move.
      
      ## How was this patch tested?
      
      unit tests, jenkins
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #18267 from felixcheung/rtestset2.
    • [SPARK-20877][SPARKR] refactor tests to basic tests only for CRAN · dc4c3518
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      Move all existing tests to a non-installed directory so that they never run when the SparkR package is installed.
      
      For a follow-up PR:
      - remove all skip_on_cran() calls in tests
      - clean up test timer
      - improve or change basic tests that do run on CRAN (if anyone has suggestion)
      
      It looks like `R CMD build pkg` will still put `pkg/tests` (i.e. the full tests) into the source package, but `R CMD INSTALL` on such a source package does not install these tests (and so `R CMD check` does not run them).
      
      ## How was this patch tested?
      
      - [x] unit tests, Jenkins
      - [x] AppVeyor
      - [x] make a source package, install it, `R CMD check` it - verify the full tests are not installed or run
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #18264 from felixcheung/rtestset.
  5. May 31, 2017
    • [SPARK-20877][SPARKR][WIP] add timestamps to test runs · 382fefd1
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      Add timestamps to test runs to investigate how long they take.
      
      ## How was this patch tested?
      
      Jenkins, AppVeyor
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #18104 from felixcheung/rtimetest.
  6. May 23, 2017
    • [SPARK-20727] Skip tests that use Hadoop utils on CRAN Windows · d06610f9
      Shivaram Venkataraman authored
      ## What changes were proposed in this pull request?
      
      This change skips tests that use the Hadoop libraries while the CRAN check
      runs on Windows. This handles cases where the Hadoop winutils binaries are
      missing on the target system. The skipped tests (see the sketch after this
      list) consist of:
      1. Tests that save, load a model in MLlib
      2. Tests that save, load CSV, JSON and Parquet files in SQL
      3. Hive tests
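      
      As an illustration of the pattern (a minimal sketch; `skip_if_cran_windows` is a hypothetical helper name, not the exact code in this patch):
      
      ```r
      library(testthat)
      
      # Skip when running a CRAN check on Windows, where the Hadoop
      # winutils binaries may be missing.
      skip_if_cran_windows <- function() {
        on_cran <- !identical(Sys.getenv("NOT_CRAN"), "true")
        on_windows <- .Platform$OS.type == "windows"
        if (on_cran && on_windows) {
          skip("Hadoop winutils may be unavailable on CRAN Windows checks")
        }
      }
      
      test_that("model can be saved and loaded", {
        skip_if_cran_windows()
        expect_true(TRUE)  # placeholder for the Hadoop-dependent test body
      })
      ```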
      
      ## How was this patch tested?
      
      Tested by running on a local Windows VM with HADOOP_HOME unset. Also tested with https://win-builder.r-project.org
      
      Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      
      Closes #17966 from shivaram/sparkr-windows-cran.
  7. May 14, 2017
    • [SPARK-20726][SPARKR] wrapper for SQL broadcast · 5a799fd8
      zero323 authored
      ## What changes were proposed in this pull request?
      
      - Adds an R wrapper for `o.a.s.sql.functions.broadcast` (usage sketch below).
      - Renames `broadcast` to `broadcast_`.
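      
      A usage sketch (the data here is illustrative):
      
      ```r
      df <- createDataFrame(mtcars)
      small <- agg(groupBy(createDataFrame(mtcars), "cyl"), avg(column("mpg")))
      # Mark the small aggregated table as broadcastable before the join.
      head(join(df, broadcast(small), df$cyl == small$cyl))
      ```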
      
      ## How was this patch tested?
      
      Unit tests, `check-cran.sh`.
      
      Author: zero323 <zero323@users.noreply.github.com>
      
      Closes #17965 from zero323/SPARK-20726.
  8. May 12, 2017
    • [SPARK-20704][SPARKR] change CRAN test to run single thread · 888b84ab
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      - [x] Need to test by running `R CMD check --as-cran`
      - [x] Sanity check vignettes
      
      ## How was this patch tested?
      
      Jenkins
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #17945 from felixcheung/rchangesforpackage.
  9. May 09, 2017
  10. May 08, 2017
  11. May 07, 2017
    • [SPARK-20550][SPARKR] R wrapper for Dataset.alias · 1f73d358
      zero323 authored
      ## What changes were proposed in this pull request?
      
      - Add a SparkR wrapper for `Dataset.alias` (usage sketch below).
      - Adjust roxygen annotations for `functions.alias` (including example usage).
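      
      A usage sketch of both forms:
      
      ```r
      df <- createDataFrame(mtcars)
      # alias on a Column (existing behavior)
      head(select(df, alias(df$mpg, "miles_per_gallon")))
      # alias on a SparkDataFrame (added here), handy for self-joins
      df2 <- alias(df, "cars")
      ```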
      
      ## How was this patch tested?
      
      Unit tests, `check-cran.sh`.
      
      Author: zero323 <zero323@users.noreply.github.com>
      
      Closes #17825 from zero323/SPARK-20550.
    • [SPARK-20543][SPARKR][FOLLOWUP] Don't skip tests on AppVeyor · 7087e011
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      Add an environment variable so tests are not skipped on AppVeyor.
      
      ## How was this patch tested?
      
      Wait for the AppVeyor run.
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #17878 from felixcheung/appveyorrcran.
  12. May 04, 2017
    • [SPARK-20544][SPARKR] R wrapper for input_file_name · f21897fc
      zero323 authored
      ## What changes were proposed in this pull request?
      
      Adds wrapper for `o.a.s.sql.functions.input_file_name`
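      
      A minimal sketch (the file path is illustrative):
      
      ```r
      jsonPath <- tempfile(pattern = "sparkr-test", fileext = ".json")
      writeLines("{\"name\":\"Bob\"}", jsonPath)
      df <- read.df(jsonPath, "json")
      # Each row carries the name of the file it was read from.
      head(select(df, input_file_name()))
      ```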
      
      ## How was this patch tested?
      
      Existing unit tests, additional unit tests, `check-cran.sh`.
      
      Author: zero323 <zero323@users.noreply.github.com>
      
      Closes #17818 from zero323/SPARK-20544.
    • [SPARK-20585][SPARKR] R generic hint support · 9c36aa27
      zero323 authored
      ## What changes were proposed in this pull request?
      
      Adds support for generic hints on `SparkDataFrame`
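      
      A usage sketch with the broadcast hint:
      
      ```r
      df <- createDataFrame(mtcars)
      # Attach a broadcast hint to the right-hand side of the join.
      small <- hint(createDataFrame(mtcars), "broadcast")
      head(join(df, small, df$cyl == small$cyl))
      ```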
      
      ## How was this patch tested?
      
      Unit tests, `check-cran.sh`
      
      Author: zero323 <zero323@users.noreply.github.com>
      
      Closes #17851 from zero323/SPARK-20585.
  13. May 03, 2017
    • [SPARK-20543][SPARKR] skip tests when running on CRAN · fc472bdd
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      General rule on whether to skip or not (see the sketch after this list): skip if
      - RDD tests
      - tests that could run long or are complicated (streaming, HiveContext)
      - tests of error conditions
      - tests that are unlikely to change/break
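      
      With testthat, each such test is guarded as in this sketch (`skip_on_cran()` honors the `NOT_CRAN` environment variable):
      
      ```r
      library(testthat)
      
      test_that("structured streaming query runs", {
        skip_on_cran()     # skipped on CRAN; still runs on Jenkins/AppVeyor
        expect_true(TRUE)  # placeholder for the long-running test body
      })
      ```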
      
      ## How was this patch tested?
      
      unit tests, `R CMD check --as-cran`, `R CMD check`
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #17817 from felixcheung/rskiptest.
  14. May 01, 2017
    • [SPARK-20532][SPARKR] Implement grouping and grouping_id · 90d77e97
      zero323 authored
      ## What changes were proposed in this pull request?
      
      Adds R wrappers for:
      
      - `o.a.s.sql.functions.grouping` as `is_grouping` (to avoid masking `base::grouping`)
      - `o.a.s.sql.functions.grouping_id`
      
      ## How was this patch tested?
      
      Existing unit tests, additional unit tests. `check-cran.sh`.
      
      Author: zero323 <zero323@users.noreply.github.com>
      
      Closes #17807 from zero323/SPARK-20532.
    • [SPARK-20490][SPARKR] Add R wrappers for eqNullSafe and ! / not · 80e9cf1b
      zero323 authored
      ## What changes were proposed in this pull request?
      
      - Add the null-safe equality operator `%<=>%` (same as `o.a.s.sql.Column.eqNullSafe`, `o.a.s.sql.Column.<=>`).
      - Add the boolean negation operator `!` and the function `not` (usage sketch below).
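      
      A short sketch of both additions:
      
      ```r
      df <- createDataFrame(data.frame(x = c(1, NA), y = c(1, 2)))
      # %<=>% is null-safe: comparing NA (null) yields FALSE rather than null.
      head(select(df, df$x %<=>% df$y, !(df$x == df$y)))
      ```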
      
      ## How was this patch tested?
      
      Existing unit tests, additional unit tests, `check-cran.sh`.
      
      Author: zero323 <zero323@users.noreply.github.com>
      
      Closes #17783 from zero323/SPARK-20490.
  15. Apr 30, 2017
    • [SPARK-20535][SPARKR] R wrappers for explode_outer and posexplode_outer · ae3df4e9
      zero323 authored
      ## What changes were proposed in this pull request?
      
      Add R wrappers for the following (usage sketch after this list):
      
      - `o.a.s.sql.functions.explode_outer`
      - `o.a.s.sql.functions.posexplode_outer`
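      
      For example, `explode_outer` keeps rows whose array is null, yielding a NULL value instead of dropping the row (a minimal sketch):
      
      ```r
      df <- sql(paste0("SELECT 1 AS id, array(1, 2) AS v ",
                       "UNION ALL SELECT 2, cast(null AS array<int>)"))
      # explode() would drop id = 2; explode_outer() keeps it with NULL.
      head(select(df, df$id, explode_outer(df$v)))
      ```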
      
      ## How was this patch tested?
      
      Additional unit tests, manual testing.
      
      Author: zero323 <zero323@users.noreply.github.com>
      
      Closes #17809 from zero323/SPARK-20535.
  16. Apr 29, 2017
    • [SPARK-20493][R] De-duplicate parse logics for DDL-like type strings in R · 70f1bcd7
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
      It seems we are using `SQLUtils.getSQLDataType` for the type string in `structField`. It looks like we can replace this with `CatalystSqlParser.parseDataType`.
      
      They accept similar DDL-like type definitions, as below:
      
      ```scala
      scala> Seq(Tuple1(Tuple1("a"))).toDF.show()
      ```
      ```
      +---+
      | _1|
      +---+
      |[a]|
      +---+
      ```
      
      ```scala
      scala> Seq(Tuple1(Tuple1("a"))).toDF.select($"_1".cast("struct<_1:string>")).show()
      ```
      ```
      +---+
      | _1|
      +---+
      |[a]|
      +---+
      ```
      
      Such type strings look identical to R's, as below:
      
      ```R
      > write.df(sql("SELECT named_struct('_1', 'a') as struct"), "/tmp/aa", "parquet")
      > collect(read.df("/tmp/aa", "parquet", structType(structField("struct", "struct<_1:string>"))))
        struct
      1      a
      ```
      
      R's is stricter because we check the types via regular expressions on the R side ahead of time.
      
      The actual logic looks a bit different, but as we check it ahead on the R side, replacing it should not (I think) introduce any behaviour changes. To make sure, tests dedicated to this were added in SPARK-20105. (It looks like `structField` is the only place that calls this method.)
      
      ## How was this patch tested?
      
      Existing tests - https://github.com/apache/spark/blob/master/R/pkg/inst/tests/testthat/test_sparkSQL.R#L143-L194 should cover this.
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #17785 from HyukjinKwon/SPARK-20493.
  17. Apr 26, 2017
    • [SPARK-20437][R] R wrappers for rollup and cube · df58a95a
      zero323 authored
      ## What changes were proposed in this pull request?
      
      - Add `rollup` and `cube` methods and corresponding generics (usage sketch after this list).
      - Add short description to the vignette.
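      
      A usage sketch:
      
      ```r
      df <- createDataFrame(mtcars)
      # Aggregates at the (cyl, gear), (cyl), and grand-total levels.
      head(agg(rollup(df, "cyl", "gear"), avg(df$mpg)))
      # cube() additionally includes the (gear) level.
      head(agg(cube(df, "cyl", "gear"), avg(df$mpg)))
      ```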
      
      ## How was this patch tested?
      
      - Existing unit tests.
      - Additional unit tests covering new features.
      - `check-cran.sh`.
      
      Author: zero323 <zero323@users.noreply.github.com>
      
      Closes #17728 from zero323/SPARK-20437.
  18. Apr 24, 2017
    • [SPARK-20438][R] SparkR wrappers for split and repeat · 8a272ddc
      zero323 authored
      ## What changes were proposed in this pull request?
      
      Add wrappers for `o.a.s.sql.functions` (usage sketch after this list):
      
      - `split` as `split_string`
      - `repeat` as `repeat_string`
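      
      A usage sketch:
      
      ```r
      df <- createDataFrame(data.frame(s = "a,b,c", stringsAsFactors = FALSE))
      # Split on a regular expression; repeat the string twice.
      head(select(df, split_string(df$s, ","), repeat_string(df$s, 2)))
      ```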
      
      ## How was this patch tested?
      
      Existing tests, additional unit tests, `check-cran.sh`
      
      Author: zero323 <zero323@users.noreply.github.com>
      
      Closes #17729 from zero323/SPARK-20438.
  19. Apr 21, 2017
    • [SPARK-20371][R] Add wrappers for collect_list and collect_set · fd648bff
      zero323 authored
      ## What changes were proposed in this pull request?
      
      Adds wrappers for `collect_list` and `collect_set`.
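      
      A usage sketch:
      
      ```r
      df <- createDataFrame(data.frame(g = c(1, 1, 2), v = c(1, 1, 3)))
      # collect_list keeps duplicates; collect_set de-duplicates.
      head(agg(groupBy(df, "g"), collect_list(df$v), collect_set(df$v)))
      ```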
      
      ## How was this patch tested?
      
      Unit tests, `check-cran.sh`
      
      Author: zero323 <zero323@users.noreply.github.com>
      
      Closes #17672 from zero323/SPARK-20371.
  20. Apr 19, 2017
    • [SPARK-20375][R] R wrappers for array and map · 46c57497
      zero323 authored
      ## What changes were proposed in this pull request?
      
      Adds wrappers for `o.a.s.sql.functions.array` and `o.a.s.sql.functions.map`
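      
      The R-side names avoid masking base functions; assuming the wrappers landed as `create_array` and `create_map` (as in the SparkR function reference), usage looks like this sketch:
      
      ```r
      df <- createDataFrame(data.frame(a = 1, b = 2))
      # Combine columns into an array column and a map column.
      head(select(df, create_array(df$a, df$b), create_map(lit("a"), df$a)))
      ```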
      
      ## How was this patch tested?
      
      Unit tests, `check-cran.sh`
      
      Author: zero323 <zero323@users.noreply.github.com>
      
      Closes #17674 from zero323/SPARK-20375.
  21. Apr 17, 2017
    • [SPARK-19828][R][FOLLOWUP] Rename asJsonArray to as.json.array in from_json function in R · 24f09b39
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
      This was suggested to be `as.json.array` in the first place in the PR for SPARK-19828, but we could not do this because the lint check emitted an error for multiple dots in variable names.
      
      After SPARK-20278, we are now able to use `multiple.dots.in.names`. `asJsonArray` in the `from_json` function can still be changed, as 2.2 is not released yet.
      
      So, this PR proposes to rename `asJsonArray` to `as.json.array`.
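      
      After the rename, parsing a JSON array of structs looks like this sketch (the schema is built with `structType`/`structField`):
      
      ```r
      df <- as.DataFrame(list(list(people = "[{\"name\":\"Bob\"}, {\"name\":\"Alice\"}]")))
      schema <- structType(structField("name", "string"))
      # as.json.array = TRUE parses the string as an array of the given struct.
      head(select(df, from_json(df$people, schema, as.json.array = TRUE)))
      ```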
      
      ## How was this patch tested?
      
      Jenkins tests, local tests with `./R/run-tests.sh` and manual `./dev/lint-r`. Existing tests should cover this.
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #17653 from HyukjinKwon/SPARK-19828-followup.
  22. Apr 12, 2017
  23. Apr 06, 2017
  24. Apr 02, 2017
  25. Mar 27, 2017
  26. Mar 20, 2017
    • [SPARK-19949][SQL] unify bad record handling in CSV and JSON · 68d65fae
      Wenchen Fan authored
      ## What changes were proposed in this pull request?
      
      Currently JSON and CSV have exactly the same logic for handling bad records. This PR tries to abstract it and put it at an upper level to reduce code duplication.
      
      The overall idea is that we make the JSON and CSV parsers throw a BadRecordException; the upper level, FailureSafeParser, then handles bad records according to the parse mode.
      
      Behavior changes (see the sketch after this list):
      1. With PERMISSIVE mode, if the number of tokens doesn't match the schema, the CSV parser previously treated the record as legal and parsed as many tokens as possible. After this PR, we treat it as an illegal record and put the raw record string in a special column, but we still parse as many tokens as possible.
      2. All logging is removed, as it is not very useful in practice.
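      
      For instance, reading malformed CSV in PERMISSIVE mode from R might look like this sketch (the path and column names are illustrative; the DDL-string schema relies on SPARK-21224 above):
      
      ```r
      csvPath <- tempfile(pattern = "sparkr-test", fileext = ".csv")
      writeLines(c("1,2", "bad"), csvPath)
      df <- read.df(csvPath, "csv", "a INT, b INT, _corrupt STRING",
                    mode = "PERMISSIVE", columnNameOfCorruptRecord = "_corrupt")
      # The malformed row survives, with its raw text in the _corrupt column.
      collect(df)
      ```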
      
      ## How was this patch tested?
      
      existing tests
      
      Author: Wenchen Fan <wenchen@databricks.com>
      Author: hyukjinkwon <gurwls223@gmail.com>
      Author: Wenchen Fan <cloud0fan@gmail.com>
      
      Closes #17315 from cloud-fan/bad-record2.
    • [SPARK-20020][SPARKR] DataFrame checkpoint API · c4059772
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      Add checkpoint, setCheckpointDir API to R
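      
      A minimal sketch of the new API:
      
      ```r
      setCheckpointDir(file.path(tempdir(), "checkpoints"))
      df <- createDataFrame(mtcars)
      # Truncates the lineage by materializing df under the checkpoint dir.
      df2 <- checkpoint(df)
      ```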
      
      ## How was this patch tested?
      
      unit tests, manual tests
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #17351 from felixcheung/rdfcheckpoint.
    • [SPARK-19849][SQL] Support ArrayType in to_json to produce JSON array · 0cdcf911
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
      This PR proposes to support an array of struct type in `to_json` as below:
      
      ```scala
      import org.apache.spark.sql.functions._
      
      val df = Seq(Tuple1(Tuple1(1) :: Nil)).toDF("a")
      df.select(to_json($"a").as("json")).show()
      ```
      
      ```
      +----------+
      |      json|
      +----------+
      |[{"_1":1}]|
      +----------+
      ```
      
      Currently, it throws an exception as below (a newline manually inserted for readability):
      
      ```
      org.apache.spark.sql.AnalysisException: cannot resolve 'structtojson(`array`)' due to data type
      mismatch: structtojson requires that the expression is a struct expression.;;
      ```
      
      This allows the roundtrip with `from_json` as below:
      
      ```scala
      import org.apache.spark.sql.functions._
      import org.apache.spark.sql.types._
      
      val schema = ArrayType(StructType(StructField("a", IntegerType) :: Nil))
      val df = Seq("""[{"a":1}, {"a":2}]""").toDF("json").select(from_json($"json", schema).as("array"))
      df.show()
      
      // Read back.
      df.select(to_json($"array").as("json")).show()
      ```
      
      ```
      +----------+
      |     array|
      +----------+
      |[[1], [2]]|
      +----------+
      
      +-----------------+
      |             json|
      +-----------------+
      |[{"a":1},{"a":2}]|
      +-----------------+
      ```
      
      Also, this PR proposes to rename `StructToJson` to `StructsToJson` and `JsonToStruct` to `JsonToStructs`.
      
      ## How was this patch tested?
      
      Unit tests in `JsonFunctionsSuite` and `JsonExpressionsSuite` for Scala, doctest for Python and test in `test_sparkSQL.R` for R.
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #17192 from HyukjinKwon/SPARK-19849.
  27. Mar 19, 2017
    • [SPARK-18817][SPARKR][SQL] change derby log output to temp dir · 422aa67d
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      Passes R's `tempdir()` (the R session temp dir, shared with other temp files/dirs) to the JVM and sets the system property for the Derby home dir, so that `derby.log` is moved there.
      
      ## How was this patch tested?
      
      Manually, unit tests
      
      With this, the Derby files are relocated under `/tmp`:
      ```
      # ls /tmp/RtmpG2M0cB/
      derby.log
      ```
      They are removed automatically when the R session ends.
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #16330 from felixcheung/rderby.
  28. Mar 14, 2017
    • [SPARK-19828][R] Support array type in from_json in R · d1f6c64c
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
      Since we cannot directly define an array type in R, this PR proposes to support array types in R via the DDL-like type strings used in `structField`, as below:
      
      ```R
      jsonArr <- "[{\"name\":\"Bob\"}, {\"name\":\"Alice\"}]"
      df <- as.DataFrame(list(list("people" = jsonArr)))
      collect(select(df, alias(from_json(df$people, "array<struct<name:string>>"), "arrcol")))
      ```
      
      prints
      
      ```R
            arrcol
      1 Bob, Alice
      ```
      
      ## How was this patch tested?
      
      Unit tests in `test_sparkSQL.R`.
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #17178 from HyukjinKwon/SPARK-19828.
  29. Mar 08, 2017
    • [SPARK-19601][SQL] Fix CollapseRepartition rule to preserve shuffle-enabled Repartition · 9a6ac722
      Xiao Li authored
      ### What changes were proposed in this pull request?
      
      As observed by felixcheung in https://github.com/apache/spark/pull/16739, when users use the shuffle-enabled `repartition` API, they expect the number of partitions to be exactly the number they provided, even if they call the shuffle-disabled `coalesce` later.
      
      Currently, the `CollapseRepartition` rule does not consider whether shuffle is enabled. Thus, we get the following unexpected results.
      
      ```Scala
          val df = spark.range(0, 10000, 1, 5)
          val df2 = df.repartition(10)
          assert(df2.coalesce(13).rdd.getNumPartitions == 5)
          assert(df2.coalesce(7).rdd.getNumPartitions == 5)
          assert(df2.coalesce(3).rdd.getNumPartitions == 3)
      ```
      
      This PR is to fix the issue. We preserve shuffle-enabled Repartition.
      
      ### How was this patch tested?
      Added a test case
      
      Author: Xiao Li <gatorsmile@gmail.com>
      
      Closes #16933 from gatorsmile/CollapseRepartition.
  30. Mar 06, 2017
  31. Mar 05, 2017
    • [SPARK-19795][SPARKR] add column functions to_json, from_json · 80d5338b
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      Add column functions: to_json, from_json, and tests covering error cases.
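      
      A round-trip sketch (the schema is given as a `structType`, since DDL strings only arrived later with SPARK-21266 above):
      
      ```r
      df <- sql("SELECT named_struct('name', 'Bob') AS people")
      df <- mutate(df, people_json = to_json(df$people))
      schema <- structType(structField("name", "string"))
      head(select(df, from_json(df$people_json, schema)))
      ```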
      
      ## How was this patch tested?
      
      unit tests, manual
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #17134 from felixcheung/rtojson.
  32. Mar 01, 2017
    • [DOC][MINOR][SPARKR] Update SparkR doc for names, columns and colnames · 2ff1467d
      actuaryzhang authored
      Update the R doc (see the examples after this list):
      1. `columns`, `names` and `colnames` return a vector of strings, not a **list** as stated in the current doc.
      2. `colnames<-` does allow subset assignment, so the length of `value` can be less than the number of columns, e.g., `colnames(df)[1] <- "a"`.
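      
      For example:
      
      ```r
      df <- createDataFrame(mtcars)
      is.character(columns(df))  # TRUE: a character vector, not a list
      colnames(df)[1] <- "a"     # subset assignment is allowed
      ```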
      
      felixcheung
      
      Author: actuaryzhang <actuaryzhang10@gmail.com>
      
      Closes #17115 from actuaryzhang/sparkRMinorDoc.
  33. Feb 23, 2017
    • [SPARK-19682][SPARKR] Issue warning (or error) when subset method "[[" takes vector index · 7bf09433
      actuaryzhang authored
      ## What changes were proposed in this pull request?
      The `[[` method is supposed to take a single index and return a column. This is different from base R, which takes a vector index. We should check for this and issue a warning or error when a vector index is supplied (which is very likely, given the behavior in base R).
      
      Currently I'm issuing a warning message and just taking the first element of the vector index. We could change this to an error if that's better.
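      
      A sketch of the behavior:
      
      ```r
      df <- createDataFrame(mtcars)
      df[["mpg"]]    # a single Column, as intended
      df[[c(1, 2)]]  # warns and uses only the first index
      ```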
      
      ## How was this patch tested?
      new tests
      
      Author: actuaryzhang <actuaryzhang10@gmail.com>
      
      Closes #17017 from actuaryzhang/sparkRSubsetter.