SPARK-2028: Expose mapPartitionsWithInputSplit in HadoopRDD
This allows users to gain access to the InputSplit which backs each partition. An alternative solution would have been a .withInputSplit() method returning a new RDD[(InputSplit, (K, V))], but that is confusing because such an RDD could not be cached or shuffled, as InputSplit is not inherently serializable.

Author: Aaron Davidson <aaron@databricks.com>

Closes #973 from aarondav/hadoop and squashes the following commits:

- 9c9112b [Aaron Davidson] Add JavaAPISuite test
- 9942cd7 [Aaron Davidson] Add Java API
- 1284a3a [Aaron Davidson] SPARK-2028: Expose mapPartitionsWithInputSplit in HadoopRDD
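For context, a minimal sketch of how the new Scala API might be used (the input path, application name, and take(5) are placeholders for illustration; the cast works because SparkContext.hadoopFile concretely returns a HadoopRDD):

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.{FileSplit, InputSplit, TextInputFormat}
import org.apache.spark.SparkContext
import org.apache.spark.rdd.HadoopRDD

object InputSplitDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local", "input-split-demo")

    // sc.hadoopFile returns an RDD whose concrete class is HadoopRDD,
    // so the cast below is safe. The path is a placeholder.
    val rdd = sc.hadoopFile("/tmp/logs", classOf[TextInputFormat],
      classOf[LongWritable], classOf[Text])

    // Tag each record with the name of the file backing its split.
    val withFileNames = rdd.asInstanceOf[HadoopRDD[LongWritable, Text]]
      .mapPartitionsWithInputSplit { (split: InputSplit, iter: Iterator[(LongWritable, Text)]) =>
        val file = split.asInstanceOf[FileSplit].getPath.getName
        // Text objects are reused by the record reader, so copy via toString.
        iter.map { case (_, line) => (file, line.toString) }
      }

    withFileNames.take(5).foreach(println)
    sc.stop()
  }
}
```

Because the function is applied per partition rather than per record, the InputSplit is only materialized on the executors, which avoids the serialization problem described above.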
Showing 7 changed files with 222 additions and 11 deletions:
- core/src/main/scala/org/apache/spark/api/java/JavaHadoopRDD.scala: 43 additions, 0 deletions
- core/src/main/scala/org/apache/spark/api/java/JavaNewHadoopRDD.scala: 43 additions, 0 deletions
- core/src/main/scala/org/apache/spark/api/java/JavaSparkContext.scala: 13 additions, 8 deletions
- core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala: 32 additions, 0 deletions
- core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala: 34 additions, 0 deletions
- core/src/test/java/org/apache/spark/JavaAPISuite.java: 25 additions, 1 deletion
- core/src/test/scala/org/apache/spark/FileSuite.scala: 32 additions, 2 deletions