    SPARK-815. Python parallelize() should split lists before batching · feba7ee5
    Matei Zaharia authored
    One unfortunate consequence of this fix is that we materialize any
    collections that are given to us as generators, but this seems necessary
    to get reasonable behavior on small collections. We could add a
    batchSize parameter later to bypass auto-computation of the batch size
    if this becomes a problem (e.g. if users really want to parallelize big
    generators nicely).
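
    The trade-off described above can be illustrated with a minimal sketch
    in plain Python. It is not the code from this commit: the function name
    split_then_batch and the max_batch_size default are hypothetical, and
    no Spark API is used. It only shows why the length must be known (so a
    generator has to be materialized) before a batch size can be chosen
    that spreads a small collection across all slices.

```python
from itertools import islice


def split_then_batch(items, num_slices, max_batch_size=1024):
    """Group items into batches small enough to spread over num_slices.

    Hypothetical helper for illustration only; names and defaults are
    assumptions, not part of PySpark.
    """
    # A generator has no length, so it must be materialized first.
    # This is the cost the commit message mentions.
    if not hasattr(items, "__len__"):
        items = list(items)

    # Cap the batch size so that a small collection still yields at
    # least num_slices batches; never go below one element per batch.
    batch_size = max(1, min(len(items) // num_slices, max_batch_size))

    iterator = iter(items)
    batches = []
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            break
        batches.append(batch)
    return batches


# Ten elements and two slices: a fixed batch size of 1024 would produce
# a single batch, leaving one slice empty. Sizing by length gives two.
small = split_then_batch((x for x in range(10)), num_slices=2)
```

    With a fixed batch size, the ten elements above would land in one
    batch and therefore in one slice; computing the size from the length
    yields one batch per slice.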