Davies Liu authored
Add several default configs for PySpark, related to serialization in the JVM.

spark.serializer = org.apache.spark.serializer.KryoSerializer
spark.serializer.objectStreamReset = 100
spark.rdd.compress = True

This helps reduce memory usage during RDD.partitionBy().
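
A minimal sketch of the "default unless the user set it" behavior, assuming the defaults are only applied to keys the user has not configured. The names DEFAULT_CONFIGS and apply_defaults are hypothetical and used for illustration only; this is plain Python, not the code changed by this commit, and it does not call any PySpark API.

```python
# Hypothetical illustration: neither name below is taken from this commit.
# Values are kept as strings, which is how Spark stores configuration entries.
DEFAULT_CONFIGS = {
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.serializer.objectStreamReset": "100",
    "spark.rdd.compress": "True",
}


def apply_defaults(user_conf):
    """Return a copy of user_conf with any missing defaults filled in."""
    merged = dict(user_conf)  # copy so the caller's dict is not mutated
    for key, value in DEFAULT_CONFIGS.items():
        # setdefault keeps a value the user supplied explicitly
        merged.setdefault(key, value)
    return merged


# A user who disables RDD compression keeps that choice;
# the other two keys fall back to the defaults.
conf = apply_defaults({"spark.rdd.compress": "False"})
for key in sorted(conf):
    print(key, "=", conf[key])
```

Under this assumption, an explicit user setting always wins, so the defaults change behavior only for applications that never set these keys.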

Author: Davies Liu <davies.liu@gmail.com>

Closes #1568 from davies/conf and squashes the following commits:

cd316f1 [Davies Liu] remove duplicated line
f71a355 [Davies Liu] rebase to master, add spark.rdd.compress = True
8f63f45 [Davies Liu] Merge branch 'master' into conf
8bc9f08 [Davies Liu] fix unittest
c04a83d [Davies Liu] some default configs for PySpark
75663b57