    1ef656ea
    [SPARK-3047] [PySpark] add an option to use str in textFileRDD · 1ef656ea
    Davies Liu authored
    str is much more efficient than unicode (in both CPU and memory), so it is better to use str in textFileRDD. To keep compatibility, unicode remains the default (this may change in the future).
    
    use_unicode=True:
    
    daviesliudm:~/work/spark$ time python wc.py
    (u'./universe/spark/sql/core/target/java/org/apache/spark/sql/execution/ExplainCommand$.java', 7776)
    
    real	2m8.298s
    user	0m0.185s
    sys	0m0.064s
    
    use_unicode=False:
    
    daviesliudm:~/work/spark$ time python wc.py
    ('./universe/spark/sql/core/target/java/org/apache/spark/sql/execution/ExplainCommand$.java', 7776)
    
    real	1m26.402s
    user	0m0.182s
    sys	0m0.062s
    
    We can see about a 32% improvement in wall-clock time (2m8.298s down to 1m26.402s)!
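
The trade-off behind the option can be sketched in plain Python. This is only an illustration, not the PySpark implementation: `load_lines` and `word_count` are hypothetical helpers, and the sample input is made up. The sketch assumes Python 3, where the undecoded type is `bytes` rather than the Python 2 `str` this commit refers to. In PySpark itself the option is passed as `sc.textFile(path, use_unicode=False)`.

```python
from collections import Counter


def load_lines(raw_lines, use_unicode=True):
    """Hypothetical stand-in for a text deserializer: decode only on request."""
    for raw in raw_lines:
        # Decoding UTF-8 costs CPU time and allocates a new object per line;
        # with use_unicode=False the raw bytes are passed through untouched.
        yield raw.decode("utf-8") if use_unicode else raw


def word_count(raw_lines, use_unicode=True):
    """Count whitespace-separated words, keyed by str or bytes."""
    counts = Counter()
    for line in load_lines(raw_lines, use_unicode):
        counts.update(line.split())
    return counts


# Made-up sample input standing in for the lines of a text file.
raw = [b"spark makes rdds", b"rdds hold lines"]
as_text = word_count(raw, use_unicode=True)    # keys are str
as_bytes = word_count(raw, use_unicode=False)  # keys are bytes
```

Both calls produce the same counts; only the key type differs, which is why callers that never need text semantics can skip the decoding step.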
    
    Author: Davies Liu <davies.liu@gmail.com>
    
    Closes #1951 from davies/unicode and squashes the following commits:
    
    8352d57 [Davies Liu] update version number
    a286f2f [Davies Liu] rollback loads()
    85246e5 [Davies Liu] add docs for use_unicode
    a0295e1 [Davies Liu] add an option to use str in textFile()