    Merge pull request #533 from andrewor14/master. Closes #533. · 1896c6e7
    Andrew Or authored
    External spilling - generalize batching logic
    
    The existing implementation relies on a Kryo-specific hack and only works with LZF compression. Introducing an intermediate batch-level stream handles pre-fetching and other arbitrary behavior of higher-level streams in a more general way.
    
    Author: Andrew Or <andrewor14@gmail.com>
    
    == Merge branch commits ==
    
    commit 3ddeb7ef89a0af2b685fb5d071aa0f71c975cc82
    Author: Andrew Or <andrewor14@gmail.com>
    Date:   Wed Feb 5 12:09:32 2014 -0800
    
        Also privatize fields
    
    commit 090544a87a0767effd0c835a53952f72fc8d24f0
    Author: Andrew Or <andrewor14@gmail.com>
    Date:   Wed Feb 5 10:58:23 2014 -0800
    
        Privatize methods
    
    commit 13920c918efe22e66a1760b14beceb17a61fd8cc
    Author: Andrew Or <andrewor14@gmail.com>
    Date:   Tue Feb 4 16:34:15 2014 -0800
    
        Update docs
    
    commit bd5a1d7350467ed3dc19c2de9b2c9f531f0e6aa3
    Author: Andrew Or <andrewor14@gmail.com>
    Date:   Tue Feb 4 13:44:24 2014 -0800
    
        Typo: phyiscal -> physical
    
    commit 287ef44e593ad72f7434b759be3170d9ee2723d2
    Author: Andrew Or <andrewor14@gmail.com>
    Date:   Tue Feb 4 13:38:32 2014 -0800
    
        Avoid reading the entire batch into memory; also simplify streaming logic
    
        Additionally, address formatting comments.
    
    commit 3df700509955f7074821e9aab1e74cb53c58b5a5
    Merge: a531d2e 164489d
    Author: Andrew Or <andrewor14@gmail.com>
    Date:   Mon Feb 3 18:27:49 2014 -0800
    
        Merge branch 'master' of github.com:andrewor14/incubator-spark
    
    commit a531d2e347acdcecf2d0ab72cd4f965ab5e145d8
    Author: Andrew Or <andrewor14@gmail.com>
    Date:   Mon Feb 3 18:18:04 2014 -0800
    
        Relax assumptions on compressors and serializers when batching
    
        This commit introduces an intermediate layer of an input stream on the batch level.
        This guards against interference from higher level streams (i.e. compression and
        deserialization streams), especially pre-fetching, without specifically targeting
        particular libraries (Kryo) and forcing shuffle spill compression to use LZF.
    
    commit 164489d6f176bdecfa9dabec2dfce5504d1ee8af
    Author: Andrew Or <andrewor14@gmail.com>
    Date:   Mon Feb 3 18:18:04 2014 -0800
    
        Relax assumptions on compressors and serializers when batching
    
        This commit introduces an intermediate layer of an input stream on the batch level.
        This guards against interference from higher level streams (i.e. compression and
        deserialization streams), especially pre-fetching, without specifically targeting
        particular libraries (Kryo) and forcing shuffle spill compression to use LZF.
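
The batch-level stream idea described above can be sketched as follows. This is a minimal illustration, not the code from this pull request: `gzip` and `pickle` stand in for the compression codec and serializer, and the names `BoundedReader`, `write_batches`, and `read_batches` are hypothetical. Each batch is written as its own compressed stream and its byte size is recorded. On read, a bounded stream exposes only that many bytes to the decompressor, so any read-ahead by higher-level streams cannot cross into the next batch, and the batch is streamed rather than loaded into memory at once.

```python
import gzip
import io
import os
import pickle
import tempfile


class BoundedReader(io.RawIOBase):
    """Expose at most `limit` bytes of `raw`, then report end of stream."""

    def __init__(self, raw, limit):
        super().__init__()
        self._raw = raw
        self._remaining = limit

    def readable(self):
        return True

    def readinto(self, buf):
        if self._remaining <= 0:
            return 0  # end of this batch, even if the file continues
        data = self._raw.read(min(len(buf), self._remaining))
        buf[:len(data)] = data
        self._remaining -= len(data)
        return len(data)


def write_batches(path, batches):
    """Write each batch as a separate compressed stream; return batch sizes in bytes."""
    sizes = []
    with open(path, "wb") as f:
        for batch in batches:
            start = f.tell()
            with gzip.GzipFile(fileobj=f, mode="wb") as gz:
                for item in batch:
                    pickle.dump(item, gz)
            sizes.append(f.tell() - start)
    return sizes


def read_batches(path, sizes):
    """Yield items one batch at a time, bounding what the decompressor can see."""
    with open(path, "rb") as f:
        offset = 0
        for size in sizes:
            f.seek(offset)  # start exactly at the batch boundary
            bounded = BoundedReader(f, size)
            with gzip.GzipFile(fileobj=bounded, mode="rb") as gz:
                while True:
                    try:
                        yield pickle.load(gz)
                    except EOFError:
                        break  # this batch is exhausted
            offset += size


# Small demonstration with made-up key/value pairs.
demo_path = os.path.join(tempfile.mkdtemp(), "spill.bin")
demo_batches = [[("a", 1), ("b", 2)], [("c", 3)], [("d", 4), ("e", 5)]]
demo_sizes = write_batches(demo_path, demo_batches)
demo_items = list(read_batches(demo_path, demo_sizes))
```

Because the boundary is enforced below the compression and deserialization layers, the reader makes no assumption about how much either layer buffers, which is why the same approach would apply to other codec and serializer combinations.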