    Merge pull request #497 from tdas/docs-update · 79302096
    Tathagata Das authored
    Updated Spark Streaming Programming Guide
    
    Here is the updated version of the Spark Streaming Programming Guide. This is still a work in progress, but the major changes are in place. So feedback is most welcome.
    
    In general, I have tried to make the guide easier to understand even if the reader does not know much about Spark. The updated website is hosted here:
    
    http://www.eecs.berkeley.edu/~tdas/spark_docs/streaming-programming-guide.html
    
    The major changes are:
    - Overview illustrates the use cases of Spark Streaming - various input sources and various output sources
    - An example right after the overview to quickly give an idea of what a Spark Streaming program looks like
    - Made the Java API and examples first-class citizens like Scala by using tabs to show both Scala and Java examples (similar to AMPCamp tutorial's code tabs)
    - Highlighted the DStream operations updateStateByKey and transform because of their powerful nature
    - Updated driver node failure recovery text to highlight automatic recovery in Spark standalone mode
    - Added information about linking and using the external input sources like Kafka and Flume
    - In general, reorganized the sections to better show the Basic section and the more advanced sections like Tuning and Recovery.
    
    Todos:
    - Links to the docs of external Kafka, Flume, etc
    - Illustrate window operation with figure as well as example.
    
    Author: Tathagata Das <tathagata.das1565@gmail.com>
    
    == Merge branch commits ==
    
    commit 18ff10556570b39d672beeb0a32075215cfcc944
    Author: Tathagata Das <tathagata.das1565@gmail.com>
    Date:   Tue Jan 28 21:49:30 2014 -0800
    
        Fixed a lot of broken links.
    
    commit 34a5a6008dac2e107624c7ff0db0824ee5bae45f
    Author: Tathagata Das <tathagata.das1565@gmail.com>
    Date:   Tue Jan 28 18:02:28 2014 -0800
    
        Updated github url to use SPARK_GITHUB_URL variable.
    
    commit f338a60ae8069e0a382d2cb170227e5757cc0b7a
    Author: Tathagata Das <tathagata.das1565@gmail.com>
    Date:   Mon Jan 27 22:42:42 2014 -0800
    
        More updates based on Patrick and Harvey's comments.
    
    commit 89a81ff25726bf6d26163e0dd938290a79582c0f
    Author: Tathagata Das <tathagata.das1565@gmail.com>
    Date:   Mon Jan 27 13:08:34 2014 -0800
    
        Updated docs based on Patrick's PR comments.
    
    commit d5b6196b532b5746e019b959a79ea0cc013a8fc3
    Author: Tathagata Das <tathagata.das1565@gmail.com>
    Date:   Sun Jan 26 20:15:58 2014 -0800
    
        Added spark.streaming.unpersist config and info on StreamingListener interface.
    
    commit e3dcb46ab83d7071f611d9b5008ba6bc16c9f951
    Author: Tathagata Das <tathagata.das1565@gmail.com>
    Date:   Sun Jan 26 18:41:12 2014 -0800
    
        Fixed docs on StreamingContext.getOrCreate.
    
    commit 6c29524639463f11eec721e4d17a9d7159f2944b
    Author: Tathagata Das <tathagata.das1565@gmail.com>
    Date:   Thu Jan 23 18:49:39 2014 -0800
    
        Added example and figure for window operations, and links to Kafka and Flume API docs.
    
    commit f06b964a51bb3b21cde2ff8bdea7d9785f6ce3a9
    Author: Tathagata Das <tathagata.das1565@gmail.com>
    Date:   Wed Jan 22 22:49:12 2014 -0800
    
        Fixed missing endhighlight tag in the MLlib guide.
    
    commit 036a7d46187ea3f2a0fb8349ef78f10d6c0b43a9
    Merge: eab351d a1cd1851
    Author: Tathagata Das <tathagata.das1565@gmail.com>
    Date:   Wed Jan 22 22:17:42 2014 -0800
    
        Merge remote-tracking branch 'apache/master' into docs-update
    
    commit eab351d05c0baef1d4b549e1581310087158d78d
    Author: Tathagata Das <tathagata.das1565@gmail.com>
    Date:   Wed Jan 22 22:17:15 2014 -0800
    
        Update Spark Streaming Programming Guide.