by Marko Bonaći, Petar Zečević · 2016
ISBN: 1638351074 9781638351078
Category: Computers / Data Science / General
Page count: 472
<b>Summary</b><br><br><i>Spark in Action</i> teaches you the theory and skills you need to effectively handle batch and streaming data using Spark. Fully updated for Spark 2.0.<br><br>Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.<br><br><b>About the Technology</b><br><br>Big data systems distribute datasets across clusters of machines, which makes it a challenge to query, stream, and interpret them efficiently. Spark can help. It is a processing system designed specifically for distributed data. It provides easy-to-use interfaces, along with the performance you need for production-quality analytics and machine learning. Spark 2 also adds improved programming APIs, better performance, and many other upgrades.<br><br><b>About the Book</b><br><br>You'll get comfortable with the Spark CLI as you work through a few introductory examples. Then, you'll start programming Spark using its core APIs. Along the way, you'll work with structured data using Spark SQL, process near-real-time streaming data, apply machine learning algorithms, and munge graph data using Spark GraphX. To get started with no setup, you can download a preconfigured virtual machine and try the book's code.<br><br><b>What's Inside</b><br><br><ul><li>Updated for Spark 2.0</li><li>Real-life case studies</li><li>Spark DevOps with Docker</li><li>Examples in Scala, with Java and Python versions online</li></ul><br><b>About the Reader</b><br><br>Written for experienced programmers with some background in big data or machine learning.
<br><br><b>About the Authors</b><br><br><b>Petar Zečević</b> and <b>Marko Bonaći</b> are seasoned developers heavily involved in the Spark community.<br><br><b>Table of Contents</b><br><br><ol>PART 1 - FIRST STEPS<li>Introduction to Apache Spark</li><li>Spark fundamentals</li><li>Writing Spark applications</li><li>The Spark API in depth</li>PART 2 - MEET THE SPARK FAMILY<li>Sparkling queries with Spark SQL</li><li>Ingesting data with Spark Streaming</li><li>Getting smart with MLlib</li><li>ML: classification and clustering</li><li>Connecting the dots with GraphX</li>PART 3 - SPARK OPS<li>Running Spark</li><li>Running on a Spark standalone cluster</li><li>Running on YARN and Mesos</li>PART 4 - BRINGING IT TOGETHER<li>Case study: real-time dashboard</li><li>Deep learning on Spark with H2O</li></ol>