Author: Jeffrey Aven
Publisher: Sams Publishing
ISBN: 0134456726
Category : Computers
Languages : en
Pages : 851
Book Description
Apache Hadoop is the technology at the heart of the Big Data revolution, and Hadoop skills are in enormous demand. Now, in just 24 lessons of one hour or less, you can learn all the skills and techniques you'll need to deploy each key component of a Hadoop platform in your local environment or in the cloud, building a fully functional Hadoop cluster and using it with real programs and datasets. Each short, easy lesson builds on all that's come before, helping you master all of Hadoop's essentials, and extend it to meet your unique challenges. Apache Hadoop in 24 Hours, Sams Teach Yourself covers all this, and much more: Understanding Hadoop and the Hadoop Distributed File System (HDFS) Importing data into Hadoop, and process it there Mastering basic MapReduce Java programming, and using advanced MapReduce API concepts Making the most of Apache Pig and Apache Hive Implementing and administering YARN Taking advantage of the full Hadoop ecosystem Managing Hadoop clusters with Apache Ambari Working with the Hadoop User Environment (HUE) Scaling, securing, and troubleshooting Hadoop environments Integrating Hadoop into the enterprise Deploying Hadoop in the cloud Getting started with Apache Spark Step-by-step instructions walk you through common questions, issues, and tasks; Q-and-As, Quizzes, and Exercises build and test your knowledge; "Did You Know?" tips offer insider advice and shortcuts; and "Watch Out!" alerts help you avoid pitfalls. By the time you're finished, you'll be comfortable using Apache Hadoop to solve a wide spectrum of Big Data problems.
Hadoop in 24 Hours, Sams Teach Yourself
Author: Jeffrey Aven
Publisher: Sams Publishing
ISBN: 0134456726
Category : Computers
Languages : en
Pages : 851
Book Description
Apache Hadoop is the technology at the heart of the Big Data revolution, and Hadoop skills are in enormous demand. Now, in just 24 lessons of one hour or less, you can learn all the skills and techniques you'll need to deploy each key component of a Hadoop platform in your local environment or in the cloud, building a fully functional Hadoop cluster and using it with real programs and datasets. Each short, easy lesson builds on all that's come before, helping you master all of Hadoop's essentials, and extend it to meet your unique challenges. Apache Hadoop in 24 Hours, Sams Teach Yourself covers all this, and much more: Understanding Hadoop and the Hadoop Distributed File System (HDFS) Importing data into Hadoop, and process it there Mastering basic MapReduce Java programming, and using advanced MapReduce API concepts Making the most of Apache Pig and Apache Hive Implementing and administering YARN Taking advantage of the full Hadoop ecosystem Managing Hadoop clusters with Apache Ambari Working with the Hadoop User Environment (HUE) Scaling, securing, and troubleshooting Hadoop environments Integrating Hadoop into the enterprise Deploying Hadoop in the cloud Getting started with Apache Spark Step-by-step instructions walk you through common questions, issues, and tasks; Q-and-As, Quizzes, and Exercises build and test your knowledge; "Did You Know?" tips offer insider advice and shortcuts; and "Watch Out!" alerts help you avoid pitfalls. By the time you're finished, you'll be comfortable using Apache Hadoop to solve a wide spectrum of Big Data problems.
Publisher: Sams Publishing
ISBN: 0134456726
Category : Computers
Languages : en
Pages : 851
Book Description
Apache Hadoop is the technology at the heart of the Big Data revolution, and Hadoop skills are in enormous demand. Now, in just 24 lessons of one hour or less, you can learn all the skills and techniques you'll need to deploy each key component of a Hadoop platform in your local environment or in the cloud, building a fully functional Hadoop cluster and using it with real programs and datasets. Each short, easy lesson builds on all that's come before, helping you master all of Hadoop's essentials, and extend it to meet your unique challenges. Apache Hadoop in 24 Hours, Sams Teach Yourself covers all this, and much more: Understanding Hadoop and the Hadoop Distributed File System (HDFS) Importing data into Hadoop, and process it there Mastering basic MapReduce Java programming, and using advanced MapReduce API concepts Making the most of Apache Pig and Apache Hive Implementing and administering YARN Taking advantage of the full Hadoop ecosystem Managing Hadoop clusters with Apache Ambari Working with the Hadoop User Environment (HUE) Scaling, securing, and troubleshooting Hadoop environments Integrating Hadoop into the enterprise Deploying Hadoop in the cloud Getting started with Apache Spark Step-by-step instructions walk you through common questions, issues, and tasks; Q-and-As, Quizzes, and Exercises build and test your knowledge; "Did You Know?" tips offer insider advice and shortcuts; and "Watch Out!" alerts help you avoid pitfalls. By the time you're finished, you'll be comfortable using Apache Hadoop to solve a wide spectrum of Big Data problems.
Sams Teach Yourself Hadoop in 24 Hours
Author: Jeffrey Aven
Publisher: Sams Publishing
ISBN: 9780672338526
Category : Apache Hadoop
Languages : en
Pages : 0
Book Description
Apache Hadoop is the technology at the heart of the Big Data revolution, and Hadoop skills are in enormous demand. Now, in just 24 lessons of one hour or less, students can learn all the skills and techniques they'll need to deploy each key component of a Hadoop platform in a local environment or in the cloud, building a fully functional Hadoop cluster and using it with real programs and datasets. Each short, easy lesson builds on all that's come before, helping students master all of Hadoop's essentials, and extend it to meet real-world challenges. Apache Hadoop in 24 Hours, Sams Teach Yourself covers all this, and much more: Understanding Hadoop and the Hadoop Distributed File System (HDFS) Importing data into Hadoop, and process it there Mastering basic MapReduce Java programming, and using advanced MapReduce API concepts Making the most of Apache Pig and Apache Hive Implementing and administering YARN Taking advantage of the full Hadoop ecosystem Managing Hadoop clusters with Apache Ambari Working with the Hadoop User Environment (HUE) Scaling, securing, and troubleshooting Hadoop environments Integrating Hadoop into the enterprise Deploying Hadoop in the cloud Getting started with Apache Spark Step-by-step instructions walk students through common questions, issues, and tasks; Q-and-As, Quizzes, and Exercises build and test your knowledge; Did You Know? tips offer insider advice and shortcuts; and Watch Out! alerts help avoid pitfalls. By the time they're finished, they'll be comfortable using Apache Hadoop to solve a wide spectrum of Big Data problems.
Publisher: Sams Publishing
ISBN: 9780672338526
Category : Apache Hadoop
Languages : en
Pages : 0
Book Description
Apache Hadoop is the technology at the heart of the Big Data revolution, and Hadoop skills are in enormous demand. Now, in just 24 lessons of one hour or less, students can learn all the skills and techniques they'll need to deploy each key component of a Hadoop platform in a local environment or in the cloud, building a fully functional Hadoop cluster and using it with real programs and datasets. Each short, easy lesson builds on all that's come before, helping students master all of Hadoop's essentials, and extend it to meet real-world challenges. Apache Hadoop in 24 Hours, Sams Teach Yourself covers all this, and much more: Understanding Hadoop and the Hadoop Distributed File System (HDFS) Importing data into Hadoop, and process it there Mastering basic MapReduce Java programming, and using advanced MapReduce API concepts Making the most of Apache Pig and Apache Hive Implementing and administering YARN Taking advantage of the full Hadoop ecosystem Managing Hadoop clusters with Apache Ambari Working with the Hadoop User Environment (HUE) Scaling, securing, and troubleshooting Hadoop environments Integrating Hadoop into the enterprise Deploying Hadoop in the cloud Getting started with Apache Spark Step-by-step instructions walk students through common questions, issues, and tasks; Q-and-As, Quizzes, and Exercises build and test your knowledge; Did You Know? tips offer insider advice and shortcuts; and Watch Out! alerts help avoid pitfalls. By the time they're finished, they'll be comfortable using Apache Hadoop to solve a wide spectrum of Big Data problems.
Sams Teach Yourself Hadoop in 24 Hours
Author: Jeffrey Aven
Publisher:
ISBN: 9780134456737
Category : Apache Hadoop
Languages : en
Pages :
Book Description
Publisher:
ISBN: 9780134456737
Category : Apache Hadoop
Languages : en
Pages :
Book Description
Apache Spark in 24 Hours, Sams Teach Yourself
Author: Jeffrey Aven
Publisher: Sams Publishing
ISBN: 0134445821
Category : Computers
Languages : en
Pages : 1353
Book Description
Apache Spark is a fast, scalable, and flexible open source distributed processing engine for big data systems and is one of the most active open source big data projects to date. In just 24 lessons of one hour or less, Sams Teach Yourself Apache Spark in 24 Hours helps you build practical Big Data solutions that leverage Spark’s amazing speed, scalability, simplicity, and versatility. This book’s straightforward, step-by-step approach shows you how to deploy, program, optimize, manage, integrate, and extend Spark–now, and for years to come. You’ll discover how to create powerful solutions encompassing cloud computing, real-time stream processing, machine learning, and more. Every lesson builds on what you’ve already learned, giving you a rock-solid foundation for real-world success. Whether you are a data analyst, data engineer, data scientist, or data steward, learning Spark will help you to advance your career or embark on a new career in the booming area of Big Data. Learn how to • Discover what Apache Spark does and how it fits into the Big Data landscape • Deploy and run Spark locally or in the cloud • Interact with Spark from the shell • Make the most of the Spark Cluster Architecture • Develop Spark applications with Scala and functional Python • Program with the Spark API, including transformations and actions • Apply practical data engineering/analysis approaches designed for Spark • Use Resilient Distributed Datasets (RDDs) for caching, persistence, and output • Optimize Spark solution performance • Use Spark with SQL (via Spark SQL) and with NoSQL (via Cassandra) • Leverage cutting-edge functional programming techniques • Extend Spark with streaming, R, and Sparkling Water • Start building Spark-based machine learning and graph-processing applications • Explore advanced messaging technologies, including Kafka • Preview and prepare for Spark’s next generation of innovations Instructions walk you through common questions, issues, and tasks; Q-and-As, Quizzes, and Exercises build and test your knowledge; "Did You Know?" tips offer insider advice and shortcuts; and "Watch Out!" alerts help you avoid pitfalls. By the time you're finished, you'll be comfortable using Apache Spark to solve a wide spectrum of Big Data problems.
Publisher: Sams Publishing
ISBN: 0134445821
Category : Computers
Languages : en
Pages : 1353
Book Description
Apache Spark is a fast, scalable, and flexible open source distributed processing engine for big data systems and is one of the most active open source big data projects to date. In just 24 lessons of one hour or less, Sams Teach Yourself Apache Spark in 24 Hours helps you build practical Big Data solutions that leverage Spark’s amazing speed, scalability, simplicity, and versatility. This book’s straightforward, step-by-step approach shows you how to deploy, program, optimize, manage, integrate, and extend Spark–now, and for years to come. You’ll discover how to create powerful solutions encompassing cloud computing, real-time stream processing, machine learning, and more. Every lesson builds on what you’ve already learned, giving you a rock-solid foundation for real-world success. Whether you are a data analyst, data engineer, data scientist, or data steward, learning Spark will help you to advance your career or embark on a new career in the booming area of Big Data. Learn how to • Discover what Apache Spark does and how it fits into the Big Data landscape • Deploy and run Spark locally or in the cloud • Interact with Spark from the shell • Make the most of the Spark Cluster Architecture • Develop Spark applications with Scala and functional Python • Program with the Spark API, including transformations and actions • Apply practical data engineering/analysis approaches designed for Spark • Use Resilient Distributed Datasets (RDDs) for caching, persistence, and output • Optimize Spark solution performance • Use Spark with SQL (via Spark SQL) and with NoSQL (via Cassandra) • Leverage cutting-edge functional programming techniques • Extend Spark with streaming, R, and Sparkling Water • Start building Spark-based machine learning and graph-processing applications • Explore advanced messaging technologies, including Kafka • Preview and prepare for Spark’s next generation of innovations Instructions walk you through common questions, issues, and tasks; Q-and-As, Quizzes, and Exercises build and test your knowledge; "Did You Know?" tips offer insider advice and shortcuts; and "Watch Out!" alerts help you avoid pitfalls. By the time you're finished, you'll be comfortable using Apache Spark to solve a wide spectrum of Big Data problems.
Sams Teach Yourself ADO .NET in 24 Hours
Author: Jason Lefebvre
Publisher: Sams Publishing
ISBN: 9780672323836
Category : Computers
Languages : en
Pages : 414
Book Description
In 24 easy lessons, learn the new object model to retrieve and work with data from multiple sources.
Publisher: Sams Publishing
ISBN: 9780672323836
Category : Computers
Languages : en
Pages : 414
Book Description
In 24 easy lessons, learn the new object model to retrieve and work with data from multiple sources.
Sams Teach Yourself Node.js in 24 Hours
Author: George Ornbo
Publisher: Sams Publishing
ISBN: 0132966263
Category : Computers
Languages : en
Pages : 1029
Book Description
In just 24 sessions of one hour or less, Sams Teach Yourself Node.js in 24 Hours will help you master the Node.js platform and use it to build server-side applications with extraordinary speed and scalability. Using this text’s straightforward, step-by-step approach, you’ll move from basic installation, configuration, and programming all the way through real-time messaging between browser and server, testing and deployment. Every lesson and case-study application builds on what you’ve already learned, giving you a rock-solid foundation for real-world success! Step-by-step instructions carefully walk you through the most common Node.js development tasks. Quizzes and Exercises at the end of each chapter help you test your knowledge. By the Way notes present valuable additional information related to the discussion. Did You Know? tips offer advice or show you easier ways to perform tasks. Watch Out! cautions alert you to possible problems and give you advice on how to avoid them. Learn how to... · Create end-to-end applications entirely in JavaScript · Master essential Node.js concepts like callbacks and quickly create your first program · Create basic sites with the HTTP module and Express web framework · Manage data persistence with Node.js and MongoDB · Debug and test Node.js applications · Deploy Node.js applications to thirdparty services, such as Heroku and Nodester · Build powerful real-time solutions, from chat servers to Twitter clients · Create JSON APIs using JavaScript on the server · Use core components of the Node.js API, including processes, child processes, events, buffers, and streams · Create and publish a Node.js module
Publisher: Sams Publishing
ISBN: 0132966263
Category : Computers
Languages : en
Pages : 1029
Book Description
In just 24 sessions of one hour or less, Sams Teach Yourself Node.js in 24 Hours will help you master the Node.js platform and use it to build server-side applications with extraordinary speed and scalability. Using this text’s straightforward, step-by-step approach, you’ll move from basic installation, configuration, and programming all the way through real-time messaging between browser and server, testing and deployment. Every lesson and case-study application builds on what you’ve already learned, giving you a rock-solid foundation for real-world success! Step-by-step instructions carefully walk you through the most common Node.js development tasks. Quizzes and Exercises at the end of each chapter help you test your knowledge. By the Way notes present valuable additional information related to the discussion. Did You Know? tips offer advice or show you easier ways to perform tasks. Watch Out! cautions alert you to possible problems and give you advice on how to avoid them. Learn how to... · Create end-to-end applications entirely in JavaScript · Master essential Node.js concepts like callbacks and quickly create your first program · Create basic sites with the HTTP module and Express web framework · Manage data persistence with Node.js and MongoDB · Debug and test Node.js applications · Deploy Node.js applications to thirdparty services, such as Heroku and Nodester · Build powerful real-time solutions, from chat servers to Twitter clients · Create JSON APIs using JavaScript on the server · Use core components of the Node.js API, including processes, child processes, events, buffers, and streams · Create and publish a Node.js module
Sams Teach Yourself SAP in 24 Hours
Author: Tim Rhodes
Publisher: Pearson Education
ISBN: 0132715104
Category : Computers
Languages : en
Pages : 529
Book Description
Third Edition: Thoroughly Updated and Expanded, with Extensive New Coverage! In just 24 sessions of one hour or less, you’ll master the entire SAP project lifecycle, from planning through implementation and system administration through day-to-day operations. Using this book’s straightforward, step-by-step approach, you’ll gain a strong real-world foundation in both the technology and business essentials of today’s SAP products and applications—from the ground up. Step-by-step instructions walk you through the most common questions, issues, and tasks you’ll encounter with SAP. Case study-based exercises help you build and test your knowledge. By the Way notes present interesting pieces of information. Did You Know? tips offer advice or teach an easier way. Watch Out! cautions warn about potential problems. Learn how to... Understand SAP’s newest products for enterprises and small-to-midsize businesses, and choose the right solutions for your company Discover how SAP integrates with Web services and service-oriented architecture Develop an efficient roadmap for deploying SAP in your environment Plan your SAP implementation from business, functional, technical, and project management perspectives Leverage NetWeaver 7.0 features to streamline development and integration, and reduce cost Walk through a step-by-step SAP technical installation Master basic SAP system administration and operations Perform essential tasks such as logon, session management, and printing Build SAP queries and reports Prepare for SAP upgrades and enhancements Develop your own personal career as an SAP professional Register your book at informit.com/title/9780137142842 for convenient access to updates and corrections as they become available.
Publisher: Pearson Education
ISBN: 0132715104
Category : Computers
Languages : en
Pages : 529
Book Description
Third Edition: Thoroughly Updated and Expanded, with Extensive New Coverage! In just 24 sessions of one hour or less, you’ll master the entire SAP project lifecycle, from planning through implementation and system administration through day-to-day operations. Using this book’s straightforward, step-by-step approach, you’ll gain a strong real-world foundation in both the technology and business essentials of today’s SAP products and applications—from the ground up. Step-by-step instructions walk you through the most common questions, issues, and tasks you’ll encounter with SAP. Case study-based exercises help you build and test your knowledge. By the Way notes present interesting pieces of information. Did You Know? tips offer advice or teach an easier way. Watch Out! cautions warn about potential problems. Learn how to... Understand SAP’s newest products for enterprises and small-to-midsize businesses, and choose the right solutions for your company Discover how SAP integrates with Web services and service-oriented architecture Develop an efficient roadmap for deploying SAP in your environment Plan your SAP implementation from business, functional, technical, and project management perspectives Leverage NetWeaver 7.0 features to streamline development and integration, and reduce cost Walk through a step-by-step SAP technical installation Master basic SAP system administration and operations Perform essential tasks such as logon, session management, and printing Build SAP queries and reports Prepare for SAP upgrades and enhancements Develop your own personal career as an SAP professional Register your book at informit.com/title/9780137142842 for convenient access to updates and corrections as they become available.
Big Data Analytics with Microsoft HDInsight in 24 Hours, Sams Teach Yourself
Author: Manpreet Singh
Publisher: Sams Publishing
ISBN: 013403533X
Category : Computers
Languages : en
Pages : 1044
Book Description
Sams Teach Yourself Big Data Analytics with Microsoft HDInsight in 24 Hours In just 24 lessons of one hour or less, Sams Teach Yourself Big Data Analytics with Microsoft HDInsight in 24 Hours helps you leverage Hadoop’s power on a flexible, scalable cloud platform using Microsoft’s newest business intelligence, visualization, and productivity tools. This book’s straightforward, step-by-step approach shows you how to provision, configure, monitor, and troubleshoot HDInsight and use Hadoop cloud services to solve real analytics problems. You’ll gain more of Hadoop’s benefits, with less complexity–even if you’re completely new to Big Data analytics. Every lesson builds on what you’ve already learned, giving you a rock-solid foundation for real-world success. Practical, hands-on examples show you how to apply what you learn Quizzes and exercises help you test your knowledge and stretch your skills Notes and tips point out shortcuts and solutions Learn how to... · Master core Big Data and NoSQL concepts, value propositions, and use cases · Work with key Hadoop features, such as HDFS2 and YARN · Quickly install, configure, and monitor Hadoop (HDInsight) clusters in the cloud · Automate provisioning, customize clusters, install additional Hadoop projects, and administer clusters · Integrate, analyze, and report with Microsoft BI and Power BI · Automate workflows for data transformation, integration, and other tasks · Use Apache HBase on HDInsight · Use Sqoop or SSIS to move data to or from HDInsight · Perform R-based statistical computing on HDInsight datasets · Accelerate analytics with Apache Spark · Run real-time analytics on high-velocity data streams · Write MapReduce, Hive, and Pig programs Register your book at informit.com/register for convenient access to downloads, updates, and corrections as they become available.
Publisher: Sams Publishing
ISBN: 013403533X
Category : Computers
Languages : en
Pages : 1044
Book Description
Sams Teach Yourself Big Data Analytics with Microsoft HDInsight in 24 Hours In just 24 lessons of one hour or less, Sams Teach Yourself Big Data Analytics with Microsoft HDInsight in 24 Hours helps you leverage Hadoop’s power on a flexible, scalable cloud platform using Microsoft’s newest business intelligence, visualization, and productivity tools. This book’s straightforward, step-by-step approach shows you how to provision, configure, monitor, and troubleshoot HDInsight and use Hadoop cloud services to solve real analytics problems. You’ll gain more of Hadoop’s benefits, with less complexity–even if you’re completely new to Big Data analytics. Every lesson builds on what you’ve already learned, giving you a rock-solid foundation for real-world success. Practical, hands-on examples show you how to apply what you learn Quizzes and exercises help you test your knowledge and stretch your skills Notes and tips point out shortcuts and solutions Learn how to... · Master core Big Data and NoSQL concepts, value propositions, and use cases · Work with key Hadoop features, such as HDFS2 and YARN · Quickly install, configure, and monitor Hadoop (HDInsight) clusters in the cloud · Automate provisioning, customize clusters, install additional Hadoop projects, and administer clusters · Integrate, analyze, and report with Microsoft BI and Power BI · Automate workflows for data transformation, integration, and other tasks · Use Apache HBase on HDInsight · Use Sqoop or SSIS to move data to or from HDInsight · Perform R-based statistical computing on HDInsight datasets · Accelerate analytics with Apache Spark · Run real-time analytics on high-velocity data streams · Write MapReduce, Hive, and Pig programs Register your book at informit.com/register for convenient access to downloads, updates, and corrections as they become available.
Learning Spark
Author: Jules S. Damji
Publisher: O'Reilly Media
ISBN: 1492050016
Category : Computers
Languages : en
Pages : 400
Book Description
Data is bigger, arrives faster, and comes in a variety of formats—and it all needs to be processed at scale for analytics or machine learning. But how can you process such varied workloads efficiently? Enter Apache Spark. Updated to include Spark 3.0, this second edition shows data engineers and data scientists why structure and unification in Spark matters. Specifically, this book explains how to perform simple and complex data analytics and employ machine learning algorithms. Through step-by-step walk-throughs, code snippets, and notebooks, you’ll be able to: Learn Python, SQL, Scala, or Java high-level Structured APIs Understand Spark operations and SQL Engine Inspect, tune, and debug Spark operations with Spark configurations and Spark UI Connect to data sources: JSON, Parquet, CSV, Avro, ORC, Hive, S3, or Kafka Perform analytics on batch and streaming data using Structured Streaming Build reliable data pipelines with open source Delta Lake and Spark Develop machine learning pipelines with MLlib and productionize models using MLflow
Publisher: O'Reilly Media
ISBN: 1492050016
Category : Computers
Languages : en
Pages : 400
Book Description
Data is bigger, arrives faster, and comes in a variety of formats—and it all needs to be processed at scale for analytics or machine learning. But how can you process such varied workloads efficiently? Enter Apache Spark. Updated to include Spark 3.0, this second edition shows data engineers and data scientists why structure and unification in Spark matters. Specifically, this book explains how to perform simple and complex data analytics and employ machine learning algorithms. Through step-by-step walk-throughs, code snippets, and notebooks, you’ll be able to: Learn Python, SQL, Scala, or Java high-level Structured APIs Understand Spark operations and SQL Engine Inspect, tune, and debug Spark operations with Spark configurations and Spark UI Connect to data sources: JSON, Parquet, CSV, Avro, ORC, Hive, S3, or Kafka Perform analytics on batch and streaming data using Structured Streaming Build reliable data pipelines with open source Delta Lake and Spark Develop machine learning pipelines with MLlib and productionize models using MLflow
High Performance Spark
Author: Holden Karau
Publisher: "O'Reilly Media, Inc."
ISBN: 1491943173
Category : Computers
Languages : en
Pages : 356
Book Description
Apache Spark is amazing when everything clicks. But if you haven’t seen the performance improvements you expected, or still don’t feel confident enough to use Spark in production, this practical book is for you. Authors Holden Karau and Rachel Warren demonstrate performance optimizations to help your Spark queries run faster and handle larger data sizes, while using fewer resources. Ideal for software engineers, data engineers, developers, and system administrators working with large-scale data applications, this book describes techniques that can reduce data infrastructure costs and developer hours. Not only will you gain a more comprehensive understanding of Spark, you’ll also learn how to make it sing. With this book, you’ll explore: How Spark SQL’s new interfaces improve performance over SQL’s RDD data structure The choice between data joins in Core Spark and Spark SQL Techniques for getting the most out of standard RDD transformations How to work around performance issues in Spark’s key/value pair paradigm Writing high-performance Spark code without Scala or the JVM How to test for functionality and performance when applying suggested improvements Using Spark MLlib and Spark ML machine learning libraries Spark’s Streaming components and external community packages
Publisher: "O'Reilly Media, Inc."
ISBN: 1491943173
Category : Computers
Languages : en
Pages : 356
Book Description
Apache Spark is amazing when everything clicks. But if you haven’t seen the performance improvements you expected, or still don’t feel confident enough to use Spark in production, this practical book is for you. Authors Holden Karau and Rachel Warren demonstrate performance optimizations to help your Spark queries run faster and handle larger data sizes, while using fewer resources. Ideal for software engineers, data engineers, developers, and system administrators working with large-scale data applications, this book describes techniques that can reduce data infrastructure costs and developer hours. Not only will you gain a more comprehensive understanding of Spark, you’ll also learn how to make it sing. With this book, you’ll explore: How Spark SQL’s new interfaces improve performance over SQL’s RDD data structure The choice between data joins in Core Spark and Spark SQL Techniques for getting the most out of standard RDD transformations How to work around performance issues in Spark’s key/value pair paradigm Writing high-performance Spark code without Scala or the JVM How to test for functionality and performance when applying suggested improvements Using Spark MLlib and Spark ML machine learning libraries Spark’s Streaming components and external community packages