Working with Big Data LiveLessons: Infrastructure, Algorithms, and Visualizations

Course Description

Working with Big Data: Infrastructure, Algorithms, and Visualizations from Pearson presents a high-level overview of big data and how to use key tools to solve your data challenges. This introduction to the three areas of big data includes:

    • Infrastructure – how to store and process big data
    • Algorithms – how to integrate algorithms into your big data stack, with an introduction to classification
    • Visualizations – an introduction to creating visualizations in JavaScript with D3.js

The goal is not to be exhaustive, but rather to provide a high-level view of how all the pieces of a big data architecture work together.

Certificate Info:

Type of Certification

Certificate of Completion

Format of Certification

Digital and Print

Professional Association/Affiliation

This certificate is issued by Pearson LearnIT.

Method of Obtaining Certification

Upon successful completion of the course, participants will receive a certificate of completion.

Course Outline

  • Introduction to Working with Big Data LiveLessons
  • What is Big Data?
  • Learning objectives
  • Set up a basic Hadoop installation
  • Write data into the Hadoop file system
  • Write a Hadoop streaming job to process text files
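To give a flavor of the streaming lessons above, here is a minimal word-count mapper and reducer in Ruby. This is an illustrative sketch, not the course's own code; in a real Hadoop streaming job each function would read lines from STDIN and write tab-separated key/value lines to STDOUT.

```ruby
# Hypothetical word-count mapper for a Hadoop streaming job: turns one
# line of raw text into "word<TAB>1" pairs.
def map_line(line)
  line.downcase.scan(/[a-z']+/).map { |word| "#{word}\t1" }
end

# The matching reducer: Hadoop sorts mapper output by key, so all pairs
# for one word arrive consecutively and can be summed in a single pass.
def reduce_pairs(pairs)
  counts = Hash.new(0)
  pairs.each do |pair|
    word, n = pair.split("\t")
    counts[word] += n.to_i
  end
  counts
end

# In a real job these would be wired to STDIN/STDOUT; here we drive
# them with an in-memory sample:
pairs = ["big data", "big ideas"].flat_map { |line| map_line(line) }
reduce_pairs(pairs.sort)  # => {"big"=>2, "data"=>1, "ideas"=>1}
```

Hadoop streaming runs the mapper and reducer as external processes, which is what lets the course use Ruby scripts rather than Java for its jobs.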
  • Learning objectives
  • Set up a basic Cassandra installation
  • Create a Cassandra schema for storing data
  • Store and retrieve data from Cassandra using the Ruby library
  • Write data into Cassandra from a Hadoop streaming job
  • Use the Hadoop reduce phase to parallelize writes
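The schema work in these lessons might look something like the following CQL (the keyspace, table, and column names here are hypothetical, chosen only to illustrate the pattern of a partition key plus a clustering column for time-ordered reads):

```sql
-- Hypothetical keyspace and table of the kind the Cassandra lessons build
CREATE KEYSPACE IF NOT EXISTS events
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

CREATE TABLE IF NOT EXISTS events.page_views (
  user_id   text,
  viewed_at timestamp,
  url       text,
  PRIMARY KEY (user_id, viewed_at)
);
```

From Ruby, statements like these can be executed through a Cassandra client library, which is how the course's streaming jobs write their output into the store.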
  • Learning objectives
  • Set up the Kafka messaging system
  • Publish and consume data from Kafka in Ruby
  • Aggregate log files into Hadoop using Kafka and a Ruby consumer
  • Create horizontally scalable message consumers
  • Sample messages using Kafka’s partitioning
  • Create redundant message consumers for high availability
  • Learning objectives
  • Grasp the concepts of machine learning and implement the k-nearest neighbors algorithm
  • Understand the basics of distance metrics and implement Euclidean distance and cosine similarity
  • Transform raw data into a matrix and convert a text document into the vector space model
  • Use k-nearest neighbors to make predictions
  • Improve execution time by reducing the search space
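The distance-metric and k-nearest-neighbors lessons can be sketched in a few lines of Ruby. The data and method names below are illustrative, not the course's implementation:

```ruby
# Euclidean distance between two numeric vectors.
def euclidean_distance(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# Cosine similarity: the angle between two vectors, ignoring magnitude —
# useful when documents are represented in the vector space model.
def cosine_similarity(a, b)
  dot  = a.zip(b).sum { |x, y| x * y }
  norm = ->(v) { Math.sqrt(v.sum { |x| x * x }) }
  dot / (norm.call(a) * norm.call(b))
end

# Predict the label of `query` by majority vote among the k nearest
# labeled examples.
def knn_predict(examples, query, k: 3)
  nearest = examples.min_by(k) { |vec, _label| euclidean_distance(vec, query) }
  votes = nearest.group_by { |_vec, label| label }
  votes.max_by { |_label, members| members.size }.first
end

training = [
  [[1.0, 1.0], :spam], [[1.2, 0.9], :spam],
  [[5.0, 5.0], :ham],  [[5.1, 4.8], :ham], [[4.9, 5.2], :ham]
]
knn_predict(training, [5.0, 4.9])  # => :ham
```

The brute-force search here scans every example per query; the lesson on reducing the search space addresses exactly that cost.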
  • Learning objectives
  • Use cross validation to test a predictive model
  • Integrate a trained model into production
  • Version a model and track feedback data
  • Write a test harness to compare versioned models
  • Test new predictive models in production
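Cross validation, as covered in these lessons, can be sketched as follows. The data and the stand-in 1-nearest-neighbor model below are hypothetical, used only to show the fold/train/score loop:

```ruby
# Stand-in model: 1-nearest neighbor by squared Euclidean distance.
def predict(train, query)
  train.min_by { |vec, _label| vec.zip(query).sum { |x, y| (x - y)**2 } }.last
end

# k-fold cross validation: split the labeled examples into k folds,
# hold each fold out in turn, train on the rest, and average accuracy.
def cross_validate(examples, k: 5)
  folds = examples.each_slice((examples.size / k.to_f).ceil).to_a
  scores = folds.map do |test_fold|
    train = (folds - [test_fold]).flatten(1)
    correct = test_fold.count { |vec, label| predict(train, vec) == label }
    correct.to_f / test_fold.size
  end
  scores.sum / scores.size
end

data = [[[0.0], :a], [[10.0], :b], [[1.0], :a], [[11.0], :b]]
cross_validate(data, k: 2)  # => 1.0
```

Because every example is scored exactly once while held out of training, the averaged accuracy is an honest estimate to compare versioned models against before promoting one to production.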
  • Learning objectives
  • Prepare raw data for use in visualizations
  • Use core functions of the D3 JavaScript visualization toolkit
  • Use D3 to create a bar chart
  • Use D3 to create a time series
