Content

Introduction

Understand the concepts of “data science” and “big data”. Program in Python.

Use Python tools for exploratory analysis and reproducible research.

Data Gathering, Cleaning & Storage

Work with various data and file formats. Use tools for web data scraping. Write scripts for data.

Extract features from unstructured data. Understand and use tools for representing natural language data.

Bayesian Statistics

Learn about representing states of the world in terms of degrees of belief. Identify prior beliefs about what results are likely for a problem, and then update those according to the data we collect. Assess a model using Bayesian inference.
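The prior-to-posterior update described above can be illustrated with a minimal sketch. It assumes a Beta prior over a success probability and binomial data, a conjugate pair chosen here only because the update is simple arithmetic. The function names and the counts are hypothetical and not part of the course material.

```python
def update_beta(prior_alpha, prior_beta, successes, failures):
    """Return the Beta posterior parameters after observing binomial data.

    With a Beta(alpha, beta) prior, the posterior is
    Beta(alpha + successes, beta + failures).
    """
    return prior_alpha + successes, prior_beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical example: start from a uniform Beta(1, 1) prior,
# then observe 7 successes and 3 failures.
alpha, beta = update_beta(1.0, 1.0, successes=7, failures=3)
posterior_mean = beta_mean(alpha, beta)
print(alpha, beta, round(posterior_mean, 3))  # 8.0 4.0 0.667
```

The prior mean of 0.5 moves toward the observed success rate of 0.7; with more data the posterior would move closer still.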

Recommenders

Identify user-based and item-based collaborative filtering techniques. Given a scenario, develop a collaborative filtering implementation.
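As a rough illustration of the user-based variant, the sketch below predicts a missing rating as a similarity-weighted average of other users' ratings. The ratings, the user names, and the choice of cosine similarity over co-rated items are all assumptions made for the example; an item-based version would compare item columns instead of user rows.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: rating}
ratings = {
    "ana": {"A": 5, "B": 3, "C": 4},
    "ben": {"A": 4, "B": 2, "C": 5, "D": 4},
    "cat": {"A": 1, "B": 5, "D": 2},
}


def cosine_similarity(u, v):
    """Cosine similarity computed over the items both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def predict(user, item, ratings):
    """Similarity-weighted average of other users' ratings for `item`."""
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        num += sim * other_ratings[item]
        den += abs(sim)
    return num / den if den else None


prediction = predict("ana", "D", ratings)
print(prediction)
```

Here ana's predicted rating for item D falls between ben's and cat's ratings of it, and closer to ben's, because ana's existing ratings resemble ben's more than cat's.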

Graph Analysis

Explore and visualize networks. Analyze network data.

Cloud Computing

An industry perspective on why cloud computing is a key tool for data science and big data at scale, and how to get started. Covers the path from exploratory analysis and modeling up to operationalization (MLOps), leveraging different public cloud services.

Data Science Basic Toolbox

Use the Python ecosystem for numerical computation and tabular data analysis. Use visualization techniques for data exploration. Use software development and engineering tools.

Machine Learning

Understand different approaches to machine learning. Describe the steps for developing a classifier. Use model selection and model evaluation techniques. Recognize appropriate uses of linear models.

Recognize appropriate uses of Naïve Bayes and Random Forests. Evaluate the use of ensemble methods. Search for the best set of features. Use dimensionality reduction for data exploration and representation. Define clustering and identify appropriate use cases.

Deep Learning

Learn the basics of deep learning and toolboxes such as Keras and TensorFlow.

Capstone Project

An important part of the course is the IPython process notebook. This notebook details your steps in developing a solution to a real problem, including how you collected the data, the alternative solutions you tried, the statistical methods you used, and the insights you gained.