Udacity UD032: Data Wrangling with MongoDB

Data Wrangling with MongoDB
Instructor: Shannon Bradshaw, Udacity
Period: from February 2014
Status: sampled briefly

Note: it doesn't look bad, and I plan to finish it when I get the chance.


Course Syllabus

Lesson 1: Data Extraction Fundamentals
Assessing the Quality of Data
Intro to Tabular Formats
Parsing CSV
Parsing XLS with XLRD
Intro to JSON
Using Web APIs
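
As a rough illustration of the CSV and JSON topics in this lesson, here is a minimal sketch using only Python's standard library. The sample data and the function name `parse_csv` are made up for illustration and are not taken from the course material.

```python
import csv
import io
import json

# Hypothetical sample data; a real exercise would read a file from disk.
CSV_TEXT = "name,year\nAlpha,2001\nBeta,\n"


def parse_csv(text):
    """Return each CSV row as a plain dict keyed by the header line."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


rows = parse_csv(CSV_TEXT)

# Round-trip the rows through JSON, the format many web APIs return.
as_json = json.dumps(rows)
restored = json.loads(as_json)
```

Note that every CSV value arrives as a string, and an empty cell becomes an empty string, which is exactly the kind of thing the data quality lesson deals with later.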

Lesson 2: Data in More Complex Formats
Intro to XML
XML Design Principles
Parsing XML
Web Scraping
Parsing HTML
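
For the XML parsing topic, a small sketch with `xml.etree.ElementTree` from the standard library could look like this. The document structure (`articles`, `article`, `title`, `author`) is an invented example, not the data set used in the course.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for illustration.
XML_TEXT = """<articles>
  <article id="1"><title>First</title><author>Ann</author></article>
  <article id="2"><title>Second</title></article>
</articles>"""


def extract_articles(xml_text):
    """Turn each <article> element into a dict of its id, title and author."""
    root = ET.fromstring(xml_text)
    result = []
    for article in root.findall("article"):
        result.append({
            "id": article.get("id"),                # attribute lookup
            "title": article.findtext("title"),     # text of child element
            "author": article.findtext("author"),   # None if the child is missing
        })
    return result


articles = extract_articles(XML_TEXT)
```

The second article has no `author` element, so its `author` value is `None` rather than raising an error.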

Lesson 3: Data Quality
What is Data Cleaning?
Sources of Dirty Data
Measuring Data Quality
A Blueprint for Cleaning
Auditing Validity
Auditing Accuracy
Auditing Completeness
Auditing Consistency
Auditing Uniformity
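
The auditing topics can be pictured as small checks that count how many records pass a rule. The sketch below audits validity and completeness of a single field; the field name `year`, the four-digit rule and the function `audit_year` are assumptions chosen for illustration.

```python
import re

# Assumed validity rule: a year is exactly four digits.
YEAR_RE = re.compile(r"\d{4}")


def audit_year(records, field="year"):
    """Count valid and missing values and collect the invalid ones."""
    report = {"valid": 0, "missing": 0, "invalid": []}
    for record in records:
        value = record.get(field)
        if value is None or value == "":
            report["missing"] += 1            # completeness problem
        elif YEAR_RE.fullmatch(value):
            report["valid"] += 1
        else:
            report["invalid"].append(value)   # validity problem
    return report


# Hypothetical records, e.g. rows as returned by csv.DictReader.
SAMPLE = [{"year": "1999"}, {"year": ""}, {"year": "19xx"}, {}]
report = audit_year(SAMPLE)
```

Collecting the invalid values, rather than only counting them, makes it easier to decide on a cleaning rule afterwards.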

Lesson 4: Working with MongoDB
Data Modelling in MongoDB
Introduction to PyMongo
Field Queries
Projection Queries
Getting Data into MongoDB
Using mongoimport
Operators like $gt, $lt, $exists, $regex
Querying Arrays and using $in and $all Operators
Changing entries: update with $set and $unset
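
To make the query and update topics more concrete, here is a sketch of how such filters look as Python dictionaries for PyMongo. The field names (`year`, `population`, `name`, `tags`, `country`) and the helper functions are hypothetical. The `collection` argument is assumed to be a PyMongo collection obtained elsewhere, for example from `pymongo.MongoClient`, so the block itself needs no running server.

```python
# Field query with comparison operators: 2000 < year < 2010.
RANGE_FILTER = {"year": {"$gt": 2000, "$lt": 2010}}

# Documents that have a "population" field at all.
EXISTS_FILTER = {"population": {"$exists": True}}

# Names starting with "New".
REGEX_FILTER = {"name": {"$regex": "^New"}}

# Array queries: $in matches any listed value, $all requires every one.
IN_FILTER = {"tags": {"$in": ["capital", "port"]}}
ALL_FILTER = {"tags": {"$all": ["capital", "port"]}}

# Projection: return only the "name" field, without "_id".
NAME_ONLY = {"_id": 0, "name": 1}


def find_names(collection, query):
    """Run a field query with a projection and return the names found."""
    return [doc["name"] for doc in collection.find(query, NAME_ONLY)]


def set_country(collection, country):
    """Add a country to documents lacking one, then drop an old field."""
    collection.update_many({"country": {"$exists": False}},
                           {"$set": {"country": country}})
    collection.update_many({}, {"$unset": {"legacy_code": ""}})
```

With a real connection, a call such as `find_names(db.cities, REGEX_FILTER)` would return the matching names, where `db.cities` is again only an example collection.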

Lesson 5: Analyzing Data
Examples of Aggregation Framework
The Aggregation Pipeline
Aggregation Operators: $match, $project, $unwind, $group
Multiple Stages Using a Given Operator
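
An aggregation pipeline is a list of stages that documents pass through in order. The sketch below uses the four operators named above and applies `$match` twice to show a repeated stage. The `tags` field and the function `tag_counts` are invented for illustration, and `collection` is again assumed to be a PyMongo collection created elsewhere.

```python
# Each stage receives the output of the previous one.
PIPELINE = [
    {"$match": {"tags": {"$exists": True}}},             # keep tagged documents
    {"$unwind": "$tags"},                                # one document per tag
    {"$group": {"_id": "$tags", "count": {"$sum": 1}}},  # count per tag
    {"$project": {"_id": 0, "tag": "$_id", "count": 1}}, # rename _id to tag
    {"$match": {"count": {"$gt": 1}}},                   # second $match stage
]


def tag_counts(collection):
    """Run the pipeline and return the resulting documents as a list."""
    return list(collection.aggregate(PIPELINE))
```

The first `$match` filters the input documents, while the second one filters the grouped results, which is why the same operator appears in two places.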

Lesson 6: Case Study – OpenStreetMap Data
Using iterative parsing for large data files
OpenStreetMap XML Overview
Exercises around OpenStreetMap data
Final Project Instructions
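
Iterative parsing for large files can be sketched with `xml.etree.ElementTree.iterparse`, which yields elements one at a time instead of loading the whole tree. The tiny OSM-like sample and the function `count_tags` are assumptions for illustration; a real OpenStreetMap extract would be passed as a file name or file object.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical miniature of an OpenStreetMap XML file.
OSM_SAMPLE = b"""<osm>
  <node id="1" lat="0.0" lon="0.0"><tag k="amenity" v="cafe"/></node>
  <node id="2" lat="1.0" lon="1.0"/>
  <way id="3"><nd ref="1"/><nd ref="2"/><tag k="highway" v="residential"/></way>
</osm>"""


def count_tags(source):
    """Count how often each element name occurs, one element at a time."""
    counts = Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        counts[elem.tag] += 1
        elem.clear()  # drop children and attributes to keep memory low
    return counts


counts = count_tags(io.BytesIO(OSM_SAMPLE))
```

Counting element names first is a cheap way to get a feel for a large file before writing the code that reshapes it.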