Practical Machine Learning training

Practical Machine Learning: 2-Day Intensive

OVERVIEW

The “Practical Machine Learning” course is a 2-day, on-site training event focused on helping teams build robust, high-performing machine learning applications using Python’s powerful data ecosystem.

Teams will learn best practices for building, evaluating and deploying scalable data services in Python, and will explore existing software libraries that save time and avoid reinventing the wheel.

The course will particularly appeal to programmers and scientists who seek to improve their proficiency in Python and streamline the data modelling process.

All exercises and discussions will be adjusted to reflect the problems currently facing your business.

A combination of teaching and hands-on programming exercises gives learners the opportunity to apply, test and refine their knowledge, improving retention and building confidence through real-time feedback.

Who Should Attend?

The course is designed for developers, but is also suitable for engineers, analysts and data scientists with a basic understanding of Python and previous programming experience.

WHAT TEAMS WILL LEARN

By the end of this training, your team will have the skills required for working with realistic, end-to-end data mining pipelines. Participants will be able to…

- Understand the available ecosystem of Python data tools, including when to use which tool and how they relate to one another
- Write robust data pipelines using industry best practices
- Gain a deeper understanding of popular Python tools, such as pandas, scikit-learn and gensim
- Communicate results to stakeholders and provide clear, meaningful insights from their data
- Optimise pipeline performance (both CPU and memory)
- Integrate with non-Python data mining tools and services

Prerequisites and Recommended Background

Attendees are expected to be familiar with basic programming concepts and terminology (command line, shell, filesystem navigation, basic data structures such as lists and dictionaries, basic algorithms, and basic Python syntax).

In addition, participants will be asked to download and install the required data and software libraries in advance of the session, as described in our “Before You Arrive – Setup Sheet”. Installation issues on-site may delay the day’s training schedule.

Each participant should have their own laptop, with a system that supports Python (OS X, Linux, Windows…), to participate in the interactive exercises throughout the workshop.

CLASS SYLLABUS

Please note: This syllabus can be customised to your specific needs, projects or areas of focus, and we are happy to tailor course content and exercises accordingly.

DAY 1

Course Introduction
- Administration, setup and course materials
- Course structure and agenda
- Participant and trainer introductions
Session 1: Data Exploration
- Data cleanup & the garbage-in, garbage-out principle
- Tips and tools for efficient preprocessing, data streaming and out-of-core processing, generators, lazy pipelines
- pandas: data frames, filtering, indexing
- Interactive data visualisations: matplotlib, seaborn
Interactive Programming Exercise
Session 2: Feature Engineering
- Understanding data as vectors, NumPy, SciPy
- Text processing ecosystem: NLTK, spaCy, gensim
- Vector representations and topic modelling: TF-IDF, LSI, LDA, NNMF, word2vec
- Common issues and improvements
Interactive Programming Exercise

DAY 2

Session 3: Pipelines
- Building robust systems for classification and regression: scikit-learn
- Evaluating pipeline quality, scoring, multiclass aggregation
- Quality tuning: cross-validation, parameter search
- Advanced features: feature union, multilabel classification
- Performance tips and gotchas
Interactive Programming Exercise
Session 4: Error Analysis
- Validation curves
- Model introspection, looking “inside the black box”, explaining classifications to stakeholders
- Improving quality via system iterations
- Unbiased sample collection for realistic performance estimates
Interactive Programming Exercise
Session 5: Production Models & APIs
- Model persistence
- Microservices, web APIs: Flask, CherryPy
- Pipeline orchestration: Luigi
- Performance and scaling tips
Interactive Programming Exercise
