Practical Machine Learning: 2-Day Intensive

OVERVIEW

The “Practical Machine Learning” course is a 2-day, on-site training event focused on helping teams build robust, high-performing machine learning applications using Python’s powerful data ecosystem.

Teams will learn best practices for building, evaluating and deploying scalable data services using Python while exploring existing software libraries to help them save time and avoid reinventing the wheel.

The course will particularly appeal to programmers and scientists who seek to improve their proficiency in Python and streamline the data modelling process.

All exercises and discussions will be adjusted to reflect the problems currently facing your business.

A combination of teaching and hands-on programming exercises gives learners the opportunity to apply, test and refine their knowledge, improving retention and building confidence through real-time feedback.

Who Should Attend?

The course is designed for developers, but is also suitable for engineers, analysts and data scientists with a basic understanding of Python and previous programming experience.

WHAT TEAMS WILL LEARN

By the end of this training, your team will have the skills required to work with realistic, end-to-end data mining pipelines. Participants will be able to:

  • Understand the available ecosystem of Python data tools, including when to use which tools and how they relate
  • Write robust data pipelines using industry best practices
  • Gain a deeper understanding of popular Python tools, such as pandas, scikit-learn and gensim
  • Communicate results to stakeholders and provide clear, meaningful insights from their data
  • Optimise pipeline performance (both CPU and memory)
  • Integrate with non-Python data mining tools and services

Prerequisites and Recommended Background


Attendees are expected to be familiar with basic programming concepts and terminology: the command line and shell, filesystem navigation, basic data structures such as lists and dictionaries, basic algorithms, and basic Python syntax.


In addition, participants are expected to download the data and install the software libraries listed in our “Before You Arrive – Setup Sheet” in advance of the session. Delays due to installation issues on-site may adversely affect the day’s training schedule.


Each participant must have their own laptop, with a system that supports Python (OS X, Linux, Windows…), to participate in the interactive exercises throughout training.

CLASS SYLLABUS

Please note: This syllabus can be customised. We are happy to tailor course content and exercises to your specific needs, projects or areas of focus.

DAY 1 

  • Course Introduction
    • Administration, setup and course materials
    • Course structure and agenda
    • Participant and trainer introductions
  • Session 1: Data Exploration
    • Data cleanup & the garbage-in, garbage-out principle
    • Tips and tools for efficient preprocessing, data streaming and out-of-core processing, generators, lazy pipelines
    • Pandas: data frames, filtering, indexing
    • Interactive data visualisations: matplotlib, seaborn
  • Interactive Programming Exercise
  • Session 2: Feature Engineering
    • Understanding data as vectors, NumPy, SciPy
    • Text processing ecosystem: NLTK, spaCy, gensim
    • Vector representations and topic modelling: TF-IDF, LSI, LDA, NNMF, word2vec
    • Common issues and improvements
  • Interactive Programming Exercise
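
To give a flavour of the Session 1 topics of data streaming, generators and lazy pipelines, here is a minimal sketch using only the Python standard library. The function names and the sample records are hypothetical illustrations, not taken from the course materials. Each stage is a generator, so rows are processed one at a time rather than loaded into memory all at once.

```python
import csv
import io
from typing import Dict, Iterable, Iterator

# Hypothetical sample data standing in for a large file on disk.
RAW = "id,text\n1,  Hello World\n2,\n3,Lazy pipelines scale\n"


def read_rows(handle: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one CSV row at a time instead of reading the whole input."""
    yield from csv.DictReader(handle)


def drop_empty(rows: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Skip rows whose text field is blank (a simple cleanup step)."""
    for row in rows:
        if row["text"].strip():
            yield row


def normalise(rows: Iterable[Dict[str, str]]) -> Iterator[dict]:
    """Convert ids to integers and tidy up the text."""
    for row in rows:
        yield {"id": int(row["id"]), "text": row["text"].strip().lower()}


def build_pipeline(handle: Iterable[str]) -> Iterator[dict]:
    """Chain the stages; nothing runs until the result is iterated."""
    return normalise(drop_empty(read_rows(handle)))


# Iterating the pipeline pulls rows through every stage on demand.
records = list(build_pipeline(io.StringIO(RAW)))
```

In real use the `io.StringIO` object would typically be replaced by an open file handle, so the same pipeline can work through inputs larger than available memory.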

DAY 2

  • Session 3: Pipelines
    • Building robust systems for classification and regression: scikit-learn
    • Evaluating pipeline quality, scoring, multiclass aggregation
    • Quality tuning: cross-validation, parameter search
    • Advanced features: feature union, multilabel classification
    • Performance tips and gotchas
  • Interactive Programming Exercise
  • Session 4: Error Analysis
    • Validation curves
    • Model introspection, looking “inside the black box”, explaining classifications to stakeholders
    • Improving quality via system iterations
    • Unbiased sample collection for realistic performance estimates
  • Interactive Programming Exercise
  • Session 5: Production Models & APIs
    • Model persistence
    • Microservices, web APIs: Flask, CherryPy
    • Pipeline orchestration: Luigi
    • Performance and scaling tips
  • Interactive Programming Exercise
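
As an illustration of the cross-validation idea listed under Session 3, here is a minimal sketch written with the standard library only. In practice a library such as scikit-learn would normally handle the splitting and scoring; the helper names, the labels and the majority-class "model" below are hypothetical and chosen purely to keep the example self-contained.

```python
from collections import Counter
from typing import Iterator, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k contiguous, non-overlapping folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread any remainder over the first folds so sizes differ by at most one.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop


def majority_baseline_accuracy(labels: Sequence[str], k: int) -> float:
    """Mean held-out accuracy of always predicting the most common training label."""
    scores = []
    for train, test in k_fold_indices(len(labels), k):
        majority = Counter(labels[i] for i in train).most_common(1)[0][0]
        correct = sum(labels[i] == majority for i in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


# Hypothetical labels, for illustration only.
labels = ["ham", "ham", "spam", "ham", "ham", "spam", "ham", "ham", "ham"]
folds = list(k_fold_indices(len(labels), 3))
accuracy = majority_baseline_accuracy(labels, 3)
```

Each sample is held out exactly once, so the averaged score reflects performance on data the baseline never saw during fitting. Real data would usually be shuffled or stratified before splitting, which this sketch omits.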

INTERESTED IN THIS COURSE?

Contact us today to discuss available dates and how we can customise this training for your team.