Topic Modelling Training | RARE Technologies

Topic Modelling: 1-Day Intensive

OVERVIEW

The “Topic Modelling” 1-Day Intensive teaches teams how to extract information from unstructured, plain text documents using Python’s powerful data ecosystem.

Teams are taught smart, efficient practices for building, improving and deploying scalable natural language processing (NLP) systems in Python, drawing on existing software libraries rather than reinventing the wheel.

All course materials can be customised to focus on your business’s real challenges and products in development.

A combination of teaching and hands-on programming exercises will give learners the opportunity to apply, test and refine their knowledge, improving retention and building confidence through real-time feedback.

Who Should Attend?

The course will appeal to programmers and scientists who seek to improve their proficiency in Python and streamline the NLP process.

WHAT TEAMS WILL LEARN

By the end of this 1-day intensive, participants will have the necessary skills to:

Write robust document processing pipelines using best industry practices
Understand capabilities and limitations of existing NLP tools and algorithms for semantic analysis
Apply algorithms for entity extraction, chunking, semantic indexing and document retrieval
Communicate modelling results to stakeholders and provide meaningful insights into the data
Optimise the CPU and memory usage of topic modelling systems, whether already deployed or still in development
Integrate with non-Python data mining tools and services

RECOMMENDED BACKGROUND

Attendees are expected to be familiar with basic programming concepts and terminology (command line, shell, filesystem navigation), basic data structures such as lists and dictionaries, and basic Python syntax.

In addition…

Each participant must have their own laptop, with a system that supports Python (OS X, Linux, Windows…), to participate in the interactive exercises throughout training.
Every participant is expected to have downloaded and installed the necessary software libraries, as instructed by RaRe’s “Before You Arrive – Setup Sheet” in advance. Delays due to installation issues on-site may affect the day’s training schedule.

CLASS SYLLABUS

Please note: This syllabus, including course content and exercises, can be tailored to your specific needs, projects or areas of focus.

DAY 1

Course Introduction
- Administration, setup and course materials distribution
- Course structure and agenda
- Participants and trainer introductions
Session 1: Text Processing
- NLP ecosystem: gensim, NLTK, spaCy
- Streamed corpora: generators, lazy processing
- Semantic text transformations: LSI, LDA, word2vec, doc2vec
- Named entity extraction, entity linking, knowledge bases (KBs)
- Model quality evaluation and tuning
- Performance gotchas and tips
Interactive Programming Exercise
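
To give a flavour of the "streamed corpora" topic in Session 1, here is a minimal sketch of lazy document processing with a plain Python generator. It is an illustration only, not taken from the course materials: the function names (stream_tokens, count_terms), the regular-expression tokeniser and the sample lines are all assumptions, and it uses only the standard library rather than gensim, NLTK or spaCy.

```python
import re
from collections import Counter
from typing import Iterable, Iterator, List

# Hypothetical tokeniser: lowercase alphabetic runs only.
TOKEN_RE = re.compile(r"[a-z]+")


def stream_tokens(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield one tokenised document per input line, one at a time."""
    for line in lines:
        tokens = TOKEN_RE.findall(line.lower())
        if tokens:  # skip lines that contain no tokens
            yield tokens


def count_terms(docs: Iterable[List[str]]) -> Counter:
    """Consume a stream of tokenised documents and count term frequencies."""
    counts: Counter = Counter()
    for tokens in docs:
        counts.update(tokens)
    return counts


# Made-up sample input; any iterable of strings works, including an open file.
sample_lines = [
    "Topic models summarise large text collections.",
    "",
    "Generators read one document at a time.",
]
term_counts = count_terms(stream_tokens(sample_lines))
```

Because stream_tokens accepts any iterable of strings, the same code could be pointed at an open file object, so that only one line is tokenised at a time rather than loading the whole corpus up front.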
Session 2: Indexing and Retrieval
- Indexing documents
- Retrieving related documents with semantic queries
- Scaling up, approximate document search
- Searching Wikipedia
Interactive Programming Exercise
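
As a rough illustration of the indexing and retrieval ideas in Session 2, the sketch below ranks documents against a query by cosine similarity over raw word counts. This is a deliberately simplified stand-in, not the semantic (LSI/LDA-based) indexing covered in the course, and every name in it (bow, cosine, build_index, query, the sample documents) is hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def bow(text: str) -> Counter:
    """Turn text into a bag-of-words vector (whitespace tokenisation)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_index(docs: Dict[str, str]) -> Dict[str, Counter]:
    """Index step: precompute one vector per document id."""
    return {doc_id: bow(text) for doc_id, text in docs.items()}


def query(index: Dict[str, Counter], text: str, top_n: int = 2) -> List[Tuple[str, float]]:
    """Retrieval step: score every indexed document and keep the best top_n."""
    q = bow(text)
    scored = [(doc_id, cosine(q, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Made-up sample documents for illustration.
docs = {
    "d1": "topic models for text",
    "d2": "web apis with flask",
    "d3": "scaling text search",
}
index = build_index(docs)
results = query(index, "text topic models")
```

This version scores every document for each query, which is fine for small collections; the "scaling up, approximate document search" item in the syllabus concerns what to do when that brute-force scan becomes too slow.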

Session 3: Integration, APIs
- Presenting text collections to stakeholders: pyLDAvis, D3.js
- Microservices, web APIs: Flask, CherryPy
- Interactive charts & graphs: matplotlib, seaborn, bokeh
- Non-Python ecosystems: Spark, AWS (EC2, S3, EMR), HDF5, Elasticsearch
Interactive Programming Exercise
