Streaming Topic Modeling and Inference

Contributed Talk | Day 2 | 10:25 am | 40 Minute Duration | Grand Gallery Overlook G

Extracting topics from streams of text data yields insights that can be leveraged in subsequent workflows. For example, extracting topics from text that is continuously ingested into a search engine can help tag documents with important keywords or concepts for use at search time. Another use case is analyzing support tickets to identify the most common problems customers face.

In this talk, we illustrate how to use Flink’s dynamic processing capabilities to continuously train topic models from unlabelled text and to use those models to extract topics from the same data. The topic models are built on distributed representations of words and documents.