Steelcase Smart and Connected products aim to achieve two goals: to help users of a space navigate and operate within it efficiently, and to help owners of a space understand how it is used and how they can improve it. At the core of the current Smart and Connected offerings are passive infrared sensors used to infer presence in a room. Initially, a naive algorithm was used to determine occupancy. As product development advanced and the use cases became more diverse and complex, there were concerns that the algorithm would not be sufficiently accurate, particularly for real-time use cases. Advanced Analytics, an internal analytics consulting group within Steelcase, was asked to help develop an algorithm that could provide real-time occupancy estimation for multiple use cases. This talk will highlight the journey of how a machine learning solution was defined, developed, and deployed. The solution was to use sensor data and corresponding ground truth data to train a machine learning model to estimate presence. But a few questions needed to be addressed before that could be done. The first was: “How will we assess accuracy in a real-time setting?” For this, a cost function that weighted certain types of errors more heavily than others, based on their impact on space users, was defined with the help of the project stakeholders. The other question was: “What information would be available to the model in real time?” Working with the stakeholders and platform developers, we defined a feature definition framework to generate a robust set of features that would describe the state of the sensor array at any minute for any given room. Using these developments, a logistic regression model and a random forest model were trained, leveraging cross-validation for hyper-parameter tuning and feature selection, and then a hold-out validation set was used for the final estimation of model performance.
Also, the initial algorithm and a proposed heuristic alternative were evaluated on the same data to ensure the new models provided accuracy improvements. The random forest model provided significant improvements over the other three models and was selected for deployment. We then worked closely with the platform developers to define system requirements so that the algorithm could be implemented at scale.
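
To make the idea of an error-weighted cost function concrete, the following is a minimal Python sketch of how competing occupancy algorithms might be scored on the same minute-by-minute data. It is an illustration only, not the cost function used in the project: the weights, the assumption that a missed occupant costs more than a false alarm, the function name `weighted_cost`, the candidate names, and the sample labels are all hypothetical.

```python
# Hypothetical error weights: the actual values agreed with stakeholders are not given here.
FALSE_NEGATIVE_WEIGHT = 5.0  # room reported empty while someone is present
FALSE_POSITIVE_WEIGHT = 1.0  # room reported occupied while it is empty


def weighted_cost(truth, predicted,
                  fn_weight=FALSE_NEGATIVE_WEIGHT,
                  fp_weight=FALSE_POSITIVE_WEIGHT):
    """Return the average per-minute cost of occupancy errors (lower is better)."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    if not truth:
        raise ValueError("at least one observation is required")
    total = 0.0
    for actual, guess in zip(truth, predicted):
        if actual and not guess:
            total += fn_weight  # missed an occupant
        elif guess and not actual:
            total += fp_weight  # flagged an empty room as occupied
    return total / len(truth)


# Made-up minute-by-minute labels (1 = occupied, 0 = empty), for illustration only.
ground_truth = [1, 1, 1, 0, 0, 1, 1, 0]
candidates = {
    "algorithm_a": [1, 0, 0, 0, 0, 1, 0, 0],
    "algorithm_b": [1, 1, 1, 1, 0, 1, 1, 1],
}

# Score every candidate on the same data and keep the one with the lowest cost.
costs = {name: weighted_cost(ground_truth, preds) for name, preds in candidates.items()}
best = min(costs, key=costs.get)
```

Under these assumed weights, a candidate that occasionally reports an empty room as occupied scores better than one that misses people who are present, even though both make a similar number of raw errors. That is the kind of trade-off an unweighted accuracy figure would hide.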