Indoor Occupancy Detection Using Machine Learning and Environmental Sensors
DOI:
https://doi.org/10.24191/jsst.v5i1.101
Keywords:
Indoor occupancy detection, Machine learning, Data leakage, Target leakage, Random forest classifier, Decision trees classifier
Abstract
Detecting the occupancy status of enclosed spaces is immensely beneficial for the automated control of HVAC (heating, ventilation, and air conditioning) systems, assistance to the elderly, healthcare provisioning, human activity recognition, and other applications. Because of these benefits, a plethora of machine learning-based solutions for occupancy detection have been developed in the literature. However, many of these solutions have poor prediction accuracy. Furthermore, it is necessary to develop models that are robust enough to achieve acceptable performance when only partial data from the sensors are available. In this paper, we experimentally determined the Machine Learning (ML) models that are most robust for use in indoor occupancy detection. This is important because the activities of human subjects in an environment monitored by ML models can disrupt the data available to some deployed models, which might cause the performance of such models to drop. Hence, it is crucial to determine ML models that are robust against such disruptions. Three algorithms were developed: the first for outlier removal from features, the second for feature selection, and the third for partial-features-availability-aware ML model selection. These algorithms were applied to data from environmental sensors, namely temperature, humidity, carbon dioxide (CO2), and light sensors, and the resulting data was then used to train six different ML-based classifiers: Logistic Regression (LR), Random Forest (RF), Decision Tree (DT), K-Nearest Neighbours (KNN), Support Vector Machines (SVM), and Gradient Boosting Machines (GBM). Simulation experiments revealed that only the RF and DT models are robust against the partial features availability problem, achieving performance scores of at least 90% across all the considered metrics.
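The partial-features-availability-aware model selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes that robustness is judged by a model's worst score over every non-empty subset of the four sensor features, and that a caller supplies a hypothetical `score_fn(model_name, subset)` callback that trains and evaluates a model on the given features. The names `FEATURES`, `feature_subsets`, and `robust_models` are illustrative only; the 0.90 default threshold mirrors the 90% figure quoted above.

```python
from itertools import combinations
from typing import Callable, Dict, Iterable, List, Tuple

# Sensor features named in the abstract; the identifiers are illustrative.
FEATURES = ("temperature", "humidity", "co2", "light")


def feature_subsets(features: Iterable[str]) -> List[Tuple[str, ...]]:
    """Return every non-empty subset of the features, smallest first."""
    items = tuple(features)
    return [
        subset
        for size in range(1, len(items) + 1)
        for subset in combinations(items, size)
    ]


def robust_models(
    model_names: Iterable[str],
    score_fn: Callable[[str, Tuple[str, ...]], float],
    features: Iterable[str] = FEATURES,
    threshold: float = 0.90,
) -> Dict[str, float]:
    """Keep models whose worst score over all feature subsets meets the threshold.

    score_fn is a hypothetical callback (an assumption of this sketch) that
    trains and evaluates the named model using only the given features.
    """
    subsets = feature_subsets(features)
    worst = {
        name: min(score_fn(name, subset) for subset in subsets)
        for name in model_names
    }
    return {name: score for name, score in worst.items() if score >= threshold}
```

In practice `score_fn` would wrap a train-and-evaluate routine for each of the six classifiers; taking the minimum over subsets is one conservative way to express "robust when some sensors are missing", and other aggregation rules are equally plausible.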
License
Copyright (c) 2025 Authors

This work is licensed under a Creative Commons Attribution 4.0 International License.