Indoor Occupancy Detection Using Machine Learning and Environmental Sensors

Authors

  • Akindele Segun Afolabi Department of Electrical and Electronics Engineering, University of Ilorin, Ilorin 240003, Nigeria
  • Olubunmi Adewale Akinola Department of Electrical and Electronic Engineering, Federal University of Agriculture, Abeokuta 110111, Nigeria
  • Oyinlolu Ayomidotun Odetoye Department of Electrical and Information Engineering, Covenant University, Ota, Ogun State 112233, Nigeria
  • Emmanuel Adetiba Department of Electrical and Information Engineering, Covenant University, Ota, Ogun State 112233, Nigeria; Covenant Applied Informatics and Communication Africa Center of Excellence, Covenant University, Ota, Ogun State 112233, Nigeria; HRA Institute of System Science, Durban University of Technology, Durban 4000, South Africa

DOI:

https://doi.org/10.24191/jsst.v5i1.101

Keywords:

Indoor occupancy detection, Machine learning, Data leakage, Target leakage, Random forest classifier, Decision tree classifier

Abstract

Detecting the occupancy status of enclosed spaces is highly beneficial for the automated control of HVAC (heating, ventilation, and air conditioning) systems, assistance to the elderly, healthcare provisioning, human activity recognition, and other applications. Because of these benefits, a plethora of machine learning-based solutions for occupancy detection have been developed in the literature. However, many of these solutions have poor prediction accuracy. Furthermore, it is necessary to develop models that are robust enough to achieve acceptable performance when only partial data from sensors is available. In this paper, we experimentally determined the machine learning (ML) models that are most robust for use in indoor occupancy detection. This is important because the activities of human subjects in an ML-monitored environment can disrupt the data available to some deployed ML models, which may cause the performance of such models to drop. Hence, it is crucial to identify ML models that are robust against such disruptions. Three algorithms were developed: the first for outlier removal from features, the second for feature selection, and the third for partial-features-availability-aware ML model selection. These algorithms were applied to data from environmental sensors such as temperature, humidity, carbon dioxide (CO2), and light sensors, and the resulting data was used to train six ML-based classifiers: Logistic Regression (LR), Random Forest (RF), Decision Tree (DT), K-Nearest Neighbours (KNN), Support Vector Machines (SVM), and Gradient Boosting Machines (GBM). Simulation experiments revealed that only the RF and DT models are robust against the partial features availability problem, achieving scores of at least 90% across all the considered metrics.
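
The abstract does not spell out the partial-features-availability-aware selection procedure, so the following is only a minimal sketch of one plausible reading: score each candidate model on every non-empty subset of the sensor features and keep the models whose worst score stays above a threshold. The names `feature_subsets`, `worst_case_score`, `robust_models`, and the `score_fn` callback are hypothetical, and both the exhaustive enumeration of subsets and the use of 0.90 as a selection cut-off are assumptions, not the authors' stated algorithm.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple

# Sensor features named in the abstract (identifiers chosen for illustration).
FEATURES = ("temperature", "humidity", "co2", "light")

# Hypothetical callback: given a model name and the features still available,
# return a mapping of metric name -> score in [0, 1]. How the model is trained
# and evaluated is left to the caller.
ScoreFn = Callable[[str, Tuple[str, ...]], Dict[str, float]]


def feature_subsets(
    features: Sequence[str], min_size: int = 1
) -> List[Tuple[str, ...]]:
    """Enumerate every subset of `features` with at least `min_size` members."""
    return [
        subset
        for size in range(min_size, len(features) + 1)
        for subset in combinations(features, size)
    ]


def worst_case_score(
    model: str, subsets: Sequence[Tuple[str, ...]], score_fn: ScoreFn
) -> float:
    """Return the lowest score over all feature subsets and all metrics."""
    return min(
        score
        for subset in subsets
        for score in score_fn(model, subset).values()
    )


def robust_models(
    models: Sequence[str],
    score_fn: ScoreFn,
    features: Sequence[str] = FEATURES,
    threshold: float = 0.90,  # assumed cut-off, mirroring the 90% in the abstract
) -> List[str]:
    """Keep models whose worst-case score is at least `threshold`."""
    subsets = feature_subsets(features)
    return [
        model
        for model in models
        if worst_case_score(model, subsets, score_fn) >= threshold
    ]
```

In use, `score_fn` would train and evaluate the named classifier (for example LR, RF, DT, KNN, SVM, or GBM) on only the listed features and return metrics such as accuracy or F1. With four sensors, `feature_subsets(FEATURES)` yields 15 subsets, so each model is scored 15 times.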

Published

2025-03-31