Equipment Failure Prediction (End-to-end Machine Learning Process)
Problem Formulation
Maintenance refers to the process of ensuring that equipment, machinery, facilities, or other assets operate at their intended level of performance. It involves inspections, repairs, and upkeep activities aimed at preventing equipment failures, extending asset lifespan, and maximizing efficiency and reliability.
In plants, sensors and instruments are installed to monitor equipment parameters. These parameters indicate whether a machine is in good or bad condition. It is essential to act on these warning signs before equipment failures become catastrophic; otherwise, a company loses efficiency and suffers higher maintenance costs.
The goal of predictive maintenance is to transform maintenance practices from a reactive and time-based approach to a proactive, data-driven, and condition-based strategy. By predicting equipment failures before they happen, organizations can optimize their maintenance efforts, reduce downtime, and achieve significant cost savings while improving operational reliability and safety.
High-Level Approach
In this project, a prediction approach is developed to predict equipment failures from operational parameters, improving overall equipment reliability and safety.
An end-to-end classification model is built to predict whether a machine fails based on its parameters. Several algorithms are compared before selecting the best-performing one for the platform.
Goals & Success
For evaluating performance, the metric used for classification cases is accuracy, one of the most commonly used evaluation metrics in classification tasks. It measures the proportion of correct predictions made by a classifier out of the total number of predictions; in other words, accuracy indicates how well the classifier identifies the instances of each class. Accuracy is suitable for balanced data.
To determine whether the model accuracy is good or not, a bias-variance trade-off is performed. A good classification model strikes a balance between bias and variance. It captures the essential patterns in the data without being too sensitive to noise in the training data. This results in good performance on both training and test datasets.
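The bias-variance check described above can be made concrete by comparing training and test accuracy at different model complexities. The sketch below uses synthetic stand-in data (not the project's dataset) and a decision tree, whose `max_depth` controls the trade-off: a small gap between the two scores suggests a good balance, while perfect training accuracy with much lower test accuracy signals overfitting.

```python
# Illustrative sketch on synthetic data: diagnosing bias vs. variance by
# comparing training and test accuracy across model complexities.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, n_features=6, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

for depth in (2, 5, None):  # shallow = higher bias, unbounded = higher variance
    model = DecisionTreeClassifier(max_depth=depth, random_state=42)
    model.fit(X_train, y_train)
    train_acc = accuracy_score(y_train, model.predict(X_train))
    test_acc = accuracy_score(y_test, model.predict(X_test))
    print(f"max_depth={depth}: train={train_acc:.3f}, test={test_acc:.3f}")
```

An unbounded tree typically reaches 100% training accuracy while its test accuracy lags behind, which is the high-variance end of the trade-off.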
Key Solution
Here is the link to the dataset that is used in this project. The synthetic dataset is modelled after an existing milling machine and consists of 10000 data points stored as rows with nine features in columns. The following table shows the description of each column.
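As a hedged illustration of the schema, the sketch below builds a small stand-in frame; the column names are assumptions inferred from the predictors and target described later in this document, so they should be adjusted to match the actual file.

```python
# Stand-in frame mirroring the described milling-machine schema (assumed
# column names); the real dataset has 10,000 rows.
import pandas as pd

df = pd.DataFrame({
    "Type": ["L", "M", "H", "L"],
    "Air temperature [K]": [298.1, 298.2, 298.3, 298.5],
    "Process temperature [K]": [308.6, 308.7, 308.5, 309.0],
    "Rotational speed [rpm]": [1551, 1408, 1498, 1410],
    "Torque [Nm]": [42.8, 46.3, 49.4, 40.0],
    "Tool wear [min]": [0, 3, 5, 7],
    "Machine failure": [0, 0, 1, 0],  # binary target label
})
print(df.dtypes)
print(df.shape)
```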
Several algorithms are trained in this project: logistic regression, support vector machine, naïve Bayes, decision tree, AdaBoost, and random forest. As described under Goals & Success, accuracy is used to evaluate and compare each model.
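The comparison loop can be sketched as follows. This is a minimal illustration on synthetic stand-in data, not the project's actual pipeline; default hyperparameters are assumed here, with tuning deferred to a later step.

```python
# Hedged sketch: train the six candidate classifiers and compare test accuracy.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "support vector machine": SVC(),
    "naive Bayes": GaussianNB(),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "random forest": RandomForestClassifier(random_state=0),
}
scores = {name: accuracy_score(y_test, m.fit(X_train, y_train).predict(X_test))
          for name, m in models.items()}
for name, acc in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {acc:.3f}")
```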
The best-performing model is the decision tree with a max_depth of 5, so that algorithm is chosen for deployment.
Key Flows
The figure above represents the flowchart of the end-to-end machine learning process. After choosing a dataset, a data cleansing process is performed: unused columns are dropped, and rows with null data are checked for and removed from the dataset. After that, a data defence step validates the data type and value range of each column. To prevent data leakage, the dataset is split into training and test data with a test split ratio of 0.2 (80% training, 20% test).
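These cleansing, defence, and split steps can be sketched as below on a small stand-in frame; the column names and range bounds are illustrative assumptions, not the project's exact checks.

```python
# Sketch of data cleansing, data defence, and the train/test split.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    "UDI": range(10),  # identifier column, not a predictor (assumed name)
    "Torque [Nm]": [40.1, None, 42.5, 39.8, 41.0, 43.2, 40.7, 42.0, 38.9, 41.5],
    "Machine failure": [0, 0, 1, 0, 0, 1, 0, 0, 0, 1],
})

df = df.drop(columns=["UDI"])  # drop unused columns
df = df.dropna()               # check for and remove null data

# Data defence: enforce expected types and plausible value ranges.
assert df["Machine failure"].isin([0, 1]).all()
assert df["Torque [Nm]"].between(0, 100).all()

# Split before any fitting to prevent data leakage.
train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)
print(len(train_df), len(test_df))
```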
The training data is divided into predictors and a target label. The predictor features are type, air temperature, process temperature, rotational speed, torque, and tool wear, whereas the target is a binary label (0 = normal, 1 = broken). Following that, exploratory data analysis is performed to detect bias and remove outliers from the predictors. The 'Type' column is categorical, so an encoding step is performed so that its values can be used by the models.
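The encoding and outlier-removal steps might look like the sketch below. One-hot encoding and the 1.5×IQR rule are common choices assumed here for illustration; the project's exact methods may differ.

```python
# Sketch: one-hot encode the categorical 'Type' column, then drop outliers
# in a numeric predictor using the 1.5 * IQR rule.
import pandas as pd

df = pd.DataFrame({
    "Type": ["L", "M", "H", "L", "M"],
    "Torque [Nm]": [40.0, 42.0, 41.5, 39.0, 120.0],  # 120.0 is an outlier
})

# Encode the category column so the models can consume it.
df = pd.get_dummies(df, columns=["Type"], prefix="Type")

# IQR-based outlier removal on a numeric predictor.
q1, q3 = df["Torque [Nm]"].quantile([0.25, 0.75])
iqr = q3 - q1
mask = df["Torque [Nm]"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
df = df[mask]
print(df.shape)
```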
Before training with the various algorithms, a baseline value is needed as a benchmark. Because the data has been balanced by random under-sampling, the baseline accuracy should be around 50%. After the models are trained with the algorithms listed in the previous section, hyperparameter tuning is executed using GridSearchCV. The best-performing model is then saved in pickle format. For deployment, FastAPI and Streamlit are used.
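The balancing, tuning, and saving steps can be sketched as follows. The synthetic data, parameter grid, and file name are illustrative assumptions; under-sampling is done here by hand with NumPy rather than with a dedicated library.

```python
# Hedged sketch: random under-sampling, GridSearchCV tuning, pickle save.
import pickle
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Imbalanced stand-in data (~90% normal, ~10% broken).
X, y = make_classification(n_samples=600, weights=[0.9, 0.1], random_state=1)

# Random under-sampling: keep as many majority samples as minority samples,
# so a random-guess accuracy baseline sits around 50%.
rng = np.random.default_rng(1)
minority = np.flatnonzero(y == 1)
majority = np.flatnonzero(y == 0)
keep = np.concatenate(
    [minority, rng.choice(majority, size=len(minority), replace=False)]
)
X_bal, y_bal = X[keep], y[keep]

# Hyperparameter tuning with GridSearchCV (grid is an assumption).
grid = GridSearchCV(DecisionTreeClassifier(random_state=1),
                    param_grid={"max_depth": [3, 5, 7, None]}, cv=5)
grid.fit(X_bal, y_bal)
print("best params:", grid.best_params_)

# Save the best model for the FastAPI/Streamlit deployment to load.
with open("best_model.pkl", "wb") as f:
    pickle.dump(grid.best_estimator_, f)
```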
Launch Readiness
The given table shows the project timeline. It takes eight weeks to complete the project from scratch. Exploratory data analysis (EDA) takes the longest, followed by modelling and hyperparameter tuning.
Artifact
In summary, completing the project involves various artifacts: files and resources that support different stages of the project lifecycle.
1. Software: Jupyter Notebook & Python
2. Hardware: Computer/Laptop
3. Files: Dataset & a list of libraries that must be installed in a virtual environment.