Machine learning promises to uncover new insights in the mining industry. One of the challenges in running a successful machine learning initiative is understanding the types of questions machine learning can answer.
Ask Specific Questions
How you ask the question matters. Suitable machine learning questions are precise and usually have a target number or word that describes the outcome.
Examples of suitable questions are:
Examples of poor questions are:
Classification Questions
Can you structure the question so that it has only a fixed list of possible answers? The most common case has just two possible answers and is a Two-Class Classification question. The answers could be selective (A or B), logical (yes or no) or specific to a particular problem (assigned or not).
Some examples of two-class classification questions are:
If there are more than two alternative answers, then it could be a multi-class classification question.
Some examples include:
Anomaly detection
Maybe your problem is to determine whether the data is normal or not. This looks like two-class classification, but the question being asked is whether the data is weird or abnormal. Some anomaly detection algorithms can detect abnormalities even when the available training data set contains no examples of them.
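As a minimal sketch of learning only from normal data, the example below summarises known-normal readings and flags anything more than three standard deviations from their mean. The temperature values, the function names and the threshold of three are illustrative assumptions; real anomaly detection algorithms are usually more sophisticated.

```python
from statistics import mean, stdev

def fit_normal_profile(normal_readings):
    """Summarise known-normal data as (mean, standard deviation)."""
    return mean(normal_readings), stdev(normal_readings)

def is_anomaly(reading, profile, threshold=3.0):
    """Flag a reading more than `threshold` standard deviations from the mean."""
    centre, spread = profile
    return abs(reading - centre) > threshold * spread

# Hypothetical bearing temperatures (deg C) recorded during normal operation only.
# No abnormal examples are needed to build the profile.
normal_temps = [71.0, 69.5, 70.2, 70.8, 69.9, 70.4, 70.1, 69.7]
profile = fit_normal_profile(normal_temps)

print(is_anomaly(70.6, profile))  # a typical reading
print(is_anomaly(85.0, profile))  # a reading far outside the normal range
```

The profile is built without ever seeing a faulty reading, which is the property described above.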
Some examples of anomaly detection questions are:
Regression
If the purpose of the question is to get a number rather than a category or a class, then the question can be a regression question. Regression results will usually be real numbers, which can sometimes be negative or have many decimal places. These results may need to be interpreted to get the outcome required. Examples of interpretation are rounding to the nearest whole number and treating negative numbers as a zero result.
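The two interpretation steps mentioned above can be sketched in a few lines. The function name and the sample values are hypothetical; the raw numbers stand in for whatever a regression model might return when asked for a count.

```python
def interpret_count(raw_prediction):
    """Turn a raw regression output into a usable whole-number count."""
    # Treat negative predictions as zero, then round to the nearest whole number.
    return round(max(0.0, raw_prediction))

# Hypothetical raw model outputs for "how many ...?" style questions.
raw_outputs = [3.72, -0.4, 12.1]
print([interpret_count(x) for x in raw_outputs])  # prints [4, 0, 12]
```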
Examples of regression questions include:
Multi-Class Classification Questions as Regression
A regression approach to multi-class classification questions can also be useful. For example, “which component will fail in the next seven days: engine, gearbox, tyres or hydraulics?” seems to require a classification that names the single component that will fail. Taking a regression approach would reformulate the question to “how likely is each component (engine, gearbox, tyres, hydraulics) to fail in the next seven days?” and would provide a numerical failure score for each component. The answer to the original question would then be the highest-scoring component.
Another example of restructuring a multi-class classification to a regression could be:
· “Which truck in my fleet needs servicing the most?” can be rephrased as “How urgently does each truck in my fleet need servicing?”
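The component example from this section can be sketched as follows. The scores are made-up placeholders for what a regression model might output, and the function name is hypothetical. The point is that the per-component scores give both the single answer and a full ranking.

```python
# Hypothetical failure scores for the next seven days (higher = more likely to fail).
failure_scores = {"engine": 0.12, "gearbox": 0.57, "tyres": 0.31, "hydraulics": 0.08}

def most_likely_failure(scores):
    """Answer the original classification question: the highest-scoring component."""
    return max(scores, key=scores.get)

# The same scores also rank every component, which a single class label cannot do.
ranked = sorted(failure_scores.items(), key=lambda item: item[1], reverse=True)

print(most_likely_failure(failure_scores))  # prints gearbox
print([name for name, _ in ranked])
```

The same pattern applies to the truck servicing question: score every truck, then sort.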
Two-Class Classification as Regression
Sometimes it is beneficial to reformulate Two-Class Classification questions as regression questions. The regression version of the question produces two scores: a “yes” score and a “no” score. The higher score can still be interpreted as either “yes” or “no”, but the pair of scores can also handle the situation where the answer is “partly yes” and “partly no”. Each score can be partial or complete, and together they may provide more information than a bare “yes” or “no”.
Questions of this type often begin “how likely…” or “what fraction…”
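A small sketch of reading the two scores, assuming they have already been produced by some regression model. The function name and the idea of reporting the gap between the scores are illustrative assumptions rather than a standard method.

```python
def interpret_scores(yes_score, no_score):
    """Return the winning label plus the gap between the two scores."""
    label = "yes" if yes_score >= no_score else "no"
    # A small gap suggests a "partly yes, partly no" situation.
    margin = abs(yes_score - no_score)
    return label, margin

# Hypothetical score pairs: one clear-cut case and one borderline case.
print(interpret_scores(0.875, 0.125))
print(interpret_scores(0.4375, 0.5625))
```

Both calls give a usable “yes” or “no”, but the margin shows that the second answer is far less certain than the first.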
Clustering Questions
Clustering questions seek to understand the structure of the data by separating it into natural ‘clumps’ that a human can easily interpret.
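To illustrate what finding ‘clumps’ means, here is a minimal sketch of a basic k-means loop with two clusters on one-dimensional data. The cycle times are invented for illustration, and the choice of k-means, two clusters and min/max starting points are all assumptions; other clustering algorithms would serve equally well.

```python
def two_means(values, iterations=10):
    """Split 1-D values into two clusters with a basic k-means loop (k = 2)."""
    centres = [min(values), max(values)]  # simple deterministic starting points
    for _ in range(iterations):
        clusters = ([], [])
        for v in values:
            # Assign each value to its nearest centre.
            nearest = 0 if abs(v - centres[0]) <= abs(v - centres[1]) else 1
            clusters[nearest].append(v)
        # Move each centre to the mean of its cluster (keep it if the cluster is empty).
        centres = [sum(c) / len(c) if c else centres[i] for i, c in enumerate(clusters)]
    return clusters, centres

# Hypothetical haul cycle times in minutes; no labels are supplied.
cycle_times = [18, 19, 21, 20, 34, 36, 35, 19]
clusters, centres = two_means(cycle_times)

print(clusters)
print(centres)
```

No one tells the algorithm which cycles are "short" or "long"; the two groups emerge from the data, and a person can then interpret what each group represents.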
Some examples of clustering questions include:
“What should I do next?” Questions
Reinforcement learning algorithms allow a more advanced type of question to be asked: one that is linked to an action, such as “What should I do next?”
These questions are answered by a model that is rewarded when it makes a “good decision” and penalized when it makes a poor one.
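The reward and penalty idea can be sketched with an epsilon-greedy rule, one of the simplest reinforcement learning strategies. Everything specific here is an assumption made for illustration: the two shovel names, the simulated success rates, the +1/-1 rewards and the 10% exploration rate.

```python
import random

def choose_action(values, epsilon, rng):
    """Epsilon-greedy: usually pick the best-known action, occasionally explore."""
    if rng.random() < epsilon:
        return rng.choice(sorted(values))
    return max(values, key=values.get)

def update(values, counts, action, reward):
    """Move the action's estimated value towards the reward just received."""
    counts[action] += 1
    values[action] += (reward - values[action]) / counts[action]

def simulated_reward(action, rng):
    # Assumed environment: shovel_b leads to a good outcome more often than shovel_a.
    success_rate = {"shovel_a": 0.3, "shovel_b": 0.8}[action]
    return 1.0 if rng.random() < success_rate else -1.0  # reward or penalty

# Hypothetical decision: which shovel should the next truck be sent to?
values = {"shovel_a": 0.0, "shovel_b": 0.0}
counts = {"shovel_a": 0, "shovel_b": 0}

rng = random.Random(0)
for _ in range(500):
    action = choose_action(values, epsilon=0.1, rng=rng)
    update(values, counts, action, simulated_reward(action, rng))

print(values)
print(counts)
```

Each estimated value is the running average of the rewards and penalties received for that action, so over time the action that tends to produce good decisions is chosen more often.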
Examples of questions that are well suited to reinforcement learning include:
Conclusion
By understanding the different forms of machine learning questions and the algorithms behind them, the mining industry will be better placed to run a successful machine learning initiative. While having the right data will still be essential, insight into how to ask the question, and into the implications of asking it that way, may uncover new ways to leverage the available data.