First, develop a “problem statement” based on the problem the business has given you. This step is essential because it ensures that you fully understand the type of problem you want to solve, along with its input and its output.
The problem statement should be simple and no longer than a single sentence. For example, consider an enterprise email system that needs an algorithm to identify spam.
The problem statement would be: “Is the email fake/spam or not?” In this scenario, the input is the email, and the output is the decision on whether it’s fake/spam.
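One way to check that the problem statement really pins down the input and the output is to write it as a function signature before choosing any algorithm. The sketch below is only an illustration: the `Email` fields and the `is_spam` name are hypothetical and not part of any particular email system.

```python
from dataclasses import dataclass


@dataclass
class Email:
    # Hypothetical input fields, chosen only for this illustration.
    sender: str
    subject: str
    body: str


def is_spam(email: Email) -> bool:
    """Problem statement: is the email fake/spam or not?

    Input: one email. Output: True for fake/spam, False otherwise.
    """
    # The body is deliberately left open: the algorithm is picked later.
    raise NotImplementedError("no algorithm has been chosen yet")
```

If the question can't be written as a signature this small, the problem statement is probably still too broad.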
Once you have defined the problem statement, identify the appropriate type of algorithm from the following:
- Any classification algorithm
- Any clustering algorithm
- Any regression algorithm
- Any recommendation algorithm
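Which algorithm you use will depend on the specific problem you’re trying to solve. In this scenario, you can move forward with a clustering algorithm and choose k-means to achieve your goal of filtering spam from the email system. Keep in mind that k-means is unsupervised: it groups similar emails without using any spam/not-spam labels, so you still have to decide which group is the spam. If you already have labeled examples, a classification algorithm is the more direct fit for a yes/no question like this one.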
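To make the k-means choice concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the `SPAM_WORDS` list, the two features in `featurize`, the sample emails, and the choice to start from the first `k` points are all hypothetical. A real system would more likely use a library implementation and much richer features.

```python
import math

# Hypothetical keyword list, chosen only for this illustration.
SPAM_WORDS = {"free", "winner", "prize", "click", "urgent"}


def featurize(text):
    """Turn an email body into two numbers: spam-word ratio and '!' count."""
    words = text.lower().split()
    hits = sum(1 for w in words if w.strip("!.,:") in SPAM_WORDS)
    ratio = hits / len(words) if words else 0.0
    return (ratio, float(text.count("!")))


def kmeans(points, k, iterations=20):
    """Plain k-means: assign points to the nearest centroid, then update centroids."""
    # Deterministic start (an assumption): the first k points are the centroids.
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centroids


emails = [
    "Meeting moved to 3pm, see the updated agenda.",
    "URGENT! You are a WINNER! Click now for your FREE prize!!!",
    "Please review the attached quarterly report.",
    "Free prize! Click here! Winner announced!",
]
labels, centroids = kmeans([featurize(e) for e in emails], k=2)
```

The result only says which emails belong together. Deciding which of the two clusters is the spam cluster is a separate step, for example by inspecting a few emails from each group.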