Prediction Precision: A Forecast Is Only as Good as its Inputs

Conner Snodgrass, Implementation Specialist
Andy Drozinski, Customer Program Manager
Mauricio Guerra, Implementation Specialist 

If only we were able to predict the future, life would be so much easier! You have probably had a similar thought at some point, whether for business or personal gain, or even just to decide whether to leave the jacket at home when the weather forecast is uncertain. Believe it or not, in a lot of areas we are able to predict the future. The practice of predicting future outcomes based on past events and insight is called forecasting. I know what you are thinking: didn’t we just mention how uncertain the weather forecast can be? Yes, that is true. Forecasting the weather is an incredibly difficult and complex task, and the forecast only becomes more accurate as we get closer to the days being forecasted. Luckily for us, just about anything can be forecasted if you have the right data, and not every forecasting problem is as tough to solve as predicting the weather. Let’s dive into some general components of what makes a great forecast, and then discuss how to consider forecasting in a retail context.

The importance of data integrity

We say almost anything can be forecasted if you have the right data. So what exactly is “the right data”? Each forecasting problem might have a slightly different answer to that question; however, there are broad commonalities that characterize “the right data.” You’ve heard the saying “quality over quantity.” For forecasting, we prefer to have both quality and quantity. Quality data, as it pertains to forecasting, is data that is clean, consistent and complete. There is a whole practice devoted to this called “data cleansing,” but for our purposes, clean data is data that is consistently formatted and structurally correct. Clean data should be devoid of things like typos and invalid entries. Next, the data should be consistent, which includes being scrubbed for outliers (we will go into more detail on these later). Consistent data should also come in a uniform format, with uniform units of measure. Data that is consistent and uniform enables us to perform “apples to apples” comparisons. Sometimes data must be converted to a like unit of measure (e.g., pounds to kilograms) before it can be utilized for forecasting purposes. Finally, you want the dataset to be complete. Missing or omitted data only serves to degrade your forecast.
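
To make “clean, consistent and complete” a little more concrete, here is a minimal sketch in Python. The field names (date, quantity, unit) and the sample rows are assumptions made up for illustration, not a real data feed. The sketch drops entries that are invalid or incomplete and converts pounds to kilograms so every remaining record uses the same unit.

```python
import math
from typing import Optional

LB_TO_KG = 0.45359237  # kilograms per international pound

def clean_record(raw: dict) -> Optional[dict]:
    """Return a normalized record, or None if the entry is invalid or incomplete."""
    if not raw.get("date"):
        return None  # incomplete: no date
    try:
        quantity = float(raw["quantity"])
    except (KeyError, TypeError, ValueError):
        return None  # missing or non-numeric quantity
    if not math.isfinite(quantity) or quantity < 0:
        return None  # invalid entry
    unit = str(raw.get("unit", "")).strip().lower()
    if unit in ("lb", "lbs"):
        quantity *= LB_TO_KG  # convert to the common unit
    elif unit != "kg":
        return None  # unknown unit, cannot compare apples to apples
    return {"date": raw["date"], "quantity_kg": round(quantity, 3)}

# Hypothetical raw rows, including a typo-style entry and an unknown unit.
raw_rows = [
    {"date": "2024-01-01", "quantity": "10", "unit": "LB "},
    {"date": "2024-01-02", "quantity": "4.5", "unit": "kg"},
    {"date": "2024-01-03", "quantity": "n/a", "unit": "kg"},
    {"date": "2024-01-04", "quantity": "3", "unit": "crates"},
]
cleaned = [r for r in (clean_record(row) for row in raw_rows) if r is not None]
```

In practice you would probably log or review the rejected rows rather than silently discard them, since a pattern in the rejects often points to a fixable problem at the source.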

Complete data ties back to the “quality and quantity” aspect we mentioned before. Generally speaking, the more data, the better for forecasting purposes. Leading forecasting techniques leverage artificial intelligence and machine learning. Machine learning sounds intimidating, but it is simply the study of computer algorithms that improve automatically through experience. These algorithms build a mathematical model based on “training data” to make predictions about what the future holds. Just like a human-generated forecast, the more experience (or training data), the better. Algorithm-based forecasting has led, in part, to businesses seeking more and more data about their day-to-day operations. Businesses are seeing the benefits of more and more granular data reporting in areas such as staffing, customer service and sales forecasting. Consider this value proposition when selecting a forecast provider, and realize that the most advanced providers will now be able to leverage all the data you provide to better meet your business needs.

Identifying outliers

Special attention should be given to outliers, which are data points that differ significantly from other observations. It is important to treat outliers before forecasting because they can wreak havoc on any statistical analysis. There are many potential causes of outliers, including measurement error, natural variation and human error, and each root cause calls for a different remedy. Measurement and data processing errors might require a change to the process of obtaining and transmitting the underlying data. Natural outliers are not mistakes in the data; they indicate a real divergence from the norm. Human error is also prevalent and often needs to be addressed on a case-by-case basis.
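
Before an outlier can be treated, it has to be found. One common screening technique (our choice for illustration here, not a method prescribed by any particular forecasting product) is the modified z-score, which measures how far each point sits from the median in units of the median absolute deviation. The threshold of 3.5 is a frequently cited rule of thumb, and the daily customer counts are invented sample data.

```python
from statistics import median

def flag_outliers(values, threshold=3.5):
    """Flag points whose modified z-score exceeds the threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)  # median absolute deviation
    if mad == 0:
        # No spread around the median, so the score is undefined; flag nothing.
        return [False] * len(values)
    return [abs(0.6745 * (v - med) / mad) > threshold for v in values]

# Hypothetical daily customer counts with one suspicious spike.
daily_customers = [120, 125, 118, 130, 122, 410, 127]
flags = flag_outliers(daily_customers)
suspects = [v for v, is_outlier in zip(daily_customers, flags) if is_outlier]
```

A screen like this only tells you which points deserve a second look. Deciding whether a flagged point is a measurement problem, a real event or a keying mistake still takes knowledge of the business.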

An advanced forecast should have a way to treat several different types of outliers. Legitimate, natural outliers can be flagged by what is commonly referred to as a forecasting tag. A forecasting tag is an input to your forecasting process that specifically calls out the tagged data as a known outlier. Leading forecasting methods leverage this tag data to create layered forecasting predictions that are much more accurate than older methods. Similarly, forecast tags can be used to omit illegitimate data. If you find an outlier in your data that was a one-off occurrence, how can it be omitted from future forecasts? Advanced forecasting models that utilize tags will often be able to characterize this type of outlier for you using machine learning. By simply tagging the event, the algorithm will compare the tagged data with historical data and decide how to treat it based on actual performance.
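
As a rough illustration of how tags can feed a layered forecast, the sketch below keeps a baseline built from untagged history, learns a multiplier for each legitimate tag, and ignores anything tagged for omission. The tag names ("holiday", "omit"), the data and the simple averaging are all assumptions for the example; a production system would use far richer models than this.

```python
from statistics import mean

# Hypothetical history of (sales, tag); None means an ordinary day.
history = [
    (100, None), (104, None), (96, None), (150, "holiday"),
    (100, None), (20, "omit"), (160, "holiday"), (100, None),
]

def build_model(history):
    """Return a baseline level and a per-tag uplift multiplier."""
    untagged = [value for value, tag in history if tag is None]
    baseline = mean(untagged)  # tagged days never distort the baseline
    by_tag = {}
    for value, tag in history:
        if tag not in (None, "omit"):  # "omit" marks illegitimate data
            by_tag.setdefault(tag, []).append(value)
    uplift = {tag: mean(vals) / baseline for tag, vals in by_tag.items()}
    return baseline, uplift

def forecast(baseline, uplift, tag=None):
    """Layer the tag's uplift (if any) on top of the baseline."""
    return baseline * uplift.get(tag, 1.0)

baseline, uplift = build_model(history)
normal_day = forecast(baseline, uplift)
holiday = forecast(baseline, uplift, tag="holiday")
```

The point of the design is separation: the one-off low day never drags the baseline down, and the holiday spikes inform future holidays instead of inflating ordinary days.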

Forecasting algorithms have transformed how businesses predict future results

There are dozens of algorithms, each with slightly different logic for building its mathematical model and making predictions. Some algorithms weight recent data more heavily, while others seek to smooth out volatile data for a more consistent result. The best forecast providers will use multiple forecasting algorithms and select whichever is most accurate for your specific business application.
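
The idea of running several algorithms and keeping the most accurate one can be sketched in a few lines. Below, a moving average (which smooths volatile data) competes with simple exponential smoothing (which weights recent data more heavily), and each is scored by its mean absolute error on the last few known periods. The two methods, the parameters (window, alpha, holdout) and the weekly sales figures are assumptions chosen for illustration, not a description of how any specific provider works.

```python
def moving_average(history, window=3):
    """Forecast the next value as the mean of the last `window` values."""
    recent = history[-window:]
    return sum(recent) / len(recent)

def exponential_smoothing(history, alpha=0.5):
    """Forecast the next value with simple exponential smoothing."""
    level = history[0]
    for value in history[1:]:
        level = alpha * value + (1 - alpha) * level  # recent data counts more
    return level

def backtest_mae(method, series, holdout=4):
    """Mean absolute error of one-step-ahead forecasts over the last periods."""
    errors = []
    for t in range(len(series) - holdout, len(series)):
        prediction = method(series[:t])  # only data before period t is used
        errors.append(abs(series[t] - prediction))
    return sum(errors) / len(errors)

def pick_best(methods, series, holdout=4):
    """Score every method on the same holdout and return the most accurate."""
    scores = {name: backtest_mae(fn, series, holdout) for name, fn in methods.items()}
    return min(scores, key=scores.get), scores

# Hypothetical weekly sales with an upward trend.
weekly_sales = [100, 102, 101, 105, 110, 115, 121, 126, 132, 138]
methods = {
    "moving_average": moving_average,
    "exponential_smoothing": exponential_smoothing,
}
best_name, scores = pick_best(methods, weekly_sales)
```

On this trending sample, exponential smoothing scores slightly better because it reacts faster to recent growth. On a noisy but flat series the ranking could easily reverse, which is exactly why the comparison is worth running per application.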

So, how satisfied are you with the forecasting in your retail business? First, consider the following:

  • Are you satisfied with the accuracy, breadth and depth of the metrics?
  • Are there areas that could be managed more effectively with better data (e.g., customer count in service areas other than front ends)?
  • What would it take to get this data (e.g., new equipment, changes to how existing data is captured)?

Now let’s take these questions a step further.

  • For front ends, if you are not separating your data by register group (to isolate front-end activity from perimeter registers), by order size (large versus small, to optimize express lanes) and by self-checkout, there are significant opportunities to make better use of the existing data.
  • Going deeper, looking at data distinctly by item type and transaction type will provide better output than homogenized data.
  • The source of most transactional data is the point-of-sale (POS) system. Often the output is limited to total store or data by register type, not by department or by items sold at a service counter. You should consider factors such as:
    • What data is optimal?
    • How important is it to have data that helps you best position your service resources when customers need that service?
    • What is it worth?
  • Going after better data may not be a huge IT project if your solution provider can read and process transaction log (T-log) data. Sometimes the best data is never pursued because it appears too daunting to get it done through existing IT resources.
  • Your solution provider for forecasting and scheduling needs to be able to use the data properly. The best systems enable task-based scheduling rather than scheduling at job or department levels. This can have a significant impact on the value that the right data can provide.
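
To illustrate the front-end segmentation described above, here is a minimal sketch that buckets transactions by register group and order size. The field names (register_group, item_count), the group labels and the 15-item express cutoff are assumptions for the example; real POS or T-log layouts vary by vendor and would need their own mapping.

```python
from collections import Counter

EXPRESS_MAX_ITEMS = 15  # assumed cutoff between small and large orders

def segment(txn):
    """Assign a transaction to a forecasting segment."""
    group = txn["register_group"]
    if group == "self_checkout":
        return "self_checkout"
    if group != "front_end":
        return "perimeter"  # e.g. deli or bakery registers
    if txn["item_count"] <= EXPRESS_MAX_ITEMS:
        return "front_end_small"
    return "front_end_large"

# Hypothetical transactions, already parsed from the source system.
transactions = [
    {"register_group": "front_end", "item_count": 8},
    {"register_group": "front_end", "item_count": 42},
    {"register_group": "self_checkout", "item_count": 5},
    {"register_group": "deli", "item_count": 2},
    {"register_group": "front_end", "item_count": 12},
]
counts = Counter(segment(t) for t in transactions)
```

Each segment can then be forecasted on its own, so that express-lane demand, for example, is not blurred together with large orders or perimeter activity.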

Improving your forecast may require investing in equipment or technology to capture data in a cleaner, more consistent and more complete way, but the output could be worth it.