Prediction Precision: A Forecast Is Only as Good as its Inputs

October 13, 2020 | December 14, 2020 • Articles

Conner Snodgrass, Implementation Specialist
Andy Drozinski, Customer Program Manager
Mauricio Guerra, Implementation Specialist

If only we could predict the future, life would be so much easier! You have probably had a similar thought at some point, whether for business or personal gain, or just to decide whether to leave the jacket at home after an uncertain weather forecast. Believe it or not, in a lot of areas we are able to predict the future. The practice of predicting future outcomes based on past events and insight is called forecasting. You may be thinking: didn’t we just mention how uncertain the weather forecast can be? Yes, forecasting the weather is an incredibly difficult and complex task, and it only becomes more accurate as we get closer to the days being forecasted. Luckily for us, just about anything can be forecasted if you have the right data, and not every forecasting problem is as tough to solve as predicting the weather. Let’s dive into some general components of what makes a great forecast, and then discuss how to consider forecasting in a retail context.

The importance of data integrity

We say almost anything can be forecasted if you have the right data. So what exactly is “the right data”? Each forecasting problem might have a slightly different answer to that question; however, there are broad commonalities that characterize “the right data.” You’ve heard the saying “quality over quantity.” For forecasting, we prefer to have both quality and quantity. Quality data, as it pertains to forecasting, is data that is clean, consistent and complete. There is a whole practice devoted to this called “data cleansing,” but for our purposes, clean data is data that is consistently formatted and structurally correct. Clean data should be devoid of things like typos and invalid entries. Next, the data should be consistent: scrubbed for things like outliers (discussed in more detail later) and delivered in a uniform format with uniform units of measure. Data that is consistent and uniform enables us to perform “apples to apples” comparisons. Sometimes data must be converted to a like unit of measure (e.g., pounds to kilograms) before it can be used for forecasting purposes. Finally, you want the dataset to be complete. Missing or omitted data only serves to degrade your forecast.
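To make “clean, consistent and complete” a little more concrete, here is a minimal Python sketch of the three checks. The record layout (day, weight, unit), the function names and the sample values are all made up for illustration; a real data feed will look different and will need more careful handling.

```python
from datetime import date, timedelta

LB_TO_KG = 0.45359237  # kilograms per international pound (exact by definition)


def clean_records(records):
    """Return records with weights in kilograms, dropping invalid entries."""
    cleaned = []
    for rec in records:
        try:
            value = float(rec["weight"])
        except (TypeError, ValueError):
            continue  # clean: skip typos and other unparseable entries
        if value < 0:
            continue  # clean: a negative weight is structurally invalid
        unit = rec["unit"].strip().lower()
        if unit == "lb":
            value *= LB_TO_KG  # consistent: convert to a like unit of measure
        elif unit != "kg":
            continue  # unknown unit, so no apples-to-apples comparison is possible
        cleaned.append({"day": rec["day"], "weight_kg": round(value, 3)})
    return cleaned


def missing_days(cleaned, start, end):
    """Complete: list the calendar days in [start, end] that have no record."""
    present = {rec["day"] for rec in cleaned}
    span = (end - start).days + 1
    all_days = (start + timedelta(days=i) for i in range(span))
    return [day for day in all_days if day not in present]


# Hypothetical sample data: one typo and one day with no record at all.
records = [
    {"day": date(2020, 10, 1), "weight": "10", "unit": "lb"},
    {"day": date(2020, 10, 2), "weight": "4.5", "unit": "KG"},
    {"day": date(2020, 10, 3), "weight": "4,5x", "unit": "kg"},
    {"day": date(2020, 10, 5), "weight": "5.0", "unit": "kg"},
]
cleaned = clean_records(records)
gaps = missing_days(cleaned, date(2020, 10, 1), date(2020, 10, 5))
```

In this sketch the typo on October 3 is dropped during cleaning, so it then shows up as a gap alongside October 4. Whether to drop, repair or estimate such entries is a judgment call that depends on the data.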

Complete data ties back to the “quality and quantity” aspect we mentioned before. Generally speaking, the more data, the better for forecasting purposes. Leading forecasting techniques leverage artificial intelligence and machine learning. Machine learning sounds intimidating, but it is simply the study of computer algorithms that improve automatically through experience. These algorithms build a mathematical model based on “training data” to make predictions about what the future holds. Just like a human-generated forecast, the more experience (or training data), the better. Algorithm-based forecasting has led, in part, to businesses seeking more and more data about their day-to-day operations. Businesses are seeing the benefits of more and more granular data reporting in areas such as staffing, customer service and sales forecasting. Consider this value proposition when selecting a forecast provider, and realize that the most advanced providers will now be able to leverage all the data you provide to better meet your business needs.

Identifying outliers

Special attention should be given to outliers, or data points that differ significantly from other observations. It is important to treat outliers before forecasting because they can wreak havoc on any statistical analysis. There are many potential causes of outliers, including measurement error, natural variation and human error. Each root cause will have a different remedy. Measurement and data processing errors might require a change to the process of obtaining and transmitting the underlying data. Natural outliers are typically just a novelty in the data and indicate a real divergence from the norm. Human error is also prevalent and often needs to be addressed on a case-by-case basis.
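As an illustration of how outliers might be spotted in the first place, the sketch below uses one common statistical approach, the modified z-score based on the median and the median absolute deviation (MAD). This is our choice for the example and only one of many possible techniques; the function name, the sample counts and the 3.5 cutoff (a frequently used rule of thumb) are assumptions.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Flag points whose modified z-score exceeds the threshold.

    The median and MAD are used instead of the mean and standard deviation
    because they are far less distorted by the very outliers being sought.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # At least half the points equal the median; this rule cannot rank them.
        return [False] * len(values)
    return [abs(0.6745 * (v - med) / mad) > threshold for v in values]


# Hypothetical daily customer counts with one suspicious spike.
daily_customers = [102, 98, 105, 97, 101, 310, 99, 103]
flags = flag_outliers(daily_customers)
suspects = [v for v, is_outlier in zip(daily_customers, flags) if is_outlier]
```

A check like this only tells you that a point is unusual. Deciding whether it is a measurement problem, a real event or a human error still requires looking at the root cause.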

An advanced forecast should have a way to treat several different types of outliers. Legitimate, natural outliers can be flagged by what is commonly referred to as a forecasting tag. A forecasting tag is an input to your forecasting process that specifically calls out the tagged data as a known outlier. Leading forecasting methods leverage this tag data to create layered forecasting predictions that are much more accurate than older methods. Similarly, forecast tags can be used to omit illegitimate data. If you find an outlier in your data that was a one-off occurrence, how can it be omitted from future forecasts? Advanced forecasting models that utilize tags will often be able to characterize this type of outlier for you using machine learning. By simply tagging the event, the algorithm will compare the tagged data with historical data and decide how to treat it based on actual performance.
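A heavily simplified sketch of the tagging idea is shown below. It is not how any particular forecasting product works: here a tag is just a label attached to a historical data point, legitimate tags produce a multiplier layered on top of a baseline, and tags listed as illegitimate are left out entirely. The tag names, the function names and the use of a plain average are all assumptions made for the example.

```python
from statistics import mean


def baseline_and_uplift(history):
    """Split history into an untagged baseline and a per-tag multiplier.

    history is a list of (value, tag) pairs; tag is None for ordinary days.
    """
    normal = [value for value, tag in history if tag is None]
    baseline = mean(normal)  # tagged days never distort the baseline
    by_tag = {}
    for value, tag in history:
        if tag is not None:
            by_tag.setdefault(tag, []).append(value)
    uplift = {tag: mean(vals) / baseline for tag, vals in by_tag.items()}
    return baseline, uplift


def forecast_day(baseline, uplift, tag=None, ignore=("data_error",)):
    """Layer the tag's multiplier on the baseline, unless the tag is ignored."""
    if tag is None or tag in ignore or tag not in uplift:
        return baseline
    return baseline * uplift[tag]


# Hypothetical history: two tagged holidays and one tagged bad reading.
history = [
    (100, None), (104, None), (96, None), (150, "holiday"),
    (100, None), (0, "data_error"), (160, "holiday"),
]
baseline, uplift = baseline_and_uplift(history)
holiday_forecast = forecast_day(baseline, uplift, tag="holiday")
ordinary_forecast = forecast_day(baseline, uplift)
```

In this toy version a person decides up front which tags to ignore. The machine learning approach described above goes further and lets the algorithm judge how to treat a tagged event based on actual performance.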

Forecasting algorithms have transformed how businesses predict future results

There are dozens of algorithms, each with slightly different logic for building its mathematical model and making predictions. Some algorithms weight recent data more heavily, while others seek to smooth out the more volatile data for a more consistent result. The best forecast providers will use multiple forecasting algorithms and select whichever is most accurate for your specific business application.
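The idea of running several algorithms and keeping the most accurate one can be sketched in a few lines of Python. The three simple methods below (last value, moving average, exponential smoothing) stand in for the far richer algorithms a real provider would use, and accuracy is measured as mean absolute error over the last few periods. The sales figures, parameter values and function names are assumptions for illustration.

```python
def naive(history):
    """Predict that the next value equals the most recent one."""
    return history[-1]


def moving_average(history, window=3):
    """Smooth out volatility by averaging the last few observations."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def exponential_smoothing(history, alpha=0.5):
    """Weight recent data more heavily; older points fade geometrically."""
    level = history[0]
    for value in history[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


def backtest_mae(model, series, holdout):
    """Mean absolute error of one-step-ahead forecasts over the holdout."""
    errors = []
    for i in range(len(series) - holdout, len(series)):
        prediction = model(series[:i])  # only data before period i is visible
        errors.append(abs(series[i] - prediction))
    return sum(errors) / len(errors)


def select_model(models, series, holdout=4):
    """Score every candidate and return the name with the lowest error."""
    scores = {name: backtest_mae(fn, series, holdout) for name, fn in models.items()}
    best = min(scores, key=scores.get)
    return best, scores


models = {
    "naive": naive,
    "moving_average": moving_average,
    "exponential_smoothing": exponential_smoothing,
}
weekly_sales = [120, 125, 118, 130, 127, 135, 131, 140, 138, 145]  # hypothetical
best, scores = select_model(models, weekly_sales)
```

The winner depends on the data: a steadily trending series and a highly volatile one may favor different methods, which is why selection is worth repeating for each business application.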

So, how satisfied are you with the forecasting in your retail business? First, consider the following:

Are you satisfied with the accuracy and breadth and depth of the metrics?
Are there areas that could be managed more effectively with better data (e.g., customer count in service areas other than front ends)?
What would it take to get this data (e.g., new equipment, changes to how existing data is captured)?

Now let’s take these questions a step further.

For front ends, there are significant opportunities to make better use of existing data if you are not already separating it by register group (to isolate front-end activity from perimeter registers), by large and small orders (to optimize express) and by self-checkout.
Going deeper, looking at data distinctly by item type and transaction type will provide a better output as compared to homogenized data.
The source of most transactional data is the point-of-sale (POS) system. Often the output is limited to total store or data by register type, not by department or by items sold at a service counter. You should consider factors such as:
- What data is optimal?
- How important is it to have data that helps you best position your service resources when customers need that service?
- What is it worth?
Going after better data may not be a huge IT project if your solution provider can read and process transaction log (T-log) data. Sometimes the best data is never pursued because it appears too daunting to get it done through existing IT resources.
Your solution provider for forecasting and scheduling needs to be able to use the data properly. The best systems enable task-based scheduling rather than scheduling at job or department levels. This can have a significant impact on the value that the right data can provide.
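To show what separating front-end data might look like in practice, here is a small sketch that buckets transactions by register group, order size and self-checkout before counting them by hour. The field names, the segment labels and the 15-item cutoff for a small order are assumptions for illustration and do not reflect any actual POS or T-log format.

```python
from collections import Counter

EXPRESS_MAX_ITEMS = 15  # assumed cutoff between small and large orders


def segment(txn):
    """Assign a transaction to a forecasting segment."""
    if txn["register_group"] != "front_end":
        return txn["register_group"]  # perimeter registers stay separate
    if txn["self_checkout"]:
        return "front_end_self_checkout"
    if txn["items"] <= EXPRESS_MAX_ITEMS:
        return "front_end_small_order"
    return "front_end_large_order"


def count_by_segment(transactions):
    """Count transactions per (hour, segment) instead of per store total."""
    return Counter((txn["hour"], segment(txn)) for txn in transactions)


# Hypothetical transactions parsed from a transaction log.
transactions = [
    {"hour": 9, "register_group": "front_end", "self_checkout": False, "items": 42},
    {"hour": 9, "register_group": "front_end", "self_checkout": False, "items": 6},
    {"hour": 9, "register_group": "front_end", "self_checkout": True, "items": 6},
    {"hour": 9, "register_group": "deli", "self_checkout": False, "items": 2},
    {"hour": 10, "register_group": "front_end", "self_checkout": False, "items": 8},
]
counts = count_by_segment(transactions)
```

Each segment can then be forecasted on its own history, rather than being blended into a single homogenized store total.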

Improving your forecast may require investing in equipment or technology to capture data in a cleaner, more consistent and more complete way, but the output could be worth it.

