It's an inevitable problem. Over time, data gets messy. This includes the data in your SAP system. Why does this happen?
During day-to-day operations, data gets duplicated. For example, various users may enter redundant data without knowing that it already exists. For some, systems have been merged together following digital strategy or operational changes. Clients may have also have changed their corporate setup leading to redundant data. Whatever the case, you may end up with data representing the same object, but recorded in different ways.
While it might not seem like a critical problem at first, it can actually lead to huge issues. For example, sales could be reported to the wrong business unit because customers exist twice in the system because they were created by salespeople from different teams. The solution is proper data cleansing.
Data cleansing is the process of resolving issues with duplicate data and making sure that such issues don't occur in the first place. This includes double-checking the system's data quality and addressing all problems: starting with data extraction from SAP and satellite systems, followed by automatic and manual checks in Microsoft Excel or other data visualization programs, and completed with system update during a business interruption.
The first step in any data cleansing process is to identify and extract the data.
While it sounds simple at first, it gets more compiles the data cleansing coordinator goes deeper into the system.
Which datasets should be checked, how and by whom? These questions are not trivial, and it usually starts by defining data objects, and their scope.
If a corporation plans on cleansing their customer data, they might want to only include clients that have been active in the past few years, and simply ignore any client that did not interact for more years than the necessary legal duration for invoice archiving.
Older clients can then be avoided, and active ones extracted for further data checks. These checks should be based on defined data definition: which fields are necessary for the checks, can contain wrong information, and will have to be checked.
A data object is a defined way of organizing a subset of data: customer information, bill of materials, are some data objects examples.
The more data objects are identified in the process, the more work will be necessary to cleanse data.
Once the data objects have been defined, extraction processes are validated: how often the data is changed, how long the cleansing will take, and how often will it be extracted?
As business and data entry cannot be stopped, the system data will keep being updated with new – and potentially duplicate – data that must be added to the checklist.
A cleansing process typically takes several months, therefore it is necessary to clearly define these steps.
Once the data has been extracted, it can be checked and corrected by data cleansers, based on various data quality criteria:
The ongoing process of data quality validation will help to check metrics such as the percentage of completion of these various checks, that are managed by the data cleansing coordinator, checked by the data cleansers in coordination with business champions, and reported to management until the data quality score matches company's goal, at which point the system can be updated.
The final step in the data cleansing activity, at least for a specific project, unless the company has a continuous data quality improvement process, is to proceed to the system update.
It will typically involve a system stop and will be performed over a weekend, during a low business volume period, or by taking advantage of public holidays and business closure.
Depending on the amount of data to be updated, it can take up to several days, and any issue during this critical process might lead to business discontinuity beyond the planned business break.
The technical team will take care of updating the system data based on extracted data and quality check results, and update them one by one using custom-developed programs: archive old and irrelevant data, update identified values, and proceed to the next object.
If any issue occurs at this point in the process, the whole team will most likely be called up to work on it until the update process has been technically finished, and key users can validate data cleansing process completion.
Ensuring data correctness isn't only an ongoing process. It might be necessary to run a whole project following technical updates, company mergers, and other situations.
While each data cleansing project is different, it is generally necessary for the whole team to be skilled in data visualization.
Ensuring such a project's success can be accomplished through workforce upskilling not only in data cleansing understanding, but also in system basic usage, and most likely with complete Microsoft Excel skills, as most of the actual checks include setting up formulas that will compare huge amount of data automatically.
Plan your SAP data cleansing project success and save resources time by getting them upskilled with the following certifications:
There are various basic material types available by default in any standard SAP installation. This list can also be extended or customized with your own material types, as required by your business. What ...
Acquired by SAP in 2012 for $4.3 billion, the SAP Ariba network is an online platform that connects buyers and suppliers together in order to streamline and optimize their corporate operational ...
What is the Plan-Buy-Pay Process? The Plan-Buy-Pay process, which is the basic concept behind the Ariba business network, is the idea that the whole procurement process is interconnected, with the ...