Thursday, June 13, 2024

Data Matching




Data matching is the process of identifying similar or identical entities so that they can be merged into a single view of each entity. Data is collected from multiple sources, and the same entity can appear in more than one of them. Finding those entities across sources by comparing their attributes is the data matching process.

To understand this, let's take the example of a bank database that holds customer data. This data is collected from various sources, e.g. the loan department, the savings account department, and the insurance department. A single customer may well be a customer of all three departments. Based on attributes like address, phone, name, and email, we can identify records that refer to the same customer. This process is called data matching.
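
The bank example can be sketched in a few lines of Python. This is only a minimal illustration, not a production approach: the department lists, field names, and sample values are made up, and we assume that a normalized email plus the digits of the phone number are enough to say two records belong to the same customer.

```python
from collections import defaultdict

# Hypothetical sample records; names and values are invented for illustration.
loan_dept = [
    {"name": "Monika Sharma", "email": "Monika.Sharma@example.com", "phone": "(91) 12345678"},
]
savings_dept = [
    {"name": "monika sharma", "email": "monika.sharma@example.com ", "phone": "9112345678"},
]
insurance_dept = [
    {"name": "Rahul Verma", "email": "rahul.verma@example.com", "phone": "9187654321"},
]


def match_key(record):
    """Build a comparison key from the normalized email and the phone digits."""
    email = record["email"].strip().lower()
    phone = "".join(ch for ch in record["phone"] if ch.isdigit())
    return (email, phone)


def group_customers(*sources):
    """Group records from all sources that share the same match key."""
    groups = defaultdict(list)
    for source_name, records in sources:
        for record in records:
            groups[match_key(record)].append((source_name, record["name"]))
    return dict(groups)


groups = group_customers(
    ("loan", loan_dept),
    ("savings", savings_dept),
    ("insurance", insurance_dept),
)
for key, members in groups.items():
    print(key, members)
```

Here the loan and savings records end up in the same group, while the insurance record stays on its own. A real system would usually compare more attributes, such as name and address, before declaring a match.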

Data matching can be done in two ways:

  1. Fuzzy matching: The fuzzy matching technique finds records that are similar but not identical. It treats two words as a match based on phonetics, i.e. similarity in pronunciation. Example: Monika and Monica can be considered a match.
  2. Exact matching: As the name suggests, two records are considered an exact match only if they are copies of each other. Example: Monika and Monica will not be considered a match. Only Monika and another identical Monika will be matched.
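
The difference between the two approaches can be shown with the Monika and Monica example. The sketch below uses a compact version of the classic Soundex code as the phonetic rule. That choice is an assumption made for illustration; other phonetic or similarity algorithms would work too, and the function names are hypothetical.

```python
def exact_match(a, b):
    """Exact matching: the two values must be identical."""
    return a == b


def soundex(name):
    """Return a 4-character Soundex code (illustrative, compact version)."""
    codes = {}
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6")):
        for ch in letters:
            codes[ch] = digit

    letters = [ch for ch in name.lower() if ch.isalpha()]
    if not letters:
        return ""

    result = letters[0].upper()
    prev = codes.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate consonants with the same code
        code = codes.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # vowels reset prev, so a repeated code after a vowel counts
    return (result + "000")[:4]


def phonetic_match(a, b):
    """Fuzzy matching: the two values must share the same Soundex code."""
    return soundex(a) != "" and soundex(a) == soundex(b)


print(exact_match("Monika", "Monica"))     # False
print(phonetic_match("Monika", "Monica"))  # True, both encode to M520
```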

Benefits of data matching:

  • Reduced unnecessary costs: Suppose a bank sends speed post to a customer every month. The customer has changed their address in one system, but the speed post system was not updated. With data matching, the bank can identify the customer across both systems and send the speed post to the new address. Removing duplicates also reduces data storage.
  • Identifying duplicate records: Records are collected from various systems, and matching helps identify the duplicates among them.
  • Verifying the accuracy of data: If we find two duplicate records, we can compare them to arrive at an accurate consolidated record.
  • Consolidating data: We can create a consolidated record by collecting records from multiple sources, which gives the best version of the record.
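
Consolidation can be sketched as a field-by-field merge of the duplicates. The rule used here, that the most recently updated non-empty value wins, is an assumption chosen for illustration, and the `updated` field and sample values are hypothetical. Real projects define their own survivorship rules.

```python
def consolidate(records):
    """Merge duplicate records; assumption: the newest non-empty value wins."""
    merged = {}
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings, oldest first.
    for record in sorted(records, key=lambda r: r["updated"]):
        for field, value in record.items():
            if value not in ("", None):
                merged[field] = value  # later records overwrite earlier ones
    return merged


# Hypothetical duplicates of one customer from two systems.
duplicates = [
    {"name": "Monika Sharma", "address": "12 Old Street", "email": "", "updated": "2023-01-10"},
    {"name": "Monika Sharma", "address": "45 New Road", "email": "monika@example.com", "updated": "2024-03-05"},
]

golden = consolidate(duplicates)
print(golden)
```

The consolidated record keeps the new address and picks up the email that was missing in the older record, which is the situation from the speed post example above.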

Challenges with data matching:

  • Data entry issues: While entering data into a system, someone may use a different spelling for the same name or use a short form of the name. This results in incorrect or incomplete data.
  • Mismatches in data formats: The best example of a mismatch in data format is a phone number. (91) 12345678 and 9112345678 are the same number, but because the formats differ the match will not be detected. We can use a data standardization technique to avoid this.
  • Overmatching and undermatching: If the matching rules are too loose, unrelated records get matched as duplicates, which is overmatching. If the matching rules are too tight, we find very few duplicates and miss real ones, which is undermatching.
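
Two of these challenges are easy to see in code. The sketch below standardizes phone numbers by keeping only the digits, and then shows how a similarity threshold changes the result. It uses `SequenceMatcher` from Python's standard `difflib` module as the similarity measure, which is a swapped-in spelling-based score rather than a phonetic one. The candidate names and threshold values are assumptions picked for illustration.

```python
from difflib import SequenceMatcher


def normalize_phone(raw):
    """Standardize a phone number by keeping only its digits."""
    return "".join(ch for ch in raw if ch.isdigit())


def similarity(a, b):
    """Return a case-insensitive similarity score between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_matches(name, candidates, threshold):
    """Return the candidates whose similarity to name reaches the threshold."""
    return [c for c in candidates if similarity(name, c) >= threshold]


# After standardization the two formats compare as equal.
print(normalize_phone("(91) 12345678") == normalize_phone("9112345678"))

# Hypothetical candidate names to compare against "Monika".
candidates = ["Monica", "Monika", "Monique", "Veronica"]
for threshold in (0.5, 0.8, 1.0):
    print(threshold, find_matches("Monika", candidates, threshold))
```

With the loose threshold of 0.5 every candidate is returned, including Veronica, which is overmatching. With the strict threshold of 1.0 only the identical Monika is returned and Monica is missed, which is undermatching. The middle value of 0.8 returns Monica and Monika for this sample, but the right threshold always depends on the data.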

Use cases:

  • Retail Sector: The retail sector can use MDM (master data management) to get consolidated customer data and, based on customer purchase behavior, provide offers to specific customers.
  • Financial Services: A bank can use this technique to identify potential fraud. Before granting a loan, the bank can check whether the applicant is an existing customer and whether they have already taken a loan from the bank.
  • Healthcare Sector: Healthcare providers can use this technique to match a patient with their medical history, so that before treating the patient they are aware of the patient's existing health condition.
  • Marketing and Sales: Marketing and sales teams can use this information to plan new marketing schemes, comparing customers' purchase histories to design offers for them.

