Solved MBA IT Assignment and Notes


Discuss the challenges involved in the data integration and consolidation process.
Answer: 

Most of the data a warehouse receives is extracted from a combination of legacy mainframe systems, old minicomputer applications, and some client/server systems. These source systems do not conform to the same set of business rules, so they often follow different naming conventions and different standards for data representation. This is why the process of data integration and consolidation plays a vital role.

Data integration here means combining all the relevant operational data into coherent data structures that are ready for loading into the data warehouse. Some of the challenges involved in the data integration and consolidation process are as follows.

Identification of an Entity

Suppose three legacy applications are in use in your organization: an order entry system, a customer service support system, and a marketing system.

-    Each of these systems might have its own customer file. Because the data warehouse must keep a single record for each customer, you need to collect the transactions of each customer from the various source systems and match them up before loading them. This is the entity identification problem: you do not know which of the customer records relate to the same customer.

-    This problem is prevalent wherever multiple sources exist for the same entities. Other entities prone to it include vendors, suppliers, employees, and the various products manufactured by a company.

-    With three customer files, you have to design complex algorithms that match records across all three files and form groups of matching records. This is a difficult exercise. If the matching criterion is too tight, some records that belong to a group will be left out of it. If the criterion is too loose, a group may include records of more than one customer.
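The effect of a tight or loose matching criterion can be illustrated with a minimal sketch. It assumes that customers are compared on name and address using the string similarity ratio from Python's standard `difflib` module; the field names, sample records, and threshold values are hypothetical, and a real matching algorithm would be considerably more elaborate.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio between 0 and 1 for two lightly normalized strings."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def group_customers(records, threshold):
    """Greedily group records that resemble the first record of an existing group.

    A record joins the first group whose representative it matches with an
    average name/address similarity of at least `threshold`; otherwise it
    starts a new group.
    """
    groups = []
    for rec in records:
        for group in groups:
            rep = group[0]
            score = (similarity(rec["name"], rep["name"])
                     + similarity(rec["address"], rep["address"])) / 2
            if score >= threshold:
                group.append(rec)
                break
        else:
            groups.append([rec])
    return groups


# Hypothetical customer records from the three source systems.
records = [
    {"source": "order_entry", "name": "John A. Smith", "address": "12 Oak Street"},
    {"source": "customer_service", "name": "John Smith", "address": "12 Oak St."},
    {"source": "marketing", "name": "Joan Smith", "address": "98 Pine Avenue"},
]

for threshold in (0.95, 0.8, 0.3):
    groups = group_customers(records, threshold)
    print(threshold, [[r["source"] for r in g] for g in groups])
```

With this sample data, the tightest threshold leaves every record in its own group, so the two records for the same customer are never joined, while the loosest threshold merges all three records, including one for a different customer.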

Existence of Multiple Sources

-    Another major challenge in data integration and consolidation arises when a single data element has more than one source. For instance, cost values are calculated and updated at specific intervals in the standard costing application. Your order processing application also carries the unit costs for all products.

-    There are thus two sources for the unit cost of a product, and their values may differ slightly. Which system should supply the unit cost stored in the data warehouse becomes an important question. One straightforward way to handle this is to assign priorities to the two sources; alternatively, you may select the source with the most recent update date.
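The two selection rules can be sketched as follows. The source names, the priority order, the cost values, and the dates are all hypothetical and serve only to illustrate the idea.

```python
from datetime import date

# Hypothetical priority order: a source earlier in the list wins.
SOURCE_PRIORITY = ["standard_costing", "order_processing"]


def pick_by_priority(candidates):
    """Choose the candidate whose source ranks highest in SOURCE_PRIORITY."""
    return min(candidates, key=lambda c: SOURCE_PRIORITY.index(c["source"]))


def pick_by_last_update(candidates):
    """Choose the candidate that was updated most recently."""
    return max(candidates, key=lambda c: c["last_updated"])


# Two hypothetical values for the unit cost of the same product.
candidates = [
    {"source": "standard_costing", "unit_cost": 12.40, "last_updated": date(2023, 1, 1)},
    {"source": "order_processing", "unit_cost": 12.55, "last_updated": date(2023, 3, 15)},
]

print(pick_by_priority(candidates)["unit_cost"])     # value from standard_costing
print(pick_by_last_update(candidates)["unit_cost"])  # value from order_processing
```

The two rules can disagree, as they do here, so the choice between them has to be made once and applied consistently for the data element.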

Implementation of Transformation

-    The implementation of data transformation is a complex exercise. You may have to go beyond the manual methods, that is, the usual approach of writing conversion programs that was used when the operational systems were deployed.

-    You need to consider several other factors when deciding which methods to adopt. If you are considering automating the data transformation functions, you have to identify, configure, and install the tools, train the team on them, and integrate them into the data warehouse environment.
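To give a sense of what a hand-written conversion program involves, here is a minimal sketch that brings rows from two source systems to one naming convention and one date representation. The field names, date formats, and per-source rules are assumptions made for illustration, not a description of any particular system or tool.

```python
from datetime import datetime

# Hypothetical per-source rules: how each system names its fields and writes dates.
SOURCE_RULES = {
    "order_entry": {
        "fields": {"CUST_NM": "customer_name", "ORD_DT": "order_date"},
        "date_format": "%m/%d/%Y",
    },
    "marketing": {
        "fields": {"CustomerName": "customer_name", "OrderDate": "order_date"},
        "date_format": "%d.%m.%Y",
    },
}


def convert(source, row):
    """Rename fields to the warehouse convention and standardize the values."""
    rules = SOURCE_RULES[source]
    out = {rules["fields"][name]: value for name, value in row.items()}
    # Standardize name casing and surrounding whitespace.
    out["customer_name"] = out["customer_name"].strip().title()
    # Parse the source-specific date format and emit ISO 8601 (YYYY-MM-DD).
    parsed = datetime.strptime(out["order_date"], rules["date_format"])
    out["order_date"] = parsed.date().isoformat()
    return out


print(convert("order_entry", {"CUST_NM": " JOHN SMITH ", "ORD_DT": "03/15/2023"}))
print(convert("marketing", {"CustomerName": "john smith", "OrderDate": "15.03.2023"}))
```

Every new source or field means another rule to write and maintain by hand, which is one reason automated transformation tools come under consideration.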

Transformation for Dimension Attributes

-    Now consider the updating of the dimension tables. Dimension tables are more stable, that is, less volatile, than fact tables. Fact tables change through an increase in the number of rows, whereas dimension tables change through changes to their attributes.

-    For instance, consider a product dimension table. Every year, rows are added as new models become available. But what about the attributes within the dimension table? A product may be moved into a different product category, in which case the corresponding values must be changed in the product dimension table. Although most dimensions are generally constant over time, they may change slowly.
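The two kinds of change to the product dimension can be sketched as follows, assuming the dimension is held as a simple in-memory dictionary keyed by a product key. The keys, product names, and categories are hypothetical, and the attribute change shown simply overwrites the old value.

```python
# Hypothetical product dimension, keyed by product key.
product_dim = {
    101: {"product_name": "Model A", "category": "Appliances"},
    102: {"product_name": "Model B", "category": "Appliances"},
}


def add_product(dim, key, name, category):
    """Add a row for a new model; existing rows are left untouched."""
    if key in dim:
        raise ValueError(f"product key {key} already exists")
    dim[key] = {"product_name": name, "category": category}


def change_attribute(dim, key, attribute, new_value):
    """Overwrite one attribute of an existing row and return the old value."""
    old_value = dim[key][attribute]
    dim[key][attribute] = new_value
    return old_value


# A new model becomes available: the table grows by one row.
add_product(product_dim, 103, "Model C", "Appliances")

# An existing product moves to a different category: a value changes in place.
previous = change_attribute(product_dim, 102, "category", "Electronics")
print(previous, "->", product_dim[102]["category"])
```

Overwriting in place discards the previous category, so it suits cases where the old value does not need to be kept.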


