What are the data modeling techniques used in data warehousing environment?
Answer:
Two data modeling techniques that are relevant in a data warehousing environment are ER modeling and Multidimensional modeling.ER modeling produces a data model of the specific area of interest, using two basic concepts: entities and the relationships between those entities.
Multidimensional modeling uses three basic concepts: measures, facts, and dimensions. Multidimensional modeling is powerful in representing the requirements of the business user in the context of database tables.
Both ER and Multidimensional modeling can be used to create an abstract model of specific subject.
A) ER Modeling:
An ER model is represented by an ER diagram, which uses three basic graphic symbols to conceptualize the data: entity, relationship, and attribute.
Entity
- An entity is defined to be a person, place, thing, or event of interest to the business or the organization.- An entity represents a class of objects, which are things in the real world that can be observed and classified by their properties and characteristics.
- In the detailed ER model, defining a unique identifier of an entity is the most critical task. These unique identifiers are called candidate keys. From them we can select the key that is most commonly used to identify the entity. It is called the primary key.
Relationship:
- A relationship is represented with lines drawn between entities. It depicts the structural interaction and association among the entities in a model. A relationship is designated grammatically by a verb, such as owns, belongs, and has.
- The relationship between two entities can be defined in terms of the cardinality. This is the maximum number of instances of one entity that are related to a single instance in another table and vice versa. The possible cardinalities are: one-to-one (1:1), one-to-many (1:M), and many-to-many (M:M).
Attributes
- Attributes describe the characteristics of properties of the entities. Eg: Product ID, Description, and Picture are attributes of the PRODUCT entity.
- An attribute name should be unique in an entity and should be self-explanatory.
B) Dimensional Modeling
Dimensional modeling is a technique for conceptualizing and visualizing data models as a set of measures that are described by common aspects of the business. It is especially useful for summarizing and rearranging the data and presenting views of the data to support data analysis.
Dimensional modeling has several basic concepts:
• Facts
• Dimensions
• Measures (variables)
Fact
- A fact is a collection of related data items, consisting of measures and context data. Each fact typically represents a business item, a business transaction, or an event that can be used in analyzing the business or business processes.
Dimension
- A dimension is a collection of members or units of the same type of views. In a diagram, a dimension is usually represented by an axis.
- Dimensions are the parameters over which we want to perform Online Analytical Processing (OLAP). For example, in a database for analyzing all sales of products, common dimensions could be:
• Time
• Location/region
• Customers
• Salesperson
Measure
- A measure is a numeric attribute of a fact, representing the performance or behavior of the business relative to the dimensions.
- The actual numbers are called as variables. For example, measures are the sales in money, the sales volume, the quantity supplied, the supply cost, the transaction amount, and so forth.
Source: Redbooks.IBM.com
No comments:
Post a Comment