Wednesday, July 8, 2015

Star Schema

Star Schema                 

Star schema basically has a fact table and Dimension table surrounding it. All the metrics are stored in the fact table and the entities and their attributed are stored in the dimension tables. The dimension tables are connected to fact tables but are not connected to each other.
The main reason for opting this model is easy retrieval of data in queries. Let’s say, there is a requirement which says  ‘Show me the Sales Amount by Sales Department by Material Group by Month’. When using Star Schema,  Sales Amount is a present in the Fact table and the Sales Department , Material, and Month in Dimension table. How we model our dimensions is important when using Star Schema.
Case Study:                                                                                                                                                                              
‘Track Performance of Materials with respect to Customers and Sales Person.


  • Having complete understanding of the underlying business process
    • Identify the Entities and the relations between them. 
      • The entities like, Customer, Material and Sales Person are called ‘Strong Entities’ when designing dimensions, we need to know the relation between them. From general understanding we can say that, one customer can buy any number of materials and similarly, a material can be purchased by any number of customers. They share a n:m relation. Same with the Sales Person and Customer, Sales Person and Materials.
    • Know the how the performance is measured/Facts.
      • For knowing the performance, we need to know how many materials are sold which gives the facts. Facts are normally additive. Sales transactions would give the details. Hence, Sales Transactions become the Intersection Entity which changes the n:m relation between the dimensions to give a 1:n relation.
    • Determine if any additional details are required.
      • This gives additional entities and attributes. For Materal, we have material number, Material Name, type etc  as attributes and we might need to also consider, material group which is again an entity and can have attributes of its own.
    • If additional details are required, we need to know the relation between entities and relation between entities and their attributes.
  • Creating a valid data model.
    • The organization of the entities and their attributes that are arranged in a parent-child relationship(1:n) into groups need to done
These groups are called dimensions and their properties are called attributes. The strong entities define the dimensions and the attributes of the dimension represent a specific business view on the facts which are derived from the intersection entities.

The attributes of the dimension are then organized in a hierarchical way and the most atomic attribute forms the leaves of the hierarchy tree defines the granularity of the dimension. This is MDM (Multidimensional Model) where the facts are based in the center with the dimensions surrounding them, is a simple but effective concept.



Fact Table:  A central intersection entity defines a Fact Table. An intersection entity such as document number is normally described by facts (sales amount, quantity), which form the non-key columns of the fact table. In fact, M:N relationships between strong entities meet each other in the fact table, thus defining the cut between dimensions.

Dimensions (Tables): Attributes with 1:N conditional relationships should be stored in the same dimension such as material group and material.The foreign - primary key relations define the dimensions

Time Dimension: One exception is the time dimension. As there is no correspondence in the ERM, time attributes (day, week,year) have to be introduced in the MDM process to cover the analysis needs.

Dimension tables should have a 'relatively' small number of rows (in comparison to the fact table; factor at least 1:10 to 1:20).

Disadvantages

  • In star schema, the master data is present in the dimension table, which increase the size and also,it is accessible only for that infocube. Resuability is reduced.
  • Handling Languages ( For each language , there is a new record)
  • Handling Time Dependency – increases the volume of the table
  • Star schema used alpha numeric values for keys which increase the time when compared to using numeric keys.
  • There can only be 16 dimensions  for conducting analysis(*)

No comments:

Post a Comment