Understanding Vertical Fragmentation in Distributed Information Systems
Vertical fragmentation in distributed information systems involves partitioning a relation into smaller fragments based on attributes and primary keys to improve application performance. Information requirements include attribute affinity and access frequencies, with examples of attribute usage and affinity matrices. Clustering algorithms play a key role in designing effective vertical fragmentation strategies.
- Vertical Fragmentation
- Distributed Information Systems
- Clustering Algorithm
- Attribute Affinity
- Application Performance
Download Presentation
Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
E N D
Presentation Transcript
Distributed Information Systems (3.3.2.1-3.3.2.2[100 ~ 108-]) By Gowri Priya Paladugu Presentation id: 04
OUTLINE What is Vertical Fragmentation? What are Information Requirements of Vertical Fragmentation? What is Clustering Algorithm?
Vertical Fragmentation A vertical fragmentation of a relation R produces fragments R1;R2, .;Rr, each of which contains a subset of R s attributes as well as the primary key of R. The objective of vertical fragmentation is to partition a relation into a set of smaller relations so that many of the user applications will run on only one fragment.
Information Requirements of Vertical Fragmentation The major information required for vertical fragmentation is related to applications. Since vertical partitioning places in one fragment those attributes usually accessed together, there is a need for some measure that would define more precisely the notion of togetherness. This measure is the affinity of attributes, which indicates how closely related the attributes are. The major information requirement related to applications is their access frequencies.
Example Let Q = {q1;q2; : : : ;qq} be the set of user queries (applications) that access relation R(A1;A2; : : : ;An). Then, for each query qi and each attribute Aj, we associate an attribute usage value, denoted as use(qi;Aj ), and defined as follows:
Attribute Affinity Matrix (AA). Attribute usage values are not sufficiently general to form the basis of attribute splitting and fragmentation. This is because these values do not represent the weight of application frequencies. The frequency measure can be included in the definition of the attribute affinity measure aff (Ai;Aj), which measures the bond between two attributes of a relation according to how they are accessed by applications.
Clustering Algorithm The fundamental task in designing a vertical fragmentation algorithm is to find some means of grouping the attributes of a relation based on the attribute affinity values in AA. It has been suggested that the bond energy algorithm (BEA) should be used for this purpose It is considered appropriate for the following reasons: It is designed specifically to determine groups of similar items as opposed to say, a linear ordering of the items. The final groupings are insensitive to the order in which items are presented to the algorithm. The computation time of the algorithm is reasonable: O(n2), where n is the number of attributes. Secondary interrelationships between clustered attribute groups are identifiable.
Clustered Affinity Matrix The bond energy algorithm takes as input the attribute affinity matrix, permutes its rows and columns, and generates a clustered affinity matrix (CA) Generation of the clustered affinity matrix (CA) is done in three steps: Initialization. Place and fix one of the columns of AA arbitrarily into CA. Iteration. Pick each of the remaining n? i columns (where i is the number of columns already placed in CA) and try to place them in the remaining i+1 positions in the CA matrix. Row ordering. Once the column ordering is determined, the placement of the rows should also be changed so that their relative positions match the relative positions of the columns