Working with Large Data Volumes – Partitioning

As more and more the increase in information is getting visible to each database user, there comes a question on how are we to process these volumes in a Data Warehouse environment.

One of the first answers provided by Oracle on this topic is Partitioning.

What is Partitioning?

Similar to an Operating System partitioning, from a database perspective, we should envision partitioning as a logical division of data into separate units (like smaller tables). This allows the database to manage, to a certain extent, information on each partition as if it were a distinct table. Doing this, implies operating smaller sections of data, improving efficiency.

Partitioning enables tables and indexes to be subdivided into individual smaller pieces. Each piece of the database object is called a partition. A partition has its own name, and may optionally have its own storage characteristics.

From the perspective of a database administrator, a partitioned object has multiple pieces that can be managed either collectively or individually. This gives the administrator considerable flexibility in managing a partitioned object.

However, from the perspective of the application, a partitioned table is identical to a non-partitioned table; no modifications are necessary when accessing a partitioned table using SQL DML commands. Logically, it is still only one table

A query-rewrite view on an union all select of identical tables (from structure perspective)

So, to a certain extend, I would compare a partitioned table with a view on multiple tables which have the same structure (and you don’t need to bother in defining each table with the same structure, just adding a partition will give it same metadata), with a special column that sorts/tells you a critical selectivity criteria for that particular data set in the entire view, and which allows query rewrite.

Why?

Looking at a view containing multiple tables, if we query, through the view, information from only one table, Oracle knows to do a very neat trick of query rewrite, and re-writes your query into a selection from only that particular table. This allows for better performance results.

Well, partitioning does something similar, in the context of selecting from one single (or a selection of) partition(s), when filtering on the partitioning key. This is called partition pruning.

Partitioning types

In Oracle, there are a couple of major partitioning types, given a certain key/column:

  • Range Partitioning The data is distributed based on a range of values.
  • List Partitioning The data distribution is defined by a discrete list of values.
  • Hash Partitioning An internal hash algorithm is applied to the partitioning key to determine the partition.

Also, it allows sub-partitioning, which is a combination of the primary partitioning types. This is called Composite Partitioning. First, the table is partitioned by data distribution method one and then each partition is further subdivided into sub-partitions using the second data distribution method.

Additional methods of partitioning:

  • Multi-Column Range Partitioning: An option for when the partitioning key is composed of several columns and subsequent columns define a higher level of granularity than the preceding ones.
  • Interval Partitioning: Extends the capabilities of the range method by automatically defining equi-partitioned ranges for any future partitions using an interval definition as part of the table metadata.
  • Reference Partitioning: Partitions a table by leveraging an existing parent-child relationship. The primary key relationship is used to inherit the partitioning strategy of the parent table to its child table.
  • Virtual Column Based Partitioning: Allows the partitioning key to be an expression, using one or more existing columns of a table, and storing the expression as metadata only.
  • Interval Reference Partitioning: An extension to reference partitioning that allows the use of interval partitioned tables as parent tables for reference partitioning.
  • Range Partitioned Hash Cluster: Allows hash clusters to be partitioned by ranges.

How it Works?

For examples on some of the mentioned partitioning methods, please see my following posts:

See also sub-partitioning examples :

Why to use it?

Partitioning:

  • Increases performance by only working on the data that is relevant.
  • Improves availability through individual partition manageability.
  • Decreases costs by storing data in the most appropriate manner.
  • Is easy as to implement as it requires no changes to applications and queries.

References:

Please note I’ve used the official Oracle documentation for the definitions used in this post for each of the partitioning types, as well as the well known benefits. You can find the original page here.

Advertisements
This entry was posted in DW and tagged , , , , , , , , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s