big data – BI Insight

Oracle Indexing

Indexing is one of the most frequent approaches to resolving query performance issues in a database (though not necessarily the right one, but we can come back to this later).

However, to use indexing strategies well, we should first go through the index types and understand how they work.

First of all, please keep in mind that an index is logically and physically independent of the data it represents. This means you can create, drop or rebuild an index without affecting the data in the table the index is associated with.

An index can be created on one or more columns of a table. It lets queries retrieve a small set of randomly distributed rows at a lower cost, by reducing the I/O that the alternative, a full table scan, would require.

The general candidates for indexing are:

Unique indexes on candidate unique/primary key columns, which also lets you name the index when creating the associated constraint on the table;
foreign key (referential constraint) columns;
columns used in frequent queries with high selectivity (columns whose filters return only a small percentage of the rows in the table).

! Note: Primary and unique keys automatically have indexes, but you might want to create an index on a foreign key.
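A minimal sketch of the first two considerations, using hypothetical DEPT and EMP tables (the names are mine, for illustration only): the unique index is created and named first and then attached to the primary key constraint, while the foreign key column, which Oracle does not index automatically, gets an explicit index.

```sql
-- Hypothetical parent table.
create table dept
( dept_id   number       not null
, dept_name varchar2(50) not null
);

-- Create and name the unique index first...
create unique index dept_pk_idx on dept (dept_id);

-- ...then let the primary key constraint use it.
alter table dept add constraint dept_pk primary key (dept_id)
  using index dept_pk_idx;

-- Hypothetical child table with a foreign key to DEPT.
create table emp
( emp_id   number        constraint emp_pk primary key  -- index created automatically
, dept_id  number        constraint emp_dept_fk references dept (dept_id)
, emp_name varchar2(100)
);

-- The foreign key column is NOT indexed automatically. Indexing it helps joins
-- to DEPT and avoids full scans (and table-level locks) on EMP when a DEPT key
-- is deleted or updated.
create index emp_dept_fk_idx on emp (dept_id);
```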

Indexes are maintained automatically by the database, with no additional action required from the user. This, however, does not mean that an index comes without costs. An index can improve query performance, but it always adds overhead to data manipulation: every insert, update or delete has to maintain both the table the DML is submitted on and each of its indexes.

When testing an indexing strategy, a developer can take advantage of the following properties of the indexes:

usability
visibility

Usability: Indexes are usable by default. An unusable index is neither maintained by DML operations nor used by the optimizer. This property can help improve the performance of bulk loads: instead of dropping and recreating the index, we can make it unusable and then rebuild it after the load.

! Note: Unusable indexes and index partitions do not consume space. When you make a usable index unusable, the database drops its index segment.

Syntax:

To make an index unusable:

alter index test_idx unusable;

The index status is now UNUSABLE.

To rebuild the index:

alter index test_idx rebuild;

After the rebuild, the index status is VALID again.
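A sketch of the bulk-load pattern described above, assuming TEST_IDX is a nonunique index on a hypothetical table TEST_TAB and that STAGING_TAB is a hypothetical source with the same structure. It relies on the SKIP_UNUSABLE_INDEXES parameter being TRUE (its default). For a unique index, DML against the table fails with ORA-01502 while the index is unusable, so this pattern suits nonunique indexes.

```sql
-- 1. Stop maintaining the index during the load.
alter index test_idx unusable;

select index_name, status
from   user_indexes
where  index_name = 'TEST_IDX';   -- STATUS shows UNUSABLE

-- 2. Load the data; the unusable (nonunique) index is skipped.
insert /*+ append */ into test_tab
select * from staging_tab;

commit;

-- 3. Rebuild the index once, after the load.
alter index test_idx rebuild;

select index_name, status
from   user_indexes
where  index_name = 'TEST_IDX';   -- STATUS shows VALID
```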

Visibility: Indexes are by default visible. An invisible index will still be maintained by DML operations but will not be used by the optimizer. Invisible indexes are especially useful for testing the removal of an index before dropping it or using indexes temporarily without affecting the overall application.

alter index test_idx invisible;

The index visibility is now INVISIBLE.

To make the index visible again:

alter index test_idx visible;
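One way to use this when testing the removal of an index is the OPTIMIZER_USE_INVISIBLE_INDEXES parameter: set at session level, it lets only your session's optimizer consider invisible indexes, so you can compare plans with and without TEST_IDX while the rest of the application is unaffected. A minimal sketch:

```sql
alter index test_idx invisible;

-- VISIBILITY shows INVISIBLE; STATUS stays VALID, since DML keeps maintaining it.
select index_name, status, visibility
from   user_indexes
where  index_name = 'TEST_IDX';

-- In this session only, let the optimizer consider invisible indexes.
alter session set optimizer_use_invisible_indexes = true;

-- ... explain or run the queries that should benefit from TEST_IDX here ...

-- Back to the default behaviour for this session.
alter session set optimizer_use_invisible_indexes = false;
```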

Index types (based on the number of columns):

single-column index
composite (multi-column) index

Index types (based on key uniqueness):

unique
nonunique
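A short illustration of these types on a hypothetical ORDERS_DEMO table (not from the original post):

```sql
create table orders_demo
( order_id    number       not null
, customer_id number       not null
, order_date  date         not null
, status      varchar2(10)
);

-- Single-column, nonunique index: duplicate ORDER_DATE values are allowed.
create index orders_demo_date_idx on orders_demo (order_date);

-- Composite index: usable for filters on CUSTOMER_ID alone (the leading
-- column) or on CUSTOMER_ID together with ORDER_DATE.
create index orders_demo_cust_date_idx on orders_demo (customer_id, order_date);

-- Unique index: inserting a second row with an existing ORDER_ID raises ORA-00001.
create unique index orders_demo_id_uk on orders_demo (order_id);

-- UNIQUENESS shows UNIQUE or NONUNIQUE for each index.
select index_name, uniqueness
from   user_indexes
where  table_name = 'ORDERS_DEMO';
```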

Nonunique indexes permit duplicate values in the indexed column or columns. For a nonunique index, the rowid is included in the key in sorted order, so nonunique indexes are sorted by the index key and then by rowid (ascending).

!Note: Oracle Database does not index table rows in which all key columns are null, except for bitmap indexes or when the cluster key column value is null.
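A sketch of what this means for queries on nulls, using a hypothetical NULL_DEMO table: a single-column B-tree index on a nullable column holds no entries for rows where that column is null, whereas adding a NOT NULL column to the key gives every row an entry.

```sql
create table null_demo
( id number not null
, c3 number             -- nullable
);

-- Rows with C3 null have no entry here, so this index cannot
-- answer "where c3 is null".
create index null_demo_c3_idx on null_demo (c3);

-- ID is NOT NULL, so no key in this composite index is entirely null:
-- every row is indexed, and the optimizer may use it for "c3 is null".
create index null_demo_c3_id_idx on null_demo (c3, id);

select * from null_demo where c3 is null;
```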

Index types (based on structure of the index):

B-tree (balanced tree index) (standard type)
- Index Organized Table (IOT)
- Reverse key index
- Descending index
- B-tree cluster index
bitmap and bitmap join index
function based index
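The corresponding CREATE INDEX syntax for the B-tree variants, bitmap and function-based indexes, sketched on a hypothetical SALES_FACT table (IOTs and cluster indexes use their own syntax):

```sql
create table sales_fact
( sale_id   number        not null
, region    varchar2(10)  not null
, sale_date date          not null
, cust_name varchar2(100)
);

-- Standard B-tree index.
create index sales_fact_date_idx on sales_fact (sale_date);

-- Reverse key index: the key bytes are reversed, so sequential SALE_ID values
-- are spread across leaf blocks (less insert contention), but the index cannot
-- serve range predicates such as sale_id between 1 and 100.
create index sales_fact_id_rev_idx on sales_fact (sale_id) reverse;

-- Descending index: SALE_DATE is stored in descending order within REGION.
create index sales_fact_reg_date_desc_idx on sales_fact (region, sale_date desc);

-- Bitmap index: suited to low-cardinality columns in read-mostly tables.
create bitmap index sales_fact_region_bix on sales_fact (region);

-- Function-based index: supports predicates such as upper(cust_name) = 'SMITH'.
create index sales_fact_upper_name_idx on sales_fact (upper(cust_name));
```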

B-tree:

excellent for PK and highly-selective indexing
data retrieved sorted by the indexed columns

By associating a key with a row or range of rows, B-trees provide excellent retrieval performance for a wide range of queries, including exact match and range searches.

IOT: an index-organized table differs from a classic (heap-organized) table in that the data is stored in the index itself.

For more details on IOTs, please see related article here.
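A minimal sketch of an IOT, using a hypothetical COUNTRY_IOT lookup table. An IOT must have a primary key, and ORGANIZATION INDEX stores the rows in that primary key's B-tree:

```sql
create table country_iot
( country_code char(2)      constraint country_iot_pk primary key
, country_name varchar2(60) not null
)
organization index;

-- IOT_TYPE shows IOT for an index-organized table (null for a heap table).
select table_name, iot_type
from   user_tables
where  table_name = 'COUNTRY_IOT';
```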

B-tree cluster index: used to index a table cluster key. Instead of pointing to a row, the key points to the block that contains the rows related to the cluster key.
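A sketch of an indexed table cluster with its B-tree cluster index, assuming hypothetical C_DEPT and C_EMP tables that share the DEPT_ID cluster key. The cluster index has to exist before rows can be inserted into the clustered tables:

```sql
-- The cluster stores rows of both tables with the same DEPT_ID together.
create cluster emp_dept_cluster (dept_id number(4))
  size 512;

-- B-tree cluster index: one entry per cluster key value, pointing to the
-- block(s) holding the rows for that key.
create index emp_dept_cluster_idx on cluster emp_dept_cluster;

create table c_dept
( dept_id   number(4)    primary key
, dept_name varchar2(50)
)
cluster emp_dept_cluster (dept_id);

create table c_emp
( emp_id   number        primary key
, dept_id  number(4)
, emp_name varchar2(100)
)
cluster emp_dept_cluster (dept_id);
```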

In a bitmap index, an index entry uses a bitmap to point to multiple rows. In contrast, a B-tree index entry points to a single row. A bitmap join index is a bitmap index for the join of two or more tables.
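A minimal sketch of a bitmap join index on a hypothetical star schema (FACT_SALES joined to DIM_CUSTOMER). The index is built on the fact table but keyed by a dimension column, and the dimension's join column must be its primary key or carry a unique constraint:

```sql
-- Hypothetical dimension table.
create table dim_customer
( cust_id     number  constraint dim_customer_pk primary key
, cust_gender char(1) not null
);

-- Hypothetical fact table.
create table fact_sales
( sale_id number not null
, cust_id number not null
, amount  number
);

-- Bitmap join index: built on FACT_SALES rows, keyed by DIM_CUSTOMER.CUST_GENDER.
create bitmap index fact_sales_gender_bjix
  on fact_sales (dim_customer.cust_gender)
  from fact_sales, dim_customer
  where fact_sales.cust_id = dim_customer.cust_id;

-- The optimizer may answer a query like this from the index,
-- without performing the join.
select count(*)
from   fact_sales f, dim_customer c
where  f.cust_id = c.cust_id
and    c.cust_gender = 'F';
```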

Oracle: Partition by List Sub-Partition by Range – Example

The following post walks you through an exercise of creating a partitioned table using list partitioning with range subpartitioning (with explicitly defined partition and subpartition names), populating it and testing partition pruning.

Please note I will also post the scripts in each section, so you can replicate the work.

Creating our Work Table:

I’m creating a sample table T3 with 4 columns, with the following structure:

(Screenshot: table structure)

SQL:

create table t3
(c1 char(3) not null
, c2 date not null
, c3 number
, c4 varchar2(100))
partition by list (c1)
subpartition by range(c2)
 (
  partition P1 values ('ABC')
 ( subpartition p1_20161003 values less than (to_date('04-oct-2016','dd-mon-yyyy'))
 , subpartition p1_20161004 values less than (to_date('05-oct-2016','dd-mon-yyyy'))
 , subpartition p1_20161005 values less than (to_date('06-oct-2016','dd-mon-yyyy'))
 , subpartition p1_20161006 values less than (to_date('07-oct-2016','dd-mon-yyyy'))
 , subpartition p1_20161007 values less than (to_date('08-oct-2016','dd-mon-yyyy'))
 , subpartition p1_20161008 values less than (to_date('09-oct-2016','dd-mon-yyyy'))
 , subpartition p1_20170101 values less than (to_date('01-jan-2017','dd-mon-yyyy'))
 )
 , partition P2 values ('ACD')
 ( subpartition p2_20161003 values less than (to_date('04-oct-2016','dd-mon-yyyy'))
 , subpartition p2_20161004 values less than (to_date('05-oct-2016','dd-mon-yyyy'))
 , subpartition p2_20161005 values less than (to_date('06-oct-2016','dd-mon-yyyy'))
 , subpartition p2_20161006 values less than (to_date('07-oct-2016','dd-mon-yyyy'))
 , subpartition p2_20161007 values less than (to_date('08-oct-2016','dd-mon-yyyy'))
 , subpartition p2_20161008 values less than (to_date('09-oct-2016','dd-mon-yyyy'))
 , subpartition p2_20170101 values less than (to_date('01-jan-2017','dd-mon-yyyy'))
 ) ) ;
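To double-check the partitions and subpartitions that were actually created, the data dictionary can be queried. A small sketch, assuming the table is named T3 (the name used by the insert and statistics scripts):

```sql
-- One row per subpartition, with its parent partition and upper bound.
-- Expected: 14 rows (2 list partitions x 7 range subpartitions).
select partition_name
     , subpartition_name
     , subpartition_position
     , high_value
from   user_tab_subpartitions
where  table_name = 'T3'
order  by partition_name, subpartition_position;
```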

We partition this table by list on the C1 column, and subpartition each partition by range on the C2 column, a NOT NULL date column, splitting the data into a predefined number of date ranges.

I’m generating a sample data set of about 100 000 rows.

(Screenshot: partitions and subpartitions after the insert)

Afterwards, please note a very important step: I’m gathering my stats 🙂

insert into t3 
select 
 case 
 when mod(level,20) <11 then 'ABC'
 when mod(level,20) >10 then 'ACD'
 end as c1
 , to_date('04-oct-2016','dd-mon-yyyy')+level/24/60 
, level 
, 'test record '||level 
from dual 
connect by level <=100000; 

commit; 

execute dbms_stats.gather_table_stats(user,'T3');

Partition Pruning:

Now, let’s run a couple of tests to see how partitioning actually helps our performance.

Please note that I used “Autotrace” to show the actual plan and the partition pruning for our selects.
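If you prefer a script to SQL Developer’s Autotrace, EXPLAIN PLAN with DBMS_XPLAN.DISPLAY shows similar information in SQL*Plus or any SQL client: the Pstart and Pstop columns give the first and last (sub)partition accessed, and KEY means the range is determined at run time. Note that EXPLAIN PLAN shows the estimated plan rather than the one actually executed. A sketch for one of the queries below:

```sql
explain plan for
select * from t3
where c1 = 'ABC';

-- Pstart/Pstop columns list the (sub)partitions the plan will access.
select * from table(dbms_xplan.display);
```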

First scenario:

select * from t3;

This is our base test: select all data from our partitioned table:

(Screenshot: Autotrace plan for the full table scan)

Second scenario:

select * from t3
where c1='BCD';

Filtering on the partition key with a value that does not exist in the table ('BCD' matches no partition):

(Screenshot: Autotrace plan when filtering by the partition key with a non-existent value)

Third scenario:

select * from t3
where c1='ABC';

Filtering on the partition key with a valid value ('ABC'):

(Screenshot: Autotrace plan when filtering by the partition key with a valid value)

Fourth scenario:

select * from t3
where c2 <
     (select /*+ no_unnest result_cache */
            (to_Date ('05-OCT-2016', 'dd-mon-yyyy')) + 1
      from dual)
;

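Filtering on a range of values of the subpartition key (C2 earlier than 06-OCT-2016), without any filter on the partition key C1:

(Screenshot: Autotrace plan when filtering by the subpartition key)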

Conclusions:

I’ve been using the SQL Developer Autotrace to demonstrate partition pruning.

As you can see, the selects do partition pruning when filtering on one or multiple partitions.

Most interestingly, the database also does pruning when filtering directly on the subpartition key.

Working with Large Data Volumes – Partitioning

As the growth in information becomes more and more visible to every database user, the question arises of how to process these volumes in a Data Warehouse environment.

One of the first answers provided by Oracle on this topic is Partitioning.

What is Partitioning?

Similar to partitioning a disk at the operating system level, from a database perspective we should think of partitioning as a logical division of data into separate units (like smaller tables). This allows the database to manage, to a certain extent, each partition as if it were a distinct table. As a result, operations work on smaller sections of data, which improves efficiency.

Partitioning enables tables and indexes to be subdivided into individual smaller pieces. Each piece of the database object is called a partition. A partition has its own name, and may optionally have its own storage characteristics.

From the perspective of a database administrator, a partitioned object has multiple pieces that can be managed either collectively or individually. This gives the administrator considerable flexibility in managing a partitioned object.

However, from the perspective of the application, a partitioned table is identical to a non-partitioned table; no modifications are necessary when accessing a partitioned table using SQL DML commands. Logically, it is still only one table.

A query-rewrite view on a UNION ALL select of structurally identical tables

So, to a certain extent, I would compare a partitioned table with a view on multiple tables that have the same structure (except that you don’t need to bother defining each table with that structure: adding a partition gives it the same metadata), plus a special column that tells you which data set each row belongs to. That column is a critical selectivity criterion across the entire view, and it allows the query to be rewritten.

Why?

When we query, through a view over multiple tables, data that can only come from one of those tables (for example, because each table has a check constraint on the distinguishing column), Oracle can do a neat rewrite trick: it restricts the query to that particular table and skips the others. This gives better performance.

Partitioning does something similar: when a query filters on the partitioning key, Oracle reads only the relevant partition (or partitions). This is called partition pruning.
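To make the analogy concrete, here is a minimal sketch of such a view (often called a "partition view"), with hypothetical tables T_ABC and T_ACD. The CHECK constraint on C1 is what tells the optimizer which table can hold which values:

```sql
-- Two hypothetical tables with identical structure.
create table t_abc
( c1 char(3) not null constraint t_abc_c1_chk check (c1 = 'ABC')
, c2 date
, c3 number
);

create table t_acd
( c1 char(3) not null constraint t_acd_c1_chk check (c1 = 'ACD')
, c2 date
, c3 number
);

-- The "view on multiple tables" of the analogy.
create or replace view v_all as
select c1, c2, c3 from t_abc
union all
select c1, c2, c3 from t_acd;

-- Thanks to the CHECK constraints, the optimizer can skip the T_ACD branch;
-- its part of the plan typically shows a FILTER on NULL IS NOT NULL.
select * from v_all where c1 = 'ABC';
```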

Partitioning types

In Oracle, there are three basic partitioning methods, applied to a given partitioning key (one or more columns):

Range Partitioning: the data is distributed based on ranges of values.
List Partitioning: the data distribution is defined by a discrete list of values.
Hash Partitioning: an internal hash algorithm is applied to the partitioning key to determine the partition.

Oracle also supports subpartitioning, a combination of the basic partitioning methods, called Composite Partitioning: the table is first partitioned using one data distribution method, and each partition is then subdivided into subpartitions using a second method.
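A minimal sketch of hash partitioning, and of composite range-hash partitioning, using hypothetical table names:

```sql
-- Hash partitioning: rows are spread over 4 partitions by a hash of CUST_ID.
create table cust_hash
( cust_id   number not null
, cust_name varchar2(100)
)
partition by hash (cust_id)
partitions 4;

-- Composite range-hash: range partitions on SALE_DATE, each subdivided
-- into 4 hash subpartitions on CUST_ID.
create table sales_range_hash
( sale_id   number not null
, cust_id   number not null
, sale_date date   not null
)
partition by range (sale_date)
subpartition by hash (cust_id) subpartitions 4
( partition p2016 values less than (date '2017-01-01')
, partition pmax  values less than (maxvalue)
);
```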

Additional methods of partitioning:

Multi-Column Range Partitioning: An option for when the partitioning key is composed of several columns and subsequent columns define a higher level of granularity than the preceding ones.
Interval Partitioning: Extends the capabilities of the range method by automatically defining equi-partitioned ranges for any future partitions using an interval definition as part of the table metadata.
Reference Partitioning: Partitions a table by leveraging an existing parent-child relationship. The primary key relationship is used to inherit the partitioning strategy of the parent table to its child table.
Virtual Column Based Partitioning: Allows the partitioning key to be an expression, using one or more existing columns of a table, and storing the expression as metadata only.
Interval Reference Partitioning: An extension to reference partitioning that allows the use of interval partitioned tables as parent tables for reference partitioning.
Range Partitioned Hash Cluster: Allows hash clusters to be partitioned by ranges.
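As an illustration of one of these methods, a minimal sketch of reference partitioning with hypothetical REF_ORDERS and REF_ORDER_ITEMS tables: the child table has no date column of its own, yet it inherits the parent's range partitions through the foreign key, whose column must be NOT NULL.

```sql
-- Hypothetical parent table, range partitioned on ORDER_DATE.
create table ref_orders
( order_id   number not null constraint ref_orders_pk primary key
, order_date date   not null
)
partition by range (order_date)
( partition p2016 values less than (date '2017-01-01')
, partition pmax  values less than (maxvalue)
);

-- Hypothetical child table: no ORDER_DATE column, but it is partitioned
-- exactly like REF_ORDERS through the REF_ORDER_ITEMS_FK foreign key.
create table ref_order_items
( item_id  number not null
, order_id number not null
, qty      number
, constraint ref_order_items_fk foreign key (order_id)
    references ref_orders (order_id)
)
partition by reference (ref_order_items_fk);
```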

How does it work?

For examples of some of the partitioning methods mentioned above, please see my following posts:

Why use it?

Partitioning:

Increases performance by working only on the data that is relevant.
Improves availability through individual partition manageability.
Decreases costs by storing data in the most appropriate manner.
Is easy to implement, as it requires no changes to applications and queries.

References:

Please note I’ve used the official Oracle documentation for the definitions of each partitioning type in this post, as well as for the well-known benefits. You can find the original page here.

Oracle: Partition by Range – Example

The following post walks you through an exercise of creating a partitioned table using range partitioning with an interval clause (so that new partitions are created and named automatically), populating it and testing partition pruning.

Please note I will also post the scripts in each section, so you can replicate the work.

Creating our Work Table:

I’m creating a sample table T2 with 4 columns, with the following structure:

(Screenshot: table T2 structure)

SQL:

create table 
 t2
(c1 char(3) not null
, c2 date not null
, c3 number
, c4 varchar2(100))
partition by range(c2)
interval (numtodsinterval (1,'day'))
 (
 partition empty values less than (to_Date ('03-OCT-2016', 'dd-mon-yyyy'))
 )
;

We partition this table by range on the C2 column, a NOT NULL date column. Thanks to the interval clause, the number of partitions is not defined in advance: a new one-day partition is created whenever inserted data falls beyond the existing ones.

I’m generating a sample data set of about 100 000 rows.

(Screenshot: automatically generated partitions of T2)

Afterwards, please note a very important step: I’m gathering my stats 🙂

insert into t2 
select 
 case 
 when mod(level,20) <11 then 'ABC'
 when mod(level,20) >10 then 'ACD'
 end as c1
 , sysdate+level/24/60
 , level
 , 'test record '||level
from dual
connect by level <=100000;


commit;

execute dbms_stats.gather_table_stats(user,'T2');

Partition Pruning:

Now, let’s run a couple of tests to see how partitioning actually helps our performance.

Please note that I used “Autotrace” to show the actual plan and the partition pruning for our selects.

First scenario:

select * from t2;

This is our base test: select all data from our partitioned table:

(Screenshot: Autotrace plan for selecting all rows)

Second scenario:

select * from t2
where c1='ABC';

Filtering on a column that is not the partition key (C1), so no partition pruning is possible:

(Screenshot: Autotrace plan when filtering on a non-partition key)

Third scenario:

select * from t2
where c2 =
     (select /*+ no_unnest result_cache */
            (to_Date ('05-OCT-2016', 'dd-mon-yyyy'))
      from dual)
;

Filtering on a single value of the partition key (C2 = '05-OCT-2016'):

(Screenshot: Autotrace plan selecting one partition when filtering on the partition key)

Fourth scenario:

select * from t2
where c2 <
     (select /*+ no_unnest result_cache */
            (to_Date ('07-OCT-2016', 'dd-mon-yyyy'))
      from dual)
;

Filtering on a range of values of our partition key (C2 earlier than '07-OCT-2016'):

(Screenshot: Autotrace plan selecting multiple partitions when filtering on the partition key)

Conclusions:

I’ve been using the SQL Developer Autotrace to demonstrate partition pruning.

As you can see, the selects do partition pruning when filtering on one or multiple partitions.

Also, please note that the initially defined partition is named as expected, while the partitions created automatically during the insert get system-generated names.
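To see which partitions were defined up front and which were created by the interval clause, USER_TAB_PARTITIONS can be queried; its INTERVAL column is YES for automatically created partitions. A system-named partition can also be addressed by a value it contains, through the PARTITION FOR syntax, without knowing its generated name. A sketch, assuming the data was loaded in early October 2016 as in the original run (with SYSDATE-based data loaded later, pick a date inside your own load):

```sql
-- INTERVAL = 'YES' marks partitions created automatically by the interval clause.
select p.partition_name
     , p.partition_position
     , p.interval
     , p.high_value
from   user_tab_partitions p
where  p.table_name = 'T2'
order  by p.partition_position;

-- Count the rows of the partition that holds 05-OCT-2016.
select count(*)
from   t2 partition for (to_date('05-OCT-2016', 'dd-mon-yyyy'));
```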

Oracle: Partition by List – Example

The following post walks you through an exercise of creating a partitioned table using list partitioning, populating it and testing partition pruning.

Please note I will also post the scripts at the end of the post, so you can download them to replicate the work.

Creating our Work Table:

I’m creating a sample table T1 with 4 columns, with the following structure:

(Screenshot: table T1 structure)

SQL:

create table 
 t1
(c1 char(3) not null
, c2 date
, c3 number
, c4 varchar2(100))
partition by list (c1)
 (
 partition ABC values ('ABC')
 , partition ACD values ('ACD')
 )
;

We partition this table by the C1 column, which is what I call a category column: not null, splitting the data into a few finite categories. In our case we have 2 categories: ‘ABC’ and ‘ACD’.
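With only these two partitions, T1 accepts no other C1 values: an insert of, say, 'BCD' fails with ORA-14400 (inserted partition key does not map to any partition). A DEFAULT partition catches every value not listed elsewhere. A sketch, where P_OTHER is a hypothetical partition name; it is dropped again at the end so the tests below keep the original two-partition layout:

```sql
-- Fails with ORA-14400: no partition accepts 'BCD'.
insert into t1 (c1, c2, c3, c4)
values ('BCD', sysdate, 0, 'no matching partition');

-- Add a partition for every value not listed elsewhere.
alter table t1 add partition p_other values (default);

-- The same kind of row now lands in P_OTHER.
insert into t1 (c1, c2, c3, c4)
values ('BCD', sysdate, 0, 'stored in the default partition');
rollback;

-- Restore the original two-partition layout.
alter table t1 drop partition p_other;
```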

I’m generating a sample data set of 400 000 rows, split between the two partitions (about 55% ‘ABC’ and 45% ‘ACD’, given the mod(level,20) logic in the script).

Afterwards, please note a very important step: I’m gathering my stats 🙂

insert into t1 
select 
 case 
 when mod(level,20) <11 then 'ABC'
 when mod(level,20) >10 then 'ACD'
 end as c1
 , sysdate+level/24
 , level
 , 'test record '||level
from dual
connect by level <=400000;


commit;

execute dbms_stats.gather_table_stats(user,'T1');
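A quick way to check the split once the data is loaded: mod(level,20) returns 0 to 19, and the 11 values from 0 to 10 map to 'ABC', so 'ABC' gets 11 rows out of every 20 and 'ACD' the remaining 9. For 400 000 rows that is 220 000 and 180 000 respectively:

```sql
select c1, count(*) as row_count
from   t1
group  by c1
order  by c1;

-- C1   ROW_COUNT
-- ABC     220000
-- ACD     180000
```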

Partition Pruning:

Now, let’s run a couple of tests to see how partitioning actually helps our performance.

Please note that I used “Autotrace” to show the actual plan and the partition pruning for our selects.

First scenario:

select * from t1;

This is our base test: select all data from our partitioned table:

(Screenshot: Autotrace plan for selecting all rows)

Second scenario:

select * from t1
where c1='ABC';

select * from t1
where c1='ACD';

Filtering on one of the values of the partition key, ‘ABC’:

(Screenshot: Autotrace plan when filtering on ‘ABC’ only)

Third scenario:

select * from t1
where c1 in ('ABC', 'ACD');

Filtering on both values of our partition key (‘ABC’ and ‘ACD’):

(Screenshot: Autotrace plan when filtering on both values)

Conclusions:

I’ve been using the SQL Developer Autotrace to demonstrate partition pruning.

As you can see, the selects do partition pruning when filtering on one or multiple partitions.

Scripts:

test-scripts

Please note the scripts are uploaded as PDFs, but you can still copy the code from them.
