Buffer Busy Waits – Reverse Key Index – Demo

We discussed indexing earlier, and specifically reverse key indexing.

I mentioned it as a solution for buffer busy waits on numeric keys inserted in consecutive order (such as sequence-generated ones).

This post will test that solution.

For the regular index test, I’ve created the following table, sequence and associated index:

create table test_tb
( row_id number
, text varchar2(100));

create sequence test_seq start with 1 increment by 1 cache 10000;

create index test_nidx on test_tb(row_id);

and a simple procedure that will insert the rows:

create or replace procedure insert_n (p_rows number default 10000) as 
begin
 
 for i in 1..p_rows loop
 insert into test_tb
 select test_seq.nextval, 'test row '||test_seq.currval from dual;
 
 end loop;
 commit;
end;
/

 

For the reverse key index test:

create table test_tb2
( row_id number
, text varchar2(100));

create sequence test_seq2 start with 1 increment by 1 cache 10000;

create index test_ridx on test_tb2(row_id) reverse;

and a simple procedure that will insert the rows:

create or replace procedure insert_r (p_rows number default 10000) as 
begin
 
 for i in 1..p_rows loop
 insert into test_tb2
 select test_seq2.nextval, 'test row '||test_seq2.currval from dual;
 
 end loop;
 commit;
end;
/

To test the effect of concurrent inserts, I’ve executed each procedure from 3 sessions (please note the number of sessions in my case is small, but it will still show the impact of the two index types):

execute insert_n(100000);

execute insert_r(100000);
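If you don’t want to open several terminals by hand, one possible way to launch the concurrent sessions (a sketch only, not necessarily how the original test was run; the job names are illustrative) is to submit the calls as scheduler jobs, each of which runs in its own session:

begin
  for j in 1..3 loop
    -- three concurrent sessions inserting into the regular-index table
    dbms_scheduler.create_job(
      job_name   => 'INS_N_JOB_' || j,
      job_type   => 'PLSQL_BLOCK',
      job_action => 'begin insert_n(100000); end;',
      enabled    => true);
    -- three concurrent sessions inserting into the reverse-key-index table
    dbms_scheduler.create_job(
      job_name   => 'INS_R_JOB_' || j,
      job_type   => 'PLSQL_BLOCK',
      job_action => 'begin insert_r(100000); end;',
      enabled    => true);
  end loop;
end;
/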

Looking at the buffer busy waits recorded for each segment, we already see a clear benefit from the reverse key index:

[Screenshots: buffer busy waits query and results]

select object_name, value, statistic_name,object_type
from v$segment_statistics
where owner='C##INSIGHT'
and object_type='INDEX'
and statistic_name='buffer busy waits'
and object_name like 'TEST%'
;

Oracle Reverse Index

This is one of the least used Oracle index types I’ve encountered in data-warehousing environments. It is, however, a very good instrument for performance problems on OLTP systems.

To understand a reverse key index, we must first look at a regular b-tree index on a numeric key and how it is stored. The index stores all its keys in ascending sort order. If we consider a key generated by a sequence, any new value for that key will therefore be inserted into the last (right-most) leaf block.

On a system with high-frequency inserts, this pattern will make you encounter the buffer busy waits event one too many times.

To see a test of this event, please see the post Buffer Busy Waits – Reverse Key Index – Demo.

With a reverse key index, the stored key is the reversed value (Oracle actually reverses the key bytes, but decimal digits illustrate the idea): for the values 12, 13 and 14 the stored keys become 21, 31 and 41. This scatters consecutive key inserts across multiple blocks, removing the hot-block problem that regular keys have.

!!! Please note there is a downside to this approach: the index can no longer be used for range scans on the key.
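To illustrate the trade-off on the test_tb2 table from the demo post (this is the typical optimizer behaviour, not a guarantee):

-- equality predicates can still use the reverse key index
select * from test_tb2 where row_id = 12345;

-- range predicates on the key can no longer use it for an index range scan,
-- so the optimizer will typically fall back to a full table scan
select * from test_tb2 where row_id between 12000 and 12999;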

How to create a reverse key index:

create index INDEX_NAME on TABLE_NAME(INDEX_KEY) reverse;

It is also possible to change an existing index, making it reverse:

alter index INDEX_NAME rebuild reverse;

[Figure: reversed b-tree]

And also to revert a reverse key index into a regular index:

alter index INDEX_NAME rebuild noreverse;

[Figure: regular b-tree]
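To check whether an existing index is currently a reverse key index, the data dictionary can be queried; as far as I recall, it is reported with an index_type of NORMAL/REV:

select index_name, index_type
  from user_indexes
 where index_name = 'INDEX_NAME';
-- index_type is NORMAL for a regular b-tree and NORMAL/REV for a reverse key index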


Cursor Loop Updates

I’m writing this topic mostly for database developers coming from the programming world.

I have seen various procedural units that use the FOR loop syntax to run updates on base tables, in scenarios like the one below.

FOR-loop update:

begin
  -- row-by-row update: one PL/SQL-to-SQL context switch per iteration
  for i in (select * from t1) loop
    update t1
       set amount = amount * (1 + (select adjustment
                                     from t2
                                    where discount = i.discount
                                      and categ    = i.categ))
     where row_id = i.row_id;
  end loop;
  commit;
end;
/

The scenario above is the basic example where one table is updated using values from a second table.

To understand the basic problem with the example above, there is something else you need to keep in mind. In Oracle, PL/SQL and SQL are two separate engines. Whenever you call a SQL statement from a PL/SQL block, you perform a context switch between them. While this is cheap for a single simple statement, a large-volume row-by-row update like the one above will show you the accumulated cost of those context switches.

The scope of this post is to encourage a shift towards database-oriented thinking, explaining the pros and cons of this approach, together with options for converting this type of code into classical DML statements.

Basically, what I’m suggesting is: use PL/SQL only where procedural logic is genuinely required, and use SQL in the most efficient manner to avoid these costly mistakes. In my experience, 70–80% of these cursor-loop constructs can be rewritten in plain SQL.

For instance, our prior FOR loop can be rewritten as:

merge into t1
using t2
   on (t1.discount = t2.discount and t1.categ = t2.categ)
 when matched then update
  set t1.amount = t1.amount * (1 + t2.adjustment);

commit;

Now, in my demo VM test environment, the first example, using the cursor-based update, ran for about 15 minutes to update a table of 100K rows using a mapping table of 19 rows, while the MERGE version took less than 2 seconds.

[Screenshots: loop update vs. merge timings]

Please note this is a basic example, but I have not yet met a situation where this was not the case.

Please note MERGE will not allow you to update the columns used in the match (ON) condition, so some situations might require you to adjust your syntax accordingly.
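When the matched key itself has to change, a common fallback (a minimal sketch on the same t1/t2 tables, assuming the discount/categ lookup returns a single row) is a correlated UPDATE, which places no restriction on the columns it sets:

update t1
   set amount = amount * (1 + (select t2.adjustment
                                 from t2
                                where t2.discount = t1.discount
                                  and t2.categ    = t1.categ))
       -- the SET list could just as well include t1.discount or t1.categ,
       -- which MERGE would reject because they appear in the ON clause
 where exists (select 1
                 from t2
                where t2.discount = t1.discount
                  and t2.categ    = t1.categ);
commit;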

Scripts:

[Attachment: test-scripts]

Please note scripts are uploaded as PDFs but you can still copy the code in.

Note: this was run on Oracle Database 12c (12.1.0.1.0).


Oracle: Partitioning and Indexes 

As I’ve been showing in the last couple of posts on partitioning, one of the major benefits of this database option is the selectivity gained when filtering on the partition key, commonly called partition pruning.

So far we’ve been experimenting with plain tables. In the data warehousing world, however, things are not so simple.
One of the most common “performance fixes” we see in DW is indexing. Indexing is not always beneficial, though, and combined with partitioning it can prove a very unfortunate mix if not done properly.

Partitioning gives us two indexing options:

  • Local indexing
  • Global indexing

Basically, a local index behaves similarly to a partitioned table, being split into partitions itself, and performs best where partition pruning applies.
A global index, on the other hand, behaves like a regular index, with one particularity: when it is used, partition pruning is lost.

Basically, from my tests so far, global indexes and partition pruning don’t mix.

How and Which

The following section provides a simple, tested guideline on how to combine these two performance features to get the most value out of your implementation.

Unique indexes should be global – the selectivity a unique index provides is usually much higher than the selectivity obtained from partition pruning.

Non-unique indexes should be local – especially for data skewed across partitions; they combine perfectly with partition pruning (see the sketch below).
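As a minimal sketch of those two guidelines (assuming the t1 table from the partitioning posts, where the generated c3 values happen to be unique; the index names are illustrative, and the scenarios below create their own indexes, so don’t leave these in place when replaying them):

-- unique key: keep the index global (a non-partitioned index is global by default)
create unique index t1_c3_uk on t1(c3);

-- non-unique index on a frequently pruned column: make it local,
-- so each index partition maps to exactly one table partition
create index t1_c2_lidx on t1(c2) local;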

Scenario 1: Global non-unique index

Use the table T1 created as part of the post Oracle: Partition by List – Example.

Create index on the date column:

SQL:

create index t1_dt on t1(c2);

Now look at a regular SQL statement that selects one partition and then filters on the date column:

select * from t1
where 1=1
and c1 in ('ACD')
and c2 between trunc(sysdate) and trunc(sysdate+1);

[Screenshot: execution plan with the global non-unique index]

Scenario 2: Local non-unique index

Create a local index on the date column (dropping the previously created index first):

SQL:

drop index t1_dt;
create index t1_dt on t1(c2) local;

Now look at the same SQL, selecting one partition and filtering on the date column:

select * from t1
where 1=1
and c1 in ('ACD')
and c2 between trunc(sysdate) and trunc(sysdate+1);

[Screenshot: execution plan with the local non-unique index]

 


Oracle: Partition by List Sub-Partition by Range– Example

The following post will walk you through an exercise of creating a partitioned table, using list partitioning with range sub-partitioning (explicit definition of partition and sub-partition names), populating it and testing the partition pruning.

Please note I will also post the scripts at each section so you can replicate the work.

Creating our Work Table:

I’m creating a sample table T3 with 4 columns, with the following structure:

[Screenshot: table T3 structure]

SQL:

create table t3
( c1 char(3) not null
, c2 date not null
, c3 number
, c4 varchar2(100))
partition by list (c1)
subpartition by range (c2)
( partition p1 values ('ABC')
  ( subpartition p1_20161003 values less than (to_date('04-oct-2016','dd-mon-yyyy'))
  , subpartition p1_20161004 values less than (to_date('05-oct-2016','dd-mon-yyyy'))
  , subpartition p1_20161005 values less than (to_date('06-oct-2016','dd-mon-yyyy'))
  , subpartition p1_20161006 values less than (to_date('07-oct-2016','dd-mon-yyyy'))
  , subpartition p1_20161007 values less than (to_date('08-oct-2016','dd-mon-yyyy'))
  , subpartition p1_20161008 values less than (to_date('09-oct-2016','dd-mon-yyyy'))
  , subpartition p1_20170101 values less than (to_date('01-jan-2017','dd-mon-yyyy'))
  )
, partition p2 values ('ACD')
  ( subpartition p2_20161003 values less than (to_date('04-oct-2016','dd-mon-yyyy'))
  , subpartition p2_20161004 values less than (to_date('05-oct-2016','dd-mon-yyyy'))
  , subpartition p2_20161005 values less than (to_date('06-oct-2016','dd-mon-yyyy'))
  , subpartition p2_20161006 values less than (to_date('07-oct-2016','dd-mon-yyyy'))
  , subpartition p2_20161007 values less than (to_date('08-oct-2016','dd-mon-yyyy'))
  , subpartition p2_20161008 values less than (to_date('09-oct-2016','dd-mon-yyyy'))
  , subpartition p2_20170101 values less than (to_date('01-jan-2017','dd-mon-yyyy'))
  )
);

We want to partition this table by the C1 column and sub-partition it by the C2 column, a NOT NULL date column, splitting the data into an explicitly pre-defined set of partitions and sub-partitions.

I’m generating a sample data set of about 100 000 rows.

[Screenshot: table partitions and sub-partitions after the insert]

And afterwards, please note a very important step: I’m gathering my stats 🙂

insert into t3 
select 
 case 
 when mod(level,20) <11 then 'ABC'
 when mod(level,20) >10 then 'ACD'
 end as c1
 , to_date('04-oct-2016','dd-mon-yyyy')+level/24/60 
, level 
, 'test record '||level 
from dual 
connect by level <=100000; 

commit; 

execute dbms_stats.gather_table_stats(user,'T3');
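After gathering the statistics, a quick dictionary query can stand in for the screenshot above and confirm how the rows were spread across the sub-partitions (a sketch, assuming the table lives in your own schema):

select partition_name, subpartition_name, num_rows
  from user_tab_subpartitions
 where table_name = 'T3'
 order by partition_name, subpartition_position;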

Partition Pruning:

Now, let’s run a couple of tests to see how partitioning is actually helping our performance.

Please note that I used SQL Developer’s Autotrace to show the actual plan and the partition pruning for our selects.
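If you don’t have SQL Developer at hand, a plain EXPLAIN PLAN shows the same pruning information in the Pstart/Pstop columns of the plan output (sketch):

explain plan for
select * from t3 where c1 = 'ABC';

select * from table(dbms_xplan.display);
-- check the Pstart/Pstop columns of the PARTITION LIST SINGLE step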

First scenario:

select * from t3;

This is our base test: select all data from our partitioned table:

[Screenshot: full table scan plan]

Second scenario:

select * from t3
where c1='BCD';

Selecting data filtering on the partition key, but with a value that does not exist in the table:

[Screenshot: plan when filtering on a non-existent partition key value]

Third scenario:

select * from t3
where c1='ABC';

Selecting data filtering on the partition key, with a valid value:

[Screenshot: plan when filtering on a valid partition key value]

Fourth scenario:

select * from t3
where c2 <
     (select /*+ no_unnest result_cache */
             to_date('05-oct-2016','dd-mon-yyyy') + 1
        from dual)
;

Filtering on a range of values of our sub-partition key:

[Screenshot: plan showing pruning on the sub-partition key]

Conclusions:

I’ve been using SQL Developer’s Autotrace to demonstrate the partition pruning.

As you can see, the selects will do partition pruning when filtering on one or multiple partitions.

But, most interestingly, the database will also do partition pruning when filtering directly on the sub-partition key.


Working with Large Data Volumes – Partitioning

As the increase in information becomes visible to every database user, the question arises of how we are to process these volumes in a Data Warehouse environment.

One of the first answers provided by Oracle on this topic is Partitioning.

What is Partitioning?

Similar to operating system partitioning, from a database perspective we should envision partitioning as a logical division of data into separate units (like smaller tables). This allows the database to manage, to a certain extent, the information in each partition as if it were a distinct table. In practice, this means operating on smaller sections of data, which improves efficiency.

Partitioning enables tables and indexes to be subdivided into individual smaller pieces. Each piece of the database object is called a partition. A partition has its own name, and may optionally have its own storage characteristics.

From the perspective of a database administrator, a partitioned object has multiple pieces that can be managed either collectively or individually. This gives the administrator considerable flexibility in managing a partitioned object.

However, from the perspective of the application, a partitioned table is identical to a non-partitioned table; no modifications are necessary when accessing a partitioned table using SQL DML commands. Logically, it is still only one table.

A query-rewrite view over a UNION ALL of structurally identical tables

So, to a certain extent, I would compare a partitioned table to a view over multiple tables that share the same structure (except you don’t need to define each table yourself; adding a partition gives it the same metadata), with a special column that tells you a critical selectivity criterion for each data set within the view, and which allows query rewrite.

Why?

Looking at a view over multiple tables, if we query, through the view, information from only one of those tables, Oracle performs a very neat query-rewrite trick and rewrites the query into a selection from only that particular table. This gives better performance.

Partitioning does something similar when you select from a single partition (or a subset of partitions) by filtering on the partitioning key. This is called partition pruning.
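To make the analogy concrete, here is a minimal sketch (the sales_jan/sales_feb tables and the view are hypothetical):

-- two structurally identical tables behind one view
create view sales_all as
select 'JAN' as part_key, s.* from sales_jan s
union all
select 'FEB' as part_key, s.* from sales_feb s;

-- filtering on part_key lets the optimizer eliminate the sales_feb branch,
-- much like partition pruning eliminates partitions of a partitioned table
select * from sales_all where part_key = 'JAN';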

Partitioning types

In Oracle, there are three major partitioning types, given a certain key/column (minimal DDL sketches follow the list):

  • Range Partitioning: the data is distributed based on a range of values.
  • List Partitioning: the data distribution is defined by a discrete list of values.
  • Hash Partitioning: an internal hash algorithm is applied to the partitioning key to determine the partition.
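Minimal DDL sketches for the three methods (all table, column and partition names are illustrative only):

-- range: rows are routed by a range of values of the key
create table orders_r (order_dt date, amount number)
partition by range (order_dt)
( partition p_2016 values less than (to_date('01-jan-2017','dd-mon-yyyy')) );

-- list: rows are routed by a discrete list of values
create table orders_l (region char(3), amount number)
partition by list (region)
( partition p_emea values ('EMA')
, partition p_amer values ('AME') );

-- hash: rows are routed by a hash of the key
create table orders_h (cust_id number, amount number)
partition by hash (cust_id) partitions 4;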

Oracle also allows sub-partitioning, a combination of the primary partitioning types called Composite Partitioning: the table is first partitioned by one data distribution method, and each partition is then further subdivided into sub-partitions using a second method.

Additional methods of partitioning:

  • Multi-Column Range Partitioning: An option for when the partitioning key is composed of several columns and subsequent columns define a higher level of granularity than the preceding ones.
  • Interval Partitioning: Extends the capabilities of the range method by automatically defining equi-partitioned ranges for any future partitions using an interval definition as part of the table metadata.
  • Reference Partitioning: Partitions a table by leveraging an existing parent-child relationship. The primary key relationship is used to inherit the partitioning strategy of the parent table to its child table.
  • Virtual Column Based Partitioning: Allows the partitioning key to be an expression, using one or more existing columns of a table, and storing the expression as metadata only.
  • Interval Reference Partitioning: An extension to reference partitioning that allows the use of interval partitioned tables as parent tables for reference partitioning.
  • Range Partitioned Hash Cluster: Allows hash clusters to be partitioned by ranges.

How Does it Work?

For examples of some of the partitioning methods mentioned, please see my posts Oracle: Partition by Range – Example and Oracle: Partition by List – Example.

See also the sub-partitioning example: Oracle: Partition by List Sub-Partition by Range – Example.

Why Use it?

Partitioning:

  • Increases performance by only working on the data that is relevant.
  • Improves availability through individual partition manageability.
  • Decreases costs by storing data in the most appropriate manner.
  • Is easy to implement, as it requires no changes to applications and queries.

References:

Please note I’ve used the official Oracle documentation for the definitions of each partitioning type used in this post, as well as the well-known benefits. You can find the original page here.


Oracle: Partition by Range– Example

The following post will walk you through an exercise of creating a partitioned table, using range partitioning (with automatic naming of partitions), populating it and testing the partition pruning.

Please note I will also post the scripts at each section so you can replicate the work.

Creating our Work Table:

I’m creating a sample table T2 with 4 columns, with the following structure:

[Screenshot: table T2 structure]

SQL:

create table 
 t2
(c1 char(3) not null
, c2 date not null
, c3 number
, c4 varchar2(100))
partition by range(c2)
interval (numtodsinterval (1,'day'))
 (
 partition empty values less than (to_Date ('03-OCT-2016', 'dd-mon-yyyy'))
 )
;

We want to partition this table by the C2 column, a NOT NULL date column, splitting the data into a number of partitions that is not defined up front (interval partitioning creates them automatically).

I’m generating a sample data set of about 100 000 rows.

[Screenshot: table T2 auto-generated partitions]

And afterwards, please note a very important step: I’m gathering my stats 🙂

insert into t2 
select 
 case 
 when mod(level,20) <11 then 'ABC'
 when mod(level,20) >10 then 'ACD'
 end as c1
 , sysdate+level/24/60
 , level
 , 'test record '||level
from dual
connect by level <=100000;


commit;

execute dbms_stats.gather_table_stats(user,'T2');
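The automatically generated interval partitions (and their system-assigned SYS_P names) can be checked in the dictionary (sketch):

select partition_name, high_value, num_rows
  from user_tab_partitions
 where table_name = 'T2'
 order by partition_position;
-- only the EMPTY partition keeps the name we defined; the interval partitions get SYS_Pnnn names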

Partition Pruning:

Now, let’s run a couple of tests to see how partitioning is actually helping our performance.

Please note that I used SQL Developer’s Autotrace to show the actual plan and the partition pruning for our selects.

First scenario:

select * from t2;

This is our base test: select all data from our partitioned table:

[Screenshot: full table scan over all partitions]

Second scenario:

select * from t2
where c1='ABC';

Selecting data filtering on a non-partition-key column:

[Screenshot: plan when filtering on a non-partition-key column]

Third scenario:

select * from t2
where c2 =
     (select /*+ no_unnest result_cache */
             to_date('05-oct-2016','dd-mon-yyyy')
        from dual)
;

Filtering on one of the values of the partition key, '05-OCT-2016':

[Screenshot: plan pruning to a single partition when filtering on the partition key]

Fourth scenario:

select * from t2
where c2 <
     (select /*+ no_unnest result_cache */
             to_date('07-oct-2016','dd-mon-yyyy')
        from dual)
;

Filtering on a range of values of our partition key:

[Screenshot: plan pruning to multiple partitions]

Conclusions:

I’ve been using SQL Developer’s Autotrace to demonstrate the partition pruning.

As you can see, the selects will do partition pruning when filtering on one or multiple partitions.

Also, please note the initially defined partition keeps the name we gave it, while the rest of the data, on insert, automatically generated new partitions named by the system.
