Notes on histograms

Posted on February 12, 2018 by Brian Fitzgerald

By Brian Fitzgerald

This blog post is a worked example of Oracle histograms. In this case, a frequency histogram is demonstrated in the context of a single table query.

Setup

cr.histodemo.sql

Table ords. Intended column data distribution:

10000 rows
10000 unique ordid
1000 categories, evenly distributed
2 distinct status, skewed
99.99% COMPLETE
0.01% PENDING

One index on each column, ORDID, CATEGORYID, and STATUS.

Not null constraints are there to simplify the demonstration.

whenever oserror exit 1
@ sqlerc.sql
drop table ords purge;
@ sqlerx.sql

create table ords
(
ordid number not null,
categoryid number not null,
status varchar2(10) not null
);

insert into ords ( ordid, categoryid, status )
select
level,
mod( level, 1000 ),
case mod( level, 10000) when 0 then 'PENDING' else 'COMPLETE' end
from dual
connect by level <= 10000;

alter table ords add constraint ords primary key ( ordid );
create index category_idx on ords ( categoryid );
create index status_idx on ords ( status );

quit

output:

[oracle@stormking db12201 stats]$ sqlplus u/u @ cr.histodemo.sql
SQL*Plus: Release 12.2.0.1.0 Production on Sun Feb 11 11:37:31 2018
Copyright (c) 1982, 2016, Oracle. All rights reserved.
Last Successful login time: Sun Feb 11 2018 00:10:42 -05:00
Connected to:
Oracle Database 12c Enterprise Edition 
Release 12.2.0.1.0 - 64bit Production
Table dropped.
Table created.
10000 rows created.
Table altered.
Index created.
Index created.
Disconnected from Oracle Database 12c Enterprise Edition 
Release 12.2.0.1.0 - 64bit Production

Execution scripts

query.ords.sql

Thwo queries on table ords.

equality predicate on STATUS only
equality predicate on STATUS and CATEGORYID

The 10053 trace shows optimizer’s rationale.

set linesize 100
@ sqlerx.sql
column plan_table_output format a80

@ set.tracefile.identifier.sql status
@ trace.10053.on.sql

select ordid from ords
where status = 'PENDING'
;
select * from table(dbms_xplan.display_cursor( format=>'basic' ));
@ trace.10053.off.sql

@ set.tracefile.identifier.sql cat
@ trace.10053.on.sql

select ordid from ords
where categoryid = 0
and status = 'COMPLETE'
;
select * from table(dbms_xplan.display_cursor( format=>'basic' ));
@ trace.10053.off.sql

rpt.col.usage.sql

Script to show what types of workload has run on the columns. Database monitoring is flushed for the sake of displaying the col_usage report before gathering stats in this demonstration. Gather stats automatically flushes column usagem so alling flush_database_monitoring_info is generally not required.

define ownname=&&1
define tabname=&&2
whenever oserror exit 1
@ sqlerx.sql
@ l32k.p50k.sql

set verify off

set long 2000000000
exec dbms_stats.flush_database_monitoring_info;
column rpt format a160
select dbms_stats.report_col_usage('&&ownname','&&tabname') rpt from dual;

gather.table.stats.sql

Invalidation, in this case, is to force a 10053 trace for the sake of this demonstration. The 10046 trace and tkprof is for checking what queries on ords get run during gather stats. With the exception of no_invalidate, dbms_stats is run with the default options to show how histograms are gathered automatically during routine database maintenance. Estimate_percent defaults to auto_sample_size. Gathering or deleting a histogram can be forced by using option method_opt.

define ownname=&&1
define tabname=&&2
whenever oserror exit 1
@ sqlerx.sql
set verify off

@ set.tracefile.identifier.sql stats
@ trace.10046.on.sql

begin
 dbms_stats.gather_table_stats (
 ownname => '&&ownname',
 tabname => '&&tabname',
 no_invalidate => false
 );
end;
/
prompt done gather stats on &&ownname..&&tabname

@ trace.10046.off.sql
@ tkprof.sql

user.tab.statistics.sql

Script to show the existence or nonexistence of a histogram

define tabname=&&1
whenever oserror exit 1
@ sqlerx.sql
@ l32k.p50k.sql
@ columnformat.sql

select
table_name,
blocks,
num_rows
from user_tab_statistics
where
table_name = '&&tabname';

select
index_name,
leaf_blocks,
num_rows,
distinct_keys
from user_ind_statistics
where table_name = '&&tabname';

select
column_name,
num_distinct,
density,
histogram,
num_buckets
from user_tab_col_statistics
where table_name = '&&tabname';

run.histodemo.sql

The demonstration script

whenever oserror exit 1
@ spoolfile.uniq.sql ords

@ rpt.col.usage.sql U ORDS
@ gather.table.stats.sql U ORDS
@ user.tab.statistics.sql ORDS
@ query.ords.sql

@ spooloff.sql
quit

Execution run 1

Files

ords.db12201.20180211.142321.84.txt

run.histodemo.sql output

db12201_ora_20225_STATS.trc

10043 trace of gather table stats

db12201_ora_20225_STATS.tkp

tkprof of gather table stats

db12201_ora_20225_STATUS.trc

10053 trace of “where status = ‘PENDING'”

db12201_ora_20225_CAT.trc

10053 trace of “where categoryid = 0 and status = ‘COMPLETE'”

Highlights

Because table ords is newly created, the column usage report shows no workload
Gather stats ran four queries on table ords, namely a full table scan and one per index.

$ grep -i '"U"."ORDS" t' db12201_ora_20225_STATS.tkp
 "U"."ORDS" t /* ACL,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL*/
 "U"."ORDS" t where "ORDID" is not null
 "U"."ORDS" t where "CATEGORYID" is not null
 "U"."ORDS" t where "STATUS" is not null

No histogram was gathered
ORDID is most selective column, having the highest num_distinct and lowest density
CATEGORYID is medium-selective
STATUS is not, in general, selective, having only 2 distinct values

COLUMN_NAME NUM_DISTINCT DENSITY HISTOGRAM NUM_BUCKETS
------------------------------ ------------ ---------- -------------------
ORDID 10000 .0001 NONE 1
CATEGORYID 1000 .001 NONE 1
STATUS 2 .5 NONE 1

No histogram is present on status. Density is computed:

density = 1 / num_distinct = 1/2 = .5

“where status = ‘PENDING'” catches a full table scan

----------------------------------
| Id | Operation | Name |
----------------------------------
| 0 | SELECT STATEMENT | |
| 1 | TABLE ACCESS FULL| ORDS |
----------------------------------

“where categoryid = 0 and status = ‘COMPLETE'” catches an index range scan

------------------------------------------------------------
| Id | Operation | Name |
------------------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| ORDS |
| 2 | INDEX RANGE SCAN | CATEGORY_IDX |
-----------------------------------------------------------

Execution run 2

Files

ords.db12201.20180211.142525.84.txt

run.histodemo.sql output for run 2.

db12201_ora_20607_STATS.trc

db12201_ora_20607_STATS.tkp

Gather stats 10046 trace and tkprof.

db12201_ora_20607_STATUS.trc

db12201_ora_20607_CAT.trc

10053 trace again for the queries, “where status” and “where categoryid …
and status”.

Highlights

The column usage report shows that equality predicates have run on columns CATEGORYID and STATUS.

COLUMN USAGE REPORT FOR U.ORDS
..............................

1. CATEGORYID : EQ
2. STATUS : EQ

This time, gather stats ran six queries on table ords.

$ grep -i '"U"."ORDS" t' db12201_ora_20607_STATS.tkp | wc -l
6

gather table stats ran this query on column STATUS:

select 
/*+ no_parallel(t) no_parallel_index(t) dbms_stats 
cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) 
no_monitoring xmlindex_sel_idx_tbl 
opt_param('optimizer_inmemory_aware' 'false') no_substrb_pad 
*/
 substrb(dump("STATUS",16,0,64),1,240) val,
 rowidtochar(rowid) rwid 
from "U"."ORDS" t 
where rowid in (
chartorowid('AAAWt1AAHAAAAFjAAA'),
chartorowid('AAAWt1AAHAAAAGCAAv')) 
order by "STATUS"

Clearly, the only purpose is to obtain the two unique STATUS values.

Gather stats ran this query on column CATEGORYID:

select 
substrb(dump(val,16,0,64),1,240) ep, 
freq, 
cdn, 
ndv,
(sum(pop) over()) popcnt,
(sum(pop*freq) over()) popfreq,
substrb(dump(max(val) over(),16,0,64),1,240) maxval,
substrb(dump(min(val) over(),16,0,64),1,240) minval
from
 (
select val, 
freq, 
(sum(freq) over()) cdn, 
(count(*) over()) ndv,
(case when freq > ((sum(freq) over())/254) then 1 else 0 end) pop
from (select
  /*+ no_parallel(t) no_parallel_index(t) dbms_stats
    cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) 
    no_monitoring xmlindex_sel_idx_tbl 
    opt_param('optimizer_inmemory_aware' 'false') no_substrb_pad
 */
  "CATEGORYID" val, 
  count("CATEGORYID") freq 
from "U"."ORDS" t 
where "CATEGORYID" is not null 
group by "CATEGORYID"
)) order by val

Inference: dbms_stats is collecting information needed to construct a histogram on STATUS and CATEGORYID.

Notice the column name “ep”, which could refer to a histogram endpoint value.

“pop” = 1 if popular, 0 if non-popular.

dbms_stats stored a histogram for column status but not for categoryid

COLUMN_NAME NUM_DISTINCT DENSITY HISTOGRAM NUM_BUCKETS
------------------------------ ------------ ---------- ------- -----------
ORDID 10000 .0001 NONE 1
CATEGORYID 1000 .001 NONE 1
STATUS 2 .00005 FREQUENCY 2

Inference: dbms_stats found skew in status but not in categoryid

The reported density on column status has changed:

82c84
< STATUS 2 .5 NONE 1
---
> STATUS 2 .00005 FREQUENCY 2

According to 2007 note New Density calculation in 11g by Alberto Dell’Era, density = 0.5 / num_rows for frequency histograms. Dell’Era’s paper refers only to not null values. In other words, for frequency histograms, density, as stored in the catalog, should be

density = .5 / num_values = .5 / ( num_rows – num_nulls )

I have found this to be accurate to within 2.5e-15 for 99% out of the 30,000 frequency histograms in one example database, where the histograms were mostly gathered with Oracle 11.1.0.7 binaries. After gathering statistics with Oracle 12.1.0.2 binaries, dba_tab_col_statistics density is found to agree with formula .5 / ( num_rows – num_nulls ) to within 2.5e-15 for 99.95% of all frequency histograms.This formula is found to apply only for the cases where num_distinct >1. For the remaining 0.05%, the density is within 2% of the formula. All the discrepant densities are IOTs; however, not all IOTs are discrepant, so I must be missing some detail.

In this example, num_values = 10000 – 0 = 10000. Density = .5 / 10000 = 0.00005.

The endpoint values are, roughly, using cre_hist_funcs.sql, function hist_numtochar, by Martin Widlake:

ENDPOINT_NUMBER VAL
--------------- ----------
 9999 COMPLEP
 10000 PENDIN<

Comment: density is used to cost predicates on non-popular values.

“where status = ‘COMPLETE'” remains non-selective.

“where status =’PENDING'” is selective.

Compare run without histogram vs run with histogram

where status =’PENDING’

diff run1/db12201_ora_20225_STATUS.trc run2/db12201_ora_20607_STATUS.trc

Output here: 10053 diff: where status = ‘PENDING’

There several points of interest in the 10053 trace comparison

With the histogram, the selectivity of status=’PENDING’ is 1.0000e-4, which is exactly correct.

875,876c877,879
< AvgLen: 9 NDV: 2 Nulls: 0 Density: 0.500000
< Estimated selectivity: 0.500000 , col: #3
---
> AvgLen: 9 NDV: 2 Nulls: 0 Density: 0.000050
> Histogram: Freq #Bkts: 2 UncompBkts: 10000 EndPtVals: 2 ActualVal: no
> Estimated selectivity: 1.0000e-04 , endpoint value predicate, col: #3

The plan changes thus:

1016,1021c1028,1034
< -------------------------------------+-----------------------------------+
< | Id | Operation | Name | Rows | Bytes | Cost | Time |
< -------------------------------------+-----------------------------------+
< | 0 | SELECT STATEMENT | | | | 11 | |
< | 1 | TABLE ACCESS FULL | ORDS | 5000 | 63K | 11 | 00:00:01 |
< -------------------------------------+-----------------------------------+
---
> ---------------------------------------------------------+-----------------------------------+
> | Id | Operation | Name | Rows | Bytes | Cost | Time |
> ---------------------------------------------------------+-----------------------------------+
> | 0 | SELECT STATEMENT | | | | 2 | |
> | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | ORDS | 1 | 13 | 2 | 00:00:01 |
> | 2 | INDEX RANGE SCAN | STATUS_IDX| 1 | | 1 | 00:00:01 |
> ---------------------------------------------------------+-----------------------------------+

where categoryid = 0 and status = ‘COMPLETE’

diff run1/db12201_ora_20225_CAT.trc run2/db12201_ora_20607_CAT.trc

output here: 10053 diff: where categoryid and status

With a histogram, the estimated cost of using an index to access a popular value is higher. All the more reason not to use STATUS_IDX

 917 Access Path: index (AllEqRange)
 918 Index: STATUS_IDX

919,921c922,924
< resc_io: 30.000000 resc_cpu: 2414493
< ix_sel: 0.500000 ix_sel_with_filters: 0.500000
< Cost: 30.064126 Resp: 30.064126 Degree: 1
---
> resc_io: 60.000000 resc_cpu: 4827696
> ix_sel: 0.999900 ix_sel_with_filters: 0.999900
> Cost: 60.128217 Resp: 60.128217 Degree: 1

The cost of CATEGORY_IDX is unchanged.

 904 ****** Costing Index CATEGORY_IDX
 905 SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_SCAN
 906 SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_FILTER
 907 Estimated selectivity: 0.001000 , col: #2
 908 Access Path: index (AllEqRange)
 909 Index: CATEGORY_IDX
 910 resc_io: 11.000000 resc_cpu: 83586
 911 ix_sel: 0.001000 ix_sel_with_filters: 0.001000
 912 Cost: 11.002220 Resp: 11.002220 Degree: 1

CATEGORY_IDX is more selective and less costly than STATUS_IDX. The optimizer chooses CATEGORY_IDX.

Need for histogram in an application

The need for a histogram depends on the application. In this example, no histogram is needed on columns ORDID and CATEGORYID because the values are evenly distributed. There is no skew.

If the application code base consists of only this query, then no histogram is needed on STATUS:

select ordid from ords
where categoryid = 0
and status = 'COMPLETE'

If the application, additionally, has this query, then a histogram is needed on STATUS.

select ordid from ords
where status = 'PENDING'

What about “where status = ‘COMPLETE'”?

The question sometimes arises, what about a query with just “where status = ‘COMPLETE'”. Such a query would access a high number of blocks and return a high number of rows, and would be quickly identified as a poor performer. It is not clear what business value would be served by such a query. Such queries appear less commonly in practical applications.

What about two plans for the same statement?

Histograms are sometimes touted for their usefulness in finding more than one plan for the same sql. This blog post does not identify any such examples.

The hypothetical situation we are describing is:

The SQL is the same.
The plans are different.
In one case, the predicate is on a popular value and in the other case, a rare value.
In the popular case, the query may return a high number of rows, or a plan step may return a high number of rows to its antecedent.
In the non-popular case, the query accesses few blocks and returns quickly.
The popular and non-popular cases have business value.
The slower response time of the popular case is acceptable.

Findings

Column usage in predicates, as recorded in col_usage$, leads to analyzing for, but not necessarily storage of, histograms.
10053 trace files show the optimizer’s rationale for choosing its final execution plan.
In some cases, a histogram helps the optimizer choose a less costly plan.
In some cases, a histogram improves the accuracy of a cost computation, but does not affect the final choice of plan.

10053 diff: group by status

Posted on February 12, 2018 by Brian Fitzgerald

[oracle@stormking db12201 12.2]$ diff run1/db12201_ora_20225_GBY.trc run2/db12201_ora_20607_GBY.trc
1c1
< Trace file /u01/app/oracle/diag/rdbms/db12201/db12201/trace/db12201_ora_20225_GBY.trc
---
> Trace file /u01/app/oracle/diag/rdbms/db12201/db12201/trace/db12201_ora_20607_GBY.trc
12,13c12,13
< Oracle process number: 41
< Unix process pid: 20225, image: oracle@stormking (TNS V1-V3)
---
> Oracle process number: 40
> Unix process pid: 20607, image: oracle@stormking (TNS V1-V3)
16,22c16,22
< *** 2018-02-11T14:23:23.367313-05:00
< *** SESSION ID:(84.19304) 2018-02-11T14:23:23.367313-05:00
< *** CLIENT ID:() 2018-02-11T14:23:23.367313-05:00
< *** SERVICE NAME:(SYS$USERS) 2018-02-11T14:23:23.367313-05:00
< *** MODULE NAME:(SQL*Plus) 2018-02-11T14:23:23.367313-05:00
< *** ACTION NAME:() 2018-02-11T14:23:23.367313-05:00
< *** CLIENT DRIVER:(SQL*PLUS) 2018-02-11T14:23:23.367313-05:00
---
> *** 2018-02-11T14:25:27.002021-05:00
> *** SESSION ID:(84.15782) 2018-02-11T14:25:27.002021-05:00
> *** CLIENT ID:() 2018-02-11T14:25:27.002021-05:00
> *** SERVICE NAME:(SYS$USERS) 2018-02-11T14:25:27.002021-05:00
> *** MODULE NAME:(SQL*Plus) 2018-02-11T14:25:27.002021-05:00
> *** ACTION NAME:() 2018-02-11T14:25:27.002021-05:00
> *** CLIENT DRIVER:(SQL*PLUS) 2018-02-11T14:25:27.002021-05:00
25c25
< *** TRACE CONTINUED FROM FILE /u01/app/oracle/diag/rdbms/db12201/db12201/trace/db12201_ora_20225_CAT.trc ***
---
> *** TRACE CONTINUED FROM FILE /u01/app/oracle/diag/rdbms/db12201/db12201/trace/db12201_ora_20607_CAT.trc ***
27c27
< Registered qb: SEL$1 0xf972ef68 (PARSER)
---
> Registered qb: SEL$1 0x6bfd6f68 (PARSER)
862c862
< Registered qb: SEL$1 0xf94dbf10 (COPY SEL$1)
---
> Registered qb: SEL$1 0x6c4d9b28 (COPY SEL$1)
876c876
< call(in-use=1736, alloc=16344), compile(in-use=130400, alloc=139528), execution(in-use=2936, alloc=4032)
---
> call(in-use=1736, alloc=16344), compile(in-use=130400, alloc=136232), execution(in-use=2936, alloc=4032)
879c879
< call(in-use=1736, alloc=16344), compile(in-use=126272, alloc=139528), execution(in-use=2936, alloc=4032)
---
> call(in-use=1736, alloc=16344), compile(in-use=126272, alloc=136232), execution(in-use=2936, alloc=4032)
883c883
< call(in-use=1736, alloc=16344), compile(in-use=126272, alloc=139528), execution(in-use=2936, alloc=4032)
---
> call(in-use=1736, alloc=16344), compile(in-use=126272, alloc=136232), execution(in-use=2936, alloc=4032)
888c888
< call(in-use=1736, alloc=16344), compile(in-use=126680, alloc=139528), execution(in-use=2936, alloc=4032)
---
> call(in-use=1736, alloc=16344), compile(in-use=126704, alloc=136232), execution(in-use=2936, alloc=4032)
891c891
< call(in-use=1736, alloc=16344), compile(in-use=126800, alloc=139528), execution(in-use=2936, alloc=4032)
---
> call(in-use=1736, alloc=16344), compile(in-use=126824, alloc=136232), execution(in-use=2936, alloc=4032)
897c897
< call(in-use=1736, alloc=16344), compile(in-use=126800, alloc=139528), execution(in-use=2936, alloc=4032)
---
> call(in-use=1736, alloc=16344), compile(in-use=126824, alloc=136232), execution(in-use=2936, alloc=4032)
901c901
< call(in-use=1736, alloc=16344), compile(in-use=127208, alloc=139528), execution(in-use=2936, alloc=4032)
---
> call(in-use=1736, alloc=16344), compile(in-use=127272, alloc=136232), execution(in-use=2936, alloc=4032)
904c904
< call(in-use=1736, alloc=16344), compile(in-use=127328, alloc=139528), execution(in-use=2936, alloc=4032)
---
> call(in-use=1736, alloc=16344), compile(in-use=127392, alloc=136232), execution(in-use=2936, alloc=4032)
911c911
< call(in-use=1736, alloc=16344), compile(in-use=127328, alloc=139528), execution(in-use=2936, alloc=4032)
---
> call(in-use=1736, alloc=16344), compile(in-use=127392, alloc=136232), execution(in-use=2936, alloc=4032)
916c916
< call(in-use=1736, alloc=16344), compile(in-use=127736, alloc=139528), execution(in-use=2936, alloc=4032)
---
> call(in-use=1736, alloc=16344), compile(in-use=127800, alloc=136232), execution(in-use=2936, alloc=4032)
919c919
< call(in-use=1736, alloc=16344), compile(in-use=127856, alloc=139528), execution(in-use=2936, alloc=4032)
---
> call(in-use=1736, alloc=16344), compile(in-use=127920, alloc=136232), execution(in-use=2936, alloc=4032)
927c927
< call(in-use=1736, alloc=16344), compile(in-use=127856, alloc=139528), execution(in-use=2936, alloc=4032)
---
> call(in-use=1736, alloc=16344), compile(in-use=127920, alloc=136232), execution(in-use=2936, alloc=4032)
930c930
< call(in-use=1736, alloc=16344), compile(in-use=128264, alloc=139528), execution(in-use=2936, alloc=4032)
---
> call(in-use=1736, alloc=16344), compile(in-use=128328, alloc=136232), execution(in-use=2936, alloc=4032)
933c933
< call(in-use=1736, alloc=16344), compile(in-use=128424, alloc=139528), execution(in-use=2936, alloc=4032)
---
> call(in-use=1736, alloc=16344), compile(in-use=128448, alloc=136232), execution(in-use=2936, alloc=4032)
949c949
< call(in-use=1736, alloc=16344), compile(in-use=129680, alloc=139528), execution(in-use=2936, alloc=4032)
---
> call(in-use=1736, alloc=16344), compile(in-use=129728, alloc=136232), execution(in-use=2936, alloc=4032)
951c951
< kkoqbc-subheap (create addr=0x7fddf9729810)
---
> kkoqbc-subheap (create addr=0x7f286bfd1810)
991a992,993
> Column (#3):
> NewDensity:0.000050, OldDensity:0.000050 BktCnt:10000.000000, PopBktCnt:9999.000000, PopValCnt:1, NDV:2
993c995,996
< AvgLen: 9 NDV: 2 Nulls: 0 Density: 0.500000
---
> AvgLen: 9 NDV: 2 Nulls: 0 Density: 0.000050
> Histogram: Freq #Bkts: 2 UncompBkts: 10000 EndPtVals: 2 ActualVal: no
1145c1148
< kkoqbc-subheap (delete addr=0x7fddf9729810, in-use=27312, alloc=32840)
---
> kkoqbc-subheap (delete addr=0x7f286bfd1810, in-use=27312, alloc=32840)
1148c1151
< call(in-use=9832, alloc=65656), compile(in-use=133208, alloc=139528), execution(in-use=2936, alloc=4032)
---
> call(in-use=9624, alloc=65656), compile(in-use=135032, alloc=136232), execution(in-use=2936, alloc=4032)
1154c1157
< call(in-use=9832, alloc=65656), compile(in-use=136304, alloc=139528), execution(in-use=2936, alloc=4032)
---
> call(in-use=9624, alloc=65656), compile(in-use=138208, alloc=140376), execution(in-use=2936, alloc=4032)
3080c3083
< SEL$1 0xf972ef68 (PARSER) [FINAL]
---
> SEL$1 0x6bfd6f68 (PARSER) [FINAL]
3083c3086
< call(in-use=12280, alloc=65656), compile(in-use=159680, alloc=218968), execution(in-use=7392, alloc=8088)
---
> call(in-use=12072, alloc=65656), compile(in-use=161568, alloc=223424), execution(in-use=7392, alloc=8088)

Permanently deleting histograms

Posted on February 5, 2018 by Brian Fitzgerald

By Brian Fitzgerald

“A histogram is a special type of column statistic that provides more detailed information about the data distribution in a table column.” Without histograms, the optimizer could underestimate the cost of a table access for a nonselective predicate, and as a result, choose an inefficient index access path.

There are times when the DBA will want to delete histograms, including:

Issues with bind peeking.
Issues with adaptive cursor sharing.
Limit SYSAUX growth.
High number of child cursors resulting from adaptive cursor sharing while using a database resident connection pool (DRCP).
Plan instability caused by histograms.
Troubleshooting any of these.
Temporary workaround for any of these.
Permanent workaround for any of these.

You can delete a histogram, but that has obvious problems. In the first place, if you delete a histogram, you haven’t prevented it from coming back. You can lock statistics, but locking statistics has its drawbacks. You can only lock statistics at the table level, not the column level. If you table lock stats, other critical statistics, such as num_rows, num_nulls, and num_distinct will not get gathered. The optimizer will not keep up with changes in table population. Finally, to prevent reappearance of histograms, some DBAs decide to disable automatic statistics gathering. I usually do not agree with disabling automatic statistics gathering.

A more flexible way to delete a histogram is to set table preference method_opt to for columns size 1 for the desired column or columns. This blog posting demonstrates how this technique can be scripted.

Scripts for optimizer statistics

gather.table.stats.sql

Gather stats with the default options (no method_opt):

define ownname=&&1
define tabname=&&2
whenever oserror exit 1
@ sqlerx.sql
set verify off
set serveroutput on

declare
 l_fmt clob := q'{Gathered stats on %s.%s with default options.}';
begin
 dbms_stats.gather_table_stats (
 ownname => '&&ownname',
 tabname => '&&tabname'
 );
 out_pkg.printf(l_fmt, '&&ownname', '&&tabname' );
end;
/

Note, at this juncture, that every sql file that reads other sql files (with”@”) needs “whenever oserror exit 1”

dba.tab.col.statistics.sql

Display column statistics and show whether there is a histogram.

define owner=&&1
define tablename=&&2
whenever oserror exit 1
@ l32k.p50k.sql
@ columnformat.sql
set verify off

select column_name, num_distinct, histogram, num_buckets
from dba_tab_col_statistics
where owner = '&&owner'
and table_name = '&&tablename'
order by column_name;

rpt.col.usage.sql

Oracle decides whether to gather a histogram base on workload on skewed columns, as recorded in col_usage$. You can run report_col_usage to check on what workload Oracle has recorded. report_col_usage will not work unless you set long.

define ownname=&&1
define tabname=&&2
whenever oserror exit 1
@ sqlerx.sql
@ l32k.p50k.sql

set verify off

set long 2000000000
exec dbms_stats.flush_database_monitoring_info;
column rpt format a160
select dbms_stats.report_col_usage('&&ownname','&&tabname') rpt from dual;

prefer.delete.histo.sql

The delete histogram script that is the subject of this blog post. The script does not actually delete the histogram. It simply arranges that it will be delete on the next gather stats. You can either delete the histogram right away by calling dbms_stats.delete_column_stats, or you can gather stats with the default options (no method_opt).

define ownname=&&1
define tabname=&&2
define colname=&&3
whenever oserror exit 1
@ sqlerx.sql
set verify off
set serveroutput on

declare
 l_fmt clob := q'{Preference for no histogram on %s.%s is set.}';
begin
 dbms_stats.set_table_prefs (
 ownname => '&&ownname',
 tabname => '&&tabname',
 pname => 'method_opt',
 pvalue => 'for columns size 1 &&colname'
 );
 out_pkg.printf(l_fmt, '&&ownname', '&&tabname' );
end;
/

dba.tab.stat.prefs.sql

Use this script for reporting table stat prefs

define owner=&&1
whenever oserror exit 1
@ sqlerx.sql
@ l32k.p50k.sql
@ columnformat.sql
set verify off

column preference_name format a30
column preference_value format a50

select p.owner,
 p.table_name,
 p.preference_name,
 p.preference_value
from dba_tab_stat_prefs p
where owner = '&&owner'
order by
p.owner,
p.table_name,
p.preference_name;

Script in SQLPATH

Scripts mentioned in this blog post that I keep in a common directory for reuse.

sqlerx.sql

whenever sqlerror exit 1

l32k.p50k.sql

set linesize 32767
set pagesize 50000
set trimspool on

columnformat.sql

A very long, and ever growing, file of catalog column names.

column histogram format a20
column account_status format a20
column column_name format a30
column constraint_name format a30
column consumer_group format a30
etc.

The demo script

demo.delete.histo.sql

whenever oserror exit 1
@ l32k.p50k.sql
@ columnformat.sql
set verify off

drop table t purge;
drop sequence s;
@ sqlerx.sql
create table t
(
k number,
g varchar2(1)
);
create sequence s;

prompt Generate skewed data in column G
insert into T (k, g ) values ( s.nextval, 'M' );
insert into T (k, g ) values ( s.nextval, 'F' );
insert into T (k, g ) values ( s.nextval, 'M' );

commit;

@ gather.table.stats.sql U T
@ dba.tab.col.statistics.sql

prompt generating workload on column G
select count(*) from t where g = 'Z';

prompt checking col_usage$
@ rpt.col.usage.sql U T

@ gather.table.stats.sql U T
@ dba.tab.col.statistics.sql
prompt Yes, there is a histogram.

@ prefer.delete.histo.sql U T G

@ gather.table.stats.sql U T

prompt check existence of histogram on column G
@ dba.tab.col.statistics.sql
prompt The histogram is gone!

Demo script output

[oracle@stormking db12201 stats]$ sqlplus u/u @ demo.delete.histo.sql
SQL*Plus: Release 12.2.0.1.0 Production on Sun Feb 4 20:20:33 2018
Copyright (c) 1982, 2016, Oracle. All rights reserved.
Last Successful login time: Sun Feb 04 2018 20:16:14 -05:00
Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production
Table dropped.
Sequence dropped.
Table created.
Sequence created.
Generate skewed data in column G
1 row created.
1 row created.
1 row created.
Commit complete.
Gathered stats on U.T with default options.
PL/SQL procedure successfully completed.
COLUMN_NAME NUM_DISTINCT HISTOGRAM NUM_BUCKETS
------------------------------ ------------ -------------------- -----------
G 2 NONE 1
K 3 NONE 1
generating workload on column G
COUNT(*)
----------
 0
checking col_usage$
PL/SQL procedure successfully completed.
RPT
---------------------------------------------------------------------------
LEGEND:
.......
EQ : Used in single table EQuality predicate
RANGE : Used in single table RANGE predicate
LIKE : Used in single table LIKE predicate
NULL : Used in single table is (not) NULL predicate
EQ_JOIN : Used in EQuality JOIN predicate
NONEQ_JOIN : Used in NON EQuality JOIN predicate
FILTER : Used in single table FILTER predicate
JOIN : Used in JOIN predicate
GROUP_BY : Used in GROUP BY expression
...........................................................................
###########################################################################
COLUMN USAGE REPORT FOR U.T
...........................
1. G : EQ
###########################################################################
Gathered stats on U.T with default options.
PL/SQL procedure successfully completed.
COLUMN_NAME NUM_DISTINCT HISTOGRAM NUM_BUCKETS
------------------------------ ------------ -------------------- -----------
G 2 FREQUENCY 2
K 3 NONE 1
Yes, there is a histogram.
Preference for no histogram on U.T is set.
PL/SQL procedure successfully completed.
Gathered stats on U.T with default options.
PL/SQL procedure successfully completed.
check existence of histogram on column G
COLUMN_NAME NUM_DISTINCT HISTOGRAM NUM_BUCKETS
------------------------------ ------------ -------------------- ----------
G 2 NONE 1
K 3 NONE 1
The histogram is gone!
Disconnected from Oracle Database 12c Enterprise Edition 
Release 12.2.0.1.0 - 64bit Production

I have tested this script on 11.2, 12.1, and 12.2.

To summarize, I have deleted the histogram on table U.T, column G. I have done this not directly, by issuing delete_column_stats, but indirectly, by issuing set_table_prefs.

Notes

Gather stats option “force” overrides locked statistics but does not override table preference.

SQL> exec dbms_stats.gather_table_stats(ownname=>'U',tabname=>'T',
force=>true)
PL/SQL procedure successfully completed.
SQL> @ dba.tab.col.statistics.sql U T
COLUMN_NAME NUM_DISTINCT HISTOGRAM NUM_BUCKETS
------------------------------ ------------ -------------------- -----------
G 2 NONE 1
K 3 NONE 1

The histogram remains deleted even though option “force” was used.

2. Gathering stats with non-default options can re-create the histogram.

SQL> exec dbms_stats.gather_table_stats(ownname=>'U',tabname=>'T',
method_opt=>'for all columns size 254')
PL/SQL procedure successfully completed.
SQL> @ dba.tab.col.statistics.sql U T
COLUMN_NAME NUM_DISTINCT HISTOGRAM NUM_BUCKETS
------------------------------ ------------ -------------------- -----------
G 2 FREQUENCY 2
K 3 FREQUENCY 3

I want to point out a bad practice here. Routinely gathering histograms on all columns is inadvisable. It is better to start off by gathering stats by using the default options, and then let Oracle decide, based on col_usage$, whether to gather a histogram.

3. The earliest reference I found on this technique of using set_table_prefs to make sure that histograms stay deleted is a 2011 article in Oracle’s Optimizer Blog.

4. Whether table T needs a histogram on column G is a matter unto itself. The answer might be different depending on whether G, M, and F represent “gender”, “male”, and “female” or “granularity”, “medium” and “fine”, whether the distribution of M and F is skewed, whether G appears as a predicate in the application query, whether there is an index on G, and whether a histogram is causing query plan stability issues, or other issues.

5. Others have demonstrated data skew using a high number of rows. I demonstrate skew with only 3 rows.

6. Likewise, others have demonstrated workload by running a high number of statements. I run only one query. Note that a query for a nonexistent value, g = ‘Z’, serves the purpose.

Multiple columns per table

You cannot use script prefer.delete.histo.sql to delete the histogram on more than one column per table. If you try, you get this result.

SQL> @ prefer.delete.histo.sql U T K
Preference for no histogram on U.T is set.
PL/SQL procedure successfully completed.
SQL> @ dba.tab.stat.prefs.sql U
OWNER TABLE_NAME PREFERENCE_NAME PREFERENCE_VALUE
------------------------------ ------------------------------ ------------
U T METHOD_OPT FOR COLUMNS SIZE 1 K

You must state the prefs in a single command, such as:

prefer.delete.histo.sq2

begin
 dbms_stats.set_table_prefs (
 ownname => 'U',
 tabname => 'T',
 pname => 'method_opt',
 pvalue => 'for columns size 1 G size 1 K'
 );
end;
/

Result:

SQL> @ prefer.delete.histo.sq2
PL/SQL procedure successfully completed.
SQL> @ dba.tab.stat.prefs.sql U
OWNER TABLE_NAME PREFERENCE_NAME PREFERENCE_VALUE
------------------------------ ------------------------------ -------------
U T METHOD_OPT FOR COLUMNS SIZE 1 G SIZE 1 K

Use of table prefs in a DBA organization

In a DBA group, members have different opinions based on different experiences. Some DBAs want to lock table stats. Others want to disable the stats gathering job. Sometimes non-DBAs, like developers and data center managers, get into the act with their own opinions.

Using set_table_prefs is offered here as an alternative. dba.tab.stat.prefs.sql can be used to report on tables that have the preference for no histogram set.

The best thing to do is come up with a consensus that is based on principles, current recommendations from Oracle, and rigorous testing.