ASMLib on Red Hat 9

by Brian Fitzgerald

Introduction

Oracle has desupported the ASM filter driver (AFD); see Doc ID 2806979.1. DBAs who support RAC need to look elsewhere for ASM management software. The updated ASMLib release fills the bill and is available on Red Hat Linux, but finding the downloads is not straightforward.

Download

On RHEL 9, you will need these downloads:

oracleasm-support

https://yum.oracle.com/repo/OracleLinux/OL9/addons/x86_64/

oracleasm-support-3.1.0-10.el9.x86_64.rpm

oracleasmlib

https://www.oracle.com/linux/downloads/linux-asmlib-v9-downloads.html

oracleasmlib-3.1.0-6.el9.x86_64.rpm

Uninstall AFD

# /u01/app/oracle_grid/product/1930/grid/bin/afdroot uninstall

Install and start

# uname -r
5.14.0-427.42.1.el9_4.x86_64
# rpm -ihv oracleasm-support-3.1.0-10.el9.x86_64.rpm
# rpm -ihv oracleasmlib-3.1.0-6.el9.x86_64.rpm
# systemctl start oracleasm.service
# oracleasm scandisks
# oracleasm listdisks
ORA_ASM_GRID1_01
ORA_ASM_GRID1_02
ORA_ASM_GRID1_03

Conclusion

ASMLib is a convenient drop-in replacement for AFD. It protects your ASM disks from illegal writers, just as AFD did. The downloads can be found off the beaten track.

ASMCMD-9520: AFD is not Loaded after Red Hat update

by Brian Fitzgerald

Introduction

Your Red Hat kernel got upgraded and now AFD does not load. Solution: run “grubby --set-default”. This article applies to Oracle Database 19c on Red Hat Enterprise Linux 9 on-premises.

Background

ASM filter driver (AFD) requires loading the kernel module oracleafd.ko. The kernel modules are distributed in release updates and are installed in $ORACLE_HOME/usm/install/Oracle. AFD kernel modules are tied to a specific sub-version of Red Hat EL 9, and Oracle’s distribution of new AFD kernel modules may lag the Red Hat release by 6 months. For example, rhel9_4 was released on April 30, 2024, but Oracle did not release the compatible AFD kernel module until RU 19.25 on October 15, 2024. If you issued “dnf update” or “yum update” after April 30 but before installing RU 19.25, then AFD has stopped working, and you must downgrade your kernel. Refer to ACFS and AFD Support On OS Platforms (Certification Matrix) (Doc ID 1369107.1) for up-to-date AFD kernel driver release information.
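The sub-version the running kernel belongs to can be read straight out of the kernel release string. A minimal sketch; the sample value is illustrative, and on a live box you would set kver=$(uname -r):

```shell
# The elN_M token in the kernel release names the RHEL sub-version.
kver="5.14.0-427.42.1.el9_4.x86_64"     # sample; on a real host: kver=$(uname -r)
sub=$(expr "$kver" : '.*\(el9_[0-9]*\)')
echo "kernel sub-version: $sub"
```

Compare the result against the certification matrix before patching.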

rhel9_5 was released on November 13, 2024. If you issued “dnf update,” then AFD has stopped working. As of this writing, Oracle has not released an rhel9_5 AFD kernel module, so you must downgrade your kernel.

Don’t do it!

Don’t run “dnf update”!

Oh no, you did it!

“dnf update” got run and now AFD does not load. Your Oracle database is down!

Fix it!

Fix this issue by identifying your previous kernel file and running “grubby --set-default”. Then reboot.
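A sketch of the fix. The kernel paths here are examples; on a real host you would read them from `grubby --info=ALL` as root instead of the sample text:

```shell
# Identify the previous (rhel9_4) kernel and make it the default boot entry.
# Sample grubby output is parsed here; on a real host, replace the variable
# with the output of:  grubby --info=ALL
sample='index=0
kernel="/boot/vmlinuz-5.14.0-503.15.1.el9_5.x86_64"
index=1
kernel="/boot/vmlinuz-5.14.0-427.42.1.el9_4.x86_64"'

prev_kernel=$(printf '%s\n' "$sample" | sed -n 's/^kernel="\(.*el9_4[^"]*\)"/\1/p')
echo "previous kernel: $prev_kernel"
# Then, as root:
#   grubby --set-default "$prev_kernel"
#   reboot
```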

Yay, it’s fixed!

Notice that filtering is not supported on Red Hat 9. Be careful not to overwrite your ASM device!

No filtering in RHEL9

ASM filter driver is designed to block I/O from all programs except Oracle binaries. Filtering works in rhel7: there, you cannot overwrite an Oracle device with dd.

Refer to Oracle Automatic Storage Management Filter Driver (ASMFD) (Doc ID 2806979.1) for news about AFD filtering, and exercise care when handling Oracle devices. In my test on RHEL 9, dd overwrote /dev/nvme3n1 and the data was wiped out. The ironically named “ASM Filter Driver” did not filter the non-Oracle I/O.
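To see the danger without touching a real device, here is a sketch against a scratch file. On RHEL 9, the same dd pointed at /dev/nvme3n1 would succeed and destroy the disk header:

```shell
# Simulate the destructive write on a throwaway file, not a real ASM disk.
disk=/tmp/fake_asm_disk
printf 'ORCLDISK' > "$disk"            # stand-in for an ASM disk header
dd if=/dev/zero of="$disk" bs=8 count=1 conv=notrunc 2>/dev/null
hdr=$(od -An -tx1 -N4 "$disk" | tr -d ' ')
echo "header bytes now: $hdr"          # the signature is gone
rm -f "$disk"
```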

Common SA commands such as parted could also corrupt your disk.

Be careful!

Red Hat release

Notice that the kernel is at rhel9_4, but the operating system is at rhel9_5.

Conclusion

We covered these points:

  • Oracle AFD depends on a kernel module.
  • In Red Hat 9, the AFD kernel module is tied to a specific sub-version.
  • Oracle will release the needed AFD module after each Red Hat 9 sub-version release.
  • Depending on what Oracle RU you have installed, dnf update may install a kernel that is incompatible with your AFD module.
  • You can fix your problem by running “grubby --set-default”.
  • You can upgrade to a specific kernel version.
  • ASM filter driver no longer filters.
  • Administrative commands could wipe out your disks.
  • Exercise greater care without filtering present.

Bug: trouble with pipe character in Oracle TDE keystore password

by Brian Fitzgerald

Introduction

Using the pipe character (“|”) in an Oracle TDE keystore password leads to an unrecoverable state. This, apparently, is an Oracle bug.

Demonstration

Version 19.22.0.0.0 on Linux. First, set a keystore password containing the pipe character:

administer key management 
alter keystore password 
force keystore 
identified by "asdf1234WXYZ$" 
set "qwer5678|" with backup;

keystore altered.

Next, try to use the password:

administer key management 
set keystore open force keystore 
identified by "qwer5678|";
*
ERROR at line 1:
ORA-28353: failed to open wallet

From here, you can’t change the password back to the original.

administer key management 
alter keystore password 
force keystore 
identified by "qwer5678|" 
set "asdf1234WXYZ$" with backup;
*
ERROR at line 1:
ORA-28353: failed to open wallet

No other printable ASCII character gives trouble, though I did not test double quote, single quote, or ampersand (", ', &).

Solution attempt 1: re-point alias

The wallet is now inaccessible. The wallet with the known password should still be in ASM. In this example, the wallet was created on Nov 29.

$ asmcmd ls -l +DATA1/PTDE/tde
Type Redund Striped Time Sys Name
AUTOLOGIN_KEY_STORE UNPROT COARSE JAN 27 17:00:00 N cwallet.sso => +DATA1/PTDE/AUTOLOGIN_KEY_STORE/cwallet.287.1154126305
KEY_STORE UNPROT COARSE JAN 27 09:00:00 N ewallet.p12 => +DATA1/PTDE/KEY_STORE/ewallet.259.1154126305
KEY_STORE UNPROT COARSE NOV 28 22:00:00 N ewallet_2023112903382447.p12 => +DATA1/PTDE/KEY_STORE/ewallet.296.1154126307
KEY_STORE UNPROT COARSE JAN 27 08:00:00 N ewallet_2024012713401060.p12 => +DATA1/PTDE/KEY_STORE/ewallet.295.1159346411
KEY_STORE UNPROT COARSE JAN 27 08:00:00 N ewallet_2024012713542108.p12 => +DATA1/PTDE/KEY_STORE/ewallet.285.1159347261
etc.

Notice the date “20231129” in the good wallet’s alias name. Make careful note of where that alias points: +DATA1/PTDE/KEY_STORE/ewallet.296.1154126307

You can re-point ewallet.p12 to the good wallet:

$ asmcmd rmalias +DATA1/PTDE/tde/ewallet.p12
$ asmcmd rmalias +DATA1/PTDE/tde/ewallet_2023112903382447.p12
$ asmcmd mkalias +DATA1/PTDE/KEY_STORE/ewallet.296.1154126307 +DATA1/PTDE/tde/ewallet.p12

Check:

administer key management
2 set
3 keystore open
4 force keystore
5 identified by "old-password";

keystore altered.

Everything seems to work now: opening the database, backing up the keystore, and Data Guard managed recovery all succeed.

Solution attempt 2: restore wallet

Suppose you have a good wallet backup in a folder created Nov 29th:

/u03/tde/lib/dba/backup/hostname/ptde/20231129.224025

SQL> shutdown abort
ORACLE instance shut down.
$ asmcmd rm -rf +DATA1/PTDE/KEY_STORE/
$ asmcmd rm -rf +DATA1/PTDE/AUTOLOGIN_KEY_STORE/
SQL> startup nomount
ORACLE instance started.

administer key management
create
keystore
identified by "old-password";

keystore altered.

administer key management
set keystore open
identified by "old-password";

keystore altered.

administer key management
merge keystore '/u03/tde/lib/dba/backup/hostname/ptde/20231129.224025'
identified by "old-password"
into existing
keystore '+DATA1/PTDE/tde'
identified by "old-password"
with backup;

keystore altered.

administer key management
create local auto_login keystore from
keystore identified by "old-password";

keystore altered.

SQL> alter database mount;

Database altered.

SQL> alter database open;

Database altered.

Unsuccessful

There are issues:

2024-01-28T11:31:30.934368-05:00
WARNING: the following master key for tablespace 4 (file # 7) does not exist in the current keystore.
Please check if the master key is successfully imported from the source keystore.
2024-01-28T11:31:30.934406-05:00
kcbtse_populate_tbske_pga: ena 4 flag 2f mkloc 1
encrypted key 7f54d9fb1f800a7a4b0b8b48e450a72149534caef97a161f3267094abf140ba2
mkid 2e30a453ade64f7cbfb326e680302282
SQL> create table ttde (n number) segment creation immediate tablespace USERS;
create table ttde (n number) segment creation immediate tablespace USERS
*
ERROR at line 1:
ORA-28374: typed master key not found in wallet
RMAN> backup tablespace users;

RMAN-03009: failure of backup command on ORA_DISK_1 channel at 01/28/2024 11:31:31
ORA-19914: unable to encrypt backup
ORA-28361: master key not yet set

The problem is that once the password was set with the pipe character, the master key recorded in the database is out of sync with the wallet.

You could restore an earlier database and wallet backup.

Conclusion

Do not use “|” in an Oracle TDE keystore password.

Have a TDE wallet recovery strategy.
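A minimal sketch of a dated wallet backup. The directory naming follows the article's /u03/.../20231129.224025 example, with a /tmp path substituted for illustration; the asmcmd line requires a real ASM instance, so it is shown as a comment:

```shell
# Create a timestamped backup directory and copy the keystore into it.
backup_dir="/tmp/tde_backup/$(date +%Y%m%d.%H%M%S)"
mkdir -p "$backup_dir"
# On the database host, as the grid owner:
#   asmcmd cp +DATA1/PTDE/tde/ewallet.p12 "$backup_dir/"
echo "backup directory ready: $backup_dir"
```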

How wallet locality is determined

By Brian Fitzgerald

Introduction

With orapki, you can create a local, auto-login wallet. The documentation states:

You cannot move local auto-login wallets to another computer. They must be used on the host on which they are created.

Questions arise:

  • How secure is the “local” feature?
  • How is “local” determined?
  • How could I open a wallet if it gets restored to a different host under unplanned circumstances?

This article sets out to answer these questions.

How to uniquely identify a host

There are multiple host attributes that one might use to uniquely identify a host, including:

  • hostname
  • IP address
  • hostid
  • MAC address

I tried two of these.

Setup

As root, create an auto-login local wallet

Check that the wallet does not require a password.

Test

As root, change the hostname. Retest the wallet: orapki now prompts for a password, so the wallet is no longer auto-login. That demonstrates that Oracle checks the system hostname to determine whether the wallet is on the original host. We are done!

Before that, I tried changing hostid but found no effect on wallet locality.
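Conceptually, the locality check behaves like this sketch. Here “dbhost01” is a hypothetical value standing in for the name recorded in the wallet, and the orapki commands in the comments require a real Oracle client:

```shell
# On the original host, the wallet was created and verified with:
#   orapki wallet create -wallet /root/w -auto_login_local -pwd "<password>"
#   orapki wallet display -wallet /root/w   # opens without a password prompt
# Oracle's check amounts to comparing the recorded name with the current one:
created_on="dbhost01"          # hypothetical name baked into the wallet
current=$(uname -n)            # same value the hostname command reports
if [ "$created_on" = "$current" ]; then
    echo "auto-login allowed"
else
    echo "password required"   # spoofable: hostname dbhost01, then retry
fi
```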

Security implications

Oracle states “Local auto-login wallets are used for scenarios where additional security is required”. However, one can defeat the measure simply by issuing the “hostname” command. It is clear that local auto-login wallets offer little in the way of real security.

Conclusion

Oracle determines whether a local auto-login wallet is on the host where it was created by checking the system hostname. This feature is easy to spoof and does not substantially enhance security. In case of an unplanned restore to a different host, open the wallet by setting the new host’s name to the original with the hostname command as root.

10 things you didn’t know about tablespace quotas


by Brian Fitzgerald

Introduction

DBAs use tablespace quotas to limit where segments can be placed. Sometimes grant and revoke interact with quotas in surprising ways.

Setup

SQL*Plus: Release 19.0.0.0.0 - Production on Wed Aug 31 22:59:29 2022
Version 19.16.0.0.0

Copyright (c) 1982, 2022, Oracle. All rights reserved.


Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.16.0.0.0

SQL> set linesize 32767
SQL> set trimspool on
SQL> col segment_name format a30
SQL> col granted_role format a30
SQL> create user U identified by U default tablespace USERS;

User created.

1. When you grant role DBA to a user, unlimited tablespace gets granted.

Unlimited tablespace is a separate, special-case side effect of granting DBA that is internal to Oracle.

SQL> grant dba to U;

Grant succeeded.

SQL> select privilege from dba_sys_privs where grantee = 'U';

PRIVILEGE
----------------------------------------
UNLIMITED TABLESPACE

2. When you revoke DBA from a user, unlimited tablespace gets revoked.

Exercise care when tightening security. Before you revoke DBA from a user, check whether that user owns segments, and compensate by granting tablespace quotas. Failure to grant a quota leads to a loss of the ability to insert into tables owned by that user.

SQL> revoke dba from U;

Revoke succeeded.

SQL> select privilege from dba_sys_privs where grantee = 'U';

no rows selected
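The pre-revocation check described above can be sketched as follows. User U is the example user from this article; the generated statements are meant to be run in SQL*Plus as a DBA:

```shell
# Emit a quota-compensation worklist for user U before revoking DBA.
worklist=$(cat <<'SQL'
select tablespace_name, round(sum(bytes)/1024/1024) mb
from   dba_segments
where  owner = 'U'
group  by tablespace_name;
-- For each tablespace returned, grant a compensating quota, e.g.:
-- alter user U quota unlimited on USERS;
SQL
)
printf '%s\n' "$worklist"
```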

3. Unlimited tablespace gets revoked even if it had been granted separately.

SQL> grant dba to U;

Grant succeeded.

SQL> grant unlimited tablespace to U;

Grant succeeded.

SQL> revoke dba from U;

Revoke succeeded.

SQL> select privilege from dba_sys_privs where grantee = 'U';

no rows selected

4. If you revoke unlimited tablespace from a user with DBA role, that user keeps DBA role.

SQL> grant dba to U;

Grant succeeded.

SQL> revoke unlimited tablespace from U;

Revoke succeeded.

SQL> select granted_role from dba_role_privs where grantee = 'U';

GRANTED_ROLE
------------------------------
DBA

5. You cannot grant unlimited tablespace privilege to a role.

SQL> create role R;

Role created.

SQL> grant unlimited tablespace to R;
grant unlimited tablespace to R
*
ERROR at line 1:
ORA-01931: cannot grant UNLIMITED TABLESPACE to a role

No role has unlimited tablespace privilege, not even DBA.

6. You can revoke a quota that a segment needs.

This is seldom done intentionally. If you do, the segment cannot extend.

SQL> alter user U quota unlimited on USERS;

User altered.

SQL> create table U.T ( N number) segment creation immediate;

Table created.

SQL>
SQL> select grantee, privilege from dba_sys_privs
2 where grantee = 'U' and privilege = 'UNLIMITED TABLESPACE';

no rows selected

SQL>
SQL> select segment_name, bytes, extents from dba_segments 
2 where owner = 'U'
3 and segment_name = 'T'
4 and segment_type = 'TABLE';

SEGMENT_NAME BYTES EXTENTS
------------------------------ ---------- ----------
T 65536 1

SQL> alter user U quota 0 on USERS;

User altered.

SQL> insert into U.T select level from dual connect by level <= 1;

1 row created.

SQL> insert into U.T select level from dual connect by level <= 1000000;
insert into U.T select level from dual connect by level <= 1000000
*
ERROR at line 1:
ORA-01536: space quota exceeded for tablespace 'USERS'

7. Without a quota, you cannot move a table.

Moving a table can save space, but with an insufficient quota, you can’t do it.

SQL> alter table U.T move;
alter table U.T move
*
ERROR at line 1:
ORA-01536: space quota exceeded for tablespace 'USERS'

8. When you revoke unlimited tablespace you also revoke all limited quotas.

SQL> grant unlimited tablespace to U;

Grant succeeded.

SQL> alter user U quota 10g on USERS;

User altered.

SQL> create tablespace U2;

Tablespace created.

SQL> alter user U quota unlimited on U2;

User altered.

SQL>
SQL> select username, tablespace_name, max_blocks
2 from dba_ts_quotas where username = 'U';

U TABLESPACE_NAME MAX_BLOCKS
- ------------------------------ ----------
U USERS 1310720
U U2 -1

SQL>
SQL> revoke unlimited tablespace from U;

Revoke succeeded.

SQL>
SQL> select username, tablespace_name, max_blocks
2 from dba_ts_quotas where username = 'U';

U TABLESPACE_NAME MAX_BLOCKS
- ------------------------------ ----------
U U2 -1

Notice that the limited quota on USERS got revoked, but the unlimited quota on U2 remains. That’s not exactly what the manual says: “If you later revoke the privilege, then you must explicitly grant quotas to individual tablespaces.”

Because revoking DBA revokes unlimited tablespace, it follows that revoking DBA revokes limited (finite) quotas.

9. You can grant a quota greater than 2 TB.

SQL> alter user U quota 10T on U2;

User altered.

The manual says “The maximum amount of space that you can assign for a tablespace is 2 TB.”

10. A quota could exceed the tablespace’s maximum size

SQL> select username, tablespace_name, max_blocks
2 from dba_ts_quotas where username = 'U';

U TABLESPACE_NAME MAX_BLOCKS
- ------------------------------ ----------
U U2 1342177280

SQL> select sum(maxblocks) from dba_data_files
2 where tablespace_name = 'U2';

SUM(MAXBLOCKS)
--------------
4194302

So here are 10 things you did not know about tablespace quotas. Here is one more:

Bonus: A user running import does not require a quota.

The owner of the segment needs the quota, not the user running import. That’s how it works. If you pre-create the user, you must grant the quota ahead of time. If the segment owner has no quota, then you will get ORA-01950: no privileges on tablespace.

SQL> create user U identified by U default tablespace USERS;

User created.

SQL> alter user U quota unlimited on USERS;

User altered.

SQL> create table U.T ( N number) segment creation immediate;

Table created.

$ cat exp.u.par
directory=d
dumpfile=exp.u.t.dmp
logfile=exp.u.t.log
reuse_dumpfiles=true
tables=u.t

$ expdp "'/ as sysdba'" parfile=exp.u.par

Export: Release 19.0.0.0.0 - Production on Thu Sep 1 00:16:02 2022
Version 19.16.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates. All rights reserved.

Connected to: Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Starting "SYS"."SYS_EXPORT_TABLE_01": "/******** AS SYSDBA" parfile=exp.u.par
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
Processing object type TABLE_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS
Processing object type TABLE_EXPORT/TABLE/TABLE
. . exported "U"."T" 0 KB 0 rows
Master table "SYS"."SYS_EXPORT_TABLE_01" successfully loaded/unloaded
******************************************************************************
Dump file set for SYS.SYS_EXPORT_TABLE_01 is:
/u99/exp/d/exp.u.t.dmp
Job "SYS"."SYS_EXPORT_TABLE_01" successfully completed at Thu Sep 1 00:19:07 2022 elapsed 0 00:03:05

SQL> drop user U cascade;

User dropped.

SQL> create user U identified by U default tablespace USERS;

User created.

$ cat imp.u.par
directory=d
dumpfile=exp.u.t.dmp
logfile=imp.u.t.log

$ impdp "'/ as sysdba'" parfile=imp.u.par

Import: Release 19.0.0.0.0 - Production on Thu Sep 1 00:24:48 2022
Version 19.16.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates. All rights reserved.

Connected to: Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Master table "SYS"."SYS_IMPORT_FULL_01" successfully loaded/unloaded
Starting "SYS"."SYS_IMPORT_FULL_01": "/******** AS SYSDBA" parfile=imp.u.par
Processing object type TABLE_EXPORT/TABLE/TABLE
ORA-39083: Object type TABLE:"U"."T" failed to create with error:
ORA-01950: no privileges on tablespace 'USERS'

Failing sql is:
CREATE TABLE "U"."T" ("N" NUMBER) SEGMENT CREATION IMMEDIATE PCTFREE 10 
PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING STORAGE(INITIAL 65536 
NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 
FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT 
CELL_FLASH_CACHE DEFAULT)
 TABLESPACE "USERS"

Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
Processing object type TABLE_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS
Job "SYS"."SYS_IMPORT_FULL_01" completed with 1 error(s) at Thu Sep 1 
00:24:57 2022 elapsed 0 00:00:08

Cleanup

SQL> drop user U cascade;

User dropped.

SQL> drop role R;

Role dropped.

SQL> drop tablespace U2 including contents and datafiles;

Tablespace dropped.

Conclusion

Exercise care when revoking DBA or unlimited tablespace. Be sure to compensate by issuing needed quotas. Otherwise, users or applications will get ORA-01536: space quota exceeded.

Elastic Block Store throughput testing with Orion

By Brian Fitzgerald

Introduction

ORacle I/O Numbers (Orion) results for EBS volumes are presented here. Multiple configurations of EC2 instance type, number of devices, IOPS configuration, and throughput configuration are tested. The objective is to identify the best performing and most economical system for use in Oracle Automatic Storage Management (ASM).

Amazon AWS EC2 EBS

AWS EC2 offers I/O capacity for a price. You get a slice of the underlying hardware’s full bandwidth depending on how much you pay. The actual throughput depends on the EC2 instance’s device controller and the underlying volume. EC2 device controller capacity depends on instance class (m5, r5b, etc.), and size (large, 4xlarge, etc.). EBS volume capacity depends on volume type (gp3, io2, etc.) and configuration (throughput and IOPS). Maximum I/O capacity will be the lesser of EC2 capacity and EBS volume capacity, provided that you get the layout right.

The I/O capabilities of the EBS volume types that were tested are summarized here:

Type                                       gp3      gp2       st1        io2
Purpose                                    General  General   Low cost   High IOPS, low latency
Medium                                     ssd      ssd       magnetic   ssd
Configurable throughput                    Yes      No        No         No
Maximum throughput configuration (Mbps)    1000     -         -          -
Configurable IOPS                          Yes      -         -          Yes
Max IOPS configuration                     16,000   -         -          256,000

The EBS test volumes were not encrypted.

This article places emphasis on optimizing throughput, but some IOPS and latency observations are presented here as well. As a limiting case, we are interested in 100% write I/O throughput capacity.

I/O throughput rating on EC2

We are mainly concerned with the modern, Nitro-based EC2 m5, c5, r5, and r5b instance types. Throughput is defined as the rate, in Mbps, of large (128 KiB) I/O, and can refer to a read, write, or mixed (read-write) workload.

EC2 throughput rating can be found by running “aws ec2 describe-instance-types”, and is divided into two ranges, according to instance size, as shown in the following figure. The number of CPUs in EC2 maps to instance size thus:

Number of CPUs Instance size
2 large
4 xlarge
8 2xlarge
16 4xlarge
32 8xlarge
48 12xlarge
64 16xlarge
96 24xlarge

At the low end, up to 4xlarge, throughput is capped at a single value, regardless of instance size. That cap depends on instance class. In r5b, the cap is 1250 Mbps, but for c5, m5, and r5, throughput is capped at 593.75 Mbps. Other instance classes have lower throughput caps. At the high end, 8xlarge and up, throughput depends on the instance size. The r5b instance class tops out at 7500 Mbps. m5, c5, and r5 classes have throughput that ranges from 850 Mbps to 2375 Mbps, depending on instance type.

By now we have described the available volume types and instance types. So let’s pose this question. If we configure 8 gp3 volumes at 1000 Mbps each, then the system should deliver 8000 Mbps throughput, right?

Wrong.

No EC2-EBS-Linux system today delivers 8000 Mbps throughput. You will get, at best, the throughput offered by the EC2 instance type. For example, if you build an m5.4xlarge instance, the highest possible throughput is 593.75 Mbps, as mentioned above, and actually achieving it requires configuring multiple volumes.
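The effective rate is simply the lesser of the EC2 instance cap and the aggregate volume configuration. A sketch with the m5.4xlarge figures discussed above:

```shell
# min(instance cap, volumes x per-volume throughput), in Mbps.
eff=$(awk -v cap=593.75 -v vol=1000 -v n=8 'BEGIN {
    agg = vol * n                        # configured aggregate: 8000 Mbps
    eff = (agg < cap) ? agg : cap        # the instance cap wins here
    print eff
}')
echo "effective throughput: $eff Mbps"
```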

Operating system description

The test system is an Amazon AWS EC2 instance running Red Hat Linux 7.9. The test software is orion, which is found in the Oracle home. The number of CPUs, the amount of memory, and the device controller interface depend on the EC2 instance type. For example, the m5.xlarge has 4 CPUs, 16 GB RAM, and a non-volatile memory express (NVMe) device controller interface.

Test description

100% write I/O was tested with orion. General purpose volumes (gp2 and gp3) and the lower cost st1 storage were tested with large writes. I/O optimized storage (io2) was tested with small I/O. Each test ran from 4 to 20 minutes, depending on the number of volumes. For the throughput tests, “Maximum Large MBPS” was extracted from the summary.txt file. In the small I/O tests, “Maximum Small IOPS” and “Minimum Small Latency” were extracted.
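A sketch of pulling the throughput figure out of a summary file. The sample line mimics orion's summary.txt format, and the value is illustrative:

```shell
# Extract "Maximum Large MBPS" from an orion summary file.
printf 'Maximum Large MBPS=593.75 @ Small=0 and Large=8\n' > /tmp/summary.txt
mbps=$(awk -F'=' '/Maximum Large MBPS/ { split($2, a, " "); print a[1] }' /tmp/summary.txt)
echo "max large MBPS: $mbps"
rm -f /tmp/summary.txt
```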

Instance type and volume type

In this section, the influence of instance type and volume type is investigated. The new I/O-optimized r5b instance type was tested with gp3 storage. The general purpose m5 instance class was tested with gp2, gp3, and io2 storage. Additionally, the lower cost, burstable t3 instance class and the previous generation m4 instance class were tested. The number of CPUs ranged from 2 to 96. As a rule of thumb, the amount of memory for the m5, t3, and m4 instance classes, in GB, is four times the number of CPUs. Eight 125-GB volumes were tested in every run. In all, there were 31 test runs.

Throughput testing

General purpose storage was tested for throughput. gp3 storage was configured for 1000 Mbps throughput per volume (the maximum) and 4000 IOPS. The results are displayed here. Each point represents the maximum throughput obtained in a single test. The dark blue line, for example, displays Orion throughput results for 6 tests on r5b.large, r5b.4xlarge, r5b.8xlarge, r5b.12xlarge, r5b.16xlarge, and r5b.24xlarge. The test points above 24 are results of separate tests on r5b.4xlarge, m5.4xlarge on gp3, m5.4xlarge on gp2, t3.4xlarge, m4.4xlarge, and m5.4xlarge on st1. Again, each point is a separate test, not Orion data points from a single test.

The figure shows actual system throughput for 8 volumes. We already said that just because you configure 8 1000 Mbps volumes does not mean that the observed throughput will be 8000 Mbps. Instead, the throughput is limited by the instance type.

The key points to notice in the throughput testing are:

  • The observed throughput is level across instance types up to 4xlarge (24 CPUs), just as the documentation says.
  • Starting at 8xlarge (25 CPUs), gp3 throughput scales up with the instance size, reaching 7330 Mbps on the r5b.24xlarge instance type and 2330 Mbps on m5.24xlarge, again just as the documentation says.
  • Like gp3 on the m5, gp2 throughput is level at 580 Mbps for instances up to 4xlarge.
  • From 12xlarge and up, gp2 throughput is level at 1015 Mbps.
  • For t3.xlarge and up, throughput plateaus at 335 Mbps.
  • In the older instance class m4, throughput ranges from 55 Mbps to 240 Mbps.

Notice that in high-end instance types, r5b throughput is better than 3x the m5 throughput. This is one example where actual observations match the marketing materials: New – Amazon EC2 R5b Instances Provide 3x Higher EBS Performance

By using the number of CPUs on the horizontal axis, we do not mean to imply that CPUs have a significant influence on system throughput in EC2. In fact, during Orion testing, CPU workload tends to be low, and the same can be said for memory footprint. Plotting against the number of CPUs is just a convenient method for plotting I/O numbers versus instance size. AWS throttles throughput depending on instance type and size, separately from the number of CPUs.

IOPS testing

I/O optimized storage, io2, configured for 7000 IOPS per volume on the m5 class was also tested, with 8 volumes. Again, to clarify, configured IOPS for each volume was 7000. You might think that configuring 8 volumes at 7000 IOPS per volume should deliver 56000 IOPS. As you can see, actual IOPS is less.

The key IOPS testing observations are:

  • IOPS is flat at 18905.
  • Latency is flat at 600 ms.

Conclusion

  • r5b throughput outperforms m5 by better than 3x.
  • I/O performance is level up to 4xlarge.
  • Starting at 8xlarge, throughput ramps up with instance class.
  • 24xlarge delivers the highest throughput.
  • t3 write throughput lags m5, and m4 write throughput lags t3.
  • The observed system I/O rate is much less than you would expect if you calculated I/O rate based on the volume configuration.

This section was an overview of the capabilities of various EC2 instance types and volume types with eight volumes. As we are about to see, tuning the number of volumes is crucial to optimizing instance performance.

Number of volumes

The influence of the number of volumes on I/O was tested for various instance types. The number of volumes ranged from 1 to 16. Throughput was tested on general purpose volume types gp2 and gp3. The gp3 storage was configured for 1000 Mbps and 4000 IOPS per volume. The results are shown in the following figure. Additionally, IOPS was measured on the I/O optimized volume type io2, with IOPS set at 7000 per volume. The results are shown here. The key findings are:

  • For improved throughput, I/O should be distributed across multiple volumes.
  • In r5b.24xlarge, throughput increases all the way out to 7300 Mbps at 16 volumes.
  • In r5b.8xlarge and m5.24xlarge, throughput levels off around 2400 Mbps at 8 volumes.
  • On m5.large, gp2 and gp3 throughput levels off at 580 Mbps at four volumes.
  • On m5.large, io2 IOPS plateaus at 18905 at four volumes.
  • io2 latency tends to remain below 625 ms.

Be aware that the EC2 device attachment limit is 28, including the network adapter and up to 27 volumes, including the system disk. A practical Oracle ASM system could consist of two diskgroups and four or more disks per group.

In conclusion, one can expect improved I/O by implementing four volumes. In high end systems, configuring eight or more volumes is beneficial. Exercise prudence on the number of volumes, because throughput and IOPS charges are per-volume.

Configured volume throughput

So far, we have tested volumes that are over-provisioned, meaning that measured throughput turned out to be much less than rated throughput. You don’t get what you pay for. So how much throughput should you actually be configuring?

In this section, we investigate the effect of configured throughput on actual throughput for gp3 volumes. gp3 volumes were configured at 3000 IOPS and a throughput range from 125 Mbps to 1000 Mbps. Various m5 and r5b instance types were tested. The results are shown here. In the m5.large, m5.24xlarge, and r5b.8xlarge cases, setting volume throughput to 375 Mbps achieves the maximum possible system throughput; setting volume throughput higher than that leads to no further gains. In r5b.12xlarge and r5b.24xlarge, performance can be maximized by setting throughput to 500 Mbps. In conclusion, although gp3 throughput can be adjusted up to 1000 Mbps, there is no benefit to setting gp3 throughput above 500 Mbps, and only in rare cases is it beneficial to set it above 375 Mbps.

Intel vs. AMD

AWS offers a choice of Intel or AMD processor on Red Hat EC2. AMD EC2 prices are 10% lower than Intel. Rated EBS throughput on the AMD systems is less than on Intel.

AMD instance    Throughput (Mbps)    Intel instance    Throughput (Mbps)
t3a.medium      -                    t3.medium         -
m5a.large       2880                 m5.large          4750
m5a.24xlarge    13570                m5.24xlarge       19000

The test system configurations were:

Instance type    CPUs   Volumes   Storage   Throughput   IOPS
t3[a].medium     2      4         st1       -            -
m5[a].large      2      8         gp3       375          3000
m5[a].24xlarge   96     8         gp3       375          3000

“t3[a].medium” refers to a comparison of a t3a.medium (AMD) system to a t3.medium (Intel) system, and so on for m5[a].large and m5[a].24xlarge. At the low end instance types, AMD and Intel performed equally. At the high end, Intel edged out AMD.

Systems for different purposes

Recommendations for three different type of systems are presented here. The target system is Oracle Database on top of Grid Infrastructure on Red Hat. Only systems with at least 2 CPUs and 4 GB RAM are considered. A “system” could refer to an ASM diskgroup, such as DATA, RECO, or REDO.

Low cost

You might need a low cost system for light development, well-tuned reports, archiving, etc. Use the t3a instance class and st1 storage.

General purpose

For general purpose production systems, use the m5 class. If you want to halve the memory, substitute c5 and save 9% to 11%. To double the system memory, use r5 and pay 24% to 30% more. Configure four or more gp3 volumes per ASM diskgroup at 125 Mbps to 375 Mbps throughput and 3000 IOPS.

High throughput

For the highest large-write throughput on EBS, use r5b.24xlarge with 16 gp3 volumes configured at 500 Mbps.

Example systems and cost

Examples of the three types of system are presented in this table. This time, we present CPU and storage cost, assuming Red Hat Linux and on-demand pricing in region us-east-1, as of 7/24/2021.

Purpose           Instance type   Vol type   CPUs   Vol size (GB)   Vols   Thr (Mbps)   IOPS   Actual Mbps   EC2     EBS     Total
Low cost          t3a.medium      st1        2      125             8      -            -      245           $71     $45     $116
General purpose   m5.large        gp3        2      250             4      125          3000   497           $114    $80     $194
High throughput   r5b.24xlarge    gp3        96     1000            16     500          3000   7330          $5316   $1522   $6838

Do not waste money

Follow this guidance to avoid wasting money:

  • Avoid io2 EBS for general purpose Oracle databases. The cost is prohibitive.
  • In gp3, configure throughput at 500 MB/s or less, and configure 3000 IOPS.
  • For good, economical performance, configure gp3 at 125 MB/s and 3000 IOPS.

Workload

This article offers guidance on configuring AWS EBS for a 100% write workload. The findings apply to any diskgroup, particularly to redo diskgroups and to databases with high throughput requirements, such as ETL systems. The information is one of many factors to consider when specifying a practical database system. Orion is also capable of simulating a 100% read workload and a mixed workload; mixed-workload Orion results could be used to look for further cost reductions. Actual database application performance depends not only on storage throughput, but also on processor design, amount and speed of memory, and table design.

ASM diskgroup layout

Oracle makes ASM diskgroup layout recommendations: RAC and Oracle Clusterware Best Practices and Starter Kit (Platform Independent) (Doc ID 810394.1).

Oracle recommends a minimum of 4 disks per diskgroup. My own test results show that a minimum of 4 LUNs per diskgroup leads to optimal throughput. Configuring 4 or more LUNs per diskgroup is extremely important.

Oracle recommends no more than 2 ASM diskgroups. Benefits of no more than 2 ASM diskgroups:

  • Simplified administration.
  • High throughput in all diskgroups.
  • Avoid approaching the EC2 attachment limit.

If you configure more than 2 diskgroups, you could find yourself making design compromises at exactly the wrong places. For example, online redo logs are critical, high throughput components. If you configure separate REDO diskgroups in EC2, then you may find it difficult to keep to 4 disks per diskgroup, and still allow for expansion and remain within the attachment limit. Or you may find that so many diskgroups leads to a manageability issue.
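The attachment arithmetic behind that concern can be sketched in a few lines. The assumptions here are a budget of 27 EBS volumes (28 attachments minus one network interface, per the EC2 limit discussed elsewhere in this collection), one OS volume such as /u01, and at least 4 disks per diskgroup.

```python
# Sketch of EC2 attachment-limit planning for ASM. Assumes a budget of
# 27 EBS volumes (28 attachments minus one network interface).
VOLUME_BUDGET = 27

def remaining_volumes(os_volumes, diskgroups, disks_per_group=4):
    """Volumes left for growth after OS mounts and ASM disks are attached."""
    used = os_volumes + diskgroups * disks_per_group
    return VOLUME_BUDGET - used

# Two diskgroups of 4 disks plus one /u01 volume leaves plenty of headroom:
print(remaining_volumes(1, 2))   # 18
# Five separate diskgroups leaves little room to grow any of them:
print(remaining_volumes(1, 5))   # 6
```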

Conclusion

The r5b instance class with gp3 storage delivers the highest-performing I/O. The m5, c5, and r5 instance classes make well-performing, general purpose systems with high throughput. The t3a instance class with st1 storage makes a fair-performing, low cost system. Each ASM disk group should be configured with a minimum of 4 disks, and with more than 4 disks for high-throughput r5b systems. For optimal throughput, gp3 EBS should be configured at 3000 IOPS and no more than 500 MB/s.

Create spfile in ASM

By Brian Fitzgerald

Problem

Question: How do you create an spfile in ASM? If you issue “create spfile” too early, you will not know the path, so you would have to search ASM for it. The situation can arise while you are setting up a standby database.

Solution

Register the database with CRS first. Then, as the manual mentions, the create spfile statement “automatically updates the SPFILE in the database resource.”

Register the db with CRS

$ srvctl add database -database orcl -role physical_standby -startoption MOUNT -stopoption ABORT -instance orcl -oraclehome /u01/app/oracle/product/1212/db_1 -diskgroup DATA1,RECO1,REDO1,REDO2
$ srvctl start database -database orcl
$ srvctl config database -database orcl
Database unique name: orcl
Database name:
Oracle home: /u01/app/oracle/product/1212/db_1
Oracle user: oracle
Spfile:
Password file:
Start options: mount
Stop options: abort
Database role: PHYSICAL_STANDBY
Management policy: AUTOMATIC
Disk Groups: DATA1,RECO1,REDO1,REDO2
Services:
OSDBA group: dba
OSOPER group: dba
Database instance: orcl

Create a text file

You could create a pfile from memory, for example.

SQL> create pfile='initorcl.memory' from memory;
File created.

Edit the file

$ vi initorcl.memory

Delete extraneous underscore parameters. For example, here is a file with the basic parameters that you need for restoring and recovering a duplicate database:

audit_trail='DB'
cluster_database=FALSE
compatible='12.1.0.2.0'
control_files='+REDO1/ORCL/CONTROLFILE/current.258.1067960587'
control_files='+REDO2/ORCL/CONTROLFILE/current.258.1067960589' # Restore Controlfile
core_dump_dest='/u01/app/oracle/diag/rdbms/orcl/orcl/cdump'
db_cache_size=16G
db_create_file_dest='+DATA1'
db_create_online_log_dest_1='+REDO1'
db_create_online_log_dest_2='+REDO2'
db_files=1000
db_name='ORCL'
db_recovery_file_dest='+RECO1'
db_recovery_file_dest_size=1000G
db_unique_name='orcl'
local_listener='ORCL'
remote_login_passwordfile='EXCLUSIVE'
sga_max_size=31G # internally adjusted
sga_target=31G
use_large_pages='only'

Create the spfile

SQL> create spfile = '+DATA1' from pfile='initorcl.memory';

Check:

$ srvctl config database -database orcl
Database unique name: orcl
Database name:
Oracle home: /u01/app/oracle/product/1212/db_1
Oracle user: oracle
Spfile: +DATA1/ORCL/PARAMETERFILE/spfile.585.1068023283
Password file:
Start options: mount
Stop options: abort
Database role: PHYSICAL_STANDBY
Management policy: AUTOMATIC
Disk Groups: DATA1,RECO1,REDO1,REDO2
Services:
OSDBA group: dba
OSOPER group: dba
Database instance: orcl

Notice that there was no need to create an ASM alias or log in to ASM to search for the file.

Conclusion

Register your database in CRS first, then create your spfile.

No disk groups mounted after Grid 12.1 upgrade to 18c

Introduction

After an apparently successful upgrade from Grid 12.1 to 18c, the ASM disk groups failed to mount.

Scenario

An upgrade from Grid 12.1 to 18c was performed. The DBA had used gridSetup.sh.

After the upgrade, no disk groups were found to be mounted. There had been no warnings before this happened.

Other application groups were waiting, so there was pressure to bring up the system.

Symptoms

  1. No disk groups got mounted.
  2. This error appears in the ASM alert log:
ORA-15032: not all alterations performed
ORA-59303: The attribute compatible.asm (11.2.0.0.0) of the diskgroup being mounted should be 11.2.0.2.0 or higher.

Notice that tools that refer to disk groups will not work

asmcmd lsdsk --discovery shows ASM disks, but no disk groups.

$ asmcmd lsdsk --discovery -g
Inst_ID  Path
      1  ORCL:ASM_DATA_VOL1
      1  ORCL:ASM_DATA_VOL2
      1  ORCL:ASM_DATA_VOL3
      1  ORCL:ASM_DATA_VOL4
      1  ORCL:ASM_DATA_VOL5
      1  ORCL:ASM_RECO_VOL1
      1  ORCL:ASM_REDO_VOL1

Mounting with the force option will not work. For example:

asmcmd mount -f DATA1

This will not work:

SQL> ALTER DISKGROUP DATA1 SET ATTRIBUTE 'compatible.asm' ='11.2.0.2.0';
ALTER DISKGROUP DATA1 SET ATTRIBUTE 'compatible.asm' ='11.2.0.2.0'
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15001: diskgroup "DATA1" does not exist or is not mounted

This will not work:

SQL> alter diskgroup DATA1 mount;
alter diskgroup DATA1 mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-59303: The attribute compatible.asm (11.2.0.0.0) of the diskgroup being
mounted should be 11.2.0.2.0 or higher.

Notice that the ASM disks are ok

asmlib:

$ oracleasm listdisks
ASM_DATA_VOL1
ASM_DATA_VOL2
ASM_DATA_VOL3
ASM_DATA_VOL4
ASM_DATA_VOL5
ASM_RECO_VOL1
ASM_REDO_VOL1

kfed:

$ kfed op=read dev=/dev/oracleasm/disks/ASM_DATA_VOL1
kfbh.endian:                          1 ; 0x000: 0x01
etc..
kfdhdb.dskname:           ASM_DATA_VOL1 ; 0x028: length=13
kfdhdb.grpname:                   DATA1 ; 0x048: length=5
etc.
kfdhdb.compat:                186646528 ; 0x020: 0x0b200000

The ASM disk and ASM group name look good.

Hex 0x0b refers to Oracle version 11. The zeros following the 2 apparently mean that the disk group compatibility is 11.2.0.0.0.
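That reading of the hex value can be expressed as a small decoder. Note that this byte layout is inferred from the observed value, not taken from Oracle documentation, so treat it as a hypothesis:

```python
# Hypothetical decode of kfed's kfdhdb.compat value. Assumes the major
# version is the high byte and the next four hex digits are the remaining
# version components; this layout is inferred, not documented.
def decode_compat(value):
    major = value >> 24
    rest = [(value >> shift) & 0xF for shift in (20, 16, 12, 8)]
    return ".".join(str(c) for c in [major] + rest)

print(decode_compat(0x0B200000))   # 11.2.0.0.0 (186646528 decimal)
```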

Cause

Oracle 12.2, 18c, and 19c binaries cannot mount a disk group whose compatible.asm is 11.2.0.0.0. The manual states:

Starting with Oracle ASM version 12.2.0.1, the minimum and default settings for Oracle ASM disk group attributes are:

  • COMPATIBLE.ASM = 11.2.0.2

If you are reading the Oracle grid upgrade manual, you might not notice any checks that could prevent this mistake; the Oracle 19c Grid Upgrade manual checklist only hints at the problem.

Our 18c gridSetup.sh as patched (18.3) does not check for this, and lets you start the upgrade, which leads to the problem.

Approach (with a downside)

Downgrade grid to the previous version (12.1).

In the 18c grid home, I attempted this step from the 18c manual:

# cd crs/install
# ./rootcrs.sh -downgrade
Using configuration parameter file: /ora_local/apps/oracle_grid/product/18.0.0/grid_18c/crs/install/crsconfig_params
The log of current session can be found at:
  /ora_local/apps/oracle/crsdata/evp003-eldb/crsconfig/crsdowngrade_evp003-eldb_2021-01-20_10-48-42AM.log
2021/01/20 10:48:44 CLSRSC-457: Oracle Restart is currently configured and cannot be deconfigured using this Oracle Clusterware deconfiguration command.
Died at /ora_local/apps/oracle_grid/product/18.0.0/grid_18c/crs/install/crsdowngrade.pm line 260.
The command '/ora_local/apps/oracle_grid/product/18.0.0/grid_18c/perl/bin/perl -I/ora_local/apps/oracle_grid/product/18.0.0/grid_18c/perl/lib -I/ora_local/apps/oracle_grid/product/18.0.0/grid_18c/crs/install /ora_local/apps/oracle_grid/product/18.0.0/grid_18c/crs/install/rootcrs.pl -downgrade' execution failed

Instead, I did this:

# cd crs/install
# ./roothas.sh -deconfig

This step wipes out the OCR.

In the 12.1 grid home

# cd crs/install
# ./roothas.sh

Now grid is running on 12.1 with just the basics, but not cssd.

Downside

roothas deconfig wipes out your crs. You will have to manually start cssd and re-add your ASM and database resources.

Workaround

Start cssd, etc.

# crsctl start resource -all

Start ASM

To get things rolling, you could create an ASM pfile. Remember that ASM needs to know which disk groups to mount, so be sure to configure asm_diskgroups.

init+ASM.20210120

large_pool_size          = 12M
instance_type            = "asm"
remote_login_passwordfile= "EXCLUSIVE"
asm_diskgroups =  DATA1,REDO1,RECOVERY1
asm_power_limit          = 1
diagnostic_dest          = "/ora_local/apps/oracle"

start

SQL> startup mount pfile='init+ASM.20210120'
ASM instance started

Total System Global Area 1136934472 bytes
Fixed Size                  8666696 bytes
Variable Size            1103101952 bytes
ASM Cache                  25165824 bytes
ASM diskgroups mounted
ASM diskgroups volume enabled

Creating an spfile succeeds, but raises this error message because ASM is not yet registered with Oracle Restart:

SQL> create spfile='asmspfile.ora'  from pfile =  'init+ASM.20210120';
create spfile='asmspfile.ora'  from pfile =  'init+ASM.20210120'
*
ERROR at line 1:
ORA-29786: Oracle Restart attribute GET failed with error [Attribute 'SPFILE'
sts[200] lsts[0]]

Add asm to crs

$ srvctl add asm -spfile asmspfile.ora

Start the databases (details omitted for brevity): start each with SQL*Plus, then configure it in CRS with srvctl add database.

Final state

  • Grid 12.1 is up
  • ASM is up with all disk groups mounted.
  • Databases up

Follow up steps

To reattempt the Grid 18c upgrade:

  • In all disk groups, change compatible.asm to 11.2.0.2.0 or higher
  • Retry the Grid 18c upgrade

Conclusion

During an upgrade from Grid 12.1 to 18c, the step to upgrade the disk groups to compatible.asm 11.2.0.2.0 got missed. As a result, the 18c software could not mount the disk groups. kfed revealed that the ASM disks were intact, though.

The downgrade step found in the manual, rootcrs.sh -downgrade, failed, so I ran roothas.sh -deconfig, followed by roothas.sh in the 12c grid home. That worked, but I had to start cssd manually, and register ASM and the databases.

Launch AMI by name in terraform

By Brian Fitzgerald

Introduction

Using Terraform, you can launch an AWS EC2 instance from any account and any region using a public AMI identified only by image name. Because you do not need to look up the image ID each time, your code can be simplified.

Approach

Red Hat account number

In the AWS console, notice that the Red Hat, Inc. account number is 309956199498.


AMI name

Find the Red Hat AMI image by name. In the console, navigate to EC2 > Images > AMIs. Filter on Owner: 309956199498, Architecture: 64-bit (x86), Virtualization type: HVM.

For example, “RHEL-7.9_HVM_GA-20200917-x86_64-0-Hourly2-GP2”. You could also search using the AWS command line:

$ aws ec2 describe-images --owners 309956199498 --filters "` cat filters.ami.json `" --region us-east-2

where filters.ami.json contains:

[
    {
        "Name": "architecture",
        "Values": [
             "x86_64"
        ]
    },
    {
        "Name": "virtualization-type",
        "Values": [
             "hvm"
        ]
    }
]

Terraform data source

Define a Terraform aws_ami data source.

ami-rhel.tf

data "aws_ami" "rhel" {
    most_recent = true
    owners = [
        "309956199498"
    ]
    filter {
        name   = "name"
        values = [
            "RHEL-7.9_HVM_GA-20200917-x86_64-0-Hourly2-GP2"
        ]
    }
}
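Conceptually, this data source behaves like a describe-images call filtered by name, followed by a most-recent sort on creation date. A rough Python sketch of that selection logic (the image records below are made-up examples, not real AMIs):

```python
# Sketch of what the aws_ami data source does with most_recent = true:
# filter the owner's images by name, then take the newest by creation
# date. The records below are made-up examples, not real AMIs.
def pick_most_recent(images, name):
    matches = [img for img in images if img["Name"] == name]
    # ISO-8601 timestamps sort correctly as plain strings
    return max(matches, key=lambda img: img["CreationDate"], default=None)

images = [
    {"Name": "RHEL-7.9_HVM_GA-20200917-x86_64-0-Hourly2-GP2",
     "ImageId": "ami-aaaa1111", "CreationDate": "2020-09-17T00:00:00.000Z"},
    {"Name": "RHEL-7.9_HVM_GA-20200917-x86_64-0-Hourly2-GP2",
     "ImageId": "ami-bbbb2222", "CreationDate": "2021-02-01T00:00:00.000Z"},
]
print(pick_most_recent(images,
                       "RHEL-7.9_HVM_GA-20200917-x86_64-0-Hourly2-GP2")["ImageId"])
# ami-bbbb2222
```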

aws instance resource

Use the data source in your Terraform aws_instance resource:

provider "aws" {
    region      = var.region
}

resource "aws_instance" "your-ec2-rsrc" {
    ami           = data.aws_ami.rhel.image_id

and so on . . .

Launch the ec2 instance

Run “terraform apply”. Terraform determines the AMI ID. For example, ami-0d2bf41df19c4aac7.

Conclusion

By using the Terraform AWS AMI data source, you can launch a public AMI from any account and region. The AMI ID is different for each account and region.

EC2 maximum number of volumes

Introduction

The documented AWS EC2 EBS volume attachment limit is 27. The attachment limit affects the maximum number of Oracle ASM disks.

EC2 volume limit

Amazon AWS documents a 27-volume attachment limit. I have found that attempting to create a t2 instance with 28 EBS volumes aborts.

Attempting to create an m5 (Nitro) instance with 28 or more volumes will hang in the “creating” state.

The attachment limit is 28, including network interfaces, volumes, and instance store volumes. EC2 instances with one network interface can have up to 27 volumes attached.

EC2 instances with more than 27 volumes

I am aware of t2 EC2 systems with as many as 44 volumes attached. I have no information about how this was done, or what the customer’s support expectations are. I would be concerned about how such a system would respond to a change of instance type.

Oracle Database

In planning an Oracle database installation, you may need file system mounts as well, such as /u01. The underlying volume counts against the maximum attachment count.

In laying out an ASM setup, consider a simple design, with as few ASM disks and ASM disk groups as needed.

Conclusion

Based on the available public information, I recommend limiting the number of EBS volume attachments to 27. Oracle database administrators might want to simplify their ASM implementations.