Trace only one logon on AWS RDS Oracle

By Brian Fitzgerald

Introduction

Given an Oracle RDS database, multiple application servers and multiple connections, the task at hand is to

  • Create a logon trigger that will …
  • Run when the first connection arrives and not again after that, and …
  • Run trace 10046 …
  • Up to 10 MB only.
  • Identify the trace file
  • Download the trace file
  • Run tkprof

Background

  • A logon trigger fires for every login that matches the firing condition. If we trace every session, we could fill the file system. We only want one example trace file.
  • In RDS, you do not have direct access to the operating system. You must somehow copy out the trace file.

Let’s go!

Grant

The first thing you need to get right is the grant. Tracing requires the “alter session” privilege, which must be granted directly, not through a role. Suppose the user is “inuser”:

grant alter session to inuser;

Create a sequence

create sequence inuser.user_trace_trg_seq;

Create the trigger

The trigger must be owned by the user that needs to be traced.

create or replace trigger inuser.user_trace_trg
after logon on database
when ( user = 'INUSER' )
declare
  l_val number;
begin
  l_val := inuser.user_trace_trg_seq.nextval;
  if l_val = 1
  then
    execute immediate
    q'{alter session set max_dump_file_size = '10m'}';
    execute immediate
    q'{alter session set tracefile_identifier = '}'||user||q'{_user_trace_trg'}';
    execute immediate
    q'{alter session set events '10046 trace name context forever, level 12'}';
  end if;
end;
/

Now wait for the user to log in. The first time the trigger fires, nextval equals 1, so the trigger body runs. After that, the trigger body does not run.

Identify the trace file

select * from table(
  rdsadmin.rds_file_util.listdir('BDUMP')
)
where filename like '%INUSER_user_trace_trg.trc'
order by mtime;

FILENAME                                TYPE     FILESIZE MTIME
ORCL_ora_9022_INUSER_user_trace_trg.trc file     10485856 2022-04-17 00:17:04

“Download” the trace file

You could transfer the file from RDS to S3, and then from S3 to a local system. The other way is to read it over the database connection:

select text from table(
  rdsadmin.rds_file_util.read_text_file(
   :dir,
   :filename
  )
)

Using sqlalchemy and boto3, I have integrated this statement with AWS Secrets Manager in a convenient python script:

$ download-rds-ora-dir-files.py --secret-id $sec --dir BDUMP --pat ORCL_ora_9022_INUSER_user_trace_trg.trc
number of files: 1
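
The script itself is not shown here, but the core of it is small. Here is a minimal sketch, assuming a SQLAlchemy engine; the connection URL is a placeholder for what the real script builds from Secrets Manager:

from sqlalchemy import create_engine, text

# Placeholder URL; the real script derives it from AWS Secrets Manager.
engine = create_engine('oracle+cx_oracle://user:***@host:1521/?service_name=ORCL')

sql = text('''
select text from table(
  rdsadmin.rds_file_util.read_text_file(:dir, :filename)
)
''')

# Stream the trace file rows to a local file, one line per row.
with engine.connect() as conn, open('downloaded.trc', 'w') as out:
    rows = conn.execute(sql, {'dir': 'BDUMP',
                              'filename': 'ORCL_ora_9022_INUSER_user_trace_trg.trc'})
    for row in rows:
        out.write((row[0] or '') + '\n')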

You could also run sqlplus and spool the output to a file.

set feedback off
set linesize 32767
set trimspool on
set pagesize 0

set termout off
spool ORCL_ora_9022_INUSER_user_trace_trg.trc

select text from table(
    rdsadmin.rds_file_util.read_text_file(
        'BDUMP',
        'ORCL_ora_9022_INUSER_user_trace_trg.trc'
    )
)
;
spool off

Run tkprof

tkprof ORCL_ora_9022_INUSER_user_trace_trg.trc ORCL_ora_9022_INUSER_user_trace_trg.tkp

Done!

Conclusion

The following concepts were covered:

  • How to make a logon trigger body that runs only once.
  • Permissions and object ownership for a logon trigger.
  • How to download a trace file from RDS.

Session objects in python APIs

By Brian Fitzgerald

Introduction

Database administrators understand “session” to mean an established database connection. It may come as a surprise that “session”, in several database-related libraries, refers to a data structure in the absence of any established connection.

Database session

A database session is an instance memory structure that holds an established connection’s state. The session contains facts such as a unique session id and the user name. The database session exists only after the transport (network connection) is established, the user has authenticated, and the memory structures have been allocated. The client application uses an API which holds a handle to the server side session. Application code uses the database API handle to issue commands and to receive responses and query results. The API is layered on top of a database driver, which associates the API handle with an operating system or network connection. In this way, client application to a server session via an API and a database driver. In databases, then, a client “handle” connects to a server “session”.

Other meanings of “session”

DBAs work with software APIs, such as python libraries, that do not establish a connection while allocating what they call a “session”. I will mention three APIs: requests, sqlalchemy, and boto3, in which “session” refers to a data structure that is not always associated with a connection.

Requests

Requests is a python HTTP library. HTTP is a stateless protocol: each request to the HTTP server can be implemented by a separate, new network connection that is disconnected after the call. In the following example, the call makes a TCP connection to the server, sends a request, receives a response, and closes the connection.

>>> import requests
>>> from json import loads
>>> r = requests.get('https://httpbin.org/get?q=Hello')
>>> type(r)
<class 'requests.models.Response'>

Let’s verify that no connection remains. Suspend your python REPL by pressing ctrl-Z.

>>>
[1]+ Stopped python3
$

Is requests still connected?

$ lsof -n -p $(pgrep python3) | grep https

No output here means that requests is no longer connected.

Requests offers an advanced option that preserves state and a connection across calls.

>>> from requests import Session
>>> ses = Session()
>>> type(ses)
<class 'requests.sessions.Session'>
>>> ses.get('https://httpbin.org/get?q=Hello')
<Response [200]>

Is ses still connected? Let’s see.

$ lsof -n -p $(pgrep python3) | grep https
python3 3387 bf2190 3u IPv4 125425570 0t0 TCP 10.130.99.84:36878->52.86.136.68:https (ESTABLISHED)

Yes. The Session is connected to the https server. If you make another call within a short time, requests will make that call on the same socket. If the connection is idle for approximately 2.5 minutes, then requests will close the connection. If you make another call after that, requests will open a new socket.

To summarize, a requests Session is a client side handle that is not connected when it is created, but gets connected on first use. The session remains connected during use, and disconnects after an idle timeout.

SQL Alchemy

SQL Alchemy is a python SQL toolkit and object relational mapper. Two objects used for establishing a connection are the Engine and the Session. The Engine contains information about the driver and the connection string.

from sqlalchemy import create_engine, text

eng = create_engine(
    connection_string,
    max_identifier_length=128
)
print(type(eng))
<class 'sqlalchemy.engine.base.Engine'>

The Session is an object that will eventually hold an established connection.

from sqlalchemy.orm import sessionmaker
db_sess = sessionmaker(bind=eng)()
print(type(db_sess))
<class 'sqlalchemy.orm.session.Session'>

So far, the Session is not connected, but it will be after a call to connection().

conn = db_sess.connection()
print(type(conn))
<class 'sqlalchemy.engine.base.Connection'>

conn holds an established connection. The other way to connect is to simply use the session.

for row in db_sess.execute(text('select dummy from dual')):
    print('%s' % row['dummy'])
X

Now db_sess holds an established connection. In conclusion, the SQLAlchemy Session is an object that is not initially connected. The Session connects by a call to connection() or on first use.

boto3

boto3 is the AWS python library. It is used to manage AWS services. One such service is Relational Database Service (RDS). In boto3, you make API calls to manage your environment. For example, you can list all RDS db instance details by calling describe_db_instances. To make a call, you must first allocate a boto3 client. The constructor for the RDS service is client('rds'). You can call the constructor and run describe directly like so:

>>> from boto3 import client
>>> len(client('rds').describe_db_instances()['DBInstances'])
30

The output from describe_db_instances is verbose, but the results are not needed for this example, so len is used here to summarize the result (30 db instances). Because describe was called directly from the constructor, the client object goes out of scope immediately, and the connection is dropped:

>>> 
[1]+ Stopped python3 
$ lsof -n -p $(pgrep python3) | grep https 
$

In the next example, the boto3 client is saved to variable rds

>>> from boto3 import client
>>> rds = client('rds')
>>> type(rds)
<class 'botocore.client.RDS'>
>>> len(rds.describe_db_instances()['DBInstances'])
30
>>>
[1]+ Stopped python3
$ lsof -n -p $(pgrep python3) | grep https
python3 7128 bf2190 3u IPv4 125436458 0t0 TCP 10.130.99.84:60276->52.46.159.85:https (ESTABLISHED)

In this case, the client remains connected after the results are returned. After approximately 2 minutes of idle time, boto3 closes the connection. If you make another call after that, boto3 opens a new connection. In the next case, the client is within the scope of a function. When the function returns, the client goes out of scope, and the connection gets dropped.

>>> from boto3 import client
>>>
>>> def desc():
...     rds = client('rds')
...     rds.describe_db_instances()['DBInstances']
...
>>> desc()
>>>
[1]+ Stopped python3
$ lsof -n -p $(pgrep python3) | grep https
$

In the final example, client rds is created in the outer scope. When the function returns, the client remains in scope and connected to the AWS server.

>>> from boto3 import client
>>>
>>> rds = client('rds')
>>> def desc():
...     rds.describe_db_instances()['DBInstances']
...
>>> desc()
>>>
[1]+ Stopped python3
$ lsof -n -p $(pgrep python3) | grep https
python3 7631 bf2190 3u IPv4 125437676 0t0 TCP 10.130.99.84:42598->54.239.31.3:https (ESTABLISHED)

The boto3 session

The boto3 Session is not a connection at all. It is an object that holds access credentials. The Session creates objects that connect to AWS, particularly the Client. You can use the session to manage across multiple regions in an account, multiple accounts, multiple users, multiple roles, or across tokens acquired at different times. You can use the session to override the access keys in your profile, in your environment, or in your service account.

In this example, we loop over two regions and count the number of option groups. First, we allocate a list of sessions:

>>> from boto3.session import Session
>>>
>>> AWS_ACCESS_KEY_ID="ASIAW3NNBKAMBK2CUGNE"
>>> AWS_SECRET_ACCESS_KEY="8V7Qaama1Yvl6ADRfrFI0vNTegx+V56S3sY5/uX0"
>>> AWS_SESSION_TOKEN="IQoJb3JpZ2luX2VjENz//////////wEaCXVzLWVhc3QtMSJIMEYCIQDOnKeV4UGWCVxuSvmD+O1FPUJHbxZOCPa/njhGlxBo1gIhAIRky8kx0jUNl40dg0NKGHjn1aOjYKe9Hdhs6msAlmq7KokDCFQQARoMNDcxMTk4NjgzMTYwIgzXqUo5F7yN89PdSxkq5gL1HhFXOOjVa8Irz8fVwUazE4VAKVtYthND8bXkqTLm1BE3x0hbD54rzpae80UxmcCnPv2Ec3WH8L6N2a7AvqSfhAJa34pzqJ+AcLy4blcdlbwE8I7d8ds/rvoNFiowEf5EhOaP41ZOdtngzdIUN1byE01Vsij8l9Ai74iE5DUnFqmIajtbr2T0Ah9a5ovQmBBB5nyjhY0DBcV9Bj0B6R8swoDnd6NBDcdcxQNvPqzJfwHsqrOYNJVzn1/FQLj86sgwTumHa1WyEugKWO3tlgEuOb+4Y3SqaMQ/vgC7OV+cdKfRzCrK/H/XYkWliVt7Ct1fQ8sGkEyxBlxyemgfvXybJnrXmEx5YisIO4LFMAkpzqsKYNTNybEwDKZmOAN3geP2c2QxoGNGHZ0NcT9nXDmtwHFuO0x2OlPf/qYmdZmlFKIRqBbGNCOPaJ5YgwjuGtx0TqULwKpNlYuF9WkZz5VGvFo6C1MSMLnpupEGOqUBmaqzHmvTykPbaH03SR5J1pKIvHI/XVMcvaThT2UEl7xQhwY0OE/AwAgP0+dtm0G6Mt7I6F2oZ5x6S2gnFHlxb5ctrgC8CRavVkks+ehHM8iAeBIm8CMk4Cb9jFBWj6jXldhHZDOCnOaqVAFSOlTaqXD3ZPlSRW1iyN1vOrmHIMJsGKeb6PiaGz0a9omCE/HCewQkcqYJXBhaTHSRUF1oJL/+GCvw"
>>>
>>>
>>> sess = []
>>> for region in 'us-east-1 us-west-1'.split():
...     ses = Session(
...         aws_access_key_id=AWS_ACCESS_KEY_ID,
...         aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
...         aws_session_token=AWS_SESSION_TOKEN,
...         region_name=region
...     )
...     print(type(ses))
...     sess.append(ses)
...
<class 'boto3.session.Session'>
<class 'boto3.session.Session'>

Next, we loop over the sessions and run a describe directly off each session’s client.

>>> for ses in sess:
...     res = ses.client('rds').describe_option_groups()['OptionGroupsList']
...     print(len(res))
...
4
0
>>>
[1]+ Stopped python3
$ cat ../pgrep.bash
#!/bin/bash
lsof -n -p $(pgrep python3) | grep https
$ ../pgrep.bash
$

There are 4 option groups in us-east-1 and 0 in us-west-1. The describe ran directly off the client constructor. The client went out of scope once the describe finished, so no connection remains. Instead, let’s allocate a list of clients.

>>> from boto3.session import Session
>>>
>>> AWS_ACCESS_KEY_ID="ASIAW3NNBKAMBK2CUGNE"
>>> AWS_SECRET_ACCESS_KEY="8V7Qaama1Yvl6ADRfrFI0vNTegx+V56S3sY5/uX0"
>>> AWS_SESSION_TOKEN="IQoJb3JpZ2luX2VjENz//////////wEaCXVzLWVhc3QtMSJIMEYCIQDOnKeV4UGWCVxuSvmD+O1FPUJHbxZOCPa/njhGlxBo1gIhAIRky8kx0jUNl40dg0NKGHjn1aOjYKe9Hdhs6msAlmq7KokDCFQQARoMNDcxMTk4NjgzMTYwIgzXqUo5F7yN89PdSxkq5gL1HhFXOOjVa8Irz8fVwUazE4VAKVtYthND8bXkqTLm1BE3x0hbD54rzpae80UxmcCnPv2Ec3WH8L6N2a7AvqSfhAJa34pzqJ+AcLy4blcdlbwE8I7d8ds/rvoNFiowEf5EhOaP41ZOdtngzdIUN1byE01Vsij8l9Ai74iE5DUnFqmIajtbr2T0Ah9a5ovQmBBB5nyjhY0DBcV9Bj0B6R8swoDnd6NBDcdcxQNvPqzJfwHsqrOYNJVzn1/FQLj86sgwTumHa1WyEugKWO3tlgEuOb+4Y3SqaMQ/vgC7OV+cdKfRzCrK/H/XYkWliVt7Ct1fQ8sGkEyxBlxyemgfvXybJnrXmEx5YisIO4LFMAkpzqsKYNTNybEwDKZmOAN3geP2c2QxoGNGHZ0NcT9nXDmtwHFuO0x2OlPf/qYmdZmlFKIRqBbGNCOPaJ5YgwjuGtx0TqULwKpNlYuF9WkZz5VGvFo6C1MSMLnpupEGOqUBmaqzHmvTykPbaH03SR5J1pKIvHI/XVMcvaThT2UEl7xQhwY0OE/AwAgP0+dtm0G6Mt7I6F2oZ5x6S2gnFHlxb5ctrgC8CRavVkks+ehHM8iAeBIm8CMk4Cb9jFBWj6jXldhHZDOCnOaqVAFSOlTaqXD3ZPlSRW1iyN1vOrmHIMJsGKeb6PiaGz0a9omCE/HCewQkcqYJXBhaTHSRUF1oJL/+GCvw"
>>>
>>> rdss = []
>>> for region in 'us-east-1 us-west-1'.split():
...     rds = Session(
...         aws_access_key_id=AWS_ACCESS_KEY_ID,
...         aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
...         aws_session_token=AWS_SESSION_TOKEN,
...         region_name=region
...     ).client('rds')
...     rdss.append(rds)
...

Now loop over the clients.

>>> for rds in rdss:
...     res = rds.describe_option_groups()['OptionGroupsList']
...     print(len(res))
...
4
0
>>>
[1]+ Stopped python3
$ ../pgrep.bash
python3 6587 ec2-user 3u IPv4 26002 0t0 TCP 10.0.16.52:40794->52.119.197.147:https (ESTABLISHED)
python3 6587 ec2-user 4u IPv4 26004 0t0 TCP 10.0.16.52:39852->176.32.118.192:https (ESTABLISHED)

Because the list is in the outer scope, the clients remain in place after the describe finishes, and they remain connected to the AWS servers.

One Session maps to one account and region, but that Session can create multiple Clients, and each Client can connect to AWS. In conclusion, the boto3 Session is an object containing the credentials needed to create client objects that connect to AWS. The boto3 Session itself does not connect to the AWS server.
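
As a quick illustration (a sketch using the default credential chain rather than explicit keys), one Session can hand out several clients, each making its own connection:

from boto3.session import Session

ses = Session(region_name='us-east-1')   # credentials from the default chain

rds = ses.client('rds')     # each client connects on first use
ec2 = ses.client('ec2')

print(len(rds.describe_db_instances()['DBInstances']))
print(len(ec2.describe_instances()['Reservations']))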

Conclusion

When I started applying python to my database administration job, I ran across various APIs that used a “session” that was not a connection to the server. In the three APIs mentioned here, requests, SQL Alchemy, and boto3, the Session is not itself a connection, but it can connect and remain connected for a time. The SQL Alchemy Session can be used to create a connection to a database. The boto3 Session can create a client that connects to the AWS server.

Launch AMI by name in terraform

By Brian Fitzgerald

Introduction

Using Terraform, you can launch an AWS EC2 instance in any account and any region from a public AMI identified only by image name. Because you do not need to look up the image ID each time, your code can be simplified.

Approach

Red Hat account number

In the AWS console, notice that the Red Hat, Inc. account number is 309956199498.


AMI name

Find the Red Hat AMI image by name. In the console, navigate to EC2 > Images > AMIs. Filter on Owner: 309956199498, Architecture: 64-bit (x86), Virtualization type: HVM.

For example, “RHEL-7.9_HVM_GA-20200917-x86_64-0-Hourly2-GP2”. You could also search using the AWS command line:

$ aws ec2 describe-images --owners 309956199498 --filters "` cat filters.ami.json `" --region us-east-2

where filters.ami.json contains:

[
    {
        "Name": "architecture",
        "Values": [
             "x86_64"
        ]
    },
    {
        "Name": "virtualization-type",
        "Values": [
             "hvm"
        ]
    }
]

Terraform data source

Define a Terraform aws_ami data source.

ami-rhel.tf

data "aws_ami" "rhel" {
    most_recent = true
    owners = [
        "309956199498"
    ]
    filter {
        name   = "name"
        values = [
            "RHEL-7.9_HVM_GA-20200917-x86_64-0-Hourly2-GP2"
        ]
    }
}

aws instance resource

Use the data source in your Terraform aws_instance resource

provider "aws" {
    region      = var.region
}

resource "aws_instance" "your-ec2-rsrc" {
    ami           = data.aws_ami.rhel.image_id

and so on . . .

Launch the ec2 instance

Run “terraform apply”. Terraform determines the AMI ID. For example, ami-0d2bf41df19c4aac7.
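
For comparison, here is a sketch of the same lookup done directly with boto3; this is roughly the work the data source does for you:

import boto3

ec2 = boto3.client('ec2', region_name='us-east-2')
images = ec2.describe_images(
    Owners=['309956199498'],
    Filters=[{'Name': 'name',
              'Values': ['RHEL-7.9_HVM_GA-20200917-x86_64-0-Hourly2-GP2']}]
)['Images']

# The Terraform most_recent = true behavior: take the newest CreationDate.
newest = max(images, key=lambda im: im['CreationDate'])
print(newest['ImageId'])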

Conclusion

By using the Terraform AWS AMI data source, you can launch a public AMI from any account and region. The AMI ID is different for each account and region.

aws lambda publish-layer-version fails silently

By Brian Fitzgerald

Introduction

aws lambda publish-layer-version fails silently. Out of memory is the root cause. Changing the instance type fixes the problem.

Symptoms

aws lambda publish-layer-version produces no output. The exit status is nonzero.

[root@ip-172-31-62-89 layers]# aws lambda publish-layer-version --layer-name oracle-instant-client-layer --zip-file fileb://oracle-instant-client-layer.zip  --compatible-runtimes python3.7


[root@ip-172-31-62-89 layers]# echo $?
255

Investigation

Investigation using strace reveals an out of memory condition

[root@ip-172-31-62-89 layers]# uname -a
Linux ip-172-31-62-89.ec2.internal 4.14.186-146.268.amzn2.x86_64 #1 SMP Tue Jul 14 18:16:52 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
[root@ip-172-31-62-89 layers]# strace -f -o tr aws lambda publish-layer-version --layer-name oracle-instant-client-layer --zip-file fileb://oracle-instant-client-layer.zip  --compatible-runtimes python3.7
[root@ip-172-31-62-89 layers]# grep ENOMEM tr
3576  mmap(NULL, 272371712, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
3576  mmap(NULL, 272502784, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)

The instance type is t2.micro. Physical memory is 983 MB

[root@ip-172-31-62-89 layers]# curl -X GET http://169.254.169.254/latest/meta-data/instance-type                                                  t2.micro[root@ip-172-31-62-89 layers]#
[root@ip-172-31-62-89 layers]# free -m
              total        used        free      shared  buff/cache   available
Mem:            983          63         742           0         177         784
Swap:             0           0           0

Solution

Change the instance type to t2.small and re-run the command. The normal JSON output appears. The exit status is 0.

[root@ip-172-31-62-89 layers]# aws lambda publish-layer-version --layer-name oracle-instant-client-layer --zip-file fileb://oracle-instant-client-layer.zip  --compatible-runtimes python3.7
{
    "LayerVersionArn": "arn:aws:lambda:us-east-1:999999999999:layer:oracle-instant-client-layer:2",
    "Description": "",
    "CreatedDate": "2020-08-02T21:03:38.787+0000",
    "LayerArn": "arn:aws:lambda:us-east-1:999999999999:layer:oracle-instant-client-layer",
    "Content": {
        "CodeSize": 51069060,
        "CodeSha256": "B1DGnA385aL50A8mrKoq1FOsIsEtMerbhdYCwd485YA=",
        "Location": "https://prod-04-2014-layers. etc."
    },
    "Version": 2,
    "CompatibleRuntimes": [
        "python3.7"
    ]
}
[root@ip-172-31-62-89 layers]# echo $?
0

Physical memory is now 1991 MB.

[root@ip-172-31-62-89 layers]# curl -X GET http://169.254.169.254/latest/meta-data/instance-type
t2.small
[root@ip-172-31-62-89 layers]# free -m
              total        used        free      shared  buff/cache   available
Mem:           1991          63        1712           0         215        1788
Swap:             0           0           0

Conclusion

An AWS CLI command on EC2 produced no output and failed silently. Investigation uncovered an out-of-memory condition, which was fixed by upgrading the instance type.

EC2 maximum number of volumes

Introduction

The documented AWS EC2 EBS volume attachment limit is 27. The attachment limit affects the maximum number of Oracle ASM disks.

EC2 volume limit

Amazon AWS documents a 27-volume attachment limit. I have found that an attempt to create a t2 instance with 28 EBS volumes will abort.

Attempting to create an m5 (Nitro) instance with 28 or more volumes will hang.

The attachment limit is 28, including network interfaces, volumes, and instance store volumes. EC2 instances with one network interface can have up to 27 volumes attached.
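
If you want to check how close your instances are to the limit, a short boto3 sketch (my own illustration, not AWS tooling) can count block device attachments per instance:

import boto3

ec2 = boto3.client('ec2')
for page in ec2.get_paginator('describe_instances').paginate():
    for resv in page['Reservations']:
        for inst in resv['Instances']:
            nvols = len(inst.get('BlockDeviceMappings', []))
            flag = '  <-- near the 27-volume guideline' if nvols >= 27 else ''
            print('%s %d%s' % (inst['InstanceId'], nvols, flag))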

EC2 instances with more than 27 volumes

I am aware of t2 EC2 systems with as many as 44 volumes attached. I have no information about how this was done, or what the customer’s support expectations are. I would be concerned about how such a system would respond to a change of instance type.

Oracle Database

In planning an Oracle database installation, you may need file system mounts as well. /u01, for example. The underlying volume counts against the maximum attachment count.

In laying out an ASM setup, consider a simple design, with as few ASM disks and ASM disk groups as needed.

Conclusion

Based on the available public information, I recommend limiting the number of EBS volume attachments to 27. Oracle database administrators might want to simplify their ASM implementation.

OUI on AIX Power in IBM cloud via SSH tunnel

By Brian Fitzgerald

Introduction

This is a worked example of how to display Oracle Universal Installer back to your Windows PC when you need to connect ssh through an additional server, such as a bastion. The destination operating system is AIX on POWER in IBM Cloud. The tunnel host is Linux.

Pattern

The ssh tunnel connection follows this serverfault answer:

How to enable SSH X11 forwarding through additional server?

There are several ways to do this, the one I prefer is to forward the ssh port:

First, connect to machine B and forward [localPort] to C:22 through B

A$ ssh -L [localPort]:C:22 B

Next, connect to C from A through this newly-created tunnel using [localPort], forwarding X11

A$ ssh -X -p [localPort] localhost

Now we can run X11 programs on C and have them display on A

C$ xclock

[localPort] can be any port that you are not already listening to on A, I often use 2222 for simplicity.

Prerequisites

You should have a working ssh setup before beginning. In this example, the private ssh key is saved on Windows as file “ibm_rsa”. Likewise, you will need a working X server, such as Xming.

In IBM Cloud, create an AIX POWER VM. Create a Linux VM to use as the intermediate host.

Tunnel account setup

On the Linux host:

[root@dal-vm ~]# groupadd tunl
[root@dal-vm ~]# useradd -g tunl tunl
[root@dal-vm ~]# cp -pr  ~/.ssh ~tunl/.ssh
[root@dal-vm ~]# chown -R tunl:tunl ~tunl/.ssh

Oracle account setup

On the AIX POWER host, enable X11 forwarding:

*******************************************************************************
*                                                                             *
*                                                                             *
*  Welcome to AIX Version 7.1!                                                *
*                                                                             *
*                                                                             *
*  Please see the README file in /usr/lpp/bos for information pertinent to    *
*  this release of the AIX Operating System.                                  *
*                                                                             *
*                                                                             *
*******************************************************************************
# bash
bash-4.3# ed /etc/ssh/sshd_config
3233
1,$s/#X11Forwarding no/X11Forwarding yes/g
w
3207
q
bash-4.3# stopsrc -s sshd
0513-044 The sshd Subsystem was requested to stop.
bash-4.3# startsrc -s sshd
0513-059 The sshd Subsystem has been started. Subsystem PID is 13107376.
bash-4.3#

Create the oracle account:

bash-4.3# mkgroup oinstall
bash-4.3# mkgroup dba
bash-4.3# useradd -g oinstall -G dba oracle
bash-4.3# mkdir ~oracle
bash-4.3# cp -pr ~/.ssh ~oracle/.ssh
bash-4.3# chown -R oracle:oinstall ~oracle
bash-4.3#

Create the tunnel

On Windows, connect to the Linux host with the needed tunneling arguments.

C:>ssh -i ibm_rsa -N -L 3333:52.117.58.66:22 tunl@169.61.227.202
The authenticity of host '169.61.227.202 (169.61.227.202)' can't be established.
ECDSA key fingerprint is SHA256:gCHZnnBtodihB75yPqIZ21Cbdq/+IAWbsCr4zRe5MTw.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '169.61.227.202' (ECDSA) to the list of known hosts.

Note that this is not an interactive session. No Linux prompt appears.

Log on to the AIX POWER oracle account

In Windows, set the DISPLAY variable. Log on to AIX as oracle via the local tunnel port:

C:>set DISPLAY=localhost:0.0

C:>ssh -i ibm_rsa -Y -p 3333 oracle@localhost
The authenticity of host '[localhost]:3333 ([::1]:3333)' can't be established.
RSA key fingerprint is SHA256:28Wh/Inx/YBDvPhIYN+VyEZ8b903cXtKzA83KEnv3bU.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '[localhost]:3333' (RSA) to the list of known hosts.
Warning: No xauth data; using fake authentication data for X11 forwarding.
Last login: Tue Nov 19 11:43:19 2019 on ssh from 169.61.227.202
*******************************************************************************
*                                                                             *
*                                                                             *
*  Welcome to AIX Version 7.1!                                                *
*                                                                             *
*                                                                             *
*  Please see the README file in /usr/lpp/bos for information pertinent to    *
*  this release of the AIX Operating System.                                  *
*                                                                             *
Last login: Tue Nov 19 11:43:19 2019 on ssh from 169.61.227.202
*******************************************************************************
*                                                                             *
*                                                                             *
*  Welcome to AIX Version 7.1!                                                *
*                                                                             *
*                                                                             *
*  Please see the README file in /usr/lpp/bos for information pertinent to    *
*  this release of the AIX Operating System.                                  *
*                                                                             *
*                                                                             *
*******************************************************************************
1356-364 /usr/bin/X11/xauth:  creating new authority file /home/oracle/.Xauthority

In AIX, set LC_ALL. Optionally, test xterm. Unzip the software, then start Oracle Universal Installer:

-bash-4.3$ export LC_ALL=C
-bash-4.3$ xterm
-bash-4.3$ cd /opt/app/oracle/product/19.3.0/dbhome_1/
-bash-4.3$ unzip -q /opt/app/download/AIX.PPC64_193000_db_home.zip
-bash-4.3$ ./runInstaller

********************************************************************************

Your platform requires the root user to perform certain pre-installation
OS preparation.  The root user should run the shell script 'rootpre.sh' before
you proceed with Oracle installation. The rootpre.sh script can be found at:
/opt/app/oracle/product/19.3.0/dbhome_1/clone/rootpre.sh

Answer 'y' if root has run 'rootpre.sh' so you can proceed with Oracle
installation.
Answer 'n' to abort installation and then ask root to run 'rootpre.sh'.

********************************************************************************

Has 'rootpre.sh' been run by root in this machine? [y/n] (n)
y
Launching Oracle Database Setup Wizard...

Conclusion

This has been a complete, worked example of starting Oracle Universal Installer, an X11 client, on an AIX POWER virtual machine in the IBM Cloud. The ssh session was established via a tunnel on a Linux virtual machine.

pagination and python 3.8 assignment expressions in AWS boto3

By Brian Fitzgerald

Introduction

AWS query operations return their results all at once, or in smaller, manageable pages. boto3 provides the programmer with two ways to manage the retrieval of paginated results.

Python version 3.8 introduces assignment expressions, a way to assign to variables within an expression using the notation := expr. PEP-572 explains how assignment expressions are useful in control statements, such as if and while, and in comprehensions. Assignment expressions can be used to improve coding style in pagination code.

describe

To query an Amazon AWS service, one calls an API using a procedure with a name beginning with “describe”, or sometimes “list” or “get”. The result is a dictionary with at least two elements, namely the payload, and ResponseMetadata. If more results are pending, NextToken will appear. The payload is your query result and consists of an array of objects. Examples of payload data include EC2 instance Reservations, EC2 EBS Volumes, and RDS DBInstances.

Using python API boto3, you can write:

import boto3
ec2 = boto3.client('ec2')
resp = { 'NextToken' : '' }
while 'NextToken' in resp:
    resp = ec2.describe_instances(
        NextToken = resp['NextToken']
    )
    for resv in resp['Reservations']:
        for inst in resv['Instances']:
            print ( inst['InstanceId'])

resp is a dict, and appears four times in the above code:

  • initialization with a dummy NextToken
  • returned from describe_instances
  • used to get describe_instances argument NextToken
  • used to get Reservations

The dictionary keys are Reservations, ResponseMetadata, and when needed, NextToken.

python 3.8 assignment expression

In python 3.8, we can assign to a variable, and then use the assignment as an expression. Specifically, we can assign resp, and access element Reservations.

import boto3
ec2 = boto3.client('ec2')
resp = { 'NextToken' : '' }
while 'NextToken' in resp:
    for resv in (resp := ec2.describe_instances(
        NextToken = resp['NextToken']
    ))['Reservations']:
        for inst in resv['Instances']:
            print ( inst['InstanceId'])

There is one fewer line. Some may find the revised code appealing.

paginator

AWS boto3 provides paginators, python iterators, that handle NextToken for you.

import boto3
ec2 = boto3.client('ec2')
pagr = ec2.get_paginator('describe_instances')
for page in pagr.paginate():
    for resv in page['Reservations']:
        for inst in resv['Instances']:
            print( inst['InstanceId'])

There are two fewer lines. NextToken does not appear in the code.

Paginate returns the same results as the prior “describe” type methods. The returned value is a dict, again with keys Reservations, ResponseMetadata, and when needed, NextToken. However, the returned NextToken has no role in the new code.
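
A quick check (not part of the original code) confirms the shape of a page:

import boto3

ec2 = boto3.client('ec2')
for page in ec2.get_paginator('describe_instances').paginate():
    # Non-final pages also carry NextToken, but paginate() consumes it for you.
    print(sorted(page.keys()))
    break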

paginator relationship to API call

A debug trace of describe_instances() shows an underlying call to client.py, line 357, function _api_call().

(Pdb) where
  /usr/lib64/python3.7/bdb.py(585)run()
-> exec(cmd, globals, locals)
  <string>(1)<module>()
  /home/ec2-user/git/aws/insts.desc.3.7.py(8)<module>()
-> NextToken = resp['NextToken']
  /usr/local/lib/python3.7/site-packages/botocore/client.py(357)_api_call()
-> return self._make_api_call(operation_name, kwargs)

A debug trace of the paginate approach shows a call to the same point via paginate.py.

(Pdb) where
  /usr/lib64/python3.7/bdb.py(585)run()
-> exec(cmd, globals, locals)
  <string>(1)<module>()
  /home/ec2-user/git/aws/insts.page.3.7.py(6)<module>()
-> for page in pagr.paginate():
  /usr/local/lib/python3.7/site-packages/botocore/paginate.py(255)__iter__()
-> response = self._make_request(current_kwargs)
  /usr/local/lib/python3.7/site-packages/botocore/paginate.py(332)_make_request()
-> return self._method(**current_kwargs)
  /usr/local/lib/python3.7/site-packages/botocore/client.py(357)_api_call()

You can see where paginate is handling NextToken for you. For example, this code:

NextToken = resp['NextToken']

is handled behind the scenes as:

> /usr/local/lib/python3.7/site-packages/botocore/paginate.py(303)__iter__()
-> previous_next_token = next_token
(Pdb) p next_token
{'NextToken': 'eyJ2IjoiMiIsImMiOiJ5eHV0K011N2crQlBBaFhoSWU2SUpad0c3V3VaUFBvKzBPbDRIWFAvaXJSb3poeDFDNks3TkxGMkU5R1UxRjk4UlVnNFViRzNjSUlWWXhMbHk3ejU1Qjd1ZGhERHNBVktCR1g0cW5RZk9FdStZckViM0NjOXljV1p0SWplckhkV2ZISkNvc0NXdjhnMXA4RVBMWDFiVzNkS3k1NW5CdlZmUlhWUEpzeUZNbnhMS3VzdEo4eHFIWHRYNytpcEdWbHJKMFRqTlNLQ3A0Rk9VaEZGckdBMTVOYU44WGhvYkYyZVBBYjRrMVVaYXNFTCIsInMiOiIxIn0='}

The describe and paginate approaches are functionally equivalent. Neither one has a performance advantage. Using paginate leads to cleaner code.

limiting the results (cloud side)

You can limit the number of items returned thus:

...
for page in pagr.paginate(PaginationConfig={'MaxItems': 250}):
...

In that case, the total number of items returned will be at most 250. The limiting is done on the cloud side. The items could be returned across more than one page.
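
A self-contained version of the capped loop, for reference:

import boto3

ec2 = boto3.client('ec2')
pagr = ec2.get_paginator('describe_instances')

count = 0
for page in pagr.paginate(PaginationConfig={'MaxItems': 250}):
    count += len(page['Reservations'])
print('reservations returned: %d' % count)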

limiting the results (client side)

Suppose you want all the ec2 instance types that have memory less than or equal to 8 GB. Two problems:

  1. AWS pricing has no filter for memory.
  2. The AWS pricing filter has no “less than” operator.

In that case, you need to retrieve your products, and apply additional filtering in client side code.

#!/home/ec2-user/sw/python/3.8/bin/python3.8

from boto3 import client
from pfilt import Pfilt
from json import dumps, loads


class ResultsLimitExceededError(Exception):
    def __init__(self, limit):
        msg = 'results limit %s exceeded' % limit
        super(ResultsLimitExceededError, self).__init__(msg)


class Prices:

    @classmethod
    def instancetypes(cls):
        cli = client('pricing')
        pag = cli.get_paginator('get_products')

        max_products_in = 1000
        max_products_out = 100
        num_products_in = 0
        num_products_out = 0
        for page in pag.paginate(
                ServiceCode='AmazonEC2',
                Filters=Pfilt.filters,
                PaginationConfig={'MaxItems': max_products_in}
        ):
            print('page size %s' % len(page['PriceList']))
            for skitem in [loads(itm) for itm in page['PriceList']]:
                num_products_in += 1
                if (filtered_skitem := cls.client_filter(cls.enrich(skitem))) is not None:
                    if (num_products_out := num_products_out + 1) > max_products_out:
                        raise ResultsLimitExceededError(max_products_out)
                    cls.processitm(filtered_skitem)

        print('number of products in %s' % num_products_in)
        print('number of products out %s' % num_products_out)

    @classmethod
    def enrich(cls, skitem):
        eskitem = skitem
        product = skitem['product']
        memGB = float(product['attributes']['memory'].split()[0].replace(',', ''))
        eskitem['memGB'] = memGB
        return eskitem

    @classmethod
    def client_filter(cls, eskitem):
        maxmem = 8
        return eskitem if eskitem['memGB'] <= maxmem else None

    @classmethod
    def processitm(cls, skitem):
        pass


if __name__ == '__main__':
    Prices.instancetypes()

In this case, we set a cloud-side limit of 1000 records, and a client-side limit of 100 records.

Example output:

$ ./prices.pag.py
page size 100
page size 100
page size 39
number of products in 239
number of products out 33

You could manage your client-side memory by reducing the page size:

PaginationConfig={'MaxItems': max_products_in, 'PageSize': 50}

Setting PageSize does not affect the results. The output, then, is:

page size 50
page size 50
page size 50
page size 50
page size 39
number of products in 239
number of products out 33

To demonstrate the exception, we can change:

max_products_out = 20

In that case, we get:

$ ./prices.pag.py
Traceback (most recent call last):
  File "./prices.pag.py", line 62, in 
    Prices.instancetypes()
  File "./prices.pag.py", line 36, in instancetypes
    raise ResultsLimitExceededError(max_products_out)
__main__.ResultsLimitExceededError: results limit 20 exceeded

We used the new Python 3.8 assignment expression in two places:

if (filtered_skitem := cls.client_filter(cls.enrich(skitem))) is not None:
    if (num_products_out := num_products_out + 1) > max_products_out:

We assign filtered_skitem and then test it. Likewise, we assign num_products_out and test it.

pagination availability

Not all AWS operations can be paginated — it depends on the service. If an operation cannot be paginated, it will not return NextToken, and you may have to manage the whole result set in one response. You can use this script to find out whether an operation can be paginated.

#!/usr/bin/python

from boto3 import client
from argparse import ArgumentParser


class CanPag:
    args = None

    @classmethod
    def prs(cls):
        ap = ArgumentParser(
            description='Check whether an operation can paginate'
        )
        ap.add_argument(
            '--service', '-s', required=True,
            help='AWS service name (ec2, s3, pricing, etc.)'
        )
        ap.add_argument(
            '--operation', '-o', required=True,
            help='AWS service operation (describe_instances, etc.)'
        )

        cls.args = ap.parse_args()

    @classmethod
    def canpage(cls):
        cli = client(cls.args.service)
        cp = cli.can_paginate(cls.args.operation)
        print(
            '%s %s %s paginate.' % (
                cls.args.service,
                cls.args.operation,
                'can' if cp else 'cannot')
        )


if __name__ == '__main__':
    CanPag.prs()
    CanPag.canpage()

Examples:

$ ./can.paginate.py -s ec2 -o describe_instances
ec2 describe_instances can paginate.
$ ./can.paginate.py -s ec2 -o describe_volumes
ec2 describe_volumes can paginate.
$ ./can.paginate.py -s ec2 -o describe_images
ec2 describe_images cannot paginate.
$ ./can.paginate.py -s pricing -o get_products
pricing get_products can paginate.

Conclusion

I wrote this article for two reasons: To identify a good paginator coding practice, and to try out the new Python 3.8 assignment expression.

AWS queries are often returned in chunks, or pages. You could write your own code to manage the retrieval, but you are better off using the provided paginator. You can configure the paginator with MaxItems and PageSize. Use can_paginate() to find out which operations can be paginated.

The Python 3.8 assignment expression was demonstrated. Three cases were presented. In each case, the pattern was “assign and test”. The assigned variables in the examples were resp, filtered_skitem, and num_products_out. The assigned value was needed elsewhere in the routine.

AutoUpgrade 12c->19c with Transient Logical Standby

By Brian Fitzgerald

Introduction

Oracle Database 12.1 was upgraded to version 19.3 using the AutoUpgrade utility and a transient logical standby. Actual downtime was less than 10 seconds.

News about AutoUpgrade

For news about AutoUpgrade, please refer to Oracle product manager Mike Dietrich’s blog.

System Description

The system is Linux 7.7 on Amazon AWS EC2. Details appear in this table:

Attribute Value
cloud AWS
location US East (N. Virginia)
instanceType m5.large
tenancy shared
instanceFamily General purpose
physicalProcessor Intel Xeon Platinum 8175
clockSpeed 2.5 GHz
Number of CPUs 2
memGB 7.8 GiB
Storage EBS gp2
operatingSystem Linux
AMI RHEL-7.7_HVM_GA-20190723-x86_64-1-Hourly2-GP2
OS Vendor Red Hat, Inc.
Red Hat version 7.7
price per hour $0.096

System Configuration

Initially, the system is a Data Guard network with a physical standby.

Description Value
Original version 12.1.0.2
Target version 19.3.0
Primary EQTRD
Physical Standby XIENT
Instance type Restart
Schema trades

Non-default Database system configurations

Selected non-default instance parameters are:

name value
archive_lag_target 900
audit_file_dest /u01/app/oracle/admin/EQTRD/adump
audit_trail DB
compatible 12.1.0.2.0
control_files +DATA01/EQTRD/CONTROLFILE/current.259.1021685185, +RECO01/EQTRD/CONTROLFILE/current.313.1021685185
db_block_size 8192
db_create_file_dest +DATA01
db_create_online_log_dest_1 +DATA01
db_create_online_log_dest_2 +RECO01
db_name EQTRD
db_recovery_file_dest +RECO01
db_recovery_file_dest_size 30g
dg_broker_config_file1 +DATA01/EQTRD/dr1orcl.dat
dg_broker_config_file2 +RECO01/EQTRD/dr2orcl.dat
dg_broker_start TRUE
diagnostic_dest /u01/app/oracle
local_listener 127.0.0.1:1521
log_archive_config dg_config=(EQTRD,XIENT)
log_archive_dest_1 location=USE_DB_RECOVERY_FILE_DEST, valid_for=(ALL_LOGFILES, ALL_ROLES)
log_archive_dest_2 service="XIENT", ASYNC NOAFFIRM delay=0 optional compression=disable max_failure=0 max_connections=1 reopen=300 db_unique_name="XIENT" net_timeout=30, valid_for=(online_logfile,all_roles)
log_archive_dest_state_2 ENABLE
sga_max_size 1500m
sga_target 1500m
standby_file_management AUTO
use_large_pages ONLY

Existing Data Guard 12c physical standby

We start this article with a working Data Guard physical standby with these issues already handled on the primary and the standby:

  • network route
  • static listeners
  • archivelog
  • flashback on
  • forced logging
  • 3×500 MB online redo log groups
  • 4×500 MB standby redo log groups
  • audit directory
  • data guard broker
  • tested switchover
  • spfile moved to ASM
  • tnsnames.ora in 12c oracle home
  • Restart is setup
  • oratab is configured and pointing to the 12c home
  • srvctl tested

The standby spfile has been moved to ASM so that it will be unnecessary to move it after the upgrade, or a possible downgrade.

SQL> show parameter spfile

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
spfile                               string      +DATA01/XIENT/PARAMETERFILE/spfile.275.1021229119

Existing Data

Example table ordermessage has one row.

SQL> @ insert1.sql

1 row created.

SQL> commit;

Commit complete.
SQL> select ClOrdID from ordermessage order by TransactTime;

CLORDID
------------------------------
XL46W2NCVM22N02RQ2YK

Preparations

Careful preparation should be completed days ahead of the planned upgrade. AutoUpgrade Analyze and Fixups can be run and the results reviewed. The 19c oracle homes, including tnsnames.ora and orapw, should be prepared for the upgrade.

Download AutoUpgrade

Download the latest autoupgrade.jar

Install to the 19c oracle home on primary and standby

[oracle@ip-172-31-91-148 dbhome_1]$ alias oh
alias oh='cd $ORACLE_HOME'
[oracle@ip-172-31-91-148 dbhome_1]$ oh
[oracle@ip-172-31-91-148 dbhome_1]$ find . -name autoupgrade.jar -ls
76988898 3616 -rw-r--r-- 1 oracle oinstall 3702360 Oct 7 20:35 ./rdbms/admin/autoupgrade.jar

Set this alias for convenience

[oracle@ip-172-31-91-148 dbhome_1]$ AH=/u01/app/oracle/product/19.3.0/dbhome_1
[oracle@ip-172-31-91-148 dbhome_1]$ alias au='rlwrap $AH/jdk/bin/java -jar $AH/rdbms/admin/autoupgrade.jar'

Use rlwrap for CLI readline editing and history recall.

Check the AutoUpgrade version:

[oracle@ip-172-31-91-148 dbhome_1]$ au -version
build.hash 67fee5b
build.version 20190823
build.date 2019/08/23 18:08:47

Source AutoUpgrade configuration file

The source configuration file is eqtrd.12c.19c.conf. Notice that target_home need not appear in the source configuration file.

global.autoupg_log_dir=/u01/app/oracle/autoupgrade

upg1.dbname=EQTRD
upg1.start_time=now
upg1.source_home=/u01/app/oracle/product/12.1.0/dbhome_1
upg1.sid=EQTRD
upg1.log_dir=/u01/app/oracle/autoupgrade
upg1.upgrade_node=ip-172-31-88-93.ec2.internal
upg1.target_version=19.3

Analyze

Use AutoUpgrade to analyze for issues and needed fixups.

[oracle@ip-172-31-88-93 ~]$ au -config eqtrd.12c.19c.conf -mode analyze
AutoUpgrade tool launched with default options
+--------------------------------+
| Starting AutoUpgrade execution |
+--------------------------------+
1 databases will be analyzed
Type 'help' to list console commands
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+---------------+
|Job#|DB_NAME|    STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|        MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+---------------+
| 110|  EQTRD|PRECHECKS|PREPARING|RUNNING|19/10/11 14:38|     N/A|14:38:09|Loading DB info|
+----+-------+---------+---------+-------+--------------+--------+--------+---------------+
Total jobs 1
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+--------------+
|Job#|DB_NAME|    STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|       MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+--------------+
| 110|  EQTRD|PRECHECKS|PREPARING|RUNNING|19/10/11 14:38|     N/A|14:38:16|Remaining 6/72|
+----+-------+---------+---------+-------+--------------+--------+--------+--------------+
Total jobs 1
upg> Job 110 completed
------------------- Final Summary --------------------
Number of databases            [ 1 ]

Jobs finished successfully     [1]
Jobs failed                    [0]
Jobs pending                   [0]
------------- JOBS FINISHED SUCCESSFULLY -------------
Job 110 FOR EQTRD

Fixups

Use AutoUpgrade to run Fixups:

[oracle@ip-172-31-88-93 ~]$ au -config eqtrd.12c.19c.conf -mode fixups
AutoUpgrade tool launched with default options
+--------------------------------+
| Starting AutoUpgrade execution |
+--------------------------------+
1 databases will be processed
Type 'help' to list console commands
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+---------------+
|Job#|DB_NAME|    STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|        MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+---------------+
| 111|  EQTRD|PRECHECKS|PREPARING|RUNNING|19/10/11 14:43|     N/A|14:43:28|Loading DB info|
+----+-------+---------+---------+-------+--------------+--------+--------+---------------+
Total jobs 1
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+-------------+
|Job#|DB_NAME|    STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|      MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+-------------+
| 111|  EQTRD|PREFIXUPS|EXECUTING|RUNNING|19/10/11 14:43|     N/A|14:43:42|Remaining 3/3|
+----+-------+---------+---------+-------+--------------+--------+--------+-------------+
Total jobs 1
upg> Job 111 completed
------------------- Final Summary --------------------
Number of databases            [ 1 ]

Jobs finished successfully     [1]
Jobs failed                    [0]
Jobs pending                   [0]
------------- JOBS FINISHED SUCCESSFULLY -------------
Job 111 FOR EQTRD

Target AutoUpgrade configuration file

Prepare the target AutoUpgrade configuration file. Notice that dbname and sid are changed. upgrade_node must match the target hostname. The configuration file is xient.12c.19c.conf.

global.autoupg_log_dir=/u01/app/oracle/autoupgrade

upg1.dbname=XIENT
upg1.start_time=now
upg1.source_home=/u01/app/oracle/product/12.1.0/dbhome_1
upg1.target_home=/u01/app/oracle/product/19.3.0/dbhome_1
upg1.sid=XIENT
upg1.log_dir=/u01/app/oracle/autoupgrade
upg1.upgrade_node=ip-172-31-91-148.ec2.internal
upg1.target_version=19.3

Prepare 19c oracle homes

Prepare the 19c oracle homes with the password file and tnsnames.ora. Primary:

[oracle@ip-172-31-88-93 ~]$ alias oh
alias oh='cd $ORACLE_HOME'
[oracle@ip-172-31-88-93 ~]$ oh
[oracle@ip-172-31-88-93 dbhome_1]$ pwd
/u01/app/oracle/product/12.1.0/dbhome_1
[oracle@ip-172-31-88-93 dbhome_1]$ cp -p dbs/orapwEQTRD /u01/app/oracle/product/19.3.0/dbhome_1/dbs/
[oracle@ip-172-31-88-93 dbhome_1]$ cp -p network/admin/tnsnames.ora /u01/app/oracle/product/19.3.0/dbhome_1/network/admin/

standby:

[oracle@ip-172-31-91-148 ~]$ oh
[oracle@ip-172-31-91-148 dbhome_1]$ pwd
/u01/app/oracle/product/12.1.0/dbhome_1
[oracle@ip-172-31-91-148 dbhome_1]$ cp -p dbs/orapwXIENT /u01/app/oracle/product/19.3.0/dbhome_1/dbs/ 
[oracle@ip-172-31-91-148 dbhome_1]$ cp -p network/admin/tnsnames.ora /u01/app/oracle/product/19.3.0/dbhome_1/network/admin/

Create the Transient Logical Standby

For this exercise, I find it helpful to have four terminal windows open, two on the primary and two on the standby. One window per host is for tailing the alert log, and the other is for commands.

Data Guard configuration

Note the Data Guard configuration. At the standby:

SQL> select * from v$dataguard_config;

DB_UNIQUE_NAME  PARENT_DBUN     DEST_ROLE         CURRENT_SCN     CON_ID
--------------- --------------- ----------------- ----------- ----------
EQTRD           NONE            PRIMARY DATABASE      1868165          0
XIENT           EQTRD           PHYSICAL STANDBY      1868075          0

Disable the Data Guard broker configuration

We are not going to stop the Data Guard broker. Disabling the configuration freezes the current Data Guard system state.

DGMGRL> disable configuration
Disabled.

Standby is mounted

Note that the standby database is mounted:

SQL> select open_mode from v$database;

OPEN_MODE
--------------------
MOUNTED

Primary restore point

The primary restore point is required for the upgrade procedure later. Create it now:

SQL> create restore point upgrd1 guarantee flashback database;

Restore point created.

Cancel managed recovery

On the standby:

SQL> recover managed standby database cancel
Media recovery complete.

Create standby restore point

On the standby:

SQL> create restore point upgrd2 guarantee flashback database;

Restore point created.

Some DBAs create additional restore points at later steps.

Create the logical standby

For this step, it is helpful to watch two windows.

On the standby:

SQL> alter database recover to logical standby keep identity;

The session hangs …

On the primary:

SQL> exec dbms_logstdby.build;

PL/SQL procedure successfully completed.

SQL>

dbms_logstdby.build runs in approximately three seconds. Three more seconds later, the alter database finishes and the sqlplus prompt appears.

Database altered.

SQL>

Run standby apply

On the standby:

SQL> alter database open;

Database altered.

SQL> alter database start logical standby apply immediate;

Database altered.

SQL> select state from v$logstdby_state;

STATE
----------------------------------------------------------------
INITIALIZING

SQL> select state from v$logstdby_state;

STATE
----------------------------------------------------------------
INITIALIZING

SQL> select state from v$logstdby_state;

STATE
----------------------------------------------------------------
LOADING DICTIONARY

SQL> select state from v$logstdby_state;

STATE
----------------------------------------------------------------
LOADING DICTIONARY

SQL> select state from v$logstdby_state;

STATE
----------------------------------------------------------------
LOADING DICTIONARY

SQL> select state from v$logstdby_state;

STATE
----------------------------------------------------------------
APPLYING

SQL> select state from v$logstdby_state;

STATE
----------------------------------------------------------------
IDLE

SQL>

The standby log miner dictionary build may remain at INITIALIZING for more than a minute. It could take up to 5 minutes to reach IDLE.
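
Instead of re-running the query by hand, you could poll it from a small python sketch. The connection details below are placeholders, and python-oracledb is my choice for illustration, not part of the original procedure:

import time
import oracledb

# Placeholders: connect as sysdba to the standby.
conn = oracledb.connect(user='sys', password='change_me',
                        dsn='standby-host:1521/XIENT',
                        mode=oracledb.AUTH_MODE_SYSDBA)
cur = conn.cursor()

while True:
    cur.execute('select state from v$logstdby_state')
    state = cur.fetchone()[0]
    print(state)
    if state == 'IDLE':
        break
    time.sleep(10)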

Foreign archivelogs

Archivelogs identified as “foreign” may appear at the standby:

Wed Oct 16 02:21:23 2019
LOGMINER: End mining logfile for session 1 thread 1 sequence 33, +RECO01/XIENT/foreign_archivelog/EQTRD/2019_10_16/thread_1_seq_33.316.1021774867

Stop standby apply

When logical standby state reaches IDLE, stop standby apply.

On the primary:

SQL> alter system set log_archive_dest_state_2=defer scope=memory;

System altered.

From this point until the catchup, the primary continues to operate normally with users connected. Archived logs at the primary contain transactions that will be replayed at the standby later.

On the standby:

SQL> alter database stop logical standby apply;

Database altered.

Shutdown 12c

On the standby:

SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production

Upgrade

Upgrade the standby to 19c.

Setup the shell environment

Set the 19c environment.

On the standby:

[oracle@ip-172-31-91-148 ~]$ unset ORACLE_SID
[oracle@ip-172-31-91-148 ~]$ . oraenv
ORACLE_SID = [oracle] ?
ORACLE_HOME = [/home/oracle] ? /u01/app/oracle/product/19.3.0/dbhome_1
The Oracle base remains unchanged with value /u01/app/oracle
[oracle@ip-172-31-91-148 ~]$ export ORACLE_SID=XIENT

Upgrade the Restart

[oracle@ip-172-31-91-148 ~]$ srvctl upgrade database -database XIENT -oraclehome /u01/app/oracle/product/19.3.0/dbhome_1

startup upgrade

SQL> startup upgrade
ORACLE instance started.

Total System Global Area 1577054672 bytes
Fixed Size                  8896976 bytes
Variable Size             385875968 bytes
Database Buffers         1174405120 bytes
Redo Buffers                7876608 bytes
Database mounted.
Database opened.

Check the status. The following output is expected:

SQL> select logins, status from v$instance;

LOGINS     STATUS
---------- ------------
RESTRICTED OPEN MIGRATE

SQL> select open_mode from v$database;

OPEN_MODE
--------------------
READ WRITE

oratab

Note that the Oracle Agent has updated the Oracle home in oratab.

[oracle@ip-172-31-91-148 ~]$ grep XIENT /etc/oratab
XIENT:/u01/app/oracle/product/19.3.0/dbhome_1:N         # line added by Agent

Start screen

If you lose your terminal session during the upgrade, you lose direct control over the upgrade. To recover, you would need to search your global.autoupg_log_dir for the job state and work out how to resume, if necessary.

Protect your session with screen.

[oracle@ip-172-31-91-148 ~]$ screen

If you get disconnected, reconnect to screen:

[oracle@ip-172-31-91-148 ~]$ screen -r

You can detach from screen with ctrl-a-d.

Run AutoUpgrade

Use AutoUpgrade to upgrade the standby database.

On the standby, in screen:

[oracle@ip-172-31-91-148 ~]$ AH=/u01/app/oracle/product/19.3.0/dbhome_1
[oracle@ip-172-31-91-148 ~]$ alias au='rlwrap $AH/jdk/bin/java -jar $AH/rdbms/admin/autoupgrade.jar'
[oracle@ip-172-31-91-148 ~]$ au -config xient.12c.19c.conf -mode upgrade
AutoUpgrade tool launched with default options
+--------------------------------+
| Starting AutoUpgrade execution |
+--------------------------------+
1 databases will be processed
Type 'help' to list console commands
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+-----------+
|Job#|DB_NAME|    STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|    MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+-----------+
| 105|  XIENT|DBUPGRADE|EXECUTING|RUNNING|19/10/12 02:47|     N/A|02:48:05|0%Upgraded |
+----+-------+---------+---------+-------+--------------+--------+--------+-----------+
Total jobs 1
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
|Job#|DB_NAME|    STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|     MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
| 105|  XIENT|DBUPGRADE|EXECUTING|RUNNING|19/10/12 02:47|     N/A|02:51:08|12%Upgraded |
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
Total jobs 1
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
|Job#|DB_NAME|    STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|     MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
| 105|  XIENT|DBUPGRADE|EXECUTING|RUNNING|19/10/12 02:47|     N/A|03:00:14|49%Upgraded |
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
Total jobs 1
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
|Job#|DB_NAME|    STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|     MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
| 105|  XIENT|DBUPGRADE|EXECUTING|RUNNING|19/10/12 02:47|     N/A|03:09:21|92%Upgraded |
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
Total jobs 1
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+-----------+
|Job#|DB_NAME|    STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|    MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+-----------+
| 105|  XIENT|DBUPGRADE|EXECUTING|RUNNING|19/10/12 02:47|     N/A|03:12:52|0%Compiled |
+----+-------+---------+---------+-------+--------------+--------+--------+-----------+
Total jobs 1
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
|Job#|DB_NAME|    STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|     MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
| 105|  XIENT|DBUPGRADE|EXECUTING|RUNNING|19/10/12 02:47|     N/A|03:15:56|80%Compiled |
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
Total jobs 1
upg> lsj
+----+-------+---------+---------+--------+--------------+--------+--------+-------------------+
|Job#|DB_NAME|    STAGE|OPERATION|  STATUS|    START_TIME|END_TIME| UPDATED|            MESSAGE|
+----+-------+---------+---------+--------+--------------+--------+--------+-------------------+
| 105|  XIENT|DBUPGRADE|EXECUTING|FINISHED|19/10/12 02:47|     N/A|03:18:21|Restarting Database|
+----+-------+---------+---------+--------+--------------+--------+--------+-------------------+
Total jobs 1
upg> lsj
+----+-------+----------+---------+-------+--------------+--------+--------+-------------+
|Job#|DB_NAME|     STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|      MESSAGE|
+----+-------+----------+---------+-------+--------------+--------+--------+-------------+
| 105|  XIENT|POSTFIXUPS|EXECUTING|RUNNING|19/10/12 02:47|     N/A|03:22:21|Remaining 1/3|
+----+-------+----------+---------+-------+--------------+--------+--------+-------------+
Total jobs 1
upg> Job 105 completed
------------------- Final Summary --------------------
Number of databases            [ 1 ]

Jobs finished successfully     [1]
Jobs failed                    [0]
Jobs pending                   [0]
------------- JOBS FINISHED SUCCESSFULLY -------------
Job 105 FOR XIENT

Exit screen

You may exit screen now. To exit screen, exit your shell.

Update the static listener

In the grid account, edit listener.ora. You can do this while you are waiting for the upgrade to finish.

SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (ORACLE_HOME = /u01/app/oracle/product/19.3.0/dbhome_1)
      (SID_NAME = XIENT)
    )
  )

Reload the listener

[grid@ip-172-31-91-148 admin]$ lsnrctl reload

LSNRCTL for Linux: Version 19.0.0.0.0 - Production on 12-OCT-2019 02:26:43

Copyright (c) 1991, 2019, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=ip-172-31-91-148.ec2.internal)(PORT=1521)))
The command completed successfully

Check the listener status

[grid@ip-172-31-91-148 admin]$ lsnrctl status

LSNRCTL for Linux: Version 19.0.0.0.0 - Production on 12-OCT-2019 02:26:48

Copyright (c) 1991, 2019, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=ip-172-31-91-148.ec2.internal)(PORT=1521)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 19.0.0.0.0 - Production
Start Date                11-OCT-2019 13:17:23
Uptime                    0 days 13 hr. 9 min. 25 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/19.3.0/grid/network/admin/listener.ora
Listener Log File         /u01/app/grid/diag/tnslsnr/ip-172-31-91-148/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=ip-172-31-91-148.ec2.internal)(PORT=1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1521)))
Services Summary...
Service "+ASM" has 1 instance(s).
  Instance "+ASM", status READY, has 1 handler(s) for this service...
Service "+ASM_DATA01" has 1 instance(s).
  Instance "+ASM", status READY, has 1 handler(s) for this service...
Service "+ASM_RECO01" has 1 instance(s).
  Instance "+ASM", status READY, has 1 handler(s) for this service...
Service "XIENT" has 1 instance(s).
  Instance "XIENT", status UNKNOWN, has 1 handler(s) for this service...
The command completed successfully

Continued operations on the 12c primary

Meanwhile, on the primary, which is still on 12c, users may continue to modify data:

SQL> select banner from v$version where banner like 'Oracle Database%';

BANNER
--------------------------------------------------------------------------------
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production

SQL> @ insert2.sql

1 row created.

SQL> commit;

Commit complete.

SQL> select ClOrdID from ordermessage order by TransactTime;

CLORDID
------------------------------
XL46W2NCVM22N02RQ2YK
DMYD9FS3FD3CYO77CU4R

Catch up the upgraded standby

Initial state

The standby is open and upgraded, but still lacks application data updates that were made to the primary during the upgrade.

SQL> select banner from v$version where banner like 'Oracle Database%';

BANNER
--------------------------------------------------------------------------------
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production

SQL> select ClOrdID from ordermessage order by TransactTime;

CLORDID
------------------------------
XL46W2NCVM22N02RQ2YK

With only one row, table ordermessage is out of date.

Resume transport

Resume transport from the primary to the standby:

Primary:

SQL> alter system set log_archive_dest_state_2=enable;

System altered.

Standby:

SQL> alter database start logical standby apply immediate;

Database altered.

Check for catchup on the standby:

Primary:

SQL> select current_scn from v$database;

CURRENT_SCN
-----------
1964941

Standby:

SQL> select applied_scn, latest_scn, mining_scn from v$logstdby_progress;

APPLIED_SCN LATEST_SCN MINING_SCN
----------- ---------- ----------
    1964938    1964946    1964944
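
Rather than re-running these queries by hand, you can poll until the standby catches up to an SCN captured on the primary. A minimal sketch, assuming OS authentication ("/ as sysdba") on the standby host; the script name, and the 30-second interval, are arbitrary. Pass the primary's current_scn as the argument:

#!/bin/bash
# wait_catchup.sh (hypothetical): poll v$logstdby_progress until
# APPLIED_SCN reaches the SCN captured on the primary.
target=${1:?usage: wait_catchup.sh <primary_current_scn>}
while :; do
  applied=$(sqlplus -s / as sysdba <<'EOF' | tr -d '[:space:]'
set heading off feedback off pagesize 0
select applied_scn from v$logstdby_progress;
EOF
)
  echo "applied_scn=$applied target=$target"
  [ "$applied" -ge "$target" ] && break
  sleep 30
done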

Check switchover status.

On the primary:

SQL> select database_role, switchover_status from v$database;

DATABASE_ROLE    SWITCHOVER_STATUS
---------------- --------------------
PRIMARY          TO STANDBY

Until now, the application was operating normally on the original primary. Primary application data is now caught up on the original standby.

Check the application data on the standby:

SQL> select ClOrdID from ordermessage order by TransactTime;

CLORDID
------------------------------
XL46W2NCVM22N02RQ2YK
DMYD9FS3FD3CYO77CU4R

The standby data is up to date!

Now it is time to quiesce the application. Otherwise, changes could be lost.
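
One way to confirm the application is quiesced is to look for remaining user sessions on the original primary; a sketch, assuming "/ as sysdba":

[oracle@ip-172-31-88-93 ~]$ sqlplus -s / as sysdba <<'EOF'
-- rows returned here are user sessions still connected (excluding SYS)
select username, machine, count(*) sessions
  from v$session
 where username is not null
   and username <> 'SYS'
 group by username, machine;
EOF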

Switchover

We’re going to switch over the application to the Transient Logical Standby. This is the actual outage. Get your stopwatch ready!

Original primary:

SQL> alter database commit to switchover to logical standby;

Database altered.

Start your stopwatch!

The statement “alter database commit to switchover to logical standby” can run for about 4 seconds.

Original standby:

SQL> select database_role, switchover_status from v$database;

DATABASE_ROLE    SWITCHOVER_STATUS
---------------- --------------------
LOGICAL STANDBY  TO PRIMARY
SQL> alter database commit to switchover to logical primary;

Database altered.

The statement “alter database commit to switchover to logical primary” can run for 15 seconds or more.

Stop!

The first outage is over. Users may connect to the new logical primary.

Flashback the original primary

The original primary is going to become the physical standby.

SQL> startup force mount
ORACLE instance started.

Total System Global Area 1577058304 bytes
Fixed Size                  2924832 bytes
Variable Size             503320288 bytes
Database Buffers         1056964608 bytes
Redo Buffers               13848576 bytes
Database mounted.
SQL> flashback database to restore point upgrd1;

Flashback complete.

Convert the original primary to physical standby

SQL> alter database convert to physical standby;

Database altered.

Shut down original primary

SQL> shutdown abort
ORACLE instance shut down.

Environment

Set the 19c environment:

[oracle@ip-172-31-88-93 ~]$ unset ORACLE_SID
[oracle@ip-172-31-88-93 ~]$ . oraenv
ORACLE_SID = [oracle] ?
ORACLE_HOME = [/home/oracle] ? /u01/app/oracle/product/19.3.0/dbhome_1
The Oracle base remains unchanged with value /u01/app/oracle
[oracle@ip-172-31-88-93 ~]$ export ORACLE_SID=EQTRD

Upgrade original primary Restart

[oracle@ip-172-31-88-93 ~]$ srvctl upgrade database -database EQTRD -oraclehome /u01/app/oracle/product/19.3.0/dbhome_1

startup mount

Mount the original primary.

[oracle@ip-172-31-88-93 ~]$ srvctl start database -database EQTRD -startoption MOUNT

Check oratab

Notice that the Oracle Agent updated the Oracle home in oratab.

[oracle@ip-172-31-88-93 ~]$ grep EQTRD /etc/oratab
EQTRD:/u01/app/oracle/product/19.3.0/dbhome_1:N         # line added by Agent

Static listener

Edit the static listener. In the grid account on the original primary:

SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (ORACLE_HOME = /u01/app/oracle/product/19.3.0/dbhome_1)
      (SID_NAME = EQTRD)
    )
  )

Reload the listener

[grid@ip-172-31-91-148 admin]$ lsnrctl reload

LSNRCTL for Linux: Version 19.0.0.0.0 - Production on 15-OCT-2019 00:32:52

Copyright (c) 1991, 2019, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=ip-172-31-91-148.ec2.internal)(PORT=1521)))
The command completed successfully

Check the listener status

[grid@ip-172-31-88-93 admin]$ lsnrctl status

LSNRCTL for Linux: Version 19.0.0.0.0 - Production on 16-OCT-2019 03:56:08

Copyright (c) 1991, 2019, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=ip-172-31-88-93.ec2.internal)(PORT=1521)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 19.0.0.0.0 - Production
Start Date                16-OCT-2019 00:25:28
Uptime                    0 days 3 hr. 30 min. 40 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/19.3.0/grid/network/admin/listener.ora
Listener Log File         /u01/app/grid/diag/tnslsnr/ip-172-31-88-93/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=ip-172-31-88-93.ec2.internal)(PORT=1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1521)))
Services Summary...
Service "+ASM" has 1 instance(s).
  Instance "+ASM", status READY, has 1 handler(s) for this service...
Service "+ASM_DATA01" has 1 instance(s).
  Instance "+ASM", status READY, has 1 handler(s) for this service...
Service "+ASM_RECO01" has 1 instance(s).
  Instance "+ASM", status READY, has 1 handler(s) for this service...
Service "EQTRD" has 2 instance(s).
  Instance "EQTRD", status UNKNOWN, has 1 handler(s) for this service...
  Instance "EQTRD", status READY, has 1 handler(s) for this service...
The command completed successfully

Enable the Data Guard configuration

Connect to the Transient Logical Standby:

[oracle@ip-172-31-91-148 ~]$ dgmgrl sys/Zystm.22@XIENT
DGMGRL for Linux: Release 19.0.0.0.0 - Production on Sat Oct 12 05:17:06 2019
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

Welcome to DGMGRL, type "help" for information.
Connected to "XIENT"
Connected as SYSDBA.

DGMGRL> enable configuration
Enabled.

Data Guard broker is going to detect that XIENT is the new primary, start log transport to EQTRD, and start managed standby recovery on EQTRD. Wait 5 to 10 minutes until recovery at the original primary is caught up.

A message such as the following should appear in the original primary alert log:

2019-10-17T02:24:01.118981+00:00
PR00 (PID:9485): Media Recovery Waiting for T-1.S-31 (in transit)
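
You can also watch the catch-up from the broker; a sketch, assuming SYSDBA OS authentication on the new primary host (the transport and apply lag in the output should trend toward zero):

[oracle@ip-172-31-91-148 ~]$ dgmgrl / "show database 'EQTRD'"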

Check the Data Guard configuration

DGMGRL> show configuration

Configuration - ORCL_CONFIG

  Protection Mode: MaxPerformance
  Members:
  XIENT - Primary database
    EQTRD - Physical standby database

Fast-Start Failover:  Disabled

Configuration Status:
SUCCESS   (status updated 20 seconds ago)

Errors

The following errors should not appear if no steps were missed.

ORA-1033: ORACLE initialization or shutdown in progress

DGMGRL> add database 'EQTRD' as connect identifier is 'EQTRD' maintained as physical;
Error: ORA-1033: ORACLE initialization or shutdown in progress

Failed.

There could be a password file issue. Copy the 12c orapw to the 19c home.

On the original primary:

[oracle@ip-172-31-88-93 ~]$ cp -p /u01/app/oracle/product/12.1.0/dbhome_1/dbs/orapwEQTRD /u01/app/oracle/product/19.3.0/dbhome_1/dbs/orapwEQTRD

Error: ORA-16698: member has a LOG_ARCHIVE_DEST_n parameter with SERVICE attribute set

DGMGRL> add database 'EQTRD' as connect identifier is 'EQTRD' maintained as physical;
Error: ORA-16698: member has a LOG_ARCHIVE_DEST_n parameter with SERVICE attribute set

Solution: clear log_archive_dest_2.

On the original primary:

SQL> alter system set log_archive_dest_2='';

System altered.

Switchover to the original primary

The next step is going to result in another application outage.

DGMGRL> switchover to EQTRD
Performing switchover NOW, please wait...
Operation requires a connection to database "EQTRD"
Connecting ...
Connected to "EQTRD"
Connected as SYSDBA.
New primary database "EQTRD" is opening...
Oracle Clusterware is restarting database "XIENT" ...
Connected to an idle instance.
Connected to an idle instance.
Connected to "XIENT"
Connected to "XIENT"
Switchover succeeded, new primary is "eqtrd"

The switchover completed in less than 40s.

Flashback to restore point

This section contains backout procedures that you can use on the primary or the standby.

Remove or disable Data Guard

You can try to flashback the primary and the standby and resume Data Guard. If you do, you may notice an error such as this one:

In the broker:

ERROR ORA-16700: the standby database has diverged from the primary database

In the standby alert log:

Thu Oct 17 03:53:40 2019
Errors in file /u01/app/oracle/diag/rdbms/xient/XIENT/trace/XIENT_mrp0_3527.trc:
ORA-19909: datafile 1 belongs to an orphan incarnation
ORA-01110: data file 1: '+DATA01/XIENT/DATAFILE/system.271.1021840189'

For the sake of demonstration, we will remove Data Guard and then flashback and open either the primary or the standby, but not both.

Remove the Data Guard broker configuration.

DGMGRL> disable configuration
Disabled.
DGMGRL> remove configuration
Removed configuration
DGMGRL>

Stop the brokers and disable transport.

On the primary:

SQL> alter system set log_archive_dest_2='';

System altered.

SQL> alter system set dg_broker_start=false;

System altered.

On the standby:

SQL> alter system set dg_broker_start=false;

System altered.

Flashback the primary

SQL> startup force mount
ORACLE instance started.

Total System Global Area 1577058304 bytes
Fixed Size                  2924832 bytes
Variable Size             503320288 bytes
Database Buffers         1056964608 bytes
Redo Buffers               13848576 bytes
Database mounted.
SQL> flashback database to restore point upgrd1;

Flashback complete.

Clear parameters that are invalid in 12.1. Shut down abort.

SQL> alter system reset "__unified_pga_pool_size" sid = 'EQTRD' scope=spfile;

System altered.

SQL> shutdown abort
ORACLE instance shut down.

Downgrade Restart.

[oracle@ip-172-31-88-93 ~]$ srvctl downgrade database -database EQTRD -oraclehome /u01/app/oracle/product/12.1.0/dbhome_1 -targetversion 12.1.0.2

Set the 12c environment.

[oracle@ip-172-31-88-93 ~]$ unset ORACLE_SID
[oracle@ip-172-31-88-93 ~]$ . oraenv
ORACLE_SID = [oracle] ?
ORACLE_HOME = [/home/oracle] ? /u01/app/oracle/product/12.1.0/dbhome_1
The Oracle base remains unchanged with value /u01/app/oracle
[oracle@ip-172-31-88-93 ~]$ export ORACLE_SID=EQTRD

Startup mount and open resetlogs.

SQL> startup mount
ORACLE instance started.

Total System Global Area 1577058304 bytes
Fixed Size                  2924832 bytes
Variable Size             503320288 bytes
Database Buffers         1056964608 bytes
Redo Buffers               13848576 bytes
Database mounted.
SQL> alter database open resetlogs;

Database altered.

Notice that the Agent has updated oratab to 12c.

[oracle@ip-172-31-88-93 ~]$ grep EQTRD /etc/oratab
EQTRD:/u01/app/oracle/product/12.1.0/dbhome_1:N         # line added by Agent

Check the application data. Changes that were made during the upgrade have been lost.

SQL> select ClOrdID from ordermessage order by TransactTime;

CLORDID
------------------------------
XL46W2NCVM22N02RQ2YK

Rollback of primary to 12c is done.

Flashback the standby

As an alternative, flashback the standby.

SQL> recover managed standby database cancel
Media recovery complete.
SQL> flashback database to restore point upgrd2;

Flashback complete.

SQL> alter database activate physical standby database;

Database altered.

Reset parameters that are invalid in 12.1. Shut down abort.

SQL> alter system reset "__unified_pga_pool_size" sid = 'XIENT' scope=spfile;

System altered.

SQL> shutdown abort
ORACLE instance shut down.

Downgrade Restart.

[oracle@ip-172-31-91-148 ~]$ srvctl downgrade database -database XIENT -oraclehome /u01/app/oracle/product/12.1.0/dbhome_1 -targetversion 12.1.0.2

Set the 12c environment. Startup.

[oracle@ip-172-31-91-148 ~]$ unset ORACLE_SID
[oracle@ip-172-31-91-148 ~]$ . oraenv
ORACLE_SID = [oracle] ?
ORACLE_HOME = [/home/oracle] ? /u01/app/oracle/product/12.1.0/dbhome_1
The Oracle base remains unchanged with value /u01/app/oracle
[oracle@ip-172-31-91-148 ~]$ export ORACLE_SID=XIENT

SQL> startup
ORACLE instance started.

Total System Global Area 1577058304 bytes
Fixed Size                  2924832 bytes
Variable Size             503320288 bytes
Database Buffers         1056964608 bytes
Redo Buffers               13848576 bytes
Database mounted.
Database opened.

Notice that the Agent has updated oratab.

[oracle@ip-172-31-91-148 ~]$ grep XIENT /etc/oratab
XIENT:/u01/app/oracle/product/12.1.0/dbhome_1:N         # line added by Agent

Check the application data.

SQL> select ClOrdID from ordermessage order by TransactTime;

CLORDID
------------------------------
XL46W2NCVM22N02RQ2YK

Conclusion to flashbacks

This section outlined two possible, independent rollback procedures. Flashback results in data loss. Further refinement is required.

Prior to beginning, we had moved the spfile to ASM.

This concludes the section on flashback.

Final steps

If there are no problems with the upgrade, continue.

Drop the restore point

Primary:

SQL> drop restore point UPGRD1;

Restore point dropped.

Standby:

SQL> drop restore point UPGRD2;

Restore point dropped.

Update compatible

Primary:

SQL> alter system set compatible = '19.3.0.0.0' scope=spfile;

System altered.

SQL> startup force
ORACLE instance started.

Total System Global Area 1577058304 bytes
Fixed Size                  2924832 bytes
Variable Size             503320288 bytes
Database Buffers         1056964608 bytes
Redo Buffers               13848576 bytes
Database mounted.
Database opened.

Standby:

SQL> alter system set compatible = '19.3.0.0.0' scope=spfile;

System altered.

SQL> startup force mount
ORACLE instance started.

Total System Global Area 1577058304 bytes
Fixed Size                  2924832 bytes
Variable Size             503320288 bytes
Database Buffers         1056964608 bytes
Redo Buffers               13848576 bytes
Database mounted.
SQL>

compatible error

You cannot change compatibility if a guaranteed restore point exists.

SQL> alter system set compatible = '19.3.0.0.0' scope=spfile;

System altered.

SQL> startup force mount
ORACLE instance started.

Total System Global Area 1577054672 bytes
Fixed Size                  8896976 bytes
Variable Size             536870912 bytes
Database Buffers         1023410176 bytes
Redo Buffers                7876608 bytes
ORA-38880: Cannot advance compatibility from 12.1.0.2.0 to 19.3.0.0.0 due to
guaranteed restore points

Solutions:

  1. Set compatible to its original value.
    SQL> alter system set compatible = '12.1.0.2.0' scope=spfile;
    
    System altered.
    
    SQL> startup force mount
    ORACLE instance started.
    
    Total System Global Area 1577054672 bytes
    Fixed Size                  8896976 bytes
    Variable Size             536870912 bytes
    Database Buffers         1023410176 bytes
    Redo Buffers                7876608 bytes
    Database mounted.
    SQL>
    

    or:

  2. Drop the restore point.
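
Before retrying, you can confirm that no guaranteed restore points remain; a sketch:

[oracle@ip-172-31-91-148 ~]$ sqlplus -s / as sysdba <<'EOF'
set heading off feedback off pagesize 0
-- this count must be 0 before compatible can be advanced
select count(*)
  from v$restore_point
 where guarantee_flashback_database = 'YES';
EOF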

Delete foreign archive logs

RMAN> list foreign archivelog all;

RMAN> delete noprompt foreign archivelog all;

Conclusion

Oracle Database was upgraded from version 12.1 to version 19.3 using AutoUpgrade and Transient Logical Standby. The actual application outage was 10 seconds.

Notes:

  • The initial state was a working Data Guard system with a physical standby.
  • Oracle Customer Support offers the rolling upgrade script physru.
  • In case of a problem with the upgrade, it is possible to flashback to a point in time before the upgrade.
  • This is a zero data loss upgrade.
  • There are two brief outages.
  • The rollback is not fully developed in this article.
  • As described here, downgrade to 12c results in data loss.

 

AutoUpgrade Oracle -> 19c

By Brian Fitzgerald

Introduction

AutoUpgrade is a convenient utility for completing multiple database upgrades in parallel. AutoUpgrade was released undocumented in Oracle 18c, and was documented for the first time in the 19c manual:

The AutoUpgrade utility identifies issues before upgrades, deploys upgrades, performs postupgrade actions, and starts the upgraded Oracle Database.

Use AutoUpgrade to upgrade Oracle Database from version 11.2.0.4 and up. To use AutoUpgrade, prepare a simple configuration file and run autoupgrade.jar. Behind the scenes, AutoUpgrade runs all steps necessary to complete the upgrade. AutoUpgrade runs on a single host. If you want to migrate to a separate, new host, run AutoUpgrade preparatory steps on the Source (original) host. Then, copy the database to the new host and run the Upgrade step on the target host.
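
As a sketch, the two-host flow looks like this (using the “au” alias defined later in this article, and a hypothetical configuration file db.conf):

# on the source host, from the source home, databases open
au -config db.conf -mode analyze
au -config db.conf -mode fixups

# copy or restore the database to the target host by your preferred method

# on the target host, after "startup upgrade" from the 19c home
au -config db.conf -mode upgrade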

News about AutoUpgrade

Mike Dietrich is Oracle’s product manager for AutoUpgrade. Refer to his blog for news about AutoUpgrade. Future topics or changes could include coverage of new Oracle versions, RAC, Restart, Data Guard, and migrating directly from non-CDB to CDB.

Demonstration platform description

AutoUpgrade is demonstrated in this blog article. The attributes of the demonstration system are:

Attribute            Value
location             US East (N. Virginia)
instanceType         c5.xlarge
tenancy              shared
instanceFamily       Compute optimized
physicalProcessor    Intel Xeon Platinum 8124M
clockSpeed           3.0 GHz
Number of CPUs       4
memGB                8 GiB
operatingSystem      Linux
AMI                  RHEL-7.7_HVM_GA-20190723-x86_64-1-Hourly2-GP2
OS Vendor            Red Hat, Inc.
Red Hat version      7.7
price per hour       $0.17

Demonstration database environment

Here is a database environment summary:

Description      Value
Source version   18.3.0
Target version   19.3.0
Instances        THING1 and THING2
Instance type    Restart

Concurrency

Be aware that AutoUpgrade is going to launch an upgrade process for each database in the configuration file. The result is going to be multiple concurrent upgrade processes and multiple concurrent active databases. If you are upgrading a few databases on a box with a lot of memory, you will have no problem.

If you have multiple databases on a smaller box, you may find that the load average reaches a high level and interactive shell response lags. If you find that AutoUpgrade produces an unmanageable load average, there is more than one solution. For example, you could temporarily shut down some databases while you upgrade others. If you are in the cloud, you can get better performance by temporarily running the host image on an instance type with more memory and CPU:

aws ec2 stop-instances --instance-ids i-00a836c352bb54daa
aws ec2 modify-instance-attribute --instance-id i-00a836c352bb54daa --instance-type '{"Value": "c5.xlarge"}'
aws ec2 start-instances --instance-ids i-00a836c352bb54daa

Finally, you may use the start_time parameter to schedule database upgrades at different future times.
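
For example, to start one upgrade immediately and defer the other, something like the following could go in the configuration file (the timestamp format shown is an assumption; check the AutoUpgrade documentation for your version):

upg1.start_time=now
upg2.start_time=30/12/2019 03:00:00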

Alias

Run the AutoUpgrade binary out of an Oracle home at the target version. You are going to be put into an interactive command line interface (CLI), and you are going to check on job status multiple times with the “lsj” command. For convenience, use rlwrap and create an alias.

[oracle@ip-172-31-88-93 ~]$ AH=/u01/app/oracle/product/19.3.0/dbhome_1
[oracle@ip-172-31-88-93 ~]$ alias au='rlwrap $AH/jdk/bin/java -jar $AH/rdbms/admin/autoupgrade.jar'

In this article, “au” refers to this alias.

Notice that in the alias, AH refers to an Oracle home where Java version 8 can be found. Java version 8 can be found in an Oracle 12c, 18c, or 19c Oracle home. Be sure to install the latest autoupgrade.jar into that Oracle home.

[oracle@ip-172-31-88-93 dbhome_1]$ $AH/jdk/bin/java -version
java version "1.8.0_201"
Java(TM) SE Runtime Environment (build 1.8.0_201-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.201-b09, mixed mode)

If you want to run AutoUpgrade Analyze or Fixups on an Oracle 11.2.0.4 database that is on a separate host, install Java version 8 and the latest autoupgrade.jar there.

Version

AutoUpgrade distributed with Oracle Database 19.3 is version 20190207.

[oracle@ip-172-31-88-93 ~]$ au -version
build.version 20190207
build.date 2019/02/07 12:35:56
build.label RDBMS_PT.AUTOUPGRADE_LINUX.X64_190205.1800

Version 20190207 contains a bug that leads to a failure of Deploy mode on ASM. You can get the most recent version from Oracle Customer Support: AutoUpgrade Tool (Doc ID 2485457.1). As of this writing, the most recent version is:

[oracle@ip-172-31-88-93 ~]$ au -version
build.hash 67fee5b
build.version 20190823
build.date 2019/08/23 18:08:47

Configuration file

A single configuration file will be used to upgrade all instances. You may use AutoUpgrade itself to create a sample configuration file.

[oracle@ip-172-31-88-93 ~]$ au -create_sample_file config
Created sample configuration file /home/oracle/sample_config.cfg

Rename and customize the configuration file. There is one global section and multiple database sections, as many as you want. Parameter dbname actually refers to db_unique_name.

Optional AutoUpgrade configuration parameters deliver extended functionality such as changing database initialization parameters on the fly during upgrade, custom scripts to run before or after upgrade, guaranteed restore point, fixup list modification, environment variables, pluggable database upgrades, optional utlrp run, and alternate tns_admin directory location.

[oracle@ip-172-31-88-93 ~]$ mkdir -p /u01/app/oracle/autoupgrade
[oracle@ip-172-31-88-93 ~]$ cat thing1.thing2.18c.19c.conf

#Global configurations
global.autoupg_log_dir=/u01/app/oracle/autoupgrade

upg1.dbname=THING1
upg1.start_time=now
upg1.source_home=/u01/app/oracle/product/18.3.0/dbhome_1
upg1.target_home=/u01/app/oracle/product/19.3.0/dbhome_1
upg1.sid=THING1
upg1.log_dir=/u01/app/oracle/autoupgrade
upg1.upgrade_node=ip-172-31-88-93.ec2.internal
upg1.target_version=19.3
#upg1.run_utlrp=yes
#upg1.timezone_upg=yes

upg2.dbname=THING2
upg2.start_time=now
upg2.source_home=/u01/app/oracle/product/18.3.0/dbhome_1
upg2.target_home=/u01/app/oracle/product/19.3.0/dbhome_1
upg2.sid=THING2
upg2.log_dir=/u01/app/oracle/autoupgrade
upg2.upgrade_node=ip-172-31-88-93.ec2.internal
upg2.target_version=19.3
#upg2.run_utlrp=[yes|no]
#upg2.timezone_upg=[yes|no]

Recommendation: Identify $ORACLE_BASE. Set global and all database log directories to $ORACLE_BASE/autoupgrade. In every case, AutoUpgrade will create a subdirectory. You will get a directory structure like this:

/u01/app/oracle/autoupgrade/cfgtoollogs/
/u01/app/oracle/autoupgrade/THING1/
/u01/app/oracle/autoupgrade/THING2/

Analyze mode

In AutoUpgrade Analyze mode, the database instances should be running out of the Source Oracle home with the databases open.

[oracle@ip-172-31-88-93 ~]$ au -config thing1.thing2.18c.19c.conf -mode analyze

Output:

Autoupgrade tool launched with default options
+--------------------------------+
| Starting AutoUpgrade execution |
+--------------------------------+
2 databases will be analyzed
Enter some command, type 'help' or 'exit' to quit
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+--------+
|JOB#|DB NAME|    STAGE|OPERATION| STATUS|    START TIME|END TIME| UPDATED| MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+--------+
| 110| THING1|PRECHECKS|PREPARING|RUNNING|19/09/26 14:06|     N/A|14:06:20|Starting|
| 111| THING2|PRECHECKS|PREPARING|RUNNING|19/09/26 14:06|     N/A|14:06:24|Starting|
+----+-------+---------+---------+-------+--------------+--------+--------+--------+
Total jobs 2
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+---------------+
|JOB#|DB NAME|    STAGE|OPERATION| STATUS|    START TIME|END TIME| UPDATED|        MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+---------------+
| 110| THING1|PRECHECKS|PREPARING|RUNNING|19/09/26 14:06|     N/A|14:06:41| Remaining 1/72|
| 111| THING2|PRECHECKS|PREPARING|RUNNING|19/09/26 14:06|     N/A|14:06:42|Remaining 67/72|
+----+-------+---------+---------+-------+--------------+--------+--------+---------------+
Total jobs 2
upg>
Job 110 for THING1 FINISHED
Job 111 for THING2 FINISHED

Locate the report. For example:

[oracle@ip-172-31-88-93 ~]$ find /u01/app/oracle/autoupgrade -name '*.html' -mmin -5
/u01/app/oracle/autoupgrade/global/cfgtoollogs/upgrade/auto/state.html
/u01/app/oracle/autoupgrade/THING1/110/prechecks/thing1_preupgrade.html
/u01/app/oracle/autoupgrade/THING2/111/prechecks/thing2_preupgrade.html

Review the report in a browser.

Prechecks

The report may contain Error, Warning, Recommend, and Info findings. You must address severity Error findings before attempting upgrade.

Use of Analyze mode to identify fixups

The Analyze mode runs prechecks that result in a list of changes that AutoUpgrade will make during the Fixups stage. The planned fixups appear in a “checklist” file in three formats. For example:

THING1/101/prechecks/thing1_checklist.cfg
THING1/101/prechecks/thing1_checklist.json
THING1/101/prechecks/thing1_checklist.xml

Prechecks leading to fixups are:

[oracle@ip-172-31-88-93 autoupgrade]$ < THING1/101/prechecks/thing1_checklist.json jq -r '.containers[].checks[] | select( .fixup_available == "YES" ).checkname '
DICTIONARY_STATS
POST_DICTIONARY
POST_FIXED_OBJECTS
PRE_FIXED_OBJECTS
OLD_TIME_ZONES_EXIST
MANDATORY_UPGRADE_CHANGES

The complete JSON checklist file is presented for reference:

{
  "dbname" : "THING1",
  "containers" : [ {
    "containername" : "THING1",
    "checks" : [ {
      "checkname" : "DICTIONARY_STATS",
      "stage" : "PRECHECKS",
      "fixup_available" : "YES",
      "runfix" : "YES",
      "severity" : "RECOMMEND",
      "action" : "Gather stale data dictionary statistics prior to database upgrade in off-peak time using:    EXECUTE DBMS_STATS.GATHER_DICTIONARY_STATS;",
      "broken rule" : "Dictionary statistics do not exist or are stale (not up-to-date).",
      "rule" : "Dictionary statistics help the Oracle optimizer find efficient SQL execution plans and are essential for proper upgrade timing. Oracle recommends gathering dictionary statistics in the last 24 hours before database upgrade.  For information on managing optimizer statistics, refer to the 18.0.0.0 Oracle Database Upgrade Guide."
    },
    {
      "checkname" : "POST_DICTIONARY",
      "stage" : "POSTCHECKS",
      "fixup_available" : "YES",
      "runfix" : "YES",
      "severity" : "RECOMMEND",
      "action" : "Gather dictionary statistics after the upgrade using the command:    EXECUTE DBMS_STATS.GATHER_DICTIONARY_STATS;",
      "broken rule" : "Oracle recommends gathering dictionary statistics after upgrade.",
      "rule" : "Dictionary statistics provide essential information to the Oracle optimizer to help it find efficient SQL execution plans. After a database upgrade, statistics need to be re-gathered as there can now be tables that have significantly changed during the upgrade or new tables that do not have statistics gathered yet."
    },
    {
      "checkname" : "POST_FIXED_OBJECTS",
      "stage" : "POSTCHECKS",
      "fixup_available" : "YES",
      "runfix" : "YES",
      "severity" : "RECOMMEND",
      "action" : "Gather statistics on fixed objects after the upgrade and when there is a representative workload on the system using the command:    EXECUTE DBMS_STATS.GATHER_FIXED_OBJECTS_STATS;",
      "broken rule" : "This recommendation is given for all preupgrade runs.",
      "rule" : "Fixed object statistics provide essential information to the Oracle optimizer to help it find efficient SQL execution plans.  Those statistics are specific to the Oracle Database release that generates them, and can be stale upon database upgrade.  For information on managing optimizer statistics, refer to the 18.0.0.0 Oracle Database Upgrade Guide."
    },
    {
      "checkname" : "PRE_FIXED_OBJECTS",
      "stage" : "PRECHECKS",
      "fixup_available" : "YES",
      "runfix" : "YES",
      "severity" : "RECOMMEND",
      "action" : "Gather statistics on fixed objects prior the upgrade.",
      "broken rule" : "None of the fixed object tables have had stats collected.",
      "rule" : "Gathering statistics on fixed objects, if none have been gathered yet, is recommended prior to upgrading.  For information on managing optimizer statistics, refer to the 18.0.0.0 Oracle Database Upgrade Guide."
    },
    {
      "checkname" : "OLD_TIME_ZONES_EXIST",
      "stage" : "POSTCHECKS",
      "fixup_available" : "YES",
      "runfix" : "YES",
      "severity" : "WARNING",
      "action" : "Upgrade the database time zone file using the DBMS_DST package.",
      "broken rule" : "The database is using time zone file version 31 and the target 19 release ships with time zone file version 32.",
      "rule" : "Oracle recommends upgrading to the desired (latest) version of the time zone file.  For more information, refer to 'Upgrading the Time Zone File and Timestamp with Time Zone Data' in the 19 Oracle Database Globalization Support Guide."
    },
    {
      "checkname" : "MANDATORY_UPGRADE_CHANGES",
      "stage" : "PRECHECKS",
      "fixup_available" : "YES",
      "runfix" : "YES",
      "severity" : "INFO",
      "action" : "Mandatory changes are applied automatically in the during_upgrade_pfile_dbname.ora file.  Some of these changes maybe present in the after_upgrade_pfile_dbname.ora file.  The during_upgrade_pfile_dbname.ora is used to start the database in upgrade mode. The after_upgrade_pfile_dbname.ora is used to start the database once the upgrade has completed successfully.",
      "broken rule" : "",
      "rule" : "Mandatory changes are required to perform the upgrade.  These changes are implemented in the during_ and after_upgrade_pfile_dbname.ora files."
    },
    {
      "checkname" : "RMAN_RECOVERY_VERSION",
      "stage" : "PRECHECKS",
      "fixup_available" : "NO",
      "runfix" : "N/A",
      "severity" : "INFO",
      "action" : "Check the Oracle Backup and Recovery User's Guide for information on how to manage an RMAN recovery catalog schema.",
      "broken rule" : "If you are using a version of the recovery catalog schema that is older than that required by the RMAN client version, then you must upgrade the catalog schema.",
      "rule" : "It is good practice to have the catalog schema the same or higher version than the RMAN client version you are using."
    },
    {
      "checkname" : "TABLESPACES_INFO",
      "stage" : "PRECHECKS",
      "fixup_available" : "NO",
      "runfix" : "N/A",
      "severity" : "INFO",
      "action" : "To help you keep track of your tablespace allocations, the following AUTOEXTEND tablespaces are expected to successfully EXTEND during the upgrade process.",
      "broken rule" : "",
      "rule" : "Minimum tablespace sizes for upgrade are estimates."
    },
    {
      "checkname" : "DIR_SYMLINKS",
      "stage" : "POSTCHECKS",
      "fixup_available" : "NO",
      "runfix" : "N/A",
      "severity" : "WARNING",
      "action" : "To identify directory objects with symbolic links in the path name, run $ORACLE_HOME/rdbms/admin/utldirsymlink.sql AS SYSDBA after upgrade. Recreate any directory objects listed, using path names that contain no symbolic links.",
      "broken rule" : "Some directory object path names may currently contain symbolic links.",
      "rule" : "Starting in Release 18c, symbolic links are not allowed in directory object path names used with BFILE data types, the UTL_FILE package, or external tables."
    }]
  }]
}
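
Given the same checklist layout, jq can also summarize the findings by severity; a sketch:

[oracle@ip-172-31-88-93 autoupgrade]$ < THING1/101/prechecks/thing1_checklist.json jq -r '.containers[].checks[].severity' | sort | uniq -c

The output is one count per severity level (INFO, RECOMMEND, WARNING, and so on).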

Fixup mode

Prior to upgrading, you may run the fixups that are recommended, possible, and available. You can see what fixups will run by running Analyze mode and reviewing the checklist file. In AutoUpgrade Fixup mode, the database instances should be running out of the Source Oracle home with the databases open.

[oracle@ip-172-31-88-93 ~]$ au -config thing1.thing2.18c.19c.conf -mode fixups

The console session:

[oracle@ip-172-31-88-93 ~]$ au -config thing1.thing2.18c.19c.conf -mode fixups
AutoUpgrade tool launched with default options
+--------------------------------+
| Starting AutoUpgrade execution |
+--------------------------------+
2 databases will be processed
Type 'help' to list console commands
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+---------------+
|Job#|DB_NAME|    STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|        MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+---------------+
| 102| THING2|PRECHECKS|PREPARING|RUNNING|19/09/29 01:04|     N/A|01:04:59|Loading DB info|
| 103| THING1|PRECHECKS|PREPARING|RUNNING|19/09/29 01:05|     N/A|01:05:03|Loading DB info|
+----+-------+---------+---------+-------+--------------+--------+--------+---------------+
Total jobs 2
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+-------------+
|Job#|DB_NAME|    STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|      MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+-------------+
| 102| THING2|PREFIXUPS|EXECUTING|RUNNING|19/09/29 01:04|     N/A|01:06:28|Remaining 3/4|
| 103| THING1|PREFIXUPS|EXECUTING|RUNNING|19/09/29 01:05|     N/A|01:05:21|Remaining 4/4|
+----+-------+---------+---------+-------+--------------+--------+--------+-------------+
Total jobs 2
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+-------------+
|Job#|DB_NAME|    STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|      MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+-------------+
| 102| THING2|PREFIXUPS|EXECUTING|RUNNING|19/09/29 01:04|     N/A|01:06:28|Remaining 3/4|
| 103| THING1|PREFIXUPS|EXECUTING|RUNNING|19/09/29 01:05|     N/A|01:06:36|Remaining 3/4|
+----+-------+---------+---------+-------+--------------+--------+--------+-------------+
Total jobs 2
upg>
upg> Job 102 completed
Job 103 completed
------------------- Final Summary --------------------
Number of databases            [ 2 ]

Jobs finished successfully     [2]
Jobs failed                    [0]
Jobs pending                   [0]
------------- JOBS FINISHED SUCCESSFULLY -------------
Job 102 FOR THING2
Job 103 FOR THING1

Space requirements

You must have sufficient space for archived redo logs. In this example, I upgraded two databases (6.7 GB), and consumed 9.1 GB in RECO. Diskgroup RECO01 was 40 GB and each of two databases had db_recovery_file_dest_size=20g.
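
To check capacity before you start, query the recovery area limit and current usage on each database; a sketch:

[oracle@ip-172-31-88-93 ~]$ sqlplus -s / as sysdba <<'EOF'
-- recovery area limit and current usage, in GB
select name,
       round(space_limit/1024/1024/1024, 1) limit_gb,
       round(space_used/1024/1024/1024, 1)  used_gb
  from v$recovery_file_dest;
EOF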

Upgrade vs Deploy mode

Two upgrade modes are available, Upgrade and Deploy.

Upgrade mode performs the actual upgrade. You should run the Analyze and Fixup steps first. Upgrade mode is an appropriate choice when the upgrade will be performed on a different host from the Source version. Upgrade mode requires that you manually issue “startup upgrade” in the new Oracle home.

Deploy mode performs the Analyze, Fixup, and Upgrade steps, and additional steps, on a single host in a single execution.

Upgrade mode

Startup upgrade

If you are going to use Upgrade mode, then switch the environment to the target Oracle home. Issue “startup upgrade”.

[ec2-user@ip-172-31-88-93 ~]$ cat /tmp/initTHING1.ora
spfile='+DATA01/THING1/PARAMETERFILE/spfile.276.1019927455'
[oracle@ip-172-31-88-93 ~]$ unset ORACLE_SID 
[oracle@ip-172-31-88-93 ~]$ . oraenv 
ORACLE_SID = [oracle] ? 
ORACLE_HOME = [/home/oracle] ? /u01/app/oracle/product/19.3.0/dbhome_1 
The Oracle base remains unchanged with value /u01/app/oracle 
[oracle@ip-172-31-88-93 ~]$ export ORACLE_SID=THING1 
[oracle@ip-172-31-88-93 ~]$ sysdba 
SQL*Plus: Release 19.0.0.0.0 - Production on Wed Sep 25 19:03:59 2019 
Version 19.3.0.0.0 

Copyright (c) 1982, 2019, Oracle.  All rights reserved. 

Connected to an idle instance. 
SQL> startup upgrade pfile='/tmp/initTHING1.ora'
ORACLE instance started.

Total System Global Area 1577054672 bytes
Fixed Size                  8896976 bytes
Variable Size             385875968 bytes
Database Buffers         1174405120 bytes
Redo Buffers                7876608 bytes
Database mounted.
Database opened.

SQL> Disconnected from Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0

[oracle@ip-172-31-88-93 ~]$ export ORACLE_SID=THING2
[ec2-user@ip-172-31-88-93 ~]$ cat /tmp/initTHING2.ora
spfile='+DATA01/THING2/PARAMETERFILE/spfile.257.1019928401'
[oracle@ip-172-31-88-93 ~]$ sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Wed Sep 25 19:06:08 2019
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup upgrade pfile='/tmp/initTHING2.ora'
ORACLE instance started.

Total System Global Area 1577054672 bytes
Fixed Size                  8896976 bytes
Variable Size             385875968 bytes
Database Buffers         1174405120 bytes
Redo Buffers                7876608 bytes
Database mounted.
Database opened.
SQL> Disconnected from Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0

Run Upgrade

Start AutoUpgrade. After the upgrade has been launched in the background, you will be dropped into CLI mode. Run “lsj” from time to time to check on status.

[oracle@ip-172-31-88-93 ~]$ au -config thing1.thing2.18c.19c.conf -mode upgrade
Autoupgrade tool launched with default options
+--------------------------------+
| Starting AutoUpgrade execution |
+--------------------------------+
2 databases will be processed
Enter some command, type 'help' or 'exit' to quit
upg> tasks
+--+---------------------+-------------+
|ID|                 Name|       Status|
+--+---------------------+-------------+
| 1|                 main|      WAITING|
|33|             jobs_mon|      WAITING|
|34|             mgr_help|TIMED_WAITING|
|35|             watchdog|TIMED_WAITING|
|36|              console|     RUNNABLE|
|37|         queue_reader|      WAITING|
|38|                cmd-0|      WAITING|
|39|        job_manager-0|      WAITING|
|40|        job_manager-1|      WAITING|
|42|           bqueue-108|      WAITING|
|50|       monitor_thing1|TIMED_WAITING|
|51|        catctl_thing1|      WAITING|
|52| abort_monitor_thing1|TIMED_WAITING|
|54|           async_read|     RUNNABLE|
|55|           bqueue-109|      WAITING|
|58|           async_read|     RUNNABLE|
+--+---------------------+-------------+
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+-------+
|JOB#|DB NAME|    STAGE|OPERATION| STATUS|    START TIME|END TIME| UPDATED|MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+-------+
| 108| THING1|DBUPGRADE|EXECUTING|RUNNING|19/09/25 19:09|     N/A|19:09:39|Running|
| 109| THING2|DBUPGRADE|EXECUTING|RUNNING|19/09/25 19:09|     N/A|19:09:42|Running|
+----+-------+---------+---------+-------+--------------+--------+--------+-------+
Total jobs 2
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
|JOB#|DB NAME|    STAGE|OPERATION| STATUS|    START TIME|END TIME| UPDATED|     MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
| 108| THING1|DBUPGRADE|EXECUTING|RUNNING|19/09/25 19:09|     N/A|19:12:46|13%Upgraded |
| 109| THING2|DBUPGRADE|EXECUTING|RUNNING|19/09/25 19:09|     N/A|19:12:50|14%Upgraded |
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
Total jobs 2
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
|JOB#|DB NAME|    STAGE|OPERATION| STATUS|    START TIME|END TIME| UPDATED|     MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
| 108| THING1|DBUPGRADE|EXECUTING|RUNNING|19/09/25 19:09|     N/A|19:18:48|41%Upgraded |
| 109| THING2|DBUPGRADE|EXECUTING|RUNNING|19/09/25 19:09|     N/A|19:18:52|42%Upgraded |
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
Total jobs 2
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
|JOB#|DB NAME|    STAGE|OPERATION| STATUS|    START TIME|END TIME| UPDATED|     MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
| 108| THING1|DBUPGRADE|EXECUTING|RUNNING|19/09/25 19:09|     N/A|19:21:49|49%Upgraded |
| 109| THING2|DBUPGRADE|EXECUTING|RUNNING|19/09/25 19:09|     N/A|19:21:53|49%Upgraded |
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
Total jobs 2
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
|JOB#|DB NAME|    STAGE|OPERATION| STATUS|    START TIME|END TIME| UPDATED|     MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
| 108| THING1|DBUPGRADE|EXECUTING|RUNNING|19/09/25 19:09|     N/A|19:34:21| 0%Compiled |
| 109| THING2|DBUPGRADE|EXECUTING|RUNNING|19/09/25 19:09|     N/A|19:33:12|94%Upgraded |
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
Total jobs 2
upg> lsj
+----+-------+----------+---------+-------+--------------+--------+--------+-------+
|JOB#|DB NAME|     STAGE|OPERATION| STATUS|    START TIME|END TIME| UPDATED|MESSAGE|
+----+-------+----------+---------+-------+--------------+--------+--------+-------+
| 108| THING1|POSTFIXUPS|EXECUTING|RUNNING|19/09/25 19:09|     N/A|19:38:39|       |
| 109| THING2|POSTFIXUPS|EXECUTING|RUNNING|19/09/25 19:09|     N/A|19:38:39|       |
+----+-------+----------+---------+-------+--------------+--------+--------+-------+
Total jobs 2
upg> lsj
+----+-------+----------+---------+-------+--------------+--------+--------+-------------+
|JOB#|DB NAME|     STAGE|OPERATION| STATUS|    START TIME|END TIME| UPDATED|      MESSAGE|
+----+-------+----------+---------+-------+--------------+--------+--------+-------------+
| 108| THING1|POSTFIXUPS|EXECUTING|RUNNING|19/09/25 19:09|     N/A|19:40:52|Remaining 1/3|
| 109| THING2|POSTFIXUPS|EXECUTING|RUNNING|19/09/25 19:09|     N/A|19:40:55|Remaining 1/3|
+----+-------+----------+---------+-------+--------------+--------+--------+-------------+
Total jobs 2
upg> lsj
+----+-------+----------+---------+-------+--------------+--------+--------+-------------+
|JOB#|DB NAME|     STAGE|OPERATION| STATUS|    START TIME|END TIME| UPDATED|      MESSAGE|
+----+-------+----------+---------+-------+--------------+--------+--------+-------------+
| 108| THING1|POSTFIXUPS|EXECUTING|RUNNING|19/09/25 19:09|     N/A|19:40:52|Remaining 1/3|
| 109| THING2|POSTFIXUPS|EXECUTING|RUNNING|19/09/25 19:09|     N/A|19:40:55|Remaining 1/3|
+----+-------+----------+---------+-------+--------------+--------+--------+-------------+
Total jobs 2
upg>
Job 108 for THING1 FINISHED
Job 109 for THING2 FINISHED

The CLI exits when all jobs are done.

Execution time

You can find out the execution time by reviewing the logs.

/u01/app/oracle/autoupgrade/global/cfgtoollogs/upgrade/auto/autoupgrade_user.log

2019-09-25 19:09:22.185 INFO Loading user config file metadata
2019-09-25 19:09:28.405 INFO The target_base parameter was updated from N/A to /u01/app/oracle due to finding a more accurate value.
2019-09-25 19:09:28.418 INFO The target_version parameter was updated from 19.3 to 19.3.0.0.0 due to finding a more accurate value.
2019-09-25 19:09:29.444 INFO Finished processing dbEntry upg1
2019-09-25 19:09:35.597 INFO The target_base parameter was updated from N/A to /u01/app/oracle due to finding a more accurate value.
2019-09-25 19:09:35.602 INFO The target_version parameter was updated from 19.3 to 19.3.0.0.0 due to finding a more accurate value.
2019-09-25 19:09:36.627 INFO Finished processing dbEntry upg2
2019-09-25 19:09:36.634 INFO
build.version:20190207
build.date:2019/02/07 12:35:56
build.label:RDBMS_PT.AUTOUPGRADE_LINUX.X64_190205.1800

2019-09-25 19:09:36.635 INFO Current settings Initialized
2019-09-25 19:09:36.658 INFO Starting
2019-09-25 19:30:41.216 INFO Could not determine the flashback info
2019-09-25 19:30:43.240 INFO Could not determine the flashback info
2019-09-25 19:32:41.213 INFO Could not determine the flashback info
2019-09-25 19:32:43.237 INFO Could not determine the flashback info
2019-09-25 19:36:41.220 INFO Could not determine the flashback info
2019-09-25 19:36:43.253 INFO Could not determine the flashback info
2019-09-25 19:37:40.213 INFO Could not determine the flashback info
2019-09-25 19:37:41.237 INFO Could not determine the flashback info
2019-09-25 19:41:40.213 INFO Could not determine the flashback info
2019-09-25 19:41:41.238 INFO Could not determine the flashback info
2019-09-25 19:42:41.212 INFO Could not determine the flashback info
2019-09-25 19:42:43.235 INFO Could not determine the flashback info
2019-09-25 19:44:15.396 INFO Closing

Elapsed time was 34 minutes.
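
If you prefer to compute the elapsed time instead of eyeballing it, a sketch using GNU date against the Starting and Closing lines:

log=/u01/app/oracle/autoupgrade/global/cfgtoollogs/upgrade/auto/autoupgrade_user.log
# take the first "Starting" and the last "Closing" timestamps
s=$(awk '$4 == "Starting" {print $1, $2; exit}' "$log")
e=$(awk '$4 == "Closing"  {print $1, $2}' "$log" | tail -1)
echo "elapsed minutes: $(( ( $(date -d "$e" +%s) - $(date -d "$s" +%s) ) / 60 ))"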

Deploy mode

Deploy mode completes all upgrade steps from soup to nuts: Analyze, Fixup, and Upgrade, plus additional steps.

Additional steps in Deploy mode

Deploy mode includes steps that are not covered by the Analyze, Fixup, and Upgrade modes.

A careful review of the documentation reveals that Deploy mode creates a guaranteed restore point. You must exercise diligence to remove the restore point after it is no longer required, or you may optionally configure drop_grp_after_upgrade=yes.

Deploy mode contains a drain step, during which AutoUpgrade drains database sessions from the source instance.

Preupgrade refers to checks of your system, including disk space.

The Postupgrade documentation refers to moving the source configuration file and starting the upgraded instance. However, Upgrade mode also starts up your upgraded instance.

Deploy mode runs available fixups that correct Warning, Recommend, and Info precheck findings. AutoUpgrade makes these changes without asking you. If you are using change management, be aware of, and document, all changes as required by your organization.

Run Deploy

Run deploy from the Source Oracle home with the databases opened normally. In other words, do not issue “startup upgrade” with Deploy mode.

[oracle@ip-172-31-88-93 ~]$ au -config thing1.thing2.18c.19c.conf -mode deploy
AutoUpgrade tool launched with default options
+--------------------------------+
| Starting AutoUpgrade execution |
+--------------------------------+
2 databases will be processed
Type 'help' to list console commands
upg> lsj
+----+-------+----------+---------+--------+--------------+--------+--------+---------+
|Job#|DB_NAME|     STAGE|OPERATION|  STATUS|    START_TIME|END_TIME| UPDATED|  MESSAGE|
+----+-------+----------+---------+--------+--------------+--------+--------+---------+
| 114| THING2|PREUPGRADE|EXECUTING| RUNNING|19/09/26 14:41|     N/A|14:41:48|         |
| 115| THING1|     SETUP|PREPARING|FINISHED|19/09/26 14:43|     N/A|14:41:47|Scheduled|
+----+-------+----------+---------+--------+--------------+--------+--------+---------+
Total jobs 2
upg> lsj
+----+-------+---------+---------+--------+--------------+--------+--------+-------------+
|Job#|DB_NAME|    STAGE|OPERATION|  STATUS|    START_TIME|END_TIME| UPDATED|      MESSAGE|
+----+-------+---------+---------+--------+--------------+--------+--------+-------------+
| 114| THING2|PREFIXUPS|EXECUTING| RUNNING|19/09/26 14:41|     N/A|14:42:25|Remaining 4/4|
| 115| THING1|    SETUP|PREPARING|FINISHED|19/09/26 14:43|     N/A|14:41:47|    Scheduled|
+----+-------+---------+---------+--------+--------------+--------+--------+-------------+
Total jobs 2
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+-------------+
|Job#|DB_NAME|    STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|      MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+-------------+
| 114| THING2|PREFIXUPS|EXECUTING|RUNNING|19/09/26 14:41|     N/A|14:42:25|Remaining 4/4|
| 115| THING1|PREFIXUPS|EXECUTING|RUNNING|19/09/26 14:43|     N/A|14:43:51|Remaining 4/4|
+----+-------+---------+---------+-------+--------------+--------+--------+-------------+
Total jobs 2
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+-------------+
|Job#|DB_NAME|    STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|      MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+-------------+
| 114| THING2|PREFIXUPS|EXECUTING|RUNNING|19/09/26 14:41|     N/A|14:45:58|Remaining 3/4|
| 115| THING1|PREFIXUPS|EXECUTING|RUNNING|19/09/26 14:43|     N/A|14:47:27|Remaining 3/4|
+----+-------+---------+---------+-------+--------------+--------+--------+-------------+
Total jobs 2
upg> tasks
+---+------------------+-------------+
| ID|              NAME|       Status|
+---+------------------+-------------+
|  1|              main|      WAITING|
| 40|          jobs_mon|      WAITING|
| 41|           console|     RUNNABLE|
| 42|      queue_reader|      WAITING|
| 43|             cmd-0|      WAITING|
| 54|     job_manager-0|      WAITING|
| 55|     job_manager-1|      WAITING|
| 58|        event_loop|TIMED_WAITING|
| 59|        bqueue-114|      WAITING|
|200|         exec_loop|      WAITING|
|201|        bqueue-115|      WAITING|
|337|        fixups-115|      WAITING|
|338|    rep_checks-115|TIMED_WAITING|
|340|    thing1-puifx-0|      WAITING|
|341|    thing1-puifx-1|      WAITING|
|353|          quickSQL|     RUNNABLE|
|388|THING2-steady-ts-0|     RUNNABLE|
|400|          quickSQL|   TERMINATED|
+---+------------------+-------------+
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+-----------+
|Job#|DB_NAME|    STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|    MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+-----------+
| 114| THING2|DBUPGRADE|EXECUTING|RUNNING|19/09/26 14:41|     N/A|14:51:53|0%Upgraded |
| 115| THING1|DBUPGRADE|EXECUTING|RUNNING|19/09/26 14:43|     N/A|14:52:30|0%Upgraded |
+----+-------+---------+---------+-------+--------------+--------+--------+-----------+
Total jobs 2
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
|Job#|DB_NAME|    STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|     MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
| 114| THING2|DBUPGRADE|EXECUTING|RUNNING|19/09/26 14:41|     N/A|14:57:58|21%Upgraded |
| 115| THING1|DBUPGRADE|EXECUTING|RUNNING|19/09/26 14:43|     N/A|14:58:35|18%Upgraded |
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
Total jobs 2
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
|Job#|DB_NAME|    STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|     MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
| 114| THING2|DBUPGRADE|EXECUTING|RUNNING|19/09/26 14:41|     N/A|15:01:01|21%Upgraded |
| 115| THING1|DBUPGRADE|EXECUTING|RUNNING|19/09/26 14:43|     N/A|15:01:37|21%Upgraded |
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
Total jobs 2
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
|Job#|DB_NAME|    STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|     MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
| 114| THING2|DBUPGRADE|EXECUTING|RUNNING|19/09/26 14:41|     N/A|15:07:06|37%Upgraded |
| 115| THING1|DBUPGRADE|EXECUTING|RUNNING|19/09/26 14:43|     N/A|15:07:42|37%Upgraded |
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
Total jobs 2
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
|Job#|DB_NAME|    STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|     MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
| 114| THING2|DBUPGRADE|EXECUTING|RUNNING|19/09/26 14:41|     N/A|15:13:10|49%Upgraded |
| 115| THING1|DBUPGRADE|EXECUTING|RUNNING|19/09/26 14:43|     N/A|15:13:46|49%Upgraded |
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
Total jobs 2
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
|Job#|DB_NAME|    STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|     MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
| 114| THING2|DBUPGRADE|EXECUTING|RUNNING|19/09/26 14:41|     N/A|15:22:17|75%Upgraded |
| 115| THING1|DBUPGRADE|EXECUTING|RUNNING|19/09/26 14:43|     N/A|15:22:53|75%Upgraded |
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
Total jobs 2
upg> tasks
+---+---------------------+-------------+
| ID|                 NAME|         Job#|
+---+---------------------+-------------+
|  1|                 main|      WAITING|
| 40|             jobs_mon|      WAITING|
| 41|              console|     RUNNABLE|
| 42|         queue_reader|      WAITING|
| 43|                cmd-0|      WAITING|
| 54|        job_manager-0|      WAITING|
| 55|        job_manager-1|      WAITING|
| 58|           event_loop|      WAITING|
| 59|           bqueue-114|      WAITING|
|200|            exec_loop|TIMED_WAITING|
|201|           bqueue-115|      WAITING|
|634|       monitor_thing2|TIMED_WAITING|
|635|        catctl_THING2|      WAITING|
|636| abort_monitor_THING2|TIMED_WAITING|
|637|           async_read|     RUNNABLE|
|649|       monitor_thing1|TIMED_WAITING|
|650|        catctl_THING1|      WAITING|
|651| abort_monitor_THING1|TIMED_WAITING|
|652|           async_read|     RUNNABLE|
|772|             quickSQL|     RUNNABLE|
+---+---------------------+-------------+
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
|Job#|DB_NAME|    STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|     MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
| 114| THING2|DBUPGRADE|EXECUTING|RUNNING|19/09/26 14:41|     N/A|15:28:21|91%Upgraded |
| 115| THING1|DBUPGRADE|EXECUTING|RUNNING|19/09/26 14:43|     N/A|15:28:58|91%Upgraded |
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
Total jobs 2
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
|Job#|DB_NAME|    STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|     MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
| 114| THING2|DBUPGRADE|EXECUTING|RUNNING|19/09/26 14:41|     N/A|15:38:14|92%Compiled |
| 115| THING1|DBUPGRADE|EXECUTING|RUNNING|19/09/26 14:43|     N/A|15:38:34|79%Compiled |
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
Total jobs 2
upg> lsj
+----+-------+---------+---------+--------+--------------+--------+--------+-------------------+
|Job#|DB_NAME|    STAGE|OPERATION|  STATUS|    START_TIME|END_TIME| UPDATED|            MESSAGE|
+----+-------+---------+---------+--------+--------------+--------+--------+-------------------+
| 114| THING2|DBUPGRADE|EXECUTING|FINISHED|19/09/26 14:41|     N/A|15:46:19|Restarting Database|
| 115| THING1|DBUPGRADE|EXECUTING| RUNNING|19/09/26 14:43|     N/A|15:46:40|       98%Compiled |
+----+-------+---------+---------+--------+--------------+--------+--------+-------------------+
Total jobs 2
upg> lsj
+----+-------+----------+---------+-------+--------------+--------+--------+-------------+
|Job#|DB_NAME|     STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|      MESSAGE|
+----+-------+----------+---------+-------+--------------+--------+--------+-------------+
| 114| THING2|POSTFIXUPS|EXECUTING|RUNNING|19/09/26 14:41|     N/A|15:51:05|Remaining 1/3|
| 115| THING1|POSTFIXUPS|EXECUTING|RUNNING|19/09/26 14:43|     N/A|15:51:29|Remaining 1/3|
+----+-------+----------+---------+-------+--------------+--------+--------+-------------+
Total jobs 2
upg> Job 114 completed
lsj
+----+-------+-----------+---------+--------+--------------+--------------+--------+-----------------+
|Job#|DB_NAME|      STAGE|OPERATION|  STATUS|    START_TIME|      END_TIME| UPDATED|          MESSAGE|
+----+-------+-----------+---------+--------+--------------+--------------+--------+-----------------+
| 114| THING2|POSTUPGRADE|  STOPPED|FINISHED|19/09/26 14:41|19/09/26 15:55|15:55:23|Completed job 114|
| 115| THING1|POSTUPGRADE|EXECUTING| RUNNING|19/09/26 14:43|           N/A|15:54:57|       Restarting|
+----+-------+-----------+---------+--------+--------------+--------------+--------+-----------------+
Total jobs 2
upg> Job 115 completed
------------------- Final Summary --------------------
Number of databases            [ 2 ]

Jobs finished successfully     [2]
Jobs failed                    [0]
Jobs pending                   [0]
------------- JOBS FINISHED SUCCESSFULLY -------------
Job 114 FOR THING2
Job 115 FOR THING1

---- Drop GRP at your convenience once you consider it is no longer needed ----
Drop GRP from THING2: drop restore point AUTOUPGRADE_221145114461854_THING2
Drop GRP from THING1: drop restore point AUTOUPGRADE_221145114461854_THING1

[oracle@ip-172-31-88-93 ~]$

The final state is all databases upgraded and open.

In Deploy mode, we have a new issue. AutoUpgrade creates a guaranteed restore point. Unless you drop the restore point, flashback logs will accumulate and you will eventually get a stuck archiver.

SQL> select guarantee_flashback_database gua, name from v$restore_point;

GUA NAME
--- ----------------------------------------
YES AUTOUPGRADE_221145114461854_THING1
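
To see how much recovery area space the flashback logs retained by the restore point are consuming, you can query the standard recovery area view (a minimal check):

SQL> select file_type, percent_space_used
     from v$recovery_area_usage
     where file_type = 'FLASHBACK LOG';

If percent_space_used keeps climbing, the guaranteed restore point is the likely cause.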

Loose ends

There are items that you must tidy up manually, even if you use AutoUpgrade in Deploy mode:

  • CRS still points to the old Oracle home
  • /etc/oratab still points to the old Oracle home
  • In Deploy mode, a guaranteed restore point exists

Run “srvctl upgrade”.

[oracle@ip-172-31-88-93 ~]$ srvctl upgrade database -database THING1 -oraclehome /u01/app/oracle/product/19.3.0/dbhome_1
[oracle@ip-172-31-88-93 ~]$ srvctl upgrade database -database THING2 -oraclehome /u01/app/oracle/product/19.3.0/dbhome_1

You can edit oratab or run “srvctl start database”.

[oracle@ip-172-31-88-93 ~]$ srvctl start database -database THING1
[oracle@ip-172-31-88-93 ~]$ srvctl start database -database THING2
[oracle@ip-172-31-88-93 ~]$ grep THING /etc/oratab
THING1:/u01/app/oracle/product/19.3.0/dbhome_1:N                # line added by Agent
THING2:/u01/app/oracle/product/19.3.0/dbhome_1:N                # line added by Agent

Drop the restore points

SQL> drop restore point AUTOUPGRADE_221145114461854_THING1;

Restore point dropped.

and

SQL> drop restore point AUTOUPGRADE_221145114461854_THING2;

Restore point dropped.
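
To confirm that nothing remains, re-run the restore point query on each database; you would expect no rows back:

SQL> select name from v$restore_point;

no rows selected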

Bug in distribution version

The AutoUpgrade that is distributed with Oracle 19.3 does not work well with ASM. Analyze mode will produce a false error, and Deploy mode will fail.

Recommendation

If you are using ASM, please download the most recent autoupgrade.jar version or use Upgrade mode only. If you use Upgrade mode, you must issue “startup upgrade” from the new Oracle home.
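
Before relying on the jar shipped with the database home, it is worth confirming which build you actually have. The AutoUpgrade CLI reports its build with the -version flag (the path below assumes the standard 19c location):

$ java -jar $ORACLE_HOME/rdbms/admin/autoupgrade.jar -version

Compare the reported build against the latest version available for download.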

Conclusion

AutoUpgrade is an easy-to-use utility that upgrades multiple Oracle databases in the background. Be aware of these points:

  • Do not use the distribution autoupgrade.jar. Download the latest version.
  • On small hosts with multiple instances, manage CPU and memory.
  • Check that your database is in archivelog mode.
  • Allocate sufficient space for archive logs.
  • Use an alias to simplify the command line.
  • Use rlwrap for a more efficient CLI experience (a combined sketch follows this list).
  • Use Analyze mode to identify errors, warnings, and fixups.
  • Use Fixup mode to apply fixups ahead of time.
  • Use Upgrade mode if you wish to handle analyze, fixup, and upgrade as separate steps.
  • Use Upgrade mode if you wish to migrate to a new host.
  • Handle all steps automatically by using Deploy mode.
  • Be aware of fixups that will be made in Deploy mode.
  • Monitor progress at the CLI prompt with “lsj”.
  • Tie up loose ends:
    • srvctl upgrade
    • edit oratab
    • drop restore point
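
As an example of the alias and rlwrap items above, something like this in your shell profile keeps the command line short and adds history and line editing (the jar path and config file name are assumptions):

$ alias upg='rlwrap java -jar /u01/app/oracle/product/19.3.0/dbhome_1/rdbms/admin/autoupgrade.jar'
$ upg -config thing.cfg -mode analyze
$ upg -config thing.cfg -mode deploy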


Unexpected behavior in Data Guard Maximum Protection

By Brian Fitzgerald

Introduction

Maximum Protection mode provides the highest level of data protection in Data Guard. Although a Maximum Protection system protects data as documented, there are performance issues that database administrators should be aware of before deciding on this mode. The most glaring weakness is that a network timeout anywhere in the system causes the primary database to hang.

Protection Modes

To quote the manual, “Oracle Data Guard ensures high availability, data protection, and disaster recovery for enterprise data.” Oracle Data Guard provides three distinct modes of data protection.

Maximum Performance

This is the default protection mode. It provides the highest level of data protection that is possible without affecting the performance of a primary database. 

Maximum Availability

This protection mode provides the highest level of data protection that is possible without compromising the availability of a primary database. 

Maximum Protection

This protection mode ensures that no data loss occurs if the primary database fails. To provide this level of protection, the redo data needed to recover a transaction must be written to both the online redo log and to the standby redo log on at least one synchronized standby database before the transaction commits. To ensure that data loss cannot occur, the primary database shuts down, rather than continue processing transactions, if it cannot write its redo stream to at least one synchronized standby database.

This article is about Maximum Protection mode. Based on the description, a few obvious points can be deduced about Maximum Protection. First, a practical system must have more than one standby; otherwise, failure of the sole standby will bring down the primary. Second, a low-latency network is required to achieve reasonable commit performance. Finally, completely redundant hosts and storage are required for each standby. In the past, with physical hardware and long administrative lead times, it was prohibitive to experiment with Maximum Protection. In the cloud, however, virtual machines are inexpensive and setup is quick.

License

Data Guard is a feature of Oracle Database Enterprise Edition itself and does not require separate licensing. By using Amazon Elastic Compute Cloud (EC2), you can control your license costs by configuring only the CPUs that you need.

Overview

The demonstration system characteristics are:

  • Amazon AWS
  • Region us-west-1 (Northern California)
  • Availability zones us-west-1b and us-west-1c
  • Elastic Compute Cloud (EC2)
  • Red Hat Linux 7.2
  • Oracle Grid Infrastructure 19c
  • Oracle Database 19c

AWS availability zones are separate locations within the same geographic region. In us-west-1, intra-availability-zone TCP latency, as measured by qperf, is 136μs, which is attractive for setting up Oracle Data Guard Maximum Protection. Inter-availability-zone TCP latency is 562μs. Note that availability zone identifiers are mapped independently for each account; for example, us-west-1b could refer to different locations for different accounts. Here is the mapping from database to availability zone that was used in this test.

db unique name   availability zone
SFB1             us-west-1b
SFB2             us-west-1b
SFC1             us-west-1c
SFC2             us-west-1c
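
The latency figures above came from qperf. To reproduce the measurement, run the qperf server on one host and point the client at it from the other (host name is a placeholder):

[oracle@host-b ~]$ qperf                  # server side: listens and waits
[oracle@host-c ~]$ qperf host-b tcp_lat   # client side: reports TCP latency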

Setup

Here is a brief summary of the preparatory steps:

  • environment:
    • create and configure virtual machine
    • install grid infrastructure
    • create ASM disk groups
    • install Oracle Database software
    • set up static listener (see the sketch after this list)
  • duplicate database:
    • set up tnsnames.ora
    • create orapwd
    • create temporary init.ora
    • startup nomount
    • duplicate target database for standby
  • set up Oracle Restart
  • validate all static listeners
  • start Data Guard broker
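
A static registration lets the broker restart instances through the listener. A minimal listener.ora entry for one database might look like this (home path and names follow this test setup; the _DGMGRL service name is the convention the broker expects):

SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (GLOBAL_DBNAME = SFB1_DGMGRL)
      (SID_NAME = SFB1)
      (ORACLE_HOME = /u01/app/oracle/product/19.3.0/dbhome_1)
    )
  )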

Create configuration

DGMGRL> create configuration 'CFG' primary database is 'SFB1' connect identifier is 'SFB1';
Configuration "CFG" created with primary database "SFB1"
DGMGRL> add database 'SFB2' as connect identifier is 'SFB2';
Database "SFB2" added
DGMGRL> add database 'SFC1' as connect identifier is 'SFC1';
Database "SFC1" added
DGMGRL> add database 'SFC2' as connect identifier is 'SFC2';
Database "SFC2" added

Set the redo transport property (LogXptMode) to SYNC, then set the protection mode to Maximum Availability.

DGMGRL> edit database SFB2 set property LogXptMode=SYNC;
Property "logxptmode" updated
DGMGRL> edit database SFC1 set property LogXptMode=SYNC;
Property "logxptmode" updated
DGMGRL> edit database SFC2 set property LogXptMode=SYNC;
Property "logxptmode" updated
DGMGRL> edit configuration set protection mode as MaxAvailability;
Succeeded.

Test switchovers. Primary SFB1->SFB2->SFC1->SFC2->SFB1.

DGMGRL> switchover to SFB2
Performing switchover NOW, please wait...
Operation requires a connection to database "SFB2"
Connecting ...
Connected to "SFB2"
Connected as SYSDBA.
New primary database "SFB2" is opening...
Oracle Clusterware is restarting database "SFB1" ...
Connected to "SFB1"
Connected to an idle instance.
Connected to an idle instance.
Connected to an idle instance.
Connected to "SFB1"
Connected to "SFB1"
Switchover succeeded, new primary is "sfb2"
DGMGRL> switchover to SFC1
Performing switchover NOW, please wait...
Operation requires a connection to database "SFC1"
Connecting ...
Connected to "SFC1"
Connected as SYSDBA.
New primary database "SFC1" is opening...
Oracle Clusterware is restarting database "SFB2" ...
Connected to "SFB2"
Connected to an idle instance.
Connected to an idle instance.
Connected to an idle instance.
Connected to "SFB2"
Connected to "SFB2"
Switchover succeeded, new primary is "sfc1"
DGMGRL> switchover to SFC2
Performing switchover NOW, please wait...
Operation requires a connection to database "SFC2"
Connecting ...
Connected to "SFC2"
Connected as SYSDBA.
New primary database "SFC2" is opening...
Oracle Clusterware is restarting database "SFC1" ...
Connected to "SFC1"
Connected to an idle instance.
Connected to an idle instance.
Connected to an idle instance.
Connected to "SFC1"
Connected to "SFC1"
Switchover succeeded, new primary is "sfc2"
DGMGRL> switchover to SFB1
Performing switchover NOW, please wait...
Operation requires a connection to database "SFB1"
Connecting ...
Connected to "SFB1"
Connected as SYSDBA.
New primary database "SFB1" is opening...
Oracle Clusterware is restarting database "SFC2" ...
Connected to "SFC2"
Connected to an idle instance.
Connected to an idle instance.
Connected to "SFC2"
Connected to "SFC2"
Switchover succeeded, new primary is "sfb1"

Now we are ready to implement Maximum Protection mode.

Set Maximum Protection

DGMGRL> edit configuration set protection mode as MaxProtection; 
Succeeded.

Test switchovers again. There is no issue.

DGMGRL> switchover to SFC2
Performing switchover NOW, please wait...
Operation requires a connection to database "SFC2"
Connecting ...
Connected to "SFC2"
Connected as SYSDBA.
New primary database "SFC2" is opening...
Oracle Clusterware is restarting database "SFB1" ...
Connected to "SFB1"
Connected to an idle instance.
Connected to an idle instance.
Connected to an idle instance.
Connected to "SFB1"
Connected to "SFB1"
Switchover succeeded, new primary is "sfc2"

Network failures

Failure of some, but not all standbys

Testing actual network failures is where you will notice serious issues with Maximum Protection mode.

The initial status is normal. All standbys are in sync with the primary.

DGMGRL> show configuration

Configuration - CFG

  Protection Mode: MaxProtection
  Members:
  SFB1 - Primary database
    SFB2 - Physical standby database
    SFC1 - Physical standby database
    SFC2 - Physical standby database

Fast-Start Failover:  Disabled

Configuration Status:
SUCCESS   (status updated 21 seconds ago)

Now we want to cause a network failure on one standby, SFC2 for example. We stage the failure by cutting off TCP port 1521. In the AWS console, create a new security group that allows ingress on TCP port 22 only.
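
If you prefer to script the change rather than click through the console, the same swap can be done with the AWS CLI (the instance id and security group id below are hypothetical):

$ aws ec2 modify-instance-attribute \
    --instance-id i-0123456789abcdef0 \
    --groups sg-0abc1234de567890f

Note that --groups replaces the instance's entire security group list, which is exactly the effect of the console steps described next.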

[Screenshot: the new security group, cutoff.sg]

Go to change security groups. Uncheck the original security group, check the new ssh-only security group, and press “Assign Security Groups”. There are three standbys, and we are cutting off network access to only one of them, so the system should continue to operate normally. However, you will notice a problem right away. Try to commit a transaction.

SQL> insert into t ( n ) values ( 0 );

1 row created.

SQL> commit;

You will find that the transaction does not complete. You can run “show configuration”; for 30 seconds, the configuration shows up as normal, and no messages appear in the alert log. All the while, only one standby is unreachable; the other two standbys are reachable, yet the database hangs. Eventually, Oracle notices the issue, and messages start appearing in the alert log.

2019-09-13T12:43:51.076718-04:00
LGWR (PID:26748): ORA-16198: Received timed out error from KSR
LGWR (PID:26748): Attempting LAD:4 network reconnect (16198)
LGWR (PID:26748): LAD:4 network reconnect abandoned
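
While the commit is hanging, you can watch the blocked session from another connection. A quick check against the standard v$session view (the hanging session typically shows up waiting on a commit-related event such as “log file sync”):

SQL> select sid, event, seconds_in_wait
     from v$session
     where wait_class <> 'Idle';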

The transaction commits. The configuration now appears as:

DGMGRL> show configuration

Configuration - CFG

  Protection Mode: MaxProtection
  Members:
  SFB1 - Primary database
    SFB2 - Physical standby database
    SFC1 - Physical standby database
    SFC2 - Physical standby database
      Warning: ORA-16857: member disconnected from redo source for longer than specified threshold

Fast-Start Failover:  Disabled

Configuration Status:
WARNING   (status updated 8 seconds ago)

You can repeat the process for a second standby, SFC1, for example. Again, the database hangs for 30 seconds, and finally resolves.

DGMGRL> show configuration

Configuration - CFG

  Protection Mode: MaxProtection
  Members:
  SFB1 - Primary database
    Error: ORA-16778: redo transport error for one or more members

    SFB2 - Physical standby database
    SFC1 - Physical standby database
      Warning: ORA-16857: member disconnected from redo source for longer than specified threshold

    SFC2 - Physical standby database
      Warning: ORA-16857: member disconnected from redo source for longer than specified threshold

Fast-Start Failover:  Disabled

Configuration Status:
ERROR   (status updated 160 seconds ago)

This behavior, where failure of some but not all standbys leads to an application hang, is going to be objectionable to some application users.

Failure of all standbys

Now disconnect the third standby. Observe this behavior:

  • The primary will retry the standbys for 5 minutes.
  • Sessions that issue a commit will wait.
  • The primary will abort and restart.
  • During restart, the instance will hang on open database.
  • After 8 more minutes, the primary will abort.

Shutdown of the primary is the expected behavior after a failure of all standbys. The following message will appear in SQL*Plus sessions that were waiting on commit.

commit
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 10524
Session ID: 39 Serial number: 7415

Some may find it surprising that the shutdown is not instantaneous; it takes five minutes before the database shuts down.

Adjusting NetTimeout

You can reduce the amount of time that an application waits on a TCP timeout:

DGMGRL> edit database SFB1 set property NetTimeout = 5;
Property "nettimeout" updated
DGMGRL> edit database SFB2 set property NetTimeout = 5;
Property "nettimeout" updated
DGMGRL> edit database SFC1 set property NetTimeout = 5;
Property "nettimeout" updated
DGMGRL> edit database SFC2 set property NetTimeout = 5;
Property "nettimeout" updated
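
You can confirm the new value on each member with the broker's show command:

DGMGRL> show database SFB1 NetTimeout
  NetTimeout = '5'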

This is a workaround, and it is beside the point. The application still has to wait to commit even though one or more standbys are still available. There should be no hanging at all.

Conclusion

There is a flaw in Data Guard Maximum Protection. If a TCP timeout occurs involving only one standby, the primary will hang even if other standbys are available. This issue will lead some administrators to rule out Data Guard Maximum Protection.

In the past, because of high hardware costs, long lead times, and separation of duties across DBAs and SAs, it was rare to test Maximum Protection. Now, with the cloud, testing such arrangements is convenient, but testing uncovers weaknesses in the design.

Likewise, in the past, because of separation of duties across DBAs and network administrators, it was rare for DBAs to stage TCP timeout tests, but such tests are simple to perform now. TCP timeout tests uncover weaknesses in Data Guard.

Data Guard Maximum Availability has improved over time to the point where it is an acceptable substitute for Maximum Performance in some cases.

Oracle 12.1 introduced the far sync instance, which can be configured near a primary database on a low latency network to provide near-zero-data loss capability. Maximum Availability and far sync are described in this article. An Active Data Guard license is required for far sync. Ironically, in this sense, Maximum Availability costs more than Maximum Protection, the highest level.
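
For reference, a far sync member is added through the broker as well. A minimal sketch, assuming a far sync instance named FS1 has been created near the primary SFB1, with an illustrative redo route that relays to SFC2:

DGMGRL> add far_sync 'FS1' as connect identifier is 'FS1';
DGMGRL> edit database 'SFB1' set property RedoRoutes = '(LOCAL : FS1 SYNC)';
DGMGRL> edit far_sync 'FS1' set property RedoRoutes = '(SFB1 : SFC2 ASYNC)';
DGMGRL> enable far_sync 'FS1';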