Patching Copy Fail (CVE-2026-31431) on Red Hat Linux

by Brian Fitzgerald

Executive summary

Copy Fail (CVE-2026-31431) is a Linux kernel local privilege escalation that lets an unprivileged user become root with a single deterministic exploit. It is dangerous mainly in combination with remote vulnerabilities: any foothold that runs code as a non-root user — a compromised web service, a container — chains into root on the host. The fix is to install the patched kernel and reboot. Two things are easy to miss: instances launched from older AMIs, and instances restored from pre-patch snapshots, are vulnerable until they too are patched and rebooted. For hosts that can’t be patched immediately, blacklisting algif_aead via the kernel command line is an effective interim mitigation.

Introduction

Copy Fail (CVE-2026-31431) is a local privilege escalation (LPE) vulnerability in the Linux kernel’s algif_aead module, the AEAD interface of the kernel’s user-space crypto API (AF_ALG). It was disclosed publicly on April 29, 2026, and it affects essentially every mainline Linux kernel released since 2017 — across Ubuntu, RHEL, Amazon Linux, SUSE, Debian, and others. Its CVSS 3.1 score is 7.8 (High).

The bug is, on its own, only a local privilege elevation: the attacker must already have unprivileged host access. The danger is in what this vulnerability can be paired with. Any remote vulnerability becomes a path to root on the host. That combination — a remote foothold plus a reliable, deterministic LPE — is what makes Copy Fail a priority to address.

The fix is a kernel update. Three operational details are worth keeping in mind:

  1. Patched hosts must be rebooted. The fixed kernel does not take effect until it is the running kernel.
  2. Newly launched EC2 instances must be patched too, even if the AMI was current as of last week. Any AMI built before the fix landed in the vendor’s repos ships a vulnerable kernel.
  3. Instances restored from snapshots inherit the kernel of whatever was running when the snapshot was taken. A snapshot from before the patch is a vulnerable host, regardless of how recently it was restored.
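These operational details lend themselves to automation. Below is a minimal sketch of the reboot check from point 1, comparing the running kernel (the uname -r string) against the fixed build installed later in this article. The numeric comparison is a simplification of RPM version semantics, not a replacement for them:

```python
import re

# Fixed kernel build from this article's RHEL 10 example.
FIXED_RELEASE = "6.12.0-124.55.1.el10_1"

def release_tuple(release: str):
    """All numeric components of a kernel release string, in order."""
    return tuple(int(n) for n in re.findall(r"\d+", release))

def is_patched(running: str, fixed: str = FIXED_RELEASE) -> bool:
    """True if the running kernel is at least the fixed build."""
    # Drop the arch suffix so its digits (86, 64) don't skew the comparison.
    running = running.removesuffix(".x86_64")
    return release_tuple(running) >= release_tuple(fixed)
```

A host still running a pre-patch kernel fails the check whether it was freshly launched from an old AMI, freshly restored from an old snapshot, or patched but not yet rebooted.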

This article walks through patching a single RHEL 10 EC2 instance on my personal AWS account and confirming, with a public proof-of-concept, that the patched kernel is no longer exploitable. It also revisits the questions and workarounds that came up in the days between disclosure and the kernel patch becoming available.

Demonstrate the bug

The instance is running RHEL 10.1 with the pre-patch kernel. Logged in as the unprivileged ec2-user, running the public proof-of-concept yields a root shell:

[ec2-user@ip-10-2-0-34 ~]$ python3.12 copy_fail_exp.py
[root@ip-10-2-0-34 ec2-user]# whoami
root
[root@ip-10-2-0-34 ec2-user]# id -u
0

No password prompt, no SSH key, no sudo — the exploit runs from a normal user shell and the next prompt is root. This is the deterministic behavior the vulnerability is known for: no race condition, no offsets, no retries.

The exploit script targets a setuid-root binary — /usr/bin/su — and on the unpatched kernel it tampers with that binary’s in-memory pages so that invoking it yields a root shell.

Day 0: can I just uninstall the affected piece?

On the day of disclosure, before any vendor guidance was out, the natural first question was whether the vulnerable subsystem could simply be removed — uninstall aead, uninstall authencesn, uninstall AF_ALG. The answer is no. There’s no algif-aead package, no af-alg package, no authencesn package. The vulnerable subsystem is part of the kernel image itself — built in on the affected distributions, not shipped as a separately installable or removable component. There is nothing to uninstall. The only way to change the code in question through the package manager is to replace the kernel, and on day 0 there was no fixed kernel to replace it with.

That left waiting for vendor guidance.

Day 1: blacklist algif_aead with grubby

The first official mitigation, ahead of the patched kernel, was to blacklist the algif_aead initcall on the kernel command line via grubby and restart:

[root@ip-10-2-0-34 ~]# grubby --update-kernel=ALL --args="initcall_blacklist=algif_aead_init"

After the restart, the AEAD interface of AF_ALG is never registered. The exploit script doesn’t get as far as the page-cache write that fails on the patched kernel — it fails earlier, at the system calls that set up the AF_ALG socket. With algif_aead blacklisted, bind() fails with ENOENT, which Python surfaces as FileNotFoundError. If AF_ALG is blacklisted entirely, socket() fails first, with EAFNOSUPPORT. Either way, the exploit doesn’t run.
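Those two failure modes can be probed from Python with the same socket calls the PoC uses to reach the AEAD interface. A minimal sketch; the status labels are my own, not anything the kernel reports:

```python
import socket

def probe_algif_aead() -> str:
    """Classify a host by how far an AF_ALG AEAD socket setup gets.

    'af-alg-absent'    -- socket() fails (AF_ALG unavailable entirely)
    'aead-unavailable' -- bind() fails (algif_aead never registered)
    'aead-available'   -- the AEAD interface is reachable
    """
    try:
        s = socket.socket(socket.AF_ALG, socket.SOCK_SEQPACKET)
    except (AttributeError, OSError):
        # EAFNOSUPPORT, or a platform with no AF_ALG at all
        return "af-alg-absent"
    try:
        # Standard AF_ALG bind tuple: (interface type, algorithm name)
        s.bind(("aead", "gcm(aes)"))
    except OSError:
        # ENOENT (FileNotFoundError) when algif_aead is blacklisted
        return "aead-unavailable"
    finally:
        s.close()
    return "aead-available"
```

With the blacklist in place, the probe reports 'aead-unavailable'; once a patched kernel is running and the blacklist is removed, it reports 'aead-available' again.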

Patch the kernel

The fix is delivered through the standard kernel package. On RHEL 10:

[root@ip-10-2-0-34 ~]# dnf update kernel

and so on …

Installing:
kernel x86_64 6.12.0-124.55.1.el10_1 rhel-10-baseos-rhui-rpms 1.4 M

… and so on

Installed:
kernel-6.12.0-124.55.1.el10_1.x86_64 kernel-core-6.12.0-124.55.1.el10_1.x86_64 kernel-modules-6.12.0-124.55.1.el10_1.x86_64 kernel-modules-core-6.12.0-124.55.1.el10_1.x86_64
Complete!

Restart the host

I restarted the EC2 instance from the AWS console. You can also issue reboot.

Demonstrate that the bug is patched

[root@ip-10-2-0-34 ~]# uname -r
6.12.0-124.55.1.el10_1.x86_64
[root@ip-10-2-0-34 ~]# exit
[ec2-user@ip-10-2-0-34 ~]$ python3.12 copy_fail_exp.py
Password:

On the patched kernel the tampering step does not happen, so /usr/bin/su runs normally and prompts for a password, which is exactly what an unprivileged user trying to become root should encounter.

Once the kernel is patched, the aead blacklist can be removed (grubby --update-kernel=ALL --remove-args="initcall_blacklist=algif_aead_init", followed by a restart). Legitimate code can then resume using the kernel’s user-space crypto interface.

What’s interesting about the chain

What stands out about Copy Fail is how unremarkable each step of the exploit is on its own. Reading a file as an unprivileged user is allowed. The kernel’s page cache is unified, so the pages a read brought in are the same pages a later execution would run from — that’s a feature, not a bug, and it’s what makes cache-warming tricks work. Opening a socket — including the user-space crypto sockets the AEAD interface provides — is allowed. Splicing data between file descriptors is allowed. None of these steps requires elevated privileges, and none of them looks suspicious in isolation. The exploit is a sequence of ordinary, sanctioned operations strung together so that a kernel weakness at one step lands a specially crafted change in a place that affects what the next setuid execution does. That’s what makes a local privilege elevation worth taking seriously even when the immediate prerequisite is “any unprivileged code execution on the host” — the building blocks are everywhere.

What this means going forward

Copy Fail is unlikely to be the last bug of this shape. About a week after Copy Fail was disclosed, a second LPE was disclosed under the name Dirty Frag (CVE-2026-43284 and CVE-2026-43500). Different kernel subsystems — xfrm-ESP and RxRPC — and different code paths, but the same underlying pattern: an in-place operation writing into page-cache pages that aren’t privately owned by the kernel.

The page cache is a large, well-trafficked piece of kernel infrastructure, and the in-place-optimization pattern that Copy Fail and Dirty Frag both exploit shows up in more than a few places. It would be surprising if the next several months didn’t bring more bugs in this family.

Two things about how these bugs are being found and disclosed are worth flagging. First, AI-assisted code analysis is now a real factor in kernel vulnerability discovery. Copy Fail was found that way — the researchers who disclosed it have said so explicitly — and the fact that the underlying weakness had been sitting in the kernel since 2017, through nine years of human review, suggests the new tooling is reaching code paths and combinations that didn’t get attention before. The same is likely true of Dirty Frag, where the older of the two bugs also dates to 2017. The economics of this kind of analysis are different from a human researcher’s: it scales, it doesn’t get bored, and it can re-examine entire subsystems whenever a new pattern is identified. The pace of discoveries should be expected to follow.

Second, the window between disclosure and a working public exploit is shrinking. Copy Fail had a working PoC out the day of disclosure. Dirty Frag was disclosed ahead of schedule because a third party broke the coordinated embargo, and a working PoC was public before any distribution had a patched kernel ready to ship. “Patch as soon as the vendor ships” is still the right answer, but it presupposes a vendor patch is available. When the disclosure outruns the patch, the question on the table is what the fleet does in the interim — which mitigation, applied through which mechanism, tracked how, removed when. That used to be an exceptional case; it is becoming an ordinary one.

The combination — more bugs, found faster, disclosed with less runway — argues for a fleet posture that can move quickly through several states. A known patching path for when patches exist. A known mitigation path for the gap before they do. A way to know, for any given host, which state it’s in. The specific bugs will change. The shape of the response is what’s worth investing in.

Conclusion

On its own, Copy Fail is a local privilege elevation — it requires the attacker to already have code execution on the host. In a cloud environment, that prerequisite is met by a wide range of common scenarios. Any remote bug in a public-facing service, any compromised CI runner, any malicious dependency that runs during a build, any container that can be coerced into executing attacker-supplied code, is enough of a foothold to chain into root on the host. The LPE is what turns a low-impact remote bug into a host compromise.

Patching the kernel and restarting the instance is the fix. Two operational notes worth keeping in mind:

  • New instances need to be patched, too. An AMI that was current last month is not current now. Any instance launched from a pre-patch image needs dnf update kernel (or the equivalent) and a restart before it should be considered safe.
  • Snapshots carry the kernel that was running when they were taken. Restoring a snapshot from before the fix produces a vulnerable host, even if the restore happened five minutes ago. The same patch-and-restart step applies.

Each host should land in one of two states: a patched kernel verified by uname -r, or the vulnerable code path made unreachable through a tracked mitigation such as the initcall_blacklist boot argument. Hosts in neither bucket are still exposed.
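That two-state audit reduces to a pure function of two facts per host: the uname -r string and the contents of /proc/cmdline. A sketch; the version comparison is a numeric simplification of RPM semantics, and the state names are mine:

```python
import re

FIXED_RELEASE = "6.12.0-124.55.1.el10_1"
MITIGATION_ARG = "initcall_blacklist=algif_aead_init"

def numeric(release: str):
    """Numeric components of a kernel release, arch suffix stripped."""
    return tuple(int(n) for n in re.findall(r"\d+", release.removesuffix(".x86_64")))

def host_state(uname_r: str, proc_cmdline: str) -> str:
    """Classify a host as 'patched', 'mitigated', or 'exposed'."""
    if numeric(uname_r) >= numeric(FIXED_RELEASE):
        return "patched"
    if MITIGATION_ARG in proc_cmdline.split():
        return "mitigated"
    return "exposed"
```

Anything that comes back 'exposed' is the actionable list: unpatched, unmitigated, still exploitable.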

ASMLib on Red Hat 9

by Brian Fitzgerald

Introduction

Oracle has desupported the ASM Filter Driver (AFD); see MOS note 2806979.1. DBAs looking to support RAC need to look elsewhere for ASM management software. The updated ASMLib release fills the bill and is available on Red Hat Linux, but finding the downloads is not straightforward.

Download

On RHEL 9, you will need these downloads:

oracleasm-support

https://yum.oracle.com/repo/OracleLinux/OL9/addons/x86_64/

oracleasm-support-3.1.0-10.el9.x86_64.rpm

oracleasmlib

https://www.oracle.com/linux/downloads/linux-asmlib-v9-downloads.html

oracleasmlib-3.1.0-6.el9.x86_64.rpm

Uninstall AFD

# /u01/app/oracle_grid/product/1930/grid/bin/afdroot uninstall

Install and start

# uname -r
5.14.0-427.42.1.el9_4.x86_64
# rpm -ihv oracleasm-support-3.1.0-10.el9.x86_64.rpm
# rpm -ihv oracleasmlib-3.1.0-6.el9.x86_64.rpm
# systemctl start oracleasm.service
# oracleasm scandisks
# oracleasm listdisks
ORA_ASM_GRID1_01
ORA_ASM_GRID1_02
ORA_ASM_GRID1_03

Conclusion

ASMLib is a convenient drop-in replacement for AFD. ASMLib protects your ASM disks from illegal writers, just as AFD did. The downloads can be found off the beaten track.

replace ec2 ssh key

By Brian Fitzgerald

Introduction

The employee who held the ec2-user private ssh key has left the company.

Now what?

Procedure

identify a key

Use an existing ssh key pair or create a new one with a command such as:

ssh-keygen -m PEM -f magic

Files magic and magic.pub get created as a result, in this example.
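Before going further, it is worth checking that the public key line is structurally valid, since a mangled line pasted into authorized_keys locks you out just as surely as the lost key did. A small sketch; the check is structural only, and assumes an ssh-rsa key like the one used later in this article:

```python
import base64

def looks_like_rsa_pubkey(line: str) -> bool:
    """Structurally validate an OpenSSH ssh-rsa public key line."""
    parts = line.split()
    if len(parts) < 2 or parts[0] != "ssh-rsa":
        return False
    try:
        blob = base64.b64decode(parts[1], validate=True)
    except ValueError:
        return False
    # The decoded blob begins with a 4-byte length (7) and the string "ssh-rsa".
    return blob[:11] == b"\x00\x00\x00\x07ssh-rsa"
```

This catches the common copy-paste failures: a wrapped or truncated base64 blob, or a key type that doesn't match its label.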

stop the ec2 instance

edit the user data

Select your ec2 instance. Select Actions → Instance settings → Edit user data.

Paste a script such as this one, substituting the contents of your actual magic.pub:

Content-Type: multipart/mixed; boundary="=+"
MIME-Version: 1.0

--=+
Content-Type: text/cloud-config; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="cloud-config.txt"

cloud_final_modules:
- [scripts-user, always]

--=+
Content-Type: text/x-shellscript; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="replace-ssh-key.bash"

#!/bin/bash

cat > /home/ec2-user/.ssh/authorized_keys <<-EOF
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQC1pY+ct6N1DosG1qTSvbTVByY8CUilyn0ZVexISyBd+a4SPXbjgOKsRv4wiAYNHkMD29zFkvGXUgg8mB6UaOD7OeoI4opHpGNVYKGx4Sf+frvOzMuS7Z/VCrT+MtRc0GFI57K4aHNIVtJ6TxS1aq79eaD1ORG7TlgdqbBuR49KD1CemCZB/NdzPoo5D2oZTbr3yvrDXzRApzRX++DK3EZyNSrma8p1NUxB3H8JcNj7fmnDehrrAqoe9HctwJidc9/n/5c2AN3WV59SZLX4GhDRiKXIxOAq7pG0CWAUotlvRGhNRch87KzueLmQ/i5RsAKrx/B3dOAhlI6IF54HC8cdO6gjpjjfLxBCpireLHxNjNtBQCchMnHzGShGH3QEB1wHTZIa0ezN2kvjelC7hAB6UR/Klt6DD0lO4yBVGraahlUNFffLPHCf+Jym9Ppp2aXa0qzJIoRGFxpyC4KZDU1kzdWb3/Nou9qamyUbe4NBRLBc76L9sdD2Iz1SoolYm4E= brian@RWMZLB
EOF
--=+

start the ec2 instance

ssh using the new key

ssh -i magic ec2-user@your-host

Conclusion

Done!

Launch AMI by name in terraform

By Brian Fitzgerald

Introduction

Using Terraform, you can launch an AWS EC2 instance from any account and any region using a public AMI identified only by image name. Because you do not need to look up the image ID each time, your code can be simplified.

Approach

Red Hat account number

In the AWS console, notice that the Red Hat, Inc. account number is 309956199498.


AMI name

Find the Red Hat AMI image by name. In the console, navigate to EC2 → Images → AMIs. Filter on Owner: 309956199498, Architecture: 64-bit (x86), Virtualization type: HVM.

For example, “RHEL-7.9_HVM_GA-20200917-x86_64-0-Hourly2-GP2”. You could also search using the AWS command line:

$ aws ec2 describe-images --owners 309956199498 --filters "` cat filters.ami.json `" --region us-east-2

where filters.ami.json contains:

[
    {
        "Name": "architecture",
        "Values": [
             "x86_64"
        ]
    },
    {
        "Name": "virtualization-type",
        "Values": [
             "hvm"
        ]
    }
]
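The most_recent = true behavior of the Terraform aws_ami data source (used in the next section) amounts to picking the image with the latest CreationDate from the matching set. A sketch of the same selection when scripting against describe-images output yourself:

```python
def most_recent_image(images):
    """Pick the image record with the latest CreationDate.

    ISO-8601 timestamps sort correctly as plain strings, so max() on the
    string value is sufficient.
    """
    return max(images, key=lambda image: image["CreationDate"])
```

With a name filter that matches exactly one AMI per region, the selection is trivial; with a wildcard name, this is the rule that breaks ties.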

Terraform data source

Define a Terraform aws_ami data source.

ami-rhel.tf

data "aws_ami" "rhel" {
    most_recent = true
    owners = [
        "309956199498"
    ]
    filter {
        name   = "name"
        values = [
            "RHEL-7.9_HVM_GA-20200917-x86_64-0-Hourly2-GP2"
        ]
    }
}

aws instance resource

Use the data source in your Terraform aws_instance resource:

provider "aws" {
    region      = var.region
}

resource "aws_instance" "your-ec2-rsrc" {
    ami           = data.aws_ami.rhel.image_id

and so on . . .

Launch the ec2 instance

Run “terraform apply”. Terraform determines the AMI ID. For example, ami-0d2bf41df19c4aac7.

Conclusion

By using the Terraform AWS AMI data source, you can launch a public AMI from any account and region. The AMI ID is different in each region; the image name is the same everywhere.

curl on el5 with TLS 1.2

By Brian Fitzgerald

Background

We needed a working curl that supports TLS 1.2 on Red Hat el5 in order to migrate off el5. The default curl does not support TLS 1.2.

Solution: download and compile curl on el5.

Test case

Download Oracle patch p6880880_200000_Linux-x86-64.zip from MOS.
Steps:
Store the URL, the file name, and your cookies in files url.txt, filename.txt, and cook.txt, respectively. For instructions on getting your browser cookies, please refer to Oracle binaries or patch download from Linux command line.

Symptom

$ uname -r
2.6.18-426.el5
$ which curl
/usr/bin/curl
$ curl -b cook.txt  -o "` cat filename.txt `" -L "` cat url.txt `"
curl: (35) error:14077410:SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert handshake failure

Prerequisites to compiling curl

perl 

Install Perl v5.10.0 minimum
unset LD_LIBRARY_PATH
cd /u02/sw/perl/src/perl-5.30.2
./Configure -de -Dprefix=/u02/sw
make
make install

openssl

cd /u02/sw/openssl/src/openssl-1.1.1g
export PATH=/u02/sw/bin:/bin:/usr/bin
./config --prefix=/u02/sw enable-egd  
make
make install

openldap

cd /u02/sw/openldap/src/openldap-2.4.56
export PATH=/bin:/usr/bin
export LDFLAGS=-L/u02/sw/lib
export CPPFLAGS=-I/u02/sw/include
./configure --prefix=/u02/sw --with-ssl=/u02/sw
make
make install

sasl

cd /u02/sw/cyrus-sasl/src/cyrus-sasl-2.1.27
./configure --prefix=/u02/sw --with-openssl=/u02/sw/ssl
make
make install
ln -s libsasl2.so /u02/sw/lib/libsasl2.so.2

Compile curl

cd /u02/sw/curl/src/curl-7.73.0
./configure --prefix=/u02/sw --with-ssl=/u02/sw
make
make install
unset LDFLAGS
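One quick check that the new build linked against the intended library: the first line of curl --version names the SSL backend. A small parser sketch; the sample version lines are illustrative:

```python
import re

def ssl_backend(version_line: str):
    """Extract (library, version) of the SSL backend from `curl --version` line 1.

    Returns None if no recognized backend is named.
    """
    m = re.search(r"\b(OpenSSL|LibreSSL|GnuTLS|NSS)/(\S+)", version_line)
    return (m.group(1), m.group(2)) if m else None
```

On this build the expected answer is the OpenSSL 1.1.1g compiled above; the stock el5 curl reports a far older backend.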

Tests

el5

$ ./test.u02.curl.bash
2.6.18-426.el5
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100  116M  100  116M    0     0  9357k      0  0:00:12  0:00:12 --:--:-- 17.3M

el6

$ ./test.u02.curl.bash
2.6.32-754.31.1.el6.x86_64
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100  116M  100  116M    0     0  4132k      0  0:00:28  0:00:28 --:--:-- 4184k

el7

$ ./test.u02.curl.bash
3.10.0-1062.el7.x86_64
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100  116M  100  116M    0     0  42.3M      0  0:00:02  0:00:02 --:--:-- 51.8M

Actual TLS version

Verbose output shows:
$ curl -v -b cook.txt -o "` cat filename.txt `" -L "` cat url.txt `"
. . .
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use http/1.1

Conclusion

A locally compiled curl runs on Red Hat el5, el6, and el7.

EC2 maximum number of volumes

Introduction

The documented AWS EC2 EBS volume attachment limit is 27. The attachment limit affects the maximum number of Oracle ASM disks.

EC2 volume limit

Amazon AWS documents a 27-volume attachment limit. I have found that an attempt to create a t2 instance with 28 EBS volumes will abort.


Attempting to create an m5 (Nitro) instance with 28 or more volumes will hang.


The attachment limit is 28, including network interfaces, volumes, and instance store volumes. EC2 instances with one network interface can have up to 27 volumes attached.

EC2 instances with more than 27 volumes

I am aware of t2 EC2 systems with as many as 44 volumes attached. I have no information about how this was done, or what the customer’s support expectations are. I would be concerned about how such a system would respond to a change of instance type.

Oracle Database

In planning an Oracle database installation, you may need file system mounts as well. /u01, for example. The underlying volume counts against the maximum attachment count.
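The budgeting is simple but worth writing down: attachments are shared among network interfaces, EBS volumes, and instance store volumes, so the room left for ASM disks is what remains after the OS-side attachments. A sketch; the 28-attachment figure is the Nitro limit discussed above, and the default counts are illustrative:

```python
def max_asm_disks(attachment_limit=28, network_interfaces=1, os_volumes=2):
    """Attachments left over for ASM disks.

    os_volumes counts the root volume plus file-system mounts such as /u01.
    """
    remaining = attachment_limit - network_interfaces - os_volumes
    return max(remaining, 0)
```

With one network interface, a root volume, and a /u01 volume, that leaves 25 attachments for ASM disks; each additional interface or mount shrinks the budget by one.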

In laying out an ASM setup, consider a simple design, with as few ASM disks and ASM disk groups as needed.

Conclusion

Based on the available public information, I recommend limiting the number of EBS volume attachments to 27. Oracle database administrators might want to simplify their ASM implementation.

Upgrade to postgres 12

By Brian Fitzgerald

Introduction

Here is a walkthrough of upgrading PostgreSQL from version 10 to 12. PostgreSQL version 12 was released today, Oct 3, 2019. The operating system is Red Hat Linux 7.7.

Install the new binaries

Caution: the installation will overwrite your ~postgres/.bash_profile.

As root:

# yum -y erase pgdg-redhat-repo
# yum -y install https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm
# yum -y install postgresql12-server

Note: the installation will re-point some symbolic links in /usr/bin and /etc/alternatives. For example:

lrwxrwxrwx. 1 root root 28 Oct  3 15:27 /usr/bin/psql -> /etc/alternatives/pgsql-psql
lrwxrwxrwx. 1 root root 22 Oct  3 15:27 /etc/alternatives/pgsql-psql -> /usr/pgsql-12/bin/psql

Shut down the old version

As postgres:

$ echo $PATH
/usr/pgsql-10/bin:/var/lib/pgsql/bin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/local/sbin
$ echo $PGDATA
/var/lib/pgsql/10/data
$ pg_ctl stop
waiting for server to shut down.... done
server stopped

Initialize the new PGDATA

$ PATH=/usr/pgsql-12/bin:/var/lib/pgsql/bin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/local/sbin
$ PGDATA=/var/lib/pgsql/12/data
$ mkdir -p /var/lib/pgsql/12/data
$ initdb
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /var/lib/pgsql/12/data ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... America/New_York
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

initdb: warning: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.

Success. You can now start the database server using:

    pg_ctl -D /var/lib/pgsql/12/data -l logfile start

Upgrade

Supply the old and new bin and data directories on the command line.

$ pg_upgrade -b /usr/pgsql-10/bin -B /usr/pgsql-12/bin -d /var/lib/pgsql/10/data -D /var/lib/pgsql/12/data
Performing Consistency Checks
-----------------------------
Checking cluster versions                                   ok
Checking database user is the install user                  ok
Checking database connection settings                       ok
Checking for prepared transactions                          ok
Checking for reg* data types in user tables                 ok
Checking for contrib/isn with bigint-passing mismatch       ok
Checking for tables WITH OIDS                               ok
Creating dump of global objects                             ok
Creating dump of database schemas
                                                            ok
Checking for presence of required libraries                 ok
Checking database user is the install user                  ok
Checking for prepared transactions                          ok

If pg_upgrade fails after this point, you must re-initdb the
new cluster before continuing.

Performing Upgrade
------------------
Analyzing all rows in the new cluster                       ok
Freezing all rows in the new cluster                        ok
Deleting files from new pg_xact                             ok
Copying old pg_xact to new server                           ok
Setting next transaction ID and epoch for new cluster       ok
Deleting files from new pg_multixact/offsets                ok
Copying old pg_multixact/offsets to new server              ok
Deleting files from new pg_multixact/members                ok
Copying old pg_multixact/members to new server              ok
Setting next multixact ID and offset for new cluster        ok
Resetting WAL archives                                      ok
Setting frozenxid and minmxid counters in new cluster       ok
Restoring global objects in the new cluster                 ok
Restoring database schemas in the new cluster
                                                            ok
Copying user relation files
                                                            ok
Setting next OID for new cluster                            ok
Sync data directory to disk                                 ok
Creating script to analyze new cluster                      ok
Creating script to delete old cluster                       ok

Upgrade Complete
----------------
Optimizer statistics are not transferred by pg_upgrade so,
once you start the new server, consider running:
    ./analyze_new_cluster.sh

Running this script will delete the old cluster's data files:
    ./delete_old_cluster.sh
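pg_upgrade also accepts a --check flag that runs only the consistency checks without touching any data, which is cheap insurance before the real pass. A small helper sketch for assembling either command line, using the directories from this walkthrough:

```python
def pg_upgrade_cmd(old_bin, new_bin, old_data, new_data, check=False):
    """Build a pg_upgrade argument list; check=True runs checks only."""
    cmd = [
        "pg_upgrade",
        "-b", old_bin,   # old cluster bin directory
        "-B", new_bin,   # new cluster bin directory
        "-d", old_data,  # old cluster data directory
        "-D", new_data,  # new cluster data directory
    ]
    if check:
        cmd.append("--check")
    return cmd
```

Run the check variant first against the still-running old cluster; if it comes back clean, the real invocation is the same list without the flag.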

Start the new version

[postgres@ip-172-31-47-137 data]$ pg_ctl start
waiting for server to start....2019-10-03 15:28:59.551 EDT [2589] LOG:  starting PostgreSQL 12.0 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
2019-10-03 15:28:59.551 EDT [2589] LOG:  listening on IPv6 address "::1", port 5432
2019-10-03 15:28:59.551 EDT [2589] LOG:  listening on IPv4 address "127.0.0.1", port 5432
2019-10-03 15:28:59.554 EDT [2589] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2019-10-03 15:28:59.559 EDT [2589] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2019-10-03 15:28:59.569 EDT [2589] LOG:  redirecting log output to logging collector process
2019-10-03 15:28:59.569 EDT [2589] HINT:  Future log output will appear in directory "log".
 done
server started

Analyze the database

I have one user database, “elections”.

$ ./analyze_new_cluster.sh
This script will generate minimal optimizer statistics rapidly
so your system is usable, and then gather statistics twice more
with increasing accuracy.  When it is done, your system will
have the default level of optimizer statistics.

If you have used ALTER TABLE to modify the statistics target for
any tables, you might want to remove them and restore them after
running this script because they will delay fast statistics generation.

If you would like default statistics as quickly as possible, cancel
this script and run:
    "/usr/pgsql-12/bin/vacuumdb" --all --analyze-only

vacuumdb: processing database "elections": Generating minimal optimizer statistics (1 target)
vacuumdb: processing database "postgres": Generating minimal optimizer statistics (1 target)
vacuumdb: processing database "template1": Generating minimal optimizer statistics (1 target)
vacuumdb: processing database "elections": Generating medium optimizer statistics (10 targets)
vacuumdb: processing database "postgres": Generating medium optimizer statistics (10 targets)
vacuumdb: processing database "template1": Generating medium optimizer statistics (10 targets)
vacuumdb: processing database "elections": Generating default (full) optimizer statistics
vacuumdb: processing database "postgres": Generating default (full) optimizer statistics
vacuumdb: processing database "template1": Generating default (full) optimizer statistics

Done

Delete the old data files

[postgres@ip-172-31-47-137 data]$ du -s /var/lib/pgsql/10/data /var/lib/pgsql/12/data
2379432 /var/lib/pgsql/10/data
1693660 /var/lib/pgsql/12/data
[postgres@ip-172-31-47-137 data]$ ./delete_old_cluster.sh
[postgres@ip-172-31-47-137 data]$ du -s /var/lib/pgsql/10/data /var/lib/pgsql/12/data
du: cannot access ‘/var/lib/pgsql/10/data’: No such file or directory
1691976 /var/lib/pgsql/12/data

Update ~postgres/.pgsql_profile

PATH=/usr/pgsql-12/bin:$PATH
export PGDATA=/var/lib/pgsql/12/data

Restore your old ~postgres/.bash_profile, if desired.

Conclusion

This was an overview of an upgrade of a standalone server from PostgreSQL 10 to version 12.

AutoUpgrade Oracle ->19c

By Brian Fitzgerald

Introduction

AutoUpgrade is a convenient utility for completing multiple database upgrades in parallel. AutoUpgrade was released undocumented in Oracle 18c, and was documented for the first time in the 19c manual:

The AutoUpgrade utility identifies issues before upgrades, deploys upgrades, performs postupgrade actions, and starts the upgraded Oracle Database.

Use AutoUpgrade to upgrade Oracle Database from version 11.2.0.4 and up. To use AutoUpgrade, prepare a simple configuration file and run autoupgrade.jar. Behind the scenes, AutoUpgrade runs all steps necessary to complete the upgrade. AutoUpgrade runs on a single host. If you want to migrate to a separate, new host, run AutoUpgrade preparatory steps on the Source (original) host. Then, copy the database to the new host and run the Upgrade step on the target host.

News about AutoUpgrade

Mike Dietrich is Oracle’s product manager for AutoUpgrade. Refer to his blog for news about AutoUpgrade. Future topics or changes could include coverage of new Oracle versions, RAC, Restart, Data Guard, and migrating directly from non-CDB to CDB.

Demonstration platform description

AutoUpgrade is demonstrated in this blog article. The system attributes of the demonstration system are:

Attribute Value
location US East (N. Virginia)
instanceType c5.xlarge
tenancy shared
instanceFamily Compute optimized
physicalProcessor Intel Xeon Platinum 8124M
clockSpeed 3.0 Ghz
Number of CPUs 4
memGB 8 GiB
operatingSystem Linux
AMI RHEL-7.7_HVM_GA-20190723-x86_64-1-Hourly2-GP2
OS Vendor Red Hat, Inc.
Red Hat version 7.7
price per hour $0.17

Demonstration database environment

Here is a database environment summary:

Description Value
Source version 18.3.0
Target version 19.3.0
Instances THING1 and THING2
Instance type Restart

Concurrency

Be aware that AutoUpgrade is going to launch an upgrade process for each database in the configuration file. The result is multiple concurrent upgrade processes and multiple concurrent active databases. If you are upgrading a few databases on a box with a lot of memory, you will have no problem.

If you have multiple databases on a smaller box, you may find that the load average will reach a high level, and interactive shell response will lag. If you find that AutoUpgrade results in an unmanageable load average, there is more than one solution. For example, you could temporarily shut down some databases while you upgrade other databases. If you are in the cloud, you can get better performance by temporarily running the host image on an instance type with a higher amount of memory and CPU:

aws ec2 stop-instances --instance-ids i-00a836c352bb54daa
aws ec2 modify-instance-attribute --instance-id i-00a836c352bb54daa --instance-type '{"Value": "c5.xlarge"}'
aws ec2 start-instances --instance-ids i-00a836c352bb54daa

Finally, you may use the start_time parameter to schedule database upgrades at different future times.
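The start_time staggering can be computed rather than hand-picked. A sketch that spaces the upgrades a fixed interval apart; the interval is illustrative, and the exact start_time syntax AutoUpgrade accepts should be checked against its documentation:

```python
from datetime import datetime, timedelta

def staggered_starts(first: datetime, count: int, gap_minutes: int = 45):
    """One start time per database, gap_minutes apart, earliest first."""
    return [first + timedelta(minutes=gap_minutes * i) for i in range(count)]
```

Each resulting timestamp becomes the start_time for one upgN section, so no two upgrades begin inside the same window.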

Alias

Run the AutoUpgrade binary out of an Oracle home at the target version. You are going to be put into an interactive command line interface (CLI), and you are going to check on job status multiple times with the “lsj” command. For convenience, use rlwrap and create an alias.

[oracle@ip-172-31-88-93 ~]$ AH=/u01/app/oracle/product/19.3.0/dbhome_1
[oracle@ip-172-31-88-93 ~]$ alias au='rlwrap $AH/jdk/bin/java -jar $AH/rdbms/admin/autoupgrade.jar'

In this article, “au” refers to this alias.

Notice that in the alias, AH refers to an Oracle home where Java version 8 can be found. Java version 8 can be found in an Oracle 12c, 18c, or 19c Oracle home. Be sure to install the latest autoupgrade.jar into that Oracle home.

[oracle@ip-172-31-88-93 dbhome_1]$ $AH/jdk/bin/java -version
java version "1.8.0_201"
Java(TM) SE Runtime Environment (build 1.8.0_201-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.201-b09, mixed mode)

If you want to run AutoUpgrade Analyze or Fixups on an Oracle 11.2.0.4 database that is on a separate host, install Java version 8 and the latest autoupgrade.jar there.

Version

AutoUpgrade distributed with Oracle Database 19.3 is version 20190207.

[oracle@ip-172-31-88-93 ~]$ au -version
build.version 20190207
build.date 2019/02/07 12:35:56
build.label RDBMS_PT.AUTOUPGRADE_LINUX.X64_190205.1800

Version 20190207 contains a bug that leads to a failure of Deploy mode on ASM. You can get the most recent version from Oracle Customer Support: AutoUpgrade Tool (Doc ID 2485457.1). As of this writing, the most recent version is:

[oracle@ip-172-31-88-93 ~]$ au -version
build.hash 67fee5b
build.version 20190823
build.date 2019/08/23 18:08:47
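Installing the downloaded jar into the Oracle home can be scripted. A minimal sketch, keeping the shipped build as a backup; install_autoupgrade is a hypothetical helper and the paths are illustrative:

```shell
# install_autoupgrade ORACLE_HOME NEW_JAR
# Copies NEW_JAR over rdbms/admin/autoupgrade.jar, preserving the
# shipped jar as autoupgrade.jar.orig the first time through.
install_autoupgrade() {
  home="$1"; new_jar="$2"
  jar="$home/rdbms/admin/autoupgrade.jar"
  if [ -f "$jar" ] && [ ! -f "$jar.orig" ]; then
    cp "$jar" "$jar.orig"    # keep the original build for rollback
  fi
  cp "$new_jar" "$jar"
}
```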

Configuration file

A single configuration file will be used to upgrade all instances. You may use AutoUpgrade itself to create a sample configuration file.

[oracle@ip-172-31-88-93 ~]$ au -create_sample_file config
Created sample configuration file /home/oracle/sample_config.cfg

Rename and customize the configuration file. There is one global section and multiple database sections, as many as you want. Parameter dbname actually refers to db_unique_name.

Optional AutoUpgrade configuration parameters deliver extended functionality such as changing database initialization parameters on the fly during upgrade, custom scripts to run before or after upgrade, guaranteed restore point, fixup list modification, environment variables, pluggable database upgrades, optional utlrp run, and alternate tns_admin directory location.

[oracle@ip-172-31-88-93 ~]$ mkdir -p /u01/app/oracle/autoupgrade
[oracle@ip-172-31-88-93 ~]$ cat thing1.thing2.18c.19c.conf

#Global configurations
global.autoupg_log_dir=/u01/app/oracle/autoupgrade

upg1.dbname=THING1
upg1.start_time=now
upg1.source_home=/u01/app/oracle/product/18.3.0/dbhome_1
upg1.target_home=/u01/app/oracle/product/19.3.0/dbhome_1
upg1.sid=THING1
upg1.log_dir=/u01/app/oracle/autoupgrade
upg1.upgrade_node=ip-172-31-88-93.ec2.internal
upg1.target_version=19.3
#upg1.run_utlrp=yes
#upg1.timezone_upg=yes

upg2.dbname=THING2
upg2.start_time=now
upg2.source_home=/u01/app/oracle/product/18.3.0/dbhome_1
upg2.target_home=/u01/app/oracle/product/19.3.0/dbhome_1
upg2.sid=THING2
upg2.log_dir=/u01/app/oracle/autoupgrade
upg2.upgrade_node=ip-172-31-88-93.ec2.internal
upg2.target_version=19.3
#upg2.run_utlrp=[yes|no]
#upg2.timezone_upg=[yes|no]
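With many databases, the per-database sections are repetitive and can be generated. A sketch, trimmed to the core parameters and assuming all databases share the same source and target homes; gen_config is a hypothetical helper:

```shell
# gen_config SOURCE_HOME TARGET_HOME SID...
# Emits one upgN section per SID on stdout.
gen_config() {
  src="$1"; tgt="$2"; shift 2
  n=1
  for sid in "$@"; do
    printf 'upg%d.dbname=%s\n' "$n" "$sid"
    printf 'upg%d.start_time=now\n' "$n"
    printf 'upg%d.source_home=%s\n' "$n" "$src"
    printf 'upg%d.target_home=%s\n' "$n" "$tgt"
    printf 'upg%d.sid=%s\n' "$n" "$sid"
    printf 'upg%d.target_version=19.3\n\n' "$n"
    n=$((n + 1))
  done
}
```

Redirect the output into your configuration file and add the global section and any per-database extras (log_dir, upgrade_node) by hand.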

Recommendation: Identify $ORACLE_BASE. Set the global and all database log directories to $ORACLE_BASE/autoupgrade. AutoUpgrade will create a subdirectory for each database, plus a global one. You will get a directory structure like this:

/u01/app/oracle/autoupgrade/cfgtoollogs/
/u01/app/oracle/autoupgrade/THING1/
/u01/app/oracle/autoupgrade/THING2/

Analyze mode

In AutoUpgrade Analyze mode, the database instances should be running out of the Source Oracle home with the databases open.

[oracle@ip-172-31-88-93 ~]$ au -config thing1.thing2.18c.19c.conf -mode analyze

Output:

Autoupgrade tool launched with default options
+--------------------------------+
| Starting AutoUpgrade execution |
+--------------------------------+
2 databases will be analyzed
Enter some command, type 'help' or 'exit' to quit
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+--------+
|JOB#|DB NAME|    STAGE|OPERATION| STATUS|    START TIME|END TIME| UPDATED| MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+--------+
| 110| THING1|PRECHECKS|PREPARING|RUNNING|19/09/26 14:06|     N/A|14:06:20|Starting|
| 111| THING2|PRECHECKS|PREPARING|RUNNING|19/09/26 14:06|     N/A|14:06:24|Starting|
+----+-------+---------+---------+-------+--------------+--------+--------+--------+
Total jobs 2
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+---------------+
|JOB#|DB NAME|    STAGE|OPERATION| STATUS|    START TIME|END TIME| UPDATED|        MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+---------------+
| 110| THING1|PRECHECKS|PREPARING|RUNNING|19/09/26 14:06|     N/A|14:06:41| Remaining 1/72|
| 111| THING2|PRECHECKS|PREPARING|RUNNING|19/09/26 14:06|     N/A|14:06:42|Remaining 67/72|
+----+-------+---------+---------+-------+--------------+--------+--------+---------------+
Total jobs 2
upg>
Job 110 for THING1 FINISHED
Job 111 for THING2 FINISHED

Locate the report. For example:

[oracle@ip-172-31-88-93 ~]$ find /u01/app/oracle/autoupgrade -name '*.html' -mmin -5
/u01/app/oracle/autoupgrade/global/cfgtoollogs/upgrade/auto/state.html
/u01/app/oracle/autoupgrade/THING1/110/prechecks/thing1_preupgrade.html
/u01/app/oracle/autoupgrade/THING2/111/prechecks/thing2_preupgrade.html

Review the report in a browser.

Prechecks

The report may contain Error, Warning, Recommend, and Info findings. You must address severity Error findings before attempting upgrade.

Use of Analyze mode to identify fixups

The Analyze mode runs prechecks that result in a list of changes that AutoUpgrade will make during the Fixups stage. The planned fixups appear in a “checklist” file in three formats. For example:

THING1/101/prechecks/thing1_checklist.cfg
THING1/101/prechecks/thing1_checklist.json
THING1/101/prechecks/thing1_checklist.xml

Prechecks leading to fixups are:

[oracle@ip-172-31-88-93 autoupgrade]$ < THING1/101/prechecks/thing1_checklist.json jq -r '.containers[].checks[] | select( .fixup_available == "YES" ).checkname '
DICTIONARY_STATS
POST_DICTIONARY
POST_FIXED_OBJECTS
PRE_FIXED_OBJECTS
OLD_TIME_ZONES_EXIST
MANDATORY_UPGRADE_CHANGES
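Conversely, the checks with no automatic fixup are the ones you must address by hand. The same jq pattern works with fixup_available == "NO"; where jq is unavailable, a small awk sketch over the pretty-printed file does the same pairing (manual_checks is a hypothetical helper):

```shell
# manual_checks CHECKLIST_JSON
# Pairs each "checkname" line with the "fixup_available" flag that
# follows it and prints the names of checks AutoUpgrade cannot fix.
# Relies on the one-key-per-line layout of the pretty-printed file.
manual_checks() {
  awk -F'"' '
    /"checkname"/       { name = $4 }
    /"fixup_available"/ { if ($4 == "NO") print name }
  ' "$1"
}
```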

The complete JSON checklist file is presented for reference:

{
  "dbname" : "THING1",
  "containers" : [ {
    "containername" : "THING1",
    "checks" : [ {
      "checkname" : "DICTIONARY_STATS",
      "stage" : "PRECHECKS",
      "fixup_available" : "YES",
      "runfix" : "YES",
      "severity" : "RECOMMEND",
      "action" : "Gather stale data dictionary statistics prior to database upgrade in off-peak time using:    EXECUTE DBMS_STATS.GATHER_DICTIONARY_STATS;",
      "broken rule" : "Dictionary statistics do not exist or are stale (not up-to-date).",
      "rule" : "Dictionary statistics help the Oracle optimizer find efficient SQL execution plans and are essential for proper upgrade timing. Oracle recommends gathering dictionary statistics in the last 24 hours before database upgrade.  For information on managing optimizer statistics, refer to the 18.0.0.0 Oracle Database Upgrade Guide."
    },
    {
      "checkname" : "POST_DICTIONARY",
      "stage" : "POSTCHECKS",
      "fixup_available" : "YES",
      "runfix" : "YES",
      "severity" : "RECOMMEND",
      "action" : "Gather dictionary statistics after the upgrade using the command:    EXECUTE DBMS_STATS.GATHER_DICTIONARY_STATS;",
      "broken rule" : "Oracle recommends gathering dictionary statistics after upgrade.",
      "rule" : "Dictionary statistics provide essential information to the Oracle optimizer to help it find efficient SQL execution plans. After a database upgrade, statistics need to be re-gathered as there can now be tables that have significantly changed during the upgrade or new tables that do not have statistics gathered yet."
    },
    {
      "checkname" : "POST_FIXED_OBJECTS",
      "stage" : "POSTCHECKS",
      "fixup_available" : "YES",
      "runfix" : "YES",
      "severity" : "RECOMMEND",
      "action" : "Gather statistics on fixed objects after the upgrade and when there is a representative workload on the system using the command:    EXECUTE DBMS_STATS.GATHER_FIXED_OBJECTS_STATS;",
      "broken rule" : "This recommendation is given for all preupgrade runs.",
      "rule" : "Fixed object statistics provide essential information to the Oracle optimizer to help it find efficient SQL execution plans.  Those statistics are specific to the Oracle Database release that generates them, and can be stale upon database upgrade.  For information on managing optimizer statistics, refer to the 18.0.0.0 Oracle Database Upgrade Guide."
    },
    {
      "checkname" : "PRE_FIXED_OBJECTS",
      "stage" : "PRECHECKS",
      "fixup_available" : "YES",
      "runfix" : "YES",
      "severity" : "RECOMMEND",
      "action" : "Gather statistics on fixed objects prior the upgrade.",
      "broken rule" : "None of the fixed object tables have had stats collected.",
      "rule" : "Gathering statistics on fixed objects, if none have been gathered yet, is recommended prior to upgrading.  For information on managing optimizer statistics, refer to the 18.0.0.0 Oracle Database Upgrade Guide."
    },
    {
      "checkname" : "OLD_TIME_ZONES_EXIST",
      "stage" : "POSTCHECKS",
      "fixup_available" : "YES",
      "runfix" : "YES",
      "severity" : "WARNING",
      "action" : "Upgrade the database time zone file using the DBMS_DST package.",
      "broken rule" : "The database is using time zone file version 31 and the target 19 release ships with time zone file version 32.",
      "rule" : "Oracle recommends upgrading to the desired (latest) version of the time zone file.  For more information, refer to 'Upgrading the Time Zone File and Timestamp with Time Zone Data' in the 19 Oracle Database Globalization Support Guide."
    },
    {
      "checkname" : "MANDATORY_UPGRADE_CHANGES",
      "stage" : "PRECHECKS",
      "fixup_available" : "YES",
      "runfix" : "YES",
      "severity" : "INFO",
      "action" : "Mandatory changes are applied automatically in the during_upgrade_pfile_dbname.ora file.  Some of these changes maybe present in the after_upgrade_pfile_dbname.ora file.  The during_upgrade_pfile_dbname.ora is used to start the database in upgrade mode. The after_upgrade_pfile_dbname.ora is used to start the database once the upgrade has completed successfully.",
      "broken rule" : "",
      "rule" : "Mandatory changes are required to perform the upgrade.  These changes are implemented in the during_ and after_upgrade_pfile_dbname.ora files."
    },
    {
      "checkname" : "RMAN_RECOVERY_VERSION",
      "stage" : "PRECHECKS",
      "fixup_available" : "NO",
      "runfix" : "N/A",
      "severity" : "INFO",
      "action" : "Check the Oracle Backup and Recovery User's Guide for information on how to manage an RMAN recovery catalog schema.",
      "broken rule" : "If you are using a version of the recovery catalog schema that is older than that required by the RMAN client version, then you must upgrade the catalog schema.",
      "rule" : "It is good practice to have the catalog schema the same or higher version than the RMAN client version you are using."
    },
    {
      "checkname" : "TABLESPACES_INFO",
      "stage" : "PRECHECKS",
      "fixup_available" : "NO",
      "runfix" : "N/A",
      "severity" : "INFO",
      "action" : "To help you keep track of your tablespace allocations, the following AUTOEXTEND tablespaces are expected to successfully EXTEND during the upgrade process.",
      "broken rule" : "",
      "rule" : "Minimum tablespace sizes for upgrade are estimates."
    },
    {
      "checkname" : "DIR_SYMLINKS",
      "stage" : "POSTCHECKS",
      "fixup_available" : "NO",
      "runfix" : "N/A",
      "severity" : "WARNING",
      "action" : "To identify directory objects with symbolic links in the path name, run $ORACLE_HOME/rdbms/admin/utldirsymlink.sql AS SYSDBA after upgrade. Recreate any directory objects listed, using path names that contain no symbolic links.",
      "broken rule" : "Some directory object path names may currently contain symbolic links.",
      "rule" : "Starting in Release 18c, symbolic links are not allowed in directory object path names used with BFILE data types, the UTL_FILE package, or external tables."
    }]
  }]
}

Fixup mode

Prior to upgrading, you may run the fixups that are recommended, possible, and available. You can see what fixups will run by running Analyze mode and reviewing the checklist file. In AutoUpgrade Fixup mode, the database instances should be running out of the Source Oracle home with the databases open.

[oracle@ip-172-31-88-93 ~]$ au -config thing1.thing2.18c.19c.conf -mode fixups

The console session:

[oracle@ip-172-31-88-93 ~]$ au -config thing1.thing2.18c.19c.conf -mode fixups
AutoUpgrade tool launched with default options
+--------------------------------+
| Starting AutoUpgrade execution |
+--------------------------------+
2 databases will be processed
Type 'help' to list console commands
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+---------------+
|Job#|DB_NAME|    STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|        MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+---------------+
| 102| THING2|PRECHECKS|PREPARING|RUNNING|19/09/29 01:04|     N/A|01:04:59|Loading DB info|
| 103| THING1|PRECHECKS|PREPARING|RUNNING|19/09/29 01:05|     N/A|01:05:03|Loading DB info|
+----+-------+---------+---------+-------+--------------+--------+--------+---------------+
Total jobs 2
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+-------------+
|Job#|DB_NAME|    STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|      MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+-------------+
| 102| THING2|PREFIXUPS|EXECUTING|RUNNING|19/09/29 01:04|     N/A|01:06:28|Remaining 3/4|
| 103| THING1|PREFIXUPS|EXECUTING|RUNNING|19/09/29 01:05|     N/A|01:05:21|Remaining 4/4|
+----+-------+---------+---------+-------+--------------+--------+--------+-------------+
Total jobs 2
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+-------------+
|Job#|DB_NAME|    STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|      MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+-------------+
| 102| THING2|PREFIXUPS|EXECUTING|RUNNING|19/09/29 01:04|     N/A|01:06:28|Remaining 3/4|
| 103| THING1|PREFIXUPS|EXECUTING|RUNNING|19/09/29 01:05|     N/A|01:06:36|Remaining 3/4|
+----+-------+---------+---------+-------+--------------+--------+--------+-------------+
Total jobs 2
upg>
upg> Job 102 completed
Job 103 completed
------------------- Final Summary --------------------
Number of databases            [ 2 ]

Jobs finished successfully     [2]
Jobs failed                    [0]
Jobs pending                   [0]
------------- JOBS FINISHED SUCCESSFULLY -------------
Job 102 FOR THING2
Job 103 FOR THING1

Space requirements

You must have sufficient space for archived redo logs. In this example, I upgraded two databases (6.7 GB), and consumed 9.1 GB in RECO. Diskgroup RECO01 was 40 GB and each of two databases had db_recovery_file_dest_size=20g.

Upgrade vs Deploy mode

Two upgrade modes are available, Upgrade and Deploy.

Upgrade mode performs the actual upgrade. You should run the Analyze and Fixup steps first. Upgrade mode is an appropriate choice when the upgrade will be performed on a different host from the Source version. Upgrade mode requires that you manually issue “startup upgrade” in the new Oracle home.

Deploy mode performs the Analyze, Fixup, and Upgrade steps, and additional steps, on a single host in a single execution.

Upgrade mode

Startup upgrade

If you are going to use Upgrade mode, then switch the environment to the target Oracle home. Issue “startup upgrade”.

[ec2-user@ip-172-31-88-93 ~]$ cat /tmp/initTHING1.ora
spfile='+DATA01/THING1/PARAMETERFILE/spfile.276.1019927455'
[oracle@ip-172-31-88-93 ~]$ unset ORACLE_SID 
[oracle@ip-172-31-88-93 ~]$ . oraenv 
ORACLE_SID = [oracle] ? 
ORACLE_HOME = [/home/oracle] ? /u01/app/oracle/product/19.3.0/dbhome_1 
The Oracle base remains unchanged with value /u01/app/oracle 
[oracle@ip-172-31-88-93 ~]$ export ORACLE_SID=THING1 
[oracle@ip-172-31-88-93 ~]$ sysdba 
SQL*Plus: Release 19.0.0.0.0 - Production on Wed Sep 25 19:03:59 2019 
Version 19.3.0.0.0 

Copyright (c) 1982, 2019, Oracle.  All rights reserved. 

Connected to an idle instance. 
SQL> startup upgrade pfile='/tmp/initTHING1.ora'
ORACLE instance started.

Total System Global Area 1577054672 bytes
Fixed Size                  8896976 bytes
Variable Size             385875968 bytes
Database Buffers         1174405120 bytes
Redo Buffers                7876608 bytes
Database mounted.
Database opened.

SQL> Disconnected from Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0

[oracle@ip-172-31-88-93 ~]$ export ORACLE_SID=THING2
[ec2-user@ip-172-31-88-93 ~]$ cat /tmp/initTHING2.ora
spfile='+DATA01/THING2/PARAMETERFILE/spfile.257.1019928401'
[oracle@ip-172-31-88-93 ~]$ sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Wed Sep 25 19:06:08 2019
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup upgrade pfile='/tmp/initTHING2.ora'
ORACLE instance started.

Total System Global Area 1577054672 bytes
Fixed Size                  8896976 bytes
Variable Size             385875968 bytes
Database Buffers         1174405120 bytes
Redo Buffers                7876608 bytes
Database mounted.
Database opened.
SQL> Disconnected from Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0

Run Upgrade

Start AutoUpgrade. After the upgrade has been launched in the background, you will be dropped into CLI mode. Run “lsj” from time to time to check on status.

[oracle@ip-172-31-88-93 ~]$ au -config thing1.thing2.18c.19c.conf -mode upgrade
Autoupgrade tool launched with default options
+--------------------------------+
| Starting AutoUpgrade execution |
+--------------------------------+
2 databases will be processed
Enter some command, type 'help' or 'exit' to quit
upg> tasks
+--+---------------------+-------------+
|ID|                 Name|       Status|
+--+---------------------+-------------+
| 1|                 main|      WAITING|
|33|             jobs_mon|      WAITING|
|34|             mgr_help|TIMED_WAITING|
|35|             watchdog|TIMED_WAITING|
|36|              console|     RUNNABLE|
|37|         queue_reader|      WAITING|
|38|                cmd-0|      WAITING|
|39|        job_manager-0|      WAITING|
|40|        job_manager-1|      WAITING|
|42|           bqueue-108|      WAITING|
|50|       monitor_thing1|TIMED_WAITING|
|51|        catctl_thing1|      WAITING|
|52| abort_monitor_thing1|TIMED_WAITING|
|54|           async_read|     RUNNABLE|
|55|           bqueue-109|      WAITING|
|58|           async_read|     RUNNABLE|
+--+---------------------+-------------+
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+-------+
|JOB#|DB NAME|    STAGE|OPERATION| STATUS|    START TIME|END TIME| UPDATED|MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+-------+
| 108| THING1|DBUPGRADE|EXECUTING|RUNNING|19/09/25 19:09|     N/A|19:09:39|Running|
| 109| THING2|DBUPGRADE|EXECUTING|RUNNING|19/09/25 19:09|     N/A|19:09:42|Running|
+----+-------+---------+---------+-------+--------------+--------+--------+-------+
Total jobs 2
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
|JOB#|DB NAME|    STAGE|OPERATION| STATUS|    START TIME|END TIME| UPDATED|     MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
| 108| THING1|DBUPGRADE|EXECUTING|RUNNING|19/09/25 19:09|     N/A|19:12:46|13%Upgraded |
| 109| THING2|DBUPGRADE|EXECUTING|RUNNING|19/09/25 19:09|     N/A|19:12:50|14%Upgraded |
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
Total jobs 2
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
|JOB#|DB NAME|    STAGE|OPERATION| STATUS|    START TIME|END TIME| UPDATED|     MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
| 108| THING1|DBUPGRADE|EXECUTING|RUNNING|19/09/25 19:09|     N/A|19:18:48|41%Upgraded |
| 109| THING2|DBUPGRADE|EXECUTING|RUNNING|19/09/25 19:09|     N/A|19:18:52|42%Upgraded |
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
Total jobs 2
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
|JOB#|DB NAME|    STAGE|OPERATION| STATUS|    START TIME|END TIME| UPDATED|     MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
| 108| THING1|DBUPGRADE|EXECUTING|RUNNING|19/09/25 19:09|     N/A|19:21:49|49%Upgraded |
| 109| THING2|DBUPGRADE|EXECUTING|RUNNING|19/09/25 19:09|     N/A|19:21:53|49%Upgraded |
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
Total jobs 2
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
|JOB#|DB NAME|    STAGE|OPERATION| STATUS|    START TIME|END TIME| UPDATED|     MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
| 108| THING1|DBUPGRADE|EXECUTING|RUNNING|19/09/25 19:09|     N/A|19:34:21| 0%Compiled |
| 109| THING2|DBUPGRADE|EXECUTING|RUNNING|19/09/25 19:09|     N/A|19:33:12|94%Upgraded |
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
Total jobs 2
upg> lsj
+----+-------+----------+---------+-------+--------------+--------+--------+-------+
|JOB#|DB NAME|     STAGE|OPERATION| STATUS|    START TIME|END TIME| UPDATED|MESSAGE|
+----+-------+----------+---------+-------+--------------+--------+--------+-------+
| 108| THING1|POSTFIXUPS|EXECUTING|RUNNING|19/09/25 19:09|     N/A|19:38:39|       |
| 109| THING2|POSTFIXUPS|EXECUTING|RUNNING|19/09/25 19:09|     N/A|19:38:39|       |
+----+-------+----------+---------+-------+--------------+--------+--------+-------+
Total jobs 2
upg> lsj
+----+-------+----------+---------+-------+--------------+--------+--------+-------------+
|JOB#|DB NAME|     STAGE|OPERATION| STATUS|    START TIME|END TIME| UPDATED|      MESSAGE|
+----+-------+----------+---------+-------+--------------+--------+--------+-------------+
| 108| THING1|POSTFIXUPS|EXECUTING|RUNNING|19/09/25 19:09|     N/A|19:40:52|Remaining 1/3|
| 109| THING2|POSTFIXUPS|EXECUTING|RUNNING|19/09/25 19:09|     N/A|19:40:55|Remaining 1/3|
+----+-------+----------+---------+-------+--------------+--------+--------+-------------+
Total jobs 2
upg> lsj
+----+-------+----------+---------+-------+--------------+--------+--------+-------------+
|JOB#|DB NAME|     STAGE|OPERATION| STATUS|    START TIME|END TIME| UPDATED|      MESSAGE|
+----+-------+----------+---------+-------+--------------+--------+--------+-------------+
| 108| THING1|POSTFIXUPS|EXECUTING|RUNNING|19/09/25 19:09|     N/A|19:40:52|Remaining 1/3|
| 109| THING2|POSTFIXUPS|EXECUTING|RUNNING|19/09/25 19:09|     N/A|19:40:55|Remaining 1/3|
+----+-------+----------+---------+-------+--------------+--------+--------+-------------+
Total jobs 2
upg>
Job 108 for THING1 FINISHED
Job 109 for THING2 FINISHED

The CLI exits when all jobs are done.

Execution time

You can find out the execution time by reviewing the logs.

/u01/app/oracle/autoupgrade/global/cfgtoollogs/upgrade/auto/autoupgrade_user.log

2019-09-25 19:09:22.185 INFO Loading user config file metadata
2019-09-25 19:09:28.405 INFO The target_base parameter was updated from N/A to /u01/app/oracle due to finding a more accurate value.
2019-09-25 19:09:28.418 INFO The target_version parameter was updated from 19.3 to 19.3.0.0.0 due to finding a more accurate value.
2019-09-25 19:09:29.444 INFO Finished processing dbEntry upg1
2019-09-25 19:09:35.597 INFO The target_base parameter was updated from N/A to /u01/app/oracle due to finding a more accurate value.
2019-09-25 19:09:35.602 INFO The target_version parameter was updated from 19.3 to 19.3.0.0.0 due to finding a more accurate value.
2019-09-25 19:09:36.627 INFO Finished processing dbEntry upg2
2019-09-25 19:09:36.634 INFO
build.version:20190207
build.date:2019/02/07 12:35:56
build.label:RDBMS_PT.AUTOUPGRADE_LINUX.X64_190205.1800

2019-09-25 19:09:36.635 INFO Current settings Initialized
2019-09-25 19:09:36.658 INFO Starting
2019-09-25 19:30:41.216 INFO Could not determine the flashback info
2019-09-25 19:30:43.240 INFO Could not determine the flashback info
2019-09-25 19:32:41.213 INFO Could not determine the flashback info
2019-09-25 19:32:43.237 INFO Could not determine the flashback info
2019-09-25 19:36:41.220 INFO Could not determine the flashback info
2019-09-25 19:36:43.253 INFO Could not determine the flashback info
2019-09-25 19:37:40.213 INFO Could not determine the flashback info
2019-09-25 19:37:41.237 INFO Could not determine the flashback info
2019-09-25 19:41:40.213 INFO Could not determine the flashback info
2019-09-25 19:41:41.238 INFO Could not determine the flashback info
2019-09-25 19:42:41.212 INFO Could not determine the flashback info
2019-09-25 19:42:43.235 INFO Could not determine the flashback info
2019-09-25 19:44:15.396 INFO Closing

Elapsed time was 34 minutes.
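The arithmetic can be scripted against autoupgrade_user.log. A sketch that assumes GNU date and the timestamp format shown above; elapsed_minutes is a hypothetical helper:

```shell
# elapsed_minutes LOG_FILE
# Prints whole minutes between the first and last log timestamps.
# Assumes lines start with "YYYY-MM-DD HH:MM:SS.mmm" and GNU date -d.
elapsed_minutes() {
  log="$1"
  first=$(head -1 "$log" | awk '{print $1" "$2}')
  last=$(tail -1 "$log" | awk '{print $1" "$2}')
  s=$(date -d "${first%.*}" +%s)   # strip the .mmm fraction
  e=$(date -d "${last%.*}" +%s)
  echo $(( (e - s) / 60 ))
}
```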

Deploy mode

Deploy mode completes all upgrade steps from soup to nuts: Analyze, Fixup, and Upgrade.

Additional steps in Deploy mode

Deploy mode includes steps that are not covered by Analyze, Fixup, and Upgrade.

A careful review of the documentation reveals that Deploy mode implements a guaranteed restore point. You must exercise diligence to remove the restore point after it is no longer required, or you may optionally configure drop_grp_after_upgrade=yes.

Deploy mode contains a drain step, during which AutoUpgrade drains database sessions from the source instance.

Preupgrade refers to checks of your system, including disk space.

The Postupgrade documentation refers to moving the source configuration file and starting the upgraded instance. However, Upgrade mode also starts up your upgraded instance.

Deploy mode runs available fixups that correct Warning, Recommend, and Info precheck findings. AutoUpgrade makes these changes without asking you. If you are using change management, be aware of, and document, all changes, as required by your organization.

Run Deploy

Run deploy from the Source Oracle home with the databases opened normally. In other words, do not issue “startup upgrade” with Deploy mode.

[oracle@ip-172-31-88-93 ~]$ au -config thing1.thing2.18c.19c.conf -mode deploy
AutoUpgrade tool launched with default options
+--------------------------------+
| Starting AutoUpgrade execution |
+--------------------------------+
2 databases will be processed
Type 'help' to list console commands
upg> lsj
+----+-------+----------+---------+--------+--------------+--------+--------+---------+
|Job#|DB_NAME|     STAGE|OPERATION|  STATUS|    START_TIME|END_TIME| UPDATED|  MESSAGE|
+----+-------+----------+---------+--------+--------------+--------+--------+---------+
| 114| THING2|PREUPGRADE|EXECUTING| RUNNING|19/09/26 14:41|     N/A|14:41:48|         |
| 115| THING1|     SETUP|PREPARING|FINISHED|19/09/26 14:43|     N/A|14:41:47|Scheduled|
+----+-------+----------+---------+--------+--------------+--------+--------+---------+
Total jobs 2
upg> lsj
+----+-------+---------+---------+--------+--------------+--------+--------+-------------+
|Job#|DB_NAME|    STAGE|OPERATION|  STATUS|    START_TIME|END_TIME| UPDATED|      MESSAGE|
+----+-------+---------+---------+--------+--------------+--------+--------+-------------+
| 114| THING2|PREFIXUPS|EXECUTING| RUNNING|19/09/26 14:41|     N/A|14:42:25|Remaining 4/4|
| 115| THING1|    SETUP|PREPARING|FINISHED|19/09/26 14:43|     N/A|14:41:47|    Scheduled|
+----+-------+---------+---------+--------+--------------+--------+--------+-------------+
Total jobs 2
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+-------------+
|Job#|DB_NAME|    STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|      MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+-------------+
| 114| THING2|PREFIXUPS|EXECUTING|RUNNING|19/09/26 14:41|     N/A|14:42:25|Remaining 4/4|
| 115| THING1|PREFIXUPS|EXECUTING|RUNNING|19/09/26 14:43|     N/A|14:43:51|Remaining 4/4|
+----+-------+---------+---------+-------+--------------+--------+--------+-------------+
Total jobs 2
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+-------------+
|Job#|DB_NAME|    STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|      MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+-------------+
| 114| THING2|PREFIXUPS|EXECUTING|RUNNING|19/09/26 14:41|     N/A|14:45:58|Remaining 3/4|
| 115| THING1|PREFIXUPS|EXECUTING|RUNNING|19/09/26 14:43|     N/A|14:47:27|Remaining 3/4|
+----+-------+---------+---------+-------+--------------+--------+--------+-------------+
Total jobs 2
upg> tasks
+---+------------------+-------------+
| ID|              NAME|         Job#|
+---+------------------+-------------+
|  1|              main|      WAITING|
| 40|          jobs_mon|      WAITING|
| 41|           console|     RUNNABLE|
| 42|      queue_reader|      WAITING|
| 43|             cmd-0|      WAITING|
| 54|     job_manager-0|      WAITING|
| 55|     job_manager-1|      WAITING|
| 58|        event_loop|TIMED_WAITING|
| 59|        bqueue-114|      WAITING|
|200|         exec_loop|      WAITING|
|201|        bqueue-115|      WAITING|
|337|        fixups-115|      WAITING|
|338|    rep_checks-115|TIMED_WAITING|
|340|    thing1-puifx-0|      WAITING|
|341|    thing1-puifx-1|      WAITING|
|353|          quickSQL|     RUNNABLE|
|388|THING2-steady-ts-0|     RUNNABLE|
|400|          quickSQL|   TERMINATED|
+---+------------------+-------------+
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+-----------+
|Job#|DB_NAME|    STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|    MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+-----------+
| 114| THING2|DBUPGRADE|EXECUTING|RUNNING|19/09/26 14:41|     N/A|14:51:53|0%Upgraded |
| 115| THING1|DBUPGRADE|EXECUTING|RUNNING|19/09/26 14:43|     N/A|14:52:30|0%Upgraded |
+----+-------+---------+---------+-------+--------------+--------+--------+-----------+
Total jobs 2
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
|Job#|DB_NAME|    STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|     MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
| 114| THING2|DBUPGRADE|EXECUTING|RUNNING|19/09/26 14:41|     N/A|14:57:58|21%Upgraded |
| 115| THING1|DBUPGRADE|EXECUTING|RUNNING|19/09/26 14:43|     N/A|14:58:35|18%Upgraded |
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
Total jobs 2
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
|Job#|DB_NAME|    STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|     MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
| 114| THING2|DBUPGRADE|EXECUTING|RUNNING|19/09/26 14:41|     N/A|15:01:01|21%Upgraded |
| 115| THING1|DBUPGRADE|EXECUTING|RUNNING|19/09/26 14:43|     N/A|15:01:37|21%Upgraded |
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
Total jobs 2
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
|Job#|DB_NAME|    STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|     MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
| 114| THING2|DBUPGRADE|EXECUTING|RUNNING|19/09/26 14:41|     N/A|15:07:06|37%Upgraded |
| 115| THING1|DBUPGRADE|EXECUTING|RUNNING|19/09/26 14:43|     N/A|15:07:42|37%Upgraded |
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
Total jobs 2
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
|Job#|DB_NAME|    STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|     MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
| 114| THING2|DBUPGRADE|EXECUTING|RUNNING|19/09/26 14:41|     N/A|15:13:10|49%Upgraded |
| 115| THING1|DBUPGRADE|EXECUTING|RUNNING|19/09/26 14:43|     N/A|15:13:46|49%Upgraded |
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
Total jobs 2
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
|Job#|DB_NAME|    STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|     MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
| 114| THING2|DBUPGRADE|EXECUTING|RUNNING|19/09/26 14:41|     N/A|15:22:17|75%Upgraded |
| 115| THING1|DBUPGRADE|EXECUTING|RUNNING|19/09/26 14:43|     N/A|15:22:53|75%Upgraded |
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
Total jobs 2
upg> tasks
+---+---------------------+-------------+
| ID|                 NAME|         Job#|
+---+---------------------+-------------+
|  1|                 main|      WAITING|
| 40|             jobs_mon|      WAITING|
| 41|              console|     RUNNABLE|
| 42|         queue_reader|      WAITING|
| 43|                cmd-0|      WAITING|
| 54|        job_manager-0|      WAITING|
| 55|        job_manager-1|      WAITING|
| 58|           event_loop|      WAITING|
| 59|           bqueue-114|      WAITING|
|200|            exec_loop|TIMED_WAITING|
|201|           bqueue-115|      WAITING|
|634|       monitor_thing2|TIMED_WAITING|
|635|        catctl_THING2|      WAITING|
|636| abort_monitor_THING2|TIMED_WAITING|
|637|           async_read|     RUNNABLE|
|649|       monitor_thing1|TIMED_WAITING|
|650|        catctl_THING1|      WAITING|
|651| abort_monitor_THING1|TIMED_WAITING|
|652|           async_read|     RUNNABLE|
|772|             quickSQL|     RUNNABLE|
+---+---------------------+-------------+
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
|Job#|DB_NAME|    STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|     MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
| 114| THING2|DBUPGRADE|EXECUTING|RUNNING|19/09/26 14:41|     N/A|15:28:21|91%Upgraded |
| 115| THING1|DBUPGRADE|EXECUTING|RUNNING|19/09/26 14:43|     N/A|15:28:58|91%Upgraded |
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
Total jobs 2
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
|Job#|DB_NAME|    STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|     MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
| 114| THING2|DBUPGRADE|EXECUTING|RUNNING|19/09/26 14:41|     N/A|15:38:14|92%Compiled |
| 115| THING1|DBUPGRADE|EXECUTING|RUNNING|19/09/26 14:43|     N/A|15:38:34|79%Compiled |
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
Total jobs 2
upg> lsj
+----+-------+---------+---------+--------+--------------+--------+--------+-------------------+
|Job#|DB_NAME|    STAGE|OPERATION|  STATUS|    START_TIME|END_TIME| UPDATED|            MESSAGE|
+----+-------+---------+---------+--------+--------------+--------+--------+-------------------+
| 114| THING2|DBUPGRADE|EXECUTING|FINISHED|19/09/26 14:41|     N/A|15:46:19|Restarting Database|
| 115| THING1|DBUPGRADE|EXECUTING| RUNNING|19/09/26 14:43|     N/A|15:46:40|       98%Compiled |
+----+-------+---------+---------+--------+--------------+--------+--------+-------------------+
Total jobs 2
upg> lsj
+----+-------+----------+---------+-------+--------------+--------+--------+-------------+
|Job#|DB_NAME|     STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|      MESSAGE|
+----+-------+----------+---------+-------+--------------+--------+--------+-------------+
| 114| THING2|POSTFIXUPS|EXECUTING|RUNNING|19/09/26 14:41|     N/A|15:51:05|Remaining 1/3|
| 115| THING1|POSTFIXUPS|EXECUTING|RUNNING|19/09/26 14:43|     N/A|15:51:29|Remaining 1/3|
+----+-------+----------+---------+-------+--------------+--------+--------+-------------+
Total jobs 2
upg> Job 114 completed
lsj
+----+-------+-----------+---------+--------+--------------+--------------+--------+-----------------+
|Job#|DB_NAME|      STAGE|OPERATION|  STATUS|    START_TIME|      END_TIME| UPDATED|          MESSAGE|
+----+-------+-----------+---------+--------+--------------+--------------+--------+-----------------+
| 114| THING2|POSTUPGRADE|  STOPPED|FINISHED|19/09/26 14:41|19/09/26 15:55|15:55:23|Completed job 114|
| 115| THING1|POSTUPGRADE|EXECUTING| RUNNING|19/09/26 14:43|           N/A|15:54:57|       Restarting|
+----+-------+-----------+---------+--------+--------------+--------------+--------+-----------------+
Total jobs 2
upg> Job 115 completed
------------------- Final Summary --------------------
Number of databases            [ 2 ]

Jobs finished successfully     [2]
Jobs failed                    [0]
Jobs pending                   [0]
------------- JOBS FINISHED SUCCESSFULLY -------------
Job 114 FOR THING2
Job 115 FOR THING1

---- Drop GRP at your convenience once you consider it is no longer needed ----
Drop GRP from THING2: drop restore point AUTOUPGRADE_221145114461854_THING2
Drop GRP from THING1: drop restore point AUTOUPGRADE_221145114461854_THING1

[oracle@ip-172-31-88-93 ~]$

The final state is all databases upgraded and open.

In Deploy mode, we have a new issue. There is now a guaranteed restore point. A guaranteed restore point retains its flashback logs in the fast recovery area, so unless you drop the restore point, the area will eventually fill and you will get a stuck archiver.

SQL> select guarantee_flashback_database gua, name from v$restore_point;

GUA NAME
--- ----------------------------------------
YES AUTOUPGRADE_221145114461854_THING1

Loose ends

There are items that you must tidy up manually, even if you use AutoUpgrade in Deploy mode:

  • CRS still points to the old Oracle home
  • /etc/oratab still points to the old Oracle home
  • In Deploy mode, a guaranteed restore point exists

Run “srvctl upgrade” so that CRS points to the new Oracle home.

[oracle@ip-172-31-88-93 ~]$ srvctl upgrade database -database THING1 -oraclehome /u01/app/oracle/product/19.3.0/dbhome_1
[oracle@ip-172-31-88-93 ~]$ srvctl upgrade database -database THING2 -oraclehome /u01/app/oracle/product/19.3.0/dbhome_1

You can edit /etc/oratab manually, or run “srvctl start database” and let the agent update it:

[oracle@ip-172-31-88-93 ~]$ srvctl start database -database THING1
[oracle@ip-172-31-88-93 ~]$ srvctl start database -database THING2
[oracle@ip-172-31-88-93 ~]$ grep THING /etc/oratab
THING1:/u01/app/oracle/product/19.3.0/dbhome_1:N                # line added by Agent
THING2:/u01/app/oracle/product/19.3.0/dbhome_1:N                # line added by Agent

Drop the restore points

SQL> drop restore point AUTOUPGRADE_221145114461854_THING1;

Restore point dropped.

and

SQL> drop restore point AUTOUPGRADE_221145114461854_THING2;

Restore point dropped.

Bug in distribution version

The AutoUpgrade that is distributed with Oracle 19.3 does not work well with ASM. Analyze will produce a false error and Deploy mode will fail.

Recommendation

If you are using ASM, please download the most recent autoupgrade.jar version or use Upgrade mode only. If you use Upgrade mode, you must issue “startup upgrade” from the new Oracle home.
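Before deciding whether to replace the jar, it helps to see which build you have. The sketch below uses the Oracle home path from this article and the standard jar location under the home; it only prints the command rather than running it, since the check must be done on the database host:

```shell
# Hypothetical Oracle home, matching the path used in this article
ORACLE_HOME=/u01/app/oracle/product/19.3.0/dbhome_1
jar="$ORACLE_HOME/rdbms/admin/autoupgrade.jar"
# On the database host, print the build date before trusting the jar:
#   java -jar "$jar" -version
echo "$jar"
```

If the build predates the current download on My Oracle Support, replace it before running Analyze or Deploy.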

Conclusion

AutoUpgrade is an easy-to-use utility that upgrades multiple Oracle databases in the background. Be aware of these points:

  • Do not use the distribution autoupgrade.jar. Download the latest version.
  • On small hosts with multiple instances, manage CPU and memory.
  • Check that your database is in archivelog mode.
  • Allocate sufficient space for archive logs.
  • Use an alias to simplify the command line.
  • Use rlwrap for a more efficient CLI experience.
  • Use Analyze mode to identify errors, warnings, and fixups.
  • Use Fixup mode to apply fixups ahead of time.
  • Use Upgrade mode if you wish to handle analyze, fixup, and upgrade as separate steps.
  • Use Upgrade mode if you wish to migrate to a new host.
  • Handle all steps automatically by using Deploy mode.
  • Be aware of fixups that will be made in Deploy mode.
  • Monitor progress at the CLI prompt with “lsj”.
  • Tie up loose ends:
    • srvctl upgrade
    • edit oratab
    • drop restore point


Unexpected behavior in Data Guard Maximum Protection

By Brian Fitzgerald

Introduction

Maximum Protection mode provides the highest level of data protection in Data Guard. Although a Maximum Protection system protects data as documented, there are performance issues that database administrators should be aware of. The most glaring weakness is that a network timeout anywhere in the system causes the primary database to hang. Administrators should understand this before deciding on Maximum Protection mode.

Protection Modes

To quote the manual, Oracle Data Guard ensures high availability, data protection, and disaster recovery for enterprise data. Oracle Data Guard provides three distinct modes of data protection.

Maximum Performance

This is the default protection mode. It provides the highest level of data protection that is possible without affecting the performance of a primary database. 

Maximum Availability

This protection mode provides the highest level of data protection that is possible without compromising the availability of a primary database. 

Maximum Protection

This protection mode ensures that no data loss occurs if the primary database fails. To provide this level of protection, the redo data needed to recover a transaction must be written to both the online redo log and to the standby redo log on at least one synchronized standby database before the transaction commits. To ensure that data loss cannot occur, the primary database shuts down, rather than continue processing transactions, if it cannot write its redo stream to at least one synchronized standby database.

This article is about Maximum Protection mode. From the description, a few points about Maximum Protection can be deduced immediately. First, a practical system must have more than one standby; otherwise, failure of the sole standby will shut down the primary. Second, a low-latency network is required to achieve reasonable commit performance. Finally, completely redundant hosts and storage are required for each standby. In the past, with physical hardware and long administrative lead times, it was prohibitive to experiment with Maximum Protection. In the cloud, however, virtual machines are inexpensive and setup is quick.

License

Data Guard is a feature of Oracle Database Enterprise Edition itself and does not require separate licensing. By using Amazon Elastic Compute Cloud (EC2), you can control your license costs by configuring only the CPUs that you need.

Overview

The demonstration system characteristics are:

  • Amazon AWS
  • Region us-west-1 (Northern California)
  • Availability zones us-west-1b and us-west-1c
  • Elastic Compute Cloud (EC2)
  • Red Hat Linux 7.2
  • Oracle Grid Infrastructure 19c
  • Oracle Database 19c

AWS availability zones are separate locations within the same geographic region. In us-west-1, intra-availability-zone TCP latency, as measured by qperf, is 136μs, which is attractive for setting up Oracle Data Guard Maximum Protection. Inter-availability-zone TCP latency is 562μs. Note that availability zone identifiers are mapped independently for each account; for example, availability zone us-west-1b could refer to different locations for different accounts. Here is the mapping from database to availability zone that was used in this test.

db unique name   availability zone
SFB1             us-west-1b
SFB2             us-west-1b
SFC1             us-west-1c
SFC2             us-west-1c
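The latency figures above were taken with qperf. A minimal sketch of the measurement, and of pulling the number out of the output, follows; the hostname is a placeholder and the sample output is illustrative:

```shell
# On the target instance, start the qperf daemon:
#   qperf
# On the measuring instance, run the TCP latency test:
#   qperf ip-172-31-0-10.us-west-1.compute.internal tcp_lat
# qperf prints output of this shape:
sample='tcp_lat:
    latency  =  136 us'
# Extract the numeric latency in microseconds:
lat_us=$(printf '%s\n' "$sample" | awk '/latency/ {print $3}')
echo "$lat_us us"
```

Remember to allow the qperf control and data ports in the security group while measuring.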

Setup

Here is a brief summary of the preparatory steps:

  • environment:
    • create and configure virtual machine
    • install grid infrastructure
    • create ASM disk groups
    • install oracle database software
    • setup static listener
  • duplicate database
    • setup tnsnames.ora
    • create orapwd
    • create temporary init ora
    • startup nomount
    • duplicate target database for standby
  • setup Oracle Restart
  • validate all static listeners
  • start Data Guard broker

Create configuration

DGMGRL> create configuration 'CFG' primary database is 'SFB1' connect identifier is 'SFB1';
Configuration "CFG" created with primary database "SFB1"
DGMGRL> add database 'SFB2' as connect identifier is 'SFB2';
Database "SFB2" added
DGMGRL> add database 'SFC1' as connect identifier is 'SFC1';
Database "SFC1" added
DGMGRL> add database 'SFC2' as connect identifier is 'SFC2';
Database "SFC2" added

Set the log transport property to SYNC, then set Maximum Availability.

DGMGRL> edit database SFB2 set property LogXptMode=SYNC;
Property "logxptmode" updated
DGMGRL> edit database SFC1 set property LogXptMode=SYNC;
Property "logxptmode" updated
DGMGRL> edit database SFC2 set property LogXptMode=SYNC;
Property "logxptmode" updated
DGMGRL> edit configuration set protection mode as MaxAvailability;
Succeeded.

Test switchovers. Primary SFB1->SFB2->SFC1->SFC2->SFB1.

DGMGRL> switchover to SFB2
Performing switchover NOW, please wait...
Operation requires a connection to database "SFB2"
Connecting ...
Connected to "SFB2"
Connected as SYSDBA.
New primary database "SFB2" is opening...
Oracle Clusterware is restarting database "SFB1" ...
Connected to "SFB1"
Connected to an idle instance.
Connected to an idle instance.
Connected to an idle instance.
Connected to "SFB1"
Connected to "SFB1"
Switchover succeeded, new primary is "sfb2"
DGMGRL> switchover to SFC1
Performing switchover NOW, please wait...
Operation requires a connection to database "SFC1"
Connecting ...
Connected to "SFC1"
Connected as SYSDBA.
New primary database "SFC1" is opening...
Oracle Clusterware is restarting database "SFB2" ...
Connected to "SFB2"
Connected to an idle instance.
Connected to an idle instance.
Connected to an idle instance.
Connected to "SFB2"
Connected to "SFB2"
Switchover succeeded, new primary is "sfc1"
DGMGRL> switchover to SFC2
Performing switchover NOW, please wait...
Operation requires a connection to database "SFC2"
Connecting ...
Connected to "SFC2"
Connected as SYSDBA.
New primary database "SFC2" is opening...
Oracle Clusterware is restarting database "SFC1" ...
Connected to "SFC1"
Connected to an idle instance.
Connected to an idle instance.
Connected to an idle instance.
Connected to "SFC1"
Connected to "SFC1"
Switchover succeeded, new primary is "sfc2"
DGMGRL> switchover to SFB1
Performing switchover NOW, please wait...
Operation requires a connection to database "SFB1"
Connecting ...
Connected to "SFB1"
Connected as SYSDBA.
New primary database "SFB1" is opening...
Oracle Clusterware is restarting database "SFC2" ...
Connected to "SFC2"
Connected to an idle instance.
Connected to an idle instance.
Connected to "SFC2"
Connected to "SFC2"
Switchover succeeded, new primary is "sfb1"

Now we are ready to implement Maximum Protection mode.

Set Maximum Protection

DGMGRL> edit configuration set protection mode as MaxProtection; 
Succeeded.

Test switchovers again. There is no issue.

DGMGRL> switchover to SFC2
Performing switchover NOW, please wait...
Operation requires a connection to database "SFC2"
Connecting ...
Connected to "SFC2"
Connected as SYSDBA.
New primary database "SFC2" is opening...
Oracle Clusterware is restarting database "SFB1" ...
Connected to "SFB1"
Connected to an idle instance.
Connected to an idle instance.
Connected to an idle instance.
Connected to "SFB1"
Connected to "SFB1"
Switchover succeeded, new primary is "sfc2"

Network failures

Failure of some, but not all standbys

Testing actual network failures is where you will notice serious issues with Maximum Protection mode.

The initial status is normal. All standbys are in sync with the primary.

DGMGRL> show configuration

Configuration - CFG

  Protection Mode: MaxProtection
  Members:
  SFB1 - Primary database
    SFB2 - Physical standby database
    SFC1 - Physical standby database
    SFC2 - Physical standby database

Fast-Start Failover:  Disabled

Configuration Status:
SUCCESS   (status updated 21 seconds ago)

Now we want to cause a network failure to one standby, SFC2, for example. We stage the failure by cutting TCP port 1521. In the AWS console, create a new security group, “cutoff.sg”, that allows ingress to TCP port 22 only.

Go to Change Security Groups. Uncheck the original security group, check the “cutoff.sg” security group, and press “Assign Security Groups”. There are three standbys, but we are cutting off network access to only one of them. The system should continue to operate normally. However, right away, you can notice a problem. Try to commit a transaction.
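As an aside, the console group swap can also be scripted with the AWS CLI. The instance and security-group IDs below are hypothetical placeholders, and the command is only printed here rather than executed:

```shell
# Hypothetical IDs for the SFC2 host and the SSH-only group
instance_id=i-0123456789abcdef0
cutoff_sg=sg-0abc0abc0abc0abc0   # "cutoff.sg": ingress on TCP 22 only

# modify-instance-attribute replaces the instance's security group list,
# so attaching only the SSH-only group cuts TCP 1521:
cmd="aws ec2 modify-instance-attribute --instance-id $instance_id --groups $cutoff_sg"
echo "$cmd"
```

With only the SSH-only group attached, port 1521 is cut, and the commit test that follows will stall.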

SQL> insert into t ( n ) values ( 0 );

1 row created.

SQL> commit;

You will find that the transaction does not complete. You can run “show configuration”. For 30 seconds, the configuration shows up as normal. No messages appear in the alert log. All the while, only one standby is unreachable. Two other standbys are reachable, but the database hangs. Eventually, Oracle notices the issue. Messages start appearing in the alert log.

2019-09-13T12:43:51.076718-04:00
LGWR (PID:26748): ORA-16198: Received timed out error from KSR
LGWR (PID:26748): Attempting LAD:4 network reconnect (16198)
LGWR (PID:26748): LAD:4 network reconnect abandoned

The transaction commits. The configuration now appears as:

DGMGRL> show configuration

Configuration - CFG

  Protection Mode: MaxProtection
  Members:
  SFB1 - Primary database
    SFB2 - Physical standby database
    SFC1 - Physical standby database
    SFC2 - Physical standby database
      Warning: ORA-16857: member disconnected from redo source for longer than specified threshold

Fast-Start Failover:  Disabled

Configuration Status:
WARNING   (status updated 8 seconds ago)

You can repeat the process for a second standby, SFC1, for example. Again, the database hangs for 30 seconds, and finally resolves.

DGMGRL> show configuration

Configuration - CFG

  Protection Mode: MaxProtection
  Members:
  SFB1 - Primary database
    Error: ORA-16778: redo transport error for one or more members

    SFB2 - Physical standby database
    SFC1 - Physical standby database
      Warning: ORA-16857: member disconnected from redo source for longer than specified threshold

    SFC2 - Physical standby database
      Warning: ORA-16857: member disconnected from redo source for longer than specified threshold

Fast-Start Failover:  Disabled

Configuration Status:
ERROR   (status updated 160 seconds ago)

This behavior, in which failure of some, but not all, standbys leads to an application hang, is going to be objectionable to some application users.

Failure of all standbys

Now disconnect the third standby. Observe this behavior:

  • The primary will retry the standbys for 5 minutes.
  • Sessions that issue a commit will wait.
  • The primary will abort and restart.
  • During restart, the instance will hang on open database.
  • After 8 more minutes, the primary will abort.

Shutdown of the primary is the expected behavior after a failure of all standbys. The following message will appear in SQL*Plus sessions that were waiting on commit.

commit
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 10524
Session ID: 39 Serial number: 7415

Some may find it surprising that the shutdown is not instantaneous, and that it takes five minutes before the database shuts down.

Adjusting NetTimeout

You can reduce the amount of time that an application waits on a TCP timeout:

DGMGRL> edit database SFB1 set property NetTimeout = 5;
Property "nettimeout" updated
DGMGRL> edit database SFB2 set property NetTimeout = 5;
Property "nettimeout" updated
DGMGRL> edit database SFC1 set property NetTimeout = 5;
Property "nettimeout" updated
DGMGRL> edit database SFC2 set property NetTimeout = 5;
Property "nettimeout" updated

This is a workaround and is beside the point. The application still has to wait to commit even though one or more standbys are still available. There should be no hanging.

Conclusion

There is a flaw in Data Guard Maximum Protection. If a TCP timeout error involves only one standby, the primary will hang even if other standbys are available. This issue will lead some administrators to rule out Data Guard Maximum Protection.

In the past, because of high hardware costs, long lead times, and separation of duties across DBAs and SAs, it was rare to test Maximum Protection. Now, with the cloud, testing such arrangements is convenient, and testing uncovers weaknesses in the design.

Likewise, in the past, because of separation of duties across DBAs and network administrators, it was rare for DBAs to stage TCP timeout tests, but such tests are simple to perform now. TCP timeout tests uncover weaknesses in Data Guard.

Data Guard Maximum Availability has improved over time to the point where it is an acceptable substitute for Maximum Performance in some cases.

Oracle 12.1 introduced the far sync instance, which can be configured near a primary database on a low latency network to provide near-zero-data loss capability. Maximum Availability and far sync are described in this article. An Active Data Guard license is required for far sync. Ironically, in this sense, Maximum Availability costs more than Maximum Protection, the highest level.

DGMGRL not required in listener.ora in Restart

By Brian Fitzgerald

Introduction

In Oracle Restart, _DGMGRL services are no longer required in listener.ora. Switchover output has changed slightly and _DGMGRL connections no longer appear in the listener log.

Background

While testing switchover in Data Guard in 19c, I noticed that after configuring Restart, connections to _DGMGRL no longer appeared in the listener log. I deleted the _DGMGRL services, reloaded the listeners, and retested the switchover without issue.

Static listeners (initial)

Initially, in the grid account, in $ORACLE_HOME/network/admin/listener.ora, these SID_LIST_LISTENER entries were in place. At the primary:

SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (GLOBAL_DBNAME = NY_DGMGRL)
      (ORACLE_HOME = /u01/app/oracle/product/19.3.0/dbhome_1)
      (SID_NAME = ORCL)
    )
  )

At the far sync:

SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (GLOBAL_DBNAME = FS_DGMGRL)
      (ORACLE_HOME = /u01/app/oracle/product/19.3.0/dbhome_1)
      (SID_NAME = ORCL)
    )
  )

At the standby:

SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (GLOBAL_DBNAME = SF_DGMGRL)
      (ORACLE_HOME = /u01/app/oracle/product/19.3.0/dbhome_1)
      (SID_NAME = ORCL)
    )
  )

I reloaded the listeners at each host:

[grid@ip-172-31-86-22 ~]$ lsnrctl reload

LSNRCTL for Linux: Version 19.0.0.0.0 - Production on 02-SEP-2019 12:28:09

Copyright (c) 1991, 2019, Oracle. All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=ip-172-31-86-22.ec2.internal)(PORT=1521)))
The command completed successfully

Switchover test before setting up Restart

I ran the following switchover test before setting up Restart:

DGMGRL> switchover to SF
Performing switchover NOW, please wait...
Operation requires a connection to database "SF"
Connecting ...
Connected to "SF"
Connected as SYSDBA.
New primary database "SF" is opening...
Operation requires start up of instance "ORCL" on database "NY"
Starting instance "ORCL"...
Connected to an idle instance.
ORACLE instance started.
Connected to "NY"
Database mounted.
Connected to "NY"
Switchover succeeded, new primary is "sf"

Notice the message ‘Operation requires start up of instance “ORCL” on database “NY”‘. In the NY listener log, several connections to service NY_DGMGRL appear. For example:

02-SEP-2019 22:46:10 * (CONNECT_DATA=(SERVICE_NAME=NY_DGMGRL)(INSTANCE_NAME=ORCL)(SERVER=DEDICATED)(CID=(PROGRAM=dgmgrl)(HOST=ip-172-31-86-22.ec2.internal)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=172.31.86.22)(PORT=51298)) * establish * NY_DGMGRL * 0

Run “show configuration”:

DGMGRL> show configuration

Configuration - ORCL_CONFIG

  Protection Mode: MaxPerformance
  Members:
  SF - Primary database
    FS - Far sync instance
      NY - Physical standby database

Fast-Start Failover:  Disabled

Configuration Status:
SUCCESS   (status updated 51 seconds ago)

Switch back to the original primary.

DGMGRL> switchover to NY
Performing switchover NOW, please wait...
New primary database "NY" is opening...
Operation requires start up of instance "ORCL" on database "SF"
Starting instance "ORCL"...
Connected to an idle instance.
ORACLE instance started.
Connected to "SF"
Database mounted.
Connected to "SF"
Switchover succeeded, new primary is "ny"

Notice the message ‘Operation requires start up of instance “ORCL” on database “SF”‘ . In the SF listener log, several connections to service SF_DGMGRL appear. For example:

02-SEP-2019 22:43:08 * (CONNECT_DATA=(SERVICE_NAME=SF_DGMGRL)(INSTANCE_NAME=ORCL)(SERVER=DEDICATED)(CID=(PROGRAM=dgmgrl)(HOST=ip-172-31-86-22.ec2.internal)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=172.31.86.22)(PORT=50340)) * establish * SF_DGMGRL * 0

Notice that no connections to the FS_DGMGRL service appeared at the far sync, even at this stage. Service FS_DGMGRL plays no role at the far sync.

Configure Restart

Register your Restart instances.

At the primary:

[oracle@ip-172-31-86-22 ~]$ srvctl add database -database NY -role PRIMARY -stopoption IMMEDIATE -instance ORCL -oraclehome /u01/app/oracle/product/19.3.0/dbhome_1 -spfile +DATA01/ORCL/PARAMETERFILE/spfile.266.1017440879 -diskgroup DATA01,RECO01

At the far sync:

[oracle@ip-172-31-28-23 ~]$ srvctl add database -database FS -role physical_standby -startoption MOUNT -stopoption ABORT -instance ORCL -oraclehome /u01/app/oracle/product/19.3.0/dbhome_1 -spfile /u01/app/oracle/product/19.3.0/dbhome_1/dbs/spfileORCL.ora -diskgroup DATA01,RECO01

To activate the change, I found it necessary to issue srvctl start. That does not seem right; “srvctl enable database” should do it.

Also, “srvctl enable instance” is not available in Restart; it only works in RAC.

Unless you activate the instance, it will not start automatically upon host reboot.

[oracle@ip-172-31-28-23 ~]$ srvctl start database -database FS

At the standby:

[oracle@ip-172-32-10-34 ~]$ srvctl add database -database SF -role physical_standby -startoption MOUNT -stopoption ABORT -instance ORCL -oraclehome /u01/app/oracle/product/19.3.0/dbhome_1 -spfile /u01/app/oracle/product/19.3.0/dbhome_1/dbs/spfileORCL.ora -diskgroup DATA01,RECO01
[oracle@ip-172-32-10-34 ~]$ srvctl start database -database SF

Reboot all hosts and check that the database instances start automatically, in the proper startup mode and Data Guard role.
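One way to confirm the registration before rebooting is “srvctl config database”, which reports the role and start options that Restart will apply at boot. The sketch below checks the relevant field from captured output; the sample output is abridged and illustrative:

```shell
# Abridged, illustrative output of: srvctl config database -database FS
sample='Database unique name: FS
Database role: PHYSICAL_STANDBY
Start options: mount
Stop options: abort'
# Confirm the start option that Restart will use after a reboot:
start_opts=$(printf '%s\n' "$sample" | awk -F': ' '/^Start options/ {print $2}')
echo "$start_opts"
```

A standby or far sync should show a mount start option; the primary should show open.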

Switchover test

Test switchover to SF. The output changes slightly, and becomes:

DGMGRL> switchover to SF
Performing switchover NOW, please wait...
Operation requires a connection to database "SF"
Connecting ...
Connected to "SF"
Connected as SYSDBA.
New primary database "SF" is opening...
Oracle Clusterware is restarting database "NY" ...
Connected to an idle instance.
Connected to an idle instance.
Connected to an idle instance.
Connected to an idle instance.
Connected to "NY"
Connected to "NY"
Switchover succeeded, new primary is "sf"

After configuring Restart, the Data Guard switchover output has changed slightly. Message ‘Operation requires start up of instance “ORCL” on database “NY”‘ has been replaced with ‘Oracle Clusterware is restarting database “NY” …’. A review of the NY listener log shows no connection to service NY_DGMGRL.

Test switchover to NY. The output is now:

DGMGRL> switchover to NY
Performing switchover NOW, please wait...
New primary database "NY" is opening...
Oracle Clusterware is restarting database "SF" ...
Connected to an idle instance.
Connected to an idle instance.
Connected to "SF"
Connected to "SF"
Switchover succeeded, new primary is "ny"

A review of the SF listener log shows no connection to service SF_DGMGRL.

Static listeners (final)

In Restart, therefore, the “_DGMGRL” listener.ora entry is not needed. SID_LIST_LISTENER can be simplified on the primary, far sync, and standby as:

SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (ORACLE_HOME = /u01/app/oracle/product/19.3.0/dbhome_1)
      (SID_NAME = ORCL)
    )
  )

After configuring Oracle Restart, switchovers were retested with the revised listener.ora without issue.

Error in non-Restart and no DGMGRL

Suppose we disable Restart at NY.

[oracle@ip-172-31-86-22 ~]$ srvctl stop database -database NY
[oracle@ip-172-31-86-22 ~]$ srvctl remove database -database NY -y
[oracle@ip-172-31-86-22 dbs]$ cat > initORCL.ora
spfile='+DATA01/NY/PARAMETERFILE/spfile.263.1018152951'
[oracle@ip-172-31-86-22 dbs]$ sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Thu Sep 12 11:07:20 2019
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup
ORACLE instance started.

Total System Global Area 1140849904 bytes
Fixed Size                  8895728 bytes
Variable Size             318767104 bytes
Database Buffers          805306368 bytes
Redo Buffers                7880704 bytes
Database mounted.
Database opened.

DGMGRL> show configuration

Configuration - ORCL_CONFIG

  Protection Mode: MaxAvailability
  Members:
  NY - Primary database
    SF - Physical standby database

Fast-Start Failover:  Disabled

Configuration Status:
SUCCESS   (status updated 11 seconds ago)

(Note: In this example, there is no far sync.) Now try a switchover:

DGMGRL> switchover to SF
Performing switchover NOW, please wait...
Operation requires a connection to database "SF"
Connecting ...
Connected to "SF"
Connected as SYSDBA.
New primary database "SF" is opening...
Operation requires start up of instance "ORCL" on database "NY"
Starting instance "ORCL"...
Unable to connect to database using (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=ip-172-31-86-22.ec2.internal)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=NY_DGMGRL)(INSTANCE_NAME=ORCL)(SERVER=DEDICATED)))
ORA-12514: TNS:listener does not currently know of service requested in connect descriptor

Failed.

Please complete the following steps to finish switchover:
        start up and mount instance "ORCL" of database "NY"

The new primary, SF, opens just fine. The old primary, NY, was shut down, but now there is no way for the broker to restart it. The listener is not listening on behalf of the Oracle database.

[grid@ip-172-31-86-22 ~]$ lsnrctl status

LSNRCTL for Linux: Version 19.0.0.0.0 - Production on 12-SEP-2019 11:26:00

Copyright (c) 1991, 2019, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=ip-172-31-86-22.ec2.internal)(PORT=1521)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 19.0.0.0.0 - Production
Start Date                12-SEP-2019 10:15:24
Uptime                    0 days 1 hr. 10 min. 36 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/19.3.0/grid/network/admin/listener.ora
Listener Log File         /u01/app/grid/diag/tnslsnr/ip-172-31-86-22/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=ip-172-31-86-22.ec2.internal)(PORT=1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1521)))
Services Summary...
Service "+ASM" has 1 instance(s).
  Instance "+ASM", status READY, has 1 handler(s) for this service...
Service "+ASM_DATA01" has 1 instance(s).
  Instance "+ASM", status READY, has 1 handler(s) for this service...
Service "+ASM_RECO01" has 1 instance(s).
  Instance "+ASM", status READY, has 1 handler(s) for this service...
The command completed successfully

This error, referring to the missing service NY_DGMGRL, appears in the listener log:

12-SEP-2019 11:22:09 * (CONNECT_DATA=(SERVICE_NAME=NY_DGMGRL)(INSTANCE_NAME=ORCL)(SERVER=DEDICATED)(CID=(PROGRAM=dgmgrl)(HOST=ip-172-31-86-22.ec2.internal)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=172.31.86.22)(PORT=53930)) * establish * NY_DGMGRL * 12514
TNS-12514: TNS:listener does not currently know of service requested in connect descriptor

The conclusion is that without Restart we need static registration: a DGMGRL service entry in SID_LIST_LISTENER.
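As a sketch, the static entry in the NY host's listener.ora would look like the following. The GLOBAL_DBNAME comes from the listener log above (db_unique_name plus "_DGMGRL"); the ORACLE_HOME path is illustrative, not taken from the systems in this article.

SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (GLOBAL_DBNAME = NY_DGMGRL)   # db_unique_name + _DGMGRL
      (SID_NAME = ORCL)
      (ORACLE_HOME = /u01/app/oracle/product/19.3.0/dbhome_1)  # hypothetical path
    )
  )

After editing, reload the listener (lsnrctl reload) so it picks up the entry. The statically registered service then shows status UNKNOWN in lsnrctl status, which is expected: the listener accepts it on faith rather than via dynamic registration from PMON/LREG.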

Validate static listeners

Validating the static listeners is a critical Data Guard setup step. After duplicating the database and setting up Restart, check each static listener:

  • Connect over the network with sqlplus
  • Shutdown (shutdown abort for standbys and far syncs)
  • Startup (startup mount for standbys and far syncs)

Example:

[oracle@ip-172-31-28-23 ops]$ sqlplus sys/zystm.22@FS as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Fri Sep 13 11:36:37 2019
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.


Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0

SQL> shutdown abort
ORACLE instance shut down.
SQL> startup mount
ORACLE instance started.

Total System Global Area 1140849904 bytes
Fixed Size                  8895728 bytes
Variable Size             301989888 bytes
Database Buffers          822083584 bytes
Redo Buffers                7880704 bytes
Database mounted.

This is an example of a failed static listener check.

[oracle@ip-172-31-28-23 ops]$ sqlplus sys/zystm.22@FS as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Fri Sep 13 11:36:37 2019
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.


Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0

SQL> shutdown abort
ORACLE instance shut down.
ERROR:
ORA-12505: TNS:listener does not currently know of SID given in connect
descriptor


SQL> startup mount
SP2-0640: Not connected

Resolve all failures before proceeding.

Conclusion

In Oracle Database 12.1, the Data Guard broker command-line interface (dgmgrl) was changed so that, under Restart or RAC, the "_DGMGRL" static service is no longer required. This fact is mentioned in Oracle Data Guard Broker and Static Service Registration (Doc ID 1387859.1).

On the other hand, if you are not using Restart, then you do need a DGMGRL static service entry in your SID_LIST_LISTENER.

Restart is only available with Grid Infrastructure, and the main benefit of Grid Infrastructure is ASM. If you built Oracle Database on operating system files, you have less need for Grid Infrastructure; and if you did not install Grid Infrastructure, Restart is not available. Without Restart, a Data Guard setup needs a DGMGRL service in SID_LIST_LISTENER.

During Data Guard setup, validate all static listeners.