AutoUpgrade Oracle to 19c

By Brian Fitzgerald

Introduction

AutoUpgrade is a convenient utility for completing multiple database upgrades in parallel. It shipped undocumented with Oracle 18c and was documented for the first time in the 19c manual:

The AutoUpgrade utility identifies issues before upgrades, deploys upgrades, performs postupgrade actions, and starts the upgraded Oracle Database.

Use AutoUpgrade to upgrade Oracle Database from version 11.2.0.4 onward. To use AutoUpgrade, prepare a simple configuration file and run autoupgrade.jar. Behind the scenes, AutoUpgrade runs all steps necessary to complete the upgrade. AutoUpgrade runs on a single host. If you want to migrate to a separate, new host, run the AutoUpgrade preparatory steps on the Source (original) host. Then, copy the database to the new host and run the Upgrade step on the target host.

News about AutoUpgrade

Mike Dietrich is Oracle’s product manager for AutoUpgrade. Refer to his blog for news about AutoUpgrade. Future topics or changes could include coverage of new Oracle versions, RAC, Restart, Data Guard, and migrating directly from non-CDB to CDB.

Demonstration platform description

This blog article demonstrates AutoUpgrade on a system with the following attributes:

Attribute Value
location US East (N. Virginia)
instanceType c5.xlarge
tenancy shared
instanceFamily Compute optimized
physicalProcessor Intel Xeon Platinum 8124M
clockSpeed 3.0 GHz
Number of CPUs 4
memGB 8 GiB
operatingSystem Linux
AMI RHEL-7.7_HVM_GA-20190723-x86_64-1-Hourly2-GP2
OS Vendor Red Hat, Inc.
Red Hat version 7.7
price per hour $0.17

Demonstration database environment

Here is a database environment summary:

Description Value
Source version 18.3.0
Target version 19.3.0
Instances THING1 and THING2
Instance type Restart

Concurrency

Be aware that AutoUpgrade launches an upgrade process for each database in the configuration file. The result is multiple concurrent upgrade processes and multiple concurrently active databases. If you are upgrading a few databases on a box with plenty of memory, you will have no problem.

If you have multiple databases on a smaller box, you may find that the load average will reach a high level, and interactive shell response will lag. If you find that AutoUpgrade results in an unmanageable load average, there is more than one solution. For example, you could temporarily shut down some databases while you upgrade other databases. If you are in the cloud, you can get better performance by temporarily running the host image on an instance type with a higher amount of memory and CPU:

aws ec2 stop-instances --instance-ids i-00a836c352bb54daa
aws ec2 modify-instance-attribute --instance-id i-00a836c352bb54daa --instance-type '{"Value": "c5.xlarge"}'
aws ec2 start-instances --instance-ids i-00a836c352bb54daa

Finally, you may use the start_time parameter to schedule database upgrades at different future times.
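For example, in the configuration file described below, the start times could be staggered so that the second upgrade begins later. This sketch uses a hypothetical scheduled timestamp; check the AutoUpgrade documentation for the exact timestamp format your version accepts:

```
upg1.start_time=now
upg2.start_time=30/09/2019 02:30:00
```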

Alias

Run the AutoUpgrade binary out of an Oracle home at the target version. You will be placed in an interactive command-line interface (CLI), where you will check job status repeatedly with the “lsj” command. For convenience, use rlwrap and create an alias.

[oracle@ip-172-31-88-93 ~]$ AH=/u01/app/oracle/product/19.3.0/dbhome_1
[oracle@ip-172-31-88-93 ~]$ alias au='rlwrap $AH/jdk/bin/java -jar $AH/rdbms/admin/autoupgrade.jar'

In this article, “au” refers to this alias.

Notice that in the alias, AH refers to an Oracle home containing Java version 8, which ships with Oracle 12c, 18c, and 19c Oracle homes. Be sure to install the latest autoupgrade.jar into that Oracle home.

[oracle@ip-172-31-88-93 dbhome_1]$ $AH/jdk/bin/java -version
java version "1.8.0_201"
Java(TM) SE Runtime Environment (build 1.8.0_201-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.201-b09, mixed mode)

If you want to run AutoUpgrade Analyze or Fixups on an Oracle 11.2.0.4 database that is on a separate host, install Java version 8 and the latest autoupgrade.jar there.

Version

AutoUpgrade distributed with Oracle Database 19.3 is version 20190207.

[oracle@ip-172-31-88-93 ~]$ au -version
build.version 20190207
build.date 2019/02/07 12:35:56
build.label RDBMS_PT.AUTOUPGRADE_LINUX.X64_190205.1800

Version 20190207 contains a bug that leads to a failure of Deploy mode on ASM. You can download the most recent version from My Oracle Support: AutoUpgrade Tool (Doc ID 2485457.1). As of this writing, the most recent version is:

[oracle@ip-172-31-88-93 ~]$ au -version
build.hash 67fee5b
build.version 20190823
build.date 2019/08/23 18:08:47

Configuration file

A single configuration file will be used to upgrade all instances. You may use AutoUpgrade itself to create a sample configuration file.

[oracle@ip-172-31-88-93 ~]$ au -create_sample_file config
Created sample configuration file /home/oracle/sample_config.cfg

Rename and customize the configuration file. There is one global section and multiple database sections, as many as you want. Parameter dbname actually refers to db_unique_name.

Optional AutoUpgrade configuration parameters deliver extended functionality such as changing database initialization parameters on the fly during upgrade, custom scripts to run before or after upgrade, guaranteed restore point, fixup list modification, environment variables, pluggable database upgrades, optional utlrp run, and alternate tns_admin directory location.

[oracle@ip-172-31-88-93 ~]$ mkdir -p /u01/app/oracle/autoupgrade
[oracle@ip-172-31-88-93 ~]$ cat thing1.thing2.18c.19c.conf

#Global configurations
global.autoupg_log_dir=/u01/app/oracle/autoupgrade

upg1.dbname=THING1
upg1.start_time=now
upg1.source_home=/u01/app/oracle/product/18.3.0/dbhome_1
upg1.target_home=/u01/app/oracle/product/19.3.0/dbhome_1
upg1.sid=THING1
upg1.log_dir=/u01/app/oracle/autoupgrade
upg1.upgrade_node=ip-172-31-88-93.ec2.internal
upg1.target_version=19.3
#upg1.run_utlrp=yes
#upg1.timezone_upg=yes

upg2.dbname=THING2
upg2.start_time=now
upg2.source_home=/u01/app/oracle/product/18.3.0/dbhome_1
upg2.target_home=/u01/app/oracle/product/19.3.0/dbhome_1
upg2.sid=THING2
upg2.log_dir=/u01/app/oracle/autoupgrade
upg2.upgrade_node=ip-172-31-88-93.ec2.internal
upg2.target_version=19.3
#upg2.run_utlrp=[yes|no]
#upg2.timezone_upg=[yes|no]
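To illustrate a few of the optional parameters mentioned earlier, a per-database entry could be extended like this. This is a sketch: the before_action and after_action script paths are hypothetical, and parameter names should be verified against the documentation for your autoupgrade.jar version:

```
upg1.before_action=/home/oracle/scripts/before_thing1.sh
upg1.after_action=/home/oracle/scripts/after_thing1.sh
upg1.run_utlrp=yes
upg1.timezone_upg=yes
```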

Recommendation: Identify $ORACLE_BASE. Set global and all database log directories to $ORACLE_BASE/autoupgrade. In every case, AutoUpgrade will create a subdirectory. You will get a directory structure like this:

/u01/app/oracle/autoupgrade/cfgtoollogs/
/u01/app/oracle/autoupgrade/THING1/
/u01/app/oracle/autoupgrade/THING2/

Analyze mode

In AutoUpgrade Analyze mode, the database instances should be running out of the Source Oracle home with the databases open.

[oracle@ip-172-31-88-93 ~]$ au -config thing1.thing2.18c.19c.conf -mode analyze

Output:

Autoupgrade tool launched with default options
+--------------------------------+
| Starting AutoUpgrade execution |
+--------------------------------+
2 databases will be analyzed
Enter some command, type 'help' or 'exit' to quit
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+--------+
|JOB#|DB NAME|    STAGE|OPERATION| STATUS|    START TIME|END TIME| UPDATED| MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+--------+
| 110| THING1|PRECHECKS|PREPARING|RUNNING|19/09/26 14:06|     N/A|14:06:20|Starting|
| 111| THING2|PRECHECKS|PREPARING|RUNNING|19/09/26 14:06|     N/A|14:06:24|Starting|
+----+-------+---------+---------+-------+--------------+--------+--------+--------+
Total jobs 2
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+---------------+
|JOB#|DB NAME|    STAGE|OPERATION| STATUS|    START TIME|END TIME| UPDATED|        MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+---------------+
| 110| THING1|PRECHECKS|PREPARING|RUNNING|19/09/26 14:06|     N/A|14:06:41| Remaining 1/72|
| 111| THING2|PRECHECKS|PREPARING|RUNNING|19/09/26 14:06|     N/A|14:06:42|Remaining 67/72|
+----+-------+---------+---------+-------+--------------+--------+--------+---------------+
Total jobs 2
upg>
Job 110 for THING1 FINISHED
Job 111 for THING2 FINISHED

Locate the report. For example:

[oracle@ip-172-31-88-93 ~]$ find /u01/app/oracle/autoupgrade -name '*.html' -mmin -5
/u01/app/oracle/autoupgrade/global/cfgtoollogs/upgrade/auto/state.html
/u01/app/oracle/autoupgrade/THING1/110/prechecks/thing1_preupgrade.html
/u01/app/oracle/autoupgrade/THING2/111/prechecks/thing2_preupgrade.html

Review the report in a browser.

Prechecks

The report may contain Error, Warning, Recommend, and Info findings. You must address severity Error findings before attempting upgrade.

Use of Analyze mode to identify fixups

Analyze mode runs prechecks that produce a list of changes AutoUpgrade will make during the Fixups stage. The planned fixups appear in a “checklist” file in three formats. For example:

THING1/101/prechecks/thing1_checklist.cfg
THING1/101/prechecks/thing1_checklist.json
THING1/101/prechecks/thing1_checklist.xml

Prechecks leading to fixups are:

[oracle@ip-172-31-88-93 autoupgrade]$ < THING1/101/prechecks/thing1_checklist.json jq -r '.containers[].checks[] | select( .fixup_available == "YES" ).checkname '
DICTIONARY_STATS
POST_DICTIONARY
POST_FIXED_OBJECTS
PRE_FIXED_OBJECTS
OLD_TIME_ZONES_EXIST
MANDATORY_UPGRADE_CHANGES
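The same jq approach can filter the checklist by severity as well. A minimal sketch follows; the embedded JSON is a trimmed stand-in for thing1_checklist.json so that the example runs standalone, and in practice you would point jq at the real checklist file:

```shell
# Create a trimmed stand-in for thing1_checklist.json.
cat > /tmp/checklist_sample.json <<'EOF'
{ "containers" : [ { "checks" : [
  { "checkname" : "OLD_TIME_ZONES_EXIST", "severity" : "WARNING" },
  { "checkname" : "DICTIONARY_STATS",     "severity" : "RECOMMEND" }
] } ] }
EOF
# List the names of WARNING-severity checks.
jq -r '.containers[].checks[] | select(.severity == "WARNING").checkname' /tmp/checklist_sample.json
```

For this sample input, the filter prints OLD_TIME_ZONES_EXIST.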

The complete JSON checklist file is shown here for reference:

{
  "dbname" : "THING1",
  "containers" : [ {
    "containername" : "THING1",
    "checks" : [ {
      "checkname" : "DICTIONARY_STATS",
      "stage" : "PRECHECKS",
      "fixup_available" : "YES",
      "runfix" : "YES",
      "severity" : "RECOMMEND",
      "action" : "Gather stale data dictionary statistics prior to database upgrade in off-peak time using:    EXECUTE DBMS_STATS.GATHER_DICTIONARY_STATS;",
      "broken rule" : "Dictionary statistics do not exist or are stale (not up-to-date).",
      "rule" : "Dictionary statistics help the Oracle optimizer find efficient SQL execution plans and are essential for proper upgrade timing. Oracle recommends gathering dictionary statistics in the last 24 hours before database upgrade.  For information on managing optimizer statistics, refer to the 18.0.0.0 Oracle Database Upgrade Guide."
    },
    {
      "checkname" : "POST_DICTIONARY",
      "stage" : "POSTCHECKS",
      "fixup_available" : "YES",
      "runfix" : "YES",
      "severity" : "RECOMMEND",
      "action" : "Gather dictionary statistics after the upgrade using the command:    EXECUTE DBMS_STATS.GATHER_DICTIONARY_STATS;",
      "broken rule" : "Oracle recommends gathering dictionary statistics after upgrade.",
      "rule" : "Dictionary statistics provide essential information to the Oracle optimizer to help it find efficient SQL execution plans. After a database upgrade, statistics need to be re-gathered as there can now be tables that have significantly changed during the upgrade or new tables that do not have statistics gathered yet."
    },
    {
      "checkname" : "POST_FIXED_OBJECTS",
      "stage" : "POSTCHECKS",
      "fixup_available" : "YES",
      "runfix" : "YES",
      "severity" : "RECOMMEND",
      "action" : "Gather statistics on fixed objects after the upgrade and when there is a representative workload on the system using the command:    EXECUTE DBMS_STATS.GATHER_FIXED_OBJECTS_STATS;",
      "broken rule" : "This recommendation is given for all preupgrade runs.",
      "rule" : "Fixed object statistics provide essential information to the Oracle optimizer to help it find efficient SQL execution plans.  Those statistics are specific to the Oracle Database release that generates them, and can be stale upon database upgrade.  For information on managing optimizer statistics, refer to the 18.0.0.0 Oracle Database Upgrade Guide."
    },
    {
      "checkname" : "PRE_FIXED_OBJECTS",
      "stage" : "PRECHECKS",
      "fixup_available" : "YES",
      "runfix" : "YES",
      "severity" : "RECOMMEND",
      "action" : "Gather statistics on fixed objects prior the upgrade.",
      "broken rule" : "None of the fixed object tables have had stats collected.",
      "rule" : "Gathering statistics on fixed objects, if none have been gathered yet, is recommended prior to upgrading.  For information on managing optimizer statistics, refer to the 18.0.0.0 Oracle Database Upgrade Guide."
    },
    {
      "checkname" : "OLD_TIME_ZONES_EXIST",
      "stage" : "POSTCHECKS",
      "fixup_available" : "YES",
      "runfix" : "YES",
      "severity" : "WARNING",
      "action" : "Upgrade the database time zone file using the DBMS_DST package.",
      "broken rule" : "The database is using time zone file version 31 and the target 19 release ships with time zone file version 32.",
      "rule" : "Oracle recommends upgrading to the desired (latest) version of the time zone file.  For more information, refer to 'Upgrading the Time Zone File and Timestamp with Time Zone Data' in the 19 Oracle Database Globalization Support Guide."
    },
    {
      "checkname" : "MANDATORY_UPGRADE_CHANGES",
      "stage" : "PRECHECKS",
      "fixup_available" : "YES",
      "runfix" : "YES",
      "severity" : "INFO",
      "action" : "Mandatory changes are applied automatically in the during_upgrade_pfile_dbname.ora file.  Some of these changes maybe present in the after_upgrade_pfile_dbname.ora file.  The during_upgrade_pfile_dbname.ora is used to start the database in upgrade mode. The after_upgrade_pfile_dbname.ora is used to start the database once the upgrade has completed successfully.",
      "broken rule" : "",
      "rule" : "Mandatory changes are required to perform the upgrade.  These changes are implemented in the during_ and after_upgrade_pfile_dbname.ora files."
    },
    {
      "checkname" : "RMAN_RECOVERY_VERSION",
      "stage" : "PRECHECKS",
      "fixup_available" : "NO",
      "runfix" : "N/A",
      "severity" : "INFO",
      "action" : "Check the Oracle Backup and Recovery User's Guide for information on how to manage an RMAN recovery catalog schema.",
      "broken rule" : "If you are using a version of the recovery catalog schema that is older than that required by the RMAN client version, then you must upgrade the catalog schema.",
      "rule" : "It is good practice to have the catalog schema the same or higher version than the RMAN client version you are using."
    },
    {
      "checkname" : "TABLESPACES_INFO",
      "stage" : "PRECHECKS",
      "fixup_available" : "NO",
      "runfix" : "N/A",
      "severity" : "INFO",
      "action" : "To help you keep track of your tablespace allocations, the following AUTOEXTEND tablespaces are expected to successfully EXTEND during the upgrade process.",
      "broken rule" : "",
      "rule" : "Minimum tablespace sizes for upgrade are estimates."
    },
    {
      "checkname" : "DIR_SYMLINKS",
      "stage" : "POSTCHECKS",
      "fixup_available" : "NO",
      "runfix" : "N/A",
      "severity" : "WARNING",
      "action" : "To identify directory objects with symbolic links in the path name, run $ORACLE_HOME/rdbms/admin/utldirsymlink.sql AS SYSDBA after upgrade. Recreate any directory objects listed, using path names that contain no symbolic links.",
      "broken rule" : "Some directory object path names may currently contain symbolic links.",
      "rule" : "Starting in Release 18c, symbolic links are not allowed in directory object path names used with BFILE data types, the UTL_FILE package, or external tables."
    }]
  }]
}

Fixup mode

Prior to upgrading, you may run the recommended fixups that are available. You can see which fixups will run by running Analyze mode and reviewing the checklist file. In AutoUpgrade Fixup mode, the database instances should be running out of the Source Oracle home with the databases open.

The console session:

[oracle@ip-172-31-88-93 ~]$ au -config thing1.thing2.18c.19c.conf -mode fixups
AutoUpgrade tool launched with default options
+--------------------------------+
| Starting AutoUpgrade execution |
+--------------------------------+
2 databases will be processed
Type 'help' to list console commands
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+---------------+
|Job#|DB_NAME|    STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|        MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+---------------+
| 102| THING2|PRECHECKS|PREPARING|RUNNING|19/09/29 01:04|     N/A|01:04:59|Loading DB info|
| 103| THING1|PRECHECKS|PREPARING|RUNNING|19/09/29 01:05|     N/A|01:05:03|Loading DB info|
+----+-------+---------+---------+-------+--------------+--------+--------+---------------+
Total jobs 2
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+-------------+
|Job#|DB_NAME|    STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|      MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+-------------+
| 102| THING2|PREFIXUPS|EXECUTING|RUNNING|19/09/29 01:04|     N/A|01:06:28|Remaining 3/4|
| 103| THING1|PREFIXUPS|EXECUTING|RUNNING|19/09/29 01:05|     N/A|01:05:21|Remaining 4/4|
+----+-------+---------+---------+-------+--------------+--------+--------+-------------+
Total jobs 2
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+-------------+
|Job#|DB_NAME|    STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|      MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+-------------+
| 102| THING2|PREFIXUPS|EXECUTING|RUNNING|19/09/29 01:04|     N/A|01:06:28|Remaining 3/4|
| 103| THING1|PREFIXUPS|EXECUTING|RUNNING|19/09/29 01:05|     N/A|01:06:36|Remaining 3/4|
+----+-------+---------+---------+-------+--------------+--------+--------+-------------+
Total jobs 2
upg>
upg> Job 102 completed
Job 103 completed
------------------- Final Summary --------------------
Number of databases            [ 2 ]

Jobs finished successfully     [2]
Jobs failed                    [0]
Jobs pending                   [0]
------------- JOBS FINISHED SUCCESSFULLY -------------
Job 102 FOR THING2
Job 103 FOR THING1

Space requirements

You must have sufficient space for archived redo logs. In this example, upgrading two databases (6.7 GB total) consumed 9.1 GB in RECO. Diskgroup RECO01 was 40 GB, and each of the two databases had db_recovery_file_dest_size=20g.

Upgrade vs Deploy mode

Two upgrade modes are available, Upgrade and Deploy.

Upgrade mode performs the actual upgrade. You should run the Analyze and Fixup steps first. Upgrade mode is an appropriate choice when the upgrade will be performed on a different host from the Source version. Upgrade mode requires that you manually issue “startup upgrade” in the new Oracle home.

Deploy mode performs the Analyze, Fixup, and Upgrade steps, and additional steps, on a single host in a single execution.

Upgrade mode

Startup upgrade

If you are going to use Upgrade mode, then switch the environment to the target Oracle home. Issue “startup upgrade”.

[ec2-user@ip-172-31-88-93 ~]$ cat /tmp/initTHING1.ora
spfile='+DATA01/THING1/PARAMETERFILE/spfile.276.1019927455'
[oracle@ip-172-31-88-93 ~]$ unset ORACLE_SID 
[oracle@ip-172-31-88-93 ~]$ . oraenv 
ORACLE_SID = [oracle] ? 
ORACLE_HOME = [/home/oracle] ? /u01/app/oracle/product/19.3.0/dbhome_1 
The Oracle base remains unchanged with value /u01/app/oracle 
[oracle@ip-172-31-88-93 ~]$ export ORACLE_SID=THING1 
[oracle@ip-172-31-88-93 ~]$ sysdba 
SQL*Plus: Release 19.0.0.0.0 - Production on Wed Sep 25 19:03:59 2019 
Version 19.3.0.0.0 

Copyright (c) 1982, 2019, Oracle.  All rights reserved. 

Connected to an idle instance. 
SQL> startup upgrade pfile='/tmp/initTHING1.ora'
ORACLE instance started.

Total System Global Area 1577054672 bytes
Fixed Size                  8896976 bytes
Variable Size             385875968 bytes
Database Buffers         1174405120 bytes
Redo Buffers                7876608 bytes
Database mounted.
Database opened.

SQL> Disconnected from Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0

[oracle@ip-172-31-88-93 ~]$ export ORACLE_SID=THING2
[ec2-user@ip-172-31-88-93 ~]$ cat /tmp/initTHING2.ora
spfile='+DATA01/THING2/PARAMETERFILE/spfile.257.1019928401'
[oracle@ip-172-31-88-93 ~]$ sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Wed Sep 25 19:06:08 2019
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup upgrade pfile='/tmp/initTHING2.ora'
ORACLE instance started.

Total System Global Area 1577054672 bytes
Fixed Size                  8896976 bytes
Variable Size             385875968 bytes
Database Buffers         1174405120 bytes
Redo Buffers                7876608 bytes
Database mounted.
Database opened.
SQL> Disconnected from Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0

Run Upgrade

Start AutoUpgrade. After the upgrade has been launched in the background, you will be dropped into the CLI. Run “lsj” from time to time to check on status.

[oracle@ip-172-31-88-93 ~]$ au -config thing1.thing2.18c.19c.conf -mode upgrade
Autoupgrade tool launched with default options
+--------------------------------+
| Starting AutoUpgrade execution |
+--------------------------------+
2 databases will be processed
Enter some command, type 'help' or 'exit' to quit
upg> tasks
+--+---------------------+-------------+
|ID|                 Name|       Status|
+--+---------------------+-------------+
| 1|                 main|      WAITING|
|33|             jobs_mon|      WAITING|
|34|             mgr_help|TIMED_WAITING|
|35|             watchdog|TIMED_WAITING|
|36|              console|     RUNNABLE|
|37|         queue_reader|      WAITING|
|38|                cmd-0|      WAITING|
|39|        job_manager-0|      WAITING|
|40|        job_manager-1|      WAITING|
|42|           bqueue-108|      WAITING|
|50|       monitor_thing1|TIMED_WAITING|
|51|        catctl_thing1|      WAITING|
|52| abort_monitor_thing1|TIMED_WAITING|
|54|           async_read|     RUNNABLE|
|55|           bqueue-109|      WAITING|
|58|           async_read|     RUNNABLE|
+--+---------------------+-------------+
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+-------+
|JOB#|DB NAME|    STAGE|OPERATION| STATUS|    START TIME|END TIME| UPDATED|MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+-------+
| 108| THING1|DBUPGRADE|EXECUTING|RUNNING|19/09/25 19:09|     N/A|19:09:39|Running|
| 109| THING2|DBUPGRADE|EXECUTING|RUNNING|19/09/25 19:09|     N/A|19:09:42|Running|
+----+-------+---------+---------+-------+--------------+--------+--------+-------+
Total jobs 2
Total jobs 2
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
|JOB#|DB NAME|    STAGE|OPERATION| STATUS|    START TIME|END TIME| UPDATED|     MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
| 108| THING1|DBUPGRADE|EXECUTING|RUNNING|19/09/25 19:09|     N/A|19:12:46|13%Upgraded |
| 109| THING2|DBUPGRADE|EXECUTING|RUNNING|19/09/25 19:09|     N/A|19:12:50|14%Upgraded |
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
Total jobs 2
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
|JOB#|DB NAME|    STAGE|OPERATION| STATUS|    START TIME|END TIME| UPDATED|     MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
| 108| THING1|DBUPGRADE|EXECUTING|RUNNING|19/09/25 19:09|     N/A|19:18:48|41%Upgraded |
| 109| THING2|DBUPGRADE|EXECUTING|RUNNING|19/09/25 19:09|     N/A|19:18:52|42%Upgraded |
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
Total jobs 2
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
|JOB#|DB NAME|    STAGE|OPERATION| STATUS|    START TIME|END TIME| UPDATED|     MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
| 108| THING1|DBUPGRADE|EXECUTING|RUNNING|19/09/25 19:09|     N/A|19:21:49|49%Upgraded |
| 109| THING2|DBUPGRADE|EXECUTING|RUNNING|19/09/25 19:09|     N/A|19:21:53|49%Upgraded |
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
Total jobs 2
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
|JOB#|DB NAME|    STAGE|OPERATION| STATUS|    START TIME|END TIME| UPDATED|     MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
| 108| THING1|DBUPGRADE|EXECUTING|RUNNING|19/09/25 19:09|     N/A|19:34:21| 0%Compiled |
| 109| THING2|DBUPGRADE|EXECUTING|RUNNING|19/09/25 19:09|     N/A|19:33:12|94%Upgraded |
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
Total jobs 2
upg> lsj
+----+-------+----------+---------+-------+--------------+--------+--------+-------+
|JOB#|DB NAME|     STAGE|OPERATION| STATUS|    START TIME|END TIME| UPDATED|MESSAGE|
+----+-------+----------+---------+-------+--------------+--------+--------+-------+
| 108| THING1|POSTFIXUPS|EXECUTING|RUNNING|19/09/25 19:09|     N/A|19:38:39|       |
| 109| THING2|POSTFIXUPS|EXECUTING|RUNNING|19/09/25 19:09|     N/A|19:38:39|       |
+----+-------+----------+---------+-------+--------------+--------+--------+-------+
Total jobs 2
upg> lsj
+----+-------+----------+---------+-------+--------------+--------+--------+-------------+
|JOB#|DB NAME|     STAGE|OPERATION| STATUS|    START TIME|END TIME| UPDATED|      MESSAGE|
+----+-------+----------+---------+-------+--------------+--------+--------+-------------+
| 108| THING1|POSTFIXUPS|EXECUTING|RUNNING|19/09/25 19:09|     N/A|19:40:52|Remaining 1/3|
| 109| THING2|POSTFIXUPS|EXECUTING|RUNNING|19/09/25 19:09|     N/A|19:40:55|Remaining 1/3|
+----+-------+----------+---------+-------+--------------+--------+--------+-------------+
Total jobs 2
upg> lsj
+----+-------+----------+---------+-------+--------------+--------+--------+-------------+
|JOB#|DB NAME|     STAGE|OPERATION| STATUS|    START TIME|END TIME| UPDATED|      MESSAGE|
+----+-------+----------+---------+-------+--------------+--------+--------+-------------+
| 108| THING1|POSTFIXUPS|EXECUTING|RUNNING|19/09/25 19:09|     N/A|19:40:52|Remaining 1/3|
| 109| THING2|POSTFIXUPS|EXECUTING|RUNNING|19/09/25 19:09|     N/A|19:40:55|Remaining 1/3|
+----+-------+----------+---------+-------+--------------+--------+--------+-------------+
Total jobs 2
upg>
Job 108 for THING1 FINISHED
Job 109 for THING2 FINISHED

The CLI exits when all jobs are done.

Execution time

You can find out the execution time by reviewing the logs.

/u01/app/oracle/autoupgrade/global/cfgtoollogs/upgrade/auto/autoupgrade_user.log

2019-09-25 19:09:22.185 INFO Loading user config file metadata
2019-09-25 19:09:28.405 INFO The target_base parameter was updated from N/A to /u01/app/oracle due to finding a more accurate value.
2019-09-25 19:09:28.418 INFO The target_version parameter was updated from 19.3 to 19.3.0.0.0 due to finding a more accurate value.
2019-09-25 19:09:29.444 INFO Finished processing dbEntry upg1
2019-09-25 19:09:35.597 INFO The target_base parameter was updated from N/A to /u01/app/oracle due to finding a more accurate value.
2019-09-25 19:09:35.602 INFO The target_version parameter was updated from 19.3 to 19.3.0.0.0 due to finding a more accurate value.
2019-09-25 19:09:36.627 INFO Finished processing dbEntry upg2
2019-09-25 19:09:36.634 INFO
build.version:20190207
build.date:2019/02/07 12:35:56
build.label:RDBMS_PT.AUTOUPGRADE_LINUX.X64_190205.1800

2019-09-25 19:09:36.635 INFO Current settings Initialized
2019-09-25 19:09:36.658 INFO Starting
2019-09-25 19:30:41.216 INFO Could not determine the flashback info
2019-09-25 19:30:43.240 INFO Could not determine the flashback info
2019-09-25 19:32:41.213 INFO Could not determine the flashback info
2019-09-25 19:32:43.237 INFO Could not determine the flashback info
2019-09-25 19:36:41.220 INFO Could not determine the flashback info
2019-09-25 19:36:43.253 INFO Could not determine the flashback info
2019-09-25 19:37:40.213 INFO Could not determine the flashback info
2019-09-25 19:37:41.237 INFO Could not determine the flashback info
2019-09-25 19:41:40.213 INFO Could not determine the flashback info
2019-09-25 19:41:41.238 INFO Could not determine the flashback info
2019-09-25 19:42:41.212 INFO Could not determine the flashback info
2019-09-25 19:42:43.235 INFO Could not determine the flashback info
2019-09-25 19:44:15.396 INFO Closing

Elapsed time was 34 minutes.
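Rather than eyeballing the timestamps, a small shell function can compute the elapsed time from the first and last lines of autoupgrade_user.log. This is a sketch assuming GNU date, as found on the Red Hat system used here:

```shell
# Print whole minutes elapsed between the first and last timestamped lines
# of an AutoUpgrade log, where lines look like "2019-09-25 19:09:22.185 INFO ...".
elapsed_minutes() {
  local first last
  first=$(head -1 "$1" | cut -d. -f1)   # strip fractional seconds and message text
  last=$(tail -1 "$1" | cut -d. -f1)
  echo $(( ( $(date -d "$last" +%s) - $(date -d "$first" +%s) ) / 60 ))
}
```

Run against the log above, elapsed_minutes reports 34 minutes, matching the Starting and Closing entries.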

Deploy mode

Deploy mode completes all upgrade steps from soup to nuts: Analyze, Fixup, and Upgrade.

Additional steps in Deploy mode

Deploy mode includes steps that are not covered by Analyze, Fixup, and Upgrade.

A careful review of the documentation reveals that Deploy mode creates a guaranteed restore point. You must exercise diligence to remove the restore point after it is no longer required, or you may optionally configure drop_grp_after_upgrade=yes.
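For example, the restore point housekeeping can be delegated to AutoUpgrade by adding the parameter to each database section of the configuration file:

```
upg1.drop_grp_after_upgrade=yes
upg2.drop_grp_after_upgrade=yes
```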

Deploy mode contains a drain step, during which AutoUpgrade drains database sessions from the source instance.

Preupgrade refers to checks of your system, including disk space.

The Postupgrade documentation refers to moving the source configuration file and starting the upgraded instance. However, Upgrade mode also starts the upgraded instance.

Deploy mode runs available fixups that correct Warning, Recommend, and Info precheck findings. AutoUpgrade makes these changes without asking you. If you are using change management, be aware of and document all changes, as required by your organization.

Run Deploy

Run deploy from the Source Oracle home with the databases opened normally. In other words, do not issue “startup upgrade” with Deploy mode.

[oracle@ip-172-31-88-93 ~]$ au -config thing1.thing2.18c.19c.conf -mode deploy
AutoUpgrade tool launched with default options
+--------------------------------+
| Starting AutoUpgrade execution |
+--------------------------------+
2 databases will be processed
Type 'help' to list console commands
upg> lsj
+----+-------+----------+---------+--------+--------------+--------+--------+---------+
|Job#|DB_NAME|     STAGE|OPERATION|  STATUS|    START_TIME|END_TIME| UPDATED|  MESSAGE|
+----+-------+----------+---------+--------+--------------+--------+--------+---------+
| 114| THING2|PREUPGRADE|EXECUTING| RUNNING|19/09/26 14:41|     N/A|14:41:48|         |
| 115| THING1|     SETUP|PREPARING|FINISHED|19/09/26 14:43|     N/A|14:41:47|Scheduled|
+----+-------+----------+---------+--------+--------------+--------+--------+---------+
Total jobs 2
upg> lsj
+----+-------+---------+---------+--------+--------------+--------+--------+-------------+
|Job#|DB_NAME|    STAGE|OPERATION|  STATUS|    START_TIME|END_TIME| UPDATED|      MESSAGE|
+----+-------+---------+---------+--------+--------------+--------+--------+-------------+
| 114| THING2|PREFIXUPS|EXECUTING| RUNNING|19/09/26 14:41|     N/A|14:42:25|Remaining 4/4|
| 115| THING1|    SETUP|PREPARING|FINISHED|19/09/26 14:43|     N/A|14:41:47|    Scheduled|
+----+-------+---------+---------+--------+--------------+--------+--------+-------------+
Total jobs 2
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+-------------+
|Job#|DB_NAME|    STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|      MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+-------------+
| 114| THING2|PREFIXUPS|EXECUTING|RUNNING|19/09/26 14:41|     N/A|14:42:25|Remaining 4/4|
| 115| THING1|PREFIXUPS|EXECUTING|RUNNING|19/09/26 14:43|     N/A|14:43:51|Remaining 4/4|
+----+-------+---------+---------+-------+--------------+--------+--------+-------------+
Total jobs 2
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+-------------+
|Job#|DB_NAME|    STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|      MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+-------------+
| 114| THING2|PREFIXUPS|EXECUTING|RUNNING|19/09/26 14:41|     N/A|14:45:58|Remaining 3/4|
| 115| THING1|PREFIXUPS|EXECUTING|RUNNING|19/09/26 14:43|     N/A|14:47:27|Remaining 3/4|
+----+-------+---------+---------+-------+--------------+--------+--------+-------------+
Total jobs 2
upg> tasks
+---+------------------+-------------+
| ID|              NAME|         Job#|
+---+------------------+-------------+
|  1|              main|      WAITING|
| 40|          jobs_mon|      WAITING|
| 41|           console|     RUNNABLE|
| 42|      queue_reader|      WAITING|
| 43|             cmd-0|      WAITING|
| 54|     job_manager-0|      WAITING|
| 55|     job_manager-1|      WAITING|
| 58|        event_loop|TIMED_WAITING|
| 59|        bqueue-114|      WAITING|
|200|         exec_loop|      WAITING|
|201|        bqueue-115|      WAITING|
|337|        fixups-115|      WAITING|
|338|    rep_checks-115|TIMED_WAITING|
|340|    thing1-puifx-0|      WAITING|
|341|    thing1-puifx-1|      WAITING|
|353|          quickSQL|     RUNNABLE|
|388|THING2-steady-ts-0|     RUNNABLE|
|400|          quickSQL|   TERMINATED|
+---+------------------+-------------+
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+-----------+
|Job#|DB_NAME|    STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|    MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+-----------+
| 114| THING2|DBUPGRADE|EXECUTING|RUNNING|19/09/26 14:41|     N/A|14:51:53|0%Upgraded |
| 115| THING1|DBUPGRADE|EXECUTING|RUNNING|19/09/26 14:43|     N/A|14:52:30|0%Upgraded |
+----+-------+---------+---------+-------+--------------+--------+--------+-----------+
Total jobs 2
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
|Job#|DB_NAME|    STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|     MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
| 114| THING2|DBUPGRADE|EXECUTING|RUNNING|19/09/26 14:41|     N/A|14:57:58|21%Upgraded |
| 115| THING1|DBUPGRADE|EXECUTING|RUNNING|19/09/26 14:43|     N/A|14:58:35|18%Upgraded |
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
Total jobs 2
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
|Job#|DB_NAME|    STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|     MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
| 114| THING2|DBUPGRADE|EXECUTING|RUNNING|19/09/26 14:41|     N/A|15:01:01|21%Upgraded |
| 115| THING1|DBUPGRADE|EXECUTING|RUNNING|19/09/26 14:43|     N/A|15:01:37|21%Upgraded |
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
Total jobs 2
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
|Job#|DB_NAME|    STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|     MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
| 114| THING2|DBUPGRADE|EXECUTING|RUNNING|19/09/26 14:41|     N/A|15:07:06|37%Upgraded |
| 115| THING1|DBUPGRADE|EXECUTING|RUNNING|19/09/26 14:43|     N/A|15:07:42|37%Upgraded |
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
Total jobs 2
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
|Job#|DB_NAME|    STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|     MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
| 114| THING2|DBUPGRADE|EXECUTING|RUNNING|19/09/26 14:41|     N/A|15:13:10|49%Upgraded |
| 115| THING1|DBUPGRADE|EXECUTING|RUNNING|19/09/26 14:43|     N/A|15:13:46|49%Upgraded |
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
Total jobs 2
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
|Job#|DB_NAME|    STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|     MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
| 114| THING2|DBUPGRADE|EXECUTING|RUNNING|19/09/26 14:41|     N/A|15:22:17|75%Upgraded |
| 115| THING1|DBUPGRADE|EXECUTING|RUNNING|19/09/26 14:43|     N/A|15:22:53|75%Upgraded |
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
Total jobs 2
upg> tasks
+---+---------------------+-------------+
| ID|                 NAME|         Job#|
+---+---------------------+-------------+
|  1|                 main|      WAITING|
| 40|             jobs_mon|      WAITING|
| 41|              console|     RUNNABLE|
| 42|         queue_reader|      WAITING|
| 43|                cmd-0|      WAITING|
| 54|        job_manager-0|      WAITING|
| 55|        job_manager-1|      WAITING|
| 58|           event_loop|      WAITING|
| 59|           bqueue-114|      WAITING|
|200|            exec_loop|TIMED_WAITING|
|201|           bqueue-115|      WAITING|
|634|       monitor_thing2|TIMED_WAITING|
|635|        catctl_THING2|      WAITING|
|636| abort_monitor_THING2|TIMED_WAITING|
|637|           async_read|     RUNNABLE|
|649|       monitor_thing1|TIMED_WAITING|
|650|        catctl_THING1|      WAITING|
|651| abort_monitor_THING1|TIMED_WAITING|
|652|           async_read|     RUNNABLE|
|772|             quickSQL|     RUNNABLE|
+---+---------------------+-------------+
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
|Job#|DB_NAME|    STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|     MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
| 114| THING2|DBUPGRADE|EXECUTING|RUNNING|19/09/26 14:41|     N/A|15:28:21|91%Upgraded |
| 115| THING1|DBUPGRADE|EXECUTING|RUNNING|19/09/26 14:43|     N/A|15:28:58|91%Upgraded |
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
Total jobs 2
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
|Job#|DB_NAME|    STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|     MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
| 114| THING2|DBUPGRADE|EXECUTING|RUNNING|19/09/26 14:41|     N/A|15:38:14|92%Compiled |
| 115| THING1|DBUPGRADE|EXECUTING|RUNNING|19/09/26 14:43|     N/A|15:38:34|79%Compiled |
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
Total jobs 2
upg> lsj
+----+-------+---------+---------+--------+--------------+--------+--------+-------------------+
|Job#|DB_NAME|    STAGE|OPERATION|  STATUS|    START_TIME|END_TIME| UPDATED|            MESSAGE|
+----+-------+---------+---------+--------+--------------+--------+--------+-------------------+
| 114| THING2|DBUPGRADE|EXECUTING|FINISHED|19/09/26 14:41|     N/A|15:46:19|Restarting Database|
| 115| THING1|DBUPGRADE|EXECUTING| RUNNING|19/09/26 14:43|     N/A|15:46:40|       98%Compiled |
+----+-------+---------+---------+--------+--------------+--------+--------+-------------------+
Total jobs 2
upg> lsj
+----+-------+----------+---------+-------+--------------+--------+--------+-------------+
|Job#|DB_NAME|     STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|      MESSAGE|
+----+-------+----------+---------+-------+--------------+--------+--------+-------------+
| 114| THING2|POSTFIXUPS|EXECUTING|RUNNING|19/09/26 14:41|     N/A|15:51:05|Remaining 1/3|
| 115| THING1|POSTFIXUPS|EXECUTING|RUNNING|19/09/26 14:43|     N/A|15:51:29|Remaining 1/3|
+----+-------+----------+---------+-------+--------------+--------+--------+-------------+
Total jobs 2
upg> Job 114 completed
lsj
+----+-------+-----------+---------+--------+--------------+--------------+--------+-----------------+
|Job#|DB_NAME|      STAGE|OPERATION|  STATUS|    START_TIME|      END_TIME| UPDATED|          MESSAGE|
+----+-------+-----------+---------+--------+--------------+--------------+--------+-----------------+
| 114| THING2|POSTUPGRADE|  STOPPED|FINISHED|19/09/26 14:41|19/09/26 15:55|15:55:23|Completed job 114|
| 115| THING1|POSTUPGRADE|EXECUTING| RUNNING|19/09/26 14:43|           N/A|15:54:57|       Restarting|
+----+-------+-----------+---------+--------+--------------+--------------+--------+-----------------+
Total jobs 2
upg> Job 115 completed
------------------- Final Summary --------------------
Number of databases            [ 2 ]

Jobs finished successfully     [2]
Jobs failed                    [0]
Jobs pending                   [0]
------------- JOBS FINISHED SUCCESSFULLY -------------
Job 114 FOR THING2
Job 115 FOR THING1

---- Drop GRP at your convenience once you consider it is no longer needed ----
Drop GRP from THING2: drop restore point AUTOUPGRADE_221145114461854_THING2
Drop GRP from THING1: drop restore point AUTOUPGRADE_221145114461854_THING1

[oracle@ip-172-31-88-93 ~]$

The final state is all databases upgraded and open.

In Deploy mode, we have a new issue: there is now a guaranteed restore point. Unless you drop the restore point, you will eventually get a stuck archiver.

SQL> select guarantee_flashback_database gua, name from v$restore_point;

GUA NAME
--- ----------------------------------------
YES AUTOUPGRADE_221145114461854_THING1

Loose ends

There are items that you must tidy up manually, even if you use AutoUpgrade in Deploy mode:

  • CRS still points to the old Oracle home
  • /etc/oratab still points to the old Oracle home
  • In Deploy mode, a guaranteed restore point exists

Run “srvctl upgrade”.

[oracle@ip-172-31-88-93 ~]$ srvctl upgrade database -database THING1 -oraclehome /u01/app/oracle/product/19.3.0/dbhome_1
[oracle@ip-172-31-88-93 ~]$ srvctl upgrade database -database THING2 -oraclehome /u01/app/oracle/product/19.3.0/dbhome_1

You can edit oratab or run “srvctl start database”.

[oracle@ip-172-31-88-93 ~]$ srvctl start database -database THING1
[oracle@ip-172-31-88-93 ~]$ srvctl start database -database THING2
[oracle@ip-172-31-88-93 ~]$ grep THING /etc/oratab
THING1:/u01/app/oracle/product/19.3.0/dbhome_1:N                # line added by Agent
THING2:/u01/app/oracle/product/19.3.0/dbhome_1:N                # line added by Agent
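
With many databases, the oratab check can be scripted. The sketch below verifies that every THING* entry points at the 19c home; it runs against a local sample file here, but on a real host you would point ORATAB at /etc/oratab.

```shell
# Flag any THING* oratab entry whose Oracle home is not the 19c home.
# ORATAB points at a local sample; use /etc/oratab on a real host.
ORATAB=$(mktemp)
cat > "$ORATAB" <<'EOF'
THING1:/u01/app/oracle/product/19.3.0/dbhome_1:N
THING2:/u01/app/oracle/product/19.3.0/dbhome_1:N
EOF
# oratab format is SID:ORACLE_HOME:STARTUP_FLAG; field 2 is the home
stale=$(awk -F: '/^THING/ && $2 !~ /19\.3\.0/ { print $1 }' "$ORATAB")
if [ -z "$stale" ]; then echo "all homes updated"; else echo "stale: $stale"; fi
rm -f "$ORATAB"
```

An empty result means no database is still registered against the old home.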

Drop the restore points

SQL> drop restore point AUTOUPGRADE_221145114461854_THING1;

Restore point dropped.

and

SQL> drop restore point AUTOUPGRADE_221145114461854_THING2;

Restore point dropped.

Bug in distribution version

The AutoUpgrade version distributed with Oracle 19.3 does not work well with ASM. Analyze mode will produce a false error, and Deploy mode will fail.

If you are using ASM, please download the most recent autoupgrade.jar version or use Upgrade mode only. If you use Upgrade mode, you must issue “startup upgrade” from the new Oracle home.
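
To know whether you are running a fixed build, check the jar version first. The -version flag is documented; the jar path and the sample output lines below are assumptions of the typical format, not output from this demo.

```shell
# On a real host: java -jar $ORACLE_HOME/rdbms/admin/autoupgrade.jar -version
# Sample of the kind of output -version produces (values are illustrative):
sample='build.hash 926741a
build.version 19.7.3
build.date 2020/04/29'
# extract the build.version value
ver=$(echo "$sample" | awk '$1 == "build.version" { print $2 }')
echo "autoupgrade build: $ver"
```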

Conclusion

AutoUpgrade is an easy-to-use utility that upgrades multiple Oracle databases in the background. Be aware of these points:

  • Do not use the distribution autoupgrade.jar. Download the latest version.
  • On small hosts with multiple instances, manage CPU and memory.
  • Check that your database is in archivelog mode.
  • Allocate sufficient space for archive logs.
  • Use an alias to simplify the command line.
  • Use rlwrap for a more efficient CLI experience.
  • Use Analyze mode to identify errors, warnings, and fixups.
  • Use Fixup mode to apply fixups ahead of time.
  • Use Upgrade mode if you wish to handle analyze, fixup, and upgrade as separate steps.
  • Use Upgrade mode if you wish to migrate to a new host.
  • Handle all steps automatically by using Deploy mode.
  • Be aware of fixups that will be made in Deploy mode.
  • Monitor progress at the CLI prompt with “lsj”.
  • Tie up loose ends:
    • srvctl upgrade
    • edit oratab
    • drop restore point
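
The “au” command used in the Deploy session above is an alias. One way it might be defined follows; the jar path is an assumption for this demo host, and rlwrap is optional but gives command history and line editing at the upg> prompt.

```shell
# "au" as a shell alias; adjust the jar path to your 19c home
alias au='rlwrap java -jar /u01/app/oracle/product/19.3.0/dbhome_1/rdbms/admin/autoupgrade.jar'
# confirm the alias is recorded
alias au
```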


Unexpected behavior in Data Guard Maximum Protection

By Brian Fitzgerald

Introduction

Maximum Protection mode provides the highest level of data protection in Data Guard. Although a Maximum Protection system protects data as documented, there are performance issues that database administrators should be aware of. The most glaring weakness is that a network timeout anywhere in the system causes the primary database to hang. Administrators should understand this before deciding on Maximum Protection mode.

Protection Modes

To quote the manual, “Oracle Data Guard ensures high availability, data protection, and disaster recovery for enterprise data.” Oracle Data Guard provides three distinct modes of data protection.

Maximum Performance

This is the default protection mode. It provides the highest level of data protection that is possible without affecting the performance of a primary database. 

Maximum Availability

This protection mode provides the highest level of data protection that is possible without compromising the availability of a primary database. 

Maximum Protection

This protection mode ensures that no data loss occurs if the primary database fails. To provide this level of protection, the redo data needed to recover a transaction must be written to both the online redo log and to the standby redo log on at least one synchronized standby database before the transaction commits. To ensure that data loss cannot occur, the primary database shuts down, rather than continue processing transactions, if it cannot write its redo stream to at least one synchronized standby database.

This article is about Maximum Protection mode. Based on the description, a few obvious points can be deduced about Maximum Protection. First, a practical system must have more than one standby; otherwise, failure of the sole standby will force the primary to shut down. Second, a low-latency network is required to achieve reasonable commit performance. Finally, completely redundant hosts and storage are required for each standby. In the past, with physical hardware and long administrative lead times, it was prohibitive to experiment with Maximum Protection. In the cloud, however, virtual machines are inexpensive and setup is quick.

License

Data Guard is a feature of Oracle Database Enterprise Edition itself and does not require separate licensing. By using Amazon Elastic Compute Cloud (EC2), you can control your license costs by configuring only the CPUs that you need.

Overview

The demonstration system characteristics are:

  • Amazon AWS
  • Region us-west-1 (Northern California)
  • Availability zones us-west-1b and us-west-1c
  • Elastic Compute Cloud (EC2)
  • Red Hat Linux 7.2
  • Oracle Grid Infrastructure 19c
  • Oracle Database 19c

AWS availability zones are separate locations within the same geographic region. In us-west-1, intra-availability-zone TCP latency, as measured by qperf, is 136μs, which is attractive for setting up Oracle Data Guard Maximum Protection. Inter-availability-zone TCP latency is 562μs. Note that availability zone identifiers are mapped independently for each account; for example, availability zone us-west-1b could refer to different locations for different accounts. Here is the mapping from database to availability zone that was used in this test.

DB unique name Availability zone
SFB1 us-west-1b
SFB2 us-west-1b
SFC1 us-west-1c
SFC2 us-west-1c
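
The latencies quoted above were measured with qperf, which runs as a server on one host and a client on the other. The commands and the sample output line below are a sketch of typical qperf usage; the hostnames and the exact output formatting are assumptions.

```shell
# On the target host:  qperf                      (starts the server side)
# On the source host:  qperf <target-ip> tcp_lat  (measures TCP latency)
# qperf reports a block like the sample below; parse out the latency value:
sample='tcp_lat:
    latency  =  136 us'
lat=$(echo "$sample" | awk '/latency/ { print $3, $4 }')
echo "measured: $lat"
```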

Setup

Here is a brief summary of the preparatory steps:

  • environment:
    • create and configure virtual machine
    • install Grid Infrastructure
    • create ASM disk groups
    • install Oracle Database software
    • set up static listener
  • duplicate database
    • set up tnsnames.ora
    • create orapwd
    • create temporary init.ora
    • startup nomount
    • duplicate target database for standby
  • set up Oracle Restart
  • validate all static listeners
  • start Data Guard broker
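
For the tnsnames.ora step, each host needs an entry for every database in the configuration. A sketch of one such entry follows; the hostname is an assumption, not taken from this demo.

```ini
SFB1 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = sfb1-host)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = SFB1)
    )
  )
```

The connect identifiers used in the broker commands below (SFB1, SFB2, SFC1, SFC2) resolve through these entries.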

Create configuration

DGMGRL> create configuration 'CFG' primary database is 'SFB1' connect identifier is 'SFB1';
Configuration "CFG" created with primary database "SFB1"
DGMGRL> add database 'SFB2' as connect identifier is 'SFB2';
Database "SFB2" added
DGMGRL> add database 'SFC1' as connect identifier is 'SFC1';
Database "SFC1" added
DGMGRL> add database 'SFC2' as connect identifier is 'SFC2';
Database "SFC2" added

Set the log transport property to SYNC, then set Maximum Availability.

DGMGRL> edit database SFB2 set property LogXptMode=SYNC;
Property "logxptmode" updated
DGMGRL> edit database SFC1 set property LogXptMode=SYNC;
Property "logxptmode" updated
DGMGRL> edit database SFC2 set property LogXptMode=SYNC;
Property "logxptmode" updated
DGMGRL> edit configuration set protection mode as MaxAvailability;
Succeeded.

Test switchovers. Primary SFB1->SFB2->SFC1->SFC2->SFB1.

DGMGRL> switchover to SFB2
Performing switchover NOW, please wait...
Operation requires a connection to database "SFB2"
Connecting ...
Connected to "SFB2"
Connected as SYSDBA.
New primary database "SFB2" is opening...
Oracle Clusterware is restarting database "SFB1" ...
Connected to "SFB1"
Connected to an idle instance.
Connected to an idle instance.
Connected to an idle instance.
Connected to "SFB1"
Connected to "SFB1"
Switchover succeeded, new primary is "sfb2"
DGMGRL> switchover to SFC1
Performing switchover NOW, please wait...
Operation requires a connection to database "SFC1"
Connecting ...
Connected to "SFC1"
Connected as SYSDBA.
New primary database "SFC1" is opening...
Oracle Clusterware is restarting database "SFB2" ...
Connected to "SFB2"
Connected to an idle instance.
Connected to an idle instance.
Connected to an idle instance.
Connected to "SFB2"
Connected to "SFB2"
Switchover succeeded, new primary is "sfc1"
DGMGRL> switchover to SFC2
Performing switchover NOW, please wait...
Operation requires a connection to database "SFC2"
Connecting ...
Connected to "SFC2"
Connected as SYSDBA.
New primary database "SFC2" is opening...
Oracle Clusterware is restarting database "SFC1" ...
Connected to "SFC1"
Connected to an idle instance.
Connected to an idle instance.
Connected to an idle instance.
Connected to "SFC1"
Connected to "SFC1"
Switchover succeeded, new primary is "sfc2"
DGMGRL> switchover to SFB1
Performing switchover NOW, please wait...
Operation requires a connection to database "SFB1"
Connecting ...
Connected to "SFB1"
Connected as SYSDBA.
New primary database "SFB1" is opening...
Oracle Clusterware is restarting database "SFC2" ...
Connected to "SFC2"
Connected to an idle instance.
Connected to an idle instance.
Connected to "SFC2"
Connected to "SFC2"
Switchover succeeded, new primary is "sfb1"

Now we are ready to implement Maximum Protection mode.

Set Maximum Protection

DGMGRL> edit configuration set protection mode as MaxProtection; 
Succeeded.

Test switchovers again. There is no issue.

DGMGRL> switchover to SFC2
Performing switchover NOW, please wait...
Operation requires a connection to database "SFC2"
Connecting ...
Connected to "SFC2"
Connected as SYSDBA.
New primary database "SFC2" is opening...
Oracle Clusterware is restarting database "SFB1" ...
Connected to "SFB1"
Connected to an idle instance.
Connected to an idle instance.
Connected to an idle instance.
Connected to "SFB1"
Connected to "SFB1"
Switchover succeeded, new primary is "sfc2"

Network failures

Failure of some, but not all standbys

Testing actual network failures is where you will notice serious issues with Maximum Protection mode.

The initial status is normal. All standbys are in sync with the primary.

DGMGRL> show configuration

Configuration - CFG

  Protection Mode: MaxProtection
  Members:
  SFB1 - Primary database
    SFB2 - Physical standby database
    SFC1 - Physical standby database
    SFC2 - Physical standby database

Fast-Start Failover:  Disabled

Configuration Status:
SUCCESS   (status updated 21 seconds ago)

Now we want to cause a network failure to one standby, SFC2, for example. We stage the failure by cutting TCP port 1521. In the AWS console, create a new security group that allows ingress to TCP port 22 only.

Go to Change Security Groups. Un-check the original security group, check the “ssh only” security group, and press “Assign Security Groups”. There are three standbys, but we are cutting off network access to only one standby. The system should continue to operate normally. However, right away, you can notice a problem. Try to commit a transaction.
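
The same console steps can be scripted with the AWS CLI by replacing the instance's security group set. The command is real AWS CLI syntax, but the instance and security group IDs below are placeholders, not values from this demo.

```shell
# placeholders: substitute your instance ID and the ssh-only group ID
instance_id='i-0123456789abcdef0'
ssh_only_sg='sg-0abc0abc0abc0abc0'
# replaces ALL attached security groups, cutting off port 1521
cmd="aws ec2 modify-instance-attribute --instance-id $instance_id --groups $ssh_only_sg"
echo "$cmd"
```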

SQL> insert into t ( n ) values ( 0 );

1 row created.

SQL> commit;

You will find that the transaction does not complete. You can run “show configuration”. For 30 seconds, the configuration shows up as normal. No messages appear in the alert log. All the while, only one standby is unreachable. Two other standbys are reachable, but the database hangs. Eventually, Oracle notices the issue. Messages start appearing in the alert log.

2019-09-13T12:43:51.076718-04:00
LGWR (PID:26748): ORA-16198: Received timed out error from KSR
LGWR (PID:26748): Attempting LAD:4 network reconnect (16198)
LGWR (PID:26748): LAD:4 network reconnect abandoned

The transaction commits. The configuration now appears as:

DGMGRL> show configuration

Configuration - CFG

  Protection Mode: MaxProtection
  Members:
  SFB1 - Primary database
    SFB2 - Physical standby database
    SFC1 - Physical standby database
    SFC2 - Physical standby database
      Warning: ORA-16857: member disconnected from redo source for longer than specified threshold

Fast-Start Failover:  Disabled

Configuration Status:
WARNING   (status updated 8 seconds ago)

You can repeat the process for a second standby, SFC1, for example. Again, the database hangs for 30 seconds, and finally resolves.

DGMGRL> show configuration

Configuration - CFG

  Protection Mode: MaxProtection
  Members:
  SFB1 - Primary database
    Error: ORA-16778: redo transport error for one or more members

    SFB2 - Physical standby database
    SFC1 - Physical standby database
      Warning: ORA-16857: member disconnected from redo source for longer than specified threshold

    SFC2 - Physical standby database
      Warning: ORA-16857: member disconnected from redo source for longer than specified threshold

Fast-Start Failover:  Disabled

Configuration Status:
ERROR   (status updated 160 seconds ago)

This behavior, where failure of some, but not all, standbys leads to an application hang, is going to be objectionable to some application users.

Failure of all standbys

Now disconnect the third standby. Observe this behavior:

  • The primary will retry the standbys for 5 minutes.
  • Sessions that issue a commit will wait.
  • The primary will abort and restart.
  • During restart, the instance will hang on open database.
  • After 8 more minutes, the primary will abort.

Shutdown of the primary is the expected behavior after a failure of all standbys. The following message will appear in SQL*Plus sessions that were waiting on commit.

commit
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 10524
Session ID: 39 Serial number: 7415

Some may find it surprising that the shutdown is not instantaneous, and that it takes five minutes before the database shuts down.

Adjusting NetTimeout

You can reduce the amount of time that an application can wait on TCP timeout:

DGMGRL> edit database SFB1 set property NetTimeout = 5;
Property "nettimeout" updated
DGMGRL> edit database SFB2 set property NetTimeout = 5;
Property "nettimeout" updated
DGMGRL> edit database SFC1 set property NetTimeout = 5;
Property "nettimeout" updated
DGMGRL> edit database SFC2 set property NetTimeout = 5;
Property "nettimeout" updated

This is a workaround, and it is beside the point. The application still has to wait on commit even though one or more standbys are still available. There should be no hanging at all.

Conclusion

There is a flaw in Data Guard Maximum Protection. If a TCP timeout error involves only one standby, the primary hangs even if other standbys are available. This issue will lead some administrators to rule out Data Guard Maximum Protection.

In the past, because of high hardware costs, long lead times, and separation of duties across DBAs and SAs, it was rare to test Maximum Protection. Now, with the cloud, testing such arrangements is convenient, but testing uncovers weaknesses in the design.

Likewise, in the past, because of separation of duties across DBAs and network administrators, it was rare for DBAs to stage TCP timeout tests, but such tests are simple to perform now. TCP timeout tests uncover weaknesses in Data Guard.

Data Guard Maximum Availability has improved over time to the point where it is an acceptable substitute for Maximum Performance in some cases.

Oracle 12.1 introduced the far sync instance, which can be configured near a primary database on a low latency network to provide near-zero-data loss capability. Maximum Availability and far sync are described in this article. An Active Data Guard license is required for far sync. Ironically, in this sense, Maximum Availability costs more than Maximum Protection, the highest level.

srvctl enable database not working in Restart

By Brian Fitzgerald

Introduction

In Restart, “srvctl enable database” does not actually enable automatic database restart. This could be a bug.

Demonstration

srvctl add database

[oracle@ip-172-31-86-22 ~]$ srvctl stop database -database NY
[oracle@ip-172-31-86-22 ~]$ srvctl remove database -d NY -y
[oracle@ip-172-31-86-22 ~]$ srvctl add database -database NY -role PRIMARY -stopoption IMMEDIATE -instance ORCL -oraclehome /u01/app/oracle/product/19.3.0/dbhome_1 -spfile +DATA01/NY/PARAMETERFILE/spfile.263.1018152951 -diskgroup DATA01,RECO01
[oracle@ip-172-31-86-22 ~]$ logout
[ec2-user@ip-172-31-86-22 ~]$ sudo reboot

log back in:

[oracle@ip-172-31-86-22 ~]$ ps -ef| grep smon
grid      2214     1  0 14:06 ?        00:00:00 asm_smon_+ASM
oracle    2239  2115  0 14:06 pts/0    00:00:00 grep --color=auto smon

srvctl enable database

You could try:

[oracle@ip-172-31-86-22 ~]$ srvctl enable database -d NY
PRCC-1010 : NY was already enabled
PRCR-1002 : Resource ora.ny.db is already enabled

Reboot, log back in, and check:

[oracle@ip-172-31-86-22 ~]$ logout 
[ec2-user@ip-172-31-86-22 ~]$ sudo reboot
[oracle@ip-172-31-86-22 ~]$ ps -ef| grep smon
grid      2112     1  0 15:10 ?        00:00:00 asm_smon_+ASM
oracle    2807  2782  0 15:16 pts/0    00:00:00 grep --color=auto smon

No luck.

srvctl enable instance command is not available

You could try:

[grid@ip-172-31-86-22 ~]$ srvctl enable instance -db NY -instance NY

but in Restart, this message will appear:

PRKO-2012 : instance object is not supported in Oracle Restart

Workarounds

I am aware of two workarounds:

1. srvctl start database

Issuing srvctl start database has the side effect of enabling automatic restart.

[oracle@ip-172-31-86-22 dbs]$ srvctl start database -database NY
PRCC-1014 : NY was already running
PRCR-1004 : Resource ora.ny.db is already running
PRCR-1079 : Failed to start resource ora.ny.db

From now on, the Oracle database instance will start automatically.

2. crsctl modify resource

Issuing this command as the grid owner

[grid@ip-172-31-86-22 ~]$ crsctl modify resource ora.ny.db -attr AUTO_START=always -unsupported

also resolves the issue.
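
You can confirm that the attribute took effect with “crsctl stat res ora.ny.db -p”, which prints the resource profile as ATTRIBUTE=VALUE lines. The sketch below parses a sample line of that format; the sample is an assumption of the typical output, not captured from this demo.

```shell
# On a real host, as grid: crsctl stat res ora.ny.db -p | grep '^AUTO_START'
# Sample profile line, parsed here:
sample='AUTO_START=always'
val=${sample#AUTO_START=}
echo "AUTO_START is $val"
```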

reboot

After you try one of these workarounds, reboot and notice that the Oracle database is running.

[oracle@ip-172-31-86-22 ~]$ ps -ef| grep smon
grid      2216     1  0 15:03 ?        00:00:00 asm_smon_+ASM
oracle    2332     1  0 15:03 ?        00:00:00 ora_smon_ORCL
oracle    2411  1781  0 15:03 pts/0    00:00:00 grep --color=auto smon

Conclusion

In Restart, “srvctl enable database” does not lead to automatic instance restart upon reboot as you might expect, but you can work around by issuing “srvctl start database” or “crsctl modify resource”.

DGMGRL not required in listener.ora in Restart

By Brian Fitzgerald

Introduction

In Oracle Restart, _DGMGRL services are no longer required in listener.ora. Switchover output has changed slightly and _DGMGRL connections no longer appear in the listener log.

Background

While testing switchover in Data Guard in 19c, I noticed that after configuring Restart, connections to the _DGMGRL services no longer appeared in the listener log. I deleted the _DGMGRL services, reloaded the listeners, and retested the switchover without issue.

Static listeners (initial)

Initially, in the grid account, in $ORACLE_HOME/network/admin/listener.ora, these SID_LIST_LISTENER entries were in place. At the primary:

SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (GLOBAL_DBNAME = NY_DGMGRL)
      (ORACLE_HOME = /u01/app/oracle/product/19.3.0/dbhome_1)
      (SID_NAME = ORCL)
    )
  )

At the far sync:

SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (GLOBAL_DBNAME = FS_DGMGRL)
      (ORACLE_HOME = /u01/app/oracle/product/19.3.0/dbhome_1)
      (SID_NAME = ORCL)
    )
  )

At the standby:

SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (GLOBAL_DBNAME = SF_DGMGRL)
      (ORACLE_HOME = /u01/app/oracle/product/19.3.0/dbhome_1)
      (SID_NAME = ORCL)
    )
  )

I reloaded the listeners at each host:

[grid@ip-172-31-86-22 ~]$ lsnrctl reload

LSNRCTL for Linux: Version 19.0.0.0.0 - Production on 02-SEP-2019 12:28:09

Copyright (c) 1991, 2019, Oracle. All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=ip-172-31-86-22.ec2.internal)(PORT=1521)))
The command completed successfully

Switchover test before setting up Restart

Before setting up Restart, I ran the following switchover test:

DGMGRL> switchover to SF
Performing switchover NOW, please wait...
Operation requires a connection to database "SF"
Connecting ...
Connected to "SF"
Connected as SYSDBA.
New primary database "SF" is opening...
Operation requires start up of instance "ORCL" on database "NY"
Starting instance "ORCL"...
Connected to an idle instance.
ORACLE instance started.
Connected to "NY"
Database mounted.
Connected to "NY"
Switchover succeeded, new primary is "sf"

Notice the message ‘Operation requires start up of instance “ORCL” on database “NY”‘. In the NY listener log, several connections to service NY_DGMGRL appear. For example:

02-SEP-2019 22:46:10 * (CONNECT_DATA=(SERVICE_NAME=NY_DGMGRL)(INSTANCE_NAME=ORCL)(SERVER=DEDICATED)(CID=(PROGRAM=dgmgrl)(HOST=ip-172-31-86-22.ec2.internal)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=172.31.86.22)(PORT=51298)) * establish * NY_DGMGRL * 0

Run “show configuration”:

DGMGRL> show configuration

Configuration - ORCL_CONFIG

  Protection Mode: MaxPerformance
  Members:
  SF - Primary database
    FS - Far sync instance
      NY - Physical standby database

Fast-Start Failover:  Disabled

Configuration Status:
SUCCESS   (status updated 51 seconds ago)

Switch back to the original primary.

DGMGRL> switchover to NY
Performing switchover NOW, please wait...
New primary database "NY" is opening...
Operation requires start up of instance "ORCL" on database "SF"
Starting instance "ORCL"...
Connected to an idle instance.
ORACLE instance started.
Connected to "SF"
Database mounted.
Connected to "SF"
Switchover succeeded, new primary is "ny"

Notice the message ‘Operation requires start up of instance “ORCL” on database “SF”‘ . In the SF listener log, several connections to service SF_DGMGRL appear. For example:

02-SEP-2019 22:43:08 * (CONNECT_DATA=(SERVICE_NAME=SF_DGMGRL)(INSTANCE_NAME=ORCL)(SERVER=DEDICATED)(CID=(PROGRAM=dgmgrl)(HOST=ip-172-31-86-22.ec2.internal)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=172.31.86.22)(PORT=50340)) * establish * SF_DGMGRL * 0

Notice that no connections to the FS_DGMGRL service appeared at the far sync, even with the static entry in place. Service FS_DGMGRL plays no role at the far sync.

Configure Restart

Register your Restart instances.

At the primary:

[oracle@ip-172-31-86-22 ~]$ srvctl add database -database NY -role PRIMARY -stopoption IMMEDIATE -instance ORCL -oraclehome /u01/app/oracle/product/19.3.0/dbhome_1 -spfile +DATA01/ORCL/PARAMETERFILE/spfile.266.1017440879 -diskgroup DATA01,RECO01

At the far sync:

[oracle@ip-172-31-28-23 ~]$ srvctl add database -database FS -role physical_standby -startoption MOUNT -stopoption ABORT -instance ORCL -oraclehome /u01/app/oracle/product/19.3.0/dbhome_1 -spfile /u01/app/oracle/product/19.3.0/dbhome_1/dbs/spfileORCL.ora -diskgroup DATA01,RECO01

To activate the change, I found it necessary to issue srvctl start. That does not seem right; “srvctl enable database” should suffice.

Also, “srvctl enable instance” is not available in Restart; it works only in RAC.

Unless you activate the instance, it will not start automatically upon host reboot.

[oracle@ip-172-31-28-23 ~]$ srvctl start database -database FS

At the standby:

[oracle@ip-172-32-10-34 ~]$ srvctl add database -database SF -role physical_standby -startoption MOUNT -stopoption ABORT -instance ORCL -oraclehome /u01/app/oracle/product/19.3.0/dbhome_1 -spfile /u01/app/oracle/product/19.3.0/dbhome_1/dbs/spfileORCL.ora -diskgroup DATA01,RECO01
[oracle@ip-172-32-10-34 ~]$ srvctl start database -database SF

Reboot all hosts and check that the database instances start automatically, in the proper startup mode and Data Guard role.
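Before rebooting, srvctl can report back what was registered. A sketch (the flags mirror those used above; the output varies by environment):

```shell
# On each host, show the stored role, start option, and stop option
srvctl config database -database FS

# Then confirm the instance is running in the expected mode
srvctl status database -database FS
```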

Switchover test

Test switchover to SF. The output changes slightly, and becomes:

DGMGRL> switchover to SF
Performing switchover NOW, please wait...
Operation requires a connection to database "SF"
Connecting ...
Connected to "SF"
Connected as SYSDBA.
New primary database "SF" is opening...
Oracle Clusterware is restarting database "NY" ...
Connected to an idle instance.
Connected to an idle instance.
Connected to an idle instance.
Connected to an idle instance.
Connected to "NY"
Connected to "NY"
Switchover succeeded, new primary is "sf"

After configuring Restart, the Data Guard switchover output has changed slightly. Message ‘Operation requires start up of instance “ORCL” on database “NY”‘ has been replaced with ‘Oracle Clusterware is restarting database “NY” …’. A review of the NY listener log shows no connection to service NY_DGMGRL.

Test switchover to NY. The output is now:

DGMGRL> switchover to NY
Performing switchover NOW, please wait...
New primary database "NY" is opening...
Oracle Clusterware is restarting database "SF" ...
Connected to an idle instance.
Connected to an idle instance.
Connected to "SF"
Connected to "SF"
Switchover succeeded, new primary is "ny"

A review of the SF listener log shows no connection to service SF_DGMGRL.

Static listeners (final)

In Restart, therefore, the “_DGMGRL” listener.ora entry is not needed. SID_LIST_LISTENER can be simplified on the primary, far sync, and standby as:

SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (ORACLE_HOME = /u01/app/oracle/product/19.3.0/dbhome_1)
      (SID_NAME = ORCL)
    )
  )

After configuring Oracle Restart, switchovers were retested with the revised listener.ora without issue.

Error in non-Restart and no DGMGRL

Suppose we disable Restart at NY.

[oracle@ip-172-31-86-22 ~]$ srvctl stop database -database NY
[oracle@ip-172-31-86-22 ~]$ srvctl remove database -database NY -y
[oracle@ip-172-31-86-22 dbs]$ cat > initORCL.ora
spfile='+DATA01/NY/PARAMETERFILE/spfile.263.1018152951'
[oracle@ip-172-31-86-22 dbs]$ sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Thu Sep 12 11:07:20 2019
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup
ORACLE instance started.

Total System Global Area 1140849904 bytes
Fixed Size                  8895728 bytes
Variable Size             318767104 bytes
Database Buffers          805306368 bytes
Redo Buffers                7880704 bytes
Database mounted.
Database opened.

DGMGRL> show configuration

Configuration - ORCL_CONFIG

  Protection Mode: MaxAvailability
  Members:
  NY - Primary database
    SF - Physical standby database

Fast-Start Failover:  Disabled

Configuration Status:
SUCCESS   (status updated 11 seconds ago)

(Note: In this example, there is no far sync.) Now try a switchover:

DGMGRL> switchover to SF
Performing switchover NOW, please wait...
Operation requires a connection to database "SF"
Connecting ...
Connected to "SF"
Connected as SYSDBA.
New primary database "SF" is opening...
Operation requires start up of instance "ORCL" on database "NY"
Starting instance "ORCL"...
Unable to connect to database using (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=ip-172-31-86-22.ec2.internal)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=NY_DGMGRL)(INSTANCE_NAME=ORCL)(SERVER=DEDICATED)))
ORA-12514: TNS:listener does not currently know of service requested in connect descriptor

Failed.

Please complete the following steps to finish switchover:
        start up and mount instance "ORCL" of database "NY"

The new primary, SF, opens just fine. The old primary, NY, was shut down, but now there is no way for the broker to restart it: the listener is not listening on behalf of the Oracle database.

[grid@ip-172-31-86-22 ~]$ lsnrctl status

LSNRCTL for Linux: Version 19.0.0.0.0 - Production on 12-SEP-2019 11:26:00

Copyright (c) 1991, 2019, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=ip-172-31-86-22.ec2.internal)(PORT=1521)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 19.0.0.0.0 - Production
Start Date                12-SEP-2019 10:15:24
Uptime                    0 days 1 hr. 10 min. 36 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/19.3.0/grid/network/admin/listener.ora
Listener Log File         /u01/app/grid/diag/tnslsnr/ip-172-31-86-22/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=ip-172-31-86-22.ec2.internal)(PORT=1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1521)))
Services Summary...
Service "+ASM" has 1 instance(s).
  Instance "+ASM", status READY, has 1 handler(s) for this service...
Service "+ASM_DATA01" has 1 instance(s).
  Instance "+ASM", status READY, has 1 handler(s) for this service...
Service "+ASM_RECO01" has 1 instance(s).
  Instance "+ASM", status READY, has 1 handler(s) for this service...
The command completed successfully

This error, referring to the missing service NY_DGMGRL, appears in the listener log:

12-SEP-2019 11:22:09 * (CONNECT_DATA=(SERVICE_NAME=NY_DGMGRL)(INSTANCE_NAME=ORCL)(SERVER=DEDICATED)(CID=(PROGRAM=dgmgrl)(HOST=ip-172-31-86-22.ec2.internal)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=172.31.86.22)(PORT=53930)) * establish * NY_DGMGRL * 12514
TNS-12514: TNS:listener does not currently know of service requested in connect descriptor

The conclusion is that without Restart, we need the static “_DGMGRL” service entry in SID_LIST_LISTENER.
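When diagnosing this, it helps to scan the listener log for broker connection attempts and their return codes. A minimal sketch (the line format is taken from the log entries quoted in this article; the parser itself is an illustration, not an Oracle tool):

```python
import re

def broker_attempts(log_lines):
    """Extract (service_name, return_code) pairs from listener log
    'establish' entries like the ones quoted above."""
    pattern = re.compile(r"\* establish \* (\S+) \* (\d+)")
    return [(m.group(1), int(m.group(2)))
            for m in (pattern.search(line) for line in log_lines) if m]

# A log line shaped like the entries shown earlier in this article
log = ["12-SEP-2019 11:22:09 * (CONNECT_DATA=...) * (ADDRESS=...) "
       "* establish * NY_DGMGRL * 12514"]
print(broker_attempts(log))  # [('NY_DGMGRL', 12514)] -- 12514: service unknown to listener
```

A return code of 0 indicates a successful registration lookup; 12514 is the TNS error seen above.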

Validate static listeners

Validating the static listeners is a critical Data Guard setup step. After duplicating the database and setting up Restart, check each static listener:

  • Connect over the network with sqlplus
  • Shutdown (shutdown abort for standbys and far syncs)
  • Startup (startup mount for standbys and far syncs)

Example:

[oracle@ip-172-31-28-23 ops]$ sqlplus sys/zystm.22@FS as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Fri Sep 13 11:36:37 2019
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.


Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0

SQL> shutdown abort
ORACLE instance shut down.
SQL> startup mount
ORACLE instance started.

Total System Global Area 1140849904 bytes
Fixed Size                  8895728 bytes
Variable Size             301989888 bytes
Database Buffers          822083584 bytes
Redo Buffers                7880704 bytes
Database mounted.

This is an example of a failed static listener check.

[oracle@ip-172-31-28-23 ops]$ sqlplus sys/zystm.22@FS as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Fri Sep 13 11:36:37 2019
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.


Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0

SQL> shutdown abort
ORACLE instance shut down.
SQL> startup mount
ORACLE instance started.

Total System Global Area 1140849904 bytes
Fixed Size                  8895728 bytes
Variable Size             301989888 bytes
Database Buffers          822083584 bytes
Redo Buffers                7880704 bytes
Database mounted.
SQL> shutdown abort
ORACLE instance shut down.
ERROR:
ORA-12505: TNS:listener does not currently know of SID given in connect
descriptor


SQL> startup mount
SP2-0640: Not connected

Resolve all failures before proceeding.

Conclusion

In Oracle Database 12.1, the Data Guard broker (DGMGRL) was modified so that with Restart or RAC, the static “_DGMGRL” service is no longer required. This is documented in Oracle Data Guard Broker and Static Service Registration (Doc ID 1387859.1).

On the other hand, if you are not using Restart, then you do need the “_DGMGRL” static service in your SID_LIST_LISTENER.

Restart is only available with Grid Infrastructure, whose main benefit is ASM. If you built your Oracle Database on operating system files, you have less need for Grid Infrastructure; and if you did not install Grid Infrastructure, Restart is not available. Without Restart, a Data Guard setup will need a DGMGRL service in SID_LIST_LISTENER.

During Data Guard setup, validate all static listeners.

Symmetric far sync Data Guard

Introduction

Presented here is a reliable, symmetric Oracle Data Guard network. By reliable is meant that zero data loss and continuous availability are guaranteed in case of loss of one far sync. By symmetric is meant that there is no preferred primary site.

Background

I used to drive a Navy nuclear submarine. On a weekly basis, we switched over our running machinery. For example, we switched over our turbine generator lubricating oil pumps. The engine room upper level watch would start the standby pump, verify proper response, and then place the originally running pump in standby. We shifted our pumps for multiple reasons. Switching pumps equalized wear. It also verified readiness and operability. Finally, by operating the controls on a regular basis, the crew maintained a higher level of proficiency.

Notice a few things about this arrangement. No pump is a “preferred” primary. The pump functioning in the standby role is in no way inferior in design, specifications, or readiness. Because the machinery is rotated on a weekly basis, there is no question that the standby is ready to take over the primary role immediately when necessary.

So it is with databases. You need a disaster recovery strategy, and the DR site needs to be in a known state of readiness at all times. The best way to accomplish that is to rotate DR sites on a weekly or quarterly basis. No site is a “preferred” primary. Because the system is exercised regularly, there is no question that a standby is in a state of readiness. Finally, database administrators maintain proficiency by regularly switching over the Data Guard systems.

A reliable, symmetric Data Guard arrangement is convenient to set up in the cloud, where you have complete flexibility over which region and availability zone host your databases and far syncs.

Environment

The environment is Amazon AWS EC2 (Elastic Computing Cloud) with Red Hat Linux 7.2. Here is an overview of the environment.

db unique name  region     availability zone  type      net I/O sync  latency from  latency μs
NY              us-east-1  us-east-1c         database  SYNC          SF            32000
NY_FSA          us-east-1  us-east-1d         far sync  ASYNC         NY            250
NY_FSB          us-east-1  us-east-1a         far sync  ASYNC         NY            500
SF              us-west-1  us-west-1b         database  SYNC          NY            31500
SF_FSA          us-west-1  us-west-1b         far sync  ASYNC         SF            115
SF_FSB          us-west-1  us-west-1c         far sync  ASYNC         SF            600

Notice that NY, NY_FSA, and NY_FSB are in the same region but in separate availability zones. Also, SF, SF_FSA, and SF_FSB are in the same region but SF_FSB is in an availability zone separate from SF and SF_FSA. SF and SF_FSA are in the same availability zone. The highest latency is across regions. The lowest latency is within one availability zone. Medium latency is in the same region but across availability zones.

Here is a network diagram.

[Network diagram: reliable, symmetric Data Guard configuration]

In Maximum Availability mode, the primary ships redo to one far sync, leaving the second far sync as an alternate. In case of a failure of the far sync, the alternate comes online, resyncs with the primary, and takes over the far sync role. The primary always ships redo to a far sync in the same region; for example, NY ships to NY_FSA or NY_FSB. Because of this arrangement, there is a total of four far syncs, two per region. Although four far sync hosts are required, they can be configured with less CPU, memory, and disk space than the primary database, and in a cloud those resources can be reconfigured quickly. Note that in the present symmetric arrangement, far sync direction is one-way, from the local database to the database in the remote region.

Setup

Assume an initial, enabled Maximum Availability configuration with only a primary and a standby database, and no far syncs.

Create and enable the far syncs

Create far syncs as explained in Data Guard 19c in AWS with far sync. Add the far syncs and enable them.

DGMGRL> add far_sync "NY_FSA" as connect identifier is 'NY_FSA';
far sync instance "NY_FSA" added
DGMGRL> enable far_sync "NY_FSA";
Enabled.
DGMGRL> add far_sync 'NY_FSB' as connect identifier is 'NY_FSB';
far sync instance "NY_FSB" added
DGMGRL> enable far_sync 'NY_FSB'
Enabled.
DGMGRL> add far_sync 'SF_FSA' as connect identifier is 'SF_FSA';
far sync instance "SF_FSA" added
DGMGRL> enable far_sync 'SF_FSA';
Enabled.
DGMGRL> add far_sync 'SF_FSB' as connect identifier is 'SF_FSB';
far sync instance "SF_FSB" added
DGMGRL> enable far_sync 'SF_FSB'
Enabled.

The configuration so far looks like this:

DGMGRL> show configuration

Configuration - ORCL_CONFIG

  Protection Mode: MaxAvailability
  Members:
  NY     - Primary database
    SF     - Physical standby database

  Members Not Receiving Redo:
  NY_FSA - Far sync instance
  SF_FSA - Far sync instance
  SF_FSB - Far sync instance
  NY_FSB - Far sync instance

Fast-Start Failover:  Disabled

Configuration Status:
SUCCESS   (status updated 57 seconds ago)

Edit the far sync routes

DGMGRL> edit far_sync NY_FSA set property RedoRoutes = '(NY : SF ASYNC)';
Property "redoroutes" updated
DGMGRL> edit far_sync NY_FSB set property RedoRoutes = '(NY : SF ASYNC)';
Property "redoroutes" updated
DGMGRL> edit far_sync SF_FSA set property RedoRoutes = '(SF : NY ASYNC)';
Property "redoroutes" updated
DGMGRL> edit far_sync SF_FSB set property RedoRoutes = '(SF : NY ASYNC)';
Property "redoroutes" updated

Edit the database routes

In each route, the parenthesized group “(NY_FSA SYNC, NY_FSB SYNC)” means that locally generated redo is shipped synchronously to one member of the group, with the other member standing by as an alternate.

DGMGRL> edit database NY set property RedoRoutes = '(LOCAL : (NY_FSA SYNC, NY_FSB SYNC))';
Property "redoroutes" updated
DGMGRL> edit database SF set property RedoRoutes = '(LOCAL : (SF_FSA SYNC, SF_FSB SYNC))';
Property "redoroutes" updated
DGMGRL> show configuration

Configuration - ORCL_CONFIG

  Protection Mode: MaxAvailability
  Members:
  NY     - Primary database
    NY_FSA - Far sync instance
      SF     - Physical standby database

  Members Not Receiving Redo:
  SF_FSA - Far sync instance
  SF_FSB - Far sync instance
  NY_FSB - Far sync instance (alternate of NY_FSA)

Fast-Start Failover:  Disabled

Configuration Status:
SUCCESS   (status updated 40 seconds ago)

Test far sync failover

In case of a failure of the far sync, Data Guard should quickly switch over to the alternate far sync. In this example, if there is a disruption of NY_FSA, Data Guard should fail over to NY_FSB. We want to observe the system response to a simulated failure at NY_FSA. There are two failure scenarios.

end-of-file on communication channel

In a TCP client, such as the primary database connection to the far sync, ORA-03113: end-of-file on communication channel will appear when the far sync is disrupted but the far sync host is still on the network. Possible causes could include far sync instance crash, restart, or host reboot. As long as the host is on the network at the link level, it is possible for the primary host operating system to quickly identify the error and return a failure to the primary database. In such cases, an error such as this one appears immediately in the alert log.

2019-09-10T13:44:33.987037-04:00
ORA-03113: end-of-file on communication channel

In such a case, recovery will be very quick. The primary will immediately connect to the alternate far sync. Synchronization can be complete in less than 10 seconds.

timeout error

A timeout will occur if there is a power loss or network disruption. A network disruption could occur as a result of a network configuration change. I can demonstrate a timeout by cutting off the TCP connectivity at the cloud level. To do that, I switch the NY_FSA host to a security group that does not allow traffic on the Oracle port (1521).

[Screenshot: the NY_FSA host assigned to a security group that blocks port 1521]

In this case, Oracle error detection and recovery is slow. After 30 seconds (the NetTimeout default), the primary connection to NY_FSA on LAD:2 (log archive destination 2) times out:

2019-09-10T18:22:28.214255-04:00
LGWR (PID:4969): ORA-16198: Received timed out error from KSR
LGWR (PID:4969): Attempting LAD:2 network reconnect (16198)
LGWR (PID:4969): LAD:2 network reconnect abandoned
2019-09-10T18:22:28.214836-04:00
Errors in file /u01/app/oracle/diag/rdbms/ny/ORCL/trace/ORCL_lgwr_4969.trc:
ORA-16198: Timeout incurred on internal channel during remote archival
LGWR (PID:4969): Error 16198 for LNO:1 to 'NY_FSA'
2019-09-10T18:22:28.223786-04:00
LGWR (PID:4969): LAD:2 is UNSYNCHRONIZED
LGWR (PID:4969): Failed to archive LNO:1 T-1.S-333, error=16198

After another 30 seconds, the primary gives up on NY_FSA and switches to NY_FSB (LAD:3).

2019-09-10T18:22:58.232029-04:00
LGWR (PID:4969): ORA-16198: Received timed out error from KSR
LGWR (PID:4969): Error 16198 disconnecting from LAD:2 standby host 'NY_FSA'
2019-09-10T18:22:58.232531-04:00
LGWR (PID:4969): LAD:3 is UNSYNCHRONIZED
2019-09-10T18:22:58.232638-04:00
LGWR (PID:4969): LAD:2 no longer supports SYNCHRONIZATION
LGWR (PID:4969): SRL selected to archive T-1.S-334
LGWR (PID:4969): SRL selected for T-1.S-334 for LAD:3
2019-09-10T18:22:58.449325-04:00
Thread 1 advanced to log sequence 334 (LGWR switch)
  Current log# 2 seq# 334 mem# 0: +RECO01/NY/ONLINELOG/group_2.484.1018151803
  Current log# 2 seq# 334 mem# 1: +DATA01/NY/ONLINELOG/group_2.270.1018151815
2019-09-10T18:22:58.519853-04:00
ARC0 (PID:5073): Archived Log entry 675 added for T-1.S-333 ID 0x5c2b52e5 LAD:1
2019-09-10T18:22:58.707980-04:00
ARC1 (PID:5079): SRL selected for T-1.S-333 for LAD:3

Synchronization begins, but four to five minutes elapse until the primary is resynced to the new far sync, NY_FSB.

2019-09-10T18:27:01.888256-04:00
LGWR (PID:4969): LAD:3 is SYNCHRONIZED

While NY_FSB is synchronizing, there are two things to notice. The primary status is:

  NY     - Primary database
    Warning: ORA-16629: database reports a different protection level from the protection mode

and the protection level:

SQL> select protection_level from v$database;

PROTECTION_LEVEL
--------------------
RESYNCHRONIZATION

The final state is:

DGMGRL> show configuration

Configuration - ORCL_CONFIG

  Protection Mode: MaxAvailability
  Members:
  NY     - Primary database
    NY_FSB - Far sync instance (alternate of NY_FSA)
      SF     - Physical standby database

  Members Not Receiving Redo:
  NY_FSA - Far sync instance
    Warning: ORA-16857: member disconnected from redo source for longer than specified threshold

  SF_FSA - Far sync instance
  SF_FSB - Far sync instance

Fast-Start Failover:  Disabled

Configuration Status:
WARNING   (status updated 59 seconds ago)

Adjusting NetTimeout

On a low-latency, generally reliable network, you should reduce the timeout. Example:

DGMGRL> edit database NY set property NetTimeout = 5;
Property "nettimeout" updated

This makes for a cleaner far sync switchover.
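Based on the timeline observed above, a rough model of the worst-case detection time before the primary switches to the alternate far sync is two NetTimeout intervals: one before the destination is marked unsynchronized, and one more before it is abandoned. This is a simplification of the observed alert-log behavior, not a documented Oracle formula:

```python
def worst_case_detect_seconds(net_timeout=30):
    """Rough worst case before LGWR abandons a hung SYNC destination and
    switches to the alternate far sync: one NetTimeout to mark the
    destination unsynchronized, plus a second before disconnecting
    (a simplified model of the alert-log timeline above)."""
    return 2 * net_timeout

print(worst_case_detect_seconds())   # 60 seconds with the default NetTimeout of 30
print(worst_case_detect_seconds(5))  # 10 seconds after setting NetTimeout = 5
```

Note that this covers only detection; as shown above, full resynchronization to the new far sync can take several minutes longer.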

Maximum Protection

You might think that you could implement protection mode Maximum Protection with multiple far syncs:

DGMGRL> edit configuration set protection mode as MaxProtection;

Oracle will not let you do that.

Error: ORA-16627: operation disallowed since no member would remain to support protection mode

If you want to implement Maximum Protection, you will need a direct synchronous route to one or more physical standby databases.

Conclusion

Several points were covered in this article.

  • Data Guard latency can be reduced by implementing far sync.
  • Data Guard can be made more reliable by placing multiple far syncs near the primary.
  • Benefits to setting up a symmetric arrangement include:
    • Standby readiness has been recently verified.
    • The standby capability is known to be identical to the primary.
    • Heightened DBA staff proficiency.
  • Setup implementation steps were covered.
  • Data Guard does not keep alternate far syncs in sync with the primary.
  • Data Guard is quick to recover from some error conditions.
  • Data Guard detection and recovery from network timeout can be slow.
  • Maximum Protection through a far sync is not supported.

Data Guard error in Maximum Availability

Introduction

Errors were noted while changing Data Guard protection mode to Maximum Availability. The root cause was mismatched standby redo log size.

Symptoms

We are in protection mode Maximum Performance and want to set Maximum Availability. Before we begin, we notice something is amiss but, for the sake of discussion, we are not sure what, and we proceed.

DGMGRL> show configuration

Configuration - ORCL_CONFIG

  Protection Mode: MaxPerformance
  Members:
  NY - Primary database
    FS - Far sync instance
      Warning: ORA-16809: multiple warnings detected for the member

      SF - Physical standby database
        Warning: ORA-16809: multiple warnings detected for the member

Fast-Start Failover:  Disabled

Configuration Status:
WARNING   (status updated 57 seconds ago)

We neglect the errors and set protection mode to Maximum Availability.

DGMGRL> edit configuration set protection mode as MaxAvailability;
Succeeded.
DGMGRL> show configuration

Configuration - ORCL_CONFIG

  Protection Mode: MaxAvailability
  Members:
  NY - Primary database
    FS - Far sync instance
      Warning: ORA-16855: transport lag has exceeded specified threshold

      SF - Physical standby database
        Warning: ORA-16809: multiple warnings detected for the member

Fast-Start Failover:  Disabled

Configuration Status:
WARNING   (status updated 59 seconds ago)

The warnings persist. We force a logfile switch at the primary:

[oracle@ip-172-31-86-22 ~]$ sqlplus sys/zystm.22@NY as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Thu Sep 5 21:16:54 2019
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.


Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0

SQL> alter system switch logfile;

System altered.

Now errors appear in the show configuration display.

DGMGRL> show configuration

Configuration - ORCL_CONFIG

  Protection Mode: MaxAvailability
  Members:
  NY - Primary database
    Error: ORA-16810: multiple errors or warnings detected for the member

    FS - Far sync instance
      Warning: ORA-16809: multiple warnings detected for the member

      SF - Physical standby database
        Warning: ORA-16809: multiple warnings detected for the member

Fast-Start Failover:  Disabled

Configuration Status:
ERROR   (status updated 51 seconds ago)

The system has errors. Logs are not being processed. We are in hot water. We investigate.

Let’s start at the standby, work backward through the far sync, and finish at the primary.

DGMGRL> show database verbose SF StatusReport
STATUS REPORT
       INSTANCE_NAME   SEVERITY ERROR_TEXT
                   *    WARNING ORA-16855: transport lag has exceeded specified threshold
                   *    WARNING ORA-16857: member disconnected from redo source for longer than specified threshold

DGMGRL> show far_sync verbose FS StatusReport
STATUS REPORT
       INSTANCE_NAME   SEVERITY ERROR_TEXT
                   *    WARNING ORA-16855: transport lag has exceeded specified threshold
                   *    WARNING ORA-16857: member disconnected from redo source for longer than specified threshold

DGMGRL> show database NY StatusReport
STATUS REPORT
       INSTANCE_NAME   SEVERITY ERROR_TEXT
                   *    WARNING ORA-16629: database reports a different protection level from the protection mode
                ORCL      ERROR ORA-16737: the redo transport service for member "FS" has an error

Message ‘the redo transport service for member “FS” has an error’ requires further drilldown:

DGMGRL> show database verbose NY LogXptStatus;
LOG TRANSPORT STATUS
PRIMARY_INSTANCE_NAME STANDBY_DATABASE_NAME     STATUS                ERROR
                ORCL                   FS      ERROR ORA-16086: Redo data cannot be written to the standby redo log

The underlying reason is found in the far sync RFS trace:

Trace file /u01/app/oracle/diag/rdbms/fs/ORCL/trace/ORCL_rfs_29164.trc
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0
Build label:    RDBMS_19.3.0.0.0DBRU_LINUX.X64_190417
ORACLE_HOME:    /u01/app/oracle/product/19.3.0/dbhome_1
System name:    Linux
Node name:      ip-172-31-28-23.ec2.internal
Release:        3.10.0-1062.el7.x86_64
Version:        #1 SMP Thu Jul 18 20:25:13 UTC 2019
Machine:        x86_64
VM name:        Xen Version: 4.2 (HVM)
Instance name: ORCL
Redo thread mounted by this instance: 1
Oracle process number: 56
Unix process pid: 29164, image: oracle@ip-172-31-28-23.ec2.internal


*** 2019-09-04T15:56:40.334571-04:00
*** SESSION ID:(66.28887) 2019-09-04T15:56:40.334605-04:00
*** CLIENT ID:() 2019-09-04T15:56:40.334616-04:00
*** SERVICE NAME:() 2019-09-04T15:56:40.334624-04:00
*** MODULE NAME:(oracle@ip-172-31-86-22.ec2.internal (TNS V1-V3)) 2019-09-04T15:56:40.334631-04:00
*** ACTION NAME:() 2019-09-04T15:56:40.334639-04:00
*** CLIENT DRIVER:() 2019-09-04T15:56:40.334646-04:00

krsv_proc_add: Request to add process to V$MANAGED_STANDBY [krsr.c:4229]
krsr_abrt: The primary database is operating in MAXIMUM PROTECTION
  or MAXIMUM AVAILABILITY mode, and the standby database
  does not contain any viable SRLs
Encountered error status 16086
krsv_proc_rem: Request to remove process from V$MANAGED_STANDBY [krsr.c:12657]

Analysis

Consider the message “the standby database does not contain any viable SRLs” (standby redo logs). By “viable” is meant standby redo logs of size greater than or equal to the primary online redo log. Check the primary online redo log:

SQL> select group#, bytes/1024/1024 mb from v$log;

    GROUP#         MB
---------- ----------
         1        200
         2        200
         3        200

All primary online redo logs are 200 MB. Now check the standby redo logs at each site.

Primary:

SQL> select group#, bytes/1024/1024 mb from v$standby_log;

    GROUP#         MB
---------- ----------
         4         50
         5         50
         6         50
         7         50
         8         50
         9        200

6 rows selected.

Already there is a problem. Not all the standby logs are the same size. Check the far sync:

SQL> select group#, bytes/1024/1024 mb from v$standby_log;

    GROUP#         MB
---------- ----------
         4         50
         5         50
         6         50
         7         50
         8         50

All the standby logs are 50 MB. They are too small: there are no viable standby redo logs. Check the standby:

SQL> select group#, bytes/1024/1024 mb from v$standby_log;

    GROUP#         MB
---------- ----------
         4         50
         5         50
         6         50
         7         50
         8         50

Again, there are no viable SRLs.
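The “viable SRL” rule can be expressed as a small check: every site needs standby redo logs at least as large as the primary online redo logs. A sketch using the sizes from the queries above (the helper is illustrative, not an Oracle API):

```python
def viable_srls(orl_sizes_mb, srl_sizes_mb):
    """Return the standby redo logs large enough to receive redo:
    viable means SRL size >= the largest primary online redo log."""
    largest_orl = max(orl_sizes_mb)
    return [size for size in srl_sizes_mb if size >= largest_orl]

primary_orls = [200, 200, 200]            # v$log at the primary, in MB
far_sync_srls = [50, 50, 50, 50, 50]      # v$standby_log at the far sync
primary_srls = [50, 50, 50, 50, 50, 200]  # v$standby_log at the primary

print(viable_srls(primary_orls, far_sync_srls))  # [] -- no viable SRLs
print(viable_srls(primary_orls, primary_srls))   # [200] -- only group 9 qualifies
```

The far sync and standby have no viable SRLs at all, and the primary has only one, which explains the ORA-16086 seen earlier.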

Possible root cause

The online redo log size is 200 MB, which happens to be the Database Configuration Assistant (DBCA) default size.

Due to an administrative error, the standby redo logs were sized at 50 MB when the database was duplicated. The standby redo logs should be 200 MB.

Corrective action

  1. Drop all standby redo logs that are not 200 MB.
  2. Create a set of 200 MB SRLs on each database (the usual recommendation is one more SRL group than the number of ORL groups, per thread).

Standby database

DGMGRL> edit database SF set state=apply-off;
Succeeded.
SQL> select group#, bytes/1024/1024 mb, status from v$standby_log;

    GROUP#         MB STATUS
---------- ---------- ----------
         4         50 UNASSIGNED
         5         50 UNASSIGNED
         6         50 UNASSIGNED
         7         50 UNASSIGNED
         8         50 UNASSIGNED

SQL> alter system set standby_file_management = manual;

System altered.

SQL> alter database drop logfile group 4;

Database altered.

SQL> alter database drop logfile group 5;

Database altered.

SQL> alter database drop logfile group 6;

Database altered.

SQL> alter database drop logfile group 7;

Database altered.

SQL> alter database drop logfile group 8;

Database altered.

SQL> alter database add standby logfile size 200m;

Database altered.

SQL> alter database add standby logfile size 200m;

Database altered.

SQL> alter database add standby logfile size 200m;

Database altered.

SQL> alter database add standby logfile size 200m;

Database altered.

SQL> alter system set standby_file_management = auto;

System altered.

SQL> select group#, bytes/1024/1024 mb, status from v$standby_log;

    GROUP#         MB STATUS
---------- ---------- ----------
         4        200 UNASSIGNED
         5        200 UNASSIGNED
         6        200 UNASSIGNED
         7        200 UNASSIGNED
DGMGRL> edit database SF set state=apply-on;
Succeeded.

Far sync

SQL> select group#, bytes/1024/1024 mb, status from v$standby_log;

    GROUP#         MB STATUS
---------- ---------- ----------
         4         50 UNASSIGNED
         5         50 UNASSIGNED
         6         50 UNASSIGNED
         7         50 UNASSIGNED
         8         50 UNASSIGNED

SQL> alter database drop logfile group 4;

Database altered.

SQL> alter database drop logfile group 5;

Database altered.

SQL> alter database drop logfile group 6;

Database altered.

SQL> alter database drop logfile group 7;

Database altered.

SQL> alter database drop logfile group 8;

Database altered.

SQL> alter database add standby logfile size 200m;

Database altered.

SQL> alter database add standby logfile size 200m;

Database altered.

SQL> alter database add standby logfile size 200m;

Database altered.

SQL> alter database add standby logfile size 200m;

Database altered.

SQL> select group#, bytes/1024/1024 mb, status from v$standby_log;

    GROUP#         MB STATUS
---------- ---------- ----------
         1        200 UNASSIGNED
         2        200 UNASSIGNED
         3        200 UNASSIGNED
         4        200 UNASSIGNED

Primary

SQL> select group#, bytes/1024/1024 mb, status from v$standby_log;

    GROUP#         MB STATUS
---------- ---------- ----------
         4         50 UNASSIGNED
         5         50 UNASSIGNED
         6         50 UNASSIGNED
         7         50 UNASSIGNED
         8         50 UNASSIGNED
         9        200 UNASSIGNED

6 rows selected.

SQL> alter database drop logfile group 4;

Database altered.

SQL> alter database drop logfile group 5;

Database altered.

SQL> alter database drop logfile group 6;

Database altered.

SQL> alter database drop logfile group 7;

Database altered.

SQL> alter database drop logfile group 8;

Database altered.

SQL> alter database add standby logfile size 200m;

Database altered.

SQL> alter database add standby logfile size 200m;

Database altered.

SQL> alter database add standby logfile size 200m;

Database altered.

SQL> select group#, bytes/1024/1024 mb, status from v$standby_log;

    GROUP#         MB STATUS
---------- ---------- ----------
         4        200 UNASSIGNED
         5        200 UNASSIGNED
         6        200 UNASSIGNED
         9        200 UNASSIGNED

Double check all systems

Check NY, FS, and SF.

SQL> select count(*)numlog, bytes/1024/1024 mb from v$standby_log group by bytes;

    NUMLOG         MB
---------- ----------
         4        200

Data Guard

After a few seconds, note that the system state is normal. The protection mode is now Maximum Availability.

DGMGRL> show configuration

Configuration - ORCL_CONFIG

  Protection Mode: MaxAvailability
  Members:
  NY - Primary database
    FS - Far sync instance
      SF - Physical standby database

Fast-Start Failover:  Disabled

Configuration Status:
SUCCESS   (status updated 51 seconds ago)

Conclusion

Be aware of these lessons or practices:

  • Before duplicating the primary database, note the online redo log (ORL) size. Check that all ORLs are the same size. Check that the standby redo logs (SRLs) are all the same size, and are the same size as the ORLs.
  • Recognize that a normal Data Guard in steady state should be free of warnings.
  • Before attempting to upgrade the protection level, resolve all persistent warnings.
  • Drill into errors as follows:
    • show configuration
    • show objecttype objectname StatusReport
    • show objecttype objectname property
      where property is a monitor property, LogXptStatus in this case

 

Data Guard 19c in AWS with far sync

By Brian Fitzgerald

Introduction

This article describes an Oracle Data Guard 19c setup with far sync. Far sync can improve commit response time in a Maximum Availability Data Guard network. The Data Guard configuration runs on EC2 across two AWS regions. Database storage is ASM. Far sync creation is done using RMAN, and the physical standby and far sync are implemented in a single configuration step. Some observations on network latency and switchover timing are shown.

License

Data Guard is a feature of Oracle Database Enterprise Edition itself and does not require separate licensing. An Active Data Guard license is required for far sync.

By using Amazon Elastic Compute Cloud (EC2), you can control your license costs by configuring only the CPUs that you need.

Environment overview

A system overview is described in this table:

Description Value
Cloud AWS
Image ID ami-2051294a
Red Hat version 7.2
EC2 InstanceType m3.medium
Memory 3.75 GB
CPU 1
Swap 2 GB
Grid software owner grid
Grid Infrastructure Version 19.3.0
Database Storage ASM
Oracle software owner oracle
Oracle Database Version 19.3.0
Oracle Instance Type Restart

The AWS instance type was initially m3.large (8 GB, 2 CPUs), and was downsized after the grid and oracle home installations were complete. For additional information on the grid install, please refer to grid 19c install with ASM filter driver. For the network description, please refer to Data Guard network in AWS. The breakdown by region, availability zone, host, and role is:

description primary far sync standby
Region N. Virginia N. Virginia N. California
availability zone us-east-1c us-east-1d us-west-1b
ip address 172.31.86.22 172.31.28.23 172.32.10.34
hostname -s ip-172-31-86-22 ip-172-31-28-23 ip-172-32-10-34
db_unique_name NY FS SF

Network latency

We can measure the network latency from the primary to the far sync and to the standby. Start qperf server on the far sync:

[ec2-user@ip-172-31-28-23 ~]$ qperf

Start qperf server on the standby

[ec2-user@ip-172-32-10-34 ~]$ qperf

Measure bandwidth and latency between two N. Virginia availability zones:

[ec2-user@ip-172-31-86-22 ~]$ qperf 172.31.28.23 tcp_bw tcp_lat
tcp_bw:
    bw  =  92.6 MB/sec
tcp_lat:
    latency  =  254 us

Measure bandwidth and latency between regions N. Virginia and N. California:

[ec2-user@ip-172-31-86-22 ~]$ qperf 172.32.10.34 tcp_bw tcp_lat
tcp_bw:
    bw  =  18.4 MB/sec
tcp_lat:
    latency  =  33.3 ms

Network bandwidth is approximately 5x higher, and latency more than 100x lower, between in-region availability zones than between regions. This asymmetry motivates the far sync: far sync can improve performance in cases where the network latency to the standby is higher than the latency to the far sync.
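
For a rough comparison, the two qperf measurements can be put side by side (a sketch using the figures above; your numbers will vary):

```shell
# Compare the measured latencies from the qperf runs above.
lat_fs_us=254      # primary -> far sync, in-region (254 us)
lat_sf_us=33300    # primary -> standby, cross-region (33.3 ms)
ratio=$(( lat_sf_us / lat_fs_us ))
echo "cross-region latency is about ${ratio}x in-region latency"
```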

Database configuration

The initial primary instance configuration was:

*.archive_lag_target=900
*.audit_file_dest='/u01/app/oracle/admin/NY/adump'
*.audit_trail='db'
*.compatible='19.3.0'
*.control_files='+RECO01/NY/CONTROLFILE/current.486.1018151799','+DATA01/NY/CONTROLFILE/current.273.1018151799'
*.db_block_size=8192
*.db_create_file_dest='+DATA01'
*.db_create_online_log_dest_1='+RECO01'
*.db_create_online_log_dest_2='+DATA01'
*.db_name='ORCL'
*.db_recovery_file_dest='+RECO01'
*.db_recovery_file_dest_size=4000m
*.db_unique_name='NY'
*.dg_broker_config_file1='+DATA01/NY/dr1orcl.dat'
*.dg_broker_config_file2='+RECO01/NY/dr2orcl.dat'
*.diagnostic_dest='/u01/app/oracle'
*.dispatchers='(PROTOCOL=TCP) (SERVICE=ORCLXDB)'
*.local_listener='LISTENER_NY'
*.log_archive_format='%t_%s_%r.dbf'
*.nls_language='AMERICAN'
*.nls_territory='AMERICA'
*.open_cursors=300
*.pga_aggregate_target=360m
*.processes=300
*.remote_login_passwordfile='EXCLUSIVE'
*.sga_target=1080m
*.standby_file_management='AUTO'
*.undo_tablespace='UNDOTBS1'

If not already done, make these changes:

SQL> alter system set standby_file_management=AUTO;

System altered.

SQL> alter system set dg_broker_config_file1='+DATA01/NY/dr1orcl.dat';

System altered.

SQL> alter system set dg_broker_config_file2='+RECO01/NY/dr2orcl.dat';

System altered.

Optional settings

  • compatible. Must be set to the same value on the primary, on the far sync, and on the standby.
  • db_create_online_log_dest_n. Points logfile members to specific disk groups. Simplifies alter database add standby logfile syntax.
  • archive_lag_target. Time-boxes each archive log. Adjust to manage control file contention.
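
For example, with archive_lag_target=900 as set above, a log switch is forced at least every 900 seconds, which bounds the minimum archive rate (a quick arithmetic sketch):

```shell
# archive_lag_target caps the age of the current log: a switch is forced
# at least every lag_target seconds, so each thread archives at least
# 3600/lag_target logs per hour.
lag_target=900
min_per_hour=$(( 3600 / lag_target ))
echo "at least ${min_per_hour} archived logs per hour per thread"
```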

Alter database

Make sure these alter database alterations are done:

SQL> alter database force logging;

Database altered.

Make these changes with the database mounted:

SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup mount
ORACLE instance started.

Total System Global Area 1140849904 bytes
Fixed Size                  8895728 bytes
Variable Size             301989888 bytes
Database Buffers          822083584 bytes
Redo Buffers                7880704 bytes
Database mounted.
SQL> alter database archivelog;

Database altered.

SQL> alter database flashback on;

Database altered.

SQL> alter database open;

Database altered.

Standby logs

Identify the online redo log size. Check that all online redo logs are the same size. Check that this query returns exactly one row:

SQL> select count(*)numlogs, bytes/1024/1024 mb from v$log group by bytes;

   NUMLOGS         MB
---------- ----------
         3        200

If all online logs are not the same size, correct that condition before proceeding.

If standby redo logs exist, check that they are all the same size, and the same size as the online redo logs. Check that this query returns exactly one row:

SQL> select count(*)numlogs, bytes/1024/1024 mb from v$standby_log group by bytes;

   NUMLOGS         MB
---------- ----------
         4        200

Correct discrepant conditions before proceeding.
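
The "exactly one row" condition is simply a uniformity check on the log sizes. The same logic can be sketched in shell against sample values (the sizes here are illustrative; take the real values from v$log and v$standby_log):

```shell
# Check that all log sizes (in MB) are identical.
sizes="200 200 200"                            # sample values; one per log group
# Unquoted expansion splits the list into one size per line.
distinct=$(printf '%s\n' $sizes | sort -u | wc -l)
if [ "$distinct" -eq 1 ]; then
  echo "uniform"
else
  echo "mismatch: resize before proceeding"
fi
```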

Create standby logs

Create standby logs, if needed. You can set:

SQL> alter system set db_create_online_log_dest_1 = '+DATA01';

System altered.

SQL> alter system set db_create_online_log_dest_2 = '+RECO01';

System altered.

And then run, for example:

SQL> alter database add standby logfile size 200m;

as many times as needed to get the desired number of standby logs. The optimal number of standby logs is usually greater than the number of online logs. If a high apply backlog is expected, then increase this number further.
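
Since the statement is identical each time, the repetitions can be generated and piped into a SQL*Plus session. A sketch, assuming three online log groups in a single thread (a common guideline is one more SRL group than online groups per thread):

```shell
# Generate one ADD STANDBY LOGFILE statement per desired group.
# online_groups is an assumption here; query v$log for the real count.
online_groups=3
srl_count=$(( online_groups + 1 ))
for i in $(seq 1 "$srl_count"); do
  echo "alter database add standby logfile size 200m;"
done
```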

Static listeners

On all hosts, primary, far sync, and standby, in the grid account, in $ORACLE_HOME/network/admin, edit listener.ora.

Primary:

SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (ORACLE_HOME = /u01/app/oracle/product/19.3.0/dbhome_1)
      (SID_NAME = NY)
    )
  )

Far sync:

SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (ORACLE_HOME = /u01/app/oracle/product/19.3.0/dbhome_1)
      (SID_NAME = FS)
    )
  )

Standby:

SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (ORACLE_HOME = /u01/app/oracle/product/19.3.0/dbhome_1)
      (SID_NAME = SF)
    )
  )

Reload the listener. For example:

[grid@ip-172-31-86-22 ~]$ lsnrctl reload

LSNRCTL for Linux: Version 19.0.0.0.0 - Production on 02-SEP-2019 12:28:09

Copyright (c) 1991, 2019, Oracle. All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=ip-172-31-86-22.ec2.internal)(PORT=1521)))
The command completed successfully

Startup the far sync

On the far sync, configure and start up nomount.

Far sync audit directory

At the far sync:

[oracle@ip-172-31-28-23 ~]$ mkdir -p /u01/app/oracle/admin/FS/adump

Far sync oratab and environment

[oracle@ip-172-31-28-23 ~]$ echo FS:/u01/app/oracle/product/19.3.0/dbhome_1:N >> /etc/oratab
[oracle@ip-172-31-28-23 ~]$ . oraenv
ORACLE_SID = [FS] ? FS
The Oracle base remains unchanged with value /u01/app/oracle

Far sync orapwd

Create orapwFS on the far sync:

[oracle@ip-172-31-28-23 ~]$ alias oh
alias oh='cd $ORACLE_HOME'
[oracle@ip-172-31-28-23 ~]$ oh
[oracle@ip-172-31-28-23 dbhome_1]$ cd dbs
[oracle@ip-172-31-28-23 dbs]$ orapwd file=orapwFS entries=10 password=zystm.22

Far sync startup

Create a temporary, minimal pfile on the far sync. Set the compatible setting to match the primary.

[oracle@ip-172-31-28-23 ops]$ cat > /tmp/initFS.ora
db_name = ORCL
compatible = '19.3.0'
^D
[oracle@ip-172-31-28-23 dbs]$ sqlplus / as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Sat Aug 31 22:01:58 2019
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup nomount pfile='/tmp/initFS.ora'
ORACLE instance started.

Total System Global Area  243268216 bytes
Fixed Size                  8895096 bytes
Variable Size             180355072 bytes
Database Buffers           50331648 bytes
Redo Buffers                3686400 bytes
SQL> Disconnected from Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0

Startup the standby

On the standby, configure and start up nomount.

Standby audit directory

At the standby:

[oracle@ip-172-32-10-34 ~]$ mkdir -p /u01/app/oracle/admin/SF/adump

Standby oratab and environment

[oracle@ip-172-32-10-34 ~]$ echo SF:/u01/app/oracle/product/19.3.0/dbhome_1:N >> /etc/oratab
[oracle@ip-172-32-10-34 ~]$ . oraenv
ORACLE_SID = [SF] ? SF
The Oracle base remains unchanged with value /u01/app/oracle

Standby orapwd

Create orapwSF on the standby:

[oracle@ip-172-32-10-34 ~]$ oh
[oracle@ip-172-32-10-34 dbhome_1]$ cd dbs
[oracle@ip-172-32-10-34 dbs]$ orapwd file=orapwSF entries=10 password=zystm.22

Standby startup

Create a temporary, minimal pfile on the standby. Set the compatible setting to match the primary.

[oracle@ip-172-32-10-34 ops]$ cat > /tmp/initSF.ora
db_name = ORCL
compatible = '19.3.0'
^D
[oracle@ip-172-32-10-34 dbs]$ sqlplus / as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Sat Aug 31 22:01:58 2019
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup nomount pfile='/tmp/initSF.ora'
ORACLE instance started.

Total System Global Area  243268216 bytes
Fixed Size                  8895096 bytes
Variable Size             180355072 bytes
Database Buffers           50331648 bytes
Redo Buffers                3686400 bytes
SQL> Disconnected from Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0

tnsnames.ora

On all hosts, in the oracle account, in $ORACLE_HOME/network/admin/tnsnames.ora, add these entries:

NY =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = 172.31.86.22)(PORT = 1521))
    (CONNECT_DATA =
      (SID = NY)
    )
  )

FS =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = 172.31.28.23)(PORT = 1521))
    (CONNECT_DATA =
      (SID = FS)
    )
  )

SF =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = 172.32.10.34)(PORT = 1521))
    (CONNECT_DATA =
      (SID = SF)
    )
  )

Connectivity checklist

Check that you have configured all of these:

  • VPC enable DNS hostnames
  • DNS resolution across the peering connection
  • Route tables across the peering connection
  • Security groups
  • Static listener
  • orapwd
  • tnsnames.ora
  • ORACLE_HOME environment variable

Test connectivity

From the primary, test connectivity to all three instances:

sqlplus sys/zystm.22@NY as sysdba
sqlplus sys/zystm.22@FS as sysdba
sqlplus sys/zystm.22@SF as sysdba

From the far sync and the standby, repeat these checks.
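
With three TNS aliases and three hosts, that is nine checks in total. A small loop can print the commands to run on each host (a sketch; adjust the credentials before use):

```shell
# Print one connectivity test command per TNS alias (NY, FS, SF).
# Run the printed commands on the primary, far sync, and standby hosts.
for tns_alias in NY FS SF; do
  printf 'sqlplus sys/zystm.22@%s as sysdba\n' "$tns_alias"
done
```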

Duplicate primary to far sync

Now we are going to kick off an RMAN script. RMAN will transfer the spfile, the password file, and the control file.

[oracle@ip-172-31-86-22 ops]$ cat dup.db.farsync.rcv
DUPLICATE TARGET DATABASE
  FOR FARSYNC
  FROM ACTIVE DATABASE
  SPFILE
    SET db_unique_name='FS'
    SET dg_broker_config_file1='+DATA01/FS/dr1orcl.dat'
    SET dg_broker_config_file2='+RECO01/FS/dr2orcl.dat'
    SET audit_file_dest='/u01/app/oracle/admin/FS/adump'
  NOFILENAMECHECK;
[oracle@ip-172-31-86-22 ops]$ rman target sys/zystm.22@NY auxiliary sys/zystm.22@FS

Recovery Manager: Release 19.0.0.0.0 - Production on Thu Sep 5 04:53:02 2019
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

connected to target database: ORCL (DBID=1545932534)
connected to auxiliary database: ORCL (not mounted)

RMAN> @ dup.db.farsync.rcv

RMAN> DUPLICATE TARGET DATABASE
2>   FOR FARSYNC
3>   FROM ACTIVE DATABASE
4>   SPFILE
5>     SET db_unique_name='FS'
6>     SET dg_broker_config_file1='+DATA01/FS/dr1orcl.dat'
7>     SET dg_broker_config_file2='+RECO01/FS/dr2orcl.dat'
8>     SET audit_file_dest='/u01/app/oracle/admin/FS/adump'
9>   NOFILENAMECHECK;
Starting Duplicate Db at 05-SEP-19
using target database control file instead of recovery catalog
allocated channel: ORA_AUX_DISK_1
channel ORA_AUX_DISK_1: SID=39 device type=DISK

contents of Memory Script:
{
   backup as copy reuse
   passwordfile auxiliary format  '/u01/app/oracle/product/19.3.0/dbhome_1/dbs/orapwORCL'   ;
   restore clone from service  'NY' spfile to
 '/u01/app/oracle/product/19.3.0/dbhome_1/dbs/spfileORCL.ora';
   sql clone "alter system set spfile= ''/u01/app/oracle/product/19.3.0/dbhome_1/dbs/spfileORCL.ora''";
}
executing Memory Script

Starting backup at 05-SEP-19
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=88 device type=DISK
Finished backup at 05-SEP-19

Starting restore at 05-SEP-19
using channel ORA_AUX_DISK_1

channel ORA_AUX_DISK_1: starting datafile backup set restore
channel ORA_AUX_DISK_1: using network backup set from service NY
channel ORA_AUX_DISK_1: restoring SPFILE
output file name=/u01/app/oracle/product/19.3.0/dbhome_1/dbs/spfileORCL.ora
channel ORA_AUX_DISK_1: restore complete, elapsed time: 00:00:01
Finished restore at 05-SEP-19

sql statement: alter system set spfile= ''/u01/app/oracle/product/19.3.0/dbhome_1/dbs/spfileORCL.ora''

contents of Memory Script:
{
   sql clone "alter system set  db_unique_name =
 ''FS'' comment=
 '''' scope=spfile";
   sql clone "alter system set  dg_broker_config_file1 =
 ''+DATA01/FS/dr1orcl.dat'' comment=
 '''' scope=spfile";
   sql clone "alter system set  dg_broker_config_file2 =
 ''+RECO01/FS/dr2orcl.dat'' comment=
 '''' scope=spfile";
   sql clone "alter system set  audit_file_dest =
 ''/u01/app/oracle/admin/FS/adump'' comment=
 '''' scope=spfile";
   shutdown clone immediate;
   startup clone nomount;
}
executing Memory Script

sql statement: alter system set  db_unique_name =  ''FS'' comment= '''' scope=spfile

sql statement: alter system set  dg_broker_config_file1 =  ''+DATA01/FS/dr1orcl.dat'' comment= '''' scope=spfile

sql statement: alter system set  dg_broker_config_file2 =  ''+RECO01/FS/dr2orcl.dat'' comment= '''' scope=spfile

sql statement: alter system set  audit_file_dest =  ''/u01/app/oracle/admin/FS/adump'' comment= '''' scope=spfile

Oracle instance shut down

connected to auxiliary database (not started)
Oracle instance started

Total System Global Area    1140849904 bytes

Fixed Size                     8895728 bytes
Variable Size                301989888 bytes
Database Buffers             822083584 bytes
Redo Buffers                   7880704 bytes
duplicating Online logs to Oracle Managed File (OMF) location
duplicating Datafiles to Oracle Managed File (OMF) location

contents of Memory Script:
{
   sql clone "alter system set  control_files =
  ''+RECO01/FS/CONTROLFILE/current.257.1018155321'', ''+DATA01/FS/CONTROLFILE/current.258.1018155321'' comment=
 ''Set by RMAN'' scope=spfile";
   restore clone from service  'NY' farsync controlfile;
}
executing Memory Script

sql statement: alter system set  control_files =   ''+RECO01/FS/CONTROLFILE/current.257.1018155321'', ''+DATA01/FS/CONTROLFILE/current.258.1018155321'' comment= ''Set by RMAN'' scope=spfile

Starting restore at 05-SEP-19
allocated channel: ORA_AUX_DISK_1
channel ORA_AUX_DISK_1: SID=46 device type=DISK

channel ORA_AUX_DISK_1: starting datafile backup set restore
channel ORA_AUX_DISK_1: using network backup set from service NY
channel ORA_AUX_DISK_1: restoring control file
channel ORA_AUX_DISK_1: restore complete, elapsed time: 00:00:01
output file name=+RECO01/FS/CONTROLFILE/current.383.1018155325
output file name=+DATA01/FS/CONTROLFILE/current.265.1018155325
Finished restore at 05-SEP-19

contents of Memory Script:
{
   sql clone 'alter database mount';
}
executing Memory Script

sql statement: alter database mount

contents of Memory Script:
{
   sql 'alter system archive log current';
}
executing Memory Script

sql statement: alter system archive log current
Finished Duplicate Db at 05-SEP-19

RMAN> **end-of-file**

Note that there are no online logs, and that the standby logs match the primary's in count and size.

SQL> select count(*)numlogs, bytes/1024/1024 mb from v$log group by bytes;

no rows selected

SQL> select count(*)numlogs, bytes/1024/1024 mb from v$standby_log group by bytes;

   NUMLOGS         MB
---------- ----------
         4        200

Notice these points about the operation:

  • All file transfers were done with RMAN. No ssh connection is needed.
  • There is no ALTER DATABASE CREATE FAR SYNC INSTANCE CONTROLFILE step. RMAN handled the control file by backing it up at the primary and transferring it to the far sync.
  • Although the primary spfile is in ASM, the far sync spfile ends up on the file system in directory $ORACLE_HOME/dbs.
  • The orapwORCL that you created at the far sync got overwritten by a backup of the primary orapwORCL.

Duplicate primary to standby

Allocate an appropriate number of channels to reduce the time needed to duplicate the database.

[oracle@ip-172-31-86-22 ops]$ cat dup.db.standby.rcv
run {

allocate channel ch01 device type disk;
allocate channel ch02 device type disk;
allocate auxiliary channel aux01 device type disk;
allocate auxiliary channel aux02 device type disk;

DUPLICATE TARGET DATABASE
  FOR STANDBY
  FROM ACTIVE DATABASE
  DORECOVER
  SPFILE
    SET db_unique_name='SF'
    SET dg_broker_config_file1='+DATA01/SF/dr1orcl.dat'
    SET dg_broker_config_file2='+RECO01/SF/dr2orcl.dat'
    SET audit_file_dest='/u01/app/oracle/admin/SF/adump'
  NOFILENAMECHECK;

}
[oracle@ip-172-31-86-22 ops]$ vi dup.db.standby.rcv
[oracle@ip-172-31-86-22 ops]$ rman target sys/zystm.22@NY auxiliary sys/zystm.22@SF

Recovery Manager: Release 19.0.0.0.0 - Production on Thu Sep 5 05:04:52 2019
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

connected to target database: ORCL (DBID=1545932534)
connected to auxiliary database: ORCL (not mounted)

RMAN> @ dup.db.standby.rcv

RMAN> run {
2>
3> allocate channel ch01 device type disk;
4> allocate channel ch02 device type disk;
5> allocate auxiliary channel aux01 device type disk;
6> allocate auxiliary channel aux02 device type disk;
7>
8> DUPLICATE TARGET DATABASE
9>   FOR STANDBY
10>   FROM ACTIVE DATABASE
11>   DORECOVER
12>   SPFILE
13>     SET db_unique_name='SF'
14>     SET dg_broker_config_file1='+DATA01/SF/dr1orcl.dat'
15>     SET dg_broker_config_file2='+RECO01/SF/dr2orcl.dat'
16>     SET audit_file_dest='/u01/app/oracle/admin/SF/adump'
17>   NOFILENAMECHECK;
18>
19> }
using target database control file instead of recovery catalog
allocated channel: ch01
channel ch01: SID=84 device type=DISK

allocated channel: ch02
channel ch02: SID=85 device type=DISK

allocated channel: aux01
channel aux01: SID=40 device type=DISK

allocated channel: aux02
channel aux02: SID=41 device type=DISK

Starting Duplicate Db at 05-SEP-19
current log archived

contents of Memory Script:
{
   backup as copy reuse
   passwordfile auxiliary format  '/u01/app/oracle/product/19.3.0/dbhome_1/dbs/orapwORCL'   ;
   restore clone from service  'NY' spfile to
 '/u01/app/oracle/product/19.3.0/dbhome_1/dbs/spfileORCL.ora';
   sql clone "alter system set spfile= ''/u01/app/oracle/product/19.3.0/dbhome_1/dbs/spfileORCL.ora''";
}
executing Memory Script

Starting backup at 05-SEP-19
Finished backup at 05-SEP-19

Starting restore at 05-SEP-19

channel aux01: starting datafile backup set restore
channel aux01: using network backup set from service NY
channel aux01: restoring SPFILE
output file name=/u01/app/oracle/product/19.3.0/dbhome_1/dbs/spfileORCL.ora
channel aux01: restore complete, elapsed time: 00:00:02
Finished restore at 05-SEP-19

sql statement: alter system set spfile= ''/u01/app/oracle/product/19.3.0/dbhome_1/dbs/spfileORCL.ora''

contents of Memory Script:
{
   sql clone "alter system set  db_unique_name =
 ''SF'' comment=
 '''' scope=spfile";
   sql clone "alter system set  dg_broker_config_file1 =
 ''+DATA01/SF/dr1orcl.dat'' comment=
 '''' scope=spfile";
   sql clone "alter system set  dg_broker_config_file2 =
 ''+RECO01/SF/dr2orcl.dat'' comment=
 '''' scope=spfile";
   sql clone "alter system set  audit_file_dest =
 ''/u01/app/oracle/admin/SF/adump'' comment=
 '''' scope=spfile";
   shutdown clone immediate;
   startup clone nomount;
}
executing Memory Script

sql statement: alter system set  db_unique_name =  ''SF'' comment= '''' scope=spfile

sql statement: alter system set  dg_broker_config_file1 =  ''+DATA01/SF/dr1orcl.dat'' comment= '''' scope=spfile

sql statement: alter system set  dg_broker_config_file2 =  ''+RECO01/SF/dr2orcl.dat'' comment= '''' scope=spfile

sql statement: alter system set  audit_file_dest =  ''/u01/app/oracle/admin/SF/adump'' comment= '''' scope=spfile

Oracle instance shut down

connected to auxiliary database (not started)
Oracle instance started

Total System Global Area    1140849904 bytes

Fixed Size                     8895728 bytes
Variable Size                301989888 bytes
Database Buffers             822083584 bytes
Redo Buffers                   7880704 bytes
allocated channel: aux01
channel aux01: SID=44 device type=DISK
allocated channel: aux02
channel aux02: SID=45 device type=DISK
duplicating Online logs to Oracle Managed File (OMF) location
duplicating Datafiles to Oracle Managed File (OMF) location

contents of Memory Script:
{
   sql clone "alter system set  control_files =
  ''+RECO01/SF/CONTROLFILE/current.257.1018156101'', ''+DATA01/SF/CONTROLFILE/current.258.1018156101'' comment=
 ''Set by RMAN'' scope=spfile";
   restore clone from service  'NY' standby controlfile;
}
executing Memory Script

sql statement: alter system set  control_files =   ''+RECO01/SF/CONTROLFILE/current.257.1018156101'', ''+DATA01/SF/CONTROLFILE/current.258.1018156101'' comment= ''Set by RMAN'' scope=spfile

Starting restore at 05-SEP-19

channel aux01: starting datafile backup set restore
channel aux01: using network backup set from service NY
channel aux01: restoring control file
channel aux01: restore complete, elapsed time: 00:00:04
output file name=+RECO01/SF/CONTROLFILE/current.262.1018156111
output file name=+DATA01/SF/CONTROLFILE/current.267.1018156111
Finished restore at 05-SEP-19

contents of Memory Script:
{
   sql clone 'alter database mount standby database';
}
executing Memory Script

sql statement: alter database mount standby database

contents of Memory Script:
{
   set newname for clone tempfile  1 to new;
   switch clone tempfile all;
   set newname for clone datafile  1 to new;
   set newname for clone datafile  3 to new;
   set newname for clone datafile  4 to new;
   set newname for clone datafile  7 to new;
   restore
   from  nonsparse   from service
 'NY'   clone database
   ;
   sql 'alter system archive log current';
}
executing Memory Script

executing command: SET NEWNAME

renamed tempfile 1 to +DATA01 in control file

executing command: SET NEWNAME

executing command: SET NEWNAME

executing command: SET NEWNAME

executing command: SET NEWNAME

Starting restore at 05-SEP-19

channel aux01: starting datafile backup set restore
channel aux01: using network backup set from service NY
channel aux01: specifying datafile(s) to restore from backup set
channel aux01: restoring datafile 00001 to +DATA01
channel aux02: starting datafile backup set restore
channel aux02: using network backup set from service NY
channel aux02: specifying datafile(s) to restore from backup set
channel aux02: restoring datafile 00003 to +DATA01
channel aux02: restore complete, elapsed time: 00:00:38
channel aux02: starting datafile backup set restore
channel aux02: using network backup set from service NY
channel aux02: specifying datafile(s) to restore from backup set
channel aux02: restoring datafile 00004 to +DATA01
channel aux01: restore complete, elapsed time: 00:00:44
channel aux01: starting datafile backup set restore
channel aux01: using network backup set from service NY
channel aux01: specifying datafile(s) to restore from backup set
channel aux01: restoring datafile 00007 to +DATA01
channel aux01: restore complete, elapsed time: 00:00:05
channel aux02: restore complete, elapsed time: 00:00:12
Finished restore at 05-SEP-19

sql statement: alter system archive log current
current log archived

contents of Memory Script:
{
   restore clone force from service  'NY'
           archivelog from scn  2246774;
   switch clone datafile all;
}
executing Memory Script

Starting restore at 05-SEP-19

channel aux01: starting archived log restore to default destination
channel aux01: using network backup set from service NY
channel aux01: restoring archived log
archived log thread=1 sequence=9
channel aux02: starting archived log restore to default destination
channel aux02: using network backup set from service NY
channel aux02: restoring archived log
archived log thread=1 sequence=10
channel aux01: restore complete, elapsed time: 00:00:02
channel aux02: restore complete, elapsed time: 00:00:02
Finished restore at 05-SEP-19

datafile 1 switched to datafile copy
input datafile copy RECID=5 STAMP=1018156184 file name=+DATA01/SF/DATAFILE/system.266.1018156127
datafile 3 switched to datafile copy
input datafile copy RECID=6 STAMP=1018156184 file name=+DATA01/SF/DATAFILE/sysaux.265.1018156129
datafile 4 switched to datafile copy
input datafile copy RECID=7 STAMP=1018156184 file name=+DATA01/SF/DATAFILE/undotbs1.264.1018156167
datafile 7 switched to datafile copy
input datafile copy RECID=8 STAMP=1018156184 file name=+DATA01/SF/DATAFILE/users.269.1018156173

contents of Memory Script:
{
   set until scn  2247713;
   recover
   standby
   clone database
    delete archivelog
   ;
}
executing Memory Script

executing command: SET until clause

Starting recover at 05-SEP-19

starting media recovery

archived log for thread 1 with sequence 9 is already on disk as file +RECO01/SF/ARCHIVELOG/2019_09_05/thread_1_seq_9.261.1018156181
archived log for thread 1 with sequence 10 is already on disk as file +RECO01/SF/ARCHIVELOG/2019_09_05/thread_1_seq_10.259.1018156183
archived log file name=+RECO01/SF/ARCHIVELOG/2019_09_05/thread_1_seq_9.261.1018156181 thread=1 sequence=9
archived log file name=+RECO01/SF/ARCHIVELOG/2019_09_05/thread_1_seq_10.259.1018156183 thread=1 sequence=10
media recovery complete, elapsed time: 00:00:01
Finished recover at 05-SEP-19

contents of Memory Script:
{
   delete clone force archivelog all;
}
executing Memory Script

deleted archived log
archived log file name=+RECO01/SF/ARCHIVELOG/2019_09_05/thread_1_seq_9.261.1018156181 RECID=1 STAMP=1018156181
Deleted 1 objects

deleted archived log
archived log file name=+RECO01/SF/ARCHIVELOG/2019_09_05/thread_1_seq_10.259.1018156183 RECID=2 STAMP=1018156182
Deleted 1 objects

Finished Duplicate Db at 05-SEP-19
released channel: ch01
released channel: ch02
released channel: aux01
released channel: aux02

RMAN> **end-of-file**

RMAN>

Recovery Manager complete.

Notice that the online logs and the standby logs match the primary's:

SQL> select count(*)numlogs, bytes/1024/1024 mb from v$log group by bytes;

   NUMLOGS         MB
---------- ----------
         3        200

SQL> select count(*)numlogs, bytes/1024/1024 mb from v$standby_log group by bytes;

   NUMLOGS         MB
---------- ----------
         4        200

Configure restart

All new systems should be mounted at this stage. Register your Oracle Restart instances. (At the primary, this step might already be done.)

[oracle@ip-172-31-86-22 ~]$ srvctl add database -database NY -role PRIMARY -stopoption IMMEDIATE -instance ORCL -oraclehome /u01/app/oracle/product/19.3.0/dbhome_1 -spfile +DATA01/ORCL/PARAMETERFILE/spfile.266.1017440879 -diskgroup DATA01,RECO01

At the far sync:

[oracle@ip-172-31-28-23 ~]$ srvctl add database -database FS -role physical_standby -startoption MOUNT -stopoption ABORT -instance FS -oraclehome /u01/app/oracle/product/19.3.0/dbhome_1 -spfile /u01/app/oracle/product/19.3.0/dbhome_1/dbs/spfileFS.ora -diskgroup DATA01,RECO01

To activate the change, I found it necessary to issue srvctl start. That does not seem right: “srvctl enable database” should be enough, and in Restart, “srvctl enable instance” is not available. Unless you start the instance, it will not start automatically upon host reboot.

[oracle@ip-172-31-28-23 ~]$ srvctl start database -database FS

At the standby:

[oracle@ip-172-32-10-34 ~]$ srvctl add database -database SF -role physical_standby -startoption MOUNT -stopoption ABORT -instance SF -oraclehome /u01/app/oracle/product/19.3.0/dbhome_1 -spfile /u01/app/oracle/product/19.3.0/dbhome_1/dbs/spfileSF.ora -diskgroup DATA01,RECO01
[oracle@ip-172-32-10-34 ~]$ srvctl start database -database SF

Reboot all hosts and check that the database instances start automatically, in the proper startup mode.
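
The proper startup mode follows from the registrations above: NY starts open, while FS and SF were added with -startoption MOUNT. A small helper encoding that expectation (a sketch; the function name is hypothetical):

```shell
# Expected post-reboot state per instance, matching the srvctl registrations:
# NY is the primary (starts open); FS and SF were registered with
# -startoption MOUNT (start mounted).
expected_state() {
  case "$1" in
    NY) echo "open" ;;
    FS|SF) echo "mounted" ;;
    *) echo "unknown" ;;
  esac
}
for db in NY FS SF; do
  echo "${db}: expect $(expected_state "$db")"
done
```

Compare each line against the actual state reported by srvctl status database on the corresponding host.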

Validate static listeners

Validating the static listeners is critical. For each database or far sync (NY, FS, SF):

  • Connect over the network with sqlplus
  • Shutdown (shutdown abort for standbys and far syncs)
  • Startup (startup mount for standbys and far syncs)

Example:

[oracle@ip-172-31-28-23 ops]$ sqlplus sys/zystm.22@FS as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Fri Sep 13 11:36:37 2019
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.


Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0

SQL> shutdown abort
ORACLE instance shut down.
SQL> startup mount
ORACLE instance started.

Total System Global Area 1140849904 bytes
Fixed Size                  8895728 bytes
Variable Size             301989888 bytes
Database Buffers          822083584 bytes
Redo Buffers                7880704 bytes
Database mounted.

This is an example of a failed static listener check: without a working static registration, once the instance is shut down the listener no longer knows the SID, and the session cannot reconnect to start the instance.

[oracle@ip-172-31-28-23 ops]$ sqlplus sys/zystm.22@FS as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Fri Sep 13 11:36:37 2019
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.


Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0

SQL> shutdown abort
ORACLE instance shut down.
SQL> startup mount
ORACLE instance started.

Total System Global Area 1140849904 bytes
Fixed Size                  8895728 bytes
Variable Size             301989888 bytes
Database Buffers          822083584 bytes
Redo Buffers                7880704 bytes
Database mounted.
SQL> shutdown abort
ORACLE instance shut down.
ERROR:
ORA-12505: TNS:listener does not currently know of SID given in connect
descriptor


SQL> startup mount
SP2-0640: Not connected

Resolve all failures before proceeding.

Broker start

At this stage, the primary database is open and the far sync and standby instances are mounted. At the primary, far sync, and standby, start the Data Guard broker:

SQL> alter system set dg_broker_start = true;

System altered.

Create the broker configuration

Connect to the Data Guard broker:

[oracle@ip-172-31-86-22 broker]$ dgmgrl sys/zystm.22@NY
DGMGRL for Linux: Release 19.0.0.0.0 - Production on Mon Sep 2 13:59:25 2019
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

Welcome to DGMGRL, type "help" for information.
Connected to "NY"
Connected as SYSDBA.

Create the Data Guard broker configuration. One way to do it is to create and test the standby first, and add the far sync later. However, in this example, we add the far sync and the standby in one fell swoop.

When creating new broker objects, quote your identifiers if you want to preserve name case.

DGMGRL> create configuration 'ORCL_CONFIG' as primary database is 'NY' connect identifier is 'NY';
Configuration "ORCL_CONFIG" created with primary database "NY"
DGMGRL> add far_sync 'FS' as connect identifier is 'FS';
far sync instance "FS" added
DGMGRL> add database 'SF' as connect identifier is 'SF' maintained as physical;
Database "SF" added

Add the redo routes:

DGMGRL> edit database NY set property RedoRoutes = '(LOCAL : FS SYNC)';
Property "redoroutes" updated
DGMGRL> edit database SF set property RedoRoutes = '(LOCAL : FS SYNC)';
Property "redoroutes" updated
DGMGRL> edit far_sync FS set property RedoRoutes = '(NY : SF ASYNC)(SF : NY ASYNC)';
Property "redoroutes" updated

Enable the configuration

DGMGRL> enable configuration
Enabled.

A point of interest is the fal_server configuration symmetry.

parameter    primary       far sync     standby
fal_server   'FS','SF' *   'NY','SF'    'FS','NY'

* after switchover

The Data Guard broker sets fal_server at the far sync and standby when you enable the configuration. If you perform a switchover, the broker sets fal_server at the new standby and clears fal_server at the old standby.

Switchover test

Test switchover to SF. The output is:

DGMGRL> switchover to SF
Performing switchover NOW, please wait...
Operation requires a connection to database "SF"
Connecting ...
Connected to "SF"
Connected as SYSDBA.
New primary database "SF" is opening...
Oracle Clusterware is restarting database "NY" ...
Connected to an idle instance.
Connected to an idle instance.
Connected to an idle instance.
Connected to an idle instance.
Connected to "NY"
Connected to "NY"
Switchover succeeded, new primary is "sf"

Test switchover to NY. The output is:

DGMGRL> switchover to NY
Performing switchover NOW, please wait...
New primary database "NY" is opening...
Oracle Clusterware is restarting database "SF" ...
Connected to an idle instance.
Connected to an idle instance.
Connected to "SF"
Connected to "SF"
Switchover succeeded, new primary is "ny"

Setting the protection mode

Maximum Performance

In Maximum Performance protection mode:

Transactions commit as soon as all redo data generated by those transactions has been written to the online redo log

Maximum Availability

In Maximum Availability mode:

Under normal operations, transactions do not commit until all redo data needed to recover those transactions has been written to the online redo log AND based on user configuration, one of the following is true:

    • redo has been received at the standby, I/O to the standby redo log has been initiated, and acknowledgement sent back to primary
    • redo has been received and written to standby redo log at the standby and acknowledgement sent back to primary

If the primary does not receive acknowledgement from at least one synchronized standby, then it operates as if it were in maximum performance mode to preserve primary database availability until it is again able to write its redo stream to a synchronized standby database.

With far sync, replace “standby” in the preceding description with “far sync”: transactions do not commit until redo has been written to the far sync’s standby redo log. Because the far sync is in the same region as the primary, commit performance of a Maximum Availability system is expected to be better with a far sync.
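To make the latency argument concrete, here is a toy model: under Maximum Availability, a commit waits for the local redo write plus the round trip to the SYNC destination. All millisecond figures here are hypothetical illustrations, not measurements from this setup.

```python
# Toy model of commit latency under Maximum Availability: a commit waits
# for the local redo write plus the round trip to the SYNC destination.
# All millisecond figures below are hypothetical, for illustration only.
def commit_latency_ms(local_write_ms, sync_rtt_ms):
    return local_write_ms + sync_rtt_ms

LOCAL_WRITE_MS  = 1.0   # local online redo log write (hypothetical)
RTT_FAR_SYNC_MS = 0.5   # same-region round trip to the far sync (hypothetical)
RTT_STANDBY_MS  = 70.0  # cross-region round trip to the standby (hypothetical)

with_far_sync = commit_latency_ms(LOCAL_WRITE_MS, RTT_FAR_SYNC_MS)  # 1.5 ms
direct_sync   = commit_latency_ms(LOCAL_WRITE_MS, RTT_STANDBY_MS)   # 71.0 ms
print(with_far_sync, direct_sync)
```

With these assumed numbers the commit overhead drops by roughly two orders of magnitude, which is the whole point of placing the far sync near the primary.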

Set protection mode to Maximum Availability.

DGMGRL> edit configuration set protection mode as MaxAvailability;
Succeeded.

Wait a few minutes and check:

DGMGRL> show configuration

Configuration - ORCL_CONFIG

  Protection Mode: MaxAvailability
  Members:
  NY - Primary database
    FS - Far sync instance
      SF - Physical standby database

Fast-Start Failover:  Disabled

Configuration Status:
SUCCESS   (status updated 51 seconds ago)

Test switchover and switch back:

DGMGRL> switchover to SF
Performing switchover NOW, please wait...
Operation requires a connection to database "SF"
Connecting ...
Connected to "SF"
Connected as SYSDBA.
New primary database "SF" is opening...
Oracle Clusterware is restarting database "NY" ...
Connected to an idle instance.
Connected to an idle instance.
Connected to "NY"
Connected to "NY"
Switchover succeeded, new primary is "sf"
DGMGRL> switchover to NY
Performing switchover NOW, please wait...
Operation requires a connection to database "NY"
Connecting ...
Connected to "NY"
Connected as SYSDBA.
New primary database "NY" is opening...
Oracle Clusterware is restarting database "SF" ...
Connected to an idle instance.
Connected to an idle instance.
Connected to an idle instance.
Connected to "SF"
Connected to "SF"
Switchover succeeded, new primary is "ny"

Notice that at the far sync, one or more standby redo logs are assigned:

SQL> select group#, bytes/1024/1024 mb, thread#, sequence#, status from v$standby_log;

    GROUP#         MB    THREAD#  SEQUENCE# STATUS
---------- ---------- ---------- ---------- ----------
         1        200          1        572 ACTIVE
         2        200          1          0 UNASSIGNED
         3        200          1          0 UNASSIGNED
         4        200          0          0 UNASSIGNED

and at the physical standby one or more standby logs are assigned.

SQL> select group#, bytes/1024/1024 mb, thread#, sequence#, status from v$standby_log;

    GROUP#         MB    THREAD#  SEQUENCE# STATUS
---------- ---------- ---------- ---------- ----------
         4        200          1          0 UNASSIGNED
         5        200          1        572 ACTIVE
         6        200          0          0 UNASSIGNED
         7        200          0          0 UNASSIGNED

Disabling far sync

You can disable far sync:

DGMGRL> edit database NY set property RedoRoutes = '(LOCAL : SF SYNC)';
Property "redoroutes" updated
DGMGRL> edit database SF set property RedoRoutes = '(LOCAL : NY SYNC)';
Property "redoroutes" updated
DGMGRL> edit far_sync FS set property RedoRoutes = '';
Property "redoroutes" updated
DGMGRL> disable far_sync FS
Disabled.

The configuration display looks like this:

DGMGRL> show configuration

Configuration - ORCL_CONFIG

  Protection Mode: MaxAvailability
  Members:
  NY - Primary database
    SF - Physical standby database

  Members Not Receiving Redo:
  FS - Far sync instance (disabled)
    ORA-16749: The member was disabled manually.

Fast-Start Failover:  Disabled

Configuration Status:
SUCCESS   (status updated 24 seconds ago)

Re-enabling far sync

You can re-enable far sync. Touch the objects in the reverse of the order used when disabling: first the far sync itself, then the far sync redo routes, then the database redo routes.

DGMGRL> enable far_sync FS
Enabled.
DGMGRL> edit far_sync FS set property RedoRoutes = '(NY : SF ASYNC)(SF : NY ASYNC)';
Property "redoroutes" updated
DGMGRL> edit database SF set property RedoRoutes = '(LOCAL : FS SYNC)';
Property "redoroutes" updated
DGMGRL> edit database NY set property RedoRoutes = '(LOCAL : FS SYNC)';
Property "redoroutes" updated

Check:

DGMGRL> show configuration

Configuration - ORCL_CONFIG

  Protection Mode: MaxAvailability
  Members:
  NY - Primary database
    FS - Far sync instance
      SF - Physical standby database

Fast-Start Failover:  Disabled

Configuration Status:
SUCCESS   (status updated 39 seconds ago)

Removing the far sync

You can remove a disabled far sync from the configuration.

DGMGRL> remove far_sync FS
Removed far sync instance "fs" from the configuration

Switchover performance

We want to know how long the new primary takes to become available: the time elapsed from when the switchover command is issued until the new primary is open. The times appear in the alert log. For example:

2019-09-05T14:44:04.271609-04:00
SWITCHOVER VERIFY BEGIN
...
2019-09-05T14:44:59.392512-04:00
TMI: adbdrv open database END 2019-09-05 14:44:59.392313
Starting background process CJQ0
Completed: ALTER DATABASE OPEN
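The elapsed time can be computed directly from the two alert-log timestamps above; a sketch in Python:

```python
from datetime import datetime

# Timestamps copied from the alert-log excerpt above:
# SWITCHOVER VERIFY BEGIN and the completion of ALTER DATABASE OPEN.
start = datetime.fromisoformat("2019-09-05T14:44:04.271609-04:00")
end   = datetime.fromisoformat("2019-09-05T14:44:59.392512-04:00")

elapsed = (end - start).total_seconds()
print(round(elapsed, 1))  # 55.1
```

This matches the last row of the "direct" timing table below.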

We can compare database opening time with far sync

  Protection Mode: MaxAvailability
  Members:
  NY - Primary database
    FS - Far sync instance
      SF - Physical standby database

to time without far sync (direct route)

  Protection Mode: MaxAvailability
  Members:
  NY - Primary database
    SF - Physical standby database

With far sync, the switchover timing data are:

switchover to  date  switchover start  new primary open  elapsed time to open (s)  redo route  protection level
SF 9/5/2019 05:54:16.9 05:55:11.4 54.6 far sync Max Perf
NY 9/5/2019 06:02:11.7 06:04:12.2 120.5 far sync Max Perf
SF 9/5/2019 06:09:34.2 06:10:40.9 66.8 far sync Max Perf
NY 9/5/2019 06:13:32.7 06:15:21.9 109.1 far sync Max Perf
SF 9/5/2019 13:38:38.6 13:40:18.1 99.5 far sync Max Avail
NY 9/5/2019 14:07:56.7 14:10:23.7 147.0 far sync Max Avail
SF 9/5/2019 14:15:32.9 14:17:58.9 145.9 far sync Max Avail
NY 9/5/2019 14:20:04.6 14:21:38.9 94.3 far sync Max Avail
Average 104.7
Stdev 33.5

compared to without far sync:

switchover to  date  switchover start  new primary open  elapsed time to open (s)  redo route  protection level
SF 9/5/2019 06:19:48.0 06:20:47.4 59.4 direct Max Perf
NY 9/5/2019 06:23:10.5 06:24:06.8 56.3 direct Max Perf
SF 9/5/2019 06:27:20.5 06:28:19.5 59.0 direct Max Perf
NY 9/5/2019 06:29:57.2 06:30:54.0 56.8 direct Max Perf
SF 9/5/2019 14:34:49.6 14:35:50.1 60.5 direct Max Avail
NY 9/5/2019 14:38:04.4 14:38:59.3 54.9 direct Max Avail
SF 9/5/2019 14:40:59.6 14:41:58.1 58.5 direct Max Avail
NY 9/5/2019 14:44:04.3 14:44:59.4 55.1 direct Max Avail
Average 57.6
Stdev 2.1

Without far sync, switchover time is lower (Average) and more consistent (Stdev). Far sync, therefore, imposes a switchover time penalty.
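The Average and Stdev rows above are the sample mean and sample standard deviation of the elapsed times, reproducible with Python's statistics module:

```python
from statistics import mean, stdev

# Elapsed "time to open" values (seconds) from the two tables above.
far_sync = [54.6, 120.5, 66.8, 109.1, 99.5, 147.0, 145.9, 94.3]
direct   = [59.4, 56.3, 59.0, 56.8, 60.5, 54.9, 58.5, 55.1]

for label, times in (("far sync", far_sync), ("direct", direct)):
    print(label, round(mean(times), 1), round(stdev(times), 1))
# far sync 104.7 33.5
# direct 57.6 2.1
```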

Conclusion

A far sync implementation is presented. There are several points of interest.

  • The platform is Red Hat Linux 7.2 on Amazon AWS EC2.
  • The network setup is described in a separate article.
  • You can measure network latency and throughput with qperf.
  • The grid and oracle home users are separate accounts.
  • Database storage is in ASM, not operating system files.
  • The number of standby log files depends on the expected workload.
  • Parameter compatible must be set the same at the primary, far sync, and standby.
  • Check standby log size before duplicating the database.
  • File transfer is by RMAN only.
  • SSH trust is not required to set up Data Guard.
  • Multiple RMAN channels should be used to duplicate the database.
  • Configure Restart for all instances.
  • Validate the static listeners.
  • Data Guard configuration and operations are done from the broker, not sqlplus.
  • The standby and the far sync are set up at the same time, not separately.
  • Reboot hosts and confirm automatic startup, startup mode, and Data Guard role.
  • Test switchover after configuring restart.
  • Data Guard broker manages fal_server at all nodes during role transitions.
  • You can change the protection mode to Maximum Availability.
  • Far sync can reduce or prevent data loss.
  • With far sync, switchover takes longer and the elapsed time is less consistent.

We have presented an Oracle Data Guard 19c setup with far sync. Some techniques here are variants on contributions found elsewhere. Some findings on performance are presented.

Data Guard network in AWS

By Brian Fitzgerald

Introduction

This article describes a cross-region Data Guard network in Amazon AWS, covering VPC setup, peering, and security groups.

VPC

We are starting with a default VPC at the primary with CIDR block 172.31.0.0/16. At the standby, in preparation for peering, we will create a new VPC with non-overlapping CIDR block 172.32.0.0/20.
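Before creating the peering, you can verify that the two CIDR blocks do not overlap with Python's standard ipaddress module:

```python
import ipaddress

# CIDR blocks from this setup: default VPC at the primary,
# new VPC at the standby.
primary = ipaddress.ip_network("172.31.0.0/16")
standby = ipaddress.ip_network("172.32.0.0/20")

print(primary.overlaps(standby))  # False: safe to peer
# A standby block carved from the primary's range would collide:
print(primary.overlaps(ipaddress.ip_network("172.31.16.0/20")))  # True
```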

[screenshot: cr.vpc]

Enable DNS hostnames, which is required for the Oracle grid setup.

[screenshot: vpc.enable.dns.hostnames2]

We need to create at least one subnet.

[screenshot: cr.subnet]

Optionally, you may create an internet gateway and route.

Peering

At the standby VPC, setup peering to the primary VPCs:

[screenshot: cr.peering]

Note the acknowledgement:

[screenshot: conf.peering]

At the primary, accept the request.

[screenshot: accept.peering]

Press “Yes, Accept”.

[screenshot: accept.peering.yes.png]

Note the acknowledgement. Select “Modify my route tables now”:

[screenshot: will.modify.route]

At the standby, add a route to the primary VPC.

[screenshot: edit.routes.sf]

At the primary, add a route to the standby VPC.

[screenshot: edit.routes.va]

Security Groups

Create primary, far sync, and standby EC2 instances. If you already have Enterprise Manager set up, account for its host now as well. Assuming these IP addresses:

Description IP Address
primary 172.31.86.22
far sync 172.31.28.23
standby 172.32.10.34
Enterprise Manager 172.31.82.194

A minimalist security group arrangement involves just the database boxes. In that case, set up security groups such as the following. At the primary EC2 instance, accept incoming Oracle connections from the far sync and the standby. At all EC2 instances, accept incoming SSH and EM agent connections.

[screenshot: sg.prim]

At the far sync, accept Oracle connections from the primary and the standby.

[screenshot: sg.far]

At the standby, accept Oracle connections from the primary and the far sync.

[screenshot: sg.stby]
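The rules above reduce to "allow TCP 1521 from these peers". A sketch that builds the ingress payload boto3's authorize_security_group_ingress expects, using the IP addresses from the table; port 1521 is the Oracle listener default, and the security group ID shown in the comment is hypothetical:

```python
# Sketch: build the IpPermissions entry that boto3's
# authorize_security_group_ingress expects. IP addresses are from
# the table above; adjust the port if your listener differs.
def oracle_ingress(peer_ips, port=1521):
    """Allow incoming Oracle listener traffic (TCP) from the given peers."""
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": [{"CidrIp": f"{ip}/32"} for ip in peer_ips],
    }

# Primary: accept Oracle connections from the far sync and the standby.
primary_rule = oracle_ingress(["172.31.28.23", "172.32.10.34"])
print(primary_rule)

# To apply (requires AWS credentials):
# boto3.client("ec2").authorize_security_group_ingress(
#     GroupId="sg-0123456789abcdef0",  # hypothetical group ID
#     IpPermissions=[primary_rule])
```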

Suppose, however, that there are database application clients on subnets 172.32.0.0/20 and 172.31.80.0/20. In that case, you could revise the primary and standby security groups as follows. At both the primary and the standby, accept Oracle connections from the primary and standby subnets and from the far sync.

[screenshot: sg.app.prim]

[screenshot: sg.app.stby]

Instead of opening incoming SSH to all hosts, you can create one or more bastion hosts and restrict incoming SSH to the bastion only.

Conclusion

Using the AWS console, you can set up networking for a cross-region Data Guard configuration. At the VPC level, the first key point is to select non-overlapping IP address ranges with a view to establishing peering. For a smooth grid install, enable DNS hostnames. After your EC2 instances are created, you must configure security groups to accept incoming database, EM agent, and SSH connections. Security groups should be as restrictive as possible.

grid setup error: could not retrieve local node name

By Brian Fitzgerald

Error

In AWS EC2 Red Hat Linux, gridSetup.sh reports this error:

PRVF-0002 : could not retrieve local node name

Solution

Set the EnableDnsHostnames flag on the VPC.

Error details

In Linux, you run Oracle 19c grid setup:

[grid@ip-10-0-2-197 ~]$ cd /u01/app/19.3.0/grid
[grid@ip-10-0-2-197 grid]$ ./gridSetup.sh

After the splash screen:

[screenshot: splash]

this error appears:

[screenshot: errmsg]

The error appears in several log files. For example: /tmp/GridSetupActions2019-09-01_09-40-29AM/gridSetupActions2019-09-01_09-40-29AM.log

SEVERE: [Sep 1, 2019 9:40:36 AM] [FATAL] PRVF-0002 : could not retrieve local node name
ip-10-0-2-197.ec2.internal: ip-10-0-2-197.ec2.internal: Name or service not known.
Refer associated stacktrace #oracle.install.commons.util.exception.AbstractErrorAdvisor:165

/tmp/GridSetupActions2019-09-01_09-40-29AM/time2019-09-01_09-40-29AM.log

oracle.cluster.verification.VerificationException: PRVF-0002 : could not retrieve local node name
ip-10-0-2-197.ec2.internal: ip-10-0-2-197.ec2.internal: Name or service not known

Note that the local hostname does not resolve to an IP address:

[grid@ip-10-0-2-197 grid]$ hostname
ip-10-0-2-197.ec2.internal
[grid@ip-10-0-2-197 grid]$ ping ip-10-0-2-197.ec2.internal
ping: ip-10-0-2-197.ec2.internal: Name or service not known
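The resolution check can be scripted as well; a minimal sketch using Python's socket module:

```python
import socket

def resolves(hostname):
    """Return True if the hostname resolves to an IP address."""
    try:
        socket.gethostbyname(hostname)
        return True
    except socket.gaierror:
        return False

# On the broken host, the local FQDN (socket.getfqdn()) behaves like
# the .invalid case below: it does not resolve until the VPC is fixed.
print(resolves("localhost"))             # True
print(resolves("no-such-host.invalid"))  # False (.invalid is reserved, never resolves)
```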

Cause

The Virtual Private Cloud (VPC) has the EnableDnsHostnames flag set to Disabled (False). This could happen in a non-default VPC, meaning a VPC that you create. You can see this in the AWS console:

[screenshot: vpc]

or from python:

Python 2.7.5 (default, Jun 11 2019, 14:33:56)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-39)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from boto3 import client
>>> rsp = client('ec2').describe_vpc_attribute(
...     Attribute='enableDnsHostnames',
...     VpcId = 'vpc-0786f6ef4dc16cefc'
... )
>>> rsp['EnableDnsHostnames']['Value']
False

or from the command line:

[grid@ip-10-0-2-197 grid]$ aws ec2 describe-vpc-attribute --vpc-id vpc-0786f6ef4dc16cefc --attribute enableDnsHostnames
{
    "VpcId": "vpc-0786f6ef4dc16cefc",
    "EnableDnsHostnames": {
        "Value": false
    }
}

Fix

Set the EnableDnsHostnames flag, which the AWS manual describes:

Indicates whether the instances launched in the VPC get DNS hostnames. If enabled, instances in the VPC get DNS hostnames; otherwise, they do not.

From the VPC Dashboard, select your VPC. Click Actions, and select “Edit DNS hostnames”. At the Edit DNS hostnames screen, check “enable” and press “Save”.

[screenshot: edit.dns.hostnames]

or, in python:

>>> from boto3 import client
>>> rsp = client('ec2').modify_vpc_attribute(
...     VpcId = 'vpc-0786f6ef4dc16cefc',
...     EnableDnsHostnames = { 'Value': True }
... )

or, using the aws command line, issue modify-vpc-attribute:

[grid@ip-10-0-2-197 grid]$ aws ec2 modify-vpc-attribute --vpc-id vpc-0786f6ef4dc16cefc --enable-dns-hostnames
[grid@ip-10-0-2-197 grid]$ aws ec2 describe-vpc-attribute --vpc-id vpc-0786f6ef4dc16cefc --attribute enableDnsHostnames
{
    "VpcId": "vpc-0786f6ef4dc16cefc",
    "EnableDnsHostnames": {
        "Value": true
    }
}

Retest

The change does not affect the EC2 instance immediately.

[grid@ip-10-0-2-197 grid]$ ping ip-10-0-2-197.ec2.internal
ping: ip-10-0-2-197.ec2.internal: Name or service not known

Reboot the host:

[ec2-user@ip-10-0-2-197 scr]$ sudo reboot

Now test:

[ec2-user@ip-10-0-2-197 ~]$ ping ip-10-0-2-197.ec2.internal
PING ip-10-0-2-197.ec2.internal (10.0.2.197) 56(84) bytes of data.

Now gridSetup.sh starts normally, and the installer menu appears:

[grid@ip-10-0-2-197 grid]$ ./gridSetup.sh

[screenshot: installer]

Discussion

Note that error PRVF-0002 has been widely reported in blogs. In this case, the environment is AWS EC2 Red Hat Linux 7.2 with Oracle 19c Grid Infrastructure. The error also appears on Oracle Linux, and in Oracle versions as early as 11g. PRVF-0002 can appear in a silent install as well.

The fix is sometimes given as a local /etc/hosts edit. This article instead proposes a fix that you implement once per VPC, so that you do not have to edit /etc/hosts on every host. You also get more consistent deployments across all VPCs, not just the default VPC.
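For reference, the per-host workaround is an /etc/hosts entry mapping the instance's private IP to its internal hostname (addresses from this example):

```
10.0.2.197   ip-10-0-2-197.ec2.internal
```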

Conclusion

gridSetup error “PRVF-0002 : could not retrieve local node name” can be resolved at the VPC level by setting EnableDnsHostnames.