listener status UNKNOWN or BLOCKED

By Brian Fitzgerald

Introduction

By trial and error, I identify which Oracle Database instance states correspond to the listener statuses READY, UNKNOWN, and BLOCKED.

Details

I have recorded the listener status for various database states. In some cases, the listener was reloaded or restarted.

Database open READ WRITE:

Service "EQTRD" has 2 instance(s).
  Instance "EQTRD", status UNKNOWN, has 1 handler(s) for this service...
  Instance "EQTRD", status READY, has 1 handler(s) for this service...

Shut down the database

Service "EQTRD" has 1 instance(s).
  Instance "EQTRD", status UNKNOWN, has 1 handler(s) for this service...

Delete the static listener and reload the listener.

[none]

Startup nomount

Service "EQTRD" has 1 instance(s).
  Instance "EQTRD", status BLOCKED, has 1 handler(s) for this service...

Reload

Service "EQTRD" has 1 instance(s).
  Instance "EQTRD", status BLOCKED, has 1 handler(s) for this service...

Stop and start the listener.

Service "EQTRD" has 1 instance(s).
  Instance "EQTRD", status BLOCKED, has 1 handler(s) for this service...

Note that reloading or restarting the listener does not clear a BLOCKED state.

Mount

Service "EQTRD" has 1 instance(s).
  Instance "EQTRD", status READY, has 1 handler(s) for this service...

Startup force nomount

Service "EQTRD" has 1 instance(s).
  Instance "EQTRD", status BLOCKED, has 1 handler(s) for this service...

Re-enable the static listener

Service "EQTRD" has 2 instance(s).
  Instance "EQTRD", status UNKNOWN, has 1 handler(s) for this service...
  Instance "EQTRD", status BLOCKED, has 1 handler(s) for this service...

Mount

Service "EQTRD" has 2 instance(s).
  Instance "EQTRD", status UNKNOWN, has 1 handler(s) for this service...
  Instance "EQTRD", status READY, has 1 handler(s) for this service...

Restricted session

Service "XIENT" has 1 instance(s).
  Instance "XIENT", status RESTRICTED, has 1 handler(s) for this service...

Summary

Here is a summary of each listener status and its interpretation.

status       interpretation
BLOCKED      nomount
UNKNOWN      statically registered in listener.ora
READY        mounted or open; dynamically registered
RESTRICTED   restricted session
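These states can be spot-checked from the shell by pulling the status field out of saved lsnrctl status output. This is a hypothetical helper; the sample lines below stand in for real listener output:

```shell
# Count instance registrations by status in "lsnrctl status" output.
# The sample text stands in for live listener output.
sample='Service "EQTRD" has 2 instance(s).
  Instance "EQTRD", status UNKNOWN, has 1 handler(s) for this service...
  Instance "EQTRD", status READY, has 1 handler(s) for this service...'

printf '%s\n' "$sample" |
  awk -F'status ' '/Instance/ { split($2, a, ","); print a[1] }' |
  sort | uniq -c
```

Against a live listener, pipe lsnrctl status into the same awk instead of the sample text.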

 

Data Guard broker 19c edit database set parameter feature is unstable

By Brian Fitzgerald

Introduction

New in Oracle Database 19c, you can set a database initialization parameter directly from dgmgrl, the Data Guard broker command-line interface. Either the documentation is unclear or the feature is unstable.

Edit database set parameter

The basic functionality works as documented.

DGMGRL> edit database PRIM set parameter pga_aggregate_target='752m';
Parameter "pga_aggregate_target" updated

However:

DGMGRL> edit database PRIM set parameter pga_aggregate_target='753m' sid='PRIM';
edit database PRIM set parameter pga_aggregate_target='753m' sid='PRIM';
                                                                ^
Syntax error before or at "="

DGMGRL> edit database PRIM set parameter sga_target=2256m scope = both;
edit database PRIM set parameter sga_target=2256m scope = both;
                                                        ^
Syntax error before or at "="

The SID and SCOPE options are documented, but they do not work for me.

Edit database reset parameter

You can use edit database to reset a database initialization parameter. Suppose spfile initially contains:

*.use_large_pages='ONLY'
PRIM.use_large_pages='ONLY'

Now reset.

DGMGRL> edit database PRIM reset parameter use_large_pages;
ORA-16675: database instance restart required for property value modification to take effect

Parameter "use_large_pages" reset
DGMGRL>

This appears in the alert log:

2019-10-17T19:43:49.034727+00:00
ALTER SYSTEM RESET use_large_pages SCOPE=SPFILE;

Here is what remains in the spfile:

PRIM.use_large_pages='ONLY'

The sid='*' entry is gone. The dgmgrl reset behavior, then, matches the alter system syntax with an implied sid='*'.

The dgmgrl edit database reset parameter command has no sid option.

DGMGRL> edit database PRIM reset parameter use_large_pages sid = 'PRIM';
edit database PRIM reset parameter use_large_pages sid = 'PRIM';
                                                   ^
Syntax error before or at "sid"
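Since dgmgrl rejects the sid qualifier here, a per-instance reset can still be issued directly from SQL*Plus. A sketch, with an illustrative sid value:

```
SQL> alter system reset use_large_pages scope=spfile sid='PRIM';
```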

Conclusion

The dgmgrl edit database set and reset parameter commands are incomplete or have unexpected behavior today, or I misunderstand them. I will not use them for now.

AutoUpgrade 12c->19c with Transient Logical Standby

By Brian Fitzgerald

Introduction

Oracle Database 12.1 was upgraded to 19.3 using the AutoUpgrade utility and the Transient Logical Standby rolling-upgrade method. Actual downtime was under 10 seconds.

News about AutoUpgrade

For news about AutoUpgrade, refer to Oracle product manager Mike Dietrich's blog.

System Description

The system runs Red Hat Enterprise Linux 7.7 on Amazon EC2. Details appear in this table:

Attribute Value
cloud AWS
location US East (N. Virginia)
instanceType m5.large
tenancy shared
instanceFamily General purpose
physicalProcessor Intel Xeon Platinum 8175
clockSpeed 2.5 GHz
Number of CPUs 2
memGB 7.8 GiB
Storage EBS gp2
operatingSystem Linux
AMI RHEL-7.7_HVM_GA-20190723-x86_64-1-Hourly2-GP2
OS Vendor Red Hat, Inc.
Red Hat version 7.7
price per hour $0.096

System Configuration

Initially, the system is a Data Guard network with a physical standby.

Description Value
Original version 12.1.0.2
Target version 19.3.0
Primary EQTRD
Physical Standby XIENT
Instance type Restart
Schema trades

Non-default Database system configurations

Selected non-default instance parameters are:

name value
archive_lag_target 900
audit_file_dest /u01/app/oracle/admin/EQTRD/adump
audit_trail DB
compatible 12.1.0.2.0
control_files +DATA01/EQTRD/CONTROLFILE/current.259.1021685185, +RECO01/EQTRD/CONTROLFILE/current.313.1021685185
db_block_size 8192
db_create_file_dest +DATA01
db_create_online_log_dest_1 +DATA01
db_create_online_log_dest_2 +RECO01
db_name EQTRD
db_recovery_file_dest +RECO01
db_recovery_file_dest_size 30g
dg_broker_config_file1 +DATA01/EQTRD/dr1orcl.dat
dg_broker_config_file2 +RECO01/EQTRD/dr2orcl.dat
dg_broker_start TRUE
diagnostic_dest /u01/app/oracle
local_listener 127.0.0.1:1521
log_archive_config dg_config=(EQTRD,XIENT)
log_archive_dest_1 location=USE_DB_RECOVERY_FILE_DEST, valid_for=(ALL_LOGFILES, ALL_ROLES)
log_archive_dest_2 service="XIENT", ASYNC NOAFFIRM delay=0 optional compression=disable max_failure=0 max_connections=1 reopen=300 db_unique_name="XIENT" net_timeout=30, valid_for=(online_logfile,all_roles)
log_archive_dest_state_2 ENABLE
sga_max_size 1500m
sga_target 1500m
standby_file_management AUTO
use_large_pages ONLY

Existing Data Guard 12c physical standby

We start this article with a working Data Guard physical standby with these issues already handled on the primary and the standby:

  • network route
  • static listeners
  • archivelog
  • flashback on
  • forced logging
  • 3×500 MB online redo log groups
  • 4×500 MB standby redo log groups
  • audit directory
  • data guard broker
  • tested switchover
  • spfile moved to ASM
  • tnsnames.ora in 12c oracle home
  • Oracle Restart is set up
  • oratab is configured and pointing to the 12c home
  • srvctl tested

The standby spfile has been moved into ASM so that it will not need to be moved again after the upgrade, or after a possible downgrade.

SQL> show parameter spfile

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
spfile                               string      +DATA01/XIENT/PARAMETERFILE/spfile.275.1021229119
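For reference, an spfile can be moved into ASM along these lines; the paths here are illustrative, and the instance must be restarted before the new location takes effect:

```
SQL> create pfile='/tmp/initXIENT.ora' from spfile;
SQL> create spfile='+DATA01' from pfile='/tmp/initXIENT.ora';
```

With Restart managing the database, srvctl modify database -db XIENT -spfile can then record the new location in the clusterware configuration.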

Existing Data

Example table ordermessage has one row.

SQL> @ insert1.sql

1 row created.

SQL> commit;

Commit complete.
SQL> select ClOrdID from ordermessage order by TransactTime;

CLORDID
------------------------------
XL46W2NCVM22N02RQ2YK

Preparations

Careful preparation should be completed days ahead of the planned upgrade. AutoUpgrade analyze and fixups can be run and the results reviewed. The 19c Oracle homes, including tnsnames.ora and the password file (orapw), should be prepared for the upgrade.

Download AutoUpgrade

Download the latest autoupgrade.jar.

Install it into the 19c Oracle home on the primary and the standby

[oracle@ip-172-31-91-148 dbhome_1]$ alias oh
alias oh='cd $ORACLE_HOME'
[oracle@ip-172-31-91-148 dbhome_1]$ oh
[oracle@ip-172-31-91-148 dbhome_1]$ find . -name autoupgrade.jar -ls
76988898 3616 -rw-r--r-- 1 oracle oinstall 3702360 Oct 7 20:35 ./rdbms/admin/autoupgrade.jar

Set this alias for convenience

[oracle@ip-172-31-91-148 dbhome_1]$ AH=/u01/app/oracle/product/19.3.0/dbhome_1
[oracle@ip-172-31-91-148 dbhome_1]$ alias au='rlwrap $AH/jdk/bin/java -jar $AH/rdbms/admin/autoupgrade.jar'

Use rlwrap for CLI readline editing and history recall.

Check the AutoUpgrade version:

[oracle@ip-172-31-91-148 dbhome_1]$ au -version
build.hash 67fee5b
build.version 20190823
build.date 2019/08/23 18:08:47

Source AutoUpgrade configuration file

The source configuration file is eqtrd.12c.19c.conf. Notice that target_home need not appear in the source configuration file.

global.autoupg_log_dir=/u01/app/oracle/autoupgrade

upg1.dbname=EQTRD
upg1.start_time=now
upg1.source_home=/u01/app/oracle/product/12.1.0/dbhome_1
upg1.sid=EQTRD
upg1.log_dir=/u01/app/oracle/autoupgrade
upg1.upgrade_node=ip-172-31-88-93.ec2.internal
upg1.target_version=19.3

Analyze

Use AutoUpgrade to analyze for issues and needed fixups.

[oracle@ip-172-31-88-93 ~]$ au -config eqtrd.12c.19c.conf -mode analyze
AutoUpgrade tool launched with default options
+--------------------------------+
| Starting AutoUpgrade execution |
+--------------------------------+
1 databases will be analyzed
Type 'help' to list console commands
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+---------------+
|Job#|DB_NAME|    STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|        MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+---------------+
| 110|  EQTRD|PRECHECKS|PREPARING|RUNNING|19/10/11 14:38|     N/A|14:38:09|Loading DB info|
+----+-------+---------+---------+-------+--------------+--------+--------+---------------+
Total jobs 1
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+--------------+
|Job#|DB_NAME|    STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|       MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+--------------+
| 110|  EQTRD|PRECHECKS|PREPARING|RUNNING|19/10/11 14:38|     N/A|14:38:16|Remaining 6/72|
+----+-------+---------+---------+-------+--------------+--------+--------+--------------+
Total jobs 1
upg> Job 110 completed
------------------- Final Summary --------------------
Number of databases            [ 1 ]

Jobs finished successfully     [1]
Jobs failed                    [0]
Jobs pending                   [0]
------------- JOBS FINISHED SUCCESSFULLY -------------
Job 110 FOR EQTRD

Fixups

Use AutoUpgrade to run Fixups:

[oracle@ip-172-31-88-93 ~]$ au -config eqtrd.12c.19c.conf -mode fixups
AutoUpgrade tool launched with default options
+--------------------------------+
| Starting AutoUpgrade execution |
+--------------------------------+
1 databases will be processed
Type 'help' to list console commands
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+---------------+
|Job#|DB_NAME|    STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|        MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+---------------+
| 111|  EQTRD|PRECHECKS|PREPARING|RUNNING|19/10/11 14:43|     N/A|14:43:28|Loading DB info|
+----+-------+---------+---------+-------+--------------+--------+--------+---------------+
Total jobs 1
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+-------------+
|Job#|DB_NAME|    STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|      MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+-------------+
| 111|  EQTRD|PREFIXUPS|EXECUTING|RUNNING|19/10/11 14:43|     N/A|14:43:42|Remaining 3/3|
+----+-------+---------+---------+-------+--------------+--------+--------+-------------+
Total jobs 1
upg> Job 111 completed
------------------- Final Summary --------------------
Number of databases            [ 1 ]

Jobs finished successfully     [1]
Jobs failed                    [0]
Jobs pending                   [0]
------------- JOBS FINISHED SUCCESSFULLY -------------
Job 111 FOR EQTRD

Target AutoUpgrade configuration file

Prepare the target AutoUpgrade configuration file. Notice that dbname and sid are changed. upgrade_node must match the target hostname. The configuration file is xient.12c.19c.conf.

global.autoupg_log_dir=/u01/app/oracle/autoupgrade

upg1.dbname=XIENT
upg1.start_time=now
upg1.source_home=/u01/app/oracle/product/12.1.0/dbhome_1
upg1.target_home=/u01/app/oracle/product/19.3.0/dbhome_1
upg1.sid=XIENT
upg1.log_dir=/u01/app/oracle/autoupgrade
upg1.upgrade_node=ip-172-31-91-148.ec2.internal
upg1.target_version=19.3

Prepare 19c oracle homes

Prepare the 19c oracle homes with the password file and tnsnames.ora. Primary:

[oracle@ip-172-31-88-93 ~]$ alias oh
alias oh='cd $ORACLE_HOME'
[oracle@ip-172-31-88-93 ~]$ oh
[oracle@ip-172-31-88-93 dbhome_1]$ pwd
/u01/app/oracle/product/12.1.0/dbhome_1
[oracle@ip-172-31-88-93 dbhome_1]$ cp -p dbs/orapwEQTRD /u01/app/oracle/product/19.3.0/dbhome_1/dbs/
[oracle@ip-172-31-88-93 dbhome_1]$ cp -p network/admin/tnsnames.ora /u01/app/oracle/product/19.3.0/dbhome_1/network/admin/

Standby:

[oracle@ip-172-31-91-148 ~]$ oh
[oracle@ip-172-31-91-148 dbhome_1]$ pwd
/u01/app/oracle/product/12.1.0/dbhome_1
[oracle@ip-172-31-91-148 dbhome_1]$ cp -p dbs/orapwXIENT /u01/app/oracle/product/19.3.0/dbhome_1/dbs/ 
[oracle@ip-172-31-91-148 dbhome_1]$ cp -p network/admin/tnsnames.ora /u01/app/oracle/product/19.3.0/dbhome_1/network/admin/

Create the Transient Logical Standby

For this exercise, I find it helpful to have four terminal windows open, two on the primary and two on the standby. One window per host is for tailing the alert log, and the other is for commands.
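For the tailing windows, following the alert log directly is enough. The diag path below is an assumption based on diagnostic_dest and the instance name; adjust it for your system:

```
tail -f /u01/app/oracle/diag/rdbms/eqtrd/EQTRD/trace/alert_EQTRD.log
```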

Data Guard configuration

Note the Data Guard configuration. At the standby:

SQL> select * from v$dataguard_config;

DB_UNIQUE_NAME  PARENT_DBUN     DEST_ROLE         CURRENT_SCN     CON_ID
--------------- --------------- ----------------- ----------- ----------
EQTRD           NONE            PRIMARY DATABASE      1868165          0
XIENT           EQTRD           PHYSICAL STANDBY      1868075          0

Disable the Data Guard broker configuration

We are not going to stop the Data Guard broker. Disabling the configuration freezes the current Data Guard system state.

DGMGRL> disable configuration
Disabled.

Standby is mounted

Note that the standby database is mounted:

SQL> select open_mode from v$database;

OPEN_MODE
--------------------
MOUNTED

Primary restore point

A guaranteed restore point on the primary is required later, when the original primary is flashed back and converted to a physical standby. Create it now:

SQL> create restore point upgrd1 guarantee flashback database;

Restore point created.

Cancel managed recovery

On the standby:

SQL> recover managed standby database cancel
Media recovery complete.

Create standby restore point

On the standby:

SQL> create restore point upgrd2 guarantee flashback database;

Restore point created.

Some DBAs create additional restore points at later steps.

Create the logical standby

For this step, it is helpful to watch two windows.

On the standby:

SQL> alter database recover to logical standby keep identity;

The session hangs, waiting for the dictionary build on the primary …

On the primary:

SQL> exec dbms_logstdby.build;

PL/SQL procedure successfully completed.

SQL>

dbms_logstdby.build completes in approximately three seconds. A few seconds later, the alter database on the standby finishes and the sqlplus prompt returns.

Database altered.

SQL>

Run standby apply

On the standby:

SQL> alter database open;

Database altered.

SQL> alter database start logical standby apply immediate;

Database altered.

SQL> select state from v$logstdby_state;

STATE
----------------------------------------------------------------
INITIALIZING

SQL> select state from v$logstdby_state;

STATE
----------------------------------------------------------------
INITIALIZING

SQL> select state from v$logstdby_state;

STATE
----------------------------------------------------------------
LOADING DICTIONARY

SQL> select state from v$logstdby_state;

STATE
----------------------------------------------------------------
LOADING DICTIONARY

SQL> select state from v$logstdby_state;

STATE
----------------------------------------------------------------
LOADING DICTIONARY

SQL> select state from v$logstdby_state;

STATE
----------------------------------------------------------------
APPLYING

SQL> select state from v$logstdby_state;

STATE
----------------------------------------------------------------
IDLE

SQL>

Standby log miner dictionary build may hang at INITIALIZING for more than one minute. It could take up to 5 minutes to reach IDLE.
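Progress can also be judged numerically with v$logstdby_progress, the same view used again after the upgrade; apply has caught up when applied_scn approaches latest_scn:

```
SQL> select applied_scn, latest_scn, mining_scn from v$logstdby_progress;
```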

Foreign archivelogs

Archived logs identified as "foreign" may appear at the standby:

Wed Oct 16 02:21:23 2019
LOGMINER: End mining logfile for session 1 thread 1 sequence 33, +RECO01/XIENT/foreign_archivelog/EQTRD/2019_10_16/thread_1_seq_33.316.1021774867

Stop standby apply

When logical standby state reaches IDLE, stop standby apply.

On the primary:

SQL> alter system set log_archive_dest_state_2=defer scope=memory;

System altered.

From this point until the catchup, the primary continues to operate normally with users connected. Archived logs at the primary contain transactions that will be replayed at the standby later.

On the standby:

SQL> alter database stop logical standby apply;

Database altered.

Shutdown 12c

On the standby:

SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production

Upgrade

Upgrade the standby to 19c.

Set up the shell environment

Set the 19c environment.

On the standby:

[oracle@ip-172-31-91-148 ~]$ unset ORACLE_SID
[oracle@ip-172-31-91-148 ~]$ . oraenv
ORACLE_SID = [oracle] ?
ORACLE_HOME = [/home/oracle] ? /u01/app/oracle/product/19.3.0/dbhome_1
The Oracle base remains unchanged with value /u01/app/oracle
[oracle@ip-172-31-91-148 ~]$ export ORACLE_SID=XIENT

Upgrade the Oracle Restart configuration

[oracle@ip-172-31-91-148 ~]$ srvctl upgrade database -database XIENT -oraclehome /u01/app/oracle/product/19.3.0/dbhome_1

startup upgrade

SQL> startup upgrade
ORACLE instance started.

Total System Global Area 1577054672 bytes
Fixed Size                  8896976 bytes
Variable Size             385875968 bytes
Database Buffers         1174405120 bytes
Redo Buffers                7876608 bytes
Database mounted.
Database opened.

Check the status. The following output is expected:

SQL> select logins, status from v$instance;

LOGINS     STATUS
---------- ------------
RESTRICTED OPEN MIGRATE

SQL> select open_mode from v$database;

OPEN_MODE
--------------------
READ WRITE

oratab

Note that the Oracle Agent has updated the Oracle home in oratab.

[oracle@ip-172-31-91-148 ~]$ grep XIENT /etc/oratab
XIENT:/u01/app/oracle/product/19.3.0/dbhome_1:N         # line added by Agent

Start screen

If you lose your terminal session during the upgrade, you lose direct control of it. To recover, you would have to search global.autoupg_log_dir for the job state and work out how to resume, if necessary.

Protect your session with screen.

[oracle@ip-172-31-91-148 ~]$ screen

If you get disconnected, reconnect to screen:

[oracle@ip-172-31-91-148 ~]$ screen -r

You can detach from screen with ctrl-a-d.
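A named session makes reattaching unambiguous when more than one screen exists; the session name here is arbitrary:

```
[oracle@ip-172-31-91-148 ~]$ screen -S upgrade
[oracle@ip-172-31-91-148 ~]$ screen -r upgrade
```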

Run AutoUpgrade

Use AutoUpgrade to upgrade the standby database.

On the standby, in screen:

[oracle@ip-172-31-91-148 ~]$ AH=/u01/app/oracle/product/19.3.0/dbhome_1
[oracle@ip-172-31-91-148 ~]$ alias au='rlwrap $AH/jdk/bin/java -jar $AH/rdbms/admin/autoupgrade.jar'
[oracle@ip-172-31-91-148 ~]$ au -config xient.12c.19c.conf -mode upgrade
AutoUpgrade tool launched with default options
+--------------------------------+
| Starting AutoUpgrade execution |
+--------------------------------+
1 databases will be processed
Type 'help' to list console commands
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+-----------+
|Job#|DB_NAME|    STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|    MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+-----------+
| 105|  XIENT|DBUPGRADE|EXECUTING|RUNNING|19/10/12 02:47|     N/A|02:48:05|0%Upgraded |
+----+-------+---------+---------+-------+--------------+--------+--------+-----------+
Total jobs 1
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
|Job#|DB_NAME|    STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|     MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
| 105|  XIENT|DBUPGRADE|EXECUTING|RUNNING|19/10/12 02:47|     N/A|02:51:08|12%Upgraded |
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
Total jobs 1
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
|Job#|DB_NAME|    STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|     MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
| 105|  XIENT|DBUPGRADE|EXECUTING|RUNNING|19/10/12 02:47|     N/A|03:00:14|49%Upgraded |
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
Total jobs 1
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
|Job#|DB_NAME|    STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|     MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
| 105|  XIENT|DBUPGRADE|EXECUTING|RUNNING|19/10/12 02:47|     N/A|03:09:21|92%Upgraded |
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
Total jobs 1
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+-----------+
|Job#|DB_NAME|    STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|    MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+-----------+
| 105|  XIENT|DBUPGRADE|EXECUTING|RUNNING|19/10/12 02:47|     N/A|03:12:52|0%Compiled |
+----+-------+---------+---------+-------+--------------+--------+--------+-----------+
Total jobs 1
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
|Job#|DB_NAME|    STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|     MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
| 105|  XIENT|DBUPGRADE|EXECUTING|RUNNING|19/10/12 02:47|     N/A|03:15:56|80%Compiled |
+----+-------+---------+---------+-------+--------------+--------+--------+------------+
Total jobs 1
upg> lsj
+----+-------+---------+---------+--------+--------------+--------+--------+-------------------+
|Job#|DB_NAME|    STAGE|OPERATION|  STATUS|    START_TIME|END_TIME| UPDATED|            MESSAGE|
+----+-------+---------+---------+--------+--------------+--------+--------+-------------------+
| 105|  XIENT|DBUPGRADE|EXECUTING|FINISHED|19/10/12 02:47|     N/A|03:18:21|Restarting Database|
+----+-------+---------+---------+--------+--------------+--------+--------+-------------------+
Total jobs 1
upg> lsj
+----+-------+----------+---------+-------+--------------+--------+--------+-------------+
|Job#|DB_NAME|     STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|      MESSAGE|
+----+-------+----------+---------+-------+--------------+--------+--------+-------------+
| 105|  XIENT|POSTFIXUPS|EXECUTING|RUNNING|19/10/12 02:47|     N/A|03:22:21|Remaining 1/3|
+----+-------+----------+---------+-------+--------------+--------+--------+-------------+
Total jobs 1
upg> Job 105 completed
------------------- Final Summary --------------------
Number of databases            [ 1 ]

Jobs finished successfully     [1]
Jobs failed                    [0]
Jobs pending                   [0]
------------- JOBS FINISHED SUCCESSFULLY -------------
Job 105 FOR XIENT

Exit screen

You may exit screen now; exiting the shell ends the screen session.

Update the static listener

In the grid account, edit listener.ora so that the static entry for XIENT points to the 19c Oracle home. You can do this while waiting for the upgrade to finish.

SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (ORACLE_HOME = /u01/app/oracle/product/19.3.0/dbhome_1)
      (SID_NAME = XIENT)
    )
  )

Reload the listener

[grid@ip-172-31-91-148 admin]$ lsnrctl reload

LSNRCTL for Linux: Version 19.0.0.0.0 - Production on 12-OCT-2019 02:26:43

Copyright (c) 1991, 2019, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=ip-172-31-91-148.ec2.internal)(PORT=1521)))
The command completed successfully

Check the listener status

[grid@ip-172-31-91-148 admin]$ lsnrctl status

LSNRCTL for Linux: Version 19.0.0.0.0 - Production on 12-OCT-2019 02:26:48

Copyright (c) 1991, 2019, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=ip-172-31-91-148.ec2.internal)(PORT=1521)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 19.0.0.0.0 - Production
Start Date                11-OCT-2019 13:17:23
Uptime                    0 days 13 hr. 9 min. 25 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/19.3.0/grid/network/admin/listener.ora
Listener Log File         /u01/app/grid/diag/tnslsnr/ip-172-31-91-148/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=ip-172-31-91-148.ec2.internal)(PORT=1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1521)))
Services Summary...
Service "+ASM" has 1 instance(s).
  Instance "+ASM", status READY, has 1 handler(s) for this service...
Service "+ASM_DATA01" has 1 instance(s).
  Instance "+ASM", status READY, has 1 handler(s) for this service...
Service "+ASM_RECO01" has 1 instance(s).
  Instance "+ASM", status READY, has 1 handler(s) for this service...
Service "XIENT" has 1 instance(s).
  Instance "XIENT", status UNKNOWN, has 1 handler(s) for this service...
The command completed successfully

Continued operations on the 12c primary

Meanwhile, on the primary, which is still on 12c, users may continue to modify data:

SQL> select banner from v$version where banner like 'Oracle Database%';

BANNER
--------------------------------------------------------------------------------
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production

SQL> @ insert2.sql

1 row created.

SQL> commit;

Commit complete.

SQL> select ClOrdID from ordermessage order by TransactTime;

CLORDID
------------------------------
XL46W2NCVM22N02RQ2YK
DMYD9FS3FD3CYO77CU4R

Catch up the upgraded standby

Initial state

The standby is open and upgraded, but still lacks application data updates that were made to the primary during the upgrade.

SQL> select banner from v$version where banner like 'Oracle Database%';

BANNER
--------------------------------------------------------------------------------
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production

SQL> select ClOrdID from ordermessage order by TransactTime;

CLORDID
------------------------------
XL46W2NCVM22N02RQ2YK

With only one row, table ordermessage is out of date.

Resume transport

Resume transport from the primary to the standby:

Primary:

SQL> alter system set log_archive_dest_state_2=enable;

System altered.

Standby:

SQL> alter database start logical standby apply immediate;

Database altered.

Check for catchup on the standby:

Primary:

SQL> select current_scn from v$database;

CURRENT_SCN
-----------
1964941

Standby:

SQL> select applied_scn, latest_scn, mining_scn from v$logstdby_progress;

APPLIED_SCN LATEST_SCN MINING_SCN
----------- ---------- ----------
1964938        1964946    1964944

Check switchover status.

On the primary:

SQL> select database_role, switchover_status from v$database;

DATABASE_ROLE    SWITCHOVER_STATUS
---------------- --------------------
PRIMARY          TO STANDBY

Until now, the application was operating normally on the original primary. Primary application data is now caught up on the original standby.

Check the application data on the standby:

SQL> select ClOrdID from ordermessage order by TransactTime;

CLORDID
------------------------------
XL46W2NCVM22N02RQ2YK
DMYD9FS3FD3CYO77CU4R

The standby data is up to date!

Now it is time to quiesce the application. Otherwise, changes could be lost.

Switchover

We are going to switch over the application to the Transient Logical Standby. This is the actual outage. Get your stopwatch ready!

Original primary:

SQL> alter database commit to switchover to logical standby;

Database altered.

Start your stopwatch!

alter database commit to switchover to logical standby may run for about 4 seconds.

Original standby:

SQL> select database_role, switchover_status from v$database;

DATABASE_ROLE    SWITCHOVER_STATUS
---------------- --------------------
LOGICAL STANDBY  TO PRIMARY
SQL> alter database commit to switchover to logical primary;

Database altered.

alter database commit to switchover to logical primary may run for 15 seconds or more.

Stop!

The first outage is over. Users may connect to the new logical primary.

Flashback the original primary

The original primary is going to become the physical standby.

SQL> startup force mount
ORACLE instance started.

Total System Global Area 1577058304 bytes
Fixed Size                  2924832 bytes
Variable Size             503320288 bytes
Database Buffers         1056964608 bytes
Redo Buffers               13848576 bytes
Database mounted.
SQL> flashback database to restore point upgrd1;

Flashback complete.

Convert the original primary to physical standby

SQL> alter database convert to physical standby;

Database altered.

Shut down original primary

SQL> shutdown abort
ORACLE instance shut down.

Environment

Set the 19c environment:

[oracle@ip-172-31-88-93 ~]$ unset ORACLE_SID
[oracle@ip-172-31-88-93 ~]$ . oraenv
ORACLE_SID = [oracle] ?
ORACLE_HOME = [/home/oracle] ? /u01/app/oracle/product/19.3.0/dbhome_1
The Oracle base remains unchanged with value /u01/app/oracle
[oracle@ip-172-31-88-93 ~]$ ORACLE_SID=EQTRD

Upgrade the original primary Restart configuration

[oracle@ip-172-31-88-93 ~]$ srvctl upgrade database -database EQTRD -oraclehome /u01/app/oracle/product/19.3.0/dbhome_1

startup mount

Mount the original primary.

[oracle@ip-172-31-88-93 ~]$ srvctl start database -database EQTRD -startoption MOUNT

Check oratab

Notice that the Oracle Agent updated the Oracle home in oratab:

[oracle@ip-172-31-88-93 ~]$ grep EQTRD /etc/oratab
EQTRD:/u01/app/oracle/product/19.3.0/dbhome_1:N         # line added by Agent

Static listener

Edit the static listener. In the grid account on the original primary:

SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (ORACLE_HOME = /u01/app/oracle/product/19.3.0/dbhome_1)
      (SID_NAME = EQTRD)
    )
  )

Reload the listener

[grid@ip-172-31-88-93 admin]$ lsnrctl reload

LSNRCTL for Linux: Version 19.0.0.0.0 - Production on 15-OCT-2019 00:32:52

Copyright (c) 1991, 2019, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=ip-172-31-88-93.ec2.internal)(PORT=1521)))
The command completed successfully

Check the listener status

[grid@ip-172-31-88-93 admin]$ lsnrctl status

LSNRCTL for Linux: Version 19.0.0.0.0 - Production on 16-OCT-2019 03:56:08

Copyright (c) 1991, 2019, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=ip-172-31-88-93.ec2.internal)(PORT=1521)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 19.0.0.0.0 - Production
Start Date                16-OCT-2019 00:25:28
Uptime                    0 days 3 hr. 30 min. 40 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/19.3.0/grid/network/admin/listener.ora
Listener Log File         /u01/app/grid/diag/tnslsnr/ip-172-31-88-93/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=ip-172-31-88-93.ec2.internal)(PORT=1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1521)))
Services Summary...
Service "+ASM" has 1 instance(s).
  Instance "+ASM", status READY, has 1 handler(s) for this service...
Service "+ASM_DATA01" has 1 instance(s).
  Instance "+ASM", status READY, has 1 handler(s) for this service...
Service "+ASM_RECO01" has 1 instance(s).
  Instance "+ASM", status READY, has 1 handler(s) for this service...
Service "EQTRD" has 2 instance(s).
  Instance "EQTRD", status UNKNOWN, has 1 handler(s) for this service...
  Instance "EQTRD", status READY, has 1 handler(s) for this service...
The command completed successfully
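
The Services Summary lines follow a regular pattern, so a quick sanity check of instance statuses can be scripted rather than eyeballed. A minimal parsing sketch, based only on the output format shown above (not an Oracle-provided tool):

```python
import re

def parse_services(lsnrctl_output):
    """Collect (service, instance, status) tuples from `lsnrctl status` output."""
    entries = []
    service = None
    for line in lsnrctl_output.splitlines():
        stripped = line.strip()
        m = re.match(r'Service "([^"]+)" has \d+ instance\(s\)\.', stripped)
        if m:
            service = m.group(1)
            continue
        m = re.match(r'Instance "([^"]+)", status (\w+),', stripped)
        if m and service:
            entries.append((service, m.group(1), m.group(2)))
    return entries

sample = '''Service "EQTRD" has 2 instance(s).
  Instance "EQTRD", status UNKNOWN, has 1 handler(s) for this service...
  Instance "EQTRD", status READY, has 1 handler(s) for this service...'''

print(parse_services(sample))
```

Here the mix of UNKNOWN (static registration) and READY (dynamic registration) for the same service is exactly what we expect after re-enabling the static listener.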

Enable the Data Guard configuration

Connect to the Transient Logical Standby:

[oracle@ip-172-31-91-148 ~]$ dgmgrl sys/Zystm.22@XIENT
DGMGRL for Linux: Release 19.0.0.0.0 - Production on Sat Oct 12 05:17:06 2019
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

Welcome to DGMGRL, type "help" for information.
Connected to "XIENT"
Connected as SYSDBA.

DGMGRL> enable configuration
Enabled.

The Data Guard broker will detect that XIENT is the new primary, start log transport to EQTRD, and start managed standby recovery on EQTRD. Wait 5 to 10 minutes until recovery at the original primary has caught up.

A message such as the following should appear in the original primary alert log:

2019-10-17T02:24:01.118981+00:00
PR00 (PID:9485): Media Recovery Waiting for T-1.S-31 (in transit)

Check the Data Guard configuration

DGMGRL> show configuration

Configuration - ORCL_CONFIG

  Protection Mode: MaxPerformance
  Members:
  XIENT - Primary database
    EQTRD - Physical standby database

Fast-Start Failover:  Disabled

Configuration Status:
SUCCESS   (status updated 20 seconds ago)

Errors

The following errors should not appear if no steps were missed.

ORA-1033: ORACLE initialization or shutdown in progress

DGMGRL> add database 'EQTRD' as connect identifier is 'EQTRD' maintained as physical;
Error: ORA-1033: ORACLE initialization or shutdown in progress

Failed.

There could be a password file issue. Copy the 12c orapw to the 19c home.

On the original primary:

[oracle@ip-172-31-88-93 ~]$ cp -p /u01/app/oracle/product/12.1.0/dbhome_1/dbs/orapwEQTRD /u01/app/oracle/product/19.3.0/dbhome_1/dbs/orapwEQTRD

Error: ORA-16698: member has a LOG_ARCHIVE_DEST_n parameter with SERVICE attribute set

DGMGRL> add database 'EQTRD' as connect identifier is 'EQTRD' maintained as physical;
Error: ORA-16698: member has a LOG_ARCHIVE_DEST_n parameter with SERVICE attribute set

Solution: clear log_archive_dest_2.

On the original primary:

SQL> alter system set log_archive_dest_2='';

System altered.

Switchover to the original primary

The next step is going to result in another application outage.

DGMGRL> switchover to EQTRD
Performing switchover NOW, please wait...
Operation requires a connection to database "EQTRD"
Connecting ...
Connected to "EQTRD"
Connected as SYSDBA.
New primary database "EQTRD" is opening...
Oracle Clusterware is restarting database "XIENT" ...
Connected to an idle instance.
Connected to an idle instance.
Connected to "XIENT"
Connected to "XIENT"
Switchover succeeded, new primary is "eqtrd"

The switchover completed in less than 40s.

Flashback to restore point

This section contains backout procedures that you can use on the primary or the standby.

Remove or disable Data Guard

You can try to flashback the primary and the standby and resume Data Guard. If you do, you may notice an error such as this one:

In the broker:

ERROR ORA-16700: the standby database has diverged from the primary database

In the standby alert log:

Thu Oct 17 03:53:40 2019
Errors in file /u01/app/oracle/diag/rdbms/xient/XIENT/trace/XIENT_mrp0_3527.trc:
ORA-19909: datafile 1 belongs to an orphan incarnation
ORA-01110: data file 1: '+DATA01/XIENT/DATAFILE/system.271.1021840189'

For the sake of demonstration, we will remove Data Guard and then flashback and open either the primary or the standby, but not both.

Remove the Data Guard broker configuration.

DGMGRL> disable configuration
Disabled.
DGMGRL> remove configuration
Removed configuration
DGMGRL>

Stop the brokers and disable transport.

On the primary:

SQL> alter system set log_archive_dest_2='';

System altered.

SQL> alter system set dg_broker_start=false;

System altered.

On the standby:

SQL> alter system set dg_broker_start=false;

System altered.

Flashback the primary

SQL> startup force mount
ORACLE instance started.

Total System Global Area 1577058304 bytes
Fixed Size                  2924832 bytes
Variable Size             503320288 bytes
Database Buffers         1056964608 bytes
Redo Buffers               13848576 bytes
Database mounted.
SQL> flashback database to restore point upgrd1;

Flashback complete.

Clear parameters that are invalid in 12.1. Shut down abort.

SQL> alter system reset "__unified_pga_pool_size" sid = 'EQTRD' scope=spfile;

System altered.

SQL> shutdown abort
ORACLE instance shut down.

Downgrade Restart.

[oracle@ip-172-31-88-93 ~]$ srvctl downgrade database -database EQTRD -oraclehome /u01/app/oracle/product/12.1.0/dbhome_1 -targetversion 12.1.0.2

Set the 12c environment.

[oracle@ip-172-31-88-93 ~]$ unset ORACLE_SID
[oracle@ip-172-31-88-93 ~]$ . oraenv
ORACLE_SID = [oracle] ?
ORACLE_HOME = [/home/oracle] ? /u01/app/oracle/product/12.1.0/dbhome_1
The Oracle base remains unchanged with value /u01/app/oracle
[oracle@ip-172-31-88-93 ~]$ ORACLE_SID=EQTRD

Startup mount and open resetlogs.

SQL> startup mount
ORACLE instance started.

Total System Global Area 1577058304 bytes
Fixed Size                  2924832 bytes
Variable Size             503320288 bytes
Database Buffers         1056964608 bytes
Redo Buffers               13848576 bytes
Database mounted.
SQL> alter database open resetlogs;

Database altered.

Notice that the Agent has updated oratab to 12c.

[oracle@ip-172-31-88-93 ~]$ grep EQTRD /etc/oratab
EQTRD:/u01/app/oracle/product/12.1.0/dbhome_1:N         # line added by Agent

Check the application data. Changes that were made during the upgrade have been lost.

SQL> select ClOrdID from ordermessage order by TransactTime;

CLORDID
------------------------------
XL46W2NCVM22N02RQ2YK

Rollback of primary to 12c is done.

Flashback the standby

As an alternative, flashback the standby.

SQL> recover managed standby database cancel
Media recovery complete.
SQL> flashback database to restore point upgrd2;

Flashback complete.

SQL> alter database activate physical standby database;

Database altered.

Reset parameters that are invalid in 12.1. Shut down abort.

SQL> alter system reset "__unified_pga_pool_size" sid = 'XIENT' scope=spfile;

System altered.

SQL> shutdown abort
ORACLE instance shut down.

Downgrade Restart.

[oracle@ip-172-31-91-148 ~]$ srvctl downgrade database -database XIENT -oraclehome /u01/app/oracle/product/12.1.0/dbhome_1 -targetversion 12.1.0.2

Set the 12c environment. Startup.

[oracle@ip-172-31-91-148 ~]$ unset ORACLE_SID
[oracle@ip-172-31-91-148 ~]$ . oraenv
ORACLE_SID = [oracle] ?
ORACLE_HOME = [/home/oracle] ? /u01/app/oracle/product/12.1.0/dbhome_1
The Oracle base remains unchanged with value /u01/app/oracle
[oracle@ip-172-31-91-148 ~]$ export ORACLE_SID=XIENT

SQL> startup
ORACLE instance started.

Total System Global Area 1577058304 bytes
Fixed Size                  2924832 bytes
Variable Size             503320288 bytes
Database Buffers         1056964608 bytes
Redo Buffers               13848576 bytes
Database mounted.
Database opened.

Notice that the Agent has updated oratab.

[oracle@ip-172-31-91-148 ~]$ grep XIENT /etc/oratab
XIENT:/u01/app/oracle/product/12.1.0/dbhome_1:N         # line added by Agent

Check the application data.

SQL> select ClOrdID from ordermessage order by TransactTime;

CLORDID
------------------------------
XL46W2NCVM22N02RQ2YK

Conclusion to flashbacks

This section was an outline of two possible, independent rollback procedures. Flashback will result in data loss. Further refinement is required.

Prior to beginning, we had moved spfile to ASM.

This concludes the section on flashback.

Final steps

If there are no problems with the upgrade, continue.

Drop the restore point

Primary:

SQL> drop restore point UPGRD1;

Restore point dropped.

Standby:

SQL> drop restore point UPGRD2;

Restore point dropped.

Update compatible

Primary:

SQL> alter system set compatible = '19.3.0.0.0' scope=spfile;

System altered.

SQL> startup force
ORACLE instance started.

Total System Global Area 1577058304 bytes
Fixed Size                  2924832 bytes
Variable Size             503320288 bytes
Database Buffers         1056964608 bytes
Redo Buffers               13848576 bytes
Database mounted.
Database opened.

Standby:

SQL> alter system set compatible = '19.3.0.0.0' scope=spfile;

System altered.

SQL> startup force mount
ORACLE instance started.

Total System Global Area 1577058304 bytes
Fixed Size                  2924832 bytes
Variable Size             503320288 bytes
Database Buffers         1056964608 bytes
Redo Buffers               13848576 bytes
Database mounted.
SQL>

compatible error

You cannot change compatibility if a guaranteed restore point exists.

SQL> alter system set compatible = '19.3.0.0.0' scope=spfile;

System altered.

SQL> startup force mount
ORACLE instance started.

Total System Global Area 1577054672 bytes
Fixed Size                  8896976 bytes
Variable Size             536870912 bytes
Database Buffers         1023410176 bytes
Redo Buffers                7876608 bytes
ORA-38880: Cannot advance compatibility from 12.1.0.2.0 to 19.3.0.0.0 due to
guaranteed restore points

Solutions:

  1. Set compatible to its original value.
    SQL> alter system set compatible = '12.1.0.2.0' scope=spfile;
    
    System altered.
    
    SQL> startup force mount
    ORACLE instance started.
    
    Total System Global Area 1577054672 bytes
    Fixed Size                  8896976 bytes
    Variable Size             536870912 bytes
    Database Buffers         1023410176 bytes
    Redo Buffers                7876608 bytes
    Database mounted.
    SQL>
    

    or:

  2. Drop the restore point.

Delete foreign archive logs

RMAN> list foreign archivelog all;

RMAN> delete noprompt foreign archivelog all;

Conclusion

Oracle Database was upgraded from version 12.1 to version 19.3 using AutoUpgrade and Transient Logical Standby. The actual application outage was 10 seconds.

Notes:

  • The initial state was a working Data Guard system with a physical standby.
  • Oracle Support provides the rolling upgrade script physru.
  • In case of a problem with the upgrade, it is possible to flashback to a point in time before the upgrade.
  • This is a zero data loss upgrade.
  • There are two brief outages.
  • The rollback is not fully developed in this article.
  • As described here, downgrade to 12c results in data loss.

 

move spfile to ASM in one step

By Brian Fitzgerald

Introduction

Here is a simple technique for relocating the Oracle Database spfile from the file system to ASM in a single RMAN step in Oracle Restart. No direct ASM access is required. The technique applies to Oracle Database 12c and up.

tl;dr

RMAN> restore from service 'EQTRD' spfile to '+DATA01';
SQL> startup force

Initial condition

Note the Restart configuration

Note the db_unique_name and the spfile location.

[oracle@ip-172-31-88-93 ~]$ srvctl config database -database EQTRD
Database unique name: EQTRD
Database name: EQTRD
Oracle home: /u01/app/oracle/product/19.3.0/dbhome_1
Oracle user: oracle
Spfile: /u01/app/oracle/product/19.3.0/dbhome_1/dbs/spfileEQTRD.ora
Password file:
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Disk Groups: RECO01,DATA01
Services:
OSDBA group:
OSOPER group:
Database instance: EQTRD

ASM

Check that no spfile exists in ASM for your db_unique_name:

[grid@ip-172-31-88-93 ~]$ asmcmd ls +DATA01/EQTRD/PARAMETERFILE
ASMCMD-8002: entry 'PARAMETERFILE' does not exist in directory '+DATA01/EQTRD/'

No spfile was found in ASM.

Check Restart

The initial condition should be a working Oracle Restart.

[oracle@ip-172-31-88-93 ~]$ srvctl stop database -database EQTRD
[oracle@ip-172-31-88-93 ~]$ srvctl start database -database EQTRD
[oracle@ip-172-31-88-93 ~]$

Restore

Restore the spfile with a single RMAN command. You must connect to Oracle through a service; “rman target /” will not work in this case. You only need to specify the target disk group; you do not need to specify the full ASM directory path.

[oracle@ip-172-31-88-93 ~]$ rman target sys/Zystm.22@EQTRD

Recovery Manager: Release 19.0.0.0.0 - Production on Sun Oct 13 18:12:25 2019
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

connected to target database: EQTRD (DBID=625173207)

RMAN> restore from service 'EQTRD' spfile to '+DATA01';

Starting restore at 13-OCT-19
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=282 device type=DISK

channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: using network backup set from service EQTRD
channel ORA_DISK_1: restoring SPFILE
output file name=+DATA01
channel ORA_DISK_1: restore complete, elapsed time: 00:00:01
Finished restore at 13-OCT-19

RMAN>

Recovery Manager complete.

The RMAN technique uses the “restore from service” syntax, available in Oracle Database 12c and up.

Check Restart

Notice that the Restart configuration got updated:

[oracle@ip-172-31-88-93 ~]$ srvctl config database -database EQTRD
Database unique name: EQTRD
Database name: EQTRD
Oracle home: /u01/app/oracle/product/19.3.0/dbhome_1
Oracle user: oracle
Spfile: +DATA01/EQTRD/PARAMETERFILE/spfile.280.1021572757
Password file:
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Disk Groups: RECO01,DATA01
Services:
OSDBA group:
OSOPER group:
Database instance: EQTRD

Notice that we moved the spfile and learned its new location without direct access to ASM. In an organization with segregation of duties, the RDBMS DBA can move the spfile to ASM without coordinating with the ASM DBA.

Check ASM

For confirmation, we can check ASM.

[grid@ip-172-31-88-93 ~]$ asmcmd ls +DATA01/EQTRD/PARAMETERFILE/spfile.280.1021572757
spfile.280.1021572757

Remove old spfile

Remove the old spfile from the file system (if you dare!).

The Oracle base has been set to /u01/app/oracle
[oracle@ip-172-31-88-93 ~]$ rm /u01/app/oracle/product/19.3.0/dbhome_1/dbs/spfileEQTRD.ora
[oracle@ip-172-31-88-93 ~]$ 

Restart the database

You must restart the database instance now. You may use sqlplus or srvctl.

SQL> startup force
ORACLE instance started.

Total System Global Area 2365586088 bytes
Fixed Size                  8899240 bytes
Variable Size             520093696 bytes
Database Buffers         1828716544 bytes
Redo Buffers                7876608 bytes
Database mounted.
Database opened.
SQL>

Done!

Only one RMAN command was needed to move the spfile to ASM.

Reversal

You can move spfile in the reverse direction, from ASM to the file system.

Check the file system

Note that the spfile does not exist on the file system:

 [oracle@ip-172-31-88-93 ~]$ ls /u01/app/oracle/product/19.3.0/dbhome_1/dbs/spfileEQTRD.ora
ls: cannot access /u01/app/oracle/product/19.3.0/dbhome_1/dbs/spfileEQTRD.ora: No such file or directory

Restore spfile to file system

[oracle@ip-172-31-88-93 ~]$ rman target sys/Zystm.22@EQTRD

Recovery Manager: Release 19.0.0.0.0 - Production on Sun Oct 13 18:40:05 2019
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

connected to target database: EQTRD (DBID=625173207)

RMAN> restore from service 'EQTRD' spfile to '/u01/app/oracle/product/19.3.0/dbhome_1/dbs/spfileEQTRD.ora';

Starting restore at 13-OCT-19
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=55 device type=DISK

channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: using network backup set from service EQTRD
channel ORA_DISK_1: restoring SPFILE
output file name=/u01/app/oracle/product/19.3.0/dbhome_1/dbs/spfileEQTRD.ora
channel ORA_DISK_1: restore complete, elapsed time: 00:00:02
Finished restore at 13-OCT-19

RMAN>

Moving the spfile to the file system required no access to ASM and no knowledge of the ASM file name.

Check that spfile exists on the file system

[oracle@ip-172-31-88-93 ~]$ ls -l /u01/app/oracle/product/19.3.0/dbhome_1/dbs/spfileEQTRD.ora
-rw-r-----. 1 oracle dba 3584 Oct 13 18:40 /u01/app/oracle/product/19.3.0/dbhome_1/dbs/spfileEQTRD.ora

Check that Restart is updated

Remove old spfile from ASM

[ec2-user@ip-172-31-88-93 ~]$ sudo -i -u grid
The Oracle base has been set to /u01/app/grid
[grid@ip-172-31-88-93 ~]$ asmcmd rm +DATA01/EQTRD/PARAMETERFILE/spfile.280.1021572757
[grid@ip-172-31-88-93 ~]$

Restart the database

Instead of using sqlplus, you may use srvctl.

[oracle@ip-172-31-88-93 ~]$ srvctl stop database -database EQTRD -stopoption ABORT
[oracle@ip-172-31-88-93 ~]$ srvctl start database -database EQTRD
[oracle@ip-172-31-88-93 ~]$

Conclusion

Previously, moving spfile TO ASM required five steps:

  1. Create pfile from spfile
  2. Create spfile from pfile
  3. Connect to ASM and identify the file name.
  4. Update CRS by running srvctl modify database
  5. Restart the instance

By the way, some DBAs pre-create an ASM directory. I don’t know why. Some DBAs specify an ASM alias name for the spfile. I just let ASM generate a random, unique name.

spfile was moved from the file system in a single step. It was unnecessary to create an intermediate pfile, nor was it necessary to know the path to the source spfile. You only need to specify the ASM disk group. The technique did not require the DBA to access ASM to learn the location of the spfile. There is no separate srvctl modify database step. RMAN takes care of that.

In RAC, spfile centralization is an administrative win. Settings for multiple instances can be stored in a single file. Parameter changes that apply to all instances, or to only one instance, can be made even if a RAC node is down. Parameters follow an instance even if the instance is relocated. The same goes for RAC One Node.

Remember that a Restart instance can be relocated, too. With spfile in ASM, there is no need to move spfile during relocation. If you upgrade the Oracle version, you don’t have to be concerned about moving the spfile or reconfiguring Restart.

RMAN-08137: WARNING: archived log not deleted, needed for standby or upstream capture process

By Brian Fitzgerald

Introduction

After dropping a logical standby database, RMAN-08137 appeared when deleting archived logs on the primary. Clearing log_archive_dest_2 resolved the issue.

Problem

During testing, I dropped a logical standby database. When I attempted to delete archive logs on the primary, RMAN-08137 appeared.

RMAN> delete noprompt archivelog all;

using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=39 device type=DISK
RMAN-08137: WARNING: archived log not deleted, needed for standby or upstream capture process
archived log file name=+RECO01/EQTRD/ARCHIVELOG/2019_10_12/thread_1_seq_24.263.1021422689 thread=1 sequence=24

Solution

Since the standby database has been dropped, the appropriate solution is to clear log_archive_dest_2.

SQL> alter system set log_archive_dest_2 = '';

System altered.

Now the delete succeeds:

RMAN> delete noprompt archivelog all;

using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=39 device type=DISK
List of Archived Log Copies for database with db_unique_name EQTRD
=====================================================================

Key     Thrd Seq     S Low Time
------- ---- ------- - ---------
33      1    24      A 11-OCT-19
        Name: +RECO01/EQTRD/ARCHIVELOG/2019_10_12/thread_1_seq_24.263.1021422689

deleted archived log
archived log file name=+RECO01/EQTRD/ARCHIVELOG/2019_10_12/thread_1_seq_24.263.1021422689 RECID=33 STAMP=1021422689
Deleted 1 objects

No force is needed.

Conclusion

There are numerous causes for RMAN-08137 relating to GoldenGate and Streams. These situations are described in the blogs and forums. Also, other logical standby situations have a different solution. Our case here is simple. We dropped the logical standby. The solution is also simple: Clear log_archive_dest_2. Thanks to Dean Capps for describing this solution. Refer also to Note 2169282.1.

 

Unexpected behavior in Data Guard Maximum Protection

By Brian Fitzgerald

Introduction

Maximum Protection mode provides the highest level of data protection in Data Guard. Although a Maximum Protection system protects data as documented, there are performance issues that database administrators should be aware of. The most glaring weakness is that a network timeout anywhere in the system causes the primary database to hang. Administrators should understand this before deciding on Maximum Protection mode.

Protection Modes

To quote the manual, Oracle Data Guard ensures high availability, data protection, and disaster recovery for enterprise data. Oracle Data Guard provides three distinct modes of data protection:

Maximum Performance

This is the default protection mode. It provides the highest level of data protection that is possible without affecting the performance of a primary database. 

Maximum Availability

This protection mode provides the highest level of data protection that is possible without compromising the availability of a primary database. 

Maximum Protection

This protection mode ensures that no data loss occurs if the primary database fails. To provide this level of protection, the redo data needed to recover a transaction must be written to both the online redo log and to the standby redo log on at least one synchronized standby database before the transaction commits. To ensure that data loss cannot occur, the primary database shuts down, rather than continue processing transactions, if it cannot write its redo stream to at least one synchronized standby database.

This article is about Maximum Protection mode. Based on the description, a few obvious points can be deduced about Maximum Protection. First, a practical system must have more than one standby; otherwise, failure of the sole standby will cause the system to crash. Second, a low-latency network is required to achieve reasonable commit performance. Finally, completely redundant hosts and storage are required for each standby. In the past, with physical hardware and long administrative lead times, it was prohibitive to experiment with Maximum Protection. In the cloud, however, virtual machines are inexpensive and setup is quick.

License

Data Guard is a feature of Oracle Database Enterprise Edition itself and does not require separate licensing. By using Amazon Elastic Compute Cloud (EC2), you can control your license costs by configuring only the CPUs that you need.

Overview

The demonstration system characteristics are:

  • Amazon AWS
  • Region us-west-1 (Northern California)
  • Availability zones us-west-1b and us-west-1c
  • Elastic Compute Cloud (EC2)
  • Red Hat Linux 7.2
  • Oracle Grid Infrastructure 19c
  • Oracle Database 19c

AWS availability zones are separate locations within the same geographic region. In us-west-1, intra-availability-zone TCP latency, as measured by qperf, is 136μs, which is attractive for setting up Oracle Data Guard Maximum Protection. Inter-availability-zone TCP latency is 562μs. Note that availability zone identifiers are mapped independently for each account; for example, availability zone us-west-1b can refer to different locations for different accounts. Here is the mapping from database to availability zone used in this test.

db_unique_name    availability zone
SFB1              us-west-1b
SFB2              us-west-1b
SFC1              us-west-1c
SFC2              us-west-1c
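
With SYNC redo transport, each commit must wait at least one network round trip to the standby, so the latencies measured above put a hard ceiling on the commit rate of a single serial session. A back-of-the-envelope sketch, ignoring disk and CPU time (which only lower the ceiling):

```python
# Ceiling on serial commit rate imposed by SYNC redo transport:
# each commit waits at least one network round trip to the standby.
# Latency figures are the qperf measurements quoted above.
intra_az_rtt_s = 136e-6   # 136 microseconds, within one availability zone
inter_az_rtt_s = 562e-6   # 562 microseconds, between availability zones

def max_commits_per_second(rtt_seconds):
    """Upper bound for a single session, ignoring disk and CPU time."""
    return 1.0 / rtt_seconds

print(round(max_commits_per_second(intra_az_rtt_s)))  # ~7353 commits/s
print(round(max_commits_per_second(inter_az_rtt_s)))  # ~1779 commits/s
```

Even the inter-zone figure is comfortable for many workloads, which is why these latencies make Maximum Protection worth testing at all.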

Setup

Here is a brief summary of the preparatory steps:

  • environment:
    • create and configure virtual machine
    • install grid infrastructure
    • create ASM disk groups
    • install oracle database software
    • setup static listener
  • duplicate database
    • setup tnsnames.ora
    • create orapwd
    • create temporary init ora
    • startup nomount
    • duplicate target database for standby
  • setup Oracle Restart
  • validate all static listeners
  • start Data Guard broker

Create configuration

DGMGRL> create configuration 'CFG' primary database is 'SFB1' connect identifier is 'SFB1';
Configuration "CFG" created with primary database "SFB1"
DGMGRL> add database 'SFB2' as connect identifier is 'SFB2';
Database "SFB2" added
DGMGRL> add database 'SFC1' as connect identifier is 'SFC1';
Database "SFC1" added
DGMGRL> add database 'SFC2' as connect identifier is 'SFC2';
Database "SFC2" added

Set the log transport property to SYNC, then set Maximum Availability.

DGMGRL> edit database SFB2 set property LogXptMode=SYNC;
Property "logxptmode" updated
DGMGRL> edit database SFC1 set property LogXptMode=SYNC;
Property "logxptmode" updated
DGMGRL> edit database SFC2 set property LogXptMode=SYNC;
Property "logxptmode" updated
DGMGRL> edit configuration set protection mode as MaxAvailability;
Succeeded.

Test switchovers. Primary SFB1->SFB2->SFC1->SFC2->SFB1.

DGMGRL> switchover to SFB2
Performing switchover NOW, please wait...
Operation requires a connection to database "SFB2"
Connecting ...
Connected to "SFB2"
Connected as SYSDBA.
New primary database "SFB2" is opening...
Oracle Clusterware is restarting database "SFB1" ...
Connected to "SFB1"
Connected to an idle instance.
Connected to an idle instance.
Connected to an idle instance.
Connected to "SFB1"
Connected to "SFB1"
Switchover succeeded, new primary is "sfb2"
DGMGRL> switchover to SFC1
Performing switchover NOW, please wait...
Operation requires a connection to database "SFC1"
Connecting ...
Connected to "SFC1"
Connected as SYSDBA.
New primary database "SFC1" is opening...
Oracle Clusterware is restarting database "SFB2" ...
Connected to "SFB2"
Connected to an idle instance.
Connected to an idle instance.
Connected to an idle instance.
Connected to "SFB2"
Connected to "SFB2"
Switchover succeeded, new primary is "sfc1"
DGMGRL> switchover to SFC2
Performing switchover NOW, please wait...
Operation requires a connection to database "SFC2"
Connecting ...
Connected to "SFC2"
Connected as SYSDBA.
New primary database "SFC2" is opening...
Oracle Clusterware is restarting database "SFC1" ...
Connected to "SFC1"
Connected to an idle instance.
Connected to an idle instance.
Connected to an idle instance.
Connected to "SFC1"
Connected to "SFC1"
Switchover succeeded, new primary is "sfc2"
DGMGRL> switchover to SFB1
Performing switchover NOW, please wait...
Operation requires a connection to database "SFB1"
Connecting ...
Connected to "SFB1"
Connected as SYSDBA.
New primary database "SFB1" is opening...
Oracle Clusterware is restarting database "SFC2" ...
Connected to "SFC2"
Connected to an idle instance.
Connected to an idle instance.
Connected to "SFC2"
Connected to "SFC2"
Switchover succeeded, new primary is "sfb1"

Now we are ready to implement Maximum Protection mode.

Set Maximum Protection

DGMGRL> edit configuration set protection mode as MaxProtection; 
Succeeded.

Test switchovers again. There is no issue.

DGMGRL> switchover to SFC2
Performing switchover NOW, please wait...
Operation requires a connection to database "SFC2"
Connecting ...
Connected to "SFC2"
Connected as SYSDBA.
New primary database "SFC2" is opening...
Oracle Clusterware is restarting database "SFB1" ...
Connected to "SFB1"
Connected to an idle instance.
Connected to an idle instance.
Connected to an idle instance.
Connected to "SFB1"
Connected to "SFB1"
Switchover succeeded, new primary is "sfc2"

Network failures

Failure of some, but not all standbys

Testing actual network failures is where you will notice serious issues with Maximum Protection mode.

The initial status is normal. All standbys are in sync with the primary.

DGMGRL> show configuration

Configuration - CFG

  Protection Mode: MaxProtection
  Members:
  SFB1 - Primary database
    SFB2 - Physical standby database
    SFC1 - Physical standby database
    SFC2 - Physical standby database

Fast-Start Failover:  Disabled

Configuration Status:
SUCCESS   (status updated 21 seconds ago)

Now we want to cause a network failure to one standby, SFC2, for example. We stage the failure by cutting TCP port 1521. In the AWS console, create a new security group, cutoff.sg, that allows ingress to TCP port 22 only.

Go to Change Security Groups, uncheck the original security group, check the ssh-only cutoff.sg group, and press “Assign Security Groups”. There are three standbys, but we are cutting off network access to only one standby. The system should continue to operate normally. However, right away you will notice a problem. Try to commit a transaction.

SQL> insert into t ( n ) values ( 0 );

1 row created.

SQL> commit;

You will find that the transaction does not complete. You can run “show configuration”. For 30 seconds, the configuration shows up as normal. No messages appear in the alert log. All the while, only one standby is unreachable. Two other standbys are reachable, but the database hangs. Eventually, Oracle notices the issue. Messages start appearing in the alert log.

2019-09-13T12:43:51.076718-04:00
LGWR (PID:26748): ORA-16198: Received timed out error from KSR
LGWR (PID:26748): Attempting LAD:4 network reconnect (16198)
LGWR (PID:26748): LAD:4 network reconnect abandoned

The transaction commits. The configuration now appears as:

DGMGRL> show configuration

Configuration - CFG

  Protection Mode: MaxProtection
  Members:
  SFB1 - Primary database
    SFB2 - Physical standby database
    SFC1 - Physical standby database
    SFC2 - Physical standby database
      Warning: ORA-16857: member disconnected from redo source for longer than specified threshold

Fast-Start Failover:  Disabled

Configuration Status:
WARNING   (status updated 8 seconds ago)

You can repeat the process for a second standby, SFC1, for example. Again, the database hangs for 30 seconds, and finally resolves.

DGMGRL> show configuration

Configuration - CFG

  Protection Mode: MaxProtection
  Members:
  SFB1 - Primary database
    Error: ORA-16778: redo transport error for one or more members

    SFB2 - Physical standby database
    SFC1 - Physical standby database
      Warning: ORA-16857: member disconnected from redo source for longer than specified threshold

    SFC2 - Physical standby database
      Warning: ORA-16857: member disconnected from redo source for longer than specified threshold

Fast-Start Failover:  Disabled

Configuration Status:
ERROR   (status updated 160 seconds ago)
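
One way to think about the stalls observed above is that LGWR waits on every SYNC destination, giving up on an unreachable one only after NetTimeout seconds (default 30). The following sketch is a hypothetical model of that behavior, consistent with the observations here but not Oracle code; the function name and parameters are mine:

```python
# Hypothetical model of commit latency with SYNC redo destinations.
# Assumption (consistent with the observed 30-second hangs): LGWR waits
# on every SYNC destination, abandoning an unreachable one only after
# net_timeout seconds (the NetTimeout broker property, default 30).
def commit_stall_seconds(dest_rtts, net_timeout=30.0):
    """dest_rtts: round-trip time per standby in seconds, None if unreachable."""
    waits = [net_timeout if rtt is None else rtt for rtt in dest_rtts]
    return max(waits) if waits else 0.0

# Three standbys: two healthy (sub-millisecond), one cut off.
print(commit_stall_seconds([562e-6, 562e-6, None]))  # 30.0
```

In this model, lowering NetTimeout to 5 shortens the stall to five seconds but does not eliminate it, which matches the workaround discussed below being a mitigation rather than a fix.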

This behavior, where failure of some, but not all, standbys leads to an application hang, is going to be objectionable to some application users.

Failure of all standbys

Now disconnect the third standby. Observe this behavior:

  • The primary will retry the standbys for 5 minutes.
  • Sessions that issue a commit will wait.
  • The primary will abort and restart.
  • During restart, the instance will hang on open database.
  • After 8 more minutes, the primary will abort.

Shutdown of the primary is the expected behavior after a failure of all standbys. The following message will appear in SQL*Plus sessions that were waiting on commit.

commit
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 10524
Session ID: 39 Serial number: 7415

Some may find it surprising that the shutdown is not instantaneous, and that it takes five minutes before the database shuts down.

Adjusting NetTimeout

You can reduce the time an application waits on a TCP timeout by lowering the NetTimeout broker property:

DGMGRL> edit database SFB1 set property NetTimeout = 5;
Property "nettimeout" updated
DGMGRL> edit database SFB2 set property NetTimeout = 5;
Property "nettimeout" updated
DGMGRL> edit database SFC1 set property NetTimeout = 5;
Property "nettimeout" updated
DGMGRL> edit database SFC2 set property NetTimeout = 5;
Property "nettimeout" updated

This is a workaround, and it is beside the point. The application still has to wait to commit even though one or more standbys remain available. There should be no hanging at all.
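To confirm that a member picked up the new value, you can query the property directly in DGMGRL:

```
DGMGRL> show database 'SFB1' 'NetTimeout';
```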

Conclusion

There is a flaw in Data Guard Maximum Protection. If a TCP timeout involves only one standby, the primary hangs even when other standbys are available. This issue will lead some administrators to rule out Data Guard Maximum Protection.

In the past, because of high hardware costs, long lead times, and separation of duties between DBAs and SAs, it was rare to test Maximum Protection. Now, with the cloud, testing such arrangements is convenient, and testing uncovers weaknesses in the design.

Likewise, in the past, because of separation of duties across DBAs and network administrators, it was rare for DBAs to stage TCP timeout tests, but such tests are simple to perform now. TCP timeout tests uncover weaknesses in Data Guard.

Data Guard Maximum Availability has improved over time to the point where it is an acceptable substitute for Maximum Performance in some cases.

Oracle 12.1 introduced the far sync instance, which can be configured near a primary database on a low-latency network to provide near-zero data loss capability. Maximum Availability and far sync are described in this article. An Active Data Guard license is required for far sync. Ironically, in this sense, Maximum Availability costs more than Maximum Protection, the highest level.

srvctl enable database not working in Restart

By Brian Fitzgerald

Introduction

In Restart, srvctl enable database does not actually enable database restart. This could be a bug.

Demonstration

srvctl add database

[oracle@ip-172-31-86-22 ~]$ srvctl stop database -database NY
[oracle@ip-172-31-86-22 ~]$ srvctl remove database -d NY -y
[oracle@ip-172-31-86-22 ~]$ srvctl add database -database NY -role PRIMARY -stopoption IMMEDIATE -instance ORCL -oraclehome /u01/app/oracle/product/19.3.0/dbhome_1 -spfile +DATA01/NY/PARAMETERFILE/spfile.263.1018152951 -diskgroup DATA01,RECO01
[oracle@ip-172-31-86-22 ~]$ logout
[ec2-user@ip-172-31-86-22 ~]$ sudo reboot

Log back in:

[oracle@ip-172-31-86-22 ~]$ ps -ef| grep smon
grid      2214     1  0 14:06 ?        00:00:00 asm_smon_+ASM
oracle    2239  2115  0 14:06 pts/0    00:00:00 grep --color=auto smon

srvctl enable database

You could try:

[oracle@ip-172-31-86-22 ~]$ srvctl enable database -d NY
PRCC-1010 : NY was already enabled
PRCR-1002 : Resource ora.ny.db is already enabled

Reboot, log back in, and check:

[oracle@ip-172-31-86-22 ~]$ logout 
[ec2-user@ip-172-31-86-22 ~]$ sudo reboot
[oracle@ip-172-31-86-22 ~]$ ps -ef| grep smon
grid      2112     1  0 15:10 ?        00:00:00 asm_smon_+ASM
oracle    2807  2782  0 15:16 pts/0    00:00:00 grep --color=auto smon

No luck.

srvctl enable instance command is not available

You could try:

[grid@ip-172-31-86-22 ~]$ srvctl enable instance -db NY -instance NY

but in Restart, this message will appear:

PRKO-2012 : instance object is not supported in Oracle Restart

Workarounds

I am aware of two workarounds:

1. srvctl start database

Issuing srvctl start database has the side effect of enabling automatic restart.

[oracle@ip-172-31-86-22 dbs]$ srvctl start database -database NY
PRCC-1014 : NY was already running
PRCR-1004 : Resource ora.ny.db is already running
PRCR-1079 : Failed to start resource ora.ny.db

From now on, the Oracle database instance will start automatically.

2. crsctl modify resource

Issuing this command as the grid owner

[grid@ip-172-31-86-22 ~]$ crsctl modify resource ora.ny.db -attr AUTO_START=always -unsupported

also resolves the issue.
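Either way, you can confirm the effective setting by dumping the resource profile as the grid owner. AUTO_START should now read "always" (by default it is typically "restore", which restores the last recorded state):

```
[grid@ip-172-31-86-22 ~]$ crsctl stat res ora.ny.db -p | grep AUTO_START
```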

reboot

After you try one of these workarounds, reboot and notice that the Oracle database instance is running.

[oracle@ip-172-31-86-22 ~]$ ps -ef| grep smon
grid      2216     1  0 15:03 ?        00:00:00 asm_smon_+ASM
oracle    2332     1  0 15:03 ?        00:00:00 ora_smon_ORCL
oracle    2411  1781  0 15:03 pts/0    00:00:00 grep --color=auto smon

Conclusion

In Restart, “srvctl enable database” does not lead to automatic instance restart upon reboot as you might expect, but you can work around it by issuing “srvctl start database” or “crsctl modify resource”.

DGMGRL not required in listener.ora in Restart

By Brian Fitzgerald

Introduction

In Oracle Restart, _DGMGRL services are no longer required in listener.ora. Switchover output has changed slightly and _DGMGRL connections no longer appear in the listener log.

Background

While testing switchover in Data Guard in 19c, I noticed that after configuring Restart, connections to _DGMGRL no longer appeared in the listener log. I deleted the _DGMGRL services, reloaded the listeners, and retested the switchover without issue.

Static listeners (initial)

Initially, in the grid account, in $ORACLE_HOME/network/admin/listener.ora, the following SID_LIST_LISTENER entries were in place. At the primary:

SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (GLOBAL_DBNAME = NY_DGMGRL)
      (ORACLE_HOME = /u01/app/oracle/product/19.3.0/dbhome_1)
      (SID_NAME = ORCL)
    )
  )

At the far sync:

SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (GLOBAL_DBNAME = FS_DGMGRL)
      (ORACLE_HOME = /u01/app/oracle/product/19.3.0/dbhome_1)
      (SID_NAME = ORCL)
    )
  )

At the standby:

SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (GLOBAL_DBNAME = SF_DGMGRL)
      (ORACLE_HOME = /u01/app/oracle/product/19.3.0/dbhome_1)
      (SID_NAME = ORCL)
    )
  )

I reloaded the listeners at each host:

[grid@ip-172-31-86-22 ~]$ lsnrctl reload

LSNRCTL for Linux: Version 19.0.0.0.0 - Production on 02-SEP-2019 12:28:09

Copyright (c) 1991, 2019, Oracle. All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=ip-172-31-86-22.ec2.internal)(PORT=1521)))
The command completed successfully

Switchover test before setting up Restart

I ran the following switchover test before setting up Restart:

DGMGRL> switchover to SF
Performing switchover NOW, please wait...
Operation requires a connection to database "SF"
Connecting ...
Connected to "SF"
Connected as SYSDBA.
New primary database "SF" is opening...
Operation requires start up of instance "ORCL" on database "NY"
Starting instance "ORCL"...
Connected to an idle instance.
ORACLE instance started.
Connected to "NY"
Database mounted.
Connected to "NY"
Switchover succeeded, new primary is "sf"

Notice the message ‘Operation requires start up of instance “ORCL” on database “NY”’. In the NY listener log, several connections to service NY_DGMGRL appear. For example:

02-SEP-2019 22:46:10 * (CONNECT_DATA=(SERVICE_NAME=NY_DGMGRL)(INSTANCE_NAME=ORCL)(SERVER=DEDICATED)(CID=(PROGRAM=dgmgrl)(HOST=ip-172-31-86-22.ec2.internal)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=172.31.86.22)(PORT=51298)) * establish * NY_DGMGRL * 0

Run “show configuration”:

DGMGRL> show configuration

Configuration - ORCL_CONFIG

  Protection Mode: MaxPerformance
  Members:
  SF - Primary database
    FS - Far sync instance
      NY - Physical standby database

Fast-Start Failover:  Disabled

Configuration Status:
SUCCESS   (status updated 51 seconds ago)

Switch back to the original primary.

DGMGRL> switchover to NY
Performing switchover NOW, please wait...
New primary database "NY" is opening...
Operation requires start up of instance "ORCL" on database "SF"
Starting instance "ORCL"...
Connected to an idle instance.
ORACLE instance started.
Connected to "SF"
Database mounted.
Connected to "SF"
Switchover succeeded, new primary is "ny"

Notice the message ‘Operation requires start up of instance “ORCL” on database “SF”’. In the SF listener log, several connections to service SF_DGMGRL appear. For example:

02-SEP-2019 22:43:08 * (CONNECT_DATA=(SERVICE_NAME=SF_DGMGRL)(INSTANCE_NAME=ORCL)(SERVER=DEDICATED)(CID=(PROGRAM=dgmgrl)(HOST=ip-172-31-86-22.ec2.internal)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=172.31.86.22)(PORT=50340)) * establish * SF_DGMGRL * 0

Notice that even in these tests, no connections to the FS_DGMGRL service appeared at the far sync. Service FS_DGMGRL plays no role at the far sync.

Configure Restart

Register your Restart instances.

At the primary:

[oracle@ip-172-31-86-22 ~]$ srvctl add database -database NY -role PRIMARY -stopoption IMMEDIATE -instance ORCL -oraclehome /u01/app/oracle/product/19.3.0/dbhome_1 -spfile +DATA01/ORCL/PARAMETERFILE/spfile.266.1017440879 -diskgroup DATA01,RECO01

At the far sync:

[oracle@ip-172-31-28-23 ~]$ srvctl add database -database FS -role physical_standby -startoption MOUNT -stopoption ABORT -instance ORCL -oraclehome /u01/app/oracle/product/19.3.0/dbhome_1 -spfile /u01/app/oracle/product/19.3.0/dbhome_1/dbs/spfileORCL.ora -diskgroup DATA01,RECO01

To activate the change, I found it necessary to issue srvctl start. That does not seem right; “srvctl enable database” should be enough, but as shown in the previous article, it is not.

Also, in Restart, “srvctl enable instance” is not available; it works only in RAC.

Unless you activate the instance, it will not start automatically upon host reboot.

[oracle@ip-172-31-28-23 ~]$ srvctl start database -database FS

At the standby:

[oracle@ip-172-32-10-34 ~]$ srvctl add database -database SF -role physical_standby -startoption MOUNT -stopoption ABORT -instance ORCL -oraclehome /u01/app/oracle/product/19.3.0/dbhome_1 -spfile /u01/app/oracle/product/19.3.0/dbhome_1/dbs/spfileORCL.ora -diskgroup DATA01,RECO01
[oracle@ip-172-32-10-34 ~]$ srvctl start database -database SF

Reboot all hosts and check that the database instances start automatically, in the proper startup mode and Data Guard role.

Switchover test

Test switchover to SF. The output changes slightly, and becomes:

DGMGRL> switchover to SF
Performing switchover NOW, please wait...
Operation requires a connection to database "SF"
Connecting ...
Connected to "SF"
Connected as SYSDBA.
New primary database "SF" is opening...
Oracle Clusterware is restarting database "NY" ...
Connected to an idle instance.
Connected to an idle instance.
Connected to an idle instance.
Connected to an idle instance.
Connected to "NY"
Connected to "NY"
Switchover succeeded, new primary is "sf"

After configuring Restart, the Data Guard switchover output has changed slightly. Message ‘Operation requires start up of instance “ORCL” on database “NY”‘ has been replaced with ‘Oracle Clusterware is restarting database “NY” …’. A review of the NY listener log shows no connection to service NY_DGMGRL.

Test switchover to NY. The output is now:

DGMGRL> switchover to NY
Performing switchover NOW, please wait...
New primary database "NY" is opening...
Oracle Clusterware is restarting database "SF" ...
Connected to an idle instance.
Connected to an idle instance.
Connected to "SF"
Connected to "SF"
Switchover succeeded, new primary is "ny"

A review of the SF listener log shows no connection to service SF_DGMGRL.

Static listeners (final)

In Restart, therefore, the “_DGMGRL” listener.ora entry is not needed. SID_LIST_LISTENER can be simplified on the primary, far sync, and standby to:

SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (ORACLE_HOME = /u01/app/oracle/product/19.3.0/dbhome_1)
      (SID_NAME = ORCL)
    )
  )

After configuring Oracle Restart, switchovers were retested with the revised listener.ora without issue.

Error in non-Restart and no DGMGRL

Suppose we disable Restart at NY.

[oracle@ip-172-31-86-22 ~]$ srvctl stop database -database NY
[oracle@ip-172-31-86-22 ~]$ srvctl remove database -database NY -y
[oracle@ip-172-31-86-22 dbs]$ cat > initORCL.ora
spfile='+DATA01/NY/PARAMETERFILE/spfile.263.1018152951'
[oracle@ip-172-31-86-22 dbs]$ sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Thu Sep 12 11:07:20 2019
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup
ORACLE instance started.

Total System Global Area 1140849904 bytes
Fixed Size                  8895728 bytes
Variable Size             318767104 bytes
Database Buffers          805306368 bytes
Redo Buffers                7880704 bytes
Database mounted.
Database opened.

DGMGRL> show configuration

Configuration - ORCL_CONFIG

  Protection Mode: MaxAvailability
  Members:
  NY - Primary database
    SF - Physical standby database

Fast-Start Failover:  Disabled

Configuration Status:
SUCCESS   (status updated 11 seconds ago)

(Note: In this example, there is no far sync.) Now try a switchover:

DGMGRL> switchover to SF
Performing switchover NOW, please wait...
Operation requires a connection to database "SF"
Connecting ...
Connected to "SF"
Connected as SYSDBA.
New primary database "SF" is opening...
Operation requires start up of instance "ORCL" on database "NY"
Starting instance "ORCL"...
Unable to connect to database using (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=ip-172-31-86-22.ec2.internal)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=NY_DGMGRL)(INSTANCE_NAME=ORCL)(SERVER=DEDICATED)))
ORA-12514: TNS:listener does not currently know of service requested in connect descriptor

Failed.

Please complete the following steps to finish switchover:
        start up and mount instance "ORCL" of database "NY"

The new primary, SF, opens just fine. The old primary, NY, was shut down, but now there is no way for the broker to restart it. The listener is not listening on behalf of the Oracle database.

[grid@ip-172-31-86-22 ~]$ lsnrctl status

LSNRCTL for Linux: Version 19.0.0.0.0 - Production on 12-SEP-2019 11:26:00

Copyright (c) 1991, 2019, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=ip-172-31-86-22.ec2.internal)(PORT=1521)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 19.0.0.0.0 - Production
Start Date                12-SEP-2019 10:15:24
Uptime                    0 days 1 hr. 10 min. 36 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/19.3.0/grid/network/admin/listener.ora
Listener Log File         /u01/app/grid/diag/tnslsnr/ip-172-31-86-22/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=ip-172-31-86-22.ec2.internal)(PORT=1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1521)))
Services Summary...
Service "+ASM" has 1 instance(s).
  Instance "+ASM", status READY, has 1 handler(s) for this service...
Service "+ASM_DATA01" has 1 instance(s).
  Instance "+ASM", status READY, has 1 handler(s) for this service...
Service "+ASM_RECO01" has 1 instance(s).
  Instance "+ASM", status READY, has 1 handler(s) for this service...
The command completed successfully

This error, referring to the missing service NY_DGMGRL, appears in the listener log:

12-SEP-2019 11:22:09 * (CONNECT_DATA=(SERVICE_NAME=NY_DGMGRL)(INSTANCE_NAME=ORCL)(SERVER=DEDICATED)(CID=(PROGRAM=dgmgrl)(HOST=ip-172-31-86-22.ec2.internal)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=172.31.86.22)(PORT=53930)) * establish * NY_DGMGRL * 12514
TNS-12514: TNS:listener does not currently know of service requested in connect descriptor

The conclusion is that without Restart, we need the static listener entry with the _DGMGRL service in SID_LIST_LISTENER.

Validate static listeners

Validating the static listeners is a critical Data Guard setup step. After duplicating the database and setting up Restart, check each static listener:

  • Connect over the network with sqlplus
  • Shutdown (shutdown abort for standbys and far syncs)
  • Startup (startup mount for standbys and far syncs)

Example:

[oracle@ip-172-31-28-23 ops]$ sqlplus sys/zystm.22@FS as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Fri Sep 13 11:36:37 2019
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.


Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0

SQL> shutdown abort
ORACLE instance shut down.
SQL> startup mount
ORACLE instance started.

Total System Global Area 1140849904 bytes
Fixed Size                  8895728 bytes
Variable Size             301989888 bytes
Database Buffers          822083584 bytes
Redo Buffers                7880704 bytes
Database mounted.

This is an example of a failed static listener check.

[oracle@ip-172-31-28-23 ops]$ sqlplus sys/zystm.22@FS as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Fri Sep 13 11:36:37 2019
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.


Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0

SQL> shutdown abort
ORACLE instance shut down.
SQL> startup mount
ORACLE instance started.

Total System Global Area 1140849904 bytes
Fixed Size                  8895728 bytes
Variable Size             301989888 bytes
Database Buffers          822083584 bytes
Redo Buffers                7880704 bytes
Database mounted.
SQL> shutdown abort
ORACLE instance shut down.
ERROR:
ORA-12505: TNS:listener does not currently know of SID given in connect
descriptor


SQL> startup mount
SP2-0640: Not connected

Resolve all failures before proceeding.

Conclusion

In Oracle Database 12.1, the Data Guard broker command-line interface (DGMGRL) was modified so that in Restart or RAC, “_DGMGRL” services are no longer required. This fact is mentioned in Oracle Data Guard Broker and Static Service Registration (Doc ID 1387859.1).

On the other hand, if you are not using Restart, then you do need the _DGMGRL service in your SID_LIST_LISTENER.

Restart is only available with Grid Infrastructure. The main benefit of Grid Infrastructure is ASM. If you built Oracle Database on operating system files, then you have less of a need for Grid Infrastructure. If you did not install Grid Infrastructure, then Restart is not available. Without Restart, a Data Guard setup will need a _DGMGRL service in SID_LIST_LISTENER.

During Data Guard setup, validate all static listeners.

Symmetric far sync Data Guard

Introduction

Presented here is a reliable, symmetric Oracle Data Guard network. By reliable is meant that zero data loss and continuous availability are guaranteed in case of loss of one far sync. By symmetric is meant that there is no preferred primary site.

Background

I used to drive a Navy nuclear submarine. On a weekly basis, we switched over our running machinery. For example, we switched over our turbine generator lubricating oil pumps. The engineroom upper level watch would start the standby pump, verify proper response, and then place the originally running pump in standby. We shifted our pumps for multiple reasons. Switching pumps equalized wear. It also verified readiness and operability. Finally, by operating the controls on a regular basis, the crew maintained a higher level of proficiency.

Notice a few things about this arrangement. No pump is a “preferred” primary. The pump functioning in the standby role is in no way inferior in design, specifications, or readiness. Because the machinery is rotated on a weekly basis, there is no question that the standby is ready to take over the primary role immediately when necessary.

So it is with databases. You need to have a disaster recovery strategy. The DR site needs to be in a known state of readiness at all times. The best way to accomplish that is to rotate DR sites on a weekly or quarterly basis. No site is a “preferred” primary. By regularly exercising the system, there is no question that a standby is in a state of readiness. Finally, database administrators maintain proficiency by regularly switching over the Data Guard systems.

A reliable, symmetric Data Guard arrangement is convenient to set up in the cloud. You have complete flexibility over which region and availability zone to place your databases and far syncs in.

Environment

The environment is Amazon AWS EC2 (Elastic Compute Cloud) with Red Hat Linux 7.2. Here is an overview of the environment.

db unique name  region     availability zone  type      net I/O  latency from  latency (μs)
--------------  ---------  -----------------  --------  -------  ------------  ------------
NY              us-east-1  us-east-1c         database  SYNC     SF                   32000
NY_FSA          us-east-1  us-east-1d         far sync  ASYNC    NY                     250
NY_FSB          us-east-1  us-east-1a         far sync  ASYNC    NY                     500
SF              us-west-1  us-west-1b         database  SYNC     NY                   31500
SF_FSA          us-west-1  us-west-1b         far sync  ASYNC    SF                     115
SF_FSB          us-west-1  us-west-1c         far sync  ASYNC    SF                     600

Notice that NY, NY_FSA, and NY_FSB are in the same region but in separate availability zones. Also, SF, SF_FSA, and SF_FSB are in the same region but SF_FSB is in an availability zone separate from SF and SF_FSA. SF and SF_FSA are in the same availability zone. The highest latency is across regions. The lowest latency is within one availability zone. Medium latency is in the same region but across availability zones.
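As a rough sketch of why this placement matters (my own back-of-the-envelope arithmetic, not a figure from Oracle documentation): SYNC redo transport adds roughly one network round trip to every commit, so a SYNC hop within a region costs microseconds while a SYNC hop across regions costs tens of milliseconds.

```shell
# Back-of-the-envelope: SYNC transport adds about one network round trip
# per commit. One-way latencies (microseconds) come from the table above.
for pair in "NY->SF:32000" "NY->NY_FSA:250" "SF->SF_FSA:115"; do
  path=${pair%%:*}        # everything before the colon, e.g. NY->SF
  one_way_us=${pair##*:}  # everything after the colon, e.g. 32000
  echo "$path: ~$(( 2 * one_way_us )) us round trip added per SYNC commit"
done
```

This is the motivation for having each primary ship SYNC only to a far sync in its own region and letting the far sync relay ASYNC across regions.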

Here is a network diagram.

[Network diagram: reliable.symmetric]

In Maximum Availability mode, the primary ships redo to one far sync, leaving the second far sync as an alternate. In case of a failure of the far sync, the alternate comes online, resyncs with the primary, and takes over the role of far sync. The primary always ships redo to a far sync in the same region. For example, NY ships to NY_FSA or NY_FSB. Because of this arrangement, there is a total of four far syncs, two per region. Although four far sync hosts are required, the hosts can be configured with less CPU, memory, and disk space than the primary database. In a cloud, CPU, memory, and disk can be reconfigured quickly. Note that in the present symmetric arrangement, far sync direction is one-way, from the local database to the database in the remote region.

Setup

Assume an initial, enabled Maximum Availability configuration with only a primary and a standby database, and no far syncs.

Create and enable the far syncs

Create far syncs as explained in Data Guard 19c in AWS with far sync. Add the far syncs and enable them.

DGMGRL> add far_sync "NY_FSA" as connect identifier is 'NY_FSA';
far sync instance "NY_FSA" added
DGMGRL> enable far_sync "NY_FSA";
Enabled.
DGMGRL> add far_sync 'NY_FSB' as connect identifier is 'NY_FSB';
far sync instance "NY_FSB" added
DGMGRL> enable far_sync 'NY_FSB'
Enabled.
DGMGRL> add far_sync 'SF_FSA' as connect identifier is 'SF_FSA';
far sync instance "SF_FSA" added
DGMGRL> enable far_sync 'SF_FSA';
Enabled.
DGMGRL> add far_sync 'SF_FSB' as connect identifier is 'SF_FSB';
far sync instance "SF_FSB" added
DGMGRL> enable far_sync 'SF_FSB'
Enabled.

The configuration so far looks like this:

DGMGRL> show configuration

Configuration - ORCL_CONFIG

  Protection Mode: MaxAvailability
  Members:
  NY     - Primary database
    SF     - Physical standby database

  Members Not Receiving Redo:
  NY_FSA - Far sync instance
  SF_FSA - Far sync instance
  SF_FSB - Far sync instance
  NY_FSB - Far sync instance

Fast-Start Failover:  Disabled

Configuration Status:
SUCCESS   (status updated 57 seconds ago)

Edit the far sync routes

DGMGRL> edit far_sync NY_FSA set property RedoRoutes = '(NY : SF ASYNC)';
Property "redoroutes" updated
DGMGRL> edit far_sync NY_FSB set property RedoRoutes = '(NY : SF ASYNC)';
Property "redoroutes" updated
DGMGRL> edit far_sync SF_FSA set property RedoRoutes = '(SF : NY ASYNC)';
Property "redoroutes" updated
DGMGRL> edit far_sync SF_FSB set property RedoRoutes = '(SF : NY ASYNC)';
Property "redoroutes" updated

Edit the database routes

DGMGRL> edit database NY set property RedoRoutes = '(LOCAL : (NY_FSA SYNC, NY_FSB SYNC))';
Property "redoroutes" updated
DGMGRL> edit database SF set property RedoRoutes = '(LOCAL : (SF_FSA SYNC, SF_FSB SYNC))';
Property "redoroutes" updated
DGMGRL> show configuration

Configuration - ORCL_CONFIG

  Protection Mode: MaxAvailability
  Members:
  NY     - Primary database
    NY_FSA - Far sync instance
      SF     - Physical standby database

  Members Not Receiving Redo:
  SF_FSA - Far sync instance
  SF_FSB - Far sync instance
  NY_FSB - Far sync instance (alternate of NY_FSA)

Fast-Start Failover:  Disabled

Configuration Status:
SUCCESS   (status updated 40 seconds ago)
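To double-check an individual member's route, you can query the RedoRoutes property directly; for example:

```
DGMGRL> show database 'NY' 'RedoRoutes';
DGMGRL> show far_sync 'NY_FSA' 'RedoRoutes';
```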

Test far sync failover

In case of a failure of the far sync, Data Guard should quickly switch over to the alternate far sync. In this example, if there is a disruption of NY_FSA, Data Guard should fail over to NY_FSB. We want to observe the system response to a simulated failure at NY_FSA. There are two failure scenarios.

end-of-file on communication channel

In a TCP client, such as the primary database connection to the far sync, ORA-03113: end-of-file on communication channel will appear when the far sync is disrupted but the far sync host is still on the network. Possible causes could include far sync instance crash, restart, or host reboot. As long as the host is on the network at the link level, it is possible for the primary host operating system to quickly identify the error and return a failure to the primary database. In such cases, an error such as this one appears immediately in the alert log.

2019-09-10T13:44:33.987037-04:00
ORA-03113: end-of-file on communication channel

In such a case, recovery will be very quick. The primary will immediately connect to the alternate far sync. Synchronization can be complete in less than 10 seconds.
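One convenient way to stage this scenario (my suggestion, not the method used in the original test) is to abort the far sync instance while its host stays on the network:

```
[oracle@ip-172-31-28-23 ~]$ sqlplus / as sysdba
SQL> shutdown abort
ORACLE instance shut down.
```

The operating system closes the dead instance's sockets immediately, so the primary sees end-of-file rather than a timeout.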

timeout error

A timeout will occur if there is a power loss or network disruption. A network disruption could occur as a result of a network configuration change. I can demonstrate a timeout by cutting off the TCP connectivity at the cloud level. To do that, I switch the NY_FSA host to a security group that does not allow traffic on the Oracle port (1521).

[Screenshot: security group change (clear.sg)]

In this case, Oracle error detection and recovery is slow. After 30 seconds (the NetTimeout default), the primary connection to NY_FSA, LAD:2 (log archive destination 2), times out:

2019-09-10T18:22:28.214255-04:00
LGWR (PID:4969): ORA-16198: Received timed out error from KSR
LGWR (PID:4969): Attempting LAD:2 network reconnect (16198)
LGWR (PID:4969): LAD:2 network reconnect abandoned
2019-09-10T18:22:28.214836-04:00
Errors in file /u01/app/oracle/diag/rdbms/ny/ORCL/trace/ORCL_lgwr_4969.trc:
ORA-16198: Timeout incurred on internal channel during remote archival
LGWR (PID:4969): Error 16198 for LNO:1 to 'NY_FSA'
2019-09-10T18:22:28.223786-04:00
LGWR (PID:4969): LAD:2 is UNSYNCHRONIZED
LGWR (PID:4969): Failed to archive LNO:1 T-1.S-333, error=16198

After another 30 seconds, the primary gives up on NY_FSA and switches to NY_FSB (LAD:3).

2019-09-10T18:22:58.232029-04:00
LGWR (PID:4969): ORA-16198: Received timed out error from KSR
LGWR (PID:4969): Error 16198 disconnecting from LAD:2 standby host 'NY_FSA'
2019-09-10T18:22:58.232531-04:00
LGWR (PID:4969): LAD:3 is UNSYNCHRONIZED
2019-09-10T18:22:58.232638-04:00
LGWR (PID:4969): LAD:2 no longer supports SYNCHRONIZATION
LGWR (PID:4969): SRL selected to archive T-1.S-334
LGWR (PID:4969): SRL selected for T-1.S-334 for LAD:3
2019-09-10T18:22:58.449325-04:00
Thread 1 advanced to log sequence 334 (LGWR switch)
  Current log# 2 seq# 334 mem# 0: +RECO01/NY/ONLINELOG/group_2.484.1018151803
  Current log# 2 seq# 334 mem# 1: +DATA01/NY/ONLINELOG/group_2.270.1018151815
2019-09-10T18:22:58.519853-04:00
ARC0 (PID:5073): Archived Log entry 675 added for T-1.S-333 ID 0x5c2b52e5 LAD:1
2019-09-10T18:22:58.707980-04:00
ARC1 (PID:5079): SRL selected for T-1.S-333 for LAD:3

Synchronization begins, but four to five minutes elapse until the primary is resynced to the new far sync, NY_FSB.

2019-09-10T18:27:01.888256-04:00
LGWR (PID:4969): LAD:3 is SYNCHRONIZED

While NY_FSB is synchronizing, there are two things to notice. The primary status is:

  NY     - Primary database
    Warning: ORA-16629: database reports a different protection level from the protection mode

and the protection level:

SQL> select protection_level from v$database;

PROTECTION_LEVEL
--------------------
RESYNCHRONIZATION

The final state is:

DGMGRL> show configuration

Configuration - ORCL_CONFIG

  Protection Mode: MaxAvailability
  Members:
  NY     - Primary database
    NY_FSB - Far sync instance (alternate of NY_FSA)
      SF     - Physical standby database

  Members Not Receiving Redo:
  NY_FSA - Far sync instance
    Warning: ORA-16857: member disconnected from redo source for longer than specified threshold

  SF_FSA - Far sync instance
  SF_FSB - Far sync instance

Fast-Start Failover:  Disabled

Configuration Status:
WARNING   (status updated 59 seconds ago)
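If you cannot swap security groups, a host firewall rule on the far sync host is a hypothetical alternative for staging the same timeout (the hostname prompt here is illustrative). Note DROP rather than REJECT: dropped packets are silently discarded and produce the slow timeout path, whereas REJECT sends a reset and would produce the fast end-of-file path instead.

```
[root@far-sync-host ~]# iptables -A INPUT -p tcp --dport 1521 -j DROP
[root@far-sync-host ~]# # when done testing, delete the rule:
[root@far-sync-host ~]# iptables -D INPUT -p tcp --dport 1521 -j DROP
```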

Adjusting NetTimeout

On a low-latency and usually reliable network, you should reduce the timeout. For example:

DGMGRL> edit database NY set property NetTimeout = 5;
Property "nettimeout" updated

This makes for a cleaner far sync switchover.

Maximum Protection

You might think that you could implement protection mode Maximum Protection with multiple far syncs:

DGMGRL> edit configuration set protection mode as MaxProtection;

Oracle will not let you do that.

Error: ORA-16627: operation disallowed since no member would remain to support protection mode

If you want Maximum Protection, you will need to configure a direct route to one or more physical standby databases.
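A hypothetical sketch of such a route, reusing the member names from this article (not a configuration I tested): give each database a direct SYNC route to the remote database, bypassing the far syncs.

```
DGMGRL> edit database NY set property RedoRoutes = '(LOCAL : SF SYNC)';
DGMGRL> edit database SF set property RedoRoutes = '(LOCAL : NY SYNC)';
```

With a direct SYNC member in place, a member remains that can uphold the protection mode; the cost is cross-region SYNC commit latency.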

Conclusion

Several points were covered in this article.

  • Data Guard latency can be reduced by implementing far sync.
  • Data Guard can be made more reliable by placing multiple far syncs near the primary.
  • Benefits to setting up a symmetric arrangement include:
    • Standby readiness has been recently verified.
    • The standby capability is known to be identical to the primary.
    • Heightened DBA staff proficiency.
  • Setup implementation steps were covered.
  • Data Guard does not keep alternate far syncs in sync with the primary.
  • Data Guard is quick to recover from some error conditions.
  • Data Guard detection and recovery from network timeout can be slow.
  • Maximum Protection through a far sync is not supported.

Data Guard error in Maximum Availability

Introduction

Errors were noted while changing Data Guard protection mode to Maximum Availability. The root cause was mismatched standby redo log size.

Symptoms

We are in protection mode Maximum Performance and we want to set Maximum Availability. Before we begin, we notice something is amiss, but for the sake of discussion, we are not sure what, and we proceed.

DGMGRL> show configuration

Configuration - ORCL_CONFIG

  Protection Mode: MaxPerformance
  Members:
  NY - Primary database
    FS - Far sync instance
      Warning: ORA-16809: multiple warnings detected for the member

      SF - Physical standby database
        Warning: ORA-16809: multiple warnings detected for the member

Fast-Start Failover:  Disabled

Configuration Status:
WARNING   (status updated 57 seconds ago)

We ignore the warnings and set protection mode to Maximum Availability.

DGMGRL> edit configuration set protection mode as MaxAvailability;
Succeeded.
DGMGRL> show configuration

Configuration - ORCL_CONFIG

  Protection Mode: MaxAvailability
  Members:
  NY - Primary database
    FS - Far sync instance
      Warning: ORA-16855: transport lag has exceeded specified threshold

      SF - Physical standby database
        Warning: ORA-16809: multiple warnings detected for the member

Fast-Start Failover:  Disabled

Configuration Status:
WARNING   (status updated 59 seconds ago)

The warnings persist. We force a logfile switch at the primary.

[oracle@ip-172-31-86-22 ~]$ sqlplus sys/zystm.22@NY as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Thu Sep 5 21:16:54 2019
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.


Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0

SQL> alter system switch logfile;

System altered.

Now errors appear in the show configuration display.

DGMGRL> show configuration

Configuration - ORCL_CONFIG

  Protection Mode: MaxAvailability
  Members:
  NY - Primary database
    Error: ORA-16810: multiple errors or warnings detected for the member

    FS - Far sync instance
      Warning: ORA-16809: multiple warnings detected for the member

      SF - Physical standby database
        Warning: ORA-16809: multiple warnings detected for the member

Fast-Start Failover:  Disabled

Configuration Status:
ERROR   (status updated 51 seconds ago)

The system has errors and logs are not being processed. We are in hot water, so we investigate.

Let's review the standby, then work backward through the far sync, and finally examine the primary.

DGMGRL> show database verbose SF StatusReport
STATUS REPORT
       INSTANCE_NAME   SEVERITY ERROR_TEXT
                   *    WARNING ORA-16855: transport lag has exceeded specified threshold
                   *    WARNING ORA-16857: member disconnected from redo source for longer than specified threshold

DGMGRL> show far_sync verbose FS StatusReport
STATUS REPORT
       INSTANCE_NAME   SEVERITY ERROR_TEXT
                   *    WARNING ORA-16855: transport lag has exceeded specified threshold
                   *    WARNING ORA-16857: member disconnected from redo source for longer than specified threshold

DGMGRL> show database NY StatusReport
STATUS REPORT
       INSTANCE_NAME   SEVERITY ERROR_TEXT
                   *    WARNING ORA-16629: database reports a different protection level from the protection mode
                ORCL      ERROR ORA-16737: the redo transport service for member "FS" has an error

Message ‘the redo transport service for member “FS” has an error’ requires further drilldown:

DGMGRL> show database verbose NY LogXptStatus;
LOG TRANSPORT STATUS
PRIMARY_INSTANCE_NAME STANDBY_DATABASE_NAME     STATUS                ERROR
                ORCL                   FS      ERROR ORA-16086: Redo data cannot be written to the standby redo log

The underlying reason is found in far sync RFS trace:

Trace file /u01/app/oracle/diag/rdbms/fs/ORCL/trace/ORCL_rfs_29164.trc
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0
Build label:    RDBMS_19.3.0.0.0DBRU_LINUX.X64_190417
ORACLE_HOME:    /u01/app/oracle/product/19.3.0/dbhome_1
System name:    Linux
Node name:      ip-172-31-28-23.ec2.internal
Release:        3.10.0-1062.el7.x86_64
Version:        #1 SMP Thu Jul 18 20:25:13 UTC 2019
Machine:        x86_64
VM name:        Xen Version: 4.2 (HVM)
Instance name: ORCL
Redo thread mounted by this instance: 1
Oracle process number: 56
Unix process pid: 29164, image: oracle@ip-172-31-28-23.ec2.internal


*** 2019-09-04T15:56:40.334571-04:00
*** SESSION ID:(66.28887) 2019-09-04T15:56:40.334605-04:00
*** CLIENT ID:() 2019-09-04T15:56:40.334616-04:00
*** SERVICE NAME:() 2019-09-04T15:56:40.334624-04:00
*** MODULE NAME:(oracle@ip-172-31-86-22.ec2.internal (TNS V1-V3)) 2019-09-04T15:56:40.334631-04:00
*** ACTION NAME:() 2019-09-04T15:56:40.334639-04:00
*** CLIENT DRIVER:() 2019-09-04T15:56:40.334646-04:00

krsv_proc_add: Request to add process to V$MANAGED_STANDBY [krsr.c:4229]
krsr_abrt: The primary database is operating in MAXIMUM PROTECTION
  or MAXIMUM AVAILABILITY mode, and the standby database
  does not contain any viable SRLs
Encountered error status 16086
krsv_proc_rem: Request to remove process from V$MANAGED_STANDBY [krsr.c:12657]

Analysis

Consider the message “the standby database does not contain any viable SRLs” (standby redo logs). Here, “viable” means standby redo logs whose size is greater than or equal to that of the primary online redo logs. Check the primary online redo log:

SQL> select group#, bytes/1024/1024 mb from v$log;

    GROUP#         MB
---------- ----------
         1        200
         2        200
         3        200

All primary online redo logs are 200 MB. Now check the standby redo logs at each site.

Primary:

SQL> select group#, bytes/1024/1024 mb from v$standby_log;

    GROUP#         MB
---------- ----------
         4         50
         5         50
         6         50
         7         50
         8         50
         9        200

6 rows selected.

Already there is a problem. Not all the standby logs are the same size. Check the far sync:

SQL> select group#, bytes/1024/1024 mb from v$standby_log;

    GROUP#         MB
---------- ----------
         4         50
         5         50
         6         50
         7         50
         8         50

All the standby logs are 50 MB. They are too small, so there are no viable standby redo logs. Check the standby:

SQL> select group#, bytes/1024/1024 mb from v$standby_log;

    GROUP#         MB
---------- ----------
         4         50
         5         50
         6         50
         7         50
         8         50

Again, there are no viable SRLs.
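The viability rule observed above can be sketched in a few lines. This is an illustrative Python sketch (not Oracle code) of the rule that a standby redo log can accept redo only if it is at least as large as the largest primary online redo log; the sizes are the ones recorded in the queries above.

```python
# Sketch of the "viable SRL" rule: a standby redo log is viable only if
# it is at least as large as the largest primary online redo log.
# Sizes are in MB, taken from the v$log and v$standby_log output above.

def viable_srls(orl_mb, srl_mb):
    """Return the standby redo log sizes that can accept primary redo."""
    needed = max(orl_mb)
    return [s for s in srl_mb if s >= needed]

primary_orls = [200, 200, 200]            # v$log at NY
far_sync_srls = [50, 50, 50, 50, 50]      # v$standby_log at FS
primary_srls = [50, 50, 50, 50, 50, 200]  # v$standby_log at NY

print(viable_srls(primary_orls, far_sync_srls))  # [] -> no viable SRLs
print(viable_srls(primary_orls, primary_srls))   # [200]
```

With no viable SRLs at the far sync, synchronous transport has nowhere to write, which is exactly the ORA-16086 seen above.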

Possible root cause

The online redo log size is 200 MB, which happens to be the Database Configuration Assistant (DBCA) default size.

Due to an administrative error, the standby redo log size was 50 MB when the database was duplicated. The standby redo logs should be 200 MB.

Corrective action

  1. Drop all standby redo logs that are not 200 MB.
  2. Create a set of 200 MB SRLs per database.
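These two steps can be scripted. Below is a hypothetical Python sketch that, given one database's v$standby_log sizes, emits the DROP/ADD statements corresponding to the SQL*Plus commands run in the sections that follow. It assumes the common convention of keeping one more SRL group than ORL groups; group numbers and the 200 MB target are taken from this article.

```python
# Generate corrective SQL for one database: drop every standby redo log
# group that is not the target size, then add replacements so the SRL
# count ends up one more than the ORL group count.

def fix_srl_statements(srl_mb_by_group, orl_group_count, target_mb=200):
    stmts = []
    for group, mb in sorted(srl_mb_by_group.items()):
        if mb != target_mb:
            stmts.append(f"alter database drop logfile group {group};")
    keep = sum(1 for mb in srl_mb_by_group.values() if mb == target_mb)
    for _ in range(orl_group_count + 1 - keep):
        stmts.append(f"alter database add standby logfile size {target_mb}m;")
    return stmts

# Standby SF: five 50 MB SRLs, three ORL groups -> 5 drops, 4 adds.
for stmt in fix_srl_statements({4: 50, 5: 50, 6: 50, 7: 50, 8: 50}, 3):
    print(stmt)
```

Run against the primary's log inventory (five 50 MB groups plus one 200 MB group), the sketch emits five drops and three adds, matching the commands shown below for NY.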

Standby database

DGMGRL> edit database SF set state=apply-off;
Succeeded.
SQL> select group#, bytes/1024/1024 mb, status from v$standby_log;

    GROUP#         MB STATUS
---------- ---------- ----------
         4         50 UNASSIGNED
         5         50 UNASSIGNED
         6         50 UNASSIGNED
         7         50 UNASSIGNED
         8         50 UNASSIGNED

SQL> alter system set standby_file_management = manual;

System altered.

SQL> alter database drop logfile group 4;

Database altered.

SQL> alter database drop logfile group 5;

Database altered.

SQL> alter database drop logfile group 6;

Database altered.

SQL> alter database drop logfile group 7;

Database altered.

SQL> alter database drop logfile group 8;

Database altered.

SQL> alter database add standby logfile size 200m;

Database altered.

SQL> alter database add standby logfile size 200m;

Database altered.

SQL> alter database add standby logfile size 200m;

Database altered.

SQL> alter database add standby logfile size 200m;

Database altered.

SQL> alter system set standby_file_management = auto;

System altered.

SQL> select group#, bytes/1024/1024 mb, status from v$standby_log;

    GROUP#         MB STATUS
---------- ---------- ----------
         4        200 UNASSIGNED
         5        200 UNASSIGNED
         6        200 UNASSIGNED
         7        200 UNASSIGNED
DGMGRL> edit database SF set state=apply-on;
Succeeded.

Far sync

SQL> select group#, bytes/1024/1024 mb, status from v$standby_log;

    GROUP#         MB STATUS
---------- ---------- ----------
         4         50 UNASSIGNED
         5         50 UNASSIGNED
         6         50 UNASSIGNED
         7         50 UNASSIGNED
         8         50 UNASSIGNED

SQL> alter database drop logfile group 4;

Database altered.

SQL> alter database drop logfile group 5;

Database altered.

SQL> alter database drop logfile group 6;

Database altered.

SQL> alter database drop logfile group 7;

Database altered.

SQL> alter database drop logfile group 8;

Database altered.

SQL> alter database add standby logfile size 200m;

Database altered.

SQL> alter database add standby logfile size 200m;

Database altered.

SQL> alter database add standby logfile size 200m;

Database altered.

SQL> alter database add standby logfile size 200m;

Database altered.

SQL> select group#, bytes/1024/1024 mb, status from v$standby_log;

    GROUP#         MB STATUS
---------- ---------- ----------
         1        200 UNASSIGNED
         2        200 UNASSIGNED
         3        200 UNASSIGNED
         4        200 UNASSIGNED

Primary

SQL> select group#, bytes/1024/1024 mb, status from v$standby_log;

    GROUP#         MB STATUS
---------- ---------- ----------
         4         50 UNASSIGNED
         5         50 UNASSIGNED
         6         50 UNASSIGNED
         7         50 UNASSIGNED
         8         50 UNASSIGNED
         9        200 UNASSIGNED

6 rows selected.

SQL> alter database drop logfile group 4;

Database altered.

SQL> alter database drop logfile group 5;

Database altered.

SQL> alter database drop logfile group 6;

Database altered.

SQL> alter database drop logfile group 7;

Database altered.

SQL> alter database drop logfile group 8;

Database altered.

SQL> alter database add standby logfile size 200m;

Database altered.

SQL> alter database add standby logfile size 200m;

Database altered.

SQL> alter database add standby logfile size 200m;

Database altered.

SQL> select group#, bytes/1024/1024 mb, status from v$standby_log;

    GROUP#         MB STATUS
---------- ---------- ----------
         4        200 UNASSIGNED
         5        200 UNASSIGNED
         6        200 UNASSIGNED
         9        200 UNASSIGNED

Double check all systems

Check NY, FS, and SF.

SQL> select count(*) numlog, bytes/1024/1024 mb from v$standby_log group by bytes;

    NUMLOG         MB
---------- ----------
         4        200

Data Guard

After a few seconds, note that the system state is normal. The protection mode is now Maximum Availability.

DGMGRL> show configuration

Configuration - ORCL_CONFIG

  Protection Mode: MaxAvailability
  Members:
  NY - Primary database
    FS - Far sync instance
      SF - Physical standby database

Fast-Start Failover:  Disabled

Configuration Status:
SUCCESS   (status updated 51 seconds ago)

Conclusion

Keep these lessons and practices in mind:

  • Before duplicating the primary database, note the online redo log (ORL) size. Check that all ORLs are the same size. Check that the SRLs are all the same size, and are the same size as the ORLs.
  • Recognize that a normal Data Guard in steady state should be free of warnings.
  • Before attempting to upgrade the protection level, resolve all persistent warnings.
  • Drill in to errors as follows:
    • show configuration
    • show objecttype objectname StatusReport
    • show objecttype objectname property
      where property is a monitor property, LogXptStatus in this case