Introduction
Errors were noted while changing Data Guard protection mode to Maximum Availability. The root cause was mismatched standby redo log size.
Symptoms
We are in protection mode Maximum Performance and we want to set Maximum Availability. Before we begin, we notice something is amiss, but for the sake of discussion, we are not sure what, and we proceed.
DGMGRL> show configuration
Configuration - ORCL_CONFIG
Protection Mode: MaxPerformance
Members:
NY - Primary database
FS - Far sync instance
Warning: ORA-16809: multiple warnings detected for the member
SF - Physical standby database
Warning: ORA-16809: multiple warnings detected for the member
Fast-Start Failover: Disabled
Configuration Status:
WARNING (status updated 57 seconds ago)
We neglect the errors and set protection mode to Maximum Availability.
DGMGRL> edit configuration set protection mode as MaxAvailability;
Succeeded.
DGMGRL> show configuration
Configuration - ORCL_CONFIG
Protection Mode: MaxAvailability
Members:
NY - Primary database
FS - Far sync instance
Warning: ORA-16855: transport lag has exceeded specified threshold
SF - Physical standby database
Warning: ORA-16809: multiple warnings detected for the member
Fast-Start Failover: Disabled
Configuration Status:
WARNING (status updated 59 seconds ago)
The warnings persist. We force a logfile switch at the primary
[oracle@ip-172-31-86-22 ~]$ sqlplus sys/zystm.22@NY as sysdba SQL*Plus: Release 19.0.0.0.0 - Production on Thu Sep 5 21:16:54 2019 Version 19.3.0.0.0 Copyright (c) 1982, 2019, Oracle. All rights reserved. Connected to: Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production Version 19.3.0.0.0 SQL> alter system switch logfile; System altered.
Now errors appear in the show configuration display.
DGMGRL> show configuration
Configuration - ORCL_CONFIG
Protection Mode: MaxAvailability
Members:
NY - Primary database
Error: ORA-16810: multiple errors or warnings detected for the member
FS - Far sync instance
Warning: ORA-16809: multiple warnings detected for the member
SF - Physical standby database
Warning: ORA-16809: multiple warnings detected for the member
Fast-Start Failover: Disabled
Configuration Status:
ERROR (status updated 51 seconds ago)
The system has errors. Logs are not being processed. We are in hot water. We investigate.
Let’s review the standby and work backward via the far sync, and finally at the primary.
DGMGRL> show database verbose SF StatusReport
STATUS REPORT
INSTANCE_NAME SEVERITY ERROR_TEXT
* WARNING ORA-16855: transport lag has exceeded specified threshold
* WARNING ORA-16857: member disconnected from redo source for longer than specified threshold
DGMGRL> show far_sync verbose FS StatusReport
STATUS REPORT
INSTANCE_NAME SEVERITY ERROR_TEXT
* WARNING ORA-16855: transport lag has exceeded specified threshold
* WARNING ORA-16857: member disconnected from redo source for longer than specified threshold
DGMGRL> show database NY StatusReport
STATUS REPORT
INSTANCE_NAME SEVERITY ERROR_TEXT
* WARNING ORA-16629: database reports a different protection level from the protection mode
ORCL ERROR ORA-16737: the redo transport service for member "FS" has an error
Message ‘the redo transport service for member “FS” has an error’ requires further drilldown:
DGMGRL> show database verbose NY LogXptStatus;
LOG TRANSPORT STATUS
PRIMARY_INSTANCE_NAME STANDBY_DATABASE_NAME STATUS ERROR
ORCL FS ERROR ORA-16086: Redo data cannot be written to the standby redo log
The underlying reason is found in far sync RFS trace:
Trace file /u01/app/oracle/diag/rdbms/fs/ORCL/trace/ORCL_rfs_29164.trc Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production Version 19.3.0.0.0 Build label: RDBMS_19.3.0.0.0DBRU_LINUX.X64_190417 ORACLE_HOME: /u01/app/oracle/product/19.3.0/dbhome_1 System name: Linux Node name: ip-172-31-28-23.ec2.internal Release: 3.10.0-1062.el7.x86_64 Version: #1 SMP Thu Jul 18 20:25:13 UTC 2019 Machine: x86_64 VM name: Xen Version: 4.2 (HVM) Instance name: ORCL Redo thread mounted by this instance: 1 Oracle process number: 56 Unix process pid: 29164, image: oracle@ip-172-31-28-23.ec2.internal *** 2019-09-04T15:56:40.334571-04:00 *** SESSION ID:(66.28887) 2019-09-04T15:56:40.334605-04:00 *** CLIENT ID:() 2019-09-04T15:56:40.334616-04:00 *** SERVICE NAME:() 2019-09-04T15:56:40.334624-04:00 *** MODULE NAME:(oracle@ip-172-31-86-22.ec2.internal (TNS V1-V3)) 2019-09-04T15:56:40.334631-04:00 *** ACTION NAME:() 2019-09-04T15:56:40.334639-04:00 *** CLIENT DRIVER:() 2019-09-04T15:56:40.334646-04:00 krsv_proc_add: Request to add process to V$MANAGED_STANDBY [krsr.c:4229] krsr_abrt: The primary database is operating in MAXIMUM PROTECTION or MAXIMUM AVAILABILITY mode, and the standby database does not contain any viable SRLs Encountered error status 16086 krsv_proc_rem: Request to remove process from V$MANAGED_STANDBY [krsr.c:12657]
Analysis
Consider the message “the standby database does not contain any viable SRLs” (standby redo logs). By “viable” is meant standby redo logs of size greater than or equal to the primary online redo log. Check the primary online redo log:
SQL> select group#, bytes/1024/1024 mb from v$log;
GROUP# MB
---------- ----------
1 200
2 200
3 200
All primary online redo logs are 200 MB. Now check the standby redo logs at each site.
Primary:
SQL> select group#, bytes/1024/1024 mb from v$standby_log;
GROUP# MB
---------- ----------
4 50
5 50
6 50
7 50
8 50
9 200
6 rows selected.
Already there is a problem. Not all the standby logs are the same size. Check the far sync:
SQL> select group#, bytes/1024/1024 mb from v$standby_log;
GROUP# MB
---------- ----------
4 50
5 50
6 50
7 50
8 50
All the standby logs are 50 mb. They are too small. There are no viable standby redo logs. Check the standby:
SQL> select group#, bytes/1024/1024 mb from v$standby_log;
GROUP# MB
---------- ----------
4 50
5 50
6 50
7 50
8 50
Again, there are no viable SRLs.
Possible root cause
The online redo log size is 200 MB, which happens to be the Database Creation Assistant (DBCA) default size.
For some reason of incorrect administration, the standby redo log size was 50 MB when the database was duplicated. The standby redo log should be 200 MB.
Corrective action
- Drop all standby redo logs that are not 200 MB.
- Create a set of 200 MB SRLs per database.
Standby database
DGMGRL> edit database SF set state=apply-off; Succeeded.
SQL> select group#, bytes/1024/1024 mb, status from v$standby_log;
GROUP# MB STATUS
---------- ---------- ----------
4 50 UNASSIGNED
5 50 UNASSIGNED
6 50 UNASSIGNED
7 50 UNASSIGNED
8 50 UNASSIGNED
SQL> alter system set standby_file_management = manual;
System altered.
SQL> alter database drop logfile group 4;
Database altered.
SQL> alter database drop logfile group 5;
Database altered.
SQL> alter database drop logfile group 6;
Database altered.
SQL> alter database drop logfile group 7;
Database altered.
SQL> alter database drop logfile group 8;
Database altered.
SQL> alter database add standby logfile size 200m;
Database altered.
SQL> alter database add standby logfile size 200m;
Database altered.
SQL> alter database add standby logfile size 200m;
Database altered.
SQL> alter database add standby logfile size 200m;
Database altered.
SQL> alter system set standby_file_management = auto;
System altered.
SQL> select group#, bytes/1024/1024 mb, status from v$standby_log;
GROUP# MB STATUS
---------- ---------- ----------
4 200 UNASSIGNED
5 200 UNASSIGNED
6 200 UNASSIGNED
7 200 UNASSIGNED
DGMGRL> edit database SF set state=apply-on; Succeeded.
Far sync
SQL> select group#, bytes/1024/1024 mb, status from v$standby_log;
GROUP# MB STATUS
---------- ---------- ----------
4 50 UNASSIGNED
5 50 UNASSIGNED
6 50 UNASSIGNED
7 50 UNASSIGNED
8 50 UNASSIGNED
SQL> alter database drop logfile group 4;
Database altered.
SQL> alter database drop logfile group 5;
Database altered.
SQL> alter database drop logfile group 6;
Database altered.
SQL> alter database drop logfile group 7;
Database altered.
SQL> alter database drop logfile group 8;
Database altered.
SQL> alter database add standby logfile size 200m;
Database altered.
SQL> alter database add standby logfile size 200m;
Database altered.
SQL> alter database add standby logfile size 200m;
Database altered.
SQL> alter database add standby logfile size 200m;
Database altered.
SQL> select group#, bytes/1024/1024 mb, status from v$standby_log;
GROUP# MB STATUS
---------- ---------- ----------
1 200 UNASSIGNED
2 200 UNASSIGNED
3 200 UNASSIGNED
4 200 UNASSIGNED
Primary
SQL> select group#, bytes/1024/1024 mb, status from v$standby_log;
GROUP# MB STATUS
---------- ---------- ----------
4 50 UNASSIGNED
5 50 UNASSIGNED
6 50 UNASSIGNED
7 50 UNASSIGNED
8 50 UNASSIGNED
9 200 UNASSIGNED
6 rows selected.
SQL> alter database drop logfile group 4;
Database altered.
SQL> alter database drop logfile group 5;
Database altered.
SQL> alter database drop logfile group 6;
Database altered.
SQL> alter database drop logfile group 7;
Database altered.
SQL> alter database drop logfile group 8;
Database altered.
SQL> alter database add standby logfile size 200m;
Database altered.
SQL> alter database add standby logfile size 200m;
Database altered.
SQL> alter database add standby logfile size 200m;
Database altered.
SQL> select group#, bytes/1024/1024 mb, status from v$standby_log;
GROUP# MB STATUS
---------- ---------- ----------
4 200 UNASSIGNED
5 200 UNASSIGNED
6 200 UNASSIGNED
9 200 UNASSIGNED
Double check all systems
Check NY, FS, and SF.
SQL> select count(*)numlog, bytes/1024/1024 mb from v$standby_log group by bytes;
NUMLOG MB
---------- ----------
4 200
Data Guard
After a few seconds, note that the system state is normal. The protection mode is now Maximum Availability.
DGMGRL> show configuration
Configuration - ORCL_CONFIG
Protection Mode: MaxAvailability
Members:
NY - Primary database
FS - Far sync instance
SF - Physical standby database
Fast-Start Failover: Disabled
Configuration Status:
SUCCESS (status updated 51 seconds ago)
Conclusion
Be aware of these lessons or practices:
- Before duplicating the primary database, note the online redo log size. Check that all OLRs are the same size. Check that the SRLs are all the same size, and are the same size as the OLRs.
- Recognize that a normal Data Guard in steady state should be free of warnings.
- Before attempting to upgrade the protection level, resolve all persistent warnings.
- Drill in to errors as follows:
- show configuration
- show objecttype objectname StatusReport
- show objecttype objectname property
where property is a monitor property, LogXptStatus in this case