By Brian Fitzgerald
Introduction
Maximum Protection mode provides the highest level of data protection in Data Guard. Although a Maximum Protection system protects data as documented, there are performance issues that database administrators should be aware. The most glaring weakness is that in case of a network timeout anywhere in the system, the primary database hangs. Administrators should be aware of this before deciding on Maximum Protection mode.
Protection Modes
To quote the manual, Oracle Data Guard ensures high availability, data protection, and disaster recovery for enterprise data. Oracle Data Guard provides three distinct modes of data protection.
Maximum Performance
This is the default protection mode. It provides the highest level of data protection that is possible without affecting the performance of a primary database.
Maximum Availability
This protection mode provides the highest level of data protection that is possible without compromising the availability of a primary database.
Maximum Protection
This protection mode ensures that no data loss occurs if the primary database fails. To provide this level of protection, the redo data needed to recover a transaction must be written to both the online redo log and to the standby redo log on at least one synchronized standby database before the transaction commits. To ensure that data loss cannot occur, the primary database shuts down, rather than continue processing transactions, if it cannot write its redo stream to at least one synchronized standby database.
This article is about Maximum Protection mode. Based on the description, there are a few obvious points to be deduced about Maximum protection. A practical system must have more than one standby. Otherwise, failure of the sole standby will cause the system to crash. Secondly, a low latency network is required to achieve reasonable commit performance. Finally, completely redundant hosts and storage are required for each standby. In the past, with physical hardware and long administrative lead times, it was prohibitive to experiment with Maximum Protection. In the cloud, however, virtual machines are inexpensive and setup is quick.
License
Data Guard is a feature of the Oracle Database Enterprise Edition itself and does not require separate licensing. By using Amazon Elastic Cloud Computing, (EC2), you can control your license costs by configuring only the CPUs that you need.
Overview
The demonstration system characteristics are:
- Amazon AWS
- Region us-west-1 (Northern California)
- Availability zones us-west-1b and us-west-1c
- Elastic Compute Cloud (EC2)
- Red Hat Linux 7.2
- Oracle Grid Infrastructure 19c
- Oracle Database 19c
AWS availability zones are separate locations within the same geographic region. In us-wet-1, intra-availability-zone TCP latency, as measured by qperf, is 136μs, which is attractive for setting up Oracle Data Guard Maximum Protection. Inter-availability-zone TCP latency is 562μs. Note that availability zones identifiers are mapped independently for each account. For example, availability zone us-west-1b could refer to different locations for different accounts. Here is the mapping from database to availability zone that was used in this test.
| db unique name | availability zone |
| SFB1 | us-west-1b |
| SFB2 | us-west-1b |
| SFC1 | us-west-1c |
| SFC2 | us-west-1c |
Setup
Here is a brief summary of the preparatory steps:
- environment:
- create and configure virtual machine
- install grid infrastructure
- create ASM disk groups
- install oracle database software
- setup static listener
- duplicate database
- setup tnsnames.ora
- create orapwd
- create temporary init ora
- startup nomount
- duplicate target database for standby
- setup Oracle Restart
- validate all static listeners
- start Data Guard broker
Create configuration
DGMGRL> create configuration 'CFG' primary database is 'SFB1' connect identifier is 'SFB1'; Configuration "CFG" created with primary database "SFB1" DGMGRL> add database 'SFB2' as connect identifier is 'SFB2'; Database "SFB2" added DGMGRL> add database 'SFC1' as connect identifier is 'SFC1'; Database "SFC1" added DGMGRL> add database 'SFC2' as connect identifier is 'SFC2'; Database "SFC2" added
Set log transfer property to SYNC. Set Maximum Availability
DGMGRL> edit database SFB2 set property LogXptMode=SYNC; Property "logxptmode" updated DGMGRL> edit database SFC1 set property LogXptMode=SYNC; Property "logxptmode" updated DGMGRL> edit database SFC2 set property LogXptMode=SYNC; Property "logxptmode" updated DGMGRL> edit configuration set protection mode as MaxAvailability; Succeeded.
Test switchovers. Primary SFB1->SFB2->SFC1->SFC2->SFB1.
DGMGRL> switchover to SFB2 Performing switchover NOW, please wait... Operation requires a connection to database "SFB2" Connecting ... Connected to "SFB2" Connected as SYSDBA. New primary database "SFB2" is opening... Oracle Clusterware is restarting database "SFB1" ... Connected to "SFB1" Connected to an idle instance. Connected to an idle instance. Connected to an idle instance. Connected to "SFB1" Connected to "SFB1" Switchover succeeded, new primary is "sfb2" DGMGRL> switchover to SFC1 Performing switchover NOW, please wait... Operation requires a connection to database "SFC1" Connecting ... Connected to "SFC1" Connected as SYSDBA. New primary database "SFC1" is opening... Oracle Clusterware is restarting database "SFB2" ... Connected to "SFB2" Connected to an idle instance. Connected to an idle instance. Connected to an idle instance. Connected to "SFB2" Connected to "SFB2" Switchover succeeded, new primary is "sfc1" DGMGRL> switchover to SFC2 Performing switchover NOW, please wait... Operation requires a connection to database "SFC2" Connecting ... Connected to "SFC2" Connected as SYSDBA. New primary database "SFC2" is opening... Oracle Clusterware is restarting database "SFC1" ... Connected to "SFC1" Connected to an idle instance. Connected to an idle instance. Connected to an idle instance. Connected to "SFC1" Connected to "SFC1" Switchover succeeded, new primary is "sfc2" DGMGRL> switchover to SFB1 Performing switchover NOW, please wait... Operation requires a connection to database "SFB1" Connecting ... Connected to "SFB1" Connected as SYSDBA. New primary database "SFB1" is opening... Oracle Clusterware is restarting database "SFC2" ... Connected to "SFC2" Connected to an idle instance. Connected to an idle instance. Connected to "SFC2" Connected to "SFC2" Switchover succeeded, new primary is "sfb1"
Now we are ready to implement Maximum Protection mode
Set Maximum Protection
DGMGRL> edit configuration set protection mode as MaxProtection; Succeeded.
Test switchovers again. There is no issue
DGMGRL> switchover to SFC2 Performing switchover NOW, please wait... Operation requires a connection to database "SFC2" Connecting ... Connected to "SFC2" Connected as SYSDBA. New primary database "SFC2" is opening... Oracle Clusterware is restarting database "SFB1" ... Connected to "SFB1" Connected to an idle instance. Connected to an idle instance. Connected to an idle instance. Connected to "SFB1" Connected to "SFB1" Switchover succeeded, new primary is "sfc2"
Network failures
Failure of some, but not all standbys
Here, where you are testing actual network failures, is where you are going to notice serious issues with Maximum Protection mode.
The initial status is normal. All standbys are in sync with the primary.
DGMGRL> show configuration
Configuration - CFG
Protection Mode: MaxProtection
Members:
SFB1 - Primary database
SFB2 - Physical standby database
SFC1 - Physical standby database
SFC2 - Physical standby database
Fast-Start Failover: Disabled
Configuration Status:
SUCCESS (status updated 21 seconds ago)
Now we want to cause a network failure to one standby, SFC2, for example. We stage the failure by cutting TCP port 1521. In the AWS console, create a new security group that allows ingress to TCP port 22 only.

Go to change security groups. Un-check the original security group, check the “ssh only” security group, and press “Assign Security Groups”. There are three standbys, but we are cutting off network access to only one standby. The system should continue to operate normally. However, right away, you can notice a problem. Try to commit a transaction.
SQL> insert into t ( n ) values ( 0 ); 1 row created. SQL> commit;
You will find that the transaction does not complete. You can run “show configuration”. For 30 seconds, the configuration shows up as normal. No messages appear in the alert log. All the while, only one standby is unreachable. Two other standbys are reachable, but the database hangs. Eventually, Oracle notices the issue. Messages start appearing in the alert log.
2019-09-13T12:43:51.076718-04:00 LGWR (PID:26748): ORA-16198: Received timed out error from KSR LGWR (PID:26748): Attempting LAD:4 network reconnect (16198) LGWR (PID:26748): LAD:4 network reconnect abandoned
The transaction commits. The configuration now appears as:
DGMGRL> show configuration
Configuration - CFG
Protection Mode: MaxProtection
Members:
SFB1 - Primary database
SFB2 - Physical standby database
SFC1 - Physical standby database
SFC2 - Physical standby database
Warning: ORA-16857: member disconnected from redo source for longer than specified threshold
Fast-Start Failover: Disabled
Configuration Status:
WARNING (status updated 8 seconds ago)
You can repeat the process for a second standby, SFC1, for example. Again, the database hangs for 30 seconds, and finally resolves.
DGMGRL> show configuration
Configuration - CFG
Protection Mode: MaxProtection
Members:
SFB1 - Primary database
Error: ORA-16778: redo transport error for one or more members
SFB2 - Physical standby database
SFC1 - Physical standby database
Warning: ORA-16857: member disconnected from redo source for longer than specified threshold
SFC2 - Physical standby database
Warning: ORA-16857: member disconnected from redo source for longer than specified threshold
Fast-Start Failover: Disabled
Configuration Status:
ERROR (status updated 160 seconds ago)
This section, where failure of some, but not all standbys leads to an application hang is going to be objectionable to some application users.
Failure of all standbys
Now disconnect the third standby. Observe this behavior:
- The primary will retry the standbys for 5 minutes.
- Sessions that issue a commit will wait.
- The primary will abort and restart.
- During restart, the instance will hang on open database.
- After 8 more minutes, the primary will abort.
Host shutdown is the expected behavior after a failure of all standbys. The following message will appear in SQL*Plus sessions that were waiting on commit.
commit * ERROR at line 1: ORA-03113: end-of-file on communication channel Process ID: 10524 Session ID: 39 Serial number: 7415
Some may find it surprising that the shutdown is not instantaneous, and that it takes five minutes before the database shuts down.
Adjusting NetTimeout
You can reduce the amount of time that an application can wait on TCP timeout:
DGMGRL> edit database SFB1 set property NetTimeout = 5; Property "nettimeout" updated DGMGRL> edit database SFB2 set property NetTimeout = 5; Property "nettimeout" updated DGMGRL> edit database SFC1 set property NetTimeout = 5; Property "nettimeout" updated DGMGRL> edit database SFC2 set property NetTimeout = 5; Property "nettimeout" updated
This is a workaround and is beside the point. The application still has to wait to commit even though one or more standbys is still available. There should be no hanging.
Conclusion
There is a flaw in Data Guard Maximum Protection. If a TCP timeout error involving only one standby, the primary will hang even if other standbys are available. This issue will lead some administrators to rule out Data Guard Maximum Protection.
In the past, because of high hardware costs, long lead times, and separation of duties across DBA and SAs, it was rare to test Maximum Protection. Now, with the cloud, testing such arrangements is convenient, but testing uncovers weakness in the design.
Likewise, in the past, because of separation of duties across DBAs and network administrators, it was rare for DBAs to stage TCP timeout tests, but such tests are simple to perform now. TCP timeout tests uncover weaknesses in Data Guard.
Data Guard Maximum Availability has improved over time to the point where it is an acceptable substitute for Maximum Performance in some cases.
Oracle 12.1 introduced the far sync instance, which can be configured near a primary database on a low latency network to provide near-zero-data loss capability. Maximum Availability and far sync are described in this article. An Active Data Guard license is required for far sync. Ironically, in this sense, Maximum Availability costs more than Maximum Protection, the highest level.