by Brian Fitzgerald
Executive summary
Copy Fail (CVE-2026-31431) is a Linux kernel local privilege escalation that lets an unprivileged user become root with a single deterministic exploit. It is dangerous mainly in combination with remote vulnerabilities: any foothold that runs code as a non-root user (a compromised web service, a container) chains into root on the host. The fix is to install the patched kernel and reboot. Two cases are easy to miss: instances launched from older AMIs and instances restored from pre-patch snapshots remain vulnerable until they, too, are patched and rebooted. For hosts that can’t be patched immediately, blacklisting the algif_aead initcall on the kernel command line is an effective interim mitigation.
Introduction
Copy Fail (CVE-2026-31431) is a local privilege escalation (LPE) vulnerability in the Linux kernel’s algif_aead module, the AEAD interface of the kernel’s user-space crypto API (AF_ALG). It was disclosed publicly on April 29, 2026, and it affects essentially every mainline Linux kernel released since 2017 — across Ubuntu, RHEL, Amazon Linux, SUSE, Debian, and others. Its CVSS 3.1 score is 7.8 (High).
On its own, the bug is only a local privilege escalation: the attacker must already have unprivileged access to the host. The danger lies in what it can be paired with. Any remote vulnerability that yields code execution as a non-root user becomes a path to root on the host. That combination of a remote foothold and a reliable, deterministic LPE is what makes Copy Fail a priority to address.
The fix is a kernel update. Three operational details are worth keeping in mind:
- Patched hosts must be rebooted. The fixed kernel does not take effect until it is the running kernel.
- Newly launched EC2 instances must be patched too, even if the AMI was current as of last week. Any AMI built before the fix landed in the vendor’s repos ships a vulnerable kernel.
- Instances restored from snapshots boot the kernel that was installed when the snapshot was taken. A snapshot from before the patch restores to a vulnerable host, regardless of how recently it was restored.
This article walks through patching a single RHEL 10 EC2 instance on my personal AWS account and confirming, with a public proof-of-concept, that the patched kernel is no longer exploitable. It also revisits the questions and workarounds that came up in the days between disclosure and the kernel patch becoming available.
Demonstrate the bug
The instance is running RHEL 10.1 with the pre-patch kernel. Logged in as the unprivileged ec2-user, running the public proof-of-concept yields a root shell:
[ec2-user@ip-10-2-0-34 ~]$ python3.12 copy_fail_exp.py
[root@ip-10-2-0-34 ec2-user]# whoami
root
[root@ip-10-2-0-34 ec2-user]# id -u
0
No password prompt, no SSH key, no sudo — the exploit runs from a normal user shell and the next prompt is root. This is the deterministic behavior the vulnerability is known for: no race condition, no offsets, no retries.
The exploit script targets a setuid-root binary — /usr/bin/su — and on the unpatched kernel it tampers with that binary’s in-memory pages so that invoking it yields a root shell.
Day 0: can I just uninstall the affected piece?
On the day of disclosure, before any vendor guidance was out, the natural first question was whether the vulnerable subsystem could simply be removed — uninstall aead, uninstall authencesn, uninstall AF_ALG. The answer is no. There’s no algif-aead package, no af-alg package, no authencesn package. The vulnerable subsystem is part of the kernel image itself — built in on the affected distributions, not shipped as a separately installable or removable component. There is nothing to uninstall. The only way to change the code in question through the package manager is to replace the kernel, and on day 0 there was no fixed kernel to replace it with.
That left waiting for vendor guidance.
Day 1: blacklist algif_aead with grubby
The first official mitigation, ahead of the patched kernel, was to blacklist the algif_aead initcall on the kernel command line via grubby and restart:
[root@ip-10-2-0-34 ~]# grubby --update-kernel=ALL --args="initcall_blacklist=algif_aead_init"
After the restart, the AEAD interface of AF_ALG is never registered. The exploit script doesn’t get as far as the page-cache write that fails on the patched kernel; it fails earlier, at the system calls that set up the AF_ALG socket. With algif_aead blacklisted, bind() fails with ENOENT, which Python raises as FileNotFoundError. If AF_ALG is blacklisted entirely, socket() fails first, with EAFNOSUPPORT. Either way, the exploit doesn’t run.
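The two failure modes described above can be checked directly, without running the exploit. The following is a minimal sketch, not part of the public proof-of-concept: it performs only the socket() and bind() setup calls and reports which one fails. The function name probe_aead and the returned labels are hypothetical, and gcm(aes) is an assumed AEAD algorithm name that a typical kernel provides; if that algorithm is missing for some other reason, bind() would also report ENOENT.

```python
import errno
import socket


def probe_aead(alg_name: str = "gcm(aes)") -> str:
    """Report how far AF_ALG AEAD socket setup gets for the current user.

    Returns one of (labels are illustrative, not a standard):
      "reachable"           socket() and bind() both succeeded
      "aead-unavailable"    bind() failed with ENOENT
      "af-alg-unavailable"  socket() failed with EAFNOSUPPORT, or this
                            Python build has no AF_ALG constant at all
      "error:<ERRNO>"       some other failure (for example a seccomp filter)
    """
    family = getattr(socket, "AF_ALG", None)  # absent on non-Linux builds
    if family is None:
        return "af-alg-unavailable"

    try:
        sock = socket.socket(family, socket.SOCK_SEQPACKET, 0)
    except OSError as exc:
        if exc.errno == errno.EAFNOSUPPORT:
            return "af-alg-unavailable"
        return "error:" + errno.errorcode.get(exc.errno, "unknown")

    with sock:
        try:
            # AF_ALG addresses are (type, name) tuples in Python.
            sock.bind(("aead", alg_name))
        except FileNotFoundError:
            # ENOENT: no "aead" type registered (or the algorithm is missing).
            return "aead-unavailable"
        except OSError as exc:
            return "error:" + errno.errorcode.get(exc.errno, "unknown")
    return "reachable"


if __name__ == "__main__":
    print(probe_aead())
```

A result of "reachable" on an unpatched kernel suggests the mitigation is not in effect for the current boot. Any other result only shows that this particular setup path is closed.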
Patch the kernel
The fix is delivered through the standard kernel package. On RHEL 10:
[root@ip-10-2-0-34 ~]# dnf update kernel
and so on …
Installing:
kernel x86_64 6.12.0-124.55.1.el10_1 rhel-10-baseos-rhui-rpms 1.4 M
… and so on
Installed:
kernel-6.12.0-124.55.1.el10_1.x86_64 kernel-core-6.12.0-124.55.1.el10_1.x86_64 kernel-modules-6.12.0-124.55.1.el10_1.x86_64 kernel-modules-core-6.12.0-124.55.1.el10_1.x86_64
Complete!
Restart the host
I restarted the EC2 instance from the AWS console. Running reboot from a root shell works as well.
Demonstrate that the bug is patched
[root@ip-10-2-0-34 ~]# uname -r
6.12.0-124.55.1.el10_1.x86_64
[root@ip-10-2-0-34 ~]# exit
[ec2-user@ip-10-2-0-34 ~]$ python3.12 copy_fail_exp.py
Password:
On the patched kernel the tampering step does not happen, so /usr/bin/su runs normally and prompts for a password, which is exactly what an unprivileged user trying to become root should encounter.
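Once the patched kernel is the running kernel, the initcall_blacklist=algif_aead_init argument can be removed from the kernel command line. After the next restart, legitimate code can resume using the kernel’s user-space crypto interface.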
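The uname -r check can also be scripted. The sketch below compares the running release against the fixed RHEL 10.1 build shown in the dnf output. It assumes a simple field-by-field numeric comparison is good enough within one RHEL minor stream; real RPM version ordering has more rules, and other distributions ship different fixed versions. The names release_key and is_patched are hypothetical.

```python
import platform
import re

# Fixed kernel build for RHEL 10.1, taken from the dnf transcript in this article.
FIXED_RELEASE = "6.12.0-124.55.1.el10_1"


def release_key(release: str) -> tuple:
    """Numeric fields of a RHEL-style release string, up to the '.el' dist tag.

    "6.12.0-124.55.1.el10_1.x86_64" becomes (6, 12, 0, 124, 55, 1).
    """
    head = release.split(".el", 1)[0]
    return tuple(int(n) for n in re.findall(r"\d+", head))


def is_patched(release: str, fixed: str = FIXED_RELEASE) -> bool:
    """True if `release` is at or above `fixed` under the numeric comparison.

    Assumption: both strings come from the same RHEL minor stream.
    """
    return release_key(release) >= release_key(fixed)


if __name__ == "__main__":
    running = platform.release()  # same string that `uname -r` prints
    print(running, "patched" if is_patched(running) else "NOT patched")
```

Because platform.release() reports the running kernel rather than the installed one, a host that has been updated but not restarted still shows as not patched, which is the behavior wanted here.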
What’s interesting about the chain
What stands out about Copy Fail is how unremarkable each step of the exploit is on its own. Reading a file as an unprivileged user is allowed. The kernel’s page cache is unified, so the pages a read brought in are the same pages a later execution would run from — that’s a feature, not a bug, and it’s what makes cache-warming tricks work. Opening a socket — including the user-space crypto sockets the AEAD interface provides — is allowed. Splicing data between file descriptors is allowed. None of these steps requires elevated privileges, and none of them looks suspicious in isolation. The exploit is a sequence of ordinary, sanctioned operations strung together so that a kernel weakness at one step lands a specially crafted change in a place that affects what the next setuid execution does. That’s what makes a local privilege elevation worth taking seriously even when the immediate prerequisite is “any unprivileged code execution on the host” — the building blocks are everywhere.
What this means going forward
Copy Fail is unlikely to be the last bug of this shape. About a week after Copy Fail was disclosed, a second LPE was disclosed under the name Dirty Frag (CVE-2026-43284 and CVE-2026-43500). Different kernel subsystems — xfrm-ESP and RxRPC — and different code paths, but the same underlying pattern: an in-place operation writing into page-cache pages that aren’t privately owned by the kernel.
The page cache is a large, well-trafficked piece of kernel infrastructure, and the in-place-optimization pattern that Copy Fail and Dirty Frag both exploit shows up in more than a few places. It would be surprising if the next several months didn’t bring more bugs in this family.
Two things about how these bugs are being found and disclosed are worth flagging. First, AI-assisted code analysis is now a real factor in kernel vulnerability discovery. Copy Fail was found that way — the researchers who disclosed it have said so explicitly — and the fact that the underlying weakness had been sitting in the kernel since 2017, through nine years of human review, suggests the new tooling is reaching code paths and combinations that didn’t get attention before. The same is likely true of Dirty Frag, where the older of the two bugs also dates to 2017. The economics of this kind of analysis are different from a human researcher’s: it scales, it doesn’t get bored, and it can re-examine entire subsystems whenever a new pattern is identified. The pace of discoveries should be expected to follow.
Second, the window between disclosure and a working public exploit is shrinking. Copy Fail had a working PoC out the day of disclosure. Dirty Frag was disclosed ahead of schedule because a third party broke the coordinated embargo, and a working PoC was public before any distribution had a patched kernel ready to ship. “Patch as soon as the vendor ships” is still the right answer, but it presupposes a vendor patch is available. When the disclosure outruns the patch, the question on the table is what the fleet does in the interim — which mitigation, applied through which mechanism, tracked how, removed when. That used to be an exceptional case; it is becoming an ordinary one.
The combination — more bugs, found faster, disclosed with less runway — argues for a fleet posture that can move quickly through several states. A known patching path for when patches exist. A known mitigation path for the gap before they do. A way to know, for any given host, which state it’s in. The specific bugs will change. The shape of the response is what’s worth investing in.
Conclusion
On its own, Copy Fail is a local privilege elevation — it requires the attacker to already have code execution on the host. In a cloud environment, that prerequisite is met by a wide range of common scenarios. Any remote bug in a public-facing service, any compromised CI runner, any malicious dependency that runs during a build, any container that can be coerced into executing attacker-supplied code, is enough of a foothold to chain into root on the host. The LPE is what turns a low-impact remote bug into a host compromise.
Patching the kernel and restarting the instance is the fix. Two operational notes are worth keeping in mind:
- New instances need to be patched, too. An AMI that was current last month is not current now. Any instance launched from a pre-patch image needs dnf update kernel (or the equivalent) and a restart before it should be considered safe.
- Snapshots carry the kernel that was running when they were taken. Restoring a snapshot from before the fix produces a vulnerable host, even if the restore happened five minutes ago. The same patch-and-restart step applies.
Each host should land in one of two states: a patched kernel verified by uname -r, or the vulnerable code path made unreachable through a tracked mitigation such as the initcall_blacklist boot argument. Hosts in neither bucket are still exposed.
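The bucketing described above can be sketched as a small classifier. This is an illustration under stated assumptions, not a fleet tool: the function names and the three state labels are hypothetical, and the caller is assumed to decide separately whether the running kernel is patched. The boot arguments are passed in as a string, which on a live host would come from /proc/cmdline.

```python
def blacklisted_initcalls(cmdline: str) -> set:
    """Collect function names from initcall_blacklist= boot arguments.

    The kernel accepts a comma-separated list, for example
    "initcall_blacklist=foo_init,algif_aead_init".
    """
    names = set()
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "initcall_blacklist" and sep:
            names.update(v for v in value.split(",") if v)
    return names


def host_state(kernel_is_patched: bool, cmdline: str) -> str:
    """Classify a host as "patched", "mitigated", or "exposed".

    kernel_is_patched: result of a separate check on the running kernel.
    cmdline: boot arguments of the running kernel (contents of /proc/cmdline).
    """
    if kernel_is_patched:
        return "patched"
    if "algif_aead_init" in blacklisted_initcalls(cmdline):
        return "mitigated"
    return "exposed"
```

For example, host_state(False, "ro quiet initcall_blacklist=algif_aead_init") returns "mitigated". Reading /proc/cmdline rather than the bootloader configuration matters: it reflects the arguments the running kernel was actually started with, so a grubby change that has not yet been followed by a restart still shows up as "exposed".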