Introducing Early Cascade Injection

This article presents Early Cascade Injection, a new process injection method that achieves stealth by avoiding cross-process APCs and minimizing remote interaction, combining ideas from Early Bird APC Injection and EDR-Preloading. It explains Windows process creation internals, how EDRs load in-process detection, and how Early Cascade uses a shim-engine callback and intra-process APC queuing to execute payloads early and reliably. #EarlyCascadeInjection #g_pfnSE_DllLoaded

Key Points

Early Cascade Injection is a novel process-injection technique that avoids cross-process APC queuing and limits remote process interaction to reduce detection risk.
The technique hijacks the ntdll shim-engine pointer g_pfnSE_DllLoaded (enabled via g_ShimsEnabled) to achieve early code execution during LdrInitializeThunk.
Because g_pfnSE_DllLoaded is in .mrdata (writable during suspend) and g_ShimsEnabled is in .data (writable throughout), the pointer can be hijacked and then disabled without changing memory protections.
To overcome the constraints of early execution (limited dependencies and loader lock), the payload stub calls NtQueueApcThread to queue an APC on the same initial thread (intra-process APC), which executes later when NtTestAlert empties the queue.
Early Cascade thereby achieves unrestricted execution after most DLLs (kernel32/kernelbase) have been loaded while avoiding suspicious cross-process behavior that modern EDRs watch for.
EDRs commonly inject shellcode and hook LdrLoadDll, LdrpLoadDll, or NtContinue during process creation; Early Cascade aims to preempt or disrupt those initialization steps.
The technique is undocumented and relies on internal ntdll pointers, so it may be fragile across Windows updates and EDR countermeasures.

MITRE Techniques

[T1055] Process Injection – General process-injection methods used to run code in the context of another process; here implemented by writing payloads into a suspended child process and executing them early in initialization (‘Early Cascade Injection combines elements from existing techniques to achieve stealthy process injection without detection.’)
[T1055.004] Asynchronous Procedure Call – Use of APCs to execute code in a thread context; Early Cascade queues an APC on the initial thread to run the main payload when NtTestAlert empties the APC queue (‘By queuing an APC on itself, the technique allows for unrestricted execution later in the process creation.’)
[T1574] Hijack Execution Flow – Overwriting or hijacking in-process callback pointers to divert execution flow; both EDR-Preloading and Early Cascade overwrite ntdll callback pointers like AvrfpAPILookupCallbackRoutine and g_pfnSE_DllLoaded (‘hijacking the ntdll!AvrfpAPILookupCallbackRoutine callback pointer…’)

Indicators of Compromise

[Domain] referenced sources – outflank.nl, malwaretech.com
[File/module names] Windows modules involved – ntdll.dll, kernel32.dll (context: modules loaded during process creation and targets for pointer hijacking)
[Function/pointer names] targeted symbols and APIs – g_pfnSE_DllLoaded, g_ShimsEnabled, AvrfpAPILookupCallbackRoutine, NtQueueApcThread (context: callback pointers hijacked and API used to queue APCs)

This write-up introduces Early Cascade Injection, a newly described method for stealthy process injection on Windows that sidesteps cross-process APC queueing and restricts remote interaction to simple memory operations. The technique draws inspiration from Early Bird APC Injection and recent EDR-Preloading research but changes the execution path by hijacking a shim-engine callback in ntdll and then queuing an APC on the initial thread to execute a full-featured payload later in initialization.

Windows offers multiple APIs to create processes—CreateProcess, CreateProcessAsUser, CreateProcessWithLogon—and they all ultimately call NtCreateUserProcess in ntdll, which transitions into the kernel to perform the kernel-mode portion of process creation. When the CREATE_SUSPENDED flag is used, the kernel creates the initial thread in a suspended state; that thread only begins user-mode process initialization when resumed, starting at LdrInitializeThunk, the image loader in ntdll. LdrInitializeThunk performs several important tasks: it initializes the Loader Lock in the PEB, sets up the .mrdata section, inserts the first LDR_DATA_TABLE_ENTRY for ntdll into the PEB module lists, configures the parallel loader, creates the LDR_DATA_TABLE_ENTRY for the executable, and maps and initializes initial modules such as kernel32.dll and kernelbase.dll. Only after mapping and initializing dependencies does LdrInitializeThunk call NtTestAlert (which empties the calling thread’s APC queue) and then NtContinue to transfer execution to the application entry point.
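The ordering in the paragraph above can be written down as a small lookup table. This is only a toy model of the sequence as described here, not real loader code; the stage labels and the helper names `stage_index` and `runs_before` are illustrative assumptions.

```python
# Toy model of the LdrInitializeThunk stage ordering described in the text.
# Stage names are paraphrased labels, not real ntdll symbols.
LDR_INIT_STAGES = [
    "initialize Loader Lock in the PEB",
    "set up .mrdata section",
    "insert LDR_DATA_TABLE_ENTRY for ntdll",
    "configure parallel loader",
    "create LDR_DATA_TABLE_ENTRY for the executable",
    "map and initialize kernel32.dll / kernelbase.dll",
    "NtTestAlert (drain the thread's APC queue)",
    "NtContinue (jump to application entry point)",
]


def stage_index(keyword: str) -> int:
    """Return the position of the first stage whose label contains keyword."""
    for i, name in enumerate(LDR_INIT_STAGES):
        if keyword in name:
            return i
    raise ValueError(f"no stage matching {keyword!r}")


def runs_before(a: str, b: str) -> bool:
    """True if the stage matching a comes before the stage matching b."""
    return stage_index(a) < stage_index(b)


if __name__ == "__main__":
    for number, name in enumerate(LDR_INIT_STAGES, 1):
        print(f"{number}. {name}")
```

For example, `runs_before("kernel32", "NtTestAlert")` is true in this model, which is the property the APC-based techniques discussed in this article depend on.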

Because LdrInitializeThunk orchestrates both mapping and initialization of DLLs, a number of techniques have targeted specific points in this sequence to obtain code execution before EDR detection measures are active. Early Bird APC Injection, discovered in 2018, uses cross-process APC queuing: an attacker creates a process in a suspended state, writes a payload into the target, queues an APC to the target's initial thread with QueueUserAPC, and then resumes the thread so that NtTestAlert invokes the queued APC during LdrInitializeThunk, executing the injected code early in process initialization. That early execution can preempt some user-mode detection hooks, but cross-process APC queuing is now a suspicious behavior that many EDRs monitor closely, making Early Bird more detectable than before.

EDR-Preloading, a more recent idea, achieves early execution by overwriting callback pointers in ntdll, such as AvrfpAPILookupCallbackRoutine, and enabling the callback flag so the malicious pointer runs during LdrLoadDll. This runs extremely early, when only ntdll is reliably present, allowing an attacker to potentially prevent an EDR’s hooking DLL from initializing. However, code run through AvrfpAPILookupCallbackRoutine is highly constrained: only ntdll and its undocumented NTAPI functions are reliably available, and the Loader Lock is held while the callback runs, so loading additional DLLs or spawning threads is unsafe because those operations would deadlock on the Loader Lock. As a result, EDR-Preloading gives early but limited execution capability.
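The Loader Lock constraint can be illustrated with an ordinary lock. The sketch below is a generic analogy in Python, not Windows code: `loader_lock` merely stands in for the Loader Lock, and the worker's need for the lock during startup is an assumption of the model. The worker uses a timeout so the demonstration ends instead of hanging.

```python
import threading

# Stand-in for the Loader Lock (reentrant for the thread that owns it).
loader_lock = threading.RLock()


def worker(result: dict) -> None:
    """A new thread that needs the lock before it can finish starting up."""
    acquired = loader_lock.acquire(timeout=0.2)
    result["acquired"] = acquired
    if acquired:
        loader_lock.release()


def callback_that_spawns_and_waits() -> bool:
    """Spawn a worker and wait for it; return True if it obtained the lock."""
    result = {}
    thread = threading.Thread(target=worker, args=(result,))
    thread.start()
    thread.join()  # returns because the worker gives up after its timeout
    return result["acquired"]


def simulate() -> bool:
    """Run the callback while the lock is held, as in the constrained case."""
    with loader_lock:
        return callback_that_spawns_and_waits()
```

In this model `simulate()` reports that the worker never obtained the lock. Without the timeout, the owner would wait on the worker while the worker waits on the owner, which is the deadlock the paragraph above describes.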

During analysis of LdrInitializeThunk and related functions, the authors discovered an alternate callback pointer in ntdll named g_pfnSE_DllLoaded that belongs to the Shim Engine and is located in the .mrdata section. This pointer is invoked within LdrpSendPostSnapNotifications as part of LdrpPrepareModuleForExecution, and unlike AvrfpAPILookupCallbackRoutine it appears not to run under the Loader Lock. The shim-engine enable flag g_ShimsEnabled resides in the .data section and can be toggled without changing memory protections. Because .mrdata is still writable when the target process is in CREATE_SUSPENDED state, and because g_ShimsEnabled remains writable later, an attacker can write a small payload stub into the process, set g_pfnSE_DllLoaded to that stub, and set g_ShimsEnabled to 1 so the stub runs during process initialization. The stub immediately clears g_ShimsEnabled to avoid executing other uninitialized shim pointers, so the process does not crash.

Even though g_pfnSE_DllLoaded runs outside the Loader Lock, the environment is still somewhat constrained, and the researchers found that fully functional shellcode could not reliably run directly from that callback. To overcome those constraints, they designed a two-stage approach: the payload stub executed via g_pfnSE_DllLoaded invokes NtQueueApcThread (available in ntdll and not subject to the Loader Lock) to queue an APC on the initial thread itself (intra-process APC). By queuing the APC early but on the same thread, the technique avoids cross-process APC behavior while ensuring the queued APC will be executed later, when LdrInitializeThunk calls NtTestAlert to empty the APC queue. That later execution occurs after most DLLs, including kernel32.dll and kernelbase.dll, are loaded and initialized, providing a richer environment for a main payload such as an implant and avoiding deadlocks associated with loader synchronization primitives.

Putting these parts together, Early Cascade Injection proceeds by creating a child process in suspended mode, writing a two-part payload to its memory (a small payload stub and a larger main payload), and then overwriting the child’s ntdll g_pfnSE_DllLoaded pointer with the payload stub and setting g_ShimsEnabled to 1. When the suspended process is resumed, the initial thread runs LdrInitializeThunk and eventually triggers g_pfnSE_DllLoaded, which runs the payload stub. The stub disables g_ShimsEnabled to prevent other shim callbacks from firing, and then uses NtQueueApcThread to queue an APC that points to the main payload in process memory. Later, NtTestAlert empties the APC queue, the APC runs in the thread context with a full set of initialized DLLs, and the main payload executes. Because the queued APC is intra-process rather than cross-process, Early Cascade avoids the suspicious cross-process indicators that modern EDRs often flag, while still achieving early and powerful execution.
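The control flow above is easier to follow as an ordering of events. The following is a purely conceptual simulation: it touches no Windows APIs, and `ToyThread`, `stub`, `main_payload`, and the DLL load order are all invented for illustration. It shows only that a callback which fires once and queues work on its own thread's queue gets that work run at the later drain point, before the entry point.

```python
from collections import deque


class ToyThread:
    """Abstract model of an initial thread with a callback slot and a queue."""

    def __init__(self):
        self.apc_queue = deque()
        self.log = []
        self.loaded = {"ntdll.dll"}
        self.shims_enabled = False       # models the enable flag
        self.dll_loaded_callback = None  # models the callback pointer

    def queue_apc(self, fn):
        self.apc_queue.append(fn)

    def load_dll(self, name):
        self.loaded.add(name)
        if self.shims_enabled and self.dll_loaded_callback:
            self.dll_loaded_callback(self, name)

    def test_alert(self):
        # Drain the queue, as the text says NtTestAlert does.
        while self.apc_queue:
            self.apc_queue.popleft()(self)

    def run_init(self):
        for dll in ("kernel32.dll", "kernelbase.dll"):  # order is arbitrary
            self.load_dll(dll)
        self.test_alert()
        self.log.append("entry point")


def stub(thread, dll_name):
    thread.shims_enabled = False      # clear the flag so the stub fires once
    thread.log.append(f"stub (on {dll_name})")
    thread.queue_apc(main_payload)    # hand off to the same thread's queue


def main_payload(thread):
    thread.log.append(f"main ({len(thread.loaded)} modules loaded)")


def simulate():
    thread = ToyThread()
    thread.dll_loaded_callback = stub
    thread.shims_enabled = True
    thread.run_init()
    return thread.log
```

In this model `simulate()` returns the events in the order stub, main, entry point, and the stub runs only once because it clears the flag before the second module loads.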

This method offers several operational advantages: it limits remote actions to memory allocation, protection, and writing; it hijacks a pointer in .mrdata while relying on a writable flag in .data to enable and then safely disable the shim callback without changing memory protections; and it moves heavy lifting out of the constrained early callback into an APC that runs after the loader has completed most initialization. At the same time, the technique depends on undocumented internal pointers in ntdll and the behavior of loader internals, so it may be fragile across Windows updates or changes to the shim engine.
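From a defender's perspective, the remote footprint described above (writes into the child's ntdll data before the first resume) suggests one possible heuristic. The sketch below is an assumption-laden illustration, not a tested detection: the `Event` schema, the function name, and the idea that such telemetry is available are all hypothetical, and legitimate software may also write into a suspended child.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical telemetry record for one action against a child process."""
    kind: str        # "write" or "resume" in this toy schema
    address: int = 0
    size: int = 0


def flags_ntdll_write_before_resume(events, ntdll_base, ntdll_size):
    """Return True if a remote write overlaps ntdll before the first resume."""
    ntdll_end = ntdll_base + ntdll_size
    for event in events:
        if event.kind == "resume":
            return False  # only activity before the first resume is considered
        overlaps = (event.address < ntdll_end
                    and event.address + event.size > ntdll_base)
        if event.kind == "write" and overlaps:
            return True
    return False
```

A real implementation would need to narrow the range to the relevant sections and account for benign writers; this version flags any overlapping write.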

The researchers also examined how EDRs load their user-mode detection measures so they can better position their early-control techniques. Many EDRs register kernel process-creation callbacks and, when a new process is created, inject shellcode and place inline hooks on functions called during LdrInitializeThunk (such as LdrLoadDll, LdrpLoadDll, or NtContinue) to redirect execution to the injected shellcode that loads the EDR’s hooking DLL. That shellcode typically invokes LdrLoadDll to load the hooking DLL, whose DllMain then installs hooks on API functions; after loading, the EDR removes the inline hook and lets the process proceed. Depending on where an EDR inserts its hook, attackers can attempt to preempt or disrupt the EDR’s initialization: techniques that execute before those hooks are invoked can run without being intercepted, while g_pfnSE_DllLoaded provides a point to interfere with a DLL’s initialization so that a hooking DLL may not initialize correctly. In short, Early Cascade and similar callback-hijacking approaches aim to take control before EDR user-mode measures are fully active or to interrupt their initialization sequence.

In conclusion, Early Cascade Injection combines the early control of callback-pointer hijacking with a safer, intra-process APC handoff to execute a robust payload at a point in initialization where most DLLs are present and EDR hooks may not have completed their setup. The approach reduces noisy cross-process behaviors, minimizes remote interaction, and demonstrates how deep knowledge of LdrInitializeThunk and related loader internals can yield new injection primitives. Because of operational risk, the authors have not released source code publicly; the techniques and implementations are available to vetted members of Outflank Security Tooling (OST) rather than being published broadly. Practitioners and defenders should note that the method relies on undocumented ntdll internals and that EDR vendors could adapt detection strategies or change their load timing to mitigate it in future updates.


Introducing Early Cascade Injection | Outflank Blog