Introducing CWE-1265: A New Way to Understand Vulnerable Reentrant Control Flows

August 27, 2020 | Simon Zuckerbraun

On June 25, 2020, the MITRE Corporation released version 4.1 of the CWE List¹. Among the changes was the addition of a new software weakness entry that I contributed: CWE-1265: Unintended Reentrant Invocation of Non-reentrant Code Via Nested Calls. In this article, we’ll examine the significance of this new CWE entry and explore some examples in various categories of software.

The publication of this CWE is not intended to herald a “new bug class”. Rather, it is intended as a new conceptual framework for understanding certain vulnerabilities that have been known to exist and continue to be found today. Frequently these appear as use-after-free vulnerabilities. This CWE reveals a deeper aspect to their root cause. I hope that this CWE will bring you a new perspective on these vulnerabilities that will prove useful to you in your research.

Background: Non-Reentrant Code

Problems of reentrance have long been recognized as a source of errors in software. Code is called “non-reentrant” if it has not been designed for safe operation in the presence of reentrant execution. “Reentrant execution” is defined as the entrance of a unit of code during a time frame in which that unit of code is already executing.

When a unit of code is non-reentrant, the hazard generally stems from the way it modifies global state. By “global state” we mean to include any data storage except for those local variables with duration limited to a single function invocation. Global state includes not only what is commonly referred to as global variables and buffers, but also includes general heap allocations and object-based storage.

While discussions of non-reentrant code have frequently centered on non-reentrant functions, I find it useful to expand the definition to include larger units of code. In general, any unit of code responsible for managing a unit of global state can be termed “non-reentrant” if the way it modifies global state presents reentrance hazards. A C++ class, for instance, is a unit of code that has responsibility for managing a unit of global state, namely, the class’s own instance fields. Very commonly, a C++ class will be written with the assumption that, given a single instance of the class, public methods of that instance will be called only in a sequential and non-overlapping manner. The method implementations do not expect to be interrupted by other method invocations on that same class instance, except for those caller-callee relationships that are inherent in the implementation, where one method is defined in terms of another.

For example, in the following C++ code, the class defined is non-reentrant:

Figure 1: A non-reentrant class

This class is responsible for managing the state of the two fields x and xSquared. Upon return from a call to set_x, these two fields should always be consistent with one another. ClassA is not designed to be reentrant. Were the execution of set_x to be interrupted by a second invocation of set_x on the same instance, the following might occur:

     -- The first invocation begins executing. Let’s say the argument passed is 2.
     -- The first invocation sets field x to 2.
     -- A second invocation now interrupts (we will discuss possible mechanisms later in this article). Let’s say the argument passed to this invocation is 3.
     -- The second invocation starts and finishes, setting field x to 3 and field xSquared to 9.
     -- The first invocation resumes, and changes xSquared to 4.

The final result is that the fields of this class instance will be left in an inconsistent state, with x set to 3 and xSquared set to 4. ClassA is non-reentrant because it does not expect callers to invoke its public methods in an interleaved fashion. This does not mean that ClassA is buggy. On the contrary, what is shown in ClassA is typical and perfectly acceptable coding practice. What it does mean is that all code using ClassA must be written in a way that ensures that there will be no reentrant invocations of methods of any single instance of ClassA.

Conversely, when we consider the constructor of ClassA, we may note that method set_x is invoked in the middle of a call to the constructor, but this is not a matter for any concern. This is a fixture of the design of ClassA itself, and ClassA is specifically intended to operate in this fashion.
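A minimal sketch of a class matching the description above might look like the following. It is a reconstruction under assumptions, not the original listing: the `between_writes` hook and the accessor methods are hypothetical additions, included only so that the interleaving described in the list above can be replayed deterministically on one thread.

```cpp
#include <functional>
#include <utility>

// Sketch of a non-reentrant class in the spirit of ClassA as described above.
class ClassA {
public:
    // Hypothetical hook (an assumption for this demo, not part of the
    // described class): runs after x is written but before xSquared is
    // written, standing in for whatever interrupts the first invocation.
    std::function<void()> between_writes;

    ClassA() { set_x(0); }  // nested call by design; not a hazard

    void set_x(int value) {
        x = value;
        if (between_writes) {
            // Take the hook out first so the nested call does not recurse.
            auto hook = std::move(between_writes);
            between_writes = nullptr;
            hook();
        }
        xSquared = value * value;
    }

    int get_x() const { return x; }
    int get_xSquared() const { return xSquared; }
    bool consistent() const { return xSquared == x * x; }

private:
    int x = 0;
    int xSquared = 0;
};

// Replays the interleaving from the list above: the outer call passes 2,
// and a nested call passing 3 arrives between the two field writes.
inline ClassA interleaved_example() {
    ClassA a;
    a.between_writes = [&a] { a.set_x(3); };
    a.set_x(2);
    return a;
}
```

After `interleaved_example()` returns, the instance holds x equal to 3 and xSquared equal to 4, so `consistent()` reports false, matching the outcome described above.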

Classic Causes of Reentrant Execution

What type of circumstances produce reentrant execution? Classically, two scenarios are discussed:

1 - Signal handling: In this scenario, an event occurs that disrupts the normal execution of the program’s instructions. This may be a hardware interrupt, a processor fault, or a “signal” artificially injected into the running process via an OS-defined mechanism. In any of these events, normal program flow is diverted suddenly to a handler routine. At the conclusion of the handler routine, normal program flow may be resumed. Extreme care is in order when designing the handler routine. Since the handler routine can be invoked at arbitrary points within program execution, it is possible that non-reentrant code will be running at the moment the handler begins. If the handler routine itself invokes that non-reentrant code, state may be corrupted. This scenario is covered by CWE-479: Signal Handler Use of a Non-reentrant Function.

2 - Multithreading: In this scenario, the non-reentrant code is entered on more than one thread². If an invocation on one thread has not yet completed by the time a second invocation begins on another thread, state may be corrupted. This scenario is covered by CWE-663: Use of a Non-reentrant Function in a Concurrent Context.

A New Cause of Reentrant Execution

CWE-1265 describes a newly-recognized scenario for producing reentrant execution. As opposed to the two scenarios discussed in the previous section, this new cause of reentrancy is likely to occur only in complex software systems. It especially impacts systems in which there is interaction between trusted and partially-trusted code, such as web browsers.

The fundamental issue is this: When systems reach a certain level of complexity, it becomes overly taxing upon the developers to always fully recognize and anticipate all possible code paths that can emanate from a given function call. This is because a single function call can produce a deep call stack of other, nested calls, and a single call can lead to a great variety of different possible trees of nested invocations of various functions. An adversary may be able to select from among a very large number of these possible call trees by crafting input data or by performing carefully-chosen interactions with the system. This is especially true if the adversary is permitted to provide partially-trusted code, such as script, that will be executed (or interpreted) on the thread. The resulting risk is that a function call made from within non-reentrant code will lead to a hazardous reentrance via a deep and unanticipated chain of nested calls.

In this way, non-reentrant code can be reentered, even though no signal has occurred, and even though no other thread has intervened. Instead, the reentrance occurs purely via a nesting of calls. When a vulnerability arises in this way, then it is described by our new CWE, CWE-1265: Unintended Reentrant Invocation of Non-reentrant Code Via Nested Calls.

It is noteworthy that this CWE has nothing to do with concurrency. The relevant execution occurs exclusively on one thread. In fact, it is entirely possible for this CWE to apply to a vulnerability in a single-threaded application. It follows, therefore, that it cannot be mitigated by use of synchronization primitives.
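To illustrate why locking does not help, consider the following sketch. A lock excludes other threads, not the thread that already holds it. The `Guarded` class and its `callback` member are hypothetical names chosen for this example; the callback stands in for a nested call chain that reaches attacker-influenced code. A `std::recursive_mutex` is used because the owning thread may lock it again. A plain `std::mutex` would not be an improvement, since relocking it from the owning thread is undefined behavior (in practice, often a deadlock).

```cpp
#include <functional>
#include <mutex>
#include <utility>

class Guarded {
public:
    // Hypothetical callback invoked in the middle of an update.
    std::function<void(Guarded&)> callback;

    void update(int value) {
        // The lock is held for the whole update, yet it cannot stop the
        // same thread from entering update() again through the callback.
        std::lock_guard<std::recursive_mutex> lock(mutex_);
        ++depth_;
        if (depth_ > max_depth_) max_depth_ = depth_;

        x_ = value;
        if (callback) {
            auto cb = std::move(callback);
            callback = nullptr;
            cb(*this);  // same thread: re-acquiring the lock succeeds
        }
        x_squared_ = value * value;

        --depth_;
    }

    int x() const { return x_; }
    int x_squared() const { return x_squared_; }
    int max_depth() const { return max_depth_; }

private:
    std::recursive_mutex mutex_;
    int depth_ = 0;
    int max_depth_ = 0;
    int x_ = 0;
    int x_squared_ = 0;
};

struct LockDemoResult {
    int max_depth;
    int x;
    int x_squared;
};

// Single-threaded: the nested update runs while the outer one holds the lock.
inline LockDemoResult reenter_despite_lock() {
    Guarded g;
    g.callback = [](Guarded& self) { self.update(3); };
    g.update(2);
    return {g.max_depth(), g.x(), g.x_squared()};
}
```

In this sketch the recorded nesting depth reaches 2 and the fields end up as 3 and 4, even though every access happened under the lock.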

In some instances, the vulnerability arises due to a recursive invocation (either direct or indirect) of a single function that should not be invoked recursively. However, this is not generally the case. Rather, the key point is that some non-reentrant unit of code is reentered via a stack of calls. For example, during the execution of some public method of an object, a sequence of nested calls may arise that ultimately invokes a different public method of that same object. As discussed above, objects implemented as C++ classes are frequently written with the assumption that a client will invoke only one public operation at a time, and that the next public operation will not begin until the last one has returned. In our vulnerable scenario, this assumption is broken.

Vulnerabilities of this type have a major advantage over other reentrance-type bugs in terms of their exploitability. When reentrance occurs due to a signal handler or due to multithreading, there is an inherent race condition. The final state of the system will hinge upon slight differences in timing, for example, at which instruction the original execution is preempted. By contrast, CWE-1265 describes a class of vulnerabilities that are entirely deterministic and therefore lend themselves more readily to reliable exploitation.

As mentioned in the introduction to this article, a common consequence of CWE-1265 is a use-after-free. A use-after-free can easily occur if the attacker can cause the nested invocation to free memory that the outer invocation expects to remain allocated throughout its course of execution.
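The pattern can be sketched in a few lines. Everything here is hypothetical and chosen for illustration (`Widget`, `Scratch`, `on_layout`, `refresh`, `discard_state`); it does not model any particular product. To keep the example well-defined, the outer invocation checks whether its state was freed instead of actually writing through the stale pointer, which is what the vulnerable form of this code would do.

```cpp
#include <functional>
#include <memory>
#include <utility>

struct Scratch {
    int cached_value = 0;
};

class Widget {
public:
    // Hypothetical handler that nested calls can eventually reach.
    std::function<void(Widget&)> on_layout;

    Widget() : state_(std::make_unique<Scratch>()) {}

    // Public method that frees the owned state.
    void discard_state() { state_.reset(); }

    bool has_state() const { return state_ != nullptr; }

    // Public method that assumes state_ stays allocated for its duration.
    // Returns true if the write was performed, false if a nested call
    // freed the state first.
    bool refresh(int value) {
        Scratch* s = state_.get();
        if (!s) return false;

        run_layout();  // looks harmless, but may reach on_layout

        if (!state_) {
            // s now dangles. Code lacking this check would write through
            // it here, which is the use-after-free.
            return false;
        }
        s->cached_value = value;
        return true;
    }

private:
    void run_layout() {
        if (on_layout) {
            auto handler = std::move(on_layout);
            on_layout = nullptr;
            handler(*this);
        }
    }

    std::unique_ptr<Scratch> state_;
};

// The handler reenters the same object and frees what refresh() is using.
inline bool nested_free_detected() {
    Widget w;
    w.on_layout = [](Widget& self) { self.discard_state(); };
    bool wrote = w.refresh(42);
    return !wrote && !w.has_state();
}
```

Without a handler installed, `refresh` completes its write normally; the hazard appears only when the nested call path leads back into the same object.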

Next, we’ll explore the details of some published vulnerabilities that can be understood retroactively as examples of CWE-1265. We present three case studies.

Case Study 1: ZDI-CAN-3499 / CVE-2016-0114: Microsoft Internet Explorer Input Range Control Use-After-Free Remote Code Execution Vulnerability

ZDI-CAN-3499 is a vulnerability in Internet Explorer I discovered in January of 2016. Triggering this vulnerability produces a crash showing that a use-after-free has occurred. A deeper analysis, however, indicates that the true issue is one of reentrance.

The UAF is of a 0x90-byte structure owned by a CInput object (an object representing an HTML <input> element). The 0x90-byte structure does not have a name in public symbols, but from analysis, it is clear that it maintains the state of a slider UI. Accordingly, we will call it the Slider structure. It is instantiated only for input elements of the form <input type="range">.

Since the Slider structure is relevant only when the type attribute is range, IE will free the structure if script changes the type to anything other than range. This can result in a use-after-free. Although a fully functional PoC requires a great deal of additional complexity³, the essence of the trigger is as follows:

Figure 2: Essential code to trigger ZDI-CAN-3499

The PoC changes the CSS width attribute of the input element to cause a firing of the onresize event. func2 handles this event and changes the width property again, repeating the process several times until the vulnerable code path is invoked. During the final iteration, func2 changes the type attribute from range to text, freeing the Slider structure. The freed Slider structure is then accessed again shortly afterward, which is the UAF.

Here is the call stack at the time that the Slider structure is freed. All analysis shown below is from MSHTML 11.0.9600.18125 (Windows 7 x86, Dec. 2015 patch level).

Figure 3: Call stack for ZDI-CAN-3499 at time of free

Note the presence of CInput::CacheTrackContentRect on the call stack. This function plays a key role in the vulnerability, because it gives rise to the erroneous re-use of the Slider. The following code is excerpted from CInput::CacheTrackContentRect:

Figure 4: ZDI-CAN-3499: Excerpt of CInput::CacheTrackContentRect

CInput::CacheTrackContentRect retrieves the pointer to the Slider from a field of CInput located at offset +0xA0. It then calculates offset +0x7c within the Slider structure, and passes this address as a parameter to CElement::GetWidthHelper. CElement::GetWidthHelper uses it as an out parameter, meaning that it will write a result to the specified address before returning. The vulnerability arises because the Slider is freed prior to CElement::GetWidthHelper writing to the out parameter. When CElement::GetWidthHelper ultimately performs the write, it writes to memory that has already been freed. Hence this vulnerability is a use-after-free of a Slider structure.

With our understanding of CWE-1265, however, we can express the cause of this UAF with greater specificity. Referring to the call stack in Figure 3, notice that CInput::CacheTrackContentRect makes a call to CElement::GetWidthHelper, and this call results in a deep and convoluted call stack. Ultimately it produces a new invocation of a method of the CInput object, namely CInput::OnPropertyChange, modifying the state of the CInput and destroying the Slider. The code within CInput::CacheTrackContentRect, as shown in Figure 4, was not written to expect that the seemingly harmless call it makes to CElement::GetWidthHelper might produce a reentrant invocation of arbitrary methods of CInput, modifying the state of the CInput object.

Our conclusion is that the true root cause of ZDI-CAN-3499 is an unforeseen reentry of the CInput implementation. The authors of the code in MSHTML did not anticipate that new, arbitrary operations on the CInput would be initiated from within CInput::CacheTrackContentRect. Yet the call stack shows that, in the presence of adversarial input, it is indeed possible to produce such a nested invocation. The nested invocation alters state maintained by the CInput, violating an assumption about the state of the CInput made by the enclosing invocation of CInput::CacheTrackContentRect.

As illustrated in Figure 3, the hallmark of CWE-1265 is a call stack that unexpectedly reenters a unit of code that should not be reentered (here, the implementation of class CInput) via a convoluted and unforeseen sequence of nested calls, influenced by adversarial input.

Case Study 2: ZDI-CAN-6129 / CVE-2018-8275: Microsoft Chakra Array.splice Use-After-Free Remote Code Execution Vulnerability

This is a vulnerability in array handling in Microsoft’s Chakra JavaScript engine that I discovered in April 2018.

For a proper understanding of this vulnerability, we must start with a brief introduction to the way Chakra represents arrays in memory. Array storage in any modern JavaScript engine is quite a large topic, but the points we must understand at the moment are as follows: In JavaScript, it’s possible for a script to create an array and assign into various indexes in a non-contiguous fashion, making the array “sparse”. It is also possible and quite common for a script to create a more traditional sort of array, containing a large range of contiguous indexes. The JavaScript engine must be able to represent both kinds of arrays efficiently in memory.

In Chakra’s implementation, a JavaScript array is represented by a C++ object of type JavascriptArray, which maintains a linked list of array segments, each represented by a structure of type SparseArraySegment. Each segment stores the contents of a single contiguous range of indexes. This scheme solves the memory efficiency problem. A sparse array can be efficiently stored using a large number of small segments, while an array with a large contiguous range of indexes can be efficiently stored with a single large segment.

One problem remains, though: the segments are stored in a linked list, which does not offer time-efficient random access to indexes. To remedy this situation, a JavascriptArray can also maintain a hash map that maps indexes to pointers to the proper segments. This is known as the “segment map” and functions as a sort of cache. It is built only on demand. Prior to any array modification for which the map cannot be updated in a straightforward fashion, Chakra simply discards the map. This is called “clearing” or “dumping” the map. In that case, the map will be rebuilt the next time it is needed. At this point, you may have guessed that we will be talking about a cache invalidation issue, and you would be exactly right.

The vulnerable code I noticed was in ChakraCore/lib/Runtime/Library/JavascriptArray.cpp, in method JavascriptArray::TryArraySplice, which is part of the implementation of Array.prototype.splice:

Figure 5: Vulnerable code in function JavascriptArray::TryArraySplice

Before performing any operations that make changes to pArr, the code calls pArr->ClearSegmentMap() to invalidate the segment map, since the data in the map might no longer be correct once we start making changes to the segments of pArr. The trouble is that right after clearing the map, the engine invokes the object’s species constructor. This provides an opportunity for the attacker’s script to execute and initiate a new operation on array pArr that rebuilds the segment map. After return from the species constructor, TryArraySplice proceeds to make modifications to pArr. Potentially, this deletes some segments of pArr that are still referenced by the stale segment map, producing a UAF.

To state this in terms of reentrance: The author of the code in Figure 5 assumed that there would be no reentrance of methods that modify pArr from within TryArraySplice, implying that once the segment map was cleared, it could be assumed safely that it would stay cleared for the remainder of the call to TryArraySplice. Through adversarial input, we can produce such a reentrant call, disturbing the assumed safe state of the segment map.
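The sequence of clear, callback, then modify can be modeled with a small sketch. This is not ChakraCore code: `SketchArray`, `remove_all`, and the other names are hypothetical, the segments live in a `std::vector` instead of a linked list, and the cache is an ordered `std::map` holding `std::weak_ptr` entries where the real map holds raw pointers. The weak pointers are an assumption made purely so that a stale entry can be observed without touching freed memory.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <utility>
#include <vector>

struct Segment {
    std::size_t left;           // first index covered by this segment
    std::vector<int> elements;  // contiguous values starting at 'left'
};

class SketchArray {
public:
    void add_segment(std::size_t left, std::vector<int> values) {
        clear_segment_map();
        segments_.push_back(
            std::make_shared<Segment>(Segment{left, std::move(values)}));
    }

    // Lookup that builds the cache on demand. Returns -1 for a hole.
    int get(std::size_t index) {
        if (map_.empty()) rebuild_map();
        auto it = map_.upper_bound(index);
        if (it == map_.begin()) return -1;
        --it;
        std::shared_ptr<Segment> seg = it->second.lock();
        if (!seg) return -1;  // stale cache entry
        std::size_t offset = index - seg->left;
        if (offset >= seg->elements.size()) return -1;
        return seg->elements[offset];
    }

    void clear_segment_map() { map_.clear(); }

    // Models the hazardous ordering: clear the cache, run a callback that
    // may execute script, then modify the segments.
    void remove_all(const std::function<void(SketchArray&)>& species_ctor) {
        clear_segment_map();
        if (species_ctor) species_ctor(*this);  // script may rebuild the map
        segments_.clear();  // map is assumed to still be empty here
    }

    // Number of cache entries that refer to destroyed segments.
    std::size_t stale_entries() const {
        std::size_t count = 0;
        for (const auto& entry : map_) {
            if (entry.second.expired()) ++count;
        }
        return count;
    }

private:
    void rebuild_map() {
        for (const auto& seg : segments_) map_[seg->left] = seg;
    }

    std::vector<std::shared_ptr<Segment>> segments_;
    std::map<std::size_t, std::weak_ptr<Segment>> map_;
};

// The callback performs an ordinary read, which silently rebuilds the map.
inline std::size_t stale_after_reentrant_splice() {
    SketchArray arr;
    arr.add_segment(0, {1, 2, 3});
    arr.add_segment(100, {7});
    arr.remove_all([](SketchArray& a) { a.get(100); });
    return arr.stale_entries();
}
```

Here the callback does nothing more than read an element, yet the cache ends up with two entries referring to destroyed segments. With raw pointers in place of the weak pointers, those entries would be dangling.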

The relevant call stack is shown below.

Figure 6: Reentrance of a method on JavascriptArray caused by adversarial input

As usual, the hallmark of CWE-1265 is a convoluted call stack showing an abnormal and unforeseen inner invocation of a non-reentrant unit of code while an outer invocation is in the midst of execution.

Case Study 3: ZDI-CAN-6343 / CVE-2018-8420: “Double Kill” Vulnerability in MSXML6

Attackers can use the VBScript feature Class_Terminate to produce a variety of anomalous control flows. I detailed a number of these in a blog post in 2018. Several of these anomalous control flows can be aptly described by CWE-1265. For one clear example, see the second bug highlighted in that post.

Here, though, I would like to present a variation on the “Double Kill” vulnerability discussed at the end of that post. The term “Double Kill” has been applied to a cluster of vulnerabilities in which Class_Terminate produces a reentrant invocation of VariantClear. The effect is an erroneous double decrement of the reference count of a COM object pointed to by the VARIANT. Thus, “Double Kill” vulnerabilities are inherently examples of CWE-1265 since they involve a reentrance of VariantClear (for the same VARIANT) via nested calls. Microsoft patched VariantClear in May 2018, changing its implementation to make it essentially safe for reentrance.

However, it turned out that there were some variations on “Double Kill” that did not involve the function VariantClear and were not remediated by the May 2018 patch. ZDI-CAN-6343/CVE-2018-8420 is a bug in MSXML6 that I discovered in May 2018. Class_Terminate can be used to produce reentrance of the property setter of the onreadystatechange property of an XMLHTTP object. The PoC is as follows:

Figure 7: PoC for CVE-2018-8420

At [1], the first property set assigns an instance of MyClass into the property. At [2], the setter function put_onreadystatechange first calls Release on the MyClass instance that is the current value of the property, thus invoking Class_Terminate. Within that function, at [3], put_onreadystatechange is invoked in a reentrant fashion for the “Double Kill”. This produces a second, spurious decrement of the reference count of the MyClass instance. When execution arrives at [4], obj contains a reference to the MyClass instance, but by that time it has already been freed due to the extraneous decrement of its reference count.
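The reference-counting arithmetic behind this sequence can be followed in a simplified model. The sketch below does not reflect the internals of MSXML6 or VBScript: `Counted`, `Holder`, `put_handler`, and `on_terminate` are hypothetical names, and the step in which the terminate handler takes a fresh reference to the dying object is an assumption about how a script variable could come to hold it. The object lives on the stack and is never really freed, so the program stays well-defined and only the counts are inspected.

```cpp
#include <functional>

// Minimal reference-counted object with a terminate hook.
struct Counted {
    int refs = 1;             // the creator's reference
    bool terminated = false;  // the hook runs at most once
    std::function<void()> on_terminate;  // stands in for Class_Terminate

    void add_ref() { ++refs; }

    void release() {
        --refs;
        if (refs == 0 && !terminated) {
            terminated = true;
            if (on_terminate) on_terminate();
            // A real implementation would free the object around here.
        }
    }
};

// Property holder whose setter releases the old value while the field
// still points at it.
class Holder {
public:
    void put_handler(Counted* next) {
        if (handler_) handler_->release();  // may run script
        handler_ = next;
        if (handler_) handler_->add_ref();
    }

private:
    Counted* handler_ = nullptr;
};

struct Outcome {
    int refs;
    bool script_still_holds_reference;
};

inline Outcome double_release_example() {
    Counted object;            // refs == 1
    Holder holder;
    Counted* saved = nullptr;  // plays the role of a script variable

    object.on_terminate = [&] {
        object.add_ref();             // script keeps a reference: refs == 1
        saved = &object;
        holder.put_handler(nullptr);  // reentrant setter: releases again
    };

    holder.put_handler(&object);  // property takes a reference: refs == 2
    object.release();             // creator lets go: refs == 1
    holder.put_handler(nullptr);  // outer setter call triggers the hook

    return {object.refs, saved != nullptr};
}
```

The model ends with a count of zero while `saved` still points at the object: the property's single reference was released twice, once by the outer setter call and once by the reentrant one.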

A call stack showing the reentrance:

Figure 8: Call stack for CVE-2018-8420 showing reentrance

Figure 8 shows that XMLHttp::put_onreadystatechange has been reentered via an unexpected chain of nested calls, which is the hallmark of CWE-1265.

Conclusion

Anomalous reentrant call stacks are the root cause of numerous vulnerabilities found in complex software systems. This is particularly true in software that includes an engine that executes or interprets untrusted script. The presence of untrusted script in an execution environment can easily turn what would ordinarily be a harmless call into a jumping-off point for a convoluted sequence of nested calls, ultimately arriving at a reentrant invocation that modifies data in a way that violates assumptions made by the original caller. I hope this discussion serves as a useful conceptual framework when hunting and analyzing these types of vulnerabilities.


¹ CWE (Common Weakness Enumeration) is the industry-standard canonical list of types of weaknesses commonly found in hardware and software.

² Some draw a semantic distinction, and refer to a hazard due to multithreading as a “thread safety” issue as opposed to a “reentrance” issue (see https://en.wikipedia.org/wiki/Reentrancy_(computing)). However, I use the term “reentrance” throughout, in accordance with the usage of the term in CWE-663.

³ The difficulty arises because the trigger as shown here will first crash due to an unrelated UAF, and this UAF is non-exploitable due to MemGC.