Sky’s the Limit – Quick Analysis and Exploitation of a Chrome ipcz TOCTOU Vulnerability

In this blog post we’ll dive into the details of an interesting vulnerability in ipcz, the new IPC mechanism of Chrome, and see how it was possible to exploit it from a compromised renderer to escape the Chrome sandbox. The following analysis and the described exploitation technique are based on Chromium 113.0.5672.77.

Background

The vulnerable code was added in November 2022, in the process of adding support for ipcz to Chrome. On March 31, 2023 Mark Brand from Google Project Zero reported the vulnerability in issue 40063855 and a fix for the issue was eventually shipped with the stable branch of Chrome 114.

Introduction

ipcz is the new IPC implementation used by Chrome. It’s meant as a replacement for mojo core, intended to address some existing shortcomings regarding routing and data transfer.

In the process of adding ipcz to Chrome, support for Parcels was added.

commit da5cd04508573976a35a81780ef12f57bfc9bee9
Author: Ken Rockot <rockot@google.com>
Date:   Wed Nov 16 01:16:25 2022 +0000

    ipcz: Introduce parcel objects
    
    This introduces parcel objects as a first-class concept of the public
    ipcz API, replacing the concept of validators. In particular,
    applications have the option to Get() parcel objects from portals
    rather than getting the parcel's data and handles directly.
    
    Data and handles can then be retrieved from a parcel object in the
    same way they can be retrieved from portals, i.e. with the usual
    Get/BeginGet/EndGet APIs.
    
    This allows applications to consume individual parcel contents with
    two-phase I/O operations (i.e. with direct access to the parcel
    memory) without tying up the receiving portal in the meantime.
    
    MojoIpcz exploits this new API feature to avoid copying parcel data
    into its own type of MojoMessage objects, instead retaining a parcel
    handle and exposing message data via a two-phase get.
    
    Bug: 1299283
    Fixed: 1384208
    Change-Id: Iafd2efb16a1aa150dffb9baba9fe445ef01763e6
    Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/4023329
    Commit-Queue: Ken Rockot <rockot@google.com>
    Reviewed-by: Alex Gough <ajgo@chromium.org>
    Cr-Commit-Position: refs/heads/main@{#1071963}

Parcels are the units of data which are used to transfer application data and ipcz handles between Portals. Portals are the communication endpoints between Nodes. Every process such as the Browser, GPU, or renderer process is represented by a single node. Basically every time a mojo method call happens in ipcz, the underlying data is transferred inside a parcel.

As part of the referenced commit, code was added to construct a MojoMessage based on the data received through a parcel.

bool MojoMessage::SetParcel(ScopedIpczHandle parcel) {
  DCHECK(!data_storage_);
  DCHECK(!parcel_.is_valid());

  parcel_ = std::move(parcel);

  const void* data;
  size_t num_bytes;
  size_t num_handles;
  IpczResult result = GetIpczAPI().BeginGet(
      parcel_.get(), IPCZ_NO_FLAGS, nullptr, &data, &num_bytes, &num_handles);
  if (result != IPCZ_RESULT_OK) {
    return false;
  }

  // Grab only the handles.
  handles_.resize(num_handles);
  result = GetIpczAPI().EndGet(parcel_.get(), 0, num_handles, IPCZ_NO_FLAGS,
                               nullptr, handles_.data());
  if (result != IPCZ_RESULT_OK) {
    return false;
  }

  // Now start a new two-phase get, which we'll leave active indefinitely for
  // `data_` to reference.
  result = GetIpczAPI().BeginGet(parcel_.get(), IPCZ_NO_FLAGS, nullptr, &data,  // [1]
                                 &num_bytes, &num_handles);
  if (result != IPCZ_RESULT_OK) {
    return false;
  }

  DCHECK_EQ(0u, num_handles);
  data_ = base::make_span(static_cast<uint8_t*>(const_cast<void*>(data)),       // [2]
                          num_bytes);

At [1] the code calls BeginGet, which obtains a pointer to the parcel data, letting data directly reference the underlying shared memory. At [2] the code then stores a span over that memory in its data_ member.

As a result, every operation on the serialized mojo message is performed directly on the underlying shared memory, making it prone to time-of-check to time-of-use (TOCTOU) issues which a compromised renderer can exploit.

Choosing a Good Exploitation Primitive

This change broke the fundamental assumption in the rest of the Chrome code base that the content of a mojo message can’t be changed during deserialization and validation.

This opens up a wide range of ways in which this bug could be turned into a stable sandbox escape. The following sections document a few ideas and eventually describe the technique we ended up using to turn the bug into a controlled heap corruption primitive.

Bypassing Generic Mojo Validation

The vulnerability makes it possible to bypass any validation performed on the mojo message itself. For example, the generic mojo validation methods can be bypassed for any sent mojo message. This can be accomplished by flipping the name field of the MessageHeader back and forth between its original value and 0xffffffff.

The generic mojo validation method starts with the following code, before it invokes the interface specific validation methods:

template <typename T>
bool ValidateRequestGenericT(Message* message,
                             const char* class_name,
                             base::span<const T> info) {
  if (!message->is_serialized() ||
      ControlMessageHandler::IsControlMessage(message)) {
    return true;
  }

If IsControlMessage returns true, all further validation is skipped. Taking a closer look at this method, we can see that it performs its checks solely on the content of the message header:

bool ControlMessageHandler::IsControlMessage(const Message* message) {
  return message->header()->name == interface_control::kRunMessageId ||
         message->header()->name == interface_control::kRunOrClosePipeMessageId;
}

If the name field of the MessageHeader is set to kRunMessageId (0xffffffff) or kRunOrClosePipeMessageId (0xfffffffe), IsControlMessage will return true, skipping any further validation.

If we quickly set it back to its original value, the message will be handled normally, executing the mojo method implementation, but without any validation being performed on the message.

In the following sections we take a look at some assumptions in the deserialization code which are verified by one of the mojo validation methods.

Targeting Map Deserialization

The code for deserializing maps transferred over mojo has the implicit assumption that the number of keys and values will always match.

  static bool Deserialize(Data* input, UserType* output, Message* message) {
    if (!input)
      return CallSetToNullIfExists<Traits>(output);

    std::vector<UserKey> keys;
    std::vector<UserValue> values;

    if (!KeyArraySerializer::DeserializeElements(input->keys.Get(), &keys,
                                                 message) ||
        !ValueArraySerializer::DeserializeElements(input->values.Get(), &values,
                                                   message)) {
      return false;
    }

    DCHECK_EQ(keys.size(), values.size());                                          // [3]
    size_t size = keys.size();
    Traits::SetToEmpty(output);

    for (size_t i = 0; i < size; ++i) {
      if (!Traits::Insert(*output, std::move(keys[i]), std::move(values[i])))       // [4]
        return false;
    }
    return true;
  }
};

If this assumption does not hold, the code will trip the debug assert at [3] and access either the keys or the values vector out-of-bounds at [4].

The code which makes sure that this assumption always holds can be found in the corresponding mojo validation method:

class Map_Data {
 public:
  // |validate_params| must have non-null |key_validate_params| and
  // |element_validate_params| members.
  static bool Validate(const void* data,
                       ValidationContext* validation_context,
                       const ContainerValidateParams* validate_params) {
    if (!data)
      return true;

[...]

    if (object->keys.Get()->size() != object->values.Get()->size()) {
      ReportValidationError(validation_context,
                            VALIDATION_ERROR_DIFFERENT_SIZED_ARRAYS_IN_MAP);
      return false;
    }

    return true;
  }

Since we can bypass the use of these validation methods, we could try to trigger an out-of-bounds access here. Unfortunately, the libc++ hardening assert _LIBCPP_ASSERT_VALID_ELEMENT_ACCESS, which is enabled in the std::vector implementation shipped with Chrome, would prevent us from causing any memory corruption in this case.

Array Deserialization Code

The mojo array deserialization code looks as follows:

  static bool DeserializeElements(Data* input,
                                  UserType* output,
                                  Message* message) {
    if (!Traits::Resize(*output, input->size()))
      return false;
    if (input->size()) {
      if constexpr (HasGetDataMethod<Traits, UserType>::value) {
        auto data = Traits::GetData(*output);
        memcpy(data, input->storage(), input->size() * sizeof(DataElement));
      } else {
        ArrayIterator<Traits, UserType> iterator(*output);
        for (size_t i = 0; i < input->size(); ++i)
          iterator.GetNext() = input->at(i);
      }
    }
    return true;
  }

This code is prone to a TOCTOU issue and looks very promising, since it might allow us to perform a controlled heap overflow in both the browser and the GPU process. Unfortunately, the memcpy case does not appear to be compiled in, so we always hit the else branch, which uses the std::vector operator[] to write out-of-bounds; this is caught by the libc++ hardening asserts and causes a crash.

Targeting Pickle in Channel Interface of MessagePipeReader

The MessagePipeReader::Receive method implements an unpickle operation for received old-style IPC messages tunneled over mojo. If a routed IPC message (declared as IPC_MESSAGE_ROUTED) is received, the following Pickle constructor will be used:

Pickle::Pickle(const Pickle& other)
    : header_(nullptr),
      header_size_(other.header_size_),
      capacity_after_header_(0),
      write_offset_(other.write_offset_) {
  if (other.header_) {
    Resize(other.header_->payload_size);                                            // [5]
    memcpy(header_, other.header_, header_size_ + other.header_->payload_size);     // [6]
  }
}

As can be seen, the code reads the payload_size field of the mojo message header (other.header_->payload_size) twice, making it vulnerable to a TOCTOU issue. The Resize call at [5] reads the payload size a first time, allocating the new buffer via PartitionAlloc (instead of backing it with shared memory like the source Pickle).

void Pickle::Resize(size_t new_capacity) {
  CHECK_NE(capacity_after_header_, kCapacityReadOnly);
  capacity_after_header_ = bits::AlignUp(new_capacity, kPayloadUnit);
  void* p = realloc(header_, GetTotalAllocatedSize());
  CHECK(p);
  header_ = reinterpret_cast<Header*>(p);
}

The memcpy operation at [6] then reads the payload size a second time, copying the data from the received message into the freshly allocated buffer. Racing the payload_size field of the mojo header thus allows us to perform a controlled heap corruption with a controlled allocation size, overflow size, and data. Losing the race would merely copy a smaller amount into an oversized buffer, without any side effects.

Due to these nice properties, we decided to exploit the vulnerability using this heap corruption primitive.

Creating a Heap Overflow Primitive

In order to trigger the controlled heap overflow, we are making use of the GinJavaBridgeHostMsg_ObjectWrapperDeleted IPC message. This is a routed old-style IPC message taking a single 32-bit integer as an argument as can be seen below:

IPC_MESSAGE_ROUTED1(GinJavaBridgeHostMsg_ObjectWrapperDeleted,
                    int32_t /* object_id */)

Since this message only takes a 32-bit integer argument, its payload is rather small. Because our heap corruption primitive is based on the payload size of the sent message, we need to increase the size of the sent mojo message.

We can use renderer hooks to expand the size of the sent message and to append the data to be used in the heap overflow. We then start racing the payload size field of the message header, switching it back and forth between the desired allocation size and the size to be used for the heap overflow.

There are three possible scenarios on the receiving side in the browser process which would prevent the heap corruption from happening.

Inside the MessagePipeReader::Receive method, there’s the following check:

  if (!message.IsValid()) {
    delegate_->OnBrokenDataReceived();
    return;
  }

Racing the payload size can make the IsValid method return false, leading to an early exit which skips the Pickle constructor that performs the allocation/copy operation. The message is just dropped in such a case and we can retry.

The next two scenarios which can prevent the heap corruption from happening can be found inside the Pickle constructor:

    Resize(other.header_->payload_size);
    memcpy(header_, other.header_, header_size_ + other.header_->payload_size);

Here the code could read the larger size during the Resize call and the smaller size for the memcpy operation, which would just do a short copy without any bad consequences. Finally, the Resize and memcpy operations could both read the same payload size, which again would be a harmless operation.

So we can just repeatedly try to win the race until we successfully trigger the heap corruption.

Expanding IPC Block Capacity

The ipcz code uses blocks from the underlying buffer pool to store fragments. A few block buffers for certain sizes are pre-created as can be seen in the NodeLinkMemory::NodeLinkMemory method:

  const BlockAllocator allocators[] = {primary_buffer_.block_allocator_64(),
                                       primary_buffer_.block_allocator_256(),
                                       primary_buffer_.block_allocator_512(),
                                       primary_buffer_.block_allocator_1k(),
                                       primary_buffer_.block_allocator_2k(),
                                       primary_buffer_.block_allocator_4k()};

If we are trying to send a message exceeding these sizes, the code tries to expand the block capacity inside NodeLinkMemory::AllocateFragment:

    // Use failure as a hint to possibly expand the pool's capacity. The
    // caller's allocation will still fail, but maybe future allocations won't.
    if (CanExpandBlockCapacity(block_size)) {
      RequestBlockCapacity(block_size, [](bool success) {
        if (!success) {
          DLOG(ERROR) << "Failed to allocate new block capacity.";
        }
      });
    }

The CanExpandBlockCapacity method checks for the IPCZ_MEMORY_FIXED_PARCEL_CAPACITY memory flag and makes sure that the total block capacity doesn’t exceed a maximum. In the case of Chrome 113, this flag is set, which prevents the renderer from expanding the block capacity for sent parcels.

When the block capacity can’t be expanded, the code ends up storing the message content to be sent in a PartitionAlloc allocation instead of placing it in shared memory directly, preventing us from racing it.

In order to still expand the block capacity to the required size of our messages, we can simply send a message of the appropriate size while hooking the CanExpandBlockCapacity method in the renderer to make it return true. This allocates a shared memory segment of the appropriate size and adds it as a new block buffer for future message transmissions.

This makes sure that we can always race sent messages, even for larger sizes.

How the Bug Was Fixed

After being reported by Mark Brand of Google Project Zero, the bug was fixed in the stable release of Chrome 114.

The following commit fixed the issue:

commit 93c6be3a42e702101af2f528bf79d624cd3bfed9
Author: Ken Rockot <rockot@google.com>
Date:   Mon Apr 3 19:43:13 2023 +0000

    MojoIpcz: Copy incoming messages early
    
    Fixed: 1429720
    Change-Id: Id6cb7269d3a3e9118cc6ff1579b56e18bf911c07
    Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/4390758
    Commit-Queue: Ken Rockot <rockot@google.com>
    Reviewed-by: Daniel Cheng <dcheng@chromium.org>
    Cr-Commit-Position: refs/heads/main@{#1125510}

diff --git a/mojo/core/ipcz_driver/mojo_message.cc b/mojo/core/ipcz_driver/mojo_message.cc
index da073af255795..e362f3db6003c 100644
--- a/mojo/core/ipcz_driver/mojo_message.cc
+++ b/mojo/core/ipcz_driver/mojo_message.cc
@@ -109,23 +109,20 @@ void MojoMessage::SetParcel(ScopedIpczHandle parcel) {
 
   // We always pass a parcel object in, so Begin/EndGet() must always succeed.
   DCHECK_EQ(result, IPCZ_RESULT_OK);
+  if (num_bytes > 0) {
+    data_storage_.reset(
+        static_cast<uint8_t*>(base::AllocNonScannable(num_bytes)));
+    memcpy(data_storage_.get(), data, num_bytes);
+  } else {
+    data_storage_.reset();
+  }
+  data_ = {data_storage_.get(), num_bytes};
+  data_storage_size_ = num_bytes;
 
-  // Grab only the handles.
   handles_.resize(num_handles);
-  result = GetIpczAPI().EndGet(parcel_.get(), 0, num_handles, IPCZ_NO_FLAGS,
-                               nullptr, handles_.data());
-  DCHECK_EQ(result, IPCZ_RESULT_OK);
-
-  // Now start a new two-phase get, which we'll leave active indefinitely for
-  // `data_` to reference.
-  result = GetIpczAPI().BeginGet(parcel_.get(), IPCZ_NO_FLAGS, nullptr, &data,
-                                 &num_bytes, &num_handles);
+  result = GetIpczAPI().EndGet(parcel_.get(), num_bytes, num_handles,
+                               IPCZ_NO_FLAGS, nullptr, handles_.data());
   DCHECK_EQ(result, IPCZ_RESULT_OK);
-
-  DCHECK_EQ(0u, num_handles);
-  data_ = base::make_span(static_cast<uint8_t*>(const_cast<void*>(data)),
-                          num_bytes);
-
   if (!FixUpDataPipeHandles(handles_)) {
     // The handle list was malformed. Although this is a validation error, it
     // is not safe to trigger MojoNotifyBadMessage from within MojoReadMessage,

Instead of letting the MojoMessage reference the shared memory of the parcel, it now takes a copy of the data so that all the deserialization and validation will happen on the copy instead.

This successfully fixed the issue and made the world a safer place.

 
