Exploring Atomic Buffered Writes: PostgreSQL, Writethrough, and Kernel Development


At the 2026 Linux Storage, Filesystem, Memory Management, and BPF Summit, two back-to-back sessions—which even spilled into a third slot—focused on the atomic-buffered-writes feature. This article breaks down the key discussions into a series of questions and answers, offering an engaging look at the problem, the use case driving it (PostgreSQL), and a proposed writethrough-based solution.

What is the atomic-buffered-writes feature?

Atomic buffered writes is a proposed kernel mechanism that ensures a group of buffered writes to a file completes as an indivisible unit. In current Linux buffered I/O, writes land in the page cache and are written back to disk asynchronously later. If a crash occurs partway through a group of writes, only some of the data may be persisted, leaving the file corrupted. Atomic buffered writes guarantee that either all writes in a group are fully on disk or none are, providing crash consistency without requiring applications to use direct I/O or implement their own journaling. This feature is particularly valuable for databases and other applications that need reliable, high-performance updates to large on-disk data structures.
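Without kernel support, applications today approximate all-or-nothing file updates in user space with the classic write-to-temp-then-rename pattern. A minimal Python sketch (the function name and file layout are illustrative):

```python
import os
import tempfile

def atomic_replace(path, data):
    """Replace the contents of path so that readers see either the old
    or the new contents, never a partial mix (whole-file updates only)."""
    dirname = os.path.dirname(os.path.abspath(path))
    # Stage the new contents in a temporary file in the same directory,
    # so the final rename never crosses a filesystem boundary.
    fd, tmp = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # make the new bytes durable first
        os.replace(tmp, path)     # atomic rename on POSIX filesystems
    except BaseException:
        os.unlink(tmp)
        raise
```

Note that this trick only covers whole-file replacement; the proposed kernel feature targets atomic in-place page updates, which a rename cannot express.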

Why is PostgreSQL driving the need for this feature?

PostgreSQL, a popular open-source database, relies heavily on reliable storage semantics to maintain data integrity. Its write-ahead log (WAL) already provides crash recovery, but the database also issues buffered writes for data pages. Without atomic buffered writes, PostgreSQL must use techniques like full-page writes or rely on direct I/O to avoid torn pages—a 2018 study showed these workarounds can reduce performance by up to 30%. By integrating atomic buffered writes into the kernel, PostgreSQL could simplify its code, reduce overhead, and improve throughput, especially on modern storage hardware that supports atomic operations. The database community has long requested such a feature, making PostgreSQL the primary motivator for this development.

How did Pankaj Raghav and Andres Freund introduce the problem?

In the first session, Pankaj Raghav and Andres Freund laid out the core problem: buffered writes in the Linux kernel lack atomicity, leading to potential data corruption on crashes. They explained that while applications can use fsync or O_SYNC, these are expensive and still don't guarantee atomicity across multiple buffers. Freund, a PostgreSQL contributor, demonstrated how PostgreSQL currently works around this limitation with heavy-weight locking and double writes. The duo also highlighted that modern storage devices (e.g., NVMe with atomic write unit) already support atomic operations at the hardware level, but the kernel's page cache and filesystem layers do not leverage them. Their presentation set the stage for exploring a kernel-level solution that could efficiently expose hardware atomicity to user space.
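The double-write workaround Freund described can be sketched as follows: before a page is updated in place, a full copy of it is made durable in a separate scratch area, so a torn in-place write can be repaired from that copy on recovery. A deliberately simplified Python model (the names, scratch-file layout, and single-page scope are illustrative, not PostgreSQL's actual on-disk format):

```python
import os

PAGE_SIZE = 8192  # PostgreSQL's default page size

def double_write(datafile, scratchfile, page_no, page):
    """Update one page with torn-write protection: the full page image is
    made durable out of place before the in-place write, so recovery can
    repair a torn page from the scratch copy."""
    assert len(page) == PAGE_SIZE
    # Step 1: durably record the page number and full page image out of place.
    with open(scratchfile, "wb") as f:
        f.write(page_no.to_bytes(8, "little") + page)
        f.flush()
        os.fsync(f.fileno())
    # Step 2: only now overwrite the page at its real location.
    fd = os.open(datafile, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        os.pwrite(fd, page, page_no * PAGE_SIZE)
        os.fsync(fd)
    finally:
        os.close(fd)
```

Every page update thus costs two writes and two flushes, which is precisely the overhead that kernel-level atomic buffered writes aim to eliminate.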

What is the writethrough approach proposed by Ojaswin Mujoo?

In the second session, Ojaswin Mujoo presented a potential implementation path for atomic buffered writes based on a writethrough mechanism. Instead of buffering writes in the page cache and later flushing them asynchronously, the kernel immediately writes data to disk when a request is marked as atomic. This writethrough mode bypasses the usual writeback delay, ensuring that the data is on persistent storage before the write system call returns. Importantly, the kernel can still cache the data for reads, but writes go directly to disk. Mujoo's approach leverages existing filesystem journaling and block layer features to provide atomicity guarantees. He argued that this design minimizes changes to the existing page cache infrastructure while delivering the desired crash consistency for applications like PostgreSQL.

How does writethrough differ from the traditional page cache writeback mechanism?

Traditionally, the Linux kernel's page cache writeback mechanism periodically flushes dirty pages to disk, often many seconds after the application writes them. This batching improves performance but widens the window for data loss on a crash. The writethrough approach, by contrast, forces a synchronous write to disk for atomic groups, eliminating that window of vulnerability. Writethrough does not mean the page cache is bypassed entirely: the cache still holds the data for subsequent reads, but the write path goes straight to the block device. That distinguishes it from direct I/O (O_DIRECT), which bypasses the page cache for both reads and writes. Writethrough thus offers a middle ground: the performance of buffered I/O for reads, with the crash safety of synchronous writes for critical updates.
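The distinction can be made concrete with a toy page-cache model: in writeback mode a write only dirties the in-memory cache, while in writethrough mode it also lands on the "device" before the call returns; reads are served from the cache in both modes. A deliberately simplified sketch (not kernel code):

```python
class PageCache:
    """Toy model contrasting writeback and writethrough write semantics."""

    def __init__(self, writethrough=False):
        self.writethrough = writethrough
        self.cache = {}     # page number -> bytes (in-memory copy)
        self.device = {}    # page number -> bytes (simulated stable storage)
        self.dirty = set()  # pages not yet on the device

    def write(self, page_no, data):
        self.cache[page_no] = data
        if self.writethrough:
            self.device[page_no] = data  # on "disk" before write() returns
        else:
            self.dirty.add(page_no)      # deferred until writeback runs

    def read(self, page_no):
        return self.cache[page_no]       # both modes read from the cache

    def writeback(self):
        # The periodic flusher: push all dirty pages to the device.
        for page_no in self.dirty:
            self.device[page_no] = self.cache[page_no]
        self.dirty.clear()

    def crash(self):
        # A crash loses the cache; only device contents survive.
        self.cache = dict(self.device)
        self.dirty.clear()
```

In this model, a crash before `writeback()` loses every writeback-mode write, while writethrough-mode writes survive, which is the guarantee the atomic groups need.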

What were the main points of discussion among the filesystem and storage developers?

The combined sessions saw lively debate among developers from various filesystems (ext4, XFS, btrfs) and storage subsystems. Key discussion points included: (1) whether the writethrough approach could be extended to support atomic multi-file operations; (2) the need for a proper API that allows applications to specify atomic write groups; (3) how to handle devices that do not support native atomic writes (e.g., SATA SSDs or HDDs); and (4) potential interactions with existing features like DAX (direct access) and block-layer atomic write units. Several developers expressed concern about performance overhead for non-atomic writes if the implementation were not carefully designed. The consensus was that a prototype should focus on NVMe devices first, where hardware atomic support is widespread.

What is the current status and next steps for atomic buffered writes?

Following the summit, the atomic-buffered-writes patches are under active development. The immediate next step is to submit a revised patchset that incorporates feedback from the sessions, particularly around the API design and the writethrough logic. The developers plan to first implement support for a single-file atomic write group, then expand to multiple files. Testing will focus on PostgreSQL workloads to validate performance improvements and correctness. The maintainers have indicated that the feature is likely to be merged in the following kernel release cycle, pending successful reviews. The community is also exploring ways to expose hardware atomic write capabilities to user space through a new system call or extended pwritev flags.
