Sequential data access is essential in data warehouse systems. Despite advancements such as zero-copy I/O, achieving optimal execution remains elusive due to the significant CPU cycles consumed by data access control mechanisms within the DBMS and the operating system. Although efforts in the OS community have successfully lowered latency by shifting I/O management to the database layer, they pose additional complexity, requiring DBMSs to develop and maintain custom I/O stacks. The database community has also invested substantial effort in optimizing DBMS engines to hide I/O latencies by exploiting asynchronous I/O interfaces, but these efforts remain constrained by unavoidable CPU overhead and therefore offload some of this burden to auxiliary resources such as OS threads.
To address this, we propose a novel DB-OS co-design, zicIO (zero-interaction & copy I/O), that automates data access control. This idea shifts the responsibility for data access management from DBMSs to a DBMS-oriented OS, specially co-designed to prefetch data just before the DBMS requires it. For this, zicIO comprises three components: (1) UzicIO (user library), which gathers precise timing information from DBMSs to predict data needs; (2) KzicIO (OS module), which automates control and directly issues I/O requests to storage devices; and (3) memSB (shared memory), a small memory region mapped to both the DBMS and the OS, which ensures seamless coordination between UzicIO and KzicIO.
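
The division of roles among the three components can be illustrated with a toy, single-threaded simulation. This is only a sketch under stated assumptions: the class and function names (`SharedBuffer`, `UserSide`, `KernelSide`, `scan`) are hypothetical and are not zicIO's API, the "storage" is a plain Python list, and the refill policy is a simple "keep the buffer full" rule rather than the timing-based prediction described above.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SharedBuffer:
    """Hypothetical stand-in for memSB: a bounded queue visible to both sides."""
    capacity: int
    chunks: deque = field(default_factory=deque)
    consumed: int = 0  # progress counter published by the user side


class UserSide:
    """Hypothetical stand-in for UzicIO: consumes chunks, publishes progress."""

    def __init__(self, shared):
        self.shared = shared

    def next_chunk(self):
        if not self.shared.chunks:
            return None  # nothing prefetched; end of data in this toy model
        chunk = self.shared.chunks.popleft()
        self.shared.consumed += 1
        return chunk


class KernelSide:
    """Hypothetical stand-in for KzicIO: refills the buffer ahead of the reader."""

    def __init__(self, shared, storage):
        self.shared = shared
        self.storage = iter(storage)

    def refill(self):
        # Assumed policy: top the buffer up to capacity whenever there is room.
        while len(self.shared.chunks) < self.shared.capacity:
            chunk = next(self.storage, None)
            if chunk is None:
                break  # storage exhausted
            self.shared.chunks.append(chunk)


def scan(storage, capacity=4):
    """Sequentially scan `storage` without the reader ever requesting I/O."""
    shared = SharedBuffer(capacity)
    user = UserSide(shared)
    kernel = KernelSide(shared, storage)
    out = []
    kernel.refill()  # initial prefetch before the reader starts
    while True:
        chunk = user.next_chunk()
        if chunk is None:
            break
        out.append(chunk)
        kernel.refill()  # kernel side reacts to the published progress
    return out
```

In this sketch the reader never calls a read function on the storage itself; it only takes what is already in the shared buffer, which is the "zero-interaction" idea in miniature. Everything about real devices, page tables, and timing is deliberately left out.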

When data warehouse queries using KzicIO are executed concurrently, they bypass many layers in DBMSs and OSs to eliminate I/O latencies, and consequently encounter redundant data fetching due to the absence of caching layers. To address this, we propose SKzicIO (sharing-enabled KzicIO), which shares data at the OS level by dynamically manipulating page tables. We evaluated three database engines with and without zicIO under standard data warehouse workloads, supplemented by microbenchmarks. The evaluation with legacy DBMSs showed performance improvements of up to 9.95× under TPC-H workloads. The microbenchmarks showed performance improvements of up to 16.31× under sequential data ingestion.
Kyungmin Lim, Minseok Yoon, Kihwan Kim, Alan David Fekete, Hyungsoo Jung
Proceedings of the ACM on Management of Data (SIGMOD 2025), Volume 3, Issue 1
Article No.: 68, Pages 1 – 28
https://doi.org/10.1145/3709718
