Author
Listed:
- Zhenfei Wang
(School of Computer Science and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China)
- Jianxun Feng
(School of Computer Science and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China)
- Longxiang Dun
(School of Computer Science and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China)
- Ziliang Bao
(School of Computer Science and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China)
- Chunfeng Du
(School of Computer Science and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China)
Abstract
Optimizing metadata indexing remains critical for enhancing distributed file system performance. The traditional Log-Structured Merge-Tree (LSM-Tree) architecture, while effective for write-intensive operations, shows significant limitations under massive metadata workloads, notably suboptimal read performance and substantial indexing overhead. Although existing learned indexes perform well on read-only workloads, they struggle to support modifications such as inserts and updates. This paper proposes SwiftKV, a novel metadata indexing scheme that combines the LSM-Tree with learned indexes to address these issues. First, SwiftKV employs a dynamic partition strategy to narrow the metadata search range. Second, a two-level learned index block, consisting of Greedy Piecewise Linear Regression (Greedy-PLR) and Linear Regression (LR) models, replaces the typical Sorted String Table (SSTable) index block and predicts locations faster than binary search. Third, SwiftKV incorporates a load-aware construction mechanism and parallel optimization to minimize training overhead and improve efficiency. This work bridges the gap between the write efficiency of LSM-Trees and the query performance of learned indexes, offering a scalable, high-performance solution for modern distributed file systems. A prototype of SwiftKV is implemented on top of RocksDB. The experimental results show that it reduces the memory usage of index blocks by 30.06% and reduces read latency by 1.19× to 1.60× without affecting write performance. Furthermore, SwiftKV's two-level learned index achieves a 15.13% reduction in query latency and a 44.03% reduction in memory overhead compared to a single-level model. Across all YCSB workloads, SwiftKV outperforms the other schemes.
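The abstract does not give the details of the learned index block, so the following is only an illustrative sketch of the general idea behind error-bounded piecewise linear regression over sorted keys, not SwiftKV's implementation. It uses a simple shrinking-cone variant of greedy PLR. The function names (`build_segments`, `lookup`), the error bound `eps`, and the use of binary search over segment start keys as a stand-in for the upper level are all assumptions made for this example.

```python
import bisect

def build_segments(keys, eps):
    """Greedily fit linear segments over sorted, distinct numeric keys.

    Each segment (start_key, start_pos, slope) predicts the position of
    every key it covers to within +/- eps (shrinking-cone variant).
    """
    inf = float("inf")
    segments = []
    k0, p0 = keys[0], 0          # origin of the current segment
    lo, hi = -inf, inf           # feasible slope range (the "cone")
    for pos in range(1, len(keys)):
        dx = keys[pos] - k0
        new_lo = max(lo, (pos - p0 - eps) / dx)
        new_hi = min(hi, (pos - p0 + eps) / dx)
        if new_lo > new_hi:
            # No single slope fits this point as well: close the segment
            # and start a new one at the current key.
            segments.append((k0, p0, (lo + hi) / 2))
            k0, p0 = keys[pos], pos
            lo, hi = -inf, inf
        else:
            lo, hi = new_lo, new_hi
    # A trailing segment holding only its origin gets slope 0.
    slope = 0.0 if lo == -inf else (lo + hi) / 2
    segments.append((k0, p0, slope))
    return segments

def lookup(keys, segments, starts, eps, key):
    """Return the index of key in keys, or None if it is absent."""
    i = bisect.bisect_right(starts, key) - 1   # segment covering the key
    if i < 0:
        return None
    k0, p0, slope = segments[i]
    guess = round(p0 + slope * (key - k0))
    # Search only a small window around the prediction; the extra slot on
    # each side absorbs floating-point rounding.
    lo = max(0, guess - eps - 1)
    hi = min(len(keys), guess + eps + 2)
    j = bisect.bisect_left(keys, key, lo, hi)
    return j if j < hi and keys[j] == key else None

# Hypothetical sample data: strictly increasing integer keys.
keys = [i * i + 3 * i for i in range(2000)]
eps = 8
segments = build_segments(keys, eps)
starts = [seg[0] for seg in segments]
print(len(keys), "keys covered by", len(segments), "segments")
print(lookup(keys, segments, starts, eps, keys[1234]))
```

In this sketch, a lookup costs one search over the segment start keys plus a search inside a window of at most `2 * eps + 3` entries, and the model stores three numbers per segment rather than one entry per indexed key. That trade-off between window size and model size is the general reason a learned index block can be both smaller and faster than a conventional one.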
Suggested Citation
Zhenfei Wang & Jianxun Feng & Longxiang Dun & Ziliang Bao & Chunfeng Du, 2025.
"SwiftKV: A Metadata Indexing Scheme Integrating LSM-Tree and Learned Index for Distributed KV Stores,"
Future Internet, MDPI, vol. 17(9), pages 1-21, August.
Handle:
RePEc:gam:jftint:v:17:y:2025:i:9:p:398-:d:1738320