Introduction
SiftRAG is a follow-up to the DimCiM project, extending its exploration of energy-aware computation into the domain of neural retrieval. Whereas DimCiM focused on similarity computation under a Compute-in-Memory (CiM) abstraction for classification workloads, this project opens a distinct research track: studying retrieval as a first-class similarity-computation workload. The work is conducted by Thibaud Clement under the supervision of Professor Sara Achour within the Stanford Novel Computing Systems Lab.
Retrieval workloads are becoming a dominant component of intelligent systems at scale. In Retrieval-Augmented Generation (RAG) pipelines, retrieval is often the bottleneck, and its energy cost grows with corpus size rather than with model parameters. As modern AI shifts from training large models to continuously deploying intelligence for search, recommendations, assistants, and edge inference, a substantial fraction of system energy is spent on memory movement and similarity comparisons rather than on learning. If retrieval scales naively with corpus size, it becomes a first-order barrier to sustainable intelligent systems.
We therefore study retrieval through the lens of representation robustness. Concretely, we treat retrieval as a similarity-computation workload and systematically reduce the amount of information examined per query, measuring how retrieval quality degrades and how those reductions translate into lower digitization and data-movement cost under a CiM-like model. By making the cost of information materialization explicit, we aim to expose opportunities for sustainable scaling of retrieval workloads.
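To make this concrete, the sketch below illustrates the kind of information-reduction sweep described above on synthetic data: embeddings are truncated to their first d dimensions, brute-force retrieval is re-run, and recall@k is compared against full-dimension results alongside a crude, linear cost proxy. This is a minimal illustration, not the project's actual experiment harness; the corpus sizes, dimension schedule, and the assumption that digitization and data-movement cost scale with the number of dimensions read are all illustrative.

```python
# Minimal sketch of an information-reduction sweep (illustrative assumptions only).
import numpy as np

rng = np.random.default_rng(0)
D, N, Q, K = 768, 10_000, 100, 10          # full dim, corpus size, queries, top-k

def normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

corpus = normalize(rng.standard_normal((N, D)).astype(np.float32))
queries = normalize(rng.standard_normal((Q, D)).astype(np.float32))

def topk(q, c, k):
    scores = q @ c.T                        # inner-product scores (cosine when full unit-norm vectors are used)
    return np.argsort(-scores, axis=1)[:, :k]

truth = topk(queries, corpus, K)            # full-information ground truth

for d in (768, 384, 192, 96, 48):
    approx = topk(queries[:, :d], corpus[:, :d], K)   # examine only d dims per query
    recall = np.mean([len(set(a) & set(t)) / K for a, t in zip(approx, truth)])
    rel_cost = d / D                        # crude proxy: digitization / data movement ~ dims read
    print(f"d={d:4d}  recall@{K}={recall:.3f}  relative cost={rel_cost:.2f}")
```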
This framing is guided by three central questions: (1) How much of a high-dimensional embedding must be examined to preserve retrieval quality? (2) How do different information-reduction strategies translate into reduced digitization and data-movement costs under a CiM-like cost model? (3) Can robust representations enable a two-stage retrieval pipeline in which a coarse, low-information stage safely prunes candidates for a more precise stage?
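Question (3) can also be sketched in the same style. The hedged example below reuses the synthetic `corpus`, `queries`, and `truth` from the sweep above: a coarse stage scores sign-quantized (1-bit) embeddings to prune the corpus to a small candidate pool, and a precise stage reranks only the survivors with full-precision similarity. The 1-bit surrogate and the pool size M are illustrative assumptions, not the project's chosen design.

```python
# Two-stage retrieval sketch: coarse 1-bit pruning, then full-precision rerank.
# Reuses np, corpus, queries, truth, Q, K, N from the sweep above.
M = 200                                     # candidates kept by the coarse stage (illustrative)

coarse_corpus = np.sign(corpus)             # 1-bit surrogate representation
coarse_queries = np.sign(queries)

coarse_scores = coarse_queries @ coarse_corpus.T
candidates = np.argsort(-coarse_scores, axis=1)[:, :M]

final = np.empty((Q, K), dtype=int)
for i in range(Q):
    cand = candidates[i]
    precise = queries[i] @ corpus[cand].T   # full-precision scores on survivors only
    final[i] = cand[np.argsort(-precise)[:K]]

recall = np.mean([len(set(f) & set(t)) / K for f, t in zip(final, truth)])
print(f"two-stage recall@{K}={recall:.3f}  (coarse pool M={M} of N={N})")
```

The design question the sketch exposes is whether the coarse stage can be made cheap enough, in digitized bits and data movement, that pruning pays for itself without discarding true top-k candidates.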