Don’t do “RAG” because now there is “CAG”
Attention, data scientists: CAG could be the new successor to RAG!

In recent years, Large Language Models (LLMs) have made great progress on knowledge-intensive tasks. The traditional Retrieval-Augmented Generation (RAG) approach produces better answers by feeding LLMs external knowledge sources at query time.
However, it is limited by retrieval latency and by errors in selecting the right passages.
This is where a new paradigm, Cache-Augmented Generation (CAG), comes in. CAG relies on models with long context windows to preload all the relevant information, so the retrieval step is eliminated entirely.
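To make the difference concrete, here is a minimal sketch of the two call patterns in Python. The `retriever` and `llm` objects and their methods are hypothetical placeholders used only for illustration, not a real library API:

```python
# Hypothetical objects, used only to contrast the two pipelines.

def answer_with_rag(query, retriever, llm):
    # RAG: retrieval happens at query time, so retrieval latency and
    # selection errors enter the pipeline right here.
    passages = retriever.search(query, top_k=5)
    context = "\n".join(passages)
    return llm.generate(f"{context}\n\nQuestion: {query}\nAnswer:")

def preload_knowledge(llm, documents):
    # CAG: done once, offline. The entire knowledge base is placed in
    # the model's context window (in practice, a precomputed KV cache).
    llm.set_context("\n".join(documents))

def answer_with_cag(query, llm):
    # At query time there is no retrieval step at all.
    return llm.generate(f"Question: {query}\nAnswer:")
```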
Advantages of CAG:
- Faster response times
- Lower risk of retrieval errors
- A simpler architecture
CAG Theory
LLMs have a fixed context window, which determines the maximum amount of information the model can attend to at once. CAG loads all the necessary knowledge into this context window in advance, so the model never needs to dynamically fetch information from a separate source at query time.
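Here is a minimal sketch of how this preloading can look with Hugging Face transformers; the model name and knowledge file below are placeholder assumptions, and any long-context causal LM follows the same pattern. The knowledge base is run through the model once, its key-value cache is kept, and each query is answered by appending new tokens after the cached prefix:

```python
# Hedged sketch of KV-cache preloading. Model name and knowledge file
# are placeholders; swap in any long-context causal LM you have access to.
import copy
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-7B-Instruct"  # placeholder long-context model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

# 1) Preload: one forward pass over the whole knowledge base, keeping the
#    resulting KV cache. This replaces building a retrieval index.
knowledge = open("knowledge_base.txt").read()  # placeholder file
knowledge_inputs = tokenizer(knowledge, return_tensors="pt").to(model.device)
with torch.no_grad():
    kv_cache = model(**knowledge_inputs, use_cache=True).past_key_values

# 2) Query: pass knowledge + question together with the cache; generate()
#    only processes the new question tokens, since the prefix is cached.
question = "\nQuestion: What does CAG stand for?\nAnswer:"
full_inputs = tokenizer(knowledge + question, return_tensors="pt").to(model.device)
outputs = model.generate(
    **full_inputs,
    past_key_values=copy.deepcopy(kv_cache),  # copy, so the cache is reusable
    max_new_tokens=64,
)
answer = tokenizer.decode(
    outputs[0, full_inputs.input_ids.shape[-1]:], skip_special_tokens=True
)
print(answer)
```

The cache is deep-copied per query because generation mutates it; the preloaded state can then be reused for every subsequent question. The one-time forward pass replaces the vector index of a RAG system, and the trade-off is that the entire knowledge base must fit inside the model's context window.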