RAG | Indexing Phase
Indexing is another step in the RAG pipeline, one that makes it possible to use LLMs on specific data. The referenced documents are split into chunks, and each chunk is represented as a vector in a vector space. This step lets us match an incoming query with the most relevant information in the documents.
There are 3 basic steps we need to know for indexing.
1. Vector Embedding
Vector Embedding is the process of transforming high-dimensional data, such as text, images or other complex data, into lower-dimensional numerical vectors that preserve its meaning. This conversion makes the data easier for computers to compare and manipulate.
- A model such as Word2Vec represents each word (e.g. “king”) as a vector, so that related words such as “queen” and “king” end up close to each other in the vector space.
- For example, the word “king” could be a vector like this: [0.5, -0.4, 0.7, 0.8, 0.9, -0.7, -0.6]
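To make the idea of "similarity between vectors" concrete, here is a minimal sketch that compares word vectors with cosine similarity, a common way to measure how close two embeddings are. The “king” vector is the illustrative one from above; the “queen” and “banana” vectors are hypothetical values written by hand for this example, not the output of a real Word2Vec model.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Illustrative vector for "king" taken from the text above.
king = [0.5, -0.4, 0.7, 0.8, 0.9, -0.7, -0.6]

# Hypothetical, hand-written vectors (assumptions for this sketch only).
queen = [0.6, -0.3, 0.7, 0.7, 0.9, -0.6, -0.5]
banana = [-0.8, 0.9, -0.1, 0.2, -0.5, 0.6, 0.3]

sim_king_queen = cosine_similarity(king, queen)
sim_king_banana = cosine_similarity(king, banana)

print(f"king vs queen:  {sim_king_queen:.3f}")
print(f"king vs banana: {sim_king_banana:.3f}")
```

With these toy values, “king” scores much higher against “queen” than against “banana”. A trained embedding model aims to produce vectors with this property, so that related words or text chunks can be found by comparing vectors.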
Why Do We Use Vector Embedding?
- Dimensionality reduction: High-dimensional data increases computational costs and can degrade the performance of algorithms. This processing step solves the…