Splitting text, the simple way (Image generated by author with DALL-E 3)

When preparing data for embedding and retrieval in a RAG system, splitting the text into appropriately sized chunks is crucial. This process is guided by two main factors: Model Constraints and Retrieval Effectiveness.

Model Constraints

Embedding models have a maximum token length for input;…
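
To make the model-constraint side concrete, here is a minimal sketch of fixed-size chunking in Python. It is an illustration, not the article's own implementation: the function name `split_into_chunks` and the `max_tokens` parameter are hypothetical, and whitespace-separated words stand in for tokens. A real embedding model counts tokens with its own tokenizer, so the true limit should be checked against that tokenizer rather than a word count.

```python
def split_into_chunks(text: str, max_tokens: int = 256) -> list[str]:
    """Split text into consecutive chunks of at most max_tokens words.

    Assumption: one whitespace-separated word approximates one token.
    Real tokenizers usually produce more tokens than words, so leave headroom.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")

    words = text.split()
    chunks = []
    # Walk through the word list in steps of max_tokens and rejoin each slice.
    for start in range(0, len(words), max_tokens):
        chunks.append(" ".join(words[start:start + max_tokens]))
    return chunks


if __name__ == "__main__":
    sample = "Embedding models have a maximum token length for input"
    for i, chunk in enumerate(split_into_chunks(sample, max_tokens=4)):
        print(i, chunk)
```

Choosing a smaller `max_tokens` keeps every chunk safely under the model limit at the cost of more chunks to embed and search, which is where the second factor, retrieval effectiveness, comes in.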
