Text Chunker for RAG

Split large text into overlapping chunks by character count, word count, or sentence boundary — output as JSON for embeddings and RAG pipelines.

Source text

Retrieval-augmented generation (RAG) combines a large language model with an external knowledge base. Instead of relying solely on what the model memorized during training, the system retrieves relevant passages from a document store and feeds them into the prompt as context. This lets the model answer questions using up-to-date or private information it was never trained on.

A critical step in building a RAG pipeline is chunking: splitting long documents into smaller pieces before they are embedded and stored in a vector database. Chunks that are too large dilute the embedding and hurt retrieval precision. Chunks that are too small lose surrounding context, making the retrieved passage hard to interpret on its own. Overlapping chunks help preserve context across chunk boundaries, so a sentence split across two chunks still appears in full at least once.

Most production pipelines settle on chunk sizes between 200 and 800 characters (or roughly 100 to 300 words), with an overlap of 10 to 20 percent of the chunk size. The right values depend on the embedding model's context window, the structure of the source documents, and how granular the retrieved answers need to be.

1,188 characters · 190 words

Split mode

Chunk size (characters)

Overlap (characters)

3 chunks generated

Chunk 1 · 500 chars · 77 words

Chunk 2 · 500 chars · 79 words

documents into smaller pieces before they are embedded and stored in a vector database. Chunks that are too large dilute the embedding and hurt retrieval precision. Chunks that are too small lose surrounding context, making the retrieved passage hard to interpret on its own. Overlapping chunks help preserve context across chunk boundaries, so a sentence split across two chunks still appears in full at least once. Most production pipelines settle on chunk sizes between 200 and 800 characters (o

Chunk 3 · 288 chars · 52 words

e on chunk sizes between 200 and 800 characters (or roughly 100 to 300 words), with an overlap of 10 to 20 percent of the chunk size. The right values depend on the embedding model's context window, the structure of the source documents, and how granular the retrieved answers need to be.

JSON array output

[
  {
    "index": 0,
    "text": "Retrieval-augmented generation (RAG) combines a large language model with an external knowledge base. Instead of relying solely on what the model memorized during training, the system retrieves relevant passages from a document store and feeds them into the prompt as context. This lets the model answer questions using up-to-date or private information it was never trained on.\n\nA critical step in building a RAG pipeline is chunking: splitting long documents into smaller pieces before they are emb",
    "char_count": 500,
    "word_count": 77
  },
  {
    "index": 1,
    "text": " documents into smaller pieces before they are embedded and stored in a vector database. Chunks that are too large dilute the embedding and hurt retrieval precision. Chunks that are too small lose surrounding context, making the retrieved passage hard to interpret on its own. Overlapping chunks help preserve context across chunk boundaries, so a sentence split across two chunks still appears in full at least once.\n\nMost production pipelines settle on chunk sizes between 200 and 800 characters (o",
    "char_count": 500,
    "word_count": 79
  },
  {
    "index": 2,
    "text": "e on chunk sizes between 200 and 800 characters (or roughly 100 to 300 words), with an overlap of 10 to 20 percent of the chunk size. The right values depend on the embedding model's context window, the structure of the source documents, and how granular the retrieved answers need to be.",
    "char_count": 288,
    "word_count": 52
  }
]

How RAG text chunking works

Retrieval-augmented generation (RAG) pipelines embed documents in pieces, not all at once — a whole PDF rarely fits in an embedding model's context window, and even when it does, a single vector for an entire document is too coarse to retrieve precisely. This tool splits your text into smaller, overlapping chunks and outputs a ready-to-use JSON array, so each chunk can be embedded and stored individually in a vector database (Pinecone, Weaviate, pgvector, Chroma, and similar stores all expect this kind of pre-split input).
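As a rough illustration of consuming that JSON array, the sketch below parses it and reshapes each chunk into an (id, text) pair. The field names (`index`, `text`, `char_count`, `word_count`) come from the sample output; the `load_chunks` helper, the `doc-` id prefix, and the two-record payload are hypothetical and exist only for this example. It also assumes `char_count` matches Python's `len`, which may not hold for text containing emoji or other characters outside the Basic Multilingual Plane if the tool counts UTF-16 code units.

```python
import json

# Hypothetical stand-in for the tool's output; real output uses the same four fields.
raw = """
[
  {"index": 0, "text": "RAG combines a model", "char_count": 20, "word_count": 4},
  {"index": 1, "text": "a model with a store", "char_count": 20, "word_count": 5}
]
"""


def load_chunks(payload: str) -> list[dict]:
    """Parse the JSON array and check each record's char_count against its text."""
    chunks = json.loads(payload)
    for chunk in chunks:
        # Assumption: char_count is measured the same way as Python's len().
        if chunk["char_count"] != len(chunk["text"]):
            raise ValueError(f"chunk {chunk['index']}: char_count mismatch")
    return chunks


chunks = load_chunks(raw)

# (id, text) pairs are a convenient shape to hand to an embedding step.
records = [(f"doc-{c['index']}", c["text"]) for c in chunks]
print(records)
```

Keeping the chunk index in the id makes it possible to fetch neighbouring chunks later if a retrieved passage needs more context.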

Three splitting strategies are supported. "By character count" slides a fixed-width window across the raw text; it is the simplest and most predictable mode, but it can cut a word or sentence in half. "By word count" does the same over whitespace-separated words, which keeps chunk sizes closer to token counts for most tokenizers. "By sentence boundary" packs whole sentences into each chunk up to a target character size, so chunks never split mid-sentence; it is usually the best choice for prose, articles, and documentation.
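The three strategies can be sketched in a few lines of Python. This is an illustrative approximation, not the tool's actual code: the function names are hypothetical, sentences are detected with a simple punctuation regex, the sentence mode omits overlap for brevity, and a single sentence longer than the target is kept whole rather than split.

```python
import re


def window_chunks(units: list[str], size: int, overlap: int, joiner: str) -> list[str]:
    """Slide a window of `size` units across the input, stepping by size - overlap."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(units), step):
        chunks.append(joiner.join(units[start:start + size]))
        if start + size >= len(units):
            break  # this window already reached the end of the input
    return chunks


def by_characters(text: str, size: int, overlap: int) -> list[str]:
    # Units are single characters, so chunks can end mid-word.
    return window_chunks(list(text), size, overlap, "")


def by_words(text: str, size: int, overlap: int) -> list[str]:
    # Units are whitespace-separated words; original spacing is normalised.
    return window_chunks(text.split(), size, overlap, " ")


def by_sentences(text: str, target_chars: int) -> list[str]:
    """Greedily pack whole sentences until one more would exceed target_chars."""
    # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > target_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


print(by_characters("abcdefghij", 4, 1))
print(by_words("one two three four five", 3, 1))
print(by_sentences("One here. Two here. Three here.", 20))
```

The character and word modes share one windowing routine because they differ only in what counts as a unit; the sentence mode needs its own loop because its units vary in length.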

The overlap setting repeats a small amount of trailing content at the start of the next chunk. This matters because a fact or reference that spans a chunk boundary would otherwise be truncated in both chunks; with overlap, it appears in full in at least one of them as long as it is no longer than the overlap. A common starting point is a chunk size of 300–800 characters (or 100–300 words) with an overlap of 10–20% of the chunk size; tune both values based on your embedding model's context window and how granular you need retrieved passages to be. Everything runs locally in your browser, and no text is uploaded anywhere.
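The arithmetic behind overlap is easy to check by hand: each chunk starts `size - overlap` characters after the previous one. The sketch below uses a hypothetical `chunk_starts` helper and plugs in the sample text's 1,188 characters with a chunk size of 500. The overlap of 50 is an inference from the sample chunk lengths (500 + 500 + 288 is 100 more than 1,188, spread over two boundaries), not a value the page states.

```python
import math


def chunk_starts(total: int, size: int, overlap: int) -> list[int]:
    """Return the start offset of every chunk in a text of `total` characters."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap
    # The last chunk must reach the end of the text: (k - 1) * step + size >= total.
    count = max(1, math.ceil((total - overlap) / step))
    return [i * step for i in range(count)]


starts = chunk_starts(1188, 500, 50)
lengths = [min(500, 1188 - s) for s in starts]
print(starts)   # [0, 450, 900]
print(lengths)  # [500, 500, 288]
```

Raising the overlap shrinks the step, so the same text yields more chunks and more stored vectors; that cost is the trade-off for better coverage at chunk boundaries.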

Private & free — this tool runs entirely in your browser.