The text module provides pure-JS text processing utilities, with no external dependencies.

splitText

Split text into chunks using a recursive character-splitting strategy. The splitter tries the separators in order and splits on the first one that occurs in the text. It then merges small pieces back together up to chunkSize, keeps chunkOverlap characters of overlap between consecutive chunks, and recurses with the finer separators on any piece that is still too large.
- run:
    module: text
    function: splitText
    parameters:
      content: "{{document.text}}"
      chunkSize: 1000
      chunkOverlap: 200
    output: chunks
| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| content | string \| string[] | yes | | Text or array of texts to split |
| chunkSize | number | yes | | Maximum size of each chunk (in characters) |
| chunkOverlap | number | yes | | Number of overlapping characters between consecutive chunks |
| separators | string[] | no | ["\n\n", "\n", " ", ""] | Ordered list of separators to try, from coarsest to finest |
| keepSeparator | boolean \| "start" \| "end" | no | false | Attach the separator to the chunk. true or "end" appends it to the preceding chunk, "start" prepends it to the following chunk. Only visible with non-whitespace separators (whitespace is trimmed). |

Returns an array of { content, size } objects:
[
  { "content": "First chunk text...", "size": 253 },
  { "content": "Second chunk text...", "size": 241 }
]
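
To make the strategy concrete, here is a minimal sketch of recursive character splitting with overlap. It is not the module's actual source: the names splitRecursive and mergePieces are hypothetical, keepSeparator is left out, and size is assumed to be the character length of content.

```javascript
// Illustrative sketch only; not the text module's actual implementation.
const DEFAULT_SEPARATORS = ["\n\n", "\n", " ", ""];

// Merge small pieces into chunks of at most chunkSize characters,
// carrying up to chunkOverlap trailing characters into the next chunk.
function mergePieces(pieces, sep, chunkSize, chunkOverlap) {
  const chunks = [];
  let current = []; // pieces that make up the chunk being built
  const lengthOf = (parts) => parts.join(sep).length;

  for (const piece of pieces) {
    if (current.length > 0 && lengthOf([...current, piece]) > chunkSize) {
      chunks.push(current.join(sep));
      // Drop leading pieces until the remainder fits the overlap budget
      // and still leaves room for the incoming piece.
      while (
        current.length > 0 &&
        (lengthOf(current) > chunkOverlap ||
          lengthOf([...current, piece]) > chunkSize)
      ) {
        current.shift();
      }
    }
    current.push(piece);
  }
  if (current.length > 0) chunks.push(current.join(sep));
  return chunks;
}

function splitRecursive(text, chunkSize, chunkOverlap, separators = DEFAULT_SEPARATORS) {
  if (text.length <= chunkSize) return text === "" ? [] : [text];

  // Use the first separator that occurs in the text ("" always matches).
  const index = separators.findIndex((s) => s === "" || text.includes(s));
  const sep = index === -1 ? "" : separators[index];
  const finer = index === -1 ? [] : separators.slice(index + 1);

  const chunks = [];
  let small = []; // consecutive pieces that already fit in one chunk
  const flushSmall = () => {
    chunks.push(...mergePieces(small, sep, chunkSize, chunkOverlap));
    small = [];
  };

  for (const piece of text.split(sep)) {
    if (piece === "") continue; // skip empties left by repeated separators
    if (piece.length <= chunkSize) {
      small.push(piece);
    } else {
      // Still too large: merge what we have, then recurse with finer separators.
      flushSmall();
      chunks.push(...splitRecursive(piece, chunkSize, chunkOverlap, finer));
    }
  }
  flushSmall();
  return chunks;
}

// Assumption: size is the character length of content.
const chunks = splitRecursive("aaa bbb ccc ddd", 7, 3).map((content) => ({
  content,
  size: content.length,
}));
console.log(chunks.map((c) => c.content));
// → [ 'aaa bbb', 'bbb ccc', 'ccc ddd' ]
```

With chunkSize 7 and chunkOverlap 3, each chunk repeats the last word of the previous one, which is the effect chunkOverlap is meant to produce.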

Split with custom separators (e.g. Markdown headings)

- run:
    module: text
    function: splitText
    parameters:
      content: "{{document.text}}"
      chunkSize: 1500
      chunkOverlap: 100
      separators:
        - "\n## "
        - "\n### "
        - "\n\n"
        - "\n"
        - " "
        - ""
      keepSeparator: start
    output: chunks
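
The effect of keepSeparator on a heading separator can be illustrated with a small sketch. The helper splitKeeping below is hypothetical and not part of the module. It assumes, following the parameter description, that each piece is trimmed after the separator is attached.

```javascript
// Hypothetical helper for illustration; not part of the text module's API.
function splitKeeping(text, sep, keepSeparator = false) {
  const parts = text.split(sep);
  const last = parts.length - 1;

  const attached = parts.map((part, i) => {
    if (keepSeparator === false) return part;
    if (keepSeparator === "start") {
      // Separator is prepended to every piece except the first.
      return i === 0 ? part : sep + part;
    }
    // true or "end": separator is appended to every piece except the last.
    return i === last ? part : part + sep;
  });

  // Assumption: surrounding whitespace is trimmed and empty pieces dropped.
  return attached.map((p) => p.trim()).filter((p) => p !== "");
}

const doc = "Intro\n## Setup\nInstall it.\n## Usage\nRun it.";

console.log(splitKeeping(doc, "\n## ", "start"));
// → [ 'Intro', '## Setup\nInstall it.', '## Usage\nRun it.' ]

console.log(splitKeeping(doc, "\n## ", "end"));
// → [ 'Intro\n##', 'Setup\nInstall it.\n##', 'Usage\nRun it.' ]
```

With "start", each heading marker stays with the section it introduces, which is why it suits heading separators. With "end", the marker is left dangling at the end of the previous section.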

Iterate over chunks

- run:
    module: text
    function: splitText
    parameters:
      content: "{{document.text}}"
      chunkSize: 500
      chunkOverlap: 50
    output: chunks
- repeat:
    on: "{{chunks}}"
    do:
      - emit:
          event: chunk-ready
          payload:
            text: "{{item.content}}"
            size: "{{item.size}}"