# Text Module

> Pure-JS text processing utilities for automations

The **text** module provides pure-JS text processing utilities, with no external dependencies.

## parseUrl

Generic URL parser. Extracts standard URL parts (domain, path, query) plus Prisme.ai-specific fields (workspaceId, file id/filename).

```yaml theme={null}
- run:
    module: text
    function: parseUrl
    parameters:
      url: "{{uploaded_file.url}}"
    output: parsed
# parsed.domain      → "api.prisme.ai"
# parsed.path        → "/v2/files/ws123/abc.report.pdf"
# parsed.id          → "abc"
# parsed.filename    → "report.pdf"
# parsed.ext         → "pdf"
# parsed.workspaceId → "ws123"
# parsed.mimetype    → "application/pdf"
# parsed.query       → { "token": "xyz" }
```

| Parameter | Type   | Required | Default | Description             |
| --------- | ------ | -------- | ------- | ----------------------- |
| `url`     | string | yes      |         | Any URL (or raw string) |

Returns an object with the following fields:

| Field         | Type   | Description                                                                                                  |
| ------------- | ------ | ------------------------------------------------------------------------------------------------------------ |
| `domain`      | string | Hostname (e.g. `api.prisme.ai`). Empty for non-URLs.                                                         |
| `path`        | string | Full pathname (e.g. `/v2/files/ws123/abc.report.pdf`).                                                       |
| `id`          | string | From the last path segment: everything before the first dot, only when 2+ dots are present. Empty otherwise. |
| `filename`    | string | From the last path segment: everything after the first dot if 2+ dots are present, else the full segment.    |
| `ext`         | string | Lowercase file extension from the filename (e.g. `pdf`, `xlsx`). Empty if none.                              |
| `workspaceId` | string | Extracted from `/files/{wsId}/…` or `/workspaces/{wsId}/…`. Empty if not found.                              |
| `mimetype`    | string | MIME type inferred from the file extension.                                                                  |
| `query`       | object | Query string parameters as key-value pairs.                                                                  |
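
The `workspaceId` rule in the table above can be illustrated with a short sketch. It assumes a plain regular expression over the path; `extractWorkspaceId` is a hypothetical helper written for this page, not the module's actual implementation.

```typescript
// Hypothetical helper (assumption): mirrors the documented rule only.
// Looks for "/files/{wsId}" or "/workspaces/{wsId}" in a path.
function extractWorkspaceId(path: string): string {
  const match = /\/(?:files|workspaces)\/([^/?#]+)/.exec(path);
  // Empty string when neither pattern is present, as documented.
  return match?.[1] ?? "";
}

console.log(extractWorkspaceId("/v2/files/ws123/abc.report.pdf")); // "ws123"
```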

### How `id` / `filename` splitting works

The last path segment is parsed based on the number of dots:

* **2+ dots** (`{id}.{name}.{ext}`): `id` is everything before the first dot, `filename` is the rest.
* **1 dot** (`{name}.{ext}`): the whole segment is the `filename`, no `id`.
* **0 dots**: the whole segment is the `filename`, no `id`.

| Last segment        | `id`      | `filename`   |
| ------------------- | --------- | ------------ |
| `abc123.report.pdf` | `abc123`  | `report.pdf` |
| `doc.pdf`           | *(empty)* | `doc.pdf`    |
| `README`            | *(empty)* | `README`     |
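
The dot-counting rule above can be sketched in a few lines. This is a minimal illustration under the rules stated on this page; `splitLastSegment` is a hypothetical name, not a function exposed by the module.

```typescript
// Hypothetical helper (assumption): reproduces the documented id/filename rule.
function splitLastSegment(segment: string): { id: string; filename: string } {
  const dotCount = segment.split(".").length - 1;
  if (dotCount >= 2) {
    // {id}.{name}.{ext}: cut at the first dot.
    const firstDot = segment.indexOf(".");
    return {
      id: segment.slice(0, firstDot),
      filename: segment.slice(firstDot + 1),
    };
  }
  // 0 or 1 dot: the whole segment is the filename, no id.
  return { id: "", filename: segment };
}

const example = splitLastSegment("abc123.report.pdf");
console.log(example.id, example.filename); // abc123 report.pdf
```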

### Examples

Extract a file ID from a native upload URL:

```yaml theme={null}
- run:
    module: text
    function: parseUrl
    parameters:
      url: "{{file_part.url}}"
    output: _parsed
- set:
    name: file_id
    value: "{{_parsed.id}}"
```

Extract the workspace ID from any Prisme.ai URL:

```yaml theme={null}
- run:
    module: text
    function: parseUrl
    parameters:
      url: "{{webhook_url}}"
    output: _parsed
- set:
    name: ws_id
    value: "{{_parsed.workspaceId}}"
```

## splitText

Splits text into chunks using a recursive character-splitting strategy. The splitter tries the separators in order and splits on the first one found in the text. It then merges small pieces back together up to `chunkSize`, keeps `chunkOverlap` characters of overlap between consecutive chunks, and recurses with the finer separators on any piece that is still too large.

```yaml theme={null}
- run:
    module: text
    function: splitText
    parameters:
      content: "{{document.text}}"
      chunkSize: 1000
      chunkOverlap: 200
    output: chunks
```

| Parameter       | Type                            | Required | Default                   | Description                                                                                                                                                                                                |
| --------------- | ------------------------------- | -------- | ------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `content`       | string \| string\[]             | yes      |                           | Text or array of texts to split                                                                                                                                                                            |
| `chunkSize`     | number                          | yes      |                           | Maximum size of each chunk (in characters)                                                                                                                                                                 |
| `chunkOverlap`  | number                          | yes      |                           | Number of overlapping characters between consecutive chunks                                                                                                                                                |
| `separators`    | string\[]                       | no       | `["\n\n", "\n", " ", ""]` | Ordered list of separators to try, from coarsest to finest                                                                                                                                                 |
| `keepSeparator` | boolean \| `"start"` \| `"end"` | no       | `false`                   | Attach the separator to the chunk. `true` or `"end"` appends it to the preceding chunk, `"start"` prepends it to the following chunk. The effect is only visible with non-whitespace separators, because whitespace is trimmed. |
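
To make the `keepSeparator` modes concrete, here is a small sketch of the attachment rule for a single separator, with no merging or trimming. `splitKeeping` and the `KeepSeparator` type are hypothetical names used for illustration only; the module's real behavior may differ in edge cases.

```typescript
// Hypothetical sketch (assumption): shows only where the separator ends up.
type KeepSeparator = boolean | "start" | "end";

function splitKeeping(text: string, separator: string, keep: KeepSeparator): string[] {
  const parts = text.split(separator);
  const out = parts.map((part, i) => {
    if (keep === false) return part; // separator dropped
    if (keep === "start") return i === 0 ? part : separator + part; // prepend to following piece
    return i < parts.length - 1 ? part + separator : part; // true or "end": append to preceding piece
  });
  return out.filter((piece) => piece !== "");
}

console.log(splitKeeping("a;b;c", ";", "start")); // pieces: "a", ";b", ";c"
console.log(splitKeeping("a;b;c", ";", "end"));   // pieces: "a;", "b;", "c"
```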

Returns an array of `{ content, size }` objects:

```json theme={null}
[
  { "content": "First chunk text...", "size": 253 },
  { "content": "Second chunk text...", "size": 241 }
]
```
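
For readers who want to reason about chunk boundaries, the strategy described at the top of this section can be approximated as follows. This is a simplified sketch, not the module's source: `splitRecursive`, `splitTextSketch`, and `TextChunk` are hypothetical names, `size` is assumed to be the character length of `content`, and `keepSeparator` is not handled.

```typescript
// Illustrative sketch only: names and details are assumptions, not the module's source.
interface TextChunk {
  content: string;
  size: number; // assumed here to be the character length of `content`
}

function splitRecursive(
  text: string,
  chunkSize: number,
  chunkOverlap: number,
  separators: string[],
): string[] {
  // Use the first separator found in the text; "" (per-character split) always matches
  // and is also the fallback when no separator is left.
  const idx = separators.findIndex((s) => s === "" || text.includes(s));
  const sep = idx >= 0 ? (separators[idx] ?? "") : "";
  const finer = idx >= 0 ? separators.slice(idx + 1) : [];

  const chunks: string[] = [];
  let current: string[] = []; // pieces waiting to be merged into one chunk
  const joinedLength = (parts: string[]): number => parts.join(sep).length;
  const emit = (parts: string[]): void => {
    const chunk = parts.join(sep).trim();
    if (chunk !== "") chunks.push(chunk);
  };

  for (const piece of text.split(sep)) {
    if (piece === "") continue;
    if (piece.length > chunkSize) {
      // Still too large: flush pending pieces, then recurse with finer separators.
      emit(current);
      current = [];
      chunks.push(...splitRecursive(piece, chunkSize, chunkOverlap, finer));
      continue;
    }
    if (current.length > 0 && joinedLength([...current, piece]) > chunkSize) {
      emit(current);
      // Keep only a tail that fits in chunkOverlap and leaves room for the new piece.
      while (
        current.length > 0 &&
        (joinedLength(current) > chunkOverlap ||
          joinedLength([...current, piece]) > chunkSize)
      ) {
        current.shift();
      }
    }
    current.push(piece);
  }
  emit(current);
  return chunks;
}

function splitTextSketch(
  content: string,
  chunkSize: number,
  chunkOverlap: number,
  separators: string[] = ["\n\n", "\n", " ", ""],
): TextChunk[] {
  if (chunkSize < 1 || chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new Error("expected chunkSize >= 1 and 0 <= chunkOverlap < chunkSize");
  }
  return splitRecursive(content, chunkSize, chunkOverlap, separators).map(
    (c) => ({ content: c, size: c.length }),
  );
}

console.log(splitTextSketch("aaa bbb ccc ddd", 7, 3).map((c) => c.content));
// chunks: "aaa bbb", "bbb ccc", "ccc ddd"
```

With `chunkOverlap` set to 3, each chunk in this sketch starts with the last word of the previous one; with 0, the same input yields `"aaa bbb"` and `"ccc ddd"`.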

### Split with custom separators (e.g. Markdown headings)

```yaml theme={null}
- run:
    module: text
    function: splitText
    parameters:
      content: "{{document.text}}"
      chunkSize: 1500
      chunkOverlap: 100
      separators:
        - "\n## "
        - "\n### "
        - "\n\n"
        - "\n"
        - " "
        - ""
      keepSeparator: start
    output: chunks
```

### Iterate over chunks

```yaml theme={null}
- run:
    module: text
    function: splitText
    parameters:
      content: "{{document.text}}"
      chunkSize: 500
      chunkOverlap: 50
    output: chunks
- repeat:
    on: "{{chunks}}"
    do:
      - emit:
          event: chunk-ready
          payload:
            text: "{{item.content}}"
            size: "{{item.size}}"
```
