API Reference

Convert and split a document into optimal chunks.

Convert files into optimal chunks of text ready to be embedded and/or indexed.

On success, the response contains a string process ID that you can use to track the parsing jobs you have started.

Learn more about how to use the Parsing parameters and what output to expect.

For code examples, see how to integrate it in a custom pipeline, LlamaIndex, or LangChain.

Query Params
string

The URL to be called after chunking, where the result will be posted. If you prefer to get the result through an HTTP request, omit this parameter.

string

The language of the document as an ISO 639-1 code. If not provided, the system identifies it automatically.

boolean
Defaults to false

If set to true, each chunk will include the title of the parent paragraph/section.

boolean
Defaults to true

If set to false, the content of the headers will be removed. Headers may include page numbers, document titles, section titles, paragraph titles, and fixed layout elements.

boolean
Defaults to true

If set to true, only relevant titles will be included in the chunks, while other information will be removed. Relevant titles are those that should be part of the body of the page as a title. If set to false, only the keep_header parameter will be considered. If keep_header is false, the smart_header parameter will be ignored.
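The interaction between keep_header and smart_header described above can be summarized as a small decision function. This is only a sketch of the documented behavior, not the service's implementation; the function name `header_mode` and the labels it returns are hypothetical and chosen for illustration.

```python
def header_mode(keep_header: bool = True, smart_header: bool = True) -> str:
    """Return an illustrative label for how header content is treated."""
    if not keep_header:
        # keep_header=false removes header content; smart_header is ignored.
        return "removed"
    if smart_header:
        # Only titles that belong in the body of the page are kept.
        return "relevant_titles_only"
    # smart_header=false: only keep_header applies, so header content is kept.
    return "kept"


# With both documented defaults (true, true), only relevant titles remain.
print(header_mode())
print(header_mode(keep_header=False, smart_header=True))
print(header_mode(keep_header=True, smart_header=False))
```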

boolean
Defaults to false

boolean
Defaults to false

If set to true, the text contained in the images will be added to the chunks.

string
Defaults to text

Specifies the output format for tables.

boolean
Defaults to false

If set to true, and if tables are split across multiple chunks, each chunk will include the table row header.

boolean
Defaults to false

If set to true, short chunks will be merged with others to maximize the chunk length.
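As a rough illustration of how these query parameters might be attached to a request URL, the sketch below uses only the Python standard library. Several things here are assumptions rather than documented facts: the endpoint URL is a placeholder (this section does not state it), the helper name `build_chunk_url` is hypothetical, and booleans are assumed to be sent as lowercase `true`/`false`. Only `keep_header` and `smart_header` are used because they are the only parameter names that appear in this section.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: the real endpoint URL is not given in this section.
BASE_URL = "https://api.example.com/chunk"


def build_chunk_url(base_url: str, **params) -> str:
    """Append query parameters, skipping any left as None so the server default applies."""
    encoded = {}
    for name, value in params.items():
        if value is None:
            continue
        # Assumption: booleans are serialized as lowercase "true"/"false".
        encoded[name] = str(value).lower() if isinstance(value, bool) else str(value)
    return f"{base_url}?{urlencode(encoded)}" if encoded else base_url


url = build_chunk_url(BASE_URL, keep_header=True, smart_header=False)
print(url)
```

Leaving a parameter out entirely, rather than sending its default, keeps the request short and lets the documented defaults apply.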

Body Params
file
required

The file to upload, sent as binary in the request form. Maximum file size: 30 MB. Allowed file types: pdf, doc, docx, ppt, pptx, xls, xlsx, odt, ods, odp, eml, html, plain text.
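A client may want to check the size and type limits above before uploading. The following is a minimal sketch under stated assumptions: "30 MB" is read as 30 * 1024 * 1024 bytes, "plain text" is mapped to the `.txt` extension, and the helper name `upload_problem` is hypothetical. The service may apply its own, different checks (for example on content type rather than file extension).

```python
from pathlib import PurePath

# Assumption: "30 MB" is interpreted as 30 * 1024 * 1024 bytes.
MAX_BYTES = 30 * 1024 * 1024

# Assumption: "plain text" corresponds to the .txt extension.
ALLOWED_EXTENSIONS = {
    "pdf", "doc", "docx", "ppt", "pptx", "xls", "xlsx",
    "odt", "ods", "odp", "eml", "html", "txt",
}


def upload_problem(filename: str, size_bytes: int):
    """Return a reason the file would likely be rejected, or None if it looks acceptable."""
    # Compare the extension case-insensitively and without the leading dot.
    ext = PurePath(filename).suffix.lower().lstrip(".")
    if ext not in ALLOWED_EXTENSIONS:
        return f"unsupported file type: {ext or 'none'}"
    if size_bytes > MAX_BYTES:
        return f"file too large: {size_bytes} bytes"
    return None
```

In practice the size could come from `os.path.getsize(path)`, for example `upload_problem(path, os.path.getsize(path))`.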

Response

application/json