Layout Parsing

curl --request POST \
  --url https://api.z.ai/api/paas/v4/layout_parsing \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "glm-ocr",
  "file": "https://cdn.bigmodel.cn/static/logo/introduction.png"
}
'

{
  "id": "task_123456789",
  "created": 1727156815,
  "model": "GLM-OCR",
  "md_results": "# Doc title\nThis is the document content...",
  "layout_details": [
    [
      {
        "index": 1,
        "label": "text",
        "bbox_2d": [
          0.1,
          0.1,
          0.5,
          0.3
        ],
        "content": "This is the content of the element",
        "height": 800,
        "width": 600
      }
    ]
  ],
  "layout_visualization": [
    "<string>"
  ],
  "data_info": {
    "num_pages": 5,
    "pages": [
      {
        "width": 600,
        "height": 800
      }
    ]
  },
  "usage": {
    "prompt_tokens": 123,
    "completion_tokens": 123,
    "prompt_tokens_details": {
      "cached_tokens": 123
    },
    "total_tokens": 123
  },
  "request_id": "req_123456789"
}

Authorizations

Authorization

string

header

required

Authenticate with a bearer token in the Authorization header, in the format: Bearer <token>
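
As a minimal sketch of the header format above, the following Python builds (but does not send) the POST request using only the standard library. The URL and header names come from the curl example; the helper name `build_request` and the environment variable `ZAI_API_KEY` are assumptions made for illustration.

```python
import json
import os
import urllib.request

API_URL = "https://api.z.ai/api/paas/v4/layout_parsing"


def build_request(token: str, payload: dict) -> urllib.request.Request:
    """Build the POST request; the caller decides when to send it."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# ZAI_API_KEY is a hypothetical variable name, not one defined by the API.
request = build_request(
    os.environ.get("ZAI_API_KEY", "<token>"),
    {
        "model": "glm-ocr",
        "file": "https://cdn.bigmodel.cn/static/logo/introduction.png",
    },
)
# To send it: urllib.request.urlopen(request), then json.load() the body.
```

Building the request separately from sending it keeps the token handling easy to inspect and test without network access.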

Body

application/json

model

enum<string>

required

Model code: glm-ocr

Available options:

glm-ocr

Example:

"glm-ocr"

file

string

required

Image or PDF document to be recognized, supplied as a URL or a base64-encoded string. Supported formats: PDF, JPG, PNG. Images must be ≤10MB; PDFs must be ≤50MB and at most 100 pages.

Example:

"https://cdn.bigmodel.cn/static/logo/introduction.png"
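
The limits above can be checked on the client before uploading. The sketch below passes URLs through unchanged and base64-encodes local files. Several details are assumptions rather than documented behavior: the helper name `file_field`, treating 1MB as 1024 × 1024 bytes, accepting `.jpeg` alongside `.jpg`, and sending raw base64 without a data-URI prefix.

```python
import base64
from pathlib import Path

# Assumption: 1MB = 1024 * 1024 bytes; the reference does not say.
MAX_IMAGE_BYTES = 10 * 1024 * 1024  # "Images must be <=10MB"
MAX_PDF_BYTES = 50 * 1024 * 1024    # "PDFs must be <=50MB"
# Assumption: .jpeg is treated the same as .jpg.
SUPPORTED_SUFFIXES = {".pdf", ".jpg", ".jpeg", ".png"}


def file_field(source: str) -> str:
    """Return a value for `file`: URLs pass through, local files become base64."""
    if source.startswith(("http://", "https://")):
        return source
    path = Path(source)
    suffix = path.suffix.lower()
    if suffix not in SUPPORTED_SUFFIXES:
        raise ValueError(f"unsupported format: {suffix or '(none)'}")
    data = path.read_bytes()
    limit = MAX_PDF_BYTES if suffix == ".pdf" else MAX_IMAGE_BYTES
    if len(data) > limit:
        raise ValueError(f"{path.name} is {len(data)} bytes; limit is {limit}")
    # Assumption: raw base64 text with no "data:...;base64," prefix.
    return base64.b64encode(data).decode("ascii")
```

The 100-page limit for PDFs is not checked here, since counting pages needs a PDF parser outside the standard library.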

return_crop_images

boolean

default:false

Whether to return cropped images of the recognized elements

need_layout_visualization

boolean

default:false

Whether to return layout visualization images of the recognition result

start_page_id

integer

Start page number for parsing when PDF is provided

Required range: x >= 1

end_page_id

integer

End page number for parsing when PDF is provided

Required range: x >= 1
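
A small validation helper can catch out-of-range page numbers before the request is sent. This is a sketch under stated assumptions: the helper name `page_range` and the example PDF URL are hypothetical, and rejecting a start page greater than the end page is an assumption, since the reference only states that each value must be >= 1.

```python
from typing import Optional


def page_range(start: Optional[int] = None, end: Optional[int] = None) -> dict:
    """Return the optional page-range fields to merge into the request body."""
    params = {}
    for name, value in (("start_page_id", start), ("end_page_id", end)):
        if value is None:
            continue  # both fields are optional, so omit unset ones
        if value < 1:
            raise ValueError(f"{name} must be >= 1, got {value}")
        params[name] = value
    # Assumption: a start page after the end page is a caller error.
    if start is not None and end is not None and start > end:
        raise ValueError("start_page_id must not exceed end_page_id")
    return params


# Hypothetical PDF URL, used only to illustrate the merged body.
body = {"model": "glm-ocr", "file": "https://example.com/report.pdf"}
body.update(page_range(2, 4))
```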

request_id

string

Unique request identifier, automatically generated if not provided

Example:

"req_123456789"

user_id

string

End user ID for abuse monitoring. Length: 6-128 characters

Required string length: 6 - 128

Example:

"user_123456"

Response

Request processed successfully

id

string

required

Task ID

Example:

"task_123456789"

created

integer<int64>

required

Request creation time, Unix timestamp in seconds

Example:

1727156815

model

string

required

Model name

Example:

"GLM-OCR"

md_results

string

Recognition result in Markdown format

Example:

"# Doc title\nThis is the document content..."

layout_details

object[][]

Detailed layout information

layout_details.index

integer

required

Element index

Example:

1

layout_details.label

enum<string>

required

Element type: image for images, text for text content, formula for inline formulas, table for tables

Available options: image, text, formula, table

Example:

"text"

layout_details.bbox_2d

number[]

Normalized element coordinates [x1,y1,x2,y2]

Required array length: 4 elements

Required range: 0 <= x <= 1

Example:

[0.1, 0.1, 0.5, 0.3]

layout_details.content

string

Element content (text / image URL / table HTML)

Example:

"This is the content of the element"

layout_details.height

integer

Page height

Example:

800

layout_details.width

integer

Page width

Example:

600
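
Because bbox_2d is normalized to the range 0 to 1, drawing or cropping an element means scaling it by the page size. The sketch below walks the nested layout_details array from the example response and converts each box to pixels. It assumes that x values scale with width and y values with height, and that each inner array groups the elements of one page; neither is stated explicitly in this reference. The helper names `to_pixels` and `pixel_boxes` are illustrative.

```python
import json

# Trimmed copy of the example response shown at the top of this page.
sample = json.loads("""
{"layout_details": [[{"index": 1, "label": "text",
  "bbox_2d": [0.1, 0.1, 0.5, 0.3],
  "content": "This is the content of the element",
  "height": 800, "width": 600}]]}
""")


def to_pixels(element: dict) -> tuple:
    """Scale a normalized [x1, y1, x2, y2] box by the element's page size."""
    x1, y1, x2, y2 = element["bbox_2d"]
    w, h = element["width"], element["height"]
    # round() absorbs floating-point noise such as 0.1 * 600.
    return (round(x1 * w), round(y1 * h), round(x2 * w), round(y2 * h))


def pixel_boxes(layout_details: list) -> list:
    """Flatten the nested array into (outer_position, label, pixel_box) rows."""
    rows = []
    for outer, elements in enumerate(layout_details):
        for element in elements:
            # bbox_2d, width and height are not marked required; skip if absent.
            if {"bbox_2d", "width", "height"} <= element.keys():
                rows.append((outer, element["label"], to_pixels(element)))
    return rows


print(pixel_boxes(sample["layout_details"]))
# prints [(0, 'text', (60, 80, 300, 240))]
```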

layout_visualization

string[]

Recognition result image URLs

data_info

object

Document basic information

data_info.num_pages

integer

required

Total number of document pages

Example:

5

data_info.pages

object[]

Per-page size information

data_info.pages.width

integer

required

Page width

Example:

600

data_info.pages.height

integer

required

Page height

Example:

800

usage

object

Token usage statistics returned when the model call ends.

usage.prompt_tokens

number

Number of input tokens

usage.completion_tokens

number

Number of output tokens

usage.prompt_tokens_details

object

usage.prompt_tokens_details.cached_tokens

number

Number of tokens served from cache

usage.total_tokens

integer

Total number of tokens

request_id

string

Request ID

Example:

"req_123456789"
