🔍 Powered by Tesseract OCR

Extract Text from
Any Image or Document

OCR API powered by Tesseract. Extract text from receipts, invoices, scanned PDFs, business cards, and screenshots in 20 languages.

Get API Key Free →

Request

POST /extract
Content-Type: application/json

{
  "url": "https://example.com/receipt.jpg",
  "language": "eng"
}

Response — 200 OK

{
  "status": "ok",
  "text": "RECEIPT\nDate: 2024-01-15\nItem 1  $12.50\nItem 2  $8.99\nTotal: $21.49",
  "confidence": 94.2,
  "word_count": 12,
  "language": "eng",
  "words": [...]
}
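
A minimal Python sketch of the request and response shown above, using only the standard library. The base URL and the `X-API-Key` header name are assumptions for illustration (the page shows only the `/extract` path and does not say how the key is sent), so substitute the values from your account.

```python
import json
import urllib.request

# Hypothetical base URL: the page only shows the path "/extract".
BASE_URL = "https://api.example.com"


def build_extract_request(image_url, language="eng", api_key=None):
    """Build (but do not send) a POST /extract request."""
    body = json.dumps({"url": image_url, "language": language}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Assumed header name; check the API docs for the real one.
        headers["X-API-Key"] = api_key
    return urllib.request.Request(
        BASE_URL + "/extract", data=body, headers=headers, method="POST"
    )


def parse_extract_response(raw):
    """Decode a response body and return (text, confidence)."""
    payload = json.loads(raw)
    if payload.get("status") != "ok":
        raise ValueError(f"extraction failed: {payload!r}")
    return payload["text"], payload["confidence"]
```

To send the request, pass the result of `build_extract_request(...)` to `urllib.request.urlopen` and feed the response body to `parse_extract_response`.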

Features

Production-grade text extraction

🌐

20 Languages

English, Chinese (Simplified/Traditional), Japanese, Korean, Arabic, Hindi, Russian, German, French, Spanish, and more.

📊

Confidence Score

Every extraction includes a confidence score (0–100) so you know how reliable the result is.
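
One way to act on the confidence score is to route low-confidence results to manual review. The sketch below assumes parsed response dictionaries with the `status` and `confidence` fields shown in the sample response; the threshold of 80 is an arbitrary illustration, not a recommendation from the API.

```python
def triage(results, min_confidence=80.0):
    """Split parsed responses into (accepted, needs_review) by confidence."""
    accepted, needs_review = [], []
    for result in results:
        ok = result.get("status") == "ok"
        # A missing confidence is treated as 0 so it always goes to review.
        confident = result.get("confidence", 0.0) >= min_confidence
        (accepted if ok and confident else needs_review).append(result)
    return accepted, needs_review
```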

📍

Bounding Boxes

Optional word-level bounding box coordinates for layout analysis and document understanding.

🔗

URL or Base64

Pass an image URL or base64-encoded image. Supports JPEG, PNG, GIF, and WebP.
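
For the base64 option, a request body might be assembled as in the sketch below. The field name `image` is a guess made for illustration: the page only documents the `url` field, so check the API reference for the actual name.

```python
import base64


def build_base64_payload(image_bytes, language="eng"):
    """Return a JSON-serializable body carrying the image as base64 text."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    # "image" is an assumed field name; the page only shows the "url" field.
    return {"image": encoded, "language": language}
```

Typical usage would read the file in binary mode first, for example `build_base64_payload(open("receipt.jpg", "rb").read())`.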

📄

Document Types

Receipts, invoices, IDs, business cards, scanned PDFs, screenshots, handwritten notes.

⚡

Fast Processing

Average response under 2 seconds for standard documents. Larger images may take longer.

Languages

20 languages supported

🇬🇧 English (eng)

🇨🇳 Chinese Simplified

🇹🇼 Chinese Traditional

🇯🇵 Japanese (jpn)

🇰🇷 Korean (kor)

🇸🇦 Arabic (ara)

🇮🇳 Hindi (hin)

🇷🇺 Russian (rus)

🇩🇪 German (deu)

🇫🇷 French (fra)

🇪🇸 Spanish (spa)

🇵🇹 Portuguese (por)

🇮🇹 Italian (ita)

🇳🇱 Dutch (nld)

🇵🇱 Polish (pol)

🇸🇪 Swedish (swe)

🇳🇴 Norwegian (nor)

🇩🇰 Danish (dan)

🇫🇮 Finnish (fin)

🇹🇷 Turkish (tur)
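
A client can check the `language` value before sending a request. This sketch collects the three-letter codes listed above into a set; the helper name `normalize_language` is hypothetical, and the set would need extending with whatever codes the API uses for the Chinese variants.

```python
# Three-letter codes exactly as listed on this page.
SUPPORTED_CODES = frozenset({
    "eng", "jpn", "kor", "ara", "hin", "rus", "deu", "fra", "spa",
    "por", "ita", "nld", "pol", "swe", "nor", "dan", "fin", "tur",
})


def normalize_language(code):
    """Lowercase and validate a language code, raising on unknown values."""
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_CODES:
        raise ValueError(f"unsupported language code: {code!r}")
    return normalized
```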

Pricing

Simple, transparent pricing

Free

$0/mo

1,000 requests / month
All API endpoints
1 API key
Priority support

Get Free Key

Basic

$9/mo

10,000 requests / month
All API endpoints
1 API key
Email support

Subscribe $9/mo

Common questions

What image formats are supported?

JPEG, PNG, GIF, and WebP. You can pass either a public image URL or a base64-encoded image. Maximum file size is 10MB.
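
Checking the format and size on the client avoids spending a request on a file that will be rejected. The sketch below identifies the four supported formats by their standard file signatures. It assumes "10MB" means 10 × 1024 × 1024 bytes, which the page does not specify.

```python
MAX_BYTES = 10 * 1024 * 1024  # assumption: "10MB" read as 10 MiB


def detect_format(data):
    """Return 'jpeg', 'png', 'gif', or 'webp' from magic bytes, else None."""
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    # WebP is a RIFF container with "WEBP" at byte offset 8.
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None


def check_image(data):
    """Validate format and size; return the detected format name."""
    fmt = detect_format(data)
    if fmt is None:
        raise ValueError("unsupported image format")
    if len(data) > MAX_BYTES:
        raise ValueError(f"image is {len(data)} bytes; limit is {MAX_BYTES}")
    return fmt
```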

How accurate is the OCR?

Accuracy depends on image quality. Clear, high-resolution images typically reach 90-98% accuracy; blurry, low-contrast, or handwritten text will score lower. The confidence score returned with each extraction indicates how reliable that result is.

What are bounding boxes used for?

Word-level bounding boxes give you the x, y coordinates and dimensions of each detected word. They are useful for document layout analysis, redaction, or highlighting specific words in the original image.
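
As an illustration of the redaction use case, the sketch below turns matching words into rectangles that could be painted over in the original image. The per-word keys `text`, `x`, `y`, `width`, and `height` are assumptions, since the sample response elides the contents of `words`; adjust them to the actual schema.

```python
import re


def boxes_to_redact(words, pattern):
    """Return (left, top, right, bottom) rectangles for words matching pattern."""
    regex = re.compile(pattern)
    rects = []
    for word in words:
        # Assumed word schema: text, x, y, width, height.
        if regex.search(word["text"]):
            left, top = word["x"], word["y"]
            rects.append((left, top, left + word["width"], top + word["height"]))
    return rects
```

For example, calling `boxes_to_redact(words, r"^\$\d")` would select words that look like dollar amounts.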

Can it read handwritten text?

Tesseract can read some handwritten text, but accuracy is lower than for printed text. Clear, consistent handwriting works better than cursive or irregular styles.
