Name: baidu/Qianfan-OCR
Author: baidu

Question 1

What is baidu/Qianfan-OCR?

Accepted Answer

baidu/Qianfan-OCR is a 4B-parameter end-to-end multimodal model for document intelligence. It unifies OCR, layout analysis, and document understanding in a single vision-language architecture, converting document images directly to Markdown. Supported tasks include table extraction, chart understanding, key information extraction (KIE), and multilingual OCR across 192 languages. The model pairs a Qianfan-ViT vision encoder with a Qwen3-4B language model and ranks first on OmniDocBench v1.5 (93.12), OlmOCR Bench (79.8), and KIE benchmarks (87.9). It introduces Layout-as-Thought, an optional thinking phase (⟨think⟩ tokens) in which the model recovers the document's layout structure before producing its output. It also delivers high throughput: 1.024 pages per second (PPS) on an A100 GPU with W8A8 quantization. The model is open source under the Apache 2.0 license and can be deployed with transformers or vLLM.
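When the optional Layout-as-Thought phase is enabled, the generated text contains a thinking section ahead of the final Markdown, so downstream code usually needs to separate the two. The sketch below assumes the thinking section is delimited by literal `<think>` and `</think>` strings, as in other Qwen3-based models. That delimiter format, the helper name `split_layout_thought`, and the sample layout lines in the usage example are all hypothetical, not taken from Qianfan-OCR's documentation.

```python
import re
from typing import Optional, Tuple

# ASSUMPTION: the thinking phase is wrapped in literal <think>...</think> strings
# (Qwen3-style). Adjust these constants if the model emits different delimiters.
THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"

# Non-greedy match across newlines so only the first thinking block is captured.
_THINK_RE = re.compile(
    re.escape(THINK_OPEN) + r"(.*?)" + re.escape(THINK_CLOSE), re.DOTALL
)


def split_layout_thought(output: str) -> Tuple[Optional[str], str]:
    """Hypothetical helper: split raw model output into (layout_thought, markdown).

    Returns (None, stripped output) when no thinking block is present,
    i.e. when the optional thinking phase was not used.
    """
    match = _THINK_RE.search(output)
    if match is None:
        return None, output.strip()
    thought = match.group(1).strip()
    # Everything outside the thinking block is treated as the Markdown answer.
    markdown = (output[: match.start()] + output[match.end():]).strip()
    return thought, markdown


if __name__ == "__main__":
    # Illustrative output only; the real layout-thought format may differ.
    raw = (
        "<think>\n[header] (0,0,800,60)\n[table] (0,80,800,400)\n</think>\n"
        "# Invoice\n\n| Item | Qty |\n|---|---|\n| Pen | 2 |"
    )
    thought, md = split_layout_thought(raw)
    print(thought)
    print(md)
```

If the thinking phase is disabled, the helper returns `None` for the thought and passes the Markdown through unchanged, so the same post-processing path works either way.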

Question 2

Who makes baidu/Qianfan-OCR?

Accepted Answer

baidu/Qianfan-OCR is developed by Baidu Inc.

Question 3

What can baidu/Qianfan-OCR do?

Accepted Answer

baidu/Qianfan-OCR specializes in image-to-text tasks: it reads document images and outputs their content as text, typically Markdown.
