by baidu
Baidu/Qianfan-OCR is a 4B-parameter end-to-end multimodal model for document intelligence, unifying OCR, layout analysis, and document understanding in a single vision-language architecture. It directly converts images to Markdown and supports tasks like table extraction, chart understanding, key information extraction (KIE), and multilingual OCR (192 languages). Powered by Qianfan-ViT (vision encoder) + Qwen3-4B (language model), it achieves #1 rankings on OmniDocBench v1.5 (93.12), OlmOCR Bench (79.8), and KIE benchmarks (87.9). The model introduces Layout-as-Thought, an optional thinking phase (⟨think⟩ tokens) for structured layout recovery, and delivers high throughput (1.024 PPS on A100 with W8A8 quantization). It is open-source (Apache 2.0 License) and deployable via transformers or vLLM.
Complete information about the vendor/provider of this AI application
1 considerations identified
Review recommended before use
These considerations are automatically identified based on publicly available information about the vendor and AI catalog data. Actual risks may vary based on your specific use case and implementation.
Legal, privacy, and compliance documentation
Get insights into risk by running assessments on this AI application.
Discover EU-based alternatives for this AI application.
Track, assess, and govern your AI applications with Anove.