https://huggingface.co/papers/2409.01704

Papers
arxiv:2409.01704

General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Published on Sep 3
Submitted by HaoranWei on Sep 4
#3 Paper of the day

Authors: Haoran Wei, Chenglong Liu, Jinyue Chen, Jia Wang, Lingyu Kong, Yanming Xu, Zheng Ge, Liang Zhao, Jianjian Sun, Yuang Peng, Chunrui Han, Xiangyu Zhang

Abstract

Traditional OCR systems (OCR-1.0) are increasingly unable to meet people's usage due to the growing demand for intelligent processing of man-made optical characters. In this paper, we collectively refer to all artificial optical signals (e.g., plain texts, math/molecular formulas, tables, charts, sheet music, and even geometric shapes) as "characters" and propose the General OCR Theory along with an excellent model, namely GOT, to promote the arrival of OCR-2.0. The GOT, with 580M parameters, is a unified, elegant, and end-to-end model, consisting of a high-compression encoder and a long-contexts decoder. As an OCR-2.0 model, GOT can handle all the above "characters" under various OCR tasks. On the input side, the model supports commonly used scene- and document-style images in slice and whole-page styles. On the output side, GOT can generate plain or formatted results (markdown/tikz/smiles/kern) via an easy prompt. Besides, the model enjoys interactive OCR features, i.e., region-level recognition guided by coordinates or colors. Furthermore, we also adapt dynamic resolution and multi-page OCR technologies to GOT for better practicality. In experiments, we provide sufficient results to prove the superiority of our model.
Community

HaoranWei (Paper submitter), 7 days ago:
OCR-2.0 era is coming.

librarian-bot, 7 days ago:
This is an automated message from the Librarian Bot. I found the following papers similar to this paper. The following papers were recommended by the Semantic Scholar API:
* AdaptVision: Dynamic Input Scaling in MLLMs for Versatile Scene Understanding (2024)
* Decoder Pre-Training with only Text for Scene Text Recognition (2024)
* Mini-Monkey: Multi-Scale Adaptive Cropping for Multimodal Large Language Models (2024)
* INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model (2024)
* Focus, Distinguish, and Prompt: Unleashing CLIP for Efficient and Flexible Scene Text Retrieval (2024)

maybekatz, about 8 hours ago:
[image: IMG_3672.jpeg]

maybekatz (reply), about 8 hours ago:
Shi Yao Nei Rong

Models citing this paper: 1
* abhinand/GOT-OCR-2.0 (updated about 7 hours ago)

Datasets citing this paper: 0
Spaces citing this paper: 0

Collections including this paper: 14
* Interesting (Collection, 2 items, updated 6 days ago)
* Papers to Read (Collection, 15 items, updated about 7 hours ago)
* LLMs (Collection, 2 items, updated about 10 hours ago)
* Medizin Model (Collection, 2 items, updated about 8 hours ago)