https://huggingface.co/papers/2409.01704

Papers
arxiv:2409.01704

General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Published on Sep 3
Submitted by HaoranWei on Sep 4
#3 Paper of the day
 

Authors:
Haoran Wei, Chenglong Liu, Jinyue Chen, Jia Wang, Lingyu Kong,
Yanming Xu, Zheng Ge, Liang Zhao, Jianjian Sun, Yuang Peng,
Chunrui Han, Xiangyu Zhang

Abstract

Traditional OCR systems (OCR-1.0) are increasingly unable to meet
users' needs as demand grows for intelligent processing of man-made
optical characters. In this paper, we collectively refer to all
artificial optical signals (e.g., plain text, math/molecular
formulas, tables, charts, sheet music, and even geometric shapes) as
"characters" and propose the General OCR Theory along with an
excellent model, namely GOT, to promote the arrival of OCR-2.0. GOT,
with 580M parameters, is a unified, elegant, end-to-end model
consisting of a high-compression encoder and a long-context decoder.
As an OCR-2.0 model, GOT can handle all of the above "characters"
across various OCR tasks. On the input side, the model supports
commonly used scene- and document-style images, either as slices or
as whole pages. On the output side, GOT can generate plain or
formatted results (markdown/tikz/smiles/kern) via a simple prompt.
The model also offers interactive OCR features, i.e., region-level
recognition guided by coordinates or colors. Furthermore, we adapt
dynamic-resolution and multi-page OCR techniques to GOT to make it
more practical. In experiments, we provide sufficient results to
demonstrate the superiority of our model.


Community

HaoranWei
Paper submitter 7 days ago

OCR-2.0 era is coming.


librarian-bot
7 days ago

This is an automated message from the Librarian Bot. I found the
following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

  * AdaptVision: Dynamic Input Scaling in MLLMs for Versatile Scene
    Understanding (2024)
  * Decoder Pre-Training with only Text for Scene Text Recognition
    (2024)
  * Mini-Monkey: Multi-Scale Adaptive Cropping for Multimodal Large
    Language Models (2024)
  * INF-LLaVA: Dual-perspective Perception for High-Resolution
    Multimodal Large Language Model (2024)
  * Focus, Distinguish, and Prompt: Unleashing CLIP for Efficient and
    Flexible Scene Text Retrieval (2024)

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out
this Space

You can directly ask Librarian Bot for paper recommendations by
tagging it in a comment: @librarian-bot recommend


maybekatz
about 8 hours ago

IMG_3672.jpeg

maybekatz
about 8 hours ago

What is this content?


Models citing this paper 1

 

abhinand/GOT-OCR-2.0

Updated about 7 hours ago * 3

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2409.01704 in a dataset README.md to link it from
this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2409.01704 in a Space README.md to link it from
this page.

Collections including this paper 14

Interesting

Collection
2 items * Updated 6 days ago

Papers to Read

Collection
15 items * Updated about 7 hours ago

LLMs

Collection
2 items * Updated about 10 hours ago

Medizin Model

Collection
2 items * Updated about 8 hours ago