https://github.com/tesseract-ocr/tesseract

tesseract-ocr / tesseract: Tesseract Open Source OCR Engine (main repository). tesseract-ocr.github.io/ Apache-2.0 License.
# Tesseract OCR

## About

This package contains an OCR engine, `libtesseract`, and a command line program, `tesseract`. Tesseract 4 adds a new neural net (LSTM) based OCR engine which is focused on line recognition, but also still supports the legacy Tesseract OCR engine of Tesseract 3, which works by recognizing character patterns. Compatibility with Tesseract 3 is enabled by using the Legacy OCR Engine mode (`--oem 0`).
It also needs traineddata files which support the legacy engine, for example those from the tessdata repository.

The lead developer is Ray Smith. The maintainer is Zdenko Podobny. For a list of contributors see AUTHORS and GitHub's log of contributors.

Tesseract has Unicode (UTF-8) support, and can recognize more than 100 languages "out of the box".

Tesseract supports various output formats: plain text, hOCR (HTML), PDF, invisible-text-only PDF, TSV. The master branch also has experimental support for ALTO (XML) output.

In many cases, you will need to improve the quality of the image you are giving Tesseract in order to get better OCR results.

This project does not include a GUI application. If you need one, please see the 3rdParty documentation.

Tesseract can be trained to recognize other languages. See Tesseract Training for more information.

## Brief history

Tesseract was originally developed at Hewlett-Packard Laboratories Bristol and at Hewlett-Packard Co, Greeley Colorado between 1985 and 1994, with some more changes made in 1996 to port to Windows, and some C++izing in 1998. In 2005 Tesseract was open sourced by HP. From 2006 until November 2018 it was developed by Google.

The latest (LSTM based) stable version is 4.1.1, released on December 26, 2019. The latest source code is available from the master branch on GitHub. Open issues can be found in the issue tracker and the planning documentation.

The latest 3.0x version is 3.05.02, released on June 19, 2018. The latest source code for 3.05 is available from the 3.05 branch on GitHub. There is no development for this version, but it can be used for special cases (e.g. see Regression of features from 3.0x).

See Release Notes and Change Log for more details of the releases.

## Installing Tesseract

You can either install Tesseract via a pre-built binary package or build it from source.

C++17 support is required for building.
## Running Tesseract

Basic command line usage:

`tesseract imagename outputbase [-l lang] [--oem ocrenginemode] [--psm pagesegmode] [configfiles...]`

For more information about the various command line options use `tesseract --help` or `man tesseract`.

Examples can be found in the documentation.

## For developers

Developers can use the libtesseract C or C++ API to build their own application. If you need bindings to libtesseract for other programming languages, please see the wrapper section in the AddOns documentation.

Documentation of Tesseract generated from source code by doxygen can be found on tesseract-ocr.github.io.

## Support

Before you submit an issue, please review the guidelines for this repository.

For support, first read the documentation, particularly the FAQ, to see if your problem is addressed there. If not, search the Tesseract user forum, the Tesseract developer forum and past issues, and if you still can't find what you need, ask for support in the mailing-lists.

Mailing-lists:

* tesseract-ocr - For tesseract users.
* tesseract-dev - For tesseract developers.

Please report an issue only for a bug, not for asking questions.

## License

The code in this repository is licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

NOTE: This software depends on other packages that may be licensed under different open source licenses.

Tesseract uses the Leptonica library, which essentially uses a BSD 2-clause license.

## Dependencies

Tesseract uses the Leptonica library for opening input images (e.g. not documents like PDF).
It is suggested to use Leptonica with built-in support for zlib, png and tiff (for multipage tiff).

## Latest Version of README

For the latest online version of the README.md see:

https://github.com/tesseract-ocr/tesseract/blob/master/README.md
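
As an illustration of the basic command line usage given under Running Tesseract, the following minimal Python sketch assembles an argument list in the same order as the usage line (`imagename outputbase [-l lang] [--oem ocrenginemode] [--psm pagesegmode] [configfiles...]`). The helper name `build_tesseract_command`, the file names `page.png` and `page`, and the language code `eng` are assumptions made for the example; they are not part of Tesseract. The sketch only builds the command and does not check whether `tesseract` is installed.

```python
import shlex
from typing import List, Optional, Sequence


def build_tesseract_command(
    imagename: str,
    outputbase: str,
    lang: Optional[str] = None,
    oem: Optional[int] = None,
    psm: Optional[int] = None,
    configfiles: Sequence[str] = (),
) -> List[str]:
    """Assemble an argv list that follows the README usage line.

    Hypothetical helper: only the flag names (-l, --oem, --psm) and the
    argument order come from the README; everything else is illustrative.
    """
    cmd = ["tesseract", imagename, outputbase]
    if lang is not None:
        cmd += ["-l", lang]
    if oem is not None:
        cmd += ["--oem", str(oem)]
    if psm is not None:
        cmd += ["--psm", str(psm)]
    # Config files, if any, come last.
    cmd += list(configfiles)
    return cmd


if __name__ == "__main__":
    # Legacy OCR Engine mode (--oem 0), as mentioned in the About section.
    cmd = build_tesseract_command("page.png", "page", lang="eng", oem=0)
    print(shlex.join(cmd))
    # To actually run it (requires tesseract on PATH and suitable
    # traineddata files): subprocess.run(cmd, check=True)
```

Returning a list rather than a single string means the result can be passed to `subprocess.run` without shell quoting concerns; `shlex.join` is used only for display.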