https://github.com/artperrin/image2csv

artperrin / image2csv

Convert tables stored as images to a usable .csv file. MIT License.

Latest commit: artperrin, f838eae, Mar 9, 2021 ("Add a licence"). 9 commits.

README.md

Convert an image of numbers to a .csv file

This Python program converts images of arrays of numbers to the corresponding .csv files. It uses OpenCV for Python to process the given image and Tesseract for number recognition.

[Figure: Output Example]

The repository includes:
* the source code of image2csv.py,
* the tools.py file, where useful functions are implemented,
* the grid_detector.py file, which performs automatic grid detection,
* a folder with some files used for testing.

The code is neither well documented nor fully efficient, as I'm a beginner in programming, and this project is a way for me to improve my skills, in particular in Python programming.

How to use the program

First of all, the user must install the needed packages:

$ pip install -r requirements.txt

as well as Tesseract. Then, in a terminal, run:

$ python image2csv.py --image path/to/image

There are a few optional arguments:
* --path path/to/output/csv/file
* --grid [False]/True
* --visualization [y]/n
* --method [fast]/denoize

Their usage can be displayed with:

$ python image2csv.py --help

By default, the program will try to detect a grid automatically.
This detection uses OpenCV's Hough transform and Canny edge detection, so the user can tweak a few parameters in the grid_detector.py file for better processing.

When the program is running with manual grid detection, the user has to interact with it via the mouse and the terminal:

1. The image is opened in a window for the user to draw a rectangle around the first (top left) number. As this rectangle is used as a base to create a grid afterward, keep in mind that all the numbers should fit into the box.
2. A new window is opened showing the image with the drawn rectangle. Press any key to close and continue.
3. Based on the drawn rectangle, a grid is created to extract each number one by one. This grid is controlled by the user via two "offset" values. The user has to enter those values in the terminal, then the image is opened in a window with the created grid. Press any key to close and continue. If the numbers do not fit into the grid, the user can change the offset values and repeat this step. When the grid matches the user's expectations, they can set both offset values to 0 to continue.
4. The numbers are extracted from the image and the results are shown in the terminal. Be careful, though: the indicated number of errors only counts the errors reported by Tesseract. Tesseract can also misidentify a number, and such a misreading is not counted as an error!
5. The .csv file is created with the numbers identified by Tesseract. If Tesseract reports an error for a cell, that cell shows up in the .csv file as an infinite value.
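
Steps 3 to 5 can be illustrated with a minimal sketch. It is not the code from image2csv.py: the function names (build_grid, parse_cell, write_csv), the (x, y, width, height) box format, the explicit row and column counts, and the reading of the two offsets as pixel gaps between neighbouring boxes are all assumptions made for illustration. It only shows the general idea of repeating the first rectangle and writing an infinite value where recognition failed.

```python
import csv
import io
import math


def build_grid(first_box, n_rows, n_cols, offset_x=0, offset_y=0):
    """Repeat the first rectangle to cover n_rows x n_cols cells.

    first_box is (x, y, width, height) in pixels. offset_x and offset_y
    are assumed here to be the gaps between neighbouring boxes; the real
    program may interpret its two offsets differently.
    """
    x0, y0, w, h = first_box
    return [
        [(x0 + c * (w + offset_x), y0 + r * (h + offset_y), w, h)
         for c in range(n_cols)]
        for r in range(n_rows)
    ]


def parse_cell(text):
    """Turn recognised text into an integer, or math.inf if unreadable."""
    try:
        return int(text.strip())
    except ValueError:
        return math.inf


def write_csv(rows, stream):
    """Write the rows of numbers to a file-like object as CSV."""
    csv.writer(stream).writerows(rows)


if __name__ == "__main__":
    # Box positions for a hypothetical 2 x 3 table.
    for row in build_grid((10, 20, 30, 15), 2, 3, offset_x=5, offset_y=2):
        print(row)

    # Pretend these strings came back from the recognition step.
    recognised = [["12", "-7", ""], ["3", "x", "40"]]
    numbers = [[parse_cell(text) for text in row] for row in recognised]

    buffer = io.StringIO()
    write_csv(numbers, buffer)
    print(buffer.getvalue())
```

In this sketch a failed cell is stored as math.inf, which Python's csv module writes as the text "inf". A misread digit, by contrast, parses without error and cannot be detected this way, which is the caveat raised in step 4.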

Hypotheses and limits

For the program to run correctly, the input image must satisfy a few simple hypotheses:
* for manual selection, the row and column sizes must be constant, as the built grid is just a repetition of the initial rectangle with offsets;
* to use automatic grid detection, a full and clear grid, with external borders, must be visible;
* a good input image resolution is recommended, to control the offsets more easily.

Finally, this program is not perfect (I know you thought so, with its smooth workflow and simple hypotheses, sorry to disappoint...) and does not work with decimal numbers... But it does a great job on negatives! Also, the user must be careful with the slashed zero, which Tesseract seems to identify as a six.

Credits

For image pre-processing in the tools.py file I used a useful function implemented by @Nitish9711 for his Automatic-Number-plate-detection (https://github.com/Nitish9711/Automatic-Number-plate-detection.git).