https://github.com/microsoft/markitdown
microsoft/markitdown (Public)
Python tool for converting files and office documents to Markdown.
License: MIT license. 963 stars, 25 forks. Branch: main, 20 commits.

Folders and files:

* .github/workflows
* src/markitdown
* tests
* .gitignore
* .pre-commit-config.yaml
* CODE_OF_CONDUCT.md
* LICENSE
* README.md
* SECURITY.md
* SUPPORT.md
* pyproject.toml

MarkItDown

The MarkItDown library is a utility tool for converting various files to Markdown (e.g., for indexing, text analysis, etc.). It presently supports:

* PDF (.pdf)
* PowerPoint (.pptx)
* Word (.docx)
* Excel (.xlsx)
* Images (EXIF metadata, and OCR)
* Audio (EXIF metadata, and speech transcription)
* HTML (special handling of Wikipedia, etc.)
* Various other text-based formats (csv, json, xml, etc.)

The API is simple:

    from markitdown import MarkItDown

    markitdown = MarkItDown()
    result = markitdown.convert("test.xlsx")
    print(result.text_content)

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.

About

Topics: openai, autogen, langchain, autogen-extension
Watchers: 5
Contributors: 2
Releases: none published
Packages: none published
Languages: HTML 94.9%, Python 5.1%
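
As a follow-up to the README's three-line API example, here is a minimal sketch of how that call might be applied to several files at once. Only `MarkItDown()`, `.convert(path)`, and `.text_content` come from the README. The names `README_SUFFIXES`, `convert_many`, and `markitdown_convert` are hypothetical helpers introduced for illustration, and the suffix filter covers only the four extensions the README spells out, not every supported format.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable

# Extensions the README lists explicitly; an illustrative subset, not the full set.
README_SUFFIXES = {".pdf", ".pptx", ".docx", ".xlsx"}


def convert_many(paths: Iterable[str], convert: Callable[[str], str]) -> Dict[str, str]:
    """Hypothetical helper: run `convert` on each path with a listed suffix, skip the rest."""
    results: Dict[str, str] = {}
    for path in paths:
        if Path(path).suffix.lower() in README_SUFFIXES:
            results[path] = convert(path)
    return results


def markitdown_convert(path: str) -> str:
    """Wrap the README example; assumes the markitdown package is installed."""
    # Imported lazily so convert_many itself has no hard dependency on markitdown.
    from markitdown import MarkItDown

    return MarkItDown().convert(path).text_content
```

Passing the converter in as a parameter keeps the file filtering separate from the library call. With the package installed and the files present, `convert_many(["test.xlsx", "notes.txt"], markitdown_convert)` would convert only `test.xlsx` and skip `notes.txt`.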