https://successfulsoftware.net/2022/04/30/why-isnt-there-a-decent-file-format-for-tabular-data/

Successful Software ...requires more than just good programming.

Why isn't there a decent file format for tabular data?

Tabular data is everywhere. I support reading and writing tabular data in various formats in all 3 of my software applications. It is an important part of my data transformation software. But all the tabular data formats suck. There doesn't seem to be anything that is reasonably space efficient, simple and quick to parse, and text based (not binary) so you can view and edit it with a standard editor.

Most tabular data currently gets exchanged as CSV, tab separated, XML, JSON or Excel. And they are all highly sub-optimal for the job.

CSV is a mess. One quote in the wrong place and the file is invalid. It is difficult to parse efficiently using multiple cores, due to the quoting (you can't start parsing from part way through a file, because you don't know whether you are inside a quoted value). Different quoting schemes are in use. You don't know what encoding it is in. Use of separators and line endings is inconsistent (the separator is sometimes a comma, sometimes a semicolon). Writing a parser to handle all the different dialects is not at all trivial. Microsoft Excel and Apple Numbers don't even agree on how to interpret some edge cases for CSV.

Tab separated is a bit better than CSV. But it can't store tabs and still has issues with line endings, encodings etc.

XML and JSON are tree structures and not suitable for efficiently storing tabular data (plus other issues).

There is Parquet. It is very efficient with its columnar storage and compression. But it is binary, so can't be viewed or edited with standard tools, which is a pain.

Don't even get me started on Excel's proprietary, ghastly binary format.
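To make the CSV quoting problem concrete, here is a minimal Python sketch. The sample data is made up for illustration. It compares naive splitting on newlines and commas with the standard library's quote-aware `csv` reader, which has to track quoting state from the start of the input.

```python
import csv
import io

# Hypothetical sample: a header row plus one record whose second value
# contains both a comma-bearing name and an embedded newline.
data = 'name,note\n"Smith, J","line one\nline two"\n'

# Naive approach: split on newlines, then on commas. It knows nothing
# about quoting, so the quoted comma and newline both split the record.
naive_rows = [line.split(",") for line in data.splitlines()]

# Quote-aware approach: the csv module follows the quoting state from the
# beginning of the input, which is why you can't start part way through.
parsed_rows = list(csv.reader(io.StringIO(data, newline="")))

print(len(naive_rows))   # number of "rows" the naive split finds
print(len(parsed_rows))  # number of rows the quote-aware parser finds
print(parsed_rows)
```

The naive version reports three rows and breaks "Smith, J" in two. The quote-aware parser recovers the two rows that were intended, but only because it read the file from the beginning.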
Why can't we have a format where:

* Encoding is always UTF-8
* Values are stored in row major order (row 1, row 2 etc)
* Columns are separated by \u001F (ASCII unit separator)
* Rows are separated by \u001E (ASCII record separator)
* Er, that's the entire specification.

No escaping. If you want to put \u001F or \u001E in your data, tough, you can't. Use a different format.

It would be reasonably compact, efficient to parse and easy to manually edit (Notepad++ shows the unit separator as a 'US' symbol). You could write a fast parser for it in minutes. Typing \u001F or \u001E in some editors might be a faff, but it is hardly a showstopper.

It could be called something like "unicode separated value" (hat tip to @fakeunicode on Twitter for the name) or "unit separated value" with file extension .usv. Maybe a different extension could be used when values are stored in column major order (column 1, column 2 etc).

Is there nothing like this already? Maybe there is and I just haven't heard of it. If not, shouldn't there be?

And yes I am aware of the relevant XKCD cartoon ( https://xkcd.com/927/ ).

This entry was posted in article, data transformation, Easy Data Transform, software and tagged csv, data interchange, excel, json, software, tabular data, xml on 30 April 2022 by Andy Brice.

2 thoughts on "Why isn't there a decent file format for tabular data?"

1. Kurt Ruff, 2 May 2022 at 5:11 pm
https://ronaldduncan.wordpress.com/2009/10/31/text-file-formats-ascii-delimited-text-not-csv-or-tab-delimited-text/ refers to this as "ASCII Delimited Text" (ADT).

   1. Andy Brice (post author), 3 May 2022 at 10:35 pm
   Interesting. I hadn't seen that before. Ronald and I are very much in agreement (but he was there 13 years earlier!).
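For anyone who wants to experiment with the format proposed in the post, here is a minimal Python sketch of a reader and writer. The function names `usv_encode` and `usv_decode` are made up for illustration. The sketch also makes two assumptions the post doesn't spell out: the record separator sits strictly between rows (no trailing separator), and an empty input means zero rows.

```python
US = "\u001f"  # ASCII unit separator: goes between columns
RS = "\u001e"  # ASCII record separator: goes between rows


def usv_encode(rows):
    """Turn a list of rows (lists of strings) into UTF-8 bytes."""
    for row in rows:
        for value in row:
            # No escaping in the format, so separators in the data are rejected.
            if US in value or RS in value:
                raise ValueError("value contains a separator character")
    text = RS.join(US.join(row) for row in rows)
    return text.encode("utf-8")


def usv_decode(data):
    """Turn UTF-8 bytes back into a list of rows (lists of strings)."""
    text = data.decode("utf-8")
    if text == "":
        return []  # assumption: empty input means no rows
    return [record.split(US) for record in text.split(RS)]


# Commas, quotes and newlines need no special treatment.
rows = [["name", "note"], ["Smith, J", 'said "hi"\non two lines']]
encoded = usv_encode(rows)
decoded = usv_decode(encoded)
print(decoded == rows)
```

Because there is no quoting, every \u001E in the file is a genuine row boundary, so a parser could start from any record separator part way through the file. That is the property the post says CSV lacks.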