[HN Gopher] Original WWW proposal is a Word for Macintosh 4 file...
___________________________________________________________________
Original WWW proposal is a Word for Macintosh 4 file from 1990, can
we open it?
Author : jgrahamc
Score : 463 points
Date : 2024-02-13 14:06 UTC (1 days ago)
(HTM) web link (blog.jgc.org)
(TXT) w3m dump (blog.jgc.org)
| zdw wrote:
| If you wanted exactly what would have been printed, on the
| emulator running Word for Mac 4.0 you should be able to install a
| print queue that can generate a .ps (Postscript) file, which
| could then be converted to PDF.
|
| Or Acrobat may be available for that old of an OS and would have
| a virtual print driver to go directly to PDF.
| detourdog wrote:
| I know I have running Macs with Word 5.1a which I consider the
| last Word version needed. I'm sure I opened Word 4.0 files.
| kps wrote:
| Yes, a few years ago I helped a friend recover a bunch of old
| documents. The solution was to use Mac Word 5 to open the
| Word 4 files and save them as something newer versions could
| read.
| jgrahamc wrote:
| Ah. Great suggestion! I just used Print2PDF to make a PDF from
| Word. Will update the blog.
| chrisfinazzo wrote:
| https://web.mit.edu/ghostscript/www/Ps2pdf.htm
|
| _Or, if you prefer to do more tweaking yourself, dive into the
| Ghostscript deep end :)_
|
| https://www.ghostscript.com
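
The print-to-PostScript route described above can be sketched as a small shell helper. This is only a sketch: it assumes Ghostscript's `ps2pdf` wrapper is installed on the host, and the function name `ps_to_pdf` and the file name `proposal.ps` are illustrative, not taken from the thread.

```shell
# ps_to_pdf INPUT.ps [OUTPUT.pdf]
# Convert a PostScript file (for example, one captured from the emulated
# Mac's print queue) into a PDF using Ghostscript's ps2pdf wrapper.
ps_to_pdf() {
    in=$1
    # Default output name: the input path with a trailing .ps replaced by .pdf.
    out=${2:-"${in%.ps}.pdf"}
    ps2pdf "$in" "$out"
}

# Hypothetical usage:
#   ps_to_pdf proposal.ps            # writes proposal.pdf
#   ps_to_pdf proposal.ps www.pdf    # writes www.pdf
```

Fonts are still resolved by Ghostscript at conversion time, so the result may differ from the original printout if the fonts named in the PostScript file are not available.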
| msephton wrote:
| LibreOffice opens it right up. Its support for old document file
| formats is really excellent. I keep it around for just this
| purpose. https://imgur.com/a/JENgq6V
|
| But I also love using BasiliskII and InfiniteMac emulators!
| Karellen wrote:
| > LibreOffice opens it right up. Its support for old document
| formats is really excellent.
|
| Yes, the OP also mentions that LibreOffice opens it.
|
| ...but they also point out with LibreOffice that "Although
| there's something weird about the margins and there are other
| formatting problems." - which is also apparent in your
| screenshot? Certainly that level of support for such an old
| proprietary format is pretty good, but I'm not sure I'd class
| it as "really excellent" with those issues.
| jgrahamc wrote:
| Yes, LibreOffice opened it right up with the wrong font
| sizes, headers and footers messed up, incorrect gutter and
| margins, and a bunch of other problems. But they were all
| fixable.
| msephton wrote:
| I should have been clearer: what I meant was that its support
| for very many different old document formats is excellent.
| Atari ST, Amiga, Macintosh, and so on. The OP and you are
| quite right that it won't open the documents with exactly the
| right formatting, but it's good enough in a pinch so you
| don't have to learn how to use 40 year old computers. It's a
| good tool to have.
|
| 7zip has similar support for a wide range of compressed file
| formats, exes, data files, cabinets, and so on. Another good
| tool to save time and keep you on your modern operating
| system.
| opello wrote:
| > 7zip has similar support for a wide range of compressed
| file formats, exes, data files, cabinets, and so on.
|
| 7zfm.exe (7-Zip File Manager) anyway, which I agree is very
| useful. I've wanted it in Linux multiple times to avoid
| creating loopback devices but seem to always find it's
| Windows only.
| msephton wrote:
| I was referring to 7z on the command line.
| opello wrote:
| Ah nice, I didn't realize it worked with the wider types
| of archives. I'm pretty sure I dug into the source in the
| past when trying to get it to handle an ISO in Linux and
| found that it was only supported on Windows. But that
| might have just been the GUI and not the command line
| tool.
|
| Thanks!
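
As a rough illustration of the command-line use mentioned above, the helpers below list and unpack a disk image without creating a loopback device. They assume a `7z` binary (7-Zip or p7zip) is on the PATH and that the build includes the ISO handler; the function names and `image.iso` are placeholders.

```shell
# iso_list IMAGE
# List the contents of an ISO image without mounting it.
iso_list() {
    7z l "$1"
}

# iso_extract IMAGE DIR
# Unpack the image into DIR. 7z expects the directory attached to -o
# with no space in between.
iso_extract() {
    7z x "$1" "-o$2"
}

# Hypothetical usage:
#   iso_list image.iso
#   iso_extract image.iso ./unpacked
```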
| sigspec wrote:
| Yeah, we read the article, which matches your screenshot.
| msephton wrote:
| This is for all the TL;DR folks.
| jgrahamc wrote:
| I think your summary is a bit short. Sure, LibreOffice
| opens the file but there are multiple problems with the
| formatting that need correcting. Your screenshot shows at
| least one of them (there shouldn't be any headers on the
| first page and the page layout should be different).
| chris_wot wrote:
| The question is: is there a bug report?
| msephton wrote:
| The question was "can we open it?"
| graemep wrote:
| LibreOffice was the first thing I tried, and it worked with no
| problem.
| jgrahamc wrote:
| Well, except for all the problems I outlined in the post.
| soperj wrote:
| headline says "open" and libreoffice opened it with no
| problem.
| TaylorAlexander wrote:
| I simply opened the file with my hex editor. Problem
| solved. (sarcasm)
| jgrahamc wrote:
| I actually opened it in emacs in hexl-mode before I ran
| the file command!
| skissane wrote:
| In the past, I have in all seriousness read Microsoft
| Word documents on Linux using less. I might have had
| LibreOffice installed, but it can't run over SSH.
|
| It works okay with most old school (pre-XML) ones, since
| the document text is in the file in plain ASCII amidst
| all the binary formatting stuff. For the new XML formats,
| less by itself doesn't do anything useful, but unzip them
| and you can read the XML containing the document text.
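
The approach described above can be approximated with standard tools. This is a sketch, not a real parser: `doc_text` simply keeps runs of printable ASCII from a binary file, and `docx_text` assumes the usual `word/document.xml` member inside a .docx archive and an installed `unzip`. Both function names are made up for illustration.

```shell
# doc_text FILE
# Print runs of four or more printable ASCII characters found in a binary
# (pre-XML) Word file. Every other byte is treated as a separator.
doc_text() {
    LC_ALL=C tr -c '[:print:]\n' '\n' < "$1" | LC_ALL=C grep -E '.{4,}'
}

# docx_text FILE
# Print the main document XML of a .docx file with tags replaced by spaces.
docx_text() {
    unzip -p "$1" word/document.xml | sed 's/<[^>]*>/ /g'
}

# Hypothetical usage over SSH, paging the result:
#   doc_text design.doc | less
#   docx_text design.docx | less
```

As noted in the replies, a file written with fast save may contain superseded or deleted text, so output like this is only good for a quick look.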
| pests wrote:
| Word supported a mode in which, to speed up saving, changes were
| appended to the file in a diff-like format. How could you know
| you were reading the right content if it could be overwritten
| later on?
| skissane wrote:
| Sometimes "reading the right content" isn't that
| important - e.g. "what is this random doc document
| about?" "oh, it is a design doc for the XYZ subsystem".
| Unless the changes completely rewrote the document into a
| completely different document, which I expect would be
| rare
|
| If I was going to use the document in anger, I would open
| it with something proper, of course
| vidarh wrote:
| I once negotiated a higher offer for a job because the
| company sent out an offer letter they'd done this with,
| where the deleted details for another offer gave me info
| about another role that made me (correctly) guess there
| was room to ask for more.
| pests wrote:
| Reminds me of whatever image format or editor that
| handled cropping the same way. Data was still there,
| bounds just redefined.
|
| I remember a celebrity leaking some photos or similar
| back in the early 2000s or similar.
| NikkiA wrote:
| > but it can't run over SSH.
|
| I know it's being pedantic, but it absolutely can,
| libreoffice will happily run over a ssh -X tunnelled X
| display.
| skissane wrote:
| Oh yeah, but that would require me to start an X server.
| Which I could do, but why bother when less does the job?
|
| Also, less starts a lot faster than LibreOffice does
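
For completeness, here is a sketch of the two ways LibreOffice can be driven over SSH. It assumes the remote host has LibreOffice installed, that X11 forwarding is allowed for the first helper, and that file names contain no spaces (ssh joins its arguments into one remote command line). The function names and host name are placeholders.

```shell
# remote_writer HOST FILE
# Open FILE in LibreOffice Writer on HOST, displayed locally through a
# forwarded X11 connection (requires a local X server).
remote_writer() {
    ssh -X "$1" libreoffice --writer "$2"
}

# remote_pdf HOST FILE
# Convert FILE to PDF on HOST without any display, using headless mode.
# The PDF is written to the remote working directory.
remote_pdf() {
    ssh "$1" soffice --headless --convert-to pdf "$2"
}

# Hypothetical usage:
#   remote_writer build-box design.doc
#   remote_pdf build-box design.doc
```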
| lizknope wrote:
| Yeah, I stopped reading the article and downloaded the file; the
| only word processor I have is LibreOffice. It seemed to work fine
| so I didn't know what the issue was. Then I read the article
| and kept scrolling to the end where the author finally uses
| LibreOffice and it opens mostly okay.
| vdaea wrote:
| So does Word 2019 for Windows.
| jgrahamc wrote:
| Is the formatting correct? Are the images visible? Because
| others report (see other comments) that Word opens the file
| but the images are missing. See the Word generated PDF here:
| https://news.ycombinator.com/item?id=39359079
| vdaea wrote:
| Yes, you are right, apologies. I thought it wouldn't open
| at all, like in the screenshot in that blog post.
| ogurechny wrote:
| Well, StarOffice already existed back then. Now I wonder
| whether LibreOffice still has some early '90s third party
| format parsing code inside, or whether some reverse-engineered
| compatibility and conversion code from a much later Word version
| actually does the job.
| jasomill wrote:
| Give QEMU a try -- current versions do a great job emulating a
| Power Mac, able to run the most recent PowerPC versions of both
| classic Mac OS (9.2.2) and Mac OS X (10.5).
| voltagex_ wrote:
| With what command line?
|
| Figuring out what to ask qemu to do (without libvirt!) is
| half the battle.
|
| (Thanks though, I have something to play with tonight)
| jasomill wrote:
| On macOS, I typically run it from an .app bundle containing
| a one-line shell script that execs the following script
| with the "-monitor vc" option (to enable access to the QEMU
| monitor via a menu command in the Cocoa GUI; when actively
| using the monitor, I run the script directly with the
| "-monitor stdio" option instead, as opening the monitor in
| the Cocoa GUI hides the emulated Mac's display):
|     #!/bin/bash
|     export PATH=
|     here="$(/opt/ld/bin/realpath -s "$(/usr/bin/dirname "$0")")"
|     workdir="$here"
|     name="$(/usr/bin/basename "$workdir")"
|     qemu='/opt/qemu/bin/qemu-system-ppc'
|     cd "$workdir" \
|     && exec "$qemu" \
|         -display cocoa \
|         -L pc-bios -boot c -no-reboot \
|         -M mac99,via=pmu -m 768 \
|         -rtc base=localtime \
|         -g 1920x1080x32 \
|         -prom-env 'boot-args=-v' \
|         -prom-env 'auto-boot?=true' \
|         -prom-env 'vga-ndrv?=true' \
|         -nodefaults \
|         -device pci-ohci,id=usb0 \
|         -device usb-kbd,id=keyboard0 \
|         -device usb-mouse,id=mouse0 \
|         -device VGA,edid=on,vgamem_mb=32,id=vga0 \
|         -nic tap,id=nic0,ifname=tap9,script=no,downscript=no,model=sungem,mac=00:50:56:16:65:09 \
|         -drive file="$here/disk/Classic.img",format=raw,media=disk,id=hd0 \
|         -drive file="$here/../../scratch/$name/Scratch.img",format=raw,media=disk,cache=unsafe,id=hd1 \
|         -drive media=cdrom,id=cd0 \
|         "$@"
|
| Paths are (obviously) site-specific, _realpath_ is the GNU
| version -- used here to ensure nice-looking absolute paths
| in light of my heavily symlinked filesystem -- and specific
| details (options supplied in no particular order, $workdir
| vs $here, etc.) are artifacts of hours of fiddling and not
| cleaning up afterwards.
|
| I'm currently running a version of QEMU recently built from
| Git, though I haven't changed this script in years.
|
| For networking, I'm currently using the notarized tap kext
| bundled with Tunnelblick[1].
|
| Finally, I'm currently using an Intel Mac, so YMMV with
| Apple Silicon or Linux, though I have no particular reason
| to believe any command-line changes would be necessary,
| other than the obvious -display change to something other
| than cocoa for Linux.
|
| [1] https://www.tunnelblick.net/downloads.html
| arnaudsm wrote:
| Great cautionary tale about how quickly formats get obsolete,
| especially closed source ones.
|
| I use markdown, plaintext and png for all the documents I need to
| store long term.
|
| Even if these formats disappear, I could trivially reimplement my
| own parser.
| ComputerGuru wrote:
| Isn't markdown plaintext? (I didn't downvote.)
| williamcotton wrote:
| Isn't HTML plaintext?
|
| ;)
| ComputerGuru wrote:
| Yes, but not intended to be directly human readable by
| contrast.
| Narishma wrote:
| If it wasn't intended to be human readable it would have
| been a binary format.
| robinsonb5 wrote:
| It may have been intended to be human readable, but it
| failed dismally in that goal.
|
| Even before the web turned into the javascript infested
| swamp that it is now, the tags having the same visual weight
| as the text they enclose made it tiring to read.
|
| Markdown's genius is in the formatting tags being almost
| no hindrance to readability.
| williamcotton wrote:
| I definitely agree that Markdown is more readable than
| markup, but personally I abhor what some frameworks do to
| HTML. I make sure my HTML is legible! There is even a
| benefit when it comes to hyperlinks in that you can _see_
| the URL!
| elzbardico wrote:
| As a society we should have been thinking more about digital
| preservation since the time we started eschewing archiving hard
| copies in paper.
|
| People who don't know history are doomed to repeat it, but how
| can our future generations learn from our mistakes if all our
| documents are unreadable or lost by their time?
| zokier wrote:
| Are you just casually dismissing all the work that digital
| archivists have done over the past couple of decades?
|
| https://www.loc.gov/librarians/standards
|
| https://www.loc.gov/preservation/digital/
|
| https://www.loc.gov/programs/digital-collections-
| management/...
|
| and that's just Library of Congress, they are hardly alone in
| this field
| kragen wrote:
| implementing a markdown parser is far from trivial
|
| implementing a parser that tricks people into believing it
| parses markdown because it acts like a markdown parser in
| simple cases is what is trivial
|
| it's likely that your markdown data will indeed be recoverable,
| but if you're generating it yourself, html is probably safer
| arnaudsm wrote:
| Parsing markdown is multiple orders of magnitude easier than
| Microsoft Word, especially before docx.
|
| And it has the merit to be human-readable in plaintext!
| kragen wrote:
| that's probably true
| jprete wrote:
| But the Markdown document doesn't actually need a parser to
| still be usable. Markdown as a whole imitates the conventions
| of typed text. The table formats would even be usable on an
| old typewriter.
| kragen wrote:
| markdown doesn't have tables, although you can include html
| <table> tags in it. perhaps you mean
|     indented fixed-width blocks you can use for
|     ascii art or typewriter-style tables?
| kelnos wrote:
| Sure it does. It may not be in the original standard, but
| many/most parsers support tables that use pipe characters
| to separate columns.
|
| And regardless, markdown documents -- including the table
| extension -- are readable without a parser.
| kragen wrote:
| extensions to markdown aren't markdown; that's why
| commonmark is called commonmark
|
| not being able to tell which variant of a language is in
| use is one of the biggest problems for archival, and in
| particular various extensions to the microsoft word
| format (all made by the same company!) were what made
| jgc's archival work so difficult in this case
|
| language extensions are an especially bad problem when
| there's no extension mechanism--because sometimes a pipe
| is just a pipe. but unfortunately markdown's only
| extension mechanism is html
| samatman wrote:
| It's called CommonMark because Gruber insisted. Not
| because extensions to markdown aren't Markdown(r), which
| no one cares about, and not because it isn't markdown in
| the ways that matter.
|
| Ironically, his objection was to the idea of a single and
| rigorous standard; you'll note that GitHub Flavored Markdown
| never drew his wrath. And yet you're treating his and
| Swartz's implementation as if it were such a standard.
| Which it is not.
| zilti wrote:
| Or org-mode format. Then you even get tables properly.
| samatman wrote:
| The (only) issue is that Markdown isn't a format, it's a
| loose family of formats with many extensions. Implementing a
| CommonMark parser is not an especially difficult task in the
| grand scheme of things; it's quite well specified and has an
| extensive test suite.
|
| Although I find myself wondering what this "parsing Markdown"
| business is even about. It's perfectly legible as plain text,
| that was the main design principle behind it. If the goal is
| to have your data accessible in future, if you can read it
| now, and you don't go blind, you'll be able to read it later
| as well.
| inopinatus wrote:
| strictly speaking, markdown is a superset of html
| mnw21cam wrote:
| The problem with markdown is that if you want to convert it to
| a formatted set of pages, the output will differ based on the
| version of your markdown converter. Similarly for HTML and also
| for plaintext to an extent. A PDF _should_ remain exactly the
| same forever, but AFAIK the only properly editable document
| type that really keeps exactly the same formatting over time
| with updated software releases is TeX/LaTeX. In fact, that is
| a guarantee - if a LaTeX version _doesn't_ produce exactly the
| same layout as a previous version for the same input document,
| it's officially a bug.
| zzo38computer wrote:
| For such reasons, I think it is a good idea to use plain ASCII
| text format to document protocols and file formats as much as
| possible. (It is especially a problem if the documentation of a
| more complicated format or protocol requires use of that format
| or protocol itself.)
|
| There is also Just Solve The File Format Problem wiki (which I
| have added stuff to), although it uses HTML, and does not
| include full specifications for all file formats (but it does
| for some of them), and in some cases are links to external
| files, but it is helpful to find information about file formats
| anyways.
| dzdt wrote:
| Somehow the author doesn't recognize that emulation is a
| legitimate answer to this question. Yes he was able to open the
| document, by using the original software on a highly accurate
| emulation of the original system. Everything beyond that point is
| a different question: can we get it inside of a modern word
| processor.
| jgrahamc wrote:
| Sort of. What I wanted was to be able to get a PDF version of
| it. I was hoping that a modern word processor would read the
| file format, and LibreOffice did. But it's also true that using
| emulation I was able to get a PDF (albeit one that has
| different fonts).
| nextaccountic wrote:
| > it's also true that using emulation I was able to get a PDF
| (albeit one that has different fonts).
|
| Maybe you needed to have the right fonts installed in your
| emulated mac? Another comment in this thread pointed out this
| londons_explore wrote:
| Emulation is starting to get gaps too... for example, running
| Windows 95 in an emulator on a modern machine is getting harder
| and harder (emulators like vmware and virtualbox don't emulate
| the CPU speed accurately, which causes the system not to boot,
| and they also don't emulate various paging behaviours of old
| intel CPU's accurately which causes windows applications to
| crash within a few seconds of starting).
|
| There are binary patches to windows 95 to fix these issues, but
| as the system gets older it's less likely people will put
| effort into binary patching it for compatibility with modern
| systems. And if it were more obscure, you'd be SOL.
| fourfour3 wrote:
| Whole system emulation like 86box does a much better job of
| emulating older hardware and OSes - I use it quite a bit for
| DOS/Win3.11/Win9x era stuff.
| thawkth wrote:
| PCem is far, far better for Win95 emulation - it can handle a
| P2 233 and a Voodoo3 fairly accurately - and tons and tons of
| hardware on top of that.
|
| It's amazing. I keep a 95 / 98 and some other vintage
| machines around as a hobby, but being able to play Unreal in
| an emulator with 3D acceleration blows my mind
| fourfour3 wrote:
| How have you found the Voodoo 3 emulation? I have found it
| a bit ropey in 86box/PCem - but I find voodoo 1 or 2 works
| really well.
| Narishma wrote:
| Those are virtual machines, not emulators. If you use a
| proper emulator like PCem or 86box, Windows 95 works fine.
| thaumasiotes wrote:
| > running Windows 95 in an emulator on a modern machine is
| getting harder and harder (emulators like vmware and
| virtualbox don't emulate the CPU speed accurately, which
| causes the system not to boot, and they also don't emulate
| various paging behaviours of old intel CPU's accurately which
| causes windows applications to crash within a few seconds of
| starting)
|
| I thought the normal way to run Windows 95 was in dosbox?
| markus92 wrote:
| As a testament to Microsoft's backwards compatibility: the file
| opened mostly fine in the Windows version of Word (version 2401),
| and the layout seems to be identical to the PDF of the article.
| It did block the file format by default but that was easy enough
| to allow.
|
| The graphics did not open however, due to a missing graphics
| filter for the Microsoft Word Picture format. Seems it's been
| deprecated for a while now but Word 2003 should be able to open
| it? Which is old, but not _that_ old not to run on modern
| systems.
| markus92 wrote:
| Installed a copy of Word 2003, document opened flawlessly
| immediately with default settings. Saving it from there
| converted it to a modern .doc which I could open with Office
| 365 and convert to PDF etc.
|
| I think the moral of the story is that the Windows Office team
| seems to spend a bit more time on backwards compatibility.
| jgrahamc wrote:
| I would be interested to see a PDF generated from Office 365
| to understand how flawless it really is.
| zokier wrote:
| Here you go, exported from desktop Word to PDF.
|
| https://drive.google.com/file/d/1lnaSr22l3kQbmFHnxg3Ggd3-46
| v...
|
| Full version string:
|
| Microsoft(r) Word for Microsoft 365 MSO (Version 2311 Build
| 16.0.17029.20140) 64-bit
| jgrahamc wrote:
| Right. So all the images are missing. LibreOffice still
| gives the best conversion I think.
| markus92 wrote:
| Yeah, that's why you need Word 2003 for the images, it's
| a deprecated format full of security holes I guess.
| giancarlostoro wrote:
| Ah... yeah I was wondering why they would deprecate an
| image format at all. My understanding is that Word in the
| old days serialized what was in memory, maybe that was a
| little too exploitable with images?
|
| Not sure just curious not even sure where to look that
| one up honestly.
| zokier wrote:
| Digging through the files a bit I think the images are in
| PICT format which is very specific to Macs (the original
| ones). It's not surprising that modern Word doesn't
| support those that well, as PICT is actually a somewhat
| complicated kinda-vector image format. I am surprised
| that even Word 2003 implemented PICT on Windows.
| ogurechny wrote:
| It's not "kinda-vector", it's a metafile format for
| QuickDraw operations (Windows did the same later with
| WMF, which was a list of GDI operations).
|
| http://fileformats.archiveteam.org/wiki/PICT
|
| Imagemagick supports it. What's more important, QuickDraw
| source is available, so not only can we have "some"
| conversion, we can also reason about its correctness (to
| some extent -- according to comments, it's from
| 1982-1985).
|
| https://computerhistory.org/blog/macpaint-and-quickdraw-
| sour...
|
| Extracting raw embedded PICT files from the document and
| working with them would be the best way to get proper
| charts. To see what appeared on paper, we can direct
| emulated system output to an emulated printer, or capture
| the PostScript commands and rasterize them at the
| resolution that was used by the device available to the
| author. It is well known that Word for Windows stored
| last used printer settings in the document, so it could
| be the same for files produced by Mac version.
|
| (M-hm, it says "Laserwriter" at 0x10097. Maybe they all
| do.)
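
Checking a string at a known offset, as in the "Laserwriter" remark above, can be done with `dd` alone. The sketch below is generic: the function name `peek_bytes` and the file name `proposal.doc` are placeholders, and the offset 0x10097 is simply the one quoted in the comment.

```shell
# peek_bytes FILE OFFSET COUNT
# Print COUNT bytes of FILE starting at OFFSET (decimal or 0x-prefixed hex).
# Non-printable bytes are shown as dots so the output is terminal-safe.
peek_bytes() {
    dd if="$1" bs=1 skip="$(($2))" count="$3" 2>/dev/null |
        LC_ALL=C tr -c '[:print:]' '.'
    echo
}

# Hypothetical usage:
#   peek_bytes proposal.doc 0x10097 16
```

Reading one byte at a time is slow for large ranges but fine for inspecting a few dozen bytes.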
|
| Because Microsoft made the most popular document editor
| for both Windows and Mac, they had to deal with
| interoperability of two versions of their own software.
| Supporting WMF/EMF on Mac meant they had to drag GDI
| implementation along with Office (luckily, the reference
| could be grabbed from their colleagues). Supporting PICT
| on Windows meant they had to re-implement QuickDraw
| primitives.
|
| https://en.wikipedia.org/wiki/History_of_Microsoft_Word
|
| https://news.microsoft.com/1999/04/26/office-98-built-
| for-th...
|
| It is totally possible that Office applications used
| built-in PICT parser even on Mac to make things simple,
| and not rely on 15 years of compatibility layers in the
| system.
| zokier wrote:
| Probably the completely best would be to use LO for the
| images and Word otherwise... needs some manual twiddling
| but I suspect that way you can get pretty much perfect
| layout and images.
| ogurechny wrote:
| Office applications up to (and probably including) version
| 2010 break and crash on latest Windows versions. That
| behavior varies based on Office service packs and updates
| installed. You were lucky to be able to just _save the
| document_.
|
| Unless, of course, you've found some _portable version_ on
| the net that packs ThinApp and an assortment of old system
| libraries under the hood.
| markus92 wrote:
| I had no problems installing a vanilla Office 2003 on
| Windows 11 23H2. Got the iso from archive.org and it
| installed without a hitch.
| astura wrote:
| This has not been my experience, I'm wondering where you
| heard this information from?
|
| I have Office 2003 (or maybe it's 2007?) installed on my
| work computer, no problems. It even happily coexists with
| whatever modern Office version I have installed on there
| too.
|
| I also have Office 2010 installed on my home computer and
| my husband uses it all the time. No issues.
|
| Both computers are running Windows 10, so I guess it's not
| technically "the latest version."
| Moru wrote:
| I think they spend extra time creating those backward
| compatibility problems just to make it harder to create a
| perfect third-party tool.
|
| [1] https://www.infoworld.com/article/2618153/how-microsoft-
| was-...
| crazygringo wrote:
| I'm surprised he didn't try an intermediate version of Word --
| not the original Word 4.0 for Mac, but not the current online
| version of Word either.
|
| I had a lot of old Word 4.0 for Mac files at one point, and
| remember some point in the late 1990's or early 2000's opening
| them all up in a version of Word for Windows, and then re-saving
| them in a more up-to-date Word format. I believe there was an
| official converter tool Microsoft provided as a free add-on or an
| optional install component -- it wouldn't open the "ancient" Word
| formats otherwise.
|
| There's definitely going to be a chain here of 1 or 2
| intermediate versions of Word that should be able to open the
| document perfectly and get it into a modern Word format, I should
| think -- and I'm curious what the exact versions _are_. (Although
| as other people point out, if you don't need to edit it, then
| exporting it as PostScript in Word 4.0 and converting it to PDF
| works fine too.)
| jasomill wrote:
| As I've discovered while playing with this document and reading
| this thread:
|
| Current Word for Mac blocks opening the file under discussion,
| with no obvious workarounds.
|
| Current Word for Windows will only open the file with non-
| default security settings, and won't render the images at all.
|
| Per Microsoft, PICT image support was removed from all versions
| of Word for Windows in August 2019[1].
|
| The current version of Word for Mac fails to render the images
| with a misleading error message ("There is not enough memory or
| disk space to display or print the picture.").
|
| As for fonts, they _should_ render fine assuming you have
| matching fonts, where "matching" is defined by some
| application- or OS-specific algorithm, _e.g.,_ a post above
| indicates LibreOffice (on Linux?) substituting Times New Roman
| for Palatino when Palatino Linotype was available, whereas
| current Word on Windows 11 has no problem rendering Palatino as
| Palatino, presumably using the copy of Palatino Linotype
| installed with the OS.
|
| Finally, if matching spacing (character, word, and line), line
| breaks, and page breaks is important, you should definitely
| open the document using as close a version of Word as possible
| with the exact fonts used when creating the document installed.
|
| Oh, and hope the original author didn't rely on printer fonts
| without matching scalable screen fonts available, or else
| you're probably SOL unless your goal is printing to a
| sufficiently similar printer.
|
| [1] https://support.microsoft.com/en-gb/office/support-for-
| pict-...
| elzbardico wrote:
| I am deeply disappointed that a company like Microsoft doesn't
| make a point of Microsoft Word being able to open any document
| created by any version of Word, no matter how ancient it is. I
| think they have the social/historical/economical responsibility
| of doing so.
|
| If they are worried about vulnerabilities in the old parsing
| code, move it to an external process, run it under isolation in a
| sandbox to spit out a newer readable version on the fly, but
| don't eliminate this capability from the software.
|
| EDIT: zokier pointed out to me that the desktop version of Word
| opens the file fine, it is only the web version that doesn't. So,
| consider this post void.
|
| EDIT 2: Well it opens the document, but is not able to display or
| print the embedded graphics, it seems.
| OJFord wrote:
| You don't have to go _anywhere near_ 1990 to find issues with
| modern Microsoft (especially cloud) apps opening documents
| created in older ones!
| kiwijamo wrote:
| Indeed. If I ever end up in the cloud version of Word (or
| indeed any other app) my first instinct is to click 'Open in
| App'.
| zokier wrote:
| You missed the fact that the real Word does open this file just
| fine, it's just the toy web version that has issues (and maybe
| Mac too but eh)
| elzbardico wrote:
| Oh, really? I stand corrected. Thanks for pointing this out.
| jgrahamc wrote:
| No, you're not wrong, another commenter points out that
| latest Word opens the document but doesn't display the
| graphics.
| ben7799 wrote:
| The Office 365 Mac version refuses to open it.
|
| You can recover text but the result is horrible. No graphics
| and all formatting lost.
| jgrahamc wrote:
| Yes, it opens it and throws away the graphics, so not "just
| fine".
| zokier wrote:
| If we go into splitting hairs, it doesn't really throw the
| graphics away, it simply lacks the "filter" to display them
| but they are there still, as in it recognizes the graphics
| object correctly and lays it out on the page. Based on the
| error message, hypothetically I suppose you could even make
| a custom filter to handle the object.
|
| But this really goes more into the facet of Office files
| that allowed embedding pretty much anything into them, and
| relying on this "filter" system (I guess OLE) to handle
| embedded objects. So while the DOC file itself is getting
| parsed and rendered pretty much perfectly, the embedded
| objects are another story.
|
| In the same sense I'd say browser might open some HTML page
| "fine" even if it doesn't know how to handle some image
| format that is used on the page; it still handles the
| HTML correctly.
| jdofaz wrote:
| Makes me wonder if the graphics are in PICT format
| zokier wrote:
| I think they are. You can even find some PICT files
| inside the ODT in the github from TFA
| petersmagnusson wrote:
| if you read the blog, the main point of OP's project was
| to get at the diagrams, so hardly "splitting hairs".
| nullindividual wrote:
| This is expected with the web versions of Office. They can
| read (certain) binary Office formats but not edit them. The
| web version of Office is designed for OpenXml file formats.
| nullindividual wrote:
| Old file formats have security vulnerabilities. The online
| version of Word is designed for docx only, although it can open
| certain binary documents.
| o11c wrote:
| Fundamentally, a data file format can't have vulnerabilities.
| At most it can be prone to vulnerabilities, but more often
| it's just that popular implementations are bad.
| nullindividual wrote:
| Sorry, the Word parser does and Microsoft did not feel it
| important enough to fix as their focus is on OpenXml
| formats.
| kelnos wrote:
| Then that's on Microsoft. There's no fundamental reason
| why a secure parser can't be written for old formats.
| nullindividual wrote:
| Why would Microsoft do that? It makes zero financial
| sense to continue with a parser that may need to be
| rewritten from scratch for a ~30 year old format.
| genewitch wrote:
| they can do what they want, and i'll continue on my 2
| decade long decision to never give microsoft money, for
| anything. Same way i'll never give propellerhead another
| dime, or Plex[0], or any of these other consumer-hostile
| companies.
|
| I don't trust MS to maintain software, even though as far
| as that goes, they're better than a lot of companies that
| have been writing software for decades. "time marches on"
| is silly when we have millions of times the compute,
| storage, and transit speeds available to us. I also don't
| see why people see the need to shill for multi-billion
| dollar companies.
|
| What microsoft should have done is trademark a new name
| for their word processor the second they made the
| decision to not open word .doc from older versions. That
| way there's no confusion.
|
| [0] having a hard time remembering the name/company of
| the software i purchased for in-house streaming over a
| decade ago. Plex is still a hassle to use for in-house
| streaming compared to the "service" or whatever they're
| selling. Unfortunately Synology seems to have grown weary
| of releasing a version of their client for every
| newfangled device that comes to market, so i'm stuck with
| plex on my TV; that is, unless i want to use a stick/set-
| top/computer attached to it.
| nullindividual wrote:
| > I don't trust MS to maintain software
|
| Then you should champion removal of any "old" software
| they have that is under maintenance-only status. You
| wouldn't want security vulnerabilities to go unfixed,
| would you?
|
| > What microsoft should have done is trademark a new name
| for their word processor the second they made the
| decision to not open word .doc from older versions. That
| way there's no confusion.
|
| That makes zero sense. Word is still Word. It performs
| the same tasks (and more) as Word 1.0 did.
|
| And Word today still reads/writes .doc, just not versions
| that are that old.
| kelnos wrote:
| No they don't. Parsers can have security vulnerabilities, but
| you can fix those, and there's little reason why a parser for
| an old format would have more vulnerabilities than for a new
| format. Some formats can also have certain (intended)
| features that have security implications, but parsers can
| choose to disable them if they are concerned.
| larsrc wrote:
| Many old formats were essentially just binary dumps of memory,
| or something not far removed. Documenting the formats was not a
| standard. Yes, I agree that there is a social responsibility,
| but having worked in digital archiving I can tell you that the
| olden days were really, really messy. No, really.
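| To make the "binary dump of memory" point concrete, here is a
| minimal Python sketch. The record layout (a 16-bit version followed
| by a 32-bit text length) is invented for illustration and is not
| the real Word format. It only shows that the same bytes decode to
| different values unless the layout and byte order are documented.
|
```python
import struct


# Hypothetical in-memory header: 16-bit version, 32-bit text length.
# This layout is an illustration only, not the real Word file format.
def dump_header(version: int, text_length: int) -> bytes:
    # '>' selects big-endian, as used by the 68000-based Macs of the era.
    return struct.pack(">HI", version, text_length)


def read_header(raw: bytes, byte_order: str) -> tuple:
    # byte_order is '>' (big-endian) or '<' (little-endian).
    return struct.unpack(byte_order + "HI", raw)


raw = dump_header(4, 1990)
print(read_header(raw, ">"))  # decoded with the writer's byte order
print(read_header(raw, "<"))  # decoded by a reader guessing wrongly
```
|
| The second call still "succeeds", it just returns nonsense, which
| is roughly what an archivist faces with an undocumented dump.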
| resters wrote:
| This is the point that many of the commenters who criticize
| Microsoft are missing, and it's why the old formats are not
| enabled by default (security vulnerabilities) and why it's
| not as simple as creating a parser.
| autoexec wrote:
| Microsoft still deserves criticism for designing their old
| word formats so badly. It was a design choice to turn
| documents of mostly text into obscure binary formats that
| were badly standardized and maintained.
| resters wrote:
| Not true at all. Some of Microsoft's best minds created
| _extremely ingenious_ methods that allowed early word
| processors to be usable on files that were dramatically
| larger than what would fit in memory. OSes didn't
| support suitable performance via VM infrastructure at the
| time. It was clever, outside of the box thinking that got
| MS to be able to beat WordPerfect (a worthy competitor)
| and the many other also-rans.
|
| There was (contrary to popular belief) not a deliberate
| strategy to limit interoperability. It was simply the
| reality of the approaches utilized that made them tightly
| coupled to the MS Word codebase and less standardizable
| than would have otherwise been ideal.
|
| Source: one of the guys who worked on it at MS.
| unsui wrote:
| no they don't.
|
| They were effectively working at embedded scale, trying
| to capture state within tremendously limiting
| constraints.
|
| This is a case of interpreting past decisions based on
| current criteria, when those same conditions would have
| prevented modern methods from being implemented.
| bogantech wrote:
| > Microsoft still deserves criticism for designing their
| old word formats so badly.
|
| I would love to see some modern devs try to write
| software for a 68000 system with only 512K of memory
| layer8 wrote:
| Word 4.0 ran from floppy disks on PC XTs (8088 CPU) with
| 320 KB of RAM. You can't afford an elaborate parser in
| such limited memory, or you'd have to swap out its
| implementation on floppy on every load and save. Just
| running the parser would have slowed down document
| loading significantly. The floppy disk capacity also
| wasn't much larger. You already had to swap the disks for
| doing spell checking or similar. For comparison, the
| first web browser (WorldWideWeb) was an executable of
| about 1 MB and ran on a much faster 32-bit NeXT computer
| with 8 MB of RAM and a hard drive.
| pompino wrote:
| Is there any commercial software development company with
| better backwards compatibility creds than Microsoft? I'm
| genuinely curious.
| rietta wrote:
| Extremely interesting and thank you for doing this. I feel
| strongly that this goes to show just how important preserving
| historical software and emulation is. I have dabbled myself with
| old Windows 3.1 software for this very reason. We really, truly
| are going to have a period where web application driven software
| just disappears and we won't easily have this retro computing view
| of these decades in a short time from now.
| dfxm12 wrote:
| I also think it is important to show the importance of open
| formats or open source in general if we want future generations
| to read our documents or run/compile/understand our software.
| CharlesW wrote:
| _[silly pre-coffee post deleted]_
| jgrahamc wrote:
| Word is already available on the Infinite Mac as it's under
| Productivity inside the Infinite HD. No need to install it.
| whoopdedo wrote:
| > That way I can see actual fonts, font sizes and layout to
| confirm how the document should have looked.
|
| Or you would if you had the original fonts. Word 4.0 was released
| for System 6 with support as far back as System 3.2. Fonts at
| that time had separate screen and printer files for the different
| output resolutions. If you're missing the printer font it'll
| print a scaled (using nearest-neighbor) rendering of the screen
| font. If you're missing the screen font it'll substitute the
| system font. (Geneva by default, as seen in the screenshot.)
|
| In this case, only the well-known Palatino and Courier typefaces
| are needed. But LibreOffice substituted Times New Roman even
| though I have Palatino Linotype installed.
| jgrahamc wrote:
| That may go some way to explaining some of the differences I
| see, but the main thing I was looking for in the emulation was
| the font sizes.
| aidenn0 wrote:
| Doesn't the font matter almost as much as the font-size
| setting for font sizes, given that different font families
| can have wildly different metrics at the same font size?
| jgrahamc wrote:
| I bet it does. I should redo the final part after
| installing the required fonts.
| jasomill wrote:
| This is probably because the (internal) name of Palatino
| Linotype is "PalatinoLinotype" (for the version shipped with
| Windows) or "PalatinoLTStd" (for the Adobe OpenType version).
|
| In the absence of a hard-coded special case, font matching
| based on common prefixes could easily match something
| inappropriate, such as -- taking the first example I see on my
| machine -- mapping "Lucida" to "LucidaConsole", when almost any
| proportional sans-serif font would arguably be a better match
| for the document author's design intent.
|
| Then again, even exact name matches provide no guarantees. For
| example, Apple has shipped two fonts (internally) named
| NewYork: the TrueType conversion of Susan Kare's 1983 bitmap
| design for the original Macintosh, and an unrelated design
| released in 2019.
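| The prefix-matching pitfall described above can be sketched in a
| few lines of Python. This is not how LibreOffice actually resolves
| fonts; the function and the list of installed names are
| hypothetical, chosen only to reproduce the two cases mentioned.
|
```python
from typing import List, Optional


def naive_prefix_match(requested: str, installed: List[str]) -> Optional[str]:
    """Return the first installed font whose name starts with the request."""
    wanted = requested.replace(" ", "").lower()
    for name in installed:
        if name.replace(" ", "").lower().startswith(wanted):
            return name
    return None


# Hypothetical set of internal font names on one machine.
installed_fonts = ["LucidaConsole", "PalatinoLinotype", "TimesNewRomanPSMT"]

print(naive_prefix_match("Palatino", installed_fonts))  # a plausible match
print(naive_prefix_match("Lucida", installed_fonts))    # a monospace font
print(naive_prefix_match("Courier", installed_fonts))   # nothing installed
```
|
| The same rule that rescues "Palatino" also maps "Lucida" to a
| monospace face, which is why a prefix rule alone is risky.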
| whoopdedo wrote:
| It's more that I half-expected well-known mappings to be
| baked in. Like "Times" -> "Times New Roman".
|
| Didn't they also name one of their new fonts "SanFrancisco"
| much to the ire of Susan Kare fans.
| jasomill wrote:
| Yes, but the current OpenType San Francisco fonts use "SF"
| in their (display and internal) names, so no naming
| conflict exists with the original "ransom note" bitmap
| font.
|
| Also, as far as I know, of the original Mac fonts, Apple
| only ever shipped TrueType versions of Chicago, Geneva,
| Monaco, and New York. And I'm not aware of any OS with
| native support for both OpenType and classic Mac bitmap
| fonts (conversions are always possible, of course).
| stuaxo wrote:
| This is good.
|
| It would be good to get some feature requests into libreoffice to
| fix the remaining mis-matches in the formatting.
| scaglio wrote:
| This raises a potential problem, often underrated by companies:
| some have backups with _infinite_ retention.
|
| It is common to have backups with retention of 10 years, some may
| have 20 years for legal reasons... but the majority of people
| don't understand the difference between "readable" and "usable".
|
| Of course, it depends on the data... And there are companies
| backing up whole _virtual machines_ with infinite retention,
| believing they will be able to run them. It is hard enough to
| restore a vSphere 5.x machine on a brand new vSphere 8; I really
| don't understand this waste of space.
| actionfromafar wrote:
| Often an old file or disk image is tiny compared to modern file
| sizes.
|
| So the waste of space is more of an administrative character
| than a waste of _disk_ space.
| rvnx wrote:
| If you back up everything, you can sort it later, or even
| never. It costs 1 USD per month at Google Cloud to store 1TB of
| data.
|
| At this price it's not worth sorting, when one single devops
| costs 100 USD+ per hour, not including the opportunity cost of
| not working on something more productive (and less boring for
| the developer).
|
| Then X years after the company is acquired, or sufficient time
| has lapsed, you can delete / drop the data without sorting.
|
| Regarding virtual machines, if it's VMDK for example, you can
| read the raw disks without booting it, and again, it's not
| worth taking a risk to lose data to potentially save 10 USD per
| month, which is similar to one developer taking one beer extra
| at a team event.
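| The arithmetic behind this argument, as a rough Python sketch. The
| two prices are the figures quoted in the comment, taken at face
| value; the 10 TB and 10 year inputs are made-up example values.
|
```python
STORAGE_USD_PER_TB_MONTH = 1.0  # figure quoted in the comment above
ENGINEER_USD_PER_HOUR = 100.0   # figure quoted in the comment above


def storage_cost(terabytes: float, months: int) -> float:
    """Cost in USD of keeping the data without ever sorting it."""
    return terabytes * months * STORAGE_USD_PER_TB_MONTH


def break_even_hours(terabytes: float, months: int) -> float:
    """Engineer hours that cost as much as storing the data unsorted."""
    return storage_cost(terabytes, months) / ENGINEER_USD_PER_HOUR


# Example inputs (assumptions): 10 TB kept for 10 years.
print(storage_cost(10, 120))
print(break_even_hours(10, 120))
```
|
| Under these assumptions, sorting only pays off if it takes fewer
| engineer hours than the second number.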
| scaglio wrote:
| > if it's VMDK for example, you can read the raw disks
| without booting it
|
| Yes, but that's the difference between "readable" and
| "usable". Many companies don't realize the technical
| difficulties to be able to _run_ the VMs. They just expect
| that it will work, if needed.
| anonymouskimmer wrote:
| WordPerfect claims the ability to open MS Word 4.0 files. The
| standard edition is currently $175. I'm not buying it, but if
| you're willing to spend $175 it might be something to try.
| caboteria wrote:
| Yet another example of why Apache needs to take OpenOffice behind
| the barn.
| EasyMark wrote:
| You mean retire it to a nice farm upstate, little Jimmy might
| hear the shotgun blast!
| acheron wrote:
| "Here's a 4000 year old letter from a merchant to his partners
| describing how to avoid taxes by smuggling goods in their
| underwear." ( https://www.britishmuseum.org/blog/trade-and-contraband-anci... )
|
| vs
|
| "Not sure if it's possible to read this 30 year old file!"
| kelnos wrote:
| I get the point you're trying to make, but your former example
| is rare. While there are more exceedingly-old paper records
| that are still around and have been preserved than we might
| expect, we've lost so, so much. Paper and ink (and variations
| on that) are both fragile.
|
| Digital documents are otherwise easy to preserve indefinitely,
| if care is taken up-front to choose a simple document format
| that is likely to remain parseable (or at least documented) for
| a long time. And even when you don't do that, there's always
| the possibility of writing a parser later (assuming
| documentation is around) or reverse-engineering the format.
|
| And in this case, the 30-year-old file did end up getting
| opened, albeit not as trivially easily as one might hope.
| thaumasiotes wrote:
| > but your former example is rare. While there are more
| exceedingly-old paper records that are still around and have
| been preserved than we might expect, we've lost so, so much.
| Paper and ink (and variations on that) are both fragile.
|
| Depends what you mean by "rare". Ancient Near Eastern
| correspondence isn't rare at all, precisely because they
| didn't use paper. (And they went to war a lot.) You seem to
| be writing as if that letter was a paper document, but it
| isn't. Paper records that old only exist in Egypt.
|
| > Digital documents are otherwise easy to preserve
| indefinitely, if care is taken up-front to choose a simple
| document format that is likely to remain parseable (or at
| least documented) for a long time.
|
| This isn't a good match to the example either; Ancient Near
| Eastern records had to be deciphered. (The Semitic ones had
| to be deciphered. The Sumerian ones benefited from surviving
| documentation, but we had to find that and learn how to read
| it.)
|
| The original example isn't particularly apt; reading this
| 30-year-old file, or a similar one, is a task that one guy
| can do in less than a week using existing tools and know that
| he's done it correctly. Reading a 4000-year-old cuneiform
| letter was a much larger project than that.
| pjmlp wrote:
| Until they find a storage medium that doesn't deteriorate
| over time, nope, digital storage is still worse than plain
| paper or clay at retaining what it stores, and one bad bit
| is enough.
| melomac wrote:
| I was able to download and transfer the proposal document to a
| Mini vMac emulator, set the Finder's type and creator to those of
| a Microsoft Word 5 document i.e. respectively WDBN and MSWD, and
| finally open the document with Microsoft Word 5 for Mac to export
| it as a RTF document.
|
| Here you have it: https://neko.melomac.net/tmp/proposal.rtf
|
| I certainly agree opening a document from this Macintosh era
| should be, by far, easier than the process I detailed above, but
| this is how it is ¯\_(ツ)_/¯
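| For readers who have not met type and creator codes: they are two
| four-character tags kept in the file's Finder metadata, not in the
| file contents. A hedged Python sketch follows, using the MacBinary
| header as a convenient place where those tags are stored at fixed
| offsets. The offsets follow the commonly documented MacBinary
| layout; the helper names are hypothetical, and the CRC and version
| fields of later MacBinary revisions are left out.
|
```python
import struct

HEADER_SIZE = 128  # a MacBinary header is a single 128-byte block


def build_macbinary_header(name: str, file_type: bytes, creator: bytes,
                           data_len: int, rsrc_len: int = 0) -> bytes:
    """Build a minimal header (no CRC) carrying type and creator codes."""
    if len(file_type) != 4 or len(creator) != 4:
        raise ValueError("type and creator must be exactly four bytes")
    encoded_name = name.encode("mac_roman")
    if not 1 <= len(encoded_name) <= 63:
        raise ValueError("file name must be 1 to 63 bytes long")
    header = bytearray(HEADER_SIZE)
    header[1] = len(encoded_name)                     # name length
    header[2:2 + len(encoded_name)] = encoded_name    # name
    header[65:69] = file_type                         # e.g. b"WDBN"
    header[69:73] = creator                           # e.g. b"MSWD"
    # Big-endian data fork and resource fork lengths at offsets 83 and 87.
    struct.pack_into(">II", header, 83, data_len, rsrc_len)
    return bytes(header)


def read_type_creator(header: bytes) -> tuple:
    """Return the (type, creator) pair stored in a MacBinary header."""
    return header[65:69].decode("ascii"), header[69:73].decode("ascii")


header = build_macbinary_header("proposal", b"WDBN", b"MSWD", data_len=1234)
print(read_type_creator(header))
```
|
| A file that arrives from a modern system has no such tags, which is
| why they had to be set by hand before Word would accept it.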
| jgrahamc wrote:
| Thanks. Unfortunately, the images are all missing.
| melomac wrote:
| It is even more frustrating that the images are in the
| document, and Microsoft Word for Mac would still display them
| accurately.
|
| And LibreOffice would display the images in the RTF document
| in a different size (a tiny block).
|
| If my old Mac's display worked, I would have been able to
| send the document over to CUPS via Netatalk, and make a PDF
| out of it. Unfortunately Mini vMac can't connect to that VM
| on the LAN...
|
| Anyhow, it is scandalous that opening legacy documents became
| such a PITA.
| bluedino wrote:
| That Mac Word screenshot gives me claustrophobic flashbacks to
| trying to work on those tiny screens in middle school computer
| lab, writing science fair papers.
| cynicalsecurity wrote:
| It wasn't so bad. It's better now, but it was fine back then.
| whoopdedo wrote:
| I consider it more of not knowing how much better we could
| have had it. Small monitors were "normal." But I imagine
| people who got to work with the Portrait Display[1] (an
| impressive 640x870 resolution!) felt then as we do now when
| they had to switch back to the internal screen.
|
| [1] https://wiki.preterhuman.net/Apple_Macintosh_Portrait_Displa...
| retrac wrote:
| Heh, that screenshot is relatively high-resolution for the time
| in question, too. 800x600 maybe? The compact Macs were 512x342:
| https://www.betalogue.com/images/uploads/microsoft/pce-mac-w...
| (The toolbars, rulers, etc., could be hidden in the settings.)
| cranberryturkey wrote:
| libreoffice opened it.
| kelnos wrote:
| Sure, but the layout was screwed up and the fonts and sizes
| were wrong.
|
| Certainly this is helpful: it's better to be able to open a
| document and then have to manually fix those issues than to be
| unable to open it at all. But it was far from perfect.
| EasyMark wrote:
| It's orders of magnitude better than "I can't open this file
| at all, -1"?
| cranberryturkey wrote:
| agreed, but you could probably export as rich text or
| something.
| Sembiance wrote:
| This does an "okay" job at converting the document:
| https://archive.org/details/KeyViewPro
|
| Here is the converted PDF:
| https://smallpdf.com/result#r=091f20f23de353fac21376a3a49a60...
| jgrahamc wrote:
| Not sure that's really true. It did something but the images
| are a mess and a lot of formatting is gone. I think LibreOffice
| is still the winner here.
| bilsbie wrote:
| I wonder if it would be a viable business to keep running
| versions of computers going back say 40 years and offering to
| recover and convert files for people. (Just getting stuff off
| floppy disks and Zip drives might be useful)
| traceroute66 wrote:
| Interestingly, the latest and greatest version (desktop app via
| Office365) of Microsoft Word on Mac appears to know what it is
| _but_ refuses to open it.
|
| If you drag the file onto Word, it launches a dialogue box
| telling you "proposal uses a file type that is blocked from
| opening in this version" along with a link to the supporting page
| on the Microsoft website[1].
|
| [1] https://support.microsoft.com/en-us/office/error-filename-us...
| worik wrote:
| > telling you "proposal uses a file type that is blocked from
| opening in this version"
|
| "blocked"?
|
| That sounds like Microsoft has some IP problems with their old
| software.
| aidenn0 wrote:
| Normally I have good success with abiword, but it completely
| barfs on this file; it seems to be falling back on its RTF
| support.
| noufalibrahim wrote:
| One underappreciated (though mentioned) hero in this little saga
| is the venerable file(1) command.
|
| proposal: Microsoft Word for Macintosh 4.0
|
| It's so incredibly useful and so easily overlooked. I almost
| reflexively reach out to it when I'm curious about a file and the
| information it returns is just sufficient to satiate my curiosity
| and be useful.
| cpach wrote:
| I agree, _file_ is such a great tool.
|
| I have cursed so many times in the past when I sat in front of
| a work computer that ran Windows and didn't have this tool
| easily available. (Later on, WSL made life easier, but now I'm
| luckily nearly Windows-free.)
| AdamJacobMuller wrote:
| One might even say that file has a lot of magic in it.
| pdmccormick wrote:
| file has a lot of magic, but a file typically has only one
| magic.
| layer8 wrote:
| I'd say it has a number of magic.
| noufalibrahim wrote:
| Definitely uses magic to do its work.
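| The "magic" in question is a table of byte signatures checked
| against the start of a file. A toy Python sketch of the idea
| follows. The real file(1) database is far larger and also tests
| offsets, masks and nested conditions; the four signatures below are
| well-known ones picked for illustration, and the Word for Macintosh
| signature is deliberately not guessed at here.
|
```python
# (signature at offset 0, human-readable label)
MAGIC_TABLE = [
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"PK\x03\x04", "ZIP archive (also .docx and .odt containers)"),
    (b"{\\rtf", "Rich Text Format"),
]


def sniff(data: bytes) -> str:
    """Return a label for the first signature that matches, else 'data'."""
    for magic, label in MAGIC_TABLE:
        if data.startswith(magic):
            return label
    return "data"


print(sniff(b"%PDF-1.4\n"))
print(sniff(b"{\\rtf1\\ansi"))
print(sniff(b"\x00\x01\x02"))
```
|
| Because only the content is inspected, this works on a file with no
| extension at all, like the "proposal" file in the article.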
| dorfsmay wrote:
| LibreOffice is amazing: besides being able to open many document
| formats, it can run headless and has command line options which
| allow automating some tasks, such as format conversions, that would
| not be possible otherwise.
|
| https://help.libreoffice.org/latest/en-US/text/shared/guide/...
|
| https://opensource.com/article/21/3/libreoffice-command-line
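| A small Python sketch of what such automation might look like. It
| only builds the command line. The --headless, --convert-to and
| --outdir options are documented LibreOffice options, but the binary
| name ("soffice" here, sometimes "libreoffice") and the output
| directory are assumptions that depend on the installation.
|
```python
import shlex
from typing import List


def convert_command(source: str, target_format: str = "pdf",
                    outdir: str = "converted") -> List[str]:
    """Build a headless LibreOffice conversion command for one file."""
    return [
        "soffice",            # assumed binary name; varies by platform
        "--headless",         # no GUI
        "--convert-to", target_format,
        "--outdir", outdir,
        source,
    ]


print(shlex.join(convert_command("proposal")))
```
|
| With LibreOffice installed, the returned list can be passed to
| subprocess.run(cmd, check=True) to convert a whole directory of
| old documents in a loop.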
| j45 wrote:
| https://www.ebay.com/itm/235033043066
|
| The original Word for Mac software seems more than available.
| Dwedit wrote:
| Is there a way to make a PS or PDF file using the actual Word for
| Macintosh 4? I'd think that would be the definitive render.
| wrs wrote:
| Keep reading...he did that. But it's not clear he had the right
| PS fonts installed.
| jgrahamc wrote:
| I probably did not as I did it really fast after someone
| suggested it.
| aidenn0 wrote:
| Somewhat off-topic, but I remember Word for Windows 6.0 would
| take considerable time (like a minute for a 10 page document on
| my AM386DX/40) to reflow paragraphs across page-breaks (trying to
| handle widows, orphans &c). If I made an edit to the first page
| and hit print before it was done, I would end up with a printed
| document that contained either duplicated or dropped lines at
| page boundaries.
| jmclnx wrote:
| I have a few Wang WP Documents from decades ago. I could not open
| them at all. Libreoffice thought they were corrupted Word Docs.
|
| So the concern about some document formats being unreadable is
| still valid. Who knows what obscure proprietary formats exist out
| there.
| pseingatl wrote:
| Wasn't Multimate a Wang clone? Of course, finding an 8" floppy
| drive might be difficult.
| jmclnx wrote:
| It could have been. The Docs I have were created on the Wang
| PC using Wang WP. 5¼" diskettes were used on those.
|
| I actually copied them to 3½" later on.
| jtotheh wrote:
| Tragically, Postscript support has been largely removed from
| MacOS now. Apparently the language was weird enough that
| supporting it made some (in)security hacks possible. I guess I'm
| old! I remember first finding out about it in 1986 when it was very
| "leet". Postscript printers were big $.
|
| I say tragically because Postscript was pretty key in making DTP
| as compelling as it used to be, which kind of saved the Mac in
| terms of being the "killer app" for it.
|
| I think you may be able to run some kind of postscript support in
| some tool from Adobe, or even Ghostscript. And probably, the
| newer software is better, but it's sad that you can't view a
| postscript file on macOS out of the box now.
| jasomill wrote:
| While I agree -- my first exposure to PostScript as a
| programming language was playing around with examples from the
| Adobe "blue book"[1] over a bidirectional serial connection to
| a LaserWriter sometime in the '80s -- nothing in this document
| requires PostScript.
|
| The embedded images are in PICT format, and TrueType versions
| of the three fonts used (Courier, Helvetica, and Palatino) have
| shipped with all versions of the Mac OS since System 7 in 1991.
|
| And while Word 4.0 shipped in 1989, so did Adobe Type
| Manager[2], which supported Type 1 fonts onscreen and on non-
| PostScript printers, though to get a Type 1 version of Palatino
| for ATM at that time you'd have also needed the Adobe Plus
| Pack[3] (or possibly acquiring Palatino by other means; I don't
| recall when Adobe started selling individual fonts and the Font
| Folio).
|
| [1] https://archive.org/details/postscriptlangua00adobrich
|
| [2] https://www.nytimes.com/1989/12/19/science/personal-computer...
|
| [3] https://archive.org/details/adobe-a
| jtotheh wrote:
| Your information is much more detailed and specific. I was
| just giving an example of the loss of support for old
| software/formats. I didn't mean that postscript support was
| involved in this particular case.
| Lammy wrote:
| > or possibly acquiring Palatino by other means
|
| Relevant: The Palatino FAQ (1998)
|
| https://web.archive.org/web/19990202052926/http://www.mindsp...
|
| https://news.ycombinator.com/item?id=24005172
| jxdxbx wrote:
| Amazing that you can just pop up an emulator in a browser window.
| Retro Mac emulation used to be such a pain in the ass.
| jasomill wrote:
| For anyone interested, here's the document in modern Word format,
| with all vector artwork and fonts intact:
|
| https://jasomill.at/proposal.docx
|
| To convert it, I first opened and re-saved using Word 98[1]
| running on a QEMU-emulated Power Mac, at which point it opened in
| modern Word for Mac ( _viz.,_ version 16.82).
|
| The pictures were missing, however, with Word claiming "There is
| not enough memory or disk space to display or print the picture."
| (given 64 GB RAM with 30+ GB free at the time, I assume the
| actual problem is that Word no longer supports the PICT image
| format).
|
| To restore the images, I used Acrobat (5.0.10) print-to-PDF in
| Word 98 to create a PDF, then extracted the three images to
| separate PDFs using (modern) Adobe Illustrator, preserving the
| original fonts, vector artwork, size, and exact bounding box of
| each image.
|
| At this point, restoring the images was a simple matter of
| deleting the original images and dragging and dropping the PDF
| replacements from the Finder.
|
| For comparison, here's the PDF created by Acrobat from Word 98 on
| the Power Mac
|
| https://jasomill.at/proposal-Word98.pdf
|
| and here's a PDF created by modern Word running on macOS Sonoma
|
| https://jasomill.at/proposal-Word16.82.pdf
|
| [1] https://archive.org/details/ms-word98-special-edition
| jasomill wrote:
| As an aside, MacClippy 98 knew the score:
|
| https://jasomill.at/Clippy.png
| throwaway828 wrote:
| MacClippy seems like a useful bot. Similar to AI chat windows
| on websites without the second guessing.
| whoopdedo wrote:
| Did you attempt to extract the pictures so they could be
| converted directly by another program? Archive Team says that
| LibreOffice can read vector PICT files[1]. And then saved as
| SVG. Of course you still have the font problem if it has text.
| I hadn't thought of using PDF to preserve vectors, but of
| course it does, as well as embedding the fonts.
|
| [1] http://fileformats.archiveteam.org/wiki/PICT
| jasomill wrote:
| Good question. I saved the original document as RTF and
| extracted what I believe is the raw PICT binary data, but
| quickly decided on the Acrobat route when I realized I didn't
| know of any software that could easily convert PICT to a more
| modern vector format (other than by printing the PICT to
| Acrobat PDF, but that's essentially what I did in Word with
| extra steps).
|
| If you want to give it a go, here's the raw PICT data from
| the RTF:
|
| https://jasomill.at/Picture1.PICT
|
| (extracted from RTF tag \pict\macpict\picw513\pich459)
|
| https://jasomill.at/Picture2.PICT
|
| (\pict\macpict\picw410\pich327)
|
| https://jasomill.at/Picture3.PICT
|
| (\pict\macpict\picw420\pich291)
|
| and here are MacBinary-encoded[1] PICT files containing the
| same data:
|
| https://jasomill.at/Picture1.bin
|
| https://jasomill.at/Picture2.bin
|
| https://jasomill.at/Picture3.bin
|
| [1] https://en.wikipedia.org/wiki/MacBinary
|
| Encoding is required because the PICT file format stores
| image data in the file's resource fork[2].
|
| [2] https://en.wikipedia.org/wiki/Resource_fork
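| For anyone who wants to repeat the RTF extraction step, here is a
| hedged Python sketch. It assumes the picture is stored as hex text
| inside a {\pict\macpict\picwN\pichN ...} group, as the tags quoted
| above suggest, and it does not handle \bin binary data or nested
| groups. The function name and the sample bytes are made up.
|
```python
import re
from typing import List, Tuple

# Matches one flat {\pict\macpict\picwW\pichH ...} group.
PICT_GROUP = re.compile(r"\{\\pict\\macpict\\picw(\d+)\\pich(\d+)([^{}]*)\}")


def extract_macpict(rtf_text: str) -> List[Tuple[int, int, bytes]]:
    """Return (width, height, raw bytes) for each hex-encoded macpict."""
    pictures = []
    for match in PICT_GROUP.finditer(rtf_text):
        width, height = int(match.group(1)), int(match.group(2))
        body = match.group(3)
        # Drop any further control words such as \picscalex100.
        body = re.sub(r"\\[a-zA-Z]+-?\d* ?", "", body)
        # What remains should be hex digits split by whitespace.
        hex_digits = re.sub(r"\s+", "", body)
        pictures.append((width, height, bytes.fromhex(hex_digits)))
    return pictures


# Tiny made-up sample; real picture data is thousands of hex digits.
sample = r"{\rtf1 {\pict\macpict\picw513\pich459 " + "0011 02ff\n0c00" + "}}"
for width, height, data in extract_macpict(sample):
    print(width, height, data.hex())
```
|
| bytes.fromhex raises ValueError on malformed data, which is a
| useful early warning that the group was not plain hex after all.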
| Gormo wrote:
| Just tried it and confirmed that LibreOffice can indeed
| read PICT files as vector images and re-export to SVG.
|
| This can be scripted using the `--convert-to` option on the
| LibreOffice command line.
| animal_spirits wrote:
| The sci-fi job of digital archaeologist is becoming real!
| tomjakubowski wrote:
| any time you dig through layers of git commit history to
| answer a question, you are performing archaeology
| jgrahamc wrote:
| Marvellous. Thank you!
| ragebol wrote:
| I did not expect to read about the LHC in such an 'old'
| document. I couldn't find (in the time I was willing to spend
| during work) when the LHC project started for it to already be
| relevant in 1990 (20 years before it started up, which is also
| longer than I would have guessed)
| api wrote:
| Today's historic working documents will mostly be SaaS hosted
| documents in systems like Google Docs, Notion, etc. In the future
| nobody will be able to open them. They won't exist, and the
| software won't exist, and there will be no way to restore it
| since the software is SaaS that can't be emulated or even
| installed anywhere.
| willmadden wrote:
| MS word for mac 16.16 opens it with the diagrams intact in
| "compatibility mode". The only issue is the text is indented
| slightly too far on the left.
|
| Libre Office opens it with the same quality, but has some weird
| gray ghost lines around tables.
| _rupertius wrote:
| Now do one with Google Docs
| im_down_w_otp wrote:
| There's a System 7.1 Mac SE/30 sitting 2ft to my right with Word
| 5 on it. Send it to me. I've got you. Using a combination of
| LocalTalk and two other computers on that shelf I should get it
| up to Office 2001 in no time.
| voltagex_ wrote:
| ITT: people repeatedly making the same mistakes, misunderstanding
| archival and also ignoring glaring problems with converted output
| ogurechny wrote:
| Just ask The Neural Net to draw something appropriate to
| illustrate the given text. There's little noticeable
| difference.
|
| _(ducks and runs away)_
| LarryMade2 wrote:
| Props to LibreOffice
|
| Recently I was asked to locate an old form document, which I found
| was written in WriteNow for Macintosh. LibreOffice opened it
| up easily (even without a filename extension) and except for some
| font substitutions the tables seemed to be all correct. Very
| impressive.
| 0xcde4c3db wrote:
| See also: "How to hire Guillaume Portes" [1]
|
| (also "autoSpaceLikeWord95" in case anyone shares that specific
| brainworm with me and is Ctrl+Fing for it)
|
| [1] https://www.robweir.com/blog/2007/01/how-to-hire-guillaume-p...
| cxr wrote:
| I've been collecting notes about this file for a few years.
|
| Some of the information in this post was previously covered right
| here in the comments on HN a few years back:
| <https://news.ycombinator.com/item?id=12793157>
|
| The top reply there links to an online file(1)-like tool that
| identified it as a MacWrite II document. Last time I checked, the
| tool was updated and identifies the file as "Word for the
| Macintosh document (v4.0)" (pretty much what my system's file(1)
| says about it).
|
| We actually have a scan of Robert Cailliau's copy with his
| handwritten notes (including the infamous, "Vague but
| exciting..." remark). It's neither 20 nor 24 pages but instead 16
| and differs in several respects:
| <https://cds.cern.ch/record/1405411>; the version linked in the
| post and described erroneously as "the original" on w3.org
| clearly isn't the original and _has_ been changed in several ways
| besides just "the date added in May 1990". Rather, the May 1990
| version here is the second revision of the original that was
| first passed to Cailliau, and by November 1990 Berners-Lee and
| Cailliau were calling this second revision "HyperText and
| CERN"[1][2].
|
| That is, "Information Management: A Proposal" is the one authored
| solely by TBL and given to Cailliau. It's not the version that
| appears here. "HyperText and CERN" from May 1990 is what we're
| looking at here, but was mistakenly _also_ published as
| "Information Management: A Proposal". Later, TBL and Cailliau
| coauthored a joint work called "WorldWideWeb: Proposal for a
| Hypertext Project"[1][3] that referenced "HyperText and CERN" by
| name.
|
| TBL is also known to have used WriteNow--there are lots of .wn
| files littering w3.org. I now believe (since last summer) that
| it's likely that TBL authored this revision of the proposal in
| WriteNow (even if he didn't save it in the WriteNow format) or
| used WriteNow at least for the RTF export. Refer again to [2].
|
| 1.
| <https://cds.cern.ch/record/2639699/files/Proposal_Nov-1990.p...>
|
| 2. <https://www.w3.org/Administration/HTandCERN.>
|
| 3. <https://www.w3.org/Proposal>
| cxr wrote:
| > We actually have a scan of Robert Cailliau's copy with his
| handwritten notes (including the infamous, "Vague but
| exciting..." remark).
|
| Sorry, it was late when I wrote this. That was actually Mike
| Sendall (though TBL and Cailliau did collaborate on the
| others).
| dusted wrote:
| It's an interesting problem we have with file formats.. Emulation
| saves us, but at which point will we need to run emulators in
| emulators to reach the documents ? I suppose it's still somewhat
| easier than trying to understand some symbols on a cave wall..
| peter_hansteen wrote:
| This reminds me of my own screed of a much simpler document (an
| ASCII table generated as a printer test back in the late 1980s)
| that was not possible to render correctly some years later -
| https://bsdly.blogspot.com/2013/11/compatibility-is-hard-cha... -
| also contains a link to a further rant about other document
| formats that were supposed to be "standard" and "portable".
| vman81 wrote:
| > I downloaded the latest Apache OpenOffice and it did open the
| file
|
| The last decade of Apache OpenOffice can VERY generously be
| described as "maintenance mode". Most of the pull requests are
| grammar and dictionary tweaks.
___________________________________________________________________
(page generated 2024-02-14 23:02 UTC)