[HN Gopher] Bit banging a 3.5" floppy drive
       ___________________________________________________________________
        
       Bit banging a 3.5" floppy drive
        
       Author : brk
       Score  : 124 points
       Date   : 2023-12-19 18:09 UTC (4 hours ago)
        
 (HTM) web link (floppy.cafe)
 (TXT) w3m dump (floppy.cafe)
        
       | wolpoli wrote:
       | > Fun fact! floppy disks actually contain a lot more surface area
       | than 1.44mb. By my calculation, you'll get closer to 1.70mb but a
       | lot of that extra space is earmarked for synchronization barriers
       | and sector / track metadata.
       | 
       | This explains the 2M utility that allowed storing about 1.8mb on
       | a floppy disk. It was fun playing with it.
        
         | pgeorgi wrote:
         | https://www.os2museum.com/wp/the-xdf-diskette-format/ has tons
         | of details on IBM's contemporary and somewhat similar format.
        
         | retrac wrote:
         | I think this is a case where a picture is worth a thousand
         | words. This excellent article "Visualizing Commodore 1541 Disk
         | Contents" [1] by Michael Steil about the Commodore 64 disk
         | format, includes visualizations of the magnetic flux as stored
         | on disk.
         | 
         | This bit is particularly relevant:
         | https://www.pagetable.com/docs/visualize_1541/sector.png See
         | the solid bit at the end of the sector, just before the next
         | header? You _could_ squeeze a few more bytes in there, but if
         | the drive motor is just slightly too fast, it'll overwrite the
         | next sector. That's why there's a gap, tolerance for timing
         | variation.
         | 
         | Most floppy drive technologies wrote blindly, guessing where
         | they were on the disk based on timing estimates since the
         | controller last saw a sector header. This is also why disks
         | needed to be "formatted". Not just in the sense of writing the
         | file system data structures, but writing out all the sector
         | headers. This had to be done all at once with the same drive,
         | due to those small timing variations.
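
To put rough numbers on the gap described above, here is a minimal sketch. It assumes the controller writes at a fixed bit rate, so a drive spinning faster than the one that formatted the disk spreads the same bytes over more of the track. The function name, the 1.5% speed error and the 18 bytes of per-field overhead are illustrative assumptions, not figures from the article or the thread.

```python
import math

def overrun_bytes(field_bytes: int, speed_error: float) -> int:
    """Byte-cells by which a rewritten field overshoots its formatted slot.

    speed_error is the fractional speed difference between the writing
    drive and the formatting drive (0.015 means 1.5% faster).
    """
    if speed_error <= 0:
        return 0  # a slower drive writes a shorter field, so no overrun
    return math.ceil(field_bytes * speed_error)

# Hypothetical numbers: a 512-byte data field plus roughly 18 bytes of
# sync, address mark and CRC, rewritten by a drive running 1.5% fast.
print(overrun_bytes(512 + 18, 0.015))
```

Under those assumptions the gap has to absorb only a handful of byte-cells per sector, which is why squeezing "a few more bytes" into it is tempting but risky.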
         | 
         | [1] https://www.pagetable.com/?p=1070
        
           | examiga500 wrote:
           | Amiga had floppy drives that could read/write 1.76MB on HD
           | disks and 880K on DD floppies. I think this was possible
           | because they could control the speed.
        
             | basementcat wrote:
             | Mac drives had finer control over the motor RPM. Amiga
             | drives read/wrote a track at a time and had no sector gaps.
             | 
             | https://porterolsen.wordpress.com/2016/06/15/accessing-mac-f...
             | https://c65gs.blogspot.com/2023/10/reading-amiga-disks-in-me....
        
             | rasz wrote:
             | The only HD drive ever available for Amiga was sold with
             | some 4000 units - a modded Chinon FZ357A spinning at half
             | rpm because nobody at commodore knew how to update PLL
             | circuit in Denise. 1.76MB capacity was reached by not using
             | standard PC format.
             | 
             | Microsoft itself was shipping software on ordinary PC
             | floppies formatted for 1.68MB
             | https://en.wikipedia.org/wiki/Distribution_Media_Format
        
               | Cockbrand wrote:
               | My A3000 has an HD floppy drive. I was surprised to find
               | that out, as I hadn't been aware that Amigas with these
               | drives existed.
        
           | brk wrote:
           | I recall it was also not unheard of to have floppy drives
           | that could be incompatible with each other. A drive that was
           | a tad slow might format a disk that would work for itself,
           | and other drives, but that disk might not work in a drive that
           | was a tad too fast (and vice versa). This wasn't common; it
           | happened just often enough to be baffling every time,
           | particularly in an office with lots of PCs.
           | 
           | Then there were things like Spiradisc
           | (https://en.wikipedia.org/wiki/Spiradisc), which created
           | incompatibilities by design.
        
             | jacquesm wrote:
             | This happened with tape drives too. Head alignment or track
             | alignment a little bit off and you'd start to lose the high
              | frequencies pretty quickly.
        
         | Aachen wrote:
         | A 2M utility allows storing 1.8M? That's not confusing at all!
        
         | hinkley wrote:
         | I don't think I ever got a stable disk above 1.6MB. Which was
         | just enough for a few things but generally not worth it.
         | 
         | The motors in the disk drives could be controlled directly, and
         | you could pack the tracks tighter by stepping the motor just a
         | little bit less than you were supposed to. And in theory if you
         | did it right, other disk drives could read it.
         | 
         | 'In theory' is carrying a lot there. I tended to find
         | 1.5something to 1.6something worked and anything higher rarely
         | ever did.
        
           | rasz wrote:
           | Microsoft had no problem with 1680KB
           | https://en.wikipedia.org/wiki/Distribution_Media_Format
        
           | scoot wrote:
           | Could you really step less than a standard track? I would
           | have assumed that the stepper motor steps are track aligned,
           | so either you step, or you don't...
        
             | NegativeLatency wrote:
             | I'd imagine it's very hardware specific
        
             | rasz wrote:
             | No, no standard Shugart interface compatible floppy allows
             | you to do that.
        
         | sedatk wrote:
         | Microsoft was able to distribute Windows 95 in fewer floppies
         | by creating their own floppy disk format called DMF that used
         | more sectors per track.
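
The capacities quoted throughout this thread follow from simple geometry arithmetic. The sketch below assumes 80 tracks, 2 sides and 512-byte sectors for every layout; the sectors-per-track counts (18 for the standard PC format, 21 for DMF, 11 and 22 for Amiga DD and HD) are the commonly documented values, not numbers stated in the comments.

```python
BYTES_PER_SECTOR = 512

def formatted_kib(tracks: int, sides: int, sectors_per_track: int) -> int:
    """Formatted capacity in KiB for a fixed-geometry floppy layout."""
    return tracks * sides * sectors_per_track * BYTES_PER_SECTOR // 1024

# (tracks, sides, sectors per track); assumed geometries, see lead-in.
layouts = {
    "PC 1.44MB (18 sectors/track)": (80, 2, 18),
    "DMF (21 sectors/track)": (80, 2, 21),
    "Amiga DD (11 sectors/track)": (80, 2, 11),
    "Amiga HD (22 sectors/track)": (80, 2, 22),
}
for name, geometry in layouts.items():
    print(f"{name}: {formatted_kib(*geometry)} KiB")
```

This gives 1440, 1680, 880 and 1760 KiB, matching the 1680KB, 880K and 1.76MB figures mentioned elsewhere in the thread.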
        
       | ComputerGuru wrote:
       | Interesting and fun project! I found the MFM encoding page
       | particularly enlightening as it explained why you have to write a
       | full sector at a time on a floppy, even though there's nothing
       | _physically_ constraining you to that so far as I could see on
       | the electromechanical/hardware side of things.
       | 
       | And on that page the "make sure the compiler didn't inject 10,000
       | lines of boundary checks" bit told me everything I needed to know
       | about what language the project was written in :lol: - here's the
       | link to the driver:
       | https://github.com/SharpCoder/floppy-driver-rs
       | 
       | (Side note: I'm glad to see the Teensy continuing to get love; I
       | adopted it back when it was at v1 and v2 as it was just such a
       | complete no-brainer of a better choice than the Arduino stack
       | everyone was using back then. I think now there's even an
       | Arduino-on-Teensy software stack, but I've moved to just using
       | STM32 directly even for just fun home hacks and have greatly
       | enjoyed coding for that target in rust.)
        
         | jacquesm wrote:
         | There is also Arduino on Raspberry Pi. The Arduino IDE is a bit
         | annoying but the compatibility between platforms is really nice
         | to see and makes a lot of boards a drop-in replacement for each
         | other if you run out of a particular resource or need some
         | other capability.
        
       | sked64 wrote:
       | super cool very retro
        
       | dusted wrote:
       | side select.. so.. that basically flips between upper or lower
       | read/write heads..
       | 
       | That wire though..
       | 
       | It seems they could have gotten twice the speed by having two
       | read and two write pins, at the cost of one additional pin.
        
         | zaxomi wrote:
         | The first floppy from 1967 only had one side. Adding one signal
         | to select side was an easy solution to increase the capacity
         | without too much modification of the controller. Adding the
         | capability to read from both heads at the same time would
         | require much more modifications, and more memory.
        
           | FullyFunctional wrote:
           | Undeniably true; so much of computing hardware (and software)
           | looks archaic and bizarre because it's the product of a long
           | chain of backwards compatible changes (don't get me started
           | on ATAPI).
           | 
           | But that aside, like dusted I too wonder just _how_ hard it
           | would have been for a company like, say Apple, to demand the
           | extra circuit. Might not have been worth it for just 2X
           | speedup.
           | 
           | Now I want to build a 10X floppy RAID ...
        
       | theamk wrote:
       | Interesting how author switches to assembly language for more
       | precise reading, but keeps the "read_data" method as a separate
       | non-assembly function. That introduces lots of branching in the
       | code which is busy-loop-based and branch prediction is not what
       | I'd want for consistent timing. It also introduces un-needed
       | dependency: what if the next version of compiler changes the
       | code? All timings (which are based on cycle-counting) will be
       | off.
       | 
       | That said, Teensy 4.0 is 600 MHz ARM cpu, so there are 1000
       | cycles even between the shortest transitions.. some overhead is
       | fine, the project is not exactly cpu-starved.
       | 
       | I also wonder if author has considered using a peripheral for
       | precise signal capture? Something like timer in capture mode
       | feeding into DMA buffer would allow hardware signal capture with
       | very high precision and without any dependencies on exact
       | instructions emitted.
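
As a rough illustration of the capture idea above: once a timer in capture mode has dumped edge timestamps into a buffer, turning them into pulse widths is plain modular subtraction. This is a host-side sketch in Python, not driver code; the 16-bit counter width and the buffer contents are hypothetical.

```python
def capture_intervals(captures, counter_bits=16):
    """Turn free-running timer capture values into tick intervals.

    Subtraction is done modulo 2**counter_bits so a counter that wraps
    between two edges still yields the right interval, as long as edges
    are less than one full counter period apart.
    """
    modulus = 1 << counter_bits
    return [(b - a) % modulus for a, b in zip(captures, captures[1:])]

# Hypothetical capture buffer from a 16-bit timer; the counter wraps
# between the third and fourth edge.
buffer = [65000, 65200, 65500, 164, 364]
print(capture_intervals(buffer))  # [200, 300, 200, 200]
```

The intervals depend only on the timer clock, not on how many instructions the compiler emitted around the read, which is the point of the suggestion.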
        
         | naitgacem wrote:
         | I think judging by the title and the mention of bit banging,
         | the aim of this isn't to get a robust reliable thing going on.
         | I find hacking things together like this to be really fun and
         | doesn't feel like a job. sort of entertainment. but that's just
         | me perhaps.
        
           | theamk wrote:
           | So do I, but that's all the more reason to keep things robust,
           | no?
           | 
           | In my work projects, I can use the dangerous code like this -
           | because we have compilers and libraries frozen, unit and
           | integration tests, a complex testing process. We can do all
           | the right efforts to ensure the things work, even if solution
           | is intrinsically unreliable.
           | 
           | In my personal projects I write some stuff and start using
           | it, the testing is minimal, and the toolchain version is
           | "whatever platformio decided to pull up today". I'd hate for
           | my project to break just because I rebuilt it to add the new
           | feature and meanwhile my compiler got upgraded. So I'd
           | definitely abuse SPI port or something to get things
           | reliable.
        
         | chasd00 wrote:
         | > Teensy 4.0 is 600 MHz ARM cpu, so there are 1000 cycles even
         | between the shortest transitions.
         | 
         | can you explain this a bit more? When you say "transition" are
         | you talking about an individual transistor moving from on to
         | off or vice versa?
        
           | theamk wrote:
           | "transition" is signal changing, from high to low or from low
           | to high.
           | 
           | As described in the page, there are multiple signals changing
           | when operating floppy ("track 0", "write gate", "data",
           | etc..). Of them the fastest one is "data", so that's what I
           | am going to focus on.
           | 
           | The 2nd page of writeup [0] says:
           | 
           |   A short transition (S) will nominally have 2us between
           |   bits, and represents 0b10
           |   A medium transition (M) will nominally have 3us between
           |   bits, and represents 0b100
           |   A long transition (L) will nominally have 4us between
           |   bits, and represents 0b1000
           | 
           | So we are looking at 2, 3 or 4 microseconds between bits. To
           | get this in CPU cycles, you multiply this by clock frequency
           | - google can help you with units, searching for "2
           | microseconds * 600 megahertz" [1] shows the answer, 1200,
           | right away. I've rounded this down to 1000, as there are two
           | transitions per pulse and it is all very approximate anyway.
           | 
           | And then you use your embedded knowledge to assign meaning to
           | the number: the CPU is ARM, so 1 instruction/cycle is a good
           | approximation (it could be more due to dual-issue or less due
           | to jumps). So you have like 1000 instructions. Each function
           | call in language like C or C++ might be a 5-20 instructions
           | overhead, and you probably want to read that pin at least 10
           | times to detect both transitions. The tightest loop is also
           | going to be a dozen instructions or less (read gpio, mask,
           | compare, maybe jump out, increase, compare timeout, loop)
           | 
           | So.. you can do it in C/C++ easily if your main loop involves
           | no function calls (and you have no interrupts). If you use
           | functions to read, your timing is going to be tight and those
           | functions had better be super-optimized; you will be asking
           | your compiler for a lot. Higher level languages like
           | lua/micropython are out of the question (at least for that
           | loop). And as I learned from reading this, rust is also out
           | of the question, although I wonder if there are some unsafe
           | primitives which do not do any checking.
           | 
           | (and yes, there are transistors changing in the background
           | all the time throughout the process, but I really don't care
           | much about them; they are at too low an abstraction
           | level)
           | 
           | [0] https://floppy.cafe/mfm.html
           | 
           | [1] https://www.google.com/search?q=2+microseconds+*+600+mega
           | her...
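
The cycle-budget arithmetic above can be written out as a small sketch. The 600 MHz clock and the 2/3/4 us nominal intervals come from the thread and the quoted write-up; the function names and the halfway thresholds between buckets are assumptions made for this example, and the real driver may bucket intervals differently.

```python
CPU_HZ = 600_000_000  # Teensy 4.0 core clock, as stated in the thread

def cycles_for(microseconds: float, cpu_hz: int = CPU_HZ) -> int:
    """CPU cycles that elapse during the given interval."""
    return round(microseconds * cpu_hz / 1_000_000)

def classify(interval_us: float) -> str:
    """Bucket a measured pulse interval as S, M or L.

    Thresholds sit halfway between the nominal 2/3/4 us values; that
    midpoint choice is an assumption made for this example.
    """
    if interval_us < 2.5:
        return "S"
    if interval_us < 3.5:
        return "M"
    return "L"

for nominal in (2, 3, 4):
    print(nominal, "us ->", cycles_for(nominal), "cycles")
print([classify(t) for t in (1.99, 2.01, 3.1, 3.9)])
```

With midpoint thresholds, a pulse can drift by almost half a microsecond (about 300 cycles at 600 MHz) before it lands in the wrong bucket, which is why some loop overhead is tolerable here.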
        
             | dragontamer wrote:
             | > And then you use your embedded knowledge to assign
             | meaning to the number: the CPU is ARM, so 1
             | instruction/cycle is a good approximation (it could be more
             | due to dual-issue or less due to jumps). So you have like
             | 1000 instructions. Each function call in language like C or
             | C++ might be a 5-20 instructions overhead, and you probably
             | want to read that pin at least 10 times to detect both
             | transitions. The tightest loop is also going to be a dozen
             | instructions or less (read gpio, mask, compare, maybe jump
             | out, increase, compare timeout, loop)
             | 
             | Nit: That's definitely the wrong approach though IMO.
             | 
             | So you want to accomplish two things:
             | 
             | 1. Clock recovery -- Figuring out the timing of a signal
             | 
             | 2. Decoding -- Figuring out what that signal means
             | 
             | These are two separate steps and should be done separately,
             | be it in code or hardware. Though advanced protocols
             | combine both into a single step, the older protocols (UART
             | / Floppy / etc. etc.) had these two concepts separated into
             | two different steps.
             | 
             | You won't have 2us between bits: but instead 2.01us or
             | 1.99us between bits, etc. etc. Clock-recovery mechanisms
             | means that even in the face of worst-case timing
             | differences, your code remains resilient.
             | 
             | Decoding is the step you've done here, but it should be
             | done after clock-recovery.
             | 
             | -------------
             | 
             | Traditional clock recovery methods are phase-locked-loops
             | (in hardware), or various XOR-loops (in software) to try
             | and figure out the timing of the clock from the 0-1 and 1-0
             | transitions.
             | 
             | ----
             | 
             | The traditional UART (ex: 9600 baud or 115200 baud) is
             | ~16-ticks per bit. (IE: a 9600 baud UART needs to look at
             | the signal 153600 times per second. A 115200 baud UART
             | needs to look at the signal 1843200 times per second). The
             | 16-times per bit helps you "center your aim" for the
             | transition. You then typically aim at the center-3
             | timeslots (ex: count number 7, 8, and 9) for when to send
             | and/or read the signal.
             | 
             | --------
             | 
             | That being said, your analysis for "how many instructions
             | you have per timeslice to read the data" is correct. I just
             | feel like adding that the clock-recovery portion needs to
             | be definitely addressed.
        
               | dfox wrote:
               | The idea there is that if you measure the timing between
               | transitions precisely enough you do not have to do a real
               | clock recovery. The FDD motors seem to be precise and
               | stable enough (after some spin-up time) that this
               | approach works and IIRC even many HW FDCs do something
               | similar internally. But at the same time the low-level
               | format is clearly designed to make some kind of PLL-based
               | clock recovery scheme possible.
               | 
               | After all that is what the FM in MFM implies. There is an
               | obvious parallel with the simplest approach to
               | demodulating FSK (or for that matter DTMF) in digital
               | domain, which works by counting/timing zero transitions
               | of the signal.
               | 
               | The UART receivers are similar in that there is no clock
               | recovery, with the assumption that the clock is stable
               | enough that any kind of frequency error or drift will be
               | insignificant for the relatively short (usually 10bit)
               | frame. The oversampling is there to align the sample
               | point with middle of the symbol and the majority voting
               | from multiple samples serves to average out effects of
               | spiky noise that may be superimposed on the signal.
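
To see why timing the gaps between transitions is enough to decode the stream, it helps to look at what MFM encoding produces. The sketch below uses the standard MFM rule (a clock cell is 1 only between two 0 data bits); the helper names and the sample bit pattern are made up for illustration.

```python
def mfm_encode(data_bits, prev_bit=0):
    """Encode data bits as MFM cells (clock cell, then data cell).

    A clock cell is 1 only when both the previous and the current data
    bit are 0, which keeps consecutive 1 cells 2 to 4 cells apart.
    """
    cells = []
    for bit in data_bits:
        cells.append(1 if (prev_bit == 0 and bit == 0) else 0)
        cells.append(bit)
        prev_bit = bit
    return cells

def gaps_between_ones(cells):
    """Distances, in cells, between successive flux transitions (1 cells)."""
    ones = [i for i, c in enumerate(cells) if c == 1]
    return [b - a for a, b in zip(ones, ones[1:])]

bits = [1, 0, 1, 1, 0, 0, 0, 1]
cells = mfm_encode(bits)
print(cells)
print(gaps_between_ones(cells))  # [4, 2, 3, 2, 3]
```

Every gap is 2, 3 or 4 cells long. Assuming a 1 us cell time on an HD disk, those are the 2, 3 and 4 us short, medium and long transitions the write-up describes, so a receiver only has to tell three pulse widths apart.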
        
               | dragontamer wrote:
               | Pretty much all UART receivers I know of perform either
               | 16x or 8x sampling to figure out where the start and end
               | of bit-transitions are located. This is the clock-
               | recovery mechanism.
               | 
               | You need to discover the edges of the clock, and make
               | sure you read _AWAY_ from those edges. The bits are not
               | well defined on the clock edges. Even with a 100%
               | accurate clock, if you're reading on the edges you'll be
               | very unreliable.
               | 
               | UARTs aim to read on the "center" of bits. (If there are
               | 16x reads per bit, then the "center" is on reads 7, 8,
               | and 9). You'll want to stay away from reading on
               | timeslot#1 or timeslot#16.
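
The 16x oversampling scheme discussed in this subthread can be sketched as a small simulation. It assumes an 8N1 frame, an idle-high line, and a receiver that re-aligns on the falling edge of the start bit and then majority-votes samples 7, 8 and 9 of each bit. All names are hypothetical, and real UART hardware differs in detail.

```python
OVERSAMPLE = 16

def receive_byte(samples):
    """Decode one 8N1 frame from a line sampled 16 times per bit.

    Finds the first falling edge (start bit), then majority-votes
    samples 7, 8 and 9 of each following bit period, counting from 1.
    """
    start = next(i for i in range(1, len(samples))
                 if samples[i - 1] == 1 and samples[i] == 0)
    value = 0
    for bit_index in range(8):
        base = start + (bit_index + 1) * OVERSAMPLE  # skip the start bit
        votes = samples[base + 6] + samples[base + 7] + samples[base + 8]
        if votes >= 2:
            value |= 1 << bit_index  # UART sends the LSB first
    return value

def make_frame(byte):
    """Idle line, start bit, 8 data bits LSB first, stop bit."""
    bits = [1, 0] + [(byte >> i) & 1 for i in range(8)] + [1]
    return [b for b in bits for _ in range(OVERSAMPLE)]

line = make_frame(0x5A)
line[16 + 16 + 7] ^= 1  # inject a one-sample glitch into data bit 0
print(hex(receive_byte(line)))  # 0x5a
```

The injected glitch lands on one of the three voted samples, and the majority vote still recovers the right bit. Reading away from the bit edges is what makes the scheme tolerant of small clock mismatches over a 10-bit frame.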
        
         | dfox wrote:
         | It would be a good match for some kind of input capture
         | peripheral (the MCU does not have input capture per-se but it
         | is a Cortex-M7 so you can certainly build that out of
         | interrupt matrix and some clever configuration of DMA engine)
         | or for abusing SPI USRT if it can support such extremely long
         | frames (which it probably can).
        
         | LegionMammal978 wrote:
         | > That introduces lots of branching in the code which is busy-
         | loop-based and branch prediction is not what I'd want for
         | consistent timing.
         | 
         | I don't see how so basic a call would create conditional
         | branches that would have to be predicted. Calling the function
         | is an unconditional branch-and-link, after which it should just
         | be doing a load and returning. (It's LLVM with the equivalent
         | of -O2, it's not going to be doing anything weird.) Unless the
         | return address isn't cached in these processors?
        
       | maaarghk wrote:
       | Could be useful / fun to make a USB floppy drive which supports
       | non standard layouts like Commodore and AKAI.
        
         | EvanAnderson wrote:
         | A couple of examples:
         | 
         | - https://decromancer.ca/greaseweazle/
         | 
         | -
         | https://www.cbmstuff.com/index.php?route=product/product&pro...
         | 
         | I think it'd be interesting to connect a high sensitivity /
         | resolution sampling probe directly to the analog output of the
         | drive heads. You could do software-defined signal processing to
         | potentially recover damaged data. These USB-based tools are
         | getting the signal after being amplified in the analog domain
         | and processed by the drive's electronics.
        
       | jacquesm wrote:
       | The Teensy is an incredibly powerful platform for its size. I
       | never cease to be amazed at what people manage to get out of
       | them.
        
       | vardump wrote:
       | Would be pretty cool to get this working on a bit-banging
       | monster, like RP2040 (Raspberry Pi Pico).
       | 
       | Just a few bucks and sports a 12 Mbit/s USB interface (and wifi
       | for pico-w).
        
         | zozbot234 wrote:
         | Take a look at the Greaseweazle
         | https://github.com/keirf/greaseweazle to see what a really
         | high-end solution in this space looks like. It's intended as a
         | from scratch alternative to the better-known KryoFlux.
        
       | quijoteuniv wrote:
       | I love fun with flags!
        
       | gwbas1c wrote:
       | > To enable the motor, pull this pin LOW and then wait 500ms.
       | 
       | No wonder floppies were sooo daaarn sloooooow.
        
         | mras0 wrote:
         | Once the drive is spinning that doesn't matter though. Floppies
         | are slow by modern standards mostly because you only get a new
         | (decoded) bit around every 2nd (or for DD drives 4th)
         | microsecond, and the drive takes some time stepping to the next
         | cylinder (track).
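
The bit-rate limit above translates into throughput as follows. This is a back-of-the-envelope sketch: the 2 us and 4 us per bit figures come from the comment, while the 300 RPM spindle speed and the 80-track, 2-side geometry are the usual values for PC 3.5" drives and are assumptions here.

```python
def raw_throughput_bytes_per_s(us_per_bit: float) -> float:
    """Decoded data rate when one data bit arrives every us_per_bit microseconds."""
    return 1_000_000 / us_per_bit / 8

def full_disk_seconds(tracks=80, sides=2, rpm=300):
    """Lower bound for reading every track once, one revolution per track.

    Ignores head stepping and settle time; 300 RPM is the usual figure
    for PC 3.5" drives and is an assumption here, not from the thread.
    """
    seconds_per_rev = 60 / rpm
    return tracks * sides * seconds_per_rev

print(raw_throughput_bytes_per_s(2))  # HD: 62500.0 bytes/s
print(raw_throughput_bytes_per_s(4))  # DD: 31250.0 bytes/s
print(full_disk_seconds())            # about 32 seconds
```

Even with perfect software, an HD floppy tops out around 62 KB/s before gaps and headers are subtracted, and a full read cannot take less than roughly half a minute.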
        
       | userbinator wrote:
       | _Fun fact! While I was developing my driver, I ruined many entire
       | tracks by leaving this open for too long._
       | 
       | The write gate basically turns on the electromagnet in the head,
       | which will do exactly what you'd expect that to. Early floppy
       | drives' documentation actually came with schematics which show
       | this more clearly.
       | 
       | Early hard drives based on ST-506 also have a very similar
       | interface.
        
       | anotherevan wrote:
       | One of my first jobs in the early 90's was to write a device
       | driver for a floppy disk drive in an embedded system. There was
       | the drive itself, the floppy disk controller chip, and the direct
       | memory access (DMA) chip. I only had the specs for the latter two
       | in English.
       | 
       | Analysing the circuits, I saw the controller chip was wired such
       | that use of the DMA chip was software configurable, so I thought
       | beaut, I'll write and test the first iteration without DMA, then
       | add and test that after.
       | 
       | Couldn't get it working. Scratched my head for a while until,
       | while discussing it with one of the hardware engineers, I was told
       | that while the controller chip had been wired software
       | configurable, the floppy drive itself was hard-wired to use DMA.
       | If only I had the spec for that I would have figured it out!
       | 
       | So added usage of the DMA chip... success!
        
       | EvanAnderson wrote:
       | The other side of this coin is emulating a floppy drive in
       | software: https://github.com/keirf/flashfloppy
        
       ___________________________________________________________________
       (page generated 2023-12-19 23:00 UTC)