[HN Gopher] Bit banging a 3.5" floppy drive
___________________________________________________________________
Bit banging a 3.5" floppy drive
Author : brk
Score : 124 points
Date : 2023-12-19 18:09 UTC (4 hours ago)
(HTM) web link (floppy.cafe)
(TXT) w3m dump (floppy.cafe)
| wolpoli wrote:
| > Fun fact! floppy disks actually contain a lot more surface area
| than 1.44mb. By my calculation, you'll get closer to 1.70mb but a
| lot of that extra space is earmarked for synchronization barriers
| and sector / track metadata.
|
| This explains the 2M utility that allowed storing about 1.8mb on
| a floppy disk. It was fun playing with it.
| pgeorgi wrote:
| https://www.os2museum.com/wp/the-xdf-diskette-format/ has tons
| of details on IBM's contemporary and somewhat similar format.
| retrac wrote:
| I think this is a case where a picture is worth a thousand
| words. This excellent article "Visualizing Commodore 1541 Disk
| Contents" [1] by Michael Steil about the Commodore 64 disk
| format, includes visualizations of the magnetic flux as stored
| on disk.
|
| This bit is particularly relevant:
| https://www.pagetable.com/docs/visualize_1541/sector.png See
| the solid bit at the end of the sector, just before the next
|   header? You _could_ squeeze a few more bytes in there, but if
|   the drive motor is just slightly too fast, it'll overwrite the
| next sector. That's why there's a gap, tolerance for timing
| variation.
|
| Most floppy drive technologies wrote blindly, guessing where
| they were on the disk based on timing estimates since the
| controller last saw a sector header. This is also why disks
| needed to be "formatted". Not just in the sense of writing the
| file system data structures, but writing out all the sector
| headers. This had to be done all at once with the same drive,
| due to those small timing variations.
|
| [1] https://www.pagetable.com/?p=1070
| examiga500 wrote:
| Amiga had floppy drives that could read/write 1.76MB on HD
|     disks and 880K on DD floppies. I think this was possible
| because they could control the speed.
| basementcat wrote:
| Mac drives had finer control over the motor RPM. Amiga
| drives read/wrote a track at a time and had no sector gaps.
|
| https://porterolsen.wordpress.com/2016/06/15/accessing-
| mac-f... https://c65gs.blogspot.com/2023/10/reading-amiga-
| disks-in-me....
| rasz wrote:
| The only HD drive ever available for Amiga was sold with
| some 4000 units - a modded Chinon FZ357A spinning at half
| rpm because nobody at commodore knew how to update PLL
| circuit in Denise. 1.76MB capacity was reached by not using
| standard PC format.
|
| Microsoft itself was shipping software on ordinary PC
| floppies formatted for 1.68MB
| https://en.wikipedia.org/wiki/Distribution_Media_Format
| Cockbrand wrote:
| My A3000 has an HD floppy drive. I was surprised to find
| that out, as I hadn't been aware that Amigas with these
| drives existed.
| brk wrote:
| I recall it was also not unheard of to have floppy drives
| that could be incompatible with each other. A drive that was
| a tad slow might format a disk that would work for itself,
|     and other drives, but that disk might not work in a drive that
|     was a tad too fast (and vice-versa). This wasn't common, but it
|     happened just often enough to always be baffling, particularly
|     in an office with lots of PCs.
|
| Then there were things like Spiradisc
| (https://en.wikipedia.org/wiki/Spiradisc), which created
| incompatibilities by design.
| jacquesm wrote:
| This happened with tape drives too. Head alignment or track
| alignment a little bit off and you'd start to lose the high
| pretty quickly.
| Aachen wrote:
| A 2M utility allows storing 1.8M? That's not confusing at all!
| hinkley wrote:
| I don't think I ever got a stable disk above 1.6MB. Which was
| just enough for a few things but generally not worth it.
|
| The motors in the disk drives could be controlled directly, and
| you could pack the tracks tighter by stepping the motor just a
| little bit less than you were supposed to. And in theory if you
| did it right, other disk drives could read it.
|
| 'In theory' is carrying a lot there. I tended to find
| 1.5something to 1.6something worked and anything higher rarely
| ever did.
| rasz wrote:
| Microsoft had no problem with 1680KB
| https://en.wikipedia.org/wiki/Distribution_Media_Format
| scoot wrote:
| Could you really step less than a standard track? I would
| have assumed that the stepper motor steps are track aligned,
| so either you step, or you don't...
| NegativeLatency wrote:
| I'd imagine it's very hardware specific
| rasz wrote:
| No, no standard Shugart interface compatible floppy allows
| you to do that.
| sedatk wrote:
| Microsoft was able to distribute Windows 95 in fewer floppies
|   by creating their own floppy disk format called DMF that used
|   more sectors per track.
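|
|   A rough sketch of the capacity arithmetic behind these formats,
|   assuming the usual 80 cylinders, 2 heads and 512-byte sectors. The
|   function name is illustrative, and the sectors-per-track figures
|   (18 for the standard 1.44MB format, 21 for DMF, 11 and 22 for Amiga
|   DD and HD) are assumptions added here, not taken from the thread:

```python
def formatted_capacity_kib(sectors_per_track, cylinders=80, heads=2, sector_bytes=512):
    """Return the formatted data capacity in KiB for a given disk geometry."""
    total_bytes = cylinders * heads * sectors_per_track * sector_bytes
    return total_bytes // 1024

# Sectors per track for each format; these values are assumptions for illustration.
FORMATS = {
    "PC 1.44MB": 18,
    "DMF": 21,
    "Amiga DD": 11,
    "Amiga HD": 22,
}

for name, sectors in FORMATS.items():
    print(f"{name}: {formatted_capacity_kib(sectors)} KiB")
```

|   Under these assumptions the only thing that changes between the
|   standard format and DMF is the number of sectors squeezed onto each
|   track; the cylinder count and sector size stay the same.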
| ComputerGuru wrote:
| Interesting and fun project! I found the MFM encoding page
| particularly enlightening as it explained why you have to write a
| full sector at a time on a floppy, even though there's nothing
| _physically_ constraining you to that so far as I could see on
| the electromechanical/hardware side of things.
|
| And on that page the "make sure the compiler didn't inject 10,000
| lines of boundary checks" bit told me everything I needed to know
| about what language the project was written in :lol: - here's the
| link to the driver:
| https://github.com/SharpCoder/floppy-driver-rs
|
| (Side note: I'm glad to see the Teensy continuing to get love; I
| adopted it back when it was at v1 and v2 as it was just such a
| complete no-brainer of a better choice than the Arduino stack
| everyone was using back then. I think now there's even an
| Arduino-on-Teensy software stack, but I've moved to just using
| STM32 directly even for just fun home hacks and have greatly
| enjoyed coding for that target in rust.)
| jacquesm wrote:
| There is also Arduino on Raspberry Pi. The Arduino IDE is a bit
| annoying but the compatibility between platforms is really nice
| to see and makes a lot of boards a drop-in replacement for each
| other if you run out of a particular resource or need some
| other capability.
| sked64 wrote:
| super cool very retro
| dusted wrote:
| side select.. so.. that basically flips between upper or lower
| read/write heads..
|
| That wire though..
|
| It seems they could have gotten twice the speed by having two
| read and two write pins, for just one additional pin.
| zaxomi wrote:
| The first floppy from 1967 only had one side. Adding one signal
| to select side was an easy solution to increase the capacity
|   without too much modification of the controller. Adding the
| capability to read from both heads at the same time would
| require much more modifications, and more memory.
| FullyFunctional wrote:
| Undeniably true; so much of computing hardware (and software)
|     looks archaic and bizarre because it's the product of a long
| chain of backwards compatible changes (don't get me started
| on ATAPI).
|
| But that aside, like dusted I too wonder just _how_ hard it
| would have been for a company like, say Apple, to demand the
| extra circuit. Might not have been worth it for just 2X
| speedup.
|
| Now I want to build a 10X floppy RAID ...
| theamk wrote:
| Interesting how author switches to assembly language for more
| precise reading, but keeps the "read_data" method as a separate
| non-assembly function. That introduces lots of branching in the
| code which is busy-loop-based and branch prediction is not what
| I'd want for consistent timing. It also introduces un-needed
| dependency: what if the next version of compiler changes the
| code? All timings (which are based on cycle-counting) will be
| off.
|
| That said, Teensy 4.0 is 600 MHz ARM cpu, so there are 1000
| cycles even between the shortest transitions.. some overhead is
| fine, the project is not exactly cpu-starved.
|
| I also wonder if author has considered using a peripheral for
| precise signal capture? Something like timer in capture mode
| feeding into DMA buffer would allow hardware signal capture with
| very high precision and without any dependencies on exact
| instructions emitted.
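|
| A minimal sketch of the post-processing such a capture buffer would
| need, assuming a free-running 16-bit timer whose value is latched on
| every edge. The function name, the timer width and the sample buffer
| are hypothetical; no real peripheral registers are modelled here:

```python
def capture_intervals(captures, timer_bits=16):
    """Convert raw timer capture values into tick counts between edges.

    The subtraction is done modulo 2**timer_bits so that a single counter
    wraparound between two captures still yields the right interval.
    """
    modulus = 1 << timer_bits
    return [(later - earlier) % modulus
            for earlier, later in zip(captures, captures[1:])]

# Hypothetical buffer as a DMA engine might leave it: one capture per edge.
# The counter wraps past 65535 between the second and third entries.
dma_buffer = [65000, 65300, 64, 664]
print(capture_intervals(dma_buffer))
```

| The point of this design is that the CPU only ever sees timestamps,
| so the measured spacing no longer depends on which instructions the
| compiler happened to emit.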
| naitgacem wrote:
| I think judging by the title and the mention of bit banging,
| the aim of this isn't to get a robust reliable thing going on.
| I find hacking things together like this to be really fun and
| doesn't feel like a job. sort of entertainment. but that's just
| me perhaps.
| theamk wrote:
| So do I, but that's more of the reason to keep things robust,
| no?
|
| In my work projects, I can use the dangerous code like this -
| because we have compilers and libraries frozen, unit and
|     integration tests, a complex testing process. We can make all
|     the right efforts to ensure things work, even if the solution
|     is intrinsically unreliable.
|
| In my personal projects I write some stuff and start using
|     it, the testing is minimal, and the toolchain version is
| "whatever platformio decided to pull up today". I'd hate for
| my project to break just because I rebuilt it to add the new
| feature and meanwhile my compiler got upgraded. So I'd
| definitely abuse SPI port or something to get things
| reliable.
| chasd00 wrote:
| > Teensy 4.0 is 600 MHz ARM cpu, so there are 1000 cycles even
| between the shortest transitions.
|
| can you explain this a bit more? When you say "transition" are
| you talking about an individual transistor moving from on to
| off or vice versa?
| theamk wrote:
| "transition" is signal changing, from high to low or from low
| to high.
|
| As described in the page, there are multiple signals changing
| when operating floppy ("track 0", "write gate", "data",
| etc..). Of them the fastest one is "data", so that's what I
| am going to focus on.
|
|     The 2nd page of writeup [0] says:
|
|       A short transition (S) will nominally have 2us between bits,
|       and represents 0b10
|
|       A medium transition (M) will nominally have 3us between bits,
|       and represents 0b100
|
|       A long transition (L) will nominally have 4us between bits,
|       and represents 0b1000
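|
|     A minimal sketch of how the quoted spacings could be turned back
|     into bits, assuming edge-to-edge timings are already available in
|     microseconds. The function names and the 0.5us tolerance are
|     illustrative assumptions, not taken from the writeup or the
|     driver:

```python
# Nominal spacings from the quoted text, in microseconds, and the bit
# pattern each one represents.
PATTERNS = {2.0: "10", 3.0: "100", 4.0: "1000"}

def classify(interval_us, tolerance_us=0.5):
    """Map one measured spacing to its bit pattern, or None if out of range."""
    for nominal, bits in PATTERNS.items():
        if abs(interval_us - nominal) < tolerance_us:
            return bits
    return None

def decode(intervals_us):
    """Concatenate the patterns for a run of measured spacings."""
    out = []
    for interval in intervals_us:
        bits = classify(interval)
        if bits is None:
            raise ValueError(f"unexpected spacing: {interval} us")
        out.append(bits)
    return "".join(out)

# Slightly jittery short, short, medium, long, short spacings.
print(decode([2.01, 1.99, 3.02, 4.0, 2.0]))
```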
|
|     So we are looking at 2, 3 or 4 microseconds between bits. To
| get this in CPU cycles, you multiply this by clock frequency
| - google can help you with units, searching for "2
| microseconds * 600 megahertz" [1] shows the answer, 1200,
| right away. I've rounded this down to 1000, as there are two
| transitions per pulse and it is all very approximate anyway.
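|
|     The same arithmetic as a small sketch, assuming the 600 MHz clock
|     mentioned above. The function names are illustrative, and the
|     12-cycle polling loop is an assumed figure standing in for "a
|     dozen instructions" at roughly one instruction per cycle:

```python
def cycles_between(interval_us, cpu_mhz=600):
    """CPU cycles elapsing in interval_us microseconds at cpu_mhz megahertz."""
    # microseconds * megahertz cancels to a plain cycle count
    return interval_us * cpu_mhz

def polls_per_interval(interval_us, loop_cycles=12, cpu_mhz=600):
    """How many times a polling loop of loop_cycles cycles fits in the interval."""
    return cycles_between(interval_us, cpu_mhz) // loop_cycles

for label, interval_us in (("short", 2), ("medium", 3), ("long", 4)):
    print(f"{label}: {cycles_between(interval_us)} cycles, "
          f"{polls_per_interval(interval_us)} polls")
```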
|
| And then you use your embedded knowledge to assign meaning to
| the number: the CPU is ARM, so 1 instruction/cycle is a good
| approximation (it could be more due to dual-issue or less due
| to jumps). So you have like 1000 instructions. Each function
| call in language like C or C++ might be a 5-20 instructions
| overhead, and you probably want to read that pin at least 10
| times to detect both transitions. The tightest loop is also
| going to be a dozen instructions or less (read gpio, mask,
| compare, maybe jump out, increase, compare timeout, loop)
|
| So.. you can do it in C/C++ easily if your main loop involves
| no function calls (and you have no interrupts). If you use
| functions to read, your timing is going to be tight and those
|     functions had better be super-optimized, you will be asking
| your compiler for a lot. Higher level languages like
| lua/micropython are out of the question (at least for that
| loop). And as I learned from reading this, rust is also out
| of the question, although I wonder if there are some unsafe
| primitives which do not do any checking.
|
| (and yes, there are transistors changing in the background
| all the time throughout the process, but I really don't care
|     much about them, they are at too low an abstraction
|     level)
|
| [0] https://floppy.cafe/mfm.html
|
| [1] https://www.google.com/search?q=2+microseconds+*+600+mega
| her...
| dragontamer wrote:
| > And then you use your embedded knowledge to assign
| meaning to the number: the CPU is ARM, so 1
| instruction/cycle is a good approximation (it could be more
| due to dual-issue or less due to jumps). So you have like
| 1000 instructions. Each function call in language like C or
| C++ might be a 5-20 instructions overhead, and you probably
| want to read that pin at least 10 times to detect both
| transitions. The tightest loop is also going to be a dozen
| instructions or less (read gpio, mask, compare, maybe jump
| out, increase, compare timeout, loop)
|
| Nit: That's definitely the wrong approach though IMO.
|
| So you want to accomplish two things:
|
| 1. Clock recovery -- Figuring out the timing of a signal
|
| 2. Decoding -- Figuring out what that signal means
|
| These are two separate steps and should be done separately,
| be it in code or hardware. Though advanced protocols
| combine both into a single step, the older protocols (UART
| / Floppy / etc. etc.) had these two concepts separated into
| two different steps.
|
| You won't have 2us between bits: but instead 2.01us or
| 1.99us between bits, etc. etc. Clock-recovery mechanisms
|       mean that even in the face of worst-case timing
| differences, your code remains resilient.
|
| Decoding is the step you've done here, but it should be
| done after clock-recovery.
|
| -------------
|
| Traditional clock recovery methods are phase-locked-loops
| (in hardware), or various XOR-loops (in software) to try
| and figure out the timing of the clock from the 0-1 and 1-0
| transitions.
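|
|       As a rough software illustration of the idea, here is a
|       first-order tracking loop. It is a much simplified stand-in for
|       a real phase-locked loop, and everything in it is an
|       assumption: the function name, the 1us nominal bit cell, the
|       gain of 0.1 and the test data:

```python
def track_and_quantize(intervals_us, cell_us=1.0, gain=0.1):
    """Quantize edge spacings to whole bit cells while tracking drive speed.

    cell_us is the running estimate of one bit-cell period. After each
    spacing it is nudged toward the value that spacing implies, so a
    constant or slowly drifting speed error is gradually absorbed.
    """
    cells = []
    for interval in intervals_us:
        n = max(1, round(interval / cell_us))   # nearest whole number of cells
        cells.append(n)
        observed = interval / n                 # cell period implied by this spacing
        cell_us += gain * (observed - cell_us)  # first-order correction
    return cells, cell_us

# A drive running about 4% slow: every spacing is stretched by the same factor.
nominal = [2, 3, 4, 2, 2, 3, 4, 4, 2, 3]
stretched = [n * 1.04 for n in nominal]
cells, estimate = track_and_quantize(stretched)
print(cells, round(estimate, 3))
```

|       The quantized cell counts feed the decoding step, while the
|       running estimate moves from the nominal 1us toward the
|       stretched cell period of the slow drive.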
|
| ----
|
| The traditional UART (ex: 9600 baud or 115200 baud) is
| ~16-ticks per bit. (IE: a 9600 baud UART needs to look at
| the signal 153600 times per second. A 115200 baud UART
| needs to look at the signal 1843200 times per second). The
| 16-times per bit helps you "center your aim" for the
| transition. You then typically aim at the center-3
| timeslots (ex: count number 7, 8, and 9) for when to send
| and/or read the signal.
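|
|       A small sketch of the oversampling arithmetic and the
|       center-sample read, assuming 16 samples per bit. The function
|       names are illustrative, and the majority vote over the three
|       center samples is one common choice assumed here rather than
|       something every UART does:

```python
OVERSAMPLE = 16

def sample_rate_hz(baud, oversample=OVERSAMPLE):
    """Samples per second needed to look at each bit `oversample` times."""
    return baud * oversample

def read_bit(samples):
    """Majority-vote the three center samples (counts 7, 8, 9, 1-based)."""
    center = samples[6:9]          # 0-based indices 6, 7, 8
    return 1 if sum(center) >= 2 else 0

print(sample_rate_hz(9600), sample_rate_hz(115200))

# One bit period of a '1' with ragged edges and a single glitch in the middle.
noisy_one = [0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0]
print(read_bit(noisy_one))
```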
|
| --------
|
| That being said, your analysis for "how many instructions
| you have per timeslice to read the data" is correct. I just
| feel like adding that the clock-recovery portion needs to
| be definitely addressed.
| dfox wrote:
| The idea there is that if you measure the timing between
| transitions precisely enough you do not have to do a real
| clock recovery. The FDD motors seem to be precise and
| stable enough (after some spin-up time) that this
| approach works and IIRC even many HW FDCs do something
| similar internally. But at the same time the low-level
| format is clearly designed to make some kind of PLL-based
| clock recovery scheme possible.
|
| After all that is what the FM in MFM implies. There is an
| obvious parallel with the simplest approach to
| demodulating FSK (or for that matter DTMF) in digital
| domain, which works by counting/timing zero transitions
| of the signal.
|
| The UART receivers are similar in that there is no clock
| recovery, with the assumption that the clock is stable
| enough that any kind of frequency error or drift will be
| insignificant for the relatively short (usually 10bit)
| frame. The oversampling is there to align the sample
| point with middle of the symbol and the majority voting
| from multiple samples serves to average out effects of
| spiky noise that may be superimposed on the signal.
| dragontamer wrote:
| Pretty much all UART receivers I know of perform either
| 16x or 8x sampling to figure out where the start and end
| of bit-transitions are located. This is the clock-
| recovery mechanism.
|
| You need to discover the edges of the clock, and make
| sure you read _AWAY_ from those edges. The bits are not
| well defined on the clock edges. Even with a 100%
| accurate clock, if you're reading on the edges you'll be
| very unreliable.
|
| UARTs aim to read on the "center" of bits. (If there are
| 16x reads per bit, then the "center" is on reads 7, 8,
| and 9). You'll want to stay away from reading on
| timeslot#1 or timeslot#16.
| dfox wrote:
| It would be a good match for some kind of input capture
|   peripheral (the MCU does not have input capture per-se but it
|   is a Cortex-M7 so you can certainly build that out of
| interrupt matrix and some clever configuration of DMA engine)
| or for abusing SPI USRT if it can support such extremely long
| frames (which it probably can).
| LegionMammal978 wrote:
| > That introduces lots of branching in the code which is busy-
| loop-based and branch prediction is not what I'd want for
| consistent timing.
|
| I don't see how so basic a call would create conditional
| branches that would have to be predicted. Calling the function
| is an unconditional branch-and-link, after which it should just
| be doing a load and returning. (It's LLVM with the equivalent
| of -O2, it's not going to be doing anything weird.) Unless the
| return address isn't cached in these processors?
| maaarghk wrote:
| Could be useful / fun to make a USB floppy drive which supports
| non standard layouts like Commodore and AKAI.
| EvanAnderson wrote:
| A couple of examples:
|
| - https://decromancer.ca/greaseweazle/
|
| -
| https://www.cbmstuff.com/index.php?route=product/product&pro...
|
| I think it'd be interesting to connect a high sensitivity /
| resolution sampling probe directly to the analog output of the
| drive heads. You could do software-defined signal processing to
| potentially recover damaged data. These USB-based tools are
| getting the signal after being amplified in the analog domain
| and processed by the drive's electronics.
| jacquesm wrote:
| The Teensy is an incredibly powerful platform for its size. I
| never cease to be amazed at what people manage to get out of
| them.
| vardump wrote:
| Would be pretty cool to get this working on a bit-banging
| monster, like RP2040 (Raspberry Pi Pico).
|
| Just a few bucks and sports a 12 Mbit/s USB interface (and wifi
| for pico-w).
| zozbot234 wrote:
| Take a look at the Greaseweazle
| https://github.com/keirf/greaseweazle to see what a really
| high-end solution in this space looks like. It's intended as a
| from scratch alternative to the better-known KryoFlux.
| quijoteuniv wrote:
| I love fun with flags!
| gwbas1c wrote:
| > To enable the motor, pull this pin LOW and then wait 500ms.
|
| No wonder floppies were sooo daaarn sloooooow.
| mras0 wrote:
| Once the drive is spinning that doesn't matter though. Floppies
| are slow by modern standards mostly because you only get a new
| (decoded) bit around every 2nd (or for DD drives 4th)
| microsecond, and the drive takes some time stepping to the next
| cylinder (track).
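|
|   A back-of-the-envelope sketch of what those bit timings imply.
|   The function names are illustrative, and the 300 RPM spindle
|   speed and 18 sectors of 512 bytes per HD track are assumptions
|   added here, not figures from the thread:

```python
def raw_bit_rate(us_per_bit):
    """Decoded bits per second when one bit arrives every us_per_bit microseconds."""
    return 1_000_000 // us_per_bit

def track_throughput_bytes_per_s(sectors_per_track, rpm=300, sector_bytes=512):
    """User data per second when one full track is read per revolution."""
    revolutions_per_s = rpm / 60
    return sectors_per_track * sector_bytes * revolutions_per_s

print(raw_bit_rate(2), raw_bit_rate(4))       # HD and DD bit rates
print(track_throughput_bytes_per_s(18))       # HD track, assumed 18 sectors
```

|   This is a best case: it ignores the time spent stepping between
|   cylinders, which the comment above also names as a cost.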
| userbinator wrote:
| _Fun fact! While I was developing my driver, I ruined many entire
| tracks by leaving this open for too long._
|
| The write gate basically turns on the electromagnet in the head,
| which will do exactly what you'd expect that to. Early floppy
| drives' documentation actually came with schematics which show
| this more clearly.
|
| Early hard drives based on ST-506 also have a very similar
| interface.
| anotherevan wrote:
| One of my first jobs in the early 90's was to write a device
| driver for a floppy disk drive in an embedded system. There was
| the drive itself, the floppy disk controller chip, and the direct
| memory access (DMA) chip. I only had the specs for the latter two
| in English.
|
| Analysing the circuits, I saw the controller chip was wired such
| that use of the DMA chip was software configurable, so I thought
| beaut, I'll write and test the first iteration without DMA, then
| add and test that after.
|
| Couldn't get it working. Scratched my head for a while until,
| while discussing it with one of the hardware engineers, I was told
| that while the controller chip had been wired software
| configurable, the floppy drive itself was hard-wired to use DMA.
| If only I had the spec for that I would have figured it out!
|
| So added usage of the DMA chip... success!
| EvanAnderson wrote:
| The other side of this coin is emulating a floppy drive in
| software: https://github.com/keirf/flashfloppy
___________________________________________________________________
(page generated 2023-12-19 23:00 UTC)