[HN Gopher] Memory Mapping an FPGA from an STM32
___________________________________________________________________
Memory Mapping an FPGA from an STM32
Author : hasheddan
Score : 88 points
Date : 2024-07-25 14:21 UTC (8 hours ago)
(HTM) web link (serd.es)
(TXT) w3m dump (serd.es)
| Already__Taken wrote:
| A really high-level question, sorry: most of your embedded projects
| going forward are MCU+FPGA to do what? I thought a custom router, but
| 284 Mbps isn't nearly fast enough for a network.
| UncleOxidant wrote:
| It's a good question. A lot of FPGA projects I see (including
| some real life products I've looked into recently) don't really
| need an FPGA. One I was asked to evaluate recently could easily
| have been done with a microcontroller with PWM outputs. The
| frequencies involved were well under 40MHz. Yes, there were a
| couple of multiplications going on in the FPGA, but those
| could've been easily handled by a microcontroller. An RP2040
| would've sufficed instead of what they had - a microcontroller
| + an FPGA.
| azonenberg wrote:
| The projects in question include things like a 48 port
| gigabit Ethernet switch with packet datapath in the FPGA, and
| dual 10/25G SFP28 uplinks. You're not doing that on a MCU.
| Also higher end oscilloscope work (e.g. 10 Gsps 12-bit
| JESD204B)
|
| But a STM32 is more than sufficient for the management
| interface on both.
| rvense wrote:
| I think a lot of people don't fully appreciate how fast a
| modern "microcontroller" is. That 'H735 is probably faster
| than every computer I had up to and including the iBook G4 I
| used until early 2009.
| buescher wrote:
| I keep running into that also. It's like the common mental
| model of a microcontroller froze around Y2K as a sort of
| headless VIC-20. I had an _FAE_, and a good one, from a
| major supplier tell me "you can't implement a filter" on a
| low-end micro that was roughly as powerful as an early
| nineties DSP.
| rvense wrote:
| Cortex-Ms, man. A lot of 'em you just give 3.3V and a
| couple of bypass caps, and GCC will use the single-cycle
| hardware MAC (for M4 and above) if you just write the
| straight-forward C code and you can put it on there in
| 600ms with DFU. I'm a hobbyist, not an embedded wizard,
| but it really seems *pretty* good compared to what I
| understand about the old days.
|
| (Like I like retro stuff and during COVID I bought an old
| DSP56k dev board with a book about the assembly language
| but oh boy, oh dear)
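|
| A minimal sketch of the kind of "straightforward C" meant above: a
| fixed-point FIR filter whose inner loop is one multiply-accumulate
| per tap. The function name fir_q15 and the Q15 scaling are
| assumptions for illustration, not anything from the thread. Whether
| a given compiler turns the loop into hardware MAC instructions
| depends on the target and flags, so check the disassembly.
|
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Plain C FIR filter over Q15 fixed-point data (hypothetical helper).
 * coeffs[i] and samples[i] are Q15; each product is Q30 and is summed
 * in a 64-bit accumulator so the running sum cannot overflow for any
 * realistic tap count. */
int16_t fir_q15(const int16_t *coeffs, const int16_t *samples, size_t ntaps)
{
    assert(coeffs != NULL && samples != NULL);

    int64_t acc = 0;
    for (size_t i = 0; i < ntaps; i++) {
        /* One multiply-accumulate per tap. */
        acc += (int32_t)coeffs[i] * (int32_t)samples[i];
    }

    /* Scale the Q30 sum back to Q15 (division truncates toward zero). */
    acc /= 32768;

    /* Saturate instead of wrapping on overflow. */
    if (acc > INT16_MAX) acc = INT16_MAX;
    if (acc < INT16_MIN) acc = INT16_MIN;
    return (int16_t)acc;
}
```
|
| With four taps of 0.25 each (8192 in Q15) this is a moving average:
| inputs 100, 200, 300, 400 give 250.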
| buescher wrote:
| They're amazing. And - you can run a _PC emulator_ on an
| ESP32. Sure, you need the fancy ram. OK. And then people
| will tell me an ESP32 can't do things that people
| definitely were doing on bare-bones PCs in the eighties.
| 15155 wrote:
| Zynq 7010s are $2.50 and are a hell of a lot more chip than
| an RP2040. If you already have the design (or copy one of the
| 50 available), it's a good option when you don't want to
| fight the chip.
|
| PIO has extraordinarily sloppy timing (skew in all
| categories) compared to the cheapest and smallest FPGAs.
| azonenberg wrote:
| Where are you getting them for $2.50?? The XC7Z010-1CLG225C
| is $74.83 at Digikey in qty 1.
|
| Checking sketchier places Win-Source has the CLG400 package
| for $22.20 and even the cheapest aliexpress seller wants
| $4.84 for something marked as a 7Z010 that may or may not
| be legit.
|
| Also "fight the chip" is pretty much the definition of what
| I did last time I did a zynq project. Just give me a plain
| FPGA and MCU with no wizards or GUIs or automatic code
| generation.
| 15155 wrote:
| https://www.aliexpress.us/item/3256803970893483.html
|
| I've ordered trays (and they send the OEM tray) - unique
| barcodes, legit.
|
| > Just give me a plain FPGA and MCU with no wizards or
| GUIs or automatic code generation.
|
| You can pretty much cut out all of their tools and get a
| pure Yocto/Vivado TCL build for the bitstream for the 7
| series Zynqs. Very low touch.
|
| Their IO planner (in the Vivado IP integrator) is
| somewhat necessary for complex peripheral scenarios and
| is one of the few things I ever use Xilinx GUI
| applications for anymore.
| azonenberg wrote:
| The intent is for the high performance datapath to live
| entirely in FPGA (and the project you're probably thinking of
| is switching, not routing).
|
| The MCU is for control plane only. Several hundred Mbps between
| the control and data plane is more than enough for a SSH
| management CLI and poking registers on the FPGA to move a port
| to a different VLAN in response to a CLI command or add an ACL
| rule or something.
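|
| As a rough illustration of "poking registers on the FPGA", here is a
| sketch of a control-plane helper that moves a port to another VLAN
| by writing a memory-mapped register. The register layout (one 32-bit
| VLAN register per port) and every name in it are hypothetical. On
| real hardware FPGA would point at a fixed address in the MCU's
| external memory window; a static struct stands in here so the sketch
| runs on a host.
|
```c
#include <assert.h>
#include <stdint.h>

#define NUM_PORTS 48u  /* port count from the thread; layout is made up */

/* Hypothetical register block: one 32-bit VLAN ID register per port. */
typedef struct {
    volatile uint32_t port_vlan[NUM_PORTS];
} fpga_regs_t;

/* Stand-in for the memory-mapped FPGA. On a real board this would be a
 * cast of the window's base address instead of a RAM object. */
static fpga_regs_t fake_fpga;
static fpga_regs_t *const FPGA = &fake_fpga;

/* Move a port to a VLAN. Returns 0 on success, -1 on bad arguments.
 * VLAN IDs 0 and 4095 are reserved by 802.1Q, so they are rejected. */
int set_port_vlan(unsigned port, uint16_t vlan)
{
    if (port >= NUM_PORTS || vlan == 0u || vlan > 4094u) {
        return -1;
    }
    FPGA->port_vlan[port] = vlan;  /* a single 32-bit bus write */
    return 0;
}

/* Read back the VLAN currently programmed for a port. */
uint16_t get_port_vlan(unsigned port)
{
    assert(port < NUM_PORTS);
    return (uint16_t)(FPGA->port_vlan[port] & 0x0FFFu);
}
```
|
| A CLI command handler would validate its arguments and then call
| set_port_vlan(); the datapath itself never passes through the MCU.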
| dragontamer wrote:
| Embedded projects are never about doing things as fast as
| computers: we have full scale computers (and routers, and
| firewalls, and switches) for that.
|
| Embedded is about solving problems more physical in nature, as
| you are physically closer to reality in nearly all aspects.
|
| --------
|
| An MCU + FPGA project could implement... say... the VFIR IrDA
| (Infrared) protocol at 16Mbit.
|
| Traditional IrDA is widely supported at SIR and MIR levels
| (up to 1.152 Mbit/s or so). Anything faster and the equipment has
| basically been lost to the 1990s (and never was very popular
| anyway).
|
| IrDA I'd explain as a remote-controller on steroids. It's
| infrared-based (like TV remote controllers), so you need to
| line up both devices and have them looking at each other.
| Infrared can reliably travel about 3 meters over the open air
| in a variety of conditions. IrDA allows for bidirectional
| communications. It's a truly wireless protocol, albeit one that
| requires significant alignment to function correctly. But ~3
| meters is good range and practical for many applications.
|
| Nominally, you could use an entire MCU to handle the encoding /
| decoding of these light-pulses. However, that's a bit
| redundant. It's far more cost efficient to dedicate a few LUTs
| in an FPGA to the task.
|
| Yes, the MCU is needed for the final application-level / OSI
| layer 4/5/6/7 aspects of IrDA protocol. But the lowest PHY and
| MAC levels of the protocol can and (probably) should be a small
| section of FPGA.
|
| Upgrading from the standard MCU 1 Mbit/s to 16 Mbit/s would be a 16x
| improvement in throughput compared to what's readily
| available with commercial-off-the-shelf solutions. If you've
| determined that IR communication is good for whatever purpose
| you have in mind, maybe that 16x improvement is going to be
| useful.
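|
| To put the rate difference in concrete terms, a small sketch that
| computes the speed-up factor and the raw transfer time for a
| payload. The function names are made up for illustration, and the
| calculation ignores framing and protocol overhead, so real numbers
| would be somewhat worse.
|
```c
#include <assert.h>

/* Speed-up factor between two link rates given in the same units. */
double speedup(double old_rate, double new_rate)
{
    assert(old_rate > 0.0);
    return new_rate / old_rate;
}

/* Seconds needed to move `bytes` of payload at `bits_per_second`,
 * ignoring all framing and protocol overhead. */
double transfer_seconds(double bytes, double bits_per_second)
{
    assert(bits_per_second > 0.0);
    return bytes * 8.0 / bits_per_second;
}
```
|
| Using the round numbers from the comment, a 1 MB payload takes 8 s at
| 1 Mbit/s and 0.5 s at 16 Mbit/s.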
|
| ------------
|
| EDIT: The "physicality" of this is because photodiodes react
| very quickly to light pulses. And an expensive enough
| transistor can amplify that at the ~100MHz speeds needed to run
| VFIR (at least in theory. I've never done this).
|
| The FPGA (or MCU if you go that route...) just needs to clock
| at 100MHz or so, and interpret the start-of-frame and end-of-
| frame signals, while also interpreting a few other low-level
| details. Overall, this turns the sequence of light pulses into
| bits-and-bytes for higher-level processing (which code can and
| should handle).
| throwawayabcdef wrote:
| This is dope. I work with Zynq/Versal quite a bit and respect and
| understand (conceptually) the decisions you have made!
|
| You get to own every aspect of your toolchain and with that will
| come a lot of power.
|
| Are you familiar with:
|
| https://github.com/corundum/corundum
|
| Perhaps you can build a support package for your platform.
| chillingeffect wrote:
| Neat! I love that H7 chip and its gargantuan instruction
| manual... ...and you didn't even mention its 2nd core :)
| azonenberg wrote:
| H735 is one of the single core SKUs. Just a 550 MHz M7.
|
| Would not surprise me if the M4 was there and fused off (i.e.
| same die as multicore H7 offerings), but it's not active.
| duskwuff wrote:
| Probably not. The dual-core parts are DIE450 (which is shared
| with some single-core parts like the H750 series!), but
| STM32H735 is DIE483.
| azonenberg wrote:
| I have a H735 on a retired board slated for decap so we'll
| find out once I open it up.
|
| Do you know if it's fabbed in house, TSMC, or Samsung? I've
| seen ST silicon from all 3 foundries but the only thing
| I've seen stated publicly is 40nm. When I get it opened up
| it should be easy to tell, TSMC and Samsung processes have
| distinctive features on them that I recognize by sight.
| duskwuff wrote:
| No idea - I'm reading the die IDs out of the STM32Cube
| DB. I haven't looked at the silicon, but I have no reason
| to doubt what the DB says, especially since it confirms
| that a lot of allegedly different parts use the same
| dies.
| 15155 wrote:
| I recommend checking out SpinalHDL generally - I do a ton of this
| very same kind of work with these same chips (7 series, US+) and
| would never look back to Verilog!
|
| AXI (and all memory-mapped bus protocol schemes) becomes very
| very _pleasant._ SV interfaces get you 5% of the way there,
| though!
|
| Also - I was under the impression that S1000-2M is a higher-end
| material, not cost-optimized? (But not Rogers, of course.)
| azonenberg wrote:
| S1000-2 is quite cheap and lossy (Df 0.016), slightly better
| than Isola 370HR (0.021) but nowhere near the stuff I usually
| use. At my usual Chinese board house it's one of the lowest
| cost substrates available for prototypes since it's always in
| stock and there's no need to special order.
|
| For higher end digital work I typically reach for Taiwan Union
| TU872SLK (Df 0.009) which also has a better range of prepregs
| and glass styles available to help minimize fiber weave effect.
| Still quite a bit lossier than e.g. RO4350B but far less
| expensive and if you have decent equalizers on your SERDES the
| difference is typically not significant unless you're making
| some kind of humongous backplane. I get wide open eyes with
| just a tiny bit of post-cursor emphasis on the TX FFE at
| 10.3125 Gbps on TU872SLK for my typical shortish high speed
| tracks (FPGA to SFP+ cage).
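|
| For a feel of what those Df numbers mean, here is a sketch of the
| usual first-order dielectric loss estimate,
| alpha = pi * f * n_eff * Df / c (in nepers per metre), converted to
| dB. It assumes the same effective index n_eff = sqrt(er_eff) = 2.0
| for every material, a frequency-independent Df, and no conductor or
| roughness loss, none of which comes from the thread. Treat the
| output as a comparison between laminates, not a channel budget.
|
```c
#include <assert.h>

/* First-order dielectric loss of a transmission line in dB per metre.
 *   freq_hz : frequency of interest (e.g. the NRZ Nyquist frequency)
 *   n_eff   : sqrt of the effective dielectric constant (assumed value)
 *   df      : dissipation factor (loss tangent) of the laminate */
double dielectric_loss_db_per_m(double freq_hz, double n_eff, double df)
{
    assert(freq_hz >= 0.0 && n_eff >= 1.0 && df >= 0.0);

    const double c = 299792458.0;              /* speed of light, m/s */
    const double pi = 3.14159265358979323846;
    const double db_per_neper = 8.685889638;   /* 20 / ln(10) */

    return db_per_neper * pi * freq_hz * n_eff * df / c;
}

/* Convert a per-metre figure to per-inch. */
double db_per_inch(double db_per_m)
{
    return db_per_m * 0.0254;
}
```
|
| At the 5.15625 GHz Nyquist frequency of a 10.3125 Gbps NRZ link and
| under those assumptions, Df 0.016 works out to roughly 0.38 dB per
| inch and Df 0.009 to roughly 0.21 dB per inch, which is easy for an
| equalizer to absorb over a short FPGA-to-cage run.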
| 15155 wrote:
| Curious who you are using in CN for higher-speed FPGA boards,
| if you can share!
|
| I haven't seen these as directly-advertised options at any of
| my usual suspects.
| azonenberg wrote:
| Multech (multech-pcb.com) is my preferred manufacturer
| these days for high end stuff. I've done six layer HDI any-
| layer via stackups, ten layers with filled via-in-pad,
| RO4350B, TU872SLK, flex, 75 micron trace/space, etc. And
| that's nowhere near the limit of their capabilities, I just
| haven't needed higher end yet.
|
| I have some 25/100G stuff in the pipe for probably some
| time next year that I plan to make with them too.
|
| Their website undersells, I get the impression most of the
| actual sales contacts are word of mouth. I talk to my sales
| rep by Skype mostly (the alternatives are expensive
| international phone calls or WeChat).
|
| The really cool thing is that you get a 10+ page QA report
| with every order including measured
| copper/dielectric/soldermask thicknesses, hole sizes, ionic
| contamination measurements, and a ton of other metrics. And
| they send the TDR strips and polished cross section with
| every order as their way of saying "look, we actually did
| the QA, double check our measurements if you don't trust
| us". (I actually have repeated some of the measurements to
| spot-check and got results within a few percent of their QA
| department, no surprises there).
|
| And they don't make silent gerber changes or anything. They
| do a full CAM review and send you working gerbers and a
| list of suggested DFM tweaks for you to sign off before
| beginning manufacture. If something doesn't look right you
| have a chance to say "wait there's a problem".
|
| For example, one time they wanted to make a really large
| width adjustment for impedance on some RF traces that I had
| carefully modeled in an EM solver. But they didn't make a
| bad board without telling me, they flagged it on the CAM
| review and we went back and forth before realizing the
| mistake was on their end (they had calculated impedance
| assuming solder mask over the traces, while they were
| actually exposed copper). They re-ran the numbers which
| then closely matched my simulations, I signed off on the
| modified design, and the board was manufactured without
| issue.
| buescher wrote:
| Also S1000-2 is not rated/controlled past 1GHz. It shouldn't
| vary that much so for small runs the risk is minimal. But for
| volume production that's exactly the sort of thing you never
| want to have to investigate in hindsight.
| buescher wrote:
| This is really crisp work and nice to see. Before the Zynq era I
| worked with some designs that used a DSP or StrongARM along with
| a medium-sized FPGA, where the FPGA would be both the glue logic
| for RAM as well as custom peripherals, but I've been out of that
| world for a while. It would be fun to find an application for a
| big FPGA and a modern microcontroller.
| dmitrygr wrote:
| Be veeeery careful. STM32H QSPI peripheral is _FULL OF_ very
| nasty bugs, especially the second version (supports writes) that
| you find in STM32H0B chips. You are currently avoiding them by
| having QSPI mapped as device memory, but the minute you attempt
| to use it with cache or run code from it, or (god help you) put
| your stack, heap, and/or vector table on a QSPI device, you are
| in for a world of poorly-debuggable 1:1,000,000 failures. STM
| knows but refuses to publicly acknowledge, even if they privately
| admit some other customers have "hit similar issues". Issues I've
| found, demonstrated to them, and wrote reliable replications of:
|
| * non-4-byte-sized writes randomly lost about 1/million writes if
| QSPI is writeable and not cached
|
| * non-4-byte-sized writes randomly rounded up in size to 2 or 4
| bytes with garbage, overwriting nearby data about 1/million
| writes if QSPI is writeable and cached
|
| * when PC, SP, and VTOR all point to QSPI memory, any interrupt
| has about a 1/million chance of reading garbage instead of the
| proper vector from the vector table if it interrupts a LDM/STM
| instruction targeting the QSPI memory and it is cached and misses
| the cache
|
| Some of these have workarounds that I found (contact me). I am
| refusing to disclose them to STM until they acknowledge the bugs
| publicly.
|
| I recommend NOT using STM32H7 chips in any product where you want
| QSPI memory to work properly.
| azonenberg wrote:
| I have encountered issues with QSPI (mostly caused by the
| annoying prefetch queue) which is why I am switching to the FMC
| for FPGA interfacing (i.e. not using OCTOSPI). That was the
| whole point of this experiment, validating FMC as a replacement
| for my legacy OCTOSPI based MCU-APB bridge. I have a previous
| board using QSPI reliably in indirect mode (i.e. not memory
| mapped) but found it was full of pain when memory mapped
| specifically in writes. So that firmware memory maps it for
| reads but switches to indirect mode for writes. And has cache
| disabled.
|
| So far I have it working quite reliably (my test firmware does
| a loopback test with 100K reads/writes of a 32-bit register at
| the start that I had written with intent of using it for link
| training of the PLLs to optimize read/write capture timing but
| never ended up using as such) and my iperf test can push tens
| of thousands of packets per second without issue.
| 15155 wrote:
| The NXP IMXRT-series chips have a similar EMC (external
| memory controller) as well as "FlexIO" - PIO-like
| programmable IO. I've used both for this kind of FPGA
| interface without issue.
|
| The IMXRT1064 is around $7 and is also an M7 core with an HS
| USB PHY, programmable PLL-connected LVDS clock output, 2
| EMACs, excellent hardened IP generally.
| azonenberg wrote:
| I have some RT1176's in my "to try" pile.
|
| The big thing holding me back was that their crypto
| accelerators were all locked behind NDAs (a dealbreaker for
| F/OSS work) while the ST ones are documented in the freely
| downloadable datasheet you can just google up.
|
| But I did find some third party wrapper libraries that
| seemed to be able to use the crypto registers so it might
| be possible to figure things out from that. I haven't tried
| yet.
|
| The other issue I had with the RT is that they lacked
| internal flash so PCB complexity is slightly higher than
| with a STM32.
| 15155 wrote:
| > I have some RT1176's in my "to try" pile.
|
| Keep in mind the dual-core 11xx chips are a bit harder to
| boot than the rest of the line - but you probably need
| the power domain flexibility for most FPGA projects (1064
| has way fewer practically-usable 1v8 banks.)
|
| > crypto accelerators were all locked behind NDAs
|
| I've been able to use every bit of hard IP and high-
| assurance boot from registers using no vendor code
| whatsoever.
|
| Here's what you are looking for:
|
| https://github.com/JayHeng/imxrt-level2-boot/blob/master/dev...
|
| > The other issue I had with the RT is that they lacked
| internal flash
|
| The IMXRT1064 has a 4MB Winbond QSPI chip in-package, by
| the way!
|
| > PCB complexity is slightly higher than with a STM32.
|
| The Xilinx FPGA that is sitting next to your MCU incurs
| multiple orders of magnitude more PCB-complexity than a
| little QSPI flash, haha.
| dmitrygr wrote:
| > 100K reads/writes of a 32-bit register
|
| You'll hit almost no bugs if you keep accessing the same
| address in a loop. Lucky you :)
| azonenberg wrote:
| Yeah but again, we're talking about the FMC here not the
| OCTOSPI.
|
| Have you hit issues with the FMC? From what other people
| are telling me, the OCTOSPI is full of land mines and the
| FMC is pretty decent. The worst errata I've encountered so
| far is two dummy clocks with CS# asserted at the end of a
| read burst.
| mips_r4300i wrote:
| Thanks for the heads up. I have a design at fab that uses the
| H7's OctoSPI so this concerns me. I steered away from the
| memory mapped mode because it seemed too good to be true -
| wanted to be able to qsort() and put heaps in this extra space.
|
| I suspect ST only ever tested it with their single PSRAM they
| intend this mode for. My intent is to use indirect mode and
| manually poke the peripheral, though DMA will have to happen
| still.
|
| Back on the PIC32MX platform there was a similar type of bug
| that nobody else seems to have run into but me: If any interrupt
| fires while the PMP peripheral is doing a DMA, there is a 1 in
| a million chance that it will silently drop 1 byte. Noticed
| this because all my accesses were 32bit (4 bytes) and broke
| horribly at the misalignment. The solution is to disable all
| interrupts while doing DMA.
| dmitrygr wrote:
| it is worse: i think they also did not test random access. I
| suspect their test was to: fill PSRAM linearly and then read
| it back and verify linearly. Random word accesses in
| uncached mode also randomly lose writes. I am unable to
| replicate quickly _on purpose_, only randomly, so i guess it
| is under 1/100mil so it is not in my list above. My
| workarounds avoid these crashes too though.
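|
| The difference between a linear fill-and-verify and a random-access
| test can be sketched as below: random 1, 2 and 4 byte writes go to
| the region under test and to a shadow copy in ordinary RAM, and the
| two are compared at the end. This is not the author's replication
| code; the region size, the names, and the xorshift generator are all
| assumptions. On a target, dut would be the memory-mapped window cast
| to region_t; here both sides are plain RAM so it runs on a host.
|
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REGION_BYTES 256u  /* small stand-in for the mapped window */

/* Union gives byte, halfword and word views of the same memory. */
typedef union {
    uint8_t  b[REGION_BYTES];
    uint16_t h[REGION_BYTES / 2u];
    uint32_t w[REGION_BYTES / 4u];
} region_t;

/* Small deterministic PRNG so a failing run can be replayed. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Perform `iterations` random-width, random-address writes to both the
 * device under test and a shadow copy, then return the number of bytes
 * that differ. Zero means no lost or corrupted writes were observed. */
size_t random_access_test(volatile region_t *dut, region_t *shadow,
                          uint32_t iterations, uint32_t seed)
{
    assert(dut != NULL && shadow != NULL && seed != 0u);

    for (size_t i = 0; i < REGION_BYTES; i++) {
        dut->b[i] = 0u;
        shadow->b[i] = 0u;
    }

    uint32_t rng = seed;
    for (uint32_t n = 0; n < iterations; n++) {
        uint32_t kind = xorshift32(&rng) % 3u;
        uint32_t pos = xorshift32(&rng);
        uint32_t val = xorshift32(&rng);

        if (kind == 0u) {            /* 1-byte write */
            size_t i = pos % REGION_BYTES;
            dut->b[i] = (uint8_t)val;
            shadow->b[i] = (uint8_t)val;
        } else if (kind == 1u) {     /* 2-byte write */
            size_t i = pos % (REGION_BYTES / 2u);
            dut->h[i] = (uint16_t)val;
            shadow->h[i] = (uint16_t)val;
        } else {                     /* 4-byte write */
            size_t i = pos % (REGION_BYTES / 4u);
            dut->w[i] = val;
            shadow->w[i] = val;
        }
    }

    size_t mismatches = 0;
    for (size_t i = 0; i < REGION_BYTES; i++) {
        if (dut->b[i] != shadow->b[i]) {
            mismatches++;
        }
    }
    return mismatches;
}
```
|
| With failure rates around one in a million or lower, the iteration
| count has to be very large before a clean result means much.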
| azonenberg wrote:
| As far as using QSPI memory, one thing I have planned (and will
| be thoroughly testing) is using an external SPI flash as
| configuration data storage. Right now if I want to store any
| nonvolatile settings with power loss protection I need to burn
| two 128 kB erase blocks (one primary and one secondary, so I
| can ping-pong data between them and not lose anything if I have
| a power loss during a write cycle or similar) of the on-chip
| flash, space that I'd much rather use for firmware.
|
| MicroKVS expects to be able to memory map data fetches
| (uncached), but is fine with using indirect access for writes.
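|
| The two-block ping-pong scheme described above can be sketched
| roughly as follows. This is not MicroKVS's on-flash format: the
| header layout, magic value, CRC choice and function names are all
| assumptions, sequence wrap-around is ignored, and two small RAM
| arrays stand in for the 128 kB erase blocks. Reads hand back a
| pointer into the bank itself, matching the memory-mapped fetch
| model, and writes always go to the bank that is not live.
|
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BANK_MAGIC  0xC0F1C0F1u  /* hypothetical "bank is complete" marker */
#define PAYLOAD_MAX 240u         /* tiny stand-in for a 128 kB erase block */

typedef struct {
    uint32_t magic;   /* written last, so a torn write leaves it erased */
    uint32_t seq;     /* higher sequence number = newer data */
    uint32_t len;     /* payload length in bytes */
    uint32_t crc;     /* CRC-32 of the payload */
    uint8_t  payload[PAYLOAD_MAX];
} bank_t;

static bank_t banks[2];  /* RAM stand-ins for the two flash blocks */

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320). */
static uint32_t crc32(const uint8_t *p, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < n; i++) {
        crc ^= p[i];
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

static int bank_valid(const bank_t *b)
{
    return b->magic == BANK_MAGIC && b->len <= PAYLOAD_MAX &&
           b->crc == crc32(b->payload, b->len);
}

/* Index of the newest valid bank, or -1 if neither holds valid data. */
int config_active_bank(void)
{
    int best = -1;
    for (int i = 0; i < 2; i++) {
        if (bank_valid(&banks[i]) &&
            (best < 0 || banks[i].seq > banks[best].seq)) {
            best = i;
        }
    }
    return best;
}

/* Return a pointer to the live payload in place, or NULL if none. */
const uint8_t *config_load(uint32_t *len_out)
{
    int active = config_active_bank();
    if (active < 0) {
        return NULL;
    }
    if (len_out != NULL) {
        *len_out = banks[active].len;
    }
    return banks[active].payload;
}

/* Write new data to the inactive bank. Returns the bank index, or -1. */
int config_store(const void *data, uint32_t len)
{
    if (len > PAYLOAD_MAX || (data == NULL && len != 0u)) {
        return -1;
    }
    int active = config_active_bank();
    bank_t *b = &banks[(active == 0) ? 1 : 0];

    memset(b, 0xFF, sizeof *b);            /* "erase" the target block */
    if (len != 0u) {
        memcpy(b->payload, data, len);     /* payload first */
    }
    b->len = len;
    b->seq = (active < 0) ? 1u : banks[active].seq + 1u;
    b->crc = crc32(b->payload, len);
    b->magic = BANK_MAGIC;                 /* commit marker last */
    return (active == 0) ? 1 : 0;
}

/* Test hook: flip bits in a bank to mimic an interrupted write. */
void config_test_corrupt(int bank)
{
    assert(bank == 0 || bank == 1);
    banks[bank].payload[0] ^= 0xFFu;
}
```
|
| If the newest bank fails its CRC after a power loss, config_load()
| falls back to the older bank, so at most the last update is lost.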
| azonenberg wrote:
| But if I can memory map the FPGA via the FMC, I can simply
| put an APB memory mapped QSPI controller on the FPGA and
| store my config there, using the same flash for the FPGA
| bitstream as well.
|
| This saves a chip on the board, reduces the amount of PCB
| routing required, and eliminates use of the sketchy OCTOSPI
| peripheral entirely. Testing that out is on my list of things
| to do on this board eventually.
| 15155 wrote:
| I almost always include I2C EEPROM - just too cheap and
| pretty easy to route.
| azonenberg wrote:
| That can't be memory mapped, so I'd need to rewrite my
| KVS code which currently expects to be able to return a
| pointer to the raw on-flash image of the config data.
| Doable but a pain.
| mystified5016 wrote:
| What the hell is going on at ST? Every STM uC I've tried to use
| in the past few years has had showstopper bugs with loads of
| very similar complaints online dating back to the release of
| the part. Bugs that have been in the wild for _years_ and still
| exist in the current production run.
|
| After burning enough company time chasing bugs through ST's
| crappy silicon, I've had to just swear them off entirely. We're
| an Atmel house now. Significantly fewer (zero) problems, and
| some pretty nifty features like UPDI.
| hmry wrote:
| In college, our SoC design instructor told us that to pass
| the class, our modules should be better than ST's "which is
| not that high of a bar" :P
| mips_r4300i wrote:
| They churn out new parts and don't bring in fixes. See all
| the chips in their lineup that have a USB host controller.
| Every one of them (they use Synopsys IP) will fail with
| multiple LS devices through a hub. We talked to our FAE about
| this and they have no plans to fix it. The bug has existed
| for years and the bad IP is being baked into all the new
| chips still. Solution? Just use yet another chip for its host
| controller, and don't use a hub.
___________________________________________________________________
(page generated 2024-07-25 23:04 UTC)