[HN Gopher] SuperH
___________________________________________________________________
SuperH
Author : rdpintqogeogsaa
Score : 94 points
Date : 2021-12-04 11:39 UTC (2 days ago)
(HTM) web link (en.wikipedia.org)
(TXT) w3m dump (en.wikipedia.org)
| monocasa wrote:
| Easily one of my favorite archs, at least as far as the idea of
| it goes. IMO the most pure classic 5 stage RISC. Divide is split
| into each cycle's ops, iterate for as long as your results need.
| It's pretty easy to get to a point where you can just read the
| hex of the opcodes: as a three address RISC with 16 bit
| instructions and 16 registers, the first nybble is the opcode and
| the other three nybbles are the registers. And the whole thing feels
| like someone told some engineers that they have to make a RISC
| with half the gates of an equivalent MIPS and the mad lads
| actually did it, and did it well.
|
| A little ascetic for these days though, IMO.
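To make the "read the hex by nybble" idea concrete, here is a minimal C sketch that splits a 16-bit instruction word into its four nybbles. The names `nybble`, `split_insn` and `struct insn_fields` are hypothetical, and the field labels are only illustrative: in the real encoding the role of each nybble depends on the instruction, so this is a reading aid and not a decoder.

```c
#include <assert.h>
#include <stdint.h>

/* Return nybble `index` of a 16-bit instruction word, counting from the
 * most significant nybble (index 0) to the least significant (index 3). */
static unsigned nybble(uint16_t insn, unsigned index)
{
    assert(index < 4);
    return (insn >> (12 - 4 * index)) & 0xFu;
}

/* Hypothetical field names for illustration; not taken from any manual. */
struct insn_fields {
    unsigned op; /* first nybble: major opcode group */
    unsigned a;  /* second nybble */
    unsigned b;  /* third nybble */
    unsigned c;  /* fourth nybble */
};

/* Split an instruction word into the four hex digits you would read
 * off a hex dump. */
static struct insn_fields split_insn(uint16_t insn)
{
    struct insn_fields f;
    f.op = nybble(insn, 0);
    f.a  = nybble(insn, 1);
    f.b  = nybble(insn, 2);
    f.c  = nybble(insn, 3);
    return f;
}
```

Calling `split_insn(0x312C)` yields the digits 3, 1, 2 and 0xC, which is exactly what you see in the hex dump.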
| AlotOfReading wrote:
| I had a completely irrational dislike for the division. You
| could use a loop, but then you were wasting >3/4ths of your
| cycles on shifting and branching. If you had the misfortune to
| have the compiler generate division code, it'd usually fill the
| branch delay slot with a NOP as well. The solution was simply
| to stick 8/16/32 division instructions in a row, but that felt
| inelegant. I spent a day writing some silly routine that
| averaged ~22 cycles over 32 bit integers just because it
| annoyed me so much.
| freemint wrote:
| Do you still have access to this?
| AlotOfReading wrote:
| Sadly not. I was pretty terrible about version control and
| saving things from that early in my career. The concept was
| pretty simple though: division was 1-bit so you could
| simply do fewer divisions if you had smaller numbers. The
| first version was something like a BSR -> jump into the
| divide table. That probably would have been the place to
| stop, but I was clearly terrible at profiling because I
| remember adding all sorts of special case logic on top to
| handle things like powers of two, which likely ended up
| being slower in practice because cycle times aren't
| everything.
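The "one bit per step, so fewer steps for smaller numbers" idea can be sketched in C as follows. This is plain restoring shift-subtract division, swapped in for illustration; it is not the exact behaviour of the SuperH divide-step instruction and not the lost routine described above. The names `bit_length` and `udiv_steps` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of significant bits in x (0 for x == 0). */
static unsigned bit_length(uint32_t x)
{
    unsigned n = 0;
    while (x != 0) {
        n++;
        x >>= 1;
    }
    return n;
}

/* Restoring shift-subtract division producing one quotient bit per step.
 * Only as many steps are run as the dividend has significant bits, so
 * small dividends finish early. The step count is reported in *steps. */
static uint32_t udiv_steps(uint32_t dividend, uint32_t divisor,
                           uint32_t *remainder, unsigned *steps)
{
    assert(divisor != 0);

    unsigned n = bit_length(dividend);
    uint32_t quotient = 0;
    uint64_t rem = 0; /* 64-bit so the left shift cannot overflow */

    for (unsigned i = n; i-- > 0; ) {
        /* Bring down the next dividend bit, most significant first. */
        rem = (rem << 1) | ((dividend >> i) & 1u);
        quotient <<= 1;
        if (rem >= divisor) {
            rem -= divisor;
            quotient |= 1u;
        }
    }

    *remainder = (uint32_t)rem;
    *steps = n;
    return quotient;
}
```

Dividing 100 by 7 this way takes 7 steps instead of 32, because 100 has only 7 significant bits. An unrolled table of divide steps with a computed jump into it, as described above, would get the same saving without the loop overhead.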
| dragontamer wrote:
| > Divide is split into each cycle's ops, iterate for as long as
| your results need.
|
| You know what would be better? If you had a decompression
| routine built into the CPU that would convert division and
| modulus commands into this loop at the micro-code level.
|
| That way, a divide / modulus cycle (probably taking 20
| instructions taking 50+ bytes) can be compressed into a
| singular instruction (1 instruction taking 4 bytes)... using
| less L1 cache.
| monocasa wrote:
| I mean, it doesn't have any microcode, and does better than
| MIPS ironically at achieving the whole "Microprocessor
| without Interlocked Pipelined Stages" thing because of
| directly exposing the divide pipeline in single cycle units.
| Having to add pipeline interlocking from the EX stage just
| for that as well as the microcode itself would have been non
| trivial from a gate count perspective on a core that's
| already cutting gates left and right. And it's neat to be
| able to choose the divide precision you need down to the
| cycle.
|
| I agree though in the general case, hence why I end with it
| being a little ascetic these days.
| rvense wrote:
| Branch delay slots make assembly weird.
| ithkuil wrote:
| Go away NOP!
| cbmuser wrote:
| FWIW, we're still building an unofficial port of Debian for sh4:
|
| > https://buildd.debian.org/status/architecture.php?a=sh4&suit...
|
| Installer images are being built, too, but they currently don't
| boot due to an unresolved bug in QEMU or the kernel:
|
| > https://cdimage.debian.org/cdimage/ports/debian-installer/20...
|
| If you're interested in Linux on sh4, join the #debian-ports IRC
| channel on OFTC.
|
| I'm the primary maintainer of the Debian sh4 port (and m68k,
| sparc64, x32, ia64, powerpc and ppc64).
| aconst wrote:
| Worked on this while an intern at Ricoh in Japan:
| https://www.kernel.org/doc/ols/2004/ols2004v2-pages-239-250....
|
| I was not the one who put Linux on it (it is not my name on the
| paper, but I was there around that time), but I had fun making a
| NetMeeting-like demonstration application on it. Well, there was
| no sound but I could get 5fps video! :D
| anamax wrote:
| That takes me back.
|
| I wrote SH-3/4 simulators that were significantly faster than the
| actual parts. (The cheat was that the simulator ran that fast on
| an early 200MHz Pentium while the SH-3 was something like 35MHz.)
|
| I also wrote a synthesizable[1] SH-5 hardware model that was
| cycle and signal accurate at every module boundary and ran >100k
| cycles/second on said Pentium. (SH-5 was a 64 bit successor to
| the SH-4 that also had a 32 bit mode that ran SH-4 code. I don't
| know whether it ever shipped.)
|
| [1] The cache, TLB, and floating-point weren't synthesizable.
| Making them synthesizable would have killed the cycles/second.
| thedracle wrote:
| We used SH4 chips for the cosmic-ray surface detector array for
| the Telescope Array project: http://telescopearray.org/
|
| I cut my embedded development teeth writing device drivers and
| custom firmware targeting it.
|
| The reason we used SH4 is because the Dreamcast had failed, so
| there was a huge surplus of them available on the market at the
| time.
|
| It was easily one of the most interesting and rewarding projects
| of my life to work on.
| PandaPanda150 wrote:
| I wrote a couple of games for the SEGA Dreamcast. There are two
| things that I remember from back then:
|
| 1) The compiler support for SuperH was beyond abysmal.
|
| 2) I loved that machine anyway.
| gkhartman wrote:
| Any interesting stories from back then? I'd love to hear about
| any hacks that were unique to Dreamcast if you're willing to
| share.
| vernie wrote:
| Which games?
| xony wrote:
| I loved J-core; it showed the importance of the ISA. RISC-V and
| J-core can change this world.
| rektide wrote:
| Supposedly the J-core open core (SH-compatible) isn't totally
| dead. The main patents seem to have run out, so implementation
| ought to be OK. But I definitely haven't seen what the hopeful
| roadmap pitched happen: https://j-core.org/roadmap.html
| MaxBarraclough wrote:
| A pity the J2 never got any traction. I wonder why. RISC-V is
| doing great; it's not as if there's no interest in unencumbered
| ISAs.
| tossaway9000 wrote:
| no update since August 31, 2016 seems pretty dead, though I
| guess there is someone around to renew the domain name, etc.
| freemint wrote:
| They haven't come around to updating the page for a while
| (someone lost login info or something like that) but they
| have a new core and some projects that use the new SH4
| equivalent core. https://www.coresemi.io/
|
| They currently seem to be in a "release tarball" model of open
| source, but know they ought to be in a "develop on the master
| branch in a public repo" model.
| TapamN wrote:
| There's some activity on the mailing list.
| ndiddy wrote:
| It would have been interesting to see what would have happened to
| the SuperH architecture if Renesas decided to improve it instead
| of continually selling the same 200 MHz part with no die shrinks
| or improvements until people stopped buying them.
| freemint wrote:
| It is out of patent. Nobody prevents you from just doing that.
| But yeah it would be interesting to watch
| dang wrote:
| A few past related threads:
|
| _J2 open processor: an open source processor using the SuperH
| ISA_ - https://news.ycombinator.com/item?id=26866065 - April 2021
| (45 comments)
|
| _The SuperH-3, part 15: Code walkthrough_ -
| https://news.ycombinator.com/item?id=20779622 - Aug 2019 (1
| comment)
|
| _The SuperH-3, part 1: Introduction_ -
| https://news.ycombinator.com/item?id=20622921 - Aug 2019 (2
| comments)
|
| _Building a SuperH-compatible CPU from scratch [video]_ -
| https://news.ycombinator.com/item?id=11886079 - June 2016 (24
| comments)
|
| _Resurrecting the SuperH architecture_ -
| https://news.ycombinator.com/item?id=9812010 - July 2015 (15
| comments)
| eggsome wrote:
| Here is my current favorite:
|
| https://m.youtube.com/watch?v=dVD1Yws__v0
|
| (open source GPS!)
| dang wrote:
| Ah - that led me to these. Thanks!
|
| _J2 open processor: an open source processor using the
| SuperH ISA_ - https://news.ycombinator.com/item?id=26866065 -
| April 2021 (45 comments)
|
| _Why the J-core open processor is cool_ -
| https://news.ycombinator.com/item?id=24163584 - Aug 2020 (1
| comment)
|
| _J-Core Open Processor_ -
| https://news.ycombinator.com/item?id=20658584 - Aug 2019 (31
| comments)
|
| _J-core Open Processor_ -
| https://news.ycombinator.com/item?id=12105913 - July 2016 (27
| comments)
|
| _Building a CPU from Scratch: Jcore Design Walkthrough
| [video]_ - https://news.ycombinator.com/item?id=12101908 -
| July 2016 (8 comments)
| chasil wrote:
| ARM Thumb is similar in design, and the wiki says that ARM
| licensed SuperH patents to implement it.
|
| Of several Busybox binaries for ARM, the v7m version is the
| smallest, and is (AFAIK) Thumb-only.
|
| busybox-armv5l  2019-06-10 14:02  1.1M
| busybox-armv7l  2019-06-10 14:02  1.1M
| busybox-armv7m  2019-06-10 14:02  867K
| busybox-armv7r  2019-06-10 14:02  1.1M
| busybox-armv8l  2019-06-10 14:02  1.1M
| busybox-sh2eb   2019-06-10 14:02  1.3M
| busybox-sh4     2019-06-10 14:02  1.0M
|
| https://busybox.net/downloads/binaries/1.31.0-defconfig-mult...
| tylerflick wrote:
| A cool thing about Thumb/ARM is that you can actually switch
| between the two instruction sets in your binary. You
| essentially dump your cache when this happens, but you can do
| it.
| pm215 wrote:
| Why would a thumb/arm switch invalidate the caches? It's just
| an interworking branch, it isn't even a context switch.
| mananaysiempre wrote:
| ... Except on ARM micros, where Thumb probably matters most
| nowadays, because Cortex-M only runs in Thumb mode.
| duskwuff wrote:
| And not on ARM64, which has no Thumb equivalent.
| mananaysiempre wrote:
| Well yeah, but "only runs ARM" is less surprising to me
| than "only runs Thumb", especially on 64-bit where it's
| essentially a different architecture altogether. Or maybe
| I'm just crabby because I wanted to write a Thumb-2
| assembler, took a long look at the instruction encoding
| and gave up.
| pm215 wrote:
| Extremely nerdy sub-detail -- M-profile actually has all
| the handling for the Thumb-to-Arm switch architecturally,
| with the LSB of a branch target being significant and a T
| bit in the PSR to tell you what mode you're in, which you
| can set to 0. It's just that attempting to actually execute
| an insn with PSR.T==0 will trigger a UsageFault :-) (The
| CPU is even required to fetch the insn from memory first --
| an MPU fault will take precedence over the T-bit-clear
| fault.)
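The interworking rule described above (the low bit of the branch target selects the state, and M-profile faults when asked to execute with T clear) can be modelled with a small C sketch. All names here (`struct cpu_state`, `interworking_branch`, `m_profile_try_execute`) are hypothetical, and the model is deliberately simplified: it ignores word alignment of ARM-state targets and does not model the instruction fetch or the MPU fault taking precedence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy CPU state: program counter plus the T (Thumb) state bit. */
struct cpu_state {
    uint32_t pc;
    bool thumb;
};

/* Interworking branch: bit 0 of the target selects the instruction set
 * state and is not part of the address that ends up in the PC. */
static void interworking_branch(struct cpu_state *cpu, uint32_t target)
{
    assert(cpu != NULL);
    cpu->thumb = (target & 1u) != 0;
    cpu->pc = target & ~UINT32_C(1);
}

enum exec_result { EXEC_OK, EXEC_USAGE_FAULT };

/* Simplified M-profile behaviour: the T bit may be cleared, but trying
 * to execute an instruction in that state raises a fault. */
static enum exec_result m_profile_try_execute(const struct cpu_state *cpu)
{
    assert(cpu != NULL);
    return cpu->thumb ? EXEC_OK : EXEC_USAGE_FAULT;
}
```

Branching to `0x08000101` leaves the PC at `0x08000100` with T set, so execution proceeds; branching to `0x08000100` clears T, and the next attempted instruction reports `EXEC_USAGE_FAULT` in this model.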
| tyingq wrote:
| Used in some machines that were pretty cool at the time, like the
| tiny Jornada 680[1]. There was a Linux distro called Jlime that
| could run on these. https://en.wikipedia.org/wiki/Jlime
|
| [1]
| https://ed154c547559d2878d6a-e584b6b63c3a42919fe0cc5066a1430...
| tossaway9000 wrote:
| NetBSD has an active port (though "active" for NetBSD is not as
| "active" as other projects, but it still builds and runs):
| http://wiki.netbsd.org/ports/sh3/
|
| Aside from the Dreamcast, there's a few Japanese NAS devices
| that run the SH3 port too.
| tyingq wrote:
| Ah, okay...the Jornada is on their list too.
| http://wiki.netbsd.org/ports/hpcsh/hpcsh_support_status/
| spatulon wrote:
| I used to work in automotive software, where the SH-2A was a
| popular choice for engine controller units, particularly in
| Japan.
|
| It was the first CPU I encountered that uses branch delay slots
| [1], meaning that the instruction immediately following a branch
| instruction is always executed, even when the branch is taken.
| That took a bit of getting used to, although I understand it's
| quite common on RISC architectures.
|
| The SH-4 was most notably used in the Sega Dreamcast.
|
| [1]: https://en.wikipedia.org/wiki/Delay_slot
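To illustrate what "the instruction after the branch is always executed" means in practice, here is a minimal C interpreter for a made-up three-instruction machine with a single branch delay slot. The instruction set (`OP_ADD`, `OP_BRA`, `OP_HALT`) and the function `run_with_delay_slot` are assumptions invented for this sketch; they are not SuperH instructions, and a branch placed inside a delay slot is simply rejected.

```c
#include <assert.h>
#include <stddef.h>

/* Toy instruction set, invented for this sketch. */
enum opcode { OP_ADD, OP_BRA, OP_HALT };

struct insn {
    enum opcode op;
    int arg; /* OP_ADD: value to add; OP_BRA: target index (>= 0) */
};

/* Run a toy program with one branch delay slot and return the
 * accumulator. A branch only redirects the PC after the instruction
 * that follows it has executed. */
static int run_with_delay_slot(const struct insn *prog, size_t len)
{
    int acc = 0;
    size_t pc = 0;
    int branch_pending = 0;
    size_t branch_target = 0;

    while (pc < len) {
        struct insn cur = prog[pc];
        size_t next = pc + 1;
        int in_delay_slot = branch_pending; /* previous insn was a branch */
        branch_pending = 0;

        switch (cur.op) {
        case OP_ADD:
            acc += cur.arg;
            break;
        case OP_BRA:
            assert(!in_delay_slot); /* branch in a delay slot: not modelled */
            branch_pending = 1;
            branch_target = (size_t)cur.arg;
            break;
        case OP_HALT:
            return acc;
        }

        /* The delay-slot instruction has now run; take the branch. */
        if (in_delay_slot)
            next = branch_target;
        pc = next;
    }
    return acc;
}
```

For the program `ADD 1; BRA 4; ADD 10; ADD 100; ADD 1000; HALT`, the `ADD 10` in the delay slot runs even though the branch is taken, `ADD 100` is skipped, and the result is 1011. A compiler that cannot find a useful instruction for that slot has to fill it with a NOP.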
| mananaysiempre wrote:
| Branch delay slots are pretty common on old-school RISC archs
| such as MIPS and SPARC, but AFAIU when those designs were faced
| with microarchitectural evolution the initial simplicity turned
| out not to be worth it. Neither PowerPC, Alpha, ARM (if you
| consider it a RISC), nor RISC-V have them. (A similar thing
| happened with ARM's two-words-ahead PC.) It's another matter if
| your microarchitecture _is_ your architecture: apparently
| modern DSPs still have them, although I don't know where you
| could learn about those, what with most designs being secret.
___________________________________________________________________
(page generated 2021-12-06 23:00 UTC)