[HN Gopher] SuperH
       ___________________________________________________________________
        
       SuperH
        
       Author : rdpintqogeogsaa
       Score  : 94 points
       Date   : 2021-12-04 11:39 UTC (2 days ago)
        
 (HTM) web link (en.wikipedia.org)
 (TXT) w3m dump (en.wikipedia.org)
        
       | monocasa wrote:
       | Easily one of my favorite archs, at least as far as the idea of
       | it goes. IMO the most pure classic 5 stage RISC. Divide is split
       | into each cycle's ops, iterate for as long as your results need.
       | It's pretty easy to get to a point where you can just read the
       | hex of the opcodes: a three-address RISC with 16-bit instructions
       | and 16 registers means the first nybble is the opcode and the
       | other three nybbles are the registers. And the whole thing feels
       | like someone told some engineers that they have to make a RISC
       | with half the gates of an equivalent MIPS and the mad lads
       | actually did it, and did it well.
       | 
       | A little ascetic for these days though, IMO.
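
Editor's sketch of the "read the hex by nybble" idea from the comment above. It only splits a 16-bit word into four 4-bit fields; the function name and the example word 0x312C are illustrative assumptions, and how the fields map to opcode and registers depends on the instruction format, which this sketch does not model.

```python
def split_nybbles(insn: int) -> tuple[int, int, int, int]:
    """Split a 16-bit instruction word into four 4-bit fields, high to low."""
    if not 0 <= insn <= 0xFFFF:
        raise ValueError("expected a 16-bit instruction word")
    return (
        (insn >> 12) & 0xF,  # first nybble (the comment calls this the opcode)
        (insn >> 8) & 0xF,   # second nybble
        (insn >> 4) & 0xF,   # third nybble
        insn & 0xF,          # fourth nybble
    )


# Arbitrary example word, chosen only to show the field boundaries.
print([hex(n) for n in split_nybbles(0x312C)])
```

Because each field is exactly one hex digit, the split is the same thing a reader does by eye when scanning a hex dump.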
        
         | AlotOfReading wrote:
         | I had a completely irrational dislike for the division. You
         | could use a loop, but then you were wasting >3/4ths of your
         | cycles on shifting and branching. If you had the misfortune to
         | have the compiler generate division code, it'd usually fill the
         | branch delay slot with a NOP as well. The solution was simply
         | to stick 8/16/32 division instructions in a row, but that felt
         | inelegant. I spent a day writing some silly routine that
         | averaged ~22 cycles over 32 bit integers just because it
         | annoyed me so much.
        
           | freemint wrote:
           | Do you still have access to this?
        
             | AlotOfReading wrote:
             | Sadly not. I was pretty terrible about version control and
             | saving things from that early in my career. The concept was
             | pretty simple though: division was 1-bit so you could
             | simply do fewer divisions if you had smaller numbers. The
             | first version was something like a BSR -> jump into the
             | divide table. That probably would have been the place to
             | stop, but I was clearly terrible at profiling because I
             | remember adding all sorts of special case logic on top to
             | handle things like powers of two, which likely ended up
             | being slower in practice because cycle times aren't
             | everything.
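
Editor's sketch of the idea described above: if each divide step produces one quotient bit, a smaller dividend needs fewer steps. This is ordinary restoring division written in Python, not the SuperH divide-step instruction's exact semantics and not the commenter's lost routine; `divide_steps` and its return shape are assumptions made for illustration.

```python
def divide_steps(dividend: int, divisor: int) -> tuple[int, int, int]:
    """Unsigned restoring division, one quotient bit per step.

    Returns (quotient, remainder, steps). Only as many steps run as the
    dividend has significant bits, so small dividends finish sooner.
    """
    if dividend < 0 or divisor <= 0:
        raise ValueError("sketch handles dividend >= 0 and divisor > 0 only")
    steps = dividend.bit_length()  # number of 1-bit steps actually needed
    quotient = 0
    remainder = 0
    for i in range(steps - 1, -1, -1):
        # Shift the next dividend bit into the running remainder.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        quotient <<= 1
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
    return quotient, remainder, steps


q, r, n = divide_steps(100, 7)
print(q, r, n)
```

On hardware the loop would be unrolled into a run of divide-step instructions, and "jumping into the table" would correspond to entering that run part-way through so that only the needed steps execute.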
        
         | dragontamer wrote:
         | > Divide is split into each cycle's ops, iterate for as long as
         | your results need.
         | 
         | You know what would be better? If you had a decompression
         | routine built into the CPU that would convert division and
         | modulus commands into this loop at the micro-code level.
         | 
         | That way, a divide / modulus cycle (probably taking 20
         | instructions taking 50+ bytes) can be compressed into a
         | singular instruction (1 instruction taking 4 bytes)... using
         | less L1 cache.
        
           | monocasa wrote:
           | I mean, it doesn't have any microcode, and does better than
           | MIPS ironically at achieving the whole "Microprocessor
           | without Interlocked Pipelined Stages" thing because of
           | directly exposing the divide pipeline in single cycle units.
           | Having to add pipeline interlocking from the EX stage just
           | for that as well as the microcode itself would have been non
           | trivial from a gate count perspective on a core that's
           | already cutting gates left and right. And it's neat to be
           | able to choose the divide precision you need down to the
           | cycle.
           | 
           | I agree though in the general case, hence why I end with it
           | being a little ascetic these days.
        
         | rvense wrote:
         | Branch delay slots make assembly weird.
        
           | ithkuil wrote:
           | Go away NOP!
        
       | cbmuser wrote:
       | FWIW, we're still building an unofficial port of Debian for sh4:
       | 
       | > https://buildd.debian.org/status/architecture.php?a=sh4&suit...
       | 
       | Installer images are being built, too, but they currently don't
       | boot due to an unresolved bug in QEMU or the kernel:
       | 
       | > https://cdimage.debian.org/cdimage/ports/debian-installer/20...
       | 
       | If you're interested in Linux on sh4, join the #debian-ports IRC
       | channel on OFTC.
       | 
       | I'm the primary maintainer of the Debian sh4 port (and m68k,
       | sparc64, x32, ia64, powerpc and ppc64).
        
       | aconst wrote:
       | Worked on this while an intern at Ricoh in Japan:
       | https://www.kernel.org/doc/ols/2004/ols2004v2-pages-239-250....
       | 
       | I was not the one who put Linux on it (it is not my name on the
       | paper, but I was there around that time), but I had fun making a
       | NetMeeting-like demonstration application on it. Well, there was
       | no sound but I could get 5fps video! :D
        
       | anamax wrote:
       | That takes me back.
       | 
       | I wrote SH-3/4 simulators that were significantly faster than the
       | actual parts. (The cheat was that the simulator ran that fast on
       | an early 200MHz Pentium while the SH-3 was something like 35MHz.)
       | 
       | I also wrote a synthesizable[1] SH-5 hardware model that was
       | cycle and signal accurate at every module boundary and ran >100k
       | cycles/second on said Pentium. (SH-5 was a 64 bit successor to
       | the SH-4 that also had a 32 bit mode that ran SH-4 code. I don't
       | know whether it ever shipped.)
       | 
       | [1] The cache, TLB, and floating-point weren't synthesizable.
       | Making them synthesizable would have killed the cycles/second.
        
       | thedracle wrote:
       | We used SH4 chips for the cosmic-ray surface detector array for
       | the Telescope Array project: http://telescopearray.org/
       | 
       | I cut my embedded development teeth writing device drivers and
       | custom firmware targeting it.
       | 
       | The reason we used SH4 is because the Dreamcast had failed, so
       | there was a huge surplus of them available on the market at the
       | time.
       | 
       | It was easily one of the most interesting and rewarding projects
       | of my life to work on.
        
       | PandaPanda150 wrote:
       | I wrote a couple of games for the SEGA Dreamcast. There are two
       | things that I remember from back then:
       | 
       | 1) The compiler support for SuperH was beyond abysmal.
       | 
       | 2) I loved that machine anyway.
        
         | gkhartman wrote:
         | Any interesting stories from back then? I'd love to hear about
         | any hacks that were unique to Dreamcast if you're willing to
         | share.
        
         | vernie wrote:
         | Which games?
        
       | xony wrote:
       | I loved J-core; it showed the importance of the ISA. RISC-V and
       | J-core can change this world.
        
       | rektide wrote:
       | Supposedly the J-core open core (SH-compatible) isn't totally
       | dead. The main patents seem to have run out, so implementation
       | ought to be OK. But I definitely haven't seen what the hopeful
       | roadmap pitched happen: https://j-core.org/roadmap.html
        
         | MaxBarraclough wrote:
         | A pity the J2 never got any traction. I wonder why. RISC-V is
         | doing great; it's not as if there's no interest in unencumbered
         | ISAs.
        
         | tossaway9000 wrote:
         | no update since August 31, 2016 seems pretty dead, though I
         | guess there is someone around to renew the domain name, etc.
        
           | freemint wrote:
           | They haven't come around to updating the page for a while
           | (someone lost login info or something like that) but they
           | have a new core and some projects that use the new SH4
           | equivalent core. https://www.coresemi.io/
           | 
           | They currently seem to be in a "release tarball" model of
           | open source but know they ought to be in a "develop on the
           | master branch in a public repo" model.
        
           | TapamN wrote:
           | There's some activity on the mailing list.
        
       | ndiddy wrote:
       | It would have been interesting to see what would have happened to
       | the SuperH architecture if Renesas decided to improve it instead
       | of continually selling the same 200 MHz part with no die shrinks
       | or improvements until people stopped buying them.
        
         | freemint wrote:
         | It is out of patent. Nobody prevents you from just doing that.
         | But yeah it would be interesting to watch
        
       | dang wrote:
       | A few past related threads:
       | 
       |  _J2 open processor: an open source processor using the SuperH
       | ISA_ - https://news.ycombinator.com/item?id=26866065 - April 2021
       | (45 comments)
       | 
       |  _The SuperH-3, part 15: Code walkthrough_ -
       | https://news.ycombinator.com/item?id=20779622 - Aug 2019 (1
       | comment)
       | 
       |  _The SuperH-3, part 1: Introduction_ -
       | https://news.ycombinator.com/item?id=20622921 - Aug 2019 (2
       | comments)
       | 
       |  _Building a SuperH-compatible CPU from scratch [video]_ -
       | https://news.ycombinator.com/item?id=11886079 - June 2016 (24
       | comments)
       | 
       |  _Resurrecting the SuperH architecture_ -
       | https://news.ycombinator.com/item?id=9812010 - July 2015 (15
       | comments)
        
         | eggsome wrote:
         | Here is my current favorite:
         | 
         | https://m.youtube.com/watch?v=dVD1Yws__v0
         | 
         | (open source GPS!)
        
           | dang wrote:
           | Ah - that led me to these. Thanks!
           | 
           |  _J2 open processor: an open source processor using the
           | SuperH ISA_ - https://news.ycombinator.com/item?id=26866065 -
           | April 2021 (45 comments)
           | 
           |  _Why the J-core open processor is cool_ -
           | https://news.ycombinator.com/item?id=24163584 - Aug 2020 (1
           | comment)
           | 
           |  _J-Core Open Processor_ -
           | https://news.ycombinator.com/item?id=20658584 - Aug 2019 (31
           | comments)
           | 
           |  _J-core Open Processor_ -
           | https://news.ycombinator.com/item?id=12105913 - July 2016 (27
           | comments)
           | 
           |  _Building a CPU from Scratch: Jcore Design Walkthrough
           | [video]_ - https://news.ycombinator.com/item?id=12101908 -
           | July 2016 (8 comments)
        
       | chasil wrote:
       | ARM Thumb is similar in design, and the wiki says that ARM
       | licensed SuperH patents to implement it.
       | 
       | Of several Busybox binaries for ARM, the v7m version is the
       | smallest, and is (AFAIK) Thumb-only.
       | 
       |     busybox-armv5l   2019-06-10 14:02  1.1M
       |     busybox-armv7l   2019-06-10 14:02  1.1M
       |     busybox-armv7m   2019-06-10 14:02  867K
       |     busybox-armv7r   2019-06-10 14:02  1.1M
       |     busybox-armv8l   2019-06-10 14:02  1.1M
       |     busybox-sh2eb    2019-06-10 14:02  1.3M
       |     busybox-sh4      2019-06-10 14:02  1.0M
       | 
       | https://busybox.net/downloads/binaries/1.31.0-defconfig-mult...
        
         | tylerflick wrote:
         | A cool thing about Thumb/ARM is that you can actually switch
         | between the two instruction sets in your binary. You
         | essentially dump your cache when this happens, but you can do
         | it.
        
           | pm215 wrote:
           | Why would a thumb/arm switch invalidate the caches? It's just
           | an interworking branch, it isn't even a context switch.
        
           | mananaysiempre wrote:
           | ... Except on ARM micros, where Thumb probably matters most
           | nowadays, because Cortex-M only runs in Thumb mode.
        
             | duskwuff wrote:
             | And not on ARM64, which has no Thumb equivalent.
        
               | mananaysiempre wrote:
               | Well yeah, but "only runs ARM" is less surprising to me
               | than "only runs Thumb", especially on 64-bit where it's
               | essentially a different architecture altogether. Or maybe
               | I'm just crabby because I wanted to write a Thumb-2
               | assembler, took a long look at the instruction encoding
               | and gave up.
        
             | pm215 wrote:
             | Extremely nerdy sub-detail -- M-profile actually has all
             | the handling for the Thumb-to-Arm switch architecturally,
             | with the LSB of a branch target being significant and a T
             | bit in the PSR to tell you what mode you're in, which you
             | can set to 0. It's just that attempting to actually execute
             | an insn with PSR.T==0 will trigger a UsageFault :-) (The
             | CPU is even required to fetch the insn from memory first --
             | a MPU fault will take precedence over the T-bit-clear
             | fault.)
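
Editor's sketch of the behaviour described above, as a toy Python model rather than anything taken from Arm documentation. It assumes only what the comment states: the low bit of a branch target selects the T bit, and on M-profile trying to execute with T == 0 faults. The names `interworking_branch`, `execute_on_m_profile`, and the `UsageFault` class are hypothetical, and the MPU-fault precedence mentioned in the comment is not modelled.

```python
class UsageFault(Exception):
    """Stand-in for the fault taken when executing with the T bit clear."""


def interworking_branch(target: int) -> tuple[int, int]:
    """Return (pc, t_bit) for a branch to `target`.

    Bit 0 of the target selects the instruction-set state and is not part
    of the address that is actually fetched from.
    """
    return target & ~1, target & 1


def execute_on_m_profile(pc: int, t_bit: int) -> str:
    """Toy model: the branch itself succeeds, execution with T == 0 faults."""
    if t_bit == 0:
        raise UsageFault(f"T bit clear at {pc:#x}")
    return f"executing Thumb code at {pc:#x}"


pc, t = interworking_branch(0x8001)
print(execute_on_m_profile(pc, t))
```

The point of splitting the two functions is that the state change and the fault are separate events: the branch records T == 0 without complaint, and the fault only appears when the next instruction is about to execute.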
        
       | tyingq wrote:
       | Used in some machines that were pretty cool at the time, like the
       | tiny Jornada 680[1]. There was a Linux distro called Jlime that
       | could run on these. https://en.wikipedia.org/wiki/Jlime
       | 
       | [1]
       | https://ed154c547559d2878d6a-e584b6b63c3a42919fe0cc5066a1430...
        
         | tossaway9000 wrote:
         | NetBSD has an active port (though "active" for NetBSD is not as
         | "active" as other projects, but it still builds and runs):
         | http://wiki.netbsd.org/ports/sh3/
         | 
         | Aside from the Dreamcast, there's a few Japanese NAS devices
         | that run the SH3 port too.
        
           | tyingq wrote:
           | Ah, okay...the Jornada is on their list too.
           | http://wiki.netbsd.org/ports/hpcsh/hpcsh_support_status/
        
       | spatulon wrote:
       | I used to work in automotive software, where the SH-2A was a
       | popular choice for engine controller units, particularly in
       | Japan.
       | 
       | It was the first CPU I encountered that uses branch delay slots
       | [1], meaning that the instruction immediately following a branch
       | instruction is always executed, even when the branch is taken.
       | That took a bit of getting used to, although I understand it's
       | quite common on RISC architectures.
       | 
       | The SH-4 was most notably used in the Sega Dreamcast.
       | 
       | [1]: https://en.wikipedia.org/wiki/Delay_slot
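
Editor's sketch of the delay-slot behaviour described above, using a toy interpreter in Python. The two-instruction "ISA" (`op` and `bra`) and the `run` function are invented for illustration and are not SuperH instructions; a branch placed in a delay slot is not handled specially here.

```python
def run(program: list[tuple], max_steps: int = 100) -> list[str]:
    """Run a toy program in which every branch has one delay slot.

    ("op", name) records `name` in the trace.
    ("bra", target) jumps to index `target`, but only after the
    instruction immediately following the branch has executed.
    """
    trace = []
    pc = 0
    pending = None  # branch target waiting for its delay slot to finish
    for _ in range(max_steps):
        if pc >= len(program):
            break
        insn = program[pc]
        next_pc = pc + 1
        if pending is not None:
            # This instruction is the delay slot; the branch lands after it.
            next_pc = pending
            pending = None
        if insn[0] == "bra":
            pending = insn[1]
            trace.append(f"bra {insn[1]}")
        else:
            trace.append(insn[1])
        pc = next_pc
    return trace


program = [
    ("op", "a"),
    ("bra", 4),
    ("op", "slot"),     # executes even though the branch is taken
    ("op", "skipped"),  # never executes
    ("op", "target"),
]
print(run(program))
```

The trace shows the instruction after the branch running before control reaches the target, which is why compilers either move useful work into that slot or fill it with a NOP.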
        
         | mananaysiempre wrote:
         | Branch delay slots are pretty common on old-school RISC archs
         | such as MIPS and SPARC, but AFAIU when those designs were faced
         | with microarchitectural evolution the initial simplicity turned
         | out not to be worth it. Neither PowerPC, Alpha, ARM (if you
         | consider it a RISC), nor RISC-V have them. (A similar thing
         | happened with ARM's two-words-ahead PC.) It's another matter if
         | your microarchitecture _is_ your architecture: apparently
         | modern DSPs still have them, although I don't know where you
         | could learn about those, what with most designs being secret.
        
       ___________________________________________________________________
       (page generated 2021-12-06 23:00 UTC)