[HN Gopher] Show HN: I built an embeddable Unicode library with ...
       ___________________________________________________________________
        
       Show HN: I built an embeddable Unicode library with MISRA C
       conformance
        
       Hello, everyone. I built Unicorn: an embeddable MISRA C:2012
       implementation of essential Unicode algorithms.
        
       Unicorn is designed to be fully customizable: you can select which
       Unicode algorithms and character properties are included in or
       excluded from compilation. You can also exclude Unicode character
       blocks wholesale for scripts your application does not support.
       It's perfect for resource-constrained devices like microcontrollers
       and IoT devices.
        
       About me: I quit my Big Corp job a few years back to pursue my
       passion for software development, and this is one of my first
       commercial releases.
        
       Author : hgs3
       Score  : 70 points
       Date   : 2024-12-15 15:41 UTC (7 hours ago)
        
 (HTM) web link (railgunlabs.com)
        
       | sushidev wrote:
       | Nice!
        
         | sushidev wrote:
         | But not interesting for me in any way since it's not open
         | source.
        
           | hgs3 wrote:
           | Unfortunately, the reason I didn't release Unicorn under an
           | OSI-approved license is that I see many (most?) FOSS
           | projects being chronically underfunded. Now, I did not quit
           | my job and build this to get rich or anything, but I do
           | need to earn enough to sustain myself. If there's enough
           | interest, I would consider crowdfunding a release under an
           | OSI license.
        
             | kouteiheika wrote:
             | Why not dual-license it under a commercial license and
             | something like the GPL?
        
               | hgs3 wrote:
               | I went back and forth on this and in my uncertainty I
               | decided it was better to start more "closed" first with
               | the potential to become more "open" over time.
        
         | hgs3 wrote:
         | Thank you, let me know if you have any questions.
        
           | sushidev wrote:
           | Who is your main target audience?
        
             | hgs3 wrote:
             | Primarily, companies developing for embedded systems or
             | other resource-constrained devices.
        
       | rubicks wrote:
       | This is not a comment about open/closed-source software and/or
       | licensing models.
       | 
       | Projects like this never fail to impress me vis-a-vis source
       | obfuscation. The 'generate.pyz' is an interesting twist on the
       | usual practice.
        
         | layer8 wrote:
         |     #  You may not reverse engineer, decompile, disassemble, or otherwise attempt
         |     #  to derive the source code or underlying structure of this script
         | 
         | This prohibition is void in certain relevant jurisdictions, for
         | any publicly available product.
        
       | chris_wot wrote:
       | I don't get the whole MISRA requirement that functions should
       | only have one exit point. Honestly, nobody has been able to
       | explain why this is important, other than that it's a historical
       | anomaly inherited from FORTRAN (where it actually existed for a
       | good reason).
        
         | dark-star wrote:
         | One reason I can think of off the top of my head (although I've
         | never had to deal with MISRA C at all) is that if you have to
         | add some cleanup code before your function returns, then there
         | is exactly one place and one place only to do that.
         | 
         | Otherwise this leads to duplication of cleanup code, similar to:
         | 
         |       allocate_something();
         |       ...
         |       if (failed(foo)) {
         |           deallocate_something();
         |           return FAILED;
         |       }
         |       ...
         |       deallocate_something();
         |       return SUCCESS;
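For comparison with the duplicated-cleanup pseudocode above, here is a minimal C sketch of the single-exit shape the rule pushes toward, using a result variable instead of `goto`. The names `process`, `step_failed`, `result_t`, and `cleanup_count` are hypothetical stand-ins for the pseudocode's placeholders; this is not code from MISRA C or from Unicorn.

```c
#include <stdlib.h>
#include <string.h>

typedef enum { RESULT_SUCCESS, RESULT_FAILED } result_t;

/* Counts how often the cleanup code ran (for illustration only). */
static int cleanup_count = 0;

/* Hypothetical fallible step: here it "fails" for negative input. */
static int step_failed(int input)
{
    return input < 0;
}

/* Single-exit version: every path funnels into one cleanup site
 * and one return statement. */
static result_t process(int input)
{
    result_t result = RESULT_FAILED;
    char *buffer = malloc(64u);          /* allocate_something() */

    if (buffer != NULL) {
        if (!step_failed(input)) {
            memset(buffer, 0, 64u);      /* ... real work ... */
            result = RESULT_SUCCESS;
        }
        free(buffer);                    /* deallocate_something(), written once */
        cleanup_count++;
    }
    return result;                       /* the only exit point */
}
```

The cost is extra nesting: each additional fallible step adds another level of indentation around the work.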
        
           | samatman wrote:
           | This is, more than anything, an argument for a `defer`
           | statement, of the sort you can enjoy in Zig right now.
           | 
           | Or hopefully, eventually, in C, thanks to the tireless
           | efforts of JeanHeyd Meneide:
           | 
           | https://thephd.dev/just-put-raii-in-c-bro-please-bro-just-on...
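Standard C has no `defer` yet, but GCC and Clang provide a non-standard `cleanup` variable attribute with a similar effect: the named function is called with a pointer to the variable when it goes out of scope, including on early returns. A minimal sketch, assuming a GCC- or Clang-compatible compiler. Because it is an extension, it will not build with MSVC and would not satisfy MISRA C; `free_buffer`, `demo`, and `free_count` are hypothetical names.

```c
#include <stdlib.h>

static int free_count = 0;  /* counts cleanup invocations, for illustration */

/* Called automatically with a pointer to the annotated variable. */
static void free_buffer(char **p)
{
    free(*p);  /* free(NULL) is a no-op, so a failed malloc is harmless */
    free_count++;
}

static int demo(int fail_early)
{
    /* GCC/Clang extension: free_buffer(&buf) runs whenever buf leaves scope. */
    __attribute__((cleanup(free_buffer))) char *buf = malloc(32u);

    if (buf == NULL) {
        return -1;
    }
    if (fail_early) {
        return 1;  /* early return: the cleanup still runs */
    }
    buf[0] = 'x';
    return 0;
}
```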
        
             | layer8 wrote:
             | MISRA C can't mandate new language features though.
        
         | pklausler wrote:
         | FORTRAN II introduced the RETURN statement.
        
           | bee_rider wrote:
           | Languages like Matlab, where the values returned are listed
           | at the top of the function and you don't even need a return
           | statement to tell it what to return, always feel so funky and
           | fun.
        
         | layer8 wrote:
         | This is an old rule from when structured programming was
         | introduced. The prior state of affairs was that code would jump
         | via gotos _between_ functions to different labels within those
         | functions (labels were global). The requirement that every
         | function should have only a single entry point and a single
         | exit point seemed like a good rule to establish sanity.
         | 
         | MISRA C states the following rationale:
         | 
         | "A single point of exit is required by IEC 61508 and ISO 26262
         | as part of the requirements for a modular approach.
         | 
         | Early returns may lead to the unintentional omission of
         | function termination code.
         | 
         | If a function has exit points interspersed with statements that
         | produce _persistent side effects_, it is not easy to determine
         | which _side effects_ will occur when the function is executed."
         | 
         | Note that the MISRA C rule is merely advisory, meaning it is a
         | recommendation and not a hard requirement (i.e. it's a "should"
         | and not a "shall").
        
         | AlotOfReading wrote:
         | That's one of many rules in MISRA that originate from
         | antiquated "best practices" from the dark ages that don't
         | actually improve safety. We have it today by way of IEC 61508,
         | which gets it from a book on structured programming called
         | _Structured Design_. That book _didn't_ recommend banning
         | multiple exit points, but it recommended minimizing them to
         | simplify the control flow graph and said code should minimize
         | the distance between black boxes (bits of code that do
         | something without leaky abstractions and have only one return
         | statement). The IEC authors and MISRA thought the logical
         | extension of that was to make _everything_ have one exit point.
        
         | aulin wrote:
         | The abominations I've seen in code review from people trying to
         | fulfill this rule still wake me up at night.
        
         | champijone wrote:
         | One reason to prefer it in C is to be able to easily add
         | locally scoped functionality like profiling markers and temp
         | allocators:
         | 
         |       profile_begin("func");
         |       a = temp_arena_begin();
         |       // ... code
         |       temp_arena_end();
         |       profile_end();
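To illustrate why paired begin/end calls like those in the snippet above favor a single exit, here is a minimal sketch built on a toy bump allocator. The names `temp_arena_begin` and `temp_arena_end` come from the comment, but their signatures here (returning and taking a saved offset) are assumptions, and `temp_alloc`, `sum_squares`, and the 1 KiB backing store are hypothetical.

```c
#include <stddef.h>

/* Toy scratch arena: 1 KiB of 8-byte-aligned storage plus a bump offset. */
static unsigned long long arena_words[128];
static size_t arena_top = 0;

/* Assumed API: begin returns a mark, end rewinds the arena to that mark. */
static size_t temp_arena_begin(void) { return arena_top; }
static void temp_arena_end(size_t mark) { arena_top = mark; }

static void *temp_alloc(size_t n)
{
    void *p = NULL;
    n = (n + 7u) & ~(size_t)7u;  /* keep every allocation 8-byte aligned */
    if (n <= sizeof arena_words - arena_top) {
        p = (unsigned char *)arena_words + arena_top;
        arena_top += n;
    }
    return p;
}

static int sum_squares(int count)
{
    int total = 0;
    size_t mark = temp_arena_begin();
    int *tmp = temp_alloc((size_t)count * sizeof *tmp);

    if (tmp != NULL) {
        for (int i = 0; i < count; i++) {
            tmp[i] = i * i;
        }
        for (int i = 0; i < count; i++) {
            total += tmp[i];
        }
    }
    temp_arena_end(mark);  /* single exit: scratch memory is always released */
    return total;
}
```

An early `return` placed between the two arena calls would skip `temp_arena_end`, leaving the scratch space allocated until some outer scope rewinds it.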
        
       | kiritanpo wrote:
       | This looks interesting. Most embedded projects I know use
       | ICU/libicu for their Unicode needs. As a potential customer, I
       | would like to know how it compares against ICU in performance
       | and code size. Why should I switch?
        
         | hgs3 wrote:
         | > I would like to know how it compares against ICU
         | 
         | ICU is a large library, typically around 40 MB depending on
         | the platform, whereas Unicorn, with all features enabled, is
         | only about 600 KB.
         | 
         | ICU has a broader scope: it's not just a Unicode library, but
         | also an internationalization library. Unicorn, on the other
         | hand, is specifically focused on Unicode algorithms.
         | 
         | ICU wasn't designed to be customized. It's also not MISRA
         | compliant and is written in C++11. In contrast, Unicorn is
         | written in C99, fully customizable, MISRA compliant, and only
         | requires a few features from libc [1]. It's far more portable.
         | 
         | [1] https://github.com/railgunlabs/unicorn/?tab=readme-ov-file#u...
        
       | rurban wrote:
       | This is commercial only. Free and small is my safeclib, which
       | does about half of it. ICU is not usable on small devices, and
       | also pretty slow. It's much faster to use precomputed tables per
       | algorithm, such as here or in safeclib. libunistring is also
       | extremely slow. This was tried for grep and failed.
        
         | hgs3 wrote:
         | > This is commercial only.
         | 
         | You can use Unicorn for non-commercial use [1], but yes, for
         | commercial use you need to buy a license.
         | 
         | > It's much faster to use precomputed tables per algorithm
         | 
         | You're absolutely right about using precomputed tables per
         | algorithm. That is the secret to the library's speed.
         | 
         | > Free and small is my safeclib, which does about half of it.
         | 
         | I like safeclib! It's nice to hear from the author. It's worth
         | distinguishing that safeclib is a safer string library whereas
         | Unicorn is a Unicode algorithms library, not a string library.
         | 
         | [1] https://github.com/railgunlabs/unicorn/blob/master/LICENSE
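"Precomputed tables per algorithm" are commonly laid out as a two-stage lookup table. The high bits of a code point select a block, identical blocks are stored only once, and the low bits index into the selected block. The sketch below is a generic illustration of that technique, not Unicorn's or safeclib's actual table layout. It covers only the BMP, uses the White_Space property from the Unicode Character Database (PropList.txt) as sample data, and builds the tables at startup with a hypothetical `build_tables` function. A real library would instead emit them as `const` arrays from an offline generator.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SHIFT 8u
#define BLOCK_SIZE  (1u << BLOCK_SHIFT)        /* 256 code points per block */
#define NUM_BLOCKS  (0x10000u >> BLOCK_SHIFT)  /* BMP only in this sketch */

/* Stage 1: block number -> index of a deduplicated block in stage 2. */
static uint8_t stage1[NUM_BLOCKS];
/* Stage 2: unique blocks of per-code-point property values (0 or 1). */
static uint8_t stage2[NUM_BLOCKS][BLOCK_SIZE];
static unsigned unique_blocks = 0;

/* Reference definition: the White_Space code points in the BMP. */
static bool is_white_space_slow(uint32_t cp)
{
    return (cp >= 0x09u && cp <= 0x0Du) || cp == 0x20u || cp == 0x85u ||
           cp == 0xA0u || cp == 0x1680u || (cp >= 0x2000u && cp <= 0x200Au) ||
           cp == 0x2028u || cp == 0x2029u || cp == 0x202Fu || cp == 0x205Fu ||
           cp == 0x3000u;
}

/* Stands in for an offline generator: fill the tables, sharing identical blocks. */
static void build_tables(void)
{
    uint8_t block[BLOCK_SIZE];

    unique_blocks = 0;
    for (uint32_t b = 0; b < NUM_BLOCKS; b++) {
        for (uint32_t i = 0; i < BLOCK_SIZE; i++) {
            block[i] = (uint8_t)(is_white_space_slow((b << BLOCK_SHIFT) | i) ? 1u : 0u);
        }
        unsigned found = unique_blocks;
        for (unsigned u = 0; u < unique_blocks; u++) {
            if (memcmp(stage2[u], block, BLOCK_SIZE) == 0) {
                found = u;
                break;
            }
        }
        if (found == unique_blocks) {  /* first time this block appears */
            memcpy(stage2[unique_blocks], block, BLOCK_SIZE);
            unique_blocks++;
        }
        stage1[b] = (uint8_t)found;
    }
}

/* Fast path: a range check plus two array reads. */
static bool is_white_space(uint32_t cp)
{
    return cp < 0x10000u &&
           stage2[stage1[cp >> BLOCK_SHIFT]][cp & (BLOCK_SIZE - 1u)] != 0u;
}
```

With this data, only 5 of the 256 blocks are distinct: the all-zero block plus the blocks containing U+0009..U+00A0, U+1680, U+2000..U+205F, and U+3000. A generator could therefore emit 5 × 256 bytes of stage 2 plus a 256-byte stage 1, about 1.5 KB instead of a 64 KB flat table.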
        
           | rurban wrote:
           | Well, a string is Unicode nowadays, and certainly not just a
           | zero-terminated blob; that would be a buffer. Only the Linux
           | kernel still holds this invalid view.
           | 
           | So every string library needs at least a compare function to
           | find strings, handling all the variants of the same graphemes,
           | which leads us to NFC normalization for a start. Upcase tables
           | and word-length tables are also needed.
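To make the comparison point concrete: "é" can be encoded as the single code point U+00E9 or as U+0065 followed by the combining acute accent U+0301. The two sequences are canonically equivalent, yet a byte-wise comparison such as `strcmp` treats them as different strings. NFC normalization (Unicode Standard Annex #15) composes the second form into the first so that they compare equal. This small C check only demonstrates the mismatch. It does not implement normalization, and `bytewise_equal` is a hypothetical helper.

```c
#include <string.h>

/* U+00E9 LATIN SMALL LETTER E WITH ACUTE, UTF-8 encoded (2 bytes). */
static const char precomposed[] = "\xC3\xA9";

/* U+0065 LATIN SMALL LETTER E + U+0301 COMBINING ACUTE ACCENT (3 bytes). */
static const char decomposed[] = "e\xCC\x81";

/* What a naive string library does: compare bytes, not characters. */
static int bytewise_equal(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```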
        
       | Someone wrote:
       | On https://railgunlabs.com/unicorn/manual/misra-compliance/, I
       | think you will want to fix a typo in:
       | 
       |     1.2   Required   Compliant (verified by compiling with Clang's -pdentic flag)
       |                                                                    ^^^^^^^^
       | 
       | Or am I too pedantic?
        
         | hgs3 wrote:
         | This is the most ironic typo I've ever made. Thanks for the
         | catch. I've corrected it.
        
           | canucker2016 wrote:
           | A couple more suggestions.
           | 
           | List the platforms (& compilers) that you've tested on.
           | 
           | Compare (pros/cons) against other Unicode libs (like others
           | have done elsewhere in this thread, e.g.
           | https://news.ycombinator.com/item?id=42424637 and
           | https://news.ycombinator.com/item?id=42424638)
        
             | hgs3 wrote:
             | Thanks for the suggestions. I think a comparison table
             | would be useful, but I want to make sure I do it right
             | since I'd be comparing my work to someone else's.
             | 
             | As for the compilers, I've tested the library with GCC,
             | Clang, and MSVC, and with the -pedantic flag like the GP
             | mentioned. The library should build with any standard-
             | compliant C99 compiler.
        
       | tocariimaa wrote:
       | It uses a proprietary license, if you're wondering.
        
       | __turbobrew__ wrote:
       | There is not much to show if I can't read the source code.
        
       | biosboiii wrote:
       | Since MISRA is targeted at automotive, as a software dev in the
       | automotive space I would suggest adding a note that this is
       | able to run on POSIX-compliant OSes like QNX :)
       | 
       | If you would like to chat, hit me up.
        
       ___________________________________________________________________
       (page generated 2024-12-15 23:00 UTC)