[HN Gopher] My experience binding a couple of scripting engines ...
___________________________________________________________________
My experience binding a couple of scripting engines with C++
Author : germandiago
Score : 56 points
Date : 2021-05-24 09:38 UTC (13 hours ago)
(HTM) web link (germandiagogomez.medium.com)
(TXT) w3m dump (germandiagogomez.medium.com)
| heinrichhartman wrote:
| > [lua] was discarded because of 3. It has unfamiliar syntax, but
| worse, unfamiliar semantics: no classes, use tables, start
| indexing at 1 and other oddities, just as being able to call
| functions with the wrong number of arguments and returning nil on
| the way. Also, use tables for both hash tables and arrays. It is
| powerful, do not misunderstand me, and Lua supports good
| concurrency. It was just not what I was looking for because of
| the mentioned things.
|
| I can fully understand why lua is not a good fit for this case,
| however, I would like to add some color to the picture.
|
| The most powerful way to do C(++) - lua interop is not the
| official C API but luajit/FFI: https://luajit.org/ext_ffi.html
|
| This allows for allocation of C objects on the heap and on the
| stack and FAST function calls. Doing the same for C++ is possible
| but requires some work, e.g.
| http://lua-users.org/lists/lua-l/2011-07/msg00492.html
|
| Furthermore:
|
| - unfamiliar syntax -- The syntax is tiny -- and I found nothing
| unexpected about it.
|
| - no classes -- There are many class libraries available for lua.
| Just pick one. Used penlight classes quite a bit, without running
| into major issues.
|
| - use tables for both hash tables and arrays. -- Yes. This is on
| the API side, under the hood hashes and arrays are used where
| appropriate.
| fullstop wrote:
| I grumbled about indexing starting at 1, but once you get used
| to it it makes a lot of problems easier.
|
| I've spent the last twenty-ish years in C and string
| manipulation just sucks. Think about it, you declare a buffer
| of length 20, the indexes are from 0->19, and the 19th byte
| needs to be a null if you are using it as a string and are
| using the entire buffer. Also, the standard library is not
| guaranteed to null terminate in all situations.
|
| Lua's string indexing feels far more natural to me.
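
A minimal sketch of the 20-byte buffer case described above, written in Python as a pure model of C's `strncpy` semantics. The helper name `strncpy_model` is hypothetical, it is not a binding to the C library, and it assumes `n` does not exceed the buffer length.

```python
def strncpy_model(dest: bytearray, src: bytes, n: int) -> None:
    """Pure-Python model of C's strncpy(dest, src, n); not a real binding."""
    # Copy at most n bytes, stopping at the first NUL in src.
    end = src.find(b"\0")
    payload = src if end == -1 else src[:end]
    copied = payload[:n]
    dest[:len(copied)] = copied
    # C pads the remainder with NULs, but only if src was shorter than n.
    for i in range(len(copied), n):
        dest[i] = 0


buf = bytearray(b"?" * 20)          # 20 bytes, valid indexes 0..19
strncpy_model(buf, b"x" * 19, 20)   # 19 characters fit, index 19 becomes NUL
fits_terminated = buf[19] == 0

buf = bytearray(b"?" * 20)
strncpy_model(buf, b"y" * 20, 20)   # 20 characters fill the buffer completely
full_terminated = 0 in buf          # no NUL terminator is written in this case
```

In the second call the source fills the whole buffer, so nothing in it marks the end of the string, which is the situation the comment warns about.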
| tannhaeuser wrote:
| This. The insistence on using 0-based string offsets is
| purely a C thing (where it makes sense) inherited by
| languages that wanted to stay close to C or appeal to C devs
| (even though it does not make sense). An easy way to check is
| looking into awk which, as a DSL for string manipulation
| written itself in C, deliberately uses 1-based string
| offsets, and where many/most common string expressions
| collapse to a very compact form, which makes even more sense
| because empty string results are interpreted as false in
| conditions.
| jhvkjhk wrote:
| It's not a C thing, it's a math/utility thing.
|
| Dijkstra: Why numbering should start at zero
| https://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/E...
| corysama wrote:
| Which is funny because Lua got its 1-base from FORTRAN
| which, I believe, adopted it to make TRANslating math
| FORmulas easier.
| tovej wrote:
| I agree in part that it's a C thing. More accurately it's
| an offset from the array base thing.
|
| The first element is at index 0 because its address is
| base + 0 * sizeof(element)
|
| The second element is at index 1 because its address is
| base + 1 * sizeof(element)
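
The address formula above can be checked from Python with the standard `ctypes` module. This is only a sketch: the array contents and the helper name `element_at` are made up for illustration.

```python
import ctypes

Element = ctypes.c_int32
values = (Element * 4)(10, 20, 30, 40)   # contiguous C array of four int32
base = ctypes.addressof(values)


def element_at(index: int) -> int:
    """Read the element stored at base + index * sizeof(element)."""
    address = base + index * ctypes.sizeof(Element)
    return Element.from_address(address).value


first = element_at(0)    # offset 0 from the base
second = element_at(1)   # offset of one element from the base
```

With this layout, the index is simply the number of elements to skip from the base address, which is why the first element lands at index 0.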
| sporedro wrote:
| The indexing starting at 1 is something that has always
| annoyed me. I just haven't seen any other language make that
| decision, and I'm not sure the reasoning for it really
| outweighs the fact every other language just goes with 0 due
| to the origin of it.
|
| Lua is a great language for sure though.
| fullstop wrote:
| Pascal did, but I believe that was because index 0
| contained an 8-bit length. There was no need for null
| termination, and strings were limited to 255 bytes.
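
If the layout is as described above (a one-byte length stored at index 0, followed by the characters), it can be modelled in a few lines of Python. The function names are hypothetical and this is not taken from any particular Pascal implementation.

```python
def to_pascal_string(text: bytes) -> bytes:
    """Encode text with a one-byte length prefix at index 0 (a model only)."""
    if len(text) > 255:
        raise ValueError("an 8-bit length prefix cannot describe more than 255 bytes")
    return bytes([len(text)]) + text


def from_pascal_string(raw: bytes) -> bytes:
    """Decode by reading the length at index 0; characters start at index 1."""
    length = raw[0]
    return raw[1:1 + length]


encoded = to_pascal_string(b"Lua")
```

Under this model the first character naturally sits at index 1, and the 255-byte limit falls out of the single length byte.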
| vlovich123 wrote:
| Small correction. The 20th byte (at index 19) needs to be
| null.
|
| The mismatch with the English language and how people
| naturally count is definitely there and annoying. And yes,
| string manipulation in C is especially broken although I
| think the indexing is the smallest problem there.
|
| However, it's extremely natural when you think about it in
| terms of memory access. For example, in a 1-based indexing
| system, ptr[0] would point 1 character behind your pointer
| (weird) and ptr[-1] would point 2 back (wtf). Having the
| index map neatly to the offset makes a lot of sense to me. In
| fact, when I first started programming in VB6 20 years ago
| and only had a math background, the 1-based indexing was
| natural but I could never figure out why I had so many bugs
| related to array and string offsets.
|
| I'll also note that most programming languages are 0-based
| and interop with C is not really the goal (Java, JavaScript,
| Ruby, Python, etc). In fact, Python and Perl's string
| manipulation is some of the best out there and they are 0
| indexed.
| fullstop wrote:
| I figured that someone would pipe up about the 19th byte vs
| 19th index, and I'm glad. This is just semantics, but does
| index 0 represent the first byte or the zeroth byte?
|
| I completely agree regarding memory access, but would argue
| that strings and memory should not be treated in the same
| way. Having the index represent the length - 1 has caused
| countless off-by-one bugs [1] that would not have been
| there in the first place if string indexes started with 1.
|
| Java, JavaScript, Python, (maybe Ruby, I'm not fluent
| there) also bite the user if you attempt to index data outside
| of the string's range. C will happily index whatever you
| want, and these bugs can often remain hidden for decades.
|
| 1. https://cwe.mitre.org/data/definitions/193.html
| Joker_vD wrote:
| Index N represents "skip N elements". That gives you very
| easy additive/subtractive behaviour, unlike "how many
| numbers are there from 7 to 17?" scenarios.
|
| Sure, 17-7 gives you 10, but that's not the final answer,
| you have to add 1 to get the right answer, 11. Sorry, no,
| you actually subtract 1 and the right answer is 9. Wait,
| no: 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, okay, 11 in
| total, so you have to add 1, got it right the first time.
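
The 7-to-17 arithmetic above, as a small Python sketch. The two helper names are invented for illustration; the point is only that a closed range needs the `+ 1` while a half-open range does not.

```python
def count_inclusive(first: int, last: int) -> int:
    """Number of integers in the closed range [first, last]."""
    return last - first + 1


def count_half_open(start: int, stop: int) -> int:
    """Number of integers in the half-open range [start, stop)."""
    return stop - start


inclusive = count_inclusive(7, 17)     # 7, 8, ..., 17
half_open = count_half_open(7, 18)     # same numbers, written as [7, 18)
```

Python's own `range(7, 18)` follows the half-open convention, so its length is a plain subtraction.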
| quietbritishjim wrote:
| Another way to look at it, in languages like Python that
| support slicing, is that the indices refer to the
| boundaries between the elements rather than the elements
| themselves.
|
|      a   b   c   d
|    0   1   2   3   4
|
| From this 4 character string you can slice s[1:3] and get
| 'b', 'c'. s[i:j] will always have j - i characters
| (ignoring negative indices). s[i] is as if it were short
| for s[i:i+1] e.g. s[3] is s[3:4] and that gives you 'd'.
|
| Admittedly C++ doesn't have slicing so this is less
| relevant, but I think it's still an interesting aspect to
| the discussion (without getting bogged down in ideology).
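
The slicing behaviour described above can be tried directly in Python. This is a sketch using the same four-character string; `slice_length` is a hypothetical helper and assumes `0 <= i <= j <= len(s)`.

```python
s = "abcd"

middle = s[1:3]          # characters between boundaries 1 and 3
single = s[3]            # same character as s[3:4]


def slice_length(i: int, j: int) -> int:
    """Length of s[i:j] for 0 <= i <= j <= len(s)."""
    return len(s[i:j])
```

For in-range, non-negative indices the slice length is always `j - i`, which is the property the boundary picture makes visible.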
| germandiago wrote:
| It has with std::span
| CraneWorm wrote:
| > [...] There are many class libraries available for lua. Just
| pick one. [...]
|
| Alternatively, embrace the prototype-based programming and
| just... don't add classes.
| pwdisswordfish8 wrote:
| In practice though, is prototype-based inheritance useful for
| anything other than implementing your own class system on top
| of it?
| CraneWorm wrote:
| I hold an opinion that, if you use the prototype-based
| approach, you should avoid inheritance like the plague ;)
| pwdisswordfish0 wrote:
| You should avoid it whether prototypes are involved or
| not.
| pwdisswordfish8 wrote:
| Also,
|
| > unfamiliar semantics: no classes, use tables, start indexing
| at 1 and other oddities, just as being able to call functions
| with the wrong number of arguments and returning nil on the way
|
| Other than 1-based indexing, those semantics should be very
| familiar from JavaScript.
| germandiago wrote:
| It still feels weird to me to call a table both an array and a
| hash table, and to do classes via metatables in very different
| ways. For sure it is powerful.
|
| But it does not fulfill the zero-friction goal I was looking
| for when integrating. You have to change your mindset a bit
| when integrating and using this stuff.
|
| In Wren you have lists (which are basically arrays),
| dictionaries, ranges and all the things I already know how to
| use from Python/C++.
|
| That said, inheritance does not work smoothly and ChaiScript
| does not even have inheritance itself. But for my purposes a
| class, concurrency and familiar data structures and patterns
| were enough.
| dahfizz wrote:
| Javascript is not known for its intuitive and sane design,
| and certainly not from the perspective of C/C++ devs
| tines wrote:
| The criterion was "familiar", which is nothing if not
| Javascript -- nobody said anything about intuitive or sane
| :)
| gigel82 wrote:
| JavaScript would definitely be my first choice if I had to
| integrate another scripting language into a native program today
| (doubly so if the eventual target is web, like the author implies
| with WebAssembly support).
|
| Depending on your needs (small vs. fast, low memory usage vs.
| full JIT, etc.) you can pick anything from JerryScript / QuickJS
| all the way to V8.
|
| Perhaps the only thing missing is a universal C/C++ API for
| embedding JavaScript engines that lets you swap out easily to
| test different trade-offs.
| TonyTrapp wrote:
| Developers working with C-like languages might not like Lua
| because of its different looks and behaviour. I was in the same
| situation and started looking into Squirrel because of that. But
| eventually I went back to Lua, for one important reason: The
| ecosystem. Lua has a lot of adjacent tools and a huge community.
| If you want to do something in Squirrel, you will be much more on
| your own, which can be frustrating. Lots of Lua's quirks can be
| easily worked around, but lack of a community supporting the
| language can't. This is especially important if you want to open
| your scripting API to users of your software.
| germandiago wrote:
| I think what you say is true. Anyway, I do not need
| state-of-the-art technology in my case. Basically, what I really
| wanted was this (in order of importance, with the first two
| points being by far the most important):
|
| 1. Concurrency support
|
| 2. Easy to bind into C++
|
| 3. Familiarity, etc.
|
| Namely, if only Lua had existed, then given that Sol2 exists
| and makes binding it to C++ easy, I would have chosen that. But
| since Wren + WrenBind17 existed, Wren had more familiar syntax
| and it was a viable choice, I went for that. I was trying to
| find the path of least resistance (lower learning curve, easier
| to bind, concurrency making my code easier, since I am already
| familiar with most patterns).
|
| As for ChaiScript, it was the first thing I took since it was
| so easy to embed. But it had its own problems: lack of
| concurrency and it does not report the file and line of errors,
| which is _very_ painful because it drops your productivity.
|
| And scripting... scripting is about productivity, at least that
| is what I was using it for.
| germandiago wrote:
| Constructive feedback for the article is welcome. Thanks!
| Rochus wrote:
| Did you have a look at e.g.
| https://root.cern.ch/root/html534/guides/users-guide/CINT.ht...
| and https://root.cern/cling/?
|
| > _Lua ... It has unfamiliar syntax ... start indexing at 1_
|
| Syntax is not unfamiliar, just more Pascal like; if you use
| LuaJIT you can use zero based indices and a powerful FFI for
| direct C code integration.
| pierrec wrote:
| > if you use LuaJIT you can use zero based indices and a
| powerful FFI
|
| Not entirely, Lua standard libraries still expect everything
| to be one-indexed, while FFI structures are zero indexed. So
| with LuaJIT you often end up with a mix of 0 and 1 indexed
| code, which in my experience was workable but definitely a
| pain point.
| Rochus wrote:
| You should avoid the Lua C API in LuaJIT because it is not
| supported by the JIT (i.e. it makes your code run in the
| interpreter instead of the JIT). Using zero based
| indices in Lua code running on LuaJIT works well.
| pierrec wrote:
| I'm not talking about the Lua C API, but things as simple
| as this:
|
|     > stuff = {"one", "two", "three"}
|     > stuff[1]
|     one
|
| These native Lua structures are baked in, and they're a
| lot more flexible than FFI structures - presumably if
| you're using Lua, it's because you want to take advantage
| of that flexibility and those affordances. FWIW I've
| written a lot of LuaJIT, and usually I kept the lower-
| level FFI stuff separate from the higher-level code using
| Lua data structures, so I rarely encountered that
| discrepancy between them, but still something to keep in
| mind.
| Rochus wrote:
| You can e.g. do
|
|     > stuff = { [0]="one", [1]="two" }
|     > print(stuff[0])
|     one
|
| Works well; I wrote e.g.
| https://github.com/rochus-keller/Smalltalk#a-smalltalk-80-in...
| that way.
|
| EDIT: even this works
|
|     > stuff = { [0]="one", "two", "three" }
|     > print(stuff[0]) -> one
|     > print(stuff[1]) -> two
| pansa2 wrote:
| > _stuff = { [0]= "one", "two", "three" }_
|
| In this case, is it possible to make iteration start with
| the element at index 0? Maybe by implementing a custom
| version of `ipairs`?
| Rochus wrote:
| When using
|
|     > stuff = { [0]="one", "two", "three" }
|     > for k,v in pairs(stuff) do print(v) end
|
| it prints all three elements in the correct order. I
| rarely use iterators for performance reasons anyway.
| Instead of ipairs one can use
|
|     > for i=0,#stuff do print(stuff[i]) end
| germandiago wrote:
| Well, by this I mean "unfamiliar to me", of course. Lol.
|
| Actually Lua is something to consider from the point of view
| of usage: it is an industry standard actually. However, all
| those small quirks in semantics... and classes can be done in
| many ways (that is what I understand, via metatables)...
|
| In ChaiScript or Wren there is one true way and you are done.
| You might like it or not, but it leads to less confusion,
| especially if you use most of the time what is in the
| mainstream.
|
| This is by no means a bad thing in itself, it is just about
| how ergonomic or time-consuming it could be for myself: I
| just feel more comfortable with ChaiScript, Wren or Squirrel
| than with Lua. Even AngelScript is also more similar to what
| you already have. So when exposing APIs there is much less
| friction.
|
| Truth be told, there is also
| https://github.com/ThePhD/sol2 which looks great and
| something to consider. It makes binding things much easier
| and gives you object-oriented Lua. You could rely on that.
|
| It was just my subjective choice. There is no 100% right
| choice. Probably, if I found people that are comfortable with
| Lua I would use that. But the case is that this is a project
| of mine as it stands now.
| jcelerier wrote:
| > Syntax is not unfamiliar, just more Pascal like;
|
| how is that not unfamiliar
| coldtea wrote:
| In that it still refers to a hugely popular family of
| languages...
| jcelerier wrote:
| Chinese is also a hugely popular language, does not mean
| that it is familiar to a large amount of humans
| coldtea wrote:
| The analogy breaks as Pascal knowledge is not confined to
| one geographical area or ethnicity. The same for
| languages inspired by Pascal syntax, of which there are tons.
|
| No matter how you slice it or dice it, Pascal and Pascal-
| like syntax are not some obscure niche languages...
| Rochus wrote:
| Well, why would you then consider Python to be familiar?
| oblio wrote:
| Because Python is 100000x more popular than Pascal in
| 2021?
| Rochus wrote:
| Python syntax is more similar to Pascal than to e.g. Java
| or JS.
|
| "_Modula-3 is the origin of the syntax and semantics
| used for exceptions, and some other Python features._"
| (from
| https://docs.python.org/3/faq/general.html#why-was-python-cr...).
| Also the predecessor languages ABC and SETL were in the Algol
| tradition.
|
| > _Python is 100000x more popular than Pascal_
|
| It's about factor 8 on https://www.tiobe.com/tiobe-index/
| or factor 3 (in score) on
| https://spectrum.ieee.org/static/interactive-the-top-program...,
| whatever you prefer as a reference. Delphi (which is Object
| Pascal) is still a widely used language.
| oblio wrote:
| I generally go by job listings. It's trivial to find
| Python jobs, Pascal/Delphi jobs are very rare.
| Rochus wrote:
| Maybe you can post a link to a job site where there are
| _100000x_ more Python than Pascal/Delphi/Ada jobs. Btw.
| you can filter the IEEE ranking by jobs which seems to
| correspond well with what I see on monster or indeed.
| Anyway, the discussion was about whether the Lua or
| Pascal style syntax is unfamiliar or not.
| germandiago wrote:
| You make a good point. I started my Computer Science and
| Engineering degree (I am European, not American, so the
| equivalent looks like kind of a merge of both areas) with
| Python, C and C++ on the programming side.
|
| Pascal was discarded a few years back in my university.
| And yes, by familiar I mean exactly what you mean: nowadays
| you see Java, Python, C++, C, C#, but Pascal has
| disappeared.
|
| It disappeared so long ago that I do not even know the
| syntax myself from casual reading around.
| [deleted]
| jcelerier wrote:
| If coming from C++ I definitely wouldn't, especially the
| module system and reference binding in python is WEIRD.
| I'd say C, C#, maybe D and Java would fit ? My criterion
| would be "can a new grad student who only learned c++ be
| productive in a couple days"
| Rochus wrote:
| There may be a difference between your view and that of
| the majority of developers. My primary language is also
| C++, but languages of the Pascal family (to which Python
| is related) remain very popular.
| germandiago wrote:
| As for looking at CINT, CLing, yes I did. I prefer to use a
| dynamic language with coroutines out of the box. It is
| actually what I was looking for besides ease of binding it
| and "familiarity" in semantics/syntax in a broad, imprecise
| way I defined for myself.
| Zababa wrote:
| > Prefer dynamic to static typing, since static typing can
| remove the coding speed: it makes you think about types.
|
| I'm wondering what you mean exactly by this. Do you not think
| about types when programming with a language with dynamic
| typing?
| germandiago wrote:
| No, I think I expressed myself the wrong way.
|
| What I mean is that if you have to annotate all your code
| with types (like in AngelScript), this will slow you down for
| two reasons. First, you need to think about types, and
| second, refactoring is more rigid.
|
| If it is optional, it is ok, you can take advantage of it at
| will (ChaiScript supports types in parameters, but
| optionally).
| Zababa wrote:
| Thanks, that clarifies it and makes sense.
| pwdisswordfish8 wrote:
| I kind of expected to see QuickJS or Duktape here; if the author
| considers 'something like javascript-ey' to be just fine, he
| might as well have used JavaScript itself, with all its strengths
| and faults.
| germandiago wrote:
| Well. I think I was a bit inaccurate. When I said javascript-ey
| what I meant is also familiar. Something like Squirrel and Wren
| do a good job.
|
| The ones you mentioned, as far as I investigated, were not
| dead-easy to integrate into C++, one of the top requirements.
|
| Take into account that I have to expose my own types, not just
| ints and basic types.
|
| The original API was coded in a natural C++ way. APIs that wrap
| well in that sense could be Sol2 for Lua, Chaiscript,
| Wrenbind... which integrate with custom types and smart
| pointers.
|
| With other scripting languages and their libraries you need
| additional work
| pwdisswordfish8 wrote:
| Duktape's host API is pretty much a ripoff of Lua's, and the
| latter wasn't rejected on those grounds, so...
|
| QuickJS's API isn't particularly well-documented, but it's
| not hard to find your way around it either if you dig into
| the source (the engine is very hackable, too; you might even
| fix some of the language's design flaws - obligatory wat talk
| reference - if you're so inclined). The host API follows the
| CPython model, with objects represented by pointers and
| explicit reference counting on the C side. There are some
| predefined macros to ease defining built-in classes. Some
| type-level hackery in C++ might ease things even further. I
| don't know how much deader-easier you want it.
| germandiago wrote:
| Take a look at how pybind11, wrenbind17, sol2 or Chaiscript
| do it. That is how easy I want it: I can expose custom
| types and global state easily. I do not want just ints and
| const char */double.
|
| These bindings do from decent to great.
| pansa2 wrote:
| > _Python [...] could be difficult to port to Web Assembly down
| the road_
|
| The Pyodide project has already compiled CPython to WebAssembly -
| why is that a worse solution than compiling one of these other
| scripting language interpreters to WASM?
| pansa2 wrote:
| One issue could be size - CPython's native binary is an order
| of magnitude larger than Lua's, and the same is probably true
| when using WASM.
|
| Perhaps something like MicroPython could solve that, though.
| zurn wrote:
| Indeed, Python is one of the most well behaved scripting
| languages for WebAssembly, and people have been running it in
| the browser for a good while already with WebAssembly
| predecessors (emscripten and asm.js).
| germandiago wrote:
| This was not my information at the time. But thanks for the
| info, it is helpful.
|
| With https://github.com/pybind/pybind11 there is really great
| integration with C++ and Python is my second home after C++
| actually.
|
| Anyway, I am quite happy with Wren and it seems to be fast
| (not a requirement for my project, though)
| tyingq wrote:
| While it "works", Python under WASM means downloading a very
| large interpreter and runtime and waiting quite a long time for
| it to start up.
|
| On my i5 laptop, this demo downloads about 8 MB and takes a
| couple of seconds to load up:
| http://karay.me/truepyxel/demo.html
|
| Lua, by comparison, is very small and has a fast startup under
| WASM.
___________________________________________________________________
(page generated 2021-05-24 23:01 UTC)