[HN Gopher] Oxidizing bmap-tools: rewriting a Python project in ...
___________________________________________________________________
Oxidizing bmap-tools: rewriting a Python project in Rust
Author : glenngillen
Score : 57 points
Date : 2023-03-04 11:43 UTC (11 hours ago)
(HTM) web link (www.collabora.com)
(TXT) w3m dump (www.collabora.com)
| UncleEntity wrote:
| > Usually a project is oxidised into Rust because of many
| reasons, the main usually being memory safety.
|
| What about python's memory model is unsafe?
| pohl wrote:
| Does Python do anything to enforce mutually exclusive access
| when mutating? If not, that's a hole you could drive a truck
| through, isn't it?
| gpm wrote:
| Isn't Python still run under a single global interpreter
| lock? Can't have simultaneous access while mutating if only
| one thing is running at a time...
| pohl wrote:
| Yeah, that's something, at least. Wouldn't the order that
| mutations happen still matter, even though they have to
| acquire a lock? Not a pythoneer, myself.
| masklinn wrote:
| Not in the context of memory safety. You can still have
| race conditions up the ass, but not data races, unless
| you're using a native library which 1. releases the GIL
| and 2. is broken.
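The distinction drawn above (race conditions remain possible, memory corruption does not) can be sketched in pure Python. This is only an illustration under stated assumptions: `overdraw_demo` and its bank-balance scenario are hypothetical, and a `threading.Barrier` is used to force the unlucky interleaving so the outcome is reproducible instead of timing-dependent.

```python
import threading


def overdraw_demo(start=100, amount=70):
    """Two threads each withdraw `amount` using a check-then-act sequence."""
    state = {"balance": start}
    approved = []
    # Hypothetical device to make the bad interleaving deterministic:
    # neither thread proceeds until both have read the balance.
    both_have_read = threading.Barrier(2)

    def withdraw():
        seen = state["balance"]               # read
        both_have_read.wait()                 # the other thread reads the same value
        if seen >= amount:                    # check, based on stale data
            state["balance"] = seen - amount  # write, overwriting the other update
            approved.append(amount)

    threads = [threading.Thread(target=withdraw) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["balance"], sum(approved)
```

Calling `overdraw_demo()` approves 140 in withdrawals against a starting balance of 100 and leaves a balance of 30: a logic bug caused by ordering. The dict and the list are still perfectly valid objects afterwards, which is the memory-safety point being made.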
| [deleted]
| pohl wrote:
| Interesting, I hadn't realized how much the phrase
| "memory safety" understates what is desirable.
| [deleted]
| [deleted]
| gpm wrote:
| The thing is that races are _good_ a lot of the time. If
| I have a set of tasks running in parallel that take an
| unknown/variable amount of time and I want to tell the
| users which ones are finished, my output _needs_ to be
| based on a race between the tasks. If I'm scraping a
| website, I (may) want to have multiple connections going
| in parallel, and as soon as one of those connections
| spots a new link I (may) want to open a new connection to
| start scraping it, but I don't know which connection is
| going to spot a new link first, so there's a (benign)
| race condition.
|
| Making a language that banned them outright would be
| making a language that couldn't do things that people
| wanted to do.
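A minimal sketch of the "report whichever task finishes first" case described above, using the standard library's `concurrent.futures.as_completed`. The names `fake_task` and `completion_order` are made up for illustration, and `time.sleep` stands in for real work of unknown duration.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_task(name, seconds):
    """Stand-in for real work that takes an unpredictable amount of time."""
    time.sleep(seconds)
    return name


def completion_order(tasks):
    """Run all tasks in parallel and return their names in finishing order.

    `tasks` maps a task name to its (pretend) duration in seconds.
    """
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = [pool.submit(fake_task, name, secs) for name, secs in tasks.items()]
        # as_completed yields futures as they finish, so the resulting order
        # is decided by a race between the tasks, which is intentional here.
        return [future.result() for future in as_completed(futures)]
```

For example, `completion_order({"slow": 0.2, "fast": 0.0})` will normally list `"fast"` first, but the order is a property of the run, not of the program text. That is the benign kind of race the comment is defending.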
| Yoric wrote:
| I figure that you could very easily mark which race
| conditions are good.
| gpm wrote:
| Matter? Sure, there can be race conditions.
|
| But allow for memory unsafety? No, not if every ordering
| of the "critical sections" (chunks of code run as a unit
| while the interpreter is locked) is valid and upholds the
| invariants Python expects.
| tialaramex wrote:
| For so long as the GIL persists, you are correct, and thus
| Python does not have data races and is able to achieve
| memory safety in this regard.
|
| It is conceivable (but extremely unlikely, 'cos it was
| really, really hard) that after a GILectomy Python follows
| the Java path, in which data races are technically safe+.
| However it is most likely Python with a GILectomy will
| behave like Go or C# or numerous other languages and lose
| memory safety properties if a data race occurs.
|
| + Data races can happen in Java, and _astonishing things_
| might happen, but objects always remain in some valid
| state, so there is no loss of memory safety whereas in most
| languages with data races you can e.g. race a hash table
| and mess up its internals and cause chaos.
| brundolf wrote:
| The article felt kind of disjointed, I think that statement was
| just meant generally and not meant to suggest it applies here
| UncleEntity wrote:
| Yeah... It's becoming a pet peeve that the Rustafarians
| believe they have a monopoly on "memory safety" and need to
| point it out all the time.
| shaunsingh wrote:
| It's worth pointing out because rust has a monopoly on easy-
| to-write gc-less memory safety, with the alternatives being
| modern c++ or higher level languages where you run into a
| garbage collector
| masklinn wrote:
| Also how I interpreted it, though even there it's quite weird
| (e.g. better performance is also a common reason to convert
| things to Rust, especially when "easy binding" tools like
| pyo3, neon, or rustler are available and take care of the
| unsafe bits between the two).
| creddit wrote:
| Wow I really like the terminology "oxidizing" for re-writing
| something in Rust.
|
| Sorry for the unsubstantive comment.
| claytonjy wrote:
| Which Rust-written Python tools are folks using?
|
| I know of two big ones: ruff (linting) and pyflow (dependency
| management). The standard lib crypto module uses rust, too.
|
| Are there other ones I should know about? Maybe replacements for
| mypy, pre-commit, tox/nox?
| 1f60c wrote:
| > The standard lib crypto module uses rust, too.
|
| This couldn't matter less, but I think you're confused with the
| third-party Cryptography package, which uses Rust.
| claytonjy wrote:
| My bad, thanks for the correction!
| bogeholm wrote:
| polars, pydantic and deltalake come to mind
|
| - https://pypi.org/project/polars/
|
| - https://pypi.org/project/pydantic/
|
| - https://pypi.org/project/deltalake/
| jamincan wrote:
| Does Pydantic use rust? When I check the github repo, it
| shows 100% python.
| claytonjy wrote:
| He's working on a rust rewrite, to be used in Pydantic 2.0
| ilovecaching wrote:
| I'm confused... you're talking about avoiding a local copy of sparse
| regions... Linux already does that at the level of the inode.
| There's also a seek operation to move past the next hole. Not
| sure why you would carry around metadata the filesystem is
| already tracking for you.
| masklinn wrote:
| > Not sure why you would carry around metadata the filesystem
| is already tracking for you.
|
| Because bmap files are independent of the filesystem _and_ OS,
| and thus would probably like to work even with filesystems
| which don't support sparse files, and OS which don't expose
| holes?
|
| For instance until NFS 4.2 in 2016 you could write sparse files
| to an NFS volume, but there was no way to detect holes when
| reading. exfat doesn't support sparse files at all. And
| according to their man pages, OpenBSD and NetBSD have yet to
| support SEEK_HOLE/SEEK_DATA (which are non-standard extensions
| of POSIX lseek(2)).
|
| Plus according to its history the bmaptools project was created
| about a year after the release of kernel 3.1, which introduced
| support for SEEK_HOLE and SEEK_DATA. Doesn't take much of a
| leap to assume that the project's creator didn't consider that
| widespread enough to be reliable (Debian wouldn't release a
| 3.x-based version until the following year).
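For reference, this is roughly what relying on the filesystem's own hole tracking looks like, as a sketch only. `data_extents` is a hypothetical helper; `os.SEEK_DATA` and `os.SEEK_HOLE` are only defined by Python on platforms that provide them, so the sketch falls back to treating the whole file as data elsewhere, which is exactly the portability gap described above.

```python
import errno
import os


def data_extents(path):
    """Return (start, end) byte ranges that the OS reports as containing data."""
    size = os.path.getsize(path)
    if not (hasattr(os, "SEEK_DATA") and hasattr(os, "SEEK_HOLE")):
        # Platform does not expose holes: assume everything is data.
        return [(0, size)] if size else []

    extents = []
    fd = os.open(path, os.O_RDONLY)
    try:
        offset = 0
        while offset < size:
            try:
                start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as exc:
                if exc.errno == errno.ENXIO:  # nothing but a hole until EOF
                    break
                raise
            # End of file counts as an implicit hole, so this always succeeds.
            end = os.lseek(fd, start, os.SEEK_HOLE)
            extents.append((start, end))
            offset = end
    finally:
        os.close(fd)
    return extents
```

On a filesystem that does not track holes, a call like this typically reports one extent covering the entire file, so the information a bmap file carries cannot be recovered this way.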
| andrewshadura wrote:
| Seeking through holes also doesn't work very well for
| compressed images, since usually there is no way to tell
| apart an insignificant hole from a long sequence of zeroes or
| other filler data.
| hummus_bae wrote:
| [dead]
| agildehaus wrote:
| An example from my work:
|
| We have a Yocto build that results in about 120MB worth of
| files that make up our app and Yocto. Originally we had a
| script that would write a bootloader, partition and format ext4
| our target's eMMC, and decompress a 120MB tarball to that
| filesystem.
|
| That worked well, but we wanted our script to become OS-
| independent, as our field team ran Windows laptops. It's quite
| difficult to get Windows to do an ext4 format, and I wanted our
| tool to have a minimal number of dependencies (e2fsprogs
| requirement? some proprietary thing from Paragon? no thanks)
|
| So instead, have Yocto produce an image containing the
| bootloader and all four pre-formatted ext4 filesystems. No
| operating system needs to do the format if the filesystem
| already exists within, it's just a raw block write. But now the
| image is 4GB, the size of our eMMC, and writing all of it would
| be painfully slow.
|
| Thankfully Yocto also outputs a bmap file which maps the parts
| of that 4GB which are empty space -- blocks we don't need to
| write when commissioning our target device. So our
| commissioning tool was rewritten in Go, and I wrote a bmap
| implementation in Go to do the write. Flashing our target is as
| fast as it used to be, but now that tool can be easily made to
| work on multiple operating systems.
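The core of such a writer can be sketched in a few lines. This is not the Go tool described above or bmaptool's actual code: `copy_mapped` is a hypothetical function, and it assumes the block size and the list of inclusive `(first_block, last_block)` ranges have already been read from the bmap file. Real bmap files also carry checksums for the mapped ranges, which this sketch ignores.

```python
def copy_mapped(src, dst, block_size, ranges, chunk_size=1024 * 1024):
    """Copy only the mapped block ranges from src to dst.

    src and dst are seekable binary file objects; ranges is an iterable of
    inclusive (first_block, last_block) pairs. Returns the bytes written.
    """
    written = 0
    for first, last in ranges:
        offset = first * block_size
        remaining = (last - first + 1) * block_size
        # Jump straight to the mapped region on both sides; everything in
        # between is skipped and never written to the target.
        src.seek(offset)
        dst.seek(offset)
        while remaining > 0:
            chunk = src.read(min(remaining, chunk_size))
            if not chunk:  # image ended early, e.g. a partial final block
                break
            dst.write(chunk)
            written += len(chunk)
            remaining -= len(chunk)
    return written
```

With a 64-byte image of 4-byte blocks where only blocks 0-1 and 10 are mapped, `copy_mapped(src, dst, 4, [(0, 1), (10, 10)])` writes 12 bytes instead of 64. Scaled up to a mostly empty 4GB image, that is where the time saving comes from, on the assumption that whatever already sits in the unmapped blocks of the target does not matter.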
| masklinn wrote:
| There's very little content to the article sadly, aside from
| links to the artefacts.
|
| It also seems to have been done independently of the upstream, so
| it's not really an "oxidation" in the usual terms, more of a
| pseudo-fork of the specific `bmaptool copy` subcommand (though
| TBF it only has one other subcommand which is `create`, and the
| implementation in the upstream is about 1/3rd that of copy, so
| copy is clearly the "meat" of the project).
___________________________________________________________________
(page generated 2023-03-04 23:00 UTC)