[HN Gopher] Faster: Fast persistent recoverable log and key-valu...
___________________________________________________________________
Faster: Fast persistent recoverable log and key-value store
Author : LAC-Tech
Score : 77 points
Date : 2024-02-25 01:27 UTC (21 hours ago)
(HTM) web link (github.com)
| withinboredom wrote:
| FASTER is awesome! I just wish the C version was as feature
| complete as the C# version. I know how to use FFI to call C
| stuff, but I have no idea how to call C# stuff. It's the only
| reason I haven't used it in real life.
| neonsunset wrote:
| You could, in theory, export certain methods with
| [UnmanagedCallersOnly] and AOT* compile it - those become plain
| C exports. Alternatively, you can host .NET runtime within C++
| process and call arbitrary methods from the loaded assemblies.
| Or you could not deal with all that and just use C# :) (which
| comes with an advantage of not using something worse)
|
| * - I don't know if it uses anything requiring JIT but likely
| no or limited to certain features.
| withinboredom wrote:
| I've specifically gone hunting for this documentation and
| never found it. Thanks for the tip!
|
| I love C#, but I've been in PHP/Go land for a while now. PHP
| is an interesting language, I'd have never thought I'd like
| it as much as I do.
|
| It'd also be cool to see a Go implementation, but there's
| also cgo.
| neonsunset wrote:
| :(
|
| But in case you do go the first route, the docs are here:
|
| - https://learn.microsoft.com/en-us/dotnet/core/deploying/nati... +adjacent sections
|
| - https://learn.microsoft.com/en-us/dotnet/api/system.runtime....
|
| This pretty much comes down to writing glue exports the way
| you would do in Rust and then 'dotnet publish'ing it as
| .dll/.so (you could also produce .lib/.a for static linking
| but it is trickier).
|
| Overall, I feel like the ungodly number of projects of this
| kind (albeit of lower complexity) written in Go (which is a
| weaker platform) could have benefitted from using C#
| instead - it has zero-cost abstractions through struct
| generics like Rust and allows expressing complex data
| structures in a terser way.
| withinboredom wrote:
| I think a lot of what go has going for it is in static
| compilation. Things just work. The C# community leans a
| lot further from open source (paid libraries vs. free).
| neonsunset wrote:
| Go receives so much unjustified good will it is almost
| unbelievable...
|
| Either way, if you do look at current state of .NET
| ecosystem, you may get surprised. But I guess, such is
| the perception of the public that may have read a bit too
| much into Go's promises (it would have abysmal
| performance should FASTER have been written in Go) - C#,
| seen as less popular weird Java, is now an underdog after
| all (look at Github LOC statistics).
| withinboredom wrote:
| > Go receives so much unjustified good will it is almost
| unbelievable...
|
| Yes. Yes it does and it really is annoying. I can't tell
| if it is the language or just people not deeply
| understanding the abstractions it provides because it
| doesn't use sane defaults (or the defaults are geared
| towards high-throughput, google scale nonsense).
|
| I help out a lot on FrankenPHP, and lately I've been
| digging into a weird bug, deep in Go, that causes FIN_ACK
| packets to get delayed by hundreds and hundreds of ms.
| There are so many layers of abstractions (nearly 100) to
| dig through. I know for a fact that in C# there are fewer
| than 10 down to the CLR; then, once you can rule out the
| language, you are just tracing IR. But no, I'm digging
| through hand-written asm, hunting for a bug or at least
| trying to figure out how to report it with enough information that
| it doesn't sound like "my email can't be sent more than
| 500 miles" (google that one btw).
| apgwoz wrote:
| > I've been digging into a weird bug, deep in Go, that
| causes FIN_ACK packets to get delayed by hundreds and
| hundreds of ms.
|
| Just going to put it out there: have you considered the
| problem might actually be [Nagle's Algorithm](https://en.wikipedia.org/wiki/Nagle's_algorithm)?
| This algorithm is
| the source of 10s of thousands of hours of wasted
| debugging time of "weird bugs related to latency of
| networking." Even the "greats" have wasted time debugging
| something that ended up being "I forgot about Nagle."
| withinboredom wrote:
| Yes. Go doesn't even offer the ability to use Nagle's
| algorithm:
|
| https://withinboredom.info/2022/12/29/golang-is-evil-on-shit...
|
| Nagle's algorithm is dealing with connections. These are
| FIN_ACK. Meaning the connection was told to close and we
| need to ack the closed connection.
| kjksf wrote:
| Of course it does: just call TCPConn.SetNoDelay(false).
|
| See https://github.com/golang/go/blob/master/src/net/tcpsock.go#...
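For readers unfamiliar with the option under discussion: Go's TCPConn.SetNoDelay toggles the TCP_NODELAY socket option, so SetNoDelay(false) leaves Nagle's algorithm enabled and SetNoDelay(true) disables it. A minimal sketch of the same socket option, written in Python purely for illustration (the helper names set_nagle and nagle_enabled are hypothetical, not part of any library):

```python
import socket


def set_nagle(sock: socket.socket, enabled: bool) -> None:
    """Enable or disable Nagle's algorithm on a TCP socket (hypothetical helper)."""
    # TCP_NODELAY=1 disables Nagle; TCP_NODELAY=0 leaves it enabled.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 0 if enabled else 1)


def nagle_enabled(sock: socket.socket) -> bool:
    """Report whether Nagle's algorithm is currently active on the socket."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) == 0


# The option can be set on an unconnected socket, so no network is needed here.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
set_nagle(sock, enabled=False)
state_after_disable = nagle_enabled(sock)
set_nagle(sock, enabled=True)
state_after_enable = nagle_enabled(sock)
sock.close()
```

As the reply below points out, the practical difficulty is less the option itself than getting hold of the underlying connection object from several layers up.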
| withinboredom wrote:
| You have to get access to the TCP connection first ...
| which, once you get more than a few layers above that, it
| is impossible to get to.
| jddj wrote:
| We've arrived at a weird spot.
|
| There's a lot of very good software written in Go now.
| Much of that software could benefit (performance wise, at
| least, leaving ergonomics aside due to subjectivity) from
| being written in C# instead.
|
| But does Go deserve some credit for being associated with
| those projects, or was it marketing and the Google
| effect?
|
| .net AOT is still quite young, so maybe the tide could
| turn somewhat.
| ReflectedImage wrote:
| I think language simplicity is a blind spot for most
| developers. Programming languages shouldn't be judged
| just based on their performance (otherwise assembly
| language would be #1) but also how simple they are. The
| simpler the better.
|
| Go is simpler than C# and that's giving it the advantage
| over C#.
| neonsunset wrote:
| That's what makes it worse. Go was designed, in a way, as
| a toy language to solve the assault on the codebase
| quality by all the fresh graduates Google was hiring. I'm
| not joking - this is paraphrasing Rob Pike. Go is a
| language that wastes my time.
| withinboredom wrote:
| There's also cgo. Say what you will, but an easy-to-use
| way to reach into highly performant libraries and
| existing code was smart from a language design
| perspective. But yeah, other than that, I agree with you.
| ReflectedImage wrote:
| Software developers tend to be far more productive and
| write less bug-filled code in "toy" languages. Programming
| language complexity is usually just plain bad for
| developers at any skill level.
| neonsunset wrote:
| More sophisticated design allows you to richly represent
| a certain problem and offer idiomatic way of solving it
| rather than having you do extra 200 LOC of boilerplate
| like open-coded loops and if err != nils. Not even
| mentioning a dozen DSLs people keep inventing in Go
| ecosystem - something that is usually a sign of language
| weakness, similar to Ruby.
| ReflectedImage wrote:
| Generally speaking, more sophisticated designs perform
| worse both on time to implement and on code correctness.
|
| That's just the way it usually shakes out.
|
| DSLs are known in academic literature as 4th generation
| languages (whereas Go/C# would only be 3rd generation
| languages), they are a really good thing as long as you
| aren't the person implementing them.
| zer00eyz wrote:
| There are good reasons to give Go good will:
|
| Simplicity: Editor, GitHub, golang download. I have a
| working dev env for a LOT of use cases. Sudo apt install
| vim git wget tar. wget golang, untar, set env var. I have
| a script to do it for me on Debian boxes. Python, ruby,
| php are pretty close. I'm guessing C# is a bit more
| complicated but not by much.
|
| Dependency and library management. Go wins vs
| python(venvs), ruby, node, php... go mod and how it deals
| with pinning, pulling from GitHub. Again I don't know what
| C# has here but go feels both magical, and easy to
| understand on this front.
|
| go build / go run. The fact that it is this easy and fast to
| get to a running binary is impressive. I had a badly
| behaving container the other day and the residents of it
| were not giving back helpful errors. One go program (sub
| 100 lines) later I was getting usable error messages and
| quickly worked through the network issues. There are
| plenty of go apps that work like this! Mediamtx is great
| (RPI cam server) just grab the binary blob and go... The
| same thing in python is gonna be a lot more complicated.
|
| Testing: A friend of mine and I recently started a
| project together and his commentary after coming from
| working on large ruby and node projects is "how is this
| testing this fast". Go is eating its own dogfood here
| with its concurrency model. I'm guessing that C# can run
| the same way.
|
| Golang's good will isn't because it is the best at
| something. If we're comparing features go is gonna come in
| 2nd or 3rd or 4th every time. That's the thing, go is
| consistently very good at feature "X". It's not the best
| but it remains in the top 5. Speed, concurrency, compile
| time, ease of setup, ease of deployment, portability,
| scaling...
|
| Golang is the Toyota of programming languages. Lovable in
| its reliability.
| withinboredom wrote:
| Why do you bother commenting if you don't have any
| experience in the other language?
|
| With C#, you literally just `apt install` the cli and it
| keeps up to date. I write go every weekend, and spent
| years in C#. I still don't know how to properly install
| go and I have to run it in docker containers. It's so
| undocumented (or rather the documentation was lacking at
| least a few years ago and I've not bothered checking
| because my workflow works for me), that I got lost as a
| new go dev.
|
| As far as building and running ... C# is on par with Go.
| There's really not much difference, at least from the
| cli.
|
| Editors ... I dunno. I pay for a visual studio license
| and the IDE is simply magical (esp with resharper). I use
| goland for go, and these two IDEs are barely comparable
| in some respects.
|
| The testing story in C# leaves something to be desired. I
| would rather build a random project to test my code than
| write tests in C#. It's a mess over there.
|
| But yeah, if I were to start a project completely from
| scratch, I'd choose PHP over either of them, so maybe I
| don't know what I'm talking about.
|
| I'll see myself out.
| zer00eyz wrote:
| > It's so undocumented (or rather the documentation
| was lacking at least a few years ago and I've not
| bothered checking because my workflow works for me)
|
| Massive improvements from the days of early go. Between
| ease of install (Mac, Linux, haven't looked at windows in
| 2 ish years) and how it deals with packaging (go mod)
| there has been a ton of progress here.
|
| I have not used C# in at least a decade. My knowledge is
| dated! (python, ruby, node, c, and rust are all much more
| recent). It wasn't a bad language at the time but was very
| MS centric. And installing it on linux is a bit more
| complicated than "apt install" sadly.
|
| As someone who wrote PHP for years (and it paid me well)
| I would say that you should take a look at installing go
| "from scratch" on your local system (
| https://go.dev/doc/install ) and building a few throw
| away tools!
| ryanjshaw wrote:
| If you haven't used C# in a decade you missed that .NET
| Framework has been completely rewritten as open source
| .NET. To install, it's literally just:
|
| apt-get install dotnet-sdk-8.0
| neonsunset wrote:
| and then getting helloworld to run is literally
|
| dotnet new console -o MyConsole && cd MyConsole && dotnet run
|
| (dotnet run uses debug build by default, it's very
| similar to cargo)
|
| as for package management, it is
|
| dotnet add package {name}
|
| by far one of the best ones, and when you need to
| manually edit .csproj, it's as easy as cargo.toml.
| LAC-Tech wrote:
| Which one have you used, the kv store or log?
|
| The reason I submitted this is that I'm curious about people's
| real-world experience.
| withinboredom wrote:
| I was looking to use both, actually. I discovered FASTER when
| looking to port durable functions to php (it's called
| durable-php if you want to google it, though the
| implementation is nothing like it) and the netherite engine
| uses faster.
|
| It's perfect for my use-case, more so now than when I
| originally researched it as a possibility. Back then, I
| didn't even have threading solved for php. Now that's all a
| solved problem (threads ftw) and I'm refactoring log storage
| now to better support things like faster.
| adamretter wrote:
| Have you considered Meta's RocksDB as an option?
| withinboredom wrote:
| I wrote a time-traveling database (where you can query a
| table/row as of a specific point in time and join it to
| data at another point in time; we used this for AI
| training to predict future behavior in users) completely
| from scratch (that was the coolest work project ever,
| btw) that was built on Hadoop/Hbase. I understand RocksDB
| is fairly similar ... however, I want to stay as far away
| from any of those kinds of APIs. I have scars from
| dealing with hbase and writing query planners and
| figuring out how to do performant joins in a white-room
| type environment. No. Thank. You.
|
| It was fun at the time, but I don't want to go near it
| ever again.
| apgwoz wrote:
| [RocksDB](https://rocksdb.org/) isn't a distributed
| storage system, fwiw. It's an embedded KV engine similar
| to LevelDB, LMDB, or really sqlite (though that's full
| SQL, not just KV)
| withinboredom wrote:
| Yes, it's based on the same paper as hbase, IIRC.
| dgacmu wrote:
| To be perhaps overly detailed: Hbase is an open source
| approximation of bigtable. Bigtable _uses_ leveldb as its
| per-shard local storage mechanism; Rocks is a
| clone+extension of leveldb.
|
| Bigtable and hbase are higher level and provide
| functionality across shards and machines. Level and rocks
| are building blocks that provide a log-structured merge
| tree storage and retrieval mechanism.
| withinboredom wrote:
| > Bigtable _uses_ leveldb as its per-shard local storage
| mechanism
|
| Ah, that's probably what I'm conflating with it then.
|
| Thanks for the information.
| zokier wrote:
| Is this actually used anywhere by MS? Or is it just random
| research project (MSR?)
| withinboredom wrote:
| Durable Functions uses it in Netherite, which is how I
| originally discovered this library.
| LAC-Tech wrote:
| I'm curious about that too, apparently it was created for
| their SimpleStore research project.
|
| https://www.microsoft.com/en-us/research/project/simplestore...
| Retr0id wrote:
| I think I know what they're getting at, but "can quickly saturate
| disk bandwidth" doesn't exactly sound like a selling point, on
| its own!
| lomereiter wrote:
| Reminds me of a Russian joke.
|
| A secretary applies for a job; the interviewer asks her: "In
| your CV you claim that you can type 1000 characters per minute
| - for real?!" "Yes!", she replies, then adds in a low voice:
| "but such nonsense comes out..."
| ww520 wrote:
| SSDs have very good bandwidth, in excess of 10 GB/s.
|
| The bottleneck is often the CPU. At that rate, a 10 GHz CPU could
| spend only 1 cycle per byte. That's the scale we're at now.
|
| https://www.tomshardware.com/features/ssd-benchmarks-hierarc...
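The arithmetic behind the comment above can be checked directly. This is a rough sketch: the 10 GB/s figure comes from the comment, while the 4 GHz clock is an assumed, more typical value added for comparison, and the function name cycles_per_byte is hypothetical.

```python
def cycles_per_byte(clock_hz: float, bandwidth_bytes_per_s: float) -> float:
    """CPU cycles available per byte if one core must keep up with the disk."""
    return clock_hz / bandwidth_bytes_per_s


SSD_BANDWIDTH = 10e9  # 10 GB/s, the figure quoted in the comment

# The comment's hypothetical 10 GHz core: exactly one cycle per byte.
budget_10ghz = cycles_per_byte(10e9, SSD_BANDWIDTH)

# An assumed 4 GHz core: less than half a cycle per byte.
budget_4ghz = cycles_per_byte(4e9, SSD_BANDWIDTH)

print(f"10 GHz: {budget_10ghz:.2f} cycles/byte")
print(f" 4 GHz: {budget_4ghz:.2f} cycles/byte")
```

With a budget below one cycle per byte, a single core cannot touch every byte, which is why the CPU rather than the disk tends to become the limit.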
| klysm wrote:
| How is that not a selling point? You want the disk to be the
| bottleneck
| cout wrote:
| Not necessarily. For example, an uncompressed log will
| saturate disk more easily than a compressed log but if
| compression is fast enough the compressed log will write more
| data in the same amount of time.
|
| A more complex case: a column store might write in batches.
| Later an insert in the middle might require the entire batch
| to be read from disk and then rewritten. This makes queries
| faster later on but at the cost of more disk io up front. In
| this case disk bandwidth is also saturated but write
| performance might be worse than an append-only log that does
| not optimize at all for reads/queries.
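The compression case in the comment above can be put into a toy model. All numbers here are made up for illustration, and the model assumes compression is fully pipelined with disk writes, so the slower of the two stages sets the pace; logical_write_rate is a hypothetical helper.

```python
def logical_write_rate(disk_gbps, compressor_gbps=None, ratio=1.0):
    """Logical (pre-compression) GB/s that can be appended to the log.

    disk_gbps:       raw disk write bandwidth
    compressor_gbps: input rate the compressor can sustain (None = no compression)
    ratio:           compression ratio, logical bytes per physical byte
    """
    if compressor_gbps is None:
        return disk_gbps
    # The disk absorbs disk_gbps of compressed data, which represents
    # disk_gbps * ratio of logical data; the compressor may cap it sooner.
    return min(compressor_gbps, disk_gbps * ratio)


DISK = 2.0  # GB/s, assumed

uncompressed = logical_write_rate(DISK)
fast_codec = logical_write_rate(DISK, compressor_gbps=5.0, ratio=3.0)
slow_codec = logical_write_rate(DISK, compressor_gbps=1.0, ratio=3.0)

print(uncompressed, fast_codec, slow_codec)
```

In this model the uncompressed log saturates the disk at 2 GB/s, the fast codec writes more logical data without saturating it, and the slow codec is worse than no compression at all, which matches the "if compression is fast enough" condition.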
| avinassh wrote:
| How does this compare with RocksDB?
|
| Also, are there any performance benchmarks?
| emills wrote:
| There's a paper that does some benchmarking against other
| options here:
| https://www.microsoft.com/en-us/research/uploads/prod/2018/0...
| welder wrote:
| Looks like it's exponentially faster than RocksDB, but in most
| use cases the bottleneck would be the network and available
| sockets on the machine running FASTER. Unless you're doing high
| throughput embedded data ingestion. Maybe Neuralink could use
| this.
| ww520 wrote:
| RocksDB supports point query and range query. This only
| supports point query. Also I'm not sure whether FASTER supports
| transactions, as the paper didn't mention it.
| CyberDildonics wrote:
| I wish people would push back against using a generic adjective
| as a name. Naming something _"Faster"_ is just trying to get
| someone to remember by being intentionally confusing.
| hacknews20 wrote:
| What! Nonsense.
| CyberDildonics wrote:
| Well you do make a compelling argument.
| bravura wrote:
| Are there Python bindings? I couldn't find any.
| DenisM wrote:
| How does it compare to FoundationDB? Neither the paper nor the
| GitHub page mentioned it.
| _bohm wrote:
| It would be a bit of an apples-to-oranges comparison.
| FoundationDB is a distributed KV database that supports range
| queries. FASTER is an embedded KV store that only supports
| point lookups. The use cases for each are rather different.
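The difference between point lookups and range queries mentioned above comes down to whether the index keeps keys in order. A minimal sketch, using a plain Python dict as a stand-in for a hash-style index and a sorted key list for an ordered one (neither reflects the actual internals of FASTER, RocksDB, or FoundationDB; the keys and helper names are hypothetical):

```python
import bisect

# Hypothetical records keyed by string.
records = {"user:17": "a", "user:03": "b", "user:42": "c", "item:01": "d"}


def point_lookup(key):
    """Hash-style access: find one exact key, with no notion of neighbours."""
    return records.get(key)


# An ordered index additionally keeps the keys sorted.
sorted_keys = sorted(records)


def range_query(lo, hi):
    """Return (key, value) pairs with lo <= key < hi, in key order."""
    start = bisect.bisect_left(sorted_keys, lo)
    end = bisect.bisect_left(sorted_keys, hi)
    return [(k, records[k]) for k in sorted_keys[start:end]]


one = point_lookup("user:17")
# ";" sorts immediately after ":", so this range covers every "user:" key.
users = range_query("user:", "user;")
print(one, users)
```

A hash index can answer the first call cheaply but would have to scan everything to answer the second, which is why the two kinds of store suit different workloads.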
| siliconc0w wrote:
| I really like the idea of tiering a log device such that you go
| from memory -> nvme -> object storage. You get some nice
| properties like fast low latency commit times from NVMe, read
| your own local writes (usually good enough), but have most of
| your cold data on cheaper durable object storage with some
| intelligence to pull down/warm up pages when you suspect you'll
| need them.
|
| It'd be nice if this had a more language-agnostic frontend like
| gRPC or something.
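The tiering idea above can be sketched as a log whose oldest entries spill from memory to NVMe to object storage. This is a deliberately simplified illustration: the TieredLog class is hypothetical, plain dicts stand in for the three tiers, and durability, commit ordering, and the prefetch/warm-up logic the comment mentions are all left out.

```python
class TieredLog:
    """Toy tiered log: newest entries in memory, older on NVMe, oldest in object storage."""

    def __init__(self, memory_capacity: int, nvme_capacity: int):
        self.memory = {}        # offset -> payload (hot tier)
        self.nvme = {}          # offset -> payload (warm tier)
        self.object_store = {}  # offset -> payload (cold tier)
        self.memory_capacity = memory_capacity
        self.nvme_capacity = nvme_capacity
        self.next_offset = 0

    def append(self, payload: bytes) -> int:
        """Append an entry and return its log offset."""
        offset = self.next_offset
        self.next_offset += 1
        self.memory[offset] = payload
        self._spill()
        return offset

    def _spill(self) -> None:
        # Move the oldest entries down one tier whenever a tier is over capacity.
        while len(self.memory) > self.memory_capacity:
            oldest = min(self.memory)
            self.nvme[oldest] = self.memory.pop(oldest)
        while len(self.nvme) > self.nvme_capacity:
            oldest = min(self.nvme)
            self.object_store[oldest] = self.nvme.pop(oldest)

    def read(self, offset: int):
        """Return (tier_name, payload), checking the fastest tier first."""
        tiers = (("memory", self.memory), ("nvme", self.nvme), ("object", self.object_store))
        for name, tier in tiers:
            if offset in tier:
                return name, tier[offset]
        raise KeyError(offset)


log = TieredLog(memory_capacity=2, nvme_capacity=2)
for i in range(6):
    log.append(f"entry-{i}".encode())

newest = log.read(5)
middle = log.read(3)
oldest = log.read(0)
print(newest, middle, oldest)
```

After six appends with both capacities set to 2, the two newest entries sit in memory, the next two on NVMe, and the two oldest in object storage.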
| dang wrote:
| Related:
|
| _Faster A fast concurrent persistent key-value store and log, in
| C# and C++_ - https://news.ycombinator.com/item?id=25741670 - Jan
| 2021 (8 comments)
|
| _Faster - Fast key-value store from Microsoft Research_ -
| https://news.ycombinator.com/item?id=17785002 - Aug 2018 (76
| comments)
|
| _Faster - A key-value store for large state management_ -
| https://news.ycombinator.com/item?id=17267403 - June 2018 (34
| comments)
___________________________________________________________________
(page generated 2024-02-25 23:01 UTC)