[HN Gopher] BetrFS: an in-kernel file system that uses Be trees ...
       ___________________________________________________________________
        
       BetrFS: an in-kernel file system that uses Be trees to organize on-
       disk storage
        
       Author : swills
       Score  : 189 points
       Date   : 2021-12-01 13:10 UTC (9 hours ago)
        
 (HTM) web link (www.betrfs.org)
 (TXT) w3m dump (www.betrfs.org)
        
       | BlackLotus89 wrote:
       | > NOTE: The BetrFS prototype currently only works on the 3.11.10
       | kernel.
       | 
       | No thank you.
       | 
       | Why is it called ftfs in the kernel? To confuse potential users?
       | I mean it was probably called fractaltreefs before and they just
       | renamed it to jab at btrfs and get publicity? I don't know, but
       | it seems weird to me.
       | 
       | PS I would have to create a completely new system from scratch
       | just to test this since my systems won't boot with such a
       | prehistoric kernel. Also many improvements that "recently" went
       | into the linux kernel will be moot. Last commit was from march...
       | why was this posted now and is there any interest/activity left?
        
         | JoshuaJB wrote:
         | > Last commit was from march... why was this posted now and is
         | there any interest/activity left?
         | 
         | I work in an adjacent research group at UNC, and I can assure
         | you that this is a very active project. Unfortunately, because
         | most venues now use double-blind review, the updated code can't
         | be posted until after the associated paper(s) are accepted.
         | 
         | I'd encourage any potentially interested parties to star/watch
         | the GitHub repo to keep an eye on development. I've seen some
         | very impressive benchmark improvements from work currently in
         | the pipeline.
        
         | Tobu wrote:
         | The README[1] is enough to understand that this is a really funky
         | implementation and does nothing to bring useful functionality
         | into the kernel.
         | 
         | It relies on TokuDB, which is a database server meant for
         | userland, already a complex piece of code, patent-encumbered,
         | probably well tweaked but heavy. It does not port that to the
         | kernel, rather it reimplements userland interfaces in-kernel.
         | For example, the file-based interfaces TokuDB expects are
         | proxied to files in a different filesystem.
         | 
         | B_epsilon trees are a useful data structure, they may have a
         | place in filesystems, but it will take a from scratch
         | implementation to prove it. Repackaging TokuDB's patented
         | fractal trees with extra duct tape does not address any needs
         | outside of superficial marketing.
         | 
         | [1]: https://github.com/oscarlab/betrfs/blob/master/README.md
        
         | MisterTea wrote:
         | > PS I would have to create a completely new system from scratch
         | just to test this since my systems won't boot with such a
         | prehistoric kernel.
         | 
         | This is most certainly a use case for running this in a virtual
         | machine.
        
           | jandrese wrote:
           | Using a VM to benchmark disk IO is a whole other can of
           | worms.
        
             | derefr wrote:
             | When the limiting factor for IO perf that you're trying to
             | measure with your benchmark is the filesystem, rather than
             | the disk, you don't actually want a real disk backing your
             | filesystem; that would just introduce noise to your
             | measurements. Instead, you want your filesystem to be
             | backed with an entirely in-memory block-device loopback
             | image or some equivalent -- which VMs are perfectly capable
             | of (and in fact better at) providing.
             | 
             | Think of it by analogy to e.g. GPU benchmarking: you'd
             | never use anything slower than the fastest CPU you can get
             | your hands on, because you want to benchmark the GPU on its
             | own as a single system bottleneck; not how well the GPU
             | idles when held back by a bottleneck somewhere else in the
             | system.
        
               | MobiusHorizons wrote:
               | I don't think you want to benchmark datastructures
               | designed to be efficient on high latency disks using low
               | latency memory. Many file systems can be efficient with
               | low latency disks, that's not very impressive. What's
               | impressive is being efficient with high latency storage.
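
A toy cost model can make the disagreement above concrete. The sketch
below is an illustration only: the two designs, their per-operation CPU
costs, their I/O counts, and the device latencies are all hypothetical
numbers, not measurements of BetrFS or of any real filesystem. It
assumes each operation costs a fixed amount of CPU time plus some
number of device I/Os.

```python
# Toy model: time per operation = CPU time + (I/Os per op) * device latency.
# All numbers are hypothetical and chosen only to illustrate the argument.

# name -> (cpu_us_per_op, ios_per_op)
DESIGNS = {
    "cheap_cpu_many_ios": (20.0, 4.0),  # little CPU work, several I/Os per op
    "costly_cpu_few_ios": (60.0, 0.5),  # more CPU work, I/Os batched across ops
}

# name -> latency of one I/O in microseconds (assumed round figures)
DEVICES = {
    "ramdisk": 1.0,
    "ssd": 100.0,
    "hdd": 10_000.0,
}


def op_time_us(design: str, device: str) -> float:
    """Modeled time for one operation, in microseconds."""
    cpu_us, ios_per_op = DESIGNS[design]
    return cpu_us + ios_per_op * DEVICES[device]


def fastest(device: str) -> str:
    """Name of the design with the lowest modeled time on this device."""
    return min(DESIGNS, key=lambda d: op_time_us(d, device))


if __name__ == "__main__":
    for device in DEVICES:
        times = {d: op_time_us(d, device) for d in DESIGNS}
        print(device, times, "fastest:", fastest(device))
```

With these made-up numbers, the RAM-backed device favors the design with
less CPU work, which is the software overhead derefr wants to isolate,
while the high-latency device favors the design that issues fewer I/Os,
which is MobiusHorizons' point: the ranking can flip depending on the
backing store.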
        
       | mirekrusin wrote:
       | Wasn't tokutek's, now part of percona, fractal tree open source
       | _BUT_ patented?
       | 
       | edit: yes, it seems [0]
       | 
       | [0] https://github.com/Tokutek/ft-index/blob/master/PATENTS
        
         | mbravenboer wrote:
         | Be-trees were invented in 2003, predating the Tokutek patent:
         | see citation [1] in
         | http://supertech.csail.mit.edu/papers/BenderFaJa15.pdf . I
         | think Tokutek was originally not aware of Be-trees and only
         | later discovered the similarity to Be-trees.
        
       | david_draco wrote:
       | I wish there was a file system that, when the CPU is unused,
       | trawled the disk for unused files and transparently compressed
       | them.
        
         | pxc wrote:
         | aren't the existing forms of transparent compression for
         | filesystems better than that?
        
         | koverstreet wrote:
         | bcachefs does that, with the background_compression option
        
       | carterschonwald wrote:
       | Last time I looked at this, it was using the toku db b-epsilon
       | tree mysql db code. I wonder how much has changed.
        
       | knl wrote:
       | That's all fine and dandy, but it only means that there is some
       | good/great research in the topic. Winning best paper awards
       | doesn't say a thing about the implementation and handling of
       | various edge cases.
       | 
       | Like many research projects, this one will also probably last as
       | long as there is funding. Remember, the goal of PhD students is
       | to publish papers, not develop and maintain software. Thus,
       | without skin in the game, I couldn't trust my data/workloads to
       | such systems.
        
         | tytso wrote:
         | The goal of academic research is to explore ideas, which are
         | judged by submitting papers to conferences for review, and to
         | train the next generation of academics (i.e., graduate
         | students) in coming up with ideas, proving them, and then
         | writing up said ideas and the proof that they work well. It is
         | not to create production-quality software, which is an
         | orthogonal set of goals and skills.
         | 
         | The key thing to remember is that THERE IS NOTHING WRONG WITH
         | THIS ACADEMIC PROCESS. I go to the Filesystems and Storage
         | Technology (FAST) conference, where many of these BetrFS papers
         | were published, to harvest ideas which I might use in my
         | production systems, and of course, to see whether any of the
         | graduate students who have decided that the academic life is
         | not for them might come to work for my
         | company[1]. I personally find the FAST conference incredibly
         | useful on both of these fronts, and I think the BetrFS papers
         | are super useful if you approach them from the perspective of
         | being a proving ground for ideas, not as a production file
         | system.
         | 
         | So it's unfortunate that people seem to be judging BetrFS on
         | whether they should "trust my data/workloads to such systems",
         | and complaining that the prototype is based on the 3.11 kernel.
         | That's largely irrelevant from the perspective of proving such
         | ideas. Now, I'm going to be much more harshly critical when
         | someone proposes a new file system for inclusion in the
         | upstream kernel and claims that it is ready for prime time,
         | and then when I run gce-xfstests on it, we see it crashing
         | right and left[2][3]. But that's a very different situation.
         | You will notice that no one is trying to suggest that BetrFS is
         | being submitted upstream.
         | 
         | A good example of how this works is the iJournaling paper[4],
         | where the ideas were used as the basis for ext4 fast
         | commits[5]. We did not take their implementation, and indeed,
         | we simplified their design for simplicity/robustness/deployment
         | concerns. This is an example of academic research creating real
         | value, and shows the process working as intended. It did NOT
         | involve taking the prototype code from the iJournaling research
         | effort and slamming it into ext4; we reimplemented the key
         | ideas from that paper from scratch. And that's as it should be.
         | 
         | [1] Obligatory aside: if you are interested in working on file
         | systems and storage in the Linux kernel; reach out to me ---
         | we're hiring! My contact information should be very easily
         | found if you do a Google search, since I'm the ext4 maintainer
         | 
         | [2] https://lore.kernel.org/r/YQdlJM6ngxPoeq4U@mit.edu
         | 
         | [3] https://lore.kernel.org/all/YQgJrYPphDC4W4Q3@mit.edu/
         | 
         | [4] https://www.usenix.org/conference/atc17/technical-
         | sessions/p...
         | 
         | [5] https://lwn.net/Articles/842385/
        
         | dang wrote:
         | Please don't take HN threads on generic flamewar tangents.
         | They're repetitive, shallow, and tedious, and usually get
         | nastier as they go along. We're trying for the opposite sort of
         | conversation here.
         | 
         | https://news.ycombinator.com/newsguidelines.html
         | 
         | We detached this subthread from
         | https://news.ycombinator.com/item?id=29403597.
        
         | [deleted]
        
         | wittycardio wrote:
         | You realize that most of the core software that we depend on
         | was built by graduate students, right? Idk why the average
         | programmer assumes that PhDs in freaking computer science can't
         | code. Implementation and edge cases are the easy part, the hard
         | part is design and algorithms. One just requires some focused
         | work, the other requires real skill and intelligence
        
           | mbreese wrote:
           | _> PhDs in freaking computer science can't code_
           | 
           | I've known people with PhDs in computer science (from a top
           | tier school) that couldn't code. Their research was all done
           | in Matlab for simulations, modeling a biological process. It
           | was a very specific set of skills required. And at the time,
           | this person couldn't have written a web front end to a
           | database to save their lives.
           | 
           | Just because one is good at the theory behind CS doesn't mean
           | they understand software engineering. Similarly, because one
           | is good at the theory doesn't mean they can't code.
           | 
           | They are two related, but different, skill sets.
        
             | pxc wrote:
             | > > PhDs in freaking computer science can't code
             | 
             | and many graduates in freaking computer science can't read
             | or write proofs, either
             | 
             | > They are two related, but different, skill sets.
             | 
             | exactly.
             | 
             | I think it would be valuable to enforce more crossover in
             | our educational institutions, though. We should have
             | clearer boundaries between computer science and software
             | engineering, and then also require students (at every
             | level) in each to do some study in the other.
             | 
             | Researchers should be in touch with the concerns and needs
             | of ordinary programmers, and ordinary programmers should be
             | capable of looking at the output of researchers to take
             | good ideas and make them practicable and polished.
             | 
             | Sometimes the disconnect means research effort gets wasted
             | and practical technology lingers on designs that serve it
             | relatively poorly.
             | 
             | But of course software engineering and computer science are
             | distinct and deep specializations.
        
           | gnufied wrote:
           | I have a feeling that software, along with hardware, has gotten
           | a lot more complicated today than it was 30-40 years ago.
           | 
           | Most production software (esp low level stuff like Kernel,
           | filesystem) today is written and maintained by people having
           | that work as jobs. I wish it was any other way. Also, what
           | users expect from production software is way different than
           | the situation 30-40 years ago. An operating system _must_ work for
           | different CPU, GPU. A bare-bones OS is basically a non-
           | starter. I mean look at Haiku-OS or any of other operating
           | system projects, for most part they have gone nowhere.
           | 
           | A filesystem is also a fairly complicated piece of software,
           | and what we expect from a filesystem is different. Speed is
           | good, but that is not the only criterion, and I am afraid it
           | does take serious _engineering_ effort (edge cases and all)
           | to get it usable on today's hardware.
        
           | sfink wrote:
           | > average programmer assumes that PhDs in freaking computer
           | science can't code.
           | 
           | Average programmer here. PhDs in computer science can't code.
           | 
           | Ok, it's an overgeneralization. And it's probably based on a
           | flawed sample of job applicants that make it past HR
           | screening to get to me. The base rate of applicants who can't
           | code is disturbingly high, probably around 20%. (Not that
           | high numerically, but given that they've passed pre-screening
           | and have something impressive-sounding on their resume, it's
           | too high.) The rate of applicants with a PhD in CS who can't
           | code is way higher, probably around 60%.
           | 
           | Note that these tend to be fresh graduates. And it even makes
           | sense -- most theses require just enough coding to prove a
           | point or test a hypothesis. In fact, the people who like to
           | code tend to get sucked into the coding and have trouble
           | finishing the rest of their thesis work, which may start out
           | interesting but soon gets way less fun than the coding part.
           | Often such people bail out with an MS instead of a PhD.
           | 
           | (Source: personal experience, plus talking to people I've
           | worked with, plus pulling stuff out of my butt.)
           | 
           | At the same time, many of the _best_ coders I know have PhDs.
           | 
           | > Implementation and edge cases are the easy part, the hard
           | part is design and algorithms.
           | 
           | Hahahaha. <Snarky comment suppressed, with difficulty.>
           | 
           | I agree that design and algorithms can be hard. (Though they
           | usually aren't; the vast majority of things being built don't
           | require a whole lot.) But the entire history of our field
           | shows that even a working implementation Just Isn't Good
           | Enough. _Especially_ when what you're writing is exposed in
           | a way that security vulnerabilities matter.
           | 
           | Though it's a bit of a false dichotomy. Handling the edge
           | cases and the interaction with the rest of the system
           | requires design, generally much more so than people give it
           | credit for. Algorithms sometimes too, to avoid spending half
           | your memory or code size on the 1% of edge cases.
        
             | rswail wrote:
             | PhD CS scientists shouldn't _have_ to be able to code. They
             | are exploring the _science_ of computing, not the
             | implementation.
             | 
             | Developing queueing theory doesn't make you a great coder
             | for Kafka environments.
             | 
             | Working on file system design and new innovative data
             | structures (a persistence and retrieval environment) has
             | nothing to do with writing kernel drivers.
             | 
             | On the other hand, a lot of SE graduates don't _want_ to
             | code, because they think there should be code writing tools
             | and frameworks and infrastructure and process.
             | 
             | They'll spend endless hours talking about those things and
             | focusing on their own needs instead of the actual purpose
             | of the code they're supposed to be developing.
        
           | knl wrote:
           | Sorry, but this is nonsense. Look at the chubby
           | implementation and the subsequent paper - implementation and
           | edge cases were the hard part, that took a lot of skill to
           | get right. The algorithm is important, but labeling one as
           | easy is far away from real world experiences.
           | 
           | I never assumed that PhD students can't code. They can and
           | they are pretty good at that. My point is that their
           | incentives are in writing papers and running experiments that
           | support claims in their papers, not produce reliable
           | software. It might be reliable, but mostly it's not. When we
           | use tools built by PhD students, it's usually when there are
           | companies/startups built around it, and that is what I refer
           | to as having skin in the game.
        
             | wittycardio wrote:
             | Fair enough I misunderstood your point then
        
           | WastingMyTime89 wrote:
           | I think it's more that people believe (in my opinion
           | rightfully) that good design is a skill which comes with
           | experience. That's why I expect great algorithms and small
           | software from graduate students and awesome design from
           | established teams working on large scale problems.
           | 
           | That doesn't really apply here obviously. The BetrFS team has
           | experienced members.
        
         | klyrs wrote:
         | Nobody is asking you to deploy this in your production system.
         | This is about an experimental filesystem which supports exactly
         | one version of the Linux kernel. It's neat to see progress in
         | this field -- maybe try and learn something new?
         | 
         | And, the way to get production-ready code is to write a kernel
         | module, with hopes that others in the kernel community will
         | pick it up. Linux certainly didn't start out mature, but you're
         | probably using it now.
        
         | mirekrusin wrote:
         | Tokutek's fractal tree was quite known when they did backend
         | for mongodb on it with record breaking perf, from what I recall
         | it was patented and that was the reason people didn't dive into
         | it.
        
         | _jal wrote:
         | It is both kind of hilarious and kind of terrifying to see this
         | sort of anti-academic, anti-expert nonsense bleeding into
         | %$&#ing software development.
         | 
         | All your written-in-production, battle-hardened code with no
         | effete book-larnin' algorithms isn't going to run very well
         | without a functional electricity grid.
        
           | toast0 wrote:
           | I understand some of the frustration though. I was trying to
           | do some audio processing work once. Found the paper(s), which
           | promised code available from websites that are no longer
           | available. Dug through the internet archive to find the zip
           | files with the matlab code; managed to tweak it to run with
           | the matlab version I have; found it works as described with
           | the sample inputs, but crashes horribly on my inputs.
        
             | tytso wrote:
             | Source code availability for academic papers is important
             | for reproducibility, so other people can run additional
             | experiments and demonstrate that the performance numbers in
             | the paper weren't fudged.
             | 
             | It's not necessarily going to be useful for production use.
             | There are exceptions to this; for example, there are papers
             | where the authors claim that a flash/SSD emulator is
             | suitable for use by other academics to experiment with
             | their FTL ideas, or to grab network traces from NFS traffic
             | so they can be replayed to test file system performance
             | using real world data. In those cases, the point of the
             | paper is to share a tool that can be used by other
             | researchers (as well as the team that created the tool in
             | the first place), and in that case, the code had better
             | d*mned well work. (But even then, there might be buffer
             | overrun bugs in the SSD emulator; which is fine, since the
             | FTL is intended to be provided by an academic researcher,
             | and it is not expected to accept arbitrary inputs including
             | from a malicious attacker.)
             | 
             | I don't know whether the papers in your case were meant to
             | be documentation for code that was meant to be shared, or
             | to explore a particular research idea, with the code only
             | meant for that particular purpose. Even if it was for the
             | former, there's an awful lot of bitrotted, unreliable
             | "abandonware" on sourceforge or github that can be pretty
             | skanky; that's a problem which is not restricted to
             | academically published papers.
        
             | lijogdfljk wrote:
             | I've been debating where the anti-science behavior stems
             | from. From reasonable people at least. The best i can come
             | up with is that most reasonable people recognize how the
             | modern age is an information war. Product sales, articles
             | on economy, articles on politics, even some well advertised
             | missteps like the sugar industry pushing/funding pro-
             | sugar anti-fat papers way back _(which may or may not be
             | true, but it is a common trope parroted)_.
             | 
             | I assert that all this leads to people being paranoid about
             | information of subjects well outside their expertise. Which
             | is a really scary place to be. The answer seems non-obvious
             | to me, but is likely nuanced.. and the public doesn't do
             | well with propagating nuance in my experience.
             | 
             | I'm really interested in tooling to help disseminate
             | information.
        
               | GhettoComputers wrote:
               | Readers of Nassim Taleb, doubts raised over our blind
               | trust of institutions like their reaction to coronavirus,
               | any mainstream nutritional advice (fat is bad and causes
               | heart disease, "the China study" was written by a
               | nutritional biochemist), papers that p hack to be
               | published, replication crisis of psychology, statistics
               | and using it to lie.
               | 
               | There is no paranoia, an allegory is an ivory tower of
               | studies about language from non native speakers with a
               | PhD, and a native speaker who gets no recognition for
               | using it daily but doesn't have a fancy diploma or
               | credentials so someone who speaks "proper" Spanish from
               | Spain who has never been to Spain is more "credible" than
               | the Mexican speaking "improper" Spanish daily.
               | 
               | Academia is to be ignored unless it's relevant, Fritz
               | Haber didn't need the Nobel prize to have real world
               | effects in nitrogen fixing to help farmers grow and
               | sustain our population, Obama wasn't more relevant
               | because of his Nobel prize, and Perelman's refusal of the
               | Fields Medal doesn't change his contributions.
        
               | SkyMarshal wrote:
               | Readers of Nassim will also recognize his critiques of
               | academia are mainly targeted at social sciences and
               | similar fields that can only "prove" their findings
               | statistically, where p-hacking and incorrect use of
               | models and wrong distributions and the like result in bad
               | findings passed off as good.
               | 
               | That's not the case with computer science, at least in
               | systems subfields like filesystems, where theories can be
               | implemented in isolation and shown to either work or not.
        
               | GhettoComputers wrote:
               | Disagree. Artificial benchmarks and p hacking for showing
               | good performance in CS are also statistically proven. The
               | best result in geekbench doesn't mean anything to me.
        
               | SkyMarshal wrote:
               | Benchmarks and geekbench are not "where theories can be
               | implemented in isolation and shown to either work or
               | not."
        
               | GhettoComputers wrote:
               | Maybe I don't understand your point, in psychology for
               | example sterile lab tests are isolated and can be shown
               | to work or not, is it not the same idea here?
        
               | SkyMarshal wrote:
               | CompSci theories are essentially mathematical proofs. You
               | create a proof of something, then you build it to test
               | it, to make sure your math is actually correct, and that
               | the theory works in implementation.
               | 
               | Proof of correctness doesn't rely on having a large
               | cohort of test subjects undergoing an experimental trial
               | of some sort, and then interpreting the results with
               | statistical models, distributions, p-values, etc.
               | 
               | I don't know psychology in depth, but if there are
               | similar kinds of proofs without requiring statistical
               | analysis of a large experimental cohort, then I don't
               | think Taleb's criticisms are aimed at those either.
               | 
               | It's the fundamental problem of knowledge - can truth be
               | known via logic and reason, or via empiricism and
               | observation? The answer to both is, sometimes, but with
               | caveats.
               | 
               | Peter Norvig also wrote a good take on all the ways
               | studies using experimental cohorts can go wrong:
               | https://norvig.com/experiment-design.html
        
               | Raineer wrote:
               | When I encounter it, I feel it's often a hatred of the
               | "doers vs the thinkers"
               | 
               | My career path was 25 years engineering, before migrating
               | into a hybrid EE/PM role as sort of a natural progression
               | from being "the engineer who knew how to run the
               | project". Once I started learning the more formal
               | approaches to PM, it uncovered an entire world of
               | engineers who have an incredible hatred of any sort of
               | planning of any kind, because all planning time is wasted
               | and we should all just be doing.
               | 
               | The parent comment here feels the same way. Hatred
               | towards research because it's all theoretical (I guess?).
               | It seems clear as day that the best approach is a
               | marriage between the two.
        
               | quintushoratius wrote:
               | I think it's summed up by the mantra "those who can, do.
               | Those who can't, teach."
        
               | gfody wrote:
               | I suspect "hatred of any sort of planning of any kind" is
               | actually hatred for planning that fails to embody any
               | actual strategy (and is therefore a waste of time because
               | it doesn't help solve any actual problem except maybe
               | alleviate non-technical VIPs' anxiety with false hope).
               | "formal approaches to PM" evokes just that sort of thing
               | in my mind (kpis/goals masquerading as strategy, gantt
               | charts, etc)
        
               | willis936 wrote:
               | When I worked in a university lab that builds
               | stellarators I learned that misgivings towards
               | researchers are all bullshit. There are engineers and
               | there are pencil pushers. Pencil pushers burn money and
               | bark loudly. Real engineers can plan and execute on time
               | and under budget.
        
               | Zababa wrote:
               | I think one source of anti-science behaviour might be
               | betrayed expectations. I know that I personally have
               | stronger expectations for academia and scientists than
               | for other people. So when someone from academia, a
               | scientist, an engineer betrays these expectations, I feel
               | worse than if a regular person did it. There's a feeling
               | of "If we can't even count on those people, what are we
               | even supposed to do?". For example, at the beginning of
               | the COVID pandemic (around January 2020), I read a lot
               | about it. Lots of very smart people were saying that this
               | could be a big pandemic. I talked about it with a doctor
               | in a non-professional setting that told me to basically
               | not worry about it, that it wasn't going to be anything
               | huge. This time I was right and he was wrong. Was it
               | because I searched more about it? Was it just luck? I
               | don't know. But I know that it made me lose a bit of
               | trust with that person.
               | 
               | I think the origin of this might be on how I (or we) see
               | those people. You're supposed to follow what the doctor
               | tells you, what the scientists tell you. But in a way,
               | since you're supposed to follow what they say, they have
               | some kind of responsibility towards you. And when they
               | say something wrong, it's way worse than when a regular
               | person says something wrong. It's like when you're young
               | and your teacher or your parents are wrong, it's very
               | frustrating.
               | 
               | Your example about the sugar industry is also a great
               | one. Try to understand a bit more about nutrition, and
               | soon you'll hear all kind of conflicting advice and
               | explanations from very different experts.
               | 
               | I know that personally I have to work on myself and
               | accept that those people are humans, and make mistakes,
               | just like me. But just like telling people to eat less
               | and move more didn't solve the obesity epidemic, I'm not
               | sure that this solution will scale to a large population.
        
               | michaelmrose wrote:
               | Something you may be more familiar with is people's
               | concept that someone that "knows computers" is familiar
               | with any and every sort of task that involves a computer
               | whereas in fact this could encompass a wide variety of
               | different skills that require an individual investment of
               | time.
               | 
               | The same can be said of medicine, which encompasses a very
               | broad set of skills. Your doctor may have been an expert
               | in sports medicine or brain surgery but it doesn't
               | automatically make him competent in epidemiology. It
               | also doesn't force him to pay attention to current
               | developments in the news which is likely what informed
               | your opinion. Personally I found it was completely
               | obvious in January that we would be dealing with a crisis
               | because I followed the situation and suspect strongly
               | that your doctor friend did not.
               | 
               | There is also the issue of survivorship bias. We worry
               | about many things and we will absolutely recall the times
               | our worry was justified and forget when we are
               | mistaken. If Yellowstone ever blows there will be many
               | people who knew it was just around the corner and this
               | will be true if it blows now or in a century, whether or
               | not we have any scientific basis for the thought process.
               | 
               | TLDR: A singular doctor of unknown specialty getting it
               | wrong in January isn't a flaw in science. Science isn't
               | expected to be very good at ensuring a single expert of
               | only tangential expertise gives you the right answer
               | whereas it is reasonably good at groups, sometimes slowly,
               | arriving at increasingly correct answers. If you want a
               | more correct answer consider consulting or reading what
               | several people of relevant expertise who are up to the
               | minute on current information have to say.
        
               | _jal wrote:
               | This is interesting, and I've seen a bit of this sort of
               | behavior, too.
               | 
               | Some people seem to confuse expertise for a claim of
               | infallibility, and when some expert gets something wrong,
               | the reaction is to conclude that expert advice is worth
               | no more than the guy on the teevee hawking vitamins and
               | anti-expert bile.
               | 
               | It is a sort of Leveler belief wrapped in a search for an
               | Oracle.
        
           | freedomben wrote:
           | I'm extremely pro-academic, but I think you're taking the
           | least charitable interpretation of the parent. While I fully
           | disagree with the parent on the value proposition here, they
           | are quite correct that (at least most) phds aren't concerned
           | with implementation problems like corner cases and long term
           | maintenance. There are of course exceptions, but having
           | worked on quite a bit of academic code, I can say that
           | anecdatally maintainability is not a high priority. It's very
           | much like a typical PoC is in a startup.
        
             | mlyle wrote:
             | > anecdatally
             | 
             | Is this an anecdote vs. data pun? :D
        
               | freedomben wrote:
               | Yes haha, sorry I use it way too much and it's become
               | part of my vocabulary
        
             | cormacrelf wrote:
             | Why is this relevant? Just looking at the title and
             | abstract, it is clearly among the most implementation-
             | focused computer science papers ever written. It's the
             | paper that accompanies BetrFS 0.2, incorporating that
             | source code (which clearly has to handle edge cases), many
             | measurements and discussions of tradeoffs. What more are
             | you people asking for?
             | 
             | > _looks at extremely practical paper, which rightly won a
             | best paper award, probably for the very reason that it was
             | extremely practical_
             | 
             | > _decides to have the thread descend into a dismissal of
             | the value of best paper awards on the basis that they do
             | not reward practicality_
        
               | sfink wrote:
               | It's possible to regard the paper as high value,
               | appreciate its value, _and_ recognize that this is not a
               | production filesystem.
        
               | pittmajp wrote:
               | Why was the fact that it's not a production file system
               | even brought up? Was it advertised as a production file
               | system? Does the paper say that it is one? What was the
               | rhetorical purpose of that statement?
        
               | lhorie wrote:
               | To be fair, FAT32 is a production filesystem...
        
               | knl wrote:
               | I don't think I dismissed the value of the paper. I
               | pointed out that implementation may not be that good, and
               | best paper award and quality of the implementation most
               | of the time are not correlated.
        
               | cormacrelf wrote:
               | You dismissed the value of the paper about ten times in a
               | row, barely stopping for breath. Some of the tone is
               | found in which thing goes on what side of the word "but",
               | some in other words, but generally you really messed it
               | up if you wanted to avoid insulting the authors and
               | dismissing the value of research like this. There is an
               | enormous gulf between a comment like yours
               | 
               | > _That's all fine and dandy, but [...] Winning best
               | paper awards doesn't say a thing about the implementation
               | [...] the goal of PhD students is to publish papers [...]
               | I couldn't trust [this]_
               | 
               | and a comment like this
               | 
               | > _This is a really impressive project. Obviously this is
               | deeply academic, but since I am so impressed, I wonder
               | what the plans are for this (or the same idea in a new
               | fs) to reach the kind of commercial quality where I can
               | use it in a production system._
               | 
               | If you were so aware of the general nature of academic
               | research vs battle-tested implementations, then you would
               | also know that filesystems are so incredibly complicated
               | that the latter invariably takes on the order of 10+
               | years from a big team to create. When you forget this
               | fact and say that a few-years-old implementation probably
               | sucks because it's from academia, you're ignoring that
               | NOBODY could have made it production-ready in that time,
               | not even Microsoft or Apple or Oracle. Why would you
               | criticise it on this basis? Choosing to do that was the
               | biggest dismissal of the value of the work. Instead, you
               | buried what was in effect a compliment (this would be
               | useful for my production systems) under ten layers of
               | insults.
        
               | tytso wrote:
               | The iJournaling paper was published in 2017 (and like
               | many papers, it took multiple rounds of paper submissions
               | before it was finally accepted; the academic process is
               | rigorous, and many program committees are especially
               | picky).
               | 
               | The iJournaling ideas hit the upstream kernel in 2021 as
               | ext4 fast commits, and no I don't consider it production
               | ready yet. If the fast commits journal gets corrupted,
               | it's possible that the file system will not be
               | automatically recoverable, and may even lead to kernel
               | crashes. I'd give it another year or so before it's
               | completely ready for prime time.
               | 
               | But the other reason for the four year delay between 2017
               | and 2021 is because I had to find the business
               | justification (and after that, the budget and head count)
               | to invest the SWE time to actually implement the idea. A
               | lot of people want new sexy file system features, but
               | very few people are willing to PAY for them. So part of
               | the job of an open source maintainer is not just to
               | perform quality control and create a technical roadmap,
               | but also to help the developers working on that subsystem
               | to make business cases to their respective employers to
               | make a particular investment. The dirty little secret is
               | that most people are pretty happy with the file systems
               | as they currently exist; the bottleneck is often not the
               | file system, and while certain file system features are
               | _nice_, they very much aren't critical --- or at least
               | not enough that people are willing to pay the SWE cost
               | for them.
        
           | rackjack wrote:
           | What is going on? The grandparent comment is merely noting
           | the novelty of a filesystem utilizing a recently invented
           | data structure. The parent is weirdly mentioning how they
           | wouldn't trust a research filesystem for real work (who
           | would...?). Now THIS comment is claiming the parent comment
           | is anti-academic and anti-expert when it's actually mainly
           | raising common concerns about the disconnect between theory
           | and practice (then this comment mentions the electricity
           | grid, as if that's of any relevance??). Just a really strange
           | series of disconnects between the arguments.
        
             | SkyMarshal wrote:
             | The grandparent comment is an example of the kind of
             | "middlebrow dismissal" [1] that isn't really welcome here.
             | 
             | It sets the tone with dismissive snark ("fine and dandy"),
             | then implicitly asserts that the project is not interesting
             | because it's not production-ready.
             | 
             | Of course it's obvious to everyone here that version 0.2
             | beta software is not production-ready, so obvious that
             | comments to that effect are at best superfluous, at worst
             | annoying.
             | 
             | But its production-readiness is clearly not the focus of
             | the discussion, rather its novelty and potential is. That's
             | what makes it interesting and worth discussing here.
             | 
             | [1]: https://news.ycombinator.com/item?id=4693920
        
         | hhmc wrote:
         | Where did you get the impression that this is the product of
         | PhD students?
        
           | knl wrote:
           | The sibling comment described it well. In addition, the
           | majority of github commits are done by the people that are
           | listed in the alumni section, while they were PhD students.
           | There aren't many commits from people listed as current
           | members, and last significant commits are from the last year.
        
           | [deleted]
        
           | lvh wrote:
           | Not OP, but the majority of people involved have .edu
           | homepages (stints in industry, still research emphasis) and
           | many of the alums appear to have become alums
           | contemporaneously with the end of their academic career, most
           | of them via Stony Brook, and finally there are a bunch of
           | academic papers with authors clearly acting in their academic
           | capacity (and typically prior to their stints in industry),
           | so, IDK, seems like a reasonable assertion that this has a
           | strong academic emphasis and a lot of the work was done by
           | academic students. Whether it's actually unreliable is a
           | different question, but it seems pretty reasonable to suggest
           | that it's a research project and not a production filesystem.
        
         | gnufied wrote:
         | Looks like it only works with Linux kernel 3.11?
         | https://github.com/oscarlab/betrfs/blob/master/README.md , so
         | it definitely has not been updated for a while.
         | 
         | I am not even sure it wants to be production ready, but maybe
         | it is a playground for ideas.
        
           | donporter wrote:
           | Indeed, we are behind on releases. We do anticipate a major
           | release, including 4.19 kernel support, in the coming months.
           | 
           | Part of our challenge is that we are also exploring non-
           | standard extensions to the VFS API - largely supported by
           | kallsyms + copied code to avoid kernel modifications. This
           | makes rolling forward more labor intensive, but we are
           | working to pay down this technical debt over time, or
           | possibly make a broader case for a VFS API change.
        
           | all2 wrote:
           | Extricating something from specific kernel API calls won't be
           | fun. Might be a good learning experience, tho. I may take a
           | crack at this in my spare time (I'm not good at C. At all. So
           | this will be more learning for me, and much less functional).
        
         | throwaway02201 wrote:
         | I hope you are being downvoted for the harshness and not the
         | content.
         | 
         | > Like many research projects, this one will also probably last
         | as long as there is funding. Remember, the goal of PhD students
         | is to publish papers, not develop and maintain software. Thus,
         | without skin in the game, I couldn't trust my data/workloads to
         | such systems.
         | 
         | Sadly true. For-profit companies only care about $$$. Academia
         | only cares about publishing to get funding.
         | 
         | Both options are not ideal for developing trusted and user-
         | focused software in the long term. OpenSSL is a good example.
         | 
         | Non-profits really struggle to get funding. Government grants
         | are a mess.
         | 
         | The world really needs a new approach to R&D.
        
           | globular-toast wrote:
           | > Academia only cares about publishing to get funding.
           | 
           | That's just not true. To do well in academia you have to be
           | truly invested in your field. You can just about get by if
           | you're only in it for the papers, but it's just like getting
           | by in a job that you're only in for the money. At the end of
           | the day, though, in a world where everyone is forced to be
           | productive or be homeless, there are times when publishing
           | becomes a necessity. This doesn't mean they only care about
           | publishing, though.
        
             | esyir wrote:
             | It is, however, very much a follow-the-incentives kind of
             | situation. Just as the monetary incentive can bring
             | about many unwanted behaviours, the pride/publication-based
             | incentives introduce their own flavour of dysfunction.
        
             | goodpoint wrote:
             | > To do well in academia you have to be truly invested in
             | your field
             | 
             | I never said the opposite. For individuals it takes a huge
             | lot of dedication.
             | 
             | But academia, as a whole entity, is being forced into the
             | publish-or-die mindset.
        
       | seirl wrote:
       | Is the name designed to be intentionally confusing?
        
         | pauldavis wrote:
         | Looks like they are starting with the popular BTRFS, and then
         | making the pun of this being "better," and also implying the Be
         | tree data structure they use.
         | 
         | I bet it's intended to be pronounced "Better Eff Ess."
        
           | BlackLotus89 wrote:
           | > Amanda: To clear this up, once and for all: is it
           | pronounced BetterFS or ButterFS?
           | 
           | > Chris: <Grin> Definitely both.
           | 
           | https://web.archive.org/web/20120627065427/http://www.linuxf.
           | ..
        
           | nerdponx wrote:
           | But that's already how a lot of people pronounce Btrfs...
        
             | fnord123 wrote:
             | I thought everyone said butterface.
        
               | yakubin wrote:
               | I'm definitely calling it that starting today.
        
           | emptysongglass wrote:
           | I personally find it really douchey. BTRFS definitely had
           | this first and using such a name in full knowledge of BTRFS
           | is just in poor taste.
        
         | Lhiw wrote:
         | Only if you don't know how to pronounce btrfs.
        
         | myself248 wrote:
         | Letting programmers name their projects was always a mistake.
        
           | klyrs wrote:
           | UUID4 or go home?
        
           | toolslive wrote:
           | letting C-level name projects is always a mistake too.
           | "project Crossbow!" (Microsoft, Sun, EU, ...)
        
             | asplake wrote:
             | After a former colleague of mine, Stuart's rule of system
             | naming: The whizzier the name, the crappier the system.
             | Inside corporates the correlation is uncanny.
        
           | mixmastamyk wrote:
           | WSL would like a word with you.
        
         | vletal wrote:
         | Is it that bad? There is plenty of technical shortcuts which
         | differ in a single letter - that should not be an issue. Plus
         | one is pronounced _/ˈbiːtriː/_ and the other likely _/ˈbetə/_
         | (or _/ˈbetər/_ in the US :)).
        
           | zauguin wrote:
           | I think that this is particularly bad since there are many
           | different pronunciations for btrfs. E.g. Wikipedia says
           | 
           | > Btrfs (pronounced as "better F S", "butter F S", "b-tree F
           | S", or simply by spelling it out)
           | 
           | and I heard all of them in practice (except for spelling it
           | out). While you can hear the difference for "b-tree F S", the
           | other ones are much harder to distinguish.
        
             | vletal wrote:
             | Oh, thx for pointing that out. I was not aware of the other
             | possible pronunciations.
        
             | carlhjerpe wrote:
             | Swedish person thinking I'll just pronounce it "bee tee arr
             | FS". I mean that's what it says and it isn't a tongue
             | twister so... Same with SQL, never got the sequel / whatever
             | pronunciation, I just say the darn letters.
        
           | willis936 wrote:
           | It's the single letter in a relatively complex acronym where
           | the single letter doesn't distinguish the underlying name.
        
         | mcdonje wrote:
         | They should call it Bepsilon FS, or Bepsi for short.
        
           | dkdbejwi383 wrote:
           | Bepsi Max for when you need large volume support
        
           | junon wrote:
           | Oh no, we're kindling the Bepis vs Conk debate now.
        
             | cerved wrote:
             | OMAN BEPIS
        
       | dang wrote:
       | One previous thread, if anyone's curious:
       | 
       |  _BetrFS: An in-kernel file system that uses Be trees to organize
       | on-disk storage_ - https://news.ycombinator.com/item?id=18202935
       | - Oct 2018 (46 comments)
        
       | mnw21cam wrote:
       | Way back in my final year undergrad project, I put together a
       | LSM-based filesystem that was write-optimised. I even thought
       | that I had invented the notion of an LSM tree, but the original
       | LSM tree paper pre-dated my invention by 3 years. I applied to
       | take the idea and run with it for a PhD afterwards, but no joy.
       | 
       | The criticism of LSM on the FAQ that it can't have as good read
       | performance is perhaps a little over-egged. A fair proportion of
       | the work I did on my project was on how to optimise the read
       | performance. The biggest problem with a LSM tree was working out
       | which level of the hierarchy your entry was in, which involves
       | looking in one, then the next, until you find it. When your data
       | is larger than your RAM, then this becomes a disc access for each
       | level. I was working on structures that could be very small and
       | answer that, so they were more likely to fit in RAM.
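       | 
       | A minimal sketch of that read path, assuming the "very small"
       | per-level structure is a Bloom filter (my assumption; the
       | structures from the project are not described here). The names
       | BloomFilter, Level and lookup are hypothetical, and each level's
       | dict stands in for an on-disk sorted run.

```python
import hashlib


class BloomFilter:
    """Small in-memory set summary: false positives possible, no false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class Level:
    """One LSM level; `data` stands in for a sorted run stored on disk."""

    def __init__(self, entries):
        self.data = dict(entries)
        self.filter = BloomFilter()
        for key in self.data:
            self.filter.add(key)


def lookup(levels, key):
    """Search newest level first; return (value, simulated_disk_reads)."""
    disk_reads = 0
    for level in levels:
        if not level.filter.might_contain(key):
            continue  # filter says "definitely absent", so skip the disk access
        disk_reads += 1
        if key in level.data:
            return level.data[key], disk_reads
    return None, disk_reads


# Newest level first; "a" appears twice and the newer value wins.
levels = [Level({"a": 1}), Level({"b": 2}), Level({"c": 3, "a": 0})]
value, reads = lookup(levels, "c")
```

       | Without the filters, finding "c" costs one access per level;
       | with them, levels that cannot hold the key are usually skipped.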
       | 
       | The other difference between an LSM and a B-epsilon tree is that
       | with an LSM, the merging is done in bulk as a single operation,
       | whereas with a B-epsilon tree it is done on a node-by-node basis
       | as the node buffers fill up. Therefore an LSM could potentially
       | perform more of its housekeeping in long sequential disc
       | operations than a B-epsilon tree, which is likely to have a more
       | random-access pattern.
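       | 
       | A toy sketch of the node-by-node flushing on the B-epsilon
       | side, assuming a two-level tree with a single buffered root.
       | TinyBufferedTree, pivots and buffer_capacity are hypothetical
       | names; a real B-epsilon tree has buffers in every internal node
       | and flushes recursively.

```python
import bisect


class TinyBufferedTree:
    """Two-level toy: a root with a message buffer sitting over sorted leaves."""

    def __init__(self, pivots, buffer_capacity=4):
        # Leaf i holds keys k with pivots[i-1] <= k < pivots[i].
        self.pivots = pivots
        self.leaves = [{} for _ in range(len(pivots) + 1)]
        self.buffer = {}              # pending inserts, newest value per key
        self.buffer_capacity = buffer_capacity
        self.leaf_writes = 0          # counts simulated leaf rewrites

    def _leaf_index(self, key):
        return bisect.bisect_right(self.pivots, key)

    def insert(self, key, value):
        self.buffer[key] = value
        if len(self.buffer) > self.buffer_capacity:
            self._flush_fullest_child()

    def _flush_fullest_child(self):
        # Group pending messages by destination leaf.
        pending = {}
        for key in self.buffer:
            pending.setdefault(self._leaf_index(key), []).append(key)
        # Flush only the child with the most pending messages.
        target = max(pending, key=lambda idx: len(pending[idx]))
        for key in pending[target]:
            self.leaves[target][key] = self.buffer.pop(key)
        self.leaf_writes += 1

    def get(self, key):
        # Newest data lives in the buffer, so check it before the leaf.
        if key in self.buffer:
            return self.buffer[key]
        return self.leaves[self._leaf_index(key)].get(key)


tree = TinyBufferedTree(pivots=[100], buffer_capacity=4)
for k in [1, 2, 3, 150, 4]:
    tree.insert(k, k * 10)
```

       | Here five inserts cost one leaf write, and that write is
       | triggered by one node's buffer filling up rather than by a
       | bulk merge of a whole level.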
        
         | vlovich123 wrote:
         | Random access patterns don't matter so much for SSDs which is
         | where LSMs make the most amount of sense if I recall correctly.
        
           | skyde wrote:
           | This is not completely true. An LSM does not just convert
           | random writes into sequential writes. It also reduces the
           | number of write I/Os when the dataset is larger than memory.
           | 
           | With a B-tree, inserting a random entry requires reading
           | B-tree pages from disk until the right page is found, then
           | writing the updated page (4KB) back to disk.
           | 
           | In an LSM, if you are trying to add a new entry that is
           | 300 bytes, you only need to append 300 bytes to the top-level
           | file on disk.
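           | 
           | The arithmetic, as a rough sketch using the 4KB page and
           | 300-byte entry from above. The tree height and number of
           | cached levels are made-up inputs, and the later cost of LSM
           | compaction is deliberately left out.

```python
PAGE_SIZE = 4096   # bytes per B-tree page, as in the comment
ENTRY_SIZE = 300   # bytes per new entry, as in the comment


def btree_insert_io(tree_height, cached_levels=0):
    """Return (bytes_read, bytes_written) for one random in-place insert."""
    pages_read = max(tree_height - cached_levels, 0)
    return pages_read * PAGE_SIZE, PAGE_SIZE


def lsm_insert_io():
    """Return (bytes_read, bytes_written) for one append to the top-level file."""
    return 0, ENTRY_SIZE


# Hypothetical tree: 4 levels deep with the top 2 levels cached in RAM.
btree_read, btree_written = btree_insert_io(tree_height=4, cached_levels=2)
lsm_read, lsm_written = lsm_insert_io()
immediate_write_ratio = btree_written / lsm_written
```

           | For these inputs the B-tree insert reads 8192 bytes and
           | writes 4096, while the LSM append writes 300, so the
           | up-front write cost differs by a bit more than 13x.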
        
             | vlovich123 wrote:
             | Right. So it's more a problem of data dependency than it is
             | with the access being random/sequential. Eg. if LSM needed
             | to do 1 large sequential read & then jump around in RAM the
             | same way, it would still largely have the same basic
             | problem, no?
        
       | [deleted]
        
       | stjo wrote:
       | Looks very promising. Anybody tested it yet? The benchmarks look
       | phenomenal https://www.betrfs.org/faq.html !
       | 
       | Even if far from production ready for important data, I can see
       | its immediate uses for certain kinds of software, where the disk
       | is used as a large scratch pad for example. Lots of random
       | writes are common in photogrammetry in large datasets, where I
       | imagine BetrFS can be used during compute and the final output
       | stored on ZFS.
        
         | mnw21cam wrote:
         | I'd be very cautious about the benchmarks. For example, betrfs
         | was measured performing 1000 4-byte writes into a 1GB file. It
         | isn't clear whether there were any sync operations - there
         | certainly wasn't a sync after each write, although there might
         | have been a sync after the whole set of 1000. That speed up is
         | a simple characteristic of a filesystem that is log-structured
         | (so it is writing those 1000 events as a single sequential disc
         | access) and doesn't store data in 4kB blocks (so it doesn't
         | have to load the other 4092 bytes in the block before writing
         | it). The filesystem I wrote in 1999 for my undergrad project
         | would have done the same thing. One of the benchmarks I wrote
         | for my system showed exactly the same amazing performance
         | benefit. (My benchmark had me generate a tree of a thousand
         | small files in a ridiculously short time - ext2 thrashed all
         | over the disc doing the same thing.) Unfortunately it is
         | unrealistically optimistic because that isn't a write pattern
         | that is going to happen very often. Usually each small write
         | will have an fsync after it. Unless you actually have a
         | thousand writes without a separating sync, then this speedup
         | isn't going to be realised.
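         | 
         | A rough way to see the effect of the sync pattern on any
         | filesystem, assuming a Unix-like system (os.pwrite). This is
         | not the BetrFS benchmark code; random_small_writes and
         | run_benchmark are hypothetical names, and the file is scaled
         | down from 1GB to keep the run short.

```python
import os
import random
import tempfile
import time


def random_small_writes(path, file_size, count, sync_each):
    """Do `count` 4-byte writes at random offsets; return elapsed seconds."""
    rng = random.Random(0)  # fixed seed so both variants touch the same offsets
    fd = os.open(path, os.O_RDWR)
    try:
        start = time.perf_counter()
        for _ in range(count):
            offset = rng.randrange(0, file_size - 4)
            os.pwrite(fd, b"abcd", offset)
            if sync_each:
                os.fsync(fd)  # force each small write to stable storage
        os.fsync(fd)  # one final sync so both variants end up durable
        return time.perf_counter() - start
    finally:
        os.close(fd)


def run_benchmark(file_size=16 * 1024 * 1024, count=1000):
    """Return (seconds_without_per_write_sync, seconds_with_it, final_file_size)."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.truncate(file_size)  # sparse file of the requested size
        path = tmp.name
    try:
        batched = random_small_writes(path, file_size, count, sync_each=False)
        synced = random_small_writes(path, file_size, count, sync_each=True)
        final_size = os.path.getsize(path)
    finally:
        os.unlink(path)
    return batched, synced, final_size
```

         | Calling run_benchmark() and comparing the first two numbers
         | shows how much of a batching win disappears once every write
         | is followed by an fsync on the filesystem under test.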
         | 
         | I'm struggling to see how the find/grep benchmark could
         | possibly have such a fantastic performance benefit for betrfs,
         | given the fact that all those filesystems are effectively
         | reading a tree or known-location structure. The only conclusion
         | I can reach is that maybe the betrfs test had a hot cache and
         | the others didn't. I could possibly be persuaded if betrfs
         | keeps all its metadata in a small easily-cached part of the
         | disc, but there are disadvantages to that too. I don't think
         | this test is valid.
        
           | williamkuszmaul wrote:
           | It seems like you may be jumping to conclusions a bit
           | prematurely. The paper
           | (https://www.cs.unc.edu/~porter/pubs/fast15-final.pdf) is
           | very explicit that they start with a _cold_ cache. They also
           | go into detail for why they do well on grep. As I understand
           | it (but I'm not an expert), betrfs's advantage here comes
           | from the fact that it stores files lexicographically by their
           | full names (and metadata), meaning that related files are
           | stored nearby each other on disk. This gives better locality
           | than what you would get with a standard inode structure.
           | 
           | Based on that, it seems like the outcomes of the tests are
           | pretty reasonable.
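           | 
           | A small sketch of why full-path ordering helps a recursive
           | scan: with keys sorted by full path, everything under a
           | directory is one contiguous range. The sorted list below
           | stands in for the on-disk key order; directory_range and
           | the sample paths are hypothetical.

```python
import bisect


def directory_range(sorted_paths, directory):
    """Return (lo, hi) so that sorted_paths[lo:hi] is everything under `directory`."""
    prefix = directory.rstrip("/") + "/"
    lo = bisect.bisect_left(sorted_paths, prefix)
    # "0" is the character right after "/" in ASCII, so prefix[:-1] + "0"
    # sorts just above every key that starts with prefix.
    hi = bisect.bisect_left(sorted_paths, prefix[:-1] + "0")
    return lo, hi


paths = sorted([
    "/home/a/notes.txt",
    "/home/a/src/main.c",
    "/home/a/src/util.c",
    "/home/b/todo.txt",
    "/var/log/syslog",
])
lo, hi = directory_range(paths, "/home/a")
subtree = paths[lo:hi]
```

           | A grep -r over /home/a then reads one run of neighbouring
           | keys instead of chasing per-file inode pointers.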
        
             | mnw21cam wrote:
             | I'll concede on the hot cache suggestion. Storing the files
             | lexicographically is an interesting thing - it means that
             | grep/find (or anything else that reads through the
             | files/directories in order) would perform well. But this
             | makes the test to some extent contrived to specifically run
             | fast on this particular system.
             | 
             | I do agree that this kind of filesystem mechanism should
             | give good performance benefits. But in the general case
             | they won't be quite as fantastic as these benchmarks make
             | out.
        
               | donporter wrote:
               | Please note that the benchmark sources are also
               | available, e.g.; https://github.com/oscarlab/betrfs/blob/
               | master/benchmarks/mi...
        
         | marco_craveiro wrote:
         | Indeed, sounds very interesting. However, from their github
         | [1]:
         | 
         | > NOTE: The BetrFS prototype currently only works on the
         | 3.11.10 kernel.
         | 
         | This is a tad limiting, hopefully they will port it to
         | latest...
         | 
         | [1] https://github.com/oscarlab/betrfs
        
           | tyingq wrote:
           | It's also currently stacked on top of ext4, and the tree data
           | has to sit on some other filesystem as well. So promising
           | design, but quite a long way from ready for production.
        
           | klyrs wrote:
           | Not only that, but they need some fun patches...
           | 
           | > Our design minimizes changes to the kernel. The current
           | code requires a few kernel patches, such as enabling direct
           | I/O from one file system to another. We expect to eliminate
           | most of these patches in future versions.
           | 
           | I prefer half-baked projects that are honest about their
           | status over overpromised vaporware, personally
        
       | williamkuszmaul wrote:
       | The website doesn't seem to mention that several of the papers on
       | the filesystem won best-paper awards at major conferences. The
       | paper, Optimizing Every Operation in a Write-Optimized File
       | System, in particular, won best-paper award at FAST '16.
       | 
       | Also, if you're interested in learning more about B^\epsilon
       | trees, here's a talk given by Rob Johnson a few years ago at
       | Microsoft Research: BetrFS: A Right-Optimized Write-Optimized
       | File System https://www.youtube.com/watch?v=fBt5NuNsoII
       | 
       | In general, I think it's really cool that there is a file system
       | that exists today (i.e., BetrFS) that uses data structures which
       | _didn't_ exist 25 years ago. It's a great example of
       | theoreticians and systems researchers working together.
        
         | swills wrote:
         | The reason I posted it was I saw it in the FAST21 playlist:
         | 
         | https://www.youtube.com/watch?v=6KueHK9i8lE
        
         | pengaru wrote:
         | > here's a talk given by Rob Johnson a few years ago at
         | Microsoft Research: BetrFS: A Right-Optimized Write-Optimized
         | File System https://www.youtube.com/watch?v=fBt5NuNsoII
         | 
         | Interesting talk, I wonder how much of the perf. advantage
         | diminishes in a finished, production-ready implementation
         | though.
         | 
         | Comparing an 80%-complete R&D prototype mule against crash-
         | resilient posix-compliant production filesystems is basically
         | never a fair perf. comparison.
         | 
         | You might find that just implementing rename and hard-links
         | properly alone is going to kill your perf. since you dispensed
         | with on-disk inode equivalents.
         | 
         | Nice to see people poking at these issues nonetheless, Linux
         | needs better filesystem options.
        
           | zokier wrote:
           | On the other hand I can easily imagine a lot of applications
           | that don't care about full posix compliance, and are
           | perfectly happy to trade handling of some obscure feature for
           | improved performance.
        
             | c0balt wrote:
             | Maybe a viable option for single-application container
             | images. They would on the one side offer the ability to
             | have tight control around used functions (to allow for
             | missing features) but also be able to exactly target an FS
             | and be optimized for it.
        
               | fartattack wrote:
               | Containers don't control the FS that they're written to.
               | A container image is in the tar format and at runtime the
               | underlying FS is defined by the host, which is why
               | containers only run on hosts with union filesystems
        
             | pengaru wrote:
             | Renames and hard-links are not obscure features.
             | 
             | And there are myriad mount options for tailoring
             | performance vs. crash-resilience/posix-compliance to the
             | application in most of the existing production filesystems.
             | Which was honestly another aspect of the talk that was
             | somewhat lacking; what journaling modes were used?
             | barrier/nobarrier? was it even made equivalent to what
             | betrfs achieves? We don't even know if a betrfs instance
             | can successfully mount after a mid-write hard reboot.
        
       | derefr wrote:
       | If this filesystem "has comparable random-write performance to an
       | LSM tree", would it be viable to use this filesystem _directly_
       | as the storage for a key-value store (i.e. to swap out
       | LevelDB/RocksDB for a simple library that just creates each key as
       | its own file, expecting to be backed by this filesystem)?
       | 
       | If not, why not? I'm guessing mainly because of kernel context-
       | switching overhead?
       | 
       | And if that's why, then could use of this filesystem be _made_
       | competitive with [or better-performing than!] e.g. "LevelDB
       | writing to ext4", if that context-switch overhead was removed --
       | e.g. if it was either used by a kernel-mode application (i.e. a
       | unikernel approach); or if the driver itself were moved into
       | userspace as a library, with the expectation that you'd compile
       | it into a single daemon process which would own and have write
       | access to a raw block device?
       | 
       | (I ask because part of my job involves tending to blockchain
       | archive-nodes, and the operational management of LevelDB at scale
       | sometimes makes me want to pull out my hair. A million little 2MB
       | files all in one directory, constantly being created and deleted.
       | If I could 1. work with the keys in those databases directly as a
       | mounted [perhaps read-only] filesystem, and 2. get for free the
       | BetrFS equivalent of Btrfs's incremental subvolume send/receive
       | for them, rather than trying to organize parallel rsync(1) for a
       | million tiny files, those factors alone would be worth dealing
       | with an experimental FS.)
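
The "simple library that just creates each key as its own file" idea from the comment above can be sketched roughly as follows. This is a minimal illustration, not anything BetrFS or LevelDB provides: the class name `FilePerKeyStore`, the `k` file-name prefix, and the hex encoding of keys are all assumptions made for the example, and it says nothing about how such a store would actually perform on any given filesystem.

```python
import os
import tempfile
from pathlib import Path


class FilePerKeyStore:
    """Toy key-value store (hypothetical): one file per key under a root dir."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, key: bytes) -> Path:
        # Hex-encode the key so arbitrary bytes form a legal file name.
        # The "k" prefix keeps the empty key from mapping to the directory.
        # Very long keys would exceed file-name limits; ignored in this sketch.
        return self.root / ("k" + key.hex())

    def put(self, key: bytes, value: bytes) -> None:
        # Write to a temporary file, then rename it into place so that a
        # reader never observes a partially written value.
        fd, tmp = tempfile.mkstemp(dir=self.root)
        with os.fdopen(fd, "wb") as f:
            f.write(value)
        os.replace(tmp, self._path(key))

    def get(self, key: bytes):
        try:
            return self._path(key).read_bytes()
        except FileNotFoundError:
            return None

    def delete(self, key: bytes) -> None:
        try:
            self._path(key).unlink()
        except FileNotFoundError:
            pass
```

Every put, get, and delete here is at least one system call, which is the kernel-crossing overhead the comment speculates about. Nothing in this sketch measures it.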
        
       | dralley wrote:
       | Wonder how it compares to bcachefs which, I believe, uses similar
       | data structures.
        
         | pxc wrote:
         | Bcachefs has gotten a lot more development time and seems close
         | to ready for mainstream use, and this seems like it's much more
         | in very early stages.
         | 
         | It'd be cool to hear a conversation on the overall design of
         | each project from the authors of both, though
        
           | kzrdude wrote:
           | Bcachefs seems like it needs the attention of a team, not
           | just one good dev. Since it's not even merged, I guess it's
           | still 5 years out before we can use it..
        
             | pxc wrote:
             | I think it's already usable in a way that BeTRFS is not.
             | Like it can be installed on modern kernels, and there are a
             | handful of people using it as their root filesystem today.
             | 
             | I don't think it being out-of-tree is a huge deal per se.
             | ZFS is also out-of-tree. For use on personal systems, I
             | think the bigger thing is that the on-disk format is not
             | officially stable/permanent yet. But if that comes before
             | the thing is merged to the Linux kernel, I'd be willing to
             | try it on a personal system.
             | 
             | Try it at your own risk, of course, but BCacheFS doesn't
             | look like any extra work to set up on NixOS if you wanna
             | try it there-- if you tell NixOS that you wanna use
             | bcachefs it'll just transparently pull in the required
             | kernel for you.
             | 
             | Idk about filesystems development, but I agree that
             | eventually it would be ideal for BCacheFS to have a
             | sizeable development and maintenance team. Maybe in the
             | early stages, though, it's good for it to have the kind of
             | coherence and simplicity required to fit all in one
             | person's head. Time will tell, I guess!
        
               | jnsaff2 wrote:
               | I have used it on NixOS for over a year on my main
               | desktop and NAS box. The experience is .. flaky ..
               | sometimes Kent does not have a new enough kernel version
               | available that NixOS needs, there have been several major
               | breakages where some background tasks spin at 100% cpu
               | forever and the file system slows down to a crawl,
               | sometimes you need to run fsck from a compat branch to
               | get your fs back into shape. At the moment my desktop is
               | broken because the NixOS config forgot how to unlock my
               | root volume. But when it works, it mostly stays out of my
               | way. I think I will move to ZFS for my desktop, there has
               | been just too much faff with my setup. The claim about
               | there not being any on-disk data loss, IDK, I have read
               | from disk some large media files that have been broken ..
               | when they had been written during a slow crawl while the
               | fs processes were spinning 100%. So jury is still out
               | there.
               | 
               | I totally agree with some parent commenter here that it
               | needs a team to work with Kent. Documentation is almost
               | nonexistent (tho ArchWiki saves the day a little).
        
               | kzrdude wrote:
               | Good points, but bcachefs does not have releases (ZFS has
               | versioned releases) or a development team.
               | 
               | (Obviously I'm not comparing anything to Bepsilon - they
               | are irrelevant until implemented as an actual linux
               | filesystem)
        
               | pxc wrote:
               | Oh yeah. ZFS is mature on a whole different level than
               | BCacheFS, too. As a bystander and potential user, if I
               | have a hope for BCacheFS it's that once it makes it into the
               | mainline kernel, it attracts more developers and grows
               | into a community project with versioned releases and all
               | that. I imagine that its author hopes the same.
        
         | williamkuszmaul wrote:
         | I'm not super familiar with bcachefs, but from what I can find
         | it seems like it is based mostly on a standard (but I guess
         | very well implemented) B-tree. Am I missing something?
        
           | jlokier wrote:
           | Bcachefs appends log entries into large leaf blocks instead
           | of updating the sorted block data for insert the way a
           | standard B+tree would do it.
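
The contrast described above, appending entries to a leaf versus keeping the leaf's data sorted on every insert, can be illustrated with a toy in-memory sketch. This is not bcachefs's actual node layout or code: the class names `SortedLeaf` and `LogLeaf` are hypothetical, and a real implementation would also bound the log's size and compact or re-sort it at some point.

```python
import bisect


class SortedLeaf:
    """Textbook-style leaf: entries are kept sorted on every insert."""

    def __init__(self):
        self.keys, self.vals = [], []

    def insert(self, key, val):
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            self.vals[i] = val          # overwrite in place
        else:
            self.keys.insert(i, key)    # shifts every later entry
            self.vals.insert(i, val)

    def lookup(self, key):
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.vals[i]
        return None


class LogLeaf:
    """Append-style leaf (illustrative only): inserts go to the end of a log."""

    def __init__(self):
        self.log = []

    def insert(self, key, val):
        self.log.append((key, val))     # no reordering of existing data

    def lookup(self, key):
        # Scan newest-first so the most recent write for a key wins.
        for k, v in reversed(self.log):
            if k == key:
                return v
        return None
```

Both classes answer lookups identically. The difference is where the work lands: the sorted leaf pays on insert, the log leaf pays on lookup in this naive form.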
        
       | donporter wrote:
       | As one of the authors of this project, first off, I appreciate
       | the interest.
       | 
       | For those who are curious, our initial goal is indeed to build a
       | PoC and understand whether the data structures actually deliver
       | the potential performance gains in a realistic implementation
       | that one might expect on paper. I see a long arc from a new idea
       | to a production-quality implementation, and several iterations of
       | increasingly thorough evaluation and hardening.
       | 
       | Our current prototype is not production-ready; this is a long-
       | term goal, but we appreciate how much work this is. More of our
       | focus at the moment is on exploring other ways these algorithmic
       | techniques may be useful in a storage system or how to address
       | current problems---i.e., understanding the best way to design
       | such a system before trying to build a production-quality
       | version. Each of our papers has yielded significant overhauls to
       | the design.
       | 
       | We would also consider it a success if other file systems adopted
       | any ideas from our papers, or a new file system were designed by
       | someone else that adopted these techniques.
       | 
       | The commenters are right that there is a gap between when an idea
       | is exciting new research and fundable via grants versus funding
       | the "maturing" phase of the prototype. I will hasten to say that
       | the NSF has been supportive of maturing this system, for which we
       | are most grateful. Nonetheless, like many projects, we could use
       | more resources, and I would be happy to engage in constructive
       | conversations out-of-band about how to address this gap.
        
         | dang wrote:
         | (This comment was originally a reply to
         | https://news.ycombinator.com/item?id=29404038 but I've detached
         | it so that more people will see it--the other thread has been
         | moderated since it's a tedious flamewar.)
        
         | matmatmatmat wrote:
         | Looking forward to the results of your work, good luck and
         | thank you!
        
         | adgjlsfhk1 wrote:
         | Since you're answering questions here, what is the impact of
         | the patents on the fractal tree for other implementations? Are
         | other projects legally allowed to implement their own
         | fractal/B^epsilon trees?
        
       ___________________________________________________________________
       (page generated 2021-12-01 23:01 UTC)