[HN Gopher] We found and fixed a rare race condition in our sess...
___________________________________________________________________
We found and fixed a rare race condition in our session handling
Author : todsacerdoti
Score : 564 points
Date : 2021-03-18 20:00 UTC (1 day ago)
(HTM) web link (github.blog)
(TXT) w3m dump (github.blog)
| maddyboo wrote:
| I'm very impressed by the level of detail provided in this post
| mortem, but it leaves me with a few questions.
|
| What we know based on this report:
|
| - This bug was live in production for at least 3 days.
|
| - It was common enough to happen numerous times in the wild.
|
| - GitHub says that they identified some specific user sessions
| that may have been impacted by this bug by analyzing their logs.
|
| What we do not know:
|
| - How many leaked user sessions were identified in the followup
| investigation?
|
| - Is GitHub confident that they identified all leaked user
| sessions?
|
| - Was there any evidence of malicious use of these sessions?
|
| - Did GitHub notify the users whose sessions were leaked?
|
| - How long/for how many requests could these sessions have been
| valid between their leak and when they were definitively revoked
| on the 8th?
|
| - Is it possible that any private user data was leaked? If so,
| what data?
| azernik wrote:
| > - Is GitHub confident that they identified all leaked user
| sessions?
|
| Given that they revoked _all_ current logins, rather than just
| the affected ones, I assume that they have no way to tell which
| sessions were affected.
| dfcowell wrote:
| Most of that information is available in the initial report
| linked to in the first paragraph of this blog post:
| https://github.blog/2021-03-08-github-security-update-a-bug-...
| maddyboo wrote:
| That provides more information but doesn't fully answer any
| of my questions.
|
| Things that are still unclear:
|
| - GitHub provides an estimate of the prevalence at 0.001%,
| but doesn't state how many leaked sessions were actually
| identified.
|
| - Is GitHub confident that they identified all leaked user
| sessions? This has bearing on my question about notification
| of users whose sessions were leaked; if they did not identify
| all leaked sessions, they obviously could not contact those
| users.
|
| - GitHub states that this bug could not be intentionally
| triggered by a malicious party (why?), but they don't state
| whether any of the natural occurrences may have been used
| maliciously.
|
| - How long/for how many requests could these sessions have
| been valid between their leak and when they were definitively
| revoked on the 8th?
|
| - Is it possible that any private user data was leaked? If
| so, what data?
|
| People with technical understanding can infer most of this
| information, but I think GitHub should issue some
| straightforward answers rather than beating around the bush.
|
| (Correction to my original comment: the bug was live for a
| cumulative period of "less than 2 weeks")
| ac29 wrote:
| > GitHub states that this bug could not be intentionally
| triggered by a malicious party (why?),
|
| Because this bug requires two users making authenticated
| requests at nearly the same time. An attacker can't force
| their target to perform actions, especially not in the same
| small time frame that is likely required (milliseconds?
| less?). I suppose if the malicious party didn't care what
| account they got swapped into, and GitHub didn't rate limit
| their requests, perhaps they could trigger this bug with
| enough time?
| Boltgolt wrote:
| GitHub says there could have been sessions that were leaked but
| can't be identified now:
|
| "While our log analysis, conducted from March 5 through March
| 8, confirmed that this was a rare issue, it could not rule out
| the possibility that a session had been incorrectly returned
| but then never used."
|
| Those sessions are now invalidated and were apparently never used.
| polote wrote:
| > While our log analysis, conducted from March 5 through March 8,
| confirmed that this was a rare issue, it could not rule out the
| possibility that a session had been incorrectly returned but then
| never used. This was not a risk we were willing to take, given
| the potential impact of even one of these incorrectly returned
| sessions being used.
|
| It is funny how they make it seem like they are being super
| cautious, when in fact they had no other choice but to do it once
| you rephrase the problem as "some people may have gotten admin
| rights to the GitHub accounts of some of our customers".
| lathiat wrote:
| You say that, but resetting all the sessions is very publicly
| visible and plenty of smaller companies would paper over such
| things. Honestly I don't really expect any less at GitHub or
| Microsoft scale, but don't kid yourself into thinking that kind
| of thing goes un-noticed, un-investigated or un-actioned
| elsewhere.
| ComputerGuru wrote:
| To the contrary, I find my session tokens get reset left and
| right by websites that don't even care about how much of an
| inconvenience it is for the customers.
| imtringued wrote:
| Because Discord doesn't let you change avatars I have
| created three different Discord accounts. I get logged out
| of all three all the time. It's a massive pain.
| WayToDoor wrote:
| Huh ? You can change your avatar in discord settings...
| kroltan wrote:
| Maybe part of the sentence is missing and they meant
| change avatars _per server_? That would be the only
| reason I can imagine someone might want to have multiple
| accounts.
| ncann wrote:
| Yeah, I don't visit GitHub that often, maybe once a week
| but it logs me out every damn time. I wonder if it's
| because I have 2FA setup or something.
| yunohn wrote:
| I use a food delivery service (website only, no app) that
| doesn't even remember the chosen language setting, and of
| course, never keeps you logged in past 24hrs. So
| effectively, need to login every time I make an order...
| jspash wrote:
| This is my biggest pet peeve about delivery service
| websites. What are the chances that I've moved house
| since I last ordered a pizza two days ago? Why do you
| incessantly ask for my postcode? When I log in, assume
| I'm the same person at the same address. If I _have_
| moved and accidentally order to the old house... well
| that's on me, and I'd have to be pretty stupid (or hungry)
| to do such a thing.
| noir_lord wrote:
| I mean they could do both.
|
| Default to last postcode but display it on the
| confirmation screen "Sending your Pizza to LE13 XYZ" -
| amazon does it right (as you'd mostly expect)
| alpaca128 wrote:
| That happens to me even on YouTube. Every couple of
| months I have to fix the language setting and switch back
| to dark theme. What can't be fixed are Google's weird
| attempts at auto-translating video titles from other
| languages, making me click on videos that I simply cannot
| understand. Don't do that, Google. Do full title,
| thumbnail and video translation with 90% accuracy or
| just...don't. Half-assed tech is easy but not helpful for
| anyone.
| chrisandchris wrote:
| Personal opinion: Clear cookies every time you stop the
| browser process, which basically is at least once a day.
|
| Logs you out of all services. It's a nightmare as a user,
| but I don't want those tracking cookies to stick around
| longer than a single day.
| jjav wrote:
| Yes, for sure. I use cookie autodelete to remove all
| cookies as soon as the tab is closed, so nothing lingers.
| SamuelAdams wrote:
| Firefox can do this automatically. Options > Privacy &
| Security > check the box "Delete cookies and site data
| when Firefox is closed".
|
| Anything you intentionally want to retain can be added to
| the exceptions list.
| fsckboy wrote:
| "Delete cookies and site data when Firefox is closed" has
| always bothered me; it should be "Delete cookies and site
| data when Firefox is opened", because abnormal
| termination is not "closing", and you'd still want the
| cookies deleted when you start up again.
| motiejus wrote:
| That way I'd lose the GDPR opt-outs from all the sites
| whose consent process I've already gone through.
|
| Perhaps I should just accept the consent and remove
| cookies regularly, instead of being careful with the
| consent forms... Hmm, good idea.
| nexuist wrote:
| Is it really that hard to log in again? Using a password
| manager it's basically two clicks to autofill the right
| account and hit the submit button.
|
| Login limits are a good thing. They mean that the chances of
| someone taking over your account and using it long term are
| minimized.
| Ennea wrote:
| It is not. But a ton of sites seem to be inconsistent
| about it, or it feels that way at least. Some sites I
| frequent don't log me out for months at a time, and then
| suddenly they do. And then they do it again a few weeks
| later. Followed by multiple months of not doing it once
| more.
| TriNetra wrote:
| Obviously if they are logging all actions of all users, and
| there's some decent retention period, they can find out how
| many people got unintended access and what all they did with
| it. I recently got advice from a C-level executive to
| include such analytics in ASPSecurityKit [0], because that's
| what companies are looking for these days. This GH incident
| makes me consider his suggestion more seriously.
|
| 0: https://ASPSecurityKit.net
| abgr-y wrote:
| This points to the last A in the AAA of security, which stands
| for Authentication, Authorization & Accounting. The AAA moniker
| is commonly used in reference to either RADIUS or Diameter
| (network protocols), but the concept is widely used for software
| application security as well. Accounting answers: what resources
| were accessed, at what time, by whom, and what commands were
| issued?
| WORMS_EAT_WORMS wrote:
| Oof, my brain. With that giant fiasco master/slave thread the
| other day, I glanced at "race condition" and "GitHub" and
| immediately thought this was related to that.
| naebother wrote:
| cgi-bin for the win
| beders wrote:
| Another case where using immutable data by default would have
| prevented any shenanigans. It also shows the highly entangled
| relationship between state and complexity. (in this case, the env
| object)
|
| Immutable data forces you to design your system in a different
| way and remove layers of complexity.
| ed25519FUUU wrote:
| How would that help here? They would just exchange one piece of
| immutable data (a not-signed-in user's cookie) with another piece
| of _wrong_ immutable data (someone else's session).
| beders wrote:
| A mutable object 'env' was shared across threads.
| beders wrote:
| "While the immediate risk has been mitigated, we worked with
| the maintainer of Unicorn to upstream the change to make new
| requests allocate their own environment hashes. If Unicorn
| uses new hashes for each environment, it removes the
| possibility of one request mistakenly getting a hold of an
| object that can affect the next request."
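|
| A minimal Ruby sketch of the failure mode being described (not
| Unicorn's or GitHub's actual code; the names here are made up for
| illustration):
|
|     require 'securerandom'
|
|     ENV_HASH = {}  # one hash reused by the worker, cleared between requests
|
|     def handle_request(user)
|       ENV_HASH.clear
|       ENV_HASH[:cookie] = "session-#{user}-#{SecureRandom.hex(4)}"
|
|       # background work that keeps a reference to the *same* hash
|       Thread.new(ENV_HASH) do |env|
|         sleep 0.01                            # finishes after the next request starts
|         env[:reported_cookie] = env[:cookie]  # may now read another user's cookie
|       end
|
|       ENV_HASH[:cookie]
|     end
|
| With a fresh hash per request (env = {} each time, as in the
| upstream fix), the background thread can only ever see its own
| request's data.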
| brundolf wrote:
| While in general I agree, in this case that aspect of the
| problem entirely came down to "don't use object pools that span
| multiple requests". That's a conscious optimization that was
| made and then retracted, not a subtlety that accidentally
| slipped in due to mutability itself.
| beders wrote:
| object pools == mutability (why else would you keep objects
| around in a pool)
| brundolf wrote:
| Object pools require mutability; by no means does
| mutability require you to use object pools
| beders wrote:
| Obvious.
|
| But not what I meant. I meant that it was a conscious
| decision to use an object pool and re-use objects.
|
| That doesn't happen with immutable by default languages.
|
| Ergo, these particular kinds of bugs just don't happen.
| joebob42 wrote:
| I guess the point here is that an object pool isn't the
| default anyway: they've chosen to do something non-
| default for performance, which you can still do in
| immutable by default languages.
| client4 wrote:
| It would be super cool if they did more work to prevent brute
| force account takeovers.
| tropshop wrote:
| Some time ago I dealt with this exact bug for an e-commerce site
| with millions of requests per day, and it was also Rails +
| Unicorn.
|
| ANY time Ruby 1.8 raised a timeout, it was silently swallowed,
| and Unicorn returned the adjacent request's session_id. We had
| to constantly defend against any code that might hit the native
| timeout.
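|
| A minimal sketch of the shape of that failure (not the actual
| e-commerce code; look_up_session and the timings are invented):
|
|     require 'timeout'
|
|     REUSED_ENV = {}  # stands in for a per-worker hash reused across requests
|
|     def look_up_session(request)
|       sleep(request[:slow] ? 0.05 : 0)  # simulate an occasionally slow backend call
|       "session-for-#{request[:user]}"
|     end
|
|     def handle(request)
|       Timeout.timeout(0.01) { REUSED_ENV['session_id'] = look_up_session(request) }
|       REUSED_ENV['session_id']
|     rescue Timeout::Error
|       # if the timeout is swallowed like this, the hash still holds the
|       # previous request's session_id, and that is what gets returned
|       REUSED_ENV['session_id']
|     end
|
|     handle(user: 'alice', slow: false)  # => "session-for-alice"
|     handle(user: 'bob',   slow: true)   # => "session-for-alice"  (leaked)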
| artursapek wrote:
| Lmao I am so glad Ruby/Unicorn/Rails are a distant memory in my
| career at this point. So much magic and poorly defined behavior
| imtringued wrote:
| I don't write Ruby applications but I have never heard
| anything good about Unicorn.
| jacobparker wrote:
| At D2L we have a large C# code base which gets deployed in a few
| different subsets, but is largely a monolithic fleet of web
| servers.
|
| To prevent these kind of problems we have a few approaches, but
| the main way is to prevent shared mutable state. To do this we
| have a custom C# code analyzer (source here:
| https://github.com/Brightspace/D2L.CodeStyle/tree/master/src... ,
| but it's not documented for external consumption at this point...
| ImmutableDefinitionChecker is the main place to look.) It goes
| like this:
|
|     [Immutable]
|     public class Foo : Bar {
|         // object is a scary field type, but "new object()"
|         // is always an immutable value.
|         private readonly object m_lock = new object();
|
|         // we'll check that ISomething is [Immutable] here
|         private readonly ISomething m_something;
|
|         // Fine because lambdas in initializers can't close over
|         // mutable state that we wouldn't otherwise catch.
|         private readonly Action m_abc = () => {};
|
|         // see the constructor
|         private readonly Func<int, int> m_xyz;
|
|         // danger: not readonly
|         private int m_zzz;
|
|         // danger: arrays are always mutable
|         private readonly int[] m_array1 = new[]{ 1, 2, 3 };
|
|         // ok
|         private readonly ImmutableArray<int> m_array2 =
|             ImmutableArray.Create( 1, 2, 3 );
|
|         public Foo( ISomething something ) {
|             m_something = something;
|
|             // ok: static lambdas can't close over mutable state
|             m_xyz = static _ => 3;
|
|             // danger: lambdas can in general close over mutable state
|             int i = 0;
|             m_xyz = x => { i += x; return i; };
|         }
|     }
|
| Here we see that a type has the [Immutable] attribute on this
| class, so we will check that all the members are readonly and
| also contain immutable values (via looking at all assignments,
| which is easy for readonly fields/properties). Additionally, we
| will check that instances of Bar (our base class) are known to be
| immutable.
|
| Any class that were to extend Foo (as a base class) will be
| required to be [Immutable] as well. There's a decent number of
| aspects to this analysis (e.g. a generic type can be
| "conditionally immutable" -- ImmutableArray<int> is, but
| ImmutableArray<object> is not), check the source if you're
| interested.
|
| We require all static (global) variables be immutable, and any
| "singleton" type (of which we have tens of thousands if I
| remember correctly) also must be.
|
| Another important thing we do is to cut off any deferred boot-
| up code from accessing customer data (via a thread-local variable
| that is checked before access through the data layer). This
| prevents accidentally caching something for a particular customer
| inside, say, a persistent Lazy<T> (or the constructor for a
| singleton, etc.)
|
| We've adapted our code base to be very strict about this over the
| last few years and it discovered many obscure thread-safety bugs
| and doubtlessly prevented many from happening since.
| tybit wrote:
| This is really cool, it seems similar in intent to Rust's
| "shared or mutable, but not both" philosophy. I hope it spreads
| to C# more generally somehow.
| dvt wrote:
| I love how everyone here on HN shits on Gab's mishap just because
| they happen to host a lot of nonsense alt-right garbage and their
| CEO is a weirdo, but everyone's totally cool with GitHub
| _knowingly having inconsistencies across threads_:
|
| > We initially thought this to be an internal reporting problem
| only and that we would see some data logged for an otherwise
| unrelated request from the background thread. Though
| inconsistent, we considered this safe since each request has its
| own request data and Rails creates a new controller object
| instance for each request.
|
| I mean this is like an SNL skit: "though inconsistent, we
| considered this safe" -- inconsistency, no matter where it's
| happening, _literally_ means your code isn't safe. Like, jeez...
| this is GITHUB we're talking about. I mean, I give _money_ to
| GitHub -- this is pretty serious and we should really stop with
| the "omg I love these detailed post-mortems!" stuff. A billion-
| dollar company knowingly having code that's not thread safe in
| their flagship product and going "eh, it's probably fine" is
| bonkers to me.
| reissbaker wrote:
| I think there's a pretty big difference between an engineer
| writing thread-unsafe code, testing and seeing inconsistencies
| that (incorrectly) appear to not affect any external users,
| shipping it, and then the security team finding the problem and
| developing new processes to prevent this in the future -- vs
| the CTO of Gab writing raw SQL using string concatenation,
| shipping it without code review, and then trying to cover it up
| afterwards.
|
| No software maintained by a large team will be bug-free
| forever, and that includes security bugs. The response to bugs
| is what matters. In this case Github's response was quite
| mature; in Gab's case, it wasn't.
| colesantiago wrote:
| > In this case Github's response was quite mature; in Gab's
| case, it wasn't.
|
| This is irrelevant.
|
| There is no excuse for this, especially if you are a billion
| dollar company.
| djmips wrote:
| The billions don't automatically scale development. You
| know that. Development is still done at the human
| level. But you still might be right that there is no
| excuse.
| the_mitsuhiko wrote:
| I think this issue is relevant for Python programmers as well.
| uwsgi for many years shipped a default allocation mode for the
| WSGI environment which just cleared it between requests. One had
| to switch over to `wsgi-env-behaviour=holy` to prevent it. Holy
| is now the default but many people still set it to cheap because
| that prevented some segfaults for a while in uWSGI.
| shay_ker wrote:
| Brutal - Unicorn didn't use Rack.env in a thread safe way. It's
| tougher given that this was in a widely used app-server and not
| even in Rails code.
|
| Impressed with GH folks in finding this, though I wonder how many
| more of these types of bugs are out there in Ruby libs. It'd be
| nice to make this more difficult to do, a la FP languages, by
| making it really intentional to modify state. Maybe in Ruby 4??
| SilverRed wrote:
| I feel like if you dig down deep enough, every single site on
| the web has a serious security bug in its app code or the
| layers it runs on top of. How many more bugs are there still in
| the CPU your server runs on?
| chucke wrote:
| unicorn is single-threaded by design, and is not expected to
| support any background threading workarounds that people build
| on top of it. That "feature" in github shouldn't be passing the
| env hash.
| Thorrez wrote:
| >Unicorn didn't use Rack.env in a thread safe way.
|
| Was Unicorn itself not thread safe? It sounds like Unicorn had
| the requirement that once you send back the response to the user
| you must stop accessing the request object. Maybe Unicorn
| didn't document that requirement clearly, but even if it's not
| documented it's not clear to me that Unicorn itself is not
| thread safe, but rather that it has undocumented pitfalls that
| can easily lead to thread unsafety.
| pmarreck wrote:
| > Because the Rack environment is the same object for all
| requests
|
| Who in the heck thought this would be a good idea? For the
| record, back when Desk.com existed, I also saw strange session
| issues that were probably complex and irreproducible race
| conditions as well.
|
| Note that in Elixir/Phoenix, this class of bug is _literally
| impossible_, thanks to immutability-all-the-way-down. Turing
| bless it.
| makkesk8 wrote:
| How do you even have logs for something like this? I feel like
| the log level to see the raw cookie responses would be a no go
| since you'd easily have hundreds of gigabytes of logs every day.
| alexeiz wrote:
| > We initially thought this to be an internal reporting problem
| only and that we would see some data logged for an otherwise
| unrelated request from the background thread.
|
| So they already conceded that their code is fundamentally not
| thread-safe and they decided to let it slide.
| macintux wrote:
| Yes, but in fairness, we all make tradeoffs to get more done in
| less time. It's much easier to justify throwing your top
| engineers at a race condition for a few days when you _know_
| it's placing security at risk.
| nexuist wrote:
| What is the alternative? Throw the project away and start over?
| tyingq wrote:
| I may be missing something here. I do see that the patch[1] now
| creates a new request object, and thus a new "env" Ruby hash for
| each request.
|
| But I don't see the old behavior described as _"allocates one
| single Ruby Hash that is then cleared (using Hash#clear) between
| each request"_ ...even after poking around in http_request.rb and
| other places. It reads like the pre-patched version would just re-
| use the hash as-is, only overwriting/adding if it saw a new key
| or value, but not deleting any keys.
|
| [1] https://yhbt.net/unicorn-
| public/66A68DD8-83EF-4C7A-80E8-3F1F...
| lstamour wrote:
| Maybe here?
| https://github.com/defunkt/unicorn/blob/2c347116305338710331...
|
| Admittedly, that's an older copy, I'd have to look up a newer
| copy of the source to see if that function or a similar one
| still exists.
|
| The change was made in Unicorn v6 according to
| https://yhbt.net/unicorn/NEWS.html so I'd have to find code
| earlier than that, maybe.
|
| https://yhbt.net/unicorn.git/tree/ext/unicorn_http/unicorn_h...
| looks like it's still there.
|
| If I'm reading this right, the init function in the above
| unicorn_http.rl function is what's referred to as
| HttpRequest#new thanks to
| https://yhbt.net/unicorn.git/tree/lib/unicorn/http_request.r...
| but I might be misreading this.
|
| Actually... compared to uWSGI's source code, this is quite
| pleasant to read, warts and all. :D
| tyingq wrote:
| Ahh, yep. I hadn't thought to look at Ragel source.
| [deleted]
| __alexs wrote:
| uWSGI had a similar bug for a long time that was only fixed by
| accident and was never reported as a security issue. Concurrency
| is hard, I guess.
| boltzmann_brain wrote:
| All race conditions are rare ;P
| teraku wrote:
| This happened to a student colleague of mine in I think 2014. We
| were working on uni projects and when he logged in to check the
| repository, he was suddenly another user and we were all really
| surprised.
| ibraheemdev wrote:
| Funny, I don't believe I was auto-logged out on March 8th.
| [deleted]
| lambda_obrien wrote:
| Always log important events with a timestamp and save them as
| long as possible.
|
| It saved me more than once in similar situations with several
| threads.
| oars wrote:
| Wow, impressive!
| lambda_obrien wrote:
| I'm guessing that's sarcasm, I edited my comment.
| gowld wrote:
| > Always log important events with a timestamp and save them as
| long as possible.
|
| Except when those events are associated with private user data
| or behavior.
| lambda_obrien wrote:
| The personal data itself should not be logged but the event
| itself should be logged with lots of details that aren't
| private info.
| [deleted]
| closeparen wrote:
| Nope, _especially_ in those cases. The more sensitive the
| data, the more important it is to have records of who
| accessed or changed it and when.
| temp667 wrote:
| Huh? I want my bank to be VERY clear on when I transferred
| money for my home closing (and keep that record). They need
| to keep records and the browser used, IP used, authentication
| flow etc. Why does this have to be thrown away?
| ghiculescu wrote:
| Anyone know if puma would be susceptible to the same bug or if
| it's unicorn specific?
| jakswa wrote:
| This is what I'm going to be digging for later... I'm wondering
| if puma also shares `env` objects across requests.
| Freaky wrote:
| Each server does share a "proto env" object, but it is
| duplicated for each Client:
|
| https://github.com/puma/puma/blob/0cc3f7d71d1550dfa8f545ece8.
| ..
|
| and Client objects are not reused:
|
| https://github.com/puma/puma/blob/0cc3f7d71d1550dfa8f545ece8.
| ..
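|
| The pattern in question, roughly (a sketch of the idea, not
| Puma's actual code):
|
|     PROTO_ENV = { 'rack.version' => [1, 6] }.freeze
|
|     def env_for_request
|       # each request works on its own copy, so mutations can't
|       # leak into other requests or into the shared prototype
|       PROTO_ENV.dup
|     end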
| vemv wrote:
| It would not trigger the same bug.
|
| But since it's a multi-threaded server, it will exercise
| arbitrary code (Rails core, your app, depended-on gems) in a
| concurrent manner.
|
| This can unmask race conditions that wouldn't manifest
| themselves under a single-threaded environment.
|
| ...that's been my experience with Puma - brilliant server in
| itself, somewhat risky approach when considering the Ruby
| ecosystem.
| rantwasp wrote:
| nope. puma does the right thing. this is exactly what i looked
| at when i saw unicorn in the article
| ksec wrote:
| I thought Github at one point looked at Puma and decided not
| to switch over. I wonder if they will reconsider that now.
| rantwasp wrote:
| to be fair, switching over is probably non-trivial and/or
| risky
| rantwasp wrote:
| food for thought: if they had used puma (which is sort of what
| people have been using since 2017 when it comes to ruby/rails),
| would this bug have still existed?
| pianoben wrote:
| This is some fantastic debugging. I love war stories like this!
| They give me confidence that the world is full of mere mortals
| like myself, and we can do great things nonetheless.
| stephenr wrote:
| > One such performance improvement involved moving the logic to
| check a user's enabled features into a background thread that
| refreshed on an interval rather than checking their enablement
| while the request was being processed.
|
| This type of thing is why PHP's shared-nothing model is vastly
| undervalued IMO.
|
| Does PHP have the ability to use threads if you really need it?
| Yes.
|
| Are they generally a PITA to use? Yes.
|
| Is this a problem for the vast, vast, _vast_ majority of
| projects? Not in the slightest.
|
| Having a separate invocation of the runtime for each request is a
| feature, not a failing.
| joshdifabio wrote:
| Not for much longer!
|
| https://externals.io/message/113419#113555
| aquarese wrote:
| Thanks
| bvanderveen wrote:
| Well that's embarrassing. Passing a mutable dictionary--
| containing callbacks that could have closed over heaven-only-
| knows-what, no less!--between threads? And it's re-used between
| requests?
|
| They are lucky this wasn't much, much worse.
|
| Why does the error reporting service need to call the web thread
| back anyway?
|
| Yet another case where immutable-everything would have prevented
| this hairball from accreting in the first place.
| wfleming wrote:
| Unicorn isn't GitHub's code, though. So the object-reuse that
| was the root cause here wasn't their code, and treating your
| webserver as a working black box is pretty typical (right up
| until you hit an edge case like this).
|
| And Unicorn is pretty old. It wouldn't surprise me if this
| object reuse dates back to mongrel's code base, back when doing
| _anything_ with threads in Ruby would have been unusual.
|
| It's still bad code on unicorn's part (when I got to that part
| of the post I said "Unicorn does WHAT?" out loud), and GH
| should have been more careful introducing threads to an
| environment that didn't have concurrency before, but every step
| of the chain is pretty understandable IMHO.
| strictfp wrote:
| I agree, and the bug in the report is likely not the only one
| caused by these changes.
| andix wrote:
| > Well that's embarrassing
|
| It probably wasn't that obvious to see in the code.
|
| But if they had used immutable objects, or at least pure
| functions with scalar parameters, in the first place, such an
| error would never have occurred.
| arkadiyt wrote:
| > Well that's embarrassing.
|
| An IDOR is embarrassing - this was a complex bug and Github's
| investigation/response was great here. I'd be proud if my
| company handled a security incident like this.
| brainzap wrote:
| why the acronym
| alpaca128 wrote:
| For the people as confused as me, it stands for "Insecure
| Direct Object Reference" and not "International Day of
| Radiology" or "Iowa Department of Revenue".
| arghwhat wrote:
| Creating unnecessary complexity that should have been known
| ahead of time to be fragile, and then hitting the resulting
| issues, is embarrassing.
|
| The follow-up report is good, but it doesn't matter when the
| conclusion is "our footguns backfired".
| renewiltord wrote:
| As someone who has never added incremental complexity to
| software until it manifested bugs, I agree.
| yarg wrote:
| It can be both.
| matthewaveryusa wrote:
| Never underestimate the inertia of working code that makes
| money.
| benmmurphy wrote:
| This post mortem and changes to unicorn should be useful for a
| lot of other ruby shops. I suspect github aren't the only people
| who are not defensively copying rack data when sharing it with
| background threads.
| rtpg wrote:
| I really like Python, but the difficulties in using the same
| process for multiple requests really get me (and I think this bug
| illustrates a similar sort of difficulty with Ruby).
|
| There are of course safe ways to do all of this kind of stuff,
| but there are _so many_ unsafe ways to do things, and
| accidentally share things across requests. At least in the Django
| world I think ASGI is going to make it more likely that stuff is
| done the right way but there's a big learning curve there.
| jrockway wrote:
| Does Ruby have a data race detector, and would a race detector
| have caught this unexpected sharing?
| rantwasp wrote:
| this is not Ruby. This is unicorn doing some dumb stuff
| thecodemonkey wrote:
| Really interesting write-up. Curious how they do logging, and how
| they were able to get such a detailed look at the previous
| requests, including the HTTP response. Wouldn't it be a massive
| amount of data if all HTTP requests/responses were logged? (Not
| to mention the security implications of that)
| firebaze wrote:
| As expensive as it is, datadog provides the required level of
| insight via, among other drill-down methods, detailed
| flamegraphs, to get to the bottom of problems like this one.
|
| No, not affiliated with datadog, and not convinced the
| cost/benefit ratio is positive for us in the long run.
| nullsense wrote:
| We've implemented it recently and it's helped us
| tremendously.
| ckuehl wrote:
| Purely a guess, but I've sometimes been able to work backwards
| from the content length (very often included in HTTP server
| logs) to figure out what the body had to be.
| bob1029 wrote:
| It would indeed be a massive amount of data, but bear in mind
| that only a small fraction of HTTP requests hitting GitHub are
| actually both authenticated and attempting to change the state
| of the system. Most requests are unauthenticated read-only and
| cause no state changes in GitHub.
| vlovich123 wrote:
| It's a reasonable theory, but step one required to trigger
| this bug is an unauthenticated request. It's unclear if their
| logs indicated that, or if they got lucky during repro because
| someone thought to try an unauthenticated request first
| when back-to-back authenticated requests didn't repro.
| whimsicalism wrote:
| From the writeup, it _sounds_ like the latter, but we
| obviously can't be sure.
| aidos wrote:
| Could be that there's some tell tale in the logs.
|
| We recently fixed a bug with incomplete requests by
| noticing in nginx logs that the size of the headers + the
| size of the body was the size of the buffer of a linux
| socket.
| andix wrote:
| I guess the incident happened more often than they wanted to
| tell us. So as a first step they probably turned up the logging
| and waited for new reports...
| kevingadd wrote:
| It's pretty common to do exhaustive logging even if the data is
| enormous. It's valuable for debugging and security
| investigations. As far as the security of storing log data, you
| need log messages to not contain PII regardless of how long you
| retain them.
|
| It's not uncommon to see companies talk about processing
| terabytes or even petabytes of log data per day. In the end the
| storage isn't that expensive [1], and you can discard
| everything after a little while.
|
| [1] Assuming of course that any company generating a petabyte
| of logs every day is doing so because they have an enormous
| number of customers, AFAIK the cost of storing 1PB of logs on
| something like Glacier is in the 100-300k USD/yr range.
| Absolutely a tolerable expense for the peace of mind you get by
| knowing you can go back in 2 months and find every single
| request that touched a security vulnerability or hunt down the
| path an attacker took to breach your infrastructure.
| xvector wrote:
| Distributed tracing with a small sample size or small
| persistence duration?
| dmlittle wrote:
| Given the rarity of this bug I doubt a small sample size
| would have captured the right requests to be able to debug
| this. The probability would be the probability of this bug
| occurring * (sample percentage)^3 [you'd need to have
| sampled the correct 3 consecutive requests].
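|
| As a rough illustration (the numbers below are made up, not
| GitHub's actual figures):
|
|     p_bug    = 0.00001  # assumed rate at which the bug fires per request
|     sampling = 0.01     # tracing 1% of requests
|
|     # chance a given occurrence has all three consecutive requests sampled
|     p_captured = p_bug * sampling**3
|     # => 1.0e-11, i.e. effectively never captured by low-rate sampling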
| SilverRed wrote:
| Yes, it is a large amount of data, but it's worth paying to store
| it because it lets you go back in time and work out wtf
| happened, which will save you far more hours and solve more
| problems than it costs.
| mtnygard wrote:
| I'm looking forward to learning about how Rust makes this kind of
| bug impossible.
| vlovich123 wrote:
| It doesn't. The issue here seems like a more vanilla logical
| data race + object reuse that Rust can't really do anything
| about. Rust only makes sure that only one thread at a time
| writes and other reads don't race that write. It can't make
| guarantees that the relative ordering of reads and writes makes
| sense for your use-case.
|
| It can simplify certain abstractions of course so that you can
| provide easier tools for devs to avoid footguns.
| cogman10 wrote:
| What rust would do for this particular situation is make it
| painful to share an object like this across threads and make
| it obvious that such a share is happening. That wouldn't
| prevent this problem, but it would make it harder to stumble
| into.
| vlovich123 wrote:
| Maybe, maybe not. A claim like that is definitely in the
| realm of "people think race condition bugs are solved by
| switching to Rust". I don't know Rust well enough to
| evaluate your claim but that feels like an incorrect
| statement - sharing data between threads should be as easy
| as wrapping something in Arc & Mutex to get similar
| semantics I think.
|
| Even if that particular bug is actually harder to stumble
| into, there's no guarantee the space of thread
| synchronization issues one might stumble into is inherently
| smaller for Rust than for Ruby, so all you're doing is
| exchanging one class of failure modes for another.
| cogman10 wrote:
| As I said, they aren't solved but they are hard to do
| unknowingly.
|
| In most other languages, sharing mutable data across
| threads is unprotected. You are lucky if you get a
| compiler warning. Rust disallows that completely. You
| cannot share mutable data across threads without first
| wrapping that data in thread safety constructs.
|
| That's what happened here. A mutable variable was shared
| between threads and updated from different threads. Rust
| makes it obvious that this variable is accessed from
| multiple threads even if it doesn't prevent 2 threads
| from writing to the variable at wrong times. That's where
| it is a partial solution to the problem.
|
| Sharing is as easy as wrapping the thing in a Mutex, and
| if you are a programmer that sees something wrapped in a
| mutex you should immediately start thinking of the thread
| implications. You don't have to, but you should.
|
| That mutex follows the variable around anywhere you want
| to start changing it.
|
| In a language like C or Ruby, that's not something that
| is patently obvious. Shared variables aren't wrapped with
| anything. It's perfectly understandable for someone to
| not realize this bit of shared state is mutated across
| threads.
| bredren wrote:
| Since this was essentially a potential organizational
| administrator account takeover security vulnerability, I can see
| why this got a write up. And as others have said, it's
| impressive.
|
| Github Actions has had many problems that affect many people this
| year alone. While the missed deadlines and stress for people
| can't be quantified by Github, they are happening.
|
| I'd urge Github to take ongoing service interruptions in Github
| Actions more seriously. I hope Github will take a stronger stance
| on providing customer communication on migration timelines
| between Packages and Container Registry.
|
| Just because a bug or service interruption doesn't risk random
| user session exposure doesn't mean it isn't important to
| communicate about and describe how the business is working to
| prevent it from happening again.
| TedShiller wrote:
| "Rare"? How about "catastrophic"? Is this written by PR people?
| tcgv wrote:
| > One such performance improvement involved moving the logic to
| check a user's enabled features into a background thread that
| refreshed on an interval rather than checking their enablement
| while the request was being processed
|
| It's never a good idea to perform multithreaded operations within
| the web server, unless you have a strong knowledge of its
| internal workings. For instance, if you mismanage threads in an
| IIS web application you can end up having your web requests
| queued, since the web server has by default a fixed number of
| worker threads to serve web requests.
|
| From their bug description it's clear that Ruby on Rails is no
| exception to this rule. In cases where background processing is
| required due to some condition triggered by a web request, it's
| best to stick with a proper queue system [1]
|
| [1] https://thomasvilhena.com/2019/07/using-queues-to-offload-
| we...
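|
| A hedged sketch of that pattern using Sidekiq (one common Ruby
| queue system); FeatureFlags.refresh_for and current_user are
| hypothetical application names:
|
|     require 'sidekiq'
|
|     class RefreshEnabledFeaturesJob
|       include Sidekiq::Worker
|
|       def perform(user_id)
|         # runs in a separate worker process, far from Rack's env hash
|         FeatureFlags.refresh_for(user_id)
|       end
|     end
|
|     # in the controller, instead of spawning a thread per request:
|     RefreshEnabledFeaturesJob.perform_async(current_user.id)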
| skytreader wrote:
| > On March 2, 2021, we received a report via our support team
| from a user who, while using GitHub.com logged in as their own
| user, was suddenly authenticated as another user.
|
| Sobering thought for me is how probable it is that a similar
| issue might be present in other platforms. The title calls it
| "rare" but given (a) the number of complex-enough web/app
| platforms in use today, (b) just the endless number of ways/infra
| permutations for "doing x", and (c) the size of a typical
| company's codebase and dependencies, could it be a more common
| occurrence than we think?
|
| Total anecdote, up for you to believe: Back in April 2016, one of
| my then-housemates bought a brand-new Macbook Air. A week into
| his ownership, he exclaims surprise that he is logged-in to
| another Facebook account, someone we are totally not acquainted
| with. I borrow his MBA and try to investigate but, alas, I lack
| expertise for it. The only relevant thing I can eke out is that
| they both went to the same event a week or so ago and probably
| shared a Wifi AP, where a "leak" could've happened. Or maybe a
| hash collision in FB's side?
|
| I would've reported it to FB but then I don't have enough
| details; with the two conjectures I have, I'm basically holding
| water with a sieve. But now the thought that someone I totally
| don't know might end up logged-in in my FB account is a
| possibility that bothers me. And I have no idea how to prevent it
| other than minimizing what could be compromised (and no, "Get off
| FB!" just isn't in the cards, I'm sorry).
|
| Kudos for Github for investigating and fixing this issue. Another
| thought that occurred to me while writing this is how many users
| of other platforms might be filing issues like this but can't
| provide enough detail and so their reports are eventually marked
| as "Could Not Reproduce". Not exactly leaky sessions but just
| general security mishaps that they would struggle to describe to
| support.
| sellyme wrote:
| > The only relevant thing I can eke out is that they both went
| to the same event a week or so ago and probably shared a Wifi
| AP, where a "leak" could've happened. Or maybe a hash collision
| in FB's side?
|
| Given two users in extremely close geographic proximity, the
| chances of them having manually logged in on the device
| themselves with your housemate either forgetting or not knowing
| are much much higher than the chances of an authentication
| exploit in the world's most popular site that would have gone
| undiscovered for a subsequent five years at this point.
|
| Probably somewhere between the two (but closer to the "they
| just logged in on that computer" side of things) is the chance
| of the LAN being some eldritch abomination that just
| occasionally serves responses to the wrong local IPs, but
| that's not something any online service needs to care about.
| mschuster91 wrote:
| > is the chance of the LAN being some eldritch abomination
| that just occasionally serves responses to the wrong local
| IPs
|
| This could have worked in ye olde pre-HTTPS days, but with
| HTTPS this _cannot_ reasonably have happened.
| [deleted]
| robindebaets wrote:
| I once had the exact same thing happen on PayPal. I logged into
| my account, but was greeted by the information of another user.
| I immediately logged out. I never reported it and it never
| happened again.
| exdsq wrote:
| Don't sell yourself short, I'm sure you're more than
| experienced to get your own MBA!
| ficklepickle wrote:
| If it were a hash collision, I would think the odds of it
| involving two users geographically close to each other would be
| quite small.
|
| That's a weird one, thanks for sharing it!
| tcgv wrote:
| Not if you consider that FB servers are somewhat
| geographically partitioned to optimize response times
| vanviegen wrote:
| That sounds like the event was forcing the use of an HTTP proxy
| server with over-eager caching. But in 2016, that would be a
| bit anachronistic..
| thraway123412 wrote:
| Also on GOG:
| https://www.gog.com/forum/general/warning_massive_security_r...
|
| > Hello everyone! This is forum regular fronzelneekburm,
| posting from the account of some poor Chinese guy that gog
| randomly logged me into. I'll explain the situation in further
| detail in another post from my actual account once I logged out
| of this one.
| alickz wrote:
| Happened on Steam too a couple of years ago, people being
| logged into other people's accounts.
| efreak wrote:
| My understanding was that was a caching issue and you
| couldn't actually do anything. Unless maybe I'm thinking of
| the wrong thing?
| BurningFrog wrote:
| > _one of my then-housemates bought a brand-new Macbook Air_
|
| My most fun theories are
|
| 1. Somewhere in the supply chain, someone logged in to their FB
| account on this machine.
|
| 2. During the first week he owned it, someone logged in to
| their FB account on this machine.
|
| 3. Your ex-housemate lied about this. Crazy people are rare, but
| _MUCH_ more common than bugs that could cause this.
| deckard1 wrote:
| > The title calls it "rare"
|
| I believe they are referring to the rarity of hitting this bug
| during the period of time it was broken, and not the rarity of
| the bug in general.
|
| People were speculating on this here:
| https://news.ycombinator.com/item?id=26395138, and I even left
| a comment about this type of bug, but on Node. This category of
| bug, where session data leaks across requests, is actually
| incredibly common.
| wholien wrote:
| The earlier report[1] says:
|
| > The underlying bug existed on GitHub.com for a cumulative
| period of less than two weeks at various times between February
| 8, 2021 and March 5, 2021. Once the root cause was identified
| and a fix developed, we immediately patched GitHub.com on March
| 5. A second patch was deployed on March 8 to implement
| additional measures to further harden our application from this
| type of bug. There is no indication that other GitHub.com
| properties or products were affected by this issue, including
| GitHub Enterprise Server. We believe that this session
| misrouting occurred in fewer than 0.001% of authenticated
| sessions on GitHub.com.
|
| If we use the 0.001% figure and assume all 56 million of their
| users are authenticated, then this occurred with 560 sessions
| at most.
|
| You are right in that something that only happens to a
| minuscule % of users is hard to investigate and repro. But at
| the same time something as critical as "I logged in and was
| authenticated as another user" should probably be immediately
| escalated to the highest levels within your company.
|
| 1: https://github.blog/2021-03-08-github-security-update-a-
| bug-...
| abdupo wrote:
| So I am one of the users who reported this. I quickly filed a
| support ticket but then I also searched my network to find
| someone on GH's security team to email. I'm not sure if support
| or the security person I emailed escalated it first.
| nodesocket wrote:
| Did you get any bounty?
| codethief wrote:
| I recently reported a phishing site to GitHub which looked
| exactly like GitHub.com and GitHub responded _within a day_.
| So maybe your escalation wouldn't even have been necessary.
| :)
| benlivengood wrote:
| This is the kind of bug that makes me leery of frameworks and
| tall application stacks. Having to trust more than one layer of
| multithreaded code between an application and the kernel gets
| riskier with every layer.
|
| Also another reason not to mix the main application code with the
| authentication/authorization code. Setting security cookies
| should be done by dedicated code in a dedicated process to avoid
| this kind of interaction, and the load-balancer layer should
| filter the setting of security cookies from non-security backends
| to enforce it.
| bkartal wrote:
| It is scary to think this could be happening with bank account
| systems.
| gizmo wrote:
| There is a lesson here about the way we go about sharing memory
| between threads that is implicit and fundamentally broken. Adding
| extra threading in order to improve performance is clearly the
| wrong tradeoff when you're in an environment where threads can
| introduce bugs like these. When it takes 10 senior engineers to
| figure out why one user session gets data from another session
| you've already lost.
|
| Shared memory environments are very unsafe and we should be
| working towards a memory model where data that belongs to
| different organizations can never get mixed. This isn't
| impossible, but it requires new programming languages that allow
| us to express which memory areas can't get mixed up.
|
| We've got to start designing our backend infrastructure
| differently such that this entire class of memory errors can't
| ever occur.
| Cloudef wrote:
| It's called separate processes or unshare after new thread.
| lstamour wrote:
| In many languages and web frameworks, the assumption is that
| the application server will do this for you. For example, by
| issuing a new env variable with each request such that requests
| can't easily share memory with other requests, though as noted
| that might not stop background threads from retaining such
| memory.
|
| In other cases, you might assume containers would be enough
| separation, or maybe different databases (sharding, for
| example). One approach I've considered but not yet implemented
| is to take end-user JWT cookies and use them to authenticate to
| a database such that a user can only ever see the rows they
| should be allowed access to, and all other SQL queries would
| fail.
|
| But no matter how you implement a security control, it's
| possible to make mistakes and allow others to see data you
| didn't intend to share with them. Even Rust, famous for its
| explicit memory ownership model, could still have similar bugs,
| they just might be a bit more obvious to spot in some session
| connection pool implemented incorrectly, for example, or maybe
| exposed via a compiler optimization gone wrong. Code review
| will still likely be one of the primary defenses of this kind
| of bug, though hopefully better tooling and language support to
| detect race conditions or shared ownership warnings will help
| over time?
| DougWebb wrote:
| Replace Rails and similar server-side frameworks with CGI
| scripts, and you're done. Every request is a new process with
| its own memory space.
|
| Yes, performance is an issue with that old approach. Around the
| turn of the century I was responsible for a Perl web
| application that ran as a CGI script, and to speed it up I
| wrote my own Perl httpd server that had my web application code
| pre-loaded. Then for each request I forked a new process, which
| carried over all of the pre-loaded code and already-running
| perl.exe, so there was very little startup time (compared to a
| normal CGI script startup.) I was careful to take full
| advantage of the operating system's Copy-on-write memory
| semantics, so that the only memory that had to be allocated for
| the new process was Perl's runtime stack. All of the memory
| containing the application code was shared across processes
| because it never got written to.
|
| This web application is still running today, though I left the
| company in 2010. Back then, it was handling about six million
| requests per day, about 75% of which was during US daytime
| working hours. So, around 160 requests/second, spread across
| five or six servers that were getting old in 2010.
|
| The modern frameworks have taken the same concept a step
| further, and replaced the process fork with a thread pool.
| That's faster, but it allows memory leaking in a way process
| forking does not. You have to go out of your way to share
| memory between processes, but with threads it's easy to do
| accidentally.
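|
| The fork-per-request idea in a minimal Ruby sketch (illustrative
| only, nothing like a production server): preload everything, then
| fork one process per request so mutable state can never cross
| requests:
|
|     require 'socket'
|
|     server = TCPServer.new(8080)   # heavy app code would be preloaded here
|
|     loop do
|       client = server.accept
|       pid = fork do
|         # child process: shares preloaded code copy-on-write, but
|         # any writes stay private to this one request
|         client.gets
|         client.write "HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok"
|         client.close
|       end
|       Process.detach(pid)  # avoid zombie children
|       client.close         # parent closes its copy of the socket
|     end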
| hu3 wrote:
| Workerman [1] uses a similar approach to put PHP pretty high
| on TechEmpower [2]. Especially now with PHP8's JIT.
|
| [1] https://github.com/walkor/Workerman
|
| [2] https://www.techempower.com/benchmarks
| wicket wrote:
| It doesn't seem like they took this security incident very
| seriously. It's a pretty serious bug that was reported by two
| different users within a few hours. It could have been abused by
| others, but it doesn't look like it occurred to them to roll back
| recent changes whilst they investigated the problem.
| vemv wrote:
| This is a good example of why anything related with threading
| makes me nervous when using Ruby. When a language is mutable and
| thread-unsafe by default, keeping the whole stack sane at all
| times seems prohibitively arduous. One cannot code-review a whole
| ecosystem.
|
| I believe that this could be addressed with an architecture that
| favored single-threaded processes that can handle a single
| request at a time. Said processes would be numerous, have a
| (best-effort) low memory footprint, and be pooled under some
| master process written in a concurrent/performant manner.
|
| The result wouldn't be necessarily pretty or resource-efficient
| (especially in terms of RAM), but it also can be seen as a
| consequence of Ruby's requirements for safety.
|
| An org like GH almost certainly can afford using more/beefier
| servers if it means that entire classes of bugs will go away.
| amarshall wrote:
| > when using Ruby
|
| To be fair to Ruby, most languages are largely mutable and
| without any strong thread-safety guarantees. Does that make it
| okay? No, but it's far from a Ruby-specific problem.
|
| > I believe that this could be addressed with an architecture
| that favored single-threaded processes that can handle a single
| request at a time.
|
| Unicorn (the application web server GitHub is using) does
| follow this model. GitHub added their own additional thread
| within that process that broke that design in an unforeseen
| way.
| mcherm wrote:
| I just want to make clear that for me, write ups containing this
| level of detail analyzing the error that occurred give me greater
| confidence in the company rather than less confidence. Greater
| even than if I had never heard of the original bug. There are
| many bugs I have never heard about, but any company that uses
| this level of care and openness is likely to be very reliable.
| TedShiller wrote:
| This is such a n00b mistake, though, that the "greater confidence"
| is quickly canceled out. I end up having less confidence in this
| company.
| ShroudedNight wrote:
| I think it's more a statement of the quality and clarity of
| the postmortem that this problem appears so obvious in
| hindsight.
| SilverRed wrote:
| Every bug is a "n00b mistake" when you are looking right at
| it. There is not a single large project that is not filled
| with or at least has had trivial bugs with serious
| repercussions.
| sellyme wrote:
| > Every bug is a "n00b mistake" when you are looking right
| at it.
|
| Except the ones that are just the universe conspiring
| against you, like bit flipping in the SM-1800 because of
| heavily irradiated cattle being transported on a nearby
| train line.
| TedShiller wrote:
| If you're a noob then yes every mistake is a noob mistake
| petters wrote:
| I agree with you in general, but in this particular case they
| knowingly deployed thread-unsafe code in production while
| increasing the threaded concurrency. That should have broken
| internal development rules.
| eli wrote:
| Now imagine what happens behind the scenes at other service
| providers. I'd bet the ones that have policies that bury the
| details of incidents, on average, have worse problems.
| yuliyp wrote:
| Yes. Writing code with bugs breaks internal development
| rules. The issue wasn't thread-unsafe code. It was confusion
| about object lifecycle. An env object was assumed to no
| longer be used by the web framework after a request
| terminated, while in reality it was recycled for future
| requests as a performance optimization.
| simonw wrote:
| It sounds to me like they created new internal development
| rules as a result, which is a smart response to the issue.
| m463 wrote:
| I have to say the presentation is exceptional, including very
| good diagrams of the problem.
|
| But there's a thought in the back of my head wondering what
| kind of stuff hit the fan, and wondering who had to create the
| presentation. Was it the person who checked in the broken code,
| was it the bug fixer or his manager, or was it a pm that got
| stuck having to write up what happened?
| fotta wrote:
| My guess is that someone close to the remediation wrote it
| (or it was written by the eng team) and it went through 5
| levels of editing before publication.
| m463 wrote:
| I wonder about the email thread. how many recipients on the
| TO, how many on the CC (how many on BCC?)
| codetrotter wrote:
| > how many on BCC?
|
| I guess the only way we will know that is if there's ever
| a vulnerability in the session handling code of the mail
| server that they use also xD
| Complexicate wrote:
| Agreed. However, I smirked a bit when I read your comment,
| since this is ultimately coming from Microsoft. Let's hope some
| of this mentality seeps into other Microsoft projects
| thitcanh wrote:
| GitHub is owned by Microsoft but as far as I know it operates
| pretty independently. Let's hope it stays that way.
| craftinator wrote:
| At this point we're all just waiting for that last E.
| [deleted]
| johannes1234321 wrote:
| What would be the benefit for Microsoft in that case?
| They are transforming themselves from a platform company
| based around Windows into a platform company based around
| Azure. GitHub is a relevant part of that.
|
| This doesn't mean they are only doing good(tm) with
| GitHub, but destroying it and open source vendors would
| be counter to their interests.
| edoceo wrote:
| Or, pro-actively moving to GitLab - either self-hosted or
| paid and using GH as a simple mirror for discovery.
| arthurcolle wrote:
| I don't understand this reference, can you elucidate
| xxqp wrote:
| elucidate, extend, extinguish
| artificial wrote:
| Most likely a reference to Embrace, Extend, Extinguish. A
| strategy Microsoft employed in the past.
| arthurcolle wrote:
| Ah thanks, yeah. Familiar with the strategy, but didn't
| get the connection - been a long day over here
| m463 wrote:
| We should Eulogize
| scintill76 wrote:
| https://en.m.wikipedia.org/wiki/Embrace,_extend,_and_exti
| ngu...
| zelon88 wrote:
| > We believe that transparency is key in earning and keeping the
| trust of our users and want to share more about this bug.
|
| I find it ironic that the largest repository of open source code
| is closed source and and owned by the largest closed source
| software company on the planet, yet they somehow still "believe
| in transparency."
|
| That's like McDonald's "transparently" recalling burgers because
| of a chemical contamination, and then when you ask what else is
| in the burgers they tell you it's a secret.
| berniemadoff69 wrote:
| Fortunately Microsoft, like McDonalds, can use their billions
| of dollars in the bank to produce high quality graphics to
| 'explain the situation' - and when people visit the site, they
| will be Delighted and Impressed with these high quality
| graphics - instantly remedying the situation. 'Look how nice we
| explained it - look at the cute cartoon characters solving a
| puzzle in the header at the top of the page, see how slick we
| are?'
| spaetzleesser wrote:
| I wonder if open sourcing their code would even be a problem
| for them now. I don't think the code is worth much vs the brand
| name and scale.
| simonw wrote:
| Strategic issues aside, open sourcing a codebase that is over
| a decade old is extremely hard. You need to be absolutely
| sure there's nothing in the code or commit history that's not
| suitable for release.
|
| Just one example: what if someone pasted an error log message
| into a commit message that included a piece of PII?
| [deleted]
| xeeeeeeeeeeenu wrote:
| Open-sourcing the codebase might affect the sales of the on-
| premises version.
| EMM_386 wrote:
| I wonder if Microsoft has long-term plans to move Github off of
| things like Unicorn/Rails/Erlang into a technology stack that
| would be more familiar to them.
|
| I'm not judging the choices of technology, but I'm thinking it
| was easier for them to buy Github outright than try to compete
| with it. I assume Github still operates independently, with
| engineers who are familiar with this stack.
| rantwasp wrote:
| hah. good luck just moving it to another stack. this just
| doesn't happen.
|
| also, what does familiar to them mean in the MS context? i'm
| sure the people that work on github are familiar with the stack
| EMM_386 wrote:
| Agreed, the comment was short sighted. Was hesitant to even
| post it considering many involved are likely here on HN.
|
| I wasn't sure how independent Github is from Microsoft after
| the acquisition. My thoughts were initially in moving
| engineers around internally from within Microsoft to work on
| Github.
|
| Debugging multi-threaded race conditions in a Rails app under
| Unicorn/Rack takes a lot of experience with specific tooling.
| baggy_trough wrote:
| The smartest engineers in the world are not smart enough to write
| bug free multithreaded code.
| fanf2 wrote:
| This is the kind of multithreading bug that Rust's borrow
| checker can prevent: the compiler would have complained that
| the background task and the main request handler have write
| access to the env object at the same time, which is not
| allowed.
| astrange wrote:
| Writing your website in PHP would also have prevented it,
| because its amateur process per request design is actually
| the best way to do things, scales well, and has the right
| security properties. Having an app server that can have
| requests from different users in the same memory space is a
| mistake.
| baggy_trough wrote:
| You can also run Ruby's unicorn or puma servers as a
| single-threaded process cluster.
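|
| For example, a minimal config/puma.rb along these lines (a
| sketch; WEB_CONCURRENCY is an illustrative environment variable)
| keeps each worker process single-threaded:
|
|     workers Integer(ENV.fetch('WEB_CONCURRENCY', 4))
|     threads 1, 1
|     preload_app!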
| joshdifabio wrote:
| PHP might move away from this safe approach with fibers,
| which are about to be added to the language.
|
| https://externals.io/message/113419#113555
| WJW wrote:
| Nobody is smart enough to write bug free code, but at least
| some companies have the smarts to fix their code and blog about
| how they did it.
| robotresearcher wrote:
| You're right but the parent's point is that multithreaded
| code with shared objects is known to be particularly
| difficult to do perfectly, and hard work to debug. It's so
| difficult that the rational strategy may be to architect to
| avoid it. We will have bugs, but not _those_ bugs. Those bugs
| stink.
| baggy_trough wrote:
| may > definitely is, whenever possible.
| trav4225 wrote:
| It depends on the complexity.
| jbverschoor wrote:
| Is there a reason for the vague title? Any other title would
| cause this to be the top post for 1 or 2 days
___________________________________________________________________
(page generated 2021-03-19 23:03 UTC)