[HN Gopher] Okta Outage
       ___________________________________________________________________
        
       Okta Outage
        
       Author : hunter2_
       Score  : 96 points
       Date   : 2021-12-15 15:57 UTC (7 hours ago)
        
 (HTM) web link (status.okta.com)
 (TXT) w3m dump (status.okta.com)
        
       | nickdothutton wrote:
       | Auth services need to be engineered for at least five nines if
       | not six. System design fail.
        
         | justapassenger wrote:
         | You can engineer for any number of nines and still have massive
         | outages.
        
           | thrashh wrote:
           | With that logic, you can do anything and it's A-OK if nothing
           | you do succeeds.
        
             | salawat wrote:
             | Can confirm. This is how the world works smh.
        
         | neurotixz wrote:
         | OKTA SLAs and support terms specifically exclude AWS outages,
         | so why would they?
        
         | netghost wrote:
         | Ahh, but how many of the nines need to go on which side of the
         | decimal point?
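
For reference, the "five nines" and "six nines" targets mentioned in this subthread translate into a yearly downtime budget. The sketch below is plain arithmetic, assuming a 365-day year; the function name is made up for illustration.

```python
# Yearly downtime budget for an availability target of N nines.
# Assumption: a non-leap year of 365 days.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def allowed_downtime_seconds(nines: int) -> float:
    """Return the yearly downtime budget, in seconds, for `nines` nines."""
    unavailability = 10 ** (-nines)  # e.g. 5 nines -> 0.00001
    return SECONDS_PER_YEAR * unavailability


for n in (3, 4, 5, 6):
    minutes = allowed_downtime_seconds(n) / 60
    print(f"{n} nines: {minutes:.2f} minutes/year")
```

At five nines the budget is a little over five minutes per year, so a single outage of roughly 45 minutes (the duration reported elsewhere in this thread) would use up several years of that budget.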
        
       | teddyh wrote:
       | https://news.ycombinator.com/item?id=29567170
        
       | whalesalad wrote:
       | TIL okta is in us-west-2
        
         | gabrielsroka wrote:
         | With redundancy in other regions
         | https://www.okta.com/video/oktane19-roadmap-why-building-cus...
        
           | marcinzm wrote:
           | Given they were down I would say there was, in practice, no
           | redundancy. Simply claiming something doesn't make it true.
        
             | gabrielsroka wrote:
             | I don't know the details but they don't fail over
             | automatically. There has to be a reason to push the button.
             | Perhaps the reason did not exist this time. But I know for
             | sure that there is redundancy.
        
               | stevehawk wrote:
               | So Okta's redundancy is about as reliable as AWS's status
               | page?
        
               | marcinzm wrote:
               | Redundancy is only relevant if it helps you in an outage.
               | Otherwise it's just a pointless marketing term no matter
               | how much effort you put into it.
        
         | pdx6 wrote:
         | Okta uses nearly all the us regions, with older (and larger)
         | customers in us-east-1.
         | 
         | I used to work there and know the internals well. These aws
         | outages must be causing massive chaos there.
        
       | dingosity wrote:
       | Oh. It wasn't just me.
        
       | anonymousiam wrote:
       | Heh. Our company just switched to their TFA from MS as of this
       | morning. Poor timing.
        
         | polskibus wrote:
         | What was the business rationale for such switch? I usually see
         | people migrate toward MS auth not away from it.
        
           | oneplane wrote:
           | We're moving everything that is still tied to it away from
           | MS. The last products we have to solve are Dynamics and Excel
           | but that scope is so small compared to everything else that
           | it might not matter to leave those as-is for now if we can
           | get at least Dynamics as SaaS and ditch AD (which only
           | remains for Dynamics).
           | 
           | MS doesn't do the things we need in a better way than other
           | options, and it's almost always more expensive at product
           | level and TCO level.
        
             | jiggawatts wrote:
             | I've found the opposite -- as long as you're _happy_ with
             | staying within the MS ecosystems. So Azure, Office 365,
             | Teams, etc...
             | 
             | You hear people calling Microsoft expensive when they're on
             | some random mix of Gmail, Notes, CM9, or whatever.
             | 
             | Then MS seems expensive because it's all or nothing.
             | Dipping your toe in the water turns into a dive to the
             | bottom of the pool.
        
               | oneplane wrote:
               | To be honest not a lot of our users and administrators
               | have actually been happy in the MS ecosystem. There are a
               | few outliers, some licensing middlemen, a few MSPs and a
               | couple of hardcore Excel number crunchers that use the
               | Axapta or Dynamics connectors. But you can find those
               | anywhere like with SAS and SPSS. A lot of users don't
               | really care at all so that just makes it a cost and 'does
               | it do the bare minimum'-deal for them.
               | 
               | A few people that really invest and enjoy a specific
               | application does not make it great, especially when it
               | turns out they are just doing more than they should be
               | doing; i.e. when you have an InDesign professional that
               | would be typesetting materials for publication but the
               | person that writes the copy is also trying to 'typeset'
               | the source in Word. It's great if you then feel like Word
               | gets you cool typeset documents as a power user, but if
               | 9999 people in a 10k company don't do that and just let
               | the publication team do that properly in InDesign
               | according to the media standards, it's no reason to keep
               | it as a default available application.
               | 
               | A lot of the usage comes from "well, it was already there
               | so I went and did it in that". Not because it was
               | actually the standard, best choice or in scope of the
               | task that was supposed to be done.
               | 
               | Same goes for things like notes and documentation:
               | 
               | - Code-level docs go in the repo (MD, RST mostly)
               | - Org-level docs go in the wiki (Confluence)
               | - Publications are delivered as copy to the publication
               |   team which then uses the DTP/typesetting thing of
               |   choice
               | 
               | Yet someone who would ignore that creates extra work by
               | doing it in a different application first, then copying
               | it around and converting it. That means that the
               | person/process needs to be fixed, and doesn't mean we
               | need Word as an expensive WordPad/Pages replacement.
               | 
               | Now, this might not apply to things like mini-orgs inside
               | a bigger org, or very small companies and individuals.
               | But I wasn't writing about those anyway ;-) At that level
               | you don't really have the size and scope to make good
               | choices anyway, and you're best off just sticking with
               | one big vendor, not because they are the best, but
               | because you won't be handling multi-vendor management
               | anyway.
        
           | anonymousiam wrote:
           | According to our CTO, it was related to security. With
           | Microsoft, the TFA options did not include a hardware token.
           | So now we can authenticate with a phone call, a text message,
           | or a security token. (
           | https://www.okta.com/identity-101/security-token/ )
           | 
           | The main advantage is that the hardware token can be used in
           | areas where mobile phones are prohibited, and of course
           | immunity from a SIM swap attack.
        
             | vladvasiliu wrote:
             | They do support hardware tokens. I use a Yubikey with it.
             | However, support is spotty outside of Windows.
        
               | anonymousiam wrote:
               | There may have been other reasons not mentioned, such as
               | Microsoft's tiered services model. It sometimes seems as
               | if they deliberately provide poor solutions at the lower
               | tiers. The USG is pretty pissed off about that.
               | 
               | Also, Yubikey would not work, because like mobile
               | phones, USB devices are restricted in some areas.
        
               | vladvasiliu wrote:
               | > Also, Yubikey would not work, because like mobile
               | phones, USB devices are restricted in some areas.
               | 
               | What kind of token would work, then? Something that only
               | generates a TOTP, like those fobs some banks used to give
               | out?
        
               | anonymousiam wrote:
               | Yes. Same method as Google/Microsoft Authenticator, but
               | implemented in a separate hardware device (fob).
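
The code-generation method referred to here (what authenticator apps and TOTP fobs compute) is standardized as TOTP in RFC 6238, built on HOTP from RFC 4226. A minimal sketch using only the Python standard library follows. It assumes HMAC-SHA1 and a 30-second step, which are the RFC defaults; nothing in the thread says which parameters Okta or any particular fob uses.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HOTP (RFC 4226): HMAC-SHA1 over an 8-byte big-endian counter."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation offset
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, now: Optional[float] = None,
         step: int = 30, digits: int = 6) -> str:
    """TOTP (RFC 6238): HOTP with the counter derived from Unix time."""
    if now is None:
        now = time.time()
    return hotp(secret, int(now // step), digits)


# The shared secret below is the RFC test secret, not a real credential.
print(totp(b"12345678901234567890"))
```

Since the code depends only on the shared secret and the current time, the device needs no network, phone, or USB connection, which fits the restricted-area use case described above.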
        
               | Closi wrote:
               | Ah, Microsoft has this in public preview at the moment so
               | full support looks imminent.
        
               | bsder wrote:
               | Link please so I can track?
        
           | amw-zero wrote:
           | I think that depends on many factors, like where you work,
           | what class of companies you work for, etc. For example, I
           | have not even seen a Windows machine in 10 years at work. I
           | can't think of anyone in my professional circle either who
           | would suggest using any MS product.
        
             | jiggawatts wrote:
             | I saw a Linux machine recently... I think.
        
           | nefitty wrote:
           | I had to use Okta at a company once. I asked myself the very
           | same question you posed, every single day.
        
             | vosper wrote:
             | Is MS auth better? Does that mean Active Directory?
             | 
             | We're in the middle of a migration from in-house auth
             | (which we need to get rid of) to Okta and I think the
             | people involved are finding Okta pretty confusing. But it's
             | a big product and auth stuff is complicated, so I'm not
             | sure how much it's Okta's fault.
        
               | vladvasiliu wrote:
               | In the context of Okta it's probably AzureAD. But yes,
               | it's related to Active Directory, you can easily sync the
               | two. It's probably why many companies use it: it's easy
               | to add on to your existing Windows infrastructure.
               | 
               | My client uses it, it works mostly well. It does have its
               | annoying limitations, though, such as no group
               | inheritance and limited support for hardware tokens
               | outside of Windows (no support on Safari/iOS,
               | Safari/macOS, Firefox/Linux).
        
               | count wrote:
               | What do you mean hardware tokens outside of windows? For
               | primary workstation authentication? Or AzureAD auth? By
               | hardware tokens do you mean RSA or TOTP hardware fobs?
        
             | marcinzm wrote:
             | I've used Okta at two companies now and I found it fairly
             | pleasant. Most issues were around people getting locked out
             | in my experience.
        
       | mooreds wrote:
       | I think this is why having the ability to self-host is
       | worthwhile. This option gives you flexibility to bring this stuff
       | in house if you want to build a team to operate it (or put it on
       | a current team's todo list).
       | 
       | Should you? I don't know your situation and whether you can build
       | an Okta-caliber level team internally. (My guess is that many
       | smaller or non-tech focused orgs would have a hard time with
       | that, but that's just a guess.) It's a hard question worth
       | asking.
       | 
       | It's easy to think "we could have done better" when things are on
       | fire, as opposed to all the times when the status chart is all
       | green and you don't have to think about Okta (feel free to
       | s/Okta/other service provider/) at all.
       | 
       | Disclosure: I work for FusionAuth, an auth provider that has both
       | SaaS and self-hosted installation options.
        
         | zitterbewegung wrote:
         | When I worked for an event services company and a fairly large
         | SR22 / budget insurance company the better question is not
         | cloud or no but really hybrid cloud systems. The first one had
         | such a long downtime event that they invested in a remote
         | failsafe. The insurance company was largely resilient and when
         | power went out the servers were up but the people doing work
         | weren't able to come there. They bought a generator but they
         | weren't able to turn it on.
         | 
         | Blaming Okta or any other group isn't the issue. Your customers
         | don't care how you are down they only care if you are down.
         | 
         | Also, I got a bill from Amazon when I forgot to shut down a
         | SageMaker instance and that cost me $700. I self host now
         | buying a business internet package with a static ip. I also
         | upgraded the machine but it wasn't necessary and in hindsight I
         | shouldn't have done the upgrade but just fixed the case.
        
       | tyingq wrote:
       | I know there are complexities involved, but auth is one of those
       | things that needs to be very insulated from single region issues.
       | How long was Okta not working for customers?
        
         | tw04 wrote:
         | Someone should tell Amazon... IAM is single homed to us-east-1
         | AFAIK.
        
           | dastbe wrote:
           | (I used to work at aws)
           | 
           | The control API (i.e. adding/removing roles, modifying
           | policies, etc.) is available out of us-east-1. However, the
           | bits of IAM that relate to distributing credentials to
           | instances/tasks/lambdas and STS are all regionalized and
           | isolated.
        
             | dboreham wrote:
             | So...parent is correct.
        
               | dastbe wrote:
               | no, because most applications don't have an online
               | dependency for creating roles and modifying policies.
               | what they do typically have an online dependency for is
               | provisioning credentials from those roles, which is
               | architected to be regionally independent.
        
               | sharpy wrote:
               | Disclaimer: Former AWS engineer, never worked on IAM
               | directly.
               | 
               | AWS is divided into multiple partitions. For the vast
               | majority of users, there is one partition - the regular
               | commercial - other partitions being China, GovCloud, etc.
               | 
               | Within each partition, there is a primary region that
               | needs to be available for creation/mutation of
               | credentials and policies. However, that data is
               | replicated to other regions within the partition. That
               | means the use of credentials that exist does NOT depend
               | on the primary region being available. The replication is
               | something that is closely monitored, and SLA breaches will
               | result in pages.
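
The split described in the comments above (writes go through one primary region, while existing credentials keep working from replicated regional copies) can be illustrated with a toy model. This is not AWS's implementation: the class names, region names, and the synchronous replication are simplifying assumptions made for the example.

```python
class Region:
    """One region holding a local replica of role -> policy data."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.available = True
        self.roles: dict = {}


class Partition:
    """Toy partition: a primary region for writes, replicas for reads."""

    def __init__(self, primary: Region, replicas: list) -> None:
        self.primary = primary
        self.regions = {r.name: r for r in [primary, *replicas]}

    def put_role(self, role: str, policy: str) -> None:
        """Control-plane write: only possible while the primary is up."""
        if not self.primary.available:
            raise RuntimeError("control plane unavailable: primary is down")
        # Simplification: replicate synchronously to every reachable region.
        for region in self.regions.values():
            if region.available:
                region.roles[role] = policy

    def get_credentials(self, region_name: str, role: str) -> str:
        """Data-plane read: served from the local replica only."""
        region = self.regions[region_name]
        if not region.available:
            raise RuntimeError(f"{region_name} is down")
        if role not in region.roles:
            raise KeyError(role)
        return f"creds:{role}@{region_name}"


primary = Region("primary-region")
other = Region("other-region")
partition = Partition(primary, [other])
partition.put_role("app", "read-only")

primary.available = False  # simulate an outage of the primary region
print(partition.get_credentials("other-region", "app"))
```

In this model an outage of the primary region blocks creating or changing roles, but an application in another region that only needs credentials for an existing role is unaffected, which is the distinction the commenters are drawing.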
        
               | t0mas88 wrote:
               | Not really. The parts that can take your app down are
               | distributed.
        
         | [deleted]
        
         | kache_ wrote:
         | If you only knew how bad modern software infrastructure really
         | was
        
           | tyingq wrote:
           | Yeah, for a similar vendor, it's interesting to read this
           | page and look at the diagram:
           | 
           | https://auth0.com/availability-trust
           | 
           | And then read this tweet:
           | 
           | https://twitter.com/auth0/status/1471159935597793290
           | 
           | Edit: Ah, seems they picked us-west-1 and us-west-2 as the
           | two regions... _"In this case, we use two AWS regions:
           | us-west-2 (our primary) and us-west-1 (our failover)."_[1]
           | So bit by a double-region failure.
           | 
           | [1] https://auth0.com/blog/auth0-architecture-running-in-multipl...
        
             | crescentfresh wrote:
             | Now that Okta bought Auth0 what's the developer experience
             | like I wonder? I imagine the infrastructure of the two
             | products are still completely isolated. But is it still a
             | separate product you can use for identity management or are
             | new customers forced to use Okta?
        
               | bouzouk wrote:
               | Auth0 customer: they kept everything isolated (and
               | promised it will stay like that for a while)
        
         | drooby wrote:
         | Looks like about 45 mins
        
           | activitypea wrote:
           | Seconded
        
         | abruzzi wrote:
         | Cisco Duo SSO/MFA was also out this morning during the AWS
         | outage. I guess usable redundancy for these services is a
         | difficult problem.
        
       | sbilstein wrote:
       | rolling your own auth is underrated.
        
       ___________________________________________________________________
       (page generated 2021-12-15 23:01 UTC)