[HN Gopher] Inside the longest Atlassian outage
___________________________________________________________________
Inside the longest Atlassian outage
Author : andyjohnson0
Score : 714 points
Date : 2022-04-13 15:27 UTC (7 hours ago)
(HTM) web link (newsletter.pragmaticengineer.com)
(TXT) w3m dump (newsletter.pragmaticengineer.com)
| hsnewman wrote:
| Sounds like the continuity planners at Atlassian (the fall guys)
| will be looking for a new job.
| bogomipz wrote:
| >"The outage is in its 9th day, having started on Monday, 4th of
| April."
|
| >"It took until Day 9 for executives at the company to
| acknowledge the outage."
|
| Just to put this in perspective. These executives would have left
| on a Friday afternoon to start their weekends without bothering
| to publicly address an ongoing outage that was by then 5 days
| old.
|
| This is mind boggling. Like did some C-level exec say something
| like "Let's just park this whole outage communication discussion
| until Monday, have a good weekend everyone."?
| gunapologist99 wrote:
| Trello seems to still be up?
| R0ger wrote:
| I guess this is a wake-up call for the people rushing to SaaS
| solutions.
| brianwawok wrote:
| Is it?
|
| We use JIRA. Not impacted.
|
| If this had hit us.. we would just switch to excel or something
| for a week/month?
|
| But maybe we are a very light user of JIRA. Nothing in there
| can't be replaced. It's "nice" to be able to go look up a 3
| year old bug and which client reported it, but not really
| crucial for day to day ops.
| ProAm wrote:
| > We use JIRA. Not impacted.
|
| This time.
| bborud wrote:
| "Switch to Excel for a week/month"
|
| Right.
| mrits wrote:
| I wonder why you use Jira if a spreadsheet is sufficient for
| your use case.
| HeyLaughingBoy wrote:
| He didn't say it was sufficient; he said they could do it
| for a short while. I consider myself in the same situation:
| we depend on Jira, but for a week or so it's not a big deal
| to use a bunch of Post-It notes.
| function_seven wrote:
| Same reason I use oil lamps when the power is out, even
| though electric bulbs are my normal lighting.
|
| A spreadsheet may be sufficient, but it's not as good as a
| system designed for development workflows.
|
| (This comment sounds like I have a speck of love for JIRA.
| I don't! :)
| mrits wrote:
| I don't see this as a valid comparison. There is
| information loss. This has happened to my team which had
| about 50 people and it was very chaotic. It took us
| several days to just create the state our features were
| in.
|
| Today it would even be more troublesome as we have a lot
| of integration rules dependent upon the workflow. I'd
| probably just recommend everyone uses a few weeks for
| self improvement and only address critical production
| issues.
| dangus wrote:
| Outages definitely happen with on-premise software.
|
| At some point the logo on the engineer's badge doesn't really
| matter.
| kgeist wrote:
| We use on-premises setups for almost everything (we generally
| avoid cloud solutions to have full control of our data),
| sometimes (approximately once a month) it goes down for a few
| minutes which already feels like a torture because all our
| processes depend on it, I can't imagine having no access to it
| for several weeks, all our work would grind to a halt... The
| office of the guy who administers on-premise servers is literally
| next door, all it takes is to make a visit to him and everything
| works again after 5 minutes. Reading horror stories like this
| (Slack being down, Atlassian being down, no one knows what is
| happening and when it will end etc.), I wonder why many companies
| choose cloud solutions for critical business processes. Is it
| pricing? Ease of use? I can understand why very small companies
| would choose it, but I don't understand why a medium/large
| business would choose anything but an on-premises setup.
| lloydatkinson wrote:
| Cloud solutions can work well. I've used GitHub, Azure Devops,
| and BitBucket (another wonderful atlassian product /s) and
| BitBucket frequently craps out, multiple times a week. We need
| to rerun builds in TeamCity because BitBucket stops talking to
| it.
| Msw242 wrote:
| You're assuming every team would have better uptime with in-
| house solutions
|
| I think many would have worse uptime even with more headcount
| tetha wrote:
| In our experience, this strongly depends on the services
| involved, as well as the scale.
|
| For example, for our own service: If you have a hundred or
| two hundred licenses, you can drop our system on a linux box
| and usually you have to throw a yum update and one or two
| service restarts at it every few months and it just works. I
| honestly wouldn't be surprised if many of our small on-prem
| solutions have better uptime than the SaaS clusters, or be
| capped in uptime by some externality, rendering the system
| downtime irrelevant. If their VMWare cluster is down, our
| system is down, but no one cares.
|
| This also mirrors a lot of our internal systems. At a small
| scale, you can just dump chef, jenkins, sonar, nexus,
| whatever on a linux box and forget about it.
|
| However, this changes with high license counts. We have
| singular customers in our SaaS offering that are more than 50
| - 100x bigger than the small on prem systems. At that point,
| our SaaS offering is better than anything the customer could
| do on-prem. I'm confident to say this about all of our
| customers, except maybe 2.
| bob1029 wrote:
| I find this argument to be totally bs these days.
|
| If anything, a smaller company with smaller footprint and
| fewer total requirements is going to be more likely to manage
| a vertical slice of some SAAS product.
|
| The reason things like github go down so often is _because_
| they are public/shared resources.
| kgeist wrote:
| >The reason things like github go down so often is because
| they are public/shared resources.
|
| Very much this. Managing shared resources at scale is
| pretty hard. We have a bunch of internal sites made by
| interns as part of their internships, and, funny enough,
| those sites have much greater uptime and appear more stable
| than our own multi-tenant SaaS solution made by seasoned
| devs.
| kgeist wrote:
| I've heard this argument many times before, but is there
| research into this? I.e. where they would compare uptime of
| cloud vs. on-premises across a wide range of companies.
| rirze wrote:
| I mean, you're going to get biased results, no? Only
| companies who are confident in self-hosting will self-host
| it. You won't have any real data about companies who are
| not confident in self-hosting maintaining their on-premises
| version of the software.
| NationalPark wrote:
| What do you do if the on-premises guy gets hit by a car and
| isn't in his office?
| kgeist wrote:
| There's IIRC 3 or 4 people in their department, they
| administer the whole building (wifi, security cams, LDAP,
| etc.), not only the on-premises servers. From what I
| gathered, our internal systems usually go down due to lack of
| disk space or some bug in the software which requires merely
| a reboot, it's not rocket science. Another thing is that our
| IT department (for internal systems) and the SRE department
| (for client-facing systems) have 24/7 on-call duty so it's
| unlikely that no one will respond.
| oriki wrote:
| The same thing that the cloud company would do. If there are
| other people there who share that guy's responsibilities,
| have them do it. If there aren't, you should have an on-call.
|
| Cloud just outsources that problem to another business. Sure,
| they have better reasons to actually cover those positions
| and make sure they have on-calls and backup and a disaster
| plan, but just because you pay extra money for it doesn't
| actually make it work better if the company underlying it
| sucks.
| snark42 wrote:
| > but I don't understand why a medium/large business would
| choose anything but an on-premises setup.
|
| Atlassian is in the process of killing the on-premise
| small/medium business option, already announced an EOL date.
|
| Move to the cloud, buy a 500+ user solution for a much higher
| price or migrate away are my choices. Of course I use the local
| database and have local services JIRA/Confluence talk to so
| it's not really an option to move to the cloud.
|
| I assume lack of competent on-site staff 24/7, having someone
| else to blame as well as lower costs are why people choose the
| cloud over on-premise though.
| originalvichy wrote:
| I am biased but I can tell you what works best for mid-large
| companies: having a solution provider. Basically a partner that
| hosts and maintains the instance and has enough Atlassian
| certified people to help you with any question so that you will
| never have to hire people to just maintain the beasts or tell
| you about features, tricks or plugins that could solve problem
| X.
|
| Experienced people hosting and tuning Atlassian products have a
| greater success rate than someone doing it alone for a large
| company. Almost every time I've migrated an old Atlassian
| installation under our wing, it has shocked me how users have
| been made to suffer the loading times and poor performance that
| come from underprovisioning (db or actual machine) and messy
| configuration. I'm not blaming the former admins but it just
| happens. Usually end users are happy after we clean the mess up
| and everything feels snappy.
|
| Disclosure: I've worked in this kind of expert role.
| crummy wrote:
| I can't see the difference between a "solution provider" that
| hosts your Jira and just getting Atlassian to do it. What's
| stopping the solution provider from accidentally running a
| script that deletes some customer's files and struggling to
| do a partial backup restore?
| snark42 wrote:
| I would assume the MSP is running a dedicated instance and
| can do a full/backup restore just for the user they're
| supporting.
|
| If it's some multi-tenant solution it's no better.
| originalvichy wrote:
| Correct. There are probably not a lot of MSPs that have
| so many customers that they need to share that much data,
| and their customers probably use MSPs for the strict
| purpose that they don't want to share things with other
| companies.
| originalvichy wrote:
| Because you can get the best parts of self-hosted and
| managed services. And on that backup question: self-hosted
| Atlassian is vastly easier to protect against disasters.
| The problem these Atlassian guys had arose from multi-
| tenant architecture. Usually managed service providers will
| host your stack on individual databases and VMs, and
| backing up the software is just a matter of taking pg_dumps
| and rsyncing certain directories (pretty old school) or
| just taking disk level snapshots.
|
| Many medium-large corporations have their own cloud
| environments that their IT Ops control. Solution providers
| can host Atlassian stacks on their own cloud environment
| where they are not affected by data privacy concerns (it's
| in their already green-lit cloud provider's data center) so
| they can host it behind a firewall with only VPN access
| allowed. They can also do all the magic you can usually do
| with web software like put a frontend proxy in front of it,
| or use more flexible/legacy authentication methods. Not to
| mention that for example you could have a Jira Cloud that
| you would need to integrate with a SCM program. Jira data
| could be "OK" to live in the cloud but code would be a big
| no-no. These problems can be solved by having them all live
| behind the firewall.
|
| A competent managed solution provider also has consultants
| that can train or instruct on usage. It costs but it is
| simpler and faster than having to go through the forums or
| send a support ticket for every small issue to Atlassian
| itself.
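The "pg_dumps and rsyncing certain directories" approach mentioned above can be sketched roughly as follows. This is only an illustration, not any provider's actual tooling: the database name, directory paths, and remote target are hypothetical, and the function only builds the command lines unless dry_run is turned off.

```python
import subprocess
from datetime import date
from pathlib import PurePosixPath


def pg_dump_command(db_name: str, backup_dir: PurePosixPath, day: date) -> list[str]:
    """Build a pg_dump call writing a custom-format archive (restorable with pg_restore)."""
    dump_file = backup_dir / f"{db_name}-{day.isoformat()}.dump"
    return ["pg_dump", "--format=custom", f"--file={dump_file}", db_name]


def rsync_command(source_dir: PurePosixPath, target: str) -> list[str]:
    """Build an rsync call copying a directory in archive mode."""
    # Trailing slash: copy the directory's contents, not the directory itself.
    return ["rsync", "--archive", f"{source_dir}/", target]


def run_backup(db_name, backup_dir, data_dir, remote_target, day=None, dry_run=True):
    """Return the backup commands; execute them only when dry_run is False."""
    day = day or date.today()
    commands = [
        pg_dump_command(db_name, backup_dir, day),  # database contents
        rsync_command(data_dir, remote_target),     # attachments and other files
    ]
    if not dry_run:
        for command in commands:
            subprocess.run(command, check=True)     # stop on the first failure
    return commands


if __name__ == "__main__":
    # Hypothetical names throughout; adjust to the real installation.
    for cmd in run_backup(
        "jiradb",
        PurePosixPath("/backups"),
        PurePosixPath("/var/jira/data"),
        "backup-host:/srv/jira/",
    ):
        print(" ".join(cmd))
```

A dedicated instance per customer is what makes this simple: one database and one data directory per customer means a restore touches nobody else.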
| pphysch wrote:
| It seems like if you are going to pay for a bunch of SaaS
| seats AND a team of technicians/engineers to make it work,
| you might as well just do the latter and roll your own
| solutions...
|
| A lot of these SaaS are just glorified Rails apps with a
| patina of professional "security" and "reliability", and
| loads of extra junk that your co will never use.
| originalvichy wrote:
| Trust me, if someone could clone Jira and its functionality
| they would have done so already. Truth is that if you build
| one product for 20 years you have a giant lead in features.
| If all it took was having a Kanban board then Jira would
| have died years ago.
| yibg wrote:
| What do you do if your on prem setup lost data? There is an
| implicit assumption here that on prem is more reliable than
| cloud. Less downtime, less chances of data loss etc. Obviously
| it depends on which cloud product we're talking about but I
| don't think a blanket "my on prem goes down less and when it
| does go down I can get it back up sooner" is true.
| originalvichy wrote:
| I think for that question we also have to define on-prem just
| to be clear. To many on-prem means "own cloud subscription".
| kingofpandora wrote:
| Engineering mistakes happen.
|
| The most inexcusable thing is not communicating with the paying
| customers who have been affected for over a week.
|
| Atlassian's Global Head of Customer Success probably should have
| been fired but here she is promoting Atlassian Cloud on LinkedIn
| three days ago:
| https://www.linkedin.com/mwlite/in/gertie-rizzo-5b70061
|
| Actually reading a bit more, it seems like their customer team
| was partying in Las Vegas instead of taking care of business:
| https://www.linkedin.com/mwlite/feed/hashtag/atlassianteam22
|
| Priorities.
| naoqj wrote:
| _sword wrote:
| Fair criticisms on response times but regarding Vegas, it was
| their annual user conference last week in Vegas.
| madmulita wrote:
| They claim they test backups quarterly yet they don't have a
| procedure in place to restore the operation. We all know your
| backup is not tested until you restored everything
| successfully. This is not an engineering mistake, it is a flat
| out lie.
| iancarroll wrote:
| Well, their explanation makes sense. These are multi-tenant
| environments where not every tenant was affected; sensibly,
| the backups appear divided by environment, not tenant. You
| can't blindly revert to an environment's last backup in this
| scenario, although you'd think they would have done it
| before.
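The point about environment-level backups can be illustrated with a small sketch. It assumes, purely for illustration, that an environment is a mapping from tenant ID to that tenant's data; the names are hypothetical and this is not how Atlassian's systems are known to work. Reverting the whole environment would roll back unaffected tenants too, so only the deleted tenants are copied out of the backup.

```python
def restore_deleted_tenants(live: dict, backup: dict, deleted_ids: set) -> dict:
    """Return a new environment where only the deleted tenants come from the backup."""
    restored = dict(live)  # unaffected tenants keep their current (newer) data
    for tenant_id in deleted_ids:
        if tenant_id not in backup:
            raise KeyError(f"tenant {tenant_id!r} is not in this backup")
        restored[tenant_id] = backup[tenant_id]
    return restored


if __name__ == "__main__":
    backup = {"acme": "tickets as of Sunday", "globex": "tickets as of Sunday"}
    live = {"globex": "tickets as of Tuesday"}  # "acme" was deleted by mistake
    print(restore_deleted_tenants(live, backup, {"acme"}))
```

Even in this toy form the selective path needs per-tenant bookkeeping that a blind revert does not, which is consistent with the slow restore times described.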
| julesallen wrote:
| No argument on the crappy comms.
|
| If I was in customer success at an enterprise vendor I doubt
| I'd be let anywhere near the tools to get this back up and
| running. These guys are generally in the way rather than
| helping in a situation like this.
|
| Head of engineering or some product rather than customer
| support? That might be a different outcome.
| iamtheworstdev wrote:
| Can confirm. Saw them there while I was on vacation.
| benreesman wrote:
| Jesus, if there was ever an example of the internet making
| the world smaller.
|
| When do execs living it up at the fucking Wynn Encore while
| the house burns down start to not get another job?
|
| They'll keep pulling this shit until it costs money.
| benreesman wrote:
| For clarity: I went through a period where some combination
| of self-indulgence and legitimate life crisis caused me to
| take my eye off the ball when it mattered.
|
| I'm still trying to kickstart a second act years later,
| because I'm trailer trash and it's hard work when you're
| that.
| syshum wrote:
| Sales never takes the blame. If anyone is fired, it will be
| scapegoats in engineering; once they have busted their ass to
| restore service, their reward will be the door.
| systemvoltage wrote:
| This is an engineering problem. They should own it and
| improve things, make sure it doesn't happen again.
|
| Also, GP's quote
|
| > Engineering mistakes happen.
|
| I don't like this statement because it offers consolation at
| the expense of unintentional normalization.
| nix23 wrote:
| Ever heard of Space Shuttle Challenger? You can't own it if
| your management is against it.
| syshum wrote:
| The deletion of customer data was an engineering mistake; that
| is not what I was talking about.
|
| The negative fallout was not due to the deletion of customer
| data. As the story and multiple customers have stated, the
| negative fallout was the SILENCE, which is Sales / Customer
| Service, not engineering.
|
| As the comment I was replying to noted, while engineering was
| trying to recover from what might possibly be the biggest
| outage in the history of the company, Sales was partying and
| not handling customer communications.
|
| That (the failure to communicate with customers) should be a
| resume-generating event for all customer service / sales
| leadership. It will not be, however, because sales will simply
| redirect their failure onto engineering in the exact same
| manner you just have.
| buscoquadnary wrote:
| And coders that say all code has bugs are just defeatists
| that are trying to make excuses for being lazy.
|
| Sometimes manure will always hit the fan. Being robust
| means being able to handle that.
| jacksnipe wrote:
| I think this is obviously incorrect.
|
| Human error is probabilistic, and the probability of
| making an error cannot be zero.
|
| On the flip side, it's infeasible to use only provably
| correct systems; not lazy, but literally not a practical
| option due to compute costs, developer time, what formal
| techniques can even be applied to the problem at hand,
| etc...
| systemvoltage wrote:
| A culture where mistakes are taken too seriously or too
| lightly leads to problems. Also it depends on what stage
| of the product cycle (Innovation/Rapid Development vs.
| Robustness/Quality). I'd argue that Atlassian products
| should err towards robustness and high quality. Not
| trying to break any new ground.
| xyst wrote:
| Atlassian about to dip over the next few years as firms around
| the world slowly remove themselves from their ecosystem of
| products.
| jlawer wrote:
| Not to mention a CEO who is more interested in activities
| outside the company like the green energy transition and
| politics.
|
| As an Aussie I always wanted Atlassian to succeed as we have so
| few tech companies at that scale or larger. Now I view them as
| another Oracle. Now they innovate little, they keep ratcheting
| up prices, pushing deployments to cloud where they make more
| money. Nickel and dime you for what should be core features
| (SAML Auth?). They aren't coming up with anything new to keep
| the value in the ecosystem. They buy applications in, spend a
| little to make some cross integration and then drop down to a
| slower development cadence.
| ekanes wrote:
| Right. Feels similar in a way to an ongoing conflict
| elsewhere... There is what happens now, and what happens over
| the next decade because people have lost fundamental trust in
| you.
| tpmx wrote:
| Their core customers are unfortunately just as dysfunctional
| and slow-learning. Think Boeing, etc. Witness:
|
| https://jobs.boeing.com/job/annapolis-junction/jira-administ...
| xiaodai wrote:
| Comes across as a jerk. How can an outsider say things with such
| certainty?
| bluedino wrote:
| Regarding the backup restores:
|
| I once worked at a company that had a data loss issue. There was
| nothing else we could do, we had exhausted every option we had
| over almost 40 hours. At the end of the second day, it was
| decided to restore from backup.
|
| We had done this before, as a test. It took about 12 hours to
| restore the data and another 12 hours to import the data and get
| back up and running.
|
| One small thing was different this time, and it had huge
| consequences. As a cost-saving measure, an engineer had changed
| the location of our backups to the cold-storage tier offered by
| our cloud provider. All backups, not just 'old' ones.
|
| This added 2 additional days to our recovery time, for a total of
| five days. Interestingly enough, even though we offered a full
| month's refund to all of our customers, not even half of them
| took us up on it.
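The recovery timeline in the comment above can be added up as a quick check. This sketch assumes the phases ran back to back and reads "almost 40 hours" as 40; both are simplifications of the approximate figures given.

```python
HOURS_PER_DAY = 24

# Figures taken from the comment; all approximate.
phases_hours = {
    "troubleshooting before deciding to restore": 40,
    "retrieving backups from cold storage": 2 * HOURS_PER_DAY,
    "restoring the data": 12,
    "importing the data": 12,
}

total_hours = sum(phases_hours.values())
total_days = total_hours / HOURS_PER_DAY

if __name__ == "__main__":
    for phase, hours in phases_hours.items():
        print(f"{phase}: {hours} h")
    print(f"total: {total_hours} h, about {total_days:.1f} days")
```

The sum lands a little under the "total of five days" quoted, and the cold-storage retrieval alone accounts for over 40% of it.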
| bombcar wrote:
| In these cases the best thing to do is just give every customer
| the full month refund; don't make them ask for it.
| sodality2 wrote:
| The best thing to do business-wise, or as a good faith move?
| rjmunro wrote:
| What's the difference?
| treesknees wrote:
| Good faith would be to lose all of that money to people
| who are already your customers.
|
| Business-wise would be to stay in their good graces and
| keep those customers by offering the refund, but you
| don't lose any money to those who either don't care or
| won't move to a competitor.
| function_seven wrote:
| 25 years ago the clutch in my beater truck was slipping.
| I was 16 years old, making $50 a _week_ and had very
| little in savings. I took that truck to a shop within
| walking distance of my job.
|
| 2 hours later I walked back to see what they found. I
| figured it would be several hundred dollars for a new
| clutch, and I'd have to borrow money or something to get
| it done. I talked to the owner who told me it was an
| adjustment on the cable. Just needed to be scootched up a
| bit and it was probably good for another 30k miles.
|
| When I asked him how much I owed, he laughed at me and
| said, "For that? Not worth writing it up. No charge. You
| want me to show you how to do it yourself next time?"
|
| The shop could very easily have charged me 1 hour of
| labor at their standard rate, maybe $75 or so. Plus a
| diagnostic or test drive fee. Whatever. He could have
| told me, "$123.98" and I would have paid it. I wouldn't
| even have been mad. But I sure as hell wouldn't have
| remembered the experience so clearly. Nor would I have
| told a dozen people over the years to take their cars
| there. And I definitely would not have driven 20 miles
| out of my way to return to that shop in the future years.
|
| Being cynical about this stuff will hurt your brand. It's
| not obvious. It doesn't show up on the earnings report as
| a line item. This is service segmentation that seems like
| a no-brainer to a clueless MBA, but actually matters in
| the long run. How people view your brand is immensely
| important.
|
| Not forcing customers you already screwed over to then
| spend more time chasing a refund is not only the right
| thing to do, it's also good business.
| heisenbit wrote:
| Reducing the impact analysis within a long running
| relationship to a single transaction is too narrow.
| People observe how other people are treated and draw
| their conclusions even if not impacted. People may
| tolerate some abuse but it moves them closer to leaving
| next time. Money lost in the outage may provide for a
| budget creation to look for an alternative.
| rgj wrote:
| A lot of people making those decisions don't care about a
| refund because it's other people's money anyway. In my
| experience only small companies care about that.
|
| Focussing on communicating openly and honestly allows them
| to explain the crap they're going through because of your
| mistakes to their bosses, so in fact you can help them
| save their asses, and they'll save your ass in return.
| This is much more important and valuable than a refund.
|
| So you should ALWAYS communicate openly and honestly, and
| offer the refund as an option for clients who do not have
| a boss to account to.
| bzxcvbn wrote:
| Not every business can afford to go one month without income.
| What's the best thing for customers? Have the business go
| bankrupt and irremediably lose access to the service?
| function_seven wrote:
| It's 400 clients, not all their user base. They can handle
| the lost income from a small slice of their customers for
| one month.
|
| And if they can't sustain that, then it's even _more_
| imperative that those customers migrate away.
| enra wrote:
| Atlassian had almost a billion in free cashflow last year
| and over a billion in cash. I think they should cover the
| whole year for these customers.
| miketria wrote:
| Hi, I'm Mike and I work in Engineering at Atlassian. Here's our
| approach to backup and data management:
| https://www.atlassian.com/trust/security/data-management - we
| certainly have the backups and have a restore process that we
| keep to. However, this incident stressed our ability to do this
| at scale, which has led to the very long times to restore.
| sizzle wrote:
| How's the atmosphere internally Mike? Must be crazy times
| there. I know this isn't your fault, so hang in there.
| Cheers!
| encryptluks2 wrote:
| You mean your poor practices and bad design. The only way to
| prevent this type of issue in the future is to admit the
| failures.
| farseer wrote:
| They have recently killed off on premise offerings, it's cloud
| only now. And this makes it harder to trust both the security and
| integrity of your data.
| ocdtrekkie wrote:
| The fact that a single bad script could delete 400 of their
| customers should be absolute proof they do not have the
| processes in place to be a steward of your data in the cloud.
| On-prem or bust.
| dangus wrote:
| On-premise just means that your overworked IT person is going
| to spend 5% of their time keeping your service maintained, at
| no point gaining any more than baseline familiarity with the
| product.
|
| On-premise isn't a magic pill guaranteeing 100% uptime and 0
| data loss.
|
| While on-premise may be a good choice in many cases, it's not
| like running on-premise business tools has no risk associated
| with that choice.
|
| Remember that the goal of a company is to sell the most
| product possible (output) with the lowest cost possible
| (input).
|
| Any Joe off the street starting their own business can pay
| Atlassian $0/month for up to 10 users. On-prem doesn't
| compete with that.
| dzikimarian wrote:
| On Prem means you have control over spending. I calculated
| that if we moved to the cloud, we would pay YEARLY as
| much as we spent on Atlassian licenses in the last 5 years.
| That easily pays for the maintenance overhead on our devops
| team.
| [deleted]
| hrpnk wrote:
| Afaik, the Data Center option still allows for on-premise
| deployment, incl. Kubernetes and cloud deployments [1, 2, 3].
|
| [1] https://www.atlassian.com/enterprise/data-center
|
| [2] https://confluence.atlassian.com/enterprise/jira-data-
| center...
|
| [3] https://confluence.atlassian.com/enterprise/deploying-
| enterp...
| jmondi wrote:
| What blows my mind is that Atlassian stock has barely taken a
| hit...
| ferdowsi wrote:
| The market will react at the next earnings report, not now. And
| only if customers start to bail.
| NineStarPoint wrote:
| Unless their revenue takes a long term hit over the outage, no
| reason for the stock market to care. There isn't news of people
| actually planning to stop using Atlassian products over this.
| The only direct consequence is going to be the one time payment
| of SLA credits. So I guess the part I find surprising is how
| little impact this looks like it will have on people using
| their products more so than I am that the stock market doesn't
| care much about this.
| mountainriver wrote:
| Yeah Atlassian is a corporate leech, you don't get away that
| easy
| pigtailgirl wrote:
| the stock is on a rally today - just goes to show - the market
| is full of surprises
| capableweb wrote:
| It's been a long time since individual stocks represented anything
| grounded in reality. People talk about "fundamentals" and so
| on, but that's not what the price is based on. I don't think
| anyone knows why the prices move as they do anymore, as there
| are so many algorithms involved today, both manual and
| automatic ones.
| __app_dev__ wrote:
| Yeah, I placed a Put option order yesterday. By end of day I was
| up over 50% and now down to 50% of what I originally purchased
| the Put at because it went up 5% today.
|
| Oh well, better luck next time.
| devmunchies wrote:
| it was at $317 on the day of the outage and now at $278.5. A
| ~12% drop. You're right, not much of a drop for such a large
| outage.
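The "~12% drop" quoted above can be checked directly from the two prices in the comment. This only reproduces the arithmetic; it says nothing about what caused the move.

```python
def percent_drop(start_price: float, end_price: float) -> float:
    """Percentage decline from start_price to end_price."""
    return (start_price - end_price) / start_price * 100


# Prices as quoted in the comment.
drop = percent_drop(317.0, 278.5)

if __name__ == "__main__":
    print(f"drop: {drop:.1f}%")
```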
| [deleted]
| __app_dev__ wrote:
| The outage did not impact the stock, most major tech stocks
| have taken a large hit in the past week and a half (until
| today).
|
| This event is not even showing on any financial news site. I'm
| still hoping it does and the stock goes down because I placed
| an option order yesterday betting that it goes down by next
| Friday. Seems like it won't now but the risk was worth taking
| in my book.
| mdoms wrote:
| Title is a bit misleading, there's no insider info here. This is
| all stuff we knew from the official statements, the blog post,
| reddit and twitter.
| rmbyrro wrote:
| Are Confluence pages and Jira tickets built like a GPT-3 300
| Terabyte model?
|
| I mean, I thought they were text.
|
| 5 days to restore text?
|
| They must be generated by a huge complex deep learning voodoo.
|
| Atlassian is working on the bleeding edge of technology. This
| outage is understandable...
| er4hn wrote:
| I suppose if they recover what they can and restore the rest
| using GPT-3 that may make the process easier.
| katbyte wrote:
| images and other files can be attached to issues or embedded in
| pages so a single instance can use a lot of storage.
| napolux wrote:
| Yeah, let's centralize the Internet (born decentralized). This is
| what the Internet has become.
| Crabber wrote:
| How do we solve this problem? In other industries based on
| physical products there is a big incentive to buy goods as
| locally as possible because of reduced shipping costs, shorter
| shipping time, no import taxes etc.
|
| But with software it costs nothing to spin up new instances,
| costs nothing to deliver half way across the world, and has no
| delivery time. How can you convince a manager to use a software
| solution provided by a local company when a company in a
| completely different country 600 miles away offers similar
| software with 5 extra features?
|
| It seems like the internet is now perfectly set up to create,
| for each software type, a single company that has a global
| monopoly.
| barneygale wrote:
| That's OK in principle, as long as those companies function
| like governments (i.e. they work to improve things rather
| than turn a profit, subject to constitutions, public voting,
| judicial review). As engineers we should embrace the
| efficiency of scale, but it's quite clear that it can't work
| under capitalism.
| Alex3917 wrote:
| A few years ago we didn't renew our subscription on time because
| we got the email over Christmas break, and iirc they deleted all
| of our data in less than two weeks. They were eventually able to
| manually restore it from backups, but they restored it
| incorrectly so there was a bunch of stuff broken. This whole
| thing isn't even remotely surprising to me.
| a2800276 wrote:
| You can sleep soundly: it seems like they back _everything_ up:
|
| > Second, the script we used provided both the "mark for
| deletion" capability ... (where recoverability is desirable),
| and the "permanently delete" capability that is required to
| permanently remove data _when required for compliance reasons_.
| The script was executed with the wrong execution mode and the
| wrong list of IDs. The result was that sites for approximately
| 400 customers were improperly deleted.
|
| > To recover from this incident, our global engineering team
| has implemented a methodical process for restoring our
| impacted customers.
|
| [https://www.atlassian.com/engineering/april-2022-outage-
| upda...]
|
| Anyone else find it disturbing that they are able to restore
| data that they deleted permanently for "compliance" reasons? If
| this is true, how were they ever compliant? I guess data is
| only permanently deleted when the engineering team is following
| their typical, non-methodical process...
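The quoted statement describes one script with two modes, run with the wrong mode and the wrong list of IDs. A minimal sketch of how such a script could guard against both mistakes is below. Everything here is hypothetical (the names, the in-memory site store, the guards) and it is not a reconstruction of Atlassian's script.

```python
from enum import Enum


class DeleteMode(Enum):
    MARK = "mark for deletion"        # recoverable
    PERMANENT = "permanently delete"  # gone from the live store


def delete_sites(sites, ids, mode, *, expected_count, confirm_permanent=False):
    """Apply a deletion mode to the given site IDs, refusing suspicious requests."""
    ids = list(ids)
    # Guard against the wrong list: the caller states how many IDs they expect.
    if len(ids) != expected_count:
        raise ValueError(f"expected {expected_count} IDs, got {len(ids)}")
    unknown = [site_id for site_id in ids if site_id not in sites]
    if unknown:
        raise KeyError(f"unknown site IDs: {unknown}")
    # Guard against the wrong mode: permanent deletion needs a second opt-in.
    if mode is DeleteMode.PERMANENT and not confirm_permanent:
        raise PermissionError("permanent deletion needs confirm_permanent=True")
    for site_id in ids:
        if mode is DeleteMode.MARK:
            sites[site_id] = "marked"
        else:
            del sites[site_id]
    return ids


if __name__ == "__main__":
    sites = {"site-a": "active", "site-b": "active"}
    delete_sites(sites, ["site-a"], DeleteMode.MARK, expected_count=1)
    print(sites)
```

Making the destructive mode require a separate, explicit argument means a mistyped mode fails loudly instead of deleting data.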
| notatoad wrote:
| No, I don't think that's disturbing. That's the point of
| backups - even when something is permanently and completely
| erased in the production database, it's still in the backup.
| Eventually it will get rotated out as the backups expire.
|
| Going back and purging things from the backups as part of the
| delete process would be overdoing it to a ridiculous degree.
| Delitio wrote:
| Nope, it's not ridiculous. If you are only allowed to store
| data for x months, that's it.
|
| It's your job to use techniques which allow you to do this,
| like encrypting your backups and deleting the keys for them,
| for example.
| usefulcat wrote:
| > Going back and purging things from the backups as part of
| the delete process would be overdoing it to a ridiculous
| degree.
|
| Also, modifying backups is a great way to inadvertently
| hose your backups.
| yebyen wrote:
| I think that depends on what you mean by compliance. Some
| regulations require you to irreversibly destroy data when
| they prescribe the destruction of that data.
|
| That can mean as much as "you have to encrypt everything
| with a separate key, so that you can destroy the key for
| the given (say, personally identifiable) dataset, making
| the data irrecoverable"
|
| I'm not saying that's the particular compliance reason they
| had here, or that the analysis you're giving is wrong,
| either. There is an interpretation where either of these
| ideas could be the correct one.
| a2800276 wrote:
| "permanently delete" strongly suggests to me that it was
| the "medical and financial data" kind of compliance. If
| data can be restored, it's not permanently deleted. But
| this was a statement from the CEO, so words can have
| arbitrary meaning :)
| notatoad wrote:
| "permanently delete" does not mean the same thing as
| "immediately delete". deleting from the live database is
| the first step of a permanent deletion, as long as the
| data exists somewhere the deletion process is still in-
| progress.
|
| there's a whole lot of people in here who are way too
| quick to assume that just because one part of a permanent
| deletion process was inadvertently triggered and then
| caught while they still had backups, their whole
| permanent deletion process is a lie.
| voxic11 wrote:
| https://ico.org.uk/for-organisations/guide-to-data-
| protectio...
|
| You seem to be right-ish: while the GDPR in certain
| circumstances allows you to keep backups of data that
| should have been deleted, it seems like they are trying to
| discourage it in the future.
|
| > ...It is, however, important to note that where data
| put beyond use is still held it might need to be provided
| in response to a court order. Therefore data controllers
| should work towards technical solutions to prevent
| deletion problems recurring in the future.
| jacobsenscott wrote:
| A better way to do this sort of thing is not an actual
| "delete", but a "cryptographic delete". The data should
| be encrypted, and you just delete the key. The data is
| then unrecoverable everywhere, including backups. Of
| course you probably don't want to just nuke the key, but
| disable it for some period of time, and then nuke it.
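A minimal sketch of the "cryptographic delete" idea described above, assuming one key per tenant and a disable-then-destroy lifecycle. The `KeyStore` class and its method names are hypothetical, and the XOR keystream is a toy stand-in for a real authenticated cipher; it only illustrates that ciphertext in backups becomes unreadable once the key is gone.

```python
import hashlib
import secrets


def _keystream_xor(key: bytes, data: bytes) -> bytes:
    # Toy cipher for illustration only: XOR the data with a SHAKE-256
    # keystream derived from the key. Not suitable for production use.
    stream = hashlib.shake_256(key).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


class KeyStore:
    """Hypothetical per-tenant key store with a disable-then-destroy lifecycle."""

    def __init__(self):
        self._keys = {}         # tenant -> key bytes
        self._disabled = set()  # tenants whose key is temporarily unusable

    def create(self, tenant):
        self._keys[tenant] = secrets.token_bytes(32)

    def disable(self, tenant):
        self._disabled.add(tenant)      # reversible: grace period starts

    def enable(self, tenant):
        self._disabled.discard(tenant)  # undo a mistaken deletion request

    def destroy(self, tenant):
        self._keys.pop(tenant, None)    # irreversible: the key is gone
        self._disabled.discard(tenant)

    def get(self, tenant):
        if tenant in self._disabled or tenant not in self._keys:
            raise KeyError(f"no usable key for {tenant}")
        return self._keys[tenant]


def encrypt(store, tenant, plaintext):
    return _keystream_xor(store.get(tenant), plaintext)


def decrypt(store, tenant, ciphertext):
    return _keystream_xor(store.get(tenant), ciphertext)


store = KeyStore()
store.create("acme")
# The same ciphertext sits in production and in every backup.
backup_copy = encrypt(store, "acme", b"ticket history")

store.disable("acme")   # deletion requested: key unusable, not yet destroyed
store.enable("acme")    # request turned out to be a mistake: undo it
recovered = decrypt(store, "acme", backup_copy)

store.destroy("acme")   # after this, no copy of the data can be decrypted
```

The grace period between `disable` and `destroy` is what gives room to catch a mistaken deletion without keeping plaintext around.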
| nindalf wrote:
| This is why regulations specify that data must be
| destroyed within a time period, typically 90 days. It
| gives enough time for backups to rotate out.
|
| If this weren't a concern, regulations would demand
| immediate deletion of data.
| deepspace wrote:
| I asked the same question yesterday, and the responses were
| food for thought.
|
| If you make backups, you are, almost by definition, unable to
| perform a full 'Compliance Delete' before the oldest backup
| in the set has expired.
|
| Compliance-based deletion, if it is offered as a service, is
| almost always something time-based, like "we guarantee the
| data will be deleted 7 years from now". And then that
| deliberate deletion step is baked into the backup process.
|
| So, i.m.o. at best they misrepresented the nature of the
| compliance deletion process. It never did what it was
| designed to do.
| luhn wrote:
| It's generally recognized that deleting data from a backup
| would violate the integrity of the backup, so allowances are
| made. Usually you have to make sure the data is deleted as
| part of the restore process. For example, from CCPA:
|
| > If a business stores any personal information on archived
| or backup systems, it may delay compliance with the
| consumer's request to delete, with respect to data stored on
| the archived or backup system, until the archived or backup
| system relating to that data is restored to an active system
| or next accessed or used for a sale, disclosure, or
| commercial purpose.
| jacquesm wrote:
| Did you continue as their customer after that?
| Alex3917 wrote:
| Nope. I exported our data after they restored the backup and
| then we cancelled less than a month later. Like I obviously
| understand suspending our logins, but why would you ever
| delete someone's data when it's literally only 160 KB of
| text? The whole thing made zero sense.
| Kwpolska wrote:
| > why would you ever delete someone's data when it's
| literally only 160 KB of text?
|
| Compliance? The contract has expired, so there's no legal
| basis for them to keep your data?
| usefulcat wrote:
| Seems like that could be addressed with some fine print
| in the initial agreements. "In the event that you stop
| paying us, we may keep your data for up to N days unless
| directed otherwise by you"--or similar.
| bzxcvbn wrote:
| Why would they bother?
| herpderperator wrote:
| I don't think people write code saying "if accountSize <
| 160kB { skipDelete() }" - THAT would make zero sense. So,
| the size is not relevant here. The process was likely to
| delete data after some event occurred, or lack of event
| occurred.
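The point above, that deletion is triggered by an event plus a clock rather than by account size, can be sketched as follows. The 14-day `RETENTION` window and the account names are made-up values for illustration; nothing here reflects any vendor's actual policy.

```python
from datetime import datetime, timedelta

# Hypothetical grace period after a subscription lapses.
RETENTION = timedelta(days=14)


def accounts_to_purge(expired_at_by_account, now):
    """Return ids of accounts whose subscription lapsed more than RETENTION ago.

    Size never enters the decision: only the lapse event and elapsed time do.
    """
    return sorted(
        account_id
        for account_id, expired_at in expired_at_by_account.items()
        if expired_at is not None and now - expired_at > RETENTION
    )


now = datetime(2022, 1, 10)
lapsed = {
    "tiny-160kb-site": datetime(2021, 12, 20),  # lapsed 21 days ago
    "huge-site": datetime(2022, 1, 5),          # lapsed 5 days ago
    "paying-site": None,                        # still active
}
purge = accounts_to_purge(lapsed, now)
```

With these inputs only the tiny site is selected, because it is the only one past the window.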
| hinkley wrote:
| Someone somewhere got a promotion sooner because they
| lowered the slope of a line a little bit.
| hallway_monitor wrote:
| Or some overzealous engineer said hey guys let's delete
| all data 7 days after an account is canceled. This is
| called over optimizing.
| dangrossman wrote:
| Such a decision is just as likely to have come from the
| legal/compliance team as an engineer. Data you no longer
| have clear consent or a legitimate business need to store
| is a liability, and if you operate in Europe, potentially
| illegal to continue storing.
| nemo1618 wrote:
| After I met my now-fiancee on OkCupid, I deactivated my
| profile, turned off notifications and forgot about it for a
| while. A while later, I thought it'd be nice to revisit the
| first messages we sent to each other, only to find that...
| OkCupid had deleted both of our accounts. They didn't give
| me any advance warning, either, because I turned off
| notifications, remember? :^)
|
| I'm still kinda salty about it. I understand why big
| services can't retain data indefinitely, but like... it's
| just a few KB of text, and that text happens to have a lot
| of sentimental value. Besides, OkCupid _knows_ that I
| deactivated my account _because I am a success story_ --
| why not hold onto those profiles a bit longer? Or better
| yet, how about emailing an archive of those messages
| immediately when you click the "I'm leaving because I'm in
| a happy relationship now" button? /rant
| dangrossman wrote:
| With GDPR, privacy regulations and data breach
| regulations sweeping the globe, holding onto unnecessary
| data is a huge liability. Getting rid of data you no
| longer have clear consent to store, or which you're
| unlikely to have a clear business need to continue
| storing, is a sign of a good company these days.
| jacquesm wrote:
| True, but likely not _this_ kind of data.
| callalex wrote:
| Yes, this kind of data. Your OkCupid account has all
| kinds of information about who you associate with.
| cto_of_antifa wrote:
| [deleted]
| RomanPushkin wrote:
| > I've never seen a product outage last this long
|
| Title should be "Inside the longest outage of all time", without
| the word "Atlassian" in it
| k8sToGo wrote:
| If I remember correctly, many years ago PSN was down for
| months.
| nemothekid wrote:
| > _Most of them said they won't leave the Atlassian stack, as
| long as they don't lose data. This is because moving is complex
| and they don't see a move would mitigate a risk of a cloud
| provider going down._
|
| I still don't understand the stranglehold JIRA has on some
| clients. I can't quickly think of another SaaS product that could
| be down for almost 2 weeks and not have most customers leave.
| brimble wrote:
| If they don't lose data, two weeks of downtime every few years
| might be cheaper than the cost of switching. Plus, it's not
| like you know the thing you switch to will be any better, if
| it's another SaaS.
| user22 wrote:
| Let's say we have an announced release scheduled for May 1st.
| With the tools down, there is no way to meet that date. For a
| 4 billion dollar company, this can make a huge difference in
| revenue. For a public company, the stock will definitely drop
| when it's announced the revenue goals were missed because the
| tools were down.
|
| For companies of size, the cost of tools being down for 3
| weeks can easily be in the multi-millions of dollars.
| brimble wrote:
| Again, part of the trouble is it's hard to gain enough
| _certainty_ that the thing you switch to--self-hosted, or
| another service--won't be _at least_ as bad. You can look
| at their past record, but then, when's the last time
| Atlassian had _this_ happen? (or maybe they've been having
| similar issues every year or two and I've just not noticed,
| in which case, yeah, it's probably a safe bet that
| switching to almost anything else would be an improvement)
| tyingq wrote:
| >I still don't understand the stranglehold JIRA has on some
| clients.
|
| - Integrations with things like the source code repos, incident
| management systems, confluence or other wikis, Slack, etc.
| Moving away from Jira creates a bunch of dead links.
|
| - Internal dependence on complex workflows and state transition
| rules that are implemented in Jira.
|
| - Various very customized reports that leaders depend on to
| make decisions, despite the often dubious value and/or
| accuracy.
| femto113 wrote:
| Many years worth of source code filled with comments like
|
|     // if we don't toggle bit 7 here 10% of transactions will
|     // fail on Thursdays
|     // see JIRA issue BIGPROJ-12654 for detailed discussion
| DannyBee wrote:
| Having migrated bug systems for very large, very old code
| bases before, it's pretty easy to make the URLs and links
| like this still go to the right place.
|
| This is actually the least difficult thing, I would say ;)
| ReidZB wrote:
| When we migrated away from JIRA, we scripted it such that
| the JIRA issue numbers were recorded in the newly migrated
| issues exactly because of things like this.
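One way to keep old issue references resolvable after a migration, as described above, is to record a legacy-key-to-new-issue mapping and look references up through it. This is a sketch only: the `LEGACY_TO_NEW` table, the `tracker.example.com` URL, and the key pattern are illustrative assumptions, not how any particular migration was done.

```python
import re

# Hypothetical mapping written out by a migration script:
# legacy Jira key -> URL of the migrated issue in the new tracker.
LEGACY_TO_NEW = {
    "BIGPROJ-12654": "https://tracker.example.com/issues/90311",
}

# Assumed shape of a Jira-style key: uppercase project prefix, dash, number.
JIRA_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def resolve_legacy_refs(text):
    """Return (legacy_key, new_url_or_None) for each Jira-style key in text."""
    return [(key, LEGACY_TO_NEW.get(key)) for key in JIRA_KEY.findall(text)]


comment = "// see JIRA issue BIGPROJ-12654 for detailed discussion"
refs = resolve_legacy_refs(comment)
```

The same lookup could back a redirect service, so that old links in source comments and wikis keep working without editing them.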
| dirtybirdnj wrote:
| >I still don't understand the stranglehold JIRA has on some
| clients.
|
| But it's got what plants crave...
| z58 wrote:
| I imagine most people used something like Google Sheets during
| the downtime
| chupchap wrote:
| A lot of companies have integrations with the Atlassian suite
| which might not be easy to shift away from.
|
| Secondly, there are a lot of individual competitors to Jira,
| Confluence and Bitbucket but which competitor can offer all
| three under a single invoice? Maybe Microsoft, can't think of
| anyone else.
|
| Also for such an extended downtime the customers are entitled
| to a discount or a credit note which a lot of CXOs consider in
| their decision making.
| krinchan wrote:
| We are in a similar place with Slack. We moved from HipChat
| to Slack and that was painful enough. Then the company
| noticed we get Teams for "free" and they tried to push us
| over to it. But folks have so much automation (because
| "ChatOps" is that new new) that is pushing things into Slack
| the company eventually gave up.
| judge2020 wrote:
| > Maybe Microsoft,
|
| Is there a Jira replacement/offering in the Microsoft 365
| suite?
| iameoin wrote:
| Microsoft has Azure Devops Boards that is similar:
| https://azure.microsoft.com/en-us/services/devops/boards/
| ralgozino wrote:
| GitHub / GitHub Enterprise?
| muricula wrote:
| Visual Studio Online is what it was called internally; the
| marketing may have changed. It's okay, and is what
| was/probably still is used at MS internally to develop
| Windows.
| lfpeb8b45ez wrote:
| Azure DevOps is really underrated:
| https://www.thoughtworks.com/radar/platforms/azure-devops
| travellingprog wrote:
| Never used it, but looks like Microsoft Project would fit
| that box.
| HeyLaughingBoy wrote:
| Oh god, no. We moved from Project to Jira and life became
| immeasurably better!
| shadowronin42 wrote:
| Azure DevOps has boards and tickets and whatnot, so
| probably that?
| encryptluks2 wrote:
| Atlassian sells to execs and gives kickbacks. You don't want to
| burn the company that gave you money and that you pushed
| through although you knew they sucked.
| jasd wrote:
| Even if they don't, I imagine they will have conversations
| internally to see what's feasible. It's just really difficult
| for an organization to move away from a product that everyone
| has learnt how to use. The company I work for is struggling to
| move away from something as simple as a collaborative editor,
| when I feel like I find no difference between the two products.
| travisgriggs wrote:
| > Atlassian is a tech company, built by engineers, building
| products for tech professionals.
|
| I am curious if anyone can provide any more insight on this
| simplification.
|
| I've worked at companies like this. Originally a core of
| motivated creative individuals make a cool product. As the
| business grows rapidly, Pournelle's (Iron) Law (of Bureaucracy)
| takes over. For a variety of reasons, the very capable creators
| depart and are replaced by less motivated/aware individuals who
| are glad to have a job and easily compelled to do things to the
| product that probably should not be done.
|
| My guess is that while Atlassian may have originally been one of
| those cool founder places, it has probably morphed into the more
| incompetent version that comes with scale all too often. But I
| don't know. Thus my question if anyone can speak to the true
| current tech capabilities of this company.
| cdjk wrote:
| This isn't the longest outage - last time they couldn't recover
| and recovered data from email archives.
| elesbao wrote:
| On a side note that someone else already made: it is interesting
| to see that many companies that use JIRA also use Slack, but the
| noise/complaints/mentions compared to when Slack is down are way
| different. I barely saw people complaining.
| caymanjim wrote:
| I dunno about everyone else, but I'm generally frustrated and
| feel blocked when Slack is down, and I celebrate Jira being
| down because I've never had a pleasant experience using it.
| Jira is bureaucracy that gets in the way of me getting things
| done, and Slack is a critical communication path.
| mountainriver wrote:
| Yup Jira is bureaucracy incarnate. Middle managers love it
| though
| elxr wrote:
| Same here. I actively made an effort to tell coworkers how
| much I hate Jira. Hopefully new startups choose something
| more sensible.
| upbeat_general wrote:
| I don't believe slack has been down as long?
|
| Slack is generally much more critical than JIRA in order to
| keep working.
| adhesive_wombat wrote:
| Is it though? You can hop onto any of a constellation of
| other IM platforms, FOSS and not, fairly quickly for an
| instant comms channel, even if you're missing the history.
| Having all your issue tickets missing is something you can't
| really deal with unless you have a very recent dump, and even
| then you can't just fire up Bugzilla and get something
| working without a lot of migration and administrative effort.
|
| You can do without JIRA for a week or two as long as managers
| understand and you all have a good concept of what work
| needed doing anyway. Then it starts getting dicey unless
| someone becomes a human JIRA to connect temporary manual bug
| tracking systems with everyone involved.
| adamc wrote:
| We have all sorts of slack channels set up to coordinate
| activity, so that internal customers can talk to engineers
| easily, or engineers can engage with each other. If slack
| goes down, we'd have to work all that out. For many days,
| it would be a huge drag on the process, slowing down
| interactions.
|
| Other IM platforms wouldn't solve that just by existing.
| Sure, in principle one could set up such channels
| elsewhere, but that takes time, and the communication about
| it takes considerably more time.
| adhesive_wombat wrote:
| Sounds like having a fallback pre-defined would be
| prudent if it's that important and you don't feel you
| could collectively extemporise something. "If Slack goes
| down, the plan is to use WhatsApp/Teams/Jeff's Matrix
| homeserver in his garage until service comes back. A list
| of group channels will be emailed if that happens."
|
| Then if it does go down, you don't have to waste the
| first day arguing about the plan.
| ineedasername wrote:
| Something to consider is that Jira can require a great deal of
| configuration to tailor it to your needs. If you already have a
| DevOps team of some capacity (not everyone does) then it may only
| be a small incremental increase to run things on prem. I did it
| myself: I'm very much not a DevOps person, mostly unfamiliar with
| optimizing JVM parameters for apps like this, but it still only
| took me about 5 hours to get things running stably, and then
| another 2 hours or so a few weeks later to tweak things like heap
| size to help things go a bit faster (though it was still somewhat
| slow)
|
| To be completely open, though, I don't know how much DevOps overhead
| is involved in maintenance or feature updates. I hated the app
| and used it for less than a year so I didn't have much exposure.
| I guess my point though is simply that you may not need to use
| their SaaS option if you have a decent DevOps team already. After
| the initial setup time I doubt I spent more than half an hour a
| month managing the internals and updates.
|
| I _did_ spend more than that on configuring the system for use,
| which you'll need to do regardless.
| thesh4d0w wrote:
| Atlassian has EOL'ed their non-cloud products
|
| https://www.atlassian.com/migration/assess/journey-to-cloud
| originalvichy wrote:
| I have had to correct this too many times already. Server is
| the name of the deployment type of their on-prem. It means
| single node non-clustered. Data center is their deployment
| that supports clustering to multiple nodes (and used to
| support a few extra features). They are retiring the Server
| deployment type licenses and pushing everyone to data center
| or cloud. So no, they aren't EOLing their on-prem.
| bombcar wrote:
| The cost of server was lower and fixed (buy it once), the
| cost of datacenter is MUCH higher (minimum 500 users, pay
| per year).
|
| Which is even more amusing when you realize Server has been
| Datacenter with a fake mustache for years now.
| tedivm wrote:
| The datacenter product also seems geared towards people
| reselling Atlassian stacks. For example there's a company
| that offers HIPAA compliant Confluence (complete with
| signing a BAA, so you can actually store PHI on it). It
| doesn't seem like a great replacement for the server
| version.
| dzikimarian wrote:
| Our instance is half the size of minimal Data center
| license. For us and for many customers this is effectively
| EOL.
| tyingq wrote:
| Their on-prem options are being reduced down to one product
| with pretty high minimum spend numbers:
|
| - 500 users (Jira Software, Confluence, Crowd)
|
| - 50 agents (Jira Service Management)
|
| - 25 users (Bitbucket)
|
| https://www.atlassian.com/migration/assess/journey-to-cloud
|
| https://www.atlassian.com/migration/assess/compare-cloud-dat...
| k8sToGo wrote:
| I am _Dev_ Ops, and not Ops. So I try to not waste time with
| self hosting as much as possible.
| h2odragon wrote:
| > However, if they [restore backups], while the impacted ~400
| companies would get back all their data, everyone else would lose
| all data committed since that point
|
| OK, so you restore backups to a separate system, and selectively
| copy the stomped accounts data back to production. Simple
| concepts aren't that simple at their scale, sure, but I suspect
| this is skimping details on some truly horrendous monolithic
| architecture choices that they're trying to hide.
|
| Not that I ever thought using their products was a good idea; to
| be clear about my position... But at this point anyone continuing
| to rely on them for anything is asking for the suffering they'll
| get. Signing up for their crap for a vital business function is
| like offering your tonker to a snapping turtle.
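The "restore to a separate system, then selectively copy back" approach mentioned above can be sketched with plain dictionaries standing in for the two systems. This assumes each site's data is self-contained and keyed by a site id, which the thread itself suggests may not hold at real scale; the function and variable names are hypothetical.

```python
import copy


def selective_restore(production, restored_backup, deleted_site_ids):
    """Copy only wrongly deleted sites from a restored backup into production.

    Returns the list of site ids that were actually restored.
    """
    restored = []
    for site_id in deleted_site_ids:
        if site_id in production:
            continue  # never overwrite live data written since the backup
        if site_id in restored_backup:
            production[site_id] = copy.deepcopy(restored_backup[site_id])
            restored.append(site_id)
    return restored


# "site-b" was not deleted and has newer data than the backup.
production = {"site-b": {"issues": 12}}
backup = {"site-a": {"issues": 7}, "site-b": {"issues": 10}}

# "site-c" is on the list but missing from the backup, so it is skipped.
done = selective_restore(production, backup, ["site-a", "site-b", "site-c"])
```

Unaffected customers keep everything committed since the backup, because their entries in production are never touched.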
| throwaway894345 wrote:
| I would really like to understand who makes the decision to
| purchase JIRA. It's like the C++ of ticketing software--it does
| everything because no one wanted to sit down and think
| critically about the use cases and instead decided it would be
| easier to say "yes" to every single feature request. It
| definitely feels like whoever is buying JIRA is not on the team
| who is using it (maybe IT or finance) because it ticks their
| boxes and it has such a huge list of features that _nominally_
| it appears to tick the product development boxes (ignoring more
| subjective concerns like "quality", "performance", and
| "usability").
|
| I would really like to try working in an organization that uses
| something simpler, like Trello (although now that this is also
| an Atlassian property, maybe not exactly Trello?).
| robertlagrant wrote:
| The reason to buy Jira is that loads of stuff integrates with
| it, and lots of people know it. Maybe not perfect, but that's
| why. And unless you're in it all the time, which some people
| may be, its ergonomics are not as important as, say, an
| IDE's.
| tenacious_tuna wrote:
| My relatively small team at a massive enterprise built all
| our report generation tools around JIRA for an entire class
| of offerings. It's been easier for them to justify continuing
| to pay for JIRA and keep it propped up than to develop (or
| migrate to) a new solution.
|
| As the lone dev on the team I've been continually astounded
| by my leadership's willingness to commit more and more to
| tech debt laden paths. The notion that _all software_
| requires maintenance is anathema to them, and it's led us to
| be 'cornered' into decisions re: what software we can use /
| where we can invest our discretionary funding.
|
| Moreover, we're constrained by the parent mega-enterprise's
| software purchase policies; JIRA's already approved (and run
| elsewhere in the enterprise), whereas off-the-shelf or SaaSy
| alternatives are significantly harder to get buy-in for. (No
| using corporate cards for SaaS, all purchases need to go
| through the quote/purchase-order process, etc).
| Cd00d wrote:
| Interesting take.
|
| Personally, I like JIRA. I think it adds a ton of
| transparency in our org, and while I've used Trello for
| personal and home projects, I don't see how it's good enough
| for business. Trello doesn't even allow for time estimates
| (last I tried), which for us is part of planning. Search in
| JIRA is also really good, so no ticket is ever just lost to
| the ether.
|
| Sure, it's not perfect, and waiting for a board to load is
| annoying, but for distributed work and visibility, I haven't
| seen something as professionally useful.
|
| Open to exploring though.
| stuff4ben wrote:
| GitHub Enterprise and ZenHub Enterprise work well for us
| here @IBM, not that I speak on behalf of them, just a drone
| doing work.
| Cd00d wrote:
| ZenHub looks really interesting - thank you for pointing
| it out.
|
| How good is ticket search? I have to be honest, JQL is
| the superpower that makes or breaks it for me.
| x0x0 wrote:
| I made the decision, unfortunately. The rationale was
| literally that I hated pivotal tracker -- what a garbage app
| that is -- and I'd heard of jira, needed something to track
| bugs / work items, and signed up. It crucially had a zendesk
| -> jira sync, so all our zendesk requests could end up in
| jira.
|
| In the beginning, with me plus 2 engineers, I noticed it was
| slow but since I used it for 20 minutes a week, that didn't
| really matter. By the time I started using it for an hour a
| day, we had 10 engineers on 2 teams using it. I got to see a
| friend using linear, and I had some spare time that I was
| going to use to switch, but I couldn't get in the beta. By
| the time they let me in, the opportunity was over and I was
| too busy.
| BolexNOLA wrote:
| I really, really like Trello and am dreading the day when
| atlassian starts tinkering with it in any real capacity. As a
| content creator, it is the first workflow system I've ever
| seen that I can effectively share with my client. It's so
| simple and streamlined and the fact that I've stuck with it
| despite my ADHD says a lot.
|
| Clients add their notes to the card, I check the boxes as I
| hit the notes, and I move the card further right as we enter
| different stages of the post production process. We then have
| a column of every completed project, which is incredibly easy
| to sift through if we need to revisit something. It's
| literally left to right in the workflow, it visually is
| telling me where we are at all times.
|
| It's incredibly simple and elegant. For fast turnaround,
| relatively stripped down content (like podcasts) there is
| nothing like it.
| systemvoltage wrote:
| What's wrong with C++? Seems unfair to compare it with JIRA.
| throwaway894345 wrote:
| I was a C++ programmer in a past life and I sorta like it.
| C++ and JIRA seem to have the same philosophy with respect
| to choosing which features to admit: "yes". The idea is
| that by supporting the largest number of features possible,
| they'll surely build something that everyone likes because
| it will tick everyone's boxes. What people frequently fail
| to realize is that the absence of misfeatures or redundant
| features is an important feature in and of itself.
| Moreover, the more features you support, the harder it is
| to control for quality.
| hhmc wrote:
| > The idea is that by supporting the largest number of
| features possible, they'll surely build something that
| everyone likes because it will tick everyone's boxes.
|
| The idea that the C++ committee are unthinking people
| pleasers is patently false.
|
| C++ does have a lot of cruft, but mostly because it aims
| to: i) support new features ii) maintain pretty strong
| backward compatibility guarantees
|
| In general the new features are actually pretty well
| liked, but in conjunction with (ii) it creates a big
| language. There's a reasonably decent subset that can be
| carved out, but it's also clear why newcomers without
| legacy baggage (e.g. rust) are making inroads.
| throwaway894345 wrote:
| "unthinking people pleasers" isn't how I would
| characterize it; rather, I think of it more as a "kitchen
| sink" or "more is more" philosophy rather than a "less is
| more" philosophy. I'm sure the committee deliberated
| extensively, but deliberation within their particular
| philosophical context still produced an unpleasant
| result. I think the same is true of JIRA.
|
| EDIT: clarified wording a bit.
| aaaaaaaaaaab wrote:
| It's a HackerNews meme from people who never bothered to
| properly learn C++ and are angry that it's not
| JavaScript/Ruby/Rust/whatever.
| throwaway894345 wrote:
| I refer you to my sibling comment:
| https://news.ycombinator.com/item?id=31017079
| uuyi wrote:
| Ex C++ dev and ex JIRA admin. They are the same class of
| complete bananas.
| antiterra wrote:
| In a lot of ways, JIRA disrupted Remedy Action Request
| System, which had a painful transition from X to Windows
| client. Remedy was even more admin dependent and unwieldy.
| fnord123 wrote:
| > like Trello
|
| Maybe Asana or Monday would work for you.
| spookthesunset wrote:
| I find it helpful to stop thinking of JIRA as a bug tracker
| or anything like that. In my opinion JIRA is more of a way to
| create and track workflows. It can be used as a blank slate
| for quite a lot of things (which I cannot come up with any
| examples for at the moment!)
|
| That being said, because it can do anything, it doesn't take
| much effort to make a workflow as painful as possible.
| Somebody with the "right" mind might make all kinds of
| checkpoints in a workflow, which makes a lot of operations a
| pain in the ass because you wind up hopping through a bunch
| of steps. Pretty sure in our org we just make our workflow
| "you can hop from any state to any other state"--basically a
| free-for-all.
|
| Dunno my point, but there you go!
| RandallBrown wrote:
| I think people buy JIRA because you can set it up however you
| want. I've seen it almost as simple as Trello and much more
| complicated. It doesn't have to be terrible, it just usually
| is.
|
| If JIRA didn't allow you to make it terrible, it wouldn't
| allow for some of the absurd things that people want it for
| and those companies might not buy it.
| a4isms wrote:
| They used to say of Microsoft Word, "Nobody uses more than
| 5% of its features, but every company uses a different 5%."
|
| The saying is apocryphal and unlikely to be accurate, but
| the shape of the thing it's describing applies to almost
| every piece of enterprise software whether installed on-
| prem or SaaS.
|
| And as another comment points out, at Enterprise scale you
| can substitute "team" or "group" for customer. Every team
| might use a different 5%, and unless you standardize their
| processes, you have to buy the product that can accommodate
| all of their needs.
| grog454 wrote:
| >"Nobody uses more than 5% of its features, but every
| company uses a different 5%."
|
| >The saying is apocryphal and unlikely to be accurate
|
| Well, it's mathematically impossible to be accurate as soon
| as you have > 20 users.
| shukantpal wrote:
| False. If you have 100 features, there are nCr(100, 5)
| combinations of 5% features = 75287520.
| a4isms wrote:
| If it's no more than 5% of the features, it's actually
| n-choose-k(100,5) + n-choose-k(100,4) + n-choose-k(100,3)
| + n-choose-k(100,2) + 100!
|
|     75,287,520
|   +  3,921,225
|   +    161,700
|   +      4,950
|   +        100
|   ------------
|     79,375,495
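The sum in the comment above can be checked directly with Python's standard library. The figure of 100 features is the thread's own assumption, and the count covers subsets of one to five features.

```python
from math import comb

FEATURES = 100  # assumed feature count, as in the comment above
MAX_USED = 5    # "no more than 5%" of 100 features

# Number of distinct feature subsets of each size from 5 down to 1.
terms = [comb(FEATURES, k) for k in range(MAX_USED, 0, -1)]
total = sum(terms)

for k, term in zip(range(MAX_USED, 0, -1), terms):
    print(f"C({FEATURES},{k}) = {term:,}")
print(f"total = {total:,}")
```

Either way, the number of possible subsets is far larger than 20, so every company using a different 5% is not ruled out by counting alone.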
| moonbooth wrote:
| Only if you assume the 5% of features to be a contiguous
| block each time.
|
| However, if we assume there are, say, 100 features in
| Word (the real number is likely much higher), the number
| of combinations is orders of magnitude higher than 20.
| [deleted]
| robertlagrant wrote:
| A better counterexample is that one user could use all
| features.
|
| But your statement doesn't make sense; there might be
| millions of features, and trillions of ways to combine
| them to make 5%.
| KronisLV wrote:
| > Well, it's mathematically impossible to be accurate as
| soon as you have > 20 users.
|
| It's probably in the semantics.
|
| Text input and editing is clearly a part of functionality
| that's probably used by everyone (or at least most
| users), so it's not possible for "different 5%" to mean
| what you're alluding to, maybe the phrasing needs work.
|
| In any given 5% there might be 1-4% of overlap with what
| others are using and the remainder of that is specific to
| the company.
| grog454 wrote:
| And the greater the degree of overlap the weaker the
| implicit argument.
|
| If it's a uniform distribution of discrete features then
| each feature is equally "important" and worth equal
| resources and dev time. If 81/100 companies use the exact
| same 5% of features and the remaining 19 cover the
| remaining 95%, then all else equal you can probably drop
| 95% of your features and still do well.
| a4isms wrote:
| The dynamics of the Enterprise market are such that there
| are features where having just one customer that will
| make a buy/no-buy decision based on just one feature will
| deliver enough incremental ARR to justify the opportunity
| cost of doing that feature instead of a bunch of others.
|
| Typically you do the most popular features first, but
| most Enterprise vendors end up working on a long tail of
| niche features that nevertheless are profitable.
|
| There's a long conversation to be had about how this ends
| up being a trap where Enterprise software gets bloated
| and shitty and eventually gets disrupted by a small
| vendor that does "less," but in a powerful,
| transformative way that obsoletes the Enterprise
| "standard," which leads us back to discussing Atlassian
| :-)
|
| They're a good example of this dynamic, because they have
| a "constellation" of products to sell. So if they build a
| niche feature that gets a new customer to buy Jira seats,
| having "landed" in the account, their salespeople can
| "expand" by selling OpsGenie and other related products
| very profitably.
| karaterobot wrote:
| The way it works is, someone always says "Sure, JIRA is bad
| out of the box, but you can customize it to work the way you
| want" and there is nobody around to say "so now you have two
| problems: a bad system that depends on having an expert to
| make it work the way it should".
|
| Then, you pay for JIRA, and that expert customizes it the way
| _they_ like. It still doesn't work very well for most
| people. Nobody likes it except one stakeholder, and the
| engineering lead who acts as an admin on it. A while later,
| those people have left the company, and everyone else is out
| of luck.
|
| Seen this exact scenario play out at two different companies
| now. Am witnessing it play out in real time at a third.
| sam0x17 wrote:
| And yet, it actually is set up in an extremely opinionated
| annoying way. For example there is no way to actually assign
| multiple users to the same ticket, which is a big problem if
| your org legitimately does pair programming (mine does for
| juniors)
| robertlagrant wrote:
| Having a single owner for each ticket is not a bad idea.
| You can see contributors in git.
| Cd00d wrote:
| Why not just clone the ticket?
| BlargMcLarg wrote:
| Trickle down and first mover. JIRA was there first being
| "decently ok", enough people adopted it and now others do the
| same. Then couple that with what you wrote: the people in
| charge of deciding the software are generally the ones who
| can justify wasting half their day on it.
|
| To this day I still don't know what JIRA does so much better
| than other products that big corps are willing to waste
| months' worth of man-hours over it. Its biggest selling
| point is integration with the remainder of the Atlassian
| stack, which is not exactly known for being great either.
| vikingerik wrote:
| Jira's big feature is being widely known. It's the modern
| version of "nobody ever got fired for buying IBM."
| csours wrote:
| "I am not a fish" - the people who buy it are not the people
| who use it. -
| https://www.ted.com/talks/seth_godin_this_is_broken
| prescriptivist wrote:
| This reminds me of one of my favorite HN comments of all
| time: https://news.ycombinator.com/item?id=16424423
| dangus wrote:
| The answer is medium to large companies. Jira is a tool that
| can satisfy hundreds of different teams' work management
| needs without having to buy dozens of different products.
|
| The fact that it's so feature packed and customizable is the
| point.
|
| I think the complainers are not really investing the time
| to change project settings to fit their needs.
|
| My only complaint about the Atlassian suite is the
| performance of Jira and Confluence. The overall page load
| speed is too slow.
| matwood wrote:
| I agree. I look at every JIRA killer and think we could
| maybe move and nope...they're missing something we use. In
| many ways JIRA is like Excel. On the surface it can appear
| easy to replicate for a single user, then you realize every
| user uses 10 different features.
| macintux wrote:
| How do you change the markup language to be consistent
| between Jira and Confluence?
|
| How do you eliminate all non-task ticket types in a Jira
| board and allow any ticket to be a child of any other
| ticket?
|
| It's hard to configure away complexity from a product if
| it's designed to be complicated.
| Karunamon wrote:
| Re 1: I'm not sure why that's a necessity beyond a notion
| of consistency. I find that major wiki editors are not
| often major ticket creators, and these are different
| products with different audiences at the end of the day.
| Also, Confluence uses a WYSIWYG editor, so it's rare to
| need to think about the markup.
|
| Re 2: Set the project's issue type scheme to one that
| only allows tasks and subtasks. That gets you one level
| of nesting. (And even though task and subtasks are
| different issue types, changing from one to the other is
| trivial since they have identical fields.) Allowing epics
| gets you another at the top level. That's a bit limited,
| but wouldn't arbitrary nesting be even more complex?
| DocTomoe wrote:
| > allow any ticket to be a child of any other ticket?
|
| I have no idea why you would want this from a work
| management point of view, but you can just use issue
| linking to describe a parent <-> child relationship.
| chrisseaton wrote:
| > How do you change the markup language to be consistent
| between Jira and Confluence?
|
| This here is the single most insane thing about
| Atlassian.
| anecd0te wrote:
| ime people pick Jira because they've used Jira and have been
| promoted via the peter principle to the level at which they
| make purchasing decisions.
| brimble wrote:
| IBM effect. If you don't care _a whole lot_ about your
| ticketing system, you just pick Jira because everyone'll
| nod along with the choice and you won't personally be
| blamed if/when it sucks, you won't make enemies or have to
| argue over the choice because it can't do something that
| someone else in the org "needs" it to do, etc.
| SatvikBeri wrote:
| At big companies I've worked at, the justification was that
| JIRA was the only one that met all the regulatory/compliance
| requirements. I don't know if this is actually true, but
| smaller companies certainly don't market compliance as well.
| JoBrad wrote:
| I was part of the decision to purchase Atlassian tools at my
| company. We had been using a variety of self-hosted and SaaS
| tools which had varying abilities to integrate with each
| other. We've had very positive feedback from users since
| switching to them. We were also able to move some of our help
| desks to JIRA Service Management, and away from another self-
| hosted product which is still used by a good portion of our
| business. The self-hosted product is honestly a nightmare to
| maintain and keep secure. According to the vendor, the "fix"
| is to have 1-2 people dedicated to that product, which simply
| isn't something that my team has the bandwidth or will to do.
|
| JIRA does try to be all things to all people...and mostly
| succeeds. For instance, we use the same workflow and mostly
| the same nomenclature across our development and helpdesk
| teams. Some of our software projects use Kanban-style
| workflows, while others use sprints, but we can keep track of
| a project across multiple teams using the same tools. I'm
| sure other products also offer this, but we liked the
| integration and overall capability for the price.
|
| There are definitely issues: some feature requests and bugs
| have languished in their backlog for years. But you can get
| started very quickly and we've had great feedback from users.
| throwawayboise wrote:
| The one place I worked that used Jira was a small-but-not-
| tiny company (about 15 devs at the time). The only people who
| actually used Jira were the managers. Developers got printed
| stories. These were used for planning, and were printed on
| cards and taped to a white board when ready. Developer would
| pull a card to work on, and return it to the manager when it
| was complete. The manager did all the status updates and
| reporting to upper management.
|
| IDK if this was to cheap out on the licensing with a minimal
| number of users, or if it was to insulate the developers from
| the experience of using Jira. Perhaps some of both.
|
| Clearly that usage pattern would only scale so far.
| notreallyserio wrote:
| JIRA is generally fine software that is good enough for most
| folks, especially if you're willing to adapt your workflow to
| it. Where it goes wrong is where tools like Jenkins go wrong:
| folks add too much customization.
|
| That means the tool is often the wrong one for the job, but
| instead of picking something that's a better match out of the
| box folks stick with the easy choice (extend what they have).
| closeparen wrote:
| JIRA is a framework for making assembly lines out of
| knowledge workers. When you're a middle manager at a decent
| sized company, a major problem you face is that the mass of
| knowledge workers beneath you are _opaque_: you have no way
| of knowing whether they're working or not. Another problem
| you face is that they're _uppity_: people who went to
| college and got used to managing their own time now have all
| kinds of idiosyncratic ideas about how to manage their own
| time and arrange their own working lives. Since you are a
| middle manager you despise local differences. Since you are a
| manager you're pretty sure that only you and your lieutenants
| can be trusted with this kind of decision making power.
| Adopting JIRA is a powerful lever to put people back in their
| place as work item churning machines. Constraints such as
| only certain people can create or assign tickets, only
| certain people can mark them completed, only certain states
| are valid transitions from other states, etc. implement a
| level of domination over white-collar workforces that
| managers would be otherwise uncomfortable asserting face to
| face.
|
| Other ticketing systems do not work nearly as well for this
| purpose because they are designed mainly as external brains
| or communication platforms for workers, and they assume a
| level of worker autonomy in moving tasks through their
| lifecycle. In Trello you cannot make it so that a PM has to
| sign off before a card is moved to the in-progress column, or
| that only in-progress cards can have code reviews associated
| with them. JIRA eats these kinds of requirements for
| breakfast.
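|
| To make "only certain states are valid transitions" and "a PM
| has to sign off" concrete, here is a minimal sketch of that
| kind of rule. The states, roles and the `move` helper are all
| hypothetical; this is not how Jira is configured or
| implemented, just the shape of the constraint.

```python
from dataclasses import dataclass, field

# Hypothetical workflow: allowed transitions and the role required
# for each. Names are illustrative, not Jira's configuration.
TRANSITIONS = {
    ("todo", "in_progress"): "pm",         # PM must sign off
    ("in_progress", "in_review"): "dev",
    ("in_review", "done"): "lead",
    ("in_review", "in_progress"): "lead",  # review rejected
}

@dataclass
class Ticket:
    key: str
    state: str = "todo"
    history: list = field(default_factory=list)

def move(ticket, new_state, actor_role):
    """Apply a transition only if it exists and the actor has the role."""
    required = TRANSITIONS.get((ticket.state, new_state))
    if required is None:
        raise ValueError(
            f"{ticket.state} -> {new_state} is not a valid transition"
        )
    if actor_role != required:
        raise PermissionError(f"{new_state} requires role {required!r}")
    ticket.history.append((ticket.state, new_state, actor_role))
    ticket.state = new_state
    return ticket
```

| With this table a developer cannot start a ticket without the
| PM role, and nobody can jump from in-progress straight to
| done.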
|
| EDIT: This is not to say you _can't_ use JIRA in a workflow-
| neutral way, or that everyone uses it for this reason, but I
| would submit that it's JIRA's differentiated advantage.
| TheRealDunkirk wrote:
| Even worse, companies with the resources to buy JIRA will
| probably hire consultants to set it up, and you wind up
| with a system 1) bought by people who don't understand how
| programmers work, 2) configured by people who don't know
| how your company works. So end users usually wind up with a
| terrible system that continually generates complaints
| (along MANY axes), and the people responsible for foisting
| it on them think they're just being difficult.
| mikepurvis wrote:
| So I would say that this assessment is on the whole, kind
| of cynical, however I suppose I have the interesting
| position of being in an organization where I feel like I
| actually see _both_ JIRAs.
|
| One JIRA is the project that's used for development of the
| core product, where there are no constraints-- anyone can
| add a comment, create links, change assignee, add new tags,
| push the tickets through whatever state transitions they
| want, and so on. It works, though it is a little chaotic
| sometimes as subgroups of people have different preferences
| for how things should go (eg, for tickets requiring test
| team validation, should the ticket assignee remain as the
| person who did the original work so it's clear who has more
| to do if it fails validation, or should the assignee change
| to the test team person, so that it's clear that that's the
| next person who has it as an action item?)
|
| The second JIRA is the IT team's internal support project,
| which is completely locked down-- no one except them can
| close tickets or move them around, or even edit the
| contents, closed tickets can't be commented on any more,
| and so on. This is the one that gives me the vibes you are
| talking about. Every time I have to interact with it, I
| loathe it because every inch of it is transparently a
| funnel, railroading me along a path toward one of either
| DONE or WONTFIX. This is absolutely _efficient_, in the
| sense of meeting the goal of closing all the tickets, but I
| feel it introduces friction for the larger business goal of
| actually helping people resolve their problems. To the
| point where eventually most of the IT support activity
| moved away from the JIRA project to an informal Slack
| channel, which is way more accessible, but worse in
| basically every other way: it's harder to effectively
| search, impossible to properly link, bad for async, bad for
| dealing with more than one thing at once, etc.
| codycraven wrote:
| It sounds like you've been hurt by some terrible
| management practices. I'm truly sorry that some managers
| think their job is to control their subordinates.
|
| However, regarding ticketing systems, in team environments,
| it is very effective and helpful to have a system that
| manages the data about the work that has been completed, is
| being worked, and is planned to be worked on.
|
| Part of that system might be defining restrictive workflows
| for some teams, not for control, but to ensure the agreed
| upon process is followed for quality or consistency.
|
| One of the many problems Jira has is that if you don't have
| a Jira admin on your team, it's impossible to build an
| effective and efficient workflow for your team. Coupled
| with Jira making many things global by default (it takes a
| lot of care to make a change that only affects specific
| Jira projects) most configurations end up being a pile of
| garbage automatically inherited from changes an admin (who
| is not part of the team) made when intending to change
| something for another specific team.
| agalunar wrote:
| Caveat: this is going to be a meta comment rather than a
| comment about the topic proper, and so maybe not
| appropriate for HN, but I think it's worth discussing.
|
| > It sounds like you've been hurt by some terrible
| management practices, I'm truly sorry that some managers
| think their job is to control their subordinates.
|
| When we assume someone was hurt, and imply they hold an
| opinion only because they were hurt, we risk
| delegitimizing their position. The interpolated message
| we might be sending is "your experience is personal and
| not representative of the subject at hand, and so your
| thoughts are only applicable to your situation; so, after
| we express our sympathy, your thoughts can be dismissed."
| Or the message we might be sending can be patronizing:
| "you hold your opinion for emotional, rather than
| rational, reasons; I'm sorry that you are so
| unfortunate."
|
| To be clear, though, I'm sure this wasn't your intent,
| and it makes me glad to see someone being compassionate
| (i.e. that you bothered to consider the experiences and
| feelings of the parent commenter).
|
| A personal story: I was raised devoutly religious but
| left the church in my twenties. My family and friends
| assumed I left because I wanted to be free from guilt,
| had been hurt by a culture that belied the doctrine, and
| so on (and they said as much). My change of belief
| occurred after recovering from a few years of mental
| illness, and while it is true that I may not have left
| _when_ I did were it not for the opportunity to reexamine
| my beliefs (while trying to piece back the fragments of
| my life into a sense of self), the reasons _why_ I left
| were the result of a lot of research and thinking. It was
| mildly frustrating when people assumed my decision was
| made for emotional convenience, when in reality, the
| research was uncomfortable and contemplating an
| unfamiliar universe was scary.
|
| I recognize the irony here - the issue I'm highlighting
| in this comment may be something that only I feel is an
| issue, born from a personal experience. But I _think_ it's
| more common than that.
| [deleted]
| liamwire wrote:
| I sincerely appreciate your articulation of this, thank
| you for taking the time.
| closeparen wrote:
| >ensure the agreed upon process is followed for quality
| or consistency
|
| That is what I mean here by "assembly line" and
| "control." Making sure that processes lead and
| individuals follow.
|
| Citing consistency as a terminal value in the same breath
| as quality is also exactly what I mean by the middle-
| manager aversion to local differences.
| chousuke wrote:
| Beyond trivial scale, you need good processes so that
| individuals can do their jobs. If you have no processes,
| change and development become _extremely_ difficult
| because people will be hunting for documentation all the
| time, stepping on each other's toes, and making mistakes
| that they should not be making because they forgot a
| trivial procedure that was a prerequisite to solving
| their actual problem.
|
| I work with a variety of different environments, and
| depending on the environment I can either solve my
| problem in minutes and get it deployed in another few
| minutes _or_ solve the problem in minutes and spend hours
| figuring out how to safely deploy it without breaking
| everything. JIRA is terrible if you do anything that it
| offers by default, but when used properly it can
| absolutely help with this.
| baq wrote:
| To add to that, and perhaps educate your downvoters a
| bit, it can be very hard to imagine why or when such
| strict processes are helpful without having direct
| experience with organizations of sufficient scale. It
| literally boggles the mind but the process truly is king
| when there are hundreds (or thousands) of individuals
| working on a single product.
| hn_go_brrrrr wrote:
| Agreed. An essential part of blameless engineering
| culture is "the outage isn't any one person's fault, it's
| the fault of the tooling and processes for allowing them
| to do that". Good processes prevent everyone from making
| the same mistakes.
| tyingq wrote:
| >However, regarding ticketing systems, in team
| environments, it is very effective and helpful to have a
| system
|
| I think the point is that Jira is particularly granular
| in the way that it lets you do things with permissions,
| workflow rules, roles, metrics, etc. There's a fair
| number of places that use that granularity to create a
| weird digital sweatshop.
|
| Meaning the complaint is more about really deep
| _"micromanagement as a service"_ than what you might get
| with lighter tools.
| brazzledazzle wrote:
| Micro managers are everywhere, even in places that may
| seem culturally incompatible. I've yet to work for a
| business that prioritizes regularly evaluating managers
| for their management skills. It's only addressed when
| shit really hits the fan. Managers are primarily
| evaluated by their own managers on deliverables. As long
| as they're getting results and entire teams aren't
| quitting simultaneously there's no need to question
| anything. As long as a manager is toxic in ways that
| don't break the law or violate major company policies any
| attempt to address this by a direct report carries the
| risk of termination or retribution. Does it contradict
| your company's cultural values? Rules for thee.
|
| And I wouldn't assume you're not one of them. The worst
| cases I've run into aren't even the psychos that embrace
| micro management as part of their "management style".
| It's the ones that genuinely believe they aren't engaging
| in the behavior. They're not micro-ing, they're "helping"
| their team because they are an awesome manager and their
| team is _almost_ awesome, they just need to be monitored
| very carefully and given "suggestions" until they nail
| it. But they'll never nail it. Because no one is as
| smart, experienced or does a task "just so". They view
| themselves as a mentor to all. All decisions must be
| theirs to make. Jira becomes the perfect tool since the
| team effectively becomes little boxes that accept tickets
| or stories and return work both performed and delivered
| as specified.
|
| For any managers reading this that don't see a problem
| with this or see some of those behaviors in yourself
| please understand that you are sacrificing your team's
| happiness and motivation at the altar of your own
| insecurities. No one can grow where they're not trusted
| and no one can improve their skills when they're never
| given latitude to make meaningful decisions. Your people
| will make mistakes. They will accomplish things in ways
| that are different from how you would do them. It might
| even be objectively worse. That's ok. That's how you grow
| into a strong team with confident members.
| mistrial9 wrote:
| I was told by a lifetime manager turned successful
| consultant, that roughly fifty percent of engineering
| firms govern their engineers basically using fear.
| ornornor wrote:
| > using fear
|
| Could you elaborate? What kind of fear? "You're fired"? I
| wonder how effective it actually is because of the
| current job market and also because I (and others) react
| very poorly to this kind of tactic: "you want me to fear
| getting fired? Joke's on you, please DO fire me, I dare
| you"
| KronisLV wrote:
| > I wonder how effective it actually is because of the
| current job market
|
| Counterpoint: software developers aren't necessarily well
| paid or highly regarded _everywhere_, since remote
| working for companies abroad hasn't quite gotten
| mainstream enough.
|
| So it might just be effective against some people, or in
| cases where the hiring process itself has become
| increasingly unreasonable - the job being working on
| boring CRUD apps but the hiring process being multiple
| stages of Leetcode and complex interviews.
|
| That's probably not applicable to everyone since plenty
| of folk can grok Leetcode and find jobs without too much
| trouble, but I still recall "The Unseen 99%" article:
| https://www.hanselman.com/blog/dark-matter-developers-
| the-un...
|
| It probably applies to the industries and companies where
| devs are treated as a cost center and since those
| companies aren't all out of business, plenty of people
| must be working in such environments, with sometimes sub-
| optimal conditions.
| numpad0 wrote:
| I'm guessing it's a sort of a nerd shorthand for "various
| means that are accompanied with self confusion of users
| but not with strong rational or scientific or technical
| basis"
| zrail wrote:
| The perf process is basically one big exercise in fear-
| based control.
| 52-6F-62 wrote:
| Kanban, by design, was a tool used in production control.
| It's one of the ways Toyota made their JIT production
| function.
|
| I worked on the line (Toyoda Iron Works) and used a real-
| life Kanban implemented by the plant engineers. It was
| used for quality control, to broadcast quality control
| and station output, and was checked regularly against
| their internal estimates and baselines and used also as a
| gauge for employee output.
|
| Control is what it's designed to do. The very fact that
| Kanban is the tool of choice should support at least some
| of OP's points, objectively.
| [deleted]
| sjtindell wrote:
| Agreed. This is a problem of scale in my opinion. When we
| have 10 engineers, it is easy to check in with everyone
| and know what they are working on and get a status
| update. When we have 500 engineers, making sure all their
| tasks are aligning (organizations are one big race
| condition) is not just hard but impossible without some
| sort of tracking system. We all want to grow big. To do
| so, your processes need to change as you add more people.
| The exceptions (Valve, Netflix, etc.) that can handle
| being flat or semi-flat are very unique.
| biomcgary wrote:
| Are they unique because their problem domain allows it or
| because the leadership is uniquely ideologically driven
| (and competent) to implement efficient, flat systems?
| malermeister wrote:
| > ensure the agreed upon process is followed for quality
| or consistency.
|
| Isn't that just a more corporate way of phrasing
| "control"?
| robertlagrant wrote:
| Not in a negative way. You want to trust engineers to
| always have changes built and tested before they go to
| production, but when something egregious happens you need
| to go back and see what went wrong. You can choose to
| interpret that as control, but really the only
| alternative (often cited) is "Well that shouldn't ever
| happen, so you don't need tooling to support that
| situation".
|
| And that is not a useful way of thinking when you have
| real engineers writing software that people depend on.
| malermeister wrote:
| I think the problem is that the processes are often not
| _mutually agreed_, but instead dictated by middle
| managers.
|
| JIRA then becomes a tool for enforcing arbitrary rules,
| i.e. control
| robertlagrant wrote:
| This is very likely even if engineers come up with the
| processes, unless all process is scrapped and done from
| scratch every time an engineer is hired.
| Rimbo wrote:
| Oh, nonsense. People buy Atlassian because the licensing is
| cheap, not because it's particularly good at what it does
| or designed with any particular workflow in mind.
| Viliam1234 wrote:
| Cheaper than whatever is the open-source alternative?
| chaosite wrote:
| Sure, if you host it yourself you have to pay someone to
| admin it (usually significantly more expensive than a
| license), and if you use a hosted solution you have to
| pay the host.
| ivan_gammel wrote:
| Free software has zero acquisition cost, but non-zero
| TCO, which can measure in millions USD (recurring salary
| of dedicated IT team), depending on the size of
| organization and complexity of the setup. You will need
| to maintain on-premise infrastructure, automate backups
| and recovery, automate security, automate updates
| (including testing and rollbacks) etc etc, basically
| doing all the jobs of the people responsible for the
| infrastructure at the SaaS provider, but at much smaller
| scale and not achieving the same efficiency. You will
| have to do those jobs considerably better to justify the
| costs.
| mistrial9 wrote:
| in thirty years of experience, I see this talking point
| straight from Microsoft anti-Open Source days..
|
| > Free software has zero acquisition cost, but non-zero
| TCO, which can measure in millions USD
|
| Often a primary driver is exactly the opposite -- for-
| profit companies are accustomed to paying money for a
| good or service, with a billing pattern and legal
| obligations. The company financial deciders do not want a
| setup that does not have a billing pattern and clear
| legal obligations. Meanwhile, Open Source Software went
| from niche to mission-critical in the 2000s via the
| Internet. For-profit companies (and their publicists)
| scrambled to explain it, and came up with that exact line
| repeated again today. I do not blame any person for
| saying it, it was in print in some reliable place. It
| does not capture the reality in 2022 IMO.
| ivan_gammel wrote:
| To be honest, I do not understand your comment.
|
| > The company financial deciders do not want a setup that
| does not have a billing pattern and clear legal
| obligations.
|
| I haven't ever met a CTO or CIO who would make budget
| decisions like that, nor do I do it this way myself. The
| reality in 2022 is the same as it was in 2012 or in 2002:
| when you choose a solution, you consider all long term
| costs. In 2022 TCO for the server software includes
| everything that I mentioned in my comment and more.
| There's a lot of use cases for OSS in corporate
| environment, for sure, but not every OSS solution is
| cheap or even affordable. Running on-premise open source
| collaboration tool is certainly not cheap if you do it
| right.
| ofrzeta wrote:
| I don't see how it is cheap. Standard may be cheap but
| then you are missing a lot of features that are announced
| on the product pages with a small footnote saying "only
| in premium".
| _dark_matter_ wrote:
| I feel you here, but I've been at multiple companies that
| used JIRA and never once had any of those requirements.
| I've also never seen it come up when deciding which
| ticketing system to use. Teams have always been free to
| move tickets at-will.
| KptMarchewa wrote:
| One very large video game studio has tons of automation
| for Jira. Imagine someone deciding to add new weapon. The
| automation creates 100s of tasks for concept artists, 3d
| artists, animators, sound artists, software developers
| with complex dependencies between those. Most importantly,
| automation creates multiple QA steps for each element of
| completed work.
|
| The same exists for levels, enemies, quests and tons of
| other elements.
|
| I would not be surprised if a lot of studios had similar
| workflows.
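|
| A rough sketch of what "one request fans out into many
| dependent tasks, each with a QA step" could look like. The
| step names, the dependency shape and the `expand` helper are
| assumptions for illustration; the studio's actual automation
| is not described in any detail here.

```python
from graphlib import TopologicalSorter

# Hypothetical template for one "new weapon" request. Step names
# and dependencies are assumptions; every step gets a QA task.
STEPS = {
    "concept_art": [],
    "3d_model": ["concept_art"],
    "animation": ["3d_model"],
    "sound": ["concept_art"],
    "code": ["animation", "sound"],
}

def expand(asset, steps=STEPS):
    """Return {task: set of prerequisite tasks} for one asset."""
    graph = {}
    for step, deps in steps.items():
        task = f"{asset}/{step}"
        # A step may start only after QA passed on its prerequisites.
        graph[task] = {f"{asset}/{dep}/qa" for dep in deps}
        graph[f"{task}/qa"] = {task}
    return graph

tasks = expand("plasma_rifle")
# static_order() yields every task after all of its prerequisites.
order = list(TopologicalSorter(tasks).static_order())
print(len(tasks), "tasks; last:", order[-1])
```

| Five template steps already become ten linked tasks for one
| asset, so hundreds per weapon is easy to believe once the
| template has realistic detail.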
| BlargMcLarg wrote:
| See, that is great. Automate what can logically be
| deduced from the information available and set up
| templates to provide that information. For developers, it
| should be automated enough you shouldn't have to write
| the same info twice, once in commit
| messages/merges/branch names, once in the ticket itself.
| If the workflow is so streamlined, all that information
| can be deduced and the ticket can be advanced
| automatically. Most information is available and
| documented for other parties.
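|
| As a sketch of "don't write the same info twice": pull the
| ticket key out of branch names and commit messages, then map
| version-control events to ticket states. It assumes keys of
| the form PROJ-123; the branch naming, event names and state
| mapping are hypothetical, and no real Jira or GitHub API is
| called.

```python
import re

# Assumption: issue keys look like "PROJ-123" (uppercase project
# key, hyphen, number). Branch/commit conventions are hypothetical.
ISSUE_KEY = re.compile(r"\b([A-Z][A-Z0-9]+-\d+)\b")

def issue_keys(*texts):
    """Collect unique issue keys, in first-seen order."""
    seen = []
    for text in texts:
        for key in ISSUE_KEY.findall(text):
            if key not in seen:
                seen.append(key)
    return seen

def next_state(event):
    """Map a VCS event to a ticket state (illustrative mapping only)."""
    return {
        "branch_created": "in_progress",
        "pr_opened": "in_review",
        "pr_merged": "done",
    }.get(event)

keys = issue_keys(
    "feature/GAME-42-plasma-rifle",
    "GAME-42: add recoil; refs GAME-7",
)
print(keys, next_state("pr_merged"))
```

| The developer types the key once, in the branch name, and
| everything downstream can be derived from it.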
|
| However, that's just not what most people go through in
| companies using JIRA. Worse, they have to toggle between
| pages multiple times, each taking at least a few decent
| seconds to reload. I'd like to give JIRA the benefit of
| the doubt here, but it sounds like the tool is just _very
| easy_ to misconfigure and abuse.
| robertlagrant wrote:
| This is pretty easy with Jira. There's a GitHub plugin
| which links PRs and commits to a ticket, and a GitHub
| plugin that links ticket numbers back to Jira tickets.
|
| And you generally do them both at a lower level than
| tickets, certainly commits, so you don't want to have too
| much automation between them as that starts adding
| constraints.
| theptip wrote:
| I think you've got part of the answer here, but are selling
| it short. Jira is the most complex task-processing rule
| engine that is also easy enough for a small team to
| operate, and also has the broadest set of integrated tools
| of any offering.
|
| You can use Jira as a simple Scrum board, a Kanban board,
| or you can build enforced-process monstrosities. You can
| build customer-support / internal-helpdesk workflows, or
| even model internal work-item-oriented business processes,
| etc. Now, as you point out, just because you can doesn't
| mean you should, and many orgs fall into the trap of making
| issue workflows overly-restrictive. But most companies (I
| believe) choose Jira before they choose those hairy task
| workflows. Startups with zero process use Jira.
|
| Also, you can integrate it all together to give good-enough
| dashboards/roadmaps, good-enough (for some, not me) docs
| integrations with Confluence, Git integration with
| Bitbucket etc. -- while there are big issues with these
| systems, I think it would be myopic to ignore the real
| benefits of working in one integrated stack where every
| design doc you write has dynamically-updated labels and
| auto-complete for each issue you type in.
|
| For context, I use Jira for tasks and don't love it, found
| Confluence to be really annoying and so I don't use it, and
| prefer Gitlab to Bitbucket, but I think you have to
| recognize these unique selling points. If all Jira had to
| offer was the rule engine it would not be as widely used.
| pid-1 wrote:
| Yeah my team uses Jira to keep track of what we are doing
| and what we need to do.
|
| Each member can actually organize their sprint and create
| tasks.
|
| Point assignment is not a big deal, it's just there so we
| avoid promising more than we can chew.
|
| I've found Jira really pleasant to use for lightweight
| processes.
| [deleted]
| richardw wrote:
| I'm just a user but totally happy with all our Atlassian
| apps. Confluence is a huge win across our multi-thousand
| person company and the best teams use it very well. I like
| the integration between Jira and Bitbucket. We don't
| overcomplicate things and it works fine.
|
| It's like my taste in wine. I don't want an overdeveloped
| sense of taste where only a $400 bottle will do. I'm fine
| with what we have because the work is what excites me and if
| people are documenting projects and managing workloads and
| committing code, we're 90% of the way there.
| danielovichdk wrote:
| Good point.
|
| Wine that costs $400 is for fun.
|
| You don't drink that professionally.
| spaetzleesser wrote:
| "horrendous monolithic architecture "
|
| I don't really understand what this has to do with "monolithic"
| or not.
|
| Atlassian's software is probably very complex and convoluted
| but from my experience it's almost impossible to keep a clean
| architecture in a software system that has grown over many
| years and is used and customized by many customers so you have
| to avoid breaking backwards compatibility.
| JoBrad wrote:
| It sounds like that's what they are doing, but it's manual.
| outworlder wrote:
| > OK, so you restore backups to a separate system, and
| selectively copy the stomped accounts data back to production
|
| This seems to be exactly what they are doing, as described in
| the article. They don't have automated tools to do this.
| omoikane wrote:
| Not being able to selectively restore data for a subset of
| users might also indicate that users are not fully isolated
| from each other, which is worrying for technical and
| nontechnical reasons.
| indymike wrote:
| There is nothing non-technical that matters. If we start
| acting like it does, we make incredibly poor decisions that in
| fact have nothing to do with physical reality, and quickly
| arrive at unworkable technology.
| omoikane wrote:
| Non-technical reasons include "legal" and "compliance",
| which often matter a fair bit. I am not disagreeing that
| non-technical requirements occasionally lead to poor
| decisions, for some value of poor.
| indymike wrote:
| I live in a state that once tried to legislate that pi =
| 3.15. The results were tragic, and the attempt to
| legislate a ratio was a failure, much like systems
| created by regulation and laws often are. Math is much
| less forgiving than legal prose. Making database
| decisions based on criteria that don't make any
| engineering sense one way or the other is not far off
| from legislating the value of PI.
| ksala_ wrote:
| Personally, given the multi-day outage, I think I would just
| restore everything to a separate system, and then only point
| the affected customers to this new system. Take the hit of
| having two fully separated systems initially, and work on
| reconciling them after without having to worry about the
| ongoing outage.
|
| I wonder if they're not doing this due to some tech
| limitations, to avoid taking the financial cost of running two
| systems, or to avoid having to reconcile the systems.
| bigtones wrote:
| That's a really good idea.
| mandevil wrote:
| At a big multi-tenancy company I used to work at, the problem
| would have been the accessory machines: we had something like
| 15-20 different machines around the main DB and API machines,
| running cron jobs, terminating SSL connections, load
| balancing, sending alerts to us and customer emails out, etc.
| And while the backing up and failing over on DB and API
| machines was a well documented, thoroughly tested process...
| the other machines were all custom jobs that were very poorly
| documented, with who knows what scripts running on them, that
| might or might not be important. Trying to replicate all of
| that during an emergency would have been a challenge.
|
| For just this sort of problem, we actually had three DB
| servers running all the time: active, passive, and _hour
| behind_, with the ability to break _hour behind_'s copying of
| the write-ahead log of active as the DBA's secret weapon for
| just this problem. If all customers had accidentally lost an
| hour's worth of data it would have been embarrassing, but much
| less than completely shutting out hundreds of paying
| customers for two weeks, I think?
| underdeserver wrote:
| > Simple concepts aren't that simple at their scale
|
| It's true that nothing is simple at scale, but it's important
| to note that simple concepts are the _only_ concepts that work
| at scale.
| VWWHFSfQ wrote:
| Most likely the database tables themselves are just a mixture
| of everyone's data. There's no true multitenancy. So they have
| to load the backups into a separate database. Then just go
| through and individually select/insert into the old database.
| And then you have to worry about things like foreign key
| constraints complicating the bulk data loading. Are you going
| to disable constraint enforcement while you bulk load the data?
| How does that affect existing and new data from customers using
| the database? Just a guess. But this sounds like a nightmare
| honestly.
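|
| A minimal sketch of that "restore to a separate database, then
| select/insert back" idea, using SQLite from Python's standard
| library purely for illustration. The two-table schema, the
| tenant_id column and the name restore_tenant are assumptions made
| up for this example; nothing here reflects Atlassian's real
| schema or tooling. Parent rows are copied before child rows so
| foreign key enforcement can stay on during the load.

```python
import os
import sqlite3
import tempfile

# Hypothetical shared-table schema: every row carries a tenant_id.
SCHEMA = """
CREATE TABLE projects (
    id INTEGER PRIMARY KEY,
    tenant_id INTEGER NOT NULL,
    name TEXT NOT NULL
);
CREATE TABLE issues (
    id INTEGER PRIMARY KEY,
    tenant_id INTEGER NOT NULL,
    project_id INTEGER NOT NULL REFERENCES projects(id),
    title TEXT NOT NULL
);
"""

# Parents before children, so foreign keys stay enforced during the copy.
COPY_ORDER = ("projects", "issues")


def restore_tenant(prod, backup_path, tenant_id):
    """Copy one tenant's rows from a restored backup into production."""
    prod.execute("ATTACH DATABASE ? AS restored", (backup_path,))
    try:
        with prod:  # single transaction: all tables are copied or none
            for table in COPY_ORDER:
                prod.execute(
                    f"INSERT INTO main.{table} "
                    f"SELECT * FROM restored.{table} WHERE tenant_id = ?",
                    (tenant_id,),
                )
    finally:
        prod.execute("DETACH DATABASE restored")


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        # The backup, restored into its own database file.
        backup_path = os.path.join(tmp, "backup.db")
        backup = sqlite3.connect(backup_path)
        backup.executescript(SCHEMA)
        backup.executemany("INSERT INTO projects VALUES (?, ?, ?)",
                           [(1, 1, "alpha"), (2, 2, "beta")])
        backup.executemany("INSERT INTO issues VALUES (?, ?, ?, ?)",
                           [(1, 1, 1, "a-1"), (2, 2, 2, "b-1"), (3, 2, 2, "b-2")])
        backup.commit()
        backup.close()

        # Production after the incident: tenant 2 is gone, tenant 1 kept working.
        prod = sqlite3.connect(":memory:")
        prod.execute("PRAGMA foreign_keys = ON")
        prod.executescript(SCHEMA)
        prod.execute("INSERT INTO projects VALUES (1, 1, 'alpha')")
        prod.execute("INSERT INTO issues VALUES (1, 1, 1, 'a-1')")
        prod.execute("INSERT INTO issues VALUES (4, 1, 1, 'a-2 (created after backup)')")
        prod.commit()

        restore_tenant(prod, backup_path, tenant_id=2)
        rows = prod.execute(
            "SELECT tenant_id, title FROM issues ORDER BY id").fetchall()
        prod.close()
        return rows
```

| In this toy case tenant 1's post-backup row survives untouched.
| The hard parts raised above (id collisions, cyclic references,
| live traffic during the load) are deliberately not handled.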
| tetha wrote:
| Yup. The database schema of one of our products uses a
| tenant_id in most tables to separate customers logically.
|
| I've eventually gotten a tenant exporter to work.
| Practically, this requires some deep and nasty digging
| through the information_schema to build a graph of tables and
| foreign key constraints. Once it had that, it generates
| selects with a simple where clause for tables with the
| tenant_id, and selects with weird joins all over the place
| for other tables to dump the tenant data.
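|
| A rough sketch of the graph step described above, not the actual
| exporter. The TABLES dictionary stands in for metadata that would
| be read from information_schema; its layout and the helper names
| are hypothetical. It assumes an acyclic foreign key graph
| (graphlib raises CycleError otherwise) and that every table
| without a tenant_id has at least one foreign key leading to a
| table that has one.

```python
from graphlib import TopologicalSorter

# Hypothetical schema metadata, as it might be read from information_schema.
# table -> tenant_col flag and {local_column: (parent_table, parent_column)}
TABLES = {
    "projects": {"tenant_col": True, "fks": {}},
    "issues": {"tenant_col": True, "fks": {"project_id": ("projects", "id")}},
    "comments": {"tenant_col": False, "fks": {"issue_id": ("issues", "id")}},
}


def export_order(tables):
    """Return tables with parents first, so a dump imports in FK order."""
    graph = {name: {parent for parent, _ in meta["fks"].values()}
             for name, meta in tables.items()}
    return list(TopologicalSorter(graph).static_order())


def tenant_select(tables, name):
    """Build a SELECT that returns only one tenant's rows from `name`."""
    joins, current = [], name
    while not tables[current]["tenant_col"]:
        # Follow the first foreign key upward until a table has tenant_id.
        column, (parent, parent_col) = next(iter(tables[current]["fks"].items()))
        joins.append(f"JOIN {parent} ON {current}.{column} = {parent}.{parent_col}")
        current = parent
    sql = f"SELECT {name}.* FROM {name}"
    if joins:
        sql += " " + " ".join(joins)
    # %(tenant_id)s is a placeholder for the database driver, not Python.
    return sql + f" WHERE {current}.tenant_id = %(tenant_id)s"
```

| For the tables with tenant_id this yields the simple WHERE
| clause; for "comments" it yields a join through "issues".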
|
| All of that sounds complex, but that part took a day or two
| to hammer together to 90% completion, since it's just some
| graph handling. The other 10% were getting some weird date
| formatting questions right to produce a properly importable
| sql dump. And interestingly enough, it's working for more
| than just that one product.
|
| But that's just where the journey started. After that, it
| took weeks and months to sort out legacy tables, old
| tables, tables without indexes, tables no one knew about,
| tables that were important (but not), tables with
| inconsistent data, .... And it's just handling a single
| relational database. And compared to \copy in psql, it's
| slow. And at times, weird things happen if you import huge
| chunks of sql into a postgres with deferred foreign keys
| (because our schema has cyclical references).
|
| Point is, I know how painful it can be to handle that kind of
| database schema, at a ridiculously smaller scale. I'm kind of
| happy to not work there.
| [deleted]
| radicaldreamer wrote:
| I can't believe that they would intermix the data in that
| way... but if they did, godspeed to them, they're likely
| still overpromising what can be done in this time frame.
| mdoms wrote:
| They don't, you're responding to speculation which is just
| outright wrong. Jira and Confluence use single-tenant
| databases, unless something fundamental has changed at
| Atlassian in the past 4 years.
|
| Source: worked at Atlassian, on Jira, 4 years ago.
| robertlagrant wrote:
| Then Atlassian's description of why the restore took so
| long makes no sense to me.
| dabeeeenster wrote:
| How else do you run a multitenancy platform?
| kikki wrote:
| Not quite the same but at Fandom (Wikia), every wiki has
| its own DB (over 300,000 wikis), and they are clustered
| across a bunch of servers (usually balanced by traffic).
| It works well - but we don't ever really need to query
| across databases. There's a bunch of logic around
| instance/db selection but that's about as complex as it
| gets.
| jjice wrote:
| Interesting architecture. From a design point of view, I
| like the idea of full isolation. From an infrastructure
| point of view I'm a little scared. I'd assume it's
| actually not that bad and there's a good way to manage
| the individual DBs and scale them individually.
|
| Really interested if you can share any details.
|
| Edit: I know each wiki is on a subdomain. Does each wiki
| also have its own server?
| kikki wrote:
| There are _many_ databases on each server, last I checked
| there were around 8 servers (or: "clusters") - and we have
| it so the traffic is somewhat evenly distributed across
| each server. There are reasonable capacity limits, and
| when servers get full we spin up a new one and start
| accepting new wikis there. I am not in OPS, and they do a
| lot of work behind the scenes to make this all run
| smoothly - but from an eng perspective we rarely have
| issues with this at scale.
|
| Some of this was open source before we unified all of our
| wiki products, which has a lot of the selection / db
| logic, at https://github.com/Wikia/app.
| spookthesunset wrote:
| How do you update the schema on 300,000 databases?
| msh wrote:
| At minimum separate tables for each tenant.
| Spivak wrote:
| At that point you might as well just do separate schemas,
| it's actually less headache.
| radicaldreamer wrote:
| Sorry, I'm not actually sure... maybe someone who's
| experienced in backend db can elucidate here.
|
| Is it not a good idea to spin up separate db instances
| for each client/company?
| indymike wrote:
| Answer: it depends on the application. For example, a big
| social app is not going to provision a new db for every
| user, or for every customer that runs an ad. Likewise, a
| lot of enterprise software fits a model where each
| customer getting its own db makes sense. So, really,
| just a design decision.
| nemothekid wrote:
| I believe you can sign up an account for free or
| incredibly cheap ($5/user). You would potentially have
| tens of thousands of databases. Imagine trying to do
| something like a database migration to add a column. I
| believe the day to day operations would be a nightmare as
| no RDBMS has probably had that kind of feature stress
| tested.
| some-guy wrote:
| The company I work at (Workday) does this, but it's for
| business / liability reasons.
| robertlagrant wrote:
| Bearing in mind the licence fees of Workday, the costs of
| separate databases pale in comparison!
| andyjohnson0 wrote:
| > Is it not a good idea to spin up separate db instances
| for each client/company?
|
| It depends, really. There is a trade-off in terms of
| software and operational complexity vs scalability/perf
| and isolation. And probably a bunch of other factors.
|
| If you have separate databases for each customer, schema
| migrations can be staged over time. But that means your
| software backend needs to be able to work with different
| schemas concurrently. You can also benefit from
| resilience and isolation guarantees provided by the dbms.
| On the other hand, having a dbms manage lots of databases
| can affect perf. Linking between databases can be a
| minefield, especially w/r/t foreign keys and distributed
| transactions.
|
| https://docs.microsoft.com/en-us/azure/azure-
| sql/database/sa...
| Spivak wrote:
| > But that means your software backend needs to be able
| to work with different schemas concurrently.
|
| Not if you're truly multi-tenant and each customer has
| their own app servers. Then your code and schema version
| are always in lock-step.
| andyjohnson0 wrote:
| True. But then you have an additional problem ...
| truffdog wrote:
| Separate DB instances don't scale as well cost-wise,
| and generally means onboarding takes a few minutes
| instead of being instant. It is very common though.
| Spivak wrote:
| The solution that satisfies everyone is having a separate
| _schema_ per customer and a number of database clusters.
| Then each customer is assigned to a particular cluster.
| Always make sure you have excess capacity on your pool of
| clusters and onboarding is still instant.
| brightball wrote:
| There are basically two options for multi-tenancy with
| their own tradeoffs.
|
| 1. An account/tenant_id field for each table
|
| 2. A schema for each tenant wrapping all of the tables
|
| Option 2 gives you cleaner separation but complicates
| your deployment process because now you have to run every
| database change across every schema every time you
| deploy. This gets more complicated while your code is
| deploying: the code and schemas can get out of sync, or
| there can be a rollback or an error mid-deploy due to an
| issue with some specific data.
|
| The benefit of the approach is the option to do different
| backup policies for different customers, makes moving
| specific customers to specific instances easier and you
| avoid the extra index on tenant_id in every table.
|
| Option 1 is significantly easier to shard out
| horizontally and simplifies the database change process,
| but you lose space on the extra indexes. Plus in many
| databases you can partition on the tenant_id.
|
| Most people typically end up with option 1 after dealing
| with or reading horror stories about the operational
| complexity of option 2.
| outworlder wrote:
| Option 2 has many unforeseen consequences.
|
| Business wants to run a query across customers? In most
| DBs you need either custom code or to create a stored
| procedure to iterate across schemas.
|
| Every table that you create is multiplied by the number
| of customers. This has implications for some database
| systems (like PG's vacuum).
|
| Your migrations will take _forever_ to run.
|
| Etc.
| Spivak wrote:
| The second problem is mitigated by the fact that schemas
| are trivially migratable between database servers. Once
| you grow too big for one cluster just make another.
| eropple wrote:
| The secret bomb in option 1 is that you generally have to
| have smarter primary keys that fully embrace multitenancy
| and while Atlassian hires smart folks and I'm sure they
| at some level know this--that's a relatively hard
| retrofit to work into a system.
| [deleted]
| codingdave wrote:
| It is like any other architectural choice - there are
| pros and cons both directions. If you have separate db
| instances, you have to scale up the operations to manage
| each one - migrations, scripts, etc need to be either run
| against them all, or you need good tooling in place to
| automate it. A single instance avoids all that, but is
| more complex in the actual software and definitely more
| complex for security. A single DB also would let you
| share data amongst organizations fairly easily, but
| whether that is good or bad depends on your product. I've
| created and run products both ways, and I like separate
| DBs at small scales, single DBs at medium scale, but
| separate DBs again at huge scale if you also put
| management tooling in place.
| stickfigure wrote:
| I have built multiple multi-tenancy platforms and I never
| create separate databases for each customer. If you have
| separate databases, it's almost impossible to run
| meaningful queries across all of them. That architectural
| choice creates far more headaches than it solves. Usually
| people end up with the split-database architecture when
| they want a quick retrofit for a system that wasn't
| designed with multiple tenants.
|
| I've also had to restore partial data from backups on a
| few occasions when customers fat-fingered some data and
| asked pretty-please to undo. If someone on staff
| understands the system well, it's not hard. I suspect
| Atlassian suffers from a complicated schema and a post-
| IPO brain drain.
| x0x0 wrote:
| I can't believe anyone would do separate databases.
|
| Just wait until a migration doesn't run on 2 of your 400+
| customer databases. Or multi-hour migrations.
| tedunangst wrote:
| Sounds good to me. Now you've got 398 happy customers on
| the new version, and a small scale issue to resolve with
| two customers.
| rectang wrote:
| When all customer data lives in the unified database:
| Just wait until a bug in a query exposes the data of
| customers to each other, creating instant regulatory and
| privacy nightmares for everyone.
| x0x0 wrote:
| With an orm and customer objects to create scoped
| queries, I haven't found this to be a problem. It's also
| very easy to check in code reviews. And not a painful
| issue from, well, the lack of this happening given it's
| an extremely common app design.
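|
| A minimal sketch of what a "scoped query" object can look like,
| without any particular ORM. TenantScope and the issues table are
| invented for this example; real ORMs offer their own scoping
| mechanisms. The point is only that the tenant filter is applied
| in one place instead of being repeated in every query.

```python
import sqlite3


class TenantScope:
    """Wraps a connection so every query is filtered by one tenant_id."""

    def __init__(self, conn, tenant_id):
        self._conn = conn
        self._tenant_id = tenant_id

    def select(self, table, where="1 = 1", params=()):
        # `table` and `where` come from application code, never user input.
        sql = f"SELECT * FROM {table} WHERE tenant_id = ? AND ({where})"
        return self._conn.execute(sql, (self._tenant_id, *params)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE issues (id INTEGER PRIMARY KEY, tenant_id INTEGER, title TEXT)")
conn.executemany("INSERT INTO issues VALUES (?, ?, ?)",
                 [(1, 1, "login broken"), (2, 2, "login broken"), (3, 2, "typo")])

# Both tenants have an issue with the same title; the scope only sees its own.
acme = TenantScope(conn, tenant_id=2)
matches = acme.select("issues", "title = ?", ("login broken",))
```

| A reviewer then only has to check that code goes through the
| scope object rather than the raw connection.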
| sharken wrote:
| It's likely a mixture of all these factors, the brain
| drain could absolutely be responsible.
|
| At least it would not be the first time in history that a
| company has lost the engineering spirit. And instead the
| business people have taken over, so that details like
| disaster plans become less of a priority.
|
| A business person and an engineer will always view risk
| differently, better disaster plans is a kind of insurance
| that is a lot harder to sell when too many business
| people run the company.
| ZetaZero wrote:
| This. It would be an impossible nightmare for every
| account to have their own DB. Hundreds of thousands of
| accounts and databases....
| lelandbatey wrote:
| I worked at a company that architected their multi-tenancy
| in almost exactly this style. In their particular case,
| only a few of the very largest customers had their
| database set aside on their own dedicated instance, but
| every customer did have their own DB with their own set
| of tables. Having worked in that world (every customer
| had their own DB) and on a product where all customers
| had their data intermingled in one gigantic set of tables
| in one giant DB on one logical instance, I'd definitely
| encourage the "every customer gets their own DB".
|
| Giving every customer their own table means you're going
| to need database administrators. For these folks their
| _dedicated_ job was maintaining, operating, and changing
| their fleet of databases, but they were very technical
| and were _amazing_ to work with.
| david422 wrote:
| > I'd definitely encourage the "every customer gets their
| own DB".
|
| Does this extend to services as well? We have a suite of
| (micro) services. Are they all segregated?
| mdoms wrote:
| This is the case. I won't comment on your "hundreds of
| thousands" figure because the number of Cloud customers
| was a closely guarded secret at least when I worked
| there, but yes one DB per tenant, dozens to hundreds of
| DBs per server, and some complicated shuffling of tenant
| DBs when you run into noisy neighbours.
| mh- wrote:
| That makes this prolonged restore process all the more
| confusing, then.
|
| I (and many others) assumed they had to graft in data
| from backups since a full restore would clobber newer
| changes from unaffected customers.
|
| If they're all isolated in their own logical per-tenant
| DBs, I'm really at a loss for what is making restoration
| take 3 weeks for 400 tenants.
|
| I understand if you'd rather not venture into it, but
| care to offer any speculation?
| spookthesunset wrote:
| If they had multi-tenant databases for SaaS it would mean
| either the self-hosted jira instances also had the same
| multi-tenant database schema or they'd have to maintain
| two almost entirely different data access layers for
| cloud vs. on-prem. Since their cloud offering came from a
| historically on-prem codebase, I would expect the easiest
| way to offer cloud stuff is to do a DB per tenant.
| Otherwise there would be a shit-ton of new code that only
| applies for cloud stuff....
| [deleted]
| taeric wrote:
| Wait. Why? This sounds like something that feels hard, if
| you are used to the giant DBs of old. But you can
| probably get many many instances of the smaller databases
| without much trouble.
|
| Would still be some maintenance, don't get me wrong. But
| far from impossible.
| ezekg wrote:
| Imagine the database schema migrations...
| jagged-chisel wrote:
| the good news is by the time you get to the 100th client,
| you'll likely have run into all possible bugs and the
| remaining 6900 will be pretty smooth.
| Spivak wrote:
| Having worked at shops that used this architecture it's
| really not that bad. Can you write the code to do one
| schema migration? Great, now you can do 1000. App server
| boots and runs the schema migrations, drops privs and
| launches the app. Now you've staved off your scaling
| issues from "how to have a db large enough to hold all
| our customer data" to "how to have a db large enough to
| hold our biggest customer's data." Much easier.
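|
| A sketch of the per-tenant migration loop described above, using
| in-memory SQLite databases as stand-ins for tenant databases. The
| MIGRATIONS list and the function names are assumptions for this
| example. Each tenant records its own schema version (SQLite's
| PRAGMA user_version here), so a run that fails partway through the
| fleet can simply be repeated: finished tenants are skipped.

```python
import sqlite3

# Ordered, append-only list of migrations; position + 1 is the schema version.
MIGRATIONS = [
    "CREATE TABLE issues (id INTEGER PRIMARY KEY, title TEXT NOT NULL)",
    "ALTER TABLE issues ADD COLUMN priority INTEGER NOT NULL DEFAULT 0",
]


def schema_version(conn):
    return conn.execute("PRAGMA user_version").fetchone()[0]


def migrate(conn):
    """Bring one tenant database up to the latest version; safe to re-run."""
    for target in range(schema_version(conn) + 1, len(MIGRATIONS) + 1):
        conn.execute("BEGIN")
        try:
            # The step and its version bump commit together or not at all.
            conn.execute(MIGRATIONS[target - 1])
            conn.execute(f"PRAGMA user_version = {target}")
            conn.execute("COMMIT")
        except sqlite3.Error:
            conn.execute("ROLLBACK")
            raise


def migrate_all(tenants):
    """Migrate every tenant; collect failures instead of stopping the fleet."""
    failed = {}
    for name, conn in tenants.items():
        try:
            migrate(conn)
        except sqlite3.Error as exc:
            failed[name] = str(exc)
    return failed


# isolation_level=None: the sketch manages transactions explicitly.
tenants = {name: sqlite3.connect(":memory:", isolation_level=None)
           for name in ("a", "b", "c")}
# Tenant "b" has a stray table that makes the first migration fail.
tenants["b"].execute("CREATE TABLE issues (legacy TEXT)")

first_run = migrate_all(tenants)      # only "b" is reported as failed
tenants["b"].execute("DROP TABLE issues")
second_run = migrate_all(tenants)     # "a" and "c" are no-ops, "b" catches up
```

| The application still has to tolerate tenants sitting on
| different versions between the two runs.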
| robertlagrant wrote:
| You can write the code to do 1000 schema migrations, but
| the problem is if you've migrated 40% of them and hit an
| issue. What do?
| spookthesunset wrote:
| One of the many reasons to put good constraints on fields
| and use referential integrity! If you don't let the
| database enforce data validity you are gonna get fucked
| at some point!
|
| source: every single place I've worked at that poo-poos
| referential integrity has a database that is full of
| bullshit that "the application code" never cleaned up
|
| Always use referential integrity. The people who are
| against it almost always are against it for superstitious
| reasons (eg: "it makes things slow" or "only one codebase
| calls it so the code can enforce the integrity"). All it
| takes is exactly one bug in the application code to
| corrupt the whole damn thing. And that bug _will_ happen
| over the lifetime of the product regardless of how
| "good" or "awesome" the programmers think they are....
|
| ... I'll get off my soapbox now!
| oauea wrote:
| You'll quickly run into limitations of how many tcp
| connections you can hold open. Unless you also want to
| run separate app servers for each customer, which will
| cost a lot of $$$
|
| Oh, and just forget about allowing your customers to
| share their data with each other, which most enterprises
| want in one way or another.
| kubami wrote:
| Wait. What? None of the enterprise customers want to
| share data with each other. And definitely not on a DB
| level. That should happen in the business logic.
| lalaithion wrote:
| Lots of companies have consultants, and want to be able
| to share their consulting-related tickets with their
| consultants. And the consultants want one system they can
| log into and see the tickets from all of the companies
| that are hiring them.
| outworlder wrote:
| It would be a nightmarish scenario if you have thousands
| of customers. And completely unnecessary. You can create
| multiple databases and or schemas in a single instance.
|
| Don't do any of the above unless you understand the
| implications.
| hnlmorg wrote:
| Multiple schemas? You don't need every tenant in the same
| schema. However I'm not a DBA by trade so there might be
| some issue with doing this at scale that I'm unaware of.
| doliveira wrote:
| By segregating as much as you can. Definitely not by
| putting everything in a single table. At the very least
| separate databases/schemas with proper permissions so
| there's no chance of data intermixing.
|
| The best would be multiple separate database instances,
| which is not even hard to manage, especially for qualified
| engineers like Atlassian surely has plenty of. The
| problem is business decisions to ignore the tech debt,
| usually...
| akie wrote:
| Now every time you run a database migration, you have to
| adjust N tables - and in Atlassian's case, N is 200000.
| Is that better? It depends. There is no "best" way of
| doing multitenancy.
| doliveira wrote:
| There is a worst way of doing multitenancy, and that is
| sharing a single big table.
| hnlmorg wrote:
| That's just an automation issue. It's not like you have
| to write a bespoke database migration script per DB.
| robertlagrant wrote:
| The bug we are mitigating was also just an automation
| issue.
| hnlmorg wrote:
| It's also pretty easy to foobar up a single DB instance
| if you don't have proper guardrails in place.
|
| Automation wasn't the issue here. It's the symptom not
| the cause.
| doliveira wrote:
| Way easier, actually.
| robertlagrant wrote:
| No, the symptom was the loss of customer data.
| tus666 wrote:
| > OK, so you restore backups to a separate system, and
| selectively copy the stomped accounts data back to production
|
| You don't think that's exactly what they are doing?
| qiskit wrote:
| > However, if they [restore backups], while the impacted ~400
| companies would get back all their data, everyone else would
| lose all data committed since that point
|
| How would they lose committed data? Even after restoring the
| backups can't they run the logs so that everyone is caught up?
| drjasonharrison wrote:
| Are you assuming that they record the events in a way that
| can be played back?
| mh- wrote:
| _(There's a tacit assumption here that the data across
| tenants is commingled in tables, and that's being disputed
| elsewhere in the thread, but playing along..)_
|
| You wouldn't be able to do that without forcing downtime for
| all customers, for the duration it takes to restore the
| snapshot and then replay the logs. Not to mention the risks
| of the process failing somehow
|
| You could narrow the window to just the "replay" portion, if
| you were able to stand up an extra database/infra, to switch
| over to when it was ready. But at some point you'd probably
| still have to go read only to checkpoint the logs and begin
| the replay.
|
| It's of course possible to do something more complicated here
| and stream the changes then eventually enact a failover, but
| this would all be too complex and error prone to introduce in
| their current crisis mode. It's something I'd suggest
| _considering_ when architecting their DR/BCP, but it's too
| late for that kind of elegance (and complexity) now.
| more_corn wrote:
| Yeah, I'm thinking the exact same thing.
|
| Perhaps they don't have the right people on hand to do hard
| things like this.
|
| They also apparently lack an incident response plan since a
| critical component of that is comms to affected customers.
|
| They also lack good practices around preventing human error. It
| should not have even been possible to make the initial mistake.
| It certainly should have involved multiple steps of "are you
| sure" and potentially even review.
|
| Sounds like an operations shit show. Glad it's not my circus.
| robertlagrant wrote:
| They have great practices; they even published them. They
| just didn't follow them here.
| ChrisMarshallNY wrote:
| Heh. We have a Confluence account.
|
| That no one uses.
|
| So we didn't notice.
| teh_klev wrote:
| You probably wouldn't have unless you were in the affected
| subset of customers. This wasn't a total outage, but rather it
| affected a group of users who had been running a legacy
| standalone app called "Insight - Asset Management".
| throwawayHN378 wrote:
| When in doubt I go on LinkedIn and find an engineer that works
| for the company and message them directly.
| rjmunro wrote:
| Does that work? I've sometimes thought about trying it, but
| never actually done so.
| flaviotsf wrote:
| I recommend doing disaster recovery steps for your personal data
| as well, such as Gmail. At one point recently I was creating
| filters to delete bulk messages and - when the filter got
| created, it somehow missed the from:@xyz.com domain part and I
| ended up deleting => delete forever all emails. I noticed the
| issue right away but it was enough to wipe 2-3 months' worth of
| emails (all of them, even Sent ones).
| Traster wrote:
| I remember finding out one of the senior managers from my company
| ended up as head of software at Atlassian. It was at that point I
| was convinced Atlassian has no idea what the hell they're doing.
| I think this demonstrates the point nicely.
| celim307 wrote:
| After this they might have to boomerang back to your company
| lol
| Cederfjard wrote:
| PSA, because I'm seeing a lot of JIRA in this thread: Since the
| 2017 rebranding, Jira is no longer officially written in all
| caps: https://community.atlassian.com/t5/Feedback-Forum-
| articles/A...
|
| (You can argue how successful it was when people are still using
| the old style in 2022).
|
| It also makes more sense, since Jira is not an acronym, it's a
| truncation of Gojira, inspired by Bugzilla/Mozilla.
| spookthesunset wrote:
| Yeah, I've never typed it as anything but JIRA. Pretty sure my
| auto-complete will vouch for that.
| Vaslo wrote:
| I bet the Shitlassian guy is dancing and singing because of this.
| a-dub wrote:
| i hate deleting things. prefer flags that hide things instead
| (like a boolean deleted flag in an rdbms table).
|
| prevents data integrity issues in relational databases, makes
| debugging easier and prevents disasters.
|
| ideally also include a timestamp, both for bookkeeping and safe
| tools that only remove things that have been soft deleted for
| some time and are safe to delete without compromising integrity
| of anything that is not deleted (this is especially important in
| relational data models)
| jacquesm wrote:
| Better still: a field that registers at what date a record was
| supposedly marked as deleted. Because otherwise you still can't
| bulk recover from an error.
| a-dub wrote:
| yep. but at least in the rdbms case, and probably in all
| cases, a flag (and an index on it) tends to be essential for
| query performance since the state of the flag will appear in
| most, if not all queries.
|
| that's okay though, queries that reference the timestamp can
| be slow since they're housekeeping.
| bombcar wrote:
| The GDPR and various things have made companies more skittish
| in doing things this way, because they get scared.
|
| Perhaps an effective measure would be to create a key that
| encrypts a customer's data, and give them a copy of the key,
| and let them know that after a certain point your copy of the
| key will be deleted, and if they want a restore past that point
| they'll need to provide the key.
| brimble wrote:
| You may as well just delete it, then. I guarantee a high
| percentage of users won't save that key _and_ be able to find
| it later. GH (edit: or similarly nerdy sites) might (might!)
| be able to get away with that, but as soon as part of your
| process is "give the user a cryptographic key" you've just
| guaranteed yourself a support nightmare, with normal users.
| It's why the only cryptographic person-to-person
| communication systems that've been broadly successful haven't
| involved keeping track of _anything_, and don't have a setup
| process more complex than "point camera at QR code".
| bombcar wrote:
| Yeah, you end up in the case where you "officially" cannot
| recover after X, but then you make sure that "accidentally"
| you might be able to recover by keeping copies around
| somewhere ... until someone realizes and you get sued.
| a-dub wrote:
| that's an interesting question, i've given a little thought
| to this multi tenant saas stuff...
|
| not sure if the right way forward is some sort of innovation
| in operating system and software design where people write
| and run apps that feel like single tenant apps attached to
| dedicated per tenant datastores where os and framework magic
| handle per tenant encryption and segmentation (tenant id as
| an os level concept)
|
| or... if it makes more sense to encrypt at the record level
| with keys that only the customers hold using (assuming it's
| up to the task) homomorphic encryption for things like
| searches and other backend functions.
|
| either way, for now, soft deleting and following up with an
| automatic daily hard delete of things soft deleted more than
| x days ago is a totally reasonable approach.
|
| ops scripts should require typing "yes i know what i'm doing"
| if someone attempts to hard delete things that have not yet
| been soft deleted.
| bombcar wrote:
| Yeah, soft delete is the way to go in 99.99% of the cases,
| with a system setup to eventually hard delete on some
| schedule (preferably don't hard delete until X number of
| backups have caught the soft deleted data safely, for
| example).
| miketria wrote:
| Hi, this is Mike from Atlassian Engineering. Strongly
| agree with this. I'd say that if you can afford it, don't
| do the hard deletes on a schedule though. You never know
| when there's a system out there referring to soft deleted
| data that fails once the data is hard deleted. Hard
| deletes should feel frightening because they are
| frightening.
| deckard1 wrote:
| > The GDPR and various things have made companies more
| skittish in doing things this way, because they get scared.
|
| They may be scared. But are they scared enough to reload
| every single backup they have, purge the desired records, and
| resave each and every single backup they have? And not also
| worry they will corrupt/break the backups in the process.
|
| GDPR compliance is a mess of contradictions and unreasonable
| asks which all seem to amount to "depends on who you ask."
| yabones wrote:
| What's a good Jira replacement? Redmine? Phabricator?
| OpenProject? Just leaving the jira server alone and hoping
| there's no new and exciting zero-days? One thing is clear, these
| guys are a bunch of cowboys who can't be trusted with any amount
| of data.
| elxr wrote:
| If you're hosting your code on GitHub, then GitHub projects is
| definitely worth using.
|
| Does everything I used to use Jira for, but feels more modern
| and lightweight. Also, it has dark mode.
| dzikimarian wrote:
| I'm in the same boat. Currently the best choice seems to be
| YouTrack, which has a reasonable licensing model for the
| self-hosted option.
| 420official wrote:
| I'm not at all familiar but a tweet linked from the OP and
| written by the author plugs https://linear.app/
| gkoberger wrote:
| Linear is phenomenal. Probably built for a different audience
| than Jira (it's like Superhuman for tickets), but if you want
| something that works well and is opinionated I highly highly
| recommend it.
| bloopernova wrote:
| Linear has a dark mode. I'm already won over! ;)
| kitsune_ wrote:
| Gitlab would be enough for engineering teams
| nicoburns wrote:
| We switched from JIRA to Shortcut https://shortcut.com/
| (formerly Clubhouse), and I'd highly recommend them. It's much
| better than JIRA ever was, both from a UX perspective and an
| implementation/performance perspective.
| _dark_matter_ wrote:
| _Bugzilla_
| [deleted]
| originalvichy wrote:
| For pure engineering teams it's either Gitlab or Azure Devops.
| Those are the most common competitors I hear about. If you have
| non-engineers the choice gets trickier.
| histriosum wrote:
| I've used Request Tracker for years. It's not pretty, it's
| written in Perl, but I can fairly easily make it do all the
| ticket tracking flows I care about and it just runs and runs
| and runs. My scale is admittedly small, but I put tens of
| thousands of tickets per year through my instance, and I
| basically never have to touch it unless I'm setting up a new
| queue or different flow for something.
| beardbound wrote:
| Wow, I've never seen anyone mention RT here. I used it for
| years when I was working IT for my university while in
| undergrad. It worked pretty well. It didn't have a lot of
| features but it allowed clients/customers to respond to
| tickets via email which was pretty cool at the time (late
| 00s). It also ran pretty fast on the terrible servers we had
| it on.
| WC3w6pXxgGd wrote:
| Tomsilverberry wrote:
| jacquesm wrote:
| I suspect - pure speculation - that they _can't_ restore the
| backups, because if they could then they could easily do this in
| a way that accounts affected could be restored selectively. In
| other words: test your backups, if you don't they won't be there
| for you when you need them.
| digital79 wrote:
| ordiel wrote:
| All I can say as an Atlassian Server products user is that the
| moment they said it was Cloud or nothing, I chose nothing.
|
| I would much rather run Gitea on a Raspberry Pi that I CONTROL
| than be left powerless, able to do nothing for more than a week.
| Plus, having worked at cloud companies and having been asked to
| "collect customer data" to hand over to the government, I would
| NEVER move critical pieces to anyone else's infra...
|
| (Note: I am not supporting crime, but I would rather have privacy
| and criminals than live in an authoritarian regime where a
| dictator who knows everything about everyone keeps "peace".... Yes
| I am looking at you China!)
|
| If mistakes will be made, at least I won't pay others to make them
| for me....
| Phelinofist wrote:
| As I understood it is not "Cloud or Nothing" but "Cloud or Data
| Center" - is this wrong?
| rsstack wrote:
| The on-prem offering of Atlassian was discontinued. Existing
| contracts are being honored but as of March 2022, that's the
| end of the line for it. Maybe it will be revived now.
| grnmamba wrote:
| Unlike server, Data Center starts at 42.000$ per year.
|
| For most SMBs, it's cloud or nothing (or a different vendor,
| of course).
| yabones wrote:
| AFAIK the Datacenter pricing starts at 500 users and goes up
| from there. So a small org could end up paying 5-10x what
| they were before on the Server license.
| callamdelaney wrote:
| Where do you think the cloud lives?
| kache_ wrote:
| return to monke
|
| vi your todolists on an ec2 box
| [deleted]
| mc4ndr3 wrote:
| They never heard of beta testing, rolling updates, infrastructure
| as code, federation, customer isolation, or Public Relations.
| What the heck.
| parentheses wrote:
| A case for reducing complexity of software. Also, given the
| recent GitHub incident spree, it's almost debilitating. The
| entire tech industry takes a hit when companies like these fail
| at operations.
| oldshatterhand wrote:
| Random guess, that this is a "we say we make backups, but we
| actually take snapshots" issue :)
| luckydata wrote:
| so this is the end of Atlassian as a company right?
| vinay_ys wrote:
| Depends. Are there strong alternate products to which customers
| can easily migrate in the next 6-12 months? If yes, and they
| choose to move away, then Atlassian will be in serious trouble. I
| wonder how many of their customers have long-term locked-in
| contracts and if they have performance clauses that allow them
| to exit such contracts.
| lifefeed wrote:
| Eh. The Exxon Valdez oil spill is a case study in the failure
| of crisis management, but Exxon weathered it. It's a vastly
| different industry with huge "economic moats," but it does
| point to the fact that a company can weather a crisis.
| raincom wrote:
| I don't think so, as long as investors hold the stock, as long
| as customers keep paying Atlassian.
| function_seven wrote:
| I had the same initial thought. _Surely_ a weekslong outage
| would drive customers away permanently, right?
|
| Nope. From TFA:
|
| > _I asked customers if they would offboard Atlassian as a
| result of the outage. Most of them said they won't leave the
| Atlassian stack, as long as they don't lose data. This is
| because moving is complex and they don't see a move would
| mitigate a risk of a cloud provider going down._
| luckydata wrote:
| it doesn't happen overnight, but this is a really bad
| precedent and it will definitely have an effect on both sales
| AND renewals. This market is theirs to lose and seems they
| are doing everything they can to do just that. Github is
| getting better, and it has mindshare amongst developers, not
| to mention it's part of a company that like it or not knows
| how to sell to large enterprises (Microsoft).
| case0x wrote:
| Why would it? On our end everything works fine. If you're not
| one of the 400 companies, there's no difference
| asah wrote:
| "first they came for ..."
| gmfawcett wrote:
| Poor taste, buddy. Comparing the Atlassian mess-up to the
| Holocaust diminishes the Holocaust.
| asah wrote:
| um... the sentiment is universal it's not specific to
| that particularly awful history. Sorry if it triggered
| you, HN doesn't offer a delete button.
|
| FYI my ancestors fled oppression on both sides and I'm
| well aware that it's a miracle I'm alive.
|
| Again, one bad thing leading to another is a common human
| behavior, and the Holocaust is just an extreme example
| that I ABSOLUTELY did not intend whatsoever. You make
| this connection, not me.
| gmfawcett wrote:
| If I'm the one making this connection, then it should be
| trivial for you to finish your sentence. "First they came
| for..." Who are the Jews in your analogy? Who are the
| communists? the trade unionists? And who is the
| totalitarian regime?
|
| Suggesting that Niemoller's poem is about "one bad thing
| leads to another" is like suggesting that Anne Frank's
| diary is about "sometimes girls have really bad days." I
| understand you didn't mean any offense to anyone. But
| that's not a license to be offensive, and then duck for
| cover.
| [deleted]
| foobiekr wrote:
| Severe operational issues don't give you pause?
| dividedbyzero wrote:
| Even those 400, especially Jira is crazy popular with a lot
| of scrum masters and the scrum crowd in general. I could see
| some of those 400 stick with Jira even after this shit show
| if only to avoid losing all their scrum masters.
| openknot wrote:
| Yep. The vast majority of users don't follow these outages
| (aka don't browse forums like Hacker News or r/sysadmin), and
| thus aren't aware of them.
|
| Many of these users are decision-makers who decide what tools
| to use, and will continue to use Atlassian out of inertia due
| to lots of existing documentation on the tool (this is
| compounded by not knowing about the outages, or not knowing
| the severity of the outages), and also because large,
| professional companies use their tools too.
|
| I don't necessarily agree with the perspective to stay with
| it, but it uses a lot of political capital/innovation
| tokens/goodwill/etc. to change systems, when there are
| usually higher-priority things to do (than to get buy-in to
| switch).
| knbrlo wrote:
| My current employer uses Jira but we seem to have not been
| affected by this. Hopefully those customers affected are able to
| press Atlassian for improvements from notification time, backups,
| usability etc.
| bitwise101 wrote:
| This talk from Atlassian aged well
| https://conferences.oreilly.com/software-architecture/sa-eu-...
| danuker wrote:
| I am tired of survivor-biased "best practices" advice. I wonder
| which practices contained there are the _worst practices_.
| nitinagg wrote:
| Selectively restoring data only for certain rows is super hard.
| But the communications by Atlassian have been the worst I have
| ever seen in the industry.
| raincom wrote:
| So, it must be a bad idea to shove the data of multiple
| customers in a single table controlled by some column name
| ('tenant').
| profmonocle wrote:
| I actually got an email from our Atlassian contact just the
| other day encouraging us to switch to their cloud service.
| Crazy that no one thought to pause those. (I assume it _must_
| have been scheduled.)
| HeyLaughingBoy wrote:
| This article on HN is the _only_ time I've even heard that
| Atlassian was having a problem. I suspect that 99% of the
| tech "community" has absolutely no idea this is happening.
|
| We use Jira, but it's self-hosted for my team. Maybe other
| teams that have transitioned to the cloud version are aware
| that there's a problem, but I haven't heard about it.
| LadyCailin wrote:
| Apparently the self hosted version goes out of support in
| 2024, so there will only be cloud hosting. Dumb dumb dumb.
| mcintyre1994 wrote:
| It's only 400 teams affected, but from this article it
| sounds like they're all really big ones.
| seanwilson wrote:
| > Selectively restoring data only for certain rows is super
| hard.
|
| What's the right way to structure your data that would make
| restoring more straightforward here? Is this backup/restore
| scenario niche, or should they have designed for it?
| inopinatus wrote:
| in theory, shard your customer databases 1:1, job done. alas,
| in practice, many SaaS compromise this two ways:
|
| a) overwhelmed by creeping featuritis, each customer's data
| has relationships to global tables, and
|
| b) they backup their entire database cluster in one snapshot
|
| and there may be other gotchas for restoration, like relying
| on denormalized views and caches that have to be rebuilt.
| they may also have erroneously assumed that data protection's
| main value driver is whole-of-system disaster recovery, which
| can lead to pathologies such as "we don't have a single-
| customer restoration tool".
|
| this is not a niche scenario
| bpicolo wrote:
| Heck, it's worse now - if your data deletion tooling did a
| good job, there are dozens or hundreds of microservice
| databases to restore.
| seanwilson wrote:
| > shard your customer databases 1:1
|
| What are the downsides to this?
| inopinatus wrote:
| * makes it much harder to distribute your tables by any
| other factor, for whatever reason (usually performance,
| sometimes archival)
|
| * disaggregates data that the SaaS might be interested in
| querying/updating as an aggregate
|
| * not all ORM frameworks handle this case well, if at all
|
| * dumps are more than a single trivial command
|
| basically all your data operations gain an additional
| dimension of complexity, and you may not perceive the
| benefits until much later
| seanwilson wrote:
| Would it be fair to estimate that the majority of SaaS
| companies aren't sharding like this then? Seems like a
| lot of downsides that impact everything often except for
| backups, which you'd restore rarely.
| mypalmike wrote:
| Per-customer is a common sharding strategy for noSQL
| databases, so it may not be entirely uncommon.
| darkwater wrote:
| All of your points (minus maybe the first one) should be
| "easily" solved/implemented in a company the size of
| Atlassian, and maybe there are newer customers sharded
| like this already. IMO what happened in this case is
| basically tech debt that is now being paid with a loooot of
| interest.
| deckard1 wrote:
| > not all ORM frameworks handle this case well, if at all
|
| typically this is probably for internal
| reporting/metrics. But yeah, a custom script with direct
| SQL is in order. Personally my opinion is avoid ORM at
| all costs. Never seen a benefit that wasn't trivially
| done in SQL, and the downsides are incredibly painful.
|
| The big downside of sharding out, per customer, is that's
| a _lot_ of databases to migrate on upgrades. Or rollback
| if shit hits the fan.
|
| The upside? You can have customers on different versions
| of your app if you really wanted to do such a thing.
|
| In any case, proper tooling goes a _long_ way to making
| it the difference between wonderfully manageable and
| torturous nightmare. Think idempotent backup scripts that
| are capable of failing at any time and resuming where
| they died, etc.
| oauea wrote:
| Work out a relationship graph and automate the export/import
| ollien wrote:
| As someone who has never had to perform this kind of recovery:
| why is it so hard?
| jacquesm wrote:
| Because it is very difficult to maintain relational integrity
| during a restore like that.
| ollien wrote:
| Gotcha. I guess you could be heavy-handed and disable
| foreign key checks, but who knows what other bugs that
| would bring into the mix.
| teling2 wrote:
| The other difficulty is if you don't restore the entire
| state in a single transaction. Imagine you have partial
| data restored in Table A but haven't updated Table B
| correspondingly. Now some other program that consumes
| Table A and Table B and doesn't have error handling will
| crash (or worse, mutate state in other weird ways).
| jacquesm wrote:
| That _is_ relational integrity.
| miketria wrote:
| Hi, this is Mike from Atlassian Engineering. You are right the
| communications from us have not lived up to our standard. We
| will focus on this specifically once we restore service and get
| the post incident review out there. More details here:
| https://www.atlassian.com/engineering/april-2022-outage-upda...
| lallysingh wrote:
| Spamming HN isn't helping your cause man.
| [deleted]
| jacquesm wrote:
| It is, but between 'hard' and 'impossible' there is the nagging
| question of whether you actually really still _have_ that data.
| chousuke wrote:
| If the database schema for Jira on the cloud is anything like
| the Datacenter version, I'm not surprised they're having a hard
| time restoring data. I once tried to figure out how to find
| duplicate / redundant project schemas by querying the database
| (the required APIs are cloud-only) and could not even find
| which tables stored half the data, never mind how they referred
| to each other.
| duxup wrote:
| As this continues I suspect that this might be one of the few
| times where a lack of transparency / good communication really
| ... might not be better or worse because the situation is so
| bad that transparency would be horrible just the same.
|
| Granted that's how all lies start / what sometimes people
| assume and they're wrong but ... maybe this is that time?
|
| Maybe it is in fact so bad that honesty would be a push or
| worse?
| adamc wrote:
| If so, that itself would be a huge red flag for dealing with
| Atlassian.
| duxup wrote:
| I think it is...either way.
| tmpz22 wrote:
| It's super hard no doubt but I wonder how much of the data was
| hot vs cold.
| abraae wrote:
| This is extremely poor for a large SaaS company.
|
| A standard RFP question for SaaS should be:
|
| - Can you restore data for a single customer, and if so, what is
| the RTO for that operation?
|
| A smaller SaaS could be excused for only thinking about full
| database restores. When you're a scrappy upstart, thinking about
| hypotheticals is less important than survival.
|
| But for any decent size multi-tenanted SaaS, it's imperative that
| you have the ability to selectively restore individual customers.
|
| The usual approach is to do a full database restore into a
| separate instance, then run your pre-prepared "restore customer"
| scripts to extract a single customer's data from there and pump
| it across your prod instance. In Oracle for example you might use
| database links to give your restore code access to prod and also
| the restore instance at the same time.
|
| Atlassian - MUST DO BETTER.
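|
| A minimal sketch of the "restore into a separate instance, then
| copy one customer across" approach described above. SQLite's
| ATTACH stands in for Oracle database links here, and the
| `projects`/`issues` tables, the `tenant_id` column, and
| `restore_tenant` are hypothetical names chosen for illustration.
| Doing the copy in one transaction also avoids exposing a
| half-restored tenant to other readers.

```python
import sqlite3

# Hypothetical schema: every tenant-owned table carries a tenant_id column.
# Parent tables are listed before child tables so references stay valid.
TENANT_TABLES = ["projects", "issues"]


def restore_tenant(prod, backup_path, tenant_id):
    """Copy one tenant's rows from a restored backup file into prod.

    Assumes prod has no open transaction when called, because SQLite
    does not allow ATTACH or DETACH inside a transaction.
    """
    prod.execute("ATTACH DATABASE ? AS restored", (backup_path,))
    try:
        # The connection context manager commits on success and rolls
        # back on any exception, so the restore is all-or-nothing.
        with prod:
            for table in TENANT_TABLES:
                prod.execute(
                    f"INSERT INTO main.{table} "
                    f"SELECT * FROM restored.{table} WHERE tenant_id = ?",
                    (tenant_id,),
                )
    finally:
        prod.execute("DETACH DATABASE restored")
```

| A real restore would also have to handle rows that reference
| global tables, caches, and derived views, which is where most of
| the difficulty discussed in this thread lives.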
| scottlamb wrote:
| Is it standard for a RFP to have a long list of questions like
| this? I've never been involved in an RFP from either side.
|
| Is it standard (in addition or instead) to have something
| more general/forward-looking like: how do you watch other
| providers' postmortems and apply the lessons to your own
| system?
|
| > - Can you restore data for a single customer, and if so, what
| is the RTO for that operation?
|
| If I were to aim something at this specifically, it'd be: can
| you restore data for N customers or N% of customers, and if so,
| what is the RTO for that operation?
|
| I mentioned in another comment that Gmail had a similar outage
| in which they had to restore from tape.
| https://news.ycombinator.com/item?id=31017160 They had a tool
| for restoring a single account but not for restoring N accounts
| in bulk, which would be significantly more efficient than doing
| the one-account process N times. (E.g., in the case of tape
| backups, imagine the difference between pulling data from the
| tape library sequentially for each user vs all N at once,
| particularly when one tape may hold data for many of these
| customers.)
| taude wrote:
| Yes, pages of them. Multiple pages of security questions,
| ciphers used, how data is stored, when is it encrypted, etc.
| I filled out a 20 pager once. As the company got better and
| more mature, we had a bunch of canned answers to make it
| easier and faster....
| drsim wrote:
| Which SaaS platforms provide account-level restores?
|
| If you contact them and say "please restore our data to as it
| was last week" those I know do not offer this.
| boardwaalk wrote:
| I wouldn't expect them to advertise such a thing, but the
| question is "can they recover from their own mistakes" not
| "can they recover from mine." I don't care if this is with an
| "account-level restore" or whatever; it shouldn't be my
| concern.
| fknorangesite wrote:
| I wouldn't expect it if I just asked. I think it's reasonable
| as part of their disaster recovery though.
| Tobani wrote:
| I accidentally built out this feature at a company once and
| it totally saved our asses a week later.
| hotpotamus wrote:
| I actually did this once with Dropbox, though it wasn't a
| feature they actually published. I clobbered my Dropbox
| directory accidentally, but I was able to find a script
| someone wrote to roll it back to a previous point in time and
| it worked quite well. After that I also took my own snapshots
| just in case.
| MapleWalnut wrote:
| Dropbox support can rollback your Dropbox account to a
| previous point in time too.
| ibejoeb wrote:
| I did. It was a first-principles architectural decision. A
| client could request any point-in-time within the contracted
| period, and it could be either a restoration or a fully
| operational, parallel instance of the account.
|
| It was initially a cover-my-own-ass design, but it turned out
| to be an extremely popular feature that was never even used
| for disaster recovery. Instead, it was used for audit
| support, trial scenarios, projections, and all kinds of other
| stuff.
| Animats wrote:
| Rather, customers must stop using Atlassian cloud services.
| imroot wrote:
| Which is becoming more and more difficult due to them
| focusing on Cloud Products (my on-prem renewal jumped almost
| 8x this year).
|
| I'd rather use request tracker or bugzilla over Atlassian
| these days
| 1970-01-01 wrote:
| Interesting note: Atlassian stock (NASDAQ: TEAM) is up 4% as of
| noon today.
| radicaldreamer wrote:
| It might be a good short opportunity... I imagine a lot of
| customers are kicking off their own internal process for
| migrating away from JIRA. By the time they actually do, it'll
| be at least a couple of quarters from now, which is when the
| customer hit will start materializing in quarterly results for
| the company.
|
| Maybe time to throw a few chips at some long term puts?
| eli wrote:
| Aren't most customers in 12+ month contracts? A migration
| seems like it would take many months to select a new vendor
| and migrate regardless. Be careful about the date on those
| puts. It's pretty hard to out-think the market on this kind
| of stuff. I'd just as soon bet the other way: few customers
| will _actually_ churn and in 6 months this won't really
| matter.
| __app_dev__ wrote:
| They might even get some new customers after people who
| never used it look at their site and offerings.
|
| Disclaimer I have puts that expire 4/22 (purchased
| yesterday) so I hope they go down in the short term. Seems
| like a total loss now after being up 50% yesterday.
| dahdum wrote:
| I wouldn't short. They just slapped 400+ customers and likely
| hundreds of thousands of users in the face and the C-suite
| didn't think it was important to even acknowledge.
|
| That might look like incompetence, but I think it's
| confidence. They know the switching costs for large orgs are
| so high they can treat these people like trash and few if any
| will leave. I wouldn't be surprised if the total number of
| seats among affected customers has gone up in a few months.
| By failing to acknowledge the problem they've kept it out of
| the mainstream media and financial press.
|
| They have their customers by the balls and don't respect
| them. That's a short term bullish signal to me.
| Iolaum wrote:
| Given how out of sync tech people are with the general
| population I'd be tempted to think it's a buy opportunity.
| Time will tell.
| __app_dev__ wrote:
| I bought puts yesterday morning. Was up 50% by the end of day
| but now down to 50% of what I paid.
|
| Mine expire 4/22 but I have more calls open at the moment
| anyways so if I had to choose between this going down or the
| market up I'll take a full loss on these puts (seems likely
| at the moment)
| mkl95 wrote:
| The fact it's been so long and they still haven't revealed and
| explained the root cause of the outage is going to make it hard
| to regain trust in their buggy, slow tools. The bright side of
| the incident is that competitors that somewhat care about users
| have a unique opportunity to stand out.
| pgwhalen wrote:
| > The fact it's been so long and they still haven't revealed
| and explained the root cause of the outage
|
| They did last night:
| https://www.atlassian.com/engineering/april-2022-outage-upda...
| hu3 wrote:
| > Faulty script. Second, the script we used provided both the
| "mark for deletion" capability used in normal day-to-day
| operations (where recoverability is desirable), and the
| "permanently delete" capability that is required to
| permanently remove data when required for compliance reasons.
| The script was executed with the wrong execution mode and the
| wrong list of IDs. The result was that sites for
| approximately 400 customers were improperly deleted.
|
| Ouch. I hope no one person got the blame. This is a systemic
| failure. Regardless, my regards to the engineers involved.
| gtm1260 wrote:
| Right? The way this reads it seems like one person set a
| flag incorrectly, something I'm sure we've all done
| numerous times. And there were no checks down the line to
| catch it.
| miketria wrote:
| Hi, this is Mike from Atlassian Engineering. You are
| right that the checks need to improve to reduce human
| error, but that's only half of it. I don't see this as
| human error though. It's a system error. We will be doing
| some work to make these kind of hard deletes impossible
| in our system.
| TheJoeMan wrote:
| I suppose that's why you don't combine a tazer and gun into
| 1 device with 2 triggers.
| mrits wrote:
| If you have a 3rd trigger where the gun turns on the user
| it would be fairly safe.
| femiagbabiaka wrote:
| the problem is that sometimes that gun looks like a
| taser.
| dylan604 wrote:
| Instead, you make it with one trigger and a PRNG that
| decides which gets activated. Just hope you've chosen the
| right PRNG!!
| tadfisher wrote:
| I will then write a script that calls your script with the
| PRNG of my choice: PRNG1 always returns "trigger 2", and
| PRNG2 always returns "trigger 1". This detail will be
| documented in Confluence.
| rubyist5eva wrote:
| Considering American police can't even seem to get it
| right when they have two distinct firearms, and are
| trained to holster them on specific sides so they know
| what they are grabbing - and still manage to f*ck it
| up....this might be an improvement.
| hinkley wrote:
| If coding is theatrical then ops is operatic. You have to
| telegraph stuff so over the top that the people in the
| cheap seats know what's going on.
|
| I think what we've lost in the post-XP world is that just
| because you build something incrementally doesn't mean it's
| designed incrementally (read: myopically).
|
| My idiot coworkers are "fixing" redundancy issues by adding
| caching, which recreates the same problem they're
| (un?)knowingly trying to avoid, which is having to iterate
| over things twice to accomplish anything. They've just
| moved the conditional branches to the cache and added more.
|
| Most of the time, and especially on a concurrent system,
| you are better off building a plan of action first and then
| executing it second. You can dedupe while assembling the
| plan (dynamic programming) and you don't have to worry
| about weird eviction issues dropping you into a logic
| problem like an infinite loop.
|
| More importantly, you can build the plan and then explain
| the plan. You can explain the plan without running it. You
| can abort the plan in the middle when you realize you've
| clicked the wrong button. And you can clean up on abort
| because the plan is not twelve levels deep in a recursive
| call, where trying to clean up will have bugs you don't see
| in a Dev sandbox. Deleting 500 users...
|
| Versus Permanently deleting 500 users...
|
| Maybe with a nice 10 second pause (what's an extra ten
| seconds for a task that takes five minutes?)
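|
| A rough sketch of the "build the plan, explain it, then execute
| it" idea above. `Step`, `build_plan`, `explain`, and `execute`
| are hypothetical names, and the action strings are assumptions
| made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    action: str   # assumed values: "soft_delete" or "hard_delete"
    user_id: int


def build_plan(requests):
    """Turn (action, user_id) requests into an ordered plan without duplicates."""
    seen = set()
    plan = []
    for action, user_id in requests:
        step = Step(action, user_id)
        if step not in seen:
            seen.add(step)
            plan.append(step)
    return plan


def explain(plan):
    """Describe the plan without running it; permanent deletes go in the header."""
    permanent = sum(1 for s in plan if s.action == "hard_delete")
    if permanent:
        header = f"Permanently deleting {permanent} of {len(plan)} users..."
    else:
        header = f"Deleting {len(plan)} users..."
    return "\n".join([header] + [f"{s.action} user {s.user_id}" for s in plan])


def execute(plan, apply, should_abort=lambda: False):
    """Run steps in order, checking for an abort request between steps."""
    done = []
    for step in plan:
        if should_abort():
            break
        apply(step)
        done.append(step)
    return done
```

| Because the plan is a flat list rather than a deep call stack, the
| list returned by `execute` says exactly which steps ran before an
| abort, which is what cleanup needs.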
| deckard1 wrote:
| I don't want to assume too much, since the details are
| sparse. But I know for a fact that few of my current
| coworkers know a thing about writing tooling code. It's
| becoming a bit of a lost art.
|
| Here's the way such a script should be done. You have a
| dry-run flag. Or, better yet, make the script dry-run
| _only_. What this script does is it checks the database,
| gathers actions, and then sends those actions to stdout.
| You dump this to a file. These commands are executable.
| They can be SQL, or additional shell scripts (e.g.
| "delete-recoverable <customer-id>" vs. "delete-permanent
| <customer-id>").
|
| The idea is you now have something to verify. You can scan
| it for errors. You can even put it up on Github for review
| by stakeholders. You double/triple check the output and
| then you execute it.
|
| Tooling that enhances visibility by breaking down changes
| into verifiable commands is incredibly powerful. Making
| these tools idempotent is also an art form, and important.
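|
| A minimal sketch of a dry-run-only planner in the spirit of the
| comment above. It never deletes anything; it only emits the
| commands a reviewer would read before running them. The record
| layout and the `state` values are assumptions, and
| `delete-recoverable` / `delete-permanent` are the hypothetical
| command names from the comment, not real tools.

```python
import shlex


def plan_commands(customers):
    """Dry run: inspect records and return the commands that WOULD be run."""
    commands = []
    for customer in customers:
        customer_id = shlex.quote(customer["id"])
        if customer["state"] == "deactivate":
            commands.append(f"delete-recoverable {customer_id}")
        elif customer["state"] == "compliance-purge":
            commands.append(f"delete-permanent {customer_id}")
        # Any other state produces no command at all.
    return commands


if __name__ == "__main__":
    sample = [
        {"id": "cust-1", "state": "deactivate"},
        {"id": "cust-2", "state": "compliance-purge"},
    ]
    # Redirect stdout to a file, review it, and only then execute it.
    print("\n".join(plan_commands(sample)))
```

| Keeping the destructive verb visible on every line makes a wrong
| execution mode easy to spot in review.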
| krooj wrote:
| This speaks to a lack of operational excellence - when you
| develop a platform like JIRA, Confluence, etc, the
| operational tools required to manage the systems are just
| as important as the features themselves. If all you do is
| pump out features, you're a feature factory and will suffer
| these kinds of issues. There's no reasonable explanation
| for needing a script to do what was described when the
| necessary tooling to generalize such an operation should
| have been in existence.
| drc500free wrote:
| Highlighting the text in any of their lists breaks the page
| in interesting ways, apparently due to some twitter-sharing
| functionality.
| mkl95 wrote:
| > Communication gap. First, there was a communication gap
| between the team that requested the deactivation and the team
| that ran the deactivation. Instead of providing the IDs of
| the intended app being marked for deactivation, the team
| provided the IDs of the entire cloud site where the apps were
| to be deactivated.
|
| So what they are saying is that they are not testing scripts
| at some staging server before running them in production.
| It's wild that they've managed to scale their products so
| much before something like this happened.
|
| I hope they've learnt their lesson and they set up some QA
| process for that stuff.
| notdang wrote:
| it seems that it worked as intended, thus they have a QA
| process. The problem was in the wrong IDs provided and I
| doubt that at their scale they have a staging environment
| that duplicates the customer data.
| dylan604 wrote:
| Would it be bad practice to append values to a GUID type
| of ID that would help a human recognize them? For
| instance, in this specific case they wanted app IDs as
| APP-XXXXX-XXXX-blahblah and CLOUD-XXXXX-blahblah.
|
| I'm not looking to help their specific problems, but this
| is more from a general question I've thought of doing but
| never have done just because I'm sure I'd get laughed at
| for blazing my own trail
| bombcar wrote:
| This is recommended in my experience, but you do have
| some potential issues when a UUID gets reused or
| repurposed.
|
| WHENEVER a human is involved in the chain, UUIDs can be
| suspicious because there's no easy way to verify what it
| is, whereas a human has a good chance of realizing that
| $1,342.34 is probably not a valid date.
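|
| A minimal sketch of the prefixed-ID idea from upthread, in
| Python. The APP-/CLOUD- prefixes are the ones suggested above;
| the helper names (new_id, require_kind, deactivate_apps) are
| made up for illustration and are not anything Atlassian uses.
| The point is only that a job expecting app IDs can refuse site
| IDs before it changes anything.

```python
import uuid

PREFIXES = {"app": "APP", "site": "CLOUD"}  # hypothetical kinds and prefixes


def new_id(kind: str) -> str:
    """Return a UUID4 string tagged with a human-readable kind prefix."""
    return f"{PREFIXES[kind]}-{uuid.uuid4()}"


def require_kind(tagged_id: str, kind: str) -> uuid.UUID:
    """Check the prefix and return the bare UUID, or raise ValueError."""
    prefix, sep, rest = tagged_id.partition("-")
    if not sep or prefix != PREFIXES[kind]:
        raise ValueError(f"expected a {kind} ID, got {tagged_id!r}")
    return uuid.UUID(rest)  # also raises ValueError if the UUID part is malformed


def deactivate_apps(ids: list[str]) -> list[uuid.UUID]:
    """Validate the whole batch up front so one wrong ID aborts before any change."""
    return [require_kind(i, "app") for i in ids]
```

| Passing a CLOUD-... ID to deactivate_apps raises ValueError
| for the whole batch, which is the failure you want when a
| human pasted the wrong list.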
| xeromal wrote:
| I kind of dig it. Something that helps make things
| obvious to a human
| mkl95 wrote:
| > I doubt that at their scale they have a staging
| environment that duplicates the customer data.
|
| If there is no feasible way of replicating their
| production environment somewhere else, then there should
| be some sanity checks in place. Something like "if an
| abnormally high number of customer sites go down during
| the script's execution, kill the script". This is a 20/20
| hindsight approach though, and if Atlassian engineers
| can't solve it, I doubt a random HN user like me can.
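|
| One way to sketch that "kill the script" check in Python.
| Everything here is hypothetical: deactivate and
| is_site_healthy stand in for whatever the real tooling does,
| and the threshold of 3 is arbitrary.

```python
from typing import Callable, Iterable


class TooManyFailures(RuntimeError):
    """Raised when the batch job has broken more sites than allowed."""


def run_with_breaker(
    targets: Iterable[str],
    deactivate: Callable[[str], None],
    is_site_healthy: Callable[[str], bool],
    max_unhealthy: int = 3,
) -> list[str]:
    """Apply `deactivate` one target at a time; abort once too many sites break."""
    done: list[str] = []
    unhealthy = 0
    for target in targets:
        deactivate(target)
        done.append(target)
        # Check health after every single operation, not at the end of the batch.
        if not is_site_healthy(target):
            unhealthy += 1
            if unhealthy >= max_unhealthy:
                raise TooManyFailures(
                    f"{unhealthy} sites unhealthy after {len(done)} operations; stopping"
                )
    return done
```

| With a check like this, a bad input list stops after a
| handful of sites instead of running through the whole batch.
| It only helps if the health check can actually see the damage.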
| tempest_ wrote:
| Is it just me or is highlighting on that site broken?
|
| Perhaps my ad blocker is causing that stupid highlight to
| tweet js they are using to break.
| teh_klev wrote:
| > and they still haven't revealed and explained the root cause
| of the outage
|
| They did, this post by Atlassian from yesterday is referenced
| in the article.
|
| https://www.atlassian.com/engineering/april-2022-outage-upda...
|
| Still doesn't excuse them for the time taken to come clean.
| selimnairb wrote:
| CTO should be fired.
| scottlamb wrote:
| Gmail had a vaguely similar outage years ago. [1] tl;dr:
|
| 1. Different root cause. There was a bug in a refactoring of
| gmail's storage layer (iirc a missing asterisk caused a pointer
| to an important bool to be set to null, rather than setting the
| bool to false), which slipped through code review, automated
| testing, and early test servers dedicated to the team, so it got
| rolled out to some fraction of real users. Online data was
| lost/corrupted for 0.02% of users (a huge amount of email).
|
| 2. There were tape backups, but the tooling wasn't ready for a
| restore at scale. It was all hands on deck to get those accounts
| back to an acceptable state, and it took four days to get back to
| basically normal (iirc no lost mail, although some got bounced).
|
| 3. During the outage, some users could log in and see something
| frightening: an empty/incomplete mailbox, and no banner or
| anything telling them "we're fixing it".
|
| 4. Google communicated more openly, sooner, [2] which I think
| helped with customer trust. Wow, Atlassian really didn't say
| anything publicly for nine days?!?
|
| Aside from the obvious "have backups and try hard to not need
| them", a big lesson is that you have to be prepared to do a
| _mass_ restore, and you have to have good communication: not only
| traditional support and PR communication but also within the UI
| itself.
|
| [1]
| https://static.googleusercontent.com/media/www.google.com/en...
|
| [2] https://gmail.googleblog.com/2011/02/gmail-back-soon-for-
| eve...
| fishnchips wrote:
| Funny enough, most of what we restored then was spam (ex gTape
| SRE, remember the outage).
| Aissen wrote:
| The sad truth is that with 99.8% of customers unaffected, it was
| probably thought to be a minor issue. If those customers didn't
| have Gergely's ear we probably wouldn't have heard about it.
| miketria wrote:
| Hi, this is Mike from Atlassian Engineering. Not a minor issue.
| Once we knew the extent and severity of the incident, we had
| hundreds of engineers engaged and working to restore service.
| tpmx wrote:
| Is there a source on this number?
| Aissen wrote:
| From the article:
|
| > Atlassian claims the customers impacted were "only" 0.18%
| of its customer base at 400 companies.
|
| From https://jira-software.status.atlassian.com/ :
|
| > The team is continuing the restoration process for the ~400
| impacted customers.
| tpmx wrote:
| > The team is continuing the restoration process for the
| ~400 impacted customers. We have restored functionality for
| 45% of impacted users.
|
| If this is truthful it implies more than 400
| impacted customers.
| jgrahamc wrote:
| _Communicate directly and transparently_
|
| Yes. Always.
| politelemon wrote:
| > it takes between 4 and 5 elapsed days to hand a site back to a
| customer.
|
| Atlassian's SLA page says, Premium Cloud Products 99.9%
|
| That's 43 minutes of downtime per month.
|
| That works out to, Atlassian can't have any more downtime for the
| next 14 years. Are SLAs even real?
|
| I'm being slightly facetious. From the page text it's just a
| threshold after which I think you're entitled to some money back
| for that month.
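|
| The arithmetic above, spelled out in Python as a rough check.
| It assumes a 30-day month and uses the 4-to-5-day figure
| quoted from the article; downtime_budget_minutes is just an
| illustrative helper.

```python
def downtime_budget_minutes(availability_pct: float, window_days: float = 30) -> float:
    """Minutes of downtime allowed in a window at the given availability."""
    return window_days * 24 * 60 * (1 - availability_pct / 100)


monthly = downtime_budget_minutes(99.9)  # about 43.2 minutes per 30-day month

for outage_days in (4, 5):
    # How many months of the 99.9% budget does one outage of this length consume?
    months = outage_days * 24 * 60 / monthly
    print(f"{outage_days} days down = {months:.0f} months of budget ({months / 12:.1f} years)")
```

| Five days of downtime is roughly 167 monthly budgets, which
| is where the "14 years" figure comes from.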
| bborud wrote:
| Think of SLAs as "this is how hard we'll scramble when shit
| hits the fan".
|
| Except...I don't even believe that.
| chrsig wrote:
| It's more "this is our contractual obligation, if we're down
| more than this, then we might not charge you"
| dylan604 wrote:
| Lawyers are involved, so I'd assume some text about
| "excluding acts of god, sabotage, etc" to weasel their way
| out of things. They might even be able to get away with
| "acts of incompetence", however a lawyer might phrase that
| to allow their client to weasel.
| mywittyname wrote:
| That's a good way to get executive approval to replace a
| system. Google or Apple can get away with this kind of
| behavior, I doubt Atlassian can.
|
| This outage alone has spurred conversations in slack
| about how terrible JIRA is and why we should replace it.
| If this kind of shit was pulled, I can guarantee we'd be
| on shortcut, linear, or something else in short order.
| [deleted]
| MajorBee wrote:
| > Google or Apple can get away with this kind of
| behavior, I doubt Atlassian can
|
| Atlassian absolutely can in enterprise settings. In my
| company (a large cloud company), if JIRA goes down, large
| swathes of the business will also stall, including code
| deployment (deployments are tracked through change
| management JIRA tickets). We also use the DC version of
| Atlassian products, so presumably we aren't at the
| mercy of Atlassian cloud engineers.
| TheCoelacanth wrote:
| SLA credits are a thing that actually happen in the
| industry. I wouldn't automatically assume that they will
| be able to weasel out of it.
|
| They are typically limited to the amount that you
| actually paid, though, so basically they don't charge you
| for the time when you couldn't use the product. You
| usually won't get more than that.
| [deleted]
| mmcgaha wrote:
| I think of SLAs as how do we design this thing. Ask for a
| system without an SLA and I will give you a system that is
| well designed and almost never goes down. As soon as you ask
| for an SLA, I will give you an over engineered system that
| costs more, takes longer to implement and is slower to
| iterate but it will almost never go down either.
| echelon wrote:
| In some industries, three nines isn't exactly stellar. Every
| service I've worked on recently has demanded five nines of
| uptime and tons of reporting on latency and even seconds-long
| outages.
|
| I've been on-call during a total infrastructure outage whose
| root cause was a service my team owned [1]. Our CEO was aware
| of it. Customers and business partners were aware of it.
| Other CEOs were aware of it. The media, you name it.
|
| Some outages can be "business ending" or "business damaging".
| That's why we made a practice and process of performing
| regular disaster recovery exercises, had exceptionally well
| documented runbooks, had monitoring attached to everything,
| and engineered for resilience.
|
| Though I'm not familiar with how Atlassian runs, I think this
| is an "engineering culture" thing or can be mitigated with a
| proper approach.
|
| [1] The company has only had a few of these in total, and no
| member of our team was culpable for the complicated failure.
| krinchan wrote:
| Per the article, if you experience < 95% uptime in any 30 day
| window you qualify for a 50% discount. On a month or your
| next year or ... ? it doesn't say.
| hinkley wrote:
| Basically not counting lost sales their income for this year
| went down 2%, which is not as big a deal to them as it is to
| their customers.
| 0xbadcafebee wrote:
| The typical SLA has no teeth because even if the customer gets
| their money back, the real harm to the customer may be orders
| of magnitude greater than what they paid for the service. Some
| services are contractual or tightly embedded and you know
| you're not gonna lose the customer if your service goes down
| frequently. If the service provider doesn't lose money or face,
| they aren't motivated to prevent the downtime.
|
| One alternative I thought of is the Charity SLA. The service
| provider pledges to give $5,000 to charity for every minute of
| downtime. Now everyone within the company knows "if we're down,
| we're losing thousands of dollars a minute!" and thus will be
| motivated to ensure the services stay up. But even if the
| services go down, the company's making tax-free donations,
| which isn't really bad for anybody. The company could even have
| a specific downtime goal every year, to make sure their
| monitoring/alerting/runbooks actually work, and to ensure they
| donate every year.
| bluedino wrote:
| > Are SLAs even real?
|
| _Tommy: Here's the way I see it, Ted. Guy puts a fancy
| guarantee on a box 'cause he wants you to feel all warm and
| toasty inside.
|
| Ted Nelson: Yeah, makes a man feel good.
|
| Ted Nelson: But why do they put a guarantee on the box?
|
| Tommy: Because they know all they sold ya was a guaranteed
| piece of shit. That's all it is, isn't it? Hey, if you want me
| to take a dump in a box and mark it guaranteed, I will._
| rglover wrote:
| Haha I needed this, thank you.
| nh2 wrote:
| > it's just a threshold after which I think you're entitled to
| some money back for that month
|
| That is exactly what SLAs are.
|
| There are just a lot of people applying the wishful thinking
| that SLAs are a goal or metric of uptime.
|
| Consider the AWS S3 page on the topic:
| https://aws.amazon.com/s3/sla/
|
| "Reasonable efforts"; if not met, you get some fraction of the
| money back.
|
| S3 has worse uptime than my desktop PC over the last years, but
| affected users got some fraction of their spending back.
| iso1631 wrote:
| > S3 has worse uptime than my desktop PC over the last years
|
| That's sacrilege on HN
| colechristensen wrote:
| SLAs aren't real unless there's a contractual consequence for
| not meeting them.
|
| And a couple of percent discount on services for the extra
| downtime isn't really a meaningful consequence.
| imglorp wrote:
| I was just thinking that there's a hysteresis function here:
| the service is worth much more to your team after you've
| wired your whole process into it than before you joined.
|
| Offering you a free month or whatever doesn't acknowledge all
| the person-hours lost.
| colechristensen wrote:
| There are certainly circumstances where you might have
| grounds to sue for damages if an SLA is breached. I'm not
| sure how often this happens but the losses from something
| like Jira being down could be quite a lot more than anybody
| pays for it. It's quite likely that defenses against
| exactly this are written into the contracts you agree to
| when signing up for the service though.
| dxf wrote:
| >Are SLAs even real?
|
| SLI: Some metric you use to measure a thing (e.g. uptime,
| latency, etc.)
|
| SLO: Some objective you try to hit, as measured by the SLI
| (e.g. "99.99% of requests are processed within 3 seconds")
|
| SLA: A promise to a customer that they will meet some SLO, and
| consequences if they don't. If there aren't consequences for
| not meeting the SLO, then measuring and tracking the metrics is
| a pointless exercise.
|
| The SLA is "real" to the extent Atlassian is adhering to any
| listed consequences.
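|
| A toy illustration of the three terms in Python, using the
| latency example above. The credit tiers are invented for the
| sketch; real SLA tiers are whatever the contract says.

```python
def latency_sli(latencies_s: list[float], threshold_s: float = 3.0) -> float:
    """SLI: percentage of requests processed within the threshold."""
    within = sum(1 for t in latencies_s if t <= threshold_s)
    return 100.0 * within / len(latencies_s)


def meets_slo(sli_pct: float, objective_pct: float = 99.99) -> bool:
    """SLO: the target the SLI is supposed to hit."""
    return sli_pct >= objective_pct


def sla_credit_pct(sli_pct: float) -> int:
    """SLA: the consequence for missing the target (hypothetical tiers)."""
    tiers = [(99.99, 0), (99.0, 10), (95.0, 25)]  # (minimum SLI, credit %)
    for minimum, credit in tiers:
        if sli_pct >= minimum:
            return credit
    return 50  # below every tier: the largest credit in this made-up schedule
```

| The SLI is a measurement, the SLO is a yes/no against it, and
| the SLA is the only part that costs the provider anything.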
| aunty_helen wrote:
| You could use it as a material breach of the contract and
| possibly get out of any arrangement you have with Atlassian.
| inopinatus wrote:
| A typical SLA precludes that by specifying the remedy for
| noncompliance with the performance measure. Only if they
| fail to apply the remedy is there a material breach. For a
| month-to-month SLA, this limits liability to one month's
| subscription, as agreed in black-and-white.
|
| Customers that demand service level agreements often fail
| to recognise that they cut both ways.
| bombcar wrote:
| Most SLAs say "if we miss this, you get time for free" which
| means that these companies will hopefully get a refund ...
| for the time they can't use the service.
|
| SLAs are mostly aspirational.
| hinkley wrote:
| Car warranties are also aspirational/virtue signaling, to
| a point.
|
| If the maintenance costs exceed the margins on the cars you
| lose money. Do that on too many product lines for too often
| and you're looking at bankruptcy. But some makers clearly
| are more risk averse than others, so a 6 year warranty from
| maker X does not translate to a 7 year warranty from maker
| Y.
| mh- wrote:
| But Atlassian's (published*) SLA offers a credit of at
| most 50% of the month.. not really the same as a
| manufacturer warranty on a car, where the costs of
| servicing could easily exceed the price paid for the car.
|
| * - their larger customers will have negotiated SLAs.
|
| edit: to be clear, I expect Atlassian will offer
| concessions beyond their SLA obligations. I'm only
| responding to the comparison.
| towelrod wrote:
| The linked article directly talks about this, at this level
| of downtime customers are promised a 50% discount. That's
| what the SLA means, effectively
| profmonocle wrote:
| > and consequences if they don't.
|
| And these consequences usually just amount to getting some
| percentage of your service fees back. I'm sure the affected
| customers will get their entire monthly Atlassian Cloud fees
| back. Since this is _so_ severe maybe Atlassian will even
| give them credits for some # of free months.
|
| But there's no way the amount they'll get from Atlassian is
| going to come close to what they're losing in productivity by
| not having access to Jira & Confluence. At my company,
| getting an entire free year of Jira wouldn't be worth Jira
| being inaccessible for a week.
| bee_rider wrote:
| Does that indicate it would be preferable to pay more for a
| more reliable solution, if such a thing were to exist?
| Although, it definitely would be hard to quantify 'more
| reliable' there.
| miketria wrote:
| Hi, this is Mike from Atlassian Engineering. For the customers
| impacted by this incident covered by an SLA, we will adhere to
| our contractual terms. However, given the long duration of this
| outage, we are planning to go above and beyond for our impacted
| customers. We are currently focused on restoring service, but
| after that will be discussing how we can make it right for each
| impacted customer.
| encryptluks2 wrote:
| It looks like you are focused on Hacker News comments.
| leeoniya wrote:
| > Atlassian's SLA page says, Premium Cloud Products 99.9%
|
| > That's 43 minutes of downtime per month.
|
| we need a better default way to communicate SLOs than "number
| of 9s", one that is more human. how the status quo has stayed
| this way can only be attributed to intentional dark patterns,
| imho.
| deathanatos wrote:
| ... honestly, even the "number of 9s" concept is a struggle
| for some companies. I've seen a number of SLAs that fail to
| correctly state a unit: it's %/<unit of time>, and I see the
| "unit of time" get dropped every now and then, and the
| resulting thing is meaningless absurdity.
| mc4ndr3 wrote:
| I've yet to work at an office that paid sufficient attention to
| regular backup & restore validation, to scalable design, to
| proper unit testing, or to basic security updates. Upper
| management is repeatedly incentivized to produce vaporware, not
| reliable service.
|
| Suits think a crummy Flash quiz on PII is enough to stop leaks.
| The automotive industry couldn't stop airbags from acting as
| claymores. It's even harder to get good code approved in tech.
| hinkley wrote:
| The longest Atlassian outage _so far_ ...
| anshumankmr wrote:
| I don't get it. JIRA is working for me.
| vvpan wrote:
| Article points out that 400 companies were affected.
| politelemon wrote:
| A subset of their customers are affected (badly). Enumerate
| your blessings!
| [deleted]
| 1970-01-01 wrote:
| Wouldn't you love to see the Atlassian internal JIRA epic for
| this outage?
| captaincaveman wrote:
| A dumpster fire of a company that has terrible communication with
| customers outside of outages as well.
| snarkerson wrote:
| > Most of them said they won't leave the Atlassian stack, as long
| as they don't lose data. This is because moving is complex and
| they don't see a move would mitigate a risk of a cloud provider
| going down. However, all customers said they will invest in
| having a backup plan in case a SaaS they rely on goes down.
|
| The real key lesson here. Your business is important to you. Not
| so much to the service provider.
| hougaard wrote:
| Always judge companies on how they handle a crisis, not on how
| they do when everything runs smoothly.
| escot wrote:
| When doing bulk deletes like this what safeguards do you put in
| place, other than testing the script up/down in another
| environment, turning off app servers etc (which I'm guessing they
| did not do)?
| mh- wrote:
| Depends how complex the query/procedure is.
|
| Naive approach, replace delete with select and see if you're
| surprised at the results.
|
| More mature approach, especially in an environment where
| engineers are running bulk changes against the database, you
| don't do bulk deletes. You change that delete into an update
| that marks things for later collection.
|
| One tactic I've seen that worked, assuming you have
| straightforward relational tables: you add a "marked for
| deletion" column whose value is an identifier for the single
| run of the bulk job you just did. Then you can query rows with
| that value in that column to ensure it had the desired effect.
| If you're satisfied, you run another bulk job which doesn't re-
| run your original query.. it just deletes rows with that
| "marked" value.
|
| Lots of places rely on schema-enforced foreign keys and
| cascading deletes though. In that case, my recommendation is:
| don't.
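|
| The "marked for deletion" tactic, sketched with Python's
| built-in sqlite3 module. The sites table, its columns, and
| the run_id format are assumptions for the example, not
| anything from Atlassian's schema.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sites (id INTEGER PRIMARY KEY, name TEXT, deleted_by_run TEXT)")
conn.executemany(
    "INSERT INTO sites (name) VALUES (?)",
    [("alpha",), ("beta",), ("legacy-1",), ("legacy-2",)],
)

# Step 1: mark instead of delete, tagging rows with this run's identifier.
run_id = f"cleanup-{uuid.uuid4()}"
conn.execute(
    "UPDATE sites SET deleted_by_run = ? WHERE name LIKE 'legacy-%' AND deleted_by_run IS NULL",
    (run_id,),
)

# Step 2: inspect exactly what this run touched before anything is destroyed.
marked = [row[0] for row in conn.execute(
    "SELECT name FROM sites WHERE deleted_by_run = ? ORDER BY name", (run_id,)
)]
print("marked for deletion:", marked)

# Step 3a: if the result is surprising, undo by clearing the mark:
#   conn.execute("UPDATE sites SET deleted_by_run = NULL WHERE deleted_by_run = ?", (run_id,))
# Step 3b: if satisfied, delete by mark only; the original query is not re-run.
conn.execute("DELETE FROM sites WHERE deleted_by_run = ?", (run_id,))
conn.commit()

remaining = [row[0] for row in conn.execute("SELECT name FROM sites ORDER BY name")]
print("remaining:", remaining)
```

| Between steps 1 and 3 the application has to treat marked
| rows as gone (filter on deleted_by_run IS NULL), otherwise
| the review window doesn't tell you much.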
___________________________________________________________________
(page generated 2022-04-13 23:00 UTC)