hngopher.com

       [HN Gopher] Show HN: PlayBooks - Jupyter Notebooks style on-call...
       ___________________________________________________________________
        
       Show HN: PlayBooks - Jupyter Notebooks style on-call investigation
       documents
        
       Hello everyone, Dipesh and Siddarth here. We are building PlayBooks
       (https://github.com/DrDroidLab/playbooks), an open source tool to
       write executable notebooks for on-call investigations /
       remediations instead of Google Docs or Wikis. There's a demo video
       here: https://www.youtube.com/watch?v=_e-wOtIm1gk, and our docs are
       here: https://docs.drdroid.io/docs/playbooks
        
       We were in YC's W23 batch working on a data lakehouse with support
       for dynamic log schemas. Eventually we realized it was a product in
       search of a market and decided to stop building it. When pivoting,
       we decided to work on something that we originally prototyped
       (before even YC) but didn't execute on.
        
       In our previous jobs, we were at a food delivery startup in India
       with a busy on-call routine for backend & devops engineers and a
       small tech team. Often, business-impacting issues (e.g. orders
       dropped by >5% in the last 15 minutes) would escalate to Dipesh, as
       he was the lead dev who had been around for a while and he always
       had 4-5 hypotheses on what might have failed. To avoid becoming the
       bottleneck, he used to write scripts that fetched custom metrics &
       order-related application logs every 5 minutes during peak traffic.
       So if an issue was reported, engineers would check the output of
       those scripts with all the usual suspects first, before diving into
       a generic exploration. This was the inspiration to get started on
       PlayBooks.
        
       We've put together a platform that can help any dev create scripts
       with flexibility and without writing much code. Our goals were:
       (1) it can be automated to run and send updates; (2) investigation
       progress can be shared easily with other team members so everyone
       has the right context; (3) it can all be done without being on call
       or having laptop access.
        
       Using PlayBooks, a user can configure the steps as data queries or
       actions within their observability stack. Here are the integrations
       we currently support:
       - Run bash commands on a remote server;
       - Fetch logs from AWS CloudWatch and Azure Log Analytics;
       - Fetch metrics from any PromQL compatible db, AWS CloudWatch,
         Datadog and New Relic;
       - Query PostgreSQL, ClickHouse or any other JDBC compatible
         databases;
       - Write a custom API call;
       - Query events from EKS / GKE;
       - Add an iFrame
        
       The platform focuses on not just running the tasks but also
       displaying information in a meaningful form with relevant graphs /
       logs / text outputs alongside the steps in a notebook format. Some
       of our users have shared feedback that on-call decision-making
       overload has reduced with PlayBooks, as relevant data from multiple
       tools is presented upfront in one page.
        
       Here are some of the key features that we believe will further
       increase the value to users looking to improve developer experience
       for their on-call engineers:
       - Automated surfacing of PlayBooks against alerts & enriching
         alerts with the above-mentioned data;
       - AI-supported interpretation layer -- connect with an LLM or ML
         models to auto-analyze the data in the playbook;
       - Logs of historical executions to ease the effort of creating
         post-mortems / timelines and/or sharing information with peers.
        
       If this looks like something that would have been useful to you on
       call, or would be in your current workspace, we welcome you to try
       our sandbox: https://sandbox.drdroid.io/. We have added a default
       playbook. Just click on one of the steps in the playbook and then
       the "Run" button to see the playbook in action.
        
       We are excited to hear what you like about PlayBooks and what you
       think could improve the on-call developer experience for your team.
       Please drop your comments here - we will read them eagerly and
       respond!
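
       The "usual suspects" scripts described in the post could be
       sketched roughly as below. This is only an illustration under
       stated assumptions, not code from PlayBooks: the names Check,
       order_drop_pct and investigate are hypothetical, the data sources
       are stubbed with lambdas, and the 5-minute scheduling (cron or
       similar) is left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Check:
    """One 'usual suspect': a named hypothesis plus a function that tests it."""
    name: str
    run: Callable[[], str]  # returns a short human-readable finding


def order_drop_pct(previous: int, current: int) -> float:
    """Percentage drop in orders between two consecutive time windows."""
    if previous <= 0:
        return 0.0
    return max(0.0, (previous - current) / previous * 100.0)


def investigate(previous: int, current: int, checks: List[Check],
                threshold_pct: float = 5.0) -> Dict[str, str]:
    """Run every check, but only when the drop exceeds the threshold."""
    drop = order_drop_pct(previous, current)
    report = {"order_drop_pct": f"{drop:.1f}"}
    if drop <= threshold_pct:
        return report
    for check in checks:
        try:
            report[check.name] = check.run()
        except Exception as exc:  # one failing check must not hide the rest
            report[check.name] = f"check failed: {exc}"
    return report


if __name__ == "__main__":
    # Stubbed checks (hypothetical); real ones would query metrics or logs.
    checks = [
        Check("payment_gateway_errors", lambda: "error rate 0.2% (normal)"),
        Check("db_connection_pool", lambda: "pool at 98% (suspicious)"),
    ]
    for name, finding in investigate(1000, 900, checks).items():
        print(f"{name}: {finding}")
```

       Collecting every finding into one report mirrors the idea of
       checking all hypotheses up front before a generic exploration.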
        
       Author : TheBengaluruGuy
       Score  : 64 points
       Date   : 2024-06-04 12:29 UTC (10 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | vvoruganti wrote:
       | This is really cool! Love seeing more tools to help SREs and
       | hopefully lessen the burden of on-call.
       | 
       | The notebook style interface for logging and taking notes is
       | appealing too.
       | 
       | Seen a similar approach with https://fiberplane.com/
       | 
       | Haven't been able to play around too much but watching the space
        
         | TheBengaluruGuy wrote:
         | Thank you.
         | 
         | If you get a chance to play around, would love to hear your
         | thoughts on it :)
        
       | taeric wrote:
       | Reminds me of
       | https://nathanielhoag.com/blog/2022/interactive-runbook/. Fun
       | space to play in. Good luck on this!
        
         | TheBengaluruGuy wrote:
         | This is quite an interesting evaluation, thanks for sharing. We
         | are piloting with a large enterprise (100+ SREs). Before us,
         | they started implementing Jupyter Notebooks in a similar
         | direction.
         | 
         | Writing one playbook in Jupyter is easy but creating a
         | framework to enable their 100+ product teams to self-serve and
         | create playbooks has been so intensive for them, they even
         | started working on their internal SDK for it.
         | 
         | It was a lot of code and the lead felt like the Jupyter visual
         | interface was harder to follow for instructions/runbooks.
         | 
         | With PlayBooks, we have tried to abstract out the entire
         | execution engine and configuration to an intuitive user
         | experience (our architecture is explained here --
         | https://slender-resolution-789.notion.site/PlayBooks-Documen...
         | )
        
           | anonme4ever wrote:
           | You should check out Nurtch[0] with Rubix integration[1].
           | GitLab has some docs on how to use it[2].
           | 
           | Your project seems nice! I'll give it a try ;-) Only thing,
           | the Jupyter-like part is not clear enough.
           | 
           | 0: https://www.nurtch.com/
           | 
           | 1: https://docs.nurtch.com/en/latest/rubix-library/index.html
           | 
           | 2: https://docs.gitlab.com/ee/user/project/clusters/runbooks/
        
             | TheBengaluruGuy wrote:
             | Thanks for sharing about Nurtch & Rubix, I have come across
             | it before in the GitLab Runbooks.
             | 
             | The Jupyter part is a reference to the cellular execution of
             | tasks as per the preference of the users + being able to
             | get execution / code next to each other. Both have been
             | design principles for us from the get-go.
             | 
             | Just like how variables can be reused across cells in
             | Jupyter, we plan to shortly introduce rules / conditionals
             | creating interdependencies between variables in the
             | PlayBooks steps.
             | 
             | Edit: Adding a sample Playbook link here for reference
             | -- https://sandbox.drdroid.io/playbooks/14
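             | 
             | A minimal sketch of what rules / conditionals between
             | steps could look like, assuming a shared context that each
             | step writes to, much like variables shared across Jupyter
             | cells. Step, run_playbook and the stubbed queries are
             | hypothetical names for illustration, not the PlayBooks API.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

Context = Dict[str, Any]


@dataclass
class Step:
    name: str
    action: Callable[[Context], Any]
    # Hypothetical rule: the step runs only if this predicate holds.
    condition: Optional[Callable[[Context], bool]] = None


def run_playbook(steps: List[Step]) -> Context:
    """Run steps in order; each output is stored under the step's name."""
    context: Context = {}
    for step in steps:
        if step.condition is not None and not step.condition(context):
            continue  # skipped steps leave no entry in the context
        context[step.name] = step.action(context)
    return context


steps = [
    Step("error_rate", lambda ctx: 0.12),  # stub for a metrics query
    Step("recent_logs",
         lambda ctx: ["timeout contacting payments"],  # stub for a log query
         condition=lambda ctx: ctx["error_rate"] > 0.05),
    Step("summary",
         lambda ctx: f"{len(ctx.get('recent_logs', []))} log line(s) fetched"),
]

result = run_playbook(steps)
print(result["summary"])
```

             | Here the log query runs only because the earlier step
             | reported an error rate above the chosen 0.05 threshold.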
        
       | perpil wrote:
       | I like the integration with slack and the inline execution of
       | steps. I've been working on a similar product with
       | https://speedrun.cc but it just piggybacks on GitHub markdown and
       | most of the execution is done via a deeplink. Reach out if I can
       | help, I've been messing around in this space for a while.
        
         | TheBengaluruGuy wrote:
         | Slack has become so central to every on-call investigation
         | that a fully functional Slack workflow in our MVP was
         | non-negotiable for my cofounder, Dipesh.
         | 
         | I did come across Speedrun a while back and was planning to
         | give it a spin. Thanks for dropping a note, I'll drop you a
         | mail sometime in the near future to discuss more on the topic.
         | :)
        
       | ystad wrote:
       | Nice. Similar solution https://github.com/1xyz/pryrite
        
       | bckr wrote:
       | Great to see this launch! I'm looking forward to trying this when
       | our startup is a bit more mature.
        
       | dennisy wrote:
       | This is a great idea! But I feel better served by an existing
       | workflow tool, such as Airflow?
        
       | debarshri wrote:
       | Reminds me of Rundeck and the time we were trying to build
       | something similar. There are more modern takes like fiberplane and
       | moment.dev. Not sure about their adoption.
       | 
       | At one point, we were building something like this on top of
       | kubernetes. I think tech is the easy part here. Getting people to
       | leave their existing workflows and use your product is hard.
       | 
       | Secondly, the difficult part of our journey was integrations.
       | Until you have integrated all the tools an org uses, the product
       | is useless.
       | 
       | Thirdly, it is great that there are building blocks, but users
       | understand use cases. So, expecting end users to build playbooks
       | themselves is tricky. There has to be an intrinsic motivation
       | within the platform.
       | 
       | Fourthly, it is a super competitive space if you see it from an
       | internal tool building perspective. There are a lot of internal
       | tool builders like appsmith, retool, tooljet, django admin you
       | are competing with where you could run bash scripts, sql queries
       | etc.
       | 
       | Best of luck with your journey.
        
       | delano wrote:
       | If it works like Jupyter, as a file that can be version
       | controlled, and like Deepnote where multiple people can be
       | viewing/working on it at the same time, my mind would be blown.
        
         | samuelstros wrote:
         | here, be blown away
         | https://github.com/opral/monorepo/tree/main/lix
         | 
         | solving version control for files like jupyter notebooks brings
         | collaboration to those files without the need to give up files
         | in favor of the cloud. playbooks could leverage lix in 1-2
         | years to build a file-based version of their tool
        
         | mcintyre1994 wrote:
         | You might also like Elixir Livebook! :) https://livebook.dev/
        
       ___________________________________________________________________
       (page generated 2024-06-04 23:00 UTC)