___________________________________________________________________
Launch HN: Pyroscope (YC W21) - Continuous profiling software
Hi HN! Dmitry and Ryan here. We're building Pyroscope
(https://pyroscope.io/) -- an open source continuous profiling
platform (https://github.com/pyroscope-io/pyroscope). We started
working on it a few months ago.

I did a lot of profiling at my last job, and I always thought that
profiling tools provide a ton of value in terms of reducing latency
and cutting cloud costs, but are very hard to use. With most of them
you have to profile your programs locally on your machine. If you
can profile in production, you often have to be very lucky to catch
issues as they happen live -- you can't just go back in time with
these tools. So I thought: why not run a profiler 24/7 in the
production environment? I talked about this with my friend Ryan and
we started working.

One of the big concerns we heard from people early on was that
profilers typically slow down your code, sometimes to the point that
they're not suitable for production use at all. We solved this by
using sampling profilers -- these work by looking at the stack trace
a fixed number of times per second instead of hooking into method
calls, which makes profiling much less taxing on the CPU.

The next big issue that came up was storage -- if you simply take a
bunch of profiles, gzip them, and store them on disk, they consume a
lot of space very quickly, so much that it becomes impractical and
too expensive. We spent a lot of energy trying to come up with a way
of storing the data that would be efficient and fast to query. In
the end we came up with a system that uses segment trees [1] for
fast reads (each read becomes O(log n)) and tries [2] for storing
the symbols (the same trick used to encode symbol names in the
Mach-O file format, for example). This is at least 10 times more
efficient than just gzipping profiles.

After we did all of this, we ran some back-of-the-envelope
calculations and the results were really good -- with this approach
you can profile thousands of apps at 100Hz frequency with 10-second
granularity for 1 year, and it will only cost you about 1% of your
existing cloud costs (CPU + RAM + disk). E.g. if you currently run
100 c5.large machines, we estimate that you'll need just one more
c5.large to store all that profiling data.

Currently we support Go, Python and Ruby, and the setup is usually
just a few lines of code. We plan to release eBPF, Node and Java
integrations soon. We also have a live demo with 1 year of profiling
data collected from an example Python app:
https://demo.pyroscope.io/?name=hotrod.python.frontend{}&fro...
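To make the sampling idea concrete -- peek at the running stack a
fixed number of times per second instead of instrumenting every call
-- here is a toy sketch in Python (purely illustrative, not how
Pyroscope's agents are implemented; `sample_stacks` and `busy` are
made-up names):

```python
import sys
import threading
import time
from collections import Counter

def sample_stacks(target_thread_id, duration_s=0.5, hz=100):
    # Wake up `hz` times per second, grab the target thread's current
    # stack, and count how often each call path is observed. No hooks
    # are installed in the profiled code, so overhead stays low.
    counts = Counter()
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        frame = sys._current_frames().get(target_thread_id)
        stack = []
        while frame is not None:
            stack.append(frame.f_code.co_name)
            frame = frame.f_back
        counts[";".join(reversed(stack))] += 1
        time.sleep(1.0 / hz)
    return counts

def busy(seconds):
    # A CPU-bound loop for the sampler to observe.
    end = time.monotonic() + seconds
    while time.monotonic() < end:
        pass

results = Counter()
sampler = threading.Thread(
    target=lambda: results.update(
        sample_stacks(threading.main_thread().ident)))
sampler.start()
busy(0.6)
sampler.join()
# The hot function dominates the collected call paths.
hot = results.most_common(1)[0][0]
print(hot)
```

The counted call paths are exactly the "stack;frame;names -> count"
pairs that flame graphs are built from.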
And that's where we are right now. Our long-term plan is to keep the
core of the project open source and provide the community with paid
services like hosting and support. The hosted version is in the
works, and we aim to do a public release in about a month or so.

Give it a try: https://github.com/pyroscope-io/pyroscope. We look
forward to your feedback on our work so far. Even better, we would
love to hear about the ways people currently use profilers and how
we can make the whole experience less frustrating -- and ultimately
help everyone make their code faster and cut their cloud costs.

[1] https://en.wikipedia.org/wiki/Segment_tree
[2] https://en.wikipedia.org/wiki/Trie
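As an aside, the trie trick for symbols is easy to picture: stack
traces share long prefixes of frame names, and a trie stores each
shared prefix once. A toy sketch (illustrative only, not Pyroscope's
actual storage code):

```python
from typing import Dict

class TrieNode:
    # Each node holds one stack-frame name; traces that share a
    # prefix ("main;handler;...") share those nodes.
    def __init__(self):
        self.children: Dict[str, "TrieNode"] = {}
        self.count = 0

def insert(root: TrieNode, stack, count=1):
    # Walk/extend the trie along the stack, then record the sample
    # count at the leaf.
    node = root
    for fn in stack:
        node = node.children.setdefault(fn, TrieNode())
    node.count += count

root = TrieNode()
insert(root, ["main", "handler", "render"], 7)
insert(root, ["main", "handler", "query_db"], 3)
insert(root, ["main", "gc"], 1)

def total_nodes(node: TrieNode) -> int:
    return 1 + sum(total_nodes(c) for c in node.children.values())

# The three traces contain 8 frame names in total, but the trie
# stores only 5 (plus the root): "main" and "handler" are shared.
print(total_nodes(root))  # 6
```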
Author : petethepig
Score : 52 points
Date : 2021-02-15 16:00 UTC (7 hours ago)
| fijal wrote:
| Hi Dmitry, Hi Ryan
|
| I love the fact that this is out! I'm the original author of
| vmprof and I have been working on profilers for quite some
| time. I'm also one of the people who worked on PyPy. We never
| managed to launch a SaaS product out of it, but I'm super happy
| to answer questions about profiling, just-in-time compilers and
| all things like that! Hit me up here or in private (email in
| profile).
| petethepig wrote:
| Hi Maciej,
|
| vmprof is cool! For Python we currently use py-spy. The way it
| works is it reads certain areas of a process's memory to figure
| out what the current stack is. It's a clever approach that I
| like because it means you can attach to any process very
| quickly without installing any additional packages or anything
| like that. The downside is that from the OS's perspective,
| reading another process's memory is often seen as a threat --
| so on macOS you have to use sudo, and on Linux you sometimes
| have to take extra steps to allow this kind of cooperation
| between processes -- we've already seen people with custom
| kernels having issues with it.
|
| Going forward we'll definitely experiment with more profilers
| and over time add support for other ones as well.
|
| I saw you joined our Slack, we'll be happy to chat about
| profilers at some point :)
| itamarst wrote:
| Note that py-spy can be problematic in containers -- it
| requires ptrace, which means you need a special capability,
| and that's a security risk, so many environments won't even
| give people the option to enable it.
|
| In addition to vmprof, pyinstrument is another alternative.
| geoah wrote:
| I bumped into pyroscope earlier this month and loved how easy
| it is to get up and running and integrate it with golang
| services. I'm looking forward to seeing how pyroscope evolves!
| All the best luck :D
| petethepig wrote:
| Hi there,
|
| I'm very happy you found it easy to install. This has
| definitely been one of our priorities from the beginning -- I
| personally feel like it's a very important, but often
| overlooked detail, particularly in open source projects.
| tracyhenry wrote:
| Nice work! I have a maybe-dumb question: why not use an RDBMS
| to store the profiles and a b-tree index for the range
| queries? Is there a type of query that requires building your
| own segment tree index?
| petethepig wrote:
| We write profiles to the DB at 10-second resolution, so 1
| profile with approximately 1000 samples per 10 seconds. When
| we later read this data back, if we're talking about 1 minute
| of data, we need to merge 6 profiles (1 per 10 seconds).
| However, if we're talking about an hour of profiling data,
| that turns into 360 merges. Each merge is expensive, so this
| whole process becomes impractical.
|
| That's where segment trees come into play. On each write we
| "pre-aggregate" data for wider time ranges, so that the next
| time there's a wide read we can use a "wider" profile and thus
| reduce the total number of merges we need to make. Hope this
| helps visualize it: https://pyroscope-
| public.s3.amazonaws.com/slides-segment-tre...
|
| Let me know if you have any other questions, happy to answer
| here or in our Slack.
| tracyhenry wrote:
| Thanks, this perfectly answers my question!
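The merge-count savings described above can be sketched with a toy
binary segment tree: covering any time range takes O(log n)
pre-aggregated nodes instead of one merge per 10-second leaf (an
illustration of the idea only, not Pyroscope's actual storage code;
Pyroscope's tree need not be binary):

```python
def blocks_needed(start, end, lo=0, hi=4096):
    # Count the pre-aggregated nodes ("wider profiles") a binary
    # segment tree must merge to cover [start, end), where each
    # leaf is one 10-second profile.
    if end <= lo or hi <= start:
        return 0  # no overlap with this node
    if start <= lo and hi <= end:
        return 1  # this node's pre-merged profile covers its span
    mid = (lo + hi) // 2
    return (blocks_needed(start, end, lo, mid)
            + blocks_needed(start, end, mid, hi))

# One hour = 360 ten-second leaves. Merging leaf profiles naively
# costs 360 merges; the tree covers the range with 4 pre-merged
# nodes: [0,256) + [256,320) + [320,352) + [352,360).
print(blocks_needed(0, 360))  # 4
```

The trade-off is extra work and space on the write path (each
sample lands in one node per level), bought back many times over on
wide reads.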
| stephen wrote:
| FWIW, I'd throw in a feature request for wall-time-based
| profiling / tracing.
|
| A lot of the time in microservices, performance issues come
| from making many/slow I/O calls, and that doesn't really show
| up on a CPU-based profile.
|
| I.e. "this request took 10 seconds but only 100ms or less of
| CPU time"...
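A minimal illustration of the gap stephen describes -- an I/O-bound
call burns wall-clock time but almost no CPU time, so a CPU-only
profiler barely sees it (`slow_io_call` is a made-up stand-in):

```python
import time

def slow_io_call():
    # Stand-in for a blocking network or database call: lots of
    # wall time, almost no CPU time.
    time.sleep(0.3)

wall_start = time.perf_counter()   # wall clock
cpu_start = time.process_time()    # CPU time of this process
slow_io_call()
wall = time.perf_counter() - wall_start
cpu = time.process_time() - cpu_start

# A CPU-based profile sees almost nothing here, even though the
# "request" spent ~0.3s waiting.
print(f"wall={wall:.2f}s cpu={cpu:.4f}s")
```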
| jpgvm wrote:
| I see the current arch uses a separate process.
|
| Is the JVM integration likely to follow the same path or use a
| Java Agent?
|
| Very cool project. Continuous profiling, distributed tracing
| and always-on debugging are production tooling that I feel
| will eventually become commonplace -- we just need to crack
| through the YAGNI by making them easier to obtain.
| petethepig wrote:
| I think for languages like Java we're gonna have the profiler
| run inside the profiled process. This is how it currently works
| in our Go integration.
|
| RE continuous profiling and things: that's our hope as well. At
| my last job I got a lot of people to start using these kinds of
| tools and it's fun to watch this technology adoption process
| that goes from "why do I need this?" to "I remember you showed
| me this once, how do I use it again?" to "wow, this saved us so
| much time / money".
|
| It's a bit of an uphill battle, but we're hopeful because
| there's clearly value in these tools.
| jpgvm wrote:
| That is good news. I think a Java Agent is definitely the way
| to go for the JVM. It gives you all the access and APIs you
| need with low resource usage, and you only need to drop a file
| in place and add a flag to the JVM.
|
| If you don't need the C API, you can also write the agent in a
| JVM language, which obviates the need for platform-specific
| binaries.
|
| I agree wholeheartedly on the direction. I'm hoping for a
| final phase of "of course we have that", but maybe that's
| wishful thinking considering that not even good metrics are a
| given in many shops still -- but we can hope for a better
| future.
___________________________________________________________________
(page generated 2021-02-15 23:01 UTC)