hngopher.com

       [HN Gopher] ArcticDB: Why a Hedge Fund Built Its Own Database
       ___________________________________________________________________
        
       ArcticDB: Why a Hedge Fund Built Its Own Database
        
       Author : todsacerdoti
       Score  : 41 points
       Date   : 2024-08-21 13:17 UTC (3 days ago)
        
 (HTM) web link (www.infoq.com)
 (TXT) w3m dump (www.infoq.com)
        
       | dang wrote:
       | Related:
       | 
       |  _ArcticDB: A high-performance, serverless Pandas DataFrame
       | database_ - https://news.ycombinator.com/item?id=35198131 - March
       | 2023 (1 comment)
       | 
       |  _Introducing ArcticDB: Powering data science at Man Group_ -
       | https://news.ycombinator.com/item?id=35181870 - March 2023 (1
       | comment)
       | 
       |  _Introducing ArcticDB: A Database for Observability_ -
       | https://news.ycombinator.com/item?id=31260597 - May 2022 (31
       | comments)
        
         | Nelkins wrote:
         | I don't think the last link is related. Different database.
        
           | silisili wrote:
           | Correct. They renamed it to FrostDB; here is the announcement -
           | 
           | https://www.polarsignals.com/blog/posts/2022/06/16/arcticdb-.
           | ..
        
       | OutOfHere wrote:
       | https://github.com/man-group/arcticDB
        
       | stackskipton wrote:
       | Read the presentation. The answer was what I expected: we had a
       | unique problem, and because we make oil drums' worth of cash,
       | dipping a bucket in and taking that cash to solve the problem was
       | an easy justification.
       | 
       | These are really smart people solving problems they have but many
       | companies don't have buckets of cash to hire really smart people
       | to solve those problems.
       | 
       | Also, the questions after the presentation pointed out that the
       | data isn't always analyzed in their database, so it's more like a
       | storage system than a database.
       | 
       | >Participant 1: What's the optimization happening on the pandas
       | DataFrames, which we obviously know are not very good at scaling
       | up to billions of rows? How are you doing that? On the pandas
       | DataFrames, what kind of optimizations are you running under the
       | hood? Are you doing some Spark?
       | 
       | >Munro: The general pattern we have internally and the users
       | have, is that your returning pandas DataFrames are usable.
       | They're fitting in memory. You're doing the querying, so it's
       | like, limit your results to that. Then, once people have got
       | their DataFrame back, they might choose another technology like
       | Polars, DuckDB to do their analytics, depending on if they don't
       | like pandas or they think it's too slow.
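
The pattern Munro describes (filter at read time so the result fits in
memory, then analyze it with whatever tool you prefer) can be sketched
roughly as below. This is a stand-in built on Python's standard-library
sqlite3 module, not ArcticDB's API; the table schema, sample rows, and the
helper names build_store and read_slice are all hypothetical.

```python
import sqlite3
from statistics import mean


def build_store():
    """Create a toy in-memory tick store (hypothetical schema, not ArcticDB)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ticks (ts TEXT, symbol TEXT, price REAL, volume INTEGER)"
    )
    rows = [
        ("2024-08-19", "AAA", 10.0, 100),
        ("2024-08-20", "AAA", 11.0, 150),
        ("2024-08-21", "AAA", 12.0, 120),
        ("2024-08-21", "BBB", 50.0, 10),
    ]
    conn.executemany("INSERT INTO ticks VALUES (?, ?, ?, ?)", rows)
    return conn


def read_slice(conn, symbol, start, end):
    """Push the symbol, date-range, and column filters down to the store,
    so only a small, memory-sized result comes back to the caller."""
    cur = conn.execute(
        "SELECT ts, price FROM ticks "
        "WHERE symbol = ? AND ts BETWEEN ? AND ? ORDER BY ts",
        (symbol, start, end),
    )
    return cur.fetchall()


conn = build_store()

# Step 1: ask the store only for the slice you need.
subset = read_slice(conn, "AAA", "2024-08-20", "2024-08-21")

# Step 2: analyze the returned slice in memory with the tool of your choice
# (plain Python here; the quote mentions pandas, Polars, or DuckDB).
avg_price = mean(price for _, price in subset)
```

The point of the split is that the store does the narrowing and the client
does the analytics, which matches the "more like a storage system" reading
above.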
        
         | datahack wrote:
         | This comment is underrated comedy gold. You clearly have worked
         | with big data.
        
         | primitivesuave wrote:
         | I skipped to the "why build a database" section and then
         | skipped another two minutes of his tangential thoughts - seems
         | like the answer is "because Moore's law"?
        
       | tda wrote:
       | I know there are tons of problems that are solved in Excel when
       | they really shouldn't be. Instead of getting the expert business
       | analyst to use a better tool (like pandas), money is spent to
       | "fix" Excel.
       | 
       | Apparently there is also a class of problems that outgrow pandas.
       | And instead of the business side switching to more suitable
       | tools, some really smart people are hired to build crutches
       | around pandas.
       | 
       | Oh well, they probably had fun doing it. Maybe they get to work
       | on nogil Python next.
        
       | bdjsiqoocwk wrote:
       | Isn't it constrained to minute-level timestamps or something like
       | that?
        
       ___________________________________________________________________
       (page generated 2024-08-24 23:00 UTC)