[HN Gopher] Building a recommendation engine inside Postgres wit...
       ___________________________________________________________________
        
       Building a recommendation engine inside Postgres with Python and
       Pandas (2020)
        
       Author : rbanffy
       Score  : 44 points
       Date   : 2021-10-26 20:30 UTC (2 hours ago)
        
 (HTM) web link (blog.crunchydata.com)
 (TXT) w3m dump (blog.crunchydata.com)
        
       | wallace01 wrote:
       | Enjoyed that read, but to be honest I've always had mixed feelings
       | about PSQL extensions. Logic in a database is wrong.
        
         | CameronNemo wrote:
         | Can you expand on why you think it is wrong or problematic?
        
           | FridgeSeal wrote:
           | Data and its structure outlives the developer, and often the
           | application.
           | 
           | Tightly coupling a lot of the application-specific compute in
           | with how the data is stored and accessed sets you up for even
           | more difficulties when you need to debug, scale, migrate
           | storage/compute or evolve your application faster or more
           | radically than your data organisation.
        
           | systemvoltage wrote:
           | The database is usually the last thing that scales, so if you
           | put a bunch of computational load on it besides queries, you've
           | set yourself up for scaling/sharding sooner rather than later.
        
             | lowwave wrote:
             | Not always! If the computation involves math over a set of
             | records, then Postgres is great for that. Having the
             | operation inside the db reduces connection pooling at the
             | application level.
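
A minimal sketch of the "math over a set of records" point, not code from the article. It uses Python's built-in sqlite3 module as a stand-in for Postgres so it runs anywhere; the `ratings` table and its columns are hypothetical. It compares letting the database aggregate against pulling every row into the application.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for Postgres here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ratings (user_id INTEGER, rating REAL)")
conn.executemany(
    "INSERT INTO ratings VALUES (?, ?)",
    [(1, 4.0), (1, 2.0), (2, 5.0)],
)

# In the database: only one row per user crosses the connection.
in_db = dict(
    conn.execute("SELECT user_id, AVG(rating) FROM ratings GROUP BY user_id")
)

# In the application: every row is fetched, then averaged in Python.
totals = {}
for user_id, rating in conn.execute("SELECT user_id, rating FROM ratings"):
    running_sum, count = totals.get(user_id, (0.0, 0))
    totals[user_id] = (running_sum + rating, count + 1)
in_app = {user: s / n for user, (s, n) in totals.items()}

conn.close()
```

Both paths give the same averages; the difference is how many rows travel to the application and how long the connection stays busy.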
        
         | arthurcolle wrote:
         | Do what thou wilt shall be the whole of the Law
        
         | snissn wrote:
         | It's so amazing to have the ability to enhance your database
         | with custom methods. Keeps your data model really well
         | organized across your infrastructure
        
           | systemvoltage wrote:
           | That's the job of an API IMO.
        
       | earthscienceman wrote:
       | It's interesting to me to see pandas used in this application.
       | I'd be curious to see a more fully featured implementation.
       | 
       | I'm a scientist by profession and I've been working on building
       | out several different generalized data processing pipelines for
       | some specific problems in my sub-field, to make gathering and
       | formatting in-situ data easier and more
       | standardized/open/version-controlled. It's going great, worlds
       | better than the smattering of matlab code strewn across the hard
       | drives in the lab written in a non-collaborative manner and
       | shared by email...
       | 
       | ... but. I'll admit, I've run into a _lot_ of footguns in the
       | pandas API in terms of efficiency. You'll do something in what
       | seems to be the logical way, or in a way that the API funnels you
       | towards (like the groupby calls in the OP), and you'll quickly
       | realize that if you're working on large-ish tables (>10 GB in
       | memory) that it was the stupid way to do things. In terms of
       | readable code to share with colleagues, pandas can't be beat, but
       | things get wonky when you reach significant complexity and I
       | would be surprised if it made any sense to use in a 'real'
       | recommendation engine when considering developer productivity.
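
One common example of the kind of groupby footgun described above, as a rough sketch rather than the code from the article: passing a Python function to `groupby(...).apply` calls it once per group, while the built-in aggregation does the same work in compiled code. The `ratings` frame, its column names, and the helper function names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical ratings table; names and sizes are illustrative only.
rng = np.random.default_rng(0)
ratings = pd.DataFrame({
    "user_id": rng.integers(0, 1_000, size=100_000),
    "rating": rng.integers(1, 6, size=100_000).astype("float64"),
})


def mean_per_user_apply(df: pd.DataFrame) -> pd.Series:
    # Readable, but invokes the Python lambda once for every group.
    return df.groupby("user_id")["rating"].apply(lambda s: s.mean())


def mean_per_user_builtin(df: pd.DataFrame) -> pd.Series:
    # Same result through the built-in aggregation, no per-group Python call.
    return df.groupby("user_id")["rating"].mean()


slow = mean_per_user_apply(ratings)
fast = mean_per_user_builtin(ratings)
```

The two results match; on tables with many groups the per-group Python call in the first version is usually where the time goes, though the actual gap depends on the data and the pandas version.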
        
       | mooneater wrote:
       | Seems like a nice approach, I used to try things like this with
       | postgres (embedding sklearn).
       | 
       | Though it can be slow, harder to deal with memory usage issues,
       | harder to debug in general, harder to extend/generalise.
       | 
       | And for me postgres is now one of multiple datastores so doing
       | this is not as helpful.
        
       ___________________________________________________________________
       (page generated 2021-10-26 23:00 UTC)