[HN Gopher] Machine Learning Models Are Missing Contracts
       ___________________________________________________________________
        
       Machine Learning Models Are Missing Contracts
        
       Author : Aliabid94
       Score  : 17 points
       Date   : 2021-01-19 18:05 UTC (4 hours ago)
        
 (HTM) web link (gradio.app)
        
       | gillesjacobs wrote:
       | I cannot agree with OP more. ML model code itself is too often
       | seen as documentation for a paper, in the sense that authors
       | implicitly expect users to go through the pre-processing pipeline
       | to find the actual implementation steps.
       | 
       | This is because the data handling and pre-processing nitty gritty
       | is not actually interesting from an academic perspective.
       | 
       | I cannot count the times I went to look at a paper's
       | implementation source code on a benchmark dataset and found it
       | cuts off annotated sequences at a fixed max length, essentially
       | turning it into a different dataset and making comparison to
       | previous work invalid.
       | 
       | Good documentation costs time, time academics don't have.
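
A minimal sketch of the truncation problem described in the comment above, assuming a toy dataset of (token, label) pairs. The data format, the function names, and the "O" outside label are all hypothetical and not taken from any particular paper or library. It only illustrates how a fixed max length can silently drop annotations, so that scores are no longer computed on the original benchmark.

```python
from typing import Dict, List, Tuple

# Hypothetical format: each sequence is a list of (token, label) pairs.
Sequence = List[Tuple[str, str]]


def truncate(dataset: List[Sequence], max_len: int) -> List[Sequence]:
    """Cut every annotated sequence off at max_len tokens."""
    return [seq[:max_len] for seq in dataset]


def count_annotations(dataset: List[Sequence], outside_label: str = "O") -> int:
    """Count tokens that carry a real annotation (anything but the outside label)."""
    return sum(1 for seq in dataset for _, label in seq if label != outside_label)


def truncation_report(dataset: List[Sequence], max_len: int) -> Dict[str, int]:
    """Summarize how much of the benchmark a fixed max length would discard."""
    truncated = truncate(dataset, max_len)
    return {
        "sequences_cut": sum(1 for seq in dataset if len(seq) > max_len),
        "annotations_before": count_annotations(dataset),
        "annotations_after": count_annotations(truncated),
    }


toy = [
    [("Alice", "PER"), ("visited", "O"), ("Paris", "LOC")],
    [("Bob", "PER"), ("flew", "O"), ("from", "O"),
     ("Oslo", "LOC"), ("to", "O"), ("Rome", "LOC")],
]

report = truncation_report(toy, max_len=4)
print(report)
```

Printing a report like this next to the results would at least make the change to the dataset visible to anyone comparing against previous work.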
        
       | aliabd wrote:
       | I think ML as a field just needs to really mature. A lot of the
       | work feels super hacky, sort of a mix between research, demos,
       | and production.
        
         | fuzzybear3965 wrote:
         | I don't think the author would disagree with you. In fact, I
         | think this article was highlighting one specific area in which
         | the field could improve.
        
         | mlthoughts2018 wrote:
         | More specifically, at least in industry, we need SRE / ops
         | support to mature. Infra / ops team leaders take a team of
         | people who are highly specialized at the research layer of a
         | statistical computing problem, then treat them as immature
         | when they get massively overloaded by also having to solve
         | credential management, Kubernetes config, web service
         | hardening, efficient data pipelining, etc. That is a whiny and
         | immature stance. It burns out ML engineers and wastes a lot of
         | money, because the business fails to extract value from their
         | comparative advantage just because infra / ops leaders can't
         | get it together and solve ML coordination problems.
        
         | Guest42 wrote:
         | Right. I think it ignores the importance of the data when
         | building a model. Even great data can lead to difficult
         | modeling scenarios. The premise that a "solution" can be
         | guaranteed (or even partially guaranteed) to be useful is
         | misleading.
        
       | data_ders wrote:
       | What's the difference between a test and a contract? I agree that
       | code in the ML space needs to be more rigorously tested,
       | especially the data flowing in and out. But how are contracts
       | different?
        
         | drewcoo wrote:
         | A contract defines how the parties should interact. A test
         | determines how something is or behaves. Contract tests are a
         | thing.
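
One way to picture the distinction drawn above is the rough sketch below. It assumes a toy sentiment model. `ModelContract`, `with_contract`, and `toy_sentiment_model` are hypothetical names made up for illustration, not the API of Gradio or of any library. The contract states what callers and the model owe each other, and the contract test checks that this particular model actually honors it.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class ModelContract:
    """Hypothetical contract: what callers may send and what the model may return."""
    max_input_chars: int
    labels: Tuple[str, ...]

    def check_input(self, text: str) -> None:
        if not isinstance(text, str):
            raise TypeError("input must be a string")
        if len(text) > self.max_input_chars:
            raise ValueError("input exceeds max_input_chars")

    def check_output(self, label: str) -> None:
        if label not in self.labels:
            raise ValueError(f"unexpected label: {label!r}")


def with_contract(model: Callable[[str], str],
                  contract: ModelContract) -> Callable[[str], str]:
    """Wrap a model so both sides of the contract are enforced on every call."""
    def wrapped(text: str) -> str:
        contract.check_input(text)
        label = model(text)
        contract.check_output(label)
        return label
    return wrapped


def toy_sentiment_model(text: str) -> str:
    """Stand-in model, used only to make the sketch runnable."""
    return "positive" if "good" in text.lower() else "negative"


contract = ModelContract(max_input_chars=280, labels=("positive", "negative"))
checked_model = with_contract(toy_sentiment_model, contract)


def test_model_honors_contract() -> None:
    """A contract test: does this particular model behave as the contract says?"""
    for text in ["Good stuff", "meh", "x" * 280]:
        assert checked_model(text) in contract.labels


test_model_honors_contract()
```

In this sketch the contract lives apart from any single model, so the same `ModelContract` could be reused to check a replacement model. The test is just one way of verifying that a given implementation conforms.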
        
       ___________________________________________________________________
       (page generated 2021-01-19 23:02 UTC)