
       ___________________________________________________________________
        
       Executable Examples for Programming Problem Comprehension [pdf]
        
       Author : luu
       Score  : 31 points
       Date   : 2022-05-13 20:07 UTC
        
 (HTM) web link (cs.brown.edu)
        
       | jswrenn wrote:
       | Oh, whoa, I'm the author of this! Happy to answer any questions.
        
       | acbart wrote:
       | I saw this talk at ICER, and I really loved how it led to the
       | idea of evaluating tests in terms of "wheats" (correct programs
       | that the tests should accept) and "chaffs" (buggy programs that
       | the tests should reject). They frame this in terms of validity
       | and thoroughness.
       | 
       | > A suite is valid if it accepts (i.e., its assertions pass) all
       | correct implementations... In order for a suite to be valid for
       | all implementations of median, it must not include any assertions
       | involving empty input lists. We can accurately identify such
       | assertions as invalid by checking them against two correct
       | implementations (henceforth wheats [24])... If a student asserts
       | that implementations should produce an error on empty inputs,
       | their suite will reject the wheat that produces 0 (and vice
       | versa). Provided that the set of wheats completely exercises the
       | space of underspecified behaviors permitted by the specification,
       | accepting all wheats guarantees that a suite is valid and will
       | accept all correct implementations.
       | 
       | > A suite is thorough if it rejects (i.e., its assertions do not
       | pass) buggy implementations. We assess the thoroughness of a
       | suite by running it against a curated set of buggy
       | implementations (henceforth chaffs [24]). The thoroughness of a
       | suite is measured as the proportion of chaffs it rejects. To
       | assess test suites, the set of chaffs should include subtly buggy
       | implementations. To assess examples, we take a different
       | perspective: the set of chaffs should exercise logical
       | misunderstandings that students are likely to make. For instance,
       | to assess the thoroughness of examples for median, the set of
       | chaffs could include implementations of mean and mode.
       | 
       | I want to see this used in more curricula and tools. I'd like to
       | find out whether there's been any follow-up on this research and
       | how it has gone.
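A minimal sketch of the wheat/chaff scoring described in the quoted passages, assuming an example suite is a list of (input, expected output) pairs for median. All names here (`accepts`, `is_valid`, `thoroughness`, the `RAISES` sentinel, and the specific wheats and chaffs) are hypothetical illustrations, not the tooling from the paper. Following the quote, the two wheats differ only on the underspecified empty-list case, and the chaffs are mean and mode.

```python
from statistics import mean, mode

RAISES = object()  # hypothetical sentinel: "this example expects an error"


# Hypothetical wheats: two correct medians that disagree only on [].
def median_zero_on_empty(xs):
    if not xs:
        return 0
    s = sorted(xs)
    n, mid = len(s), len(s) // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2


def median_error_on_empty(xs):
    if not xs:
        raise ValueError("median of empty list")
    return median_zero_on_empty(xs)


# Hypothetical chaffs: plausible logical misunderstandings of "median".
def chaff_mean(xs):
    return mean(xs)


def chaff_mode(xs):
    return mode(xs)


WHEATS = [median_zero_on_empty, median_error_on_empty]
CHAFFS = [chaff_mean, chaff_mode]


def accepts(impl, examples):
    """True if every example's assertion passes when run against impl."""
    for xs, expected in examples:
        try:
            actual = impl(xs)
        except Exception:
            if expected is RAISES:
                continue  # the example wanted an error and got one
            return False
        if expected is RAISES or actual != expected:
            return False
    return True


def is_valid(examples):
    """Valid: the suite accepts every wheat."""
    return all(accepts(w, examples) for w in WHEATS)


def thoroughness(examples):
    """Proportion of chaffs the suite rejects."""
    rejected = sum(not accepts(c, examples) for c in CHAFFS)
    return rejected / len(CHAFFS)


good_suite = [([1, 2, 3], 2), ([1, 2, 3, 10], 2.5)]
weak_suite = [([1, 2, 3], 2)]                 # mean also returns 2 here
overspecified_suite = [([1, 2, 3], 2), ([], RAISES)]

print(is_valid(good_suite), thoroughness(good_suite))  # True 1.0
print(is_valid(weak_suite), thoroughness(weak_suite))  # True 0.5
print(is_valid(overspecified_suite))                   # False
```

The overspecified suite asserts an error on empty input, so it rejects the wheat that returns 0, which is exactly the invalidity the quote describes. The weak suite is valid but lets the mean chaff through, because mean and median agree on `[1, 2, 3]`.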
        
         | np_tedious wrote:
         | Seems roughly analogous to consistency and completeness in a
         | logic system
        
       ___________________________________________________________________
       (page generated 2022-05-14 23:01 UTC)