https://blog.brownplt.org/2021/11/21/b2t2.html
The Brown PLT Blog
A Benchmark for Tabular Types
Posted on 21 November 2021.
Tables are Everywhere
Tables are ubiquitous in the world. Newspapers print tables. Reports
include tables. Even students as young as middle school work
comfortably with tables. Tables are, of course, also ubiquitous in
programming. Because they provide an easy-to-understand, familiar,
already-parsed format, they are also valuable in programming
education (e.g., DCIC works extensively with tables before moving on
to other compound datatypes).
(Typed) Programming with Tables
When it comes to programming with tables, we have excellent tools
like relational databases. However, using external databases creates
impedance mismatches, so many programmers like to access tabular data
directly from within the language, rather than constructing external
calls. The popularity of language-embedded query has not diminished
with time.
Programming with tables, however, requires attention to types. Tables
are inherently heterogeneous: each column is welcome to use whatever
type makes most sense. This is all the more so if tables are a part
of the language itself: while external data tend to be limited to
"wire-format" types like numbers and strings, inside the language
they can contain images, functions, other tables, and more. (For
instance, we use all of these in Pyret.)
What is the type of a table? To make the output of tabular operations
useful, it can't be something flat like just Table. Because tables
are heterogeneous, they can't have just a single type parameter (like
Table<T>). It may conceptually make sense to have a type parameter
for each column (e.g., Table<C1, C2, ..., Cn>), but real-world tables
can have 17 or 37 columns! Programmers also like to access table
columns by name, not only position. And so on.
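To make this design space concrete, here is a small sketch in TypeScript (the types and names are ours, purely illustrative, and not drawn from B2T2 itself). A record type per row lets each named column keep its own type, including rich in-language values like functions:

```typescript
// Option 1: a flat type — every table looks the same; results carry no info.
// type Table = unknown;

// Option 2: one type parameter — forces every column to a single type T.
// type HomogeneousTable<T> = T[][];

// Option 3: a record type describing one row — each named column keeps its
// own type, and it scales to 17 or 37 columns without positional clutter.
type Table<Row> = { header: (keyof Row)[]; rows: Row[] };

// Columns may even hold in-language values like functions:
const ops: Table<{ label: string; op: (n: number) => number }> = {
  header: ["label", "op"],
  rows: [
    { label: "double", op: (n) => 2 * n },
    { label: "square", op: (n) => n * n },
  ],
};

// Each row's `op` field is a well-typed function we can apply directly.
const applied = ops.rows.map((r) => r.op(10)); // [20, 100]
```

The record-typed option is only one point in the space; it shows why column names, per-column types, and scaling to wide tables all pull the design away from a single flat or homogeneous type.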
Making Results Comparable
In Spring 2021, we ran a seminar to understand the state of knowledge
of type systems for tables. While we read several excellent papers,
we also came away very frustrated: authors simply did not seem to
agree on what a "table" was or what operations to support. The result
was an enormous degree of incommensurability.
Therefore, rather than invent Yet Another Tabular Type System, we
decided to take a step back and address the incommensurability
problem. What we need as a community is a shared, baseline
understanding of several aspects of tables. That is what this work
does: create a tables benchmark. This is not a performance benchmark,
however; rather, it's an expressivity and design benchmark. We call
it B2T2: The Brown Benchmark for Tabular Types.
The benchmark doesn't spring out of thin air. Rather, we extensively
studied tabular support in widely-used industrial languages/
libraries: R, Python/Pandas, and Julia. To cover educational needs,
we also studied the Pyret-based Bootstrap:Data Science curriculum.
You will notice that all are based on dynamic languages. (Though
Pyret has an optional static type system, it currently does not
support tables in any meaningful manner, so tabular programming is
essentially dynamic.) This is intentional! If you start with a typed
language, you end up reflecting the (potentially artificial and
overly-restrictive) constraints of that type system. Rather, it's
healthy to study what programmers (seem to) want to say and do,
filter these for reasonability, and reconcile that with the needs of
static types (like decidability).
Do Now!
What do you expect to find in a tabular programming benchmark?
Make a list before you read on!
Benchmark Components
B2T2 has the following parts:
* A definition of a table. There is actually a large space of
possibilities here. We've chosen a definition that is both broad
and interesting without being onerous.
* Examples of tables. Why did we bother to provide these? We do so
because many type systems may have all sorts of internal
encodings. They are welcome to do so, but they cannot expect the
outside world to conform to their representation. Therefore,
these examples represent the canonical versions of these tables.
Explaining how these will be converted to the internal format is
the responsibility of the type system designers.
* An API of table operations. This is of course the heart of the
benchmark. In particular, different papers seem to use different
subsets of operations. What is unclear is whether the missing
operations are just as easy as the ones shown; difficult; or even
impossible. This is therefore a big source of incommensurability.
* Example programs. Depending on the representation of tables and
the nature of the type systems and languages, these programs may
have to be rewritten and may (to some observers) look quite
unnatural.
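To give a feel for what a canonical table value and a couple of API operations might look like, here is a hypothetical TypeScript sketch (this is one plausible encoding invented for illustration, not B2T2's actual definition or operation names). Note how adding a column changes the type of the table itself:

```typescript
// Hypothetical encoding: an ordered header plus rows of records.
type Table<Row extends object> = { header: (keyof Row)[]; rows: Row[] };

function nRows<Row extends object>(t: Table<Row>): number {
  return t.rows.length;
}

// Adding a column changes the *type* of the table: the result's row type
// gains one field. This is where positional encodings become awkward.
function addColumn<Row extends object, C extends string, V>(
  t: Table<Row>,
  col: C,
  values: V[]
): Table<Row & Record<C, V>> {
  return {
    header: [...t.header, col] as (keyof (Row & Record<C, V>))[],
    rows: t.rows.map(
      (r, i) => ({ ...r, [col]: values[i] } as Row & Record<C, V>)
    ),
  };
}

const gradebook: Table<{ name: string; quiz1: number }> = {
  header: ["name", "quiz1"],
  rows: [
    { name: "Alice", quiz1: 8 },
    { name: "Bob", quiz1: 6 },
  ],
};

const withQuiz2 = addColumn(gradebook, "quiz2", [9, 7]);
// withQuiz2.rows[0].quiz2 is well-typed and inferred as number.
```

Even this tiny sketch shows why a shared API matters: whether an operation like column addition is easy, hard, or impossible to type depends heavily on the chosen table representation.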
All these might be predictable with some thought. There are two more
components that may be a little more surprising:
* Erroneous programs. In all sophisticated systems, there is a
trade-off between complexity and explainability. We are disturbed
by how little discussion there is of error-reporting in the
papers we've read, and think the community should re-balance its
emphasis. Even those who only care about technical depth (boo!)
can take solace: there can be rich technical work in explaining
errors, too! Furthermore, by making errors an explicit component,
a team that does research into human factors--even if they leave
all other aspects alone--has a "place to stand" to demonstrate
their contribution.
* A datasheet. To improve commensurability, we want authors to tell
each other--and their users--in a standard format not only what
they did but also where the bodies are buried.
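As a sketch of the kind of erroneous program at issue, consider a misspelled column name. In a record-typed encoding (again, our own illustrative names, not B2T2's), the mistake surfaces at compile time; the open research question is how clearly it is then explained:

```typescript
type Table<Row> = { header: (keyof Row)[]; rows: Row[] };

// Column access by name: the result type depends on which column is chosen.
function getColumn<Row, C extends keyof Row>(t: Table<Row>, col: C): Row[C][] {
  return t.rows.map((r) => r[col]);
}

const people: Table<{ name: string; age: number }> = {
  header: ["name", "age"],
  rows: [{ name: "Alice", age: 30 }],
};

// A misspelled column name is a static type error, not a runtime surprise:
// getColumn(people, "agee");
// ^ error: Argument of type '"agee"' is not assignable to
//   parameter of type '"name" | "age"'.

const ok = getColumn(people, "age"); // well-typed: number[]
```

Rejecting the program is only half the story; whether the resulting error message helps the programmer find and fix the misspelling is exactly the human-factors question the benchmark makes room for.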
Of course, all these parts are interesting even in the absence of
types. We just expect that types will impose the most interesting
challenges.
An Open Process
We expect this benchmark to grow and evolve. Therefore, we've put our
benchmark in a public repository. You're welcome to make
contributions: correct mistakes, refine definitions, add features,
provide more interesting examples, etc. You can also contribute
solutions in your favorite language!
For More Details
You can read about all this in our paper and work with our
repository.