https://mmapped.blog/posts/25-domain-types.html
* mmap(blog)
* Posts
* About
* Atom Feed
Universal domain types
2024-02-26 2024-02-26 Reddit
---------------------------------------------------------------------
* Introduction
* Language features
+ Newtypes
+ Typedefs
* Domain type classes
+ Identifiers
+ Amounts
+ Loci
+ Quantities
* Conclusion
* Exercises
---------------------------------------------------------------------
in strong typing we trust
Inscription on the Ada coin
Introduction
Skillful use of a strong static type system can eliminate certain
classes of bugs. Using custom application-specific types instead of
raw integers or strings is a powerful technique that will save you
hours of debugging. The following quote from one of my favorite books
on software design illustrates this point:
It took six months, but I eventually found and fixed the bug. [...]
At one point in the code there was a block variable containing a
logical block number, but it was accidentally used in a context
where a physical block number was needed.
John Ousterhout, `A Philosophy of Software Design', Chapter 14,
`Choosing names'
The author attributes the bug to poor variable naming, but this blame
is misplaced. If the programmer had defined distinct types for
logical and physical block numbers, the compiler would have caught
this mistake immediately. In this article, I call such definitions
domain types. They serve as documentation, help catch bugs at compile
time, and make the code more secure[ ] The book Secure by Design by
Dan Bergh Johnsson et al. provides many examples of using domain
types for building a secure system from the ground up. .
This article shows a systematic approach to domain types, provides
examples of domain types applicable to most applications, and
contains hints on how to implement them effectively.
Language features
Many languages provide syntax for simplifying domain type
definitions. Such definitions create a new distinct type sharing
representation with a chosen underlying type (e.g., a 64-bit
integer). The semantics of such definitions vary across languages,
but they usually fall into one of two categories: newtypes and
typedefs.
Newtypes
Newtypes wrap an existing type, allow the programmer to inherit some
of the operations from the underlying type, and add new operations.
Newtypes are flexible but may need boilerplate code to implement all
features required in a real-world application. [?][ ] An example of
using the newtype idiom in Rust. Inheriting basic operations, such as
comparison and hashing, is easy, but arithmetic operations require a
lot of boilerplate code. Some third-party packages, such as
derive_more, make this task easier.
/// The number of standard SI apples.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
struct MetricApples(i64);
impl std::ops::Add for MetricApples {
type Output = Self;
fn add(self, other: Self) -> Self {
MetricApples(self.0 + other.0)
}
}
Haskell and Rust are examples of languages supporting newtypes.
Typedefs
Typedefs introduce a new name for an existing type, inheriting all
underlying type operations. [?][ ] An example of using typedefs in Go.
Typedefs inherit all operations from the underlying type, even those
meaningless for the new type.
/// MetricApples hold the number of standard SI apples.
type MetricApples int64
func main() {
a, b := MetricApples(2), MetricApples(3)
// Go allows us to add, multiply, and divide MetricApples.
// Note that all these operations give us MetricApples back, which doesn't always make sense.
// Apples times apples should give apples squared.
// Dividing apples should give a dimensionless number.
fmt.Printf("%[1]T %[1]d, %[2]T %[2]d, %[3]T %[3]d\n", a+b, a*b, b/a)
}
Go, D and Ada provide typedefs (Ada calls typedefs derived types).
The Boost project for C++ implements typedefs as a library (C's
typedef declarations are `weak typedefs': they introduce an alias for
an existing type, not a new type).
Newtypes and typedefs are versatile and practical, but they approach
the problem in a way that's too simplistic and mechanical. There is a
more systematic way to think about domain types.
Domain type classes
A constraint on component design leads to freedom and power when
putting those components together into systems.
Runar Bjarnason, Constraints Liberate, Liberties Constrain.
Over the years, I found that specific classes of domain types appear
repeatedly in most applications I work on. This section is an
overview of these categories.
I use pseudo-Rust syntax to illustrate the concepts, but the ideas
should easily translate to any statically typed language. [?][ ] The
interface shared by all universal domain types in this article.
trait DomainType {
/// The primitive type representing the domain value.
type Representation;
/// Creates a domain value from its representation value.
fn from_repr(repr: Representation) -> Self;
/// Extracts the representation value from the domain value.
fn to_repr(self) -> Representation;
}
The code snippets present minimal interfaces for each type class.
Practical concerns often require adding more operations. For example,
using identifiers as keys in a dictionary requires exposing a hash
function (for hash maps) or imposing an ordering (for search trees),
and serializing values requires accessing their internal
representation.
Identifiers
One of the most common uses of domain types is a transparent handle
for an entity or an asset in the real world, such as a customer
identifier in an online store or an employee number in a payroll
application. I call these types identifiers.
Identifiers have no structure, i.e., we don't care about their
internal representation. The only fundamental requirement is the
ability to compare values of those types for equality. This lack of
structure suggests an appropriate mathematical model for such types:
a set, a collection of distinct objects. [?][ ] The minimal interface
for identifiers.
trait Eq {
/// Returns true if two values are equal.
fn eq(&self, other: &Self) -> bool;
}
trait IdentifierLike: DomainType + Eq {}
Newtypes are a perfect fit for identifiers thanks to their ability to
hide structure. Typedefs, on the other hand, impose too much
structure, allowing the programmer to add and subtract numeric
identifiers accidentally. But given the choice, typedefs are safer
than raw integers or strings.
Amounts
Another typical use of domain types is representing quantities, such
as the amount of money in usd on a bank account or the file size in
bytes. Being able to compare, add, and subtract amounts is essential.
Generally, we cannot multiply or divide two compatible amounts and
expect to get the amount of the same type back[ ] Unless we're
modeling mathematical entities, such as probabilities or points on an
elliptic curve. . Multiplying two dollars by two dollars gives four
squared dollars. I don't know about you, but I'm yet to find a
practical use for squared dollars.
Multiplying amounts by a dimensionless number, however, is
meaningful. There is nothing wrong with a banking app increasing a
dollar amount by ten percent or a disk utility dividing the total
number of allocated bytes by the file count.
The appropriate mathematical abstraction for amounts is vector spaces
. Vector space is a set with additional operations defined on the
elements of this set: addition, subtraction, and scalar
multiplication, such that behaviors of these operations satisfy a few
natural axioms. [?][ ] The minimal interface for amounts.
trait Ord: Eq {
/// Compares two values.
fn cmp(&self, other: &Self) -> Ordering;
}
trait VectorSpace {
/// The scalar type is usually the same as the Representation type.
type Scalar;
/// Returns the additive inverse of the value.
fn neg(self) -> Self;
/// Adds two vectors.
fn add(self, other: Self) -> Self;
/// Subtracts the other vector from self.
fn sub(self, other: Self) -> Self;
/// Multiplies the vector by a scalar.
fn mul(self, factor: Scalar) -> Self;
/// Divides the vector by a scalar.
fn div(self, factor: Scalar) -> Self;
}
trait AmountLike: IdentifierLike + VectorSpace + Ord {}
Newtypes allow us to implement amounts, but they might need some
tedious code to get the multiplication and division right. Typedefs
are handy, but get multiplication and division wrong, confusing
dollars and dollars squared.
Loci
Working with space-like structures, such as time and space, poses an
interesting challenge. Spaces have two types of values: absolute
positions and relative distances.
Positions refer to points in space, such as timestamps or
geographical coordinates. Distances represent a difference between
two such points.
Some natural languages acknowledge the distinction and offer
different words for these concepts, such as `o'clock' vs. `hours' in
English or `Uhr' vs. `Stunden' in German.
While distances behave the same way as amounts, positions are
trickier. We can compare, order, and subtract them to compute the
distance between two points. For example, subtracting 5 am on Friday
from 3 am on Saturday gives us twenty-two hours. Adding or
multiplying these dates makes no sense, however. This semantic
demands a new class of types, loci (plural of locus).
One example of the locus/distance dichotomy coming from system
programming is the memory address arithmetic. Low-level programming
languages differentiate pointers (memory addresses) and offsets
(distances between addresses). In the C programming language, the
void* type represents a memory address, and the ptrdiff_t type
represents an offset. Subtracting two pointers gives an offset, but
adding or multiplying pointers is meaningless.
We can view each position as a distance from a fixed origin point.
Changing the origin or the distance type calls for a new locus type.
[?][ ] The minimal interface for loci.
trait LocusLike: IdentifierLike + Ord {
/// The type representing the distance between two positions.
type Distance: AmountLike;
/// The origin for the absolute coordinate system.
const ORIGIN: Self;
/// Moves the point away from the origin by the specified distance.
fn add(self, other: Distance) -> Self;
/// Returns the distance between two points.
fn sub(self, other: Self) -> Distance;
}
Timestamps offer an excellent demonstration of the `distance type +
the origin' concept. Go and Rust represent timestamps as a number of
nanoseconds passed from the unix epoch (midnight of January 1st,
1970), The C programming language defines the time_t type, which is
almost always the number of seconds from the unix epoch. The q
programming language also uses nanoseconds, but chose the millennium
(midnight of January 1st, 2000) as its origin point. Changing the
distance type (e.g., seconds to nanoseconds) or the origin (e.g.,
unix epoch to the millennium) calls for a different timestamp type.
The Go standard library employs the locus type design for its time
package, differentiating the time instant (time.Time) and time
duration (time.Duration).
The Rust standard module std::time is a more evolved example. It
defines the SystemTime type for wall clock time (the origin is the
unix epoch), Instant for monotonic clocks (the origin is `some
unspecified point in the past', usually the system boot time), and
the Duration type for distances between two clock measurements.
Quantities
So far, we considered applications where domain types barely interact
with one another. Many applications require combining values of
different domain types in a single expression. A physics simulation
might need to multiply a time interval by a velocity to compute the
distance travelled. A financial application might need to multiply
the dollar amount by the conversion rate to get an amount in euros.
We can model complex type interactions using methods of dimensional
analysis. If we view amounts as values with an attached label
identifying their unit, then our new types are a natural extension
demanding a more structured label equivalent to a vector of base
units raised to rational powers. For example, acceleration would have
label (distance x time^-2), and the usd/eur pair exchange rate would
have label (eur x usd^-1). I call types with such rich label
structure quantities.
Quantities are a proper extension of amounts: addition, subtraction,
and scalar multiplication work the same way, leaving the label
structure untouched. The additional label structure gives meaning to
multiplication and division.
The result of multiplication will have a base unit vector with the
component-wise sum of the power vectors of the unit factors. For
example, car fuel consumption computation could use an expression
like 2 (km) x 0.05 (liter x km^-1) = 0.1 (liter).
Dividing values produces a label that's a component-wise difference
between the dividend and divisor power vectors. For example, running
pace computation could use an expression like 10 (min) / 2 (km) = 5
(min x km^-1). [?][ ] The minimal interface for quantities.
trait QuantityLike: AmountLike {
/// Multiplies two quantities.
fn mul>(self, other: O)
-> impl QuantityLike>;
/// Divides self by the specified quantity.
fn div>(self, other: O)
-> impl QuantityLike>;
}
Quantities require complex type-level machinery, which makes them
hard to implement in most languages. Boost.Units is one of the first
libraries to provide comprehensive implementations of quantity types
in C++. Rust ecosystem offers the dimensioned package. The units
package is a popular choice in the Haskell ecosystem.
If your language doesn't support advanced type-level programming or
using rigid types is impractical in your application, you can do unit
checks at runtime. Python's quantities package is an example of this
approach.
Conclusion
This article shows a systematic approach to designing domain types:
we identify the minimal interface the type must satisfy to address
practical needs and find suitable mathematical machinery.
We discussed four classes of domain types that fit naturally in
almost any application: identifiers, amounts, loci, and quantities.
Each type class is a little gem, a universal structure akin to a
design pattern. Unlike design patterns, the universal domain types
are based on mathematical abstractions and can be specified
precisely.
I'm sure you will identify a few places in your application where one
of these type classes fits perfectly. If you use Rust, you might find
my phantom_newtype package helpful. Maybe you'll discover some gems
of your own. Have fun!
Exercises
The following questions and exercises will help you understand and
apply the material in this article.
1. Think of bugs you found in your career that were embarrassingly
hard to find but trivial to fix. Could using more precise types
prevent those bugs?
2. Are there any areas in the application you're working on where
one of the universal domain types could improve type safety?
3. How do universal domain types relate to one another? Draw a line
diagram.
4. Non-spacial quantities, such as mass and electrical charge, don't
seem to require corresponding locus types. Why is that?
5. Did you find any other domain types in your work that might be
universal? If so, comment on GitHub issue roman-kashitsyn/
mmapped.blog#50. I'll gladly add your gem to this article with a
proper attribution.
The off-chain reporting protocol-
---------------------------------------------------------------------
(c)Roman Kashitsyn Creative Commons License
Source Code