https://www.cell-lang.net/faq.html
Frequently Asked Questions
Why yet another programming language?
Because there's currently no high-level programming language for
writing stateful software that I'm aware of. That's the short answer.
To elaborate a bit on that we need to discuss separately the three
main types of what we generically refer to as "computation": pure
computation, state and updates, and I/O.
Pure computation is the process of computing a result from some input
data. That's what pure functions do. Some types of software, like
compilers or scripts that perform scientific calculations, do hardly
anything else.
But in many cases pure computation is only a (relatively small) part
of what software does. Most applications are stateful: they sit
there, idle or semi-idle, and wait for some sort of input from the
external world. When that happens they change their state in response
(and usually also perform some I/O). Think, for example, of desktop,
web or mobile applications, which respond to user input, or some types
of server applications, which respond to requests issued by the
client across the network.
Finally, there's I/O, which is what happens when a piece of software
interacts with the external world.
For pure computation, there's certainly a number of languages that
can be regarded as high-level, at least in some specific niches.
Languages like Matlab and Mathematica, for example, are very
effective for things like numerical and scientific computation, and
functional languages like Haskell do a decent enough job (although
there's certainly a lot of room for improvement there) when
implementing things like parsers, compilers and other types of
data-transformation software.
But when it comes to managing state, we're still living in the stone
age. The programming paradigm that has been by far the most popular
for the last twenty years or so, OOP, is a complete disaster,
hampered by, among (many) other things, a very low-level way of
representing information, one that makes data difficult to inspect,
manipulate and update, with the result that code ends up being far
more complex than it should be, and much less reliable. Functional
programming's sins, on the other hand, are mainly of omission: in its
pure form it simply doesn't have a satisfactory way to deal with
state, which I suspect is the main reason it never expanded outside a
few niches. Impure functional programming is probably the best thing
we have at the moment, but it's still far from ideal.
What's worse, innovation seems to be dead. With only one (admittedly
important) exception I'm aware of, there's been no real progress
towards higher-level languages for this type of software for the last
30 years or so.
The situation is even worse, if possible, for I/O. Even tasks that
could be carried out automatically in any kind of sane programming
language/environment often end up being a huge drag on developers.
Think, for example, of how much effort goes into writing code that
only moves data into and out of databases. Or how much more complex
it is to implement a client/server application, compared to a local
application with similar functionality, and how tedious and
time-consuming it often is to implement even trivial things like
sending data from the client to the server and vice versa. Yet the
majority of these tasks could be performed automatically by the
compiler under
the hood, if one starts with a properly designed language and network
architecture.
So while there's certainly a lot that could be done to improve the
ability of today's programming languages to perform pure computation,
it's rather obvious that the low-hanging fruit in language design is
in the other two areas, state and I/O.
Cell is not about pure computation, and it's a rather conventional
language in that regard: it's basically a combination of ideas from
Matlab and functional programming. Its aim instead is to improve the
way state is managed, and to enable the compiler and runtime
environment to automatically perform many I/O tasks that are left to
the developer in conventional languages.
How does Cell improve on existing languages?
The single most important improvement is in the way information is
represented. Cell's data model combines a staple of functional
programming, algebraic data types, with relations and other ideas
from relational databases. That alone goes a long way toward
simplifying (drastically, in some cases) both the data structures
used to encode the state of an application and the code that
manipulates them. The data model and type system are also entirely
structural, which brings a number of benefits that are discussed
elsewhere, and most importantly enables one of the language's major
features: orthogonal persistence.
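To give a purely illustrative flavor of that combination, here is how
a tiny book catalog might be modeled. This is only a sketch: the
names are invented, the id types are assumed to be defined as shown,
and some details of the actual schema syntax are glossed over.
type BookId   = book_id(Nat);
type AuthorId = author_id(Nat);
schema Library {
  // Entity types, whose attributes are stored as binary relations
  book(BookId)
    title : String,
    isbn  : String;
  author(AuthorId)
    name : String;
  // A plain many-to-many relation between books and authors
  written_by(BookId, AuthorId);
}
Unlike a graph of objects, a schema like this has no owners and no
pointers: written_by can be queried in either direction, just like a
table in a relational database.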
There's also a strict separation between pure computation and state
updates, and the developer is encouraged to shift as much complexity
as possible into the purely functional part of the code, which is
intrinsically less problematic to deal with. That provides many of
the benefits of functional programming, while avoiding the
difficulties the latter paradigm has in dealing with state. That's
made more effective by the fact that the state of the application can
be partitioned into separate components (called automata) that do not
share any mutable state and can be safely updated concurrently.
Future versions of the language will provide additional tools that
will help simplify the part of the codebase that mutates the
application's internal state.
The combination of all the above properties, plus the fact that I/O
is also separate from pure computation and updates, enables a number
of features that are not found in conventional languages.
The first one is the ability to "replay" the execution of a Cell
program. One can easily reconstruct the exact state of a Cell program
at any point in time. Obviously that's very useful for debugging, but
there are other, more interesting ways of taking advantage of that.
It's for example possible to run identical replicas of the same Cell
process on different machines over a network, and that's something
that will be crucial in Cell's future network architecture.
The second one is orthogonal persistence, that is, the ability to
take a snapshot of the state of the application (or part of it),
which can later be used to create, with minimal effort, an identical
copy that will behave in exactly the same way. In
Cell, you don't save your data the way you do it in conventional
languages. Instead, you simply declare which parts of your
application's data structures have to be persisted, and the compiler
generates for you all the code needed to save and load them.
The third one is a very robust error-handling mechanism, based on
transactions. If an exception is thrown during an update, or if your
code violates any of the integrity constraints that are placed on
your data model, the entire update is simply rolled back and
discarded, and the state of your application is left untouched.
Then there's support for reactive programming, which is still in its
infancy. At the moment all the language provides is reactive
automata, which can be used to implement certain specific types of
reactive software, but reactive programming itself is something much
more general than that, and a lot more is coming in the future.
The last major part of the language will be its network
model/architecture, which hasn't been implemented yet.
What is functional/relational programming?
The term "functional/relational programming" was coined (I believe)
in a paper that came out more than a decade ago, Out of the Tar Pit.
It refers to a programming paradigm that combines functional
programming with some version of the relational model (in the case of
Cell one that is very different from the flavor used in SQL
databases), with the former providing the general purpose programming
capabilities, and the latter the ability to encode, navigate, query
and update large and complex application states or datasets.
The starting point of functional/relational programming is the
observation that the primary source of complexity and unreliability
in our programs is the presence of way too much mutable state. So in
order to address that problem, the first step is to avoid state
altogether whenever possible. That means doing all pure computation
using the functional part of the language, and clearly separating
pure code from code that has side effects.
The next step is to identify the parts of an application's state that
are redundant, and to get rid of them. A piece of information is
redundant if it can be derived from other pieces of information that
are already present in the system, or, in other words, if it can
always be reconstructed after having been deleted. Once you've
removed all redundant information what you're left with is the
essential state: at this point, there's nothing else you can remove
without permanently losing information. That leaves you with a much
smaller state, which is therefore a lot easier to deal with. Then you
need to design the data structures that encode the essential state so
as to avoid invalid/inconsistent states as much as possible. These
are not new ideas: they have been conventional wisdom in the database
community for nearly half a century.
For this latter step to be possible, you need a high-level data
model, and here the relational model, in one form or another, is the
only game in town. A high-level data model also has the added benefit
of making information easier to search, navigate and manipulate.
Why is data/state representation so important?
This is the kind of thing that is difficult to explain in the
abstract, but which becomes obvious after you've implemented a
nontrivial piece of software with a language that provides a
high-level data model.
But as an example, compare the mathematical definition of a set with
its implementation in an imperative or even functional language. A
set in mathematics is an extremely simple entity: it's just a
collection of elements with no order and no duplicates. Operations
like union, intersection, difference or Cartesian product can be
defined in a single very short line.
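For reference, here's what those one-line definitions look like:
A \cup B = \{ x \mid x \in A \lor x \in B \}
A \cap B = \{ x \mid x \in A \land x \in B \}
A \setminus B = \{ x \mid x \in A \land x \notin B \}
A \times B = \{ (a, b) \mid a \in A, b \in B \}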
Now compare that to an implementation in any imperative or even
functional language that uses, say, red-black trees. Those data
structures contain a lot of low-level "information" that has nothing
to do with the abstract notion of set, and are subject to complex
invariants that have to be carefully maintained every time they are
updated. A typical implementation in an imperative language consists
of around a thousand lines of code, and it's very difficult to get
right.
And while red-black trees may be an especially complex example, the
data structures used to encode the state of any imperative (or, to a
lesser extent, functional) program are every bit as low level and
suffer from the same problems, like the presence of an overwhelming
amount of low-level noise and complex, hard-to-maintain dependencies
between their components, only on a much larger scale.
What is reactive programming?
I'm not sure there's a generally agreed upon definition, as reactive
programming has been implemented in many different forms. One thing
they all have in common, though, is the automatic propagation of
change. It is designed to solve one specific problem: very often, in
a stateful application, you end up having two (or more) variables or,
more generally, pieces of the application's state (let's call them A
and B in what follows) that depend on each other, in the sense that
whenever one of them changes, the other has to change as well.
The simplest form of dependency is when one of the two can be derived
from the other, that is, B = f(A), where f(..) is a pure function.
One large-scale example of this is The Elm Architecture (which is
basically a functional version of the Model/View/Controller
architecture), where A is the model, B the HTML code that describes
the (current state of the) UI, and f(..) is the view function. Every
time the model changes, the view function is automatically
re-evaluated and the UI is refreshed.
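In Cell, this simplest kind of dependency can be written directly as
a rule in a reactive automaton. Here's a rough sketch (the automaton
and signal names are invented, and it's only meant to illustrate the
idea, not to be authoritative about the syntax):
reactive Converter {
  input:
    celsius: Float;
  output:
    fahrenheit: Float;
  rules:
    // Recomputed automatically whenever the input changes
    fahrenheit = 32.0 + celsius * 1.8;
}
No state section is needed here, because the output is a pure
function of the current input.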
In less trivial cases the dependency between A and B may take the
form B' = f(A, B) (with B and B' being the old and new values of B
respectively), that is, every time A changes, the value of B is
updated but not entirely reset. As an example, say that you're
implementing a software thermostat that controls your air
conditioner. You want to switch it on when the temperature exceeds
28°C, and to switch it off when it goes back below 24°C. The state of
the air conditioner is not entirely determined by the current
temperature, because between 24°C and 28°C it could be either on or
off, depending on the history of the temperature signal/variable.
Here's the Cell code for that:
reactive Thermostat {
  input:
    temperature: Float;
  output:
    on: Bool;
  state:
    // When the system is initialized, on is true if
    // and only if the current temperature exceeds 28°C
    on: Bool = temperature > 28.0;
  rules:
    // Switching on the air conditioner when
    // the temperature exceeds 28°C
    on = true when temperature > 28.0;
    // Switching it off when it falls below 24°C
    on = false when temperature < 24.0;
}
These dependencies can become a lot more complex than this, of
course. The dependency between A and B can, for example, be mutual,
with a change to A triggering an update/recalculation of B and
vice versa. This is not something the reactive layer of Cell is
equipped to deal with at the moment, but this kind of situation will
arise in the future network architecture, and the language will
need special constructs to deal with that.
Note that reactive programming is not in and of itself a
general-purpose programming paradigm, but rather a set of
capabilities that augment an existing paradigm. It works a lot better
when the paradigm it's applied to is a functional or declarative one,
because in languages with side effects and pointer aliasing it
becomes impossible to figure out statically when a variable has
changed, and in what order the recalculations should be done. To
compensate for that, reactive frameworks for imperative languages
have to introduce a number of restrictions and/or resort to baroque
architectures that make the code significantly more complex and
obscure the much simpler underlying logic.
If you want to learn more, check out the Wikipedia pages on reactive
and functional reactive programming.
Are functional/relational programming and relational automata only
for CRUD applications?
Absolutely not, although you might be excused for getting that
impression.
The vast majority of developers regard the relational model as some
sort of legacy, obsolete technology, and believe that modeling a
given problem or domain in terms of objects/classes is somehow
superior to modeling it using relations. In fact, that's completely
backwards. Objects are a clumsy and very low-level technology, and
I'm not aware of any type of information that cannot be modeled much
better with the relational model than with objects (Why relations are
better than objects explains this claim in more detail).
In all likelihood, the relational model gets its bad rap from the fact
that the only implementation of it developers are familiar with is
the one found in SQL databases. That particular implementation was
very lame to begin with and has not really evolved since its initial
design nearly half a century ago. On top of that, it was designed
specifically for databases, and it's not well-suited for general
purpose programming. But all those flaws and limitations are specific
to a particular implementation, not the relational model itself.
The best way to illustrate the advantages of functional/relational
programming over OOP is through a case study. Once version 0.5 is out
and the basic set of features of the language is finally complete,
we'll be comparing the implementations of a number of small but
realistic applications in the two paradigms (for problems that are
universally regarded as especially well-suited to OOP) and that will
show how much shorter, simpler and more elegant the functional/
relational implementation can be, even when such a comparison is made
on OOP's home turf.