https://www.circle-lang.org/draft-profiles.html
Why Safety Profiles Failed
Document #:
Date: 2024-10-24
Project: Programming Language C++
Audience:
Reply-to: Sean Baxter
Contents
* 1 Abstract
* 2 C++ is under-specified
+ 2.1 Inferring aliasing
+ 2.2 Inferring lifetimes
+ 2.3 Inferring safeness
* 3 Lifetime safety is static typing
* 4 Lifetime parameters don't cause soundness bugs
* 5 C++ is too irregular for Profiles
+ 5.1 C++ cannot enforce exclusivity
* 6 Carcinization
* 7 C++ in the future
* 8 References
1 Abstract
As for dangling pointers and for ownership, this model detects
all possible errors. This means that we can guarantee that a
program is free of uses of invalidated pointers.
- A brief introduction to C++'s model for type- and resource-
safety[type-and-resource-safety-2015]
Safety Profiles were introduced in 2015 with the promise to detect
all lifetime safety defects in existing C++ code. It was a bold
claim. But after a decade of effort, Profiles failed to produce a
specification, reliable implementation or any tangible benefit for
C++ safety. The cause of this failure involves a number of mistaken
premises at the core of its design:
1. "Zero annotation is required by default, because existing C++
source code already contains sufficient information"[P3465R0]
2. "We should not require a safe function annotation"[P3466R0]
3. "Do not add a feature that requires viral annotation"[P3466R0]
4. "Do not add a feature that requires heavy annotation"[P3466R0]
The parameters of the problem make success impossible. This paper
examines the contradictions in these premises, explains why the
design didn't improve safety in the past and why it won't improve
safety in the future.
2 C++ is under-specified
Zero annotation is required by default, because existing C++
source code already contains sufficient information.
- Pursue [P1179R1] as a Lifetime Safety TS[P3465R0]
C++ source code does not have sufficient information for achieving
memory safety. A C++ function declaration lacks three things that are
critical for lifetime safety:
1. Aliasing information.
2. Lifetime information.
3. Safeness information.
Functions involving parameter types with pointer or reference
semantics have implicit aliasing, lifetime and safeness requirements.
Safety Profiles cannot recover these properties from C++ code,
because there are no language facilities to describe them. These
requirements are only specified in documentation, if they are
specified at all.
2.1 Inferring aliasing
A C++ compiler can infer nothing about aliasing from a function
declaration. A function parameter with a mutable reference might
always alias other parameters, it might never alias other parameters,
or it might not care about aliasing other parameters.
// i and j must always alias. They must refer to the same container.
void f1(std::vector<int>::iterator i, std::vector<int>::iterator j) {
  // If i and j point into different vectors, you have real problems.
  std::sort(i, j);
}
// vec must not alias x.
void f2(std::vector<int>& vec, int& x) {
  // Resizing vec may invalidate x if x is a member of vec.
  vec.push_back(5);
  // Potential use-after-free.
  x = 6;
}
// vec may or may not alias x. It doesn't matter.
void f3(std::vector<int>& vec, const int& x) {
  vec.push_back(x);
}
f1 and f2 have aliasing requirements. In f1, both iterators must
point into the same container. In f2, x must not come from the
container vec. These requirements are only visible as documentation.
The compiler cannot infer a function's aliasing requirements from its
declaration or even from its definition. If the safety profile
enforces no mutable aliasing, then correct calls to f1 and f3 will
fail to compile, breaking your program.
int main() {
  std::vector<int> vec1, vec2;
  // *Incorrectly* permits call.
  // UB, because the iterators point into different containers.
  f1(vec1.begin(), vec2.end());
  // *Incorrectly* rejects call.
  // This is the correct usage, but mutable aliasing prevents compilation.
  f1(vec1.begin(), vec1.end());
  // *Correctly* rejects call.
  f2(vec1, vec1[2]);
  // *Incorrectly* rejects call.
  f3(vec1, vec1[2]);
}
Profiles chose the wrong convention for several of these calls. They permit
the incorrect call to f1, but reject a correct usage of f1 on
the grounds of mutable aliasing. An unsound call to f2 is correctly
rejected, but a sound call to f3 is also rejected. Rejecting or
permitting code (rightly or wrongly) is a matter of coincidence, not
intelligence.
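The requirement on f2 can at least be written down as a run-time check, even though nothing in the declaration carries it. The following is a minimal sketch, not part of the original examples: `points_into` and `f2_checked` are hypothetical names, and the range test assumes the pointer order given by `std::less` matches address order, as it does on mainstream flat-memory platforms.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical helper (not from the paper): true when x is an element of
// vec's current buffer. std::less and std::less_equal are specified to give
// a total order over pointers, even for pointers into unrelated objects.
bool points_into(const std::vector<int>& vec, const int& x) {
  const int* p = &x;
  const int* first = vec.data();
  const int* last = first + vec.size();
  return std::less_equal<const int*>()(first, p) &&
         std::less<const int*>()(p, last);
}

// Same shape as f2, with the documented requirement checked in debug builds.
void f2_checked(std::vector<int>& vec, int& x) {
  assert(!points_into(vec, x) && "x must not alias an element of vec");
  vec.push_back(5);  // may reallocate, but x is known to live elsewhere
  x = 6;
}
```

The check runs only when the call executes and disappears under NDEBUG. It documents the contract for readers, but the compiler still learns nothing from the declaration.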
Without language-level aliasing information, compile-time memory
safety is not possible. This requirement is the motivation for Rust's
borrow type. A mutable borrow cannot alias other borrows. That's
enforced by the borrow checker. Raw pointers have no aliasing
requirements, but are unsafe to dereference. In general, things that
can be checked by the compiler are checked, and things that can't be
checked are unsafe to use.
(Compiler Explorer)
#include <iostream>
#include <vector>
void func(std::vector<int>& vec, int& x) {
  vec.push_back(1);
  x = 2; // A write-after-free when x is a member of vec!
}
int main() {
  std::vector<int> vec;
  vec.push_back(1);
  func(vec, vec[0]);
  std::cout << vec[0] << "\n";
  std::cout << vec[1] << "\n";
}
Program returned: 0
1
1
The Safety Profiles partial reference implementation can't prevent
aliasing-related undefined behavior because C++ doesn't provide
aliasing information.
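One common workaround in today's C++ is to change the interface so that no reference into the buffer is alive while the vector may reallocate. The sketch below assumes a hypothetical `func_by_index` that names the element by position instead of by reference; it is an illustration of the workaround, not something the Profiles tooling produces.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical variant of func: the element is named by index, so nothing
// refers into vec's buffer while push_back may reallocate it.
void func_by_index(std::vector<int>& vec, std::size_t i) {
  assert(i < vec.size() && "index must name an existing element");
  vec.push_back(1);
  vec[i] = 2;  // looked up after the reallocation, so it cannot dangle
}
```

This avoids the defect by redesigning the function rather than by checking the original one. The compiler still has no way to tell that the reference-taking version is the risky one.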
2.2 Inferring lifetimes
A C++ compiler can infer nothing about lifetimes from a function
declaration. A reference return type may be constrained by the
lifetimes of any number of reference parameters, by none of the
reference parameters, or by some other lifetime.
// The returned reference is only constrained by the lifetime of the map
// parameter.
// It is not constrained by the lifetime of the key parameter.
const int& f4(std::map<int, int>& map, const int& key) {
  return map[key];
}
// The returned reference is constrained by the lifetime of both x and y
// parameters.
const int& f5(const int& x, const int& y) {
  return std::min(x, y);
}
// The returned reference is not constrained by the lifetime of any
// reference parameter.
const int& f6(const int& key) {
  static std::map<int, int> map;
  return map[key];
}
These three functions have different lifetime requirements, which are
indicated by comments. This information is available to developers
but not to the compiler. What's the strategy to uphold these lifetime
requirements? Read the documentation, read the code, and don't make
mistakes.
int main() {
  std::map<int, int> map;
  // r4 is constrained by lifetimes of map and 40.
  const int& r4 = f4(map, 40);
  // *Incorrectly* rejects usage of r4. r4 is constrained to the lifetime
  // of the temporary 40, which expired at the end of the above statement.
  int x = r4;
  // r5 is constrained by lifetimes of 50 and 51.
  const int& r5 = f5(50, 51);
  // *Correctly* rejects usage of r5. The reference refers to one of the
  // two expired temporaries. This use would be a use-after-free.
  int y = r5;
  // r6 is constrained by the lifetime of 60.
  const int& r6 = f6(60);
  // *Incorrectly* rejects usage of r6.
  // The return reference r6 should not be constrained by the lifetime of 60.
  int z = r6;
}
Profiles take a similarly conservative approach to lifetimes as they
do with aliasing. The lifetime of a returned reference is constrained
by the lifetimes of all of its arguments. This is fortuitous for a
function like std::min, which returns a reference to either of its
function parameters. It's bad for a function like std::map::
operator[], which takes a key argument by reference but returns a
reference that's only constrained by the lifetime of this.
Since the compiler has no information about function parameter
lifetimes, it can't accurately flag out-of-contract function calls.
f4 and f6 take references to temporary objects but return references
that should not be constrained to that temporary. In both cases, the
safety profile rejects a subsequent use of the reference as a
use-after-free, because it applies a too-conservative convention.
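The difference between the two conventions can be made concrete with a toy model. This is only an illustration under stated assumptions, not how any Profiles implementation works: a lifetime is modelled as the index of the last statement at which an object is alive, and `conservative_result`, `annotated_result` and `use_accepted` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model (an illustration, not how any tool is implemented): a lifetime
// is the index of the last statement at which an object is still alive.
using Lifetime = int;

// The convention described above: the result is constrained by every
// reference argument, so it dies with the shortest-lived one.
Lifetime conservative_result(const std::vector<Lifetime>& args) {
  assert(!args.empty());
  return *std::min_element(args.begin(), args.end());
}

// With explicit information, only the listed argument positions constrain
// the result.
Lifetime annotated_result(const std::vector<Lifetime>& args,
                          const std::vector<std::size_t>& constrained_by) {
  assert(!constrained_by.empty());
  Lifetime result = args.at(constrained_by.front());
  for (std::size_t i : constrained_by)
    result = std::min(result, args.at(i));
  return result;
}

// A use at statement `use_at` is accepted when the result is still alive.
bool use_accepted(Lifetime result, int use_at) {
  return use_at <= result;
}
```

For f4, take the map to be alive through statement 10 and the temporary key through statement 1. The conservative rule rejects a use at statement 2, while a rule that knows the result depends only on argument 0 accepts it. For f5, both rules reject the use, which is the correct answer.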
The need for explicit lifetime information in function types is the
motivation for Rust's lifetime arguments. A returned reference must
be annotated with a lifetime parameter that is constrained by a
function parameter on the same function, or it must be static. The
alternative is to be deluged with an impossible quantity of
use-after-free false positives.
(Compiler Explorer)
#include <map>
const int& f4(std::map<int, int>& map, const int& key) {
  return map[key];
}
int main() {
  std::map<int, int> map;
  const int& ref = f4(map, 200);
  int x = ref;
}
<source>:11:11: warning: dereferencing a dangling pointer [-Wlifetime]
  int x = ref;
          ^~~
<source>:10:32: note: temporary was destroyed at the end of the full expression
  const int& ref = f4(map, 200);
                               ^
The Safety Profiles reference implementation can't accurately deal
with lifetimes because C++ doesn't provide lifetime information. The
tool doesn't test for correctness, it only tests if your code
conforms to a pre-chosen convention.
2.3 Inferring safeness
We should not require a safe function annotation that has the
semantics that a safe function can only call other safe
functions.
- (Re)affirm design principles for future C++ evolution[P3466R0]
Recall what "safe" actually means:
* A safe function has defined behavior for all valid inputs.
* An unsafe function has soundness preconditions. Calling an unsafe
function with out-of-contract inputs may result in undefined
behavior.
A C++ compiler can infer nothing about safeness from a function
declaration. It can't tell by looking what constitutes an
out-of-contract call and what doesn't. A safe-specifier indicates the
absence of soundness preconditions. An unsafe-block permits the user
to escape the safe context, prove the preconditions, and call the
unsafe function.
template<typename T>
class vector {
public:
  size_t size() const noexcept safe {
    return _len;
  }
  T& operator[](size_t index) noexcept safe {
    // Can call size() because it's a safe function.
    if(index >= size())
      panic("Out-of-bounds vector::operator[]");
    unsafe {
      // Pointer operations only allowed in unsafe context.
      // Safety proof:
      // The allocation has size() valid elements and index < size().
      return _data[index];
    }
  }
private:
  T* _data;
  size_t _len, _cap;
};
Let's take a really simple case: vector::operator[]. Profiles have to
reject pointer arithmetic, because there's no static analysis
protection against indexing past the end of the allocation. How is
the compiler told to permit the raw pointer subscript in the
return-statement in vector::operator[]? In Rust and Safe C++, enter
an unsafe-block.
This design distinguishes safe functions, which have no soundness
preconditions and can be called from other safe functions, and unsafe
functions, which require an unsafe-block escape to use, just like
pointer operations.
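For comparison, the same bounds check can be written in standard C++, but nothing in the language marks the result as safe or confines the pointer operation. The sketch below uses a hypothetical free function `checked_at` and throws `std::out_of_range` where the Safe C++ version panics.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical free-function analogue of the operator[] above, in standard
// C++. It throws where the Safe C++ version panics.
int& checked_at(int* data, std::size_t len, std::size_t index) {
  assert(data != nullptr || len == 0);
  if (index >= len)
    throw std::out_of_range("Out-of-bounds checked_at");
  // Same proof obligation as the unsafe-block: the allocation has len valid
  // elements and index < len. Here only this comment records it.
  return data[index];
}
```

The function still has a soundness precondition of its own: `data` and `len` must describe a real allocation. Without a safe-specifier there is no place in the declaration to say so.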
Separation of safe and unsafe functions is common in memory-safe
languages. Rust and C#[csharp] include an unsafe function specifier
and an unsafe-block construct. This is a human- and tooling-readable
tag for auditing potential origins of soundness defects. Aliasing and
lifetimes are transitive properties that must be recoverable from a
function declaration in order to be upheld. Safeness (the lack of
soundness preconditions) is another transitive property that must be
marked in a function declaration. The way to do that is with a
safe-specifier.
template< class RandomIt >
void sort( RandomIt first, RandomIt last );
Let's consider another example: the std::sort API that takes two
random-access iterators. This is an unsafe function because it
exhibits undefined behavior if called with the wrong arguments. But
there's nothing in the type system to indicate that it has soundness
preconditions, so the compiler doesn't know to reject calls in safe
contexts.
What are sort's preconditions?
* The first and last iterators must point at elements from the same
container.
* first must not indicate an element that appears after last.
* first and last may not be dangling iterators.
In the absence of enforced safeness information, it's up to the
user to follow the documentation and satisfy the requirements.
Guidance for calling unsafe functions is essentially "don't write
bugs."
void func(std::vector<int> vec1, std::vector<int> vec2) {
  // #1 - *Incorrectly* rejects correct call for mutable aliasing
  sort(vec1.begin(), vec1.end());
  // #2 - *Incorrectly* permits out-of-contract call.
  sort(vec1.begin(), vec2.end());
}
In the Profiles model, the correct call to sort #1 is rejected due to
mutable aliasing. That's bad, but permitting the out-of-contract call
#2 is worse, because it's a soundness bug. There's no realistic
static analysis technology to verify that a call to sort meets its
preconditions. Even the safety profile with the most conservative
aliasing setting lets this call through.
This is where safe and unsafe specifiers play an important role. From
the caller's perspective, sort is unsafe because it has preconditions
that must be upheld without the compiler's help. From the callee's
perspective, sort is unsafe because it's written with unsafe
operations. Pointer differencing computes a pivot for the sort, and
pointer differencing is undefined when its operands point to
different allocations.
// No safe-specifier means unsafe.
void sort(vector<int>::iterator begin, vector<int>::iterator end);
// A safe-specifier means it can only call safe functions.
void func(vector<int> vec1, vector<int> vec2) safe {
  // Ill-formed: sort is an unsafe function.
  // Averts potential undefined behavior.
  sort(vec1.begin(), vec2.end());
  unsafe {
    // Well-formed: call unsafe function from unsafe context.
    // Safety proof:
    // sort requires both iterators point into the same container.
    // Here, they both point into vec1.
    sort(vec1.begin(), vec1.end());
  }
}
The only way to enforce memory safety is to separate safe and unsafe
functions with a safe-specifier. In this example, func is safe
because it's defined for all valid inputs. It cannot call sort,
because that has soundness preconditions: the two iterators must
point into the same container. A call to sort in a safe context
leaves the program ill-formed, because the compiler cannot guarantee
that the preconditions are satisfied. But by entering an unsafe-block,
the user can prove the preconditions and make the unsafe call
without the compiler's soundness guarantees.
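The pattern of discharging a precondition once, inside a wrapper, can be approximated in standard C++. The sketch below assumes a hypothetical `sort_all` that takes the container itself, so the iterator pair is correct by construction.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical wrapper: both iterators come from the same container by
// construction, so callers cannot pass a mismatched pair.
void sort_all(std::vector<int>& vec) {
  std::sort(vec.begin(), vec.end());
  assert(std::is_sorted(vec.begin(), vec.end()));
}
```

What standard C++ cannot express is the other half of the design: nothing stops a caller from bypassing the wrapper and calling std::sort directly with mismatched iterators.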
[P3081R0] does float a [[suppress(profile)]] attribute to turn off
certain Profiles checks. It looks like the equivalent of an
unsafe-block. It may permit pointer operations in a definition, but
it doesn't address the other side of the call: without a
safe-specifier, how does the Profiles design deal with functions like
sort that are inherently unsafe? They must be separated from provably
safe functions. User intervention, wrapped up in unsafe-blocks, is
needed to satisfy their preconditions. Without this bump of impedance
the language cannot guarantee safety, as the property that a safe
function contains no undefined behavior is not transitively upheld.
(Compiler Explorer)
#include <algorithm>
#include <vector>
int main() {
  std::vector<int> v1, v2;
  v1.push_back(1);
  v2.push_back(2);
  // UB!
  std::sort(v1.end(), v2.end());
}
Program returned: 139
double free or corruption (out)
Program terminated with signal: SIGSEGV
The Safety Profiles reference implementation can't deal with unsafe
functions, because C++ doesn't know which functions are unsafe. This
out-of-contract call produces a heap double-free and then segfaults.
3 Lifetime safety is static typing
Do not add a feature that requires viral annotation.
- (Re)affirm design principles for future C++ evolution[P3466R0]
Rust's safety model incorporates lifetime arguments on every
reference (or struct with reference semantics) that occurs in a
function type. The authors of Profiles disparagingly call these
"viral annotations." Don't be scared. C++ has always been full of
viral annotations: types are viral annotations.
Types establish type safety properties that are enforced by both the
caller and callee. These properties are transitive (i.e. viral)
because they're enforced through any number of function calls,
creating a network of reasoning from the point where an object is
created to all of its uses.
Languages that treat types as viral annotations are statically-typed
languages. Languages that don't are dynamically-typed languages.
These have well-known trade-offs. Statically-typed languages exhibit
higher performance and provide more information to developers;
programs in a statically-typed language may be easier to reason
about. Dynamically-typed languages are much simpler and can be more
productive.
Lifetime parameters, which provide crucial information to the
compiler to enable rigorous safety analysis, define another axis of
typing. Rust has static lifetimes, which is a high-performance,
high-information approach to memory safety. Users can reason about
lifetimes and aliasing because those concepts are built into the
language. The compiler has sufficient information to rigorously
enforce lifetime safety with borrow checking.
Most other memory-safe languages use dynamic lifetimes, of which
garbage collection is an implementation. Instead of enforcing
lifetimes and exclusivity at compile time, the garbage collector
manages objects on the heap and extends their scope as long as there
are live references to them. This has the same basic trade-off as
dynamic typing: simplicity at the cost of performance.
               Static lifetimes   Dynamic lifetimes
Static types   Rust               Java, Go
Dynamic types  -                  JavaScript, Python
The static types/static lifetimes quadrant is a new area of language
design, at least for languages widely used in production. The
principles may be unfamiliar. Lifetime annotations feel different
than type annotations because they establish relationships between
parameters and return types rather than on individual parameters and
objects. Instead of answering the question "What are the properties
of this entity?" they answer "How does this entity relate to other
entities?".
Profiles fail because they reject, as a design principle, the
specific language improvements that provide necessary lifetime
information for compile-time safety.
4 Lifetime parameters don't cause soundness bugs
Annotations are distracting, add verbosity, and some can be wrong
(introducing the kind of errors they are assumed to help
eliminate).
- Profile invalidation - eliminating dangling pointers[P3446R0]
This is not right. In a memory-safe language you can't introduce
undefined behavior with mere coding mistakes. That's the whole point
of memory safety. If you put the wrong lifetime annotation on a
parameter, your program becomes ill-formed, not undefined. A mistaken
use of lifetime parameters can be an ergonomics bug, or it can mask
undefined behavior when wrapping an unsafe function in a safe
interface, but it can't cause undefined behavior.
(Compiler Explorer)
fn f1<'a, 'b>(x:&'a i32, y:&'b i32) -> &'b i32 {
  return x;
}
error: lifetime may not live long enough
 --> lifetime1.rs:5:10
  |
4 | fn f1<'a, 'b>(x:&'a i32, y:&'b i32) -> &'b i32 {
  |       --  -- lifetime `'b` defined here
  |       |
  |       lifetime `'a` defined here
5 |   return x;
  |          ^ function was supposed to return data with lifetime `'b` but it is returning data with lifetime `'a`
  |
  = help: consider adding the following bound: `'a: 'b`
Lifetime constraints are a contract between the caller and callee. If
either side violates the contract, the program is ill-formed. In the
code above, the lifetime constraints are violated by the callee. The
lifetime of the x parameter does not outlive the lifetime of the
returned reference. We used the wrong annotation, but instead of
leading to undefined behavior, the compiler produces a detailed
message that explains how the lifetime contract was not met.
(Compiler Explorer)
fn f2<'a, 'b>(x:&'a i32, y:&'b i32) -> &'b i32 {
  // Well-formed. The lifetime on y outlives the lifetime on
  // the return reference.
  return y;
}
fn f3() {
  let x = 1;
  let r:&i32;
  {
    let y = 2;
    r = f2(&x, &y);
  }
  // Ill-formed: r depends on y, which is out of scope.
  let z = *r;
}
error[E0597]: `y` does not live long enough
  --> lifetime2.rs:15:16
   |
14 |     let y = 2;
   |         - binding `y` declared here
15 |     r = f2(&x, &y);
   |                ^^ borrowed value does not live long enough
16 |   }
   |   - `y` dropped here while still borrowed
...
19 |   let z = *r;
   |           -- borrow later used here
Let's fix the implementation of the callee and test a broken version
of the caller. The returned reference depends on y, but it's used
after y goes out of scope. The compiler rejects the program and tells
us "y does not live long enough."
The use of lifetime annotations on parameters is the same as the use
of type annotations on parameters: it turns an intractable
whole-program analysis problem into an easy-to-enforce local-analysis
problem. Lifetime annotations, which exist to guarantee safety, do
not jeopardize safety.
5 C++ is too irregular for Profiles
Do not add a feature that requires heavy annotation. "Heavy"
means something like "more than 1 annotation per 1,000 lines of
code."
- (Re)affirm design principles for future C++ evolution[P3466R0]
We have an implemented approach that requires near-zero
annotation of existing source code.
- Pursue [P1179R1] as a Lifetime TS[P3465R0]
Central to Safety Profiles is the claim that annotations are
exceptional rather than the norm. For this to be true, the great bulk
of C++ would need to be written according to some preferred
convention. [P1179R1] chooses "no mutable aliasing" and constrains
reference return types to all reference parameters. Let's consider a
number of Standard Library functions and compare their aliasing and
exclusivity requirements to those conventions. Functions that don't
adhere to these conventions must be annotated, and those annotations
must be virally propagated up the stack to all callers, as aliasing
and lifetime requirements are transitive. Only functions that have no
soundness preconditions can be considered safe.
Let's start in <algorithm> and work through alphabetically,
indicating how functions deviate from the Safety Profile's aliasing
and lifetime conventions:
// Unsafe!
// Precondition: `first` and `last` must alias.
template< class InputIt, class UnaryPred >
bool all_of( InputIt first, InputIt last, UnaryPred p );
template< class InputIt, class UnaryPred >
bool any_of( InputIt first, InputIt last, UnaryPred p );
template< class InputIt, class UnaryPred >
bool none_of( InputIt first, InputIt last, UnaryPred p );
// Unsafe!
// Precondition 1: `first` and `last` must alias.
// Lifetime: The return type is not constrained by the lifetime of `value`
template< class InputIt, class T >
InputIt find( InputIt first, InputIt last, const T& value );
template< class InputIt, class UnaryPred >
InputIt find_if( InputIt first, InputIt last, UnaryPred p );
template< class InputIt, class UnaryPred >
InputIt find_if_not( InputIt first, InputIt last, UnaryPred q );
// Unsafe!
// Precondition 1: `first` and `last` must alias.
// Precondition 2: `s_first` and `s_last` must alias.
// Lifetime: The return type is not constrained by the lifetime of `s_first`
// or `s_last`.
template< class InputIt, class ForwardIt >
InputIt find_first_of( InputIt first, InputIt last,
ForwardIt s_first, ForwardIt s_last );
// Unsafe!
// Precondition 1: `first` and `last` must alias.
template< class ForwardIt >
ForwardIt adjacent_find( ForwardIt first, ForwardIt last );
// Unsafe!
// Precondition 1: `first1` and `last1` must alias.
// Lifetime: The returned InputIt1 is constrained only by `first1` and `last1`
// Lifetime: The returned InputIt2 is constrained only by `first2`.
template< class InputIt1, class InputIt2 >
std::pair<InputIt1, InputIt2> mismatch( InputIt1 first1, InputIt1 last1,
                                        InputIt2 first2 );
// Unsafe!
// Precondition 1: `first` and `last` must alias.
// Precondition 2: `s_first` and `s_last` must alias.
// Lifetime: The returned ForwardIt1 is constrained only by `first` and `last`
template< class ForwardIt1, class ForwardIt2 >
ForwardIt1 search( ForwardIt1 first, ForwardIt1 last, ForwardIt2 s_first,
ForwardIt2 s_last );
The functions in <algorithm> mostly involve iterators, which are
inherently unsafe. Additionally, the lifetime convention chosen by
Profiles is frequently wrong: the lifetime of a returned reference
rarely is constrained by the lifetimes of all its parameters. You'd
need annotations in all of these cases.
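A concrete instance of the find case, as a minimal sketch: `first_match` is a hypothetical wrapper with the same parameter shape as std::find. Its result points into the vector and stays usable after the temporary passed as `value` is gone, so a convention that ties the result to every reference parameter would flag valid code.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical wrapper with the same parameter shape as std::find.
// The returned iterator depends on vec only, never on value.
std::vector<int>::iterator first_match(std::vector<int>& vec,
                                       const int& value) {
  auto it = std::find(vec.begin(), vec.end(), value);
  assert(it == vec.end() || *it == value);
  return it;
}
```

A call such as `auto it = first_match(v, 3);` binds `value` to a temporary that expires at the end of the statement, yet writing through `it` on the next line is well-defined.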
Consider these conventions against the API for a container. Let's
look at <map>:
// Aliasing: the `key` parameter may alias `*this`.
// Lifetimes: the returned T& is only constrained by `*this` and not by `key`.
T& map::at( const Key& key );
T& map::operator[]( const Key& key );
// Aliasing: the `key` parameter may alias `*this`.
// Lifetimes: the returned iterator is only constrained by `*this` and not by
// `key`.
iterator map::find( const Key& key );
iterator map::lower_bound( const Key& key );
iterator map::upper_bound( const Key& key );
// Aliasing: the `value` parameter may alias `*this`.
// Lifetimes: the returned iterator is only constrained by `*this` and not by
// `value`.
std::pair<iterator, bool> map::insert( const value_type& value );
// Unsafe!
// Precondition 1: `pos` must point into `*this`
// Aliasing: the `value` parameter may alias `*this` or `pos`
// Lifetimes: The returned iterator is only constrained by `*this` and not by
// `value`.
iterator map::insert( iterator pos, const value_type& value );
// Aliasing: The `k` and `obj` parameters may alias `*this`.
// Lifetimes: The returned iterator is only constrained by `*this` and not by
// `k` or `obj`.
template< class M >
std::pair<iterator, bool> map::insert_or_assign( const Key& k, M&& obj );
// Unsafe!
// Precondition 1: `hint` must point into `*this`
// Aliasing: The `k` and `obj` parameters may alias `*this` and `hint`.
// Lifetimes: The returned iterator is only constrained by `*this` and not by
// `k` or `obj`.
template< class M >
iterator map::insert_or_assign( const_iterator hint, const Key& k, M&& obj );
These are only a few of the map APIs which would either be unsafe or
require annotations in the Profiles model. The conservative lifetime
convention gets most member functions wrong: a reference returned from a
member function is typically constrained only by the *this/self
parameter. That's what Rust's lifetime elision rules do. Regardless
of the convention chosen, expect annotations every time the function
does something different. With C++ code, it does something different
very often.
#include <map>
int main() {
  std::map<int, int> m;
  m[1] = 2;
  // Temporary 1 expires. Profiles considers `value` a dangling reference.
  int& value = m[1];
  // Profiles should flag this apparent use-after-free.
  value = 2;
}
The inability of Profiles to deal accurately with lifetimes means that an
implementation would reject much valid code. In this example the
subscript to map::operator[] is a temporary. It goes out of scope at
the end of the statement. Under the conservative Profiles lifetime
convention, the returned reference (stored in value) would be
considered a dangling reference and the subsequent use would make the
program ill-formed.
I do not believe that C++ code, with its countless unstated soundness
preconditions and inconsistent aliasing and lifetime requirements,
can be made memory safe with fewer than "1 annotation per 1,000 lines
of code." In fact, legacy C++ code will have many more annotations
than equivalent Rust code. Rust often chooses object relocation to
pass parameters by value rather than pass them by reference. This
reduces the number of lifetime constraints that the system deals
with. Additionally, it has simpler, safe versions of facilities which
are unsafe in C++: the Rust iterator, for example, keeps both the
data pointer and length in the same struct to completely alleviate
the aliasing concerns that prevent safety analysis in C++.
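The single-struct shape is easy to sketch in C++. `IntSlice`, `slice_of` and `contains` below are hypothetical names used for illustration; C++20's std::span has the same pointer-plus-length layout.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a Rust-style slice: pointer and length travel
// together, so there is no second iterator that could come from elsewhere.
struct IntSlice {
  const int* data;
  std::size_t len;
};

IntSlice slice_of(const std::vector<int>& vec) {
  return IntSlice{vec.data(), vec.size()};
}

// Compare with find(first, last, value): there is no "first and last must
// alias" precondition left to document.
bool contains(IntSlice s, int value) {
  assert(s.data != nullptr || s.len == 0);
  for (std::size_t i = 0; i < s.len; ++i)
    if (s.data[i] == value)
      return true;
  return false;
}
```

This removes the same-container precondition, but in C++ the slice can still dangle if the vector is resized or destroyed. Ruling that out is the part that needs lifetime information.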
5.1 C++ cannot enforce exclusivity
The density of annotations required to vet existing code is not the
biggest problem facing Profiles. C++ overload resolution has created
a knot that cannot be untangled. Its standard conversion rules are
one reason why C++ is considered inherently unsafe.
For many accessor-style C++ APIs, there are two overloads:
1. A candidate that binds a const object and returns a const
reference (or pointer or iterator).
2. A candidate that binds a mutable object and returns a mutable
reference (or pointer or iterator).
If the mutable candidate can be chosen, it is chosen, no matter what
the result object is used for.
void f1(const int& x, const int& y);
void f2(std::vector<int> vec) {
  // The mutable overload of operator[] is called here.
  f1(vec[0], vec[1]);
}
This code will not pass an exclusivity test. vec is a mutable object,
so vec[0] calls the mutable version of operator[] and produces a
mutable reference result object. While that mutable loan is in scope
(it remains in scope until f1 returns), vec[1] calls the mutable
version of operator[] to produce its mutable reference result object.
But you're not allowed more than one mutable reference to the same
place. This is an exclusivity error!
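The overload choice is easy to observe in standard C++. The sketch below uses a hypothetical `Probe` type that counts which operator[] overload runs, and a `sum` function that only reads its arguments, as f1 does.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical two-element container that counts which overload runs.
struct Probe {
  int data[2] = {10, 20};
  int mutable_calls = 0;
  mutable int const_calls = 0;

  int& operator[](std::size_t i) {
    assert(i < 2);
    ++mutable_calls;
    return data[i];
  }
  const int& operator[](std::size_t i) const {
    assert(i < 2);
    ++const_calls;
    return data[i];
  }
};

// Only reads its arguments, like f1 above.
int sum(const int& x, const int& y) { return x + y; }
```

Calling `sum(p[0], p[1])` on a non-const `Probe p` runs the mutable overload twice, even though both results are only read. The const overload is selected only when the object expression is itself const, for example through a `const Probe&`.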
Rust avoids this problem in two ways:
* In general there is no function overloading. As a convention, if
there are mutable and const versions of a function, the mutable
one is named with a _mut suffix.
* There is syntax sugar which maps subscript operations to either
index or index_mut. The latter is chosen in a mutable context,
which is the left-hand side of an assignment.
We can't ditch function overloading and remain C++. But we can change
how overload resolution evaluates candidates. The standard conversion
is responsible for binding references to expressions. C++ chooses the
wrong (for safety purposes) subscript candidate because the standard
conversion is able to bind mutable references to lvalue expressions.
(Compiler Explorer)
void f3(const int^ x, const int^ y) safe;
int main() safe {
  std2::vector<int> vec { };
  // Okay.
  f3(vec[0], vec[1]);
  // Ill-formed: mutable borrow of vec between its mutable borrow and its use.
  f3(mut vec[0], mut vec[1]);
}
safety: during safety checking of int main() safe
  borrow checking: example.cpp:13:22
    f3(mut vec[0], mut vec[1]);
                       ^
  mutable borrow of vec between its mutable borrow and its use
  loan created at example.cpp:13:10
    f3(mut vec[0], mut vec[1]);
           ^
Safe C++ changes the standard conversion to work around this language
defect. In this extension, standard conversions do not bind mutable
references. vec[0] chooses the const candidate, which permits
aliasing, and mut vec[0] chooses the mutable candidate, which does
not. By opting in to mutation, you get aliasing by default.
(Compiler Explorer)
#feature on safety
int main() safe {
  int x = 1;
  int^ ref = x; // Ill-formed! Can't bind mutable reference to lvalue.
}
error: example.cpp:5:14
  int^ ref = x;
             ^
cannot implicitly bind borrow int^ to lvalue int
The mut keyword[mutation] puts the subexpression into the mutable
context and restores the restricted functionality. In the mutable
context, the compiler will bind mutable references to expression:
(Compiler Explorer)
#feature on safety
int main() safe {
  int x = 1;
  int^ ref = mut x; // Ok. Can bind mutable references in mutable context.
}
Now, the const overload of a function is chosen unless the user
escapes with the mut keyword. This addresses a language defect
head-on.
What option does Profiles have? In its full generality, the mutable
binding default makes for an exceptionally thorny analysis problem.
Does Profiles replace calls to mutable candidates with calls to
similarly-named const candidates? That's a presumption. Does it
retroactively classify mutable loans as shared loans depending on
usage? I'm not a soundness maverick. This is getting close to
touching a live wire.
Legacy C++ errs on the side of mutability, making it too
unconstrained to test for soundness. Old code is what it is.
6 Carcinization
The development of new product lines for use in service of
critical infrastructure or NCFs (national critical functions) in
a memory-unsafe language (e.g., C or C++) ... is dangerous and
significantly elevates risk to national security, national
economic security, and national public health and safety.
- CISA, Product Security Bad Practices[cisa]
[P3466R0] insists that "we want to make sure C++ evolution ... hews to
C++'s core principles." But these are bad principles. They make C++
extra vulnerable to memory safety defects that are prevented in
memory-safe languages. The US Government implicates C++'s core
principles as a danger to national security and public health.
                Static lifetimes   Dynamic lifetimes
Static types    Rust               Java, Go
Dynamic types   -                  JavaScript, Python
Reconsider this table. We want to evolve C++ to live in the static
types/static lifetimes quadrant. Since Rust is the only species in
that design family (at least among production languages), a new entry
is necessarily going to resemble Rust (at least in its memory safety
treatment) more than it does other languages. An earnest effort to
pursue [P1179R1] as a Lifetime TS[P3465R0] will compromise on C++'s
outdated and unworkable core principles and adopt mechanisms more
like Rust's. In the compiler business this is called carcinization: a
tendency of non-crab organisms to evolve crab-like features.
* Standard C++ doesn't have aliasing information. We need a new
reference type that upholds the "mutation XOR aliasing" rule as a
program-wide invariant.
* Standard C++ doesn't have lifetime information. We need lifetime
parameters to indicate constraint relationships between function
parameters and return references.
* Safety is a transitive property. It has to be upheld with a
safe-specifier on functions to establish the absence of soundness
preconditions and an unsafe-block to call unsafe operations.
* Lifetime constraints are a transitive property. They must be
upheld by both caller and callee as viral annotations.
* Lifetime constraints on functions do not follow any particular
convention. Constraints that deviate from a default (such as the
lifetime elision rules) require annotation, even heavy
annotations that may exceed 1 per 1,000 lines of code.
* The standard conversion rules make exclusivity enforcement
impossible. We have to change the language default, establishing
no implicit mutation in order to support aliasing in functions
that take const references.
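The first two items in this list can be illustrated in Standard C++. This is a minimal sketch under stated assumptions, not code from any of the cited papers: first, smaller, add_twice and the two call_ helpers are hypothetical names, and std::is_same_v assumes C++17. It shows two functions with different lifetime relationships that have the same type, and a function taking two mutable references that accepts the same object twice.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical: the result only ever refers to `a`.
const int& first(const int& a, const int& /*b*/) { return a; }

// Hypothetical: the result may refer to either argument.
const int& smaller(const int& a, const int& b) { return b < a ? b : a; }

// The two functions have the same type, so a caller that sees only the
// declaration cannot tell which argument the result is constrained by.
static_assert(std::is_same_v<decltype(first), decltype(smaller)>,
              "the declarations carry no lifetime information");

// Nothing in this declaration says whether `a` and `b` may alias.
void add_twice(int& a, int& b) {
  a += b;
  a += b;
}

// Distinct objects: x becomes 1 + 2 + 2.
int call_distinct() {
  int x = 1, y = 2;
  add_twice(x, y);
  return x;
}

// Same object passed twice: the first addition changes `b` as well,
// so x becomes 1 + 1 = 2 and then 2 + 2 = 4.
int call_aliased() {
  int x = 1;
  add_twice(x, x);
  return x;
}
```

Both calls to add_twice are well-formed Standard C++, and the static_assert passes. The information that would distinguish these cases has to come from somewhere other than the existing declarations.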
7 C++ in the future
I think it is worth pursuing this compatible path first before,
or at least at the same time as, trying to graft another foreign
language's semantics onto C++ which turns C++ into "something
else" and/or build an off-ramp from C++.
- Pursue [P1179R1] as a Lifetime TS[P3465R0]
Who does this provincialism serve? The latest Android security study
"prioritizes transitioning to memory-safe languages."[android-security]
The off-ramp from C++ is an increasingly viable and
attractive strategy for projects looking to reduce CVE exposure. The
off-ramp is happening and its benefits are measurable. As the Android
study observes, "once we turn off the tap of new vulnerabilities,
they decrease exponentially, making all of our code safer."
All focus should be on turning off the tap of new vulnerabilities.
Incorporating Rust's safety model into C++ helps in two ways:
1. It provides an off-ramp from unsafe C++ to Safe C++ within a
single toolchain. Projects can follow best practices for Safe
Coding[safe-coding] without retraining the whole engineering
staff in a new programming language.
2. It can hasten the migration to Rust by improving C++/Rust
interop. By extending C++ with representations of all Rust
constructs that can appear in function declarations (such as Rust
enums, borrows and lifetimes, ZSTs, traits, etc.) the number of
common vocabulary types is greatly increased. This allows interop
tooling to map between C++ and Rust declarations at a more
expressive level than the current C-level API.
C++ can be made memory safe, but not by dismissing everything that
works, which is what the authors of Safety Profiles do. The language
must evolve to be more explicit in how it expresses aliasing,
lifetime and safeness properties. C++ can meet the security needs of
its users, both in a principal role, and, for those projects
determined to take the off-ramp, in an important supporting role.
8 References
[android-security] Eliminating Memory Safety Vulnerabilities at the
Source.
https://security.googleblog.com/2024/09/eliminating-memory-safety-vulnerabilities-Android.html?m=1
[cisa] Product Security Bad Practices.
https://www.cisa.gov/resources-tools/resources/product-security-bad-practices
[csharp] unsafe (C# Reference).
https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/keywords/unsafe
[mutation]
https://safecpp.org/draft.html#explicit-mutation
[P1179R1] Herb Sutter. 2019-11-22. Lifetime safety: Preventing common
dangling.
https://wg21.link/p1179r1
[P3081R0] Core safety Profiles: Specification, adoptability, and
impact.
https://isocpp.org/files/papers/P3081R0.pdf
[P3446R0] Profile invalidation - eliminating dangling pointers.
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p3446r0.pdf
[P3465R0] Pursue P1179 as a Lifetime TS.
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p3465r0.pdf
[P3466R0] (Re)affirm design principles for future C++ evolution.
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p3466r0.pdf
[safe-coding] Secure by Design: Google's Perspective on Memory
Safety.
https://storage.googleapis.com/gweb-research2023-media/pubtools/7665.pdf
[type-and-resource-safety-2015] A brief introduction to C++'s model
for type- and resource-safety.
https://www.stroustrup.com/resource-model.pdf