Subj : 'volatile' Rules
To   : comp.programming.threads
From : Uenal Mutlu
Date : Tue Jun 07 2005 12:03 pm

'volatile' Rules in C/C++ MT programs:
--------------------------------------

Facts about 'volatile':
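 f1) volatile disables optimizations on accesses to the given variable
     (every read and write in the source must actually be performed;
     the value may not simply be kept in a register).
 f2) some measurements show that its use leads to a performance penalty
     of about 35% or more.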

Below I've tried to formulate a rule for when volatile can safely be
omitted, to get maximum performance while staying on the safe side.

Rule: volatile is not necessary if (a OR (b AND c AND d)) holds:
  a) variable is used in a locked state (i.e. protected via mutex etc.).
  b) size of variable <= machine word.
  c) variable is aligned on machine word boundary
     (i.e. its start address is an integral multiple of machine word size).
  d) within the same func there is at least 1 "funccall" before
     subsequent queries to, or modifications of, the variable.
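
Conditions b and c can be checked mechanically. Below is a minimal C++
sketch, assuming "machine word" means the size of a pointer
(std::uintptr_t); the names kWordSize, fits_in_word, is_word_aligned and
g_ready are made up for this illustration. It only tests size and
alignment; it says nothing about condition d or about visibility
between threads.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: one "machine word" is as wide as a pointer. The rule above
// does not define the term more precisely.
constexpr std::size_t kWordSize = sizeof(std::uintptr_t);

// Condition b: the object is no larger than one machine word.
template <typename T>
constexpr bool fits_in_word() { return sizeof(T) <= kWordSize; }

// Condition c: the start address is an integral multiple of the word size.
template <typename T>
bool is_word_aligned(const T* p) {
    return reinterpret_cast<std::uintptr_t>(p) % kWordSize == 0;
}

// Hypothetical shared flag, forced onto a word boundary with alignas.
alignas(sizeof(std::uintptr_t)) int g_ready = 0;

// Condition b can be verified at compile time.
static_assert(fits_in_word<int>(), "condition b fails for int");

// Condition c depends on the address, so it is checked at run time.
void check_ready_flag() {
    assert(is_word_aligned(&g_ready));
}
```

Without the alignas, an int would normally be aligned to its own size
only, which on a 64-bit machine need not be a word boundary in the sense
of condition c.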

Regarding d:
  Clear:
    c1) calling an operator for a primitive type is NOT a "funccall".

  Still unclear (in this context):
    u1) is calling an inline func a "funccall"? [yes]
    u2) is calling an operator for a non-primitive type a "funccall"? [yes]
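
The three cases c1, u1 and u2 look like this in source code. This is
only a sketch to pin down the terminology; Counter, bump and demo are
hypothetical names. Whether an inlined call still counts as a "funccall"
in the sense of d is exactly the open point, since the compiler may
expand it at the call site.

```cpp
#include <cassert>

// Hypothetical non-primitive type, for illustration only.
struct Counter {
    int value = 0;
    // u2: operator of a non-primitive type; syntactically a function call.
    Counter& operator+=(int n) { value += n; return *this; }
};

// u1: an inline function; the compiler is free to expand it in place.
inline int bump(int x) { return x + 1; }

int demo() {
    int i = 0;
    i += 1;        // c1: built-in operator on a primitive type, no call
    i = bump(i);   // u1: call to an inline function
    Counter c;
    c += i;        // u2: call to a user-defined operator
    return c.value;
}
```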


Reference citations from a discussion in comp.lang.c++.moderated
(thread "passing data between threads"):

rc1)  [...] Posix defines the semantics of C with regards to threading in such a way
  that a compiler cannot maintain variables accessible from another thread
  in registers across calls to a certain number of functions, including all
  functions which acquire a lock.  If the compiler is Posix conform,
  volatile isn't necessary [...]
  (James Kanze)

rc2)  [...] Locking mutexes (and misleadingly named critical sections) impose a memory
  barrier and is a function call. A compiler can not assume state of variables
  after a function call, so it rereads them from memory no matter if they
  are declared volatile or not. So, you don't need volatile in this case [...]
  (Maxim Yegorushkin)

rc3)  [...] The point is that when you do the locking (or any other operation that
  guarantees visibility of changes), then, well, the changes are
  guaranteed to be visible. You don't need to do any further tricks.
  Remember that in order to write programs compliant with some MT standard
  (like POSIX), then everything you use needs to be aware of this standard
  - not only the OS, but also the compiler, run-time library, everything.
  This means that the compiler cannot optimize accesses across locking
  operations. Either it is dumb enough not to optimize across any function
  call or it is smart enough to recognize that what you are doing is a
  synchronization operation with some guarantees that cannot be bypassed.
  So, you don't need to do anything to ensure that the variable will be
  re-read after entering the locked region [...]
  (Maciej Sobczak)
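
What rc1 to rc3 describe (case a of the rule) can be sketched as
follows. The sketch assumes the C++11 facilities std::mutex and
std::thread as stand-ins for the POSIX pthread_mutex_* calls the quoted
posters have in mind; g_lock, g_counter, add_many and run_two_writers
are hypothetical names. The shared counter is deliberately not declared
volatile; every access happens while the mutex is held.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Assumption: std::mutex gives the same kind of visibility guarantee the
// quoted posters ascribe to POSIX locking functions.
std::mutex g_lock;
long g_counter = 0;  // shared between threads, NOT volatile

// Increment the shared counter n times, taking the lock for each access.
void add_many(int n) {
    for (int i = 0; i < n; ++i) {
        std::lock_guard<std::mutex> guard(g_lock);
        ++g_counter;
    }
}

// Run two writer threads and return the final counter value.
long run_two_writers(int n) {
    {
        std::lock_guard<std::mutex> guard(g_lock);
        g_counter = 0;
    }
    std::thread a(add_many, n);
    std::thread b(add_many, n);
    a.join();
    b.join();
    std::lock_guard<std::mutex> guard(g_lock);  // read under the lock too
    return g_counter;
}
```

With n = 10000 the function should return 20000. Note that the result
comes from the locking, so adding volatile to g_counter would not make
an unlocked version of this code correct.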


Constructive fixes and additions welcome.

U.Mutlu
