Subj : Re: Memory Barriers, Compiler Optimizations, etc.
To   : comp.programming.threads
From : SenderX
Date : Wed Feb 02 2005 12:47 am

> So I have two questions about this. First, are acquire/release properly
> part of loads/stores, or does it make sense for them to be standalone? If
> the former, how are programmers in languages like C/C++ expected to make
> the association between reads/writes and memory barriers?

Ok. Well, acquire/release semantics work very well for confining the
visibility of loads and stores to a critical section. They can also be used
to send objects between processors via a producer/consumer relationship.

You should wrap up the load/store and membar in a single function. These
functions will work for this kind of stuff most of the time; you need to
study the specs for your specific compiler...

I will use Alex's notation for the code... ;)


/* sink-store barrier
   extern void* ac_cpu_i686_mb_store_ssb ( void**, void* ) */
align 16
ac_cpu_i686_mb_store_ssb PROC
  mov ecx, [esp + 4]    ; ecx = destination address
  mov eax, [esp + 8]    ; eax = value to store (also the return value)
  sfence                ; earlier stores are ordered before the store below
  mov [ecx], eax
  ret
ac_cpu_i686_mb_store_ssb ENDP


/* hoist-load barrier with dd "hint"
   extern void* ac_cpu_i686_mb_load_ddhlb ( void** ) */
align 16
ac_cpu_i686_mb_load_ddhlb PROC
  mov ecx, [esp + 4]    ; ecx = source address
  mov eax, [ecx]        ; eax = loaded value (the return value)
  lfence                ; the load above is ordered before later loads
  ret
ac_cpu_i686_mb_load_ddhlb ENDP


/* classic release (slb+ssb -- see below)
   extern void* ac_cpu_i686_mb_store_rel ( void**, void* ) */
align 16
ac_cpu_i686_mb_store_rel PROC
  mov ecx, [esp + 4]
  mov eax, [esp + 8]
  mfence                ; earlier loads and stores are ordered before the store below
  mov [ecx], eax
  ret
ac_cpu_i686_mb_store_rel ENDP


/* acquire with data dependency
   extern void* ac_cpu_i686_mb_load_ddacq ( void** ) */
align 16
ac_cpu_i686_mb_load_ddacq PROC
  mov ecx, [esp + 4]
  mov eax, [ecx]
  mfence                ; the load above is ordered before later loads and stores
  ret
ac_cpu_i686_mb_load_ddacq ENDP


/* DCL pseudo-code using fine-grain barriers */

static T *shared = 0;

T *local = ac_cpu_i686_mb_load_ddhlb( &shared );
if ( ! local )
{ ac_mutex_lock( &static_mutex );
  if ( ! ( local = shared ) )
  { local = ac_cpu_i686_mb_store_ssb( &shared, new T ); }
  ac_mutex_unlock( &static_mutex ); }


/* DCL pseudo-code using coarse barriers */

static T *shared = 0;

T *local = ac_cpu_i686_mb_load_ddacq( &shared );
if ( ! local )
{ ac_mutex_lock( &static_mutex );
  if ( ! ( local = shared ) )
  { local = ac_cpu_i686_mb_store_rel( &shared, new T ); }
  ac_mutex_unlock( &static_mutex ); }


See how memory barriers can be embedded in the correct place within loads
and stores to create a sort of producer/consumer relationship wrt common
shared data?

Also, combining all of this in a single externally assembled function can
cut down on the chances of a rogue compiler reordering your
"critical-sequence" under your nose, and your application crashing seven or
eight months down the line from some mystery race-condition... ;)
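
For comparison, here is a rough sketch of the same DCL shape written against C++11 std::atomic, which postdates this post and is not what the assembly above uses. It swaps the hand-written barrier functions for a load-acquire and a store-release, so the barrier still travels with the load or store it belongs to. The payload struct T, the constructions counter and the race() driver are made up purely for illustration. Note that acquire is stronger than the data-dependency "hint" in the ddhlb load; I use it here only because it is the simplest portable choice.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical payload type, standing in for T in the pseudo-code.
struct T { int value = 42; };

static std::atomic<T*> shared{nullptr};
static std::mutex static_mutex;
static std::atomic<int> constructions{0};  // illustration only: counts "new T"

T* get_instance() {
    // Load + acquire in one operation: if a non-null pointer is seen, the
    // constructed T published by the release store below is visible too.
    T* local = shared.load(std::memory_order_acquire);
    if (!local) {
        std::lock_guard<std::mutex> lock(static_mutex);
        // The mutex already orders this second check against other lockers.
        local = shared.load(std::memory_order_relaxed);
        if (!local) {
            local = new T;
            constructions.fetch_add(1, std::memory_order_relaxed);
            // Release + store in one operation: the construction of T is
            // ordered before the pointer becomes visible to other threads.
            shared.store(local, std::memory_order_release);
        }
    }
    return local;
}

// Hypothetical driver: races n_threads callers and reports how many times
// T was actually constructed.
int race(int n_threads) {
    std::vector<std::thread> threads;
    for (int i = 0; i < n_threads; ++i) {
        threads.emplace_back([] {
            T* p = get_instance();
            assert(p->value == 42);  // consumer sees a fully built object
        });
    }
    for (auto& t : threads) t.join();
    return constructions.load();
}
```

Calling race(8) should construct T exactly once no matter which thread wins the lock. The instance is deliberately never freed, same as in the pseudo-code above.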