Another look at Peterson

December 6, 2008

I had another read of Bartosz Milewski's recent blog post about the different memory orderings of C++0x.

The most interesting thing about the blog post (that I missed last time) was actually the proposed Peterson locking algorithm that used some non-default C++0x memory orderings for the stores and loads of its atomic variables. It turns out that his proposed implementation is broken. Yup, that's right: even experts on concurrency can get lost when straying from the straight and narrow road of the default sequential memory ordering. Exactly why the implementation is broken, and another implementation that actually works (by Dmitriy V'jukov), are explained (or even proven) on Anthony Williams' blog.

Anthony's blog post was a very informative read. I probably read it ten times until I (think I) understood why one implementation was broken and the other wasn't. Lots of "aha – no wait a minute – oh OK, I think I see what you did there" moments. Reasoning about concurrency is a lot like peeling an onion: there is always another layer underneath that you didn't know about.

(The problem, as I understand it, was that the atomic variable "victim" was read and written by both threads, and one needed to make sure that the writes that happened before on the first thread were synchronized with the reads that happened after on the second thread. Because the memory ordering for the write on the first thread was release only, and the read on the second thread was acquire only, the second thread could read a stale value from its cache even though the variable had already been overwritten by the first thread.)
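For reference, here is a minimal sketch of the two-thread Peterson lock written with the default sequentially consistent ordering throughout, which avoids the problem entirely. The names and structure are my own, not the code from the posts.

    #include <atomic>

    // A minimal Peterson lock for exactly two threads (ids 0 and 1), using the
    // safe default memory_order_seq_cst on every load and store. My own sketch,
    // not the implementation discussed in the linked posts.
    class peterson_lock {
        std::atomic<bool> flag[2];  // flag[i]: thread i wants to enter
        std::atomic<int> victim;    // which thread yields if both want in
    public:
        peterson_lock() : victim(0) {
            flag[0] = false;
            flag[1] = false;
        }

        void lock(int me) {         // me is 0 or 1
            int other = 1 - me;
            flag[me] = true;        // announce intent to enter
            victim = me;            // politely let the other thread go first
            // wait while the other thread wants in and it is my turn to yield
            while (flag[other] && victim == me)
                ; // spin
        }

        void unlock(int me) {
            flag[me] = false;
        }
    };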

Writing correct concurrent code is an interesting but complex task, and anyone claiming otherwise is probably writing broken code without realizing it. I guess the thing to take away from all this is that unless you can prove that it's okay to use a more relaxed memory model than the sequentially consistent one, you really shouldn't.

Here is a short description of the different memory orderings for concurrency in C++0x, based on Bartosz Milewski's blog post about C++ atomics and memory ordering.

The default for atomic variables is the familiar sequential ordering (memory_order_seq_cst); it's also the slowest, as it inserts memory fences that prohibit the CPU from reordering code as it sees fit. Between that and no ordering (memory_order_relaxed), there are also orderings that just insert fences on reads (memory_order_acquire), on writes (memory_order_release), and on both reads and writes (memory_order_acq_rel). Being able to choose a memory order other than the safe and slower default is a nice feature, and corresponds well with the C++ design rationale of not having to pay for what you don't use. Java volatiles apparently work much like C++0x atomics with the default sequential ordering, except Java volatiles are not atomic (there is a Java library for that).
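As a quick illustration (my own sketch, not from the post), here is the typical way a release store pairs with an acquire load to publish data from one thread to another:

    #include <atomic>
    #include <thread>
    #include <cassert>

    std::atomic<bool> ready(false);
    int payload = 0;

    void producer() {
        payload = 42;                                   // ordinary write
        ready.store(true, std::memory_order_release);   // publish: everything above is visible...
    }

    void consumer() {
        while (!ready.load(std::memory_order_acquire))  // ...to whoever acquires the flag here
            ; // spin until published
        assert(payload == 42);                          // guaranteed by the release/acquire pair
    }

    int main() {
        std::thread t1(producer), t2(consumer);
        t1.join();
        t2.join();
    }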

Herb Sutter posted an excellent new Effective Concurrency article: Measuring Parallel Performance: Optimizing a Concurrent Queue. It's surprising to see how much scalability can be gained by wasting space on padding, to make sure independent variables get to live on different cache lines.
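The trick, as I understand it, is simply to pad the hot variables apart so two cores never fight over the same line. A small sketch, assuming a 64-byte cache line (the size varies by CPU):

    #include <atomic>

    // Two counters hammered by different threads. Without the padding they would
    // likely share a cache line, and every increment on one core would invalidate
    // the line on the other core (false sharing). The 64 bytes is an assumption.
    struct queue_counters {
        std::atomic<long> produced;
        char pad[64 - sizeof(std::atomic<long>)];   // push 'consumed' onto its own line
        std::atomic<long> consumed;
    };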

I have been reading Herb Sutter's Effective Concurrency articles over the last couple of weeks. I just finished reading the latest one. I think he will compile them into a book when he has enough of them (there are about 17 at the moment). I'm sure I'll buy the book once it's out, as I already own all his other books.

What is clear is that concurrent programming has hit the mainstream, and it's here to stay whether we like it or not. Tomorrow's PCs might not have faster cores than today's, but they might have a hundred times more of them. Programming such a beast correctly, while keeping your sanity, will not be an easy task. Although it's an interesting subject, it comes as a bit of a surprise to me how low-level concurrent programming still is today. For parallel C++ programs it's not just the order of the code as you write it that matters; not even the rearranged order produced by the compiler can be counted on. It's the order of the code actually being executed on the cores that makes all the difference to the correctness of the program. All this rearranging behind the scenes by the compiler and the CPU is done to optimize sequential execution, but it makes life very difficult for concurrent programmers. To sequential programs the order looks the same on all three levels, but parallel programs will observe the different orderings, which means more potential bugs and more complex logic to get right. It's like a step back down into assembler programming to get the ordering and synchronization right, or even a step one level below assembler, down to the actual machine code executing on the CPU.
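Here is a small sketch (mine, not from any of the articles) of how the executed order can differ from the written order: with relaxed atomics, both threads can read 0 even though each of them writes before it reads.

    #include <atomic>
    #include <thread>

    std::atomic<int> x(0), y(0);
    int r1 = 0, r2 = 0;

    void thread1() {
        x.store(1, std::memory_order_relaxed);
        r1 = y.load(std::memory_order_relaxed);  // the CPU may effectively do this load first
    }

    void thread2() {
        y.store(1, std::memory_order_relaxed);
        r2 = x.load(std::memory_order_relaxed);  // likewise here
    }

    int main() {
        std::thread a(thread1), b(thread2);
        a.join();
        b.join();
        // r1 == 0 && r2 == 0 is a legal outcome, although no interleaving of the
        // source lines as written allows it; with the seq_cst default it cannot happen.
    }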

The C++0x standard will address some of this, in that C++ will finally get a defined memory model. This will basically limit how the compiler and the CPU may rearrange the code you write. Then there will be a new keyword, atomic, that forces the compiler to play nice and synchronize reads and writes to a given variable between all the cores. Then there will be threads, thread-local storage, locks, futures, etc. But it's all still very low-level. Hopefully higher-level libraries and patterns will emerge that make writing deadlock-free, data-race-free, scalable programs easier.
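For example (a sketch of my own using the proposed facilities), an atomic counter shared by two std::threads stays correct across all cores without any explicit lock:

    #include <atomic>
    #include <thread>
    #include <iostream>

    std::atomic<int> hits(0);   // every increment is a synchronized read-modify-write

    void worker() {
        for (int i = 0; i < 100000; ++i)
            ++hits;             // with a plain int this would be a data race
    }

    int main() {
        std::thread a(worker), b(worker);
        a.join();
        b.join();
        std::cout << hits << "\n";   // always prints 200000
    }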

In my humble opinion it's a bit unfortunate that once a new thread is spawned, the language and compiler assume that all data and functions are safe to use without synchronization. You have to explicitly specify and single out the variables and functions that have to be synchronized, instead of the ones that don't. Well, that's just my two cents. The switch in programming styles, patterns and tools to accommodate the concurrency revolution will surely make for an interesting decade. If anything is going to kill C++ it might be that parallel programs are just too complex to get right in a language that allows shared state and imperative programming styles. Then again, if that's the case, then most of today's mainstream languages will get killed even worse.

Anyways, here are Sutter's Effective Concurrency articles.