The most interesting explanation of rvalue references yet:

Want to know why std::move is needed? As I understand it, it’s needed because as soon as you give an rvalue reference a name, it is treated as an lvalue! This is probably done to make it harder for you to shoot yourself in the foot (you would not want to keep using a named rvalue reference after its value had already been moved away). So there needs to be a way to get back to an rvalue reference from an lvalue. That is what the move function is all about: std::move() simply casts any lvalue or rvalue to an rvalue reference, in effect stripping away its name.
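A minimal sketch of that naming rule, as far as I understand it. The functions category, named, moved and demo_move are made-up names for illustration only; std::move is the only library call involved.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical overload pair that reports which kind of argument it received.
std::string category(const std::string&) { return "lvalue"; }
std::string category(std::string&&)      { return "rvalue"; }

// Inside this function `s` has a name, so the expression `s` is an lvalue,
// even though its declared type is an rvalue reference.
std::string named(std::string&& s) { return category(s); }

// std::move casts the named reference back to an rvalue.
std::string moved(std::string&& s) { return category(std::move(s)); }

// Small self-check of the two cases above.
void demo_move() {
    assert(named(std::string("a")) == "lvalue");
    assert(moved(std::string("a")) == "rvalue");
}
```

Note that std::move itself moves nothing. It only changes which overload gets picked, and the move happens in whatever constructor or assignment operator receives the rvalue.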


Another look at Peterson

December 6, 2008

I had another read of Bartosz Milewski’s recent blog post about the different memory orderings of C++0x.

The most interesting thing about the blog post (that I missed last time) was actually the proposed Peterson locking algorithm, which used some non-default C++0x memory orderings for the stores and loads of its atomic variables. It turns out that his proposed implementation is broken. Yup, that’s right, even some of the experts on concurrency can get lost when straying from the straight and narrow road of default sequential memory ordering. Exactly why the implementation is broken, and another implementation that actually works (by Dmitriy V’jukov), is explained (or even proven) on Anthony Williams’ blog.

Anthony’s blog post was a very informative read. I probably read it ten times until I (think I) understood why one implementation was broken and the other wasn’t. Lots of “aha – no wait a minute – oh OK, I think I see what you did there” moments. Reasoning about concurrency is a lot like peeling an onion: there is always another layer underneath that you didn’t know about.

(The problem, as I understand it, was that the atomic variable “victim” was read and written by both threads, and one needed to make sure that the writes that happen before on the first thread were synchronized with the reads that happen after on the second thread. Because the memory ordering for the write on the first thread was release only, and the read on the second thread was acquire only, the second thread could still read a stale value of the variable even though it had already been overwritten by the first thread.)

Writing correct concurrent code is an interesting but complex task, and anyone claiming otherwise is probably writing broken code without realizing it. I guess the thing to take from all this is that unless you can prove that it’s okay to use a more relaxed memory model than the sequentially consistent one, you really shouldn’t.
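For reference, here is a rough sketch of what a two-thread Peterson lock looks like when every atomic operation is left on the default sequentially consistent ordering. This is my own illustration, not the code from either blog post. The names PetersonLock, interested_ and run_counter_demo are made up, and I wrote it against the std::atomic and std::thread interfaces.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Peterson lock for exactly two threads with ids 0 and 1.
// Every load and store uses the default memory_order_seq_cst.
class PetersonLock {
    std::atomic<bool> interested_[2];
    std::atomic<int>  victim_;
public:
    PetersonLock() : victim_(0) {
        interested_[0].store(false);
        interested_[1].store(false);
    }
    void lock(int me) {
        assert(me == 0 || me == 1);
        const int other = 1 - me;
        interested_[me].store(true);  // announce that we want the lock
        victim_.store(me);            // volunteer to wait if both want it
        while (interested_[other].load() && victim_.load() == me) {
            std::this_thread::yield();  // spin until the other thread is done
        }
    }
    void unlock(int me) { interested_[me].store(false); }
};

// Two threads increment a plain int, protected only by the lock.
int run_counter_demo() {
    PetersonLock lock;
    int counter = 0;
    auto work = [&](int id) {
        for (int i = 0; i < 10000; ++i) {
            lock.lock(id);
            ++counter;
            lock.unlock(id);
        }
    };
    std::thread t0(work, 0);
    std::thread t1(work, 1);
    t0.join();
    t1.join();
    return counter;
}
```

If the lock does its job, run_counter_demo() returns 20000. Of course a passing run proves nothing about correctness, which is rather the point of the whole story.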

Here is a short description of the different memory orderings for concurrency in C++0x: Bartosz Milewski’s blog post about C++ atomics and memory ordering.

The default for atomic variables is the familiar sequentially consistent ordering (memory_order_seq_cst). It’s also the slowest, as it inserts the most memory fences, which prohibit the CPU from reordering code as it sees fit. Between that and no ordering (memory_order_relaxed), there are also orderings that just insert fences on reads (memory_order_acquire), on writes (memory_order_release), and on both reads and writes (memory_order_acq_rel). Being able to choose a memory order other than the safe and slower default is a nice feature, and corresponds well with the C++ design rationale of not having to pay for what you don’t use. Java’s volatiles apparently work much like C++0x atomics with the default sequential ordering, except that compound operations such as an increment on a Java volatile are not atomic (there is a Java library for that).
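To make the acquire/release pair a bit more concrete, here is a small sketch of the classic “publish a value through a flag” pattern. The function names publish_and_read and demo_publish are hypothetical, and the code assumes the std::atomic and std::thread interfaces.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// One thread writes ordinary data and then sets a flag with release ordering.
// The other spins on the flag with acquire ordering and then reads the data.
int publish_and_read() {
    int payload = 0;                 // plain, non-atomic data
    std::atomic<bool> ready(false);  // the flag that publishes it
    int seen = 0;

    std::thread producer([&] {
        payload = 42;
        // Release: the write to payload cannot be moved after this store.
        ready.store(true, std::memory_order_release);
    });
    std::thread consumer([&] {
        // Acquire: the read of payload cannot be moved before this load.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;
    });

    producer.join();
    consumer.join();
    return seen;
}

// Small self-check: the consumer must observe the published value.
void demo_publish() {
    assert(publish_and_read() == 42);
}
```

Leaving out the explicit orderings gives memory_order_seq_cst for both operations, which is the safe default.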

I have been reading Herb Sutter’s Effective Concurrency articles over the past couple of weeks. I just finished reading the last one. I think he will compile them into a book when he has enough of them (there are about 17 of them at the moment). I’m sure I’ll buy the book once it’s out, as I already have all his other books.

What is clear is that concurrent programming has hit the mainstream, and it’s here to stay whether we like it or not. Tomorrow’s PCs might not have faster cores than today’s, but they might have a hundred times more cores. Programming such a beast correctly, while keeping your sanity, will not be an easy task. Although it’s an interesting subject, it comes as a bit of a surprise to me how low level concurrent programming still is today. For parallel C++ programs it’s not just the order of the code as you write it that matters, and not even the rearranged order of the compiled code can be counted on. It’s the order of the code actually being executed on the cores that makes all the difference to the correctness of the program. All this rearranging behind the scenes by the compiler and the CPU is done to optimize sequential execution, but it makes life very difficult for concurrent programmers. To sequential programs the order looks the same on all three levels, but parallel programs will see the different orderings, and this causes more potential bugs and more complex logic to get right. It’s like a step back down into assembler programming to get the ordering and synchronization right, or even a step one level below assembler code, down to the actual machine code on the CPU.

The C++0x standard will address some of this, in that C++ will finally get a defined memory model. This will basically limit how the compiler and the CPU may rearrange the code you write. Then there will be new atomic types (std::atomic) that force the compiler to play nice and synchronize reads and writes to a given variable between all the cores. Then there will be threads, thread local storage, locks, futures etc. But it’s all still very low level. Hopefully higher level libraries and patterns will emerge that make writing deadlock free, data race free, scalable programs easier.

In my humble opinion it’s a bit unfortunate that once a new thread is spawned, the language and compiler assume that all data and functions are safe to use without synchronization. You have to explicitly specify and single out the variables and functions that have to be synchronized, instead of the ones that don’t. Well, that’s just my two cents. The switch in programming styles, patterns and tools to accommodate the concurrency revolution will sure make for an interesting decade. If anything is going to kill C++, it might be that parallel programs are just too complex to get right in a language that allows shared state and imperative programming styles. Then again, if that’s the case, then most of today’s mainstream languages will get killed even worse.

Anyways, here are Sutter’s Effective Concurrency articles.

Better functors with boost

November 30, 2008

Today I read about boost::bind

It replaces the old STL function/functor binders (although lambda functions are a better replacement when composing/nesting several layers of binders). The improvement over the old binders is that (by way of better deduction) the new binder is less picky about the function/functor type, it can bind to any function/functor with up to 9 arguments, and the order of the arguments can be rearranged when binding. Bind is now part of TR1 and is also included in the C++0x standard library.
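A small sketch of the two features I find most useful, fixing an argument and rearranging the argument order. I’m using std::bind from the C++0x header <functional> here rather than boost::bind itself, on the assumption that the two behave the same for this simple case. The function names subtract, bind_fixed, bind_swapped and demo_bind are made up.

```cpp
#include <cassert>
#include <functional>

// Plain two-argument function to bind against.
int subtract(int a, int b) { return a - b; }

// Fix the second argument: the result is a one-argument callable.
int bind_fixed() {
    auto minus_ten = std::bind(subtract, std::placeholders::_1, 10);
    return minus_ten(25);   // calls subtract(25, 10)
}

// Swap the argument order by swapping the placeholders.
int bind_swapped() {
    using namespace std::placeholders;
    auto reversed = std::bind(subtract, _2, _1);
    return reversed(3, 10); // calls subtract(10, 3)
}

// Small self-check of both bindings.
void demo_bind() {
    assert(bind_fixed() == 15);
    assert(bind_swapped() == 7);
}
```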

Here is an article and tutorial about boost::bind by Björn Karlsson: “How the Boost Bind Library Can Improve Your C++ Programs”. The article is from the book “Beyond the C++ Standard Library: An Introduction to Boost”. I’m tempted to buy the book, although I first have to finish a couple of other C++ books I’m reading.

Functional sugar in C++0x

November 10, 2008

I found this easy to follow lambda tutorial when checking out the wiki page for C++0x. It’s written by a member of the VC10 Visual Studio C++ team.

Check the tutorial out at:

I think lambdas are great, as they will automate much of the cumbersome task of defining functors. Functors are central to STL, but I find the amount of boilerplate code that has to be written (even for the most basic functor) to be cumbersome to write and distracting to read.

More importantly, though, the declaration and definition of the functor is often visually removed (and logically hidden) from the local context where it’s used. This has made defining functors impractical for simple one-liner tasks. In the simplest cases one can always compose the standard functor adapters and binders to do the task at hand. But for slightly less simplistic logic the code quickly becomes unreadable.

So for logic that is not simple enough for adapters and binders, but not complicated enough to warrant being encapsulated into its own named class, lambda functors are now a much better fit. I think lambda functions will probably make the functor adapters obsolete, and full blown encapsulated functor class declarations even rarer. You can even store the unnamed lambda functors in a named wrapper. Lambdas might just sound like syntactic sugar, but the fact that they make functors so much more usable will make it easier to code in a more functional manner.
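Here is a quick sketch of the difference, using my own made-up example rather than anything from the tutorial. All the names (GreaterThan, count_with_functor and so on) are hypothetical, and I’m assuming std::function is the named wrapper in question.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// The old way: a named functor class, declared away from where it is used.
struct GreaterThan {
    int limit;
    explicit GreaterThan(int l) : limit(l) {}
    bool operator()(int x) const { return x > limit; }
};

long count_with_functor(const std::vector<int>& v, int limit) {
    return std::count_if(v.begin(), v.end(), GreaterThan(limit));
}

// The new way: the same predicate written inline as a lambda.
long count_with_lambda(const std::vector<int>& v, int limit) {
    return std::count_if(v.begin(), v.end(),
                         [limit](int x) { return x > limit; });
}

// An unnamed lambda stored in a named wrapper.
long count_with_stored_lambda(const std::vector<int>& v, int limit) {
    std::function<bool(int)> above = [limit](int x) { return x > limit; };
    return std::count_if(v.begin(), v.end(), above);
}

// Small self-check: all three variants count the same elements.
void demo_lambda() {
    std::vector<int> v;
    v.push_back(1); v.push_back(5); v.push_back(8); v.push_back(12);
    assert(count_with_functor(v, 4) == 3);
    assert(count_with_lambda(v, 4) == 3);
    assert(count_with_stored_lambda(v, 4) == 3);
}
```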

Another nice feature exemplified in the post is the new auto keyword. In C++0x the compiler will let you use the keyword auto in place of a type name when the compiler can already deduce the type. Again this is just syntactic sugar, but I think it will make a big difference in the way people code. I think it will help a programmer weaken the dependency on, and delay the commitment to, a specific type until compile time (as opposed to design time). One might not even care or know what the actual name of the type one is using is. That again will make it easier to program in a more functional style.
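The place where I expect auto to pay off first is iterator declarations. A small sketch (the functions total_length and demo_auto are made-up names):

```cpp
#include <cassert>
#include <map>
#include <string>

// Sums the lengths of all values in the map.
int total_length(const std::map<std::string, std::string>& m) {
    int total = 0;
    // Without auto this declaration would have to spell out
    // std::map<std::string, std::string>::const_iterator.
    for (auto it = m.begin(); it != m.end(); ++it) {
        total += static_cast<int>(it->second.size());
    }
    return total;
}

// Small self-check with two entries of length 2 and 1.
void demo_auto() {
    std::map<std::string, std::string> m;
    m["a"] = "xy";
    m["b"] = "z";
    assert(total_length(m) == 3);
}
```

The deduced type is still fixed at compile time, so nothing is lost in type safety. You just no longer have to write the name out.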

Maybe this is more than syntactic sugar, maybe it will fundamentally change how people think about C++ design.

Anyways, that’s my two cents. Check out the post to find out the details.