This starts where Stephan’s OOoCon
talk
left off – by taking his problem statement, and maybe a gist of the
proposed concepts, into OOo’s core (which happens to be C++ instead of
Haskell; some folks might bemoan that, but still, ~90% of our code
consists of that language).
To recap: the free lunch
is over for us application software developers. Either OOo
gets seriously multi-threaded soon, or we stop adding features and
only optimize for speed – otherwise OOo will get significantly slower within
the next few years. But as you, the gurus, and yours
truly have noticed by now, becoming concurrent with osl::MutexGuard,
osl::Thread
& friends is tedious and error-prone, and keeping a project of OOo’s
size free of races and deadlocks that way is hopeless. With the expressive power of a
general programming language, it is even theoretically impossible to
verify those properties (for a sufficiently large project – OOo
probably satisfies this constraint).
So, Stephan already showed what’s possible in other
languages. The trick is to raise the level of abstraction
above digging-in-the-dirt, hand-made, explicit, and ugly locking or thread
fiddling. Wherever possible, parallelism should be detected
implicitly, or at least denoted declaratively (instead of manually
telling the machine what to do). John Ousterhout has a nice writeup about
that. It’s basically another instance of the lack of transparency and
ease of use that is so ingrained in day-to-day tasks in C++. For
stuff like object lifetime management, serialization and so on, boost provides transparent
solutions. For large-scale concurrency management, the UNO
threading framework does. For local, statement-level
parallelization, there ain’t much – or is there?
In fact, there actually are several attempts to extend C++ in that
direction. To name a few:
- OpenMP, a C/C++ language extension which provides things like
parallel loops, work sharing, and master and synchronization
constructs. Drawbacks: controlled via pragmas, limited
expressiveness, C origins (e.g. there’s no proper way to propagate
exceptions out of OpenMP threads).
- Boost.Act, a template framework that, together with Boost.Threads,
provides functionality on the same conceptual level as OpenMP (it
actually builds upon OpenMP internally). Furthermore, Boost.Act has
so-called active objects, which provide their results lazily – one
can construct a whole tree of calculations using active objects,
which internally enqueue those, run them asynchronously, and force a
wait on the result only if a non-active object actually requests it.
- Concur, featuring active objects, messages, parallel STL algorithms
and futures (with futures being values that are possibly unevaluated
at the point of instantiation, and get their actual result only
eventually – forcing evaluation of a future might block the
requester until the value becomes available). Probably served as an
inspiration for Boost.Act.
- TBB, or Intel Threading Building Blocks, another (closed-source)
C++ template framework with similar scope & features as Concur and
Boost.Act.
- Two proposals for extending the C++ standard: Multithreading API
for C++0x (and, related: Transporting Values and Exceptions between
Threads).
- KDE’s ThreadWeaver, a threaded job queue that offers some nice
concepts, such as explicit job dependencies.
From a coarse inspection, Boost.Act seems appealing (TBB really
isn’t an option for OOo, no?) – boost is high-quality anyway, and I
think that the Action
concept is a richer one than Futures.
Basically, the advanced approaches listed above employ declarative
semantics, i.e. one no longer needs to create threads by hand, assign
a thread function, and manually serialize access to shared
state. Instead, one simply states that a loop body shall be executed
in parallel:
parallel_transform( src.begin(), src.end(), dest.begin(), _1 * 42 );
This multiplies each element in src
with 42, and stores
the result in dest
. How many threads are created, and
where they are taken from is completely up to the implementation – a
typical threading runtime on a machine with N
cores will
hold a static pool of N
threads, each of them receiving a
chunk of the parallel_transform
workload from above. In
the old way of doing multi-threading, this behaviour would potentially
be spread over the whole code base (making it a cross-cutting
concern, just like explicit method guarding is today – with thousands
of osl::MutexGuard aGuard(m_aMutex)
invocations all over the
place).
One of the crucial points of doing stuff like the above in a safe way
is to avoid hidden access to shared state. Access to the container can
be serialized by parallel_transform
, but if the called
functor modifies shared objects concurrently, all bets are off. Thus,
the listed frameworks usually have some safeguards in place, most of
them providing a way of explicitly annotating free-standing functions and
classes as thread-safe (note that this tagging of thread-safety
can easily be achieved by means of the UNO
threading framework as well – Kay already has different
uno::Reference
types on his list, to distinguish between
thread-safe and thread-unsafe components at compile-time). Another
important concept worth noting here is that of pure functions,
i.e. functions that, given the same input parameters, always return
the same result, no matter where or when they are called (related, but not
exactly the same, are referentially
transparent functions, which could also be exploited by an
optimizing parallel execution runtime). It’s always safe to call pure
functions on disjoint parameters concurrently, with no extra
synchronization effort. A safe (because syntactically limited) way of
defining pure functions is via boost’s bind or
lambda library. There, expressions that modify shareable state could
simply be banned from the available syntax (they aren’t – but it would be
possible to do that).
Need an example? Good, take the standard one that asks so naturally
for parallelization that it has usually been sped up via SIMD in the past (you
might have noticed that those parallel_something
algorithms basically apply the same instructions (the functor to
run) in parallel to all container elements – that’s SIMD), namely operating on image
pixels:
// color-convert an image from RGB to CMYK color space
parallel_transform( src_img.begin(), src_img.end(), dst_img.begin(), bind(&rgb2cmyk, _1, cref(conversion_parms)) );
This is the archetype of inherent parallelism: a huge amount of data (image pixels) that all need to have the same operation applied. Disregarding memory bandwidth for the moment, this snippet scales linearly with the number of available cores.
Note the conversion_parms
parameter to the bound
function, which can pose problems if rgb2cmyk
silently
const_casts
it to something mutable. This is the dark
corner of C/C++’s non-enforceable constness, and therefore a minimal
amount of developer discipline is necessary: if objects or functions
employ magic behind their public interfaces (on-demand creation of
data, buffering, caching, copy-on-write), this still needs to be made
thread-safe explicitly. Fortunately, for each of the aforementioned
items, there’s a thread-safe helper available in OOo (use rtl::Static
for late initialization of static data, boost::shared_ptr
for lazy initialization of class members, caching and buffering, and
cow_wrapper
for copy-on-write).
To be continued.