
I’ve already talked a bit
about the status quo of threading in OOo, and listed some of the
useful stuff that others have been doing to remedy the problems
shared-state multi-threading poses.

Why not go further? If there’s a way to know that a function is pure, or that an
object is thread-safe, then it would be nice to have
automatic parallelization employed. The underlying issue that
needs to be controlled here is races,
i.e. unserialized access to shared state. So, either a function
is pure, i.e. does not depend on shared state at all, or the called
subsystem protects itself against concurrent modification (this
is most easily achieved by the environment concept of the UNO
threading
framework: the UNO runtime implicitly serializes
external access to thread-unsafe apartments).

Although C++ provides no portable way to express those concepts at the
language level, for pure functions there’s a way of using a limited
subset of lambda expressions that inhibits access to shared state at
the syntax level. And it’s perfectly possible to mark objects (or even
subsets of a class’ methods) as thread-safe. One straightforward way
to do this is specializations of UNO interface references, i.e. ones
that denote thread-safe components.

Given all of this, we can form statements that contain:

  • unsafe method calls
  • thread-safe method calls
  • pure function calls
  • impure function calls

So, in fact, a runtime engine could reason about which subexpressions
can be executed concurrently, and which must be serialized. If you
treat method calls as what they are, i.e. implicitly carrying a
this pointer argument, a possible data flow graph might
look like this:

new object1                      new object2
 |                                |
 +->object1::methodA              +->object2::methodA
            |                               |
            +------------------------------>+->object2::methodB(object1)
                                                         |
                                                         v
                                                 object1::methodC

new object3
 |
 +->object3::methodA

That is, the this pointer is carried along as a target
for modifications, and as soon as two methods have access to the same
object, they need to be called sequentially. This does not apply to
UNO interface calls or objects that are tagged as thread-safe, of
course. To be specific, a forest of data flow trees can be generated, which
defines a partial ordering over the subexpressions. If neither
exp1<exp2 nor exp1>exp2 can be deduced
from this ordering, those two subexpressions can be executed in
parallel. Really basic stuff, which compiler optimizers do as well –
only that plain C/C++ doesn’t provide that many clues to safely
parallelize. From the example above, it is obvious that
object3::methodA can be executed concurrently with all other
methods, that object1::methodC must be executed strictly
after object2::methodA, and that
object1::methodA and object2::methodA can
also be executed concurrently.

Okay, this is largely crack-smoking. But there is something to be made
of it. Stay tuned.

This starts where Stephan’s OOoCon talk
left off – by taking his problem statement and maybe a gist of the
proposed concepts into OOo’s core (which happens to be C++ instead of
Haskell. Which some folks might bemoan, but still, ~90% of our code
consists of that language).

To recap, the free lunch
is over for us application software developers, i.e. either OOo
gets seriously multi-threaded soon, or we stop adding features and
only optimize for speed, or OOo will get significantly slower within
the next few years. But as you, the gurus, and yours
faithfully have noticed by now, becoming concurrent with osl::MutexGuard,
osl::Thread
& friends is tedious, error-prone, and neither race- nor deadlock-free
for a project of OOo’s size. With the expressive power of a
general programming language, it is even theoretically impossible to
verify these properties (for a sufficiently large project – OOo
probably satisfies this constraint).

So, Stephan already showed what’s possible in other
languages. The trick is to raise the level of abstraction
above digging-in-the-dirt, hand-made, explicit, and ugly locking or thread
fiddling. Wherever possible, parallelism should be detected either
implicitly, or at least denoted declaratively (instead of manually
telling the machine what to do). John Ousterhout has a nice writeup about
that. It’s basically another instance of the lack of transparency and
ease of use that is so ingrained in day-to-day tasks in C++. For
stuff like object lifetime management, serialization and so on, boost provides transparent
solutions. For large-scale concurrency management, the UNO
threading framework
does. For local, statement-level
parallelization, there ain’t much – or is there?

In fact, there actually are several attempts to extend C++ in that
direction. To name a few:

  • OpenMP, a C/C++ language extension, which provides things like
    parallel loops, work sharing, or master and synchronization
    constructs. Drawbacks: controlled via pragmas, expressiveness
    limited, C origins (e.g. there’s no proper way to propagate
    exceptions out of OpenMP threads).
  • Boost.Act, a template framework that, together with Boost.Threads,
    provides functionality on the same conceptual level as OpenMP (it
    actually builds upon OpenMP internally). Furthermore, Boost.Act has
    so-called active objects, which provide their results lazily – one
    can construct a whole tree of calculations using active objects,
    which internally enqueue those, run them asynchronously, and force
    a wait on the result only if a non-active object actually requests
    it.
  • Concur, featuring active objects, messages, parallel STL
    algorithms, and futures (with futures being values that are
    possibly unevaluated at the point of instantiation, and get their
    actual result only eventually – forcing a future’s evaluation might
    block the requester until the value becomes available). Probably
    served as an inspiration for Boost.Act.
  • TBB, or Intel ThreadBuildingBlocks, another (closed-source) C++
    template framework with similar scope & features as Concur and
    Boost.Act.
  • Two proposals for extending the C++ standard: Multithreading API
    for C++0X (and related: Transporting Values and Exceptions between
    Threads).
  • KDE’s ThreadWeaver, a threaded job queue that offers some nice
    concepts, such as explicit job dependencies.

From a coarse inspection, Boost.Act seems appealing (TBB really
isn’t an option for OOo, no?) – boost is high-quality anyway, and I
think that the Action concept is a richer one than Futures.

Basically, the advanced approaches listed above employ declarative
semantics, i.e. one no longer needs to create threads by hand, assign
a thread function, and manually serialize access to shared
state. Instead, one simply states that a loop body shall be executed
in parallel:

parallel_transform( src.begin(), src.end(), dest.begin(), _1 * 42 );

This multiplies each element in src by 42, and stores
the result in dest. How many threads are created, and
where they are taken from is completely up to the implementation – a
typical threading runtime on a machine with N cores will
hold a static pool of N threads, each of them receiving a
chunk of the parallel_transform workload from above. In
the old way of doing multi-threading, this behaviour would potentially
be spread over the whole code base (making it a cross-cutting
concern, just like explicit method guarding is today – with thousands
of osl::MutexGuard(m_aMutex) invocations all over the
place).

One of the crucial points of doing stuff like the above in a safe way
is to avoid hidden access to shared state. Access to the container can
be serialized by parallel_transform, but if the called
functor modifies shared objects concurrently, all bets are off. Thus,
the listed frameworks usually have some safeguards in place, most of
them using a way of explicitly annotating free-standing functions and
classes as thread-safe (note that this tagging of thread-safety
can easily be achieved by means of the UNO
threading framework
as well – Kay already has different
uno::Reference types on his list, to distinguish between
thread-safe and thread-unsafe components at compile-time). Another
important concept worth noting here is pure functions,
i.e. functions that, given the same input parameters, always return
the same result, no matter where or when called (related, but not
exactly the same, are referentially transparent
functions, which could also be exploited by an
optimizing parallel execution runtime). It’s always safe to call pure
functions on disjoint parameters concurrently, with no extra
synchronization effort. A safe (because syntactically limited) way of
defining pure functions is via boost’s bind or
lambda library. There, expressions that modify shareable state could
simply be banned from the available syntax (they aren’t – but it’s
possible to do that).

Need an example? Good, take the standard one that so naturally asks
for parallelization that it’s usually been sped up via SIMD in the past (you
might have noticed that those parallel_something
algorithms basically apply the same instructions (the functor to
run) in parallel to all container elements – that’s SIMD), namely operating on image
pixels:

// color-convert an image from RGB to CMYK color space
parallel_transform(src_img.begin(),
                   src_img.end(),
                   dst_img.begin(),
                   bind(&rgb2cmyk, _1, cref(conversion_parms)));

This is the archetype of inherent parallelism: a huge amount of data (image pixels) that all need the same operation applied. Disregarding memory bandwidth for the moment, this snippet scales linearly with the number of available cores.

Note the conversion_parms parameter to the bound
function, which can pose problems if rgb2cmyk silently
const_casts it to something mutable. This is the dark
corner of C/C++’s non-enforceable constness, and therefore a minimal
amount of developer discipline is necessary: if objects or functions
employ magic behind their public interfaces (on-demand creation of
data, buffering, caching, copy-on-write), this still needs to be made
thread-safe explicitly. Fortunately, for each of the aforementioned
items, there’s a thread-safe helper available in OOo (use rtl::Static
for late initialization of static data, boost::shared_ptr
for lazy initialization of class members, caching and buffering, and
cow_wrapper
for copy-on-write).

To be continued.

After much prep work (kudos to you, Armin!), things start to look like it’s working:

screen shot

(if you inspect this carefully, you’ll see the difference between the preview image on the left side, which is still rendered the old way, and Draw’s edit pane content to the right, which now gets displayed via cairocanvas).

Gory details, live demo, and plans how to move on at this year’s OOoCon. If you can’t wait: OOoWiki has a brief hacker’s guide.

Started in late 2002, OOo’s module basegfx
is now a pretty mature collection of data structures and algorithms
for graphics processing. The set of exported headers sports abstract
data types such as:

  • points, vectors, boxes, ranges, bezier curves
  • matrices
  • polygons and collections of polygons (“poly-polygons”)

Those are (where sensible) provided in the flavors

  • 1D, 2D, 3D
  • integer and double-precision floating point

In addition, especially the polygon ADTs ship with a set of associated
algorithms (provided as free functions, instead of member functions):

  • a poly-polygon clipper (comparable to what gpc is doing)
  • a poly-polygon triangulator (converts a polygon to triangles, which
    can be fed to a GPU)
  • a function to generate stroked paths from a given polygon
  • plus a lot of basic, but very useful stuff (like svg im/export, scan
    conversion, arc and circle generation, etc.)

What’s missing, though, is everything related to bitmaps. That’s where
basebmp
comes into play, started a few weeks ago and set to provide for pixel
processing what basegfx provides for the symbolic graphics
primitives.

I initially started this work by using AGG, but quickly found out that
adding pixel formats other than integer true-color ones is pretty hard
(because the code is largely undocumented; because the assumption that
pixel values can be modified by shift-left and shift-right operations,
and are PODs (i.e. non-class types), is all over the place; and
because, basically, a way to somehow get hold of the famous extra
level of indirection is missing – which, as usual, would have solved
this software engineering problem). Instead of starting from scratch,
we’ve based this on vigra, a framework for generic image processing,
which provides an elegant way to add ‘special processing’ before an
actual pixel gets assigned a value.

In order to really get to the point here, I first need to sketch the
way generic bitmap processing works: it’s the idea of separating data
access and the actual bitmap manipulation algorithm, using C++
templates. The concept is borrowed from the STL – bitmap pixels can be
accessed by (two-dimensional) iterators, which in a very basic sense
behave like a pointer:

while( i != end )
    *i++ = rand();

If i is a bitmap iterator, the pixels will be set to
random values by this algorithm. The type of i can be chosen freely,
and so can the actual pixel format of the bitmap. If one wants to
e.g. add palette lookup to this algorithm (or scale the random values
into the appropriate color range) without changing the algorithm’s
code, either extremely ugly hacks involving proxy objects are
necessary (because operator*() is expected to return an lvalue, which
gets assigned the new value directly), or one needs a direct way to
‘see’ both the iterator and the new value prior to assignment:

while( i != end )
    acc.set(i++, rand());

The acc variable is a pixel accessor in vigra lingo,
which takes the iterator and the new value, and accesses the
corresponding pixel. Now, adding palette lookup, scaling, blending,
mask operations and much more is as easy as providing a different
accessor to the algorithms. Plus, it’s completely orthogonal to
both the iterator implementation and the algorithm used. That means
you define a set of iterators (for packed-pixel images, for r8g8b8
images, for r5g6b5 images), a set of accessors (for true-color pixels,
for palette lookup, for blitting through a binary mask), and a set of
algorithms. And you combine all those freely, i.e. you have the
Cartesian product of variations to choose from.

Guess it’s about time to post some update on my photovoltaic roof thingy. The generator has now been running for 6 months, without any downtime, and produced ~700 kWh during that period. Right now, that’s about 10 percent below the estimated yield (based on a simulation, which in turn is based on the mean of 20 years of local weather recordings), and has saved the atmosphere around 88 kg of CO2 exhaust (well, not really. First of all, the generator has to earn back the energy used for producing and mounting it. This might take some years).

Currently, with May providing a cloudless sky one day after the other, we’re reaping 20 to 25 kWh per day (it could be even more, if it wasn’t so damn hot for the season – thermal effects in the panels reduce the yield even further).

All in all: feels really good.

Operational, and in the sun

Following my habit of writing something about source code analysis every
few weeks, here comes the next installment under that topic:

Initially set off by KaiB’s
pain with OOo’s build slowness, I remembered some rules from John
Lakos’ ‘Large-Scale
C++ Software Design’
tome – and, to first assess the state of
affairs regarding include dependencies, found these:

  • DEPS, a tool to extract dependencies (e.g. includes) from source code
  • cinclude2dot, a tool plotting a project’s include dependencies as a GraphViz dot graph

Somewhat related (because they also facilitate getting an overview
over large code bases) are

  • Bloof, an infrastructure for processing of version control data, and
  • StatCVS, which extracts statistical HTML reports from a CVS repository
    (thanks Pavel for pointing me to it).

Thanks to Radek’s tireless work, CWS cairocanvas is now finally integrated (into the 2.0.3 code line). Took about eight months to get there, and now provides a nicely anti-aliased slideshow experience, with even the alpha effects running at a decent speed.

The downside: you need a fairly recent X server and cairo 1.0.x on your system.

After having played
a bit with cpd, I
went searching for more elaborate source code analysis tools. And there are some, even in the FLOSS realm:

  • pygccxml, a gcc plugin to generate XMLized
    output from the AST (plus a python framework to actually work with this representation).
  • gccxml, quite similar to pygccxml, focused on extracting declarations
    only.
  • synopsis, a standalone parser plus analysis framework – a bit lacking
    for C++, which is inherently hard to parse.
  • opencxx, same as above (actually a predecessor/inspiration for synopsis)
  • rtlcheck, a gcc plugin that extracts code on the RTL (register-transfer
    level). Currently it seems to be used only for C; extending that to C++
    should be both easy & rewarding (as rtlcheck appears to be the most
    elaborate project of all listed).
  • Also quite interesting, though with different scope (being a runtime-analysis tool) is the
    Design Recovery Tool, roughly characterizable as a means to get from UI
    actions to the code involved processing them.

I do like the gcc patching/plugin approach the most, knowing how complex it is to get c++ parsing even half-way right. And as OpenOffice.org is such a beast, code-size wise, getting all the help we could from automatic tools appears wise to me.

Quite coincidentally, after having rambled about everybody rolling their own copy-on-write implementations, I’ve found severe issues with some of the COW implementations in VCL. Most notably, they employ 16-bit refcounts – which is a bad idea indeed when used for objects that occur tens of thousands of times, e.g. in metafiles. brrr.

Thus, this needs some larger fixups, work for that is going to happen in CWS cowfixes01.

Since nobody complained, I checked in cow_wrapper.hxx in a new module under the tools project (which seemed to me the right place for a tools package): o3tl/inc/o3tl/cow_wrapper.hxx contains the stuff; the remainder of the module is currently a mere unit test for said template.

The scope for o3tl is admittedly kind of ambitious, as it should contain “…very
basic (template) functionality, comparable to what’s provided by boost
or stl, but specific to OOo (what comes to mind are e.g. stl adapters
for our own datatypes and UNO, and stuff that could in principle be
upstreamed to boost, but isn’t as of now).”