
Category Archives: OpenOffice.org

This starts where Stephan’s OOoCon talk left off, by taking his problem statement (and maybe a gist of the proposed concepts) into OOo’s core, which happens to be C++ instead of Haskell. Some folks might bemoan that, but still, ~90% of our code consists of that language.

To recap, the free lunch is over for us application software developers: either OOo gets seriously multi-threaded soon, or we stop adding features and only optimize for speed, or OOo will get significantly slower within the next few years. But as you, the gurus, and yours truly have noticed by now, becoming concurrent with osl::MutexGuard, osl::Thread & friends is tedious, error-prone, and neither race- nor deadlock-free for a project of OOo’s size. With the expressive power of a general-purpose programming language, it is even theoretically impossible to verify these properties (for a sufficiently large project; OOo probably satisfies this constraint).

So, Stephan already showed what’s possible in other languages. In short, the trick is to raise the level of abstraction above digging-in-the-dirt, hand-made, explicit, and ugly locking or thread fiddling. Wherever possible, parallelism should either be detected implicitly, or at least be denoted declaratively (instead of manually telling the machine what to do). John Ousterhout has a nice writeup about that. It’s basically another instance of the lack of transparency and ease of use that is so ingrained in day-to-day tasks in C++. For stuff like object lifetime management, serialization and so on, boost provides transparent solutions. For large-scale concurrency management, the UNO threading framework does. For local, statement-level parallelization, there ain’t much – or is there?

In fact, there actually are several attempts to extend C++ in that
direction. To name a few:

  • OpenMP, a C/C++ language extension, which provides things like parallel loops, work sharing, and master and synchronization constructs. Drawbacks: controlled via pragmas, limited expressiveness, C origins (e.g. there’s no proper way to propagate exceptions out of OpenMP threads).
  • Boost.Act, a template framework that, together with Boost.Threads, provides functionality on the same conceptual level as OpenMP (it actually builds upon OpenMP internally). Furthermore, Boost.Act has so-called active objects, which provide their results lazily: one can construct a whole tree of calculations using active objects, which internally enqueue those, run them asynchronously, and force a wait on the result only if a non-active object actually requests it.
  • Concur, featuring active objects, messages, parallel STL algorithms and futures (with futures being values that are possibly unevaluated at the point of instantiation, and only eventually get their actual result; forcing a future’s evaluation might block the requester until the value becomes available). Probably served as an inspiration for Boost.Act.
  • TBB, or Intel Threading Building Blocks, another (closed-source) C++ template framework with similar scope & features as Concur and Boost.Act.
  • Two proposals for extending the C++ standard: Multithreading API for C++0X (and, related, Transporting Values and Exceptions between Threads).
  • KDE’s ThreadWeaver, a threaded job queue that offers some nice concepts, such as explicit job dependencies.
From a cursory inspection, Boost.Act seems appealing (TBB really isn’t an option for OOo, is it?) – boost is high-quality anyway, and I think that the Action concept is a richer one than Futures.

Basically, the advanced approaches listed above employ declarative
semantics, i.e. one no longer needs to create threads by hand, assign
a thread function, and manually serialize access to shared
state. Instead, one simply states that a loop body shall be executed
in parallel:

parallel_transform( src.begin(), src.end(), dest.begin(), _1 * 42 );

This multiplies each element in src by 42, and stores the result in dest. How many threads are created, and where they are taken from, is completely up to the implementation: a typical threading runtime on a machine with N cores will hold a static pool of N threads, each of them receiving a chunk of the parallel_transform workload from above. In the old way of doing multi-threading, this behaviour would potentially be spread over the whole code base (making it a cross-cutting concern, just like explicit method guarding is today, with thousands of osl::MutexGuard aGuard(m_aMutex) invocations all over the place).

One of the crucial points of doing stuff like the above in a safe way is to avoid hidden access to shared state. Access to the container can be serialized by parallel_transform, but if the called functor modifies shared objects concurrently, all bets are off. Thus, the listed frameworks usually have some safeguards in place, most of them providing a way of explicitly annotating free-standing functions and classes as thread-safe (note that this tagging of thread-safety can easily be achieved by means of the UNO threading framework as well; Kay already has different uno::Reference types on his list, to distinguish between thread-safe and thread-unsafe components at compile time). Another important concept worth noting here is that of pure functions, i.e. functions that, given the same input parameters, always return the same result, no matter where or when called (related, but not exactly the same, are referentially transparent functions, which could also be exploited by an optimizing parallel execution runtime). It’s always safe to call pure functions on disjoint parameters concurrently, with no extra synchronization effort. A safe (because syntactically limited) way of defining pure functions is via boost’s bind or lambda library; there, expressions that modify shareable state could simply be banned from the available syntax (they aren’t, but it’s possible to do that).

Need an example? Good, take the standard one that asks for parallelization so naturally that it has usually been sped up via SIMD in the past (you might have noticed that those parallel_something algorithms in principle apply the same instructions (the functor to run) in parallel to all container elements – that’s SIMD), namely operating on image pixels:

// color-convert an image from RGB to CMYK color space
parallel_transform(src_img.begin(),
                   src_img.end(),
                   dst_img.begin(),
                   bind(&rgb2cmyk, _1, cref(conversion_parms)));

This is the archetype of inherent parallelism: a huge amount of data (image pixels) that all need the same operation applied to them. Disregarding memory bandwidth for the moment, this snippet scales linearly with the number of available cores.

Note the conversion_parms parameter to the bound function, which can pose problems if rgb2cmyk silently const_casts it to something mutable. This is the dark corner of C/C++’s non-enforceable constness, and therefore a minimal amount of developer discipline is necessary: if objects or functions employ magic behind their public interfaces (on-demand creation of data, buffering, caching, copy-on-write), this still needs to be made thread-safe explicitly. Fortunately, for each of the aforementioned items, there’s a thread-safe helper available in OOo (use rtl::Static for late initialization of static data, boost::shared_ptr for lazy initialization of class members, caching and buffering, and cow_wrapper for copy-on-write).

To be continued.

After much prep work (kudos to you, Armin!), things start to look like it’s working:

screen shot

(if you inspect this carefully, you’ll see the difference between the preview image on the left side, which is still rendered the old way, and Draw’s edit pane content to the right, which now gets displayed via cairocanvas).

Gory details, live demo, and plans how to move on at this year’s OOoCon. If you can’t wait: OOoWiki has a brief hacker’s guide.

Started in late 2002, OOo’s module basegfx
is now a pretty mature collection of data structures and algorithms
for graphics processing. The set of exported headers sports abstract
data types such as:

  • points, vectors, boxes, ranges, bezier curves
  • matrices
  • polygons and collections of polygons (“poly-polygons”)

Those are (where sensible) provided in the flavors

  • 1D, 2D, 3D
  • integer and double-precision floating point

In addition, especially the polygon ADTs are shipped with a set of associated algorithms (provided as free functions, instead of member functions):

  • a poly-polygon clipper (comparable to what gpc is doing)
  • a poly-polygon triangulator (converting a polygon to triangles, which can be fed to a GPU)
  • a function to generate stroked paths from a given polygon
  • plus a lot of basic, but very useful stuff (like svg im/export, scan conversion, arc and circle generation, etc.)

What’s missing, though, is everything related to bitmaps. That’s where basebmp comes into play, started a few weeks ago and set to provide for pixel processing what basegfx provides for the symbolic graphics primitives.

I initially started this work using AGG, but quickly found out that adding pixel formats other than integer true-color ones is pretty hard (because the code is largely undocumented; because the assumption that pixel values can be modified by shift-left and shift-right operations, and are PODs (i.e. non-class types), is all over the place; and because, basically, a way to somehow get hold of the famous extra level of indirection is missing, which, as usual, would have solved this software engineering problem). Instead of starting from scratch, we’ve based this on vigra, a framework for generic image processing, which provides an elegant way to add ‘special processing’ before an actual pixel gets assigned a value.

In order to really get to the point here, I first need to sketch the way generic bitmap processing works: the idea is to separate data access from the actual bitmap manipulation algorithm, using C++ templates. The concept is borrowed from the STL; bitmap pixels can be accessed by (two-dimensional) iterators, which in a very basic sense behave like a pointer:

    while( i!=end )
        *i++ = rand();
    

If i is a bitmap iterator, this algorithm sets the pixels to random values. The type of i can be chosen freely, and so can the actual pixel format of the bitmap. If one wants to e.g. add palette lookup to this algorithm (or scale the random values into the appropriate color range) without changing the algorithm’s code, either extremely ugly hacks involving proxy objects are necessary (because operator*() is expected to return an lvalue, which gets assigned the new value directly), or one needs a direct way to ‘see’ both the iterator and the new value prior to assignment:

    while( i!=end )
        acc.set(i++, rand());
    

The acc variable is a pixel accessor in vigra lingo, which takes the iterator and the new value, and accesses the corresponding pixel. Now, adding palette lookup, scaling, blending, mask operations and much more is as easy as providing a different accessor to the algorithms. Plus, it’s completely orthogonal to both the iterator implementation and the algorithm used. That means you define a set of iterators (for packed-pixel images, for r8g8b8 images, for r5g6b5 images), a set of accessors (for true-color pixels, for palette lookup, for blitting through a binary mask), and a set of algorithms. And you can combine all those freely, i.e. you have the Cartesian product of variations to choose from.

Following my habit of writing something about source code analysis every few weeks, here comes the next installment under that topic:

Initially set off by KaiB’s pain with OOo’s build slowness, I remembered some rules from John Lakos’ ‘Large-Scale C++ Software Design’ tome, and, to first assess the state of affairs regarding include dependencies, found these:

  • DEPS, a tool to extract dependencies (e.g. includes) from source code
  • cinclude2dot, a tool plotting a project’s include dependencies as a GraphViz dot graph

Somewhat related (because they also facilitate getting an overview
over large code bases) are

  • Bloof, an infrastructure for processing version control data, and
  • StatCVS, which extracts statistical HTML reports from a CVS repository (thanks Pavel for pointing me to it).

Thanks to Radek’s tireless work, CWS cairocanvas is now finally integrated (into the 2.0.3 code line). It took about eight months to get there, and now provides a nicely anti-aliased slideshow experience, with even the alpha effects running at a decent speed.

The downside: you need a fairly recent X server and cairo 1.0.x on your system.

After having played
a bit with cpd, I
went searching for more elaborate source code analysis tools. And there are some, even in the FLOSS realm:

  • pygccxml, a gcc plugin to generate XMLized output from the AST (plus a python framework to actually work with this representation).
  • gccxml, quite similar to pygccxml, focused on extracting declarations only.
  • synopsis, a standalone parser plus analysis framework; a bit lacking for C++, which is inherently hard to parse.
  • opencxx, same as above (actually a predecessor of, and inspiration for, synopsis).
  • rtlcheck, a gcc plugin that extracts code at the RTL (register-transfer level). Currently it seems to be used only for C; extending it to C++ should be both easy & rewarding (as rtlcheck appears to be the most elaborate project of all listed).
  • Also quite interesting, though with a different scope (being a runtime-analysis tool), is the Design Recovery Tool, roughly characterizable as a means to get from UI actions to the code involved in processing them.

I like the gcc patching/plugin approach the most, knowing how complex it is to get C++ parsing even half-way right. And as OpenOffice.org is such a beast, code-size wise, getting all the help we can from automatic tools appears wise to me.

Quite coincidentally, after having rambled about everybody rolling their own copy-on-write implementations, I’ve found severe issues with some of the COW implementations in VCL. Most notably, they employ 16-bit refcounts, which is a bad idea indeed when used for objects that occur tens of thousands of times, e.g. in metafiles. brrr.

Thus, this needs some larger fixups, work for that is going to happen in CWS cowfixes01.

Since nobody complained, I’ve checked in cow_wrapper.hxx in a new module, under the tools project (which seemed to me the right place for a tools package): o3tl/inc/o3tl/cow_wrapper.hxx contains the stuff, the remainder of the module is currently a mere unit test for said template.

The scope for o3tl is admittedly kind of ambitious, as it should contain “…very
basic (template) functionality, comparable to what’s provided by boost
or stl, but specific to OOo (what comes to mind are e.g. stl adapters
for our own datatypes and UNO, and stuff that could in principle be
upstreamed to boost, but isn’t as of now).”

After fixing a self-assignment blunder in basegfx/source/matrix/b2dhommatrix.cxx, which happened in the copy-on-write-related parts of operator=, I began to wonder why on earth everybody and his grandma is doing cow (copy-on-write) by hand (a search for mnRefCount and m_nRefCount (though certainly only a subset) should give you an idea how abundant this theme is. BTW: I generally agree with Herb Sutter that cow has disadvantages in a multi-threaded world; but, alas, giant objects like bitmaps or polygons had better still provide a cow implementation behind the scenes).

So, I dug at the usual places, and in the corners of boost and ASL I indeed found a cow_ptr and a copy_on_write wrapper. Nice. Only that I found cow_ptr too ambitious (all the cloning stuff should be irrelevant for OOo’s cow usage), and copy_on_write lacked some of the boost::shared_ptr conveniences I’ve got used to (checked_delete, glue for boost::mem_fn, operator forwarding, standard-compatible swap, you get the idea).

Consequently, I cobbled together a cow_wrapper, tailored to the usual, pimpled use of cow in OOo:

class Impl;

class MyClass
{
public:
   MyClass();

   void set( int v );
   int  get() const;

private:
   cow_wrapper<Impl> mpImpl;
};

where the get() and set() methods simply forward to the Impl methods:

void MyClass::set( int v )
{
    mpImpl->set(v);
}
int MyClass::get() const
{
    return mpImpl->get();
}

And the trick is: cow_wrapper has operator-> overloaded for const and non-const access. For non-const access, the potentially shared copy of the Impl object gets copied (i.e. made unique).

Granted, cow_wrapper doesn’t yet have all the wanted features (checked_delete support is still missing), and possibly needs some more testing, but I think this makes for quite painless cow implementations (yes, I intend to move at least basegfx to cow_wrapper, eventually).

While debugging an especially insidious slideshow jerkiness (which only manifests itself on a single type of Toshiba laptop, and only with the most recent JDS builds, i.e. probably exposed by that particular combination of NVidia chip and Xorg nv driver), I found relief (and distraction) in reading one of ESR’s inspiring texts: The Art of Unix Programming (Chapter 1, Philosophy, got me straightened out again).

Which in turn got me (somehow) looking for keithp’s article about X11 network performance, which in turn referenced xcap, which in turn pointed me to the thing that possibly goes wrong in the setup above: after every bitmap operation, OOo seems to scan the X11 queue for GraphicsExpose events, and generates paint events from that — even if there’s nothing that’s been exposed, or needs repaint. I guess that needs a change.