
Category Archives: OpenOffice.org

This starts where Stephan’s OOoCon talk left off, by taking his problem statement (and maybe a gist of the proposed concepts) into OOo’s core, which happens to be C++ instead of Haskell. Some folks might bemoan that, but still, ~90% of our code consists of that language.

To recap, the free lunch is over for us application software developers: either OOo gets seriously multi-threaded soon, or we stop adding features and only optimize for speed, or OOo will get significantly slower within the next few years. But as you, the gurus, and yours truly have noticed by now, becoming concurrent with osl::MutexGuard, osl::Thread & friends is tedious, error-prone, and neither race- nor deadlock-free for a project of OOo’s size. With the expressive power of a general-purpose programming language, it is even theoretically impossible to verify these properties (for a sufficiently large project; OOo probably satisfies this constraint).

So, Stephan already showed what’s possible in other languages. In short, the trick is to raise the level of abstraction above digging-in-the-dirt, hand-made, explicit, and ugly locking or thread fiddling. Wherever possible, parallelism should either be detected implicitly, or at least be denoted declaratively (instead of manually telling the machine what to do). John Ousterhout has a nice writeup about that. It’s basically another instance of the lack of transparency and ease of use that is so ingrained in day-to-day tasks in C++. For stuff like object lifetime management, serialization and so on, boost provides transparent solutions. For large-scale concurrency management, the UNO threading framework does. For local, statement-level parallelization, there ain’t much – or is there?

In fact, there actually are several attempts to extend C++ in that
direction. To name a few:

  • OpenMP, a C/C++ language extension, which provides things like parallel loops, work sharing, and master and synchronization constructs. Drawbacks: controlled via pragmas, limited expressiveness, C origins (e.g. there’s no proper way to propagate exceptions out of OpenMP threads).
  • Boost.Act, a template framework that, together with Boost.Threads, provides functionality on the same conceptual level as OpenMP (it actually builds upon OpenMP internally). Furthermore, Boost.Act has so-called active objects, which provide their results lazily: one can construct a whole tree of calculations using active objects, which internally enqueue those, run them asynchronously, and force a wait on the result only if a non-active object actually requests it.
  • Concur, featuring active objects, messages, parallel STL algorithms and futures (with futures being values that are possibly unevaluated at the point of instantiation, and only eventually get their actual result; forcing a future’s evaluation might block the requester until the value becomes available). Probably served as an inspiration for Boost.Act.
  • TBB, or Intel Threading Building Blocks, another (closed-source) C++ template framework with similar scope & features as Concur and Boost.Act.
  • Two proposals for extending the C++ standard: Multithreading API for C++0X (and, related, Transporting Values and Exceptions between Threads).
  • KDE’s ThreadWeaver, a threaded job queue that offers some nice concepts, such as explicit job dependencies.
From a cursory inspection, Boost.Act seems appealing (TBB really isn’t an option for OOo, is it?) – boost is high-quality anyway, and I think that the Action concept is a richer one than Futures.

Basically, the advanced approaches listed above employ declarative
semantics, i.e. one no longer needs to create threads by hand, assign
a thread function, and manually serialize access to shared
state. Instead, one simply states that a loop body shall be executed
in parallel:

parallel_transform( src.begin(), src.end(), dest.begin(), _1 * 42 );

This multiplies each element in src by 42, and stores the result in dest. How many threads are created, and where they are taken from, is completely up to the implementation: a typical threading runtime on a machine with N cores will hold a static pool of N threads, each of them receiving a chunk of the parallel_transform workload from above. In the old way of doing multi-threading, this behaviour would potentially be spread over the whole code base (making it a cross-cutting concern, just like explicit method guarding is today, with thousands of osl::MutexGuard aGuard(m_aMutex) invocations all over the place).

One of the crucial points of doing stuff like the above in a safe way is to avoid hidden access to shared state. Access to the container can be serialized by parallel_transform, but if the called functor modifies shared objects concurrently, all bets are off. Thus, the listed frameworks usually have some safeguards in place, most of them providing a way of explicitly annotating free-standing functions and classes as thread-safe (note that this tagging of thread-safety can easily be achieved by means of the UNO threading framework as well; Kay already has different uno::Reference types on his list, to distinguish between thread-safe and thread-unsafe components at compile time). Another important concept worth noting here is that of pure functions, i.e. functions that, given the same input parameters, always return the same result, no matter where or when called (related, but not exactly the same, are referentially transparent functions, which could also be exploited by an optimizing parallel execution runtime). It’s always safe to call pure functions on disjoint parameters concurrently, with no extra synchronization effort. A safe (because syntactically limited) way of defining pure functions is via boost’s bind or lambda library; there, expressions that modify shareable state could simply be banned from the available syntax (they aren’t, but it’s possible to do that).

Need an example? Good, take the standard one that asks for parallelization so naturally that it has usually been sped up via SIMD in the past (you might have noticed that those parallel_something algorithms in principle apply the same instructions (the functor to run) in parallel to all container elements – that’s SIMD), namely operating on image pixels:

// color-convert an image from RGB to CMYK color space
parallel_transform(src_img.begin(),
                   src_img.end(),
                   dst_img.begin(),
                   bind(&rgb2cmyk, _1, cref(conversion_parms)));

This is the archetype of inherent parallelism: a huge amount of data (image pixels) that all need the same operation applied to them. Disregarding memory bandwidth for the moment, this snippet scales linearly with the number of available cores.

Note the conversion_parms parameter to the bound function, which can pose problems if rgb2cmyk silently const_casts it to something mutable. This is the dark corner of C/C++’s non-enforceable constness, and therefore a minimal amount of developer discipline is necessary: if objects or functions employ magic behind their public interfaces (on-demand creation of data, buffering, caching, copy-on-write), this still needs to be made thread-safe explicitly. Fortunately, for each of the aforementioned items, there’s a thread-safe helper available in OOo (use rtl::Static for late initialization of static data, boost::shared_ptr for lazy initialization of class members, caching and buffering, and cow_wrapper for copy-on-write).

To be continued.

After much prep work (kudos to you, Armin!), things start to look like it’s working:

screen shot

(if you inspect this carefully, you’ll see the difference between the preview image on the left side, which is still rendered the old way, and Draw’s edit pane content to the right, which now gets displayed via cairocanvas).

Gory details, live demo, and plans how to move on at this year’s OOoCon. If you can’t wait: OOoWiki has a brief hacker’s guide.

Started in late 2002, OOo’s module basegfx
is now a pretty mature collection of data structures and algorithms
for graphics processing. The set of exported headers sports abstract
data types such as:

  • points, vectors, boxes, ranges, bezier curves
  • matrices
  • polygons and collections of polygons (“poly-polygons”)

Those are (where sensible) provided in the flavors

  • 1D, 2D, 3D
  • integer and double-precision floating point

In addition, especially the polygon ADTs are shipped with a set of associated algorithms (provided as free functions, instead of member functions):

  • a poly-polygon clipper (comparable to what gpc is doing)
  • a poly-polygon triangulator (converting a polygon to triangles, which can be fed to a GPU)
  • a function to generate stroked paths from a given polygon
  • plus a lot of basic, but very useful stuff (like svg im/export, scan conversion, arc and circle generation, etc.)

What’s missing, though, is everything related to bitmaps. That’s where basebmp comes into play, started a few weeks ago and set to provide for pixel processing what basegfx provides for the symbolic graphics primitives.

I initially started this work using AGG, but quickly found out that adding pixel formats other than integer true-color ones is pretty hard (because the code is largely undocumented; because the assumption that pixel values can be modified by shift-left and shift-right operations, and are PODs (i.e. non-class types), is all over the place; and because, basically, a way to somehow get hold of the famous extra level of indirection is missing, which, as usual, would have solved this software engineering problem). Instead of starting from scratch, we’ve based this on vigra, a framework for generic image processing, which provides an elegant way to add ‘special processing’ before an actual pixel gets assigned a value.

In order to really get to the point here, I first need to sketch the way generic bitmap processing works: the idea is to separate data access from the actual bitmap manipulation algorithm, using C++ templates. The concept is borrowed from the STL; bitmap pixels can be accessed by (two-dimensional) iterators, which in a very basic sense behave like a pointer:

    while( i!=end )
        *i++ = rand();
    

If i is a bitmap iterator, this algorithm sets the pixels to random values. The type of i can be chosen freely, and so can the actual pixel format of the bitmap. If one wants to e.g. add palette lookup to this algorithm (or scale the random values into the appropriate color range) without changing the algorithm’s code, either extremely ugly hacks involving proxy objects are necessary (because operator*() is expected to return an lvalue, which gets assigned the new value directly), or one needs a direct way to ‘see’ both the iterator and the new value prior to assignment:

    while( i!=end )
        acc.set(i++, rand());
    

The acc variable is a pixel accessor in vigra lingo, which takes the iterator and the new value, and accesses the corresponding pixel. Now, adding palette lookup, scaling, blending, mask operations and much more is as easy as providing a different accessor to the algorithms. Plus, it’s completely orthogonal to both the iterator implementation and the algorithm used. That means you define a set of iterators (for packed-pixel images, for r8g8b8 images, for r5g6b5 images), a set of accessors (for true-color pixels, for palette lookup, for blitting through a binary mask), and a set of algorithms. And you can combine all those freely, i.e. you have the Cartesian product of variations to choose from.

Following my habit of writing something about source code analysis every few weeks, here comes the next installment under that topic:

Initially set off by KaiB’s pain with OOo’s build slowness, I remembered some rules from John Lakos’ ‘Large-Scale C++ Software Design’ tome, and, to first assess the state of affairs regarding include dependencies, found these:

  • DEPS, a tool to extract dependencies (e.g. includes) from source code
  • cinclude2dot, a tool plotting a project’s include dependencies as a GraphViz dot graph

Somewhat related (because they also facilitate getting an overview
over large code bases) are

  • Bloof, an infrastructure for processing version control data, and
  • StatCVS, which extracts statistical HTML reports from a CVS repository (thanks Pavel for pointing me to it).

Thanks to Radek’s tireless work, CWS cairocanvas is now finally integrated (into the 2.0.3 code line). It took about eight months to get there, and now provides a nicely anti-aliased slideshow experience, with even the alpha effects running at a decent speed.

The downside: you need a fairly recent X server and cairo 1.0.x on your system.

After having played
a bit with cpd, I
went searching for more elaborate source code analysis tools. And there are some, even in the FLOSS realm:

  • pygccxml, a gcc plugin to generate XMLized output from the AST (plus a python framework to actually work with this representation).
  • gccxml, quite similar to pygccxml, focused on extracting declarations only.
  • synopsis, a standalone parser plus analysis framework; a bit lacking for C++, which is inherently hard to parse.
  • opencxx, same as above (actually a predecessor of, and inspiration for, synopsis).
  • rtlcheck, a gcc plugin that extracts code at the RTL (register-transfer level). Currently it seems to be used only for C; extending it to C++ should be both easy & rewarding (as rtlcheck appears to be the most elaborate project of all listed).
  • Also quite interesting, though with a different scope (being a runtime-analysis tool), is the Design Recovery Tool, roughly characterizable as a means to get from UI actions to the code involved in processing them.

I like the gcc patching/plugin approach the most, knowing how complex it is to get C++ parsing even half-way right. And as OpenOffice.org is such a beast, code-size wise, getting all the help we can from automatic tools appears wise to me.

Quite coincidentally, after having rambled about everybody rolling their own copy-on-write implementations, I’ve found severe issues with some of the COW implementations in VCL. Most notably, they employ 16-bit refcounts, which is a bad idea indeed when used for objects that occur tens of thousands of times, e.g. in metafiles. brrr.

Thus, this needs some larger fixups, work for that is going to happen in CWS cowfixes01.

Since nobody complained, I’ve checked in cow_wrapper.hxx in a new module, under the tools project (which seemed to me the right place for a tools package): o3tl/inc/o3tl/cow_wrapper.hxx contains the stuff, the remainder of the module is currently a mere unit test for said template.

The scope for o3tl is admittedly kind of ambitious, as it should contain “…very
basic (template) functionality, comparable to what’s provided by boost
or stl, but specific to OOo (what comes to mind are e.g. stl adapters
for our own datatypes and UNO, and stuff that could in principle be
upstreamed to boost, but isn’t as of now).”

After fixing a self-assignment blunder in basegfx/source/matrix/b2dhommatrix.cxx, which happened in the copy-on-write-related parts of operator=, I began to wonder why on earth everybody and his grandma is doing cow (copy-on-write) by hand (a search for mnRefCount and m_nRefCount (though certainly only a subset) should give you an idea how abundant this theme is. BTW: I generally agree with Herb Sutter that cow has disadvantages in a multi-threaded world; but, alas, giant objects like bitmaps or polygons had better still provide a cow implementation behind the scenes).

So, I dug at the usual places, and in the corners of boost and ASL I indeed found a cow_ptr and a copy_on_write wrapper. Nice. Only that I found cow_ptr too ambitious (all the cloning stuff should be irrelevant for OOo’s cow usage), and copy_on_write lacked some of the boost::shared_ptr conveniences I’ve got used to (checked_delete, glue for boost::mem_fn, operator forwarding, standard-compatible swap, you get the idea).

Consequently, I cobbled together a cow_wrapper, tailored to the usual, pimpled use of cow in OOo:

class Impl;

class MyClass
{
public:
   MyClass();

   void set( int v );
   int  get() const;

private:
   cow_wrapper<Impl> mpImpl;
};

where the get() and set() methods simply forward to the Impl methods:

void MyClass::set( int v )
{
    mpImpl->set(v);
}
int MyClass::get() const
{
    return mpImpl->get();
}

And the trick is: cow_wrapper has operator-> overloaded for const and non-const access. For non-const access, the potentially shared copy of the Impl object gets copied (i.e. made unique).

Granted, cow_wrapper doesn’t yet have all the wanted features (checked_delete support is still missing), and possibly needs some more testing, but I think this makes for quite painless cow implementations (yes, I intend to move at least basegfx to cow_wrapper, eventually).

While debugging an especially insidious slideshow jerkiness (which only manifests itself on a single type of Toshiba laptop, and only with the most recent JDS builds, i.e. probably exposed by that particular combination of NVidia chip and Xorg nv driver), I found relief (and distraction) in reading one of ESR’s inspiring texts: The Art of Unix Programming (Chapter 1, Philosophy, got me straightened out again).

Which in turn got me (somehow) looking for keithp’s article about X11 network performance, which in turn referenced xcap, which in turn pointed me to the thing that possibly goes wrong in the setup above: after every bitmap operation, OOo seems to scan the X11 queue for GraphicsExpose events, and generates paint events from that — even if there’s nothing that’s been exposed, or needs repaint. I guess that needs a change.