lmb is currently certified at Master level.

Name: Lars Marowsky-Brée
Member since: 2000-02-24 15:38:43
Last Login: 2012-05-04 09:42:58


Homepage: https://www.xing.com/profile/Lars_MarowskyBree


Roles: Geek, spare-time philosopher, employed as project manager and architect for High-Availability and Storage at SUSE (Novell).

Google actually knows what I'm doing and what I have done better than I do, so I'll not repeat what can be found.

These are entirely my personal opinions. If you are inclined to believe that I might be representing official company policy, opinion, or future direction, you are seriously deluded about big companies, and it would be my pleasure to sell you some real estate ;-)


Recent blog entries by lmb

9 Feb 2011 (updated 9 Feb 2011 at 13:45 UTC)

There is such a thing as a free lunch!

A current discussion reminded me that, while certain parts of our cluster stack do have comprehensive regression tests, not all parts do. However, test coverage is crucial: untested code is broken code, and clean-ups and re-factoring become a risky business, which impedes both adding features and maintainability.

Code that is tested can be cleaned up with confidence; new features can be added safely, trusting that old functionality is not broken. Good tests make for good sleep.

We need more, and better, tests for all aspects of our cluster stack; functional and non-functional both. If I had my test-driven way, I'd veto every contribution that came without sufficient tests, but that's a heavy obligation to place on contributors. Providing tests ought to be perceived as a positive task, not as a burden.

We need the awareness that meaningful tests are good all by themselves, and that not just feature work is cool. Tests ensure quality, inspire confidence, reduce risk for release managers and contributors, lead to more modular and maintainable code, and allow you to point out errors other contributors make - what more could you ask for?

Hence, I am announcing the Almost Free Lunch initiative: contribute an Open Source, reasonably comprehensive and non-trivial test suite for one of our components, document it so that people can actually run it ;-), and I will invite you to lunch! (I would offer to sing songs praising the glory of you, but that would be rather counterproductive.)

The obvious place to start is the one which started this discussion: our resource agent test coverage really needs more love. But you will find all projects - Linux-HA, Pacemaker, corosync, OCFS2, GFS2, ... - more than willing to provide you with hints where our tests can be improved. Contact me, or any of the projects' mailing lists, to learn more.

18 Oct 2010 (updated 18 Oct 2010 at 13:07 UTC)
Linux Magazin article on Pacemaker, OCFS2, and DRBD

Dear international readers: what follows is a critique of an article that appeared in German.

Naturally, I was very pleased to see a setup guide for these projects in issue 11/2010, and on an openSUSE basis at that; all of these are topics and projects close to my heart.

Technically, however, the article disappointed me greatly.

My points of criticism in detail:

  • The article configures active/passive fail-over for a LAMP stack. In this case OCFS2, just like DRBD's active/active mode, is out of place - DRBD should likewise be run in an active/passive ("single primary") configuration.

  • If OCFS2 is used at all, it should by all means be started under the control of Pacemaker and Corosync, not via init scripts and /etc/fstab. Otherwise, full POSIX locking, for example, is not available; furthermore, the /etc/ocfs2/cluster.conf configuration can be omitted, because that information is taken over automatically from Corosync.

    The same naturally applies to DRBD: this service, too, should be controlled - and thus also monitored - by Pacemaker. Only then is the full functionality of all cluster components, and their interplay, assured.

  • The article also describes disabling the IO fencing mechanism "STONITH" without discussing the consequences in any form. This can lead to data divergence.

  • I was utterly appalled by the "recommended" wrapper that is supposed to make LSB scripts "cluster-capable". Not only does the cluster stack of course provide a way to integrate LSB scripts (via the resource class "LSB"), the referenced script is also fundamentally broken - it does not wait for the service to have actually started or stopped, it emits incorrect metadata, and the return codes of the status and monitor operations are wrong.

    And then this broken wrapper script is even used for services for which the cluster stack of course ships complete OCF resource agents - namely Apache and MySQL.

  • The cluster configuration could also be tightened by using a resource group instead of three separate constraints.

  • Not to mention that if one system failed, the other would not take over as the article promises, because no-quorum-policy is not set.

  • The article also declines to cover the basics of a redundant system: a redundant network connection, which is absolutely necessary, is neither configured nor even recommended.

  • That brand names - openSUSE, SLE 11 HAE - are misspelled in places is merely the icing on the cake.

I find it hard to criticize such an article constructively; dear author, dear editor: this will not do!
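To make the criticism concrete, here is a minimal sketch of how such an active/passive LAMP stack could look (crm shell syntax; the resource names, device paths, and IP address are made up for illustration, and a real cluster additionally needs a suitable STONITH resource defined for the fencing property to take effect):

```
# Illustrative sketch only - names and paths are hypothetical.
# Fencing enabled, and an explicit quorum policy for the two-node case:
property stonith-enabled="true" no-quorum-policy="ignore"
# DRBD in single-primary mode, managed - and thus monitored - by Pacemaker:
primitive p-drbd ocf:linbit:drbd params drbd_resource="r0" \
    op monitor interval="30s"
ms ms-drbd p-drbd meta master-max="1" clone-max="2" notify="true"
# One resource group instead of three separate constraints:
primitive p-fs ocf:heartbeat:Filesystem \
    params device="/dev/drbd0" directory="/srv/www" fstype="ext3"
primitive p-ip ocf:heartbeat:IPaddr2 params ip="192.168.1.100"
primitive p-web ocf:heartbeat:apache \
    params configfile="/etc/apache2/httpd.conf" op monitor interval="60s"
group g-lamp p-fs p-ip p-web
# The group runs where DRBD is primary, and only after promotion:
colocation col-lamp inf: g-lamp ms-drbd:Master
order ord-lamp inf: ms-drbd:promote g-lamp:start
```

Note how Apache and MySQL-style services use the shipped OCF resource agents directly, with no wrapper script in sight.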

On selecting good timeouts

Timeouts are a common design choice or implementation detail in any computer system, but are in particular popular in High-Availability clusters (such as those built with the SUSE Linux High-Availability Extension and other stacks that are similarly based on corosync and pacemaker).

They seem a straightforward way to detect faults: if the task doesn't complete within N seconds, it is considered failed, and recovery is attempted. (The task could be anything from a network messaging protocol, a database starting under the cluster's control, any IO, and a number of other cases.)

However, selecting a good value for the timeout is less straightforward than it may seem; more often than not, they are much too short. This seems to stem from the belief that a fast response to failures is unconditionally a good thing: the system will perform better if timeouts are shorter. This is not quite true, though.

To illustrate, assume two scenarios:

  1. First, that the system has failed in such a way that it will not respond with a failed response to a monitor task immediately, but instead runs indefinitely unless aborted by the timeout.
  2. Second, that the system is operating fine, but experiencing a brief period of stress, where responses are delayed, just to the edge of the timeout value.

Now, let us explore the impact of a timeout that is one second "too long"; and then, one that is one second "too short".

For a too long timeout, the failure in the first scenario is detected one second later, adding one second to the recovery time. In the second scenario, no timeout occurs, and the system continues as normal.

For the too short timeout, the first scenario is recovered one second faster; the second scenario causes an unnecessary recovery, probably incurring a real service outage in the attempt to restart the application, or at least a brief period without service!
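The asymmetry can be put in numbers. A small sketch, with entirely hypothetical figures (a "correct" timeout T of 30 seconds and a recovery that itself takes 10 seconds), comparing the service disruption each mis-sized timeout causes in the two scenarios:

```python
# Sketch with made-up numbers: cost of mis-sizing a timeout by one second.
T, R = 30.0, 10.0  # hypothetical correct timeout and recovery duration

def downtime(timeout, hang, response_time):
    """Service disruption for one monitor cycle.

    hang: True if the service is wedged (scenario 1).
    response_time: how long a healthy-but-stressed service takes (scenario 2).
    """
    if hang:
        return timeout + R  # wait out the full timeout, then recover
    if response_time > timeout:
        return R            # false positive: a needless recovery outage
    return 0.0              # no timeout fires, no disruption

# Scenario 1 (real hang): one extra second of timeout costs one second.
assert downtime(T + 1, hang=True, response_time=0) \
     - downtime(T, hang=True, response_time=0) == 1.0

# Scenario 2 (stressed, answers just inside T): too short causes an outage,
# too long costs nothing at all.
assert downtime(T - 1, hang=False, response_time=T - 0.5) == R
assert downtime(T + 1, hang=False, response_time=T - 0.5) == 0.0
```

The one-second saving in the failure case is dwarfed by the ten-second outage the too-short timeout inflicts on a perfectly healthy system.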

Another problem arises from how timeouts are often chosen. If they were obviously too short, administrators would notice immediately: the system would never get off the ground at all, but immediately start spewing errors. Instead, timeouts are usually adequate for the tested scenario (note that you can use the pacemaker monitoring tools to look at the actual runtime of operations). Raise your hand if your test load exceeds the load of your live system - more often than not, it does not.

Under a stress/peak load, the system response tends to degrade non-linearly; it will not just slow down by ten percent, but by thirty or more. If this gets treated as a failure, the likelihood that the fail-over system will experience the same level of stress is high; worse, requests may have queued up, and if - due to the stress, remember - the system did not shut down cleanly, an application-internal recovery phase will compound the effect.

Monitoring application performance for load-distribution is quite a different task from monitoring application correctness. The former is important, and a performance degradation may also imply violation of service level agreements; however, initiating recovery through restart is unlikely to alleviate the problem. (In a pacemaker cluster, this would best be monitored externally and fed into the utilization constraints of the resources and nodes.)

In summary, a too-short timeout is the worse choice; it is safer to make hard timeouts generous beyond reasonable doubt. Yes, this will slow down fail-over and recovery slightly, but at least it will not trigger them by mistake.
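In Pacemaker terms, this means erring on the generous side when declaring operation timeouts relative to the runtimes you actually measured. A hedged sketch (crm shell syntax; the resource name and all values are illustrative, not a recommendation for any particular workload):

```
# If a database start took ~40s under test load, do not race the worst
# case - allow ample headroom (all values here are made up):
primitive p-db ocf:heartbeat:mysql \
    op start timeout="180s" \
    op stop timeout="180s" \
    op monitor interval="30s" timeout="60s"
```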

(For a rather excellent and exhaustive treatment of this subject matter, see K. Wolter, “Stochastic Models for Fault Tolerance: Restart, Rejuvenation and Checkpointing,” Habilitation Thesis, Humboldt-University, 2007.)

19 Jul 2010 (updated 19 Jul 2010 at 22:26 UTC)

It has been a while since I took the chance to blog here; the time has been pretty packed with shipping SUSE Linux Enterprise 11 Service-Pack 1's High-Availability Extension (or SLE HA 11 SP1 for short ;-), and supporting the first deployments.

It is a good time to look back and review the very awesome new features that the community developed along with us, and that we are shipping as Enterprise-ready now.

A feature that I am personally very impressed by is the OCFS2 reflink feature; basically, OCFS2 cracked the hard nut of cluster-wide copy-on-write snapshots, which LVM2 has been trying to crack for years. This allows space-efficient and very fast provisioning of new VMs, snapshots for backup, cloning from templates, cloning from clones, etcetera; it really is amazing.
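In day-to-day use this is a one-liner; an illustration, assuming an OCFS2 mount with hypothetical paths (the reflink utility ships with ocfs2-tools):

```
# Create a copy-on-write clone of a VM image on an OCFS2 volume;
# near-instant and initially consuming no extra space (paths are made up):
reflink /vms/template.img /vms/new-vm.img
```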

For those of you who prefer a visual, the team from NGN taped a video with me being interviewed by Sander at Novell's BrainShare in Amsterdam; this is my first video interview ever!

In case you would like an audio-only review, Ron and Terry interviewed me for Novell Open Audio as well.

I hope you find them informative - if so, please spread them, and let me know your feedback.

My colleague Tim has drawn awesome cartoons to illustrate my last cluster zombie story on why you need STONITH (node fencing). Clusters and the undead, I spot an upcoming theme for my stories ...



lmb certified others as follows:

  • lmb certified lmb as Master
  • lmb certified dan as Journeyer
  • lmb certified gbritton as Journeyer
  • lmb certified seklos as Journeyer
  • lmb certified alan as Master
  • lmb certified Telsa as Journeyer
  • lmb certified ajh as Master
  • lmb certified miguel as Master
  • lmb certified kitsune as Apprentice
  • lmb certified ben as Master
  • lmb certified chbm as Journeyer
  • lmb certified willy as Journeyer
  • lmb certified mkp as Master
  • lmb certified pp as Master
  • lmb certified shaver as Master
  • lmb certified phil as Master
  • lmb certified uzi as Journeyer
  • lmb certified riel as Master
  • lmb certified prumpf as Journeyer
  • lmb certified dick as Journeyer
  • lmb certified DV as Journeyer
  • lmb certified Zaitcev as Journeyer
  • lmb certified rmk as Master
  • lmb certified penguin42 as Journeyer
  • lmb certified jdube as Apprentice
  • lmb certified sad as Apprentice
  • lmb certified zab as Journeyer
  • lmb certified AlanShutko as Apprentice
  • lmb certified ajkroll as Journeyer
  • lmb certified scandal as Master
  • lmb certified LGaby as Journeyer
  • lmb certified adam as Master
  • lmb certified Bryce as Master
  • lmb certified AnnaMoo as Apprentice
  • lmb certified fork as Journeyer
  • lmb certified pjones as Apprentice
  • lmb certified alanr as Master
  • lmb certified prw as Apprentice
  • lmb certified lilo as Journeyer
  • lmb certified hpa as Master
  • lmb certified kgb as Journeyer
  • lmb certified dwmw2 as Master
  • lmb certified Netdancer as Journeyer
  • lmb certified LenZ as Journeyer
  • lmb certified axboe as Master
  • lmb certified cord as Journeyer
  • lmb certified darkworm as Journeyer
  • lmb certified rbrady as Journeyer
  • lmb certified jLoki as Journeyer
  • lmb certified skh as Journeyer
  • lmb certified mperry as Journeyer
  • lmb certified acme as Master
  • lmb certified eckes as Journeyer
  • lmb certified wli as Master
  • lmb certified ladypine as Journeyer
  • lmb certified riggwelter as Journeyer
  • lmb certified kroah as Master
  • lmb certified Marcus as Master

Others have certified lmb as follows:

  • lmb certified lmb as Master
  • pp certified lmb as Master
  • dick certified lmb as Journeyer
  • gbritton certified lmb as Master
  • ajh certified lmb as Journeyer
  • harold certified lmb as Journeyer
  • uzi certified lmb as Journeyer
  • chbm certified lmb as Journeyer
  • seklos certified lmb as Journeyer
  • rmk certified lmb as Journeyer
  • zhp certified lmb as Journeyer
  • ajkroll certified lmb as Master
  • jes certified lmb as Journeyer
  • dwmw2 certified lmb as Journeyer
  • mkp certified lmb as Journeyer
  • alan certified lmb as Journeyer
  • fork certified lmb as Master
  • mark certified lmb as Journeyer
  • jdube certified lmb as Master
  • marcelo certified lmb as Journeyer
  • phaedrus certified lmb as Journeyer
  • riel certified lmb as Journeyer
  • acme certified lmb as Journeyer
  • splork certified lmb as Journeyer
  • kgb certified lmb as Journeyer
  • axboe certified lmb as Journeyer
  • LenZ certified lmb as Journeyer
  • nixnut certified lmb as Master
  • Stalker certified lmb as Journeyer
  • cord certified lmb as Journeyer
  • ths certified lmb as Journeyer
  • jLoki certified lmb as Apprentice
  • Platypus certified lmb as Journeyer
  • skh certified lmb as Master
  • fxn certified lmb as Journeyer
  • eckes certified lmb as Journeyer
  • Grit certified lmb as Master
  • zwane certified lmb as Journeyer
  • ladypine certified lmb as Master
  • mulix certified lmb as Master
  • riggwelter certified lmb as Master
  • sdog certified lmb as Master
  • Marcus certified lmb as Master
  • Zaitcev certified lmb as Journeyer
  • icherevko certified lmb as Master
  • rainer certified lmb as Journeyer


