Saturday, 20 December 2008

Asserting a Coding Style

There are two schools of thought regards how your code should fail: either it should try to keep going as best it can, or it should hard stop.

After many years in IT, I'm firmly in the hard stop school. It makes it far easier to debug, if your program fails at the point at which an exception occurs, even if it is a less pleasant experience for your users. (And trust me, trying to talk someone on Windows through setting up a debugger is not easy - I mean, I can't even do it for myself at the moment).

The 'try to keep going as best it can' is often a euphemism for silently corrupt something, violate the assumptions we've got in the code and then fail inexplicably further down the track.

Luckily for people playing the game, Unangband, like Windows, is written to the silently corrupt and experience weird errors standard. Which is why I have in the back of my mind, the intention to adopt a coding standard that promogulates hard failure when it detects an exception.

The easiest way to do this is have an ASSERT() macro which if false, does something that'll cause general program fault, like dividing by zero. ASSERT is used in a couple of places in the Unangband source, where my more disciplined co-developer has attempted to impose some kind of rigor in the codebase.

What has piqued my interest and led me to discuss this, is a new dungeon generation algorithm posted on rec.games.roguelike.angband. This algorithm has been kindly put into the public domain by it's developer Kusigrosz, and you'll be seeing it at some point in Unangband as a new room type, nest and as a new dungeon theme. (Screen shot of the output either at the link or the start of this article).

The C source for this algorithm uses another macro REQUIRE() which if false, immediately exits the program and dumps the source code line at which the fault occurs. REQUIRE() is like ASSERT() in that it lends itself naturally to specifying the pre-requisites and post-requisites for a function. You can for instance add a line REQUIRE(x>0); which intuitively indicates that at this point in the code we expect x>0.*

What I ideally would like, is REQUIRE() but with the ability to set a failure level at compile time or run time. That way I could have a number of differing levels of failure, allowing the user or distribution to select how hard they want the program to fall over. The levels would be:

0. REQUIRE() compiles out completely, used on systems with limited memory or performance.
1. Silently return from the function.
2. Write a warning to a log file, and silently return from the function.
3. Write an error to standard out, including the line number of the source code and exit.
4. Dump the stack to a file, including the line number of a source code and exit, useful for a debug build for users who don't have the technical skill to set up a debugger.
5. ASSERT().

I'm quite happy to be able to bloat the source doing this in the long term, but I want to make sure I'm doing so with a robust mechanism.

So my questions are:

1. Is there any other mechanism out there that better supports this hard, flexible fail over?
2. What is the best C constructions to ensure levels 0 and 1 work in a platform independent manner? Remember, I don't know necessarily whether the function is intended to return anything at all.
3. Is there any macros or public domain code that supports dumping the stack at runtime? This would also need to be relatively platform independent. Note that Windows is the only platform where this is currently a major problem.

Thoughts and feedback welcome.

* The Unangband form of this, adopted from Angband, is as follows:

/* Paranoia */
if (x < 0) return;

which is very much of the form 'silently fail and keep going'. [Edit: And you'll see here, prone to failure. Of course, I meant if (x <= 0). But then out by one errors are always fun]. [Edit 2: And it also looks like I reversed the conditionality of the assert() statement, in which case assert() and require() are equivalent. Article amended. And I should read assert.h].

Similarly, the TRY {} CATCH construction is mostly focused on trying to recover from exceptions.

16 comments:

  1. The link to the source doesn't work here. Probably something strange on my end. :/

    I agree with you on the 'fail hard and fast' school of code writing. I'm afrwaid I can't really be more help than that. Setting the stuff at compile time is easily achieved with a macro, runtime you would have to use a function, but the compile out completely setting would have to be compile time I would think.

    ReplyDelete
  2. Ad 2. I would just do what standard "assert.h" does - make a macro dependent on some 'DEBUG_LEVEL' symbol, that would either compile to nothing, a code that returns, or a call to some helper function.
    As for the return value, make it an argument in your macro. Then you can just write
    REQUIRE(x > 0,);
    if you want the function to fail with "return;" or
    REQUIRE(x > 0, 42);
    if you want it to "return 42;".

    Ad 3. Does "execinfo.h" in GNU libc do what you want? I'm not sure if it works in Windows version.

    ReplyDelete
  3. new algorithm output immediately reminded me of river basins ...

    ReplyDelete
  4. Looks like ye standard cellular automata (based on looking only at the two pictures) to me thats already in the rogue dev wiki...

    as for the require macro.. Sometimes I use DBC for C, its written in ruby and its pretty cool doing pre/post/invariant rules in your C code if you specify them in comments before the routine itself..

    pretty neat.
    http://www.onlamp.com/pub/a/onlamp/2004/10/28/design_by_contract_in_c.html

    ReplyDelete
  5. Stu, that's very neat indeed! Any chance it works for C++ (or there is an alternative that does)? (Judging from the article I guess not.) The custom script puts me off a bit, I dunno how easy it would be to integrate something like this in my VS2005 environment... I'll start digging for DBC alternatives now :)

    ReplyDelete
  6. Humpolec: You mean there's an assert.h? :)

    VRBones: It reminded me of a rapidly exploring random tree for the same reason.

    Stu: The implementation picks a random cell, do something based on the nearest neighbours - yes, it is undoubtedly a cellular automata. Its optimised, of course, to ensure that it picks only cells that it can do work with.

    I'm trying as a side project to come up with a generic cellular automata class in Javascript for the PCG Wiki if anyone wants to help out. Have a look at the simple first pass attempt if you want. It's intended as a teaching tool to help people understand how flexible CA are.

    Jotaf: Since DBC is intended for a class based environment, I'm sure someone will have done something similar for C++. Let us know if you find anything.

    ReplyDelete
  7. Design by Contract is OK, indeed, especially if it's not mandatory. Good luck with it.

    I've tried to use asserts in UnAngband as often as I did in my other projects, including those in C, but it somehow didn't work. I think one reason is the functions in Angband are often too long, so in the middle of the function body you lose track of soundness conditions. The other reason may be the overall hackish smartness of the code, when the same code path is used for different purposes. Another reason is poor separation of UI and logic, e.g., you can expect any piece of code to happily print things on the screen directly, so it's hard to write assertions about the state of the screen (see the easy-more nightmare). One more reason is low-level stuff, e.g., a lot of logical properties can only be expressed as conditions on bit-flags --- which then fail miserably after addition of one too many flag that spills to another word, etc.

    ReplyDelete
  8. Mikolaj: Completely agree - I don't know if this will be useful, but its something to keep in mind, and let me sleep at night :)

    One reservation I have with design by contract is that you made end up with a contract as complex as the code you're intending to implement. In which case, you're not helping at all. It's mostly intended to formalise the 'paranoia' code that occasionally gets injected into a function.

    ReplyDelete
  9. > end up with a contract
    > as complex as the code

    Happens in all (semi-)formal methods. There are two popular answers:

    1. Hang on to it and next time, when you reimplement the function in a very tricky and stunningly efficient way, the contract will at last prove simpler. You may even keep the old simple implementation on the side for comparative testing.

    2. Screw it and write a much simplified "contract" formally plus a bit of hand-waving or pointing to code in comments. I the limit case, the contract is empty, there are no comments and all the hand-waving is in the name of the function and the readability of the code. Improve from there.

    ReplyDelete
  10. I have been trying to work out ways of unit-testing Angband, but the code requires extensive refactoring before that can happen. My plan was always to have assert()s which failed when running to unit-test, but which didn't do anything when not unit-testing.

    ReplyDelete
  11. Mikolaj:

    > > end up with a contract
    > > as complex as the code

    > Happens in all (semi-)formal methods.

    Hang on. Am I not then effectively duplicating the code in the contract? In which case, should I be refactoring this out? :)

    It seems to me more and more that I might just be better off ensuring that the code, just, you know, works, than trying to specify complex contracts...

    ReplyDelete
  12. > Am I not then effectively
    > duplicating the code in the contract?

    Yes, you are. So what? If the language for writing contracts is much poorer than the programming language, you may even end with a contract longer than code. So what? The purpose of contracts is not to shorten the source files.

    The longer answer is that it depends on the code. If the function's code is deep, with a mathematical model, lots of logic in it and simple, high-level types of the function's input and output, then the contract will be much simpler than the code, even if the implementation is very naive (and much more so if the implementation is optimized and tricky --- moreover, the contract stays the same, so refactoring is cheap!!!).

    If, however, the function's code is shallow and/or with low-level (or none at all) function's input and output, as is common for e.g. UI code, the contract won't magically shorten the boring list of random things to do in random order. Perhaps you don't want to write contracts for such code.

    Of course, in codebases with good separation of boring (UI, ports, parsers and pretty-printers) and interesting code (game rules, content generation, AI), the decisions are easier. You can decide to write contracts only for the interesting code and don't end up with a half-done chaotic mess.

    ReplyDelete
  13. I was a bit carried away with the "refactoring is cheap" line. It depends on the refactoring. Some kinds touch contracts just as much as they touch the code. Others just tweak the implementations (e.g. switching to a different library), so the contract stays the same.

    > should I be refactoring this out?

    No, you shouldn't, for exactly the reason, that the contract often stays the same, while the code changes. So, the old code becomes the contract for the new (possibly more complex) code. Which is not ideal, but still much better than no contract or an absurdly long one written using e.g. only conditional expressions and inequalities. Moreover, you can sometimes test with the old code in place of the new one and compare the results.

    But for imperative programming languages this is all probably moot, because you usually _can't_ write contracts using them. For functional programming languages, OTOH, it's customary to give full power of the language and then some to the contract writer.

    ReplyDelete
  14. I feel like I'm taking the first steps on the path to programming enlightenment...

    ReplyDelete
  15. Philosophically, the fail-fast approach is what lets the Erlang programming language reach ninenines of reliability - 99.9999999% - in a certain telecom switch application. That's 31ms of downtime per year!

    Then once you start getting into the junction of concurrency and fail-fast, you see this little flash of light behind your eyes, and you want to start programming everything in Erlang.

    Best of luck on your respective trips to programming Nirvana (:

    ReplyDelete
  16. The thing about the paranoia code is that it is only put there in the first place because the programmer suspects that something may be wrong. I found a years-old bug in O/FAangband recently which was caused by a piece of code introduced as a make-sure-this-definitely-works measure, and had been causing significant player grief ever since.

    Whenever I try to state a general strategy for this sort of thing it always comes back to some sort of motherhood statement like "Do your best", and then I just end up deciding that lots of playtesting is the answer. So I guess I'm just saying do whatever the hell you like. In fact, (*warms to subject*) if roguelike programming isn't done for fun, then nothing is. Stop worrying, damn you! Have fun!

    Over and out.

    ReplyDelete