Sunday, 6 June 2010

Proceduralism: Part Two (Taxonomies)

[You'll probably want to read the original article series that inspired this follow up, then start with part one of this series.]

In attempting to define a taxonomy of procedural generation, I was conscious of the lack of a rigorous vocabulary for talking about procedural content, a lack born from the absence of academic investigation of procedural content systems (as distinct from, say, Artificial Intelligence). That is not to say that procedural content generation hasn't been primarily a focus of academia: it is built from mathematical systems which have intrigued academics ever since the Fibonacci sequence, but there has been only a limited exploration of the consequences of these systems, particularly with regard to game development. This is changing, particularly in the last two years, spurred on in no small part by the efforts of Julian Togelius (not to take lightly the contributions of many others in the field) but also by the validation of games as a field worthy of serious academic investigation.

But taxonomies are difficult beasts to design correctly, because language itself is used in two distinct ways in talking about knowledge and the properties of systems. Language is an incredibly powerful tool which we often take for granted, and we often disagree strongly over attempts to provide definitions, which degenerate into arguments about the semantic relationships of words to other words. In this sense, words are slippery creatures which we cannot pin down like butterflies to a mounting board.

There is one subset of language that is useful: when we talk about the formal properties of a system. When I say that a particular maze is acyclic, I'm not referring to a debatable point which you can disagree on - I'm instead referring to a well-defined property of that maze, for which we can provide a number of different but equivalent tests to agree that the property exists for some mazes and does not exist for others. This formal use of language is equivalent in many ways to mathematical expressions: I've chosen mazes as a specific example both because of the utility of mazes for procedural content generation and the excellent reference on mazes at the Think Labyrinth website.
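To make "different but equivalent tests" concrete, here is a minimal sketch in Python. It assumes a maze is represented as a list of unique cells plus a list of passages (pairs of connected cells); that representation and both function names are my own illustration, not something taken from Think Labyrinth. The first test merges cells as passages are added and fails when a passage joins two cells that are already connected; the second counts passages against cells and connected regions.

```python
from collections import defaultdict


def is_acyclic_by_merging(cells, passages):
    """Test 1 (union-find): a passage between already-connected cells closes a loop."""
    parent = {cell: cell for cell in cells}

    def find(cell):
        while parent[cell] != cell:
            parent[cell] = parent[parent[cell]]  # path halving keeps lookups short
            cell = parent[cell]
        return cell

    for a, b in passages:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return False
        parent[root_a] = root_b
    return True


def is_acyclic_by_counting(cells, passages):
    """Test 2: a loop-free maze has exactly (cells - connected regions) passages."""
    adjacency = defaultdict(list)
    for a, b in passages:
        adjacency[a].append(b)
        adjacency[b].append(a)

    seen = set()
    regions = 0
    for start in cells:
        if start in seen:
            continue
        regions += 1
        seen.add(start)
        stack = [start]
        while stack:  # flood fill one connected region
            cell = stack.pop()
            for neighbour in adjacency[cell]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
    return len(passages) == len(cells) - regions


# A hypothetical 2x2 maze: three passages form a loop-free path, a fourth closes the loop.
CELLS = [(0, 0), (0, 1), (1, 0), (1, 1)]
PERFECT = [((0, 0), (0, 1)), ((0, 1), (1, 1)), ((1, 1), (1, 0))]
LOOPED = PERFECT + [((1, 0), (0, 0))]
```

Both functions give the same answer for any maze in this representation, which is the point: the property is not a matter of opinion, only of which test you find more convenient.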

Using language in this rigorous way has its basis in mathematics, particularly the mathematics of proof, and has grown naturally from there to other scientific disciplines as they have adopted the tools of mathematics in its various forms. But at the same time, the phrasing and idiosyncrasies required of mathematical proof have been adopted as a rhetorical device, and have spread much further than the methods of proof they are based on. You should be suspicious of anyone who talks about a definition of something by saying it is this, but not this, and because it is this and this, it must also through induction be this as well: they are attempting to frame an argument in formal language about a system which is not formally defined, and that is not how people are experimentally shown to give meaning to words.

(How they do give meaning to a word is likely to be something like Prototype Theory).

But while people are not swayed by formal logic, they are swayed by rhetoric, and so my earlier attempt to define a taxonomy for Procedural Content Generation can be seen primarily as a series of rhetorical posts: arguing for something (PCG) and against something (little p procedural), then expanding on what is great about procedural content generation, and setting up various categorisation tropes so that you felt like there existed a field of procedural content generation stretching back to the history of early gaming (Elite, Rogue and Pascal - the person - if you were especially diligent).

It was, I'll state modestly, relatively successful. Not necessarily directly, but it did force me to actively expand my initial categorisation efforts into the PCG wiki, which has hopefully created a little more interest in procedural generation and attempted to provide a working vocabulary for practitioners in the field. The one thing I think it has done well is popularize a way of stating what people are working on (Procedural Content Generation) - and the acronym PCG (which has undoubtedly been independently coined elsewhere).

The taxonomy itself has not been useful. Attempts to use it to categorise games that use PCG have never got anywhere.

If you look at the PCG wiki algorithms page, instead of the six (or seven) point categories I outlined, you'll instead find a number of headings: concepts, map generation, sequence generation, ontogenetic and teleological, and the categories I proposed buried as relatively unimportant subheadings alongside various other algorithmic techniques and high-level concepts.

Concepts is kind of a catch-all category, which should really be the name of the whole page (instead of algorithms or code), because there is a large amount of talking about concepts on the wiki, and links to people outside the wiki, and precious little in the way of actual algorithms and code. I would love at some point to sit down and write lots of example code, and if you have the time, please do so. But you'll probably succumb to Doull's Corollary.

(For the record, Doull's Law is "Any time saved using procedural content generation techniques will be lost staring at the resulting screen saver." and Doull's Corollary is "Don't ask the procedural generation community for help because they're inspired by procedural generation as a way of avoiding the type of work you're asking them to do.")

Map generation and sequence generation were suggested early on by droid as a way of breaking down a relatively monolithic version of this page into two different but related types of algorithms. They are good at doing that, but of themselves are not especially useful as ways of thinking about procedural content generation.

Ontogenetic and teleological, on the other hand, are incredibly useful concepts which I stole shamelessly from an article by Mick West. (I'd love to know if there are any earlier references to these two terms being used in association with procedural generation.) In short, ontogenetic algorithms attempt to duplicate the end result of a physical process without emulating the intermediate steps, and teleological algorithms attempt to simulate the physical processes which result in the desired procedural output. The two approaches can be used in combination, but any time you attempt to create something procedurally, you will be guided by one or the other.
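As a rough illustration of the difference, here is a toy sand pile in Python. This is my own example under simplifying assumptions (a one-dimensional row of columns, grains that roll to a strictly lower neighbour, walls at both ends); it is not taken from Mick West's article, and both function names are hypothetical. The teleological version drops grains one at a time and lets each roll downhill; the ontogenetic version skips the process and writes out a pyramid directly.

```python
def teleological_pile(width, grains):
    """Simulate the process: drop each grain at the centre and let it roll downhill."""
    heights = [0] * width
    centre = width // 2
    for _ in range(grains):
        i = centre
        while True:
            # Neighbouring columns that are strictly lower than the current one.
            lower = [j for j in (i - 1, i + 1)
                     if 0 <= j < width and heights[j] < heights[i]]
            if not lower:
                break  # the grain has nowhere lower to go, so it settles here
            i = min(lower, key=lambda j: heights[j])  # lowest side wins; ties go left
        heights[i] += 1
    return heights


def ontogenetic_pile(width, peak):
    """Skip the process: write down the pyramid shape a pile ends up resembling."""
    centre = width // 2
    return [max(0, peak - abs(i - centre)) for i in range(width)]
```

Both produce a heap whose neighbouring columns never differ by more than one grain, but only the first can tell you where any particular grain ended up, and only the second lets you ask for a pile of a given height without working out how many grains that takes.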

Julian Togelius, first in a post to the Procedural Content Generation group, and then in a paper, Search-based Procedural Content Generation, written in conjunction with Georgios N. Yannakakis, Kenneth O. Stanley and Cameron Browne, outlines some other decisions required in the development of a procedural content generation algorithm: online vs offline, necessary vs optional, random seed vs parameter values, stochastic vs deterministic generation and constructive vs generate and test. Note that each of these is a continuum rather than a necessarily binary distinction, and they do not attempt to define what is not procedural content, but rather describe the decisions involved in implementing a particular PCG algorithm and the requirements placed on it. I highly recommend the paper as an overview of PCG and a view of where academic research in PCG is currently concentrated. It is written at a level you should be able to follow even if you are not especially technical or academic.
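Two of these distinctions are easy to show in a few lines. The sketch below is my own toy example in Python, not code from the paper: it generates a row of floor (".") and wall ("#") tiles under the made-up rule that no two walls may be adjacent. The constructive version never makes an invalid choice; the generate and test version produces random candidates and throws away the ones that fail. Both take a seed, so the same seed always gives the same row. All function names and the tile rule are assumptions for illustration.

```python
import random


def is_valid(row):
    """The test: a row is acceptable if no two walls are adjacent."""
    return "##" not in row


def constructive_row(length, seed):
    """Constructive: build the row so that the rule can never be broken."""
    rng = random.Random(seed)  # private generator, so the seed fully decides the output
    tiles = []
    for _ in range(length):
        if tiles and tiles[-1] == "#":
            tiles.append(".")  # forced floor after a wall
        else:
            tiles.append(rng.choice(".#"))
    return "".join(tiles)


def generate_and_test_row(length, seed, max_attempts=1000):
    """Generate and test: make random candidates, keep the first that passes."""
    rng = random.Random(seed)
    for _ in range(max_attempts):
        candidate = "".join(rng.choice(".#") for _ in range(length))
        if is_valid(candidate):
            return candidate
    raise RuntimeError("no valid row found within max_attempts")
```

The constructive version always finishes in one pass but bakes the rule into the generator; the generate and test version keeps the rule separate, at the cost of wasted candidates and no guarantee of success within a fixed number of attempts.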

There are other distinctions which you could attempt to make between various types of procedural content generation. One example I'll pick is content generation vs content selection - described in an article at Grand Text Auto. You'll see from my comments there that it is a distinction I don't necessarily agree with, although it is another attempt to redefine what I called user mediated content in the original article series.

The takeaway from this revised overview of PCG taxonomies is that you shouldn't wed yourself to any particular definition of what is or isn't procedural content beyond the point where it assists you in creating a particular procedural generation algorithm. At the time I wrote Death of the Level Designer, there wasn't a particular emphasis in academia on Procedural Content Generation as a discipline, but since then (and I hasten to add, not because of the article) there is a growing belief that there are specific reasons to investigate PCG as a separate field of study distinct from its parts. You should use resources like the PCG wiki to help get inspiration for how to solve a particular problem, but feel free to look elsewhere (I'm fond at the moment of this implementation of Navier-Stokes equations).

In part three in this series, I'll ask the more fundamental question I've been avoiding so far: if words are slippery enough that we can't define a PCG taxonomy, can we at least define what procedural content generation is?
