Saturday 5 June 2010

A Heated Debate Over Balance: Part Two (Viability and Validity)

In part one of this article series, I outlined some ways of balancing games in reaction to a nerf to Team Fortress 2's Pyro. Since then, the Pyro has been (mostly) unnerfed by tweaking the percentages and durations of various values, but the Team Fortress 2 Pyro community remains dissatisfied with the class as it currently plays. It's time to revisit this discussion.

In the various Team Fortress 2 articles I've written, I've come up with a variety of weapon unlocks and suggestions for the classes in the game. I've not especially worried about the numbers, on the basis that these values require extensive testing - but I was proud of the variety of different ideas I'd come up with in isolation.

But reading the Steam TF2 class and balance forums after these suggestions is a little disheartening. It looks like everyone has lots of ideas to contribute, and they're all prepared to go into lots of detail about these ideas, defending, tweaking and trying to balance them based on the opinions of even more people willing to label them as OP or UP (overpowered or underpowered). My thoughts are bound to get lost in the noise. And there is a lot of noise.

(I'm reminded of a Penny Arcade cartoon about the evils of developing an MMORPG.)

It is hard to balance the Pyro because Valve hasn't given enough statistical information about the game as a whole, so any set of numbers anyone comes up with exists in a vacuum. As a result, it is incredibly difficult to work out whether a change is viable. The way I get around this in Unangband is through algorithmic balancing: by running short artificial simulations to attempt to determine the degree of difficulty of a monster, and where to place it in the game. There are a lot of built-in checks and balances to allow this in a roguelike, most of which are absent in a multi-player game.

Especially critical for balance purposes is whether a choice is viable at high level play. High level competitive play in Team Fortress 2 is usually 6v6, with two soldiers, 2 scouts, a medic and a demo on each team. This structure has emerged organically from the interactions between the variety of TF2 classes and shows that the choice of class is important, and balanced in a way that is interesting and useful in tournament play. Making a class that was too viable by itself would disrupt the delicate balance that has evolved.

Last night I finally got a Team Fortress 2 drop I had been waiting for: the Scotsman's Skullcutter. This is a recent addition to the game from a community contribution with the following stats: +20% damage, -15% speed, longer weapon reach.

So how can I determine if this choice presented to me is a viable one? Well, I can be reasonably sure that it is unlikely to be used in high level play, because at that level a large part of the game is about mobility, and a speed reduction is a significant nerf (although the Demoman has other ways of quickly getting around). I can look at the percentage of players choosing to equip the item at tf2stats, or try to get a recommendation on the Steam forums, either from the Demoman forums or the victims of Demoman forums. I can also compare simulations of combat between the Skullcutter-equipped Demoman and other classes, weighting the probability of getting a crit against the maximum or average hit points of a particular class.
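The last of those options, weighting crit probability against hit points, could be sketched roughly as follows. This is only an illustration of the idea: the function names, the 65-damage stock melee hit, the 15% crit chance, the triple-damage crit and the target hit points are all placeholder assumptions of mine, not values taken from Valve; only the +20% damage comes from the stats above.

```python
import math


def expected_damage(base_damage, crit_chance, crit_multiplier=3.0):
    """Average damage per hit when a fraction of hits are criticals."""
    return base_damage * ((1.0 - crit_chance) + crit_chance * crit_multiplier)


def expected_hits_to_kill(hit_points, base_damage, crit_chance, crit_multiplier=3.0):
    """Hits needed to remove hit_points at the average damage per hit."""
    per_hit = expected_damage(base_damage, crit_chance, crit_multiplier)
    return math.ceil(hit_points / per_hit)


# Placeholder inputs (assumptions, not measured values).
STOCK_MELEE_DAMAGE = 65.0
SKULLCUTTER_DAMAGE = STOCK_MELEE_DAMAGE * 1.20  # the +20% damage stat quoted above
CRIT_CHANCE = 0.15
TARGET_HIT_POINTS = {"light target": 125, "medium target": 175, "heavy target": 300}

for name, hp in TARGET_HIT_POINTS.items():
    stock = expected_hits_to_kill(hp, STOCK_MELEE_DAMAGE, CRIT_CHANCE)
    cutter = expected_hits_to_kill(hp, SKULLCUTTER_DAMAGE, CRIT_CHANCE)
    print(f"{name}: stock {stock} hits, Skullcutter {cutter} hits")
```

Even this toy version ignores the speed penalty and the longer reach, which is exactly why a real comparison needs far more data than any of us have.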

But what a chore.

It was far easier for me to switch to the Demoman class, equip the weapon (along with the Chargin' Targe) and rush out and sever heads. Eight heads and three minutes later, I could confidently declare 'The demoaxe is OP' before switching back to whichever class I was intending to play for the evening.

That is because I'm not trying to find out whether the Skullcutter is a viable weapon - because I'm not able to run a controlled statistical trial or simulation. What I am looking for is anecdotal evidence that the Skullcutter is a valid weapon to use at all. When we talk about interesting choices in games, we are not trying to determine whether the alternatives are equally viable, but whether the alternatives can be easily validated.

What do I mean by this? Let's take the Pyro's Homewrecker. This weapon has its damage nerfed against players in return for increased damage against enemy buildings, and the ability to knock a Spy's sappers from friendly buildings. The Homewrecker is not an especially viable weapon. The damage nerf is too high and there is limited opportunity to use its special abilities.

But people still use the Homewrecker: 39% of them when given the choice. This is because it is easy to validate when the Homewrecker is useful - by knocking sappers off a friendly building. Every time you do this, you are unconsciously reinforcing the correctness of the decision to use the Homewrecker. And if you see sapped buildings when you don't have a Homewrecker equipped, you are immediately reminded of the presence of this other choice.

Any time you are forced to make a decision you are immediately also creating a doubt in your mind about whether you have made the right choice, so you are unconsciously attempting to look for evidence to support the decision. People play games not for the choices, but for the validation of those choices.

So we have a contrast between types of decision: viable decisions are only important at high level play, where the exact percentage by which damage or movement is changed is critical, but validated decisions are important at all levels of play. Equally, you can present lots of interesting choices which are not viable, but are still valid - consider the fact that there are nine classes in TF2, only five of which are viable, but all nine of which are valid. A lot of balancing work, statistical analysis and heated debate is put towards trying to make decisions viable - but it is far more important to make sure decisions are valid.

I've suggested there are two ways that a decision can be validated: you can have either negative or positive reinforcement. Negative reinforcement - where you are shown to have made the wrong decision - is weak in the sense that it reminds you of the presence of alternatives, but also creates frustration about the choice you have made. But positive reinforcement, that you have made the right decision, is much stronger because we actively seek it out, and allow our gut instincts in doing so to overcome our rational, viable, decision making processes. And it is particularly the feeling of mastery, where you feel unbeatable against others who have made a poor choice, that is important in multi-player games.

So how does this apply to the Pyro?

Let's take another Team Fortress 2 class which is not viable at high level play: the Spy. The Spy isn't used much competitively because his abilities rely in part on the incompetence of other players: his attacks are situational, he has limited ability to move unexpectedly and limited damage output.

But importantly, the Spy has a lot of opportunity to demonstrate mastery - the backstab being the ultimate expression of dominance over another player. Each successful backstab reinforces the decision taken to be a Spy, as does sneaking around behind enemy lines, avoiding detection while being disguised and sapping buildings. A Cloak & Dagger spy has his decision to equip this item validated every time he hides in a corner and does nothing - surely a contrast in a decision being validated, while being completely nonviable for him and his team. A Dead Ringer spy has his decision to equip this item reinforced every time he (apparently) dies. Beginner spies are drawn to sap buildings like flies to honey simply because they can.

Even the simple act of jumping up and down on the spot can validate the choice to be a Spy if you're disguised as a Scout, and the enemy makes the mistaken assumption you are one. And no other class has a 'validate my Spy-ness' button like the Spy's call for medic while disguised.

Similarly the Engineer is rewarded for their engineer busy work alongside sentry kills and the Sniper for head shots, even though fully charged body shots are a much more viable decision. (The Razorback is another great example of a decision which is not especially viable but is frequently, and usually negatively, validated).

What creates the most heat about the Pyro is the fact the class has so few opportunities for mastery. There are two clear examples:
1. Air blasting an uber charge
2. Setting a Spy on fire

In addition, I regularly experience two additional types, which are situational and can be difficult to achieve:
3. Killing an enemy from behind using the Backburner.
4. Reflecting rockets at medium range against a Soldier.

There are more (Puff & Sting, finishing with an Axtinguisher, Flaregun criticals, corner rushing a sentry nest, the Homewrecker example I gave earlier) but these have the same problems of unreliability and situational complexity.

Equally there are situations that the Pyro primary weapon design implies should give them mastery, but in fact do not, such as setting lots of enemies on fire and ambushing an enemy. Each time the Pyro fails to perform in these situations, the player is negatively reminded of the choices that they could have made (e.g. pick another class) but did not.

The challenge in a Pyro redesign is to validate their decision through feelings of mastery, by appealing to the intuition of the player, without necessarily adjusting the viability of the class. I don't expect the Pyro to be used in high level play, so I'm not focused on increasing the top end skill level. I would suggest a design which does the following:

1. Reward setting multiple enemies on fire.
2. Reduce the penalty for missing nearby rocket and grenade reflections.
3. Increase survivability in close quarters combat.
4. Reduce the penalty for choosing the Backburner while increasing the skill required.

These are chosen to try to minimise the negatives for a wrong decision, to allow the situations I outlined earlier to emerge more frequently and validate the decision to play a Pyro. I believe these can be achieved without requiring additional art or animation assets, or new features be added to the game, as follows:

1. Reward setting multiple enemies on fire.

The only benefit the Pyro receives from the after burn is an increased chance of criticals, which ramps up slowly as the after burn DoT accumulates (and is vulnerable to after burn removal from a variety of sources). The minimal change to benefit the Pyro would be to apply the total DoT credit to the critical chance immediately, instead of accumulating the damage as the enemy burns, i.e. every enemy the Pyro has on fire counts for +0.6% chance of criticals every second for up to 30 seconds after they were last hit by a flame particle.

As noted, Backburner criticals create a sense of mastery - by crediting the Pyro prematurely for fire damage they will inflict, there is an increased likelihood of experiencing this feeling against a large group of enemies.
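As a rough sketch of how that credit might be tallied: the code below takes one possible reading of the proposal, in which each enemy hit by a flame particle within the last 30 seconds contributes a flat 0.6 percentage points at any given moment. The function and constant names are my own invention for illustration, and nothing here reflects how the game actually tracks critical chance.

```python
CRIT_BONUS_PER_ENEMY = 0.006   # +0.6% per burning enemy, from the proposal above
CREDIT_WINDOW_SECONDS = 30.0   # credit lasts 30 s after the last flame hit


def crit_bonus(now, last_flame_hit_times):
    """Bonus crit chance from enemies still inside the credit window.

    last_flame_hit_times holds, per enemy, the time (in seconds) at which
    that enemy was last touched by a flame particle. This bookkeeping is
    an assumption made for the sketch.
    """
    credited = [
        t for t in last_flame_hit_times
        if 0.0 <= now - t <= CREDIT_WINDOW_SECONDS
    ]
    return CRIT_BONUS_PER_ENEMY * len(credited)


# Three enemies torched at 5 s, 15 s and 39 s; at 40 s the first has expired.
bonus = crit_bonus(40.0, [5.0, 15.0, 39.0])
print(f"bonus crit chance: {bonus:.1%}")
```

The point of the flat credit is that it does not depend on the enemy staying alight, so an extinguished enemy still counts towards the Pyro's bonus until the window runs out.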

2. Reduce the penalty for missing rocket and grenade reflections.

Grenades and rockets both deliver explosive damage, so the simplest solution without affecting the balance of the Pyro against other classes is to give the Pyro innate explosives resistance. An alternative would be to reduce burst damage, but that doesn't address instances where the Pyro is directly hit (likely, given they are moving into the proximity of the weapon). At the moment two close range rocket or grenade direct hits will kill a Pyro: to boost this to three requires they have at least 224 health, or approximately 1.3 times their current health. A critical rocket does 270 damage, a critical grenade does 280 - 300 damage.

So the Pyro should have 30-60% additional resistance to explosive damage. This can be justified in game by the suit the Pyro wears. At 30% resistance, an overhealed Pyro will be able to resist a single critical rocket or grenade so I'd suggest this as the baseline (compare this to the Targe's 40% resistance).
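The arithmetic behind these figures can be checked with a short sketch. It assumes a stock Pyro has 175 health and that overheal caps at 150% of that (neither number is stated above), and it infers a 112-damage close range direct hit as half of the 224 figure. All names are hypothetical.

```python
import math

PYRO_HEALTH = 175         # assumption: stock Pyro health
OVERHEAL_FACTOR = 1.5     # assumption: overheal caps at 150% of base health
DIRECT_HIT = 112          # inferred: half of the 224 two-hit total quoted above
CRIT_ROCKET = 270         # from the text
CRIT_GRENADE_MAX = 300    # upper end of the 280 - 300 range in the text


def damage_taken(raw_damage, resistance):
    """Damage after applying a fractional explosive resistance."""
    return raw_damage * (1.0 - resistance)


def hits_to_kill(health, raw_damage, resistance):
    """Number of identical hits needed to reduce health to zero or below."""
    return math.ceil(health / damage_taken(raw_damage, resistance))


def survives(health, raw_damage, resistance):
    """True if a single hit leaves the target alive."""
    return damage_taken(raw_damage, resistance) < health


for resistance in (0.0, 0.3, 0.6):
    hits = hits_to_kill(PYRO_HEALTH, DIRECT_HIT, resistance)
    overhealed = PYRO_HEALTH * OVERHEAL_FACTOR
    lives = survives(overhealed, CRIT_GRENADE_MAX, resistance)
    print(f"{resistance:.0%} resistance: {hits} direct hits to kill, "
          f"overhealed Pyro survives a critical grenade: {lives}")
```

Under these assumptions, 30% resistance is enough both to move the Pyro from two direct hits to three and to let an overhealed Pyro live through a single critical explosive.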

3. Increase survivability in close quarters combat.

The biggest risk to the Pyro in close quarters combat is from rockets or grenades which they are unable to reflect in time - all other enemies can be set on fire and then puffed away. The increased resistance to explosive damage should be sufficient to address this.

4. Reduce the penalty for choosing the Backburner while increasing the skill required.

The choice of Backburner is penalized every time the Pyro is presented with the opportunity to do an Air Blast - which is frequently. I suggest the situations where the Backburner gets mastery are so limited (from behind, and sometimes against other Pyros) that the complete elimination of the air blast is too significant. So I'd give the Backburner the following changes from the Flamethrower:

+15% damage
Criticals from behind
Reflected projectiles do not minicrit
-50% ammunition capacity

At 100 ammo, the Backburner still has over 8 seconds of burn available, but each air blast uses twice the effective ammunition and so is a significant choice to make.
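To make the trade-off concrete, here is a small sketch of the ammunition arithmetic. The 200-ammo Flamethrower tank follows from the -50% figure giving 100; the air blast cost of 20 ammo and the drain rate are assumptions of mine (the "over 8 seconds" figure only implies a drain of at most about 12.5 ammo per second), and the names are hypothetical.

```python
FLAMETHROWER_AMMO = 200                    # implied by -50% capacity giving 100
BACKBURNER_AMMO = FLAMETHROWER_AMMO // 2   # the proposed -50% ammunition capacity
AIR_BLAST_COST = 20                        # assumption: ammo spent per air blast
ASSUMED_AMMO_PER_SECOND = 12.5             # assumption: upper bound implied by the text


def air_blast_share(capacity, cost=AIR_BLAST_COST):
    """Fraction of a full tank that one air blast consumes."""
    return cost / capacity


def air_blasts_available(capacity, cost=AIR_BLAST_COST):
    """Air blasts a full tank allows if nothing is spent on flames."""
    return capacity // cost


def burn_seconds(capacity, ammo_per_second=ASSUMED_AMMO_PER_SECOND):
    """Continuous flame time from a full tank at the assumed drain rate."""
    return capacity / ammo_per_second


for name, capacity in (("Flamethrower", FLAMETHROWER_AMMO),
                       ("Backburner", BACKBURNER_AMMO)):
    print(f"{name}: {air_blast_share(capacity):.0%} of the tank per air blast, "
          f"{air_blasts_available(capacity)} air blasts per tank")
```

Halving the tank doubles the share each air blast takes, which is what turns every puff into a decision rather than a reflex.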

4 comments:

Unknown said...

"High level competitive play in Team Fortress 2 is usually 6v6, with two soldiers, 2 scouts, a medic and a demo on each team. This structure has emerged organically from the interactions between the variety of TF2 classes and shows that the choice of class is important, and balanced in a way that is interesting and useful in tournament play."

It's worth mentioning that this is a result of competitive TF2 having a limit of 1 medic and 1 demoman per team, and 2 of any other class. Early tournaments without class limits were dominated by teams consisting entirely of demomen and medics, and I'm sure the same combo would work these days even after all the balance changes.

Jotaf said...

Team Fortress has hardly any ASCII!! :P

Just kidding, I enjoyed the last few posts. (I'm just not too crazy about TF.)

Andrew Doull said...

Chris: Thanks, wasn't aware of that.

Jotaf: I'll defend my recent posts by pointing out that the developer of 100 Rogues also cites TF2 as a big influence.

And Robin Walker plays roguelikes.