Thread by @MikkiHEL, So let me put my narrative jerk hat on for a sec [...]

So let me put my narrative jerk hat on for a sec and talk about something I think might be the most commonplace way to fuck up the storytelling of your game. It's a tiny thing, it makes perfectly serviceable scenes kind of suck, and you see it ALL. THE. TIME.

(A thread.)

As an example, I'm going to use a bit from Cyberpunk 2077. Please don't think I'm trying to dunk on the game. I promise I'm not. You see this in games everywhere CONSTANTLY. I chose this simply because it's a loading screen, so I could very easily find it and capture it.

Okay, now, what we have here is a segment from a talk show. It kind of speaks for itself -- take a look:

If you have any kind of ear for this sort of thing, you can probably tell that it's... not very good. Doesn't flow very well, feels stilted and unnatural. You'd be forgiven for thinking it's badly written or badly acted. But it isn't! The content and performance are both fine.

This is a question of implementation. The problem is the pacing. Before I get into this, I should briefly explain how your typical dialogue system works. Essentially, somebody puts the recorded lines into a script file in the correct order, and then the system plays them.

And when it plays them, it leaves a little gap between the lines, typically .6 seconds, which is generally a pretty good pause for that, because on average, it sounds pretty natural: somebody says something, another person responds. What could be simpler, right?

But in this clip, it doesn't work. There's crosstalk! So when Lt. Sara Krakosky stammers and trails off, the idea is she does that because Ziggy Q interrupts her. But the dialogue system doesn't know that. It just plays the audio assets one after another, with the gap in between.

So what we hear is Krakosky just kind of losing steam for no reason, and falling quiet. There's a pause and then Ziggy barges in with "hold on, hold on." But of course he's speaking over dead air at that point, instead of interrupting anybody. And that sounds weird as hell.
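To make that concrete, here is a minimal sketch of the "play them in order, with a gap" behavior described above. It is an illustration, not anybody's actual dialogue system: the `Line` class, the `schedule_naive` function, and the clip durations are all made up for the example. Only the 0.6 second gap and the shape of the exchange come from the thread.

```python
from dataclasses import dataclass

GAP_SECONDS = 0.6  # the typical fixed pause between lines


@dataclass
class Line:
    speaker: str
    note: str
    duration: float  # length of the audio asset in seconds (invented values)


def schedule_naive(lines, gap=GAP_SECONDS):
    """Return (start_time, line) pairs for strictly sequential playback.

    Every line starts only after the previous asset has finished playing,
    plus the fixed gap. The scheduler has no idea what the lines contain.
    """
    schedule = []
    clock = 0.0
    for line in lines:
        schedule.append((clock, line))
        clock += line.duration + gap
    return schedule


scene = [
    Line("Krakosky", "stammers and trails off", 3.0),
    Line("Ziggy Q", "hold on, hold on", 1.5),
]

for start, line in schedule_naive(scene):
    print(f"{start:4.1f}s  {line.speaker}: ({line.note})")
```

With these invented durations, the interrupting line cannot start until the first clip has fully played out and the gap has passed, which is exactly the dead air in the clip.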

And I mean, bad writing and bad acting in video games are certainly not unheard of! But just the same, this is the single biggest reason why video game dialogue so often sounds bad. In fact, when people lose interest in story? Bet you this is why more often than they even realize.

When cause and effect are effectively decoupled from each other like this, people just come across like their reactions are random and stupid. Or unsettling, depending on the context. It's just very hard to be invested in people who don't sound at all like people.

So why does this happen? Depends! Could be folks who implement it don't even realize it's bad. Maybe the dialogue tools suck and you can't fix it without enormous effort. Maybe it's a known problem, but not a priority -- it's not broken, per se, and you gotta ship the fucker.

Tools are very often the culprit: if you have to manually offset the lines every time there's overlap, that can be a gigantic scripting headache. How many milliseconds is good in this particular instance? How much of a hassle is it to make it happen in the first place?
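For a sense of what "manually offset the lines" might look like in data, here is a rough sketch. Everything in it is hypothetical: the `gap_override` field, the `schedule` function, and the numbers. The idea is only that each line can carry its own gap relative to the end of the previous line, and that a negative value means "start before the other person has finished."

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_GAP = 0.6  # fallback pause when nobody has hand-tuned the line


@dataclass
class Line:
    speaker: str
    duration: float  # length of the audio asset in seconds (invented values)
    # Hypothetical per-line tweak: seconds between the end of the previous
    # line and the start of this one. Negative means overlap (interruption).
    # None means "use the default gap".
    gap_override: Optional[float] = None


def schedule(lines, default_gap=DEFAULT_GAP):
    """Return a start time for each line, honoring per-line overrides."""
    starts = []
    prev_end = None
    for line in lines:
        if prev_end is None:
            start = 0.0
        else:
            gap = default_gap if line.gap_override is None else line.gap_override
            start = max(0.0, prev_end + gap)  # never start before the scene does
        starts.append(start)
        # Simplification: the next line is timed off the line just scheduled,
        # even if an overlapped line is still playing underneath it.
        prev_end = start + line.duration
    return starts


scene = [
    Line("Krakosky", 3.0),
    Line("Ziggy Q", 1.5, gap_override=-0.4),  # cuts in 0.4s before she finishes
]

print(schedule(scene))
```

The code is the easy part. The headache is that somebody has to pick that `-0.4` by ear, for every overlap in the game, and then check it in context.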

And depending on how things are set up, perhaps you can't even test the scene right away. You change the dialogue timing by a quarter second, then you wait hours for a new build. Not quite right? Tweak it again, wait again. Mind-numbing, and not actually mission critical.

But it does make a huge difference in how things feel. A couple of tweaks in the dialogue timing can make a scene feel completely different. To demonstrate this, here is a version of this exchange that I have (very crudely) re-edited to improve the flow.

Now, yeah, OBVIOUSLY, this is far from a great edit; I'm an amateur fucking around to make a point about pacing. I don't have a clean version of the source (that ambient soundscape + my clumsy skills = yikes).

But listen to them back to back.

Bet you can tell the difference.

And like I said, you see this all the time, in huge games with gigantic budgets. Great care goes into writing and casting and acting and recording, but then the material is consigned to the dialogue system. It spits out lines one after another, heedless of the content or context.

It makes a lot of storytelling come across far, far worse than it could, or should. It's a goddamn tragedy. If you're working on a game with talking in it? Make this your priority. I guarantee it will pay off. Giving a shit about this is the secret sauce to making dialogue work.

And that, friends, is one of the most important things you can learn about video game storytelling: way, WAY more often than you might realize, what you have is way less important than how it is implemented. This applies to a lot of other things besides dialogue, too.

OK! Done.

Okay, so, I went to bed, I woke up to a ton of good questions and comments! Awesome, I love that so many people are feeling this. It's pretty great.

Still, let me just add a couple of things for clarity and maybe expand on a few points a little bit...

First of all: I know I talk about the technical aspects quite a bit above. There are others I didn't even get into! (Subtitles and localization, anyone?) But this is not REALLY a technical issue. More than anything else, it's a question of industry culture and prioritization.

Secondly: Ain't as simple to fix as you might think. I know it's very attractive to say "oh, just do X," but... if it really was easy, the problem wouldn't be near-universal. A lot of the obvious technical solutions have issues you aren't thinking about. But more than that...

At its core, this is a question of getting the work done. Particularly in a big open world game, you can easily have 100,000 lines of dialogue. If you assume that you spend, on average, just 15 seconds ensuring that each line is properly paced, that's about 417 hours of work.

And that's assuming 15 seconds is all it takes per line. (It won't be. It won't be anywhere even close to that.) That's not even taking into account all the times the lines themselves have to change. So it's a big commitment of time and resources.
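If you want to check the back-of-the-envelope math, it works out like this. The line count and the 15 seconds are the figures from above; the 40-hour work week is my own assumption, added only to give the number some scale.

```python
LINES_OF_DIALOGUE = 100_000
SECONDS_PER_LINE = 15          # the optimistic per-line estimate
HOURS_PER_WORK_WEEK = 40       # assumption, not a figure from the thread

total_seconds = LINES_OF_DIALOGUE * SECONDS_PER_LINE
total_hours = total_seconds / 3600
work_weeks = total_hours / HOURS_PER_WORK_WEEK

print(f"{total_hours:.0f} hours, or roughly {work_weeks:.1f} work weeks")
# prints "417 hours, or roughly 10.4 work weeks"
```

Bump `SECONDS_PER_LINE` to something more realistic and the total scales linearly with it.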

And I (obviously!) think that it's absolutely worthwhile to make that commitment! It makes such a huge difference. But it's easy for me to say this here and now. Those conversations have a very different tone during the ongoing rolling emergency that is game development.

Because when you prioritize things, there's "mission critical" and then there's "nice to have." And this stuff is rarely considered to be mission critical. Sometimes it is! There are studios that prioritize it, because they recognize the huge impact it has. But others do not.

And it's not necessarily because they are stupid or don't get it. If we're honest about this, it's because they just... think everything else matters more. I very much disagree, but then storytelling is my specialty, it's what I do. So of course I disagree.

And, I mean, when a game like Skyrim sells fifty bazillion copies even though most of its dialogue sounds like every character is set to random, are they wrong? Are they really?

...I mean, they are. Yes. I say this as an expert in my field. But even so, I have to be honest and acknowledge that I am neither funding the project nor personally beholden to the people who do, and that shit matters. You can argue it shouldn't but, well, it is what it is.

Another huge factor is that when these decisions are REALLY made, it's not an abstract weighing of pros and cons. It's "we have 50 individual fires and 40 buckets of water, CHOOSE WISELY, MOTHERFUCKER. CHOOSE NOW AND LIVE WITH IT." So, you know... it gets complicated.

Really, all of this is so much more complicated than you realize if you haven't been in one of these sausage factories. The entire production process has this way of making seemingly obvious and straightforward decisions and situations incredibly murky and fraught.

So, yeah, it's... complicated. But don't misunderstand: I'm not saying this is how it has to be, or it's inevitable, or there's no point in complaining about it. I AM complaining about it! But more than anything else, we need a conscious shift in culture and priorities to fix it.

Okay, so let me add another thing, just because it's been brought up a lot: a lot of people have asked why you wouldn't just record both actors at the same time and put all the dialogue in a single file. So let me tackle that here so I don't have to keep repeating it! :)

It seems like an obvious thing, right? The actors are going to sound better playing off each other, you can get the pacing right in their actual performance, and once you put it in a single file, you don't have to worry about pacing the individual lines. Easy peasy!

Well...

There are many reasons why that's not how things are generally done. Here are a few!

First of all, these games put you in a 3D environment, where audio changes depending on where you are in relation to its source. That requires that each character's voice be a separate asset.

In this particular instance, given that this is a canned talk show, you could maybe get away with it! But... subtitle systems rely on the scripting. When a line is played, the subtitle appears on screen. Otherwise you have to time them manually, which is a huge pain.

Also, consider localization: when the lines are separate assets, you can get translated versions, drop them in, and you can expect things to work. The pacing, if it's done right to begin with, is likely to be fine, even though the audio files aren't the exact same length!

Also, individual lines can now change for other reasons without you having to re-time everything. Did a line's length change? No problem. Did the content change? No problem. Just drop it in, it works. You don't have to re-do the entire scene. (I'm simplifying it, but still.)
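Here is a rough sketch of why separate assets make the subtitle and localization side easier, under some assumptions of mine: pacing is stored as a gap relative to the end of the previous line, and the subtitle cue for a line simply spans that line's playback. The line IDs, locale codes, durations, and the `build_cues` function are all invented for illustration.

```python
DEFAULT_GAP = 0.6

# Pacing is authored once, independent of language:
# (line_id, gap relative to the end of the previous line; None = default).
SCRIPT = [
    ("krakosky_01", None),
    ("ziggy_01", -0.4),   # interruption: starts before the previous line ends
    ("krakosky_02", None),
]

# Hypothetical clip lengths in seconds. Translated takes rarely match exactly.
DURATIONS = {
    "en": {"krakosky_01": 3.0, "ziggy_01": 1.5, "krakosky_02": 2.0},
    "de": {"krakosky_01": 3.4, "ziggy_01": 1.8, "krakosky_02": 2.3},
}


def build_cues(script, durations, default_gap=DEFAULT_GAP):
    """Return (line_id, start, end) per line; a subtitle shows from start to end."""
    cues = []
    prev_end = None
    for line_id, gap_override in script:
        if prev_end is None:
            start = 0.0
        else:
            gap = default_gap if gap_override is None else gap_override
            start = max(0.0, prev_end + gap)
        end = start + durations[line_id]
        cues.append((line_id, round(start, 3), round(end, 3)))
        prev_end = end
    return cues


for locale in ("en", "de"):
    print(locale, build_cues(SCRIPT, DURATIONS[locale]))
```

Because the gaps hang off the end of the previous line rather than off absolute timestamps, swapping in longer or shorter clips shifts everything downstream without anyone re-timing the scene, and the interruption still lands 0.4 seconds before the first line ends in both languages. Real subtitle systems have more going on than this, so treat it as the idea rather than a design.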

So... why aren't actors recorded together? Wouldn't the performances be better? Most likely yes! But a big game can easily have hundreds of speaking parts, and scheduling the actors to be in the booth together is often very difficult. It easily turns into a logistical quagmire.

What's more, voice actors who get cast also tend to end up doing multiple parts -- a good voice actor can do several characters, so frankly, it's just better bang for your buck to take advantage of that, not just in terms of money but also production bandwidth.

A lot of time and effort goes into setting things up, getting the actor into the game's tone and atmosphere, etc. So once you have them prepped, if you can do multiple characters with them, that's just so much easier and faster for everybody concerned.

But if you have, say, four actors doing ten characters between them and some of them interact in various combinations, it becomes a booking nightmare. All of a sudden what could have simply been four different VO sessions mushrooms into a real cluster headache.

That's not to say you can't record actors together! Sure you can, and you can get great results out of that. But it's not a simple solution; if anything, it's more difficult to do. It's EXACTLY like pacing dialogue properly, in fact: absolutely doable... but it's extra effort.

(Oh, and note I'm now talking about recording VO in a recording studio. Actual performance capture on a sound stage is a very different animal for all sorts of reasons. I won't get into it now, but you should know that vastly different standards and practices apply.)
