Data oriented game design

Does this mean that OOP is useless and you should never apply it in your programs? Thinking in terms of objects is not detrimental when there is only one of each object (a graphics device, a log manager, etc.), although in that case you might as well write it with simpler C-style functions and file-level static data. A game entity, however, does not survive as a single monolithic object. Instead, it will be split up into smaller subcomponents, each one forming part of a larger data table of similar components.
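To make that concrete, here is a minimal C++ sketch of the component-table idea; the names (World, Transform, Health, ApplyPoison) are invented for illustration and not taken from the original article:

```cpp
#include <vector>

struct Transform { float x, y, z; };
struct Health    { int current, max; };

// All components of one kind live together in one table; an "entity"
// is just a shared index across the tables.
struct World {
    std::vector<Transform> transforms; // entry i belongs to entity i
    std::vector<Health>    healths;    // entry i belongs to entity i
};

// A system touches only the table it needs, streaming through it linearly.
void ApplyPoison(World& world, int damage) {
    for (Health& h : world.healths)
        h.current -= damage;
}
```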

Thanks to Mike Acton and Jim Tilander for challenging my ideas over the years and for their feedback on this article. This article was originally printed in the September issue of Game Developer.

But frankly, I wrote this article based completely on my experiences with current game consoles, with their high CPU speeds, slow main memory, and deep memory hierarchies. Interesting read. Input is the current game state and output is the set of GUI elements describing that state to the user.

Highly optimized software has never had much of anything to do with OOP; it has always had a lot to do with understanding how the hardware executes code and accesses data. No, he wrote about hardware and how to use it optimally. You can certainly construct this code to be cache efficient if you have patience and use good profiling and cache optimization tools. But engineering this is costly and complicated for software that is supposed to look real or hyper-real.

And it is even more costly if you have to fight with horrible copy-in, copy-out encapsulation done in the name of OOP, or sometimes in the name of exception safety, which is even more irrelevant in optimized games. Hi, thank you for this brilliant post. Well done for pointing out the importance of optimizing data flows in game software. Clearly, getting your data flows right is essential for good performance, for all the reasons you mention.

Classes need to encapsulate meaningful abstractions in your software to be useful, not trivial textbook examples like Bark or Car. I would propose making data flows explicit from the start, and introducing classes that work with the data flows rather than against them. Maybe you could point out specific OOP practices that obscure the underlying data flows. Then we could discuss whether OOP is at fault or just some bad design practices.

In general, I love immediate everything. You can have it immediate or retained either way. Could you give more details on what the differences would be between some traditional imperative code written in C and the equivalent data-oriented code? People have been doing this in one way or another for a while, especially in the area of DSP.

This is about the data flow instead. That should help clarify things. Of course this potentially allows for a heavy speed-up and parallelization, but it still remains the same operation. All you have to do to switch paradigms is to not call the member method, but to come up with a highly specialized transform function (one of a gazillion) that executes the very same code from the member method in a loop over a set of objects. Modern compilers or JITs should do just that automatically when they encounter a member method being called on a set of objects.
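A minimal sketch of that switch, with invented names: the same math written once as a member method and once as a transform function looping over a contiguous set of objects.

```cpp
#include <cstddef>

struct Particle {
    float x, y;
    float vx, vy;

    // OOP style: each object updates itself through a member method.
    void Update(float dt) {
        x += vx * dt;
        y += vy * dt;
    }
};

// Data-oriented style: one specialized transform function runs the very
// same code in a loop over a whole set of particles.
void UpdateParticles(Particle* particles, std::size_t count, float dt) {
    for (std::size_t i = 0; i < count; ++i) {
        particles[i].x += particles[i].vx * dt;
        particles[i].y += particles[i].vy * dt;
    }
}
```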

Also, you are losing the abilities that OOP tries to provide for distinguishing between object instances (the private implementation aspect of an object, publishing concerns, and abstraction through inheritance), but of course in many scenarios this is not really needed. However, if at some point in the future of a program you suddenly have to care about differences in each block of data, you would have to reinvent the wheel. OO is good when you want to describe a type.

OO is also good at encapsulation. However, most programmers familiar with UNIX should be familiar with the concepts you mentioned. The shell uses exactly the same design pattern as a way to pass data between processes in a pipeline. I have to agree with frevd. I understand where you are going with the paradigm of processing data flows with a transformation focus, but data can be very volatile, and the slightest need to change a flow can have you digging around several functions, each simple, but possibly numerous (frevd labeled it a gazillion, which is probably pretty accurate).

OOP provides a bit of a shield from this, I think. Personally, I think you could do much of this in OOP by simply thinking in more abstract terms. A class that focuses on a single bullet and its trajectory? No, a class that focuses on the data flow that will take the bullet from point A to point B given a series of data transformations, perhaps.

Definitely food for thought, and it gives OOP another twist… at least when it comes to some of the classes one might design. If not, then you can work on it at the object layer. Thanks for the great comments and feedback. My response is that technically you can, but the resulting architecture is anything but OOP. Objects end up being very small and not representing a whole concept, but only a fraction of one.

And yes, it means that when you think about full logical concepts, their implementation is going to be spread out across multiple files. Parallelization — I thought this argument was pretty weak.

Like this (see the sketch below). Not true: this is actually the same as OOP, except the function call is on the other side. There need not be any real difference except for the fact that in data-oriented design, the input can be one chunk of contiguous memory, while in OOP you have no control over that.
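The comment's original snippets did not survive this reprint; a hedged reconstruction of the contrast it describes, with invented names, might look like this:

```cpp
struct State { float value; };

// Data-oriented / procedural form: a free function receives the data,
// which can come from one chunk of contiguous memory.
void Update(State& state) { state.value += 1.0f; }

// OOP form: the same call, just "on the other side" of the dot.
struct Object {
    State state;
    void Update() { state.value += 1.0f; }
};
```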

But those issues are not inherently OOP issues. You can have all of those things in data-oriented design, or any other kind of programming for that matter. Maybe OOP encourages the programmer to make these kinds of mistakes, and maybe data-oriented programming encourages them not to.

Once again, great article. Usually member functions change internal state, and objects are often interconnected, so a function call chains to other function calls in other objects. So yes, you can use classes, member functions, and private variables. Yes, you can put the update function as a static member function.

If you have a server-based application, it is cheaper to throw hardware at the problem and manage the complexity of the system with OOP and abstraction layering than to drown in a sea of data.

OOP has its place. This style of programming has its place. Other styles of programming (functional, logical, and other esoteric things) have their places.

End of story. My thoughts exactly. I think it would be easier to write Pac-Man in data-oriented design than OOP, but the code would be messier. As you point out, it simplifies multithreading and cache utilization. Objects can and often should be completely abstract, and so can just as easily represent a data stream, an algorithm, or a stream manager rather than an entire game entity. You say (post 19) that the objects would end up being very small. I would say that this is good OOP: each object should have a single responsibility.

Are there any OOP principles in particular that must be violated in order to implement a data-oriented design? The essence of OOD is to provide encapsulation so that you can isolate each object as a black box that performs operations on itself.

Why is that so important? This process modularises the code so that lots of people can work on it at once, and the code can be built up gradually along the way. It helps encourage code reuse and simplifies the design for anyone coming to the project fresh. When looking at a carefully written OOP program, you can start at the outer layer, then gradually dig deeper into each layer of abstraction to see specific implementations. I entirely disagree with your point on modularity; a well-written OOP program is much easier to understand than a completely data-oriented one.

In addition, as has been mentioned in these comments, a major headache with the data approach is passing thousands of arguments into each method, which have to be kept up to date and which restrict the flexibility to change the data (double to float, for example) without changing every single argument in which it is used.

However, I have successfully been combining the improved performance of the data model with the better design of the object model. We need the next step of evolution that ties these two areas together. Reading through the comments, I see that many here have problems understanding that OOD and data-oriented design are both about a way of thinking, not about syntax.

Some seem to think that the first one is functional or procedural while the last one is OOP. The first might just as well be OOP. You can decide whether something is OOP or not by looking at how a single function call treats its input: if the input is an object with hidden internals that you can only manipulate through its methods, the call is object-oriented.

If, instead, the input is a simple data type which is publicly known and which you can easily manipulate directly, then it might be part of a more data-oriented or procedure-oriented system.

Functional programming has nothing to do with data-oriented design, by the way, as some seem to think. After all, all experienced programmers know about pointers and indices, right?

From a 10,000-foot view, all video games are just a sequence of bytes. Those bytes can be divided into code and data. Code is executed by the hardware and it performs operations on the data. This code is generated by the compiler and linker from the source code in our favorite computer language. Data is just about everything else. As programmers, we spend most of our time thinking about code, and we make most decisions based on a code-centric view of the game.

Modern hardware architectures have turned things around. A data-centric approach can make much better use of hardware resources, and can produce code that is much simpler to implement, easier to test, and easier to understand.

This month we start by looking at how to manage data relationships. Data is everything that is not code: meshes and textures, animations and skeletons, game entities and pathfinding networks, sounds and text, cut scene descriptions and dialog trees. In a game, just about all the data is intertwined in some way.

A model refers to the meshes it contains, a character needs to know about its skeleton and its animations, and a special effect points to textures and sounds. How are those relationships between different parts of data described? There are many approaches we can use, each with its own set of advantages and drawbacks. The most straightforward one is the plain pointer. However, pointers have their share of shortcomings. The biggest drawback is that a pointer is just the memory address where the data happens to be located.

We often have no control over that location, so pointer values usually change from run to run. This means if we attempt to save a game checkpoint which contains a pointer to other parts of the data, the pointer value will be incorrect when we restore it.

Pointers represent a many-to-one relationship. You can only follow a pointer one way, and it is possible to have many pointers pointing to the same piece of data (for example, many models pointing to the same texture). All of this means that it is not easy to relocate a piece of data that is referred to by pointers.

Unless we do some extra bookkeeping, we have no way of knowing which pointers are pointing to the data we want to relocate. Once the data moves, those pointers will point to a place in memory that contains something else, but the program will still think it has the original data in it, causing horrible bugs that are no fun to debug.

So we need to have some extra step in the runtime to set the pointers after loading the data so the code can use them. This is usually done either by explicit creation and linking of objects at runtime, by using other methods of identifying data, such as resource UIDs created from hashes, or through pointer fixup tables converting data offsets into real memory addresses. All of it adds some work and complexity to using pointers.
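As an illustration of the fixup-table approach, here is a minimal sketch assuming a 64-bit platform and an invented blob layout; nothing here is prescribed by the article:

```cpp
#include <cstdint>

// The fixup table lists, for each stored reference, where inside the
// loaded blob that reference lives. Each slot initially holds an offset
// from the start of the blob.
struct FixupTable {
    std::uint32_t        count;
    const std::uint32_t* locations; // offset of each pointer slot in the blob
};

void ApplyFixups(std::uint8_t* blob, const FixupTable& fixups) {
    for (std::uint32_t i = 0; i < fixups.count; ++i) {
        // Each slot holds an offset; replace it with the real address.
        // Assumes slots are pointer-sized and properly aligned.
        auto* slot = reinterpret_cast<std::uintptr_t*>(blob + fixups.locations[i]);
        *slot = reinterpret_cast<std::uintptr_t>(blob + *slot);
    }
}
```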

Given those characteristics, pointers are a good fit to model relationships to data that is never deleted or relocated, from data that does not need to be serialized.

One way to get around the limitation of not being able to save and restore pointer values is to use offsets into a block of data. The problem with plain offsets is that the memory location pointed to by the offset then needs to be cast to the correct data type, which is cumbersome and prone to error.
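A quick sketch of why plain offsets are error prone: the caller supplies the type, and nothing checks that the cast is correct (the helper below is invented for illustration):

```cpp
#include <cstdint>

// Turn an offset into a typed pointer. Pass the wrong T and the bytes are
// silently reinterpreted as the wrong type; nothing catches the mistake.
template <typename T>
T* FromOffset(std::uint8_t* blockStart, std::uint32_t offset) {
    return reinterpret_cast<T*>(blockStart + offset);
}
```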

The more common approach is to use indices into an array of data. Unfortunately, they still suffer from the same problem as pointers of being strictly a many-to-one relationship and making it difficult to relocate or delete the data pointed to by the index.

Additionally, arrays can only be used to store data of the same type (or different types of the same size, with some extra trickery on our part), which might be too restrictive for some uses.

A good use of indices into an array is particle system descriptions. All the descriptions can be loaded into one array, and the game can create instances of particle systems by referring to each description by its index into that array.
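A hedged sketch of that arrangement, with invented type names:

```cpp
#include <cstdint>
#include <vector>

struct ParticleSystemDesc {
    float spawnRate;
    int   maxParticles;
};

// Descriptions are loaded once into a stable array and never move.
std::vector<ParticleSystemDesc> g_particleDescs;

// Instances refer to their description by index, which, unlike a pointer,
// stays valid from run to run and can be serialized directly.
struct ParticleSystemInstance {
    std::uint32_t descIndex; // index into g_particleDescs
    float         timeAlive;
};
```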

On the other hand, the particle system instances themselves would not be a good candidate to refer to with indices, because their lifetimes vary considerably and they will be constantly created and destroyed. A variation on plain indices is to index into an array of pointers rather than an array of values. That way, we would be able to deal with different types of data. Unfortunately, storing pointers means that we have to go through an extra indirection to reach our data, which incurs a small performance hit.

An even bigger problem is that, if the data is truly heterogeneous, we still need to cast it to the correct type before we use it. Unless all data referred to by the pointers inherits from a common base class that we can use to query for its derived type, we have no easy way to find out what type the data really is.

We could even delete the data and null the pointer out to indicate it is gone. Because of these drawbacks, indices into an array of pointers are usually not an effective way to keep references to data. Handles are small units of data (32 bits, typically) that uniquely identify some other part of data.

They also have the advantage of being updatable to refer to data that has been relocated or deleted, and they can be implemented with minimal performance overhead. The handle is used as a key into a handle manager, which associates handles with their data. The simplest possible implementation of a handle manager is a list of handle-pointer pairs, where every lookup simply traverses the list looking for the handle. That is clearly too slow; even sorting the handles and doing a binary search is slow, and we can do much better than that.

The handle manager is implemented as an array of pointers, and handles are indices into that array. However, to get around the drawbacks of plain indices, handles are enhanced in a couple of ways: most importantly, each handle carries a counter that must match a counter stored in the corresponding manager entry, so a handle left over from before an entry was reused can be detected as stale. The workings of the handle manager itself are pretty simple. Accessing data from a handle is just a matter of getting the index from the handle, verifying that the counters in the handle and the handle manager entry are the same, and following the pointer.
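A minimal sketch of such a handle manager, assuming 16-bit indices and counters packed into a 32-bit handle; the article does not prescribe exact field sizes or this API:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Handle {
    std::uint16_t index;   // entry in the manager's array
    std::uint16_t counter; // must match the entry's counter to be valid
};

struct HandleEntry {
    void*         data    = nullptr;
    std::uint16_t counter = 0;
    bool          active  = false;
};

class HandleManager {
public:
    Handle Add(void* data) {
        // Naive free-slot search; a real implementation would keep a free list.
        for (std::size_t i = 0; i < entries_.size(); ++i) {
            if (!entries_[i].active) {
                entries_[i].data   = data;
                entries_[i].active = true;
                return { static_cast<std::uint16_t>(i), entries_[i].counter };
            }
        }
        entries_.push_back({ data, 0, true });
        return { static_cast<std::uint16_t>(entries_.size() - 1), 0 };
    }

    void Remove(Handle h) {
        HandleEntry& e = entries_[h.index];
        e.active = false;
        ++e.counter; // invalidates every handle still referring to this entry
    }

    // Relocation is just overwriting the stored pointer; handles stay valid.
    void Relocate(Handle h, void* newData) { entries_[h.index].data = newData; }

    void* Get(Handle h) const {
        const HandleEntry& e = entries_[h.index];
        // A stale or removed handle fails the counter check and yields null.
        return (e.active && e.counter == h.counter) ? e.data : nullptr;
    }

private:
    std::vector<HandleEntry> entries_;
};
```

The counter is the key design point: reusing an entry bumps it, so any handle issued before the reuse fails the comparison in Get and resolves to null instead of to unrelated data.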

Just one level of indirection, and very fast performance. We can also easily relocate or invalidate existing handles just by updating the entry in the handle manager to point to a new location or to flag it as removed. Handles are the perfect reference to data that can change locations or even be removed, from data that needs to be serialized. Game entities are usually very dynamic, and are created and destroyed frequently (enemies spawning and being destroyed, projectiles, and so on).

So any references to game entities would be a good fit for handles, especially if this reference is held from another game entity and its state needs to be saved and restored.

Examples of these types of relationships are the object a player is currently holding, or the target an enemy AI has locked onto.

Smart pointers are another option. A common type of smart pointer deals with object lifetime: it keeps track of how many references there are to a particular piece of data, and frees the data when nobody is using it. Another kind of smart pointer inserts an indirection between the data holding the pointer and the data being pointed to. This allows data to be relocated, like we could do with handles.
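For the lifetime-tracking kind, C++'s std::shared_ptr is the standard example:

```cpp
#include <memory>

struct Texture { /* pixel data, etc. */ };

void Example() {
    // The reference count starts at 1...
    std::shared_ptr<Texture> a = std::make_shared<Texture>();
    std::shared_ptr<Texture> b = a; // ...rises to 2 when copied...
    a.reset();                      // ...drops back to 1...
    b.reset();                      // ...and the Texture is freed at 0.
}
```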

However, implementations of these pointers are often non-serializable, so they can be quite limiting. If you consider using smart pointers from some of the popular libraries (STL, Boost) in your game, you should be very careful about the impact they can have on your build times. Including a single header file from one of those libraries will often pull in numerous other header files.

Additionally, smart pointers are often templated, so the compiler will do some extra work generating code for each data type you instantiate them on. All in all, templated smart pointers can have a significant impact on build times unless they are managed very carefully.

But is the extra complexity of that layer worth the syntax benefits it provides? There are many different approaches to expressing data relationships. This article was originally printed in the September issue of Game Developer.

Really, data is any byte in memory, and that includes code. Most of the time programs are going to be managing references to non-code data, but sometimes to other code as well: function pointers, compiled shaders, compiled scripts, etc.

So just ignore that distinction and think of data in a more generic way. This is my first entry into iDevBlogADay. It all started very innocently with a suggestion from Miguel, but the ball got rolling pretty quickly.

The idea is to have one independent iPhone game developer write a blog entry each day of the week. Check out the new sidebar with all the iDevBlogADay blogs. What I find interesting is that I can do the same thing with my own code… as it changes over time. Every new language I learn, every book I read, every bit of code I see, every open-source project I browse, every pair-programming session, every conversation with a fellow developer leaves a mark behind.

It slightly changes how I think of things, and realigns my values and priorities as a programmer. And those new values translate into different ways to write code, different architectures, and different coding styles. It never happens overnight.

I googled this and couldn't find any real information as to what this is, let alone any code samples.

Is anyone familiar with this term and can provide an example? Is this maybe a different word for something else? My understanding of Data-Oriented Design is that it is about organizing your data for efficient processing.

Especially with respect to cache misses etc. Data-Driven Design on the other hand is about letting data control a lot of the behavior of your program described very well by Andrew Keith's answer.

Say you have ball objects in your application, with properties such as color, radius, bounciness, position, etc. In data-oriented design you would store each property in its own array: one array of colors, another of radii, and so on (see the sketch below). As you can see, there is no single unit representing one Ball anymore. Ball objects only exist implicitly. This can have many advantages, performance-wise. Usually, we want to do operations on many balls at the same time. The hardware usually wants large contiguous chunks of memory to operate efficiently.
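A hedged sketch of that layout in C++, with invented names:

```cpp
#include <vector>

struct Point { float x, y; };

// Each property in its own array; "ball i" is simply index i everywhere.
struct Balls {
    std::vector<int>   color;
    std::vector<float> radius;
    std::vector<float> bounciness;
    std::vector<Point> position;
};

// Moving the balls streams through one contiguous array and never touches
// colors, radii, or bounciness.
void Move(Balls& balls, float dx, float dy) {
    for (Point& p : balls.position) {
        p.x += dx;
        p.y += dy;
    }
}
```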

Secondly, you might do operations that affect only part of the properties of a ball. For example, you might update the positions of all the balls. However, when all ball properties are stored in one unit, you will pull in all the other properties of a ball as well.

Even though you don't need them. Say each ball takes up 64 bytes and a Point takes 4 bytes. A cache slot takes, say, 64 bytes as well. If I want to update the positions of 10 balls stored as whole objects, I have to pull 64 x 10 = 640 bytes into the cache and take 10 cache misses. If the positions are instead packed in their own array, updating them touches only 4 x 10 = 40 bytes. That fits in one cache fetch. Thus we only get 1 cache miss to update all the 10 balls. These numbers are arbitrary - I assume a cache block is bigger - but it illustrates how memory layout can have a severe effect on cache hits and thus performance.

In my ball example, I simplified the issue a lot, because usually for any normal app you will likely access multiple variables together.

Then your structure should group those variables together (see the sketch below). The reason you should do this is that if data used together is placed in separate arrays, there is a risk that they will compete for the same slots in the cache. Thus loading one will throw out the other.
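The answer's original snippet did not survive this reprint; a hedged reconstruction, reusing the ball example and assuming position and radius are the properties used together:

```cpp
#include <vector>

struct Point { float x, y; };

// Properties that are always read together travel together...
struct Body {
    Point position;
    float radius;
};

struct Balls {
    std::vector<Body>  bodies;     // position + radius, used together
    std::vector<int>   color;      // accessed separately
    std::vector<float> bounciness; // accessed separately
};
```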

So compared to object-oriented programming, the classes you end up making are not related to the entities in your mental model of the problem. Since data is lumped together based on usage, you won't always have sensible names to give your classes in Data-Oriented Design. The thinking behind Data-Oriented Design is very similar to how you think about relational databases. Optimizing a relational database can also involve using the cache more efficiently, although in this case the cache is not the CPU cache but pages in memory. A good database designer will also likely split out infrequently accessed data into a separate table, rather than creating a table with a huge number of columns where only a few of the columns are ever used.

He might also choose to denormalize some of the tables so that data doesn't have to be accessed from multiple locations on disk. Just like with Data-Oriented Design, these choices are made by looking at what the data access patterns are and where the performance bottleneck is. Mike Acton gave a public talk about data-oriented design recently. My basic summary of it would be: if you want performance, then think about data flow, find the storage layer that is most likely to screw with you, and optimize hard for it.

Mike is focusing on L2 cache misses because he's doing realtime work, but I imagine the same thing applies to databases (disk reads) and even the web (HTTP requests). It's a useful way of doing systems programming, I think. Note that it doesn't absolve you from thinking about algorithms and time complexity; it just focuses your attention on figuring out the most expensive operation type, which you then must target with your mad CS skills.

I just want to point out that Noel is talking specifically about some of the specific needs we face in game development. I suppose other sectors that are doing real-time soft simulation would benefit from this, but it is unlikely to be a technique that will show noticeable improvement for general business applications. This setup is for ensuring that every last bit of performance is squeezed out of the underlying hardware. If you want to take advantage of modern processor architecture, you need to lay out your data in memory in a certain way.


