A tiny ray tracer project…
May/100
You can download the .zip file including the sourcecode and executable file for windows.
Project page :
http://code.google.com/p/tinyraytracer/
Here is one of the latest personal project I’ve been workin on… Actually I spent about 3 days to finish it up.
It is objective is to create a customizable ray tracer using as much processing power as possible. For now it is multithtreaded but uses only the CPUs, not GPUs.
you can customize the ray tracer parameters from Main.cpp file (hardcoded for now)
-
-
#define SCREEN_WIDTH 800
-
#define SCREEN_HEIGHT 600
-
#define AMBIENT_COLOUR COLOUR_BLACK
-
#define NUM_OF_MAX_REFLECTIONS 8
-
#define SQRT_OF_RAYS_PER_PIXEL 1
-
#define NUM_OF_THREADS_PER_CPU 1
-
#define NUM_OF_PIXELS_TO_BE_RENDERED_BY_A_THREAD_AT_A_TIME 100
-
here is the objective which you can also find at http://code.google.com/p/tinyraytracer/
a primitive ray tracer which for the beginning renders only spheres no texturing only reflection only point light sources static super sampling (but hard shadows) multithreaded static scene
in the future renders triangles 2d texturing might be kd-tree (not sure of it) reflection and refraction point and plane light sources dynamic super sampling CUDA or OpenCL dynamic scene (moving objects and camera)
Setting Up a Home Server with Ubuntu 9.10
Apr/100
It’s being a long time since my last post in my blog but this time I wanted to come across with something different which I think might be quite useful for whom wants to assess their old PCs or laptops.
If ten years ago someone had told me that one day I will have three computers at home in my service, I only would have laughed. But today it is real
I actually have 2 laptops (one is my mother’s which would have made me laugh even more) and one desktop.
The desktop is an AMD athlon 2500 1GB RAM and NVidia 6600GT, one of the laptops is a 512MB RAM (384MB after sharing ) intel 1.6 GHz and other laptop is a intel dualcore 1.8GHz with 2GB RAM. After buying my laptop I gave up using my PC for more than one year. It was resting in peace
Recently I decided to assess it and wanted to set it up as a server for downloading stuff and for experimental uses. However in my first try I couldn’t manage to install Ubuntu 9.10 as I was using my two 120GB disks with Raid0. Unfortunately Ubuntu 9.10 (yes it is the desktop version not the server one) having some problems with Raid 0 on my hardware. I removed the strip and also one of the hdds, started installation… Booom!!! my cdrom wasn’t working properly. There was only one way; to install Ubuntu via a usb drive.
Typing ‘install ubuntu from usb’ on google, I found https://help.ubuntu.com/community/Installation/FromUSBStick and https://help.ubuntu.com/community/USB%20Installation%20Media . Using UNBootin (http://unetbootin.sourceforge.net/) as advised and setting the usb drive as the boot media from my pc’s setup, installation started at last! It should be noted that if the usb drive is an external harddrive then UNbootin doesn’t seem to work, instead use an usb pendrive. Installing ubuntu from a pendrive works pretty fast and flawles.
Anyway
TO BE CONTINUED…
Speed Test: std::Vector vs Array
Mar/100
Most recently I have been working on a multithreaded ray tracer in order to port it to CUDA so I am not using any recursion. The first step is implementing the ray tracer without using any acceleration structure. So I simply test a ray with all the objects in a scene thus I needed to decide using whether an array or a vector.
As the current code strongly relies on the data structure I am using, I wanted to test the iteration speed of std::vector versus array.
Here is the code I used for testing.
-
-
#include <iostream>
-
#include <map>
-
#include <vector>
-
#include <array>
-
#include <stack>
-
#include <windows.h>
-
#include <math.h>
-
typedef DWORD TIME_TYPE;
-
-
-
#define BEGIN_TIMING(repeat) TIME_TYPE dwStartTime = timeGetTime(); for(int __i = 0; __i < repeat; __i++){
-
#define END_TIMING }; TIME_TYPE __ret = (timeGetTime() – dwStartTime); cout<< __FUNCTION__ << "\t\t: elapsed time =\t"<<__ret<<endl; return __ret;;
-
-
-
const unsigned int g_uiStartTestSize = 1000000;
-
const unsigned int g_uiStartRepeatCount = 1;
-
-
unsigned int g_uiTestSize = 0;
-
unsigned int g_uiRepeatCount = 0;
-
unsigned int g_uiNumOfTestSteps = 10;
-
using namespace std;
-
using std::map;
-
using std::vector;
-
-
class ClassA
-
{
-
public:
-
ClassA(){a=b=c=d=0;};
-
~ClassA(){};
-
double a;
-
double b;
-
double c;
-
double d;
-
};
-
-
-
-
vector<ClassA*>* classAVector;
-
ClassA** classAArray;
-
-
TIME_TYPE testFillInArray()
-
{
-
BEGIN_TIMING(1)
-
classAArray = new ClassA*[g_uiTestSize];
-
for(int i =0; i < g_uiTestSize; i++)
-
{
-
classAArray[i] = new ClassA();
-
}
-
END_TIMING
-
}
-
-
TIME_TYPE testFillInVector()
-
{
-
BEGIN_TIMING(1)
-
classAVector = new vector<ClassA*>();
-
for(int i =0; i < g_uiTestSize; i++)
-
{
-
classAVector->push_back(new ClassA());
-
}
-
END_TIMING
-
}
-
-
TIME_TYPE testDestroyArray()
-
{
-
BEGIN_TIMING(1)
-
for (int i = 0; i < g_uiTestSize; i++)
-
{
-
delete classAArray[i];
-
}
-
delete [] classAArray;
-
END_TIMING
-
}
-
-
TIME_TYPE testDestroyVector()
-
{
-
BEGIN_TIMING(1)
-
for (int i = 0; i < g_uiTestSize; i++)
-
{
-
delete (*classAVector)[i];
-
}
-
delete classAVector;
-
-
END_TIMING
-
}
-
-
TIME_TYPE testIterateVector(void)
-
{
-
BEGIN_TIMING(g_uiRepeatCount)
-
ClassA *pClassA;
-
-
/*
-
const int sz = classAVector->size();
-
for (int i = 0; i < sz; i++)
-
{
-
pClassA = (*classAVector)[i];
-
pClassA->a++;
-
pClassA->b++;
-
pClassA->c++;
-
pClassA->d++;
-
}*/
-
-
-
-
vector<ClassA*>::iterator it = classAVector->begin();
-
const vector<ClassA*>::iterator endIt = classAVector->end();
-
while(it != endIt)
-
{
-
pClassA = (*it);
-
pClassA->a++;
-
pClassA->b++;
-
pClassA->c++;
-
pClassA->d++;
-
it++;
-
}
-
-
END_TIMING
-
}
-
-
TIME_TYPE testIterateArray(void)
-
{
-
BEGIN_TIMING(g_uiRepeatCount)
-
ClassA *pClassA;
-
for (int i =0; i < g_uiTestSize; i++)
-
{
-
pClassA = classAArray[i];
-
pClassA->a++;
-
pClassA->b++;
-
pClassA->c++;
-
pClassA->d++;
-
-
}
-
END_TIMING
-
}
-
-
void verify()
-
{
-
cout<<"verifying"<<endl;
-
for (int i=0; i < g_uiTestSize; i++)
-
{
-
if( (*classAArray[i]).a != classAVector->at(i)->a )
-
{
-
cout << "error!:" << i <<endl;
-
}
-
}
-
}
-
-
TIME_TYPE testAll()
-
{
-
BEGIN_TIMING(1)
-
g_uiTestSize = g_uiStartTestSize;
-
for (int i =1; i <= g_uiNumOfTestSteps; i++)
-
{
-
g_uiRepeatCount = g_uiStartRepeatCount * i;
-
cout<< "repeat count =\t\t"<<g_uiRepeatCount<<endl;
-
cout<< "test size =\t\t"<<g_uiTestSize<<endl;
-
testFillInArray();
-
testFillInVector();
-
testIterateArray();
-
testIterateVector();
-
testDestroyArray();
-
testDestroyVector();
-
cout<<"———————————————"<<endl;
-
}
-
g_uiRepeatCount = g_uiStartRepeatCount;
-
for (int i =1; i <= g_uiNumOfTestSteps; i++)
-
{
-
g_uiTestSize = g_uiStartTestSize * i;
-
cout<< "repeat count =\t"<<g_uiRepeatCount<<endl;
-
cout<< "test size =\t"<<g_uiTestSize<<endl;
-
testFillInArray();
-
testFillInVector();
-
testIterateArray();
-
testIterateVector();
-
testDestroyArray();
-
testDestroyVector();
-
cout<<"———————————————"<<endl;
-
}
-
-
END_TIMING
-
-
}
-
-
-
-
-
int main(void)
-
{
-
testAll();
-
return 0;
-
}
-
-
Results
Results are striking indeed. If you compiling the code in Release mode without any run-time error checking and disabled exection support iterating a Vector by using std::vector<>::iterator is almost as fast as using array.
However testing them on debug mode shows the difference, iteration time is terribly slow on std::vector ; about 60 times slower.
Conclusion
std::vector is almost as fast as using a static array on release mode (using Visual Studio 2008) but terribly slow on debug mode. So if your code is running slow on the debug mode, don’t give up std::vector and give a go on the release mode.
finding prime numbers in a range using python
Dec/090
the dictionary definition of prime numbers gives us a clear idea about prime number finding algorithm; ‘–noun Mathematics.
a positive integer that is not divisible without remainder by any integer except itself and 1, with 1 often excluded: The integers 2, 3, 5, and 7 are prime numbers. ‘ according to http://dictionary.reference.com/browse/prime+number.
hence the most primitive algorithm is dividing the number by all the positive numbers between 2 and the number-1 and waiting for non-zero reminders from all.
however it is not really the most practical way.
first of all it is obvious that any numbers except for 2 cannot be divided without reminder by its decrement. this way of thinking leads us to question the range of the numbers to be used as divider. logically the limit should be between 2 and square root of the number. the explanation is simple; lets assume that the number is not prime and we divide it with a number bigger than its square root, in that case the result will be a number between 2 and the number’s square root, which we should have already tested (since we start the division test in order starting from two).
secondly the test should be done only with prime numbers, not all. the reason is quite simple, if X divides Y then all the multipliers of X also divides Y. think about dividing 600 by 30; 600 can also be divided by 2, 3 and 5 which are the multipliers of 30.
lastly there is no point of testing even numbers as they already can be divided by 2. by doing that, for finding prime number in a range, we also prevent the division by 2 tests.
here is the python code for finding prime numbers in the range of 2 to 1′000′000, takes aroun 5.4 seconds on my laptop.
-
from math import sqrt
-
import time
-
t1 = time.time()
-
primenumbers=[2,3,5]
-
-
for i in range(7,1000000,2):
-
sq = sqrt(i)+1
-
for j in primenumbers:
-
if sq < j:
-
primenumbers.append(i)
-
#print i, len(primenumbers)
-
break
-
-
if i % j == 0:
-
break
-
-
print len(primenumbers)
-
t2 = time.time()
-
print (t2 – t1)
Game Balyoz
Dec/090
svn checkout http://balyoz.googlecode.com/svn/trunk/ balyoz-read-only
http://code.google.com/p/balyoz/
1.XML Based Definition
<?xml version="1.0" ?> <weapons> <weapon> <name>bazooka</name> <mesh>cube.mesh</mesh> <reloadtime>500</reloadtime> <numofbullets> <capacity>1000</capacity> <initial>1</initial> <maximum>9</maximum> <minimum>1</minimum> <anglebetweenbullets>18</anglebetweenbullets> </numofbullets><bullet> <initialspeed>10</initialspeed> <maximumspeed>-100</maximumspeed> <power>100</power> <radius>10</radius> <effect>linear</effect> <lifetime>4.5</lifetime> <particles>Examples/Smoke</particles> <explosion>explosion</explosion> <controller>dummy</controller> </bullet> </weapon> </weapons> |
2.Controller Design


Ray Tracing on Cell by Using KD-Tree Acceleration Structure – PDF
Nov/090
It’s being a long time since promising to publish my dissertation thesis with the title of Ray Tracing on Cell by Using KD-Tree Acceleration Structure
At last I found it and decided to publish! Unfortunately html version is a bit crappy and it’s a pain in the neck to make it tidy and nice. You can also check the PDF version of my MSc thesis out.
Mind Confusing Interview Questions
Nov/092
Not everyone but quite a lot of people get annoyed with frustrating interview questions, especially the ones working in gaming industry. Here are some of the questions you may also get frustrated.
1) As far as a I heard from a friend of mine this question was asked during a google phone interview;
In a virtual country all the families want to have one and only one male baby (boy) and they really insist of having that baby; they contiue giving a birth until they get the boy and they stop having baby after that.
To give an example, let sey there are only 3 families; Family A,B,C
Family A gets the boy after 4 births
G G G B
and Family B gets the boy in the first place
B
and Family C gets it after 10 tries
G G G G G G G G G B
as a result, the number of girls is 12 and boys is 3.
What would you expect the percent of the population of boys to be in this country?
The result is not that striking: 0.5
CUDA and Ray tracing; are they really a good combination? Or PS3?
Jun/090
Previous week I mailed to people who worked on ray tracing with CUDA architecture and I received some disheartening mails. I would like to share these mails since you may need to understand reasons for the conflict between NVidia’s CUDA 1.1 architecture and ray tracing.
I sent this mail to “the secret matador” whom I found in NVidia’s CUDA forums;
> umut has sent you this email from
> http://forums.nvidia.com/index.php.
>
>
> Hello
> I am studying my master degree on computer games
> tech and I’m planning to choose ray tracing on cuda
> processors in dynamic scenes for my master thesis,
> however, as i read from your post itz really not a
> good idea. Is it? I am also concerned about the
> number of replies to your post (no reply as i see).
> I would like to ask you if it is a good idea to do
> my master thesis on this topic or would you like to
> suggest some different topics about CUDA?
> I would really appreciate your answer.
> Thanks
>
> UMUT
> umuter { @t } gm@il.com
>
Well, I was trying to perform raytracing with CUDA
also some time ago. My scenes contained triangles. If
you want to use CUDA for just spheres I think can be
as fast as Cell ( see
http://eric_rollins.home.mindspring.com/ray/cuda.html
)
Well, for xNormal I tryed CUDA with triangle
scenes(1-10M triangles) and I wasn’t able to get it
working at decent speed. Perhaps if you use smaller
scenes could be ok… but was terrible slow for me…
I think my scenes were too big to fit well in the
shared/const memory. I used BVHs, grids and stackless
kdtrees without any good result. A dual-core was
always faster. Btw, the CUDA stackless approach sux a
lot! CUDA is very good processing linear data(like an
image kernel) but terrible processing random and
recursive structures like the ones that requires
raytracing.
Another problem I found was that there is no CUDA x64
version, neither is Vista compatible and the VRAM on
desktop G80 cards was too small for me(256-512Mb is
the typical and my scenes were occuping almost 1Gb).
The CUDA 1.1 does not support 3D textures so is a pain
to get the uniform grids working…
Well, I hope you could improve my results… CUDA was
not very good for me. I hope they add a function
stack, better syncronization mechanism and improved
registers/instruction count for CUDA 2.
After getting this mail I asked a similar question to Eric Rollins who is most probably one of the first people trying to implement ray tracing algorithm on CUDA. And I also asked if PS3 or CUDA? which one should I choose and the answer is pretty clear; PS3. Eric Rollin’s complete answer:
Umut;
I think you are correct about the difficulties with ray tracing on CUDA. It is easier on the Cell/PS3, though still challenging. See papers linked on my PS3 ray tracing page, and MIT blue steel: http://cag.csail.mit.edu/ps3/blue-steel.shtml
In my code for CUDA and Cell/PS3 the only primitive I have implemented is the sphere.
I assume you have already seen this paper on alternative approaches: http://graphics.stanford.edu/papers/i3dkdtree/gpu-kd-i3d.pdf . Also some of the latency avoidance techniques discussed for Cell/PS3 might apply to CUDA. I do recommend reading the papers for Cell/PS3 even if you want to try CUDA.
If you have a big insight on how to do CUDA, go with it. Otherwise I recommend Cell/PS3.
Good luck.
My Masters DissertationProposal; Raytracing on CUDA
Jun/090
The document below is prepared as for an MSc dissertation proposal, actual dissertation post is here and the direct link to the dissertation pdf is here and the html version is here.
UNIVERSITY
of
ABERTAY DUNDEE
School of Computing & Creative Technologies
May 2008
By
Umut Riza ERTURK
0703851
Introduction
Introduction
High Concept
The aim of this project is to develop a real-time ray tracing system on Compute Unified Device Architecture (CUDA). The main idea is to create a generic and portable real-time ray tracer involving the realistic lightning factors such as soft shadows (penumbra), reflection and refraction with an acceptable speed of 30 frames per second (fps) for human eye by using KD-Trees on a fully parallel processing architecture; CUDA.
Project Overview
One of the most important pushing forces of the computer graphics technologies is obviously computer games which are always getting more realistic. As the games are getting developed in terms of visual realism they need more realistic visual attributes such as soft and realistic volume shadows, multi-pass lightning and complex shaders. To achieve this level realism they are not only getting more resource intensive but also getting more complicated in terms of development process with the currently most common rendering method; rasterization [03]. The main problem of this classic rendering approach is all the objects in a 3D scene are compound of triangles and all these triangles have to pass through from main processing unit to graphics processing unit one by one. In the rasterization pipeline all these triangles needs to be analyzed, coloured, lighted, textured, culled and as a result became a pixel (CDR-INF, 2007). This approach does not only give rise to a linear slow down with the increasing number of triangles in a game scene but also increases the complexity of the implementation of the effects to get a realistic result since each effect needs to be implemented with shader phases.
Despite the unrealistic results and complex implementation of the effects, rasterization considered as conventional rendering method for the computer games because of the low computation cost. However the microprocessor technology as well as the parallel-computing technologies significantly improved for the last few decades which provoked to being focused on the other alternative way for rendering; ray tracing which is a completely different technique to achieve nearly perfect visual realism. The high quality visual characteristic of ray tracing is based on the physically correct simulation of the real light behaviour. Basically ray tracing algorithm starts with sending at least one ray for each pixel from the camera to the 3D world space. If the ray intersects with an object according to the material of the object, algorithm creates the reflected-refracted rays from the object and recursively continues its process. Because of its physically correct implementation it does not struggle to create the realistic effects such as global illumination or shadowing. However this method computationally expensive due to two major problems; creating the rays and finding the intersection point of the ray and any object in the 3D space. For instance a scene with 1K triangles and with 1024×768 pixels needs at least 786432 eye rays to be created and according to the organisation method, may require millions of triangle-ray intersection tests; moreover these rays will interact with object surfaces.
With the help of the significantly improving hardware and software technologies these problems are getting easier to tackle. For the object-ray intersection problem one of the most efficient algorithms involves KD-tree structure. KD-tree is a kind of binary space partitioning (BSP) tree which is one of the most preferred data structure using for interactive scenes (Shevtsov Maxim, 2007). Briefly KD-tree is a multidimensional search tree for points in k dimensional space (NIST, 2005) which enables to find an object in the space with the complexity of log (n). In addition to this algorithmic improvement against the object-ray intersection problem, another improvement for creating rays is the parallel computing. One of the most recent parallel computing architecture is NVidia’s CUDA hardware solution whom scales up to one hundred cores and one thousand threads. Apart from the massively multi parallel advantage other advantage is its programming language which is C.
Ray tracing is one of the most promising photo realistic rendering technique for computer games according to the researches have been making on as well as according to the improvements in hardware technologies. On the other hand CUDA is a very new and promising architecture with its suitable structure for ray-tracing and there are no satisfactory researches have been done about this system. This project will investigate techniques and possible programming improvements of ray tracing on massively multi-thread architecture named CUDA.
About this Proposal Document
This proposal aims to investigate, develop and try to improve a specific real time ray tracing structure named KD-tree for a specific multi thread programming architecture called CUDA. A review of the KD-Tree structure as well as the CUDA hardware can be found in the ‘Literature Review’ section.
Following on from that research, a method for implementing the discussed techniques is presented in the ‘Method Design’ chapter. This general design is followed by a week-by-week plan detailing a schedule for implementing the proposed project, in the ‘Project Plan’ section
Literature Review
This chapter aims to summarise the certain techniques, data structures and hardwares encapsulated by the scope of this project. A brief overview of each topic is given, followed by a discussion of the key areas in each field which are applicable to this project.
Ray Tracing
Ray tracing is a rendering method to create photo realistic 2D views from 3D space by simulating the real light behaviour. It is one of the first solutions for rendering that’s why it is also one of the most researched rendering techniques up to now, however because of the complexity and amount of the calculations, mostly used for offline rendering. Contrary to its complexity the idea is very simple. Before understanding ray tracing it is essential to understand the real light behaviour since ray tracing is just a simulation of it.
What is light and colour?
Light is an electromagnetic wave in a certain range of frequency which determines the colour of the light. Apart from the wave explanation of the light the other explanation is the particle explanation; light is the compound of small packages named photons which are travelling in a certain direction with the speed of light. When a photon collides with a surface three possible scenarios happen according to the physical attributes of the surface. If the surface is reflective then the photon reflects respect o the normal of the hit point. After the reflection, photon changes its colour according to the colour of the surface. There are two kinds of reflection; ’specular reflection’ which happens on regular surfaces and ‘diffuse reflection’ which happens on irregular surfaces.
|
|
Figure 2 – Diffuse Reflection of Light |
Other scenario is the photon passes through the objects it is collided which is called refraction. After it enters into the object, it changes its direction according to the density of the surface and also according to the normal of the hit point.
|
|
The last scenario is it gets absorbed by the surface and neither can reflect nor refract, this happens only on real dark surfaces.
The reflection and the refraction can also happen at the same time in this case it can be considered as more than one photon has been created from one photon.
|
|
Basically ray tracing is the simulation of these rules in a virtual 3D world and its objective is to convert the 3D space to 2D image by determining the colour of each 2D image cell (pixel) in the 2D world. The main difference of ray tracing from the real world is, in real world all the photons are coming to our eyes from light sources directly or after reflections-refractions, however ray tracing algorithm does the reverse; sends the light rays (photons) from each of the pixels in the 2D world (screen) to the 3D world (e.g. game scene). The reason for sending rays from the screen to the scene is only the rays reaching to the screen will affect the final colour value of the pixel. That also means the photons reaching to the back face of the sphere (on Figure-5) and reflecting through an irrelevant direction will not affect the final 2D image therefore calculations for this photon are unnecessary.
|
Figure 5 – Basic ray tracing concept; rays from the screen to scene |
Figure 6 – The dashed lines represents the wasted photons which will never reach to the screen |
![]() |
Figure 7 – Adaptive super sampling, sending 5 rays for each pixel for the begging, increases the number of rays if the colour difference is big between these 5 rays |
Specification of the ray tracing is at least one ray needs to be sent from each pixel however in most cases one ray is not enough since the projection of one pixel may correspond to a large area in 3D world. On the other hand sending a constant number of rays (e.g. 10 rays per pixel) from each pixel, which is called supersampling, is not an efficient solution since it increases the ray tracing time linearly. Determination the number of pixels to be sent from each pixel is one of the problems of ray tracing which plays an important role on the quality issues of the image such as anti-aliasing. One of the efficient solutions for this performance-quality paradigm is adaptive supersampling. This method sends a small amount of rays from a pixel for the beginning. If the result colour values of these rays are slightly different from each other, it increases the number of rays for the current pixel and sends new rays according to the difference between rays’ colours and at the end mixes the colours of these rays to calculate the actual pixel colour (Glassner Andrew S., 1989).
After creating the rays and sending them through the 3D scene, the colliding objects with the photon needs to be found. In addition to finding the object, exact location in object space and the normal of this location needs to be found. This is one of the most challenging problems of ray-tracing and many algorithms and data structures have been discussed and implemented up to now. The most known two methods for quick finding of the hit point are bounding volume hierarchies (BVH) and KD-Trees approach. The detailed information about the KD-Trees will be discussed in the next section but BVH method will not be discussed in this paper. Although the significant algorithmic improvements about this problem, ray tracers still spend their 75% to 95% of their processing time on this problem quotes James Foley (Foley James D., 1990).
The next step after creating the primary rays and finding the intersection points is firstly looking for the hit point’s position respect to the light sources. If there is no object between the hit point and the light source then the colour of the light ray will depend on directly the light source and more other calculations.
|
|
After taking light resources and shadows into account, next point is reflection or/and refracting the ray from the intersection point. The key input coming from the game scene to find the reflection and refraction rays is the normal of the point. A 3D scene usually compounds of geometric primitives such as spheres, cubes, planes, triangles etc. The normal calculation of a point depends on the primitive the ray intersects with. To give an example ray sphere intersection problem can easily be solved with these algebraic equations;

(Owen G. Scott, 1999)
The normal allows the reflection and refraction rays to be found;
Solution for the reflection ray Rl is;

Solution for the refraction ray Rr is;
|
Figure 9 – (a) reflection, (b) refraction |
After finding the reflection and refraction rays the algorithms continues recursively as it is shown in the pseudo code below;
|
Figure 10 – demostration of a ray path with reflections and refractions |
As a result the image quality of this rendering technique is obviously much realistic compare to the classic rasterization method since it is a real implementation of the light behaviour. However the calculation cost for each step is still expensive to put this algorithm into current games with the current hardwares. Fortunately there are satisfactory improvements for methods as well as for hardwares. In the next section one suitable technique and a specific hardware will be introduced.
KD-Trees
KD-tree is a special kind of binary tree data structure for organising points in k-dimensional space (since the graphics applications run on three dimensional spaces the kd-tree on this paper will be representing ‘three dimensional kd-tree’) which provides multidimensional search [05]. The construction cost for the KD-tree is logarithmic respect to the number of primitives in the scene and the average cost of traversing a voxel can be estimated (Shevtsov Maxim, 2007). Another important point about kd-trees is different from octree, which divide the 3d space into constant number of parts, kd-tree divides the space into unfixed parts with planes perpendicular to one of the coordinate system axes which also differs KD-trees from the conventional BSP trees.
The use of kd-trees for ray-tracing algorithm is finding the intersecting objects in the 3D-world space with the ray sent from the camera through the scene as quick as possible.
Since there are different kinds of KD-trees according to the optimisations, roughly the data structure for a node contains two child pointers (just like binary tree), a name and a key (usually a pair of floating points) which represent the dimensions of the rectangle.
The creation of a KD-tree briefly can be described by the following steps;
|
|
Figure 11 – Steps for constructing a kd-tree, taken from the lecture presentations of Sharat Chandran, 2002 |
The major problem of constructing a kd-tree is determination of the division planes since kd-trees are not fixed divided.
As shown in figure 12, dividing a scene into two parts can be done in several ways. The problem is finding more balanced division to increase the efficiency of the kd-tree as well as the ray tracer. The optimisation is based on a simple rule; large areas with few objects or small areas with many objects work faster (Martin E., 2006).
|
|
|
|
|
Figure 12 – Different division methods, taken from lecture slides of Martin Eisemann
The cost optimisation formula according to the rule mentioned above is;
![]() |
Equation 1 – cost equation to split the space with higher efficiency, taken from lecture slides of Martin Eismann
As a result, the cost predictable structure of KD-trees is a very big plus for ray tracing with multi threads since the jobs can be equally distributed between threads to maximise the efficiency of the multi processing. On the other hand, KD-trees are considered as the best solution for static scenes however with the increasing interest on ray tracing algorithms, several researches have been done to use KD-trees for dynamic scenes which makes it challenging.
CUDA
For the last few years GPU devices have reached to significant computational power (figure-13) compare to CPUs. The main reason is; as the graphics operations are highly intensive and requires parallel computation, GPUs evolved in a different way from CPUs; they devotes more transistors for data processing compare to caching and data controlling (NVidia, 2007).
|
|
Figure 14 – Difference between current GPUs and CPUs, GPUs devotes more transistors for processing (figures taken from (NVidia, 2007)) |
With the extremely increasing computational power of GPUs, not only graphics applications but also many other applications have been tried to be run on GPUs. CUDA has developed as hardware-software architecture to respond these needs by providing a significantly parallel and simple computation structure.
|
|
Design goals of CUDA are described by its founder NVidia ;
|
Figure 16 – (NVidia, 2007)
In addition to its parallel computing power, other powerful side of CUDA is its C language which allows more understandable and clean programmes. Other important differences coming with CUDA is, contrary to other GPU programming languages such as CG or GLSL, CUDA can write arbitrary memory addresses and CUDA exposes a shared memory area of 16KB which is very fast.
![]() |
Briefly, main reason for choosing this new generation hardware-software architecture is it provides great parallelisation opportunities which are very suitable for ray-tracing by using KD-trees. However CUDA’s stackless architecture is a big problem for recursive algorithms. This problem is the challenging point of the ray-tracing/CUDA combination. On the other hand recent researches showed that it is not impossible, in addition CUDA is a very new and promising system for the future applications.
Method Design
High-Level Outcomes
The aims of this project are:
-
To implement a homogeneous (by using both the CPU as well as the GPU) real time ray-tracer on a specific system called CUDA by using a specific data structure named KD-trees.
-
To achieve an acceptable frame rate for human eye (~20fps) and visual quality with the resolution of 800×600 in a one million-triangels scene by using a G80 GPU processor.
Programming the CPU
Constructing and updating the KD-tree data structure
According to the ray tracing algorithm there are four main processes need to be done. First process is the construction of the KD-tree according to the object locations in the scene. For a real time ray tracer not only constructing the data structure but also keeping the data structure is very important. The first major problem to tackle is using the KD-tree for dynamic scenes since its structure is more suitable for static scenes. However thankfully there are some other papers have already been published and it is proved that by using some additional data it is possible to port kd-trees to dynamic scenes (e.g. [05]). Apart from keeping the data structure updated, the 3D object data needs to be provided by the CPU to the CUDA threads.
The other three main processes (creating the rays and finding the collision point of the object and calculating the new rays reflected or refracted from the hit point) will be done by the CUDA threads.
Programming the CUDA
Ray operations
While keeping the KD-tree data structure update and sending this structure to the shared memory area of the CUDA threads, these threads will be creating the rays by using adaptive super sampling method from the screen through the scene therefore each thread will be responsible for different number of rays. This operation is not complex but heavy because of the high number of rays. The first improvement of using an extremely parallel architecture is that hundreds of these rays will be created at the same time. After creating these rays each thread will search the colliding objects in the KD-tree structure and will find the normal of the hit point according to the type of the primitive since in this project sphere and triangle primitives will be implemented. Before calculating the normals, threads will finding out if is there any direct connection between the hit point and any of the light sources. After finding if the hit point is in the shadow region or in the bright side, the colour code of this hit point will be recorded to the corresponding pixel data. Next step is finding the reflection and refraction rays. This step involves the recursion problem which is not supported by CUDA. This problem seems to be the biggest problem of the project; even though using an iterative way is not a real solution because of CUDA threads small (16KB) shared memories. To overcome with this problem there are three possible solutions; trying to limit the number of reflection and refractions (which will cause quality problems), using the global memory (which will cause performance problems) or synchronising the threads about using the shared memory (which will increase the complexity of the implementation). As a result with these reflected and refracted rays the final colour value of the pixel will be determined by using a linear blending function.
To sum up, implementation of real time ray tracing by using KD-tree data structure on CUDA architecture has many problems to overcome with. On the other hand, magnificently improving GPU technologies opens new doors to tackle these problems either by increasing its data process speed or by increasing the memory needed for these processes (i.e. it is expecting that CUDA 2.0 will involve recursion). Because of these achievements on hardware side, the interests about ray-tracing increases day by day and as a result of these achievements new methods and new scientific researches getting published every day. This project will be hopefully one of these researches.
Evaluation
Evaluation Criteria
There are two basic criteria to evaluate the result of the project:
-
Visual quality (qualitative)
-
Performance (quantitative)
Qualitative
According to the project aims, the most important qualitative requirement is the visual quality of the final image. Such that, even if the visual quality of the final image is poor the speed won’t be an important since this can be easily done by reducing the number of rays sent from screen. However the quality of the image is subjective therefore the satisfaction of this criterion will be human dependent. To reduce the subjective view; one of the most common scene (named Cornell box) using in ray tracing tests will be used to test the visual quality.
|
|
Quantitative
Since the aimed fps and the hardware as well as the software described clearly in the aims of this project, evaluating the quantitative criteria will be very basic. As soon as the getting visually satisfactory results from Cornell Box, 20 fps will be enough for the project to achieve its goal under mentioned hardware and software implementations
Project Plan
| June 2008 | ||
| 1st – 31st | Learning CUDA architecture, reviewing NVidia documents, implementation of some algorithms on CUDA to have an experience on CUDA.Revising parallel computation. | |
| July 2008 | ||
| 1st – 31st | Implementation of KD-tree structure, searching for improvements for KD-tree structure on dynamic scenes | |
| August 2008 | ||
| 1st – 31st | Constructing primitives’ data structure and working on calculation of reflected-refracted rays. | |
| September 2008 | ||
| 1st w. -2nd w. | Implementation of the reflection and refraction rays on CUDA | |
| 3rd w. | Incorporating of super sampling method on CUDA | |
| 4th w. | Progress presentation | |
| October 2008 | ||
| 1st w. | Preparing documentation of the progress | |
| 2nd w. – 3rd w. | Integrating the CPU-side operations with CUDA-side operations. Getting first image results | |
| 4th w. | Submit 1st Draft | |
| November 2008 | ||
| 1st w. | Maintaining of the program | |
| 2nd w. – 3rd w. – 4th w. | Preparing final draft and reviewing final draft feedback. | |
| December 2008 | ||
| 1st-31st | Incorporate any additional changes based on final draft feedback | |
| January 2008 | ||
| Submit Dissertation | ||
References
- Schmittler J., Pohl D., Dahmen T., Vogelgesang C., and Slusallek P., 2005. Realtime Ray Tracing for Current and Future Games, ACM Portal, [accessed 10th May 2008], url: http://portal.acm.org/citation.cfm?id=1198555.1198762
- Schmittler J., Pohl D., Dahmen T., Vogelgesang C., and Slusallek P., 2004, Realtime Ray Tracing of Dynamic Scenes on an FPGA Chip, ACM Portal, [accessed 10th May 2008], url: http://portal.acm.org/ft_gateway.cfm?id=1058143&type=pdf&coll=GUIDE&dl=GUIDE&CFID=68260220&CFTOKEN=96881082
- Friedrich H., Gunther J., Dietrich A., Scherbaum M., Seidel Hans-Peter, Slusallek P., 2006, Exploring the Use of Ray Tracing for Future Games, ACM Portal, [accessed 10th May 2008], url: http://portal.acm.org/ft_gateway.cfm?id=1183323&type=pdf&coll=GUIDE&dl=GUIDE&CFID=68260623&CFTOKEN=18219301
- Woop Sven, Schmittler Jorg, Slusallek Philipp, 2005. RPU: a programmable ray processing unit for realtime ray tracing, ACM Portal, [accessed 10th May 2008], url: http://portal.acm.org/ft_gateway.cfm?id=1073211&type=pdf&coll=GUIDE&dl=GUIDE&CFID=68261058&CFTOKEN=632516362
- Zhou Kun, Hou Qiming, Wang Rui, Guo Baining,2008. Real-Time KD-Tree Construction on Graphics Hardware, Microsoft Research, [accessed 10th May 2008], url: ftp://ftp.research.microsoft.com/pub/tr/TR-2008-52.pdf
- Slusallek Philipp, Shirley Peter, Mark Bill, Stoll Gordon, Wald Ingo, 2005. Introduction to real time ray tracing – course 41, ACM Portal, http://portal.acm.org/ft_gateway.cfm?id=1183323&type=pdf&coll=GUIDE&dl=GUIDE&CFID=68262552&CFTOKEN=78451416
- Shevtsov Maxim, Soupikov Alexei, Kapustin Alexander, 2007. Highly Parallel Fast KD-tree Construction for Interactive Ray Tracing of Dynamic Scenes, Intel corporation, [accessed 10th May 2008], url: http://www.google.co.uk/url?sa=t&ct=res&cd=1&url=http%3A%2F%2Fkesen.huang.googlepages.com%2FIntel-EG07.pdf&ei=GFUsSMazL5O-0QST1JSOBQ&usg=AFQjCNHctNbBVxaKwpIZ7f71SQlBbvXobQ&sig2=iFf0gqjzcfxGAIBflFwgOg
- NVidia, 2007. NVIDIA CUDA Programming Guide, NVidia, [accessed 10th May 2008], url: http://www.google.co.uk/url?sa=t&ct=res&cd=1&url=http%3A%2F%2Fdeveloper.download.nvidia.com%2Fcompute%2Fcuda%2F1_0%2FNVIDIA_CUDA_Programming_Guide_1.0.pdf&ei=pVYsSJGFEKa8QLXogasF&usg=AFQjCNHrbSl3bDFxVTvtHfjJ8RKpjlgxzg&sig2=Df-Ib4yBF9BFNqi_dwWAeg
- Rademacher Paul, Ray Tracing: Graphics for the Masses, The University of North Carolina, [accessed 10th May 2008], url: http://www.cs.unc.edu/~rademach/xroads-RT/RTarticle.html
- Glassner Andrew S., 1989. An Introduction to ray tracing, Morgan Kaufmann
- Foley James D., Dam Andries van, Feiner Steven K., Hughes John F., 1990. Foley, James D. Computer Graphics : Principles and Practice, USA: Adisson Wesley
- Owen G. Scott, 1999. Siggraph Education Materials(online) , [accessed 10th May 2008], url: http://www.siggraph.org/education/materials/HyperGraph/raytrace/rtinter1.htm
- Bikker Jacco, 2005. DevMaster, Rayt`racing: Theory & Implementation Part 7, Kd-Trees and More Speed,. [accessed 10th May 2008], url: http://www.devmaster.net/articles/raytracing_series/part7.php
- Chandran Sharat, 2002. University of Maryland web site Data Structures lecture notes, Introduction to kd-trees, [accessed 10th May 2008], url: http://www.cs.umd.edu/class/spring2002/cmsc420-0401/pbasic.pdf
- Martin Eisemann, 2006. University of Carolo-Wilhelmina, Computer Graphics- kD-Tree and Optimizations for Ray Tracing, [accessed 10th May 2008], url: http://graphics.tu-bs.de/teaching/lectures/ws0607/CG1/slides/07-kD-Tree.pdf
- Marlon John, LaMothe Andre, 2003.
Focus on Photon Mapping, Ohio:Premier Press.
- CDR-INF, 2007. Real Time Ray-Tracing May Replace GPU Rasterization, [accessed 10th May 2008], url:
http://www.cdrinfo.com/Sections/News/Details.aspx?NewsId=21608
- NIST (National Institute of Standards and Technology), 2005. K-D tree data structure, [accessed 10th May 2008], url:
http://www.nist.gov/dads/HTML/kdtree.html
Short Term Decision Making with Fuzzy Logic And Long Term Decision Making with Neural Networks In Real-Time Strategy Games
Jun/090
Abstract
Real-time strategy is one of the most important game genre from the beginning of the computer games history and one of the biggest inner market in computer games market, in spite of that fact there are only a few strategy games have successful AIs. This paper will try to improve real-time strategy games’ AI by improving long term decisions such as; resource management, determining correct buildings to build, achieving correct upgrades/technologies with neural networks as well as to short term decisions such as; creating correct troops against opponents’ troops, determination of attacking or fleeing, attacking to correct enemy troops. Also this paper will try to give examples from strategy games such as Command and Conquer: Tiberium Wars, Age of Empires.
Key Words: Artificial intelligence in real-time strategy games, fuzzy logic, neural networks, short term decision making in strategy games, long term decision making in strategy games.
1. Definition of Fuzzy Logic
The best definition of fuzzy logic is given by its inventor Lotfi Zadeh; “Fuzzy logic means of representing problems to computers in a way akin to the way human solve them and the essence of fuzzy logic is that everything is a matter of degree.”
The meaning of solving problems with computers akin to the way human solve can easily be explained with a simple example from a basketball game; if a player wants to guard another player firstly he should consider how tall he is and how his playing skills are. Simply if the player that he wants to guard is tall and plays very slow relative to him then he will use his instinct to determine to consider if he should guard that player as there is an uncertainty for him. In this example the important point is the properties are relative to the player and there is a degree for the height and playing skill for the rival player. Fuzzy logic provides a deterministic way for this uncertain situation.
There are some steps to process the fuzzy logic (Figure-1). These steps are; firstly fuzzification where crisp inputs get converted to fuzzy inputs secondly these inputs get processed with fuzzy rules to create fuzzy output and lastly defuzzification which results with degree of result as in fuzzy logic there can be more than one result with different degrees.
|
|
| Figure 1 – Fuzzy Process Steps (David M. Bourg P.192) |
To exemplify the fuzzy procee steps, the previous basketball game situation could be used. As mentioned in the example the rival player is tall with 1.87 meters which is quite tall relative to our player and can dripple with 3 m/s which is slow relative to our player. Addition to these data some rules are needed to consider which are called fuzzy rules such as;
- if player is short but not fast then guard,
- if player is fast but not short then don’t guard
- If player is tall then don’t guard
- If player is average tall and average fast guard
Figure 2 – how tall |
Figure 3- how fast |
According to the rules and the input data an output will be created by fuzzy system such as; the degree for guard is 0.7, degree for sometimes guard is 0.4 and never guard is 0.2.
|
Figure 4-output fuzzy sets |
On the last step, defuzzication, is using for creating a crisp output which is a number which may determine the energy that we should use to guard the player during game. The centre of mass is a common method to create the output. On this phase the weights to calculate the mean point is totally depends on the implementation. On this application it is considered to give high weight to guard or not guard but low weight given to sometimes guard. (David M. Bourg, 2004)
|
Figure 5- fuzzy output (David M. Bourg P.204) |
Output = [0.7 * (-10) + 0.4 * 1 + 0.2 * 10] / (0.7 + 0.4 + 0.2) ≈ -3.5
As a result fuzzy logic is using under uncertainty to make a decision and to find out the degree of decision. The problem of fuzzy logic is as the number of inputs increase the number of rules increase exponential.
2. Definition of Artificial Neural Networks
Although there is no agreed definition the most common definition is “it involves a network of simple processing elements (neurons), which can exhibit complex global behaviour, determined by the connections between the processing elements and element parameters” (Wikipedia, Artificial neural network).
The structure of ANN (Artificial Neural Networks) is same with human’s neural network’s structure as in human neural system there are receptors to collect the data named dendrites, a processing unit named cell body and an output unit named axon. Similar to this structure in ANN there is an input layer, a processing unit which sums the inputs after multiplying them with weights and an output layer which gives a result for a neuron (Figure 6).
|
Figure 6- structure of a simple neuron (Mat Buckland, 2002, p.241) |
As seen in the figure the output will depend on the inputs and the weights as well. This is the main point of the ANN as if we can find the correct weights for the inputs than we can create a correct result for ever situation. On reverse order if we know have the inputs and if we know the results for every input pair than we can calculate the weights therefore after finding weights if we give input values to our neural network it can calculate the correct output for certain input pairs. This method which is called back propagation is a supervised learning. The problems of that method are; training sets for feeding the ANN may not have all the possibilities or these sets may contain novel data (Dr Emma Hart, 2005).
Although the logic sounds pretty easy for ANNs with supervised learning, there is a mathematical background which makes this logic work for computers. On this part of this paper no mathematical background will be explained since it is not the main goal of this paper. (To get more information about ANNs please check further reading)
|
Figure 7- structure of a neural network with one input layer, one hidden layer and one output layer (Mat Buckland, 2002, p.242) |
As seen in the figure above there is a hidden layer which is necessary to make complex calculations. A neural network without hidden layer can make logical calculations such as ‘AND’ and ‘OR’ but cannot calculate XOR operation. This means that for complex calculations the hidden layer is obviously necessary. The number of neurons in the hidden layer is up to implementation. Increasing number of hidden layers increases the precision and complexity of the calculations which results with accurate calculations with using more CPU time and more memory that is the trade off. To find the best number of hidden layers the only way is experimenting different values, however, there are different estimations for that problem such as square root of number of input layers multiply by number of output layer.
In brief, ANNs are using for complicated or imprecise data and also for recognising the patterns (Christos Stergiou, NEURAL NETWORKS). Recognition of patterns requires very complicated methods and data structures with classic methods; however ANNs are very simple structures which make the implementation easier as ANNs are self-organising structures as well. The cons of ANNs with back propagation supervised learning are; high consumption of resources, choosing correct propagation sets, choosing correct number of hidden layers and hard debugging.
3. Real Time Strategy Games (RTS Games)
A RTS game is a strategic game which runs in real time distinguish it from turn based strategy games. Although the high number of RTS games there are only a few scenarios are used for these games. The most popular scenario starts with a couple of productive units such as villagers, and a few resources. The player should use the villagers to collect resources and create new buildings or units who can be soldiers or villagers. The buildings usually enable new unit tapes. To give an example the player can create knights by using stable and to build a stable a barracks should have been built. On the other hand to upgrade the unit strength usually players should achieve new technologies which are usually much costly than unit production. As seen in this scenario usually in RTS games there is a hierarchical order which the player should follow to defeat the enemy.
Although there is a static hierarchical order for creation of units or achievement of technologies, always complex algorithms running for RTS games AI since the number of probabilities and parameters are very high for making decisions. Moreover the only problem in RTS games is not decision making but also path finding or this kind of AI algorithms but in this paper only decision making will be discussed.
In the history of RTS games, game AIs usually statically programmed like finite state machines for decision making. To give an example AOE-2 (Age of Empires II) has a rule based AI with priority queues which means nearly static development for NPC (non player character). If the player can find a way to defeat the NPC than the player will never lose again as the NPC cannot evolve itself and cannot find a way to overcome the bottlenecks of its management strategy. Besides these long term decision failures in AOE-2 consideration mechanism for choosing the correct the enemy to attack is not working at all as the troops always choose and attack an enemy until they die. This makes the game AI dummy since if the human player creates a trap with a wall such as builds a wall around an archer, all the knights try to attack this archer but they are not able to attack because of the wall but they lose lots of energy or time. Even if the AI programmer can find out this bug and tries to prevent this situation, he has to write lots of scripts as there is no rule to generalise in this situation. That makes the AI very complicated.
Thankfully in new generation games such as Command Conqueror Tiberium Wars, the troops behave reasonable as they can choose the correct troop to attack according to its health. To give an example all the troops choose the enemy with lowest health in their range but this approach still involves some problems. Moreover this game has obviously problem while making long term decisions because NPC obviously cheats by having lots of recourses and by producing them very fast. These approaches make strategy games very boring for game players as if the game player can defeat NPC for once then computer nearly has no chance to win again. That’s because of their static structure and behaviours.
To find alternative solutions for these problems in this paper the decisions will be divided into two parts according to their importance. If the decision is unit wise such as choosing the correct troop in the range to attack will be called as short term decision. On the other hand the decisions, like choosing correct technologies to achieve, which are not individual decision for a troop but about a certain group of all troops in the game or the decisions which will affect the destiny of the game for long term will be called long term decisions.
Not only the importance but also the complexity of the decisions is another reason for this differentiation. The short term decisions such as considering the correct troop to build is a less complex decision compare to resource management and it is only uncertain which means can be found out by analysing some other parameters. However the resource management cannot be done by analysing some parameters because of the huge amount of the parameters and complex relations between these parameters.
a. Short Term Decisions in Real Time Strategy Games
As mentioned before, short term decision means instantaneous decisions made by individual troops or the controlling mechanism such as; decision of attacking or fleeing or creating the correct troops according to immediate data. Although these kind of short term decisions are unpredictable, they can be made by deterministic calculations such as Fuzzy Logic which provides a high resolution for output with crisp input data. In the next parts of this paper, these short term decisions will be discussed in terms of their input data, output data and the way to make these decisions with the explanations.
The sections below based on a hypothetical RTS game scenario. In this hypothetical RTS game the units have the properties showed on the list below with explanations.
| Health | The health of the troop, in fuzzy set it will be normalized range between [0,100], 100 represents the healthiest and 0 is death. |
| Armour | Unlike some of the RTS games, in this example there is only one type of armour for the troop which reduces the attack power of enemy against the troop. |
| Attack Power | Attack power of troop against the enemy. |
| Attack Speed | Attack speed is the variable defines how many time can the troop attack to the enemy in a constant time interval |
| Movement Speed | Differs from attack speed, this is the parameter to define how fast the troop move or escape, it is constant for every type of the troops. |
And the parameters are found by some calculations using the variable above. The main reason for doing that is having more realistic information. To give an example, let’s assume there is two kinds of troops first one with attack power of 50 and attack speed of 1, the other troop has the attack power of 15 with the speed of 4. This means that while the first troop causes 50×1 = 50 units of damage in T seconds but the second troop causes 15×4 = 60 units of damage which means the second troop is stronger than second one.
| Relative Attack Power | ![]() |
OAS = Own Attack Speed
EAS = Enemy’s Attack Speed This variable normalized by multiplying with 50. The result of 50 means equality. |
| Relative Speed | ![]() |
This variable normalized by multiplying with 50. The result of 50 means equality. |
The usage of symbol adds the group behaviour for the decision making. To give an example, if the Team A has only one troop with good condition in terms of its health, on the other side Team B has five troops and all of them heavily injured but has the opportunity to destroy the enemy troop. On this condition if the consideration algorithm runs on only the troops one by one than all the Team B troops should escape from the clash but actually they shouldn’t since they can destroy the enemy.
Another example for better understanding can be given from the figure below. The troop with number 7 from the Square Team is attacking to the troop with number 2 from the Circle Team and the number 2 is being attacked by number 8 as well. On the other hand number 7 is being attacked by number 3. On this condition RAP (Relative Attack Power) of number 7 will be calculated by sum of powers of number 7 and number 8 divide by the attacking power of number 3. So the equation for RAP is;
|
Equation 1 – RAP Equation for Troop 7 |
|
Figure 8 – A clash example between two forces |
i. Attack or Flee
In a RTS game one of the most usual decisions to be made is the individual decision of a troop to continue attacking or start fleeing. In most of the RTS games like; warcraft, dune 2000, command conqueror as well as age of empires these kinds of decisions made by FSM (finite state machines). The main reason for these games to use finite state machines can be predict as reducing the CPU time for game AI as FSM is a static structure which needs minimum amount of resources but the problem of this structure is the number of states and unrealistic behaviour.
Apart from FSMs another approach can be the Fuzzy logic which provides realistic behaviour but consumes more CPU time but this is a trade off and acceptable due to the increasing CPU/GPU power.
The first stage to setup the fuzzy logic is determination of Fuzzy Input data Sets (FIS) and membership functions. However it’s better to define the Fuzzy Output Set (FOS) and its membership functions as we have only two FOSs; first one is Attack, second one is Flee (figure-9).
|
Figure 9 – Fuzzy output sets and member functions |
The FISs are Own Health, Enemy’s Health, Relative Attack Power (RAP), Relative Movement Speed (RMS). With the help of these variables the troop will show the group behaviour and make realistic decisions.
The other step is determination of membership function. On this paper only linear membership functions such as triangle and trapezoid used for easy understanding as well as easier calculation, however it is better to use logarithmic curves for RAP and RMS since they are the results of division of two forces and two speeds. For both of these data sets 50 represents the Equality.
|
Own Health |
Enemy’s Health |
RAP |
RMS |
|
Near dead |
Near dead |
Weak |
Slow |
|
Injured |
Injured |
Equal |
Equal |
|
Normal |
Normal |
Strong |
Fast |
Table 1 – Fuzzy Input Sets
And the third step is defining these FIS’s membership functions.
|
Figure 10 – Membership function for Own Health / Enemy’s Health which are normalized to 100 |
The figure above shows the membership function for Own Health as well as Enemy’s Health. As seen in figure, the health under 10 is definitely named like near dead and over 30 counts like Normal. The main reason for doing that is to prevent all the troops escape from the battle as all the troops should definitely fight if they are over 30 and they should try to escape if they are near dead.
The figure below shows the RAP membership function. In this MF (membership function), equality is 50 and definitely stronger means 1.2 times stronger than the troop which is equal to 60 and definitely weaker means 1.2 time weaker than the troop which is equal to 46.6.
|
Figure 11 – RAP Membership function |
|
Figure 12 – RMS Membership function |
On the other hand 1.1 times faster means definitely faster and 1.1 times slower means definitely slower. The tolerance is 10% to prevent the errors. If the tolerance is kept 20% than the flee consideration may not work as fifteen percent slower means no way to escape so it is better to keep the tolerance at 10%.
And the last part is determination of rules. This part is the most important and most open part to make mistakes. In addition, the numbers of all combinations are. However it is not necessary to create all the rules for every possibility. In our system the rules are; 3×3x3×3=81
|
INPUTS |
OUTPUT |
|||
|
Own Health |
Enemy’s Health |
RAP |
RMS |
Attack / Flee |
|
Not near dead |
All possibilities |
All possibilities |
All possibilities |
Attack |
|
near dead |
All possibilities |
All possibilities |
Slower |
Attack |
|
near dead |
injured |
equal |
Not faster |
Attack |
|
near dead |
near dead |
Weaker |
Not slower |
Flee |
|
near dead |
injured |
Weaker |
Not slower |
Flee |
|
near dead |
normal |
Weaker |
Not slower |
Flee |
|
near dead |
normal |
equal |
faster |
Flee |
|
near dead |
normal |
stronger |
faster |
Flee |
|
near dead |
injured |
equal |
faster |
Flee |
Table 2 – Fuzzy Conditions for Attack or Flee
If the troop is not heavily injured (not near dead) then it should continue attacking and shouldn’t escape. The problem is if it is near dead since all the flee possibilities needs this condition first. But even if the troop is near dead and slower than the enemy it should attack because there is no way to escape and the rest of the rules are easy to understand with the help of the table above.
And the very last part is testing the fuzzy system with different data. All the data should start with the near dead condition or injured for Own Health since for the condition of normal, troop will definitely attack.
One of the best examples is giving these values;
|
Own Health |
Enemy’s Health |
RAP |
RMS |
Result |
|
9.38(near dead) |
35.6(normal) |
56.8(stronger) |
53.1(faster) |
Attack |
|
9.38(near dead) |
36.9(normal) |
56.8(stronger) |
53.1(faster) |
Flee |
|
8.38(near dead) |
35.6(normal) |
56.8(stronger) |
53.1(faster) |
Flee |
Table 3 – Test decisions
Although the only difference is enemy’s health with the difference of 1.3 the result changes and seems to be a good decision since if the enemy is weaker than there is a possibility to kill because it is stronger. And the next turn if the troop got closer to death than will definitely escape showed at the last line of the table above.
|
Figure 13 – Decision is attack with the value of 51.1 |
|
Figure 14 - Decision is flee with the value of 49.6 |
ii. Choosing Correct Troop to Attack
Another short term decision for individual troops in the RTS games is choosing correct troop to attack. This decision also is a short term decision since it is only related with an individual troop also in this part of this essay; Fuzzy Logic will be used to answer this question. The main reason for using Fuzzy Logic is the problem involves uncertainty and the solution should be deterministic. To give an example during a clash between two enemy groups let’s assume that Team 1 has one troop and Team 2 has two troops and the troop in team 1 trying to determine which enemy troop to attack also let’s assume that for each troop number of properties are three such as; health attack, power as well as armour. In this situation this troop should choose it according to degree of weakness of the enemy and this weakness should be calculated with the help of these troops’ properties. This explanation seems best fit to Fuzzy Logic to be used for determination.
The FIS terms and the explanations for the hypothetical game scenario of this section are;
|
Own Effective Attack Power (OEAP) |
|
OAP = Own Attack Power
OAS = Own Attack Speed EAR = Enemy’s Armour EFH = Enemy’s Full Health |
|
Enemy’s Effective Attack Power (EEAP) |
|
EAP = Enemy’s Attack Power
EAS = Enemy’s Attack Speed OAR = Own Armour OFH = Own Full Health |
|
Enemy’s Health (EH) |
|
ECH = Enemy’s Current Health
EFH = Enemy’s Full Health This variable normalized between [0-100] where 100 is healthiest. |
These equations differ from the equations in `Attack or Flee` section of this essay since while determining the correct troop to attack should be done according to individual attack power not group attack power. The main reason for doing that is, if the group power is calculated instead of individual power against an enemy troop we cannot really choose a weak enemy for our attacking skills.
Another interesting point for this approach is this part doesn’t include Own Health parameter since this section is not determining our attacking or fleeing decision this means that the decision of the troop should be best fitting decision for the weakness degree not our own health.
After choosing FISs second part is creating the MF for these FISs.
|
|
|
|
|
|
Figure 15 – Membership functions for Choosing Correct Troop to Attack
Different from the ‘Attack or Flee’ section, there are four MFs where healthy MF is new. The mean reason of putting the healthy parameter is increasing the precise for enemy’s health MF since it is really important for the decision calculation.
The next step for creating our fuzzy logic is creating the output (FOS). The output membership graph consists of three possibilities as we need a precise output for this short term decision.
|
|
And the third step is creating rules. In this section, there are 36 possible combinations however it is reduced by using not operations as a result there are 15 combinations to implement, these are;
|
Inputs |
Output |
||
|
OEAP |
EH |
EEAP |
|
| strong | not healthy | All conditions | attack |
| strong | healthy | All conditions | neutral |
| not weak | near dead | not weak | attack |
| weak | near dead | weak | neutral |
| not weak | injured | All conditions | attack |
| weak | injured | All conditions | attack |
| normal | healthy | All conditions | neutral |
| weak | healthy | All conditions | don’t attack |
| normal | normal | strong | attack |
| normal | normal | not strong | neutral |
| weak | normal | All conditions | don’t attack |
| weak | not near dead | All conditions | neutral |
| not weak | All conditions | strong | attack |
| weak | All conditions | strong | neutral |
Table 4- Fuzzy rules for choosing correct troop to attack
The main goal of these rules is attacking to the weakest in terms of its health and our attack power against this troop as well as the strongest enemy in terms of its attack power. After creating out fuzzy system the troop can choose the weakest enemy after the defuzzication operation.
|
|
Figure 16- Example for which troop to attack decision
To give an example, let’s suppose that there are three troops, first and second are from Circle Team and the third one is from Square Team (Figure 16). And the variables for the troop 3 against 2 and one are;
|
Troop 1 |
Troop 2 |
||||
|
OEAP |
EH |
EEAP |
OEAP |
EH |
EEAP |
|
88 |
19 |
88 |
55 |
7 |
95 |
Under these conditions, the troop is powerful against the troop 1 and health of troop 1 is bad but better than troop 2, however, troops 2’s effective attack power is better than troop 1. So it is hard to determine since troop 2 is closer to dead but our attack power is not as strong as against troop1. But the fuzzy logic has an answer for both of the troops;
|
|
Troop 1 |
Troop 2 |
|
Output (result of defuzzication) |
66.7 |
67 |
Also the results of the fuzzy logic are very close to each other as expected but there is a numerical result that we can decide according to. So the troop will attack to troop 2 as the output is bigger than troop 1’s output.
|
Figure 17- Outputs for troop 1 |
|
Figure 18- Outputs for troop 2 |
b. Long Term Decisions in RTS Games
The easiest explanation for long term decisions in RTS games can be described as the decisions which affect all or a group of troops or buildings for a long time as well as can be considered as the management strategy of the game player. The best example for long term decisions is resource management during game play which means how many resource collectors (e.g. villagers, harvesters) should be created and ordered to collect certain types of resources. In addition to resource managements other examples for long term decisions are; determining correct buildings to build, achieving correct upgrades/technologies, choosing correct time to attack with full power etc.
Not only for RTS games but also for any kind of game long term decision making always a big problem as the number of possibilities is may not be a certain value. Even if the numbers of possibilities are certain then possibilities and parameters are too much to make an easy decision. Moreover the difference between a short term decision and a long term decision is simply unreasonable short term decisions can be fixed in a short time however long term decision cannot be fixed so easily.
For RTS games long term decisions are vital since the CCP (computer controlled player) should behave realistic, intelligent as well as shouldn’t to not to frustrate the human player. The most common method for making these kinds of decisions are again the finite state machines which cause nearly a static game play. Every time the game player experiences the same game for the same map. To give an example in AOE2, at the end of the fourth minute, the CCP sends a scout to explore the map and creates a static number of villagers, orders them in a static way such as; four villagers hunt deer, ten of them collect food from trees etc., at the end of the seventh minute CCP attacks with a constant number of troops. This causes very static game scenario since the human player finds a way to repel the CCP and after the human player cannot be defeated by the CCP because the CCP cannot create any new way. The game producers of these RTS games usually create an AI with a static behaviour by using scripting languages. Therefore to create a strong opponent against the human player they prefer using cheating methods such as gifting new troops as well as resources which frustrates the human player.
To create a realistic behaviour for CCPs, which means intelligent and learnable behaviour, some other methods should be used such as ANN since it provides learning for CCPs. In addition to this ability of ANNs, ANNs may also provide un-deterministic behaviour by integrating GA (Genetic Algorithm). However, this method consumes lots of CPU time and provides a very slow learning.
This section of this paper will recommend using ANN (artificial neural networks) for long term decision making by giving some basic ideas such as defining input layer parameters for ANN and the outputs as well. However, all the recommendations are hypothetical since none of the ideas are based on any calculations or experiment. For these hypothetical recommendations the famous RTS game AOE2 (Age of Empires 2) will be used as the RTS game environment.
i. ANN Structure and Hypothetic Feeding Mechanism of the ANN
As mentioned before on this paper, an ANN structure consists of three fundamental elements; inputs, weights and outputs. In our system inputs are the numerical values of a relevant system such as; for resource management system first input can be the recourse (e.g. gold, wood) the player has and the outputs are actions such as; create villager or send a villager to collect gold.
|
|
Figure 19 – An ANN Module
As mentioned before, the training method for the RTS AI is back propagation which is a supervised learning method and the weight calculations are not dynamic. This means the calculation of weights can be done after the game play.
To reduce the calculations and to separate behaviours, the module approach is preferred to be used. To give an example from our conceptual game AOE2, behaviour of the CCP during dark-age in terms of resource collection should be different from castle age. Instead of using the current age as an input value for the AI, creating different ANN modules can be another approach. This also provides parallel calculations for server-side such as calculation of dark- age could be made separately from the calculation of castle age by using threads. Another benefit of this approach is the success of each module can be send to server as training data if it fits the necessary criteria such if the player achieves castle age before computer this data can be send to the server even if the player losses the game. However, determination of the criteria should be search carefully to not to train the AI with novel data.
|
|
Figure 20 – Before games start all weights getting from AI server
To load the weights of the game AI, the client should connect to the AI server and should get the trained ANNs weights to provide a better and dynamic game play. This method can be another approach for reducing piracy in games like, if the player uses a pirate version of the game cannot download the last AI weights since to download the AI data server can request for the player ID and password first.
Also the game AI server should contain trained and tested data before release of the game AI since, no one wants to play with a dummy AI. So the game company should be well trained the game AI modules which can be a problem for the game development.
|
|
Figure 21 – At the end of each game all input and outputs sending to AI server for calculation of weights if the result of the game is successful
Briefly the steps of this system are;
For AI loading;
- Connect to AI server
- Check for ID, Password
- Send weights if successfully connected
- Create the local AI with new weights
- If cannot connect
- Use the previous ANN weights.
For data collection (collecting the training data);
- For every action
- Collect input variables of the relevant AI module
- Collect the action
- If the current ANN module reaches the criteria to be changed
- Change the ANN module
- If the ANN training data reaches the criteria, mark to be sent to server
- Else don’t mark and don’t discard the ANN training data set
- At the end of the game
- Send all ANN training data sets of the winner’s.
- Send all the marked ANN training data sets.
For Server Side Weight calculations;
- If there is new data,
- join all the data of relevant ANN modules including new data and old data
- i. Create threads for every ANN module
- ii. Calculate until reaching a threshold for every ANN module
- If there is no new data,
- Define threshold values lower than the old threshold values
- i. Create threads for every ANN module
- ii. Calculate until reaching a threshold for every ANN module
- Define threshold values lower than the old threshold values
- join all the data of relevant ANN modules including new data and old data
ii. Example ANN Structure – 1: Resource Management in Dark Age
To create the ANN structure first of all the input values should be determined. In a RTS game for resource management can be the number of resources, number of troops collecting the resources.
For our example game AOE2, the input values for dark-age can be chosen like;
1. Number of workers
2. Number of workers collecting food
3. Number of workers collecting wood
4. Number of workers collecting gold
5. Number of workers collecting stone
6. Amount of wood we have
7. Amount of wood we have
8. Amount of gold we have
9. Amount of stone we have
10. Race of the side (such as Aztecs, Goths, Turks etc.)
And the outputs of the ANN module are decisions such as;
1. Create villager
2. Send a villager from food to wood
3. Send a villager from food to gold
4. Send a villager from food to stone
5. Send a villager from wood to food
6. Send a villager from wood to gold
7. Send a villager from wood to stone
8. Send a villager from gold to food
9. Send a villager from gold to wood
10. Send a villager from gold to stone
11. Send a villager from stone to food
12. Send a villager from stone to wood
13. Send a villager from stone to gold
The data collection in the game should be triggered by the action such as, if the player choose a villager from stone collection and send him to wood collection than all the input data and should be collected after the worker starts working.
At the same time another module is working for achieving technologies and if the resources are enough to pass the next age than all the data should be recorded according to the algorithm mentioned before.
iii. Example ANN Structure – 2: Deciding the Correct Troops to Create
This part can be a either fuzzy logic or ANN since it seems like a spontaneous action, however the number of variable makes the fuzzy logic nearly impossible to implement this part and also actually this decision is not only spontaneous but also effects the game in long term as creating of Paladin is as expensive as achieving some technologies.
The input variables can be;
- Number of own archers
- Number of own knights
- Number of own swordsmen
- Number of own skirmishers
- Number of own special troops
- Number of own other troops
- Number of own defensive buildings
- Number of enemy’s archers
- Number of enemy’s knights
10. Number of enemy’s swordsmen
11. Number of enemy’s skirmishers
12. Number of enemy’s special troops
13. Number of enemy’s other troops
14. Number of enemy’s defensive buildings
15. Score of the enemy (this variable provided by the AOE2 game environment)
16. Own score
17. Own race
18. Enemy’s race
19. Amount of wood we have
20. Amount of wood we have
21. Amount of gold we have
22. Amount of stone we have
Outputs are decisions again;
- Create archer
- Create knight
- Create swordsman
- Create skirmisher
- Create special troop
As seen above, the numbers of inputs are pretty much since it is really hard to determine even for human instinct which troop to create that makes the game interesting and keeps players’ passions on the game. Therefore if this system really can be implemented and reaches a degree of success than game’s lifetime for players definitely will be longer as most of the strategy game players play strategy games for the game AI not only for graphics.
Conclusion
Artificial intelligence is the most important for strategy games and especially for real time strategy games since the computer controlled opponents should make reasonable decisions spontaneously. For most of the strategy games, game producers preferred the easy way; cheating in different ways such as giving new troops, increasing income rate providing from resources and other similar ways, however results of these methods never satisfied the game players. The most important reason for that is; when the human player finds a way to defeat the computer, by using the same patterns (tactics), the human player never lose again. Briefly human player can learn but the computer cannot. Actually they can but this method never tried in commercial strategy games. There are several reasons behind that; first of all the CPU intensity of the learning structure, secondly the success rate is not known as the learning method hardly used in games. Not only machine learning but also spontaneous decision making, in most of the strategy games, is poor. Actually these short term decisions can be made with fuzzy logic and in some of the new strategy games it seems to be used (Command Conqueror Tiberium Wars) for choosing correct troop to attack, however it is not definite since no explanation have been made about the game AI for this game.
At first sight the implementation of fuzzy logic could be seem like simple especially for the short term decision making, however, the exponentially increasing number of possibilities makes hard the fuzzy logic to be implemented. On the other hand, in this paper, the tested short term decisions seem working. Especially choosing correct troop in the range to attack seems to be a successful application of fuzzy logic according to test results but without a real time application it is not reasonable to judge as a successful application of fuzzy logic since CPU usage is not known.
The conceptual distributed server based ANN application hasn’t tried yet by any game. Besides this kind of distributed approach haven’t tried for any kind of game AI yet. The CPU usage problem of the ANN is a known weak point of this structure. Another weak point of ANNs is the requirement for huge number of feed data (training). This conceptual distributed system may be a key for solution of these problems. Additionally, this trained AI structures can have a commercial value since they can decrease the number of pirate usage as the strategy game players always want to play with a more intelligent opponent and most of them will want to be registered to the AI service provider this means preventing piracy.
As a result, the problems about short term decision making and also long term decision making tried to be solved by using ANNs and Fuzzy Logic but none of the approaches has been tried in a real time game. However, the results for fuzzy logic seem to fit like a glove the given problem. On the other hand the conceptual ANN distributed system can be a new approach for solving the problems of ANNs in RTS games.
References
[1] David M. Bourg & Glen Seemann, AI for Game Developers, July 2004, First Edition, O’Reilly
[2] Mat Buckland & Andre LaMothe, AI Techniques for Game Programming, 2002, Premier Press
[3] Mat Buckland, Programming Game AI by Example, 2005, Wordware Publishing Inc.
[4] Todd Barron, Strategy Game Programming with DirectX 9.0, 2003, Wordware Publishing Inc.
[5] Mark A. DeLoura (edited by), Game Programming Gems, 2000, Charles River Media
[6] Mark A. DeLoura (edited by), Game Programming Gems 2, 2001, Charles River Media
[7] Steve Rabin (edited by), AI Game Programming Wisdom, 2002, Charles River Media
[8] T.J.A. Jansen, Player Adaptive Cooperative Arti_cial Intelligence for RTS Games, June 13, 2007, http://www.cs.unimaas.nl/~uiterwyk/Theses/BSc/Jansen_BSc-paper.pdf, last visit date: 30-11-2007
[9] Michael Chung, Michael Buro, and Jonathan Schaeffer, Monte Carlo Planning in RTS Games, http://www.cs.ualberta.ca/~mburo/ps/mcplan.pdf, visit date: 30-11-2007
[10] F.C. Schadd, June 13 2007, Hierarchical Opponent Models for Real-Time Strategy Games http://www.cs.unimaas.nl/~uiterwyk/Theses/BSc/Schadd_BSc-paper.pdf, visit date: 30-11-2007
[11] Daniel Johnson and Janet Wiles, Computer Games with Intelligence, http://www.itee.uq.edu.au/~uqdjohns/publications/AI/AJIIPS_AI.pdf, visit date: 30-11-2007
[12] Wikipedia, Artificial_neural_network, http://en.wikipedia.org/wiki/Artificial_neural_network, last visit date: 30-11-2007
[13] Wikipedia, Fuzzy Logic, http://en.wikipedia.org/wiki/Fuzzy_logic, last visit date: 30-11-2007
[14] Dr Emma Hart, Lecture Notes; Introduction to Neural Networks, 13-04-2005, http://www.soc.napier.ac.uk/module.php3?op=getlecture&cloaking=no&lectureid=365774, last visit date: 30-11-2007
[15] Christos Stergiou and Dimitrios Siganos, NEURAL NETWORKS, http://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol4/cs11/report.html#Why%20use%20neural%20networks, last visit date: 30-11-2007
[16] Stuart James Kelly, Applying Artificial Intelligence, Search Algorithms and Neural Networks to Games, 2003, http://www.generation5.org/content/2003/KellyMiniPaper.asp, last visit date: 30-11-2007
[17] Artificial Neural Networks, 1999, http://www.psych.utoronto.ca/users/reingold/courses/ai/nn.html, last visit date: 30-11-2007
[18] Artificial Intelligence in Games , 17 Jul 2006, http://www.codeproject.com/KB/architecture/aigame.aspx, last visit date: 30-11-2007












































