A tiny ray tracer project…

6
May/10
0


You can download the .zip file including the sourcecode and executable file for windows.

Project page :
http://code.google.com/p/tinyraytracer/



Here is one of the latest personal project I’ve been workin on… Actually I spent about 3 days to finish it up.

It is objective is to create a customizable ray tracer using as much processing power as possible. For now it is multithtreaded but uses only the CPUs, not GPUs.

you can customize the ray tracer parameters from Main.cpp file (hardcoded for now)

  1.  
  2. #define SCREEN_WIDTH    800
  3. #define SCREEN_HEIGHT   600
  4. #define AMBIENT_COLOUR  COLOUR_BLACK
  5. #define NUM_OF_MAX_REFLECTIONS  8
  6. #define SQRT_OF_RAYS_PER_PIXEL  1
  7. #define NUM_OF_THREADS_PER_CPU  1
  8. #define NUM_OF_PIXELS_TO_BE_RENDERED_BY_A_THREAD_AT_A_TIME      100
  9.  

here is the objective which you can also find at http://code.google.com/p/tinyraytracer/

a primitive ray tracer which for the beginning renders only spheres no texturing only reflection only point light sources static super sampling (but hard shadows) multithreaded static scene

in the future renders triangles 2d texturing might be kd-tree (not sure of it) reflection and refraction point and plane light sources dynamic super sampling CUDA or OpenCL dynamic scene (moving objects and camera)

Setting Up a Home Server with Ubuntu 9.10

16
Apr/10
0

It’s being a long time since my last post in my blog but this time I wanted to come across with something different which I think might be quite useful for whom wants to assess their old PCs or laptops.

If ten years ago someone had told me that one day I will have three computers at home in my service, I only would have laughed. But today it is real :) I actually have 2 laptops (one is my mother’s which would have made me laugh even more) and one desktop.

The desktop is an AMD athlon 2500 1GB RAM and NVidia 6600GT, one of the laptops is a 512MB RAM (384MB after sharing ) intel 1.6 GHz and other laptop is a intel dualcore 1.8GHz with 2GB RAM. After buying my laptop I gave up using my PC for more than one year. It was resting in peace :)

Recently I decided to assess it and wanted to set it up as a server for downloading stuff and for experimental uses. However in my first try I couldn’t manage to install Ubuntu 9.10 as I was using my two 120GB disks with Raid0. Unfortunately Ubuntu 9.10 (yes it is the desktop version not the server one) having some problems with Raid 0 on my hardware. I removed the strip and also one of the hdds, started installation… Booom!!! my cdrom wasn’t working properly. There was only one way; to install Ubuntu via a usb drive.

Typing ‘install ubuntu from usb’ on google, I found https://help.ubuntu.com/community/Installation/FromUSBStick and https://help.ubuntu.com/community/USB%20Installation%20Media . Using UNBootin (http://unetbootin.sourceforge.net/) as advised and setting the usb drive as the boot media from my pc’s setup, installation started at last! It should be noted that if the usb drive is an external harddrive then UNbootin doesn’t seem to work, instead use an usb pendrive. Installing ubuntu from a pendrive works pretty fast and flawles.

Anyway

TO BE CONTINUED…

Speed Test: std::Vector vs Array

12
Mar/10
0

Most recently I have been working on a multithreaded ray tracer in order to port it to CUDA so I am not using any recursion. The first step is implementing the ray tracer without using any acceleration structure. So I simply test a ray with all the objects in a scene thus I needed to decide using whether an array or a vector.
As the current code strongly relies on the data structure I am using, I wanted to test the iteration speed of std::vector versus array.

Here is the code I used for testing.

  1.  
  2. #include <iostream>
  3. #include <map>
  4. #include <vector>
  5. #include <array>
  6. #include <stack>
  7. #include <windows.h>
  8. #include <math.h>
  9. typedef DWORD TIME_TYPE;
  10.  
  11.  
  12. #define BEGIN_TIMING(repeat)     TIME_TYPE dwStartTime = timeGetTime();         for(int __i = 0; __i < repeat; __i++){
  13. #define END_TIMING              }; TIME_TYPE __ret = (timeGetTime() – dwStartTime); cout<< __FUNCTION__ << "\t\t: elapsed time =\t"<<__ret<<endl; return __ret;;
  14.  
  15.  
  16. const unsigned int g_uiStartTestSize    = 1000000;
  17. const unsigned int g_uiStartRepeatCount = 1;
  18.  
  19. unsigned int g_uiTestSize       = 0;
  20. unsigned int g_uiRepeatCount = 0;
  21. unsigned int g_uiNumOfTestSteps = 10;
  22. using namespace std;
  23. using std::map;
  24. using std::vector;
  25.  
  26. class ClassA
  27. {
  28. public:
  29.         ClassA(){a=b=c=d=0;};
  30.         ~ClassA(){};
  31.         double a;
  32.         double b;
  33.         double c;
  34.         double d;
  35. };
  36.  
  37.  
  38.  
  39. vector<ClassA*>* classAVector;
  40. ClassA** classAArray;
  41.  
  42. TIME_TYPE testFillInArray()
  43. {
  44.         BEGIN_TIMING(1)
  45.                 classAArray = new ClassA*[g_uiTestSize];
  46.                 for(int i =0; i < g_uiTestSize; i++)
  47.                 {
  48.                         classAArray[i] = new ClassA();
  49.                 }
  50.         END_TIMING
  51. }
  52.  
  53. TIME_TYPE testFillInVector()
  54. {
  55.         BEGIN_TIMING(1)
  56.                 classAVector = new vector<ClassA*>();
  57.                 for(int i =0; i < g_uiTestSize; i++)
  58.                 {
  59.                         classAVector->push_back(new ClassA());
  60.                 }
  61.         END_TIMING
  62. }
  63.  
  64. TIME_TYPE testDestroyArray()
  65. {
  66.         BEGIN_TIMING(1)
  67.                 for (int i = 0; i < g_uiTestSize; i++)
  68.                 {
  69.                         delete classAArray[i];
  70.                 }
  71.                 delete [] classAArray;
  72.         END_TIMING
  73. }
  74.  
  75. TIME_TYPE testDestroyVector()
  76. {
  77.         BEGIN_TIMING(1)
  78.                 for  (int i = 0; i < g_uiTestSize; i++)
  79.                 {
  80.                         delete (*classAVector)[i];
  81.                 }
  82.                 delete classAVector;
  83.  
  84.         END_TIMING
  85. }
  86.  
  87. TIME_TYPE testIterateVector(void)
  88. {
  89.         BEGIN_TIMING(g_uiRepeatCount)
  90.         ClassA *pClassA;
  91.  
  92. /*
  93.         const int sz = classAVector->size();
  94.         for (int i = 0; i < sz; i++)
  95.         {
  96.                 pClassA = (*classAVector)[i];          
  97.                 pClassA->a++;
  98.                 pClassA->b++;
  99.                 pClassA->c++;
  100.                 pClassA->d++;
  101.         }*/
  102.  
  103.  
  104.  
  105.         vector<ClassA*>::iterator it = classAVector->begin();
  106.                 const vector<ClassA*>::iterator endIt = classAVector->end();
  107.                 while(it != endIt)
  108.                 {
  109.                         pClassA = (*it);
  110.                         pClassA->a++;
  111.                         pClassA->b++;
  112.                         pClassA->c++;
  113.                         pClassA->d++;
  114.                         it++;
  115.                 }
  116.  
  117.         END_TIMING
  118. }
  119.  
  120. TIME_TYPE testIterateArray(void)
  121. {
  122.         BEGIN_TIMING(g_uiRepeatCount)
  123.         ClassA *pClassA;
  124.         for (int i =0; i < g_uiTestSize; i++)
  125.         {
  126.                 pClassA = classAArray[i];
  127.                 pClassA->a++;
  128.                 pClassA->b++;
  129.                 pClassA->c++;
  130.                 pClassA->d++;
  131.  
  132.         }
  133.         END_TIMING
  134. }
  135.  
  136. void verify()
  137. {
  138.         cout<<"verifying"<<endl;
  139.         for (int i=0; i < g_uiTestSize; i++)
  140.         {
  141.                 if( (*classAArray[i]).a != classAVector->at(i)->a )
  142.                 {
  143.                         cout << "error!:" << i <<endl;
  144.                 }
  145.         }
  146. }
  147.  
  148. TIME_TYPE testAll()
  149. {
  150.         BEGIN_TIMING(1)
  151.                 g_uiTestSize = g_uiStartTestSize;
  152.                 for (int i =1; i <= g_uiNumOfTestSteps; i++)
  153.                 {
  154.                         g_uiRepeatCount = g_uiStartRepeatCount * i;
  155.                         cout<< "repeat count =\t\t"<<g_uiRepeatCount<<endl;
  156.                         cout<< "test size    =\t\t"<<g_uiTestSize<<endl;
  157.                         testFillInArray();
  158.                         testFillInVector();
  159.                         testIterateArray();
  160.                         testIterateVector();
  161.                         testDestroyArray();
  162.                         testDestroyVector();
  163.                         cout<<"———————————————"<<endl;
  164.                 }
  165.                 g_uiRepeatCount = g_uiStartRepeatCount;
  166.                 for (int i =1; i <= g_uiNumOfTestSteps; i++)
  167.                 {
  168.                         g_uiTestSize = g_uiStartTestSize * i;
  169.                         cout<< "repeat count =\t"<<g_uiRepeatCount<<endl;
  170.                         cout<< "test size    =\t"<<g_uiTestSize<<endl;
  171.                         testFillInArray();
  172.                         testFillInVector();
  173.                         testIterateArray();
  174.                         testIterateVector();
  175.                         testDestroyArray();
  176.                         testDestroyVector();
  177.                         cout<<"———————————————"<<endl;
  178.                 }
  179.  
  180.         END_TIMING
  181.  
  182. }
  183.  
  184.  
  185.  
  186.  
  187. int main(void)
  188. {
  189.         testAll();
  190.         return 0;
  191. }
  192.  
  193.  

Here is the code to download

Results
Results are striking indeed. If you compiling the code in Release mode without any run-time error checking and disabled exection support iterating a Vector by using std::vector<>::iterator is almost as fast as using array.

However testing them on debug mode shows the difference, iteration time is terribly slow on std::vector ; about 60 times slower.

Conclusion
std::vector is almost as fast as using a static array on release mode (using Visual Studio 2008) but terribly slow on debug mode. So if your code is running slow on the debug mode, don’t give up std::vector and give a go on the release mode.

Tagged as:

finding prime numbers in a range using python

15
Dec/09
0

the dictionary definition of prime numbers gives us a clear idea about prime number finding algorithm; ‘–noun Mathematics.
a positive integer that is not divisible without remainder by any integer except itself and 1, with 1 often excluded: The integers 2, 3, 5, and 7 are prime numbers. ‘ according to http://dictionary.reference.com/browse/prime+number.

hence the most primitive algorithm is dividing the number by all the positive numbers between 2 and the number-1 and waiting for non-zero reminders from all.

however it is not really the most practical way.

first of all it is obvious that any numbers except for 2 cannot be divided without reminder by its decrement. this way of thinking leads us to question the range of the numbers to be used as divider. logically the limit should be between 2 and square root of the number. the explanation is simple; lets assume that the number is not prime and we divide it with a number bigger than its square root, in that case the result will be a number between 2 and the number’s square root, which we should have already tested (since we start the division test in order starting from two).
secondly the test should be done only with prime numbers, not all. the reason is quite simple, if X divides Y then all the multipliers of X also divides Y. think about dividing 600 by 30; 600 can also be divided by 2, 3 and 5 which are the multipliers of 30.

lastly there is no point of testing even numbers as they already can be divided by 2. by doing that, for finding prime number in a range, we also prevent the division by 2 tests.

here is the python code for finding prime numbers in the range of 2 to 1′000′000, takes aroun 5.4 seconds on my laptop.

  1. from math import sqrt
  2. import time
  3. t1 = time.time()
  4. primenumbers=[2,3,5]
  5.  
  6. for i in range(7,1000000,2):
  7.         sq = sqrt(i)+1
  8.         for j in primenumbers:
  9.                 if sq < j:
  10.                         primenumbers.append(i)
  11.                         #print i, len(primenumbers)
  12.                         break
  13.  
  14.                 if i % j == 0:
  15.                         break
  16.  
  17. print len(primenumbers)
  18. t2 = time.time()
  19. print (t2 – t1)

Game Balyoz

10
Dec/09
0

svn checkout http://balyoz.googlecode.com/svn/trunk/ balyoz-read-only

http://code.google.com/p/balyoz/


Balyoz is a 3D shooter game having been written using OGRE engine.The aim is to provide a interesting
and enjoyable game play experience for the player and still in development.In order to achieve this purpose,
we have been using PyhsicX physic engine to provide a more realistic and unique game experience.Although
Balyoz is a classic 3D shooting game,it has also some different features then classic shooting games
offer and full of challenge even its underlying structure.The war plane will be flying over a terrain and be
capable of moving both X and Z direction.Besides, there will be different weapon options which makes
the game play more enjoyable.In Balyoz, other then enemy air units, there will be navy and ground units
which will be shooting to out plane as well.To destroy ground units, player have to use bombs with
the correct timing and position combination.I would like to give some information about the underlying
structure of the Balyoz game.

1.XML Based Definition

In order to achieve flexibility, unit definitions, weapons, levels, maps etc.. are defined XML files.
They are loaded either on game initialization stage or level lodging stage.It provides us a great
flexibility to alter the attributes, for example weapon attributes like speed, range or controller
type, or level attributes like the unit types in level and their positions, without changing even
one line code.It also prevents us to recompile the code for each time we change a attribute.
Let us have a look a example xml used in game.
<?xml version="1.0" ?>
<weapons>
<weapon>
<name>bazooka</name>
<mesh>cube.mesh</mesh>
<reloadtime>500</reloadtime>
<numofbullets>
<capacity>1000</capacity>
<initial>1</initial>
<maximum>9</maximum>
<minimum>1</minimum>
<anglebetweenbullets>18</anglebetweenbullets>
</numofbullets>
<bullet>
<initialspeed>10</initialspeed>
<maximumspeed>-100</maximumspeed>
<power>100</power>
<radius>10</radius>
<effect>linear</effect>
<lifetime>4.5</lifetime>
<particles>Examples/Smoke</particles>
<explosion>explosion</explosion>
<controller>dummy</controller>
</bullet>
</weapon>
</weapons>
XML structure in terms of game is shown below,
Game.xml
|
Levels.xml ____
|                             |
Map.xml Terrain.xml
|
Units.xml
|
Weapons.xml
|
BulletController.xml
So,if it is needed to load a unit into the game, first units.xml file will be read and unit attribute
will be figured out from units.xml file.Then, weapon names related to that unit will be read from
units.xml file and with that reference, weapon attributes will be taken from weapon.xml file.
After that, the information about by which weapon controller it will be controlled will be read
from BulletController.xml file since for example guided missiles should be controlled in a
different way.As you can see, even controller types are defined in xml files and for sure, this
system provides a great flexibility when we want to add a new weapon to a unit, or a new
unit to a level.It is one of the most powerful feature of Balyoz in terms of design.

2.Controller Design

As a design decision, a main controller is implemented which is responsible to process
and update all the game events. Main game controller process the events via a event
queue it contains. Besides, there are some sub-controllers like Unit Controller or
Collision Controller which are responsible for creating related game events and adding
the main controller`s queue.So, before each frame is rendered, first all the controllers
are executed, produced related event and added to main controller`s event queue. Then
before rendering the frame (or after) this queue is processed by main controller and game
word is updated. Separating controlling behaviors to different controllers is also provides a
flexibility to implement control behavior and maintaining the code.Moreover, game core
talks to just main controller and does not need to know about sub-controllers.These design
decisions are shown by images below,
controllers-sequence
controllers-class
Balyoz is still in its early stage of development.I will be adding news about the progress of game.Beside, screenshots and videos will be added soon as well.
al
ates

Ray Tracing on Cell by Using KD-Tree Acceleration Structure – PDF

30
Nov/09
0

It’s being a long time since promising to publish my dissertation thesis with the title of Ray Tracing on Cell by Using KD-Tree Acceleration Structure

At last I found it and decided to publish! Unfortunately html version is a bit crappy and it’s a pain in the neck to make it tidy and nice. You can also check the PDF version of my MSc thesis out.

Mind Confusing Interview Questions

16
Nov/09
2

Not everyone but quite a lot of people get annoyed with frustrating interview questions, especially the ones working in gaming industry. Here are some of the questions you may also get frustrated.

1) As far as a I heard from a friend of mine this question was asked during a google phone interview;
In a virtual country all the families want to have one and only one male baby (boy) and they really insist of having that baby; they contiue giving a birth until they get the boy and they stop having baby after that.
To give an example, let sey there are only 3 families; Family A,B,C

Family A gets the boy after 4 births
G G G B

and Family B gets the boy in the first place
B

and Family C gets it after 10 tries
G G G G G G G G G B

as a result, the number of girls is 12 and boys is 3.

What would you expect the percent of the population of boys to be in this country?

The result is not that striking: 0.5

CUDA and Ray tracing; are they really a good combination? Or PS3?

29
Jun/09
0

Previous week I mailed to people who worked on ray tracing with CUDA architecture and I received some disheartening mails. I would like to share these mails since you may need to understand reasons for the conflict between NVidia’s CUDA 1.1 architecture and ray tracing.

I sent this mail to “the secret matador” whom I found in NVidia’s CUDA forums;

> umut has sent you this email from
> http://forums.nvidia.com/index.php.
>
>
> Hello
> I am studying my master degree on computer games
> tech and I’m planning to choose ray tracing on cuda
> processors in dynamic scenes for my master thesis,
> however, as i read from your post itz really not a
> good idea. Is it? I am also concerned about the
> number of replies to your post (no reply as i see).
> I would like to ask you if it is a good idea to do
> my master thesis on this topic or would you like to
> suggest some different topics about CUDA?
> I would really appreciate your answer.
> Thanks
>
> UMUT
> umuter { @t } gm@il.com
>
Well, I was trying to perform raytracing with CUDA
also some time ago. My scenes contained triangles. If
you want to use CUDA for just spheres I think can be
as fast as Cell ( see
http://eric_rollins.home.mindspring.com/ray/cuda.html
)

Well, for xNormal I tryed CUDA with triangle
scenes(1-10M triangles) and I wasn’t able to get it
working at decent speed. Perhaps if you use smaller
scenes could be ok… but was terrible slow for me…
I think my scenes were too big to fit well in the
shared/const memory. I used BVHs, grids and stackless
kdtrees without any good result. A dual-core was
always faster. Btw, the CUDA stackless approach sux a
lot! CUDA is very good processing linear data(like an
image kernel) but terrible processing random and
recursive structures like the ones that requires
raytracing.

Another problem I found was that there is no CUDA x64
version, neither is Vista compatible and the VRAM on
desktop G80 cards was too small for me(256-512Mb is
the typical and my scenes were occuping almost 1Gb).
The CUDA 1.1 does not support 3D textures so is a pain
to get the uniform grids working…

Well, I hope you could improve my results… CUDA was
not very good for me. I hope they add a function
stack, better syncronization mechanism and improved
registers/instruction count for CUDA 2.

After getting this mail I asked a similar question to Eric Rollins who is most probably one of the first people trying to implement ray tracing algorithm on CUDA. And I also asked if PS3 or CUDA? which one should I choose and the answer is pretty clear; PS3. Eric Rollin’s complete answer:

Umut;

I think you are correct about the difficulties with ray tracing on CUDA. It is easier on the Cell/PS3, though still challenging. See papers linked on my PS3 ray tracing page, and MIT blue steel: http://cag.csail.mit.edu/ps3/blue-steel.shtml

In my code for CUDA and Cell/PS3 the only primitive I have implemented is the sphere.

I assume you have already seen this paper on alternative approaches: http://graphics.stanford.edu/papers/i3dkdtree/gpu-kd-i3d.pdf . Also some of the latency avoidance techniques discussed for Cell/PS3 might apply to CUDA. I do recommend reading the papers for Cell/PS3 even if you want to try CUDA.

If you have a big insight on how to do CUDA, go with it. Otherwise I recommend Cell/PS3.

Good luck.

My Masters DissertationProposal; Raytracing on CUDA

29
Jun/09
0







The document below is prepared as for an MSc dissertation proposal, actual dissertation post is  here and the direct link to the dissertation pdf is here and the html version is here.













UNIVERSITY

of

ABERTAY DUNDEE

School of Computing & Creative Technologies

May 2008

By

Umut Riza ERTURK

0703851


Introduction


Introduction


High Concept

The aim of this project is to develop a real-time ray tracing system on Compute Unified Device Architecture (CUDA). The main idea is to create a generic and portable real-time ray tracer involving the realistic lightning factors such as soft shadows (penumbra), reflection and refraction with an acceptable speed of 30 frames per second (fps) for human eye by using KD-Trees on a fully parallel processing architecture; CUDA.


Project Overview

One of the most important pushing forces of the computer graphics technologies is obviously computer games which are always getting more realistic. As the games are getting developed in terms of visual realism they need more realistic visual attributes such as soft and realistic volume shadows, multi-pass lightning and complex shaders. To achieve this level realism they are not only getting more resource intensive but also getting more complicated in terms of development process with the currently most common rendering method; rasterization [03]. The main problem of this classic rendering approach is all the objects in a 3D scene are compound of triangles and all these triangles have to pass through from main processing unit to graphics processing unit one by one. In the rasterization pipeline all these triangles needs to be analyzed, coloured, lighted, textured, culled and as a result became a pixel (CDR-INF, 2007). This approach does not only give rise to a linear slow down with the increasing number of triangles in a game scene but also increases the complexity of the implementation of the effects to get a realistic result since each effect needs to be implemented with shader phases.

Despite the unrealistic results and complex implementation of the effects, rasterization considered as conventional rendering method for the computer games because of the low computation cost. However the microprocessor technology as well as the parallel-computing technologies significantly improved for the last few decades which provoked to being focused on the other alternative way for rendering; ray tracing which is a completely different technique to achieve nearly perfect visual realism. The high quality visual characteristic of ray tracing is based on the physically correct simulation of the real light behaviour. Basically ray tracing algorithm starts with sending at least one ray for each pixel from the camera to the 3D world space. If the ray intersects with an object according to the material of the object, algorithm creates the reflected-refracted rays from the object and recursively continues its process. Because of its physically correct implementation it does not struggle to create the realistic effects such as global illumination or shadowing. However this method computationally expensive due to two major problems; creating the rays and finding the intersection point of the ray and any object in the 3D space. For instance a scene with 1K triangles and with 1024×768 pixels needs at least 786432 eye rays to be created and according to the organisation method, may require millions of triangle-ray intersection tests; moreover these rays will interact with object surfaces.

With the help of the significantly improving hardware and software technologies these problems are getting easier to tackle. For the object-ray intersection problem one of the most efficient algorithms involves KD-tree structure. KD-tree is a kind of binary space partitioning (BSP) tree which is one of the most preferred data structure using for interactive scenes (Shevtsov Maxim, 2007). Briefly KD-tree is a multidimensional search tree for points in k dimensional space (NIST, 2005) which enables to find an object in the space with the complexity of log (n). In addition to this algorithmic improvement against the object-ray intersection problem, another improvement for creating rays is the parallel computing. One of the most recent parallel computing architecture is NVidia’s CUDA hardware solution whom scales up to one hundred cores and one thousand threads. Apart from the massively multi parallel advantage other advantage is its programming language which is C.

Ray tracing is one of the most promising photo realistic rendering technique for computer games according to the researches have been making on as well as according to the improvements in hardware technologies. On the other hand CUDA is a very new and promising architecture with its suitable structure for ray-tracing and there are no satisfactory researches have been done about this system. This project will investigate techniques and possible programming improvements of ray tracing on massively multi-thread architecture named CUDA.

About this Proposal Document

This proposal aims to investigate, develop and try to improve a specific real time ray tracing structure named KD-tree for a specific multi thread programming architecture called CUDA. A review of the KD-Tree structure as well as the CUDA hardware can be found in the ‘Literature Review’ section.

Following on from that research, a method for implementing the discussed techniques is presented in the ‘Method Design’ chapter. This general design is followed by a week-by-week plan detailing a schedule for implementing the proposed project, in the ‘Project Plan’ section

Literature Review

This chapter aims to summarise the certain techniques, data structures and hardwares encapsulated by the scope of this project. A brief overview of each topic is given, followed by a discussion of the key areas in each field which are applicable to this project.

Ray Tracing

Ray tracing is a rendering method to create photo realistic 2D views from 3D space by simulating the real light behaviour. It is one of the first solutions for rendering that’s why it is also one of the most researched rendering techniques up to now, however because of the complexity and amount of the calculations, mostly used for offline rendering. Contrary to its complexity the idea is very simple. Before understanding ray tracing it is essential to understand the real light behaviour since ray tracing is just a simulation of it.

What is light and colour?

Light is an electromagnetic wave in a certain range of frequency which determines the colour of the light. Apart from the wave explanation of the light the other explanation is the particle explanation; light is the compound of small packages named photons which are travelling in a certain direction with the speed of light. When a photon collides with a surface three possible scenarios happen according to the physical attributes of the surface. If the surface is reflective then the photon reflects respect o the normal of the hit point. After the reflection, photon changes its colour according to the colour of the surface. There are two kinds of reflection; ’specular reflection’ which happens on regular surfaces and ‘diffuse reflection’ which happens on irregular surfaces.


Figure 1 – Specular Reflection of Light

Figure 2 – Diffuse Reflection of Light

Other scenario is the photon passes through the objects it is collided which is called refraction. After it enters into the object, it changes its direction according to the density of the surface and also according to the normal of the hit point.


Figure 3 – Rafraction of Light

The last scenario is it gets absorbed by the surface and neither can reflect nor refract, this happens only on real dark surfaces.

The reflection and the refraction can also happen at the same time in this case it can be considered as more than one photon has been created from one photon.


Figure 4 – Reflection and Refraction together

Basically ray tracing is the simulation of these rules in a virtual 3D world and its objective is to convert the 3D space to 2D image by determining the colour of each 2D image cell (pixel) in the 2D world. The main difference of ray tracing from the real world is, in real world all the photons are coming to our eyes from light sources directly or after reflections-refractions, however ray tracing algorithm does the reverse; sends the light rays (photons) from each of the pixels in the 2D world (screen) to the 3D world (e.g. game scene). The reason for sending rays from the screen to the scene is only the rays reaching to the screen will affect the final colour value of the pixel. That also means the photons reaching to the back face of the sphere (on Figure-5) and reflecting through an irrelevant direction will not affect the final 2D image therefore calculations for this photon are unnecessary.

Figure 5 – Basic ray tracing concept; rays from the screen to scene

Figure 6 – The dashed lines represents the wasted photons which will never reach to the screen

Figure 7 – Adaptive super sampling, sending 5 rays for each pixel for the begging, increases the number of rays if the colour difference is big between these 5 rays

Specification of the ray tracing is at least one ray needs to be sent from each pixel however in most cases one ray is not enough since the projection of one pixel may correspond to a large area in 3D world. On the other hand sending a constant number of rays (e.g. 10 rays per pixel) from each pixel, which is called supersampling, is not an efficient solution since it increases the ray tracing time linearly. Determination the number of pixels to be sent from each pixel is one of the problems of ray tracing which plays an important role on the quality issues of the image such as anti-aliasing. One of the efficient solutions for this performance-quality paradigm is adaptive supersampling. This method sends a small amount of rays from a pixel for the beginning. If the result colour values of these rays are slightly different from each other, it increases the number of rays for the current pixel and sends new rays according to the difference between rays’ colours and at the end mixes the colours of these rays to calculate the actual pixel colour (Glassner Andrew S., 1989).

After creating the rays and sending them through the 3D scene, the colliding objects with the photon needs to be found. In addition to finding the object, exact location in object space and the normal of this location needs to be found. This is one of the most challenging problems of ray-tracing and many algorithms and data structures have been discussed and implemented up to now. The most known two methods for quick finding of the hit point are bounding volume hierarchies (BVH) and KD-Trees approach. The detailed information about the KD-Trees will be discussed in the next section but BVH method will not be discussed in this paper. Although the significant algorithmic improvements about this problem, ray tracers still spend their 75% to 95% of their processing time on this problem quotes James Foley (Foley James D., 1990).

The next step after creating the primary rays and finding the intersection points is firstly looking for the hit point’s position respect to the light sources. If there is no object between the hit point and the light source then the colour of the light ray will depend on directly the light source and more other calculations.


Figure 8 – Colour of ray-a directly depends on the light source however ray-b is in shadow region since it is way to light source is blocked by itself

After taking light resources and shadows into account, next point is reflection or/and refracting the ray from the intersection point. The key input coming from the game scene to find the reflection and refraction rays is the normal of the point. A 3D scene usually compounds of geometric primitives such as spheres, cubes, planes, triangles etc. The normal calculation of a point depends on the primitive the ray intersects with. To give an example ray sphere intersection problem can easily be solved with these algebraic equations;















(Owen G. Scott, 1999)

The normal allows the reflection and refraction rays to be found;


Solution for the reflection ray Rl is;


Solution for the refraction ray Rr is;


Figure 9 – (a) reflection, (b) refraction

After finding the reflection and refraction rays the algorithms continues recursively as it is shown in the pseudo code below;


(Rademacher Paul, 2008)

Figure 10 – demostration of a ray path with reflections and refractions

As a result the image quality of this rendering technique is obviously much realistic compare to the classic rasterization method since it is a real implementation of the light behaviour. However the calculation cost for each step is still expensive to put this algorithm into current games with the current hardwares. Fortunately there are satisfactory improvements for methods as well as for hardwares. In the next section one suitable technique and a specific hardware will be introduced.

KD-Trees

KD-tree is a special kind of binary tree data structure for organising points in k-dimensional space (since the graphics applications run on three dimensional spaces the kd-tree on this paper will be representing ‘three dimensional kd-tree’) which provides multidimensional search [05]. The construction cost for the KD-tree is logarithmic respect to the number of primitives in the scene and the average cost of traversing a voxel can be estimated (Shevtsov Maxim, 2007). Another important point about kd-trees is different from octree, which divide the 3d space into constant number of parts, kd-tree divides the space into unfixed parts with planes perpendicular to one of the coordinate system axes which also differs KD-trees from the conventional BSP trees.

The use of kd-trees for ray-tracing algorithm is finding the intersecting objects in the 3D-world space with the ray sent from the camera through the scene as quick as possible.

Since there are different kinds of KD-trees according to the optimisations, roughly the data structure for a node contains two child pointers (just like binary tree), a name and a key (usually a pair of floating points) which represent the dimensions of the rectangle.

The creation of a KD-tree briefly can be described by the following steps;

  • At each step, choose one of the dimensions as a basis of dividing the rest of the point
  • For example at the root choose x-axis as the basis
    • Like binary search trees, all items to the left of root will have the x-coordinate less than that of the root
    • All items to the right of the root will have the x-coordinate greater than (or equal to) that of the root
  • Choose y as the basis for discrimination for the root’s children
  • And choose x again for the root’s grandchildren

Figure 11 – Steps for constructing a kd-tree, taken from the lecture presentations of Sharat Chandran, 2002

The major problem of constructing a kd-tree is determination of the division planes since kd-trees are not fixed divided.

As shown in figure 12, dividing a scene into two parts can be done in several ways. The problem is finding more balanced division to increase the efficiency of the kd-tree as well as the ray tracer. The optimisation is based on a simple rule; large areas with few objects or small areas with many objects work faster (Martin E., 2006).

  1. Split at the middle

  1. Split at the median


  1. Cost optimized partitioning

Figure 12 – Different division methods, taken from lecture slides of Martin Eisemann

The cost optimisation formula according to the rule mentioned above is;

Equation 1 – cost equation to split the space with higher efficiency, taken from lecture slides of Martin Eismann

As a result, the cost predictable structure of KD-trees is a very big plus for ray tracing with multi threads since the jobs can be equally distributed between threads to maximise the efficiency of the multi processing. On the other hand, KD-trees are considered as the best solution for static scenes however with the increasing interest on ray tracing algorithms, several researches have been done to use KD-trees for dynamic scenes which makes it challenging.

CUDA

For the last few years GPU devices have reached to significant computational power (figure-13) compare to CPUs. The main reason is; as the graphics operations are highly intensive and requires parallel computation, GPUs evolved in a different way from CPUs; they devotes more transistors for data processing compare to caching and data controlling (NVidia, 2007).


Figure 13 – Floating-Point Operations per Second for the CPU and GPU (figures taken from (NVidia, 2007))



Figure 14 – Difference between current GPUs and CPUs, GPUs devotes more transistors for processing (figures taken from (NVidia, 2007))

With the extremely increasing computational power of GPUs, not only graphics applications but also many other applications have been tried to be run on GPUs. CUDA has developed as hardware-software architecture to respond these needs by providing a significantly parallel and simple computation structure.


Figure 15 – Architecture of CUDA

Design goals of CUDA are described by its founder NVidia ;

  • Scale to 100’s of cores, 1000’s of parallel threads
  • Nearly auto-managed and simple parallel computing structure
  • Enabling heterogeneous systems (i.e. CPU+GPU)

Figure 16 – (NVidia, 2007)

In addition to its parallel computing power, other powerful side of CUDA is its C language which allows more understandable and clean programmes. Other important differences coming with CUDA is, contrary to other GPU programming languages such as CG or GLSL, CUDA can write arbitrary memory addresses and CUDA exposes a shared memory area of 16KB which is very fast.

Briefly, main reason for choosing this new generation hardware-software architecture is it provides great parallelisation opportunities which are very suitable for ray-tracing by using KD-trees. However CUDA’s stackless architecture is a big problem for recursive algorithms. This problem is the challenging point of the ray-tracing/CUDA combination. On the other hand recent researches showed that it is not impossible, in addition CUDA is a very new and promising system for the future applications.

Method Design

High-Level Outcomes

The aims of this project are:

  1. To implement a homogeneous (by using both the CPU as well as the GPU) real time ray-tracer on a specific system called CUDA by using a specific data structure named KD-trees.
  2. To achieve an acceptable frame rate for human eye (~20fps) and visual quality with the resolution of 800×600 in a one million-triangels scene by using a G80 GPU processor.

Programming the CPU

Constructing and updating the KD-tree data structure

According to the ray tracing algorithm there are four main processes need to be done. First process is the construction of the KD-tree according to the object locations in the scene. For a real time ray tracer not only constructing the data structure but also keeping the data structure is very important. The first major problem to tackle is using the KD-tree for dynamic scenes since its structure is more suitable for static scenes. However thankfully there are some other papers have already been published and it is proved that by using some additional data it is possible to port kd-trees to dynamic scenes (e.g. [05]). Apart from keeping the data structure updated, the 3D object data needs to be provided by the CPU to the CUDA threads.

The other three main processes (creating the rays and finding the collision point of the object and calculating the new rays reflected or refracted from the hit point) will be done by the CUDA threads.

Programming the CUDA

Ray operations

While keeping the KD-tree data structure update and sending this structure to the shared memory area of the CUDA threads, these threads will be creating the rays by using adaptive super sampling method from the screen through the scene therefore each thread will be responsible for different number of rays. This operation is not complex but heavy because of the high number of rays. The first improvement of using an extremely parallel architecture is that hundreds of these rays will be created at the same time. After creating these rays each thread will search the colliding objects in the KD-tree structure and will find the normal of the hit point according to the type of the primitive since in this project sphere and triangle primitives will be implemented. Before calculating the normals, threads will finding out if is there any direct connection between the hit point and any of the light sources. After finding if the hit point is in the shadow region or in the bright side, the colour code of this hit point will be recorded to the corresponding pixel data. Next step is finding the reflection and refraction rays. This step involves the recursion problem which is not supported by CUDA. This problem seems to be the biggest problem of the project; even though using an iterative way is not a real solution because of CUDA threads small (16KB) shared memories. To overcome with this problem there are three possible solutions; trying to limit the number of reflection and refractions (which will cause quality problems), using the global memory (which will cause performance problems) or synchronising the threads about using the shared memory (which will increase the complexity of the implementation). As a result with these reflected and refracted rays the final colour value of the pixel will be determined by using a linear blending function.

To sum up, implementation of real time ray tracing by using KD-tree data structure on CUDA architecture has many problems to overcome with. On the other hand, magnificently improving GPU technologies opens new doors to tackle these problems either by increasing its data process speed or by increasing the memory needed for these processes (i.e. it is expecting that CUDA 2.0 will involve recursion). Because of these achievements on hardware side, the interests about ray-tracing increases day by day and as a result of these achievements new methods and new scientific researches getting published every day. This project will be hopefully one of these researches.

Evaluation

Evaluation Criteria

There are two basic criteria to evaluate the result of the project:

  1. Visual quality (qualitative)
  2. Performance (quantitative)

    Qualitative

According to the project aims, the most important qualitative requirement is the visual quality of the final image. Such that, even if the visual quality of the final image is poor the speed won’t be an important since this can be easily done by reducing the number of rays sent from screen. However the quality of the image is subjective therefore the satisfaction of this criterion will be human dependent. To reduce the subjective view; one of the most common scene (named Cornell box) using in ray tracing tests will be used to test the visual quality.


Figure 17 – Cornell Box will be used for the image quality criteria

Quantitative

Since the aimed fps and the hardware as well as the software described clearly in the aims of this project, evaluating the quantitative criteria will be very basic. As soon as the getting visually satisfactory results from Cornell Box, 20 fps will be enough for the project to achieve its goal under mentioned hardware and software implementations

Project Plan

June 2008
1st – 31st Learning CUDA architecture, reviewing NVidia documents, implementation of some algorithms on CUDA to have an experience on CUDA.Revising parallel computation.
July 2008
1st – 31st Implementation of KD-tree structure, searching for improvements for KD-tree structure on dynamic scenes
August 2008
1st – 31st Constructing primitives’ data structure and working on calculation of reflected-refracted rays.
September 2008
1st w. -2nd w. Implementation of the reflection and refraction rays on CUDA
3rd w. Incorporating of super sampling method on CUDA
4th w. Progress presentation
October 2008
1st w. Preparing documentation of the progress
2nd w. – 3rd w. Integrating the CPU-side operations with CUDA-side operations. Getting first image results
4th w. Submit 1st Draft
November 2008
1st w. Maintaining of the program
2nd w. – 3rd w. – 4th w. Preparing final draft and reviewing final draft feedback.
December 2008
1st-31st Incorporate any additional changes based on final draft feedback
January 2008
Submit Dissertation

References

Short Term Decision Making with Fuzzy Logic And Long Term Decision Making with Neural Networks In Real-Time Strategy Games

24
Jun/09
0

Abstract

Real-time strategy is one of the most important game genre from the beginning of the computer games history and one of the biggest inner market in computer games market, in spite of that fact there are only a few strategy games have successful AIs. This paper will try to improve real-time strategy games’ AI by improving long term decisions such as; resource management, determining correct buildings to build, achieving correct upgrades/technologies with neural networks as well as to short term decisions such as; creating correct troops against opponents’ troops, determination of attacking or fleeing, attacking to correct enemy troops. Also this paper will try to give examples from strategy games such as Command and Conquer: Tiberium Wars, Age of Empires.

Key Words: Artificial intelligence in real-time strategy games, fuzzy logic, neural networks, short term decision making in strategy games, long term decision making in strategy games.

1.     Definition of Fuzzy Logic

The best definition of fuzzy logic is given by its inventor Lotfi Zadeh; “Fuzzy logic means of representing problems to computers in a way akin to the way human solve them and the essence of fuzzy logic is that everything is a matter of degree.”

The meaning of solving problems with computers akin to the way human solve can easily be explained with a simple example from a basketball game; if a player wants to guard another player firstly he should consider how tall he is and how his playing skills are. Simply if the player that he wants to guard is tall and plays very slow relative to him then he will use his instinct to determine to consider if he should guard that player as there is an uncertainty for him. In this example the important point is the properties are relative to the player and there is a degree for the height and playing skill for the rival player. Fuzzy logic provides a deterministic way for this uncertain situation.

There are some steps to process the fuzzy logic (Figure-1). These steps are; firstly fuzzification where crisp inputs get converted to fuzzy inputs secondly these inputs get processed with fuzzy rules to create fuzzy output and lastly defuzzification which results with degree of result as in fuzzy logic there can be more than one result with different degrees.

image004

Figure 1 – Fuzzy Process Steps (David M. Bourg P.192)

To exemplify the fuzzy procee steps, the previous basketball game situation could be used. As mentioned in the example the rival player is tall with 1.87 meters which is quite tall relative to our player and can dripple with 3 m/s which is slow relative to our player. Addition to these data some rules are needed to consider which are called fuzzy rules such as;

  • if player is short but not  fast then guard,
  • if player is fast but not short then don’t guard
  • If player is tall then don’t guard
  • If player is average tall and average fast guard
image005

Figure 2 – how tall

image007

Figure 3- how fast

According to the rules and the input data an output will be created by fuzzy system such as; the degree for guard is 0.7, degree for sometimes guard is 0.4 and never guard is 0.2.

image009

Figure 4-output fuzzy sets

On the last step, defuzzication, is using for creating a crisp output which is a number which may determine the energy that we should use to guard the player during game. The centre of mass is a common method to create the output. On this phase the weights to calculate the mean point is totally depends on the implementation. On this application it is considered to give high weight to guard or not guard but low weight given to sometimes guard.  (David M. Bourg, 2004)

image012

Figure 5- fuzzy output (David M. Bourg P.204)

Output = [0.7 * (-10) + 0.4 * 1 + 0.2 * 10] / (0.7 + 0.4 + 0.2) ≈ -3.5

As a result fuzzy logic is using under uncertainty to make a decision and to find out the degree of decision. The problem of fuzzy logic is as the number of inputs increase the number of rules increase exponential.

2.     Definition of Artificial Neural Networks

Although there is no agreed definition the most common definition is “it involves a network of simple processing elements (neurons), which can exhibit complex global behaviour, determined by the connections between the processing elements and element parameters” (Wikipedia, Artificial neural network).

The structure of ANN (Artificial Neural Networks) is same with human’s neural network’s structure as in human neural system there are receptors to collect the data named dendrites, a processing unit named cell body and an output unit named axon. Similar to this structure in ANN there is an input layer, a processing unit which sums the inputs after multiplying them with weights and an output layer which gives a result for a neuron (Figure 6).

image014

Figure 6- structure of a simple neuron (Mat Buckland, 2002, p.241)

As seen in the figure the output will depend on the inputs and the weights as well. This is the main point of the ANN as if we can find the correct weights for the inputs than we can create a correct result for ever situation. On reverse order if we know  have the inputs and if we know the results for every input pair than we can calculate the weights therefore after finding weights if we give input values to our neural network it can calculate the correct output for certain input pairs. This method which is called back propagation is a supervised learning. The problems of that method are; training sets for feeding the ANN may not have all the possibilities or these sets may contain novel data (Dr Emma Hart, 2005).

Although the logic sounds pretty easy for ANNs with supervised learning, there is a mathematical background which makes this logic work for computers. On this part of this paper no mathematical background will be explained since it is not the main goal of this paper. (To get more information about ANNs please check further reading)

image016

Figure 7- structure of a neural network with one input layer, one hidden layer and one output layer (Mat Buckland, 2002, p.242)

As seen in the figure above there is a hidden layer which is necessary to make complex calculations. A neural network without hidden layer can make logical calculations such as ‘AND’ and ‘OR’ but cannot calculate XOR operation. This means that for complex calculations the hidden layer is obviously necessary. The number of neurons in the hidden layer is up to implementation. Increasing number of hidden layers increases the precision and complexity of the calculations which results with accurate calculations with using more CPU time and more memory that is the trade off. To find the best number of hidden layers the only way is experimenting different values, however, there are different estimations for that problem such as square root of number of input layers multiply by number of output layer.

In brief, ANNs are using for complicated or imprecise data and also for recognising the patterns (Christos Stergiou, NEURAL NETWORKS). Recognition of patterns requires very complicated methods and data structures with classic methods; however ANNs are very simple structures which make the implementation easier as ANNs are self-organising structures as well. The cons of ANNs with back propagation supervised learning are; high consumption of resources, choosing correct propagation sets, choosing correct number of hidden layers and hard debugging.

3.     Real Time Strategy Games (RTS Games)

A RTS game is a strategic game which runs in real time distinguish it from turn based strategy games. Although the high number of RTS games there are only a few scenarios are used for these games. The most popular scenario starts with a couple of productive units such as villagers, and a few resources. The player should use the villagers to collect resources and create new buildings or units who can be soldiers or villagers. The buildings usually enable new unit tapes. To give an example the player can create knights by using stable and to build a stable a barracks should have been built. On the other hand to upgrade the unit strength usually players should achieve new technologies which are usually much costly than unit production. As seen in this scenario usually in RTS games there is a hierarchical order which the player should follow to defeat the enemy.

Although there is a static hierarchical order for creation of units or achievement of technologies, always complex algorithms running for RTS games AI since the number of probabilities and parameters are very high for making decisions. Moreover the only problem in RTS games is not decision making but also path finding or this kind of AI algorithms but in this paper only decision making will be discussed.

In the history of RTS games, game AIs usually statically programmed like finite state machines for decision making. To give an example AOE-2 (Age of Empires II) has a rule based AI with priority queues which means nearly static development for NPC (non player character). If the player can find a way to defeat the NPC than the player will never lose again as the NPC cannot evolve itself and cannot find a way to overcome the bottlenecks of its management strategy. Besides these long term decision failures in AOE-2 consideration mechanism for choosing the correct the enemy to attack is not working at all as the troops always choose and attack an enemy until they die. This makes the game AI dummy since if the human player creates a trap with a wall such as builds a wall around an archer, all the knights try to attack this archer but they are not able to attack because of the wall but they lose lots of energy or time. Even if the AI programmer can find out this bug and tries to prevent this situation, he has to write lots of scripts as there is no rule to generalise in this situation. That makes the AI very complicated.

Thankfully in new generation games such as Command Conqueror Tiberium Wars, the troops behave reasonable as they can choose the correct troop to attack according to its health. To give an example all the troops choose the enemy with lowest health in their range but this approach still involves some problems. Moreover this game has obviously problem while making long term decisions because NPC obviously cheats by having lots of recourses and by producing them very fast. These approaches make strategy games very boring for game players as if the game player can defeat NPC for once then computer nearly has no chance to win again. That’s because of their static structure and behaviours.

To find alternative solutions for these problems in this paper the decisions will be divided into two parts according to their importance. If the decision is unit wise such as choosing the correct troop in the range to attack will be called as short term decision. On the other hand the decisions, like choosing correct technologies to achieve, which are not individual decision for a troop but about a certain group of all troops in the game or the decisions which will affect the destiny of the game for long term will be called long term decisions.

Not only the importance but also the complexity of the decisions is another reason for this differentiation. The short term decisions such as considering the correct troop to build is a less complex decision compare to resource management and it is only uncertain which means can be found out by analysing some other parameters.  However the resource management cannot be done by analysing some parameters because of the huge amount of the parameters and complex relations between these parameters.

a.      Short Term Decisions in Real Time Strategy Games

As mentioned before, short term decision means instantaneous decisions made by individual troops or the controlling mechanism such as; decision of attacking or fleeing or creating the correct troops according to immediate data. Although these kind of short term decisions are unpredictable, they can be made by deterministic calculations such as Fuzzy Logic which provides a high resolution for output with crisp input data. In the next parts of this paper, these short term decisions will be discussed in terms of their input data, output data and the way to make these decisions with the explanations.

The sections below based on a hypothetical RTS game scenario. In this hypothetical RTS game the units have the properties showed on the list below with explanations.

Health The health of the troop, in fuzzy set it will be normalized range between [0,100], 100 represents the healthiest and 0 is death.
Armour Unlike some of the RTS games, in this example there is only one type of armour for the troop which reduces the attack power of enemy against the troop.
Attack Power Attack power of troop against the enemy.
Attack Speed Attack speed is the variable defines how many time can the troop attack to the enemy in a constant time interval
Movement Speed Differs from attack speed, this is the parameter to define how fast the troop move or escape, it is constant for every type of the troops.

And the parameters are found by some calculations using the variable above. The main reason for doing that is having more realistic information. To give an example, let’s assume there is two kinds of troops first one with attack power of 50 and attack speed of 1, the other troop has the attack power of 15 with the speed of 4. This means that while the first troop causes 50×1 = 50 units of damage in T seconds but the second troop causes 15×4 = 60 units of damage which means the second troop is stronger than second one.

Relative Attack Power image017 OAS = Own Attack Speed

EAS = Enemy’s Attack Speed

image019= Sum of the powers of the troops attacking to the target troop

image021= The target troop’s armour

image023= Sum of the powers of the enemy troops attacking to the troop

image021= Enemy troop’s armour in the aim of the troop

image025= The troop’s Armour.

This variable normalized by multiplying with 50. The result of 50 means equality.

Relative Speed image027 image029= Own Movement Speed

image031= Enemy’s Movement Speed

image033= The mean of the speeds of the attacking troop’s to the target troop.

image035= The mean of the speeds of the enemy troops attacking to the troop.

This variable normalized by multiplying with 50. The result of 50 means equality.

The usage of  symbol adds the group behaviour for the decision making. To give an example, if the Team A has only one troop with good condition in terms of its health, on the other side Team B has five troops and all of them heavily injured but has the opportunity to destroy the enemy troop. On this condition if the consideration algorithm runs on only the troops one by one than all the Team B troops should escape from the clash but actually they shouldn’t since they can destroy the enemy.

Another example for better understanding can be given from the figure below. The troop with number 7 from the Square Team is attacking to the troop with number 2 from the Circle Team and the number 2 is being attacked by number 8 as well. On the other hand number 7 is being attacked by number 3. On this condition RAP (Relative Attack Power) of number 7 will be calculated by sum of powers of number 7 and number 8 divide by the attacking power of number 3. So the equation for RAP is;

image039

Equation 1 – RAP Equation for Troop 7

image041

Figure 8 – A clash example between two forces

i.      Attack or Flee

In a RTS game one of the most usual decisions to be made is the individual decision of a troop to continue attacking or start fleeing. In most of the RTS games like; warcraft, dune 2000, command conqueror as well as age of empires these kinds of decisions made by FSM (finite state machines). The main reason for these games to use finite state machines can be predict as reducing the CPU time for game AI as FSM is a static structure which needs minimum amount of resources but the problem of this structure is the number of states and unrealistic behaviour.

Apart from FSMs another approach can be the Fuzzy logic which provides realistic behaviour but consumes more CPU time but this is a trade off and acceptable due to the increasing CPU/GPU power.

The first stage to setup the fuzzy logic is determination of Fuzzy Input data Sets (FIS) and membership functions. However it’s better to define the Fuzzy Output Set (FOS) and its membership functions as we have only two FOSs; first one is Attack, second one is Flee (figure-9).

image044

Figure 9 – Fuzzy output sets and member functions

The FISs are Own Health, Enemy’s Health, Relative Attack Power (RAP), Relative Movement Speed (RMS). With the help of these variables the troop will show the group behaviour and make realistic decisions.

The other step is determination of membership function. On this paper only linear membership functions such as triangle and trapezoid used for easy understanding as well as easier calculation, however it is better to use logarithmic curves for RAP and RMS since they are the results of division of two forces and two speeds. For both of these data sets 50 represents the Equality.

Own Health

Enemy’s Health

RAP

RMS

Near dead

Near dead

Weak

Slow

Injured

Injured

Equal

Equal

Normal

Normal

Strong

Fast

Table 1 – Fuzzy Input Sets

And the third step is defining these FIS’s membership functions.

image046

Figure 10 – Membership function for Own Health / Enemy’s Health which are normalized to 100

The figure above shows the membership function for Own Health as well as Enemy’s Health. As seen in figure, the health under 10 is definitely named like near dead and over 30 counts like Normal. The main reason for doing that is to prevent all the troops escape from the battle as all the troops should definitely fight if they are over 30 and they should try to escape if they are near dead.

The figure below shows the RAP membership function. In this MF (membership function), equality is 50 and definitely stronger means 1.2 times stronger than the troop which is equal to 60 and definitely weaker means 1.2 time weaker than the troop which is equal to 46.6.

image048

Figure 11 – RAP Membership function

image050

Figure 12 – RMS Membership function

On the other hand 1.1 times faster means definitely faster and 1.1 times slower means definitely slower. The tolerance is 10% to prevent the errors. If the tolerance is kept 20% than the flee consideration may not work as fifteen percent slower means no way to escape so it is better to keep the tolerance at 10%.

And the last part is determination of rules. This part is the most important and most open part to make mistakes. In addition, the numbers of all combinations are. However it is not necessary to create all the rules for every possibility. In our system the rules are; 3×3x3×3=81

INPUTS

OUTPUT

Own Health

Enemy’s Health

RAP

RMS

Attack / Flee

Not near dead

All possibilities

All possibilities

All possibilities

Attack

near dead

All possibilities

All possibilities

Slower

Attack

near dead

injured

equal

Not faster

Attack

near dead

near dead

Weaker

Not slower

Flee

near dead

injured

Weaker

Not slower

Flee

near dead

normal

Weaker

Not slower

Flee

near dead

normal

equal

faster

Flee

near dead

normal

stronger

faster

Flee

near dead

injured

equal

faster

Flee

Table 2 – Fuzzy Conditions for Attack or Flee

If the troop is not heavily injured (not near dead) then it should continue attacking and shouldn’t escape. The problem is if it is near dead since all the flee possibilities needs this condition first. But even if the troop is near dead and slower than the enemy it should attack because there is no way to escape and the rest of the rules are easy to understand with the help of the table above.

And the very last part is testing the fuzzy system with different data. All the data should start with the near dead condition or injured for Own Health since for the condition of normal, troop will definitely attack.

One of the best examples is giving these values;

Own Health

Enemy’s Health

RAP

RMS

Result

9.38(near dead)

35.6(normal)

56.8(stronger)

53.1(faster)

Attack

9.38(near dead)

36.9(normal)

56.8(stronger)

53.1(faster)

Flee

8.38(near dead)

35.6(normal)

56.8(stronger)

53.1(faster)

Flee

Table 3 – Test decisions

Although the only difference is enemy’s health with the difference of 1.3 the result changes and seems to be a good decision since if the enemy is weaker than there is a possibility to kill because it is stronger. And the next turn if the troop got closer to death than will definitely escape showed at the last line of the table above.

image054

Figure 13 – Decision is attack with the value of 51.1

image056

Figure 14 -  Decision is flee with the value of 49.6

ii.      Choosing Correct Troop to Attack

Another short term decision for individual troops in the RTS games is choosing correct troop to attack. This decision also is a short term decision since it is only related with an individual troop also in this part of this essay; Fuzzy Logic will be used to answer this question. The main reason for using Fuzzy Logic is the problem involves uncertainty and the solution should be deterministic. To give an example during a clash between two enemy groups let’s assume that Team 1 has one troop and Team 2 has two troops and the troop in team 1 trying to determine which enemy troop to attack also let’s assume that for each troop number of properties are three such as; health attack, power as well as armour. In this situation this troop should choose it according to degree of weakness of the enemy and this weakness should be calculated with the help of these troops’ properties. This explanation seems best fit to Fuzzy Logic to be used for determination.

The FIS terms and the explanations for the hypothetical game scenario of this section are;

Own Effective Attack Power (OEAP)

image057

OAP = Own Attack Power

OAS = Own Attack Speed

EAR = Enemy’s Armour

EFH = Enemy’s Full Health

Enemy’s Effective Attack Power (EEAP)

image059

EAP = Enemy’s Attack Power

EAS = Enemy’s Attack Speed

OAR = Own Armour

OFH = Own Full Health

Enemy’s Health (EH)

image061

ECH = Enemy’s Current Health

EFH = Enemy’s Full Health

This variable normalized between [0-100] where 100 is healthiest.

These equations differ from the equations in `Attack or Flee` section of this essay since while determining the correct troop to attack should be done according to individual attack power not group attack power. The main reason for doing that is, if the group power is calculated instead of individual power against an enemy troop we cannot really choose a weak enemy for our attacking skills.

Another interesting point for this approach is this part doesn’t include Own Health parameter since this section is not determining our attacking or fleeing decision this means that the decision of the troop should be best fitting decision for the weakness degree not our own health.

After choosing FISs second part is creating the MF for these FISs.

image064

image066

image068

Figure 15 – Membership functions for Choosing Correct Troop to Attack

Different from the ‘Attack or Flee’ section, there are four MFs where healthy MF is new. The mean reason of putting the healthy parameter is increasing the precise for enemy’s health MF since it is really important for the decision calculation.

The next step for creating our fuzzy logic is creating the output (FOS). The output membership graph consists of three possibilities as we need a precise output for this short term decision.

image070

And the third step is creating rules. In this section, there are 36 possible combinations however it is reduced by using not operations as a result there are 15 combinations to implement, these are;

Inputs

Output

OEAP

EH

EEAP

strong not healthy All conditions attack
strong healthy All conditions neutral
not weak near dead not weak attack
weak near dead weak neutral
not weak injured All conditions attack
weak injured All conditions attack
normal healthy All conditions neutral
weak healthy All conditions don’t attack
normal normal strong attack
normal normal not strong neutral
weak normal All conditions don’t attack
weak not near dead All conditions neutral
not weak All conditions strong attack
weak All conditions strong neutral

Table 4- Fuzzy rules for choosing correct troop to attack

The main goal of these rules is attacking to the weakest in terms of its health and our attack power against this troop as well as the strongest enemy in terms of its attack power. After creating out fuzzy system the troop can choose the weakest enemy after the defuzzication operation.

image071

Figure 16- Example for which troop to attack decision

To give an example, let’s suppose that there are three troops, first and second are from Circle Team and the third one is from Square Team (Figure 16). And the variables for the troop 3 against 2 and one are;

Troop 1

Troop 2

OEAP

EH

EEAP

OEAP

EH

EEAP

88

19

88

55

7

95

Under these conditions, the troop is powerful against the troop 1 and health of troop 1 is bad but better than troop 2, however, troops 2’s effective attack power is better than troop 1. So it is hard to determine since troop 2 is closer to dead but our attack power is not as strong as against troop1. But the fuzzy logic has an answer for both of the troops;

Troop 1

Troop 2

Output (result of defuzzication)

66.7

67

Also the results of the fuzzy logic are very close to each other as expected but there is a numerical result that we can decide according to. So the troop will attack to troop 2 as the output is bigger than troop 1’s output.

image074

Figure 17- Outputs for troop 1

image076

Figure 18- Outputs for troop 2

b.     Long Term Decisions in RTS Games

The easiest explanation for long term decisions in RTS games can be described as the decisions which affect all or a group of troops or buildings for a long time as well as can be considered as the management strategy of the game player. The best example for long term decisions is resource management during game play which means how many resource collectors (e.g. villagers, harvesters) should be created and ordered to collect certain types of resources. In addition to resource managements other examples for long term decisions are; determining correct buildings to build, achieving correct upgrades/technologies, choosing correct time to attack with full power etc.

Not only for RTS games but also for any kind of game long term decision making always a big problem as the number of possibilities is may not be a certain value. Even if the numbers of possibilities are certain then possibilities and parameters are too much to make an easy decision. Moreover the difference between a short term decision and a long term decision is simply unreasonable short term decisions can be fixed in a short time however long term decision cannot be fixed so easily.

For RTS games long term decisions are vital since the CCP (computer controlled player) should behave realistic, intelligent as well as shouldn’t to not to frustrate the human player. The most common method for making these kinds of decisions are again the finite state machines which cause nearly a static game play. Every time the game player experiences the same game for the same map. To give an example in AOE2, at the end of the fourth minute, the CCP sends a scout to explore the map and creates a static number of villagers, orders them in a static way such as; four villagers hunt deer, ten of them collect food from trees etc., at the end of the seventh minute CCP attacks with a constant number of troops. This causes very static game scenario since the human player finds a way to repel the CCP and after the human player cannot be defeated by the CCP because the CCP cannot create any new way. The game producers of these RTS games usually create an AI with a static behaviour by using scripting languages. Therefore to create a strong opponent against the human player they prefer using cheating methods such as gifting new troops as well as resources which frustrates the human player.

To create a realistic behaviour for CCPs, which means intelligent and learnable behaviour, some other methods should be used such as ANN since it provides learning for CCPs. In addition to this ability of ANNs, ANNs may also provide un-deterministic behaviour by integrating GA (Genetic Algorithm). However, this method consumes lots of CPU time and provides a very slow learning.

This section of this paper will recommend using ANN (artificial neural networks) for long term decision making by giving some basic ideas such as defining input layer parameters for ANN and the outputs as well. However, all the recommendations are hypothetical since none of the ideas are based on any calculations or experiment. For these hypothetical recommendations the famous RTS game AOE2 (Age of Empires 2) will be used as the RTS game environment.

i.      ANN Structure and Hypothetic Feeding Mechanism of the ANN

As mentioned before on this paper, an ANN structure consists of three fundamental elements; inputs, weights and outputs. In our system inputs are the numerical values of a relevant system such as; for resource management system first input can be the recourse (e.g. gold, wood) the player has and the outputs are actions such as; create villager or send a villager to collect gold.

image077

Figure 19 – An ANN Module

As mentioned before, the training method for the RTS AI is back propagation which is a supervised learning method and the weight calculations are not dynamic. This means the calculation of weights can be done after the game play.

To reduce the calculations and to separate behaviours, the module approach is preferred to be used.  To give an example from our conceptual game AOE2, behaviour of the CCP during dark-age in terms of resource collection should be different from castle age. Instead of using the current age as an input value for the AI, creating different ANN modules can be another approach. This also provides parallel calculations for server-side such as calculation of dark- age could be made separately from the calculation of castle age by using threads. Another benefit of this approach is the success of each module can be send to server as training data if it fits the necessary criteria such if the player achieves castle age before computer this data can be send to the server even if the player losses the game. However, determination of the criteria should be search carefully to not to train the AI with novel data.

image079

Figure 20 – Before games start all weights getting from AI server

To load the weights of the game AI, the client should connect to the AI server and should get the trained ANNs weights to provide a better and dynamic game play. This method can be another approach for reducing piracy in games like, if the player uses a pirate version of the game cannot download the last AI weights since to download the AI data server can request for the player ID and password first.

Also the game AI server should contain trained and tested data before release of the game AI since, no one wants to play with a dummy AI. So the game company should be well trained the game AI modules which can be a problem for the game development.

image081

Figure 21 – At the end of each game all input and outputs sending to AI server for calculation of weights if the result of the game is successful

Briefly the steps of this system are;

For AI loading;

  1. Connect to AI server
    1. Check for ID, Password
    2. Send weights if successfully connected
    3. Create the local AI with new weights
    4. If cannot connect
      1. Use the previous ANN weights.

For data collection (collecting the training data);

  1. For every action
    1. Collect input variables of the relevant AI module
    2. Collect the action
    3. If the current ANN module reaches the criteria to be changed
      1. Change the ANN module
      2. If the ANN training data reaches the criteria, mark to be sent to server
      3. Else don’t mark and don’t discard the ANN training data set
      4. At the end of the game
        1. Send all ANN training data sets of the winner’s.
        2. Send all the marked ANN training data sets.

For Server Side Weight calculations;

  1. If there is new data,
    1. join all the data of relevant ANN modules including new data and old data
      1. i.      Create threads for every ANN module
      2. ii.      Calculate until reaching a threshold for every ANN module
      3. If there is no new data,
        1. Define threshold values lower than the old threshold values
          1. i.      Create threads for every ANN module
          2. ii.      Calculate until reaching a threshold for every ANN module

ii.      Example ANN Structure – 1: Resource Management in Dark Age

To create the ANN structure first of all the input values should be determined. In a RTS game for resource management can be the number of resources, number of troops collecting the resources.

For our example game AOE2, the input values for dark-age can be chosen like;

1. Number of workers

2. Number of workers collecting food

3. Number of workers collecting wood

4. Number of workers collecting gold

5. Number of workers collecting stone

6. Amount of wood we have

7. Amount of wood we have

8. Amount of gold we have

9. Amount of stone we have

10. Race of the side (such as Aztecs, Goths, Turks etc.)

And the outputs of the ANN module are decisions such as;

1. Create villager

2. Send a villager from food to wood

3. Send a villager from food to gold

4. Send a villager from food to stone

5. Send a villager from wood to food

6. Send a villager from wood to gold

7. Send a villager from wood to stone

8. Send a villager from gold to food

9. Send a villager from gold to wood

10. Send a villager from gold to stone

11. Send a villager from stone to food

12. Send a villager from stone to wood

13. Send a villager from stone to gold

The data collection in the game should be triggered by the action such as, if the player choose a villager from stone collection and send him to wood collection than all the input data and should be collected after the worker starts working.

At the same time another module is working for achieving technologies and if the resources are enough to pass the next age than all the data should be recorded according to the algorithm mentioned before.

iii.      Example ANN Structure – 2: Deciding the Correct Troops to Create

This part can be a either fuzzy logic or ANN since it seems like a spontaneous action, however the number of variable makes the fuzzy logic nearly impossible to implement this part and also actually this decision is not only spontaneous but also effects the game in long term as creating of Paladin is as expensive as achieving some technologies.

The input variables can be;

  1. Number of own archers
  2. Number of own knights
  3. Number of own swordsmen
  4. Number of own skirmishers
  5. Number of own special troops
  6. Number of own other troops
  7. Number of own defensive buildings
  8. Number of enemy’s archers
  9. Number of enemy’s knights

10.  Number of enemy’s swordsmen

11.  Number of enemy’s skirmishers

12.  Number of enemy’s special troops

13.  Number of enemy’s other troops

14.  Number of enemy’s defensive buildings

15.  Score of the enemy (this variable provided by the AOE2 game environment)

16.  Own score

17.  Own race

18.  Enemy’s race

19.  Amount of wood we have

20.  Amount of wood we have

21.  Amount of gold we have

22.  Amount of stone we have

Outputs are decisions again;

  1. Create archer
  2. Create knight
  3. Create swordsman
  4. Create skirmisher
  5. Create special troop

As seen above, the numbers of inputs are pretty much since it is really hard to determine even for human instinct which troop to create that makes the game interesting and keeps players’ passions on the game. Therefore if this system really can be implemented and reaches a degree of success than game’s lifetime for players definitely will be longer as most of the strategy game players play strategy games for the game AI not only for graphics.

Conclusion

Artificial intelligence is the most important for strategy games and especially for real time strategy games since the computer controlled opponents should make reasonable decisions spontaneously. For most of the strategy games, game producers preferred the easy way; cheating in different ways such as giving new troops, increasing income rate providing from resources and other similar ways, however results of these methods never satisfied the game players. The most important reason for that is; when the human player finds a way to defeat the computer, by using the same patterns (tactics), the human player never lose again. Briefly human player can learn but the computer cannot. Actually they can but this method never tried in commercial strategy games. There are several reasons behind that; first of all the CPU intensity of the learning structure, secondly the success rate is not known as the learning method hardly used in games. Not only machine learning but also spontaneous decision making, in most of the strategy games, is poor. Actually these short term decisions can be made with fuzzy logic and in some of the new strategy games it seems to be used (Command Conqueror Tiberium Wars) for choosing correct troop to attack, however it is not definite since no explanation have been made about the game AI for this game.

At first sight the implementation of fuzzy logic could be seem like simple especially for the short term decision making, however, the exponentially increasing number of possibilities makes hard the fuzzy logic to be implemented. On the other hand, in this paper, the tested short term decisions seem working. Especially choosing correct troop in the range to attack seems to be a successful application of fuzzy logic according to test results but without a real time application it is not reasonable to judge as a successful application of fuzzy logic since CPU usage is not known.

The conceptual distributed server based ANN application hasn’t tried yet by any game. Besides this kind of distributed approach haven’t tried for any kind of game AI yet. The CPU usage problem of the ANN is a known weak point of this structure. Another weak point of ANNs is the requirement for huge number of feed data (training). This conceptual distributed system may be a key for solution of these problems. Additionally, this trained AI structures can have a commercial value since they can decrease the number of pirate usage as the strategy game players always want to play with a more intelligent opponent and most of them will want to be registered to the AI service provider this means preventing piracy.

As a result, the problems about short term decision making and also long term decision making tried to be solved by using ANNs and Fuzzy Logic but none of the approaches has been tried in a real time game. However, the results for fuzzy logic seem to fit like a glove the given problem. On the other hand the conceptual ANN distributed system can be a new approach for solving the problems of ANNs in RTS games.

References

[1] David M. Bourg & Glen Seemann, AI for Game Developers, July 2004, First Edition, O’Reilly

[2] Mat Buckland & Andre LaMothe, AI Techniques for Game Programming, 2002, Premier Press

[3] Mat Buckland, Programming Game AI by Example, 2005, Wordware Publishing Inc.

[4] Todd Barron, Strategy Game Programming with DirectX 9.0, 2003, Wordware Publishing Inc.

[5] Mark A. DeLoura (edited by),  Game Programming Gems, 2000, Charles River Media

[6] Mark A. DeLoura (edited by),  Game Programming Gems 2, 2001, Charles River Media

[7] Steve Rabin (edited by), AI Game Programming Wisdom, 2002, Charles River Media

[8] T.J.A. Jansen, Player Adaptive Cooperative Arti_cial Intelligence for RTS Games, June 13, 2007, http://www.cs.unimaas.nl/~uiterwyk/Theses/BSc/Jansen_BSc-paper.pdf, last visit date: 30-11-2007

[9] Michael Chung, Michael Buro, and Jonathan Schaeffer, Monte Carlo Planning in RTS Games, http://www.cs.ualberta.ca/~mburo/ps/mcplan.pdf, visit date: 30-11-2007

[10] F.C. Schadd, June 13 2007, Hierarchical Opponent Models for Real-Time Strategy Games http://www.cs.unimaas.nl/~uiterwyk/Theses/BSc/Schadd_BSc-paper.pdf, visit date: 30-11-2007

[11] Daniel Johnson and Janet Wiles, Computer Games with Intelligence, http://www.itee.uq.edu.au/~uqdjohns/publications/AI/AJIIPS_AI.pdf, visit date: 30-11-2007

[12] Wikipedia, Artificial_neural_network, http://en.wikipedia.org/wiki/Artificial_neural_network, last visit date: 30-11-2007

[13] Wikipedia, Fuzzy Logic, http://en.wikipedia.org/wiki/Fuzzy_logic, last visit date: 30-11-2007

[14] Dr Emma Hart, Lecture Notes; Introduction to Neural Networks, 13-04-2005,  http://www.soc.napier.ac.uk/module.php3?op=getlecture&cloaking=no&lectureid=365774, last visit date: 30-11-2007

[15] Christos Stergiou and Dimitrios Siganos, NEURAL NETWORKS, http://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol4/cs11/report.html#Why%20use%20neural%20networks, last visit date: 30-11-2007

[16] Stuart James Kelly, Applying Artificial Intelligence, Search Algorithms and Neural Networks to Games, 2003, http://www.generation5.org/content/2003/KellyMiniPaper.asp, last visit date: 30-11-2007

[17] Artificial Neural Networks, 1999,  http://www.psych.utoronto.ca/users/reingold/courses/ai/nn.html, last visit date: 30-11-2007

[18] Artificial Intelligence in Games , 17 Jul 2006, http://www.codeproject.com/KB/architecture/aigame.aspx, last visit date: 30-11-2007