Jump to content
Eternal Lands Official Forums
Entropy

Special effects

Recommended Posts

Mine says 1.5. You sure you can't update?

 

Patch committed; the invsqrt function is now broken out. I left fastsqrt in eye candy since it's so simple; if we need to move it later, I will.

 

Browsing through the CVS shows that the current revision is 1.4.

 

Oh, and I think you forgot to add dont_optimize.h/cpp into CVS :hehe:

Share this post


Link to post
Share on other sites

Ack, thanks for that catch!

 

This is strange -- the thing with effect_breath.

 

[meme@spathi eye_candy]$ cvs status effect_breath.cpp

rei@cvs.elc.berlios.de's password:

===================================================================

File: effect_breath.cpp Status: Up-to-date

 

Working revision: 1.5

Repository revision: 1.5 /cvsroot/elc/elc/eye_candy/effect_breath.cpp,v

Sticky Tag: (none)

Sticky Date: (none)

Sticky Options: (none)

 

I'm going to try making a trivial edit to the file, then recommitting. Let's see if that forces an update for you.

 

It's checked in now.

Share this post


Link to post
Share on other sites

Ok I get the new files.. don't have time to build right now.... and I know the problem with effect_breath... I'm talking about the .h file not .cpp. The 10 that need to be changed to 10.0f is in the .h file :hehe:

Share this post


Link to post
Share on other sites

I have some information about the invsqrt problem.

The problem is that optimization passes in the compiler assumes the strict aliasing rule. E.g. it assumes that a pointer to an int can't point to the same memory location as a pointer to a float. Therefore it can reorder the reads and writes to generate faster code. In the invsqrt function the strict aliasing rule is broken by the pointer casting and therefore invalid code may be generated. For more info see http://www.cellperformance.com/mike_acton/...t_aliasing.html

 

I made the following test program, save it as invsqrt.c

 

#include <stdio.h>
#include <math.h>

/* This one breakes strict aliasing rule */
__attribute__ ((noinline)) float invsqrt(float f)
{
float half = 0.5f * f;	   
int i = *(int*)((void*)&f);		  
i = 0x5f3759df - (i >> 1);   
f = *(float*)((void*)&i);
f = f * (1.5f - half * f * f);	   
return f;	
}

/* This one is ok */
__attribute__ ((noinline)) float invsqrt2(float f)
{
union { int i; float f; } tmp;
float half = 0.5f * f;
tmp.f = f;
tmp.i = 0x5f3759df - (tmp.i >> 1);   
f = tmp.f;
f = f * (1.5f - half * f * f);	   
return f;	
}

int main(int argc, char **argv)
{
float f = 0.1;
int i;
for(i = 0; i < 100; i++) {
	printf("%f = %f, %f, %f\n", f, invsqrt(f), invsqrt2(f), (float) (1/sqrt(f)));
	f += 0.1;
}
return 0;
}

 

Try the following compilation commands:

$ gcc invsqrt.c -lm

$ gcc invsqrt.c -lm -O2

$ gcc invsqrt.c -lm -O2 -fno-strict-aliasing

 

Running ./a.out after each and check the result. The middle one is bad for me but the first and last give good results.

 

This is my version of gcc:

gcc (GCC) 4.1.1 (Gentoo 4.1.1-r3)

 

Edit:

Try compiling with -S to get the assembler output. In the -O2 case gcc figured it could remove the whole line that calculates i (because the result "can't be used"), resulting in a float load from an uninitialized stack position.

 

Edit2:

Try removing the casts to void* and add -Wall to the compilation command. Now you get the following warnings:

invsqrt.c: In function ‘invsqrt’:

invsqrt.c:7: warning: dereferencing type-punned pointer will break strict-aliasing rules

invsqrt.c:9: warning: dereferencing type-punned pointer will break strict-aliasing rules

The void* casts hid this warnings. Maybe the casts where put there to get rid of the warnings, but that was the wrong fix. I've been bitten by this "bug" before and I hope the info I have given here will help others avoid it in the future.

Keep up the good work, I like the look of the new features in EL. I am looking forward to start playing a bit again after the next update.

Edited by Troca

Share this post


Link to post
Share on other sites

Troca, you're a dear. :blink: That's exactly what we need. I'll mess with the Linux Makefile and update the documentation for other platforms this evening.

 

Adding -fno-strict-aliasing is just a workaround. The solution is to use the invsqrt2 function in my example code. The compiler can make better optimizations if you fix the code to follow the strict aliasing rule and compile with -fstrict-aliasing (the default with -O).

Share this post


Link to post
Share on other sites

I'm not sure how that's better, though. From what I read on the page that you linked, the union workaround only works because it first *copies* the value into the union. That's an extra operation. By using -fno-strict-aliasing, it won't be needing to do that copy. Given how short this code is, and given that there are no real "array" operations, I don't think that strict aliasing would speed it up. In short, it seems that no-strict-aliasing would be faster.

 

What do you think?

 

P.S. -- You're right, the void* casts were to get rid of those warnings. I googled for those warnings when I first got them (an older GCC didn't give them, and the code was working fine with the older GCC). One page recommended the void* cast, so that's what I did. Clearly, that was the wrong option. :)

Edited by KarenRei

Share this post


Link to post
Share on other sites

Come to think of it, though: separating it into a different file, meaning a different .o, is preventing inlining, so you're right -- your version probably is faster.

 

I'll do it that way. Actually, even better, I'll put it in the .h so that it's compiled each time: thus, files other than eye_candy.o will get faster square roots by being able to inline it. If I don't have to worry about the aliasing bugs due to the union, it doesn't matter if it gets inlined. This will, of course, mean making it a static function inside an object. Perhaps this should go in math cache, then...

 

It seems frustrating to me that we have to add in an extra op to stop the compiler from doing something stupid. But it seems that it's either that or disable strict aliasing for a lot of code.

Edited by KarenRei

Share this post


Link to post
Share on other sites

Well in my opinion the new code for invsqrt is worth using.

 

Let us not forget that the client is build on at least 3 different platforms with about 4-5 different compilers. Taking this into consideration I think we can sacrifice a bit of speed in favor of portability. We don't know if that '-fno-strict-aliasing' option can be enabled in all of them (I didn't see it it VC++; maybe I missed it) What's more, by using and union we get rid of the call to invsqrt_workaround.

 

I tested this new invsqrt WITH SSE2 enabled and it works ok, so even though there is an union, we might actually gain by SSE2 optimization. :)

Share this post


Link to post
Share on other sites

Compile with

$ gcc invsqrt.c -S -O2 -fno-strict-aliasing

and look at the generated code. The difference between invsqrt and invsqrt2 is not that big. A quick look and I think there may be two additional stores to the stack in my version. The stack is in cache memory so that is not a big impact I think. Is it worth the extra complication of having a separate file compiled with different options, I don't know?

 

Oh, I'm typing too slow, I see you already answered this yourself, well... :)

Share this post


Link to post
Share on other sites

Yeah, I think I've been neglecting the benefits of inlining a bit. I'm going to move all of the simple math functions from eye_candy.cpp to math_cache.h. It'll slow compiling down a tad, but it will give the code the option to inline those functions.

 

You know, I think we're actually going to end up with faster code after all of this. :)

Share this post


Link to post
Share on other sites

Yeay! I got a crash!

 

Normally that's not something to celebrate, except that it appears that my recent changes to hook in many more special effects are causing a crash right where someone earlier had reported problems with one. I wasn't able to reproduce it at the time, but now I can. Which means that I can now fix it.

 

I'll keep you all posted.

 

... And, I think I've got it. But it's too late for me to fix it now. I'll try out my theory tomorrow.

Edited by KarenRei

Share this post


Link to post
Share on other sites

invsqrt() and SSE

Don't forget that SSE/MMX2 already includes reciprocal sqrt functions,
and
(four operations at once)
. These are equivalent to two iterations of the approximation used by the invsqrt function, giving ~14bit accuracy in 4-7 cycles. They are accessible from GCC, ICC, and VC through the Intel definitions.

 

With the degree of approximation in invsqrt(),
do not use it
where the results may be compared, particulary if the difference between two results is important.

 

Share this post


Link to post
Share on other sites
With the degree of approximation in invsqrt(), do not use it where the results may be compared, particulary if the difference between two results is important.

 

I never do. :medieval: I tend to avoid floating point equality comparisons altogether; they're horribly unreliable unless you're looking for the chance that the exact same operations have been conducted on both variables. Rarely does a floating point equality check even make much sense. Greater than or less than comparisons, sure; they're just fine.

 

Don't forget that SSE/MMX2 already includes reciprocal sqrt functions, rsqrtss and rsqrtps (four operations at once). These are equivalent to two iterations of the approximation used by the invsqrt function, giving ~14bit accuracy in 4-7 cycles. They are accessible from GCC, ICC, and VC through the Intel definitions.

 

Not familiar with those functions, so thanks for the link. I have no use for four square roots at once, but rsqrtss may well be faster than invsqrt, since it's hardware. However, it appears to be an assembler statement, which would raise portability issues; I'd hesitate to use it. Are there any portable functions that I could call for "fast inverse square root" that would make use of these statements on appropriate architectures? I've benchmarked invsqrt() versus 1.0 / sqrt() with SSE2, and invsqrt is much faster. Any other portable square root functions?

 

Just to head off anyone who might be thinking it, I already eliminiated square roots wherever possible. Cases where there was a square root over another square root were turned into a single square root. Cases where two distances were being compared to each other, I instead compare the distances squared. Etc. However, there are still many square root cases left that cannot be optimized out.

Edited by KarenRei

Share this post


Link to post
Share on other sites

It wasn't floating point equality I was thinking of (which as you say is bad practice), but other comparisons, such at taking differences; for two similar values the difference can easily be less than the error, and you then have garbage in the system. The use of the approximation is ok if it is your answer, but be wary of the consequencies if it is an intermediate result.

 

SSE/MMX functions can be found hidden away in "xmmintrin.h" (on GCC and VC), which is:

Implemented from the specification included in the Intel C++ Compiler User Guide and Reference, version 8.0.

I haven't got a quick introduction to using these functions, but google should provide.

 

Incidentally, there was a reference that doing "X * RSQRTSS (X)" was quicked than plain old "sqrt (X)"; but I suspect that is very dependant on whether sqrt() has been implemented as a builtin operation.

Edited by trollson

Share this post


Link to post
Share on other sites
but other comparisons, such at taking differences

 

One of my favorite graphics books that I had in the mid 90s was the "Zen of Graphics Programming" by Michael Abrash. At the start of one chapter, he mentions how one of the asteroids in "The Empire Strikes Back" was a potato, while another was a shoe. How on Earth did they get away with that? Because everything is very "busy". Human eyes are very poor at perceiving imperfections in graphics when there's a lot of motion going on. You can cut corners left and right to improve speed. Even if it gives you some "wrong" answers, even ones with perceptable results, if those results are overshadowed by everything else that's going on, the viewer won't notice. In short, it's not a scientific application; all that matters in the end is whether the viewer can perceive any visual errors. I've tested with the quake invsqrt function and regular inverse square roots. No visual difference.

 

Just a few examples of where square roots are used.

* Determining the size to draw point sprites (if the size is off by a pixel, who will notice?).

* Gradually shifting one velocity vector into another (if the vec shifts slightly faster or slower, who will notice?).

* Maintaining "energy" levels in a gravity mover (simulating a point gravity source is prone to accumulated errors, so you have to regularly renormalize) (if the energy level is off for a frame, who will notice?)

* Determining the gravitational force for particles in a gravity mover (if the force is slightly off, who will notice?)

* Determining distances where you can't simplify by comparing two squared distances (many places) (if the distance determined is slightly wrong, who will notice?)

* Scaling particles as the level of detail changes (X times as many particles generally implies a particle size 1.0/sqrt(X) times the old size) (if it's not exact on the size scalar, who will notice?)

 

And so on.

 

Anyways: I'll look into that header and see what I can find. Thanks!

Edited by KarenRei

Share this post


Link to post
Share on other sites

I just did some benchmarks. Conclusion: xxmintrin.h is looking pretty darn nice! Once again, the quake routine blows away the default square root (by more than my last tests; interesting. It is on a different computer, though). However, even it gets blown away by the fast hardware routine, which seems to run almost as fast as the loop to determine overhead.**

 

Assuming that it's as portable as you say it is, I'll hook this in surrounded by #ifdef __SSE__. Yeay, even more performance. ;) Hmm, while I'm at it, have you heard of any fast power functions? I do exponents a good bit too -- they're needed to, among other things, deal with the fact that frames don't come at constant time intervals apart. If you want a particle's alpha to be cut in half every second, you need alpha *= (0.5 ^ time_in_seconds). I googled a while back and couldn't find any, so I wrote a powf cache that does linear interpolation between datapoints. It's better than nothing (the special case of 0.5 ^ x is ~40% the speed of powf; the case of 0.0 - 1.0 ^ x is ~75% the speed of powf), but I'm sure it could be done a lot better.

 

** While I didn't review the assembly code to make sure the compiler wasn't cheating, I did put in cheat-preventing features, such as storing up a sum of the accumulated results, then printing the sum so that the program has to actually compute the results. My best guess is that, since it can't unroll the for loop due to it's huge size, the pipeline that does the square root completes before the pipeline that deals with increasing the loop counter, the loop's conditional branch, and the sum's floating point addition.

Edited by KarenRei

Share this post


Link to post
Share on other sites
valgrind does have stuff about eye candy, however; here's a sample:
==27001== Invalid read of size 4

well, I found the cause of this :(
Valgrind

The NVIDIA OpenGL implementation makes use of self modifying code. To force Valgrind to retranslate this code after a modification you must run using the Valgrind command line option:

--smc-check=all

 

Without this option Valgrind may execute incorrect code causing incorrect behavior and reports of the form:

==30313== Invalid write of size 4

http://us.download.nvidia.com/XFree86/Linu...appendix-l.html

it's not anything in your code :)

Share this post


Link to post
Share on other sites

Interesting :(

 

I'm getting close to fixing all of the crash-potential issues here, so there should be a patch checked in tonight that will hopefully fix everyone's problems. Plus add in a number of new effects :)

Share this post


Link to post
Share on other sites

Latest update:

 

* Speed improvements

* Likely fixed the crash that some had reported.

* Many new special effects:

** Fountains

** Various smokes

** Teleporters

** Teleport in/out

** Various fires, torches, and candles

* Stayed up way, way, way too late.

 

As always, please let me know if I introduced any new bugs that I didn't catch. ;)

Share this post


Link to post
Share on other sites

I'll add a note that I plan to strip out the wind from the current wind effect into a separate object that can be accessed by other effects, including non-eye-candy effects. This would include smoke that blows in the wind, which I think would look pretty nice.

Share this post


Link to post
Share on other sites
Hmm, while I'm at it, have you heard of any fast power functions? I do exponents a good bit too -- they're needed to, among other things, deal with the fact that frames don't come at constant time intervals apart. If you want a particle's alpha to be cut in half every second, you need alpha *= (0.5 ^ time_in_seconds).
The various extensions don't seem to include exp or pow functions[2].

 

The only thing I saw mentioned [1] is that using log() and exp() is "much" faster than pow(). Try comparing:

y = exp (p * log (x);
y = pow (x, p);

And see if it helps (you may be able to cache log(x)?).

 

[1] Other than citations or abstracts to papers that are pay-to-read.

[2] The athlon maths library includes 'fastpow' and 'fastexp', but I'm not sure what that is exactly.

Share this post


Link to post
Share on other sites

As always, please let me know if I introduced any new bugs that I didn't catch. :)

 

I got one :) Beer for me :)

 

extern "C" void ec_delete_effect_loc_type(float x, float y, ec_EffectEnum type)
{
 for (int i = 0; i < (int)references.size(); )
 {
std::vector<ec_internal_reference*>::iterator iter = references.begin() + i;
if ((*iter)->dead)
{
  delete *iter;
  references.erase(iter);
  continue;
}

if (((*iter)->position.x == x) && ((*iter)->position.z == -y) && (type == (ec_EffectEnum)(*iter)->effect->get_type()))
{
  (*iter)->effect->recall = true;
  continue;
}
i++;
 }
}

 

In my case:

references.size() == 3

i = 2

 

if ((*iter)->dead) == false

 

if (((*iter)->position.x == x) && ((*iter)->position.z == -y) && (type == (ec_EffectEnum)(*iter)->effect->get_type())) == true

 

so... we continue and never increase the i. The i stays at 2 and we have infinite loop :)

 

Happens at random time after starting campfire effect.

 

Call stack

>	elc_d.exe!ec_delete_effect_loc_type(float x=52.750000, float y=73.250000, ec_EffectEnum type=EC_CAMPFIRE)  Line 212	C++
 elc_d.exe!remove_fire_at_tile(unsigned short x_tile=105, unsigned short y_tile=146)  Line 587 + 0x15 bytes	C
 elc_d.exe!process_message_from_server(const unsigned char * in_data=0x0d911a00, int data_length=7)  Line 986 + 0x15 bytes	C
 elc_d.exe!start_rendering()  Line 110 + 0x12 bytes	C
 elc_d.exe!Main(int argc=-1, char * * argv=0x00000000)  Line 251	C
 elc_d.exe!WinMain(HINSTANCE__ * hInst=0x00400000, HINSTANCE__ * hPrev=0x00000000, char * lpCmd=0x00151ef7, int nShow=1)  Line 306 + 0xd bytes	C
 elc_d.exe!__tmainCRTStartup()  Line 589 + 0x35 bytes	C
 elc_d.exe!WinMainCRTStartup()  Line 414	C

Share this post


Link to post
Share on other sites

Ack, yep, that'll do it. :) No campfire effects happened to be recalled while I was on, I suppose.

 

I'll get that fix in tonight. If you want to try the fix out ahead of time, you just need to move the i++ up above that if statement. I'll check for any other code that might have this problem.

 

Ed: Unfortunately, looks like there's several of these. I recently changed a bit of code in this area to simply recall effects instead of outright deleting their references; however, if I don't delete the reference, the array doesn't shrink, so the loop never breaks. :)

 

I could check in the changes, but I haven't been able to test them (I'm at work) beyond compiling. Should I? They're pretty straightforward, and if there's a lockup, seems like checking is better than not checking in.

Edited by KarenRei

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

×