Jump to content
Eternal Lands Official Forums
bluap

Eye candy bug? - harm spell

Recommended Posts

I've suspected for a while that the eye candy harm spell effect was causing the memory corruption that leads to a client crash on exit, in the SDL_Quit() call of main.c This was reported on Mantis here. While doing some other bug tracking, I found a 100% reproducible glGetError() was happening in the eye candy code, I eventually tracked this down and enabled some more debug. However, after quite some time trying, I still haven't got to the bottom of the problem so thought someone else might be able to have a look, rei for example? :P Anyway, here's my debug:

Index: eye_candy/eye_candy.cpp
===================================================================
RCS file: /cvsroot/elc/elc/eye_candy/eye_candy.cpp,v
retrieving revision 1.77
diff -a -u -r1.77 eye_candy.cpp
--- eye_candy/eye_candy.cpp	 9 Jan 2008 21:52:20 -0000	   1.77
+++ eye_candy/eye_candy.cpp	 15 Jan 2008 16:43:09 -0000
@@ -1949,8 +1949,12 @@

void EyeCandy::draw_point_sprite_particle(coord_t size, const GLuint texture, const color_t r, const color_t g, const color_t b, const alpha_t alpha, const Vec3 pos)
{
-//  std::cout << "A: " << size << ", " << texture << ", " << Vec3(r, g, b) << ", " << alpha << std::endl;
  glPointSize(size);
+  if (glGetError() != GL_NO_ERROR)
+  {
+	   std::cout << "A: " << size << ", " << texture << ", " << Vec3(r, g, b) << ", " << alpha << std::endl;
+	   return;
+  }
  glBindTexture(GL_TEXTURE_2D, texture);
  glBegin(GL_POINTS);
  {

Here's an example of the output you get if you cast a harm spell:

A: -3.323e+38, 4284743338, <nan, -3.37619e+38, -2.6318e+38>, 0

The size values are always negative or zero when the error occurs, to me this looks like uninitialised or invalid memory. Setting a gdb breakpoint on the std::cout, you can get a back trace, but for brevity, here's the call tree:

(gdb) where
#0  ec::EyeCandy::draw_point_sprite_particle (this=0x8376a20,
size=-0.466198832, texture=3218243508, r=-0.348312348, g=84.8962173,
b=-0.180342346, alpha=0, pos=@0xbf8cebfc) at eye_candy/eye_candy.cpp:1955
#1  0x0813ba73 in ec::Particle::draw (this=0xe28eeb0, usec=36133)
at eye_candy/eye_candy.cpp:764
#2  0x0818618d in ec::TargetMagicParticle::draw (this=0xe28eeb0, usec=36133)
at eye_candy/effect_targetmagic.cpp:286
#3  0x0813a26d in ec::EyeCandy::draw (this=0x8376a20)
at eye_candy/eye_candy.cpp:1649
#4  0x0812a642 in ec_draw () at eye_candy_wrapper.cpp:425
#5  0x08099fda in display_game_handler (win=0x8f312a8) at gamewin.c:1104
#6  0x0808d8cd in draw_window (win=0x8f312a8) at elwindows.c:1140
#7  0x0808df25 in display_window (win_id=0) at elwindows.c:1309
#8  0x0808afab in display_windows (level=1) at elwindows.c:76
#9  0x080842a6 in draw_scene () at draw_scene.c:116
#10 0x080bd9b8 in start_rendering () at main.c:168
#11 0x080bdd67 in main (argc=2, argv=0xbf8cf164) at main.c:307

I'm using default compile options, latest CVS, Debian Linux, nvidia graphics 1.0.8776-4. I did most of this testing (and this example) using real harm spells. However, eventually I remembered Florian's debug eyecandy window so gave that a try. It works very nicely indeed and can be used to reproduce the error. May be we should commit that code.... :P

Share this post


Link to post
Share on other sites

Well, that definitely appears to be a problem. You should never get NANs or extreme values (positive or negative) for size, color, velocity, or position. It's strange to see size messed up, though. I don't see anywhere that size gets changed in TargetMagicEffect::HARM. It's just set to its intiial defaults (either 12 or 2 + randcoord(6), and it stays that way. Perhaps what you're seeing is a side effect of a buffer overflow elsewhere? Unfortunately, buffer overflows can be anywhere in the program :P I'd start commenting out code, hacking out one bit of functionality after another. Also, might not be a bad idea to see how far back you can reproduce this -- try way earlier versions and see if you can reproduce it in the same way. That'd at least rule out a buffer overflow introduced by recent code changes.

Share this post


Link to post
Share on other sites

...Also, might not be a bad idea to see how far back you can reproduce this -- try way earlier versions and see if you can reproduce it in the same way. That'd at least rule out a buffer overflow introduced by recent code changes.

I've just checked out and built (with the above small patch) the elc_1_5_0 labelled version of the code. This has the same problem, with similar output e.g.

A: -1.40897e+38, 4279238569, <-2.81794e+38, nan, -3.11036e+38>, 0
A: -2.55216e+38, 4285792459, <nan, -3.04396e+38, -3.26993e+38>, 0
A: -8.17457e+37, 4270522010, <-1.52194e+38, -1.12316e+38, -1.33584e+38>, 0
A: -2.45906e+38, 4282646498, <nan, nan, nan>, 0
A: -1.22951e+38, 4278583153, <-2.28626e+38, -1.66817e+38, -1.22287e+38>, 0

Thanks for having a look at this KarenRei.

Share this post


Link to post
Share on other sites

I'm thinking not 1.5.0, but way, way back -- what version did eye candy first go into, 1.3.3 or so? I.e., far enough to ensure that it's no insidious buffer overflow that's snuck in since then. Because it looks like a buffer overflow, and that could be anywhere.

Edited by KarenRei

Share this post


Link to post
Share on other sites

I've tried to launch the client with valgrind to see if I can catch a buffer overflow but I wasn't successful. But I had a very bad fps with valgrind and AFAIK the idling of eye_candy is based on the fps so maybe I missed it. Is it possible?

Share this post


Link to post
Share on other sites

I've tried to launch the client with valgrind to see if I can catch a buffer overflow but I wasn't successful. But I had a very bad fps with valgrind and AFAIK the idling of eye_candy is based on the fps so maybe I missed it. Is it possible?

Tried that many times with no luck on this one. The FPS for me is bearable if I keep indoors.

 

I'm thinking not 1.5.0, but way, way back -- what version did eye candy first go into, 1.3.3 or so? I.e., far enough to ensure that it's no insidious buffer overflow that's snuck in since then. Because it looks like a buffer overflow, and that could be anywhere.

I was afraid you meant that. I can try but its going to be a difficult.... Was kinda hoping you'd be able to eyeball the code and spot the problem quick as a flash :pickaxe:

 

edit: presumably you've both been able to reproduce the problem, unlike Florian?

Edited by bluap

Share this post


Link to post
Share on other sites
edit: presumably you've both been able to reproduce the problem, unlike Florian?
I don't have the crash but I have the same errors than you when I put the code you posted.

However, I've had the crash with my old video card.

 

Anyway, I was successful to trigger the error in valgrind, the only thing to do was to lower the framerates to the minimum in the eye_candy options :blink:

And I obtain a lot of errors like the following ones:

 

==6702== 1564 errors in context 342 of 360:
==6702== Use of uninitialised value of size 4
==6702==	at 0x44BAED3: (within /lib/tls/i686/cmov/libc-2.6.1.so)
==6702==	by 0x44BBD05: __printf_fp (in /lib/tls/i686/cmov/libc-2.6.1.so)
==6702==	by 0x44B6AE6: vfprintf (in /lib/tls/i686/cmov/libc-2.6.1.so)
==6702==	by 0x44D9EF3: vsnprintf (in /lib/tls/i686/cmov/libc-2.6.1.so)
==6702==	by 0x472471B: (within /usr/lib/libstdc++.so.6.0.9)
==6702==	by 0x4732CEB: std::ostreambuf_iterator<char, std::char_traits<char> > std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::_M_insert_float<double>(std::ostreambuf_iterator<char, std::char_traits<char> >, std::ios_base&, char, char, double) const (in /usr/lib/libstdc++.so.6.0.9)
==6702==	by 0x4733204: std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::do_put(std::ostreambuf_iterator<char, std::char_traits<char> >, std::ios_base&, char, double) const (in /usr/lib/libstdc++.so.6.0.9)
==6702==	by 0x473E78F: std::ostream& std::ostream::_M_insert<double>(double) (in /usr/lib/libstdc++.so.6.0.9)
==6702==	by 0x473E8E3: std::ostream::operator<<(float) (in /usr/lib/libstdc++.so.6.0.9)
==6702==	by 0x814B0D0: ec::EyeCandy::draw_point_sprite_particle(float, unsigned, float, float, float, float, ec::Vec3) (eye_candy.cpp:1957)
==6702==	by 0x814ECF0: ec::Particle::draw(unsigned long long) (eye_candy.cpp:764)
==6702==	by 0x8199E75: ec::TargetMagicParticle::draw(unsigned long long) (effect_targetmagic.cpp:292)
==6702==
==6702== 471 errors in context 321 of 360:
==6702== Conditional jump or move depends on uninitialised value(s)
==6702==	at 0x4A2FC60: (within /usr/lib/libGLcore.so.100.14.19)
==6702==	by 0x814ECF0: ec::Particle::draw(unsigned long long) (eye_candy.cpp:764)
==6702==	by 0x8199E75: ec::TargetMagicParticle::draw(unsigned long long) (effect_targetmagic.cpp:292)
==6702==	by 0x814D4AF: ec::EyeCandy::draw() (eye_candy.cpp:1649)
==6702==	by 0x813C273: ec_draw (eye_candy_wrapper.cpp:425)
==6702==	by 0x809F2AC: display_game_handler (gamewin.c:1104)
==6702==	by 0x8092668: draw_window (elwindows.c:1140)
==6702==	by 0x8092CC0: display_window (elwindows.c:1309)
==6702==	by 0x808FD46: display_windows (elwindows.c:76)
==6702==	by 0x8088CE1: draw_scene (draw_scene.c:116)
==6702==	by 0x80C42C7: start_rendering (main.c:168)
==6702==	by 0x80C46B9: main (main.c:307)

 

I've checked in the code and valgrind has well spotted the problem (like always :P ). The problem is that the blur_motion array from the Particle class is build but never initialized because the default constructor of the class ParticleHistory is empty. Then if we look at the code where the function ec::EyeCandy::draw_point_sprite_particle is called, it always access to these values and never initialize them...

 

@Karen: as I don't know what your code do, please can you have a look to see what should be done about this initialization?

Share this post


Link to post
Share on other sites

....

Anyway, I was successful to trigger the error in valgrind, the only thing to do was to lower the framerates to the minimum in the eye_candy options :blink:

Fantastic news :P

Meanwhile, after regressing various packages on my system I managed to checkout and build a version from the about the time of the 1.4.0 release (it's not labelled in CVS as far as I can see). I hacked the version and protocol numbers (sorry Entropy, I was in the test server) and applied my mini patch. I get similar results as with the latest code, including the crash on exit. Hopefully that puts 1.5.0 code off the hook. Now to repair my system....

cvs  -z3 -d:pserver:anonymous@cvs.elc.berlios.de:/cvsroot/elc co -D "4 Jun 2007 00:00 gmt" elc

Share this post


Link to post
Share on other sites

Aha! That's what's different about Harm from all of the others -- it uses the motion blur! I didn't think any effects were currently using that. And while the particle position history used for motion blur will quickly fill with entries, by default it doesn't have any in it. It sets the alpha of all of the ParticleHistory entries to zero so that they won't draw, but doesn't mess with their size, which causes problems in the draw routine. Oh, duh... Nice catch, there, Schmurk :)

 

Seems the simplest answer is, as you suggested, to create a default constructor. So, in eye_candy.h, we'll just initialize everything to default values. Something like:

 

ParticleHistory()
{
 size = 0.00001;
 texture = 0;
 color[0] = 0.0;
 color[1] = 0.0;
 color[2] = 0.0;
 alpha = 0.0;
 pos = Vec3(0.0, 0.0, 0.0);
};

 

Anyone want to try this out?

Share this post


Link to post
Share on other sites

Anyone want to try this out?

Perfect! :)

Thank you both so much, this one's been on my nag list for ages. :P

Do you want me to commit the change?

 

On a related note, have you both tried Florian's eye candy debug testing window? It's an excellent tool for testing current and future effects. I'll commit that too if no one objects.

Edited by bluap

Share this post


Link to post
Share on other sites
On a related note, have you both tried Florian's eye candy debug testing window? It's an excellent tool for testing current and future effects. I'll commit that too if no one objects.

I didn't test it but I've seen a screenshot of it and it seems really nice. So yes, please commit it too :)

Share this post


Link to post
Share on other sites

On a related note, have you both tried Florian's eye candy debug testing window? It's an excellent tool for testing current and future effects. I'll commit that too if no one objects.

I have a more recent version with lots more spells. I wanted to upload a patch including my EC "bind spells to bones" changes and the test/debug window. But, as you may have seen on mantis, I'm currently quite busy with my private life, so if you don't mind I'll update the berlios patch with the half-finished current work.

Share this post


Link to post
Share on other sites

I have a more recent version with lots more spells. I wanted to upload a patch including my EC "bind spells to bones" changes and the test/debug window. But, as you may have seen on mantis, I'm currently quite busy with my private life, so if you don't mind I'll update the berlios patch with the half-finished current work.

There's no rush. Let me know when its ready.

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

  • Recently Browsing   0 members

    No registered users viewing this page.

×