Eternal Lands Official Forums
Wytter

Crash On Map Change


Greetings all,

 

I just want to know if we still have the problem where the client would suddenly crash on map change? Anyone have any recent info about this?

 

And remember, it must be from a non-modified CVS from say the past 2 weeks.

Edited by Wytter


I'm not quite sure what you mean.

 

But the closest thing that happens to me is that whenever I walk from Portland into Desert Pines and then switch to the map, the map will freeze up and I have to wait a while until it unfreezes. I just assumed it was because of my computer. I don't know if that's what you meant or not.


OK, so the problem persists.

Could anyone with the CVS client do us a favor? Type #log conn data when you enter a new map (you can turn this off afterwards). If you crash, please post the last lines of connection_log.txt, which you'll find in your EL folder (or ~/.elc/ on *nix). That way we'd know whether it's, e.g., a stray message that causes these crashes, or at least we'd know the last command sent to the client before the crash...

 

I have never been able to catch this with a debugger, so it seems it must be done by hand... Will try debugging it later - garg cave might be a good place to try reproducing it (if that is true, perhaps it's a problem with accessing a stray pointer in actors_list? :-\)


OK, will try it now. Using CK's CVS Windows client.

 

 

Edit: Tried a few times, did not crash this time though. I'll keep trying.

Edited by Derin


Sorry, Wytter,

this error is hard to get. I was lucky that I caught it under Windows with GCC.

 

Normally my Linux client crashes 1-2 times a month when changing maps, and Linux is what I mostly play on ;(

 

All I can say is that the client crashes when clicking on a door, a post or a cave entrance. There's a loop where all actors are checked to see whether the click landed on one of them.

And one actor in actors_list is rotten. It's not a NULL pointer; it's a regular-looking pointer value.

 

So my thought is that it is a pointer to freed memory, and that the freed memory was never removed from actors_list.
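
To make the stale-pointer theory concrete, here is a minimal sketch in C of how a list slot can be cleared when its actor is freed, so a later click test never dereferences freed memory. Everything in it is a hypothetical stand-in and not the client's real code: MAX_ACTORS, the trimmed-down actor struct, add_actor, destroy_actor_slot and find_actor_at are all made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_ACTORS 1000   /* hypothetical capacity, not taken from the client */

/* Simplified stand-in for the client's actor struct (assumption). */
typedef struct {
    int actor_id;
    float x_pos, y_pos;
} actor;

/* Static storage, so every slot starts out as NULL. */
static actor *actors_list[MAX_ACTORS];

/* Allocate an actor and store it in the first free slot.
 * Returns the slot index, or -1 if the list is full or allocation fails. */
int add_actor(int actor_id)
{
    for (int i = 0; i < MAX_ACTORS; i++) {
        if (actors_list[i] == NULL) {
            actor *a = calloc(1, sizeof *a);
            if (a == NULL)
                return -1;
            a->actor_id = actor_id;
            actors_list[i] = a;
            return i;
        }
    }
    return -1;
}

/* Free an actor AND clear its slot, so no stale pointer stays in the list. */
void destroy_actor_slot(int slot)
{
    assert(slot >= 0 && slot < MAX_ACTORS);
    free(actors_list[slot]);    /* free(NULL) is a harmless no-op */
    actors_list[slot] = NULL;   /* without this line the slot would dangle */
}

/* Click test: skip empty slots before dereferencing anything.
 * Returns the id of the actor at (x, y), or -1 if there is none. */
int find_actor_at(float x, float y)
{
    for (int i = 0; i < MAX_ACTORS; i++) {
        actor *a = actors_list[i];
        if (a != NULL && a->x_pos == x && a->y_pos == y)
            return a->actor_id;
    }
    return -1;
}
```

Note that the NULL check in find_actor_at only helps if every code path that frees an actor also clears the slot. A pointer to freed memory looks like any other valid pointer, which would match the "regular-looking value" seen in the crash.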

 

Piper

 

PS: And, of course, I'm using a modified client built from the CVS sources. But I isolated all my changes in a separate file which I can compile standalone and test with valgrind. Unfortunately I can't do that with the full client, because valgrind crashes and reports a crash in a library. So I'm pretty sure that it's not my modifications that cause the crash, because my changes don't touch actors_list or have any memory problems.

Edited by The_Piper

You wouldn't happen to have a backtrace anywhere Pip? :-\

Not really ;((

 

If you mean the "where" command of GDB, maybe I should always start the client under GDB. But the crashes show up very seldom here...

And, at least as I see it, the problem is not the crashing client but the invalid data that is still in actors_list, and a backtrace wouldn't show where that data was added or where the freed memory wasn't removed from actors_list.

 

Piper


Nope, but a backtrace that doesn't help you directly tells a lot more than no backtrace :)

 

I don't know how it's implemented on Windows, but I'd like everyone using the CVS client on GNU/Linux to run their client in gdb. First you must compile it with debug information, but that is the default make target.

 

Next, start gdb with the program as the first argument, type run and press enter and the client will start:

bjorn@darkhelmet elc $ gdb el.x86.linux.bin
GNU gdb 6.2.1
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu"...Using host libthread_db library "/lib/tls/libthread_db.so.1".

(gdb) run

 

If the client crashes, the window will not close; instead, gdb will report that an error has occurred. In this case I made a mistake on purpose (tried accessing a NULL pointer) and caused the client to crash. You'll see something similar to the following:

 

Starting program: /home/bjorn/elsrc/dev/elc/el.x86.linux.bin
warning: Unable to find dynamic linker breakpoint function.
GDB will be unable to debug shared library initializers
and track explicitly loaded dynamic code.
[Thread debugging using libthread_db enabled]
[New Thread 182915818272 (LWP 2867)]
[New Thread 1082128736 (LWP 2870)]
[New Thread 1090517344 (LWP 2871)]
[Thread 1090517344 (zombie) exited]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 182915818272 (LWP 2867)]
0x000000000043a436 in HandleEvent (event=0x7fbffff2d0) at events.c:46
46              actors_list[200]->x_pos=0;
(gdb)

 

Now, the first thing you'd do to check what could have caused this would be to print the value of the variable involved in the crash.

 

(gdb) print actors_list[200]
$1 = (actor *) 0x0
(gdb)

 

Oh, so we tried accessing a NULL-pointer - that'll explain the error :)

 

Now, in most cases it's not that obvious, and we'd want a backtrace. To get a backtrace do the following:

 

(gdb) backtrace
#0  0x000000000043a436 in HandleEvent (event=0x7fbffff2d0) at events.c:46
#1  0x000000000044de05 in start_rendering () at main.c:48
#2  0x000000000044e106 in main (argc=1, argv=0x7fbffff3f8) at main.c:149
(gdb)

 

It simply shows the chain of function calls that led to the crash, innermost first, and that can be used to locate the error.

 

This was a small crash course on using gdb on GNU/Linux - could anyone make a similar one for debugging on Windows?

Edited by Wytter


OK, I'm still trying to crash it again, but here's a tail -10 of function_log.txt from when I crashed:

multiplayer.c.process_message_from_server:315
multiplayer.c.process_message_from_server:344
multiplayer.c.process_message_from_server:435
map_io.c.load_map:297
map_io.c.destroy_map:9
multiplayer.c.process_message_from_server:315
multiplayer.c.process_message_from_server:353
multiplayer.c.process_message_from_server:819
actor_scripts.c.get_actor_heal:662
multiplayer.c.process_message_from_server:344

 

EDIT: I note a get_actor_heal right before the crash. I'm at full health now, and I seem unable to reproduce it, so I'll go hurt myself ;)

Edited by Grum


OK, we really need all the help we can get to eliminate this bug. I added a new target to the CVS called EXTRA_DEBUG.

 

To help us, compile a client from CVS with -DEXTRA_DEBUG and change maps until you hit the bug. If / when you hit it, DO NOT OPEN THE CLIENT AGAIN! This is very important!

 

You must instead copy and paste the last 10-20 lines of the file called function_log.txt, which you'll find in the same place as your chat log and el.ini.

 

When you hit the bug and post the content, please try our fix by compiling with -DPOSSIBLE_FIX and check if it's still crashing.


OK, got a crash entering Nordcarn.

Here comes function_log.txt:

multiplayer.c.process_message_from_server:324
multiplayer.c.process_message_from_server:324
multiplayer.c.process_message_from_server:324
multiplayer.c.process_message_from_server:324
multiplayer.c.process_message_from_server:324
multiplayer.c.process_message_from_server:333
actor_scripts.c.destroy_actor:494
multiplayer.c.process_message_from_server:324
multiplayer.c.process_message_from_server:344
multiplayer.c.process_message_from_server:435
map_io.c.load_map:297
map_io.c.destroy_map:9
multiplayer.c.process_message_from_server:491
multiplayer.c.process_message_from_server:653
multiplayer.c.process_message_from_server:315
multiplayer.c.process_message_from_server:344
multiplayer.c.process_message_from_server:435
map_io.c.load_map:297
map_io.c.destroy_map:9
multiplayer.c.process_message_from_server:653

 

Also, I have a crappy connection today and get tons of resyncs; maybe it's related somehow.

Edited by pavel


Thanks for your feedback. If this happens to you, could you try with -DPOSSIBLE_FIX? And could anyone compile a version for Windows and Linux with -DPOSSIBLE_FIX so we can see if we fixed it..? :P

 

 

If POSSIBLE_FIX doesn't fix the problem, and you are capable of programming, please add the following in strategic places in the source (dig your way down) and see if you can find it:

 

#ifdef EXTRA_DEBUG
      ERR();
#endif
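
For anyone wondering what such a marker might do: the entries in function_log.txt posted in this thread have the shape file.function:line. Below is a minimal sketch in C of a macro that writes entries in that shape. It is only an assumption about how ERR() could work, not the definition from CVS; base_name and log_location are hypothetical helper names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Strip any leading directory from a path such as __FILE__ (handles '/' only). */
const char *base_name(const char *path)
{
    const char *slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}

/* Append one "file.function:line" entry to the log. Returns 0 on success, -1 on failure.
 * The file is opened and closed for every entry so the line is handed to the
 * operating system before the client has a chance to crash. */
int log_location(const char *log_path, const char *file, const char *func, int line)
{
    assert(log_path != NULL && file != NULL && func != NULL);
    FILE *f = fopen(log_path, "a");
    if (f == NULL)
        return -1;
    fprintf(f, "%s.%s:%d\n", base_name(file), func, line);
    fclose(f);
    return 0;
}

/* Hypothetical definition: active only when compiled with -DEXTRA_DEBUG. */
#ifdef EXTRA_DEBUG
#define ERR() log_location("function_log.txt", __FILE__, __func__, __LINE__)
#else
#define ERR() ((void)0)
#endif
```

With a definition along these lines, dropping ERR(); at the top of a suspect function leaves a trail whose last entry is the last marker reached before the crash. Reopening the file on every call is slow, but for a debug-only build that trade-off seems acceptable.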

 

We are, though, pretty sure that the bug has been fixed in POSSIBLE_FIX - in any case, it fixes some possible threading errors.

Edited by Wytter


I compiled it yesterday with -DPOSSIBLE_FIX and tried to enter Nordcarn again. It didn't crash, but it became locked after some retries. The client displayed the NC cemetery instead of the entrance and I couldn't move. I could eat though.

I compiled it yesterday with -DPOSSIBLE_FIX and tried to enter Nordcarn again. It didn't crash, but it became locked after some retries. The client displayed the NC cemetery instead of the entrance and I couldn't move. I could eat though.

Hmm, sounds like some strange lockup during the slide-in with draw_scene... Could it be the timer locking up?

Edited by Wytter


Could you try to add an event like the following (HandleEvent in events.c):

 

if(event->key.keysym.sym==SDLK_F5) SDL_SetTimer(0, NULL);

And tell me if it'll lock up in the same way that you experienced?

Edited by Wytter


OK, added some code to the CVS that checks for a timer failure and restarts it if that happens.

I compiled it yesterday with -DPOSSIBLE_FIX and tried to enter Nordcarn again. It didn't crash, but it became locked after some retries. The client displayed the NC cemetery instead of the entrance and I couldn't move. I could eat though.

That happened to me once, with the official client.

I only remember lagging like hell and trying to get from IP to WS

I compiled it yesterday with -DPOSSIBLE_FIX and tried to enter Nordcarn again. It didn't crash, but it became locked after some retries. The client displayed the NC cemetery instead of the entrance and I couldn't move. I could eat though.

That happened to me once, with the official client.

I only remember lagging like hell and trying to get from IP to WS

cvs up and it should never happen to you again ;-)


Hello Wytter,

 

this never happened to me. After you asked for help in-game yesterday, I updated my CVS sources, recompiled using the -DEXTRA_DEBUG define and ran the game for ~3 hours within gdb. I changed maps excessively in the garg cave on the Whitestone map and also changed maps from Whitestone to Nordcarn many times. Later I was in Portland, harvested some blue lupines, and changed maps to the flower shop in Portland quite a few times. When leaving the flower shop I immediately open the map screen to click back to the flowers, but it didn't crash once.

 

I use a CVS client most of the time, but the game has never crashed or frozen for me.

 

Actually I do not really know the source code very well. Is there a place where it is reasonable to put a breakpoint for this?

 

cvs up and it should never happen to you again ;-)

Or does this mean, this problem is now fixed?


Well, _some_ bugs have been fixed - but they are basically theoretical and could just theoretically lead to a bug like this (we are simply not sure if they were causing this, but it is very likely).

 

The threading bug with map changes occurs almost never in gdb - at least I have never been able to catch it. However, outside gdb I was capable of catching it with my old AthlonXP, but I am beginning to fear that my Athlon64 3500+ is too fast to catch this (unless I am _extremely_ lucky).

 

We believe that the bug has been fixed, but I simply cannot say for sure until we have more people testing with -DPOSSIBLE_FIX :-)

 

The "cvs up... never.. again" was for the timer related bug - it now checks if a timer is lagging severely behind, which would mean that it has stopped... If that's the case the timer will be restarted.

 

In case anyone didn't see and can't get the newest CVS to compile, I cleaned up the timers (and added one timer that repeats twice a second, for general purpose usages) and added them to timers.{c,h}. We were using the SDL_SetTimer which is obsolete - I changed that to SDL_AddTimer instead.

Edited by Wytter


I just saw that I had also been compiling with POSSIBLE_FIX defined, because it is already defined on the CFLAGS line. I added it to the OPTIONS line, and when recompiling I noticed that it was defined twice.

If the threading bug doesn't seem to be triggered under gdb, I will continue running without gdb, just as always.


Yep, since we can't seem to trigger it in gdb, we created the EXTRA_DEBUG target that'll help us in at least tracking it down. I just found some other potential errors (with strange compiler optimization patterns) in actor_scripts.c and added the new POSSIBLE_FIX to cvs.


Thanks to CrusadingKnight there's now a Windows build available here:

http://bjorn_mm.users.whitehat.dk/elc_patches/EL.zip

 

Please note down any unusual activity (you might get a message saying that your timer is restarting - please give feedback on when this happens, how high the load was on your system etc.). But the main purpose is to test if the client crashes at any time...

 

Since this is a much newer client than the one that was released in August, here are a few threads that'll help you fix any problems that you may experience:

 

Crash on startup:

Get the e3dlist.txt:

http://cvs.berlios.de/cgi-bin/viewcvs.cgi/...type=text/plain

 

You might want to use a newer el.ini, but it's not required:

http://cvs.berlios.de/cgi-bin/viewcvs.cgi/...type=text/plain

 

You might run into problems with letters showing up in the water. To get it to work, you will need to do the following:

Move textures/sky.bmp to tiles/tile0.bmp

Move textures/water2.bmp to tiles/tile231.bmp.

Any new water textures should be called tile232.bmp ... to tile254.bmp. If more water tiles are needed, it's easy to change (the macro is_water_tile in reflection.h)

To avoid having to update existing maps, tile0 is automatically replaced by tile231 in dungeons.
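
The tile numbering above can be sketched in C roughly as follows. This is an illustration only, not the contents of reflection.h: the constant names and effective_tile are hypothetical, is_water_tile is written as a function instead of a macro, and treating tile 0 as water is an assumption based on it being substituted by a water tile in dungeons.

```c
#include <assert.h>

#define FIRST_WATER_TILE 231   /* tiles/tile231.bmp, formerly textures/water2.bmp */
#define LAST_WATER_TILE  254   /* raise this if more water tiles are needed */

/* Non-zero if the tile index refers to water.
 * Assumption: tile 0 (the former sky texture) counts as water as well. */
int is_water_tile(int tile)
{
    return tile == 0 || (tile >= FIRST_WATER_TILE && tile <= LAST_WATER_TILE);
}

/* Tile actually drawn: in dungeons tile 0 is replaced by tile 231,
 * so existing maps do not have to be updated. */
int effective_tile(int tile, int in_dungeon)
{
    assert(tile >= 0);
    return (in_dungeon && tile == 0) ? FIRST_WATER_TILE : tile;
}
```

Keeping the range in two named constants means that adding water tiles is a one-line change.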

 

If you don't see any particles, get the following:

http://www.geocities.com/quelschanak/part.zip

 

Furthermore, if you haven't run one of the newer clients, you'll need to get some newer DLLs. They are mentioned in the following thread:

http://www.eternal-lands.com/forum/index.php?showtopic=9911

 

You might also need OpenAL:

http://developer.creative.com/articles/art...=OpenALwEAX.exe

Edited by Wytter

