Jump to content
Eternal Lands Official Forums
Rhonan

Client crash: out of memory

Recommended Posts

O/S: Fedora Linux, 32-bit

 

There appears to be an issue when updates (like custom clothing) are being downloaded. The files download for a while, then downloads just stop, and then changing maps to a large map like White Stone will crash the client.

 

The client crash appears to be caused by an unchecked pointer in map.c:build_path_map() where the calloc() fails, pf_tile_map is assigned NULL, and it is never checked. I'm not sure what you would want to do in the case that it returns NULL and you can't build the map, but at least reporting the failed calloc() would be a start and maybe exiting gracefully. I'll continue to track down where the memory is leaking to.

 

static __inline__ void build_path_map()
{
       int i, x, y;

       //create the tile map that will be used for pathfinding
       pf_tile_map = (PF_TILE *)calloc(tile_map_size_x*tile_map_size_y*6*6, sizeof(PF_TILE));

       i = 0;
       for (y = 0; y < tile_map_size_y*6; y++)
       {
               for (x = 0; x < tile_map_size_x*6; x++, i++)
               {
                       pf_tile_map[i].x = x;
                       pf_tile_map[i].y = y;
                       pf_tile_map[i].z = height_map[i];
               }
       }
}

Edited by Rhonan

Share this post


Link to post
Share on other sites

O/S: Fedora Linux, 32-bit

 

There appears to be an issue when updates (like custom clothing) are being downloaded. The files download for a while, then downloads just stop, and then changing maps to a large map like White Stone will crash the client.

 

The client crash appears to be caused by an unchecked pointer in map.c:build_path_map() where the calloc() fails, pf_tile_map is assigned NULL, and it is never checked. I'm not sure what you would want to do in the case that it returns NULL and you can't build the map, but at least reporting the failed calloc() would be a start and maybe exiting gracefully. I'll continue to track down where the memory is leaking to.

 

static __inline__ void build_path_map()
{
       int i, x, y;

       //create the tile map that will be used for pathfinding
       pf_tile_map = (PF_TILE *)calloc(tile_map_size_x*tile_map_size_y*6*6, sizeof(PF_TILE));

       i = 0;
       for (y = 0; y < tile_map_size_y*6; y++)
       {
               for (x = 0; x < tile_map_size_x*6; x++, i++)
               {
                       pf_tile_map[i].x = x;
                       pf_tile_map[i].y = y;
                       pf_tile_map[i].z = height_map[i];
               }
       }
}

 

===== Update ====

Underlying issue appears to be missing SDL_WaitThread. According to the SDL documentation wiki:

 

"Even after the procedure started in the thread returns, there still exist some resources allocated to the thread. To free these resources, use SDL_WaitThread to wait for the thread to finish and obtain the status code of the thread. If not done so, SDL_CreateThread will hang after about 1010 successfully created threads (tested on GNU/Linux)."

 

The appropriate solution is to add a non-local thread pointer in update.c to track the thread so that SDL_WaitThread can be called when the thread completes.

 

Top of file, declare:

SDL_Thread *download_cur_thread = NULL;

In handle_file_download(), where download_cur_file is set to NULL:

// Add appropriate error checking to make sure download_cur_thread is not NULL
SDL_WaitThread(download_cur_thread, NULL);
download_cur_thread=NULL;

In http_threaded_get_file(), modify the thread creation call:

download_cur_thread = SDL_CreateThread(&http_get_file_thread_handler, (void *) spec);

 

My files are now all messed up with debugging code, but if you need a clean diff, let me know. With the above code added, instead of hanging after ~246 custom files downloaded, the client continues to download more files. And I can enter the White Stone map successfully after 15 minutes, whereas before the client would segfault.

 

[Rhonan@devhost .elc]$ grep Finished error_log.old | wc -l

246

[Rhonan@devhost .elc]$ grep Finished error_log.txt | wc -l

776

 

 

There is another call to SDL_CreateThread in update.c:do_updates() that should probably be fixed as well, but it's only a single call and unlikely to cause issues.

 

 

 

If you have questions, feel free to PM me in game. I think the fix is relatively straighforward.

 

--Rhonan

Edited by Rhonan

Share this post


Link to post
Share on other sites

This may be useful:

 

 

RCS file: /cvsroot/elc/elc/update.c,v
retrieving revision 1.48
diff -r1.48 update.c
49a50
> SDL_Thread    *download_cur_thread;
76a78
> 		download_cur_thread= NULL;
408a411
> 	SDL_WaitThread(download_cur_thread,NULL);
409a413
> 	download_cur_thread = NULL;
481c485
< 	SDL_CreateThread(&http_get_file_thread_handler, (void *) spec);
---
> 	download_cur_thread = SDL_CreateThread(&http_get_file_thread_handler, (void *) spec);

Share this post


Link to post
Share on other sites

Fantastic! This looks like a strong possibility for a long standing issue. I'll give your fix a test and hopefully we can lay this particular bug finally to rest. Thanks for investigating. :)

Share this post


Link to post
Share on other sites

Wow, that was a hug memory leak. I've committed a fix to CVS based on your suggestion. Again a big thank you Rhonan. If this is, as I suspect the source of the Linux client crash when custom clothes are enabled, then I've spent ages looking for it. If only I had looked at the memory usage! As for checking the return status of mallocs. Of course we should do that but there are many places to fix and all the crash reports have been different places. A job for another day....

Edited by bluap

Share this post


Link to post
Share on other sites

Thanks for the fix! It looks like it should work, and not have an issue with parallel downloads of updates and custom clothing files that my abbreviated patch would have had. I'll check it out when I get a chance.

 

By the way... are there special compile time options you use? or to you make a specific target for the linux machines? When I do

 

# make -f Makefile.linux

 

my executable is over 1M, whereas the one that comes in the zip is under 700K. If I strip the symbols, mine goes down to under 350K, so I'm guessing that's not it.

 

Thanks for the quick implementation. I appreciate it, and I'm sure others will, too. :)

 

-Rhonan

Share this post


Link to post
Share on other sites

The default build includes symbol information for debuggers. "make -f Makefile.linux release" is optimised and excludes the symbols. The versions included with the .zip download are compiled with "make -f Makefile.linux static". I built the last two releases using this target on Ubuntu hardy machines (other platforms may or may not work out of the box), the static build includes most of the required system libraries built into the executable and so, while it will run on many different distros, is much bigger. All these versions can be stripped to some extent.

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

  • Recently Browsing   0 members

    No registered users viewing this page.

×