bluap Report post Posted November 3, 2007 I have a repeatable crash when changing maps from Southern Redmoon Mines [171,110] (top of the stairs) back into the main cave. I have to be facing about due west and zoomed fully out. I'm using latest CVS RC build options, in Linux debian etch. My graphics is a GeForce4 Ti 4200. However, the crash also happens on the same machine using Ubuntu 7.10 and even if I run the windows RC145 under wine. It does not happen on my daughters laptop which uses Intel graphics. So I suspect by graphics card is to blame but bare with me. I'm also crash free if I build without CLUSTER_INSIDES defined. Neither gdb nor valgrind reveal anything useful so I set about sprinkling the code with printf() and finally tracked it down a glDrawArrays() call at line 763 of reflection.c in the draw_water_quad_tiles() function. I added some extra debug: Index: reflection.c =================================================================== RCS file: /cvsroot/elc/elc/reflection.c,v retrieving revision 1.123 diff -a -u -r1.123 reflection.c --- reflection.c 2 Oct 2007 07:29:23 -0000 1.123 +++ reflection.c 3 Nov 2007 22:01:47 -0000 @@ -760,7 +760,11 @@ } size++; } +printf("draw_water_quad_tiles final glDrawArrays start=%d stop=%d idx=%d size=%d id=%d\n", start, stop, idx, size, water_id ); +fflush(stdout); glDrawArrays(GL_QUADS, idx * 4, size * 4); +printf("glDrawArrays done\n"); +fflush(stdout); } void draw_lake_tiles() This shows that things are fine until size gets big. Here's the last few lines of debug output. Size is mostly zero, often 1 or 5 but can't cope with 19 or more it would appear. draw_water_quad_tiles final glDrawArrays start=0 stop=0 idx=0 size=0 id=90 glDrawArrays done draw_water_quad_tiles final glDrawArrays start=489 stop=508 idx=0 size=0 id=90 glDrawArrays done draw_water_quad_tiles final glDrawArrays start=0 stop=0 idx=0 size=0 id=90 glDrawArrays done draw_water_quad_tiles final glDrawArrays start=489 stop=508 idx=0 size=0 id=90 glDrawArrays done draw_water_quad_tiles final glDrawArrays start=0 stop=0 idx=0 size=0 id=90 glDrawArrays done draw_water_quad_tiles final glDrawArrays start=489 stop=508 idx=0 size=19 id=90 Segmentation fault I'm not familiar enough with openGL calls to know how to proceed but will happily add more debug if it helps. Share this post Link to post Share on other sites
bluap Report post Posted November 3, 2007 This shows that things are fine until size gets big. Here's the last few lines of debug output. Size is mostly zero, often 1 or 5 but can't cope with 19 or more it would appear. More testing. I've had a crash with size=1. I also note that size can be big and things are fine while walking around the top cave. Size may not be important... Share this post Link to post Share on other sites
Grum Report post Posted November 5, 2007 Can you try to get the values of water_buffer_reflectiv_index, water_buffer_usage, and idx when it crashes? We should have 0 <= idx < water_buffer_reflectiv_index for the first call to draw_water_quad_tiles() in draw_lake_tiles() (line 841) and water_buffer_reflectiv_index <= idx < water_buffer_usage for the second call (line 934). Share this post Link to post Share on other sites
bluap Report post Posted November 6, 2007 Can you try to get the values... Thanks Grum. It's always the second call to draw_water_quad_tiles() that causes the crash and always the three values are all zero. I guess that breaks your conditions. Interestingly, all three values of zero do not always cause a crash and can occur for both calls. The crash only happens on the map change. I added a check for your conditions and could not cause the crash any more. I think you're on to something Share this post Link to post Share on other sites
Grum Report post Posted November 13, 2007 (edited) Do you happen to use Mesa? From http://www.mesa3d.org/relnotes-7.0.2.html: Bug fixes ... # glDrawArrays(count=0) led to a crash Does the crash still hapen if you protect the calls to glDrawArrays() with an if (size > 0) Edited November 13, 2007 by Grum Share this post Link to post Share on other sites
bluap Report post Posted November 13, 2007 Do you happen to use Mesa? ... Yes I have it installed (Debian Etch) and a preview of apt-get remove shows its rather central to many applications. Its version 6.5.1-0.6: dpkg --list *mesa* | grep ^ii ii libgl1-mesa-dev 6.5.1-0.6 ... ii libgl1-mesa-dri 6.5.1-0.6 ... ii libgl1-mesa-glx 6.5.1-0.6 ... ii libglu1-mesa 6.5.1-0.6 ... ii libglu1-mesa-dev 6.5.1-0.6 ... ii mesa-common-dev 6.5.1-0.6 ... ii mesa-utils 6.3.2-2.1 ... Does the crash still hapen if you protect the calls to glDrawArrays() with an if (size > 0) Yes, unfortunately it does, good find though. In fact idx is 0, size is 15 when the crash happens. The only way I've found to prevent the crash is to test your "water_buffer_reflectiv_index <= idx < water_buffer_usage" condition and only call draw_water_quad_tiles() if that test is good. Index: reflection.c =================================================================== RCS file: /cvsroot/elc/elc/reflection.c,v retrieving revision 1.123 diff -a -u -r1.123 reflection.c --- reflection.c 2 Oct 2007 07:29:23 -0000 1.123 +++ reflection.c 13 Nov 2007 20:37:53 -0000 @@ -931,7 +931,8 @@ } get_intersect_start_stop(main_bbox_tree, TYPE_REFLECTIV_WATER, &start, &stop); - draw_water_quad_tiles(start, stop, water_buffer_reflectiv_index, water_id); + if (water_buffer_reflectiv_index < water_buffer_usage) + draw_water_quad_tiles(start, stop, water_buffer_reflectiv_index, water_id); #ifdef USE_SHADER if (water_shader_quality > 0) Share this post Link to post Share on other sites
Grum Report post Posted November 15, 2007 I seem to be unable to reproduce the crash. I have an idea, which is probably crazy, but may cause the crash: your current actor is being set between the call to build_water_buffer() and draw_water_quad_tiles(), or his position is updated in that period. In that case, the actor cluster in build_water_buffer() (where the arrays to be drawn are computed) will be different from those that are actually drawn in draw_water_quad_tiles(). Can you print the actor cluster (variable cluster) in those two functions, and see if they're different when it crashes? Share this post Link to post Share on other sites
bluap Report post Posted November 15, 2007 I seem to be unable to reproduce the crash. I have an idea, which is probably crazy, but may cause the crash: your current actor is being set between the call to build_water_buffer() and draw_water_quad_tiles(), or his position is updated in that period. In that case, the actor cluster in build_water_buffer() (where the arrays to be drawn are computed) will be different from those that are actually drawn in draw_water_quad_tiles(). Can you print the actor cluster (variable cluster) in those two functions, and see if they're different when it crashes? Standing at the top of the mine stairs I get a value of 12, as I change maps, the value goes to 0, then to 11 just before the crash: .... build_water_buffer() cluster=12 build_water_buffer() cluster=12 build_water_buffer() cluster=12 draw_water_quad_tiles() cluster=12 draw_water_quad_tiles() cluster=12 build_water_buffer() cluster=0 build_water_buffer() cluster=0 build_water_buffer() cluster=0 draw_water_quad_tiles() cluster=0 draw_water_quad_tiles() cluster=0 build_water_buffer() cluster=0 build_water_buffer() cluster=0 build_water_buffer() cluster=0 draw_water_quad_tiles() cluster=0 draw_water_quad_tiles() cluster=0 build_water_buffer() cluster=0 build_water_buffer() cluster=0 build_water_buffer() cluster=0 draw_water_quad_tiles() cluster=0 draw_water_quad_tiles() cluster=0 build_water_buffer() cluster=0 build_water_buffer() cluster=0 build_water_buffer() cluster=0 draw_water_quad_tiles() cluster=0 draw_water_quad_tiles() cluster=0 build_water_buffer() cluster=0 build_water_buffer() cluster=0 build_water_buffer() cluster=0 draw_water_quad_tiles() cluster=0 draw_water_quad_tiles() cluster=0 build_water_buffer() cluster=11 build_water_buffer() cluster=11 build_water_buffer() cluster=11 draw_water_quad_tiles() cluster=11 draw_water_quad_tiles() cluster=11 idx=0, size+28, water_buffer_usage=0 Segmentation fault Note this is using your modified version from CVS with the exit(1) removed. Not sure if you meant to commit those changes with the server.lst stuff... Anyhow, if you change that check so that prevents the call to glDrawArrays() the code does not crash. Share this post Link to post Share on other sites
Grum Report post Posted November 15, 2007 Oh crap.... No, that wasn't supposed to be committed. Anyway, my idea was wrong I just can't figure out why water_buffer_usage (or size) is wrong... *Goes off to fix a broken commit* Share this post Link to post Share on other sites
bluap Report post Posted November 15, 2007 Oh crap.... No, that wasn't supposed to be committed. Anyway, my idea was wrong I just can't figure out why water_buffer_usage (or size) is wrong... Ah well, but thanks for looking into this. It's tempting to just do the check and avoid the crash. May be this is an option for the release version if it never gets triggered without causing a crash. We could add a message to the error log. I can't be the only person this might be an issue for.... *Goes off to fix a broken commit* I was just typing a thing about interface.c but I note you've already seen to it. At work, we refer to these incidents as swabs left in the patient Share this post Link to post Share on other sites
Grum Report post Posted November 16, 2007 Grmbl... I have the feeling I'm missing something really obvious. In the first part of build_water_buffer(), the bbox tree is checked to see if the tile buffer needs to be updated. Can you check if the client gets past this check (i.e. skip the return statement on line ~133) and the buffer is really rebuilt with the correct visibility cluster before it's drawn? Share this post Link to post Share on other sites
bluap Report post Posted November 16, 2007 Grmbl... I have the feeling I'm missing something really obvious. In the first part of build_water_buffer(), the bbox tree is checked to see if the tile buffer needs to be updated. Can you check if the client gets past this check (i.e. skip the return statement on line ~133) and the buffer is really rebuilt with the correct visibility cluster before it's drawn? Yes it does, sometimes, but it appears unrelated to the crash. Your opinion may be different so I've added lots of debug and written the output to the log file (I'll send you the URL via PM shortly). The cvs diff reflection.c output is append to the log too. I started in the top mine, moved to the lower mine, moved back without crashing (by keeping my zoom low), back down to the bottom mine, then up again allowing a crash. Location information is also written to the log. The crash happens when the cluster tests in build_water_buffer() think that there is nothing to do, but the same tests in draw_water_quad_tiles() think there is. This should be obvious in the debug. Another thing I noticed that I had not before. To avoid the crash I change perspective, direction and/or zoom level. The fine line between crashing and not, exactly corresponds to the window background being black (no crash) or blue (crash). Weird, I'm off to hunt for where that's done.... Share this post Link to post Share on other sites
Torg Report post Posted November 17, 2007 (edited) the window background being black (no crash) or blue (crash). Weird, I'm off to hunt for where that's done....Great! Grum and I haven't been able to find it. There are bugs in it which Roja has wanted fixed (it changes) so if you can find them she'll be very happy. :-) Edited November 17, 2007 by Torg Share this post Link to post Share on other sites
bluap Report post Posted November 17, 2007 the window background being black (no crash) or blue (crash). Weird, I'm off to hunt for where that's done....Great! Grum and I haven't been able to find it. There are bugs in it which Roja has wanted fixed (it changes) so if you can find them she'll be very happy. :-) Don't get your hopes too high You probably already know this but the blue is really some texture being rendered not just a colour fill. I've just loaded the map editor and it shows the same issues. Some camera angles have a black background (low CPU) then suddenly the whole background is filed with blue (high CPU). This is for the map I'm having trouble with cont2map16_insides. Its the same blue as the water in the cave and appears/disappears at the same time but its not the same textured look. Its as if the water had spilled out from the lake.... Share this post Link to post Share on other sites
Torg Report post Posted November 17, 2007 Don't get your hopes too high You probably already know this but the blue is really some texture being rendered not just a colour fill.Oh! I didn't know that. I was looking in the completely wrong place then. I still have my hopes high. ;-) I've just loaded the map editor and it shows the same issues. Some camera angles have a black background (low CPU) then suddenly the whole background is filed with blue (high CPU). This is for the map I'm having trouble with cont2map16_insides. Its the same blue as the water in the cave and appears/disappears at the same time but its not the same textured look. Its as if the water had spilled out from the lake....Well the blue is correct from what I understand. It is meant to be the sky meeting with the water edge. There is some grey in there too, which is wrong iirc. It would be best to ask Roja about the exact details. Share this post Link to post Share on other sites
Entropy Report post Posted November 17, 2007 Well, since this is the only remaining bug that holds the update, maybe we should just not use CLUSTER_INSIDES for the official client, and just release an optional patch with that included when it becoems stable. Share this post Link to post Share on other sites
bluap Report post Posted November 17, 2007 Well, since this is the only remaining bug that holds the update, maybe we should just not use CLUSTER_INSIDES for the official client, and just release an optional patch with that included when it becoems stable. OK, a few facts/beliefs to help decide. 1) I only crash if CLUSTER_INSIDES is enabled. It's in the map transition between SM lower cave and upper. I have not experience the same crash elsewhere. Though I guess it's possible. 2) The blue/black background switching issue occurs whether or not CLUSTER_INSIDES is used. It also occurs in the map editor. It does not cause a crash but is a but annoying. When its blue, CPU usage rises dramatically for me at least. 3) The link between the blue/black issues is that if I have a black the background, I can go back and forth without crashing. If I have the blue background, I always almost always crash. The exception is if I click on the sign from lower down the stairs (I've only just realised this exception). 4) A simple check of state before doing a particular glDrawArrays() will prevent the crash. That, along with the fact excluding CLUSTER_INSIDES does not fix the blue/black issue would suggest to me we might like to go with this trap instead or excluding CLUSTER_INSIDES. Index: reflection.c =================================================================== RCS file: /cvsroot/elc/elc/reflection.c,v retrieving revision 1.125 diff -a -u -r1.125 reflection.c --- reflection.c 15 Nov 2007 21:35:32 -0000 1.125 +++ reflection.c 17 Nov 2007 12:35:19 -0000 @@ -760,7 +760,10 @@ } size++; } - glDrawArrays(GL_QUADS, idx * 4, size * 4); + if (size && idx >= water_buffer_usage) + printf("%s(): crash averted idx=%d, size+%d, water_buffer_usage=%d\n", __FUNCTION__, idx, size, water_buffer_usage); + else + glDrawArrays(GL_QUADS, idx * 4, size * 4); } void draw_lake_tiles() Share this post Link to post Share on other sites
Entropy Report post Posted November 17, 2007 I don't really care about blue and black issues, all I care about is not to crash. So if that check prevents the crash, please just commit it and let's go on with the update. Share this post Link to post Share on other sites
bluap Report post Posted November 17, 2007 I don't really care about blue and black issues, all I care about is not to crash. So if that check prevents the crash, please just commit it and let's go on with the update. OK, done. An message is written to the error log if the trap is used. Share this post Link to post Share on other sites
Grum Report post Posted November 18, 2007 Very instructive (and extensive!) log, thanks. build_water_buffer() if is true, building build_water_buffer() 3 cluster test if start=486 stop=512 l=102413 x=13 y=25 tile_cluster=11 cluster=12 ... build_water_buffer() water_buffer_usage=0 build_water_buffer() if is false, returning build_water_buffer() if is false, returning ... draw, don't rebuild (times many) ... draw_water_quad_tiles() 1 cluster test if start=486 stop=512 l=98326 x=22 y=24 tile_cluster=11 cluster=0 draw_water_quad_tiles() OK idx=0, size=0, water_buffer_usage=0 cluster=0 build_water_buffer() if is false, returning build_water_buffer() if is false, returning build_water_buffer() if is false, returning draw_water_quad_tiles() OK idx=0, size=0, water_buffer_usage=0 cluster=11 draw_water_quad_tiles() 1 cluster test else start=486 stop=512 l=102413 x=13 y=25 tile_cluster=11 cluster=11 So the last time the water buffer is built, the actor is still on cluster 12. Drawing is ok while the actor is in limbo (cluster==0), because the test tile_cluster==cluster always fails. Finally, when the actor reappears, he's on cluster 11, but the buffer isn't rebuilt. The code tries to draw the tiles on cluster 11, but fails because it doesn't have the necessary tiles. So it looks like the ide_changed flag on the bbox tree isn't set when the actor teleports (at least not for types TYPE_REFLECTIV_WATER and TYPE_NO_REFLECTIV_WATER). I'll try to track it down, but I have to leave now. Share this post Link to post Share on other sites
Grum Report post Posted November 25, 2007 Does this help? Index: new_actors.c =================================================================== RCS file: /cvsroot/elc/elc/new_actors.c,v retrieving revision 1.124 diff -u -r1.124 new_actors.c --- new_actors.c 19 Nov 2007 19:31:58 -0000 1.124 +++ new_actors.c 25 Nov 2007 13:54:58 -0000 @@ -133,7 +133,14 @@ } if (actor_id == yourself) + { + // We have just returned from limbo (after teleport, + // map change, disconnect, etc.) Since our position may + // have changed, tell the bbox tree to update so + // that our environment is recomputed. + set_all_intersect_update_needed (main_bbox_tree); set_our_actor (our_actor); + } actors_list[i]=our_actor; Share this post Link to post Share on other sites
bluap Report post Posted November 25, 2007 Does this help? You're a genius. Yes it does. With this change, the trap is no longer triggered and when the trap is removed the client does not crash! Nice one Share this post Link to post Share on other sites
bluap Report post Posted November 29, 2007 This has been working fine since I applied it to my client so I've just committed Grum's fix and removed the previous trap code. Share this post Link to post Share on other sites