New feature freeze (after 1.5.0)

Entropy · December 1, 2007

I think we should have another feature freeze until we solve the 1.5.0 bugs, pretty much any client crashes, regardless of the OS.

Even mine crashed today, so we need to really inspect the code, and just add new debugging stuff until we find the causes.

After we release either a patch or a new version, we can add more stuff, but I don't want any other update (except for a bug-fix one) sooner than May, so we will have ample time to add new features.

Any new code introduced since the 1.5.0 tag that is not a bug fix should be #ifdefed so that it does not interfere at all with the 1.5.0 code (I am talking especially about the materials code).

There is one exception from this rule:

Debugging support.

What I have in mind (and discussed with Xaphier as well), is to have an aggressive debug mode, that will log every function entry and exit to a file.

The idea is to know where the client crashed. For example, we will need something like:

	/* msg is the text to log, e.g. the name of the function being entered */
	debug_log = fopen("debug.log","wb");
	fseek(debug_log, 0L, SEEK_SET);
	fwrite(msg, 1, strlen(msg), debug_log);
	fflush(debug_log);

Note that this debug function will overwrite the debug log each time it is called, so we won't have a huge, 20TB debug log.

When the client crashes, the player can look into it, and find which function caused the crash (the last function in there probably caused it).
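A minimal sketch of the overwrite-on-every-call idea, assuming a hypothetical helper named debug_log_write (it is not existing client code). Reopening the file with "wb" truncates it, so the log only ever holds the most recent entry.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: replaces the whole log with a single line.
 * Returns 0 on success, -1 if the file could not be written. */
int debug_log_write(const char *path, const char *msg)
{
	FILE *f = fopen(path, "wb");	/* "wb" truncates any previous contents */
	size_t len = strlen(msg);

	if (f == NULL)
		return -1;
	if (fwrite(msg, 1, len, f) != len || fputc('\n', f) == EOF) {
		fclose(f);
		return -1;
	}
	/* fclose flushes the stdio buffer; it does not force a sync to disk */
	return fclose(f) == 0 ? 0 : -1;
}
```

A monitored function would call something like debug_log_write("debug.log", "enter: my_function") on entry and a matching "exit: ..." line before returning.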

Do you have any ideas/suggestions about this?

P.S. You can of course develop new code, but do not commit it unless it has to do with fixes or this debug thing.

bkc56 · December 1, 2007

Do you have any ideas/suggestions about this?

A few random thoughts:

* Opening for write (or seeking to 0) is a good idea (for log file size) but may give you some ambiguous results. If you exit a function (logged) and then crash you'll have no idea which of the N calls to that function you had just finished (you won't know which function called it). So...

* If most operations are event-driven (receive a message from the server and process it, receive input from the user and process it, etc.) I might suggest the functions write in append mode and you use the event handlers (server message, user input) to clear the log. This will still prevent the log from getting too large, but will also give you some history leading up to the crash that a single "exit function X" wouldn't.

* If you really mean logging entry/exit from EVERY function, that implies a lot of editing to add that stuff and a notable increase to the client size. Not a big deal for a debug client, but be aware.

* You'll want to #ifdef them all so you can turn them off rather than having to remove them all again (yea, obvious, but worth mentioning).
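The append-and-clear variant from the points above could look roughly like this. The function names are hypothetical, and the log path is passed in by the caller. Each append reopens and closes the file so that the line is handed to the OS before the next instruction runs.

```c
#include <stdio.h>

/* Hypothetical: called at the start of each event handler, empties the log. */
int trace_clear(const char *path)
{
	FILE *f = fopen(path, "wb");

	if (f == NULL)
		return -1;
	return fclose(f) == 0 ? 0 : -1;
}

/* Hypothetical: called on function entry/exit, appends one line. */
int trace_append(const char *path, const char *what, const char *func)
{
	FILE *f = fopen(path, "ab");

	if (f == NULL)
		return -1;
	if (fprintf(f, "%s %s\n", what, func) < 0) {
		fclose(f);
		return -1;
	}
	return fclose(f) == 0 ? 0 : -1;
}
```

After a crash the log then holds every entry/exit since the last event started, rather than a single line.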

</thoughts>

Entropy · December 1, 2007

The idea is to first have some (like, say, 20) important functions monitored. Then, if it crashes in one of them, expand to the subfunctions, until we pinpoint the exact problem, or at least a small area of code. When that happens, we can add some additional, context-based logging.

This is my preferred debugging method when a debugger can't be used (on the server, on clients that run on computers without debuggers, etc.)

Schmurk · December 1, 2007

* You'll want to #ifdef them all so you can turn them off rather than having to remove them all again (yea, obvious, but worth mentioning).

Not a big deal. You just have to write a macro that does the job, like LOG_ENTRY(func), and define it as empty if you don't want any messages to be printed. It's much better than having #ifdef all around the code...

BTW, it's also my favorite debugging method and I use it in almost all the code I produce :cry:

To make the job easier, you can also use compiler-defined identifiers like __FILE__, __LINE__ and __func__, and maintain a depth counter to indent the output in the log so that it is more readable...
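A rough sketch of such a macro pair, assuming hypothetical names LOG_ENTRY/LOG_EXIT and a DEBUG_TRACE build flag (none of these come from the client source). When the flag is not defined, the macros expand to a no-op, so the call sites need no #ifdef.

```c
#include <stdio.h>

/* Hypothetical macro names. Build with -DDEBUG_TRACE to enable tracing. */
#ifdef DEBUG_TRACE
static int trace_depth = 0;	/* current nesting level, used for indentation */

#define LOG_ENTRY() \
	fprintf(stderr, "%*s> %s (%s:%d)\n", 2 * trace_depth++, "", \
		__func__, __FILE__, __LINE__)
#define LOG_EXIT() \
	fprintf(stderr, "%*s< %s\n", 2 * --trace_depth, "", __func__)
#else
/* Expand to a no-op statement so that call sites need no #ifdef. */
#define LOG_ENTRY() ((void)0)
#define LOG_EXIT()  ((void)0)
#endif

int leaf(int x)
{
	int result;

	LOG_ENTRY();
	result = x * 2;
	LOG_EXIT();	/* must run on every return path */
	return result;
}

int parent(int x)
{
	int result;

	LOG_ENTRY();
	result = leaf(x) + 1;
	LOG_EXIT();
	return result;
}
```

The single depth counter is only meaningful for one thread; a multi-threaded client would need one counter per thread.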

It's just a few ideas but I'm sure that everybody had the same ones...

alvieboy · December 1, 2007

Another idea:

Have a static buffer (let's say with 512 entries) which serves as a "stack" of calling methods, one for each thread.

Use some macros, FUNC_ENTER() and FUNC_LEAVE(), to change the function stack. Something simple. Then catch some signals like SIGILL, SIGSEGV, SIGBUS and dump the "stack" to a file, with some extra info like registers.

Example:

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Names of the functions currently being executed, outermost first. */
static const char *fstack[8192];
static int fptr = 0;

static inline void pushf(const char *f) { fstack[fptr++] = f; }
static inline void popf(void) { fptr--; }

#define FUNC_ENTER do { pushf(__FUNCTION__); } while (0)
#define FUNC_LEAVE do { popf(); } while (0)

/* jkl() crashes on purpose by writing through a null pointer. */
void jkl() { FUNC_ENTER; *((volatile char *)0) = 1; FUNC_LEAVE; }
void ghi() { FUNC_ENTER; jkl(); FUNC_LEAVE; }
void def() { FUNC_ENTER; ghi(); FUNC_LEAVE; }
void abc() { FUNC_ENTER; def(); FUNC_LEAVE; }

static void my_sighandler(int s, siginfo_t *info, void *p)
{
	(void)info;
	(void)p;
	/* fprintf is not async-signal-safe; tolerable here because we abort right after */
	fprintf(stderr, "Signal %d\nStack trace:\n\n", s);
	while (fptr-- > 0) {
		fprintf(stderr, " > %s\n", fstack[fptr]);
	}
	abort();
}

int main(void)
{
	struct sigaction act;

	memset(&act, 0, sizeof(act));
	act.sa_sigaction = my_sighandler;
	sigemptyset(&act.sa_mask);
	act.sa_flags = SA_SIGINFO;	/* needed for the three-argument handler */

	sigaction(SIGSEGV, &act, NULL);
	FUNC_ENTER;

	abc();

	FUNC_LEAVE;
	return 0;
}

Output:

Signal 11
Stack trace:

> jkl
> ghi
> def
> abc
> main
Aborted

This can be written to a file, and include some extra info.

Álvaro

bluap · December 1, 2007

Another idea:

...

Álvaro

This is good; it would allow the trace without affecting client speed too much in functions that are called a lot. The only downside I can see is that it makes core files useless. Still OK if run from something like gdb though; I guess that overrides the sighandler.

bluap · December 1, 2007

I think we should have another feature freeze until we solve the 1.5.0 bugs, pretty much any client crashes, regardless of the OS. Even mine crashed today, so we need to really inspect the code, and just add new debugging stuff until we find the causes.

I think it might be useful to produce another test client with NEW_SOUND disabled. This is because, even if folks are switching off sound and music, some NEW_SOUND code is still run. There are a few threads about with crashes related to NEW_SOUND stuff. One very easy to reproduce one here. There's also the fighting bug that some see. We can start to tidy up the protection for threads/NULL values of your_actor and actors_list[] access but an early no NEW_SOUND test client would help us be sure we are on the right track.

alvieboy · December 1, 2007

This is good; it would allow the trace without affecting client speed too much in functions that are called a lot. The only downside I can see is that it makes core files useless. Still OK if run from something like gdb though; I guess that overrides the sighandler.

I was thinking of M$ Windows people, for whom core files do not exist, nor gdb. I assume we can catch the exceptions too.

But even on Linux/Mac OS X we could benefit from this: at abort() you can use gdb to backtrace, and I think I can force a core dump even when capturing SIGSEGV.

Álvaro

DogBreath · December 1, 2007

Not sure I have a suggestion on debugging, but I can give you what I've seen so far, and hopefully it'll help.

I've experienced the random crashes, and tried the no clusters (worked great for a while) then the no_tile_clusters (ack, black ground was rough.) But, what I've noticed in all of this is, it seems to be sound related. I might be doing nothing, and then someone beside me is fighting, and poof. If I turn sounds off, no crashes.

I did update my drivers (video and sound) just to be sure. And, although it's reduced it quite a bit, still happens rarely (only with sounds on.) Any graphics bugs seemed to have gone away with the current nvidia drivers (I was on the beta driver, and it was worse, actually...)

If there's a way I can help more (like trying to run a self built client) I guess I could try, but I might need some help compiling it (not totally a newbie, but not a guru either...)

There is one error message from the log (after trying to get rid of them all) that seems to be repeating every time I crash:

stop_stream - Error stopping stream - error: Invalid Value

I know it's not telling me much, but maybe it'll ring a bell to someone. This can repeat many times, and not crash, or crash the first time it happens, so it's hard to pin down why (back to the point about it possibly being sounds, in fact seems to be fighting sounds...)

Maybe this message could be expanded to include which stream is having trouble stopping? Also, it seemed to happen much more during the last invasion (hook back to fighting.. as there was much more fighting going on then.)

Hope this doesn't seem like idle rambling, just my noob analysis of what seems to be causing the crashes.

I've also had some people that I've tried to help say they had crashes when they fight (sometimes only after changing maps, and then coming back and fighting...)

Please let me know if there's any information I can provide to help clarify this.

Vegar · December 1, 2007

Another idea:

Have a static buffer (let's say with 512 entries) which serves as a "stack" of calling methods, one for each thread.

Use some macros, FUNC_ENTER() and FUNC_LEAVE(), to change the function stack. Something simple. Then catch some signals like SIGILL, SIGSEGV, SIGBUS and dump the "stack" to a file, with some extra info like registers.

....

Álvaro

The idea is good, but in the case of a SIGSEGV (which is the most likely cause of a crash), you can no longer trust that the pointer to the buffer is correct, nor the contents of it.

Entropy · December 1, 2007

Yes, we need reliable logging, no buffers (who knows, a pointer might trash our nice buffer). Additionally, the sigsegv method is OS dependent and doesn't work under Windows.

Of course, we can use both methods, which would at least give us some redundancy and added reliability.

BTW, some people reported that the client will crash if the sound is enabled but no sound files are found.

So I propose that sound be turned off automatically if the sound files are not found.

One other problem: It seems that if the data path is not properly set, the sound files are not found from the current directory.

So can someone look at those two problems please?
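To make the two requests concrete, here is a rough sketch with entirely hypothetical names (nothing here is taken from the client source): look for a sound file under the data path first, fall back to the current directory, and switch sound off if neither works.

```c
#include <stdio.h>

/* All names below are hypothetical, for illustration only. */

/* Returns 1 if the file can be opened for reading, 0 otherwise. */
static int file_readable(const char *path)
{
	FILE *f = fopen(path, "rb");

	if (f == NULL)
		return 0;
	fclose(f);
	return 1;
}

/* Looks for `name` under `datadir` first, then in the current directory.
 * Writes the path that worked into `out`; returns 1 if found, 0 if not. */
int find_sound_file(const char *datadir, const char *name,
		    char *out, size_t out_size)
{
	int n = snprintf(out, out_size, "%s/%s", datadir, name);

	if (n > 0 && (size_t)n < out_size && file_readable(out))
		return 1;
	n = snprintf(out, out_size, "./%s", name);
	if (n > 0 && (size_t)n < out_size && file_readable(out))
		return 1;
	return 0;
}

/* Returns the new sound setting: off if the file cannot be found. */
int sound_enabled_after_check(int sound_on, const char *datadir, const char *name)
{
	char path[512];

	if (sound_on && !find_sound_file(datadir, name, path, sizeof(path)))
		return 0;
	return sound_on;
}
```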

Vegar · December 1, 2007

BTW, some people reported that the client will crash if the sound is enabled but no sound files are found.
So I propose that sound be turned off automatically if the sound files are not found.

One other problem: It seems that if the data path is not properly set, the sound files are not found from the current directory.

So can someone look at those two problems please?

This is the exact kind of post that reminds me of the lack of a bug tracking system. It would be a lot easier and more convenient (both for developers and users) to track bugs if we had a proper bug tracking system. The current system, using the forum, is just a big mess.

Berlios has a Mantis we can use; it shouldn't take long to get it up and running.

Entropy · December 1, 2007

Yeah, I know, but we are not a really big project and we don't have that many bugs. If we fix the existing bugs, we won't have to have a bug tracking system

Placid · December 2, 2007

Yeah, I know, but we are not a really big project and we don't have that many bugs. If we fix the existing bugs, we won't have to have a bug tracking system

Software with 1000+ active users is not a small project. Bug tracking software isn't for 'big projects', it's for sane (and semi-organised) developers and project leaders who like to minimise stress and make it easier on themselves and their team.

Regardless of the number of bugs, it makes a lot of sense to use a bug tracker for ELC IMO.

Can I also take this opportunity to ask why the Linux client has the DLLs and Windows executables distributed with it? There's 6MB (~14% of the total download) in the Linux zip that is a complete waste. It's unnecessary.

Learner · December 2, 2007

Yeah, I know, but we are not a really big project and we don't have that many bugs. If we fix the existing bugs, we won't have to have a bug tracking system

Software with 1000+ active users is not a small project. Bug tracking software isn't for 'big projects', it's for sane (and semi-organised) developers and project leaders who like to minimise stress and make it easier on themselves and their team.

Regardless of the number of bugs, it makes a lot of sense to use a bug tracker for ELC IMO.

Can I also take this opportunity to ask why the Linux client has the DLLs and Windows executables distributed with it? There's 6MB (~14% of the total download) in the Linux zip that is a complete waste. It's unnecessary.

I make it a point to add to the file what I get from Entropy, so that the Windows client is also there. That way, if someone needs the Windows stuff, it's available (for Wine users, for example).

Placid · December 2, 2007

I make it a point to add to the file what I get from Entropy, so that the Windows client is also there. That way, if someone needs the Windows stuff, it's available (for Wine users, for example).

That's noble, but how many Linux users actually need the Windows stuff? I'd bet very few do. Actually, if that's the case, why are there two separate Zips?

Edited December 2, 2007 by Placid

Vegar · December 2, 2007

Yeah, I know, but we are not a really big project and we don't have that many bugs. If we fix the existing bugs, we won't have to have a bug tracking system

I hope you're not being serious.

How many of the active developers can actually manage and organise that forum? "Forum Led by: The_Piper, Maxine, Acelon" None.

Checking if a bug has been reported before is nearly impossible, because searching the forum is a PITA. The search itself is slow and it takes ages to check each of the search results, reading all the posts to see if it might be a duplicate. Gathering info relevant to the actual bug is not an easy task either because there is so much useless chat in those threads, and often multiple bugs are reported in one thread.

How can you tell if anyone is working on resolving the bug? You can't, not without scouring the entire thread, looking for posts from developers, then contacting them and asking if they're working on it. This is a waste of time, time that could be spent debugging. Remember that you're dealing with volunteer programmers here, not paid professionals (who certainly wouldn't crawl through all the above shit to resolve a bug). There are reasons why you're having a hard time finding people to help you on your project, and this is one of them.

If you want people to help with a project, make it easy for them to do so.

Edited December 2, 2007 by Vegar

Entropy · December 2, 2007

If we have an open bug tracking system, do you think anyone who submits bugs will check to see if a bug has been submitted before?

Do you think they will not post stupid stuff such as: "Bug: Doesn't work"?

With the forums, we can at least get more details from the users, ask follow up questions, and so on. Personally I think the forums way is more flexible.

Placid: We don't have two client zips. We have an exe and a zip. The DLLs in the Linux version can be useful for those who want to try Wine, or for people who dual boot and don't want to download both the Linux and the Windows version.

P.S. I don't want any bug tracking system posts in this thread. If you want to discuss it, make a new thread.

Vegar · December 2, 2007

New thread here: http://www.eternal-lands.com/forum/index.php?showtopic=38934

bluap · December 2, 2007

Entropy asked me to post the status here, so...

I've been working on fixing the map change crash bug reported here. I have fixed that bug; it was the NEW_SOUND code accessing out-of-date actor information. It could strike whether or not sound was actually enabled. While fixing this, I went through every access to the actor information, checking and fixing thread issues. There were some, again mainly in NEW_SOUND, but elsewhere too. This work may have fixed other bugs, like the crash/freeze on fighting, but I was never able to trigger that on demand.

I propose to give what I've done a final check and then, rather than commit the changes, submit a patch. That way it can be tested by others before I mess up CVS. If this is OK, I'll do it tomorrow, way past bedtime now. Again!

Entropy · December 2, 2007

Sure, please submit the patch so that we can review it.

alvieboy · December 2, 2007

The idea is good, but in the case of a SIGSEGV (which is the most likely cause of a crash), you can no longer trust that the pointer to the buffer is correct, nor the contents of it.

You can, as long as that static buffer is not the source of the crash. Making it simple (array or char pointer and an unsigned int) you should have no problem at all.
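A minimal sketch of that simple layout, assuming POSIX and hypothetical names: a fixed array plus an unsigned counter, dumped with write(2) only, which POSIX lists as async-signal-safe. No heap memory or stdio buffers are touched while dumping.

```c
#include <unistd.h>

#define TRACE_MAX 512

/* Hypothetical globals: a fixed array and a counter, nothing heap-allocated. */
static const char *trace_names[TRACE_MAX];
static volatile unsigned int trace_top = 0;

void trace_push(const char *name)
{
	if (trace_top < TRACE_MAX)
		trace_names[trace_top] = name;
	trace_top++;	/* keep counting so push/pop stay balanced when full */
}

void trace_pop(void)
{
	if (trace_top > 0)
		trace_top--;
}

/* Writes the recorded names, innermost first, one per line.
 * Returns the number of entries written. */
unsigned int trace_dump(int fd)
{
	unsigned int i = trace_top < TRACE_MAX ? trace_top : TRACE_MAX;
	unsigned int written = 0;

	while (i-- > 0) {
		const char *s = trace_names[i];
		size_t len = 0;

		while (s[len] != '\0')	/* avoid library calls inside a handler */
			len++;
		if (write(fd, s, len) < 0 || write(fd, "\n", 1) < 0)
			break;
		written++;
	}
	return written;
}
```

A signal handler would call trace_dump() on a file descriptor opened at startup. This does not protect against the array itself being overwritten by a stray pointer.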

At least Visual C++ can handle signals:

http://msdn2.microsoft.com/en-us/library/x...x12(vs.71).aspx

Álvaro

Edited December 2, 2007 by alvieboy

Entropy · December 2, 2007

We use gcc as the 'official' compiler for Windows. I am not sure how we can implement this for Windows in a way that is compiler independent. If you find a way, please let me know.

bluap · December 4, 2007

Entropy asked me to post the status here, so...

I've been working on fixing the map change crash bug reported here. I have fixed that bug; it was the NEW_SOUND code accessing out-of-date actor information. It could strike whether or not sound was actually enabled. While fixing this, I went through every access to the actor information, checking and fixing thread issues. There were some, again mainly in NEW_SOUND, but elsewhere too. This work may have fixed other bugs, like the crash/freeze on fighting, but I was never able to trigger that on demand.

I propose to give what I've done a final check and then, rather than commit the changes, submit a patch. That way it can be tested by others before I mess up CVS. If this is OK, I'll do it tomorrow, way past bedtime now. Again!

Well, after several days of battling, I've backed out of a wholesale rewrite of the actors code and settled for some targeted bug fixing. As this was a smaller change than I had planned, I've committed to CVS directly. My commit fixes the map change crash related to sounds and adds actors mutex locks to the bits of sound code executed in a thread. There are also a few extra checks for NULL pointers and some extra MUTEX_DEBUG code. Sorry for the delay.
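For readers unfamiliar with the pattern, the kind of change described (a lock around shared actor data plus a NULL check) looks roughly like the sketch below. It assumes POSIX threads and uses simplified, hypothetical stand-ins for the actor data; the client's real types, names and threading primitives may differ.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the client's actor data. */
typedef struct { int x_pos; int y_pos; } actor_t;

static actor_t *current_actor = NULL;	/* may be replaced or cleared on map change */
static pthread_mutex_t actors_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Main thread: swap the actor pointer while holding the lock. */
void set_current_actor(actor_t *a)
{
	pthread_mutex_lock(&actors_mutex);
	current_actor = a;
	pthread_mutex_unlock(&actors_mutex);
}

/* Sound thread: copy what is needed while holding the lock, and cope with
 * there being no actor at all. Returns 1 if a position was read, else 0. */
int get_listener_position(int *x, int *y)
{
	int ok = 0;

	pthread_mutex_lock(&actors_mutex);
	if (current_actor != NULL) {
		*x = current_actor->x_pos;
		*y = current_actor->y_pos;
		ok = 1;
	}
	pthread_mutex_unlock(&actors_mutex);
	return ok;
}
```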

Entropy · December 4, 2007

Np, thanks!
