More Unhandled Exceptions with 4.45

genes
Volunteer tester
Joined: 25 May 99
Posts: 117
Credit: 580,187
RAC: 0
United States
Message 124538 - Posted: 17 Jun 2005, 11:36:47 UTC
Last modified: 17 Jun 2005, 11:39:55 UTC

I'm seeing Unhandled Exceptions with 4.45. The message box says simply this:

An unhandled exception occurred. Press "abort" to terminate the program, "retry" to exit the program normally and "ignore" to try to continue.


It seems to be happening about once a day to one of my 6 computers, not always the same one each time. I have always chosen "retry", and then the Boinc CC exits. I start it up again, and it continues normally.

Here's what's in stderrgui.txt on the machine I'm currently using:

***UNHANDLED EXCEPTION****
Reason: Access Violation (0xc0000005) at address 0x004190C9 read attempt to address 0x00000000

1: 05/14/05 17:24:05



***UNHANDLED EXCEPTION****
Reason: Access Violation (0xc0000005) at address 0x004191E9 read attempt to address 0x00000000

1: 06/06/05 18:39:11



***UNHANDLED EXCEPTION****
Reason: Access Violation (0xc0000005) at address 0x004191E9 read attempt to address 0x00000000

1: 06/07/05 06:59:42



***UNHANDLED EXCEPTION****
Reason: Access Violation (0xc0000005) at address 0x004190B9 read attempt to address 0x00000000

1: 06/17/05 07:13:29


Actually, looking at these dates, only the last one could be from 4.45, so it's been going on for a bunch of versions. I'll have to check the other machines.
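To compare entries like these across machines, the log can be tallied by faulting address. The following is a minimal Python sketch, not part of BOINC; the function name `parse_exceptions`, the regular expressions, and the MM/DD/YY reading of the timestamps are assumptions based only on the excerpt above.

```python
import re
from collections import Counter
from datetime import datetime

# Pattern inferred from the "Reason:" lines quoted in this post (an assumption).
REASON = re.compile(
    r"Reason: (?P<kind>.+?) \((?P<code>0x[0-9a-fA-F]+)\) at address "
    r"(?P<pc>0x[0-9a-fA-F]+) (?P<access>read|write) attempt to address "
    r"(?P<target>0x[0-9a-fA-F]+)"
)
# Timestamp lines look like "1: 06/17/05 07:13:29"; MM/DD/YY is assumed.
STAMP = re.compile(r"^\d+: (\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})$")


def parse_exceptions(text):
    """Return one dict per exception record found in the log text."""
    records = []
    current = None
    for line in text.splitlines():
        reason = REASON.search(line)
        if reason:
            current = reason.groupdict()
            current["when"] = None
            records.append(current)
            continue
        stamp = STAMP.match(line.strip())
        if stamp and current is not None and current["when"] is None:
            current["when"] = datetime.strptime(stamp.group(1), "%m/%d/%y %H:%M:%S")
    return records


SAMPLE = """\
***UNHANDLED EXCEPTION****
Reason: Access Violation (0xc0000005) at address 0x004190C9 read attempt to address 0x00000000

1: 05/14/05 17:24:05

***UNHANDLED EXCEPTION****
Reason: Access Violation (0xc0000005) at address 0x004191E9 read attempt to address 0x00000000

1: 06/06/05 18:39:11

***UNHANDLED EXCEPTION****
Reason: Access Violation (0xc0000005) at address 0x004191E9 read attempt to address 0x00000000

1: 06/07/05 06:59:42

***UNHANDLED EXCEPTION****
Reason: Access Violation (0xc0000005) at address 0x004190B9 read attempt to address 0x00000000

1: 06/17/05 07:13:29
"""

records = parse_exceptions(SAMPLE)
by_address = Counter(r["pc"] for r in records)       # crashes per faulting address
null_reads = [r for r in records if int(r["target"], 16) == 0]

for address, count in sorted(by_address.items()):
    print(address, count)
print("reads of address 0:", len(null_reads), "of", len(records))
```

On this excerpt every record is a read of address 0x00000000, while the faulting address differs between entries, which would be consistent with the log spanning several client builds.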


ID: 124538
ML1
Volunteer moderator
Volunteer tester
Joined: 25 Nov 01
Posts: 21773
Credit: 7,508,002
RAC: 20
United Kingdom
Message 124557 - Posted: 17 Jun 2005, 12:53:56 UTC - in response to Message 124538.  

An unhandled exception occurred. Press "abort" to terminate the program, "retry" to exit the program normally and "ignore" to try to continue.

Hardware fault?

Does that system check out ok with Memtest86+ and GIMPS mprime95 torture test?

Good luck,
Martin
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 124557
John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 790,712
RAC: 0
United States
Message 124559 - Posted: 17 Jun 2005, 12:59:16 UTC

It is more likely that under some circumstances a null pointer is used.


BOINC WIKI
ID: 124559
Paul D. Buck
Volunteer tester
Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 124564 - Posted: 17 Jun 2005, 13:22:01 UTC - in response to Message 124559.  

It is more likely that under some circumstances a null pointer is used.

There was a note in one of the mailing lists that there is a possibility that this may occur. I think that the fix will be in the 4.46 version.

Now, if I could just get a 4.45 version of the Macintosh client ...
ID: 124564
Celtic Wolf
Volunteer tester
Joined: 3 Apr 99
Posts: 3278
Credit: 595,676
RAC: 0
United States
Message 124565 - Posted: 17 Jun 2005, 13:25:18 UTC - in response to Message 124564.  

It is more likely that under some circumstances a null pointer is used.

There was a note in one of the mailing lists that there is a possibility that this may occur. I think that the fix will be in the 4.46 version.

Now, if I could just get a 4.45 version of the Macintosh client ...


You would probably get it at the same time I get a 4.45 Solaris client.


I'd rather speak my mind because it hurts too much to bite my tongue.

American Spirit BBQ Proudly Serving those that courageously defend freedom.
ID: 124565
Kathy
Joined: 5 Jan 03
Posts: 338
Credit: 27,877,436
RAC: 0
United States
Message 124590 - Posted: 17 Jun 2005, 15:15:48 UTC

I'm getting this message on a regular basis with 4.45 (although I have had it with other versions as well).
ID: 124590
Rom Walton (BOINC)
Volunteer tester
Joined: 28 Apr 00
Posts: 579
Credit: 130,733
RAC: 0
United States
Message 124694 - Posted: 17 Jun 2005, 23:13:23 UTC

Using the information from this page:
http://boinc.berkeley.edu/debug_win.php

Could somebody install the symbol files, so that the next time you experience the crash we can track it down and fix it?

Thanks in advance.

----- Rom
BOINC Development Team, U.C. Berkeley
My Blog
ID: 124694
Fred G
Joined: 17 May 99
Posts: 185
Credit: 24,109,481
RAC: 0
United States
Message 124730 - Posted: 18 Jun 2005, 1:48:29 UTC

I was getting them too. What I did, and it seems to have fixed the problem, was clean my registry. After installing several versions, I figured something left over in the registry was causing trouble. I ran RegSeeker twice and made sure everything that wasn't needed was removed.

http://www.teamstarfire.org/
ID: 124730
genes
Volunteer tester
Joined: 25 May 99
Posts: 117
Credit: 580,187
RAC: 0
United States
Message 124744 - Posted: 18 Jun 2005, 2:42:38 UTC

OK, got 'em. I'll stick these in each machine, so next time it happens, we'll be ready!



ID: 124744
genes
Volunteer tester
Joined: 25 May 99
Posts: 117
Credit: 580,187
RAC: 0
United States
Message 125406 - Posted: 19 Jun 2005, 18:13:52 UTC
Last modified: 19 Jun 2005, 18:15:03 UTC

Just got another one. Here's what was in stderrgui.txt:


***UNHANDLED EXCEPTION****
Reason: Access Violation (0xc0000005) at address 0x004190B9 read attempt to address 0x00000000

1: 06/19/05 14:09:14
1: c:\boincsrc\main\boinc_public\clientgui\mainframe.cpp(567) +0 bytes (CMainFrame::SaveState)



It looked like it was just starting up a Predictor WU.
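With symbols installed, the extra line names the source file, line number, and function. A small sketch for pulling those fields out of such a line follows; the pattern is inferred from this single example, so the regular expression and the field names are assumptions rather than a documented format.

```python
import re

# Assumed layout: "<n>: <path>(<line>) +<offset> bytes (<function>)"
FRAME = re.compile(
    r"^\d+:\s+(?P<file>.+)\((?P<line>\d+)\)\s+\+(?P<offset>\d+) bytes "
    r"\((?P<func>[^)]+)\)\s*$"
)


def parse_frame(text):
    """Return file, line, offset and function for a symbolized frame, else None."""
    match = FRAME.match(text.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["line"] = int(fields["line"])
    fields["offset"] = int(fields["offset"])
    return fields


frame = parse_frame(
    r"1: c:\boincsrc\main\boinc_public\clientgui\mainframe.cpp(567) "
    r"+0 bytes (CMainFrame::SaveState)"
)
print(frame["func"], frame["line"])

# A plain timestamp line is not a frame, so it yields None.
print(parse_frame("1: 06/19/05 14:09:14"))
```

Grouping crash reports by the `func` and `line` fields would show whether different machines are failing at the same place.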


ID: 125406
genes
Volunteer tester
Joined: 25 May 99
Posts: 117
Credit: 580,187
RAC: 0
United States
Message 125646 - Posted: 20 Jun 2005, 14:32:59 UTC
Last modified: 20 Jun 2005, 14:34:32 UTC

Here's another I just got, waking up one of my machines at work:

***UNHANDLED EXCEPTION****
Reason: Access Violation (0xc0000005) at address 0x004190B9 read attempt to address 0x00000000

1: 06/20/05 10:04:58

Unfortunately, it happened before I could put the symbols in. It appears from the timestamp that it was the act of waking it up from a "blank screen" condition that triggered it.

I now have symbols installed on all my machines, so I'll be able to catch the next one no matter where it happens.

Edit- looks like the same address as the one from home (below).

ID: 125646
genes
Volunteer tester
Joined: 25 May 99
Posts: 117
Credit: 580,187
RAC: 0
United States
Message 126764 - Posted: 23 Jun 2005, 11:22:03 UTC
Last modified: 23 Jun 2005, 12:21:45 UTC

Here's another one I just got:



***UNHANDLED EXCEPTION****
Reason: Access Violation (0xc0000005) at address 0x004190B9 read attempt to address 0x00000000

1: 06/23/05 07:16:40
1: c:\boincsrc\main\boinc_public\clientgui\mainframe.cpp(567) +0 bytes (CMainFrame::SaveState)


By the timestamp, it crashed when I was waking up the machine from a "blank screen" state. This is a third machine. So far, it seems like the ones that crash are either Dual processor or HT. I will check the logs on the non DP machines at work.

ID: 126764
genes
Volunteer tester
Joined: 25 May 99
Posts: 117
Credit: 580,187
RAC: 0
United States
Message 126852 - Posted: 23 Jun 2005, 15:03:02 UTC

Here's a revealing clue: the three non-HT machines I have at work have NO entries in the stderrgui.txt file. Individual apps have crashed, but not the GUI. This makes me think there is something not thread-safe in the GUI, since it is only crashing on HT or DP machines.
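To illustrate what "not thread-safe" could mean here, the sketch below shows a generic check-then-use race on a shared reference, with `None` standing in for a null pointer. It is purely hypothetical: the `Frame` class and the function names are made up, this is not the BOINC GUI code, and the events are only there to force the unlucky interleaving every time so the example is repeatable.

```python
import threading


class Frame:
    """Hypothetical stand-in for a GUI object that holds a shared reference."""

    def __init__(self):
        self.doc = {"title": "state"}  # shared; None plays the role of NULL
        self.lock = threading.Lock()


def save_state_unsafe(frame, checked, cleared):
    if frame.doc is not None:          # check
        checked.set()                  # let the other thread run now
        cleared.wait(timeout=5)        # widen the race window on purpose
        return frame.doc["title"]      # use: doc may have become None
    return None


def save_state_locked(frame, checked, cleared):
    with frame.lock:                   # check and use happen under one lock
        if frame.doc is not None:
            checked.set()
            cleared.wait(timeout=0.2)  # the clearing thread is blocked on the lock
            return frame.doc["title"]
    return None


def clear_doc(frame, checked, cleared, use_lock):
    checked.wait(timeout=5)
    if use_lock:
        with frame.lock:
            frame.doc = None
    else:
        frame.doc = None
    cleared.set()


def run(save, use_lock):
    frame = Frame()
    checked, cleared = threading.Event(), threading.Event()
    worker = threading.Thread(
        target=clear_doc, args=(frame, checked, cleared, use_lock)
    )
    worker.start()
    try:
        result = save(frame, checked, cleared)
    except TypeError as exc:           # Python's analogue of the null read
        result = f"crash: {exc}"
    worker.join()
    return result


print(run(save_state_unsafe, use_lock=False))
print(run(save_state_locked, use_lock=True))
```

On a single logical CPU the window between the check and the use is much harder to hit, which is one way a bug of this kind could stay hidden on non-HT machines. That is only a possible explanation, not a diagnosis.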


ID: 126852
Walt Gribben
Volunteer tester
Joined: 16 May 99
Posts: 353
Credit: 304,016
RAC: 0
United States
Message 126870 - Posted: 23 Jun 2005, 15:54:12 UTC - in response to Message 126852.  

Here's a revealing clue: the three non-HT machines I have at work have NO entries in the stderrgui.txt file. Individual apps have crashed, but not the GUI. This makes me think there is something not thread-safe in the GUI, since it is only crashing on HT or DP machines.


Can you disable hyperthreading for a few days on one of your machines to see if that's true?
ID: 126870
genes
Volunteer tester
Joined: 25 May 99
Posts: 117
Credit: 580,187
RAC: 0
United States
Message 126922 - Posted: 23 Jun 2005, 18:18:27 UTC - in response to Message 126870.  
Last modified: 23 Jun 2005, 18:26:44 UTC

Here's a revealing clue: the three non-HT machines I have at work have NO entries in the stderrgui.txt file. Individual apps have crashed, but not the GUI. This makes me think there is something not thread-safe in the GUI, since it is only crashing on HT or DP machines.


Can you disable hyperthreading for a few days on one of your machines to see if that's true?


Yes, I can do that over the weekend. The machine here at work is HT, the two at home are true dual processor, so not on those.
-------
OK, HT is now off, we'll see what happens. I would expect to see no exceptions here while it is off.

ID: 126922
Bones
Volunteer tester
Joined: 25 Mar 05
Posts: 41
Credit: 535,830
RAC: 0
Australia
Message 127009 - Posted: 23 Jun 2005, 21:58:47 UTC
Last modified: 23 Jun 2005, 22:10:58 UTC

Identical error has happened on my P4 3.2 HT, XP SP2.

***UNHANDLED EXCEPTION****
Reason: Access Violation (0xc0000005) at address 0x004190B9 read attempt to address 0x00000000

1: 06/24/05 07:44:03
1: c:\boincsrc\main\boinc_public\clientgui\mainframe.cpp(567) +0 bytes (CMainFrame::SaveState)

[edit]
Also, at 6:28 am on 24/06/05 (about an hour before the exception), I found the following in the stderr.txt files in the slot directories for seti 4.00 and 4.11:

No heartbeat from core client for 31.000000 sec - exiting

don't know if they are related but it may help

[/edit]
ID: 127009
Brian Stansbury
Volunteer tester
Joined: 4 Jun 99
Posts: 31
Credit: 91,807
RAC: 0
United States
Message 127472 - Posted: 24 Jun 2005, 21:04:25 UTC - in response to Message 127009.  

Identical error has happened on my P4 3.2 HT, XP SP2.

***UNHANDLED EXCEPTION****
Reason: Access Violation (0xc0000005) at address 0x004190B9 read attempt to address 0x00000000

1: 06/24/05 07:44:03
1: c:\boincsrc\main\boinc_public\clientgui\mainframe.cpp(567) +0 bytes (CMainFrame::SaveState)

[edit]
Also, at 6:28 am on 24/06/05 (about an hour before the exception), I found the following in the stderr.txt files in the slot directories for seti 4.00 and 4.11:

No heartbeat from core client for 31.000000 sec - exiting

don't know if they are related but it may help

[/edit]

Could it possibly be the optimized 4.11 client you are using? Are you using the same 4.11 client in all the affected boxes?

ID: 127472
genes
Volunteer tester
Joined: 25 May 99
Posts: 117
Credit: 580,187
RAC: 0
United States
Message 127473 - Posted: 24 Jun 2005, 21:13:25 UTC

I get the same errors and I'm not running any optimized clients. Also, I run multiple projects, and it's crashed when running things like Predictor.

I did turn off HT on my HT machine, and so far, no exceptions. That's not proof, though, since it might not have crashed even with HT on. But if it DOES crash, even with HT off, then that rules HT out as a cause.
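To put a rough number on "that's not proof": if crashes are treated as random events, the chance of a quiet stretch with HT still on can be estimated. The sketch below assumes a Poisson model and a rate of about one crash per day shared evenly by the three HT/DP machines; both the model and the even split are assumptions, loosely based on the figures reported earlier in this thread.

```python
import math


def p_no_crash(rate_per_day, days):
    """Probability of zero crashes in `days`, assuming a Poisson process."""
    return math.exp(-rate_per_day * days)


def days_needed(rate_per_day, alpha):
    """Crash-free days needed before chance alone explains it with probability alpha."""
    return -math.log(alpha) / rate_per_day


# Assumption: about 1 crash/day in total, spread over 3 HT/DP machines.
rate = 1.0 / 3.0

for days in (1, 3, 7):
    print(days, "days:", round(p_no_crash(rate, days), 3))

print("days for 5%:", round(days_needed(rate, 0.05), 1))
```

Under these assumptions a single machine would stay crash-free for a day about 72% of the time even with HT on, and it would take roughly nine quiet days before chance becomes an unlikely explanation (about 5%). A single crash with HT off, on the other hand, settles the question immediately.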


ID: 127473
Brian Stansbury
Volunteer tester
Joined: 4 Jun 99
Posts: 31
Credit: 91,807
RAC: 0
United States
Message 127476 - Posted: 24 Jun 2005, 21:48:27 UTC - in response to Message 127473.  

I get the same errors and I'm not running any optimized clients. Also, I run multiple projects, and it's crashed when running things like Predictor.

I did turn off HT on my HT machine, and so far, no exceptions. That's not proof, though, since it might not have crashed even with HT on. But if it DOES crash, even with HT off, then that rules HT out as a cause.

If HT is the issue, I guess I have been fortunate: I just checked, and my HT CPU has not had any errors. I am only running CP and PP on this machine.
ID: 127476
Remember911
Volunteer tester
Joined: 3 Apr 99
Posts: 51
Credit: 92,166
RAC: 0
United States
Message 127520 - Posted: 25 Jun 2005, 0:19:54 UTC

I've not been able to get any work since upgrading to 4.45. I even did a reset. Has anyone else experienced this, or is it SETI?

ID: 127520