All work units show "computation error" when computer is on for over 24 hours except ones that are "Task suspended at users request"


Questions and Answers : Windows : All work units show "computation error" when computer is on for over 24 hours except ones that are "Task suspended at users request"

Wizard2468
Joined: 21 May 99
Posts: 27
Credit: 760,904
RAC: 383
United States
Message 1286838 - Posted: 22 Sep 2012, 22:32:27 UTC

I have been a SETI@Home user since a few months after the project started. For the last couple of months I have had to restart my computer every day or else all the tasks show "computation error" unless I have suspended them.

I am running 3 computers and this is happening on 2 of them. All 3 are running Windows XP. A Toshiba Satellite laptop with an Intel Celeron processor and a Compaq tower with an AMD processor are both doing this, but an older Compaq Presario 2500 [2585] is still working fine even when left running for 2 or 3 days, as has always been the case since I started in the program.

I also leave the Task Manager on at all times, and when SETI shows "computation error" on all tasks the Task Manager acts up, showing clear spaces where the borders are. Sometimes, once this happens, when I try to restart either computer it tells me I do not have permission to restart the computer and have to log off first. Once restarted, SETI will work fine again for 1 day and then act up if I do not reboot every 24 hours.

I thought it was a virus and have run Malwarebytes and Microsoft Security Essentials several times and have not found a virus. Do I need to uninstall BOINC and re-install it? There have been several times that I have had a long list of work units waiting to be processed and have had all of them turn to "computation error", so I now suspend all tasks except 3 or 4 in case I don't reboot in time and the project is out of work to send out.

As I mentioned, I have been doing this for a couple of months now but it is really getting old. Any suggestions would be greatly appreciated. Thank you for your time and help. Have a Good One! Love, Peace and Perception
Wizard2468

____________

Gatekeeper
Joined: 14 Jul 04
Posts: 887
Credit: 176,479,616
RAC: 30
United States
Message 1286859 - Posted: 22 Sep 2012, 23:45:47 UTC

Most of the errors I see on all three of your rigs are as a result of a "user abort request", i.e., you cancelled the work unit during download or processing. There are a few error code -1's, which seem to indicate that BOINC thinks you have no space on your hard drive.

I'd check your hard drives for potential problems. These appear to be older systems, so there's a good chance the HD may be on its way out.

Note: you seem to be aborting all your AP work. If you don't want to process AP, go into your account preferences and uncheck AP v6.
____________

Wizard2468
Joined: 21 May 99
Posts: 27
Credit: 760,904
RAC: 383
United States
Message 1286871 - Posted: 23 Sep 2012, 0:26:26 UTC - in response to Message 1286859.

Actually, you may have something about the hard drives. The one computer that is not having a problem I just put a new hard drive in this spring. The Toshiba has the old hard drive I had put in the Compaq laptop about 5 years ago and does have little free disc space, only 750 MB free on a 111 GB drive. The Compaq tower has a 300 GB drive with plenty of free space but is also about 5 years old and is actually from a Maxtor external drive unit. I have run Seagate SeaTools for Windows on it and it reads OK, but that has been a problem drive in the past. I have it set up with 2 partitions and the C: partition is 127 GB with 27.7 GB free space. I had swapped drives around after purchasing a new drive for the Compaq laptop about a year ago, then I had the Toshiba and the Compaq tower given to me about 9 months and 6 months ago, respectively.

You probably do not see many error messages because I have been very consistent about rebooting every day for over 2 months now. Would it be helpful to allow one of them [probably the Compaq tower] to go ahead and fail a few times to see what error messages you see then? In fact, it should be about to mess up very soon; it has been running for 1 day and 5 hours right now.

And thank you for the info about the Astropulse checkbox in the account preferences. I have not looked at my preferences since before you started Astropulse so I did not know there was such a checkbox. I started rejecting Astropulse because the Compaq laptop, the one working correctly, has a blue screen error occasionally and I kept losing days of work credit. It would not recover the work done after the blue screen error many times, and that is a very slow machine to start with.

Also, when I do reboot the 2 problem machines I start the Task Manager because BOINC will show the task as running but it actually is not. It shows the seti program under processes but it does not use any of the CPU unless I exit BOINC, wait until the seti program disappears from the Task Manager processes and then restart BOINC. Then the task bar icon will go all green and I know it is processing the work units.

Thank you again for the info, time and help. I will unsuspend all the work units on the tower right now and let it run until it messes up so we can see if that produces any other error messages. Thank you again. Have a Good One! Love, Peace and Perception
Wizard2468

____________

Wizard2468
Joined: 21 May 99
Posts: 27
Credit: 760,904
RAC: 383
United States
Message 1286918 - Posted: 23 Sep 2012, 3:25:55 UTC - in response to Message 1286859.
Last modified: 23 Sep 2012, 3:30:30 UTC

BTW, I just realized that the units I have aborted are ones that never did start using the CPU even after a couple of restarts. They would show as running and show the time run, but the percent would never change and the CPU usage was 0.
Wizard2468
____________

Wizard2468
Joined: 21 May 99
Posts: 27
Credit: 760,904
RAC: 383
United States
Message 1286979 - Posted: 23 Sep 2012, 7:38:09 UTC - in response to Message 1286859.

OK, the tower is messing up as I write this. It is a Compaq SR5703WM. I am running at less than half the page file history at 1.75 GB. It finished the work unit it was working on at 1 day and 5 hours, started the next unit, apparently ran it for 17 minutes and then showed computation error. It simply showed computation error on the next unit, with no time run or time to completion, and had 14 minutes on the unit after that, at 0% completed, when I noticed the problem.

The Task Manager was messed up, the tab for the 5 IE8 windows I had open did not show any text, and when I tried to install the latest update I was told it was not installed. Now the BOINC manager has disconnected from the local client and everything is gone. When you open the start button there is no text or icons, it is all blank.

I am now going to reboot the tower. I had to exit the BOINC manager, which was difficult because the tabs did not work until I closed the window and reopened it, and close 2 IE8 windows to get things to work enough to reboot. The update reappeared so I am trying to install that now before I reboot. This time the updates were installed.

When rebooted, BOINC came up fine and started running properly with only the 3 previously mentioned computation error work units; the 1 uploading unit and 30 ready to start work units are fine. I will upload it and the errors when I finish writing this. Oh, and not this time, but many times when it acts up and I reboot, the shutting down screen and saving your settings screen are small boxes instead of full screen as usual.

I hope this info helps resolve this. Has anyone else had anything similar happen? Or am I the Lone Ranger? hahaha If you need any more info please let me know. Thank you again for your time, patience and understanding with this problem. Have a Good One! Love, Peace and Perception
Wizard2468
____________

rob smith
Volunteer tester
Joined: 7 Mar 03
Posts: 8127
Credit: 52,603,225
RAC: 75,150
United Kingdom
Message 1286984 - Posted: 23 Sep 2012, 8:13:02 UTC

One tip - when reporting problems can you please give the project ID of the machine you are talking about, as we can't see descriptions like "compaq laptop".
It does look as if you are having hardware issues with your ageing cruncher farm, or with some strange interaction. I too have noticed that if there is an MS update sitting around waiting to install, my PC behaves badly; once it is installed everything returns to normal.
Some of the error messages your crunchers have returned, taken together, suggest hard disk problems, as do your BSoDs. While S@H isn't particularly disk intensive, it does rely on having a good hard disk, as data is written to it every few seconds, and thus files tend to end up being very fragmented on crowded hard disks. Hmmm, there's a thought - have you tried running the XP disk checker and defragment tools? They might show up some errors or messages, the only trouble being they might not run if the disk is very full (try emptying the trash and clearing out your browser cache and history before running - why does IE8 leave so much rubbish on the disk, even more than Firefox, but not as bad as Chrome....)
____________
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?

Wizard2468
Joined: 21 May 99
Posts: 27
Credit: 760,904
RAC: 383
United States
Message 1287005 - Posted: 23 Sep 2012, 9:51:43 UTC - in response to Message 1286984.

No problem. The Compaq tower, the one that just messed up, is ID: 6692865 and is named wizardt2 [tower 2]. The Toshiba laptop that messes up but has a very full hard drive is ID: 6176446 and is named WIZARDLAPTOP2. The Compaq laptop that is not having a problem is ID: 5738144 and is named wizardlaptop.

I do defrag fairly often, especially since I have started having problems. ID: 6176446 does have too full of a hard drive to defrag properly, but ID: 6692865 has plenty of space for the defrag in the C: partition at 21% free on that partition, 128 GB in size and 27.69 GB free according to the defrag program. The second partition on that hard drive, D:, is 151 GB and has 110 GB free, or 72%. Defrag said the C: drive or partition did not need to be defragged but I have just defragged it anyhow. ID: 6692865 has 3046 MB initial virtual memory paging file size and 6092 MB maximum virtual memory paging file size.

There do not have to be updates waiting to be installed for the problem to appear; that really does not seem to be associated with it at all. The only thing that appears to be associated with the problem is the computer running for over 1 day and a few hours without being rebooted. Once it has run this long, the next time it finishes a unit and starts a new unit the computation error messages start, other programs are affected, and the problems noted earlier, such as problems closing programs and problems rebooting, appear.

One other thing that may be relevant is that I did use the Windows Files and Settings Transfer Wizard to transfer the info from the Toshiba laptop, ID: 6176446, to the reformatted hard drive I put in the tower, ID: 6692865, but both did work for several months before this problem started. I did at first think maybe I had transferred a problem as well, although it would have had to have been a delayed problem as both did work for quite some period of time.

The defrag report, copied and pasted, is:

Volume T2A (C:)
Volume size = 128 GB
Cluster size = 4 KB
Used space = 100 GB
Free space = 27.69 GB
Percent free space = 21 %

Volume fragmentation
Total fragmentation = 1 %
File fragmentation = 2 %
Free space fragmentation = 0 %

File fragmentation
Total files = 221,233
Average file size = 642 KB
Total fragmented files = 1
Total excess fragments = 4
Average fragments per file = 1.00

Pagefile fragmentation
Pagefile size = 2.97 GB
Total fragments = 2

Folder fragmentation
Total folders = 16,348
Fragmented folders = 1
Excess folder fragments = 0

Master File Table (MFT) fragmentation
Total MFT size = 273 MB
MFT record count = 238,104
Percent MFT in use = 85 %
Total MFT fragments = 3

--------------------------------------------------------------------------------
Fragments File Size Files that cannot be defragmented
None

I will run a scan on the hard drive on ID: 6692865 and post those results later when the check disk has finished running. If there is any other info or screen shots that would be helpful please let me know. Thank you for your time and effort. Have a Good One! Love, Peace and Perception
Wizard2468

____________

Wizard2468
Joined: 21 May 99
Posts: 27
Credit: 760,904
RAC: 383
United States
Message 1287014 - Posted: 23 Sep 2012, 11:20:10 UTC - in response to Message 1286984.

CHKDSK just finished and the system rebooted after it was finished without any alerts or warnings. Is there somewhere you can read a log of the CHKDSK results? I looked under computer management but did not see it. Or is it the old "No news is good news" bit?
Wizard2468
____________

rob smith
Volunteer tester
Joined: 7 Mar 03
Posts: 8127
Credit: 52,603,225
RAC: 75,150
United Kingdom
Message 1287036 - Posted: 23 Sep 2012, 13:11:24 UTC

The main thing is that no errors have been reported. !"£$%^ another theory out the window.
There looks to be a reasonable amount of free space (I don't start to worry until it's below about 10%).

Ah, it's an AMD. If my memory is working today, there was a problem with AMDs and a specific version of either BOINC or the stock S@H application. You say this problem started after a few months of good running - did you happen to update either BOINC or the S@H application just before the problem started?


(Thankfully you don't have any GPUs on your systems, they are the number one pain to diagnose problems with....)
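
For anyone wanting to apply the 10% rule of thumb to the figures posted in this thread, here is a minimal Python sketch. The function names and the threshold constant are made up for illustration; the threshold is just the rule of thumb from this post, not a BOINC or Windows setting, and the 750 MB figure is taken as 0.75 GB.

```python
import shutil

# Rule of thumb from this thread, not a BOINC or Windows setting.
LOW_SPACE_THRESHOLD = 10.0


def percent_free(free_gb, total_gb):
    """Return free space as a percentage of the volume size."""
    if total_gb <= 0:
        raise ValueError("total_gb must be positive")
    return 100.0 * free_gb / total_gb


def live_percent_free(path):
    """Same calculation for a mounted volume, e.g. "C:\\" on Windows."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.free / usage.total


# (free GB, total GB) as reported in the thread, keyed by host ID.
volumes = {
    "6692865 C:": (27.69, 128.0),
    "6692865 D:": (110.0, 151.0),
    "6176446": (0.75, 111.0),
}

for name, (free_gb, total_gb) in volumes.items():
    pct = percent_free(free_gb, total_gb)
    status = "LOW" if pct < LOW_SPACE_THRESHOLD else "ok"
    print(f"{name}: {pct:.1f}% free ({status})")
```

By this measure only the Toshiba laptop (ID: 6176446) falls below the threshold, which matches the observation that it is too full to defragment properly.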
____________
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?

Gundolf Jahn
Joined: 19 Sep 00
Posts: 3184
Credit: 356,780
RAC: 13
Germany
Message 1287048 - Posted: 23 Sep 2012, 14:18:53 UTC - in response to Message 1287014.

Is there somewhere you can read a log of the CHKDSK results?

Yes, in the windows system event log.

Gruß,
Gundolf

Wizard2468
Joined: 21 May 99
Posts: 27
Credit: 760,904
RAC: 383
United States
Message 1287057 - Posted: 23 Sep 2012, 14:57:47 UTC - in response to Message 1287036.

Yes, there was a period a few months ago when 2 or 3 new versions of SETI came out in a row and notices appeared in the BOINC manager that a new version was available. I always went to the web site and downloaded the new version and that was around the time problems started. I think that was why I put up with it for a while; I thought the multiple new versions were probably because of the problem. I save each program when I download it, so I have the various installation programs still on my hard drive. Should I revert to one of the earlier versions? Is any particular version the best for AMD processors? Will it let me go backwards without telling me I already have a newer version and that I should use that or will I have to uninstall and re-install BOINC?
Wizard2468
____________

Wizard2468
Joined: 21 May 99
Posts: 27
Credit: 760,904
RAC: 383
United States
Message 1287062 - Posted: 23 Sep 2012, 15:16:40 UTC - in response to Message 1287048.

OK, I thought that was what I was looking at in the computer management program. In the tree at the left, I am under system tools - event viewer - system. There are not any entries under system tools - performance logs and alerts - counter logs, trace logs or alerts. It states there is a sample log under counter logs but I cannot seem to open it. How do I open/find the windows system event log? Thank you for your help.
Wizard2468
____________

rob smith
Volunteer tester
Joined: 7 Mar 03
Posts: 8127
Credit: 52,603,225
RAC: 75,150
United Kingdom
Message 1287091 - Posted: 23 Sep 2012, 16:37:43 UTC

Your description of what happened a few months back sounds about right. I didn't get caught, as I like to have stable systems and wait until I see that the new S@H app or BOINC is stable before doing the update. In the case of BOINC I'm still on version 6.10.60, which I find stable. It is no easy task to roll back from the current 7.x.y BOINC family to the older 6.10 family as there are some structural differences in the way the BOINC tracking data is stored. One day I will have to update; I'll probably do that when I build my next cruncher in a couple of months' time.



Your question about where chkdsk hides its data - it used to be a file in the root of your C: drive, but I have set mine up to be placed where I know it's going to be rather than where Windows thinks I might want it. I don't think it comes up as one of the options on the system log viewer - I never have to use that tool to find it. The main thing is it did the auto-reboot without showing you a screen full of error messages about faulty files/folders/lost fragments, in which case the log file will only have the run information and which disk(s) were checked.
____________
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?

Gundolf Jahn
Joined: 19 Sep 00
Posts: 3184
Credit: 356,780
RAC: 13
Germany
Message 1287098 - Posted: 23 Sep 2012, 16:48:54 UTC - in response to Message 1287062.

OK, I thought that was what I was looking at in the computer management program. In the tree at the left, I am under system tools - event viewer - system.

Yes, that's what I meant, too - almost. Sorry, I have the German version, so it's sometimes difficult to get the correct terms.

There should be more entries within event viewer, namely applications (Anwendung) and security (Sicherheit). Didn't you find the chkdsk logs in the application section?

Gruß,
Gundolf

Wizard2468
Joined: 21 May 99
Posts: 27
Credit: 760,904
RAC: 383
United States
Message 1287116 - Posted: 23 Sep 2012, 17:04:28 UTC - in response to Message 1287091.

Yes, that 6.10.60 did run very well for a very long time. I usually do the same [hence still XP hahaha] but updated SETI immediately in case there were different/better/more detailed changes in the analysis of the data. I will probably try to uninstall BOINC, delete the folders/files left and reinstall with that version. I will post results here. BTW, just looked and I do have the installation program for that version saved in my Programs folder.


That's what I assumed with the log, no news is good news.

Thanks again to all for the help. Have a Good One! Love, Peace and Perception
Wizard2468
____________

Wizard2468
Joined: 21 May 99
Posts: 27
Credit: 760,904
RAC: 383
United States
Message 1287127 - Posted: 23 Sep 2012, 17:16:43 UTC - in response to Message 1287098.
Last modified: 23 Sep 2012, 17:17:27 UTC

There is a large text file there, but the only things I noted were a warning and an alert but those seem to have occurred when the update failed. The recent dates at the top of the list are:

Type         Date       Time         Source              Category  Event  User    Computer
Information  9/23/2012  5:59:41 AM   SecurityCenter      None      1800   N/A     WIZARDT2
Information  9/23/2012  5:59:16 AM   btwdins             None      0      N/A     WIZARDT2
Information  9/23/2012  5:59:05 AM   Winlogon            None      1001   N/A     WIZARDT2
Information  9/23/2012  2:15:38 AM   SecurityCenter      None      1800   N/A     WIZARDT2
Information  9/23/2012  2:15:33 AM   btwdins             None      0      N/A     WIZARDT2
Information  9/23/2012  1:58:48 AM   MPSampleSubmission  None      5001   N/A     WIZARDT2
Error        9/23/2012  1:58:47 AM   MPSampleSubmission  None      5000   N/A     WIZARDT2
Warning      9/23/2012  1:50:34 AM   MsiInstaller        None      1001   Wizard  WIZARDT2
Information  9/21/2012  2:06:04 PM   SecurityCenter      None      1800   N/A     WIZARDT2
Information  9/21/2012  2:05:58 PM   btwdins             None      0      N/A     WIZARDT2
Information  9/20/2012  1:17:52 AM   SecurityCenter      None      1800   N/A     WIZARDT2
Information  9/20/2012  1:17:46 AM   btwdins             None      0      N/A     WIZARDT2
Information  9/18/2012  10:59:10 PM  SecurityCenter      None      1800   N/A

I was looking more for sectors scanned, bad sectors, etc.
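
A possible shortcut for finding the CHKDSK summary in a listing like the one above: as far as I know, on Windows XP a boot-time CHKDSK run writes its report to the Application log with source Winlogon and event ID 1001, and opening that entry shows the detailed results. The Python sketch below filters the pasted rows for that combination. The helper names are made up for illustration, and the source/event pairing is an assumption about XP behaviour rather than something confirmed in this thread.

```python
def parse_event_row(row):
    """Split one whitespace-separated Event Viewer row into named fields."""
    parts = row.split()
    if len(parts) < 8:
        return None  # too short to be a complete row
    return {
        "type": parts[0],
        "date": parts[1],
        "time": parts[2] + " " + parts[3],  # e.g. "5:59:05 AM"
        "source": parts[4],
        "category": parts[5],
        "event": parts[6],
        "user": parts[7],
        "computer": parts[8] if len(parts) > 8 else "",
    }


def find_chkdsk_rows(rows, source="Winlogon", event_id="1001"):
    """Return rows matching the assumed CHKDSK source and event ID."""
    hits = []
    for row in rows:
        entry = parse_event_row(row)
        if entry and entry["source"] == source and entry["event"] == event_id:
            hits.append(entry)
    return hits


# A few rows copied from the listing above.
ROWS = [
    "Information 9/23/2012 5:59:41 AM SecurityCenter None 1800 N/A WIZARDT2",
    "Information 9/23/2012 5:59:16 AM btwdins None 0 N/A WIZARDT2",
    "Information 9/23/2012 5:59:05 AM Winlogon None 1001 N/A WIZARDT2",
    "Warning 9/23/2012 1:50:34 AM MsiInstaller None 1001 Wizard WIZARDT2",
]

for hit in find_chkdsk_rows(ROWS):
    print(hit["date"], hit["time"], hit["source"], hit["event"])
```

In the rows pasted above, the only match is the Winlogon entry at 5:59:05 AM on 9/23/2012, so that would be the one to double-click in the Event Viewer.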

Wizard2468
____________

rob smith
Volunteer tester
Joined: 7 Mar 03
Posts: 8127
Credit: 52,603,225
RAC: 75,150
United Kingdom
Message 1287129 - Posted: 23 Sep 2012, 17:24:05 UTC

Have a look around "number crunching", there was some discussion a few months back about how to roll back from 7.x.y to 6.x.y.


The only reason this PC is running Windows 7 is I couldn't find my XP install disks when I did a major update (new disk, motherboard, CPU, memory....). I did have Vista on my previous work laptop and it was "very poor" - work then gave me a new laptop with win7 on it (same make and model of laptop, but I'd complained so much they decided replacing it was the best way forward). Win 7 is so much better than Vista, but is different to XP. One of the guys in the office has just been given a laptop with win8 on it; his opinion is "not very good, in fact CE was better" - he's due for a roll back to win7 next week! (Corporate policy is to send out a few trial laptops to heavy abusers every time a new OS comes out from MS - roll back means the level of complaints has been too high)
____________
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?

Wizard2468
Joined: 21 May 99
Posts: 27
Credit: 760,904
RAC: 383
United States
Message 1287140 - Posted: 23 Sep 2012, 17:35:55 UTC - in response to Message 1287129.

Thanks for the info. Now that I think about it, it does seem that it was after one of the 7.x.y versions this all started. No significant loss of data usefulness to SETI by rolling back versions?

I was going to ask you about your opinion of Windows versions. Wow, didn't know one was as bad as WinCE [never had it but heard you winced every time you started it up.]. No wonder Microsoft is offering to update to win8 for only $40! I REALLY wondered why so cheap, not their style, IMHO.

Wizard2468

____________

Gatekeeper
Joined: 14 Jul 04
Posts: 887
Credit: 176,479,616
RAC: 30
United States
Message 1287190 - Posted: 23 Sep 2012, 18:51:07 UTC - in response to Message 1287140.

Thanks for the info. Now that I think about it, it does seem that it was after one of the 7.x.y versions this all started. No significant loss of data usefulness to SETI by rolling back versions?

Wizard2468


BOINC is just a manager. It has no effect whatsoever on the data sent by S@H, processed on your machine and uploaded back to S@H.
____________

Wizard2468
Joined: 21 May 99
Posts: 27
Credit: 760,904
RAC: 383
United States
Message 1287204 - Posted: 23 Sep 2012, 19:45:55 UTC - in response to Message 1287190.

Doh! That's right, the SETI programs are automatically downloaded by the manager, you don't manually install them. Is it worth trying to save the present work units that have not been processed yet or better to just let it download new ones after the rollback?

Wizard2468
____________


Copyright © 2014 University of California