Seti_enhanced & client_state errors

Daniel Schaalma
Volunteer tester
Joined: 28 May 99
Posts: 297
Credit: 16,953,703
RAC: 0
United States
Message 217364 - Posted: 18 Dec 2005, 17:30:17 UTC

I certainly hope that the issue regarding the error "can't parse state file" gets completely resolved before they even _think_ about releasing the enhanced client. I just had this error again on one of my machines early Thursday morning. Somehow the client_state.xml file gets corrupted, and all cached work, all in-progress work, and all completed work not yet reported is LOST. Across my 20 machines, this error happens about once or twice per month. With the current client, the most I lose is maybe 4 hours' worth of processing time, plus my whole cache of work. But with the ENHANCED client, I would really be livid if the machine was 48 hours into processing a 51-hour workunit, then for some reason the client_state file got corrupted and I lost 2 DAYS' worth of processing time, in addition to all cached W/U's. Unfortunately, the area where I live is susceptible to power brown-outs during bad weather, and I just can't afford a bank of UPS units that would power my whole farm for the 9 or 10 hours when I am away at work.

I'm all for analyzing the data with higher resolution, but making the processing time for 1 W/U up to 10 times longer than with the STANDARD 4.18 app does not make any sense. IMO, with the Enhanced app, they should break the data into W/U's that will never exceed 4 to 6 hours on an 'average' modern computer. I would define an 'average' modern computer as an AMD64 3000+ or Intel P4 3.0 GHz or equivalent with 512 MB of RAM. If such a machine cannot crunch a single W/U in less than 6 hours, then MANY people will lose interest in this project. Especially if the issue regarding the client_state file corruption is not resolved before the Enhanced client is released.

I know that if I started losing that much work on a regular basis because of a corrupted client_state file or any other reason, I would think seriously about leaving this project. I have been with Seti Classic since May 1999, and with Seti BOINC since June 2004, and I have also donated funds to the project, and I would not have done all this if I wasn't really dedicated to this project. But the potential for a tremendous loss of scientific data, CPU time, and user credit makes me very skeptical of the Enhanced client without at LEAST fixing the client_state bug first.

Regards, Daniel.

Tern
Volunteer tester
Joined: 4 Dec 03
Posts: 1122
Credit: 13,376,822
RAC: 44
United States
Message 217375 - Posted: 18 Dec 2005, 17:45:37 UTC

My issue isn't an "average" computer, or 6 hours, but the fact that having any WU take over 24 hours (preferably 12) on the "majority" of computers is going to totally mess up the stats. Even CPDN, with trickles, will give you at _least_ one trickle per day on any reasonable system, every other day on a slow one. If you think RAC is worthless now, imagine what it will look like when you (running multiple projects on multiple CPUs) get no credit for several days, then a big hunk of credit from several of your systems, and repeat, randomly... while someone else who can manage to do one per day with one CPU gets to a "steady state" point eventually.

Hopefully the optimizers will get the times down, but even if the slower CPUs _can_ do the work, I think the psychological factor is going to cause a lot of people to quit using the slower systems.

And yes, if there is _anything_ that can cause a corrupted client_state.xml file, then there MUST be an automated recovery procedure to use the client_state.prev when it's detected; the current method of writing both at startup makes recovery impossible. Losing work is not nice; losing many hours of work is _very_ not nice.
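
The kind of automated recovery asked for here could look roughly like the sketch below. It is only an illustration in Python, not the BOINC client's code: the names load_state and refresh_backup are made up for the example, the two file paths are supplied by the caller, and a plain XML parse stands in for whatever validation the real client would need.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def load_state(state_path, backup_path):
    """Return (root, source): the parsed XML root and the file it came from.

    Tries the main state file first and falls back to the backup if the
    main file is missing or does not parse. Hypothetical helper, not part
    of the real BOINC client.
    """
    for candidate in (Path(state_path), Path(backup_path)):
        try:
            root = ET.parse(candidate).getroot()
        except (ET.ParseError, OSError):
            continue  # unreadable or corrupt, try the next candidate
        return root, candidate
    raise RuntimeError("neither state file could be parsed")


def refresh_backup(state_path, backup_path):
    """Copy the state file over the backup only if the state file parses."""
    try:
        ET.parse(state_path)
    except (ET.ParseError, OSError):
        return False  # leave the older, presumably good, backup untouched
    shutil.copyfile(state_path, backup_path)
    return True
```

The point of refresh_backup is the ordering: the backup is only replaced after the current file has been shown to parse, so a corrupt file can never overwrite the last good copy.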

Daniel Schaalma
Volunteer tester
Joined: 28 May 99
Posts: 297
Credit: 16,953,703
RAC: 0
United States
Message 217700 - Posted: 19 Dec 2005, 0:58:28 UTC - in response to Message 217375.  

The psychological factor is going to be an impact with even modern machines, I think. People will buy a computer from a TV ad stating that you are getting a "Blazingly fast" computer, and it ends up looking like a snail when it can't even process at least a couple of W/U's per day. And yes, without even trickle credit, the _whole_ credit system will be screwed up. Especially if the client_state bug is not fixed before the release. Enhance the resolution, absolutely. But break up the data so that the W/U's sent out for crunching by the enhanced client can be completed in a REASONABLE amount of time.

Regards, Daniel.


W-K 666 Project Donor
Volunteer tester
Joined: 18 May 99
Posts: 19057
Credit: 40,757,560
RAC: 67
United Kingdom
Message 217783 - Posted: 19 Dec 2005, 2:48:13 UTC - in response to Message 217700.  

How many people stopped crunching for Classic when they brought out the version that increased the crunch time from a few hours to 20+? On Classic the scenario was probably even worse, as the credit system was 1 credit per unit; at least here under BOINC, if it takes longer you get proportionally more credit. I assume you, like me, just carried on regardless.

Daniel Schaalma
Volunteer tester
Joined: 28 May 99
Posts: 297
Credit: 16,953,703
RAC: 0
United States
Message 217998 - Posted: 19 Dec 2005, 14:00:02 UTC - in response to Message 217783.  

My main concern here is the client_state errors that are occurring. It is irritating enough with the current client, losing all work in progress, any finished work not yet reported, and the entire cache of waiting work. But, as I said, if one or more of my machines were many hours into processing very long workunits when the client_state file got corrupted for ANY reason, I would be furious. This file seems to get corrupted often enough that it could be devastating to the science as well as to a user's credit. Windows XP 64-Bit Edition is especially susceptible to this error. If there is a shutdown or reboot for any reason while the client is running, I can almost guarantee that the client_state file will be corrupted, and since the client overwrites the backup file with the corrupt data immediately afterward, there is no possibility of recovering the lost work. This has also happened on my Win XP Pro SP2 and Win2000 SP4 machines, and even once on my Linux box. In the last 7 months, across all of my machines, this error has occurred SIXTEEN times, and only ten of those occurrences were due to power outages. I have no idea what caused the other six. I would notice that one of the machines had not contacted the scheduler for a long time, check the machine, and see "can't parse state file" in the BOINC message tab.
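
A common way to keep a shutdown in mid-write from leaving a half-written file is to write the new contents to a temporary file, flush it to disk, and only then rename it over the old one. The Python sketch below shows the idea under stated assumptions: atomic_write is a made-up name, this is not how the BOINC client actually writes its state, and the rename is only guaranteed to be atomic on POSIX filesystems.

```python
import os
import tempfile


def atomic_write(path, data):
    """Write bytes to `path` so that a crash leaves the old or the new file.

    Hypothetical helper for illustration only.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, path)  # swap the new file in place of the old one
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)  # do not leave a stray temp file behind
        raise
```

If power is lost before the rename, the previous state file is still intact; if it is lost afterward, the new one is complete. This does not help when the drive itself reports a flush that has not really reached the platters.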

All I am saying is that this bug MUST be fixed before they even THINK about sending out workunits that can take more than 12 hours to complete.

Regards, Daniel.

W-K 666 Project Donor
Volunteer tester
Joined: 18 May 99
Posts: 19057
Credit: 40,757,560
RAC: 67
United Kingdom
Message 218010 - Posted: 19 Dec 2005, 14:17:33 UTC
Last modified: 19 Dec 2005, 14:20:29 UTC

Daniel,

Have you reported your concerns on the SetiB NC board? I notice that Eric K has responded to other problems recently.

I do have to suspect that all the occurrences are power related. If a computer is running on the edge of its power envelope, the sudden surge needed to write to disk could just put it over the edge for a fraction of a second and corrupt the file.

I've not seen anybody else report similar problems, but that doesn't mean it isn't common, or that it won't become a common complaint if Enhanced is released without a fix.

P.S. What version are you presently using? 4.11 is the latest.

ML1
Volunteer moderator
Volunteer tester
Joined: 25 Nov 01
Posts: 20283
Credit: 7,508,002
RAC: 20
United Kingdom
Message 218032 - Posted: 19 Dec 2005, 14:55:59 UTC - in response to Message 217998.  
Last modified: 19 Dec 2005, 14:56:59 UTC

... many hours into processing very long workunits when the client_state file gets corrupted for ANY reason, I would be furious. This file seems to get corrupted often enough that it could be devastating to the science as well as a user's credit. Windows XP 64-Bit Edition is especially susceptible to this error...

I agree with WinterKnight that you've likely got HARDWARE problems causing spurious operation. Either that or your Windows is infested with something nasty.

Check out your system with Memtest86+, GIMPS mprime95 torture test, and then a HDD test diagnostic utility from your HDD manufacturer. If you cannot be 100% certain that your hardware is faultless, any attempt at computing is futile.

And there's various Linux distros that have been running fully 64-bit for years now. Take a look? ;-)

Happy crunchin',
Martin
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)

uba36
Volunteer tester
Joined: 17 Jul 02
Posts: 74
Credit: 1,159,280
RAC: 0
Germany
Message 218070 - Posted: 19 Dec 2005, 15:58:00 UTC
Last modified: 19 Dec 2005, 16:26:53 UTC

I had an issue with a corrupted client_state file recently too. In my case, the machine failed to write the file client_stat_temp.xml (I don't recall the exact file name). Unfortunately this happened with CPDN, and several times in a row. I have moved the BOINC folder to another HDD and tweaked the virus scanner a little bit. I don't know exactly what solved the issue, but my machine has been running stable for 2 days now.

At first I suspected that the CPDN client (sulphur ...4.22) was to blame, but then I read in the log that the seti client was up and running and CPDN was preempted. A dying HD is also possible.

So, if it's a software issue with boinc, it should really be fixed soon.


Regards
Uba36

[edit for spelling]

Tetsuji Maverick Rai
Volunteer tester
Joined: 25 Apr 99
Posts: 518
Credit: 90,863
RAC: 0
Japan
Message 218077 - Posted: 19 Dec 2005, 16:14:52 UTC
Last modified: 19 Dec 2005, 17:04:04 UTC

I also suspect it's a hardware issue, but if it persists, you may want to report it to the boinc_dev mailing list at http://www.ssl.berkeley.edu/mailman/listinfo/boinc_dev to notify the boinc devs or ask for a solution, because it's a problem in the boinc core client.
Personally, I've encountered this problem after an improper shutdown on my Linux box with xfs. But since I changed to reiserfs, I've never seen it again, even after several improper shutdowns. I have never seen it on my Windows XP boxes either, with ntfs or with vfat (fat32).

edit: typo

zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65745
Credit: 55,293,173
RAC: 49
United States
Message 218109 - Posted: 19 Dec 2005, 17:01:42 UTC - in response to Message 217364.  
Last modified: 19 Dec 2005, 17:20:27 UTC

[snip]

I'm all for analyzing the data with higher resolution, but making the process time to complete 1 W/U take up to 10 times longer than with the STANDARD 4.18 app does not make any sense. IMO, with the Enhanced app, they should break the data into W/U's that will never exceed 4 to 6 hours using an 'average' modern computer. I would define an 'average' modern computer as an AMD64 3000+ or Intel P4 3.0 GHz or equivalent with 512 Megs RAM. If such a machine cannot crunch a single W/U in less than 6 hours, then MANY people will become disinterested in this project. Especially if the issue regarding the client_state file corruption is not resolved before the Enhanced client is released.

[snip]
Regards, Daniel.


Yeah, 4-6 hours seems more reasonable. I've heard 10x too, but some have reported as much as a 22x increase in time under the Beta (about 51 hours per WU). How would a 4x credit compensate for that huge amount of time? It wouldn't be enough. And when I and others complained, we were told: so what? We don't care, science is all-important. People want to help with Seti if it's seen as reasonable and if the results of their efforts can be seen; if not, they'll quit SetiBoinc and Seti will get a lot less help. They seem to think that everyone will help no matter how slow it goes and no matter how much is added to slow down the reporting of WUs. Some have said the minimum should be a PIII, an A64 3000 CPU, or something like that. If that were so, just think of all the PIIs and AMD XP Socket A CPUs that would quit rather than face crunching times that last far too long.
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's

W-K 666 Project Donor
Volunteer tester
Joined: 18 May 99
Posts: 19057
Credit: 40,757,560
RAC: 67
United Kingdom
Message 218124 - Posted: 19 Dec 2005, 17:23:00 UTC - in response to Message 218109.  
Last modified: 19 Dec 2005, 17:23:53 UTC

The actual crunch time on the most common angle_range units is longer by a factor of 12 on this Pentium M: Seti standard 2 hr 20 min, SetiB 28 hr. But the angle_range has a much greater influence on the variation of crunch times.

The credits can be adjusted, so I wouldn't read too much into the present claimed/granted values. They have to be approximately equal to what you can get per day on the other projects; otherwise people would either leave in droves if credit was set too low or jump on the bandwagon if it was set too high.

Why do you suggest PIIs might want to go and PIIIs remain? At one time on Classic I had a PII 400 MHz and a PIII 450 MHz on identical motherboards, and guess which one was fastest. I even swapped the CPUs over to confirm, and the PII was still the faster. And there are people still using 486s taking over 60 hrs to crunch a unit here, so length of time to crunch isn't really a factor.

Tetsuji Maverick Rai
Volunteer tester
Joined: 25 Apr 99
Posts: 518
Credit: 90,863
RAC: 0
Japan
Message 218130 - Posted: 19 Dec 2005, 17:31:30 UTC - in response to Message 218124.  


As for crunching time, there is already a hint in stderr.txt. For example, see this stderr.txt.

NumCfft, NumGauss, NumPulse and NumTriplet are closely related to crunch time. They count how many signals are scanned. NumCfft is somewhat related to how many times v_ChirpData() is called. I haven't established how to calculate crunch time exactly, but it will be of the form

crunching time = NumCfft*a + NumGauss*b + NumPulse*c + NumTriplet*d

If you find a,b,c,d, you can estimate the crunch time.

uba36
Volunteer tester
Joined: 17 Jul 02
Posts: 74
Credit: 1,159,280
RAC: 0
Germany
Message 218174 - Posted: 19 Dec 2005, 18:33:17 UTC - in response to Message 218130.  

I haven't established how to calculate crunch time exactly, but it will be of the form

crunching time = NumCfft*a + NumGauss*b + NumPulse*c + NumTriplet*d

If you find a,b,c,d, you can estimate the crunch time.


If this formula is correct, establishing a, b, c and d is possible with any statistics software, provided you have enough data for a multiple regression.
Even the Excel Solver can do this.
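
For anyone who prefers a script to a spreadsheet, the regression can be sketched in plain Python as below. This is only a sketch under assumptions: it takes the linear form of the formula at face value with no constant term, fit_coefficients is a name invented for the example, and the counts and times would have to be collected by hand from the stderr.txt output and run times of one machine.

```python
def fit_coefficients(rows, times):
    """Least-squares fit of times ~ sum(count * coefficient) per result.

    rows  -- list of [NumCfft, NumGauss, NumPulse, NumTriplet], one per result
    times -- list of measured crunch times, same length as rows
    Returns the fitted [a, b, c, d]. Hypothetical helper for illustration.
    """
    n = len(rows[0])
    # Build the augmented normal-equation matrix [A^T A | A^T t].
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
           + [sum(r[i] * t for r, t in zip(rows, times))]
           for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(aug[k][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("counts are not independent enough to fit")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for k in range(n):
            if k != col:
                factor = aug[k][col] / aug[col][col]
                aug[k] = [x - factor * y for x, y in zip(aug[k], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]
```

At least four results with genuinely different count patterns are needed, and in practice many more, since real run times are noisy and the fitted values only apply to the machine the data came from.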

Tetsuji Maverick Rai
Volunteer tester
Joined: 25 Apr 99
Posts: 518
Credit: 90,863
RAC: 0
Japan
Message 218182 - Posted: 19 Dec 2005, 18:43:52 UTC - in response to Message 218174.  
Last modified: 19 Dec 2005, 18:54:13 UTC

I believe this formula is almost correct. "Almost" means there are other factors (scanning spikes, dechirping and FFT), which should be roughly (not exactly) proportional to NumCfft, so they are absorbed into a. S@H_enhanced spends most of its time scanning signals, dechirping and doing FFTs. My guess is that b is larger than the others. So if you know a, b, c and d, you can estimate crunch time. And, as a matter of course, this formula is not valid when a result overflow happens.

So-called "short wu" has 0 NumGauss and small NumCfft and the other two are also small, and that's why it's short.

You'll be able to determine a,b,c,d using results in beta project if there's a machine which has crunched enough wu's.

zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65745
Credit: 55,293,173
RAC: 49
United States
Message 218282 - Posted: 19 Dec 2005, 21:32:55 UTC - in response to Message 218124.  

I thought they might be too slow, possibly. No flame meant; the last Intel CPU I used was a Pentium, not a PII or PIII.
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's

Purple Rabbit
Joined: 31 Aug 99
Posts: 49
Credit: 5,820,832
RAC: 3
United States
Message 218300 - Posted: 19 Dec 2005, 22:19:12 UTC - in response to Message 218124.  
Last modified: 19 Dec 2005, 22:32:30 UTC

And there are people still using 486's taking over 60hrs to crunch a unit here, so length of time to crunch isn't really a factor.


My super 486 (Pentium 58 equivalent) takes 12 days to run the current WU. It does this reliably, but slowly. It's dedicated to crunching so only power failures and operator-head-space errors would corrupt the client state file.

I doubt that it can run SETI Enhanced, but I'm going to try it just to see if it works.
ID: 218300 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester

Send message
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 218343 - Posted: 19 Dec 2005, 23:28:04 UTC - in response to Message 218300.  

It should work OK. The deadline for _enhanced is currently based on the time it would take a system with a 100 MHz CPU and a 10% resource share for setiathome_enhanced.
Joe

Daniel Schaalma
Volunteer tester
Joined: 28 May 99
Posts: 297
Credit: 16,953,703
RAC: 0
United States
Message 218435 - Posted: 20 Dec 2005, 3:03:35 UTC - in response to Message 218032.  
Last modified: 20 Dec 2005, 3:19:07 UTC

I can assure everyone that this is NOT a hardware issue. I am a computer technician who repairs computers for a living. When this issue first began happening, I did thorough hardware diagnostics on each machine it occurred on, and everything is in PERFECT working order. This is definitely a software bug.

Regards, Daniel.

[edit]I should also add that I built every computer I own with my own hands, using the highest quality parts I could get my hands on at the time of each build[/edit]

W-K 666 Project Donor
Volunteer tester
Joined: 18 May 99
Posts: 19057
Credit: 40,757,560
RAC: 67
United Kingdom
Message 218456 - Posted: 20 Dec 2005, 4:03:16 UTC - in response to Message 218435.  

OK, we can say the computer hardware is in good order. Your account shows 21 computers, and you say 10 of the 16 occurrences were down to power failures. So we are looking at only 6 unknowns, and you also say that the XP64 machine is the most susceptible.

So my logic says either the XP64 machine is susceptible to power fluctuations and accounts for most of the 10 known power-induced problems, and maybe some of the other six as well, or most of the six unexplained ones are on this machine. So what is the rating of the PSU on this computer? Are the ratings sustained or peak? Is it of good quality? If everything is OK in the PSU, then what about the motherboard? Most of the critical voltages are regulated there, so the same questions apply to this component.

CPU and memory: what happens if you adjust the voltages on these components up or down a notch?

Have you adjusted the save-to-disk option and allowed the HDD(s) to spin down between saves, so that they have to spin up again, maybe causing a dip on the 12 V line? Unless the figure is known, for any device containing a motor you should always allow 3.5 times the normal running current at startup. I.e. if the HDD spec says 12 V at 1 A, then allow an extra 2.5 A for run-up; that's an extra 30 W per HDD (12 V x 2.5 A). Why do you think the newer SATA2 spec allows for sequential start-up of HDDs?
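
The spin-up allowance is simple arithmetic, and a few lines of Python make it easy to rerun for other drives. This is only a sketch: spinup_extra_watts is a made-up name, and the 3.5x factor is the rule of thumb given above, not a measured value for any particular drive.

```python
def spinup_extra_watts(volts, running_amps, startup_factor=3.5):
    """Extra power a motor draws at startup, above its normal running draw.

    startup_factor is the rule-of-thumb multiple of running current
    to allow for at spin-up (an assumption, not a drive specification).
    """
    extra_amps = running_amps * (startup_factor - 1.0)
    return volts * extra_amps


# One 12 V, 1 A drive: 2.5 A extra at startup, i.e. 12 V * 2.5 A = 30 W.
per_drive = spinup_extra_watts(12.0, 1.0)

# Four such drives spinning up at the same moment need four times that headroom.
four_drives = 4 * per_drive
```

The second figure is why staggering the start-up of several drives eases the load on the 12 V rail.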

Just a few thoughts for you to mull over. Excuse the rambling, but as far as I know you are the only one reporting this problem, and therefore I have to ask the question: why? So I will try to help by throwing in my 40+ years as an electronics engineer and computer builder.

Don Erway
Volunteer tester
Joined: 18 May 99
Posts: 305
Credit: 471,946
RAC: 0
United States
Message 218529 - Posted: 20 Dec 2005, 7:22:42 UTC
Last modified: 20 Dec 2005, 7:30:50 UTC

Does it pass the prime95 torture test?

I found I actually had to slow my OC down by about 4-5 MHz FSB, BELOW the point where prime95 passed, to get my AMD64 system to run boinc and seti reliably. And the problem was exactly this: a bad state file, the project restarting, and the WUs being abandoned.

You can click on my name, and see my machines.

At the time, I was running seti only, no enhanced at all. Then, after I got everything totally long term stable, I attached to seti enhanced, and have never had a single problem.

Don
