Wall of Workunits (Apr 15 2008)
Author | Message |
---|---|
Matt Lebofsky Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
As mentioned yesterday, the kind folks at Adaptec/SnapAppliance replaced our server. The leading theory for its failure still points to the ribbon cable connecting the faceplate to the motherboard, but they swapped out the whole thing anyway just to be safe. The RAID devices had to be massaged a bit and then spent all night resyncing. That wrapped up around 4am, but one of the RAID1 pairs needed to be resynced again. Once that finished, I tackled the usual Tuesday database compression/backup. Since that began early this week (no reason not to, since we were already offline), it completed around 12:30pm and I started the public/beta projects. We'll be catching up for a while, I imagine. The assimilator queue blossomed again, but this (I think) was mostly due to one of the four assimilators being stuck on one particular result where the uploaded file got garbled and therefore became unparseable. I blew this result away and that one assimilator seems to have pushed through for now. Jeff is trying to debug a new problem with the splitters: despite additional smarts/logic, some are failing mid-file, unable to find the radar blanking signal. But when we look at the file by hand, we see the signal (or at least where the signal should be). Insert sound of head scratching here. In any case, if there are fewer splitters running than normal, that's why. Happy Tax Day, my U.S. compatriots. - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
Daniel Michel Joined: 2 Feb 04 Posts: 14925 Credit: 1,378,607 RAC: 6 |
From my end it seems like you guys are doing a swell job with the tools you have to work with... Now if only we could get a major band to do a benefit concert for SETI@home, you could be outfitted with more of the good stuff you need... Even a fundraising concert by a bunch of lesser-known artists could raise some significant cash. Again... thanks for keeping us updated. PROUD TO BE TFFE! |
Greg Joined: 12 Oct 07 Posts: 6 Credit: 1,031,943 RAC: 0 |
Speaking of a Wall of Workunits: I seem to have been allocated a huge slab while the download server was offline, before I realised there was something wrong and suspended my BOINC software. Is there any way to re-download these, or at least have them put back into circulation? It's just that I don't like the idea of all those people waiting for credit until May 8 (when most of them expire) when I could have them processed within a few days. Cheers, Greg |
RandyC Joined: 20 Oct 99 Posts: 714 Credit: 1,704,345 RAC: 0 |
"Speaking of a Wall of Workunits. I seem to have been allocated a huge slab while the download server was offline before I realised there was something wrong and suspended my BOINC software. Is there any way to re-download these, or at least have them put back into circulation?" The only way (currently) to handle this is to Detach that host. A Reset does not work. If you still have valid WUs you're crunching, set no-new-tasks, run the queue down, and then report them before doing the detach. |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
"Speaking of a Wall of Workunits. I seem to have been allocated a huge slab while the download server was offline before I realised there was something wrong and suspended my BOINC software. Is there any way to re-download these, or at least have them put back into circulation?" What I've been doing, which seems to be working very well: I only allow network activity for about 4 hours per day, during the evening in Berkeley. I've got my connect interval set to less than 4 hours (0.1 days seems good), and my "extra days" at about 3. That way, the cache stays pretty full, and if the project is down for the evening, my systems aren't hammering Berkeley for work. |
Neil Blaikie Joined: 17 May 99 Posts: 143 Credit: 6,652,341 RAC: 0 |
Good job again, everyone at Berkeley. Thanks for the update, Matt. Off to enjoy the small amount of remaining evening sunshine here in Montreal, and yes, it will be with a nice cold beer! |
Dr. C.E.T.I. Joined: 29 Feb 00 Posts: 16019 Credit: 794,685 RAC: 0 |
. . . Again Matt - Thanks for keeping us folks informed - and iT is Appreciated Sir - THAT also goes out to each of the others @ Berkeley too . . . BOINC Wiki . . . Science Status Page . . . |
KWSN THE Holy Hand Grenade! Joined: 20 Dec 05 Posts: 3187 Credit: 57,163,290 RAC: 0 |
Umm, Matt... the client connection stats page is on the fritz again... Hello, from Albany, CA!... |
John G Joined: 29 Dec 01 Posts: 68 Credit: 10,932,850 RAC: 0 |
"Speaking of a Wall of Workunits. I seem to have been allocated a huge slab while the download server was offline before I realised there was something wrong and suspended my BOINC software. Is there any way to re-download these, or at least have them put back into circulation?" Ditto, Greg. I have lost over 46 WUs today because of the problem. Had to reset the project, which I hate doing!!!! Cheers |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13720 Credit: 208,696,464 RAC: 304 |
"Ditto, Greg. I have lost over 46 WUs today because of the problem. Had to reset the project, which I hate doing!!!!" Why? Wait for them to be re-issued, they get crunched, you get credit. Grant Darwin NT |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
"Speaking of a Wall of Workunits. I seem to have been allocated a huge slab while the download server was offline before I realised there was something wrong and suspended my BOINC software. Is there any way to re-download these, or at least have them put back into circulation?" I didn't lose any. BOINC kept retrying, and when things came back up, it picked up the relevant files. |
Greg Joined: 12 Oct 07 Posts: 6 Credit: 1,031,943 RAC: 0 |
"Speaking of a Wall of Workunits. I seem to have been allocated a huge slab while the download server was offline before I realised there was something wrong and suspended my BOINC software. Is there any way to re-download these, or at least have them put back into circulation?" Thank you all for your assistance on this one! Detaching did the trick, and the WUs in question are dropping from my task list like flies. I took the hint from a few of you and increased my queue length a bit. Thanks, Matt and the team, for keeping up the supply! -Greg |
Clyde C. Phillips, III Joined: 2 Aug 00 Posts: 1851 Credit: 5,955,047 RAC: 0 |
Would a wall of workunits be causing this kind of problem? Task ID: 820497706; Name: 12mr08ah.19734.11115.10.8.88_2; Workunit: 253396773; Created: 19 Apr 2008 15:08:47 UTC; Sent: 20 Apr 2008 10:30:09 UTC; Received: 22 Apr 2008 15:58:14 UTC; Server state: Over; Outcome: Success; Client state: Done; Exit status: 0 (0x0); Computer ID: 2398546; Report deadline: 14 May 2008 16:43:30 UTC; CPU time: 27096.671875; stderr out: <core_client_version>5.4.11</core_client_version> <stderr_txt> Optimized SETI@Home Enhanced application. Optimizers: Ben Herndon, Josef Segur, Alex Kan, Simon Zadra. Version: Windows SSE3 32-bit based on seti V5.15 'Ni!' Rev: (R-2.4\|xP\|FFT:IPP_SSE3\|Ben-Joe). CPUID: 'Intel PD Pentium D (Presler)' cpus: 1 cores: 2 threads: 1 cache: L1=16K L2=2048K L3=0K features: mmx sse sse2 sse3 speed: 3412 MHz -- read megs/sec: L1=12564, L2=8307, RAM=4702. Work Unit Info: True angle range: 0.389240. Restarted at 78.07 percent. Spikes: 1, Pulses: 0, Triplets: 0, Gaussians: 0, Flops: 22417043295907 </stderr_txt>; Validate state: Initial; Claimed credit: 73.9615432793234; Granted credit: 0; Application version: 5.27. It is taking almost three times the normal time to crunch. |
KWSN THE Holy Hand Grenade! Joined: 20 Dec 05 Posts: 3187 Credit: 57,163,290 RAC: 0 |
"Would a wall of workunits be causing this kind of problem?" What is your normal CPU time for a ~74-credit WU? Nothing here looks out of the ordinary to me, except the "Restarted at 78 %"... possibly the WU was interrupted for another, higher-priority WU, possibly for another project. Hello, from Albany, CA!... |
Clyde C. Phillips, III Joined: 2 Aug 00 Posts: 1851 Credit: 5,955,047 RAC: 0 |
About 10,000 seconds. An idea popped into my head. Maybe I'll go in and blow off all the cooling elements, fans, etc. Maybe the computer (just the one, not the other similar machine) is throttling back because its PD950 is getting a little too hot. I'll give that a try. There are more bad units today, too. There's no other project, just Seti at present. Thanks a lot. |
KWSN THE Holy Hand Grenade! Joined: 20 Dec 05 Posts: 3187 Credit: 57,163,290 RAC: 0 |
"About 10,000 seconds. An idea popped into my head. Maybe I'll go in and blow off all the cooling elements, fans, etc. Maybe the computer (just the one, not the other similar machine) is throttling back because its PD950 is getting a little too hot. I'll give that a try. There are more bad units today, too. There's no other project, just Seti at present. Thanks a lot." Also look for anything else that might have resulted in your CPU being "throttled back" for heat: bad CPU heatsink fan, dead case fan(s), adjacent case fans blowing in opposite directions, major obstruction in airways, etc. Don't forget the fan in your power supply! (One of those has actually happened to me: the "dead case fan(s)".) Hello, from Albany, CA!... |
Clyde C. Phillips, III Joined: 2 Aug 00 Posts: 1851 Credit: 5,955,047 RAC: 0 |
Blowing out everything didn't help. I still found several bad results (long times with restarts) returned today, after blowing out the machine yesterday afternoon. All four fans are spinning at blur velocity. There can't be any adjacent case fans turning in opposite directions because there are no adjacent fans, and the machine had been crunching at normal speed up until recently. Some units are right now being done at normal speed. Maybe I could try loading a newer Boinc but that'll almost certainly freeze Seti (Simon's cruncher). Maybe I could try Crunch3r's cruncher (if that's available) instead of Simon's. |
Clyde C. Phillips, III Joined: 2 Aug 00 Posts: 1851 Credit: 5,955,047 RAC: 0 |
I finally got SpeedFan installed on both computers. The errant computer's processor is at 84C, and the OK one is at only 64C. The system fan is rotating faster in the good machine, and the "CPU0" fan is at 0 RPM in the bad machine. It looked like all the fans were turning there, but maybe there's a hidden one somewhere. I guess it's a phone call to CyberPower when convenient. Thanks. |
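
For anyone wanting to sanity-check the "almost three times" figure discussed above, here is a minimal Python sketch of the arithmetic. It uses only numbers quoted in the thread: the reported CPU time of 27096.671875 s, the poster's own rough baseline of about 10,000 s, and the two SpeedFan temperatures. The function and variable names are hypothetical, chosen for illustration, and the baseline is an estimate rather than a measured value.

```python
# Figures taken from the posts above; the baseline is the poster's rough estimate.
reported_cpu_seconds = 27096.671875   # CPU time shown on the task page
typical_cpu_seconds = 10_000.0        # "About 10,000 seconds" for a ~74-credit WU


def slowdown_factor(observed: float, baseline: float) -> float:
    """Return how many times longer the observed run took than the baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return observed / baseline


factor = slowdown_factor(reported_cpu_seconds, typical_cpu_seconds)

# SpeedFan readings quoted in the thread (degrees Celsius).
hot_machine_c = 84
ok_machine_c = 64
temp_gap_c = hot_machine_c - ok_machine_c

print(f"Slowdown: {factor:.2f}x the usual CPU time")
print(f"Temperature gap between the two machines: {temp_gap_c} C")
```

A factor near 2.7, together with a 20 C gap and a CPU fan reading 0 RPM, is consistent with the thermal-throttling explanation suggested in the thread, though the numbers alone do not prove it.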