Message boards :
Number crunching :
Panic Mode On (94) Server Problems?
HAL9000 · Joined: 11 Sep 99 · Posts: 6534 · Credit: 196,805,888 · RAC: 57

I'm only running one computer, using 2 cores of an old Q8200 CPU for CPU tasks and 2 cores feeding a single mid-range GPU, an ATI HD7870. I guesstimate mine to top out somewhere around that level, and that your GPU should, at least, put out 6.8-7.2k more than mine, which would put your machine in the range of what you are thinking as well.

SETI@home classic workunits: 93,865 · CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
Keith Myers · Joined: 29 Apr 01 · Posts: 13164 · Credit: 1,160,866,277 · RAC: 1,873

I find your RAC interesting because it is based predominantly on AP work. Have you been able to keep the system busy with just AP work, considering the whole AP database mess lately? My take on this: AP work can be considered the best bang for the buck with respect to actual credit awarded per unit of time.

Keith

Seti@Home classic workunits: 20,676 · CPU time: 74,226 hours
A proud member of the OFA (Old Farts Association)
OTS · Joined: 6 Jan 08 · Posts: 369 · Credit: 20,533,537 · RAC: 0

New message from Eric K. under Technical News.
Cosmic_Ocean · Joined: 23 Dec 00 · Posts: 3027 · Credit: 13,516,867 · RAC: 13

Well, there goes my 'consecutive valid tasks' count...

Linux laptop: record uptime 1511d 20h 19m (ended due to the power brick giving up)
Jeff Buck · Joined: 11 Feb 00 · Posts: 1441 · Credit: 148,764,870 · RAC: 0

> Well there goes my 'consecutive valid tasks' count...

Yeah, well, now you're stuck with one of those "guaranteed to fail" MB WUs, too. Join the crowd! :^)
Cosmic_Ocean · Joined: 23 Dec 00 · Posts: 3027 · Credit: 13,516,867 · RAC: 13

> Well there goes my 'consecutive valid tasks' count...

You know, I saw that. I noticed the stock app (_7) shows an autocorr count, and I've seen you guys talking about some batches that just will not run with autocorr, but all of you are doing GPUs and such, so I've been wondering whether the Lunatics CPU app will do it right or not. *shrug* Probably not, but oh well. It is what it is. Not that it really matters on that machine anyway.

I'm pretty sure the MBs it got a little while ago are going to end up erroring out for taking 2x more than the estimate. I think it's because of what I did for a pile of APs two weeks ago; the MBs it got now have really, really short estimates (like 1:02:34 for a shorty, when those usually take 3-4 hours). I thought about adjusting those too, but then I'd constantly be chasing this issue. I just need to let it sort itself out, now that there's nothing worth trying to save (the consecutive valid tasks count).
Jeff Buck · Joined: 11 Feb 00 · Posts: 1441 · Credit: 148,764,870 · RAC: 0

> ... but I've been wondering whether the Lunatics CPU app will do it right or not. *shrug* Probably not, but oh well.

Nope, it won't. Take a look at 3901319917, which was one of mine that ran to completion (an hour and 10 minutes on the CPU) before I wised up and started aborting them when I spotted them.

> ... now that there's nothing worth trying to save (the consecutive valid tasks count).

Actually, I don't think getting an Invalid will break your MB valid-task streak if you choose not to abort it. However, it will probably waste over an hour of your CPU's time getting it done. Certainly up to you as to which is more important.

Look on the bright side, though. At least you're not going to be running it on an Android device, like the poor _2 on that WU, which took over 18 hours to achieve absolutely nothing!
Cosmic_Ocean · Joined: 23 Dec 00 · Posts: 3027 · Credit: 13,516,867 · RAC: 13

I thought tasks that get marked as invalid would reset "consecutive valid" back to zero? I'm also going to be facing a few "maximum elapsed time exceeded (-177)" errors with the recent batch of MBs that I downloaded, so I know those will reset consecutive valid as well.

Like I said, when I edited the client_state to get realistic estimates for some APs to avoid that same error, I guess my DCF got scaled way down in the process (I edited the estimates to make them ~3x what they were assigned as, so it makes sense now that the estimates on new tasks are about 1/3 of what they should be). I need to just let this problem fix itself, and that's going to mean getting some errors. I could probably micro-manage and gradually shift the estimates until they sort themselves out without generating errors, but that's too much effort for the reward of keeping the consecutive-valid value from being reset. It wouldn't matter anyway, if that "doomed to fail" task is going to reset it. There's no getting out of that one, other than resetting the project and having the lost tasks resent using the stock app (if the stock app does in fact do it properly).
kittyman · Joined: 9 Jul 00 · Posts: 51468 · Credit: 1,018,363,574 · RAC: 1,004

> OK, who stepped on the Cricket? Both my links show no activity, but downloads are working. Anyone know how to find the router they're using?

LOL... I am actually relieved to find that out. I just got home from work, and the first thing I did was refresh the Cricket graph. I was a bit dismayed and thought that the project was dead in the water. After checking further, I was happy to find that MB work is flowing and the rigs have their caches full again. Meow!

"Freedom is just Chaos, with better lighting." - Alan Dean Foster
Josef W. Segur · Joined: 30 Oct 99 · Posts: 4504 · Credit: 1,414,761 · RAC: 0

> I thought tasks that get marked as invalid would reset "consecutive valid" back to zero?

Not necessarily. The "maximum elapsed time exceeded (-177)" errors are based on rsc_fpops_bound, not rsc_fpops_est, and for MB tasks the bound is set at 20x the est. Also, DCF is not used in the calculation of the time limit, so although the estimated run time has been affected by the excursion in DCF, the limit has not.

> I can probably micro-manage and gradually shift the estimates ... (if the stock app does in fact do it properly).

All app versions are doing those tasks properly: the autocorr parameters say not to do that search, and they don't. The stock CPU SaH v7 app happens to show an autocorr count of zero under those conditions; that's a correct value and does not imply that autocorr processing was done. The reason those correctly processed tasks are being judged invalid is that the SaH v7 validator code requires a best_autocorr for tasks which don't overflow.

Joe
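Joe's explanation can be put into a few lines of arithmetic. The sketch below is an illustration, not SETI/BOINC source code; the field names follow client_state.xml, and the flops and fpops figures are made-up example values:

```python
# Sketch of how the -177 "maximum elapsed time exceeded" limit relates to
# the displayed runtime estimate, per Joe's explanation. Illustrative only.

def estimated_runtime(rsc_fpops_est, flops, dcf):
    """The estimate the user sees: scaled by the duration correction factor."""
    return rsc_fpops_est / flops * dcf

def elapsed_time_limit(rsc_fpops_bound, flops):
    """The abort threshold: based on the bound only; DCF plays no part."""
    return rsc_fpops_bound / flops

flops = 2.5e9                          # host speed in FLOPS (example value)
rsc_fpops_est = 30e12                  # example estimate from the scheduler
rsc_fpops_bound = 20 * rsc_fpops_est   # MB tasks: bound fixed at 20x the est

# With DCF squashed to 1/3, the displayed estimate shrinks to a third...
print(estimated_runtime(rsc_fpops_est, flops, dcf=1/3) / 3600)  # hours shown
# ...but the kill limit is unchanged, so a task only errors out with -177
# if its real runtime exceeds 20x the uncorrected estimate.
print(elapsed_time_limit(rsc_fpops_bound, flops) / 3600)        # hours allowed
```

The asymmetry is the whole point: hand-editing estimates (and thereby DCF) distorts what the client displays, but not when it aborts.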
kittyman · Joined: 9 Jul 00 · Posts: 51468 · Credit: 1,018,363,574 · RAC: 1,004

Well, MB seems to be flowing just fine, but I wish the Crickets would wake up.....
Brent Norman · Joined: 1 Dec 99 · Posts: 2786 · Credit: 685,657,289 · RAC: 835

I caught one of those blips on the Status Page and saved it for comparison, to figure out what is going on. When wrong info is displayed on the Server Status Page and the HaveLand pages, I found that it is not random info, but info from the wrong data fields showing up on the pages.

When the page displays wrong info: (Field) = (What is Shown)

Results ready to send = Results out in the field
Current result creation rate = seems correct
Results out in the field = Workunits waiting for validation
Results received in last hour = correct
Result turnaround time = correct
Results returned and awaiting validation = (not sure; MB shows 0, should be 2,772,932; AP shows 58, should be 2,039,924)
Workunits waiting for validation = Workunits waiting for assimilation
Workunits waiting for assimilation = shows 0; definitely showing a different field
Workunit files waiting for deletion = shows 0; could be showing a different field
Result files waiting for deletion = Workunits waiting for db purging
Workunits waiting for db purging = Results waiting for db purging
Results waiting for db purging = Results returned and awaiting validation

On the small numbers (i.e. 0-58) it is hard to tell which values should be where. To me it looks like info is randomly being read wrong, or worse, data is being written wrong into the database.

EDIT: Forgot Results out in the field.
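Brent's comparison amounts to matching each displayed value against the field that should actually hold it. A minimal sketch of that matching (field names follow his table, but the numbers are placeholders; the real status page is HTML, not structured data):

```python
# Given the values a correct status page should show and the values actually
# displayed, figure out which real field each displayed slot is pulling from.
# Illustrative data only, not live server figures.

def match_shifted_fields(expected, displayed):
    """Map each displayed slot to the expected field holding its value."""
    mapping = {}
    for slot, shown in displayed.items():
        sources = [f for f, v in expected.items() if v == shown]
        # None when the value is ambiguous (small numbers like 0-58 can
        # match several fields) or matches nothing known.
        mapping[slot] = sources[0] if len(sources) == 1 else None
    return mapping

expected = {
    "Results ready to send": 581_000,
    "Results out in the field": 2_772_932,
    "Workunits waiting for validation": 140_000,
}
displayed = {
    "Results ready to send": 2_772_932,   # actually "out in the field"
    "Results out in the field": 140_000,  # actually "waiting for validation"
}
print(match_shifted_fields(expected, displayed))
```

The ambiguity branch mirrors Brent's caveat: when several fields all read 0 or 58, there is no way to tell which slot is feeding which.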
Cosmic_Ocean · Joined: 23 Dec 00 · Posts: 3027 · Credit: 13,516,867 · RAC: 13

> Not necessarily. The "maximum elapsed time exceeded (-177)" errors are based on rsc_fpops_bound, not rsc_fpops_est, and for MB tasks the bound is set at 20x the est.

Thanks, Joe. You always have the answers that explain everything. The first of those MBs ran through to completion properly. The original estimate was 1:04:23, or something very close to that, and it ended up taking over 7 hours instead (normal for that machine).

> Also, DCF is not used in the calculation of the time limit, so although the estimated run time has been affected by the excursion in DCF, the limit has not.

That's also good to know. So _bound is more or less static, and _est is determined by a combination of APR and DCF? Since the above task finished, the estimates for the rest of the cache have increased quite a bit, but still not to where they should be. Currently they're showing in the 2:35:00 range, up from 1:04:00, but not quite near 7:30:00 yet. It'll take a few more tasks for that to happen.

But at the same time, I thought that if a task took more than 10 or 20% longer than the estimate, the remaining tasks would all have their estimates changed to what that long-running task took, and if less than 10%, something like adding 10% to all the others. You've explained it before... long, long ago. I was just thinking the rest of the cache should have all become 7:30:00 upon completion of that first one.
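The half-remembered 10% rule resembles the way the BOINC client is commonly described as updating DCF: quick to rise when a task runs long, slow to fall when it runs short. A rough sketch of that rule (illustrative values; the actual client, and the v7 server-side APR-based estimation, add further wrinkles):

```python
# Sketch of an asymmetric DCF update: underestimates are corrected at once,
# overestimates are eased down gradually. Modeled on descriptions of the
# BOINC client's duration correction factor logic; values are examples.

def update_dcf(dcf, elapsed, est_uncorrected):
    raw_ratio = elapsed / est_uncorrected          # vs. estimate without DCF
    adj_ratio = elapsed / (est_uncorrected * dcf)  # vs. estimate as displayed
    if adj_ratio > 1.1:
        return raw_ratio                     # ran >10% long: jump straight up
    if adj_ratio < 0.9:
        return 0.9 * dcf + 0.1 * raw_ratio   # ran >10% short: ease down 10%
    return raw_ratio                         # close enough: track exactly

dcf = 1 / 3          # squashed by the hand-edited AP estimates
est = 9000.0         # uncorrected estimate in seconds (~2.5 h; example)
print(update_dcf(dcf, elapsed=27000.0, est_uncorrected=est))  # jumps to 3.0
```

On these numbers a single long completion snaps DCF most of the way back, which roughly matches the estimates recovering over the first couple of MB completions.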
OTS · Joined: 6 Jan 08 · Posts: 369 · Credit: 20,533,537 · RAC: 0

I wonder if poor Eric is feeling a lot like Clifford Stoll back in the late '80s: a PhD in astronomy, and all he seems to do is spend his time working on computers. Must be getting a little frustrating.
Cosmic_Ocean · Joined: 23 Dec 00 · Posts: 3027 · Credit: 13,516,867 · RAC: 13

> I was just thinking the rest of the cache should have all become 7:30:00 upon completion of that first one.

This seems to have corrected itself when the second MB task completed. All estimates look right now, and I allowed new tasks and filled the 3.5-day cache; all the new tasks have correct estimates as well. Back to letting that machine be on auto-pilot for a while now, I suppose.
Aurora Borealis · Joined: 14 Jan 01 · Posts: 3075 · Credit: 5,631,463 · RAC: 0

Cricket is alive. Someone found an ink cartridge.
Grant (SSSF) · Joined: 19 Aug 99 · Posts: 13746 · Credit: 208,696,464 · RAC: 304

> Cricket is alive. Someone found an ink cartridge.

Cause for celebration. Still getting random weirdness on the Haveland graphs, though.

Grant
Darwin NT
Jeff Buck · Joined: 11 Feb 00 · Posts: 1441 · Credit: 148,764,870 · RAC: 0

I'm curious to see what the splitters will do when they get to "tape" 24se13ad. My records show that my machines already processed 287 tasks from that file just last March. They were all MB v7 tasks, however; I didn't get any APs from it. Looks like it should be the next "tape" in line, but I think I'll have to wait until morning to see what happens.
S@NL Etienne Dokkum · Joined: 11 Jun 99 · Posts: 212 · Credit: 43,822,095 · RAC: 0

Something's alive. APs started validating again...
kittyman · Joined: 9 Jul 00 · Posts: 51468 · Credit: 1,018,363,574 · RAC: 1,004

> Something's alive. APs started validating again...

Holy crap, did they ever.... And the kitties have some crickets to chase again! Meow!
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.