Message boards :
Number crunching :
Shorties estimate up from three minutes to six hours after today's outage!
kittyman Send message Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
No, that was a question from me... ;-) Oh, but it's so much more fun to test it live....LOL. "Freedom is just Chaos, with better lighting." Alan Dean Foster |
Dave Stegner Send message Joined: 20 Oct 04 Posts: 540 Credit: 65,583,328 RAC: 27 |
Mark, Are you sure this is a "live" test...seems to me it is more like DOA. Dave |
kittyman Send message Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
Mark, Not until our dear Dr. Anderson says it is............ "Freedom is just Chaos, with better lighting." Alan Dean Foster |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
I suggest people get ready to hit the NNT button..... Well, I've been able to get some work. Problem is that it's all crunched before I can download more. Grant Darwin NT |
Dave Stegner Send message Joined: 20 Oct 04 Posts: 540 Credit: 65,583,328 RAC: 27 |
Richard, Why screw up 250,000 users for a rare problem?? Or at least fix it with MUCH less impact on the portion that is not a problem. Dave |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
Hang on while I go and find the threads. OK, go and have a read of Average processing rate - a little high? "Be careful what you wish for" Now please excuse me while I go and try to compose an email for David that he will actually listen to - yesterday's efforts don't seem to have elicited a response yet. |
W-K 666 Send message Joined: 18 May 99 Posts: 19062 Credit: 40,757,560 RAC: 67 |
Hang on while I go and find the threads. But I wished for a competent fix, not a band-aid. |
kittyman Send message Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
Hang on while I go and find the threads. Well.....best of luck, my friend. Even though not running smoothly, I've got work to crunch and will let the rigs get on with struggling with Boinc whilst I sleep with the kitties. As the saying goes....... 'This too, shall pass.' Meow. "Freedom is just Chaos, with better lighting." Alan Dean Foster |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
Why screw up 250,000 users for a rare problem?? It's far fewer than that. The fix only affects users of optimised applications - something of the order of 10,000 users, the last time we checked the download statistics. As to why? I'm sure it wasn't intentional. But a fix, developed - as I said, in good faith - had side-effects when deployed untested on a live production server. |
Dave Stegner Send message Joined: 20 Oct 04 Posts: 540 Credit: 65,583,328 RAC: 27 |
Hang on while I go and find the threads. Kidney stones pass too, but they are not what you would wish for. Dave |
kittyman Send message Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
Hang on while I go and find the threads. I did say something a number of posts ago about it being painful...LOL. G'night all, and good luck. "Freedom is just Chaos, with better lighting." Alan Dean Foster |
W-K 666 Send message Joined: 18 May 99 Posts: 19062 Credit: 40,757,560 RAC: 67 |
Because it is not as rare as Richard might assume; it will hit everybody attaching a new or re-attached computer who chooses to do MB and AP. It messed things up as badly as what we are seeing now because of the wide difference in processing times, and therefore the big difference in the time taken to get 10 tasks validated - i.e. MB gets to APR timings long before AP does. In my case, as soon as the MB CUDA app got a working APR, the DCF bounced from 1.xx to 10.xx every time an AP task completed. It was still bad enough just before this ill-conceived change that when an AP task completed the DCF rose above 2.xx - and that is 6 weeks after the host was re-attached. |
Claggy Send message Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
Hang on while I go and find the threads. There's been another changeset, 24217, which sounds like it'll work better: - scheduler: revise [21428] to include non-anonymous-platform. Claggy |
Dave Stegner Send message Joined: 20 Oct 04 Posts: 540 Credit: 65,583,328 RAC: 27 |
I don't want to start a war, but I disagree. About 5 weeks ago I installed CUDA cards in 3 of my machines. I started them all out with stock and changed to opti apps after 3 or 4 days. All were running GPU MB and CPU MB & AP from the get-go. Initially estimates were off, as I had not yet completed 10 validations. As soon as I completed 10 - a few days - estimates were accurate and remained so for a month, until this change. Maybe something else is going on, especially since you are still having issues after 6 weeks. I did read the thread listed above. Dave |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
And, when SAH_v7 is deployed here in October or November, it will become not rare at all. Common, in fact. Universal. Because EVERY user will be starting from a clean APR slate for what we now call MB. Email sent: David, |
[DPC] hansR Send message Joined: 14 Jul 00 Posts: 47 Credit: 235,829,569 RAC: 8 |
Every now and then I'm receiving 1 or 2 WUs. I'm running 3 at a time on my GTX 570. After every request there is a 5-minute backoff, but it takes just 2-3 minutes to finish the received WU(s). The system is not able to connect to the internet between 22:00 and 5:00. I think my card will become lazy .. Just wait and see ...... |
LadyL Send message Joined: 14 Sep 11 Posts: 1679 Credit: 5,230,097 RAC: 0 |
Hang on while I go and find the threads. Yesterday's effort (which was only a whisker short of a rolling pin) yielded the desired response of having David listen to Joe's advice on what might be good numbers (see the new changeset posted). From boinc_dev: "I changed the ratio limit from 2 to 10," Perfectly timed to wreak havoc over the weekend. Anybody attached to Beta, please monitor your incoming work closely, to see if the change has brought the desired effect - i.e. task duration estimates back to more realistic values. We don't want the change to hit Main if it doesn't. Especially if it is now going to affect not only the estimated 10k users on anonymous platform (some 4k of them with a v0.38 Lunatics installer) but also stock. Winterknight wrote: "This is a BOINC change. Seti Beta is for testing Seti applications, although there is nothing to stop Dr. A doing tests at SetiBeta. It is not wise to cloud the issues of the testing at Beta with possible errors introduced by changes to BOINC." That point was addressed as well. That's probably why we have a one-day window between Beta and Main deploy - balancing 'fixing ASAP' with 'checking it actually works'. Can anybody predict whether these changes will impact the 'DCF squared' problem? We expect the v7 release in late autumn/early winter, and then EVERYBODY will start with a fresh slate - DCF squared will need addressing before that. DCF squared is the problem experienced when attaching a new host or swapping to anonymous platform, resulting in a new entry on the application details page, tracking the APR of the CPU/GPU to be used in task duration estimates. Tasks first come in with the far too high initial estimate; over the next few tasks, DCF on the client adjusts to small values to get estimates down. After the 10th VALIDATION, APR kicks in - expecting a client DCF of 1! And BANG! DCF squared - tasks are suddenly estimated much, much too short, the host overfetches, etc. etc. The first task of the new batch (if it doesn't get killed by a -177 'ran longer than 10x estimate' error) pushes DCF back up towards 1, but by then the damage has often been done. NB: if you run anon, inserting <flops> will circumvent this problem. |
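[Editor's note] The 'DCF squared' sequence LadyL describes can be sketched with a few lines of arithmetic. This is a simplified model, not BOINC's actual code, and the speeds below are invented for illustration (a GPU ten times faster than its benchmark rating would suggest):

```python
# Simplified model of BOINC task duration estimation:
#     estimate = wu_fpops_est / speed * DCF
# All numbers below are illustrative assumptions, not measurements.
wu_fpops_est = 3.0e13     # server's FLOP estimate for one task
benchmark_speed = 2.5e9   # speed assumed before any APR exists (FLOPS)
true_speed = 2.5e10       # the GPU's real throughput: 10x faster

# Phase 1: before the 10th validation the server estimates from the
# benchmark speed, so tasks look ~10x longer than they really are.
initial_est = wu_fpops_est / benchmark_speed   # 12000 s
actual_runtime = wu_fpops_est / true_speed     # 1200 s

# The client's DCF adapts downward to correct the estimates.
dcf = actual_runtime / initial_est             # -> 0.1

# Phase 2: after the 10th validation, APR kicks in server-side and the
# raw estimate becomes correct -- but the client still multiplies by
# its small DCF, so the correction is applied twice ("DCF squared").
apr_est = wu_fpops_est / true_speed            # 1200 s (now correct)
client_est = apr_est * dcf                     # ~120 s: 10x too short

print(f"initial estimate: {initial_est:.0f} s, actual: {actual_runtime:.0f} s")
print(f"post-APR client estimate: {client_est:.0f} s")
```

With estimates ten times too short, a cache sized in hours of work fetches roughly ten times too many tasks - which matches the overfetch and -177 aborts described in the post.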
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
Yes, David's interim fix was posted and announced while we were discussing matters here. Neither the changeset nor the email had been released at the time I entered this discussion - and, to be honest, given the time-zones involved, I didn't expect it: I thought we'd already missed the 'Wednesday Window' for changes. The interim fix is, IMHO, precisely that - a modest change that will take us back part-way towards the status quo ante. Joe's suggested ratio of 10 was derived from, and is reasonable for, optimised CPU crunching. My GTX 470, for example - at 18 months old, no longer state of the art - needs a ratio of around 80. The 'DCF squared' problem is comparatively minor for users of the stock application delivery method. It's all over within the runtime of a single task, because the revised APR estimates are applied globally to all tasks in the cache, including the currently-running one. So, on the conclusion of the current task, local DCF is reset and the 'squared' drops out of the equation. DCF squared is more of a problem for users of the 'anonymous platform' delivery method. There, the APR values are applied singly, one task at a time - and only to newly-downloaded work. Previously cached tasks retain their old estimates, and thus DCF squared persists for as long as it takes to work through that cached work. All of this has major implications for the expected autumn rollout of v7. • There'll be no pre-existing project-wide averages to 'seed' the initial estimates. • Everyone will be starting their own APR record from scratch. • Downloading tasks will be even harder than usual, because everyone will be downloading new applications (and, in a lot of cases, very large CUDA runtime/fft DLLs) at the same time. • We have no idea yet how those factors will affect the time it takes to reach the crucial tenth validation. Be prepared for a bumpy ride... |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
NB if you run anon, inserting <flops> will circumvent this problem. I thought the reason for the server side adjustments was so we didn't have to do all that stuffing around? And it did involve a lot of stuffing around. Grant Darwin NT |
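[Editor's note] For anyone wanting to try the `<flops>` workaround mentioned above: it goes inside each `<app_version>` block of `app_info.xml`, the file that defines an anonymous-platform setup. The fragment below is a sketch only - the application name, version number, plan class, and file name are placeholders that must match your own installation, and the flops value should be your card's measured sustained throughput, not this invented figure:

```xml
<!-- Hypothetical fragment of app_info.xml (anonymous platform).
     Supplying <flops> pins the task duration estimate to this speed,
     bypassing the benchmark/APR adjustment that causes DCF squared. -->
<app_version>
    <app_name>setiathome_enhanced</app_name>
    <version_num>608</version_num>
    <flops>3.0e10</flops>  <!-- ~30 GFLOPS: illustrative value only -->
    <plan_class>cuda</plan_class>
    <coproc>
        <type>CUDA</type>
        <count>1</count>
    </coproc>
    <file_ref>
        <file_name>setiathome_CUDA.exe</file_name>
        <main_program/>
    </file_ref>
</app_version>
```

A reasonable starting value is (typical task FLOP estimate) divided by (your observed runtime in seconds) for that application.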
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.