Panic Mode On (108) Server Problems?
Author | Message |
---|---|
David@home Joined: 16 Jan 03 Posts: 755 Credit: 5,040,916 RAC: 28 |
Great news that the SETI team managed to solve the database problem. Had to do a manual update as BOINC Manager was in a 4-hour deep sleep. Picked up GPU work only, but hopefully will pick up CPU work as the splitters catch up with demand. Wonder who the lucky ones were that got a cache of all those Astropulse redos that were building up; I missed out on those. |
Sid Joined: 12 Jun 07 Posts: 16 Credit: 10,968,872 RAC: 0 |
167 on both Linux and Windoze machines ... looks like we're back. |
kittyman Joined: 9 Jul 00 Posts: 51478 Credit: 1,018,363,574 RAC: 1,004 |
Well. Meowmeowmeow. Middle of a Friday night and Seti comes back online? Kudos to whoever was working on things this late!! Thankyouthankyouthankyou! Meow! "Time is simply the mechanism that keeps everything from happening all at once." |
kittyman Joined: 9 Jul 00 Posts: 51478 Credit: 1,018,363,574 RAC: 1,004 |
But, as expected, there's trouble in paradise. Well, I am hoping the success was not that short lived, and the SSP snag is just due to the heavy load things must be under. Kitties are hopeful. Meow. "Time is simply the mechanism that keeps everything from happening all at once." |
kittyman Joined: 9 Jul 00 Posts: 51478 Credit: 1,018,363,574 RAC: 1,004 |
But, as expected, there's trouble in paradise. I prefer to thank the kitties. Meow! "Time is simply the mechanism that keeps everything from happening all at once." |
Jimbocous Joined: 1 Apr 13 Posts: 1856 Credit: 268,616,081 RAC: 1,349 |
All nice, full caches on all machines. Definitely came roaring back ... |
kittyman Joined: 9 Jul 00 Posts: 51478 Credit: 1,018,363,574 RAC: 1,004 |
Jimbocous wrote: "All nice, full caches on all machines. Definitely came roaring back ..." Then you got lucky and hit the servers just after they came back up. Now it's going to be like the server lottery trying to get work with all the hungry computers to feed. And THAT depends on things staying glued together under the heavy load. Meowpatience. "Time is simply the mechanism that keeps everything from happening all at once." |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
It's unfortunate that, as happened so often after past outages, the luck of the draw has thrown tapes full of Arecibo shorties into the splitter just as we need a steady supply of good, chewy, work. One of my GPU machines has finally filled its 200 task queue, and it's got precisely 100 shorties and 100 guppies - nothing in between. |
Advent42 Joined: 23 Mar 17 Posts: 175 Credit: 4,015,683 RAC: 0 |
Yeah!! The search continues...:-) |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
"It's all fine. The project will be back in January. Maybe a little earlier than that." Which January? . . :) Stephen :) |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
"I too have wondered why SETI has stuck with the extremely long deadlines I assess were implemented for the original hardware used on the project. That kind of hardware is 18 years in the past and does not need to continue to be supported. I agree with you Jeff, I would expect the sizes of databases and the strain they put on the project would be greatly lessened if the deadlines were reduced by a month, let's say from the current 2 month deadline." . . That would get my support :) Stephen .. |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
"I too have wondered why SETI has stuck with the extremely long deadlines I assess were implemented for the original hardware used on the project. That kind of hardware is 18 years in the past and does not need to continue to be supported. I agree with you Jeff, I would expect the sizes of databases and the strain they put on the project would be greatly lessened if the deadlines were reduced by a month, let's say from the current 2 month deadline." . . Hi Mark, . . I have heard that reasoning before, but there is no rational reason for any rig, no matter how slow, to download more work than it can process in a month. If the gear can only process 2 tasks per week, then set the cache so that you only download half a dozen jobs, and a one month deadline is still not an issue. Since every time you upload/report results you get fresh work (OK I'm an optimist 8^} ) such a rig would still be productive. Why should any rig have 100 tasks cached if it would take that rig 6 months to process them? I completely agree that such disproportionate downloading makes a great argument for shorter deadlines. Stephen .. |
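The cache-versus-deadline arithmetic in the post above can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration, not anything the BOINC client or the SETI@home servers actually run; the function names `days_to_clear_cache` and `max_safe_cache` are made up for the example, and the host is assumed to crunch at a constant rate.

```python
import math


def days_to_clear_cache(cached_tasks: int, tasks_per_week: float) -> float:
    """Days a host needs to finish every cached task at a constant rate."""
    if tasks_per_week <= 0:
        raise ValueError("tasks_per_week must be positive")
    return cached_tasks * 7.0 / tasks_per_week


def max_safe_cache(tasks_per_week: float, deadline_days: float) -> int:
    """Largest cache the host can finish before the deadline (rounded down)."""
    return math.floor(tasks_per_week * deadline_days / 7.0)


# The slow rig from the post: 2 tasks per week, half a dozen cached jobs.
print(days_to_clear_cache(6, 2))    # 21.0 days, inside a one month deadline
print(max_safe_cache(2, 30))        # 8 tasks is the most it could finish in 30 days
print(days_to_clear_cache(100, 2))  # 350.0 days for a 100-task cache
```

At 2 tasks per week, half a dozen jobs clear in three weeks, while a 100-task cache would take most of a year, which is the disproportion the post is pointing at.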
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
"Don't forget it isn't just the raw crunching time you need to consider for deadlines - it's also all the dead time when the computer is switched off or in use. And for Android, when it's away from the charger." . . But if any given device cannot process a single task within a month, whether due to insufficient crunching power or lack of run time, then is that device really fit for purpose ?? Stephen ?? |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
Stephen wrote: "Why should any rig have 100 tasks cached if it would take that rig 6 months to process them?" If the pattern is persistent, it wouldn't be able to. Work is requested by time (use the <sched_op_debug> flag, and really read the Event Log): the maximum time request is 20 days, not 6 months. |
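For anyone wanting to try the `<sched_op_debug>` flag mentioned above: the BOINC client reads its logging flags from a `cc_config.xml` file in the BOINC data directory, inside a `<log_flags>` section. The Python sketch below just builds that XML as a string so the layout is visible. The helper name `build_cc_config` is invented for this example, and where the file lives and how the client is told to re-read it depend on your installation, so treat those details as assumptions to check against the BOINC documentation.

```python
import xml.etree.ElementTree as ET


def build_cc_config(flags):
    """Return cc_config.xml text with each named log flag switched on.

    Hypothetical helper for illustration; it only builds the XML and
    does not touch the BOINC data directory.
    """
    root = ET.Element("cc_config")
    log_flags = ET.SubElement(root, "log_flags")
    for name in flags:
        ET.SubElement(log_flags, name).text = "1"
    return ET.tostring(root, encoding="unicode")


xml_text = build_cc_config(["sched_op_debug"])
print(xml_text)
# <cc_config><log_flags><sched_op_debug>1</sched_op_debug></log_flags></cc_config>
```

If a `cc_config.xml` already exists, it is safer to add the flag to the existing `<log_flags>` section by hand than to overwrite the file with generated text.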
Jeff Buck Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
Given the arguments in favor of the roughly 8-week deadlines for normal AR and VLAR MB tasks, in order to accommodate even the most laggardly of hosts, can anyone then explain the 3-week deadlines for AP tasks? On my machines, regardless of the CPU or GPU or OS, AP tasks take longer to run than the longest-running of those MB tasks, in some cases, 2 or 3 times as long. If 3 weeks is an adequate deadline for APs, why not for MBs? Mind you, I'm not advocating for that short a deadline for either of those categories of tasks, but that's always struck me as a glaring inconsistency. |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Stephen wrote: "Why should any rig have 100 tasks cached if it would take that rig 6 months to process them?" Richard Haselgrove wrote: "If the pattern is persistent, it wouldn't be able to. Work is requested by time (use the <sched_op_debug> flag, and really read the Event Log): the maximum time request is 20 days, not 6 months." . . But that is the system's weakness. If a rig can crunch a WU in 2 hours (say an old i7 using just one CPU core) and they set their work request to the 20 day maximum allowed, they will get the full server-limited allocation of 100 tasks, even if they are returning only a few per week or only invalid results. I have come across many wingmen like that. If the allocation were made on average return time as others have suggested, rather than average run time, then the allocation numbers would be appropriately reduced. . . BTW, just how do you set flags in the event log?? Stephen . . |
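The difference between allocating by run time and allocating by return rate can be put into numbers. The sketch below is a toy model, not the real BOINC scheduler, whose work-fetch logic is considerably more involved. The function names are hypothetical, the 100-task cap and the 2-hour run time come from the post, and "3 tasks returned per week" is an assumed stand-in for "a few per week".

```python
SERVER_TASK_LIMIT = 100  # per-host cap quoted in the thread


def tasks_by_run_time(request_days, hours_per_task, limit=SERVER_TASK_LIMIT):
    """Tasks sent if the request is filled purely from per-task run time."""
    wanted = int(request_days * 24 / hours_per_task)
    return min(wanted, limit)


def tasks_by_return_rate(request_days, returned_per_week, limit=SERVER_TASK_LIMIT):
    """Tasks sent if the request is filled from the observed return rate."""
    wanted = int(request_days * returned_per_week / 7)
    return min(wanted, limit)


# 20-day request, 2 hours per task: 240 tasks wanted, capped at 100.
print(tasks_by_run_time(20, 2))     # 100
# Same request, but the host only returns about 3 tasks a week.
print(tasks_by_return_rate(20, 3))  # 8
```

Under these assumptions the same host goes from a full 100-task allocation to 8 tasks, which is the kind of reduction the post argues for.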
Richard Haselgrove Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
Jeff Buck wrote: "Given the arguments in favor of the roughly 8-week deadlines for normal AR and VLAR MB tasks, in order to accommodate even the most laggardly of hosts, can anyone then explain the 3-week deadlines for AP tasks? On my machines, regardless of the CPU or GPU or OS, AP tasks take longer to run than the longest-running of those MB tasks, in some cases, 2 or 3 times as long. If 3 weeks is an adequate deadline for APs, why not for MBs? Mind you, I'm not advocating for that short a deadline for either of those categories of tasks, but that's always struck me as a glaring inconsistency." No, but... Some of it is covered in the (extremely ancient) Astropulse FAQ page. Astropulse had been around as a concept for some years, but this particular implementation was written as a grad-student project by Josh von Korff. Because it formed part of his examined coursework for whichever degree it was, Eric - as his supervisor - very deliberately: (a) left him to work out the solutions to his own problems, and (b) required him to handle deployment, snagging, and dealing with user feedback as part of his training. Josh got it working and deployed, passed his degree, and moved on to continue his academic career at another institution. I suppose the justification at the time (about 10 years ago, just before GPUs were added to the crunching mix) was that you could opt in or out of AP: only those with the most powerful CPUs would choose to opt in, and others with deadline trouble could content themselves with MB only. |
Jeff Buck Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
Richard Haselgrove wrote: "No, but..." Actually, it looks like it was, at least initially, only an opt-out choice, unless you were running optimized apps. All others got AP tasks automatically, if their systems met the requirements. I'm intrigued by a couple of statements in there. First, that "The initial deadline for Astropulse tasks will be 14 days.", and then there's "... If our server judges that your computer cannot complete an Astropulse workunit in 22.5 days (75% of the maximum 30 days)...". So, the 14 days apparently got bumped up early on, but what's the meaning of "maximum 30 days"? Was that the maximum for any S@h task in those olden days? If so, how did we jump to 8 weeks, even as CPUs got faster and GPUs came into the mix? The bottom line, to me, is that decisions regarding task deadlines are among those that were made a long time ago, and no longer take into account the processing environment as it exists today, both in terms of end-user hardware and the project's periodic database woes. Many aspects of the project are moving forward. These legacy decisions should be revisited to evaluate whether the reasons behind them still make sense. |
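The 22.5-day figure quoted from the FAQ is simply 75% of the 30-day maximum. A minimal Python sketch of that threshold follows. The FAQ passage is truncated in the quote, so how the server estimates a host's completion time and what it does with the result are not shown here; the function names are hypothetical.

```python
def astropulse_cutoff_days(max_deadline_days: float = 30.0,
                           fraction: float = 0.75) -> float:
    """Cutoff quoted in the FAQ: a fraction of the maximum deadline."""
    return max_deadline_days * fraction


def fits_within_cutoff(estimated_days: float) -> bool:
    """True if an estimated completion time is within the quoted cutoff."""
    return estimated_days <= astropulse_cutoff_days()


print(astropulse_cutoff_days())   # 22.5
print(fits_within_cutoff(14.0))   # True
print(fits_within_cutoff(25.0))   # False
```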
Grant (SSSF) Joined: 19 Aug 99 Posts: 13854 Credit: 208,696,464 RAC: 304 |
Jeff Buck wrote: "So, the 14 days apparently got bumped up early on, but what's the meaning of 'maximum 30 days'? Was that the maximum for any S@h task in those olden days? If so, how did we jump to 8 weeks, even as CPUs got faster and GPUs came into the mix?" Those numbers were probably based on the original pre-BOINC Seti work times, then beefed up for the original BOINC Seti. Since then, there have been several versions of Seti, each one involving more processing than the previous one and longer runtimes than the previous version for given hardware. Hence the long deadlines, based on the crunching time for that much older hardware. Jeff Buck wrote: "The bottom line, to me, is that decisions regarding task deadlines are among those that were made a long time ago, and no longer take into account the processing environment as it exists today, both in terms of end-user hardware and the project's periodic database woes. Many aspects of the project are moving forward. These legacy decisions should be revisited to evaluate whether the reasons behind them still make sense." That's the best argument yet for changing the deadlines IMHO. Current basic Android devices would be on par with what was a high-end P4, computation-wise. More recent Android devices are not only higher performing, but multi-core as well; let alone current CPUs with AVX, AVX2 and IPC (instructions per clock) improvements. And then there are GPUs. Many of the deadlines were based on WU run times for what was the low-end hardware of the day. Current low-end hardware matches, or even exceeds, high-end hardware of that period. Grant Darwin NT |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
Jeff Buck wrote: "The bottom line, to me, is that decisions regarding task deadlines are among those that were made a long time ago, and no longer take into account the processing environment as it exists today, both in terms of end-user hardware and the project's periodic database woes. Many aspects of the project are moving forward. These legacy decisions should be revisited to evaluate whether the reasons behind them still make sense." I totally agree. But they need to be rational, considered revisitations, taking into account the needs of everyone - the project itself, the users who post here, the users who don't post here, the users with the latest hardware, the users with one clunky hand-me-down, the users who are exclusively dedicated to SETI, the users who spread themselves thinly across multiple projects..... And everyone in between. What the project needs most of all is time to think, and data to base their decisions on (which means fixing those broken webpages, like client types and science status, which haven't updated since before Matt Lebofsky was diverted away from us). And that means more people. And more people means more money. Thinking caps on, please. |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.