The Server Issues / Outages Thread - Panic Mode On! (119)
Author | Message |
---|---|
Richard Haselgrove Joined: 4 Jul 99 Posts: 14690 Credit: 200,643,578 RAC: 874 |
Sweet dreams. I'm on the other end of that timeline - just starting the day. More coffee, to dispel the traces of last night's sleepyjuice. |
AllgoodGuy Joined: 29 May 01 Posts: 293 Credit: 16,348,499 RAC: 266 |
I'm supposed to be on that side of the world myself. I actually live in Thailand (long story), but I'm back in the US doing something I didn't think I'd have to: complete a degree. It seems the Thai government won't let most foreigners work over there without a STEM degree. I've never needed one, because I have at least half a dozen certifications from the major vendors, not to mention 30 years' experience with nearly every main operating system, up to supercomputers running TRIX or Secure Solaris (now Solaris with security extensions), but the people who write laws don't understand how people in the computer industry actually get jobs. So I find myself in Monterey doing JC stuff and trying to get into Berkeley. So... have a good day, sir, and enjoy that coffee. Guy |
Ville Saari Joined: 30 Nov 00 Posts: 1158 Credit: 49,177,052 RAC: 82,530 |
How can this be possible? https://setiathome.berkeley.edu/result.php?resultid=8682469710 The server is sending the host GPU work that its advertised GPU can't run. And the host is not using the anonymous platform! |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14690 Credit: 200,643,578 RAC: 874 |
The staff team don't have the time - have never had the time - to program the servers with every transient edge-case, like 'bad driver #xx from manufacturer y'. They don't - can't - bother to even try. So the system only works at the broad-brush level: got an ATI card? Here's an ATI app. The BOINC system as a whole is designed - not necessarily well designed, but designed - to tolerate the small number of failures that drop through the cracks, without losing any science. That task will be finished off by somebody else. |
Ville Saari Joined: 30 Nov 00 Posts: 1158 Credit: 49,177,052 RAC: 82,530 |
"The staff team don't have the time - have never had the time - to program the servers with every transient edge-case, like 'bad driver #xx from manufacture y'. They don't - can't - bother to even try." Seti@Home staff is not even responsible for that, but the Boinc devs. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14690 Credit: 200,643,578 RAC: 874 |
"Seti@Home staff is not even responsible for that but the Boinc devs." No, this one is down to SETI. The BOINC software provides the tools to make the app selection as precise as you like, to match the hardware - see Specifying plan classes in C++. But it's up to the project - SETI - to specify exactly what the rules are for their particular application set. We have an exceptionally wide set of applications to choose from, and an exceptionally complex set of rules for what will run on what. In this particular example case, the app selected through that process was correct for the broad category of hardware (ATI), but wrong for the specific case (model too old). That would be a SETI-specific rule, if the SETI staff had the time to write it. They didn't, and don't. |
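To illustrate the difference between a broad-brush rule and a model-specific one, here is a minimal sketch in Python. It is not BOINC's plan-class API: `Host`, `gpu_generation`, `MIN_GENERATION`, `broad_rule` and `specific_rule` are all hypothetical names invented for this example, and the cut-off value is arbitrary.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Hypothetical description of what a host advertises to the scheduler."""
    gpu_vendor: str
    gpu_generation: int  # hypothetical "how old is this model" number


# Hypothetical cut-off: generations below this cannot run the app.
MIN_GENERATION = 5


def broad_rule(host: Host) -> bool:
    """Broad-brush check: got an ATI card? Then the ATI app is eligible."""
    return host.gpu_vendor == "ATI"


def specific_rule(host: Host) -> bool:
    """Narrower check a project could add: right vendor AND new enough model."""
    return broad_rule(host) and host.gpu_generation >= MIN_GENERATION


old_card = Host(gpu_vendor="ATI", gpu_generation=3)
new_card = Host(gpu_vendor="ATI", gpu_generation=7)
```

With only the broad rule, `old_card` is sent work it cannot run; the narrower rule would exclude it, at the cost of someone having to write and maintain that rule.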
W-K 666 Joined: 18 May 99 Posts: 19724 Credit: 40,757,560 RAC: 67 |
If you want to know about the BOINC code from over 6 years ago, then you might find some links here at RomWorld. An example is BOINC Client: The evils of 'Returning Results Immediately' |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14690 Credit: 200,643,578 RAC: 874 |
Good one! I think he wrote that in response to what I was posting at the time. |
Ville Saari Joined: 30 Nov 00 Posts: 1158 Credit: 49,177,052 RAC: 82,530 |
That confirms that I did the right thing when I modded my client to not report any results if it has reported any in the last 30 minutes. It contacts the scheduler whenever Boinc thinks it wants new work, which is every 5 minutes (the server-specified cooldown) because my GPU chews through a single task in a lot less than 5 minutes, but it only reports results on approximately every sixth contact. |
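The throttling described above can be sketched roughly as follows. This is a standalone Python illustration, not the actual client modification: `ReportThrottle`, `should_report` and `simulate` are hypothetical names, and only the 30-minute gap and the 5-minute cooldown come from the post.

```python
from dataclasses import dataclass
from typing import List, Optional

REPORT_GAP_S = 30 * 60       # minimum gap between reports (from the post)
CONTACT_INTERVAL_S = 5 * 60  # server-specified cooldown between contacts


@dataclass
class ReportThrottle:
    """Hypothetical helper: decides whether a contact may report results."""
    min_gap_s: int = REPORT_GAP_S
    last_report_s: Optional[float] = None

    def should_report(self, now_s: float, pending_results: int) -> bool:
        # Nothing finished, nothing to report.
        if pending_results == 0:
            return False
        # Reported too recently: this contact only fetches work.
        if (self.last_report_s is not None
                and now_s - self.last_report_s < self.min_gap_s):
            return False
        self.last_report_s = now_s
        return True


def simulate(contacts: int) -> List[bool]:
    """One contact every 5 minutes, always with a finished task waiting."""
    throttle = ReportThrottle()
    return [throttle.should_report(i * CONTACT_INTERVAL_S, pending_results=1)
            for i in range(contacts)]
```

Under these assumptions, `simulate(13)` reports on contacts 0, 6 and 12, which is the "every sixth contact" pattern, while work fetch still happens on every contact.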
Richard Haselgrove Joined: 4 Jul 99 Posts: 14690 Credit: 200,643,578 RAC: 874 |
At the time Rom wrote that blog, the automatic reporting interval - the maximum wait time for a completed task - was 24 hours. The default reporting interval for a standard client is now 1 hour. I set my machines, where appropriate, to 'Store up to an additional 0.05 days of work' - just over an hour. Sometimes the client gets hungry first, and sometimes the maximum reporting interval is reached first. Either way, both reporting and work fetch are combined into a single database interaction. |
Ville Saari Joined: 30 Nov 00 Posts: 1158 Credit: 49,177,052 RAC: 82,530 |
"Either way, both reporting and work fetch are combined into a single database interaction." If that blog is to be believed, then work fetch takes a fixed number of db queries per task regardless of how many you get at a time. I have also observed that when the servers were heavily loaded after Tuesday downtimes (back when we still had them), the more tasks you asked for, the more likely it was for the entire scheduler request to fail. That's why I modified the reporting interval only, not the work fetch interval. Also, grabbing a lot of tasks at once can mean the next host hitting the scheduler after you gets nothing. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14690 Credit: 200,643,578 RAC: 874 |
Again, fair comment. My machines are, in general, not requesting a huge amount of work at the end of the hour. If there's a shortage, they repeat the hourly top-up request five minutes later, and usually catch up quite quickly and resume the pattern. I'm trying to remember what I might have been posting to provoke Rom's blog. I like to think it might have been returning/reporting work quickly, so it could be assimilated and purged as quickly as possible, keeping the database size down. Some things never change! Talking of which - kudos to the servers today. They're sending out a shorty storm, and my cache has noticeably increased in task numbers for the same time requested. Return rate is almost back up to 150,000 an hour, and the message boards seem responsive. They're doing something right. |
Ville Saari Joined: 30 Nov 00 Posts: 1158 Credit: 49,177,052 RAC: 82,530 |
"Talking of which - kudos to the servers today. They're sending out a shorty storm, and my cache has noticeably increased in task numbers for the same time requested. Return rate is almost back up to 150,000 an hour, and the message boards seem responsive. They're doing something right." The database is swelling a lot. It has about 25.6 million results now, and the splitters don't seem to be throttled at all, so the database is growing without bound. The last time the database went into disk-thrashing mode from spilling out of RAM, causing a day-long period of no new tasks, the result table was about half a million rows smaller than it is now. |
Ville Saari Joined: 30 Nov 00 Posts: 1158 Credit: 49,177,052 RAC: 82,530 |
TBar was apparently right with his wild guess that the momentary leveling of the assimilation queue was a turning point. The queue has been shrinking for the last two and a half days, but too slowly to make any meaningful difference before the end of work distribution. |
Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
And the data on the SSP is gone!!! lol wow. the website was being quite laggy the last few mins or so too. guess that explains it. Seti@Home classic workunits: 29,492 CPU time: 134,419 hours |
AllgoodGuy Joined: 29 May 01 Posts: 293 Credit: 16,348,499 RAC: 266 |
"How can this be possible? https://setiathome.berkeley.edu/result.php?resultid=8682469710" This kind of goes to the heart of the issue I had with that one plan class they turned on a couple of months ago for me. It tended to error out in most cases for my build, but the replacement was in development and testing, so they opened it up, realizing that a certain set of computers was having issues with that specific code. This is how I made sure they don't run. It is better not to run them and throw bad information into the system, but to abort them ASAP so that others can work on them and get them out of the database ASAP. `<app_version> <app_name>setiathome_v8</app_name> <plan_class>opencl_ati5_SoG_mac</plan_class> <max_concurrent>0</max_concurrent> <cmdline> -version</cmdline> <ngpus>0</ngpus> <avg_ncpus>0</avg_ncpus> </app_version>` In most situations, the max_concurrent keeps them from running ahead of other tasks in my client. If my client is low on or out of other GPU tasks, it will try to run them anyway. That's where the command line kicks in and causes the program to terminate. |
Speedy Joined: 26 Jun 04 Posts: 1647 Credit: 12,921,799 RAC: 89 |
I'm not sure I agree with you, because every new work unit that is sent out creates another entry in the database, so, the way I understand it, you're creating extra load on the database. |
Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
looks like the replica went kablamo. Seti@Home classic workunits: 29,492 CPU time: 134,419 hours |
Joined: 5 Mar 12 Posts: 815 Credit: 2,361,516 RAC: 22 |
"looks like the replica went kablamo." I think you are right... I can't get much of a status page. So far I'm still sending and receiving work though |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13959 Credit: 208,696,464 RAC: 304 |
"lol wow. the website was being quite laggy the last few mins or so too. guess that explains it." Web site was MIA for a while there; now it's just excruciatingly slow, with a very messed-up Server Status page that's lacking in details. Grant Darwin NT |
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.