Message boards : Number crunching : Panic Mode On (57) Server problems?
(Joined: 24 Dec 99, Posts: 19, Credit: 5,056,116, RAC: 1)

> Team, the following messages are typical today:
>
> 10/1/2011 7:40:19 PM | SETI@home | update requested by user
> 10/1/2011 7:40:21 PM | SETI@home | Sending scheduler request: Requested by user.
> 10/1/2011 7:40:21 PM | SETI@home | Requesting new tasks for NVIDIA GPU
> 10/1/2011 7:40:25 PM | SETI@home | Scheduler request completed: got 0 new tasks
> 10/1/2011 7:40:25 PM | SETI@home | No tasks sent
> 10/1/2011 7:40:25 PM | SETI@home | No tasks are available for SETI@home Enhanced
> 10/1/2011 7:40:25 PM | SETI@home | Tasks for CPU are available, but your preferences are set to not accept them
>
> Jeff

> I wonder if that means that there's a lot of VLARs around (VLARs aren't sent to Nvidia GPUs).

> Out of the 88 units I've downloaded overnight, there were only 6 VLARs. Most were VHARs, plus a few mid-range units.

> I think Jeff should check his project preferences then, and check that SETI@home Enhanced is still selected, then report back. Claggy

Claggy, I have checked SETI@home Enhanced, and it is still selected. I have reset my project (no issues, since I have not had any work for 2 days) and reset my queue to the maximum of 10 days. I am still not getting work, with the same response from the server... I appreciate your help regardless! I will keep working on it.

Jeff
Cruncher-American (Joined: 25 Mar 02, Posts: 1513, Credit: 370,893,186, RAC: 340)

No problems here; everything works perfectly...

Don't look now, but.... I'm sure Berkeley will find a way...
kittyman (Joined: 9 Jul 00, Posts: 51527, Credit: 1,018,363,574, RAC: 1,004)

> However it will take more than this to make me angry, and start threatening to quit. Much, much more.

Aye aye, Chris. The kitties would keep sniffing for WUs a loooooooooooooooooooong time if the project were forced into another extended downtime. And they will keep doing what they can to avoid that from happening.

Meanwhile, back at the crunching farm, all rigs have finally been able to top off their caches to the current limits: 50 CPU, 400 GPU per rig. I am sure the fact that not all the GPU work is VHAR at the moment has helped immensely.

Hopefully the limits can be increased a bit next week, although with further adjustments to the BOINC server code in the offing, I am guessing they shall not be totally lifted for some time to come, until most hosts have had more time to adjust their DCFs accordingly.

"Time is simply the mechanism that keeps everything from happening all at once."
(Joined: 18 May 99, Posts: 6497, Credit: 34,134,168, RAC: 0)

So... I keep hearing about this limit of 50 per CPU; my single-core machine has 99 tasks in progress, and the messages tab says nothing about a limit. Hm. Now it's 100, still no message. Either it's a glitch, or it's because a 10-day cache on this machine is somewhere between 90-110 MBs on average, at least with the weird estimates and most of them being shorties.

While I truly support the IDEA of 6.12.xx asking less frequently, I have found the reality of trying to use it frustrating beyond belief, especially on a fast machine. The backoffs are currently horrendous and unmanageable, and require constant button abuse to get a cache of any size. This is especially true if there is ANY network congestion (not that we have seen that lately).

If you take a look at the first couple of pages of "Top hosts", it seems I am not the only person with similar experiences. I saw only 1 of the top 40 machines running 6.12. I tried it myself, and had to go back. I really do not want to beat up the servers any more than necessary, but 6.12 is IMO not ready for prime time.

This, along with the current limits holding only enough work to last a fast machine a few hours, makes every little bump in the road a major issue. Keeping the limits in place seems a necessary bandaid for the recent "upgrades" (*cough*) to the server software, but an increase would be much appreciated by the faster machines.

Janice
(Joined: 17 Feb 01, Posts: 34495, Credit: 79,922,639, RAC: 80)

That wasn't a problem for me in the last 12 months.

With each crime and every kindness we birth our future.
Claggy (Joined: 5 Jul 99, Posts: 4654, Credit: 47,537,079, RAC: 4)

> While I truly support the IDEA of 6.12.xx asking less frequently, I have found the reality of trying to use it frustrating beyond belief, especially on a fast machine. The backoffs are currently horrendous and unmanageable, and require constant button abuse to get a cache of any size. This is especially true if there is ANY network congestion (not that we have seen that lately).

I do find it rather hard keeping a full cache of Astropulse with BOINC 6.12.x; it just doesn't ask enough, which means my T8100 is empty.

Claggy
(Joined: 17 Feb 01, Posts: 34495, Credit: 79,922,639, RAC: 80)

> While I truly support the IDEA of 6.12.xx asking less frequently, I have found the reality of trying to use it frustrating beyond belief, especially on a fast machine. The backoffs are currently horrendous and unmanageable, and require constant button abuse to get a cache of any size. This is especially true if there is ANY network congestion (not that we have seen that lately).

When the servers run stably for a few weeks, it's also no problem. But SETI hasn't been stable for the last couple of months.

With each crime and every kindness we birth our future.
(Joined: 18 May 99, Posts: 6497, Credit: 34,134,168, RAC: 0)

> That wasn't a problem for me in the last 12 months.

As I said, the slower machines do not have nearly as much of an issue; a couple of hundred units lasts a while. Right now, with clean upload/download pipes, it is not an issue. But during congestion it certainly is.

Janice
Cosmic_Ocean (Joined: 23 Dec 00, Posts: 3027, Credit: 13,516,867, RAC: 13)

> So... I keep hearing about this limit of 50 per CPU; my single-core machine has 99 tasks in progress, and the messages tab says nothing about a limit. Hm. Now it's 100, still no message. Either it's a glitch, or it's because a 10-day cache on this machine is somewhere between 90-110 MBs on average, at least with the weird estimates and most of them being shorties.

I ended up topping out at 103 tasks. BOINC has finally stopped asking for work, since three tasks have completed and the DCF has adjusted slightly. Between 98 and 103, it was a case of asking for "<100 seconds of work" and getting 1 new task in response. I'm still curious why this machine gets past that 50-task CPU limit.

Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving up)
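The DCF mentioned here is BOINC's per-project duration correction factor: the client multiplies the server's raw runtime estimate by it when deciding how many seconds of work a request is worth, and nudges it toward the observed runtime ratio as tasks complete. A hedged sketch of the idea; the simple smoothing rule below is an assumption for illustration (the real client adjusts upward quickly and downward slowly):

```python
def corrected_estimate(raw_estimate_s, dcf):
    # Effective runtime estimate = server's raw estimate * host DCF.
    return raw_estimate_s * dcf

def update_dcf(dcf, raw_estimate_s, actual_runtime_s, rate=0.1):
    # Illustrative smoothing toward the observed runtime ratio;
    # not BOINC's actual asymmetric update rule.
    observed = actual_runtime_s / raw_estimate_s
    return dcf + rate * (observed - dcf)

# A host whose tasks finish in half the estimated time drifts
# toward a DCF of 0.5, so the same cache setting translates into
# asking for (and holding) more tasks.
dcf = 1.0
for _ in range(50):
    dcf = update_dcf(dcf, raw_estimate_s=3600, actual_runtime_s=1800)
print(round(dcf, 2))
```

This is why a few completed tasks "adjusting the DCF slightly" can be enough to stop the client from asking for more work: the corrected estimates grow, and the cache suddenly looks full.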
Kevin Olley (Joined: 3 Aug 99, Posts: 906, Credit: 261,085,289, RAC: 572)

Never got round to trying 6.12, but if there had been a surge of interest among the "Top hosts" I would have seen it and followed suit.

This machine is not the fastest by any means, but to give some readers of this board an idea of what it needs to keep running:

Regular WUs: approx. 500 per day for the GPUs.
VHAR WUs: almost THREE thousand per day for the GPUs (1 VHAR processed every 30 seconds).
CPU: I only use 3 cores and it's only an old AMD, so it only does 30-50 WUs per day.

Multiply that out and you should see what I need to download to have a few days' cache for the next hiccup. The very top machines need 2 to 3 times that amount to keep going flat out, and even more to rebuild a cache.

Kevin
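Multiplying those rates out makes the point concrete. The per-day figures below are the ones quoted for this one machine; the cache depth of 3 days is an assumption chosen to represent "a few days":

```python
# Daily throughput quoted for one mid-range GPU machine.
gpu_vhar_per_day = 3000    # ~1 VHAR every 30 s is ~2880/day
gpu_regular_per_day = 500
cpu_per_day = 40           # midpoint of the quoted 30-50

days = 3  # assumed cache depth

vhar_heavy_cache = (gpu_vhar_per_day + cpu_per_day) * days
regular_cache = (gpu_regular_per_day + cpu_per_day) * days
print(vhar_heavy_cache, regular_cache)  # 9120 1620
```

So during a shortie storm this single host would need on the order of nine thousand tasks banked to ride out a multi-day outage, versus under two thousand for normal work, which is why the per-host limits of 50 CPU / 400 GPU tasks feel so tight on fast machines.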
SupeRNovA (Joined: 25 Oct 04, Posts: 131, Credit: 12,741,814, RAC: 0)

I think the top 100 hosts need a bigger WU limit that allows them to store more work.
Dave Stegner (Joined: 20 Oct 04, Posts: 540, Credit: 65,583,328, RAC: 27)

I keep reading about the WU limit. I am not sure it is not an indication of some other problem. For example: I have 2 machines with identical hardware, within a few tens of RAC points of each other. One has over 400 WUs and the other 120. The one with 120 SOMETIMES tells me I have reached the WU limit, sometimes not. The one with 400+ does not.

With all the different response messages at different times, it would almost appear that the scheduler response message is purely random.

Dave
Kevin Olley (Joined: 3 Aug 99, Posts: 906, Credit: 261,085,289, RAC: 572)

> I keep reading about the WU limit. I am not sure it is not an indication of some other problem.

The message may appear to be random, but when everything is running OK my "tasks in progress" tops out at 450 WUs.

Kevin
Kevin Olley (Joined: 3 Aug 99, Posts: 906, Credit: 261,085,289, RAC: 572)

> i think that the top 100 hosts need to have a big WU limit that can allow them to store more work.

What we need is a gradual increase in the WU limit, so that the work flows in and out without swamping the pipe; this could be done for all hosts, not just the top performers. What we don't need is the limit simply being removed, and the usual swamped-pipe problems.

It would also be nice if the relaxing of the limit were done when we are not in the middle of a shortie storm.

Kevin
Cosmic_Ocean (Joined: 23 Dec 00, Posts: 3027, Credit: 13,516,867, RAC: 13)

The scheduler is coming and going. Most of the evening it has been "couldn't connect to server"; then I got one successful contact, and the next attempt was "no headers, no data".

Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving up)
kittyman (Joined: 9 Jul 00, Posts: 51527, Credit: 1,018,363,574, RAC: 1,004)

I think the database is getting tied in knots... and the servers with it.

Uploads seem to be connecting and going through A-OK. And downloads, when you finally get some, are pretty zippy too. But connecting to the scheduler to report tasks and/or request work is very hit and miss right now, and even once you seem to connect, most attempts result in 'http internal server error'.

It will probably remain that way until tomorrow morning, when somebody's back in the lab to sort things out again.

Meowsigh.

"Time is simply the mechanism that keeps everything from happening all at once."
Cosmic_Ocean (Joined: 23 Dec 00, Posts: 3027, Credit: 13,516,867, RAC: 13)

There has to be something wrong with db_purge, unless Jeff forgot to turn it back on after the Tuesday maintenance. A bloated DB seems most likely.

If it goes down on Monday to give the database a chance to catch up on the backlog, and then does the compression and backup on Tuesday, I'm fine with that. If it means fixing things, I think we can all agree to ~30 hours of downtime for fixing instead of what we're dealing with now.

Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving up)
(Joined: 18 May 99, Posts: 6497, Credit: 34,134,168, RAC: 0)

> There has to be something wrong with db_purge, unless Jeff forgot to turn it back on after the Tuesday maintenance. A bloated DB seems most likely.

I would agree to as much downtime as they need to fix it, presuming of course they have the tools to fix it.

Janice
Blake Bonkofsky (Joined: 29 Dec 99, Posts: 617, Credit: 46,383,149, RAC: 0)

Looks like it's finally broken for the night. I haven't been able to connect for an hour now, and the times on the status page are falling behind. The page itself is updating on time, but the various statistics on the right are all an hour behind now, and climbing.
Cosmic_Ocean (Joined: 23 Dec 00, Posts: 3027, Credit: 13,516,867, RAC: 13)

I'm starting to wonder how my single-core machine can have more than 50 in-progress tasks. I know those of you with GPUs have demonstrated that once you get to 450 you get the limit message, but has there been a test with no GPU at all? I know you can just disable GPU crunching, but I'm trying to figure out whether this is one of those things where it's an old, pre-GPU client, or whether it's because I don't have a GPU and haven't hit 450, or something else. Any quad-core people out there want to pull their GPU and see if that's the case?

Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving up)