Message boards : Number crunching : Panic Mode On (104) Server Problems?
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873
Well, I am finally at server limits on all machines. But it took 20 hours to fill the Windows 7 machines back to full strength. It will be interesting to see if I repeat this week's outage problem next Tuesday. I'm curious whether my adding the RFC1323 options to the registry since then will have any effect.
Seti@Home classic workunits: 20,676 CPU time: 74,226 hours
A proud member of the OFA (Old Farts Association)
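For anyone curious what that registry tweak typically looks like, here is a minimal sketch. It assumes the standard Tcp1323Opts DWORD under the Tcpip parameters key (value 3 enables both window scaling and timestamps), which may or may not be the exact change described above; run it as Administrator and reboot afterwards.

```python
# Minimal sketch: enable the RFC 1323 TCP options (window scaling + timestamps)
# on Windows by setting the Tcp1323Opts DWORD. Run as Administrator.
# Value meanings: 0 = both off, 1 = window scaling only, 2 = timestamps only, 3 = both.
import winreg

KEY_PATH = r"SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"

with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH, 0, winreg.KEY_SET_VALUE) as key:
    winreg.SetValueEx(key, "Tcp1323Opts", 0, winreg.REG_DWORD, 3)

print("Tcp1323Opts set to 3; reboot for the TCP stack to pick it up.")
```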
kittyman Joined: 9 Jul 00 Posts: 51478 Credit: 1,018,363,574 RAC: 1,004
> Well, I am finally at server limits on all machines. But it took 20 hours to fill the Windows 7 machines back to full strength. It will be interesting to see if I repeat this week's outage problem next Tuesday. I'm curious whether my adding the RFC1323 options to the registry since then will have any effect.

Just don't forget that you cannot expect an instant recovery after the outage. Especially if the dang thing runs over 11 hours like this week's did. Such long outages create many, many hungry mouths to feed when coming back up. So even if the ready-to-send cache starts to fill up, the scheduler and feeder are getting run ragged trying to fill work requests. I realize I am pointing out the obvious. And I am away at work during and after the outages, so I cannot monitor how quickly my rigs get their fill. I do know that this week, the outage had ended not very much before I got home, and my caches were down. Meow.
"Time is simply the mechanism that keeps everything from happening all at once."
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873
Yes, you stated the obvious. I recognize the many mouths that need to be fed after the outage. What I can't explain is why the requests are now filled 4 tasks at a time instead of the normal 41-46 tasks at a time. Unless the feeder server is now fragmenting the requests to lower limits and feeding more simultaneous requests out of the 100-task buffer, I can't explain why the number of tasks delivered per request has dropped so significantly. Has the project increased the number of simultaneous connections to the server buffer?
Seti@Home classic workunits: 20,676 CPU time: 74,226 hours
A proud member of the OFA (Old Farts Association)
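As a rough sanity check on that hypothesis (and nothing more; the real scheduler logic is certainly more involved), here is a back-of-the-envelope sketch of how a fixed 100-slot feeder buffer shared across many simultaneous requests would shrink the per-request allocation. The 100-slot and 4/46-task figures come from the post above; everything else is made up.

```python
# Back-of-the-envelope illustration of the hypothesis above, NOT the actual
# BOINC scheduler logic: if a fixed feeder buffer is drained by many
# simultaneous scheduler requests, each request gets far fewer tasks.
FEEDER_SLOTS = 100  # figure quoted in the post above

def tasks_per_request(concurrent_requests: int, want: int = 46) -> int:
    """Roughly how many tasks one request gets if the buffer is shared evenly."""
    share = FEEDER_SLOTS // max(concurrent_requests, 1)
    return min(want, share)

for n in (2, 10, 25):
    print(f"{n:3d} concurrent requests -> ~{tasks_per_request(n)} tasks each")
# 2 requests  -> 46 each (the buffer is not the bottleneck)
# 25 requests -> ~4 each, roughly the behaviour described above
```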
betreger Joined: 29 Jun 99 Posts: 11416 Credit: 29,581,041 RAC: 66
To those crunchers who run out of work during the outage, I have a suggestion: use E@H as your backup project. After all, Seti owes them a lot of love, since they have been so generous to allow us to run Nebula on their supercomputer.
Jimbocous Joined: 1 Apr 13 Posts: 1856 Credit: 268,616,081 RAC: 1,349
> ... What I can't explain is why the requests are now filled 4 tasks at a time instead of the normal 41-46 tasks at a time. Unless the feeder server is now fragmenting the requests to lower limits and feeding more simultaneous requests out of the 100-task buffer, I can't explain why the number of tasks delivered per request has dropped so significantly. Has the project increased the number of simultaneous connections to the server buffer?

When I get a refill, I'm still seeing large numbers per session, like always, often ranging from the 40s-50s up to the high 100s. Wondering if others who don't experience the issue see this as well? Assuming so, it seems doubtful it's any server change.
kittyman Joined: 9 Jul 00 Posts: 51478 Credit: 1,018,363,574 RAC: 1,004
> ... What I can't explain is why the requests are now filled 4 tasks at a time instead of the normal 41-46 tasks at a time. Unless the feeder server is now fragmenting the requests to lower limits and feeding more simultaneous requests out of the 100-task buffer, I can't explain why the number of tasks delivered per request has dropped so significantly. Has the project increased the number of simultaneous connections to the server buffer?

I doubt there have been any basic scheduler-level changes made. To what end? Eric's got his hands full just trying to work out the v8 conversion and its foibles. I doubt anybody would try to throw another wrench in the works by messing with the basic foundation.
"Time is simply the mechanism that keeps everything from happening all at once."
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873
Already do. Have Einstein and MilkyWay as backup projects. The problem now with those projects is that they are forcing my computers into High Priority mode for the first time. Any work onboard from those projects seems to force HP, even with new requests whose deadlines are out 10-14 days in the future. Never seen BOINC behave like this before; it's lost its mind.
Seti@Home classic workunits: 20,676 CPU time: 74,226 hours
A proud member of the OFA (Old Farts Association)
Brent Norman Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835
All I can think of is that it's showing HP mode to meet its Resource Share value. Time for NNT mode.
On another note, my Linux box was having problems getting tasks before the weekly maintenance. Yes/no, no/yes didn't seem to help. I tried switching to web-based prefs only, and switched location too; that didn't seem to help either. I moved everything back to normal and just left it, as I had other things to do. When I came back and maintenance had started, I had a full 200 tasks (which only lasted 4.5 hours, LOL) and it has been fine ever since. Maybe going to web prefs straightened it out? IDK, but something reset it. Worth a try.
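For anyone who would rather script the NNT toggle than click through BOINC Manager, a minimal sketch using the stock boinccmd tool is below. It assumes boinccmd is on PATH and the GUI RPC password is already configured; the Einstein URL is only an example, so substitute whatever your own client shows.

```python
# Minimal sketch: toggle "No New Tasks" for a project from the command line
# via boinccmd instead of clicking through BOINC Manager.
# Assumes boinccmd is on PATH and can talk to the local client.
import subprocess

PROJECT_URL = "http://einstein.phys.uwm.edu/"  # example; use the URL your client shows

def set_nnt(enable: bool) -> None:
    """Enable or disable No New Tasks for PROJECT_URL via boinccmd."""
    op = "nomorework" if enable else "allowmorework"
    subprocess.run(["boinccmd", "--project", PROJECT_URL, op], check=True)

set_nnt(True)    # stop fetching new Einstein work
# set_nnt(False) # ...and allow it again later
```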
Grant (SSSF) Joined: 19 Aug 99 Posts: 13854 Credit: 208,696,464 RAC: 304
Woke up to find the cache running down again. Changed cache settings to No, Yes, No for a few Scheduler requests, still running down. Changed them back, and work started flowing again.
Grant
Darwin NT
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873
It's never been a problem before with MilkyWay, which just gets tasks constantly because of the hard server limit of 160 tasks per machine. I retire tasks fast enough that none ever come close to approaching the deadline, which is two weeks out. I ran into issues with Einstein when they finished the BRP4G work and started on the FGRPB1G work, which takes much longer per task and also takes a full CPU core now that it is OpenCL. As usual, Einstein sent WAY TOO MUCH work initially, and I quickly set it to NNT last month. I only brought in 60 tasks on the last request for work on 1/17 on each machine. All the tasks had a deadline of 1/31. This morning all machines thought they had to go HP on Einstein work. Really? Come on, 11 days till deadline and they think they have to finish the work within 3 days of getting it? I don't understand why BOINC is getting so discombobulated in its process priority. When I turn on the Event Log options I don't see any red flags indicating priority problems. I've never had any issues with High Priority before in running my 3 projects concurrently since 2011. This is something I've never experienced before. As I've said in this thread, BOINC has lost its mind.
Seti@Home classic workunits: 20,676 CPU time: 74,226 hours
A proud member of the OFA (Old Farts Association)
Grant (SSSF) Joined: 19 Aug 99 Posts: 13854 Credit: 208,696,464 RAC: 304
> Woke up to find the cache running down again. Changed cache settings to No, Yes, No for a few Scheduler requests, still running down. Changed them back, and work started flowing again.

A while later, the cache is running down again. Flipped the application settings, and it fills back up & then stays there.
Grant
Darwin NT
Grant (SSSF) Joined: 19 Aug 99 Posts: 13854 Credit: 208,696,464 RAC: 304
> This is something I've never experienced before.

Maybe this is why?

> I ran into issues with Einstein when they finished the BRP4G work and started on the FGRPB1G work, which takes much longer per task and also takes a full CPU core now that it is OpenCL. As usual, Einstein sent WAY TOO MUCH work initially.

> I don't understand why BOINC is getting so discombobulated in its process priority.

Maybe this had an impact on why?

> and I quickly set it to NNT last month

Longer than expected run times & the loss of a CPU core (or more) would take a while for the Manager to figure out what's going on. Setting NNT would have impacted that process. The result is going into High Priority while it tries to sort things out. It would have been worth not setting NNT and seeing if it was able to sort itself out sooner. Going with NNT throws a whole new spin on resource allocation as the Manager tries to balance resource share, deadlines and cache settings. And, as Jason keeps pointing out, the estimate components of CreditNew are involved in allocating work; the last week or so of all-Arecibo work, then back to the Arecibo/Guppie mix, would have thrown work-fetch estimations up, down & sideways, which, combined with longer-than-estimated run times, you setting NNT and the loss of a CPU core or two, is a recipe for Manager confusion while it tries to resolve all the contradictory & conflicting requirements. Just a thought.
Grant
Darwin NT
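To make the "confused estimates lead to panic mode" idea concrete, here is a toy sketch of the kind of deadline arithmetic involved. It is emphatically not the real BOINC client scheduler, just an earliest-deadline-first check with made-up runtime numbers, showing how inflated estimates can make a comfortable 11-day deadline look unreachable.

```python
# Toy sketch only: NOT the real BOINC client scheduler. It shows how inflated
# runtime estimates can make queued work look like it will miss its deadlines,
# which is the sort of condition that pushes a client into high-priority
# (earliest-deadline-first) crunching. All numbers below are made up.
from dataclasses import dataclass
from typing import List

@dataclass
class Task:
    est_hours: float       # client's current runtime estimate for the task
    deadline_hours: float  # hours from now until the task's deadline

def would_go_high_priority(tasks: List[Task], cores: int) -> bool:
    """Crude check: does the queued work fit before its deadlines on `cores` cores?"""
    queue = sorted(tasks, key=lambda t: t.deadline_hours)  # earliest deadline first
    busy_hours = 0.0
    for t in queue:
        busy_hours += t.est_hours / cores  # naive even spread across cores
        if busy_hours > t.deadline_hours:  # projected deadline miss
            return True
    return False

# 60 tasks, 11 days (264 h) to deadline, 4 cores:
honest   = [Task(est_hours=2.0,  deadline_hours=264) for _ in range(60)]
inflated = [Task(est_hours=20.0, deadline_hours=264) for _ in range(60)]  # stale estimates

print(would_go_high_priority(honest,   cores=4))  # False - plenty of slack
print(would_go_high_priority(inflated, cores=4))  # True - hello, panic mode
```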
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873
Thanks for the comments, Grant. The only project I've ever had to set NNT on in the past is Einstein, as it consistently sends more work than can be finished by the deadlines. Never needed it with MilkyWay because of the hard server limits per machine and rapid task completions. For a while with the BRP4G tasks on Einstein, and a trick of reducing my BOINC disk limits, I was able to avoid setting NNT and still keep it from sending too much work, because of that project's large task sizes. Now with the new work and MUCH smaller tasks, that trick doesn't work anymore and I'm forced to use NNT. I think you have correctly deduced that CreditNew is the cause of my problems. It makes sense what you said about the recent deluge of exclusive Arecibo work and now the return of the normal Arecibo/Guppie mix. That is what is confusing BOINC now. Unfortunately it is impacting even MilkyWay work, which was not expected because of the work limit. In the meantime I have reduced my project usage by 50% on MilkyWay and Einstein to see if that helps. I probably won't see an immediate effect because it takes so long for CreditNew to figure out averages.
Seti@Home classic workunits: 20,676 CPU time: 74,226 hours
A proud member of the OFA (Old Farts Association)
Grant (SSSF) Joined: 19 Aug 99 Posts: 13854 Credit: 208,696,464 RAC: 304
More Centurion issues? The Server Status page shows a full Ready-to-send buffer, and it shows 7 GBT splitters running, but the files-being-split display only shows 1 channel in progress. And I've been getting fewer and fewer Guppies over the last few hours. Still getting new GBT work, it's just a considerably smaller percentage of the total than usual.
Grant
Darwin NT
Brent Norman Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835
o splitters running could be the v8.23 ATI tasks going through. I saw a pile of resends from that earlier.
Grant (SSSF) Joined: 19 Aug 99 Posts: 13854 Credit: 208,696,464 RAC: 304
> o splitters running could be the v8.23 ATI tasks going through. I saw a pile of resends from that earlier.

The Server Status page still shows only 1 GBT channel being split, but overnight most of the work I got was Guppie, so the earlier almost-all-Arecibo downloads have been offset by the last few hours of downloads. And almost all of it is new work, very few resends. And so far this morning I haven't had to play with the application settings to keep the cache full (fingers crossed).
Grant
Darwin NT
Bill G Joined: 1 Jun 01 Posts: 1282 Credit: 187,688,550 RAC: 182
> o splitters running could be the v8.23 ATI tasks going through. I saw a pile of resends from that earlier.

I do not know if it is just me, but every time I look I see 7 channels of GBT data being split???? Or am I looking somewhere you are not?
SETI@home classic workunits 4,019 SETI@home classic CPU time 34,348 hours
Wiggo Joined: 24 Jan 00 Posts: 36798 Credit: 261,360,520 RAC: 489
> o splitters running could be the v8.23 ATI tasks going through. I saw a pile of resends from that earlier.

The thickness of the dark green part at the beginning of the bar for a file being split indicates how many splitters are working on that one file, and sometimes all of them can be working on just one file. ;-)
Cheers.
Grant (SSSF) Joined: 19 Aug 99 Posts: 13854 Credit: 208,696,464 RAC: 304
> I do not know if it is just me, but every time I look I see 7 channels of GBT data being split???? Or am I looking somewhere you are not?

Must be a different place. Computing, Server Status page. Scroll down to the splitter status, Breakthrough Listen. It shows 7 files that have completed channels in them (light green), but only 1 channel in progress (actually being split, dark green). Scroll down to Multibeam (Arecibo) and it shows 6 files with completed channels (light green), and 4 channels in progress (dark green). Given the amount of GBT work I got overnight and that the Ready-to-send buffer remains full, I suspect that more than one channel is being split; it just isn't being displayed in the splitter status.
Grant
Darwin NT
Grant (SSSF) Joined: 19 Aug 99 Posts: 13854 Credit: 208,696,464 RAC: 304
> The thickness of the dark green part at the beginning of the bar for a file being split indicates how many splitters are working on that one file, and sometimes all of them can be working on just one file. ;-)

That's probably it; there are so many channels in a GBT file that a single channel is too small to see. And all the GBT work I've got (other than re-sends) is from the one file. So that little sliver of dark green at the start of blc2_2bit_guppi_57423_32060_HIP53824_0017 isn't a single splitter, but all of them on the one file.
Grant
Darwin NT