Panic Mode On (74) Server problems? |
Message boards : Number crunching : Panic Mode On (74) Server problems?
| Author | Message |
|---|---|
|
My guess is that we are again seeing some kind of scheduler/feeder limitation. | |
| ID: 1229549 · | |
My guess is that we are again seeing some kind of scheduler/feeder limitation. My machines are no longer uploading/requesting tasks 1 or 2 at a time, as they seem to have filled to their cache settings. So we may be looking at a normal bandwidth graph again, which is how it would often look in the days before limits, sans AP or shorties. That is not to say all requests are being fulfilled, just that there are not so many transfers in progress to keep the bandwidth pegged 24/7. ____________ SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the BP6/VP6 User Group today! | |
| ID: 1229581 · | |
|
With the increased limits and the scheduler/feeder not having tasks available all the time.... | |
| ID: 1230054 · | |
With the increased limits and the scheduler/feeder not having tasks available all the time.... I thought there was talk about that being corrected in the v7 client, but then there is the odd high/low work fetch system it uses. ____________ SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the BP6/VP6 User Group today! | |
| ID: 1230066 · | |
With the increased limits and the scheduler/feeder not having tasks available all the time.... I don't believe this has ANYTHING to do with the Boinc client. The host continually asks for GPU 'AND' CPU tasks. But is repeatedly ONLY sent GPU work. ____________ ****** "Ask not, what your kitty can do for you. Ask what you can do for your kitty." As it is kitten, so shall it be done. | |
| ID: 1230068 · | |
I don't believe ANYTHING that has to do with the Boinc client. There, I fixed it:) | |
| ID: 1230070 · | |
I thought there was talk about that being corrected in the v7 client, but then there is the odd high/low work fetch system it uses. No, I have 7.0.25 on my QX6700 and it's got the same problem, so having V7 does not help with this server issue. I would like to see a bigger fifo so fewer requests are needed to replenish the cache. | |
| ID: 1230072 · | |
I thought there was talk about that being corrected in the v7 client, but then there is the odd high/low work fetch system it uses. It's not the client.... It's what the scheduler logic does with the client request. ____________ ****** "Ask not, what your kitty can do for you. Ask what you can do for your kitty." As it is kitten, so shall it be done. | |
| ID: 1230073 · | |
I thought there was talk about that being corrected in the v7 client, but then there is the odd high/low work fetch system it uses. And by scheduler, Mark means the scheduler that runs on the server - that is indeed where this particular problem lies. | |
| ID: 1230078 · | |
I thought there was talk about that being corrected in the v7 client, but then there is the odd high/low work fetch system it uses. Thank you, Richard. Of my top 3 rigs, 2 are now running GPU only due to this bug. The only reason the 3rd is not is that the CPU is running on cached AP work with the manually installed AP app. Otherwise, it would be in the same boat. ____________ ****** "Ask not, what your kitty can do for you. Ask what you can do for your kitty." As it is kitten, so shall it be done. | |
| ID: 1230081 · | |
|
If you stop BOINC and set a bigish duration_correction_factor you will just get CPU work for a while. The reason my 980X gets CPU WUs is the DCF jumps to 6 when a slow GPU finishes and the system just asks for CPU WUs 'till it drops. | |
| ID: 1230092 · | |
If you stop BOINC and set a bigish duration_correction_factor you will just get CPU work for a while. The reason my 980X gets CPU WUs is the DCF jumps to 6 when a slow GPU finishes and the system just asks for CPU WUs 'till it drops. I have enough GPU work to last a bit, so I am going to do the 'uncheck use nvidia GPU' trick to get some CPU work flowing. But, that is a workaround, and should not be necessary. ____________ ****** "Ask not, what your kitty can do for you. Ask what you can do for your kitty." As it is kitten, so shall it be done. | |
| ID: 1230096 · | |
I would like to see a bigger fifo so fewer requests are needed to replenish the cache. It's called the feeder. The usual workaround is to disable the resource in the project prefs that is getting all the tasks, until the 'slower' resource has some sort of cache. The other option would be to reduce the cache, allow the slower resource to catch up and then gradually increase the cache again. It will eventually get sorted by itself, but if you have a large cache to fill, it may take quite a while until you have single resource requests again instead of double ones. ____________ I'm not the Pope. I don't speak Ex Cathedra! | |
| ID: 1230097 · | |
|
Who stole all APs? Or, who stole the AP splitters? | |
| ID: 1230136 · | |
Who stole all APs? Or, who stole the AP splitters? It may be that since AP_v505 is now down to 8 still out in the field, new tapes will be held until the last of those 505's comes in and they can do that DB kick thing (which may have already been done since "awaiting validation" isn't 10k+ anymore). Or.. it could just be that there were so many tapes loaded up in the first place that we're now at that point where we have to sit around and wait for the MB splitters to catch up to get some new tapes loaded for AP to split. ____________ Linux laptop uptime: 1484d 22h 42m Ended due to UPS failure, found 14 hours after the fact | |
| ID: 1230235 · | |
Who stole all APs? Or, who stole the AP splitters? I am down to 18 from 65 a couple of days ago. ____________ | |
| ID: 1230303 · | |
|
Well this is just starting to be almost slightly irritating. Because of the adjustments to the estimates, I ended up with like a 22-day AP-only cache and therefore, my average turnaround time was in the high teens. The result of this was that most of my wingmates were waiting for me, so I ended up with nearly every reported result being validated immediately. | |
| ID: 1230426 · | |
Well, the tasks are going out, because we're now over 5.5 million out in the field. I don't know how big that figure can be before the database starts slowing down... | |
| ID: 1230442 · | |
I still reckon something's not quite right. Now that there are no limits, I expect many hosts are asking for and getting the entire feeder buffer. Getting WUs is going to be a problem 'till all the caches are full. I feel it would help a lot if the feeder could have a bigger buffer. I am puzzled as to why the Result average turnaround is dropping though. | |
| ID: 1230458 · | |
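
The feeder behaviour discussed in this thread can be illustrated with a toy model. This is only a sketch under stated assumptions, not the actual BOINC scheduler code: the buffer size `FEEDER_SLOTS`, the number of slots drained by other hosts (`competing`), and the rule that the GPU part of a combined request is served before the CPU part are all hypothetical choices made to reproduce the symptom described above (a host asking for CPU and GPU work but only receiving GPU work). The names `Host`, `handle_request` and `simulate` are invented for this example.

```python
from dataclasses import dataclass

FEEDER_SLOTS = 100  # hypothetical buffer size, not the project's real setting


@dataclass
class Host:
    name: str
    cpu_wanted: int        # tasks still needed to fill the CPU cache
    gpu_wanted: int        # tasks still needed to fill the GPU cache
    use_gpu: bool = True   # stands in for the "use NVIDIA GPU" preference
    cpu_got: int = 0
    gpu_got: int = 0


def handle_request(host: Host, available: int) -> int:
    """Serve one scheduler request and return the slots left over.

    Assumption: a combined request has its GPU part served first, and the
    CPU part only sees whatever slots remain afterwards.
    """
    if host.use_gpu:
        sent = min(host.gpu_wanted, available)
        host.gpu_wanted -= sent
        host.gpu_got += sent
        available -= sent
    sent = min(host.cpu_wanted, available)
    host.cpu_wanted -= sent
    host.cpu_got += sent
    return available - sent


def simulate(host: Host, rounds: int, slots: int = FEEDER_SLOTS,
             competing: int = 95) -> Host:
    """Run `rounds` requests. Each round the feeder is refilled to `slots`,
    other hosts take `competing` of them, and our host gets the rest."""
    for _ in range(rounds):
        handle_request(host, slots - competing)
    return host


if __name__ == "__main__":
    both = simulate(Host("both", cpu_wanted=50, gpu_wanted=200), rounds=10)
    print(both.name, "CPU:", both.cpu_got, "GPU:", both.gpu_got)

    cpu_only = simulate(
        Host("gpu-disabled", cpu_wanted=50, gpu_wanted=200, use_gpu=False),
        rounds=10,
    )
    print(cpu_only.name, "CPU:", cpu_only.cpu_got, "GPU:", cpu_only.gpu_got)

    big = simulate(Host("big-buffer", cpu_wanted=50, gpu_wanted=200),
                   rounds=3, slots=200)
    print(big.name, "CPU:", big.cpu_got, "GPU:", big.gpu_got)
```

In this model, a host with a large GPU shortfall receives no CPU tasks in ten combined requests, while the same host with the GPU disabled fills its CPU cache over those ten requests. Doubling the hypothetical buffer lets both caches fill in three requests. That matches the workaround and the "bigger buffer" wish in the posts above, but only within the assumptions of the sketch.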
| Copyright © 2013 University of California |