Okay..so, time for me to bit*h and moan
Author | Message |
---|---|
Draconian Joined: 16 Mar 03 Posts: 21 Credit: 1,809,058 RAC: 0 |
First and foremost - nothing I say here is personal or an attack on anyone that runs this project - you do a heck of a job and are underappreciated - you give a TON! My respect. However.... With the frequent outages - please - open up the queues - I want to crunch 24/7 and - well...I'm out except for one Astro unit. 200 workunits doesn't cut it for this box - 6 cores, 12 threads, and a 680. It's..getting hungry... Question - when I look at other folks' queues - why do I see...MANY more units - several hundred to thousands....? All I get - no matter what I set - is 200.. Why do you send me workunits that expire 5 minutes AFTER you send them - hello? A lot of the recent failures have been with the scheduler - so - why not open up the queue - give me 5 days of data - and set a MANDATORY "do not contact" setting until my systems are half full? I would still have 2 and a half days of data at that point - enough to get through most failures - and it would lower stress on the scheduler. It doesn't make ANY sense that my system contacts the server and asks for more work when my queue is nearly topped off - reporting 4 completed and asking for more (when...I still have the 196 to go). If there is a way to set the system to NOT allow the user to request communication until their queue is half full - it would be great. Again though - my respect. These are just the thoughts of someone that wants to crunch 24/7 and my systems are hungry. I donated a little money - but - if there is any other way I can help - I'm here. I have a background in communications (20 years in the Air Force) and can do anything from ordering circuits to installing them, troubleshooting them and their systems and engineering comm (fiber, DWDM, whatever). Here to help if needed (doubtful - but hey...ya just never know). It isn't doubting what you do at all - but - as I have learned through my career - sometimes - fresh eyes.... |
Draconian Joined: 16 Mar 03 Posts: 21 Credit: 1,809,058 RAC: 0 |
And - with the above - I know there is a concern regarding bandwidth - however - slow and steady is just fine. Nobody needs to download at 500KB/sec - all we ever need to download at is enough to get the data when it is ready to be crunched. Data sitting idle on the system - doesn't make it crunch faster - all that should need to happen is that when the system is ready to crunch - the data is there. If the data arrived at the system at 500KB/sec or 30KB/sec - it's irrelevant - as long as it is there. |
David S Joined: 4 Oct 99 Posts: 18352 Credit: 27,761,924 RAC: 12 |
Draconian wrote: "First and foremost - nothing I say here is personal or an attack on anyone that runs this project - you do a heck of a job and are underappreciated - you give a TON! My respect. [...] Question - when I look at other folks' queues - why do I see...MANY more units - several hundred to thousands....?" They probably got those before the limits were put in place. Many of those units may even be ghosts that aren't really on the machines in question. Draconian wrote: "Why do you send me workunits that expire 5 minutes AFTER you send them - hello?" I thought everyone who is a regular in the forums knew this by now... You get short timeouts like that when your computer asks for both CPU and GPU work and the scheduler assigns it but the message doesn't get back to your computer. Five minutes later, it asks again, but this time only for CPU. The scheduler realizes it can't send the previously assigned GPU tasks on a CPU-only request, so it times them out immediately. Draconian wrote: "A lot of the recent failures have been with the scheduler - so - why not open up the queue - give me 5 days of data - and set a MANDATORY 'do not contact' setting until my systems are half full? I would still have 2 and a half days of data at that point - enough to get through most failures - and it would lower stress on the scheduler. It doesn't make ANY sense that my system contacts the server and asks for more work when my queue is nearly topped off - reporting 4 completed and asking for more (when...I still have the 196 to go). If there is a way to set the system to NOT allow the user to request communication until their queue is half full - it would be great." There should be a way to set your cache configuration to make it work this way, but I'm not sure how you would do it or if it would matter with the limits on. Draconian wrote: "Again though - my respect. These are just the thoughts of someone that wants to crunch 24/7 and my systems are hungry. I donated a little money - but - if there is any other way I can help - I'm here. I have a background in communications (20 years in the Air Force) and can do anything from ordering circuits to installing them, troubleshooting them and their systems and engineering comm (fiber, DWDM, whatever). Here to help if needed (doubtful - but hey...ya just never know). It isn't doubting what you do at all - but - as I have learned through my career - sometimes - fresh eyes...." Fresh eyes would probably help. Even getting Matt's eyes back from Europe would probably be a shot in the arm right now. David. Sitting on my butt while others boldly go, Waiting for a message from a small furry creature from Alpha Centauri. |
BilBg Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0 |
:) You are close to the explanation but not exactly. Yes - it happens only if those tasks were ghosts and only if they were VLARs (the first request was for CPU and some VLARs were assigned to CPU but became ghosts). Then the second request was for GPU (and VLARs are not sent to GPUs), "so it times them out immediately". - ALF - "Find out what you don't do well ..... then don't do it!" :) |
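
A rough sketch of the refill rule Draconian proposes in the first post: fill the cache to 5 days, then stay away from the scheduler until it has drained to half. `RefillPolicy`, `hours_to_request` and `count_requests` are made-up names for illustration, not BOINC client settings or APIs. The model assumes the host burns exactly one hour of cached work per wall-clock hour, and it ignores the 200-task limit and the reporting of finished results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RefillPolicy:
    """Hypothetical cache policy; not an actual BOINC preference."""
    full_hours: int       # cache size when topped off (5 days = 120 hours)
    refill_at_hours: int  # contact the scheduler only at or below this level

    def hours_to_request(self, on_hand: int) -> int:
        """Hours of work to ask for, or 0 if no contact is allowed yet."""
        if on_hand > self.refill_at_hours:
            return 0
        return self.full_hours - on_hand


def count_requests(policy: RefillPolicy, hours: int) -> int:
    """Count scheduler contacts over `hours`, burning 1 cached hour per hour."""
    on_hand = policy.full_hours
    requests = 0
    for _ in range(hours):
        on_hand -= 1  # assumption: steady crunching, no outages
        ask = policy.hours_to_request(on_hand)
        if ask > 0:
            requests += 1
            on_hand += ask  # assumption: the scheduler always fills the request
    return requests


# The proposal: 5 days of work, no contact until the cache is half empty.
half_full = RefillPolicy(full_hours=120, refill_at_hours=60)
# The behaviour complained about: ask again as soon as anything is used up.
top_off = RefillPolicy(full_hours=120, refill_at_hours=120)

if __name__ == "__main__":
    for name, policy in (("half-full", half_full), ("top-off", top_off)):
        print(name, count_requests(policy, hours=240))
```

Under those assumptions the half-full rule contacts the scheduler 4 times in 10 days instead of 240, and the cache never drops below 60 hours, which is the 2 and a half days of reserve mentioned in the first post.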
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.