how much GPU can the download server support?
Author | Message |
---|---|
zombie67 [MM] Send message Joined: 22 Apr 04 Posts: 758 Credit: 27,771,894 RAC: 0 |
For nvidia, and using a 580 as the example, how much GPU can the download server support? I am finding it very difficult to keep even a single 580 running. And building up a cache seems impossible. The problem is 4x with my dual 590 machine. When I add another project, to fill the gaps, the whole cache fills up with the other project. So...what do folks with a lot of nvidia GPU power do to deal with this? Besides shaking your fist at the transfers tab, that is. Dublin, California Team: SETI.USA |
kittyman Send message Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
My rigs seem to be doing pretty well, holding onto their caches OK at the moment. "Freedom is just Chaos, with better lighting." Alan Dean Foster |
Gatekeeper Send message Joined: 14 Jul 04 Posts: 887 Credit: 176,479,616 RAC: 0 |
Quote: "For nvidia, and using a 580 as the example, how much GPU can the download server support? I am finding it very difficult to keep even a single 580 running. And building up a cache seems impossible. The problem is 4x with my dual 590 machine. When I add another project, to fill the gaps, the whole cache fills up with the other project." I've got 3 590's and 2 580's, and have plenty of work. The trick is to have the patience to wait for it to download. As near as I can tell, I'm about 36 hours behind from the scheduler request to actually downloading the work assigned. |
Wiggo Send message Joined: 24 Jan 00 Posts: 34744 Credit: 261,360,520 RAC: 489 |
The problem that you're experiencing is more likely due to the BOINC version that you're using than to anything else, as you'll likely notice if you look at the versions that those of us who have answered so far are using. ;) Cheers. |
TRuEQ & TuVaLu Send message Joined: 4 Oct 99 Posts: 505 Credit: 69,523,653 RAC: 10 |
Quote: "The problem that you're experiencing is more likely due to the BOINC version that you're using more than anything else as you'll likely notice if you look at the versions that we who have answered so far are using. ;)" I am using the same BM version and I think it is working very well. I also have the work cache problem, but I think it is more a problem with the constant server backoffs. But I try to get AP tasks, which are rare. //TRuEQ TRuEQ & TuVaLu |
TRuEQ & TuVaLu Send message Joined: 4 Oct 99 Posts: 505 Credit: 69,523,653 RAC: 10 |
Quote: "For nvidia, and using a 580 as the example, how much GPU can the download server support? I am finding it very difficult to keep even a single 580 running. And building up a cache seems impossible. The problem is 4x with my dual 590 machine. When I add another project, to fill the gaps, the whole cache fills up with the other project." I was told there are 2 download servers. Try editing your c:\Windows\system32\drivers\etc\hosts file. Add the line: 208.68.240.13 boinc2.ssl.berkeley.edu If that one works poorly, try the other: 208.68.240.18 boinc2.ssl.berkeley.edu I hope it helps. //TRuEQ TRuEQ & TuVaLu |
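The hosts-file change suggested above can be sketched in Python. This is only an illustration working on the file's text: the function name `pin_host` is hypothetical, and the two IP addresses are simply the ones quoted in the post, not verified here. Writing the result back to the real hosts file would need administrator rights and is left out on purpose.

```python
def pin_host(hosts_text: str, ip: str, hostname: str) -> str:
    """Return hosts-file text in which `hostname` maps only to `ip`.

    Hypothetical helper: it edits text in memory and never touches the
    real hosts file.
    """
    wanted = hostname.lower()
    out = []
    for line in hosts_text.splitlines():
        # Ignore anything after '#' when deciding what the entry maps.
        body, _, _comment = line.partition("#")
        fields = body.split()
        names = [f.lower() for f in fields[1:]]
        if len(fields) >= 2 and wanted in names:
            # Remove the old mapping but keep other names on the same line.
            others = [f for f in fields[1:] if f.lower() != wanted]
            if others:
                out.append(" ".join([fields[0], *others]))
            continue
        out.append(line)
    # Append the single mapping we want the resolver to use.
    out.append(f"{ip} {hostname}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = "127.0.0.1 localhost\n208.68.240.18 boinc2.ssl.berkeley.edu\n"
    print(pin_host(sample, "208.68.240.13", "boinc2.ssl.berkeley.edu"))
```

Keeping exactly one active line per hostname matters because a resolver normally uses the first match, so listing both addresses at once would not switch servers.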
zombie67 [MM] Send message Joined: 22 Apr 04 Posts: 758 Credit: 27,771,894 RAC: 0 |
Thanks for the tips. Unfortunately, no change. Machine requests work. Gets about 10 tasks. Maybe 1 of 10 will download successfully. The rest download not at all, or only partially, then go into a backoff loop. With each retry, one or two tasks will successfully download, and the rest go into a progressively increasing backoff loop. After the 8th failure or so, I abort them. And BOINC will not ask for any more tasks until they all successfully download (or are aborted). So the GPUs sit idle most of the time. The only machine that I have that is slow enough to stay busy is one with a couple of 8800s. I have tried messing with the timeout, the max concurrent transfers, the IP addresses. The problem is that the task downloads just stall, and may take many retries and many hours to complete download (if ever). I am not sure if the bottleneck is bandwidth or server capacity. Either way, it looks like I will have to put my CUDA machines on other projects until Berkeley gets it resolved. I assume they are aware of the problem. Dublin, California Team: SETI.USA |
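The "progressively increasing backoff loop" described above behaves like a capped exponential backoff. The sketch below is a generic illustration of that idea, not BOINC's actual retry policy: the function name `backoff_delays`, the 60-second base, the 4-hour cap, and the jitter fraction are all assumptions chosen for the example.

```python
import random


def backoff_delays(attempts, base=60.0, cap=4 * 3600.0, jitter=0.5, rng=None):
    """Return a list of wait times (seconds) for successive failed retries.

    Generic capped exponential backoff with random jitter. All default
    values are illustrative assumptions, not BOINC's real settings.
    """
    rng = rng or random.Random()
    delays = []
    for n in range(attempts):
        # The ceiling doubles after every failure until it reaches the cap.
        ceiling = min(cap, base * 2 ** n)
        # Jitter picks a delay between ceiling * (1 - jitter) and ceiling,
        # so many clients do not all retry at the same instant.
        delays.append(ceiling * (1 - jitter * rng.random()))
    return delays


if __name__ == "__main__":
    for n, delay in enumerate(backoff_delays(8), start=1):
        print(f"failure {n}: wait about {delay / 60:.1f} min")
```

With these assumed numbers the eighth failure already waits on the order of two hours, which is consistent with the idle time described in the post when most transfers keep failing.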
kittyman Send message Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
Quote: "Thanks for the tips." I have attached to Einstein with a zero percent work share on my top rigs. If they run out of Seti GPU work, they grab a few hours of Einstein work to do on the side, whilst still trying to get Seti work. "Freedom is just Chaos, with better lighting." Alan Dean Foster |
LadyL Send message Joined: 14 Sep 11 Posts: 1679 Credit: 5,230,097 RAC: 0 |
Try SIV. That will nudge those stuck downloads along (Windows drop-down menu, BOINC status, Auto retry checkbox). Don't abort tasks, or the errors will reduce your quota and you'll get even fewer tasks. Edit: yes I know, David thinks that the feature that project backoff is cleared when a task finishes downloading is enough. We know it isn't. I'm not the Pope. I don't speak Ex Cathedra! |
zombie67 [MM] Send message Joined: 22 Apr 04 Posts: 758 Credit: 27,771,894 RAC: 0 |
Quote: "try SIV" Yeah, I forgot to mention, I am also using SIV. With regard to quota, not a problem yet. I can't get enough tasks to even get close. ;) Dublin, California Team: SETI.USA |
TRuEQ & TuVaLu Send message Joined: 4 Oct 99 Posts: 505 Credit: 69,523,653 RAC: 10 |
Quote: "Thanks for the tips." Sorry it didn't work. I chose 1 of the servers a couple of weeks ago and since then it has worked for me. But I'm located in northern Europe; maybe that's why it worked. //TRuEQ TRuEQ & TuVaLu |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
Quote: "After the 8th failure or so, I abort them." Why? Aborting tasks reduces the number you can download. Returning work that validates increases the quota you can download. Till you abort the next batch. Grant Darwin NT |
LadyL Send message Joined: 14 Sep 11 Posts: 1679 Credit: 5,230,097 RAC: 0 |
Quote: "try SIV" I didn't mean the limits on tasks in progress, I meant the 'max tasks per day' as per application details page, which gets reduced if you have errors. I thought download errors (and tasks aborted by user) count against that one too. I'm not the Pope. I don't speak Ex Cathedra! |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
Quote: "Edit: yes I know, David thinks that the feature that project backoff is cleared when a task finishes downloading is enough. We know it isn't." Under normal circumstances it would actually work, i.e. when download bandwidth is only maxed out 1-10% of the time. The backoffs would help clear that load. But here at Seti, where the bandwidth is maxed out all the time, it makes it impossible to download any work. Grant Darwin NT |
TRuEQ & TuVaLu Send message Joined: 4 Oct 99 Posts: 505 Credit: 69,523,653 RAC: 10 |
I also noticed strange behaviour when running 2 GPUs with BM 7.0.x. I used exclude in cc_config.xml to manage which card was supposed to get which task. SETIBeta was only requesting 10 tasks and then waited for the tasks to get downloaded. And then BM waited for those tasks to get completed. Then it requested 10 new tasks. That disappeared when I removed my second GPU card. Now I run only 1 card and the work flows as it should, and I use the changes in cc_config.xml as I wrote earlier in this thread. Maybe it is a BM 7.x.x bug when using more than 1 GPU? What do the experts think? //TRuEQ TRuEQ & TuVaLu |
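For readers unfamiliar with the "exclude" setting mentioned above, here is a hedged Python sketch that generates a cc_config.xml fragment using the `exclude_gpu` option, with `url` and `device_num` child elements as recalled from the BOINC client configuration documentation. Check the element names against your client version before relying on them. The project URL and the function name `build_cc_config` are placeholders, not values taken from this thread.

```python
import xml.etree.ElementTree as ET


def build_cc_config(exclusions):
    """Build cc_config.xml text from (project_url, device_num) pairs.

    Each pair asks the client not to use GPU `device_num` for that project.
    Element names follow the BOINC client configuration docs as recalled
    here; treat them as an assumption and verify before use.
    """
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    for url, device_num in exclusions:
        entry = ET.SubElement(options, "exclude_gpu")
        ET.SubElement(entry, "url").text = url
        ET.SubElement(entry, "device_num").text = str(device_num)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Placeholder URL: replace with the project's real master URL.
    print(build_cc_config([("http://example.org/project/", 1)]))
```

The generated text would go into cc_config.xml in the BOINC data directory, and the client has to re-read its configuration before the exclusion takes effect.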
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
Don't forget we have a new download server coming online soon. It will be interesting to see if it does help with the download problems, although if it does, all it will be doing is working around the problem that the load sharing between the two servers, for whatever reason, just isn't working too well. One server appears to carry most of the load, hence trying to download from it results in frustration. Whereas the other server doesn't have much load & so downloads from it, even at the worst of times, are possible. Although presently, with all the AP work & the shorties going through the system, downloads are more difficult than they have been. And even Scheduler requests are taking a while to get a response, so there could be some other server issues lurking in the background at the moment. Grant Darwin NT |
zombie67 [MM] Send message Joined: 22 Apr 04 Posts: 758 Credit: 27,771,894 RAC: 0 |
Quote: "Yeah, I forgot to mention, I am also using SIV. With regard to quota, not a problem yet. I can't get enough tasks to even get close. ;)" Yes, I understood what you were talking about. I am saying that I cannot come close to reaching the 'max tasks per day', even with the reduced limit due to the aborted tasks. Dublin, California Team: SETI.USA |
zombie67 [MM] Send message Joined: 22 Apr 04 Posts: 758 Credit: 27,771,894 RAC: 0 |
Quote: "After the 8th failure or so, I abort them." Because BOINC will not ask for more work until all the tasks complete download or are aborted. For example, BOINC asks for work, and gets 10 tasks. Only 8 will (eventually) successfully download. So I can let my GPUs sit idle, or I can abort the last two, which allows BOINC to ask for more work again. Dublin, California Team: SETI.USA |
zombie67 [MM] Send message Joined: 22 Apr 04 Posts: 758 Credit: 27,771,894 RAC: 0 |
Quote: "Edit: yes I know, David thinks that the feature that project backoff is cleared when a task finishes downloading is enough. We know it isn't." If bandwidth is the bottleneck, then the new server will not improve the situation, right? Are there any plans to improve bandwidth? Dublin, California Team: SETI.USA |