how much GPU can the download server support?

zombie67 [MM] Volunteer tester Send message Joined: 22 Apr 04 Posts: 758 Credit: 27,771,894 RAC: 0	Message 1213031 - Posted: 1 Apr 2012, 23:21:14 UTC For nvidia, and using a 580 as the example, how much GPU can the download server support? I am finding it very difficult to keep even a single 580 running. And building up a cache seems impossible. The problem is 4x with my dual 590 machine. When I add another project, to fill the gaps, the whole cache fills up with the other project. So...what do folks with a lot of nvidia GPU power do to deal with this? Besides shaking your fist at the transfers tab, that is. Dublin, California Team: SETI.USA ID: 1213031 ·

kittyman Volunteer tester Send message Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004	Message 1213036 - Posted: 1 Apr 2012, 23:44:34 UTC My rigs seem to be doing pretty well, holding onto their caches OK at the moment. "Freedom is just Chaos, with better lighting." Alan Dean Foster ID: 1213036 ·

Gatekeeper Send message Joined: 14 Jul 04 Posts: 887 Credit: 176,479,616 RAC: 0	Message 1213052 - Posted: 2 Apr 2012, 0:57:59 UTC - in response to Message 1213031. For nvidia, and using a 580 as the example, how much GPU can the download server support? I am finding it very difficult to keep even a single 580 running. And building up a cache seems impossible. The problem is 4x with my dual 590 machine. When I add another project, to fill the gaps, the whole cache fills up with the other project. So...what do folks with a lot of nvidia GPU power do to deal with this? Besides shaking your fist at the transfers tab, that is. I've got 3 590's and 2 580's, and have plenty of work. The trick is to have the patience to wait for it to download. As near as I can tell, I'm about 36 hours behind from the scheduler request to actually downloading the work assigned. ID: 1213052 ·

Wiggo Send message Joined: 24 Jan 00 Posts: 34744 Credit: 261,360,520 RAC: 489	Message 1213055 - Posted: 2 Apr 2012, 1:06:13 UTC - in response to Message 1213052. The problem that you're experiencing is more likely due to the BOINC version that you're using more than anything else as you'll likely notice if you look at the versions that we who have answered so far are using. ;) Cheers. ID: 1213055 ·

TRuEQ & TuVaLu Volunteer tester Send message Joined: 4 Oct 99 Posts: 505 Credit: 69,523,653 RAC: 10	Message 1213153 - Posted: 2 Apr 2012, 7:51:33 UTC - in response to Message 1213055. The problem that you're experiencing is more likely due to the BOINC version that you're using more than anything else as you'll likely notice if you look at the versions that we who have answered so far are using. ;) Cheers. I am using the same BM version and I think it is working very well. I also have the work cache problem, but I think it is more a problem with constant server backoffs. But I try to get AP tasks, which are rare. //TRuEQ TRuEQ & TuVaLu ID: 1213153 ·

TRuEQ & TuVaLu Volunteer tester Send message Joined: 4 Oct 99 Posts: 505 Credit: 69,523,653 RAC: 10	Message 1213154 - Posted: 2 Apr 2012, 7:55:32 UTC - in response to Message 1213031. For nvidia, and using a 580 as the example, how much GPU can the download server support? I am finding it very difficult to keep even a single 580 running. And building up a cache seems impossible. The problem is 4x with my dual 590 machine. When I add another project, to fill the gaps, the whole cache fills up with the other project. So...what do folks with a lot of nvidia GPU power do to deal with this? Besides shaking your fist at the transfers tab, that is. I was told there are 2 download servers. Try editing your c:\Windows\system32\drivers\etc\hosts file. Add the line: 208.68.240.13 boinc2.ssl.berkeley.edu If that one works poorly, try the other instead: 208.68.240.18 boinc2.ssl.berkeley.edu I hope it helps. //TRuEQ TRuEQ & TuVaLu ID: 1213154 ·
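
The hosts-file tip above amounts to pinning one hostname to one address. A minimal sketch of that edit as a pure text transformation, assuming the two addresses quoted in the post (they date from 2012 and may no longer be valid); the function name `pin_host` is hypothetical and nothing here writes to the real hosts file:

```python
def pin_host(hosts_text: str, hostname: str, ip: str) -> str:
    """Return hosts-file text with exactly one active entry for hostname."""
    kept = []
    for line in hosts_text.splitlines():
        # Ignore any trailing comment, then split into address + names.
        fields = line.split("#", 1)[0].split()
        names = [f.lower() for f in fields[1:]]
        # Drop existing active entries for this hostname.
        # Simplification: a line listing several names is dropped whole.
        if hostname.lower() in names:
            continue
        kept.append(line)
    kept.append(f"{ip} {hostname}")
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    sample = "127.0.0.1 localhost\n208.68.240.13 boinc2.ssl.berkeley.edu\n"
    # Switch from the first address in the post to the second.
    print(pin_host(sample, "boinc2.ssl.berkeley.edu", "208.68.240.18"), end="")
```

Replacing the old entry rather than appending a second one matters because resolvers typically use the first matching line, so a leftover entry would silently win.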

zombie67 [MM] Volunteer tester Send message Joined: 22 Apr 04 Posts: 758 Credit: 27,771,894 RAC: 0	Message 1213252 - Posted: 2 Apr 2012, 17:04:53 UTC Thanks for the tips. Unfortunately, no change. Machine requests work. Gets about 10 tasks. Maybe 1 of 10 will download successfully. The rest download not at all, or only partially, then go into a back off loop. With each retry, one or two tasks will successfully download, and the rest go into a progressively increasing back off loop. After the 8th failure or so, I abort them. And BOINC will not ask for any more tasks until they all successfully download (or are aborted). So the GPUs sit idle most of the time. The only machine that I have that is slow enough to stay busy is one with a couple of 8800s. I have tried messing with the time out, the max concurrent transfers, the IP addresses. The problem is that the task downloads just stall, and may take many retries and many hours to complete download (if ever). I am not sure if the bottle neck is bandwidth or server capacity. Either way, it looks like I will have to put my CUDA machines on other projects until Berkeley gets it resolved. I assume they are aware of the problem. Dublin, California Team: SETI.USA ID: 1213252 ·
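
A rough worked calculation of the numbers in the post above, assuming (this is an assumption, not something measured in the thread) that every download attempt succeeds independently with probability 0.1, taken from the "maybe 1 of 10" estimate:

```python
def still_pending(p_success: float, attempts: int) -> float:
    """Probability that a download has failed on every one of `attempts` tries."""
    return (1.0 - p_success) ** attempts


def expected_attempts(p_success: float) -> float:
    """Mean number of tries until the first success (geometric distribution)."""
    return 1.0 / p_success


if __name__ == "__main__":
    p = 0.1  # assumed per-attempt success rate
    print(f"expected tries per task: {expected_attempts(p):.0f}")
    print(f"still stuck after 8 tries: {still_pending(p, 8):.0%}")
```

Under that assumption, roughly 43% of tasks would still be undownloaded after eight tries, which is consistent with downloads stalling for hours.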

kittyman Volunteer tester Send message Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004	Message 1213254 - Posted: 2 Apr 2012, 17:10:46 UTC - in response to Message 1213252. Thanks for the tips. Unfortunately, no change. Machine requests work. Gets about 10 tasks. Maybe 1 of 10 will download successfully. The rest download not at all, or only partially, then go into a back off loop. With each retry, one or two tasks will successfully download, and the rest go into a progressively increasing back off loop. After the 8th failure or so, I abort them. And BOINC will not ask for any more tasks until they all successfully download (or are aborted). So the GPUs sit idle most of the time. The only machine that I have that is slow enough to stay busy is one with a couple of 8800s. I have tried messing with the time out, the max concurrent transfers, the IP addresses. The problem is that the task downloads just stall, and may take many retries and many hours to complete download (if ever). I am not sure if the bottle neck is bandwidth or server capacity. Either way, it looks like I will have to put my CUDA machines on other projects until Berkeley gets it resolved. I assume they are aware of the problem. I have attached to Einstein with a zero percent work share on my top rigs. If they run out of Seti GPU work, they grab a few hours of Einstein work to do on the side, whilst still trying to get Seti work. "Freedom is just Chaos, with better lighting." Alan Dean Foster ID: 1213254 ·

LadyL Volunteer tester Send message Joined: 14 Sep 11 Posts: 1679 Credit: 5,230,097 RAC: 0	Message 1213257 - Posted: 2 Apr 2012, 17:21:39 UTC Last modified: 2 Apr 2012, 17:24:01 UTC Try SIV. That will nudge those stuck downloads along (Windows drop down menu, Boinc status, Auto retry checkbox). Don't abort tasks or the error will reduce your quota and you'll get even less tasks. Edit: yes I know, David thinks that the feature that project backoff is cleared when a task finishes downloading is enough. We know it isn't. I'm not the Pope. I don't speak Ex Cathedra! ID: 1213257 ·

zombie67 [MM] Volunteer tester Send message Joined: 22 Apr 04 Posts: 758 Credit: 27,771,894 RAC: 0	Message 1213266 - Posted: 2 Apr 2012, 17:53:20 UTC - in response to Message 1213257. Try SIV. That will nudge those stuck downloads along (Windows drop down menu, Boinc status, Auto retry checkbox). Don't abort tasks or the error will reduce your quota and you'll get even less tasks. Edit: yes I know, David thinks that the feature that project backoff is cleared when a task finishes downloading is enough. We know it isn't. Yeah, I forgot to mention, I am also using SIV. With regard to quota, not a problem yet. I can't get enough tasks to even get close. ;) Dublin, California Team: SETI.USA ID: 1213266 ·

TRuEQ & TuVaLu Volunteer tester Send message Joined: 4 Oct 99 Posts: 505 Credit: 69,523,653 RAC: 10	Message 1213269 - Posted: 2 Apr 2012, 17:57:18 UTC - in response to Message 1213252. Thanks for the tips. Unfortunately, no change. Machine requests work. Gets about 10 tasks. Maybe 1 of 10 will download successfully. The rest download not at all, or only partially, then go into a back off loop. With each retry, one or two tasks will successfully download, and the rest go into a progressively increasing back off loop. After the 8th failure or so, I abort them. And BOINC will not ask for any more tasks until they all successfully download (or are aborted). So the GPUs sit idle most of the time. The only machine that I have that is slow enough to stay busy is one with a couple of 8800s. I have tried messing with the time out, the max concurrent transfers, the IP addresses. The problem is that the task downloads just stall, and may take many retries and many hours to complete download (if ever). I am not sure if the bottle neck is bandwidth or server capacity. Either way, it looks like I will have to put my CUDA machines on other projects until Berkeley gets it resolved. I assume they are aware of the problem. Sorry it didn't work. I chose 1 of the servers a couple of weeks ago and since then it has worked for me. But I'm located in northern Europe; maybe that's why it worked. //TRuEQ TRuEQ & TuVaLu ID: 1213269 ·

Slavac Volunteer tester Send message Joined: 27 Apr 11 Posts: 1932 Credit: 17,952,639 RAC: 0	Message 1213280 - Posted: 2 Apr 2012, 18:22:55 UTC - in response to Message 1213270. Don't forget we have a new download server coming online soon. Executive Director GPU Users Group Inc. - brad@gpuug.org ID: 1213280 ·

Grant (SSSF) Volunteer tester Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304	Message 1213283 - Posted: 2 Apr 2012, 18:34:37 UTC - in response to Message 1213252. After the 8th failure or so, I abort them. Why? Aborting tasks reduces the number you can download. Returning work that validates increases the quota you can download. Till you abort the next batch. Grant Darwin NT ID: 1213283 ·

LadyL Volunteer tester Send message Joined: 14 Sep 11 Posts: 1679 Credit: 5,230,097 RAC: 0	Message 1213289 - Posted: 2 Apr 2012, 18:43:48 UTC - in response to Message 1213266. Try SIV. That will nudge those stuck downloads along (Windows drop down menu, Boinc status, Auto retry checkbox). Don't abort tasks or the error will reduce your quota and you'll get even less tasks. Edit: yes I know, David thinks that the feature that project backoff is cleared when a task finishes downloading is enough. We know it isn't. Yeah, I forgot to mention, I am also using SIV. With regard to quota, not a problem yet. I can't get enough tasks to even get close. ;) I didn't mean the limits on tasks in progress, I meant the 'max tasks per day' as per application details page, which gets reduced if you have errors. I thought download errors (and tasks aborted by user) count against that one too. I'm not the Pope. I don't speak Ex Cathedra! ID: 1213289 ·
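
A toy model of the 'max tasks per day' behaviour described in the posts above: errors and aborts lower the quota, validated results raise it. The thread does not give the actual step sizes, so the decrement-by-one and doubling rules, the `project_max` value, and the function name `update_quota` are all illustrative assumptions. Whether aborts really count is something the poster is unsure of too.

```python
def update_quota(quota: int, outcome: str, project_max: int = 100) -> int:
    """Toy per-day quota update; step sizes are assumptions, not from the thread."""
    if outcome == "valid":
        # Assumed: a validated result doubles the quota, up to the project cap.
        return min(project_max, quota * 2)
    if outcome in ("error", "aborted"):
        # Assumed: each error or abort costs one task per day, never below 1.
        return max(1, quota - 1)
    raise ValueError(f"unknown outcome: {outcome!r}")


if __name__ == "__main__":
    quota = 100
    for outcome in ["aborted", "aborted", "valid"]:
        quota = update_quota(quota, outcome)
        print(outcome, quota)
```

With rules shaped like this, a handful of aborts is recovered by a single validated result, which may be why the original poster has not yet hit the limit.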

Grant (SSSF) Volunteer tester Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304	Message 1213291 - Posted: 2 Apr 2012, 18:44:29 UTC - in response to Message 1213257. Edit: yes I know, David thinks that the feature that project backoff is cleared when a task finishes downloading is enough. We know it isn't. Under normal circumstances it would actually work. ie download bandwidth is only maxed out 1-10% of the time. The backoffs would help clear that load. But here at Seti where the bandwidth is maxed out all the time, it makes it impossible to download any work. Grant Darwin NT ID: 1213291 ·
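
The point above can be illustrated with a small simulation: with exponential backoff, the time lost waiting stays small when most attempts succeed and grows very large when most fail. This is only a sketch. The 60-second base delay, the 4-hour cap, and the per-attempt success probabilities are assumptions chosen for illustration, not BOINC's actual parameters.

```python
import random


def time_to_download(p_success: float, rng: random.Random,
                     base: float = 60.0, cap: float = 4 * 3600.0) -> float:
    """Seconds spent in backoff before one transfer finally succeeds.

    Each attempt succeeds with probability p_success. After the n-th
    failure the client waits min(cap, base * 2**(n-1)) seconds.
    """
    waited = 0.0
    failures = 0
    while rng.random() >= p_success:
        failures += 1
        # Clamp the exponent so very long failure runs cannot overflow.
        waited += min(cap, base * 2 ** min(failures - 1, 20))
    return waited


def mean_wait(p_success: float, trials: int = 2000, seed: int = 1) -> float:
    """Average backoff time over many simulated transfers (seeded for repeatability)."""
    rng = random.Random(seed)
    return sum(time_to_download(p_success, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    for p in (0.9, 0.1):
        print(f"p_success={p}: mean wait {mean_wait(p):.0f} s")
```

The lightly loaded case (0.9) mostly never enters backoff at all, while the saturated case (0.1) spends most of its time at the capped delay.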

TRuEQ & TuVaLu Volunteer tester Send message Joined: 4 Oct 99 Posts: 505 Credit: 69,523,653 RAC: 10	Message 1213296 - Posted: 2 Apr 2012, 18:51:23 UTC Last modified: 2 Apr 2012, 18:52:33 UTC I also noticed a strange behaviour when running on 2 GPUs with BM 7.0.x. I used exclude in cc_config.xml to manage which card was supposed to get which task. SETIBeta was only requesting 10 tasks and then waited for the tasks to get downloaded. And then BM waited for those tasks to get completed. Then it requested 10 new tasks. That disappeared when I removed my second GPU card. Now I run only 1 card and the work flows as it should, and I use the changes in cc_config.xml as I wrote earlier in this thread. Maybe it is a BM 7.x.x bug here when using more than 1 GPU? What do the experts think? //TRuEQ TRuEQ & TuVaLu ID: 1213296 ·
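
The poster's actual cc_config.xml is not shown in this thread. As a hedged sketch of what a GPU exclusion looks like, the helper below builds an `<exclude_gpu>` rule inside `<cc_config><options>`, the place the BOINC 7.x client reads such rules from. The project URL is a placeholder and the function name `exclude_gpu_config` is hypothetical.

```python
import xml.etree.ElementTree as ET


def exclude_gpu_config(project_url: str, device_num: int) -> str:
    """Build a cc_config.xml body that keeps one project off one GPU."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    rule = ET.SubElement(options, "exclude_gpu")
    # The project the rule applies to, and the GPU index it must not use.
    ET.SubElement(rule, "url").text = project_url
    ET.SubElement(rule, "device_num").text = str(device_num)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Placeholder URL: substitute the project's real master URL.
    print(exclude_gpu_config("http://example.org/project/", 1))
```

The client needs to re-read its config files (or be restarted) before a new rule takes effect.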

Grant (SSSF) Volunteer tester Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304	Message 1213305 - Posted: 2 Apr 2012, 19:22:22 UTC - in response to Message 1213280. Don't forget we have a new download server coming online soon. It will be interesting to see if it does help with the download problems, although if it does, all it will be doing is working around the problem that the load sharing between the two servers, for whatever reason, just isn't working too well. One server appears to carry most of the load, hence trying to download from it results in frustration. Whereas the other server doesn't have much load & so downloads from it, even at the worst of times, are possible. Although presently, with all the AP work & the shorties going through the system, downloads are more difficult than they have been. And even Scheduler requests are taking a while to get a response, so there could be some other server issues lurking in the background at the moment. Grant Darwin NT ID: 1213305 ·

zombie67 [MM] Volunteer tester Send message Joined: 22 Apr 04 Posts: 758 Credit: 27,771,894 RAC: 0	Message 1213341 - Posted: 2 Apr 2012, 21:12:28 UTC - in response to Message 1213289. Yeah, I forgot to mention, I am also using SIV. With regard to quota, not a problem yet. I can't get enough tasks to even get close. ;) I didn't mean the limits on tasks in progress, I meant the 'max tasks per day' as per application details page, which gets reduced if you have errors. I thought download errors (and tasks aborted by user) count against that one too. Yes, I understood what you were talking about. I am saying that I cannot come close to reaching the 'max tasks per day', even with the reduced limit due to the aborted tasks. Dublin, California Team: SETI.USA ID: 1213341 ·

zombie67 [MM] Volunteer tester Send message Joined: 22 Apr 04 Posts: 758 Credit: 27,771,894 RAC: 0	Message 1213344 - Posted: 2 Apr 2012, 21:15:39 UTC - in response to Message 1213283. After the 8th failure or so, I abort them. Why? Aborting tasks reduces the number you can download. Returning work that validates increases the quota you can download. Till you abort the next batch. Because BOINC will not ask for more work until all the tasks complete download or are aborted. For example, BOINC asks for work, and gets 10 tasks. Only 8 will (eventually) successfully download. So I can let my GPUs sit idle, or I can abort the last two, which allows BOINC to ask for more work again. Dublin, California Team: SETI.USA ID: 1213344 ·

zombie67 [MM] Volunteer tester Send message Joined: 22 Apr 04 Posts: 758 Credit: 27,771,894 RAC: 0	Message 1213346 - Posted: 2 Apr 2012, 21:17:17 UTC - in response to Message 1213291. Edit: yes I know, David thinks that the feature that project backoff is cleared when a task finishes downloading is enough. We know it isn't. Under normal circumstances it would actually work. ie download bandwidth is only maxed out 1-10% of the time. The backoffs would help clear that load. But here at Seti where the bandwidth is maxed out all the time, it makes it impossible to download any work. If bandwidth is the bottle neck, then the new server will not improve the situation, right? Are there any plans to improve bandwidth? Dublin, California Team: SETI.USA ID: 1213346 ·


SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.