how much GPU can the download server support?

Message boards : Number crunching : how much GPU can the download server support?
zombie67 [MM]
Volunteer tester
Joined: 22 Apr 04
Posts: 758
Credit: 27,771,894
RAC: 0
United States
Message 1213031 - Posted: 1 Apr 2012, 23:21:14 UTC

For nvidia, and using a 580 as the example, how much GPU can the download server support? I am finding it very difficult to keep even a single 580 running. And building up a cache seems impossible. The problem is 4x with my dual 590 machine. When I add another project, to fill the gaps, the whole cache fills up with the other project.

So...what do folks with a lot of nvidia GPU power do to deal with this? Besides shaking your fist at the transfers tab, that is.
Dublin, California
Team: SETI.USA
ID: 1213031 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1213036 - Posted: 1 Apr 2012, 23:44:34 UTC

My rigs seem to be doing pretty well, holding onto their caches OK at the moment.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1213036 · Report as offensive
Profile Gatekeeper
Joined: 14 Jul 04
Posts: 887
Credit: 176,479,616
RAC: 0
United States
Message 1213052 - Posted: 2 Apr 2012, 0:57:59 UTC - in response to Message 1213031.  

For nvidia, and using a 580 as the example, how much GPU can the download server support? I am finding it very difficult to keep even a single 580 running. And building up a cache seems impossible. The problem is 4x with my dual 590 machine. When I add another project, to fill the gaps, the whole cache fills up with the other project.

So...what do folks with a lot of nvidia GPU power do to deal with this? Besides shaking your fist at the transfers tab, that is.


I've got 3 590's and 2 580's, and have plenty of work. The trick is to have the patience to wait for it to download. As near as I can tell, I'm about 36 hours behind from the scheduler request to actually downloading the work assigned.
ID: 1213052 · Report as offensive
Profile Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1213055 - Posted: 2 Apr 2012, 1:06:13 UTC - in response to Message 1213052.  

The problem that you're experiencing is more likely due to the BOINC version that you're using than to anything else, as you'll likely notice if you look at the versions that those of us who have answered so far are using. ;)

Cheers.
ID: 1213055 · Report as offensive
Profile TRuEQ & TuVaLu
Volunteer tester
Joined: 4 Oct 99
Posts: 505
Credit: 69,523,653
RAC: 10
Sweden
Message 1213153 - Posted: 2 Apr 2012, 7:51:33 UTC - in response to Message 1213055.  

The problem that you're experiencing is more likely due to the BOINC version that you're using than to anything else, as you'll likely notice if you look at the versions that those of us who have answered so far are using. ;)

Cheers.



I am using the same BM version and I think it is working very well.
I also have the work cache problem, but I think it is more a problem with the constant server backoffs.

But I try to get AP tasks, which are rare.

//TRuEQ
TRuEQ & TuVaLu
ID: 1213153 · Report as offensive
Profile TRuEQ & TuVaLu
Volunteer tester
Joined: 4 Oct 99
Posts: 505
Credit: 69,523,653
RAC: 10
Sweden
Message 1213154 - Posted: 2 Apr 2012, 7:55:32 UTC - in response to Message 1213031.  

For nvidia, and using a 580 as the example, how much GPU can the download server support? I am finding it very difficult to keep even a single 580 running. And building up a cache seems impossible. The problem is 4x with my dual 590 machine. When I add another project, to fill the gaps, the whole cache fills up with the other project.

So...what do folks with a lot of nvidia GPU power do to deal with this? Besides shaking your fist at the transfers tab, that is.



I was told there are 2 download servers.

Try editing your c:\Windows\system32\drivers\etc\hosts file.

Add the line
208.68.240.13 boinc2.ssl.berkeley.edu

If that one works poorly, try the other instead:
208.68.240.18 boinc2.ssl.berkeley.edu

I hope it helps.

//TRuEQ


TRuEQ & TuVaLu
ID: 1213154 · Report as offensive
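A rough way to decide which of the two addresses to pin in the hosts file is to time a TCP connection to each. The sketch below is only an illustration: the hostname and IP addresses are the ones quoted in the post above (they date from 2012 and may no longer answer), port 80 is an assumption, and the helper names `connect_time`, `pick_fastest` and `hosts_line` are made up for this example. Connect time is also not the same as download throughput, so treat the result as a hint at best.

```python
import socket
import time

# Hostname and candidate addresses as quoted in the post above.
HOSTNAME = "boinc2.ssl.berkeley.edu"
CANDIDATES = ["208.68.240.13", "208.68.240.18"]


def connect_time(ip, port=80, timeout=5.0):
    """Return the seconds taken to open a TCP connection, or None on failure.

    Port 80 is an assumption; the thread does not say which port is used.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None


def pick_fastest(timings):
    """Return the IP with the smallest successful connect time, or None."""
    reachable = {ip: t for ip, t in timings.items() if t is not None}
    if not reachable:
        return None
    return min(reachable, key=reachable.get)


def hosts_line(ip, hostname=HOSTNAME):
    """Format one hosts-file entry: address, a space, then the hostname."""
    return f"{ip} {hostname}"


if __name__ == "__main__":
    timings = {ip: connect_time(ip) for ip in CANDIDATES}
    best = pick_fastest(timings)
    print(timings)
    print(hosts_line(best) if best else "neither address answered")
```

Only one entry for the hostname should be active in the hosts file at a time, so the printed line would replace, not join, any earlier entry.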
zombie67 [MM]
Volunteer tester
Joined: 22 Apr 04
Posts: 758
Credit: 27,771,894
RAC: 0
United States
Message 1213252 - Posted: 2 Apr 2012, 17:04:53 UTC

Thanks for the tips.

Unfortunately, no change. Machine requests work. Gets about 10 tasks. Maybe 1 of 10 will download successfully. The rest download not at all, or only partially, then go into a back off loop. With each retry, one or two tasks will successfully download, and the rest go into a progressively increasing back off loop. After the 8th failure or so, I abort them. And BOINC will not ask for any more tasks until they all successfully download (or are aborted). So the GPUs sit idle most of the time. The only machine that I have that is slow enough to stay busy is one with a couple of 8800s.

I have tried messing with the timeout, the max concurrent transfers, the IP addresses. The problem is that the task downloads just stall, and may take many retries and many hours to complete download (if ever). I am not sure if the bottleneck is bandwidth or server capacity. Either way, it looks like I will have to put my CUDA machines on other projects until Berkeley gets it resolved. I assume they are aware of the problem.
Dublin, California
Team: SETI.USA
ID: 1213252 · Report as offensive
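The "progressively increasing back off loop" described above behaves like an exponential backoff. The sketch below is not BOINC's actual retry schedule; the 60-second base delay, the doubling factor and the 4-hour cap are all assumptions chosen only to show how quickly the waiting adds up.

```python
def backoff_delay(failures, base=60, cap=4 * 3600):
    """Seconds to wait after `failures` consecutive failed attempts.

    Hypothetical schedule: the delay doubles with each failure, starting at
    `base` seconds and never exceeding `cap` seconds.
    """
    if failures < 1:
        return 0
    return min(cap, base * 2 ** (failures - 1))


def total_wait(failures, base=60, cap=4 * 3600):
    """Total seconds spent waiting across the first `failures` failures."""
    return sum(backoff_delay(n, base, cap) for n in range(1, failures + 1))


if __name__ == "__main__":
    for n in range(1, 9):
        print(n, backoff_delay(n))
    print("total after 8 failures:", total_wait(8), "seconds")
```

Under these assumed values, eight failures in a row add up to 15300 seconds of waiting, a little over four hours, which is in the same range as the "many hours" reported above.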
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1213254 - Posted: 2 Apr 2012, 17:10:46 UTC - in response to Message 1213252.  

Thanks for the tips.

Unfortunately, no change. Machine requests work. Gets about 10 tasks. Maybe 1 of 10 will download successfully. The rest download not at all, or only partially, then go into a back off loop. With each retry, one or two tasks will successfully download, and the rest go into a progressively increasing back off loop. After the 8th failure or so, I abort them. And BOINC will not ask for any more tasks until they all successfully download (or are aborted). So the GPUs sit idle most of the time. The only machine that I have that is slow enough to stay busy is one with a couple of 8800s.

I have tried messing with the timeout, the max concurrent transfers, the IP addresses. The problem is that the task downloads just stall, and may take many retries and many hours to complete download (if ever). I am not sure if the bottleneck is bandwidth or server capacity. Either way, it looks like I will have to put my CUDA machines on other projects until Berkeley gets it resolved. I assume they are aware of the problem.

I have attached to Einstein with a zero percent work share on my top rigs. If they run out of Seti GPU work, they grab a few hours of Einstein work to do on the side, whilst still trying to get Seti work.

"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1213254 · Report as offensive
LadyL
Volunteer tester
Joined: 14 Sep 11
Posts: 1679
Credit: 5,230,097
RAC: 0
Message 1213257 - Posted: 2 Apr 2012, 17:21:39 UTC
Last modified: 2 Apr 2012, 17:24:01 UTC

try SIV

That will nudge those stuck downloads along (Windows drop down menu, Boinc status, Auto retry checkbox)

Don't abort tasks or the error will reduce your quota and you'll get even less tasks.

Edit: yes I know, David thinks that the feature that project backoff is cleared when a task finishes downloading is enough. We know it isn't.
I'm not the Pope. I don't speak Ex Cathedra!
ID: 1213257 · Report as offensive
zombie67 [MM]
Volunteer tester
Joined: 22 Apr 04
Posts: 758
Credit: 27,771,894
RAC: 0
United States
Message 1213266 - Posted: 2 Apr 2012, 17:53:20 UTC - in response to Message 1213257.  

try SIV

That will nudge those stuck downloads along (Windows drop down menu, Boinc status, Auto retry checkbox)

Don't abort tasks or the error will reduce your quota and you'll get even less tasks.

Edit: yes I know, David thinks that the feature that project backoff is cleared when a task finishes downloading is enough. We know it isn't.


Yeah, I forgot to mention, I am also using SIV. With regard to quota, not a problem yet. I can't get enough tasks to even get close. ;)
Dublin, California
Team: SETI.USA
ID: 1213266 · Report as offensive
Profile TRuEQ & TuVaLu
Volunteer tester
Joined: 4 Oct 99
Posts: 505
Credit: 69,523,653
RAC: 10
Sweden
Message 1213269 - Posted: 2 Apr 2012, 17:57:18 UTC - in response to Message 1213252.  

Thanks for the tips.

Unfortunately, no change. Machine requests work. Gets about 10 tasks. Maybe 1 of 10 will download successfully. The rest download not at all, or only partially, then go into a back off loop. With each retry, one or two tasks will successfully download, and the rest go into a progressively increasing back off loop. After the 8th failure or so, I abort them. And BOINC will not ask for any more tasks until they all successfully download (or are aborted). So the GPUs sit idle most of the time. The only machine that I have that is slow enough to stay busy is one with a couple of 8800s.

I have tried messing with the timeout, the max concurrent transfers, the IP addresses. The problem is that the task downloads just stall, and may take many retries and many hours to complete download (if ever). I am not sure if the bottleneck is bandwidth or server capacity. Either way, it looks like I will have to put my CUDA machines on other projects until Berkeley gets it resolved. I assume they are aware of the problem.



Sorry it didn't work.
I chose one of the servers a couple of weeks ago and it has worked for me since then. But I'm located in northern Europe; maybe that's why it worked.

//TRuEQ
TRuEQ & TuVaLu
ID: 1213269 · Report as offensive
Profile Slavac
Volunteer tester
Joined: 27 Apr 11
Posts: 1932
Credit: 17,952,639
RAC: 0
United States
Message 1213280 - Posted: 2 Apr 2012, 18:22:55 UTC - in response to Message 1213270.  

Don't forget we have a new download server coming online soon.


Executive Director GPU Users Group Inc. -
brad@gpuug.org
ID: 1213280 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1213283 - Posted: 2 Apr 2012, 18:34:37 UTC - in response to Message 1213252.  

After the 8th failure or so, I abort them.

Why? Aborting tasks reduces the number you can download. Returning work that validates increases the quota you can download. Till you abort the next batch.

Grant
Darwin NT
ID: 1213283 · Report as offensive
LadyL
Volunteer tester
Joined: 14 Sep 11
Posts: 1679
Credit: 5,230,097
RAC: 0
Message 1213289 - Posted: 2 Apr 2012, 18:43:48 UTC - in response to Message 1213266.  

try SIV

That will nudge those stuck downloads along (Windows drop down menu, Boinc status, Auto retry checkbox)

Don't abort tasks or the error will reduce your quota and you'll get even less tasks.

Edit: yes I know, David thinks that the feature that project backoff is cleared when a task finishes downloading is enough. We know it isn't.


Yeah, I forgot to mention, I am also using SIV. With regard to quota, not a problem yet. I can't get enough tasks to even get close. ;)


I didn't mean the limits on tasks in progress, I meant the 'max tasks per day' as per application details page, which gets reduced if you have errors.
I thought download errors (and tasks aborted by user) count against that one too.
I'm not the Pope. I don't speak Ex Cathedra!
ID: 1213289 · Report as offensive
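The 'max tasks per day' behaviour discussed above (errors shrink the quota, validated work grows it) can be sketched as a small update rule. The exact numbers here are assumptions for illustration, not the project's confirmed server logic: the sketch subtracts one per error or abort, doubles on a valid result, and clamps between 1 and a hypothetical ceiling of 100. Whether download errors and user aborts really count against the quota is left open in the thread.

```python
def update_quota(quota, outcome, max_quota=100):
    """Return the new daily task quota after one reported result.

    Assumed rule for illustration only:
      "valid"             -> quota doubles, capped at max_quota
      "error" / "aborted" -> quota drops by one, never below 1
      anything else       -> quota unchanged
    """
    if outcome == "valid":
        return min(max_quota, quota * 2)
    if outcome in ("error", "aborted"):
        return max(1, quota - 1)
    return quota


def replay(quota, outcomes, max_quota=100):
    """Apply a sequence of outcomes in order and return the final quota."""
    for outcome in outcomes:
        quota = update_quota(quota, outcome, max_quota)
    return quota


if __name__ == "__main__":
    # Ten aborted downloads, then a single validated result.
    print(replay(100, ["aborted"] * 10))
    print(replay(100, ["aborted"] * 10 + ["valid"]))
```

With a rule of this shape, occasional aborts cost little as long as some work keeps validating, which fits the observation in the thread that the reduced limit was never actually reached.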
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1213291 - Posted: 2 Apr 2012, 18:44:29 UTC - in response to Message 1213257.  

Edit: yes I know, David thinks that the feature that project backoff is cleared when a task finishes downloading is enough. We know it isn't.

Under normal circumstances it would actually work.
i.e. download bandwidth is only maxed out 1-10% of the time. The backoffs would help clear that load. But here at Seti, where the bandwidth is maxed out all the time, it makes it impossible to download any work.
Grant
Darwin NT
ID: 1213291 · Report as offensive
Profile TRuEQ & TuVaLu
Volunteer tester
Joined: 4 Oct 99
Posts: 505
Credit: 69,523,653
RAC: 10
Sweden
Message 1213296 - Posted: 2 Apr 2012, 18:51:23 UTC
Last modified: 2 Apr 2012, 18:52:33 UTC

I also noticed a strange behaviour when running on 2 GPUs with BM 7.0.x.
I used exclude in cc_config.xml to manage which card was supposed to get which task.

SETIBeta was only requesting 10 tasks and then waited for the tasks to get downloaded. And then BM waited for those tasks to get completed.
Then it requested 10 new tasks.

That disappeared when I removed my second GPU card.

Now I run only 1 card and the work flows as it should, and I use the changes in cc_config.xml as I wrote earlier in this thread.

Maybe it is a BM 7.x.x bug here when using more than 1 GPU?
What do the experts think?

//TRuEQ
TRuEQ & TuVaLu
ID: 1213296 · Report as offensive
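The "exclude" setting mentioned above is presumably the `<exclude_gpu>` option of the BOINC client's cc_config.xml, which keeps a given project off a given GPU. The post does not show the actual file, so the sketch below only generates a minimal example of what such a file could look like. The project URL is a placeholder, the device number is hypothetical, and `build_cc_config` is a name invented for this example.

```python
import xml.etree.ElementTree as ET


def build_cc_config(exclusions):
    """Build cc_config.xml text with one <exclude_gpu> entry per exclusion.

    `exclusions` is a list of (project_url, device_num) pairs. Each pair
    tells the client not to run that project's tasks on that GPU.
    """
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    for project_url, device_num in exclusions:
        entry = ET.SubElement(options, "exclude_gpu")
        ET.SubElement(entry, "url").text = project_url
        ET.SubElement(entry, "device_num").text = str(device_num)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Placeholder URL and device number: substitute the project's real URL
    # as shown in the client, and the GPU index reported in its event log.
    print(build_cc_config([("http://example.org/project/", 1)]))
```

The URL has to match the project URL the client is attached to, and the client needs to re-read its config files (or be restarted) before a change takes effect.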
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1213305 - Posted: 2 Apr 2012, 19:22:22 UTC - in response to Message 1213280.  

Don't forget we have a new download server coming online soon.

It will be interesting to see if it does help with the download problems, although if it does, all it will be doing is working around the problem that the load sharing between the two servers, for whatever reason, just isn't working too well. One server appears to carry most of the load, hence trying to download from it results in frustration. Whereas the other server doesn't have much load, and so downloads from it, even at the worst of times, are possible.

Although presently, with all the AP work & the shorties going through the system, downloads are more difficult than they have been. And even Scheduler requests are taking a while to get a response, so there could be some other server issues lurking in the background at the moment.
Grant
Darwin NT
ID: 1213305 · Report as offensive
zombie67 [MM]
Volunteer tester
Joined: 22 Apr 04
Posts: 758
Credit: 27,771,894
RAC: 0
United States
Message 1213341 - Posted: 2 Apr 2012, 21:12:28 UTC - in response to Message 1213289.  

Yeah, I forgot to mention, I am also using SIV. With regard to quota, not a problem yet. I can't get enough tasks to even get close. ;)


I didn't mean the limits on tasks in progress, I meant the 'max tasks per day' as per application details page, which gets reduced if you have errors.
I thought download errors (and tasks aborted by user) count against that one too.


Yes, I understood what you were talking about. I am saying that I cannot come close to reaching the 'max tasks per day', even with the reduced limit due to the aborted tasks.
Dublin, California
Team: SETI.USA
ID: 1213341 · Report as offensive
zombie67 [MM]
Volunteer tester
Joined: 22 Apr 04
Posts: 758
Credit: 27,771,894
RAC: 0
United States
Message 1213344 - Posted: 2 Apr 2012, 21:15:39 UTC - in response to Message 1213283.  

After the 8th failure or so, I abort them.

Why? Aborting tasks reduces the number you can download. Returning work that validates increases the quota you can download. Till you abort the next batch.


Because BOINC will not ask for more work until all the tasks complete download or are aborted. For example, BOINC asks for work, and gets 10 tasks. Only 8 will (eventually) successfully download. So I can let my GPUs sit idle, or I can abort the last two, which allows BOINC to ask for more work again.
Dublin, California
Team: SETI.USA
ID: 1213344 · Report as offensive
zombie67 [MM]
Volunteer tester
Joined: 22 Apr 04
Posts: 758
Credit: 27,771,894
RAC: 0
United States
Message 1213346 - Posted: 2 Apr 2012, 21:17:17 UTC - in response to Message 1213291.  

Edit: yes I know, David thinks that the feature that project backoff is cleared when a task finishes downloading is enough. We know it isn't.

Under normal circumstances it would actually work.
i.e. download bandwidth is only maxed out 1-10% of the time. The backoffs would help clear that load. But here at Seti, where the bandwidth is maxed out all the time, it makes it impossible to download any work.


If bandwidth is the bottleneck, then the new server will not improve the situation, right? Are there any plans to improve bandwidth?
Dublin, California
Team: SETI.USA
ID: 1213346 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.