Please rise the limits... just a little...



Message boards : Number crunching : Please rise the limits... just a little...

juan BFB (Project donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 5472
Credit: 313,439,005
RAC: 100,726
Brazil
Message 1344751 - Posted: 9 Mar 2013, 23:05:06 UTC

Now that the Windows TCP settings bug is fixed, why not raise the limit a little?

Maybe just to 100 WU per GPU rather than per host (maintaining the 100 limit on CPU work). That would allow our fastest hosts to ride out the scheduled maintenance each Tuesday more easily.
____________

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5918
Credit: 61,708,002
RAC: 19,520
Australia
Message 1344753 - Posted: 9 Mar 2013, 23:12:24 UTC - in response to Message 1344751.


100 per core, 600 per GPU would be a nice start.
____________
Grant
Darwin NT.

Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 175
Credit: 65,836,082
RAC: 28,221
United States
Message 1344795 - Posted: 10 Mar 2013, 2:02:15 UTC - in response to Message 1344753.


100 per core, 600 per GPU would be a nice start.

Ditto.
____________

rob smith (Project donor)
Volunteer tester
Joined: 7 Mar 03
Posts: 8748
Credit: 61,650,314
RAC: 37,366
United Kingdom
Message 1344864 - Posted: 10 Mar 2013, 9:08:24 UTC

While an increase in the number of WUs distributed to an individual cruncher might sound like a good idea, it wouldn't help anything apart from our egos.
With the way the weekly outages have gone of late there is actually little need to increase the per-processor limit. What would be good, however, is for BOINC to correctly identify multi-processor Nvidia cards as being more than one processor. Why the chuff chuff does BOINC decide that my GTX690 is only a single processor, when it is reported as [2] on the accounts page??
____________
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5918
Credit: 61,708,002
RAC: 19,520
Australia
Message 1344871 - Posted: 10 Mar 2013, 9:17:04 UTC - in response to Message 1344864.

Why the chuff chuff does BOINC decide that my GTX690 is only a single processor, when it is reported as [2] on the accounts page??

Because it is a single device.
Just as a 16-core CPU with Hyperthreading (so 32 available processing units) is a single CPU.
____________
Grant
Darwin NT.

rob smith (Project donor)
Volunteer tester
Joined: 7 Mar 03
Posts: 8748
Credit: 61,650,314
RAC: 37,366
United Kingdom
Message 1344880 - Posted: 10 Mar 2013, 9:46:44 UTC

But it isn't - it's two devices on one board, which may or may not be connected by an internal SLI link (mine are unlinked). One part of the system reports it as TWO devices - take a look at the details for yourself: http://setiathome.berkeley.edu/show_host_detail.php?hostid=6890059
It is interesting to note that GPUGRID treats the GTX690 as being TWO GPUs, as witnessed by the fact that a few minutes ago it was running one instance of a GPUGRID task plus three S@H tasks at the same time, and will run 6 S@H tasks with a setting of 0.33/GPU - if it were a single GPU it would not be capable of either of these sets of operations.

____________
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?

Fred E. (Project donor)
Volunteer tester
Joined: 22 Jul 99
Posts: 768
Credit: 24,139,004
RAC: 0
United States
Message 1344900 - Posted: 10 Mar 2013, 10:40:48 UTC
Last modified: 10 Mar 2013, 10:41:23 UTC

While an increase in the number of WUs distributed to an individual cruncher might sound like a good idea, it wouldn't help anything apart from our egos.

I disagree. Every time I run out of work during the outages, I load work from a "B" project, and that crunch time is lost to SETI. I suspect the faster crunchers run out more than I do. That lost crunch time means that fewer results flow to the science databases, and that hurts the project. It is wasted capacity. Give us enough to withstand a 48-hour outage.
____________
Another Fred
Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop.

Richard Haselgrove (Project donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 8763
Credit: 52,715,310
RAC: 18,478
United Kingdom
Message 1344904 - Posted: 10 Mar 2013, 10:47:54 UTC - in response to Message 1344900.

While an increase in the number of WUs distributed to an individual cruncher might sound like a good idea, it wouldn't help anything apart from our egos.

I disagree. Every time I run out of work during the outages, I load work from a "B" project, and that crunch time is lost to SETI. I suspect the faster crunchers run out more than I do. That lost crunch time means that fewer results flow to the science databases, and that hurts the project. It is wasted capacity. Give us enough to withstand a 48-hour outage.

No it doesn't, I'm afraid. At the moment, with the project running absolutely flat out, it means that somebody else grabs the tasks and runs them for you.

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5918
Credit: 61,708,002
RAC: 19,520
Australia
Message 1344909 - Posted: 10 Mar 2013, 11:03:22 UTC - in response to Message 1344880.

But it isn't

But it is.
Just as I pointed out in my first post: 8, 16 or 32 cores in a CPU still count as 1 CPU.
2, 4 or 8 GPUs on a single board still count as a single video card.


I'm not sure what happens in the case of 2 CPUs or 2 physical video cards. If the limitation is per device, then you'd get 100 WUs for each CPU and each video card. If the limitation is per system, then you'd still be limited to 100 WUs for all CPUs and 100 WUs for all video cards, no matter how many the system has.
____________
Grant
Darwin NT.

Fred E. (Project donor)
Volunteer tester
Joined: 22 Jul 99
Posts: 768
Credit: 24,139,004
RAC: 0
United States
Message 1344911 - Posted: 10 Mar 2013, 11:08:35 UTC

While an increase in the number of WUs distributed to an individual cruncher might sound like a good idea, it wouldn't help anything apart from our egos.


I disagree. Every time I run out of work during the outages, I load work from a "B" project, and that crunch time is lost to SETI. I suspect the faster crunchers run out more than I do. That lost crunch time means that fewer results flow to the science databases, and that hurts the project. It is wasted capacity. Give us enough to withstand a 48-hour outage.

No it doesn't, I'm afraid. At the moment, with the project running absolutely flat out, it means that somebody else grabs the tasks and runs them for you.

I just don't see it that way. If I can't get work during an outage, they can't either. And if I still have B project work left over when work flow resumes and someone runs tasks I would have run, one set gets run instead of two. Larger caches would allow the project to run flat out, which it is not doing when we're out of work.

____________
Another Fred
Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop.

Chris S (Project donor)
Volunteer tester
Joined: 19 Nov 00
Posts: 32340
Credit: 14,276,495
RAC: 8,012
United Kingdom
Message 1344913 - Posted: 10 Mar 2013, 11:14:48 UTC

At the moment, with the project running absolutely flat out,

That is the whole point that Richard sensibly makes. They are doing their best, and yes, there doesn't seem to be enough work for everybody. You have to remember that the infrastructure of this project was scoped out 10 years ago, when we didn't have quad- and hex-core processors, nor GPU cards, all crunching away. The fact that they have kept pace with technology as well as they have, on a limited budget, is a credit to them.

Wiggo
Joined: 24 Jan 00
Posts: 7953
Credit: 98,331,608
RAC: 26,176
Australia
Message 1344915 - Posted: 10 Mar 2013, 11:25:47 UTC - in response to Message 1344909.

But it isn't

But it is.
Just as I pointed out in my first post: 8, 16 or 32 cores in a CPU still count as 1 CPU.
2, 4 or 8 GPUs on a single board still count as a single video card.


I'm not sure what happens in the case of 2 CPUs or 2 physical video cards. If the limitation is per device, then you'd get 100 WUs for each CPU and each video card. If the limitation is per system, then you'd still be limited to 100 WUs for all CPUs and 100 WUs for all video cards, no matter how many the system has.

It doesn't matter whether you have 1 CPU or 2, you still only get 100 WUs, and the same goes for GPUs: the number stays at 100 WUs no matter how many you have. That means that, unless you run some really early version of BOINC, the limit for any machine crunching on both CPU and GPU is 200 WUs.

Cheers.
____________
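Wiggo's description of the per-host limit can be sketched as follows (a hypothetical illustration only; the names and structure are invented here, and the real logic lives server-side in the BOINC scheduler):

```python
# Hypothetical sketch of the per-host limit described above:
# 100 WUs for all CPUs combined plus 100 WUs for all GPUs combined,
# regardless of how many physical devices the host has.

CPU_LIMIT = 100  # per host, not per CPU
GPU_LIMIT = 100  # per host, not per GPU

def host_limit(crunches_cpu, crunches_gpu):
    """Maximum WUs a host can hold under the rules described above."""
    return (CPU_LIMIT if crunches_cpu else 0) + (GPU_LIMIT if crunches_gpu else 0)

print(host_limit(True, False))  # CPU-only host: 100
print(host_limit(True, True))   # CPU plus any number of GPUs: 200
```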

juan BFB (Project donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 5472
Credit: 313,439,005
RAC: 100,726
Brazil
Message 1344928 - Posted: 10 Mar 2013, 12:12:30 UTC - in response to Message 1344915.
Last modified: 10 Mar 2013, 12:17:24 UTC


It doesn't matter whether you have 1 CPU or 2, you still only get 100 WUs, and the same goes for GPUs: the number stays at 100 WUs no matter how many you have. That means that, unless you run some really early version of BOINC, the limit for any machine crunching on both CPU and GPU is 200 WUs.

That's exactly why I suggest a "little increase" in the GPU WU limit. 100 CPU tasks are more than enough for half a day of work, even on the fastest CPUs (if I'm wrong, someone please show why), but 100 GPU WUs are not, even on a single-690 host: a WU (not a shortie, of course) normally crunches in less than 7 minutes, which is about 34 WU per hour, so a 100 WU cache lasts less than 3 hours, not enough for the 3-6 hour outages. A limit of 200 per GPU would give dual- or triple-GPU hosts enough work even for a long normal outage (not for unscheduled events, of course), but it would be a good start and wouldn't add too much new load to the databases.
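The arithmetic above can be checked with a quick back-of-the-envelope calculation (a sketch only; the per-WU run time and the number of concurrent tasks are assumed values taken from the post, not measurements):

```python
# How long a WU cache lasts on a fast GPU host, using the numbers
# quoted in the post above (~7 minutes per WU, ~34 WU per hour).

def cache_hours(cache_size, wu_minutes, concurrent_tasks):
    """Hours a cache of `cache_size` WUs lasts when `concurrent_tasks`
    run at once and each WU takes `wu_minutes`."""
    wu_per_hour = concurrent_tasks * 60.0 / wu_minutes
    return cache_size / wu_per_hour

# e.g. a GTX 690 host running 4 tasks at ~7 minutes each:
print(round(4 * 60.0 / 7, 1))            # ~34.3 WU/hour, as quoted
print(round(cache_hours(100, 7, 4), 1))  # 100 WU cache: ~2.9 hours
print(round(cache_hours(200, 7, 4), 1))  # 200 WU cache: ~5.8 hours
```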

I noticed something else (off topic for this thread): Cricket shows almost 100% bandwidth utilization, but all the AP splitters are down and MB splitting is in "slow" mode (only 3 splitters are working), and we have a lot of ready-to-send MB units (more than 300K). Everything appears to work fine; let's see what happens when the AP splitters return to duty.
____________

bill
Joined: 16 Jun 99
Posts: 861
Credit: 24,148,117
RAC: 1,872
United States
Message 1345099 - Posted: 10 Mar 2013, 21:46:38 UTC - in response to Message 1344904.

While an increase in the number of WUs distributed to an individual cruncher might sound like a good idea, it wouldn't help anything apart from our egos.

I disagree. Every time I run out of work during the outages, I load work from a "B" project, and that crunch time is lost to SETI. I suspect the faster crunchers run out more than I do. That lost crunch time means that fewer results flow to the science databases, and that hurts the project. It is wasted capacity. Give us enough to withstand a 48-hour outage.



No it doesn't, I'm afraid. At the moment, with the project running absolutely flat out, it means that somebody else grabs the tasks and runs them for you.

Emphasis added by me.

That last part should be in a FAQ or a sticky or something.

bill
Joined: 16 Jun 99
Posts: 861
Credit: 24,148,117
RAC: 1,872
United States
Message 1345107 - Posted: 10 Mar 2013, 21:56:01 UTC - in response to Message 1344911.

While an increase in the number of WUs distributed to an individual cruncher might sound like a good idea, it wouldn't help anything apart from our egos.


I disagree. Every time I run out of work during the outages, I load work from a "B" project, and that crunch time is lost to SETI. I suspect the faster crunchers run out more than I do. That lost crunch time means that fewer results flow to the science databases, and that hurts the project. It is wasted capacity. Give us enough to withstand a 48-hour outage.

No it doesn't, I'm afraid. At the moment, with the project running absolutely flat out, it means that somebody else grabs the tasks and runs them for you.

I just don't see it that way. If I can't get work during an outage, they can't either. And if I still have B project work left over when work flow resumes and someone runs tasks I would have run, one set gets run instead of two. Larger caches would allow the project to run flat out, which it is not doing when we're out of work.


But the project is already running flat out. How does increasing the limits make the project run any more flat out than it already is?

At what point does the increased number of work units in the field cause the database to crash? Isn't that why the limits were put in?

bill
Joined: 16 Jun 99
Posts: 861
Credit: 24,148,117
RAC: 1,872
United States
Message 1345110 - Posted: 10 Mar 2013, 22:03:30 UTC - in response to Message 1345105.

Yes, but the project never asked anybody to build such super crunchers, and the project is not time critical. If it takes an extra year to find those little green men, that is not a problem for the project.

Yes, it would be nice to have all the WUs you want, but if it causes even bigger problems, why do it?

bill
Joined: 16 Jun 99
Posts: 861
Credit: 24,148,117
RAC: 1,872
United States
Message 1345112 - Posted: 10 Mar 2013, 22:07:08 UTC - in response to Message 1345109.



At what point does the increased number of work units in the field cause the database to crash? Isn't that why the limits were put in?

That's the million dollar question.
I have been told that the actual DB usage is only at, I think, something like 60% of capacity. The limit may be on the server's capacity to process and maintain the DB.


So that problem has to be fixed first, wherever it resides at the project. What can we do to fix that problem from this side of the servers?

bill
Joined: 16 Jun 99
Posts: 861
Credit: 24,148,117
RAC: 1,872
United States
Message 1345118 - Posted: 10 Mar 2013, 22:16:54 UTC - in response to Message 1344928.

Much snippage by me.

That's exactly why I suggest a "little increase"


What is needed is the number at which the project grinds to a halt because of too many WUs in the field.

Then divide that by the number of crunchers, add a suitable fudge factor for a safety margin, and we'll have a limit that can be imposed until the underlying problem is fixed.

Has anybody run the numbers?
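The calculation bill describes could be sketched like this (every input below is an invented placeholder; nobody in the thread knows the actual database ceiling or active-host count):

```python
import math

def safe_per_host_limit(max_wus_in_field, active_hosts, fudge_factor=0.8):
    """Divide the maximum tolerable number of in-field WUs by the
    number of crunchers, with a fudge factor as a safety margin."""
    return math.floor(max_wus_in_field * fudge_factor / active_hosts)

# Invented placeholder inputs, purely to show the shape of the sum:
# say the DB tolerates 10 million results in the field and ~130,000
# hosts are actively requesting work.
print(safe_per_host_limit(10_000_000, 130_000))  # -> 61 WUs per host
```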



Copyright © 2014 University of California