Please raise the limits... just a little...

Message boards : Number crunching : Please raise the limits... just a little...


bill

Joined: 16 Jun 99
Posts: 861
Credit: 29,352,955
RAC: 0
United States
Message 1345131 - Posted: 10 Mar 2013, 22:48:56 UTC - in response to Message 1345127.  

I'm talking about a "little increase" to avoid any problems: a change from 100 WU per GPU host to 100 WU per GPU surely will not crash the DB, and it will keep our GPUs working.

I agree on one thing: nobody asked anyone to build super-crunchers, but now that they are here, why not use them?

I'm with Mark; we just want to make full use of our resources, like anyone else.


Yes, but has anybody computed the numbers to see if even "a little increase" will cause the project to crash? Without actual numbers I don't see the limits rising any time soon, if ever.

To paraphrase Mr. Spock: "the needs of the project outweigh the needs of the one, or the few".
ID: 1345131
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34249
Credit: 79,922,639
RAC: 80
Germany
Message 1345132 - Posted: 10 Mar 2013, 22:52:38 UTC

I totally agree, Bill.
Under current conditions I don't think it will change.
It's for the staff to decide.

I'd also like to store 500 APs again, but it is what it is.



With each crime and every kindness we birth our future.
ID: 1345132
Profile Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 9954
Credit: 103,452,613
RAC: 328
United Kingdom
Message 1345136 - Posted: 10 Mar 2013, 23:08:07 UTC

There are 200 potential CPUs in the top 20 machines; currently they are allowed 100 WUs each: 20 x 100 = 2,000.

Increasing the limit to 100 per CPU would mean just those 20 machines could try to download an extra 18,000 WUs.

How many other multi-CPU rigs are out there? Each quad would need an extra 300, each eight-core an extra 700, and so on.

Even my little farm of 6 machines would be able to download an extra 1,000.

As Bill says, without the numbers it could be a disaster.
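
A back-of-envelope sketch of that arithmetic, as a minimal Python sketch (the core counts per host are illustrative assumptions, not measured figures):

PER_CORE_LIMIT = 100   # proposed: 100 WUs per CPU core
PER_HOST_LIMIT = 100   # current: 100 WUs per host

def extra_wus(cores):
    # Extra tasks one host could try to cache under a per-core limit.
    return cores * PER_CORE_LIMIT - PER_HOST_LIMIT

print(20 * extra_wus(10))   # top 20 machines at 10 cores each: 18,000 extra
print(extra_wus(4))         # a quad: 300 extra
print(extra_wus(8))         # an eight-core: 700 extra
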
ID: 1345136
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1345143 - Posted: 10 Mar 2013, 23:30:37 UTC - in response to Message 1345136.  
Last modified: 10 Mar 2013, 23:46:15 UTC

Sorry, double post.
ID: 1345143
bill

Joined: 16 Jun 99
Posts: 861
Credit: 29,352,955
RAC: 0
United States
Message 1345144 - Posted: 10 Mar 2013, 23:31:28 UTC - in response to Message 1345136.  

There are 200 potential CPUs in the top 20 machines; currently they are allowed 100 WUs each: 20 x 100 = 2,000.

Increasing the limit to 100 per CPU would mean just those 20 machines could try to download an extra 18,000 WUs.

How many other multi-CPU rigs are out there? Each quad would need an extra 300, each eight-core an extra 700, and so on.

Even my little farm of 6 machines would be able to download an extra 1,000.

As Bill says, without the numbers it could be a disaster.


Boincstats says there are 143,000+ active crunchers. Just running some rough numbers through my head, that's an increase of 143,000 x 100 WU = 14,300,000 WUs. Nothing trivial.
ID: 1345144
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34249
Credit: 79,922,639
RAC: 80
Germany
Message 1345147 - Posted: 10 Mar 2013, 23:37:34 UTC - in response to Message 1345144.  

There are 200 potential CPUs in the top 20 machines; currently they are allowed 100 WUs each: 20 x 100 = 2,000.

Increasing the limit to 100 per CPU would mean just those 20 machines could try to download an extra 18,000 WUs.

How many other multi-CPU rigs are out there? Each quad would need an extra 300, each eight-core an extra 700, and so on.

Even my little farm of 6 machines would be able to download an extra 1,000.

As Bill says, without the numbers it could be a disaster.


Boincstats says there are 143,000+ active crunchers. Just running some rough numbers through my head, that's an increase of 143,000 x 100 WU = 14,300,000 WUs. Nothing trivial.


Nah.
Nowhere near everyone downloads 100 a day.
Some download only 100 a week, or a month.



With each crime and every kindness we birth our future.
ID: 1345147
bill

Joined: 16 Jun 99
Posts: 861
Credit: 29,352,955
RAC: 0
United States
Message 1345148 - Posted: 10 Mar 2013, 23:40:29 UTC - in response to Message 1345147.  

There are 200 potential CPUs in the top 20 machines; currently they are allowed 100 WUs each: 20 x 100 = 2,000.

Increasing the limit to 100 per CPU would mean just those 20 machines could try to download an extra 18,000 WUs.

How many other multi-CPU rigs are out there? Each quad would need an extra 300, each eight-core an extra 700, and so on.

Even my little farm of 6 machines would be able to download an extra 1,000.

As Bill says, without the numbers it could be a disaster.


Boincstats says there are 143,000+ active crunchers. Just running some rough numbers through my head, that's an increase of 143,000 x 100 WU = 14,300,000 WUs. Nothing trivial.


Nah.
Nowhere near everyone downloads 100 a day.
Some download only 100 a week, or a month.


And some download hundreds+ a day.
ID: 1345148
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34249
Credit: 79,922,639
RAC: 80
Germany
Message 1345149 - Posted: 10 Mar 2013, 23:43:43 UTC
Last modified: 10 Mar 2013, 23:44:03 UTC

But that's not the majority.


With each crime and every kindness we birth our future.
ID: 1345149
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1345150 - Posted: 10 Mar 2013, 23:45:43 UTC

There are 200 potential CPUs in the top 20 machines; currently they are allowed 100 WUs each: 20 x 100 = 2,000.

Increasing the limit to 100 per CPU would mean just those 20 machines could try to download an extra 18,000 WUs.

How many other multi-CPU rigs are out there? Each quad would need an extra 300, each eight-core an extra 700, and so on.

Even my little farm of 6 machines would be able to download an extra 1,000.

As Bill says, without the numbers it could be a disaster.


There is a mistake: I'm talking about 100 WU PER GPU, not per CPU or core (100 WU per CPU already gives enough work to keep crunching through the scheduled outages), so the number is a lot smaller. Let's try to imagine it: if those same top 20 machines have a mean of 2.5 GPUs each, that is only about 3,000 extra WUs. Now imagine the top 100 (which will include most of the 2- and 3-GPU hosts) with a mean of 2 GPUs per host (surely it is less); that gives about 10K extra WUs, little compared with the total number of WUs the DB actually handles.

So I can't see the "possible disaster" in doing that. But maybe I'm wrong.
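
A rough Python sketch of that estimate (the GPUs-per-host means are the guesses above, not measured figures):

PER_GPU_LIMIT = 100
CURRENT_PER_HOST = 100

def extra_gpu_wus(hosts, mean_gpus):
    # Extra tasks if each GPU, rather than each host, may cache 100 WUs.
    return int(hosts * (mean_gpus * PER_GPU_LIMIT - CURRENT_PER_HOST))

print(extra_gpu_wus(20, 2.5))   # top 20 hosts: about 3,000 extra WUs
print(extra_gpu_wus(100, 2.0))  # top 100 hosts: about 10,000 extra WUs
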
ID: 1345150
Profile Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 9954
Credit: 103,452,613
RAC: 328
United Kingdom
Message 1345154 - Posted: 11 Mar 2013, 0:09:21 UTC - in response to Message 1345150.  

There are 200 potential CPUs in the top 20 machines; currently they are allowed 100 WUs each: 20 x 100 = 2,000.

Increasing the limit to 100 per CPU would mean just those 20 machines could try to download an extra 18,000 WUs.

How many other multi-CPU rigs are out there? Each quad would need an extra 300, each eight-core an extra 700, and so on.

Even my little farm of 6 machines would be able to download an extra 1,000.

As Bill says, without the numbers it could be a disaster.


There is a mistake: I'm talking about 100 WU PER GPU, not per CPU or core (100 WU per CPU already gives enough work to keep crunching through the scheduled outages), so the number is a lot smaller. Let's try to imagine it: if those same top 20 machines have a mean of 2.5 GPUs each, that is only about 3,000 extra WUs. Now imagine the top 100 (which will include most of the 2- and 3-GPU hosts) with a mean of 2 GPUs per host (surely it is less); that gives about 10K extra WUs, little compared with the total number of WUs the DB actually handles.

So I can't see the "possible disaster" in doing that. But maybe I'm wrong.


So the top 100 machines could possibly ask for an EXTRA 10,000 WUs per day. And we have no idea how many GPUs there are out there: if there are only 500 double-GPU machines, that would be 50,000 extra requests, all hitting the servers and database on the day the limits were raised!

I know that as an average cruncher who runs a few slower single-GPU machines I am not considered as important as the multi-GPU monsters; however, since I put the TCP fix in, SETI@Home has been running better than it has for a long time. I would like to keep it that way.

The project has years of results that have not been analysed. I know that is supposed to change, but it is not a race to crunch as much and as fast as you can.

I suspect SETI@Home currently has more data than it will be able to handle in my lifetime. There is no rush!!

ID: 1345154
bill

Joined: 16 Jun 99
Posts: 861
Credit: 29,352,955
RAC: 0
United States
Message 1345155 - Posted: 11 Mar 2013, 0:09:40 UTC - in response to Message 1345150.  

Until someone comes up with hard numbers it's all assumption anyway. No numbers, no raise in limits.
ID: 1345155
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14644
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1345163 - Posted: 11 Mar 2013, 0:51:09 UTC

About six weeks ago, Einstein wrote:

It seems that the large number of tasks (3.7M) is seriously stressing our databases (not only the replica).

Here, we currently have
3,379,437 MB out in the field
  170,122 AP out in the field
2,585,161 MB results returned
  189,666 AP results returned
---------
6,324,386 tasks in database

- some 70% higher than Einstein's 'serious stress' level.
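
The arithmetic checks out; a quick verification in Python:

in_field = 3_379_437 + 170_122   # MB + AP out in the field
returned = 2_585_161 + 189_666   # MB + AP results returned
total = in_field + returned
print(f"{total:,}")              # 6,324,386 tasks in the database
print(total / 3_700_000)         # ~1.71, i.e. some 70% above Einstein's 3.7M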

If the limits were raised, there would be a one-off transitional spike as the fast and/or high cache hosts transitioned to the new maximum level. There would be frantic splitter and download activity for a couple of days while everybody filled their boots: no problem, we've survived worse than that before.

Then we'd settle down to a new steady state. The pipe would stay full. Tasks would be allocated on a 'return one, get one back' basis as now. The same amount of work would be done.

There would be two differences.

1) The database would be fuller - more bloat, more stress, less speed.
2) The upload pipe would be fuller. The same number of scheduler requests, but each file would be bigger.

I can't see any benefit (for the project, that is). And since Matt is in the process of getting everything clean, lean and ship-shape in preparation for moving the servers out of their air-con home for the last 14 years, and (down the hill?) to their new co-lo home, is now the time to stuff everything up to and beyond the limit?

[Who would want to be driving that buggy? Really?]
ID: 1345163
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1345167 - Posted: 11 Mar 2013, 1:15:01 UTC
Last modified: 11 Mar 2013, 1:20:13 UTC

If they are already preparing to move, I agree with you; it is safer to keep everything working the way it is now.

Let's wait and see what is coming.
ID: 1345167
ExchangeMan
Volunteer tester

Joined: 9 Jan 00
Posts: 115
Credit: 157,719,104
RAC: 0
United States
Message 1345181 - Posted: 11 Mar 2013, 2:56:00 UTC

I would be with everyone who wants to raise the limit. As a compromise, could we cut the CPU cache from 100 to 50 and raise the GPU cache to 150? For big GPU crunchers that would work well. There is a great imbalance between CPU crunch power and GPU crunch power. I've been getting shorties all day (as I'm sure everyone else has). If all I have are shorties in my cache, my big cruncher will burn through 100 in 15 to 20 minutes. I know, when you process at that rate, 50 more probably isn't worth much.

Now that the download transfer-rate problems appear, temporarily at least, to be solved for some of us Windows users, we don't have to fight that problem so much.

I don't know about any of you, but I see times when the 5-minute cycle gives me no GPU work units (saying none are available) even though I may be reporting 30 completed tasks. This gets me really nervous when several cycles in a row do this. At least a somewhat larger cache improves the chances of riding this "phenomenon" out without the GPUs running dry.
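
As a rough illustration of why 50 more doesn't buy much at shorty speeds, a small Python sketch using the burn rate quoted above (an assumption, not a measurement):

CYCLE_MINUTES = 5        # scheduler request interval
burn_rate = 100 / 17.5   # WUs per minute: ~100 shorties in 15-20 minutes

for cache in (100, 150):
    minutes = cache / burn_rate
    print(cache, round(minutes, 1), round(minutes / CYCLE_MINUTES, 1))
# Even 150 shorties last only ~26 minutes: about five empty 5-minute cycles.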

Oh well, just my 2 cents.

Carry on.

ID: 1345181
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1345182 - Posted: 11 Mar 2013, 3:09:08 UTC - in response to Message 1345181.  
Last modified: 11 Mar 2013, 3:09:54 UTC

I would be with everyone who wants to raise the limit. As a compromise, could we cut the CPU cache from 100 to 50 and raise the GPU cache to 150? For big GPU crunchers that would work well. There is a great imbalance between CPU crunch power and GPU crunch power. I've been getting shorties all day (as I'm sure everyone else has). If all I have are shorties in my cache, my big cruncher will burn through 100 in 15 to 20 minutes. I know, when you process at that rate, 50 more probably isn't worth much.

Now that the download transfer-rate problems appear, temporarily at least, to be solved for some of us Windows users, we don't have to fight that problem so much.

I don't know about any of you, but I see times when the 5-minute cycle gives me no GPU work units (saying none are available) even though I may be reporting 30 completed tasks. This gets me really nervous when several cycles in a row do this. At least a somewhat larger cache improves the chances of riding this "phenomenon" out without the GPUs running dry.

Oh well, just my 2 cents.

Carry on.

Same here....
100 WUs just don't last very long on a multi-GPU cruncher, especially when they are mostly of the shorty variety.
Across 9 rigs, my cache has been floating between around 1,500 and 1,700, so not all work requests are being filled.
And of course, several of my rigs won't make it through a Tuesday outage with only 100 GPU WUs.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1345182
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1345311 - Posted: 11 Mar 2013, 13:37:17 UTC

BOINCStats reports SETI@Home has 420,541 active hosts. The next project down in active hosts is Einstein@Home with 262,645. So I would say that if Einstein is having issues, our servers must be held together with wizardry.
SETI@home classic workunits: 93,865 · CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1345311
Sakletare
Joined: 18 May 99
Posts: 132
Credit: 23,423,829
RAC: 0
Sweden
Message 1345545 - Posted: 11 Mar 2013, 21:08:27 UTC - in response to Message 1345322.  

our servers must be held together with wizardry.

Matt, take a bow!

Do not take Matt for some conjuror of cheap tricks! I wouldn't be surprised if Matt turns out to be the eighth son of an eighth son.

There, that's two nerdy references for you. ;)
ID: 1345545
Keith White
Joined: 29 May 99
Posts: 392
Credit: 13,035,233
RAC: 22
United States
Message 1345723 - Posted: 12 Mar 2013, 6:23:52 UTC
Last modified: 12 Mar 2013, 6:26:35 UTC

I believe the problem comes not from the number any one super-cruncher can do in a day, but from how many of those will persist in the database waiting for their mate from a wingman with a much slower system.

The top host can, on average, crunch one GPU-assigned unit in under a minute (I've seen it as low as 35 seconds) thanks to the number of GPUs and to multitasking multiple units on each GPU. In a day we are talking 1,400-2,500 units. My very low-end GPU can do one in a little less than an hour; it takes around 3.5 days to process 100 GPU units. Right now 15-25% of them per day are still waiting for their wingman when they are reported. Just imagine the percentage for a super-cruncher.

It's the pending-validation results, as well as those sitting in the in-progress queue, that clog up database lookups for super-crunchers. I have roughly 25% of an average day's worth of work that has been pending for more than 30 days. What's the rate for someone who can crunch several hundred, if not a thousand, in a day? How many persist for weeks, filling up the database and slowing lookup times?

It's not that the super-crunchers are the problem; they are just the ever-accelerating conveyor belt of bonbons that Lucy simply can't box fast enough.
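
Put in queueing terms (Little's law), the number of results a host keeps sitting in the database is roughly its daily output times the mean wait for a wingman. A minimal Python sketch, with an assumed 3-day mean wait:

def rows_in_db(wus_per_day, mean_wait_days):
    # Little's law: average items in the system = arrival rate x mean wait.
    return wus_per_day * mean_wait_days

print(rows_in_db(1_800, 3))      # a top host at ~1,800 WUs/day: ~5,400 rows
print(rows_in_db(100 / 3.5, 3))  # a low-end GPU: under 100 rows
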
"Life is just nature's way of keeping meat fresh." - The Doctor
ID: 1345723
tbret
Volunteer tester
Joined: 28 May 99
Posts: 3380
Credit: 296,162,071
RAC: 40
United States
Message 1345736 - Posted: 12 Mar 2013, 6:46:13 UTC - in response to Message 1345723.  

I believe the problem comes not from the number any one super-cruncher can do in a day, but from how many of those will persist in the database waiting for their mate from a wingman with a much slower system.

The top host can, on average, crunch one GPU-assigned unit in under a minute (I've seen it as low as 35 seconds) thanks to the number of GPUs and to multitasking multiple units on each GPU. In a day we are talking 1,400-2,500 units. My very low-end GPU can do one in a little less than an hour; it takes around 3.5 days to process 100 GPU units. Right now 15-25% of them per day are still waiting for their wingman when they are reported. Just imagine the percentage for a super-cruncher.

It's the pending-validation results, as well as those sitting in the in-progress queue, that clog up database lookups for super-crunchers. I have roughly 25% of an average day's worth of work that has been pending for more than 30 days. What's the rate for someone who can crunch several hundred, if not a thousand, in a day? How many persist for weeks, filling up the database and slowing lookup times?

It's not that the super-crunchers are the problem; they are just the ever-accelerating conveyor belt of bonbons that Lucy simply can't box fast enough.


A happy situation might be for the cache limits to go from 100 work units to 0.5- or 1-day caches, with shorter timeout limits. That would be a "validate or perish" situation for the database.

Oh, and not to mention that it would cut the number of database entries for all of those who do fewer than 100 work units in a day (and for CPU work units, of which it would be difficult to do 100 in a day).
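
A minimal sketch of what sizing caches by time rather than by count could look like (the throughput figures are illustrative assumptions):

def cache_size(wus_per_day, buffer_days=0.5):
    # A time-based cache scales with each host's actual throughput.
    return round(wus_per_day * buffer_days)

print(cache_size(2_000))   # fast multi-GPU host: 1,000 WUs for half a day
print(cache_size(30))      # slow single-GPU host: just 15 WUs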

More connections? I doubt it.

Look at the production of the computers you find in the 960th-1000th places in the "top computers" list. There are a LOT of computers making a LOT of connections to rebuild a 100-work-unit cache.

I *really* hope Matt is making headway in getting our super-duper servers into a building with a super-duper connection to the outside world.


ID: 1345736
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22149
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1345738 - Posted: 12 Mar 2013, 6:58:35 UTC

S@H uses multiple servers, so the load on the upload servers does not affect the download servers. The upload servers feed into the validators, which aren't affected by the poor performance of the download servers; S@H has always had a large pool of data awaiting validation, and it appears to manage that side of things quite well.
The download servers, on the other hand, are struggling to cope with demand. It's bound not to be a simple problem (apart from the lack of bandwidth) but a deep-rooted one, and it is taking a lot of effort to isolate and resolve. Contributors I can think of include the retry/back-off behaviour of clients, the imbalance between the two download servers (one is massively faster than the other), the "auto-sync" between the various time-outs and hand-overs (they all appear to be based on 5 minutes), and so on. Not to mention that there is a large number of different versions of BOINC out there, all with subtly different "approaches to the world". And finally there are the abusers, sorry, users....
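
On the retry/back-off and "auto-sync" point: when every client retries on the same 5-minute boundary, the requests arrive in synchronized waves. Randomized exponential backoff is the standard way to spread them out; a generic Python sketch, not what any particular BOINC version actually implements:

import random

def backoff_seconds(attempt, base=300.0, cap=3600.0):
    # Exponential backoff with full jitter: the delay window doubles with
    # each failed attempt, and the uniform draw desynchronizes the clients.
    return random.uniform(0, min(cap, base * 2 ** attempt))

for attempt in range(4):
    print(round(backoff_seconds(attempt)))
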
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1345738



 