Please raise the limits... just a little...




Message boards : Number crunching : Please raise the limits... just a little...

Author Message
juan BFB
Volunteer tester
Joined: 16 Mar 07
Posts: 4609
Credit: 232,199,709
RAC: 331,003
Brazil
Message 1345127 - Posted: 10 Mar 2013, 22:36:28 UTC

I'm talking about a "little increase" to avoid any problems; a change from 100 WU per GPU host to 100 WU per GPU surely will not crash the DB, and it will keep our GPUs working.

I agree on one thing: nobody asked anyone to build super-crunchers, but if they are here, why not use them?

I'm with Mark; we just want to make full use of our resources like anyone else.




____________

bill
Joined: 16 Jun 99
Posts: 845
Credit: 20,540,060
RAC: 14,915
United States
Message 1345131 - Posted: 10 Mar 2013, 22:48:56 UTC - in response to Message 1345127.

I'm talking about a "little increase" to avoid any problems; a change from 100 WU per GPU host to 100 WU per GPU surely will not crash the DB, and it will keep our GPUs working.

I agree on one thing: nobody asked anyone to build super-crunchers, but if they are here, why not use them?

I'm with Mark; we just want to make full use of our resources like anyone else.


Yes, but has anybody computed the numbers to see if even "a little increase" will cause the project to crash? Without actual numbers, I don't see the limits rising any time soon, if ever.

To paraphrase Mr. Spock: 'the needs of the project outweigh the needs of the one, or the few'.

Profile Mike
Volunteer tester
Joined: 17 Feb 01
Posts: 22358
Credit: 29,271,618
RAC: 23,965
Germany
Message 1345132 - Posted: 10 Mar 2013, 22:52:38 UTC

I totally agree, Bill.
Under current conditions I don't think it will change.
It's up to the staff to decide.

I'd also like to store 500 APs again, but it is what it is.

____________

Profile Bernie Vine
Volunteer tester
Joined: 26 May 99
Posts: 6601
Credit: 22,369,650
RAC: 14,901
United Kingdom
Message 1345136 - Posted: 10 Mar 2013, 23:08:07 UTC

There are 200 potential CPUs in the top 20 machines; currently they are allowed 100 WUs each: 20 x 100 = 2,000.

Increasing the limit to 100 per CPU would mean just those 20 machines could try to download an extra 18,000 WUs.

How many other multi-CPU rigs are out there? Each quad would need an extra 300, each eight-core an extra 700, and so on.

Even my little farm of 6 machines would be able to download an extra 1,000.

As Bill says, without the numbers it could be a disaster.
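Bernie's arithmetic can be spelled out as a quick sketch (illustrative only; the per-host limit, core counts, and machine counts are the post's own assumptions, not measured values):

```python
# Extra WUs unlocked by moving from a per-host cap to a per-CPU-core cap.
# All figures are the post's assumptions.
PER_HOST_LIMIT = 100  # current cap: 100 WUs per host
PER_CORE_LIMIT = 100  # proposed cap: 100 WUs per CPU core

def extra_wus(cores_per_host: int, hosts: int) -> int:
    """Extra WUs the given hosts could request under the per-core cap."""
    current = PER_HOST_LIMIT * hosts
    proposed = PER_CORE_LIMIT * cores_per_host * hosts
    return proposed - current

print(extra_wus(10, 20))  # top 20 machines, ~10 cores each -> 18000 extra
print(extra_wus(4, 1))    # a quad -> 300 extra
print(extra_wus(8, 1))    # an eight-core -> 700 extra
```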
____________


Today is life, the only life we're sure of. Make the most of today.

juan BFB
Volunteer tester
Joined: 16 Mar 07
Posts: 4609
Credit: 232,199,709
RAC: 331,003
Brazil
Message 1345143 - Posted: 10 Mar 2013, 23:30:37 UTC - in response to Message 1345136.
Last modified: 10 Mar 2013, 23:46:15 UTC

Sorry, double post.
____________

bill
Joined: 16 Jun 99
Posts: 845
Credit: 20,540,060
RAC: 14,915
United States
Message 1345144 - Posted: 10 Mar 2013, 23:31:28 UTC - in response to Message 1345136.

There are 200 potential CPUs in the top 20 machines; currently they are allowed 100 WUs each: 20 x 100 = 2,000.

Increasing the limit to 100 per CPU would mean just those 20 machines could try to download an extra 18,000 WUs.

How many other multi-CPU rigs are out there? Each quad would need an extra 300, each eight-core an extra 700, and so on.

Even my little farm of 6 machines would be able to download an extra 1,000.

As Bill says, without the numbers it could be a disaster.


Boincstats says there are 143,000+ active crunchers. Just running some rough numbers through my head, that's an increase of 143,000 x 100 WU = 14,300,000 WUs. Nothing trivial.
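bill's figure is the worst case, sketched below (it assumes every active cruncher takes the full extra allotment at once, which is exactly the rough simplification he acknowledges):

```python
# Worst-case extra demand if every active cruncher took 100 more WUs.
# 143,000 is the Boincstats figure quoted in the post.
active_crunchers = 143_000
extra_per_cruncher = 100
print(active_crunchers * extra_per_cruncher)  # 14300000 extra WUs
```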

Profile Mike
Volunteer tester
Joined: 17 Feb 01
Posts: 22358
Credit: 29,271,618
RAC: 23,965
Germany
Message 1345147 - Posted: 10 Mar 2013, 23:37:34 UTC - in response to Message 1345144.

There are 200 potential CPUs in the top 20 machines; currently they are allowed 100 WUs each: 20 x 100 = 2,000.

Increasing the limit to 100 per CPU would mean just those 20 machines could try to download an extra 18,000 WUs.

How many other multi-CPU rigs are out there? Each quad would need an extra 300, each eight-core an extra 700, and so on.

Even my little farm of 6 machines would be able to download an extra 1,000.

As Bill says, without the numbers it could be a disaster.


Boincstats says there are 143,000+ active crunchers. Just running some rough numbers through my head, that's an increase of 143,000 x 100 WU = 14,300,000 WUs. Nothing trivial.


Nah.
Not nearly everyone downloads 100 a day.
Some download only 100 a week, or a month.

____________

bill
Joined: 16 Jun 99
Posts: 845
Credit: 20,540,060
RAC: 14,915
United States
Message 1345148 - Posted: 10 Mar 2013, 23:40:29 UTC - in response to Message 1345147.

There are 200 potential CPUs in the top 20 machines; currently they are allowed 100 WUs each: 20 x 100 = 2,000.

Increasing the limit to 100 per CPU would mean just those 20 machines could try to download an extra 18,000 WUs.

How many other multi-CPU rigs are out there? Each quad would need an extra 300, each eight-core an extra 700, and so on.

Even my little farm of 6 machines would be able to download an extra 1,000.

As Bill says, without the numbers it could be a disaster.


Boincstats says there are 143,000+ active crunchers. Just running some rough numbers through my head, that's an increase of 143,000 x 100 WU = 14,300,000 WUs. Nothing trivial.


Nah.
Not nearly everyone downloads 100 a day.
Some download only 100 a week, or a month.


And some download hundreds+ a day.

Profile Mike
Volunteer tester
Joined: 17 Feb 01
Posts: 22358
Credit: 29,271,618
RAC: 23,965
Germany
Message 1345149 - Posted: 10 Mar 2013, 23:43:43 UTC
Last modified: 10 Mar 2013, 23:44:03 UTC

But that's not the majority.
____________

juan BFB
Volunteer tester
Joined: 16 Mar 07
Posts: 4609
Credit: 232,199,709
RAC: 331,003
Brazil
Message 1345150 - Posted: 10 Mar 2013, 23:45:43 UTC

There are 200 potential CPUs in the top 20 machines; currently they are allowed 100 WUs each: 20 x 100 = 2,000.

Increasing the limit to 100 per CPU would mean just those 20 machines could try to download an extra 18,000 WUs.

How many other multi-CPU rigs are out there? Each quad would need an extra 300, each eight-core an extra 700, and so on.

Even my little farm of 6 machines would be able to download an extra 1,000.

As Bill says, without the numbers it could be a disaster.


There is a mistake: I'm talking about 100 WU PER GPU, not per CPU or core (100 WU per CPU already gives enough work to keep crunching through the scheduled outages). The real number is a lot smaller. Let's imagine: if those same top 20 machines have 2.5 GPUs each on average, that is only about 3,000 WUs. Now imagine the top 100 hosts (which will include most of the 2- or 3-GPU hosts) with a mean of 2 GPUs per host (surely it's less); that gives about 10K WUs, a small amount compared with the total number of WUs the DB actually handles.

So I can't see the "possible disaster" in doing that. But maybe I'm wrong.
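juan's per-GPU estimate works out as follows (a sketch; the mean GPU counts per host are his guesses, not measured figures):

```python
# Extra WUs under a 100-per-GPU cap, relative to today's 100-per-host cap.
PER_GPU_LIMIT = 100
PER_HOST_LIMIT = 100

def extra_wus(mean_gpus_per_host: float, hosts: int) -> float:
    """Extra WUs requestable by `hosts` machines under the per-GPU cap."""
    return hosts * (mean_gpus_per_host * PER_GPU_LIMIT - PER_HOST_LIMIT)

print(extra_wus(2.5, 20))   # top 20 hosts -> 3000.0 ("about 3,000 WUs")
print(extra_wus(2.0, 100))  # top 100 hosts -> 10000.0 ("about 10K WUs")
```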
____________

Profile Bernie Vine
Volunteer tester
Joined: 26 May 99
Posts: 6601
Credit: 22,369,650
RAC: 14,901
United Kingdom
Message 1345154 - Posted: 11 Mar 2013, 0:09:21 UTC - in response to Message 1345150.

There are 200 potential CPUs in the top 20 machines; currently they are allowed 100 WUs each: 20 x 100 = 2,000.

Increasing the limit to 100 per CPU would mean just those 20 machines could try to download an extra 18,000 WUs.

How many other multi-CPU rigs are out there? Each quad would need an extra 300, each eight-core an extra 700, and so on.

Even my little farm of 6 machines would be able to download an extra 1,000.

As Bill says, without the numbers it could be a disaster.


There is a mistake: I'm talking about 100 WU PER GPU, not per CPU or core (100 WU per CPU already gives enough work to keep crunching through the scheduled outages). The real number is a lot smaller. Let's imagine: if those same top 20 machines have 2.5 GPUs each on average, that is only about 3,000 WUs. Now imagine the top 100 hosts (which will include most of the 2- or 3-GPU hosts) with a mean of 2 GPUs per host (surely it's less); that gives about 10K WUs, a small amount compared with the total number of WUs the DB actually handles.

So I can't see the "possible disaster" in doing that. But maybe I'm wrong.


So the top 20 machines would possibly ask for an EXTRA 10,000 WUs per day. And as we have no idea how many GPU cores are out there, if there are only 500 double-GPU machines, that would be 50,000 requests all hitting the servers and database on the day the limits were raised!

I know that, as an average cruncher who runs a few slower single-GPU machines, I am not considered as important as the multi-GPU monsters; however, since I put the TCP fix in, SETI@Home has been running better than it has for a long time. I would like to keep it that way.

The project has years of results that have not been analysed. I know that is supposed to change, but it is not a race to crunch as much and as fast as you can.

I suspect SETI@Home currently has more data than it will be able to handle in my lifetime. There is no rush!!
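Bernie's 50,000 figure assumes 500 two-GPU machines each picking up the extra allotment (the 500-host count is his hypothetical, not a measured number):

```python
# 500 hypothetical double-GPU hosts moving from a 100-per-host cap
# to a 100-per-GPU cap: each could request an extra 100 WUs.
double_gpu_hosts = 500
extra_per_host = 2 * 100 - 100  # new cap (200) minus old cap (100)
print(double_gpu_hosts * extra_per_host)  # 50000 extra requests
```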

____________


Today is life, the only life we're sure of. Make the most of today.

bill
Joined: 16 Jun 99
Posts: 845
Credit: 20,540,060
RAC: 14,915
United States
Message 1345155 - Posted: 11 Mar 2013, 0:09:40 UTC - in response to Message 1345150.

Until someone comes up with hard numbers, it's all assumption anyway. No numbers, no raise in limits.

Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 8275
Credit: 44,912,546
RAC: 13,509
United Kingdom
Message 1345163 - Posted: 11 Mar 2013, 0:51:09 UTC

About six weeks ago, Einstein wrote:

It seems that the large number of tasks (3.7M) is seriously stressing our databases (not only the replica).

Here, we currently have
3,379,437 MB out in the field
  170,122 AP out in the field
2,585,161 MB results returned
  189,666 AP results returned
---------
6,324,386 tasks in database

- some 70% higher than Einstein's 'serious stress' level.
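The "70% higher" comparison checks out against the figures quoted above (3.7M is the stress level Richard attributes to Einstein):

```python
# Sum the server-status figures quoted above, then compare with
# Einstein's 3.7M "serious stress" level.
in_database = {
    "MB out in the field": 3_379_437,
    "AP out in the field": 170_122,
    "MB results returned": 2_585_161,
    "AP results returned": 189_666,
}
total = sum(in_database.values())
print(total)                        # 6324386 tasks in the database
print(round(total / 3_700_000, 2))  # 1.71 -> roughly 70% higher
```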

If the limits were raised, there would be a one-off transitional spike as the fast and/or high cache hosts transitioned to the new maximum level. There would be frantic splitter and download activity for a couple of days while everybody filled their boots: no problem, we've survived worse than that before.

Then we'd settle down to a new steady state. The pipe would stay full. Tasks would be allocated on a 'return one, get one back' basis as now. The same amount of work would be done.

There would be two differences.

1) The database would be fuller - more bloat, more stress, less speed.
2) The upload pipe would be fuller. The same number of scheduler requests, but each file would be bigger.

I can't see any benefit (for the project, that is). And since Matt is in the process of getting everything clean, lean and ship-shape in preparation for moving the servers out of their air-conditioned home of the last 14 years and (down the hill?) to their new co-lo home, is now really the time to stuff everything up to and beyond the limit?

[Who would want to be driving that buggy? Really?]

juan BFB
Volunteer tester
Joined: 16 Mar 07
Posts: 4609
Credit: 232,199,709
RAC: 331,003
Brazil
Message 1345167 - Posted: 11 Mar 2013, 1:15:01 UTC
Last modified: 11 Mar 2013, 1:20:13 UTC

If they are already preparing the move, I agree with you; it's safer to keep everything the way it is working now.

Let's wait and see what's coming.
____________

ExchangeMan
Volunteer tester
Joined: 9 Jan 00
Posts: 103
Credit: 104,498,506
RAC: 212,023
United States
Message 1345181 - Posted: 11 Mar 2013, 2:56:00 UTC

I'm with everyone who wants to raise the limit. As a compromise, could we cut the CPU cache from 100 to 50 and raise the GPU cache to 150? For big GPU crunchers that would work well; there is a great imbalance between CPU crunch power and GPU crunch power. I've been getting shorties all day (as I'm sure everyone else has). If all I have are shorties in my cache, my big cruncher will burn through 100 in 15 to 20 minutes. I know, when you process at that rate, 50 more isn't likely worth it.

Now that the download transfer rate problems appear to be temporarily solved, at least for some of us Windows users, we don't have to fight that problem so much.

I don't know about any of you, but I get times where the 5-minute cycle gives me no GPU work units (saying none are available) even though I may report 30 completed tasks. This gets me really nervous when several cycles in a row do this. At least a somewhat larger cache increases the chances of riding this "phenomenon" out without the GPUs running dry.

Oh well, just my 2 cents.

Carry on.

____________

msattler
Volunteer tester
Joined: 9 Jul 00
Posts: 37286
Credit: 497,847,270
RAC: 492,372
United States
Message 1345182 - Posted: 11 Mar 2013, 3:09:08 UTC - in response to Message 1345181.
Last modified: 11 Mar 2013, 3:09:54 UTC

I'm with everyone who wants to raise the limit. As a compromise, could we cut the CPU cache from 100 to 50 and raise the GPU cache to 150? For big GPU crunchers that would work well; there is a great imbalance between CPU crunch power and GPU crunch power. I've been getting shorties all day (as I'm sure everyone else has). If all I have are shorties in my cache, my big cruncher will burn through 100 in 15 to 20 minutes. I know, when you process at that rate, 50 more isn't likely worth it.

Now that the download transfer rate problems appear to be temporarily solved, at least for some of us Windows users, we don't have to fight that problem so much.

I don't know about any of you, but I get times where the 5-minute cycle gives me no GPU work units (saying none are available) even though I may report 30 completed tasks. This gets me really nervous when several cycles in a row do this. At least a somewhat larger cache increases the chances of riding this "phenomenon" out without the GPUs running dry.

Oh well, just my 2 cents.

Carry on.

Same here...
100 WUs just don't last very long on a multi-GPU cruncher, especially when they are mostly of the shorty variety.
Between 9 rigs, my cache has been floating between around 1,500 and 1,700... so not all work requests are being filled.
And of course, several of my rigs won't make it through a Tuesday outage with only 100 GPU WUs.
____________
******************
Crunching Seti, loving all of God's kitties.

I have met a few friends in my life.
Most were cats.

Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 3565
Credit: 97,805,213
RAC: 78,978
United States
Message 1345311 - Posted: 11 Mar 2013, 13:37:17 UTC

BOINCStats reports SETI@Home has 420,541 active hosts. The next project down in active hosts is Einstein@Home with 262,645. So I would say that if Einstein is having issues, our servers must be held together with wizardry.
____________
SETI@home classic workunits: 93,865 CPU time: 863,447 hours

Join the BP6/VP6 User Group today!

Profile Chris S
Volunteer tester
Joined: 19 Nov 00
Posts: 29472
Credit: 8,870,481
RAC: 27,233
United Kingdom
Message 1345322 - Posted: 11 Mar 2013, 14:21:44 UTC

our servers must be held together with wizardry.

Matt, take a bow!

Sakletare
Joined: 18 May 99
Posts: 131
Credit: 20,106,540
RAC: 7,696
Sweden
Message 1345545 - Posted: 11 Mar 2013, 21:08:27 UTC - in response to Message 1345322.

our servers must be held together with wizardry.

Matt, take a bow!

Do not take Matt for some conjuror of cheap tricks! I wouldn't be surprised if Matt turns out to be the eighth son of an eighth son.

There, there's two nerdy references for you. ;)

Keith White
Joined: 29 May 99
Posts: 369
Credit: 2,468,587
RAC: 2,301
United States
Message 1345723 - Posted: 12 Mar 2013, 6:23:52 UTC
Last modified: 12 Mar 2013, 6:26:35 UTC

I believe the problem comes not from the number any one super-cruncher can do in a day, but from how many of those results persist in the database waiting for their mate from a wingman with a much slower system.

The top host can, on average, crunch 1 GPU-assigned unit in under a minute (I've seen it as low as 35 seconds) thanks to the number of GPUs and to multitasking multiple units on each GPU. In a day, we are talking 1,400-2,500 units. My very low-end GPU can do one in a little less than an hour; it takes around 3.5 days to process 100 GPU units. Right now, 15-25% of them per day are still waiting for their wingman when they are reported. Just imagine the percentage for a super-cruncher.

It's the ones pending validation, as well as those in the assigned/in-progress queue, that clog up database lookups for super-crunchers. I have roughly 25% of an average day's worth of work that's been pending for more than 30 days. What's the rate for someone who can crunch several hundred, if not a thousand, in a day? How many persist for weeks, filling up the database and slowing lookup times?

It's not that the super-crunchers are the problem; they are just the ever-accelerating conveyor belt of bonbons that Lucy simply can't box fast enough.
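Keith's throughput comparison can be sanity-checked with the per-task times he quotes (a sketch; 3,000 seconds, about 50 minutes, stands in for his "a little less than an hour"):

```python
# Daily throughput at a fixed per-task time, using the post's figures.
SECONDS_PER_DAY = 24 * 3600

def units_per_day(seconds_per_unit: float) -> float:
    """Tasks a host completes per day at a constant per-task time."""
    return SECONDS_PER_DAY / seconds_per_unit

print(round(units_per_day(60)))  # top host at ~1 min/unit -> 1440/day
print(round(units_per_day(35)))  # at 35 s/unit -> 2469/day
# Low-end GPU at ~50 min (3000 s) per unit: days to clear 100 units.
print(round(100 / units_per_day(3000), 1))  # 3.5 days
```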
____________
"Life is just nature's way of keeping meat fresh." - The Doctor



Copyright © 2014 University of California