Server Run, July 30 - August 2 2010

Profile Sutaru Tsureku
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1022113 - Posted: 3 Aug 2010, 1:16:25 UTC - in response to Message 1022084.  
Last modified: 3 Aug 2010, 1:16:58 UTC

Sutaru,
That VLAR_2 was just one of the new ones that got sent back for some reason or another. So long as it doesn't end up on your GPU it should be no problem.

?

I was talking about hiamps' VLAR WU... ;-)

I use Fred's BOINC Rescheduler (V1.6) to keep VLAR WUs off my GPU and to avoid the famous -177 errors. :-)

Got 4 more VLARs and used the 1.6 this time... Looked through the list but didn't see 4 that looked any different.

I got ~20 VLAR WUs for the GPU which weren't marked as .vlar_x.
So I ran the BOINC Rescheduler, and all is fine. ;-)
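
As an aside, here is a minimal sketch in Python of the detection step a rescheduler-style tool needs. It is not Fred's actual tool, and it assumes VLAR tasks can be recognized by a '.vlar' marker in the workunit name inside BOINC's client_state.xml, which, as noted above, is not always the case.

```python
# Illustrative sketch only, NOT Fred's BOINC Rescheduler.
# Assumes VLAR workunits can be recognized by a ".vlar" marker in their
# name inside client_state.xml (the post above notes this is not always true).
import xml.etree.ElementTree as ET

def find_vlar_workunits(client_state_path="client_state.xml"):
    """Return the names of workunits whose name contains '.vlar'."""
    tree = ET.parse(client_state_path)
    vlar_names = []
    for wu in tree.getroot().iter("workunit"):
        name = wu.findtext("name", default="")
        if ".vlar" in name:
            vlar_names.append(name)
    return vlar_names

if __name__ == "__main__":
    for name in find_vlar_workunits():
        print("VLAR candidate:", name)
```
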
ID: 1022113 · Report as offensive
Profile Sutaru Tsureku
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1022115 - Posted: 3 Aug 2010, 1:21:54 UTC - in response to Message 1022112.  

It sure would be cool if they could raise the work unit cache from 100 to 1000 or something due to the limited amount of download time.

You mean the daily WU quota for GPU?

Your top-RAC host currently shows 'Max tasks per day: 453' (GPU). That value × 8 = 3,624 WUs/GPU/day.

It's a pity that they don't show the correct WU value there.
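
For anyone who wants to check that arithmetic, a tiny sketch; the ×8 multiplier is taken from the post as given, and what it represents (e.g. a per-resource or per-GPU factor) is an assumption here.

```python
# Rough quota arithmetic from the post above. The multiplier of 8 is taken
# as given; what it represents (e.g. a per-GPU or per-resource factor) is
# an assumption here.
max_tasks_per_day = 453   # 'Max tasks per day' shown on the host page
multiplier = 8            # host-specific factor quoted in the post

effective_daily_quota = max_tasks_per_day * multiplier
print(f"Effective daily quota: {effective_daily_quota} WUs/day")  # 3624
```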

ID: 1022115 · Report as offensive
Profile Donald L. Johnson
Joined: 5 Aug 02
Posts: 8240
Credit: 14,654,533
RAC: 20
United States
Message 1022134 - Posted: 3 Aug 2010, 5:07:41 UTC - in response to Message 1022115.  
Last modified: 3 Aug 2010, 5:08:15 UTC

It sure would be cool if they could raise the work unit cache from 100 to 1000 or something due to the limited amount of download time.

You mean the daily WU quota for GPU?

Your top-RAC host currently shows 'Max tasks per day: 453' (GPU). That value × 8 = 3,624 WUs/GPU/day.

It's a pity that they don't show the correct WU value there.


Sutaru, I think he is talking about expanding the Download Feeder Process, which has slots for only 100 Results at a time.



Hiamps, if the Feeder slots could be expanded to 200 or more, what might that do to the already maxed-out download bandwidth, not to mention all the issues some folks have had with "ghosts" - WUs that were assigned but were never properly downloaded?

A larger Download Feeder would also require more memory space on the server - is there enough available?
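
A quick illustration of what "slots" means here: the feeder keeps a small, fixed-size pool of ready-to-send results topped up from a larger backlog, so the scheduler doesn't have to hit the database for every request. The toy Python model below is only a sketch of that idea, not the actual BOINC feeder; the slot count and refill logic are illustrative.

```python
# Toy model of a fixed-slot feeder: a small buffer of ready-to-send results
# that is topped up from a larger backlog. Purely illustrative; the real
# BOINC feeder works against shared memory and the project database.
from collections import deque

class ToyFeeder:
    def __init__(self, backlog, slots=100):
        self.slots = slots                # fixed number of feeder slots
        self.backlog = deque(backlog)     # results waiting to enter the feeder
        self.buffer = deque()             # results currently "in the feeder"
        self.refill()

    def refill(self):
        """Top the buffer back up to the slot limit from the backlog."""
        while len(self.buffer) < self.slots and self.backlog:
            self.buffer.append(self.backlog.popleft())

    def handout(self, n):
        """Hand out up to n results to a scheduler request, then refill."""
        sent = [self.buffer.popleft() for _ in range(min(n, len(self.buffer)))]
        self.refill()
        return sent

# Example: 100 slots drained by a request of ~20 results, then refilled.
feeder = ToyFeeder(backlog=range(1000), slots=100)
print(len(feeder.handout(20)), "results sent;", len(feeder.buffer), "left in the feeder")
```
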
ID: 1022134 · Report as offensive
Profile hiamps
Volunteer tester
Joined: 23 May 99
Posts: 4292
Credit: 72,971,319
RAC: 0
United States
Message 1022155 - Posted: 3 Aug 2010, 7:38:59 UTC - in response to Message 1022134.  

It sure would be cool if they could raise the work unit cache from 100 to 1000 or something due to the limited amount of download time.

You mean the daily WU quota for GPU?

Your top-RAC host currently shows 'Max tasks per day: 453' (GPU). That value × 8 = 3,624 WUs/GPU/day.

It's a pity that they don't show the correct WU value there.


Sutaru, I think he is talking about expanding the Download Feeder Process, which has slots for only 100 Results at a time.



Hiamps, if the Feeder slots could be expanded to 200 or more, what might that do to the already maxed-out download bandwidth, not to mention all the issues some folks have had with "ghosts" - WUs that were assigned but were never properly downloaded?

A larger Download Feeder would also require more memory space on the server - is there enough available?

Not sure how that works anymore... It seems that if 10,000 people have downloads going when the servers go down, then when they come back up the servers handle it OK? Just going by what I see... Looks like they opened the gates wide anyway, as I took a nap and came back to 3,500 downloads.
Official Abuser of Boinc Buttons...
And no good credit hound!
ID: 1022155 · Report as offensive
Profile RottenMutt
Joined: 15 Mar 01
Posts: 1011
Credit: 230,314,058
RAC: 0
United States
Message 1022372 - Posted: 4 Aug 2010, 2:18:24 UTC

What happened this outage is exactly why ALL LIMITS NEED TO BE REMOVED AFTER 24 HOURS.

Check the cricket graphs: the server problems started about 10 hours after the limits were removed.
ID: 1022372 · Report as offensive
Profile Geek@Play
Volunteer tester
Joined: 31 Jul 01
Posts: 2467
Credit: 86,146,931
RAC: 0
United States
Message 1022378 - Posted: 4 Aug 2010, 2:30:56 UTC

I disagree.........................

When all limits are removed is precisely when "ghost" work units are more likely to occur, and the probability of the project going down increases substantially, due to excessive database activity and the bandwidth being maxed out.

I think if the cricket graph could be kept at a maximum of 80 (or less, due to reduced demand) for the entire weekend, then everything would work a lot better.
Boinc....Boinc....Boinc....Boinc....
ID: 1022378 · Report as offensive
Profile RottenMutt
Joined: 15 Mar 01
Posts: 1011
Credit: 230,314,058
RAC: 0
United States
Message 1022380 - Posted: 4 Aug 2010, 2:43:37 UTC - in response to Message 1022378.  

And then they have a chance to fix it before the outage, rather than having just a few hours to fill my belly... :P
ID: 1022380 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1022381 - Posted: 4 Aug 2010, 2:44:52 UTC - in response to Message 1022372.  

What happened this outage is exactly why ALL LIMITS NEED TO BE REMOVED AFTER 24 HOURS.

Check the cricket graphs: the server problems started about 10 hours after the limits were removed.

If you really convince them of that relationship, they absolutely won't make the final limits boost earlier and incur some obligation to fix things during the weekend. They're not paid for that kind of support, though they usually do it when needed if they can.

It might be possible to have one set of limits at the Friday morning start, a boost near their quitting time Friday afternoon, then the final boost Monday morning. But that first Friday morning set might need to be lower in order to reach a steady state before quitting time Friday afternoon. I doubt they'll be inclined to experiment along those lines next weekend, assuming there are new AP validators to watch.
                                                                  Joe
ID: 1022381 · Report as offensive
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19062
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1022414 - Posted: 4 Aug 2010, 8:07:18 UTC - in response to Message 1022381.  

What happened this outage is exactly why ALL LIMITS NEED TO BE REMOVED AFTER 24 HOURS.

Check the cricket graphs: the server problems started about 10 hours after the limits were removed.

If you really convince them of that relationship, they absolutely won't make the final limits boost earlier and incur some obligation to fix things during the weekend. They're not paid for that kind of support, though they usually do it when needed if they can.

It might be possible to have one set of limits at the Friday morning start, a boost near their quitting time Friday afternoon, then the final boost Monday morning. But that first Friday morning set might need to be lower in order to reach a steady state before quitting time Friday afternoon. I doubt they'll be inclined to experiment along those lines next weekend, assuming there are new AP validators to watch.
                                                                  Joe

I would like them to move the three-day outage to Monday through Wednesday, so that they can start up again on Thursday with limited downloads, which can then be ramped up on Friday under supervision.
At the moment UK office computers only have Monday and Tuesday to download, and if there are weekend outages even that option is dead. At present the restart on Fridays is after UK office hours.
ID: 1022414 · Report as offensive
Profile Sutaru Tsureku
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1022424 - Posted: 4 Aug 2010, 9:19:48 UTC - in response to Message 1022381.  

What happened this outage is exactly why ALL LIMITS NEED TO BE REMOVED AFTER 24 HOURS.

Check the cricket graphs: the server problems started about 10 hours after the limits were removed.

If you really convince them of that relationship, they absolutely won't make the final limits boost earlier and incur some obligation to fix things during the weekend. They're not paid for that kind of support, though they usually do it when needed if they can.

It might be possible to have one set of limits at the Friday morning start, a boost near their quitting time Friday afternoon, then the final boost Monday morning. But that first Friday morning set might need to be lower in order to reach a steady state before quitting time Friday afternoon. I doubt they'll be inclined to experiment along those lines next weekend, assuming there are new AP validators to watch.
                                                                  Joe


If I were an admin of this project (and I guess it shouldn't be a problem to adjust the limit from home via remote access to the server), it wouldn't be a problem for me to look at the cricket graph twice a day over the weekend and increase the limit in steps.

;-)
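
That "raise it in steps while watching the cricket graph" idea amounts to a simple feedback rule. Here is a hedged sketch of such a rule; the thresholds, step size and cap are entirely hypothetical and do not reflect the project's real configuration or tooling.

```python
# Hypothetical step-up rule for the per-host limit, driven by observed
# bandwidth utilization. Thresholds, step size and cap are invented for
# illustration only; this does not model the project's real setup.
def next_limit(current_limit, utilization, step=20, cap=500, headroom=0.80):
    """Raise the limit by one step while utilization stays under the headroom."""
    if utilization < headroom and current_limit < cap:
        return min(current_limit + step, cap)
    return current_limit  # hold steady (an admin could also step back down)

# Checking twice a day over a weekend, as suggested above:
limit = 60
for utilization in (0.55, 0.70, 0.78, 0.92, 0.76, 0.81):
    limit = next_limit(limit, utilization)
    print(f"utilization {utilization:.0%} -> limit {limit}")
```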

ID: 1022424 · Report as offensive
kittyman Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1022496 - Posted: 4 Aug 2010, 15:04:43 UTC

I thought Jeff's original idea was to script the increases gradually after the outage ended. What happened to that plan?
Idling bandwidth over the weekend and then setting up a mad dash for everybody to fill their tanks on Monday just makes no sense to me.

And did anybody ever figure out what was behind all the difficulty folks were having connecting to the servers, even when the bandwidth usage was rather reasonable this run?

Meow meow.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1022496 · Report as offensive
IFRS
Volunteer tester
Joined: 21 May 99
Posts: 1736
Credit: 259,180,282
RAC: 0
Brazil
Message 1022520 - Posted: 4 Aug 2010, 16:37:03 UTC

Not to mention that the babysitting of the big crunchers, which was never light, is roughly doubled now, and sometimes you can't fill them enough for the outage = frustration. I don't know how long I will be able to keep up this pace before I get sick of it. Working on the machines 3 or 4 hours every day just to keep the farm running is a price I may not be able to pay FOREVER.

ID: 1022520 · Report as offensive
Profile Geek@Play
Volunteer tester
Joined: 31 Jul 01
Posts: 2467
Credit: 86,146,931
RAC: 0
United States
Message 1022546 - Posted: 4 Aug 2010, 18:13:31 UTC - in response to Message 1022496.  

I agree. The weekend that Jeff seemed to be on call went very well. He kept us informed and made small increases throughout the weekend. By Monday I had very little to "top off" with for my 5 day cache.

Nobody ever explained the connection problem last weekend. As usual, no information flows from the top down. We get no respect!
Boinc....Boinc....Boinc....Boinc....
ID: 1022546 · Report as offensive
Profile hiamps
Volunteer tester
Joined: 23 May 99
Posts: 4292
Credit: 72,971,319
RAC: 0
United States
Message 1022579 - Posted: 4 Aug 2010, 21:52:03 UTC

DA knows there is a problem with ghosts, yet allows them to reset the DCF (duration correction factor) so you can't get work...
Official Abuser of Boinc Buttons...
And no good credit hound!
ID: 1022579 · Report as offensive
Profile Ghery S. Pettit
Joined: 7 Nov 99
Posts: 325
Credit: 28,109,066
RAC: 82
United States
Message 1022604 - Posted: 4 Aug 2010, 23:52:25 UTC

I don't know what all the hoopla is about. I just leave the machines alone and they take care of themselves. Not much choice when I've been traveling and can only access one of them remotely. Plenty of work for them. Now, I only have a functional GPU on one of them, but it seems to do alright, as well. Good thing as I won't be able to babysit them the next two shutdowns, either.
ID: 1022604 · Report as offensive
Profile Sutaru Tsureku
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1022711 - Posted: 5 Aug 2010, 14:35:58 UTC
Last modified: 5 Aug 2010, 14:38:24 UTC

If your PC had at least 4 GPUs (for example GTX 260-216s), you would need at least ~700 normal WUs/day.
For the three-day outage you need at least ~2,100 normal WUs.
To have a bit of a safety reserve (a 4-day WU cache), that makes ~2,800 WUs.

If you just left the BOINC Manager running on its own and your PC kept getting 'no tasks available' plus a backoff before the next work request, it would not be possible to fill the WU cache up to ~2,800 WUs.

If you get 1/4 shorties and 3/4 normal WUs, you would need ~2,100 normal and ~2,800 shorty WUs = ~4,900 WUs for 4 days. BTW, I don't think the BOINC client/manager can handle a WU cache that big.

If the SETI@home scheduler sends your BOINC ~20 WUs per request, your BOINC needs ~245 successful contacts.

I think this example shows quite well that only ~24 hours with no limit (or an increased limit) is too little.
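
Restating that arithmetic as a sketch; every input is the post's own rough estimate (per-host throughput, shorty mix, WUs per scheduler reply), not a measured value.

```python
# The post's own cache arithmetic for a hypothetical 4-GPU host; every
# input below is the post's rough estimate, not a measured value.
import math

normal_per_day = 700          # ~700 normal WUs/day for a 4-GPU host (post's estimate)
outage_days = 3
cache_days = 4                # one extra day as a safety reserve

print("for the outage:", normal_per_day * outage_days, "normal WUs")    # ~2,100
print("for a 4-day cache:", normal_per_day * cache_days, "normal WUs")  # ~2,800

# With a 1/4 shorty, 3/4 normal mix the post arrives at ~2,100 normal
# plus ~2,800 shorty WUs for the 4 days:
total_wus = 2100 + 2800                                                 # ~4,900

wus_per_request = 20          # ~20 WUs per scheduler reply (post's estimate)
print("scheduler contacts needed:", math.ceil(total_wus / wus_per_request))  # ~245
```
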
ID: 1022711 · Report as offensive
IFRS
Volunteer tester
Joined: 21 May 99
Posts: 1736
Credit: 259,180,282
RAC: 0
Brazil
Message 1022723 - Posted: 5 Aug 2010, 15:03:53 UTC - in response to Message 1022711.  

If your PC had at least 4 GPUs (for example GTX 260-216s), you would need at least ~700 normal WUs/day.
For the three-day outage you need at least ~2,100 normal WUs.
To have a bit of a safety reserve (a 4-day WU cache), that makes ~2,800 WUs.

If you just left the BOINC Manager running on its own and your PC kept getting 'no tasks available' plus a backoff before the next work request, it would not be possible to fill the WU cache up to ~2,800 WUs.

If you get 1/4 shorties and 3/4 normal WUs, you would need ~2,100 normal and ~2,800 shorty WUs = ~4,900 WUs for 4 days. BTW, I don't think the BOINC client/manager can handle a WU cache that big.

If the SETI@home scheduler sends your BOINC ~20 WUs per request, your BOINC needs ~245 successful contacts.

I think this example shows quite well that only ~24 hours with no limit (or an increased limit) is too little.


Exactly. A 10k-RAC machine with a single CUDA card won't need any babysitting, or nearly none. But the top-100 hosts, which I guess are mostly owned by the people who use these forums, need a good amount of babysitting, which is doubled now under this system.
Some of them, as Sutaru says, can't hold enough work for the outage. And even if you can, BOINC can't handle that many WUs without hanging the machine, or even report them when they are uploaded.
All I know is the big guns are suffering. Not complaining, just saying. If something can be done to avoid it, I think it should be, because the hardcore users put big money and time into the project, for the science of it.
ID: 1022723 · Report as offensive