Panic Mode On (60) Server problems?

Lionel
Joined: 25 Mar 00
Posts: 680
Credit: 563,640,304
RAC: 597
Australia
Message 1168853 - Posted: 7 Nov 2011, 5:48:03 UTC - in response to Message 1168846.  

Looks like Boinc did pretty well with the buttons while I was gone for the day.

The kitties must have helped out a bit.


... skite :)

can't get a thing here ... was just watching boinc ... 28 minutes to get a wu that took 1:08 to do ...

MUCH MUCH MUCH FATTER PIPES PLEASE ..... or turn the limits off so we can build up our caches again so that I am once again able to run counter cyclical as I used to and not be affected by these unfolding disasters ...


ID: 1168853
Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1168872 - Posted: 7 Nov 2011, 8:55:55 UTC - in response to Message 1168850.  

Looks like Boinc did pretty well with the buttons while I was gone for the day.

The kitties must have helped out a bit.

I've been experimenting like a Mad Scientist. I know that down to 270.51 I don't need dongles, that Raist's AP r5.21 GPU app doesn't get along with 270 or newer drivers, and that my PC doesn't get along with 267.24 or older drivers (no dongles): 267.24 sees one GPU of a GTX295, and 266.58 sees no GPUs under BOINC.

Have you tried extending the Desktop onto each GPU?

Claggy
ID: 1168872
LadyL
Volunteer tester
Joined: 14 Sep 11
Posts: 1679
Credit: 5,230,097
RAC: 0
Message 1168899 - Posted: 7 Nov 2011, 14:10:34 UTC

Claggy, I'm sure taking the scheduler off the HE link would help, but have you seen the load the scheduler replies alone cause?
Besides all the political nonsense surrounding the issue, it's a fair amount of traffic you're talking of.

Right, business as usual, and our yellow friend might make an appearance just before maintenance tomorrow.

We need something for the choppy waters of a non maxxed link and something that hates green...
ID: 1168899
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1168902 - Posted: 7 Nov 2011, 14:19:37 UTC - in response to Message 1168899.  

Looking across a sample of machines and projects, the scheduler request files are pretty uniformly bigger than the replies - and for big crunchers, I would suspect even more so, perhaps by several orders of magnitude.
ID: 1168902
Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1168905 - Posted: 7 Nov 2011, 14:24:07 UTC - in response to Message 1168899.  

Claggy, I'm sure taking the scheduler off the HE link would help, but have you seen the load the scheduler replies alone cause?
Besides all the political nonsense surrounding the issue, it's a fair amount of traffic you're talking of.

Did you look at the links I posted of the traffic on the campus routers? I think the scheduler traffic would be minuscule compared to the size of those pipes.
Yes, during weekdays those links carry more traffic, but they don't seem to be maxed out.

Claggy
ID: 1168905
perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 1168928 - Posted: 7 Nov 2011, 16:10:15 UTC - in response to Message 1168850.  

Vic, have you tried 267.59? http://www.nvidia.com/object/win7-winvista-64bit-267.59-whql-driver.html I don't have multi GPUs but it seems to work well for me on my little GTS 450.


PROUD MEMBER OF Team Starfire World BOINC
ID: 1168928
Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1168929 - Posted: 7 Nov 2011, 16:12:25 UTC - in response to Message 1168928.  
Last modified: 7 Nov 2011, 16:13:30 UTC

Vic, have you tried 267.59? http://www.nvidia.com/object/win7-winvista-64bit-267.59-whql-driver.html I don't have multi GPUs but it seems to work well for me on my little GTS 450.


SUPPORTED PRODUCTS

GeForce 500 series:
GTX 550 Ti

GeForce 400 series:
GTS 450

;-)

Claggy
ID: 1168929
zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65709
Credit: 55,293,173
RAC: 49
United States
Message 1168942 - Posted: 7 Nov 2011, 17:18:21 UTC - in response to Message 1168928.  
Last modified: 7 Nov 2011, 17:20:09 UTC

Vic, have you tried 267.59? http://www.nvidia.com/object/win7-winvista-64bit-267.59-whql-driver.html I don't have multi GPUs but it seems to work well for me on my little GTS 450.

No, but I only have GTX295s (4 water-cooled, 3 air-cooled, 1 dead air-cooled card that needs repairs, 8 total) and 2 MSI NX6200TC-TD128ELF cards now. Of course only the 295s can crunch; the 6200s aren't worth their silicon now.
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 1168942
perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 1168944 - Posted: 7 Nov 2011, 17:20:36 UTC - in response to Message 1168929.  

Sorry Vic and thanks Claggy for pointing that out. I didn't realize it had that narrow a scope. I'll go back to sleep now! :)


PROUD MEMBER OF Team Starfire World BOINC
ID: 1168944
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1169056 - Posted: 8 Nov 2011, 1:34:40 UTC

A scheduler request involves sending your client_state.xml to the server. The reply is the new additions to client_state.xml, plus some HTTP overhead for both.

We've seen recently that when the downloads abruptly stop, there is still about 8-11 Mbit of the blue line (us uploading result files and doing scheduler requests), and the green is usually below 1 Mbit (us getting responses from the scheduler).

Scheduler bandwidth is negligible in the grand scheme of things, but putting it on a different link would make it more reliable (fewer timeouts and HTTP errors due to a saturated link).
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1169056
Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1169088 - Posted: 8 Nov 2011, 6:30:28 UTC - in response to Message 1169056.  

A scheduler request involves sending your client_state.xml to the server. The reply is the new additions to client_state.xml, plus some HTTP overhead for both.

We've seen recently that when the downloads abruptly stop, there is still about 8-11 Mbit of the blue line (us uploading result files and doing scheduler requests), and the green is usually below 1 Mbit (us getting responses from the scheduler).

Scheduler bandwidth is negligible in the grand scheme of things, but putting it on a different link would make it more reliable (fewer timeouts and HTTP errors due to a saturated link).

Agreed, but the link might be even more oversubscribed because work assignment would be easier. I really wish BOINC had the capability to apply some limits in a fair way.

The project is already allowed to have the front page, forums, etc. on the campus network. Perhaps the costs involved come out of the percentage of donations which the University doesn't pass on to the project. Added bandwidth usage, even if not a large fraction of what's available, seems to fall into the ongoing political kind of negotiation.
Joe
ID: 1169088
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1169098 - Posted: 8 Nov 2011, 8:08:58 UTC - in response to Message 1169088.  

Agreed, but the link might be even more oversubscribed because work assignment would be easier. I really wish BOINC had the capability to apply some limits in a fair way.

The project is already allowed to have the front page, forums, etc. on the campus network. Perhaps the costs involved come out of the percentage of donations which the University doesn't pass on to the project. Added bandwidth usage, even if not a large fraction of what's available, seems to fall into the ongoing political kind of negotiation.
Joe

True, but I've been saying for a while that, despite what everyone thinks about switching from 100 Mbit to gigabit: yes, it will let more data transfers through in a given period, but there is a very distinct possibility that the entire system may fall to its knees crying digital tears once it is able to do more. Before the server upgrade, the 100 Mbit limit was probably the only thing keeping it all alive.

At least recently, it seems like the I/O issues have improved a bit, and the replica keeps up for the most part. What worries me most is that a few of the systems already lock up as often as they do; with theoretically 10 times the load on them, what kind of chaos would that unleash?

Perhaps download_1 can stay on HE, and download_2 can move to campus and be limited to 50-100 Mbit for the time being, until gigabit can be enabled? If I remember reading it correctly, the lab did finally get gigabit, but the uploads, downloads, and scheduler requests have to stay on the 100 Mbit HE link. I know the website is on campus, and they also send/receive the tapes to/from off-site storage on the campus link.

Seems like some diplomacy/provisions should be suggested to the powers-that-be and see what comes of it.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1169098
Jon Stoltz
Joined: 22 Jan 08
Posts: 6
Credit: 338,123
RAC: 0
United States
Message 1169137 - Posted: 8 Nov 2011, 13:14:05 UTC

i think skynet is taking over

*runs*
ID: 1169137
KWSN - Sir William The Flagrantly Verbose
Joined: 28 Apr 00
Posts: 829
Credit: 11,757,678
RAC: 0
United States
Message 1169139 - Posted: 8 Nov 2011, 13:15:02 UTC

Hi all!

Server status as of 13Z shows everything green, but I am unable to access the project. Anyone else continuing to have these issues after the difficulties over the weekend? My big iron crunchers have been starved and are getting bread and water ration of WUs since the trouble...

11/8/2011 8:06:22 AM | SETI@home | Started download of 28mr11ac.3438.11110.13.10.238
11/8/2011 8:06:22 AM | SETI@home | Started download of 28mr11ac.23545.13973.12.10.122
11/8/2011 8:06:25 AM | SETI@home | Temporarily failed download of 28mr11ac.23545.13973.12.10.122: HTTP error
11/8/2011 8:06:25 AM | SETI@home | Backing off 29 min 49 sec on download of 28mr11ac.23545.13973.12.10.122
11/8/2011 8:06:25 AM | SETI@home | Started download of 28mr11ac.23545.13973.12.10.140
11/8/2011 8:06:27 AM | SETI@home | Temporarily failed download of 28mr11ac.23545.13973.12.10.140: HTTP error
11/8/2011 8:06:27 AM | SETI@home | Backing off 18 min 39 sec on download of 28mr11ac.23545.13973.12.10.140
11/8/2011 8:06:27 AM | SETI@home | Started download of 28mr11ac.23545.13973.12.10.139
11/8/2011 8:06:29 AM | SETI@home | update requested by user
11/8/2011 8:06:31 AM | SETI@home | Sending scheduler request: Requested by user.
11/8/2011 8:06:31 AM | SETI@home | Reporting 2 completed tasks, not requesting new tasks
11/8/2011 8:06:53 AM | SETI@home | Scheduler request failed: Couldn't connect to server
11/8/2011 8:06:57 AM | | Project communication failed: attempting access to reference site
11/8/2011 8:07:00 AM | | Internet access OK - project servers may be temporarily down.
11/8/2011 8:09:34 AM | SETI@home | Temporarily failed download of 28mr11ac.23545.13973.12.10.139: HTTP error
11/8/2011 8:09:34 AM | SETI@home | Backing off 10 min 1 sec on download of 28mr11ac.23545.13973.12.10.139
11/8/2011 8:09:37 AM | | Project communication failed: attempting access to reference site
11/8/2011 8:09:38 AM | | Internet access OK - project servers may be temporarily down.
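
For anyone wanting to see how long each stalled download will wait before retrying, here is a minimal Python sketch that pulls the back-off delays out of log lines like the ones above. The regular expression and the function name `backoff_seconds` are my own, and it assumes the delay is always written as "N min M sec" or "M sec"; longer delays that BOINC may print with hours are not handled.

```python
import re

# Sample lines copied from the log above.
LOG = """\
11/8/2011 8:06:25 AM | SETI@home | Temporarily failed download of 28mr11ac.23545.13973.12.10.122: HTTP error
11/8/2011 8:06:25 AM | SETI@home | Backing off 29 min 49 sec on download of 28mr11ac.23545.13973.12.10.122
11/8/2011 8:06:27 AM | SETI@home | Backing off 18 min 39 sec on download of 28mr11ac.23545.13973.12.10.140
11/8/2011 8:09:34 AM | SETI@home | Backing off 10 min 1 sec on download of 28mr11ac.23545.13973.12.10.139
"""

# Hypothetical pattern: optional minutes, then seconds, then the file name.
BACKOFF = re.compile(r"Backing off (?:(\d+) min )?(\d+) sec on download of (\S+)")


def backoff_seconds(log_text):
    """Map each file name to its most recent back-off delay in seconds."""
    delays = {}
    for line in log_text.splitlines():
        match = BACKOFF.search(line)
        if match:
            minutes = int(match.group(1) or 0)
            seconds = int(match.group(2))
            delays[match.group(3)] = minutes * 60 + seconds
    return delays


delays = backoff_seconds(LOG)
for name, secs in delays.items():
    print(f"{name}: retry in {secs} s")
```

Pasting a full Event Log into `LOG` should give a quick picture of which files are waiting longest, as long as the lines keep the format shown here.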


"People sleep peaceably in their beds at night only because rough men stand ready to do violence on their behalf." - Orwell

ID: 1169139
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22160
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1169149 - Posted: 8 Nov 2011, 13:42:25 UTC

Uploads are going through, but I'm unable to report anything, so obviously I'm not getting any downloads to stall.

Well it is Tuesday, so the tyre kicker in chief will be along soon to kick the tyres into motion after the inevitable Tuesday downs and ups have taken place.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1169149
perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 1169156 - Posted: 8 Nov 2011, 14:19:26 UTC

Something seems to have worked right. This morning, for the first time, I saw the "computer has reached limit" message. I have 303 work units in progress on my E5400 core two dual and GTS 450. Looks like I might actually make it through the outage with work to spare. I've not managed to reach the limit before now.


PROUD MEMBER OF Team Starfire World BOINC
ID: 1169156
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22160
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1169157 - Posted: 8 Nov 2011, 14:19:59 UTC

Well that was spectacular.
Within a couple of minutes of my last posting things happened.
First a scheduler request got through and loads were reported.
Then I got a few resends in the download queue.
And they started to stall.
Then I got a pile of new work.
And they joined the stalled download queue.
Then I looked at the status page and saw that the transitioners are not in the best of health.

The tyre kicker will need a good set of boots with all the kicking that needs to be done. Maybe he will find that yellow fluff ball and cure it forever?
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1169157
Orioneti
Joined: 22 Oct 07
Posts: 21
Credit: 23,642,634
RAC: 0
Finland
Message 1169212 - Posted: 8 Nov 2011, 21:12:19 UTC - in response to Message 1169139.  
Last modified: 8 Nov 2011, 21:13:46 UTC

Anyone else continuing to have these issues after the difficulties over the weekend?


I've got the same problem here. All my crunchers are out of SETI work and have been unable to reach the project since the weekend. They are able to reach Einstein@home just fine though...
ID: 1169212
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1169218 - Posted: 8 Nov 2011, 21:42:06 UTC

And oooops, we're down again...
ID: 1169218
Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1169260 - Posted: 8 Nov 2011, 23:35:40 UTC - in response to Message 1169218.  
Last modified: 8 Nov 2011, 23:38:28 UTC

And oooops, we're down again...

Apparently because of the 2 Billion Results Bug:

2 Billion Results

Claggy
ID: 1169260
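
The thread doesn't spell out what the "2 Billion Results Bug" actually is. If it is the usual signed 32-bit limit (an assumption on my part, not something stated here), then a result ID counter stored in such a field tops out at 2,147,483,647 and wraps negative on the next increment. A minimal Python sketch of that wraparound, using a hypothetical helper `next_id_int32`:

```python
import ctypes

INT32_MAX = 2**31 - 1  # 2,147,483,647, the largest signed 32-bit value


def next_id_int32(current):
    """Increment an ID the way a signed 32-bit field would, wrapping on overflow.

    Hypothetical helper for illustration only; not taken from the BOINC code.
    """
    return ctypes.c_int32(current + 1).value


last_good = next_id_int32(INT32_MAX - 1)
wrapped = next_id_int32(INT32_MAX)
print(last_good)  # 2147483647
print(wrapped)    # -2147483648
```

Any code or database column that assumes IDs are positive and increasing would misbehave at that point, which would fit a project just passing two billion results.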


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.