Panic Mode On (62) Server problems?

Message boards : Number crunching : Panic Mode On (62) Server problems?

kittyman
Project Donor
Volunteer tester
Joined: 9 Jul 00
Posts: 45964
Credit: 815,514,212
RAC: 123,446
United States
Message 1175614 - Posted: 4 Dec 2011, 14:21:08 UTC - in response to Message 1175612.  
Last modified: 4 Dec 2011, 14:31:27 UTC

My top rig has about 3 hours worth of GPU work left.
Since it can't even upload work at the rate it is crunching the shorties, most of the time BOINC won't even let it ask for work, much less download anything issued.

Oh well, the kitties will have to find some alternate kibble to crunch on until the servers can be kicked out of their whacked state.


The download pipe is still full so who or where are all them WU's going?



Well, with the multitude of hosts on the project, even if up/down and work requests are only successful for a small percentage of them, it's enough to keep the bandwidth filled. That, and the fact that whatever is issued has mostly been shorties.

(I suspect as well that a lot of the bandwidth is being wasted on idle chatter between the servers and hosts with partial downloads and repeated retries).

I did just get one work request successfully through for my top rig by spending some quality time with my mouse and the retry button.
And they were of the 5-6 minute variety rather than 2 minute shorties.
That will help extend the GPU cache a little bit.

So maybe there is some hope.

Meow meow meow!
Always remember.....kitties are all Angels with fur.

Have made friends in this life.
Most were cats.
ID: 1175614 · Report as offensive
Kevin Olley

Joined: 3 Aug 99
Posts: 502
Credit: 46,965,339
RAC: 13,113
United Kingdom
Message 1175621 - Posted: 4 Dec 2011, 14:50:32 UTC - in response to Message 1175614.  

I did just get one work request successfully through for my top rig by spending some quality time with my mouse and the retry button.

Meow meow meow!


I tried that, but mine was too far gone: I only had shorties left and I could not press the button fast enough to keep up with them.



Kevin


ID: 1175621 · Report as offensive
Kevin Olley

Joined: 3 Aug 99
Posts: 502
Credit: 46,965,339
RAC: 13,113
United Kingdom
Message 1175623 - Posted: 4 Dec 2011, 14:52:32 UTC - in response to Message 1175610.  


41 VLARs in the pot; let's see how x41g can handle them. I am going easy, only 1 per card.



I am looking at about an hour per VLAR on GPU, so x41g looks as good as or slightly better than previous releases on 470's.



Kevin


ID: 1175623 · Report as offensive
Miklos M.

Joined: 5 May 99
Posts: 794
Credit: 17,648,150
RAC: 0
United States
Message 1175643 - Posted: 4 Dec 2011, 16:05:19 UTC

No uploads and no downloads of any kind now.
ID: 1175643 · Report as offensive
Wiggo "Socialist"
Joined: 24 Jan 00
Posts: 10535
Credit: 135,507,593
RAC: 42,302
Australia
Message 1175702 - Posted: 4 Dec 2011, 20:35:32 UTC - in response to Message 1175643.  

Well things here are business as usual with my 3 rigs bouncing on/off the limits still.

Cheers.
ID: 1175702 · Report as offensive
musicplayer

Joined: 17 May 10
Posts: 1785
Credit: 842,842
RAC: 0
Message 1175703 - Posted: 4 Dec 2011, 20:36:45 UTC

Umm, it is up and running again. Great!
ID: 1175703 · Report as offensive
ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 780
Credit: 232,826,803
RAC: 81,383
United Kingdom
Message 1175706 - Posted: 4 Dec 2011, 20:55:43 UTC - in response to Message 1175621.  

I did just get one work request successfully through for my top rig by spending some quality time with my mouse and the retry button.

Meow meow meow!


I tried that, but mine was too far gone: I only had shorties left and I could not press the button fast enough to keep up with them.




To save having to monitor the Retry button I made up a little cron job and a wee awk script:

crontab entry:
* * * * * source /home/Compaq_Owner/retryfiles

retryfiles:

cd c:
cd 'Program Files/BOINC'
./boinccmd.exe --get_file_transfers | gawk -f retry.awk

Program Files\BOINC\retry.awk:

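# Remember the file name from each "name: <file>" line.
/name/ { n = $2;}
# If that file's transfer is inactive, ask the client to retry it.
/ xfer active: no/ { system("./boinccmd --file_transfer http://setiathome.berkeley.edu/ " n " retry");}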

In other words, every minute cron runs retryfiles; retryfiles lists the files awaiting transfer and hands the results to retry.awk. The awk script stores the name of each file as the data passes through it, then if it sees that that file has an inactive transfer it spawns a system command to tell boinccmd to retry the transfer... The nice thing about doing it this way is that I only bother the Berkeley servers if I find a transfer that's in a wait-for-retry state.

This is working well on my two main Windows/NVIDIA boxes; their tasks in progress are slowly rising whereas this morning they both had empty caches. The commands as given should work out-of-the-box with most Linux installations; for Windows you need to install Cygwin and its cron service -- or other equivalent software.

To run the script less often than every minute use */<n> as the first entry in the crontab line, where <n> is however many minutes you want to delay between instances, e.g. */5 for every fifth minute.
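
For anyone who would rather keep everything in one shell file instead of a separate retry.awk, the steps described above could be sketched roughly as follows. This is only a sketch under assumptions: it expects the listing to contain the "name: <file>" and "xfer active: yes/no" lines that the awk script above relies on, it assumes boinccmd is on the PATH, and the function names idle_transfers and retry_idle_transfers are made up for illustration. It compares the first field with "name:" exactly rather than matching any line that contains "name", which is just a cautious guess at avoiding stray matches.

```shell
#!/bin/sh
# Sketch only, not a tested replacement for the cron/awk setup above.
# Assumption: `boinccmd --get_file_transfers` prints, for each transfer,
# a "name: <file>" line followed later by an "xfer active: yes|no" line.
# Check the output of your own client version before relying on this.

PROJECT_URL="http://setiathome.berkeley.edu/"   # project URL used in the post above

# Read a transfer listing on stdin and print the name of every transfer
# that is currently not active (hypothetical helper name).
idle_transfers() {
    awk '
        $1 == "name:"                    { name = $2 }
        /xfer active: no/ && name != ""  { print name; name = "" }
    '
}

# Ask the client to retry each idle transfer (hypothetical helper name).
# Like the original, this only issues a retry when an idle transfer is found.
retry_idle_transfers() {
    boinccmd --get_file_transfers | idle_transfers |
    while IFS= read -r file; do
        boinccmd --file_transfer "$PROJECT_URL" "$file" retry
    done
}
```

Calling retry_idle_transfers at the end of the file does one pass, so the crontab entry can stay as it is and simply point at this file. Keeping the parsing in its own function means it can be tried on a saved copy of the listing without bothering the servers.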
ID: 1175706 · Report as offensive
Tutankhamon "Communist"
Volunteer tester
Joined: 1 Nov 08
Posts: 6091
Credit: 37,752,485
RAC: 18,096
Sweden
Message 1175708 - Posted: 4 Dec 2011, 20:58:44 UTC - in response to Message 1175702.  

Well things here are business as usual with my 3 rigs bouncing on/off the limits still.

Cheers.


Same here: the two "faster" ones are at their limits, and the ATOM doesn't want any more. It's so slow that it stays well below the server limits.

I have a combined count of 1679 WUs in progress. So no problems here in uploading or downloading.
This is a test of the Emergency Moron System. Had there been a real moron in the room, there would've been a small mushroom cloud in the place where the idiot had been standing.
ID: 1175708 · Report as offensive
Kevin Olley

Joined: 3 Aug 99
Posts: 502
Credit: 46,965,339
RAC: 13,113
United Kingdom
Message 1175710 - Posted: 4 Dec 2011, 21:09:02 UTC - in response to Message 1175706.  


I tried that, but mine was too far gone: I only had shorties left and I could not press the button fast enough to keep up with them.




To save having to monitor the Retry button I made up a little cron job and a wee awk script:



Snip.

This may be useful to some, a little basic to others, but it's way above my head; mouse clicking is about my limit :-)


Kevin


ID: 1175710 · Report as offensive
ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 780
Credit: 232,826,803
RAC: 81,383
United Kingdom
Message 1175720 - Posted: 4 Dec 2011, 22:19:30 UTC - in response to Message 1175710.  


To save having to monitor the Retry button I made up a little cron job and a wee awk script:



Snip.

This may be useful to some, a little basic to others, but it's way above my head; mouse clicking is about my limit :-)



Each to his own, Kev. You know I need to understand this sort of thing for my job; the beauty of computers is that they can relieve us of button-clicking duty, and they never get tired. So ultimately a little bit of time spent learning something like awk (or Python, or Perl if you want even more capability) _can_ pay you back in added flexibility. But it's not compulsory...

Cheers, mate!
ID: 1175720 · Report as offensive
Kevin Olley

Joined: 3 Aug 99
Posts: 502
Credit: 46,965,339
RAC: 13,113
United Kingdom
Message 1175734 - Posted: 4 Dec 2011, 22:44:00 UTC - in response to Message 1175720.  


Each to his own, Kev. You know I need to understand this sort of thing for my job; the beauty of computers is that they can relieve us of button-clicking duty, and they never get tired. So ultimately a little bit of time spent learning something like awk (or Python, or Perl if you want even more capability) _can_ pay you back in added flexibility. But it's not compulsory...

Cheers, mate!


I am just a lorry driver by trade. Yes, computers are sneaking into the cabs, and there is even a bunch of electronics between my right foot and the engine now, but the thing I have to worry about is what's on the dash (i.e. call out the tow truck).

I only dabble with computers for pleasure, and due to other commitments time is in very short supply. If or when I can find the time, increasing my capabilities with a computer is on my to-do list.



Kevin


ID: 1175734 · Report as offensive
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2871
Credit: 10,622,794
RAC: 342
United States
Message 1175767 - Posted: 5 Dec 2011, 2:37:03 UTC

My APs upload first try every time one finishes. Scheduler requests go through every time, but about 99.5% of the time respond with "no tasks available" or "your app_info.xml file doesn't have a usable version of Seti@Home Enhanced." On the rare occasion that I do get issued an AP, it instant-fails 1-10 times and then finally goes through. I haven't hit any buttons in at least a month.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1175767 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 7495
Credit: 91,208,102
RAC: 46,042
Australia
Message 1175798 - Posted: 5 Dec 2011, 7:33:32 UTC - in response to Message 1175192.  


Take a look at Scarecrow's graphs.
A surge as the backlog of uploads goes through is to be expected, but for it to be sustained at over 110,000 for several hours? Talk about a hammering.
Grant
Darwin NT
ID: 1175798 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 7495
Credit: 91,208,102
RAC: 46,042
Australia
Message 1175805 - Posted: 5 Dec 2011, 8:53:30 UTC - in response to Message 1175798.  
Last modified: 5 Dec 2011, 9:12:56 UTC

And it would appear it was too much for too long: the uploads are backing up yet again.


EDIT: They have finally cleared, but the inbound traffic is looking jagged again. Not a good sign.
Grant
Darwin NT
ID: 1175805 · Report as offensive
Dimly Lit Lightbulb 😀
Project Donor
Volunteer tester
Joined: 30 Aug 08
Posts: 14363
Credit: 2,925,357
RAC: 2,932
United Kingdom
Message 1175917 - Posted: 5 Dec 2011, 19:07:02 UTC

One, and a third of an astropulse left to go. I'm starting to panic a bit.
ID: 1175917 · Report as offensive
Terror Australis
Volunteer tester

Joined: 14 Feb 04
Posts: 1791
Credit: 225,343,269
RAC: 10,426
Australia
Message 1175976 - Posted: 6 Dec 2011, 1:19:53 UTC
Last modified: 6 Dec 2011, 1:49:32 UTC

Oops, looks like uploads have gone MGD again.

All my rigs are now in "project backoff" for uploads and all have been getting "No Tasks Available" when asking for work for some hours.

And it's now just after knock off time in Berkeley so there will be no-one there to apply the rubber hammer. :P

EDIT: Looks like I was wrong about the rubber hammer, about 15 minutes after I posted, all uploads cleared at good speed.

T.A.
ID: 1175976 · Report as offensive
Amauri
Volunteer tester

Joined: 18 May 08
Posts: 26
Credit: 508,976
RAC: 0
Brazil
Message 1176024 - Posted: 6 Dec 2011, 4:14:00 UTC - in response to Message 1175706.  

To save having to monitor the Retry button I made up a little cron job and a wee awk script:

crontab entry:
* * * * * source /home/Compaq_Owner/retryfiles

retryfiles:

cd c:
cd 'Program Files/BOINC'
./boinccmd.exe --get_file_transfers | gawk -f retry.awk

Program Files\BOINC\retry.awk:

/name/ { n = $2;}
/ xfer active: no/ { system("./boinccmd --file_transfer http://setiathome.berkeley.edu/ " n " retry");}



Great job, Ivan, thank you!
ID: 1176024 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 7495
Credit: 91,208,102
RAC: 46,042
Australia
Message 1176044 - Posted: 6 Dec 2011, 8:27:24 UTC - in response to Message 1176024.  


Once again, uploads accumulate.
Grant
Darwin NT
ID: 1176044 · Report as offensive
MikeN

Joined: 24 Jan 11
Posts: 314
Credit: 44,998,680
RAC: 11,136
United Kingdom
Message 1176067 - Posted: 6 Dec 2011, 13:14:28 UTC

The Cricket graph has just baselined for uploads and downloads. That's probably that until after the weekly outage.
ID: 1176067 · Report as offensive
SciManStev
Volunteer tester
Joined: 20 Jun 99
Posts: 5858
Credit: 106,001,257
RAC: 3,171
United States
Message 1176070 - Posted: 6 Dec 2011, 13:34:55 UTC

I'm still dreaming of the day when my GPUs don't run dry 2-3 times a week.

Steve
Warning, addicted to SETI crunching!
Crunching as a member of GPU Users Group.
GPUUG Website
ID: 1176070 · Report as offensive


 