Panic Mode On (116) Server Problems?

Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2001861 - Posted: 10 Jul 2019, 0:35:54 UTC

Spent the last couple of hours manually updating to get acks for the uploads that have gone through. The two slowest hosts reduced their uploads enough to clear the trip point and actually download new work. The other machines still have around 3000 tasks in upload, so no downloads for them yet. Today is definitely not a "set and forget" outage.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2001861 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2001863 - Posted: 10 Jul 2019, 0:56:44 UTC - in response to Message 2001861.  
Last modified: 10 Jul 2019, 0:57:22 UTC

Spent the last couple of hours manually updating to get acks for the uploads that have gone through. The two slowest hosts reduced their uploads enough to clear the trip point and actually download new work. The other machines still have around 3000 tasks in upload, so no downloads for them yet. Today is definitely not a "set and forget" outage.

. . and so endeth the lesson ... amen :(

Stephen

. . My slowest machine is managing to get work only because it only uploads about 1 task / 5 mins. The other three are all log jammed ... :(

Stephen

:(
ID: 2001863 · Report as offensive
Profile Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 2001865 - Posted: 10 Jul 2019, 1:03:18 UTC

Both my caches have been full for the last 90mins or so.

Cheers.
ID: 2001865 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2001868 - Posted: 10 Jul 2019, 1:30:56 UTC - in response to Message 2001865.  

Both my caches have been full for the last 90mins or so.

Cheers.


. . Here uploads continue to fail, but with much jiggery pokery I have managed to get new work in the caches on my two Linux machines. Now if I can just manage that trick on the two Windows machines :(

Stephen

ID: 2001868 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2001872 - Posted: 10 Jul 2019, 2:58:09 UTC

Making progress. Now only the Threadripper host, which still has 1000 tasks in upload, isn't asking for work. All the other hosts have finally cleared the hurdle and are asking for new work. But they still only ask for work if I manually update, since the stalled uploads prevent the automatic scheduler connection. At least the stalled uploads are below the trip point for not asking for work no matter what. I wonder what the code tells the client the trip point is that prevents downloading of work when you have too many uploads in progress. Is it a hard number, a percentage, or something else? I guess I will have to walk the code to find out unless someone wants to chime in with the actual mechanism.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2001872 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2001874 - Posted: 10 Jul 2019, 3:13:13 UTC - in response to Message 2001872.  

2x number of CPU cores.
ID: 2001874 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2001876 - Posted: 10 Jul 2019, 3:27:03 UTC - in response to Message 2001874.  

2x number of CPU cores.


Thanks very much Richard. I was always curious what the trip point was. Now I know.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2001876 · Report as offensive
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2001879 - Posted: 10 Jul 2019, 3:51:16 UTC - in response to Message 2001874.  

2x number of CPU cores.


so if you have more uploads than 2x CPU cores, it won't ask for work?

my systems won't ask for work automatically on the 5:03 cycle even if there are only 2-3 uploads in progress, and that's on a system with like 32 cores. it will get work if i manually click the update button however.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2001879 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2001882 - Posted: 10 Jul 2019, 4:13:12 UTC - in response to Message 2001879.  
Last modified: 10 Jul 2019, 4:13:44 UTC

2x number of CPU cores.


so if you have more uploads than 2x CPU cores, it won't ask for work?

my systems won't ask for work automatically on the 5:03 cycle even if there are only 2-3 uploads in progress, and that's on a system with like 32 cores. it will get work if i manually click the update button however.

I think there is another part to the algorithm. If you have any stalled upload, the client skips the automatic 5:03 scheduler connection. The 2x number of CPU cores limit is just what trips the "unable to request work because too many uploads" condition: you report your tasks, but the request goes out with 0:00 seconds of CPU work requested and 0:00 seconds of GPU work requested.

I still have to manually update hosts that can get work because the uploads are still slow.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2001882 · Report as offensive
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2001885 - Posted: 10 Jul 2019, 4:53:56 UTC - in response to Message 2001882.  
Last modified: 10 Jul 2019, 4:55:30 UTC

Are you automating it? Or manually clicking the button every 5 mins?

watch -n 310 ./boinccmd --project http://setiathome.berkeley.edu update


That will click the button for you every 310 seconds. Adjust as necessary. Make sure you run it from the BOINC directory
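If `watch` isn't installed, a plain shell loop should do much the same job. This is only a minimal sketch and nothing from BOINC itself: `repeat_every` is a made-up helper name, the run count is an extra so the loop eventually stops, and `/path/to/BOINC` is a placeholder for wherever your BOINC directory lives. The `boinccmd` line is the same one as above.

```shell
#!/bin/sh
# Hypothetical helper (not part of BOINC): run a command a fixed number of
# times with a fixed delay between runs.
# Usage: repeat_every INTERVAL_SECONDS COUNT CMD [ARGS...]
repeat_every() {
    interval=$1
    count=$2
    shift 2
    i=0
    while [ "$i" -lt "$count" ]; do
        # Keep looping even if one attempt fails (e.g. server unreachable).
        "$@" || echo "command failed (exit $?), will retry" >&2
        i=$((i + 1))
        # No need to sleep after the final run.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}

# Example, commented out so pasting this does nothing by accident.
# /path/to/BOINC is a placeholder; 100 runs at 310 s is roughly 8.6 hours.
# cd /path/to/BOINC && repeat_every 310 100 ./boinccmd --project http://setiathome.berkeley.edu update
```

Unlike `watch`, this does not clear the screen between runs, so the output of each update simply scrolls by.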
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2001885 · Report as offensive
Profile zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65746
Credit: 55,293,173
RAC: 49
United States
Message 2001886 - Posted: 10 Jul 2019, 5:09:04 UTC

Uploads seem to need ye olde plunger, while downloads don't here.
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 2001886 · Report as offensive
Profile zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65746
Credit: 55,293,173
RAC: 49
United States
Message 2001887 - Posted: 10 Jul 2019, 5:10:41 UTC - in response to Message 2001885.  

Are you automating it? Or manually clicking the button every 5 mins?

watch -n 310 ./boinccmd --project http://setiathome.berkeley.edu update


That will click the button for you every 310 seconds. Adjust as necessary. Make sure you run it from the BOINC directory

Ok so what file does that go in?
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 2001887 · Report as offensive
Profile Unixchick Project Donor
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 2001890 - Posted: 10 Jul 2019, 5:28:45 UTC

type that at the command prompt.
ID: 2001890 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2001892 - Posted: 10 Jul 2019, 5:41:22 UTC - in response to Message 2001885.  

Are you automating it? Or manually clicking the button every 5 mins?

watch -n 310 ./boinccmd --project http://setiathome.berkeley.edu update


That will click the button for you every 310 seconds. Adjust as necessary. Make sure you run it from the BOINC directory

No I have just been manually updating via BoincTasks on the daily driver since I've been reading through the day's happenings on it anyway while I was at the movies.

Moot point now, as the uploads finally cleared on the TR and it is finally starting to ask for work. If the upload problems had persisted, I would have put the watch command in play in a Terminal on the last recalcitrant child before heading off to bed.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2001892 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 2001893 - Posted: 10 Jul 2019, 5:45:49 UTC - in response to Message 2001892.  

I was wondering why you were still up...lol
ID: 2001893 · Report as offensive
Profile zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65746
Credit: 55,293,173
RAC: 49
United States
Message 2001894 - Posted: 10 Jul 2019, 5:51:06 UTC

ID: 2001894 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2001895 - Posted: 10 Jul 2019, 6:03:41 UTC - in response to Message 2001893.  

I was wondering why you were still up...lol

I'm about ready to call it quits, eyelids getting heavy. Pillow is beckoning.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2001895 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 2001897 - Posted: 10 Jul 2019, 7:47:08 UTC

Uploads are still rather iffy.
Just found a bit over a half dozen uploads on one of my systems counting down their backoffs.
Grant
Darwin NT
ID: 2001897 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 2001898 - Posted: 10 Jul 2019, 8:14:21 UTC

WU awaiting deletion is on the rise, splitter output & Ready-to-send buffer are on the decline.
Grant
Darwin NT
ID: 2001898 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2001900 - Posted: 10 Jul 2019, 8:32:39 UTC - in response to Message 2001879.  

2x number of CPU cores.


so if you have more uploads than 2x CPU cores, it won't ask for work?
my systems won't ask for work automatically on the 5:03 cycle even if there are only 2-3 uploads in progress, and that's on a system with like 32 cores. it will get work if i manually click the update button however.


. . Yeah, I think that is the mechanism: if you have more than 2x CPU cores uploads 'in progress' (read stalled) you won't get new work even when you give it a manual kick. Less than that and you will get new work, but only if you manually trigger the request.
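
The rule as described in this thread can be written out as a small shell function. This is a sketch of what posters here report seeing, not something taken from the BOINC source: the function name `work_fetch_state` and the three labels are invented for illustration, and treating the limit as strictly "more than" 2x cores is an assumption.

```shell
#!/bin/sh
# Sketch of the behaviour as described in this thread; NOT from the BOINC code.
# Usage: work_fetch_state PENDING_UPLOADS NCPUS
# Prints one of:
#   blocked      more than 2x cores uploads pending: no work even on manual update
#   manual-only  some uploads pending: work only when you trigger an update yourself
#   auto         no uploads pending: the automatic scheduler connection asks for work
work_fetch_state() {
    uploads=$1
    ncpus=$2
    limit=$((2 * ncpus))   # assumed trip point: 2x number of CPU cores
    if [ "$uploads" -gt "$limit" ]; then
        echo "blocked"
    elif [ "$uploads" -gt 0 ]; then
        echo "manual-only"
    else
        echo "auto"
    fi
}

work_fetch_state 1000 32   # prints: blocked
work_fetch_state 3 32      # prints: manual-only
work_fetch_state 0 32      # prints: auto
```

The second call matches the case reported above: a 32-core system with only 2-3 uploads pending gets work, but only on a manual update.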

Stephen

:(
ID: 2001900 · Report as offensive


 