The Server Issues / Outages Thread - Panic Mode On! (118)

Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13722
Credit: 208,696,464
RAC: 304
Australia
Message 2024895 - Posted: 25 Dec 2019, 8:48:41 UTC - in response to Message 2024893.  
Last modified: 25 Dec 2019, 8:52:24 UTC

Starting to get dribs and drabs of work. But all downloads are stalling out and backing off 5 hours.
Yep, instant timeouts on Linux system downloads now. On the Windows system (hosts file used) they don't time out instantly; they just count away with no download activity, then time out after a few seconds or a minute or so.
It's probably a good thing we can't download any work, as the splitters aren't actually producing any to replace what's already gone from the Ready-to-send buffer.

Things are still very, very broken.
Grant
Darwin NT
ID: 2024895 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 2024896 - Posted: 25 Dec 2019, 8:49:29 UTC - in response to Message 2024893.  

Starting to get dribs and drabs of work. But all downloads are stalling out and backing off 5 hours.



Keith...why are you up?? hahaha
ID: 2024896 · Report as offensive
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 2024898 - Posted: 25 Dec 2019, 9:00:33 UTC

After many hours and dozens of "no tasks available", I did the trick for resending lost tasks (even though there weren't any), and the scheduler requests started giving me small handfuls of them. Most of which are--as others have also reported--having transfer issues.

But at least the scheduler is giving work out again.

Thank you, staff.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 2024898 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13722
Credit: 208,696,464
RAC: 304
Australia
Message 2024899 - Posted: 25 Dec 2019, 9:02:46 UTC - in response to Message 2024898.  

But at least the scheduler is giving work out again.
I wish that were so.
For those brief periods where downloads are considered to be downloading, the Scheduler response to a work request is "Project has no tasks available."
Grant
Darwin NT
ID: 2024899 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13722
Credit: 208,696,464
RAC: 304
Australia
Message 2024907 - Posted: 25 Dec 2019, 10:45:32 UTC

Downloads have started again, but "Project has no tasks available" seems to be the current response for work, as the Ready-to-send is 0 & the splitters aren't splitting.
Grant
Darwin NT
ID: 2024907 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2024908 - Posted: 25 Dec 2019, 10:48:48 UTC

Much the same here, except it slowly appears to be easing off and getting back to normal - both work fetch and downloads seem to be getting easier.
ID: 2024908 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13722
Credit: 208,696,464
RAC: 304
Australia
Message 2024911 - Posted: 25 Dec 2019, 11:08:34 UTC
Last modified: 25 Dec 2019, 11:09:23 UTC

Hmm, managed to pick up some work (new work that is, not resends) in the last 30min or so, Ready-to-send showing 1200, but splitter output has been reported as 0 for about an hour now.

Will see how things are come the morning.
Night all.
Grant
Darwin NT
ID: 2024911 · Report as offensive
Profile NorthCup

Joined: 6 Jun 99
Posts: 108
Credit: 50,093,984
RAC: 5
Germany
Message 2024912 - Posted: 25 Dec 2019, 11:40:17 UTC
Last modified: 25 Dec 2019, 11:42:52 UTC

The slots are full - all Linux anonymous-platform systems are operational - thanks again to the SETI staff and happy holidays to all - regards, Klaus
ID: 2024912 · Report as offensive
Profile petri33
Volunteer tester

Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 2024913 - Posted: 25 Dec 2019, 11:56:37 UTC - in response to Message 2024606.  
Last modified: 25 Dec 2019, 11:57:44 UTC

OK, I've applied Retvari's workaround on my Linux Mint host running special sauce and the spoofed client. There's good news and bad news.

Good news (1): It's running
Good news (2): It's picking up the command line. -nobs isn't specifically acknowledged, but I've got an -unroll in there too, and that's reported.

Bad news: It's not received the instruction to run on device 0 / device 1

The device number is being passed correctly in init_data.xml, but the special sauce app isn't looking in the right place. I see Petri's app_info.xml doesn't contain an API version specifier, and it worked before, so I assume it's only listening for a command line. That's ancient, and should be corrected. Petri needs to compile against a newer BOINC API library and make the appropriate coding adjustments.

But at least my machine is running 2-up on device 0 - so at about half normal speed - and some warmth is creeping back into my workroom.


Hi,
Anyone with compilation ability can make the special app read the GPU number from init_data.xml.

In main.cpp you can define BOINC_MAJOR_VERSION as 8 just before the version check:
...
    // Patch for Cuda device selection, Care of Juha Sointusalo.
    // Deals with boinc api 7.5 onwards breaking change (mid major version),
    // from standard use of command line to field not present on preferred earlier versions.
    // init_data.xml has e.g. <gpu_device_num>0</gpu_device_num>
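#undef BOINC_MAJOR_VERSION  // drop any value from the BOINC headers so the redefinition below is clean
#define BOINC_MAJOR_VERSION 8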
#if (BOINC_MAJOR_VERSION >= 8) || ((BOINC_MAJOR_VERSION == 7) && (BOINC_MINOR_VERSION >= 5))
    if (app_init_data.gpu_device_num >= 0) {
      gCUDADevPref = app_init_data.gpu_device_num + 1;
      fprintf(stderr, "app_init.xml specified GPU %d\r\n", app_init_data.gpu_device_num);
    }
#endif
...


I tested this and it says now on stderr
app_init.xml specified GPU 0
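
For anyone patching their own build, the selection order can be sketched roughly as below. This is only an illustration under assumptions: `InitData` and `pick_device_pref` are made-up names standing in for the `gpu_device_num` field of the BOINC init data and for the app's own logic, and the `--device N` fallback is assumed to be the older command-line form. The `+ 1` mirrors the patch above, so 0 means "no preference".

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for the single init_data.xml field used here.
// A negative value means the client did not supply a device number.
struct InitData {
    int gpu_device_num = -1;
};

// Returns a 1-based device preference (0 = no preference).
// Prefers the value from init_data.xml; otherwise falls back to a
// "--device N" command-line pair (assumed legacy form).
int pick_device_pref(const InitData& init, int argc, const char* const* argv) {
    assert(argc == 0 || argv != nullptr);

    if (init.gpu_device_num >= 0) {
        return init.gpu_device_num + 1;
    }

    for (int i = 1; i + 1 < argc; ++i) {
        if (std::strcmp(argv[i], "--device") == 0) {
            char* end = nullptr;
            const long n = std::strtol(argv[i + 1], &end, 10);
            // Accept only a complete, non-negative integer.
            if (end != argv[i + 1] && *end == '\0' && n >= 0) {
                return static_cast<int>(n) + 1;
            }
        }
    }
    return 0;  // nothing specified; let the app choose
}
```

With a host like Richard's, where init_data.xml carries the device number but the command line does not, the first branch is the one that matters.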
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 2024913 · Report as offensive
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34253
Credit: 79,922,639
RAC: 80
Germany
Message 2024914 - Posted: 25 Dec 2019, 12:07:41 UTC

My caches are full again as well.


With each crime and every kindness we birth our future.
ID: 2024914 · Report as offensive
Profile Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 2024915 - Posted: 25 Dec 2019, 12:26:36 UTC
Last modified: 25 Dec 2019, 12:27:58 UTC

My 2 cleaned rigs had their caches full again 45 mins before it's the day after Xmas here and I'm off to bed before that day gets here.

Enjoy the festive season everyone.

Cheers.
ID: 2024915 · Report as offensive
Profile betreger Project Donor
Joined: 29 Jun 99
Posts: 11360
Credit: 29,581,041
RAC: 66
United States
Message 2024917 - Posted: 25 Dec 2019, 12:44:24 UTC

Both boxes are full, Merry Christmas to all.
ID: 2024917 · Report as offensive
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 2024918 - Posted: 25 Dec 2019, 12:49:05 UTC - in response to Message 2024857.  
Last modified: 25 Dec 2019, 13:26:44 UTC

Any Stock hosts getting work? Is Resend lost tasks still on?


My Windows box is purely stock and appears to have all its tasks reported and a full set of caches.

Excepting Beta mind you. Which is still down.

I will see if my Linux box has "finally" gotten all its gpu tasks cleared and hopefully the "reports" done.

And turn back on the Anonymous platform.

Right now I can complain about how slow Stock is under Linux.

Then I can start complaining about no tasks :)

--Edit--
The Server immediately dumped a bunch of gpu tasks on me.
When I started up my Weekend Warrior boxes they too immediately got tasks.

May be a bit short on Cpu tasks though.

Tom
A proud member of the OFA (Old Farts Association).
ID: 2024918 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 2024930 - Posted: 25 Dec 2019, 15:22:52 UTC
Last modified: 25 Dec 2019, 15:45:36 UTC

Hey, just for Future Reference, Anonymous platform doesn't work on the BETA Server, I discovered that a few months ago. All you receive is Server Errors when trying to download work. You can contact the Server when running No New Tasks, but, that won't get you any work. I suspect if that code is ever moved to Main, All Hell will break loose ;-)

BTW, I finally found out how to keep the Main Server from sending all those OpenCL tasks when trying to Spoof the CUDA Special App as Stock. Just add <no_opencl>1</no_opencl> to cc_config.xml, then restart BOINC, and then it will only send tasks for CUDA. It means you also won't run any APs, but, who needs APs when you have the Special App? Now I just have to figure out what to do with all these Ghosts that used to be OpenCL tasks. I kinda thought Resend Lost Tasks would be running and take care of it..oh well.
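
For reference, a minimal cc_config.xml carrying that option might look like the sketch below. It assumes the option goes in the `<options>` section alongside other client options and that nothing else is set in the file; if you already have a cc_config.xml, add the one line to your existing `<options>` block rather than replacing the file.

```xml
<cc_config>
  <options>
    <!-- Tell the client not to use OpenCL, so only CUDA tasks are requested -->
    <no_opencl>1</no_opencl>
  </options>
</cc_config>
```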

Merry Christmas to You too!
ID: 2024930 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2024932 - Posted: 25 Dec 2019, 15:41:05 UTC

Woke up this morning to a Xmas present of tasks downloading and running. Went to fire up the hosts I had turned off for the weekend. Replica is still behind and not showing the work yet. The validated tasks and the pendings significantly dropped from before the Grand Mal Outrage event.

Thanks to the staff for taking my advice of rolling back the server software.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2024932 · Report as offensive
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 2024934 - Posted: 25 Dec 2019, 15:46:43 UTC - in response to Message 2024930.  

Hey, just for Future Reference, Anonymous platform doesn't work on the BETA Server, I discovered that a few months ago. All you receive is Server Errors when trying to download work. You can contact the Server when running No New Tasks, but, that won't get you any work. I suspect if that code is ever moved to Main, All Hell will break loose ;-)

BTW, I finally found out how to keep the Main Server from sending all those OpenCL tasks when trying to Spoof the CUDA Special App as Stock. Just add <no_opencl>1</no_opencl> to cc_config.xml, then restart BOINC, and then it will only send tasks for CUDA. It means you also won't run any APs, but, who needs APs when you have the Special App? Now I just have to figure out what to do with all these Ghosts that used to be OpenCL tasks. I kinda thought Resend Lost Tasks would be running and take care of it..oh well.

Merry Christmas to You too!


Neat! So now you aren't actually running Anonymous anymore since you found the work around?

Tom
A proud member of the OFA (Old Farts Association).
ID: 2024934 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 2024939 - Posted: 25 Dec 2019, 16:14:15 UTC - in response to Message 2024934.  

I'm going to keep testing the App as Stock for now. I did find that boinc-master 7.5 doesn't work on the Mac as Stock, even though it seems to work on Linux. I might recompile the Apps with boinc-master 7.11 soon.
ID: 2024939 · Report as offensive
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2024941 - Posted: 25 Dec 2019, 16:38:01 UTC

My anon boxes have work and everything seems to be fine but the server status page says the replica database is again being left behind at a steady rate, which I guess will predict trouble in the future...
ID: 2024941 · Report as offensive
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2024943 - Posted: 25 Dec 2019, 16:51:32 UTC - in response to Message 2024941.  

My anon boxes have work and everything seems to be fine but the server status page says the replica database is again being left behind at a steady rate, which I guess will predict trouble in the future...


I think it will grow to some value and come back down as things stabilize, just like what happened after the last server change.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2024943 · Report as offensive
Profile Unixchick Project Donor
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 2024954 - Posted: 25 Dec 2019, 18:04:37 UTC

No panic.

The replica lag time is decreasing, and the RTS is increasing.

Happy Holidays! Thank you Seti crew for working so long yesterday to fix the issues!
ID: 2024954 · Report as offensive


 