The Server Issues / Outages Thread - Panic Mode On! (118)

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13903
Credit: 208,696,464
RAC: 304
Australia
Message 2024911 - Posted: 25 Dec 2019, 11:08:34 UTC
Last modified: 25 Dec 2019, 11:09:23 UTC

Hmm, managed to pick up some work (new work that is, not resends) in the last 30min or so, Ready-to-send showing 1200, but splitter output has been reported as 0 for about an hour now.

Will see how things are come the morning.
Night all.
Grant
Darwin NT
ID: 2024911
NorthCup
Joined: 6 Jun 99
Posts: 108
Credit: 50,093,984
RAC: 5
Germany
Message 2024912 - Posted: 25 Dec 2019, 11:40:17 UTC
Last modified: 25 Dec 2019, 11:42:52 UTC

The slots are full and all Linux anonymous-platform systems are operational. Thanks again to the SETI staff, and happy holidays to all. Best regards, Klaus
ID: 2024912
petri33
Volunteer tester
Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 2024913 - Posted: 25 Dec 2019, 11:56:37 UTC - in response to Message 2024606.  
Last modified: 25 Dec 2019, 11:57:44 UTC

OK, I've applied Retvari's workaround on my Linux Mint host running special sauce and the spoofed client. There's good news and bad news.

Good news (1): It's running
Good news (2): It's picking up the command line. -nobs isn't specifically acknowledged, but I've got an -unroll in there too, and that's reported.

Bad news: It's not received the instruction to run on device 0 / device 1

The device number is being passed correctly in init_data.xml, but the special sauce app isn't looking in the right place. I see Petri's app_info.xml doesn't contain an API version specifier, and it worked before, so I assume it's only listening for a command line. That's ancient, and should be corrected. Petri needs to compile against a newer BOINC API library and make the appropriate coding adjustments.

But at least my machine is running 2-up on device 0 - so at about half normal speed - and some warmth is creeping back into my workroom.


Hi,
anyone who can compile the special app can make it read the GPU number from init_data.xml.

In main.cpp, define BOINC_MAJOR_VERSION as 8 just before the version check:
...
    // Patch for Cuda device selection, Care of Juha Sointusalo.
    // Deals with boinc api 7.5 onwards breaking change (mid major version),
    // from standard use of command line to field not present on preferred earlier versions.
    // init_data.xml has e.g. <gpu_device_num>0</gpu_device_num>
#define BOINC_MAJOR_VERSION 8 
#if (BOINC_MAJOR_VERSION >= 8) || ((BOINC_MAJOR_VERSION == 7) && (BOINC_MINOR_VERSION >= 5))
    if (app_init_data.gpu_device_num >= 0) {
      gCUDADevPref = app_init_data.gpu_device_num + 1;
      fprintf(stderr, "app_init.xml specified GPU %d\r\n", app_init_data.gpu_device_num);
    }
#endif
...


I tested this, and stderr now reports:
app_init.xml specified GPU 0
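
For anyone who wants to see the selection logic on its own, here is a minimal standalone sketch. It is not the special app's code. `InitDataSketch`, `device_from_command_line` and `choose_device_pref` are hypothetical names, the struct stands in for the single `gpu_device_num` field used in the patch, and the `--device N` flag spelling and the meaning of 0 as "no preference" are assumptions made for illustration.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for the one init_data.xml field the patch reads.
// A negative value is taken to mean "the client did not assign a device".
struct InitDataSketch {
    int gpu_device_num;
};

// Hypothetical helper for the older command-line route.
// Assumes the device arrives as "--device N" (or "-device N"); returns -1 if absent.
int device_from_command_line(int argc, const char* const argv[]) {
    for (int i = 1; i + 1 < argc; ++i) {
        if (std::strcmp(argv[i], "--device") == 0 ||
            std::strcmp(argv[i], "-device") == 0) {
            return std::atoi(argv[i + 1]);
        }
    }
    return -1;
}

// Returns a 1-based device preference, mirroring the "+ 1" in the patch.
// 0 is assumed here to mean "no preference".
int choose_device_pref(const InitDataSketch& init, int argc,
                       const char* const argv[]) {
    if (init.gpu_device_num >= 0) {
        return init.gpu_device_num + 1;  // value from init_data.xml wins
    }
    const int cli = device_from_command_line(argc, argv);
    return cli >= 0 ? cli + 1 : 0;       // fall back to the command line
}
```

With `gpu_device_num` set to 0 the function returns 1 regardless of the command line. With a negative `gpu_device_num` and `--device 1` on the command line it returns 2.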
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 2024913
Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34451
Credit: 79,922,639
RAC: 80
Germany
Message 2024914 - Posted: 25 Dec 2019, 12:07:41 UTC

My caches are full again as well.
With each crime and every kindness we birth our future.
ID: 2024914
Wiggo
Joined: 24 Jan 00
Posts: 37560
Credit: 261,360,520
RAC: 489
Australia
Message 2024915 - Posted: 25 Dec 2019, 12:26:36 UTC
Last modified: 25 Dec 2019, 12:27:58 UTC

My 2 cleaned rigs had their caches full again 45 mins before the day after Xmas starts here, and I'm off to bed before that day arrives.

Enjoy the festive season everyone.

Cheers.
ID: 2024915
betreger Project Donor
Joined: 29 Jun 99
Posts: 11449
Credit: 29,581,041
RAC: 66
United States
Message 2024917 - Posted: 25 Dec 2019, 12:44:24 UTC

Both boxes are full, Merry Christmas to all.
ID: 2024917
Tom M
Volunteer tester
Joined: 28 Nov 02
Posts: 5126
Credit: 276,046,078
RAC: 462
Message 2024918 - Posted: 25 Dec 2019, 12:49:05 UTC - in response to Message 2024857.  
Last modified: 25 Dec 2019, 13:26:44 UTC

Any Stock hosts getting work? Is Resend lost tasks still on?


My Windows box is purely stock and appears to have all its tasks reported and a full set of caches.

Excepting Beta mind you. Which is still down.

I will see if my Linux box has "finally" gotten all its gpu tasks cleared and hopefully the "reports" done.

And turn back on the Anonymous platform.

Right now I can complain about how slow Stock is under Linux.

Then I can start complaining about no tasks :)

--Edit--
The Server immediately dumped a bunch of gpu tasks on me.
When I started up my Weekend Warrior boxes they too immediately got tasks.

May be a bit short on Cpu tasks though.

Tom
A proud member of the OFA (Old Farts Association).
ID: 2024918
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 2024930 - Posted: 25 Dec 2019, 15:22:52 UTC
Last modified: 25 Dec 2019, 15:45:36 UTC

Hey, just for Future Reference, Anonymous platform doesn't work on the BETA Server, I discovered that a few months ago. All you receive is Server Errors when trying to download work. You can contact the Server when running No New Tasks, but, that won't get you any work. I suspect if that code is ever moved to Main, All Hell will break loose ;-)

BTW, I finally found out how to keep the Main Server from sending all those OpenCL tasks when trying to Spoof the CUDA Special App as Stock. Just add <no_opencl>1</no_opencl> to cc_config.xml, then restart BOINC, and then it will only send tasks for CUDA. It means you also won't run any APs, but, who needs APs when you have the Special App? Now I just have to figure out what to do with all these Ghosts that used to be OpenCL tasks. I kinda thought Resend Lost Tasks would be running and take care of it... oh well.
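
For reference, a minimal sketch of where that line sits, assuming an otherwise empty cc_config.xml and assuming the flag belongs in the <options> section, as it does in the BOINC client configuration as I understand it. An existing file would keep its other entries and only gain the <no_opencl> line.

```xml
<cc_config>
  <options>
    <no_opencl>1</no_opencl>
  </options>
</cc_config>
```

Setting the value back to 0, or removing the line, and restarting BOINC should let the client report OpenCL again.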

Merry Christmas to You too!
ID: 2024930
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2024932 - Posted: 25 Dec 2019, 15:41:05 UTC

Woke up this morning to a Xmas present of tasks downloading and running. Went to fire up the hosts I had turned off for the weekend. Replica is still behind and not showing the work yet. The validated tasks and the pendings significantly dropped from before the Grand Mal Outrage event.

Thanks to the staff for taking my advice of rolling back the server software.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2024932
Tom M
Volunteer tester
Joined: 28 Nov 02
Posts: 5126
Credit: 276,046,078
RAC: 462
Message 2024934 - Posted: 25 Dec 2019, 15:46:43 UTC - in response to Message 2024930.  

Hey, just for Future Reference, Anonymous platform doesn't work on the BETA Server, I discovered that a few months ago. All you receive is Server Errors when trying to download work. You can contact the Server when running No New Tasks, but, that won't get you any work. I suspect if that code is ever moved to Main, All Hell will break loose ;-)

BTW, I finally found out how to keep the Main Server from sending all those OpenCL tasks when trying to Spoof the CUDA Special App as Stock. Just add <no_opencl>1</no_opencl> to cc_config.xml, then restart BOINC, and then it will only send tasks for CUDA. It means you also won't run any APs, but, who needs APs when you have the Special App? Now I just have to figure out what to do with all these Ghosts that used to be OpenCL tasks. I kinda thought Resend Lost Tasks would be running and take care of it... oh well.

Merry Christmas to You too!


Neat! So now you aren't actually running Anonymous anymore since you found the work around?

Tom
A proud member of the OFA (Old Farts Association).
ID: 2024934
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 2024939 - Posted: 25 Dec 2019, 16:14:15 UTC - in response to Message 2024934.  

I'm going to keep testing the App as Stock for now. I did find that boinc-master 7.5 doesn't work on the Mac as Stock, even though it seems to work on Linux. I might recompile the Apps with boinc-master 7.11 soon.
ID: 2024939
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2024941 - Posted: 25 Dec 2019, 16:38:01 UTC

My anon boxes have work and everything seems to be fine but the server status page says the replica database is again being left behind at a steady rate, which I guess will predict trouble in the future...
ID: 2024941
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2024943 - Posted: 25 Dec 2019, 16:51:32 UTC - in response to Message 2024941.  

My anon boxes have work and everything seems to be fine but the server status page says the replica database is again being left behind at a steady rate, which I guess will predict trouble in the future...


I think it will grow to some value and come back down as things stabilize, just like what happened after the last server change.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2024943
Unixchick Project Donor
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 2024954 - Posted: 25 Dec 2019, 18:04:37 UTC

No panic.

The replica lag time is decreasing, and the RTS is increasing.

Happy Holidays! Thank you Seti crew for working so long yesterday to fix the issues!
ID: 2024954
Freewill Project Donor
Joined: 19 May 99
Posts: 766
Credit: 354,398,348
RAC: 11,693
United States
Message 2024959 - Posted: 25 Dec 2019, 18:50:58 UTC

I have two of my rigs back on the special sauce app, a cuda90 version. One of them has both GPU and CPU tasks (T5810-Ubuntu) and the other only has GPU after quite a while (T3500-Ubuntu). All I did was restore the backed up setiathome.berkeley.edu folder. Before the server software issues, both were getting both GPU and CPU tasks. Any idea what I should check? Is it just not long enough for the server to make some CPU tasks?

Thanks and happy holidays!

Roger
ID: 2024959
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2024960 - Posted: 25 Dec 2019, 18:53:46 UTC

I have all of my systems back on Anonymous and working great.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2024960
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2024962 - Posted: 25 Dec 2019, 19:06:05 UTC - in response to Message 2024959.  

I have two of my rigs back on the special sauce app, a cuda90 version. One of them has both GPU and CPU tasks (T5810-Ubuntu) and the other only has GPU after quite a while (T3500-Ubuntu). All I did was restore the backed up setiathome.berkeley.edu folder. Before the server software issues, both were getting both GPU and CPU tasks. Any idea what I should check? Is it just not long enough for the server to make some CPU tasks?

Thanks and happy holidays!

Roger

I always find on my hosts that the GPU caches are refilled first. It has something to do with the APR of the apps and what the scheduler thinks is fastest. So wait until your GPU cache is fully filled before panicking about your CPU cache not getting filled.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2024962
Freewill Project Donor
Joined: 19 May 99
Posts: 766
Credit: 354,398,348
RAC: 11,693
United States
Message 2024963 - Posted: 25 Dec 2019, 19:08:54 UTC - in response to Message 2024962.  

Thanks, Keith. Good to know. Neither looks quite full at the moment, but the 5810 may have been earlier.
ID: 2024963
Freewill Project Donor
Joined: 19 May 99
Posts: 766
Credit: 354,398,348
RAC: 11,693
United States
Message 2024967 - Posted: 25 Dec 2019, 19:20:51 UTC

Got SETI64-Ubuntu, my last and fastest rig, back on special sauce. :) Good to see it tearing through those tasks. All back now except for needing a few GPU tasks, but that's minor.

Thanks to the SETI team for getting the servers back in order and my fellow Setizens for helping cope with and spoof stock. Back to Christmas feasting now...
ID: 2024967
Freewill Project Donor
Joined: 19 May 99
Posts: 766
Credit: 354,398,348
RAC: 11,693
United States
Message 2024978 - Posted: 25 Dec 2019, 20:35:36 UTC - in response to Message 2024962.  

I have two of my rigs back on the special sauce app, a cuda90 version. One of them has both GPU and CPU tasks (T5810-Ubuntu) and the other only has GPU after quite a while (T3500-Ubuntu). All I did was restore the backed up setiathome.berkeley.edu folder. Before the server software issues, both were getting both GPU and CPU tasks. Any idea what I should check? Is it just not long enough for the server to make some CPU tasks?

Thanks and happy holidays!

Roger

I always find on my hosts that the GPU caches are refilled first. It has something to do with the APR of the apps and what the scheduler thinks is fastest. So wait until your GPU cache is fully filled before panicking about your CPU cache not getting filled.


I just noticed the T3500 is only asking for GPU tasks. The preferences for its location should have both GPU and CPU tasks. Any other place where a variable could override that?
ID: 2024978
©2025 University of California
 