The Server Issues / Outages Thread - Panic Mode On! (118)


Profile Freewill Project Donor
Joined: 19 May 99
Posts: 766
Credit: 354,398,348
RAC: 11,693
United States
Message 2024959 - Posted: 25 Dec 2019, 18:50:58 UTC

I have two of my rigs back on the special sauce app, a cuda90 version. One of them has both GPU and CPU tasks (T5810-Ubuntu) and the other only has GPU tasks after quite a while (T3500-Ubuntu). All I did was restore the backed-up setiathome.berkeley.edu folder. Before the server software issues, both were getting both GPU and CPU tasks. Any idea what I should check? Is it just not long enough for the server to make some CPU tasks?

Thanks and happy holidays!

Roger
ID: 2024959 · Report as offensive
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2024960 - Posted: 25 Dec 2019, 18:53:46 UTC

I have all of my systems back on Anonymous and working great.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2024960 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2024962 - Posted: 25 Dec 2019, 19:06:05 UTC - in response to Message 2024959.  

I have two of my rigs back on the special sauce app, a cuda90 version. One of them has both GPU and CPU tasks (T5810-Ubuntu) and the other only has GPU tasks after quite a while (T3500-Ubuntu). All I did was restore the backed-up setiathome.berkeley.edu folder. Before the server software issues, both were getting both GPU and CPU tasks. Any idea what I should check? Is it just not long enough for the server to make some CPU tasks?

Thanks and happy holidays!

Roger

I always find on my hosts that the gpu caches are refilled first. It has something to do with the APR of the apps and what the scheduler thinks is fastest. So wait until your gpu cache is completely full before panicking that your cpu cache is still not getting filled.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2024962 · Report as offensive
Profile Freewill Project Donor
Joined: 19 May 99
Posts: 766
Credit: 354,398,348
RAC: 11,693
United States
Message 2024963 - Posted: 25 Dec 2019, 19:08:54 UTC - in response to Message 2024962.  

Thanks, Keith. Good to know. Neither looks quite full at the moment, but the 5810 may have been earlier.
ID: 2024963 · Report as offensive
Profile Freewill Project Donor
Joined: 19 May 99
Posts: 766
Credit: 354,398,348
RAC: 11,693
United States
Message 2024967 - Posted: 25 Dec 2019, 19:20:51 UTC

Got SETI64-Ubuntu, my last and fastest rig, back on special sauce. :) Good to see it tearing through those tasks. All back now except for needing a few GPU tasks, but that's minor.

Thanks to the SETI team for getting the servers back in order and my fellow Setizens for helping cope with and spoof stock. Back to Christmas feasting now...
ID: 2024967 · Report as offensive
Profile Freewill Project Donor
Joined: 19 May 99
Posts: 766
Credit: 354,398,348
RAC: 11,693
United States
Message 2024978 - Posted: 25 Dec 2019, 20:35:36 UTC - in response to Message 2024962.  

I have two of my rigs back on the special sauce app, a cuda90 version. One of them has both GPU and CPU tasks (T5810-Ubuntu) and the other only has GPU tasks after quite a while (T3500-Ubuntu). All I did was restore the backed-up setiathome.berkeley.edu folder. Before the server software issues, both were getting both GPU and CPU tasks. Any idea what I should check? Is it just not long enough for the server to make some CPU tasks?

Thanks and happy holidays!

Roger

I always find on my hosts that the gpu caches are refilled first. It has something to do with the APR of the apps and what the scheduler thinks is fastest. So wait until your gpu cache is completely full before panicking that your cpu cache is still not getting filled.


I just noticed the T3500 is only asking for GPU tasks. The preferences for its location should have both GPU and CPU tasks. Any other place where a variable could override that?
ID: 2024978 · Report as offensive
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2024979 - Posted: 25 Dec 2019, 20:48:39 UTC - in response to Message 2024978.  

I just noticed the T3500 is only asking for GPU tasks. The preferences for its location should have both GPU and CPU tasks. Any other place where a variable could override that?
Missing or broken app_info.xml entry for the CPU app?
ID: 2024979 · Report as offensive
Profile Wiggo
Joined: 24 Jan 00
Posts: 35004
Credit: 261,360,520
RAC: 489
Australia
Message 2024981 - Posted: 25 Dec 2019, 21:05:58 UTC

I just noticed the T3500 is only asking for GPU tasks. The preferences for its location should have both GPU and CPU tasks. Any other place where a variable could override that?
I have no idea what your "T3500" is, but have you checked your local SETI properties? Sometimes you can be on a back off of up to 4 days for requesting CPU work.

If you are, then once you close the window, do a manual update and that will reset the back off.

Cheers.
ID: 2024981 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2024982 - Posted: 25 Dec 2019, 21:21:57 UTC - in response to Message 2024959.  

I have two of my rigs back on the special sauce app, a cuda90 version. One of them has both GPU and CPU tasks (T5810-Ubuntu) and the other only has GPU tasks after quite a while (T3500-Ubuntu). All I did was restore the backed-up setiathome.berkeley.edu folder. Before the server software issues, both were getting both GPU and CPU tasks. Any idea what I should check? Is it just not long enough for the server to make some CPU tasks?

Thanks and happy holidays!

Roger


. . When the splitters create 'results' they are neither CPU nor GPU tasks, they are just tasks. It is the scheduler that 'decides' to make them one or the other when they are allocated to a host (which is deceptive when you get a message saying there are ATI and Intel tasks available but none for your Nvidia cards???) So check your event log as Richards suggests and make sure you are requesting CPU work.

Stephen

. .
ID: 2024982 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2024985 - Posted: 25 Dec 2019, 21:41:54 UTC - in response to Message 2024979.  

I would recommend always using the sched_op_debug logging flag for the Event Log. That way at each scheduler connection you will get a printout of how many seconds of cpu work and gpu work you are requesting. If you don't ask for any seconds of cpu work, then you need to figure out why. Probably a configuration problem where you turned off the cpu for the host or the location venue.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2024985 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13755
Credit: 208,696,464
RAC: 304
Australia
Message 2024986 - Posted: 25 Dec 2019, 21:51:42 UTC

The Splitters are struggling to meet demand, but so far they are. Now if they could just sort out the WU validation/deletion/assimilation issues. And the new Scheduler code issues.
A few things there for the to do list in the new year.
Grant
Darwin NT
ID: 2024986 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13755
Credit: 208,696,464
RAC: 304
Australia
Message 2024990 - Posted: 25 Dec 2019, 21:59:22 UTC - in response to Message 2024930.  

BTW, I finally found out how to keep the Main Server from sending all those OpenCL tasks when trying to Spoof the CUDA Special App as Stock. Just add <no_opencl>1</no_opencl> to cc_config.xml, then restart BOINC, and then it will only send tasks for CUDA.
So with luck using <no_cuda>1</no_cuda> should stop any CUDA42 or CUDA50 applications being used if trying to run stock under Windows, leaving just SoG which is OpenCL (and AP for those that process them).
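For anyone who would rather script the cc_config.xml change than edit the file by hand, here is a minimal sketch. It assumes the usual layout of a `<cc_config>` root with an `<options>` section; `set_option` is a made-up helper name and the sample config text is invented for illustration. It only demonstrates the `<no_opencl>` setting mentioned in the quote. Whether `<no_cuda>` is honoured by the client is not confirmed here.

```python
import xml.etree.ElementTree as ET


def set_option(cc_config_xml, name, value):
    """Return cc_config XML text with <options><name>value</name></options> set.

    Hypothetical helper: creates the <options> section or the element if missing,
    otherwise overwrites the existing value.
    """
    root = ET.fromstring(cc_config_xml)
    options = root.find("options")
    if options is None:
        options = ET.SubElement(root, "options")
    element = options.find(name)
    if element is None:
        element = ET.SubElement(options, name)
    element.text = str(value)
    return ET.tostring(root, encoding="unicode")


# Invented sample config that has log flags but no <options> section yet.
existing = "<cc_config><log_flags><sched_op_debug>1</sched_op_debug></log_flags></cc_config>"
updated = set_option(existing, "no_opencl", 1)
print(updated)
```

The helper works on text rather than on the file itself, so the result can be inspected before it is written back. As the quoted post says, BOINC has to be restarted for the change to take effect.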
Grant
Darwin NT
ID: 2024990 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13755
Credit: 208,696,464
RAC: 304
Australia
Message 2024997 - Posted: 25 Dec 2019, 22:52:43 UTC - in response to Message 2024911.  

Hmm, managed to pick up some work (new work that is, not resends) in the last 30min or so, Ready-to-send showing 1200, but splitter output has been reported as 0 for about an hour now.
This was annoying me while I was trying to sleep last night.
The only thing that comes to mind is that the splitters were spluttering over that hour or so- spitting out a good amount of work now & then, but never when they were queried for the Server Status page numbers. Hence work was being produced, even though the splitter output was being reported as 0.
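A toy illustration of that guess, with entirely made-up numbers: if output comes in short bursts and the status page only samples single instants, every sample can read 0 even though work was produced. The burst minutes, burst size and 10-minute polling interval below are assumptions for the sake of the example, not measurements from the servers.

```python
# Hypothetical per-minute splitter output over one hour: short bursts, long idle gaps.
output_per_minute = [0] * 60
for burst_minute in (7, 23, 41, 52):
    output_per_minute[burst_minute] = 250  # made-up burst size

# Hypothetical status-page polls every 10 minutes, each seeing only that one minute.
polled = [output_per_minute[minute] for minute in range(0, 60, 10)]

print(sum(output_per_minute))  # total work produced during the hour
print(polled)                  # what the polls would have reported
```

With these numbers the hour's total is nonzero while every poll lands in an idle minute.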
Grant
Darwin NT
ID: 2024997 · Report as offensive
Profile Freewill Project Donor
Joined: 19 May 99
Posts: 766
Credit: 354,398,348
RAC: 11,693
United States
Message 2025012 - Posted: 26 Dec 2019, 0:42:02 UTC - in response to Message 2024985.  

I would recommend always using the sched_op_debug logging flag for the Event Log. That way at each scheduler connection you will get a printout of how many seconds of cpu work and gpu work you are requesting. If you don't ask for any seconds of cpu work, then you need to figure out why. Probably a configuration problem where you turned off the cpu for the host or the location venue.

Thanks for the suggestions, everyone. I do have that debugging flag set. It shows no CPU work request:
Wed 25 Dec 2019 07:24:59 PM EST | SETI@home | [sched_op] Starting scheduler request
Wed 25 Dec 2019 07:24:59 PM EST | SETI@home | Sending scheduler request: To fetch work.
Wed 25 Dec 2019 07:24:59 PM EST | SETI@home | Reporting 15 completed tasks
Wed 25 Dec 2019 07:24:59 PM EST | SETI@home | Requesting new tasks for NVIDIA GPU
Wed 25 Dec 2019 07:24:59 PM EST | SETI@home | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
Wed 25 Dec 2019 07:24:59 PM EST | SETI@home | [sched_op] NVIDIA GPU work request: 4629217.31 seconds; 0.00 devices
Wed 25 Dec 2019 07:25:02 PM EST | SETI@home | Scheduler request completed: got 15 new tasks
Wed 25 Dec 2019 07:25:02 PM EST | SETI@home | [sched_op] Server version 709
Wed 25 Dec 2019 07:25:02 PM EST | SETI@home | Project requested delay of 303 seconds
Wed 25 Dec 2019 07:25:03 PM EST | SETI@home | [sched_op] estimated total CPU task duration: 0 seconds
Wed 25 Dec 2019 07:25:03 PM EST | SETI@home | [sched_op] estimated total NVIDIA GPU task duration: 1482 seconds
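
With a long Event Log it can be easier to pull out the `[sched_op]` work request lines programmatically. This is a rough sketch that assumes the line format shown in the log above; the function names `work_requests` and `idle_resources` are invented for the example and are not part of BOINC.

```python
import re

# Matches the "[sched_op] <resource> work request" lines as they appear in the log above.
WORK_REQUEST = re.compile(
    r"\[sched_op\] (?P<resource>.+?) work request: "
    r"(?P<seconds>[\d.]+) seconds; (?P<devices>[\d.]+) devices"
)


def work_requests(log_lines):
    """Return {resource: seconds requested}, keeping the latest request per resource."""
    requested = {}
    for line in log_lines:
        match = WORK_REQUEST.search(line)
        if match:
            requested[match.group("resource")] = float(match.group("seconds"))
    return requested


def idle_resources(log_lines):
    """List resources whose latest work request asked for zero seconds."""
    return [name for name, secs in work_requests(log_lines).items() if secs == 0.0]


# Two lines copied from the log above.
sample = [
    "Wed 25 Dec 2019 07:24:59 PM EST | SETI@home | [sched_op] CPU work request: 0.00 seconds; 0.00 devices",
    "Wed 25 Dec 2019 07:24:59 PM EST | SETI@home | [sched_op] NVIDIA GPU work request: 4629217.31 seconds; 0.00 devices",
]
print(idle_resources(sample))
```

On the two sample lines this flags only the CPU, which matches what the log shows: the client itself is not asking for CPU work, so the cause is on the host side rather than the server.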

Not sure why this is happening on this one host. It still has no CPU jobs, though it did before the server problems. I just copied back the folder, so none of that changed. I only left the dont_check_file_sizes flag set to 1. Could that do something like this? But it is also set on the PC with CPU tasks. The other host at the "home" location is getting CPU tasks, and CPU and GPU are set for that location. I checked the cc_config, app_config and app_info files. They look the same; I haven't done a diff compare, however.

Wiggo, T3500 is the PC name in case someone wanted to look it up. I have restarted the BOINC manager to no effect. I didn't quite understand what you were suggesting in your reply. Can you explain in a bit more detail?
ID: 2025012 · Report as offensive
Profile Freewill Project Donor
Joined: 19 May 99
Posts: 766
Credit: 354,398,348
RAC: 11,693
United States
Message 2025013 - Posted: 26 Dec 2019, 0:50:16 UTC - in response to Message 2025012.  

Just found it. In BOINC manager, the computing preferences were set a bit differently (since the machines have different specs). I think it was an inadvertently low "use at most % of the CPUs" limit. Should have compared that earlier!
ID: 2025013 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13755
Credit: 208,696,464
RAC: 304
Australia
Message 2025015 - Posted: 26 Dec 2019, 1:12:41 UTC

And on to next year's wish list it would also be nice if they could sort the odd upload that takes a couple of attempts to go through, along with the occasional sticking downloads.
Grant
Darwin NT
ID: 2025015 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2025021 - Posted: 26 Dec 2019, 1:51:08 UTC

Did the new hardware upload server ever make it over to Main after its dry-run at Beta?
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2025021 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13755
Credit: 208,696,464
RAC: 304
Australia
Message 2025023 - Posted: 26 Dec 2019, 1:54:27 UTC - in response to Message 2025021.  
Last modified: 26 Dec 2019, 1:56:15 UTC

Did the new hardware upload server ever make it over to Main after its dry-run at Beta?
I ask about it every so often, and... Nope. Apparently it's still there, presently out of commission.

From Eric's "Sever issues" news post
The file system containing the beta project uploads directory is having problems, so beta is down until further notice.



Would be nice to have such upgraded hardware helping out here.
*shrug*
Grant
Darwin NT
ID: 2025023 · Report as offensive
Profile Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1853
Credit: 268,616,081
RAC: 1,349
United States
Message 2025034 - Posted: 26 Dec 2019, 4:44:12 UTC
Last modified: 26 Dec 2019, 4:57:33 UTC

Well, it was fun while it lasted. Looks like either work or issues in progress.
??
ID: 2025034 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13755
Credit: 208,696,464
RAC: 304
Australia
Message 2025036 - Posted: 26 Dec 2019, 5:32:55 UTC - in response to Message 2025034.  
Last modified: 26 Dec 2019, 5:37:25 UTC

Well, it was fun while it lasted. Looks like either work or issues in progress.
??
Yeah, I got a "Project is temporarily shut down for maintenance" response. On the next contact I reported 75 WUs but only got 1 back; the one after that filled in the deficit.

Edit-
Eric's post in the RX 5700 XT thread might explain the recent brief project outage.
Grant
Darwin NT
ID: 2025036 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.