Panic Mode On (111) Server Problems?

TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1928156 - Posted: 5 Apr 2018, 19:46:20 UTC - in response to Message 1928139.  
Last modified: 5 Apr 2018, 20:06:33 UTC

Probably someone collected those

Tasks for CPU are available, but your preferences are set to not accept them
Tasks for AMD/ATI GPU are available, but your preferences are set to not accept them
Tasks for Intel GPU are available, but your preferences are set to not accept them
that were cluttering up the feeder cache. Perhaps 'Tasks for CPU' were available, but because you weren't asking for any, and your preferences (may?) allow them, there was no point in sending that message.
Hmmm, are you saying those messages are bouncing around inside the server creating problems? I've never received any of those messages, and since I run Anonymous platform I really don't have any need to change those Preferences, as I just list the Apps I'm using in the app_info. I'm running 2 CPU tasks and 3 nVidia GPUs on that machine, and it asks for GPU tasks much more often than CPU tasks. It's also back to not being sent much work in the last hour or so. It has around 50 CPU tasks onboard, meaning right now it's down around 70 GPU tasks. It does receive some tasks every so often, but not enough to replace the completed tasks:

Thu Apr 5 14:37:54 2018 | SETI@home | [sched_op] Starting scheduler request
Thu Apr 5 14:37:59 2018 | SETI@home | Sending scheduler request: To report completed tasks.
Thu Apr 5 14:37:59 2018 | SETI@home | Reporting 5 completed tasks
Thu Apr 5 14:37:59 2018 | SETI@home | Requesting new tasks for NVIDIA GPU
Thu Apr 5 14:37:59 2018 | SETI@home | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
Thu Apr 5 14:37:59 2018 | SETI@home | [sched_op] NVIDIA GPU work request: 220997.64 seconds; 0.00 devices
Thu Apr 5 14:38:00 2018 | SETI@home | Scheduler request completed: got 0 new tasks
Thu Apr 5 14:38:00 2018 | SETI@home | No tasks sent
Thu Apr 5 14:43:08 2018 | SETI@home | [sched_op] Starting scheduler request
Thu Apr 5 14:43:13 2018 | SETI@home | Sending scheduler request: To report completed tasks.
Thu Apr 5 14:43:13 2018 | SETI@home | Reporting 7 completed tasks
Thu Apr 5 14:43:13 2018 | SETI@home | Requesting new tasks for NVIDIA GPU
Thu Apr 5 14:43:13 2018 | SETI@home | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
Thu Apr 5 14:43:13 2018 | SETI@home | [sched_op] NVIDIA GPU work request: 222353.24 seconds; 0.00 devices
Thu Apr 5 14:43:14 2018 | SETI@home | Scheduler request completed: got 0 new tasks
Thu Apr 5 14:43:14 2018 | SETI@home | No tasks sent
Thu Apr 5 14:48:18 2018 | SETI@home | [sched_op] Starting scheduler request
Thu Apr 5 14:48:23 2018 | SETI@home | Sending scheduler request: To report completed tasks.
Thu Apr 5 14:48:23 2018 | SETI@home | Reporting 5 completed tasks
Thu Apr 5 14:48:23 2018 | SETI@home | Requesting new tasks for NVIDIA GPU
Thu Apr 5 14:48:23 2018 | SETI@home | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
Thu Apr 5 14:48:23 2018 | SETI@home | [sched_op] NVIDIA GPU work request: 223388.41 seconds; 0.00 devices
Thu Apr 5 14:48:24 2018 | SETI@home | Scheduler request completed: got 0 new tasks
Thu Apr 5 14:48:24 2018 | SETI@home | No tasks sent
Thu Apr 5 14:53:27 2018 | SETI@home | [sched_op] Starting scheduler request
Thu Apr 5 14:53:32 2018 | SETI@home | Sending scheduler request: To report completed tasks.
Thu Apr 5 14:53:32 2018 | SETI@home | Reporting 7 completed tasks
Thu Apr 5 14:53:32 2018 | SETI@home | Requesting new tasks for NVIDIA GPU
Thu Apr 5 14:53:32 2018 | SETI@home | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
Thu Apr 5 14:53:32 2018 | SETI@home | [sched_op] NVIDIA GPU work request: 224525.77 seconds; 0.00 devices
Thu Apr 5 14:53:33 2018 | SETI@home | Scheduler request completed: got 0 new tasks
Thu Apr 5 14:53:33 2018 | SETI@home | No tasks sent
Thu Apr 5 14:58:36 2018 | SETI@home | [sched_op] Starting scheduler request
Thu Apr 5 14:58:41 2018 | SETI@home | Sending scheduler request: To report completed tasks.
Thu Apr 5 14:58:41 2018 | SETI@home | Reporting 10 completed tasks
Thu Apr 5 14:58:41 2018 | SETI@home | Requesting new tasks for NVIDIA GPU
Thu Apr 5 14:58:41 2018 | SETI@home | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
Thu Apr 5 14:58:41 2018 | SETI@home | [sched_op] NVIDIA GPU work request: 225665.87 seconds; 0.00 devices
Thu Apr 5 14:58:42 2018 | SETI@home | Scheduler request completed: got 0 new tasks
Thu Apr 5 14:58:42 2018 | SETI@home | No tasks sent
Thu Apr 5 15:03:46 2018 | SETI@home | [sched_op] Starting scheduler request
Thu Apr 5 15:03:51 2018 | SETI@home | Sending scheduler request: To report completed tasks.
Thu Apr 5 15:03:51 2018 | SETI@home | Reporting 5 completed tasks
Thu Apr 5 15:03:51 2018 | SETI@home | Requesting new tasks for NVIDIA GPU
Thu Apr 5 15:03:51 2018 | SETI@home | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
Thu Apr 5 15:03:51 2018 | SETI@home | [sched_op] NVIDIA GPU work request: 226879.65 seconds; 0.00 devices
Thu Apr 5 15:03:52 2018 | SETI@home | Scheduler request completed: got 0 new tasks
Thu Apr 5 15:03:52 2018 | SETI@home | No tasks sent
Thu Apr 5 15:09:00 2018 | SETI@home | [sched_op] Starting scheduler request
Thu Apr 5 15:09:05 2018 | SETI@home | Sending scheduler request: To report completed tasks.
Thu Apr 5 15:09:05 2018 | SETI@home | Reporting 5 completed tasks
Thu Apr 5 15:09:05 2018 | SETI@home | Requesting new tasks for NVIDIA GPU
Thu Apr 5 15:09:05 2018 | SETI@home | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
Thu Apr 5 15:09:05 2018 | SETI@home | [sched_op] NVIDIA GPU work request: 228181.81 seconds; 0.00 devices
Thu Apr 5 15:09:06 2018 | SETI@home | Scheduler request completed: got 0 new tasks
Thu Apr 5 15:09:06 2018 | SETI@home | No tasks sent
Thu Apr 5 15:14:19 2018 | SETI@home | [sched_op] Starting scheduler request
Thu Apr 5 15:14:24 2018 | SETI@home | Sending scheduler request: To report completed tasks.
Thu Apr 5 15:14:24 2018 | SETI@home | Reporting 6 completed tasks
Thu Apr 5 15:14:24 2018 | SETI@home | Requesting new tasks for NVIDIA GPU
Thu Apr 5 15:14:24 2018 | SETI@home | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
Thu Apr 5 15:14:24 2018 | SETI@home | [sched_op] NVIDIA GPU work request: 229321.23 seconds; 0.00 devices
Thu Apr 5 15:14:25 2018 | SETI@home | Scheduler request completed: got 0 new tasks
Thu Apr 5 15:14:25 2018 | SETI@home | No tasks sent
Thu Apr 5 15:19:29 2018 | SETI@home | [sched_op] Starting scheduler request
Thu Apr 5 15:19:34 2018 | SETI@home | Sending scheduler request: To report completed tasks.
Thu Apr 5 15:19:34 2018 | SETI@home | Reporting 6 completed tasks
Thu Apr 5 15:19:34 2018 | SETI@home | Requesting new tasks for NVIDIA GPU
Thu Apr 5 15:19:34 2018 | SETI@home | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
Thu Apr 5 15:19:34 2018 | SETI@home | [sched_op] NVIDIA GPU work request: 230393.37 seconds; 0.00 devices
Thu Apr 5 15:19:35 2018 | SETI@home | Scheduler request completed: got 0 new tasks
Thu Apr 5 15:19:35 2018 | SETI@home | No tasks sent
Thu Apr 5 15:24:38 2018 | SETI@home | [sched_op] Starting scheduler request
Thu Apr 5 15:24:43 2018 | SETI@home | Sending scheduler request: To report completed tasks.
Thu Apr 5 15:24:43 2018 | SETI@home | Reporting 8 completed tasks
Thu Apr 5 15:24:43 2018 | SETI@home | Requesting new tasks for NVIDIA GPU
Thu Apr 5 15:24:43 2018 | SETI@home | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
Thu Apr 5 15:24:43 2018 | SETI@home | [sched_op] NVIDIA GPU work request: 231670.33 seconds; 0.00 devices
Thu Apr 5 15:24:45 2018 | SETI@home | Scheduler request completed: got 4 new tasks
Thu Apr 5 15:24:45 2018 | SETI@home | [sched_op] estimated total CPU task duration: 0 seconds
Thu Apr 5 15:24:45 2018 | SETI@home | [sched_op] estimated total NVIDIA GPU task duration: 425 seconds
Thu Apr 5 15:29:52 2018 | SETI@home | [sched_op] Starting scheduler request
Thu Apr 5 15:29:57 2018 | SETI@home | Sending scheduler request: To report completed tasks.
Thu Apr 5 15:29:57 2018 | SETI@home | Reporting 4 completed tasks
Thu Apr 5 15:29:57 2018 | SETI@home | Requesting new tasks for NVIDIA GPU
Thu Apr 5 15:29:57 2018 | SETI@home | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
Thu Apr 5 15:29:57 2018 | SETI@home | [sched_op] NVIDIA GPU work request: 232270.98 seconds; 0.00 devices
Thu Apr 5 15:29:58 2018 | SETI@home | Scheduler request completed: got 0 new tasks
Thu Apr 5 15:29:58 2018 | SETI@home | No tasks sent
Thu Apr 5 15:35:06 2018 | SETI@home | [sched_op] Starting scheduler request
Thu Apr 5 15:35:11 2018 | SETI@home | Sending scheduler request: To report completed tasks.
Thu Apr 5 15:35:11 2018 | SETI@home | Reporting 6 completed tasks
Thu Apr 5 15:35:11 2018 | SETI@home | Requesting new tasks for NVIDIA GPU
Thu Apr 5 15:35:11 2018 | SETI@home | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
Thu Apr 5 15:35:11 2018 | SETI@home | [sched_op] NVIDIA GPU work request: 233431.68 seconds; 0.00 devices
Thu Apr 5 15:35:12 2018 | SETI@home | Scheduler request completed: got 0 new tasks
Thu Apr 5 15:35:12 2018 | SETI@home | No tasks sent
Thu Apr 5 15:40:20 2018 | SETI@home | [sched_op] Starting scheduler request
Thu Apr 5 15:40:25 2018 | SETI@home | Sending scheduler request: To report completed tasks.
Thu Apr 5 15:40:25 2018 | SETI@home | Reporting 7 completed tasks
Thu Apr 5 15:40:25 2018 | SETI@home | Requesting new tasks for NVIDIA GPU
Thu Apr 5 15:40:25 2018 | SETI@home | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
Thu Apr 5 15:40:25 2018 | SETI@home | [sched_op] NVIDIA GPU work request: 234552.87 seconds; 0.00 devices
Thu Apr 5 15:40:26 2018 | SETI@home | Scheduler request completed: got 0 new tasks
Thu Apr 5 15:40:26 2018 | SETI@home | No tasks sent
Thu Apr 5 15:45:34 2018 | SETI@home | [sched_op] Starting scheduler request
Thu Apr 5 15:45:39 2018 | SETI@home | Sending scheduler request: To report completed tasks.
Thu Apr 5 15:45:39 2018 | SETI@home | Reporting 6 completed tasks
Thu Apr 5 15:45:39 2018 | SETI@home | Requesting new tasks for NVIDIA GPU
Thu Apr 5 15:45:39 2018 | SETI@home | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
Thu Apr 5 15:45:39 2018 | SETI@home | [sched_op] NVIDIA GPU work request: 235729.31 seconds; 0.00 devices
Thu Apr 5 15:45:40 2018 | SETI@home | Scheduler request completed: got 0 new tasks
Thu Apr 5 15:45:40 2018 | SETI@home | No tasks sent
89 tasks only last for so long.

Oh look, the server woke again;
Thu Apr 5 16:01:11 2018 | SETI@home | [sched_op] Starting scheduler request
Thu Apr 5 16:01:16 2018 | SETI@home | Sending scheduler request: To report completed tasks.
Thu Apr 5 16:01:16 2018 | SETI@home | Reporting 5 completed tasks
Thu Apr 5 16:01:16 2018 | SETI@home | Requesting new tasks for NVIDIA GPU
Thu Apr 5 16:01:16 2018 | SETI@home | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
Thu Apr 5 16:01:16 2018 | SETI@home | [sched_op] NVIDIA GPU work request: 234417.73 seconds; 0.00 devices
Thu Apr 5 16:01:20 2018 | SETI@home | Scheduler request completed: got 76 new tasks
Thu Apr 5 16:01:20 2018 | SETI@home | [sched_op] estimated total CPU task duration: 0 seconds
Thu Apr 5 16:01:20 2018 | SETI@home | [sched_op] estimated total NVIDIA GPU task duration: 19947 seconds
So....How does it figure 76 tasks will last 332 minutes?
ID: 1928156
Keith Myers (Special Project $250 donor)
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1928158 - Posted: 5 Apr 2018, 20:01:24 UTC
Last modified: 5 Apr 2018, 20:10:31 UTC

OK, had much better luck this time. This has something to do with defining g_wreq->max_jobs_exceeded()

    if (config.max_wus_to_send) {
        g_wreq->max_jobs_per_rpc = mult * config.max_wus_to_send;
    } else {
        g_wreq->max_jobs_per_rpc = 999999;
            g_reply->set_delay(DELAY_NO_WORK_CACHE);
        }
        if (g_wreq->max_jobs_exceeded()) {
            sprintf(buf, "This computer has reached a limit on tasks in progress");

Last indexed on Jul 30, 2017 whatever indexed means
and

    bool max_jobs_exceeded() {
        if (max_jobs_on_host_exceeded) return true;
        for (int i=0; i<NPROC_TYPES; i++) {
extern WORK_REQ* g_wreq;
extern double capped_host_fpops();

static inline void add_no_work_message(const char* m) {
    g_wreq->add_no_work_message(m);

Last indexed on Feb 7

max_jobs_per_rpc can only be as high as 999999 per request. So now I have to figure out what config.max_wus_to_send is defined as, and what the multiplication by mult does to that variable.
And extern double capped_host_fpops() looks interesting too. Good ol' fpops comes into play again.

[Edit]

OK, it has to do with determining how many tasks you get based on how many GPUs are on the host.

        if (n > MAX_GPUS) n = MAX_GPUS;
        ninstances[proc_type] = n;
        effective_ngpus += n;
    }

    int mult = effective_ncpus + config.gpu_multiplier * effective_ngpus;
    if (config.max_wus_to_send) {
        g_wreq->max_jobs_per_rpc = mult * config.max_wus_to_send;
    } else {
        g_wreq->max_jobs_per_rpc = 999999;


It would seem that the number of tasks per cpu is defined somewhere else.
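For anyone following along, here is a minimal sketch of what that last snippet appears to compute. It is a standalone illustration, not the scheduler source: `SchedConfigSketch` and the free function `max_jobs_per_rpc` are made-up names, and the numbers in the note below are invented. Note that the result is a count of jobs per scheduler request, not a number of seconds of work.

```cpp
#include <cassert>

// Hypothetical stand-in for the two scheduler config fields quoted above.
struct SchedConfigSketch {
    int max_wus_to_send;  // 0 means no per-request limit is configured
    int gpu_multiplier;   // weight given to each GPU when scaling the limit
};

// Mirrors the quoted logic: scale the per-request job limit by the
// number of CPUs plus a weighted number of GPUs.
int max_jobs_per_rpc(const SchedConfigSketch& config,
                     int effective_ncpus, int effective_ngpus) {
    assert(effective_ncpus >= 0 && effective_ngpus >= 0);
    int mult = effective_ncpus + config.gpu_multiplier * effective_ngpus;
    if (config.max_wus_to_send) {
        return mult * config.max_wus_to_send;
    }
    return 999999;  // effectively unlimited, as in the quoted source
}
```

With hypothetical settings of max_wus_to_send = 10 and gpu_multiplier = 8, a host with 2 CPUs and 3 GPUs would be capped at (2 + 8 * 3) * 10 = 260 jobs per request. With max_wus_to_send unset, the cap is the 999999 seen above.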
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1928158
Keith Myers (Special Project $250 donor)
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1928172 - Posted: 5 Apr 2018, 21:07:29 UTC


So....How does it figure 76 tasks will last 332 minutes?


That's a VERY GOOD question. I have always thought the fpops_est was screwed up and didn't reflect the true computing power of GPUs, even less so for the special app.

The APR for the GPU tasks done on a special app host doesn't seem to be that wrong. So how does the scheduler mess up the estimated gpu task completion time so badly?
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1928172
Richard Haselgrove (Project Donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1928173 - Posted: 5 Apr 2018, 21:08:06 UTC - in response to Message 1928156.  

estimated total NVIDIA GPU task duration: 19947 seconds
So....How does it figure 76 tasks will last 332 minutes?
Look on the tasks tab in BOINC Manager. Each task has a "Remaining (estimated)" runtime. I'm guessing most of them are around 00:04:22.

That's usually a pretty good estimate, if all your cards run at the same speed. The server keeps track of your performance, and tweaks the figures so the estimate is realistic. If you run stock apps, the server monitors and adjusts speed (APR). If you run Anonymous Platform, the server takes your word for the speed, and tweaks the size of the task instead. Both routes end up in the same place.
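The arithmetic behind that guess can be checked directly: 19947 seconds spread over 76 tasks is a little over 262 seconds each. A small sketch, assuming the scheduler's total is simply the sum of the per-task estimates (the function names are illustrative, not BOINC's):

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>
#include <string>

// Average estimated runtime per task, given the total the scheduler reported.
double average_task_seconds(double total_seconds, int task_count) {
    assert(task_count > 0);
    return total_seconds / task_count;
}

// Format a duration in seconds as HH:MM:SS, rounded to the nearest second.
std::string format_hms(double seconds) {
    long total = std::lround(seconds);
    char buf[64];
    std::snprintf(buf, sizeof buf, "%02ld:%02ld:%02ld",
                  total / 3600, (total % 3600) / 60, total % 60);
    return std::string(buf);
}
```

`format_hms(average_task_seconds(19947.0, 76))` gives 00:04:22, and 19947 / 60 is the 332 minutes in the question.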
ID: 1928173
Keith Myers (Special Project $250 donor)
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1928183 - Posted: 5 Apr 2018, 21:23:56 UTC - in response to Message 1928173.  

estimated total NVIDIA GPU task duration: 19947 seconds
So....How does it figure 76 tasks will last 332 minutes?
Look on the tasks tab in BOINC Manager. Each task has a "Remaining (estimated)" runtime. I'm guessing most of them are around 00:04:22.

That's usually a pretty good estimate, if all your cards run at the same speed. The server keeps track of your performance, and tweaks the figures so the estimate is realistic. If you run stock apps, the server monitors and adjusts speed (APR). If you run Anonymous Platform, the server takes your word for the speed, and tweaks the size of the task instead. Both routes end up in the same place.

I don't understand, Richard. What do you mean, the server "takes your word for the speed"? I don't know how we alter or affect the APR other than through what the server calculates for us.

I don't think any of us are messing with the fpops_est value in the client_state.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1928183
Richard Haselgrove (Project Donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1928190 - Posted: 5 Apr 2018, 21:38:13 UTC - in response to Message 1928183.  

Bedtime approaches, and the board is slow - it may take me until tomorrow to re-locate that code.

But:
Speed
- in the stock case: APR in GigaFlop/sec
- in the anonymous platform case: CPU benchmark for CPU tasks, Peak flops * fiddle factor for GPUs. Fiddle factor might be 1/20th.

Task size
- in the stock case: workunit <rsc_fpops_est>, raw, from splitters
- in the anonymous platform case: <rsc_fpops_est>, tweaked by the inverse of the ratio of speed (as above - you're following me?) to APR.
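One way to read the two routes, sketched under the assumption that the client's estimate is task size divided by speed, and that the anonymous platform route rescales `<rsc_fpops_est>` so the estimate matches what APR would have given. The function names and the exact direction of the scaling are assumptions made for illustration, not taken from the server code:

```cpp
#include <cassert>

// Runtime estimate: task size (floating-point ops) divided by speed (flops).
double estimated_seconds(double fpops_est, double flops) {
    assert(flops > 0.0);
    return fpops_est / flops;
}

// Assumed anonymous-platform adjustment: keep the claimed speed, and rescale
// the task size by claimed speed / APR so the estimate still reflects APR.
double scaled_fpops_est(double raw_fpops_est, double claimed_flops,
                        double apr_flops) {
    assert(apr_flops > 0.0);
    return raw_fpops_est * (claimed_flops / apr_flops);
}
```

Under these assumptions, `estimated_seconds(raw, apr)` for the stock route and `estimated_seconds(scaled_fpops_est(raw, claimed, apr), claimed)` for the anonymous platform route come out the same, which is the sense in which both routes end up in the same place.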
ID: 1928190
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1928191 - Posted: 5 Apr 2018, 21:38:25 UTC - in response to Message 1928173.  

If you look at the posted logs you can see it's reporting 5 to 9 completed tasks every 5 minutes. 5 x 12 = 60 tasks in an hour. I just received a load of shorties estimated to take 85 seconds apiece. 85 seconds. 76 won't last long. The longest estimate on the tasks page is 3:47, the shortest is 1:25; that's how you complete well over 1000 tasks a day. Then there are all those that finish in about 5 seconds. We need more tasks.
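Putting numbers on that: a rough sketch, assuming reports arrive every 300 seconds (the logs show a little over five minutes between requests) and a steady 5 tasks per report. The helper names are made up for the example:

```cpp
#include <cassert>

// Tasks completed per day, given tasks reported per scheduler request
// and the interval between requests in seconds.
double tasks_per_day(double tasks_per_report, double report_interval_seconds) {
    assert(report_interval_seconds > 0.0);
    return tasks_per_report * (86400.0 / report_interval_seconds);
}

// Minutes until a cache of tasks is used up at that completion rate.
double cache_minutes(int cached_tasks, double tasks_per_report,
                     double report_interval_seconds) {
    assert(tasks_per_report > 0.0);
    return cached_tasks * report_interval_seconds / tasks_per_report / 60.0;
}
```

At 5 tasks per 300 seconds, `tasks_per_day` gives 1440 tasks a day, and `cache_minutes(76, 5, 300)` says 76 tasks are gone in 76 minutes.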
ID: 1928191
Richard Haselgrove (Project Donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1928193 - Posted: 5 Apr 2018, 21:40:32 UTC - in response to Message 1928191.  

That depends whether the project exists to provide kibble to you, or whether you exist to do science for the project.
ID: 1928193
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1928197 - Posted: 5 Apr 2018, 21:48:26 UTC - in response to Message 1928193.  

I'm running Low end hardware. I'll let what you posted sink in to the people running High end hardware.
ID: 1928197
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1928203 - Posted: 5 Apr 2018, 22:02:42 UTC - in response to Message 1928172.  

So how does the scheduler mess up the estimated gpu task completion time so badly?
Easy. Just take the longest estimate and consider ONE device. That would be a little closer to reality.
ID: 1928203
Keith Myers (Special Project $250 donor)
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1928220 - Posted: 5 Apr 2018, 23:55:29 UTC - in response to Message 1928191.  
Last modified: 5 Apr 2018, 23:57:59 UTC

I just looked at the estimated time for completion for all my tasks on the Intel machine.

52 minutes for cpu tasks. 43 seconds for shorties. 1 minute 24 seconds - 1 minute 54 seconds for VLAR's.

The Ryzen machines do cpu tasks in 28-45 minutes.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1928220
Stephen "Heretic" (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1928225 - Posted: 6 Apr 2018, 0:12:18 UTC - in response to Message 1928118.  
Last modified: 6 Apr 2018, 0:24:21 UTC

Now both Linux crunchers are back to being down 100 tasks from full again like last night. Only 1 in 5 task requests get any work and then only 1 or 2 tasks. The rest of the time I get the "you've reached the limit of tasks in progress" message.


. . And those pesky Blc01 tapes seem to still be stuck in the splitters ...

Stephen

??
ID: 1928225
Stephen "Heretic" (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1928229 - Posted: 6 Apr 2018, 0:26:37 UTC - in response to Message 1928122.  

Seems the messages have changed. Along with more of the Reached a Limit, I'm now just being told Nothing was sent;


. . Getting that here as well.

Stephen

:(
ID: 1928229
Stephen "Heretic" (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1928230 - Posted: 6 Apr 2018, 0:29:38 UTC - in response to Message 1928125.  

All of these look new... I wonder with the recent long outage to fix database issues that this has been introduced to reduce work-in-progress entries, specifically by countering "bunkering" (edit: not saying that anyone is, of course! Pretty sure that the 100 CPU/GPU limit was introduced specifically to limit work-in-progress entries so it is an issue that has been addressed before so it's possible.)


. . Hey there Mr Kevvy,

. . Long time no hear ...

. . That sounds very plausible to me ...

Stephen

:(
ID: 1928230
Stephen "Heretic" (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1928233 - Posted: 6 Apr 2018, 0:37:44 UTC - in response to Message 1928126.  

I'm reading, but I don't have an explanation.

Is it possible that on a machine which has CPU crunching enabled, you might have 100 CPU tasks onboard, and the scheduler might count them and say "enough, already", and bail out without enumerating GPU tasks? The message you're discussing does say "This computer has reached A limit on tasks in progress" (direct quote from my log at 18:15, except for the emphasis). It doesn't say which limit.


. . I have observed just that sort of behaviour a lot lately. When getting new work the CPU queue will be completely refilled but the GPU Q will be shortchanged. Even the other way around on a rare occasion. So maybe it is contextual and whichever Q gets work allocated first triggers the "enough" signal despite the status of the second Q. That to me would be an error in the procedure, OR a deliberate change to the code to prevent bunkering as was suggested earlier.
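To illustrate how that could happen, here is a purely hypothetical sketch (not the scheduler source; all names and numbers are invented) in which a coarse "any limit reached" check would stop work being sent even though the GPU queue still has room:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical resource types; these are not the BOINC definitions.
enum ResourceSketch { CPU_SKETCH = 0, GPU_SKETCH = 1, N_RESOURCES_SKETCH = 2 };

struct LimitStateSketch {
    std::array<int, N_RESOURCES_SKETCH> in_progress;  // tasks on the host
    std::array<int, N_RESOURCES_SKETCH> limit;        // per-resource cap
};

// Coarse check: true as soon as any one resource is at its cap.
bool any_limit_reached(const LimitStateSketch& s) {
    for (std::size_t i = 0; i < s.in_progress.size(); i++) {
        if (s.in_progress[i] >= s.limit[i]) return true;
    }
    return false;
}

// Finer check: is this particular resource at its cap?
bool limit_reached_for(const LimitStateSketch& s, ResourceSketch r) {
    assert(r >= 0 && r < N_RESOURCES_SKETCH);
    return s.in_progress[r] >= s.limit[r];
}
```

With 100 of 100 CPU tasks and 30 of 100 GPU tasks on board, `any_limit_reached` is true while `limit_reached_for(s, GPU_SKETCH)` is false. A scheduler that bailed out on the coarse check would report "a limit" and leave the GPU queue short.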

Stephen

:(
ID: 1928233
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1928241 - Posted: 6 Apr 2018, 1:26:00 UTC - in response to Message 1928220.  

I just looked at the estimated time for completion for all my tasks on the Intel machine.
52 minutes for cpu tasks. 43 seconds for shorties. 1 minute 24 seconds - 1 minute 54 seconds for VLAR's.
The Ryzen machines do cpu tasks in 28-45 minutes.
The question was;
Thu Apr 5 16:01:20 2018 | SETI@home | [sched_op] estimated total NVIDIA GPU task duration: 19947 seconds
So....How does it figure 76 tasks will last 332 minutes?
The answer is it took the longest estimate of around 4 minutes, multiplied that by 76, and came up with a time close to 304 minutes. BUT, that's for just ONE GPU...the machine has 3 GPUs. Therefore, the estimate is immediately off by a factor of 3, and then there is the problem of tasks taking much less than the High estimate. By the time all is corrected, the 76 tasks will probably take about 76 minutes using 3 GPUs with some tasks taking only 80 seconds to complete.
Apparently the estimate completely ignores the number of devices the machine has.
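The same point as a quick calculation, assuming the reported total is a serial sum of per-task estimates and that the GPUs split the work evenly (both are assumptions; the function name is illustrative):

```cpp
#include <cassert>

// Wall-clock time if a serial total is shared evenly across identical devices.
double wall_clock_seconds(double serial_total_seconds, int n_devices) {
    assert(n_devices > 0);
    return serial_total_seconds / n_devices;
}
```

For the 19947 seconds reported, 3 GPUs give 6649 seconds, about 111 minutes, before allowing for tasks that finish faster than estimated.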
ID: 1928241
Keith Myers (Special Project $250 donor)
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1928242 - Posted: 6 Apr 2018, 1:41:26 UTC - in response to Message 1928241.  

That snippet of code I posted is supposed to calculate the number of seconds of work based on the number of gpus in the host.

From your calculation, that part of the code evidently is not working, as I agree the estimate is only for ONE device, not for three GPUs.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1928242
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13855
Credit: 208,696,464
RAC: 304
Australia
Message 1928263 - Posted: 6 Apr 2018, 5:35:34 UTC

So, apart from the random work allocation rearing its ugly head again, the database re-organisation seems to be helping.
Things were a bit messy after the outage, but the splitters (in spite of some slowdowns after a very good start) have been able to fill the Ready-to-send buffer, and keep it filled for over half a day. It's been a while since that has been the case.
The Results & WUs Awaiting-purge have both reached & generally settled around their more normal levels. WUs Awaiting-deletion, while not back to (effectively) zero like they used to be, are at least close enough to it, and not heading for yet another record high.

Now we just need another bunch of short WUs so we can hammer the servers with 145,000/hour again to see just how well they can hold up. If we can get the Scheduler to reliably allocate work when a host hasn't reached its cache or server-side limits, we might actually be ready to cope with more crunchers.
Grant
Darwin NT
ID: 1928263
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13855
Credit: 208,696,464
RAC: 304
Australia
Message 1928264 - Posted: 6 Apr 2018, 5:38:25 UTC - in response to Message 1928242.  

That snippet of code I posted is supposed to calculate the number of seconds of work based on the number of gpus in the host.

From your calculation, that part of the code evidently is not working, as I agree the estimate is only for ONE device, not for three GPUs.

Or it's working the way it's meant to; wasn't it a glitch in the code that allows each GPU to get 100 WU, instead of like the CPU where it's a limit of 100 regardless of the number of cores/threads?
Grant
Darwin NT
ID: 1928264
rob smith (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22539
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1928266 - Posted: 6 Apr 2018, 6:32:34 UTC

Since the routine is working "correctly" on two of my four crunchers, and "incorrectly" on the other two, I would suggest there is something amiss in the communication between the cruncher and the calculation. It is worth noting that the two that are "incorrect" are my top two....
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1928266
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.