Panic Mode On (111) Server Problems?
Richard Haselgrove · Joined: 4 Jul 99 · Posts: 14672 · Credit: 200,643,578 · RAC: 874

> I'll give you a history lesson.

Thanks - I'll take a look when I'm fit again. I'm making too many typing errors, which means I still need to rest.

> Right now the problem is around 15 months OLD and counting.

All the more reason to try and fix it.

> Here is a more recent log;

But (on a couple of spot checks), it's already too old to find any additional data from the task records in the database. Host ID? Version of BOINC client in use? It has that strange message, that every RPC starts with "To report completed tasks", not "To fetch work". I get these two together:

08/04/2018 18:41:45 | SETI@home | Sending scheduler request: To report completed tasks.

When I need work, I see:

08/04/2018 17:30:47 | SETI@home | Sending scheduler request: To fetch work.
Richard Haselgrove · Joined: 4 Jul 99 · Posts: 14672 · Credit: 200,643,578 · RAC: 874

> But why would I deliberately hamstring a host that has just as high CPU-only production as the typical host with a CPU AND a GPU combined. We are expecting an influx of new data and new sources in the future, and the project is going to need as much computing horsepower as possible.

Then help me to try and track down the cause of the problem. Keeping a cache of 95 CPU tasks running/ready to run isn't going to hamstring anything, unless you're running a 96-CPU computer.
TBar · Joined: 22 May 99 · Posts: 5204 · Credit: 840,779,836 · RAC: 2,768

> I'll give you a history lesson.
>
> Thanks - I'll take a look when I'm fit again. I'm making too many typing errors, which means I still need to rest.

The only time I see 'Sending scheduler request: To fetch work' is when I'm NOT reporting completed tasks, and since I'm usually reporting tasks I almost never see it. This is from a Linux host with 7.4.44; note the 'Requesting new tasks for NVIDIA GPU' part:

Sun 08 Apr 2018 02:01:15 PM EDT | SETI@home | [sched_op] Starting scheduler request

This is from a Mac with 7.8.6 running Yosemite; note the 'Requesting new tasks for NVIDIA GPU and AMD/ATI GPU' part:

Sun Apr 8 14:09:47 2018 | SETI@home | [sched_op] Starting scheduler request

This is another Mac, running Sierra; note the 'Requesting new tasks for NVIDIA GPU' part:

Sun Apr 8 14:15:10 2018 | SETI@home | [sched_op] Starting scheduler request

Do you still think these 15 MONTHS of problems are caused by people having No New Tasks set? How do you receive work if NNT is set? Here's one where it isn't reporting a completed task, and since it didn't report a task, the NVIDIA cache was full:

Sat Apr 7 10:24:22 2018 | SETI@home | [sched_op] Starting scheduler request
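(For reference: the [sched_op] lines in these logs come from a standard BOINC client log flag. Anyone wanting to capture the same detail can set it in cc_config.xml in the BOINC data directory - a minimal sketch, assuming a stock 7.x client:)

```xml
<cc_config>
  <log_flags>
    <!-- Log each scheduler RPC: why it was sent, how much work was
         requested for each resource, and what the server replied. -->
    <sched_op_debug>1</sched_op_debug>
  </log_flags>
</cc_config>
```

(The client re-reads this file via the Manager's "Read config files" command, so a restart shouldn't be needed.)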
Richard Haselgrove · Joined: 4 Jul 99 · Posts: 14672 · Credit: 200,643,578 · RAC: 874

> Do you still think these 15 MONTHS of problems are caused by people having No New Tasks set?

I don't think I've ever said that. I have said that it might - repeat, might - be aggravated by having 100 CPU tasks cached, no completed CPU tasks to report, and 'reached a limit on tasks in progress' kicking in too aggressively. And it might - still repeat, might - be aggravated by other things too. Your logs still don't match mine:

08-Apr-2018 08:35:22 [SETI@home] Sending scheduler request: To fetch work.
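(Background on that 'reached a limit on tasks in progress' message: in a stock BOINC server it is the scheduler enforcing the per-host in-progress caps in the project's config.xml. The option names below are standard BOINC; the values are illustrative guesses, since SETI's actual settings aren't visible to volunteers:)

```xml
<boinc>
  <config>
    <!-- Per-host caps on tasks in progress, scaled by the number of
         CPUs and GPUs respectively. A host at its cap is told it has
         "reached a limit on tasks in progress" and gets no new work
         from that request. -->
    <max_wus_in_progress>100</max_wus_in_progress>
    <max_wus_in_progress_gpu>100</max_wus_in_progress_gpu>
  </config>
</boinc>
```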
TBar · Joined: 22 May 99 · Posts: 5204 · Credit: 840,779,836 · RAC: 2,768

> Your logs still don't match mine:

So you tell me - it looks to be a Windows thing to me. The only time I've seen the fetch was when I wasn't reporting tasks. The only thing I can think of is the 'Report completed tasks immediately' setting.

Sun Apr 8 11:25:18 2018 | | Config: run apps at regular priority
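(That setting does exist as a standard client option; in cc_config.xml it would look like the sketch below. Whether it explains the Windows/Linux difference being debated here is exactly what would need testing:)

```xml
<cc_config>
  <options>
    <!-- Report each task on the next scheduler RPC after it finishes,
         rather than holding reports until a work request is due. With
         this enabled, most requests tend to read "To report completed
         tasks" rather than "To fetch work". -->
    <report_results_immediately>1</report_results_immediately>
  </options>
</cc_config>
```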
Sirius B · Joined: 26 Dec 00 · Posts: 24904 · Credit: 3,081,182 · RAC: 7

Just my 2c worth. Wouldn't it be better to run a test on Windows:

- CPU cruncher only
- GPU cruncher only
- Combined cruncher

then the same for Linux? That might make it possible to pinpoint where the actual issue lies.
Richard Haselgrove · Joined: 4 Jul 99 · Posts: 14672 · Credit: 200,643,578 · RAC: 874

> The only thing I can think of is the 'Report completed tasks immediately' setting.

That could well be it, and easy to test. Thanks. I'm getting closer to testing.

08/04/2018 19:46:18 | SETI@home | [sched_op] Starting scheduler request

That still leaves me 34 tasks short of the limit (166, need 200) - not sure which count is low yet. I'll keep y'all posted.

Edit - might even have a clue already.

08/04/2018 19:51:25 | SETI@home | Sending scheduler request: To fetch work.

Same machine, next fetch. The single task received was, as shown, for CPU. So was the task reported, 6548119510. So I suspect I had 100 CPU tasks, swapped one for one, and the server bailed out at that point. I need a beer to think about that one.
juan BFP · Joined: 16 Mar 07 · Posts: 9786 · Credit: 572,710,851 · RAC: 3,799

Don't know if it's relevant to your discussion, but I've noticed that each time my host produces an errored WU, it trips the 'reached a limit on tasks in progress' response and restarts the count from 0, so until a lot of new WUs have been reported it doesn't receive any new work. That does little harm on a host like mine, which turns over a lot of work per hour, but if it happened on a slower host close to a server outage, it could starve that host of new WUs.
Keith Myers · Joined: 29 Apr 01 · Posts: 13164 · Credit: 1,160,866,277 · RAC: 1,873

> > But why would I deliberately hamstring a host that has just as high CPU-only production as the typical host with a CPU AND a GPU combined. We are expecting an influx of new data and new sources in the future, and the project is going to need as much computing horsepower as possible.
>
> Then help me to try and track down the cause of the problem. Keeping a cache of 95 CPU tasks running/ready to run isn't going to hamstring anything, unless you're running a 96-CPU computer.

But my experiment yesterday of dropping to a 0.5-day cache took me down to 75 CPU tasks within an hour, before I called a halt. It would have fallen even further if I had continued, because the host had quit even asking for work.

Seti@Home classic workunits: 20,676 · CPU time: 74,226 hours
A proud member of the OFA (Old Farts Association)
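(For anyone repeating that experiment: the "0.5-day cache" corresponds to the standard BOINC work-buffer preferences, which can be pinned per-host in a global_prefs_override.xml. A sketch using the value Keith describes; the additional-days figure is an assumed example, not his actual setting:)

```xml
<global_preferences>
  <!-- "Store at least this many days of work" - the 0.5-day cache
       used in the experiment above. -->
  <work_buf_min_days>0.5</work_buf_min_days>
  <!-- "Store up to an additional X days of work" - illustrative only. -->
  <work_buf_additional_days>0.05</work_buf_additional_days>
</global_preferences>
```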
Richard Haselgrove · Joined: 4 Jul 99 · Posts: 14672 · Credit: 200,643,578 · RAC: 874

See the edit to my previous post - I may have a new theory within 5 minutes. I need time, as well as the beer, to think about that one.
rob smith · Joined: 7 Mar 03 · Posts: 22436 · Credit: 416,307,556 · RAC: 380

If that host stopped requesting work, it hadn't reached the floor - you should have let it run down tasks until it either hit the floor (zero tasks) or restarted asking for work.

Another thing, and you consistently refuse to accept this - SETI@Home has NEVER promised us a continuous flow of work, so do not expect to see your caches filled on every call.

Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
Sirius B · Joined: 26 Dec 00 · Posts: 24904 · Credit: 3,081,182 · RAC: 7

Someone has pulled the rabbit out of the hat. How many times has the rabbit been forgotten? :-)
TBar · Joined: 22 May 99 · Posts: 5204 · Credit: 840,779,836 · RAC: 2,768

> Another thing, and you consistently refuse to accept this - SETI@Home has NEVER promised us a continuous flow of work, so do not expect to see your caches filled on every call.

Did I mention the problem doesn't exist for ATI users? Fine, they don't promise us a continuous flow of work. But they discriminate against NVIDIA users. Does SETI want to be accused of discrimination? We can do that, you know.
Keith Myers · Joined: 29 Apr 01 · Posts: 13164 · Credit: 1,160,866,277 · RAC: 1,873

> If that host stopped requesting work, it hadn't reached the floor - you should have let it run down tasks until it either hit the floor (zero tasks) or restarted asking for work.

That is a completely false statement and I take offense. I DO work for other projects BECAUSE Seti can't supply constant work to keep the hosts busy.

Seti@Home classic workunits: 20,676 · CPU time: 74,226 hours
A proud member of the OFA (Old Farts Association)
Keith Myers · Joined: 29 Apr 01 · Posts: 13164 · Credit: 1,160,866,277 · RAC: 1,873

> Another thing, and you consistently refuse to accept this - SETI@Home has NEVER promised us a continuous flow of work, so do not expect to see your caches filled on every call.

+1

Seti@Home classic workunits: 20,676 · CPU time: 74,226 hours
A proud member of the OFA (Old Farts Association)
Richard Haselgrove · Joined: 4 Jul 99 · Posts: 14672 · Credit: 200,643,578 · RAC: 874

It might be argued that SETI protects users from the poor software tools provided by NVidia to support their hardware. But that's a contentious thought that is outside the scope of this thread, and I'm not going to develop it tonight, or any night.
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
. . Interestingly, the log jam on the Blc01 tapes seems to have been cleared and they are nearly all split now (only 390 channels of them left). Maybe there was something hung up there ... Stephen ? ? |
Grant (SSSF) · Joined: 19 Aug 99 · Posts: 13832 · Credit: 208,696,464 · RAC: 304

> Another thing, and you consistently refuse to accept this - SETI@Home has NEVER promised us a continuous flow of work, so do not expect to see your caches filled on every call.

If there are over 500,000 WUs ready to send, and there are no Arecibo VLARs anywhere in sight, it's not an unreasonable expectation to get work to replace the work that is returned. That's how it worked ever since Seti moved to BOINC, until Dec 2016 when things changed. Prior to this issue occurring, even when there was a flood of Arecibo VLARs, there was never a problem getting CPU work, and it only took a few requests before a batch of GPU work would download.

If there is no data, then there's no data - and that is what we were never promised: never-ending data. But if there is data available, ready to be downloaded and processed, then it's implicit that we should be able to download it to process. Being able to get it is important, particularly so if they want even more crunchers than they already have. It's no good splitting work if it's just going to be withheld from those that want to process it.

Grant
Darwin NT
Grant (SSSF) · Joined: 19 Aug 99 · Posts: 13832 · Credit: 208,696,464 · RAC: 304

Just made a post in the Café, and it sat there for ages, and then I got this:

"Project is down. The project's database server is down. Please check back in a few hours."

Refreshed and checked the thread, and the post had gone through; everything else is responding as it was beforehand.

Grant
Darwin NT
Grant (SSSF) · Joined: 19 Aug 99 · Posts: 13832 · Credit: 208,696,464 · RAC: 304

The splitters appear to have come back to life and have managed to considerably re-fill the Ready-to-send buffer. Unfortunately the WU deleters haven't been able to keep up, and the WUs Awaiting-deletion backlog has started to grow again.

Grant
Darwin NT