Strange behaviour, tasks switch, causing a restart
Author | Message |
---|---|
Wurgl (speak^Wcrunching for Special: Off-Topic) Joined: 19 Jun 06 Posts: 5 Credit: 681,649 RAC: 0 |
I am using BOINC Manager 7.2.42 on openSUSE Leap 42.3 (yes, I have to, and I will upgrade this summer). Two tasks: http://setiathome.berkeley.edu/result.php?resultid=7405624489 http://setiathome.berkeley.edu/result.php?resultid=7405624487 One computes for about 1 minute. Then BOINC decides to switch over to the other task. Switching to that task means it starts from the beginning. And so it goes: the second task computes for about a minute, then BOINC decides … see above. A nice endless loop. Sorry, I have to kill those jobs. BTW: judging by the statistics graph in the BOINC client, this behavior seems to have started on March 19th. |
tullio Joined: 9 Apr 04 Posts: 8797 Credit: 2,930,782 RAC: 1 |
I am using BOINC 7.8.3 on SuSE Leap 15.0, which I recommend. On another CPU, a Virtual Machine on a Windows 10 host, SuSE insists on sending me Tumbleweed as an update of Leap 15.0. BOINC Manager 7.14.2 does not work in Tumbleweed, and I have connected to Einstein@home via a manual installation. Tullio |
Wurgl (speak^Wcrunching for Special: Off-Topic) Joined: 19 Jun 06 Posts: 5 Credit: 681,649 RAC: 0 |
Maybe it is important that these workunits run on the graphics card; they are not CPU-only. And they do not seem to save a checkpoint from which they can continue computing. Pure CPU jobs run fine. |
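To see why a missing checkpoint turns task switching into an endless loop, here is a toy round-robin model. It is only a sketch under assumed numbers (the 1-minute switch interval from the first post and a hypothetical 10 minutes of GPU work per task); the function `simulate` is invented for illustration and is not how the BOINC client schedules work.

```python
from typing import List, Optional


def simulate(work_needed: int, slice_len: int, checkpoints: bool,
             max_slices: int = 1000) -> Optional[int]:
    """Toy model of two GPU tasks that the client alternates between.

    Returns the number of time slices until both tasks finish, or None if
    they have not finished after ``max_slices`` slices. Units are arbitrary
    (think "minutes"). This is NOT BOINC's scheduler, only an illustration.
    """
    progress: List[int] = [0, 0]
    finished: List[bool] = [False, False]
    current = 0
    for slice_no in range(1, max_slices + 1):
        progress[current] += slice_len
        if progress[current] >= work_needed:
            finished[current] = True
            if all(finished):
                return slice_no
        other = 1 - current
        if not finished[other]:
            # Switch away from the running task.
            if not finished[current] and not checkpoints:
                # No checkpoint on disk: the next start begins from zero.
                progress[current] = 0
            current = other
    return None


# Hypothetical numbers: 10 minutes of work per task, a switch every minute.
print(simulate(10, 1, checkpoints=True))   # 20
print(simulate(10, 1, checkpoints=False))  # None
```

With checkpoints, progress survives each switch and both tasks finish; without them, every switch throws the work away, so neither task ever completes unless a single time slice is long enough to finish it.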
rob smith Joined: 7 Mar 03 Posts: 22158 Credit: 416,307,556 RAC: 380 |
A couple of thoughts. First, which source did you use to get the drivers? If it was from MS, there is a fair possibility that they do not have all the computation support needed. Second, check the GPU to make sure it isn't full of dust and is seated properly. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
Wurgl (speak^Wcrunching for Special: Off-Topic) Joined: 19 Jun 06 Posts: 5 Credit: 681,649 RAC: 0 |
rob, this is LINUX. MS is not allowed on my machine. The problem is task switching without the ability to continue, not computation errors. BTW: it is now running its main project, Einstein@Home, and since my first post it has completed 9 workunits without errors, all on the GPU. So no dust. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14649 Credit: 200,643,578 RAC: 874 |
All tasks for computer 4990496 have been aborted, some up to 7 weeks after issue. No evidence was retained in stderr; the event log could still be queried via stdoutdae.txt. |
Keith Myers Joined: 29 Apr 01 Posts: 13161 Credit: 1,160,866,277 RAC: 1,873 |
You are trying to run the stock SoG OpenCL GPU application. Do you have the Nvidia OpenCL drivers loaded? Does the Event Log at startup show two detection lines for your GTX 950? You should see one line showing the Nvidia driver level with CUDA detection, followed by another line showing the Nvidia driver level with OpenCL detection. If you don't have that second line, that is why all your GPU tasks fail. You might want to reset the project in case the application entries in client_state.xml are corrupted or something. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
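The check for the two detection lines can be scripted against the event log file (stdoutdae.txt). This is a sketch under assumptions: it assumes the detection lines contain `CUDA: NVIDIA GPU <n>` and `OpenCL: NVIDIA GPU <n>`, and that each client start begins with a line containing `Starting BOINC client`. Adjust the patterns if your client logs differently. The helper names are hypothetical.

```python
import re
from pathlib import Path
from typing import Dict

# Assumed shape of the startup detection lines, e.g.
#   "... [---] CUDA: NVIDIA GPU 0: GeForce GTX 950 (driver version ...)"
#   "... [---] OpenCL: NVIDIA GPU 0: GeForce GTX 950 (driver version ...)"
CUDA_RE = re.compile(r"\bCUDA: NVIDIA GPU \d+")
OPENCL_RE = re.compile(r"\bOpenCL: NVIDIA GPU \d+")
STARTUP_MARKER = "Starting BOINC client"  # assumed first line of each startup


def last_startup(log_text: str) -> str:
    """Return the part of the event log written since the most recent startup."""
    idx = log_text.rfind(STARTUP_MARKER)
    return log_text if idx == -1 else log_text[idx:]


def nvidia_detection(log_text: str) -> Dict[str, bool]:
    """Report whether the latest startup detected the GPU via CUDA and OpenCL."""
    section = last_startup(log_text)
    return {
        "cuda": CUDA_RE.search(section) is not None,
        "opencl": OPENCL_RE.search(section) is not None,
    }


def check_log_file(path: str) -> Dict[str, bool]:
    """Read an event log file such as stdoutdae.txt and run the check."""
    return nvidia_detection(Path(path).read_text(errors="replace"))
```

Call `check_log_file(...)` with the path to stdoutdae.txt in your BOINC data directory (the location depends on the installation). A result of `{"cuda": True, "opencl": False}` would correspond to the missing second line. Only the most recent startup is examined, because the file keeps older sessions too.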
Richard Haselgrove Joined: 4 Jul 99 Posts: 14649 Credit: 200,643,578 RAC: 874 |
I think that Raistmer's SoG application goes into 'temporary exit' and retries if certain types of driver error are encountered - lack of OpenCL might be one of them, but I don't have the documentation to hand. That would explain the symptom in the thread title, but should be checked in both the Event Log and stderr. |
betreger Joined: 29 Jun 99 Posts: 11358 Credit: 29,581,041 RAC: 66 |
IIRC the Einstein Nvidia app is OpenCL. |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
"rob, this is LINUX. MS is not allowed on my machine." It would be much easier to just run the current LTS version of Ubuntu with the BOINC-All-In-One package. Just install Ubuntu, the repository's 390 driver, and the BOINC package into your Home folder. Then you could be like all these people, https://setiathome.berkeley.edu/top_hosts.php, except for that one Mac that seems to be hanging in there... |
Keith Myers Joined: 29 Apr 01 Posts: 13161 Credit: 1,160,866,277 RAC: 1,873 |
"IIRC the Einstein Nvidia app is OpenCL" Yes, I see that now in his OP. But I think that the Einstein OpenCL app can run on the Mesa OpenCL package drivers. Not sure; I would have to read through the forum threads. There can be instances where two platforms are loaded for graphics drivers: Mesa and Nvidia, Mesa and Intel, or Mesa and AMD. Each would try to load its OpenCL component. I remember a thread where the Mesa drivers unloaded the AMD OpenCL component and vice versa; supposedly it is impossible for both to co-exist on the same system. That could be the case here. A check with clinfo would clear things up, since it lists all detected platforms and their respective components. |
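Where clinfo can be installed, its list mode (`clinfo -l`) prints each platform followed by its devices, which makes a second platform easy to spot. The sketch below parses that output; the exact layout it expects (`Platform #0: ...` lines with indented `Device #0: ...` lines beneath) and the Mesa platform name `Clover` are assumptions, and the function names are hypothetical.

```python
import re
from typing import Dict, List

# Assumed `clinfo -l` layout:
#   Platform #0: NVIDIA CUDA
#    `-- Device #0: GeForce GTX 950
PLATFORM_RE = re.compile(r"^Platform #\d+: (.+)$")
DEVICE_RE = re.compile(r"Device #\d+: (.+)$")
MESA_PLATFORM = "Clover"  # assumed name of Mesa's OpenCL platform


def parse_clinfo_list(text: str) -> Dict[str, List[str]]:
    """Map each OpenCL platform name to the device names listed under it."""
    platforms: Dict[str, List[str]] = {}
    current = None
    for line in text.splitlines():
        p = PLATFORM_RE.match(line)
        if p:
            current = p.group(1).strip()
            platforms.setdefault(current, [])
            continue
        d = DEVICE_RE.search(line)
        if d and current is not None:
            platforms[current].append(d.group(1).strip())
    return platforms


def mesa_alongside_vendor(platforms: Dict[str, List[str]]) -> bool:
    """True if Mesa's platform is registered next to at least one other."""
    return MESA_PLATFORM in platforms and len(platforms) > 1
```

The text could come from `subprocess.run(["clinfo", "-l"], capture_output=True, text=True).stdout`. A Mesa platform with no devices listed next to the Nvidia one would fit the two-platform situation described above, though it would not by itself prove a conflict.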
tullio Joined: 9 Apr 04 Posts: 8797 Credit: 2,930,782 RAC: 1 |
I've learned that a Virtual Machine cannot use the GPU of its host. In a Linux Virtual Machine on a Windows 10 PC with an Nvidia board, the hwinfo --gfx command reports that I have a VMware board, which does not exist. This is probably why LHC@home does not use GPUs: all its programs, except the "native" ones, use a Virtual Machine. Tullio |
Wurgl (speak^Wcrunching for Special: Off-Topic) Joined: 19 Jun 06 Posts: 5 Credit: 681,649 RAC: 0 |
I have Nvidia drivers installed. I cannot install clinfo, since this would cause a conflict. However, when there is an error, the workunit should stop with that error. It should not try to restart (causing the same problem again and ending up in an endless loop). |
rob smith Joined: 7 Mar 03 Posts: 22158 Credit: 416,307,556 RAC: 380 |
BOINC has a built-in feature that traps tasks that restart too often. I can't remember what the trigger number is, but seeing the real error report of a task that has undergone a number of restarts might actually give a hint as to what the problem is. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14649 Credit: 200,643,578 RAC: 874 |
The limit is 100. But stderr is limited to the last 64 KB, and with the amount of output SoG produces, the initial iterations and the initial start-up data will be lost. |
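Those two numbers explain why the error report of a task that hit the restart limit may not help much. A minimal sketch: the 100-restart limit and the 64 KB stderr limit come from the post above, while the amount of output per start (`bytes_per_start`) is a hypothetical value, not a measurement of SoG. Once all the restarts together write more than 64 KB, the header of the very first start is no longer in the kept tail.

```python
STDERR_LIMIT = 64 * 1024  # "last 64KB", as stated in the post above
RESTART_LIMIT = 100       # restart limit, as stated in the post above


def build_stderr(starts: int, bytes_per_start: int) -> bytes:
    """Fake stderr: a start-up header followed by filler for each start."""
    chunks = []
    for n in range(starts):
        chunks.append(f"start {n}: initial start-up data\n".encode())
        chunks.append(b"." * bytes_per_start + b"\n")
    return b"".join(chunks)


def stderr_tail(data: bytes, limit: int = STDERR_LIMIT) -> bytes:
    """Keep only the last `limit` bytes, mimicking the 64 KB limit."""
    return data[-limit:]


def first_start_visible(tail: bytes) -> bool:
    """Is the header of the very first start still in the kept tail?"""
    return b"start 0:" in tail


# Hypothetical output volumes per start; SoG's real volume was not measured.
print(first_start_visible(stderr_tail(build_stderr(RESTART_LIMIT, 10))))    # True
print(first_start_visible(stderr_tail(build_stderr(RESTART_LIMIT, 1000))))  # False
```

With about 1 KB per start, 100 starts produce roughly 100 KB, so only the later restarts survive in the report, which matches the concern about losing the initial start data.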
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.