Setting up Linux to crunch CUDA90 and above for Windows users
| Author | Message |
|---|---|
Tom M Joined: 28 Nov 02 Posts: 5126 Credit: 276,046,078 RAC: 462 |
I just got done chasing up and down because my Linux box screen/keyboard was freezing. It turned out I had the "when in use" memory setting high enough that the swap file was getting hit (I could hear the HD beating pretty hard). I haven't had any trouble since I turned it down to 75%. And some BOINC CPU tasks from Einstein@Home started pausing "waiting for memory". I expect it is a good argument for a small SSD and/or doubling my memory. Tom A proud member of the OFA (Old Farts Association). |
|
Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640
|
What non-BOINC things were running that were causing you to exceed the system memory? How much memory was in the system? Seti@Home classic workunits: 29,492 CPU time: 134,419 hours
|
Tom M Joined: 28 Nov 02 Posts: 5126 Credit: 276,046,078 RAC: 462 |
> What non-BOINC things were running that were causing you to exceed the system memory? How much memory was in the system?

It was happening right after I started up the BOINC clients. The BOINC Manager would display them and then about 10-15 sec later the mouse would freeze. I have two sticks of 8 GB memory. I suppose I could add two more for a while but where is the fun in that? :) Have had an issue once I reduced the allowed memory when user is active. Tom A proud member of the OFA (Old Farts Association). |
|
Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640
|
Strange that you would be hitting memory limits when the system is not really doing anything but BOINC. Sitting at the computer at the desktop doesn't really add much, unless you had a lot of web browser windows open. Einstein Gamma-ray with 7 GPUs running only uses about 3GB of system memory on my "miner" type system, which also has 16GB total, running 1 WU at a time. That goes up to 10-11GB when running SETI with the mutex app, 2 at a time. I have my compute preferences set to allow up to 90% memory when in use (or not in use). You shouldn't be seeing "waiting for memory" messages unless you're hitting the BOINC memory limit that is set, and you shouldn't be hitting swap unless your system as a whole is exceeding that 16GB. Next time it happens, open up htop and see what the system memory use is. Seti@Home classic workunits: 29,492 CPU time: 134,419 hours
|
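If htop isn't handy, roughly the same whole-system picture can be pulled from `/proc/meminfo` on Linux. The following is a minimal sketch, not something posted in the thread; the function names `parse_meminfo` and `memory_summary` are made up for illustration, and it assumes a kernel recent enough to report `MemAvailable`.

```python
from pathlib import Path


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are reported in kB; a few (such as HugePages_Total) are counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def memory_summary(fields):
    """Return used RAM and used swap in GiB, plus used RAM as a percentage."""
    kib_per_gib = 1024 * 1024
    total = fields["MemTotal"]
    # MemAvailable is the kernel's estimate of memory usable without swapping.
    used = total - fields["MemAvailable"]
    swap_used = fields.get("SwapTotal", 0) - fields.get("SwapFree", 0)
    return {
        "ram_used_gib": used / kib_per_gib,
        "ram_used_pct": 100.0 * used / total,
        "swap_used_gib": swap_used / kib_per_gib,
    }


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():  # only present on Linux
        summary = memory_summary(parse_meminfo(meminfo.read_text()))
        for key, value in summary.items():
            print(f"{key}: {value:.2f}")
```

If the swap figure is climbing while BOINC starts up, that would be consistent with the freezing described above; htop is still the better tool for seeing which processes are responsible.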
Jimbocous Joined: 1 Apr 13 Posts: 1861 Credit: 268,616,081 RAC: 1,349
|
> Strange that you would be hitting memory limits

Very strange, indeed. For reference's sake, Tom, here you can see what memory each of my boxes is using. Worst case is ~5G used, on a system with 7 GPUs.
|
Tom M Joined: 28 Nov 02 Posts: 5126 Credit: 276,046,078 RAC: 462 |
> Strange that you would be hitting memory limits when the system is not really doing anything but BOINC. Sitting at the computer at the desktop doesn't really add much, unless you had a lot of web browser windows open.

https://wp.me/p5CGc5-bD1

Application: Gamma-ray pulsar search #5 1.08 (FGRPSSE)
Name: LATeah1002F_1320.0_98332_0.0
State: Waiting for memory
Received: Mon 20 Jan 2020 03:52:15 AM CST
Report deadline: Mon 03 Feb 2020 03:52:14 AM CST
Estimated computation size: 105,000 GFLOPs
CPU time: 00:00:09
CPU time since checkpoint: 00:00:09
Elapsed time: 00:00:10
Estimated time remaining: 08:11:05
Fraction done: 0.061%
Virtual memory size: 1.08 GB
Working set size: 782.45 MB
Directory: slots/9
Process ID: 22474
Executable: hsgamma_FGRP5_1.08_x86_64-pc-linux-gnu__FGRPSSE

This is a task waiting for memory. A proud member of the OFA (Old Farts Association). |
|
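As a rough sanity check on the numbers quoted so far (16 GB installed, a 75% or 90% "when in use" limit, and a 782.45 MB working set for this one task), the budget arithmetic can be sketched as below. This is only an illustration: it assumes every CPU task has the same working set as the one shown, that 1 GB is 1024 MB, and that nothing else counts against the limit. The working set above was sampled 10 seconds into the run, so it may well grow. The function name `max_tasks_within_limit` is hypothetical.

```python
def max_tasks_within_limit(total_ram_gb, limit_pct, working_set_mb):
    """Estimate how many equal-sized tasks fit under a percentage memory limit."""
    budget_mb = total_ram_gb * 1024 * limit_pct / 100.0
    return int(budget_mb // working_set_mb)


if __name__ == "__main__":
    for pct in (75, 90):
        count = max_tasks_within_limit(16, pct, 782.45)
        print(f"{pct}% of 16 GB: room for about {count} tasks at 782.45 MB each")
```

The point is only that a per-task working set of this size adds up quickly once many CPU tasks run at the same time.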
Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640
|
That’s not really helpful. I’m curious to see how much memory the whole computer is using and what tasks are using it. Run htop Seti@Home classic workunits: 29,492 CPU time: 134,419 hours
|
Jimbocous Joined: 1 Apr 13 Posts: 1861 Credit: 268,616,081 RAC: 1,349
|
Last night one of my clients started rebooting itself every 15 minutes or so, and I can't seem to clear the issue at this point. Was wondering if anyone could tell me what they know about the .wisdom files. I ask because the one for MB has a date/time stamp close to when this nonsense first began. My guess is that this gets rebuilt at some point by the app (MB or AP, as the case may be) and that shutting down BOINC and deleting it might be a logical troubleshooting step. Just wondering if anyone had thoughts? Thanks!
|
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873
|
Wisdom files are the OpenCL compute kernel primitives for the graphics card and driver. You can safely delete them after shutting BOINC down and they will be recreated when crunching restarts. Just to be clear, I am not talking about the application *.CL file. That is required for the application. Don't delete the *3584.CL or 3556.CL files. The wisdom files are named after the card type and the driver version. There are separate ones for the MB and AP apps. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Jimbocous Joined: 1 Apr 13 Posts: 1861 Credit: 268,616,081 RAC: 1,349
|
> Wisdom files are the OpenCL compute kernel primitives for the graphics card and driver. You can safely delete them after shutting BOINC down and they will be recreated when crunching restarts.

That's what I thought. Thanks for the confirmation. Guess it's worth a shot. It seems to be blc61 files that cause the crash. Using grub to fall back from 5.3.0-26-generic to 5.0.0-37 didn't help either, and that was the only recent activity in the update log. That'll teach me to use the apt update command, even when instructed to ;) Thanks, Keith.
|
Tom M Joined: 28 Nov 02 Posts: 5126 Credit: 276,046,078 RAC: 462 |
> That’s not really helpful. I’m curious to see how much memory the whole computer is using and what tasks are using it.

There are a couple of images in the URL that display Task Manager showing about 3/4 of my 16 GB in use. I believe you can see the "working" set, which is rather large. As far as I can tell, the RAM tracks the working set on these particular apps. I turned off the E@H CPU tasks I was running, so I can't re-create the issue, except that the reported memory usage per Task Manager was 4-5 times what Seti CPU tasks use. Tom A proud member of the OFA (Old Farts Association). |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873
|
> > Wisdom files are the OpenCL compute kernel primitives for the graphics card and driver. You can safely delete them after shutting BOINC down and they will be recreated when crunching restarts.
>
> That's what I thought. Thanks for the confirmation. Guess it's worth a shot. It seems to be blc61 files that cause the crash. Using grub to fall back from 5.3.0-26-generic to 5.0.0-37 didn't help either, and that was the only recent activity in the update log. That'll teach me to use the apt update command, even when instructed to ;)

If you start getting errors on tasks that have messages in the stderr.txt like ....initialization failed or ..... memory access denied, it is time to purge the Compute Cache of the primitives. It is located in /home/{username}/.nv/ComputeCache in Linux and in C:\Users\{username}\AppData\Roaming\NVIDIA\ComputeCache for Windows. The primitives can get corrupted or, more frequently, the permissions on the folder get changed in Windows, which prevents reading the files, something the app has to do for each task crunched. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
|
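For anyone who prefers to script the purge on Linux, here is a minimal sketch built around the `/home/{username}/.nv/ComputeCache` path given above. It is an illustration rather than a tested procedure: the function names are made up, it defaults to a dry run, and it assumes BOINC has already been shut down before anything is deleted.

```python
import shutil
from pathlib import Path


def compute_cache_dir(home=None):
    """Return the Linux compute cache path quoted in the post above."""
    base = Path(home) if home is not None else Path.home()
    return base / ".nv" / "ComputeCache"


def cache_size_bytes(cache_dir):
    """Total size of regular files under cache_dir (0 if it does not exist)."""
    cache_dir = Path(cache_dir)
    if not cache_dir.is_dir():
        return 0
    return sum(p.stat().st_size for p in cache_dir.rglob("*") if p.is_file())


def purge_cache(cache_dir, dry_run=True):
    """Remove cache_dir; with dry_run=True nothing is deleted.

    Returns True if there was a cache directory to act on, False otherwise.
    """
    cache_dir = Path(cache_dir)
    if not cache_dir.is_dir():
        return False
    if not dry_run:
        shutil.rmtree(cache_dir)
    return True


if __name__ == "__main__":
    target = compute_cache_dir()
    print(f"{target}: {cache_size_bytes(target)} bytes")
    # Stop BOINC first; pass dry_run=False only when you mean to delete.
    found = purge_cache(target, dry_run=True)
    print("would purge" if found else "nothing to purge")
```

Keeping the default as a dry run is deliberate, so that running the script by accident only reports the cache size.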
Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640
|
> That’s not really helpful. I’m curious to see how much memory the whole computer is using and what tasks are using it.

The issue was all the CPU tasks you were running. Don't run so many, I guess, since they use up so much system memory, or add more memory if you want to run CPU work. Additionally, were those 3 gravity wave WUs running 1 each on 3 different cards, or 3 on 1 GPU? The gravity wave WUs need a lot of CPU support; my system uses about 1.2-1.5 CPU threads for each GW GPU WU. You can't run multiples per GPU unless you have a lot of spare threads. That's just what I've noticed so far on my old Xeons. The "per GPU WU" CPU percentage might be lower on the more modern chips. Seti@Home classic workunits: 29,492 CPU time: 134,419 hours
|
Jimbocous Joined: 1 Apr 13 Posts: 1861 Credit: 268,616,081 RAC: 1,349
|
> If you start getting errors on tasks that have messages in the stderr.txt like ....initialization failed or ..... memory access denied, it is time to purge the Compute Cache of the primitives. It is located in /home/{username}/.nv/ComputeCache in Linux and in C:\Users\{username}\AppData\Roaming\NVIDIA\ComputeCache for Windows.

In this case, the error tasks get trashed and thus returned with a compute error referring to a bad header (presumably in the returning file), so no helpful info there. Whatever it was, I'd been crashing every 5-15 minutes, and have now been up and running for 1hr15min, so maybe it was indeed bad WUs. I have in the past seen intermittent crashes like this, but few and far between. Strangely, since deleting the wisdom files, they have not been rebuilt after restart, yet S@H is running fine. Guess we'll see.
|
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873
|
Yes, there are occasional bad tasks where the database is overloaded, can't access the task, and errors out. I get one or two a week. But it's strange that the wisdom files didn't get recreated on the very first attempt at crunching an OpenCL task. You can always see that in the stderr.txt output with entries like "can't find opencl file . . . . recompiling", which adds about another 5 seconds to the compute time of the task. Once created, it's not necessary for following tasks with that card and driver. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Jimbocous Joined: 1 Apr 13 Posts: 1861 Credit: 268,616,081 RAC: 1,349
|
> Yes, there are occasional bad tasks where the database is overloaded, can't access the task, and errors out. I get one or two a week. But it's strange that the wisdom files didn't get recreated on the very first attempt at crunching an OpenCL task. You can always see that in the stderr.txt output with entries like "can't find opencl file . . . . recompiling", which adds about another 5 seconds to the compute time of the task. Once created, it's not necessary for following tasks with that card and driver.

lol at myself. Perhaps related to the fact that the only thing running right now is Cuda90 on GPUs and FGRPSSE on CPU ...
|
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873
|
I would suggest an alternative. https://github.com/jonasmalacofilho/liquidctl I was able to control both the AIO CPU fan speeds and the pump speeds with this repository. It handles all the standard Asetek hardware across Corsair, NZXT, EVGA and Thermaltake AIOs. I got it to work quite well on Corsair H-100iV2 and EVGA CLC280 AIOs. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Tom M Joined: 28 Nov 02 Posts: 5126 Credit: 276,046,078 RAC: 462 |
> That’s not really helpful. I’m curious to see how much memory the whole computer is using and what tasks are using it.

Haven't made it to the other room for the picture, but I can assure you that I am running without any app_config.xml file in the E@H directory, so whatever load E@H decides will run on the gpus is what I am getting. It appears to be allocating 0.9 cpu's per gpu task. And it has "never" run more than one task per gpu. All the gpu tasks seem to be running at about the same "ram usage" as Seti. It is the cpu tasks that were taking an outsized bite. I will see if I can run some E@H CPU tasks during maintenance and take a picture of htop so we can get reliable answers. Tom A proud member of the OFA (Old Farts Association). |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14690 Credit: 200,643,578 RAC: 874
|
> Haven't made it to the other room for the picture, but I can assure you that I am running without any app_config.xml file in the E@H directory, so whatever load E@H decides will run on the gpus is what I am getting. It appears to be allocating 0.9 cpu's per gpu task. And it has "never" run more than one task per gpu. All the gpu tasks seem to be running at about the same "ram usage" as Seti.

This is separate from the RAM discussion, but: The figure of '0.9 cpu's per gpu task' is simply BOINC's (wildly inaccurate) estimation of - yes - what to allocate for the task. The application running the task will decide, and use, exactly what it wants. In rare cases, the developer has provided a switch - environment variable, configuration file, or command line - to toggle between 'use full CPU' or 'use less than full CPU'. If the application is running at 'less than full CPU', you again have no control over exactly how much it will use. It's usually better to make your own choices, and apply them via app_config.xml.

On the subject of pictures - it would be helpful if you could use an image hosting service which would allow you to show screenshots at a higher resolution than 300x240 - I found those hard to read. |
|
Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640
|
Like Richard said, 0.9 doesn’t mean it’s using that much. That value is only used for BOINC's internal bookkeeping, so it knows how many resources are being used and how many jobs to run. With gravity wave, I actually observed the GPU tasks using more than a full thread: about 1.2 - 1.5 CPU threads per GPU WU. Seti@Home classic workunits: 29,492 CPU time: 134,419 hours
|
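To make the app_config.xml suggestion concrete, here is a small sketch that writes out the `gpu_versions` form of the file, with one task per GPU and 1.5 CPUs budgeted per GPU task to match the 1.2 - 1.5 threads observed above. The app name `example_gpu_app` is a placeholder, not a real Einstein@Home application name; the real short name has to be looked up for the project in question. The helper name `build_app_config` is also made up for illustration.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, gpu_usage, cpu_usage):
    """Build app_config.xml text with a single <app>/<gpu_versions> entry."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu_versions = ET.SubElement(app, "gpu_versions")
    # gpu_usage 1.0 means one task per GPU; 0.5 would mean two per GPU.
    ET.SubElement(gpu_versions, "gpu_usage").text = str(gpu_usage)
    # cpu_usage is what BOINC budgets per GPU task, not a cap on the app.
    ET.SubElement(gpu_versions, "cpu_usage").text = str(cpu_usage)
    if hasattr(ET, "indent"):  # pretty-printing is available on Python 3.9+
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # "example_gpu_app" is a hypothetical placeholder name.
    print(build_app_config("example_gpu_app", 1.0, 1.5))
```

As both replies point out, `cpu_usage` only changes BOINC's bookkeeping. Setting it closer to what the app really uses mainly stops BOINC from scheduling more CPU work than the machine has threads for.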