Setting up Linux to crunch CUDA90 and above for Windows users
Author | Message |
---|---|
Tom M Send message Joined: 28 Nov 02 Posts: 5126 Credit: 276,046,078 RAC: 462 |
I came home and the BOINC apps were apparently frozen. I couldn't kill the tasks without rebooting the whole system with the power switch. I looked high and low with two different Linux log viewers and can't see a durn thing... :( I am doing "great". Someone mentioned 7.16 so I downloaded what I thought was the most current archive and can't see such a version. Tom A proud member of the OFA (Old Farts Association). |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
(Quoting Tom M: I am doing "great". Someone mentioned 7.16 so I downloaded what I thought was the most current archive and can't see such a version.) Downloaded from "where"? The project only supplies the ancient 7.4.22. Either get the latest 7.16 branch from the BOINC repository on GitHub and compile it yourself, or use the 7.16.3 clients available at the team website. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
There is absolutely nothing wrong with BOINC 7.14.2. If you go to the BOINC page, 7.14.2 is still the recommended version: https://boinc.berkeley.edu/download_all.php This is what it says about the other: "7.16.3 Development version". Every time I try 7.16.x on my 14-GPU system I end up with hundreds or thousands of stuck uploads that refuse to upload. I don't have that problem with 7.14.2, so for now I'm staying with it. Soon I'll place a new version in the All-In-One that has shortened retry times and 4 minutes for the finish-file time. Also, the default BOINC version will have OpenSSL 1.1, so it will need Ubuntu 18.04 or newer. 18.04 works with both OpenSSL 1.0 and 1.1; any older release requires 1.0, and anything newer requires 1.1. BTW, if you have an issue with the finish file, 7.16 will still not fix it if it comes from a hung GPU. This is what you get with 7.16 when the problem is a stuck task or hung GPU: <core_client_version>7.16.2</core_client_version> <![CDATA[ <message> Process still present 5 min after writing finish file; aborting</message> <stderr_txt> setiathome_CUDA: Found 14 CUDA device(s)... All it does is extend the timeout; it still fails. |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Since I don't push any hardware to the limit the way "mining" rigs do, I haven't had any "finish file" errors since moving to 7.16.3. If you run rigs that stretch the limits of the hardware and the OS, you should expect them to be more cantankerous and to need a bit more hand-holding than off-the-shelf motherboards with normal GPU counts. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Someone else is having problems too: <core_client_version>7.16.3</core_client_version> <![CDATA[ <message> Process still present 5 min after writing finish file; aborting</message> <stderr_txt> setiathome_CUDA: Found 7 CUDA device(s)... Those seem to pop up with stalled tasks and hung GPUs. A week or so ago I moved my cooler-running 1060s to the machine with the enclosed case. For the next few days I had numerous dropped GPUs on both machines. As usual, I just kept identifying the misbehaving GPU and then changing the power connection, moving the USB connection to a different slot, or moving the GPU to a different power supply. Finally, all the problems stopped again and all is well. I'd say over 95% of my problems are solved by just changing the power connection. Other than with stalled tasks or hung GPUs, I never get the finish-file error, even with 14 GPUs. I do have a rather new SSD; I bought it trying to solve the stalled-upload problem, but it didn't help with the uploads... |
juan BFP Send message Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
Did anyone try the newer BOINC 7.17.0? Does it have the fix for this error? |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
(Quoting juan BFP: Did anyone try the newer BOINC 7.17.0? Does it have the fix for this error?) It was fixed in 7.16.3, and any newer branch would have the same fix. The fix is no solution for a hung GPU, nor was it ever intended to be. Fix the reason for the hung GPU. Changing the timeout from 10 seconds to 300 seconds fixed it for clients that normally run well. Per David Anderson: "client: increase finish-file timeout" Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
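To make the point about the timeout concrete, here is a minimal Python sketch of the idea, assuming a simplified model of the check. It is not BOINC's client code: the function name `finish_file_verdict` and the constant names are hypothetical, and only the 10-second and 300-second values come from the post above.

```python
# Hypothetical model of a finish-file timeout check; not BOINC source code.
OLD_TIMEOUT_S = 10    # timeout before the change, per the post
NEW_TIMEOUT_S = 300   # timeout after the change, per the post


def finish_file_verdict(seconds_since_finish_file, process_still_running, timeout_s):
    """Classify a task that has already written its finish file."""
    if not process_still_running:
        return "done"    # the process exited normally after finishing
    if seconds_since_finish_file > timeout_s:
        return "abort"   # still present after the timeout
    return "wait"        # give the process more time to exit


# A slow but healthy task that needs 45 s to exit after writing the file.
print(finish_file_verdict(45, True, OLD_TIMEOUT_S))
print(finish_file_verdict(45, True, NEW_TIMEOUT_S))
# A task on a hung GPU never exits, so a longer timeout only delays the abort.
print(finish_file_verdict(3600, True, NEW_TIMEOUT_S))
```

Under this model the longer timeout rescues the slow task but not the hung one, which matches the advice to fix the cause of the hung GPU.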
Jimbocous Send message Joined: 1 Apr 13 Posts: 1856 Credit: 268,616,081 RAC: 1,349 |
(Quoting TBar: BTW, if you have an Issue with the Finish File, 7.16 will still not fix it if it's from a Hung GPU. This is what you get with 7.16 if it's a problem with a Stuck task or Hung GPU;) +1 |
Siran d'Vel'nahr Send message Joined: 23 May 99 Posts: 7379 Credit: 44,181,323 RAC: 238 |
(Quoting TBar: Absolutely Nothing wrong with BOINC 7.14.2. If you go to the BOINC Page 7.14.2 is Still the Recommended version, https://boinc.berkeley.edu/download_all.php This is what it says about the other;) Hi TBar, I had an experience similar to Tom's: I was watching a DVD movie, and when it was done BOINC was frozen solid. I tried shutting BOINC down in the normal manner, using "Exit BOINC" and ending the tasks, but it would not shut down. I tried using System Monitor to kill the process and that did not work either. The only way I could get rid of BOINC was to do a hard reboot. It hasn't happened since, probably because I now shut BOINC down when I watch a video. I am using BOINC v7.14.2, running on Linux Mint v19.3 Tricia, NVIDIA Driver Version: 435.21. Any ideas? :) By the way, I do not and cannot compile a program in Linux, simply because I do not know how to and don't want to know. ;) Have a great day! :) Siran CAPT Siran d'Vel'nahr - L L & P _\\// Winders 11 OS? "What a piece of junk!" - L. Skywalker "Logic is the cement of our civilization with which we ascend from chaos using reason as our guide." - T'Plana-hath |
Tom M Send message Joined: 28 Nov 02 Posts: 5126 Credit: 276,046,078 RAC: 462 |
(Quoting my earlier post: Came home and the Boinc Apps apparently were frozen.) I poked around this morning and located the post from Ian&Steve on adding "pci=nommconf" to the grub command line. The file to edit is /etc/default/grub; from /etc/default, run "sudo nano grub" to edit it. Afterwards, run "sudo update-grub" and then restart the system. It seems this may rein in the problem where BOINC pauses because the CPU is 100% busy, which I have run across most often on Intel CPUs. Tom A proud member of the OFA (Old Farts Association). |
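The grub edit described in the post above can be sketched in Python as a pure text transformation. This is only an illustration under stated assumptions: the post does not say which variable holds the command line, so the sketch assumes `GRUB_CMDLINE_LINUX_DEFAULT`, and the helper name `add_kernel_param` and the sample file contents are hypothetical.

```python
import re


def add_kernel_param(grub_text, param, key="GRUB_CMDLINE_LINUX_DEFAULT"):
    """Return grub_text with param appended to key's quoted value if missing.

    If key does not appear in grub_text, the text is returned unchanged.
    """
    pattern = re.compile(r'^(' + re.escape(key) + r'=")([^"]*)(")', re.MULTILINE)

    def _append(match):
        params = match.group(2).split()
        if param not in params:       # keep the edit idempotent
            params.append(param)
        return match.group(1) + " ".join(params) + match.group(3)

    return pattern.sub(_append, grub_text, count=1)


# Assumed sample contents; a real /etc/default/grub will differ.
sample = 'GRUB_DEFAULT=0\nGRUB_CMDLINE_LINUX_DEFAULT="quiet splash"\n'
print(add_kernel_param(sample, "pci=nommconf"))
```

The function only returns the new text. Writing it back to /etc/default/grub needs root privileges, and the change takes effect only after "sudo update-grub" and a restart, as the post says.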
Ian&Steve C. Send message Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
Can you be more specific? Did it help or hurt the problem? What is the problem? I really can’t tell from your post. But that setting has nothing to do with CPU use. And I only ever had to add that to one system (the ASUS z270 motherboard system with i7-7700k and 7x RTX 2070). My other Intel systems are all untouched. Seti@Home classic workunits: 29,492 CPU time: 134,419 hours |
Tom M Send message Joined: 28 Nov 02 Posts: 5126 Credit: 276,046,078 RAC: 462 |
(Quoting Ian&Steve C.: Can you be more specific? Did it help or hurt the problem? What is the problem? I really can't tell from your post.) It took a while for me to "remember". When I came home the Intel machine was "running", but all the BOINC tasks were suspended because non-BOINC tasks were making the system "too busy". I believe the "too busy" settings are above 95% when the user is away and 85% when the user is active. The symptoms are familiar. I think it may have to do with the HD getting full and the CPU then trying to clear space. The fix for keeping the HD from getting full is to reduce the amount of reporting, hence the "pci=xxxxx" parameter. It does take a hard reboot, because the tasks simply won't die without it. It took a week or more to generate this hiccup, so I won't know whether that fixed the issue for at least that long. I couldn't find any report of a full HD in the logs I looked at. If the problem recurs I promise a screenshot, since the system was responsive but the BOINC tasks were "paused". Tom A proud member of the OFA (Old Farts Association). |
Ian&Steve C. Send message Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
Did you at least check if the drive was filling up? Did you check the size? Or the size of the log files specifically? Seti@Home classic workunits: 29,492 CPU time: 134,419 hours |
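The checks asked about above can be scripted. Here is a minimal Python sketch that reports how full a filesystem is and lists the largest files under a directory. The helper names `disk_percent_used` and `largest_files` are hypothetical, and the `/var/log` path is an assumption about where the logs live on the machine in question.

```python
import os
import shutil


def disk_percent_used(path="/"):
    """Return the percentage of the filesystem containing path that is used."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def largest_files(root, top_n=5):
    """Return up to top_n (size_in_bytes, path) pairs under root, largest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                sizes.append((os.path.getsize(full), full))
            except OSError:
                continue  # unreadable, vanished, or a broken symlink
    return sorted(sizes, reverse=True)[:top_n]


if __name__ == "__main__":
    print(f"/ is {disk_percent_used('/'):.1f}% full")
    # Assumed log location; some files may need root to be read.
    for size, path in largest_files("/var/log"):
        print(f"{size / 1e6:10.1f} MB  {path}")
```

A few very large files at the top of that list would support the idea that logging is filling the drive; a mostly empty drive would rule it out.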
J. Mileski Send message Joined: 9 Jun 02 Posts: 632 Credit: 172,116,532 RAC: 572 |
I just reinstalled Linux on computer 8658701, which had an older version of Linux on it. I got this message:
Here is my app_info:
<app_info>
  <app>
    <name>setiathome_v8</name>
  </app>
  <file_info>
    <name>setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda90</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>setiathome_v8</app_name>
    <platform>x86_64-pc-linux-gnu</platform>
    <version_num>801</version_num>
    <plan_class>cuda90</plan_class>
    <cmdline>-nobs</cmdline>
    <coproc>
      <type>NVIDIA</type>
      <count>1</count>
    </coproc>
    <avg_ncpus>1</avg_ncpus>
    <max_ncpus>1</max_ncpus>
    <file_ref>
      <file_name>setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda90</file_name>
      <main_program/>
    </file_ref>
  </app_version>
  <app>
    <name>astropulse_v7</name>
  </app>
  <file_info>
    <name>astropulse_7.08_x86_64-pc-linux-gnu__opencl_nvidia_100</name>
    <executable/>
  </file_info>
  <file_info>
    <name>AstroPulse_Kernels_r2751.cl</name>
  </file_info>
  <file_info>
    <name>ap_cmdline_7.08_x86_64-pc-linux-gnu__opencl_nvidia_100.txt</name>
  </file_info>
  <app_version>
    <app_name>astropulse_v7</app_name>
    <platform>x86_64-pc-linux-gnu</platform>
    <version_num>708</version_num>
    <plan_class>opencl_nvidia_100</plan_class>
    <coproc>
      <type>NVIDIA</type>
      <count>1</count>
    </coproc>
    <avg_ncpus>1</avg_ncpus>
    <max_ncpus>1</max_ncpus>
    <file_ref>
      <file_name>astropulse_7.08_x86_64-pc-linux-gnu__opencl_nvidia_100</file_name>
      <main_program/>
    </file_ref>
    <file_ref>
      <file_name>AstroPulse_Kernels_r2751.cl</file_name>
    </file_ref>
    <file_ref>
      <file_name>ap_cmdline_7.08_x86_64-pc-linux-gnu__opencl_nvidia_100.txt</file_name>
      <open_name>ap_cmdline.txt</open_name>
    </file_ref>
  </app_version>
  <app>
    <name>setiathome_v8</name>
  </app>
  <file_info>
    <name>MBv8_8.22r3711_sse41_intel_x86_64-pc-linux-gnu</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>setiathome_v8</app_name>
    <platform>x86_64-pc-linux-gnu</platform>
    <version_num>800</version_num>
    <file_ref>
      <file_name>MBv8_8.22r3711_sse41_intel_x86_64-pc-linux-gnu</file_name>
      <main_program/>
    </file_ref>
  </app_version>
  <app>
    <name>astropulse_v7</name>
  </app>
  <file_info>
    <name>ap_7.05r2728_sse3_linux64</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>astropulse_v7</app_name>
    <version_num>704</version_num>
    <platform>x86_64-pc-linux-gnu</platform>
    <plan_class></plan_class>
    <file_ref>
      <file_name>ap_7.05r2728_sse3_linux64</file_name>
      <main_program/>
    </file_ref>
  </app_version>
</app_info>
I have in the directory: ap_7.05r2728_sse3_linux64, astropulse_7.08_x86_64-pc-linux-gnu__opencl_nvidia_100, setiathome_x41p_V0.97_x86_64-pc-linux-gnu_cuda90, setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda90, setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda102, MBv8_8.22r3711_sse41_x86_64-pc-linux-gnu, MBv8_8.22r3711_sse41_intel_x86_64-pc-linux-gnu, and MBv8_8.22r4008_avx2_intel_x86_64-pc-linux-gnu. Can anyone tell me what I am missing? I can't see what is wrong. Edited to the correct computer. |
J. Mileski Send message Joined: 9 Jun 02 Posts: 632 Credit: 172,116,532 RAC: 572 |
|
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Is the CPU binary marked executable? Is it correctly named in the app_info, with no typos? Have you tried editing the app_info and replacing it with the other CPU app, MBv8_8.22r3711_sse41_x86_64-pc-linux-gnu? Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
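The first two questions above (is each binary present under the exact name in the app_info, and is it marked executable?) can be checked mechanically. The sketch below is a hedged Python illustration, not a BOINC tool: the helper name `check_app_info`, the tiny sample XML, and the temporary demo directory are all hypothetical stand-ins for the real app_info.xml and project folder.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def check_app_info(app_info_xml, project_dir):
    """Return a list of problems for the files named in app_info_xml."""
    root = ET.fromstring(app_info_xml)
    problems = []
    for file_info in root.iter("file_info"):
        name = (file_info.findtext("name") or "").strip()
        path = os.path.join(project_dir, name)
        if not os.path.isfile(path):
            problems.append(f"missing: {name}")
        elif file_info.find("executable") is not None and not os.access(path, os.X_OK):
            problems.append(f"not executable: {name}")
    return problems


# Hypothetical two-file app_info used only for the demonstration.
SAMPLE = """<app_info>
  <file_info><name>cpu_app</name><executable/></file_info>
  <file_info><name>kernels.cl</name></file_info>
</app_info>"""

with tempfile.TemporaryDirectory() as demo_dir:
    cpu_app = os.path.join(demo_dir, "cpu_app")
    with open(cpu_app, "w") as handle:
        handle.write("placeholder")
    os.chmod(cpu_app, 0o644)   # present, but without the executable bit
    print(check_app_info(SAMPLE, demo_dir))
    os.chmod(cpu_app, 0o755)   # same effect as chmod +x on Linux
    print(check_app_info(SAMPLE, demo_dir))
```

To use it on a real host, read the contents of the app_info.xml in the project folder and pass that folder as `project_dir`. A name that differs by even one character is reported as missing.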
J. Mileski Send message Joined: 9 Jun 02 Posts: 632 Credit: 172,116,532 RAC: 572 |
|
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
I wouldn't expect that to be a problem. I have half a dozen variations of the cpu app sitting in my projects folder. As long as the app_info points to a viable binary name, all is good. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
J. Mileski Send message Joined: 9 Jun 02 Posts: 632 Credit: 172,116,532 RAC: 572 |
|
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
That line doesn't mean what you think it does; look at the line below it: "Sat 29 Feb 2020 06:52:22 PM GMT | SETI@home | No tasks are available for SETI@home v8". The server thinks the machine reported an error, so it put you in the penalty box. I don't see an error listed, but the max tasks per day was reset to a very low number for SETI@home v8 (anonymous platform, NVIDIA GPU). The machine is working normally; as it completes more tasks, the max per day will increase and the server will start sending more tasks. I see nothing wrong with the machine: https://setiathome.berkeley.edu/results.php?hostid=8658701&offset=220 Give it some time. |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.