Setting up Linux to crunch CUDA90 and above for Windows users
Author | Message |
---|---|
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Just remembered you are talking about a Linux installation, so ignore my last. So in Linux are the directory permissions OK? Are you running repository version of BOINC or TBar version? . . Yes, the AP app file has its permission set OK. First thing I checked :) Do the .cl files need the permission set as well? Stephen ? |
Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Back to my original post. Is the OpenCL component of the Linux Nvidia drivers installed? Is the AstroPulse_Kernels_r2751.cl file present? Probably should unpack the original TBar All-In-One package into the BOINC directory again and check for the AP files once more. Should reinstall the Nvidia drivers again and also make sure you get the OpenCL component marked as one of the dependencies. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Maximum single buffer size set to:2048MB I don't know when it was set that high, it's set to 256 in the Downloaded File. I suggest you set it back to -sbs 256. I'm currently running the line: -sbs 256 -unroll 16 -oclFFT_plan 256 16 256 -ffa_block 2304 -ffa_block_fetch 1152 on my 1060s. The MB settings are different from the AP settings. There is an astropulse_7.08_README file in the Docs folder of the Download. |
Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Just remembered you are talking about a Linux installation, so ignore my last. So in Linux are the directory permissions OK? Are you running repository version of BOINC or TBar version? Yes, I have permission and execute set on my file. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Maximum single buffer size set to:2048MB I don't know when it was set that high, it's set to 256 in the Downloaded File. I suggest you set it back to -sbs 256. Yes, -sbs 2048 is too high for 970s. Due to a quirk in OpenCL, the card can only access a maximum of 1/4 of the card's memory, which is 1024 MB. I think the clFFT compile, which starts at the amount of memory specified in the -sbs parameter, is what is making it choke. Knock the -sbs parameter down to 1024 and try again. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Back to my original post. Is the OpenCL component of the Linux Nvidia drivers installed? Is the AstroPulse_Kernels_r2751.cl file present? Probably should unpack the original TBar All-In-One package into the BOINC directory again and check for the AP files once more. Should reinstall the Nvidia drivers again and also make sure you get the OpenCL component marked as one of the dependencies. . . I am pretty sure the OpenCL component of the drivers is AOK or the app would not have discovered 2 OpenCL devices. If it were not it would only have found CUDA devices. . . I hesitate to do a full install at this point in case I wipe out something else :( Stephen ":( |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Maximum single buffer size set to:2048MB I don't know when it was set that high, it's set to 256 in the Downloaded File. I suggest you set it back to -sbs 256. . . Mine was set to 512 MB as default until I looked at it for the 1st time this am. I couldn't resist increasing it ... <my bad> But since the problem existed before I even looked at this file I doubt that is the cause. I will have a look at that readme file. Stephen .. |
Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
This is the default AP OpenCL.txt command line that is called from app_info: -unroll 12 -sbs 256 -ffa_block 3072 -ffa_block_fetch 1536. If you look in the BOINC Projects folder and then in the Docs folder, you will find the normal AP OpenCL application tuning command suggestions at the bottom of the file. The entry "For NV x80/x70" would be the one to use for a 970. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
. . I was pretty sure that the infamous memory access problem on the 970s simply restricted the direct memory access to 3.n GB instead of the whole 4GB, but even if it is an issue it is not the cause of this problem as I only set it this am. Stephen ? ? |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
This is the default AP OpenCL.txt command line that is called from app_info. . . I will have to look at that too. But the change I made was in the AP command line text file, which was already there and with -sbs 512; it was not the app_info.xml that I changed. Stephen ?? |
Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
No, that was a different issue. 970s have 4GB of memory and can access all of it. It is just that the last 512MB isn't accessed at the same speed as the first 3.5GB. There is an absolute limit of 25% of card memory that OpenCL can access. From one of my posts trying to determine where the 1536 MB of memory was coming from on my 1060 6GB card: Message 1892221 Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
This is the default AP OpenCL.txt command line that is called from app_info. The ap_cmdline_7.08_x86_64-pc-linux-gnu__opencl_nvidia_100.txt file is called by the stock app_info.xml file. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
. . OK. I didn't know that OpenCL was limited so far in memory access/usage. But for the record, I set sbs to 256 and still the same problem. Whatever is the issue it isn't that. Stephen :( |
Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
I don't know what to suggest next. Maybe reinstall the Nvidia driver package and also the nvidia-opencl-icd-(driver level) package. I know you said the Event Log shows the OpenCL capabilities of the driver already. But maybe something didn't get installed correctly or dependencies set correctly. Same suggestion for the BOINC package. I don't know how else to diagnose the failure of the wisdom files to compile when starting an AP GPU task. Maybe Raistmer will offer some more suggestions. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Is the stderr.txt still saying 2048? Usually changes take effect on the next started task. Since you have slot folders for the tasks, they are probably still listed as an Active Task in the client_state.xml. What did you do, just suspend them? They are Still running as far as BOINC is concerned. To restart them you have to Stop BOINC and Remove them from the client_state.xml Active Tasks list. Then when you start BOINC they will be considered a New Task. Remove everything between <active_task_set> and </active_task_set> at the bottom of client_state.xml so it looks like this; </coproc> <dont_throttle/> </app_version> <active_task_set> </active_task_set> <platform_name>x86_64-apple-darwin</platform_name> <core_client_major_version>7</core_client_major_version> If that doesn't solve it, you probably have another corrupt AP file in the setiathome.berkeley.edu folder. Meaning you need to remove every file with AP in the name and copy the AP files from the Download again. The File in the Download has ALWAYS been -sbs 256, meaning you changed it previously to 512 and can't remember it.... |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
I don't know what to suggest next. Maybe reinstall the Nvidia driver package and also the nvidia-opencl-icd-(driver level) package. I know you said the Event Log shows the OpenCL capabilities of the driver already. But maybe something didn't get installed correctly or dependencies set correctly. Same suggestion for the BOINC package. . . There are "wisdom" files there but I cannot say if they are correct and/or valid or not ... Stephen :( |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Is the stderr.txt still saying 2048? Usually changes take effect on the next started task. Since you have slot folders for the tasks, they are probably still listed as an Active Task in the client_state.xml. What did you do, just suspend them? They are Still running as far as BOINC is concerned. To restart them you have to Stop BOINC and Remove them from the client_state.xml Active Tasks list. Then when you start BOINC they will be considered a New Task. . . Yes I suspended all the AP tasks and resumed them when the GPUs were otherwise busy so they would retry when the next task finished. . . From the sound of that I am guessing I should suspend all non running tasks and let the running tasks complete before making that change. Well that is a plan. If I ever edited that file I certainly do not remember it, or ever taking any interest in AP settings at all as I rarely ever see any. On the Linux machines the only changes I remember making are to app_info.xml and that was for -nobs and -pfb 32. Until this problem I was unaware there was a command line text file for APs in this setup. But I will go with -sbs 256 if you feel that is what it should be. The only time I have played with command line files was when I was running SoG. . . One question though, do you still feel this process is necessary if the stderr is showing -sbs 256? ? ? Running on device number: 1 Stephen ? ? |
Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
I was reading the post about getting the AMD drivers to work on older cards under 16.04 and noticed the very similar error message that you are having issues with. Anyone want to comment on the impact that the latest distributions have on locking out virtual syscalls, which has been documented over in the Einstein forums for example? Could this possibly have anything to do with Stephen's problem? Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Your theory on why Stephen is the only one having this problem? He is running 14.04.5, which isn't affected by the latest distributions. He said it worked fine previously. He has already admitted to editing both the cmdline file and the app_info without knowing what he was doing while stating, "older version of AP which does not have the appropriate apps in the folder". This places Operator Malfunction at the Top of the list in a very strong position. I booted into 14.04.5 and I can't get AP_708 to screw up in the benchmark App even with trying very hard. I'm sitting tight until I see the results from my last suggestion, i.e.: If that doesn't solve it, you probably have another corrupt AP file in the setiathome.berkeley.edu folder. Meaning you need to remove every file with AP in the name and copy the AP files from the Download again. That means All the files in the Download folder copied to the setiathome.berkeley.edu folder, including the app_info. If any changes have been made to the version numbers or plan classes since the original app_info was first used, those changes need to be applied to the original app_info file again. It also means restarting the tasks by removing them from the active tasks list. After that, the only thing left is the changes to the OS. |
Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
I agree with your assessment TBar. I thought Stephen had already performed the obvious and suggested. I would have reinstalled by now instead of beating my head against the wall. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
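
Two of the mechanical steps suggested in this thread can be sketched in Python: checking an `-sbs` value against the 25% limit described above, and emptying the `<active_task_set>` block of client_state.xml so that tasks are treated as new on the next start. This is only a rough sketch under stated assumptions. The function names (`max_sbs_mb`, `sbs_from_cmdline`, `clear_active_tasks`) are hypothetical helpers invented for illustration and are not part of BOINC or the AstroPulse app. The one-quarter figure is taken from the posts rather than queried from the driver (it appears to correspond to the per-allocation limit an OpenCL device reports as `CL_DEVICE_MAX_MEM_ALLOC_SIZE`), and the regex assumes a single `<active_task_set>` element, as in the snippet TBar posted.

```python
import re


def max_sbs_mb(card_memory_mb: int) -> int:
    """Largest -sbs value in MB under the 25% rule quoted in the thread.

    Assumption: the limit is exactly one quarter of the card's memory.
    """
    return card_memory_mb // 4


def sbs_from_cmdline(cmdline: str):
    """Return the integer following -sbs in an AP command line, or None."""
    match = re.search(r"-sbs\s+(\d+)", cmdline)
    return int(match.group(1)) if match else None


def clear_active_tasks(client_state_text: str) -> str:
    """Remove everything between <active_task_set> and </active_task_set>.

    The two tags themselves are kept, matching the layout shown in the
    thread. Hypothetical helper: it works on text, not on the file itself.
    """
    pattern = re.compile(
        r"(<active_task_set>).*?(</active_task_set>)", re.DOTALL
    )
    return pattern.sub(r"\1\n\2", client_state_text)


if __name__ == "__main__":
    # Default AP command line quoted in the thread.
    default_line = "-unroll 12 -sbs 256 -ffa_block 3072 -ffa_block_fetch 1536"
    sbs = sbs_from_cmdline(default_line)
    limit = max_sbs_mb(4096)  # a 4 GB card such as the 970
    print(f"-sbs {sbs} within limit of {limit} MB: {sbs <= limit}")
```

As TBar notes, BOINC has to be stopped before client_state.xml is edited; running `clear_active_tasks` on a copy of the file first is probably the safer route.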
©2025 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.