ZOTAC GAMING GeForce GTX 1650
Author | Message |
---|---|
Radjin Joined: 2 May 00 Posts: 105 Credit: 14,928,529 RAC: 102 |
My errors: Stderr output <core_client_version>7.14.2</core_client_version> <![CDATA[ <message> process got signal 11</message> <stderr_txt> SIGSEGV: segmentation violation </stderr_txt> ]]> I added this to /etc/default/grub as suggested but it did not help: GRUB_CMDLINE_LINUX="vsyscall=emulate" Radjin~ |
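A side note on the GRUB change: on Debian, editing GRUB_CMDLINE_LINUX in /etc/default/grub has no effect until the GRUB configuration is regenerated (update-grub) and the machine is rebooted. It is therefore worth confirming that the parameter actually reached the running kernel before concluding it "did not help". The minimal sketch below reads /proc/cmdline; the function name has_kernel_param is hypothetical and not part of any BOINC or Debian tool.

```shell
# has_kernel_param PARAM [CMDLINE_FILE]
# Succeeds (status 0) if PARAM appears as a whole space-separated token in
# CMDLINE_FILE, which defaults to /proc/cmdline (the running kernel's
# boot command line). Hypothetical helper for illustration only.
has_kernel_param() {
    param=$1
    file=${2:-/proc/cmdline}
    # One token per line, then an exact, fixed-string, whole-line match.
    tr ' ' '\n' < "$file" | grep -qxF -- "$param"
}

# Typical Debian sequence (as root), shown here as comments:
#   1. set GRUB_CMDLINE_LINUX="vsyscall=emulate" in /etc/default/grub
#   2. update-grub      # regenerates /boot/grub/grub.cfg
#   3. reboot
#   4. has_kernel_param vsyscall=emulate && echo "vsyscall=emulate is active"
```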
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
SIGSEGV errors are usually caused by unstable CPU clocks or unstable memory clocks. Something is corrupting memory addresses. This is an OS issue and not a BOINC or Seti issue. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Spartana Joined: 24 Apr 16 Posts: 99 Credit: 41,712,387 RAC: 25 |
My errors: Looks like you are running a stock BOINC install with the app "SETI@home v8 8.00 x86_64-pc-linux-gnu" and not TBar's AIO. Was that your intention? I thought you were working towards getting the AIO up and running. |
Gene Joined: 26 Apr 99 Posts: 150 Credit: 48,393,279 RAC: 118 |
@Radjin I am also running a Debian 10 (buster) distribution, but in a "normal" desktop context, with a GUI. I pass along the following thoughts as I follow this thread: (1) I "think" (low confidence) that I had some signal 11 errors a long time ago. One easy thing to check is the permissions for the /BOINC/slots directory. That is where boinc keeps the running status of each task. The "user" who started boinc must have "w" (write) permission for that directory. I just set everything to drwxrwxrwx for ~/BOINC/slots. If the boinc "user" can't write to "slots", it will crash immediately, as boinc can't set up the task controls. (2) I "think" (again, not 100% confidence) that boincmgr is a GUI application, so I would not expect it to respond over an ssh connection to a headless server, which I presume does not have an X server running. And I'm not even sure how one would start an X server on a host that didn't have a graphics card before you plugged in the 1650. (3) Early in this thread you asked how to find any repository-installed files that didn't get removed by an apt remove or apt purge action. I use "dpkg" (the Debian package manager) with the "aptitude" text-mode frontend, but the native "dpkg" is a command-line application that handles lots of package management activities. One useful option is dpkg -L "package-name", which will list ALL the files that were created when "package-name" was installed. It's a bit tedious, but one can check for their existence and manually remove any residual files as necessary. Maybe "apt" has an equivalent option, but I couldn't find it on a quick look at the man page. (4) Regarding the "./" thing to prefix a command: the interpreter (bash?) will search for a given command in the directories listed in $PATH, usually /bin, /sbin, /usr/bin, /usr/sbin, and others, but NOT in the user's current working directory!
There are two solutions: (a) give a full absolute path name, like /home/radjin/BOINC/boinccmd (the leading "/" signals the interpreter that an absolute path follows); or (b) use the "./" prefix, which signals the interpreter that a "relative" path follows, so, assuming you're in the /home/radjin/BOINC directory, you get the right application in the current directory. You may hear, or have already heard, that the Debian distribution is a difficult/complex one to deal with. I have grown up with Debian Linux from its early days, and I guess I've adapted gradually to its style, so I find it more challenging to switch to Ubuntu, or Mint, etc., than to stay with what I (more or less) understand. It does the job for me. |
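Gene's checks (1) and (3) can be scripted from an ssh session. This is a minimal sketch assuming a POSIX shell and the layout of Debian's boinc-client package (data in /var/lib/boinc-client; a manual install may use something like ~/BOINC instead). The helper names check_slots_writable and leftover_files are hypothetical. Because dpkg no longer has a package's file list once it has been purged, the sketch assumes that list was saved beforehand with dpkg -L.

```shell
# Assumed data directory (Debian boinc-client package); override by
# exporting BOINC_DATA, e.g. BOINC_DATA=$HOME/BOINC for a manual install.
BOINC_DATA=${BOINC_DATA:-/var/lib/boinc-client}

# check_slots_writable [DATA_DIR]
# Check (1): can the *current* user write to DATA_DIR/slots?
check_slots_writable() {
    slots=${1:-$BOINC_DATA}/slots
    if [ -d "$slots" ] && [ -w "$slots" ]; then
        echo "writable: $slots"
    else
        echo "NOT writable or missing: $slots"
        return 1
    fi
}

# leftover_files LISTFILE
# Check (3): LISTFILE holds saved `dpkg -L <package>` output captured while
# the package was still installed. Print entries that still exist after
# removal. Directories are skipped because paths such as /etc or /usr/bin
# are shared with other packages.
leftover_files() {
    while IFS= read -r path; do
        case $path in
            ''|/.) continue ;;   # blank lines and dpkg's leading "/." entry
        esac
        if [ -f "$path" ] || [ -L "$path" ]; then
            echo "$path"
        fi
    done < "$1"
}
```

For example, run dpkg -L boinc-client > boinc-client.files before apt purge boinc-client, then leftover_files boinc-client.files afterwards. Note that [ -w ] tests the current user, so to check as the boinc account instead, something like sudo -u boinc test -w /var/lib/boinc-client/slots && echo writable is needed.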
rob smith Joined: 7 Mar 03 Posts: 22231 Credit: 416,307,556 RAC: 380 |
You have two things going on - first the very high error count is stopping you getting any new work:
Second, your CPU work is erroring out. Never mind the fact that you are worrying about not detecting any GPUs on this computer; having a CPU that is dumping every task is not going to help that. Task 8056275741 Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
Radjin Joined: 2 May 00 Posts: 105 Credit: 14,928,529 RAC: 102 |
My errors: AIO, is that another version of an app? If so, I was using http://www.arkayn.us/lunatics/BOINC.7z but got the same errors. Radjin~ |
Radjin Joined: 2 May 00 Posts: 105 Credit: 14,928,529 RAC: 102 |
Re (1): I set the permissions as you suggested; owner and group were boinc. Re (2), boincmgr over ssh: I don't use the GUI at all. Re (3), dpkg -L: I use aptitude, but it may not show what changed in config files, hence the path reverting to another location. Re (4), the "./" prefix: I spent an hour learning about absolute and relative paths in Linux. Thanks for the info. Radjin~ |
Radjin Joined: 2 May 00 Posts: 105 Credit: 14,928,529 RAC: 102 |
Quote: "You have two things going on - first the very high error count is stopping you getting any new work:" I have not concerned myself with the GPU since getting these errors. One thing at a time. Radjin~ |
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
I'm going to ask a few silly questions and make a few observations. You have probably already answered them, but the thread is long to follow, so please forgive me. 1 - You have done so many installations and uninstallations in the last few days that it is highly probable something was left behind, which makes it hard to fix. 2 - Is it possible to reinstall the host from scratch, including the OS? 3 - Instead of Debian, could you run Ubuntu? Or maybe Mint? If you decide to rebuild the host from scratch, please post so you can be guided step by step to get your host working. FYI, I had never run a Linux computer before this one. Configuring it and getting it running from scratch with Ubuntu takes very little time (a couple of hours at most) and almost no headache. OK, a little, but nothing hard to do. Just a suggestion. |
Radjin Joined: 2 May 00 Posts: 105 Credit: 14,928,529 RAC: 102 |
The thought did occur to me, but this is a fairly new install. I have a web server running on it, and migrating that, including the MySQL database, the VPN server, and all the related files and settings, would definitely push my limits. I basically set up a system that was pretty failsafe, with RAID 1 and nightly backups. If I had the skills to back all that up, do a clean install, and then restore the website and VPN, I probably would. I only expected to ever have to redo the system if my RAID had a catastrophic failure. Radjin~ |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
There is a fairly easy test that will determine whether it's boinc or the Science App. Create a new folder, call it test, and place inside it the Science App and a WU renamed to work_unit.sah. Then cd to the folder, run the App from the terminal with ./setiathome_8.00_x86_64-pc-linux-gnu, and see if it still crashes. |
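The standalone test above can be wrapped in a small shell function. This is a sketch under a few assumptions: the app and a work unit are copied (not moved) out of the project folder, which on a Debian package install is typically /var/lib/boinc-client/projects/setiathome.berkeley.edu, so BOINC's own copies are left alone; the function name run_standalone is hypothetical. It also decodes the exit status, because a shell reports a process killed by signal N as status 128+N, so SIGSEGV (signal 11, the "process got signal 11" in the stderr output) shows up as 139.

```shell
# run_standalone APP WU [TESTDIR]
#   APP     path to the science app, e.g. .../setiathome_8.00_x86_64-pc-linux-gnu
#   WU      path to any work unit file from the same project folder
#   TESTDIR scratch folder to create (default: ./test)
run_standalone() {
    app=$1; wu=$2; dir=${3:-test}
    mkdir -p "$dir" || return 1
    cp "$app" "$dir/" || return 1
    # Per the instructions above, the app runs on a file named work_unit.sah.
    cp "$wu" "$dir/work_unit.sah" || return 1
    # Run in a subshell so the caller's working directory is unchanged.
    (cd "$dir" && "./$(basename "$app")")
    status=$?
    # 128 + 11 = 139 means the app was killed by SIGSEGV.
    if [ "$status" -eq 139 ]; then
        echo "crashed with SIGSEGV (signal 11)"
    else
        echo "exit status $status"
    fi
    return "$status"
}
```

If the app still crashes here, with BOINC completely out of the picture, the problem lies with the app or the system rather than with the client.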
Radjin Joined: 2 May 00 Posts: 105 Credit: 14,928,529 RAC: 102 |
This is a fantastic idea, but I need a bit more information. 1. Create a directory called test inside the science app. Not sure what you mean by science app? Inside the boinc-client directory? 2. Copy a work unit to the test directory and rename it "work_unit.sah". Where are these work units stored? 3. cd to the test folder and run ./setiathome_8.00_x86_64-pc-linux-gnu. I got this part. Radjin~ |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13751 Credit: 208,696,464 RAC: 304 |
Quote: "1. Create a directory called test inside the science app. Not sure what you mean by science app? Inside the boinc-client directory?" Nope. Create a new folder, call it test, and place inside (the new folder) the Science App (setiathome_8.00_x86_64-pc-linux-gnu) and a WU (e.g. blc34_2bit_guppi_58643_86349_HIP33332_0131.20734.409.23.46.102.vlar) renamed to work_unit.sah. Quote: "2. Copy a work unit to the test directory and rename it work_unit.sah. Where are these work units stored?" In the Seti project directory. Grant Darwin NT |
Gene Joined: 26 Apr 99 Posts: 150 Credit: 48,393,279 RAC: 118 |
@Radjin Just to set my mind at ease regarding read/write permissions: look at the contents of the /BOINC/slots/ directory (i.e. from the BOINC working directory, cd slots, then ls -al). There should be a few directory entries, like "0", "1", "2", etc. Those get used, and re-used, by boinc in sequential fashion to hold the work unit status. Their contents get deleted when the work unit finishes, but the "numbered" directories seem to be persistent. If they don't exist, then there is certainly a permissions mistake. A trick that Keith M passed along to me long ago is to "disable network activity" in the boinc options; the result is that the /slots/ information is not deleted (since the results can't be uploaded), and one has time to inspect the contents at leisure. Here's what it looks like for me:
drwxrwx--x 2 gene gene 4096 Sep 22 00:07 0
drwxrwx--x 2 gene gene 4096 Sep 22 00:07 1
drwxrwx--x 2 gene gene 4096 Sep 22 00:06 2
drwxrwx--x 2 gene gene 4096 Sep 21 23:51 3
drwxrwx--x 2 gene gene 4096 Sep 21 22:33 4
drwxrwx--x 2 gene gene 4096 Sep 21 23:11 5
drwxrwx--x 2 gene gene 4096 Sep 21 23:41 6
drwxrwx--x 2 gene gene 4096 Sep 22 00:08 7
drwxrwx--x 2 gene gene 4096 Sep 21 09:25 8
drwxrwx--x 2 gene gene 4096 Sep 21 09:46 9
You can tell that I'm "owner" and "group" within that directory and that anybody else does NOT have r/w permissions for those directories. I don't have any experience with headless systems, so I am not sure whether "boinc" is the correct owner and group setting in that context. Are you logging in as user=boinc? I manually start my boinc (client) and boincmgr as user=gene, but on a headless machine maybe those get started some other way, and the owner/group setup I use is not relevant. I saw your "strace" output (in the Q&A forum) in message #2012495 a couple of days ago. I suppose you've figured out by now that you were running an Nvidia graphics Seti (GPU) application, which would fail miserably if there was no graphics card with drivers installed.
It might be worthwhile to try that strace procedure again with a CPU application. Keep at it... Eventually "we" will get it fixed, and there will be a collective slap of the forehead that it was so obvious. |
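Gene's slot check and the no-network trick can also be done entirely from the shell on a headless box. A minimal sketch, assuming the Debian data directory /var/lib/boinc-client (adjust BOINC_DATA if yours differs); list_slots is a hypothetical helper, while boinccmd's --set_network_mode and strace's -f/-o options are standard options of those tools.

```shell
# Assumed data directory (Debian boinc-client package); override if needed.
BOINC_DATA=${BOINC_DATA:-/var/lib/boinc-client}

# list_slots [DATA_DIR]
# Print the numbered slot directories (0, 1, 2, ...) under DATA_DIR/slots.
# Returns 1 with a warning if there are none, which, following Gene's
# reasoning, would point at a permissions problem.
list_slots() {
    slots=${1:-$BOINC_DATA}/slots
    found=0
    for entry in "$slots"/*; do
        name=${entry##*/}
        case $name in
            ''|*[!0-9]*) continue ;;   # not purely numeric (or unmatched glob)
        esac
        [ -d "$entry" ] || continue
        echo "$name"
        found=1
    done
    if [ "$found" -eq 0 ]; then
        echo "no numbered slot directories in $slots" >&2
        return 1
    fi
}

# Suspending network activity from the command line, then inspecting:
#   boinccmd --set_network_mode never
#   list_slots && ls -al "$BOINC_DATA/slots/0"
#   boinccmd --set_network_mode auto      # restore normal behaviour
#
# Tracing a CPU app run (e.g. in the standalone test folder):
#   strace -f -o seti.strace ./setiathome_8.00_x86_64-pc-linux-gnu
#   grep -n SIGSEGV seti.strace
```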
Jord Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
Quote: "how are you accessing the system since it's headless and no monitor, etc?" This has been touched on a couple of times now, but I think you always answer with the Linux host in mind. You ssh into the Linux host from a Mac, right? You can install BOINC for the Mac and use its BOINC Manager (GUI) to control the remote client, if you set up the remote control options given earlier in this thread (by Keith, I think). If you also do everything by terminal command on your Mac, then the above is moot. But it isn't if only the headless Linux box is the one you work on via the command line. |
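For the remote BOINC Manager approach, the Linux client has to accept GUI RPC connections from the Mac. One documented way is to list the Mac's host name or IP address in remote_hosts.cfg in the client's data directory and to use the password stored in gui_rpc_auth.cfg. A minimal sketch, assuming Debian's /var/lib/boinc-client layout; allow_remote_host is a hypothetical helper.

```shell
# Assumed data directory (Debian boinc-client package); override if needed.
BOINC_DATA=${BOINC_DATA:-/var/lib/boinc-client}

# allow_remote_host HOST [DATA_DIR]
# Append HOST (the Mac's name or IP address) to remote_hosts.cfg, the list
# of other machines allowed to connect over GUI RPC. Does nothing if HOST
# is already listed, so it is safe to run more than once.
allow_remote_host() {
    host=$1
    cfg=${2:-$BOINC_DATA}/remote_hosts.cfg
    if [ -f "$cfg" ] && grep -qxF -- "$host" "$cfg"; then
        return 0
    fi
    echo "$host" >> "$cfg"
}

# Afterwards, restart the client so it picks up the change, e.g.
#   sudo systemctl restart boinc-client
# The GUI RPC password is in "$BOINC_DATA/gui_rpc_auth.cfg". From the Mac:
#   boinccmd --host <linux-host> --passwd <password> --get_tasks
```

GUI RPC uses TCP port 31416, so a firewall on the Linux host would also need to allow that port from the Mac.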
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Uh, yes. The Science App is the CPU, GPU, or other App, and it is in the same place as the Work Units: the setiathome.berkeley.edu folder. Just place the CPU App and any WU into an empty folder and rename the WU to work_unit.sah. Then run the CPU App from the Terminal. This bypasses anything to do with boinc and will determine whether the problem still exists. You could do the same thing with the GPU App, IF you had a GPU properly installed in that machine. Does it still crash? |
Radjin Joined: 2 May 00 Posts: 105 Credit: 14,928,529 RAC: 102 |
Out of the blue, it started working. I did nothing other than fly out for a few days. Thanks for all the suggestions and tips. When I have time, I will drop in the GPU that started this thread and see what challenges it offers. Radjin~ |
Radjin Joined: 2 May 00 Posts: 105 Credit: 14,928,529 RAC: 102 |
I attempted once again to install the card, and the system failed to boot; this was after installing a higher-output PSU. I contacted the manufacturer, and with a meter we ran through some tests looking for a known problem. Sure enough, there was a shorted diode, which causes a short to ground. They shipped me another card and sent a prepaid return label, so I should have the replacement soon. Kudos to Zotac and their techs for such great support. Radjin~ |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Was the card defective as shipped? Or did you damage it during testing and installation? Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Radjin Joined: 2 May 00 Posts: 105 Credit: 14,928,529 RAC: 102 |
It was a known defect. Radjin~ |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.