Linux 2.6.7 hang with boinc_4.05
Author | Message |
---|---|
Mark Le Huray Joined: 8 Aug 02 Posts: 3 Credit: 4,521 RAC: 0 |
I have successfully used the pre-4 BOINC clients on this same machine with this kernel version without problems, and I am using this BOINC version on other machines with Linux 2.4 kernels without issue. On the 2.6.7 kernel machine, however, the client runs and processes work as expected, but after a seemingly random interval the machine freezes completely and has to be powered off to recover. (No I/O at all: the keyboard is dead and the Caps Lock light etc. will not come on.) This is also the only AMD machine I have, which may be a factor. I have checked the stderr.txt files etc. and they contain nothing, so I am somewhat at a loss and have had to stop running BOINC on this machine. I have just tried the 4.08 client and it does the same thing. I am running the unstable Debian branch on this machine, and yes, it has been like this for some time now, across multiple upgrades. This happens when running Climate Predict jobs (I am not sure whether it is restricted to them, as I don't have any other v4 client work). Client state info below: 3600 workstation 127.0.0.1 1 AuthenticAMD AMD Athlon(tm) XP 2600+ 1062207333.926679 2371241469.470033 1000000000.000000 0 0 0 1095615253.392590 Linux 2.6.7 529502208.000000 1000000.000000 1003474944.000000 79584657408.000000 33019998208.000000 Not sure what you need, but here is the gcc version: Reading specs from /usr/lib/gcc-lib/i486-linux/3.3.4/specs Configured with: ../src/configure -v --enable-languages=c,c++,java,f77,pascal,objc,ada,treelang --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-gxx-include-dir=/usr/include/c++/3.3 --enable-shared --with-system-zlib --enable-nls --without-included-gettext --enable-__cxa_atexit --enable-clocale=gnu --enable-debug --enable-java-gc=boehm --enable-java-awt=xlib --enable-objc-gc i486-linux Thread model: posix gcc version 3.3.4 (Debian 1:3.3.4-11) Thanks in advance, Mark |
ML1 Joined: 25 Nov 01 Posts: 20283 Credit: 7,508,002 RAC: 20 |
> ...after a seemingly random interval the machine will freeze completely... What anomalies are there in your logs for the previous 24 hours? Any oops, lost interrupts, or other error events in /var/log/messages or elsewhere? Boinc - s@h running is likely just a coincidence. Also, the kernel is up to 2.6.8 now. Check the changelogs for anything specific to your hardware (motherboard, peripherals...), or just try it. Beware spurious hardware errors causing untold confusion. Common problems are overheating and failing hard drives. On 2.6.x, some of the USB operations are still error-prone on some hardware. If Boinc-s@h really is the culprit, then you've really found a kernel bug! Worth reporting. Good luck, Martin |
Mark Le Huray Joined: 8 Aug 02 Posts: 3 Credit: 4,521 RAC: 0 |
Thanks for the pointers, Martin. Nothing abnormal in the logs. I had suspected temperature, but sensorsd reports all temps within tolerances, and SMART also says the disks are OK. It may be memory related, but it only crashes when BOINC is running; admittedly that's the only real time that the machine is under load. I am downloading the 2.6.8 kernel at the moment, so I will see what happens once I have that in place. Thanks, Mark |
abject Joined: 3 Apr 99 Posts: 65 Credit: 857,951 RAC: 0 |
FWIW, I run the 4.09 client on 2 Athlons, a 2600 running 2.6.3 and a 2400 under 2.6.7 (Debian, compiled from source) and haven't seen the problems you describe. Also, be sure you go to 2.6.8.1 at least. There was some gaping security hole in 2.6.8, as I recall. |
Mark Le Huray Joined: 8 Aug 02 Posts: 3 Credit: 4,521 RAC: 0 |
> FWIW, I run the 4.09 client on 2 Athlons, a 2600 running 2.6.3 and a 2400 > under 2.6.7 (Debian, compiled from source) and haven't seen the problems you > describe. > > Also, be sure you go to 2.6.8.1 at least. There was some gaping security hole > in 2.6.8, as I recall. > > Thanks, abject. I have since found out more about this one and am currently running 4.13 without any problems as long as I don't run the Climate Predict jobs. The SETI jobs work correctly, so this problem looks to be related to the job rather than BOINC itself. A number of other people have also reported this problem with Climate Predict and AMD processors. |
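
For anyone following Martin's advice in this thread to look for oops or lost-interrupt events in /var/log/messages, here is a minimal Python sketch of that check. The pattern list and the function names (`find_suspect_lines`, `scan_log`) are assumptions made for illustration; they are not part of BOINC or the kernel, and real kernel messages vary by version and driver, so treat the patterns as a starting point rather than a complete list.

```python
import re

# Illustrative patterns only (an assumption, not an exhaustive list):
# kernel message wording differs between versions and drivers.
SUSPECT_PATTERNS = [
    re.compile(r"\bOops\b", re.IGNORECASE),
    re.compile(r"lost interrupt", re.IGNORECASE),
    re.compile(r"kernel BUG", re.IGNORECASE),
    re.compile(r"machine check", re.IGNORECASE),
]


def find_suspect_lines(lines):
    """Return (line_number, text) pairs for lines matching any pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(pattern.search(line) for pattern in SUSPECT_PATTERNS):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_log(path="/var/log/messages"):
    """Scan a syslog-style text file; undecodable bytes are replaced."""
    with open(path, errors="replace") as handle:
        return find_suspect_lines(handle)
```

Calling `scan_log()` (usually as root, since the log is often not world-readable) returns the matching lines with their line numbers. A hard freeze like the one described may leave nothing in the log at all, so an empty result does not rule out a kernel or hardware fault.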