Linux 2.6.7 hang with boinc_4.05

Mark Le Huray

Joined: 8 Aug 02
Posts: 3
Credit: 4,521
RAC: 0
United Kingdom
Message 28093 - Posted: 19 Sep 2004, 19:07:31 UTC

I have successfully used the pre-4 boinc clients on this same machine with this kernel version without problems. I am also running this boinc version on other machines with Linux 2.4 kernels without issue.

However, on the 2.6.7 kernel machine the client runs and processes work as expected, but after a seemingly random interval the machine freezes completely and has to be powered off to recover. (No I/O at all: the keyboard is dead, and the Caps Lock light etc. will not come on.)

This is also the only AMD machine I have, which may also be a factor.

I have checked the stderr.txt files etc. and they contain nothing, so I am somewhat at a loss and have had to stop running boinc on this machine.

I have just tried the 4.08 client and it does the same thing. I am running this machine on the unstable Debian branch, and yes, it has been like this for some time now, across multiple upgrades.

This is when running Climate Predict jobs (not sure if it is restricted to them, as I don't have any other v4 client work).

Client state info below:



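<timezone>3600</timezone>
<domain_name>workstation</domain_name>
<ip_addr>127.0.0.1</ip_addr>
<p_ncpus>1</p_ncpus>
<p_vendor>AuthenticAMD</p_vendor>
<p_model>AMD Athlon(tm) XP 2600+</p_model>
<p_fpops>1062207333.926679</p_fpops>
<p_iops>2371241469.470033</p_iops>
<p_membw>1000000000.000000</p_membw>
<p_fpop_err>0</p_fpop_err>
<p_iop_err>0</p_iop_err>
<p_membw_err>0</p_membw_err>
<p_calculated>1095615253.392590</p_calculated>
<os_name>Linux</os_name>
<os_version>2.6.7</os_version>
<m_nbytes>529502208.000000</m_nbytes>
<m_cache>1000000.000000</m_cache>
<m_swap>1003474944.000000</m_swap>
<d_total>79584657408.000000</d_total>
<d_free>33019998208.000000</d_free>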


Not sure what you need but gcc version...

Reading specs from /usr/lib/gcc-lib/i486-linux/3.3.4/specs
Configured with: ../src/configure -v --enable-languages=c,c++,java,f77,pascal,objc,ada,treelang --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-gxx-include-dir=/usr/include/c++/3.3 --enable-shared --with-system-zlib --enable-nls --without-included-gettext --enable-__cxa_atexit --enable-clocale=gnu --enable-debug --enable-java-gc=boehm --enable-java-awt=xlib --enable-objc-gc i486-linux
Thread model: posix
gcc version 3.3.4 (Debian 1:3.3.4-11)

Thanks in advance

Mark



ID: 28093
ML1
Volunteer moderator
Volunteer tester

Joined: 25 Nov 01
Posts: 20283
Credit: 7,508,002
RAC: 20
United Kingdom
Message 28222 - Posted: 20 Sep 2004, 1:16:27 UTC

> ...after a seemingly random interval the machine will freeze completely...


What anomalies are there in your logs for the previous 24 hours?

Any oops, lost interrupts, or other error events in /var/log/messages or elsewhere?
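If the logs are long, a small script can do the first pass. The following is only a rough sketch: the function name `scan_log` and the pattern list are illustrative assumptions, not anything Boinc or the kernel provides, and the exact wording of kernel messages varies by version and driver, so adjust the patterns to suit.

```python
import re
from typing import Iterable, List, Tuple

# Illustrative patterns only (an assumption): real kernel message wording
# differs between kernel versions and drivers.
SUSPICIOUS = re.compile(
    r"oops|lost interrupt|machine check|kernel bug|i/o error",
    re.IGNORECASE,
)


def scan_log(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, text) for every line matching a suspicious pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if SUSPICIOUS.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_file(path: str = "/var/log/messages") -> List[Tuple[int, str]]:
    """Scan a log file on disk; undecodable bytes are replaced, not fatal."""
    with open(path, errors="replace") as log:
        return scan_log(log)
```

Calling `scan_file()` as root (or as a user allowed to read the log) returns the matching lines with their line numbers, which makes it easier to see what was logged just before a freeze. An empty result does not prove the hardware is healthy, since a hard lockup often leaves nothing in the log at all.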

That Boinc/s@h happens to be running is likely just a coincidence.

Also, the kernel is up at 2.6.8 now. Check the change logs for anything specific to your hardware (motherboard, peripherals...), or just try it.

Beware spurious hardware errors causing untold confusion. Common problems are overheating and failing hard drives. On 2.6.x, some USB operations are still error-prone on some hardware.


If Boinc-s@h really is the culprit, then you've really found a kernel bug! Worth reporting.

Good luck,
Martin
ID: 28222
Mark Le Huray

Joined: 8 Aug 02
Posts: 3
Credit: 4,521
RAC: 0
United Kingdom
Message 28428 - Posted: 20 Sep 2004, 19:23:58 UTC - in response to Message 28222.  

Thanks for the pointers Martin,

Nothing abnormal in the logs. I had suspected temperature, but sensorsd reports all temps within tolerances, and SMART also says the disks are OK. It may be memory related, but it only crashes when Boinc is running. Admittedly, that's the only real time that the machine is under load.

Am downloading the 2.6.8 kernel at the moment so will see what happens once I have that in place.

Thanks

Mark
ID: 28428
abject

Joined: 3 Apr 99
Posts: 65
Credit: 857,951
RAC: 0
United States
Message 29819 - Posted: 24 Sep 2004, 23:26:36 UTC - in response to Message 28428.  
Last modified: 24 Sep 2004, 23:29:32 UTC

FWIW, I run the 4.09 client on 2 Athlons, a 2600 running 2.6.3 and a 2400 under 2.6.7 (Debian, compiled from source) and haven't seen the problems you describe.

Also, be sure you go to 2.6.8.1 at least. There was some gaping security hole in 2.6.8, as I recall.
ID: 29819
Mark Le Huray

Joined: 8 Aug 02
Posts: 3
Credit: 4,521
RAC: 0
United Kingdom
Message 53137 - Posted: 12 Dec 2004, 13:02:25 UTC - in response to Message 29819.  

> FWIW, I run the 4.09 client on 2 Athlons, a 2600 running 2.6.3 and a 2400
> under 2.6.7 (Debian, compiled from source) and haven't seen the problems you
> describe.
>
> Also, be sure you go to 2.6.8.1 at least. There was some gaping security hole
> in 2.6.8, as I recall.
Thanks abject. I have since found out more about this one: I am currently running 4.13 without any problems as long as I don't run the Climate Predict jobs. The Seti jobs work correctly, so this problem looks to be related to the job rather than Boinc itself.

A number of other people have also reported this problem with Climate Predict and AMD processors.
ID: 53137
