Intel GPU errors
Author | Message |
---|---|
Phil Burden Joined: 26 Oct 00 Posts: 264 Credit: 22,303,899 RAC: 0 |
Have finally managed to get my Intel GPU crunching, but all its wu's are erroring with the following error: "ERROR: Possible wrong computation state on GPU, host needs reboot or maintenance". Any thoughts from the gurus? One of the tasks is 3939457900, but they all give the same error line. P. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14654 Credit: 200,643,578 RAC: 874 |
"Have finally managed to get my Intel GPU crunching, but all its wu's are erroring, with the following error..." That's a very specific error message added by Raistmer (the programmer) - it would perhaps be best to wait until he can visit here and advise. But the combination of Intel(R) HD Graphics 4600 GPU with Driver version 10.18.14.4080 is currently under suspicion. |
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
"Have finally managed to get my Intel GPU crunching, but all its wu's are erroring, with the following error..." Sounds like the validness check code is working correctly; all your tasks overflow: Triplet: peak=8.547997, time=52.55, period=0.3736, d_freq=1419807128.91, chirp=0, fft_len=16; Autocorr: peak=14751.21, time=33.55, delay=4.1637, d_freq=1419287103.39, chirp=-0.17838, fft_len=128k; Autocorr: peak=5897.015, time=60.4, delay=1.6539, d_freq=1420273441.97, chirp=0.073941, fft_len=128k; Triplet: peak=6.946707, time=29.28, period=0.04588, d_freq=1419336395.26, chirp=0, fft_len=64. Check your cooling, Intel GPU drivers, and memory speeds. Claggy |
Mike Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80 |
This error can happen when the hard drive needs defrag or your host needs a reboot. With each crime and every kindness we birth our future. |
Phil Burden Joined: 26 Oct 00 Posts: 264 Credit: 22,303,899 RAC: 0 |
"This error can happen when the hard drive needs defrag or your host needs a reboot." The hard drive is an SSD anyway, and the host doesn't run 24/7 and is powered off every night. But thanks anyway, any and all ideas are welcome ;-) P. |
Phil Burden Joined: 26 Oct 00 Posts: 264 Credit: 22,303,899 RAC: 0 |
"Have finally managed to get my Intel GPU crunching, but all its wu's are erroring, with the following error..." Ok, I'll check those, though from Richard's comment, it seems the driver is suspect, though it's the latest version. P. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14654 Credit: 200,643,578 RAC: 874 |
"Ok, I'll check those, though from Richard's comment, it seems the driver is suspect, though it's the latest version." 'Latest' driver is not necessarily 'most compatible' with an older application. The same problem is affecting the Einstein@Home project, and the project administrator (Bernd Machenschalk) is going to look into it when he gets back to his desk next week, unless some more urgent crisis intervenes. |
Phil Burden Joined: 26 Oct 00 Posts: 264 Credit: 22,303,899 RAC: 0 |
"Ok, I'll check those, though from Richard's comment, it seems the driver is suspect, though it's the latest version." I realise that. One question: I was running 2 wu's on the ATI GPU, and also on the Intel GPU. Is it possible to just run 1 wu on one GPU and 2 on other GPUs? (Thinking that may have some bearing on the issue.) P. |
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
"I realise that. One question: I was running 2 wu's on the ATI GPU, and also on the Intel GPU. Is it possible to just run 1 wu on one GPU and 2 on other GPUs?" You can use the <app_version> portion of an app_config.xml to specify a different config for a different plan class: http://boinc.berkeley.edu/wiki/Client_configuration#Application_configuration Claggy |
Phil Burden Joined: 26 Oct 00 Posts: 264 Credit: 22,303,899 RAC: 0 |
"I realise that. One question: I was running 2 wu's on the ATI GPU, and also on the Intel GPU. Is it possible to just run 1 wu on one GPU and 2 on other GPUs?" Thanks for that Claggy, it'll take me a while to decipher the gobbledygook in that link, but at least it'll give me summat to do ;-) P. |
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
"I realise that. One question: I was running 2 wu's on the ATI GPU, and also on the Intel GPU. Is it possible to just run 1 wu on one GPU and 2 on other GPUs?" Here, try this: <app_config> <app> <name>setiathome_v7</name> <gpu_versions> <gpu_usage>0.5</gpu_usage> <cpu_usage>0.05</cpu_usage> </gpu_versions> </app> <app> <name>astropulse_v7</name> <gpu_versions> <gpu_usage>1.0</gpu_usage> <cpu_usage>0.05</cpu_usage> </gpu_versions> </app> <app_version> <app_name>setiathome_v7</app_name> <plan_class>opencl_intel_gpu_sah</plan_class> <avg_ncpus>0.05</avg_ncpus> <ngpus>1.0</ngpus> </app_version> </app_config> Edit: <avg_ncpus> and <ngpus> values swapped around, thanks Richard. Why would they have those two entries the other way around? The <gpu_versions> section has <gpu_usage> first, while the <app_version> section has <ngpus> second. Claggy |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14654 Credit: 200,643,578 RAC: 874 |
"<app_version> ..." Not sure I like the look of that. 20 tasks on the iGPU, if you have enough CPUs to support them? |
Phil Burden Joined: 26 Oct 00 Posts: 264 Credit: 22,303,899 RAC: 0 |
"I realise that. One question: I was running 2 wu's on the ATI GPU, and also on the Intel GPU. Is it possible to just run 1 wu on one GPU and 2 on other GPUs?" Thanks Claggy, I'll give that a try later this week. P. |
Phil Burden Joined: 26 Oct 00 Posts: 264 Credit: 22,303,899 RAC: 0 |
"I realise that. One question: I was running 2 wu's on the ATI GPU, and also on the Intel GPU. Is it possible to just run 1 wu on one GPU and 2 on other GPUs?" I keep getting "missing </app_version>"?? I've checked it against the original post, even the line spacing is the same. I'm flummoxed ;-) P. |
Phil Burden Joined: 26 Oct 00 Posts: 264 Credit: 22,303,899 RAC: 0 |
Forget that comment, typo on my part, sorry ;-) P. |
Phil Burden Joined: 26 Oct 00 Posts: 264 Credit: 22,303,899 RAC: 0 |
Ok, an update. After installing Claggy's app_config, the 1st wu processed with the same error. I then rolled back the driver from the latest (4080) to a much earlier one (3308), but I think this one may be too early for a Haswell CPU. Anyways, the next wu succeeded, but the 3rd errored. So far, out of 6 wu's, 2 went ok and 4 errored. Another odd thing: on the 2nd wu, I was using GPU-Z to check the temp, which it said was 60 deg C, with a GPU load of over 90%. Currently, the temp is still around 60, but the GPU load is zero. That has me puzzled. I have a later driver (3960) that I'll try tomorrow. P. |
RFGuy_KCCO Joined: 3 Apr 99 Posts: 2 Credit: 52,274,229 RAC: 0 |
"Ok, an update. After installing Claggy's app_config, the 1st wu processed with the same error. I then rolled back the driver from the latest (4080) to a much earlier one (3308), but I think this one may be too early for a Haswell CPU. Anyways, the next wu succeeded, but the 3rd errored. So far, out of 6 wu's, 2 went ok and 4 errored. Another odd thing: on the 2nd wu, I was using GPU-Z to check the temp, which it said was 60 deg C, with a GPU load of over 90%. Currently, the temp is still around 60, but the GPU load is zero. That has me puzzled." If the Intel bug is working here like it does at Einstein, and I am fairly sure it is, then you will find that some WU's will fail and some will pass. Whether the WU passes or fails depends on which Intel driver you and your wingman were running: if you were both running the same newer driver, or any of the newer drivers with this "bug," your WU will probably pass. If you were running one of the newer drivers and your wingman was running one of the older drivers, your WU will almost certainly fail. If you were both running the same older driver, or any of the older drivers without this "bug," your WU will probably pass. It is a very odd issue. I wish the project admins luck in figuring it out. |
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
"Ok, an update. After installing Claggy's app_config, the 1st wu processed with the same error. I then rolled back the driver from the latest (4080) to a much earlier one (3308), but I think this one may be too early for a Haswell CPU. Anyways, the next wu succeeded, but the 3rd errored. So far, out of 6 wu's, 2 went ok and 4 errored. Another odd thing: on the 2nd wu, I was using GPU-Z to check the temp, which it said was 60 deg C, with a GPU load of over 90%. Currently, the temp is still around 60, but the GPU load is zero. That has me puzzled." It was noted that earlier versions of MB7 OpenCL GPU apps had an unfortunate tendency to "pass" validation with false Autocorr overflows. The error Phil is seeing (still with 3960) was inserted to keep those false signals out of the science database. The Autocorr threshold is 17.8; with normal processing, a peak greater than the low 20s is very rare. There's a theoretical maximum just less than 64K. The observed false Autocorrs have had peaks above 100 and cause overflow, so that combination of conditions is declared an error by the OpenCL apps. A single peak above 100 isn't declared an error because a few have been seen in CPU results. Joe |
Phil Burden Joined: 26 Oct 00 Posts: 264 Credit: 22,303,899 RAC: 0 |
Further Update. After having tried 3 drivers (3308, 3960 & 4080), I've given up. Aside from the 2 successful wu's early on, all the rest failed with the same initially reported error. Since I can't see the point of using up resources and getting nowhere, I chucked in the towel...for now ;-) P. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14654 Credit: 200,643,578 RAC: 874 |
Sorry to hear that. Talking of drivers, I'm surprised that nobody in this thread (including me - mea culpa) has linked you back to the previous thread, largely about drivers for Intel GPUs: "Intel gpu not seen by BOINC". My observation, I think in general supported by other users, is that the best and only recommended driver for an HD 4600 is 3621, which can be downloaded from http://downloadmirror.intel.com/23885/a08/win64_153322.zip |
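
Claggy's app_config.xml from earlier in the thread is easier to reason about once the fractions are turned into task counts. Here is a minimal Python sketch, assuming (as Richard's "20 tasks" remark implies) that a task claiming a fraction f of a GPU lets roughly floor(1/f) tasks share that GPU. The helper names `tasks_per_gpu` and `summarize` are made up for illustration, and how BOINC itself combines the `<app>` and `<app_version>` sections is not modelled here.

```python
import math
import xml.etree.ElementTree as ET

# Claggy's corrected app_config.xml, laid out one element per line.
APP_CONFIG = """\
<app_config>
  <app>
    <name>setiathome_v7</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>0.05</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>astropulse_v7</name>
    <gpu_versions>
      <gpu_usage>1.0</gpu_usage>
      <cpu_usage>0.05</cpu_usage>
    </gpu_versions>
  </app>
  <app_version>
    <app_name>setiathome_v7</app_name>
    <plan_class>opencl_intel_gpu_sah</plan_class>
    <avg_ncpus>0.05</avg_ncpus>
    <ngpus>1.0</ngpus>
  </app_version>
</app_config>
"""


def tasks_per_gpu(fraction: float) -> int:
    """Assumed rule: a task using `fraction` of a GPU allows floor(1/fraction) tasks."""
    # The small tolerance guards against float rounding, e.g. 1/0.05.
    return math.floor(1.0 / fraction + 1e-9)


def summarize(xml_text: str) -> dict:
    """Map each <app> and <app_version> entry to its implied tasks per GPU."""
    root = ET.fromstring(xml_text)
    summary = {}
    for app in root.findall("app"):
        usage = app.findtext("gpu_versions/gpu_usage")
        if usage is not None:
            summary[app.findtext("name")] = tasks_per_gpu(float(usage))
    for ver in root.findall("app_version"):
        key = f"{ver.findtext('app_name')}/{ver.findtext('plan_class')}"
        summary[key] = tasks_per_gpu(float(ver.findtext("ngpus")))
    return summary


if __name__ == "__main__":
    for entry, count in summarize(APP_CONFIG).items():
        print(f"{entry}: {count} task(s) per GPU")
```

With the corrected file the three entries come out as 2, 1 and 1 tasks per GPU. If `<ngpus>` were 0.05 instead, `tasks_per_gpu(0.05)` gives the 20 tasks Richard was worried about.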
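
Joe's description of the validity check can be sketched in a few lines of Python. This is not Raistmer's code, only an illustration of the rule as stated in the thread: a result that overflowed and also contains an Autocorr peak above 100 is treated as an error, while a peak above 100 without overflow is not. The function names, the regular expression, and passing the overflow state in as a boolean are all assumptions. The sample lines are the ones Claggy posted.

```python
import re

# Peak level above which an Autocorr is treated as suspect (from Joe's post).
SUSPECT_PEAK = 100.0

# Signal lines quoted by Claggy from one of the failing tasks.
SAMPLE_LOG = """\
Triplet: peak=8.547997, time=52.55, period=0.3736, d_freq=1419807128.91, chirp=0, fft_len=16
Autocorr: peak=14751.21, time=33.55, delay=4.1637, d_freq=1419287103.39, chirp=-0.17838, fft_len=128k
Autocorr: peak=5897.015, time=60.4, delay=1.6539, d_freq=1420273441.97, chirp=0.073941, fft_len=128k
Triplet: peak=6.946707, time=29.28, period=0.04588, d_freq=1419336395.26, chirp=0, fft_len=64
"""

# Hypothetical pattern for the "Autocorr: peak=..." lines shown above.
_AUTOCORR_RE = re.compile(r"Autocorr:\s*peak=([0-9.]+)")


def autocorr_peaks(log_text: str) -> list:
    """Extract every Autocorr peak value found in the log text."""
    return [float(value) for value in _AUTOCORR_RE.findall(log_text)]


def looks_like_bad_gpu_state(log_text: str, overflowed: bool) -> bool:
    """True only when the result overflowed AND has an Autocorr peak above 100."""
    return overflowed and any(p > SUSPECT_PEAK for p in autocorr_peaks(log_text))


if __name__ == "__main__":
    print(autocorr_peaks(SAMPLE_LOG))
    print(looks_like_bad_gpu_state(SAMPLE_LOG, overflowed=True))
```

Requiring both conditions follows Joe's point that a lone peak above 100 has occasionally been seen in CPU results, so it is not enough on its own to reject a task.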
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.