Intel GPU errors

Phil Burden

Joined: 26 Oct 00
Posts: 264
Credit: 22,303,899
RAC: 0
United Kingdom
Message 1634224 - Posted: 28 Jan 2015, 20:40:33 UTC

Have finally managed to get my Intel GPU crunching, but all its WUs are erroring with the following error:

ERROR: Possible wrong computation state on GPU, host needs reboot or maintenance

Any thoughts from the gurus? One of the tasks is 3939457900, but they all give the same error line.

P.
ID: 1634224 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14654
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1634234 - Posted: 28 Jan 2015, 21:07:19 UTC - in response to Message 1634224.  

That's a very specific error message added by Raistmer (the programmer) - it would perhaps be best to wait until he can visit here and advise.

But the combination of Intel(R) HD Graphics 4600 GPU with Driver version 10.18.14.4080 is currently under suspicion.
ID: 1634234 · Report as offensive
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1634239 - Posted: 28 Jan 2015, 21:15:28 UTC - in response to Message 1634224.  

Sounds like the validity check code is working correctly; all your tasks overflow:

Triplet: peak=8.547997, time=52.55, period=0.3736, d_freq=1419807128.91, chirp=0, fft_len=16
Triplet: peak=8.817341, time=75.98, period=0.1327, d_freq=1419809265.14, chirp=0, fft_len=32
Autocorr: peak=2454.044, time=6.711, delay=2.864, d_freq=1419804686.78, chirp=-0.10721, fft_len=128k
Autocorr: peak=696.5111, time=100.7, delay=4.3372, d_freq=1419804672.15, chirp=-0.1525, fft_len=128k
Autocorr: peak=20499.46, time=100.7, delay=1.4327, d_freq=1419804647.96, chirp=-0.39281, fft_len=128k
Autocorr: peak=45.03565, time=20.13, delay=4.5878, d_freq=1419804701.51, chirp=0.69597, fft_len=128k
Autocorr: peak=62.24554, time=73.82, delay=0.30515, d_freq=1419804616.88, chirp=-0.95661, fft_len=128k
Autocorr: peak=30918.25, time=87.24, delay=0.63416, d_freq=1419804804.82, chirp=1.3448, fft_len=128k
Autocorr: peak=8091.252, time=6.711, delay=3.105, d_freq=1419804676.12, chirp=-1.6951, fft_len=128k
Autocorr: peak=17076.9, time=87.24, delay=3.4655, d_freq=1419804526.31, chirp=-1.8476, fft_len=128k
Autocorr: peak=1723.94, time=73.82, delay=3.1788, d_freq=1419804836.65, chirp=2.0204, fft_len=128k
Autocorr: peak=17255.18, time=87.24, delay=0.76749, d_freq=1419804871.43, chirp=2.1082, fft_len=128k
Autocorr: peak=26448.92, time=73.82, delay=1.5059, d_freq=1419804859.3, chirp=2.3273, fft_len=128k
Autocorr: peak=1268.563, time=46.98, delay=5.2845, d_freq=1419804572.62, chirp=-2.4456, fft_len=128k
Autocorr: peak=4844.813, time=100.7, delay=1.0539, d_freq=1419804428.29, chirp=-2.575, fft_len=128k
Autocorr: peak=4292.945, time=73.82, delay=4.0259, d_freq=1419804877.72, chirp=2.5768, fft_len=128k
Autocorr: peak=11046.95, time=87.24, delay=4.3223, d_freq=1419804451.97, chirp=-2.6998, fft_len=128k
Autocorr: peak=15692.66, time=6.711, delay=4.8716, d_freq=1419804667.48, chirp=-2.9826, fft_len=128k
Autocorr: peak=215.7986, time=60.4, delay=1.9192, d_freq=1419804502.89, chirp=-3.0565, fft_len=128k
Autocorr: peak=68.29662, time=73.82, delay=3.8869, d_freq=1419804461.66, chirp=-3.0593, fft_len=128k
Autocorr: peak=14624.52, time=100.7, delay=6.4161, d_freq=1419804379.08, chirp=-3.0639, fft_len=128k
Autocorr: peak=11016.77, time=60.4, delay=3.5723, d_freq=1419804872.89, chirp=3.0695, fft_len=128k
Autocorr: peak=32322.29, time=46.98, delay=3.7049, d_freq=1419804538.45, chirp=-3.173, fft_len=128k
Autocorr: peak=18.41169, time=33.55, delay=3.1511, d_freq=1419804795.27, chirp=3.2118, fft_len=128k
Autocorr: peak=9691.82, time=73.82, delay=1.5176, d_freq=1419804929.92, chirp=3.2839, fft_len=128k
Autocorr: peak=20102.54, time=87.24, delay=1.5192, d_freq=1419804400.77, chirp=-3.2867, fft_len=128k
Autocorr: peak=3451.05, time=46.98, delay=4.8026, d_freq=1419804842.5, chirp=3.2996, fft_len=128k
Autocorr: peak=19856.27, time=46.98, delay=0.15135, d_freq=1419804848.19, chirp=3.4207, fft_len=128k
Autocorr: peak=11943.83, time=87.24, delay=1.8566, d_freq=1419804988.83, chirp=3.454, fft_len=128k
Autocorr: peak=6399.836, time=20.13, delay=5.3675, d_freq=1419804761.91, chirp=3.6961, fft_len=128k
ERROR: Possible wrong computation state on GPU, host needs reboot or maintenance
GPU device synched


Autocorr: peak=14751.21, time=33.55, delay=4.1637, d_freq=1419287103.39, chirp=-0.17838, fft_len=128k
Autocorr: peak=25.57696, time=100.7, delay=2.4548, d_freq=1419287261.4, chirp=1.5102, fft_len=128k
Autocorr: peak=1343.028, time=6.711, delay=2.6708, d_freq=1419287121.35, chirp=1.7838, fft_len=128k
Autocorr: peak=9897.263, time=60.4, delay=0.20152, d_freq=1419286995.5, chirp=-1.8855, fft_len=128k
Autocorr: peak=103.3339, time=87.24, delay=6.2265, d_freq=1419287279.92, chirp=1.9548, fft_len=128k
Autocorr: peak=57.61849, time=73.82, delay=3.1857, d_freq=1419286962.96, chirp=-1.9835, fft_len=128k
Autocorr: peak=6697.424, time=73.82, delay=1.4043, d_freq=1419286960.64, chirp=-2.0149, fft_len=128k
Autocorr: peak=5510.812, time=6.711, delay=1.9207, d_freq=1419287123.04, chirp=2.0361, fft_len=128k
Autocorr: peak=15392.84, time=6.711, delay=1.0162, d_freq=1419287123.16, chirp=2.0546, fft_len=128k
Autocorr: peak=34574.33, time=20.13, delay=1.4075, d_freq=1419287160.71, chirp=2.55, fft_len=128k
Autocorr: peak=45.70876, time=60.4, delay=6.2855, d_freq=1419286948.99, chirp=-2.6554, fft_len=128k
Autocorr: peak=4250.707, time=87.24, delay=1.567, d_freq=1419287351.68, chirp=2.7774, fft_len=128k
Autocorr: peak=3688.459, time=60.4, delay=6.4526, d_freq=1419286938.72, chirp=-2.8255, fft_len=128k
Autocorr: peak=126.2132, time=60.4, delay=3.266, d_freq=1419287297.89, chirp=3.1212, fft_len=128k
Autocorr: peak=17.8795, time=100.7, delay=5.513, d_freq=1419287431.01, chirp=3.1952, fft_len=128k
Autocorr: peak=17.99612, time=100.7, delay=5.513, d_freq=1419287431.38, chirp=3.1989, fft_len=128k
Autocorr: peak=156.5406, time=33.55, delay=0.65556, d_freq=1419287218.88, chirp=3.2636, fft_len=128k
Autocorr: peak=17035.64, time=87.24, delay=4.7367, d_freq=1419287397.88, chirp=3.307, fft_len=128k
Autocorr: peak=3129.808, time=87.24, delay=5.6233, d_freq=1419287405.3, chirp=3.392, fft_len=128k
Autocorr: peak=29094.74, time=87.24, delay=4.7223, d_freq=1419287405.62, chirp=3.3957, fft_len=128k
Autocorr: peak=31.32333, time=46.98, delay=1.1881, d_freq=1419286946.25, chirp=-3.4724, fft_len=128k
Autocorr: peak=5978.248, time=100.7, delay=4.7816, d_freq=1419286745.87, chirp=-3.6111, fft_len=128k
Autocorr: peak=263.2868, time=73.82, delay=1.8172, d_freq=1419287376.76, chirp=3.6222, fft_len=128k
Autocorr: peak=395.3669, time=100.7, delay=2.9232, d_freq=1419287475.2, chirp=3.6342, fft_len=128k
Autocorr: peak=48.34214, time=46.98, delay=3.6454, d_freq=1419286920.55, chirp=-4.0196, fft_len=128k
Autocorr: peak=501.7761, time=6.711, delay=1.896, d_freq=1419287136.66, chirp=4.0658, fft_len=128k
Autocorr: peak=3858.504, time=73.82, delay=5.2334, d_freq=1419287416.13, chirp=4.1555, fft_len=128k
Autocorr: peak=30.21256, time=100.7, delay=1.219, d_freq=1419286688, chirp=-4.186, fft_len=128k
Autocorr: peak=49.23941, time=100.7, delay=0.9345, d_freq=1419287532.05, chirp=4.1989, fft_len=128k
Autocorr: peak=266.2084, time=33.55, delay=0.18688, d_freq=1419286958.99, chirp=-4.4817, fft_len=128k
ERROR: Possible wrong computation state on GPU, host needs reboot or maintenance
GPU device synched


Autocorr: peak=5897.015, time=60.4, delay=1.6539, d_freq=1420273441.97, chirp=0.073941, fft_len=128k
Autocorr: peak=3252.252, time=46.98, delay=4.7843, d_freq=1420273445.23, chirp=0.16452, fft_len=128k
Autocorr: peak=17993.13, time=60.4, delay=0.9983, d_freq=1420273465.52, chirp=0.46398, fft_len=128k
Autocorr: peak=16798.98, time=33.55, delay=4.4058, d_freq=1420273417.34, chirp=-0.60077, fft_len=128k
Autocorr: peak=98.24163, time=20.13, delay=5.5876, d_freq=1420273459.03, chirp=1.0694, fft_len=128k
Autocorr: peak=12256.99, time=33.55, delay=1.9765, d_freq=1420273397.37, chirp=-1.196, fft_len=128k
Autocorr: peak=1083.238, time=20.13, delay=3.9474, d_freq=1420273411.54, chirp=-1.2893, fft_len=128k
Autocorr: peak=1770.725, time=33.55, delay=1.5944, d_freq=1420273482.31, chirp=1.3356, fft_len=128k
Autocorr: peak=2522.959, time=60.4, delay=5.7018, d_freq=1420273356.28, chirp=-1.3448, fft_len=128k
Autocorr: peak=16738.13, time=73.82, delay=1.8275, d_freq=1420273324.31, chirp=-1.5333, fft_len=128k
Autocorr: peak=41.74892, time=20.13, delay=3.4506, d_freq=1420273470.83, chirp=1.6554, fft_len=128k
Autocorr: peak=195.8669, time=73.82, delay=6.2309, d_freq=1420273284.19, chirp=-2.0768, fft_len=128k
Autocorr: peak=390.3254, time=60.4, delay=0.22118, d_freq=1420273308.88, chirp=-2.1295, fft_len=128k
Autocorr: peak=20503.13, time=33.55, delay=4.3223, d_freq=1420273365.21, chirp=-2.1545, fft_len=128k
Autocorr: peak=22215.29, time=33.55, delay=0.96737, d_freq=1420273529.83, chirp=2.7515, fft_len=128k
Autocorr: peak=226.077, time=6.711, delay=1.5502, d_freq=1420273417.37, chirp=-3.0002, fft_len=128k
Autocorr: peak=886.3749, time=73.82, delay=2.6547, d_freq=1420273214.67, chirp=-3.0186, fft_len=128k
Autocorr: peak=15582.75, time=60.4, delay=6.4504, d_freq=1420273620.21, chirp=3.0251, fft_len=128k
Autocorr: peak=106.1889, time=33.55, delay=1.7649, d_freq=1420273334.69, chirp=-3.0639, fft_len=128k
Autocorr: peak=1560.656, time=33.55, delay=6.5372, d_freq=1420273326.91, chirp=-3.2959, fft_len=128k
Autocorr: peak=8157.41, time=6.711, delay=0.96061, d_freq=1420273459.65, chirp=3.3005, fft_len=128k
Autocorr: peak=27127.73, time=60.4, delay=4.8073, d_freq=1420273654.32, chirp=3.5898, fft_len=128k
Autocorr: peak=107.7386, time=100.7, delay=0.95898, d_freq=1420273064.23, chirp=-3.7081, fft_len=128k
Autocorr: peak=7314.886, time=87.24, delay=2.2364, d_freq=1420273763.1, chirp=3.7322, fft_len=128k
Autocorr: peak=756.7602, time=46.98, delay=3.5802, d_freq=1420273261.87, chirp=-3.7386, fft_len=128k
Autocorr: peak=517.3344, time=20.13, delay=1.9796, d_freq=1420273352.29, chirp=-4.2322, fft_len=128k
Autocorr: peak=21035.37, time=100.7, delay=1.8093, d_freq=1420272975.47, chirp=-4.5899, fft_len=128k
Autocorr: peak=15109.07, time=87.24, delay=4.3752, d_freq=1420273009.33, chirp=-4.9078, fft_len=128k
Autocorr: peak=23918.48, time=6.711, delay=4.3372, d_freq=1420273470.86, chirp=4.9716, fft_len=128k
Autocorr: peak=12715.65, time=33.55, delay=0.33618, d_freq=1420273256.6, chirp=-5.3912, fft_len=128k
ERROR: Possible wrong computation state on GPU, host needs reboot or maintenance
GPU device synched


Triplet: peak=6.946707, time=29.28, period=0.04588, d_freq=1419336395.26, chirp=0, fft_len=64
Autocorr: peak=3243.452, time=33.55, delay=1.4358, d_freq=1419335916.19, chirp=-0.63497, fft_len=128k
Autocorr: peak=168.9512, time=46.98, delay=3.6067, d_freq=1419335978.66, chirp=0.8762, fft_len=128k
Autocorr: peak=15095.81, time=73.82, delay=6.0169, d_freq=1419336012.76, chirp=1.0195, fft_len=128k
Autocorr: peak=958.197, time=87.24, delay=6.4478, d_freq=1419335830.18, chirp=-1.2302, fft_len=128k
Autocorr: peak=21070.54, time=46.98, delay=0.22651, d_freq=1419336004.97, chirp=1.4363, fft_len=128k
Autocorr: peak=35051.41, time=87.24, delay=4.9076, d_freq=1419335804.94, chirp=-1.5195, fft_len=128k
Autocorr: peak=601.4368, time=100.7, delay=4.4932, d_freq=1419336126.46, chirp=1.8772, fft_len=128k
Autocorr: peak=432.6196, time=20.13, delay=0.6658, d_freq=1419335976.43, chirp=1.9336, fft_len=128k
Autocorr: peak=121.9143, time=73.82, delay=5.6659, d_freq=1419336085.42, chirp=2.0038, fft_len=128k
Autocorr: peak=97.1031, time=6.711, delay=4.2734, d_freq=1419335953.16, chirp=2.3328, fft_len=128k
Autocorr: peak=9652.735, time=100.7, delay=3.8919, d_freq=1419336175.68, chirp=2.3661, fft_len=128k
Autocorr: peak=5053.325, time=46.98, delay=6.041, d_freq=1419336064.28, chirp=2.6988, fft_len=128k
Autocorr: peak=7405.501, time=20.13, delay=2.3535, d_freq=1419335880.5, chirp=-2.831, fft_len=128k
Autocorr: peak=58.63932, time=100.7, delay=4.0046, d_freq=1419336224.62, chirp=2.8523, fft_len=128k
Autocorr: peak=225.3221, time=100.7, delay=5.0924, d_freq=1419336240.34, chirp=3.0085, fft_len=128k
Autocorr: peak=58.33301, time=60.4, delay=6.3096, d_freq=1419336122.72, chirp=3.0667, fft_len=128k
Autocorr: peak=85.48724, time=73.82, delay=5.3185, d_freq=1419335656.19, chirp=-3.8107, fft_len=128k
Autocorr: peak=11729.14, time=46.98, delay=3.0573, d_freq=1419335747.07, chirp=-4.0538, fft_len=128k
Autocorr: peak=12947.7, time=73.82, delay=1.8304, d_freq=1419336242.62, chirp=4.1333, fft_len=128k
Autocorr: peak=23785.53, time=33.55, delay=2.6709, d_freq=1419335793.54, chirp=-4.2904, fft_len=128k
Autocorr: peak=2226.989, time=20.13, delay=2.9282, d_freq=1419336025.57, chirp=4.3745, fft_len=128k
Autocorr: peak=16238.21, time=46.98, delay=6.4537, d_freq=1419335731.31, chirp=-4.3893, fft_len=128k
Autocorr: peak=22859.85, time=33.55, delay=2.3812, d_freq=1419336090.18, chirp=4.5501, fft_len=128k
Autocorr: peak=121.3323, time=46.98, delay=5.7202, d_freq=1419336160.63, chirp=4.7498, fft_len=128k
Autocorr: peak=13714.45, time=100.7, delay=4.809, d_freq=1419335450.16, chirp=-4.8413, fft_len=128k
Autocorr: peak=70.46099, time=33.55, delay=0.98703, d_freq=1419336103.92, chirp=4.9596, fft_len=128k
Autocorr: peak=1242.217, time=6.711, delay=5.7036, d_freq=1419335904, chirp=-4.9919, fft_len=128k
Autocorr: peak=654.6003, time=46.98, delay=0.98478, d_freq=1419335697.96, chirp=-5.0991, fft_len=128k
Autocorr: peak=28929.93, time=73.82, delay=5.2184, d_freq=1419335553.24, chirp=-5.2054, fft_len=128k
ERROR: Possible wrong computation state on GPU, host needs reboot or maintenance
GPU device synched


Check your cooling, Intel GPU drivers, and memory speeds.

Claggy
ID: 1634239 · Report as offensive
Mike Special Project $75 donor
Volunteer tester

Joined: 17 Feb 01
Posts: 34258
Credit: 79,922,639
RAC: 80
Germany
Message 1634311 - Posted: 28 Jan 2015, 23:21:04 UTC
Last modified: 28 Jan 2015, 23:21:46 UTC

This error can happen when the hard drive needs defrag or your host needs a reboot.


With each crime and every kindness we birth our future.
ID: 1634311 · Report as offensive
Phil Burden

Joined: 26 Oct 00
Posts: 264
Credit: 22,303,899
RAC: 0
United Kingdom
Message 1634465 - Posted: 29 Jan 2015, 8:33:21 UTC - in response to Message 1634311.  

The hard drive is an SSD anyway, and the host doesn't run 24/7 and is powered off every night. But thanks anyway, any and all ideas are welcome ;-)

P.
ID: 1634465 · Report as offensive
Phil Burden

Joined: 26 Oct 00
Posts: 264
Credit: 22,303,899
RAC: 0
United Kingdom
Message 1634466 - Posted: 29 Jan 2015, 8:35:28 UTC - in response to Message 1634239.  

OK, I'll check those, though from Richard's comment it seems the driver is suspect, even though it's the latest version.

P.
ID: 1634466 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14654
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1634474 - Posted: 29 Jan 2015, 9:10:31 UTC - in response to Message 1634466.  

'Latest' driver is not necessarily 'most compatible' with an older application.

The same problem is affecting the Einstein@Home project, and the project administrator (Bernd Machenschalk) is going to look into it when he gets back to his desk next week, unless some more urgent crisis intervenes.
ID: 1634474 · Report as offensive
Phil Burden

Joined: 26 Oct 00
Posts: 264
Credit: 22,303,899
RAC: 0
United Kingdom
Message 1634518 - Posted: 29 Jan 2015, 12:18:55 UTC - in response to Message 1634474.  

I realise that. One question: I was running 2 WUs on the ATI GPU, and also on the Intel GPU. Is it possible to run just 1 WU on one GPU and 2 on the other GPUs?
(Thinking that may have some bearing on the issue.)

P.
ID: 1634518 · Report as offensive
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1634521 - Posted: 29 Jan 2015, 12:38:07 UTC - in response to Message 1634518.  

You can use the <app_version> portion of an app_config.xml to specify a different config for a different planclass:

http://boinc.berkeley.edu/wiki/Client_configuration#Application_configuration

Claggy
ID: 1634521 · Report as offensive
Phil Burden

Joined: 26 Oct 00
Posts: 264
Credit: 22,303,899
RAC: 0
United Kingdom
Message 1634524 - Posted: 29 Jan 2015, 12:54:15 UTC - in response to Message 1634521.  

Thanks for that, Claggy. It'll take me a while to decipher the gobbledygook in that link, but at least it'll give me summat to do ;-)

P.
ID: 1634524 · Report as offensive
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1634696 - Posted: 29 Jan 2015, 22:07:50 UTC - in response to Message 1634524.  
Last modified: 29 Jan 2015, 22:46:58 UTC

Here, try this:

<app_config>
  <app>
    <name>setiathome_v7</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>0.05</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>astropulse_v7</name>
    <gpu_versions>
      <gpu_usage>1.0</gpu_usage>
      <cpu_usage>0.05</cpu_usage>
    </gpu_versions>
  </app>
  <app_version>
    <app_name>setiathome_v7</app_name>
    <plan_class>opencl_intel_gpu_sah</plan_class>
    <avg_ncpus>0.05</avg_ncpus>
    <ngpus>1.0</ngpus>
  </app_version>
</app_config>


Edit: <avg_ncpus> and <ngpus> values swapped around, thanks Richard. Why would they have those two entries the other way around? The <gpu_versions> section has <gpu_usage> first, while the <app_version> section has <ngpus> second.


Claggy
ID: 1634696 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14654
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1634706 - Posted: 29 Jan 2015, 22:23:42 UTC - in response to Message 1634696.  

<app_version>
<app_name>setiathome_v7</app_name>
<plan_class>opencl_intel_gpu_sah</plan_class>
<avg_ncpus>1.0</avg_ncpus>
<ngpus>0.05</ngpus>
</app_version>

Not sure I like the look of that. 20 tasks on the iGPU, if you have enough CPUs to support them?
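
The arithmetic behind that figure of 20 can be sketched in a few lines of Python. This is only an illustration, assuming the simple reading that <ngpus> is the fraction of one GPU each task claims and <avg_ncpus> is the CPU share budgeted per task. The function name is made up for the example, and the real BOINC client scheduling is more involved than this.

```python
import xml.etree.ElementTree as ET

# The <app_version> block as first posted, with the two values the wrong way round.
SWAPPED = """
<app_version>
  <app_name>setiathome_v7</app_name>
  <plan_class>opencl_intel_gpu_sah</plan_class>
  <avg_ncpus>1.0</avg_ncpus>
  <ngpus>0.05</ngpus>
</app_version>
"""

# The same block after the edit.
CORRECTED = SWAPPED.replace("<avg_ncpus>1.0", "<avg_ncpus>0.05").replace(
    "<ngpus>0.05", "<ngpus>1.0"
)


def concurrency(app_version_xml):
    """Return (tasks per GPU, CPU cores budgeted for those tasks).

    Hypothetical helper: assumes each task claims `ngpus` of one GPU,
    so 1 / ngpus tasks fit on a single device.
    """
    node = ET.fromstring(app_version_xml)
    ngpus = float(node.findtext("ngpus"))
    avg_ncpus = float(node.findtext("avg_ncpus"))
    tasks = int(1 / ngpus + 1e-9)  # small epsilon guards against float rounding
    return tasks, tasks * avg_ncpus


print(concurrency(SWAPPED))    # swapped values: many tasks, one full core each
print(concurrency(CORRECTED))  # corrected values: one task, a sliver of a core
```

Under that assumption the swapped values ask for 20 tasks on the iGPU with a full CPU core each, while the corrected values ask for a single task.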
ID: 1634706 · Report as offensive
Phil Burden

Joined: 26 Oct 00
Posts: 264
Credit: 22,303,899
RAC: 0
United Kingdom
Message 1635613 - Posted: 31 Jan 2015, 9:53:56 UTC - in response to Message 1634696.  

Thanks Claggy, I'll give that a try later this week.

P.
ID: 1635613 · Report as offensive
Phil Burden

Joined: 26 Oct 00
Posts: 264
Credit: 22,303,899
RAC: 0
United Kingdom
Message 1636064 - Posted: 1 Feb 2015, 12:58:21 UTC - in response to Message 1634696.  

I keep getting "missing </app_version>" ??
I've checked it against the original post, even the line spacing is the same. I'm flummoxed ;-)

P.
ID: 1636064 · Report as offensive
Phil Burden

Joined: 26 Oct 00
Posts: 264
Credit: 22,303,899
RAC: 0
United Kingdom
Message 1636065 - Posted: 1 Feb 2015, 13:01:31 UTC - in response to Message 1636064.  



Forget that comment, it was a typo on my part, sorry ;-)

P.
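
A "missing </app_version>" message of this kind usually points to a file that is not well-formed XML. One way to catch that before restarting the client is to run the file through any XML parser. Below is a minimal sketch in Python. The helper name and the misspelt closing tag in the example are hypothetical (the thread doesn't say what the actual typo was), and the check covers only well-formedness and the root element, not whether BOINC accepts every setting.

```python
import xml.etree.ElementTree as ET


def check_app_config(text):
    """Return None if `text` parses as XML with an <app_config> root,
    otherwise a short description of the problem."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError as err:
        return f"not well-formed: {err}"
    if root.tag != "app_config":
        return f"unexpected root element <{root.tag}>"
    return None


good = (
    "<app_config><app_version>"
    "<app_name>setiathome_v7</app_name>"
    "</app_version></app_config>"
)
# Hypothetical typo: the closing tag is misspelt.
bad = good.replace("</app_version>", "</app_versoin>")

print(check_app_config(good))
print(check_app_config(bad))
```

To check a real file, read it first, for example `check_app_config(open("app_config.xml").read())`, run from the project directory.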
ID: 1636065 · Report as offensive
Phil Burden

Joined: 26 Oct 00
Posts: 264
Credit: 22,303,899
RAC: 0
United Kingdom
Message 1636260 - Posted: 1 Feb 2015, 21:27:41 UTC

OK, an update. After installing Claggy's app_config, the first WU processed with the same error. I then rolled back the driver from the latest (4080) to a much earlier one (3308), but I think this one may be too early for a Haswell CPU. Anyway, the next WU succeeded, but the third errored. So far, out of 6 WUs, 2 went OK and 4 errored. Another odd thing: on the second WU, I was using GPU-Z to check the temperature, which it said was 60 deg C, with a GPU load of over 90%. Currently the temperature is still around 60, but the GPU load is zero. That has me puzzled.
I have a later driver (3960) that I'll try tomorrow.

P.
ID: 1636260 · Report as offensive
RFGuy_KCCO Project Donor
Volunteer tester

Joined: 3 Apr 99
Posts: 2
Credit: 52,274,229
RAC: 0
United States
Message 1636873 - Posted: 3 Feb 2015, 4:40:11 UTC - in response to Message 1636260.  
Last modified: 3 Feb 2015, 4:47:51 UTC

If the Intel bug is working here like it does at Einstein, and I am fairly sure it is, then you will find that some WU's will fail and some will pass. Whether the WU passes or fails depends on which Intel driver you and your wingman were running: if you were both running the same newer driver, or any of the newer drivers with this "bug," your WU will probably pass. If you were running one of the newer drivers and your wingman was running one of the older drivers, your WU will almost certainly fail. If you were both running the same older driver, or any of the older drivers without this "bug," your WU will probably pass. It is a very odd issue. I wish the project admins luck in figuring it out.
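
The pattern described above can be written down as a tiny decision function. This is purely a restatement of the observation in this post, not anything taken from the project's validator. The function name is invented, and treating the "older driver here, newer driver at the wingman" case as a failure too is an assumption made by symmetry.

```python
def likely_outcome(my_driver_is_newer, wingman_driver_is_newer):
    """Guess the validation outcome from the two hosts' driver generations.

    Hypothetical helper: "newer" means a driver with the suspected bug,
    "older" means one without it.
    """
    if my_driver_is_newer == wingman_driver_is_newer:
        # Both hosts compute the same way, so their results tend to agree.
        return "probably pass"
    # Mixed generations: the two results are unlikely to match.
    return "almost certainly fail"


for mine in (True, False):
    for wingman in (True, False):
        print(mine, wingman, likely_outcome(mine, wingman))
```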
ID: 1636873 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1636906 - Posted: 3 Feb 2015, 7:52:01 UTC - in response to Message 1636873.  

It was noted that earlier versions of MB7 OpenCL GPU apps had an unfortunate tendency to "pass" validation with false Autocorr overflows. The error Phil is seeing (still with 3960) was inserted to keep those false signals out of the science database.

The Autocorr threshold is 17.8; with normal processing, a peak greater than the low 20s is very rare. There's a theoretical maximum just less than 64K. The observed false Autocorrs have had peaks above 100 and cause overflow, so that combination of conditions is declared an error by the OpenCL apps. A single peak above 100 isn't declared an error because a few have been seen in CPU results.
Joe
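
The rule described here can be approximated from a task's stderr output. The Python sketch below is not the application's actual code. It assumes that "overflow" means 30 reported signals (the count seen in each log excerpt earlier in this thread), that "more than one Autocorr peak above 100" is a fair reading of "a single peak isn't declared an error", and that only the Triplet and Autocorr line formats shown in the thread need counting. The function name and sample data are invented.

```python
import re

# Matches the Autocorr lines as they appear in the stderr excerpts.
AUTOCORR = re.compile(r"^Autocorr: peak=([0-9.]+),")


def looks_like_false_overflow(stderr_lines, overflow_count=30, peak_limit=100.0):
    """Return True when a result both overflows and contains several
    implausibly large Autocorr peaks (assumed thresholds, see above)."""
    signals = [ln for ln in stderr_lines if ln.startswith(("Triplet:", "Autocorr:"))]
    high_peaks = 0
    for line in signals:
        match = AUTOCORR.match(line)
        if match and float(match.group(1)) > peak_limit:
            high_peaks += 1
    overflowed = len(signals) >= overflow_count
    return overflowed and high_peaks > 1


TEMPLATE = "Autocorr: peak=%.1f, time=6.711, delay=1.0, d_freq=1419804686.78, chirp=0, fft_len=128k"

# 30 signals, two of them far above 100: the suspicious combination.
overflowing = [TEMPLATE % p for p in [2454.0, 696.5] + [50.0] * 28]
# A lone high peak without overflow is tolerated.
single_peak = [TEMPLATE % 120.0]

print(looks_like_false_overflow(overflowing))
print(looks_like_false_overflow(single_peak))
```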
ID: 1636906 · Report as offensive
Phil Burden

Joined: 26 Oct 00
Posts: 264
Credit: 22,303,899
RAC: 0
United Kingdom
Message 1636921 - Posted: 3 Feb 2015, 9:27:51 UTC

Further Update. After having tried 3 drivers (3308, 3960 & 4080), I've given up. Aside from the 2 successful wu's early on, all the rest failed with the same initially reported error. Since I can't see the point of using up resources and getting nowhere, I chucked in the towel...for now ;-)

P.
ID: 1636921 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14654
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1636926 - Posted: 3 Feb 2015, 10:00:42 UTC - in response to Message 1636921.  

Sorry to hear that. Talking of drivers, I'm surprised that nobody in this thread (including me - mea culpa) has linked you back to the previous thread, largely about drivers for Intel GPUs: Intel gpu not seen by BOINC

My observation, I think in general supported by other users, is that the best and only recommended driver for an HD 4600 is 3621, which can be downloaded from http://downloadmirror.intel.com/23885/a08/win64_153322.zip
ID: 1636926 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.