Intel GPU errors

Author	Message
Phil Burden Send message Joined: 26 Oct 00 Posts: 264 Credit: 22,303,899 RAC: 0	Message 1634224 - Posted: 28 Jan 2015, 20:40:33 UTC Have finally managed to get my Intel GPU crunching, but all its wu's are erroring, with the following error: ERROR: Possible wrong computation state on GPU, host needs reboot or maintenance Any thoughts from the gurus? One of the tasks is 3939457900, but they all give the same error line. P. ID: 1634224 ·

Richard Haselgrove Volunteer tester Send message Joined: 4 Jul 99 Posts: 14654 Credit: 200,643,578 RAC: 874	Message 1634234 - Posted: 28 Jan 2015, 21:07:19 UTC - in response to Message 1634224. Have finally managed to get my Intel GPU crunching, but all its wu's are erroring, with the following error: ERROR: Possible wrong computation state on GPU, host needs reboot or maintenance Any thoughts from the gurus? One of the tasks is 3939457900, but they all give the same error line. P. That's a very specific error message added by Raistmer (the programmer) - it would perhaps be best to wait until he can visit here and advise. But the combination of Intel(R) HD Graphics 4600 GPU with Driver version 10.18.14.4080 is currently under suspicion. ID: 1634234 ·

Claggy Volunteer tester Send message Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4	Message 1634239 - Posted: 28 Jan 2015, 21:15:28 UTC - in response to Message 1634224. Have finally managed to get my \|Intel \|GPU crunching, but all its wu's are erroring, with the following error ERROR: Possible wrong computation state on GPU, host needs reboot or maintenance Sounds like the validness check code is working correctly, all your tasks overflow: Triplet: peak=8.547997, time=52.55, period=0.3736, d_freq=1419807128.91, chirp=0, fft_len=16 Triplet: peak=8.817341, time=75.98, period=0.1327, d_freq=1419809265.14, chirp=0, fft_len=32 Autocorr: peak=2454.044, time=6.711, delay=2.864, d_freq=1419804686.78, chirp=-0.10721, fft_len=128k Autocorr: peak=696.5111, time=100.7, delay=4.3372, d_freq=1419804672.15, chirp=-0.1525, fft_len=128k Autocorr: peak=20499.46, time=100.7, delay=1.4327, d_freq=1419804647.96, chirp=-0.39281, fft_len=128k Autocorr: peak=45.03565, time=20.13, delay=4.5878, d_freq=1419804701.51, chirp=0.69597, fft_len=128k Autocorr: peak=62.24554, time=73.82, delay=0.30515, d_freq=1419804616.88, chirp=-0.95661, fft_len=128k Autocorr: peak=30918.25, time=87.24, delay=0.63416, d_freq=1419804804.82, chirp=1.3448, fft_len=128k Autocorr: peak=8091.252, time=6.711, delay=3.105, d_freq=1419804676.12, chirp=-1.6951, fft_len=128k Autocorr: peak=17076.9, time=87.24, delay=3.4655, d_freq=1419804526.31, chirp=-1.8476, fft_len=128k Autocorr: peak=1723.94, time=73.82, delay=3.1788, d_freq=1419804836.65, chirp=2.0204, fft_len=128k Autocorr: peak=17255.18, time=87.24, delay=0.76749, d_freq=1419804871.43, chirp=2.1082, fft_len=128k Autocorr: peak=26448.92, time=73.82, delay=1.5059, d_freq=1419804859.3, chirp=2.3273, fft_len=128k Autocorr: peak=1268.563, time=46.98, delay=5.2845, d_freq=1419804572.62, chirp=-2.4456, fft_len=128k Autocorr: peak=4844.813, time=100.7, delay=1.0539, d_freq=1419804428.29, chirp=-2.575, fft_len=128k Autocorr: peak=4292.945, time=73.82, 
delay=4.0259, d_freq=1419804877.72, chirp=2.5768, fft_len=128k Autocorr: peak=11046.95, time=87.24, delay=4.3223, d_freq=1419804451.97, chirp=-2.6998, fft_len=128k Autocorr: peak=15692.66, time=6.711, delay=4.8716, d_freq=1419804667.48, chirp=-2.9826, fft_len=128k Autocorr: peak=215.7986, time=60.4, delay=1.9192, d_freq=1419804502.89, chirp=-3.0565, fft_len=128k Autocorr: peak=68.29662, time=73.82, delay=3.8869, d_freq=1419804461.66, chirp=-3.0593, fft_len=128k Autocorr: peak=14624.52, time=100.7, delay=6.4161, d_freq=1419804379.08, chirp=-3.0639, fft_len=128k Autocorr: peak=11016.77, time=60.4, delay=3.5723, d_freq=1419804872.89, chirp=3.0695, fft_len=128k Autocorr: peak=32322.29, time=46.98, delay=3.7049, d_freq=1419804538.45, chirp=-3.173, fft_len=128k Autocorr: peak=18.41169, time=33.55, delay=3.1511, d_freq=1419804795.27, chirp=3.2118, fft_len=128k Autocorr: peak=9691.82, time=73.82, delay=1.5176, d_freq=1419804929.92, chirp=3.2839, fft_len=128k Autocorr: peak=20102.54, time=87.24, delay=1.5192, d_freq=1419804400.77, chirp=-3.2867, fft_len=128k Autocorr: peak=3451.05, time=46.98, delay=4.8026, d_freq=1419804842.5, chirp=3.2996, fft_len=128k Autocorr: peak=19856.27, time=46.98, delay=0.15135, d_freq=1419804848.19, chirp=3.4207, fft_len=128k Autocorr: peak=11943.83, time=87.24, delay=1.8566, d_freq=1419804988.83, chirp=3.454, fft_len=128k Autocorr: peak=6399.836, time=20.13, delay=5.3675, d_freq=1419804761.91, chirp=3.6961, fft_len=128k ERROR: Possible wrong computation state on GPU, host needs reboot or maintenance GPU device synched Autocorr: peak=14751.21, time=33.55, delay=4.1637, d_freq=1419287103.39, chirp=-0.17838, fft_len=128k Autocorr: peak=25.57696, time=100.7, delay=2.4548, d_freq=1419287261.4, chirp=1.5102, fft_len=128k Autocorr: peak=1343.028, time=6.711, delay=2.6708, d_freq=1419287121.35, chirp=1.7838, fft_len=128k Autocorr: peak=9897.263, time=60.4, delay=0.20152, d_freq=1419286995.5, chirp=-1.8855, fft_len=128k Autocorr: peak=103.3339, 
time=87.24, delay=6.2265, d_freq=1419287279.92, chirp=1.9548, fft_len=128k Autocorr: peak=57.61849, time=73.82, delay=3.1857, d_freq=1419286962.96, chirp=-1.9835, fft_len=128k Autocorr: peak=6697.424, time=73.82, delay=1.4043, d_freq=1419286960.64, chirp=-2.0149, fft_len=128k Autocorr: peak=5510.812, time=6.711, delay=1.9207, d_freq=1419287123.04, chirp=2.0361, fft_len=128k Autocorr: peak=15392.84, time=6.711, delay=1.0162, d_freq=1419287123.16, chirp=2.0546, fft_len=128k Autocorr: peak=34574.33, time=20.13, delay=1.4075, d_freq=1419287160.71, chirp=2.55, fft_len=128k Autocorr: peak=45.70876, time=60.4, delay=6.2855, d_freq=1419286948.99, chirp=-2.6554, fft_len=128k Autocorr: peak=4250.707, time=87.24, delay=1.567, d_freq=1419287351.68, chirp=2.7774, fft_len=128k Autocorr: peak=3688.459, time=60.4, delay=6.4526, d_freq=1419286938.72, chirp=-2.8255, fft_len=128k Autocorr: peak=126.2132, time=60.4, delay=3.266, d_freq=1419287297.89, chirp=3.1212, fft_len=128k Autocorr: peak=17.8795, time=100.7, delay=5.513, d_freq=1419287431.01, chirp=3.1952, fft_len=128k Autocorr: peak=17.99612, time=100.7, delay=5.513, d_freq=1419287431.38, chirp=3.1989, fft_len=128k Autocorr: peak=156.5406, time=33.55, delay=0.65556, d_freq=1419287218.88, chirp=3.2636, fft_len=128k Autocorr: peak=17035.64, time=87.24, delay=4.7367, d_freq=1419287397.88, chirp=3.307, fft_len=128k Autocorr: peak=3129.808, time=87.24, delay=5.6233, d_freq=1419287405.3, chirp=3.392, fft_len=128k Autocorr: peak=29094.74, time=87.24, delay=4.7223, d_freq=1419287405.62, chirp=3.3957, fft_len=128k Autocorr: peak=31.32333, time=46.98, delay=1.1881, d_freq=1419286946.25, chirp=-3.4724, fft_len=128k Autocorr: peak=5978.248, time=100.7, delay=4.7816, d_freq=1419286745.87, chirp=-3.6111, fft_len=128k Autocorr: peak=263.2868, time=73.82, delay=1.8172, d_freq=1419287376.76, chirp=3.6222, fft_len=128k Autocorr: peak=395.3669, time=100.7, delay=2.9232, d_freq=1419287475.2, chirp=3.6342, fft_len=128k Autocorr: peak=48.34214, 
time=46.98, delay=3.6454, d_freq=1419286920.55, chirp=-4.0196, fft_len=128k Autocorr: peak=501.7761, time=6.711, delay=1.896, d_freq=1419287136.66, chirp=4.0658, fft_len=128k Autocorr: peak=3858.504, time=73.82, delay=5.2334, d_freq=1419287416.13, chirp=4.1555, fft_len=128k Autocorr: peak=30.21256, time=100.7, delay=1.219, d_freq=1419286688, chirp=-4.186, fft_len=128k Autocorr: peak=49.23941, time=100.7, delay=0.9345, d_freq=1419287532.05, chirp=4.1989, fft_len=128k Autocorr: peak=266.2084, time=33.55, delay=0.18688, d_freq=1419286958.99, chirp=-4.4817, fft_len=128k ERROR: Possible wrong computation state on GPU, host needs reboot or maintenance GPU device synched Autocorr: peak=5897.015, time=60.4, delay=1.6539, d_freq=1420273441.97, chirp=0.073941, fft_len=128k Autocorr: peak=3252.252, time=46.98, delay=4.7843, d_freq=1420273445.23, chirp=0.16452, fft_len=128k Autocorr: peak=17993.13, time=60.4, delay=0.9983, d_freq=1420273465.52, chirp=0.46398, fft_len=128k Autocorr: peak=16798.98, time=33.55, delay=4.4058, d_freq=1420273417.34, chirp=-0.60077, fft_len=128k Autocorr: peak=98.24163, time=20.13, delay=5.5876, d_freq=1420273459.03, chirp=1.0694, fft_len=128k Autocorr: peak=12256.99, time=33.55, delay=1.9765, d_freq=1420273397.37, chirp=-1.196, fft_len=128k Autocorr: peak=1083.238, time=20.13, delay=3.9474, d_freq=1420273411.54, chirp=-1.2893, fft_len=128k Autocorr: peak=1770.725, time=33.55, delay=1.5944, d_freq=1420273482.31, chirp=1.3356, fft_len=128k Autocorr: peak=2522.959, time=60.4, delay=5.7018, d_freq=1420273356.28, chirp=-1.3448, fft_len=128k Autocorr: peak=16738.13, time=73.82, delay=1.8275, d_freq=1420273324.31, chirp=-1.5333, fft_len=128k Autocorr: peak=41.74892, time=20.13, delay=3.4506, d_freq=1420273470.83, chirp=1.6554, fft_len=128k Autocorr: peak=195.8669, time=73.82, delay=6.2309, d_freq=1420273284.19, chirp=-2.0768, fft_len=128k Autocorr: peak=390.3254, time=60.4, delay=0.22118, d_freq=1420273308.88, chirp=-2.1295, fft_len=128k Autocorr: 
peak=20503.13, time=33.55, delay=4.3223, d_freq=1420273365.21, chirp=-2.1545, fft_len=128k Autocorr: peak=22215.29, time=33.55, delay=0.96737, d_freq=1420273529.83, chirp=2.7515, fft_len=128k Autocorr: peak=226.077, time=6.711, delay=1.5502, d_freq=1420273417.37, chirp=-3.0002, fft_len=128k Autocorr: peak=886.3749, time=73.82, delay=2.6547, d_freq=1420273214.67, chirp=-3.0186, fft_len=128k Autocorr: peak=15582.75, time=60.4, delay=6.4504, d_freq=1420273620.21, chirp=3.0251, fft_len=128k Autocorr: peak=106.1889, time=33.55, delay=1.7649, d_freq=1420273334.69, chirp=-3.0639, fft_len=128k Autocorr: peak=1560.656, time=33.55, delay=6.5372, d_freq=1420273326.91, chirp=-3.2959, fft_len=128k Autocorr: peak=8157.41, time=6.711, delay=0.96061, d_freq=1420273459.65, chirp=3.3005, fft_len=128k Autocorr: peak=27127.73, time=60.4, delay=4.8073, d_freq=1420273654.32, chirp=3.5898, fft_len=128k Autocorr: peak=107.7386, time=100.7, delay=0.95898, d_freq=1420273064.23, chirp=-3.7081, fft_len=128k Autocorr: peak=7314.886, time=87.24, delay=2.2364, d_freq=1420273763.1, chirp=3.7322, fft_len=128k Autocorr: peak=756.7602, time=46.98, delay=3.5802, d_freq=1420273261.87, chirp=-3.7386, fft_len=128k Autocorr: peak=517.3344, time=20.13, delay=1.9796, d_freq=1420273352.29, chirp=-4.2322, fft_len=128k Autocorr: peak=21035.37, time=100.7, delay=1.8093, d_freq=1420272975.47, chirp=-4.5899, fft_len=128k Autocorr: peak=15109.07, time=87.24, delay=4.3752, d_freq=1420273009.33, chirp=-4.9078, fft_len=128k Autocorr: peak=23918.48, time=6.711, delay=4.3372, d_freq=1420273470.86, chirp=4.9716, fft_len=128k Autocorr: peak=12715.65, time=33.55, delay=0.33618, d_freq=1420273256.6, chirp=-5.3912, fft_len=128k ERROR: Possible wrong computation state on GPU, host needs reboot or maintenance GPU device synched Triplet: peak=6.946707, time=29.28, period=0.04588, d_freq=1419336395.26, chirp=0, fft_len=64 Autocorr: peak=3243.452, time=33.55, delay=1.4358, d_freq=1419335916.19, chirp=-0.63497, fft_len=128k 
Autocorr: peak=168.9512, time=46.98, delay=3.6067, d_freq=1419335978.66, chirp=0.8762, fft_len=128k Autocorr: peak=15095.81, time=73.82, delay=6.0169, d_freq=1419336012.76, chirp=1.0195, fft_len=128k Autocorr: peak=958.197, time=87.24, delay=6.4478, d_freq=1419335830.18, chirp=-1.2302, fft_len=128k Autocorr: peak=21070.54, time=46.98, delay=0.22651, d_freq=1419336004.97, chirp=1.4363, fft_len=128k Autocorr: peak=35051.41, time=87.24, delay=4.9076, d_freq=1419335804.94, chirp=-1.5195, fft_len=128k Autocorr: peak=601.4368, time=100.7, delay=4.4932, d_freq=1419336126.46, chirp=1.8772, fft_len=128k Autocorr: peak=432.6196, time=20.13, delay=0.6658, d_freq=1419335976.43, chirp=1.9336, fft_len=128k Autocorr: peak=121.9143, time=73.82, delay=5.6659, d_freq=1419336085.42, chirp=2.0038, fft_len=128k Autocorr: peak=97.1031, time=6.711, delay=4.2734, d_freq=1419335953.16, chirp=2.3328, fft_len=128k Autocorr: peak=9652.735, time=100.7, delay=3.8919, d_freq=1419336175.68, chirp=2.3661, fft_len=128k Autocorr: peak=5053.325, time=46.98, delay=6.041, d_freq=1419336064.28, chirp=2.6988, fft_len=128k Autocorr: peak=7405.501, time=20.13, delay=2.3535, d_freq=1419335880.5, chirp=-2.831, fft_len=128k Autocorr: peak=58.63932, time=100.7, delay=4.0046, d_freq=1419336224.62, chirp=2.8523, fft_len=128k Autocorr: peak=225.3221, time=100.7, delay=5.0924, d_freq=1419336240.34, chirp=3.0085, fft_len=128k Autocorr: peak=58.33301, time=60.4, delay=6.3096, d_freq=1419336122.72, chirp=3.0667, fft_len=128k Autocorr: peak=85.48724, time=73.82, delay=5.3185, d_freq=1419335656.19, chirp=-3.8107, fft_len=128k Autocorr: peak=11729.14, time=46.98, delay=3.0573, d_freq=1419335747.07, chirp=-4.0538, fft_len=128k Autocorr: peak=12947.7, time=73.82, delay=1.8304, d_freq=1419336242.62, chirp=4.1333, fft_len=128k Autocorr: peak=23785.53, time=33.55, delay=2.6709, d_freq=1419335793.54, chirp=-4.2904, fft_len=128k Autocorr: peak=2226.989, time=20.13, delay=2.9282, d_freq=1419336025.57, chirp=4.3745, fft_len=128k 
Autocorr: peak=16238.21, time=46.98, delay=6.4537, d_freq=1419335731.31, chirp=-4.3893, fft_len=128k Autocorr: peak=22859.85, time=33.55, delay=2.3812, d_freq=1419336090.18, chirp=4.5501, fft_len=128k Autocorr: peak=121.3323, time=46.98, delay=5.7202, d_freq=1419336160.63, chirp=4.7498, fft_len=128k Autocorr: peak=13714.45, time=100.7, delay=4.809, d_freq=1419335450.16, chirp=-4.8413, fft_len=128k Autocorr: peak=70.46099, time=33.55, delay=0.98703, d_freq=1419336103.92, chirp=4.9596, fft_len=128k Autocorr: peak=1242.217, time=6.711, delay=5.7036, d_freq=1419335904, chirp=-4.9919, fft_len=128k Autocorr: peak=654.6003, time=46.98, delay=0.98478, d_freq=1419335697.96, chirp=-5.0991, fft_len=128k Autocorr: peak=28929.93, time=73.82, delay=5.2184, d_freq=1419335553.24, chirp=-5.2054, fft_len=128k ERROR: Possible wrong computation state on GPU, host needs reboot or maintenance GPU device synched Check your cooling, Intel GPU drivers, and Memory speeds. Claggy ID: 1634239 ·

Mike Volunteer tester Send message Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80	Message 1634311 - Posted: 28 Jan 2015, 23:21:04 UTC Last modified: 28 Jan 2015, 23:21:46 UTC This error can happen when the hard drive needs defrag or your host needs a reboot. With each crime and every kindness we birth our future. ID: 1634311 ·

Phil Burden Send message Joined: 26 Oct 00 Posts: 264 Credit: 22,303,899 RAC: 0	Message 1634465 - Posted: 29 Jan 2015, 8:33:21 UTC - in response to Message 1634311. This error can happen when the hard drive needs defrag or your host needs a reboot. The hard drive is an SSD anyway, and the host doesn't run 24/7 and is powered off every night. But thanks anyway, any and all ideas are welcome ;-) P. ID: 1634465 ·

Phil Burden Send message Joined: 26 Oct 00 Posts: 264 Credit: 22,303,899 RAC: 0	Message 1634466 - Posted: 29 Jan 2015, 8:35:28 UTC - in response to Message 1634239. Have finally managed to get my Intel GPU crunching, but all its wu's are erroring, with the following error: ERROR: Possible wrong computation state on GPU, host needs reboot or maintenance Sounds like the validness check code is working correctly, all your tasks overflow: Check your cooling, Intel GPU drivers, and Memory speeds. Claggy Ok, I'll check those, though from Richard's comment it seems the driver is suspect, even though it's the latest version. P. ID: 1634466 ·

Richard Haselgrove Volunteer tester Send message Joined: 4 Jul 99 Posts: 14654 Credit: 200,643,578 RAC: 874	Message 1634474 - Posted: 29 Jan 2015, 9:10:31 UTC - in response to Message 1634466. Ok, I'll check those, though from Richard's comment it seems the driver is suspect, even though it's the latest version. P. 'Latest' driver is not necessarily 'most compatible' with an older application. The same problem is affecting the Einstein@Home project, and the project administrator (Bernd Machenschalk) is going to look into it when he gets back to his desk next week, unless some more urgent crisis intervenes. ID: 1634474 ·

Phil Burden Send message Joined: 26 Oct 00 Posts: 264 Credit: 22,303,899 RAC: 0	Message 1634518 - Posted: 29 Jan 2015, 12:18:55 UTC - in response to Message 1634474. Ok, I'll check those, though from Richard's comment it seems the driver is suspect, even though it's the latest version. P. 'Latest' driver is not necessarily 'most compatible' with an older application. The same problem is affecting the Einstein@Home project, and the project administrator (Bernd Machenschalk) is going to look into it when he gets back to his desk next week, unless some more urgent crisis intervenes. I realise that. One question, I was running 2 wu's on the ATO GPU, and also on the Intel GPU, is it possible to just run 1 wu on one GPU and 2 on the other GPUs? (thinking that may have some bearing on the issue) P. ID: 1634518 ·

Claggy Volunteer tester Send message Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4	Message 1634521 - Posted: 29 Jan 2015, 12:38:07 UTC - in response to Message 1634518. I realise that. One question, I was running 2 wu's on the ATO GPU, and also on the Intel GPU, is it possible to just run 1 wu on one gpu and 2 on other gpu's? ( thinking that may have some bearing on the issue) You can use the <app_version> portion of an app_config.xml to specify a different config for a different planclass: http://boinc.berkeley.edu/wiki/Client_configuration#Application_configuration Claggy ID: 1634521 ·

Phil Burden Send message Joined: 26 Oct 00 Posts: 264 Credit: 22,303,899 RAC: 0	Message 1634524 - Posted: 29 Jan 2015, 12:54:15 UTC - in response to Message 1634521. I realise that. One question, I was running 2 wu's on the ATO GPU, and also on the Intel GPU, is it possible to just run 1 wu on one gpu and 2 on other gpu's? ( thinking that may have some bearing on the issue) You can use the <app_version> portion of an app_config.xml to specify a different config for a different planclass: http://boinc.berkeley.edu/wiki/Client_configuration#Application_configuration Claggy Thanks for that Claggy, it'll take me a while to decipher the gobbeldy gook in that link, but at least it'll give me summat to do ;-) P. ID: 1634524 ·

Claggy Volunteer tester Send message Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4	Message 1634696 - Posted: 29 Jan 2015, 22:07:50 UTC - in response to Message 1634524. Last modified: 29 Jan 2015, 22:46:58 UTC I realise that. One question, I was running 2 wu's on the ATO GPU, and also on the Intel GPU, is it possible to just run 1 wu on one gpu and 2 on other gpu's? ( thinking that may have some bearing on the issue) You can use the <app_version> portion of an app_config.xml to specify a different config for a different planclass: http://boinc.berkeley.edu/wiki/Client_configuration#Application_configuration Claggy Thanks for that Claggy, it'll take me a while to decipher the gobbeldy gook in that link, but at least it'll give me summat to do ;-) P. Here, try this:
<app_config>
  <app>
    <name>setiathome_v7</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>0.05</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>astropulse_v7</name>
    <gpu_versions>
      <gpu_usage>1.0</gpu_usage>
      <cpu_usage>0.05</cpu_usage>
    </gpu_versions>
  </app>
  <app_version>
    <app_name>setiathome_v7</app_name>
    <plan_class>opencl_intel_gpu_sah</plan_class>
    <avg_ncpus>0.05</avg_ncpus>
    <ngpus>1.0</ngpus>
  </app_version>
</app_config>
Edit: <avg_ncpus> and <ngpus> values swapped around, thanks Richard. Why would they have those two entries the other way around? <gpu_versions> has the <gpu_usage> first, <app_version> section has <ngpus> second. Claggy ID: 1634696 ·

Richard Haselgrove Volunteer tester Send message Joined: 4 Jul 99 Posts: 14654 Credit: 200,643,578 RAC: 874	Message 1634706 - Posted: 29 Jan 2015, 22:23:42 UTC - in response to Message 1634696. <app_version> <app_name>setiathome_v7</app_name> <plan_class>opencl_intel_gpu_sah</plan_class> <avg_ncpus>1.0</avg_ncpus> <ngpus>0.05</ngpus> </app_version> Not sure I like the look of that. 20 tasks on the iGPU, if you have enough CPUs to support them? ID: 1634706 ·
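Richard's "20 tasks" figure follows from simple arithmetic: a task that reserves 0.05 of a GPU leaves room for 1 / 0.05 = 20 such tasks. The sketch below illustrates that arithmetic only. The function names are hypothetical, and the CPU bound is a simplifying assumption; the real BOINC scheduler considers more than these two numbers.

```python
import math
from fractions import Fraction


def tasks_per_gpu(ngpus: float) -> int:
    """Tasks that can share one GPU if each reserves `ngpus` of it."""
    if ngpus <= 0:
        raise ValueError("ngpus must be positive")
    # Fractions avoid floating-point surprises with values such as 0.05.
    share = Fraction(ngpus).limit_denominator(1000)
    return math.floor(1 / share)


def concurrent_tasks(ngpus: float, avg_ncpus: float, cpu_cores: int) -> int:
    """Rough upper bound: limited by GPU share and by CPU reservation.

    Assumption for illustration: each task also needs `avg_ncpus` cores.
    """
    if avg_ncpus <= 0:
        raise ValueError("avg_ncpus must be positive")
    cpu_share = Fraction(avg_ncpus).limit_denominator(1000)
    cpu_slots = math.floor(cpu_cores / cpu_share)
    return min(tasks_per_gpu(ngpus), cpu_slots)


if __name__ == "__main__":
    # Values as first posted (swapped): 20 GPU slots, capped by CPU cores.
    print(concurrent_tasks(ngpus=0.05, avg_ncpus=1.0, cpu_cores=8))
    # Values after the edit: one task on the Intel GPU.
    print(concurrent_tasks(ngpus=1.0, avg_ncpus=0.05, cpu_cores=8))
```

With the corrected values (ngpus 1.0, avg_ncpus 0.05) the same arithmetic gives a single task on the Intel GPU, which is what Phil asked for.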

Phil Burden Send message Joined: 26 Oct 00 Posts: 264 Credit: 22,303,899 RAC: 0	Message 1635613 - Posted: 31 Jan 2015, 9:53:56 UTC - in response to Message 1634696. I realise that. One question, I was running 2 wu's on the ATO GPU, and also on the Intel GPU, is it possible to just run 1 wu on one gpu and 2 on other gpu's? ( thinking that may have some bearing on the issue) You can use the <app_version> portion of an app_config.xml to specify a different config for a different planclass: http://boinc.berkeley.edu/wiki/Client_configuration#Application_configuration Claggy Thanks for that Claggy, it'll take me a while to decipher the gobbeldy gook in that link, but at least it'll give me summat to do ;-) P. Here, try this: <app_config> <app> <name>setiathome_v7</name> <gpu_versions> <gpu_usage>0.5</gpu_usage> <cpu_usage>0.05</cpu_usage> </gpu_versions> </app> <app> <name>astropulse_v7</name> <gpu_versions> <gpu_usage>1.0</gpu_usage> <cpu_usage>0.05</cpu_usage> </gpu_versions> </app> <app_version> <app_name>setiathome_v7</app_name> <plan_class>opencl_intel_gpu_sah</plan_class> <avg_ncpus>0.05</avg_ncpus> <ngpus>1.0</ngpus> </app_version> </app_config> Edit: <avg_ncpus> and <ngpus> values swapped around, Thanks Richard, Why would they have those two entries the other way around? <gpu_versions> has the <gpu_usage> first, <app_version> section has <ngpus> second. Claggy Thanks Claggy, I'll give that a try later this week. P. ID: 1635613 ·

Phil Burden Send message Joined: 26 Oct 00 Posts: 264 Credit: 22,303,899 RAC: 0	Message 1636064 - Posted: 1 Feb 2015, 12:58:21 UTC - in response to Message 1634696. I realise that. One question, I was running 2 wu's on the ATO GPU, and also on the Intel GPU, is it possible to just run 1 wu on one gpu and 2 on other gpu's? ( thinking that may have some bearing on the issue) You can use the <app_version> portion of an app_config.xml to specify a different config for a different planclass: http://boinc.berkeley.edu/wiki/Client_configuration#Application_configuration Claggy Thanks for that Claggy, it'll take me a while to decipher the gobbeldy gook in that link, but at least it'll give me summat to do ;-) P. Here, try this: <app_config> <app> <name>setiathome_v7</name> <gpu_versions> <gpu_usage>0.5</gpu_usage> <cpu_usage>0.05</cpu_usage> </gpu_versions> </app> <app> <name>astropulse_v7</name> <gpu_versions> <gpu_usage>1.0</gpu_usage> <cpu_usage>0.05</cpu_usage> </gpu_versions> </app> <app_version> <app_name>setiathome_v7</app_name> <plan_class>opencl_intel_gpu_sah</plan_class> <avg_ncpus>0.05</avg_ncpus> <ngpus>1.0</ngpus> </app_version> </app_config> Edit: <avg_ncpus> and <ngpus> values swapped around, Thanks Richard, Why would they have those two entries the other way around? <gpu_versions> has the <gpu_usage> first, <app_version> section has <ngpus> second. Claggy I keep getting "missing </app_version>" ?? I've checked it against the original post, even the line spacings is the same, I'm flummoxed ;-) P. ID: 1636064 ·

Phil Burden Send message Joined: 26 Oct 00 Posts: 264 Credit: 22,303,899 RAC: 0	Message 1636065 - Posted: 1 Feb 2015, 13:01:31 UTC - in response to Message 1636064. I keep getting "missing </app_version>" ?? I've checked it against the original post, even the line spacings is the same, I'm flummoxed ;-) P. forget that comment, typo on my part, sorry ;-) P. ID: 1636065 ·
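A mistyped closing tag like the one Phil describes can be caught before restarting the client by running the file through any XML parser. The sketch below uses Python's standard xml.etree.ElementTree; the helper name check_well_formed and the file path in the usage note are hypothetical, and BOINC's own parser may accept or reject slightly different input than a strict XML parser does.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text: str) -> tuple[bool, str]:
    """Return (True, "ok") if xml_text parses, else (False, parser message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # The message includes the line and column where parsing failed.
        return False, str(err)
    return True, "ok"


if __name__ == "__main__":
    # Deliberately broken: the closing tag is misspelled.
    sample = (
        "<app_config><app_version>"
        "<app_name>setiathome_v7</app_name>"
        "</app_verson></app_config>"
    )
    ok, message = check_well_formed(sample)
    print(ok, message)
```

To check a real file, read it first, for example `check_well_formed(open("app_config.xml").read())`, with the path adjusted to the project directory.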

Phil Burden Send message Joined: 26 Oct 00 Posts: 264 Credit: 22,303,899 RAC: 0	Message 1636260 - Posted: 1 Feb 2015, 21:27:41 UTC ok, an update. After installing Claggy's app_config, the 1st wu processed with the same error. I then rolled back the driver from the latest (4080) to a much earlier one (3308), but I think this one may be too early for a Haswell CPU. Anyways, the next wu succeeded, but the 3rd errored. So far, out of 6 wu's, 2 went ok and 4 errored. Another odd thing: on the 2nd wu, I was using GPU-Z to check the temp, which it said was 60 deg C, with a GPU load of over 90%. Currently, the temp is still around 60, but the GPU load is zero. That has me puzzled. I have a later driver (3960) that I'll try tomorrow. P. ID: 1636260 ·

RFGuy_KCCO Volunteer tester Send message Joined: 3 Apr 99 Posts: 2 Credit: 52,274,229 RAC: 0	Message 1636873 - Posted: 3 Feb 2015, 4:40:11 UTC - in response to Message 1636260. Last modified: 3 Feb 2015, 4:47:51 UTC ok, an update. After installing Claggy's app_config, the 1st wu processed with the same error. I then rolled back the driver from the latest (4080) to a much earlier one (3308), but I think this one may be too early for a Haswell CPU. Anyways, the next wu succeeded, but the 3rd errored. So far, out of 6 wu's, 2 went ok and 4 errored. Another odd thing: on the 2nd wu, I was using GPU-Z to check the temp, which it said was 60 deg C, with a GPU load of over 90%. Currently, the temp is still around 60, but the GPU load is zero. That has me puzzled. I have a later driver (3960) that I'll try tomorrow. P. If the Intel bug is working here like it does at Einstein, and I am fairly sure it is, then you will find that some WU's will fail and some will pass. Whether the WU passes or fails depends on which Intel driver you and your wingman were running: if you were both running the same newer driver, or any of the newer drivers with this "bug," your WU will probably pass. If you were running one of the newer drivers and your wingman was running one of the older drivers, your WU will almost certainly fail. If you were both running the same older driver, or any of the older drivers without this "bug," your WU will probably pass. It is a very odd issue. I wish the project admins luck in figuring it out. ID: 1636873 ·

Josef W. Segur Volunteer developer Volunteer tester Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0	Message 1636906 - Posted: 3 Feb 2015, 7:52:01 UTC - in response to Message 1636873. ok, an update. After installing Claggy's app_config, the 1st wu processed with the same error. I then rolled back the driver from the latest (4080) to a much earlier one (3308), but I think this one may be too early for a Haswell CPU. Anyways, the next wu succeeded, but the 3rd errored. So far, out of 6 wu's, 2 went ok and 4 errored. Another odd thing: on the 2nd wu, I was using GPU-Z to check the temp, which it said was 60 deg C, with a GPU load of over 90%. Currently, the temp is still around 60, but the GPU load is zero. That has me puzzled. I have a later driver (3960) that I'll try tomorrow. P. If the Intel bug is working here like it does at Einstein, and I am fairly sure it is, then you will find that some WU's will fail and some will pass. Whether the WU passes or fails depends on which Intel driver you and your wingman were running: if you were both running the same newer driver, or any of the newer drivers with this "bug," your WU will probably pass. If you were running one of the newer drivers and your wingman was running one of the older drivers, your WU will almost certainly fail. If you were both running the same older driver, or any of the older drivers without this "bug," your WU will probably pass. It is a very odd issue. I wish the project admins luck in figuring it out. It was noted that earlier versions of MB7 OpenCL GPU apps had an unfortunate tendency to "pass" validation with false Autocorr overflows. The error Phil is seeing (still with 3960) was inserted to keep those false signals out of the science database. The Autocorr threshold is 17.8; with normal processing, a peak greater than the low 20s is very rare. There's a theoretical maximum just less than 64K. 
The observed false Autocorrs have had peaks above 100 and cause overflow, so that combination of conditions is declared an error by the OpenCL apps. A single peak above 100 isn't declared an error because a few have been seen in CPU results. Joe ID: 1636906 ·
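Joe's description can be read as a two-part test: the result overflowed, and it reports Autocorr peaks above 100. The sketch below is one reading of that rule applied to stderr text like the log Claggy posted. It is not the OpenCL app's actual code; the function names are hypothetical, and the overflow flag is assumed to be supplied by the caller.

```python
import re

# Matches the peak value in lines such as
# "Autocorr: peak=2454.044, time=6.711, delay=2.864, ..."
AUTOCORR_PEAK = re.compile(r"Autocorr: peak=(\d+(?:\.\d+)?)")

# From Joe's post: false Autocorrs have shown peaks above 100.
SUSPICIOUS_PEAK = 100.0


def autocorr_peaks(stderr_text: str) -> list[float]:
    """Extract every Autocorr peak value from the task's stderr text."""
    return [float(value) for value in AUTOCORR_PEAK.findall(stderr_text)]


def is_suspect_overflow(peaks: list[float], overflowed: bool) -> bool:
    """Flag the combination described in the thread.

    A high peak without overflow is not flagged, because a few such
    peaks have been seen in CPU results.
    """
    return bool(overflowed and any(p > SUSPICIOUS_PEAK for p in peaks))


if __name__ == "__main__":
    log = (
        "Triplet: peak=8.547997, time=52.55 "
        "Autocorr: peak=2454.044, time=6.711 "
        "Autocorr: peak=45.03565, time=20.13"
    )
    print(is_suspect_overflow(autocorr_peaks(log), overflowed=True))
```

Against the values in Claggy's log, where many peaks are in the thousands against a threshold of 17.8, this check would flag every overflowed block.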

Phil Burden Send message Joined: 26 Oct 00 Posts: 264 Credit: 22,303,899 RAC: 0	Message 1636921 - Posted: 3 Feb 2015, 9:27:51 UTC Further Update. After having tried 3 drivers (3308, 3960 & 4080), I've given up. Aside from the 2 successful wu's early on, all the rest failed with the same initially reported error. Since I can't see the point of using up resources and getting nowhere, I chucked in the towel...for now ;-) P. ID: 1636921 ·

Richard Haselgrove Volunteer tester Send message Joined: 4 Jul 99 Posts: 14654 Credit: 200,643,578 RAC: 874	Message 1636926 - Posted: 3 Feb 2015, 10:00:42 UTC - in response to Message 1636921. Sorry to hear that. Talking of drivers, I'm surprised that nobody in this thread (including me - mea culpa) has linked you back to the previous thread, largely about drivers for Intel GPUs: Intel gpu not seen by BOINC My observation, I think in general supported by other users, is that the best and only recommended driver for an HD 4600 is 3621, which can be downloaded from http://downloadmirror.intel.com/23885/a08/win64_153322.zip ID: 1636926 ·
