To Many ERRORS
Author | Message |
---|---|
skildude Joined: 4 Oct 00 Posts: 9541 Credit: 50,759,529 RAC: 60 |
http://setiathome.berkeley.edu/result.php?resultid=2530015233 is one example of many. I've uninstalled the ATI 12.6 drivers and reinstalled the standard drivers that came with the card. I'm not sure why the card isn't getting work loaded properly. In a rich man's house there is no place to spit but his face. Diogenes Of Sinope |
skildude Joined: 4 Oct 00 Posts: 9541 Credit: 50,759,529 RAC: 60 |
98 views and not one suggestion or idea about what is happening here? I could use the help. In a rich man's house there is no place to spit but his face. Diogenes Of Sinope |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
None of the developers seem to be online at the moment. We're thinking about it. |
LadyL Joined: 14 Sep 11 Posts: 1679 Credit: 5,230,097 RAC: 0 |
Personally I hadn't commented because it's the ATI GPU app with which I've no experience, so I usually leave that to Mike and Raistmer to sort out. Superficially, tasks getting stuck and aborted by BOINC points to problems with the app or the host, e.g. a bad driver. What's puzzling is that you have strings of 'good' tasks interspersed with long-running ones like http://setiathome.berkeley.edu/result.php?resultid=2530015217 (a VHAR that should have taken some 200 sec like the other ones on the host). It managed to complete just inside the 10x cutoff. The others may actually be processing, but too slowly, and get aborted (as opposed to being stuck). The card may be intermittently downclocking for some reason. Any chance you can monitor that host to see if tasks are actually progressing, and check the system for anomalies once a task goes past normal runtimes? At this point it might be anything: app, driver, Windows updates, BOINC version, wacky tasks... I'm not the Pope. I don't speak Ex Cathedra! |
HAL9000 Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
From your tasks, it is still occurring with the 12.2 or 12.3 driver that you downgraded to. If you are running more than 1 task at a time, I would guess that could be the issue. If you were running 3 at a time and 2 VLARs hit at the same time as a VHAR, the slowdown caused by the VLARs could make the VHAR run too long. If that is occurring, then assigning tasks a value for their load would be a good idea: normal tasks would have a load of 1.0, and VLARs might have a rating greater than 1.0, such as 1.1-1.5. A max load value could then be assigned per processing device. SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu |
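A minimal sketch of the load-rating idea in the post above, not something BOINC or the ATI app is known to implement. The names `Task` and `pick_tasks`, the greedy admission rule, and the specific load numbers are all assumptions made for illustration; only the 1.0 baseline and the 1.1-1.5 range for VLARs come from the post.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    load: float  # 1.0 for a normal task; the post suggests 1.1-1.5 for VLARs


def pick_tasks(queue: list[Task], max_load: float) -> list[Task]:
    """Greedily start queued tasks while the summed load stays within max_load."""
    running: list[Task] = []
    total = 0.0
    for task in queue:
        # Small tolerance so floating-point sums do not reject an exact fit.
        if total + task.load <= max_load + 1e-9:
            running.append(task)
            total += task.load
    return running


# Hypothetical queue: two VLARs arrive ahead of a VHAR on a device capped at 3.0.
queue = [Task("vlar_a", 1.5), Task("vlar_b", 1.5), Task("vhar_c", 1.0)]
started = pick_tasks(queue, max_load=3.0)
print([t.name for t in started])
```

With these made-up ratings, the two VLARs use up the whole budget and the VHAR waits instead of running alongside them, which is the situation the post is trying to avoid. Three normal tasks at 1.0 each would still all start together.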
Fred J. Verster Joined: 21 Apr 04 Posts: 3252 Credit: 31,903,643 RAC: 0 |
For the past 2 days I've been getting errors on my ATI 5870 GPUs, only on MB work: Exit status 197 (0xc5) EXIT_TIME_LIMIT_EXCEEDED. I freed up 1 core of the i7-2600 (2 threads, since I'm using HT). GPU load is higher in the first 15 seconds and I've seen the error rate going down, but I'm still unsure what the exact reason is. Vendor: Advanced Micro Devices, Inc. Driver version: CAL 1.4.1720 (VM). Version: OpenCL 1.2 AMD-APP (923.1). Not on AstroPulse work. Nothing has been changed since these errors started?! |
skildude Joined: 4 Oct 00 Posts: 9541 Credit: 50,759,529 RAC: 60 |
Quoted: "Personally I hadn't commented because it's the ATI GPU app with which I've no experience, so I usually leave that to Mike and Raistmer to sort out." I did notice that the CPU load time is very short compared to work that actually completes. This makes me curious as to whether the WU is not loading properly or completely and it just sits there for 30 minutes and times out. In a rich man's house there is no place to spit but his face. Diogenes Of Sinope |
Fred J. Verster Joined: 21 Apr 04 Posts: 3252 Credit: 31,903,643 RAC: 0 |
Quoted: "Personally I hadn't commented because it's the ATI GPU app with which I've no experience, so I usually leave that to Mike and Raistmer to sort out." Since it's only MB (6.10), and the rev. 390 app for ATI, BOINC 7.0.28 (64-bit) and Win7 (64-bit) have all been running for some time, that leaves only (?) wacky tasks and/or Win 7 updates?! It's the first time I've seen this kind of error, and not only on VHAR but also on VLAR. |
Mike Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80 |
Have you set a core free, skildude? It seems you are suffering from the low GPU usage bug. What's your DCF? With each crime and every kindness we birth our future. |
skildude Joined: 4 Oct 00 Posts: 9541 Credit: 50,759,529 RAC: 60 |
I've never had a problem with leaving a CPU core open for GPU work and, as I recall, this has never been an issue anywhere. The DCF for this machine is 3.354584. Is that good or bad? I don't know. I can't remember which app I'm using, whether it's the HD5 or not. Whichever it is, I will try the alternate app and see if that stops the errors. In a rich man's house there is no place to spit but his face. Diogenes Of Sinope |
Mike Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80 |
That's what I expected. Your DCF is too high; it should be around 1. This, combined with low GPU usage, is causing these errors. Have you set flops in app_info? Try changing the DCF in client_state.xml. With each crime and every kindness we birth our future. |
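For anyone who wants to check the DCF without opening the file by hand, here is a small read-only sketch. It assumes that client_state.xml stores the value in a `<duration_correction_factor>` element inside each `<project>` block next to a `<master_url>`; that layout, the `SAMPLE` text and the helper name `read_dcf` are assumptions for illustration, and the real file is far larger. It does not modify anything.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for a real client_state.xml.
SAMPLE = """<client_state>
  <project>
    <master_url>http://setiathome.berkeley.edu/</master_url>
    <duration_correction_factor>3.354584</duration_correction_factor>
  </project>
</client_state>"""


def read_dcf(xml_text: str) -> dict[str, float]:
    """Return {project URL: DCF} for every <project> block that carries a DCF."""
    root = ET.fromstring(xml_text)
    result: dict[str, float] = {}
    for project in root.iter("project"):
        url = project.findtext("master_url", default="(unknown)")
        dcf = project.findtext("duration_correction_factor")
        if dcf is not None:
            result[url] = float(dcf)
    return result


print(read_dcf(SAMPLE))
```

To use it on a real host, the file contents would be read from the BOINC data directory and passed in as `xml_text`. Any manual change to the file is presumably best made while the BOINC client is stopped, so that the client does not overwrite it.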
Fred J. Verster Joined: 21 Apr 04 Posts: 3252 Credit: 31,903,643 RAC: 0 |
Quoted: "Personally I hadn't commented because it's the ATI GPU app with which I've no experience, so I usually leave that to Mike and Raistmer to sort out." [ADDED] Well, the errors have stopped, after 2.5 days (60 hours) of sudden Time Limit Exceeded errors. I had already left half a core (1 thread) free; now I've freed 1 full core. Now none of the 8 threads are running at 100% constantly. (Sandy Bridge is indeed quite another breed ;-)) The CPU 'clock' doesn't have a fixed value either. This too can have an effect on the GPU(s). And you're running 3 instances_per_device but have 32 Compute Units per GPU; mine (5870s) have 20 Compute Units and do 2 per GPU. Most mobos run their PCIe slots in 8x mode when both are used, mine too. That's still fast enough if it's PCIe 2.0 or higher. |
Mike Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80 |
Fred, this has nothing to do with it. My GPU only has 18 CUs and I'm running 3 instances without any problems. Wrong estimates and low GPU usage are the only problem. On an FX you need to free 2 cores to get full GPU utilisation. You can't compare an i7 with an FX CPU. With each crime and every kindness we birth our future. |
Fred J. Verster Joined: 21 Apr 04 Posts: 3252 Credit: 31,903,643 RAC: 0 |
Quoted: "Fred this has nothing to do with it." An i7 isn't an FX CPU, true. But if the estimates are incorrect, why 'out of the blue', since at least my host has run for a few months with this setting? Freeing 1 core is also advisable; I use 1 core, 2 threads. Looks like it's over... But the last received result, 24 hours ago, had the same error. And the APR for SETI@home Enhanced (anonymous platform, ATI GPU): Number of tasks completed: 11; Max tasks per day: 218; Number of tasks today: 0; Consecutive valid tasks: 36; Average processing rate: 681.28996334014; Average turnaround time: 0.21 days |
Alan Joined: 16 Jun 11 Posts: 4 Credit: 867,828 RAC: 0 |
Did you upgrade to BOINC 7.0.28? I had problems when I upgraded: the estimated time dropped a lot and I started getting time exceeded errors. I went back to 7.0.25. The time estimates went up, and on new work I have not had nearly as many tasks error out. |
LadyL Joined: 14 Sep 11 Posts: 1679 Credit: 5,230,097 RAC: 0 |
Quoted: "That's what I expected." Actually, a high DCF is good in his particular case and could prevent further -197 even though it messes with cache. The long-running task I linked will have pulled DCF up, so the estimates AND, more crucially, the 10x estimate abort limit are high (provided I'm getting my logic right), and if further tasks run long but inside the new margin they will keep DCF up even though the majority of tasks pull DCF back to 1. That at least might allow error-free processing until the underlying cause of the slow-running WUs is found. I'm not the Pope. I don't speak Ex Cathedra! |
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
Quoted: "Actually, a high DCF is good in his particular case and could prevent further -197 even though it messes with cache." Your logic isn't quite right; DCF is not used for the limit. It's strictly whatever rsc_fpops_bound the servers sent divided by whatever flops are in use for the application. The estimate is rsc_fpops_est divided by the same flops, but then multiplied by DCF. That unfortunately means that if DCF is above 10, the estimate is longer than the limit. So it is possible to have what looks like a reasonable estimate and progress get killed for maximum elapsed time exceeded. Joe |
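The two formulas in the post above can be put side by side in a short sketch. The function names and all the numbers are made up for illustration; the factor of 10 between `rsc_fpops_bound` and `rsc_fpops_est` is an assumption taken from the "10x cutoff" mentioned earlier in the thread, and 3.354584 is the DCF skildude reported.

```python
def elapsed_time_limit(rsc_fpops_bound: float, flops: float) -> float:
    """Seconds after which the task is aborted; DCF plays no part here."""
    return rsc_fpops_bound / flops


def estimated_runtime(rsc_fpops_est: float, flops: float, dcf: float) -> float:
    """Runtime estimate in seconds; this value is scaled by DCF."""
    return rsc_fpops_est / flops * dcf


# Hypothetical values chosen so the unscaled estimate is 200 s.
FLOPS = 1.0e11
FPOPS_EST = 2.0e13
FPOPS_BOUND = 10 * FPOPS_EST  # assumed 10x ratio between bound and estimate

limit = elapsed_time_limit(FPOPS_BOUND, FLOPS)
for dcf in (1.0, 3.354584, 12.0):
    est = estimated_runtime(FPOPS_EST, FLOPS, dcf)
    print(f"DCF {dcf}: estimate {est:.1f} s, limit {limit:.1f} s, "
          f"estimate exceeds limit: {est > limit}")
```

Under these assumptions the limit stays at the same value whatever the DCF is, while the estimate grows with it, so a DCF above 10 yields an estimate longer than the time the task is allowed to run.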
BilBg Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0 |
Do you think that these slow-running tasks may be the same issue I reported in PM to you ('Task_Hang_on_ATI', 'Addendum__Task_Hang_on_ATI') and you "copied to the r390 thread at Lunatics"? ("task 'hang' on ATI and auto-continued after a long time (but validated OK)") Do you think that what I noted in the second letter ('Addendum__Task_Hang_on_ATI') has any merit? " About the pause in GPU processing during BOINC usage: This may be some driver glitch - it appears to me that if I start some GPU monitoring tool it kicks the ATI driver and the GPU processing continues. (I may be wrong, this may be just a coincidence; I wasn't watching for this behaviour specifically, but I can say that the GPU processing continued within +- 2 minutes of the start of the GPU monitoring program.) It is probably not the GPU monitoring as such that does the 'kick' (as SIV and TThrottle run all the time). I suppose it has to be the start (initialization phase) of the GPU monitoring tool. It was GPU-Z in the case of 21fe12ad.19149.9502.8.10.160.vlar It was ATI MemoryViewer in the case of 25ap12aa.26506.476.13.10.96 " - ALF - "Find out what you don't do well ..... then don't do it!" :) |
LadyL Joined: 14 Sep 11 Posts: 1679 Credit: 5,230,097 RAC: 0 |
Quoted: "Actually, a high DCF is good in his particular case and could prevent further -197 even though it messes with cache." Ta. I thought it would affect the limit as well - doesn't make much sense that way... In that case he'll probably have to resort to Fred's rescheduler and use the expert -177 option. I'm not the Pope. I don't speak Ex Cathedra! |
skildude Joined: 4 Oct 00 Posts: 9541 Credit: 50,759,529 RAC: 60 |
To digress, my setup: Windows 7 Ultimate, fully updated; BOINC 7.0.28; ATI 12.3 drivers installed without errors (uninstalled drivers, used Driver Sweeper, rebooted, installed 12.3). I allowed 1 CPU core to remain idle and changed from the HD5 to the standard GPU app, and I'm still getting errors. Are there any other ATI 7970 users that have this problem or use a different driver? Would a period_interation move in the app_info change anything? In a rich man's house there is no place to spit but his face. Diogenes Of Sinope |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.