Noisy GPU workunits
Author | Message |
---|---|
Stuart Gibson Joined: 28 May 99 Posts: 31 Credit: 12,112,497 RAC: 0 |
Anybody else having this problem? Here's an example: http://setiathome.berkeley.edu/result.php?resultid=1325282691 They are all reporting: -9 result_overflow. These WUs complete in 1 second (on average), and I have had hundreds upon hundreds of them in the last couple of days, to the extent that my GPU is idle most of the time because I have exceeded my daily quota. I have 22 AP workunits and about 100 MB left to process on my quad, but I can't get any more work because of these ultra-short GPU multibeams. If I reschedule them to the CPU, they process just fine. |
skildude Joined: 4 Oct 00 Posts: 9541 Credit: 50,759,529 RAC: 60 |
That looks like a noisy WU. I've had dozens of WUs that end quickly like that. The WU has too many results (30) and ends at that point. In a rich man's house there is no place to spit but his face. Diogenes Of Sinope |
john deneer Joined: 16 Nov 06 Posts: 331 Credit: 20,996,606 RAC: 0 |
"I have 22 AP workunits and about 100 MB left to process on my quad, but I can't get any more work because of these ultra-short GPU multibeams." I'm not crunching any WUs received on July 30 yet (that's when you received the first that went gaga on your machine), but the sheer number of them seems very unlikely. Have you tried turning your machine off completely, so that your GPUs don't have any voltage applied to them and get reset? Rebooting might not be enough; completely turning the machine off might reset the GPUs better than just rebooting. The fact that they crunch just fine on the CPU makes me suspicious of the state your GPUs are in :-) Regards, John. |
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
"That looks like a noisy WU. I've had dozens of WUs that end quickly like that. The WU has too many results (30) and ends at that point." Yes, it "looks like" a noisy WU, but as Stuart pointed out, they are not noisy when processed on a CPU. IOW, it's the GPU which is noisy, not the WU. Others have run across the problem. Too much overclocking, heat, or some component degrading can cause the GPU to produce bad results. When such a GPU is doing graphics, it may show up as an occasional wrong pixel, so there is very little obvious impact. Joe |
Fred W Joined: 13 Jun 99 Posts: 2524 Credit: 11,954,210 RAC: 0 |
I suspect this is not a noisy WU problem - if they are re-scheduled to the CPU they are not -9's. If you re-boot your machine when you notice this happening, then I believe that they will all crunch fine. This starts with a "compute error" on one CUDA WU which then causes all succeeding tasks on the same GPU to error out with -9. I found on my GTX295 that that first "compute error" was caused by failing memory on the second GPU (the GTX is now in the post for RMA). It could also be caused by the vid card getting too warm - any chance of that? I tested the memory on my CUDA card with this. F. |
Stuart Gibson Joined: 28 May 99 Posts: 31 Credit: 12,112,497 RAC: 0 |
I had only a mild overclock (2%) on the GPUs (2x ASUS 9800GTX+ TOPs) because they were pretty much maxed out anyway, and they had been working fine. I'll try clocking them back to default and see if that makes any difference. I have extra cooling over the GPUs. Fred: Thanks for the link to the CUDA memory tester. I'll give it a try. |
Keith T. Joined: 23 Aug 99 Posts: 962 Credit: 537,293 RAC: 9 |
One of my wingmen http://setiathome.berkeley.edu/show_host_detail.php?hostid=4951639 seems to have a similar problem. I have just sent him a PM. Keith |
Fred W Joined: 13 Jun 99 Posts: 2524 Credit: 11,954,210 RAC: 0 |
Yes, my GTX295 has been working fine since January (and UNDERclocked by 20% for the past few weeks to keep the temps down around 80C). Then last week it started producing errors, all from its second GPU. These things can creep up on you ;(( F. |
Westsail and *Pyxey* Joined: 26 Jul 99 Posts: 338 Credit: 20,544,999 RAC: 0 |
Thanks for posting that!! Never seen it before. What a great tool. Downloading now.. "The most exciting phrase to hear in science, the one that heralds new discoveries, is not Eureka! (I found it!) but rather, 'hmm... that's funny...'" -- Isaac Asimov |
Stuart Gibson Joined: 28 May 99 Posts: 31 Credit: 12,112,497 RAC: 0 |
Cheers John. Switched off the power supply for 30 seconds, powered up again and it seems to have fixed the problem. I'll have to add your tip to my little black book. Thanks to all. |
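
The symptom described in this thread (a long run of GPU tasks that finish in about a second with -9 result_overflow, while the same work crunches fine on the CPU) can be spotted by scanning a task history for a streak of fast overflows. The following is only a rough sketch under stated assumptions: the `Task` record, its field names, and the `min_streak` and `max_seconds` thresholds are hypothetical illustrations, not part of BOINC or the SETI@home applications.

```python
from dataclasses import dataclass

# Exit status the thread reports as "-9 result_overflow".
OVERFLOW = -9


@dataclass
class Task:
    """Hypothetical record of one finished task (not a BOINC structure)."""
    name: str
    device: str         # "gpu" or "cpu"
    exit_status: int    # 0 = success, -9 = result_overflow, other = error
    run_seconds: float


def suspect_gpu_fault(tasks, min_streak=5, max_seconds=5.0):
    """Return True if at least `min_streak` consecutive GPU tasks
    overflowed within `max_seconds` each.

    CPU tasks are ignored, so they neither extend nor break a streak.
    Both thresholds are arbitrary guesses chosen for illustration.
    """
    streak = 0
    for task in tasks:
        if task.device != "gpu":
            continue
        if task.exit_status == OVERFLOW and task.run_seconds <= max_seconds:
            streak += 1
            if streak >= min_streak:
                return True
        else:
            streak = 0
    return False


# Example: one compute error followed by six one-second overflows on the GPU.
history = [Task("wu0", "gpu", 1, 3.0)]
history += [Task(f"wu{i}", "gpu", OVERFLOW, 1.0) for i in range(1, 7)]
print(suspect_gpu_fault(history))
```

An isolated overflow among normal results is more likely a genuinely noisy WU; a streak like the one above points at the card, in which case a full power-off (as John suggested) or a memory test (as Fred suggested) is the next thing to try.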