Modified SETI MB CUDA + opt AP package for full GPU utilization
Author | Message |
---|---|
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
... Eric has always been willing to give credit for Beta work by using a script, and that's obviously happening here too. The CUDA code is very new and problems can be expected, so it wouldn't be fair to penalize users for weaknesses of the application. That kind of credit granting cannot affect the science. The occasional problem where two CUDA apps get a "strongly similar" result may cause false signals to be added to the science database, but the possibility they'll be part of a persistency match is vanishingly small. Joe |
Joined: 15 Apr 99 Posts: 1546 Credit: 3,438,823 RAC: 0 |
It should be possible to find out which AR triggers that error. Just modify the AR of a test WU and try running the app in standalone mode. Join BOINC United now! |
Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
... Not so small. It has already happened with an overflowed result. CUDA can generate such results at amazing speed until the host is rebooted. So a host left unattended can prepare a pretty big field for such false "validations". |
Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Here is a new build that logs the AR of overflowed tasks. It will create the file r_debug.txt on C:\ (appending to it on later runs) and write the AR of each overflowed task there. Sure, there will be "legal" overflows too, but any overflow not validated against a CPU result is worth reporting. Just replace the CUDA MB executable from my last package with this one. The name remains the same. |
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
... I didn't mean to imply I think the problem is negligible. Matt said that 3% of validations involve CUDA processing, so the rate at which two CUDA apps are paired is about 1 in 1000. If about half the work on CUDA apps is getting false signal overflows, that's a lot of bad data going into the science database. However, any that say result_overflow are flagged as such when being put into the database, and the likelihood that another observation from another time will match the sky position, frequency, and signal type is small anyhow. NTPCKR will have more data to chew on, but I don't think it is going to identify these as possible candidates for reobservation. Joe |
kittyman Joined: 9 Jul 00 Posts: 51527 Credit: 1,018,363,574 RAC: 1,004 |
In my humble opinion..... CUDA should be withdrawn from Seti Main back into the beta stage from whence it came.....until the 'bugs' are worked out. In a scientific project, there is little room for tossing known bad data into the results, thereby invalidating what otherwise would be good work. "Time is simply the mechanism that keeps everything from happening all at once." |
Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
In my humble opinion..... Agreed. But this can be done only by project staff. We all do what we can do.... |
kittyman Joined: 9 Jul 00 Posts: 51527 Credit: 1,018,363,574 RAC: 1,004 |
In my humble opinion..... Of course, Raistmer. I am not saying anything bad about your efforts to work with what was put forth. I am suggesting that the Admins yank it from Main until it has proven itself in Beta.....it never should have been released here. |
Joined: 19 Mar 05 Posts: 551 Credit: 4,673,015 RAC: 0 |
OMG... look here on this your result. It's absolute record about quantity of errors per single result %) Hmmm.. I only set the low 3D clocks and 2D clocks to the same speed as the performance 3D clocks... ATI tools claims it is stable (after a 30 min test). WAIT!! I know why that happened... Your CUDA app requested to connect to the internet... It was waiting for my permission... (Firewall) Destination IP 75.154.132.100:53. Any ideas why? Do you Good Search for Seti@Home? http://www.goodsearch.com/?charityid=888957 Or Good Shop? http://www.goodshop.com/?charityid=888957 |
Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
OMG... look here on this your result. It's absolute record about quantity of errors per single result %) Hm... no ideas. This IP resolves to cachednsab04.nssi.telus.com. Surely not my host :) Try checking it with every antivirus tool you can reach. Maybe my MSVC production host is infected by some trojan horse virus? ... Will check this too... UPDATE: My current NOD32 version says the file is clean. And I have some idea of what could have happened: could it be an app crash, with Windows launching error reporting? |
Joined: 19 Mar 05 Posts: 551 Credit: 4,673,015 RAC: 0 |
And I have some idea what could happened - could it be app crash and Windows OS launches error reporting ? No CUDA app errors in the event viewer... I haven't seen an app crash cause an error report screen (only computer crashes). |
Joined: 26 May 99 Posts: 9958 Credit: 103,452,613 RAC: 328 |
I have installed the updated file. Will run with it and let you know. Bernie |
Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
I have installed the updated file. Will run with it and let you know. Fine :) Just don't forget to check this executable with your latest antivirus software (as should be done for all downloaded stuff). @popandbob What did your antivirus say? I don't like the idea that my server's security has been compromised in such a way... :/ UPDATE: DrWEB CureIT! says "clean" too. |
john deneer Joined: 16 Nov 06 Posts: 331 Credit: 20,996,606 RAC: 0 |
In my humble opinion..... I fully agree. Imagine a scenario in which somebody builds an optimized application for CPU crunching that runs 10x faster than stock but produces incorrect results. This application would run on each and every system, resulting in speed increases on all systems, but generating wrong results on all systems as well. The guys building it would most likely not be inclined to distribute it, feeling a responsibility to distribute a correctly working program only. And if they did distribute it, their program would have to be installed 'manually' by crunchers anyway, and thus it would most probably get distributed only to the 5% or so that are 'on the ball' anyway. Since most of these faulty WUs would be paired with a wingman using stock, they would be discarded. So even if the developers of this faulty program behaved irresponsibly, the damage would be limited since most faulty results would be discarded. Now look at the scenario that has been established for the CUDA application. This thing is generating faulty results all over the place. If you upgrade to the newest version of BOINC, and you are the happy owner of a CUDA-capable card (and there are a lot of computer enthusiasts using NVIDIA cards for gaming), you have automatically enabled the use of this card for crunching SETI, since using the card is the default preference enabled for all users. There are people in the message boards who are stunned to find that their computer is crunching SETI using the graphics card. If the resulting havoc had been caused by some cruncher developing a great killer application in his spare time, thinking he was doing everybody a favor, he would have been called an idiot, somebody who obviously had no idea of what he was doing. The problem is that when you open the stage curtains what you get is not an idiot but the people who should have known better (at least, that's how I think about it). Okay, I think I'll stop ranting now. I had to blow off some steam I guess :-) Regards, John. PS: I'm actually using the CUDA application on a machine that has 2 gt8800's. It seems to be producing reliable results, but who can tell. I'm using this system in the hope that it produces as many faulty results as possible. Unfortunately my DCF on that rig is such that I receive only very few WUs. Producing as many faulty results as possible will hopefully get the message across ..... That's the Christmas spirit for you. Oh sorry, still ranting :-) |
Joined: 16 Jan 06 Posts: 1145 Credit: 3,936,993 RAC: 0 |
Got my first validated result using the new modified app: 382164460. This task wasn't a problem, but it's the first one that validated, even though I have other tasks that should have validated before this one. I'm using the new build released today now. Edit: even though the "other task" isn't 2.7 AR, I'm still curious why it isn't validating and why no wingman is being sent out on it if there is a problem. |
Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Well, AR=2,7 and "valid" w/o overflows... fine. |
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
...even though the "other task" isn't 2.7 AR I'm still curious why it isn't validating and no wingman is being sent out on it if there is a problem? Since about 1 pm Dec. 25 Berkeley time, the Validator has been falling behind, now "Workunits waiting for validation 197,689". I think it's still working, but slowly enough that it might be running as much as a day late. Joe |
Joined: 16 Jan 06 Posts: 1145 Credit: 3,936,993 RAC: 0 |
...even though the "other task" isn't 2.7 AR I'm still curious why it isn't validating and no wingman is being sent out on it if there is a problem? Thanks Joe, I thought there might be something I wasn't aware of, wrong with the task, and was concerned about doing more till I found out what it was. I'm at ease now. |
Joined: 19 Mar 05 Posts: 551 Credit: 4,673,015 RAC: 0 |
Well, AR=2,7 and "valid" w/o overflows... fine. I've had [edit] some [/edit] 2.71's w/o overflow and [edit] most [/edit] 2.72's with overflows. Still waiting on the validator to see if they are valid or not... My anti-virus (Zone Alarm) says my PC is clear of all spyware/viruses... Maybe this will have to go under the "Microsoft Mystery" folder..lol |
Joined: 29 Apr 00 Posts: 15 Credit: 5,921,750 RAC: 0 |
I also think the issue is partly driver related. I'm running an 8800GTX on the 180.84 beta drivers and started getting nothing but -9 overflows until I rebooted, which seemed to fix the problem. So this all might be just a combination of immature drivers and the API perhaps not clearing out resources after a crash?? I'm not too familiar with the CUDA API. |
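
Joe's "about 1 in 1000" figure earlier in the thread can be checked with a quick calculation. This is only a sketch: it assumes the 3% Matt quoted is the share of individual results produced by CUDA apps, and that the two hosts in a validation pair are picked independently. The thread confirms neither assumption, and the variable names are made up for illustration.

```python
# Assumption (not confirmed in the thread): 3% is the share of individual
# results that come from a CUDA app, and the two results in a validation
# pair are drawn independently of each other.
cuda_share = 0.03

# Probability that both results in a pair come from CUDA apps.
both_cuda = cuda_share ** 2

print(f"P(both CUDA) = {both_cuda:.4f}")
print(f"about 1 in {round(1 / both_cuda)}")
```

Under those assumptions the squared share is 0.0009, roughly 1 in 1100, which is the same order of magnitude as the figure Joe quotes.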