Modified SETI MB CUDA + opt AP package for full GPU utilization
| Author | Message |
|---|---|
|
Morten Ross Joined: 30 Apr 01 Posts: 183 Credit: 385,664,915 RAC: 0
|
Hi, monitoring this configuration I see that MB_6.08_mod_VLAR_kill_CUDA.exe is at 100% CPU utilization, so this module seems to be the culprit now. Morten
|
|
Morten Ross Joined: 30 Apr 01 Posts: 183 Credit: 385,664,915 RAC: 0
|
"Well, it seems your host is good candidate for using Maik's script. It handles such hang situation AFAIK. Look for it here or on Lunatics site." As I understand it, Maik's script is for terminating the app when a WU is CPU-idle. In my scenario the WU is not progressing while the app is using 100% CPU. The workaround is to close boincmgr and restart it; the same WU is then processed properly by the app, so no WU is terminated. I have tested Maik's script in this scenario and nothing happens, as the script's logic is not aware of this erroneous situation. I think this one needs further investigation into your code. Morten
|
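The distinction drawn above (a task that sits idle versus one that pegs the CPU while making no progress) can be sketched as a small check. This is only an illustration under stated assumptions, not how Maik's script works: the `Sample` type, the `is_busy_stall` function, and the thresholds are all hypothetical, and how the CPU and progress readings are collected is left out.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Sample:
    """One observation of the science app (hypothetical structure)."""
    cpu_percent: float    # CPU utilization of the app process, 0-100
    fraction_done: float  # task progress as a fraction, 0.0-1.0


def is_busy_stall(
    samples: Sequence[Sample],
    window: int = 6,
    cpu_threshold: float = 90.0,
    min_progress: float = 1e-6,
) -> bool:
    """Return True if the last `window` samples all show high CPU usage
    while progress advanced by less than `min_progress`.

    An idle hang (CPU near 0%) is deliberately NOT reported here; that is
    the other case described in the thread.
    """
    if len(samples) < window:
        return False  # not enough history to judge
    recent = samples[-window:]
    cpu_pegged = all(s.cpu_percent >= cpu_threshold for s in recent)
    progress = recent[-1].fraction_done - recent[0].fraction_done
    return cpu_pegged and progress < min_progress


# Example: six readings at 100% CPU with progress stuck at the same value.
stuck = [Sample(100.0, 0.00026)] * 6
print(is_busy_stall(stuck))  # True
```

A watcher built on such a check would need to pick a window longer than the normal start-up phase, since a short burst of 100% CPU at the start of a WU is reported as expected in this thread.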
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121
|
"Have you tried Boinc 6.6.2 yet?" No. I read the reports and it seems too unstable. I don't need it right now; maybe I will try it later.
There are 2 V7 versions available. One with VLAR processing: slow but w/o errors (I hope). And another one with VLAR autokill enabled. So the answer is yes, it's actually needed because it allows raising overall performance. And no, you aren't required to use the VLAR kill version; you can use V7 w/o the autokill mod. It will behave like stock 6.08 but do nice things with priorities and report memory availability. Maybe something more in future versions. Everyone can choose what he/she likes more ;)
No, V7 doesn't do it. V7 with the autokill mod does it. Look at the download places again - do you see two versions?
They are clearly differentiated from overflows now too, because a -9 overflow is not a computation error at all. And the BAD_HEADER code is a true error code; such a result will not be validated in any case (even if two results have this error). If you provide the code for the "client aborted" exit code I could change it. I become incredibly lazy when I don't understand the reason for the proposed work ;) |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121
|
Link to the result and stderr, please? When it does hara-kiri it should not use CPU, it should just die. In all other cases it will behave like 6.08. Only in one case can you see 100% CPU usage: when, due to a low memory condition (or maybe another reason), it falls back to CPU processing. Then it will use the CPU for computations and will behave like stock 6.03. |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121
|
"Well, it seems your host is good candidate for using Maik's script. It handles such hang situation AFAIK. Look for it here or on Lunatics site." I need at least stderr.txt for this task. |
popandbob Joined: 19 Mar 05 Posts: 551 Credit: 4,673,015 RAC: 0
|
For me, the WUs run at 100% CPU for the first ~25 seconds, then they start processing... Was it in this start-up phase? Do you Good Search for Seti@Home? http://www.goodsearch.com/?charityid=888957 Or Good Shop? http://www.goodshop.com/?charityid=888957 |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121
|
Data decoding and GPU feeding, I suppose. This feature has been in place since the very first CUDA MB release, nothing new in that. But 100% CPU usage for more than ~1 min is something that needs to be analysed. |
|
Morten Ross Joined: 30 Apr 01 Posts: 183 Credit: 385,664,915 RAC: 0
|
It's always in the start-up phase, and then the application gets stuck at 100% CPU utilization. There are no errors logged in the application stderr.txt or online, as seen here: http://setiathome.berkeley.edu/result.php?resultid=1129307192. This WU was at first not progressing due to 100% CPU for the app; then boincmgr was restarted and the WU data dumped once again, which is why it's listed twice. But there are no errors, as you can see, since the WU completed successfully the second time around. If the application is killed, the WU also fails, so that is not an option. Morten
|
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121
|
Really strange. Try to reproduce this with stock 6.08, and if it shows the same behavior, report it in the Q&A section - it's the most direct way to pass a bug to the devs. If 6.08 works OK - well, you could continue with it and not use my current build. |
|
Morten Ross Joined: 30 Apr 01 Posts: 183 Credit: 385,664,915 RAC: 0
|
Hi, I downloaded setiathome_6.08_windows_intelx86__cuda.exe from fanoutserver, renamed all references to your V7 app in app_info to setiathome_6.08_windows_intelx86__cuda.exe, and started BOINC. This failed - only AP WUs were processed and MB WUs failed, so I had to do a roll-back... What must the app_info contain in order to successfully use the stock 6.08 app? Morten
|
Richard Haselgrove Joined: 4 Jul 99 Posts: 14690 Credit: 200,643,578 RAC: 874
|
Funny you should ask that - hang on a mo - have a look at message 857243. |
|
Morten Ross Joined: 30 Apr 01 Posts: 183 Credit: 385,664,915 RAC: 0
|
Hi, Excellent initiative! Unfortunately the same result, I'm afraid. Comparing the app_info files, there is only one difference between yours and mine. Mine: `<version_num>528</version_num> <plan_class>cuda</plan_class> <avg_ncpus>0.040000</avg_ncpus> <max_ncpus>0.040000</max_ncpus>` Yours: `<version_num>528</version_num> <plan_class>cuda</plan_class> <avg_ncpus>0.114729</avg_ncpus> <max_ncpus>0.114729</max_ncpus>` Apart from that, your DLL files are newer. Morten
|
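The by-eye comparison of the two app_info fragments above can also be done mechanically. The following is a minimal sketch using only the four elements quoted in the post; the `<fragment>` wrapper and the `diff_fields` helper are hypothetical, added only so the snippets parse on their own, and a real app_info file contains more than this.

```python
import xml.etree.ElementTree as ET

# The four elements quoted in the post, wrapped in a made-up root element
# so that each snippet is a well-formed XML document.
MINE = """<fragment>
    <version_num>528</version_num>
    <plan_class>cuda</plan_class>
    <avg_ncpus>0.040000</avg_ncpus>
    <max_ncpus>0.040000</max_ncpus>
</fragment>"""

YOURS = """<fragment>
    <version_num>528</version_num>
    <plan_class>cuda</plan_class>
    <avg_ncpus>0.114729</avg_ncpus>
    <max_ncpus>0.114729</max_ncpus>
</fragment>"""


def diff_fields(a_xml: str, b_xml: str) -> dict:
    """Return {tag: (value_in_a, value_in_b)} for direct children that differ."""
    a = {child.tag: (child.text or "").strip() for child in ET.fromstring(a_xml)}
    b = {child.tag: (child.text or "").strip() for child in ET.fromstring(b_xml)}
    return {
        tag: (a.get(tag), b.get(tag))
        for tag in sorted(a.keys() | b.keys())
        if a.get(tag) != b.get(tag)
    }


for tag, (mine, yours) in diff_fields(MINE, YOURS).items():
    print(f"{tag}: mine={mine} yours={yours}")
# avg_ncpus: mine=0.040000 yours=0.114729
# max_ncpus: mine=0.040000 yours=0.114729
```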
Richard Haselgrove Joined: 4 Jul 99 Posts: 14690 Credit: 200,643,578 RAC: 874
|
The app and the DLL files are as downloaded from the SETI servers - any variation in datestamp will be simply due to the different times we downloaded them. I have installed the packages direct from my own webspace (well, the AP SSE3 package - but there's very little difference) on to two machines which didn't previously have CUDA-capable graphics cards. No problems encountered - both AP and CUDA started to run exactly as expected. Which probably directs the focus onto your CUDA card, and the NVidia drivers you're using. Or has Raistmer already gone over all of that with you? |
|
Morten Ross Joined: 30 Apr 01 Posts: 183 Credit: 385,664,915 RAC: 0
|
Hi, I've gone through thorough version testing and have arrived at Cuda 181.20 and BOINC 6.4.5 (which Raistmer is running) as the best combo. Prior to this I ran GPUGrid for a while without a single hitch, so this is S@H-specific. I have suggested to Raistmer that a debug version of V7 be compiled, in order to collect more information at the time of the problem. Have you been successful in running the "SETI CUDA v6.08 - AP SSE3" package on Windows Vista x64? I have a stable Vista 32-bit, so this is probably x64-related. I am considering demoting the BOINC installation to 32-bit... Morten
|
Richard Haselgrove Joined: 4 Jul 99 Posts: 14690 Credit: 200,643,578 RAC: 874
|
I'm using the same Cuda 181.20 and BOINC v6.4.5 combo, so nothing helps us there. All my rigs are 32-bit XP, so I can't help there: but I agree the suspicion lies in the 64-bit or Vista area. |
|
Morten Ross Joined: 30 Apr 01 Posts: 183 Credit: 385,664,915 RAC: 0
|
Hi, I might have been a bit quick to judge - boincmgr was hanging and I saw no setiathome_6.08_windows_intelx86__cuda.exe in the task list, so I assumed a no-go and rolled back immediately. I wanted to give it another, more thorough try before I demote to BOINC 32-bit, and this time it is working. I'm not sure what is different - a reboot is the only thing I can think of that differs from the previous two attempts. Nevertheless - same behaviour, so I am now going to demote the x64 to 32-bit BOINC and see if that makes a difference. Morten
|
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121
|
You can also try the new version. V8 uses a completely different approach to GPU handling. No BOINC involved. Maybe it will run on your host... |
|
Morten Ross Joined: 30 Apr 01 Posts: 183 Credit: 385,664,915 RAC: 0
|
Hi, Great approach! I am currently running BOINC 32-bit to see if this changes the issue... Morten
|
|
Morten Ross Joined: 30 Apr 01 Posts: 183 Credit: 385,664,915 RAC: 0
|
Hi, BOINC 32-bit tested, and the same happens.... I've now tested V8, and the same happens - the CUDA app is using 100% CPU and the WU is not progressing beyond 0,026%. Morten
|
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121
|
Well.... only x64 Vista remains... |