Linux CUDA 'Special' App finally available, featuring Low CPU use

Stephen "Heretic"
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1880679 - Posted: 28 Jul 2017, 1:07:10 UTC - in response to Message 1880666.  

. . I am most distressed ... I have an invalid task that is not a noise bomb ...
http://setiathome.berkeley.edu/workunit.php?wuid=2616796976
. . It looks OK to me though it completed slightly quicker than the usual Guppi task.


Edit: that's a bit odd, it started on Device 1, then stopped & restarted on Device 2.

**** Detected setiathome_enhanced_v7 task ****. Autocorrelations enabled, size 128k elements.


. . Thanks Grant,

. . That explains the problem, it must have been when I shut down to upgrade something. Though detecting it as a V7 task is odd. But since it is a known issue I won't fret over it any more :)

. . I wonder if suspending all other tasks and letting the two running tasks complete before shutting down BOINC will solve the issue. From the sound of things it should.

Stephen
ID: 1880679 · Report as offensive
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1880681 - Posted: 28 Jul 2017, 1:11:19 UTC - in response to Message 1880667.  
Last modified: 28 Jul 2017, 1:12:33 UTC

Replied to wrong post.
Grant
Darwin NT
ID: 1880681 · Report as offensive
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1880682 - Posted: 28 Jul 2017, 1:13:10 UTC - in response to Message 1880677.  

Restarted at 72.01 percent

That has already been discussed. You will have to talk to Petri about that one.
My suggestion would be to set the checkpoints to longer than your GPU takes to finish a task.

**** Detected setiathome_enhanced_v7 task ****

What about detecting the WU as v7 and not v8?
Is that what causes the problem after the restart, or is it a different issue entirely?
Grant
Darwin NT
ID: 1880682 · Report as offensive
Jeff Buck
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1880709 - Posted: 28 Jul 2017, 3:09:07 UTC - in response to Message 1880682.  

Restarted at 72.01 percent

That has already been discussed. You will have to talk to Petri about that one.
My suggestion would be to set the checkpoints to longer than your GPU takes to finish a task.

**** Detected setiathome_enhanced_v7 task ****

What about detecting the WU as v7 and not v8?
Is that what causes the problem after the restart, or is it a different issue entirely?
That's an interesting observation. I hadn't noticed that discrepancy before on the Invalids I've been getting on my host 8289033. It appears that if a task restarts from scratch, it sticks with "Detected setiathome_enhanced_v8 task", but if it restarts from a checkpoint, it prints the "v7" line, as you spotted. Only the developer could say if that's a significant clue or not.

Host 8289033 is the only one of my 3 Linux boxes that gets those Invalids, which have been happening about 3-4 times a week since I started running the x41p_zi3v app. (The other 2 hosts are still running x41p_zi3t2b.) Inasmuch as I have 4 GPUs on that host, and it goes through a restart cycle every weekday evening, that's about 15-20% of the restarted tasks that go berserk. There seems to be a threshold at around 70% where, if a task restarts from a checkpoint below that point, it "detects" a mass of phantom Spikes, whereas above that point, it finds phantom Triplets. All 3 Linux hosts have the same 120 second checkpoint setting.
ID: 1880709 · Report as offensive
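
The 15-20% estimate in the post above can be reproduced with a few lines of arithmetic. This is only a sketch: it assumes each of the 4 GPUs has exactly one task in flight when the host restarts, so every weekday restart resumes 4 tasks. The constant and function names are made up for illustration.

```python
# Worked arithmetic for the 15-20% estimate.
# Assumption: one in-flight task per GPU at each restart.

GPUS = 4
RESTARTS_PER_WEEK = 5  # one restart cycle per weekday evening


def invalid_rate(invalids_per_week: int) -> float:
    """Fraction of restarted tasks that end up Invalid in a week."""
    restarted_tasks = GPUS * RESTARTS_PER_WEEK  # 20 restarted tasks per week
    return invalids_per_week / restarted_tasks


low, high = invalid_rate(3), invalid_rate(4)
print(f"{low:.0%} to {high:.0%} of restarted tasks")  # prints "15% to 20% of restarted tasks"
```

If more than one task per GPU were in flight at shutdown, the denominator would grow and the rate would drop accordingly.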
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1880714 - Posted: 28 Jul 2017, 3:33:32 UTC - in response to Message 1880709.  

The CUDA code is a kluge; some parts are from 2007. This is the part of the code that prints that line:
https://setisvn.ssl.berkeley.edu/trac/browser/branches/sah_v7_opt/Xbranch/client/alpha/PetriR_raw3/analyzeFuncs.cpp#L701
if (ac_fft_len) fprintf(stderr,"Detected setiathome_enhanced_v7 task.
It would be a very simple matter to change that 7 to an 8, but I doubt it would change a thing.
So, I wouldn't get very concerned about it. It's going to take much more than changing the number to fix the problem, and the problem existed back when v7 was current.
The overflows on restarted tasks have always been there...they were worse a while back.
ID: 1880714 · Report as offensive
Jeff Buck
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1880716 - Posted: 28 Jul 2017, 3:44:35 UTC - in response to Message 1880714.  

The CUDA code is a kluge; some parts are from 2007. This is the part of the code that prints that line:
https://setisvn.ssl.berkeley.edu/trac/browser/branches/sah_v7_opt/Xbranch/client/alpha/PetriR_raw3/analyzeFuncs.cpp#L701
if (ac_fft_len) fprintf(stderr,"Detected setiathome_enhanced_v7 task.
It would be a very simple matter to change that 7 to an 8, but I doubt it would change a thing.
So, I wouldn't get very concerned about it. It's going to take much more than changing the number to fix the problem, and the problem existed back when v7 was current.
I think the question, though, would be whether the program's taking an incorrect path following a restart, actually bringing it to the v7 instead of the v8 path. As I said, if a task restarts at the beginning and doesn't show a restart percentage, it still prints the v8 line.

The overflows on restarted tasks have always been there...they were worse a while back.
Not on Windows, as far as I know. Never had a single one in the 4+ years I've been active.
ID: 1880716 · Report as offensive
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1880718 - Posted: 28 Jul 2017, 3:51:30 UTC - in response to Message 1880716.  
Last modified: 28 Jul 2017, 4:03:25 UTC

The overflows on restarted tasks have always been there WITH THE SPECIAL APP...they were worse a while back.

Fixed it for you.
If you check the code, I think you'll find the Special code doesn't have another path. You might want to check it though, I could be wrong ;-)

BTW, a quick check would be to find the line that says,
fprintf(stderr,"Restarted at %.2f percent, with setiathome enhanced x41p_zi3v, Cuda %c.%c%c %s\n",
progress*100,custr[0],custr[2],custr[3],(CUDART_VERSION >= 6050) ? "special":"");
if (ac_fft_len) fprintf(stderr,"Detected setiathome_enhanced_v8 task.
I don't see one.
ID: 1880718 · Report as offensive
Jeff Buck
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1880723 - Posted: 28 Jul 2017, 4:08:22 UTC
Last modified: 28 Jul 2017, 4:23:19 UTC

Okay, I just restarted 8289033 a little early so I could easily spot today's four restarted tasks when they reported. I got the full range of results, I think.

Task 5901687112: Validated; originally started on a 750Ti, restarted on the 960 at 47.66%; shows "Detected setiathome_enhanced_v7 task" after restart; no apparent phantom signals

Task 5901695820: Pending; originally started on the 960, restarted on a 750Ti at 66.41%; shows "Detected setiathome_enhanced_v7 task" after restart; no apparent phantom signals

Task 5901695818: Pending; originally started on a 750Ti, restarted on a different 750Ti, apparently from beginning (no restart % shown); shows "Detected setiathome_enhanced_v8 task" after restart; no apparent phantom signals

Task 5901687309: Inconclusive; originally started on a 750Ti, restarted on a different 750Ti at 57.90%; shows "Detected setiathome_enhanced_v7 task" after restart; appears to have detected 17 phantom Triplets following restart and will obviously be marked Invalid once all is said and done

And, before someone asks, it isn't always the same GPU that exhibits the problem following a restart. It can happen on any one of the 4.

------------------

EDIT: To provide another data point, I just took a look at one of the tasks that restarted this evening on one of my other Linux hosts, 8253697, which is running x41p_zi3t2b. It also prints the v7 line following a checkpoint restart. The thing is, this host has never had the "phantom signals" problem.

Task 5902138952: Pending; originally started on one 980, restarted on the other 980 at 56.13%; shows "Detected setiathome_enhanced_v7 task" after restart; no apparent phantom signals
ID: 1880723 · Report as offensive
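
The per-task checks listed in the post above (was there a checkpoint restart, at what percentage, and which "Detected" line followed) can be pulled out of a task's stderr text automatically. The sketch below is one way to do it in Python. The regular expressions are built from the message fragments quoted in this thread. The function name, the tuple layout and the sample text are illustrative assumptions, and real stderr output may contain additional lines.

```python
import re
from typing import NamedTuple, Optional

# Patterns taken from the message text quoted in this thread.
RESTART_RE = re.compile(r"Restarted at (\d+\.\d+) percent")
DETECT_RE = re.compile(r"Detected setiathome_enhanced_(v\d+) task")


class RestartInfo(NamedTuple):
    restart_percent: Optional[float]  # None if no checkpoint restart was logged
    detected_versions: list           # version labels in order of appearance


def summarize_stderr(text: str) -> RestartInfo:
    """Extract the restart percentage and every 'Detected ... task' label."""
    restart = RESTART_RE.search(text)
    percent = float(restart.group(1)) if restart else None
    return RestartInfo(percent, DETECT_RE.findall(text))


# Hypothetical stderr excerpt assembled from fragments quoted in the thread.
sample = """\
**** Detected setiathome_enhanced_v8 task ****
Restarted at 57.90 percent, with setiathome enhanced x41p_zi3v
**** Detected setiathome_enhanced_v7 task ****
"""

info = summarize_stderr(sample)
print(info.restart_percent, info.detected_versions)
```

A task that restarted from the beginning would show no restart percentage, so `restart_percent` would come back as `None`.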
petri33
Volunteer tester
Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1880774 - Posted: 28 Jul 2017, 12:30:49 UTC

I'm going to look at the code when home.
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1880774 · Report as offensive
Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1880918 - Posted: 29 Jul 2017, 7:49:38 UTC

Thought I remembered reading about the "Detected setiathome_enhanced_v7 task" errors recently. Just checked my new special app Linux box and noticed a couple of errors. These two:
Task 5902461864

Task 5905308810

have the same errors talked about in the last few posts. Never had any errors of that kind on my Windows machines. I take it that the app is being looked at and there will be a possible update?
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1880918 · Report as offensive
Jeff Buck
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1880967 - Posted: 29 Jul 2017, 16:11:40 UTC - in response to Message 1880918.  

Thought I remembered reading about the "Detected setiathome_enhanced_v7 task" errors recently. Just checked my new special app Linux box and noticed a couple of errors. These two:
Task 5902461864

Task 5905308810

have the same errors talked about in the last few posts. Never had any errors of that kind on my Windows machines. I take it that the app is being looked at and there will be a possible update?
Although the "Detected setiathome_enhanced_v7 task" is apparently coming from the Special app when it's restarted from a checkpoint, the actual error you're getting on those two, "finish file present too long", is a long-standing BOINC problem. Your tasks actually completed successfully and wrote out their "finish file", but BOINC was shut down before it got around to noticing that the file was there. By the time BOINC was restarted, that 10 second window had long been exceeded and BOINC trashed the tasks. Nothing the application can do in this situation.
ID: 1880967 · Report as offensive
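
The timing issue described in the post above can be pictured with a very simplified model. The sketch below is not BOINC's actual code. It only encodes the description given here: a task is fine if the client notices its finish file within the 10 second window, and is errored out otherwise. The function and parameter names are hypothetical.

```python
FINISH_FILE_WINDOW_S = 10.0  # the 10 second window mentioned in the post above


def finish_file_outcome(written_at: float, checked_at: float,
                        window_s: float = FINISH_FILE_WINDOW_S) -> str:
    """Classify a finished task by how long its finish file sat unnoticed.

    Simplified model: both times are seconds measured on the same clock.
    """
    age = checked_at - written_at
    if age < 0:
        raise ValueError("finish file cannot be checked before it is written")
    return "ok" if age <= window_s else "finish file present too long"


# Client notices the file 2 s after the app wrote it.
print(finish_file_outcome(written_at=0.0, checked_at=2.0))
# Client was shut down and only came back an hour later.
print(finish_file_outcome(written_at=0.0, checked_at=3600.0))
```

In this model the application has already done its job in both cases. Only the delay on the client side differs, which matches the point that the app itself cannot fix this.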
petri33
Volunteer tester
Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1881670 - Posted: 2 Aug 2017, 18:48:21 UTC

Hi,

detected ... v7 ...

That is just an old message, and I have now changed it to say v8.

If you have trouble with restarted tasks, you could try setting 'write to disk at most every xxx seconds' in BOINC Manager to a number high enough that even the longest-running GPU task never checkpoints.
My computer finishes GPU tasks in under 300 seconds, so I can set it to 300 in BOINC.

Petri
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1881670 · Report as offensive
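
The workaround in the post above amounts to picking a checkpoint interval longer than the slowest GPU task on the host. A minimal Python sketch of that choice follows. The function name, the 60-second rounding step and the sample runtimes are illustrative assumptions, not part of BOINC or the Special app.

```python
def checkpoint_interval_s(task_runtimes_s, step_s=60):
    """Smallest multiple of step_s that is strictly longer than the slowest task."""
    longest = max(task_runtimes_s)
    return (int(longest // step_s) + 1) * step_s


# Hypothetical runtimes in seconds; the slowest task finishes just under 300 s.
print(checkpoint_interval_s([170, 80, 290]))  # prints 300
```

The resulting number is what would be typed into the 'write to disk at most every xxx seconds' preference. Note that a task which fails to finish before a shutdown then restarts from the beginning rather than from a checkpoint.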
Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1881718 - Posted: 3 Aug 2017, 1:56:56 UTC

Petri, do you have your latest special app hosted somewhere?

What is a sensible -pfb value for reference GTX970s? I have a command line currently at -autotune and -nobs. I searched in the top 100 computers for other 970 users and I only found Stephen, who is using the same defaults, and Mr. Kevvy, who is using autotune and a -pfb=32 setting. I think the -pfb value correlates directly to the number of compute units. I am unclear on what the -pfb parameter sets up in the card. A short explanation would be appreciated. I looked in the docs and they don't really explain what the parameters accomplish, just their syntax.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1881718 · Report as offensive
petri33
Volunteer tester
Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1881723 - Posted: 3 Aug 2017, 3:06:23 UTC - in response to Message 1881718.  

Petri, do you have your latest special app hosted somewhere?

What is a sensible -pfb value for reference GTX970s? I have a command line currently at -autotune and -nobs. I searched in the top 100 computers for other 970 users and I only found Stephen, who is using the same defaults, and Mr. Kevvy, who is using autotune and a -pfb=32 setting. I think the -pfb value correlates directly to the number of compute units. I am unclear on what the -pfb parameter sets up in the card. A short explanation would be appreciated. I looked in the docs and they don't really explain what the parameters accomplish, just their syntax.


Hi,

I do not have the latest (cuda9) version anywhere yet. It'll come in a week or so. I'll post the links here.

The -pfb can have 8, 16 or 32 on modern HW (7xx, 9xx & 10xx). It is short for Pulse Find Blocks per SM. The default is 4 if I remember correctly. Setting it to 8 or higher speeds up the computation. It is not autotuned yet. I use -pfb 32 on my computer.

Petri
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1881723 · Report as offensive
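
For anyone scripting their command line, the values listed in the post above can be checked before they go into app_info.xml. This is a small Python sketch under stated assumptions: the helper name is made up, and the space-separated form follows the "-pfb 32" spelling used in that post. An earlier post writes it as "-pfb=32", and which spellings the app accepts is not settled in this thread.

```python
ALLOWED_PFB = (8, 16, 32)  # values listed for 7xx, 9xx and 10xx cards


def pfb_args(pfb: int) -> list:
    """Return command-line tokens for a Pulse Find Blocks setting."""
    if pfb not in ALLOWED_PFB:
        raise ValueError(f"-pfb should be one of {ALLOWED_PFB}, got {pfb}")
    return ["-pfb", str(pfb)]


print(" ".join(pfb_args(32)))  # prints "-pfb 32"
```

The helper rejects 4 only because it is the reported default and so never needs to be passed explicitly, not because the app is known to refuse it.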
Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1881739 - Posted: 3 Aug 2017, 4:28:21 UTC - in response to Message 1881723.  

Thanks Petri, I mainly asked because of your post about fixing the outdated V7 message. I'll monitor the thread for the latest version once you think it is ready for release.

Thanks for the explanation about the -pfb setting. I went back to the docs for the x41Z app and refreshed my memory. I used to run -pfb=16 for the CUDA50 app on the 970s. I think I'll give that a try for the 970s on the special app.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1881739 · Report as offensive
petri33
Volunteer tester
Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1881779 - Posted: 3 Aug 2017, 8:52:09 UTC - in response to Message 1881739.  

Thanks Petri, I mainly asked because of your post about fixing the outdated V7 message. I'll monitor the thread for the latest version once you think it is ready for release.

Thanks for the explanation about the -pfb setting. I went back to the docs for the x41Z app and refreshed my memory. I used to run -pfb=16 for the CUDA50 app on the 970s. I think I'll give that a try for the 970s on the special app.


I hope you all can live with the message having a 'typo'. And as for -pfb N, I think you'll notice an improvement in speed.
And for those who have blocking sync enabled, I recommend disabling it if you are after the ultimate speed.

Petri
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1881779 · Report as offensive
Stephen "Heretic"
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1882045 - Posted: 4 Aug 2017, 13:30:58 UTC - in response to Message 1881779.  


I hope you all can live with the message having a 'typo'. And as for -pfb N, I think you'll notice an improvement in speed.
And for those who have blocking sync enabled, I recommend disabling it if you are after the ultimate speed.

Petri


. . I have added -pfb 16 to the app_info.xml for both machines and cannot say there is a huge improvement. So I upped the number for the 970s to 32 but still only a few secs off at best. Too small a change to be sure about. It could just be a small variation in the batch of tasks coming through at the moment. Still it is doing no harm so I will leave it there.

. . It was worth a try.

. . Normal Arecibo (NARA) tasks were 2 mins 48 - 53 secs. Now about 2 min 44-46 secs.

. . VHAR (Halflings) were 1 min 18-20 secs, now about 1 min 21-22 secs.

. . A similarly small change in GB run times.

Stephen

..
ID: 1882045 · Report as offensive
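
To put the run times in the post above on a common scale, the quoted ranges can be reduced to midpoints and compared as percentages. This is a rough sketch only. Using range midpoints is an assumption, the helper names are made up, and a handful of tasks is too small a sample to draw conclusions from.

```python
def midpoint_s(minutes: int, low_s: int, high_s: int) -> float:
    """Midpoint, in seconds, of a range such as '2 mins 48 - 53 secs'."""
    return minutes * 60 + (low_s + high_s) / 2


def percent_change(before_s: float, after_s: float) -> float:
    """Relative change in run time; negative means the task got faster."""
    return (after_s - before_s) / before_s * 100


# Ranges as quoted: NARA 2:48-2:53 before, 2:44-2:46 after;
# VHAR 1:18-1:20 before, 1:21-1:22 after.
nara = percent_change(midpoint_s(2, 48, 53), midpoint_s(2, 44, 46))
vhar = percent_change(midpoint_s(1, 18, 20), midpoint_s(1, 21, 22))
print(f"NARA: {nara:+.1f}%  VHAR: {vhar:+.1f}%")  # prints "NARA: -3.2%  VHAR: +3.2%"
```

On these figures the Arecibo tasks are about 3% faster and the VHAR tasks about 3% slower, which is consistent with the change being too small to separate from batch-to-batch variation.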
Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1882073 - Posted: 4 Aug 2017, 15:52:25 UTC

OK, you prompted me to bump my -pfb to 32 as well, since I know you also have 970s.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1882073 · Report as offensive
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1882136 - Posted: 4 Aug 2017, 21:45:16 UTC

It appears the new build of BOINC 7.2.47 is working very nicely. There are just a couple of minor changes over the current build. The new build uses the gpu_nvidia.cpp file from 7.4.53, which displays the nvidia_driver_version; the change from 7.6.33 that adds the repository driver library link, allowing boinc to see OpenCL with the repository driver; and embedded wxWidgets libraries that now contain the tiny libwx_gtk2_gl-2.8.a library. As far as I know there aren't any outstanding dependencies in the targeted Ubuntu systems, 12.04 and higher. In the newer systems such as 16.04 you need to choose Shut down connected client... in the Advanced menu to Stop boinc and the active tasks, or just Suspend the project before Exiting the Manager, which will leave boinc running harmlessly in the background until next use. This version of BOINC does Not contain the Tasks page display Bug which is present in newer versions of BOINC for Linux.

As soon as I test embedding the CUDA Special App in the BOINC package I'll be posting the All In One BOINC package. This will make running the Special App as easy as Installing the OS & Driver, Expanding the Download to your Home folder, and Double Clicking boincmgr. Quite an improvement in ease of use.
ID: 1882136 · Report as offensive
Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1882151 - Posted: 4 Aug 2017, 23:05:45 UTC - in response to Message 1882136.  

Thanks for the continued development and debugging of the Linux BOINC platform, TBar. I'm sure a lot of people will be interested in your all-in-one package. And I'm sure the fact that BOINC gets installed in the Home directory will make the permissions headache a thing of the past and will be greatly appreciated by everyone.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1882151 · Report as offensive