Linux CUDA 'Special' App finally available, featuring Low CPU use

Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1908405 - Posted: 22 Dec 2017, 3:52:16 UTC - in response to Message 1908403.  

Petri is the first, probably testing his new build with the new Cuda 9.1 toolkit.

I'm running SoG r 3500 on that machine

Jeff is running the old Cuda Special with cuda 8.0
ID: 1908405 · Report as offensive
Profile Jeff Buck Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1908406 - Posted: 22 Dec 2017, 4:09:11 UTC - in response to Message 1908405.  

Petri is the first, probably testing his new build with the new Cuda 9.1 toolkit.

I'm running SoG r 3500 on that machine

Jeff is running the old Cuda Special with cuda 8.0
Yep, I've got a different version of the Special App on each of my 3 Linux hosts. To some extent, I figure that helps to make the changes from one version to the next more obvious. However, that zi3t2b is definitely outdated so I should probably replace it. The pulsefinding in the newer zi3v versions seems to be better, though anything later than zi3v seems to break other things. So it goes.
ID: 1908406 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1908427 - Posted: 22 Dec 2017, 12:22:58 UTC - in response to Message 1908406.  

Yep, I've got a different version of the Special App on each of my 3 Linux hosts. To some extent, I figure that helps to make the changes from one version to the next more obvious. However, that zi3t2b is definitely outdated so I should probably replace it. The pulsefinding in the newer zi3v versions seems to be better, though anything later than zi3v seems to break other things. So it goes.


. . For what it is worth I am getting very good results with zi3v Cuda80, though I am thinking about moving it up to the Cuda90 version as that seems to be just as reliable.

Stephen

:)
ID: 1908427 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1908449 - Posted: 22 Dec 2017, 17:24:33 UTC - in response to Message 1908427.  

. . For what it is worth I am getting very good results with zi3v Cuda80, though I am thinking about moving it up to the Cuda90 version as that seems to be just as reliable.

Stephen

:)

I've found the zi3v CUDA 9 version to be MUCH less prone to throwing out Inconclusives compared to the zi3v CUDA 8 app.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1908449 · Report as offensive
Profile Jeff Buck Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1908455 - Posted: 22 Dec 2017, 18:15:16 UTC - in response to Message 1908449.  

I've found the zi3v CUDA 9 version to be MUCH less prone to throwing out Inconclusives compared to the zi3v CUDA 8 app.
That's interesting. I've found it to be pretty much a tossup. I switched one of my boxes (8253697) to the zi3v Cuda9 on 23 Nov and noted the task breakdown as:
 Validation pending (1464) · Validation inconclusive (70) · Valid (1332)
I just looked at the current numbers and found:
 Validation pending (1690) · Validation inconclusive (74) · Valid (996)
The Inconclusives have certainly fluctuated over the course of the month since I made the change (much of which seems to be due to the server problems and outages), but the current numbers look like about what I've seen when the servers have been stable.

I think the Cuda9 version of zi3v has cut down on the Inconclusives due to Pulse differences, but because that version uses some of the zi3x routines, it's added some Inconclusives due to changes in the Spike and Autocorr reporting sequences by zi3x. The problem occurs on overflows where the signals are mostly Spikes and Autocorrs. Because they're reported in a different order, it tends to be a tossup when the 30-signal overflow condition is triggered as to whether the last signal reported is a Spike or an Autocorr. Up until then, the reported signals can all be matched up (with a little effort), but it's that difference in the last one (or sometimes two or more) that causes indigestion in the validator. Things might be fine for that particular WU if the cutoff was 31, or 29, or......but then it would just be a different WU that would throw a shoe.

I first reported on these types of Inconclusives in Message 1895798 a couple months ago. They appear to have started with zi3x, which is why the Cuda8 version of zi3v doesn't have them, but the Cuda9 version of zi3v is a bit of a hybrid and picks up the improved Pulse finding but also the degraded Spike/Autocorr sequencing.
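To make the ordering problem concrete, here is a minimal Python sketch. The signal names and the two reporting orders are invented for illustration and are not the actual zi3v or zi3x sequences; only the 30-signal cutoff comes from the discussion above.

```python
# Hypothetical illustration: both apps find the same 32 signals
# but report them in a different order before the overflow cutoff.
OVERFLOW_LIMIT = 30  # the 30-signal overflow condition


def report(signals, limit=OVERFLOW_LIMIT):
    """Return only the signals reported before the cutoff is hit."""
    return signals[:limit]


spikes = [f"spike_{i}" for i in range(16)]
autocorrs = [f"autocorr_{i}" for i in range(16)]

# App A (assumed order): all Spikes first, then Autocorrs.
app_a = report(spikes + autocorrs)

# App B (assumed order): Autocorrs and Spikes interleaved.
app_b = report([s for pair in zip(autocorrs, spikes) for s in pair])

# Signals that one result contains and the other does not.
only_a = sorted(set(app_a) - set(app_b))
only_b = sorted(set(app_b) - set(app_a))

print("only in A:", only_a)
print("only in B:", only_b)
```

Both results hold 30 signals and agree on 29 of them, but the last slot goes to a Spike in one and an Autocorr in the other. Moving the cutoff would not remove the mismatch in general; it would only move it to a different WU.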

BTW, I do still have one host (8289033) running the Cuda8 zi3v and its current task breakdown looks like this:
Validation pending (1351) · Validation inconclusive (74) · Valid (796)
It's a slightly lower-volume machine, so the current Inconclusives are just a hair more common than with the Cuda9 zi3v host, but probably not statistically significant. They're just Inconclusive for different reasons.
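For a rough side-by-side of the two hosts, the Inconclusive share can be worked out from the figures above. Dividing by pending + inconclusive + valid is an assumption made here for illustration, not an official project metric.

```python
def inconclusive_share(pending, inconclusive, valid):
    """Fraction of reported tasks currently sitting as Inconclusive.

    The denominator (pending + inconclusive + valid) is an assumed
    choice for this sketch.
    """
    return inconclusive / (pending + inconclusive + valid)


# Current numbers quoted in this post.
cuda9_share = inconclusive_share(pending=1690, inconclusive=74, valid=996)  # host 8253697
cuda8_share = inconclusive_share(pending=1351, inconclusive=74, valid=796)  # host 8289033

print(f"Cuda9 zi3v: {cuda9_share:.2%}")
print(f"Cuda8 zi3v: {cuda8_share:.2%}")
```

On these numbers the Cuda9 host comes out at roughly 2.7% and the Cuda8 host at roughly 3.3%, which fits the "just a hair more common" description.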
ID: 1908455 · Report as offensive
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1908458 - Posted: 22 Dec 2017, 18:35:27 UTC

My host running CUDA90:

State: All (6725) · In progress (536) · Validation pending (3367) · Validation inconclusive (126) · Valid (2647) · Invalid (0) · Error (49)

ID: 1908458 · Report as offensive
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1908462 - Posted: 22 Dec 2017, 19:01:39 UTC

I took a close look at my inconclusives and see that a lot of them are like this:

https://setiathome.berkeley.edu/workunit.php?wuid=2787543202

The WU crunched by the CUDA90 build shows:

Spike count:    5
Autocorr count: 0
Pulse count:    2
Triplet count:  19
Gaussian count: 0


and the one crunched by the wingmate host (CUDA42) in this particular case shows:

Spike count:    30
Autocorr count: 0
Pulse count:    0
Triplet count:  0
Gaussian count: 0


What is happening?
ID: 1908462 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1908464 - Posted: 22 Dec 2017, 19:04:01 UTC

My zi3v CUDA90 machine.
State: All (4167) · In progress (400) · Validation pending (2189) · Validation inconclusive (101) · Valid (1466) · Invalid (0) · Error (11)
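
For anyone who wants to compare these status lines without doing it by hand, here is a small Python sketch that parses one. The function name and the choice of denominator are assumptions for illustration; nothing here is a project-provided tool.

```python
import re


def parse_state(line):
    """Turn 'State: All (4167) · In progress (400) ...' into a dict of counts."""
    body = line.split(":", 1)[1]
    counts = {}
    for part in body.split("·"):
        match = re.fullmatch(r"\s*(.+?)\s*\((\d+)\)\s*", part)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


line = ("State: All (4167) · In progress (400) · Validation pending (2189) · "
        "Validation inconclusive (101) · Valid (1466) · Invalid (0) · Error (11)")
counts = parse_state(line)

# Assumed denominator: tasks that have been reported and not errored out.
reported = (counts["Validation pending"]
            + counts["Validation inconclusive"]
            + counts["Valid"])
share = counts["Validation inconclusive"] / reported
print(f"Inconclusive share: {share:.2%}")
```

For the line above this gives roughly 2.7%.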

Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1908464 · Report as offensive
Profile Jeff Buck Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1908485 - Posted: 22 Dec 2017, 21:24:43 UTC - in response to Message 1908462.  

...and the one crunched by the wingmate host (CUDA42) in this particular case shows:

Spike count:    30
Autocorr count: 0
Pulse count:    0
Triplet count:  0
Gaussian count: 0


What is happening?
Nothing for you to worry about there. Almost all of your wingmate's GPU tasks are ending up as Invalid with similar overflows. That machine is running an old NVIDIA GeForce 8400 GS which just doesn't seem to be working very well.
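
As a quick check on the two results quoted above, the signal counts can be summed and compared with the overflow cutoff. This is a sketch that assumes a result counts as an overflow once the total reaches 30 signals; the dictionary keys are names chosen for this example.

```python
OVERFLOW_LIMIT = 30  # assumed: total signal count at which a task overflows


def total_signals(counts):
    """Sum the per-type signal counts of one result."""
    return sum(counts.values())


def is_overflow(counts, limit=OVERFLOW_LIMIT):
    return total_signals(counts) >= limit


cuda90 = {"spike": 5, "autocorr": 0, "pulse": 2, "triplet": 19, "gaussian": 0}
cuda42 = {"spike": 30, "autocorr": 0, "pulse": 0, "triplet": 0, "gaussian": 0}

for name, counts in (("CUDA90", cuda90), ("CUDA42 wingmate", cuda42)):
    print(name, total_signals(counts), "overflow" if is_overflow(counts) else "normal")
```

The CUDA90 result totals 26 signals and completes normally, while the wingmate's result stops at exactly 30 Spikes, which is the overflow pattern described here.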
ID: 1908485 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1908492 - Posted: 22 Dec 2017, 21:44:57 UTC - in response to Message 1908485.  
Last modified: 22 Dec 2017, 22:04:31 UTC

Nothing for you to worry about there. Almost all of your wingmate's GPU tasks are ending up as Invalid with similar overflows. That machine is running an old NVIDIA GeForce 8400 GS which just doesn't seem to be working very well.


. . Did someone drop that wingmate a line to let them know their video drivers are wrong for that card? It will continue to produce bad results unless it is rectified or the card replaced.

[edit] . . Looking at his results the invalids seem to have stopped about 10 hours ago, so I am guessing someone has ...

Stephen

??
ID: 1908492 · Report as offensive
Profile Jeff Buck Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1908495 - Posted: 22 Dec 2017, 21:57:44 UTC - in response to Message 1908492.  

Unless the project implements some sort of automated notification to users producing bad output, even the piecemeal approach of sending PMs is not going to work when the computer is registered to Mr. A. Nonny Mouse. ;^)
ID: 1908495 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1908497 - Posted: 22 Dec 2017, 22:07:52 UTC - in response to Message 1908495.  
Last modified: 22 Dec 2017, 22:14:09 UTC

Unless the project implements some sort of automated notification to users producing bad output, even the piecemeal approach of sending PMs is not going to work when the computer is registered to Mr. A. Nonny Mouse. ;^)


. . Some people are clearly shy :)

. . But either someone did get through or the operator detected the issue themselves. The invalids seem to have stopped about 10 hours ago (for the moment at least). Here's hoping :) . Actually there are no results since then, so maybe that rig simply hasn't logged in since that time.

Stephen

:)
ID: 1908497 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1908818 - Posted: 24 Dec 2017, 19:31:24 UTC - in response to Message 1908497.  


. . Some people are clearly shy :)

. . But either someone did get through or the operator detected the issue themselves. The invalids seem to have stopped about 10 hours ago (for the moment at least). Here's hoping :) . Actually there are no results since then, so maybe that rig simply hasn't logged in since that time.

Stephen

:)


. . I have since re-checked that rig and it is still chucking out rubbish. Someone needs to reach them.

Stephen

:)
ID: 1908818 · Report as offensive
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22160
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1908890 - Posted: 25 Dec 2017, 11:55:02 UTC

Two solutions come to mind:
No "Anne Onee Moose" accounts.
Second, automatically stop sending tasks to computers with high invalid rates, in the same way that computers with high computing error rates get reduced task allowances.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1908890 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1908902 - Posted: 25 Dec 2017, 14:18:24 UTC - in response to Message 1908890.  

Two solutions come to mind:
No "Anne Onee Moose" accounts.
Second, automatically stop sending tasks to computers with high invalid rates, in the same way that computers with high computing error rates get reduced task allowances.


. . I can't see the first one being implemented, but the second one should be feasible.

. . Damage control ...

Stephen

..
ID: 1908902 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1908955 - Posted: 25 Dec 2017, 22:55:26 UTC - in response to Message 1908890.  

Two solutions come to mind:
No "Anne Onee Moose" accounts.
Second, automatically stop sending tasks to computers with high invalid rates, in the same way that computers with high computing error rates get reduced task allowances.

Not entirely sure of the reference with the first suggestion. I think it refers to drive-by spammers creating accounts to spam. We should do the same as MilkyWay just did and not allow accounts to post or form teams unless the host has produced 1 credit.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1908955 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1908963 - Posted: 26 Dec 2017, 0:07:32 UTC - in response to Message 1908955.  

Not entirely sure of the reference with the first suggestion. I think it refers to drive-by spammers creating accounts to spam. We should do the same as MilkyWay just did and not allow accounts to post or form teams unless the host has produced 1 credit.


. . Hi Keith,

. . Actually we are talking about dysfunctional hosts producing large numbers of bad results and causing a matching number of resends. When they are set up as anonymous, as in the case in point, you cannot contact them to point out that they need to take action about their problem. So Rob was suggesting that anonymous accounts should not be allowed, so that hosts can be contacted, and/or that bad results should restrict their downloads, so that if the problem persists they will be progressively starved of new work.

Stephen

:)
ID: 1908963 · Report as offensive
Profile petri33
Volunteer tester

Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1908970 - Posted: 26 Dec 2017, 1:04:34 UTC - in response to Message 1908890.  

Two solutions come to mind:
No "Anne Onee Moose" accounts.
Second automatically stop sending tasks to computers with high invalid rates in the same way that high computing error rate computers get reduced task allowances.


I like the word rate (not count) as in ratio compared to the number of good ones or RAC.
A totally valid host can get into an error mode sometimes. It should have a possibility to recover rather hastily.
A good computer can have a badly behaving new version (a brand new beta) of an executable, but as soon as it has been fixed it should be allowed to recover.
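
A rate that allows quick recovery could be sketched as a ratio over a host's most recent results rather than a lifetime count. Everything below (the class name, the 100-task window, the test sequence) is hypothetical and is not how the project's scheduler is known to work.

```python
from collections import deque


class HostRecord:
    """Tracks the invalid rate over a host's most recent validated tasks."""

    def __init__(self, window=100):
        # True = valid result, False = invalid result; old entries fall off.
        self.recent = deque(maxlen=window)

    def record(self, valid):
        self.recent.append(valid)

    def invalid_rate(self):
        if not self.recent:
            return 0.0
        return self.recent.count(False) / len(self.recent)


host = HostRecord(window=100)

for _ in range(100):          # a long run of good results
    host.record(True)
for _ in range(20):           # a bad beta build produces 20 invalids
    host.record(False)
rate_bad = host.invalid_rate()

for _ in range(100):          # the build is fixed
    host.record(True)
rate_recovered = host.invalid_rate()

print(rate_bad, rate_recovered)
```

Because only the last 100 results count, the bad stretch pushes the rate to 20%, and 100 good results after the fix bring it back to zero. A lifetime count would keep penalizing the host long after the problem was gone.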

Those new computers with RAC < xx and valid tasks < yy could get spanked, ignored, rectified or coerced to use more suitable running environments (less dust, right drivers, ...), settings (no OC or OV) or correct versions (up to par and date) of drivers and software.

I know my host can be seen/taken as one of the worst pollutants/misbehaving ones but I have a cause and an intent to make it (and the whole computing) better.

And a good night to all of you at 3 AM here.

Petri
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1908970 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1908975 - Posted: 26 Dec 2017, 1:37:50 UTC

I just wish the mechanism that has supposedly been in place for the longest time actually worked. But I agree, there should be new mechanisms put into play to prevent these bad hosts from getting any more work until they clean up their act. As I said, it would be enough if the existing mechanism worked consistently: report a bad task and get penalized, then have the penalty reduced in real time for each valid task reported.
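
The penalize-then-recover idea could look something like the Python sketch below. The class, the halve-on-invalid and double-on-valid rule, and the limits of 100 and 1 are all assumptions picked for illustration; this is not the project's actual quota code.

```python
class TaskQuota:
    """Hypothetical per-host task allowance that reacts to each reported result."""

    def __init__(self, maximum=100, minimum=1):
        self.maximum = maximum
        self.minimum = minimum
        self.limit = maximum

    def report(self, valid):
        if valid:
            # Each valid result reduces the penalty (assumed rule: double).
            self.limit = min(self.maximum, self.limit * 2)
        else:
            # Each bad result is penalized (assumed rule: halve).
            self.limit = max(self.minimum, self.limit // 2)
        return self.limit


quota = TaskQuota()
after_bad = [quota.report(False) for _ in range(5)]
after_good = [quota.report(True) for _ in range(6)]

print(after_bad)
print(after_good)
```

With these assumed rules, a host that keeps returning bad results is starved down to a trickle within a handful of tasks, while a host that is fixed climbs back to its full allowance after a few valid results.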
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1908975 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1909005 - Posted: 26 Dec 2017, 7:12:21 UTC - in response to Message 1908970.  

I like the word rate (not count) as in ratio compared to the number of good ones or RAC.
A totally valid host can get into an error mode sometimes. It should have a possibility to recover rather hastily.
A good computer can have a badly behaving new version (a brand new beta) of an executable, but as soon as it has been fixed it should be allowed to recover.

Those new computers with RAC < xx and valid tasks < yy could get spanked, ignored, rectified or coerced to use more suitable running environments (less dust, right drivers, ...), settings (no OC or OV) or correct versions (up to par and date) of drivers and software.

I know my host can be seen/taken as one of the worst pollutants/misbehaving ones but I have a cause and an intent to make it (and the whole computing) better.

And a good night to all of you at 3 AM here.

Petri


. . Hi Petri,

. . You have little to be worried about. As an advanced experimental platform, your rig has error and invalid rates of about 5% or less; that is 1000% better than many of the delinquent rigs out there in cyberspace. Some of them have valid rates lower than that :)

Stephen

:)
ID: 1909005 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.