Linux CUDA 'Special' App finally available, featuring Low CPU use

Zalster Volunteer tester Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242	Message 1908405 - Posted: 22 Dec 2017, 3:52:16 UTC - in response to Message 1908403. Petri is the first, testing his new build, probably with the new Cuda 9.1 toolkit. I'm running SoG r 3500 on that machine. Jeff is running the old Cuda Special with cuda 8.0.

Jeff Buck Volunteer tester Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0	Message 1908406 - Posted: 22 Dec 2017, 4:09:11 UTC - in response to Message 1908405. Petri is the first, testing his new build, probably with the new Cuda 9.1 toolkit. I'm running SoG r 3500 on that machine. Jeff is running the old Cuda Special with cuda 8.0. Yep, I've got a different version of the Special App on each of my 3 Linux hosts. To some extent, I figure that helps to make the changes from one version to the next more obvious. However, that zi3t2b is definitely outdated so I should probably replace it. The pulsefinding in the newer zi3v versions seems to be better, though anything later than zi3v seems to break other things. So it goes.

Stephen "Heretic" Volunteer tester Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628	Message 1908427 - Posted: 22 Dec 2017, 12:22:58 UTC - in response to Message 1908406. Petri is the first, testing his new build, probably with the new Cuda 9.1 toolkit. I'm running SoG r 3500 on that machine. Jeff is running the old Cuda Special with cuda 8.0. Yep, I've got a different version of the Special App on each of my 3 Linux hosts. To some extent, I figure that helps to make the changes from one version to the next more obvious. However, that zi3t2b is definitely outdated so I should probably replace it. The pulsefinding in the newer zi3v versions seems to be better, though anything later than zi3v seems to break other things. So it goes. . . For what it is worth I am getting very good results with zi3v Cuda80, though I am thinking about moving it up to the Cuda90 version as that seems to be just as reliable. Stephen :)

Keith Myers Volunteer tester Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1908449 - Posted: 22 Dec 2017, 17:24:33 UTC - in response to Message 1908427. Petri is the first, testing his new build, probably with the new Cuda 9.1 toolkit. I'm running SoG r 3500 on that machine. Jeff is running the old Cuda Special with cuda 8.0. Yep, I've got a different version of the Special App on each of my 3 Linux hosts. To some extent, I figure that helps to make the changes from one version to the next more obvious. However, that zi3t2b is definitely outdated so I should probably replace it. The pulsefinding in the newer zi3v versions seems to be better, though anything later than zi3v seems to break other things. So it goes. . . For what it is worth I am getting very good results with zi3v Cuda80, though I am thinking about moving it up to the Cuda90 version as that seems to be just as reliable. Stephen :) I've found the zi3v CUDA 9 version to be MUCH less prone to throwing out Inconclusives compared to the zi3v CUDA 8 app. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association)

Jeff Buck Volunteer tester Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0	Message 1908455 - Posted: 22 Dec 2017, 18:15:16 UTC - in response to Message 1908449. I've found the zi3v CUDA 9 version to be MUCH less prone to throwing out Inconclusives compared to the zi3v CUDA 8 app. That's interesting. I've found it to be pretty much a tossup. I switched one of my boxes (8253697) to the zi3v Cuda9 on 23 Nov and noted the task breakdown as: Validation pending (1464) · Validation inconclusive (70) · Valid (1332). I just looked at the current numbers and found: Validation pending (1690) · Validation inconclusive (74) · Valid (996). The Inconclusives have certainly fluctuated over the course of the month since I made the change (much of which seems to be due to the server problems and outages), but the current numbers look like about what I've seen when the servers have been stable. I think the Cuda9 version of zi3v has cut down on the Inconclusives due to Pulse differences, but because that version uses some of the zi3x routines, it's added some Inconclusives due to changes in the Spike and Autocorr reporting sequences by zi3x. The problem occurs on overflows where the signals are mostly Spikes and Autocorrs. Because they're reported in a different order, it tends to be a tossup when the 30-signal overflow condition is triggered as to whether the last signal reported is a Spike or an Autocorr. Up until then, the reported signals can all be matched up (with a little effort), but it's that difference in the last one (or sometimes two or more) that causes indigestion in the validator. Things might be fine for that particular WU if the cutoff was 31, or 29, or......but then it would just be a different WU that would throw a shoe. I first reported on these types of Inconclusives in Message 1895798 a couple months ago.
They appear to have started with zi3x, which is why the Cuda8 version of zi3v doesn't have them, but the Cuda9 version of zi3v is a bit of a hybrid and picks up the improved Pulse finding but also the degraded Spike/Autocorr sequencing. BTW, I do still have one host (8289033) running the Cuda8 zi3v and its current task breakdown looks like this: Validation pending (1351) · Validation inconclusive (74) · Valid (796). It's a slightly lower-volume machine, so the current Inconclusives are just a hair more common than with the Cuda9 zi3v host, but probably not statistically significant. They're just Inconclusive for different reasons.
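The overflow mismatch described in the post above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the Special App's actual logic: the per-chunk signal lists, the `reported` helper, and the signal labels are all hypothetical. The only detail taken from the thread is the 30-signal overflow cutoff. Two apps find exactly the same signals but emit Spikes and Autocorrs in a different order within each chunk of work, so the cutoff lands on a different final signal.

```python
OVERFLOW_LIMIT = 30  # cutoff mentioned in the thread


def reported(chunks, spikes_first, limit=OVERFLOW_LIMIT):
    """Return the signals a (hypothetical) app reports before overflowing.

    chunks: list of (spikes, autocorrs) pairs, one pair per chunk of work.
    spikes_first: whether this app emits Spikes before Autocorrs in a chunk.
    """
    out = []
    for spikes, autocorrs in chunks:
        ordered = spikes + autocorrs if spikes_first else autocorrs + spikes
        out.extend(ordered)
    return out[:limit]


# Hypothetical noisy workunit: the first chunk yields two Spikes and one
# Autocorr, every later chunk yields one of each.
chunks = [(["S0a", "S0b"], ["A0"])]
chunks += [([f"S{i}"], [f"A{i}"]) for i in range(1, 20)]

app_a = reported(chunks, spikes_first=True)
app_b = reported(chunks, spikes_first=False)

common = set(app_a) & set(app_b)
only_a = set(app_a) - set(app_b)
only_b = set(app_b) - set(app_a)

print(len(common), sorted(only_a), sorted(only_b))
```

In this toy case the first 29 reported signals can be paired up between the two apps, and only the 30th differs (a Spike from one app, an Autocorr from the other), which is the kind of mismatch the post attributes to the changed reporting sequence.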

juan BFP Volunteer tester Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799	Message 1908458 - Posted: 22 Dec 2017, 18:35:27 UTC My host running CUDA90. State: All (6725) · In progress (536) · Validation pending (3367) · Validation inconclusive (126) · Valid (2647) · Invalid (0) · Error (49)

juan BFP Volunteer tester Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799	Message 1908462 - Posted: 22 Dec 2017, 19:01:39 UTC I took a close look at my inconclusives and see that a lot of them are like this: https://setiathome.berkeley.edu/workunit.php?wuid=2787543202 The WU crunched by the CUDA90 build shows: Spike count: 5 Autocorr count: 0 Pulse count: 2 Triplet count: 19 Gaussian count: 0 and the one crunched by the wingmate host (CUDA42 in this particular case) shows: Spike count: 30 Autocorr count: 0 Pulse count: 0 Triplet count: 0 Gaussian count: 0 What is happening?

Keith Myers Volunteer tester Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1908464 - Posted: 22 Dec 2017, 19:04:01 UTC My zi3v CUDA90 machine. State: All (4167) · In progress (400) · Validation pending (2189) · Validation inconclusive (101) · Valid (1466) · Invalid (0) · Error (11) Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association)
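The task breakdowns posted so far cover hosts with different volumes, so raw Inconclusive counts are hard to compare. A minimal sketch of one way to put them on a common footing follows. The choice of denominator (pending + inconclusive + valid) is an assumption made here for illustration, not something the posters specified, and `inconclusive_share` is a hypothetical helper name. The counts themselves are copied from the posts above.

```python
def inconclusive_share(pending, inconclusive, valid):
    """Fraction of returned tasks currently marked 'Validation inconclusive'.

    Assumption: the denominator is pending + inconclusive + valid.
    """
    total = pending + inconclusive + valid
    return inconclusive / total


# (pending, inconclusive, valid) as posted in this thread
hosts = {
    "Jeff, zi3v Cuda9 (host 8253697)": (1690, 74, 996),
    "Jeff, zi3v Cuda8 (host 8289033)": (1351, 74, 796),
    "juan, CUDA90": (3367, 126, 2647),
    "Keith, zi3v CUDA90": (2189, 101, 1466),
}

for name, counts in hosts.items():
    print(f"{name}: {inconclusive_share(*counts):.2%}")
```

With this denominator all four hosts come out at a few percent, which fits the remark that the Cuda8 and Cuda9 differences are probably not statistically significant. A different denominator, for example valid tasks only, would shift the numbers.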

Jeff Buck Volunteer tester Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0	Message 1908485 - Posted: 22 Dec 2017, 21:24:43 UTC - in response to Message 1908462. ...and the one crunched by the wingmate host (CUDA42 in this particular case) shows: Spike count: 30 Autocorr count: 0 Pulse count: 0 Triplet count: 0 Gaussian count: 0 What is happening? Nothing for you to worry about there. Almost all of your wingmate's GPU tasks are ending up as Invalid with similar overflows. That machine is running an old NVIDIA GeForce 8400 GS which just doesn't seem to be working very well.

Stephen "Heretic" Volunteer tester Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628	Message 1908492 - Posted: 22 Dec 2017, 21:44:57 UTC - in response to Message 1908485. Last modified: 22 Dec 2017, 22:04:31 UTC ...and the one crunched by the wingmate host (CUDA42 in this particular case) shows: Spike count: 30 Autocorr count: 0 Pulse count: 0 Triplet count: 0 Gaussian count: 0 What is happening? Nothing for you to worry about there. Almost all of your wingmate's GPU tasks are ending up as Invalid with similar overflows. That machine is running an old NVIDIA GeForce 8400 GS which just doesn't seem to be working very well. . . Did someone drop that wingmate a line to let them know their video drivers are wrong for that card? It will continue to produce bad results unless it is rectified or the card replaced. [edit] . . Looking at his results the invalids seem to have stopped about 10 hours ago, so I am guessing someone has ... Stephen ??

Jeff Buck Volunteer tester Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0	Message 1908495 - Posted: 22 Dec 2017, 21:57:44 UTC - in response to Message 1908492. Unless the project implements some sort of automated notification to users producing bad output, even the piecemeal approach of sending PMs is not going to work when the computer is registered to Mr. A. Nonny Mouse. ;^)

Stephen "Heretic" Volunteer tester Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628	Message 1908497 - Posted: 22 Dec 2017, 22:07:52 UTC - in response to Message 1908495. Last modified: 22 Dec 2017, 22:14:09 UTC Unless the project implements some sort of automated notification to users producing bad output, even the piecemeal approach of sending PMs is not going to work when the computer is registered to Mr. A. Nonny Mouse. ;^) . . Some people are clearly shy :) . . But either someone did get through or the operator detected the issue themselves. The invalids seem to have stopped about 10 hours ago (for the moment at least). Here's hoping :) . Actually there are no results since then, so maybe that rig simply hasn't logged in since that time. Stephen :)

Stephen "Heretic" Volunteer tester Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628	Message 1908818 - Posted: 24 Dec 2017, 19:31:24 UTC - in response to Message 1908497. Unless the project implements some sort of automated notification to users producing bad output, even the piecemeal approach of sending PMs is not going to work when the computer is registered to Mr. A. Nonny Mouse. ;^) . . Some people are clearly shy :) . . But either someone did get through or the operator detected the issue themselves. The invalids seem to have stopped about 10 hours ago (for the moment at least). Here's hoping :) . Actually there are no results since then, so maybe that rig simply hasn't logged in since that time. Stephen :) . . I have since re-checked that rig and it is still chucking out rubbish. Someone needs to reach them. Stephen :)

rob smith Volunteer moderator Volunteer tester Joined: 7 Mar 03 Posts: 22190 Credit: 416,307,556 RAC: 380	Message 1908890 - Posted: 25 Dec 2017, 11:55:02 UTC Two solutions come to mind. First, no "Anne Onee Moose" accounts. Second, automatically stop sending tasks to computers with high invalid rates, in the same way that computers with high computing error rates get reduced task allowances. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe?

Stephen "Heretic" Volunteer tester Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628	Message 1908902 - Posted: 25 Dec 2017, 14:18:24 UTC - in response to Message 1908890. Two solutions come to mind. First, no "Anne Onee Moose" accounts. Second, automatically stop sending tasks to computers with high invalid rates, in the same way that computers with high computing error rates get reduced task allowances. . . I can't see the first one being implemented, but the second one should be feasible. . . Damage control ... Stephen ..

Keith Myers Volunteer tester Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1908955 - Posted: 25 Dec 2017, 22:55:26 UTC - in response to Message 1908890. Two solutions come to mind. First, no "Anne Onee Moose" accounts. Second, automatically stop sending tasks to computers with high invalid rates, in the same way that computers with high computing error rates get reduced task allowances. Not entirely sure of the reference with the first suggestion. I think it refers to drive-by spammers creating accounts to spam. We should do the same as MilkyWay just did and not allow accounts to post or form teams unless the host has produced 1 credit. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association)

Stephen "Heretic" Volunteer tester Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628	Message 1908963 - Posted: 26 Dec 2017, 0:07:32 UTC - in response to Message 1908955. Two solutions come to mind. First, no "Anne Onee Moose" accounts. Second, automatically stop sending tasks to computers with high invalid rates, in the same way that computers with high computing error rates get reduced task allowances. Not entirely sure of the reference with the first suggestion. I think it refers to drive-by spammers creating accounts to spam. We should do the same as MilkyWay just did and not allow accounts to post or form teams unless the host has produced 1 credit. . . Hi Keith, . . Actually we are talking about dysfunctional hosts producing large numbers of bad results and causing an equal number of resends. When they are set up as anonymous, as in the case in point, you cannot contact them to point out that they need to take action about their problem. So Rob was suggesting that anonymous accounts should not be allowed, so that the owners can be contacted, and/or that bad results should restrict a host's downloads, so that if the problem persists it is progressively starved of new work. Stephen :)

petri33 Volunteer tester Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156	Message 1908970 - Posted: 26 Dec 2017, 1:04:34 UTC - in response to Message 1908890. Two solutions come to mind. First, no "Anne Onee Moose" accounts. Second, automatically stop sending tasks to computers with high invalid rates, in the same way that computers with high computing error rates get reduced task allowances. I like the word rate (not count), as in the ratio compared to the number of good ones or RAC. A totally valid host can get into an error mode sometimes. It should have a possibility to recover rather hastily. A good computer can have a badly behaving new (brand-new beta) version of an executable, but as soon as it has been fixed it should be allowed to recover. Those new computers with RAC < xx and valid tasks < yy could get spanked, ignored, rectified or coerced to use more suitable running environments (less dust, right drivers, ...), settings (no OC or OV) or correct versions (up to par and date) of drivers and software. I know my host can be seen/taken as one of the worst pollutants/misbehaving ones but I have a cause and an intent to make it (and the whole computing) better. And a good night to all of you at 3 AM here. Petri To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
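The distinction drawn above between an invalid rate and an invalid count, plus the wish for quick recovery, can be sketched with a sliding window over a host's most recent results. This is an illustrative sketch only: the class name `RecentInvalidRate`, the window size, and the threshold are hypothetical and do not describe how the project's servers actually work.

```python
from collections import deque


class RecentInvalidRate:
    """Track the invalid rate over a host's most recent results."""

    def __init__(self, window, max_rate):
        # True = valid result, False = invalid result.
        # Old results fall out of the window automatically.
        self.results = deque(maxlen=window)
        self.max_rate = max_rate

    def record(self, is_valid):
        self.results.append(is_valid)

    def rate(self):
        if not self.results:
            return 0.0
        return self.results.count(False) / len(self.results)

    def restricted(self):
        return self.rate() > self.max_rate


# Hypothetical numbers: judge a host on its last 20 results and restrict it
# when more than a quarter of them are invalid.
host = RecentInvalidRate(window=20, max_rate=0.25)

for _ in range(10):   # a bad beta build produces a run of invalids
    host.record(False)
print(host.restricted())

for _ in range(20):   # the build is fixed and valid results return
    host.record(True)
print(host.restricted())
```

Because only recent results count, a host that has been fixed clears the restriction after one window of valid work, whereas a lifetime invalid count would keep penalizing it.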

Keith Myers Volunteer tester Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1908975 - Posted: 26 Dec 2017, 1:37:50 UTC I just wish the mechanism that has supposedly been in place for the longest time actually worked. But I agree, there should be new mechanisms put into play to prevent these bad hosts from getting any more work until they clean up their act. As I said, if only the existing mechanism worked consistently: report a bad task and get penalized, then have the penalty reduced in real time for each valid task reported. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association)
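The penalize-then-recover behaviour wished for above can be sketched as a per-host task allowance that shrinks on each bad result and grows back on each valid one. The `HostQuota` class and the halve/double rule are hypothetical choices made for illustration. This is not the project's actual scheduler code.

```python
class HostQuota:
    """Hypothetical per-host daily task allowance."""

    def __init__(self, max_per_day, floor=1):
        self.max_per_day = max_per_day
        self.floor = floor            # never starve a host completely
        self.allowed = max_per_day

    def on_invalid(self):
        # Penalty: halve the allowance, but keep at least `floor` tasks
        # so the host can still prove it has been fixed.
        self.allowed = max(self.floor, self.allowed // 2)

    def on_valid(self):
        # Recovery: double the allowance, capped at the normal maximum.
        self.allowed = min(self.max_per_day, self.allowed * 2)


quota = HostQuota(max_per_day=100)

for _ in range(10):       # a misbehaving host returns ten bad results
    quota.on_invalid()
print(quota.allowed)

for _ in range(7):        # after a fix, valid results restore the allowance
    quota.on_valid()
print(quota.allowed)
```

Keeping a non-zero floor is a deliberate design choice in this sketch: a host that is cut off entirely can never return the valid results needed to earn its allowance back.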

Stephen "Heretic" Volunteer tester Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628	Message 1909005 - Posted: 26 Dec 2017, 7:12:21 UTC - in response to Message 1908970. Two solutions come to mind. First, no "Anne Onee Moose" accounts. Second, automatically stop sending tasks to computers with high invalid rates, in the same way that computers with high computing error rates get reduced task allowances. I like the word rate (not count), as in the ratio compared to the number of good ones or RAC. A totally valid host can get into an error mode sometimes. It should have a possibility to recover rather hastily. A good computer can have a badly behaving new (brand-new beta) version of an executable, but as soon as it has been fixed it should be allowed to recover. Those new computers with RAC < xx and valid tasks < yy could get spanked, ignored, rectified or coerced to use more suitable running environments (less dust, right drivers, ...), settings (no OC or OV) or correct versions (up to par and date) of drivers and software. I know my host can be seen/taken as one of the worst pollutants/misbehaving ones but I have a cause and an intent to make it (and the whole computing) better. And a good night to all of you at 3 AM here. Petri . . Hi Petri, . . You have little to be worried about. As an advanced experimental platform your rig has error and invalid rates of about 5% or less; that is 1000% better than many of the delinquent rigs out there in cyberspace. Some of them have valid rates lower than that :) Stephen :)
