Linux CUDA 'Special' App finally available, featuring Low CPU use
Author | Message |
---|---|
Zalster Send message Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242 |
Petri is the first, testing his new build probably with the new Cuda 9.1 toolbox. I'm running SoG r 3500 on that machine. Jeff is running the old Cuda Special with Cuda 8.0. |
Jeff Buck Send message Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
Petri is the first, testing his new build probably with the new Cuda 9.1 toolbox. Yep, I've got a different version of the Special App on each of my 3 Linux hosts. To some extent, I figure that helps to make the changes from one version to the next more obvious. However, that zi3t2b is definitely outdated so I should probably replace it. The pulsefinding in the newer zi3v versions seems to be better, though anything later than zi3v seems to break other things. So it goes. |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Petri is the first, testing his new build probably with the new Cuda 9.1 toolbox. Yep, I've got a different version of the Special App on each of my 3 Linux hosts. To some extent, I figure that helps to make the changes from one version to the next more obvious. However, that zi3t2b is definitely outdated so I should probably replace it. The pulsefinding in the newer zi3v versions seems to be better, though anything later than zi3v seems to break other things. So it goes. . . For what it is worth I am getting very good results with zi3v Cuda80, though I am thinking about moving it up to the Cuda90 version as that seems to be just as reliable. Stephen :) |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Petri is the first, testing his new build probably with the new Cuda 9.1 toolbox. Yep, I've got a different version of the Special App on each of my 3 Linux hosts. To some extent, I figure that helps to make the changes from one version to the next more obvious. However, that zi3t2b is definitely outdated so I should probably replace it. The pulsefinding in the newer zi3v versions seems to be better, though anything later than zi3v seems to break other things. So it goes. I've found the zi3v CUDA 9 version to be MUCH less prone to throwing out Inconclusives compared to the zi3v CUDA 8 app. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Jeff Buck Send message Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
I've found the zi3v CUDA 9 version to be MUCH less prone to throwing out Inconclusives compared to the zi3v CUDA 8 app. That's interesting. I've found it to be pretty much a tossup. I switched one of my boxes (8253697) to the zi3v Cuda9 on 23 Nov and noted the task breakdown as: Validation pending (1464) · Validation inconclusive (70) · Valid (1332). I just looked at the current numbers and found: Validation pending (1690) · Validation inconclusive (74) · Valid (996). The Inconclusives have certainly fluctuated over the course of the month since I made the change (much of which seems to be due to the server problems and outages), but the current numbers look like about what I've seen when the servers have been stable. I think the Cuda9 version of zi3v has cut down on the Inconclusives due to Pulse differences, but because that version uses some of the zi3x routines, it's added some Inconclusives due to changes in the Spike and Autocorr reporting sequences by zi3x. The problem occurs on overflows where the signals are mostly Spikes and Autocorrs. Because they're reported in a different order, it tends to be a tossup when the 30-signal overflow condition is triggered as to whether the last signal reported is a Spike or an Autocorr. Up until then, the reported signals can all be matched up (with a little effort), but it's that difference in the last one (or sometimes two or more) that causes indigestion in the validator. Things might be fine for that particular WU if the cutoff was 31, or 29, or......but then it would just be a different WU that would throw a shoe. I first reported on these types of Inconclusives in Message 1895798 a couple months ago. They appear to have started with zi3x, which is why the Cuda8 version of zi3v doesn't have them, but the Cuda9 version of zi3v is a bit of a hybrid and picks up the improved Pulse finding but also the degraded Spike/Autocorr sequencing.
BTW, I do still have one host (8289033) running the Cuda8 zi3v and its current task breakdown looks like this: Validation pending (1351) · Validation inconclusive (74) · Valid (796). It's a slightly lower-volume machine, so the current Inconclusives are just a hair more common than with the Cuda9 zi3v host, but probably not statistically significant. They're just Inconclusive for different reasons. |
juan BFP Send message Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
My host running CUDA90. State: All (6725) · In progress (536) · Validation pending (3367) · Validation inconclusive (126) · Valid (2647) · Invalid (0) · Error (49) |
juan BFP Send message Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
I took a close look at my inconclusives and see that a lot of them are like this: https://setiathome.berkeley.edu/workunit.php?wuid=2787543202 The WU crunched by the CUDA90 build shows: Spike count: 5 Autocorr count: 0 Pulse count: 2 Triplet count: 19 Gaussian count: 0 and the one crunched by the wingmate host CUDA42 in this particular case shows: Spike count: 30 Autocorr count: 0 Pulse count: 0 Triplet count: 0 Gaussian count: 0 What is happening? |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
My zi3v CUDA90 machine. State: All (4167) · In progress (400) · Validation pending (2189) · Validation inconclusive (101) · Valid (1466) · Invalid (0) · Error (11) Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Jeff Buck Send message Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
...and the one crunched by the wingmate host CUDA42 in this particular case shows: Nothing for you to worry about there. Almost all of your wingmate's GPU tasks are ending up as Invalid with similar overflows. That machine is running an old NVIDIA GeForce 8400 GS which just doesn't seem to be working very well. |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
...and the one crunched by the wingmate host CUDA42 in this particular case shows: Nothing for you to worry about there. Almost all of your wingmate's GPU tasks are ending up as Invalid with similar overflows. That machine is running an old NVIDIA GeForce 8400 GS which just doesn't seem to be working very well. . . Did someone drop that wingmate a line to let them know their video drivers are wrong for that card? It will continue to produce bad results unless it is rectified or the card replaced. [edit] . . Looking at his results the invalids seem to have stopped about 10 hours ago, so I am guessing someone has ... Stephen ?? |
Jeff Buck Send message Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
Unless the project implements some sort of automated notification to users producing bad output, even the piecemeal approach of sending PMs is not going to work when the computer is registered to Mr. A. Nonny Mouse. ;^) |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Unless the project implements some sort of automated notification to users producing bad output, even the piecemeal approach of sending PMs is not going to work when the computer is registered to Mr. A. Nonny Mouse. ;^) . . Some people are clearly shy :) . . But either someone did get through or the operator detected the issue themselves. The invalids seem to have stopped about 10 hours ago (for the moment at least). Here's hoping :) . Actually there are no results since then, so maybe that rig simply hasn't logged in since that time. Stephen :) |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Unless the project implements some sort of automated notification to users producing bad output, even the piecemeal approach of sending PMs is not going to work when the computer is registered to Mr. A. Nonny Mouse. ;^) . . I have since re-checked that rig and it is still chucking out rubbish. Someone needs to reach them. Stephen :) |
rob smith Send message Joined: 7 Mar 03 Posts: 22529 Credit: 416,307,556 RAC: 380 |
Two solutions come to mind: First, no "Anne Onee Moose" accounts. Second, automatically stop sending tasks to computers with high invalid rates, in the same way that computers with high computing-error rates get reduced task allowances. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Two solutions come to mind: . . I can't see the first one being implemented, but the second one should be feasible. . . Damage control ... Stephen .. |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Two solutions come to mind: Not entirely sure of the reference with the first suggestion. I think it refers to drive-by spammers creating accounts to spam. We should do the same as MilkyWay just did and not allow accounts to post or form teams unless the host has produced 1 credit. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Two solutions come to mind: . . Hi Keith, . . Actually we are talking about dysfunctional hosts producing large numbers of bad results and causing a matching number of resends. When they are set up as anonymous, as was the case in point, you cannot contact them to point out that they need to take action about their problem. So Rob was suggesting that anonymous accounts should not be allowed, so that they can be contacted, and/or that their bad results should cause them to be restricted on downloads, so that if the problem persists they will be progressively starved of new work. Stephen :) |
petri33 Send message Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
Two solutions come to mind: I like the word rate (not count), as in a ratio compared to the number of good ones or to RAC. A totally valid host can get into an error mode sometimes. It should have a chance to recover rather hastily. A good computer can have a badly behaving new (brand-new beta) version of an executable, but as soon as it has been fixed it should be allowed to recover. Those new computers with RAC < xx and valid tasks < yy could get spanked, ignored, rectified or coerced to use more suitable running environments (less dust, right drivers, ...), settings (no OC or OV) or correct versions (up to par and date) of drivers and software. I know my host can be seen/taken as one of the worst pollutants/misbehaving ones, but I have a cause and an intent to make it (and the whole computing) better. And a good night to all of you at 3 AM here. Petri To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
I just wish the mechanism that has supposedly been in place for the longest time actually worked. But I agree, there should be new mechanisms put into play to prevent these bad hosts from getting any more work until they clean up their act. As I said, I'd be happy if the existing mechanism worked consistently: report a bad task and get penalized, with the penalty reduced in real time for each valid task reported. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Two solutions come to mind: . . Hi Petri, . . You have little to be worried about. As an advanced experimental platform your rig has error and invalid rates of about 5% or less; that is 1000% better than many of the delinquent rigs out there in cyberspace. Some of them have valid rates lower than that :) Stephen :) |
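
The throttling idea discussed in the last few posts (penalize on the invalid *rate* rather than the count, and let a host recover quickly once it returns valid work) can be sketched roughly as below. This is only an illustration and not the project's actual scheduler logic: the class name `HostQuota`, the 100-task ceiling, the 50-result window and the 0.5 rate threshold are all made-up assumptions.

```python
class HostQuota:
    """Hypothetical per-host task allowance driven by the recent invalid rate."""

    def __init__(self, max_tasks=100, min_tasks=1, window=50, rate_threshold=0.5):
        self.max_tasks = max_tasks            # assumed ceiling on the allowance
        self.min_tasks = min_tasks            # never zero, so a fixed host can prove itself
        self.window = window                  # number of recent results the rate is based on
        self.rate_threshold = rate_threshold  # assumed invalid rate that triggers throttling
        self.recent = []                      # True = valid result, False = invalid result
        self.allowance = max_tasks

    def report(self, valid):
        """Record one validated result and return the updated allowance."""
        self.recent.append(valid)
        self.recent = self.recent[-self.window:]
        invalid_rate = self.recent.count(False) / len(self.recent)
        if valid:
            # Each valid result doubles the allowance, so recovery is fast.
            self.allowance = min(self.max_tasks, self.allowance * 2)
        elif invalid_rate > self.rate_threshold:
            # Only a high invalid *rate* halves the allowance; a stray invalid does not.
            self.allowance = max(self.min_tasks, self.allowance // 2)
        return self.allowance


# A mostly healthy host with one invalid result keeps its full allowance.
good = HostQuota()
for i in range(20):
    good.report(valid=(i != 10))
print("good host allowance:", good.allowance)

# A host returning nothing but invalids is starved, then recovers once fixed.
bad = HostQuota()
for _ in range(7):
    bad.report(valid=False)
print("bad host allowance:", bad.allowance)
for _ in range(7):
    bad.report(valid=True)
print("recovered allowance:", bad.allowance)
```

With these assumed numbers a host that only returns invalids drops to a single task after seven results, while a host with an occasional invalid is left alone, which is the distinction Petri was asking for.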
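
Jeff Buck's earlier point about the 30-signal overflow cutoff can also be shown with a toy example. The workunit below is entirely made up (one lone Spike followed by Spike/Autocorr pairs), and the two reporting orders are stand-ins for two app versions, not the real zi3v or zi3x sequencing. It only demonstrates how the same set of signals, reported in a different order, can disagree at exactly the cutoff.

```python
from collections import Counter

OVERFLOW_LIMIT = 30  # the 30-signal overflow cutoff mentioned in the thread


def build_blocks(n_blocks):
    """Made-up workunit: block 0 holds one Spike, each later block one Spike and one Autocorr."""
    blocks = [[("Spike", 0)]]
    for b in range(1, n_blocks):
        blocks.append([("Spike", b), ("Autocorr", b)])
    return blocks


def report(blocks, autocorr_first, limit=OVERFLOW_LIMIT):
    """Flatten the blocks in one app's reporting order and stop at the overflow limit."""
    reported = []
    for block in blocks:
        ordered = list(reversed(block)) if autocorr_first else list(block)
        reported.extend(ordered)
    return reported[:limit]


blocks = build_blocks(20)
for limit in (29, 30, 31):
    app_a = report(blocks, autocorr_first=False, limit=limit)
    app_b = report(blocks, autocorr_first=True, limit=limit)
    print(
        limit,
        dict(Counter(kind for kind, _ in app_a)),
        dict(Counter(kind for kind, _ in app_b)),
        "same signals" if set(app_a) == set(app_b) else "last signal differs",
    )
```

In this toy case the two orders report the same signals at a cutoff of 29 or 31, but at 30 one ends on a Spike and the other on an Autocorr. A differently shaped workunit would simply trip at a different cutoff.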