Setting up Linux to crunch CUDA90 and above for Windows users
Author | Message |
---|---|
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
. . OK I just checked your older results and you were definitely running zi3v, so TBar must have included an updated app section to point that to 0.971b, well done! That makes it a lot safer for the hoi polloi to use :) Stephen :) |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Stephen, you are still really confused about the structure of an app_info, I guess. No, no section is included to handle zi3v; it is not necessary. Just the usual app_version statement for the new 0.97b2 application. Why don't you download the archive and have a look at the app_info for yourself? Might clear things up for you. As long as you run the new app with its new app_info that is in the archive, you will not dump tasks or make ghosts. If you try editing it on your own for some silly reason, all bets are off. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Brent Norman Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
The important thing not to change in app_info, if you want to avoid making ghosts, is the <plan_class>: whatever it was originally has to keep matching your tasks in progress, which are listed in the client_state file. If the <plan_class> changes, all your tasks will disappear. That is why it is recommended to empty your cache first. As long as that doesn't change, your tasks are safe. My #1 computer is still on <plan_class>cuda80</plan_class>. Two computers have cuda60. The number doesn't matter, just as long as they match. |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Actually, what causes Ghosts is a difference between the client_state and app_info in these lines: <platform></platform> <version_num></version_num> <plan_class></plan_class>. Look in your client_state.xml for task assignments, down near the bottom, framed by <result></result>. As long as the result sections agree with the app_info, all is well. If there is a difference, you must change either the client_state or app_info so they match. <platform></platform> differences are between 64 & 32 bit; most Linux & Mac Apps are 64 bit, so this field usually doesn't change. <version_num></version_num>: these numbers MUST be the same in client_state & app_info. <plan_class></plan_class>: these values also MUST match. If you just download the last three CUDA packages at C.A. and look, you will see the app_info sections ALL MATCH: <platform>x86_64-pc-linux-gnu</platform> <version_num>801</version_num> <plan_class>cuda90</plan_class>. This means you can change between any of the three CUDA 9.x packages without creating Ghosts. Now if you want to Ghost all your assigned tasks, just change the app_info so it doesn't match your <result></result> sections; guaranteed Ghosts. Also, you DO NOT need driver 396.xx to run the CUDA 91 App. The CUDA 91 App will work just fine with a CUDA 91 Driver; CUDA 91 drivers are the ones listed as 390.xx, as with this machine: https://setiathome.berkeley.edu/show_host_detail.php?hostid=6813106 |
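The matching rule described above can be checked before swapping packages. The following is only a minimal Python sketch, not part of BOINC. It assumes that app_info.xml lists each application inside `<app_version>` elements and that client_state.xml lists each assigned task inside `<result>` elements, with `<platform>`, `<version_num>` and `<plan_class>` as direct children of each. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# The three fields that must agree between app_info.xml and client_state.xml.
FIELDS = ("platform", "version_num", "plan_class")


def version_keys(xml_text, tag):
    """Collect (platform, version_num, plan_class) triples from every <tag> element."""
    root = ET.fromstring(xml_text)
    keys = set()
    for elem in root.iter(tag):
        # A missing field is treated as an empty string (assumption).
        keys.add(tuple((elem.findtext(field) or "").strip() for field in FIELDS))
    return keys


def ghost_risks(app_info_xml, client_state_xml):
    """Return the triples used by assigned tasks that no <app_version> declares."""
    declared = version_keys(app_info_xml, "app_version")
    assigned = version_keys(client_state_xml, "result")
    return assigned - declared
```

To use it, read both files as text with BOINC stopped and pass the contents to `ghost_risks`. An empty set means every assigned task still has a matching app version; any triple returned is one that would be at risk of ghosting.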
Tom M Joined: 28 Nov 02 Posts: 5124 Credit: 276,046,078 RAC: 462 |
Also, you DO NOT need driver 396.xx to run the CUDA 91 App. The CUDA 91 App will work just fine with a CUDA 91 Driver, CUDA 91 drivers are the ones listed as 390.xx as with this machine, https://setiathome.berkeley.edu/show_host_detail.php?hostid=6813106 So the baseline "Nvidia-390" driver will work with the CUDA91? That will be a relief. It was a bit of a pain getting my setup to "find" the nvidia-396. Ok, everyone. On your mark. Get set. Upgrade!!!! ROFLing. Tom A proud member of the OFA (Old Farts Association). |
Tom M Joined: 28 Nov 02 Posts: 5124 Credit: 276,046,078 RAC: 462 |
Does this change in "the code" affect the code we are running? https://setiathome.berkeley.edu/forum_thread.php?id=83285#1952776 Tom A proud member of the OFA (Old Farts Association). |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
No, everything done in Nebula is "after the fact". We just crunch the tasks. Nebula takes our results and works with them. Nebula code has nothing to do with our science app code. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Tom M Joined: 28 Nov 02 Posts: 5124 Credit: 276,046,078 RAC: 462 |
No, everything done in Nebula is "after the fact". We just crunch the tasks. Nebula takes our results and works with them. Nebula code has nothing to do with our science app code. Sorry, it wasn't clear that it was "in Nebula". Tom A proud member of the OFA (Old Farts Association). |
Wiggo Joined: 24 Jan 00 Posts: 34744 Credit: 261,360,520 RAC: 489 |
That came from the Nebula section of these forums so it should've been evident. ;-) No, everything done in Nebula is "after the fact". We just crunch the tasks. Nebula takes our results and works with them. Nebula code has nothing to do with our science app code. Cheers. |
Tom M Joined: 28 Nov 02 Posts: 5124 Credit: 276,046,078 RAC: 462 |
That came from the Nebula section of these forums so it should've been evident. ;-) No, everything done in Nebula is "after the fact". We just crunch the tasks. Nebula takes our results and works with them. Nebula code has nothing to do with our science app code. Mental fart. Apparently I thought I was in "Seti Science" when that question occurred :( A proud member of the OFA (Old Farts Association). |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Oh, I've done everything possible, including swapping out cards with other machines. The problem was it was always just the one machine having the problem. If you ask Petri how many times I've mentioned this, his answer would be LOTS. Now that other people are having the same problem, perhaps he will look at it again. Another one invalid... Don't know why it disappeared, but it's back with a vengeance. I've found a few more from last night; seems something happened between 6-7PM EDT, nothing since around 7 last night. That's interesting, seems the problems stopped just after making this post, https://setiathome.berkeley.edu/forum_thread.php?id=83306. Before that I was looking for an Overflow filled with Triplets. However, I have seen the missing Pulses on the two cards Not running the monitor, just not many times. Weird at least. Well, it looks like that problem is back, and it's on the card that was having the most problems even though it's in a different slot. So, I don't think it's the slot; it was the Top slot before, now the card is in the bottom slot and having the same problem. Fortunately it's only happening on One machine of mine, but I did see the same problem on someone else's machine, so that makes three of us now having the problem. I wonder why it disappeared for a couple of days; I had been seeing a few a day. I have seen it on all cards, but mostly it was on the card in the Top Slot. As for the Pulses completely missing, as well as the Best Pulse missing, I tracked a similar problem down to a GPU not fitting squarely in the Slot. It was tilted slightly upwards. See if you keep getting the same problem with the same GPU. Will check that, of course, but it's unlikely to be that because I use a cube case; the GPUs are in the vertical position and firmly fixed. The host crunches 1000s of WUs every day, and only a few (3 yesterday) are marked as invalid, all from different GPUs.
After a few more days of testing I'm convinced this problem where the GPU misses All the Pulses is caused by using a Web Browser. It seems to only happen on a few machines and in My case it doesn't matter which Browser or which browser settings are used. It's unfortunate seeing as how I never had this problem with zi3v, it appears to have just started with the v0.9x versions. I have switched to using the machine with the GTX 1060s for Web browsing and so far haven't seen the problem with the 1060. It seems to be a rather severe problem with the 1050 Ti on the other machine though. The only solution I see is to just use a different card for Web Browsing, so, I've ordered a 'New' Card from eBay. It seems there are some interesting deals now that the new GPUs have been announced. The pricing on eBay is much better than a few months ago, just don't order any GPU from China and make sure the seller has an established history. |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
When you say you tried changing settings, I assume you tried turning off "hardware acceleration" in the browser? Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
I tried different combinations of all Three settings listed under Performance in FireFox without seeing any change, "hardware acceleration" is just one of the settings. I also tried Safari with similar results. About the only thing left is to go back to zi3v for a while and confirm the problem doesn't exist there. I had FireFox open continuously with zi3v, never had a problem with the 1050 Ti back then. |
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
I tried different combinations of all Three settings listed under Performance in FireFox without seeing any change, "hardware acceleration" is just one of the settings. I also tried Safari with similar results. About the only thing left is to go back to zi3v for a while and confirm the problem doesn't exist there. I had FireFox open continuously with zi3v, never had a problem with the 1050 Ti back then. Please forgive me if I'm posting in the wrong thread. But FYI, I never use Firefox on this host, only Chrome, and I had at least one WU with that problem. So it is not related to Firefox only. The browser is set to use hardware acceleration (on), and the GPU connected to the monitor is an EVGA 1070FTW hybrid. I can't say if the WU with the problem was before or after I installed the new app. |
Tom M Joined: 28 Nov 02 Posts: 5124 Credit: 276,046,078 RAC: 462 |
The only solution I see is to just use a different card for Web Browsing, so, I've ordered a 'New' Card from eBay. It seems there are some interesting deals now that the new GPUs have been announced. The pricing on eBay is much better than a few months ago, just don't order any GPU from China and make sure the seller has an established history. Make sure anything you order has more of a brand name than just Nvidia (e.g. MSI, ASUS, EVGA etc.) and doesn't say "generic Nvidia", and you should be alright. I will admit that on the two machines I am still running Linux/CUDA91 on, I have been using "other" cards to drive the monitor. I didn't have a problem with a gtx 750Ti on one box, but the dual gtx 750Ti's were a little cranky on the other box. (One box retired, the other moved back to Windows since I had trouble with the browser and my bank). Tom A proud member of the OFA (Old Farts Association). |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Well, I've loaded zi3v, have FireFox surfing as much as possible, and don't see any missed Pulses. So, I'm going to have to suggest there is some change to V0.9.x which makes it not work correctly on some Systems/Cards when running a Web Browser. The trick is to look at the Best Pulse result, if that number is Zero it means the App has completely missed All pulses. I'll run it for a while and surf some more, but, it looks good right now, https://setiathome.berkeley.edu/results.php?hostid=6796475&offset=260 |
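The Best Pulse check mentioned above could be automated over saved task output. This is a rough Python sketch only. It assumes the task's stderr text contains a line of the form `Best pulse: peak=<number>, ...`; that layout is an assumption here, so adjust the pattern to whatever your results actually show. The function name is hypothetical.

```python
import re

# Assumed stderr layout: "Best pulse: peak=<number>, time=..., ..."
BEST_PULSE = re.compile(r"Best pulse:\s*peak=([-+0-9.eE]+)")


def missed_all_pulses(stderr_text):
    """Return True if the Best pulse peak is zero, False if it is non-zero,
    and None if no Best pulse line was found in the text."""
    match = BEST_PULSE.search(stderr_text)
    if match is None:
        return None
    return float(match.group(1)) == 0.0
```

Returning `None` for "no line found" keeps a task with unreadable output from being counted as either good or bad.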
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Twilight Zone. The machine ran for a couple of hours with zi3v while surfing with FireFox. No Missed Pulses. I quit BOINC, switched back to V0.97b2, Stopped FireFox, and reset the active tasks. First task back with V0.97b2 Missed All the Pulses, it was running on the 960. The 960 isn't even connected to a monitor... https://setiathome.berkeley.edu/result.php?resultid=6961236359 The 1050 Ti is running the monitor, and it found the pulses... https://setiathome.berkeley.edu/result.php?resultid=6961236733 Something strange going on. The 960 missed the next task too. |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
That really is Twilight Zone territory. Hard to troubleshoot an intermittent problem. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
It's worse than that. I decided to swap the cards back the way they were. Now the Problem has been moved to the 1050 in the Middle slot. Right now the 1050Ti in the Top slot and the 960 in the bottom slot are working fine. It's the 1050's turn to Miss all Pulses. Don't know, maybe it's just BOINC, being BOINC. This certainly doesn't look encouraging, but it looks the same with zi3v, and zi3v doesn't have the problem with missing pulses; 08-Sep-2018 00:10:57 [---] Starting BOINC client version 7.10.3 for x86_64-apple-darwin I dunno, maybe if I swapped the 1050 & 960... Well, that didn't work either. The 1050 is in the bottom slot now and Still missing All the Pulses. Strange it was working before I switched over to zi3v for a couple of hours. Since then nothing seems to get it back to the way it was before then. Only other choice would be to swap the 960 for another 1050. But then I wouldn't be able to Boot to El Capitan to compile Petri's code without swapping in another Maxwell card. El Capitan doesn't do Pascal, but it will Ignore the Pascals and use the Maxwell long enough to compile an App. |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
I would have suggested that your symptoms were caused by a PCIe slot going bad. But you have moved cards around to other slots and the problem tracked with the card and not the slot. So no help there. I have had two different brand motherboards take a dump and lose a slot where the slot became non-functional and if ANY card was installed in the slot, the PC wouldn't even POST. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.