Panic Mode On (111) Server Problems?

Author	Message
Grant (SSSF) Volunteer tester Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304	Message 1929732 - Posted: 13 Apr 2018, 4:09:03 UTC - in response to Message 1929724. Volta does Arecibo tasks in 75 seconds and a 1080 does them in 150 seconds. Nice! Half the time, similar clock speeds? Grant Darwin NT ID: 1929732 ·

Stephen "Heretic" Volunteer tester Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628	Message 1929738 - Posted: 13 Apr 2018, 5:19:45 UTC - in response to Message 1929689. Yes, that might be expected. If so, I'm sure we will begin to see bewildered posts here in Number Crunching or the Questions and Problems forums. Plenty of us experts around to write a reply on how to fix the problem or deliver the bad news. I think this might be a good thing in the end as this will force the set and forget type to reconnect with the project. They might be some of the bad hosts that plague us and we have never been able to contact to tell them their hosts have issues. . . Yep, culling the herd ... Stephen :( ID: 1929738 ·

Keith Myers Volunteer tester Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1929739 - Posted: 13 Apr 2018, 5:19:49 UTC - in response to Message 1929731. TBar pointed out that anybody running without OpenCL drivers will end up with the stock CUDA 32, 42 and 50 apps. They will feel the pain as you describe. I'm waiting for Richard's report on how his GTX 470 fares with them on, I assume, the SoG app. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) ID: 1929739 ·

Stephen "Heretic" Volunteer tester Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628	Message 1929741 - Posted: 13 Apr 2018, 5:30:25 UTC - in response to Message 1929693. Actually the Old CUDA Apps are still active as a Stock App. If you don't have a working OpenCL Driver you will be sent the OLD Baseline CUDA App, as long as you have a working CUDA Driver. The main problem the last time the Arecibo VLARs were released was people had their machines set to run 2 or 3 CUDA MB tasks at once. One VLAR will load the One Compute Unit used by the Old CUDA App on the VLARs, more than One VLAR at a time will bring it to its Knees. If you have the machine set to run just One task at a time it should be bearable, but annoying. The New Linux & Mac CUDA Apps use ALL the Compute Units on the Arecibo VLARs, so there really isn't any difference from the OpenCL App. All My machines are pretty much loaded with the Arecibo VLARs, there must be a Large number of them. If they hadn't started sending them, people would be moaning about being out of GPU tasks. Hopefully these episodes won't be very common and people will be running BLCs most of the time. . . A good point, ppl with older video drivers not set to run OpenCL might be getting some wake up calls. . . And with all these Nvidia equipped hosts finding their caches heavily laden with Arecibo VLARs for both CPU and GPU Q's that adds up to a LOT of them. It was definitely a necessary move. If left to CPU crunchers only it could have been a very nasty log jam. Hopefully it will not be the norm but who knows? I am sure we can adjust ... Stephen ID: 1929741 ·

Stephen "Heretic" Volunteer tester Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628	Message 1929742 - Posted: 13 Apr 2018, 5:36:30 UTC - in response to Message 1929702. Are we sure these are Arecibo? Didn't they mention something about a 3rd source of data would be rolling out? . . If you mean Parkes I am confident they will be Blc tasks. . . Is there another telescope from which we are expecting data? Stephen ? ID: 1929742 ·

Stephen "Heretic" Volunteer tester Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628	Message 1929743 - Posted: 13 Apr 2018, 5:38:22 UTC - in response to Message 1929705. We're supposed to get Parkes data from Australia eventually. But I don't think that data has arrived yet. I don't think the prep work for that data has been finished yet. If and when we do get it, I bet it has its own naming convention like the Green Bank telescope does. . . Well they won't be guppis but I'll bet your last dollar they will start with Blc ... Stephen :) ID: 1929743 ·

Stephen "Heretic" Volunteer tester Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628	Message 1929744 - Posted: 13 Apr 2018, 5:40:35 UTC - in response to Message 1929724. Last modified: 13 Apr 2018, 5:44:49 UTC Volta does Arecibo tasks in 75 seconds and a 1080 does them in 150 seconds. Nice! VLAR work is totally ok. Credit per wu is bigger too. I'll wait till APR scales back up. If it does not I'll find out what do I need to optimize next. Petri . . I had forgotten there was at least one person running Volta out there :) I had a look at current GPU prices and I won't be upgrading anything in the near future. (the 1060's I bought the Christmas before last are now nearly double the price.) Stephen :) ID: 1929744 ·

tullio Volunteer tester Send message Joined: 9 Apr 04 Posts: 8797 Credit: 2,930,782 RAC: 1	Message 1929751 - Posted: 13 Apr 2018, 8:45:53 UTC I am using a GTX 1050 Ti on a Windows 10 PC which gains me a huge credit (above 30k RAC) in Einstein@home and a GTX 750 Ti on a Linux box which runs SETI@home with good results but much less credit. Surprisingly, the SETI@home GPU tasks seem to cause frequent reboots on the Windows 10 PC but no reboot on the Linux box. Tullio ID: 1929751 ·

Richard Haselgrove Volunteer tester Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874	Message 1929765 - Posted: 13 Apr 2018, 11:26:01 UTC - in response to Message 1929702. Are we sure these are Arecibo? Didn't they mention something about a 3rd source of data would be rolling out? Yes, they did. We ran a special fundraiser for the hardware before Christmas, and you've got the 'special' icon beside your name to confirm you donated - thanks. But somebody has probably still got to build the data recorder and fly to Australia to install it. Tough job. ID: 1929765 ·

rob smith Volunteer moderator Volunteer tester Send message Joined: 7 Mar 03 Posts: 22202 Credit: 416,307,556 RAC: 380	Message 1929767 - Posted: 13 Apr 2018, 11:49:28 UTC And once that person has had their enforced sojourn in Van Damen's Land there will be a period of end to end testing, trials at Beta and then we will get tasks that are inverted..... Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? ID: 1929767 ·

Richard Haselgrove Volunteer tester Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874	Message 1929770 - Posted: 13 Apr 2018, 12:03:01 UTC OK, first results are in. Arecibo VLAR on NVidia: GTX 670, up from <10min to <20 min GTX 750Ti, up from <12.5min to <25 min GTX 470, up from 12min to <24 min So I agree with the general proposition that these tasks run roughly twice as long as BLC VLARs - which is roughly what they do on CPUs, too. I think that they have quietly and gradually tweaked the processing parameters of BLC tasks - people have commented on the different runtimes of different batches - so that BLC VLARs run for roughly the same time as mid-AR Arecibos. And it's worked - y'all haven't been frightened away yet. I wonder what this new psychological experiment will do to the crunchership? Usability - the 670 is definitely showing screen lag. It can be used, but there's a distinct feeling of "There's something not quite right about my computer this morning". That's the biggest risk to the project: if many people get that feeling, and if they track it down to the true cause (most won't), they might switch off BOINC and their other projects too. That would be a shame. The 750Ti shows no lag at all :-). That may be because the monitor is connected to the GTX 970 in the same box, and I don't waste a 970 on SETI, while GPUGrid/CUDA80 has work. Which it does this morning. The GTX 470 is, indeed, a Fermi. Specifically, it's the Fermi I drove across town to collect - and paid £239.99 plus sales tax for - on or around 14 May 2010, because no-one would answer my question (Beta message 39386). It's now sitting in my dual CPU Dell Precision Workstation, to replace the Quadro 1500 (one step below CUDA - I should have waited for the 1700). It's also driving 2520x1200 pixels of dual monitors.
It doesn't feel too bad as I type and it crunches, but certain screen redraws are very slow, and I'm used to this old machine - now running 32-bit Windows 7 in 4 GB of quad-channel RAM - feeling clunky by modern standards. Yes, I'm running r3584 SoG. When running usability tests, remember that Raistmer switched round the processing order compared with CUDA. With VLAR on CUDA, the slow, clunky, bits come at the beginning, which is a real turn off: with OpenCL/SoG, they come towards the end, and the progress %age rate slows to a crawl (not that progress is accurately assessed by any SETI application). ID: 1929770 ·
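Editor's note: the "roughly twice as long" figure in the post above can be checked with a little arithmetic. The sketch below is only illustrative; it treats the quoted "<N min" values as plain upper bounds (an assumption, since the post gives approximate figures, not exact timings), and the names `runtimes_min` and `slowdown` are made up for this example.

```python
# Approximate runtimes in minutes (BLC VLAR before, Arecibo VLAR after),
# copied from the post above. The "<" bounds are treated as plain numbers,
# which is an assumption made only for this illustration.
runtimes_min = {
    "GTX 670": (10.0, 20.0),
    "GTX 750Ti": (12.5, 25.0),
    "GTX 470": (12.0, 24.0),
}


def slowdown(before_min, after_min):
    """Return how many times longer the new task type runs."""
    return after_min / before_min


ratios = {gpu: slowdown(before, after)
          for gpu, (before, after) in runtimes_min.items()}

for gpu, ratio in ratios.items():
    print(f"{gpu}: about {ratio:.1f}x longer")
```

On these rounded numbers every card comes out at a factor of two, which matches the general proposition in the post; real per-task runtimes will scatter around that.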

Stephen "Heretic" Volunteer tester Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628	Message 1929774 - Posted: 13 Apr 2018, 12:59:50 UTC - in response to Message 1929765. Are we sure these are Arecibo? Didn't they mention something about a 3rd source of data would be rolling out? Yes, they did. We ran a special fundraiser for the hardware before Christmas, and you've got the 'special' icon beside your name to confirm you donated - thanks. But somebody has probably still got to build the data recorder and fly to Australia to install it. Tough job. . . As a local I will happily volunteer to carry their toolbox ... Stephen :) ID: 1929774 ·

Stephen "Heretic" Volunteer tester Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628	Message 1929776 - Posted: 13 Apr 2018, 13:04:03 UTC - in response to Message 1929767. And once that person has had their enforced sojourn in Van Damen's Land there will be a period of end to end testing, trials at Beta and then we will tasks that are inverted..... . . No no, Van Diemens Land is further south, but only a short way, 1200Kms or so. But they grow nice apples there and I recommend the curried scallop pies. Stephen :) ID: 1929776 ·

Keith Myers Volunteer tester Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1929791 - Posted: 13 Apr 2018, 16:42:18 UTC Last modified: 13 Apr 2018, 16:56:04 UTC OK, first results are in. Arecibo VLAR on NVidia: GTX 670, up from <10min to <20 min GTX 750Ti, up from <12.5min to <25 min GTX 470, up from 12min to <24 min So I agree with the general proposition that these tasks run roughly twice as long as BLC VLARs - which is roughly what they do on CPUs, too. I think that they have quietly and gradually tweaked the processing parameters of BLC tasks OK, thanks for the update Richard on how the Fermi card fared. I see it didn't bring the system to its knees as predicted. Was that with just the stock SoG tuning the project applies? Or have you given it a custom tuning tweak with -use_sleep or -period_iterations_num or -sbs adjustment to alleviate any lagginess? Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) ID: 1929791 ·

Richard Haselgrove Volunteer tester Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874	Message 1929795 - Posted: 13 Apr 2018, 16:59:48 UTC - in response to Message 1929791. Last modified: 13 Apr 2018, 17:30:16 UTC OK, thanks for the update Richard on how the Fermi card fared. I see it didn't bring the system to its knees as predicted. Was that with just the stock SoG tuning the project applies? Or have you given it a custom tuning tweak with -use_sleep or -period_iterations_num or -sbs adjustment to alleviate any lagginess? I honestly don't know, though I suppose I could look it up if you twisted my arm. I really don't like the idea of deploying Raistmer's SoG app as a stock 'wallop it out to the set and forget punters that don't even understand English', when you need the brain the size of a planet just to understand the command line. It would be much, much better for general project use (not the 0.01% who read and post in this thread) if the app was made intelligently self-tuning (and if it wasn't written in OpenCL, thus taking away CPU resources from other worthwhile scientific research). But I know I'm preaching to the wrong audience here. I probably picked one of Mike's pre-cooked suggested lines and dropped it somewhere, about two years ago. The Fermi is in host 2901600 - you can probably pick the details out of one of Raistmer's humongous stderr_txt files. Edit - downstairs again. Looks like I didn't bother. This is the machine I used to build Lunatics installers on, so I've got every conceivable app available, but I expect them to work as supplied (and that would be what I was looking at while testing), not taking them dirt-track racing. ID: 1929795 ·

Keith Myers Volunteer tester Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1929798 - Posted: 13 Apr 2018, 18:19:42 UTC - in response to Message 1929795. Yes, Host 2901600 is running the stock project supplied configuration for the SoG app. That is what I wondered about for the set and forget masses who would never read the SoG_readme.txt file supplied to understand the tuning capabilities possible. So your experience would match the set and forget type who join the project and let the project determine the best app (which is SoG) with the standard configuration and just expect it to run. So we will probably have to rely on the stats aggregator websites to report any drop in hosts participating in the project caused by unhappiness with their hosts caused by the new VLARs. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) ID: 1929798 ·

Zalster Volunteer tester Send message Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242	Message 1929800 - Posted: 13 Apr 2018, 18:38:02 UTC - in response to Message 1929798. When you are dealing with the standard apps that the server sends out, the older cards might not even be given the OpenCL applications. In which case, those cards are going to be crippled trying to run the new VLARs on the old CUDA applications. If by some miracle they actually do have the OpenCL SoG on their computers, the machines will at least function, although not as well as a newer card. So let's not go blaming the SoG application right now. As far as the tuning goes, if you get that far, you are more than the average set and forget user, so you will have at least done some research into the matter and probably ended up here at some point. ID: 1929800 ·

Unixchick Send message Joined: 5 Mar 12 Posts: 815 Credit: 2,361,516 RAC: 22	Message 1929802 - Posted: 13 Apr 2018, 18:56:26 UTC Back in 2014 I blocked usage of the GPU. I'm guessing due to the GPU driver issues at the time. The new WU change for GPU got me thinking about it again as I was wondering if I should start allowing WU to my GPU again. I don't want to send back bad data or kill my machine. I'm being lazy and asking here, instead of doing my own research.
25-Feb-2018 10:41:07 [---] Starting BOINC client version 7.8.6 for x86_64-apple-darwin
25-Feb-2018 10:41:08 [---] log flags: file_xfer, sched_ops, task
25-Feb-2018 10:41:08 [---] Libraries: libcurl/7.50.2 OpenSSL/1.1.0 zlib/1.2.5 c-ares/1.11.0
25-Feb-2018 10:41:08 [---] Data directory: /Library/Application Support/BOINC Data
25-Feb-2018 10:41:10 [---] OpenCL: NVIDIA GPU 0: GeForce 9400 (driver version 10.0.52 310.90.10.05b46, device version OpenCL 1.0, 256MB, 256MB available, 211 GFLOPS peak)
25-Feb-2018 10:41:10 [---] OpenCL CPU: Intel(R) Core(TM)2 Duo CPU P7350 @ 2.00GHz (OpenCL driver vendor: Apple, driver version 1.1, device version OpenCL 1.2)
25-Feb-2018 10:41:10 [---]
25-Feb-2018 10:41:10 [---] Processor: 2 GenuineIntel Intel(R) Core(TM)2 Duo CPU P7350 @ 2.00GHz [x86 Family 6 Model 23 Stepping 10]
25-Feb-2018 10:41:10 [---] Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clfsh ds acpi mmx fxsr sse sse2 ss htt tm pbe pni dtes64 mon dscpl vmx est tm2 ssse3 cx16 tpr pdcm sse4_1 xsave
25-Feb-2018 10:41:11 [---] OS: Mac OS X 10.11.6 (Darwin 15.6.0)
25-Feb-2018 10:41:11 [---] Memory: 3.00 GB physical, 12.34 GB virtual
25-Feb-2018 10:41:11 [---] Disk: 151.40 GB total, 12.10 GB free
ID: 1929802 ·

Keith Myers Volunteer tester Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1929804 - Posted: 13 Apr 2018, 19:21:43 UTC - in response to Message 1929802. Not familiar with MacOS and its drivers. Your Event Log report doesn't show any of the relevant parameters. Hard to pin down the exact minimum requirements for using the SoG app. But from what you posted I think you don't meet the minimum requirement for either the driver level, the Compute Capability level or the OpenCL level. So I would not recommend enabling that gpu for the project. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) ID: 1929804 ·
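Editor's note: the reply above reads the GPU's OpenCL level and memory off the BOINC event log by eye. A minimal sketch of doing the same with a regular expression follows. The pattern is written only against the one `OpenCL: NVIDIA GPU 0: ...` line quoted in this thread and may not match other BOINC versions or vendors. The function names and the threshold values passed to `meets_minimum` are hypothetical placeholders, not the project's actual SoG requirements, which the thread itself says are hard to pin down.

```python
import re

# Pattern fitted to the single "OpenCL: <vendor> GPU n: ..." log line quoted
# in this thread; other BOINC versions may format the line differently.
OPENCL_GPU_RE = re.compile(
    r"OpenCL: (?P<vendor>\w+) GPU (?P<index>\d+): (?P<name>.+?) "
    r"\(driver version (?P<driver>.+?), "
    r"device version OpenCL (?P<major>\d+)\.(?P<minor>\d+)"
    r".*?, (?P<mem_mb>\d+)MB,"
)


def parse_opencl_gpu(line):
    """Extract GPU name, driver, OpenCL version and memory from a log line."""
    match = OPENCL_GPU_RE.search(line)
    if match is None:
        return None
    return {
        "name": match["name"],
        "driver": match["driver"],
        "opencl": (int(match["major"]), int(match["minor"])),
        "mem_mb": int(match["mem_mb"]),
    }


def meets_minimum(gpu, min_opencl, min_mem_mb):
    """Compare a parsed GPU against caller-supplied (hypothetical) minimums."""
    return gpu["opencl"] >= min_opencl and gpu["mem_mb"] >= min_mem_mb


sample = (
    "25-Feb-2018 10:41:10 [---] OpenCL: NVIDIA GPU 0: GeForce 9400 "
    "(driver version 10.0.52 310.90.10.05b46, device version OpenCL 1.0, "
    "256MB, 256MB available, 211 GFLOPS peak)"
)

gpu = parse_opencl_gpu(sample)
print(gpu)
# The thresholds below are placeholders for illustration only.
print(meets_minimum(gpu, min_opencl=(1, 2), min_mem_mb=512))
```

The event log does not report Compute Capability, so that part of the check still has to come from the card's specifications.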

TBar Volunteer tester Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768	Message 1929811 - Posted: 13 Apr 2018, 21:08:04 UTC - in response to Message 1929802. Back in 2014 I blocked usage of the GPU. I'm guessing due to the GPU driver issues at the time. The new WU change for GPU got me thinking about it again as I was wondering if I should start allowing WU to my GPU again. I don't want to send back bad data or kill my machine. I'm being lazy and asking here, instead of doing my own research.
25-Feb-2018 10:41:07 [---] Starting BOINC client version 7.8.6 for x86_64-apple-darwin
25-Feb-2018 10:41:08 [---] log flags: file_xfer, sched_ops, task
25-Feb-2018 10:41:08 [---] Libraries: libcurl/7.50.2 OpenSSL/1.1.0 zlib/1.2.5 c-ares/1.11.0
25-Feb-2018 10:41:08 [---] Data directory: /Library/Application Support/BOINC Data
25-Feb-2018 10:41:10 [---] OpenCL: NVIDIA GPU 0: GeForce 9400 (driver version 10.0.52 310.90.10.05b46, device version OpenCL 1.0, 256MB, 256MB available, 211 GFLOPS peak)
25-Feb-2018 10:41:10 [---] OpenCL CPU: Intel(R) Core(TM)2 Duo CPU P7350 @ 2.00GHz (OpenCL driver vendor: Apple, driver version 1.1, device version OpenCL 1.2)
25-Feb-2018 10:41:10 [---]
25-Feb-2018 10:41:10 [---] Processor: 2 GenuineIntel Intel(R) Core(TM)2 Duo CPU P7350 @ 2.00GHz [x86 Family 6 Model 23 Stepping 10]
25-Feb-2018 10:41:10 [---] Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clfsh ds acpi mmx fxsr sse sse2 ss htt tm pbe pni dtes64 mon dscpl vmx est tm2 ssse3 cx16 tpr pdcm sse4_1 xsave
25-Feb-2018 10:41:11 [---] OS: Mac OS X 10.11.6 (Darwin 15.6.0)
25-Feb-2018 10:41:11 [---] Memory: 3.00 GB physical, 12.34 GB virtual
25-Feb-2018 10:41:11 [---] Disk: 151.40 GB total, 12.10 GB free
There are others with the same machine here; you should see how they perform with a Pre-Fermi mobile GPU: https://setiweb.ssl.berkeley.edu/beta/results.php?hostid=82897 It appears it will work, but it is about 2 to 3 times slower than the CPU while using the faster CPU to do Slower GPU work. So, the question would be, why bother when you can just use the Faster CPU instead? ID: 1929811 ·
