Panic Mode On (111) Server Problems?

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1929732 - Posted: 13 Apr 2018, 4:09:03 UTC - in response to Message 1929724.  

Volta does Arecibo tasks in 75 seconds and a 1080 does them in 150 seconds. Nice!

Half the time, similar clock speeds?
Grant
Darwin NT
ID: 1929732 · Report as offensive
Stephen "Heretic" · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1929738 - Posted: 13 Apr 2018, 5:19:45 UTC - in response to Message 1929689.  


Yes, that might be expected. If so, I'm sure we will begin to see bewildered posts here in Number Crunching or the Questions and Problems forums. Plenty of us experts around to write a reply on how to fix the problem or deliver the bad news.

I think this might be a good thing in the end as this will force the set and forget type to reconnect with the project. They might be some of the bad hosts that plague us and we have never been able to contact to tell them their hosts have issues.


. . Yep, culling the herd ...

Stephen

:(
ID: 1929738 · Report as offensive
Keith Myers · Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1929739 - Posted: 13 Apr 2018, 5:19:49 UTC - in response to Message 1929731.  

TBar pointed out that anybody running without OpenCL drivers will end up with the stock CUDA 32, 42 and 50 apps. They will feel the pain as you describe. I'm waiting for Richard's report on how his GTX 470 fares with them on, I assume, the SoG app.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1929739 · Report as offensive
Stephen "Heretic" · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1929741 - Posted: 13 Apr 2018, 5:30:25 UTC - in response to Message 1929693.  

Actually the Old CUDA Apps are still active as a Stock App. If you don't have a working OpenCL Driver you will be sent the OLD Baseline CUDA App, as long as you have a working CUDA Driver. The main problem the last time the Arecibo VLARs were released was that people had their machines set to run 2 or 3 CUDA MB tasks at once. One VLAR will load the One Compute Unit used by the Old CUDA App on the VLARs; more than One VLAR at a time will bring it to its Knees. If you have the machine set to run just One task at a time it should be bearable, but annoying. The New Linux & Mac CUDA Apps use ALL the Compute Units on the Arecibo VLARs, so there really isn't any difference from the OpenCL App.

All My machines are pretty much loaded with the Arecibo VLARs, there must be a Large number of them. If they hadn't started sending them, people would be moaning about being out of GPU tasks. Hopefully these episodes won't be very common and people will be running BLCs most of the time.


. . A good point, ppl with older video drivers not set to run OpenCL might be getting some wake up calls.

. . And with all these Nvidia equipped hosts finding their caches heavily laden with Arecibo VLARs for both CPU and GPU Q's that adds up to a LOT of them. It was definitely a necessary move. If left to CPU crunchers only it could have been a very nasty log jam. Hopefully it will not be the norm but who knows? I am sure we can adjust ...

Stephen
ID: 1929741 · Report as offensive
Stephen "Heretic" · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1929742 - Posted: 13 Apr 2018, 5:36:30 UTC - in response to Message 1929702.  

Are we sure these are Arecibo? Didn't they mention something about a 3rd source of data would be rolling out?


. . If you mean Parkes I am confident they will be Blc tasks.

. . Is there another telescope from which we are expecting data?

Stephen

?
ID: 1929742 · Report as offensive
Stephen "Heretic" · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1929743 - Posted: 13 Apr 2018, 5:38:22 UTC - in response to Message 1929705.  

We're supposed to get Parkes data from Australia eventually. But I don't think that data has arrived yet. I don't think the prep work for that data has been finished yet. If and when we do get it, I bet it has its own naming convention like the Green Bank telescope does.


. . Well they won't be guppis but I'll bet your last dollar they will start with Blc ...

Stephen

:)
ID: 1929743 · Report as offensive
Stephen "Heretic" · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1929744 - Posted: 13 Apr 2018, 5:40:35 UTC - in response to Message 1929724.  
Last modified: 13 Apr 2018, 5:44:49 UTC

Volta does Arecibo tasks in 75 seconds and a 1080 does them in 150 seconds. Nice!
VLAR work is totally ok. Credit per wu is bigger too. I'll wait till APR scales back up.
If it does not, I'll find out what I need to optimize next.

Petri


. . I had forgotten there was at least one person running Volta out there :) I had a look at current GPU prices and I won't be upgrading anything in the near future. (The 1060s I bought the Christmas before last are now nearly double the price.)

Stephen

:)
ID: 1929744 · Report as offensive
tullio
Volunteer tester
Joined: 9 Apr 04
Posts: 8797
Credit: 2,930,782
RAC: 1
Italy
Message 1929751 - Posted: 13 Apr 2018, 8:45:53 UTC

I am using a GTX 1050 Ti on a Windows 10 PC, which earns me a huge credit (above 30k RAC) in Einstein@home, and a GTX 750 Ti on a Linux box, which runs SETI@home with good results but far fewer credits. Surprisingly, the SETI@home GPU tasks seem to cause frequent reboots on the Windows 10 PC but none on the Linux box.
Tullio
ID: 1929751 · Report as offensive
Richard Haselgrove · Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1929765 - Posted: 13 Apr 2018, 11:26:01 UTC - in response to Message 1929702.  

Are we sure these are Arecibo? Didn't they mention something about a 3rd source of data would be rolling out?
Yes, they did. We ran a special fundraiser for the hardware before Christmas, and you've got the 'special' icon beside your name to confirm you donated - thanks. But somebody has probably still got to build the data recorder and fly to Australia to install it. Tough job.
ID: 1929765 · Report as offensive
rob smith · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22202
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1929767 - Posted: 13 Apr 2018, 11:49:28 UTC

And once that person has had their enforced sojourn in Van Diemen's Land there will be a period of end to end testing, trials at Beta and then we will get tasks that are inverted.....
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1929767 · Report as offensive
Richard Haselgrove · Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1929770 - Posted: 13 Apr 2018, 12:03:01 UTC

OK, first results are in. Arecibo VLAR on NVidia:

GTX 670, up from <10 min to <20 min
GTX 750Ti, up from <12.5 min to <25 min
GTX 470, up from 12 min to <24 min

So I agree with the general proposition that these tasks run roughly twice as long as BLC VLARs - which is roughly what they do on CPUs, too. I think that they have quietly and gradually tweaked the processing parameters of BLC tasks - people have commented on the different runtimes of different batches - so that BLC VLARs run for roughly the same time as mid-AR Arecibos. And it's worked - y'all haven't been frightened away yet. I wonder what this new psychological experiment will do to the crunchership?
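The "roughly twice as long" figure can be checked against the three cards listed above. This is only a quick sketch: it treats the "<" figures as if they were exact runtimes, assumes the "up from" values are the earlier BLC VLAR runtimes, and the names `runtimes_min` and `slowdown` are made up for the illustration.

```python
# Runtimes in minutes as posted above: (previous runtime, Arecibo VLAR runtime).
# Assumption: the "<" upper bounds are treated as exact values.
runtimes_min = {
    "GTX 670": (10.0, 20.0),
    "GTX 750Ti": (12.5, 25.0),
    "GTX 470": (12.0, 24.0),
}


def slowdown(before, after):
    """Return how many times longer the new runtime is than the old one."""
    return after / before


ratios = {card: slowdown(*times) for card, times in runtimes_min.items()}

for card, ratio in ratios.items():
    print(f"{card}: {ratio:.1f}x")
```

With the posted numbers every card comes out at the same ratio, which is why the doubling reads as a property of the tasks and not of any one GPU generation.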

Usability - the 670 is definitely showing screen lag. It can be used, but there's a distinct feeling of "There's something not quite right about my computer this morning". That's the biggest risk to the project: if many people get that feeling, and if they track it down to the true cause (most won't), they might switch off BOINC and their other projects too. That would be a shame.

The 750Ti shows no lag at all :-). That may be because the monitor is connected to the GTX 970 in the same box, and I don't waste a 970 on SETI, while GPUGrid/CUDA80 has work. Which it does this morning.

The GTX 470 is, indeed, a Fermi. Specifically, it's the Fermi I drove across town to collect - and paid £239.99 plus sales tax for - on or around 14 May 2010, because no-one would answer my question (Beta message 39386). It's now sitting in my dual CPU Dell Precision Workstation, to replace the Quadro 1500 (one step below CUDA - I should have waited for the 1700). It's also driving 2520x1200 pixels of dual monitors. It doesn't feel too bad as I type and it crunches, but certain screen redraws are very slow, and I'm used to this old machine - now running 32-bit Windows 7 in 4 GB of quad-channel RAM - feeling clunky by modern standards.

Yes, I'm running r3584 SoG. When running usability tests, remember that Raistmer switched round the processing order compared with CUDA. With VLAR on CUDA, the slow, clunky bits come at the beginning, which is a real turn-off: with OpenCL/SoG, they come towards the end, and the progress %age rate slows to a crawl (not that progress is accurately assessed by any SETI application).
ID: 1929770 · Report as offensive
Stephen "Heretic" · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1929774 - Posted: 13 Apr 2018, 12:59:50 UTC - in response to Message 1929765.  

Are we sure these are Arecibo? Didn't they mention something about a 3rd source of data would be rolling out?
Yes, they did. We ran a special fundraiser for the hardware before Christmas, and you've got the 'special' icon beside your name to confirm you donated - thanks. But somebody has probably still got to build the data recorder and fly to Australia to install it. Tough job.


. . As a local I will happily volunteer to carry their toolbox ...

Stephen

:)
ID: 1929774 · Report as offensive
Stephen "Heretic" · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1929776 - Posted: 13 Apr 2018, 13:04:03 UTC - in response to Message 1929767.  

And once that person has had their enforced sojourn in Van Diemen's Land there will be a period of end to end testing, trials at Beta and then we will get tasks that are inverted.....


. . No no, Van Diemen's Land is further south, but only a short way, 1200 km or so. But they grow nice apples there and I recommend the curried scallop pies.

Stephen

:)
ID: 1929776 · Report as offensive
Keith Myers · Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1929791 - Posted: 13 Apr 2018, 16:42:18 UTC
Last modified: 13 Apr 2018, 16:56:04 UTC


OK, first results are in. Arecibo VLAR on NVidia:

GTX 670, up from <10 min to <20 min
GTX 750Ti, up from <12.5 min to <25 min
GTX 470, up from 12 min to <24 min

So I agree with the general proposition that these tasks run roughly twice as long as BLC VLARs - which is roughly what they do on CPUs, too. I think that they have quietly and gradually tweaked the processing parameters of BLC tasks


OK, thanks for the update Richard on how the Fermi card fared. I see it didn't bring the system to its knees as predicted. Was that with just the stock SoG tuning the project applies? Or have you given it a custom tuning tweak with -use_sleep or -period_iterations_num or -sbs adjustment to alleviate any lagginess?
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1929791 · Report as offensive
Richard Haselgrove · Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1929795 - Posted: 13 Apr 2018, 16:59:48 UTC - in response to Message 1929791.  
Last modified: 13 Apr 2018, 17:30:16 UTC

OK, thanks for the update Richard on how the Fermi card fared. I see it didn't bring the system to its knees as predicted. Was that with just the stock SoG tuning the project applies? Or have you given it a custom tuning tweak with -use_sleep or -period_iterations_num or -sbs adjustment to alleviate any lagginess?
I honestly don't know, though I suppose I could look it up if you twisted my arm. I really don't like the idea of deploying Raistmer's SoG app as a stock 'wallop it out to the set and forget punters that don't even understand English', when you need a brain the size of a planet just to understand the command line. It would be much, much better for general project use (not the 0.01% who read and post in this thread) if the app was made intelligently self-tuning (and if it wasn't written in OpenCL, thus taking away CPU resources from other worthwhile scientific research). But I know I'm preaching to the wrong audience here.

I probably picked one of Mike's pre-cooked suggested lines and dropped it somewhere, about two years ago. The Fermi is in host 2901600 - you can probably pick the details out of one of Raistmer's humongous stderr_txt files.

Edit - downstairs again. Looks like I didn't bother. This is the machine I used to build Lunatics installers on, so I've got every conceivable app available, but I expect them to work as supplied (and that would be what I was looking at while testing), not taking them dirt-track racing.
ID: 1929795 · Report as offensive
Keith Myers · Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1929798 - Posted: 13 Apr 2018, 18:19:42 UTC - in response to Message 1929795.  

Yes, Host 2901600 is running the stock project-supplied configuration for the SoG app. That is what I wondered about for the set and forget masses who would never read the supplied SoG_readme.txt file to understand the tuning options available. So your experience would match that of the set and forget type who joins the project, lets the project determine the best app (which is SoG) with the standard configuration, and just expects it to run.

So we will probably have to rely on the stats aggregator websites to report any drop in participating hosts caused by owners' unhappiness with the new VLARs.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1929798 · Report as offensive
Zalster · Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1929800 - Posted: 13 Apr 2018, 18:38:02 UTC - in response to Message 1929798.  

When you are dealing with the standard apps that the server sends out, the older cards might not even be given the OpenCL applications. In which case, those cards are going to be crippled trying to run the new VLARs on the old CUDA applications. If, by some miracle, they actually do have the OpenCL SoG app on their computers, the machines will at least function, although not as well as with a newer card.

So let's not go blaming the SoG application right now.

As far as the tuning goes, if you get that far, you are more than the average set and forget user, so you will have at least done some research into the matter and probably ended up here at some point.
ID: 1929800 · Report as offensive
Unixchick · Project Donor
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 1929802 - Posted: 13 Apr 2018, 18:56:26 UTC

Back in 2014 I blocked usage of the GPU. I'm guessing due to the GPU driver issues at the time. The new WU change for GPU got me thinking about it again as I was wondering if I should start allowing WU to my GPU again. I don't want to send back bad data or kill my machine.
I'm being lazy and asking here, instead of doing my own research.

25-Feb-2018 10:41:07 [---] Starting BOINC client version 7.8.6 for x86_64-apple-darwin
25-Feb-2018 10:41:08 [---] log flags: file_xfer, sched_ops, task
25-Feb-2018 10:41:08 [---] Libraries: libcurl/7.50.2 OpenSSL/1.1.0 zlib/1.2.5 c-ares/1.11.0
25-Feb-2018 10:41:08 [---] Data directory: /Library/Application Support/BOINC Data
25-Feb-2018 10:41:10 [---] OpenCL: NVIDIA GPU 0: GeForce 9400 (driver version 10.0.52 310.90.10.05b46, device version OpenCL 1.0, 256MB, 256MB available, 211 GFLOPS peak)
25-Feb-2018 10:41:10 [---] OpenCL CPU: Intel(R) Core(TM)2 Duo CPU P7350 @ 2.00GHz (OpenCL driver vendor: Apple, driver version 1.1, device version OpenCL 1.2)
25-Feb-2018 10:41:10 [---]
25-Feb-2018 10:41:10 [---] Processor: 2 GenuineIntel Intel(R) Core(TM)2 Duo CPU P7350 @ 2.00GHz [x86 Family 6 Model 23 Stepping 10]
25-Feb-2018 10:41:10 [---] Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clfsh ds acpi mmx fxsr sse sse2 ss htt tm pbe pni dtes64 mon dscpl vmx est tm2 ssse3 cx16 tpr pdcm sse4_1 xsave
25-Feb-2018 10:41:11 [---] OS: Mac OS X 10.11.6 (Darwin 15.6.0)
25-Feb-2018 10:41:11 [---] Memory: 3.00 GB physical, 12.34 GB virtual
25-Feb-2018 10:41:11 [---] Disk: 151.40 GB total, 12.10 GB free
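
For anyone wanting to pull the GPU details out of a startup log like the one above, here is a minimal sketch. The regular expression is written against the single OpenCL GPU line in this post, so it is an assumption that other BOINC versions or vendors format the line the same way; `parse_opencl_gpu` is a made-up helper name, not part of BOINC. It only extracts the fields and does not judge whether the card meets any app's requirements, since those are not stated in this thread.

```python
import re

# Pattern fitted to the "OpenCL: NVIDIA GPU 0: ..." line posted above.
# Assumption: other hosts may log this line in a different layout.
OPENCL_GPU = re.compile(
    r"OpenCL: (?P<vendor>\w+) GPU (?P<index>\d+): (?P<name>.+?) "
    r"\(driver version (?P<driver>.+?), "
    r"device version OpenCL (?P<opencl>\d+\.\d+), "
    r"(?P<mem_mb>\d+)MB, (?P<avail_mb>\d+)MB available, "
    r"(?P<gflops>\d+) GFLOPS peak\)"
)


def parse_opencl_gpu(line):
    """Return the GPU fields from an 'OpenCL: ... GPU' log line, or None."""
    match = OPENCL_GPU.search(line)
    if match is None:
        return None
    info = match.groupdict()
    # Convert the purely numeric fields; the version strings stay as text.
    for key in ("index", "mem_mb", "avail_mb", "gflops"):
        info[key] = int(info[key])
    return info


log_line = (
    "25-Feb-2018 10:41:10 [---] OpenCL: NVIDIA GPU 0: GeForce 9400 "
    "(driver version 10.0.52 310.90.10.05b46, device version OpenCL 1.0, "
    "256MB, 256MB available, 211 GFLOPS peak)"
)
cpu_line = (
    "25-Feb-2018 10:41:10 [---] OpenCL CPU: Intel(R) Core(TM)2 Duo CPU P7350 "
    "@ 2.00GHz (OpenCL driver vendor: Apple, driver version 1.1, "
    "device version OpenCL 1.2)"
)

gpu = parse_opencl_gpu(log_line)
print(gpu["name"], "| OpenCL", gpu["opencl"], "|", gpu["mem_mb"], "MB |",
      gpu["gflops"], "GFLOPS peak")
```

The OpenCL CPU line is deliberately not matched, so the helper returns None for it and only GPU entries come back.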
ID: 1929802 · Report as offensive
Keith Myers · Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1929804 - Posted: 13 Apr 2018, 19:21:43 UTC - in response to Message 1929802.  

Not familiar with MacOS and its drivers. Your Event Log report doesn't show any of the relevant parameters. Hard to pin down the exact minimum requirements for using the SoG app. But from what you posted I think you don't meet the minimum requirement for either the driver level, the Compute Capability level or the OpenCL level.

So I would not recommend enabling that gpu for the project.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1929804 · Report as offensive
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1929811 - Posted: 13 Apr 2018, 21:08:04 UTC - in response to Message 1929802.  

Back in 2014 I blocked usage of the GPU. I'm guessing due to the GPU driver issues at the time. The new WU change for GPU got me thinking about it again as I was wondering if I should start allowing WU to my GPU again. I don't want to send back bad data or kill my machine.
I'm being lazy and asking here, instead of doing my own research.

25-Feb-2018 10:41:07 [---] Starting BOINC client version 7.8.6 for x86_64-apple-darwin
25-Feb-2018 10:41:08 [---] log flags: file_xfer, sched_ops, task
25-Feb-2018 10:41:08 [---] Libraries: libcurl/7.50.2 OpenSSL/1.1.0 zlib/1.2.5 c-ares/1.11.0
25-Feb-2018 10:41:08 [---] Data directory: /Library/Application Support/BOINC Data
25-Feb-2018 10:41:10 [---] OpenCL: NVIDIA GPU 0: GeForce 9400 (driver version 10.0.52 310.90.10.05b46, device version OpenCL 1.0, 256MB, 256MB available, 211 GFLOPS peak)
25-Feb-2018 10:41:10 [---] OpenCL CPU: Intel(R) Core(TM)2 Duo CPU P7350 @ 2.00GHz (OpenCL driver vendor: Apple, driver version 1.1, device version OpenCL 1.2)
25-Feb-2018 10:41:10 [---]
25-Feb-2018 10:41:10 [---] Processor: 2 GenuineIntel Intel(R) Core(TM)2 Duo CPU P7350 @ 2.00GHz [x86 Family 6 Model 23 Stepping 10]
25-Feb-2018 10:41:10 [---] Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clfsh ds acpi mmx fxsr sse sse2 ss htt tm pbe pni dtes64 mon dscpl vmx est tm2 ssse3 cx16 tpr pdcm sse4_1 xsave
25-Feb-2018 10:41:11 [---] OS: Mac OS X 10.11.6 (Darwin 15.6.0)
25-Feb-2018 10:41:11 [---] Memory: 3.00 GB physical, 12.34 GB virtual
25-Feb-2018 10:41:11 [---] Disk: 151.40 GB total, 12.10 GB free
There are others with the same machine here; you can see how they perform with a Pre-Fermi mobile GPU: https://setiweb.ssl.berkeley.edu/beta/results.php?hostid=82897 It appears it will work, but it is about 2 to 3 times slower than the CPU, while using the faster CPU to do Slower GPU work. So, the question would be, why bother when you can just use the Faster CPU instead?
ID: 1929811 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.