Panic Mode On (84) Server Problems?


Lionel
Joined: 25 Mar 00
Posts: 545
Credit: 230,870,044
RAC: 249,566
Australia
Message 1373433 - Posted: 30 May 2013, 10:48:16 UTC - in response to Message 1373431.

Just received a "cuda_opend_100" AP work unit on the box that houses the GTX295s. How do I stop AP from landing on this box in v7 ??
____________

Richard Haselgrove (Project donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 8550
Credit: 50,364,021
RAC: 50,747
United Kingdom
Message 1373454 - Posted: 30 May 2013, 11:16:36 UTC - in response to Message 1373433.

Just received a "cuda_opend_100" AP work unit on the box that houses the GTX295s. How do I stop AP from landing on this box in v7 ??

Same way as always - deselect AP in preferences.

There's no AP v7 - still at v6 for that app.
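(If you want to keep AP off one particular box regardless of the web preferences, the BOINC client also supports GPU exclusions in cc_config.xml. A minimal sketch, assuming the AstroPulse application's short name is astropulse_v6 - check the name your event log or app_info.xml actually uses before copying it:)

    <cc_config>
      <options>
        <!-- keep AstroPulse off every GPU in this box; add <device_num> to target a single card -->
        <exclude_gpu>
          <url>http://setiathome.berkeley.edu/</url>
          <app>astropulse_v6</app>
        </exclude_gpu>
      </options>
    </cc_config>

Save it as cc_config.xml in the BOINC data directory and restart the client (or re-read the config files from the Manager). The per-venue web preference described above remains the simpler project-side switch.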

juan BFB (Project donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 5344
Credit: 298,605,858
RAC: 465,251
Brazil
Message 1373469 - Posted: 30 May 2013, 12:10:34 UTC - in response to Message 1373281.
Last modified: 30 May 2013, 12:18:09 UTC

Looks like it'll be system specific, so I'll probably present the 3 options I mentioned in response to Fred E, to the project. Try the reduced instances / settings described. See if anything changes if you free a CPU core.


Jason

As you asked, I made the test running 2 WUs at a time (with 3 the video lag is too high).

Freeing a CPU core has no noticeable effect.
Changing the settings helps with the video lag.
Cuda32 - 1 1/2 to 2 hrs to crunch a VLAR
Cuda42 - 1 hr
Cuda50 - no VLAR is available in the cache to test.

Normal crunch time (non-VLARs) on this host with the old V6 and x41zc Cuda50 and 3 WUs at a time: 10-15 min; with the new V7 and 2 WUs: 10-15 min, a little bit slower as expected.

From my side, as a user, crunching VLARs is 4 to 5x slower than non-VLAR WUs, so I still believe it is a waste of resources to crunch VLARs on the GPU, on this host at least.

Host: http://setiathome.berkeley.edu/show_host_detail.php?hostid=6283715
____________

jason_gee (Project donor)
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 5021
Credit: 73,545,305
RAC: 14,966
Australia
Message 1373471 - Posted: 30 May 2013, 12:21:36 UTC - in response to Message 1373469.

Normal crunch time on this host with the old V6 and x41zc Cuda50 and 3 WUs at a time: 10-15 min

Host: http://setiathome.berkeley.edu/show_host_detail.php?hostid=6283715


Great to have some numbers, and the parameters have *some* intended effect. How long does your CPU take to do a V7 VLAR?

My old Core2Duo takes 3.5 to 4 hours, which is much longer than V6 of course, and 4.5 to 6x longer than GTX680+x41zc. AVX-capable CPUs will be quicker than my old beast.

For now, just remember V7 involves much more intense processing; Eric will detail the whys and wherefores in short order. Comparing V7 to V6 is going to be an issue for some who believe they do the same thing (they don't). For others who want absolute control, there just aren't any controls yet ;)

Jason
____________
"It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is the most adaptable to change."
Charles Darwin

MikeN
Joined: 24 Jan 11
Posts: 301
Credit: 31,916,249
RAC: 41,676
United Kingdom
Message 1373472 - Posted: 30 May 2013, 12:22:27 UTC

I upgraded 5833982 to BOINC 7.0.64 during the outage yesterday to get it ready for MB v7. Once the servers came back up I reset the project and got all my tasks back as resends (or most of them anyway). However, since then I have been getting some 194 errors from both CPU- and GPU-processed tasks, whilst others have run and validated OK. Is this anything to worry about? All error tasks for this PC are listed at: http://setiathome.berkeley.edu/results.php?hostid=5833982&offset=0&show_names=0&state=6&appid=. Ignore any returned on 29th May, as that was when I was updating / resetting, but at the time of writing there are three returned today, 30th May.
____________

William (Project donor)
Volunteer tester
Joined: 14 Feb 13
Posts: 1602
Credit: 9,469,048
RAC: 276
Message 1373474 - Posted: 30 May 2013, 12:35:40 UTC - in response to Message 1373472.

I upgraded 5833982 to BOINC 7.0.64 during the outage yesterday to get it ready for MB v7. Once the servers came back up I reset the project and got all my tasks back as resends (or most of them anyway). However, since then I have been getting some 194 errors from both CPU- and GPU-processed tasks, whilst others have run and validated OK. Is this anything to worry about? All error tasks for this PC are listed at: http://setiathome.berkeley.edu/results.php?hostid=5833982&offset=0&show_names=0&state=6&appid=. Ignore any returned on 29th May, as that was when I was updating / resetting, but at the time of writing there are three returned today, 30th May.

<sarcasm>Great</sarcasm>
OK, I see some 'time limit exceeded' errors, which is the BOINC upgrade bug as per http://setiathome.berkeley.edu/forum_thread.php?id=71160.

http://setiathome.berkeley.edu/result.php?resultid=3006233155

Looks like it is one of those where BOINC fails to receive the 'I'm done' message from the app.
We've been trying to chase that error to no avail.
Actually I'm glad it is popping up with cuda, as Jason uses a different API from Raistmer.

I've lost track - for those reading boinc_dev and _alpha - is that error still turning up after Raistmer upgraded to the latest API?

MikeN - please start a dedicated thread. It needs chasing, but we can't do that in the middle of an application upgrade.
____________
A person who won't read has no advantage over one who can't read. (Mark Twain)

juan BFB (Project donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 5344
Credit: 298,605,858
RAC: 465,251
Brazil
Message 1373475 - Posted: 30 May 2013, 12:38:15 UTC - in response to Message 1373471.
Last modified: 30 May 2013, 12:43:58 UTC

How long does your CPU take to do a V7 VLAR?

About 4 1/2 to 5 hours; this is an old CPU too.

I understand v7 uses more processing power/time than v6, but it improves the ability to search the signal more deeply, so that is a good thing.
____________

jason_gee (Project donor)
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 5021
Credit: 73,545,305
RAC: 14,966
Australia
Message 1373478 - Posted: 30 May 2013, 12:49:08 UTC - in response to Message 1373475.
Last modified: 30 May 2013, 12:56:02 UTC

How long does your CPU take to do a V7 VLAR?

About 4 1/2 to 5 hours


OK, keep an eye on the Cuda50 times and let me know, please. They should be around 50-60 mins for 2 at once (so effectively half an hour each), similar to Cuda42 or a bit better.

As for the pulsefinding settings related to lag, you can experiment with higher & lower values within the range.
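(For reference, on the x41zc-derived Cuda builds those pulsefind settings normally live in an mbcuda.cfg file in the project directory. A rough sketch, assuming the key names from the x41zc readme; the values below are only illustrative starting points, not recommendations:)

    [mbcuda]
    ; higher values favour throughput, lower values reduce display lag
    processpriority = abovenormal
    pfblockspersm = 4
    pfperiodsperlaunch = 100

Lower values keep each pulsefind launch shorter, which is what relieves the screen lag; higher values favour throughput.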

Until baseline V7 settles in, it won't be a productivity race, but one of understanding all the new issues (including an appropriate new VLAR-on-GPU policy).

After this the gloves are off, and it's open season on those hefty pulsefinds and new autocorrelations for another major optimisation round ;)

Jason

[Edit:]
I understand v7 uses more processing power/time than v6, but it improves the ability to search the signal more deeply, so that is a good thing.


As well as the extra processing with the new autocorrelation search, there are moves toward fewer inconclusives across all the v7 apps. Most of the ones you do see with V7 will be due to either bad hosts or known issues. Having the dated stock 6.xx Cuda apps out of the way is going to be a relief, even as new, unexpected challenges take their place.
____________
"It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is the most adaptable to change."
Charles Darwin

juan BFB (Project donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 5344
Credit: 298,605,858
RAC: 465,251
Brazil
Message 1373481 - Posted: 30 May 2013, 13:01:43 UTC - in response to Message 1373478.
Last modified: 30 May 2013, 13:02:34 UTC


After this the gloves are off, and it's open season on those hefty pulsefinds and new autocorrelations for another major optimisation round ;)

I like that... seems like a new round of Jason's black magic is coming... that is always a good thing... let's wait...

As soon as I receive any VLAR to crunch with Cuda50 on this host, I will post the times. Do you want to know the times on other GPUs? I have 580/590/690 in other hosts, now running v7 with 2 WUs at a time on each one.
____________

Paul Bowyer (Project donor)
Joined: 15 Aug 99
Posts: 9
Credit: 70,497,718
RAC: 127,982
United States
Message 1373487 - Posted: 30 May 2013, 13:20:05 UTC

I have something weird going on with my 2 machines: they were assigned VLARs, but they are showing up as suspended by user and I did not suspend them.


Same thing happened on both my boxes - one was running v7, but the other is still running v6. The v7 cache ran dry overnight because of the suspended WU. I resumed it and it ran OK.
____________

jason_gee (Project donor)
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 5021
Credit: 73,545,305
RAC: 14,966
Australia
Message 1373490 - Posted: 30 May 2013, 13:32:55 UTC - in response to Message 1373481.
Last modified: 30 May 2013, 13:34:02 UTC

As soon as I receive any VLAR to crunch with Cuda50 on this host, I will post the times. Do you want to know the times on other GPUs? I have 580/590/690 in other hosts, now running v7 with 2 WUs at a time on each one.


Yes thanks, though the Fermi-class (5xx) cards probably won't receive them for now. It's the Kepler GPUs that are testing the efficiency bounds, and whether there are major problems to solve.
____________
"It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is the most adaptable to change."
Charles Darwin

MikeN
Joined: 24 Jan 11
Posts: 301
Credit: 31,916,249
RAC: 41,676
United Kingdom
Message 1373496 - Posted: 30 May 2013, 14:00:29 UTC

My GTX460 has just started running Cuda50s. They seem to take either 7 minutes or 20 minutes (running 1 WU at a time, as I do not know how to run 2 yet). I assume the 7-minute ones are shorties and the 20-minute ones normal angle range. For v6, shorties used to take 4 minutes (running 2 at a time) and normal angle range 18 minutes (running 2 at a time).
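(Running 2 at a time on the stock apps can be done with an app_config.xml in the setiathome.berkeley.edu project folder on recent 7.0.x clients. A minimal sketch, assuming the v7 application's short name is setiathome_v7 - check the name your event log reports; the cpu_usage figure is only illustrative:)

    <app_config>
      <app>
        <name>setiathome_v7</name>
        <gpu_versions>
          <!-- 0.5 GPUs per task = two tasks share each GPU -->
          <gpu_usage>0.5</gpu_usage>
          <cpu_usage>0.04</cpu_usage>
        </gpu_versions>
      </app>
    </app_config>

Anonymous-platform (app_info.xml) users set the same thing via the <count> value in the <coproc> section instead. Whether 2-at-once actually gains anything on a GTX 460 is worth testing rather than assuming.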
____________

Sten-Arne
Volunteer tester
Joined: 1 Nov 08
Posts: 3410
Credit: 20,268,298
RAC: 24,321
Sweden
Message 1373517 - Posted: 30 May 2013, 15:03:13 UTC
Last modified: 30 May 2013, 15:03:37 UTC

And now to something completely different, but rather expected:

Where's my APs? :-)

I do not plan to move to MB v7 in the near future, unless AP dries up completely.
____________

Donald L. Johnson (Project donor)
Joined: 5 Aug 02
Posts: 6211
Credit: 710,458
RAC: 1,193
United States
Message 1373523 - Posted: 30 May 2013, 15:12:27 UTC - in response to Message 1373497.
Last modified: 30 May 2013, 15:13:21 UTC

Is the reason for no work being split, or very low amounts (sub 10 per second), to do with the rollout of the new application (version 7), or is there something more server-related going on?

I suspect something's borked.

Or the server status page is not showing the activity of v7...

Yes, exactly right, Mark. Same as when Astropulse V6 rolled out. The SSP continued to show only AP v505 until almost all of them were done and validated, THEN they changed the scripts to track AP v6 generation and validation.

So for the next two to three months, the SETI@home "Results Ready to Send" and "Creation Rate" numbers will be close to zero and will not be valid indicators of splitter performance.
____________
Donald
Infernal Optimist / Submariner, retired

SonicAgamemnon
Joined: 8 Apr 06
Posts: 31
Credit: 11,893,945
RAC: 0
United States
Message 1373531 - Posted: 30 May 2013, 15:22:17 UTC

My computer appears to have fully installed v7, but the only WUs being downloaded are setiathome_enhanced 6.03, no v7 work at all. Are there v7 WUs available? I have a lot of latent CUDA horsepower waiting on the sidelines for something to crunch on...


5/30/2013 8:12:49 AM | | Starting BOINC client version 7.0.64 for windows_x86_64
5/30/2013 8:12:49 AM | | log flags: file_xfer, sched_ops, task
5/30/2013 8:12:49 AM | | Libraries: libcurl/7.25.0 OpenSSL/1.0.1 zlib/1.2.6
5/30/2013 8:12:49 AM | | Data directory: C:\ProgramData\BOINC
5/30/2013 8:12:49 AM | | Running under account SonicAgamemnon
5/30/2013 8:12:49 AM | | Processor: 12 GenuineIntel Intel(R) Core(TM) i7-3960X CPU @ 3.30GHz [Family 6 Model 45 Stepping 7]
5/30/2013 8:12:49 AM | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 popcnt aes syscall nx lm vmx tm2 dca pbe
5/30/2013 8:12:49 AM | | OS: Microsoft Windows 7: Professional x64 Edition, Service Pack 1, (06.01.7601.00)
5/30/2013 8:12:49 AM | | Memory: 31.95 GB physical, 79.88 GB virtual
5/30/2013 8:12:49 AM | | Disk: 906.10 GB total, 510.74 GB free
5/30/2013 8:12:49 AM | | Local time is UTC -7 hours
5/30/2013 8:12:49 AM | | VirtualBox version: 4.2.4
5/30/2013 8:12:49 AM | | CUDA: NVIDIA GPU 0: GeForce GTX TITAN (driver version 314.22, CUDA version 5.0, compute capability 3.5, 4096MB, 4096MB available, 4989 GFLOPS peak)
5/30/2013 8:12:49 AM | | CUDA: NVIDIA GPU 1: GeForce GTX TITAN (driver version 314.22, CUDA version 5.0, compute capability 3.5, 4096MB, 4096MB available, 4989 GFLOPS peak)
5/30/2013 8:12:49 AM | | CUDA: NVIDIA GPU 2: GeForce GTX TITAN (driver version 314.22, CUDA version 5.0, compute capability 3.5, 4096MB, 4096MB available, 4989 GFLOPS peak)
5/30/2013 8:12:49 AM | | OpenCL: NVIDIA GPU 0: GeForce GTX TITAN (driver version 314.22, device version OpenCL 1.1 CUDA, 6144MB, 4096MB available, 4989 GFLOPS peak)
5/30/2013 8:12:49 AM | | OpenCL: NVIDIA GPU 1: GeForce GTX TITAN (driver version 314.22, device version OpenCL 1.1 CUDA, 6144MB, 4096MB available, 4989 GFLOPS peak)
5/30/2013 8:12:49 AM | | OpenCL: NVIDIA GPU 2: GeForce GTX TITAN (driver version 314.22, device version OpenCL 1.1 CUDA, 6144MB, 4096MB available, 4989 GFLOPS peak)
5/30/2013 8:12:49 AM | SETI@home | Found app_info.xml; using anonymous platform
5/30/2013 8:12:49 AM | | Config: report completed tasks immediately
5/30/2013 8:12:49 AM | SETI@home | URL http://setiathome.berkeley.edu/; Computer ID 6857388; resource share 100
5/30/2013 8:12:49 AM | SETI@home | General prefs: from SETI@home (last modified 29-May-2013 20:26:02)
5/30/2013 8:12:49 AM | SETI@home | Computer location: home
5/30/2013 8:12:49 AM | SETI@home | General prefs: no separate prefs for home; using your defaults
5/30/2013 8:12:49 AM | | Reading preferences override file
5/30/2013 8:12:49 AM | | Preferences:
5/30/2013 8:12:49 AM | | max memory usage when active: 16360.19MB
5/30/2013 8:12:49 AM | | max memory usage when idle: 29448.34MB
5/30/2013 8:12:49 AM | | max disk usage: 25.00GB
5/30/2013 8:12:49 AM | | max download rate: 999004 bytes/sec
5/30/2013 8:12:49 AM | | max upload rate: 999004 bytes/sec
5/30/2013 8:12:49 AM | | (to change preferences, visit a project web site or select Preferences in the Manager)
5/30/2013 8:12:49 AM | | Not using a proxy
5/30/2013 8:12:50 AM | SETI@home | Restarting task 23oc12ac.24695.3339.11.11.80_3 using setiathome_enhanced version 603 in slot 9
5/30/2013 8:12:50 AM | SETI@home | Restarting task 22my12ab.9983.14791.11.11.63_2 using setiathome_enhanced version 603 in slot 3
5/30/2013 8:12:50 AM | SETI@home | Restarting task 19oc12ab.23052.22040.9.11.105_2 using setiathome_enhanced version 603 in slot 1
5/30/2013 8:12:50 AM | SETI@home | Restarting task 08jn10ac.9828.237768.14.11.195_3 using setiathome_enhanced version 603 in slot 0
5/30/2013 8:12:50 AM | SETI@home | Restarting task 23oc12ac.24695.14382.11.11.132_2 using setiathome_enhanced version 603 in slot 10
5/30/2013 8:12:50 AM | SETI@home | Restarting task 26jn12ab.6592.14968.6.11.179_2 using setiathome_enhanced version 603 in slot 11
5/30/2013 8:12:50 AM | SETI@home | Restarting task 19oc12ab.13788.11815.8.11.39_3 using setiathome_enhanced version 603 in slot 8
5/30/2013 8:12:50 AM | SETI@home | Restarting task 23oc12ac.1058.16427.14.11.218_2 using setiathome_enhanced version 603 in slot 5
5/30/2013 8:12:50 AM | SETI@home | Restarting task 26jn12ab.17010.2284.8.11.113_2 using setiathome_enhanced version 603 in slot 4
5/30/2013 8:12:50 AM | SETI@home | Restarting task 30jl12aa.22752.63374.206158430222.10.176_2 using setiathome_enhanced version 603 in slot 7
5/30/2013 8:12:50 AM | SETI@home | Restarting task 25jn11ac.10463.24374.6.11.167_2 using setiathome_enhanced version 603 in slot 2
5/30/2013 8:12:50 AM | SETI@home | Restarting task 23oc12ac.24180.12337.10.11.19_2 using setiathome_enhanced version 603 in slot 6
5/30/2013 8:17:50 AM | SETI@home | Sending scheduler request: To fetch work.
5/30/2013 8:17:50 AM | SETI@home | Requesting new tasks for CPU and NVIDIA
5/30/2013 8:17:53 AM | SETI@home | Scheduler request completed: got 0 new tasks
5/30/2013 8:17:53 AM | SETI@home | No tasks sent
5/30/2013 8:17:53 AM | SETI@home | No tasks are available for SETI@home Enhanced
5/30/2013 8:17:53 AM | SETI@home | No tasks are available for SETI@home v7



____________
"History is a pack of lies about events that never happened told by people who weren't there." - Santayana

Donald L. Johnson (Project donor)
Joined: 5 Aug 02
Posts: 6211
Credit: 710,458
RAC: 1,193
United States
Message 1373536 - Posted: 30 May 2013, 15:30:52 UTC - in response to Message 1373531.
Last modified: 30 May 2013, 15:38:02 UTC

My computer appears to have fully installed v7, but the only WUs being downloaded are setiathome_enhanced 6.03, no v7 work at all. Are there v7 WUs available? I have a lot of latent CUDA horsepower waiting on the sidelines for something to crunch on...

{Messages Log snipped}

V7 tasks are being split and issued, but there are also a lot of Enhanced v6 resends, and there will be for the next few months. It may also be that any tasks already assigned to you as v6 will be resent as v6 rather than upgraded to v7.

On the "No V7 work available" message, you may have just hit the scheduler when the feeder was empty, but just for the record: on your Account Preferences page on the S@H website, do you have V7 tasks checked?

[Edit] And as Mark said, V7 tasks are not being sent to Anonymous Platform clients. Check the "v7 roll-out" thread for info on when the new Lunatics apps are coming out, and how to revert to stock apps in the interim...
____________
Donald
Infernal Optimist / Submariner, retired

Mike (Project donor)
Volunteer tester
Joined: 17 Feb 01
Posts: 24184
Credit: 33,302,372
RAC: 24,546
Germany
Message 1373539 - Posted: 30 May 2013, 15:40:13 UTC

I'm constantly getting V7 work.

30.05.2013 17:30:08 SETI@home Reporting 3 completed tasks, requesting new tasks for ATI GPU
30.05.2013 17:30:08 SETI@home [sched_op] CPU work request: 0.00 seconds; 0.00 CPUs
30.05.2013 17:30:08 SETI@home [sched_op] ATI GPU work request: 642222.79 seconds; 0.00 GPUs
30.05.2013 17:30:10 SETI@home Started upload of 22my12ab.16948.7020.13.11.21_1_0
30.05.2013 17:30:10 SETI@home Scheduler request completed: got 2 new tasks
30.05.2013 17:30:10 SETI@home [sched_op] Server version 701
30.05.2013 17:30:10 SETI@home Project requested delay of 303 seconds
30.05.2013 17:30:10 SETI@home [sched_op] estimated total CPU task duration: 0 seconds
30.05.2013 17:30:10 SETI@home [sched_op] estimated total ATI GPU task duration: 4610 seconds
30.05.2013 17:30:10 SETI@home [sched_op] handle_scheduler_reply(): got ack for task 22my12ab.16948.7020.13.11.18_1
30.05.2013 17:30:10 SETI@home [sched_op] handle_scheduler_reply(): got ack for task 26jn12ab.13700.889.11.12.68_0
30.05.2013 17:30:10 SETI@home [sched_op] handle_scheduler_reply(): got ack for task 25jn11ac.26481.24783.10.12.93_1
30.05.2013 17:30:10 SETI@home [sched_op] Deferring communication for 5 min 3 sec
30.05.2013 17:30:10 SETI@home [sched_op] Reason: requested by project

____________
