Panic Mode On (92) Server Problems?


Previous · 1 . . . 9 · 10 · 11 · 12 · 13 · 14 · 15 . . . 23 · Next

Aurora Borealis
Volunteer tester
Avatar

Send message
Joined: 14 Jan 01
Posts: 3075
Credit: 5,631,463
RAC: 0
Canada
Message 1605731 - Posted: 26 Nov 2014, 14:53:00 UTC
Last modified: 26 Nov 2014, 15:41:26 UTC

I've been crunching mostly Beta since last week, since it has a 3-to-1 credit ratio with Main. Last night I ran out of Beta work, so BOINC started crunching my reserve cache of 80 WUs from Main. I'm now down to 60 WUs, including 28 resends (3 AP) picked up overnight. I figure I've still got another 8 to 10 hrs before my GPU falls back on Milkyway and Einstein.

EDIT: Got a full (50 WU) cache from BETA, so I'm good for 36 hrs.

Boinc V7.2.42
Win7 i5 3.33G 4GB, GTX470
ID: 1605731 · Report as offensive
Profile cliff
Avatar

Send message
Joined: 16 Dec 07
Posts: 625
Credit: 3,590,440
RAC: 0
United Kingdom
Message 1605772 - Posted: 26 Nov 2014, 18:24:42 UTC

Well, my luck's in today; I just got another 2 MB tasks. The last lot were resends, but a WU is a WU just the same :-)

Regards,
Cliff,
Been there, Done that, Still no damm T shirt!
ID: 1605772 · Report as offensive
Profile KWSN Ekky Ekky Ekky
Avatar

Send message
Joined: 25 May 99
Posts: 944
Credit: 52,956,491
RAC: 67
United Kingdom
Message 1605778 - Posted: 26 Nov 2014, 18:49:19 UTC - in response to Message 1605772.  

+1

Well, my luck's in today; I just got another 2 MB tasks. The last lot were resends, but a WU is a WU just the same :-)

Regards,


ID: 1605778 · Report as offensive
Profile Julie
Volunteer moderator
Volunteer tester
Avatar

Send message
Joined: 28 Oct 09
Posts: 34053
Credit: 18,883,157
RAC: 18
Belgium
Message 1605790 - Posted: 26 Nov 2014, 19:16:10 UTC - in response to Message 1605778.  

+1

Well, my luck's in today; I just got another 2 MB tasks. The last lot were resends, but a WU is a WU just the same :-)

Regards,


+2
rOZZ
Music
Pictures
ID: 1605790 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester

Send message
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1605791 - Posted: 26 Nov 2014, 19:16:18 UTC - in response to Message 1605329.  

Update on my Beta crunching: my i7 crunched through all the cuda32s and cuda42s (and a few cuda50s) it was sent, and got sent more 32s. Looking at the processing times, though, it seems to my amateur eye like the 42s were more efficient. Does the scheduler think differently than I do? (Probably...)

The Average processing rate (APR) shown on the Application details for host 66539 is 72.81 GFLOPS for CUDA32 and 64.93 GFLOPS for CUDA42. That's the basis on which the Scheduler logic considers CUDA32 faster.

However, the Scheduler choice of which to send has a random factor applied. It's based on the normal distribution so is usually small, and it is further scaled down by the number of completed tasks in the host average. The idea is that the host averages will get more reliable as more tasks are averaged in. But the host averages are calculated using exponential smoothing such that about half the average is based on the most recent 69 values, so they can actually vary considerably for GPU processing.

The "GFLOPS" are derived from the estimate of floating point operations the splitters provide. For SaH v7 those estimates are based on angle range (AR), and are a compromise between how AR affects processing on CPU or GPU. That compromise means that very high AR "shorties" provide a lower APR than normal midrange AR tasks when processed on GPU. It looks like all the CUDA42 tasks you got were shorties but CUDA32 has gotten a mix. Perhaps the next time the random factor makes the Scheduler send CUDA42 you'll get a batch of midrange AR tasks which will increase the APR for those.
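
For anyone curious how those two pieces interact, here is a minimal Python sketch (not the actual BOINC server code; the constants, names, and the exact scaling of the random factor are assumptions based on the description above):

import random

# Rough sketch (not the actual BOINC server code) of the two mechanisms
# described above: exponential smoothing of the host's average processing
# rate (APR), and a normally distributed random factor that shrinks as more
# tasks are averaged in.  All constants and names here are illustrative.

HALF_LIFE_SAMPLES = 69                          # ~half the weight sits in the last 69 results
ALPHA = 1 - 0.5 ** (1.0 / HALF_LIFE_SAMPLES)    # per-sample smoothing weight

def update_apr(current_apr, new_rate_gflops):
    """Fold one completed task's measured rate into the running average."""
    return (1 - ALPHA) * current_apr + ALPHA * new_rate_gflops

def scheduler_score(apr_gflops, n_completed):
    """APR plus a small Gaussian nudge that fades as the sample count grows."""
    noise = random.gauss(0, 1) / max(n_completed, 1)
    return apr_gflops * (1 + noise)

# Example: CUDA32 looks faster on average (72.81 vs 64.93 GFLOPS), but with
# few completed tasks the random factor can still let CUDA42 win a request.
cuda32, cuda42 = 72.81, 64.93
for _ in range(5):
    pick = "cuda32" if scheduler_score(cuda32, 40) > scheduler_score(cuda42, 12) else "cuda42"
    print(pick)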
                                                                  Joe
ID: 1605791 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1605892 - Posted: 27 Nov 2014, 0:01:21 UTC - in response to Message 1605791.  

Update on my Beta crunching: my i7 crunched through all the cuda32s and cuda42s (and a few cuda50s) it was sent, and got sent more 32s. Looking at the processing times, though, it seems to my amateur eye like the 42s were more efficient. Does the scheduler think differently than I do? (Probably...)

The Average processing rate (APR) shown on the Application details for host 66539 is 72.81 GFLOPS for CUDA32 and 64.93 GFLOPS for CUDA42. That's the basis on which the Scheduler logic considers CUDA32 faster.

However, the Scheduler choice of which to send has a random factor applied. It's based on the normal distribution so is usually small, and it is further scaled down by the number of completed tasks in the host average. The idea is that the host averages will get more reliable as more tasks are averaged in. But the host averages are calculated using exponential smoothing such that about half the average is based on the most recent 69 values, so they can actually vary considerably for GPU processing.

The "GFLOPS" are derived from the estimate of floating point operations the splitters provide. For SaH v7 those estimates are based on angle range (AR), and are a compromise between how AR affects processing on CPU or GPU. That compromise means that very high AR "shorties" provide a lower APR than normal midrange AR tasks when processed on GPU. It looks like all the CUDA42 tasks you got were shorties but CUDA32 has gotten a mix. Perhaps the next time the random factor makes the Scheduler send CUDA42 you'll get a batch of midrange AR tasks which will increase the APR for those.
                                                                  Joe


A sidenote perhaps interesting to some: in addition to the averages being volatile with the work mix (and with any hardware or app change that isn't managed), the estimate mechanism already contains several 'noisy' inputs with various offsets, scales and variances in practice. Those sources of noise easily swamp the deliberate random factors, inducing chaotic behaviour and making the random offsets more or less redundant.
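
A quick way to see the scale of that effect is a toy comparison like the one below; the spreads are invented for illustration, not measured from real hosts:

import random, statistics

# Toy comparison (assumed magnitudes): jitter in measured APR coming from the
# work mix itself versus the scheduler's deliberate random offset once a host
# has a few dozen completed tasks in its average.

random.seed(1)
work_mix_noise = [random.gauss(70, 8) for _ in range(1000)]            # APR spread from shorties vs midrange ARs
sched_offsets  = [70 * random.gauss(0, 1) / 40 for _ in range(1000)]   # random factor scaled by ~40 samples

print("stdev from work mix:    %.2f GFLOPS" % statistics.stdev(work_mix_noise))
print("stdev of random offset: %.2f GFLOPS" % statistics.stdev(sched_offsets))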
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1605892 · Report as offensive
Profile betreger Project Donor
Avatar

Send message
Joined: 29 Jun 99
Posts: 11361
Credit: 29,581,041
RAC: 66
United States
Message 1605896 - Posted: 27 Nov 2014, 0:54:30 UTC

And now we have fewer than 900K MB tasks out in the field.
ID: 1605896 · Report as offensive
David S
Volunteer tester
Avatar

Send message
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1605937 - Posted: 27 Nov 2014, 3:34:08 UTC - in response to Message 1605892.  

Update on my Beta crunching: my i7 crunched through all the cuda32s and cuda42s (and a few cuda50s) it was sent, and got sent more 32s. Looking at the processing times, though, it seems to my amateur eye like the 42s were more efficient. Does the scheduler think differently than I do? (Probably...)

The Average processing rate (APR) shown on the Application details for host 66539 is 72.81 GFLOPS for CUDA32 and 64.93 GFLOPS for CUDA42. That's the basis on which the Scheduler logic considers CUDA32 faster.

However, the Scheduler choice of which to send has a random factor applied. It's based on the normal distribution so is usually small, and it is further scaled down by the number of completed tasks in the host average. The idea is that the host averages will get more reliable as more tasks are averaged in. But the host averages are calculated using exponential smoothing such that about half the average is based on the most recent 69 values, so they can actually vary considerably for GPU processing.

The "GFLOPS" are derived from the estimate of floating point operations the splitters provide. For SaH v7 those estimates are based on angle range (AR), and are a compromise between how AR affects processing on CPU or GPU. That compromise means that very high AR "shorties" provide a lower APR than normal midrange AR tasks when processed on GPU. It looks like all the CUDA42 tasks you got were shorties but CUDA32 has gotten a mix. Perhaps the next time the random factor makes the Scheduler send CUDA42 you'll get a batch of midrange AR tasks which will increase the APR for those.
                                                                  Joe


A sidenote perhaps interesting to some: in addition to the averages being volatile with the work mix (and with any hardware or app change that isn't managed), the estimate mechanism already contains several 'noisy' inputs with various offsets, scales and variances in practice. Those sources of noise easily swamp the deliberate random factors, inducing chaotic behaviour and making the random offsets more or less redundant.

Thanks, gentlemen.
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

ID: 1605937 · Report as offensive
David S
Volunteer tester
Avatar

Send message
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1605938 - Posted: 27 Nov 2014, 3:36:35 UTC - in response to Message 1605432.  

I think they may have disabled or changed the built in degradation part of RAC. Normally it should drop at least once a week, but I have machines that have had their RAC flat for ~3 weeks.

My RAC is continuing to fall.
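
For reference, when no new credit is granted BOINC decays RAC exponentially; a minimal sketch of that, assuming the standard one-week half-life rather than anything read from this project's config:

import math

# Standard BOINC behaviour (assumed here, not read from this project's config):
# with no newly granted credit, RAC halves roughly every week.

HALF_LIFE_SECONDS = 7 * 24 * 3600

def decayed_rac(rac, seconds_idle):
    return rac * math.exp(-math.log(2) * seconds_idle / HALF_LIFE_SECONDS)

print(decayed_rac(1000.0, 3 * HALF_LIFE_SECONDS))   # ~125 after three idle weeks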
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

ID: 1605938 · Report as offensive
Profile Wiggo
Avatar

Send message
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1605940 - Posted: 27 Nov 2014, 3:39:05 UTC
Last modified: 27 Nov 2014, 3:49:46 UTC

I've received 11 resends so far today, including a V7 AP. :-D

Cheers.
ID: 1605940 · Report as offensive
Profile cliff
Avatar

Send message
Joined: 16 Dec 07
Posts: 625
Credit: 3,590,440
RAC: 0
United Kingdom
Message 1605946 - Posted: 27 Nov 2014, 4:12:55 UTC - in response to Message 1605940.  

Hi Wiggo,

Yeah, I got a whole 1 AP resend :-) However, APs don't seem to be getting validated, since the AP validator is apparently offline.

Guess that's no surprise considering the AP database problems.

But I still have a couple of MB tasks waiting in the wings:-)

Regards,
Cliff,
Been there, Done that, Still no damm T shirt!
ID: 1605946 · Report as offensive
Profile Jeff Buck Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Send message
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1605966 - Posted: 27 Nov 2014, 5:23:25 UTC

I notice that the SSP seems to cough up this hairball,
Warning: number_format() expects parameter 1 to be double, string given in /disks/carolyn/b/home/boincadm/projects/sah/html/seti_boinc_html/sah_status.php on line 417

any time the "Results received in last hour" for AP drops to zero. Perhaps a "divide by zero" type of error that's now showing up as we start to hit bottom.
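
I haven't seen the sah_status.php source, so this is only a guess at the mechanism, but the sketch below (in Python rather than PHP) shows the kind of coercion that avoids that class of warning when the hourly count comes back empty:

# Hypothetical illustration, not the actual sah_status.php code: if a query
# for "results received in last hour" returns an empty string instead of a
# number, formatting it as a number fails unless the value is coerced first.

def format_rate(raw_value):
    try:
        value = float(raw_value)        # handles '', None, '0', '123.4'
    except (TypeError, ValueError):
        value = 0.0
    return "{:,.2f}".format(value)

print(format_rate(""))        # '0.00' instead of a warning/exception
print(format_rate("1234.5"))  # '1,234.50'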
ID: 1605966 · Report as offensive
Profile cliff
Avatar

Send message
Joined: 16 Dec 07
Posts: 625
Credit: 3,590,440
RAC: 0
United Kingdom
Message 1606110 - Posted: 27 Nov 2014, 12:09:05 UTC

This lark is getting beyond a joke now: not only is S@H effectively down for work, but a darn backup project has now gone down as well..

Maybe it's server flu or summat.. going the rounds from one project to the next..


Regards,
Cliff,
Been there, Done that, Still no damm T shirt!
ID: 1606110 · Report as offensive
Profile Mike Special Project $75 donor
Volunteer tester
Avatar

Send message
Joined: 17 Feb 01
Posts: 34256
Credit: 79,922,639
RAC: 80
Germany
Message 1606281 - Posted: 27 Nov 2014, 21:31:26 UTC

Enabled work fetch on Einstein.

No work for my GPU for hours, and my CPU is almost freezing while crunching.
Won't happen anymore.


With each crime and every kindness we birth our future.
ID: 1606281 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Send message
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1606283 - Posted: 27 Nov 2014, 22:01:38 UTC - in response to Message 1606281.  

Enabled work fetch on Einstein.

No work for my GPU for hours, and my CPU is almost freezing while crunching.
Won't happen anymore.

All of mine are fetching, crunching and reporting as requested.

http://einstein.phys.uwm.edu/hosts_user.php?userid=144054

Check that your preferences are up-to-date for the applications you were expecting to crunch.
ID: 1606283 · Report as offensive
Profile Mike Special Project $75 donor
Volunteer tester
Avatar

Send message
Joined: 17 Feb 01
Posts: 34256
Credit: 79,922,639
RAC: 80
Germany
Message 1606287 - Posted: 27 Nov 2014, 22:18:41 UTC - in response to Message 1606283.  
Last modified: 27 Nov 2014, 22:19:42 UTC

Enabled work fetch on Einstein.

No work for my GPU for hours, and my CPU is almost freezing while crunching.
Won't happen anymore.

All of mine are fetching, crunching and reporting as requested.

http://einstein.phys.uwm.edu/hosts_user.php?userid=144054

Check that your preferences are up-to-date for the applications you were expecting to crunch.


Richard, I'm no newbie.

27.11.2014 23:17:17 Einstein@Home update requested by user
27.11.2014 23:17:21 Einstein@Home [sched_op] Starting scheduler request
27.11.2014 23:17:21 Einstein@Home Sending scheduler request: Requested by user.
27.11.2014 23:17:21 Einstein@Home Requesting new tasks for CPU and ATI GPU
27.11.2014 23:17:21 Einstein@Home [sched_op] CPU work request: 1.00 seconds; 0.00 CPUs
27.11.2014 23:17:21 Einstein@Home [sched_op] ATI GPU work request: 1.00 seconds; 1.00 GPUs
27.11.2014 23:17:23 Einstein@Home Scheduler request completed: got 0 new tasks
27.11.2014 23:17:23 Einstein@Home [sched_op] Server version 611
27.11.2014 23:17:23 Einstein@Home No work sent
27.11.2014 23:17:23 Einstein@Home see scheduler log messages on http://einstein5.aei.uni-hannover.de/EinsteinAtHome/host_sched_logs/3647/3647123
27.11.2014 23:17:23 Einstein@Home Jobs for CPU are available, but your preferences are set to not accept them


With each crime and every kindness we birth our future.
ID: 1606287 · Report as offensive
bluestar

Send message
Joined: 5 Sep 12
Posts: 7015
Credit: 2,084,789
RAC: 3
Message 1606301 - Posted: 27 Nov 2014, 22:42:45 UTC

From the Server Status page:

http://setiathome.berkeley.edu/sah_status.html

Results ready to send: 3

Apparently it became 0 once again.

Yes, I know about the current problems.

http://setiathome.berkeley.edu/forum_thread.php?id=76174&postid=1605524

Please click on the links there, will you?

I guess you all missed what he said.
ID: 1606301 · Report as offensive
Claggy
Volunteer tester

Send message
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1606305 - Posted: 27 Nov 2014, 22:50:05 UTC - in response to Message 1606287.  
Last modified: 27 Nov 2014, 23:27:48 UTC

Richard, I'm no newbie.

You are in this case; you're running a BOINC client without OpenCL detection:

http://einstein5.aei.uni-hannover.de/EinsteinAtHome/host_sched_logs/3647/3647123

2014-11-27 22:24:31.7574 [PID=28773] Request: [USER#xxxxx] [HOST#3647123] [IP xxx.xxx.xxx.3] client 6.12.34
2014-11-27 22:24:31.7580 [PID=28773] [send] effective_ncpus 2 max_jobs_on_host_cpu 999999 max_jobs_on_host 999999
2014-11-27 22:24:31.7580 [PID=28773] [send] effective_ngpus 1 max_jobs_on_host_gpu 999999
2014-11-27 22:24:31.7581 [PID=28773] [send] Not using matchmaker scheduling; Not using EDF sim
2014-11-27 22:24:31.7581 [PID=28773] [send] CPU: req 0.00 sec, 0.00 instances; est delay 0.00
2014-11-27 22:24:31.7581 [PID=28773] [send] ATI: req 1.00 sec, 1.00 instances; est delay 0.00
2014-11-27 22:24:31.7581 [PID=28773] [send] work_req_seconds: 0.00 secs
2014-11-27 22:24:31.7581 [PID=28773] [send] available disk 85.37 GB, work_buf_min 345600
2014-11-27 22:24:31.7581 [PID=28773] [send] active_frac 0.955851 on_frac 0.988101 DCF 1.274796
2014-11-27 22:24:31.7589 [PID=28773] [send] [HOST#3647123] is reliable
2014-11-27 22:24:31.7590 [PID=28773] [send] set_trust: random choice for error rate 0.000010: yes
2014-11-27 22:24:31.7882 [PID=28773] [version] Checking plan class 'BRP4G-opencl-ati'
2014-11-27 22:24:31.7901 [PID=28773] [version] reading plan classes from file '/BOINC/projects/EinsteinAtHome/plan_class_spec.xml'
2014-11-27 22:24:31.7901 [PID=28773] [version] numerical Windows version: 601760100 (Microsoft Windows 7 Ultimate x64 Edition, Service Pack 1, (06.01.7601.00))
2014-11-27 22:24:31.7901 [PID=28773] [version] parsed project prefs setting 'gpu_util_brp': 1.000000
2014-11-27 22:24:31.7901 [PID=28773] [version] ATI device (or driver) doesn't support OpenCL
2014-11-27 22:24:31.7901 [PID=28773] [version] Checking plan class 'BRP4G-cuda32'
2014-11-27 22:24:31.7902 [PID=28773] [version] parsed project prefs setting 'gpu_util_brp': 1.000000
2014-11-27 22:24:31.7902 [PID=28773] [version] No CUDA devices found
2014-11-27 22:24:31.7902 [PID=28773] [version] Checking plan class 'BRP4G-cuda32-nv301'
2014-11-27 22:24:31.7902 [PID=28773] [version] parsed project prefs setting 'gpu_util_brp': 1.000000
2014-11-27 22:24:31.7902 [PID=28773] [version] No CUDA devices found
2014-11-27 22:24:31.7902 [PID=28773] [version] Checking plan class 'BRP4G-opencl-ati'
2014-11-27 22:24:31.7902 [PID=28773] [version] parsed project prefs setting 'gpu_util_brp': 1.000000
2014-11-27 22:24:31.7902 [PID=28773] [version] ATI device (or driver) doesn't support OpenCL
2014-11-27 22:24:31.7902 [PID=28773] [version] no app version available: APP#25 (einsteinbinary_BRP4G) PLATFORM#9 (windows_x86_64) min_version 0
2014-11-27 22:24:31.7902 [PID=28773] [version] no app version available: APP#25 (einsteinbinary_BRP4G) PLATFORM#2 (windows_intelx86) min_version 0
2014-11-27 22:24:31.7912 [PID=28773] [version] Checking plan class 'opencl-intel_gpu'
2014-11-27 22:24:31.7912 [PID=28773] [version] parsed project prefs setting 'gpu_util_brp': 1.000000
2014-11-27 22:24:31.7912 [PID=28773] [version] No Intel GPU devices found
2014-11-27 22:24:31.7912 [PID=28773] [version] Checking plan class 'opencl-intel_gpu'
2014-11-27 22:24:31.7912 [PID=28773] [version] parsed project prefs setting 'gpu_util_brp': 1.000000
2014-11-27 22:24:31.7912 [PID=28773] [version] No Intel GPU devices found
2014-11-27 22:24:31.7912 [PID=28773] [version] no app version available: APP#19 (einsteinbinary_BRP4) PLATFORM#9 (windows_x86_64) min_version 0
2014-11-27 22:24:31.7912 [PID=28773] [version] no app version available: APP#19 (einsteinbinary_BRP4) PLATFORM#2 (windows_intelx86) min_version 0
2014-11-27 22:24:31.7913 [PID=28773] [version] Checking plan class 'BRP5-opencl-ati'
2014-11-27 22:24:31.7913 [PID=28773] [version] parsed project prefs setting 'gpu_util_brp': 1.000000
2014-11-27 22:24:31.7913 [PID=28773] [version] ATI device (or driver) doesn't support OpenCL
2014-11-27 22:24:31.7913 [PID=28773] [version] Checking plan class 'BRP5-cuda32'
2014-11-27 22:24:31.7913 [PID=28773] [version] parsed project prefs setting 'gpu_util_brp': 1.000000
2014-11-27 22:24:31.7914 [PID=28773] [version] No CUDA devices found
2014-11-27 22:24:31.7914 [PID=28773] [version] Checking plan class 'BRP5-cuda32-nv301'
2014-11-27 22:24:31.7914 [PID=28773] [version] parsed project prefs setting 'gpu_util_brp': 1.000000
2014-11-27 22:24:31.7914 [PID=28773] [version] No CUDA devices found
2014-11-27 22:24:31.7914 [PID=28773] [version] Checking plan class 'BRP5-opencl-ati'
2014-11-27 22:24:31.7914 [PID=28773] [version] parsed project prefs setting 'gpu_util_brp': 1.000000
2014-11-27 22:24:31.7914 [PID=28773] [version] ATI device (or driver) doesn't support OpenCL
2014-11-27 22:24:31.7914 [PID=28773] [version] no app version available: APP#23 (einsteinbinary_BRP5) PLATFORM#9 (windows_x86_64) min_version 0
2014-11-27 22:24:31.7914 [PID=28773] [version] no app version available: APP#23 (einsteinbinary_BRP5) PLATFORM#2 (windows_intelx86) min_version 0
2014-11-27 22:24:31.7915 [PID=28773] [version] Checking plan class 'FGRP4-SSE2'
2014-11-27 22:24:31.7915 [PID=28773] [version] plan class ok
2014-11-27 22:24:31.7915 [PID=28773] [version] Don't need CPU jobs, skipping version 104 for hsgamma_FGRP4 (FGRP4-SSE2)
2014-11-27 22:24:31.7915 [PID=28773] [version] Checking plan class 'FGRP4-Beta'
2014-11-27 22:24:31.7915 [PID=28773] [version] beta test app versions not allowed in project prefs.
2014-11-27 22:24:31.7915 [PID=28773] [version] no app version available: APP#27 (hsgamma_FGRP4) PLATFORM#9 (windows_x86_64) min_version 0
2014-11-27 22:24:31.7915 [PID=28773] [version] no app version available: APP#27 (hsgamma_FGRP4) PLATFORM#2 (windows_intelx86) min_version 0
2014-11-27 22:24:31.8019 [PID=28773] [debug] [HOST#3647123] MSG(high) No work sent
2014-11-27 22:24:31.8019 [PID=28773] [debug] [HOST#3647123] MSG(high) see scheduler log messages on http://einstein5.aei.uni-hannover.de/EinsteinAtHome/host_sched_logs/3647/3647123
2014-11-27 22:24:31.8020 [PID=28773] Sending reply to [HOST#3647123]: 0 results, delay req 60.00
2014-11-27 22:24:31.8030 [PID=28773] Scheduler ran 0.049 seconds


Claggy
ID: 1606305 · Report as offensive
Profile Blurf
Volunteer tester

Send message
Joined: 2 Sep 06
Posts: 8962
Credit: 12,678,685
RAC: 0
United States
Message 1606313 - Posted: 27 Nov 2014, 23:27:13 UTC

I'm hoping the donations earned from crunching at BU will allow for some new hardware purchases.


ID: 1606313 · Report as offensive
Profile Jord
Volunteer tester
Avatar

Send message
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1606331 - Posted: 28 Nov 2014, 1:10:04 UTC

Still got 20 tasks in cache. It helps that I'm being maimed to death by killer eagles in Far Cry 4 most of the day. :)
ID: 1606331 · Report as offensive