Panic Mode On (87) Server Problems?

Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13987
Credit: 208,696,464
RAC: 304
Australia
Message 1488240 - Posted: 13 Mar 2014, 7:06:11 UTC - in response to Message 1488174.  

Has anyone noticed that there is a backlog of Astropulse work units waiting for assimilation, and that two of the four Astropulse assimilators are not running?

It's been that way for over a week now. Prior to that it was stuck for a week. Prior to that it was stuck for a couple of days, on a couple of occasions. And it's been the same for the AP validators for the same time periods.
Grant
Darwin NT
ID: 1488240
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1488298 - Posted: 13 Mar 2014, 12:34:52 UTC - in response to Message 1488009.  
Last modified: 13 Mar 2014, 12:52:51 UTC

My MB host isn't having any trouble. The two AP hosts have been dropping since Monday, when both were full. Both should be close to 200; one is now at 107 and continues to drop, with only an occasional download. Most of the time the log simply says:
Wed Mar 12 14:57:24 2014 | SETI@home | Sending scheduler request: To fetch work.
Wed Mar 12 14:57:24 2014 | SETI@home | Reporting 2 completed tasks
Wed Mar 12 14:57:24 2014 | SETI@home | Requesting new tasks for ATI
Wed Mar 12 14:57:26 2014 | SETI@home | Scheduler request completed: got 0 new tasks
Wed Mar 12 14:57:26 2014 | SETI@home | No tasks sent
Wed Mar 12 14:57:26 2014 | SETI@home | No tasks are available for AstroPulse v6
Wed Mar 12 15:03:01 2014 | SETI@home | Computation for task ap_11ap13aa_B4_P0_00393_20140310_05311.wu_2 finished
Wed Mar 12 15:03:01 2014 | SETI@home | Starting task ap_04mr13aa_B1_P1_00105_20140310_18914.wu_0 using astropulse_v6 version 607 (opencl_ati_100) in slot 3
Wed Mar 12 15:03:03 2014 | SETI@home | Started upload of ap_11ap13aa_B4_P0_00393_20140310_05311.wu_2_0
Wed Mar 12 15:03:07 2014 | SETI@home | Finished upload of ap_11ap13aa_B4_P0_00393_20140310_05311.wu_2_0
Wed Mar 12 15:03:07 2014 | SETI@home | Sending scheduler request: To fetch work.
Wed Mar 12 15:03:07 2014 | SETI@home | Reporting 1 completed tasks
Wed Mar 12 15:03:07 2014 | SETI@home | Requesting new tasks for ATI
Wed Mar 12 15:03:09 2014 | SETI@home | Scheduler request completed: got 0 new tasks
Wed Mar 12 15:03:09 2014 | SETI@home | Project has no tasks available...

I believe the problem is associated with the bold text;
State: All (502) · In progress (107) · Validation pending (124) · Validation inconclusive (5) · Valid (265) · Invalid (0) · Error (1)
That number should be close to 100. I think you can blame that on "Workunits waiting for assimilation: 90,453"

Still No Change. Valid Count & Workunits waiting for assimilation Still Rising. Still No AP Downloads.
State: All (502) · In progress (102) · Validation pending (125) · Validation inconclusive (5) · Valid (269)
Workunits waiting for assimilation: 91,339

Still only receiving AP resends;
State: All (473) · In progress (48) · Validation pending (116) · Validation inconclusive (6) · Valid (302)
That Host will be out of work in half a day.
The other machine is still dropping as well;
State: All (424) · In progress (128) · Validation pending (111) · Validation inconclusive (6) · Valid (179)
Workunits waiting for assimilation: 99,159

Thu Mar 13 08:40:28 2014 | SETI@home | Sending scheduler request: To fetch work.
Thu Mar 13 08:40:28 2014 | SETI@home | Reporting 1 completed tasks
Thu Mar 13 08:40:28 2014 | SETI@home | Requesting new tasks for CPU and ATI
Thu Mar 13 08:40:29 2014 | SETI@home | Scheduler request completed: got 0 new tasks
Thu Mar 13 08:40:29 2014 | SETI@home | Project has no tasks available...
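
For anyone who would rather track these counts than eyeball them, here is a minimal Python sketch that parses the "State:" lines quoted above and prints the change between two snapshots of the same host. The function name `parse_state_line` is made up for illustration; it assumes the line has been copied from the task list page as plain text in the format shown in this post.

```python
import re

def parse_state_line(line):
    """Turn 'State: All (502) · In progress (107) ...' into a dict of counts."""
    # Everything after the first colon is a series of 'Label (number)' fields.
    body = line.split(":", 1)[1]
    counts = {}
    for label, number in re.findall(r"([A-Za-z ]+?)\s*\((\d[\d,]*)\)", body):
        counts[label.strip()] = int(number.replace(",", ""))
    return counts

# Two snapshots of the same host, copied from the post above.
snapshots = [
    "State: All (502) · In progress (107) · Validation pending (124) · "
    "Validation inconclusive (5) · Valid (265) · Invalid (0) · Error (1)",
    "State: All (502) · In progress (102) · Validation pending (125) · "
    "Validation inconclusive (5) · Valid (269)",
]
parsed = [parse_state_line(s) for s in snapshots]

for before, after in zip(parsed, parsed[1:]):
    for key in ("In progress", "Valid"):
        delta = after[key] - before[key]
        print(f"{key}: {before[key]} -> {after[key]} ({delta:+d})")
```

With the two snapshots above, "In progress" falls by 5 while "Valid" rises by 4, which matches the pattern described in the post.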
ID: 1488298
Miklos M.
Joined: 5 May 99
Posts: 955
Credit: 136,115,648
RAC: 73
Hungary
Message 1488333 - Posted: 13 Mar 2014, 13:56:25 UTC - in response to Message 1488176.  

I keep getting the following messages and no work:
3/13/2014 3:20:36 AM | | cc_config.xml not found - using defaults
3/13/2014 3:20:37 AM | | Starting BOINC client version 7.2.42 for windows_x86_64
3/13/2014 3:20:37 AM | | log flags: file_xfer, sched_ops, task
3/13/2014 3:20:37 AM | | Libraries: libcurl/7.25.0 OpenSSL/1.0.1 zlib/1.2.6
3/13/2014 3:20:37 AM | | Data directory: C:\ProgramData\BOINC
3/13/2014 3:20:37 AM | | Running under account Nick
3/13/2014 3:20:37 AM | | CUDA: NVIDIA GPU 0: GeForce GTX 580 (driver version 331.65, CUDA version 6.0, compute capability 2.0, 1536MB, 1424MB available, 1581 GFLOPS peak)
3/13/2014 3:20:37 AM | | OpenCL: NVIDIA GPU 0: GeForce GTX 580 (driver version 331.65, device version OpenCL 1.1 CUDA, 1536MB, 1424MB available, 1581 GFLOPS peak)
3/13/2014 3:20:37 AM | | OpenCL CPU: AMD FX(tm)-8150 Eight-Core Processor (OpenCL driver vendor: Advanced Micro Devices, Inc., driver version 2.0, device version OpenCL 1.1 AMD-APP-SDK-v2.4 (595.10))
3/13/2014 3:20:37 AM | SETI@home | Found app_info.xml; using anonymous platform
3/13/2014 3:20:37 AM | | app version refers to missing GPU type ATI
3/13/2014 3:20:37 AM | | (to change preferences, visit a project web site or select Preferences in the Manager)
ID: 1488333
David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1488340 - Posted: 13 Mar 2014, 14:00:42 UTC

Ready to send for MB is dragging on the ground. Creation rate is ~27/sec.

BTW, regarding the discussion of when the splitters stop once ready-to-send is high enough: I noticed that the Beta SSP lists a process called splitter throttle. I would think there's one here too, and it just doesn't appear on the SSP.
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

ID: 1488340
David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1488342 - Posted: 13 Mar 2014, 14:06:04 UTC - in response to Message 1488333.  

I keep getting the following messages and no work:
3/13/2014 3:20:36 AM | | cc_config.xml not found - using defaults
3/13/2014 3:20:37 AM | | Starting BOINC client version 7.2.42 for windows_x86_64
3/13/2014 3:20:37 AM | | log flags: file_xfer, sched_ops, task
3/13/2014 3:20:37 AM | | Libraries: libcurl/7.25.0 OpenSSL/1.0.1 zlib/1.2.6
3/13/2014 3:20:37 AM | | Data directory: C:\ProgramData\BOINC
3/13/2014 3:20:37 AM | | Running under account Nick
3/13/2014 3:20:37 AM | | CUDA: NVIDIA GPU 0: GeForce GTX 580 (driver version 331.65, CUDA version 6.0, compute capability 2.0, 1536MB, 1424MB available, 1581 GFLOPS peak)
3/13/2014 3:20:37 AM | | OpenCL: NVIDIA GPU 0: GeForce GTX 580 (driver version 331.65, device version OpenCL 1.1 CUDA, 1536MB, 1424MB available, 1581 GFLOPS peak)

3/13/2014 3:20:37 AM | | OpenCL CPU: AMD FX(tm)-8150 Eight-Core Processor (OpenCL driver vendor: Advanced Micro Devices, Inc., driver version 2.0, device version OpenCL 1.1 AMD-APP-SDK-v2.4 (595.10))
3/13/2014 3:20:37 AM | SETI@home | Found app_info.xml; using anonymous platform
3/13/2014 3:20:37 AM | | app version refers to missing GPU type ATI

3/13/2014 3:20:37 AM | | (to change preferences, visit a project web site or select Preferences in the Manager)

Looks to me like you have an nVidia GPU but installed the anonymous apps for ATI.

Am I interpreting this correctly, guys?
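
As a rough illustration of that mismatch, the Python sketch below compares the GPU types an app_info.xml asks for with the GPU types the client detected at startup. The XML fragment is a hypothetical, trimmed-down example written for this sketch (a real file lists executables and more fields), and `missing_gpu_types` is an invented helper name, not part of BOINC.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down app_info.xml fragment for illustration only.
APP_INFO = """
<app_info>
  <app><name>astropulse_v6</name></app>
  <app_version>
    <app_name>astropulse_v6</app_name>
    <version_num>607</version_num>
    <plan_class>opencl_ati_100</plan_class>
    <coproc><type>ATI</type><count>1</count></coproc>
  </app_version>
</app_info>
"""

def missing_gpu_types(app_info_xml, detected_gpu_types):
    """Return GPU types that app_info.xml requests but the host lacks."""
    root = ET.fromstring(app_info_xml)
    requested = {
        node.text.strip().upper()
        for node in root.findall("app_version/coproc/type")
        if node.text
    }
    detected = {gpu.upper() for gpu in detected_gpu_types}
    return sorted(requested - detected)

# The startup log above reports only an NVIDIA GPU (GeForce GTX 580).
print(missing_gpu_types(APP_INFO, ["NVIDIA"]))
```

Any type this returns is one the client would complain about, as in the "app version refers to missing GPU type ATI" line.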
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

ID: 1488342
David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1488344 - Posted: 13 Mar 2014, 14:08:56 UTC - in response to Message 1488343.  

3/13/2014 3:20:37 AM | SETI@home | Found app_info.xml; using anonymous platform
3/13/2014 3:20:37 AM | | app version refers to missing GPU type ATI


Your app_info.xml is set up to do work on an ATI(AMD) GPU and since you have no ATI(AMD) GPU, you're getting this error.

Reinstall Lunatics and uncheck ATI boxes and that'll get rid of that error. (Make sure you understand which boxes are checked and which boxes are unchecked during the installation.)

Edit: Yes N9JFE, you are correct.

Thanks. I guess we were both typing at the same time.
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

ID: 1488344
William
Volunteer tester
Joined: 14 Feb 13
Posts: 2037
Credit: 17,689,662
RAC: 0
Message 1488345 - Posted: 13 Mar 2014, 14:09:09 UTC

Miklos, do me a favour, start your own thread for your workfetch problems and run work_fetch_debug.

I might even explain how to run work_fetch_debug if there's a dedicated thread.
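
For reference, a minimal sketch of one common way to switch that flag on: work_fetch_debug is a log flag set in cc_config.xml in the BOINC data directory (the startup log earlier in this thread shows that file was not found, so defaults were in use). The Python below only builds the XML text; the helper name `build_cc_config` is made up for illustration.

```python
import xml.etree.ElementTree as ET

def build_cc_config(flags):
    """Build a cc_config.xml document that turns on the given log flags."""
    root = ET.Element("cc_config")
    log_flags = ET.SubElement(root, "log_flags")
    for name in flags:
        # A value of 1 enables the flag.
        ET.SubElement(log_flags, name).text = "1"
    return ET.tostring(root, encoding="unicode")

config_xml = build_cc_config(["work_fetch_debug"])
print(config_xml)
```

The printed text would be saved as cc_config.xml in the data directory (C:\ProgramData\BOINC in the log quoted above), after which the client needs to re-read its config files or be restarted. Expect a lot of extra output in the event log while the flag is on.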

Then again, maybe I'm just having a bad hair day.
A person who won't read has no advantage over one who can't read. (Mark Twain)
ID: 1488345
William
Volunteer tester
Joined: 14 Feb 13
Posts: 2037
Credit: 17,689,662
RAC: 0
Message 1488346 - Posted: 13 Mar 2014, 14:11:51 UTC - in response to Message 1488343.  

3/13/2014 3:20:37 AM | SETI@home | Found app_info.xml; using anonymous platform
3/13/2014 3:20:37 AM | | app version refers to missing GPU type ATI


Your app_info.xml is set up to do work on an ATI(AMD) GPU and since you have no ATI(AMD) GPU, you're getting this error.

Reinstall Lunatics and uncheck ATI boxes and that'll get rid of that error. (Make sure you understand which boxes are checked and which boxes are unchecked during the installation.)

Edit: Yes N9JFE, you are correct.


As Richard pointed out while we were doing the last installer, making things foolproof only breeds a better class of fool.

As it stands, I am going to expect a minimum amount of reading skills from somebody using the installer. Switching on the brain has been known to help too.

Yes, definitely a bad hair day. Might be that spammer at beta.
A person who won't read has no advantage over one who can't read. (Mark Twain)
ID: 1488346
arkayn
Volunteer tester
Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 1488362 - Posted: 13 Mar 2014, 15:10:18 UTC - in response to Message 1488346.  



Yes, definitely a bad hair day. Might be that spammer at beta.


You mean all 3 of them that showed up the same day, or were they all using the same IP address?

ID: 1488362
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1488386 - Posted: 13 Mar 2014, 16:10:53 UTC - in response to Message 1488346.  

Yes, definitely a bad hair day.

or ??
ID: 1488386
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1488484 - Posted: 13 Mar 2014, 18:27:54 UTC

Finally.

Thu Mar 13 14:14:52 2014 | SETI@home | Sending scheduler request: To fetch work.
Thu Mar 13 14:14:52 2014 | SETI@home | Requesting new tasks for CPU and ATI
Thu Mar 13 14:14:54 2014 | SETI@home | Scheduler request completed: got 0 new tasks
Thu Mar 13 14:14:54 2014 | SETI@home | Project has no tasks available
Thu Mar 13 14:25:01 2014 | SETI@home | Sending scheduler request: To fetch work.
Thu Mar 13 14:25:01 2014 | SETI@home | Requesting new tasks for CPU and ATI
Thu Mar 13 14:25:03 2014 | SETI@home | Scheduler request completed: got 5 new tasks
Thu Mar 13 14:25:05 2014 | SETI@home | Started download of ap_05ap13aa_B4_P0_00092_20140313_19222.wu
Thu Mar 13 14:25:05 2014 | SETI@home | Started download of ap_21se13aa_B5_P0_00160_20140216_02211.wu
Thu Mar 13 14:25:05 2014 | SETI@home | Started download of ap_02au13aa_B0_P0_00095_20140313_19108.wu
Thu Mar 13 14:25:05 2014 | SETI@home | Started download of ap_05ap13aa_B3_P0_00091_20140313_19146.wu
Thu Mar 13 14:25:05 2014 | SETI@home | Started download of ap_29mr13aa_B0_P0_00066_20140313_28552.wu

Just in time;
State: All (460) · In progress (32) · Validation pending (117) · Validation inconclusive (4) · Valid (306)
:-)
ID: 1488484
Filipe
Joined: 12 Aug 00
Posts: 218
Credit: 21,281,677
RAC: 20
Portugal
Message 1488530 - Posted: 13 Mar 2014, 19:28:46 UTC

I see it this way:

- There are almost 90,000 AP WUs waiting for assimilation.
- The splitters only produce new WUs as disk space is available.
- So, as the number of WUs waiting to be assimilated grows, less and less disk space is available, and the number of results out in the field steadily drops.


Would anyone like to comment?



Possible. I guess it would all depend on what is being stored where. But based purely on watching how things normally go, I doubt it. Big question is, are the files actually living on the servers that are doing the processing of a particular step?

I'd be more inclined to think it's a case of process priority.

After an outage, it's implied the scheduling server, in charge of moving results out to the field and receiving result reports, gets very busy. (see server descriptions on SSP) Since the AP Validator, AP Assimilators, the Scheduler processes and the Feeder all reside on the same physical server (Synergy) it's reasonable to assume that when Synergy gets busy it assigns higher priority to scheduling and feeding work, and cuts back on validation and assimilation as lower priority tasks. Sending results to the clients, receiving uploaded complete results and accepting reports of the uploads are all "real time" tasks that involve communication with our client software, where validation and assimilation can be done anytime. So establishing priority on that basis would only make sense.



Now that the validators/assimilators are clearing the backlog, the splitters are automatically sending new AP work out to volunteers.

Probably the lack of disk space issue.
ID: 1488530
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1488622 - Posted: 13 Mar 2014, 21:45:31 UTC

Well, that's a new one to me.

13/03/2014 21:12:56 | SETI@home | [sched_op] Starting scheduler request
13/03/2014 21:12:56 | SETI@home | Sending scheduler request: To fetch work.
13/03/2014 21:12:56 | SETI@home | Requesting new tasks for AMD/ATI GPU
13/03/2014 21:12:56 | SETI@home | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
13/03/2014 21:12:56 | SETI@home | [sched_op] AMD/ATI GPU work request: 125.00 seconds; 0.50 devices
13/03/2014 21:13:00 | SETI@home | Scheduler request completed: got 0 new tasks
13/03/2014 21:13:00 | SETI@home | [sched_op] Server version 703
13/03/2014 21:13:00 | SETI@home | No tasks sent
13/03/2014 21:13:00 | SETI@home | No tasks are available for SETI@home v7
13/03/2014 21:13:00 | SETI@home | No tasks are available for AstroPulse v6
13/03/2014 21:13:00 | SETI@home | Message from server: Your app_info.xml file doesn't have a usable version of AstroPulse v6.
13/03/2014 21:13:00 | SETI@home | Tasks for CPU are available, but your preferences are set to not accept them
13/03/2014 21:13:00 | SETI@home | Tasks for NVIDIA GPU are available, but your preferences are set to not accept them
13/03/2014 21:13:00 | SETI@home | Tasks for Intel GPU are available, but your preferences are set to not accept them
13/03/2014 21:13:00 | SETI@home | Project requested delay of 303 seconds
13/03/2014 21:13:00 | SETI@home | [sched_op] Deferring communication for 00:05:03
13/03/2014 21:13:00 | SETI@home | [sched_op] Reason: requested by project
The above was a slight oversight by me. I did have AP set in project prefs, but didn't have an app for my GPU in my app_info.xml; got that fixed now.

So then, what to think of this later one?

13/03/2014 22:32:49 | SETI@home | [sched_op] Starting scheduler request
13/03/2014 22:32:49 | SETI@home | Sending scheduler request: Requested by user.
13/03/2014 22:32:49 | SETI@home | Not requesting tasks: don't need (CPU: project preferences; AMD/ATI GPU: buffer full)
13/03/2014 22:32:49 | SETI@home | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
13/03/2014 22:32:49 | SETI@home | [sched_op] AMD/ATI GPU work request: 0.00 seconds; 0.00 devices


:-)
ID: 1488622
Richard Haselgrove (Project Donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 14690
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1488623 - Posted: 13 Mar 2014, 21:49:40 UTC - in response to Message 1488622.  

Well, that's a new one to me.

13/03/2014 21:12:56 | SETI@home | [sched_op] Starting scheduler request
13/03/2014 21:12:56 | SETI@home | Sending scheduler request: To fetch work.
13/03/2014 21:12:56 | SETI@home | Requesting new tasks for AMD/ATI GPU
13/03/2014 21:12:56 | SETI@home | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
13/03/2014 21:12:56 | SETI@home | [sched_op] AMD/ATI GPU work request: 125.00 seconds; 0.50 devices
13/03/2014 21:13:00 | SETI@home | Scheduler request completed: got 0 new tasks
13/03/2014 21:13:00 | SETI@home | [sched_op] Server version 703
13/03/2014 21:13:00 | SETI@home | No tasks sent
13/03/2014 21:13:00 | SETI@home | No tasks are available for SETI@home v7
13/03/2014 21:13:00 | SETI@home | No tasks are available for AstroPulse v6
13/03/2014 21:13:00 | SETI@home | Message from server: Your app_info.xml file doesn't have a usable version of AstroPulse v6.
13/03/2014 21:13:00 | SETI@home | Tasks for CPU are available, but your preferences are set to not accept them
13/03/2014 21:13:00 | SETI@home | Tasks for NVIDIA GPU are available, but your preferences are set to not accept them
13/03/2014 21:13:00 | SETI@home | Tasks for Intel GPU are available, but your preferences are set to not accept them
13/03/2014 21:13:00 | SETI@home | Project requested delay of 303 seconds
13/03/2014 21:13:00 | SETI@home | [sched_op] Deferring communication for 00:05:03
13/03/2014 21:13:00 | SETI@home | [sched_op] Reason: requested by project
The above was a slight oversight by me. I did have AP set in project prefs, but didn't have an app for my GPU in my app_info.xml; got that fixed now.

That's almost a full house - all you're missing is "this computer has reached a limit on tasks in progress".

Pick a reason, any reason. And never complain that BOINC doesn't give you enough information ;)
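
To make "pick a reason" a bit more mechanical, here is a small Python sketch that pulls the no-work explanations out of an event log excerpt. It assumes the "timestamp | project | message" layout seen in the logs in this thread; the marker substrings are taken from the lines quoted above and are not an exhaustive list of what the scheduler can say. `no_work_reasons` is an invented helper name.

```python
import re

# A few lines copied from the log quoted above.
LOG = """\
13/03/2014 21:13:00 | SETI@home | Scheduler request completed: got 0 new tasks
13/03/2014 21:13:00 | SETI@home | No tasks are available for SETI@home v7
13/03/2014 21:13:00 | SETI@home | No tasks are available for AstroPulse v6
13/03/2014 21:13:00 | SETI@home | Message from server: Your app_info.xml file doesn't have a usable version of AstroPulse v6.
13/03/2014 21:13:00 | SETI@home | Tasks for CPU are available, but your preferences are set to not accept them
13/03/2014 21:13:00 | SETI@home | Project requested delay of 303 seconds
"""

# Substrings that mark a line as an explanation for getting no work.
REASON_MARKERS = (
    "No tasks are available for",
    "doesn't have a usable version",
    "preferences are set to not accept them",
)

def no_work_reasons(log_text):
    """Return the message part of every line that explains a lack of work."""
    reasons = []
    for line in log_text.splitlines():
        # Split into timestamp, project, message; the project may be empty.
        parts = re.split(r"\s*\|\s*", line, maxsplit=2)
        if len(parts) != 3:
            continue
        message = parts[2].strip()
        if any(marker in message for marker in REASON_MARKERS):
            reasons.append(message)
    return reasons

for reason in no_work_reasons(LOG):
    print("-", reason)
```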
ID: 1488623
Jeff Buck (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1488713 - Posted: 14 Mar 2014, 4:33:49 UTC

Here's the sort of thing that'll hold down the RTS buffer if it happens very often. My T7400, 7057115, just blew through 37 tasks in a row from "tape" 27mr13aa that had perfectly legitimate -9 overflows (i.e., not a runaway rig). Total Run Time = 238.35 seconds! These tasks were shorties to begin with, but this was ridiculous. :^)

All 37 tasks had names starting with "27mr13aa.1310.10292.438086664203.12". (Two other tasks from the same tape, but from a different sequence, produced strange errors. That's a whole different issue, I think.)
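
As a quick sanity check on those figures, a two-line Python calculation of the average run time per task, using only the numbers given above:

```python
total_run_time_s = 238.35  # total run time for the batch, from the post
task_count = 37            # number of overflow tasks, from the post

average_s = total_run_time_s / task_count
print(f"{average_s:.2f} s per task")
```

That works out to roughly 6.4 seconds per task, so a host in this state asks for replacement work far faster than usual.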
ID: 1488713
Thomas
Volunteer tester
Joined: 9 Dec 11
Posts: 1499
Credit: 1,345,576
RAC: 0
France
Message 1490054 - Posted: 17 Mar 2014, 10:20:51 UTC
Last modified: 17 Mar 2014, 10:21:18 UTC

It seems that the transitioners are down (5/6) :(
ID: 1490054
Richard Haselgrove (Project Donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 14690
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1490201 - Posted: 17 Mar 2014, 18:52:18 UTC - in response to Message 1490192.  

Time for a little upload outage I think....

Not to mention a frozen SSP...
ID: 1490201
rob smith (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22881
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1490251 - Posted: 17 Mar 2014, 19:42:58 UTC

...and me updating a cruncher never helps :-(
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1490251
Dirk Sadowski
Volunteer tester
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1490255 - Posted: 17 Mar 2014, 19:45:06 UTC

FYI, I sent an email to David, Eric, Matt & Jeff because the SSP is frozen and uploads are not possible.
ID: 1490255
 
©2025 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.