Panic Mode On (94) Server Problems?

Profile JaundicedEye
Joined: 14 Mar 12
Posts: 5375
Credit: 30,870,693
RAC: 1
United States
Message 1626938 - Posted: 13 Jan 2015, 0:38:16 UTC



"Sour Grapes make a bitter Whine." <(0)>
ID: 1626938
David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1626943 - Posted: 13 Jan 2015, 1:11:24 UTC


David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

ID: 1626943
Profile betreger Project Donor
Joined: 29 Jun 99
Posts: 11361
Credit: 29,581,041
RAC: 66
United States
Message 1626947 - Posted: 13 Jan 2015, 1:26:22 UTC

Panic is called for: Einstein is borked and I can't get enough SETI as a backup on that box. Truly a cause for PANIC.
ID: 1626947
Profile JanniCash
Joined: 17 Nov 03
Posts: 57
Credit: 1,276,920
RAC: 0
United States
Message 1626949 - Posted: 13 Jan 2015, 1:45:11 UTC

ID: 1626949
Cruncher-American
Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 1626950 - Posted: 13 Jan 2015, 1:47:30 UTC
Last modified: 13 Jan 2015, 1:52:56 UTC

Progress! (?)

The machine that couldn't update finally did so a few seconds ago when I clicked the Update button (it reported 171 tasks), but no tasks are available....

Where have I heard that song before?

EDIT And that forced my Pendings well over 2000 (2118, to be exact) for the first time in living memory. /EDIT
ID: 1626950
Cruncher-American
Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 1626962 - Posted: 13 Jan 2015, 2:58:25 UTC

Things got straightened out starting about 9 pm EST (2 am UTC): downloaded 86 WUs, then 68, then more.

Now back to 300, the correct amount for 1 CPU + 2 GPUs. Yay!!!

And thanks to whoever did the work!
ID: 1626962
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1643
Credit: 12,921,799
RAC: 89
New Zealand
Message 1626998 - Posted: 13 Jan 2015, 5:20:02 UTC

As I write this, there are over 2 million MB results waiting to be purged. When that's done, I'm sure it will free up a little bit of space.
ID: 1626998
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1627207 - Posted: 13 Jan 2015, 15:53:29 UTC

Now what?
Everything was looking good, I was up to a whole 23 APs on my Mac (my Windows machines have Hundreds), and suddenly all the APs are gone from the SSP. Splitters disabled... Are they about to finally Fix the APs during this Outage?

One can only Hope.
ID: 1627207
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1627509 - Posted: 14 Jan 2015, 14:57:37 UTC
Last modified: 14 Jan 2015, 15:33:31 UTC

Oh well. There doesn't appear to be any change in the APs. The creation rate is still in the basement even when there are files to split. On the other front, there is still the issue where the server will send CPU APs even if the GPUs are Out of work. A recent example, where there were 2 CPU spots open and the server decided to fill those 2 spots even though a GPU was Out of work and there were Days of CPU work already cached:
Wed Jan 14 09:37:34 2015 | SETI@home | [sched_op] Starting scheduler request
Wed Jan 14 09:37:34 2015 | SETI@home | Sending scheduler request: To fetch work.
Wed Jan 14 09:37:34 2015 | SETI@home | Requesting new tasks for CPU and ATI
Wed Jan 14 09:37:34 2015 | SETI@home | [sched_op] CPU work request: 79474.48 seconds; 0.00 devices
Wed Jan 14 09:37:34 2015 | SETI@home | [sched_op] ATI work request: 1177312.35 seconds; 1.00 devices
Wed Jan 14 09:37:36 2015 | SETI@home | Scheduler request completed: got 1 new tasks
Wed Jan 14 09:37:36 2015 | SETI@home | [sched_op] estimated total CPU task duration: 36280 seconds
Wed Jan 14 09:37:36 2015 | SETI@home | [sched_op] estimated total ATI task duration: 0 seconds
Wed Jan 14 09:42:42 2015 | SETI@home | [sched_op] Starting scheduler request
Wed Jan 14 09:42:42 2015 | SETI@home | Sending scheduler request: To fetch work.
Wed Jan 14 09:42:42 2015 | SETI@home | Requesting new tasks for CPU and ATI
Wed Jan 14 09:42:42 2015 | SETI@home | [sched_op] CPU work request: 43953.40 seconds; 0.00 devices
Wed Jan 14 09:42:42 2015 | SETI@home | [sched_op] ATI work request: 1177960.66 seconds; 1.00 devices
Wed Jan 14 09:42:44 2015 | SETI@home | Scheduler request completed: got 2 new tasks
Wed Jan 14 09:42:44 2015 | SETI@home | [sched_op] estimated total CPU task duration: 72558 seconds
Wed Jan 14 09:42:44 2015 | SETI@home | [sched_op] estimated total ATI task duration: 0 seconds

If I raise the cache another half a day, the server will quickly send another CPU task to fill that spot even while the GPUs are Out of work.

Wed Jan 14 09:54:59 2015 | SETI@home | [sched_op] Starting scheduler request
Wed Jan 14 09:54:59 2015 | SETI@home | Sending scheduler request: To fetch work.
Wed Jan 14 09:54:59 2015 | SETI@home | Reporting 2 completed tasks
Wed Jan 14 09:54:59 2015 | SETI@home | Requesting new tasks for CPU and ATI
Wed Jan 14 09:54:59 2015 | SETI@home | [sched_op] CPU work request: 4741.31 seconds; 0.00 devices
Wed Jan 14 09:54:59 2015 | SETI@home | [sched_op] ATI work request: 1179360.00 seconds; 3.00 devices
Wed Jan 14 09:55:00 2015 | SETI@home | Scheduler request completed: got 0 new tasks
Wed Jan 14 09:55:00 2015 | SETI@home | No tasks sent
Wed Jan 14 09:55:00 2015 | SETI@home | No tasks are available for AstroPulse v7
Wed Jan 14 10:05:36 2015 | SETI@home | update requested by user
Wed Jan 14 10:05:39 2015 | SETI@home | [sched_op] Starting scheduler request
Wed Jan 14 10:05:39 2015 | SETI@home | Sending scheduler request: Requested by user.
Wed Jan 14 10:05:39 2015 | SETI@home | Requesting new tasks for CPU and ATI
Wed Jan 14 10:05:39 2015 | SETI@home | [sched_op] CPU work request: 61260.20 seconds; 0.00 devices
Wed Jan 14 10:05:39 2015 | SETI@home | [sched_op] ATI work request: 1308960.00 seconds; 3.00 devices
Wed Jan 14 10:05:40 2015 | SETI@home | Scheduler request completed: got 2 new tasks
Wed Jan 14 10:05:40 2015 | SETI@home | [sched_op] estimated total CPU task duration: 72552 seconds
Wed Jan 14 10:05:40 2015 | SETI@home | [sched_op] estimated total ATI task duration: 0 seconds
Wed Jan 14 10:10:47 2015 | SETI@home | Sending scheduler request: To fetch work.
Wed Jan 14 10:10:47 2015 | SETI@home | Requesting new tasks for CPU and ATI
Wed Jan 14 10:10:47 2015 | SETI@home | [sched_op] CPU work request: 12837.28 seconds; 0.00 devices
Wed Jan 14 10:10:47 2015 | SETI@home | [sched_op] ATI work request: 1308960.00 seconds; 3.00 devices
Wed Jan 14 10:10:48 2015 | SETI@home | Scheduler request completed: got 0 new tasks
Wed Jan 14 10:32:17 2015 | SETI@home | [sched_op] Starting scheduler request
Wed Jan 14 10:32:17 2015 | SETI@home | Sending scheduler request: To fetch work.
Wed Jan 14 10:32:17 2015 | SETI@home | Requesting new tasks for CPU and ATI
Wed Jan 14 10:32:17 2015 | SETI@home | [sched_op] CPU work request: 78845.72 seconds; 0.00 devices
Wed Jan 14 10:32:17 2015 | SETI@home | [sched_op] ATI work request: 1438560.00 seconds; 3.00 devices
Wed Jan 14 10:32:18 2015 | SETI@home | Scheduler request completed: got 1 new tasks
Wed Jan 14 10:32:18 2015 | SETI@home | [sched_op] estimated total CPU task duration: 36273 seconds
Wed Jan 14 10:32:18 2015 | SETI@home | [sched_op] estimated total ATI task duration: 0 seconds

There are over 4 days of CPU work, the 3 GPUs are Out of work, so the server sends CPU work.
You can't make this stuff up...
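
As a side note, here's a minimal Python sketch for spotting this pattern in a client event log. It assumes <sched_op_debug> is enabled and that the line formats match the excerpt above, so treat it as a rough starting point rather than a polished tool; feed it the event log (e.g. stdoutdae.txt) on stdin.

import re
import sys

# Rough sketch: flag scheduler replies where GPU (ATI) work was requested
# but only CPU work was estimated in the reply. Line formats are taken from
# the [sched_op] excerpt above; adjust the patterns if your log differs.
request = {}   # seconds requested per resource in the current RPC
received = {}  # estimated seconds of work received per resource

for line in sys.stdin:
    m = re.search(r'\[sched_op\] (CPU|ATI) work request: ([\d.]+) seconds', line)
    if m:
        request[m.group(1)] = float(m.group(2))
        continue
    m = re.search(r'\[sched_op\] estimated total (CPU|ATI) task duration: (\d+) seconds', line)
    if m:
        received[m.group(1)] = int(m.group(2))
        # In the excerpt the ATI line is the last [sched_op] line of a reply,
        # so treat it as the end of one request/reply pair.
        if m.group(1) == 'ATI':
            if request.get('ATI', 0) > 0 and received.get('ATI', 0) == 0 \
                    and received.get('CPU', 0) > 0:
                print('GPU asked for %.0f s but got nothing; CPU got %d s'
                      % (request['ATI'], received['CPU']))
            request, received = {}, {}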
ID: 1627509
David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1627566 - Posted: 14 Jan 2015, 18:31:15 UTC - in response to Message 1627509.  

Oh well. There doesn't appear to be any change in the APs. The creation rate is still in the basement even when there are files to split. On the other front, there is still the issue where the server will send CPU APs even if the GPUs are Out of work.

When I checked the SSP about an hour ago, there were 2 AP splitters munching away and about 300 ready to send. Now all the AP tapes are done (although the 2 splitters still show running) and RTS is 0.
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

ID: 1627566
Profile JaundicedEye
Joined: 14 Mar 12
Posts: 5375
Credit: 30,870,693
RAC: 1
United States
Message 1627574 - Posted: 14 Jan 2015, 18:41:42 UTC

Again.................




"Sour Grapes make a bitter Whine." <(0)>
ID: 1627574
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1627588 - Posted: 14 Jan 2015, 19:00:47 UTC - in response to Message 1627566.  

Oh well. There doesn't appear to be any change in the APs. The creation rate is still in the basement even when there are files to split. On the other front, there is still the issue where the server will send CPU APs even if the GPUs are Out of work.

When I checked the SSP about an hour ago, there were 2 AP splitters munching away and about 300 ready to send. Now all the AP tapes are done (although the 2 splitters still show running) and RTS is 0.

Depending on how full the temp WU storage area is at the moment, we may or may not get more soon.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
ID: 1627588
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1627766 - Posted: 15 Jan 2015, 2:54:26 UTC - in response to Message 1627588.  

Depending on how full the temp WU storage area is at the moment, we may or may not get more soon.

I'm going to take another stab at estimating (roughly) how much disk space is being consumed by WUs.

AP:
- WUs waiting for assimilation: ~750k = ~5.7 TiB
- Results returned and awaiting validation: ~1.7M / 3 (assuming an average of 3 results per WU) = ~4.3 TiB
- Results out in the field: ~120k / 3 (same assumption) = ~312 GiB

AP subtotal: ~10.3 TiB

MB:
- Results returned and awaiting validation: ~3M / 3 = ~340 GiB
- Results out in the field: ~3.6M / 3 = ~410 GiB
- WUs waiting for DB purging: ~1.3M = ~450 GiB

MB subtotal: ~1.2 TiB

So in total, [quite] roughly, there are ~11.5 TiB of WUs on disk presently. There's probably a +/- of 500 GiB in my [very] rough numbers, but I think it's good enough for a ballpark figure.
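
Here is the same arithmetic as a small Python sketch. The per-workunit sizes are assumptions (roughly 8 MiB per AP WU and roughly 366 KiB per MB WU, which is what the figures above imply), so adjust them if the real file sizes differ.

# Back-of-the-envelope sketch of the estimate above. The per-WU sizes are
# assumptions (~8 MiB per AP workunit, ~366 KiB per MB workunit); result
# counts are divided by an assumed average of 3 results per workunit, since
# only one copy of each WU file lives on disk.
RESULTS_PER_WU = 3
AP_WU_MIB = 8        # assumed AstroPulse workunit size
MB_WU_KIB = 366      # assumed Multibeam workunit size

ap_wus = 750_000 + 1_700_000 / RESULTS_PER_WU + 120_000 / RESULTS_PER_WU
mb_wus = 3_000_000 / RESULTS_PER_WU + 3_600_000 / RESULTS_PER_WU + 1_300_000

ap_tib = ap_wus * AP_WU_MIB / 1024**2
mb_tib = mb_wus * MB_WU_KIB / 1024**3

print("AP ~%.2f TiB, MB ~%.2f TiB, total ~%.2f TiB"
      % (ap_tib, mb_tib, ap_tib + mb_tib))
# prints: AP ~10.35 TiB, MB ~1.19 TiB, total ~11.54 TiB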



Last week, the problem was that the storage area filled up (I estimated it to be at 8 TiB then) and more space got added to that volume of the array, but it is unknown how much was added. If it was 4 TiB, then we're nearing that limit again. If it was 8 TiB, then we probably still have about a week until it becomes an issue again.

I really hope the AP database finishes getting massaged and mended so this mess can start being cleaned up. MB doesn't take up much space on disk at all... but AP is a massive byte hog.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving up)
ID: 1627766
Juha
Volunteer tester
Joined: 7 Mar 04
Posts: 388
Credit: 1,857,738
RAC: 0
Finland
Message 1628071 - Posted: 15 Jan 2015, 18:49:23 UTC - in response to Message 1627766.  

assuming an average of 3 results per WU

For Multibeam:

Workunits waiting for db purging        1,011,068
Results waiting for db purging          2,142,904


That gives about 2.12 results per workunit. The ratio is probably somewhat inflated by the bad batch from the rogue splitter. The Astropulse ratio usually is/has been higher than that, but still well under 3. (The current numbers on the SSP make no sense.)
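
As a quick check of that figure (and of how it bends the earlier disk estimate), here's a tiny Python sketch using the purge-queue counts quoted above:

# Results-per-workunit ratio from the MB purge-queue counts above.
# A ratio below the assumed 3 means "results / 3" slightly understates the
# number of distinct WU files on disk, so the space estimate is a bit low.
results = 2_142_904
workunits = 1_011_068
print("%.2f results per workunit" % (results / workunits))  # prints 2.12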
ID: 1628071
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1628280 - Posted: 16 Jan 2015, 5:43:55 UTC

Well... considering the splitters are supposedly running and there are tapes to split, yet the creation rate is near zero... I'm going to assume the WU storage filled up again.

Based on my ~11.5 TiB math, and the fact that splitting has screeched to a halt, I'm going to assume (again) that 4 TiB was added to the original 8 and the storage space is once again full.
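
As rough arithmetic (the 8 TiB baseline and the 4 TiB expansion are still just my guesses):

# Headroom check under the same guesses: ~8 TiB original volume plus an
# assumed 4 TiB expansion, against the ~11.5 TiB WU footprint estimated above.
volume_tib = 8 + 4
footprint_tib = 11.5
print("~%.1f TiB of headroom left" % (volume_tib - footprint_tib))  # ~0.5 TiB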
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving up)
ID: 1628280
Profile Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1853
Credit: 268,616,081
RAC: 1,349
United States
Message 1628582 - Posted: 16 Jan 2015, 20:22:26 UTC

Pretty screwy numbers on the SSP at this point, so something must have gone sideways. I'm not buying that the splitters somehow generated 3.6M MBs ready to send overnight, and definitely not 100k APs. It's as if everything that was out in the field got lost track of and is suddenly back waiting to send.
ID: 1628582
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1628586 - Posted: 16 Jan 2015, 20:30:28 UTC - in response to Message 1628582.  

Pretty screwy numbers on the SSP at this point, so something must have gone sideways. I'm not buying that the splitters somehow generated 3.6M MBs ready to send overnight, and definitely not 100k APs. It's as if everything that was out in the field got lost track of and is suddenly back waiting to send.

Of course the splitters didn't do that overnight. They did it in the past hour. Along with the 90k AP.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
ID: 1628586
Profile Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1853
Credit: 268,616,081
RAC: 1,349
United States
Message 1628615 - Posted: 16 Jan 2015, 21:12:35 UTC - in response to Message 1628586.  

Of course the splitters didn't do that overnight. They did it in the past hour. Along with the 90k AP.

Fortunately, it seems it shook its little head, resolved the issue and now looks normal. But, it looks as though the AP splitters are stuck. At least, they appear to be at the same point on the same files that they were 12 hours ago. Oh well ...
ID: 1628615
Profile Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1853
Credit: 268,616,081
RAC: 1,349
United States
Message 1628787 - Posted: 17 Jan 2015, 2:10:49 UTC - in response to Message 1628615.  

Of course the splitters didn't do that overnight. They did it in the past hour. Along with the 90k AP.

Fortunately, it seems it shook its little head, resolved the issue and now looks normal. But, it looks as though the AP splitters are stuck. At least, they appear to be at the same point on the same files that they were 12 hours ago. Oh well ...

And both then promptly died. Oh well ...
ID: 1628787
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1628820 - Posted: 17 Jan 2015, 4:05:08 UTC - in response to Message 1628787.  
Last modified: 17 Jan 2015, 4:05:20 UTC

Of course the splitters didn't do that overnight. They did it in the past hour. Along with the 90k AP.

Fortunately, it seems it shook its little head, resolved the issue and now looks normal. But, it looks as though the AP splitters are stuck. At least, they appear to be at the same point on the same files that they were 12 hours ago. Oh well ...

And both then promptly died. Oh well ...


Add to that the result of my last scheduler request:
17/01/2015 13:32:22 | SETI@home | Scheduler request failed: HTTP service unavailable
Grant
Darwin NT
ID: 1628820