Panic Mode On (101) Server Problems?

Message boards : Number crunching : Panic Mode On (101) Server Problems?

Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 11478
Credit: 167,293,895
RAC: 101,955
Australia
Message 1742145 - Posted: 14 Nov 2015, 21:48:26 UTC - in response to Message 1742142.  

Well, we never know where the limit is until it is crossed.

Would be nice to stress test the system by allowing 400 or more WUs per GPU cache.
Grant
Darwin NT
ID: 1742145 · Report as offensive
Brent Norman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 1 Dec 99
Posts: 2766
Credit: 519,757,459
RAC: 810,104
Canada
Message 1742150 - Posted: 14 Nov 2015, 22:02:48 UTC

I have a feeling they are doing stress tests to see what the databases can handle when adding a new telescope to the mix.
ID: 1742150 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 11478
Credit: 167,293,895
RAC: 101,955
Australia
Message 1742151 - Posted: 14 Nov 2015, 22:10:00 UTC - in response to Message 1742150.  

Here's an interesting WU.

3 different applications, 3 different results.
02ja11aa.12226.328091.8.12.206
Grant
Darwin NT
ID: 1742151 · Report as offensive
rob smith (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 17583
Credit: 396,280,169
RAC: 156,251
United Kingdom
Message 1742152 - Posted: 14 Nov 2015, 22:10:54 UTC

Matt said as much a couple of days back - take a look near the bottom of this post:
http://setiathome.berkeley.edu/forum_thread.php?id=78462&postid=1740982
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1742152 · Report as offensive
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2988
Credit: 11,869,884
RAC: 7,377
United States
Message 1742189 - Posted: 15 Nov 2015, 1:49:28 UTC - in response to Message 1742151.  

Here's an interesting WU.

3 different applications, 3 different results.
02ja11aa.12226.328091.8.12.206

It will be interesting to see if any of those three end up being invalid.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1742189 · Report as offensive
Donald L. Johnson
Joined: 5 Aug 02
Posts: 8237
Credit: 12,165,924
RAC: 11,614
United States
Message 1742219 - Posted: 15 Nov 2015, 7:02:45 UTC - in response to Message 1742189.  

Here's an interesting WU.

3 different applications, 3 different results.
02ja11aa.12226.328091.8.12.206

It will be interesting to see if any of those three end up being invalid.

Looking at the results tables in each stderr.txt, I would guess that _0, from the cuda42 app, will be invalid because it found a large number of signals that the others did not. If either of the other two results is validated, _0 will not pass the 50% match test for "weakly similar" validation.
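
To make the 50% idea concrete, here is a rough sketch of how a "weakly similar" test could work. It is not the project's validator: the `Signal` fields, the 1% tolerance, and the function names are all made up for illustration, and the only thing taken from this thread is the 50% threshold.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    # Hypothetical fields; the real result files carry more detail.
    kind: str        # e.g. "spike", "pulse"
    freq_hz: float
    power: float


def signals_match(a, b, rel_tol=0.01):
    """Two signals match if they have the same kind and close values.
    The 1% tolerance is an assumption, not the project's figure."""
    return (a.kind == b.kind
            and math.isclose(a.freq_hz, b.freq_hz, rel_tol=rel_tol)
            and math.isclose(a.power, b.power, rel_tol=rel_tol))


def match_fraction(result, reference):
    """Fraction of signals in `result` that have a partner in `reference`.
    Each reference signal can be used only once."""
    if not result:
        return 1.0
    unused = list(reference)
    matched = 0
    for sig in result:
        for i, ref in enumerate(unused):
            if signals_match(sig, ref):
                matched += 1
                del unused[i]
                break
    return matched / len(result)


def weakly_similar(result, reference, threshold=0.5):
    """Require at least `threshold` of the signals to match in both directions."""
    return (match_fraction(result, reference) >= threshold
            and match_fraction(reference, result) >= threshold)
```

Under these assumptions, a result that reports the same two signals as its wingman plus eight extra ones scores 2/10 and fails, which is the situation I expect for _0.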
Donald
Infernal Optimist / Submariner, retired
ID: 1742219 · Report as offensive
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2988
Credit: 11,869,884
RAC: 7,377
United States
Message 1742336 - Posted: 15 Nov 2015, 19:13:10 UTC - in response to Message 1742219.  

Here's an interesting WU.

3 different applications, 3 different results.
02ja11aa.12226.328091.8.12.206

It will be interesting to see if any of those three end up being invalid.

Looking at the results tables in each stderr.txt, I would guess that _0, from the cuda42 app, will be invalid because it found a large number of signals that the others did not. If either of the other two results is validated, _0 will not pass the 50% match test for "weakly similar" validation.

And the verdict is: _0 is invalid; _1, _2, and _3 are valid.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1742336 · Report as offensive
JaundicedEye
Joined: 14 Mar 12
Posts: 4826
Credit: 30,725,064
RAC: 448
United States
Message 1742880 - Posted: 17 Nov 2015, 15:39:25 UTC

And here's a real waste of time and electricity.

http://setiathome.berkeley.edu/workunit.php?wuid=1954343880

The saddest part is it appears to be re-sending again! Shouldn't there be a check in the programming to segregate WUs that have 3 consecutive invalidations, let alone 7?

Again, such a waste.

"Sour Grapes make a bitter Whine." <(0)>
ID: 1742880 · Report as offensive
WezH
Volunteer tester

Joined: 19 Aug 99
Posts: 571
Credit: 66,124,963
RAC: 0
Finland
Message 1742887 - Posted: 17 Nov 2015, 15:51:17 UTC - in response to Message 1742880.  

And here's a real waste of time and electricity.

http://setiathome.berkeley.edu/workunit.php?wuid=1954343880

The saddest part is it appears to be re-sending again! Shouldn't there be a check in the programming to segregate WUs that have 3 consecutive invalidations, let alone 7?

Again, such a waste.


That WU is from the time when the splitter code was altered and went wrong.

Yes, it's a waste of our time and electricity.

But the S@H staff have more important jobs to do than writing a new program (and spending their time) to find those bad WUs.

We just have to wait until the WU reaches 10 resends; after that it is marked as invalid.
ID: 1742887 · Report as offensive
Jimbocous (Project Donor)
Volunteer tester
Joined: 1 Apr 13
Posts: 1366
Credit: 154,339,631
RAC: 249,741
United States
Message 1742937 - Posted: 18 Nov 2015, 0:12:40 UTC - in response to Message 1742880.  
Last modified: 18 Nov 2015, 0:14:27 UTC

And here's a real waste of time and electricity.

http://setiathome.berkeley.edu/workunit.php?wuid=1954343880

The saddest part is it appears to be re-sending again! Shouldn't there be a check in the programming to segregate WUs that have 3 consecutive invalidations, let alone 7?

Again, such a waste.

Actually, there is a check. It's set at 10.

max # of error/total/success tasks 5, 10, 5

These days, that may be a bit too liberal. I've been taking a quick scan through and aborting anything 4 and higher. By now, I think most are cycled through. Here's hoping...
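
For anyone wondering how those three numbers might be applied, here is a minimal sketch. It only uses the 5, 10, 5 values from the workunit page; the class name, the function name, and the choice to treat reaching a limit (rather than exceeding it) as the cutoff are assumptions, and the real server logic may compare differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkunitLimits:
    # Values quoted from the workunit page: error / total / success.
    max_error: int = 5
    max_total: int = 10
    max_success: int = 5


def limit_reached(n_error, n_total, n_success, limits=WorkunitLimits()):
    """Return which limit has been reached ("error", "total", "success"),
    or None if the workunit may still be resent.
    Using >= for the comparison is an assumption made for this sketch."""
    if n_error >= limits.max_error:
        return "error"
    if n_total >= limits.max_total:
        return "total"
    if n_success >= limits.max_success:
        return "success"
    return None
```

With 7 tasks issued and neither of the other limits hit, `limit_reached(0, 7, 0)` returns `None`, so under these assumptions the workunit keeps going out until the total reaches 10.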

Had an even weirder one the other day. Not sure if it was a result of the splitter error or something else, but it was estimated at a quarter million hours to complete!! Sheesh. Caught it six-plus hours into work, with less than 1/1000th of a percent complete. Nuked that sucker in a hurry, for sure ...
ID: 1742937 · Report as offensive
KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 3187
Credit: 56,827,781
RAC: 17,441
United States
Message 1742948 - Posted: 18 Nov 2015, 0:47:10 UTC

Did someone forget to bring the Beta Scheduler online after today's outrage?... all the other functions seem to be online, from just B4 I wrote this...
.

Hello, from Albany, CA!...
ID: 1742948 · Report as offensive
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,385,710
RAC: 399
United Kingdom
Message 1742950 - Posted: 18 Nov 2015, 0:52:16 UTC - in response to Message 1742948.  

Did someone forget to bring the Beta Scheduler online after today's outrage?... all the other functions seem to be online, from just B4 I wrote this...

It's not unusual, but those in the know may wonder if it's for some other reason. We'll see. ;-)

Claggy
ID: 1742950 · Report as offensive
Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 956,724
RAC: 122
United States
Message 1742980 - Posted: 18 Nov 2015, 2:35:08 UTC

The thing we tried last week (science database updates in advance of Green Bank data splitting) that didn't quite work? Well, we're doing it again this week, this time with hopefully more success. The beta project and other science database related stuff is offline until this is finished. We may well run out of workunits, but we shall see...

- Matt
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 1742980 · Report as offensive
Jimbocous (Project Donor)
Volunteer tester
Joined: 1 Apr 13
Posts: 1366
Credit: 154,339,631
RAC: 249,741
United States
Message 1743001 - Posted: 18 Nov 2015, 3:26:13 UTC
Last modified: 18 Nov 2015, 3:26:24 UTC

So, not looking so happy over on the SSP. RTS Cache down to 100k, no splitters running ...
ID: 1743001 · Report as offensive
Gary Charpentier (Crowdfunding Project Donor, Special Project $250 donor)
Volunteer tester
Joined: 25 Dec 00
Posts: 25391
Credit: 48,638,892
RAC: 25,638
United States
Message 1743018 - Posted: 18 Nov 2015, 5:03:40 UTC - in response to Message 1742980.  

The thing we tried last week (science database updates in advance of Green Bank data splitting) that didn't quite work? Well, we're doing it again this week, this time with hopefully more success. The beta project and other science database related stuff is offline until this is finished. We may well run out of workunits, but we shall see...

- Matt

Hope this time everything checks out fine. Yes, Ready To Send is a big fat ZERO.
ID: 1743018 · Report as offensive
rob smith (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 17583
Credit: 396,280,169
RAC: 156,251
United Kingdom
Message 1743046 - Posted: 18 Nov 2015, 6:05:16 UTC

All splitters off-line, so only the odd resend coming out.
And about an hour or two ago reporting stalled, and the pile of "ready to reports" is growing....
All is not happy in SETIServerLand
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1743046 · Report as offensive
William
Volunteer tester
Joined: 14 Feb 13
Posts: 2037
Credit: 17,689,662
RAC: 0
Message 1743086 - Posted: 18 Nov 2015, 10:25:50 UTC - in response to Message 1743046.  

All splitters off-line, so only the odd resend coming out.
And about an hour or two ago reporting stalled, and the pile of "ready to reports" is growing....
All is not happy in SETIServerLand

There are still 95k results received in the last hour.

Time for a cup of tea.
A person who won't read has no advantage over one who can't read. (Mark Twain)
ID: 1743086 · Report as offensive
Kibble (KB7TIB)
Joined: 6 Dec 99
Posts: 27
Credit: 9,868,860
RAC: 964
United States
Message 1743131 - Posted: 18 Nov 2015, 16:32:34 UTC - in response to Message 1743086.  

Well, it's a blue day. Two of my three machines have work units. One ran out sometime last night. :-(

Guess it's time to visit the Einstein site and see if they have any WUs available.
ID: 1743131 · Report as offensive
WezH
Volunteer tester

Joined: 19 Aug 99
Posts: 571
Credit: 66,124,963
RAC: 0
Finland
Message 1743135 - Posted: 18 Nov 2015, 16:57:19 UTC
Last modified: 18 Nov 2015, 17:09:07 UTC

Well, the only bright side is that those bad WUs are flushed out much faster: I just aborted 10 of them and 4 left the system.

Of course, 6 of them were resent to new hosts, but at least they are not waiting about 6 hours (RTS 1M cache) to be resent.

EDIT:

And the MB splitters are online.
ID: 1743135 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 4745
Credit: 509,438,923
RAC: 1,232,526
United States
Message 1743137 - Posted: 18 Nov 2015, 17:06:23 UTC

One of my machines just got 36 new tasks. They appear normal, with no short run-time estimates like the bad batch from last time. Hopefully all will be well.
ID: 1743137 · Report as offensive
 