The Server Issues / Outages Thread - Panic Mode On! (119)

TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 2043381 - Posted: 6 Apr 2020, 20:40:09 UTC - in response to Message 2043375.  

All I had to do was try a different machine. Apparently there was something strange about the older Hack; it works fine in the newer Hack (https://setiathome.berkeley.edu/show_host_detail.php?hostid=6796479). Now to see if Einstein's Mac AMD App is better than their NV Mac App; the NV version is very slow on a Mac.
ID: 2043381
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13731
Credit: 208,696,464
RAC: 304
Australia
Message 2043402 - Posted: 6 Apr 2020, 22:26:47 UTC
Last modified: 6 Apr 2020, 22:27:48 UTC

Oldest Pending (of 1716) Jan 1, resend due April 17
Oldest Inconclusive (of 304) Jan 10, resend due April 21

Trying to find my oldest Valid resulted in this error-
Database Error
Database Error
Warning: Invalid argument supplied for foreach() in /disks/carolyn/b/home/boincadm/projects/sah/html/inc/result.inc on line 757 Database Error
Warning: Invalid argument supplied for foreach() in /disks/carolyn/b/home/boincadm/projects/sah/html/inc/result.inc on line 766

And then I got lucky.
Oldest Valid (of 8085) Jan 17
blc14_2bit_guppi_58692_10223_HIP84166_0144.14724.0.21.44.225.vlar
It's completed and Validated, but there is an In progress WU there that is not due back until June 16. Although the host did make contact yesterday, it's got 190 In progress with a 6-day turnaround.

One from Jan 30 with 2 In progress Tasks with 7+ day turnarounds.



Quite a large number of my Valids still have Tasks In progress against them, so it would seem the Assimilation backlog is due to Tasks still in progress holding up the Assimilation of work that has already been Validated.
It might have been Validated, but it can't be Assimilated, deleted and purged until all Tasks have been returned or timed out, which would explain why, even during outages and periods of no splitter output, the Assimilator backlog remained pretty much unchanged.




But that final blast of work from March 31, it will be 23 May till many will be resent. Some later still (mid June).

And I can't believe the number of them that were sent out on March 31, and the last contact from the host to download them was March 31.
Grant
Darwin NT
ID: 2043402
Stephen "Heretic" (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2043412 - Posted: 6 Apr 2020, 22:51:07 UTC - in response to Message 2043402.  

Oldest Pending (of 1716) Jan 1, resend due April 17
Oldest Inconclusive (of 304) Jan 10, resend due April 21
But that final blast of work from March 31, it will be 23 May till many will be resent. Some later still (mid June).
And I can't believe the number of them that were sent out on March 31, and the last contact from the host to download them was March 31.


. . Some people obviously never found the NNT button ...

Stephen

:(
ID: 2043412
AllgoodGuy

Joined: 29 May 01
Posts: 293
Credit: 16,348,499
RAC: 266
United States
Message 2043439 - Posted: 7 Apr 2020, 2:24:00 UTC - in response to Message 2043412.  

Oldest Pending (of 1716) Jan 1, resend due April 17
Oldest Inconclusive (of 304) Jan 10, resend due April 21
But that final blast of work from March 31, it will be 23 May till many will be resent. Some later still (mid June).
And I can't believe the number of them that were sent out on March 31, and the last contact from the host to download them was March 31.


. . Some people obviously never found the NNT button ...

Stephen

:(

Just means we'll have to wait a while longer.
ID: 2043439
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1643
Credit: 12,921,799
RAC: 89
New Zealand
Message 2043442 - Posted: 7 Apr 2020, 2:29:16 UTC

In other news, it looks like the replica database is catching up, by roughly 20,000 seconds or 5.5 hours. I am sure that when I looked at it this morning it was 450,000 seconds; it is now down to 430,000 seconds. It is currently 4.98 days behind.
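
For anyone who wants to check the arithmetic on those replica figures, here is a minimal Python sketch. The function name `lag_summary` and its output keys are made up for illustration; the only inputs are the two lag readings quoted above (450,000 and 430,000 seconds).

```python
SECONDS_PER_HOUR = 3600
SECONDS_PER_DAY = 86_400


def lag_summary(previous_s: int, current_s: int) -> dict:
    """Convert two replica-lag readings (in seconds) into friendlier units.

    Hypothetical helper, not part of any BOINC or SETI@home tool.
    """
    recovered_s = previous_s - current_s
    return {
        # How much lag was made up between the two readings, in hours.
        "recovered_hours": round(recovered_s / SECONDS_PER_HOUR, 2),
        # How far behind the replica still is, in days.
        "current_days": round(current_s / SECONDS_PER_DAY, 2),
    }


summary = lag_summary(450_000, 430_000)
print(summary)  # {'recovered_hours': 5.56, 'current_days': 4.98}
```

So 20,000 seconds is a little over 5.5 hours, and 430,000 seconds matches the 4.98 days shown on the status page.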
ID: 2043442
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13731
Credit: 208,696,464
RAC: 304
Australia
Message 2043445 - Posted: 7 Apr 2020, 2:37:00 UTC - in response to Message 2043442.  

In other news it looks like the replica database is catching up by roughly 20,000 seconds or 5.5 hours. I am sure when I looked at it this morning was 450,000 it is now down to 430,000 seconds. Is currently 4.98 days behind.
I'm sure it's just taking a short break from its efforts to get a week or more behind.
Grant
Darwin NT
ID: 2043445
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2043451 - Posted: 7 Apr 2020, 3:11:20 UTC

Seems like they turned off the assimilators (or heavily throttled them) to shift focus to the replica delay.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2043451
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13731
Credit: 208,696,464
RAC: 304
Australia
Message 2043453 - Posted: 7 Apr 2020, 3:38:21 UTC - in response to Message 2043451.  
Last modified: 7 Apr 2020, 3:52:47 UTC

seems like they turned off the assimilators (or heavily throttled them) to shift focus on the replica delay.
Or it just continues to wait on Results to be returned so Assimilation can then take place.
I've got plenty that have been Validated, but still have outstanding Tasks against the Work Unit.


Edit-
Because of systems like this one.
Last contact, April 1
Tasks in progress, 14,409!

and WUs like this one.
1 Task timed out, 5 Tasks Validated, and 1 Task still in progress.
Deadline- May 22.
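
The idea above can be sketched in a few lines of Python. This models only the hypothesis in this post (a Validated Work Unit still waits on any Task that is In progress); it is not taken from the BOINC server code, and `TaskState`, `WorkUnit` and `can_be_assimilated` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum


class TaskState(Enum):
    IN_PROGRESS = "in progress"
    VALIDATED = "validated"
    TIMED_OUT = "timed out"


@dataclass
class WorkUnit:
    name: str
    tasks: list[TaskState]

    def is_validated(self) -> bool:
        # Simplification: treat the WU as Validated once any Task is Validated.
        return TaskState.VALIDATED in self.tasks

    def can_be_assimilated(self) -> bool:
        # Assumed rule from the post: every Task must be returned or timed out.
        no_outstanding = TaskState.IN_PROGRESS not in self.tasks
        return self.is_validated() and no_outstanding


# The WU described above: 1 timed out, 5 Validated, 1 still In progress.
stuck = WorkUnit(
    "example_wu",
    [TaskState.TIMED_OUT] + [TaskState.VALIDATED] * 5 + [TaskState.IN_PROGRESS],
)
print(stuck.can_be_assimilated())  # False

# Once the last Task passes its deadline, the WU is free to move on.
stuck.tasks[-1] = TaskState.TIMED_OUT
print(stuck.can_be_assimilated())  # True
```

Under that assumption, a single slow or absent host is enough to keep an otherwise finished WU in the backlog until its deadline.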
Grant
Darwin NT
ID: 2043453
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2043455 - Posted: 7 Apr 2020, 3:50:49 UTC - in response to Message 2043453.  
Last modified: 7 Apr 2020, 3:54:12 UTC

The assimilators were assimilating, but have stopped. The backlog got down to about 7.3 million, but is now slowly climbing again from results being returned but not assimilated. The rate at which it’s rising is very slow since the return rate is very low: about 1/20th (5%) of what it used to be.

The time that the replica started reducing again coincides with when the assimilation queue started rising again.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2043455
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13731
Credit: 208,696,464
RAC: 304
Australia
Message 2043459 - Posted: 7 Apr 2020, 4:05:41 UTC - in response to Message 2043455.  

The time that the replica started reducing again coincides with when the assimilation queue started rising again.
The database is still so bloated it can only do 1 or 2 things at a time, not all of the things it needs to.
Grant
Darwin NT
ID: 2043459
AllgoodGuy

Joined: 29 May 01
Posts: 293
Credit: 16,348,499
RAC: 266
United States
Message 2043464 - Posted: 7 Apr 2020, 4:16:49 UTC - in response to Message 2043459.  

The time that the replica started reducing again coincides with when the assimilation queue started rising again.
The database is still so bloated it can only do 1 or 2 things at a time, not all of the things it needs to.

Results returned and awaiting validation 0 28,710 19,613,042 9m

Seems this isn't falling as fast as I expected it would as well. But...
Results received in last hour ** 0 122 7,051 0m

Isn't helping either.
ID: 2043464
AllgoodGuy

Joined: 29 May 01
Posts: 293
Credit: 16,348,499
RAC: 266
United States
Message 2043465 - Posted: 7 Apr 2020, 4:17:36 UTC

I think that shows just how many people just turned off their machines.
ID: 2043465
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2043466 - Posted: 7 Apr 2020, 4:38:39 UTC - in response to Message 2043465.  

It’s probably just the slow systems left. All (well most) of the fast and medium speed systems ran out of work already. So nothing to return for a very large number of people. Except when they get the small handful of resends every day.

I’m sure there are quite a few people who turned off their systems with work still in the cache, but that’s probably not the norm.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2043466
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13731
Credit: 208,696,464
RAC: 304
Australia
Message 2043467 - Posted: 7 Apr 2020, 4:44:00 UTC - in response to Message 2043464.  

Results returned and awaiting validation 0 28,710 19,613,042 9m

Seems this isn't falling as fast as I expected it would as well. But...
Results received in last hour ** 0 122 7,051 0m

Isn't helping either.
With 2-month and even 3-month+ deadlines, that's how long it will take for the current stuff in limbo to eventually be resent. Then of course a few of those will probably end up being picked up by a system that won't return them, so add another 3+ months before those are likely to finally be returned (unless they end up with yet another black hole system).
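
To put rough dates on that, here is a small Python sketch. It assumes a resend goes out exactly at the deadline and gets the same deadline length as the original Task, which is a simplification. The 53-day figure is only derived from the March 31 send date and 23 May resend date mentioned earlier in the thread, and `worst_case_return` is a hypothetical helper name.

```python
from datetime import date, timedelta


def worst_case_return(sent: date, deadline_days: int, missed_deadlines: int) -> date:
    """Date the Task is finally due if `missed_deadlines` hosts in a row never return it.

    Assumes each resend is issued at the deadline with the same deadline length.
    """
    return sent + timedelta(days=deadline_days * (missed_deadlines + 1))


sent = date(2020, 3, 31)   # the final blast of work
deadline_days = 53         # assumption: March 31 -> 23 May

for missed in range(3):
    print(missed, worst_case_return(sent, deadline_days, missed))
# 0 2020-05-23
# 1 2020-07-15
# 2 2020-09-06
```

With the longer 3-month+ deadlines, each extra black hole system pushes the date out correspondingly further.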
Grant
Darwin NT
ID: 2043467
Jimbocous (Project Donor)
Volunteer tester
Joined: 1 Apr 13
Posts: 1853
Credit: 268,616,081
RAC: 1,349
United States
Message 2043468 - Posted: 7 Apr 2020, 6:20:01 UTC - in response to Message 2043453.  
Last modified: 7 Apr 2020, 6:20:43 UTC

Because of systems like this one.
Last contact, April 1
Tasks in progress, 14,409!
Something is really messed up here, and I don't think it's the client, but rather the DB.
. Error tasks that timed out in 24 hours?
. MB tasks with a 14 day timeout?
. 14k tasks in progress on a machine with 1 GPU?
ID: 2043468
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13731
Credit: 208,696,464
RAC: 304
Australia
Message 2043472 - Posted: 7 Apr 2020, 7:37:47 UTC - in response to Message 2043468.  

Because of systems like this one.
Last contact, April 1
Tasks in progress, 14,409!
14k tasks in progress on a machine with 1 GPU?
AV software creating Ghosts comes to mind.
But even though that system doesn't actually have them, they're in the Seti system as though they are until they eventually timeout & get re-issued.
Grant
Darwin NT
ID: 2043472
AllgoodGuy

Joined: 29 May 01
Posts: 293
Credit: 16,348,499
RAC: 266
United States
Message 2043475 - Posted: 7 Apr 2020, 8:24:00 UTC - in response to Message 2043472.  
Last modified: 7 Apr 2020, 8:34:45 UTC

Because of systems like this one.
Last contact, April 1
Tasks in progress, 14,409!
14k tasks in progress on a machine with 1 GPU?
AV software creating Ghosts comes to mind.
But even though that system doesn't actually have them, they're in the Seti system as though they are until they eventually timeout & get re-issued.

AV shouldn't be doing this. There isn't anything I can see that would cause it to reject file transfers. As far as I can see, it is all HTTP(S) transfers of gridded binary data. Nothing there should be of concern to AV software.
ID: 2043475
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13731
Credit: 208,696,464
RAC: 304
Australia
Message 2043479 - Posted: 7 Apr 2020, 10:18:15 UTC - in response to Message 2043475.  

AV software creating Ghosts comes to mind.
But even though that system doesn't actually have them, they're in the Seti system as though they are until they eventually timeout & get re-issued.

AV shouldn't be doing this. Not anything I can see which would cause it to reject file transfers. As far as I can see, it is all http(s) transfers of gridded binary data. Nothing there should be of concern to AV software.
Every other update seems to result in AV software of one brand or another considering BOINC activity to be malicious, and it's not at all unusual for someone to complain they aren't getting any work, when they have been- it's just that the AV software has been intercepting it as suspect activity.
Grant
Darwin NT
ID: 2043479
Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 2043480 - Posted: 7 Apr 2020, 10:22:10 UTC - in response to Message 2043475.  

Because of systems like this one.
Last contact, April 1
Tasks in progress, 14,409!
14k tasks in progress on a machine with 1 GPU?
AV software creating Ghosts comes to mind.
But even though that system doesn't actually have them, they're in the Seti system as though they are until they eventually timeout & get re-issued.
AV shouldn't be doing this. Not anything I can see which would cause it to reject file transfers. As far as I can see, it is all http(s) transfers of gridded binary data. Nothing there should be of concern to AV software.
Actually, many people have wound up being the victim of overzealous AV software. Unless the BOINC folders are excluded from being scanned (and BOINC's internet connection from being monitored), those results are indeed very possible, and I've troubleshot many cases here of people who have suffered from them as well. ;-)

Cheers.
ID: 2043480
Stephen "Heretic" (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2043482 - Posted: 7 Apr 2020, 10:42:11 UTC - in response to Message 2043465.  

I think that shows just how many people just turned off their machines.


. .The sad thing is that was always going to be a problem as soon as the announcement was made.

Stephen

:(
ID: 2043482


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.