The Server Issues / Outages Thread - Panic Mode On! (119)

Kiska
Volunteer tester

Joined: 31 Mar 12
Posts: 302
Credit: 3,067,762
RAC: 0
Australia
Message 2046460 - Posted: 23 Apr 2020, 1:22:19 UTC - in response to Message 2046321.  
Last modified: 23 Apr 2020, 1:30:04 UTC


I see that regularly on WCG, so there is such an option. But the client in question must be alive, and actually do the cancelling after the server request.


Thanks, I thought I had seen it before.

I think there may be over 1 million tasks in this condition now "Results waiting for db purging" out of the total "Results returned and awaiting validation"

I'm sure that value has been increasing significantly over the last few days, but I can't find a Munin graph for it. https://munin.kiska.pw/munin/setiathome-day.html


Are these the graphs you seek?

https://munin.kiska.pw/munin/Munin-Node/Munin-Node/workunits_setiathomev8.html

my $validationv8 = $states->{sah_v8_results_awaiting_validation};
my $assimilationv8 = $states->{sah_v8_workunits_waiting_for_assimilation};
my $deletionv8 = $states->{sah_v8_workunits_waiting_for_deletion};
my $waitingv8 = $states->{sah_v8_results_waiting_for_deletion};


my $validationap = $states->{ap_results_awaiting_validation};
my $assimilationap = $states->{ap_workunits_waiting_for_assimilation};
my $deletionap = $states->{ap_workunits_waiting_for_deletion};
my $waitingap = $states->{ap_results_waiting_for_deletion};
ID: 2046460
Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 2046464 - Posted: 23 Apr 2020, 1:36:03 UTC

I have a small number of "completed, inconclusive" tasks. When I look at my wingperson, they are usually coming up with the same result (inconclusive). Given that it was on a Windows/SOG task....

Tom M
A proud member of the OFA (Old Farts Association).
ID: 2046464
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13746
Credit: 208,696,464
RAC: 304
Australia
Message 2046469 - Posted: 23 Apr 2020, 2:42:03 UTC

Flood gates are open again.
Just picked up 259 resends, mostly _3 with a lot of _4 mixed in there.
Grant
Darwin NT
ID: 2046469
Keith T.
Volunteer tester
Joined: 23 Aug 99
Posts: 962
Credit: 537,293
RAC: 9
United Kingdom
Message 2046470 - Posted: 23 Apr 2020, 3:02:36 UTC - in response to Message 2046460.  
Last modified: 23 Apr 2020, 3:06:13 UTC


I see that regularly on WCG, so there is such an option. But the client in question must be alive, and actually do the cancelling after the server request.


Thanks, I thought I had seen it before.

I think there may be over 1 million tasks in this condition now "Results waiting for db purging" out of the total "Results returned and awaiting validation"

I'm sure that value has been increasing significantly over the last few days, but I can't find a Munin graph for it. https://munin.kiska.pw/munin/setiathome-day.html


Are these the graphs you seek?

https://munin.kiska.pw/munin/Munin-Node/Munin-Node/workunits_setiathomev8.html


No, the labels on some of the graphs are a bit misleading.

"Results waiting for db purging" has just passed 1.25 million.

If you subtract that value from "Results returned and awaiting validation" the result appears to be the number of Workunits or Tasks that have not reached Quorum.

I may have got this wrong, but it seems to be in the right ball park.
ID: 2046470
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2046471 - Posted: 23 Apr 2020, 3:19:34 UTC - in response to Message 2046470.  

I may have got this wrong, but it seems to be in the right ball park.


You’ve got it wrong. The results waiting for purging are already out of the waiting for validation queue and have already been assimilated.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2046471
Keith T.
Volunteer tester
Joined: 23 Aug 99
Posts: 962
Credit: 537,293
RAC: 9
United Kingdom
Message 2046473 - Posted: 23 Apr 2020, 3:36:03 UTC - in response to Message 2046471.  

I may have got this wrong, but it seems to be in the right ball park.


You’ve got it wrong. The results waiting for purging are already out of the waiting for validation queue and have already been assimilated.


OK, thanks.

I know that Workunits, Tasks and Results are not interchangeable, but there should be a way of estimating the Units that have now reached Quorum, even though they still have outstanding Tasks, and the Units that are still awaiting validation.
ID: 2046473
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2046475 - Posted: 23 Apr 2020, 6:07:42 UTC

I guess this anomalously high 'Results waiting for db purging' includes validated results that are part of the workunits that are still waiting for some unreturned results. The workunit is still waiting for validation but because some results in it have been validated already, those results are not shown in 'Results returned and awaiting validation' and the only place for them to go is 'Results waiting for db purging'. But they won't be purged before the associated workunit moves to 'Workunits waiting for db purging'.
ID: 2046475
Unixchick (Project Donor)
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 2046476 - Posted: 23 Apr 2020, 6:48:51 UTC

I looked at the graph site, and I really like this page
https://munin.kiska.pw/munin/Munin-Node/Munin-Node/results_setiathome_AP_creation.html
as I can see the spikes of forced resends.

Most of what I'm currently working on is from the April 19 WU creation spike. I should be done with my WUs tomorrow night unless I snag some resends.

any guesses on when the next forced resend happens??
ID: 2046476
Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 2046485 - Posted: 23 Apr 2020, 9:51:34 UTC - in response to Message 2046476.  


any guesses on when the next forced resend happens??


It could be daily. I got some yesterday and some more today.

Tom M
A proud member of the OFA (Old Farts Association).
ID: 2046485
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2046495 - Posted: 23 Apr 2020, 11:56:00 UTC - in response to Message 2046475.  

I guess this anomalously high 'Results waiting for db purging' includes validated results that are part of the workunits that are still waiting for some unreturned results. The workunit is still waiting for validation but because some results in it have been validated already, those results are not shown in 'Results returned and awaiting validation' and the only place for them to go is 'Results waiting for db purging'. But they won't be purged before the associated workunit moves to 'Workunits waiting for db purging'.


Since the process is Validation->Assimilation->Deletion->Purging I don’t think the purging category is going to hold anything still waiting for validation. It’s just the stats that are still on the website in that category.

Workunit/Files waiting for deletion: The number of files which can be deleted from disk, as the workunit has been assimilated, and there is no more use for it or its constituent results.
Workunits/Results waiting for db purging: The number of workunits or results which have been deleted from disk and, after a short grace period (currently 24 hours), will be purged from the database. It is during this grace period that completed results can still be viewed in your personal account pages. For safety, important information is written to disk and archived before these rows are deleted.
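
As a rough illustration of the grace period described in the glossary text above, here is a minimal Python sketch. The function name `is_purgeable` and the choice to key on a file-deletion timestamp are assumptions made for illustration. Only the 24-hour figure comes from the quoted text, and this is not the actual BOINC purge logic.

```python
from datetime import datetime, timedelta, timezone

# Grace period quoted in the server status glossary above (24 hours).
GRACE_PERIOD = timedelta(hours=24)


def is_purgeable(files_deleted_at: datetime, now: datetime) -> bool:
    """Return True once the grace period since file deletion has elapsed.

    Hypothetical helper: the real purge daemon may key on a different timestamp.
    """
    return now - files_deleted_at >= GRACE_PERIOD


# Example timestamps are made up for the demonstration.
deleted = datetime(2020, 4, 22, 12, 0, tzinfo=timezone.utc)
print(is_purgeable(deleted, datetime(2020, 4, 23, 11, 56, tzinfo=timezone.utc)))  # False
print(is_purgeable(deleted, datetime(2020, 4, 23, 12, 0, tzinfo=timezone.utc)))   # True
```

Under this reading, a result stays visible on the account pages until the check flips to true.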

Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2046495
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2046508 - Posted: 23 Apr 2020, 13:22:04 UTC - in response to Message 2046495.  
Last modified: 23 Apr 2020, 13:37:28 UTC

I guess this anomalously high 'Results waiting for db purging' includes validated results that are part of the workunits that are still waiting for some unreturned results.
Since the process is Validation->Assimilation->Deletion->Purging I don’t think the purging category is going to hold anything still waiting for validation. It’s just the stats that are still on the website in that category.
It wouldn't be the only place where the SSP labels are misleading.

Back when stuff was running normally, the ratio of results to workunits waiting for purging matched the 2.2 average replication most of the time. But recently they have sent a lot of pre-resends, so we have a lot of workunits that have their quorum filled but are still waiting for some unreturned results, and the results waiting for purging have grown a lot without matching growth in workunits waiting for purging.

And it is absolutely impossible for those numbers to represent only those results that are really ready to be purged because the current ratio between results and WUs waiting for purging is about 360! Average replication is higher than normal due to those pre-resends, but it can't be that high because there's a 10 result cap after which the entire WU fails.
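
The arithmetic behind this argument can be sketched in a few lines of Python. The result count is the roughly 1.25 million figure mentioned earlier in the thread. The workunit count is a made-up value chosen only so that the ratio lands near the 360 quoted above, so treat both as illustrative rather than as real server status numbers.

```python
# Illustrative figures only: ~1.25 million results is mentioned earlier in the
# thread; the workunit count is hypothetical, picked to give a ratio near 360.
results_waiting_for_purge = 1_250_000
workunits_waiting_for_purge = 3_500  # assumption, not a real server value

NORMAL_REPLICATION = 2.2   # average results per workunit in normal operation
MAX_RESULTS_PER_WU = 10    # cap after which the whole workunit fails


def results_per_workunit(results: int, workunits: int) -> float:
    """Average number of results per workunit across the two purge queues."""
    return results / workunits


ratio = results_per_workunit(results_waiting_for_purge, workunits_waiting_for_purge)
print(f"results per workunit: {ratio:.0f}")
print(f"multiple of normal replication: {ratio / NORMAL_REPLICATION:.0f}")
# A ratio above the per-workunit cap cannot come from fully finished workunits alone.
print("exceeds cap:", ratio > MAX_RESULTS_PER_WU)
```

With any plausible workunit count, the ratio sits far above the 10-result cap, which is the point the post is making.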

The process is not simply a linear validation->assimilation->deletion->purging. A workunit can go through validation multiple times. First when the result that filled the quorum was received and then again to resolve inconclusives or to give credit to later returned pre-resends. The already validated results of a workunit that is still waiting to be validated again have to be somewhere and observations are consistent with them ending up in 'Results waiting for db purging'.

Even a task that has missed its deadline can be returned at a pretty late stage: after it has already expired on the web site and the WU has been processed by an extra wingman, validated, assimilated, and is presumably waiting for purging for real. Returning it will still send the WU back to the validators and replace the error result with a valid one. Only once the WU has been purged from the database is it too late to return the expired task.
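
The non-linear flow described in this post can be modelled with a toy state machine. This is a sketch of the poster's hypothesis, not of the real BOINC server code: the class `Workunit`, its fields, and the `receive_result` method are all hypothetical names, and the quorum of 2 is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Workunit:
    """Toy model of a workunit; all names here are hypothetical."""
    quorum: int = 2                      # assumed minimum number of returned results
    validated: list = field(default_factory=list)
    validation_passes: int = 0           # how many times the validator has run
    purged: bool = False

    def receive_result(self, result_id: str) -> bool:
        """Accept a returned result; return False if the WU is already purged."""
        if self.purged:
            return False                 # too late: the WU is gone from the database
        self.validated.append(result_id)
        if len(self.validated) >= self.quorum:
            # Each result arriving at or after quorum triggers another pass.
            self.validation_passes += 1
        return True


wu = Workunit()
wu.receive_result("_0")
wu.receive_result("_1")         # quorum reached: first validation pass
wu.receive_result("_2")         # late pre-resend: WU goes back to the validator
print(wu.validation_passes)     # 2
wu.purged = True
print(wu.receive_result("_3"))  # False
```

In this model the results `_0` and `_1` are already validated while the workunit is still open for `_2`, which is the situation the post suggests is being counted under 'Results waiting for db purging'.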
ID: 2046508
Kissagogo27 (Special Project $75 donor)
Joined: 6 Nov 99
Posts: 716
Credit: 8,032,827
RAC: 62
France
Message 2046533 - Posted: 23 Apr 2020, 15:30:37 UTC

found a Science United Account with 2.8K hosts and a lot of tasks in progress ...
ID: 2046533
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2046537 - Posted: 23 Apr 2020, 15:41:42 UTC - in response to Message 2046533.  
Last modified: 23 Apr 2020, 15:42:33 UTC

found a Science United Account with 2.8K hosts and a lot of tasks in progress ...


Science United is a collection of thousands of different people who use that platform to crunch BOINC projects instead of manually attaching to projects. No one person is managing all of the computers on that account.

https://scienceunited.org/
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2046537
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2046553 - Posted: 23 Apr 2020, 16:27:19 UTC - in response to Message 2046537.  

found a Science United Account with 2.8K hosts and a lot of tasks in progress ...
Science United is a collection of thousands of different people that use that platform to crunch BOINC projects instead of manually attaching to projects. no one person is managing all of the computers on that account.
https://scienceunited.org/
It's a dumbed down way to run boinc for the social media generation. One step towards the society depicted in the movie 'Idiocracy'.
ID: 2046553
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2046555 - Posted: 23 Apr 2020, 16:31:13 UTC - in response to Message 2046553.  

Pretty much.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2046555
juan BFP (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2046560 - Posted: 23 Apr 2020, 17:44:22 UTC - in response to Message 2046555.  

Pretty much.

+1

Surely not for me; I will not give others control over what my host crunches or does.

On another topic: Who is the one who crashed the stats? There are no Free-DC or Boinc Stats anymore?
ID: 2046560
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2046561 - Posted: 23 Apr 2020, 17:49:33 UTC - in response to Message 2046560.  

Who is the one who crashed the stats? There are no Free-DC or Boinc Stats anymore?
Setiathome hasn't updated the statistic export files in three days: https://setiathome.berkeley.edu/stats/

The replica database has been down for about the same length of time.
ID: 2046561
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14653
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2046562 - Posted: 23 Apr 2020, 17:50:00 UTC - in response to Message 2046560.  

On another topic: Who is the one who crashed the stats? There are no Free-DC or Boinc Stats anymore?
https://setiathome.berkeley.edu/stats/ hasn't been updated since 20 April - that's about when they turned off the replica, isn't it?
ID: 2046562
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13746
Credit: 208,696,464
RAC: 304
Australia
Message 2046625 - Posted: 23 Apr 2020, 22:21:45 UTC - in response to Message 2046440.  

April 22 0:40 UTC
I've still got 259 Pendings and 342 Inconclusives
April 22 23:20
My Pendings are down to 145 and Inconclusives 237.
65 Pendings, 145 Inconclusives.
Grant
Darwin NT
ID: 2046625
Stephen "Heretic" (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2046630 - Posted: 23 Apr 2020, 22:54:16 UTC - in response to Message 2046553.  

found a Science United Account with 2.8K hosts and a lot of tasks in progress ...
Science United is a collection of thousands of different people that use that platform to crunch BOINC projects instead of manually attaching to projects. no one person is managing all of the computers on that account.
https://scienceunited.org/
It's a dumbed down way to run boinc for the social media generation. One step towards the society depicted in the movie 'Idiocracy'.


. . I recommend a Sci-Fi short story called 'The Marching Morons' by C. M. Kornbluth (also nicely subtle in the title 8-})

Stephen

:)
ID: 2046630
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.