The Server Issues / Outages Thread - Panic Mode On! (119)

Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 2035521 - Posted: 4 Mar 2020, 3:43:17 UTC

back again....

sorry Steven...
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2035529 - Posted: 4 Mar 2020, 4:03:19 UTC

Normal fight to get results uploaded and then the interminable wait for new work overnight to replenish the caches.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
AllgoodGuy

Joined: 29 May 01
Posts: 293
Credit: 16,348,499
RAC: 266
United States
Message 2035535 - Posted: 4 Mar 2020, 4:19:36 UTC - in response to Message 2035529.  

Normal fight to get results uploaded and then the interminable wait for new work overnight to replenish the caches.


Same here, not one of my 3 machines is uploading, hence they can't download either. And the lag on posting is ridonkulous.
AllgoodGuy

Joined: 29 May 01
Posts: 293
Credit: 16,348,499
RAC: 266
United States
Message 2035537 - Posted: 4 Mar 2020, 4:22:26 UTC - in response to Message 2035535.  

Normal fight to get results uploaded and then the interminable wait for new work overnight to replenish the caches.


Same here, not one of my 3 machines is uploading, hence they can't download either. And the lag on posting is ridonkulous.


And just like that, the 2 lower performing ones submit.
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13922
Credit: 208,696,464
RAC: 304
Australia
Message 2035560 - Posted: 4 Mar 2020, 6:21:33 UTC

Is anyone able to get a response from the Scheduler (other than an error)? Even with NNT (No New Tasks) set, the request either times out or returns an error.
Grant
Darwin NT
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13922
Credit: 208,696,464
RAC: 304
Australia
Message 2035563 - Posted: 4 Mar 2020, 6:38:59 UTC

Still no joy even with only 75 being reported.
Grant
Darwin NT
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13922
Credit: 208,696,464
RAC: 304
Australia
Message 2035571 - Posted: 4 Mar 2020, 7:28:06 UTC - in response to Message 2035563.  
Last modified: 4 Mar 2020, 7:33:12 UTC

Still no joy even with only 75 being reported.

In progress keeps falling, so at least some people have been able to report, but still no joy here.


Things are way more borked than the usual post-outage headaches. I think we really need to get the deadlines reduced to 2 weeks for new work (inc. AP) and 3 days for resends, or we're not going to make it till the end of the month.
Results returned and awaiting validation has hit another record high, on top of the boost it got after last week's outage, from which it never recovered: we went into that outage at 12.5 million, it peaked at just under 14 million afterwards, and it was still only slightly above that peak at the start of this week's outage.
Grant
Darwin NT
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2035573 - Posted: 4 Mar 2020, 7:36:51 UTC

Still no joy even with only 75 being reported.

Same here. Totally unable to report a single task yet.

Good night. See ya in the morning.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13922
Credit: 208,696,464
RAC: 304
Australia
Message 2035574 - Posted: 4 Mar 2020, 7:38:20 UTC - in response to Message 2035573.  

Still no joy even with only 75 being reported.

Same here. Totally unable to report a single task yet.

Good night. See ya in the morning.
I just got through!
That's one out of 30 something requests. Not a good ratio.
Grant
Darwin NT
AllgoodGuy

Joined: 29 May 01
Posts: 293
Credit: 16,348,499
RAC: 266
United States
Message 2035579 - Posted: 4 Mar 2020, 7:53:30 UTC - in response to Message 2035574.  

Still no joy even with only 75 being reported.

Same here. Totally unable to report a single task yet.

Good night. See ya in the morning.
I just got through!
That's one out of 30 something requests. Not a good ratio.


My main machine just broke through as I was about to tell you it couldn't.
AllgoodGuy

Joined: 29 May 01
Posts: 293
Credit: 16,348,499
RAC: 266
United States
Message 2035580 - Posted: 4 Mar 2020, 7:55:13 UTC - in response to Message 2035579.  

My main machine just broke through as I was about to tell you it couldn't.


Still only delivered about half.
AllgoodGuy

Joined: 29 May 01
Posts: 293
Credit: 16,348,499
RAC: 266
United States
Message 2035585 - Posted: 4 Mar 2020, 8:14:09 UTC

Well, at least I've now reported. Still went through three cycles of "Scheduler request completed: got 0 new tasks".

I guess it's just too much to ask that I get 200 OpenCL_ati_mac Astropulse WUs seeing as there are 2000 available? But as this always goes, I'll have to DL about 100-150 of the SoG WUs that I'll dump into the abort column because they don't believe I won't crunch them.
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13922
Credit: 208,696,464
RAC: 304
Australia
Message 2035589 - Posted: 4 Mar 2020, 8:38:14 UTC - in response to Message 2035585.  
Last modified: 4 Mar 2020, 8:39:21 UTC

Well, at least I've now reported. Still went through three cycles of "Scheduler request completed: got 0 new tasks".
At least you got that.
Still almost nothing but Scheduler errors here.


Edit- should have posted that sooner, finally managed to report a few more WUs just then.
Grant
Darwin NT
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13922
Credit: 208,696,464
RAC: 304
Australia
Message 2035590 - Posted: 4 Mar 2020, 8:42:30 UTC - in response to Message 2035585.  

But as this always goes, I'll have to DL about 100-150 of the SoG WUs that I'll dump into the abort column because they don't believe I won't crunch them.
If you're not going to process them, why download them??? Why not just set No New Tasks??? Or just set that system to another Venue & disable GPU processing?????
Grant
Darwin NT
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13922
Credit: 208,696,464
RAC: 304
Australia
Message 2035597 - Posted: 4 Mar 2020, 9:31:29 UTC

Well, finally got all of my work reported.
Scheduler is now erroring out less frequently (but that's coming back from a very low base). Errors are still outnumbering "Project has no tasks available" responses by a very large margin.
Grant
Darwin NT
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13922
Credit: 208,696,464
RAC: 304
Australia
Message 2035603 - Posted: 4 Mar 2020, 9:51:53 UTC
Last modified: 4 Mar 2020, 9:52:16 UTC

What are the odds that, once we can get work from the Scheduler again, we won't be able to download it...?
Grant
Darwin NT
Profile Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1859
Credit: 268,616,081
RAC: 1,349
United States
Message 2035607 - Posted: 4 Mar 2020, 10:26:59 UTC

Finally got ~5600 reported across 4 boxes.
Now getting dribs and drabs of work, but not enough yet to knock Einstein off the perch.
I suspect, like last week, it will be another 12-18 hours before the caches start filling again and a steady workflow can be maintained.
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2035609 - Posted: 4 Mar 2020, 10:33:48 UTC

When the reports start working reliably, the assimilation queue will probably hit a new all-time high. The queue has grown rapidly by about 600,000 workunits after each downtime, and it is now higher than it has ever been just before that post-downtime growth.

Also, the total number of results in the database is now 20.3 million, well past the safe 20 million limit, so the database may be spilling out of RAM. If this is the case, then the recovery may take several days!
AllgoodGuy

Joined: 29 May 01
Posts: 293
Credit: 16,348,499
RAC: 266
United States
Message 2035611 - Posted: 4 Mar 2020, 10:37:55 UTC - in response to Message 2035590.  

But as this always goes, I'll have to DL about 100-150 of the SoG WUs that I'll dump into the abort column because they don't believe I won't crunch them.
If you're not going to process them, why download them??? Why not just set No New Tasks??? Or just set that system to another Venue & disable GPU processing?????


Because it is only that program that runs buggy on my system, and I'm not about to make a 4th configuration to satisfy that particular program. I still run MB and AP just fine, and I have 3 configurations for those programs that work just fine. If I had an option to deny downloading a particular plan class, that would already be done. As it is, I just prevent them from running via max_concurrent=0 while my cache is full, and cmdline=-version when it isn't. The rest work just fine.
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2035614 - Posted: 4 Mar 2020, 10:51:17 UTC - in response to Message 2035611.  

If you're not going to process them, why download them??? Why not just set No New Tasks??? Or just set that system to another Venue & disable GPU processing?????
Because it is only that program that runs buggy on my system, and I'm not about to make a 4th configuration to satisfy that particular program.
If you are running Anonymous Platform, then just remove the SoG app from your app_info.xml. If you are not running Anonymous Platform, then do this:

Stop BOINC, then edit cc_config.xml to add this line to the <options> section:

<dont_check_file_sizes>1</dont_check_file_sizes>

and copy your best working GPU app executable over the SoG app executable in the project directory. Then restart BOINC.

Now you are actually using this other app to crunch the tasks the scheduler sends for the SoG app.
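
For context, here is a minimal sketch of how cc_config.xml might look with that line in place. It assumes the file has no other settings; an existing file would keep whatever entries it already has inside <options>, with the new line added alongside them.

```xml
<!-- Sketch only: assumes a cc_config.xml with no other options set. -->
<cc_config>
  <options>
    <!-- Skip the client's file-size check on application and input files -->
    <dont_check_file_sizes>1</dont_check_file_sizes>
  </options>
</cc_config>
```

The flag matters here because the copied executable will most likely not match the size the project recorded for the SoG app, so with the check left on the client would presumably reject the file.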
 
©2025 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.