The Server Issues / Outages Thread - Panic Mode On! (119)

Message boards : Number crunching : The Server Issues / Outages Thread - Panic Mode On! (119)

Previous · 1 . . . 60 · 61 · 62 · 63 · 64 · 65 · 66 . . . 107 · Next

Jimbocous (Project Donor)
Volunteer tester
Joined: 1 Apr 13
Posts: 1853
Credit: 268,616,081
RAC: 1,349
United States
Message 2043043 - Posted: 4 Apr 2020, 23:12:57 UTC - in response to Message 2043041.  

Mine's 15kb

Thanks. That solves the puzzle.

Something was changed and the servers no longer handle my large file.

Back to the drawing board. Will remain running with NNT until I think of something.

Thanks again for the help.

No problem.
Wondering, if that was deleted would BOINC just rebuild it on restart? Would think so.
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13745
Credit: 208,696,464
RAC: 304
Australia
Message 2043047 - Posted: 4 Apr 2020, 23:18:22 UTC - in response to Message 2043039.  

I'm looking for this file:

sched_request_setiathome.berkeley.edu
27kB for me.
Grant
Darwin NT
Kevin Olley

Joined: 3 Aug 99
Posts: 906
Credit: 261,085,289
RAC: 572
United Kingdom
Message 2043048 - Posted: 4 Apr 2020, 23:18:25 UTC

Just got 1.

Thought it unusual that it was a _1.

Did a bit of searching and found it was the ghost that I had not bothered to try to recover.

Looks like they have turned something back on again.
Kevin


Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13745
Credit: 208,696,464
RAC: 304
Australia
Message 2043049 - Posted: 4 Apr 2020, 23:19:56 UTC - in response to Message 2043048.  
Last modified: 4 Apr 2020, 23:27:59 UTC

Looks like they have turned something back on again.
It would explain why the Scheduler keeps falling over if they are enabling it for an hour or 2 every few hours.
Grant
Darwin NT
juan BFP (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2043053 - Posted: 4 Apr 2020, 23:26:28 UTC - in response to Message 2043049.  
Last modified: 4 Apr 2020, 23:29:54 UTC

Looks like they have turned something back on again.
It would explain why the Scheduler keeps falling over if they are enabling it for an hour or 2 every few hours.

That is exactly my hunch.

Wondering, if that was deleted would BOINC just rebuild it on restart? Would think so.

No, it's created each time your host makes a new scheduler request. But it's supposed to be a small file like yours.
Because of my huge cache, my file is a lot bigger. My large file was working fine until something changed the 303-second delay to 606 this week.
Maybe it's related to the ghost recovery subroutine on the server side. Need more digging with less alcohol in my head.
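To compare notes on the scheduler request file, its size can be read straight from the BOINC data directory. This Python sketch is only an illustration: `sched_request_sizes` is a hypothetical helper, the data directory is whatever path you pass in (its location depends on the OS and how BOINC was installed), and it matches any file whose name starts with `sched_request_` rather than assuming an exact extension. Since the file is rewritten on each scheduler request, the size reflects the most recent one.

```python
from pathlib import Path


def sched_request_sizes(data_dir):
    """Map each sched_request_* file in data_dir to its size in kB (1 kB = 1024 bytes).

    data_dir is assumed to be the BOINC data directory; where that lives
    depends on the OS and the install.
    """
    sizes = {}
    for path in sorted(Path(data_dir).glob("sched_request_*")):
        if path.is_file():
            # Size on disk, rounded to one decimal place
            sizes[path.name] = round(path.stat().st_size / 1024, 1)
    return sizes
```

For example, `print(sched_request_sizes("/var/lib/boinc-client"))` on a Linux package install that keeps its data there; check where your own client stores its files first.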
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13745
Credit: 208,696,464
RAC: 304
Australia
Message 2043057 - Posted: 4 Apr 2020, 23:32:09 UTC - in response to Message 2043041.  
Last modified: 4 Apr 2020, 23:33:05 UTC

Back to the drawing board. Will remain running with NNT until I think of something.
Try NNT and limit the number of reported Results to 25 or so.
In cc_config.xml, <options> section:
<max_tasks_reported>25</max_tasks_reported>

I've had to limit mine to 75 for ages now to be able to report & get new work. After outages I still often had to use NNT to report.
Grant
Darwin NT
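Grant's suggested setting can also be written out programmatically. This is a minimal sketch that assumes you are starting from an empty file: `build_cc_config` is a hypothetical helper that produces a `cc_config.xml` containing only `<max_tasks_reported>` in the `<options>` section and, optionally, the `<sched_op_debug>` log flag, which is what produces the `[sched_op]` lines quoted later in the thread. Writing it over an existing `cc_config.xml` would discard any other settings already in that file.

```python
import xml.etree.ElementTree as ET


def build_cc_config(max_tasks_reported, sched_op_debug=False):
    """Build a minimal cc_config.xml tree with only the settings discussed here."""
    root = ET.Element("cc_config")
    if sched_op_debug:
        # Log flag behind the "[sched_op]" event-log messages
        log_flags = ET.SubElement(root, "log_flags")
        ET.SubElement(log_flags, "sched_op_debug").text = "1"
    options = ET.SubElement(root, "options")
    # Cap on how many finished tasks are reported in one scheduler request
    ET.SubElement(options, "max_tasks_reported").text = str(int(max_tasks_reported))
    return ET.ElementTree(root)


def to_xml_string(tree):
    """Serialize the tree as a str (no XML declaration)."""
    return ET.tostring(tree.getroot(), encoding="unicode")
```

Something like `build_cc_config(25).write("cc_config.xml")` in the BOINC data directory, followed by Options > Read config files in the Manager (or restarting the client), should pick it up. Starting at 25 and raising the value once reports go through matches the advice above.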
juan BFP (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2043060 - Posted: 4 Apr 2020, 23:38:59 UTC - in response to Message 2043057.  

Back to the drawing board. Will remain running with NNT until I think of something.
Try NNT and limit the number of reported Results to 25 or so.
cc_config.xml, <options> section
<max_tasks_reported>25</max_tasks_reported>

I've had to limit mine to 75 for ages now to be able to report & get new work. After outages I still often had to use NNT to report.

Thanks for the tip but I can't do that.
Remember, due to my unique client, what you see is not what it really is.

My host production is in the range of 50-70 WU every 5 min.
Now with the 10-min interval I'm sending 100-140 WU on each request.

Don't worry, until I can find a solution NNT is OK for me.

It's more a question of curiosity; I'm a curious man. LOL
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13745
Credit: 208,696,464
RAC: 304
Australia
Message 2043066 - Posted: 4 Apr 2020, 23:49:37 UTC - in response to Message 2043060.  

Thanks for the tip but I can't do that.
Remember, due to my unique client, what you see is not what it really is.

My host production is in the range of 50-70 WU every 5 min.
Now with the 10-min interval I'm sending 100-140 WU on each request.
Yep, but at least the backlog would be increasing at a lower rate.
And it may be possible to return more per request, just use 25 as a starting point to see if it does allow you to return work.

If it does, you just click Update after each successful request; even at 25 at a time it should still be possible to return work faster than you are completing it (at least until your finger gives out from fatigue).
Grant
Darwin NT
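The arithmetic behind this exchange can be made explicit. A rough Python sketch, assuming steady production and that every scheduler contact succeeds (which, as this thread shows, is far from guaranteed): finished work accumulates at juan's quoted rate for the length of the request interval, and each successful request can report at most `max_reported` tasks. Both function names are hypothetical.

```python
import math

SECONDS_PER_5_MIN = 300


def tasks_per_interval(wu_per_5min, interval_s):
    """Finished tasks that accumulate between two scheduler contacts."""
    return wu_per_5min * interval_s / SECONDS_PER_5_MIN


def requests_to_keep_up(wu_per_5min, interval_s, max_reported):
    """Successful requests needed per interval so reporting keeps pace with output."""
    return math.ceil(tasks_per_interval(wu_per_5min, interval_s) / max_reported)
```

Plugging in juan's upper figure, 70 WU per 5 minutes over a 606-second interval gives about 141 finished tasks per contact. With a cap of 25 that means 6 successful requests per interval; with 128 it means 2.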
Kevin Olley

Joined: 3 Aug 99
Posts: 906
Credit: 261,085,289
RAC: 572
United Kingdom
Message 2043067 - Posted: 4 Apr 2020, 23:50:02 UTC - in response to Message 2043060.  


Thanks for the tip but I can't do that.
Remember, due to my unique client, what you see is not what it really is.

My host production is in the range of 50-70 WU every 5 min.
Now with the 10-min interval I'm sending 100-140 WU on each request.


Does not matter.

Set NNT and restrict max tasks to a low amount and try.

If it works, you do not have to wait; just hit the Update button again.

The wait was only introduced to stop people swamping the server with work requests.
Kevin


Jimbocous (Project Donor)
Volunteer tester
Joined: 1 Apr 13
Posts: 1853
Credit: 268,616,081
RAC: 1,349
United States
Message 2043068 - Posted: 4 Apr 2020, 23:58:03 UTC - in response to Message 2043067.  


Thanks for the tip but I can't do that.
Remember, due to my unique client, what you see is not what it really is.

My host production is in the range of 50-70 WU every 5 min.
Now with the 10-min interval I'm sending 100-140 WU on each request.


Does not matter.

Set NNT and restrict max tasks to a low amount and try.

If it works, you do not have to wait; just hit the Update button again.

The wait was only introduced to stop people swamping the server with work requests.

Seems to me I remember Richard saying something about the "sweet spot" for max reporting being 64 for Win and 128 for Linux. After a lot of experimenting, I found that to be a good compromise here.
Kevin Olley

Joined: 3 Aug 99
Posts: 906
Credit: 261,085,289
RAC: 572
United Kingdom
Message 2043071 - Posted: 5 Apr 2020, 0:08:06 UTC - in response to Message 2043068.  


Seems to me I remember Richard saying something about the "sweet spot" for max reporting being 64 for Win and 128 for Linux. After a lot of experimenting, I found that to be a good compromise here.


Mine is set to 40. Even without setting NNT, it used to clear them down after an outage, which usually ended just after I left for work.
Kevin


Stephen "Heretic" (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2043074 - Posted: 5 Apr 2020, 0:29:55 UTC - in response to Message 2043031.  

...
05/04/2020 00:39:33 | SETI@home | Requesting new tasks for CPU and NVIDIA GPU
05/04/2020 00:39:36 | SETI@home | Scheduler request completed: got 75 new tasks                                    <-- !!!!!!
...

:O


. . You should buy a lottery ticket :)

Stephen

:)
juan BFP (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2043077 - Posted: 5 Apr 2020, 0:43:00 UTC - in response to Message 2043074.  
Last modified: 5 Apr 2020, 0:44:56 UTC

05/04/2020 00:39:33 | SETI@home | Requesting new tasks for CPU and NVIDIA GPU
05/04/2020 00:39:36 | SETI@home | Scheduler request completed: got 75 new tasks <-- !!!!!!
...
Congrats, you won the #1 prize.

About my problem:
I set the reported WU limit to 128 and allowed new tasks, as suggested. Thanks.
It's working for now. Let's see what happens at the end of the 3-hour cycle.
BTW, the file dropped from 1.8 MB to 1.5 MB, still too high.
The WU cache is at 88K.
Stephen "Heretic" (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2043080 - Posted: 5 Apr 2020, 0:54:18 UTC - in response to Message 2043034.  

@Stephen

Could you do me a favor?

Can you look and post (or PM) the size of your setiathome.berkeley.edu.xml file? It's located in the BOINC directory (the one where the boinc.exe program is placed).

And could you check how many seconds of work your host is asking for in the scheduler request? That will appear in the history file if you have the sched_op flag activated.

I ask because after the change from 303 to 606 my host has serious trouble reporting work, and the only possible causes I could imagine come down to these 2 possibilities. So I need the info from a regular client that is getting work to be sure I'm on the right path.

Thanks in advance.


. . I hate to be a disappointment, Juan, but there is no such file in the BOINC folder on this machine (Windows) under program, but in the data directory there is a statistics.setiathome.berkeley.edu.xml file which is 11Kb, and on the Linux boxes that same file exists in the program/BOINC directories and is 10.5Kb. I don't have that flag set for the log file but I can add it, although I don't think it will yield much info because all I am getting is 'no tasks available' and longer and longer backoffs. It is only when I initiate a manual request that I can see the 'normal' backoff, which is 606 secs. But if you want me to try something I will happily do so.

Stephen

juan BFP (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2043082 - Posted: 5 Apr 2020, 1:03:12 UTC - in response to Message 2043080.  

@Stephen

Could you do me a favor?

Can you look and post (or PM) the size of your setiathome.berkeley.edu.xml file? It's located in the BOINC directory (the one where the boinc.exe program is placed).

And could you check how many seconds of work your host is asking for in the scheduler request? That will appear in the history file if you have the sched_op flag activated.

I ask because after the change from 303 to 606 my host has serious trouble reporting work, and the only possible causes I could imagine come down to these 2 possibilities. So I need the info from a regular client that is getting work to be sure I'm on the right path.

Thanks in advance.


. . I hate to be a disappointment, Juan, but there is no such file in the BOINC folder on this machine (Windows) under program, but in the data directory there is a statistics.setiathome.berkeley.edu.xml file which is 11Kb, and on the Linux boxes that same file exists in the program/BOINC directories and is 10.5Kb. I don't have that flag set for the log file but I can add it, although I don't think it will yield much info because all I am getting is 'no tasks available' and longer and longer backoffs. It is only when I initiate a manual request that I can see the 'normal' backoff, which is 606 secs. But if you want me to try something I will happily do so.

Stephen



It was a typo; the right name is: sched_request_setiathome.berkeley.edu

But Jimbocous & Grant already gave me their numbers. I was able to confirm the problem was caused by my large file.
Following a suggestion, I set the report number lower and I'm now testing whether that helps or not.
I need to get through the 3-hour cycle to be sure.

Fingers crossed.
Thanks
Stephen "Heretic" (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2043083 - Posted: 5 Apr 2020, 1:05:43 UTC - in response to Message 2043048.  

Just got 1.

Thought it unusual that it was a _1.

Did a bit of searching and found it was the ghost that I had not bothered to try to recover.

Looks like they have turned something back on again.


. . I had 10 ghosts I was unaware of attempt a recovery, only to expire :(

Stephen

:(
Stephen "Heretic" (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2043088 - Posted: 5 Apr 2020, 1:13:43 UTC - in response to Message 2043082.  

It was a typo; the right name is: sched_request_setiathome.berkeley.edu
But Jimbocous & Grant already gave me their numbers. I was able to confirm the problem was caused by my large file.
Following a suggestion, I set the report number lower and I'm now testing whether that helps or not.
I need to get through the 3-hour cycle to be sure.
Fingers crossed.
Thanks

. . Good luck!!!

Stephen

:)
juan BFP (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2043090 - Posted: 5 Apr 2020, 1:38:17 UTC

No joy. Still receiving:
04-Apr-2020 20:33:46 [SETI@home] Scheduler request failed: HTTP service unavailable
04-Apr-2020 20:33:46 [SETI@home] [sched_op] Deferring communication for 00:14:19
04-Apr-2020 20:33:46 [SETI@home] [sched_op] Reason: Scheduler request failed

That's after completing the 3-hour cycle.

So I will reduce the number of active GPUs, and the reported WU limit is now 64.

Let's see what I get.
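When the scheduler keeps failing, the `[sched_op]` deferral lines are the easiest way to see how long the client is backing off. The sketch below pulls the duration out of a line like the one above and converts it to seconds. It assumes the `HH:MM:SS` form shown in this excerpt; a longer backoff printed in some other form would not match, and the function then returns `None`. `deferral_seconds` is a hypothetical name.

```python
import re

# Matches e.g. "... [sched_op] Deferring communication for 00:14:19"
DEFER_RE = re.compile(r"Deferring communication for (\d+):(\d{2}):(\d{2})")


def deferral_seconds(line):
    """Return the deferral in seconds, or None if the line has no HH:MM:SS deferral."""
    m = DEFER_RE.search(line)
    if m is None:
        return None
    hours, minutes, seconds = (int(g) for g in m.groups())
    return hours * 3600 + minutes * 60 + seconds
```

For the deferral line in the log above this gives 859 seconds (14 minutes 19 seconds).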
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13745
Credit: 208,696,464
RAC: 304
Australia
Message 2043093 - Posted: 5 Apr 2020, 2:02:48 UTC - in response to Message 2043090.  
Last modified: 5 Apr 2020, 2:03:50 UTC

No joy. Still receiving:
04-Apr-2020 20:33:46 [SETI@home] Scheduler request failed: HTTP service unavailable
04-Apr-2020 20:33:46 [SETI@home] [sched_op] Deferring communication for 00:14:19
04-Apr-2020 20:33:46 [SETI@home] [sched_op] Reason: Scheduler request failed

That's after completing the 3-hour cycle.

So I will reduce the number of active GPUs, and the reported WU limit is now 64.
Keep in mind the system is borked:
5/04/2020 10:48:11 | SETI@home | Scheduler request failed: HTTP internal server error
5/04/2020 10:50:47 | SETI@home | Scheduler request failed: HTTP service unavailable
5/04/2020 10:53:52 | SETI@home | Scheduler request failed: HTTP internal server error
5/04/2020 11:01:26 | SETI@home | Scheduler request failed: HTTP service unavailable
5/04/2020 11:31:13 | SETI@home | Project has no tasks available

It wasn't just you.

Scheduler is back again, but who knows how long for.
Grant
Darwin NT
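Tallying the failure messages in your own event log makes it easier to compare notes with other hosts, as Grant does here. A small Python sketch, assuming the `date time | project | message` layout shown in the excerpt above (other Manager versions or log exports may format lines differently); `tally_scheduler_outcomes` is a hypothetical name, and lines that don't fit the assumed layout are simply skipped.

```python
import re
from collections import Counter

# Assumed layout: "<date> <time> | <project> | <message>"
LINE_RE = re.compile(r"^\S+ \S+ \| (?P<project>[^|]+?) \| (?P<msg>.*)$")
FAILED_PREFIX = "Scheduler request failed: "


def tally_scheduler_outcomes(lines, project="SETI@home"):
    """Count scheduler failure reasons and 'no tasks' replies for one project."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None or m.group("project") != project:
            continue  # not in the expected layout, or a different project
        msg = m.group("msg")
        if msg.startswith(FAILED_PREFIX):
            # e.g. "HTTP service unavailable"
            counts[msg[len(FAILED_PREFIX):]] += 1
        elif msg == "Project has no tasks available":
            counts["no tasks available"] += 1
    return counts
```

Fed the five lines above, it counts 2 internal server errors, 2 service-unavailable errors and 1 "no tasks available" reply.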
juan BFP (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2043095 - Posted: 5 Apr 2020, 2:11:08 UTC
Last modified: 5 Apr 2020, 2:18:36 UTC

I reconfigured the device array, so my file size is now down to 240 kB and it reports 128 WU at a time.

It will slow down the crunching, but at least the work will be reported and new tasks will be able to flow.

Let's see what I get. I need to get through the 3-hour cycle to be sure it worked.


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.