Panic Mode On (107) Server Problems?

Profile Jeff Buck Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1883463 - Posted: 12 Aug 2017, 23:33:36 UTC - in response to Message 1883456.  

The 100 WUs per CPU/GPU limit is a Seti one.
And Petri simply appears to have added a multiplier to the BOINC client's GPU count to make it think he has 4 times as many GPUs when it sends a scheduler request. But each (real or imagined) still only gets 100 tasks max per the project's limit.
ID: 1883463 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13835
Credit: 208,696,464
RAC: 304
Australia
Message 1883464 - Posted: 12 Aug 2017, 23:33:47 UTC - in response to Message 1883460.  

The 1000 total WU limit is in BOINC.
The 100 WUs per CPU/GPU limit is a Seti one.

So the 1000 limit is circumventable with the proper coding knowledge, but the 100 is client side and is in stone?

Nope.
The 100 is a server-side limit set by the Seti project. You can ask for as much work as you like, but if the project thinks you already have 100 WUs per CPU or GPU, then you won't be able to get any more. Petri worked around that limit by telling the servers he has 16 GPUs, but he still only gets a maximum of 100 WUs per GPU.
Grant
Darwin NT
ID: 1883464 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51477
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1883465 - Posted: 12 Aug 2017, 23:35:58 UTC

Alas, the kitties only have real CPUs and real GPUs.
The ones they dream about having don't count............LOL.

Meow.
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1883465 · Report as offensive
Al Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Joined: 3 Apr 99
Posts: 1682
Credit: 477,343,364
RAC: 482
United States
Message 1883467 - Posted: 12 Aug 2017, 23:42:47 UTC

Gotcha, thanks for the clarification guys. Sadly, that would only work on the GPU side, as the program couldn't care less about how many CPUs you have.

BOINC --> You havee vone CPU, you have 4 CPU's, I Don' Care! I see Noth-ink! You geet 100! No More! (using my best Sgt. Schultz impersonation) lol

ID: 1883467 · Report as offensive
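A rough sketch of the limits as described in the posts above, for anyone who wants the arithmetic spelled out. The function names are made up for illustration, and the figure of 4 real GPUs is only inferred from "4 times as many" and "16 GPUs"; none of this is the actual scheduler code.

```python
TASKS_PER_DEVICE = 100  # server-side limit quoted in the posts above


def gpu_task_cap(reported_gpus: int) -> int:
    """GPU cap scales with the GPU count the client reports to the scheduler."""
    return TASKS_PER_DEVICE * reported_gpus


def cpu_task_cap(cpu_count: int) -> int:
    """CPU cap is flat: the CPU count is ignored, per the posts above."""
    return TASKS_PER_DEVICE


real_gpus = 4    # assumption: 16 reported GPUs divided by the 4x multiplier
multiplier = 4   # the client-side multiplier described above

print("honest client GPU cap:  ", gpu_task_cap(real_gpus))
print("modified client GPU cap:", gpu_task_cap(real_gpus * multiplier))
print("CPU cap with 1 or 4 CPUs:", cpu_task_cap(1), cpu_task_cap(4))
```

The per-GPU figure never changes; only the reported device count does, which is why the trick has no CPU equivalent.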
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13835
Credit: 208,696,464
RAC: 304
Australia
Message 1883500 - Posted: 13 Aug 2017, 1:22:07 UTC - in response to Message 1883277.  

I just tried TBar's repeated requests, then picked up 49, and then 12 WUs on the following requests.
Seems ridiculous, but it's worked every time I've tried it so far.

And again- running low on work. Did the Tbar method & down it comes.
Grant
Darwin NT
ID: 1883500 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1883519 - Posted: 13 Aug 2017, 3:22:15 UTC - in response to Message 1883500.  

So I'm noticing an unusual problem. The server is telling me I have more work units on my computer than are actually there.....

Fortunately it hasn't become a problem yet, meaning I'm still getting my max amount of work and the extra work units listed aren't restricting my ability to get work. So I have to wonder if these are ghost units. Can't really tell yet when it happened. I checked the in-progress list and looked at the create dates, and they are all within the last 24 hours, so I think I'm going to have to wait a few days to find out which work units it is listing...
ID: 1883519 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1883522 - Posted: 13 Aug 2017, 3:52:12 UTC - in response to Message 1883519.  

That usually means 'ghosts'. You can use Brent's recovery protocol.

To do that,
- set "No New Tasks" and wait until you have room to download 20 tasks (max the server will resend at one time)
- wait for a task that has already uploaded but still needs to be reported
- while watching your 5 minute timer, hover over "Suspend Network" option
- click as soon as you see "Reporting XX tasks"
- you should NOT see "Scheduler request completed" or you missed the correct timing
- wait at least 5 minutes, you can leave BOINC running
- Shutdown/restart BOINC for 1 minute, STOPPING tasks
- enable "Allow Work", then Enable network
- the server will either send you up to 20 lost tasks or delete them from your list
- Repeat as required ....

Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1883522 · Report as offensive
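Since the server resends at most 20 lost tasks per pass, the number of passes through the steps above grows with the ghost count. A small sketch of that arithmetic follows; the per-pass time only counts the 5-minute wait and 1-minute shutdown named in the steps, so treat it as a lower bound, and the function names are hypothetical.

```python
import math

RESEND_BATCH = 20          # max lost tasks resent per request (from the post)
MINUTES_PER_PASS = 5 + 1   # 5-minute wait plus 1-minute shutdown (from the steps)


def recovery_passes(ghost_tasks: int) -> int:
    """Number of times the whole procedure has to be repeated."""
    return math.ceil(ghost_tasks / RESEND_BATCH)


def minimum_minutes(ghost_tasks: int) -> int:
    """Lower bound on elapsed time; ignores waiting for room and for a reportable task."""
    return recovery_passes(ghost_tasks) * MINUTES_PER_PASS


for ghosts in (15, 45, 200):
    print(ghosts, "ghosts:", recovery_passes(ghosts), "passes, at least",
          minimum_minutes(ghosts), "minutes")
```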
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1883531 - Posted: 13 Aug 2017, 6:14:51 UTC - in response to Message 1883522.  

Thanks Keith,

That worked. Was able to get rid of all of them

Z
ID: 1883531 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1883536 - Posted: 13 Aug 2017, 7:10:56 UTC - in response to Message 1883531.  

Glad to help. Hope you didn't have too many since at 20 tasks per, it can take a while.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1883536 · Report as offensive
Profile Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1883589 - Posted: 13 Aug 2017, 14:57:56 UTC
Last modified: 13 Aug 2017, 15:24:22 UTC

Stupid BOINC. It's running my cache empty because I have it set to cache 0.5 + 1.5 days' worth of work, and the last task I got in was an AP that's estimated to run for 12.5 hours, but in real time only takes 21 minutes on my AMD RX470.

For hours on end though I have:
13/08/2017 16:37:42 | SETI@home | Not requesting tasks: don't need (CPU: ; AMD/ATI GPU: job cache full)

Of course that's also to do with my rather meager Fraction of time BOINC is running 7.97%. ;-)

Now... let's first see if the AP actually wants to run on the AMD RX470, because all previous ones would err within two minutes of running with the Lunatics app (from the Beta 6 installer). I've changed it to run the stock v7.09 application more than a month ago, but after that didn't run BOINC much or was always too late for gathering APs. Seeing the amount of APs in the field, this must have been a very small batch that was wrought from the data.

Forcing the AP to run.
It's gone over two minutes, over 10%. Over 20%. Over 30%. Over 40%. I'm sure it'll finish now.
Edit: and finished now. Will be reported in 2 minutes.

Edit2: Rethinking that. I think all previous APs ran to the end as well, but always with zero spikes and a large percentage of radar blanking. That's happened to this task as well, so I'll wait for the wingman to run his and see the outcome, in a few days. Set preferences to only get MBs.
ID: 1883589 · Report as offensive
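The numbers in the post above are enough to show why the client reports "job cache full". This is a deliberately simplified model and not the client's real work-fetch logic: it assumes the client compares the estimated remaining runtime against the 0.5-day minimum buffer, and that a low "fraction of time BOINC is running" stretches the estimate in wall-clock terms. Both assumptions are mine.

```python
HOURS_PER_DAY = 24.0
MIN_BUFFER_DAYS = 0.5      # first half of the "0.5 + 1.5 days" cache setting


def buffered_days(runtime_hours: float, on_fraction: float = 1.0) -> float:
    """Wall-clock days a task is assumed to occupy, given how often BOINC runs."""
    return runtime_hours / on_fraction / HOURS_PER_DAY


estimated = buffered_days(12.5)            # BOINC's runtime estimate for the AP
scaled = buffered_days(12.5, 0.0797)       # same estimate at 7.97% uptime
actual = buffered_days(21 / 60)            # what the task really took

for label, days in (("estimated", estimated), ("scaled", scaled), ("actual", actual)):
    state = "cache looks full" if days >= MIN_BUFFER_DAYS else "would request work"
    print(f"{label:9s} {days:6.3f} days -> {state}")
```

Under this model the single AP already covers the minimum buffer on the raw estimate alone, while the real 21-minute runtime would not come close.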
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14674
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1883609 - Posted: 13 Aug 2017, 15:36:57 UTC - in response to Message 1883589.  

... running with the Lunatics app (from the Beta 6 installer) ...

... run the stock v7.09 application ...

... finished now ...
The r2742 app you ran that one with is exactly the same as the app in the Beta installer, and has been unchanged since November 2014 - Raistmer provides the OpenCL apps for both distributions.
ID: 1883609 · Report as offensive
Profile Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1883622 - Posted: 13 Aug 2017, 15:59:53 UTC - in response to Message 1883609.  

In that case the task will fail to validate against the result of the wingman. APs cannot be run on AMD RX GPUs (or maybe on the Vega).
ID: 1883622 · Report as offensive
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1883631 - Posted: 13 Aug 2017, 16:26:12 UTC - in response to Message 1883460.  

The 1000 total WU limit is in BOINC.
The 100 WUs per CPU/GPU limit is a Seti one.

So the 1000 limit is circumventable with the proper coding knowledge, but the 100 is client side and is in stone?

I can only guess that the "1000 limit" in the client is per work request.
The current client is more than happy to cache several thousand tasks at once.
It is only slightly amusing when checking on a machine to find it has 8,000+ tasks from a newly attached project because you forgot your cache settings were 10+10 days and it only stopped downloading work because the partition where BOINC stores its data is full. -.-

One of the ways to circumvent the task limits only requires a very small change. It was two characters IIRC.
Having the limits in place and working is better for the project. Otherwise we get the random db crashes that sometimes take days to fix.

They did change the hard limit of 100 GPU tasks to 100*n GPUs per vendor. Which is nice, but with the increase in CPU power it would be nice to see the CPU limits get looked at again.
Currently 100 tasks don't last that long for even older CPU hardware.
E5-2670 @ 3.0GHz running 32 at a time ~8hr
i5-4670 @ 3.4GHz running 4 at a time ~20hr
I have a feeling the new i7-8700K with 6c/12t could tear through 100 tasks in ~6-7hrs.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
ID: 1883631 · Report as offensive
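The drain times quoted above imply an average per-task runtime, which makes the comparison between machines easier to see. A quick sketch, assuming all slots stay busy and tasks take the same time on average (the helper names are made up):

```python
def hours_per_task(total_tasks: int, concurrent: int, drain_hours: float) -> float:
    """Average runtime of one task implied by how long the cache lasts."""
    return drain_hours * concurrent / total_tasks


def hours_to_drain(total_tasks: int, concurrent: int, task_hours: float) -> float:
    """How long a cache of total_tasks lasts at a given per-task runtime."""
    return total_tasks * task_hours / concurrent


# Figures quoted in the post above: 100 tasks per host.
print("E5-2670:", hours_per_task(100, 32, 8), "h per task")
print("i5-4670:", hours_per_task(100, 4, 20), "h per task")

# The i7-8700K guess of ~6-7 h on 12 threads implies roughly this per-task range:
print("i7-8700K:", hours_per_task(100, 12, 6), "to", hours_per_task(100, 12, 7), "h per task")
```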
Sirius B Project Donor
Volunteer tester
Joined: 26 Dec 00
Posts: 24907
Credit: 3,081,182
RAC: 7
Ireland
Message 1883666 - Posted: 13 Aug 2017, 19:13:30 UTC

Talking of WU limits, I currently have 35 with 8 running, so seeing this makes me wonder about those limits:

13/08/2017 19:54:41 | SETI@home | Not requesting tasks: don't need (CPU: job cache full; NVIDIA GPU: )
13/08/2017 19:54:44 | SETI@home | Scheduler request completed
ID: 1883666 · Report as offensive
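For anyone scanning a long event log for these messages, here is a minimal sketch of pulling the per-resource reasons out of a "Not requesting tasks" line. The pattern is based only on the two log lines quoted in this thread, so other client versions may format the message differently; the function name is hypothetical.

```python
import re

# Matches lines like:
# 13/08/2017 19:54:41 | SETI@home | Not requesting tasks: don't need (CPU: job cache full; NVIDIA GPU: )
LINE_RE = re.compile(
    r"^(?P<stamp>\S+ \S+) \| (?P<project>[^|]*?) \| "
    r"Not requesting tasks: (?P<summary>[^(]+)\((?P<detail>.*)\)$"
)


def parse_not_requesting(line: str):
    """Return project, summary and per-resource reasons, or None for other lines."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    reasons = {}
    for part in match.group("detail").split(";"):
        resource, _, reason = part.partition(":")
        reasons[resource.strip()] = reason.strip()  # empty string = no reason given
    return {
        "project": match.group("project").strip(),
        "summary": match.group("summary").strip(),
        "reasons": reasons,
    }


sample = ("13/08/2017 19:54:41 | SETI@home | Not requesting tasks: "
          "don't need (CPU: job cache full; NVIDIA GPU: )")
print(parse_not_requesting(sample))
```

In the sample, only the CPU entry carries a reason; the NVIDIA GPU entry is empty, so the line by itself does not say why no GPU work was requested.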
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1883693 - Posted: 13 Aug 2017, 22:16:59 UTC - in response to Message 1883666.  

Talking of WU limits, I currently have 35 with 8 running, so seeing this makes me wonder about those limits:

13/08/2017 19:54:41 | SETI@home | Not requesting tasks: don't need (CPU: job cache full; NVIDIA GPU: )
13/08/2017 19:54:44 | SETI@home | Scheduler request completed


. . I hate to seem negative but your hardware is not exactly bleeding edge. What limits are you running on your cache (i.e., how many days of work have you set it to cache)? If it is less than 10 and 10, try changing it to those values and see if it doesn't fill up to the 100 limit.

Stephen

ID: 1883693 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1883697 - Posted: 13 Aug 2017, 22:27:59 UTC

Sirius,

I see you are running the latest BOINC.... I think that might be some of your problem.

I seem to remember a while back that, while running BOINC, I would get that message if it thought that some other project had work units on "hold" or suspended, thereby "filling" my cache or "work units in progress" to the point where new work units would not be allowed....

2 fixes for this: increasing the number of days the cache is set to keep, plus the extra..... or downgrading BOINC to a previous version that didn't seem to have this "glitch".

I've done both, but now I just remove any work units from any other project so that BOINC doesn't see anything else in the "cache".
ID: 1883697 · Report as offensive
Sirius B Project Donor
Volunteer tester
Joined: 26 Dec 00
Posts: 24907
Credit: 3,081,182
RAC: 7
Ireland
Message 1883698 - Posted: 13 Aug 2017, 22:29:32 UTC - in response to Message 1883693.  
Last modified: 13 Aug 2017, 22:31:00 UTC

Thanks, have been told to check settings. Issue solved. As for hardware, No longer have time or interest in being bleeding edge.

@Zalster. Stephen was correct, it was a settings issue. My fault for not resetting the settings when upgrading BOINC :-(
ID: 1883698 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1883715 - Posted: 13 Aug 2017, 23:26:44 UTC - in response to Message 1883589.  

Stupid BOINC. It's running my cache empty because....
Naw, that's not stupid. Now this, this is stupid;

Sun Aug 13 19:04:40 2017 | | Starting BOINC client version 7.6.33 for x86_64-apple-darwin
Sun Aug 13 19:04:40 2017 | | Data directory: /Volumes/Mov1/BOINC/Yosemite/BOINC Data
Sun Aug 13 19:04:42 2017 | | NVIDIA GPU 2: GeForce GTX 960 cannot be used for CUDA or OpenCL computation with CUDA driver 6.5 or later
Sun Aug 13 19:04:42 2017 | | CUDA: NVIDIA GPU 0: GeForce GTX 960 (driver version 7.5.30, CUDA version 7.5, compute capability 5.2, 2048MB, 1989MB available, 2748 GFLOPS peak)
Sun Aug 13 19:04:42 2017 | | CUDA: NVIDIA GPU 1: Graphics Device (driver version 7.5.30, CUDA version 7.5, compute capability 5.2, 2048MB, 1761MB available, 2022 GFLOPS peak)
Sun Aug 13 19:04:42 2017 | | CUDA: NVIDIA GPU 2: Graphics Device (driver version 7.5.30, CUDA version 7.5, compute capability 5.2, 2048MB, 1998MB available, 2022 GFLOPS peak)
Sun Aug 13 19:04:42 2017 | | OpenCL: NVIDIA GPU 1: Graphics Device (driver version 10.5.3 346.02.03f14, device version OpenCL 1.2, 2048MB, 1761MB available, 2022 GFLOPS peak)
Sun Aug 13 19:04:42 2017 | | OpenCL: NVIDIA GPU 2: Graphics Device (driver version 10.5.3 346.02.03f14, device version OpenCL 1.2, 2048MB, 1998MB available, 2022 GFLOPS peak)
Sun Aug 13 19:04:42 2017 | | OpenCL: NVIDIA GPU 2: GeForce GTX 960 (driver version 10.5.3 346.02.03f14, device version OpenCL 1.2, 2048MB, 2048MB available, 1031 GFLOPS peak)
Sun Aug 13 19:04:42 2017 | | OpenCL CPU: Intel(R) Xeon(R) CPU E5472 @ 3.00GHz (OpenCL driver vendor: Apple, driver version 1.1, device version OpenCL 1.2)
Sun Aug 13 19:04:42 2017 | SETI@home | Found app_info.xml; using anonymous platform
Sun Aug 13 19:04:42 2017 | SETI@home Beta Test | Found app_info.xml; using anonymous platform
Sun Aug 13 19:04:43 2017 | | OS: Mac OS X 10.10.5 (Darwin 14.5.0)

So, the machine has two GTX 950s and one GTX 960. In Yosemite only the 960 is recognized by name. The 960 is in Slot 2, so it should be Device #1; both listings are wrong. Not to mention it thinks the GTX 960 is a pre-Fermi GPU. It was doing well with the 960 in slot 1, but I thought the cooling might be better with the sandwiched card being the longest of the 3 cards, that way the middle card has almost a full fan in clean air. When running the machine in Linux I noticed the sandwiched 950 was getting kinda hot, so I decided to try a different arrangement. It seems to be running CUDA on all three cards, so I suppose I'll just ignore the BOINC ramblings for now.
ID: 1883715 · Report as offensive
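The mismatch in the startup log above can be made visible by listing the device index and name that each API reports. A small sketch follows, with a regular expression written against the log lines quoted in the post only (trimmed here for length); the helper names are hypothetical and other BOINC versions or vendors would need a different pattern.

```python
import re
from collections import defaultdict

DEVICE_RE = re.compile(
    r"(?P<api>CUDA|OpenCL): NVIDIA GPU (?P<index>\d+): (?P<name>.+?) \("
)


def devices_by_api(log_lines):
    """Group (index, name) pairs by the API that reported them."""
    found = defaultdict(list)
    for line in log_lines:
        match = DEVICE_RE.search(line)
        if match:
            found[match.group("api")].append(
                (int(match.group("index")), match.group("name"))
            )
    return dict(found)


def duplicate_indices(devices):
    """Indices that one API assigned to more than one device."""
    seen, dupes = set(), set()
    for index, _name in devices:
        (dupes if index in seen else seen).add(index)
    return sorted(dupes)


# Device lines from the post above, trimmed after the driver version.
SAMPLE = [
    "CUDA: NVIDIA GPU 0: GeForce GTX 960 (driver version 7.5.30, ...",
    "CUDA: NVIDIA GPU 1: Graphics Device (driver version 7.5.30, ...",
    "CUDA: NVIDIA GPU 2: Graphics Device (driver version 7.5.30, ...",
    "OpenCL: NVIDIA GPU 1: Graphics Device (driver version 10.5.3 346.02.03f14, ...",
    "OpenCL: NVIDIA GPU 2: Graphics Device (driver version 10.5.3 346.02.03f14, ...",
    "OpenCL: NVIDIA GPU 2: GeForce GTX 960 (driver version 10.5.3 346.02.03f14, ...",
]

for api, devices in devices_by_api(SAMPLE).items():
    print(api, devices, "duplicate indices:", duplicate_indices(devices))
```

On these lines the CUDA listing puts the GTX 960 at index 0, while the OpenCL listing has no index 0 at all and assigns index 2 twice.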
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1883850 - Posted: 14 Aug 2017, 18:09:31 UTC
Last modified: 14 Aug 2017, 18:11:26 UTC

I've been having issues getting work on the special app machine. Toggling preferences in apps or using TBar's request method is not working. Funny part is that the Windows machines have had no issues maintaining their cache levels. I am down about 200 GPU tasks on the Linux machine. The only thing that I have found to work in getting my normal 20-task download when the cache level is low is exiting BOINC and not restarting until the previous 5-minute countdown has expired. Then the machine will get maybe 5 cycles of download requests, netting me around 100 tasks, and then it stalls out again with a "no work is available" message. The GPU cache level will run down to 0 if I don't intervene by manually stopping and restarting BOINC.

This has me concerned because all the machines are going to be left unattended from Friday till next Tuesday this week when I leave for Idaho and the solar eclipse. And this is during the WOW contest to make matters worse.

Does anyone else running the special app have this problem and can offer some insight as to what is going on with my cruncher?
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1883850 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1883855 - Posted: 14 Aug 2017, 18:24:49 UTC - in response to Message 1883850.  

It's not just the special apps. I have 2 machines on Windows that have run out of work. Every time they connect, I get the "No task available" response.
ID: 1883855 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.