Panic Mode On (106) Server Problems?

Message boards : Number crunching : Panic Mode On (106) Server Problems?
Stephen "Heretic" Crowdfunding Project Donor*, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1868978 - Posted: 24 May 2017, 0:47:16 UTC - in response to Message 1868967.  

Yes, amazed too. Caught me off guard. It looks like I finally banked enough tasks to get through the outage, and didn't make any ghosts. I barely squeaked through with the Ryzen system for CPU tasks. It probably wouldn't have made it through the typical 13-hour outages of late.


. . With a working rescheduler for Windows, the Windows boxes survive the outage, but neither Linux box comes even close. The little fella (Mi-Burrito) runs out of work in about 7 hours and La-Bamba in about 4 hours. That rig needs a cache of about 600 to 800 tasks to make it all the way through an outage, but E@H probably doesn't mind :) I really need to learn more about Laurent's app.

Stephen

Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1868992 - Posted: 24 May 2017, 1:50:28 UTC - in response to Message 1868978.  

Well, I guess I jinxed myself. As soon as I said I had made no ghosts, I dumped the entire machine because of a TDR crash. I corrupted both the MilkyWay and SETI files and dumped all their tasks. I have had Einstein tasks running in High Priority mode because I goofed and forgot to toggle NNT off for about an hour a week ago. I've been working through them and would not have had any problem clearing them before their deadline on the 26th, but BOINC thought differently and forced the two tasks I have running at all times into HP mode. I can't even suspend the project or tasks now without BSOD'ing the machine. I should have just aborted the tasks and been done with them. This has happened a couple of times now: I corrupt the account file and the statistics files for MW and SETI, while Einstein always escapes unscathed for some reason. So ..... OH JOY, now I get to spend the evening recovering 400 ghosts.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1868996 - Posted: 24 May 2017, 3:00:31 UTC - in response to Message 1868992.  

Well, I just discovered something interesting, I think. It seems you can't recover your ghosts if they were originally downloaded as CPU tasks and the system is currently loading GPU tasks. Of my attempts to recover my 438 ghosts, only the first succeeded, getting 16 tasks that were assigned as GPU tasks. Every recovery attempt since (about 8 so far) has recovered no ghosts and just received normal tasks. I think I will have to wait until I have filled my 200-task GPU quota before I try to recover my CPU ghosts.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
Profile Jeff Buck Crowdfunding Project Donor*, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1868998 - Posted: 24 May 2017, 3:10:34 UTC - in response to Message 1868996.  

That seems odd. Except for Arecibo VLARs, I don't know that it really matters what the original task assignments were. In my experience, the scheduler just sends them back based on the current request. That's why it sometimes tries to send ghosted Arecibo VLARs to NVIDIA GPUs before it realizes that it can't and then immediately marks them as errors.
Profile Brent Norman Crowdfunding Project Donor*, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1869005 - Posted: 24 May 2017, 3:37:55 UTC - in response to Message 1868996.  
Last modified: 24 May 2017, 3:45:12 UTC

Whenever I do recovery, I make sure I have an opening for 20 tasks on both CPU and GPU, since those Arecibo VLARs will say 'can't resend' if there is no room for them. You never know what will be in the batch they are sending, so make room for them.

EDIT: If you are seeing a 'normal' download after the communication interrupt, then you either a) didn't set NNT first, b) didn't wait 5 minutes before the next request, or c) didn't restart BOINC.

EDIT 2: Or d) you don't have ghosts, or the server doesn't think you do. After this maintenance, while the servers catch up, it may be that the server hasn't flagged them yet.
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1869012 - Posted: 24 May 2017, 3:49:53 UTC - in response to Message 1869005.  
Last modified: 24 May 2017, 3:53:20 UTC

Whenever I do recovery, I make sure I have an opening for 20 tasks on both CPU and GPU, since those Arecibo VLARs will say 'can't resend' if there is no room for them. You never know what will be in the batch they are sending, so make room for them.

EDIT: If you are seeing a 'normal' download after the communication interrupt, then you either a) didn't set NNT first, b) didn't wait 5 minutes before the next request, or c) didn't restart BOINC.

NOPE. I did have NNT set. I did wait 5 minutes after shutting down BOINC before restarting. I definitely fully exited BOINC.

Well, my theory about having to wait until my 200-task GPU cache was filled before it started filling the missing CPU tasks ... just went out the window. I had 1 CPU task on board and room for 99 new tasks. My ghost recovery just got me new tasks, with no resends of lost tasks. So unless SETI turned off resends just this evening, I don't have any answer for why it is not working. I can only think it is because all of my ghosted tasks are CPU tasks in the server's database.

Does anyone want to try to explain why ghost recovery isn't working?

[EDIT] I can prove it.

Keith-Windows7

1			5/23/2017 20:42:42	Starting BOINC client version 7.6.33 for windows_x86_64	
2			5/23/2017 20:42:42	log flags: file_xfer, sched_ops, task	
3			5/23/2017 20:42:42	Libraries: libcurl/7.47.1 OpenSSL/1.0.2g zlib/1.2.8	
4			5/23/2017 20:42:42	Data directory: C:\ProgramData\BOINC	
5			5/23/2017 20:42:42	Running under account Keith	
6			5/23/2017 20:42:43	CUDA: NVIDIA GPU 0: GeForce GTX 1070 (driver version 378.92, CUDA version 8.0, compute capability 6.1, 4096MB, 3046MB available, 6463 GFLOPS peak)	
7			5/23/2017 20:42:43	CUDA: NVIDIA GPU 1: GeForce GTX 1070 (driver version 378.92, CUDA version 8.0, compute capability 6.1, 4096MB, 3046MB available, 6463 GFLOPS peak)	
8			5/23/2017 20:42:43	OpenCL: NVIDIA GPU 0: GeForce GTX 1070 (driver version 378.92, device version OpenCL 1.2 CUDA, 8192MB, 3046MB available, 6463 GFLOPS peak)	
9			5/23/2017 20:42:43	OpenCL: NVIDIA GPU 1: GeForce GTX 1070 (driver version 378.92, device version OpenCL 1.2 CUDA, 8192MB, 3046MB available, 6463 GFLOPS peak)	
10	SETI@home	5/23/2017 20:42:43	Found app_info.xml; using anonymous platform	
11			5/23/2017 20:42:43	Host name: Keith-Windows7	
12			5/23/2017 20:42:43	Processor: 8 AuthenticAMD AMD FX-8370 Eight-Core Processor                [Family 21 Model 2 Stepping 0]	
13			5/23/2017 20:42:43	Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 htt pni ssse3 fma cx16 sse4_1 sse4_2 popcnt aes f16c syscall nx lm avx svm sse4a osvw ibs xop skinit wdt lwp fma4 tce tbm topx page1gb rd	
14			5/23/2017 20:42:43	OS: Microsoft Windows 7: Home Premium x64 Edition, Service Pack 1, (06.01.7601.00)	
15			5/23/2017 20:42:43	Memory: 15.90 GB physical, 31.80 GB virtual	
16			5/23/2017 20:42:43	Disk: 238.47 GB total, 179.94 GB free	
17			5/23/2017 20:42:43	Local time is UTC -7 hours	
18	Einstein@Home	5/23/2017 20:42:43	Found app_config.xml	
19	Milkyway@Home	5/23/2017 20:42:43	Found app_config.xml	
20	SETI@home	5/23/2017 20:42:43	Found app_config.xml	
21			5/23/2017 20:42:43	Config: GUI RPC allowed from any host	
22			5/23/2017 20:42:43	Config: GUI RPCs allowed from:	
23			5/23/2017 20:42:43	    192.168.2.192	
24			5/23/2017 20:42:43	    keith-windows7	
25			5/23/2017 20:42:43	Config: report completed tasks immediately	
26	Einstein@Home	5/23/2017 20:42:43	URL http://einstein.phys.uwm.edu/; Computer ID 12444941; resource share 125	
27	Milkyway@Home	5/23/2017 20:42:43	URL http://milkyway.cs.rpi.edu/milkyway/; Computer ID 257518; resource share 75	
28	SETI@home	5/23/2017 20:42:43	URL http://setiathome.berkeley.edu/; Computer ID 5741129; resource share 800	
29	Milkyway@Home	5/23/2017 20:42:43	General prefs: from Milkyway@Home (last modified 25-Apr-2017 12:04:39)	
30	Milkyway@Home	5/23/2017 20:42:43	Host location: none	
31	Milkyway@Home	5/23/2017 20:42:43	General prefs: using your defaults	
32			5/23/2017 20:42:43	Reading preferences override file	
33			5/23/2017 20:42:43	Preferences:	
34			5/23/2017 20:42:43	   max memory usage when active: 8141.04MB	
35			5/23/2017 20:42:43	   max memory usage when idle: 14653.88MB	
36			5/23/2017 20:42:43	   max disk usage: 1.00GB	
37			5/23/2017 20:42:43	   (to change preferences, visit a project web site or select Preferences in the Manager)	
38			5/23/2017 20:42:43	Suspending network activity - user request	
39	SETI@home	5/23/2017 20:42:50	work fetch resumed by user	
40			5/23/2017 20:42:57	Resuming network activity	
41	SETI@home	5/23/2017 20:42:57	Sending scheduler request: To report completed tasks.	
42	SETI@home	5/23/2017 20:42:57	Reporting 1 completed tasks	
43	SETI@home	5/23/2017 20:42:57	Requesting new tasks for CPU and NVIDIA GPU	
44	SETI@home	5/23/2017 20:43:00	Scheduler request completed: got 14 new tasks	
45	SETI@home	5/23/2017 20:43:02	Started download of blc03_2bit_guppi_57835_10813_HIP48714_0038.920.409.23.46.24.vlar	

Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
Profile Brent Norman Crowdfunding Project Donor*, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1869013 - Posted: 24 May 2017, 3:52:46 UTC - in response to Message 1869012.  

EDIT 2: Or d) you don't have ghosts, or the server doesn't think you do. After this maintenance, while the servers catch up, it may be that the server hasn't flagged them yet.
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1869014 - Posted: 24 May 2017, 3:55:41 UTC - in response to Message 1869013.  

Well, that is the only plausible theory I can grasp at so far. I DID in fact receive about 16 resent tasks the first time I tried ghost recovery. I got 15 GPU tasks and 1 expired task.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
Profile Jeff Buck Crowdfunding Project Donor*, Special Project $75 donor, Special Project $250 donor
Volunteer tester

Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1869016 - Posted: 24 May 2017, 4:13:05 UTC - in response to Message 1869013.  

After this maintenance, while the servers catch up, it may be that the server hasn't flagged them yet.
Yeah, that kinda makes sense to me, too. The replica's over 30,000 seconds behind, and perhaps your ghosts are in that limbo. I wonder why some came back on the first try, though.
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1869021 - Posted: 24 May 2017, 4:29:12 UTC - in response to Message 1869016.  

I guess the way to test the theory is to wait until the replica recovers, then set NNT, make room for 20 resends and try ghost recovery again.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
Profile Brent Norman Crowdfunding Project Donor*, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1869022 - Posted: 24 May 2017, 4:30:38 UTC - in response to Message 1869021.  

Just set a small cache for now, then you will have time later on.
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1869029 - Posted: 24 May 2017, 4:58:34 UTC - in response to Message 1869022.  

OK, I just set my preferences for a 0.5-day cache. It's going to affect all my machines, since I have them all on no venue. I am still working through the buffer I built up on the other Windows 7 machine. The Win 10 machine has already worked through everything, since it is the fastest machine in the stable with the Ryzen.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
Stephen "Heretic" Crowdfunding Project Donor*, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1869031 - Posted: 24 May 2017, 5:03:23 UTC - in response to Message 1868992.  

Well, I guess I jinxed myself. As soon as I said I had made no ghosts, I dumped the entire machine because of a TDR crash. I have had Einstein tasks running in High Priority mode because I goofed and forgot to toggle NNT off for about an hour a week ago. I can't even suspend the project or tasks now without BSOD'ing the machine. I should have just aborted the tasks and been done with them. This has happened a couple of times now: I corrupt the account file and the statistics files for MW and SETI, while Einstein always escapes unscathed for some reason. So ..... OH JOY, now I get to spend the evening recovering 400 ghosts.


. . I run Einstein at 0% resource share so it only runs when SETI has no work (i.e. outages). But I had a hiccough with the rescheduler on Bertie last week and have spent that week recovering nearly 350 ghosts, so you have my sympathy ...

. . Dare I tempt fate and state that I now have zero ghosts??

Stephen

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1869055 - Posted: 24 May 2017, 10:19:09 UTC
Last modified: 24 May 2017, 10:19:25 UTC

Looks like someone is using Seti to test-run some yet-to-be-released hardware.
Lurking in the top hosts list.
CPU type GenuineIntel Intel(R) Xeon(R) Platinum 8168 CPU @ 2.70GHz [Family 6 Model 85 Stepping 4]
Number of processors 96
Memory 195245.19 MB

Considering the CPUs are only clocked at 2.70GHz, it's pumping out WUs in pretty good time. If it were running the Lunatics AVX application, I suspect it would really churn out some work.
Grant
Darwin NT
Profile Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1869062 - Posted: 24 May 2017, 11:03:32 UTC

Charles Long maybe?

Cheers.
Profile Brent Norman Crowdfunding Project Donor*, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1869066 - Posted: 24 May 2017, 11:30:25 UTC
Last modified: 24 May 2017, 12:09:48 UTC

This is new to me; it popped up while I was changing apps chasing AP tasks.

Instead of the usual:
AstroPulse v7: yes
SETI@home v8: yes

It showed:
(all applications)

Other work was set to NO.

I haven't been able to recreate it, but it does indicate to me there IS something lurking in the code which may cause strange behaviours regarding AP/MB selections.
Anyone else ever see this?
Profile Advent42
Joined: 23 Mar 17
Posts: 175
Credit: 4,015,683
RAC: 0
Ireland
Message 1869074 - Posted: 24 May 2017, 12:42:57 UTC - in response to Message 1869066.  

Well, I have about a full day of tasks to get through, but I can only seem to get 3 done at a time, whereas before it was 5, so I'm not sure what it is... and no AstroPulse v7!!
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1869094 - Posted: 24 May 2017, 14:34:44 UTC - in response to Message 1869066.  

This is new to me; it popped up while I was changing apps chasing AP tasks.

Instead of the usual:
AstroPulse v7: yes
SETI@home v8: yes

It showed:
(all applications)

Other work was set to NO.

I haven't been able to recreate it, but it does indicate to me there IS something lurking in the code which may cause strange behaviours regarding AP/MB selections.
Anyone else ever see this?

Yes, this has been changed from what it used to do. This mechanism was in fact the way I toggled preferences to get work running again on my systems. I got AP tasks for the first time in a really long time on two systems, 12 on Numbskull. I didn't get any on Keith-Windows7 because it was set to NNT to make room for ghosts. I tried again to recover this morning. No dice. The system doesn't seem to think I have any ghosts, so it won't resend them. The 438 tasks that got abandoned in the project corruption are lost to me. They're going to take a long time to clear out naturally now.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
Profile JaundicedEye
Joined: 14 Mar 12
Posts: 5375
Credit: 30,870,693
RAC: 1
United States
Message 1869105 - Posted: 24 May 2017, 15:20:40 UTC

And the lower 'quality' of the Kibble available to feed my crunchers lately has affected my SETI RAC by 10%... not in a good way.

"Sour Grapes make a bitter Whine." <(0)>
Profile Jeff Buck Crowdfunding Project Donor*, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1869112 - Posted: 24 May 2017, 15:47:38 UTC - in response to Message 1869094.  

The system doesn't seem to think I have any ghosts, so it won't resend them. The 438 tasks that got abandoned in the project corruption are lost to me. They're going to take a long time to clear out naturally now.
Looks like they did get marked as Abandoned, https://setiathome.berkeley.edu/results.php?hostid=5741129&offset=0&show_names=0&state=6&appid=, so they've already been sent out to new hosts. No need to worry. :^)


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.