GUPPI Rescheduler for Linux and Windows - Move GUPPI work to CPU and non-GUPPI to GPU

Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1810002 - Posted: 17 Aug 2016, 21:44:39 UTC - in response to Message 1809990.  

This is puzzling me also. I have been having similar issues on one of my machines, with it discarding phantom AP CPU tasks after running the rescheduler. I do have the proper AP CPU app installed in my app_info. I also currently have 2 real AP CPU tasks on board. I will process them normally with no issues, as I have also done in the past. This is what my error log shows when I restart after a reschedule.
Pipsqueek

11	SETI@home	8/17/2016 2:09:50 PM	[error] No application found for task: windows_x86_64 703 sse2; discarding	
12	SETI@home	8/17/2016 2:09:50 PM	[error] No application found for task: windows_x86_64 703 sse2; discarding	
13	SETI@home	8/17/2016 2:09:50 PM	[error] No application found for task: windows_x86_64 703 sse2; discarding	
14	SETI@home	8/17/2016 2:09:50 PM	[error] No application found for task: windows_x86_64 703 sse2; discarding	
15	SETI@home	8/17/2016 2:09:50 PM	[error] No application found for task: windows_x86_64 703 sse2; discarding	
16	SETI@home	8/17/2016 2:09:50 PM	[error] No application found for task: windows_x86_64 703 sse2; discarding	
17	SETI@home	8/17/2016 2:09:50 PM	[error] No application found for task: windows_x86_64 703 sse2; discarding	
18	SETI@home	8/17/2016 2:09:50 PM	[error] No application found for task: windows_x86_64 703 sse2; discarding	
19	SETI@home	8/17/2016 2:09:50 PM	[error] No application found for task: windows_x86_64 703 sse2; discarding	
20	SETI@home	8/17/2016 2:09:50 PM	[error] No application found for task: windows_x86_64 703 sse2; discarding	
21	SETI@home	8/17/2016 2:09:50 PM	[error] No application found for task: windows_x86_64 703 sse2; discarding	
22	SETI@home	8/17/2016 2:09:50 PM	[error] No application found for task: windows_x86_64 703 sse2; discarding	
23	SETI@home	8/17/2016 2:09:50 PM	[error] No application found for task: windows_x86_64 703 sse2; discarding	
24	SETI@home	8/17/2016 2:09:50 PM	[error] No application found for task: windows_x86_64 703 sse2; discarding	
25	SETI@home	8/17/2016 2:09:50 PM	[error] No application found for task: windows_x86_64 703 sse2; discarding	
26	SETI@home	8/17/2016 2:09:50 PM	[error] No application found for task: windows_x86_64 703 sse2; discarding	
27	SETI@home	8/17/2016 2:09:50 PM	[error] No application found for task: windows_x86_64 703 sse2; discarding	
28	SETI@home	8/17/2016 2:09:50 PM	[error] No application found for task: windows_x86_64 703 sse2; discarding	
29	SETI@home	8/17/2016 2:09:50 PM	[error] No application found for task: windows_x86_64 703 sse2; discarding	
30	SETI@home	8/17/2016 2:09:50 PM	[error] No application found for task: windows_x86_64 703 sse2; discarding	


I have mentioned this to Mr. Kevvy and Stubbles already but they haven't had any insight yet into why this is happening occasionally. Anyone have ideas?
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1810002 · Report as offensive
Al Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Joined: 3 Apr 99
Posts: 1682
Credit: 477,343,364
RAC: 482
United States
Message 1810003 - Posted: 17 Aug 2016, 21:46:44 UTC - in response to Message 1810002.  

Whew, at least I'm not the only one! ;-)

ID: 1810003 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1810019 - Posted: 17 Aug 2016, 22:11:01 UTC - in response to Message 1809850.  

-high_prec_timer is quite an experimental feature for now.
Though some sources say it affects only the particular program, MSDN is quite clear that a multimedia timer precision change is system-wide. If so, it can affect other programs too.
So, in case of GUI freezes, the first thing I would do is disable this option.
Also, it makes sense ONLY with -use_sleep. There is no sense in increasing timer precision if the timer is not used...


. . I tried Zalster's suggestion and disabled -high_prec_timer and reduced -tt to 500 but lockups still bad, maybe worse. So for now I am returning to r3430, better the devil you know :). Lockups were a daily problem there but not an hourly one :)
ID: 1810019 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1810022 - Posted: 17 Aug 2016, 22:14:08 UTC

OK, here's what I think has happened. Your app_info.xml file is absolutely fine, and looks like it has been assembled from my installer.

It will handle the following types of task:

Astropulse for CPU
... v703
... v701
... v700

Astropulse for NVidia
... v710
... v705

Multibeam for CPU
... v800

Multibeam for NVidia
... v812
... v800

All of which match - as they should - the version numbers on offer from the server, as shown on the applications page for Windows. Plus a couple of older versions (701 and 705 for AP) designed to cover the case of people upgrading from older versions of the installer. So you're able to cope with anything sent by the server as stock, and anything you download yourself. So far, so good.

But wait...

Look at All tasks for computer 8012837. You have just 7 AP tasks listed, and filtering that down, only four are active - all assigned by the server as "AstroPulse v7 Anonymous platform (NVIDIA GPU)".

But a little while ago, in message 1809987, you posted a list of 67 tasks with "No application found for task: windows_x86_64 710 opencl_nvidia_100". That's an AP version number and plan class, but no way do you have 67 Astropulse tasks loaded.

So I looked at the source code for Mr. Kevvy's rescheduler.

He properly swaps CPU and GPU plan_classes, version numbers and platforms using these variables

string app_versionGPU = "";	// The app number (in text form) of the GPU plan_class
string app_versionCPU = "";	// The app number (in text form) of CPU app
string platformGPU = "";	// Platform name of the GPU app
string platformCPU = "";	// Platform name of the CPU app
string version_numGPU = "";	// Version number (in text form) of the GPU app
string version_numCPU = "";	// Version number (in text form) of the CPU app


But note that there is only room for one of each - and the word "Astropulse" appears nowhere in his source code or comments.

So, my strong suspicion is that the rescheduler scans the input files for version numbers, and latches on to the first one it finds - without noticing whether it belongs to an <app_name>setiathome_v8 or an <app_name>astropulse_v7

He pays some attention to "<app_version>n" in sched_request_setiathome.berkeley.edu.xml, but appears not to retrieve the full app_version structure from earlier in the file, where the full detail is available, as

<app_version>
    <app_name>setiathome_v8</app_name>
    <version_num>800</version_num>
    <platform>windows_x86_64</platform>
    <avg_ncpus>1.000000</avg_ncpus>
    <max_ncpus>1.000000</max_ncpus>
    <flops>4706468289.867001</flops>
    <api_version>7.5.0</api_version>
</app_version>

- the 'n' in <other_results> is simply an index into a list of those structures, and they need to be checked too.

Altogether, it looks as if the rescheduler is currently capable of assigning an AP <version_num> to an <app_name>setiathome_v8</app_name> task, with the results that Al and Keith have reported. I'd suggest that you don't use the rescheduler if you have AP tasks in your cache, until Mr. Kevvy has had a chance to consider and respond to these comments.
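A minimal sketch of the check suggested above, not the rescheduler's actual code: it collects every full <app_version> structure in file order, so that an index 'n' can be matched against its <app_name> before any version number is reused. The names AppVersion, tag_value, parse_app_versions and index_is_app are hypothetical, and the sketch assumes that the full structures open with the tag on its own line and that 'n' counts from zero.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record for one full <app_version> structure.
struct AppVersion {
    std::string app_name;
    std::string version_num;
    std::string platform;
};

// Return the text between <tag> and </tag> inside block, or "" if absent.
std::string tag_value(const std::string& block, const std::string& tag) {
    const std::string open = "<" + tag + ">";
    const std::string close = "</" + tag + ">";
    const std::size_t start = block.find(open);
    if (start == std::string::npos) return "";
    const std::size_t from = start + open.size();
    const std::size_t end = block.find(close, from);
    if (end == std::string::npos) return "";
    return block.substr(from, end - from);
}

// Collect every multi-line <app_version> structure, in file order.
std::vector<AppVersion> parse_app_versions(const std::string& xml) {
    std::vector<AppVersion> out;
    // Assumption: full structures start with the tag followed by a newline,
    // which skips one-line references such as <app_version>n</app_version>.
    const std::string open = "<app_version>\n";
    const std::string close = "</app_version>";
    std::size_t pos = 0;
    while ((pos = xml.find(open, pos)) != std::string::npos) {
        const std::size_t end = xml.find(close, pos);
        if (end == std::string::npos) break;
        const std::string block = xml.substr(pos, end - pos);
        out.push_back({tag_value(block, "app_name"),
                       tag_value(block, "version_num"),
                       tag_value(block, "platform")});
        pos = end + close.size();
    }
    return out;
}

// True only if index n refers to a structure of the wanted application.
// Assumption: n is zero-based.
bool index_is_app(const std::vector<AppVersion>& versions, std::size_t n,
                  const std::string& wanted) {
    return n < versions.size() && versions[n].app_name == wanted;
}
```

With a guard like index_is_app(versions, n, "setiathome_v8"), a version number taken from an astropulse_v7 structure would be rejected instead of being written onto a Multibeam task.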
ID: 1810022 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1810025 - Posted: 17 Aug 2016, 22:21:17 UTC - in response to Message 1809961.  

Not normal for me since I have chosen to not run CPU work units. I have run some in the past.


. . Umm you need both a GPU cache and a CPU cache to "swap" tasks between them. It is a rescheduler not a task remover.
ID: 1810025 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1810026 - Posted: 17 Aug 2016, 22:23:14 UTC - in response to Message 1809964.  

Then how do you expect to Reschedule tasks GPU <--> CPU if you "have chosen to not run CPU work units"??



They can still be run is what I am saying. Just run multibeam on GPU and Guppi vlars on CPU instead of running multibeam on CPU.



. . That is the function but you have to be running CPU tasks to have a CPU cache to move them to.
ID: 1810026 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1810028 - Posted: 17 Aug 2016, 22:27:05 UTC - in response to Message 1809966.  

Then how do you expect to Reschedule tasks GPU <--> CPU if you "have chosen to not run CPU work units"??

It's actually a very reasonable configuration. You can load the machine with CPU apps, but choose not to *download* work for them under normal circumstances (that's how the web preference works). That's how I run my machines normally - the CPUs are busy on other, CPU-only, BOINC projects.

I can't immediately see why the test the rescheduler uses should be "at least one CPU work unit in sched_request_setiathome.berkeley.edu.xml", but that's the way Mr. Kevvy wrote it, presumably to suit his own needs and not foreseeing this particular scenario. I think it would be a good enhancement.


. . I take it that requires "manually" installing the CPU handler app while your SETI preferences say to not send CPU work. I need to learn how to do that :)
ID: 1810028 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1810029 - Posted: 17 Aug 2016, 22:29:58 UTC - in response to Message 1809968.  

Wouldn't that require that the CPU app already be installed, either as stock (which would seem unlikely if CPU work wasn't selected), or through Anonymous Platform and the app_info.xml?



. . Lunatics, that would be worth a try. That would install the CPU handler app and then you could move to the CPU cache.
ID: 1810029 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1810030 - Posted: 17 Aug 2016, 22:34:46 UTC - in response to Message 1810022.  

Richard, thanks so much for your insight and concise analysis. I think you explained it very well. It doesn't quite explain for me why I have seen this problem ONLY on Pipsqueek. I have never seen the issue on Numbskull or Keith-Windows7, which also have both CPU/GPU AP tasks on board currently. On both machines, there were AP tasks before and after I ran the rescheduler on them within a ten-minute window of rescheduling on Pipsqueek. Either they have been lucky or there are some obscure differences in client_state or sched_request between the three machines.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1810030 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1810034 - Posted: 17 Aug 2016, 23:09:51 UTC - in response to Message 1810030.  

Does "latches on to the first one it finds" (or that might be the last one - I didn't check the search logic) cover it? In other words, it grabs an AP number only if AP is the oldest WU in the cache, or alternatively if it's the most recent download? Either case would be pretty rare in the current state of AP splitting.
ID: 1810034 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1810039 - Posted: 17 Aug 2016, 23:59:16 UTC - in response to Message 1810034.  

Well, my previous statement is no longer valid. I had one AP CPU task left on Keith-Windows7 just now. I had finished up the previous balance of GPU AP tasks. I just ran the rescheduler again and I just dumped 17 phantom AP CPU tasks on that machine. So, you are probably correct in that it depends on how old a task is sitting in client_state or something. I agree with your assessment now that you should not run the rescheduler on any machine with AP work on board. We will have to wait it out or you will dump work. Let's hope Mr. Kevvy can make some adjustments in the app to accommodate AP work besides MB work.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1810039 · Report as offensive
Profile Mr. Kevvy Crowdfunding Project Donor*Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3776
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 1810053 - Posted: 18 Aug 2016, 1:19:58 UTC
Last modified: 18 Aug 2016, 1:20:13 UTC

Haven't been watching this thread much but was advised of issues...

The app. doesn't check for AP tasks but does the opposite: the MB mover checks for the task name starting with ##xx (a two-digit day followed by a two-letter month):

		if (isdigit(client_state[currentposition])) {
			if (isdigit(client_state[currentposition + 1])) {
				if (isalpha(client_state[currentposition + 2])) {
					if (isalpha(client_state[currentposition + 3])) {


And the GUPPI mover checks for the task name starting with "blc":
	if ( client_state.substr(currentposition, 3) == "blc" ) {


So it should never touch AP work as neither would match as they start with "ap_"
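
To make the prefix rules concrete, here is a small sketch that applies the same tests to a whole task name rather than to a position in client_state. TaskKind and classify_task are made-up names for illustration and are not part of the app; the explicit "ap_" branch is an addition, since the app itself only relies on AP names failing the other two tests.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical categories for illustration only.
enum class TaskKind { Multibeam, Guppi, Astropulse, Unknown };

// Classify a task by the start of its name, following the rules above.
TaskKind classify_task(const std::string& name) {
    if (name.compare(0, 3, "ap_") == 0) return TaskKind::Astropulse;
    if (name.compare(0, 3, "blc") == 0) return TaskKind::Guppi;
    if (name.size() >= 4) {
        // Cast to unsigned char: passing a negative char to
        // isdigit/isalpha is undefined behaviour.
        const auto c = [&](std::size_t i) {
            return static_cast<unsigned char>(name[i]);
        };
        if (std::isdigit(c(0)) && std::isdigit(c(1)) &&
            std::isalpha(c(2)) && std::isalpha(c(3))) {
            return TaskKind::Multibeam;
        }
    }
    return TaskKind::Unknown;
}
```

A name beginning with "ap_" fails both the digit test and the "blc" test, which is why neither mover should select it.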

I apologize if the app. doesn't work for anyone (especially if it drops work units) but I am unsure why it does this, as it worked for me on the Windows and several Linux machines I tested it on and continue to use it on. I think this is due to BOINC using multiple platforms simultaneously for the same type of work for some people's builds. (I even ran it on the client_state files that people who had issues sent me, and it worked on them, so I'm still in the dark as to the cause.) As noted in the readme, back up your client_state.xml before first use, and hopefully if it works the first time it will keep working if you don't go and change platforms, i.e. by installing a third-party client.
ID: 1810053 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1810055 - Posted: 18 Aug 2016, 1:26:54 UTC - in response to Message 1810053.  

I can supply my backed-up client_state files that had tasks in them that were discarded on two machines if that helps. Not running any third-party apps, only the official ones that install with the Lunatics installers.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1810055 · Report as offensive
Profile Jeff Buck Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1810058 - Posted: 18 Aug 2016, 1:45:49 UTC - in response to Message 1810053.  

Haven't been watching this thread much but was advised of issues...

The app. doesn't check for AP tasks but does the opposite: the MB mover checks for the task name starting with ##xx (a two-digit day followed by a two-letter month):

		if (isdigit(client_state[currentposition])) {
			if (isdigit(client_state[currentposition + 1])) {
				if (isalpha(client_state[currentposition + 2])) {
					if (isalpha(client_state[currentposition + 3])) {


And the GUPPI mover checks for the task name starting with "blc":
	if ( client_state.substr(currentposition, 3) == "blc" ) {


So it should never touch AP work as neither would match as they start with "ap_"

I apologize if the app. doesn't work for anyone (especially if it drops work units) but I am unsure why it does this, as it worked for me on the Windows and several Linux machines I tested it on and continue to use it on. I think this is due to BOINC using multiple platforms simultaneously for the same type of work for some people's builds. (I even ran it on the client_state files that people who had issues sent me, and it worked on them, so I'm still in the dark as to the cause.) As noted in the readme, back up your client_state.xml before first use, and hopefully if it works the first time it will keep working if you don't go and change platforms, i.e. by installing a third-party client.

I think that perhaps what Richard was getting at was not so much that the actual identification of tasks to be rescheduled has a problem, but rather that the initial identification of app version and plan class might have a problem when the first task that is found in the scheduler request file happens to be an AP. In other words, could:

currentposition = sched_request.find("</app_version>\n        <plan_class>");		// Now do the same as the block above but for app_versionGPU

result in the extraction of the app_version (and, subsequently, the plan class) for an AP instead of for an MB? I don't know that you're checking for APs at that point. Just a thought.
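
One possible guard, sketched under assumptions rather than taken from the app: instead of accepting the first match of the search string, step through the matches and skip any whose task name starts with "ap_". The function name find_first_non_ap is hypothetical, and the sketch assumes each entry in the scheduler request carries its task name in a <name> element somewhere before the matched text.

```cpp
#include <cstddef>
#include <string>

// Return the position of the first occurrence of marker whose preceding
// <name> element does not start with "ap_", or npos if there is none.
std::size_t find_first_non_ap(const std::string& sched_request,
                              const std::string& marker) {
    const std::string name_tag = "<name>";
    std::size_t pos = 0;
    while ((pos = sched_request.find(marker, pos)) != std::string::npos) {
        // Assumption: the nearest <name> before the match belongs to the
        // same task entry as the match itself.
        const std::size_t name_pos = sched_request.rfind(name_tag, pos);
        const bool is_ap =
            name_pos != std::string::npos &&
            sched_request.compare(name_pos + name_tag.size(), 3, "ap_") == 0;
        if (!is_ap) return pos;
        pos += marker.size();  // skip this AP entry and keep searching
    }
    return std::string::npos;
}
```

Called with the same search string as in the line quoted above, this would return the first MB or GUPPI entry even when an AP entry comes first in the file.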
ID: 1810058 · Report as offensive
Profile Mr. Kevvy Crowdfunding Project Donor*Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3776
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 1810059 - Posted: 18 Aug 2016, 1:53:39 UTC - in response to Message 1810058.  

I think that perhaps what Richard was getting at was not so much that the actual identification of tasks to be rescheduled has a problem, but rather that the initial identification of app version and plan class might have a problem when the first task that is found in the scheduler request file happens to be an AP. In other words, could:

currentposition = sched_request.find("</app_version>\n        <plan_class>");		// Now do the same as the block above but for app_versionGPU

result in the extraction of the app_version (and, subsequently, the plan class) for an AP instead of for an MB? I don't know that you're checking for APs at that point. Just a thought.


Thanks... I will have a look at that. Will have time this weekend to check it over in detail (thankfully no plans for a change.)
ID: 1810059 · Report as offensive
Profile BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1810090 - Posted: 18 Aug 2016, 6:15:13 UTC

 
Until this is fixed:

boinc_rescheduler_2_7.zip still exists:
http://www.efmer.eu/download/boinc/scheduler/boinc_rescheduler_2_7.zip

How to patch for SETI@home v8:
http://setiathome.berkeley.edu/forum_thread.php?id=77586&postid=1763581#1763581
http://setiathome.berkeley.edu/forum_thread.php?id=77586&postid=1763938#1763938

Tool to use:
HxD - Freeware Hex Editor
https://mh-nexus.de/en/hxd/

On this Downloads page:
https://mh-nexus.de/en/downloads.php?product=HxD

Find (Ctrl+F):
94e57a52e4d3eca6576bc15a99e884b6cdd5b03a

... to easily find links for "HxD, English - portable 1.7.7.0"
 
 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
ID: 1810090 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1810096 - Posted: 18 Aug 2016, 7:24:04 UTC - in response to Message 1810090.  

 
Until this is fixed:

boinc_rescheduler_2_7.zip still exists:
http://www.efmer.eu/download/boinc/scheduler/boinc_rescheduler_2_7.zip

How to patch for SETI@home v8:
http://setiathome.berkeley.edu/forum_thread.php?id=77586&postid=1763581#1763581
http://setiathome.berkeley.edu/forum_thread.php?id=77586&postid=1763938#1763938

Tool to use:
HxD - Freeware Hex Editor
https://mh-nexus.de/en/hxd/

On this Downloads page:
https://mh-nexus.de/en/downloads.php?product=HxD

Find (Ctrl+F):
94e57a52e4d3eca6576bc15a99e884b6cdd5b03a

... to easily find links for "HxD, English - portable 1.7.7.0"
 


Thanks for such clear instructions.
WARNING to others who will try this way:

18 August 2016 - 10:17:37 Seti MB v8
18 August 2016 - 10:17:54 ERROR: Cpu and GPU count: More than one Cpu version number: 804 ,802


I think it's because 8.02 is the CPU version in the SETI main project and 8.04 is the CPU version in beta.
Apparently one needs to keep both identical (I have no 8.04 in app_info.xml for the main project at all).
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1810096 · Report as offensive
Profile BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1810097 - Posted: 18 Aug 2016, 7:35:51 UTC - in response to Message 1810096.  

To check/compare your original and patched files with mine:
MD5:
a072664a7063bb1a231a2aae967cc427 *BOINC Rescheduler.exe
07bd6714b0817e6faf1941575e56e0b5 *BOINC Rescheduler v7.exe
cf29f5efe0eb427b37364a3432246762 *BOINC Rescheduler v8.exe

I didn't edit BOINC Rescheduler64 - I can't test it and don't really know why it exists (why is there a need for a 64-bit Rescheduler?)
Is it impossible for a 32-bit program to access and edit client_state.xml on 64-bit Windows?
 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
ID: 1810097 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1810098 - Posted: 18 Aug 2016, 7:38:54 UTC - in response to Message 1810097.  
Last modified: 18 Aug 2016, 7:58:44 UTC

To check/compare your original and patched files with mine:
MD5:

Of course the MD5 will only match if exactly the same changes were made. In my case I changed the "human readable" name to Seti MB v8, so the MD5 would not match.

P.S. Interesting that the "Other" tab allows moving CPU <-> GPU for a particular project but doesn't understand VLAR/VHAR, while the main SETI MB one understands VLAR/VHAR but doesn't distinguish between the main and beta projects...

EDIT: and more on the version mismatch issue: when I suspended all beta CPU tasks, the error was gone.

And it seems to work OK for GBT tasks too.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1810098 · Report as offensive
Profile Mr. Kevvy Crowdfunding Project Donor*Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3776
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 1810125 - Posted: 18 Aug 2016, 12:39:03 UTC

MB work (for me at least) seems to have completely dried up, so if you're getting errors using this app. it may mean there's simply nothing to move right now.
ID: 1810125 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.