Issue running Seti projects

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1383038 - Posted: 20 Jun 2013, 14:33:43 UTC - in response to Message 1382949.  
Last modified: 20 Jun 2013, 14:37:52 UTC

Try these options:

-period_iterations_num 40 -sbs 128 -hp

And are you also going to explain to him how he should do that on stock apps? Or do you think it works itself out without having to give any explanation or how to add the command line instructions to the correct text file in the projects directory?

It may be all normal for you, but that does not mean that it is for the people you are giving help to. So you will have to explain, in detail, on how to do things like this. Or else point out a thread in which it has been done before.


Ageless, maybe you don't like my conversational style, but look: with the number of characters you spent writing what I should do, you could have explained it yourself, don't you think?

OK, I will:
There is a readme file in the project directory that explains these params.
Also, there is another file in the project directory whose name looks like mb_cmdline_win_x86_SSE2_OpenCL_ATI.txt (I don't run stock currently, so the name is approximate). One needs to copy/paste the line with the params there and restart the app (or BOINC as a whole).
SETI apps news
We're not gonna fight them. We're gonna transcend them.
BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1383185 - Posted: 20 Jun 2013, 20:33:14 UTC - in response to Message 1383154.  


Sorry, Ageless
I pointed Raistmer to this thread as I can't see why this problem occurs.
Raistmer gives short hints and maybe expects me to elaborate on them if the user has questions like "How do I do that?" ;)

If Raistmer can better spend his time and thinking on finding why/how this problem occurs, I am fine with 'expanding' his short hints.

My 'hints' to him were:
- What happens in the calculations at about 20% progress?
- Is the app stuck waiting for something? (According to the user, it is not using the CPU or GPU at that moment.)
- Or does it call some external function which never returns?


 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1383204 - Posted: 20 Jun 2013, 21:17:29 UTC - in response to Message 1383185.  
Last modified: 20 Jun 2013, 21:18:17 UTC


- What happens in the calculations at about 20% progress?


Most probably one of the long-running kernel sequences occurs.
All other possibilities are much less probable for now.
I'm waiting for the OP to check the system log and see how many driver restarts he actually had.
For now the best explanation of the "stuck" state is a driver restart at ~20% and an app freeze because of the invalidated OpenCL context. And yes, with an invalidated context, the first call to the OpenCL runtime after invalidation will never return.
It's a very nasty "feature" of the runtime. It would be much better if some error were returned to the app... but we have what we have.
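
Since a call that never returns cannot report an error itself, it can only be noticed from outside the call. A minimal Python sketch of that idea, not the app's actual code: the blocking call runs on a worker thread and is treated as hung if it has not returned within a timeout. The names `call_with_timeout`, `quick_call` and `stuck_call` are made up for illustration, and plain functions stand in for the OpenCL runtime call.

```python
import threading
import time


def call_with_timeout(func, timeout_s):
    """Run func() on a daemon thread and return (finished, result).

    finished is False when func() did not return within timeout_s seconds.
    The stuck thread is only detected here, it is not killed.
    """
    box = {}

    def worker():
        box["result"] = func()

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)
    if thread.is_alive():
        return False, None
    return True, box.get("result")


def quick_call():
    # Stand-in for a runtime call that returns normally.
    return 42


def stuck_call():
    # Stand-in for a call that blocks much longer than the watchdog allows.
    time.sleep(2)
    return 0


if __name__ == "__main__":
    print(call_with_timeout(quick_call, 1.0))
    print(call_with_timeout(stuck_call, 0.2))
```

This only shows the detection side; what to do after a hang is detected (for example exiting so the task can be restarted) is a separate decision.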
BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1383243 - Posted: 21 Jun 2013, 1:31:52 UTC - in response to Message 1383204.  
Last modified: 21 Jun 2013, 2:09:48 UTC

I'm awaiting OP to check system log and see how many driver restarts he actually had.


@TJ13

To do this:
Option 1 - use the 'standard' way:
Control Panel -> Administrative Tools -> Event Viewer -> System
(on Windows 8 it may be somewhat different; I did this on XP)


But I prefer Option 2 - use MyEventViewer:
http://www.nirsoft.net/utils/my_event_viewer.html


@Raistmer
Can you supply a search string (for 'Find') to easily locate the driver restarts? (in 'Event Viewer' or MyEventViewer)
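
One possible way to do such a search, sketched in Python under assumptions: the System log has been exported to plain text, and driver restarts are logged with the usual Windows wording "stopped responding and has successfully recovered" (on Vista and later these entries typically have source "Display" and Event ID 4101; XP may word it differently). The function name and the sample lines are made up for illustration.

```python
# Assumed message fragment; adjust it if your Windows version words it differently.
RESTART_PHRASE = "stopped responding and has successfully recovered"


def find_driver_restarts(lines, phrase=RESTART_PHRASE):
    """Return the lines that mention a display driver restart (case-insensitive)."""
    needle = phrase.lower()
    return [line.rstrip("\n") for line in lines if needle in line.lower()]


# Made-up sample rows; a real export would be read with open(path) instead.
SAMPLE_LOG = [
    "Information  Service Control Manager  The BOINC service entered the running state.\n",
    "Warning  Display  Display driver amdkmdap stopped responding and has successfully recovered.\n",
]

if __name__ == "__main__":
    hits = find_driver_restarts(SAMPLE_LOG)
    print(f"{len(hits)} driver restart(s) found")
    for hit in hits:
        print(hit)
```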


 


TJ13
Joined: 13 Mar 06
Posts: 35
Credit: 7,970,069
RAC: 6
United States
Message 1383270 - Posted: 21 Jun 2013, 4:55:20 UTC
Last modified: 21 Jun 2013, 4:57:04 UTC

Apologies, I was at work all day and only now got around to getting back to this post. I'm currently running the MB_bench_208.cmd BilBg gave me and haven't finished one yet, although it started about 12 minutes ago. Also, it did successfully snooze/pause BOINC in Windows 8.

Also, it turns out I've had a lot of driver failures, as in more than I could reasonably count. The first one was on May 29th, and I've had several while I was doing tests over the past few days.

Edit: as I post it the first one finished. Started at 00:42:28.424, ended at 00:55:27.889. At this rate I'll probably let it run all night and have results in the morning.
BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1383271 - Posted: 21 Jun 2013, 5:12:32 UTC - in response to Message 1383270.  

Edit: as I post it the first one finished.

"the first one" is a CPU app/result
The second will be the GPU app (which may hang again at 20%)


 


TJ13
Joined: 13 Mar 06
Posts: 35
Credit: 7,970,069
RAC: 6
United States
Message 1383279 - Posted: 21 Jun 2013, 6:17:34 UTC
Last modified: 21 Jun 2013, 6:18:22 UTC

Well, it finished. There is a file, but it's really long and I'm not sure what's relevant. (Note: I'm going to bed now, so don't expect a response for a few hours.)
BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1383401 - Posted: 21 Jun 2013, 16:22:43 UTC - in response to Message 1383279.  
Last modified: 21 Jun 2013, 16:39:22 UTC


You can go to this site and paste the full text (and give the resulting link):
http://pastebin.com/

(you can do it without an account, but to be able to edit and delete your pastes you need an account)

Example - my last Knabench test:
http://pastebin.com/xA0pY8qn


If the test really finished OK, then there is an easy solution for you: switch to the apps included in the Lunatics Installer
(MB7_win_x86_SSE_OpenCL_ATi_HD5_r1843.exe is exactly from it).
(Links to the Lunatics Installer "How to" will come later, after we know ATi_HD5_r1843 really finished OK. You can find the info about the Lunatics Installer yourself, but better wait for confirmation.)


@Raistmer
What is the difference between the stock 'SSEx Win32 Build 1831 (USE_OPENCL_HD5xxx)' (I don't know the name of the .exe distributed)
http://setiathome.berkeley.edu/result.php?resultid=3021008523
and MB7_win_x86_SSE_OpenCL_ATi_HD5_r1843.exe


 


TJ13
Joined: 13 Mar 06
Posts: 35
Credit: 7,970,069
RAC: 6
United States
Message 1383421 - Posted: 21 Jun 2013, 17:13:48 UTC - in response to Message 1383401.  

http://pastebin.com/ZznGkDRP

That's the full file.
BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1383456 - Posted: 21 Jun 2013, 19:00:56 UTC - in response to Message 1383421.  


All looks good to me:
R2: .\ref\ref-setiathome_7.00_windows_intelx86.exe-PG0395.wu.res
Result : Strongly similar, Q= 99.95%

R2: .\ref\ref-setiathome_7.00_windows_intelx86.exe-PG0395_v7.wu.res
Result : Strongly similar, Q= 99.95%

R2: .\ref\ref-setiathome_7.00_windows_intelx86.exe-PG1327.wu.res
Result : Strongly similar, Q= 99.14%

R2: .\ref\ref-setiathome_7.00_windows_intelx86.exe-PG1327_v7.wu.res
Result : Strongly similar, Q= 99.14%

(and 8 lines 'called boinc_finish' also show no app hang)
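
For a long log it can be easier to pull out just the comparison lines. A small Python sketch, assuming only the `Result : ..., Q= NN.NN%` line shape shown above; the function name `parse_results` is made up for illustration and is not part of the benchmark package.

```python
import re

# Matches lines such as "Result : Strongly similar, Q= 99.95%".
RESULT_RE = re.compile(
    r"Result\s*:\s*(?P<verdict>[^,]+),\s*Q=\s*(?P<q>\d+(?:\.\d+)?)%"
)


def parse_results(text):
    """Return a list of (verdict, Q percentage) pairs found in the log text."""
    return [
        (m.group("verdict").strip(), float(m.group("q")))
        for m in RESULT_RE.finditer(text)
    ]


SAMPLE = """R2: .\\ref\\ref-setiathome_7.00_windows_intelx86.exe-PG0395.wu.res
Result : Strongly similar, Q= 99.95%

R2: .\\ref\\ref-setiathome_7.00_windows_intelx86.exe-PG1327.wu.res
Result : Strongly similar, Q= 99.14%
"""

if __name__ == "__main__":
    for verdict, q in parse_results(SAMPLE):
        print(f"{verdict}: {q}%")
```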

=============

If you still continue to have this 'hang at 20%' problem now - you may go for 'Updated Installers, v0.41 for Windows'
http://setiathome.berkeley.edu/forum_thread.php?id=71867&postid=1375943#1375943

(The only downside is that the apps will not update automatically from the SETI servers in the future,
so [Subscribe] to the above thread to be notified if something new is posted.)

First read the link 'this thread':
"Please see this thread for full release notes. READ THEM before you install."

Then use either of the 2 download links:
"... on Arkayn's site, Crunchers Anonymous HERE"
or the one in Mike's post

You need the file:
Lunatics_Win64_v0.41_setup.exe

(The installer will issue a warning that "BOINC is still running"; I prefer to exit BOINC manually first.)

You need to select all apps (type of tasks) that you use or want to use
(CPU, ATI, AstroPulse, 'MultiBeam v7' == SETI@home v7)

For CPU, the AVX app will be fastest if it can be selected.
(You don't have CPU tasks now; is that incidental or your choice?)
For ATI select MB7_win_x86_SSE_OpenCL_ATi_HD5_r1843


Also make sure you have the proper selections per your choice here:
http://setiathome.berkeley.edu/prefs.php?subset=project
Use CPU yes/no
Use ATI GPU yes/no
...
SETI@home Enhanced: yes/no
SETI@home v7: yes/no
AstroPulse v6: yes/no
If no work for selected applications is available, accept work from other applications? yes/no


! If some app (type of task) is not selected (is unchecked) in 'SETI@home preferences' OR in the Lunatics Installer,
you will not receive that type of task.
(I.e. the app has to be selected in both places ('SETI@home preferences' and the Lunatics Installer) to be 'active'.)


 


Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1383564 - Posted: 22 Jun 2013, 3:54:17 UTC - in response to Message 1383401.  
Last modified: 22 Jun 2013, 3:55:44 UTC



@Raistmer
What is the difference between the stock 'SSEx Win32 Build 1831 (USE_OPENCL_HD5xxx)' (I don't know the name of the .exe distributed)
http://setiathome.berkeley.edu/result.php?resultid=3021008523
and MB7_win_x86_SSE_OpenCL_ATi_HD5_r1843.exe


Nothing that could explain the difference in the hang-at-20% behavior.
More detailed info can be found in the revision logs in the sources (there were a number of changes).

https://setisvn.ssl.berkeley.edu/svn/branches/sah_v7_opt
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1383588 - Posted: 22 Jun 2013, 5:21:30 UTC
Last modified: 22 Jun 2013, 5:24:46 UTC

Regarding r1843's ability to run shortened tasks: it proves nothing.
20% of a full-length task is definitely not the same.

What about running with proposed options ?

[It's known that the A-10 per se can run the stock app. I have a similar APU. Just the right config is needed to avoid driver restarts.]
BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1383667 - Posted: 22 Jun 2013, 14:18:06 UTC - in response to Message 1383588.  

What about running with proposed options ?


@TJ13

To do this (if you still continue to have this 'hang at 20%' problem):
Go to:
C:\ProgramData\BOINC\projects\setiathome.berkeley.edu\

(Easy way to open this directory:
Press Win+R ([Win] is the key between left [Ctrl] and [Alt])
Paste the above path (C:\ProgramData\BOINC\projects\setiathome.berkeley.edu\) and hit Enter
)

Find and edit the file (it has zero length by default):
mb_cmdline_win_x86_SSE_OpenCL_ATi_HD5.txt

Paste the following line in it and save:
-period_iterations_num 40 -sbs 128 -hp

For the new options/settings to take effect, the app has to be restarted; any of the following will do this:
- [Suspend]/[Resume] the stuck task (always wait 5-10 sec after any Suspend; when you Suspend the stuck task, a new task of the same kind will start)
- [Suspend]/[Resume] the project (SETI@home)
- Exit/start BOINC
- Restart Windows (do this if you had a 'driver restart'; after "driver failing and restarting successfully" OpenCL can no longer work, so always restart Windows if this happens)
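
The file edit described above can also be scripted. A minimal Python sketch, assuming the default data directory and the file name given above; `write_cmdline` is a made-up helper name, writing there may need suitable permissions, and the app still has to be restarted afterwards by one of the methods listed.

```python
from pathlib import Path

# Path and file name as given in this post; adjust if BOINC's data directory differs.
PROJECT_DIR = Path(r"C:\ProgramData\BOINC\projects\setiathome.berkeley.edu")
CMDLINE_FILE = "mb_cmdline_win_x86_SSE_OpenCL_ATi_HD5.txt"
OPTIONS = "-period_iterations_num 40 -sbs 128 -hp"


def write_cmdline(project_dir, options, file_name=CMDLINE_FILE):
    """Overwrite the app's command-line file with a single line of options."""
    target = Path(project_dir) / file_name
    target.write_text(options, encoding="ascii")
    return target


if __name__ == "__main__":
    if PROJECT_DIR.is_dir():
        print("Wrote", write_cmdline(PROJECT_DIR, OPTIONS))
    else:
        print("Project directory not found:", PROJECT_DIR)
```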


Check in SIV what the temperatures of the CPU and GPU are during the computations.
(For the CPU it shows both internal and external temperatures; the internal sensor is often very wrong at low temperatures, when the CPU is idle.)

(Too-high temperatures can cause a 'driver restart', but usually at random points in the calculation.)


 


TJ13
Joined: 13 Mar 06
Posts: 35
Credit: 7,970,069
RAC: 6
United States
Message 1383702 - Posted: 22 Jun 2013, 18:01:36 UTC
Last modified: 22 Jun 2013, 18:15:30 UTC

Well, I did what you said to do earlier, and I've since received new tasks and finished them, but I've also received new tasks that got stuck at 20% again. I've narrowed it down to having issues with tasks that are Application SETI@home v7 7.03 (opencl_ati_sah). AstroPulse and 7.00 applications run fine. I'm not sure if something I'm doing is keeping 7.03 from working, but I figured I'd ask.

Edit: Did what you said in the most recent post and restarted the laptop. It worked for a little bit, then I got the "driver failed and successfully restarted" message again. Then it stopped showing progress. SpeedFan says that the HD1 is currently at 39°C and GPU is at 54°C. I have a cooling fan under the laptop running at full speed, so I have no idea if there's anything more I can do about the laptop temperature.
BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1383797 - Posted: 23 Jun 2013, 0:55:05 UTC - in response to Message 1383702.  
Last modified: 23 Jun 2013, 1:05:07 UTC

I've narrowed it down to having issues with tasks that are Application SETI@home v7 7.03 (opencl_ati_sah)

Yes, we have known that from the beginning, but why this happens on your computer is still unknown.


SpeedFan says that the HD1 is currently at 39°C and GPU is at 54°C

HD1 is probably the hard disk temperature, which is irrelevant to this problem
(39°C is in the safe range according to Google (in 2007):
http://research.google.com/pubs/pub32774.html
PDF: http://research.google.com/archive/disk_failures.pdf
)

GPU at 54°C is also good (if measured while the GPU task was progressing (while GPU load in SIV was high), not after it hung at 20%).
Since the GPU is in the same package (chip) as the CPU, I suppose the CPU temperature is not much different (you didn't say what SIV shows for it).


Your 'Aborted by user' task using r1843 doesn't show a change in options:
http://setiathome.berkeley.edu/result.php?resultid=3021008858

Used GPU device parameters are:
Number of compute units: 6
Single buffer allocation size: 64MB
max WG size: 256
period_iterations_num=20


 


BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1383803 - Posted: 23 Jun 2013, 1:50:34 UTC - in response to Message 1383588.  


@Raistmer
Do you think that running with -v 2 will give you some hint, or is a special debug build needed?

I downloaded one of the failed WUs of TJ13:
http://setiathome.berkeley.edu/result.php?resultid=3021008523

I ran it successfully (my GPU needed 26 minutes to reach and pass 20% progress).
I used this line in mb_cmdline.txt:
-period_iterations_num 40 -sbs 128 -v 3

(-hp is causing too much lag for me so I omitted it; And (unfortunately for me) -v 3 == -v 2
(I was trying to cheat with undocumented feature ;) which works for some other apps)
)

The results:
http://pastebin.com/qb2G0P77


 


BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1383819 - Posted: 23 Jun 2013, 4:03:56 UTC - in response to Message 1383588.  


Is this connected with the problem here? (and you want to make calls shorter by -period_iterations_num 40)

Timeout Detection and Recovery of GPUs
http://msdn.microsoft.com/en-us/windows/hardware/gg487368.aspx

TdrDelay: REG_DWORD. The number of seconds that the GPU is allowed to delay the preempt request from the scheduler. This is effectively the timeout threshold. The default value is 2
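
To see which timeout is in effect on a given machine, the value can be read from the registry. A minimal Python sketch, assuming the `GraphicsDrivers` key under `HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control` where the TDR values live; the function name `read_tdr_delay` is made up, and when the value is absent the quoted default of 2 is returned. It only reads and changes nothing.

```python
import sys

# Assumed location of the TDR settings.
TDR_KEY = r"SYSTEM\CurrentControlSet\Control\GraphicsDrivers"
DEFAULT_TDR_DELAY = 2  # seconds, the default quoted above


def read_tdr_delay():
    """Return TdrDelay in seconds, or the default when it is not set."""
    if sys.platform != "win32":
        # winreg exists only on Windows; elsewhere just report the default.
        return DEFAULT_TDR_DELAY
    import winreg

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, TDR_KEY) as key:
            value, _value_type = winreg.QueryValueEx(key, "TdrDelay")
            return int(value)
    except FileNotFoundError:
        # Key or value is missing, so Windows uses its built-in default.
        return DEFAULT_TDR_DELAY


if __name__ == "__main__":
    print(f"TdrDelay in effect: {read_tdr_delay()} s")
```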


 


Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1383865 - Posted: 23 Jun 2013, 7:27:41 UTC - in response to Message 1383803.  
Last modified: 23 Jun 2013, 7:35:43 UTC


(-hp is causing too much lag for me so I omitted it; And (unfortunately for me) -v 3 == -v 2

-v <level> will just enable more and more additional output as the app evolves. So -v 10 (for example) can be used too, but it will show only what I found useful to look at at some stage of development. There is no special plan for the -v <level> option; new info is added only when it's needed.

-hp: if it increases lag... well, then it's worth removing it from the config line.
On my Trinity-based desktop I also saw lag increase with CPU cores freed (but it correlated with increased throughput on that host). So, to avoid driver restarts, it's also worth trying with a fully loaded CPU...

There are no single kernels in the app that run that long (~2 seconds), so lags and driver restarts are caused not by a single kernel or memory transfer but by a sequence of commands.
So some slowdown (such as additional output) can fix this issue too. For example, every attempt to catch the exact place of the driver restart with a debug (OCL_VERBOSE) build gave nothing. The driver restart just doesn't occur if syncing is performed after each OpenCL command.

Increasing the Windows watchdog timer value, or disabling it completely, could help with these driver restarts too.

There are 2 types of MB OpenCL app:
Windows/x86 7.03 (opencl_ati5_sah) 30 May 2013, 0:18:19 UTC
Windows/x86 7.03 (opencl_ati_sah)
that should be available for this particular host.
It would be interesting to see whether both cause driver restarts or not.

EDIT: the last cited result was received with USE_OPENCL_HD5xxx, that is, the opencl_ati_sah plan class app.
It's worth checking whether the other app will cause driver restarts too. With quota limited for the opencl_ati_sah plan class, the SETI scheduler will start to issue work for another compatible plan class...
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1383874 - Posted: 23 Jun 2013, 7:40:38 UTC
Last modified: 23 Jun 2013, 7:44:45 UTC

@TJ13

Your host info shows that you are able to run the anonymous platform:

SETI@home Enhanced 6.03 windows_intelx86
Number of tasks completed 491
Max tasks per day 612
Number of tasks today 0
Consecutive valid tasks 512
Average processing rate 7.5620352119213
Average turnaround time 3.30 days
AstroPulse v6 6.01 windows_intelx86
Number of tasks completed 0
Max tasks per day 101
Number of tasks today 0
Consecutive valid tasks 1
Average processing rate 14.74771573869
Average turnaround time 2.82 days
AstroPulse v6 6.04 windows_intelx86 (opencl_ati_100)
Number of tasks completed 1
Max tasks per day 225
Number of tasks today 0
Consecutive valid tasks 125
Average processing rate 204.67270646889
Average turnaround time 0.75 days
AstroPulse v6 6.04 windows_intelx86 (ati_opencl_100)
Number of tasks completed 0
Max tasks per day 132
Number of tasks today 0
Consecutive valid tasks 32
Average processing rate 184.82355412102
Average turnaround time 0.72 days
SETI@home v7 7.00 windows_intelx86
Number of tasks completed 1
Max tasks per day 101
Number of tasks today 0
Consecutive valid tasks 1
Average processing rate 9.4470449455295
Average turnaround time 2.38 days
SETI@home v7 7.03 windows_intelx86 (opencl_ati_sah)
Number of tasks completed 0
Max tasks per day 15
Number of tasks today 0
Consecutive valid tasks 0
Average turnaround time 0.00 days
SETI@home v7 (anonymous platform, CPU)
Number of tasks completed 4
Max tasks per day 37
Number of tasks today 0
Consecutive valid tasks 4
Average processing rate 11.289285690668
Average turnaround time 0.51 days
SETI@home v7 (anonymous platform, ATI GPU)
Number of tasks completed 0
Max tasks per day 33
Number of tasks today 0
Consecutive valid tasks 0
Average turnaround time 0.00 days
AstroPulse v6 (anonymous platform, ATI GPU)
Number of tasks completed 1
Max tasks per day 35
Number of tasks today 0
Consecutive valid tasks 2
Average processing rate 148.13778604426
Average turnaround time 0.43 days


Try re-running the Lunatics installer and choose the non-HD5 app for MultiBeam this time.
Will it fix the issue?


EDIT:
One of the last results finished OK and has this stderr:

<core_client_version>7.0.64</core_client_version>
<![CDATA[
<stderr_txt>
Running on device number: 0
Number of period iterations for PulseFind set to:40
Maximum single buffer size set to:128MB
Priority of worker thread raised successfully
Priority of process adjusted successfully, high priority class used
OpenCL platform detected: Advanced Micro Devices, Inc.
BOINC assigns device 0
Info: BOINC provided device ID used

Build features: SETI7 Non-graphics OpenCL OCL_CHIRP3 FFTW AMD specific USE_SSE x86
CPUID: AMD A10-4600M APU with Radeon(tm) HD Graphics


So, it's the non-HD5 app with the proposed command line options used.
Looks like this config is able to finish work. Or do you still have driver restarts on other tasks? If yes, try removing -hp from the config line.
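
Whether the options were picked up can be checked from a task's stderr in the same way. A rough Python sketch, assuming only the two line formats visible in the stderr above; `applied_options` and the dictionary keys are made-up names, and other app builds may word these lines differently.

```python
import re

# Patterns copied from the stderr lines quoted above.
PATTERNS = {
    "period_iterations_num": re.compile(
        r"Number of period iterations for PulseFind set to:\s*(\d+)"
    ),
    "sbs_mb": re.compile(r"Maximum single buffer size set to:\s*(\d+)MB"),
}


def applied_options(stderr_text):
    """Return the option values the app reports in its stderr, if present."""
    found = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(stderr_text)
        if match:
            found[name] = int(match.group(1))
    return found


SAMPLE_STDERR = """Running on device number: 0
Number of period iterations for PulseFind set to:40
Maximum single buffer size set to:128MB
Priority of worker thread raised successfully
"""

if __name__ == "__main__":
    print(applied_options(SAMPLE_STDERR))
```

An empty result would suggest the command-line file was not read, or that the task was started before the file was changed.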
TJ13
Joined: 13 Mar 06
Posts: 35
Credit: 7,970,069
RAC: 6
United States
Message 1383934 - Posted: 23 Jun 2013, 15:30:12 UTC - in response to Message 1383874.  
Last modified: 23 Jun 2013, 15:38:01 UTC


Try re-running the Lunatics installer and choose the non-HD5 app for MultiBeam this time.
Will it fix the issue?

Tried this and still experienced the problem.


So, it's the non-HD5 app with the proposed command line options used.
Looks like this config is able to finish work. Or do you still have driver restarts on other tasks? If yes, try removing -hp from the config line.

I'm only having the problem on SETI@home v7 7.03 (opencl_ati_sah). Every other type of task has successfully run if that's what you're asking.

Edit: Apparently trying a fresh project got it well above the 20% threshold, but the one stuck at 20% is still stuck. I'll check back later on it to see if it's just luck or fixed for the other projects as well.