Posts by Leopoldo


1) Message boards : Number crunching : Panic Mode On (34) Server Problems (Message 1007782)
Posted 1528 days ago by Leopoldo
Wonder how many of those 4.6 million results out in the field will hit the upload server when it comes back online?

That number is slowly decreasing too:

As of 24 Jun 2010 14:50:09 UTC
Results ready to send                          158,484    21,179
Results out in the field                     4,618,405    72,475
Results returned and awaiting validation     6,473,085    75,090
Workunits waiting for validation                     0         0
Workunits waiting for assimilation             947,649     5,141
Workunit files waiting for deletion                 46         0
Result files waiting for deletion                  140         0
Workunits waiting for db purging               937,110        21
Results waiting for db purging               1,981,623       914

As of 24 Jun 2010 15:45:08 UTC
Results ready to send                          161,415    22,761
Results out in the field                     4,615,486    72,475
Results returned and awaiting validation     6,385,568    75,090
Workunits waiting for validation                     0         0
Workunits waiting for assimilation             903,271     5,141
Workunit files waiting for deletion                 11         0
Result files waiting for deletion                   98         0
Workunits waiting for db purging               981,631        21
Results waiting for db purging               2,072,158       914
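
To put a number on that decrease, here is a small Python sketch that uses the first-column "Results out in the field" figures from the two snapshots above. The function name is made up for illustration, and a rate taken from only two data points is a rough estimate at best.

```python
from datetime import datetime

# Figures copied from the two status snapshots quoted above (first column).
t1 = datetime(2010, 6, 24, 14, 50, 9)
t2 = datetime(2010, 6, 24, 15, 45, 8)
in_field_1 = 4_618_405
in_field_2 = 4_615_486


def drain_rate_per_hour(count_before, count_after, time_before, time_after):
    """Return how many results per hour left the 'out in the field' pool."""
    hours = (time_after - time_before).total_seconds() / 3600.0
    return (count_before - count_after) / hours


delta = in_field_1 - in_field_2  # 2919 results fewer in the second snapshot
rate = drain_rate_per_hour(in_field_1, in_field_2, t1, t2)

print(f"decrease: {delta} results in {t2 - t1}")
print(f"approximate rate: {rate:.0f} results per hour")
```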

2) Message boards : Number crunching : Rescheduler weirdness and strange errors (Message 1005157)
Posted 1536 days ago by Leopoldo
In addition I am now seeing weird stuff in the error logs that makes no sense??

17/06/2010 08:30:00 SETI@home Beta Test You used the wrong URL for this project
17/06/2010 08:30:00 SETI@home Beta Test The correct URL is http://setiweb.ssl.berkeley.edu/beta/


This is not a Rescheduler problem: the error is in the description file of the SETI@home Beta project (it uses the same name as the main project). Detach from the Beta project and continue crunching for the main SETI@home project. You can attach to Beta again later.
3) Message boards : Number crunching : Running SETI@home on an nVidia Fermi GPU (Message 1004577)
Posted 1536 days ago by Leopoldo
MadMaC,

<app_name>setiathome_enhanced</app_name> Im guessing that this is where the error is
No, this line of the <workunit> section is correct. It is a multibeam workunit.

<version_num>610</version_num>
<plan_class>cuda</plan_class>
Here, for the Fermi-oriented 6.10 app, it must be

<plan_class>cuda_fermi</plan_class>

Rescheduler 1.9 doesn't know about the new version 6.10 or this new plan_class; neither existed when it was compiled.
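
A quick way to spot such entries is sketched below in Python. It assumes the relevant sections parse as XML and that each entry carries <version_num> and <plan_class> children, as in the fragment quoted above. The sample document, the <client_state> and <result> wrappers, the task names and the function name are all made up for illustration. The sketch only reports mismatches and changes nothing.

```python
import xml.etree.ElementTree as ET

# Values taken from the post above: the 6.10 app is expected to use cuda_fermi.
FERMI_VERSION = "610"
FERMI_PLAN_CLASS = "cuda_fermi"

# Hypothetical sample; a real state file may be laid out differently.
SAMPLE = """<client_state>
  <result>
    <name>hypothetical_task_0</name>
    <version_num>610</version_num>
    <plan_class>cuda</plan_class>
  </result>
  <result>
    <name>hypothetical_task_1</name>
    <version_num>610</version_num>
    <plan_class>cuda_fermi</plan_class>
  </result>
</client_state>"""


def mismatched_plan_classes(xml_text):
    """List entries that name version 610 but a plan class other than cuda_fermi."""
    root = ET.fromstring(xml_text)
    mismatched = []
    for elem in root.iter():
        version = elem.findtext("version_num")
        plan = elem.findtext("plan_class")
        if version is None or plan is None:
            continue  # this element does not describe an app version
        if version.strip() == FERMI_VERSION and plan.strip() != FERMI_PLAN_CLASS:
            # Fall back to the tag name when the entry has no <name> child.
            mismatched.append((elem.findtext("name") or elem.tag).strip())
    return mismatched


print(mismatched_plan_classes(SAMPLE))
```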
4) Message boards : Number crunching : seti, collatz and cuda (Message 997300)
Posted 1564 days ago by Leopoldo
In addition to Claggy's message: Dennis, have you looked at the scheduling values for these projects?
The collatz scheduling priority is roughly double that of seti, can that be changed?

Dennis, you can read the article http://www.boinc-wiki.info/Work_Scheduler to better understand how BOINC chooses which project to crunch at any given moment.
* Work Scheduler does "round-robin" scheduling among Results, attempting to honor Resource Shares. *
If I wanted to crunch 2 projects simultaneously, I would set an equal Resource Share for both of them.

Also, in your particular case, where Collatz was attached later than SETI, the "Long Term Debt" (LTD) for SETI is much lower than the LTD for Collatz.

Back to your question. To equalize BOINC's expectations for projects with equal resource shares, there is a way to change project debts with the "boinccmd" tool. It does require advanced skills and caution, though: using service tools to manipulate BOINC manually is not recommended for the majority of users.

I did it without problems. Back up the BOINC data directory first (just in case). The BOINC directory contains a service executable called "boinccmd.exe". To equalize the debts for 2 projects, SETI and Collatz, you can run this executable with special parameters. The command line is:
boinccmd --set_debts setiathome.berkeley.edu 0 0 boinc.thesonntags.com/collatz 0 0


After this you can check the properties of these projects to see how the numbers changed.
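
If you prefer to assemble that command line in a script, a minimal Python sketch follows. It only builds and prints the argument list and does not run anything. The function name is made up for illustration, and treating the two numbers after each project as the short-term and long-term debt is my assumption about the option's argument order; check it against the help output of your own boinccmd version, which may not accept this option at all.

```python
import shlex


def build_set_debts_command(project_urls, short_term_debt=0, long_term_debt=0,
                            executable="boinccmd"):
    """Build the argument list for the --set_debts call quoted above.

    Assumption: each project is followed by two numbers, taken here to be
    the short-term and the long-term debt.
    """
    args = [executable, "--set_debts"]
    for url in project_urls:
        args += [url, str(short_term_debt), str(long_term_debt)]
    return args


command = build_set_debts_command(
    ["setiathome.berkeley.edu", "boinc.thesonntags.com/collatz"]
)
# Print the command for review instead of executing it.
print(shlex.join(command))
```

Keeping the arguments as a list makes it easy to review the command first and, once you are sure, to hand the same list to subprocess.run.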


Edit: but first, please look at the coproc_debug output.
5) Message boards : Number crunching : HELP!!! My son has reforrmatted my Ext-HD (Message 997162)
Posted 1564 days ago by Leopoldo
ok. will look at these again. I have allowed the BF to try a scan with a different program, I can't remember what it is, I will check this too.

If none of the above helps, take a look at http://en.wikipedia.org/wiki/TestDisk. With it you can choose which filesystem to search for. It can work with external media such as a USB flash disk...
6) Message boards : Number crunching : seti, collatz and cuda (Message 997132)
Posted 1564 days ago by Leopoldo
In addition to Claggy's message: Dennis, have you looked at the scheduling values for these projects?
(Use the project's "Properties" button in BOINC's "Advanced mode" and look for the "... scheduling priority" lines.)

The project with the larger value should be crunched by BOINC sooner than its counterpart...
(Except when the projected crunching time is longer than the task's deadline.)
___________
WBW, Leonid
7) Message boards : Number crunching : Panic Mode On (32) Server problems (Message 996116)
Posted 1570 days ago by Leopoldo
It looks like the upload web server and the corresponding tasks (including result verification and total credit updating) have stopped.


The upload computer is alive:
PING setiboincdata.ssl.berkeley.edu (208.68.240.16) 56(84) bytes of data.
64 bytes from setiboincdata.ssl.berkeley.edu (208.68.240.16): icmp_seq=1 ttl=53 time=343 ms

But the upload web server isn't responding:
telnet: connect to address 208.68.240.16: Connection refused
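
The same check can be scripted. Below is a minimal Python sketch of a TCP connect test; the function name is made up, and port 80 in the commented usage line is my assumption (a plain HTTP upload server), since the telnet output above does not show the port.

```python
import socket


def tcp_port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # "Connection refused", timeouts and DNS failures all end up here.
        return False


# Example call (assumed port 80):
# print(tcp_port_open("setiboincdata.ssl.berkeley.edu", 80))
```

A host that answers ping but returns False here matches the situation above: the machine is up, but nothing is listening on the web server port.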
8) Message boards : Number crunching : libz.so.1?? (Message 994255)
Posted 1578 days ago by Leopoldo
but I'm trying to install a new version of boinc (boinc_6.10.44_i686-pc-linux-gnu.sh) on a Fedora Core 12 machine.


Just a suggestion: if I'm correct, your FC12 machine has the 2.6.31.5-127.fc12.x86_64 kernel installed (i.e. 64-bit Linux). Perhaps the 64-bit BOINC (boinc_6.10.**_x86_64-pc-linux-gnu.sh) would be better suited to such an OS?
9) Message boards : Number crunching : Something wrong somwhere? (Message 994254)
Posted 1578 days ago by Leopoldo
The 210 tasks dahls has are "Ready to report" so they have already been uploaded successfully. Communications with the upload handler are done, it's communicating with the Scheduler that's failing.

Thx, Josef. I missed this.

@dahls: Jørn, can you visit the scheduler URL http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi from the machine in question?

A normal answer looks like:

<scheduler_reply>
<scheduler_version>611</scheduler_version>
<master_url>http://setiathome.berkeley.edu/</master_url>
<request_delay>11.000000</request_delay>
<message priority="low">Error in request message: no start tag </message>
<project_name>SETI@home</project_name>
</scheduler_reply>

In my opinion, something seems to be wrong on the server side.

To check this, look at the server's answer after the upload attempt (the file "sched_reply_setiathome.berkeley.edu.xml").
A normal acknowledgement for a completed task looks like:

...
<result_ack>
    <name>11dc06ag.26523.16023.4.10.112_0</name>
</result_ack>
...

Sorry, I have never seen a rejecting server answer, so I can't tell what it looks like.
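
To pull the acknowledgements and server messages out of such a reply without reading it by eye, something like the Python sketch below may help. It assumes the reply file parses as well-formed XML, which a real file may not. The sample text is stitched together from the two fragments quoted above, and the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical combined sample built from the two fragments quoted above.
SAMPLE_REPLY = """<scheduler_reply>
<scheduler_version>611</scheduler_version>
<master_url>http://setiathome.berkeley.edu/</master_url>
<request_delay>11.000000</request_delay>
<message priority="low">Error in request message: no start tag </message>
<project_name>SETI@home</project_name>
<result_ack>
    <name>11dc06ag.26523.16023.4.10.112_0</name>
</result_ack>
</scheduler_reply>"""


def acknowledged_results(xml_text):
    """Return the task names the server acknowledged in this reply."""
    root = ET.fromstring(xml_text)
    return [(ack.findtext("name") or "").strip() for ack in root.iter("result_ack")]


def server_messages(xml_text):
    """Return (priority, text) pairs for every <message> in the reply."""
    root = ET.fromstring(xml_text)
    return [(msg.get("priority", ""), (msg.text or "").strip())
            for msg in root.iter("message")]


print(acknowledged_results(SAMPLE_REPLY))
print(server_messages(SAMPLE_REPLY))
```

If a task you reported is missing from the acknowledged list, the messages are the first place to look for the server's reason.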

Should I delete everything on the triple core machine and then reinstall BOINC again?

IMHO, with upload troubles this big, which make crunching on this machine useless, that action seems reasonable...
10) Message boards : Number crunching : Something wrong somwhere? (Message 994188)
Posted 1578 days ago by Leopoldo
The triple core machine has probably not been able to get new work sets for at least a week. And it has 210 work sets to report, but it's not able to report them.

First you should report the completed tasks. You can check the availability of the upload server (208.68.240.16) by visiting the URL http://setiboincdata.ssl.berkeley.edu/sah_cgi/file_upload_handler with your favorite browser from this triple core machine.

A normal answer looks like:

<data_server_reply>
<status>1</status>
<message>no command</message>
</data_server_reply>

If it does, check your uploads (completed tasks are reported through the file "sched_request_setiathome.berkeley.edu.xml", so maybe this file can't be uploaded because of its size).

If it doesn't, replace the symbolic host name in the URL with the numeric IP address and visit it again.
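
If you save the page the browser shows, a short Python sketch like this one can tell the normal answer from anything else. It assumes the answer is well-formed XML of the shape quoted above; the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# The normal answer quoted above, as a single string.
EXPECTED = ("<data_server_reply> <status>1</status> "
            "<message>no command</message> </data_server_reply>")


def parse_data_server_reply(text):
    """Return (status, message) from a <data_server_reply> document."""
    root = ET.fromstring(text.strip())
    if root.tag != "data_server_reply":
        raise ValueError(f"unexpected root element: {root.tag}")
    # int("") raises ValueError when <status> is missing.
    status = int((root.findtext("status") or "").strip())
    message = (root.findtext("message") or "").strip()
    return status, message


def is_idle_handler_reply(text):
    """True only for the 'status 1, no command' answer shown above."""
    try:
        return parse_data_server_reply(text) == (1, "no command")
    except (ET.ParseError, ValueError):
        # Plain-text error pages and unexpected XML both land here.
        return False


print(is_idle_handler_reply(EXPECTED))
```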
11) Message boards : Number crunching : Something wrong somwhere? (Message 992765)
Posted 1584 days ago by Leopoldo
An explanation of why this is happened would be nice :)


To see one source of this problem, please visit the address
http://208.68.240.18/sah/download_fanout/testfile
You should see "this is a test", which means the server is working.

Then visit the address
http://208.68.240.13/sah/download_fanout/testfile
If you see "The service is not available. Please try again later.", the 2nd SETI server isn't working, and only the trick with the "hosts" file will let your computers download correctly (don't forget to comment out those lines once the server problem has been resolved).

_____________
WBW, Leopoldo
12) Message boards : Number crunching : CUDA MB V12b rebuild supposed to work with Fermi GPUs (Message 992656)
Posted 1584 days ago by Leopoldo
This is curious. If I click on my own link (which worked yesterday), I get "The service is unavailable".

But if I click on the copy of the exact same link which I posted in the Beta forum a few minutes later, the app downloads OK.


Just as a reminder: in the present situation with the .13 server, we can post modified URLs. For example, instead of
http://boinc2.ssl.berkeley.edu/beta/download/setiathome_6.09_windows_intelx86__cuda_fermi.exe

a working 'hand-made' link is
http://208.68.240.18/beta/download/setiathome_6.09_windows_intelx86__cuda_fermi.exe

and it will work without 'The service is not available' ;)
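
For anyone who wants to produce such links in bulk, here is a minimal Python sketch of the host swap. The function name is made up, and the sketch assumes the numeric server accepts the same path as the named one, which is what the example above suggests.

```python
from urllib.parse import urlsplit, urlunsplit


def with_host(url, new_host):
    """Return url with its host replaced by new_host, keeping path and port.

    Any user:password@ part of the original URL is dropped.
    """
    parts = urlsplit(url)
    netloc = new_host if parts.port is None else f"{new_host}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


original = ("http://boinc2.ssl.berkeley.edu/beta/download/"
            "setiathome_6.09_windows_intelx86__cuda_fermi.exe")
print(with_host(original, "208.68.240.18"))
```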
13) Questions and Answers : Unix/Linux : Stalled tasks. (Message 981011)
Posted 1625 days ago by Leopoldo
IMHO, such a hang looks like the result of using a binary built from initially working source that relies on certain compiler defaults, but compiled with a different compiler version (with different defaults).

For example, the initial variant of the 64-bit openSUSE kernel in 11.3 milestone 3 works if compiled with GCC 4.4.3 and doesn't with 4.4.4: these GCC versions used different default alignment of "struct"s, 8 and 32 respectively.

Such tasks (looping endlessly after writing baseline smoothing into stderr.txt) are common for the stock 6.03 application of the SETI@home Beta project on 64-bit versions of Ubuntu 9.10 and openSUSE 11.2.

Tangerineboy, with such a binary I see no other way than to abort them.
14) Message boards : Number crunching : Panic Mode On (30) Server problems (Message 979149)
Posted 1629 days ago by Leopoldo
Have you ever seen how a broken LAN card works? The on-board LEDs indicate real activity, you can connect to POP and SMTP through telnet and receive answers, and you can send and receive small pieces of data (your telnet commands and the POP or SMTP answers; LAN broadcasts; listings of shared folders)... BUT you can never tell whether data larger than around 200KB will be transmitted or not: you just receive no answer after sending a command... Only a telnet session to an external POP3 server and the "list" command helped me find the broken card. Replacing the card brought the server back into working condition.

Edit: furthermore, I now have a half-working 16-port switch. It has 8 fully working ports and 8 broken ports (ping, SMB messages, telnet and short file transfers work there, but long file transfers don't). After disassembling it and looking at the circuitry I saw 2 control chips and 2 corresponding buffers, so I covered the broken ports with scotch tape and reconnected the 8 remaining UTP cables to another switch. My conclusion: one broken buffer. The other half of this switch is still working...

It's very hard to find such things...
15) Message boards : Number crunching : Problems... (Message 977882)
Posted 1632 days ago by Leopoldo
(btw, is any reason to double mention of all files in heading section? I mean why each file_ref should have corresponding file_info? )

My interpretation is:

  • files specified in the <file_info> sections (right after the <app>) are declarations of all files that may be needed by any version of the specified app (i.e. for a file existence check);
  • and files mentioned in the <file_ref> sections of one <app_version> are the files that must be copied to the corresponding slot, along with the workunit file and the chosen version of the app, for future processing.
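
Under that interpretation, every <file_ref> should name a file that some <file_info> declares. A minimal Python sketch of such a consistency check follows. The sample document, its file names, the <app_info> wrapper and the function name are all made up for illustration, and the sketch assumes the file parses as well-formed XML with <name> inside <file_info> and <file_name> inside <file_ref>.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the second file_ref deliberately has no file_info.
SAMPLE = """<app_info>
  <app><name>example_app</name></app>
  <file_info><name>example_app.exe</name><executable/></file_info>
  <file_info><name>example_lib.dll</name></file_info>
  <app_version>
    <app_name>example_app</app_name>
    <version_num>100</version_num>
    <file_ref><file_name>example_app.exe</file_name><main_program/></file_ref>
    <file_ref><file_name>missing_lib.dll</file_name></file_ref>
  </app_version>
</app_info>"""


def undeclared_file_refs(xml_text):
    """Return file names used in <file_ref> but never declared in <file_info>."""
    root = ET.fromstring(xml_text)
    declared = {(info.findtext("name") or "").strip()
                for info in root.iter("file_info")}
    missing = []
    for ref in root.iter("file_ref"):
        name = (ref.findtext("file_name") or "").strip()
        if name not in declared:
            missing.append(name)
    return missing


print(undeclared_file_refs(SAMPLE))
```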

16) Message boards : Number crunching : AndyK pending tool........... (Message 946930)
Posted 1751 days ago by Leopoldo
I have updated to the same version that Leopaldo is using
...

Hello!

A new version has been uploaded to my mirror; as an attempt to make the script more stable on fast web hosting, delays were added to the line-by-line reading. I tested it with the Geek@Play, John Galt 007 and msattler accounts, and found no lockups at my mirror.

Arkayn, please try it too. Just one thing: the delays for your fast web hosting must be longer than mine. The time-related settings are concentrated around line 655.
17) Message boards : Number crunching : BOINC 6.10.17 released for all users (Message 943897)
Posted 1765 days ago by Leopoldo
But not if you intend to run AQUA's multi-thread CPU applications. Unfinished business.

Agreed. AQUA MT apps seriously slow down SETI CUDA crunching.
18) Message boards : Number crunching : AndyK pending tool........... (Message 943578)
Posted 1766 days ago by Leopoldo
Hello!
Querry by HostID failed...........
Querry by UserID failed...........

Geek@Play, thank you! I have finally seen the script lock up too.

But the problem is much larger than I thought. Looking deeper into the AndyK script functions, I found no handling of a read-stream timeout, only an increase of the timeout for the whole parsing script...

I don't understand why AndyK didn't program a reaction to a read timeout. So, before making any further recommendations, I will experiment at my mirror first.
19) Message boards : Number crunching : AndyK pending tool........... (Message 943391)
Posted 1767 days ago by Leopoldo
When the querry is by User ID the waiting period is 0.50 and fails.

I just tried arkayn's mirror with Geek@Play's account ID and it works.
Hmm, I will ask arkayn to increase the timeout even more...
20) Message boards : Number crunching : AndyK pending tool........... (Message 943307)
Posted 1767 days ago by Leopoldo
Hi, fellow crunchers!

As another variant, I increased the delay to 0.1 sec and the request timeout to 3x its value at my mirror.

John Galt 007 and Geek@Play: I entered your account numbers into the new version of the AndyK tool at my mirror and saw no lockups. Please check for yourself again if you wish. I hope that together we can find a working combination...

___________
WBW, Leonid



Copyright © 2014 University of California