App crashed when opening "Show graphics"

old_user48540
Joined: 31 Jan 05
Posts: 5
Credit: 85,146
RAC: 0
Message 10595 - Posted: 9 Mar 2005, 9:11:09 UTC

Which shouldn't be a problem: just restart the whole thing. But when I did that, the project ended when I had 40 hours left (out of over 800 hours). The app then started to download new zip files, and it says it is starting a new project, which it hasn't done yet since the progress won't leave 0%...

Can I finish the old project, or should I just let it continue on the new one? Rather annoying, since there was only about 6% left to compute...
________________________________
ID: 10595
crandles
Volunteer moderator
Joined: 16 Oct 04
Posts: 692
Credit: 277,679
RAC: 0
Message 10599 - Posted: 9 Mar 2005, 10:54:01 UTC

Do you have a backup of the BOINC folder from before the crash? If so, copying it back should work.
Visit BOINC WIKI for help

And join BOINC Synergy for all the news in one place.
ID: 10599
old_user48540
Joined: 31 Jan 05
Posts: 5
Credit: 85,146
RAC: 0
Message 10601 - Posted: 9 Mar 2005, 11:35:11 UTC - in response to Message 10599.  

Hrrmmff... no... I don't... but I will start doing that from today...
Still, the folders and files in the dataout folder of the interrupted project seem OK, and since they're divided into a number of files, shouldn't it be possible to start from the latest file that seems OK? And if so, where can I find info on doing that?
________________________________
ID: 10601
crandles
Volunteer moderator
Joined: 16 Oct 04
Posts: 692
Credit: 277,679
RAC: 0
Message 10603 - Posted: 9 Mar 2005, 12:47:34 UTC - in response to Message 10601.  
Last modified: 9 Mar 2005, 12:50:25 UTC

> Hrrmmff... no... I don't... but I will start doing that from today...
> Still the folders and files in dataout-folder of the interrupted project seem
> ok since they're divided into a number of files and shouldn't it be possible
> to start from the latest file that seems ok? And if so, where can I find info
> of doing that?
>
Sorry, it isn't possible. Most of the files you see are averages and don't contain the detail. The restart files contain most of the detail and may be OK, but unfortunately there is also info in the client_state.xml file which is also needed and is now lost.

BTW The old unit is still of some use to the scientists.
Visit BOINC WIKI for help

And join BOINC Synergy for all the news in one place.
ID: 10603
Thyme Lawn
Volunteer moderator
Joined: 5 Aug 04
Posts: 1283
Credit: 15,824,334
RAC: 0
Message 10604 - Posted: 9 Mar 2005, 12:56:26 UTC - in response to Message 10603.  
Last modified: 9 Mar 2005, 13:01:18 UTC

> unfortunately there is also info in the client_state.xml file which is also needed and is now lost.

There is an outside chance that it's not too late to recover the job. If you can find the string 45dv_000215544 in 8 different sections of your client_state.xml or client_state_prev.xml file, take a copy of the relevant file, post another message and I'll try to help you out.
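
A minimal sketch of that check in Python, assuming both state files sit directly in the BOINC folder. The function names are made up for illustration, and a raw occurrence count is only a rough stand-in for "8 different sections", since one section could mention the string more than once.

```python
from pathlib import Path


def count_marker(state_file: Path, marker: str) -> int:
    """Return how often `marker` occurs in `state_file` (0 if the file is missing)."""
    if not state_file.is_file():
        return 0
    # errors="replace" keeps the search going even if the file is partly damaged.
    text = state_file.read_text(encoding="utf-8", errors="replace")
    return text.count(marker)


def report(boinc_dir: Path, marker: str) -> dict:
    """Count the marker in the current and the previous client state file."""
    names = ("client_state.xml", "client_state_prev.xml")
    return {name: count_marker(boinc_dir / name, marker) for name in names}
```

Calling `report(Path(your_boinc_folder), "45dv_000215544")` returns one count per file; a file that reports 0 is of no use for the recovery described above.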
"The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer
ID: 10604
old_user48540
Joined: 31 Jan 05
Posts: 5
Credit: 85,146
RAC: 0
Message 10605 - Posted: 9 Mar 2005, 13:07:18 UTC - in response to Message 10604.  

OK, I understand... and as you said, it's lost, since I couldn't find the string. Thank you for your time. I have now scheduled a backup of the BOINC folder to run every night... ;-)

Cheers

________________________________
ID: 10605
old_user56785
Joined: 23 Feb 05
Posts: 55
Credit: 240,119
RAC: 0
Message 10895 - Posted: 15 Mar 2005, 3:12:41 UTC

According to the Workunit page, this model has received "over" status and has been given to a new user to compute.

516610 | 99588 | 31 Jan 2005 11:32:42 UTC | 9 Mar 2005 9:33:23 UTC | Over | Client error | Computing | 2756732.51 | 6332.67
618533 | 27407 | 12 Mar 2005 1:44:21 UTC | 12 Mar 2005 2:45:36 UTC | Over | Client error | Computing | 0.00 | 0.00


Your 760 hours of CPU time are rendered useless as soon as another user finishes this particular model run.


I believe this could have been avoided if the BOINC client were reliable!

For now, making a backup once a day is the best way to compensate for the client's unreliability in running the CPDN models.

Another possibility is to make copies of the client_state.xml and client_state_prev.xml files as soon as you encounter some kind of problem.
Then copy client_state_prev.xml over the original client_state.xml file and restart the machine.
If you are lucky, the model will continue when the client restarts.

If it does not, revert to the backup or consider your work lost!
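
The copy-then-restore steps above could be scripted roughly as follows. This is only a sketch: it assumes the BOINC client has been stopped first and that both files live in the BOINC folder, and the function name and the `state_rescue_...` folder name are invented for the example.

```python
import shutil
import time
from pathlib import Path


def restore_previous_state(boinc_dir: Path) -> Path:
    """Save copies of both state files, then put the previous state in place.

    Returns the folder that holds the untouched copies.
    """
    current = boinc_dir / "client_state.xml"
    previous = boinc_dir / "client_state_prev.xml"
    if not previous.is_file():
        raise FileNotFoundError(f"{previous} not found, nothing to restore from")

    # Step 1: keep untouched copies so the restore can be undone.
    rescue_dir = boinc_dir / ("state_rescue_" + time.strftime("%Y%m%d_%H%M%S"))
    rescue_dir.mkdir()
    for state_file in (current, previous):
        if state_file.is_file():
            shutil.copy2(state_file, rescue_dir / state_file.name)

    # Step 2: overwrite the current state with the previous one.
    shutil.copy2(previous, current)
    return rescue_dir
```

After running it, restart the client and watch whether the model resumes; if it does not, the files in the returned folder are exactly what was there before the attempt.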


N.B. If I read between the lines of the Project Statistics, I estimate that roughly half of the assigned credit is wasted on incomplete models. It is my assumption that a great deal of these wasted resources is due to these client errors.

A bit off topic, maybe, but I wonder what the equivalent is in cups of tea which were heated up and then left untouched to cool down to ambient temperature again?
ID: 10895
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 10899 - Posted: 15 Mar 2005, 6:16:36 UTC
Last modified: 15 Mar 2005, 6:20:07 UTC

> Your 760 hours of CPU time are rendered useless as soon as another user finishes this particular model run.

Not necessarily. The scientists like to compare the same model run on different processors / operating systems,
to check for consistency, and 2/3rds of a run to compare is better than none.

> I believe this could have been avoided if the BOINC client were reliable!

Surprisingly, most of the problems lie with people's computers.
Lots have said: "My computer is brand new, so it must be OK." Or similar.
But when they have performed the checks and tests recommended to them, it often turns out to be a heat problem.
Which is caused by MANY things.

And lots of people have NO trouble completing models. On different processors and operating systems. Myself included.

As for wasted credit, credit doesn't count here. Only trickles and completed models.
What gets wasted a lot are parameter sets, and a lot of this is due to people running their machines automatically,
without checking to see if the program is producing results.

> A bit off topic, maybe, but I wonder what the equivalent is in cups of tea which were heated up and then left untouched to cool down to ambient temperature again?

I agree with this. Good example.

Mine is that it's like a huge jigsaw puzzle, with lots of the pieces missing, probably permanently.

Les

ID: 10899
crandles
Volunteer moderator
Joined: 16 Oct 04
Posts: 692
Credit: 277,679
RAC: 0
Message 10909 - Posted: 15 Mar 2005, 12:56:55 UTC - in response to Message 10895.  

> N.B. If I read between the lines of the Project Statistics, I estimate
> that roughly half of the assigned credit is wasted on incomplete models. It is
> my assumption that a great deal of these wasted resources is due to these
> client errors.
>

I agree with what Les has said. My estimate is that 76% of model years eventually end up in completed runs. For the classic client it was only 72%, so BOINC is more stable than the non-BOINC version. I think we will see further improvement in stability as better techniques get tried out in alpha testing of the sulphur cycle model (and pre-alpha testing of the coupled model?) and are then brought back to the public release. There should also be some improvement as people get more used to the intensive work of CP. However, I do not see a vast improvement on the 76% being possible, as a lot of this relates to the stability of the computers being used.
ID: 10909
old_user56785
Joined: 23 Feb 05
Posts: 55
Credit: 240,119
RAC: 0
Message 10980 - Posted: 16 Mar 2005, 0:52:05 UTC - in response to Message 10909.  

> My estimate is that 76% of model years are
> eventually ending up in completed runs. For the classic client it was only 72%
> so BOINC is more stable than the non BOINC version.

Very unscientific to draw conclusions based on estimates.
If you think that the BOINC client is so stable, then support your claim with calculations on the BOINC stats.

As a gesture I'm willing to raise my estimate from roughly half to 60+%, but 76% seems to me wishful thinking. And even if it were 76%: if only 76% of airplanes made it to the other side of the Atlantic, you would not see me flying across! Even if you told me that the airplanes were perfectly safe.
ID: 10980
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 10984 - Posted: 16 Mar 2005, 2:18:30 UTC

crandles is talking about the pre-BOINC CP results as compared with BOINC CP results.
This was discussed last year, in the classic forum, which is still dead.

Les
ID: 10984
old_user48540
Joined: 31 Jan 05
Posts: 5
Credit: 85,146
RAC: 0
Message 11005 - Posted: 16 Mar 2005, 17:05:08 UTC - in response to Message 10984.  

Just a thought....
Why in Saskatchewan doesn't the client save the client_state.xml file under a unique name with date and time, instead of overwriting the two versions, current and previous...???

Then you could go back to the file from before the crash and continue even without backing up your data (which you of course should do anyway)...

I mean, the XML files are 16 kB each, which is nothing compared to the amount of space the project's files take up. So it wouldn't be anything you'd notice anyhow...

Couldn't this quick and dirty solution increase the number of successful WUs?
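
Nothing in this thread suggests the BOINC client does this itself, but the idea can be sketched as an external script run on a schedule. Everything named here is an assumption for illustration: the function, the `state_history` subfolder and the `keep` limit. At roughly 16 kB per file, the default of 50 snapshots would stay under 1 MB.

```python
import shutil
from datetime import datetime
from pathlib import Path


def snapshot_state(boinc_dir: Path, keep: int = 50) -> Path:
    """Copy client_state.xml to a uniquely named, timestamped file.

    Only the newest `keep` snapshots are retained. Returns the new snapshot.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")

    source = boinc_dir / "client_state.xml"
    history = boinc_dir / "state_history"  # hypothetical folder name
    history.mkdir(exist_ok=True)

    # Microseconds in the stamp keep names unique across quick successive runs.
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S_%f")
    target = history / f"client_state_{stamp}.xml"
    shutil.copy2(source, target)

    # Timestamped names sort chronologically, so the oldest come first.
    snapshots = sorted(history.glob("client_state_*.xml"))
    for old in snapshots[:-keep]:
        old.unlink()
    return target
```

Keeping the snapshots in their own subfolder avoids confusing them with client_state_prev.xml, which the same name pattern would otherwise match.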
________________________________
ID: 11005
old_user56785
Joined: 23 Feb 05
Posts: 55
Credit: 240,119
RAC: 0
Message 11006 - Posted: 16 Mar 2005, 17:49:46 UTC - in response to Message 11005.  

Very good point, Lunkster.

Of the 760 hours you completed, maybe only the last few bits are corrupt.
Restarting the client from a point earlier than where it stopped should overwrite the corrupt data and would therefore have no impact on the model data. Furthermore, it would preserve your efforts.

Was your machine in any way exposed to more heat than usual? I doubt it, but to be sure I would like to hear that from you.

Regards,

Eric.


ID: 11006
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 11009 - Posted: 16 Mar 2005, 19:37:51 UTC

Lunkster,
I think the client_state files are part of BOINC, so you would need to ask them.
Also, I think they have a suggestions / bug report forum. V4.25 has 2 links to their site built in.
But it is an idea.

Les
ID: 11009
old_user48540
Joined: 31 Jan 05
Posts: 5
Credit: 85,146
RAC: 0
Message 11026 - Posted: 17 Mar 2005, 9:42:02 UTC - in response to Message 11006.  

I have no indications of overheating... What happened was that I had a few programs running, and when (out of curiosity) I wanted to open the globe to see the patterns of the model ("Show graphics"), the client froze. After a while, when I was wondering what had happened, the client started up saying the WU had ended and reported what you can see in the project's history. So to me it was something that conflicted inside the client, but that's my unprofessional hypothesis.. ;)
________________________________
ID: 11026
old_user56785
Joined: 23 Feb 05
Posts: 55
Credit: 240,119
RAC: 0
Message 11029 - Posted: 17 Mar 2005, 12:38:30 UTC - in response to Message 11026.  

> I have no indications of overheating... What happened was that I had a few
> programs running, and when (out of curiosity) I wanted to open the globe to see
> the patterns of the model ("Show graphics"), the client froze. After a
> while, when I was wondering what had happened, the client started up saying the WU
> had ended and reported what you can see in the project's history. So to me it
> was something that conflicted inside the client, but that's my unprofessional
> hypothesis.. ;)
>
Prof or unprof, it seems to me you got the picture right.

B.t.w., may I show my respect for your perseverance. I think I wouldn't have started running a new model if the same had happened to me.
Actually, I plan to delay the next model run until a fixed client is released.
My first model ended after 30 hours, due to my own wrongdoing *, but if the present one ends uncompleted, I'm done.

I looked into the source-code overview yesterday, in relation to client_state.
My second impression is that things are not as simple as depicted earlier. The state file might be updated quite often, and read from other state files or the like.
Best would be to have a procedure in place which, on error, goes back a few timesteps and tries again from there, instead of the present "abort on error".
This procedure is already applied to models that go out of bounds or experience model crashes.
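
To make the "go back a few timesteps and retry" idea concrete, here is a generic Python sketch. It is not taken from the BOINC or CPDN source; the function, its parameters and the in-memory checkpoints are all assumptions chosen for illustration.

```python
import copy


def run_with_rollback(step, state, n_steps,
                      checkpoint_every=10, rollback=3, max_retries=5):
    """Run step(state, i) for i in range(n_steps), retrying from an older
    checkpoint when a step raises, instead of aborting on the first error."""
    # Each checkpoint is (index of the next step to run, saved state).
    checkpoints = [(0, copy.deepcopy(state))]
    retries = 0
    i = 0
    while i < n_steps:
        try:
            state = step(state, i)
        except Exception:
            retries += 1
            if retries > max_retries:
                raise  # give up: this is the plain "abort on error" case
            # Discard the newest checkpoints so we land `rollback` back,
            # but always keep the initial one.
            del checkpoints[max(1, len(checkpoints) - rollback + 1):]
            i, saved = checkpoints[-1]
            state = copy.deepcopy(saved)
            continue
        i += 1
        if i % checkpoint_every == 0:
            checkpoints.append((i, copy.deepcopy(state)))
    return state
```

A transient fault is then recomputed from the older checkpoint, while a fault that keeps recurring still ends the run once `max_retries` is exceeded.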

* Using Task Manager, I killed one of the two hadsm... tasks, because the "Graphics Window" was taking more CPU resources than it had before.
The result was that both the graphics and the client ended, and so on.
ID: 11029
