climateprediction.net (CPDN) home page
Thread 'Just lost a WU... :-('

Kenneth Larsen

Joined: 26 Aug 04
Posts: 59
Credit: 438,133
RAC: 0
Message 6855 - Posted: 11 Dec 2004, 12:08:24 UTC

I just lost a wu this morning because the cpu became extremely unstable for no apparent reason. The wu was returned with "client error". Is there any chance that it is still of any worth to them, or is it completely lost? It was at 85% when it crashed, and I've saved the wu folder. I hate the thought of several months' worth of cpu cycles lost :-(
Proud owner of the CPDN Wow-Mug!
ID: 6855
crandles
Volunteer moderator

Joined: 16 Oct 04
Posts: 692
Credit: 277,679
RAC: 0
Message 6862 - Posted: 11 Dec 2004, 15:58:49 UTC

When you say you saved the wu folder, was this before or after the crash (or do you have both)?

If you have a copy from before the crash, it should be possible to try again.
Visit BOINC WIKI for help

And join BOINC Synergy for all the news in one place.
ID: 6862
Kenneth Larsen

Joined: 26 Aug 04
Posts: 59
Credit: 438,133
RAC: 0
Message 6863 - Posted: 11 Dec 2004, 18:27:08 UTC

No, unfortunately it is from after the crash. The only backup I have from before the crash is several weeks old. I guess I can only learn from that...
ID: 6863
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 6876 - Posted: 12 Dec 2004, 1:51:42 UTC

Commiserations, Kenneth. I lost my 1st wu, (at about 76% I think).

At the time, I was running TeaTimer, the resident part of SpyBot, with the
monitor off. There was a sound like someone banging on a large hollow pipe,
(almost a "boink" sound), and when I got a display, there were several warning
windows on the screen. They said that either CP or BOINC (I forget which), had
tried to write to the registry.
Can't remember much else about that episode, but I got the wu working OK.

Then, a week or so later, in the early hours of the morning, I was woken up by
someone banging on a pipe. Then I realised it was coming from the computer, and
turned on the monitor. This time there were about 14 warning windows cascaded
down the screen. And some more appeared as I watched. The messages all seemed to
be the same, but were in German. And the only cure was a reboot. And the Restart
icon was missing. Can't remember if I had to use The Big Red Button or not, but
when the computer was back up the wu had crashed, and a new one was downloaded
before I realised what was going on.

However, recently, someone from the project (Tolu? Dave Frame?) said that even
partial results such as this are useful. I suggest you save the wu folder the
same as a completed wu. Mine is still there.

Les


Backups: Here
ID: 6876
Kenneth Larsen

Joined: 26 Aug 04
Posts: 59
Credit: 438,133
RAC: 0
Message 6882 - Posted: 12 Dec 2004, 9:57:42 UTC

That sure was very unlucky! Where exactly did the sound come from?
I've now started backing up the BOINC folder every week or so; at least then I won't lose months of work.
The reason for my crash was that the cpu had overheated; it has done the same before. Problem is, back then it was at 61C, now it overheats at 55C. Brand new cpu even :-(

I hope you are right about partial results being useful too.
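For anyone wanting to automate the weekly backup mentioned above, here is a minimal sketch in Python. It is not an official CPDN or BOINC tool: the function name `backup_boinc`, the `keep` count and the example paths are all assumptions made for illustration. It copies the whole BOINC folder into a new timestamped folder and prunes the oldest copies. It is probably safest to exit BOINC before running it, so the model is not writing files while they are copied.

```python
import shutil
import time
from pathlib import Path


def backup_boinc(boinc_dir, backup_root, keep=4, stamp=None):
    """Copy boinc_dir into backup_root/boinc-<stamp> and keep only the newest `keep` copies.

    All names here are hypothetical; adjust them to your own setup.
    """
    boinc_dir = Path(boinc_dir)
    backup_root = Path(backup_root)
    backup_root.mkdir(parents=True, exist_ok=True)

    # Timestamps in this format sort alphabetically in date order.
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    target = backup_root / f"boinc-{stamp}"
    shutil.copytree(boinc_dir, target)

    # Remove the oldest snapshots so the backup disk does not fill up.
    snapshots = sorted(
        p for p in backup_root.iterdir()
        if p.is_dir() and p.name.startswith("boinc-")
    )
    for old in snapshots[:-keep]:
        shutil.rmtree(old)
    return target


# Example call (hypothetical paths, not taken from this thread):
# backup_boinc(r"C:\Program Files\BOINC", r"D:\boinc-backups")
```

Keeping several dated copies rather than one means a backup taken just after a silent crash does not overwrite the last good one.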
ID: 6882
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 6890 - Posted: 12 Dec 2004, 21:31:17 UTC

As far as I know, the sound was from the speaker. I'm not sure if it was a M'Soft sound,
or one supplied with SpyBot.

I had been following the progress of the two WUs closely, and it took a long
time to get over the loss of one of them.

One of the excuses (sorry, reasons), for getting the new computer was to run
ClimatePrediction, and I bought all the components and assembled it myself. So I knew the parts were good quality, and should last for a few months before becoming
obsolete. I did a lot of research first, but after reading the info on this site,
especially some of the stuff by UK_Nick, it seems I didn't know much at all.

48 hours and 20 minutes to completion of another wu. I'm about to make final
preparations for backup, and prevention of a new download. Then I can try to restore a crashed wu from some time back, to see if I can do it, to hopefully
improve the look of my Results page, and to try and add another completed run to
the data available to the scientists.

The embarrassing part of it was that I crashed it by being careful to backup
the BOINC directory, and all my own data files before enlarging the C: drive
with PartitionMagic, and then applying the SP2 upgrade.

I had originally chopped up my 120Gig drive into 10Gig partitions, so that I could
put everything in their own areas, but BOINC insisted on being on the C: drive.
I didn't understand any of it, so I didn't meddle, but this meant that C: was
getting full. And I had never used a CD drive that could write, so I had to learn that.

I got instructions from XP magazines. (Got lots of help from them.)
I had to put the cp stuff onto several cds. Folders dragged OK, but there were a few loose files in BOINC and in climateprediction, so I proceeded to drag them
to the "copy" window. That's when I found that I had MOVED instead of COPYING.
So, copy them back. OK. Burn the cd. OK. Now to repartition. Not a problem.
Run the SP2 upgrade. Took a while, but again, no problems. Now to reboot.

As they say in Thomas the Tank Engine: Then there was trouble!
Can't remember exactly but something like:

XP saying something / Spybot complaining / 2nd XP window. Wants to know if BOINC
is allowed to talk on port 80 (that's nice. I won't have to try and do that myself).
"It's allowed" / close the SpyBot window / Read the 1st XP window and close it /
Hang on! Why is the dialup window open? / finally remember to open the BOINC gui
and look at the messages. One wu is dead and a new one downloaded. Already! / Disable net access. Start breathing again.

I had originally thought that I had copied the file back to the wrong folder,
but about 2 weeks ago, after reading advice about backups, and "remember to
make the files R/W", I suddenly realised that THAT is what I had missed.
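The "make the files R/W" step can be scripted. The sketch below is only an illustration, assuming the restored folder came back from CD with the read-only flag set, which is a common result of copying from a CD. The function name `make_writable` is hypothetical. It walks the restored folder and clears the read-only flag wherever it finds it.

```python
import stat
from pathlib import Path


def make_writable(folder):
    """Clear the read-only flag on everything under folder; return how many items changed."""
    root = Path(folder)
    changed = 0
    # Build the full list first, then change permissions.
    for path in [root, *root.rglob("*")]:
        mode = path.stat().st_mode
        if not mode & stat.S_IWRITE:
            # On Windows this only toggles the read-only attribute.
            path.chmod(mode | stat.S_IWRITE)
            changed += 1
    return changed


# Example call (hypothetical path): make_writable(r"C:\Program Files\BOINC")
```

Running it once after copying a backup into place, and before starting BOINC, should avoid the situation described here, where the client cannot update its files and gives up on the wu.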

So, I'm going to tempt fate and try a restore from backup. I think it was at
about 50%, so it shouldn't take too long to run.

But I'm reluctant to upgrade from BOINC 4.05; it works OK, and I don't want
more problems.


Les

Backups: Here
ID: 6890
Kenneth Larsen

Joined: 26 Aug 04
Posts: 59
Credit: 438,133
RAC: 0
Message 6901 - Posted: 13 Dec 2004, 8:41:49 UTC

I haven't had any problems upgrading BOINC to the newer version with CPDN running (I've done it several times). If it makes trouble, delete BOINC and install the older version you were using before, then copy over the backup directory and overwrite. But then, if version 4.05 works fine for you, and you are only running CPDN, I guess there is no need to upgrade.
If you are embarrassed by your results page, take a look at mine - I have several units that will never be completed because they either didn't download correctly or they wouldn't run when downloaded. I've tried sending a mail to CPDN asking them to free them from my account, but haven't heard anything (that was several months ago).

I'll try reading the hardware info by UK_Nick, since I am constantly trying to make my computers run faster. I've never visited the old CPDN forums before, maybe I should start now. :-)
ID: 6901
old_user993
Joined: 23 Aug 04
Posts: 49
Credit: 183,611
RAC: 0
Message 6907 - Posted: 13 Dec 2004, 16:24:30 UTC - in response to Message 6876.  

> However, recently, someone from the project (Tolu? Dave Frame?) said that even
> partial results such as this are useful. I suggest you save the wu folder the
> same as a completed wu. Mine is still there.

As the experiment evolves we want to delve more into the data. At the moment we're working quite hard to develop the server side of the project. This will let us examine the results more thoroughly (including partial results, which we haven't really had the resources to look at so far (because of the comparatively low-tech way we've gone about analysing the data)). [We had a couple of personnel setbacks on the server side early last year, which didn't help.]

It's frustrating when the model dies when it's so close to completion, I know (one of the things we have planned for next year is smarter run control code to help the client gauge its progress a bit better).

In defence of the model, though, you could think of it like this: most DC projects are very low probability tasks that look either at signals (SETI) or at large numbers of ways of combining things (protein folding stuff). Both expect a null result, in general, with the odd spectacular success if they're lucky (I stand to be corrected on this). We, on the other hand, get useful data back from pretty much every run: even treating completed runs as "positives" and incomplete runs as "negatives",** we get a high strike rate compared to other DC projects. This is because we're looking to span a range of physical behaviours (sample the phase space of a complex model) rather than looking for a needle in a (combinatorial or Fourier space) haystack. I'm not meaning to be rude about other DC projects here - I think they're all innovative, efficient ways of addressing important scientific issues - just to point out that our rather epic work units often have a higher strike rate (in terms of most ways of defining work unit "success") than most other DC projects.

**Actually it's quite a bit more complex than this, but it'll do as a zeroth order approximation to make the point. The issue of how and why runs fail is something we want to look at - just a matter of when!

Dave
ID: 6907
old_user17525
Joined: 13 Sep 04
Posts: 161
Credit: 284,548
RAC: 0
Message 6909 - Posted: 13 Dec 2004, 17:29:54 UTC
Last modified: 13 Dec 2004, 18:16:29 UTC

Hi Dave,

I think you should post that reply on the other board as well. There are a number of threads where it keeps cropping up.

Marj
_________________________________
ID: 6909
Kenneth Larsen

Joined: 26 Aug 04
Posts: 59
Credit: 438,133
RAC: 0
Message 6912 - Posted: 13 Dec 2004, 18:41:05 UTC

Hi Dave, I thank you for your informative post. Does that mean that no wus have as yet been analyzed by the scientists?
ID: 6912
crandles
Volunteer moderator

Joined: 16 Oct 04
Posts: 692
Credit: 277,679
RAC: 0
Message 6914 - Posted: 13 Dec 2004, 22:01:44 UTC

What do you mean by 'analysed'?

They certainly haven't done all the analysis they are going to do yet. (Carl made a post suggesting most work would be in 2 - 10 years)

They have started looking at the results. For example, at the open day they commented on the sparsity of results with a climate sensitivity of less than 2 degrees.
Visit BOINC WIKI for help

And join BOINC Synergy for all the news in one place.
ID: 6914
old_user23880
Volunteer tester

Joined: 10 Oct 04
Posts: 223
Credit: 4,664
RAC: 0
Message 6917 - Posted: 14 Dec 2004, 1:40:58 UTC

Yes, the models are being analysed! I believe that one of the PhD students is starting to actually write his thesis.
__________________________________________________

ID: 6917
old_user993
Joined: 23 Aug 04
Posts: 49
Credit: 183,611
RAC: 0
Message 6920 - Posted: 14 Dec 2004, 9:23:55 UTC - in response to Message 6912.  
Last modified: 15 Dec 2004, 11:11:03 UTC

> Hi Dave, I thank you for your informative post. Does that mean that no wus
> have as yet been analyzed by the scientists?

There's plenty of analysis going on. The story is that Dave Stainforth (project's Chief Scientist) has completed the analysis of some 2500 early runs, and is hoping to publish the results early next year. Getting the first paper through was always likely to be a bit of a drama on account of the novel methodology of the experiment. When we talk to fellow scientists about the project we're often asked variants on the same themes: how do we know the models are reasonable? can you meaningfully compare across different platforms? a lot of these models are different from the standard model - how seriously can we take them? why don't you get back (insert favourite diagnostic) - it really matters?! We have answers to all these (and more!) but it takes time to persuade peer-reviewers (who are charged with being very sceptical) of the methodology. Once we've done that, we still have to persuade them that the science is good (really good!). I think a part of the long wait we've had is because the problem throws up so many interesting and reasonable questions, and it takes time to work through them all.
[One interesting thing about the project's scientific "credibility" is the way the idea seems to get slowly picked up: it was first suggested in 1999, and everyone thought Myles was a bit nuts. Then Dave S and Myles worked on it for a few years, they got a few grants and managed to get something up and running, and gradually UK folks came around to the idea. When I started here in early 2002 it was still fringe, but was gaining more "credibility" through the painstaking technical work Dave and Jamie Kettleborough were doing. The week before we launched (Sept 2003) I gave a talk at the Royal Met Soc's conference and (a) everyone had heard of it (b) nobody thought it was particularly weird. But talking about it to more international audiences, one still finds "pockets of resistance" - people who haven't heard of it and find it all a bit strange and unfamiliar. Having said that, a lot of folks in the US and Europe have been really enthusiastic and interested in what we're doing (to the point of preparing bids for similar projects). As we start to publish, I hope we'll get even more of the community on board.]

Once we've published the first results paper, we ought to be able to get several more on the way quite quickly: we've been doing analysis in parallel, so there are actually several papers in various stages of completion, all of which we'd like to submit in the first half of next year.

In the medium term, there's loads of analysis to be done, and I think (and hope) the climate science community will be looking at the cpdn dataset (that you're all helping to build) for quite a few years.

Dave
ID: 6920
old_user23880
Volunteer tester

Joined: 10 Oct 04
Posts: 223
Credit: 4,664
RAC: 0
Message 6942 - Posted: 15 Dec 2004, 0:15:37 UTC

Dave, your reply helps to put it all in context and shows how valuable the work is, both yours and ours. I don't know whether this will make you grin or groan, but if there's ever another open day, I imagine there'll be a big crowd there.

It can't be easy to coordinate a research programme where thousands of people are taking part rather than the usual small group.
__________________________________________________

ID: 6942
old_user993
Joined: 23 Aug 04
Posts: 49
Credit: 183,611
RAC: 0
Message 6964 - Posted: 15 Dec 2004, 11:36:24 UTC - in response to Message 6942.  

> Dave, your reply helps to put it all in context and shows how valuable the
> work is, both yours and ours. I don't know whether this will make you grin or
> groan, but if there's ever another open day, I imagine there'll be a big crowd
> there.

We'll look into it next year (I'm reluctant to commit us to anything this far in advance). Next year's would be a bit different in that we'd be talking much more about results, I think. Which is kind of neat. Personally, I think the video record was really useful, because it gives us a chance to refer to a set of archived talks which set out quite a nice picture of the project. I like the idea of this being publicly available. If we had one next year, the video record of the talks would be outlining our results, and I think that would be really useful - real scientists talking about what they did with your data, and what it means for our understanding of the climate system. I think that would be a considerable advance in terms of engaging the public in climate research.

> It can't be easy to coordinate a research programme where thousands of people
> are taking part rather than the usual small group.

The demands are very different. It's actually a very strange job - it's kind of like a hybrid between a regular academic project and a high-tech start-up company: on one hand there is lots of research to be done, which entails the usual procedures (getting data, writing code, analysing the data, debugging the code, analysing the data again, talking to your group, changing the analysis, talking to your group again, trying to write papers...) but then there is a quasi-business side to it, too: evaluating workplan options, liaising with our numerous partners about who can/should do which tasks, dealing with various human resource issues, finding compromises between doing things (esp. software engineering) as they would be done in a perfect world vs as they need to be done to meet real world targets, keeping our clients (participants and their PCs) happy, increasing numbers of clients through marketing (subject to the usual constraints of good scientific practice), etc. One of the things I like about the job is how varied it is. One of the things that really winds me up is how varied it is: I feel I never get a long enough run at any one task (especially research). It can be incredibly frustrating, but usually it's great. Most of the time I feel kind of privileged to work (a) at such a fine old institution as Oxford and (b) on such an innovative, interesting, scientifically-valuable experiment which has the ability (if we do our jobs right) to push boundaries in several disciplines (climate research, distributed computing/eScience, public understanding of and engagement with science).

Dave
ID: 6964
old_user2474
Joined: 28 Aug 04
Posts: 10
Credit: 334,771
RAC: 0
Message 7034 - Posted: 20 Dec 2004, 20:49:32 UTC

Thanks, Dave, for the detailed description of what is taking place. I would say that few of us involved in CPDN have ever been involved in science at such a cutting edge as it is here. Just to know that the boundaries of science can be pushed forward by the work our computers at home are doing is exciting in itself. To know that we are adding to a knowledge base, by our little efforts (huge when all participants are added together) that will help create a better understanding of our climate, makes us all first rate scientists of the ilk that we have only read about in our past. To create a better climate model takes time and data. We are helping to supply this data and the folks at CPDN have the biggest chore: sifting through mountains of data to find the answer. Seems to me that there has never been enough data to really push climate prediction forward very far before CPDN. We will supply the data and you can have fun finding a better understanding of our climate.

CPDN ------> pushing forward the understanding of climate prediction!!!!
ID: 7034
