Thread 'Bad Work Units'

Message boards : Science : Bad Work Units

Joined: 23 Aug 04
Posts: 49
Credit: 183,611
RAC: 0
Message 17933 - Posted: 9 Dec 2005, 11:49:09 UTC

Hi everyone,

Over the last few weeks we've had something of a problem with the work units, which has only just come to light. Basically, we've released a bad batch of work units. Things were fine in the sulphur cycle experiment when we were simply perturbing S-cycle parameters, but there was a hitch (actually a series of hitches) when we attempted to release experiments with both atmospheric and S-cycle perturbations. One hitch involved not releasing the perturbations we thought we were releasing; the second involved not doubling the CO2 in the appropriate phases.

Given that the main thing we want from these runs (for the coupled model experiment) is the downwelling fluxes diagnosed in phase 1, these experiments are still useful (if unexciting from a CO2 point of view). They might also be handy for examining some of the more technical issues in the 32-bit reproducible experiment. What we don't have from these runs is their climate sensitivity. This mainly affects an as-yet-unknown post-doc on an as-yet-unfunded proposal that is still only a twinkle in our eye, so while we regret the error, we're satisfied it's not especially significant.

We (Tolu here in the basement and Carl in some airport lounge, I guess...) have now fixed these glitches and we should have some correct (and heavily scrutinised) experiments in the queue.

One of the beauties of doing your science in public is that people get to see your screw-ups (and in this case point them out!). This is kind of embarrassing, but also kind of a good thing. Pretty much every scientist I know makes mistakes (and if they don't, I bet they take too long to get anything done...). It's reasonable that you should see some of this. [There have been countless climate model runs that have had to be stopped, reworked, and started again because someone chose the wrong start files/boundary conditions/control switches/etc. I know how many goes I had at getting the UM to run when I was in Reading!]

Thanks to Tolu and Carl (again) for (again) bailing us out of a problem that wasn't of their making. Thanks to all our participants for their patience.


Joined: 15 Oct 05
Posts: 4
Credit: 844
RAC: 0
Message 18144 - Posted: 13 Dec 2005, 15:15:56 UTC

Hi everyone, it's Duncan (the PhD student who helped set up the sulphur cycle experiment).

I couldn't find my old username and key etc., so I've logged on with the account I set up for my Dad's computer. Sorry about that.

To second what Dave has said, these sorts of things happen all the time when running the model locally, except that normally you can give yourself a slap on the wrist and restart it without any problems. Sorry for the mistakes that have occurred...

Anyway, I'm here to show you what we have so far, and how good the results you've all been so kindly churning out for us look when they return. I have several PostScript files with good images on them, but I'm not sure how best to get them onto the discussion boards. Does anyone have any suggestions? (I'll ask elsewhere too.) Hopefully I can then show you what the future may hold when we combine the cooling effect of sulfate with the warming of CO2!

Again, sorry for the screw-ups, but thank you so much for sticking with us. We REALLY appreciate it.

Volunteer moderator

Joined: 5 Aug 04
Posts: 390
Credit: 2,475,242
RAC: 0
Message 18147 - Posted: 13 Dec 2005, 15:46:34 UTC - in response to Message 18144.  
Last modified: 13 Dec 2005, 15:47:21 UTC

[Duplicate post deleted]
P.S. I'm sure there are more people interested in such plots who will follow any discussion on that topic.
phpBB forum for CPDN, all are invited
Volunteer moderator

Joined: 5 Aug 04
Posts: 390
Credit: 2,475,242
RAC: 0
Message 18148 - Posted: 13 Dec 2005, 15:46:34 UTC - in response to Message 18144.  

I have several PostScript files with good images on them, but I'm not sure how best to get them onto the discussion boards. Does anyone have any suggestions?
I would make PDF files from them: everyone can then download, view, and print them at any resolution or zoom, provided they really are PostScript/vector files and not bitmaps.
Using them on a forum would require converting the PostScript to .png (or .gif) files (an easy task for me) and uploading them to a server, with links in the forum post(s).

If you need help with it, you can provide me with a link to download them or send them to me via e-mail (preferably zipped, as this should shrink the size of PostScript files significantly).
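For readers who want to try the two conversions described above themselves, here is a minimal Python sketch. It assumes Ghostscript (gs) and its ps2pdf wrapper are installed; the file name sulphur.ps and the 150 dpi resolution are illustrative choices, not what was actually used for these plots.

```python
import shutil
import subprocess
from pathlib import Path


def ghostscript_png_cmd(ps_file: str, dpi: int = 150) -> list[str]:
    """Build a Ghostscript command that rasterises every page of a
    PostScript file to numbered 24-bit PNGs (name-1.png, name-2.png, ...)."""
    stem = Path(ps_file).with_suffix("")
    return [
        "gs", "-dSAFER", "-dBATCH", "-dNOPAUSE",
        "-sDEVICE=png16m", f"-r{dpi}",
        f"-sOutputFile={stem}-%d.png",  # %d is expanded by gs to the page number
        ps_file,
    ]


def ps2pdf_cmd(ps_file: str) -> list[str]:
    """Build a ps2pdf command; the PDF keeps vector plots resolution-independent."""
    return ["ps2pdf", ps_file, str(Path(ps_file).with_suffix(".pdf"))]


def convert(ps_file: str) -> None:
    """Run both conversions, skipping any tool not installed on this machine."""
    for cmd in (ps2pdf_cmd(ps_file), ghostscript_png_cmd(ps_file)):
        if shutil.which(cmd[0]) is None:
            print(f"skipping: {cmd[0]} not found")
            continue
        subprocess.run(cmd, check=True)
```

Calling convert("sulphur.ps") would write sulphur.pdf plus one PNG per page. The PDF route keeps vector plots sharp at any zoom; the PNG route is what an image link in a forum post needs.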

Volunteer moderator

Joined: 16 Oct 04
Posts: 692
Credit: 277,679
RAC: 0
Message 18151 - Posted: 13 Dec 2005, 16:05:14 UTC

Thanks Duncan,

Yes, I am also sure there are more people who would be interested.
Visit BOINC WIKI for help

And join BOINC Synergy for all the news in one place.

Joined: 23 Feb 05
Posts: 55
Credit: 240,119
RAC: 0
Message 18152 - Posted: 13 Dec 2005, 16:12:43 UTC

Shouldn't those PDFs be sent to Tolu, so he can place them on the website after they have been verified as authentic?

Joined: 15 Oct 05
Posts: 4
Credit: 844
RAC: 0
Message 18180 - Posted: 14 Dec 2005, 10:31:39 UTC

Thanks for that, everyone. I'll have a word and see if we can get something posted on the website; if not, it'll go here. Watch this space!

Thanks again,

Joined: 15 Oct 05
Posts: 4
Credit: 844
RAC: 0
Message 18535 - Posted: 21 Dec 2005, 8:50:05 UTC

Hi everyone,

Just so you know, we've put up some images from the sulfur cycle runs that you may be interested in. They show the amount of sulfate in the column of air above each grid box, and you can see how the distribution of sulfate changes from present-day emissions to expected 2050 emissions. They also highlight how inhomogeneous sulfate concentrations are and why they are so uncertain in models.

So a big thank you to you all for running it!

Many thanks
Volunteer moderator

Joined: 5 Aug 04
Posts: 390
Credit: 2,475,242
RAC: 0
Message 18538 - Posted: 21 Dec 2005, 9:24:14 UTC - in response to Message 18535.  

It's great news that the sulphur cycle models are producing results.
Where can we see the images you mentioned?

Volunteer moderator

Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 18565 - Posted: 21 Dec 2005, 17:52:52 UTC

Hi Duncan

After the Christmas and New Year break, which I hope you will all enjoy, could you ask some of the other PhD students to post and let us know what particular aspects they are researching? Things must have moved on a bit since some of us heard the students' presentations in Oxford.
Cpdn news

Joined: 23 Aug 04
Posts: 49
Credit: 183,611
RAC: 0
Message 18566 - Posted: 21 Dec 2005, 17:54:52 UTC - in response to Message 18565.  

Hi Duncan

After the Christmas and New Year break, which I hope you will all enjoy, could you ask some of the other PhD students to post and let us know what particular aspects they are researching? Things must have moved on a bit since some of us heard the students' presentations in Oxford.

Great idea.



Joined: 8 Feb 05
Posts: 19
Credit: 20,077
RAC: 0
Message 19113 - Posted: 10 Jan 2006, 3:40:31 UTC
Last modified: 10 Jan 2006, 3:50:45 UTC


***Can anyone from please confirm that it's normal and acceptable to have 4 straight "bad" Sulphur model runs where a fatal error(s) terminated these runs, typically around the 100 CPU hours mark?

I've previously had 4 models completed (3 CO2 models and 1 Sulphur model).

But now, I've had 4 straight "bad" Sulphur model runs where a fatal error(s) terminated these runs, typically around the 100 CPU hours mark.

These were downloaded in late Dec '05 and early Jan '06.
So perhaps the "bad" Sulphur experimental parameters mentioned by David Frame should have been removed by now (his posting was in early Dec '05)?

I'm getting quite discouraged by all these fatal errors.

[Edit: It's from the same computer; I don't know why treated my computer as 3 separate computers...

And I've checked my computer by running all the benchmarks at the URL links below:

Joined: 15 Oct 05
Posts: 4
Credit: 844
RAC: 0
Message 19124 - Posted: 10 Jan 2006, 8:24:54 UTC


Thanks for that. I'm not aware of any current problems (this is the first I've heard of this). Has anyone else had these problems? We are having a meeting tomorrow (Wednesday), and I'll bring this up when I'm at Oxford.

Sorry for the problems that have occurred.

Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 19125 - Posted: 10 Jan 2006, 8:46:03 UTC

@wang nala

1) Some people have LOTS of problems with the models, and some don't have any.
It depends on your computer hardware / software / program usage / room environment.

2) Each time you disconnect and reconnect to a project, you get a different computer ID. There are other ways this happens, but this is the most likely cause of you having "3" computers.
You can merge them to the latest ID; look at one of the computers listed on your account page: there will be a merge option near the bottom.


Joined: 24 Feb 05
Posts: 7
Credit: 705,069
RAC: 0
Message 19329 - Posted: 15 Jan 2006, 7:04:46 UTC

I have had issues similar to wang's, so we may not be alone here. I ran slab models fine, but every sulphur one I have run has crashed with a -187 error. I have done some hardware diagnostics, including prime95, and haven't found a problem yet. Here is my host:
I am currently detached while waiting for any ideas to fix the issue, so the 3 it shows as "In progress" are already lost.
Volunteer moderator

Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 19337 - Posted: 15 Jan 2006, 13:47:37 UTC - in response to Message 19329.  

I have had issues similar to wang's, so we may not be alone here. I ran slab models fine, but every sulphur one I have run has crashed with a -187 error. I have done some hardware diagnostics, including prime95, and haven't found a problem yet. Here is my host:
I am currently detached while waiting for any ideas to fix the issue, so the 3 it shows as "In progress" are already lost.

Unfortunately, the 187 error is a red herring as it is trying to upload files for phases that have not yet been entered. Is there anything in the messages tab about errors preceding the failed upload error lines?

The s/TS for this PC is quite fast given its specs. Is it overclocked? Even though Prime95 ran stable, it might be good to try at its regular clock to see if that changes things. Also, if there is a yabsd.out type file in one of the failed experiment directories, could you paste the last 20 lines or so of that file (assuming they would be similar among errored models) in your response to this message?
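To pull out the last 20 or so lines requested above, a short Python helper like the one below could be used. It is only a sketch: it assumes you point it at yabsd.out (or the compressed yabsd.out.gz that appears later in this thread) inside a failed model's directory, and the exact location of that directory on disk is not given here.

```python
import gzip
from collections import deque
from pathlib import Path


def tail_lines(path: str, n: int = 20) -> list[str]:
    """Return the last n lines of a text file, reading gzip-compressed
    files (e.g. yabsd.out.gz) transparently."""
    opener = gzip.open if Path(path).suffix == ".gz" else open
    with opener(path, "rt", errors="replace") as f:
        # A deque with maxlen keeps only the newest n lines in memory,
        # so even a very long model log is read in a single pass.
        return [line.rstrip("\n") for line in deque(f, maxlen=n)]
```

For example, print("\n".join(tail_lines("yabsd.out"))) prints the tail ready to paste into a reply.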

Joined: 24 Feb 05
Posts: 7
Credit: 705,069
RAC: 0
Message 19984 - Posted: 5 Feb 2006, 7:03:09 UTC

REPLANCA - time interpolation for field 77
time,time1,time2 5820.000 5760.000 6480.000
hours,int,period 5820 720 8640
Information used in checking ancillary data set:
position of lookup table in dataset: 36
Position of first lookup table referring to data type 4
Interval between lookup tables referring to data type 4
Number of steps 8
STASH code in dataset 126 STASH code requested 126
'Start' position of lookup tables for dataset in overall lookup array
im,sm,ngroup,new_im,new_sm 1 1 48 T F
Model aborted with error code - 1 Routine and message:-
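The REPLANCA lines above report hours,int,period = 5820 720 8640 and time,time1,time2 = 5820 5760 6480. One reading that reproduces those numbers, assuming the ancillary fields are held every 720 hours (30 days) in an 8640-hour (360-day) year and the model interpolates linearly between the two fields that bracket the current time, is sketched below. The same arithmetic also matches the 3660 / 3600 / 4320 values in a later log in this thread. This is a hypothetical reconstruction for illustration, not the Unified Model's own code.

```python
def bracketing_times(hours: int, interval: int, period: int) -> tuple[int, int, float]:
    """Hypothetical reconstruction: given model time in hours, the ancillary
    update interval and the annual period, return the two ancillary times
    that bracket it and the linear weight given to the later one."""
    t = hours % period                  # position within the (360-day) year
    time1 = (t // interval) * interval  # last ancillary time at or before t
    time2 = time1 + interval            # next ancillary time (may equal period)
    weight = (t - time1) / interval     # 0.0 at time1, approaching 1.0 at time2
    return time1, time2, weight


def interpolate(field1: float, field2: float, weight: float) -> float:
    """Linear blend of the two bracketing ancillary values."""
    return (1.0 - weight) * field1 + weight * field2
```

With the logged values, bracketing_times(5820, 720, 8640) gives 5760 and 6480 with a weight of 60/720, i.e. the field is one twelfth of the way from one monthly value to the next.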


Joined: 25 Jan 06
Posts: 1
Credit: 0
RAC: 0
Message 20007 - Posted: 6 Feb 2006, 23:50:00 UTC

I'm new at this, and confused. I'm not even sure that I am in the right place!
Having been a SETI fan for many years, I jumped on the BOINC bandwagon with naive enthusiasm and had my machine throw a mental breakdown. I persevered, and all is reasonably well, if I don't mind the machine locking up in screensaver mode from time to time and having to be reset.
That isn't the problem. The problem is that Einstein runs beautifully and Rosetta works beautifully, but Climate Prediction is a total failure! I have only once seen the graphics, or rather some of the graphics: a white world, a blue outline map, and a little message saying "Please wait". Then there's the pop-up that says, "Visual Fortran Runtime Error - forrtl: severe (30): open failure, unit 6, file CONOUT$ Stacktrace terminated abnormally." What's all that about?
On the one occasion that I did manage to 'unlock' the machine from the project without having to reboot, I saw and copied the following:
02/02/06 22:29:25||Starting result sulphur_j402_100891650_1 using sulphur_cycle version 422
02/02/06 22:54:37||request_reschedule_cpus: process exited
02/02/06 22:54:37||Computation for result sulphur_j402_100891650_1 finished
02/02/06 22:54:37|rosetta@home|Resuming result TERMINI_2reb_294_7555_0 using rosetta version 481
02/02/06 22:54:38||Sending scheduler request to
02/02/06 22:54:38||Reason: To fetch work
02/02/06 22:54:38||Requesting 259200 seconds of new work
02/02/06 22:54:39||Unrecoverable error for result sulphur_j402_100891650_1 (<file_xfer_error> <file_name></file_name> <error_code>-161</error_code> <error_message></error_message></file_xfer_error><file_xfer_error> <file_name></file_name> <error_code>-161</error_code> <error_message></error_message></file_xfer_error><file_xfer_error> <file_name></file_name> <error_code>-161</error_code> <error_message></error_message></file_xfer_error><file_xfer_error> <file_name></file_name> <error_code>-161</error_code> <error_message></error_message></file_xfer_error><file_xfer_error> <file_name></file_name> <error_code>-161</error_code> <error_message></error_message></file_xfer_error>)
02/02/06 22:54:43||Scheduler request to succeeded
02/02/06 22:54:45||Started download of
02/02/06 22:54:47||Finished download of
02/02/06 22:54:47||Throughput 65483 bytes/sec
02/02/06 22:54:48||request_reschedule_cpus: files downloaded
02/02/06 22:54:48|rosetta@home|Pausing result TERMINI_2reb_294_7555_0 (left in memory)
02/02/06 22:54:48||Starting result sulphur_j3zj_100891631_1 using sulphur_cycle version 422
02/02/06 22:57:42||Suspending computation and network activity - user is active
02/02/06 22:57:42||Pausing result sulphur_j3zj_100891631_1 (left in memory)

Could this be why my Climate Prediction statistics show a nice straight yellow line stuck at 0 since I loaded it? (Insert sound effect = unsupported low denomination coins under gravitational influence)
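Because upload errors can be a red herring (as noted earlier in the thread), it can help to tally which codes actually appear in a log like the one above before looking them up in BOINC's list of error numbers. A minimal Python sketch, assuming the message log has been copied into a string; it only counts the codes and does not interpret them.

```python
import re
from collections import Counter

# Matches the numeric code inside each <error_code>...</error_code> element.
ERROR_CODE = re.compile(r"<error_code>(-?\d+)</error_code>")


def xfer_error_codes(log_text: str) -> Counter:
    """Count the <error_code> values that appear in <file_xfer_error>
    entries of a BOINC message log."""
    return Counter(int(code) for code in ERROR_CODE.findall(log_text))
```

For the "Unrecoverable error" line above, this would report the code -161 five times, one per file that failed to transfer.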


Volunteer moderator

Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 20009 - Posted: 7 Feb 2006, 0:45:08 UTC - in response to Message 20007.  

The problem is that Einstein runs beautifully and Rosetta works beautifully, but Climate Prediction is a total failure! I have only once seen the graphics, or rather some of the graphics: a white world, a blue outline map, and a little message saying "Please wait". Then there's the pop-up that says, "Visual Fortran Runtime Error - forrtl: severe (30): open failure, unit 6, file CONOUT$ Stacktrace terminated abnormally." What's all that about?

The problem is that climateprediction does not work with Win98/ME. See this sticky in the Windows section of the BOINC Questions and Problems Forum/Message Board.

Joined: 22 Jan 06
Posts: 2
Credit: 6,917
RAC: 0
Message 20021 - Posted: 7 Feb 2006, 13:56:47 UTC - in response to Message 20009.  

I haven't had a successful Sulphur run yet; they all terminate in phase 1 after about 80-100 hours of CPU time, with no obvious error.

boinc.log shows:
sulphur_ispk_100877016 - PH 1 TS 0130366 A - 16/06/1818 23:00 - H:M:S=0097:17:17 AVG= 2.69 DLT= 1.00
Preparing for restart...
Error: Restart files for  not found
Giving up, this result exceeded crash count for available restart files.
        deflating :
        deflating : yabsd.out
2006-01-26 22:00:26 [---] request_reschedule_cpus: process exited
2006-01-26 22:00:26 [] Computation for result sulphur_ispk_100877016_0 finished
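The progress line at the top of this boinc.log can be checked with a little arithmetic. Assuming TS is the timestep count, H:M:S the accumulated CPU time and AVG the seconds per timestep (the s/TS figure mentioned earlier in the thread), the numbers are self-consistent: 97:17:17 is 350,237 seconds, and 350,237 / 130,366 ≈ 2.69. The Python sketch below does that check; the field meanings are inferred from the numbers, not taken from CPDN documentation, and the DLT field is left uninterpreted.

```python
import re

# Hypothetical pattern for lines like:
# "... - PH 1 TS 0130366 A - 16/06/1818 23:00 - H:M:S=0097:17:17 AVG= 2.69 DLT= 1.00"
PROGRESS = re.compile(
    r"PH (?P<phase>\d+) TS (?P<ts>\d+) .*?"
    r"H:M:S=(?P<h>\d+):(?P<m>\d+):(?P<s>\d+) AVG=\s*(?P<avg>[\d.]+)"
)


def parse_progress(line: str) -> dict:
    """Extract phase, timestep and CPU time, and recompute seconds per timestep."""
    m = PROGRESS.search(line)
    if m is None:
        raise ValueError("not a progress line")
    cpu_seconds = int(m["h"]) * 3600 + int(m["m"]) * 60 + int(m["s"])
    ts = int(m["ts"])
    return {
        "phase": int(m["phase"]),
        "timestep": ts,
        "cpu_seconds": cpu_seconds,
        "avg_reported": float(m["avg"]),
        "avg_computed": cpu_seconds / ts,  # assumed meaning of AVG (s/TS)
    }
```

If avg_computed and avg_reported agree to two decimal places, the interpretation of AVG as s/TS holds for that line.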

The yabsd.out.gz file offers no particular insight; it ends exactly as follows:
'Start' position of lookup tables for dataset in overall lookup array
  REPLANCA - time interpolation for field           74
  time,time1,time2    3660.000       3600.000       4320.000
  hours,int,period         3660         720        8640
  Information used in checking ancillary data set:

for that unit. The next unit ended with a different tail on yabsd.out.gz:

J_PE_JFINP1     =          -1,
 J_PE_JFINP2     =          -1,
 O_NPROC =           1,
 IMOUT   = 4*0,
 JMOUT   = 4*0,
 J_PE_IND_MED    = 4*0,
 NMEDLEV =           0
 SLAB TIMESTEP         2341
 im,sm,ngroup,new_im,new_sm           1           1          48 T F

Unfortunately, I've also fallen victim to the lost activation email, so I can't post on the other forums. The administrators' email link bounces back as non-existent...

