climateprediction.net (CPDN) home page
Thread 'New work Discussion'

Jim1348

Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 59506 - Posted: 25 Jan 2019, 12:12:32 UTC

A new batch (784) of hadcm3s just came out, and I got four of them. The first one has been running for 45 minutes with no problems, so they could work.

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4536
Credit: 18,999,047
RAC: 21,571
Message 59507 - Posted: 25 Jan 2019, 13:01:09 UTC - in response to Message 59506.  
Last modified: 25 Jan 2019, 13:02:33 UTC

A new batch (784) of hadcm3s just came out, and I got four of them. The first one has been running for 45 minutes with no problems, so they could work.


I just tried to get some but got the "database down" message, so I have to wait an hour before having another go. I know Andy knows about it, but I'm not sure why the machine should be so busy that it is causing problems at the moment.

Edit: Took 8 attempts to post the above. Now to see how long the edit takes....

Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1084
Credit: 7,803,756
RAC: 5,187
Message 59508 - Posted: 25 Jan 2019, 13:48:21 UTC

There are 3061 of them, but I can't get any for my Mac - which can only run HADCM3S - because of the database being "down" (batch list).

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4536
Credit: 18,999,047
RAC: 21,571
Message 59509 - Posted: 25 Jan 2019, 14:18:25 UTC - in response to Message 59508.  

Have snagged one on my desktop machine now.

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4536
Credit: 18,999,047
RAC: 21,571
Message 59544 - Posted: 6 Feb 2019, 16:46:10 UTC
Last modified: 6 Feb 2019, 16:58:08 UTC

New model type for batch 785: HadAM4. I don't know what is different about this model type, but between two machines I have three of them running under Linux at the moment. If I understand what I have read correctly, it is a relatively small batch of 500 tasks, so they won't last long, especially if, as I suspect, they run on all three platforms.

Seen some on Windows of various types. Not found any on Mac yet, but that means nothing as I only looked at about ten or twelve tasks.

geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 59545 - Posted: 6 Feb 2019, 18:45:51 UTC - in response to Message 59544.  

All 100 of the running tasks I looked at were on the Linux app. I had tried Windows first when I saw there were new tasks, but the Windows clients wouldn't pick any up.

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 59547 - Posted: 6 Feb 2019, 20:41:24 UTC

Yes, the new app is Linux only. (Yea!)
And I've got some running on one computer, which now has both Linux/Wine/Windows, and Linux only. (Yea!)
I just need to remember which icon starts which version.

And I've come across the first mass killer, who is now running this batch. :(

Batch 785 is a small spinup batch.

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 59549 - Posted: 7 Feb 2019, 4:00:42 UTC

Now a bit over 3% at a bit over 6 hours on my 3.50 GHz Haswell computer, so about 8.5 days total.
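
As a rough check on that figure, here is a minimal sketch of the extrapolation, assuming progress is linear in elapsed time. The function name is hypothetical, and the inputs (exactly 3% and exactly 6 hours, standing in for "a bit over" each) are illustrative rather than read from the BOINC client.

```python
def estimate_total_days(fraction_done: float, elapsed_hours: float) -> float:
    """Extrapolate total run time in days, assuming linear progress."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    total_hours = elapsed_hours / fraction_done
    return total_hours / 24.0


# Illustrative inputs: exactly 3% done after exactly 6 hours.
days = estimate_total_days(0.03, 6.0)
print(f"Estimated total: {days:.1f} days")
```

With these round inputs the estimate comes out a little over 8 days, in the same range as the 8.5 days quoted; the exact figure depends on how far over 3% and 6 hours the task really was.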

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4536
Credit: 18,999,047
RAC: 21,571
Message 59551 - Posted: 7 Feb 2019, 5:48:12 UTC
Last modified: 7 Feb 2019, 6:38:31 UTC

A lot seem to be crashing with:
Model crashed:
READDUMP: BAD BUFFIN OF DATA
This happened to four tasks during testing when someone put a digger through a mains cable near where I live. It is happening anywhere from shortly after the model starts to several hours in. I suspect they don't like being interrupted.

This is one example
https://www.cpdn.org/cpdnboinc/result.php?resultid=21488816

It looks like about one in six of those that don't fail due to missing libraries are failing with the above error. I also spotted a few other errors: a few with insufficient stack memory available, and one where the user had restricted memory usage to half a gig. Interestingly, that one had completed some hadcm3s tasks.

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4536
Credit: 18,999,047
RAC: 21,571
Message 59553 - Posted: 7 Feb 2019, 12:11:42 UTC

And the new task type is now on the project Status page.

geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 59555 - Posted: 7 Feb 2019, 15:47:14 UTC

I had one of those failures with the bad BUFFIN data when I stopped BOINC several timesteps after a checkpoint. There should have been nothing wrong with doing it. It was the only task running on that PC at the time. Not good.

These things might be the biggest memory hogs in terms of active memory that we've had. Each model task takes about 650 MB of RAM, so for a fully loaded i7 with 8 tasks, about 5.5 GB of RAM used. I would imagine given cache and memory contention, in that circumstance, it would REALLY slow model progress relative to some of our other model types/regions. My reasonably quick PCs running only 1 model each are averaging 7.5 to 9.5 sec/TS.
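
To make the memory arithmetic concrete, here is a small sketch. The 650 MB per task and the 8 tasks come from the figures above; the function name and the 1 GB = 1000 MB convention are assumptions for illustration, and operating system overhead is ignored.

```python
def total_ram_gb(tasks: int, mb_per_task: float = 650.0) -> float:
    """Combined RAM footprint of `tasks` models in GB (assuming 1 GB = 1000 MB)."""
    return tasks * mb_per_task / 1000.0


# Compare a lightly loaded machine with a fully loaded 8-thread i7.
for n in (1, 4, 8):
    print(f"{n} task(s): about {total_ram_gb(n):.2f} GB")
```

For 8 tasks this gives about 5.2 GB for the models alone, a little under the rough 5.5 GB quoted above, which may simply allow for other overhead.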

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4536
Credit: 18,999,047
RAC: 21,571
Message 59557 - Posted: 7 Feb 2019, 19:43:29 UTC - in response to Message 59555.  

Thanks George,

Each model task takes about 650 MB of RAM, so for a fully loaded i7 with 8 tasks, about 5.5 GB of RAM used. I would imagine given cache and memory contention, in that circumstance....


I hadn't looked at how much memory was being used, but what you say makes sense. There are a few machines out there that have crashed tasks due to running out of memory. One of them, admittedly, is a 4-core i7 with only 1 GB of RAM!

JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,363,583
RAC: 5,022
Message 59558 - Posted: 7 Feb 2019, 20:17:11 UTC

Any sign of new work for Windows? I will have 4 empty cores by tomorrow.

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 59559 - Posted: 7 Feb 2019, 20:44:07 UTC

Finally got to the zips.
About 7.3M on average, so much smaller than what we've had for a while.

I can confirm what George said about the Virtual memory size. Definitely not for bare bones machines.

14.0 sec/TS for the Haswell, and about 14.4 sec/TS for the Ivy Bridge.

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4536
Credit: 18,999,047
RAC: 21,571
Message 59564 - Posted: 8 Feb 2019, 11:07:13 UTC
Last modified: 8 Feb 2019, 11:15:24 UTC

Any sign of new work for Windows? I will have 4 empty cores by tomorrow.

I think there is a batch in the pipeline. New files were uploaded to a potential cam25 batch at about 01:00 UTC, so someone was working late if based in Oxford!

JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,363,583
RAC: 5,022
Message 59565 - Posted: 8 Feb 2019, 15:31:02 UTC - in response to Message 59564.  

Hopefully, a large batch.

Albert H.
Joined: 18 Feb 06
Posts: 73
Credit: 61,561,438
RAC: 47,468
Message 59674 - Posted: 25 Feb 2019, 19:20:46 UTC

I have 20 cores idle. How long do we have to wait for new Windows work for CPDN?
Just a question: is this the end? Or is there some hope for better days?

Thanks.

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 59675 - Posted: 25 Feb 2019, 19:47:26 UTC

It's not the end, just normal.
There were several large batches released late last year. Now those researchers are waiting for the data to be returned so that they can study the results.

And at long last work is in hand on new Linux models, for those of us who haven't had any work for a long time.

And lots more people are joining the project every day, so there's no shortage of computers waiting.

Albert H.
Joined: 18 Feb 06
Posts: 73
Credit: 61,561,438
RAC: 47,468
Message 59693 - Posted: 28 Feb 2019, 9:25:04 UTC

Thanks, now there is new work

Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1084
Credit: 7,803,756
RAC: 5,187
Message 59694 - Posted: 28 Feb 2019, 9:46:08 UTC

There are 4000 units for East Asia at 50 km with an 18-month duration (batch list).