climateprediction.net (CPDN) home page
Thread 'New work Discussion'

Message boards : Number crunching : New work Discussion

Previous · 1 . . . 15 · 16 · 17 · 18 · 19 · 20 · 21 . . . 91 · Next

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 58526 - Posted: 3 Aug 2018, 0:50:23 UTC - in response to Message 58524.  

And batch 742 has been paused while thinking is in progress.

So should I continue crunching my 742s and those in my queue?


Yes, crunch away. I am.
Zips at about every 8%, 92.6+ MB each, about 10 days of crunching in total.

Sending back lots of data is a good way to help, either to find out what's wrong, or simply to return good data, if that's what the un-failed models are doing.
Only the researchers can search the vast amounts of data to see what's what.

(By "paused", they mean that downloads from this batch have been stopped.)
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4536
Credit: 18,997,390
RAC: 21,721
Message 58527 - Posted: 3 Aug 2018, 9:04:17 UTC - in response to Message 58526.  

And batch 742 has been paused while thinking is in progress.


Thinking has been done and the people at Oxford are certain the tasks that don't crash are producing worthwhile data. The sending out of these tasks has therefore resumed.

[Quote:]The entire purpose of BOINC is to enable multiple projects to be run on individual PCs, not supercomputers. Dinking around with the global settings inherent in BOINC to PERHAPS stabilize one project (i.e., CPDN) at the risk of destabilizing other BOINC-related projects (i.e., SETI, LHC, Cosmology, Milky Way, etc.) is NOT a solution and is in fact foolhardy.


There are many reasons for tinkering with BOINC's global settings. These are mostly related to how different projects play together, or how other programs running on the computer work alongside BOINC. The settings that reduce the chances of CPDN tasks crashing are likely to reduce the chances of tasks from other projects crashing too. Apart from the Android platform (which CPDN doesn't support), the crashes I have personally had on other projects have all been tasks that also crashed on every other computer they ran on; they didn't show the frustrating pattern, or sometimes lack of pattern, that CPDN crashes show.

I would say that, with regard to the data being useful, past history has shown that even a high percentage of crashes with segmentation violations has never made the data from the tasks that do complete invalid.
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4536
Credit: 18,997,390
RAC: 21,721
Message 58533 - Posted: 3 Aug 2018, 19:02:11 UTC

Batch 745 is, I think, about 1,000 eu25 13-month tasks. Possibly there are more that aren't showing on the server status page yet.
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 58536 - Posted: 4 Aug 2018, 11:01:42 UTC

Front page says batch 745 is 20,000 (wow), but the status page says 10,000, and I think a lot of those are batch 742.

So either tasks are going fast, or 745 hasn't been fully released yet.

And the trickle program has stopped running.
I'll go and see if anyone's home.
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4536
Credit: 18,997,390
RAC: 21,721
Message 58537 - Posted: 4 Aug 2018, 13:08:29 UTC - in response to Message 58536.  

[Les Bayliss wrote:]Front page says batch 745 is 20,000 (wow), but the status page says 10,000, and I think a lot of those are batch 742.


Now wondering whether I misread the numbers when I said 1,000, or whether that's all there were when I looked. It wasn't up on the front page then, so any misreading would have come from my looking at the workunits.
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4536
Credit: 18,997,390
RAC: 21,721
Message 58553 - Posted: 6 Aug 2018, 8:51:57 UTC - in response to Message 58537.  

Batch 746: EUR25 2010-2016 with 10newPP

8,600 simulations.
(from front page.)
Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1084
Credit: 7,803,756
RAC: 5,187
Message 58558 - Posted: 7 Aug 2018, 11:11:09 UTC - in response to Message 58536.  

[Les Bayliss wrote:]Front page says batch 745 is 20,000 (wow), but the status page says 10,000, and I think a lot of those are batch 742.

So either tasks are going fast, or 745 hasn't been fully released yet.

A split batch is what I'm seeing too, in the batch list. More to come if the front-page total is right.
Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1084
Credit: 7,803,756
RAC: 5,187
Message 58574 - Posted: 8 Aug 2018, 23:27:59 UTC

Some long models just added: batch #747 PNW at 25 km for 121 months (2000) and 61 months (1000) (batch list).
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4536
Credit: 18,997,390
RAC: 21,721
Message 58614 - Posted: 16 Aug 2018, 11:36:13 UTC - in response to Message 58574.  
Last modified: 16 Aug 2018, 11:38:12 UTC

Batch 748: 200 Hadcm3s tasks, and batch 749: 420 Hadcm3s tasks. And

THEY RUN ON LINUX!!!

Well, they start at least; mine are 5 minutes in with no problems so far.

Edit: all four have checkpointed. Will report back when they have been going a bit longer.

And they won't last long!
Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 58620 - Posted: 16 Aug 2018, 13:55:00 UTC - in response to Message 58614.  

[Dave Jackson wrote:]Batch 748: 200 Hadcm3s tasks, and batch 749: 420 Hadcm3s tasks. And

THEY RUN ON LINUX!!!


Perhaps they do, but since I get so few work units (none lately), my BOINC client now queries the server only once every three days, so unless three days' worth of Linux work units turn up, I am unlikely to get any.
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4536
Credit: 18,997,390
RAC: 21,721
Message 58621 - Posted: 16 Aug 2018, 14:08:55 UTC - in response to Message 58614.  

And the server status page is showing just one left now, though if, as I suspect, these are Linux-only or Linux-and-Mac-only, some will come back in from people who don't have the 32-bit libs installed. Mine are running at just over 3.5 hours per 1%, currently a bit over 0.6% completed. The danger point, in batches that have failed, is just before the first zip completes.
rjs5
Joined: 16 Jun 05
Posts: 16
Credit: 19,441,595
RAC: 9,251
Message 58622 - Posted: 16 Aug 2018, 16:19:11 UTC - in response to Message 58621.  

[Dave Jackson wrote:]And the server status page is showing just one left now, though if, as I suspect, these are Linux-only or Linux-and-Mac-only, some will come back in from people who don't have the 32-bit libs installed. Mine are running at just over 3.5 hours per 1%, currently a bit over 0.6% completed. The danger point, in batches that have failed, is just before the first zip completes.


I got one of the Linux WU on my 64-bit Linux machine and it seems to be running fine.

There is a lot of "HOW TO run 32-bit dynamic apps on 64-bit Linux" information about making sure that a 64-bit installation has the right 32-bit libraries. A pretty easy way to check that the right 32-bit libraries are installed seems to be writing a small 32-bit test app that needs the same libraries.

The KEY is the build: compile a 32-bit "a.out" with a command line that forces the same libraries to be linked in. (-Wl,--no-as-needed stops linkers that default to --as-needed from dropping libraries the program never calls. Note that linking with -lz and -lnsl also needs the 32-bit development packages for those libraries, not just the runtime ones.)

g++ h.cpp -m32 -Wl,--no-as-needed -lpthread -ldl -lstdc++ -lm -lgcc_s -lc -lz -lnsl



Any C++ program will do, for example Hello World in h.cpp:
cat h.cpp

#include <iostream>
using namespace std;

// Trivial program: its only job is to be linked against the libraries above.
int main (int argc, char** argv)
{
    cout << "Hello world!" << endl;
    return 0;
}



32-bit libraries I needed for my 32-bit creation to say "Hello World":

ldd a.out
linux-gate.so.1 (0xf7fcd000)
libpthread.so.0 => /lib/libpthread.so.0 (0xf7f7d000)
libdl.so.2 => /lib/libdl.so.2 (0xf7f78000)
libstdc++.so.6 => /lib/libstdc++.so.6 (0xf7df3000)
libgcc_s.so.1 => /lib/libgcc_s.so.1 (0xf7dd6000)
libz.so.1 => /lib/libz.so.1 (0xf7dbd000)
libnsl.so.1 => /lib/libnsl.so.1 (0xf7da2000)
libm.so.6 => /lib/libm.so.6 (0xf7ca8000)
libc.so.6 => /lib/libc.so.6 (0xf7b10000)
/lib/ld-linux.so.2 (0xf7fcf000)

Notice that these are the same 32-bit libraries the CPDN applications need:

ldd *gnu *gnu.so
hadcm3s_8.34_i686-pc-linux-gnu:
linux-gate.so.1 (0xf7fd1000)
libpthread.so.0 => /lib/libpthread.so.0 (0xf7f81000)
libdl.so.2 => /lib/libdl.so.2 (0xf7f7c000)
libstdc++.so.6 => /lib/libstdc++.so.6 (0xf7df7000)
libm.so.6 => /lib/libm.so.6 (0xf7cfd000)
libgcc_s.so.1 => /lib/libgcc_s.so.1 (0xf7ce0000)
libc.so.6 => /lib/libc.so.6 (0xf7b48000)
/lib/ld-linux.so.2 (0xf7fd3000)
hadcm3s_um_8.34_i686-pc-linux-gnu:
linux-gate.so.1 (0xf7f69000)
libdl.so.2 => /lib/libdl.so.2 (0xf7f33000)
libm.so.6 => /lib/libm.so.6 (0xf7e39000)
libpthread.so.0 => /lib/libpthread.so.0 (0xf7e1a000)
libc.so.6 => /lib/libc.so.6 (0xf7c82000)
/lib/ld-linux.so.2 (0xf7f6b000)
hadcm3s_se_8.34_i686-pc-linux-gnu.so:
linux-gate.so.1 (0xf7f2d000)
libz.so.1 => /lib/libz.so.1 (0xf7e59000)
libnsl.so.1 => /lib/libnsl.so.1 (0xf7e3e000)
libstdc++.so.6 => /lib/libstdc++.so.6 (0xf7cb9000)
libm.so.6 => /lib/libm.so.6 (0xf7bbf000)
libgcc_s.so.1 => /lib/libgcc_s.so.1 (0xf7ba2000)
libc.so.6 => /lib/libc.so.6 (0xf7a0a000)
/lib/ld-linux.so.2 (0xf7f2f000)
pj
Joined: 15 Dec 12
Posts: 8
Credit: 535,242
RAC: 0
Message 58624 - Posted: 16 Aug 2018, 20:43:20 UTC

It's been quite a while, but for the first time since the meltdown, my iMac has got three tasks to run, which will take 2 days and 22.5 hours.
Keep them coming!
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4536
Credit: 18,997,390
RAC: 21,721
Message 58633 - Posted: 18 Aug 2018, 15:41:20 UTC - in response to Message 58614.  

Just noticed: the 748s are only 12 months, while the 749s are 120 months.
Bill F
Joined: 17 Jan 09
Posts: 124
Credit: 2,026,181
RAC: 2,642
Message 58640 - Posted: 25 Aug 2018, 1:16:02 UTC

Interesting: before the Great Crash (CPDN's, not the stock market's), we had so many active users that getting a WU was a prize.

Now we have lost so many active users that WUs are lying in the system begging to be taken.

How times change.

Bill F
Dallas TX
In October 1969 I took an oath to support and defend the Constitution of the United States against all enemies, foreign and domestic;
There was no expiration date.


Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 58641 - Posted: 25 Aug 2018, 3:13:21 UTC

Get them while you can.
Things are about to change. :)

And something is seriously wrong with your i5-5200U.
Perhaps it's still using the "training wheels" setting ?
Alex Plantema
Joined: 3 Sep 04
Posts: 126
Credit: 26,610,380
RAC: 3,377
Message 58642 - Posted: 25 Aug 2018, 10:45:27 UTC - in response to Message 58574.  

Some information in the batch list about problems with a batch would be useful, e.g. what to do with it or a link to a message about it.
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4536
Credit: 18,997,390
RAC: 21,721
Message 58643 - Posted: 25 Aug 2018, 18:37:41 UTC - in response to Message 58642.  

[Alex Plantema wrote:]Some information in the batch list about problems with a batch would be useful, e.g. what to do with it or a link to a message about it.


Is there a specific batch you are having problems with? If so some of us may be able to respond with some more information.
Alan K
Joined: 22 Feb 06
Posts: 491
Credit: 30,967,615
RAC: 14,422
Message 58644 - Posted: 25 Aug 2018, 22:49:39 UTC - in response to Message 58642.  

Certainly a lot of computing-error failures with batches 738 and 742, if that helps. There is an error thread further down in Number crunching.
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 58645 - Posted: 26 Aug 2018, 1:46:01 UTC

Batch 738 had a setup error (INANCILA, which means mismatched data files).
A message was posted to Abort them.

******************

Batch 742, the sam25's.
Yes, there were a lot of failures with these.
The project person checked everything and couldn't find anything wrong.
And I had run several that had already failed once, and some that had failed twice, with no problems.
So we decided the cause was most likely people's computers, combined with a sensitive modeling area. And I've since run lots of sam25s that had failed on other computers, all with no problems.

Possibly a lot of people were/are running BOINC with the default "training wheels" settings. This apparently works well with other projects, but is a computing hazard with cpdn.

©2024 cpdn.org