Thread 'Results being sent to multiple hosts now..?'

Message boards : Number crunching : Results being sent to multiple hosts now..?
old_user156
Joined: 5 Aug 04
Posts: 186
Credit: 1,612,182
RAC: 0
Message 4900 - Posted: 1 Oct 2004, 20:28:39 UTC
Last modified: 1 Oct 2004, 20:37:18 UTC

Looking through the 'workunit' link for a lot of my recent results (e.g. <a href="http://climateapps2.oucs.ox.ac.uk/cpdnboinc/workunit.php?wuid=254987">#254987</a>), I noticed that they're all being sent to multiple hosts within a few minutes of each other. This doesn't seem to be normal behaviour for CPDN?
old_user1
Joined: 5 Aug 04
Posts: 907
Credit: 299,864
RAC: 0
Message 4906 - Posted: 1 Oct 2004, 21:30:31 UTC - in response to Message 4900.  

&gt; Looking through the 'workunit' link for a lot of my recent results; eg. <a href="http://climateapps2.oucs.ox.ac.uk/cpdnboinc/workunit.php?wuid=254987">#254987</a>,
&gt; I noticed that they're all being sent to multiple hosts within a few minutes
&gt; of each other - this doesn't seem to be normal behaviour for CPDN..?
&gt;

Hi. Well, it alternates upload servers: with BOINC you have to know "a priori" which upload server to go to at the workunit generation stage, unlike "old CPDN", which assigned an upload server at the very end of a run.
Andrew Hingston
Volunteer moderator
Joined: 17 Aug 04
Posts: 753
Credit: 9,804,700
RAC: 0
Message 4912 - Posted: 1 Oct 2004, 22:44:58 UTC - in response to Message 4906.  

&gt; Hi well it alternates upload servers, i.e. with BOINC you have to know "a
&gt; priori" which upload server to go to at the workunit generation stage. Unlike
&gt; "old CPDN" which assigned an upload server at the very end of a run.

It's a very recent change, though. Given the high rate of failure with WUs at the moment, it greatly increases the chance of them being completed, but there is a significant probability of some models being done three or four times. Is this wanted?
old_user156
Joined: 5 Aug 04
Posts: 186
Credit: 1,612,182
RAC: 0
Message 4919 - Posted: 2 Oct 2004, 1:04:38 UTC
Last modified: 2 Oct 2004, 1:05:26 UTC

As Andrew has written, it is a very recent change - looking through my list of results, prior to September 20th, only a single result was sent out, unless the work unit threw a 'computing error'.

My last 'single send' work unit was #240740, result #248824. My next work unit, #252730, was sent as result #s 246909, 246910, 246911 &amp; 246912 between 05:16 &amp; 05:19 UTC on September 27th, all very close together.

Looking through those last 10 results, I can't see a single one that is still being processed by multiple machines, though; i.e. they all have only one set of recent trickles, or none.

Yeti
Joined: 5 Aug 04
Posts: 178
Credit: 18,762,752
RAC: 44,075
Message 4937 - Posted: 2 Oct 2004, 9:25:06 UTC
Last modified: 2 Oct 2004, 9:25:38 UTC

This <a href="http://climateapps2.oucs.ox.ac.uk/cpdnboinc/workunit.php?wuid=254064">WU</a> is being actively processed by two different hosts...

Supporting <b>BOINC</b>, because it is really a <b>great concept !</b>
astroWX
Volunteer moderator
Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 5187 - Posted: 10 Oct 2004, 17:13:21 UTC - in response to Message 4919.  

&gt; As Andrew has written, it is a very recent change - looking through my list of
&gt; results, prior to September 20th, only a single result was sent out, unless
&gt; the work unit threw a 'computing error'.
&gt;
&gt; My last 'single send' work unit was #240740, result #248824. My next work
&gt; unit, #252730, was sent as result #s 246909, 246910, 246911 &amp; 246912 between
&gt; 05:16 &amp; 05:19 UTC on September 27th, all very close together.
&gt;
&gt; Looking through those last 10 results, I can't see a single one that is
&gt; still being processed by multiple machines, though; i.e. they all have only
&gt; one set of recent trickles, or none.
&gt;

Hi, Nick,

My last five W/U were sent out at least four times; the first two of the five were also on 27 Sep. The most recent was 10 Oct (W/U#258382).

A couple are currently being processed by two machines.

Seems it was not a transient phenomenon.

Jim



We have met the enemy and he is us -- Pogo
old_user1216
Joined: 26 Aug 04
Posts: 6
Credit: 122,963
RAC: 0
Message 7851 - Posted: 27 Jan 2005, 23:40:36 UTC - in response to Message 5187.  

I have read the reasons given for sending multiple copies of a work unit. I appreciate the difficulty of making sure that it does not happen, and I appreciate the need to make sure that SOMEONE finishes the WU. But I feel as though I have been wasting my time for the last 2 weeks, working on a WU that was completed by someone else on 13th January (http://climateapps2.oucs.ox.ac.uk/cpdnboinc/workunit.php?wuid=214522).

David Hatherly
&gt;
&gt; Hi, Nick,
&gt;
&gt; My last five W/U were sent out at least four times; the first two of the five
&gt; were also on 27 Sep. The most recent was 10 Oct (W/U#258382).
&gt;
&gt; A couple are currently being processed by two machines.
&gt;
&gt; Seems it was not a transient phenomenon.
&gt;
&gt; Jim
&gt;
&gt;
&gt;
&gt; We have met the enemy and he is us -- Pogo
&gt;
crandles
Volunteer moderator
Joined: 16 Oct 04
Posts: 692
Credit: 277,679
RAC: 0
Message 7854 - Posted: 27 Jan 2005, 23:59:23 UTC

David,

It wasn't an error; it was done deliberately. The CP team are planning to write a paper on the differences they get when sending the same workunit to different computers.

There has been a fair amount of discussion on why different computers produce different results. It seems to be due to different maths libraries being used.

Your results may well differ from those of other people crunching the same work unit. This does not mean that one is wrong; both are useful. They can probably be considered as members of an initial-condition (IC) ensemble, and larger IC ensembles are wanted.
Visit BOINC WIKI for help

And join BOINC Synergy for all the news in one place.
old_user993
Joined: 23 Aug 04
Posts: 49
Credit: 183,611
RAC: 0
Message 7856 - Posted: 28 Jan 2005, 0:04:51 UTC - in response to Message 7854.  

&gt; It wasn't an error; it was done deliberately. The CP team are planning to write
&gt; a paper on the differences they get when sending the same workunit to different
&gt; computers.
&gt;
&gt; There has been a fair amount of discussion on why different computers produce
&gt; different results. It seems to be due to different maths libraries
&gt; being used.
&gt;
&gt; Your results may well differ from those of other people crunching the same work
&gt; unit. This does not mean that one is wrong; both are useful. They can probably
&gt; be considered as members of an initial-condition (IC) ensemble, and larger IC
&gt; ensembles are wanted.

Exactly. Couldn't have said it better myself.

Sylvia, Neil and Andrew Martin are the ones pushing this paper ahead. It's a good test case for using the database (we're trying to work towards a nice eSciencey sort of interface for scientists). It also tells us something about how machine- and math-library-dependent the model is, it acts as a sort of IC ensemble (where the results differ), and it allows us to address some fairly common questions we get at various seminars and conferences. It should be an interesting paper.

Dave (still at work - trying to get some stuff done for the Exeter conference next week)

©2024 cpdn.org