climateprediction.net (CPDN) home page
Thread 'Announcement: Database residual problem - misallocated WUs'

old_user2853
Joined: 29 Aug 04
Posts: 4
Credit: 125,007
RAC: 0
Message 12749 - Posted: 21 May 2005, 6:57:42 UTC - in response to Message 12724.  

> > host id 165678
> > work id 47911
> > result id 719682
> >
> > work unit 26sp_300123158_0
> >
> > This is still running but result indicates done with client error
>
> That one's not a problem Allan. <a href="http://climateapps2.oucs.ox.ac.uk/cpdnboinc/trickle.php?resultid=719682">That
> result</a> is registered to your host and the other system that was running it
> no longer exists (I guess the owner must have merged it). And there's no need
> to worry about losing credits because of the first 46 trickles being sent by
> the other system as you get the credits appropriate for your most recent
> trickle.
The other system was mine; I merged the two when I discovered I had two identical hosts.
ID: 12749

old_user28498
Joined: 4 Nov 04
Posts: 16
Credit: 11,577,003
RAC: 0
Message 12750 - Posted: 21 May 2005, 7:24:11 UTC
Last modified: 21 May 2005, 7:26:01 UTC

Well, this is what happened with <a href="http://climateapps2.oucs.ox.ac.uk/cpdnboinc/show_host_detail.php?hostid=69015">host 69015</a> (see posts above in this same thread):

Calculations for result <a href="http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=786821">786821</a> were completed, and by the look of it here, the upload went up without any problem. However, the plots for phase 3 do not appear on the result page. No science information seems to be lost, as the <a href="http://climateapps2.oucs.ox.ac.uk/cpdnboinc/trickle.php?resultid=786821">trickles</a> do point to the host doing the crunching (69015). If the science team ever wants to look at the result, they can find the right host there. 69015 is now crunching the next WU allocated (but the previous one is still 'in progress', as I reported above, even if it is, in fact, completed).

Interestingly enough, I now find myself at the opposite end of the problem as well. Last night my host <a href="http://climateapps2.oucs.ox.ac.uk/cpdnboinc/show_host_detail.php?hostid=129362">129362</a> got result <a href="http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=856358">856358</a>, and if you look at the <a href="http://climateapps2.oucs.ox.ac.uk/cpdnboinc/trickle.php?resultid=856358">trickles for 856358</a> you can see that this result appears to have been sent already to host 55941, which is not mine. I ended up with a trickle's worth of credit right away, as that host had already uploaded one.

So now I am considering temporarily suspending result 856358 to see if the other host keeps at it (my computer won't be idle meanwhile, as it is a multiprocessor machine). Does anyone know if sending a STOP signal to hadsm3um has any adverse effect?

All this was running Linux, BOINC 4.19 and HADSM 4.13, by the way.
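
For anyone wanting to try the same thing, here is a minimal Python sketch of what "sending a STOP signal" means on Linux. It only shows the mechanics with a stand-in child process; the helper names `pause` and `resume` are made up for illustration, and whether hadsm3um tolerates being stopped this way is exactly the open question above, so this is not a recommendation.

```python
import os
import signal
import subprocess
import sys


def pause(pid: int) -> None:
    """Suspend a process. SIGSTOP cannot be caught or ignored by the target."""
    os.kill(pid, signal.SIGSTOP)


def resume(pid: int) -> None:
    """Let a previously stopped process carry on from where it was."""
    os.kill(pid, signal.SIGCONT)


if __name__ == "__main__":
    # Stand-in child process (assumption): on a real host you would look up
    # the PID of the model executable instead of starting a sleeper.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    pause(child.pid)    # the child is now frozen and uses no CPU time
    resume(child.pid)   # the child continues its sleep
    child.terminate()   # clean up the stand-in process
    child.wait()
```

A stopped process keeps its memory and open files, so nothing is written or flushed at the moment it is frozen.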

Cheers,

LS

ID: 12750

old_user7038
Joined: 31 Aug 04
Posts: 2
Credit: 225,332
RAC: 0
Message 12777 - Posted: 22 May 2005, 10:39:48 UTC

Hi guys, same problem here.

Result: 858515
Assigned host: 169983
calculating host: 25273 (mine)

Have done 4 trickles already and now suspended the WU. I'm wondering what to do now as I don't want to crunch 100% seti for too long ;)

Regards, MrSpadge
ID: 12777

old_user13614
Joined: 6 Sep 04
Posts: 6
Credit: 195,123
RAC: 0
Message 12792 - Posted: 23 May 2005, 3:45:33 UTC

I think I have one of these:

WU: 565036
Work Unit name: 2u1q_300153589_1
My Host ID: 22439
Host ID identified on results page: 48533

Completed 6 trickles - computer is showing that it is working on TS 75001 when I checked seconds ago, so a 7th trickle will come through shortly.

If the crunching my computer is doing will actually be valuable, that is more important than getting the credit, and I will just let it keep going.

Thanks for all your work.

Lornix
ID: 12792

old_user12285
Joined: 4 Sep 04
Posts: 14
Credit: 468,276
RAC: 0
Message 12807 - Posted: 23 May 2005, 18:14:49 UTC

Here is my solution to the problem - do a reset of the project...and all is well, WUs are registered and trickling...case closed

Why, you may ask? Well, I'm using 4.19, so I deleted the unregistered WU as instructed in the forum, only to have it replaced by another unregistered WU, which just compounded the problem...
Since resources to fix the problem are scarce and there is no guarantee that the completed WU will be uploaded and saved, I took my losses and reset the project :)
ID: 12807

crandles
Volunteer moderator
Joined: 16 Oct 04
Posts: 692
Credit: 277,679
RAC: 0
Message 12808 - Posted: 23 May 2005, 19:03:50 UTC
Last modified: 23 May 2005, 19:04:53 UTC

I added some more information to the first post.

The last suggestion is very messy and people may well prefer to follow 'bosh's last post which is what I was hinting at by saying "If you have done less than a couple of hours of work on the WU then it is easiest and safest to just abort the run."

(Didn't want to sound too dismissive of people's work.)

Sorry I cannot give an estimate of when Tolu may be able to look at and consider the possibility of fixing the problem.
ID: 12808

old_user2467
Joined: 28 Aug 04
Posts: 90
Credit: 2,736,552
RAC: 0
Message 12815 - Posted: 23 May 2005, 22:58:20 UTC
Last modified: 23 May 2005, 23:02:04 UTC

> If there is work done but not by you, please report your host id, the
> ResultID and name, and the host number that has done the work.

Hi,

WU-Name: 2mpa_300143975_1 (<a href="http://climateapps2.oucs.ox.ac.uk/cpdnboinc/workunit.php?wuid=555422">555422</a>)
Result-ID: <a href="http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=857093">857093</a>
Working on this unit: <a href="http://climateapps2.oucs.ox.ac.uk/cpdnboinc/show_host_detail.php?hostid=5957">Host 5957</a>
Shown in resultlist of (assigned to): <a href="http://climateapps2.oucs.ox.ac.uk/cpdnboinc/show_host_detail.php?hostid=86867">Host 86867</a>

Ciao
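
The reports in this thread all boil down to one comparison: the host the server lists for a result versus the host that is actually crunching it. A small sketch of that check, using figures posted in this thread; the `Report` class and `is_misallocated` function are hypothetical names for illustration and not part of BOINC or the CPDN server.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    result_id: int
    assigned_host: int   # host shown on the server's result page
    computing_host: int  # host actually running the model


def is_misallocated(report: Report) -> bool:
    """A result is misallocated when the two host IDs differ."""
    return report.assigned_host != report.computing_host


# Figures taken from posts in this thread.
reports = [
    Report(result_id=858515, assigned_host=169983, computing_host=25273),
    Report(result_id=857093, assigned_host=86867, computing_host=5957),
    Report(result_id=719682, assigned_host=165678, computing_host=165678),
]

mismatched = [r.result_id for r in reports if is_misallocated(r)]
print(mismatched)
```

Result 719682 passes the check because, as explained earlier in the thread, it is registered to the host that is running it.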
ID: 12815

old_user1275
Joined: 26 Aug 04
Posts: 2
Credit: 327,277
RAC: 0
Message 12822 - Posted: 24 May 2005, 8:38:13 UTC

Result ID: 817970
Workunit ID: 532100
Host ID: 163416

Workunit is currently 66% complete and not in my list of results for this host.
ID: 12822

old_user845
Joined: 14 Aug 04
Posts: 13
Credit: 1,231,931
RAC: 0
Message 12833 - Posted: 24 May 2005, 14:00:56 UTC

name: 3puk_300195210, my Host ID: 161068
It had not trickled yet and I deleted it.
ID: 12833

old_user12285
Joined: 4 Sep 04
Posts: 14
Credit: 468,276
RAC: 0
Message 12841 - Posted: 24 May 2005, 21:05:21 UTC

Just to provide a bit of an update to my previous post…
A reset on my second PC (v4.19, Host ID 21990) did not produce the same result. The new WU was initially reported as "unsent" under "Server State" and then changed to Host ID 166993 (not mine), so this time I deleted the WU, and finally CPDN gave me a WU registered to me in the "Results for Hosts".

So in conclusion, it seems to be random hit and miss... but the same result can be achieved by deleting the WU repeatedly (a less drastic measure) until a satisfactory outcome is achieved. And perhaps most of you "geeks" already knew this, but I sure did not, so with apologies… :)

PS. On the bright side, after the reset, my Host ID remained the same...
ID: 12841

old_user7038
Joined: 31 Aug 04
Posts: 2
Credit: 225,332
RAC: 0
Message 12862 - Posted: 25 May 2005, 10:25:36 UTC

Guys, this starts to suck a bit. Aborted the wrong WU and got another wrong one:

Result: 875418
Assigned host: 171513
Calculating host: 25273

Regards, MrS
ID: 12862

crandles
Volunteer moderator
Joined: 16 Oct 04
Posts: 692
Credit: 277,679
RAC: 0
Message 12863 - Posted: 25 May 2005, 10:42:49 UTC

As 'bosh said you may have to abort/reset a few times to get an unaffected WU.
ID: 12863

old_user16753
Joined: 11 Sep 04
Posts: 12
Credit: 74,234
RAC: 0
Message 12884 - Posted: 26 May 2005, 10:43:52 UTC

host 30823
ResultID WorkUnitID State
870178 572867 aborted
841692 559944 not downloaded
841691 559943 not downloaded

ID: 12884

old_user2275
Joined: 28 Aug 04
Posts: 69
Credit: 260,395
RAC: 0
Message 12887 - Posted: 26 May 2005, 11:19:22 UTC

did no resetting - just aborted the one that wasn't listed on the host. I have two wu's on my host now, both listed on my host, so I keep them both


ID: 12887

old_user733
Joined: 9 Aug 04
Posts: 25
Credit: 4,756,979
RAC: 0
Message 12902 - Posted: 26 May 2005, 23:59:00 UTC
Last modified: 27 May 2005, 0:30:09 UTC

I just got some WU's on a couple of machines, but I don't see them listed in my "results" page (yet). Is there a delay before they show up?
-----
Actually, two results showed up for one of my machines, but they are different than the ones I got. The other machine's WU's did not show up. I'm thinking I will have to abort them all and try again.
-----
These are the problem WU's:
Host: 6415 - Result ID: 880949 - Name: 3y7o_100206157 (not on machine)
Host: 6415 - Result ID: 880941 - Name: 3y7g_100206149_0 (not on machine)
Host: 6415 - Name: 3zvo_100208339_0 (not in results page) - aborting
Host: 6415 - Name: 3zx7_100208394_0 (not in results page) - aborting
Host: 1113 - Name: 3zx1_100208388_0 (not in results page) - aborting
Host: 1113 - Name: 3xv6_100205703_0 (not in results page) - aborting
-----
Interesting note: Results 880941 and 880949 are listed as being sent to me at an earlier time than I got the other four units.
-----
Update: After aborting the 4 unlisted WU's, I got 1 WU per machine that DID show up on my results page. Whew!
ID: 12902

old_user26115
Joined: 21 Oct 04
Posts: 24
Credit: 207,633
RAC: 0
Message 12916 - Posted: 27 May 2005, 19:31:58 UTC
Last modified: 1 Jun 2005, 16:28:07 UTC

Has somebody noted: that somebody else got the credits for the trickles?

For example: Legoman, did you obtain the credits for the 10 trickles from <a href="http://climateapps2.oucs.ox.ac.uk/cpdnboinc/workunit.php?wuid=553146">WU 553146</a>, also named <b>2kyo_300141699</b>?

Sorry, but I got the 900 credits [EDIT] meanwhile they are at 1600 credits [/EDIT]. I don't know why, and I don't want to flame anybody, so no offense!

greetz from Switzerland
littleBouncer

ID: 12916

Ananas
Volunteer moderator
Joined: 31 Oct 04
Posts: 336
Credit: 3,316,482
RAC: 0
Message 12925 - Posted: 28 May 2005, 5:45:41 UTC
Last modified: 28 May 2005, 10:31:43 UTC

Now I've got one of those too :

2y76_300159022_0 with resultid=853050 should be attached to hostid=142111 but it is somehow attached to hostid=131745 too.

I guess, it has been crashed by hostid=131745 (BOINC 4.25), then it has been delivered to my hostid=142111 but the server "forgot" to create a second ResultID from that WU with wuid=571281 for me.

So actually resultid=891806 would be mine I guess - but that's in "unsent" state.


The other host hostid=131745 (the one with 4.25) crashes everything anyway, it has no trickles (except for the ones from my host), 161 results and 94 credits.


edit : although the trickles appear for the foreign host, they show up in my trickle list too
ID: 12925

old_user2467
Joined: 28 Aug 04
Posts: 90
Credit: 2,736,552
RAC: 0
Message 12929 - Posted: 28 May 2005, 11:34:51 UTC - in response to Message 12916.  

> Has somebody noted: that somebody else got the credits for the trickles?

Yes... <a href="http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=857093">there is a workunit</a> in my resultlist that someone else is computing for but the credit is added to my account.

Ciao
ID: 12929

geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2186
Credit: 64,822,615
RAC: 5,275
Message 12953 - Posted: 30 May 2005, 16:59:45 UTC

Result ID: <a href="http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=871755">871755</a>
Result Name: 3raw_200197113_0
Problem: Not listed under results for my computer <a href="http://climateapps2.oucs.ox.ac.uk/cpdnboinc/show_host_detail.php?hostid=53552">53553</a>, although I've completed 26 trickles.

Aaargh. I wasn't paying attention. This is my first one like this. And I wondered why my BOINCSTATS stats weren't increasing for this computer. Did a reset.
ID: 12953

Ananas
Volunteer moderator
Joined: 31 Oct 04
Posts: 336
Credit: 3,316,482
RAC: 0
Message 12956 - Posted: 30 May 2005, 19:45:11 UTC

Here's one more :

http://climateapps2.oucs.ox.ac.uk/cpdnboinc/show_host_detail.php?hostid=173538

15 results, no trickles but 283.55 credits.

I wonder if it will ever be possible to recover all dependencies between model, host and trickles.
ID: 12956