climateprediction.net (CPDN) home page
Thread 'no credit awarded?'

Message boards : Number crunching : no credit awarded?

Glenn Carver

Joined: 29 Oct 17
Posts: 1049
Credit: 16,476,460
RAC: 15,681
Message 68520 - Posted: 1 Mar 2023, 12:43:04 UTC - in response to Message 68451.  

bullschuck wrote:
Back to the original question for this thread. Any news on awarding credit for completed HADCM3 tasks?
Further news on proposed Mac Intel IFS tasks would be appreciated as well.
Thanks!
Still nothing?

I got a reply back from the CPDN admin who checked the tasks for your account. He says: "Beyond a certain date (mid December) we have no trickles for these 14 successful results. Credit on the main site is currently awarded by trickles." So that's the reason, though why there are no trickles they can't tell; maybe network issues? I'm not very familiar with the trickle mechanism, so maybe the moderators can jump in on this one.

Hope that's useful.
ID: 68520
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 68522 - Posted: 1 Mar 2023, 13:29:34 UTC

Glenn Carver wrote:
maybe the moderators can jump in on this one.

Not sure I can help much on this. I know there have been times when the server that processes the trickles has been down, but any deeper look into the issue for a particular computer would, I assume, mean accessing the logs for the time that task was running. I really have no idea whether that is still likely to be possible.

The way I see it, while it causes problems from time to time, at least here we mostly get some credit for tasks that fail, unlike most projects, where a task that fails even after several hours of computing earns absolutely nothing. I know that switching to that system would be a lot simpler, but I suspect it would lead to even more howls of anguish!
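To make that trade-off concrete, here is a minimal sketch comparing the two schemes for a task that fails partway through. It assumes, purely for illustration, that a task's credit is split evenly across a fixed number of trickles; the function names, the 10-trickle split and the 1000-credit figure are hypothetical and are not CPDN's actual values or code.

```python
# Hypothetical illustration only: not CPDN's real credit script.


def per_trickle_credit(task_credit: float, n_trickles: int, trickles_received: int) -> float:
    """Credit earned when each trickle is worth an equal share of the task (assumed split)."""
    # Cap at the full task credit in case more trickles arrive than expected.
    return task_credit * min(trickles_received, n_trickles) / n_trickles


def all_or_nothing_credit(task_credit: float, completed: bool) -> float:
    """Credit earned when only a successfully completed task is credited."""
    return task_credit if completed else 0.0


# A hypothetical task worth 1000 credits, reporting 10 trickles, that fails after 6 of them.
print(per_trickle_credit(1000.0, 10, 6))     # 600.0
print(all_or_nothing_credit(1000.0, False))  # 0.0
```

Under the per-trickle scheme a failed task keeps the credit for the trickles it sent, which is the advantage described above; the flip side is that any trickle that never reaches the server takes its share of the credit with it.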
ID: 68522
bullschuck

Joined: 22 May 21
Posts: 39
Credit: 1,197,645
RAC: 4,143
Message 68523 - Posted: 1 Mar 2023, 13:37:37 UTC - in response to Message 68520.  

Glenn Carver wrote:
I got a reply back from the CPDN admin who checked the tasks for your account. He says: "Beyond a certain date (mid December) we have no trickles for these 14 successful results. Credit on the main site is currently awarded by trickles." So that's the reason, though why there are no trickles they can't tell; maybe network issues? I'm not very familiar with the trickle mechanism, so maybe the moderators can jump in on this one.

Hope that's useful.

That's not very useful. There's something broken at CPDN, and it's not my network connection.
ID: 68523
Glenn Carver

Joined: 29 Oct 17
Posts: 1049
Credit: 16,476,460
RAC: 15,681
Message 68525 - Posted: 1 Mar 2023, 14:33:33 UTC - in response to Message 68523.  

bullschuck wrote:
That's not very useful. There's something broken at CPDN, and it's not my network connection.

Sorry, I didn't mean network issues at your end; I was referring to the issues CPDN had around this time. My understanding, though, was that those only affected the upload server, whereas I believe trickles go to the university site.
ID: 68525
Glenn Carver

Joined: 29 Oct 17
Posts: 1049
Credit: 16,476,460
RAC: 15,681
Message 68526 - Posted: 1 Mar 2023, 14:38:48 UTC - in response to Message 68522.  

Dave Jackson wrote:
The way I see it, while it causes problems from time to time, at least here we mostly get some credit for tasks that fail, unlike most projects, where a task that fails even after several hours of computing earns absolutely nothing. I know that switching to that system would be a lot simpler, but I suspect it would lead to even more howls of anguish!

True, but the software could still grant credit for a completed task, because it knows the total credit for that task: if (task_complete && credit < task_credit) { user_credit += task_credit - credit; credit = task_credit; }. Something like that would compensate for lost trickles.

I don't know whether this is a CPDN issue or a BOINC implementation issue. If it's CPDN, it's intermittent. I agree it would be good if it were more robust.
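A minimal sketch of that kind of completion top-up, assuming credit normally accrues per trickle and that the server knows each task's full credit. The `Task` record, its field names and the numbers are hypothetical illustrations, not CPDN's credit script or the BOINC database schema.

```python
from dataclasses import dataclass


@dataclass
class Task:
    # Hypothetical record: none of these fields come from CPDN's real database.
    task_credit: float     # full credit the task is worth when complete
    granted_credit: float  # credit granted so far from trickles
    complete: bool         # True once the completion report has arrived


def top_up_on_completion(task: Task, user_credit: float) -> float:
    """Return the user's new total, granting any credit lost to missing trickles.

    Assumed rule: if the task completed but was granted less than its full credit,
    add the shortfall to the user's total and mark the task as fully credited.
    """
    if task.complete and task.granted_credit < task.task_credit:
        shortfall = task.task_credit - task.granted_credit
        user_credit += shortfall
        task.granted_credit = task.task_credit
    return user_credit


# A completed task worth 1000 credits whose trickles never arrived.
t = Task(task_credit=1000.0, granted_credit=0.0, complete=True)
print(top_up_on_completion(t, user_credit=5000.0))  # 6000.0
print(top_up_on_completion(t, user_credit=6000.0))  # 6000.0 (already topped up)
```

Adding the shortfall to the user's running total, rather than assigning the task's credit to it, preserves credit earned on other tasks, and raising `granted_credit` to the full value means re-running the script cannot award the same credit twice.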
---
CPDN Visiting Scientist
ID: 68526
bullschuck

Joined: 22 May 21
Posts: 39
Credit: 1,197,645
RAC: 4,143
Message 68527 - Posted: 1 Mar 2023, 16:25:10 UTC - in response to Message 68525.  

Glenn Carver wrote:
Sorry, I didn't mean network issues at your end; I was referring to the issues CPDN had around this time. My understanding, though, was that those only affected the upload server, whereas I believe trickles go to the university site.

Cool, cool, cool. Mea culpa for being so defensive. That would imply that the upload server is still experiencing issues, as I completed a task this past weekend, 25 Feb 2023. I think it would also imply that it's experiencing issues with the trickles but not with the completion message.
ID: 68527
Richard Haselgrove

Joined: 1 Jan 07
Posts: 1061
Credit: 36,717,389
RAC: 8,111
Message 68528 - Posted: 1 Mar 2023, 16:50:20 UTC - in response to Message 68527.  

bullschuck wrote:
I think it would also imply that it's experiencing issues with the trickles but not with the completion message.
Right. And often the final trickle data is prepared for, and included in, the same file as the completion report.

It seems to me that the problem must be occurring at the project end, when the message is received and broken down into its constituent parts for filing and reporting. I'm trying to assemble evidence for a search in that part of the system.
ID: 68528
Glenn Carver

Joined: 29 Oct 17
Posts: 1049
Credit: 16,476,460
RAC: 15,681
Message 68529 - Posted: 1 Mar 2023, 16:53:13 UTC - in response to Message 68527.  
Last modified: 1 Mar 2023, 16:55:41 UTC

bullschuck wrote:
Cool, cool, cool. Mea culpa for being so defensive. That would imply that the upload server is still experiencing issues, as I completed a task this past weekend, 25 Feb 2023. I think it would also imply that it's experiencing issues with the trickles but not with the completion message.

Yes, I see what you mean. I looked at your host https://www.cpdn.org/results.php?hostid=1526736 and you should have got credit by now for those Mac HadCM3 tasks, as the credit script runs at the weekend.

I wonder if this is specific to the Mac models? As far as I can see, the Linux tasks got their credit OK. I'm hoping Richard Haselgrove will jump in here, as he has a deeper understanding of the trickle mechanism. We should be able to retrieve the task output, though I don't know whether the trickle uploads are preserved in it.

I will chase this up again when I get a chance to talk to Andy properly rather than email him. If someone else can help out or suggest ways to investigate, so much the better.

Edit: just seen Richard's post. If we can figure out what might be wrong, the moderators or I can ask CPDN to look at specifics.
But there appear to be no trickles at all, not just the final one; otherwise there would be some credit.
---
CPDN Visiting Scientist
ID: 68529
[SG]Felix

Joined: 4 Oct 15
Posts: 34
Credit: 9,075,151
RAC: 374
Message 68532 - Posted: 1 Mar 2023, 21:02:37 UTC

Again, it is not only the Mac WUs.

I have 30 HadAM4h and 43 HadSM4 WUs that didn't get any credit either. The last time I got full credit was at the end of November. Some of the WUs that reported at the beginning of December got partial credit.
ID: 68532
Glenn Carver

Send message
Joined: 29 Oct 17
Posts: 1049
Credit: 16,476,460
RAC: 15,681
Message 68534 - Posted: 1 Mar 2023, 21:42:28 UTC - in response to Message 68532.  

[SG]Felix wrote:
Again, it is not only the Mac WUs.

I have 30 HadAM4h and 43 HadSM4 WUs that didn't get any credit either. The last time I got full credit was at the end of November. Some of the WUs that reported at the beginning of December got partial credit.

OK, thanks for clearing that up. So whatever the cause, it's affecting the Hadley models on all hosts but not OpenIFS, which should help to pin it down. It's not my area of expertise, but I will ask Richard and nag CPDN to investigate and fix it.
ID: 68534
AndreyOR

Joined: 12 Apr 21
Posts: 317
Credit: 14,885,708
RAC: 18,983
Message 68536 - Posted: 1 Mar 2023, 22:23:34 UTC

I think it's already been relatively well established in this thread, as a strong hypothesis if I can put it that way, that the credit problem affects all Hadley models, that it started at the end of November, and that there's no credit because no trickles are showing up on the website and credit is awarded per trickle. It seems like we're rediscovering things again; hopefully this time it won't be forgotten.

There's something I haven't seen mentioned before that I think may be related, or at least worth looking into. The problem seems to have started around the same time OIFS was released. OIFS is credited all or nothing, and its trickles never show up on the website. Hadley is credited per trickle, and its trickles (should) show up on the website. These seem to be two different credit mechanisms; perhaps the Hadley one is not running but the OIFS one is? If it's not exactly that, perhaps there's some other connection, interaction or friction between how the two models get credited that's causing the issue.

Another idea: perhaps ask around at the BOINC workshop for ideas on where to look for a problem like this?
ID: 68536
Glenn Carver

Joined: 29 Oct 17
Posts: 1049
Credit: 16,476,460
RAC: 15,681
Message 68537 - Posted: 1 Mar 2023, 23:14:25 UTC - in response to Message 68536.  

AndreyOR wrote:
There's something I haven't seen mentioned before that I think may be related, or at least worth looking into. The problem seems to have started around the same time OIFS was released. OIFS is credited all or nothing, and its trickles never show up on the website. Hadley is credited per trickle, and its trickles (should) show up on the website. These seem to be two different credit mechanisms; perhaps the Hadley one is not running but the OIFS one is? If it's not exactly that, perhaps there's some other connection, interaction or friction between how the two models get credited that's causing the issue.

It's not related to when OpenIFS was released. OpenIFS tasks first went out years ago. The models know nothing about each other, and the controlling code is completely different between the two (though that might be the cause of the problem).

And I'm still puzzled, because my last conversation with Andy suggested that only trickles (for OpenIFS) are awarded credit, not completion. But I see what you mean. Richard's offered to take a closer look. Next time I'm in a tech meeting, I'll bring it up; that will be more effective than the moderators or me sending emails.
ID: 68537
Richard Haselgrove

Joined: 1 Jan 07
Posts: 1061
Credit: 36,717,389
RAC: 8,111
Message 68538 - Posted: 1 Mar 2023, 23:35:32 UTC - in response to Message 68536.  

AndreyOR wrote:
Another idea: perhaps ask around at the BOINC workshop for ideas on where to look for a problem like this?

Sadly, I don't think that will help. There's not much crossover between projects at these events: the 'trickle' mechanism is pretty much unique to CPDN.

Up till now. Both Glenn and I picked up a clear similarity with an emerging 'BlackHoles@Home' project, which aims to study Einsteinian physics through simulations of black hole development. Massive datasets, multi-month simulation runtimes: sound familiar? But they need to work on the difference between 'checkpoints' (stored locally by the client) and 'trickles' (reporting progress to the server). Intermediate uploads are a third contender in that space.
ID: 68538
Glenn Carver

Joined: 29 Oct 17
Posts: 1049
Credit: 16,476,460
RAC: 15,681
Message 68539 - Posted: 2 Mar 2023, 0:10:37 UTC - in response to Message 68538.  
Last modified: 2 Mar 2023, 0:11:08 UTC

Richard Haselgrove wrote:
Up till now. Both Glenn and I picked up a clear similarity with an emerging 'BlackHoles@Home' project, which aims to study Einsteinian physics through simulations of black hole development. Massive datasets, multi-month simulation runtimes: sound familiar? But they need to work on the difference between 'checkpoints' (stored locally by the client) and 'trickles' (reporting progress to the server).

Yes, that was interesting. Their runs are so long that they have to upload the checkpoints to their servers, so that the task can be sent out to another computer and kept going if the first one goes down. That's equivalent to OpenIFS sending up its restart files (~1Gb) every 5-10 minutes!
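For a sense of scale, here is a back-of-envelope calculation of the sustained upload rate that comparison implies. It assumes "~1Gb" means roughly one gigabyte (10^9 bytes) per set of restart files; if gigabits were meant, divide the results by 8.

```python
# Assumption: ~1 GB (10**9 bytes) per set of OpenIFS restart files.
RESTART_BYTES = 1e9


def sustained_upload_mbit_per_s(bytes_per_upload: float, interval_minutes: float) -> float:
    """Average upload bandwidth (megabits per second) needed to send one upload per interval."""
    bits = bytes_per_upload * 8
    return bits / (interval_minutes * 60) / 1e6


for minutes in (5, 10):
    rate = sustained_upload_mbit_per_s(RESTART_BYTES, minutes)
    print(f"every {minutes} min: {rate:.1f} Mbit/s")
# every 5 min: 26.7 Mbit/s
# every 10 min: 13.3 Mbit/s
```

That works out to roughly 13 to 27 Mbit/s per host, sustained for as long as the task runs.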

I was interested in their Docker approach, though puzzled why they went for Docker rather than a non-root solution like Singularity. He seemed to argue that Docker would allow a per-host build that takes advantage of the chipset options, but I suspect any speed difference they see will be dwarfed by the variation in client uptime and in the CPU percentage that volunteers set. And they had large-memory tasks, with all the fun that will create for them (as we found out).

Nice project, though, and well funded. Once they get past the teething problems, it will be a fun one to join.
ID: 68539
AndreyOR

Joined: 12 Apr 21
Posts: 317
Credit: 14,885,708
RAC: 18,983
Message 68540 - Posted: 2 Mar 2023, 10:01:54 UTC - in response to Message 68537.  

Glenn Carver wrote:
It's not related to when OpenIFS was released. OpenIFS tasks first went out years ago. The models know nothing about each other, and the controlling code is completely different between the two (though that might be the cause of the problem).

I'm not sure what you mean, as I don't know what went on behind the scenes, but I got my first OIFS tasks on 28 November last year, and my first uncredited Hadley tasks were reported as completed on 30 November. So what I see is that the OIFS production release happened around the same time as the Hadley models stopped getting credit. It just seems like there might be a connection there somehow.

Glenn Carver wrote:
And I'm still puzzled, because my last conversation with Andy suggested that only trickles (for OpenIFS) are awarded credit, not completion. But I see what you mean.

Yeah, that is puzzling. I've never seen partially credited OIFS tasks; it seems like it's all or nothing and not based on trickles. Maybe there's some kind of redundant process for OIFS (but not Hadley) so that if a task completes successfully but has no credit (even though it should, because trickles are credited), full credit is awarded at completion? This would explain why OIFS is credited and Hadley isn't, even though both are credited by trickles and neither has trickles showing up on the website, so neither should be credited. It seems like the question to investigate is why, starting 30 November, trickles stopped showing up on the website (for Hadley). That seems a narrow and specific enough question. Reviewing the process by which trickles show up on the website might be a good starting point. It might also reveal why OIFS trickles don't show up.
ID: 68540
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 68541 - Posted: 2 Mar 2023, 11:40:28 UTC
Last modified: 2 Mar 2023, 13:31:32 UTC

AndreyOR wrote:
Yeah, that is puzzling. I've never seen partially credited OIFS tasks

Agreed. My comments on credit are virtually all in response to others posting when there's a problem, as I don't really bother about it much. After reading your post, I went back through all of my main-site OIFS tasks, and none of the partially completed ones have been granted any credit, even where I can see that over half the zips have uploaded. This makes me think Andy has somehow got things a bit mixed up.

There have been times in the past when credit hasn't shown despite zips being on the website, but most of those were when the credit script had fallen over or hadn't been restarted after an event of some kind. There have also been times when credit has appeared despite zips not showing on the task pages, presumably because the problem occurs after the point where the processes that display the zips and those that feed the credit script separate.
ID: 68541
Ingleside

Joined: 5 Aug 04
Posts: 127
Credit: 24,490,630
RAC: 21,281
Message 68542 - Posted: 2 Mar 2023, 14:56:43 UTC - in response to Message 68534.  
Last modified: 2 Mar 2023, 14:57:03 UTC

Glenn Carver wrote:
it's affecting the Hadley models on all hosts

Not just Hadleys; it also affects WAH2 models running on Windows, for example https://www.cpdn.org/result.php?resultid=22250721 from December 2022. As is common, it says "No trickles!".
ID: 68542
Dave Jackson
Volunteer moderator

Send message
Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 68543 - Posted: 2 Mar 2023, 15:27:06 UTC - in response to Message 68542.  

Ingleside wrote:
Not just Hadleys; it also affects WAH2 models running on Windows, for example https://www.cpdn.org/result.php?resultid=22250721

All tasks prior to OIFS coming on the scene use the Hadley model from the Met Office, just in slightly different ways. I don't know the differences well enough to explain them, though.
ID: 68543
AndreyOR

Send message
Joined: 12 Apr 21
Posts: 317
Credit: 14,885,708
RAC: 18,983
Message 68544 - Posted: 3 Mar 2023, 6:14:30 UTC - in response to Message 68541.  

Dave Jackson wrote:
There have been times in the past when credit hasn't shown despite zips being on the website, but most of those were when the credit script had fallen over or hadn't been restarted after an event of some kind. There have also been times when credit has appeared despite zips not showing on the task pages, presumably because the problem occurs after the point where the processes that display the zips and those that feed the credit script separate.

I think I have seen the former but not the latter. If the latter is possible, it adds another complication when deciding where to look to investigate the current problem.
ID: 68544
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 68545 - Posted: 3 Mar 2023, 6:23:02 UTC - in response to Message 68544.  

AndreyOR wrote:
I think I have seen the former but not the latter. If the latter is possible, it adds another complication when deciding where to look to investigate the current problem.
Or possibly it narrows down where to look?
ID: 68545

©2024 cpdn.org