Message boards : Number crunching : no credit awarded?
Author | Message |
---|---|
Send message Joined: 29 Oct 17 Posts: 1049 Credit: 16,476,460 RAC: 15,681 |
bullschuck wrote: "Back to the original question for this thread. Any news on awarding credit for completed HADCM3 tasks? Still nothing?" I got a reply back from the CPDN admin who checked the tasks for your account. He says: "Beyond a certain date (mid December) we have no trickles for these 14 successful results. Credit on the main site is currently awarded by trickles." So that's the reason, though they can't tell why there are no trickles; maybe network issues? I'm not very familiar with the trickle mechanism, so maybe the moderators can jump in on this one. Hope that's useful. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
"maybe the moderators can jump in on this one." Not sure I can help much on this. I know there are times when the server that processes the trickles has been down, but any deeper look into the issue for any particular computer would, I assume, mean accessing the logs for the time that task was running. I really have no idea whether that is likely to still be possible. The way I see it, while from time to time it causes problems, at least here we mostly get some credit for tasks that fail, unlike most projects where if a task fails even after several hours of computing we get absolutely nothing. I know that switching to that system would be a lot simpler, but it would, I suspect, lead to even more howls of anguish! |
Send message Joined: 22 May 21 Posts: 39 Credit: 1,197,645 RAC: 4,143 |
bullschuck wrote: "Back to the original question for this thread. Any news on awarding credit for completed HADCM3 tasks? Still nothing?" That's not very useful. There's something broken at CPDN and it's not my network connection. |
Send message Joined: 29 Oct 17 Posts: 1049 Credit: 16,476,460 RAC: 15,681 |
bullschuck wrote: "That's not very useful. There's something broken at cdpn and it's not my network connection." Sorry, I didn't mean network issues at your end. I was referring to the issues CPDN had around this time, though my understanding was that those only affected the upload server, whereas trickles go to the university site, I believe. |
Send message Joined: 29 Oct 17 Posts: 1049 Credit: 16,476,460 RAC: 15,681 |
"The way I see it is that while from time to time it causes problems, at least here we mostly get some credit for tasks that fail unlike most projects where if a task fails even after several hours of computing we get absolutely nothing. I know that switching to that system would be a lot simpler but would I suspect lead to even more howls of anguish!" True, but the software could still grant credit for a completed task because it knows the total credit for that task: if (task_complete && credit < task_credit) user_credit = task_credit. Something like that would compensate for lost trickles. I don't know if this is a CPDN issue or a BOINC implementation issue. If it's CPDN, it's intermittent. I agree it would be good if it were more robust. --- CPDN Visiting Scientist |
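The fallback suggested in the post above can be sketched in a few lines. This is only an illustration of the idea, not CPDN's or BOINC's actual credit code: the `Task` record, its field names and `credit_to_grant` are all hypothetical, and the credit values are made up.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical task record; not a real CPDN or BOINC structure."""
    complete: bool         # task finished successfully
    trickle_credit: float  # credit already granted from received trickles
    task_credit: float     # total credit the task is worth when finished


def credit_to_grant(task: Task) -> float:
    """Return the credit a task should end up with.

    A completed task whose trickles were lost is topped up to the full
    task credit; any other task keeps what its trickles earned.
    """
    if task.complete and task.trickle_credit < task.task_credit:
        return task.task_credit
    return task.trickle_credit


# Made-up example values: a completed task with no trickles recorded,
# and a failed task that got part way through.
lost_trickles = Task(complete=True, trickle_credit=0.0, task_credit=1200.0)
failed_midway = Task(complete=False, trickle_credit=300.0, task_credit=1200.0)
print(credit_to_grant(lost_trickles))
print(credit_to_grant(failed_midway))
```

With a rule like this, failed tasks would still keep their partial credit from trickles, which is the behaviour the moderators wanted to preserve.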
Send message Joined: 22 May 21 Posts: 39 Credit: 1,197,645 RAC: 4,143 |
"Sorry, I didn't mean network issues at your end. I was referring to the issues CPDN had around this time. Though my understanding was that only affected the upload server, whereas trickles go to the university site I believe." Cool, cool, cool. Mea culpa for being so defensive. That would imply that the upload server is still experiencing issues, as I completed a task this past weekend, 25 Feb 2023. I think it would also imply that it's experiencing issues with the trickles but not the completion message. |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,717,389 RAC: 8,111 |
"I think it would also imply that it's experiencing issues with the trickles but not the completion message." Right. And often the final trickle data is prepared for, and included in, the same file as the completion report. It seems to me that the problem must be occurring at the project end, when the message is received and broken down into its constituent parts for filing and reporting. I'm trying to assemble evidence for a search in that part of the system. |
Send message Joined: 29 Oct 17 Posts: 1049 Credit: 16,476,460 RAC: 15,681 |
"Cool, cool, cool. Mea culpa at being so defensive. That would imply that the upload server is still experiencing issues, as I completed a task the past weekend, 25 Feb 2023. I think it would also imply that it's experiencing issues with the trickles but not the completion message." Yes, I see what you mean. I looked at your host https://www.cpdn.org/results.php?hostid=1526736 and you should have got credit by now for those Mac HadCM3 tasks, as the credit script runs at the weekend. I wonder if this is specific to the Mac models? As far as I can see, the Linux tasks got their credit OK. I'm hoping Richard Haselgrove will jump in here as he has a deeper understanding of the trickle mechanism. We should be able to retrieve the task output, though I don't know if the trickle uploads are preserved in that. I will chase this up again when I get a chance to talk to Andy properly rather than email him. If someone else can help out or suggest ways to investigate, so much the better. Edit: just seen Richard's post. If we can figure out what might be wrong, the moderators or I can ask CPDN to look at specifics. But there appear to be no trickles at all, not just the final one missing. Otherwise there would be some credit. --- CPDN Visiting Scientist |
Send message Joined: 4 Oct 15 Posts: 34 Credit: 9,075,151 RAC: 374 |
Again, it is not only the Mac WUs. I have 30 HadAM4h and 43 HadSM4 WUs which didn't get any credit either. The last time I got full credit was at the end of November. Some of the WUs which reported at the beginning of December got partial credit. |
Send message Joined: 29 Oct 17 Posts: 1049 Credit: 16,476,460 RAC: 15,681 |
"Again, it is not only the mac wus." OK, thanks for clearing that up. So whatever the cause, it's affecting the Hadley models on all hosts, not OpenIFS, so that should help to pin it down. It's not my area of expertise but I will ask Richard & nag CPDN to investigate & fix it. |
Send message Joined: 12 Apr 21 Posts: 317 Credit: 14,885,708 RAC: 18,983 |
I think it's already been relatively well established in this thread, as a strong hypothesis if I can put it that way, that the credit problem affects all Hadley models, that it started at the end of November, and that there's no credit because no trickles are showing up on the website and credit is awarded per trickle. It seems like we're rediscovering things again; hopefully this time it won't be forgotten. There's something that hasn't been mentioned before, that I've seen, that I think may be related and worth looking into at least. The problem seems to have started around the same time OIFS got released. OIFS is credited as all or nothing, and trickles never show up on the website. Hadley is credited per trickle and they (should) show up on the website. These seem to be 2 different credit mechanisms, and perhaps the Hadley one is not running but the OIFS one is? If it's not exactly that, perhaps there's some other connection, interaction or friction between how the 2 models get credited that's causing the issue? Another idea is perhaps asking around at the BOINC workshop for ideas as to where to look for a problem like this? |
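To make the contrast between the two crediting schemes described in the post above concrete, here is a rough sketch. It only models the behaviour as posters in this thread describe it; the function names, the trickle count and the credit figures are assumptions for illustration, and nothing here is taken from the real credit script.

```python
def per_trickle_credit(trickles_recorded: int, credit_per_trickle: float) -> float:
    """Per-trickle scheme (how Hadley credit is described in the thread):
    credit grows with the number of trickles the server has recorded."""
    return trickles_recorded * credit_per_trickle


def all_or_nothing_credit(complete: bool, task_credit: float) -> float:
    """All-or-nothing scheme (how OIFS credit appears to behave):
    full credit on successful completion, otherwise none."""
    return task_credit if complete else 0.0


# A task that completed successfully but has no trickles on record.
# The numbers are invented for the example.
print(per_trickle_credit(trickles_recorded=0, credit_per_trickle=50.0))
print(all_or_nothing_credit(complete=True, task_credit=600.0))
```

Under these assumptions, losing the trickle records wipes out credit only in the per-trickle scheme, which would match the symptom of Hadley tasks going uncredited while OIFS tasks are not affected.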
Send message Joined: 29 Oct 17 Posts: 1049 Credit: 16,476,460 RAC: 15,681 |
"There's something that hasn't been mentioned before, that I've seen, that I think may be related and worth looking into at least. The problem seems to have started around the same time OIFS got released. OIFS is credited as all or nothing, and trickles never show up on the website. Hadley is credited per trickle and they (should) show up on the website. These seem to be 2 different credit mechanisms and perhaps the Hadley one is not running but OIFS one is? If it's not exactly that perhaps there's some other connection, interaction, friction between how the 2 models get credited that's causing the issue?" It's not related to when OpenIFS was released. OpenIFS tasks first went out years ago. The models know nothing about each other, and the controlling code is completely different between the two (though that might be the cause of the problem). And I'm still puzzled, because in the last conversation I had with Andy he said that only trickles (for OpenIFS) are awarded credit, not completion. But I see what you mean. Richard's offered to take a closer look. Next time there's a tech meeting I'm in I'll bring it up; that will be more effective than the moderators or me sending emails. |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,717,389 RAC: 8,111 |
"Another idea is perhaps asking around at the BOINC workshop for ideas as to where to look for a problem like this?" Sadly, I don't think that will help. There's not much cross-over between projects at these events: the 'trickle' mechanism is pretty much unique to CPDN. Up till now. Both Glenn and I picked up a clear similarity with an emerging 'BlackHoles@Home' project, which aims to study Einsteinian physics through simulations of black hole development. Massive datasets, multi-month simulation runtimes - sound familiar? But they need to work on the difference between 'checkpoints' (stored locally by the client) and 'trickles' (reporting progress to the server). Intermediate uploads are a third contender in that space. |
Send message Joined: 29 Oct 17 Posts: 1049 Credit: 16,476,460 RAC: 15,681 |
"Up till now. Both Glenn and I picked up a clear similarity with an emerging 'BlackHoles@Home' project, which aims to study Einsteinian physics through simulations of black hole development. Massive datasets, multi-month simulation runtimes - sound familiar? But they need to work on the difference between 'checkpoints' (stored locally by the client), and 'trickles' (reporting progress to the server)." Yes, that was interesting. Their runs are so long that they have to upload the checkpoints to their servers, so they can be sent out to another computer to keep the task going if the first one goes down. That's equivalent to OpenIFS sending up its restart files (~1Gb) every 5-10 mins! I was interested in their docker approach, though puzzled why they went for docker rather than a non-root solution like Singularity. He seemed to argue that docker would allow a per-host build to take advantage of the chipset options, but I suspect the difference in speed they will see is dwarfed by the variation in up-time of the client & %age cpu that volunteers set. And they had large memory tasks, with all the fun that will create for them (as we well found out). Nice project though, and well funded. Once they get past teething problems, it will be a fun one to join. |
Send message Joined: 12 Apr 21 Posts: 317 Credit: 14,885,708 RAC: 18,983 |
"It's not related to when OpenIFS was released. OpenIFS tasks first went out years ago. The models know nothing about each other, the controlling code is completely different between the two (though that might be the cause of the problem)." I'm not sure what you mean, as I don't know what went on behind the scenes, but I got my first OIFS tasks on 28 November of last year, and my first un-credited Hadley tasks reported as completed on 30 November. So what I see is that the OIFS production release happened around the same time as Hadley models stopped getting credit. It just seems like there might be a connection there somehow. "And I'm still puzzled because the last conversation I had with Andy was that only trickles (for OpenIFS) are awarded credit, not completion. But I see what you mean." Yeah, that is puzzling. I've never seen partially credited OIFS tasks; it seems like it's all or nothing and not based on trickles. Maybe there's some kind of redundant process that happens with OIFS (but not Hadley), so that if a task is successfully completed but has no credit (even though it should, because trickles are credited) then full credit is awarded at completion? This would explain why OIFS is credited and Hadley isn't, even though both are credited by trickles but neither has trickles show up on the website, so neither should be credited. 
It seems like the question to investigate is why starting 30 November trickles stopped showing up on the website (for Hadley). It seems like a narrow and specific enough question. Reviewing the process of how trickles show up on the website might be a good starting point. It might also reveal why OIFS trickles don't show up. |
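One way to pin down the cut-off date is to tabulate reported tasks by date and trickle count and look for the boundary. The sketch below assumes the report date, model name and number of trickles shown can be gathered for each task, for example by hand from the task pages; the rows are invented sample data, not real results, and the function names are hypothetical.

```python
from datetime import date
from typing import Optional

# (reported_on, model, trickles_shown): hypothetical rows for illustration only.
Report = tuple[date, str, int]

SAMPLE_REPORTS: list[Report] = [
    (date(2022, 11, 20), "HadSM4", 12),
    (date(2022, 11, 27), "HadAM4h", 8),
    (date(2022, 11, 30), "HadSM4", 0),
    (date(2022, 12, 5), "HadAM4h", 0),
    (date(2022, 12, 6), "OpenIFS", 0),
]


def first_report_without_trickles(reports: list[Report], model_prefix: str) -> Optional[date]:
    """Earliest report date for matching models where no trickles were shown."""
    dates = [day for day, model, trickles in reports
             if model.startswith(model_prefix) and trickles == 0]
    return min(dates, default=None)


def last_report_with_trickles(reports: list[Report], model_prefix: str) -> Optional[date]:
    """Latest report date for matching models where trickles were shown."""
    dates = [day for day, model, trickles in reports
             if model.startswith(model_prefix) and trickles > 0]
    return max(dates, default=None)


print(last_report_with_trickles(SAMPLE_REPORTS, "Had"))
print(first_report_without_trickles(SAMPLE_REPORTS, "Had"))
```

If the two dates come out close together across many hosts, that points to a single server-side change rather than individual network problems.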
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
"Yeah that is puzzling, I've never seen partially credited OIFS tasks," Agreed. My comments on credit are virtually all in response to others posting when there's a problem, as I don't really bother about it much. After reading your post, I have gone back through all of my main site OIFS tasks, and none of the partially completed ones have been granted any credit, even when I can see over half the zips have uploaded. This makes me think somehow Andy has got things mixed up a bit. There have been times in the past when credit hasn't shown despite zips being on the website, but most of them have been when there were problems with the credit script having fallen over or not been restarted after an event of some kind. There have also been times when the credits have appeared despite zips not showing on the task pages, presumably because the problem occurs after the processes to display them and the ones to go into the credit script separate. |
Send message Joined: 5 Aug 04 Posts: 127 Credit: 24,488,145 RAC: 21,123 |
"it's affecting the Hadley models on all hosts" Not just Hadleys, it also affects WAH2 models running on Windows, example https://www.cpdn.org/result.php?resultid=22250721 from December 2022. As is common, it says "No trickles!". |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
"Not just Hadleys, it also affects WAH2 models running on Windows, example https://www.cpdn.org/result.php?resultid=22250721" All tasks prior to OIFS coming on the scene are using the Hadley model from the Met Office, just using it in slightly different ways. I don't know what the differences are well enough to explain them though. |
Send message Joined: 12 Apr 21 Posts: 317 Credit: 14,885,708 RAC: 18,983 |
"There have been times in the past when credit hasn't shown despite zips being on the website but most of them have been when there are problems with the credit script having fallen over or not been restarted after an event of some kind. There have also been times when the credits have appeared despite zips not showing on the task pages, presumably because the problem occurs after the processes to display them and the ones to go into the credit script separate." I think I have seen the former but not the latter. If the latter is possible, that means there's another complication as to where one could look to investigate the current problem. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
"I think I have seen the former but not the latter. If the latter is possible, that means there's another complication as to where one could look to investigate the current problem." Or possibly it narrows down where to look? |
©2024 cpdn.org