Message boards : Number crunching : no credit awarded?
Author | Message |
---|---|
Send message Joined: 3 Nov 05 Posts: 26 Credit: 687,388 RAC: 529 |
Jumping back to a checkpoint is usually very obvious (representing many hours of work lost), and I didn't notice that happening without obvious cause (such as a power failure or manual suspension). I wonder if it's something to do with hibernation causing a small jump back every evening, but not one that's big enough to notice on the progress? The longer a task runs, the more that would multiply up. I have just taken a quick scan through the Universe@Home results for the host, and I can't see any sign that the virtual machine started running faster when the last CPDN task finished (unfortunately I can't go back very far, and other projects have an even shorter results record). |
Send message Joined: 5 Aug 04 Posts: 178 Credit: 18,972,385 RAC: 40,328 |
"Jumping back to a checkpoint is usually very obvious (representing many hours of work lost), and I didn't notice that happening without obvious cause (such as a power failure or manual suspension). I wonder if it's something to do with hibernation causing a small jump back every evening, but not one that's big enough to notice on the progress? The longer a task runs, the more that would multiply up." Or you started every day with the same checkpoint because you didn't reach the next one. --- Supporting BOINC, a great concept! |
Send message Joined: 3 Nov 05 Posts: 26 Credit: 687,388 RAC: 529 |
Then I wouldn't have made any progress :) |
Send message Joined: 6 Sep 05 Posts: 24 Credit: 21,529 RAC: 0 |
A question for you: does the thread view count increase for users who are not logged in? I read the thread first and only had to log in to post, so I'm guessing the view counter had already been incremented when I first accessed it. |
Send message Joined: 29 Oct 17 Posts: 1049 Credit: 16,545,204 RAC: 16,601 |
"Jumping back to a checkpoint is usually very obvious (representing many hours of work lost), and I didn't notice that happening without obvious cause (such as a power failure or manual suspension). I wonder if it's something to do with hibernation causing a small jump back every evening, but not one that's big enough to notice on the progress? The longer a task runs, the more that would multiply up." It's not much repeated (or 'lost') work for the OpenIFS tasks: about 5-15 minutes, depending on your CPU. It won't be obvious in the percentage progress because it makes a tiny difference. As hibernate puts the contents of RAM to swap, yes, that will push the model out of memory (true for any BOINC task), causing it to restart from a checkpoint when the machine wakes up (I usually 'suspend'). However, if you only hibernate once a day, that's not going to make much difference. The task will still do plenty of work whilst the machine is awake. To run as slowly as you noticed, it must be dropping out of RAM frequently, or you have a lot of CPU contention and the task is barely running. Perhaps try watching it in top or htop to see how much of the machine's resources it gets? |
Send message Joined: 22 May 21 Posts: 39 Credit: 1,208,413 RAC: 3,997 |
Back to the original question for this thread. Any news on awarding credit for completed HADCM3 tasks? Further news on proposed Mac Intel IFS tasks would be appreciated as well. Thanks! |
Send message Joined: 22 May 21 Posts: 39 Credit: 1,208,413 RAC: 3,997 |
"Back to the original question for this thread. Any news on awarding credit for completed HADCM3 tasks?" Still nothing? |
Send message Joined: 29 Oct 17 Posts: 1049 Credit: 16,545,204 RAC: 16,601 |
"Back to the original question for this thread. Any news on awarding credit for completed HADCM3 tasks?" "Still nothing?" I can bring this up at the next CPDN tech meeting. Is it just your HadCM3 tasks? I can't see your computers, so I can't get the task IDs you've run. I know they had problems with the credit script a while ago. I'm not sure if that's the reason. I'll see if I can get an answer for you. As for the Mac Intel OpenIFS tasks, I'm working on it this week. It'll go to testing first, though, before appearing in production. --- CPDN Visiting Scientist |
Send message Joined: 22 May 21 Posts: 39 Credit: 1,208,413 RAC: 3,997 |
"I can bring this up at the next CPDN tech meeting. Is it just your HadCM3 tasks? I can't see your computers, so I can't get the task IDs you've run." Yes. I only run CPDN on some older Intel Macs, so only HadCM3 tasks. "I know they had problems with the credit script a while ago. I'm not sure if that's the reason. I'll see if I can get an answer for you." Thanks loads! This is for all the tasks I've completed since November of last year. "As for the Mac Intel OpenIFS tasks, I'm working on it this week. It'll go to testing first, though, before appearing in production." Sweet! I'm looking forward to trying that out. Any OS limitations? I've seen some BOINC projects that have Intel-based tasks that will run on M1 Macs. Any chance that these OpenIFS tasks will? Again, thanks loads for bringing this up again at a tech meeting. That's as much as I could ask for. Bull |
Send message Joined: 29 Oct 17 Posts: 1049 Credit: 16,545,204 RAC: 16,601 |
"As for the Mac Intel OpenIFS tasks, I'm working on it this week. It'll go to testing first, though, before appearing in production." "Sweet! I'm looking forward to trying that out. Any OS limitations? I've seen some BOINC projects that have Intel-based tasks that will run on M1 Macs. Any chance that these OpenIFS tasks will?" I only have an Intel iMac running High Sierra to develop on. I assume that the M1/M2 Macs will use Rosetta, but until we try I honestly don't know whether the code will run OK. It usually comes down to how good the low-level system support is (for things like filesystem functions, CPU time, etc.). If it doesn't run and it's not something I can fix in a week, we'll probably leave it, as it's really not the highest priority. --- CPDN Visiting Scientist |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
"Will this work unit ever get validated, or does it need an admin to intervene?" It may need intervention to get the credit when Andy gets a chance, but validation isn't used by CPDN. Credit is based on the trickle up files that generally go at the same time as the zips are uploaded. |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,729,836 RAC: 7,099 |
"Credit is based on the trickle up files that generally go at the same time as the zips are uploaded." That's the way it always used to be, but something seems to have slipped in the last three months or so. There was a credit run late last night or very early this morning (UTC, 25/26 Feb), just as I was finishing up the last of my batch 993 tasks. One task, reported at 23:50, has been awarded full credit; the next, reported at 04:27, still shows zero. In the 'trickle' days, that one would have received credit for the trickles received before, say, midnight. Another strange thing: my event log has an entry for 26-Feb-2023 00:51:38 [climateprediction.net] [sched_op] handle_scheduler_reply(): got ack for task oifs_43r3_01i7_2019110100_123_993_12215389_0 That's task 22316800, which the server says is still in progress. The event log timing (also UTC) suggests that it was reported right in the middle of the period when I'm suggesting the credit script was running. Could that have interfered with the status update? There have been suggestions on the message boards that we currently have two different credit scripts running on different servers, an old one and a new one. But it seems to be more complicated than that. I quite understand that the project team have had their hands full with the testing and launch of the new apps, and delivering the results to the commissioning scientists in spite of problems with the upload servers. But there will come a time when - I hope - they will be able to take a step back and review the health of the project as a whole. |
Send message Joined: 29 Oct 17 Posts: 1049 Credit: 16,545,204 RAC: 16,601 |
"There have been suggestions on the message boards that we currently have two different credit scripts running on different servers, an old one and a new one. But it seems to be more complicated than that. I quite understand that the project team have had their hands full with the testing and launch of the new apps, and delivering the results to the commissioning scientists in spite of problems with the upload servers. But there will come a time when - I hope - they will be able to take a step back and review the health of the project as a whole." The 'two scripts' is a reference to the dev and production sites running different versions. The 'old' version is on the production site and the 'new' one is active on the dev site. They are not both active together. CPDN want to roll out the 'new' one to production, but it will completely alter how credit is computed, so they want to prepare something to go out to users first. That's as much as I know. Richard, I suspect you know more about the differences between the 'new' and 'old' BOINC credit scripts than I do. I'm sure I've seen you talk about it in other posts. --- CPDN Visiting Scientist |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,729,836 RAC: 7,099 |
"The 'two scripts' is a reference to the dev and production sites running different versions. The 'old' version is on the production site and the 'new' one is active on the dev site. They are not both active together. CPDN want to roll out the 'new' one to production, but it will completely alter how credit is computed, so they want to prepare something to go out to users first." Yes, those were the references I was alluding to (one script on each server, but different). But the question - in reference to bullschuck's question - becomes "How old is old?". His machines (1526736, 1519502) clearly show a problem. For tasks completed in July, trickles were displayed on the result pages, and credit was awarded - including partial credit according to the trickle reached, for tasks which didn't complete. But tasks completed in December or later aren't showing their trickles, and aren't getting any credit, either. But IFS tasks are getting credit on the production site, for completed tasks at least - even though they aren't showing their trickles. And tasks on the dev site are showing their trickles for both IFS and Hadley tasks. So we seem to have at least three scripts in play: should we call them old, middle-aged, and young? I did do some work for Milo Thurston back in the day, when we had a RAC problem on one particular application. But any knowledge I gained on that occasion is positively geriatric by comparison. That's why I'm suggesting that the time has come (subject to other constraints, which come first) for a thorough re-examination of the current situation. I'm happy to lend a hand in that process, if it would help. |
Send message Joined: 4 Oct 15 Posts: 34 Credit: 9,075,151 RAC: 374 |
"Is it just your HadCM3 tasks?" For me, it's everything except OpenIFS, and since at least the 4th of December last year. |
Send message Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
"Another strange thing: my event log has an entry for" Absolutely. I started seeing that behavior sometimes early in the hadam4h era. I've lost getting a status for several completed tasks over the last 3 or 4 years if they reported during the credit run. It doesn't always happen, but occasionally. There are others who posted about this situation as well, but those posts are probably scattered among several threads. |
Send message Joined: 12 Apr 21 Posts: 317 Credit: 14,913,871 RAC: 16,233 |
"Will this work unit ever get validated, or does it need an admin to intervene?" "It may need intervention to get the credit when Andy gets a chance, but validation isn't used by CPDN. Credit is based on the trickle up files that generally go at the same time as the zips are uploaded." Even though there's no cross-task validation as in other projects, validation does seem to happen. I've seen tasks just reported show up as Validation Pending for a very brief period of time, under 30 seconds. Perhaps some internal checks get done to make sure the result is valid and hasn't been tampered with or corrupted in some way. That task may not have been checked for some reason, or the check wasn't registered? |
Send message Joined: 12 Apr 21 Posts: 317 Credit: 14,913,871 RAC: 16,233 |
"That's why I'm suggesting that the time has come (subject to other constraints, which come first) for a thorough re-examination of the current situation." It's most definitely time. It's been 3 months since Hadley models stopped getting credit. From what I've been able to gather, the problem started at the end of November. There's also the RAC problem, which has been ongoing for weeks now. CPDN has a relatively small and patient user base. I'd be willing to bet that almost everyone likes credit and, to a small or large degree, cares about getting it. It seems a bit neglectful of the user base to let credit problems be anything but short-term problems. It's the only tangible, visible thing users get out of volunteering. I know that CPDN runs on minimal resources; at the same time, when do we as users become a high enough priority? |
Send message Joined: 29 Oct 17 Posts: 1049 Credit: 16,545,204 RAC: 16,601 |
"That's why I'm suggesting that the time has come (subject to other constraints, which come first) for a thorough re-examination of the current situation." To be rather blunt, they have had other priorities: trying to get OpenIFS projects run for paying customers, battling with cloud providers, and battling with university IT provision (a long story I'm not at liberty to divulge). Andy is their only IT guy, and he manages the lot. I appreciate credit matters to some, but you can't spend it, can't eat it and can't take it with you, so I'd rather they spent time getting a working system that's attractive for scientists to use. Otherwise, no-one will. Unfortunately there are no CPDN tech meetings this week or next due to interviews for MSc students and prep for the international BOINC meeting. I will bring it up when I get the chance to get an answer. --- CPDN Visiting Scientist |
Send message Joined: 3 Nov 05 Posts: 26 Credit: 687,388 RAC: 529 |
When you hold the door open for someone, it's only polite for that person to say "thank you". You can't spend it, can't eat it and can't take it with you, but if someone doesn't take the trouble to thank you, then the next time around you'll likely be letting the door slam shut in their face. That's why the door to my computer is currently shut to CPDN, and those paying customers can go elsewhere. And since they are paying, how about them paying for the expensive electricity that powers all those BOINC hosts? |
©2024 cpdn.org