Message boards : Number crunching : no credit awarded?
Author | Message |
---|---|
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,718,239 RAC: 8,054 |
There have been times in the past when credit hasn't shown despite zips being on the website but most of them have been when there are problems with the credit script having fallen over or not been restarted after an event of some kind. There have also been times when the credits have appeared despite zips not showing on the task pages, presumably because the problem occurs after the processes to display them and the ones to go into the credit script separate.

I think that's just a simple matter of timing. The original system had two scripts: one to copy the trickles to a place where they could be seen on the website and used in credit calculations, and the other to work out the actual credit and RAC. They both took several hours to run, and the first had to finish before the second one started, otherwise some hosts got missed (that was another problem).

One script ran on an interval basis: "every 24 hours (then) since the project had last been restarted". The other ran as a cron job: "at hh:mm o'clock every day". If emergency maintenance meant that the project had to be restarted at an unusual time of day, those timings could clash, and credit was erratic until the staff could get round to an orderly, planned, restart - with a check that every component was active, and running in the right sequence. Until the next time ...

I don't know what the current mechanism is supposed to be: just that it doesn't appear to be going to plan. If my offer to take a look is taken up, I suppose the first question is: "can you supply me with a schematic flow-chart of the expected credit system as it stands now?". If they don't have one to hand, then drawing one up would be a useful first step. |
Send message Joined: 29 Oct 17 Posts: 1049 Credit: 16,476,460 RAC: 15,681 |
One script ran on an interval basis: "every 24 hours (then) since the project had last been restarted". The other ran as a cron job: "at hh:mm o'clock every day". If emergency maintenance meant that the project had to be restarted at an unusual time of day, those timings could clash, and credit was erratic until the staff could get round to an orderly, planned, restart - with a check that every component was active, and running in the right sequence. Until the next time ...

That's a bizarre way of doing it. If there's a dependency between the scripts, either they should both be in the same cron job or the first script should trigger the second when it completes. Andy did tell me the credit script had been disrupted way back at the beginning of the year - he didn't go into details. That might be part of it but it doesn't seem to be working now either.
--- CPDN Visiting Scientist |
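As a rough illustration of the suggestion above (one job that runs the dependent stages in order, rather than two independently timed jobs), here is a minimal Python sketch. The step names and the commands are hypothetical stand-ins; nothing here reflects the actual CPDN scripts, which nobody in this thread has seen.

```python
import subprocess
import sys


def run_pipeline(steps):
    """Run (name, command) pairs in order and stop at the first failure.

    Returns the names of the steps that completed successfully.
    """
    completed = []
    for name, cmd in steps:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            # A later step depends on this one, so do not carry on.
            print(f"{name} failed (exit code {result.returncode}); later steps skipped")
            break
        completed.append(name)
    return completed


if __name__ == "__main__":
    # Hypothetical stand-ins for "copy trickles" and "compute credit".
    steps = [
        ("copy_trickles", [sys.executable, "-c", "print('trickles copied')"]),
        ("compute_credit", [sys.executable, "-c", "print('credit computed')"]),
    ]
    print(run_pipeline(steps))
```

Scheduling a single wrapper like this from one cron entry means the second stage can never start before the first has finished, whatever time of day the project was last restarted.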
Send message Joined: 29 Oct 17 Posts: 1049 Credit: 16,476,460 RAC: 15,681 |
I'm not sure what you mean as I don't know what went on behind scenes but I got my first OIFS tasks on 28 November of last year, and my first un-credited Hadley tasks reported as completed on 30 November. So what I see is that OIFS production release happened around the same time as Hadley models stopped getting credit. It just seems like there just might be a connection there somehow.

The problem seems to have started around the same time OIFS got released. OIFS is credited as all or nothing, and trickles never show up on the website. Hadley is credited per trickle and they (should) show up on the website.

It's not related to when OpenIFS was released. OpenIFS tasks first went out years ago. The models know nothing about each other, and the controlling code is completely different between the two (though that might be the cause of the problem). OpenIFS first appeared on the production CPDN site in 2020. There is a paper in the scientific literature based on the results from those batches. Then there was a long pause when the model was updated but small batches were released prior to the very big batches we saw end of last year. There has been no change to the way trickles are handled from the task/client side since 2020. I think the issues are at the CPDN server end. |
Send message Joined: 5 Aug 04 Posts: 127 Credit: 24,490,630 RAC: 21,281 |
The original system had two scripts - one to copy the trickles to a place where they could be seen on the website

This script, or whatever was supposed to replace it, clearly isn't working, as seen from the "No trickle!" on the website. Based on the 11 August 2022 batch of WAH2 work, since trickles did work in August but not in December (when the original issue errored-out), it doesn't look like a mis-configuration of the actual WU. Instead, some possibilities include:
1: The trickle script can't copy to the directory, because the directory was accidentally write-protected, is physically full, has hit its quota, or has accidentally lost access rights.
2: The ini-file that tells the trickle script where to copy trickles was changed to point to a new directory, but neither the web pages nor the credit script were updated to the new directory.
3: The trickle script is stuck on a specific trickle and, even if re-started, gets stuck on the same "bad" trickle.
4: The BOINC server was updated or re-configured and someone "forgot" to extract trickle information from the scheduler, or it is extracted to a different directory from the one the trickle script expects.
5: Since OpenIFS apparently does not rely on trickles for crediting, someone incorrectly assumed that trickles no longer needed to be copied.

Note: chances are that when the problem with trickles not showing up on the web page is fixed, the credit will also be fixed on the next credit run (unless I've overlooked an example where "recent" trickles do show on the web page but there is still no credit). |
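Possibility 1 in the list above is the kind of thing that can be checked mechanically. The sketch below is only an illustration of such a check in Python: the function name and the free-space threshold are made up for the example, the real destination directory is unknown to anyone on the forum, and `shutil.disk_usage` reports filesystem space, so it will not necessarily notice a per-user quota.

```python
import os
import shutil
import tempfile


def check_trickle_dir(path, min_free_bytes=10 * 1024 * 1024):
    """Return a list of human-readable problems found with a destination directory."""
    if not os.path.isdir(path):
        return [f"{path}: missing or not a directory"]
    problems = []
    # Creating files in a directory needs write and search (execute) permission.
    if not os.access(path, os.W_OK | os.X_OK):
        problems.append(f"{path}: not writable by the current user")
    free = shutil.disk_usage(path).free
    if free < min_free_bytes:
        problems.append(f"{path}: only {free} bytes free on this filesystem")
    return problems


if __name__ == "__main__":
    # A temporary directory stands in for the real (unknown) trickle directory.
    with tempfile.TemporaryDirectory() as demo_dir:
        print(check_trickle_dir(demo_dir) or "no obvious problems")
```

An empty list only rules out the simplest causes; possibilities 2 to 5 need someone with access to the server configuration.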
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,718,239 RAC: 8,054 |
That's a bizarre way of doing it.

I'm probably referring back to around late 2009 (that's when I last looked in detail at credit), or even earlier. To me, it smells like a quick'n'dirty kludge, thrown together in the early days of the project (and of BOINC), to bridge the gap between two parts of an incomplete system. Never expected, or intended, to be still running 20 years later with today's vastly quicker flow of results from modern tech.

Did you refer to the history of David Anderson's involvement with BOINC, that he alluded to at the start of his talk to the workshop? The section on CPDN is illuminating, though I don't trust David's recall of history - my name appears later on in the blog, and the roles he ascribes to me are broadly accurate, but that's an amendment after Jord appealed. I still don't recognise myself. But here's the CPDN section, for what it's worth:

When we released the BOINC-based version of SETI@home to the public, there was a lot of backlash. People don't like change in general, and they didn't like the complexity of BOINC. We lost a big fraction of our volunteer base; it went from 600K to 300K or something like that.
Myles Allen, Climateprediction.net, and Oxford

Note that the role of the BBC in promoting the early, pre-BOINC, stage of CPDN's life has escaped David's notice. |
Send message Joined: 12 May 05 Posts: 34 Credit: 1,413,736 RAC: 2,585 |
It's been many years since I managed CPDN WU's. Moved my Linux AntiX VM onto a server to let it do some climate modeling work. And I'm still puzzled because the last conversation I had with Andy was that only trickles (for OpenIFS) are awarded credit, not completion. But I see what you mean. Richard's offered to take a closer look. Next time there's a tech meeting I'm in I'll bring it up, it will be more effective that way than myself or the moderators sending emails. I have 4 OpenIFS marked valid, the trickles were all uploaded successfully (according to their log and my client event log), yet they all have 0 credit. Not sure how to tell if the WU are only partially completed and the rest of the work went to another machine. But seeing that most of these lines are at less than 100% I guess means the model isn't completed and was moved on:

STATS FOR ALL TASKS
NUM  ROUTINE                                   CALLS   MEAN(ms)    MAX(ms)  FRAC(%)  UNBAL(%)
  0  CNT0 - COMPLETE EXECUTION                     1  *********  *********   100.00      0.00
  1  CNT4 - FORWARD INTEGRATION                    1  *********  *********    99.98      0.00
  8  SCAN2M - GRID-POINT DYNAMICS               3200    14521.4    14521.4    43.02      0.00
  9  SPCM - SPECTRAL COMP.                      2952     1842.4     1842.4     5.03      0.00
 10  SCAN2M - PHYSICS                           2953     9882.9     9882.9    27.02      0.00
 11  IOPACK - OUTPUT P.P. RESULTS                247     6811.2     6811.2     1.56      0.00
 12  SPNORM - SPECTRAL NORM COMP.                126       82.3       82.3     0.01      0.00
 13  SCAN2M - RADIATION CALC.                    985    82359.3    82359.3    75.10      0.00
 14  SUINIF                                        1    14351.2    14351.2     0.01      0.00
 17  GRIDFPOS IN CNT4                            247      362.0      362.0     0.08      0.00
 18  SUSPECG                                       1     3399.0     3399.0     0.00      0.00
 19  SUSPEC                                        1     3468.4     3468.4     0.00      0.00
 24  SUGRIDU                                       1     7905.6     7905.6     0.01      0.00
 25  SPECRT                                        1     1461.0     1461.0     0.00      0.00
 26  SUGRIDF                                       1     1516.0     1516.0     0.00      0.00
 27  RESTART FILES - WRITING                     123    13675.8    13675.8     1.56      0.00
 28  RESTART FILES - READING                       1        0.0        0.0     0.00      0.00
 29  SU4FPOS IN CNT4                             247        1.4        1.4     0.00      0.00
 30  DYNFPOS IN CNT4                             247    17375.5    17375.5     3.97      0.00
 31  POSDDH IN STEPO                              13       36.4       36.4     0.00      0.00
 37  CPGLAG - SL COMPUTATIONS                   2953   -53919.1        0.0     0.00    147.40
 38  WAM - TOTAL COST OF WAVE MODEL             2952    23517.5    23517.5    64.27      0.00
 39  SU0YOMB                                       1     1564.1     1564.1     0.00      0.00
 51  SCAN2M - SL COMM. PART 1                   2953       59.5       59.5     0.16      0.00
 54  SPCM - M TO S/S TO M TRANSP.               2952      367.6      367.6     1.00      0.00
 55  SPCIMPF - S TO M/M TO S TRANSP.            2952       82.1       82.1     0.22      0.00
 56  SPNORM - SPECTRAL NORM COMM.                126        1.3        1.3     0.00      0.00
102  LTINV_CTL - INVERSE LEGENDRE TRANSFORM    10094     1333.5     1333.5    12.46      0.00
103  LTDIR_CTL - DIRECT LEGENDRE TRANSFORM      6152     1427.9     1427.9     8.13      0.00
106  FTDIR_CTL - DIRECT FOURIER TRANSFORM       6152      228.6      228.6     1.30      0.00
107  FTINV_CTL - INVERSE FOURIER TRANSFORM     10094      233.5      233.5     2.18      0.00
140  SULEG - COMP. OF LEGENDRE POL.                2      127.7      127.7     0.00      0.00
152  LTINV_CTL - M TO L TRANSPOSITION          10094       59.8       59.8     0.56      0.00
153  LTDIR_CTL - L TO M TRANSPOSITION           6152       64.6       64.6     0.37      0.00
157  FTINV_CTL - L TO G TRANSPOSITION          10094       78.6       78.6     0.73      0.00
158  FTDIR_CTL - G TO L TRANSPOSITION           6152       65.8       65.8     0.37      0.00
400  GSTATS                                   589499        0.0        0.0     0.00      0.00
401  GSTATS HOOK                              564603        0.0        0.0     0.00      0.00
TOTAL MEASURED IMBALANCE = 0.0 SECONDS, 0.0 PERCENT
TOTAL WALLCLOCK TIME 108019.4 CPU TIME 503376.7 VECTOR TIME 503376.7

From Richard's last comment; I'm guessing the new credit script failed to catch these 4 WU's or the scripts haven't run since they completed about 11 hours ago. I see nothing abnormal to this run and so rather than starting a new thread, guess this should be added to this conversation.
https://www.cpdn.org/result.php?resultid=22315582
--------
For Glenn Carver: My BOINC credit was awarded coins, which were translated into fiat dollars of $1800 that bought 3 used rack servers that went into making me a more productive member of the BOINC community. Proof of work coins are still viable and the electricity goes into actual science work (and heating my home), like finding primes, but not currently climate research... which is a shame.

All labor that human hands and minds do must become paid work as AI rises to take over more duties, and we may eventually need to "pay" the AIs, so human wages can compete with their "wages". Their wages will need to go to charities, or to fund basic monthly income for humans, as they take over more employment. Chat bots are already making inroads into help desk duties. Wealth disparities can lead societies to civil wars https://phys.org/news/2014-06-rich-poor-gap-civil-war.html and the disparities are growing, and that's not just the pandemic's effect.

So yeah, I at least want a cookie, or some credit, for my time spent on these WU's. And great, you all found some people willing to pay for the modeling services. If they are paying then send some of those funds our way because managing 400-800 BOINC computing cores is human work, not an AI's, yet... I'm 60, with a physics degree, yet looking at never being able to retire, and needing to work till I die. And if you think my anger isn't appropriate then stop making disparaging comments about users who like to get simple tokens of credit for the time, which is worth money, spent running your research... It's like I tell believers: "If you don't want your religion criticized then don't bring it up". Do not tell us we should not even worry about getting credit. We deserve credit and we also deserve cash for our labor. |
Send message Joined: 5 Aug 04 Posts: 127 Credit: 24,490,630 RAC: 21,281 |
Note that the role of the BBC in promoting the early, pre-BOINC, stage of CPDN's life has escaped David's notice.

While I did run the pre-BOINC CPDN client, I can't remember the BBC being mentioned, but then again it's roughly 20 years ago. The "special" BBC CPDN experiment that started in 2006, on the other hand, did use BOINC. BTW, maybe my recollection is too fuzzy, but after the BBC experiment shut down, didn't these BBC credits once upon a time show up here as a separate field on individual users' pages? I just checked and didn't see such a field. |
Send message Joined: 29 Oct 17 Posts: 1049 Credit: 16,476,460 RAC: 15,681 |
Richard wrote: Did you refer to the history of David Anderson's involvement with BOINC....here's the CPDN section, for what it's worth:

Yes, exactly, what rubbish. David clearly has an issue with CPDN. What he doesn't know is there were issues with Carl's behaviour which I am not at liberty to talk about. If you look at the list of publications from boinc projects in the scientific literature (https://boinc.berkeley.edu/pubs.php), CPDN stands 2nd with 140 publications; only Rosetta has more and most boinc projects publish a lot less. Given how much effort & time it takes to get a boinc project up and running, that's a poor return on grant money for a lot of boinc projects. If I was still on grant panels, I'd want to see a better publication record. Scientific publications are still a key measure of scientific impact. I really don't know what basis or measure David A. has for his comments. It reads badly for him frankly, it looks like sour grapes on his part.

I sat in on the online boinc workshop last week, it was not great. It came across as a bunch of older men patting themselves on the back, led by David A. The only highlight was the talk by the Prof introducing BlackHoles@Home, but he highlighted the various issues projects still have using boinc. It's a shame, boinc is a great idea but needs to be steered better from the front.

Anyway, this is drifting off topic. I agree the implementation of trickles & credit smells like a kludge. I think we've seen the software is not very robust in places. I am hoping to get the chance to chat to CPDN next week about this.
--- CPDN Visiting Scientist |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
I have 4 OpenIFS marked valid, the trickles were all uploaded successfully (according to their log and my client event log), yet they all have 0 credit.

Those tasks only completed yesterday. The credit script only runs once a week so they should be credited some time late Saturday or early Sunday. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
The "special" BBC CPDN experiment that started in 2006 on the other hand did use BOINC.

That is what got me started with CPDN though the loss of an email means I now have a different user name. |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,718,239 RAC: 8,054 |
I was possibly misled by a page I pulled up during an earlier conversation with Glenn: http://news.bbc.co.uk/1/hi/sci/tech/3100024.stm A page dated September 2003 says:

A massive worldwide online effort to predict how the global climate will change this century is being launched in the UK.

I assumed that was the start of the Beeb's editorial backing of the project as part of its educational support services, but I may have conflated two separate events. |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,718,239 RAC: 8,054 |
OK, back to credit issues, and specifically the breakdown of credit awards for Hadley tasks in late 2022. I find one of AndreyOR's computers very helpful in isolating the start of this event:

Filtered list of HadSM4 at N144 resolution tasks for computer 1526028, page 5

It is clear that tasks reported on Tuesday 29 Nov 2022, up to 20:29:05 UTC, have been granted credit. Tasks reported on Wednesday 30 Nov 2022, from 10:56:28 UTC, have not. Because it happened mid-week, it's unlikely to be a strict "credit script" event: it would most likely have become visible at a weekend, if that was the case. And looking at individual sample tasks, trickles disappeared from the task display in the same time interval. So I think it's more likely to be a problem introduced into the trickle transfer stage of the process.

Switching to trickles I've captured on my own machines at various times. These are from September 2014, and different task types, but they illustrate the flow. A trickle starts life as an XML file in the project's directory:

<variety>year</variety>
<wu>hadcm3s_1aby_2001_2_008988784</wu>
<result>hadcm3s_1aby_2001_2_008988784_1</result>
<ph>1</ph>
<ts>51840</ts>
<cp>187137</cp>
<vr>7.24</vr>
<ppname> trickle_hadcm3s_1aby_2001_2_008988784_1_2003.zip</ppname>
<pplen> 110326</pplen>
<ppdataz> 0MT $0! " " DJ=O4$_U^CWDV4 00!& , <' H%&9CUV,S]5,A)6>?)#,P$S7 R\%,P@3.X@S-X0S7Q\5;E%F;A]E,P S,?!'9NXV831D8 P+ ( PNX<.(C1&8 ... [snip] ... </ppdataz>

This gets copied by BOINC into a "sched_request" message to the project server. I'll ignore the ppdata to save space.

<msg_from_host>
<result_name>hadcm3s_1aby_2001_2_008988784_1</result_name>
<time>1410789211</time>
<variety>year</variety>
<wu>hadcm3s_1aby_2001_2_008988784</wu>
<result>hadcm3s_1aby_2001_2_008988784_1</result>
<ph>1</ph>
<ts>51840</ts>
<cp>187137</cp>
<vr>7.24</vr>
... [pp fields snipped] ...
</msg_from_host>

Note that at this stage, we only know the result by name: it has to be matched up by the server with the full result record in the database, which is keyed by ResultID number. I'm suspicious that this may be where our problems start.

At this stage, I have to switch to a Linux machine for the next part of the story. Be right back ... |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,718,239 RAC: 8,054 |
One click on the KVM button later...

Here's a version of that final <msg_from_host>, recorded from an IFS_bl task a couple of weeks ago.

<msg_from_host>
<result_name>oifs_43r3_bl_a27b_2016092300_15_991_12209642_0</result_name>
<time>1676364290</time>
<variety>orig</variety>
<wu>oifs_43r3_bl_a27b_2016092300_15_991_12209642</wu>
<result>oifs_43r3_bl_a27b_2016092300_15_991_12209642_0_r863024831</result>
<ph></ph>
<ts>864000</ts>
<cp>17458</cp>
<vr></vr>
</msg_from_host>

The pp fields are no longer used, and a couple of others are blank, but I doubt that matters. But please compare carefully the tag <result>. In the old hadcm3 tasks, that's identical to the <result_name> tag added by BOINC. But in IFS, it's been extended by _r863024831 - used in the upload file names.

IF (and that's a very big if) CPDN were relying on <result> to match a trickle to its ResultID, that would be a point of failure. It's the first smoking bearing in a very big machine. |
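To make the comparison above concrete, here is a small Python sketch that reads a <msg_from_host> block and reports whether <result> equals <result_name>, both as sent and after stripping a trailing `_r<digits>` suffix. It is purely illustrative: the function name is invented, the `_r<digits>` pattern is inferred from the single example above, and how the CPDN server really matches trickles to results is not known here.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: the extra part is always "_r" followed by digits at the end of the name.
UPLOAD_SUFFIX = re.compile(r"_r\d+$")


def check_result_match(msg_xml):
    """Compare <result> with <result_name> in one <msg_from_host> block."""
    msg = ET.fromstring(msg_xml)
    result_name = msg.findtext("result_name", default="").strip()
    result = msg.findtext("result", default="").strip()
    stripped = UPLOAD_SUFFIX.sub("", result)
    return {
        "exact": result == result_name,
        "after_strip": stripped == result_name,
    }


if __name__ == "__main__":
    oifs_msg = """<msg_from_host>
<result_name>oifs_43r3_bl_a27b_2016092300_15_991_12209642_0</result_name>
<wu>oifs_43r3_bl_a27b_2016092300_15_991_12209642</wu>
<result>oifs_43r3_bl_a27b_2016092300_15_991_12209642_0_r863024831</result>
</msg_from_host>"""
    print(check_result_match(oifs_msg))
```

A lookup keyed on <result_name> would be unaffected by the suffix; a lookup keyed on <result> would need a normalisation step of this kind. Which of the two the server uses is exactly the open question.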
Send message Joined: 29 Oct 17 Posts: 1049 Credit: 16,476,460 RAC: 15,681 |
Richard, thanks. This all looks promising, bottom line though is that this conversation needs to be had with Andy/CPDN. No-one on the forums will be able to progress this. There's a tech meeting Monday. I will show them and ask if you can help out. If you have any other input send me a PM. |
Send message Joined: 12 Apr 21 Posts: 317 Credit: 14,885,708 RAC: 18,983 |
OpenIFS first appeared on the production CPDN site in 2020. There is a paper in the scientific literature based on the results from those batches. Then there was a long pause when the model was updated but small batches were released prior to the very big batches we saw end of last year. There has been no change to the way trickles are handled from the task/client side since 2020. I think the issues are at the CPDN server end. Ok. That's before my time here so that's why I didn't know about it. I believe you didn't show up on the forums until last year too so to me it seemed like OIFS just started at CPDN last year, although I did see evidence on the website that its arrival has been in the works at least. I've always assumed that the problem is at the CPDN server end. With my comments, I was thinking that possibly the arrival of OIFS disrupted some things with trickles and credit handling by CPDN servers, not that there's an issue with the model or BOINC client. |
Send message Joined: 12 Apr 21 Posts: 317 Credit: 14,885,708 RAC: 18,983 |
I have a trickle pending from a task of the latest OIFS run, here's some current data on what Richard was talking about.

The entire contents of trickle_up_oifs_43r3_001t_2019110100_123_993_12213503_0_1677879929.xml file:

<variety>orig</variety>
<wu>oifs_43r3_001t_2019110100_123_993_12213503</wu>
<result>oifs_43r3_001t_2019110100_123_993_12213503_0_r1949673894</result>
<ph></ph>
<ts>10623600</ts>
<cp>84718</cp>
<vr></vr>

What I think is the relevant section of the sched_request_climateprediction.net.xml file:

<msg_from_host>
<result_name>oifs_43r3_001t_2019110100_123_993_12213503_0</result_name>
<time>1677877810</time>
<variety>orig</variety>
<wu>oifs_43r3_001t_2019110100_123_993_12213503</wu>
<result>oifs_43r3_001t_2019110100_123_993_12213503_0_r1949673894</result>
<ph></ph>
<ts>10368000</ts>
<cp>82660</cp>
<vr></vr>
</msg_from_host> |
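For anyone who wants to inspect their own pending trickles the same way, the sketch below shows one way to read these fields in Python. A trickle_up file like the one above has no single root element, so the example wraps it in a made-up <trickle> element before parsing; the helper names are invented, and older Hadley trickles carrying a raw <ppdataz> payload may well not parse as XML at all. The second helper just turns the <time> value (seconds since the Unix epoch) into a readable UTC timestamp.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def parse_trickle_fragment(text):
    """Parse a root-less trickle fragment into a {tag: text} dictionary."""
    # The wrapper element name is arbitrary; it only gives the parser a root.
    root = ET.fromstring(f"<trickle>{text}</trickle>")
    return {child.tag: (child.text or "").strip() for child in root}


def epoch_to_utc(seconds):
    """Convert a <time> value (Unix epoch seconds) to an aware UTC datetime."""
    return datetime.fromtimestamp(int(seconds), tz=timezone.utc)


if __name__ == "__main__":
    fragment = """<variety>orig</variety>
<wu>oifs_43r3_001t_2019110100_123_993_12213503</wu>
<result>oifs_43r3_001t_2019110100_123_993_12213503_0_r1949673894</result>
<ph></ph>
<ts>10623600</ts>
<cp>84718</cp>
<vr></vr>"""
    fields = parse_trickle_fragment(fragment)
    print(fields["result"], fields["ts"], fields["cp"])
    print(epoch_to_utc("1677877810").isoformat())
```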
Send message Joined: 22 May 21 Posts: 39 Credit: 1,197,645 RAC: 4,143 |
Richard, thanks. This all looks promising, bottom line though is that this conversation needs to be had with Andy/CPDN. No-one on the forums will be able to progress this. There's a tech meeting Monday. I will show them and ask if you can help out. If you have any other input send me a PM. Any report from the tech meeting on Monday? |
Send message Joined: 29 Oct 17 Posts: 1049 Credit: 16,476,460 RAC: 15,681 |
Richard, thanks. This all looks promising, bottom line though is that this conversation needs to be had with Andy/CPDN. No-one on the forums will be able to progress this. There's a tech meeting Monday. I will show them and ask if you can help out. If you have any other input send me a PM.

Any report from the tech meeting on Monday?

Richard is engaged with CPDN to isolate the problem. I'm sure he'll report here when there's more info.
--- CPDN Visiting Scientist |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,718,239 RAC: 8,054 |
Yes, I've written to Andy (who was busy with the BOINC workshop yesterday), and requested a specific chunk of data which will help us localise where the problems start. Once I receive that, I can work out whether we need to search forwards or back to the source of the trouble. It'll take several steps, and I won't keep up a running commentary, but I'll let you know when we make any significant change that may be observable in your own accounts. |
Send message Joined: 22 May 21 Posts: 39 Credit: 1,197,645 RAC: 4,143 |
Yes, I've written to Andy (who was busy with the BOINC workshop yesterday), and requested a specific chunk of data which will help us localise where the problems start. Once I receive that, I can work out whether we need to search forwards or back to the source of the trouble. Any updates yet? Please forgive me if I'm being a pest. Thanks! |
©2024 cpdn.org