Message boards : Number crunching : The uploads are stuck
Author | Message |
---|---|
Send message Joined: 14 Sep 08 Posts: 127 Credit: 43,925,559 RAC: 52,842 |
"Reminder to reset <ncpus> tag in cc_config.xml if you changed it" This reminds me of some other project's trick. They grant additional credit if tasks are returned within X days, to disincentivize excessive hoarding. No matter how people fake it, whether it's a bogus core count or multiple clients per machine, they can't fake the actual compute throughput of the machine. Given that CPDN's credit-granting script runs once a week instead of continuously, it might even be possible to adjust for upload server downtime if necessary. |
Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
"This reminds me of some other project's trick. They grant additional credits if tasks are returned within X days to disincentivize excessive hoarding." Back when my machine could always get work from CPDN and WCG (and Seti@home), I had at least 0.35 days of work and about 0.65 days of additional work in my preferences. Now that Seti@home is gone, WCG is sort-of back up, and CPDN is erratic in work availability, my machine is set to at least 0.50 days of work and 1.5 days of additional work, and that 1.5 days is not really enough. When the upload server went down, I mostly let my machine keep crunching and I got around 20 completed tasks before they started uploading again. I do not think of that as hoarding. IIRC, some of the more recent "classical" CPDN work took around 8 days to complete a task, and in the distant past (and on slower machines) tasks could take several months. But I do not think, with the new OIFS tasks, there is much point grabbing a month's supply because they would time out before I could get around to them. I usually leave my machine up 24/7. |
Send message Joined: 31 Aug 04 Posts: 391 Credit: 219,896,461 RAC: 649 |
"This reminds me of some other project's trick. They grant additional credits if tasks are returned within X days to disincentivize excessive hoarding." Yup, I'm an old-timer here. Times change: the new models have mucho memory needs, but we don't have to do interim backups any more (yet). Two decades ago models took months to complete, and we didn't want to waste a quarter of a workunit after a couple of weeks. Naah, we'll adapt to the new. And I've noticed the price decline and bought some ECC UDIMMs for my current hosts, to good effect. |
Send message Joined: 15 May 09 Posts: 4542 Credit: 19,039,635 RAC: 18,944 |
Just started getting this. Last time it all worked normally a few minutes later, so I'm not contacting Andy just yet.
Mon 30 Jan 2023 12:17:25 GMT | | Project communication failed: attempting access to reference site
Mon 30 Jan 2023 12:17:25 GMT | climateprediction.net | Temporarily failed upload of oifs_43r3_ps_0348_1989050100_123_958_12174992_1_r944602151_121.zip: transient HTTP error
Mon 30 Jan 2023 12:17:25 GMT | climateprediction.net | Backing off 00:02:15 on upload of oifs_43r3_ps_0348_1989050100_123_958_12174992_1_r944602151_121.zip
Mon 30 Jan 2023 12:17:25 GMT | climateprediction.net | Started upload of oifs_43r3_ps_0085_1992050100_123_961_12177729_2_r1618474980_1.zip
Mon 30 Jan 2023 12:17:26 GMT | | Internet access OK - project servers may be temporarily down.
Edit: Now going through again.
Edit2: All uploads here now stuck again.
Edit3: If anyone confirms it isn't just me, I will message Andy. |
Send message Joined: 7 Aug 04 Posts: 10 Credit: 148,100,750 RAC: 29,951 |
I have the same problem |
Send message Joined: 15 May 09 Posts: 4542 Credit: 19,039,635 RAC: 18,944 |
"I have the same problem" I have now messaged Andy. |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,818,575 RAC: 4,615 |
Started here around quarter past twelve:
30/01/2023 12:14:07 | climateprediction.net | Started upload of oifs_43r3_ps_0783_2001050100_123_970_12187427_1_r1449012272_86.zip
30/01/2023 12:20:00 | climateprediction.net | Backing off 00:02:04 on upload of oifs_43r3_ps_0783_2001050100_123_970_12187427_1_r1449012272_86.zip
I don't know why it waited 6 minutes before backing off - the timeout is usually two minutes. I'll look at http_debug if it persists.
Edit - nothing more to see with debug - just the timeout. |
Send message Joined: 15 May 09 Posts: 4542 Credit: 19,039,635 RAC: 18,944 |
Thanks, yes this server is inaccessible at the moment and I cannot get in through their online portal. I have emailed their helpdesk. |
Send message Joined: 27 Mar 21 Posts: 79 Credit: 78,322,658 RAC: 1,085 |
(on the avoidance of overly deep work buffers at client computers) wujj123456 wrote: "This reminds me of some other project's trick. They grant additional credits if tasks are returned within X days to disincentivize excessive hoarding. No matter how people fake it, whether it's bogus core count or multiple clients per machine, they can't fake the actual compute throughput of the machine. Given CPDN has credit granting script run once a week instead of continuously, it might even be possible to adjust for upload server downtime if necessary." GPUGrid used to apply different credit if the turnaround time of a result was <24 h, 24 to 48 h, or >48 h. I am not sure if they are still doing it. AFAICS the corresponding FAQ vanished from their message board. Folding@Home practically prevents work buffering in their client, and applies an extremely nonlinear credit based on turnaround time (to a degree which is ridiculous; little credit is given for the work done, much credit is given for the speed at which the work was done). Presumably GPUGrid did this, and F@H does this, because newer workunit batches are built from the results of previous workunit batches. |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,818,575 RAC: 4,615 |
Yes, they are. Their presentation of it is: Standard credit if task is reported 48 hours or more after issue. Plus 25% bonus if returned between 24 hours and 48 hours of issue. Plus 50% bonus if returned within 24 hours of issue. Some of their tasks follow each other in sequence - the results of one task are used to create the starting conditions of the next, up to five times. They say they need to complete the task sequences for the entire cohort quickly, in order to make the research ready for publication. They also award badges - usually some years after the computing is completed - to recognise the contribution of each volunteer to each publication. |
Send message Joined: 15 May 09 Posts: 4542 Credit: 19,039,635 RAC: 18,944 |
Back to the uploads. As we are now well past the closing time of the JASMIN support desk, and last I heard Andy can't look at the machine using the online portal, I would guess the chances of seeing uploads resume before tomorrow afternoon are remote. |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,818,575 RAC: 4,615 |
My task has finished now, and can simply sit and wait until the server is ready. I'm doing other work on the machine now - including a GPUGrid task, as it happens - so networking will remain active, and it'll keep trying as the backoffs end, until it gets through. |
Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
"If anyone confirms it isn't just me i will message Andy." I see you have already done this. I just got a work unit and it is in trouble right away. N.B.: Times are EST.
Mon 30 Jan 2023 11:25:51 AM EST | climateprediction.net | Starting task oifs_43r3_ps_0171_2001050100_123_970_12186815_1
Mon 30 Jan 2023 11:34:17 AM EST | climateprediction.net | Started upload of oifs_43r3_ps_0171_2001050100_123_970_12186815_1_r1404095512_0.zip
Mon 30 Jan 2023 11:36:17 AM EST | | Project communication failed: attempting access to reference site
Mon 30 Jan 2023 11:36:17 AM EST | climateprediction.net | Temporarily failed upload of oifs_43r3_ps_0171_2001050100_123_970_12186815_1_r1404095512_0.zip: transient HTTP error
Mon 30 Jan 2023 11:36:17 AM EST | climateprediction.net | Backing off 00:02:59 on upload of oifs_43r3_ps_0171_2001050100_123_970_12186815_1_r1404095512_0.zip
Mon 30 Jan 2023 11:36:19 AM EST | | Internet access OK - project servers may be temporarily down. |
Send message Joined: 29 Oct 17 Posts: 1052 Credit: 16,817,940 RAC: 12,877 |
"Yes, they are. Their presentation of it is:" It's a nice idea, but bear in mind that CPDN are stupidly under-resourced and therefore highly unlikely to create work for themselves. I'll mention it and see what response I get, credit is actually quite a pain for a project to have to manage from what I've learnt. |
Send message Joined: 6 Aug 04 Posts: 195 Credit: 28,617,787 RAC: 9,624 |
"I'll mention it and see what response I get, credit is actually quite a pain for a project to have to manage from what I've learnt." Returning valid results for the researchers to analyze is the important matter, credit is a 'nice to have'. Others may think otherwise, but I won't mind if the credit update is rare or doesn't happen. |
Send message Joined: 15 May 09 Posts: 4542 Credit: 19,039,635 RAC: 18,944 |
"I'll mention it and see what response I get, credit is actually quite a pain for a project to have to manage from what I've learnt." "Returning valid results for the researchers to analyze is the important matter, credit is a 'nice to have'. Others may think otherwise, but I won't mind if the credit update is rare or doesn't happen." I tend to agree. For myself, credit is more about another tool to spot problems than anything else. But I get that for some it is a motivating factor. |
Send message Joined: 12 Apr 21 Posts: 318 Credit: 15,031,602 RAC: 4,207 |
"I'll mention it and see what response I get, credit is actually quite a pain for a project to have to manage from what I've learnt." "Returning valid results for the researchers to analyze is the important matter, credit is a 'nice to have'. Others may think otherwise, but I won't mind if the credit update is rare or doesn't happen." Some kind of accounting of work done, in both relative and absolute terms, is important, I'd argue. I have little doubt that at one point or another everyone has looked at their credit and standings, even if just to get an idea of their level of contribution over time. It's pretty normal behavior, and I have no doubt that everyone would be asking for some way of knowing how much they had contributed if no such measure were provided at all. I agree that a complicated credit system is unnecessary and a weekly credit run is fine. Unlike other projects, on CPDN task counts going back to when you first joined don't get erased, so you can actually tell how many tasks you've processed and calculate your long-term error rate. I think that's kind of a nice thing, although my guess is that a cluttered and somewhat uninformative Project Status page is the trade-off. Badges might be nice, as they would likely give many people an incentive. That said, I'd rather see a number of other things improved before worrying about badges. |
Send message Joined: 29 Oct 17 Posts: 1052 Credit: 16,817,940 RAC: 12,877 |
News from CPDN/Andy. JASMIN maintenance issues are ongoing, no update on when it will be up yet. But it's out of CPDN's hands. CPDN are actively looking at ANZ server upload issues to work around recent reports of problems there. p.s. also not fussed about credit -- but I do like a badge or two.. :) |
Send message Joined: 28 Jul 19 Posts: 150 Credit: 12,830,559 RAC: 228 |
"I'll mention it and see what response I get, credit is actually quite a pain for a project to have to manage from what I've learnt." "Returning valid results for the researchers to analyze is the important matter, credit is a 'nice to have'. Others may think otherwise, but I won't mind if the credit update is rare or doesn't happen." Rather than the absolute credit count (although that is nice as well) I find the RAC a more useful measure of how well the system is performing - not that I’m hinting or anything :-) |
Send message Joined: 29 Oct 17 Posts: 1052 Credit: 16,817,940 RAC: 12,877 |
"Rather than the absolute credit count (although that is nice as well) I find the RAC a more useful measure of how well the system is performing - not that I’m hinting or anything :-)" As the RAC is a long-term average, over a month or two I think (someone correct me if I'm wrong), it's not a great measure of how well the system is performing. |
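
On the RAC point above, here is a rough sketch of how a recent-average figure of this kind behaves. It assumes an exponentially decaying average with a one-week half-life, which is how the BOINC documentation describes recent average credit as far as I know. The function and variable names are made up for illustration, and this is a simplification, not the actual BOINC or CPDN code.

```python
import math

SECONDS_PER_DAY = 86400.0
# Assumption: one-week half-life for the decaying average.
HALF_LIFE = 7 * SECONDS_PER_DAY


def update_rac(rac, elapsed_seconds, credit_granted, half_life=HALF_LIFE):
    """Return an updated recent-average credit, in credit per day.

    rac             -- previous average (credit per day)
    elapsed_seconds -- time since the average was last updated
    credit_granted  -- credit granted over that interval
    """
    if elapsed_seconds <= 0:
        # Limit of the formula below as the interval shrinks to zero.
        return rac + math.log(2) * credit_granted * SECONDS_PER_DAY / half_life
    # The old average decays by a factor of two per half-life.
    weight = math.exp(-elapsed_seconds * math.log(2) / half_life)
    # Credit rate over the interval, expressed per day.
    rate = credit_granted / (elapsed_seconds / SECONDS_PER_DAY)
    # Blend the decayed old average with the new rate.
    return rac * weight + (1.0 - weight) * rate
```

Under these assumptions, update_rac(1000.0, 7 * 86400, 0.0) comes out at about 500: a host that earns nothing for a week sees its RAC halve. A host that starts from zero and earns a steady 1,000 credits a day only reaches about 500 after the first week. So even with a one-week half-life the figure trails what the machine is doing right now by a few weeks.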
©2024 cpdn.org