Message boards : Number crunching : EAS batches 1001-4
Author | Message |
---|---|
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,022,240 RAC: 20,762 |
Thanks Glenn. |
Send message Joined: 2 Oct 06 Posts: 54 Credit: 27,309,613 RAC: 28,128 |
Great news! Thanks! |
Send message Joined: 29 Oct 17 Posts: 1049 Credit: 16,432,494 RAC: 17,331 |
I believe I've successfully sent an 'abort task' for batches 1002, 1003, 1004. As it's the first time I've done this, let me know if that's not what happened! As mentioned previously, the tasks were aborted as the scientific results were found to have errors due to problems with the input files. These are being corrected now and these batches will be resubmitted (we hope to run the revised app with these). --- CPDN Visiting Scientist |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,022,240 RAC: 20,762 |
I believe I've successfully sent an 'abort task' for batches 1002, 1003, 1004. As it's the first time I've done this, let me know if that's not what happened! I had already aborted mine, but the sudden drop in the number of tasks shown as running would suggest it has been successful. |
Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
Four 1001 tasks failed about the same time on my pipsqueak machine. They all ran a very long time, making progress. My machine is like this: Computer 1512658 Computer information Total credit 341,419 Average credit 5,407.74 CPU type GenuineIntel 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz [Family 6 Model 140 Stepping 1] Number of processors 8 Coprocessors --- Virtualization None Operating System Microsoft Windows 10 Core x64 Edition, (10.00.19045.00) BOINC version 7.24.1 Memory 15.64 GB Cache 256 KB Swap space 18.02 GB Total disk space 460.73 GB Free Disk Space 307.5 GB Measured floating point speed 3.92 billion ops/sec Measured integer speed 29.31 billion ops/sec Average upload rate 195.13 KB/sec Average download rate 6044.66 KB/sec Average turnaround time 10.74 days Here is one of them: Task 22396678 Name wah2_eas25_h1kj_201312_24_1001_012232001_2 Workunit 12232001 Created 7 Feb 2024, 22:49:15 UTC Sent 7 Feb 2024, 22:50:32 UTC Report deadline 6 Jun 2024, 22:50:32 UTC Received 14 Feb 2024, 22:35:59 UTC Server state Over Outcome Computation error Client state Compute error Exit status 9 (0x00000009) Unknown error code Computer ID 1512658 Run time 5 days 15 hours 45 min 54 sec CPU time 5 days 15 hours 31 min 43 sec Validate state Invalid Credit 8,304.81 Device peak FLOPS 3.92 GFLOPS Application version Weather At Home 2 (wah2) v8.24 windows_intelx86 Peak working set size 342.78 MB Peak swap size 310.43 MB Peak disk usage 94.46 MB Stderr <core_client_version>7.24.1</core_client_version> <![CDATA[ <message> The storage control block address is invalid. (0x9) - exit code 9 (0x9)</message> <stderr_txt> Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 
Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5180, selfPID=13064, iMonCtr=1 Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5180, selfPID=5180, iMonCtr=1 </stderr_txt> ]]> |
Send message Joined: 29 Oct 17 Posts: 1049 Credit: 16,432,494 RAC: 17,331 |
Thanks for reporting. Never seen that error before, but it's definitely coming from Windows, not the app itself, though it might be BOINC related. <message> The storage control block address is invalid. (0x9) - exit code 9 (0x9)</message> A quick Google suggests it's possibly related to updates? https://stackoverflow.com/questions/61939852/windows-process-activation-service-error-9-the-storage-control-block-address-is Nothing I can do on the CPDN side, but if it was me I'd reboot the machine. --- CPDN Visiting Scientist |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,708,278 RAC: 9,361 |
Particularly plausible because yesterday was 'patch Wednesday' in UTC: the Wednesday after the second Tuesday of the month. That's the usual day Microsoft releases major update packages - even large security update packages for Windows 7, which is otherwise out of support. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,022,240 RAC: 20,762 |
A resend I have picked up has this: couldn't start app: Task file wah2_8.29_windows_intelx86.exe: file missing</message> Not the virus scanner issue, as it got going and has produced 7 zips gaining 407,756.40 in credit. Task ID: 1549001. The computer in question seems to be trashing everything with this, though often not until several zips have been sent. The issue is there with both the region independent and the older app versions. |
Send message Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
A resend I have picked up has this. Dave, it looks like the 8.24 crashes were either signal 11 or they didn't have a stderr. For the region independent crashes, if 8.29 doesn't exist, how does it start the task, or the next RI task it crashes, etc.? Very strange. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,022,240 RAC: 20,762 |
if 8.29 doesn't exist, how does it start the task, or the next RI task it crashes, etc.? Very strange. One of my guesses is an intermittent disk or memory issue, but that will probably remain a theory without access to the machine! |
Send message Joined: 29 Oct 17 Posts: 1049 Credit: 16,432,494 RAC: 17,331 |
Dave, I presume you mean hostid: 1549001, not task id? It's an AV issue. Probably the app started ok first time, uploaded a few zips, then on a restart the AV scanner decided it didn't like it (or its ruleset updated, or it was turned off before). Shame I can't tell what AV system killed it. McAfee have accepted the exe as a false positive and fixed the problem, waiting for a reply from Norton. A resend I have picked up has this. --- CPDN Visiting Scientist |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,022,240 RAC: 20,762 |
Yes, that is what I meant. Yes that makes sense. Running Linux the only AV I use is ClamAV and I have yet to have an issue. Indeed the only things it has thrown up are a couple of dodgy emails that have scraped through my ISP's system. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,022,240 RAC: 20,762 |
On two different machines I am getting: Error reported by file upload server: Server is out of disk space. I have sent an email to Andy. Edit: If this continues for more than a day or so, it may be worth halting tasks to prevent the disk_bound error that happens when the size of the task on disk exceeds that which has been allowed in the setup files. Unless I am mistaken, Andy will have to let someone in Korea know unless they see my post on the Trello Board for the batch. |
Send message Joined: 29 Oct 17 Posts: 1049 Credit: 16,432,494 RAC: 17,331 |
I've contacted the project scientist directly. He's quicker at dealing with the IT people in S. Korea. p.s. we should probably make this a separate thread as it also affects batches 1006 & 1007. --- CPDN Visiting Scientist |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,022,240 RAC: 20,762 |
Thanks Glenn. Now 6PM in Korea. I don't know what hours he or his IT support people work. I guess we will get a clue from how quickly it gets sorted. |
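
For anyone wanting to keep an eye on the disk_bound concern raised above while uploads are stuck, here is a rough Python sketch. It is not part of BOINC or the CPDN app. The function names, the 10% headroom threshold and the idea of pointing it at a task's slot directory are all assumptions made for illustration, and the actual limit has to be looked up for the task in question.

```python
import os


def dir_size_bytes(path):
    """Sum the sizes of all regular files under path (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def headroom_fraction(used_bytes, bound_bytes):
    """Fraction of the allowed disk bound still free (negative once exceeded)."""
    if bound_bytes <= 0:
        raise ValueError("bound_bytes must be positive")
    return (bound_bytes - used_bytes) / bound_bytes


def should_suspend(used_bytes, bound_bytes, min_headroom=0.10):
    """True when less than min_headroom of the bound is left.

    min_headroom is an arbitrary illustrative threshold, not a BOINC setting.
    """
    return headroom_fraction(used_bytes, bound_bytes) < min_headroom
```

To use it, call `dir_size_bytes()` on the directory holding the task's files, then pass the result and the task's disk limit in bytes to `should_suspend()`. If that returns `True`, suspending the task until the upload server is back is probably the safer option.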
©2024 cpdn.org