Message boards : Number crunching : Upload failures
Author | Message |
---|---|
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Ah, Jasmin. That's at Oxford, and it's working fine for me. So it looks like it's something at your end. And DON'T go switching to the secure URL while you have tasks for the project, or you'll lose the lot. |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
I now have 50 uploads queued, 3700 MB of data.
That may be the problem. There's a limit to either the number of files or the amount of data that BOINC is happy with. If it IS the problem, then the cure is painful:
1. Suspend network access.
2. Suspend each and every one of the tasks in the Tasks tab. (This stops more files from being created.)
3. Create a temporary folder somewhere nearby.
4. Move all but 4-5 of the cpdn zip files to this folder. The ones left should be the lowest-numbered zips.
5. Resume network access and see if the zips left behind upload OK.
6. If so, move 4-5 of the zips back to their normal place, and upload them.
7. Repeat.
8. Unsuspend all of the tasks.
If this doesn't work, post here again, and we'll all go down to the pub for a few beers and a good whinge. |
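The file-moving part of the steps above (steps 3, 4 and 6) can be sketched in Python. This is only an illustration under stated assumptions: the function names `park_zips` and `restore_zips` are made up here, "lowest numbered" is taken to mean the number just before `.zip` in the file name, and the folder paths are whatever you pass in. It does not suspend network access or tasks; that still has to be done by hand in BOINC Manager first.

```python
import re
import shutil
from pathlib import Path


def trailing_index(path):
    """Sort key: the number just before '.zip' (e.g. 9 for '..._9.zip').

    Assumption: 'lowest numbered' refers to this trailing number.
    Files without such a number sort last.
    """
    match = re.search(r"_(\d+)\.zip$", path.name)
    return (int(match.group(1)) if match else float("inf"), path.name)


def park_zips(project_dir, holding_dir, keep=5):
    """Move all but the `keep` lowest-numbered zips into holding_dir."""
    project_dir, holding_dir = Path(project_dir), Path(holding_dir)
    holding_dir.mkdir(parents=True, exist_ok=True)
    zips = sorted(project_dir.glob("*.zip"), key=trailing_index)
    moved = []
    for path in zips[keep:]:
        shutil.move(str(path), str(holding_dir / path.name))
        moved.append(path.name)
    return moved


def restore_zips(holding_dir, project_dir, batch=5):
    """Move the next `batch` lowest-numbered zips back to project_dir."""
    holding_dir, project_dir = Path(holding_dir), Path(project_dir)
    zips = sorted(holding_dir.glob("*.zip"), key=trailing_index)
    restored = []
    for path in zips[:batch]:
        shutil.move(str(path), str(project_dir / path.name))
        restored.append(path.name)
    return restored
```

Call `park_zips` once with the project folder and a temporary folder, let the remaining files upload, then call `restore_zips` repeatedly until the temporary folder is empty. Moving rather than copying keeps only one copy of each zip on disk at any time.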
Send message Joined: 21 Feb 08 Posts: 47 Credit: 7,929,915 RAC: 0 |
Hmm, that's strange; I have never had these problems before. I had a computer with a 3G modem attached and connected it once a month to upload. I then had hundreds of tasks to upload and did not run into any problems. That was a year or two ago, though. I'll have to try this, then. |
Send message Joined: 21 Feb 08 Posts: 47 Credit: 7,929,915 RAC: 0 |
I should also say that I'm running a proxy (CCproxy) to be able to upload, but it has worked fine until now. :) Since the problem started, I have tried both an HTTP proxy and a SOCKS proxy, with the same error. Downloading new tasks still works fine, though. |
Send message Joined: 21 Feb 08 Posts: 47 Credit: 7,929,915 RAC: 0 |
I tried this now, but removing the zip files does not remove them from the transfer list. I tried stopping and restarting boincmgr and the BOINC service, with no difference. There also seem to be problems reaching jasmin-upload.cpdn.org to upload files. I moved the files from this location: C:\ProgramData\BOINC\projects\climateprediction.net |
Send message Joined: 18 Jul 13 Posts: 438 Credit: 25,620,508 RAC: 4,981 |
During the project's offline period, one of my machines killed 3 safr50 WUs with the following error.
<![CDATA[
<stderr_txt>
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Signal 11 received: Segment violation
Signal 11 received: Software termination signal from kill
Signal 11 received: Abnormal termination triggered by abort call
Signal 11 received, exiting...
03:20:25 (12508): called boinc_finish(193)
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2576, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=13092, iMonCtr=2
Model crash detected, will try to restart...
Leaving CPDN_ain::Monitor...
03:20:29 (13092): called boinc_finish(0)
</stderr_txt>
<message>
upload failure: <file_xfer_error> <file_name>wah2_safr50_a0lb_201512_13_817_011859012_0_r844198882_9.zip</file_name> <error_code>-240 (stat() failed)</error_code> </file_xfer_error>
I guess BOINC killed them because of the upload failures, which I noticed last week, but I was unable to check the full log of the machine. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,016,442 RAC: 21,024 |
"I guess BOINC killed them because of the upload failures, which I noticed last week, but I was unable to check the full log of the machine."
I don't think so; a segmentation violation is a program problem. "In computing, a segmentation fault or access violation is a fault, or failure condition, raised by hardware with memory protection, notifying an operating system the software has attempted to access a restricted area of memory." (Wikipedia)
Some model types seem much more prone to this than others, but I don't think anyone has really worked out what is causing it. The problem may be somewhere in the Met Office code that CPDN uses under license. |
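To separate the two things reported in that task output (the signal that ended the model and the upload error attached afterwards), a small sketch like the following may help. It assumes you have pasted the stderr text into a string. The function name `summarize_stderr` is made up for illustration, and the signal name comes from Python's own `signal` module on the machine running the script, not from BOINC or CPDN.

```python
import re
import signal


def summarize_stderr(text):
    """Return (signal names seen, [(file_name, error_code), ...])."""
    signals = []
    for num in re.findall(r"Signal (\d+) received", text):
        try:
            # Map the number to a name, e.g. 11 -> SIGSEGV.
            name = signal.Signals(int(num)).name
        except ValueError:
            # Number not known on this platform; keep it as-is.
            name = f"signal {num}"
        if name not in signals:
            signals.append(name)
    # Each <file_xfer_error> carries a file name followed by an error code.
    errors = re.findall(
        r"<file_name>(.*?)</file_name>\s*<error_code>(.*?)</error_code>",
        text,
        re.S,
    )
    return signals, errors
```

Run on the output quoted above, this would list the segmentation-fault signal once, however many differently worded "Signal 11 received" lines there are, alongside the zip that failed with `-240 (stat() failed)`.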
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Mephist0: It's been perhaps 10 years since this trick of reducing the number of zips in the Transfers queue was last used. I guess that the way BOINC works has changed a lot since then. As for not being able to reach the server Jasmin, could you please post the line from the Event Log that says this? |
Send message Joined: 21 Feb 08 Posts: 47 Credit: 7,929,915 RAC: 0 |
Sure, here it is. One thing is strange: <file_size> says 1310720 and 2610815, but the files are around 74 MB. That seems wrong? The file transfer also looks strange: when it looks finished, the transfer bar jumps to around 30%, and after that it finishes. It looks like it does some kind of resume. Is it possible to reset the resume function and not use it? Could that be the problem: it tries to resume the files, but Jasmin no longer has them?
2019-06-11 15:06:57 | | Resuming network activity
2019-06-11 15:06:57 | climateprediction.net | [fxd] starting upload, upload_offset -1
2019-06-11 15:06:57 | climateprediction.net | Started upload of wah2_sam50_n088_201412_24_814_011846908_0_r734526431_1.zip
2019-06-11 15:06:57 | climateprediction.net | [file_xfer] URL: http://jasmin-upload.cpdn.org/cgi-bin/file_upload_handler
2019-06-11 15:06:57 | climateprediction.net | [fxd] starting upload, upload_offset -1
2019-06-11 15:06:57 | climateprediction.net | Started upload of wah2_sam50_n088_201412_24_814_011846908_0_r734526431_6.zip
2019-06-11 15:06:57 | climateprediction.net | [file_xfer] URL: http://jasmin-upload.cpdn.org/cgi-bin/file_upload_handler
2019-06-11 15:07:00 | climateprediction.net | [file_xfer] http op done; retval 0 (Success)
2019-06-11 15:07:00 | climateprediction.net | [file_xfer] parsing upload response: <data_server_reply> <status>0</status> <file_size>1310720</file_size></data_server_reply>
2019-06-11 15:07:00 | climateprediction.net | [file_xfer] parsing status: 0
2019-06-11 15:07:00 | climateprediction.net | [fxd] starting upload, upload_offset 1310720
2019-06-11 15:07:00 | climateprediction.net | [file_xfer] http op done; retval 0 (Success)
2019-06-11 15:07:00 | climateprediction.net | [file_xfer] parsing upload response: <data_server_reply> <status>0</status> <file_size>2610815</file_size></data_server_reply>
2019-06-11 15:07:00 | climateprediction.net | [file_xfer] parsing status: 0
2019-06-11 15:07:00 | climateprediction.net | [fxd] starting upload, upload_offset 2610815
2019-06-11 15:09:34 | | Project communication failed: attempting access to reference site
2019-06-11 15:09:34 | climateprediction.net | [file_xfer] http op done; retval -184 (transient HTTP error)
2019-06-11 15:09:34 | climateprediction.net | [file_xfer] http op done; retval -184 (transient HTTP error)
2019-06-11 15:09:34 | climateprediction.net | [file_xfer] file transfer status -184 (transient HTTP error)
2019-06-11 15:09:34 | climateprediction.net | Temporarily failed upload of wah2_sam50_n088_201412_24_814_011846908_0_r734526431_1.zip: transient HTTP error
2019-06-11 15:09:34 | climateprediction.net | Backing off 02:21:39 on upload of wah2_sam50_n088_201412_24_814_011846908_0_r734526431_1.zip
2019-06-11 15:09:34 | climateprediction.net | [file_xfer] file transfer status -184 (transient HTTP error)
2019-06-11 15:09:34 | climateprediction.net | Temporarily failed upload of wah2_sam50_n088_201412_24_814_011846908_0_r734526431_6.zip: transient HTTP error
2019-06-11 15:09:34 | climateprediction.net | Backing off 00:08:45 on upload of wah2_sam50_n088_201412_24_814_011846908_0_r734526431_6.zip
2019-06-11 15:09:34 | climateprediction.net | [fxd] starting upload, upload_offset -1 |
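In the log above, each `<file_size>` value returned by the server is immediately followed by an upload starting at the same `upload_offset`. That suggests the client asks the server how many bytes it already holds and resumes from there, so the reported size need not match the full 74 MB file. The sketch below pulls those numbers, and any non-zero transfer status, out of an Event Log dump. It assumes one entry per line in the `timestamp | project | message` shape shown in this thread; the function names `parse_event_log` and `upload_summary` are illustrative only.

```python
import re

# timestamp | project (may be empty) | message
ENTRY = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \|\s*(.*?)\s*\| (.*)$"
)


def parse_event_log(text):
    """Split an Event Log dump into (timestamp, project, message) tuples."""
    entries = []
    for raw in text.splitlines():
        match = ENTRY.match(raw.strip())
        if match:
            entries.append(match.groups())
    return entries


def upload_summary(entries):
    """Collect server-reported sizes, resume offsets, and failed statuses."""
    summary = {"server_sizes": [], "offsets": [], "failures": []}
    for _timestamp, _project, message in entries:
        size = re.search(r"<file_size>(\d+)</file_size>", message)
        offset = re.search(r"upload_offset (-?\d+)", message)
        status = re.search(r"file transfer status (-?\d+) \((.*?)\)", message)
        if size:
            summary["server_sizes"].append(int(size.group(1)))
        if offset:
            summary["offsets"].append(int(offset.group(1)))
        if status and int(status.group(1)) != 0:
            summary["failures"].append((int(status.group(1)), status.group(2)))
    return summary
```

If the non-negative offsets match the server sizes one for one, the resume is behaving as the log describes, and the `-184 (transient HTTP error)` entries are the part to look into.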
Send message Joined: 21 Feb 08 Posts: 47 Credit: 7,929,915 RAC: 0 |
I also tried to downgrade BOINC to 7.12.1 (x64) without success :( |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
I've emailed Andy about this. |
Send message Joined: 21 Feb 08 Posts: 47 Credit: 7,929,915 RAC: 0 |
OK, thank you. I could also remove the project and add it again using https instead. Maybe that's what is causing the problem. I would have to delete the completed work then, though, so I'll wait for the response first. :) |
Send message Joined: 21 Feb 08 Posts: 47 Credit: 7,929,915 RAC: 0 |
In the meantime, I'll try running World Community Grid tasks to check that my proxy server works correctly when uploading results to them. It has worked correctly with ClimatePrediction before, though. |
Send message Joined: 21 Feb 08 Posts: 47 Credit: 7,929,915 RAC: 0 |
WCG worked fine:
2019-06-12 15:03:16 | | Resuming network activity
2019-06-12 15:03:37 | World Community Grid | [fxd] starting upload, upload_offset -1
2019-06-12 15:03:37 | World Community Grid | Started upload of MIP1_00197817_1787_0_r2014039693_0
2019-06-12 15:03:37 | World Community Grid | [file_xfer] URL: https://upload.worldcommunitygrid.org/boinc/wcg_cgi/file_upload_handler
2019-06-12 15:03:38 | World Community Grid | [file_xfer] http op done; retval 0 (Success)
2019-06-12 15:03:38 | World Community Grid | [file_xfer] parsing upload response: <data_server_reply> <status>0</status> <file_size>0</file_size></data_server_reply>
2019-06-12 15:03:38 | World Community Grid | [file_xfer] parsing status: 0
2019-06-12 15:03:38 | World Community Grid | [fxd] starting upload, upload_offset 0
2019-06-12 15:03:39 | World Community Grid | [file_xfer] http op done; retval 0 (Success)
2019-06-12 15:03:39 | World Community Grid | [file_xfer] parsing upload response: <data_server_reply> <status>0</status></data_server_reply>
2019-06-12 15:03:39 | World Community Grid | [file_xfer] parsing status: 0
2019-06-12 15:03:39 | World Community Grid | [file_xfer] file transfer status 0 (Success)
2019-06-12 15:03:39 | World Community Grid | Finished upload of MIP1_00197817_1787_0_r2014039693_0
2019-06-12 15:03:39 | World Community Grid | [file_xfer] Throughput 53790 bytes/sec
2019-06-12 15:03:42 | World Community Grid | Sending scheduler request: To report completed tasks.
2019-06-12 15:03:42 | World Community Grid | Reporting 1 completed tasks
2019-06-12 15:03:42 | World Community Grid | Not requesting tasks: "no new tasks" requested via Manager
2019-06-12 15:03:43 | World Community Grid | Scheduler request completed |
Send message Joined: 2 Apr 05 Posts: 16 Credit: 19,177,656 RAC: 13,684 |
Over 12 GB of uploads are waiting:
12.06.2019 22:34:41 | climateprediction.net | [error] Error reported by file upload server: can't write file wah2_safr50_n2af_199512_13_820_011867777_1_r1455705735_3.zip: No space left on server
So we have lots of new workunits, but the server disks are full!? |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Crashtest: Which of the dozen servers is it? |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Mephist0: Apparently "Jasmin" isn't a single server, it's a data center. They ARE having problems, and our IT people are liaising with their IT people about it. No idea when it will be "fixed". |
Send message Joined: 15 Jan 06 Posts: 637 Credit: 26,751,529 RAC: 653 |
I am in the same boat. It just started in the last hour, apparently. Only one is stuck for me thus far. How do I determine which server?
16178 climateprediction.net 6/12/2019 5:13:24 PM Started upload of wah2_safr50_n13c_201512_13_819_011864198_0_r670590459_5.zip |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
This is in the client_state.xml file. Look for the file name, in the upload section. CAREFULLY. And if you look at the posts below by Mephist0, you can see that it's in one of the BOINC "flags". Probably [file_xfer], as that's at the start of the lines. Event Log options by the look of it. Also, I think that this space problem triggers an email alarm to the project people, so I won't do anything in the middle of their night. |
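For anyone who would rather not scroll through client_state.xml by hand, here is a read-only sketch that looks up the URLs recorded next to a given file name. It makes assumptions: that the file parses as XML, that each file entry has a `<name>` child, and that the upload address sits in a sibling element with "url" in its tag. Check these against your own copy. The function name `find_upload_urls` is made up for illustration, and nothing is written back to the file.

```python
import xml.etree.ElementTree as ET


def find_upload_urls(client_state_path, file_name):
    """Return (tag, url) pairs stored next to the given file name.

    Read-only: the file is parsed and never modified.
    """
    root = ET.parse(client_state_path).getroot()
    urls = []
    for elem in root.iter():
        name = elem.find("name")
        if name is not None and (name.text or "").strip() == file_name:
            # Collect every sibling of <name> whose tag mentions "url".
            for child in elem:
                if "url" in child.tag.lower() and child.text:
                    urls.append((child.tag, child.text.strip()))
    return urls
```

It is safest to point this at a copy of client_state.xml rather than the live file while BOINC is running.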
Send message Joined: 14 Jun 10 Posts: 2 Credit: 6,475,083 RAC: 32,116 |
If you look at the project stats, there hasn't been any successful upload for more than 2 weeks, since 25.06.2019: https://boincstats.com/en/stats/2/project/detail/lastDays |
©2024 cpdn.org