Message boards : Number crunching : Upload problem
Author | Message |
---|---|
Send message Joined: 6 Apr 05 Posts: 17 Credit: 744,057 RAC: 0 |
| What bugs me about it is that I live 15 minutes away from Oregon State. Well, now I have to apologize to the folks at OSU. It seems that it's not their upload server that's having the problem:
Sat Sep 11 20:50:13 2010 climateprediction.net [file_xfer_debug] URL: http://climateapps1.oucs.ox.ac.uk/cgi-bin/file_upload_handler
Sat Sep 11 20:50:14 2010 [http_debug] [ID#22] Info: timeout on name lookup is not supported
Sat Sep 11 20:50:14 2010 [http_debug] [ID#22] Info: About to connect() to climateapps1.oucs.ox.ac.uk port 80 (#0)
Sat Sep 11 20:50:14 2010 [http_debug] [ID#22] Info: Trying 163.1.13.16...
Sat Sep 11 20:50:21 2010 [http_debug] [ID#22] Info: Connection refused
Sat Sep 11 20:50:21 2010 [http_debug] [ID#22] Info: Failed connect to climateapps1.oucs.ox.ac.uk:80; No such file or directory
Sat Sep 11 20:50:21 2010 [http_debug] [ID#22] Info: Expire cleared
Sat Sep 11 20:50:21 2010 [http_debug] [ID#22] Info: Closing connection #0
Sat Sep 11 20:50:21 2010 [http_debug] HTTP error: Couldn't connect to server
=Mike |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
The last zip file, (13), which is created about 10 minutes after the last zip that goes to OSU, goes to Oxford, as it contains the data to join it to the next in the series for that parameter set. |
Send message Joined: 6 Apr 05 Posts: 17 Credit: 744,057 RAC: 0 |
Thanks ... good to know. Also, I saw the post about the server. BTW: Who authorized giving Milo weekends off? :D =Mike |
Send message Joined: 2 Mar 06 Posts: 253 Credit: 363,646 RAC: 0 |
I've now put a small NAS unit in the server room where climateapps1 is stored and I'm slowly copying data to it. This is not an ideal solution but it is small and cheap so I was actually able to get hold of it in a matter of days rather than months. Hopefully the server can be re-started later today. |
Send message Joined: 29 Dec 09 Posts: 34 Credit: 18,395,130 RAC: 0 |
Uploads are working well but Boinc Manager has not updated credits. I'm not very concerned about it but I don't recall seeing this with all CPDN servers running. Is this part of the current issue? |
Send message Joined: 12 Nov 09 Posts: 5 Credit: 6,176 RAC: 0 |
For several weeks I have been unable to upload my results. Here are the messages I regularly receive:
21/09/2010 21:32:53 climateprediction.net Started upload of famous_u0y8_599_200_006634083_0_17.zip
21/09/2010 21:34:23 Project communication failed: attempting access to reference site
21/09/2010 21:34:23 climateprediction.net Temporarily failed upload of famous_u0y8_599_200_006634083_0_14.zip: HTTP error
21/09/2010 21:34:23 climateprediction.net Backing off 1 hr 34 min 3 sec on upload of famous_u0y8_599_200_006634083_0_14.zip
21/09/2010 21:34:26 Internet access OK - project servers may be temporarily down.
21/09/2010 21:34:43 Project communication failed: attempting access to reference site
21/09/2010 21:34:43 climateprediction.net Temporarily failed upload of famous_u0y8_599_200_006634083_0_17.zip: HTTP error
21/09/2010 21:34:45 Internet access OK - project servers may be temporarily down.
|
Send message Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
Credits are not related to uploads (except that if you can't upload your trickles you won't receive credit for them until you do!). There's a script that runs once a day and puts our credit into our accounts. Another script, which also runs once a day though I think at a different time, exports a record of our credit to the external stats sites like BoincStats. Occasionally these scripts have to be disabled because other jobs are being done on the server, or someone turns a script off then forgets to reenable it. The data will all be there, though, not lost. I don't think Milo has access to all this at the moment so we may just have to be patient. Cpdn news |
Send message Joined: 23 Apr 05 Posts: 1 Credit: 398,113 RAC: 0 |
Hi, I do have upload-problems here (for some days now):
27.09.2010 07:11:01 climateprediction.net Started upload of hadam3p_pnw_v3bl_1993_1_006722916_0_13.zip
27.09.2010 07:33:07 Project communication failed: attempting access to reference site
27.09.2010 07:33:07 climateprediction.net Temporarily failed upload of hadam3p_pnw_v3bl_1993_1_006722916_0_13.zip: HTTP error
27.09.2010 07:33:07 climateprediction.net Backing off 3 hr 54 min 30 sec on upload of hadam3p_pnw_v3bl_1993_1_006722916_0_13.zip
27.09.2010 07:33:09 Internet access OK - project servers may be temporarily down.
Trickles from other WUs running on my machine were uploaded without any problem. Does somebody have an idea how to solve it? |
Send message Joined: 4 Nov 06 Posts: 10 Credit: 1,717,441 RAC: 0 |
I thought it better to post here rather than start another "upload trouble" thread... I am having the same problems as legolas; I have 10 zip files sitting here that have been trying to upload for around a fortnight now. I stopped the client, created a cc_config file, restarted and read the config file for the HTTP option but it had no effect. I have also rebooted this machine. Here are the models, versions and file numbers (3 different WUs): All models are Famous 6.11 files _5, _12 (x2), _13 (x2), _14 (x2), _15 (x2), _16, _20. 2 of these Famous models are completed (1 says it is ready to report, 1 says it is uploading) and one "computation errored" out at 160 odd hours. Another model is almost complete. Any idea how I can get the work/files up to the servers, as CPDN is one of my fave projects and I don't want to stop crunching it. Thanks Veebee |
Send message Joined: 31 Dec 07 Posts: 1152 Credit: 22,363,583 RAC: 5,022 |
Dear Veebee: Check the server status page. The server has been down for the past 3 days. The scheduler, transitioner and feeder are all not running, so you cannot report completed tasks or get new ones. Read more in the "News and Announcements" thread at the top of Number Crunching. |
Send message Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
The completed model can't report until climateapps2 has all (or more) of its programs up and running, but both these FAMOUS v.11 models should be able to upload all their files in spite of the outage. All these files should upload to kraken which has had no recent outages. Veebee, I think this must be a problem at your end, not with the server. The timeout on file uploads was lengthened to 90 days so there's no rush from that point of view but I think each file is only allowed 100 upload attempts. Don't keep repeating manual retries (the Retry Now button) until someone can suggest more ideas. Cpdn news |
Send message Joined: 4 Nov 06 Posts: 10 Credit: 1,717,441 RAC: 0 |
Quote from mo.v: Veebee, I think this must be a problem at your end, not with the server. end quote. I don't know ... I have two "identical" machines (i7-920's) and they are both crunching and up/downloading other projects without issue. THIS machine is one I downloaded a few extra Climate models onto to cover work shortages on a chosen project; the other only has the one model running but hasn't (as yet) had a zip file sit there unable to upload. (mind you, that one is a HADSM3 slab model - just had a "close look", nearly 1200 hours so far !!! :O ) I shall avoid manually retrying uploads on them, but I am having that sinking feeling that all that CPU time is gonna be wasted .. :`( BTW: two of the zip files do say they are at 100% uploaded and a few of the others get to a certain point and stop... |
Send message Joined: 5 Aug 04 Posts: 1283 Credit: 15,824,334 RAC: 0 |
Veebee wrote: Any idea how I can get the work/ files up to the servers as CPDN is one of my fave projects and I dont want to stop crunching it. If BOINC is making simultaneous attempts to upload the files it's possible that you're hitting a 5 minute inactivity timeout on the files. That's most likely if you have a relatively slow connection, are restricting BOINC's upload bandwidth or have a busy connection (e.g. more than one computer attempting a large upload at the same time or a large non-BOINC file transfer on the same computer). I've found that when BOINC is doing multiple uploads it has a tendency to favour the most recently started upload. That can result in nothing being sent for uploads which are already in progress until the more recent upload has completed. If an upload is locked out in this way for longer than 5 minutes it is timed out and has to be restarted. The restart offset is negotiated with the server but very frequently the server seems to have lost track of how much has already been received (possibly something is causing it to delete the data it has already received?) and restarts from 0. The only way I've found of getting round this is to restrict the number of simultaneous uploads (the default is 8 in total and no more than 2 per project) by including the following in cc_config.xml: <cc_config> Depending on your mix of projects you might need to increase <max_file_xfers> (setting it to 1 is possible, but that would prevent other projects from uploading results until all of your CPDN files have been sent). "The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer |
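The transfer limits described above are set in the options section of cc_config.xml. A minimal sketch, assuming illustrative values of 2 transfers in total and 1 per project (these numbers are an assumption chosen for the example, not necessarily the ones the poster used):

```xml
<cc_config>
  <options>
    <!-- Assumed value: total simultaneous file transfers across all projects (default is 8) -->
    <max_file_xfers>2</max_file_xfers>
    <!-- Assumed value: simultaneous file transfers per project (default is 2) -->
    <max_file_xfers_per_project>1</max_file_xfers_per_project>
  </options>
</cc_config>
```

Save the file in the BOINC data directory and select "Read config file" in BOINC Manager; if the new limits don't seem to take effect, restarting the client is a safe fallback.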
Send message Joined: 5 Aug 04 Posts: 127 Credit: 24,534,575 RAC: 13,216 |
mo.v wrote: The timeout on file uploads was lengthened to 90 days so there's no rush from that point of view but I think each file is only allowed 100 upload attempts. Don't keep repeating manual retries (the Retry Now button) until someone can suggest more ideas. End quote. I'm not aware of any limit on the number of retries, and a quick test shows that manually increasing the count to 11000 connection attempts had no effect: the upload just kept retrying as before. Worth remembering, since many CPDN users still seem to use old BOINC clients, is that the increase to 90 days applies only to v6.10.xx and later clients. As for checking for connection problems, the first step is always to reboot any modems, routers and so on, and to reboot the affected computer. If this doesn't work, try creating or editing a cc_config.xml (placed in the BOINC data directory) containing at least the following lines: <cc_config> <log_flags> <file_xfer_debug>1</file_xfer_debug> <http_xfer_debug>1</http_xfer_debug> </log_flags> </cc_config> Then just select "Read config file" in BOINC Manager. Keeping <file_xfer_debug> always enabled is an advantage, since you'll always get info about which upload server the client is trying to connect to, making it easy to check on the server status page whether that server is down, and you don't need to search through client_state.xml manually to get this info. The transfer speed will also be logged if the transfer was successful. The second option, on the other hand, generates a lot of extra output, so disabling it again after fixing the problem is recommended. To disable it, just change the 1 to a 0 and re-read the config file. A couple of other <log_flags> that may also be useful are: <http_debug>1</http_debug> <proxy_debug>1</proxy_debug> Edit: I see Gundolf Jahn also mentioned some of the log flags earlier in the thread. |
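Laid out as a file, the log flags named in this post could look like the sketch below. It combines the two main flags with the two optional ones into a single log_flags section; turning all four on at once is an assumption made for illustration, and the noisier ones can be set back to 0 once the problem is found.

```xml
<cc_config>
  <log_flags>
    <!-- Logs which upload/download server URL is used, plus transfer speed on success -->
    <file_xfer_debug>1</file_xfer_debug>
    <!-- Verbose; set back to 0 once the problem is fixed -->
    <http_xfer_debug>1</http_xfer_debug>
    <!-- Optional extra flags mentioned in the post -->
    <http_debug>1</http_debug>
    <proxy_debug>1</proxy_debug>
  </log_flags>
</cc_config>
```

If the file already has an options section, log_flags sits alongside it inside the same cc_config root element rather than in a second file.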
Send message Joined: 16 Jan 10 Posts: 1084 Credit: 7,860,147 RAC: 4,891 |
[Veebee wrote:] ...(mind you, that one is a HADSM3 slab model - just had a "close look" nearly 1200 hours so far !!! :O ) [end quote] That model, hadsm3dhet2_k8ob_006620893_7, has become a slow-processing 'iceworld'. Painful though it might be at this stage, the best thing to do with that model is to abort it: it will finish eventually, but the data from the freeze point onwards is invalid. Some efforts have been made to find the cause, which is so far proving elusive. |
Send message Joined: 29 Dec 09 Posts: 34 Credit: 18,395,130 RAC: 0 |
I read Milo's announcement but I have still been getting failed downloads for the past several days. Everything else is OK.
Started download of atmos_v3xe_1199_200_006736082_0.gz
Project communication failed: attempting access to reference site
Temporarily failed download of atmos_v3xe_1199_200_006736082_0.gz: HTTP error
Thanks |
Send message Joined: 5 Aug 04 Posts: 1496 Credit: 95,522,203 RAC: 0 |
Don't know what's wrong, but I had two files remaining, partially downloaded. After a day or two, I realized that each BOINC attempt downloaded a small bite of bytes. Because the remaining files were relatively small, already partially downloaded, and didn't restart from zero each time, I decided to click 'Retry Now'... and click... and click.... Eventually, the downloads finished. (Pathetic way to get the job done, actually.) Why the server permitted that bit of foolishness, when it refused to complete the transaction on its own, one can only guess. "We have met the enemy and he is us." -- Pogo Greetings from coastal Washington state, the scenic US Pacific Northwest. |
Send message Joined: 29 Dec 09 Posts: 34 Credit: 18,395,130 RAC: 0 |
Quote: I decided to click 'Retry Now'... and click... and click.... Eventually, the downloads finished. (Pathetic way to get the job done, actually.) End quote. Thanks AstroWX. This would confirm it is a problem with the CPDN download server. I also clicked several times, but I had little confidence that the downloaded files were not corrupted, and I did not want to risk wasting computing time. So I regrettably aborted them and am holding off on downloads until this issue is resolved or there is confirmation that the models will behave properly to the end. |
Send message Joined: 2 Mar 06 Posts: 253 Credit: 363,646 RAC: 0 |
I know that there's a problem with downloads from climateapps2 and I am working as best I can to fix it. I cannot say how many days it will take to fix as it depends upon many factors. |
Send message Joined: 2 Mar 06 Posts: 253 Credit: 363,646 RAC: 0 |
Hiro was able to extract a disk from his cluster, which I have used to add more space to climateapps2. So, there should now be no problem with downloads. |