Message boards : Number crunching : Upload problems
Author | Message |
---|---|
Joined: 31 Aug 04 Posts: 391 Credit: 219,896,461 RAC: 649 |
Caught one. BOINC client 6.10.58 on Ubuntu (2.6.32-30-generic #59-Ubuntu SMP Tue Mar 1 21:30:46 UTC 2011 x86_64 GNU/Linux). All 12 uploads were corrupted with <url>http://boinc1.coas.oregonstate.edu/cpdn_cgi_main/file_upload_hnndler</url>. I noticed that it happens only on my Intel machines, not AMD, though that is too small a sample to be significant. this wu |
Joined: 31 Aug 04 Posts: 391 Credit: 219,896,461 RAC: 649 |
Gaaaaargh -- This is driving me nuts! I looked again at the BOINC source code but can't make head nor tail of it. ONE lousy character in the XML gets changed, sometimes. On some machines. With some models, not others. Sometimes. Only Linux. Maybe only Intel. But not always. Some models are susceptible, but there's no way to figure out which. And now when I want to do another test, my only other Core 2 snarfed a hadcm3n while I wasn't looking -- good -- need to do those, but -- So -- this is one weird problem -- What to do? |
Joined: 31 Aug 04 Posts: 391 Credit: 219,896,461 RAC: 649 |
OK trying SIMAP and Einstein |
Joined: 12 Sep 04 Posts: 34 Credit: 1,017,702 RAC: 0 |
My feeling is to agree that it is not merely a PNW issue. Welcome Jonathan! From what I can gather, the problem seems particularly severe in PNW tasks for Linux. Would it be possible to disable the distribution of these work units to Linux clients until resolved? I have tried editing the client_state.xml file to no avail. Does the client make any contact with the URL http://boinc1.coas.oregonstate.edu/cpdn_cgi_main/ at any stage prior to upload? Could it be an error emanating from them? Warped |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
|
Joined: 5 Aug 04 Posts: 127 Credit: 24,498,085 RAC: 21,454 |
"From what I can gather, the problem seems particularly severe in PNW tasks for Linux. Would it be possible to disable the distribution of these work units to Linux clients until resolved?" This could fairly easily be done by making a customized plan class, but with the current less-than-optimal staffing situation I wouldn't expect this to happen at this point. "I have tried editing the client_state.xml file to no avail. Does the client make any contact with the URL http://boinc1.coas.oregonstate.edu/cpdn_cgi_main/ at any stage prior to upload? Could it be an error emanating from them?" The upload server is only accessed when the client tries to upload a file, not before. Since the corruption is already present when the task is first downloaded to the client, it has nothing to do with connections to the upload server. As for editing client_state.xml, make sure you've completely exited the BOINC client before trying to edit the file; this includes exiting any BOINC service, or whatever it's called under Linux. If not, the info in client_state.xml will just be overwritten with new info as BOINC runs. If you have exited BOINC, then edited client_state.xml, and the wrong URL somehow gets re-created on the next start of BOINC, this would be very interesting, since it's much easier to test things that happen on a simple restart of the client than something that only happens on the first download of a task... |
Joined: 17 Nov 07 Posts: 142 Credit: 4,271,370 RAC: 0 |
The problem does occur with PNWs, but it's maddeningly intermittent. I got 6 PNWs on the 22nd. I have "grepped" my client_state.xml and there are no misspellings of file_upload_handler. I have set up a cron job to regularly check client_state.xml for misspellings of "file_upload_handler". Details: all 6 of my PNWs are re-dispatched tasks (after the first or second client returned an error). One of them has started processing, another should start in 3 days, then 3 more the day after that. Core i7-2600, Arch Linux 64 bit, my own glibc 2.13 compiled with -O2 -march=native -m32. |
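A check like the one described above could be sketched roughly as follows. This is not the poster's actual cron job, only a minimal illustration: the function names `find_misspelled_handlers` and `check_file` and the 0.8 similarity threshold are assumptions, and the sketch flags any URL whose last path segment looks like, but is not exactly, `file_upload_handler`.

```python
import re
from difflib import SequenceMatcher

EXPECTED = "file_upload_handler"

def find_misspelled_handlers(state_text, threshold=0.8):
    """Return URLs whose last path segment resembles, but is not exactly, EXPECTED."""
    suspects = []
    for url in re.findall(r'https?://[^\s<>"]+', state_text):
        last = url.rstrip("/").rsplit("/", 1)[-1]
        if last == EXPECTED:
            continue  # correctly spelled, nothing to report
        # A near-miss such as "file_upload_hnndler" scores well above the threshold.
        if SequenceMatcher(None, last, EXPECTED).ratio() >= threshold:
            suspects.append(url)
    return suspects

def check_file(path):
    """Scan one file (e.g. client_state.xml; its location varies by install)."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return find_misspelled_handlers(fh.read())
```

A cron-driven wrapper would call `check_file` with the path to client_state.xml and send a notification whenever the returned list is non-empty.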
Joined: 12 Sep 04 Posts: 34 Credit: 1,017,702 RAC: 0 |
"As for editing client_state.xml, make sure you've completely exited BOINC-client before trying to edit the file, this includes exiting any BOINC-service or whatever it's called under Linux, if not the info in client_state.xml will just be overwritten with new info as BOINC runs." Hi Ingleside. For security reasons I run BOINC in a user account, not as root. It is only possible to edit the client_state.xml file when logged in as root. I tried everything I could think of, including rebooting and going straight to the root account in order to edit the file, making sure that BOINC was not running at all. I also edited the client_state_prev.xml file. I am sure that I was able to correct all instances of the misspelled "hnndler" in both files. Hence my suspicion that the file is somehow updated from the website in Oregon. I aborted the task and have changed my preferences to run only the Southern Africa tasks, which happen to be the only type now available. This is running fine, except that the graphics file seems to have been corrupted on download. I can see that the model has not turned to an ice world but that is all (no timestep or other information). This is not a major issue as I can view the checkpoint progress via "Properties" in BOINC. |
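The manual correction described here could also be scripted. The sketch below is hypothetical, not a tested fix: it assumes the corruption is always a single substituted character in the final path segment (as with "hnndler"), the function names are made up for illustration, and it only rewrites text handed to it. It would have to be applied to client_state.xml and client_state_prev.xml with BOINC fully stopped, and it does not answer the open question of whether the bad URL comes back afterwards.

```python
import re

EXPECTED = "file_upload_handler"

def differs_by_one_char(word, expected=EXPECTED):
    """True if word has the same length as expected and exactly one differing character."""
    if len(word) != len(expected):
        return False
    return sum(a != b for a, b in zip(word, expected)) == 1

def repair_upload_urls(state_text):
    """Return (repaired_text, number_of_fixes) for one-character corruptions of EXPECTED."""
    fixes = 0

    def fix(match):
        nonlocal fixes
        prefix, last = match.group(1), match.group(2)
        if differs_by_one_char(last):
            fixes += 1
            return prefix + EXPECTED
        return match.group(0)  # leave every other URL untouched

    # Split each URL into everything up to the last "/" and the final path segment.
    pattern = r'(https?://[^\s<>"]*/)([^\s<>"/]+)'
    return re.sub(pattern, fix, state_text), fixes
```

Reporting the number of fixes makes it easy to confirm, after a restart of the client, whether any corrupted entries have reappeared.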
Joined: 12 Feb 08 Posts: 66 Credit: 4,877,652 RAC: 0 |
Did you make sure that the BOINC client was not running? The command "sudo /etc/init.d/boinc-client status" tells you whether the client is running or not. You can stop or start it by replacing "status" with "stop" or "start". This works in Ubuntu and should be something similar in other flavours of Linux. |
Joined: 23 Oct 09 Posts: 1 Credit: 1,171,675 RAC: 0 |
I am having this problem on Windows 7 with BOINC 6.10.60 (and the previous version). There are 14 data sets waiting to upload. Typical messages: 4/30/2011 5:50:55 AM climateprediction.net Started upload of hadam3p_pnw_yx88_1967_1_006898128_0_4.zip 4/30/2011 5:52:14 AM Project communication failed: attempting access to reference site 4/30/2011 5:52:14 AM climateprediction.net Temporarily failed upload of hadam3p_pnw_yx88_1967_1_006898128_0_4.zip: HTTP error 4/30/2011 5:52:14 AM climateprediction.net Backing off 14 min 38 sec on upload of hadam3p_pnw_yx88_1967_1_006898128_0_4.zip 4/30/2011 5:52:15 AM Internet access OK - project servers may be temporarily down. 4/30/2011 5:57:24 AM Project communication failed: attempting access to reference site 4/30/2011 5:57:24 AM climateprediction.net Temporarily failed upload of hadam3p_pnw_yx88_1967_1_006898128_0_3.zip: HTTP error 4/30/2011 5:57:24 AM climateprediction.net Backing off 1 hr 47 min 33 sec on upload of hadam3p_pnw_yx88_1967_1_006898128_0_3.zip 4/30/2011 5:57:25 AM Internet access OK - project servers may be temporarily down. 4/30/2011 6:18:12 AM climateprediction.net Sending scheduler request: To send trickle-up message. This has been going on since the database upgrade started. Lewis Shadoff, Ph.D. Lake Jackson, TX |
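With 14 files queued, it can help to summarise which uploads are backing off and for how long. The following is a minimal sketch that assumes the message wording is exactly as quoted in the post above ("Backing off ... on upload of ..."); the function name `backoff_seconds` is hypothetical, and other BOINC versions may phrase these messages differently.

```python
import re

# Matches e.g. "Backing off 14 min 38 sec on upload of NAME"
# and "Backing off 1 hr 47 min 33 sec on upload of NAME".
BACKOFF = re.compile(
    r"Backing off (?:(\d+) hr )?(?:(\d+) min )?(?:(\d+) sec )?on upload of (\S+)"
)

def backoff_seconds(log_text):
    """Map each upload file name to its most recent back-off, in seconds."""
    result = {}
    for hr, mins, sec, name in BACKOFF.findall(log_text):
        # Missing units come back as empty strings and count as zero.
        result[name] = int(hr or 0) * 3600 + int(mins or 0) * 60 + int(sec or 0)
    return result
```

Feeding it the message log text gives one entry per stalled file, with later messages for the same file overwriting earlier ones.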
Joined: 5 Aug 04 Posts: 1283 Credit: 15,824,334 RAC: 0 |
That sounds very much like the problem I described (and gave a workaround for) here, Lewis. "The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer |
Joined: 17 Nov 07 Posts: 142 Credit: 4,271,370 RAC: 0 |
Alternatively, it could be a virus scanner problem as described in this thread. |
©2024 cpdn.org