Message boards : Number crunching : Uploads not working
Author | Message |
---|---|
Send message Joined: 14 Oct 05 Posts: 44 Credit: 4,011,789 RAC: 8,782 |
Getting the following: 8/31/2012 12:31:24 AM | climateprediction.net | Started upload of hadam3p_eu_97xa_1969_1_008158384_0_1.zip Checking the server status page, one of the upload servers shows as not running, the other 2 upload servers are up. Things are trickling, but data can't upload.... |
Send message Joined: 31 Aug 04 Posts: 391 Credit: 219,896,461 RAC: 649 |
Getting similar errors but not many waiting uploads so far. Server status page shows "uploader1.atm" as down. Staff probably aware already - it's already after 9AM in the prime time zone. |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,703,308 RAC: 9,860 |
Information from staff: The hard disk running the operating system on uploader1.atm has failed and needs to be replaced. We have ordered a new disk which will arrive on Monday and be installed on that day. So at the moment this machine is shut down and won't be up-and-running until Monday, I am afraid. That will affect, at least, the intermediate (_1 to _12) file uploads for EU regional models, possibly others too. |
Send message Joined: 31 Aug 04 Posts: 391 Credit: 219,896,461 RAC: 649 |
"The hard disk running the Operating System" WTF? This is one of the looniest postings I've ever seen here. Any serious server installation has at least a mirror of the OS for backup or alternative boot and OS on whatever of several physical drives -- whether IBM mainframe or my local mini-cluster or the cloud we are all expected to trust, or a lousy backup boot partition on Linux. "The hard disk running the OS" what could that hard disk possibly be? Are we trusting all this compute power to the power of the "C: drive" And how would replacing the bare disk fix the loss of the OS -- Sorry for the rant, but the explanation makes no sense whatsoever at all - and makes the support team there look like total idiots - which I know they are not. Yes - the compete explanation would cover a lot of techie stuff that would bore most of us to tears -- but the nonsensical explanation posted is -so -dumb. Me -- sometimes the project has problems - as far as I can see the problems get fixed within a week -- no data ever lost. Last 6 years or so. I keep on contributing -- no regrets. But "need an OS disk to keep running" - Sorry about that but is so idiotic -- could have been a totally uninformed politician posting that. Please don't BS us who contribute. Maybe - "the team waits for hardware to fix the problem" might be plausible -- "Need an OS disk" obviously makes fools of us all. In any case- keep on crunching - the crew have done wonders - and keep on doing so -- \ But - nonsensical pretend explanations of problems are losers in the long run. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,009,815 RAC: 21,293 |
Assuming a RAID system, if one of the hard disks had failed, it might well have been shut down as a precaution. If the second disk in a 2-disk RAID system also went, that might cause data loss, so they would be awaiting a new disk to rebuild the array. I have never used RAID, just been rather paranoid about backing up important stuff, so this is purely based on my reading, not experience lol. Dave |
Send message Joined: 30 Jan 12 Posts: 38 Credit: 10,197,388 RAC: 0 |
Now the South African download server is down, why doesn't that surprise me? The techs at Oxford could care less about this project. The whole world's watching them, I hope they never put it on their résumé. |
Send message Joined: 31 Aug 04 Posts: 391 Credit: 219,896,461 RAC: 649 |
Actually, I believe that the techs on this project are doing a very good job. The limited funding for the research puts them in a position where they can't have what most of us "techies" just assume is normal. They have to do the best they can with what they've got, and that's not a lot. Mirrored drives for the OS - we see that's not true. Spare disc drives just laying around or online already waiting for a problem - obviously not so. Redundant SAN with no SPOF anywhere and automatic failover to a backup system - at least a year or two worth of storage waiting on-line already? Don't think so. Maintenance contract with (big database company that will fix any problems in 24 hours provided that you have enough spare backup hardware pre-certified?) Heh- all that could be fixed with less than 25 million euros - rough guess. Maybe 50. (not counting the service contracts with the vendors) The tech support at the project are supporting - not only the hardware - but more important and invisible to us volunteers - they are supporting the access to the work we have done - the database - for researchers worldwide. Understaffed, overworked, with more job demands than anything I ever did as a techie. (Hardware, software, database, application expertise - that would be at least 8 FTEs at even the cheapest shop I ever worked in) My earlier rant about the ongoing problems with servers should be interpreted as me venting my frustration with the whole situation - NOT as an accusation of the understaffed and underfunded crew. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,009,815 RAC: 21,293 |
Totally agree Erik! Two Techies there to do the job. If they had your estimate of eight and they were the same quality as those they have and those eight had the money to buy the hardware they wanted ........ I don't think we would see many of the problems we do....... Or maybe they would just try and do 4 times as much, succeed and still get as many complaints? Dave |
Send message Joined: 31 Aug 04 Posts: 391 Credit: 219,896,461 RAC: 649 |
Uploads are working slowly - expect they will catch up in the next 3-4 hours. Thx Dave - yeah, as a volunteer here for a few years, the temporary failures of hardware are annoying but no big deal - wait a few days or a week at worst and all the work gets uploaded and distributed eventually. Nothing ever lost. Once happened that a misconfig and load of crap wu's got my goat by wasting my limited bandwidth, that was a while ago. Main point is - most contributors never notice a week's downtime on the upload server. Last time I looked the "top -- whatever" - computers - they were wasting wu's a mile a minute - So - thanks - let's keep the osmolality of the effluent minimal when we post here, and keep on crunching -- it's worth doing. Apologize for any flaming I've done. And - to all - complain, bitch and worry -- if there's ever a problem -- it might be an old moldy problem - but it might be a new problem - and reporting such a problem might very well save all of us volunteers a lot of wasted effort - So - If you read this board - all complaints are welcome!! :):) - the Mods welcome the chance to help with all problems!! :): Actually, they do help a lot -- thanks PS - I am not a MOD, never will be, but thanks to them all |
Send message Joined: 20 Dec 04 Posts: 6 Credit: 4,055,041 RAC: 0 |
7 Sept 2012, 05:36 UTC. Upload disk full error message started to appear 6 Sept 2012 at 22:23 UTC. Server status page indicates server is up and running. Just thought you would like to know. |
Send message Joined: 31 Aug 04 Posts: 391 Credit: 219,896,461 RAC: 649 |
Thanks. Confirming what you reported. Same here. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,009,815 RAC: 21,293 |
I am getting the same on an eu model. The saf model, which goes to a different server, is fine. They should be starting work about now in Oxford, so I assume we will see some action this morning. Dave |
Send message Joined: 24 Jan 06 Posts: 5 Credit: 435,756 RAC: 0 |
Confirming that I am also getting upload failures repeatedly. In itself that does not worry me, but it does chew through my upload quota at a great rate. Is there any way to disable the upload for a while? (This is a completed task, and I have other tasks running, so I do not want to just disable network traffic.) |
Send message Joined: 31 Aug 04 Posts: 391 Credit: 219,896,461 RAC: 649 |
Confirming that I am also getting upload failures repeatedly. In itself that does not worry me, but it does chew through my upload quota at a great rate. Is there any way to disable the upload for a while? (This is a completed task, and I have other tasks running, so I do not want to just disable network traffic.) You could "disable network activity" on one of the tabs in the manager -- BUT -- seems that uploads are working again, so try that option later. OH gorgonzola and other cheeses -- so overwhelmed with backlog uploads now -- just wait a few hours. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,009,815 RAC: 21,293 |
Just to confirm that an eu zip file went through at 10:54 on one machine and two more have gone through since so issue seems resolved apart from my curiosity - in the past when the disk has filled up it has taken several hours to transfer the data before the disk has come back on line again. Seems suspiciously quick for it to have really filled up. Dave |
Send message Joined: 3 Oct 06 Posts: 43 Credit: 8,017,057 RAC: 0 |
Could redirecting the URL for the upload handler in the hosts file to, say, 127.0.0.0 be an option? |
Send message Joined: 31 Aug 04 Posts: 391 Credit: 219,896,461 RAC: 649 |
Problems with uploader1 both up and down. Friday of course. |
Send message Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
Problems with uploader1 both up and down. Friday of course. I let the project people know, but like you say, it's Friday. Hopefully it'll get fixed early next week. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,009,815 RAC: 21,293 |
My three waiting uploads have all gone; however, the server keeps going back to red every so often on the server status page. Dave. |
Send message Joined: 31 Aug 04 Posts: 391 Credit: 219,896,461 RAC: 649 |
Yup - the server goes on and off. Has uploaded a few dozen files from here. All I worry about is whether the uploads get lost - however many days it takes to get the job done is not a problem. Losing data is the possible problem - but that has never happened as far as I know - long delays happen when server is catching up. I run 6 machines - right now 3 have network disabled - the other 3 are uploading slowly from time to time. Won't enable network for the other 3 until the online ones clear their queues. Might be a while. The important thing is not to lose the uploads. Patience is a virtue. |
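
A note on the hosts-file question raised earlier in the thread: a hosts file maps hostnames rather than full URLs, and the conventional loopback address is 127.0.0.1. Below is a minimal Python sketch of building such an entry. The hostname `uploader.example.org` is a placeholder, not the project's real upload host, and the helper name `hosts_entry` is made up for illustration. Whether the BOINC client then fails quickly enough to save upload quota is an untested assumption.

```python
import ipaddress


def hosts_entry(hostname: str, address: str = "127.0.0.1") -> str:
    """Return one hosts-file line mapping hostname to a loopback address."""
    ip = ipaddress.ip_address(address)  # raises ValueError on malformed input
    if not ip.is_loopback:
        raise ValueError(f"{address} is not a loopback address")
    # A hosts file takes bare hostnames, so reject URLs and embedded whitespace.
    if not hostname or "/" in hostname or any(ch.isspace() for ch in hostname):
        raise ValueError("expected a bare hostname, not a URL")
    return f"{ip}\t{hostname}"


if __name__ == "__main__":
    # Placeholder hostname for illustration only.
    print(hosts_entry("uploader.example.org"))
```

The resulting line would be appended to the system hosts file (for example `/etc/hosts` on Linux or `C:\Windows\System32\drivers\etc\hosts` on Windows) and removed again once uploads recover.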
©2024 cpdn.org