One of our storage units is experiencing slow input/output (I/O) access.

All virtual servers with a disk on this filer are impacted. Our technical staff is working to identify the problem and fix it as soon as possible.

As the incident is on a filer, we recommend that you do not attempt to reboot or restart your server.

 

12:30 CET: incident start

14:10 CET: end of the incident

 

We apologise for any inconvenience.


A maintenance operation will be undertaken on the hosting service today (Thursday 8 December 2011) between 13:00 and 14:00 UTC.  During this time, operations on the hosting platform (stop, start, restart, create, etc.) will be suspended.   This maintenance activity will not impact already running servers.

 

We apologise for any inconvenience.


One of our older-generation storage systems has encountered a fault.  Our teams are on site investigating.  We recommend that customers refrain from issuing start/stop/restart operations if their servers are unresponsive during this incident.  Customer servers will regain access to their disks once the fault has been resolved.


One of our hosting filer units has stopped responding, most likely due to a hardware fault.  We recommend that you do not attempt to restart your server if it is not responding, and that you wait until the incident is resolved.  Our teams are on site investigating and we will keep you informed of developments.

 

Update 11:38 CET (10:38 GMT): The issue involves a control head on our old storage system.  A fault on the disk controller has resulted in an interruption of service.  We are currently recovering the storage volume.

 

Update 12:12 CET (11:12 GMT): We have corrected a kernel bug which, in the event of a hard fault on the controller, will enable us to resolve the situation rapidly.  We will be restarting the service shortly.

 

Update 13:01 CET (12:01 GMT): We have restarted the service on this filer.  We are monitoring the controller for now, and will apply a patch during the afternoon; this will incur a brief outage of the filer, which is not expected to last more than a minute.


We experienced a partial power failure at Equinix PA2 at 10:07 CEST this morning.  The power failure lasted only a few seconds, but had a knock-on effect on some equipment, notably older-generation units with a single power supply, causing them to reboot and lose network connectivity with some backend services.

 

Our teams are working to restore all affected services.


The problem encountered yesterday on a hosting filer has been occurring on another unit since 13:45 (GMT).  We had planned a maintenance window to apply yesterday's patch across all storage units, but due to the urgency of this situation we will be applying the patch immediately to the affected filer, and then to the remaining units as quickly as possible.

 

Once again we apologise for the inconvenience caused.

 

 

14:30 GMT (7:30 AM CET): The patch is installed, and the filer is rebooting


14:38 GMT (7:38 AM CET): The filer has rebooted, we are inspecting the affected servers

 

15:17 GMT (8:17 AM CET): The service has returned to nominal operation and this emergency maintenance has been completed.


In view of the last two incidents, we are going to proceed with an urgent preventative maintenance operation on the platform's other storage hardware. Please do not reboot your servers during the maintenance: after 15 to 20 minutes of I/O loss, your service will return automatically.

Please accept our apologies for any inconvenience this may cause.


We have detected an anomaly on a hosting filer impacting several customer servers.  Our teams are currently working to resolve the issue as soon as possible.  We will update this notice as more information becomes available.

 

14:20 (GMT) / 10:20 (EST): We are still looking for the root cause of the problem before restarting your servers.

 

15:45 (GMT) / 11:45 (EST): Unfortunately, at this stage we have no additional information to relay.  Our entire team is mobilised to identify the cause of the problem and re-establish service as soon as possible.

 

17:00 (GMT) / 12:00 (EST): The attempt to transfer to the backup storage controller did not yield a satisfactory result.

 

18:30 (GMT) / 13:30 (EST): We have identified two or three potential sources of the problem, and our teams are attempting to apply the appropriate kernel patches.  The problem is centred on disk-write operations.  The "bug" appears to be known to Sun, but so far the solution is not.

 

20:30 (GMT) / 15:30 (EST): Still working on the issue.  Some disks now function, but not all of them.  Unfortunately we do not yet have an ETA to communicate, but we know that it will take several more hours. :(

 

20:50 (GMT) / 15:50 (EST): A new kernel is currently being compiled, and we will reboot the filer once it is installed.  (watch this space...)

 

23:00 (GMT) / 18:00 (EST): The new kernel has been compiled and is currently being tested on a staging filer.  Once tested, we will apply it to the broken storage unit.

 

00:00 (GMT) / 19:00 (EST): Victory!  The filer appears to be back and running properly.  We will restart all servers and monitor them to confirm that everything is OK.  A detailed report will be sent tomorrow to all clients involved.  Thank you for your patience.

 

We apologise for the inconvenience.


An incident has occurred on one of our EQX (Paris) storage nodes; our staff is working on it.

We will keep you informed about this issue.

 

Update 14:03 CEST: During this filer maintenance, all hosting operations on virtual servers (start/stop/update resources, etc.) have been temporarily disabled.

Update 15:10 CEST: The filer is now stable, and virtual servers once again have access to their disks on this filer.  Hosting operations that were waiting (start/stop/conf/update, etc.) are now being processed.

Update 16:05 CEST: All hosting operations that were in progress have now finished, and all virtual servers are reachable.


