Following the discovery of an intermittent but serious network issue, Gandi teams have determined that rolling maintenance will be necessary.
We regret that, while most of the affected systems will simply require migration with no interruption in service, some will almost certainly require restarting. We will endeavor to keep these interruptions as short as possible and to perform them only when absolutely necessary.
We are starting with dc0 (Paris) this week. We will proceed on to dc2 (Luxembourg) on Monday, June 9.
The issue has not been detected on dc1 (Baltimore) so far but, if necessary, we will fix it there as well. We apologize for any inconvenience this may cause.
* 08:25 UTC 12 hosting nodes are made inaccessible due to a switch failure. ~200 virtual machines (VMs) are made unreachable.
* 08:40 UTC Switches are recovered and VMs are once again accessible. Investigation does not reveal cause of incident.
* 12:01 UTC A second incident occurs, affecting 8 nodes and ~180 VMs.
* 12:09 UTC Switches are recovered, VMs are made available again. Additional data collection measures are put in place to help determine cause.
* 14:56 UTC A third incident occurs, affecting 10 nodes and 321 VMs.
* 15:10 UTC Nodes and VMs are available again. This time extensive forensic data is collected, and we expect to find the root cause and implement a permanent fix as soon as possible.
We do apologise for the inconvenience this issue may have caused.
We are facing problems reaching the .eu registry servers. The corresponding domain availability search service is not available. Our teams are working on the matter.
Some physical machines hosting IaaS VMs are unreachable due to an issue we are currently analyzing.
We are fixing the issues and restarting the VMs on other physical machines as soon as possible.
Please do not perform any operations on your VM while the emergency maintenance is ongoing.
At the end of the maintenance, if your server is not responding properly, please contact our hosting support team by mail using the 'blocked server' option.
We apologize for any inconvenience this issue may cause you.
UPDATE 12:20: The situation is back to normal. We are still analyzing the metrics and the logs.