We will be performing maintenance on our webmail during the afternoon and evening of December 17 to update the Roundcube webmail version for all email users.

 

Between 15:00 and 19:00 Pacific time, webmail will not be accessible. Email delivery and email client access will continue normally.

 

We apologize for the inconvenience, and look forward to bringing you an updated version of our webmail client.

Update 2013-12-17 19:15 PST: The maintenance is complete.


An incident is currently underway on our Simple Hosting platform (Paris datacenter only).
 
The cause of the incident is not yet clear; we are investigating. Please do not launch any operations on your instance for the moment.
 
Updates will be posted here as soon as we have more information.
 

Update Tue Dec 10 21:37:01 CET 2013: This issue has been resolved. Please accept our apologies for the brief period of unavailability.


Simple Hosting instances in our Baltimore data center (and only that data center) may currently be experiencing issues. Our technical staff is investigating. Please do not perform any operations on your instance in the meantime.

This post will be updated as the situation evolves.

Update 00:51:20 CET:

A member of our technical staff is currently onsite in Baltimore to address the problem.

Update 01:35:13 CET:

The issue has been resolved. Services should now be operating normally.



The incident of November 11th is part of a series of incidents over the past few weeks caused by the gateway units, which provide Internet access for the Simple Hosting instances.
The Simple Hosting platform has experienced a number of different issues, principally with the gateway equipment, which seems to be the weakest link in the architecture. It is subject to:
  • HSRP instability causing short interruptions in connectivity,
  • Saturation of NAT translation tables as a result of a number of factors, including DDoS attacks and customer activity,
  • High CPU usage under certain conditions.
What will Gandi do to fix the situation, replace this gateway and improve the Simple Hosting product?
  • Replace the network equipment which provides the gateway to the Internet for the Simple Hosting product with more powerful appliances, and a greater number of units (scaling). The new units will better handle the current load and will support the growth of Simple Hosting instances in the near future,
  • Set up a deeper level of monitoring to better detect technical problems,
  • Implement advanced monitoring to detect abuse from specific instances and enable quicker reaction from our technical team for handling these abuses before they impact the quality of services for all other customers.
We apologise for the inconvenience. Please be assured that our teams are endeavouring to correct these issues in the shortest possible time.

We experienced a hardware fault on routing equipment on the Simple Hosting platform.
Below is a chronology of the various events:
- 20:06 UTC : CPU load on the equipment shows a significant increase.
- 20:06 UTC : Equipment is running at 100% CPU for no apparent reason, and fails to respond to commands.
- 20:08 UTC : We make the decision to migrate to secondary equipment.
- 20:08 UTC : The secondary equipment exhibits the same symptoms as the primary, so traffic is not transferred.
- 20:09 UTC : Debugging underway to ascertain the cause of the problem.
- 20:26 UTC : Migration to the now-stabilised secondary equipment.
- 20:27 UTC : Service returned to nominal operation.
- 22:42 UTC : Following this incident, there was a secondary effect on DNS resolution: Simple Hosting instances had been failing to resolve DNS since 20:06 UTC. The problem is now resolved.
- The network equipment used as gateways for this service is visibly showing signs of weakness. An in-depth analysis of the anomaly and of the primary unit's behaviour is underway (the likely cause is a memory fault). We are running on the secondary gateway for the moment.


Our database system will undergo maintenance tonight.
As a result, all updates to DNS zones will be delayed between 00:00 CEST and 02:00 CEST.
We apologize for the inconvenience.

A network unit is currently experiencing an issue.

 

Our technical team is on site and analyzing the issue.

 

They will carry out the necessary maintenance operations in the next few minutes.

This issue was resolved by the network team within minutes of this message being posted. 


We received an alert from our monitoring system concerning the Baltimore/USA datacenter.

 

We lost contact with 11 physical machines.

 

We are currently analyzing the issue with the technical team.

 

We apologize for any inconvenience this issue may cause you.

 

UPDATE : one of our technical team members is on their way to the datacenter.

UPDATE : we in fact lost a network unit: a switch whose redundant system did not work as expected. The physical nodes are coming back.


