


We are currently experiencing an incident on a storage unit.

The situation should be resolved shortly. In the meantime, please do not attempt to restart your server.

We apologise for any inconvenience.

 

UPDATE: 13h46 CEST - The I/O stacks are now recovering

UPDATE: 13h50 CEST - End of incident

 


SimpleHosting experienced a temporary service interruption between 12h53 and 13h27 (CET).

Only some of the instances were affected.

We are currently looking into the cause of this issue.

Please accept our apologies for the inconvenience.


The operations on the SimpleHosting platform are currently stopped.

 

Our team has identified an issue with SimpleHosting operations.

Operations complete correctly, but their displayed status is not updated (they still show "operation in progress" even though they are done).

As a result, queued operations are blocked.

 

We are currently analyzing the issue.

 

UPDATE: The problem was traced to a logging system that prevented the status of operations from being updated. All operations have now finished.


An incident is currently underway on our Simple Hosting platform (Paris datacenter only).
 
The reason for the incident is not immediately clear; we are investigating.  Please don't launch any operations on the instance for the moment.
 
Updates will be posted here as soon as we have more information.
 

Update Tue Dec 10 21:37:01 CET 2013: This issue has been resolved. Please accept our apologies for the brief period of unavailability.


Simple Hosting instances located in our Baltimore data center (only) may currently be experiencing issues. Our technical staff is investigating. Please do not perform any operations on your instance in the meantime.

This post will be updated as the situation evolves.

Update 00:51:20 CET:

A member of our technical staff is currently onsite in Baltimore to address the problem.

Update 01:35:13 CET:

The issue has been resolved. Services should now be operating normally.



The incident of November 11th is part of a series of incidents over the past few weeks caused by the gateway units, which provide Internet access for the Simple Hosting instances.
The Simple Hosting platform has experienced a number of different issues, principally with the gateway equipment, which seems to be the weakest link in the architecture. It is subject to:
  • HSRP instability causing short interruptions in connectivity,
  • Saturation of NAT translation tables as a result of a number of factors, including DDoS attacks and customer activity,
  • High CPU usage under certain conditions.
What will Gandi do to fix the situation, replace this gateway and improve the Simple Hosting product?
  • Replace the network equipment which provides the gateway to the Internet for the Simple Hosting product with more powerful appliances, and a greater number of units (scaling). The new units will better handle the current load and will support the growth of Simple Hosting instances in the near future,
  • Set up a deeper level of monitoring to better detect technical problems,
  • Implement advanced monitoring to detect abuse from specific instances, so that our technical team can react more quickly, before such abuse affects the quality of service for all other customers.
We apologise for the inconvenience. Please be assured that our teams are working to correct these issues in the shortest possible time.

We experienced a hardware fault on routing equipment on the Simple Hosting platform.
Below is a chronology of the various events:
- 20:06 UTC : CPU load on the equipment shows a significant increase.
- 20:06 UTC : Equipment is running at 100% CPU for no apparent reason, and fails to respond to commands.
- 20:08 UTC : Decision made to migrate to the secondary equipment.
- 20:08 UTC : The secondary equipment exhibits the same symptoms as the primary, so traffic is not transferred.
- 20:09 UTC : Debugging underway to ascertain the cause of the problem.
- 20:26 UTC : Migration to the now-stabilised secondary equipment.
- 20:27 UTC : Service returned to nominal operation.
- 22:42 UTC : Following this incident, there was a secondary effect on DNS resolution: the Simple Hosting instances had been failing to resolve DNS since 20:06 UTC. The problem is now resolved.
- The network equipment used as gateways for this service is visibly showing signs of weakness. An in-depth analysis of the anomaly and of the behaviour of the primary unit is underway (the cause is likely a memory fault). For the moment, we are running on the secondary gateway.

