Due to an incident with a management system, we have temporarily suspended email delivery to GandiMail mailboxes. New mail will be stored in the incoming spools and delivered once the issue has been resolved.
We apologise for any inconvenience this may cause.
Update: 16:00 CET (15:00 GMT): The issue has been resolved and the mail services are operating normally.
We will be carrying out network maintenance during the night of 15-16 January 2011. This work is part of a multi-phase plan to remove the legacy network topology in Paris and migrate to a more stable, scalable, and efficient hierarchical model.
In this phase, the activity will only involve the interconnections between the core and aggregation network elements at our datacentre in St. Denis.
This activity will cause several minor interruptions to connectivity for various Gandi services in Paris throughout the maintenance window, each lasting up to five minutes as the migrations are performed in the various sections of the network. No significant outages are expected.
We have scheduled this maintenance window from 02:00 CET (01:00 GMT) to 08:30 CET (07:30 GMT) on 16 January during the period of lowest impact to customers.
We will schedule follow-on maintenance windows over the coming weeks for the remainder of the network migrations, including activities at Telehouse and on a number of specific services, and we will of course endeavour to keep any disruption to a minimum.
Today (January 4th, 2011), one of our routers went offline. This led to a partial, temporary loss of our network, impacting some of our services such as our website, SiteMaker, GandiBlogs, some email accounts, and all operations on servers. Domain names remained available throughout, though some network paths to certain servers were unreachable.
The incident is currently being resolved, and services will progressively return to normal.
Please accept our apologies for the inconvenience.
UPDATE: Here is the technical explanation for yesterday's network incident:
Part of the Gandi France network is based on legacy topologies built up over the past ten years, including multi-site spans for various VLANs and, in some cases, a relatively flat architecture. This part of the architecture relies, perhaps unwisely, on the spanning-tree protocol to ensure a loop-free layer-2 topology in a bridged or switched network. Whilst we have been performing various engineering works over the past 18 months to simplify the architecture, it takes a considerable amount of time to dismantle, without significant outages to Gandi services, what was built piece by piece over a period of ten years.
Yesterday's incident was exacerbated by the legacy elements of the Gandi France network infrastructure and was caused by a fault in a downstream access switch cluster which created a layer-2 loop in the architecture. This in turn caused an unfortunate situation whereby the layer-2 topology of the legacy network was being constantly recalculated, resulting in the spanning-tree protocol failing to converge, consuming 100% of the resources on the affected switches and thus preventing traffic flow. The offending switch cluster was isolated from the network, but we also had to reload a switch in another datacentre to stop the "snowball" effect caused by the fault.
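For readers unfamiliar with spanning-tree, here is a minimal sketch of the principle at work, assuming a toy topology with made-up switch names (this is an illustration, not our actual equipment or configuration): the protocol keeps just enough links active to reach every switch and blocks any redundant link that would otherwise create a layer-2 loop. Whenever the topology changes, the tree must be recomputed, so a flapping fault forces that recalculation to repeat endlessly.

# A minimal sketch of spanning-tree selection over a redundant switch
# topology. Switch names and links are hypothetical. A Kruskal-style
# union-find keeps a link only if it joins two previously separate
# trees; any link that would close a loop is "blocked".

def spanning_tree(switches, links):
    parent = {s: s for s in switches}

    def find(s):
        # Find the root of s's tree, with path compression.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    active, blocked = [], []
    for a, b in links:
        ra, rb = find(a), find(b)
        if ra == rb:
            blocked.append((a, b))   # would create a layer-2 loop
        else:
            parent[ra] = rb          # merge the two trees
            active.append((a, b))
    return active, blocked

switches = ["core1", "core2", "agg1", "agg2", "access1"]
links = [("core1", "core2"), ("core1", "agg1"), ("core2", "agg2"),
         ("agg1", "agg2"), ("agg1", "access1"), ("agg2", "access1")]

# A faulty switch flapping its links forces this computation (and the
# traffic interruption while it runs) to repeat on every change.
for flap in range(3):
    active, blocked = spanning_tree(switches, links)
    print(f"recalculation {flap}: forwarding={active} blocked={blocked}")

In a hierarchical design with small, locally contained layer-2 domains, any such recalculation stays local rather than rippling across the entire network, which is the motivation for the migration described below.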
We have already scheduled significant network engineering activities for this quarter to finally unpick the remainder of the legacy topology and migrate to a fully hierarchical model, limiting the layer-2 domains to locally contained subnets and minimising the reliance upon protocols such as spanning-tree, which was never designed for use at such a scale in the first place. We will be communicating the dates and times of the maintenance windows over the coming weeks.
We apologise again for any inconvenience caused during yesterday's network incident.
We are currently experiencing an abnormally high load on the incoming mail spools of the GandiMail service. As a result, new mail deliveries may be slower than usual. Our teams are investigating the source of this increased load, and we will keep you updated as we have more information.
Update: 14:00: The slow spool performance is related to an increased load on the antispam/antivirus filtering on the mail spools. Our teams are actively working to resolve the issue as soon as possible. Inbound mail is still being delivered, though of course at a slower rate.
Update: 17:50: Our teams have isolated the issue and have tweaked the processes on the antispam filters to further optimise performance. All mail in the spools has been delivered to the recipient mailboxes and the system is now running nominally.
We apologise for any inconvenience caused by the slower than normal delivery of mails today.
A problem with one of our storage units has led to a disruption in mail service for several thousand mailboxes. We are currently working to restore service as soon as possible.
Please accept our apologies for the inconvenience.
The technical team
14:24 GMT: We have identified that the problem was caused by data corruption. We are currently working on repairing the filesystem.
16:16 GMT: The filesystem has been found to be severely corrupted. Our technical teams are investigating whether it is possible to recover any data on the affected filer. We will keep you updated with any further information/progress.
12:00 GMT (1 Dec): The rebuild of the affected filer was completed last night, and undelivered mails still in the queue were delivered overnight. Unfortunately, the filesystem corruption was replicated on the resilient disk arrays, which meant that we were sadly unable to recover the data for the affected mailboxes. These mailboxes have been brought back online, but are unfortunately empty for users accessing them via webmail or IMAP. We have sent further communication to the customers affected by this regrettable event. We have put measures in place to mitigate any repeat occurrence, and will be further developing and hardening our mail infrastructure. On behalf of the entire Gandi team, please accept our sincerest apologies for this most regrettable 'first'.
We have lost contact with a rack for the hosting service -- our technical teams are on site to repair the issue. As soon as we know whether the problem is related to the network or to the servers, we will either migrate your server or reconnect the rack to re-establish service.
Please do not perform any operations (stop/reboot) on your server in the meantime; we will restart your server if necessary.
We apologise for any inconvenience caused.
Update: 13:00 CET (12:00 UTC): The servers are now reachable. The brief outage was due to a partial electrical failure, which is being investigated. It affected only the network connectivity of the servers concerned.
For the second time in 24 hours (!), some of our services are undergoing a DDoS attack. The service most heavily affected is web forwarding, which is completely saturated.
We are doing everything necessary to return the service to normal operation as soon as possible.
Please accept our apologies for the inconvenience.
Update 15:00 GMT: The attack has ended, and our services are now operating at normal levels. We will nonetheless continue to monitor the situation.
Following an unexpected hostile traffic burst, we experienced a network outage.
Web redirections were down until 6:30 AM CET, when we reactivated the first part of the service. The entire web redirection platform has been fully back online since 8:30 AM CET.
We will be performing maintenance on Thursday, November 11th between 11:00 PM and 12:00 AM CET (5:00 PM to 6:00 PM EST) in order to update our Roundcube webmail (http://webmail.gandi.net) to version 0.4.2.
The (long) list of changes in this version can be consulted on the Roundcube website:
One of the racks in our datacentre is currently experiencing an electrical incident affecting some of the servers in that rack. Our technical teams are investigating and working to bring the affected servers back online as soon as possible. We recommend that you refrain from stopping or rebooting your VPS; any such operations will be performed automatically when the affected servers can be migrated.
Update: 13:00 CET (12:00 GMT): Power has been restored to the affected servers. Our technical teams are investigating the cause of the outage.