Following an incident that occurred during an update, some Simple Hosting websites stopped responding correctly (they display an error message). We have identified the problem and are correcting it at this time. It is neither necessary nor recommended that you reboot your instance.
We will post more information here when we have an update for you.
The following updates are given in CET for February 19, 2013:
13:08 The origin of the problem has been found; we are verifying the fix and will apply it shortly.
13:14 The operation is still in progress. 25% of the platform is impacted.
13:44 The script did not work. We are correcting it and testing it on several instances before launching it for all the others. We can confirm that no data has been lost.
14:25 The script works. We are applying it to all of the affected instances; it will take about an hour before all of them are fixed.
15:53 The update is taking longer than expected. Estimated time to resolution is 16:50-17:00.
Incident resolved. Here are the technical details:
All of the affected instances have now been restarted; we will handle any residual issues on a case-by-case basis. The deployment of a migration script failed, and all Simple Hosting instances were affected. A configuration change that should have been applied on the next restart was instead applied directly to the Apache service, and the logs were rotated. In parallel, an automatic recovery was executed on instances that were in the middle of a migration. The end result was that instances were started with only a partial update applied. To correct this, we had to stop the majority of the instances and determine which were in an inconsistent state. We then restarted the instances and forced the migration of the incompletely updated systems. This took longer than expected, which is why our initial recovery estimates were inaccurate. No data was lost during this incident, and your instance should be fully functional.
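For those curious what that recovery pass amounts to in principle, the sketch below illustrates the check-then-remediate pattern described above: find instances whose applied configuration does not match the expected version, stop them, force the migration to completion, and restart them. The instance-ctl commands, version marker, and instance names are hypothetical illustrations and not our actual tooling.

# Illustrative sketch only: instance-ctl, the version marker, and the
# instance names are hypothetical, not Gandi's real tools. The point is
# the pattern: detect a partially updated instance, stop it, force the
# migration to finish, then restart it.
import subprocess

EXPECTED_VERSION = "2013-02-19"  # hypothetical marker for the completed update

def applied_version(instance: str) -> str:
    """Read the update marker an instance reports (hypothetical command)."""
    out = subprocess.run(
        ["instance-ctl", "show-version", instance],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()

def remediate(instances: list[str]) -> None:
    for name in instances:
        if applied_version(name) == EXPECTED_VERSION:
            continue  # consistent state: leave it running
        # Inconsistent (partially updated) instance: stop, migrate, restart.
        subprocess.run(["instance-ctl", "stop", name], check=True)
        subprocess.run(["instance-ctl", "force-migrate", name], check=True)
        subprocess.run(["instance-ctl", "start", name], check=True)

if __name__ == "__main__":
    remediate(["sh-instance-001", "sh-instance-002"])  # placeholder names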
Please accept our apologies for this incident. This week, we will be reviewing how to ensure that it never happens again.
We are experiencing an incident due to an unexpected hardware problem with a storage node. Our teams are currently working on the problem to restore service as soon as possible. We recommend that you do not reboot your server if you are impacted.
We will keep you informed of the situation in this news article.
[Update] The hardware component has now been replaced, and the situation is back to normal.
As of 16:45 UTC Oct 29, 2012, our Baltimore data center has seen no effects from the leading edges of hurricane Sandy. We are seeing many power outages in the local area, however, and we caution our hosting customers that while our power systems are highly redundant, it would be prudent to expect outages of connectivity, and possibly even power, over the next 24-48 hours. This storm is unusually large, and its effects are expected to be severe.
Update: As of 15:00 UTC, the Baltimore Data Center has not lost power or suffered a degradation in service, despite losing one of the redundant connections to the Internet.
Resolved: The cyclone "Sandy" did not manage to take out power or significantly affect the Baltimore data center hosting or network services.
There is currently an issue in our Paris datacenter affecting some IaaS/Cloudserver & PaaS/SimpleHosting physical machines.
Our technical team is currently analyzing the issue, and is on-site in the datacenter.
More information to come in this article.
EDIT: All hosting operations are currently stopped.
EDIT: The issue appears to have been located on a network device. The physical nodes and VMs are coming back up. We are monitoring the impacted VMs to make sure they respond in the coming minutes.
EDIT: The issue has been resolved. Hosting operations are back up.
Over the past 24 hours (since June 12th at 11:00 CET), our web forwarding service has been the target of an extremely large DDoS attack. Our technical teams have been working on the problem since it began, though it appears somewhat likely that the attack will succeed, meaning the total saturation of our service and a subsequent temporary interruption in service.
The services impacted will be all web forwarding addresses configured on our DNS.
No other service is currently at risk (DNS, Email, Hosting, SSL are OK).
We ask that you refrain from contacting our support with questions about this incident, as they can only refer you to this message, which will be updated in real time. Saturating our customer support will only degrade the situation without giving you any more information than is already on this page.
We apologize for the inconvenience, and assure you that we are doing everything possible to recover from this attack and return to our normal quality of operation.
Update: We have put in place a solution that has allowed us to reduce the risk of service interruption. We are therefore lowering the alert level, but will continue to monitor the DDoS attack, which is still underway.
A network problem has been affecting Simple Hosting since 16:30. Our teams are working to resolve it as soon as possible. Do not restart your server if you appear to be impacted. We will keep you posted about this problem.
Edit: After isolating a faulty component, everything has been back to normal since 22:30.
We have a temporary emergency halt on the hosting storage system (filers). We recommend that you do NOT attempt to restart your server. The impacted servers should recover in the next few minutes. We will update you with further information as soon as possible.
[Edit 00:00] The services are fully restored as of 21:20 CET. Most users were back to nominal function before 19:30, but some took longer to start. Identifiable blocked systems were managed and restarted manually. Please restart your services if they are still unavailable at this time, and contact support if your server is not available and cannot be restarted.
A storage unit is currently experiencing a slowdown. Our teams are currently working on a solution.
Update (09:45 GMT): The situation improved between 07:00 and 08:00 GMT. There were significant slowdowns between 05:00 and 06:50 GMT.
Update (January 25th 09:00 GMT): A storage unit is currently experiencing a slowdown. The incident is similar to yesterday's. Our technical team is working on solving the issue.
Update (January 25th 10:00 GMT): The I/O situation improved. Our technical team is still working to find a complete fix to the issue.
Update (January 25th 10:22 GMT): A storage unit is currently experiencing a slowdown. The incident is similar to the one this morning. Our technical team is working on solving the issue.
Update (January 26th 11:26 GMT): The I/O situation improved. Our technical team is still working to find a complete fix to the issue.
Update (January 27th 19:11 GMT): A storage unit is currently experiencing a slowdown. The incident is similar to the ones earlier this week. Our technical team is working on solving the issue.
Update (January 27th 22:00 GMT): The I/O situation is now stabilized. Our technical team is still working to find a complete fix to the issue.
Update (February 2nd 03:30 GMT): Another incident is affecting one of our storage units. We are now rebooting the faulty equipment. We have recently identified a few corrective actions that we will soon be able to take in order to solve this kind of issue.
Update (February 2nd 20:19 GMT): Another incident has occurred and a slowdown was noticed; however, the situation is stable right now.
Update (February 6th 02:09 GMT): Slowdown on one of our storage units. Teams working on it.
Two storage units are affected by these incidents, which are isolated slowdowns in read/write operations. We suspect that the problem is two-fold: a software problem (blocking of operations) and a hardware problem (some disk models are unusually slow).
When these slowdowns occur, the iSCSI implementation that connects your servers to their disks may malfunction. The result is an "I/O wait" that stays artificially high (100%) even once the storage is fast again.
We are currently working on all three of these problems, giving priority to our system's capacity to re-establish service after a slowdown.
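If you want to check whether your own server shows the artificially high I/O wait symptom described above, that figure can be read from /proc/stat on a Linux guest. The sketch below is only an illustration of that measurement, assuming a Linux instance; it is not a Gandi tool.

# Illustrative monitoring sketch (not our production tooling): sample the
# aggregate CPU counters from /proc/stat twice and report the share of the
# interval spent in "iowait". A value stuck near 100% while the storage
# itself benchmarks as fast again matches the symptom described above.
import time

def cpu_counters() -> list[int]:
    with open("/proc/stat") as f:
        # First line: "cpu  user nice system idle iowait irq softirq steal ..."
        fields = f.readline().split()
    return [int(v) for v in fields[1:]]

def iowait_share(interval: float = 5.0) -> float:
    before = cpu_counters()
    time.sleep(interval)
    after = cpu_counters()
    deltas = [a - b for a, b in zip(after, before)]
    total = sum(deltas)
    iowait = deltas[4]  # fifth counter after "cpu" is iowait
    return 100.0 * iowait / total if total else 0.0

if __name__ == "__main__":
    print(f"iowait over the last 5s: {iowait_share():.1f}%")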