Data Fail: Sidekick Phones
User data that T-Mobile Sidekick phones store on Microsoft's servers, such as contacts and pictures, has reportedly been lost beyond repair.
On October 3, T-Mobile Chief Operations Officer Jim Alling wrote the following post on the T-Mobile forum site:
Dear valued T-Mobile Sidekick customers:
I realize that for many of you, your T-Mobile Sidekick is how you stay in touch with your friends, family and others. I sincerely apologize for the impact the current disruption of data services may be having on you. I assure you that T-Mobile is working very closely with Danger/Microsoft to resolve the issue as quickly as possible. T-Mobile-supported services, such as voice calls and SMS/MMS, have not been affected and continue to be operational. Danger/Microsoft has been working, and will continue working through the week, to restore data functionality and other features.
I understand that this data service disruption is very frustrating to our valued Sidekick customers. For many years, the Sidekick has been, and continues to be, a cornerstone device for T-Mobile. And we believe Sidekick customers are among the most loyal customers anywhere. Recognizing that, and to address any inconvenience Sidekick data customers are experiencing, T-Mobile will automatically credit one month of data service to customers who subscribe to T-Mobile Sidekick data plans. There is nothing you need to do to get this credit – T-Mobile will post the credit to these accounts in the coming days.
We will continue to post the latest information and FAQs to these Forums. I appreciate you being a loyal T-Mobile customer, and appreciate your patience as everyone works hard to resolve the current issues. Thank you.
Sincerely,
Jim Alling, Chief Operations Officer, T-Mobile USA
Then, after a torrent of discussion on the forum site, T-Mobile posted the following update earlier today:
Dear valued T-Mobile Sidekick customers:
We are thankful for your continued patience as Microsoft/Danger continues to work on preserving platform stability and restoring all services for our Sidekick customers. We have made significant progress this past weekend, restoring services to virtually every customer. Microsoft/Danger has teams of experts in place who are working around-the-clock to ensure this stability is maintained.
Regarding those of you who have lost personal content, T-Mobile and Microsoft/Danger continue to do all we can to recover and return any lost information. Recent efforts indicate the prospects of recovering some lost content may now be possible. We will continue to keep you updated on this front; we know how important this is to you.
In the event certain customers have experienced a significant and permanent loss of personal content, T-Mobile will be sending these customers a $100 customer appreciation card. This will be in addition to the free month of data service that already went to Sidekick data customers. This card can be used towards T-Mobile products and services, or a customer’s T-Mobile bill. For those who fall into this category, details will be sent out in the next 14 days – there is no action needed on the part of these customers. We however remain hopeful that for the majority of our customers, personal content can be recovered.
===
Dan
Moderator, T-Mobile Forums
At this time, neither Microsoft nor T-Mobile has confirmed the speculation, circulating on the forums, that a SAN update caused the failure:
So yeah..
I would like to know what discounts are T-mobile going to give on a new Phone. I am probably going to move to the Moto Cliq, But I and other sidekick users should get a full phone discount not just a % of it.. (Microsoft should pay for it)
hmm Roz Ho haven’t you her of BACKUP…?
Quoting Hiptop3
“Currently the rumor with the most weight is as follows:
Microsoft was upgrading their SAN (Storage Area Network aka the thing that stores all your data) and had hired Hitachi to come in and do it for them. Typically in an upgrade like this, you are expected to make backups of your SAN before the upgrade happens. Microsoft failed to make these backups for some reason. We’re not sure if it was because of the amount of data that would be required, if they didn’t have time to do it, or if they simply forgot. Regardless of why, Microsoft should know better. So Hitachi worked on upgrading the SAN and something went wrong, resulting in it’s destruction. Currently the plan is to try to get the devices that still have personal data on them to sync back to the servers and at least keep the data that users have on their device saved. “
WOW.
Microsoft Do you understand that you are making yourself and T-mobile loose MONEY????
Also with me being a Sidekick owner I feel betrayed by Microsoft not T-mobile.
This outage I was all fine about at first but now it is just to much. We sidekick owners rely on Danger witch is now owned by Micro to keep are data stored on a secure server and that is why us users never backed up are data. I mean the sidekick does not even have a mass contact save Option. The user has to save them one by one. If I do stay with the sidekick I would like to see Options to save all on SD becuase a SIM can only hold around 250..
I have lost business and meetings from this outage and I am not happy.
So to everyone
It is not T-mobiles Fault so do not blame them. There customer service has been AWESOME
Also Danger and Microsoft do not comunicate with T-mobile as much that is why there is not much info.
“I wonder if we call Microsoft and bug them will they give us any info, they will probably say u have to call t-mobile. Well T-mobile is not the one who messed up,.they do not UPDATE THE SAN…..”
After a week of attempts to salvage the data, it appears that Microsoft has been unsuccessful. If the SAN speculation is correct, then the root cause was a failure of the SAN underlying the data. The question is, why should a failing SAN take the data of an entire customer base down with it? I strongly doubt this would have happened with an ordinary hardware breakdown. Well-designed storage solutions are built from the outset to survive a head failure, a network failure, or really any single failure, without losing data. One would therefore speculate that gross human error was at fault, and frankly, that means management was not doing its job. Not enough layers of redundancy were built into this system, and not enough safeguards were written into policy to keep this human error, or whatever it was, from cascading into a data-loss scenario. Data management is a big responsibility, and many firms do not put enough resources into its upkeep. Microsoft appears to be one of them.
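The kind of policy safeguard described above can be sketched as a gate that refuses to start a storage upgrade unless a recent, verified backup exists. This is a minimal, hypothetical Python sketch; the `Backup` record, the function names, and the 24-hour freshness window are illustrative assumptions, not anything Microsoft, Danger, or Hitachi is known to use.

```python
import hashlib
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Backup:
    """Hypothetical record of a completed backup (illustrative only)."""
    created_at: float  # Unix timestamp when the backup finished
    sha256: str        # checksum recorded when the backup was written
    data: bytes        # stand-in for the contents read back from backup media


def backup_is_usable(backup: Backup, max_age_seconds: float,
                     now: Optional[float] = None) -> bool:
    """A backup counts only if it is recent and its contents still match the checksum."""
    now = time.time() if now is None else now
    fresh = (now - backup.created_at) <= max_age_seconds
    intact = hashlib.sha256(backup.data).hexdigest() == backup.sha256
    return fresh and intact


def run_storage_upgrade(backup: Optional[Backup], upgrade: Callable[[], None],
                        max_age_seconds: float = 24 * 3600) -> None:
    """Run `upgrade` only if a fresh, verified backup exists; otherwise refuse."""
    if backup is None or not backup_is_usable(backup, max_age_seconds):
        raise RuntimeError("refusing to upgrade: no fresh, verified backup")
    upgrade()


if __name__ == "__main__":
    contents = b"contacts and photos"
    backup = Backup(time.time(), hashlib.sha256(contents).hexdigest(), contents)
    run_storage_upgrade(backup, lambda: print("upgrade started"))
    try:
        # No backup at all: the gate blocks the upgrade instead of trusting the operator.
        run_storage_upgrade(None, lambda: print("never printed"))
    except RuntimeError as err:
        print(err)
```

The point of putting the check in code rather than in a runbook is that a rushed or forgetful operator cannot skip it; the upgrade simply does not start.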
Rackspace Delves into Cloud Computing Marketplace
Rackspace recently delved into the cloud computing arena with its Mosso division. Mosso delivers online presence in an application-as-a-service model and, like Amazon Web Services and Microsoft Azure, provides high-availability platforms on which to run services. With Amazon and Microsoft, however, shell access remains limited, and system-level access takes a back seat to stability. Mosso, in contrast, also offers provider-provisioned virtual server instances, which don't require climbing a new learning curve. One can surmise that as the cloud marketplace matures, more providers will adopt this model. Ultimately, cloud computing will provide the availability and performance we want, without sacrificing the control we need.
Join the discussion at the Open Cloud Manifesto site.
Cloud Hosting != Unbreakable
When Microsoft launched its cloud-based operating system last October, the company branded it “Azure,” I suppose as a reference to the blue skies that supposedly hold these clouds.
According to Tier1 Research’s A. Piraino, Azure suffered a 22-hour outage this weekend when a glitch, speculatively software-related, caused instances to stop responding suddenly. While Microsoft has yet to release the results of a root cause analysis, one can envision a NOC with stacks of monitors displaying Blue Screens of Death. Or rather, Azure Screens of Death.
To be fair, Microsoft Azure is still in “Technology Preview,” which is to say, pre-production. And other cloud computing platforms suffered similar outages in their infancy. Amazon Web Services suffered a seven-hour outage in July from faulty load balancers. Google's systems went down twice in the past six months.
The problem isn’t that the architecture doesn’t work as planned. The problem is that no amount of planning will cover every situation that can, and will, occur. Failures of critical components become huge issues in virtualized applications, because many more (virtual) instances depend on those components. Though a system can have n levels of redundancy built into it, ultimately, there is no such thing as a completely unbreakable system.
Folks tend to get excited about cloud computing because they envision a future of virtualized applications zipping around in a grid computing infrastructure, never failing, never dying. Even in more traditional environments, people get excited about centralized storage, with the joys of instant snapshots and multiple layers of redundancy. Though these technologies are exciting and open new avenues for innovation, uniform architectures share uniform faults. Diversity in architecture is an important consideration when you’re building fault tolerance into your system.
Another important consideration is this: the more power we place in the hands of an administrator, the more damage he can do when he goofs. And he will goof. We all goof once in a while. Take, for example, Flexiscale, which suffered a five-day outage because of one such goof. The more we consolidate technology, the more vulnerable we are if something that should never happen, happens.
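As a back-of-the-envelope illustration of why n levels of redundancy still fall short of unbreakable, assume (purely for the sake of the example) n redundant copies of a component, each independently available with probability a. The system is down only when all n copies are down at once:

```latex
A_n = 1 - (1 - a)^n,
\qquad \text{e.g. } a = 0.99,\ n = 3:\quad A_3 = 1 - (0.01)^3 = 0.999999 .
```

Each added copy shrinks the failure term, but for any a < 1 and finite n it never reaches zero, and the independence assumption behind the formula is exactly what a shared fault breaks.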
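To illustrate how uniform architectures share uniform faults, here is a minimal Python sketch under a deliberately simple, hypothetical model: each replica fails independently, plus a single shared ("common-mode") fault, say a bug in software every replica runs, that takes them all down at once. The function name and the probabilities are illustrative assumptions, not figures from any of the outages above.

```python
def system_availability(replica_availability: float, replicas: int,
                        common_mode_prob: float = 0.0) -> float:
    """Probability the system is up under a simple illustrative model.

    Assumptions (hypothetical):
    - each replica is up independently with probability `replica_availability`;
    - a shared fault occurs with probability `common_mode_prob`, independent of
      the per-replica failures, and takes every replica down at once.
    """
    if not (0.0 <= replica_availability <= 1.0 and 0.0 <= common_mode_prob <= 1.0):
        raise ValueError("probabilities must lie in [0, 1]")
    if replicas < 1:
        raise ValueError("need at least one replica")
    # At least one replica survives its own independent failures.
    independent_up = 1.0 - (1.0 - replica_availability) ** replicas
    # The system is up only if the shared fault does not strike as well.
    return (1.0 - common_mode_prob) * independent_up


if __name__ == "__main__":
    for n in (1, 2, 3, 10):
        print(n, system_availability(0.99, n), system_availability(0.99, n, 0.001))
```

Under this model, adding replicas pushes the independent term toward 1, but availability can never exceed 1 - common_mode_prob. Only diversity, which lowers the chance that one fault hits every replica, raises that ceiling.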
The takeaway is simple. Take the promises of new technology with a grain of salt. And even if the skies are blue, pack an umbrella just in case.