Data Fail: Sidekick Phones
The Microsoft data store where T-Mobile Sidekick phones save user data, such as contact info and pictures, has reportedly been lost beyond repair.
On October 3, T-Mobile Chief Operations Officer Jim Alling wrote the following post on the T-Mobile forum site:
Dear valued T-Mobile Sidekick customers:
I realize that for many of you, your T-Mobile Sidekick is how you stay in touch with your friends, family and others. I sincerely apologize for the impact the current disruption of data services may be having on you. I assure you that T-Mobile is working very closely with Danger/Microsoft to resolve the issue as quickly as possible. T-Mobile-supported services, such as voice calls and SMS/MMS, have not been affected and continue to be operational. Danger/Microsoft has been working, and will continue working through the week, to restore data functionality and other features.
I understand that this data service disruption is very frustrating to our valued Sidekick customers. For many years, the Sidekick has been, and continues to be, a cornerstone device for T-Mobile. And we believe Sidekick customers are among the most loyal customers anywhere. Recognizing that, and to address any inconvenience Sidekick data customers are experiencing, T-Mobile will automatically credit one month of data service to customers who subscribe to T-Mobile Sidekick data plans. There is nothing you need to do to get this credit – T-Mobile will post the credit to these accounts in the coming days.
We will continue to post the latest information and FAQs to these Forums. I appreciate you being a loyal T-Mobile customer, and appreciate your patience as everyone works hard to resolve the current issues. Thank you.
Sincerely,
Jim Alling, Chief Operations Officer, T-Mobile USA
Then, after a torrent of discussion on the forum site, the following update was provided earlier today:
Dear valued T-Mobile Sidekick customers:
We are thankful for your continued patience as Microsoft/Danger continues to work on preserving platform stability and restoring all services for our Sidekick customers. We have made significant progress this past weekend, restoring services to virtually every customer. Microsoft/Danger has teams of experts in place who are working around-the-clock to ensure this stability is maintained.
Regarding those of you who have lost personal content, T-Mobile and Microsoft/Danger continue to do all we can to recover and return any lost information. Recent efforts indicate the prospects of recovering some lost content may now be possible. We will continue to keep you updated on this front; we know how important this is to you.
In the event certain customers have experienced a significant and permanent loss of personal content, T-Mobile will be sending these customers a $100 customer appreciation card. This will be in addition to the free month of data service that already went to Sidekick data customers. This card can be used towards T-Mobile products and services, or a customer’s T-Mobile bill. For those who fall into this category, details will be sent out in the next 14 days – there is no action needed on the part of these customers. We however remain hopeful that for the majority of our customers, personal content can be recovered.
===
Dan
Moderator, T-Mobile Forums
At this time, neither Microsoft nor T-Mobile has confirmed the conjecture that a SAN update caused the failure:
So yeah..
I would like to know what discounts are T-mobile going to give on a new Phone. I am probably going to move to the Moto Cliq, But I and other sidekick users should get a full phone discount not just a % of it.. (Microsoft should pay for it)
hmm Roz Ho haven’t you her of BACKUP…?
Quoting Hiptop3
“Currently the rumor with the most weight is as follows:
Microsoft was upgrading their SAN (Storage Area Network aka the thing that stores all your data) and had hired Hitachi to come in and do it for them. Typically in an upgrade like this, you are expected to make backups of your SAN before the upgrade happens. Microsoft failed to make these backups for some reason. We’re not sure if it was because of the amount of data that would be required, if they didn’t have time to do it, or if they simply forgot. Regardless of why, Microsoft should know better. So Hitachi worked on upgrading the SAN and something went wrong, resulting in it’s destruction. Currently the plan is to try to get the devices that still have personal data on them to sync back to the servers and at least keep the data that users have on their device saved. “
WOW.
Microsoft Do you understand that you are making yourself and T-mobile loose MONEY????
Also with me being a Sidekick owner I feel betrayed by Microsoft not T-mobile.
This outage I was all fine about at first but now it is just to much. We sidekick owners rely on Danger witch is now owned by Micro to keep are data stored on a secure server and that is why us users never backed up are data. I mean the sidekick does not even have a mass contact save Option. The user has to save them one by one. If I do stay with the sidekick I would like to see Options to save all on SD becuase a SIM can only hold around 250..
I have lost business and meetings from this outage and I am not happy.
So to everyone
It is not T-mobiles Fault so do not blame them. There customer service has been AWESOME
Also Danger and Microsoft do not comunicate with T-mobile as much that is why there is not much info.
“I wonder if we call Microsoft and bug them will they give us any info, they will probably say u have to call t-mobile. Well T-mobile is not the one who messed up,.they do not UPDATE THE SAN…..”
After a week of attempting to salvage the data, it appears that Microsoft was unsuccessful. If the SAN speculation is correct, then the loss came down to a failure of the data’s underlying SAN. The question is, why should a failing SAN take the data of an entire customer base with it? I seriously doubt that this would have occurred had this been a normal hardware breakdown. Well-designed storage solutions are built to survive a head failure, a network failure, any sort of failure, really, without losing data. One would thus speculate that gross human error was at fault, and frankly, that means that management was not doing its job. Not enough layers of redundancy were built into this system, and not enough protective layers were written into policy to prevent this human error, or whatever it was, from cascading into a data-loss scenario. Data management is a big responsibility, and in many firms not enough resources go into its upkeep. It would appear that Microsoft is one of those firms.
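One way to picture a "protective layer written into policy" is a gate that refuses to start a risky change unless a recent, verified backup exists. The sketch below is only an illustration of that idea, not a description of anything Microsoft, Danger, or Hitachi actually ran. The function names, the 24-hour freshness limit, and the single-file backup are all assumptions made for the example.

```python
import hashlib
import os
import time


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large backups need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_is_usable(path, expected_sha256, max_age_hours=24.0, now=None):
    """Return True only if the backup exists, is recent, and matches its checksum."""
    if not os.path.isfile(path):
        return False
    now = time.time() if now is None else now
    age_hours = (now - os.path.getmtime(path)) / 3600.0
    if age_hours > max_age_hours:
        return False
    return sha256_of(path) == expected_sha256


def run_upgrade(path, expected_sha256, upgrade):
    """Call the (hypothetical) upgrade step only when a verified backup exists."""
    if not backup_is_usable(path, expected_sha256):
        raise RuntimeError("no verified recent backup; aborting upgrade")
    return upgrade()
```

The point of the design is that the check is enforced in code rather than left to memory: with no fresh, intact backup, the upgrade callable is never invoked.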
BAARF
15 Apr 2009: Edited from its original form for clarity… and a stab at humor. -Jim
I’m a card-carrying member (so to speak) of BAARF, a little online group dedicated to dispelling the myth that RAID5, or any variant thereof, is a good compromise for capacity and fault tolerance. The reason I bring this up is that I had two hard drives fail earlier today (on separate machines), one of which was in a RAID5 array (it’s not mine). The RAID5 box is still rebuilding, one hard drive failure away from data oblivion. Please, for the love of all that is sacred in storage, don’t trust your data to RAID5, or even RAID6, which is not a whole lot better.
Also, it makes me sad that someone would dedicate some very nice 15K RPM SAS drives to a RAID5 array, presumably to offset the characteristically low IOPS performance of any RAID3/4/5 variant. Listen folks: you can have good IOPS as well as high capacity with other RAID levels, namely RAID10, which offers the best compromise of both worlds. I won’t go into too many details here; the page linked below has a number of good reference write-ups. The gist is that dedicating resources to parity management (the calculating, reading, and writing of parity data) is a practice that sucks and deserves a swift boot into tech obscurity along with floppy drives and modems.
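To put rough numbers on the parity overhead, here is a back-of-the-envelope sketch using the commonly cited rule-of-thumb write penalties (2 back-end I/Os per random write for RAID10, 4 for RAID5, 6 for RAID6). It is a simplified model that ignores controller cache and full-stripe writes, and the 175 IOPS per 15K drive used in the example is an assumed ballpark figure, not a measurement.

```python
# Rule-of-thumb back-end I/Os caused by one random front-end write.
WRITE_PENALTY = {"raid10": 2, "raid5": 4, "raid6": 6}


def usable_drives(level, n):
    """Return how many drives' worth of capacity the array exposes."""
    if level == "raid10":
        if n < 4 or n % 2:
            raise ValueError("RAID10 needs an even number of drives, at least 4")
        return n // 2  # every drive is mirrored
    if level == "raid5":
        if n < 3:
            raise ValueError("RAID5 needs at least 3 drives")
        return n - 1  # one drive's worth of parity
    if level == "raid6":
        if n < 4:
            raise ValueError("RAID6 needs at least 4 drives")
        return n - 2  # two drives' worth of parity
    raise ValueError(f"unknown RAID level: {level}")


def random_iops(level, n, per_drive_iops, write_fraction):
    """Estimate front-end random IOPS for a given read/write mix."""
    raw = n * per_drive_iops
    penalty = WRITE_PENALTY[level]
    # Each read costs 1 back-end I/O; each write costs `penalty` of them.
    return raw / ((1 - write_fraction) + write_fraction * penalty)
```

Under those assumptions, eight drives at 175 IOPS each with a 50/50 read/write mix come out at roughly 933 IOPS for RAID10, 560 for RAID5, and 400 for RAID6, while usable capacity is 4, 7, and 6 drives respectively. RAID5 buys extra capacity at a steep cost in random-write performance.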
You may join the fight, or not. Either way, enough is enough.
Conficker Update
Update: An excellent resource list is available at the Internet Storm Center.
The headline at dailymail.co.uk read “April Fool’s Day computer virus is activated… but fails to cause internet chaos.”
I guess the rumors were unfounded. However, it’s important to note that the virus is still rampant, and so is speculation about the potential uses of such a huge botnet. Some surmise that it might be used to DDoS the crap out of some poor server(s). It might also be used to crack passwords or encryption. Check out http://downadup.org to read more and for removal tools. It’s also a good idea to prepare your network for the potentiality of attack. Don’t be a soft target.
Here are a couple of (read: non-comprehensive) ideas on how to not be a soft target:
- Backup, backup, backup
  - Have systems ready to leap into action if necessary, and keep at least one form of backup offline in case of worst-case scenarios.
  - If you don’t already have a backup strategy in place, it’s time to implement one.
- Control access to your critical services
  - Enforce strong passwords – or better yet, employ multi-factor authentication. PPP is a strong candidate for the thrifty.
  - Audit your users – does that guy who quit last year still have an active user account? Do your non-administrative users have access to critical servers?
  - Use fail2ban or iptables to detect and drop password-guessing attacks – even with 10 million+ IPs to choose from, it’s not easy to crack a password/one-time password combination when you only get 3 tries per IP.
- Watch your traffic (not really a botnet vulnerability, but good practice in general):
  - Control your legacy services – seriously, it’s time to retire telnet and other services that transmit passwords in cleartext.
  - https > http – especially when it comes to passwords. Don’t allow users the ability to transmit passwords over http.
- etc…
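The "3 tries per IP" idea from the list above can be sketched in a few lines. This is not how fail2ban is implemented (it works from log files and firewall rules); it is only a toy model of the counting logic. The class name, the 10-minute window, and the 1-hour ban are assumptions chosen for the example.

```python
import time
from collections import defaultdict, deque


class FailedLoginTracker:
    """Ban an IP after max_failures failed logins within a sliding window."""

    def __init__(self, max_failures=3, window_seconds=600, ban_seconds=3600):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self.ban_seconds = ban_seconds
        self._failures = defaultdict(deque)  # ip -> timestamps of recent failures
        self._banned_until = {}              # ip -> time at which the ban expires

    def is_banned(self, ip, now=None):
        """Return True while the IP's ban has not yet expired."""
        now = time.time() if now is None else now
        expires = self._banned_until.get(ip)
        if expires is None:
            return False
        if now >= expires:
            del self._banned_until[ip]  # ban has run out; forget it
            return False
        return True

    def record_failure(self, ip, now=None):
        """Record one failed login; return True if the IP is now banned."""
        now = time.time() if now is None else now
        if self.is_banned(ip, now):
            return True
        recent = self._failures[ip]
        recent.append(now)
        # Drop failures that have aged out of the sliding window.
        while recent and now - recent[0] > self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_failures:
            self._banned_until[ip] = now + self.ban_seconds
            del self._failures[ip]
            return True
        return False
```

A login handler would call `is_banned(ip)` before checking credentials and `record_failure(ip)` after each bad attempt. With three tries per address per ban period, even a botnet of 10 million addresses gets only about 30 million guesses in that period.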
I’ve hardly compiled a comprehensive list, and I welcome comments for other good practices, but the most important takeaway is to be cognizant of your security stance. Don’t make it easy for the bad guys.