Ok, but what are you DOING?!

Dell is buying 3PAR.  Oracle has Sun and Exadata.  EMC now has Greenplum.  Cisco sells telephony and servers.  IBM is selling SATA drives to the enterprise as XIV and is reselling NTAP, but its storage architect is out on the loose, again.  Who did I leave out?  Oh, HP…  The only one NOT making moves is Microsoft?!  Actually that’s not completely true… Microsoft continues to “innograte” — the act of innovating by integrating acquired technology into your existing products’ evolution.  DATAllegro, Opalis, and so on…

I keep thinking about the famous economic principle that states “you will ultimately be undone by your past”.  If EMC could just shed its dependency on Dell’s relationships with all those purchasing agents…  If Cisco could just lose the MDS and the Catalyst… if IBM could just forget about the XIV and buy NTAP already!…  And HP… Oh HP…

It has become painfully clear that the order of the components in the OSI model, in fact, serves as a roadmap.  Never forget that the Application is always at the top.  Whatever the application needs, the application gets.  If you don’t sell an application, don’t expect to tell anyone what to do.  Ever.  If your application is actually a utility for another application, don’t forget that fact.  Your “utility” is NOT the application.

Oracle has Java, yes.  Java is a development tool, not an application, but how about Siebel, PeopleSoft, JD Edwards?  Dell has … nothing.  HP has … nothing.  IBM has DB2 and Lotus.  Cisco has Unity — yes, voicemail is an application.  EMC has … hmmm VMware? VMware is actually an infrastructure tool — it’s like a server hardware manufacturer that lets you use whatever server vendor you want.  EMC also has Documentum — a utility that is configured as an application.  Microsoft, on the other hand, has all the applications you can shake a stick at.  If Microsoft says their application needs 100 spinning dancers to run, guess what you’re buying?

The way to true technology marketplace leadership is through applications that people actually use.  People love Microsoft Office.  They love their iPhones, their Androids, their Blackberries (often seen as tools… but they’re not — an iPhone is a collection of applications — remember… “there’s an APP for that!!!”).  People also love web-based applications like Facebook, Salesforce, Gmail, and LinkedIn.  It seems to me… that HP, IBM, Dell, and EMC would do well to think about what people are using to “run their lives” and follow those markets.

Does it matter that Dell has 3PAR?  Does it matter that EMC has DataDomain?  Does it matter that Oracle has Sun and VirtualIron and BEA, and Java? — I don’t think so — but Oracle DOES have Siebel, PeopleSoft, and JD Edwards — real APPLICATIONS.  So at the end of the day, I think it becomes pretty clear who will be pulling and who will be pushing…  Take a look at what people are DOING.  The truth will show you the way:

  • Email — Google, Microsoft, Yahoo, Apple (MobileMe).  All Cloud-based email systems (and Microsoft even has a version of email that runs inside your firewall <g>)
  • Banking — a scattered field with many banks offering their own applications, plus Quicken
  • Social Networking — Facebook is dominating, but LinkedIn, MySpace, etc. continue to stay afloat
  • Media Sharing — Flickr, Snapfish, Phanfare
  • Media Consumption — Netflix, Pandora, Amazon’s Kindle, iTunes — all retailers with massive followings
  • Spreadsheets and Documents — Microsoft OWNS this space with trickles from OpenOffice and iWork

So, just some idle advice from the sidelines for Dell, HP, even EMC — look at what people are doing; and go DO that!

Microsoft makes Massively Parallel Processing Database available for the masses

Microsoft, with its Parallel Data Warehouse (Madison) — a product derived from Microsoft’s purchase of DATAllegro — seems to be struggling to convince anyone that it actually understands what data warehousing really is.

As an example, Madison has no workload manager — a key element of data warehouses that allows the business to define “roles” that have access to the resources of the DW. Without a workload manager, all users/roles share access to the valuable DW with no prioritization at all.

In a mature (read “true”) data warehouse, “roles” are defined that allow the business to state, for example, all VIPs can run queries any time, but they cannot consume more than x resources on the system, they can only return x number of rows, they can only process x queries at one time, etc. Likewise, there is an ad hoc group that might be governed more stringently and an analytics group that has higher privileges because the DW Admins can trust that the queries that the analytics group submits will not bring the DW to its knees.
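To make the idea of role-based governance concrete, here is a minimal sketch of the kind of admission check a workload manager performs. Everything in it is an assumption for illustration: the `RolePolicy` and `WorkloadManager` names, the three policies, and every number are hypothetical, and it does not reflect how Teradata, Netezza, or any other product actually implements this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RolePolicy:
    """Hypothetical per-role limits; names and numbers are illustrative only."""
    max_resource_share: float  # fraction of system resources a query may use
    max_rows_returned: int     # cap on the size of a result set
    max_concurrent: int        # queries the role may run at the same time


# Illustrative policies for the three groups described in the text.
POLICIES = {
    "vip": RolePolicy(0.10, 1_000_000, 4),
    "adhoc": RolePolicy(0.02, 10_000, 2),
    "analytics": RolePolicy(0.25, 50_000_000, 8),
}


class WorkloadManager:
    def __init__(self, policies):
        self.policies = policies
        self.running = {role: 0 for role in policies}

    def resource_cap(self, role):
        """Return the resource share a single query in this role may consume."""
        return self.policies[role].max_resource_share

    def admit(self, role, est_rows):
        """Reserve a slot and return True if the query fits the role's limits."""
        policy = self.policies[role]
        if self.running[role] >= policy.max_concurrent:
            return False  # role is already at its concurrency limit
        if est_rows > policy.max_rows_returned:
            return False  # result set would exceed the role's row cap
        self.running[role] += 1
        return True

    def finish(self, role):
        """Release the slot held by a completed query."""
        if self.running[role] > 0:
            self.running[role] -= 1
```

The point of the sketch is that the limits hang off the role, not the system: the ad hoc group hits its ceiling after two queries while the analytics group still has room to work.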

Microsoft’s implementation of PDW 1.0 allows none of these controls. Instead, the entire system is governed to 32 simultaneous queries. Each query can consume (theoretically) 1/32 of the Compute nodes’ resources. So there is no means to escalate a query, nor a means to demote one. This may seem like nit-picking, but what DW Admins will find themselves doing (more than reading novels or starting ETL jobs) is killing runaway queries so that the analytics groups can run the queries the business needs.
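The behavior described above can be modeled in a few lines. This is only a toy model of a fixed 32-slot, first-come-first-served scheme as the paragraph describes it, not PDW’s actual scheduler; the `FixedSlotScheduler` class and its method names are hypothetical.

```python
from collections import deque

TOTAL_SLOTS = 32  # system-wide concurrency limit described in the text


class FixedSlotScheduler:
    """Toy model: every query gets an equal slot, and role is ignored."""

    def __init__(self, slots=TOTAL_SLOTS):
        self.slots = slots
        self.running = []
        self.waiting = deque()

    def share_per_query(self):
        """Each running query is entitled to the same fixed share."""
        return 1 / self.slots

    def submit(self, query_id, role):
        # The role is recorded but never consulted: there is no way to
        # escalate an important query or demote an unimportant one.
        if len(self.running) < self.slots:
            self.running.append((query_id, role))
            return "running"
        self.waiting.append((query_id, role))
        return "queued"

    def finish(self, query_id):
        """Free a slot (or kill a runaway query) and promote the next waiter."""
        self.running = [q for q in self.running if q[0] != query_id]
        if self.waiting and len(self.running) < self.slots:
            self.running.append(self.waiting.popleft())
```

Fill the 32 slots with ad hoc queries and the next analytics query simply waits in line until somebody finishes, or until an admin kills something.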

As another example of Microsoft’s oversimplification of the DW space, PDW utilizes a “landing zone” server to stage data during ingestion. The landing zone server has been suspected for many years of being a major bottleneck. And with good reason; a typical ETL datastream device like an Informatica server is a single device that outputs to Named Pipes. Many ETL implementations rely on Informatica’s engine to pump data at multi-Gbps speeds. Since most ETLs depend on Ethernet, this has dictated multiple InfiniBand connections to obtain the throughput many organizations need.

In tests widely confirmed by Microsoft, the landing zone was able to pump “hundreds of GBs per hour”. For comparison purposes, a typical Windows 2008 File Server can pump about 275-300GB/hour — coincidence? Here’s the rub: the entire array of PDW servers uses InfiniBand to link them together. It uses another fabric, Fibre Channel, to link the servers to storage. The system is flush with bandwidth — what it doesn’t have is a mechanism to avoid OS-level bottlenecks.

On the other hand, a purpose-built ETL server — like the ones used at eBay — written in lightweight C++ and designed to avoid the OS as it pumps hundreds of rows a second, can produce upwards of 4TB/hour. A difference of about 16x. Not 2x, not 4x; 16x!
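For a rough feel of what that gap means in practice, here is a back-of-the-envelope sketch. The baselines are assumptions: the post only says “hundreds of GBs per hour” for the landing zone, so 250GB/hour is a guess that reproduces the 16x figure, while the 275-300GB/hour file-server range works out closer to 13x to 15x. The 10TB load size and the function names are hypothetical, and 1TB is treated as 1000GB.

```python
PURPOSE_BUILT_GB_PER_HOUR = 4000  # "upwards of 4TB/hour", taking 1TB = 1000GB


def speedup(fast_gb_per_hour, slow_gb_per_hour):
    """How many times faster the first rate is than the second."""
    return fast_gb_per_hour / slow_gb_per_hour


def hours_to_load(dataset_gb, gb_per_hour):
    """Hours needed to push a dataset through at a sustained rate."""
    return dataset_gb / gb_per_hour


if __name__ == "__main__":
    # 250 is an assumed landing-zone rate; 275 and 300 are the quoted
    # file-server figures.
    for baseline in (250, 275, 300):
        ratio = speedup(PURPOSE_BUILT_GB_PER_HOUR, baseline)
        print(f"{baseline}GB/hour baseline: {ratio:.1f}x")

    dataset_gb = 10_000  # hypothetical 10TB nightly load
    print(hours_to_load(dataset_gb, PURPOSE_BUILT_GB_PER_HOUR), "hours purpose-built")
    print(hours_to_load(dataset_gb, 250), "hours at the assumed landing-zone rate")
```

Whichever baseline you pick, the load that fits inside a lunch break on the purpose-built box eats most of two days at landing-zone speeds.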

In typical “Microsoft fashion”, the SQL Server product group is attempting to show the enterprise that it has the world on a string by leveraging well-known infrastructure to solve even more complex business problems. Who could blame them? They are simply reacting to the same market indicators that Teradata, Oracle, and Netezza are seeing. Oracle brought RAC to the world many years ago and they are STILL struggling to master the DW dragon. We should applaud the efforts of Microsoft. Let’s all stand up and clap. Ok, that was fun. Now let’s take a deep breath and realize that Microsoft has years to go before they can acquire all of the technology they will need to simply enter the MPP/SN space of the truly massive data warehouse world. PDW is version 1.0. In 2015, PDW 3.0 will ship and… Thanks for reading.