Enterprise 2.0 Needs To Stop Being So Naive

Posted on : 31-08-2009 | By : Paul Michaud | In : Enterprise 2.0

You know I really struggle to get excited about Enterprise 2.0.  Not because I don’t think IT needs to undergo change, but because I feel that Enterprise 2.0 as we seem to be defining it, and covering it in the press and the blogosphere just doesn’t seem to be solving the key issues that either IT or the business units face. Let me explain why I have a problem with this.

I deal with CXO level execs all the time in my job and the ones I deal with all seem to have similar issues when it comes to IT Systems for the most part. While each firm and/or project will have some unique issues, they generally tend to fall into the following buckets:

  • Reduce cost in the IT organization
  • Improve efficiency in the IT organization (measured as cost reduction, responsiveness, improved time to market for projects, etc)
  • Improve user satisfaction (which usually happens if you get the first two right)

If you think about it, pretty much everything we do in IT boils down to one of these.  Now clearly there are a whole lot of sub-bullets to each of those buckets, some of which are listed here:

  • How do you fit more servers into the existing data centers?
  • How do you improve server utilization?
  • How do you consolidate legacy applications?
  • How do you get new applications to market faster?
  • How do you improve the usability and user experience of new and existing applications?
  • How do you do all of this with a leaner and more productive organization?

and most importantly (and in my opinion the one that, if not managed correctly, is the single biggest cause of failure in major IT projects): how do I get my existing staff, which has a personal vested interest in maintaining the status quo, to execute the vision efficiently?  Because, let’s face it, radical change in the existing IT organization along the lines of reducing costs and improving efficiency is going to result in potentially dramatically reduced head count one way or another.  By that I don’t just mean offshoring IT, but literally a permanent reduction in the number of IT staff across the board (on, off or near shore).

So, the questions one needs to pose when evaluating Enterprise 2.0 (or any other technology or movement, and let’s face it, Enterprise 2.0 is more movement than tech) are as follows:

1) What is the compelling reason to act, or significant problem, for the target users (in this case the enterprise itself)?
2) How is the proposed technology or solution going to solve that problem better than the alternatives?

Now I don’t think there is much disagreement around #1 in general.  I think the Enterprise 2.0 movement and I see eye to eye here on the problem and the compelling reason to act (which we defined at the top of the article re cost, efficiency and satisfaction).

Where I have a real issue is with #2.

I am going to boil this down to my belief that the suggestions of the Enterprise 2.0 movement are in general naive when it comes to how to apply both technology and technique to large enterprises.  Let’s look at some examples:

1) I had a customer who had 500 applications running in part of their business.  The systems had a lot of overlapping functionality and it took 200 people full time just manually fixing errors in data replication from the nightly ETL processes.  I cannot see how cloud, mashups, social networking technology and the like would have addressed this issue.
2) Recently I was doing a design for a client and we explained to them that the only way to hit their target go live was to use Agile methods and cut functionality to deliver a minimum viable product.  The client nodded and agreed.  Then, after we spent time cutting down the 150+ use cases to the bare minimum, the client’s Business Analysts committee went through every line item we cut and demanded it back because they were apparently all super critical in their view (not one item was allowed to be cut).
3) A while ago we did a complete SOA design for a client.  The business was excited, the budget was approved, etc. and we were ready to start executing on the build.  Then IT middle management got involved.  Their people decided that if they supported this project then some of their existing pet projects would be shut down and the improvement in efficiency would mean job losses in IT.  They didn’t outright rebel, they just passively resisted, stalled, etc. until the business gave up and the project died.  As a result it never got built, the bank ended up losing Billions when the credit markets collapsed, and those same IT people got fired anyhow.  This system was meant to manage risk more effectively in the bank’s credit portfolio, which the business knew in 2004 was an issue, and it would have been online in 2006.  We actually proposed to run this on a cloud environment at that time and the IT people refused to allow it even though the business wanted it.

These are but a few examples I have seen just in the past few years.  I could go on all day with items like this.  The bottom line is this: none of this would be helped by Social Networking, Mashups, Cloud Computing, Enterprise Search, etc., and these are the kinds of problems big enterprise IT shops face every day.  When we talk Enterprise 2.0 I can’t help feeling all the discussion is really aimed at the SMB sector and not the Fortune 500 types, and that Enterprise 2.0 is focused solely on the GUI side of the applications, which frankly is almost never the problem.

From my perspective, I think the Enterprise 2.0 crowd needs to come down to earth and get a large dose of reality.  The world of Big Enterprise IT is not the same as a tech startup in the valley.  Not every application is about Web and related tools, collaboration, mashups, etc.  The apps where that stuff applies are frankly trivial, and if that was the state of the world app complexity wise we wouldn’t have the issues we have and we wouldn’t even be talking about Enterprise 2.0.  The reality is real Enterprises have issues with Organizational Structure, and that same structure fights change.  I can’t tell you how many times I have seen attempts to redesign IT Orgs go down in flames or the result be just as bad as where they started.  They have issues with tons of legacy apps that continue to need to be supported, integrated, updated, etc.  Think Y2K, people.  Those Cobol apps are still going strong (as much as the thought of that gives me a rash) and they cannot support mashups or social computing, or be run in a cloud.  How do you deal with putting Paul Michaud’s contact information into 500-1000 applications which are scattered around the firm globally, no two of which store an address or a middle name the same way?  These are boring, mundane problems, but they are the real issues that keep CIOs awake at night, not whether their employees can change the color of the GUI background on the latest app, have better internal chat facilities, or Tweet from their desk.

Now don’t get me wrong, I am the biggest proponent of wholesale change being needed in both the technology and organization within today’s Enterprises.  I am a huge fan of new RAD tools, SOA, SaaS, Cloud, etc.  My point is simply that we need to wake up and not be so naive about Enterprise 2.0.  Changing big enterprises with hundreds of thousands of employees is not trivial, and just having lots of conferences about it, articles, etc. and even getting CIOs excited about it will not get it done.  It’s the rare CIO who can drive through the kinds of wholesale changes needed to make an Enterprise truly Enterprise 2.0.  Remember, it is not sufficient to just use E 2.0 approaches and technologies for a single departmental application; we need to do it across the board.  And remember, this is like the old saying: we need to change the tires on this formula 1 race car while it continues going around the track.  We can’t shut the place down and start from scratch.  Managing the change process is the biggest challenge these efforts face.

Please feel free to comment either here on the blog or reach out to me on Twitter at @techmusings

The Need For Speed

Posted on : 26-08-2009 | By : Paul Michaud | In : High Performance Computing

Yesterday was a busy day for me.  It started at 4:30 AM when I had to do an interview with a reporter from Bloomberg who covers the European Stock Exchanges.  There was then coverage of the goings on with some of my clients in the Wall Street Journal, Financial Times, and many other papers, which kept me fielding questions most of the day (and I have more reporter interviews today as well).  This interest in the exchanges spilled over onto Twitter as a result of people commenting on the coverage on CNBC.  As a result of that, I thought I would talk a bit about the trends and challenges faced by the World’s Investment Banks, Hedge Funds and Exchanges as they grapple with their Need For Speed, as there seemed to be interest from many of my followers both here and on Twitter.

So what’s driving the system designs at today’s Financial Market firms?  Well, amongst other factors such as cost reduction, operational efficiency, and the other usual IT issues, there is an ever increasing need for speed.  Let me give you some background.

  • If you’re Twitter, you handle about 6 million Tweets per day and peak out at about 200 or so Tweets per second based on what info I have seen.
  • If you were a Stock Exchange 10 years ago, you did a peak of a few thousand transactions per second
  • A big credit card processor handles a few tens of thousands of transactions per second at peak
  • Today’s Stock exchanges are building systems capable of handling millions of transactions and quotes per second

Not only are the throughput requirements exploding, but the tolerated response time, or latency, is approaching zero at an alarming rate.  Again for comparison:

  • A Telecom system handles a few hundred thousand calls per second but can take a few seconds until the first ring without people complaining
  • Twitter, which is considered “real time” also takes a few seconds to acknowledge and publish a Tweet (at best)
  • A credit card processor also can be a few seconds to respond
  • 5 Years ago the NYSE would take about 40 milliseconds to process a trade
  • Today cutting edge exchanges are building systems which can process a trade in under 100 microseconds (yes that’s micro, not milli and definitely not seconds).

On the flip side, Banks and Hedge Funds with their algorithmic trading systems are sending trades into these exchanges at volumes which are increasing at over 50% per year for stocks and at well over 100% per year for options, so these systems need to scale.  In addition, those same firms are monitoring the response times of each exchange, and if they see one being slower than another, the trade gets routed to the faster exchange wherever possible.  The drive toward zero latency is causing a lot of traditional exchanges with legacy systems, such as the New York Stock Exchange, London Stock Exchange and Deutsche Boerse (three of the biggest), to lose market share, and as a result they are all undergoing radical redesigns and deploying new cutting edge systems.

In addition to all of this speed, keep in mind that the reliability levels on these systems are ultra high as well.  We design for 99.9999% uptime with absolutely zero loss of messages.  It’s that need for high reliability levels coupled with the speed that makes building these systems such a challenge.  Making things fast without being reliable is easy.  Making things reliable without being fast is also easy.  Bringing them both together is very difficult and requires radical new system designs.
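
To put an uptime target like that in perspective, here is a quick back-of-the-envelope sketch in Python.  The function name is just something I made up for illustration, and it assumes a 365 day year with the system expected to be up around the clock (an exchange that only counts trading hours would get an even smaller budget).

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: non-leap year, 24x7 operation


def allowed_downtime_seconds(uptime_percent: float) -> float:
    """Seconds per year a system may be down and still meet the uptime target."""
    return SECONDS_PER_YEAR * (1 - uptime_percent / 100)


# Compare a few common targets, ending with the six nines mentioned above.
for target in (99.9, 99.99, 99.999, 99.9999):
    seconds = allowed_downtime_seconds(target)
    print(f"{target}% uptime allows about {seconds:,.1f} seconds of downtime per year")
```

Six nines works out to roughly half a minute of total downtime per year, which is why a component failure cannot be allowed to turn into a system outage.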

Just think of some of the challenges you face.

  • How do you log transactions to a database when a physical hard drive takes 2 milliseconds to do a write?
  • You can’t do database transactions in the critical transaction path because any database operations kill you for throughput and latency
  • You need to be highly horizontally scalable to handle the constant growth in transaction volume
  • You need to be running redundant hot/hot configurations for failover because the system reliability target is higher than that of any single component.  Components will fail, but the system must stay up and not miss a beat
  • After 9/11 the Disaster Recovery mechanism has to keep the system up and running even during and after a 9/11 type event with no loss of messages or down time
  • How do you record all the trades?  A trade record is typically pretty small, between 120 and 400 bytes, not much bigger than a Tweet on Twitter.  People always seem amazed that Twitter needs to store a few tens of GB per day.  Well, with a modern exchange system we log about 1.2 TB per day and we need to keep 5 years of that searchable on main storage without going to tape, etc.
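
The storage numbers in that last point are easy to sanity check.  The sketch below is only arithmetic on the figures quoted above; the helper names are mine, the TB is assumed to be a decimal terabyte, and the log is treated as if it were made up entirely of records in the 120 to 400 byte range, which is a simplification.

```python
TB = 10**12  # assumption: decimal terabytes


def records_per_day(log_bytes_per_day: int, record_bytes: int) -> int:
    """How many fixed-size records fit in one day's log."""
    return log_bytes_per_day // record_bytes


def retained_bytes(log_bytes_per_day: int, days_per_year: int, years: int) -> int:
    """Total bytes that must stay searchable for the retention period."""
    return log_bytes_per_day * days_per_year * years


daily_log = 12 * TB // 10  # 1.2 TB per day, the figure quoted above

print("records/day at 400 bytes:", records_per_day(daily_log, 400))
print("records/day at 120 bytes:", records_per_day(daily_log, 120))
# 365 assumes logging every calendar day; 252 is a rough trading-day count.
print("5 years, 365 days/yr (TB):", retained_bytes(daily_log, 365, 5) / TB)
print("5 years, 252 days/yr (TB):", retained_bytes(daily_log, 252, 5) / TB)
```

Either way you count the days, that is billions of records a day and a retention requirement measured in petabytes.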

In these systems every microsecond counts.  As such, even network path length is measurable and directly impacts trading profits for banks and hedge funds.  As a result, firms will move their trading systems directly into colocation facilities offered by the exchanges so as to have the absolute minimum latency from the network itself.  Networks in these facilities are going from 1GigE networks to 40 Gig InfiniBand and/or 10GigE (which is slower than IB for both throughput and latency).  The NYSE is putting in 100GB Fibre Optic Switches for their WAN.  They use ultra high speed messaging software on top of that network, solid state disks with SAN backups, and the fastest processors with the highest I/O.  Every line of code needs to be extra tight, with Matching Engines in the exchanges typically being single threaded and using an MPP design instead of SMP, because even the cost of a thread context switch is measurable and impacts profits.

It’s a very tough set of criteria to meet and poses some unique challenges.  I’m proud to say that currently the fastest of the new systems coming to market are ones I helped design, and they use technology I helped IBM develop, so I guess we’re doing something right.
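
To see why path length matters at this scale, here is a rough sketch of propagation delay in optical fiber.  The refractive index of 1.47 is my assumption for a typical fiber, the function name is made up, and the calculation ignores switches, serialization and everything else that adds latency on top.

```python
C_VACUUM_M_PER_S = 299_792_458   # speed of light in vacuum
FIBER_REFRACTIVE_INDEX = 1.47    # assumption: typical value for optical fiber


def one_way_delay_us(distance_km: float) -> float:
    """Propagation delay in microseconds over the given length of fiber."""
    speed_m_per_s = C_VACUUM_M_PER_S / FIBER_REFRACTIVE_INDEX
    return distance_km * 1000 / speed_m_per_s * 1e6


for km in (0.1, 1, 20):
    print(f"{km} km of fiber is about {one_way_delay_us(km):.2f} microseconds one way")
```

That is roughly 5 microseconds per kilometre, so a trading system sitting 20 km from the exchange spends about as long on the wire in one direction as the whole sub-100 microsecond trade processing budget mentioned earlier.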

If people have any questions on this or any of my other posts, please use the comments or reach me on Twitter at twitter.com/techmusings (@techmusings)

“At Age 35, Mozart Was Dead”

Posted on : 24-08-2009 | By : Paul Michaud | In : Technology Startups

The title of this blog is a quote from Mike Moritz of Sequoia Capital during a Fireside Chat with Guy Kawasaki of Garage Technologies, Paul Graham of YCombinator and Mike at the Revenue Bootcamp held last July.  If you’re an entrepreneur and haven’t watched the video for this, I encourage you to watch it, as well as some of the other sessions from the Bootcamp.  They are available online here.  I had the opportunity this weekend to watch a number of them and they were quite enlightening, and this probably won’t be the last blog article I write that was spurred by them.

Anyhow the discussion was mostly around how Paul, Mike and the VC industry as a whole view the investment process for startups, etc and there was some great insight and advice provided by Mike, Paul and Guy over the course of the session, which I personally found very interesting.  About two thirds of the way into the session there was a question about the profile of a typical entrepreneur they were funding and they were asked specifically if it was mostly young grads fresh out of college.  Paul answered first, and said that while this was more common, they fund the full age spectrum.

When it came to Mike to answer, he jokingly said “By age 35, Mozart was dead.”  This of course offended most of the older entrepreneurs in the room, and has resulted in this quote being circulated all over Twitter and other social media where startups are discussed.  Now in Mike’s defense, he then went on to say, much as Paul had, that while the young entrepreneur under 35 was more common today, they have funded firms spanning the spectrum, including people who were grandparents.  I also personally have no doubt that someone of Mike’s reputation and ability would never let a few grey hairs stop him from investing in a good idea.

Nevertheless, the comment struck a particular chord with me.  Not just because, at age 40 and with 25 years of commercial systems development under my belt, I fall into the Dead bucket, but for two other reasons:

  1. This comment is a complete 180 from the advice and belief of the Venture Capitalists 20 years ago when I launched my first company
  2. Because I think the comment, although it was in jest, reflects a subtle truth and a bias in the current startup community, which I am not sure I agree with.

So at this point you’re probably thinking to yourself, why am I reading this?  Paul’s a technology geek, not a startup expert or VC, etc.  He doesn’t know anything about this stuff.  Well, you might be surprised.  In fact, people always think of me as a highly technical person, and I guess I am at that, but to be honest I always tell people I am a business person first who just happens to know a hell of a lot about technology.

To understand why this comment struck me the way it did, it is necessary to provide you with a rare but mind numbingly dull view into the early career of Paul Michaud.  In my life I have started and run one profitable technology company and 2 web sites, and pitched for venture capital for 3 completely different businesses (failed the first, declined the capital the second and missed the window the third time in 2002, but did get job offers to work for some portfolio companies).  I have also been on the other side of things in my career in Investment Banking.  I have advised on M&As, private equity investments, and general stock acquisitions of technology companies for hedge funds, corporate raiders, Wall Street Investment Banks and large asset managers.

Why I Think VC Attitude Has Done A 180
If this was a movie, we would now fade into the memory shot going way back…back…back…

Like most entrepreneurs, I was what you would call a highly driven individual even at an early age.  By age 11 I had been invited by the Montreal Canadiens to come and play hockey in front of some scouts in the Montreal Forum.  By 14, I was quarterback of the football team, captain of the track team and ranked 7th in the country in a national mathematics competition (the combination of Jock and Nerd made high school, shall we say, interesting).  I was also a member of a small Aeronautical Design Team building a prototype amphibious push prop airplane, for which I wrote code on DEC PDP-11s to simulate drag envelopes, stress on spars, etc.  By 15, I had been asked by a Canadian Investment Bank to write programs to try and automate what some of their investment managers did manually for trading S&P 500 Equities and Equity Options.  I continued to write these types of programs for the next 12 years.  At 16, I won first place in all five categories I was entered in at the McMaster University Science and Engineering Competition and was the youngest member of my starting Engineering Class by two years.  By age 18, I had dropped out to pursue my first Internet Startup.

So if today’s startups are Web 2.0 companies and the late 90s was Web 1.0, I guess we could call my first company a Web 0.1 Alpha company, except that when I started it in late 1988, the term World Wide Web and Marc Andreessen’s Netscape were still more than 5 years away.  Anyhow, the company was an information brokerage which specialized in providing access to electronic information to investors, banks and senior executives for a wide array of uses.  For those who have not heard the term Information Brokerage (which is probably just about everyone), think of a cross between a Yahoo-like portal and Google with a lot of manual process in the middle.  Even in those days, we had access to 3500 electronic sources of business related data of all kinds, from news to financials, legal, etc.  The data was very expensive and was hard to find and access.  In 1989, I worked with Charles Schwab to design some of the first information packages ever provided to investors by discount brokerages and learned my first hard business lesson from them about how to get completely screwed if you don’t cover your legal ass.  By late 1989, we had gotten some traction and started to pursue funding so that we could accelerate growth.

This is where I am finally getting to why Mike’s comment struck me as a 180 from what VCs were saying 20 years ago.  In 1989 I designed software to allow us to automate searching those 3500 computer systems, so that we could remove the manual intervention in the Information Brokerage business, thus allowing us to cut cost and increase volume.  Now remember, this is still 5 years before Netscape and well before Google or Yahoo were a twinkle in anyone’s eye.  In addition, we had offers of multi-year contracts on the table from various firms including the predecessor to TD Ameritrade.  Armed with this, we went looking for funding … and failed.  Now there were a couple of reasons for that failure, including:

  1. There was a recession going on in 1990-1992 so timing sucked
  2. We were a new idea that people didn’t really understand and couldn’t evaluate and we were about 5-6 years too early, so timing sucked again

But more importantly, we got told time and time again by VCs and bankers to go hire ourselves a 55 year old front man, because no one was ever going to fund a couple of 21 year old kids.

So now, here we are 20 years later and Mike’s comment (and the general view of the industry) would imply that a 55 year old should go get himself a couple of 21 year olds to front him.  It just struck me as ironic how the world of the startup has changed so dramatically over that time.

Why Mike’s Comment Makes Me Worry About The State Of The Industry

So, the irony aside, Mike’s comment also got me thinking about the state of the startup industry, the perceptions of the average VC and what Mike really meant.  Let’s start with the latter.

What I think Mike really meant by his comment is analogous to a comment I once had from an early mentor (the person who ran that Aeronautical Design Team).  He told me once that if I was ever going to accomplish something great in my life, that if I hadn’t accomplished it by age 25, I likely never would.  I almost made it.

The reality is that at a young age you are usually more driven, more creative, and have new fresh ideas compared to the average Dead person (over 35).  That said, this is a general, statistical statement.  It is not a hard and fast truth.  The reality is that while there is a higher percentage of driven, creative, entrepreneurially inclined 25 year olds, there are also many who are just as driven, creative and entrepreneurial in the Dead Zone, just not as many as a percentage.  In addition, we Dead people are more hampered by family, commitments and life in general and are often unable to take the plunge into a startup.

I think that is what Mike meant.  Not that a talented person with the right idea, drive, creativity, etc. in the over 35 bracket couldn’t get funded, but more that it is unlikely that someone in that bracket is going to pitch a great idea (or any idea for that matter) to a VC in the first place.  It does happen, but I suspect it’s rarer.

What concerns me more is the fact that I think Mike’s comment does represent a general bias, that young entrepreneurs are more likely to succeed than others with more life experience, and I am not sure I agree with that.  When I look back at that 18 year old Paul Michaud, what we had then and where I could have taken it, if I had known then what I know now (god I do sound like an old fart don’t I), I truly believe the story would have been very different.  For one, I would have changed the business model we were using at the time, bootstrapped the search engine and continued to organically grow the business, instead of getting discouraged and shutting down (we were actually making money but shut down anyhow because we couldn’t grow it the way we wanted to without the capital).  Imagine it: we would have been there before Google or Yahoo, and we were already profitable by 1990.

I guess if I look at myself as representative of those in the Dead Pool, with that entrepreneurial spark (and maybe I’m not typical), I would argue the following:

  • If you had the spark at 25, you likely have just as much entrepreneurial passion today as you did years ago; circumstance just might not allow you to act on it
  • The experiences, both successes and more so the failures, have likely made the older entrepreneur a wiser founder than the 25 year old.  Maybe a little more cautious, maybe not, but almost always wiser and more street smart.

I could be wrong, but if I was the VC, given two entrepreneurs with equal passion, drive, creativity, technical skills and an equally good idea,  I would fund the older entrepreneur almost every time.

I look forward to your comments.

How To Build an SOA Based, High Performance, Scalable and Reliable Twitter on Steroids

Posted on : 20-08-2009 | By : Paul Michaud | In : High Availability (HA), Service Oriented Architecture (SOA), Software Design

Over the past few days I have been having some issues with my Twitter account.  Beyond the well known pauses in the service, outages, etc. there are some less known but more annoying problems with Twitter search.  It turns out that many accounts don’t show up in search at all.  Therefore, if you are one of those lucky accounts, no one other than direct followers can see your Tweets and no one can find you or any of your Tweets.  This makes the accounts pretty useless.  It also turns out it’s been a known issue for over a year with no fix other than to create a new account and tweet with that.  Well, it turns out that my account was one such account, which needless to say was very annoying and cost me 2 days of my time trying to figure out a viable workaround.  As a result, Twitter earned the place of honor in today’s blog.

Now in the defense of Evan Williams, Biz Stone and the rest of the gang at Twitter, they find themselves in the enviable position of having a hugely successful product on their hands which has no doubt outpaced their wildest growth projections over the past few years and thus put stress on their design and everything else.  I on the other hand have the advantage of 20/20 hindsight, and thus in this blog we can design Twitter on Steroids from scratch using technology that was not even available when Twitter was conceived.  I know the team at Twitter is busting their butts to keep up with their phenomenal growth, and my hat’s off to them for their success.

So for those who have not read my Bio, I have been designing and building ultra high performance systems for the World’s Largest Banks and Stock Exchanges for about 25 years.  Just this June a couple of colleagues of mine and I designed and ran a stock exchange prototype system capable of 4.5 million transactions per second with round trip response time as low as 15 microseconds (yes that’s microseconds for multiple network hops, I/O, parsing, matching and the whole shebang, everything the NYSE does to tell you that you just bought 100 shares of IBM).  We also showed this system can scale linearly for throughput by adding hardware, was fully fault tolerant and could do dynamic load balancing if traffic at the exchange spiked.  In this design, I will be leveraging the lessons learned over those 25 years and the technologies used for the system above.

So let’s dive in.

Requirements:
So what does our Twitter on Steroids need to do?  Here is my overly simplistic list of requirements (I am only going to deal with the big ones):

Functional Requirements:
The system shall allow users to create accounts
The system must provide a means for users to submit Tweets
The system must persist those Tweets
Users shall be able to follow other users’ Tweets
The system shall provide a mechanism to search Tweets

Non-Functional Requirements:
The system shall be highly responsive
The system shall maintain response times even under load
The system shall be highly scalable
The system shall be highly available with 99.999% or better uptime (it’s doable)

Where Do We Start?

First some design principles:

  • We will use a componentized SOA design
  • The Twitter Web Site will use the same Service API that is exposed publicly
  • The System will use a Hot/Hot High Availability Model based on component replication for reliability
  • All Service Components will be implemented in a manner that ensures deterministic behaviour (the easiest way to do that, but not the only way, is to make them single threaded, which is what we do for most exchange systems.  Thread context switches are expensive at speed, and multithreading can result in coherency issues, which Twitter seems to be suffering from based on the comments on their support site)
  • To the maximum extent possible all I/O, remote Service calls, etc will be asynchronous
  • All internal communication will be message based using multicast for efficiency
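
To make the deterministic, single threaded, message based idea a little more concrete, here is a minimal sketch in plain Python.  Everything in it is hypothetical and invented for illustration (the `Message` and `TweetComponent` classes, the `multicast` helper and the sample data); it is not the API of any messaging product, and a real system would get its ordering guarantees from the messaging layer rather than a Python loop.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Message:
    seq: int   # position in the total order, assumed to be assigned upstream
    user: str
    text: str


@dataclass
class TweetComponent:
    """Single threaded component whose state depends only on message order."""
    name: str
    inbox: deque = field(default_factory=deque)
    next_seq: int = 1
    timeline: dict = field(default_factory=dict)  # user -> list of tweet texts

    def deliver(self, msg: Message) -> None:
        # Asynchronous hand-off: the sender enqueues and never waits for processing.
        self.inbox.append(msg)

    def run_once(self) -> None:
        # Drain the inbox on one thread, applying messages strictly in sequence.
        while self.inbox:
            msg = self.inbox.popleft()
            if msg.seq != self.next_seq:
                raise RuntimeError(f"{self.name}: gap or duplicate at seq {msg.seq}")
            self.timeline.setdefault(msg.user, []).append(msg.text)
            self.next_seq += 1


def multicast(msg: Message, receivers: list) -> None:
    """Stand-in for multicast: every replica sees every message in the same order."""
    for receiver in receivers:
        receiver.deliver(msg)


primary = TweetComponent("primary")
secondary = TweetComponent("secondary")
stream = [
    Message(1, "alice", "hello"),
    Message(2, "bob", "hi"),
    Message(3, "alice", "again"),
]
for message in stream:
    multicast(message, [primary, secondary])

primary.run_once()
secondary.run_once()
print("replicas agree:", primary.timeline == secondary.timeline)
```

Because each replica applies the same messages in the same order with no threads racing each other, the primary and the secondary end up with identical state, which is the property a Hot/Hot model relies on when one of them fails.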

About the Technology
I don’t normally like to reference specific technologies in my blog but in this case I am going to as there are a couple which provide unique capabilities to implement this system design, and which people are probably not familiar with.  Apologies in advance for the product plug.  They are as follows:

Websphere MQ Low Latency Messaging (LLM):
LLM is a high performance messaging product that has some purpose built capabilities specifically designed for ultra high throughput, low latency, transactional systems.  For one, it’s the fastest messaging available on the market, capable of throughput in excess of 9 million Tweets per second per connection, and latency application to application across a switch as low as 3 microseconds with InfiniBand networking and about 12 microseconds with 10GbE.
More important than its speed for this type of application, though, are its high availability mechanisms.  LLM provides a mechanism that allows me to deliver messages to a primary and secondary Service Component at speed, while maintaining total order across all receivers.  In addition, it provides mechanisms to perform failure detection, failover, state synchronization and component replication, all at speed.  In exchange systems, LLM has detected and failed over from a primary system to a backup in as little as 7 milliseconds, with no loss of messages or duplication and no system level down time even though a component failed.

DataPower XM70:
This is an appliance that was originally designed for Web Service and Web Edge Security.  This model is specifically enabled to work with LLM above.  It will allow us to expose REST or SOAP based services and convert them to message based for internal consumption.  The XM70 can also do content based routing, parsing and transformation for us on the fly at wire speed, taking load off the back end Service Components.
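
As a rough picture of what "convert them to message based" and "content based routing" mean, here is a small Python sketch.  The request paths, field names and destination names are all invented for the example, and the appliance itself is configured rather than programmed like this.

```python
import json

# Hypothetical mapping from message kind to the internal service that handles it.
ROUTES = {"tweet": "tweet-capture", "search": "search-service"}


def to_message(method, path, body):
    """Turn an HTTP-style request into a flat internal message (a dict)."""
    payload = json.loads(body) if body else {}
    if method == "POST" and path == "/statuses/update":
        kind = "tweet"
    elif method == "GET" and path == "/search":
        kind = "search"
    else:
        raise ValueError(f"unsupported request: {method} {path}")
    return {"kind": kind, **payload}


def route(message):
    """Pick a destination from the message content alone."""
    return ROUTES[message["kind"]]
```

Once the request is a plain message, the back end never needs to know whether it arrived as REST, SOAP or from the Web Servers.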

XIV Storage:
This is a low cost storage appliance that has great throughput and reliability.  I have been able to sustain write speeds with this in excess of 5.5 Gb per second per Intel box writing to it.

For the rest we can use pretty commodity stuff.  The disk above can also be easily swapped for your preferred flavour; this one just has great price performance.

What Does Twitter on Steroids Look Like?

My version of Twitter on Steroids would look like this (except I didn’t have room on the drawing to add the Account Management Service Components or the Follower Service Components, so just imagine they are in the diagram and follow the same pattern :D ):

Twitter on Steroids

So let’s walk through this diagram.

  1. Firstly, we are using BIG-IP to load balance across the Web Servers and also across the DataPower appliances.  This is pretty standard Web design, no surprises.  The BIG-IP could also load balance to a remote backup site if configured correctly, where we could twin this setup for failover or load balancing.  Or we could put the Instance 2s in the second site.  It just depends on the SLAs you are trying to meet.  The logical design and coding would not change regardless.
  2. The Web Servers are making calls through the DataPower to the back end Service Components just like the external API calls.  This ensures consistent behaviour and removes the need to test and maintain two APIs.
  3. DataPower is converting all REST and SOAP payloads into messages on top of LLM.
  4. This is important.  DataPower is multicasting all messages out of the appliance using LLM’s high availability mechanisms.  It is also putting those messages on different topics based on the content of the message.  I am suggesting partitioning the incoming Tweets based on the first few letters of the Tweeter’s ID.  The first 2 letters will do to start, giving us 676 topics (26 x 26) to work with for load balancing.  We can add more topics for finer partitioning later if need be.
  5. LLM is delivering the messages throughout the system and also providing all the reliability.  It handles NAKs, ACKs, retransmissions, etc. automatically to ensure messages get where they need to be without any additional work by the application.
  6. Tweets are first picked up by the Tweet Capture Service Components.  Each partition subscribes to and handles a subset of the topics in order to provide load balancing.  It is possible to add an external system which monitors load per topic and dynamically changes the subscriptions to adjust load.  Also by partitioning, we can use multiple databases in parallel thus eliminating the databases as a bottleneck, throughput wise.
  7. I/O in the Tweet Capture Service Components is asynchronous, providing very fast response times.  We can batch write the tweets for higher throughput, and because we do component replication using LLM, if the primary Instance 1 fails, Instance 2 just takes over where it left off with no loss of messages or duplication.
  8. The Tweet Capture rebroadcasts (multicasts) all messages to the Tweet Indexing Service Components.  These are also twinned for High Availability and partitioned for scalability.  The indexing component does as the name says: it indexes the tweets and stores a record in a database.  I would recommend an in-memory database be used with a traditional database behind it, with bi-directional synchronization of current data between the two.  SolidDB/DB2 is one pair, and TimesTen/Oracle is possibly another (but the latter pair is slower).  I/O would be batched and asynchronous again for speed.
  9. When a search request comes in, it would be routed by DataPower to the Search Service Components, which would then query the Indexing Service and receive back the matching records for each keyword in the search.  A fast parallel algorithm would then be used to handle any “or” or “and” statements in the search.
  10. These results would be returned to the caller via the DataPower box as a response to the original service call.
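
The partitioning, indexing and search steps in the walk-through can be sketched as a toy in-process pipeline in Python.  Everything here is an illustrative assumption: the class and function names are invented, the round-robin topic assignment is just one possible static subscription scheme, the index is a single unpartitioned dictionary, and IDs are assumed to start with two letters (IDs starting with digits or underscores would need extra topics beyond the 676).

```python
import string
from collections import defaultdict

LETTERS = string.ascii_lowercase
TOPICS = [a + b for a in LETTERS for b in LETTERS]   # "aa" .. "zz", 676 topics
TOPIC_SET = set(TOPICS)


def topic_for(user_id):
    """Topic is the first two letters of the Tweeter's ID (step 4)."""
    prefix = user_id[:2].lower()
    if prefix not in TOPIC_SET:
        raise ValueError(f"no topic for ID {user_id!r} in this sketch")
    return prefix


def assign_topics(num_partitions):
    """Static round-robin subscriptions: topic -> partition number (step 6)."""
    return {t: i % num_partitions for i, t in enumerate(TOPICS)}


class Pipeline:
    def __init__(self, num_partitions=4):
        self.owner = assign_topics(num_partitions)
        self.stores = [[] for _ in range(num_partitions)]  # one "database" per partition
        self.tweets = {}                                   # tweet id -> (user, text)
        self.index = defaultdict(set)                      # keyword -> tweet ids

    def capture(self, user_id, text):
        part = self.owner[topic_for(user_id)]
        tweet_id = len(self.tweets) + 1
        self.tweets[tweet_id] = (user_id, text)
        self.stores[part].append(tweet_id)
        for word in text.lower().split():                  # indexing (step 8)
            self.index[word].add(tweet_id)
        return part, tweet_id

    def search(self, words, mode="and"):
        """Combine per-keyword matches with set intersection or union (step 9)."""
        sets = [self.index.get(w.lower(), set()) for w in words]
        if not sets:
            return []
        combine = set.intersection if mode == "and" else set.union
        return sorted(combine(*sets))
```

Changing the dictionary returned by `assign_topics` is all it would take to move a hot topic to a different partition, which is the hook an external load monitor would use.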

So how fast would this be and how big could it scale?

Well, this is just a guess based on my experience and without ever having looked at any specific search algorithms that might be used by Twitter.  Let’s assume we write everything in C behind the DataPower for speed and stability and that we use 1GbE for networking, which is the slowest at about 27 microseconds per hop.  All latencies are round trip to and from the DataPower box.

  1. I think for Tweet Capture, we could achieve round trip latency per tweet of about 50-60 microseconds with throughput per partition somewhere in the 100,000-200,000 Tweets per second range if using a fast database and some solid state disk for the database log files, etc.  Even higher if a custom binary file system is used (15,000,000+ Tweets per second, a rate already achieved with stock orders of a similar message size).
  2. Similar performance to that of the Capture is possible for the Tweet Indexing per partition.
  3. For Tweet Search it is a bit tougher to gauge, but I would guess it would be about 100-150 microseconds per search depending on the algorithm used.  Throughput should also be well into the hundreds of thousands per second if in-memory databases are used.
  4. Response times could be reduced by as much as 25 microseconds per network hop by using InfiniBand networking instead of 1GbE.
  5. From a scaling perspective, this should be able to scale linearly by adding hardware almost without limit (only limited by the available network bandwidth).
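
The arithmetic behind these guesses is simple enough to write down.  In the Python sketch below, the per-hop figures come from the estimates above, while the hop count of 2, the processing time and the partition count are assumptions picked only to show the calculation.

```python
HOP_1GBE_US = 27            # per-hop latency on 1GbE, from the estimate above
INFINIBAND_SAVING_US = 25   # per-hop saving with InfiniBand, from item 4


def round_trip_us(hops, processing_us, hop_us=HOP_1GBE_US):
    """Round trip latency = network hops plus time spent in the component."""
    return hops * hop_us + processing_us


def total_throughput(partitions, per_partition_tps):
    """Ideal linear scaling: every partition adds its full throughput."""
    return partitions * per_partition_tps


# Assumed: one hop out and one hop back, 3 microseconds of processing.
capture_1gbe = round_trip_us(hops=2, processing_us=3)
capture_ib = round_trip_us(2, 3, hop_us=HOP_1GBE_US - INFINIBAND_SAVING_US)

# Assumed: 10 partitions at the low end of the per-partition estimate.
capacity = total_throughput(partitions=10, per_partition_tps=100_000)
```

With those assumptions the capture round trip comes out at 57 microseconds on 1GbE and 7 microseconds on InfiniBand, and ten partitions give 1,000,000 Tweets per second, provided the network has the bandwidth to carry it.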

Now clearly, this is a simplified case and I am sure there are lots of design details we are missing, but I think you get the idea.  A bigger, badder Twitter (or any other app for that matter) is definitely possible, and by using the SOA pattern, async I/O, component replication, etc. we can do this to almost anything.  So if anyone from Twitter (or anyone else for that matter) wants to talk specifics or other examples, feel free to leave a comment or reach me on Twitter (@techmusings) any time.

Sorry for picking on Twitter; they just seemed like a good example given my struggles.  We all wish we had the “problems” that come with such a huge success.
