Consideration For The Technical Implementation of an SOA
Posted on : 06-09-2009 | By : Paul Michaud | In : Service Oriented Architecture (SOA), Software Design
I had a lot of questions from people after my last post on BPM and SOA about the layered SOA I proposed and whether it would be slow performance-wise. The answer I gave people was “It depends”. In this post I will outline in more detail some of the performance considerations that apply when implementing an SOA, or any system for that matter.
Firstly, I find that over the last 10-15 years system architects and programmers often have not considered the performance needs of their system enough when designing it. The process most people follow is typically:
- Code the system up (or at least a prototype)
- Run a small performance benchmark
- Size the hardware to get the desired performance
To be honest this drives me crazy, as it often yields a very non-performant system, and if the performance cannot be achieved with hardware at the third step then it's too late in most cases to do anything about it without going back and severely refactoring the code or, worse yet, rewriting it altogether.
Years ago this was not the case, because the hardware was too slow to assume you could easily find a big enough box to ensure you achieved your performance goals. As a result, programmers before about 1993 thought very carefully about how they designed and implemented their code, with performance front of mind before even a single function got coded. But once the bigger SMP UNIX boxes started coming out, programmers got sloppy, to the point where today most of the younger programmers I come across (those who started commercial programming after, say, 1997) have not been taught how to evaluate the effect of their technology or programming decisions on performance.
In the old days programmers were taught, when implementing any function or procedure, to consider the Order “O” of their algorithm. We worried about whether it was Order N, N Log N, N^2 or, God forbid, N^3 or worse. If we could replace an O(N^2) algorithm with one that was O(N Log N), we did. The same mindset should be employed today when writing code, but more importantly it should be employed at a higher level when deciding the language, interface types, message formats, I/O pattern, network topology and database design.
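As a small illustration of that kind of replacement, here is a sketch in Python. The problem (does any pair of values add up to a target?) and the function names are my own invention for this example, not something from a real system. The first version compares every pair and is O(N^2); the second sorts once and then walks inward from both ends, which is O(N Log N) overall.

```python
def has_pair_with_sum_quadratic(values, target):
    """O(N^2): compare every pair of values."""
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):
            if values[i] + values[j] == target:
                return True
    return False


def has_pair_with_sum_sorted(values, target):
    """O(N log N): sort once, then a single two-pointer pass."""
    ordered = sorted(values)           # the sort dominates: O(N log N)
    lo, hi = 0, len(ordered) - 1
    while lo < hi:                     # the scan itself is only O(N)
        pair_sum = ordered[lo] + ordered[hi]
        if pair_sum == target:
            return True
        if pair_sum < target:
            lo += 1                    # need a larger sum
        else:
            hi -= 1                    # need a smaller sum
    return False
```

Both functions give the same answers. The difference only shows up as N grows, which is exactly why it tends to go unnoticed in a small benchmark.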
Let me explain:
Consider our Layered SOA from the last post (shown again here for reference):
This is in principle a logical architecture which we can choose to realize in a number of ways and with a range of languages, interface protocols, hardware choices, network topologies, etc.
So the first thing you need to consider at a high level is whether you are designing the system for throughput, latency/response time or both. This is important because, depending on what the system needs to achieve, you may be making dramatically different technology choices. For example, if I am building an ultra high performance stock exchange system that does 5 million transactions per second and has an end to end response time target of 100 microseconds, then certain technologies are out right off the bat. They include but are not limited to:
- I can’t use Java or C#, as the languages themselves and the implementations of the base language libraries do not lend themselves to low latency. I could use them to achieve the high throughput, but I will not get to 100 microseconds as a rule (there are some ways to get close by using Java like C, pre-compiling, ensuring no garbage collection is used, etc., but it’s a pain)
- I can’t build it on Windows at all, because of the overhead and the slow network stack
- I can’t use persistent queue based messaging
- No traditional database transactions in the critical path (in fact no on disk I/O at all)
- No XML anywhere
- And definitely no REST or SOAP Web Service Interfaces
The key is that while all of these technologies can be used in a high throughput system, they are inherently not fast from a latency or response time perspective, so if you choose to use them for a system that needs ultra fast response times, you are dead from the word go because no amount of hardware will fix it.
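To make the throughput versus latency distinction concrete, here is a rough back-of-the-envelope sketch using Little's Law (work in flight = throughput x latency). The function names are mine, and it assumes a steady state in which each worker (thread or core) handles exactly one transaction at a time.

```python
def in_flight_transactions(throughput_per_sec, latency_us):
    """Little's Law: average number of transactions in progress at once."""
    return throughput_per_sec * latency_us / 1_000_000


def per_worker_ceiling(latency_us):
    """Most transactions per second one strictly serial worker can finish."""
    return 1_000_000 / latency_us


# The stock exchange example: 5 million transactions/s at 100 microseconds.
exchange_in_flight = in_flight_transactions(5_000_000, 100)   # 500.0
exchange_per_worker = per_worker_ceiling(100)                  # 10000.0

# The same worker model with a 15 millisecond step in the critical path.
slow_step_per_worker = per_worker_ceiling(15_000)
```

Under these assumptions the exchange needs about 500 transactions in flight at any moment, with each worker completing 10,000 per second. Put a 15 millisecond step in the critical path and each worker drops to well under 100 per second. You can buy more workers to win back the throughput, but the latency of each transaction stays where it is.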
It is important to keep in mind the time it takes for basic operations as well, so that you can roughly gauge in advance what your system will be capable of. Here is a list of rough performance measures for common operations. Your mileage will vary based on specifics such as hardware, compiler, database, etc., but the order of magnitude “O” will be about right regardless. These are approximations based on, say, a 2.8 GHz Xeon. Newer Nehalems, etc. will do better. Also, these are for a single thread on a single core. Most won’t benefit from multithreading. Also keep in mind that, to an extent, the lower the latency or response time of the system, the higher the throughput it can handle per core, as CPUs free up faster, so getting this right is important for all systems. So here they are:
- Network hop from desktop to remote web server 50-500 milliseconds
- Persistent Message (small) per hop 15-30 milliseconds
- Non-Persistent Queue based messaging (small) per hop 2-5 milliseconds
- Database Insert (Complex) 15-30 milliseconds
- Database Insert (Simple) 3-10 milliseconds
- Database Select (Complex) 10+ milliseconds
- Database Select (Simple) 500 microseconds-3 milliseconds
- Binary Write to Traditional Disk 2-5 milliseconds
- Binary write to SSD 25 microseconds
- Screen Refresh 10-15 milliseconds
- Web Service Call inside a Web Server (Small Payload Java or C#) 1-5 milliseconds
- Web Service Call using gSOAP or Systinet (C no server) 200-400 microseconds
- Binary RPC Call (small payload Java or C#) 50-100 microseconds (local machine plus network roundtrip if remote)
- Binary message based function call (Small, C, InfiniBand) 1-2 microseconds in shared memory space, 5-10 microseconds remote through a switch
Anyhow this list is by no means exhaustive. Response times will increase with the size of the message payload, amount of XML to serialize/deserialize, complexity of the database query, etc. What’s important to the system designer is the order of magnitude of the performance given the non-functional targets for the system. Knowing what is possible for a given operation or technology should shape both your design and technology selection.
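One way to put the list above to work before writing any real code is to keep the numbers in a small table and add them up for a proposed call path. The sketch below is only that: the dictionary keys, the function names and the idea of a simple additive budget are my assumptions for illustration, and the figures are a subset of the rough estimates above, in microseconds.

```python
# (best case, worst case) in microseconds, copied from the rough list above.
OP_COST_US = {
    "persistent_message_hop": (15_000, 30_000),
    "nonpersistent_message_hop": (2_000, 5_000),
    "db_insert_simple": (3_000, 10_000),
    "db_select_simple": (500, 3_000),
    "disk_write": (2_000, 5_000),
    "ssd_write": (25, 25),
    "web_service_call": (1_000, 5_000),
    "binary_rpc_call": (50, 100),
    "shared_memory_call": (1, 2),
}


def estimate_us(path):
    """Sum best and worst case costs for operations performed one after another."""
    best = sum(OP_COST_US[op][0] for op in path)
    worst = sum(OP_COST_US[op][1] for op in path)
    return best, worst


def fits_budget(path, budget_us):
    """True only if even the worst case estimate stays within the budget."""
    return estimate_us(path)[1] <= budget_us
```

For example, `estimate_us(["web_service_call", "db_select_simple"])` gives a range of 1,500 to 8,000 microseconds, so that path is already out of the question for a 100 microsecond target, while `["shared_memory_call", "ssd_write"]` comes in at about 27 microseconds worst case. The numbers are crude, but they are good enough to rule a design in or out by order of magnitude.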
So, armed with this, let’s revisit the questions I got about the performance impact of the layered approach versus a non-layered one and see what happens. Let’s assume the following:
- We implemented the physical architecture exactly as the Logical one is laid out with each service component on a separate machine
- We used SOAP web services for every public function call
- Assume 1 Gigabit Ethernet networking in the data center (worst case)
So let’s look at the effects of layering on the Customer Service, which adds an extra layer in the proposed design.
Let’s assume we just want to retrieve a basic customer record. In a single layer design with one database behind the service we have the following main costs:
- Network Hop Application to Customer Service: 200-500 microseconds
- Find Customer Web Service Call: 1-5 milliseconds
- Complex query doing a join across multiple tables for name, addresses, etc.: 10 milliseconds
- Internal Logic Code execution: Implementation and Process dependent
Now let’s look at the layered costs:
- Network Hop Application to Customer Service: 200-500 microseconds
- Find Customer Web Service Call: 1-5 milliseconds
- Parallel Network round trips: 200-500 microseconds
- Second Layer Parallel Web Service Calls: 1-3 milliseconds
- Parallel Simple Queries: 500 microseconds-3 milliseconds
- Internal Logic Code execution: Implementation and Process dependent
And I would contend that you can do better by using a binary call for internal Component to Component calls, reducing the 1-5 milliseconds down to more like 50-100 microseconds.
So if you add it up, you will see that not only did the layered approach potentially improve total end to end latency, but it definitely improved throughput of the system. You should now be scratching your head wondering how that happened. There is at least one major assumption here, and that is that each Service Component had its own independent database. This allowed the queries in the Layered approach to happen in parallel, and the queries are themselves dramatically simpler, with potentially no joins, etc. If we still have one single database, then performance will be bottlenecked there and we won’t see as much gain. Even then, at worst, if the queries take the same 10 milliseconds, we added about 1-3 milliseconds to the total end to end time. Depending on the amount of time spent in the implementation specific code, that’s about 10-20% slower latency worst case (and still less than the cost of a single screen refresh, so a user won’t notice it unless it’s enough to cause the total system to queue up throughput-wise), but in return we got much better throughput and much more reuse.
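If you would rather see the addition than take my word for it, here is a small sketch that totals the figures listed above. The stage labels and the `total_us` helper are just for this example. It assumes the parallel calls overlap completely, so each parallel stage is counted once, and it leaves out the internal logic time since that is implementation dependent. All values are in microseconds.

```python
def total_us(stages):
    """Sum the best and worst case of stages that run one after another."""
    best = sum(low for _, low, _ in stages)
    worst = sum(high for _, _, high in stages)
    return best, worst


SINGLE_LAYER = [
    ("network hop, application to Customer Service", 200, 500),
    ("Find Customer web service call", 1_000, 5_000),
    ("complex join query", 10_000, 10_000),
]

LAYERED = [
    ("network hop, application to Customer Service", 200, 500),
    ("Find Customer web service call", 1_000, 5_000),
    ("parallel network round trips", 200, 500),
    ("second layer parallel web service calls", 1_000, 3_000),
    ("parallel simple queries", 500, 3_000),
]

# Worst case from the text: one shared database, queries still take 10 ms.
LAYERED_SHARED_DB = LAYERED[:-1] + [
    ("queries against a single shared database", 10_000, 10_000),
]

single = total_us(SINGLE_LAYER)            # (11200, 15500)
layered = total_us(LAYERED)                # (2900, 12000)
shared_db = total_us(LAYERED_SHARED_DB)    # (12400, 19000)
```

With independent databases the layered design comes out at roughly 2.9-12 milliseconds against 11.2-15.5 milliseconds for the single layer. With one shared database the layered totals are 1.2 to 3.5 milliseconds higher than the single layer, which is where the "about 1-3 milliseconds" figure comes from.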
Anyhow, the point is, when designing a system and coding it, you really should think about these types of things before you get so far down the road that you realize you have a problem only when you’re in it up to your neck and can’t easily do anything about it.
Feel free to fire away with the questions here or on Twitter (@TechMusings)




