
Feeling the Need for Speed?

Ah, the need for speed. It drives many of us insane as we spend late nights in a dark and dingy development shop tricking out our code to gain just another microsecond of performance. Programming legends live and die by the performance of their code. Many of my brethren have lost the pink slip to their laptop on a Friday night in a parking garage on Microsoft's Redmond campus. Regardless of the efforts of the elite Microsoft security forces, the code battles will continue, and my hope is that this article will leave you standing among the performance masters and not one of the many left wondering what went wrong. Shall we get started? Ready to take it to the edge and push .NET for all it's got? Then let's go.

Optimizing .NET application code for performance is, as with many things, a mix of art and science. The art of performance tuning is largely focused on balancing the need for speed with other factors, such as future flexibility, time-to-market, security, and user requirements. The scientific aspect involves gaining a strong understanding of how .NET works under the covers. In this article I'll discuss some of the internal aspects of .NET that affect performance, but I will also consider the artistic side of performance tuning. My hope is that you will gain a new appreciation and understanding for .NET internals and be able to apply your knowledge in practical, real-world scenarios.

The first thing you need to understand, and never forget, is that you won't go very far if you try to work against the JIT compiler. The compiler represents many hours of tuning and testing to assure that it does a great job of optimizing your code. So rather than work against it, it is important to understand exactly what it does and how to work with it. The primary purpose of the JIT compiler is to turn your IL code into native code that runs on a specific machine. The JIT compiler compiles a method only when that method is executed. This is actually one of several ways that the JIT compiler helps performance; it doesn't waste time compiling code that is never executed. Further, once a method is compiled, it is not recompiled during the lifetime of your application's execution. Here is what happens:
1.  Your program is loaded and a function table is initialized. The function table will be populated with indirect function calls pointing to the native instructions.
2.  Your program's Main() method is compiled by the JIT and the function table is updated.
3.  Whenever a method is called in your program, the JIT will look to see if the code has been compiled. If it has, it will be executed immediately; otherwise it will be compiled and the function table is updated.
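You can observe this first-call compilation cost directly. The sketch below (the method and names are mine, purely illustrative) times two calls to the same method: the first call includes JIT compilation, the second runs the cached native code. Note that `Stopwatch` requires .NET 2.0 or later; on 1.x you would fall back to `DateTime.Now.Ticks`.

```csharp
using System;
using System.Diagnostics;

class JitTiming
{
    static double Work(double x)
    {
        // A small numeric routine; the very first call pays the JIT cost.
        double sum = 0;
        for (int i = 1; i < 1000; i++) sum += x / i;
        return sum;
    }

    static void Main()
    {
        Stopwatch sw = Stopwatch.StartNew();
        Work(1.0);                        // first call: JIT compiles Work()
        long firstCall = sw.ElapsedTicks;

        sw = Stopwatch.StartNew();
        Work(1.0);                        // second call: runs cached native code
        long secondCall = sw.ElapsedTicks;

        Console.WriteLine("First call: {0} ticks, second call: {1} ticks",
                          firstCall, secondCall);
    }
}
```

On a typical machine the first call is dramatically slower than the second, which is exactly the startup cost NGen (discussed below) is designed to move ahead of time.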

Aside from the per-method compilation that is performed by the JIT, there are several other optimizations that you get for free. I've listed them for you in Table 1 so you can understand how they are applied, as well as the effect they have on your code. You shouldn't fret much about the JIT compiler doing things that hinder performance; in actuality it does a tremendous amount without any real effort on your part.

One argument often heard among performance hackers centers on the pros and cons of using NGen.exe to precompile .NET code. I don't recommend precompiling your code except in two specific instances, but more on that later. First, let me explain the reason for my view. The JIT compiler has been designed to take into account the environment in which it runs, and some of its optimizations can only happen at runtime. For instance, the JIT can compile for the specific processor it is running on and optimize for that instruction set; NGEN (the Native Image Generator utility) cannot do this. Using the JIT also allows the compiler to make aggressive use of function inlining, optimize indirection, and optimize across assemblies, none of which can happen with NGEN. So when do I think you should use NGEN? First, if your application does a lot of upfront initialization: if you are making a lot of method calls when your application starts, remember that the JIT does per-method compilation, so you may be able to reduce your application's startup time by precompiling. Second, when your application uses a large number of shared libraries; in this case per-method compilation could lead to a performance decrease.
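For reference, precompilation is a one-line operation from an SDK command prompt. The assembly name below is hypothetical, and the exact syntax varies by framework version (the explicit `install`/`uninstall` verbs appeared in later releases of the tool):

```shell
# .NET 1.x syntax: precompile an assembly into the native image cache
ngen MyStartupHeavyApp.exe

# Later versions of the tool use explicit action verbs
ngen install MyStartupHeavyApp.exe

# Remove the native image if you decide the JIT serves you better
ngen uninstall MyStartupHeavyApp.exe
```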

Threading for Performance
A huge mistake that many people make is assuming that more of a good thing is always better. So if one thread is running well, the assumption is often made that more threads will run better. This is not usually the case, and regardless of the science, you should consider whether you should use threading and only use it if it is truly required. Once you have decided that you actually need threads, the next decision is whether you should manage the threading or allow the .NET CLR to handle it. As with the JIT compiler, thread management under .NET is optimized rather well. For instance, thread blocking in managed code is automatically detected and the situation can be adjusted. But in some cases, such as those where you need to guarantee the service level of a thread or you have a long-running task, you may wish to manage your own threads.

The key to achieving maximum performance with threading is to recycle threads. Threads are objects, and their instantiation is costly. If you create a new thread for each request, you will incur the cost of creating and initializing the thread every time; if you instead use an existing thread to service each request, you avoid those creation and initialization costs and improve performance. Much of this will be handled for you by the thread pool, which is one of the key benefits of using the .NET threading services.
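A minimal sketch of both approaches, using the built-in thread pool for short requests and a dedicated thread for a long-running task (all names here are illustrative):

```csharp
using System;
using System.Threading;

class PoolDemo
{
    static void ServiceRequest(object state)
    {
        // Runs on a recycled pool thread -- no per-request creation cost.
        Console.WriteLine("Request {0} handled on a pool thread", state);
    }

    static void LongRunningTask()
    {
        // Dedicated thread: the pool never has to wait on this blocking work.
        Thread.Sleep(200);
    }

    static void Main()
    {
        for (int i = 0; i < 5; i++)
        {
            // QueueUserWorkItem hands the work to an existing pool thread.
            ThreadPool.QueueUserWorkItem(new WaitCallback(ServiceRequest), i);
        }

        // For a long-running or service-level-guaranteed task, manage your own thread.
        Thread worker = new Thread(new ThreadStart(LongRunningTask));
        worker.IsBackground = true;
        worker.Start();

        Thread.Sleep(500); // crude wait so the demo output appears before exit
    }
}
```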

So the take-away on threading and performance is to use the .NET threading services for short-lived nonblocking operations and for situations where you have longer-running operations that can benefit from a managed thread. This is clearly an area where understanding the science of threading and the art of performance is critical.

Taking Out the Trash Quickly
The garbage collector (GC) in .NET makes memory and resource management rather simple. But if you are truly interested in performance, you need to get a handle on how the GC can trip up your application's performance. First, let's review the benefits of the GC and how it works.

The GC uses a generational mark-and-compact approach. The .NET GC uses generations to help optimize the collection process and free up resources. Generation 0 contains recently created, short-lived objects; objects that survive a collection are promoted to Generations 1 and 2, which hold longer-lived objects and are collected far less often than Generation 0. The GC will sweep through Generation 0 rather often and free up resources there, but when doing that sweep it ignores Generations 1 and 2. This means that any long-lived objects holding large, expensive resources, such as file handles, operating system resources, or network and database connections, could adversely impact your application's performance by holding on to precious system resources longer than needed.
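You can watch promotion happen with `GC.GetGeneration()`. This throwaway sketch forces collections, something you should almost never do in production, purely to show how a surviving object climbs through the generations:

```csharp
using System;

class GenerationDemo
{
    static void Main()
    {
        object obj = new object();
        Console.WriteLine(GC.GetGeneration(obj)); // typically 0: freshly allocated

        GC.Collect();                  // force a full collection (demo only)
        GC.WaitForPendingFinalizers();
        Console.WriteLine(GC.GetGeneration(obj)); // survivor promoted, typically 1

        GC.Collect();
        Console.WriteLine(GC.GetGeneration(obj)); // promoted again, typically 2
    }
}
```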

So one way to improve performance is to consider your object designs and, if you have resources being held by large, infrequently used objects, take steps to release them sooner than later. One way to do this is to implement a Dispose() method on all objects that consume expensive resources that may not be collected and freed as part of Generation 0.

Dispose() vs Finalize()
Freeing resources is an important part of improving performance. The decision as to which method to use to free a resource, Dispose() or Finalize(), is really not all that complicated. The benefits of Dispose() are that it is controlled by the programmer, and resources are freed upon completion of the method. With Finalize(), the GC calls the method, but there is no order or predictability as to when the GC will call it. Furthermore, using Finalize() is a two-step process: during the first pass through the generation, the GC will mark the object to be freed, and on a subsequent pass it will be collected and destroyed. So keep in mind that with Finalize(), your resources will be kept alive for at least two passes of the GC, and depending on what generation your object lives in, collection could come later rather than sooner. Does this mean you should always implement Dispose() over Finalize()? Or never implement Finalize() at all? No, it means that you again need to balance the art of performance tuning with the science, and evaluate each case on its own. Also keep in mind that if you do implement Dispose(), consumers of your object must know to call it, or your efforts will be for naught.
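The standard way to balance the two is to implement both: Dispose() for deterministic cleanup, a finalizer as a safety net, and GC.SuppressFinalize() so a properly disposed object dies in a single GC pass. A sketch of the pattern (the resource here is a stand-in):

```csharp
using System;

// Callers who remember to call Dispose() free the resource deterministically
// and skip finalization; careless callers still get cleaned up, just later.
public class ResourceHolder : IDisposable
{
    private IntPtr handle = new IntPtr(1); // stand-in for an expensive native resource
    private bool disposed = false;

    public void Dispose()
    {
        Dispose(true);
        // The resource is already freed, so tell the GC to skip Finalize():
        // the object can now be collected in a single pass.
        GC.SuppressFinalize(this);
    }

    protected virtual void Dispose(bool disposing)
    {
        if (!disposed)
        {
            if (disposing)
            {
                // free managed resources here
            }
            // free the unmanaged resource here
            handle = IntPtr.Zero;
            disposed = true;
        }
    }

    ~ResourceHolder()
    {
        // Safety net: runs only if Dispose() was never called.
        Dispose(false);
    }
}
```

Consumers then wrap the object in a `using` block, `using (ResourceHolder r = new ResourceHolder()) { ... }`, which guarantees Dispose() is called even if an exception is thrown.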

Exceptional Performance
Throwing exceptions is another area that can lead to performance degradation. It is important that you consider the use of exceptions and how they affect your application. Please understand that I am not advocating that you avoid "Try...Catch" blocks in your code, but rather that you understand when exceptions are thrown and how they impact your code. Exceptions are, by their very nature, expensive operations. Developers will often use them to control the flow of their program or as a poor man's communication mechanism, for instance, branching code based on an exception or throwing an exception to communicate an event instead of raising one. If you are doing this, all I can say is: don't.
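To make the cost concrete, here is the anti-pattern next to a test-first alternative (note that `int.TryParse` requires .NET 2.0; on 1.x you would validate the string by hand):

```csharp
using System;

class ParseDemo
{
    // Anti-pattern: using an exception as a branch. Every failed parse pays
    // the full cost of building and unwinding an exception object.
    static int ParseSlow(string s)
    {
        try { return int.Parse(s); }
        catch (FormatException) { return -1; }
    }

    // Better: test the condition instead of catching the failure.
    static int ParseFast(string s)
    {
        int value;
        return int.TryParse(s, out value) ? value : -1;
    }

    static void Main()
    {
        Console.WriteLine(ParseSlow("not a number")); // -1, via a thrown exception
        Console.WriteLine(ParseFast("not a number")); // -1, no exception at all
    }
}
```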

If you aren't sure if your application is "exception heavy," try using the .NET performance counters to see how many exceptions your application is throwing. You should be aware that sometimes you can't avoid throwing an exception, for instance invoking a redirection using "Response.Redirect()" throws an exception, and there are other .NET operations that do the same thing. Chances are you will not be able to work around every instance, but knowing is always half the battle when it comes to performance tuning. Also, be wary of integration with unmanaged code in which things like COM objects and System API calls can throw exceptions that can impact performance.

Good Network Neighbors
Another area in which performance can become critical is in distributed applications where you are using either .NET Remoting or another network programming approach. Here the ability to achieve better performance is often a result of how you balance the work to be performed by the client and the server. You will need to decide if it is better to process something at the client or to package and send it across the wire. When communicating across the network be sure to minimize the number of calls and be very careful about methods that end up blocking. Each area we have already examined will be magnified when working in a network scenario and needs to be considered very closely.

Get Chunky
When making method calls on remote objects or services, consider using as few method calls as possible and avoid the use of properties as much as possible. You will dramatically increase performance if you can package the data you need to send across the wire while minimizing the number of method calls. But beware that often if you send more data across the wire you will also increase the processing time on the recipient of the data, which will need to manage the incoming data.
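A sketch of the difference, using hypothetical customer interfaces; the chunky version trades three network round trips for one:

```csharp
// Chatty: three remote calls, one per property -- each crosses the wire.
public interface ICustomerChatty
{
    string GetName(int id);
    string GetAddress(int id);
    string GetPhone(int id);
}

// Chunky: a single round trip that returns everything the caller needs.
[Serializable]
public class CustomerData
{
    public string Name;
    public string Address;
    public string Phone;
}

public interface ICustomerChunky
{
    CustomerData GetCustomer(int id);
}
```

The trade-off noted above still applies: the recipient now has to deserialize and manage a larger payload, so measure before assuming the chunky version wins.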

Go Native
The use of native data types will help minimize the expense associated with marshaling. You incur expense, and thereby degrade performance, when you create situations where data translation is required. For instance, moving data from ASCII to Unicode, or in some cases from XML to another format, can create expensive marshaling scenarios. While planning your application you can drastically improve performance if your development team agrees up front on how you will manage data between your client and server objects.

Use the Right GC
The .NET CLR has two garbage collectors, the workstation version (mscorwks.dll) and the server version (mscorsvr.dll). The server version is optimized for throughput and is also more aggressive about collections, so it minimizes memory fragmentation, takes advantage of multiple processors, and can also support multiple heaps. The workstation version minimizes latency and can also recognize multiple processors, but is best used in a single-processor workstation scenario. In some cases you may be running your client objects on a multiprocessor workstation communicating with objects on a server. In this case you can force the workstation to use the server version of the GC so that you can reap the benefits of the server version even though you are running on a workstation.
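On .NET 2.0 and later this is a one-line configuration change in the application's config file; verify the element name against your runtime version, since on 1.x the server GC was typically selected by the hosting process rather than by configuration:

```xml
<!-- app.config: opt a client process into the server garbage collector -->
<configuration>
  <runtime>
    <gcServer enabled="true" />
  </runtime>
</configuration>
```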

Security
The last area we will examine is the impact security has on performance. I left it for last because I honestly feel that security is an area where you need to be very careful that your quest for speed doesn't leave you exposed. So as you consider the optimizations presented here, test and retest to be certain you have done nothing to compromise your application's security.

Security affects performance, but with that said, .NET security has been developed to meet the performance needs of most developers. If you still need to get that last bit of performance out of your application, here are some things to consider.

.NET security is optimized and uses several techniques to minimize its impact on performance, but there are situations where a security check on a method will cause a walk of the call stack. One thing you can do is minimize these stack walks by using declarative security instead of imperative security: declaring PermitOnly, Deny, or Assert lets the runtime short-circuit the full stack walk. Another option is to perform as many of your security checks at link time instead of runtime as you can, using "LinkDemand", which checks only the immediate caller rather than walking the entire stack.
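A sketch of the trade-off using a hypothetical class (`FileIOPermission` is a real .NET permission; the paths and class names are mine):

```csharp
using System.Security.Permissions;

public class AuditLog
{
    // LinkDemand: the permission check happens once, at JIT time, against the
    // immediate caller -- no per-call stack walk.
    [FileIOPermission(SecurityAction.LinkDemand, Write = @"C:\logs")]
    public void Append(string entry)
    {
        // write the entry...
    }

    // Demand walks the entire call stack on every invocation -- safer against
    // luring attacks, but slower.
    [FileIOPermission(SecurityAction.Demand, Write = @"C:\logs")]
    public void AppendChecked(string entry)
    {
        // write the entry...
    }
}
```

Remember the caveat above: LinkDemand trusts the immediate caller to vouch for everyone further up the stack, so use it only after weighing the security implications.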

Just Getting Started
The reality is that this is just the beginning. There is so much more to review and learn, such as the effect of boxing on performance (hint: examine your IL and try to find a better design when you see box/unbox instructions), the use of ValueTypes, and looping considerations. I plan to cover these issues and also dive much deeper into threading, the GC, security, and the CLR in future articles. I hope the information presented here will provide a strong foundation for further exploration.
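As a teaser on the boxing hint: every time a value type goes into an `ArrayList`, the runtime boxes it onto the heap, and you will see `box`/`unbox` instructions if you inspect the compiled IL with ildasm.exe. A typed array (or a generic `List<int>` on .NET 2.0+) sidesteps the problem entirely:

```csharp
using System;
using System.Collections;

class BoxingDemo
{
    static void Main()
    {
        // Each Add boxes the int: one heap allocation per element, plus
        // box/unbox instructions in the IL.
        ArrayList boxed = new ArrayList();
        for (int i = 0; i < 1000; i++) boxed.Add(i);
        int first = (int)boxed[0];   // unbox on the way out

        // A typed array stores the values directly -- no boxing at all.
        int[] unboxed = new int[1000];
        for (int i = 0; i < 1000; i++) unboxed[i] = i;

        Console.WriteLine(first + unboxed[999]);
    }
}
```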

More Stories By John Gomez

John Gomez, open source editor for .NET Developer's Journal, has over 25 years of software development and architectural experience, and is considered a leader in the design of highly distributed transaction systems. His interests include chaos- and fuzzy-based systems, self-healing and self-reliant systems, and offensive security technologies, as well as artificial intelligence. John started developing software at age 9 and is currently the CTO of Eclipsys Corporation, a worldwide leader in hospital and physician information systems.
