A
long time ago I promised to release my buffer manager class that I use with
my Async socket server with some explanations on my blog as to why I use it. Chris
Mullins and JD
Conley were a lot of help to me in coming up with this strategy some time
ago (August? of last year I think) and they are continued resources on advanced socket programming who are great to bounce ideas off of. Unfortunately due to being so gosh darn busy
I never had the time to share it with everyone. I should also add that Sebastien Lorion also made a
really valuable suggestion in the move from a queue to a stack as the operation
is much faster and can provide better localization and that Jeffrey
Richter also provided some information on the non-assurance of LOH in the
future.
Yun Jin gives a very
good description of the issue in this
post but I will briefly summarize the issue. When you call
BeginReceive/BeginSend on a socket the memory that you pass to the method will
be pinned for the duration of the call. This will lead to heap fragmentation and
to make matters worse it will often be in your gen0 heap if you aren't careful.
This becomes especially true in the most common socket applications
(request/response) as BeginReceive is often called and waits for a long time to
receive data (by long time I mean a few seconds or more).
Unfortunately the solution presented by Yun Jin is far from optimal in 2.0.
This is of course of no fault of his own as it was written for 1.1 although some
optimizations could apply there as well. Even as presented it will run into a
few issues. The first issue and I consider this the biggest is based on one of
his statements.
If the pinned objects are allocated around same time, the free slots
between each two objects would be smaller, and the situation is better.
The code presented there does a good job of creating objects at the same
time. The problem is that you don't generally pin them for the same amount of
time (and they are often pinned for very long periods of time) so as collections
run the GC will move the unpinned ones thus creating holes around the pinned
ones (heap fragmentation) which is what we were trying to avoid. Given he does
qualify his buffer manager with another rule of
The shorter the objects are pinned, the easier GC could compact the
heap
Unfortunately in the case of most TCP servers this is not something that can
always be realized.
The main area that we can improve in by moving to .NET 2.0 is by using ArraySegments<byte>
structures instead of using byte[]. This may seem like a very simple change but
it in fact makes a huge difference in two areas.
The first thing it will allow us to do which was a limitation of the previous
buffer manager is to vary our buffer size per client. If you have a client who
is receiving a large file for instance you probably want to use larger send
buffers than a client who is not really doing much. The reason you want to do
this is to limit the number of thread context switches associated with that
socket (less context switches = better and more scalable). The buffer manager as
presented used single sized byte arrays and as such all buffers were to be
considered of equal size. .NET 2.0 added a few new overloads in dealing with BeginReceive and
BeginSend that
allow me to pass a List<ArraySegment<byte>> by making my
BufferManager now manage ArraySegment<byte> objects a connection can at
its whim increase and decrease its buffer size by checking out more buffers or
returning some buffer back to the cache respectively.
The next change that was allowed because of the use of ArraySegments was the
further isolation of the memory. Instead of creating numerous small arrays which
then move slowly from Gen0->Gen1 etc I can create really big arrays which
will be put right into the LOH
(Large Object Heap). At present objects over 80k or so are put in the LOH
but this is subject to change in the future *keep this in mind as it may affect
your performance slightly later as you could end up with a really big array in
gen0 which is pinned at almost all times*. One of the nice things about the LOH
is that there is no compaction in other words we don't have to worry about the
object being pinned or about fragmentation. As you can see the ability to create
this one BIG array will be quite a bit better for us as opposed to dealing with
lots of little arrays.
The biggest issue I had with the code as it stood was the handling when we
ran out of buffers, it would create them one by one leading to all the same
issues we would have without a buffer manager (objects not living near each
other and being pinned thus fragmenting the heap). This issue is removed
automatically when dealing with chunks because we always N chunks at a time
(i.e. if we run out of buffers and we are told to deal with 8k chunks in 1000
buffer segments we won't create 1 new buffer but a whole new 8mb segment
consisting of 1000 buffers). Although this could be wasteful for memory one is
generally better off in a server application to grow items like this once as
opposed to often. In fact the reason why this is not such a big issue overall is
that with the previous incarnate it should always be the case
that BUFFER_SIZE is larger than the maximum number of buffers you think you
would need and thus you never need to resize (an extra few mb in buffer space is
no big deal in the context of such an application and this is a common
pattern)
The other changes which were made were making the buffer cache thread safe. I
used LockFreeStacks to do this. There are many implementations of lock free
stacks that you can use available on the net Julian Bucknall has a great series of
articles on this as well. I used this particularly because I run on a
machine with lots of processors, you will likely be better off running with
simple monitor based locking around a normal Stack<T> if running on less
than 8 processors and this should
be a fairly easy change to the code. If people
are fairly interested I will spend the 15 minutes to make this change.
UPDATE: I put the lock version in a seperate listing (you could make the locks finer grained than my quick update though)
As a quick sidebar: the checkout method may look odd to you in the LockFreeStack version. The TryPop method will try to pop a value of the stack for me (in the tryxxx pattern format). The reason this pattern is needed is because I don't lock my count test would be innacurate, I need to do the operation attomically If a buffer is not found then I need to create more a new segment of buffers ... but another thread may have already done this for me (so I check the count in my lock before creating again). There is also an interesting case where although a new segment was created I am unlucky enough that all of it has been used up when I get to it so I have to loop. This has all of the problems of lock free algorithms especially starvation but is so unlikely to happen that I am not considerring to be a major concern.
All of that said here is the code. Note that even with all of this long
explanation the code is very short (as is the original) but its a perfect
example of how subtle asynchronous socket programming with .NET can be!
|
/// <summary> /// A manager to handle buffers for the socket
connections /// </summary> /// <remarks> /// When
used in an async call a buffer is pinned. Large numbers of pinned buffers
/// cause problem with the GC (in particular it causes heap
fragmentation). /// /// This class maintains a set of large segments
and gives clients pieces of these /// segments that they can use for their
buffers. The alternative to this would be to /// create many small arrays
which it then maintained. This methodology should be slightly /// better
than the many small array methodology because in creating only a few very
/// large objects it will force these objects to be placed on the LOH. Since
the /// objects are on the LOH they are at this time not subject to
compacting which would /// require an update of all GC roots as would be
the case with lots of smaller arrays /// that were in the normal
heap. /// </remarks> public class BufferManager {
private readonly int m_SegmentChunks; private readonly int
m_ChunkSize; private readonly int m_SegmentSize; private
readonly LockFreeStack<ArraySegment<byte>> m_Buffers; private
readonly object m_LockObject = new Object(); private readonly
List<byte[]> m_Segments;
/// <summary> /// The current number of buffers
available /// </summary> public int AvailableBuffers
{ get { return m_Buffers.Count; } //do we really care about
volatility here? }
/// <summary> /// The total size of all
buffers /// </summary> public int TotalBufferSize
{ get { return m_Segments.Count * m_SegmentSize; } //do we really
care about volatility here? }
/// <summary> /// Creates a new segment, makes buffers
available /// </summary> private void
CreateNewSegment() { byte[] bytes = new byte[m_SegmentChunks *
m_ChunkSize]; m_Segments.Add(bytes); for (int i = 0;
i < m_SegmentChunks; i++) { ArraySegment<byte> chunk
= new ArraySegment<byte>(bytes, i * m_ChunkSize,
m_ChunkSize); m_Buffers.Push(chunk); }
}
/// <summary> /// Checks out a buffer from the
manager /// </summary> /// <remarks>
/// It is the client's responsibility to return the buffer to the manger
by /// calling <see cref="Checkin"></see> on the
buffer /// </remarks> /// <returns>A <see
cref="ArraySegment"></see> that can be used as a
buffer</returns> public ArraySegment<byte> CheckOut()
{ ArraySegment<byte> b; while
(true) { while (true) { bool found = !m_Buffers.TryPop(out b); if (!found) { lock (m_LockObject) { if (m_Buffers.Count == 0) { CreateNewSegment(); } } } else { return b; } }
}
/// <summary> /// Returns a buffer to the control of
the manager /// </summary> ///
<remarks> /// It is the client's responsibility to return the
buffer to the manger by /// calling <see
cref="Checkin"></see> on the buffer ///
</remarks> /// <param name="_Buffer">The <see
cref="ArraySegment"></see> to return to the
cache</param> public void CheckIn(ArraySegment<byte>
_Buffer) { m_Buffers.Push(_Buffer); }
#region constructors
/// <summary> /// Constructs a new <see
cref="BufferManager"></see> object ///
</summary> /// <param name="_SegmentChunks">The number of
chunks tocreate per segment</param> /// <param
name="_ChunkSize">The size of a chunk in bytes</param> public
BufferManager(int _SegmentChunks, int _ChunkSize) :
this(_SegmentChunks, _ChunkSize, 1) { }
/// <summary> /// Constructs a new <see
cref="BufferManager"></see> object ///
</summary> /// <param name="_SegmentChunks">The number of
chunks tocreate per segment</param> /// <param
name="_ChunkSize">The size of a chunk in bytes</param> ///
<param name="_InitialSegments">The initial number of segments to
create</param> public BufferManager(int _SegmentChunks, int
_ChunkSize, int _InitialSegments) { m_SegmentChunks =
_SegmentChunks; m_ChunkSize = _ChunkSize;
m_SegmentSize = m_SegmentChunks * m_ChunkSize; m_Buffers = new
LockFreeStack<ArraySegment<byte>>(_SegmentChunks *
_InitialSegments); m_Segments = new
List<byte[]>(); for (int i = 0; i < _InitialSegments;
i++) { CreateNewSegment(); } }
} |
| Listing 1: Code with
LockFreeStack |
|
using System; using System.Collections.Generic; using System.Text;
namespace TickerPlant { /// <summary> /// A manager to
handle buffers for the socket connections /// </summary> ///
<remarks> /// When used in an async call a buffer is pinned. Large
numbers of pinned buffers /// cause problem with the GC (in particular it
causes heap fragmentation). /// /// This class maintains a set of
large segments and gives clients pieces of these /// segments that they
can use for their buffers. The alternative to this would be to /// create
many small arrays which it then maintained. This methodology should be
slightly /// better than the many small array methodology because in
creating only a few very /// large objects it will force these objects to
be placed on the LOH. Since the /// objects are on the LOH they are at
this time not subject to compacting which would /// require an update of
all GC roots as would be the case with lots of smaller arrays /// that
were in the normal heap. /// </remarks> public class
BufferManager { private readonly int m_SegmentChunks;
private readonly int m_ChunkSize; private readonly int
m_SegmentSize; private readonly Stack<ArraySegment<byte>>
m_Buffers; private readonly object m_LockObject = new
Object(); private readonly List<byte[]> m_Segments;
/// <summary> /// The current number of buffers
available /// </summary> public int AvailableBuffers
{ get { return m_Buffers.Count; } //do we really care about
volatility here? }
/// <summary> /// The total size of all
buffers /// </summary> public int TotalBufferSize
{ get { return m_Segments.Count * m_SegmentSize; } //do we really
care about volatility here? }
/// <summary> /// Creates a new segment, makes buffers
available /// </summary> private void
CreateNewSegment() { byte[] bytes = new byte[m_SegmentChunks *
m_ChunkSize]; m_Segments.Add(bytes); for (int i = 0;
i < m_SegmentChunks; i++) { ArraySegment<byte> chunk
= new ArraySegment<byte>(bytes, i * m_ChunkSize,
m_ChunkSize); m_Buffers.Push(chunk); }
}
/// <summary> /// Checks out a buffer from the
manager /// </summary> /// <remarks>
/// It is the client's responsibility to return the buffer to the manger
by /// calling <see cref="Checkin"></see> on the
buffer /// </remarks> /// <returns>A <see
cref="ArraySegment"></see> that can be used as a
buffer</returns> public ArraySegment<byte> CheckOut()
{ lock (m_LockObject) { if (m_Buffers.Count ==
0) { CreateNewSegment();
} return m_Buffers.Pop(); } }
/// <summary> /// Returns a buffer to the control of
the manager /// </summary> ///
<remarks> /// It is the client's responsibility to return the
buffer to the manger by /// calling <see
cref="Checkin"></see> on the buffer ///
</remarks> /// <param name="_Buffer">The <see
cref="ArraySegment"></see> to return to the
cache</param> public void CheckIn(ArraySegment<byte>
_Buffer) { lock (m_LockObject) {
m_Buffers.Push(_Buffer); } }
#region constructors
/// <summary> /// Constructs a new <see
cref="BufferManager"></see> object ///
</summary> /// <param name="_SegmentChunks">The number of
chunks tocreate per segment</param> /// <param
name="_ChunkSize">The size of a chunk in bytes</param> public
BufferManager(int _SegmentChunks, int _ChunkSize) :
this(_SegmentChunks, _ChunkSize, 1) { }
/// <summary> /// Constructs a new <see
cref="BufferManager"></see> object ///
</summary> /// <param name="_SegmentChunks">The number of
chunks tocreate per segment</param> /// <param
name="_ChunkSize">The size of a chunk in bytes</param> ///
<param name="_InitialSegments">The initial number of segments to
create</param> public BufferManager(int _SegmentChunks, int
_ChunkSize, int _InitialSegments) { m_SegmentChunks =
_SegmentChunks; m_ChunkSize = _ChunkSize;
m_SegmentSize = m_SegmentChunks * m_ChunkSize; m_Buffers = new
LockFreeStack<ArraySegment<byte>>(_SegmentChunks *
_InitialSegments); m_Segments = new
List<byte[]>(); for (int i = 0; i < _InitialSegments;
i++) { CreateNewSegment(); } }
} } |
| Listing 2: Code for BufferManager with Stack<T> +
monitor locking |
