Production Bug – Unnoticeable Async Void

The operation team reported a bug in production. The server crashed a few times, which seems daily. Here is the detail. I modified some texts (replaced with XXX) to remove the confidential data.

Your app crashed because of System.Net.Http.HttpRequestException
Your app, XXX, crashed because of System.Net.Http.HttpRequestException and aborted the requests it was processing when the overflow occurred. As a result, your app’s users may have experienced HTTP 502 errors.

This call stack caused the exception:
System.Net.Http.HttpContent+d__54.MoveNext
System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw
System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess
System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification
System.Net.Http.HttpContent+d__70`2[[System.__Canon System.Private.CoreLib][System.__Canon System.Private.CoreLib]].MoveNext
System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw
System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess
System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification
XXX.DeleteSampleCommandHandler+<>c__DisplayClass6_0+<b__0>d.MoveNext
System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw
System.Threading.Tasks.Task+<>c.b__139_1
System.Threading.QueueUserWorkItemCallback+<>c.<.cctor>b__6_0
System.Threading.ExecutionContext.RunForThreadPoolUnsafe[[System.__Canon System.Private.CoreLib]]
System.Threading.QueueUserWorkItemCallback.Execute
System.Threading.ThreadPoolWorkQueue.Dispatch
System.Threading._ThreadPoolWaitCallback.PerformWaitCallback
---> (Inner Exception #0) System.IO.IOException
System.Net.Security.SslStream+d__214`1[[System.Net.Security.SslStream+SslReadAsync System.Net.Security]].MoveNext
System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw
System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess
System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification
System.Net.Http.HttpConnection+d__87.MoveNext
System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw
System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess
System.Net.Http.HttpConnection+ChunkedEncodingReadStream+d__10.MoveNext
System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw
System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess
System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification
System.Net.Http.HttpConnectionResponseContent+d__5.MoveNext
System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw
System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess
System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification
System.Net.Http.HttpContent+d__54.MoveNext
--- End of inner exception stack trace ---
---> (Inner Exception #1) System.ObjectDisposedException
System.Net.Security.SslStream.ThrowIfExceptional
System.Net.Security.SslStream.CheckThrow
System.Net.Security.SslStream+d__214`1[[System.Net.Security.SslStream+SslReadAsync System.Net.Security]].MoveNext
--- End of inner exception stack trace ---

To learn more about this exception, use the Kudu console to view the log file in d:\home\logfiles\crashdumps, which contains up to 10 log files for your app if space is available.

A long stack trace without only one line related to our application:

XXX.DeleteSampleCommandHandler+<>c__DisplayClass6_0+<b__0>d.MoveNext 

Let’s check out that command handler class.

public async Task Handle(DeleteSampleCommand command, CancellationToken cancellationToken)
{
    var client = await _clientFactory.GetSampleApiClient(command.Endpoint);

    client.RequestProcessed += async (sender, response) =>
    {
        var request = await response.RequestMessage.Content.ReadAsStringAsync();
        var content = await response.Content.ReadAsStringAsync();

        await _auditRepository.Add(command.Endpoint, command.SampleId, _context.User, "DeleteSample", request, (int)response.StatusCode, content);
    };

    await client.DeleteSampleAsync(command.SampleId, cancellationToken);

    // Other logic after deleting the sample from a remote endpoint

}

A client is auto-generated by Swagger/NSwag. This is the DeleteSampleAsync method.

public async System.Threading.Tasks.Task DeleteSampleAsync(System.Guid sampleId, System.Threading.CancellationToken cancellationToken)
{
    if (sampleId == null)
        throw new System.ArgumentNullException("sampleId");

    var urlBuilder_ = new System.Text.StringBuilder();
    urlBuilder_.Append(BaseUrl != null ? BaseUrl.TrimEnd('/') : "").Append("/samples/{sampleId}");
    urlBuilder_.Replace("{sampleId}", System.Uri.EscapeDataString(ConvertToString(sampleId, System.Globalization.CultureInfo.InvariantCulture)));

    var client_ = _httpClient;
    var disposeClient_ = false;
    try
    {
        using (var request_ = new System.Net.Http.HttpRequestMessage())
        {
            request_.Method = new System.Net.Http.HttpMethod("DELETE");

            PrepareRequest(client_, request_, urlBuilder_);
            var url_ = urlBuilder_.ToString();
            request_.RequestUri = new System.Uri(url_, System.UriKind.RelativeOrAbsolute);
            PrepareRequest(client_, request_, url_);

            var response_ = await client_.SendAsync(request_, System.Net.Http.HttpCompletionOption.ResponseHeadersRead, cancellationToken).ConfigureAwait(false);
            var disposeResponse_ = true;
            try
            {
                var headers_ = System.Linq.Enumerable.ToDictionary(response_.Headers, h_ => h_.Key, h_ => h_.Value);
                if (response_.Content != null && response_.Content.Headers != null)
                {
                    foreach (var item_ in response_.Content.Headers)
                        headers_[item_.Key] = item_.Value;
                }

                ProcessResponse(client_, response_);

                var status_ = (int)response_.StatusCode;
                if (status_ == 200)
                {
                    return;
                }
                else
                if (status_ == 400)
                {
                    string responseText_ = ( response_.Content == null ) ? string.Empty : await response_.Content.ReadAsStringAsync().ConfigureAwait(false);
                    throw new ApiException("Bad Request", status_, responseText_, headers_, null);
                }
                else
                if (status_ == 401)
                {
                    string responseText_ = ( response_.Content == null ) ? string.Empty : await response_.Content.ReadAsStringAsync().ConfigureAwait(false);
                    throw new ApiException("Unauthorized", status_, responseText_, headers_, null);
                }
                else
                if (status_ == 403)
                {
                    string responseText_ = ( response_.Content == null ) ? string.Empty : await response_.Content.ReadAsStringAsync().ConfigureAwait(false);
                    throw new ApiException("Forbidden", status_, responseText_, headers_, null);
                }
                else
                if (status_ == 500)
                {
                    string responseText_ = ( response_.Content == null ) ? string.Empty : await response_.Content.ReadAsStringAsync().ConfigureAwait(false);
                    throw new ApiException("Server Error", status_, responseText_, headers_, null);
                }
                else
                {
                    var responseData_ = response_.Content == null ? null : await response_.Content.ReadAsStringAsync().ConfigureAwait(false); 
                    throw new ApiException("The HTTP status code of the response was not expected (" + status_ + ").", status_, responseData_, headers_, null);
                }
            }
            finally
            {
                if (disposeResponse_)
                    response_.Dispose();
            }
        }
    }
    finally
    {
        if (disposeClient_)
            client_.Dispose();
    }
}

The client.RequestProcessed is our custom event. The implementation looks like

// Extend the generated SampleClient, thank to the partial class
public partial class SampleClient
{
    // It is designed in the SampleClient to allow consumers hook up additional functionalities
    partial void ProcessResponse(HttpClient client, HttpResponseMessage response)
    {
        RequestProcessed?.Invoke(client, response);
    }

    // Our custom event which is invoked whenever the ProcessResponse is executed
    public event EventHandler<HttpResponseMessage> RequestProcessed;
}

Investigation

The application is a Web API service. Command handlers are invoked from Controllers. Usually, an exception cannot crash the process.

Unless?

I recall I read the async await practices from David Fowler.

BAD Async void methods can’t be tracked and therefore unhandled exceptions can result in application crashes.

Async void usually appears in event handlers. Which is this code

client.RequestProcessed += async (sender, response) =>
{
    var request = await response.RequestMessage.Content.ReadAsStringAsync();
    var content = await response.Content.ReadAsStringAsync();

    await _auditRepository.Add(command.Endpoint, command.SampleId, _context.User, "DeleteSample", request, (int)response.StatusCode, content);
};

The compiler will generate an async void delegate for that event handler. Remember that the ProcessResponse is synchronous. Therefore, when there is an exception in the event handler block, it crashes the application.

But why? how to explain the exception in the stack trace? The response instance was disposed of. The event handler read request and response contents on a different thread for auditing purposes.

Solution

A quick and dirty solution introduces another method that returns a task.


private async Task AuditRequest(DeleteSampleCommand command, HttpResponseMessage response)
{
    var request = await response.RequestMessage.Content.ReadAsStringAsync();
    var content = await response.Content.ReadAsStringAsync();

    await _auditRepository_.Add(command.Endpoint, command.SampleId, _context.User, "DeleteSample", request, (int)response.StatusCode, content);
}

client.RequestProcessed += (sender, response) =>
{
    _ = AuditRequest(command, response);
};

However, it will break the audit requirements. The _auditRepository_.Add does not run because of an exception when reading the response.

To audit, the code needs to know the request and response. The execution flow can get the request and response without reading HttpResponseMessage directly again. The generated Swagger client has already read and wrapped the content in the ApiException.Response property.

And the final solution is as below.

public async Task Handle(DeleteSampleCommand command, CancellationToken cancellationToken)
{
    var payload = JsonConvert.SerializeObject(command);
    var response = "OK";
    var statusCode = 200;
    var client = await _clientFactory.GetSampleApiClient(command.Endpoint);

    try
    {
        await client.DeleteSampleAsync(command.SampleId, cancellationToken);
    }
    catch (ApiException apiException)
    {
        response = apiException.Response ?? apiException.Message;
        statusCode = apiException.StatusCode;
    }

    await _auditRepository.Add(
        command.Endpoint,
        command.SampleId,
        _context.User,
        "DeleteSample",
        payload, 
        statusCode,
        response);
    // Other logic after deleting the sample from a remote endpoint

}

Conclusion

Here are a few things I learned from this production issue.

  1. Be careful with async await. It looks cool but be sure you know what you are doing.
  2. Avoid accessing the Request/Response directly. Look for another solution. There is always another solution.
  3. When introducing an event, ask this question: "Do I need it? Can I write code without it?"

Production bugs are always interesting.

Write a Comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.