Production Bug – SQL Identity Jump

I got a bug report that some records were not synced between two systems. One system takes data in, another exposes data to external systems, and a system in between converts the data. Together they form a data pipeline, a typical architecture these days.

After checking the data in the related databases, I found that the error occurred in a single database, where data moved from one table to another. There is a source table, a destination table, and an intermediate table. We store the progress in a state record, which allows the system to continue from the last run.

Whenever there are changes in the source table, a SQL trigger executes and copies the "EntityId" value into the intermediate table.
Whenever the function (a timer-triggered Azure Function) starts, it reads data from the intermediate table and performs the required logic to convert the data.
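The trigger itself is not part of this post, but based on the description it might look roughly like this (a sketch; the source table name [dbo].[Entity] is an assumption):

```sql
-- Hypothetical sketch of the change-tracking trigger described above.
-- [dbo].[Entity] is an assumed name for the source table.
CREATE TRIGGER [dbo].[TR_Entity_Changed]
ON [dbo].[Entity]
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- Queue the changed EntityIds; Ordering is generated by the IDENTITY column
    INSERT INTO [dbo].[Entity_Changes] ([EntityId])
    SELECT [EntityId] FROM inserted;
END
```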

Everything worked as expected until the production bug was reported.

Schema and Code

The intermediate table schema, Entity_Changes:

CREATE TABLE [dbo].[Entity_Changes]
(
    [Ordering] [bigint] IDENTITY(1,1) NOT NULL PRIMARY KEY,
    [EntityId] UNIQUEIDENTIFIER NOT NULL
)

The execution flow is:

  1. Read records from the Entity_Changes table
  2. Process them
  3. Delete them from the Entity_Changes table

The simplified version of the code:

private async Task<long> ReplaceRows(IDataSource dataSource, ConversionState resultState, CancellationToken cancellationToken)
{
    long recordsCount = 0;
    const int batchSize = 1000;
    do
    {
        cancellationToken.ThrowIfCancellationRequested();

        var pagingFrom = resultState.NumberOfRowsConverted;
        var pagingTo = pagingFrom + batchSize;

        var pagedResult = await dataSource.ReadData(pagingFrom, pagingTo);
        if (pagedResult.MaxOrdering == 0)
        {
            _logger.LogInformation("No records to process");
            break;
        }

        // Do the conversion logic here; it produces distinctResults

        recordsCount += distinctResults.Count;
        _logger.LogInformation($"{recordsCount} records");

        // NumberOfRowsConverted is used to set the pagingFrom value.
        // It is the max Ordering from the last batch
        resultState.NumberOfRowsConverted = pagedResult.MaxOrdering;
        resultState.State = (int)ConversionStateStatus.Running;
        await _conversionStateService.SaveLastSyncedState(resultState);

        var deletedRows = await dataSource.DeleteProceededRecords(pagingFrom, pagingTo);
        _logger.LogInformation($"Delete {deletedRows} rows from temp table");

    } while (true);

    return recordsCount;
}

In one of the data sources, I found these interesting numbers. The resultState.NumberOfRowsConverted was 5,885,592. The lowest Ordering in the intermediate table was 5,895,593. The difference was 10,001. It did not make sense. How could that happen?

With those numbers, ReadData returned an empty result: pagingTo was 5,886,592 (= 5,885,592 + 1,000), which is smaller than 5,895,593, so every batch came back empty.

From experience, I knew that a SQL identity column does not guarantee continuous values; for example, when a transaction rolls back, identity values are lost. But a gap of 10,000 seemed too big.

Looking at the code for hours did not help me explain it (by the way, I found some other issues with the code), so I googled "SQL Identity Jump". Bingo! I found the answer in the SQLShack article on avoiding identity jumps.


SQL Server uses a different cache size for each data type of identity column. The cache size for an INT column is 1,000, and the cache size for a BIGINT or NUMERIC column is 10,000.

That explained it. In some circumstances (an unexpected restart, for example), SQL Server loses the cached identity values and the next value jumps ahead by the cache size. The Ordering column is a BIGINT, so the jump size is 10,000.
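If the jumps themselves are a problem, SQL Server 2017 and later lets you turn off identity caching per database (older versions have trace flag 272). Note the trade-off: inserts get slower, and identity values are still not guaranteed to be gapless, because rolled-back transactions still consume values:

```sql
-- SQL Server 2017+: disable the identity cache for the current database.
-- Reduces (but does not eliminate) gaps; costs some insert throughput.
ALTER DATABASE SCOPED CONFIGURATION SET IDENTITY_CACHE = OFF;
```

We chose to fix the paging logic instead, so the code tolerates gaps of any size.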

Solution

Once you know the root cause, the solution is quite simple. But first off, the paging implementation was not accurate to begin with: the set of rows read might not be the same as the set of rows deleted. How could that happen?

Let's say that when reading data for the paging range [0-1,000], there are 10 rows in the database. The system processes them. By the time it deletes that range, there are 900 rows: new rows inserted after the read still fall inside the range, so they would be deleted without ever being processed. The fix is simple.

// Delete only the Ordering range that was actually processed
var deletedRows = await dataSource.DeleteProceededRecords(pagingFrom, pagedResult.MaxOrdering);

To solve the identity-jump problem itself, query using SELECT TOP ... WHERE Ordering > @pagingFrom instead of querying by a fixed range, so a gap of any size can never produce an empty page while rows remain.

var pagedResult = await dataSource.ReadData(pagingFrom, batchSize);
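On the SQL side, the read and delete might then look like this (a sketch; parameter names are assumptions, table and column names come from the schema above):

```sql
-- Read the next batch strictly after the last converted Ordering.
-- TOP + ORDER BY tolerates identity gaps of any size.
SELECT TOP (@batchSize) [Ordering], [EntityId]
FROM [dbo].[Entity_Changes]
WHERE [Ordering] > @pagingFrom
ORDER BY [Ordering];

-- Delete only what was actually read and processed.
DELETE FROM [dbo].[Entity_Changes]
WHERE [Ordering] > @pagingFrom AND [Ordering] <= @maxOrdering;
```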

Production bugs are always interesting.

Pull Requests

A pull request (PR) is a social contract between developers, architects, and technical leads. So what are the major parts of a pull request?

Code

In software development, most of the time a PR is about code. Keep the changes as small as possible; do not try to include too many changes in a single pull request.

Description

I have seen many pull requests with poor descriptions. When I read them, they give me no helpful information. What should be included in a PR's description?

Bugs: Root Cause – Solution – Impact

Root Cause: This is the critical part. Developers must explain why and how the bug occurred. Once developers and reviewers read and agree upon the root cause, a common understanding is established. When the root cause is wrong, the rest does not matter.

Solution: Explain how you have fixed the problem. Did you make any tradeoffs? And why did you choose that approach? Depending on the problem, the detail differs.

Impact: Some changes have significant consequences, such as infrastructure changes, changes in frameworks, or changes in common flows. Other changes are local. Developers must be aware of the impact, and they must write it down.

Features: Problem/Requirement – Solution – Impact

The steps are the same as bugs, except the first one is "Problem/Requirement," a sentence explaining the outcome of the PR.
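As an illustration (not a prescribed format), a bug-fix description following this structure could look like the following, using the identity-jump bug from the first post:

```
## Root Cause
The BIGINT identity cache (size 10,000) created a gap in Ordering,
so the range-based paging query returned an empty result forever.

## Solution
Read with SELECT TOP ... WHERE Ordering > @pagingFrom instead of a
fixed range, and delete only the range that was actually processed.

## Impact
Local to the conversion function; no schema or infrastructure changes.
```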

Commit Logs

Many pull requests have a single commit with a poor message that includes every change in the PR. In my opinion, that wastes the commit feature. Commit logs should tell the story of solving the problem as a series of small changes. They reflect the implementation flow.

  1. A commit should capture a single "logical" change if possible. It is the same as the Single Responsibility Principle in software development.
  2. A commit should have a "good" commit message. At least it should help you, the developer, understand what has been implemented in that commit.
  3. The commit logs (all commits in a PR) should form an implementation flow, which is very important. It is like a checklist for both developers and reviewers. One can figure out what was missing by reading the commit logs.
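For example, the commit log of the paging fix above could read as a flow (hypothetical messages):

```
1. Add logging around paging values to reproduce the identity gap
2. Delete using the max Ordering actually processed
3. Replace range-based read with SELECT TOP beyond the last Ordering
4. Remove the obsolete pagingTo calculation
```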

There is pushback: what is a single "logical" change? Well, there is no definition. Developers have to practice and improve their judgment over time. That is also the magic of this practice: it makes developers better.

Resolve Code Reviews

If possible, resolve each review comment in its own commit. It is much easier to track what has been resolved and how, by looking at the commit detail.

Small practices can make a big difference.

Towards simplicity – Planning

My journey towards simplicity

For years, I have been practicing planning. There are books and articles about planning; ask Google and it gives you millions of resources. I have tried some of them. None of them worked for me.

  1. I could not complete what I had planned. I stacked up debt quickly. Soon, the original planning goals were gone, replaced by "making the plan look good."
  2. I overused planning tools. There are many out there. None of them made me feel good, probably because I planned poorly.

I felt it was too complicated, overwhelming. I needed a more straightforward approach.

Basic question

I realized that I jumped into learning "how to plan" without asking a fundamental question:

What do I want to accomplish from planning?

It turned out that my needs were simple.

  1. I want to feel good that I have accomplished a few things at the end of the day.
  2. I want to know what I have to do the following day.
  3. It is OK not to accomplish them all. I do not want to carry the guilt to the next day.
  4. I want to plan a few things for the weekend.

There are a few keywords: TODAY, TOMORROW, WEEKEND, and A FEW THINGS.

A few things

This is essential. I limit myself to five items. The rule forces me to decide what matters most to me. Once I get them done, I call it a day.

Tomorrow

This is what I plan TODAY. Before going to bed, I plan for tomorrow: a list of a few things I want to do. I started by writing them down on paper. Later, I used the MS To Do app. Both work fine.

Today

When today comes, I move the "tomorrow" list into the "today" list. When I finish an item, I mark it done.

Weekend

During the week, things come up that I have to handle on the weekend. They go on the list. When the weekend is over, they are gone, regardless of the results.

Switching Contexts Tips

Simple tips and tricks I have learned over time. Some are useful.

Being able to focus on one thing for a long stretch is great, but it is True But Useless (TBU) to me. I cannot do that. I have my work, my Facebook, my Twitter, my books… They are part of my reality.

Instead of fighting hard to focus on just one thing at a time, I learned to switch between them better.

So what was my typical behavior before? Let's say I am reading a book, with Facebook, LinkedIn, and other tabs open. I would:

  1. Read a few pages of the book
  2. Switch to Facebook and scroll through it
  3. Lose track of time, usually longer than I expected, then go back to the book
  4. Read a couple more pages
  5. Repeat. Sometimes I stopped reading the book and did something else entirely

It looks normal, except for the consequences I was unaware of. I forgot almost everything, and:

  1. Reading time felt longer than it really was
  2. Facebook time felt shorter than it really was (it always takes longer)

I want to remember as much as I can what I have read. I want to enjoy my social media time. Above all, I want to feel good.

Pause and Reflect

Before starting the next activity, I pause and reflect on what I have learned or done in the previous one. I simply recall what I have learned, make sense of the new information, and summarize it in my head. The activities vary. The pause takes a few minutes.

A few amazing things happen:

  1. I remember more of what I have learned or done, and I can recall it again at the end of the day
  2. Sometimes I continue the learning. The pause gives me time to decide. I am in charge

Track the time

I do not track my time to manage it better. I do it to raise my awareness of my actions. Of course, at the end of the week, I also gain the benefit of having everything logged and tracked.

I am not interested in managing my time. Instead, I am focusing on improving my energy and attention.

I use Toggl to track my time, but any tool can do the job, even on paper.

ASP.NET Core User Secrets

One of the things I learned recently: ASP.NET Core user secrets.

I took over another ASP.NET Core project. As usual, I followed my normal routine:

  1. Check out the code from the repository
  2. Compile and hit F5

Oops! It compiled, but an exception was thrown at runtime. Usually that means the configuration is wrong, because the code is now running on my machine. I checked the source and looked for the appsettings.json or appsettings.development.json file. Bad news! They were not there.

How could the other developers run it locally? Were there any secrets? Apparently, yes: a user secret.

<PropertyGroup>
  <TargetFramework>netcoreapp3.1</TargetFramework>
  <UserSecretsId>79a3edd0-2092-40a2-a04d-dcb46d5ca9ed</UserSecretsId>
</PropertyGroup>

And there is a complete article on Microsoft Docs. Once I read it: aha! The feature has been around for a while. I was just behind the curve.

The purpose is clear: it improves security by preventing accidental commits of sensitive settings to the repository. Connection strings, credentials, passwords… it is so easy to make a mistake and push them to a remote repository.

However, I had a secret problem: there were more than 15 settings, and adding them one by one is not a pleasant task. Note that there was a configuration file with all the default settings; if every developer names their database, port, folders, and so on consistently, the configuration is the same. I got that file from the other developers. How do I import it into my secrets?

  1. Open the secrets folder at %APPDATA%\Microsoft\UserSecrets\
  2. Find the folder matching the UserSecretsId
  3. Copy the appsettings.development.json file into it and rename it to secrets.json. Delete the existing file if there is one

And it works perfectly. If the configuration differs between branches, remember to update the secrets accordingly.
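Alternatively, the dotnet user-secrets CLI can import the whole file without touching the folder manually (run from the project directory; the file name here is whatever holds your settings):

```
# Windows cmd: pipe the JSON file into user-secrets
type .\appsettings.development.json | dotnet user-secrets set

# PowerShell equivalent
Get-Content .\appsettings.development.json | dotnet user-secrets set

# Verify what is stored
dotnet user-secrets list
```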

The secrets are revealed!