Using Projection to Improve Entity Framework Core Queries

Entity Framework’s sluggish queries are a typical gripe, but we can utilize projected queries to speed things up significantly.

After using Entity Framework for a time, you begin to encounter a few minor issues.

Entity Framework’s performance is one of the primary issues.

It’s nothing new, really.

Although significant progress has been made over time, Entity Framework (along with other ORMs) has always suffered with performance. It is still relatively simple to create a painfully sluggish data access layer using Entity Framework.

The good news is that you can usually prevent these problems and resume things moving fast.

I recall a few years back working on a medium-sized application that used Entity Framework for 90% of its data access. From a modest proof-of-concept to a commercial application, it had experienced significant development. While certain aspects of the solution had been enhanced, the data access had remained mostly the same. When we began using the program at scale, which involved adding thousands of new users and considerably increasing the quantity of data stored in the databases, it wasn’t really a concern. We had to investigate why things started to slow down at that time.

One of the key causes was the overall inefficiency we were seeing with the EF-generated queries and how greedy Entity Framework (EF) was being when collecting data from the database. At first, we didn’t pay much attention to this and it did not raise any issues, but it soon turned into one of the biggest bottlenecks.

Let’s go into an example to observe what’s happening, and then we’ll consider what we can do to improve it.

The Entity Framework model

Imagine we have a program that keeps book-related data.

Here is one way we could represent it:

public class Book
{
    [Key]
    public int Id { get; set; }

    [Required, MaxLength(100)]
    public string Title { get; set; }

    [Required, MaxLength(4000)]
    public string Summary { get; set; }

    public int NumberOfPages { get; set; }

    public virtual Author Author { get; set; }

    public int AuthorId { get; set; }

    public virtual Genre Genre { get; set; }

    public int GenreId { get; set; }
}

public class Author
{
    [Key]
    public int Id { get; set; }

    [Required, MaxLength(100)]
    public string FirstName { get; set; }

    [Required, MaxLength(100)]
    public string LastName { get; set; }

    public virtual ICollection<Book> Books { get; set; }
}

public class Genre
{
    [Key]
    public int Id { get; set; }

    [Required, MaxLength(100)]
    public string Name { get; set; }
}

As a result, the Book object is connected to the Author and Genre entities.

We create our SQL script with EF migrations, and the result is a SQL Server database:

Reading data

Well, now that the database and models are ready, we want to query some data from it.

Let’s start by supposing that we want to create a method that, given a book’s ID, returns the title of the book. That ought to be simple enough.

Starting with this

public string? GetTitle(int id)
{
    Book? book = _context.Books
        .FirstOrDefault(p => p.Id == id);

    return book?.Title;
}

That seems excellent, no?

Let’s test it out and examine the search term it employs:

exec sp_executesql N'SELECT TOP(1) [b].[Id], [b].[AuthorId], [b].[GenreId], [b].[NumberOfPages], [b].[Summary], [b].[Title]
FROM [Books] AS [b]
WHERE [b].[Id] = @__id_0',N'@__id_0 int',@__id_0=10

It makes sense because it uses a TOP(1) to identify the first row and a parameter to define the Id to check for.

Do you realize, though, that we are bringing back every column from the dbo.Books table? Because of this, the C# entity has access to all the data it needs.

Occasionally, this is what we need, but because our technique only considers the Title property in this instance, we are returning more information from the database than we want.

Let’s consider another example. This time, we wish to get the list of books by a certain author back.

The procedure might be stated as follows:

public List<string> GetTitlesByAuthor(int authorId)
{
    return _context.Books
        .Where(p => p.AuthorId == authorId)
        .Select(p => p.Title)
        .ToList();
}

Let’s examine the generated SQL for this:

exec sp_executesql N'SELECT [b].[Title]
FROM [Books] AS [b]
WHERE [b].[AuthorId] = @__authorId_0',N'@__authorId_0 int',@__authorId_0=57

We are just selecting the Title field from the dbo.Books database here, and we are filtering by the author’s Id.

We are simply returning Title in this instance, no duplicate data.

What makes a difference, then?

The difference is a method known as projection.

How does projection work?

When you compose a query and then instantly change it into a new format, you are projecting. Making ensuring you just retrieve what you require from the database is helpful.

Let’s go through an instance step by step to examine what happens.

Have a look at our GetTitle method, which does not utilize projection:

public string? GetTitle(int id)
{
    Book? book = _context.Books
        .FirstOrDefault(p => p.Id == id);
    return book?.Title;
}

In this case, our query is returning the Book object directly. All the attributes have been filled in, as can be seen if we debug the contents of the book:

This is so because when.FirstOrDefault is called, and the query is sent to the database. The Title property may then be obtained and returned using the returned entity.

Compare this to our GetTitlesByAuthor procedure. In order to clarify what this approach does, we’ll split it down into a few steps:

public List<string> GetTitlesByAuthor(int authorId)
{
    IQueryable<Book> books = _context.Books
        .Where(p => p.AuthorId == authorId);

    IQueryable<string> titles = books
        .Select(p => p.Title);

    List<string> titlesList = titles.ToList();

    return titlesList;
}

Calling on our main line. While this yields IQueryableBook>, according to our books’ settings. This type is essential since it acts as the query’s definition prior to sending it to the database. in comparison to ours. The FirstOrDefault method, which we previously called. Instead of directly accessing the database, the where function generates this IQueryable object for us.

Next, a call is made. This IQueryable object may be selected. In addition to returning a new IQueryable, this doesn’t yet make a database call. But, here is the stage when we introduce projection and instruct the query to convert the result set into the new structure.

We finally call. On our IQueryable, ToList. A list of strings is returned as fast as the query has been created and delivered to the database.

The database is however only called during ours.ToList function in all of these phases. If we had called, the same thing would have happened. Here, you might use FirstOrDefault (which would give the author the first book) or other techniques. Single.

Let’s now utilize projection to change our GetTitle function so that we only get the database fields we require:

public string? GetTitle(int id)
{
    string? title = _context.Books
        .Where(p => p.Id == id)
        .Select(p => p.Title)
        .FirstOrDefault();

    return title;
}

Observe how we’re using to filter the data to proceed. then utilizing specific instructions to specify the data we want. choosing, then calling. Using FirstOrDefault will return the first row (or null if neither exists).

When we check the created SQL, we can see that this has now been successful and that the only field returning is Title.

exec sp_executesql N'SELECT TOP(1) [b].[Title]
FROM [Books] AS [b]
WHERE [b].[Id] = @__id_0',N'@__id_0 int',@__id_0=10

Extremely good.

Even though this is just a little example, maybe it will be clear how beneficial this is, especially for tables with many columns.

When you want data at the same time from several tables, you may also utilize this method. We’ll examine that next.

Using projection across multiple entities

Let’s imagine we want to create a method that provides information about a particular book’s title, author, and genre. We may wish to display this somewhere in a UI.

The structure we wish to return to the UI may be defined in a record that we have:

public record BookDetails(string Title, string Author, string Genre);

The data in this type can be returned in a range of formats.

To obtain what we want, one way is to enable lazy loading so that we may browse the entity properties:

public BookDetails? GetBookDetails(int id)
{
    Book? book = _context.Books.FirstOrDefault(p => p.Id == id);
    if (book is null)
    {
        return null;
    }
    return new BookDetails(book.Title, book.Author.LastName, book.Genre.Name);
}

The SQL suggests that this is not a good choice:

exec sp_executesql N'SELECT TOP(1) [b].[Id], [b].[AuthorId], [b].[GenreId], [b].[NumberOfPages], [b].[Summary], [b].[Title]
FROM [Books] AS [b]
WHERE [b].[Id] = @__id_0',N'@__id_0 int',@__id_0=57

exec sp_executesql N'SELECT [a].[Id], [a].[FirstName], [a].[LastName]
FROM [Authors] AS [a]
WHERE [a].[Id] = @__p_0',N'@__p_0 int',@__p_0=97

exec sp_executesql N'SELECT [g].[Id], [g].[Name]
FROM [Genres] AS [g]
WHERE [g].[Id] = @__p_0',N'@__p_0 int',@__p_0=25

To retrieve this information, we are making three calls to the database: once for the book, once for the author, and once for the genre. Lazy loading is by default disabled in Entity Framework Core because it is unacceptable.

Another choice is to use the same query to retrieve the book entity and “include” any related tables. similar to this

public BookDetails? GetBookDetails(int id)
{
    Book? book = _context.Books
        .Include(p => p.Author)
        .Include(p => p.Genre)
        .FirstOrDefault(p => p.Id == id);
    if (book is null)
    {
        return null;
    }
    return new BookDetails(book.Title, book.Author.LastName, book.Genre.Name);
}

Make sure of how we are calling. In order for these attributes to be filled during the first query, include in the linked entities. In this manner, we may avoid using lazy loading while maintaining the relevant attributes.

We can see from the SQL that even though we only used one query, it returned a lot of duplicated data:

exec sp_executesql N'SELECT TOP(1) [b].[Id], [b].[AuthorId], [b].[GenreId], [b].[NumberOfPages], [b].[Summary], [b].[Title], [a].[Id], [a].[FirstName], [a].[LastName], [g].[Id], [g].[Name]
FROM [Books] AS [b]
INNER JOIN [Authors] AS [a] ON [b].[AuthorId] = [a].[Id]
INNER JOIN [Genres] AS [g] ON [b].[GenreId] = [g].[Id]
WHERE [b].[Id] = @__id_0',N'@__id_0 int',@__id_0=57

Hence, this is also not a very good choice.

Lastly, we may specify the desired columns using projection. We are able to:

public BookDetails? GetBookDetails(int id)
{
    BookDetails? bookDetails = _context.Books
        .Where(p => p.Id == id)
        .Select(p => new BookDetails(p.Title, p.Author.LastName, p.Genre.Name))
        .FirstOrDefault();

    return bookDetails;
}

Once more, we are filtering by the book’s Id, choosing the columns we want, adding them straight to our BookDetails record, and then calling. To access the database and retrieve the first entry, use FirstOrDefault.

If we look at the SQL, everything seems much better:

exec sp_executesql N'SELECT TOP(1) [b].[Title], [a].[LastName], [g].[Name]
FROM [Books] AS [b]
INNER JOIN [Authors] AS [a] ON [b].[AuthorId] = [a].[Id]
INNER JOIN [Genres] AS [g] ON [b].[GenreId] = [g].[Id]
WHERE [b].[Id] = @__id_0',N'@__id_0 int',@__id_0=57

All that is required to create our BookDetails record is the Title, LastName, and Name fields, which we are only returning.

So, by using projection, we can group together various entities and obtain only the information we really need from the database.

Why is projection important?

You might be asking what advantages projection might provide.

The SQL queries definitely appear clearer, but does this truly change anything?

Imagine having a table with many more NVARCHAR(4000) columns than just a handful, each containing a sizable string of text. Maybe like this:

public class Article
{
    [Key]
    public int Id { get; set; }

    [MaxLength(100)]
    public string Name { get; set; }

    [MaxLength(4000)]
    public string Description { get; set; }

    [MaxLength(4000)]
    public string Summary { get; set; }

    [MaxLength(4000)]
    public string ExecutiveSummary { get; set; }

    [MaxLength(4000)]
    public string Comments { get; set; }

    [MaxLength(4000)]
    public string Contributions { get; set; }

    [MaxLength(4000)]
    public string Contents { get; set; }
}

Without utilizing projection, and if we were only required to retrieve the article name for a given id, we would perform the following query:

public string? GetArticleTitle(int id)
{
    Article? article = _context.Articles
        .Where(p => p.Id == id)
        .FirstOrDefault();

    return article?.Name;
}

That would generate the following SQL:

exec sp_executesql N'SELECT TOP(1) [a].[Id], [a].[Comments], [a].[Contents], [a].[Contributions], [a].[Description], [a].[ExecutiveSummary], [a].[Name], [a].[Summary]
FROM [Articles] AS [a]
WHERE [a].[Id] = @__id_0',N'@__id_0 int',@__id_0=3

We are, as you can see, returning all of those extra columns. The data

public string? GetArticleTitle(int id)
{
    return _context.Articles
        .Where(p => p.Id == id)
        .Select(p => p.Name)
        .FirstOrDefault();
}

retrieved from the database will be greater than necessary if they are all huge strings.

When I execute this query and switch on Client Statistics in SQL Server Management Studio (Shift+Alt+S), I can see that it delivers data in the range of 48KB:

If I were to utilize projection instead, I’d say:

The SQL query then just delivers the data I require:

exec sp_executesql N'SELECT TOP(1) [a].[Name]
FROM [Articles] AS [a]
WHERE [a].[Id] = @__id_0',N'@__id_0 int',@__id_0=3

And now, just the value 88B is returned when I perform this query and see the Client Statistics:

Imagine getting this call hundreds of times per second; the bandwidth you’ll be saving by utilizing projection will be considerable.

Yet, using projection can also result in smaller data sizes and faster execution times.

Let’s say we want a list of books written by a particular author. Without projection, we would first obtain the book entities by calling.ToList() to obtain the data, and then obtain the list of titles from our in-memory list:

public List<string> GetTitlesByAuthor(int authorId)
{
    List<Book> books = _context.Books
        .Where(p => p.AuthorId == authorId)
        .ToList();

    return books
        .Select(p => p.Title)
        .ToList();
}

The produced SQL looks like this:

exec sp_executesql N'SELECT [b].[Id], [b].[AuthorId], [b].[GenreId], [b].[NumberOfPages], [b].[Summary], [b].[Title]
FROM [Books] AS [b]
WHERE [b].[AuthorId] = @__authorId_0',N'@__authorId_0 int',@__authorId_0=57

Doing this and looking at the execution plan will show us how SQL is handling this request.

Basically, it’s a Clustered Index Scan. Also, a warning has been sent, as can be seen. We are reading more rows than we actually need to, according to this warning:

When presented with an execution plan like this, we would typically examine this table’s indexes to see if we could make any improvements.

We already have an index on AuthorId: if we look at the indexes that Entity Framework added by default.

CREATE NONCLUSTERED INDEX [IX_Books_AuthorId] ON [dbo].[Books]
(
 [AuthorId] ASC
)

We could wonder, why isn’t this index being used.

The reason for this is that since all of the table’s columns are being returned, SQL concluded that it may as well utilise the Clustered Index rather than having to search up each one separately. That’s not good since each time this query is run, we get much too many reads on this table.

Let’s modify our code to use projection as follows:

public List<string> GetTitlesByAuthor(int authorId)
{
    return _context.Books
        .Where(p => p.AuthorId == authorId)
        .Select(p => p.Title)
        .ToList();
}

Now, our SQL appears as follows:

exec sp_executesql N'SELECT [b].[Title]
FROM [Books] AS [b]
WHERE [b].[AuthorId] = @__authorId_0',N'@__authorId_0 int',@__authorId_0=57

Our strategy for execution is as follows:

It is, well, the same technique. Let’s take another look at our index, which is just accessing the AuthorId field and not the Title column.

Let’s add a Title column to this index in the following manner:

CREATE NONCLUSTERED INDEX [IX_Books_AuthorId] ON [dbo].[Books]
(
 [AuthorId] ASC
)
INCLUDE([Title])

Currently, the execution plan appears as follows when we perform the query:

Significantly improved. Since the AuthorId I’m testing with only contains 14 books, we are just reading the 14 rows that are necessary. Due to the fact that we’re not creating too many reads on this table, this is precisely what we want.

Why couldn’t we have optimized the initial non-projection query in this way, one could wonder? Yes, we could have done that, but the index would have had to include all of the table’s columns. It’s conceivable, although not optimal, to do that. This implies that every time this table is updated, this index will also need to be updated. If the table has a lot of data, this index will also be much bigger than it needs to be. Also, anytime a new column is added to the table, you’ll need to keep in mind to change the definition of this index. Finally, because indexes have a maximum row size, you could discover that you can’t insert a new row based on the data in your columns.

While projection makes it simpler for SQL to create better execution plans and gives you more alternatives for your indexing strategy, it is much preferable to be able to utilize an indexing strategy that is suited to the data you really require.

Projecting into other buildings

As was said before, a query can be projected into built-in types:

public List<string> GetTitlesByAuthor(int authorId)
{
    return _context.Books
        .Where(p => p.AuthorId == authorId)
        .Select(p => p.Title)
        .ToList();
}

We’ve also shown that projections into records are possible:

public record BookDetails(string Title, string Author, string Genre);

public BookDetails? GetBookDetails(int id)
{
    BookDetails? bookDetails = _context.Books
        .Where(p => p.Id == id)
        .Select(p => new BookDetails(p.Title, p.Author.LastName, p.Genre.Name))
        .FirstOrDefault();

    return bookDetails;
}

We could also project into classes:

public class BookDetails
{
    public string Title { get; set; }
    public string Author { get; set; }
    public string Genre { get; set; }

    public BookDetails(string title, string author, string genre)
    {
        Title = title;
        Author = author;
        Genre = genre;
    }
}

public BookDetails? GetBookDetails(int id)
{
    BookDetails? bookDetails = _context.Books
        .Where(p => p.Id == id)
        .Select(p => new BookDetails(p.Title, p.Author.LastName, p.Genre.Name))
        .FirstOrDefault();

    return bookDetails;
}

But, these are not our only choices. We can also expand into arbitrary objects:

var bookDetails = _context.Books
    .Where(p => p.Id == id)
    .Select(p => new
    {
        Title = p.Title,
        Author = p.Author.LastName,
        Genre = p.Genre.Name
    })
.FirstOrDefault();

string? title = bookDetails?.Title;
string? author = bookDetails?.Author;
string? genre = bookDetails?.Genre;

Furthermore, tuples can be used:

public Tuple<string, string, string> GetBookDetails(int id)
{
    var bookDetails = _context.Books
        .Where(p => p.Id == id)
        .Select(p => new Tuple<string, string, string>(p.Title, p.Author.LastName, p.Genre.Name))
        .FirstOrDefault();

    return bookDetails;
}

Therefore, we’ve got a lot of possibilities when using projection.

Submit a Comment

Your email address will not be published. Required fields are marked *

Subscribe

Select Categories