Laravel Chunk Database Records – But Beware!

In this post, we’re gonna be looking at how to process large amounts of database records with Laravel without running into memory or locking issues.

And that is – by chunking the data.

We’ll implement Laravel’s chunking methods and we’re gonna be doing that by creating a simple Laravel Command that will update large amounts of records from the Users table.

But there’s a trick, a common pitfall that occurs when updating database records while chunking the results.

It is very easy to overlook it and it has an unpleasant impact on the process of chunking.

So make sure to read the whole post or watch the video down below up until the end to find out how to solve this problem.

Let’s begin!

 

 

Populating the database

Like I said earlier, we’re gonna manipulate thousands of database records so we have to add some data to our local database.

To do that, I’m using the DatabaseSeeder to add ten thousand records to the Users table.

Let’s run the seeder.

php artisan db:seed

Let’s inspect the database to make sure that the seeder inserted all the records we need.

Great! Now that we have ten thousand records, let’s check out the Console Command.

The UpdateUsersCommand

This is a pretty basic Command.

We create a query that gets all the users that don’t have the two-factor authentication key – that means all of them.

We then count how many records we have, if we don’t have any, print that there are no users to be updated.

If we have the Users, then we proceed to update them.

And to know how fast it executes, we’re using a progress bar that we’re constantly updating based on our iterations.

Now the important part: the chunking…

Laravel provides a handful of methods but we’re focusing on chunk() and chunkById().

Both require the same parameters.

We specify how big each chunk should be, and feed each chunk into a Closure for processing.

Then we go ahead and update each user.

Let’s run the Command and see if this is working properly.

php artisan Users:update

Now check the Database and browse through all the data.

And we have an issue.

Only half of the users have been updated!

This is because of how the chunk() method behaves when the database records are being updated or deleted while chunking.

Understanding how chunk() works

Let’s search under the hood and find out how the query for chunk works.

https://github.com/illuminate/database/blob/master/Concerns/BuildsQueries.php#L40

So from what I understand by reading this code, the chunk method is paginating through results.

Let’s inspect the forPage() method to find out more.

https://github.com/laravel/framework/blob/8.x/src/Illuminate/Database/Query/Builder.php#L2118

Aha! So each chunk is being fetched via a single query with LIMIT and OFFSET.

Alright, that means Laravel paginates through the dataset 1 page at a time.

And since we’re updating – but we could be also deleting – the records in the set, Laravel effectively skips a page of data on each iteration, therefore we end up missing roughly half the data that was in the original query.

Let’s take the following scenario – there are 24 records that you wish to delete in chunks of 10:

Expected result:

IterationEloquent queryRows returned to the callback
Iteration 1OFFSET 0 LIMIT 1010
Iteration 2OFFSET 10 LIMIT 1010
Iteration 3OFFSET 20 LIMIT 104

Actual result:

IterationEloquent queryRows returned to the callback
Iteration 1OFFSET 0 LIMIT 1010
Iteration 2OFFSET 10 LIMIT 104
Iteration 3NONENONE

After the first iteration, there were only 14 records left, so when Laravel fetched Page 2, it only found 4 records.

The result is that 14 records out of 24 were deleted.

I hope the way the chunk method works is now a bit more clearer.

Fix the chunk issue

Now we need to fix the issue. And it’s a very simple but powerful fix.

Instead of querying using LIMIT and OFFSET, we should be querying by ID and LIMIT.

And Laravel has a built-in method just for that.

It’s chunkById().

chunkById() is similar to chunk(). The only difference is how the raw SQL is being constructed.

chunkById() uses ID and LIMIT, whereas chunk() uses LIMIT and OFFSET.

And in this way, we make sure to iterate over each resource without missing any.

Change the chunk() method to chunkById() at Line 45 and let’s see it in action.

Truncate the Users table, run the seeder, and then the Command.

All the users should now be updated properly!

Final words

Alright, I hope you now have a better understanding of how chunk() and chunkById() work and I hope this post (and maybe video too) helped you in your endeavor to build faster and more reliable apps.

I hope you’ve found this tutorial useful and let me know what you think about it in comments below, or on Twitter at @pelu_carol.

 

If you find this article helpful, please share it with others and subscribe to the blog to support me, and receive a bi-monthly-ish e-mail notification on my latest articles.

Spread the love
  •  
  •  
  •  
  •  
  •  
  •  
  •  
  •  
  •  
  •  
  •  
  •  
  •  
  •  
  •  
  •  
  •  
  •  
  •  
  •  
  •  
  •  

You may also like...