Why do I need batched?

Sometimes you have a large dataset or iterable that you want to process in smaller chunks rather than as a single long sequence. Instead of manually slicing or grouping elements, the batched function from the itertools module (available since Python 3.12) lets you split an iterable into batches of a specified size. This is especially useful for tasks like data processing or interacting with APIs that limit how many items you can send or receive per request.

What does batched do?

The batched function takes two main arguments:

  • iterable: The elements you want to chunk. Any iterable works, not only sequences such as lists.
  • n: The size of each batch (i.e., how many items you want per batch). It must be at least 1; otherwise a ValueError is raised.

The function returns an iterator that yields tuples containing the elements of each batch. If the iterable doesn’t divide evenly, the last batch might contain fewer elements.
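To make the behavior described above concrete, here is a minimal pure-Python sketch of the same idea. The name batched_sketch is hypothetical and chosen for illustration; the real function is implemented in C, so treat this as an approximation of its behavior rather than its actual source.

```python
from itertools import islice


def batched_sketch(iterable, n):
    """Yield tuples of up to n items (hypothetical stand-in for itertools.batched)."""
    if n < 1:
        raise ValueError("n must be at least one")
    iterator = iter(iterable)
    # islice pulls at most n items per pass; an empty tuple means
    # the underlying iterator is exhausted, which ends the loop.
    while batch := tuple(islice(iterator, n)):
        yield batch


print(list(batched_sketch("ABCDEFG", 3)))
# [('A', 'B', 'C'), ('D', 'E', 'F'), ('G',)]
```

A sketch like this can also serve as a fallback on Python versions older than 3.12, where itertools.batched is not available.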

Example 1: Basic Usage

Let's say you have a list of items, and you want to process them in batches of 3. You can use batched like so:

import itertools

data = ['A', 'B', 'C', 'D', 'E', 'F', 'G']
result = list(itertools.batched(data, 3))

print(result)

Output:

[('A', 'B', 'C'), ('D', 'E', 'F'), ('G',)]

In this example, the input list is chunked into three groups. The last group contains only one element since the list isn’t evenly divisible by 3.

Example 2: Handling Incomplete Batches

If you're working with a strict batch size and don’t want the final batch to be smaller than n, you can set the strict parameter to True. This keyword-only parameter was added in Python 3.13; it raises a ValueError if the final batch is incomplete.

import itertools

data = ['A', 'B', 'C', 'D', 'E', 'F', 'G']
result = list(itertools.batched(data, 3, strict=True))

print(result)

Output:

ValueError: batched(): incomplete batch

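This error occurs because the last batch would be incomplete (containing just 'G'), and the strict=True flag enforces that all batches must be full. The full batches are still produced first; the error is raised only when iteration reaches the short final batch.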
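If raising an error is too harsh and you would rather pad the final batch to full size, one alternative is the classic zip_longest grouping recipe. This is a different technique from batched, shown here only as a sketch; batched_padded and the '-' fill value are hypothetical names and choices made for this example.

```python
from itertools import zip_longest


def batched_padded(iterable, n, fillvalue=None):
    """Group items into tuples of exactly n, padding the last one (hypothetical helper)."""
    # n references to the *same* iterator, so each output tuple
    # draws n consecutive items from the input.
    iterators = [iter(iterable)] * n
    return zip_longest(*iterators, fillvalue=fillvalue)


data = ['A', 'B', 'C', 'D', 'E', 'F', 'G']
print(list(batched_padded(data, 3, fillvalue='-')))
# [('A', 'B', 'C'), ('D', 'E', 'F'), ('G', '-', '-')]
```

Padding keeps every batch the same length, at the cost of having to recognize and skip the fill value downstream.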

Example 3: Working with Larger Data

The batched function is also useful when working with large datasets. Because it returns a lazy iterator, it builds only one batch at a time, so you can loop over the batches without loading the whole input into memory. This is especially beneficial when the source is a generator, a file, or another stream. The example below wraps the result in list() only to display every batch at once.

import itertools

data = range(10)
result = list(itertools.batched(data, 4))

print(result)

Output:

[(0, 1, 2, 3), (4, 5, 6, 7), (8, 9)]

Here, we have a range of numbers from 0 to 9, and we divide them into batches of 4.
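To see that batches are produced lazily, the sketch below feeds batched a generator that records each item as it is produced. The numbers generator and the produced list are hypothetical names used only for this demonstration, and the fallback definition is an assumption added so the snippet also runs on Python versions before 3.12.

```python
try:
    from itertools import batched  # available in Python 3.12+
except ImportError:
    from itertools import islice

    def batched(iterable, n):
        # Fallback sketch for older Python versions (not the real implementation).
        iterator = iter(iterable)
        while batch := tuple(islice(iterator, n)):
            yield batch

produced = []  # records which items the source has generated so far


def numbers(limit):
    """Hypothetical data source that logs each item as it is produced."""
    for i in range(limit):
        produced.append(i)
        yield i


batches = batched(numbers(10), 4)

first = next(batches)
print(first)     # (0, 1, 2, 3)
print(produced)  # [0, 1, 2, 3] -> only the first batch has been read

# Process the remaining batches one at a time.
totals = [sum(batch) for batch in batches]
print(totals)    # [22, 17]
```

Only four items have been pulled from the source after the first batch, which is why this pattern suits inputs that are too large to hold in memory at once.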

Conclusion

batched from the itertools module is a simple but powerful function for dividing iterables into smaller chunks. Whether you're processing data in smaller pieces or preparing paginated API requests, batched replaces hand-written slicing and grouping loops with a single call, which keeps your code shorter and easier to read. If you're working with large datasets or need to process items in groups, this is a tool you’ll want in your toolkit.
