Welcome! Today, we will learn about an exciting Python feature called generators. Generators help us write memory-efficient code, which is crucial when handling large amounts of data. By the end of this lesson, you'll understand what generators are, how to create and use them in Python, and why they make your code more efficient when working with large datasets.
Imagine you have a vast book. You would read it one sentence at a time rather than trying to hold the whole book in your head at once. Generators in Python work similarly: they let us handle large data collections one item at a time.
Generators are special Python functions that produce items one at a time, which helps process large datasets efficiently. A normal function that returns a list must build the entire list and hold every item in memory at once. A generator instead yields items one by one, so it uses far less memory and lets you start working with the first item before the rest have been computed.
Here's a quick comparison:
- A normal function returns all items at once (like a big bag of candies).
- A generator yields one item at a time (giving one candy at a time).
Let's look at a normal function that returns a list of numbers from 1 to 5.
```python
def normal_function():
    return [1, 2, 3, 4, 5]

def main():
    numbers = normal_function()
    for number in numbers:
        print(number)  # Output: 1, 2, 3, 4, 5

if __name__ == "__main__":
    main()
```
This function returns a list of five numbers. Now, let's see how generators do it differently.
Here is a `simple_generator` function:
```python
def simple_generator():
    yield 1
    yield 2
    yield 3
    yield 4
    yield 5
```
The `simple_generator` function uses the `yield` keyword. Unlike `return`, which exits the function, `yield` pauses the function and saves its state.
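A generator doesn't have to list every value by hand. Here is a sketch with a hypothetical `count_up_to` generator (not part of the lesson) that yields from inside a loop; the local variable `count` keeps its value every time the function pauses.

```python
def count_up_to(limit):
    """Yield the integers 1..limit, one per request."""
    count = 1  # local state that survives between yields
    while count <= limit:
        yield count  # pause here; 'count' keeps its value
        count += 1   # execution resumes here on the next request

counter = count_up_to(3)
print(next(counter))  # 1
print(next(counter))  # 2
print(next(counter))  # 3
```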
The `yield` keyword is what makes a function a generator. Here’s what happens when a function containing `yield` is called:
- Creates a Generator Object: Instead of running the function body, the call returns a generator object.
- Runs Until `yield`: When the generator's `__next__()` method is called (e.g., via the built-in `next()` function or a `for` loop), the function body runs until it hits `yield`.
- Saves State and Returns Value: The function pauses at `yield`, saves its current state (local variables and execution point), and returns the yielded value.
- Resumes from Last State: When `__next__()` is called again, execution resumes right after the last `yield` statement, with all its variables intact.
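To see the steps above in action, here is a sketch using a hypothetical `traced_generator` that prints a message as it runs. Notice that calling the function prints nothing; the body only starts on the first `next()`.

```python
def traced_generator():
    print("Started running")
    yield "first"
    print("Resumed after first yield")
    yield "second"

gen = traced_generator()  # nothing is printed: the body hasn't started yet
print(type(gen).__name__)  # generator

print(next(gen))
# Started running
# first

print(next(gen))
# Resumed after first yield
# second
```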
To run the generator, we call `simple_generator()` to create a generator object and then iterate through it using a `for` loop.
```python
def simple_generator():
    yield 1
    yield 2
    yield 3
    yield 4
    yield 5

# Initializing the generator
generator = simple_generator()

# Using a for loop to go through each item
for number in generator:
    print(number)  # Output: 1, 2, 3, 4, 5
```
When the loop runs, the generator yields `1`, the loop prints `1`, and the generator pauses. On the next iteration, the generator resumes and yields `2`, the loop prints `2`, and so on.
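One consequence of this pause-and-resume behavior is that a generator object can be consumed only once. Here is a sketch using a shortened, three-item version of `simple_generator`:

```python
def simple_generator():
    yield 1
    yield 2
    yield 3

generator = simple_generator()

first_pass = list(generator)   # consumes every item
second_pass = list(generator)  # the generator is already used up

print(first_pass)   # [1, 2, 3]
print(second_pass)  # []

# To iterate again, create a fresh generator object by calling the function again.
print(list(simple_generator()))  # [1, 2, 3]
```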
Another way to step through a generator is the built-in `next()` function.
```python
def simple_generator():
    yield 1
    yield 2
    yield 3
    yield 4
    yield 5

# Initializing the generator
generator = simple_generator()

# Using next to go through generator's items:
print(next(generator))  # 1
print(next(generator))  # 2
print(next(generator))  # 3
print(next(generator))  # 4
print(next(generator))  # 5
```
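Each call to `next(generator)` invokes the generator object's `__next__()` method, which resumes the function until it reaches the next `yield` and returns that value. After the last `yield`, one more call to `next()` raises a `StopIteration` exception; a `for` loop catches this exception for you and simply ends.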
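Roughly speaking, a `for` loop calls `next()` for you and stops when `StopIteration` is raised. The sketch below spells that out with a `while` loop; it is a simplified approximation of what Python does, not its exact implementation. It also shows the optional second argument to `next()`, a default value returned instead of raising `StopIteration`.

```python
def simple_generator():
    yield 1
    yield 2
    yield 3

generator = simple_generator()
collected = []

# Approximately what "for number in generator:" does behind the scenes.
while True:
    try:
        number = next(generator)
    except StopIteration:
        break  # the generator has finished; stop looping
    collected.append(number)

print(collected)  # [1, 2, 3]

# With a default, next() returns it instead of raising StopIteration.
print(next(generator, "done"))  # done
```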
Because generators produce one item at a time, they can process large datasets without loading everything into memory. For example, a generator can read a huge log file line by line instead of loading the whole file at once.
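To get a feel for the memory difference, here is a sketch that produces one million squares both ways and compares the container sizes with `sys.getsizeof`. The names `squares_list` and `squares_gen` are illustrative, and the exact byte counts depend on your Python version and platform.

```python
import sys

def squares_list(n):
    """Normal function: builds and returns the whole list at once."""
    return [i * i for i in range(n)]

def squares_gen(n):
    """Generator function: produces one square at a time."""
    for i in range(n):
        yield i * i

big_list = squares_list(1_000_000)
lazy_gen = squares_gen(1_000_000)

# sys.getsizeof reports only the container's own size (not the int objects
# it references), but that is enough to show the difference in scale.
print(sys.getsizeof(big_list))  # several megabytes
print(sys.getsizeof(lazy_gen))  # small, and does not grow with n

# Both produce the same values when consumed.
print(sum(big_list) == sum(lazy_gen))  # True
```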
Let's look at a generator function that reads a large file line by line. Before we dive in, recall that the `open` function opens a file and returns a file object. It is typically used in a `with` statement for proper resource management, which ensures the file is closed once the block finishes executing.
```python
def read_large_file(file_path):
    with open(file_path, 'r') as file:
        for line in file:
            yield line

# Using the generator
for line in read_large_file('big_log.txt'):
    print(line.strip())  # Output: each line of the file, stripped of leading/trailing whitespace
```
The lesson playground doesn't include a `big_log.txt` file, so you won't be able to run this snippet there. However, an upcoming practice will have you build a log file generator that works with an actual log file.
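If you want to try `read_large_file` on your own machine, here is a sketch that writes a small sample log to a temporary file and chains the generator with a second, hypothetical filter generator called `error_lines`. The sample log contents are made up for illustration.

```python
import os
import tempfile

def read_large_file(file_path):
    """Yield the lines of a file one at a time (same as the lesson's version)."""
    with open(file_path, 'r') as file:
        for line in file:
            yield line

def error_lines(lines):
    """Hypothetical filter stage: pass through only lines containing 'ERROR'."""
    for line in lines:
        if "ERROR" in line:
            yield line.strip()

# Create a small sample log file to stand in for big_log.txt.
with tempfile.NamedTemporaryFile('w', suffix='.log', delete=False) as tmp:
    tmp.write("INFO service started\n")
    tmp.write("ERROR disk full\n")
    tmp.write("INFO retrying\n")
    tmp.write("ERROR disk still full\n")
    sample_path = tmp.name

try:
    # Each stage pulls one line at a time from the stage before it.
    errors = list(error_lines(read_large_file(sample_path)))
    print(errors)  # ['ERROR disk full', 'ERROR disk still full']
finally:
    os.remove(sample_path)  # clean up the temporary file
```

Here `list()` collects the results only because the sample is tiny; with a genuinely large file you would loop over `error_lines(...)` directly so that only one line is held in memory at a time.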
Sometimes, you may want to create a generator on the fly without a full function definition. Python provides a shorthand for creating simple generators called generator expressions.
A generator expression looks very similar to a list comprehension but uses parentheses instead of square brackets.
Here is an example:
```python
# List comprehension
numbers_list = [x * x for x in range(5)]
print(numbers_list)  # Output: [0, 1, 4, 9, 16]

# Generator expression
numbers_generator = (x * x for x in range(5))

# Using the generator
for number in numbers_generator:
    print(number)  # Output: 0, 1, 4, 9, 16
```
Generator expressions are a compact, memory-efficient way to produce sequences on the fly: like generator functions, they compute each value only when it is requested.
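Generator expressions are often passed straight to functions that consume an iterable, such as `sum()`. A short sketch:

```python
# A generator expression passed as the only argument needs no extra parentheses.
total = sum(x * x for x in range(5))
print(total)  # 30

# Values are computed only when requested.
squares = (x * x for x in range(5))
print(next(squares))  # 0
print(next(squares))  # 1
print(list(squares))  # [4, 9, 16] (only the remaining values)
```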
Great job! You've learned a lot about generators today. Let’s recap:
- Generators are special functions that yield items one at a time.
- They use the `yield` keyword to pause execution and later resume with their local state intact.
- Generators are memory-efficient because they never hold the whole sequence at once, which matters most with large datasets.
- Generator expressions provide a compact way to create generators.
Now, it's time to put what you have learned into practice. You will create and use your own generators in different scenarios. This hands-on practice will help you solidify your understanding of how generators can make your code more efficient and powerful. Good luck!