Welcome back! In our previous lessons, we covered basic data retrieval operations like GetItem
and BatchGetItem
. Today, we’re advancing to more complex retrieval methods—Query
and Scan
. These operations are vital for efficiently managing and querying large datasets within DynamoDB. We'll be using the Books
table for our examples, which has a primary key consisting of year
(partition key) and title
(sort key).
The Query
operation in DynamoDB allows you to retrieve multiple items by specifying exact or conditional matches on the primary key attributes. For example, you can retrieve all books published in a specific year or filter further by titles within that year using sort key conditions.
The Scan
operation examines every item in a table and returns all data attributes by default. While powerful, Scan
can be inefficient and costly in terms of read capacity units, especially for large tables.
In DynamoDB, the Query
operation uses KeyConditionExpression
to find specific items. This is a string that outlines certain key conditions.
You can write the KeyConditionExpression
string manually:
Python1response = table.query( 2 KeyConditionExpression='year = :year_val', 3 ExpressionAttributeValues={':year_val': 2018} 4) 5print("Results:", response['Items'])
Alternatively, use the Key
helper class to build expressions:
Python1response = table.query( 2 KeyConditionExpression=Key('year').eq(2018) & Key('title').begins_with('The') 3) 4print("Results:", response['Items'])
This class handles various comparison operators like eq
, lt
, lte
, gt
, gte
, begins_with
, and between
.
Additional parameters refine the query:
Python1response = table.query( 2 KeyConditionExpression=Key('year').eq(2015), 3 ProjectionExpression="#yr, title, author", 4 ExpressionAttributeNames={"#yr": "year"}, 5 ConsistentRead=True 6) 7print("Results:", response['Items'])
In this example, ProjectionExpression
specifies the attributes to retrieve, while ExpressionAttributeNames
solves conflicts between attribute names and DynamoDB reserved words.
A simple Scan
reads every item in the entire table and returns all attributes for each item:
Python1response = table.scan() 2print("Results:", response['Items'])
To more efficiently scan and filter out unnecessary items from the results, use a FilterExpression
:
Python1response = table.scan( 2 FilterExpression="contains(title, :title_val) AND year = :year_val", 3 ExpressionAttributeValues={':title_val': 'Guide', ':year_val': 2017} 4) 5print("Results:", response['Items'])
Filter expressions can use logical operators like AND
, OR
, and NOT
to combine conditions and provide more refined results.
You can also use the DynamoDB boto3
helper class Attr
to build the filter expression:
Python1from boto3.dynamodb.conditions import Attr 2 3response = table.scan( 4 FilterExpression=Attr('title').contains('Guide') & Attr('year').eq(2017) 5) 6print("Results:", response['Items'])
This class offers various comparison methods (like eq
, ne
, lt
, lte
, gt
, gte
, begins_with
, between
, contains
) which let you construct complex filter expressions and save on read capacity by filtering out unnecessary items right in the AWS Cloud.
"Query" Limitations:
- Items Retrieved: You can retrieve up to 1 MB of data in a single
Query
operation. If your query retrieves more than 1 MB, additional queries are needed to retrieve the rest of the results (pagination). - Index Usage:
Query
requires that you use an index (either the primary key or a secondary index). - Performance: Provides quick access to items using the table's primary key or a secondary index.
"Scan" Limitations:
- Items Retrieved: Like
Query
,Scan
can retrieve up to 1 MB of data per operation. For larger datasets, multiple scans are needed (pagination). - Resource Consumption: Scans can consume significant read capacity by scanning every item in a table or a secondary index, regardless of the number of items returned.
- Efficiency: Generally less efficient than queries because it does not use indexes to filter results.
Both operations support pagination, but detailed coverage of this topic is beyond the scope of this course.
In today's lesson, you've learned advanced techniques for retrieving multiple items from DynamoDB using Query
and Scan
operations. These methods enable you to efficiently interact with large datasets, providing flexibility depending on your application's needs.
As you move forward, practice these methods through exercises that challenge you to apply them in various scenarios. This will help you develop a deeper understanding of DynamoDB's capabilities and prepare you for more advanced database management tasks. Keep exploring and enhancing your skills!