pagination and python 3.8 assignment expressions in AWS boto3

By Brian Fitzgerald

Introduction

AWS query operations return their results either all at once or in smaller, manageable pages. boto3 provides the programmer with two ways to manage the retrieval of paginated results.

Python 3.8 introduces assignment expressions, a way to assign to a variable within an expression using the notation NAME := expr. PEP 572 explains how assignment expressions are useful in control statements, such as if and while, and in comprehensions. Assignment expressions can also improve the style of pagination code.
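Here is a minimal illustration of the two uses PEP 572 highlights, a while condition and a comprehension. It needs no AWS access. The helper names read_chunks and parse_int and the sample data are hypothetical and exist only for this demonstration.

```python
import io


def read_chunks(stream, size=4):
    """Collect fixed-size chunks; the walrus assigns and tests in one step."""
    chunks = []
    # Assign the next chunk to `chunk`, then test it; an empty string ends the loop.
    while chunk := stream.read(size):
        chunks.append(chunk)
    return chunks


def parse_int(text):
    """Hypothetical helper: return an int, or None if text is not all digits."""
    return int(text) if text.isdigit() else None


words = ['10', 'x', '25', '', '7']
# Call parse_int once per word, keep its result in `n`, and filter on it.
numbers = [n for w in words if (n := parse_int(w)) is not None]

print(read_chunks(io.StringIO('abcdefghij')))  # ['abcd', 'efgh', 'ij']
print(numbers)  # [10, 25, 7]
```

Without the assignment expression, the comprehension would either call parse_int twice per word or need a separate loop.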

describe

To query an AWS service, you call an API method whose name begins with “describe”, or sometimes “list” or “get”. The result is a dictionary with at least two elements: the payload and ResponseMetadata. If more results are pending, a NextToken element also appears. The payload is your query result and consists of a list of objects. Examples of payload data include EC2 instance Reservations, EC2 EBS Volumes, and RDS DBInstances.

Using python API boto3, you can write:

import boto3
ec2 = boto3.client('ec2')
resp = {'NextToken': ''}  # dummy token so the loop body runs at least once
while 'NextToken' in resp:  # NextToken is absent from the last page
    resp = ec2.describe_instances(
        NextToken = resp['NextToken']
    )
    for resv in resp['Reservations']:
        for inst in resv['Instances']:
            print(inst['InstanceId'])

resp is a dict, and appears five times in the above code:

  • initialization with a dummy NextToken
  • tested for NextToken in the while condition
  • returned from describe_instances
  • used to get describe_instances argument NextToken
  • used to get Reservations

The dictionary keys are Reservations, ResponseMetadata, and when needed, NextToken.
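The dummy initialization exists only to get into the loop, and it relies on the service accepting an empty NextToken on the first call. A sketch of an alternative: make the first call with no NextToken and add it only when the previous response carried one. To keep it runnable without AWS credentials, fake_describe_instances is a hypothetical stand-in serving canned responses in the describe_instances shape; with boto3 you would pass ec2.describe_instances instead.

```python
def iterate_pages(call, **kwargs):
    """Yield each response from a NextToken-style API call.

    `call` is any function shaped like a boto3 describe_* method.
    """
    while True:
        resp = call(**kwargs)
        yield resp
        if 'NextToken' not in resp:
            return
        # Resume where the previous response left off.
        kwargs['NextToken'] = resp['NextToken']


# Hypothetical canned responses standing in for AWS: three pages, two tokens.
_PAGES = {
    None: {'Reservations': [{'Instances': [{'InstanceId': 'i-1'}]}],
           'NextToken': 't1'},
    't1': {'Reservations': [{'Instances': [{'InstanceId': 'i-2'},
                                           {'InstanceId': 'i-3'}]}],
           'NextToken': 't2'},
    't2': {'Reservations': [{'Instances': [{'InstanceId': 'i-4'}]}]},
}


def fake_describe_instances(**kwargs):
    """Hypothetical stand-in for ec2.describe_instances."""
    return _PAGES[kwargs.get('NextToken')]


for resp in iterate_pages(fake_describe_instances):
    for resv in resp['Reservations']:
        for inst in resv['Instances']:
            print(inst['InstanceId'])
```

The first request carries no token at all, so no placeholder value is sent to the service.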

python 3.8 assignment expression

In Python 3.8, we can assign to a variable and use the assignment itself as an expression. Specifically, we can assign resp and access its Reservations element in a single expression.

import boto3
ec2 = boto3.client('ec2')
resp = {'NextToken': ''}  # dummy token so the loop body runs at least once
while 'NextToken' in resp:
    for resv in (resp := ec2.describe_instances(  # assign resp, then index it
        NextToken = resp['NextToken']
    ))['Reservations']:
        for inst in resv['Instances']:
            print(inst['InstanceId'])

There is one fewer line. Some may find the revised code appealing.

paginator

AWS boto3 provides paginators. A paginator's paginate() method returns an iterable of response pages and handles NextToken for you.

import boto3
ec2 = boto3.client('ec2')
pagr = ec2.get_paginator('describe_instances')
for page in pagr.paginate():  # one dict per underlying API response
    for resv in page['Reservations']:
        for inst in resv['Instances']:
            print(inst['InstanceId'])

There are two fewer lines than in the assignment-expression version. NextToken does not appear in the code.

Each page yielded by paginate() has the same structure as a response from the corresponding “describe” method: a dict with keys Reservations, ResponseMetadata, and when needed, NextToken. However, the page's NextToken has no role in the new code.
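The nested loops over pages, reservations, and instances can be factored into a generator. This is a sketch assuming pages shaped like describe_instances responses; instance_ids and the sample pages are hypothetical, and with boto3 you would pass pagr.paginate() as the argument.

```python
from itertools import chain


def instance_ids(pages):
    """Yield every InstanceId from an iterable of describe_instances pages."""
    reservations = chain.from_iterable(page['Reservations'] for page in pages)
    for resv in reservations:
        for inst in resv['Instances']:
            yield inst['InstanceId']


# Hypothetical pages, as a paginator might yield them.
sample_pages = [
    {'Reservations': [{'Instances': [{'InstanceId': 'i-a'}]}], 'NextToken': 'x'},
    {'Reservations': [{'Instances': [{'InstanceId': 'i-b'},
                                     {'InstanceId': 'i-c'}]}]},
]

print(list(instance_ids(sample_pages)))  # ['i-a', 'i-b', 'i-c']
```

botocore's page iterator also offers a search() method that applies a JMESPath expression across all pages, for example pagr.paginate().search('Reservations[].Instances[].InstanceId').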

paginator relationship to API call

A debug trace of describe_instances() shows an underlying call to _api_call() in botocore's client.py, line 357.

(Pdb) where
  /usr/lib64/python3.7/bdb.py(585)run()
-> exec(cmd, globals, locals)
  <string>(1)<module>()
  /home/ec2-user/git/aws/insts.desc.3.7.py(8)<module>()
-> NextToken = resp['NextToken']
  /usr/local/lib/python3.7/site-packages/botocore/client.py(357)_api_call()
-> return self._make_api_call(operation_name, kwargs)

A debug trace of the paginator approach shows a call to the same point, reached via paginate.py.

(Pdb) where
  /usr/lib64/python3.7/bdb.py(585)run()
-> exec(cmd, globals, locals)
  <string>(1)<module>()
  /home/ec2-user/git/aws/insts.page.3.7.py(6)<module>()
-> for page in pagr.paginate():
  /usr/local/lib/python3.7/site-packages/botocore/paginate.py(255)__iter__()
-> response = self._make_request(current_kwargs)
  /usr/local/lib/python3.7/site-packages/botocore/paginate.py(332)_make_request()
-> return self._method(**current_kwargs)
  /usr/local/lib/python3.7/site-packages/botocore/client.py(357)_api_call()

You can see where paginate is handling NextToken for you. For example, this code:

NextToken = resp['NextToken']

is handled behind the scenes as:

> /usr/local/lib/python3.7/site-packages/botocore/paginate.py(303)__iter__()
-> previous_next_token = next_token
(Pdb) p next_token
{'NextToken': 'eyJ2IjoiMiIsImMiOiJ5eHV0K011N2crQlBBaFhoSWU2SUpad0c3V3VaUFBvKzBPbDRIWFAvaXJSb3poeDFDNks3TkxGMkU5R1UxRjk4UlVnNFViRzNjSUlWWXhMbHk3ejU1Qjd1ZGhERHNBVktCR1g0cW5RZk9FdStZckViM0NjOXljV1p0SWplckhkV2ZISkNvc0NXdjhnMXA4RVBMWDFiVzNkS3k1NW5CdlZmUlhWUEpzeUZNbnhMS3VzdEo4eHFIWHRYNytpcEdWbHJKMFRqTlNLQ3A0Rk9VaEZGckdBMTVOYU44WGhvYkYyZVBBYjRrMVVaYXNFTCIsInMiOiIxIn0='}

The describe and paginate approaches are functionally equivalent: both end up in the same _api_call() function and pass the same NextToken values, so neither has an inherent performance advantage. Using paginate leads to cleaner code.
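The equivalence claim can be checked in miniature. In this toy sketch, MiniPaginator is a hypothetical, much-simplified stand-in for a botocore paginator (not its actual implementation), and CountingFake is a hypothetical fake API method that counts how often it is called. The hand-written loop and the paginator-style loop collect the same items with the same number of calls.

```python
class CountingFake:
    """Hypothetical stand-in for a describe_* method that counts its calls."""

    def __init__(self, pages):
        self.pages = pages  # list of responses, served in order
        self.calls = 0

    def __call__(self, NextToken=''):
        self.calls += 1
        index = int(NextToken) if NextToken else 0
        resp = dict(self.pages[index])  # copy so the source page is not mutated
        if index + 1 < len(self.pages):
            resp['NextToken'] = str(index + 1)  # opaque token, as AWS would send
        return resp


class MiniPaginator:
    """Toy paginator: hides the NextToken bookkeeping behind a generator."""

    def __init__(self, method):
        self.method = method

    def paginate(self):
        kwargs = {}
        while True:
            page = self.method(**kwargs)
            yield page
            if 'NextToken' not in page:
                break
            kwargs['NextToken'] = page['NextToken']


pages = [{'Items': [1, 2]}, {'Items': [3]}, {'Items': [4, 5]}]

# Hand-written describe loop, as in the first example.
manual = CountingFake(pages)
resp = {'NextToken': ''}
manual_items = []
while 'NextToken' in resp:
    resp = manual(NextToken=resp['NextToken'])
    manual_items.extend(resp['Items'])

# Paginator-style loop.
paged = CountingFake(pages)
paged_items = [i for page in MiniPaginator(paged).paginate() for i in page['Items']]

print(manual_items == paged_items, manual.calls, paged.calls)  # True 3 3
```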

limiting the results (cloud side)

You can limit the number of items returned thus:

...
for page in pagr.paginate(PaginationConfig={'MaxItems': 250}):
...

In that case, at most 250 items are returned in total, possibly spread across more than one page. Strictly speaking, botocore enforces MaxItems in the paginator: it stops requesting further pages once 250 items have arrived and trims the last page if necessary.
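Here is a sketch of how an item cap can be applied across pages: stop pulling pages once the cap is reached and trim the page that crosses it. It illustrates the idea only and is not botocore's code; cap_items, page_source, and the 100-item pages are hypothetical.

```python
def cap_items(pages, max_items, key):
    """Yield pages until max_items entries under `key` have been produced.

    The page that reaches the cap is trimmed, and no later page is requested.
    """
    if max_items <= 0:
        return
    remaining = max_items
    for page in pages:
        items = page[key][:remaining]
        remaining -= len(items)
        yield {**page, key: items}  # copy of the page with a trimmed item list
        if remaining == 0:
            return  # stop before pulling another page from `pages`


fetched = []


def page_source():
    """Hypothetical stream of four 100-item pages that records each request."""
    for n in range(4):
        fetched.append(n)
        yield {'Items': list(range(n * 100, (n + 1) * 100))}


capped = list(cap_items(page_source(), 250, 'Items'))
print([len(p['Items']) for p in capped], fetched)  # [100, 100, 50] [0, 1, 2]
```

Because page_source is a generator, the fourth page is never produced once the cap of 250 is met.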

limiting the results (client side)

Suppose you want all the EC2 instance types that have memory less than or equal to 8 GB. There are two problems:

  1. AWS pricing has no filter for memory.
  2. The AWS pricing filter has no “less than” operator.

In that case, you need to retrieve the products and apply additional filtering in client-side code.

#!/home/ec2-user/sw/python/3.8/bin/python3.8

from boto3 import client
from pfilt import Pfilt  # local module (not shown) providing Pfilt.filters, the pricing filter list
from json import loads


class ResultsLimitExceededError(Exception):
    def __init__(self, limit):
        # Report which client-side limit was exceeded.
        msg = 'results limit %s exceeded' % limit
        super().__init__(msg)


class Prices:

    @classmethod
    def instancetypes(cls):
        cli = client('pricing')
        pag = cli.get_paginator('get_products')

        max_products_in = 1000  # cap on products retrieved (PaginationConfig MaxItems)
        max_products_out = 100  # cap on products that pass client_filter
        num_products_in = 0
        num_products_out = 0
        for page in pag.paginate(
                ServiceCode='AmazonEC2',
                Filters=Pfilt.filters,
                PaginationConfig={'MaxItems': max_products_in}
        ):
            print('page size %s' % len(page['PriceList']))
            for skitem in [loads(itm) for itm in page['PriceList']]:
                num_products_in += 1
                if (filtered_skitem := cls.client_filter(cls.enrich(skitem))) is not None:
                    if (num_products_out := num_products_out + 1) > max_products_out:
                        raise ResultsLimitExceededError(max_products_out)
                    cls.processitm(filtered_skitem)

        print('number of products in %s' % num_products_in)
        print('number of products out %s' % num_products_out)

    @classmethod
    def enrich(cls, skitem):
        eskitem = skitem  # same dict, enriched in place (not a copy)
        product = skitem['product']
        memGB = float(product['attributes']['memory'].split()[0].replace(',', ''))
        eskitem['memGB'] = memGB
        return eskitem

    @classmethod
    def client_filter(cls, eskitem):
        maxmem = 8
        return eskitem if eskitem['memGB'] <= maxmem else None

    @classmethod
    def processitm(cls, skitem):
        pass


if __name__ == '__main__':
    Prices.instancetypes()

In this case, we set a cloud-side limit of 1000 records, and a client-side limit of 100 records.

Example output:

$ ./prices.pag.py
page size 100
page size 100
page size 39
number of products in 239
number of products out 33

You could manage your client-side memory by reducing the page size:

PaginationConfig={'MaxItems': max_products_in, 'PageSize': 50}

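Setting PageSize does not change which products are returned; it changes only how many arrive in each page, and therefore how many API calls are made. The output, then, is: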

page size 50
page size 50
page size 50
page size 50
page size 39
number of products in 239
number of products out 33
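As a sanity check on the page sizes shown, splitting 239 items into pages of 50 gives four full pages and a remainder of 39. This sketch assumes the service fills every page up to PageSize, which the output above suggests but which this article does not establish for every API; page_sizes is a hypothetical helper.

```python
def page_sizes(total, page_size):
    """Return the size of each page when `total` items are served page_size at a time."""
    full, rest = divmod(total, page_size)
    return [page_size] * full + ([rest] if rest else [])


print(page_sizes(239, 50))   # [50, 50, 50, 50, 39]
print(page_sizes(239, 100))  # [100, 100, 39]
```

The length of each list is the number of API calls made, so halving the page size roughly doubles the calls while bounding how much is held in memory per page.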

To demonstrate the exception, we can change:

max_products_out = 20

In that case, we get:

$ ./prices.pag.py
Traceback (most recent call last):
  File "./prices.pag.py", line 62, in <module>
    Prices.instancetypes()
  File "./prices.pag.py", line 36, in instancetypes
    raise ResultsLimitExceededError(max_products_out)
__main__.ResultsLimitExceededError: results limit 20 exceeded

We used the new Python 3.8 assignment expression in two places:

if (filtered_skitem := cls.client_filter(cls.enrich(skitem))) is not None:
    if (num_products_out := num_products_out + 1) > max_products_out:

We assign filtered_skitem and then test it. Likewise, we assign num_products_out and test it.
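The same assign-and-test pattern also works inside a comprehension, the other use case PEP 572 emphasizes. This sketch parses and filters PriceList-style JSON strings in one pass. The sample records are hypothetical and far smaller than real pricing records, and mem_gb only mirrors the parsing done in enrich above.

```python
from json import dumps, loads


def mem_gb(product):
    """Parse a memory attribute such as '1,952 GiB' into a float."""
    return float(product['attributes']['memory'].split()[0].replace(',', ''))


# Hypothetical, trimmed-down PriceList entries (each one a JSON string).
price_list = [
    dumps({'product': {'attributes': {'instanceType': 't3.micro', 'memory': '1 GiB'}}}),
    dumps({'product': {'attributes': {'instanceType': 'm5.4xlarge', 'memory': '64 GiB'}}}),
    dumps({'product': {'attributes': {'instanceType': 'x1.32xlarge', 'memory': '1,952 GiB'}}}),
    dumps({'product': {'attributes': {'instanceType': 'm5.large', 'memory': '8 GiB'}}}),
]

max_mem = 8
# Assign each parsed memory size to `mem`, test it, and reuse it in the output.
small = [
    (item['product']['attributes']['instanceType'], mem)
    for item in map(loads, price_list)
    if (mem := mem_gb(item['product'])) <= max_mem
]
print(small)  # [('t3.micro', 1.0), ('m5.large', 8.0)]
```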

pagination availability

Not all AWS operations can be paginated; it depends on the service and the operation. If an operation cannot be paginated, it will not return NextToken, and you may have difficulty managing large result sets. You can use this script to find out whether an operation can be paginated.

#!/usr/bin/python

from boto3 import client
from argparse import ArgumentParser


class CanPag:
    args = None

    @classmethod
    def prs(cls):
        ap = ArgumentParser(
            description='Check whether an operation can paginate'
        )
        ap.add_argument(
            '--service', '-s', required=True,
            help='AWS service name (ec2, s3, pricing, etc.)'
        )
        ap.add_argument(
            '--operation', '-o', required=True,
            help='AWS service operation (describe_instances, etc.)'
        )

        cls.args = ap.parse_args()

    @classmethod
    def canpage(cls):
        cli = client(cls.args.service)
        cp = cli.can_paginate(cls.args.operation)  # True if a paginator exists for this operation
        print(
            '%s %s %s paginate.' % (
                cls.args.service,
                cls.args.operation,
                'can' if cp else 'cannot')
        )


if __name__ == '__main__':
    CanPag.prs()
    CanPag.canpage()

Examples:

$ ./can.paginate.py -s ec2 -o describe_instances
ec2 describe_instances can paginate.
$ ./can.paginate.py -s ec2 -o describe_volumes
ec2 describe_volumes can paginate.
$ ./can.paginate.py -s ec2 -o describe_images
ec2 describe_images cannot paginate.
$ ./can.paginate.py -s pricing -o get_products
pricing get_products can paginate.
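When a script must handle both kinds of operation, can_paginate() can choose the retrieval method at run time. In this sketch, fetch_pages is a hypothetical helper that calls only can_paginate, get_paginator, and paginate on the client. A tiny fake client (also hypothetical) stands in for boto3 so the example runs offline; with boto3 you would pass a real client such as client('ec2').

```python
def fetch_pages(cli, operation, **kwargs):
    """Yield response pages, using a paginator when the operation supports one."""
    if cli.can_paginate(operation):
        yield from cli.get_paginator(operation).paginate(**kwargs)
    else:
        # No paginator: make a single call and yield its response.
        yield getattr(cli, operation)(**kwargs)


class _FakePaginator:
    """Hypothetical paginator that replays fixed pages."""

    def __init__(self, pages):
        self._pages = pages

    def paginate(self, **kwargs):
        return iter(self._pages)


class FakeClient:
    """Hypothetical offline stand-in for a boto3 client."""

    def can_paginate(self, operation):
        return operation == 'describe_instances'

    def get_paginator(self, operation):
        return _FakePaginator([{'Reservations': ['r1']}, {'Reservations': ['r2']}])

    def describe_images(self, **kwargs):
        return {'Images': ['ami-1', 'ami-2']}


cli = FakeClient()
print(len(list(fetch_pages(cli, 'describe_instances'))))  # 2
print(list(fetch_pages(cli, 'describe_images')))  # [{'Images': ['ami-1', 'ami-2']}]
```

Callers then loop over pages the same way regardless of which path was taken.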

Conclusion

I wrote this article for two reasons: to identify a good paginator coding practice, and to try out the new Python 3.8 assignment expression.

AWS query results are often returned in chunks, or pages. You could write your own code to manage the retrieval, but you are better off using the provided paginator. You can configure the paginator's MaxItems and PageSize through PaginationConfig. Use can_paginate() to find out which operations can be paginated.

The Python 3.8 assignment expression was demonstrated in three cases. In each case, the pattern was “assign and test”. The assigned variables in the examples were resp, filtered_skitem, and num_products_out, and each assigned value was needed elsewhere in the routine.
