Trace only one logon on AWS RDS Oracle

By Brian Fitzgerald

Introduction

Given an Oracle RDS database, multiple application servers and multiple connections, the task at hand is to

  • Create a logon trigger that will …
  • Run when the first connection arrives and not again after that, and …
  • Run trace 10046 …
  • Up to 10 MB only.
  • Identify the trace file
  • Download the trace file
  • Run tkprof

Background

  • A logon trigger fires for every login that matches the firing condition. If we trace every session, we could fill the file system. We only want one example trace file.
  • In RDS, you do not have direct access to the operating system. You must somehow copy out the trace file.

Let’s go!

Grant

The first thing you need to get right is the grant. Tracing requires the “alter session” privilege, which must be granted directly, not through a role. Suppose the user is “inuser”:

grant alter session to inuser;

Create a sequence

create sequence inuser.user_trace_trg_seq;

Create the trigger

The trigger must be owned by the user that needs to be traced.

create or replace trigger inuser.user_trace_trg
after logon on database
when ( user = 'INUSER' )
declare
  l_val number;
begin
  l_val := inuser.user_trace_trg_seq.nextval;
  if l_val = 1
  then
    execute immediate
    q'{alter session set max_dump_file_size = '10m'}';
    execute immediate
    q'{alter session set tracefile_identifier = '}'||user||q'{_user_trace_trg'}';
    execute immediate
    q'{alter session set events '10046 trace name context forever, level 12'}';
  end if;
end;
/

Now wait for the user to log in. The first time the trigger fires, nextval equals 1, so the trigger body runs. After that, the trigger body does not run.
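
The once-only logic can be sketched in Python (the names here are invented; in the database, the shared counter is the Oracle sequence and the setup commands are the alter session statements above):

```python
from itertools import count

# Shared counter standing in for the Oracle sequence (nextval).
_seq = count(1)

def on_logon(actions):
    """Simulate the trigger body: only the caller that draws the
    first counter value performs the trace setup."""
    if next(_seq) == 1:
        actions.append("alter session set events '10046 trace ...'")

traced = []
for _ in range(5):       # five logins arrive
    on_logon(traced)
print(len(traced))       # only the first login performed the setup
```

The gate works because nextval is atomic: concurrent logins each draw a distinct value, and exactly one of them draws 1.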

Identify the trace file

select * from table(
  rdsadmin.rds_file_util.listdir('BDUMP')
)
where filename like '%INUSER_user_trace_trg.trc'
order by mtime;
FILENAME                                 TYPE  FILESIZE  MTIME
ORCL_ora_9022_INUSER_user_trace_trg.trc  file  10485856  2022-04-17 00:17:04

“Download” the trace file

You could transfer the file from RDS to S3, and then from S3 to a local system. The other way is to read the file over a database connection:

select text from table(
  rdsadmin.rds_file_util.read_text_file(
   :dir,
   :filename
  )
)

Using sqlalchemy and boto3, I have integrated this statement with AWS Secrets Manager in a convenient Python script:

$ download-rds-ora-dir-files.py --secret-id $sec --dir BDUMP --pat ORCL_ora_9022_INUSER_user_trace_trg.trc
number of files: 1

You could also run sqlplus and spool the output to a file.

set feedback off
set linesize 32767
set trimspool on
set pagesize 0

set termout off
spool ORCL_ora_9022_INUSER_user_trace_trg.trc

select text from table(
    rdsadmin.rds_file_util.read_text_file(
        'BDUMP',
        'ORCL_ora_9022_INUSER_user_trace_trg.trc'
    )
)
;
spool off

Run tkprof

tkprof ORCL_ora_9022_INUSER_user_trace_trg.trc ORCL_ora_9022_INUSER_user_trace_trg.tkp

Done!

Conclusion

The following concepts were covered:

  • How to make a logon trigger body that runs only once.
  • Permissions and object ownership for a logon trigger.
  • How to download a trace file from RDS.

Session objects in python APIs

By Brian Fitzgerald

Introduction

Database administrators understand “session” to mean an established database connection. It may come as a surprise that “session”, in several database-related libraries, refers to a data structure in the absence of any established connection.

Database session

A database session is an instance memory structure that holds an established connection’s state. The session contains facts such as a unique session id and the user name. The database session exists only after the transport (network connection) is established, the user has authenticated, and the memory structures have been allocated. The client application uses an API which holds a handle to the server side session. Application code uses the database API handle to issue commands and to receive responses and query results. The API is layered on top of a database driver, which associates the API handle with an operating system or network connection. In this way, client application to a server session via an API and a database driver. In databases, then, a client “handle” connects to a server “session”.

Other meanings of “session”

DBAs work with software APIs, such as python libraries, that do not establish a connection while allocating what they call a “session”. I will mention three APIs: requests, sqlalchemy, and boto3, in which “session” refers to a data structure that is not always associated with a connection.

Requests

Requests is a python HTTP library. HTTP is a connectionless protocol. Each request to the HTTP server can be implemented by a separate, new network connection that is disconnected after each call. In the following example, the call makes a TCP connection to the server, sends a request, receives a response, and closes the connection.

>>> import requests
>>> from json import loads
>>> r = requests.get('https://httpbin.org/get?q=Hello')
>>> type(r)
<class 'requests.models.Response'>

Let’s verify that no connection remains. Suspend your Python REPL by pressing ctrl-Z.

>>>
[1]+ Stopped python3
$

Is requests still connected?

$ lsof -n -p $(pgrep python3) | grep https

No output here means that requests is no longer connected.

Requests offers an advanced option that preserves state and a connection across calls.

>>> from requests import Session
>>> ses = Session()
>>> type(ses)
<class 'requests.sessions.Session'>
>>> ses.get('https://httpbin.org/get?q=Hello')
<Response [200]>

Is ses still connected? Let’s see.

$ lsof -n -p $(pgrep python3) | grep https
python3 3387 bf2190 3u IPv4 125425570 0t0 TCP 10.130.99.84:36878->52.86.136.68:https (ESTABLISHED)

Yes. The Session is connected to the https server. If you make another call within a short time, requests will make that call on the same socket. If the connection is idle for approximately 2.5 minutes, then requests will close the connection. If you make another call after that, requests will open a new socket.

To summarize, a requests Session is a client side handle that is not connected when it is created, but gets connected on first use. The session remains connected during use, and disconnects after an idle timeout.
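
That lifecycle can be modeled with a toy class (ToySession and its numbers are illustrative only; requests’ real idle handling lives in its connection pool):

```python
class ToySession:
    """Toy model of the requests Session lifecycle: connect on
    first use, reuse while warm, reopen after an idle timeout."""
    IDLE_LIMIT = 150.0              # seconds, roughly 2.5 minutes

    def __init__(self):
        self.connections_opened = 0
        self.last_used = None       # None means never connected

    def get(self, now):
        """Perform a 'request' at time `now` (in seconds)."""
        if self.last_used is None or now - self.last_used > self.IDLE_LIMIT:
            self.connections_opened += 1    # open a new socket
        self.last_used = now

ses = ToySession()
print(ses.connections_opened)   # 0: created, not yet connected
ses.get(0.0)                    # first use: connects
ses.get(10.0)                   # soon after: same socket reused
ses.get(300.0)                  # idle past the limit: new socket
print(ses.connections_opened)   # 2
```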

SQL Alchemy

SQL Alchemy is a python SQL toolkit and object relational mapper. Two objects used for establishing a connection are the Engine and the Session. The Engine contains information about the driver and the connection string.

from sqlalchemy import create_engine

eng = create_engine(
    connection_string,
    max_identifier_length=128
)
print(type(eng))
<class 'sqlalchemy.engine.base.Engine'>

The Session is an object that will eventually hold an established connection.

from sqlalchemy.orm import sessionmaker
db_sess = sessionmaker(bind=eng)()
print(type(db_sess))
<class 'sqlalchemy.orm.session.Session'>

So far, the Session is not connected, but it will be after a call to connection().

conn = db_sess.connection()
print(type(conn))
<class 'sqlalchemy.engine.base.Connection'>

conn holds an established connection. The other way to connect is to simply use the session.

from sqlalchemy import text

for row in db_sess.execute(text('select dummy from dual')):
    print('%s' % row['dummy'])
X

Now db_sess holds an established connection. In conclusion, the SQLAlchemy Session is an object that is not initially connected. The Session connects by a call to connection() or on first use.
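
The lazy-connect pattern itself is easy to sketch with the standard library’s sqlite3 (this LazySession is an illustration of the idea, not SQLAlchemy’s implementation):

```python
import sqlite3

class LazySession:
    """Holds everything needed to connect, but opens the real
    connection only on first use, like the SQLAlchemy Session."""
    def __init__(self, database):
        self.database = database
        self._conn = None          # not connected yet

    def connection(self):
        if self._conn is None:     # connect on demand
            self._conn = sqlite3.connect(self.database)
        return self._conn

    def execute(self, sql):
        return self.connection().execute(sql)

sess = LazySession(':memory:')
print(sess._conn is None)          # True: created, not connected
row = sess.execute('select 1').fetchone()
print(sess._conn is None)          # False: first use connected it
```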

boto3

boto3 is the AWS Python library. It is used to manage AWS services. One such service is Relational Database Service (RDS). In boto3, you make API calls to manage your environment. For example, you can list all RDS DB instance details by calling describe_db_instances. To make a call, you must first allocate a boto3 client. The constructor for the RDS service is client('rds'). You can call the constructor and run describe directly, like so:

>>> from boto3 import client
>>> len(client('rds').describe_db_instances()['DBInstances'])
30

The output from describe_db_instances is verbose, but the results are not needed for this example, so len is used here to summarize the result (30 db instances). Because describe was called directly from the constructor, the client object goes out of scope immediately, and the connection is dropped:

>>> 
[1]+ Stopped python3 
$ lsof -n -p $(pgrep python3) | grep https 
$

In the next example, the boto3 client is saved to variable rds

>>> from boto3 import client
>>> rds = client('rds')
>>> type(rds)
<class 'botocore.client.RDS'>
>>> len(rds.describe_db_instances()['DBInstances'])
30
>>>
[1]+ Stopped python3
$ lsof -n -p $(pgrep python3) | grep https
python3 7128 bf2190 3u IPv4 125436458 0t0 TCP 10.130.99.84:60276->52.46.159.85:https (ESTABLISHED)

In this case, the client remains connected after the results are returned. After approximately 2 minutes of idle time, boto3 closes the connection. If you make another call after that, boto3 opens a new connection. In the next case, the client is within the scope of a function. When the function returns, the client goes out of scope, and the connection gets dropped.

>>> from boto3 import client
>>>
>>> def desc():
...     rds = client('rds')
...     rds.describe_db_instances()['DBInstances']
...
>>> desc()
>>>
[1]+ Stopped python3
$ lsof -n -p $(pgrep python3) | grep https
$

In the final example, the client rds is created outside the function scope. When the function returns, the client remains in scope and connected to the AWS server.

>>> from boto3 import client
>>>
>>> rds = client('rds')
>>> def desc():
...     rds.describe_db_instances()['DBInstances']
...
>>> desc()
>>>
[1]+ Stopped python3
$ lsof -n -p $(pgrep python3) | grep https
python3 7631 bf2190 3u IPv4 125437676 0t0 TCP 10.130.99.84:42598->54.239.31.3:https (ESTABLISHED)

The boto3 session

The boto3 Session is not a connection at all. It is an object that holds access credentials. The Session creates objects that connect to AWS, particularly the Client. You can use the session to manage across multiple regions in an account, multiple accounts, multiple users, multiple roles, or across tokens acquired at different times. You can use the session to override the access keys in your profile, in your environment, or in your service account.

In this example, we loop over two regions and count the number of option groups. First, we allocate a list of sessions:

>>> from boto3.session import Session
>>>
>>> AWS_ACCESS_KEY_ID="ASIAW3NNBKAMBK2CUGNE"
>>> AWS_SECRET_ACCESS_KEY="8V7Qaama1Yvl6ADRfrFI0vNTegx+V56S3sY5/uX0"
>>> AWS_SESSION_TOKEN="IQoJb3JpZ2luX2VjENz//////////wEaCXVzLWVhc3QtMSJIMEYCIQDOnKeV4UGWCVxuSvmD+O1FPUJHbxZOCPa/njhGlxBo1gIhAIRky8kx0jUNl40dg0NKGHjn1aOjYKe9Hdhs6msAlmq7KokDCFQQARoMNDcxMTk4NjgzMTYwIgzXqUo5F7yN89PdSxkq5gL1HhFXOOjVa8Irz8fVwUazE4VAKVtYthND8bXkqTLm1BE3x0hbD54rzpae80UxmcCnPv2Ec3WH8L6N2a7AvqSfhAJa34pzqJ+AcLy4blcdlbwE8I7d8ds/rvoNFiowEf5EhOaP41ZOdtngzdIUN1byE01Vsij8l9Ai74iE5DUnFqmIajtbr2T0Ah9a5ovQmBBB5nyjhY0DBcV9Bj0B6R8swoDnd6NBDcdcxQNvPqzJfwHsqrOYNJVzn1/FQLj86sgwTumHa1WyEugKWO3tlgEuOb+4Y3SqaMQ/vgC7OV+cdKfRzCrK/H/XYkWliVt7Ct1fQ8sGkEyxBlxyemgfvXybJnrXmEx5YisIO4LFMAkpzqsKYNTNybEwDKZmOAN3geP2c2QxoGNGHZ0NcT9nXDmtwHFuO0x2OlPf/qYmdZmlFKIRqBbGNCOPaJ5YgwjuGtx0TqULwKpNlYuF9WkZz5VGvFo6C1MSMLnpupEGOqUBmaqzHmvTykPbaH03SR5J1pKIvHI/XVMcvaThT2UEl7xQhwY0OE/AwAgP0+dtm0G6Mt7I6F2oZ5x6S2gnFHlxb5ctrgC8CRavVkks+ehHM8iAeBIm8CMk4Cb9jFBWj6jXldhHZDOCnOaqVAFSOlTaqXD3ZPlSRW1iyN1vOrmHIMJsGKeb6PiaGz0a9omCE/HCewQkcqYJXBhaTHSRUF1oJL/+GCvw"
>>>
>>>
>>> sess = []
>>> for region in 'us-east-1 us-west-1'.split():
...     ses = Session(
...         aws_access_key_id=AWS_ACCESS_KEY_ID,
...         aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
...         aws_session_token=AWS_SESSION_TOKEN,
...         region_name=region
...     )
...     print(type(ses))
...     sess.append(ses)
...
<class 'boto3.session.Session'>
<class 'boto3.session.Session'>

Next, we loop over the sessions and run a describe from each.

>>> for ses in sess:
...     res = ses.client('rds').describe_option_groups()['OptionGroupsList']
...     print(len(res))
...
4
0
>>>
[1]+ Stopped python3
$ cat ../pgrep.bash
#!/bin/bash
lsof -n -p $(pgrep python3) | grep https
$ ../pgrep.bash
$

There are 4 option groups in us-east-1 and 0 in us-west-1. The describe ran directly off the client constructor, so the client went out of scope as soon as the describe finished, and no connection remains. Next, let’s allocate a list of clients instead.

>>> from boto3.session import Session
>>>
>>> AWS_ACCESS_KEY_ID="ASIAW3NNBKAMBK2CUGNE"
>>> AWS_SECRET_ACCESS_KEY="8V7Qaama1Yvl6ADRfrFI0vNTegx+V56S3sY5/uX0"
>>> AWS_SESSION_TOKEN="IQoJb3JpZ2luX2VjENz//////////wEaCXVzLWVhc3QtMSJIMEYCIQDOnKeV4UGWCVxuSvmD+O1FPUJHbxZOCPa/njhGlxBo1gIhAIRky8kx0jUNl40dg0NKGHjn1aOjYKe9Hdhs6msAlmq7KokDCFQQARoMNDcxMTk4NjgzMTYwIgzXqUo5F7yN89PdSxkq5gL1HhFXOOjVa8Irz8fVwUazE4VAKVtYthND8bXkqTLm1BE3x0hbD54rzpae80UxmcCnPv2Ec3WH8L6N2a7AvqSfhAJa34pzqJ+AcLy4blcdlbwE8I7d8ds/rvoNFiowEf5EhOaP41ZOdtngzdIUN1byE01Vsij8l9Ai74iE5DUnFqmIajtbr2T0Ah9a5ovQmBBB5nyjhY0DBcV9Bj0B6R8swoDnd6NBDcdcxQNvPqzJfwHsqrOYNJVzn1/FQLj86sgwTumHa1WyEugKWO3tlgEuOb+4Y3SqaMQ/vgC7OV+cdKfRzCrK/H/XYkWliVt7Ct1fQ8sGkEyxBlxyemgfvXybJnrXmEx5YisIO4LFMAkpzqsKYNTNybEwDKZmOAN3geP2c2QxoGNGHZ0NcT9nXDmtwHFuO0x2OlPf/qYmdZmlFKIRqBbGNCOPaJ5YgwjuGtx0TqULwKpNlYuF9WkZz5VGvFo6C1MSMLnpupEGOqUBmaqzHmvTykPbaH03SR5J1pKIvHI/XVMcvaThT2UEl7xQhwY0OE/AwAgP0+dtm0G6Mt7I6F2oZ5x6S2gnFHlxb5ctrgC8CRavVkks+ehHM8iAeBIm8CMk4Cb9jFBWj6jXldhHZDOCnOaqVAFSOlTaqXD3ZPlSRW1iyN1vOrmHIMJsGKeb6PiaGz0a9omCE/HCewQkcqYJXBhaTHSRUF1oJL/+GCvw"
>>>
>>> rdss = []
>>> for region in 'us-east-1 us-west-1'.split():
...     rds = Session(
...         aws_access_key_id=AWS_ACCESS_KEY_ID,
...         aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
...         aws_session_token=AWS_SESSION_TOKEN,
...         region_name=region
...     ).client('rds')
...     rdss.append(rds)
...

Now loop over the clients.

>>> for rds in rdss:
...     res = rds.describe_option_groups()['OptionGroupsList']
...     print(len(res))
...
4
0
>>>
[1]+ Stopped python3
$ ../pgrep.bash
python3 6587 ec2-user 3u IPv4 26002 0t0 TCP 10.0.16.52:40794->52.119.197.147:https (ESTABLISHED)
python3 6587 ec2-user 4u IPv4 26004 0t0 TCP 10.0.16.52:39852->176.32.118.192:https (ESTABLISHED)

Because the list is in the outer scope, the clients remain in place after the describe finishes, and the clients remain connected to the AWS server.

One Session maps to one account and region, but that Session can create multiple Clients, and each Client can connect to AWS. In conclusion, the boto3 Session is an object containing the credentials needed to create client objects that connect to AWS. The boto3 Session itself does not connect to the AWS server.

Conclusion

When I started applying Python to my database administration job, I ran across various APIs that used a “session” that was not a connection to the server. In the three APIs mentioned here, requests, SQL Alchemy, and boto3, the Session is not itself a connection, but it can connect and remain connected for a time. The SQL Alchemy Session can be used to create a connection to a database. The boto3 Session can create a client that connects to the AWS server.

pagination and python 3.8 assignment expressions in AWS boto3

By Brian Fitzgerald

Introduction

AWS query operations return their results all at once, or in smaller, manageable pages. boto3 provides the programmer with two ways to manage the retrieval of paginated results.

Python version 3.8 introduces assignment expressions, a way to assign to variables within an expression using the notation := expr. PEP-572 explains how assignment expressions are useful in control statements, such as if and while, and in comprehensions. Assignment expressions can be used to improve coding style in pagination code.

describe

To query an Amazon AWS service, one calls an API using a procedure with a name beginning with “describe”, or sometimes “list” or “get”. The result is a dictionary with at least two elements, namely the payload, and ResponseMetadata. If more results are pending, NextToken will appear. The payload is your query result and consists of an array of objects. Examples of payload data include EC2 instance Reservations, EC2 EBS Volumes, and RDS DBInstances.

Using python API boto3, you can write:

import boto3
ec2 = boto3.client('ec2')
resp = { 'NextToken' : '' }
while 'NextToken' in resp:
    resp = ec2.describe_instances(
        NextToken = resp['NextToken']
    )
    for resv in resp['Reservations']:
        for inst in resv['Instances']:
            print ( inst['InstanceId'])

resp is a dict, and appears four times in the above code:

  • initialization with a dummy NextToken
  • returned from describe_instances
  • used to get describe_instances argument NextToken
  • used to get Reservations

The dictionary keys are Reservations, ResponseMetadata, and when needed, NextToken.
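
The control flow above can be exercised offline against a small fake client (FakeEC2 and its two hard-coded pages are invented for illustration; real responses come from boto3):

```python
class FakeEC2:
    """Serves Reservations in two pages, handing out NextToken the
    way EC2's describe_instances does."""
    pages = [
        {'Reservations': [{'Instances': [{'InstanceId': 'i-001'}]}],
         'NextToken': 'page2'},
        {'Reservations': [{'Instances': [{'InstanceId': 'i-002'}]}]},
    ]

    def describe_instances(self, NextToken=''):
        return self.pages[1] if NextToken == 'page2' else self.pages[0]

ec2 = FakeEC2()
ids = []
resp = {'NextToken': ''}
while 'NextToken' in resp:            # same loop as above
    resp = ec2.describe_instances(NextToken=resp['NextToken'])
    for resv in resp['Reservations']:
        for inst in resv['Instances']:
            ids.append(inst['InstanceId'])
print(ids)                            # ['i-001', 'i-002']
```

The loop ends when a page arrives without NextToken, which is exactly how the last page announces itself.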

python 3.8 assignment expression

In python 3.8, we can assign to a variable, and then use the assignment as an expression. Specifically, we can assign resp, and access element Reservations.

import boto3
ec2 = boto3.client('ec2')
resp = { 'NextToken' : '' }
while 'NextToken' in resp:
    for resv in (resp := ec2.describe_instances(
        NextToken = resp['NextToken']
    ))['Reservations']:
        for inst in resv['Instances']:
            print ( inst['InstanceId'])

There is one fewer line. Some may find the revised code appealing.

paginator

AWS boto3 provides paginators: Python iterators that handle NextToken for you.

import boto3
ec2 = boto3.client('ec2')
pagr = ec2.get_paginator('describe_instances')
for page in pagr.paginate():
    for resv in page['Reservations']:
        for inst in resv['Instances']:
            print( inst['InstanceId'])

There are two fewer lines. NextToken does not appear in the code.

paginate() returns the same results as the prior “describe”-style calls. The returned value is a dict, again with keys Reservations, ResponseMetadata, and, when needed, NextToken. However, the returned NextToken has no role in the new code.
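
What the paginator does internally can be sketched as a generator that feeds each page’s NextToken back into the next call (a simplification of botocore’s Paginator; describe_instances here is a two-page fake, not the real API):

```python
def paginate(api_call, **kwargs):
    """Minimal sketch of a paginator: yield pages, passing each
    page's NextToken into the next call."""
    token = ''
    while token is not None:
        page = api_call(NextToken=token, **kwargs)
        yield page
        token = page.get('NextToken')   # None on the last page

# A two-page fake API call for demonstration.
def describe_instances(NextToken=''):
    if NextToken == 't1':
        return {'Reservations': [{'Instances': [{'InstanceId': 'i-002'}]}]}
    return {'Reservations': [{'Instances': [{'InstanceId': 'i-001'}]}],
            'NextToken': 't1'}

ids = [inst['InstanceId']
       for page in paginate(describe_instances)
       for resv in page['Reservations']
       for inst in resv['Instances']]
print(ids)                              # ['i-001', 'i-002']
```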

paginator relationship to API call

A debug trace of describe_instances() shows an underlying call to client.py, line 357, function _api_call().

(Pdb) where
  /usr/lib64/python3.7/bdb.py(585)run()
-> exec(cmd, globals, locals)
  <string>(1)<module>()
  /home/ec2-user/git/aws/insts.desc.3.7.py(8)<module>()
-> NextToken = resp['NextToken']
  /usr/local/lib/python3.7/site-packages/botocore/client.py(357)_api_call()
-> return self._make_api_call(operation_name, kwargs)

Debug trace of paginate approach shows a call to the same point via paginate.py.

(Pdb) where
  /usr/lib64/python3.7/bdb.py(585)run()
-> exec(cmd, globals, locals)
  <string>(1)<module>()
  /home/ec2-user/git/aws/insts.page.3.7.py(6)<module>()
-> for page in pagr.paginate():
  /usr/local/lib/python3.7/site-packages/botocore/paginate.py(255)__iter__()
-> response = self._make_request(current_kwargs)
  /usr/local/lib/python3.7/site-packages/botocore/paginate.py(332)_make_request()
-> return self._method(**current_kwargs)
  /usr/local/lib/python3.7/site-packages/botocore/client.py(357)_api_call()

You can see where paginate is handling NextToken for you. For example, this code:

NextToken = resp['NextToken']

is handled behind the scenes as:

> /usr/local/lib/python3.7/site-packages/botocore/paginate.py(303)__iter__()
-> previous_next_token = next_token
(Pdb) p next_token
{'NextToken': 'eyJ2IjoiMiIsImMiOiJ5eHV0K011N2crQlBBaFhoSWU2SUpad0c3V3VaUFBvKzBPbDRIWFAvaXJSb3poeDFDNks3TkxGMkU5R1UxRjk4UlVnNFViRzNjSUlWWXhMbHk3ejU1Qjd1ZGhERHNBVktCR1g0cW5RZk9FdStZckViM0NjOXljV1p0SWplckhkV2ZISkNvc0NXdjhnMXA4RVBMWDFiVzNkS3k1NW5CdlZmUlhWUEpzeUZNbnhMS3VzdEo4eHFIWHRYNytpcEdWbHJKMFRqTlNLQ3A0Rk9VaEZGckdBMTVOYU44WGhvYkYyZVBBYjRrMVVaYXNFTCIsInMiOiIxIn0='}

The describe and paginate approaches are functionally equivalent. Neither one has a performance advantage. Using paginate leads to cleaner code.

limiting the results (cloud side)

You can limit the number of items returned thus:

...
for page in pagr.paginate(PaginationConfig={'MaxItems': 250}):
...

In that case, the total number of items returned will be at most 250. The limiting is done on the cloud side. The items could be returned across more than one page.

limiting the results (client side)

Suppose you want all the EC2 instance types that have memory less than or equal to 8 GB. Two problems:

  1. AWS pricing has no filter for memory.
  2. The AWS pricing filter has no “less than” operator.

In that case, you need to retrieve your products and apply additional filtering in client-side code.

#!/home/ec2-user/sw/python/3.8/bin/python3.8

from boto3 import client
from pfilt import Pfilt
from json import dumps, loads


class ResultsLimitExceededError(Exception):
    def __init__(self, limit):
        fmt = 'results limit %s exceeded'
        msg = fmt % limit
        super(ResultsLimitExceededError, self).__init__(msg)


class Prices:

    @classmethod
    def instancetypes(cls):
        cli = client('pricing')
        pag = cli.get_paginator('get_products')

        max_products_in = 1000
        max_products_out = 100
        num_products_in = 0
        num_products_out = 0
        for page in pag.paginate(
                ServiceCode='AmazonEC2',
                Filters=Pfilt.filters,
                PaginationConfig={'MaxItems': max_products_in}
        ):
            print('page size %s' % len(page['PriceList']))
            for skitem in [loads(itm) for itm in page['PriceList']]:
                num_products_in += 1
                if (filtered_skitem := cls.client_filter(cls.enrich(skitem))) is not None:
                    if (num_products_out := num_products_out + 1) > max_products_out:
                        raise ResultsLimitExceededError(max_products_out)
                    cls.processitm(filtered_skitem)

        print('number of products in %s' % num_products_in)
        print('number of products out %s' % num_products_out)

    @classmethod
    def enrich(cls, skitem):
        eskitem = skitem
        product = skitem['product']
        memGB = float(product['attributes']['memory'].split()[0].replace(',', ''))
        eskitem['memGB'] = memGB
        return eskitem

    @classmethod
    def client_filter(cls, eskitem):
        maxmem = 8
        return eskitem if eskitem['memGB'] <= maxmem else None

    @classmethod
    def processitm(cls, skitem):
        pass


if __name__ == '__main__':
    Prices.instancetypes()

In this case, we set a cloud-side limit of 1000 records, and a client-side limit of 100 records.

Example output:

$ ./prices.pag.py
page size 100
page size 100
page size 39
number of products in 239
number of products out 33
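
The enrich() and client_filter() steps can be exercised on their own. The items below are fabricated, but the memory attribute format ('8 GiB', '1,952 GiB') matches what the pricing API returns:

```python
def enrich(skitem):
    # Parse the "memory" attribute, e.g. "16 GiB" or "1,952 GiB".
    mem = skitem['product']['attributes']['memory']
    skitem['memGB'] = float(mem.split()[0].replace(',', ''))
    return skitem

def client_filter(eskitem, maxmem=8):
    # Keep only items at or under the memory limit.
    return eskitem if eskitem['memGB'] <= maxmem else None

items = [
    {'product': {'attributes': {'memory': '4 GiB'}}},
    {'product': {'attributes': {'memory': '1,952 GiB'}}},
    {'product': {'attributes': {'memory': '8 GiB'}}},
]
kept = [itm for itm in items
        if client_filter(enrich(itm)) is not None]
print(len(kept))        # 2: the 4 GiB and 8 GiB items pass
```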

You could manage your client-side memory by reducing the page size:

PaginationConfig={'MaxItems': max_products_in, 'PageSize': 50}

Setting PageSize does not affect the results. The output, then, is:

page size 50
page size 50
page size 50
page size 50
page size 39
number of products in 239
number of products out 33

To demonstrate the exception, we can change:

max_products_out = 20

In that case, we get:

$ ./prices.pag.py
Traceback (most recent call last):
  File "./prices.pag.py", line 62, in <module>
    Prices.instancetypes()
  File "./prices.pag.py", line 36, in instancetypes
    raise ResultsLimitExceededError(max_products_out)
__main__.ResultsLimitExceededError: results limit 20 exceeded

We used the new Python 3.8 assignment expression in two places:

if (filtered_skitem := cls.client_filter(cls.enrich(skitem))) is not None:
    if (num_products_out := num_products_out + 1) > max_products_out:

We assign filtered_skitem and then test it. Likewise, we assign num_products_out and test it.
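
The same assign-and-test pattern applies anywhere a value is both stored and immediately checked. A common stdlib example is reading fixed-size chunks:

```python
from io import StringIO

src = StringIO('abcdefghij')
chunks = []
# Assign the chunk and test it in one expression: an empty string
# is falsy, so the loop ends at end of input.
while (chunk := src.read(4)):
    chunks.append(chunk)
print(chunks)           # ['abcd', 'efgh', 'ij']
```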

pagination availability

Not all AWS operations can be paginated; it depends on the service and the operation. An operation that cannot be paginated does not return NextToken, and its results arrive all at once, which can be awkward for large result sets. You can use this script to find out whether an operation can be paginated.

#!/usr/bin/python

from boto3 import client
from argparse import ArgumentParser


class CanPag:
    args = None

    @classmethod
    def prs(cls):
        ap = ArgumentParser(
            description='Check whether an operation can paginate'
        )
        ap.add_argument(
            '--service', '-s', required=True,
            help='AWS service name (ec2, s3, pricing, etc.)'
        )
        ap.add_argument(
            '--operation', '-o', required=True,
            help='AWS service operation (describe_instances, etc.)'
        )

        cls.args = ap.parse_args()

    @classmethod
    def canpage(cls):
        cli = client(cls.args.service)
        cp = cli.can_paginate(cls.args.operation)
        print(
            '%s %s %s paginate.' % (
                cls.args.service,
                cls.args.operation,
                'can' if cp else 'cannot')
        )


if __name__ == '__main__':
    CanPag.prs()
    CanPag.canpage()

Examples:

$ ./can.paginate.py -s ec2 -o describe_instances
ec2 describe_instances can paginate.
$ ./can.paginate.py -s ec2 -o describe_volumes
ec2 describe_volumes can paginate.
$ ./can.paginate.py -s ec2 -o describe_images
ec2 describe_images cannot paginate.
$ ./can.paginate.py -s pricing -o get_products
pricing get_products can paginate.

Conclusion

I wrote this article for two reasons: To identify a good paginator coding practice, and to try out the new Python 3.8 assignment expression.

AWS query results are often returned in chunks, or pages. You could write your own code to manage the retrieval, but you are better off using the provided paginator. You can configure the paginator’s MaxItems and PageSize. Use can_paginate() to find out which operations can be paginated.

The Python 3.8 assignment expression was demonstrated in three cases. In each case, the pattern was “assign and test”: the assigned variables were resp, filtered_skitem, and num_products_out, and each assigned value was needed later in the routine.