Attempt to trace a process in AWS lambda

By Brian Fitzgerald

Introduction

On the surface, AWS Lambda appears to be a serverless resource that runs your code. However, Lambda users will quickly notice that the code runs on an EC2-like Linux container. There are times when a system-related error appears, and you want to trace the code to find out the cause, or the point of failure.

Scenario

We setup the lambda. File lg.py:

from os import getlogin
from json import dumps

def lam(ev, cx):
    ret = {
        'login': dumps(getlogin())
    }
    return ret
    
if __name__ == '__main__':
    lam(None, None)

the output is:

START RequestId: bc81ac8d-5c3b-421a-9432-82a6ab279767 Version: $LATEST
[Errno 6] No such device or address: OSError
Traceback (most recent call last):
File "/var/task/lg.py", line 6, in lam
'login': dumps(getlogin())
OSError: [Errno 6] No such device or address

Suppose we want to know more about the error message. In Linux, strace will tell you what system call led to the error message. You can try strace. File trc.py:

from os import system

def lam(ev, cx):
    system('strace python lg.py')
    return {}

However, strace is not in lambda:

sh: strace: command not found

You can search:

     system('find / -name strace -ls')

No file is found. You can copy in strace. Start from EC2. Copy in the strace binary and your python files:

[ec2-user@ip-172-31-80-17 trc]$ mkdir -p bin
[ec2-user@ip-172-31-80-17 trc]$ cp -p /usr/bin/strace bin/strace
[ec2-user@ip-172-31-80-17 trc]$ find * -type f
bin/strace
lg.py
trc.py
[ec2-user@ip-172-31-80-17 trc]$ zip -rq ../trc.zip * 
[ec2-user@ip-172-31-80-17 trc]$ aws s3 cp ../trc.zip s3://test.bucket/lambda/trc.zip
upload: ../trc.zip to s3://test.bucket/lambda/trc.zip

Upload the zip file:

upload

Now your lambda function has 3 files: the two python files and the strace binary. Change the call to system:

    system('/var/task/bin/strace python lg.py')

Run it and you get this message:

/var/task/bin/strace: ptrace(PTRACE_TRACEME, ...): Operation not permitted

That did not work. The next thing you can try is:

  1. Start a second process, and get the pid
  2. The second process will sleep for 1s and then run the failing statement.
  3. In the first process, trace the second process.

trcp.py

from subprocess import Popen
from os import system


def lam(ev, cx):
    proc = Popen('python sllg.py', shell=True)
    fmt = 'bin/strace -p %s'
    cmd = format(fmt % proc.pid)
    print(cmd)
    system(cmd)
    return {}


if __name__ == '__main__':
    lam(None, None)

sllg.py:

from time import sleep
from os import getlogin

sleep(1)
lg = getlogin()

In that case, the error message is:

bin/strace: attach: ptrace(PTRACE_ATTACH, 5): Operation not permitted

Discussion

We tried two different invocations of strace inside lambda. In the first attempt, we ran “strace command”. Internally, strace should trace itself by calling ptrace(PTRACE_TRACEME…) and then exec the command. In the second case, we want strace to attach to a process with strace(PTRACE_ATTACH…). AWS Lambda permits neither call.

Conclusion

The traditional approach to tracing Linux processes is not permitted in AWS Lambda.

yield

By Brian Fitzgerald

Here are my observations on python yield.

The presence of “yield” causes a function to become a factory of generators.  Here is an example.

def fgen():
    p, f = 1, 1
    print('initial p=%s p=%s ' % (p, f))
    while True:
        print('y %s' % f)
        yield f
        p, f = f, p + f
        print('next p=%s f=%s' % (p, f))

Now we will run it. The call:

fa = fgen()
print('g %s' % fa)
for fi in fa:
    print('ret A %s' % fi)
    if fi >= 5:
        break

The output:

g <generator object fgen at 0x00C280C0>
initial p=1 p=1 
y 1
ret A 1
next p=1 f=2
y 2
ret A 2
next p=2 f=3
y 3
ret A 3
next p=3 f=5
y 5
ret A 5

Notes:

  • The call fa = fgen() produced no output. No fgen code ran.
  • fgen’s return type is generator. fa is a generator.
  • “for i in fa:” causes the generator to iterate.
  • In the first iteration, the “print(‘initial…” statement ran.
  • When execution reached “yield f”, the generator returned f and returned control to the caller. That is evident from this output:
y 1
ret A 1
  • On the second iteration, generator execution picked up where it left off, and ran
p, f = f, p + f

output:

next p=1 f=2

Note that f values are generated on the fly whenever the generator is called.

If the same generator (“fa”) gets used again, like this:

for fi in fa:
    print('ret A+ %s' % fi)
    break

then the generator picks up where it left off:

next p=5 f=8
y 8
ret A+ 8

Calling fgen again gets a new generator that starts from scratch.

fb = fgen()
print('g %s' % fb)
for fi in fb:
    print('ret B %s' % fi)
    if fi >= 3:
        break

Output:

g <generator object fgen at 0x02DE81B0>
initial p=1 p=1 
y 1
ret B 1
next p=1 f=2
y 2
ret B 2
next p=2 f=3
y 3
ret B 3

Summary:

  • “yield” is used to define a function that returns a generator.
  • On the first call to the generator, execution runs up to the yield.
  • On subsequent calls, execution continues where it left off.

 

neater python logging messages

By Brian Fitzgerald

Here are a few notes on cleaner-looking python logging messages. Some package defaults lead to verbose, ragged looking output.

from threading import Thread
from logging import debug, info, warning, error, critical
from logging import DEBUG, basicConfig
def hello():
    warning('Something is up')
basicConfig(level=DEBUG,
            format='%(asctime)s %(levelname)s %(threadName)s %(message)s'
            )
debug('begin')
error('problem')
critical('danger')
threads = []
for i in [1, 2]:
    th = Thread(target=hello)
    th.start()
    threads.append(th)
for th in threads:
    th.join()
    info('joined')
info('done')
"C:\Users\Brian Fitzgerald\PycharmProjects\blog\venv\Scripts\python.exe" "C:/Users/Brian Fitzgerald/PycharmProjects/blog/threadlogging.py"
2018-05-03 21:12:51,098 DEBUG MainThread begin
2018-05-03 21:12:51,099 ERROR MainThread problem
2018-05-03 21:12:51,099 CRITICAL MainThread danger
2018-05-03 21:12:51,099 WARNING Thread-1 Something is up
2018-05-03 21:12:51,100 WARNING Thread-2 Something is up
2018-05-03 21:12:51,100 INFO MainThread joined
2018-05-03 21:12:51,100 INFO MainThread joined
2018-05-03 21:12:51,100 INFO MainThread done

Process finished with exit code 0

Notes:

  • The timestamp field is 23 characters
  • The level field is 4 to 8 characters
  • The thread identifier in this example is up to 10 characters
  • The logger prefix requires up to 43 characters
  • The content has a ragged appearance

Next, we will make three changes

  1. use datefmt for a shorter date string: %Y%m%d.%H%M%S
  2. use single character abbreviations for level: D, I, W, E, and C
  3. format short, fixed-width thread identifiers using th%02d
from threading import Thread, currentThread
from logging import debug, info, warning, error, critical
from logging import DEBUG, basicConfig, addLevelName
def hello():
    warning('Something is up')
basicConfig(
    level=DEBUG,
    datefmt='%Y%m%d.%H%M%S',
    format='%(asctime)s %(levelname)s %(threadName)s %(message)s')
currentThread().setName('main')
addLevelName(10, 'D')
addLevelName(20, 'I')
addLevelName(30, 'W')
addLevelName(40, 'E')
addLevelName(50, 'C')
debug('begin')
error('problem')
critical('danger')
threads = []
for i in [1, 2]:
    th = Thread(target=hello)
    th.setName('th%02d' % i)
    th.start()
    threads.append(th)
for th in threads:
    th.join()
    info('joined')
info('done')
"C:\Users\Brian Fitzgerald\PycharmProjects\blog\venv\Scripts\python.exe" "C:/Users/Brian Fitzgerald/PycharmProjects/blog/threadlogging.py"
20180503.211759 D main begin
20180503.211759 E main problem
20180503.211759 C main danger
20180503.211759 W th01 Something is up
20180503.211759 W th02 Something is up
20180503.211759 I main joined
20180503.211759 I main joined
20180503.211759 I main done

Process finished with exit code 0

Now, each log file entry has a fixed 22-character width. There is more space to the right for the actual content.

python sys.exit vs. os._exit

By Brian Fitzgerald

Here are some notes on the behavior of python functions sys.exit() vs os._exit() in a threaded program.

from threading import Thread, currentThread
from time import sleep, time
import sys
import atexit

def pt(str=None):
    print('el=%ss th=%s: %s' % (
        round(time() - t0, 1),
        currentThread().name, str)
          )

def testexit():
    pt('sleeping')
    sleep(int(currentThread().getName()))
    pt('calling sys.exit')
    sys.exit(0)
    pt('returning')

def alldone():
    pt('all done')

t0 = time()
atexit.register(alldone)
threads = []
for i in [1, 2]:
    th = Thread(target=testexit)
    th.setName(i)
    th.setDaemon(False)
    th.start()
    threads.append(th)
for th in threads:
    pt('waiting to join thread th=%s' % th.name)
    th.join()
    pt('main: joined to th=%s' % th.name)

In the threads, the sleep time equals the thread id, i.e. 1 or 2 seconds. We have arranged that function alldone will execute when the program exits.

"C:\Users\Brian Fitzgerald\PycharmProjects\blog\venv\Scripts\python.exe" "C:/Users/Brian Fitzgerald/PycharmProjects/blog/threadexit.py"
el=0.0s th=1: sleeping
el=0.0s th=2: sleeping
el=0.0s th=MainThread: waiting to join thread th=1
el=1.0s th=1: calling sys.exit
el=1.0s th=MainThread: main: joined to th=1
el=1.0s th=MainThread: waiting to join thread th=2
el=2.0s th=2: calling sys.exit
el=2.0s th=MainThread: main: joined to th=2
el=2.0s th=MainThread: all done
Process finished with exit code 0

Notes:

  • sys.exit() causes the thread to exit
  • statement “pt(‘returning’)” is not reached
  • The main thread finishes
  • Function alldone is executed
  • The output is identical if th.setDaemon(True)

Now changing “import sys” to “import os”, and “sys.exit(0)” to “os._exit(0)”:

from threading import Thread, currentThread
from time import sleep, time
import os
import atexit

def pt(str=None):
    print('el=%ss th=%s: %s' % (
        round(time() - t0, 1),
        currentThread().name, str)
          )

def testexit():
    pt('sleeping')
    sleep(int(currentThread().getName()))
    pt('calling os._exit')
    os._exit(0)
    pt('returning')

def alldone():
    pt('all done')

t0 = time()
atexit.register(alldone)
threads = []
for i in [1, 2]:
    th = Thread(target=testexit)
    th.setName(i)
    th.setDaemon(False)
    th.start()
    threads.append(th)

for th in threads:
    pt('waiting to join thread th=%s' % th.name)
    th.join()
    pt('main: joined to th=%s' % th.name)
"C:\Users\Brian Fitzgerald\PycharmProjects\blog\venv\Scripts\python.exe" "C:/Users/Brian Fitzgerald/PycharmProjects/blog/threadexit.py"
el=0.0s th=1: sleeping
el=0.0s th=2: sleeping
el=0.0s th=MainThread: waiting to join thread th=1
el=1.0s th=1: calling os._exit

Process finished with exit code 0

Notes:

  • MainThread is terminated while waiting to join thread 1.
  • thread 2 is terminated while sleeping
  • alldone is not reached
  • The output is identical if th.setDaemon(True)

Beware that os._exit can terminate a thread while it is in an exception handler:

def testexit():
    pt('try')
    try:
        raise Exception()
    except:
        pt('sleeping')
        sleep(int(currentThread().getName()))
        pt('done except')
    finally:
        pt('done finally')
    pt('calling os._exit')
    os._exit(0)
    pt('returning')
"C:\Users\Brian Fitzgerald\PycharmProjects\blog\venv\Scripts\python.exe" "C:/Users/Brian Fitzgerald/PycharmProjects/blog/threadexit.py"
el=0.0s th=1: try
el=0.0s th=1: sleeping
el=0.0s th=MainThread: waiting to join thread th=1
el=0.0s th=2: try
el=0.0s th=2: sleeping
el=1.0s th=1: done except
el=1.0s th=1: done finally
el=1.0s th=1: calling os._exit

Process finished with exit code 0

Note:

  • In thread 2, pt(‘done except’) is not reached.

os._exit can also interrupt a finally block:

def testexit():
    pt('try')
    try:
        raise Exception()
    except:
        pt('done except')
    finally:
        pt('sleeping')
        sleep(int(currentThread().getName()))
        pt('done finally')
    pt('calling os._exit')
    os._exit(0)
    pt('returning')
"C:\Users\Brian Fitzgerald\PycharmProjects\blog\venv\Scripts\python.exe" "C:/Users/Brian Fitzgerald/PycharmProjects/blog/threadexit.py"
el=0.0s th=1: try
el=0.0s th=1: done except
el=0.0s th=1: sleeping
el=0.0s th=2: try
el=0.0s th=2: done except
el=0.0s th=MainThread: waiting to join thread th=1
el=0.0s th=2: sleeping
el=1.0s th=1: done finally
el=1.0s th=1: calling os._exit

Process finished with exit code 0

Note:

  • In thread 2, pt(‘done finally’) is not reached

Python variable scope

By Brian Fitzgerald

Python variable scoping rules are different from other languages. If you are accustomed to java, the behavior may surprise you. Scope can be file, class, or function; however, control blocks such as if-else, try-except, or while do not define scope. A variable that is defined in a control block can be used after the control block.

A simple demonstration case is the if-else block. In function test, variable a is set in the if and else blocks, and is used after the if-else structure.

scope1

Not defining a variable before entering a control block is considered idiomatic python. IDE PyCharm reports no warnings. If you try to define variable a before the if-else block, as you would in java, PyCharm greys out a and reports it as an unused variable.

scope2.png

Finally, if statement “a = 0” is changed to “pass”, then in statement “a = None”, a is no longer reported as an unused variable, and is displayed in dark grey.

scope3.png

In conclusion, python scoping is different from java’s. You should not carry java variable declaration habits into python projects.