
In the previous blog post I described how I use S3 buckets to store log files and how I can easily access and update those logs. The final functionality of this particular web application needs to reformat the data as JSON, but the actual transformation of the data is outside the scope of this blog post.


AWS has an extensive authentication and authorization framework, which they simply call Identity and Access Management, or IAM for short (I guess no one ever accused AWS of being overly creative with the naming of their services). I tend to use a variety of tools to develop my code, but generally I use some sort of Linux-based system. I then create access keys for the account I use. I never use the root account for this, and there are a large number of best practices around the use of IAM-based accounts. Suffice it to say that the roles assigned to the account I use have enough privileges to get the job done, but don't necessarily have full access to all of the AWS services. I've also described how I manage the various keys I have to easily switch accounts.
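For example, both the AWS CLI and boto3 pick up named profiles from ~/.aws/credentials, so switching accounts is just a matter of selecting a different profile. A minimal sketch (the profile name 'dev-account' is a placeholder for whichever account I want to use):

import boto3

# Select a named profile from ~/.aws/credentials instead of the default;
# 'dev-account' is a hypothetical profile name
session = boto3.Session(profile_name='dev-account')
s3 = session.resource('s3')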


Once all this is set up, it becomes easy to develop, run, test and debug the code on my local workstation. So let's say I want to use the code I described in the blog post on configuration and turn it into a (not very interesting) Lambda function. AWS allows you to write code directly in the AWS console, but I prefer to have capable debugging tools at my disposal, so after developing on my local workstation I upload the working code to AWS using the CLI. And since the code is identical regardless of whether it runs locally on my workstation or at AWS, the actual process of developing and testing the code is very similar to a normal development flow. The AWS Lambda configuration needs to define a handler (much more detailed information here). I usually call this 'lambda_handler', following AWS' lead on creative naming conventions. I just add the common pattern of calling that function when the code is invoked as a stand-alone script, as follows:

#!/usr/bin/python
import boto3
import ConfigParser
import io

def get_config_from_s3():
    # Retrieve the INI configuration file from S3 and parse it
    bucket_name = 'fres-data.es.isc.upenn.edu'
    config_file = 'fres_data.ini'
    s3 = boto3.resource('s3')
    bucket = s3.Bucket(bucket_name)
    cfg = bucket.Object(config_file).get()
    # The S3 object body is a byte stream, which readfp() accepts via BytesIO
    config = ConfigParser.ConfigParser()
    config.readfp(io.BytesIO(cfg['Body'].read()))
    return config

def lambda_handler(event, context):
    # Print every section and its key/value pairs for verification
    config = get_config_from_s3()
    for section in config.sections():
        print("[" + section + "]")
        for (k, v) in config.items(section):
            print(k + ": " + v)
    return {'result': 'success'}

if __name__ == '__main__':
    # Allow the handler to be run directly as a stand-alone script
    print lambda_handler({}, {})

And I can now just run this from the command line on my workstation, resulting in:

# get_config.py
[ftp-server]
server: 192.168.1.2
username: exampleuser
password: examplepassword
filename: specialfile.csv
{'result': 'success'}


This is of course not very interesting, and when running at AWS the output will go to the CloudWatch service, which can be used to verify correct operation of the Lambda function. However, this shows how I can simply create code on my local workstation, debug, test and run it there, and have it behave identically to how it runs when deployed as a Lambda function at AWS.

There are a few things to be aware of, however. In an earlier blog post I described how the AWS Lambda python environment is not exactly equal to the environment you get when running python on the command line on an EC2 instance started from the AMIs that AWS lists as the base of their Lambda python environment. In my experience this is not usually an issue, but I did find it helpful to create a small Lambda function that puts the output of help('modules') in an S3 object, so I can refer to it whenever I need it.
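A rough sketch of such a helper could look like this; the bucket and key names are placeholders, and since help('modules') prints to stdout, the output is captured in a buffer before being stored as an S3 object:

#!/usr/bin/python
import sys
import StringIO
import boto3

def lambda_handler(event, context):
    # help('modules') prints to stdout, so temporarily redirect it
    buf = StringIO.StringIO()
    orig_stdout = sys.stdout
    sys.stdout = buf
    try:
        help('modules')
    finally:
        sys.stdout = orig_stdout
    # Store the captured module list in S3 (placeholder bucket/key names)
    s3 = boto3.resource('s3')
    s3.Object('example-bucket', 'lambda-modules.txt').put(Body=buf.getvalue())
    return {'result': 'success'}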

But if for now we assume the easiest case for our Lambda function, i.e. a single python file with all the needed code, I can now use the AWS CLI to deploy and publish the Lambda function. However, as with anything you use at AWS, I first need to set up the correct IAM role that will be used when running the Lambda function. All policies at AWS are represented by JSON documents, and frankly it's easier to use the AWS console to work with policies, since it has nice editor and validation support. But I can also create them using the CLI, and in order to do so I have the following policy file (policy.json), which describes the permissions I want the Lambda function to have:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [ "s3:*" ],
            "Resource": "arn:aws:s3:::*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "arn:aws:logs:*:*:*"
        }
    ]
}

There are two things of note in this policy. First, I give the Lambda function full access to all of the S3 operations ("Action": [ "s3:*" ]) as well as to all of the S3 resources that may be available to it ("Resource": "arn:aws:s3:::*"). In real-life scenarios this would be more restricted; see the sketch after the next command. Second, I give access to the logs functionality, which is needed if I want to be able to look at the output of the Lambda function in the CloudWatch logs. I usually like to define policies that can be attached to roles, rather than inline policies, since they can then be re-used. So I create the policy as follows:

# aws iam create-policy --policy-name get_config_policy --policy-document file://policy.json | \
    jq -r '.Policy.Arn'
arn:aws:iam::012345678912:policy/get_config_policy
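As an aside, a scoped-down version of the S3 statement for this particular function might look like the following sketch, granting only object reads on the one bucket the code actually uses (s3:PutObject would be added if the function also needs to write):

{
    "Effect": "Allow",
    "Action": [ "s3:GetObject" ],
    "Resource": "arn:aws:s3:::fres-data.es.isc.upenn.edu/*"
}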

Next I create the role which will be used for running the Lambda function. Every role at AWS has a policy document (yet again) associated with it, which describes which 'Principal' is allowed to assume the role. Most of the time these are simply AWS services, but it could also be a federated user (see this blog post for an example). So I have this policy file (assume-role-policy.json), which shows that the AWS Lambda and API Gateway services are allowed to assume this role:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "sts:AssumeRole",
      "Effect": "Allow",
      "Principal": {
        "Service": [
          "lambda.amazonaws.com",
          "apigateway.amazonaws.com"
        ]
      }
    }
  ]
}

Create the role and attach the policy we created earlier using the policy ARN that was displayed:

# aws iam create-role --role-name get_config_role \
    --assume-role-policy-document file://assume-role-policy.json | \
    jq -r '.Role.Arn'
arn:aws:iam::012345678912:role/get_config_role
# aws iam attach-role-policy --role-name get_config_role \
    --policy-arn arn:aws:iam::012345678912:policy/get_config_policy
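
As a quick sanity check (not a required step), I can list the policies now attached to the role:

# aws iam list-attached-role-policies --role-name get_config_role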

And once all this is in place I can deploy the Lambda function, using the role ARN that was displayed and then run a test invocation:

# zip get_config.zip get_config.py
# aws lambda create-function --function-name get_config \
    --description "Sample Lambda function getting a config file from S3" \
    --runtime python2.7 \
    --role arn:aws:iam::012345678912:role/get_config_role \
    --handler get_config.lambda_handler \
    --zip-file fileb://get_config.zip \
    --publish | \
    jq -r '.FunctionArn'
arn:aws:lambda:us-east-1:012345678912:function:get_config
# aws lambda invoke --function-name get_config output.txt
{
    "StatusCode": 200
}
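
The print output of the function does not show up here, since Lambda sends it to CloudWatch. I can retrieve those log events with the CLI as well; Lambda log groups follow the /aws/lambda/<function-name> naming convention:

# aws logs filter-log-events --log-group-name /aws/lambda/get_config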

The output file from the invocation (output.txt) will just contain {"result": "success"}, since that is the Python dict we returned and Lambda conveniently rewrote that result as JSON. The other output went to CloudWatch, as noted above. In the actual functionality I wrote the code for, the Lambda handler reads the appropriate S3 bucket containing the log files, extracts the data of interest and creates a Python dict, which is then returned. Eventually I'll use the AWS API Gateway to provide a web front end, which can then be used in a web page to gather the data needed to display a graph using a standard XMLHttpRequest (fronted by jQuery).
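To give an idea of the shape of that real handler, here is a rough sketch; the bucket and key names are hypothetical and the 'extraction' is reduced to a trivial count, but the structure (read from S3, build a dict, return it) is the same:

#!/usr/bin/python
import boto3

def lambda_handler(event, context):
    # Hypothetical bucket and key names, for illustration only
    s3 = boto3.resource('s3')
    body = s3.Object('example-log-bucket', 'access.log').get()['Body'].read()
    counts = {}
    for line in body.splitlines():
        fields = line.split()
        if not fields:
            continue
        # Trivial stand-in for the real data extraction: count how often
        # the first field of each log line occurs
        counts[fields[0]] = counts.get(fields[0], 0) + 1
    # Lambda serializes the returned dict to JSON for the caller
    return counts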

In a future blog post I will discuss how to trigger Lambda functions using AWS CloudWatch Events and the AWS Simple Notification Service (SNS).

If you have any comments, questions or other observations, please contact me directly via email: vmic@isc.upenn.edu.