Accessing MySQL databases from an AWS Python Lambda function

Posted: Saturday, July 23, 2016, 12:37PM
Author: Michel van der List (Michel van der List's other blog posts — blogs by all authors)

AWS provides a tutorial on how to access MySQL databases from a python Lambda function. Though it is thorough, I found there were a few things that could use a little extra documentation. This blog post addresses that and provides fully working code, including scripts for some of the steps described in their tutorial. I'll follow the same order of the instructions AWS provides. These instructions were all tested on a Linux box (currently running Fedora 24), but they should translate relatively easy to other Linux versions or OSX. It should also be very easy to use these instructions using an small EC2 instance running at AWS. Also note that I use the AWS cli extensively, so this assumes you have that functionality installed on your system prior to using these instructions.

One of the things that the tutorial mentions is to (paraphrased) 'Install … pymysql … using pip'. This is rather vague in its description. When using python libraries that Lambda does not natively provide, you'll need to add them to the Lambda deployment package. The easiest way to do this I found was to just use the Python virtualenv functionality to get required library ready. If there is native code involved, you have to make sure to use a compatible system to build the code. AWS documents the AMI they use for the Lambda environment, so you could use an EC2 instance started using the correct AMI to get the library ready. I find it easiest to simply get the library ready at the root directory of my project, similar to this:

# cd ~/tmp
# virtualenv lambda_package
New python executable in /home/michel/tmp/lambda_package/bin/python2
Also creating executable in /home/michel/tmp/lambda_package/bin/python
Installing setuptools, pip, wheel...done.
# cd lambda_package/
# source bin/activate
(lambda_package) # pip install pymysql
Collecting pymysql
  Using cached PyMySQL-0.7.5-py2.py3-none-any.whl
Installing collected packages: pymysql
Successfully installed pymysql-0.7.5
(lambda_package) # ls -ld lib/python2.7/site-packages/pymysql
drwxr-xr-x. 4 michel michel 4096 Jul 22 13:13 lib/python2.7/site-packages/pymysql
(lambda_package) # deactivate
# mv lib/python2.7/site-packages/pymysql <my_project_root>/

The next thing to do is to start a managed MySQL database at AWS. This is very easy, simply use the following command, of course substituting the appropriate names, passwords, etc.:

#!/bin/sh
aws rds create-db-instance \
    --db-instance-identifier MySQLForLambdaTest \
    --db-instance-class db.t2.micro \
    --engine MySQL \
    --allocated-storage 5 \
    --no-publicly-accessible \
    --db-name afancydbname \
    --master-username someusername \
    --master-user-password supersecretpassword \
    --backup-retention-period 3

The database will start in your default VPC, so if you're looking to test the database using an EC2 instance, it's important to make sure your instance uses the same VPC as well as the correct security group. If not, there's a good chance your database will not be accessible on the standard (3306) port. It will take a while for the database to start but eventually you can find the endpoint (i.e. what to connect to) as follows:

 aws rds describe-db-instances |\
     jq -r '.DBInstances[]|select(.DBInstanceIdentifier="mysqlforlambdatest").Endpoint|.Address'

This command assumes you have jq installed. jq is a very nice lightweight JSON parser which is quite helpful when using the AWS cli in JSON mode, which I do.

We'll create the deployment package next. Our code is almost identical to AWS' example, but I've made a few minor changes that in my opinion made sense. First we create a configuration module, called rds_config.py. You'll need the username, password, database name and the endpoint of the running RDS instance.

#!/usr/bin/python
#config file containing credentials for rds mysql instance
db_username = "someusername"
db_password = "supersecretpassword"
db_name = "afancydbname"
db_endpoint = "mysqlforlambdatest.abcdefghijkl.us-east-1.rds.amazonaws.com"

Next we create a file called mysql_test.py. Please note that the file name will be used when creating the Lambda function, since AWS uses the file name as the identifier for the Lambda function. Again this code is largely identical to AWS' but I changed the config code slightly and renamed the handler function:

#!/usr/bin/python
import sys
import logging
import rds_config
import pymysql

rds_host  = rds_config.db_endpoint
name = rds_config.db_username
password = rds_config.db_password
db_name = rds_config.db_name
port = 3306

logger = logging.getLogger()
logger.setLevel(logging.INFO)

try:
    conn = pymysql.connect(rds_host, user=name,
                           passwd=password, db=db_name, connect_timeout=5)
except:
    logger.error("ERROR: Unexpected error: Could not connect to MySql instance.")
    sys.exit()

logger.info("SUCCESS: Connection to RDS mysql instance succeeded")
def lambda_handler(event, context):
    """
    This function inserts content into mysql RDS instance
    """
    item_count = 0

    with conn.cursor() as cur:
        cur.execute("create table Employee3 (EmpID  int NOT NULL, Name varchar(255) NOT NULL, PRIMARY KEY (EmpID))")
        cur.execute('insert into Employee3 (EmpID, Name) values(1, "Joe")')
        cur.execute('insert into Employee3 (EmpID, Name) values(2, "Bob")')
        cur.execute('insert into Employee3 (EmpID, Name) values(3, "Mary")')
        conn.commit()
        cur.execute("select * from Employee3")
        for row in cur:
            item_count += 1
            logger.info(row)
    return "Added %d items to RDS MySQL table" %(item_count)

AWS describes how to create the deployment package, but does not mention you'll need the pymysql library as well. I will create and deploy the package in one step, so first we'll create the role we need to be able to run the function. We will use the standard AWS role for accessing the database. For the assume role policy, we'll use a standard policy allowing Lambda to assume a role, which is needs to use the defined policy. The commands to create the role:

#!/bin/sh
role_name="lambda-vpc-execution-role"
role_policy_arn="arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole"

aws iam create-role \
    --role-name "lambda-vpc-execution-role" \
    --assume-role-policy-document file://assume-role-policy.txt
aws iam attach-role-policy \
    --role-name "${role_name}" \
    --policy-arn "${role_policy_arn}"

And the referenced assume-role-policy.txt file:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "lambda.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

The role that gets created will have an arn, which contains the AWS account number. We'll need that arn in the next step, which is creating the Lambda function. The format will be 'arn:aws:iam::<aws-account-no>:role/lambda-vpc-execution-role'. We will create the Lambda function, using the role we created, the two python source files and the pymysql package we created using virtualenv. It's worth noting that you need to make sure the permissions on the python files are correct for execution at AWS. I wasted a few hours trying to debug a Lambda function that was not running because I had my permissions set to 0700 …. I also set the timeout to a generous 60 seconds.

#!/bin/sh
lambda_name="mysql_test"
zip_file="${lambda_name}.zip"
role_arn="arn:aws:iam::<aws-account-no>:role/lambda-vpc-execution-role"
subnet_ids=`aws ec2 describe-subnets |\
                jq -r '.Subnets|map(.SubnetId)|join(",")'`
sec_group_id=`aws ec2 describe-security-groups --group-name "default" |\
                jq -r '.SecurityGroups[].GroupId'`

files="mysql_test.py rds_config.py"
chmod 755 ${files}
zip -r "${zip_file}" pymysql ${files}

aws lambda create-function \
    --region "us-east-1" \
    --function-name "${lambda_name}"  \
    --zip-file "fileb://${zip_file}" \
    --role "${role_arn}" \
    --handler "${lambda_name}.lambda_handler" \
    --runtime python2.7 \
    --timeout 60 \
    --vpc-config SubnetIds="${subnet_ids}",SecurityGroupIds="${sec_group_id}"

We can now test the Lambda function by running it manually:

#!/bin/sh
lambda_name="mysql_test"

aws lambda invoke \
    --function-name "${lambda_name}" \
    --region us-east-1 \
    output.txt

Running this results in:

# ./run-lambda.sh
{
    "StatusCode": 200
}
# cat output.txt
"Added 3 items to RDS MySQL table"
#

If you like to check the results using a mysql client, simply fire up an EC2 instance in the same VPC and security group and use the mysql client (which you'll need to install), similar to this (of course your instance name will be different):

# aws ec2 run-instances --image-id ami-60b6c60a --instance-type t2.nano --key-name MichelsKP
# ssh ec2-10-10-192-90.compute-1.amazonaws.com
[ec2-user@ip-172-31-10-10 ~]$ sudo yum install mysql
[ec2-user@ip-172-31-10-10 ~]$ mysql --host=mysqlforlambdatest.cm7zzr46mbqx.us-east-1.rds.amazonaws.com \
    --user=someusername --password=supersecretpassword afancydbname
mysql> select * from Employee3;
+-------+------+
| EmpID | Name |
+-------+------+
|     1 | Joe  |
|     2 | Bob  |
|     3 | Mary |
+-------+------+
3 rows in set (0.00 sec)

If you have any comments, questions or other observations, please contact me directly via email: vmic@isc.upenn.edu.

Search form

Accessing MySQL databases from an AWS Python Lambda function