Recently I was working on a little thought experiment to see if we could use AWS Lambda for a small web application we wanted to move off-premises. This particular application consists of a front end which grabs time-based measurements from a simple MySQL table. The table's creation script looks something like this:

CREATE TABLE time_based_data (
  the_date_time datetime NOT NULL,
  ...
  PRIMARY KEY  (the_date_time)
)

I thought it would be relatively easy to convert the table from SQL to the NoSQL that is DynamoDB, and indeed it was. In my Python code I created a little function that creates the table, similar to the code below:

import boto3
import botocore.exceptions

def create_dynamodb_table(table_name, key_name):
    """Create the DynamoDB table if it does not already exist.

    Args:
        table_name (str): Name of the DynamoDB table to create.
        key_name (str): Attribute name to use as the partition key.
    Returns: None
    """
    client = boto3.client('dynamodb')
    try:
        # describe_table raises a ClientError if the table does not exist
        client.describe_table(TableName=table_name)
    except botocore.exceptions.ClientError:
        print("DynamoDB table '" + table_name +
              "' does not appear to exist, creating...")
        dynamodb = boto3.resource('dynamodb')
        table = dynamodb.create_table(
                    TableName=table_name,
                    KeySchema=[{'AttributeName': key_name,
                                'KeyType': 'HASH'}],  # Partition key
                    AttributeDefinitions=[{'AttributeName': key_name,
                                           'AttributeType': 'S'}],
                    ProvisionedThroughput={'ReadCapacityUnits': 5,
                                           'WriteCapacityUnits': 5}
                )
        # Wait until the table exists.
        table.meta.client.get_waiter('table_exists').wait(TableName=table_name)
        print("DynamoDB table '" + table_name + "' created.")

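Calling it is then just a one-liner; the key attribute name 'timestamp' here is my own assumption, chosen to match the key the insert function below expects:

create_dynamodb_table('time_based_data', 'timestamp')
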
After the table is created (which can easily be done on the fly), we can populate the data simply by using a Python dict, with the key expected in the format YYYYmmddHHMMSS. We can then access all of the data (at 1-second granularity) using the timestamp. This all seemed very promising:

def insert_into_dynamodb(data):
    """Insert a data record into DynamoDB.

    Args:
        data (dict): The dictionary object to store. The 'timestamp' key
                     will be used as the DynamoDB partition key.
    Returns: None
    """
    dynamodb = boto3.resource('dynamodb')
    # table_name is assumed to be set at module level, e.g. 'time_based_data'
    table = dynamodb.Table(table_name)
    table.put_item(Item=data)  # store the dict as a DynamoDB item

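For illustration, inserting a record looks something like this (the measurement fields are made up; only the 'timestamp' key matters to DynamoDB):

insert_into_dynamodb({
    'timestamp': '20160401120500',   # YYYYmmddHHMMSS, the partition key
    'temperature': '21.4',           # hypothetical measurement values
    'humidity': '48'
})
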
After this was done, I wrote a Lambda function (more about that later) which gets triggered by AWS every 5 minutes to grab the time-based data and put it in DynamoDB. This all worked flawlessly, and it seemed that this might be a great way to move the application to an AWS-based system.

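I'll save the Lambda details for later, but the handler itself is little more than glue. A minimal sketch, assuming a hypothetical fetch_measurements() helper that returns the current readings as a dict:

from datetime import datetime

def lambda_handler(event, context):
    """Entry point invoked by the scheduled (every 5 minutes) trigger."""
    data = fetch_measurements()  # hypothetical helper returning a dict of readings
    data['timestamp'] = datetime.utcnow().strftime('%Y%m%d%H%M%S')
    insert_into_dynamodb(data)
    return {'status': 'ok'}
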
However, once I got to writing the code that would display the data, I needed to re-implement the functionality that, in the original code, started along the lines of:

select * from time_based_data
 where the_date_time >= '2016-04-01'
   and the_date_time < '2016-04-02'

And here things started to become complicated. DynamoDB is designed as a key/value store, not as something that can retrieve a select range of keys without scanning the whole table. So trying to implement the functionality as designed by the original SQL-based code ran into the ground quite quickly. I knew that DynamoDB does not allow duplicate keys, but I did not realize that searching across a range of keys involved a full table scan.

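For illustration, the closest equivalent with this hash-only key schema is a Scan with a filter, which still reads every item in the table and only discards the non-matching ones afterwards (attribute and table names are the assumptions from the earlier examples):

import boto3
from boto3.dynamodb.conditions import Attr

table = boto3.resource('dynamodb').Table('time_based_data')
response = table.scan(
    FilterExpression=Attr('timestamp').between('20160401000000',
                                               '20160402000000')
)
items = response['Items']  # read capacity is consumed for the whole table, not just these items
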
There's been some discussion on the web about this (e.g. here), but in the end I decided it's simply not what DynamoDB wants to do. So I've designed an alternate approach using S3 that seems to work reasonably well. I'll write that up soon.

Having said all of that, DynamoDB is incredibly powerful and very easy to use. I'm sure we'll find a use for it in the near future.