Lessons Learned Testing Serverless Libraries with Architect
I'm building a library to be used on AWS Lambda, and the hardest part of testing a library designed to run in a serverless environment is having a local environment. A lot of projects have this problem, and there are many ways to solve it. Some run local services via Docker and Kubernetes with a lot of overhead, others spin up test environments in the cloud, which takes a very long time, and others choose simulators. I chose the Architect Sandbox as my simulator and I don't regret it.
I needed the following:
- API Gateway to simulate WebSockets
- A NodeJS Lambda environment
- DynamoDB
- Step Functions
Architect covered 3 out of 4 of these. I looked into other solutions and they were either expensive for the features I needed (both in money and time) or slow. Architect can simulate the following services very quickly, and entirely in NodeJS:
- DynamoDB, via Dynalite, a NodeJS DynamoDB server
- API Gateway HTTP and WebSocket connections
- Lambda execution environment
I was able to run its sandbox, which provides these services, in my tests via its API. I can open multiple instances at a time, which lets me run tests in parallel (I've written about this previously), and I can spin up different configurations of the services as needed.
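To make that concrete, here is a minimal sketch of driving the sandbox from a test. It assumes the `start()` and `end()` functions exported by `@architect/sandbox`, each returning a promise, and the option names (`quiet`, `port`) should be checked against the version you have installed. The `withSandbox` helper and the injected `sandbox` parameter are hypothetical names of my own, used so the helper can be exercised with a stub.

```javascript
// Hypothetical helper: start a sandbox, run a test body, and always shut down.
// `sandbox` is injected; in a real suite you would pass require('@architect/sandbox').
async function withSandbox(sandbox, options, testBody) {
  // Assumed API: start(options) resolves once the local services are listening.
  await sandbox.start({ quiet: true, ...options });
  try {
    return await testBody(options);
  } finally {
    // Assumed API: end() resolves once the services have shut down.
    // The finally block makes sure it runs even when the test body throws.
    await sandbox.end();
  }
}
```

A test would then wrap its body in something like `withSandbox(require('@architect/sandbox'), { port: 4000 }, async () => { ... })`, with a different port per parallel test file.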
What I'd like to share now are the things I've learned.
We don't have any simulated "Endpoint" for ApiGatewayManagementApi, which is what you'd use for controlling WebSockets: sending messages and getting meta information. It's not a huge deal breaker, as @architect/functions has a ws.send() method for sending messages, but you can't actually do the other operations at this time.
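As a rough illustration of the part that does work, here is a sketch of a WebSocket handler that replies through `ws.send()`. It assumes the call takes an object with the connection `id` and a `payload`, and that the connection ID arrives on `event.requestContext.connectionId`. The `createEchoHandler` factory is a hypothetical name, and `ws` is injected so the sketch can run against a stub.

```javascript
// Hypothetical WebSocket handler factory. `ws` is injected; in an Architect
// app you would pass require('@architect/functions').ws.
function createEchoHandler(ws) {
  return async function handler(event) {
    const connectionId = event.requestContext.connectionId;
    const message = JSON.parse(event.body || '{}');
    // Assumed call shape: send({ id, payload }) delivers payload to that connection.
    await ws.send({ id: connectionId, payload: { echo: message } });
    return { statusCode: 200 };
  };
}
```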
We also don't have any Cognito, auth tokens, or other advanced features. I don't need them, but they are not available.
The NodeJS Lambda environment isn't exactly the same as production. There used to be a project that would download the NodeJS handlers from AWS for local use, but it's not currently maintained. So instead Arc has its own handler functions. I hit some edge cases around returning promises vs. using async functions (quickly patched). The NodeJS environment is missing the Lambda context object, which includes the function name, remaining time, awsRequestId, etc. (As an aside, this makes my logging and APM systems unhappy in application development.) I have some open PRs to add some of this data. However, the HTTP and WebSocket events are 95% populated and include all the reasonable data you'll need for an application.
That missing 5% meant some weird tricks around guessing the endpoint for the management API didn't work, but I'm OK with it.
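Until the context object is filled in locally, a library can defend itself with a small shim. The sketch below is an assumption of mine, not something Architect provides: `withFallbackContext` and its default values are hypothetical, and only the field names (`functionName`, `awsRequestId`, `getRemainingTimeInMillis`) come from the NodeJS Lambda context object.

```javascript
const { randomUUID } = require('node:crypto');

// Hypothetical wrapper: supplies stand-in context fields when the local
// environment does not pass a complete context. Default values are made up.
function withFallbackContext(handler, defaults = {}) {
  const timeoutMs = defaults.timeoutMs ?? 30000;
  return async function wrapped(event, context) {
    const startedAt = Date.now();
    const fallback = {
      functionName: defaults.functionName ?? 'local-function',
      awsRequestId: randomUUID(),
      // Counts down from the assumed timeout, never below zero.
      getRemainingTimeInMillis: () =>
        Math.max(0, timeoutMs - (Date.now() - startedAt)),
    };
    // Any field already on the incoming context overrides the fallback.
    return handler(event, { ...fallback, ...(context || {}) });
  };
}
```

In production the real context passes straight through, so logging and APM code can read the same fields in both environments.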
I think at some point we should support the Runtime and Logs APIs, but they aren't in wide use and the AWS-provided runtimes are pretty good for most use cases. I just won't be able to run WASM or Rust projects with Architect without them.
Dynalite, the backing engine of the DynamoDB simulator, hasn't had a release in about a year and will crash the sandbox process if it has any errors. It's missing some features (like transactions and streams), but what it does do, it does very well. It doesn't like me opening and closing it constantly and will crash if there's a pending operation on shutdown. This causes an unhandled exception, which will fail any testing framework worth its salt.
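One way to avoid shutting down with work still in flight is to track the database calls a test makes and wait for them before closing the sandbox. This is a sketch of that idea under my own assumptions, not a feature of Dynalite or Architect; `createPendingTracker` is a hypothetical name.

```javascript
// Hypothetical tracker: remembers in-flight promises so shutdown can wait for them.
function createPendingTracker() {
  const pending = new Set();
  return {
    // Wrap any promise-returning database call made during a test.
    track(promise) {
      pending.add(promise);
      const forget = () => pending.delete(promise);
      // Forget the promise once it settles, whether it resolved or rejected.
      promise.then(forget, forget);
      return promise;
    },
    // Resolves once every tracked operation has settled, including any
    // operations that were added while waiting.
    async drain() {
      while (pending.size > 0) {
        await Promise.allSettled([...pending]);
      }
    },
    get size() {
      return pending.size;
    },
  };
}
```

A teardown hook would then call `await tracker.drain()` before asking the sandbox to stop.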
The sandbox has a bunch of its own output (which you can turn off), and it also optionally logs HTTP requests and WebSocket Lambda invocations by request ID. Functions themselves don't log in real time, and sometimes you'll get interleaved output, so it's not clear which output came from which function. I think this is solvable by patching console.log and adding prefixes similar to the ones Lambda adds in production. However, it would be desirable to have standard log output from all the services, like we do in CloudWatch. This would make debugging a lot easier.
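The console.log patch could be as small as the sketch below. The `prefixConsoleLog` name and the prefix format are hypothetical; the idea is only to tag each line with a function name and request ID so interleaved output can be attributed.

```javascript
// Hypothetical helper: prefix every console.log line with an identifier.
function prefixConsoleLog(prefix) {
  const original = console.log;
  console.log = (...args) => original.call(console, `[${prefix}]`, ...args);
  // Return a function that restores the previous console.log.
  return () => {
    console.log = original;
  };
}
```

Calling `const restore = prefixConsoleLog('ws-connect req-123')` at the top of a handler and `restore()` at the end keeps the patch scoped to one invocation.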
Architect is the fastest system I tested for my needs. The Sandbox starts and stops in a few milliseconds; however, it's still not free. Lambda processes aren't reused, so every request spins up a new NodeJS process. If you have an older computer like I do, this might be noticeable in a test suite that starts and stops the sandbox many times a second. I've worked around this by keeping the sandbox running between tests and keeping tests as stateless as possible. It's also worth noting that other people don't seem to have this problem, so maybe it's me. I feel like in a large application it will be hard to keep test times down. But again, it's the fastest system I tested.
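Keeping the sandbox alive between tests can be sketched as a lazily started, shared instance. As before, this assumes `@architect/sandbox` exposes promise-returning `start()` and `end()`; `createSharedSandbox` is a hypothetical helper and the sandbox module is injected so the sketch runs against a stub.

```javascript
// Hypothetical helper: start the sandbox at most once and share it across tests.
function createSharedSandbox(sandbox, options = {}) {
  let started = null; // promise for the in-progress or completed start
  return {
    // Safe to call from every test; only the first call starts the sandbox.
    ensureStarted() {
      if (!started) started = sandbox.start({ quiet: true, ...options });
      return started;
    },
    // Call once from a global teardown hook; does nothing if never started.
    async stop() {
      if (!started) return;
      await started;
      started = null;
      await sandbox.end();
    },
  };
}
```

Because the instance is shared, tests have to stay stateless or clean up after themselves, for example by using unique keys per test.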
Architect doesn't claim to support Step Functions, but I can see a plugin for managing them in production on the horizon. Local development, however, may never be supported without a major investment in a simulator.
The best I've found by a lot
While it's a large ecosystem and I'm sure I haven't seen it all, I think Architect is the best local development option for serverless applications and libraries I can find. It's fast, reliable, and has a single dependency: NodeJS. The fact that I can also use it to deploy my applications and tame CloudFormation is just gravy on top of the testing abilities the sandbox gives. I don't know how else I'd test a library written for Lambda today.