Updating security controls on Palringo’s infrastructure using CloudFormation

Taking a large-scale CloudFormation template suite, whose shared resources limited both security and scale, to a target state of increased security and improved future-proofing

Challenge

Palringo (recently rebranded as WOLF – World OnLine Festival) is a messaging and gaming platform that gives its users a place to chat freely, create their own social spaces, and connect with people around the world. Over time, it had developed several bots, each of which had its own unique CloudFormation template.

A portion of the existing setup is shown below:

Palringo’s existing AWS solution for deploying its bot functions took advantage of services such as Lambda and Docker via CloudFormation. As part of the standard deployment process, a security group parameter was passed into each new CloudFormation template. In the AWS Virtual Private Cloud (VPC) environment, a security group acts as a virtual firewall that controls the traffic for one or more instances. Rules can be added to each security group and modified at any time, and any new rules automatically apply to all instances that are associated with the security group.

In Palringo’s case, any new template that had to be rolled back for any reason would try to apply those changes to the security groups of all the other bots. This essentially created a situation where several CloudFormation templates were each trying to control the same security group and the shared resources associated with it. The sheer scale of this issue had made it unmanageable, and Palringo asked CirrusHQ to create a solution that would bring the security groups back under control and provide a replicable process moving forward.
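The contention described above can be sketched in a few lines of Python. This is an illustration only, not Palringo’s actual templates: the bot names, ports, CIDR range and the `SharedSecurityGroupId` parameter name are all hypothetical, and the sketch assumes each legacy template attached a rule directly to the passed-in group using a standalone `AWS::EC2::SecurityGroupIngress` resource.

```python
# Hypothetical legacy bot template, expressed as a Python dict in
# CloudFormation's JSON shape. All names and values are assumptions.
def legacy_bot_template(bot_name: str, port: int) -> dict:
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Parameters": {
            # The security group passed into every bot template.
            "SharedSecurityGroupId": {"Type": "AWS::EC2::SecurityGroup::Id"},
        },
        "Resources": {
            f"{bot_name}Ingress": {
                "Type": "AWS::EC2::SecurityGroupIngress",
                "Properties": {
                    # The rule is attached to the passed-in group, so this
                    # stack creates, updates and rolls back a rule on a
                    # resource that other stacks also write to.
                    "GroupId": {"Ref": "SharedSecurityGroupId"},
                    "IpProtocol": "tcp",
                    "FromPort": port,
                    "ToPort": port,
                    "CidrIp": "10.0.0.0/16",  # hypothetical VPC range
                },
            },
        },
    }


def stacks_touching_group(templates: dict, parameter: str) -> list:
    """Return the names of templates that attach rules to the passed-in group."""
    names = []
    for name, template in templates.items():
        for resource in template["Resources"].values():
            if resource["Properties"].get("GroupId") == {"Ref": parameter}:
                names.append(name)
                break
    return sorted(names)


templates = {
    name: legacy_bot_template(name, port)
    for name, port in [("QuizBot", 8080), ("TriviaBot", 9090)]
}
writers = stacks_touching_group(templates, "SharedSecurityGroupId")
print(writers)  # ['QuizBot', 'TriviaBot']
```

Under these assumptions every bot stack is a writer on the same group, which is the situation the paragraph above describes.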

Shown below is an example where one of the bots has had an issue and is affecting the main security group.

Solution

CirrusHQ created a new deployment process for Palringo that introduced a new shared Elastic Container Service (ECS) security group and updated existing deployments to reflect the new approach. In essence, it helped Palringo move from the old model, in which each template used a passed-in security group, to a new model that links each template’s individual security group to the shared group. Because the egress rule is created on the bot’s own group and the shared group is not modified, any rollback affects only the bot itself.

This shared security group serves as a link between each bot’s structure and the master list of access rules, acting as a passthrough from the newly created bot security group to the ECS security group. As a result, changes are made only to the security groups that the CloudFormation template itself creates and never to shared resources. The passthrough allows any individual security group to be added or deleted from a deployed template without affecting any other group, essentially ring-fencing the new template so there are no knock-on impacts in the event of a rollback.
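A minimal sketch of the ring-fenced model, again in Python and again with hypothetical names (`VpcId`, `SharedEcsSecurityGroupId`, the bot name and the port are assumptions, not taken from CirrusHQ’s templates). It assumes the bot stack creates its own `AWS::EC2::SecurityGroup` and only refers to the shared group as the destination of an egress rule, so every resource the stack owns is one it created itself.

```python
import json


def isolated_bot_template(bot_name: str, port: int) -> dict:
    """Hypothetical bot template that owns its security group outright."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Parameters": {
            "VpcId": {"Type": "AWS::EC2::VPC::Id"},
            # Passed in for reference only; the stack never edits it.
            "SharedEcsSecurityGroupId": {"Type": "AWS::EC2::SecurityGroup::Id"},
        },
        "Resources": {
            f"{bot_name}SecurityGroup": {
                "Type": "AWS::EC2::SecurityGroup",
                "Properties": {
                    "GroupDescription": f"Per-bot group for {bot_name}",
                    "VpcId": {"Ref": "VpcId"},
                    # The egress rule lives on the bot's own group and
                    # merely names the shared group as its destination.
                    "SecurityGroupEgress": [
                        {
                            "IpProtocol": "tcp",
                            "FromPort": port,
                            "ToPort": port,
                            "DestinationSecurityGroupId": {
                                "Ref": "SharedEcsSecurityGroupId"
                            },
                        }
                    ],
                },
            },
        },
    }


def modifies_shared_group(template: dict, parameter: str) -> bool:
    """True if any standalone rule resource is attached to the passed-in group."""
    rule_types = ("AWS::EC2::SecurityGroupIngress", "AWS::EC2::SecurityGroupEgress")
    for resource in template["Resources"].values():
        if resource["Type"] in rule_types:
            if resource["Properties"].get("GroupId") == {"Ref": parameter}:
                return True
    return False


template = isolated_bot_template("QuizBot", 8080)
print(json.dumps(template, indent=2))
print(modifies_shared_group(template, "SharedEcsSecurityGroupId"))  # False
```

Deleting or rolling back a stack built this way removes only `QuizBotSecurityGroup` and its rules; the shared group’s own rule set is never part of the stack’s change set.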

From a DevOps perspective, the solution centralised control of the security group and separated out the bot deployment security so that changes can be made without having to fix every bot deployment.

The solution is shown below: the new shared security group has been added and the ingress rules have been changed to include it.

Outcome

Existing functions deployed via the new templates containing the shared security group can connect to the same services as in the original template: each template simply refers to the updated template CirrusHQ created, rather than requiring changes to the bots themselves. This promotes agility by virtually eliminating any additional work when new bots are launched.

Any new iteration of a service can be deployed rapidly: in terms of application security, changes to the deployed ECS Tasks (bots) do not need to factor in changes to security, and any new deployment using the new templates should roll back without affecting existing setups.

Overall, the solution enables rapid and trouble-free creation and deployment of new ECS Tasks (bots) and allows for further scalability of the team’s CloudFormation suite in the future.