Enabling data insights across education platforms in the UK schools market

Oxford University Press (OUP) has four very successful Education Platforms in the UK schools market. All these platforms were built on, or migrated to, AWS, and have existed for a number of years. Each platform, along with the supporting subscription management services, holds important data. Teams across OUP needed to analyse this data centrally to better inform ongoing updates to the platforms and interactions with their customers.

OUP chose to work with CirrusHQ, an AWS Partner, to utilise AWS services for the creation of a secure infrastructure on which an OUP report developer could design reports to meet the business needs.

Business Challenge / Problem Detail

OUP’s business teams needed the ability to derive enhanced customer insights from across their online products in order to support customer retention and acquisition strategies, and wished to pursue automated processes in order to be efficient and secure in their data handling.

OUP wanted to follow a best practice of reusable architecture patterns within the solution to ensure that any additional data sources (from new products or markets) could be incorporated within the solution if required.

The main goals for the solution included to:

  • only extract the required subset of data from multiple source datasets
  • create a reporting capability
  • ensure that customer data remains secure
  • automate infrastructure processes where appropriate
  • create secure integration points across AWS accounts
  • be cost-effective

Solution

CirrusHQ worked with the OUP project team, including a Solution Architect and Enterprise Architects initially to produce solution options, followed by a high-level architecture of the chosen solution.

The diagrams demonstrate the as-is landscape, and the to-be solution that was proposed agreed and implemented with CirrusHQ.

In the as-is landscape, a number of data stores were identified to be incorporated into the solution. Those on AWS were production databases, which were only accessible by the application layer of the related platform within its VPC. The 4 platforms were held across 3 AWS accounts, including 4 RDS data stores and one DynamoDB data stores.

In the to-be solution:

  • A new AWS account was created
  • Read-replicas were created of the production databases to ensure they did not experience any abnormal performance as a result of the reporting solution – these read-replicas cannot be promoted to master
  • A new database was created, linked to the read-replicas via VPC peering to ensure that data cannot be sent over public IP addresses
  • An application server was included for automated report generation
  • A putty server was included so that the report developer can securely view the data on the read-replicas

CirrusHQ was guided both by customer needs and policies, and AWS best practices and approaches. One of the main requirements was the need to ensure that data was secure both in transit and at rest, therefore the solution utilises VPC peering to a targeted private subnet that only contained the read-replicas. In addition, all s3 buckets were encrypted using AWS server-side encryption AES-256. Finally, the data within the EBS volumes were also encrypted using AWS KMS. Any access to the account is locked down to OUP’s IP only, achievable as this is an in-house service.

CirrusHQ utilised AWS managed services where possible. Here are some examples:

Example 1 – CirrusHQ created a lambda function that is triggered on a data file being placed into an S3 bucket. The function transfers the contents of the file into a staging table. The file itself is then moved to a separate archive folder, to be automatically deleted after an agreed amount of time determined by OUP.

Example 2 – One of the challenges for the solution was to move data from two tables in a DynamoDB non-relational database, to an MS SQL relational database in a separate AWS account. To resolve this, DynamoDB streams were enabled allowing the incorporation of a Lambda function, which is triggered when data is added to the tables. The function forwards the data to an AWS Kinesis stream, triggering another Lambda function, that is configured to use Kinesis Firehose that pushes the data to S3. Another Lambda function is then triggered to move the file between the AWS accounts using a VPC endpoint. At that point, the same pattern (highlighted in example 1) is used to move and insert the data into the MSSQL database on the AWS reporting account.

The following high-level architecture demonstrates the solution provided.

There were a number of cost optimisation features applied to ensure that the solution was cost effective on AWS. Tags were added to the main commute resources to enable them to be turned off out of work hours and at weekends this meant that the major costs within AWS were off for over 60% of the time. The use of managed services, including Lambda, were used where possible to ensure that the functions were only triggered by events to keep costs down further and to provide best practice.

CirrusHQ Role

CirrusHQ worked closely with OUP’s project team as AWS subject matter experts, to help design, build, deliver, and maintain the infrastructure for their reporting solution. The following are some of the key responsibilities of the team:

  • Peer review of the conceptual design
  • Cloud system architecture and design
  • DevOps implementation and produced Lambda functions
  • Deploying and configuring AWS Infrastructure
  • Cost and Performance optimisation
  • Monitoring and alerts

Tooling (Tools and Technologies)

Cloud Formation, EC2, S3, EBS, KMS, VPC, VPC Peering, Lambda, DynamoDB Streams, Kinesis Streams, Kinesis Firehose, SES, SNS

Expected Challenges

  • Getting data from DynamoDB from one AWS account to a MS SQL server on another AWS account
  • Creating Lambda scripts to ingest data sent via CSV into a staging table within MS SQL