
Tech chat: Scaling our systems to supercharge AI performance

By Richard Anderson / Founding Partner, Nuon AI

In our last blog, we shared the results of recent performance testing, which revealed our AI can process 1,000,000 quotes a week. Now, Nuon Founding Partner Richard Anderson takes you on a deep dive into the tech behind those numbers, giving you a glimpse into a day in the life of our Engineering team.

We set out to use App Engine right from the off. We knew that sooner or later we’d need to scale to meet the performance needs of a client quoting through a motor aggregator in the UK, where figures like 1,000,000 quotes a week are commonly bandied about. We didn’t expect such a client to be one of our first, mind you.

That 1,000,000 figure on its own doesn’t tell you much in reality. If the quotes were spread evenly over the week, that would be less than 2 a second – a pushover by anyone’s performance standards. But they’re not spread evenly; they’re extremely peaky, and the devil is definitely in the peaks.
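
Back-of-the-envelope, if you want to check our working (a trivial sketch):

```python
# 1,000,000 quotes spread perfectly evenly over one week.
quotes_per_week = 1_000_000
seconds_per_week = 7 * 24 * 60 * 60   # 604,800 seconds

print(quotes_per_week / seconds_per_week)   # ~1.65 quotes a second
```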

We test our system all the time, of course. Every change, every commit, and every release is tested, including performance testing. In addition, we run market simulations almost constantly, which push hundreds of thousands of quotes through the AI. So we were confident that we had even a UK motor aggregator environment covered.

Before discovering we were uncool, we were looking at results like this:

[Graph: Total transactions per second]

That’s one graph from one of our test reports (using JMeter by the way). Each test run gives us 30 or so similar graphs plus a set of metrics from the servers themselves.
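
If you’re wondering where a graph like that comes from: JMeter can log every request to a CSV results file (a .jtl), and bucketing those rows by second gives you a transactions-per-second series. A minimal sketch – the file name is a placeholder, and the column name is JMeter’s CSV default:

```python
import csv
from collections import Counter

# Bucket per-request results by wall-clock second to get a
# transactions-per-second series like the graph above.
tps = Counter()
with open("results.jtl", newline="") as f:        # placeholder file name
    for row in csv.DictReader(f):
        tps[int(row["timeStamp"]) // 1000] += 1   # epoch millis -> seconds

for second in sorted(tps):
    print(second, tps[second])
```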

You have to be careful, though. There’s a lot said in The Lean Startup, among other places, about vanity metrics, and it applies just as well to how we were viewing our performance metrics. You have to be looking in the right place, and inadvertently we were focusing on metrics that showed our progress in the rosiest light. That’s why we weren’t as cool as we thought.

How we discovered that we weren’t

When we engaged with the client, performance wasn’t the first topic on the agenda, but it was close. We discussed rough numbers and they fell within the range we had anticipated and tested for, so all was good – but we didn’t initially talk much about the peaks.

As the project progressed, we arrived at a peak figure of 200 quotes a second. App Engine will scale as necessary, of course, and we can pre-configure the backend servers as needed. As you can see from the graph in the previous section, even in its default (very modest!) configuration, 100 quotes a second was possible. So 200 didn’t seem to be a problem. The figure was later increased to 400. After all, there are peaks, and then there are peaks.

The client, naturally, ran their own performance tests against our server, and this is when we encountered the problem. Their tests were quite brutal: an immediate surge of 200 threads, each posting quotes as fast as possible. App Engine will scale, but that takes a little time, so we saw some very disappointing response times during the surge. However, when we started focusing on the response times we realised something worse.
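
To give a feel for the shape of that test, here’s a minimal sketch of a surge harness – not the client’s actual code, and the URL and payload are placeholders:

```python
import time
import threading
import requests   # third-party HTTP client: pip install requests

QUOTE_URL = "https://quotes.example.com/quote"   # placeholder endpoint
SURGE_THREADS = 200

latencies, lock = [], threading.Lock()

def hammer(deadline):
    # Each thread posts quotes back-to-back, as fast as responses return.
    while time.time() < deadline:
        start = time.time()
        requests.post(QUOTE_URL, json={"risk": "placeholder"}, timeout=30)
        with lock:
            latencies.append(time.time() - start)

deadline = time.time() + 60   # a one-minute surge
threads = [threading.Thread(target=hammer, args=(deadline,))
           for _ in range(SURGE_THREADS)]
for t in threads:
    t.start()   # no ramp-up: all 200 threads fire at once
for t in threads:
    t.join()

print(f"{len(latencies)} quotes, slowest took {max(latencies):.2f}s")
```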

[Graph: Time vs threads]

The response times were poor across the board: way over our one-second limit, sometimes 5 or 10 seconds, or even more. That’s nowhere near good enough in the competitive motor quote comparison market.

So what was going wrong, and how had our confidence been so misplaced?

How we became cool again

Our performance guru sat in a darkened room with a cold towel wrapped around his head for a while and stared deeply into the performance numbers. One thing came to light pretty quickly: knowing that n-hundred quote responses are returned in a given second says nothing about how long each one took from start to finish.

We’d been flattering ourselves with the wrong metric. We needed to be looking at the response times themselves.
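
The two metrics really are independent: a server can complete 200 responses in a given second – a healthy-looking throughput graph – while every one of those requests took 10 seconds to come back. A toy illustration:

```python
import statistics

def percentile(samples, p):
    # p-th percentile of a list of response times.
    ordered = sorted(samples)
    return ordered[min(len(ordered) - 1, int(len(ordered) * p / 100))]

# One second of traffic: 200 responses completed - 200 TPS, looks great...
response_times = [10.0] * 200   # ...yet every request took 10 seconds.

print("throughput: 200 responses/second")
print(f"median response time: {percentile(response_times, 50):.1f}s")
print(f"mean response time:   {statistics.mean(response_times):.1f}s")
```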

When we started looking in the right place, the problem became clearer. When pretty much all the responses are too slow, it’s a bread-and-butter problem for a developer (sketched in Python after the list):

  • Profile the code
  • See where the time is being spent
  • Optimise
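
In Python terms, the first two steps can be as simple as this – a generic sketch, where `handle_quote` is a stand-in for the real quote-handling path, not our actual code:

```python
import cProfile
import pstats

def handle_quote():
    # Stand-in for the real quote-handling work.
    return sum(i * i for i in range(100_000))

# Profile the code...
cProfile.run("handle_quote()", "quote.prof")

# ...then see where the time is being spent, biggest cost first.
pstats.Stats("quote.prof").sort_stats("cumulative").print_stats(10)
```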

That’s exactly what we did. We found a number of useful optimisations along the way, but then hit the motherlode: a huge database locking issue.

Who would have guessed it? A database lock causing a performance issue 😉

That locking issue was the fundamental cause of the whole problem. It was made worse by being particularly noticeable during cold starts – when the AI is encountering a lot of new patterns very quickly – and that’s exactly the scenario the client’s 200-thread surge testing created.
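
We won’t reproduce the actual fix here, but the general shape of curing lock contention – whether the lock lives in the database or in code – is to shrink the critical section so the expensive work happens outside the lock. An illustrative sketch, not our real schema or code:

```python
import threading

pattern_cache = {}
cache_lock = threading.Lock()

def expensive_analysis(key):
    # Stand-in for the slow work done when a new pattern is first seen.
    return sum(i * i for i in range(1_000_000))

def learn_pattern_before(key):
    # Problem: the lock is held across the slow work, so during a
    # cold-start surge every thread queues behind a single writer.
    with cache_lock:
        if key not in pattern_cache:
            pattern_cache[key] = expensive_analysis(key)

def learn_pattern_after(key):
    # Fix: do the slow work outside the lock, holding it only for the
    # cheap check and update.
    with cache_lock:
        if key in pattern_cache:
            return
    result = expensive_analysis(key)
    with cache_lock:
        pattern_cache.setdefault(key, result)
```

The trade-off is that two threads can occasionally duplicate the analysis, but nobody blocks while it happens – a price well worth paying in a surge.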

Once fixed, bingo: we could process more than 1,000 quotes per second.

[Graph: Total transactions per second]

[Graph: Time vs threads]

On top of this, the backend analytics show only 20% CPU utilisation at that level, suggesting the current configuration could handle several times that rate if we let App Engine loose – just put your head back to stop the nosebleed.

Is Nuon right for you? Let’s talk! Get in touch now to find out more about how our AI insurance products can benefit your business.
