At Underground Elephant, we recently built a new call routing platform for our customers. In this post, we want to talk about a few of the challenges we ran into while coding this type of system. As a company, this was new ground for us: we hadn’t previously had much involvement with telephony, so this article shares what we learned while building the system.
Our call routing platform is at its heart a hybrid API. We started with a RESTful web API to match prospects to agents, and combined it with a phone-based API using Twilio as the telephony provider. The general flow is:
Initial API request comes in
If the incoming request matches to a client, we provision a phone number for this transaction
If we receive a phone call on that number within an allocated amount of time, we can then route the call to our matched client; otherwise, we have to de-allocate the phone number and consider the transaction failed
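The flow above can be sketched as a small number pool with a lease timeout. This is only an illustration, not our production code: the class name, method names, and the 120-second default are all assumptions made for the example, and the pool lives in memory rather than in a shared store.

```python
import time
from dataclasses import dataclass, field


@dataclass
class NumberPool:
    """Hypothetical in-memory pool of phone numbers leased to transactions."""
    free: list
    timeout_seconds: float = 120.0  # assumed value, not a recommendation
    # phone number -> (transaction id, time the number was allocated)
    leased: dict = field(default_factory=dict)

    def allocate(self, transaction_id, now=None):
        """Lease a number to a matched transaction, or return None if none are free."""
        now = time.time() if now is None else now
        if not self.free:
            return None
        number = self.free.pop()
        self.leased[number] = (transaction_id, now)
        return number

    def match_incoming_call(self, number, now=None):
        """Return the transaction for a call on `number`, or None if there is no live lease."""
        now = time.time() if now is None else now
        lease = self.leased.get(number)
        if lease is None:
            return None
        transaction_id, allocated_at = lease
        if now - allocated_at > self.timeout_seconds:
            return None  # too late; expire() will reclaim the number
        # Assumption: the number can be reused as soon as the call is matched.
        del self.leased[number]
        self.free.append(number)
        return transaction_id

    def expire(self, now=None):
        """De-allocate timed-out numbers and return the failed transaction ids."""
        now = time.time() if now is None else now
        failed = []
        for number, (transaction_id, allocated_at) in list(self.leased.items()):
            if now - allocated_at > self.timeout_seconds:
                del self.leased[number]
                self.free.append(number)
                failed.append(transaction_id)
        return failed
```

In this sketch, a periodic job would call expire() and mark the returned transactions as failed, while the incoming-call handler would call match_incoming_call() with the dialed number.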
One of the problems we faced with this solution is that the phone system, as it exists today, offers no good way to pass extra data along with a phone call. An incoming call to one of our numbers has absolutely zero associated context. Our options were limited to either having the agent on the phone key in a numeric code, or allocating a dedicated phone number for a short period of time. There are upsides and downsides to each approach:
Using a numeric code allows a certain degree of certainty that the call is being matched to the correct API transaction, but it slows down the connection and introduces the possibility of human error.
Using a dedicated number gives you a more reliable connection, but requires more back-end resources (pooled phone numbers) and requires a strict connection timeout.
We ended up using the second option, as the speed of the phone connection is paramount for our use case. If we were in a situation where consumer drop-off wasn’t a concern, we might have opted to use a numeric code for matching instead.
As with any distributed system, we had to account for the fact that sometimes an API request would fail. Twilio has a nice system of web callbacks to give our servers updates on the status of calls, but occasionally a callback request fails for some reason and we never hear about a status change. To accommodate this, we run a few background scripts that check for ongoing phone calls which look suspicious: either sitting in a “ringing” state for too long, or just running for a particularly long period of time.
Our systems stay in sync 99% of the time, but a missed callback can result in a particularly obvious error.
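A background check of this kind might look roughly like the sketch below. The record fields, the status strings, and both thresholds are assumptions chosen for illustration; a real script would read call records from its own database and re-check flagged calls with the telephony provider.

```python
from dataclasses import dataclass

# Assumed thresholds for illustration only.
MAX_RINGING_SECONDS = 60
MAX_CALL_SECONDS = 2 * 60 * 60


@dataclass
class CallRecord:
    """Hypothetical local record of a call, updated by status callbacks."""
    call_id: str
    status: str        # e.g. "ringing", "in-progress", "completed"
    started_at: float  # Unix timestamp of the last known status change


def find_suspicious_calls(calls, now):
    """Return (call_id, reason) pairs for calls that may have missed a callback."""
    suspicious = []
    for call in calls:
        age = now - call.started_at
        if call.status == "ringing" and age > MAX_RINGING_SECONDS:
            suspicious.append((call.call_id, "ringing too long"))
        elif call.status == "in-progress" and age > MAX_CALL_SECONDS:
            suspicious.append((call.call_id, "running too long"))
    return suspicious
```

Keeping the check as a pure function of the records and the current time makes it easy to run on a schedule and to test without placing real calls.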
Dealing with phone systems
Looking at the phone call lifecycle as part of an API looked simple at first – you dial a number, it rings, and a person picks up. However, we ran into some unexpected hiccups when dialing out to businesses.
Phone systems can lie. In some offices, the call is ‘picked up’ by an automated phone system, which then might play a ringing tone over the line until an agent is ready to actually pick it up. On our end, this showed as a connected call, and was indistinguishable from someone actually picking up the phone.
Twilio provides an experimental feature to detect if a phone call reached a human or an answering machine, based on the speech patterns when the phone is first picked up. However, in some cases the phone system of the agent might start with a greeting message (which sounds exactly like an answering machine), followed by ringing, elevator music, or a transfer to a real person.
Our best workaround so far has been requesting direct lines – another idea we’ve been playing around with is requiring some interaction from the recipient of the phone call before considering it to be connected. Again, speed of connection is a primary concern, so we have to weigh the impacts a voice prompt would have on time to connect.
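The interaction idea can be pictured as a tiny state model in which an answered call only counts as connected after the recipient presses a key. This is a sketch of the concept, not something we have shipped; the class, its method names, and the choice of the “1” key are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OutboundLeg:
    """Hypothetical state for the leg of a call dialed out to an agent."""
    answered: bool = False
    confirmed: bool = False

    def on_answer(self):
        # The far end picked up; this may be a person or an automated system.
        self.answered = True

    def on_keypress(self, digit):
        # Assumption: the voice prompt asks the recipient to press 1 to accept.
        if self.answered and digit == "1":
            self.confirmed = True

    @property
    def connected(self):
        # Treat the call as connected only after a human has confirmed.
        return self.answered and self.confirmed
```

An automated phone system that merely answers and plays a ringing tone would leave the leg in the answered but unconfirmed state, at the cost of the extra seconds the prompt takes.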
We try to automate as much as we can on any of our systems, but there are times when our customers want the assistance of an empathetic person. We quickly found out that we needed a way to replay any call to see whether it was a technical failure that passed as a legitimate billable call, or whether the consumer on the other end of the phone simply changed their mind before the call was connected to the service provider. These are things that are hard to automate, so instead we built a dashboard that allows our business owners to filter for calls that exhibit the common characteristics of unsuccessful calls and then quickly take action to resolve the issue.
Through this entire process of integrating phone calls with Twilio, we have managed to create a platform that is efficient for all parties.