Data Set Obligations for Voice Over IP

One of the more confusing services to consider is Voice over IP. This is generally a “telephony” service through which you give your customers access to the PSTN. When we first examined the data set, we were tempted to treat VoIP calls the “same” as traditional telephony calls. When we dug a little further into what the legislation requires, it became clear that this would not be sufficient.

Let’s quickly review what the data set requires you to retain.

  1. The subscriber of the service
  2. The source of a communication
  3. The destination of a communication
  4. The date, time and duration of the communication
  5. The type of a communication
  6. The location of the devices used in connection with a communication

Many smaller providers use VoIP call routing platforms such as Asterisk, FreeSwitch, OpenSIPS, Kamailio or 3CX. Each of these systems has a mechanism for recording call detail records (CDRs) that include:

  • The VoIP “endpoint” that placed a call
  • The Caller ID presented by the “originating party” for a call
  • The Number dialled
  • The time and date of the call
  • The duration of the call
  • The VoIP “endpoint” that received the call

This is a great starting point for the data to be retained, but it falls short in the following key areas:

  • On most of the platforms mentioned above, the CDRs do not record the IP address of the VoIP device that originated or received the communication. This matters because a subscriber may have registered from one of several IP addresses, and the IP address is an important part of identifying the subscriber’s location. It also matters in potential SIP fraud scenarios, where a customer’s credentials have been compromised and used from another SIP system to place fraudulent calls.
  • The data set requires you to record both successful and unsuccessful communication attempts, which your system may not handle correctly.
  • The data set requires you to record (for the destination of a communication, which is a call in this scenario) whether a call “has been forwarded, routed or transferred, or attempted to be forwarded, routed or transferred”.
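As a minimal sketch of the gap described above, the record below combines the usual CDR fields with the extra items the data set asks for. Every field name here is hypothetical and chosen for illustration; none comes from Asterisk, FreeSwitch or any other platform's CDR schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class VoipRetentionRecord:
    """One retained call attempt. All field names are illustrative only."""
    # Fields a typical CDR already carries
    subscriber_id: str
    source_endpoint: str
    caller_id: str
    dialled_number: str
    destination_endpoint: Optional[str]  # None if no endpoint was reached
    start_time: datetime
    duration_seconds: int
    # Fields a typical CDR is missing
    source_ip: Optional[str] = None
    destination_ip: Optional[str] = None
    answered: bool = False                                  # covers unsuccessful attempts
    redirected_to: List[str] = field(default_factory=list)  # forwards and transfers

    def gaps(self) -> List[str]:
        """Return the names of the IP fields that still need to be captured."""
        missing = []
        if self.source_ip is None:
            missing.append("source_ip")
        if self.destination_ip is None:
            missing.append("destination_ip")
        return missing


# A record built from CDR data alone has no IP information yet.
cdr_only = VoipRetentionRecord(
    subscriber_id="acct-1001",
    source_endpoint="sip-user-1001",
    caller_id="0299990001",
    dialled_number="0299990002",
    destination_endpoint="upstream-trunk",
    start_time=datetime(2015, 10, 13, 9, 30, tzinfo=timezone.utc),
    duration_seconds=42,
)
print(cdr_only.gaps())
```

Running `gaps()` against records imported from your existing CDRs is a quick way to see how much of the data set a CDR-only approach leaves uncovered.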

In most cases, VoIP providers are using a protocol called SIP to manage their calls. Let’s consider an example PrePaid VoIP system SIP dialog that looks like this:

Example PrePaid SIP dialog (courtesy of

In this case our customer initiates a SIP call to us, and we invite it to a third party, a SIP provider. This SIP Provider uses a load balancer that functions through the use of SIP re-invites. They inspect our SIP request and then “re-invite” us to another SIP endpoint to terminate the call. The data set requires you to record the “origination IP” of the call and the final “destination IP” of the call.

While this scenario is relatively benign, imagine a different one in which an external party originates a call to your customer, who then “re-invites” the call to another SIP URI. Some customer PBX systems insist on handling transfers this way. Here the call ultimately terminates at the final address you send it to, so you need to record both legs of the transaction.
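To make the idea of recording each leg concrete, here is a small sketch that stores one entry per leg and derives the origination IP and final destination IP from the ordered list. The `CallLeg` type, the function name and the addresses (all taken from the reserved documentation ranges) are assumptions for illustration, not part of any PBX platform.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class CallLeg:
    """One leg of a call, in the order the legs were set up (hypothetical type)."""
    source_ip: str
    destination_ip: str
    request_uri: str


def origin_and_final_destination(legs: List[CallLeg]) -> Tuple[str, str]:
    """Return (origination IP, final destination IP) for an ordered list of legs."""
    if not legs:
        raise ValueError("a call needs at least one leg")
    return legs[0].source_ip, legs[-1].destination_ip


# External party calls our customer, whose PBX re-invites the call elsewhere.
legs = [
    CallLeg("203.0.113.5", "198.51.100.1", "sip:0299990001@voip.example"),
    CallLeg("198.51.100.1", "192.0.2.10", "sip:1001@customer-pbx.example"),
    CallLeg("198.51.100.1", "192.0.2.99", "sip:2002@other.example"),
]
print(origin_and_final_destination(legs))
```

Keeping every leg, rather than only the first and last, means the forwarded or transferred hop is still on record if a request asks about it.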

“Leg-based accounting” is not impossible in most systems, but it creates some major headaches for billing. In some cases you may be able to extract the destination IPs natively from your PBX platform. For instance, in Asterisk the source and final destination SIP endpoints can be extracted using the standard SIP client and stored in the CDR, but the IPs for any legs along the way cannot, at least not without some very complicated Manager behaviour.

For SIP-based systems, I’d generally suggest another approach. Software such as Homer provides a Capture Agent (captagent) client that allows you to intercept SIP traffic using standard PCAP software and redirect it to another system for logging and analysis. In the case of Homer, it’s relatively trivial to modify the code to pay attention only to INVITE dialogues and record these in a database, which could be the basis of your SIP implementation. You can even do this “off-host”, so that it doesn’t interrupt your normal operations, by mirroring the traffic facing your SIP load balancers or proxies to one or more hosts that intercept the SIP dialogues.
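The sketch below shows the kind of filtering described above, in plain Python rather than Homer or captagent code: given the text of one captured SIP message and the IP it came from, it keeps only INVITE requests and pulls out the fields worth storing. The function name, the output keys and the sample message are assumptions for illustration. It does not handle compact header names (such as `i` for Call-ID) or folded header lines, which a production parser would need to.

```python
from typing import Dict, Optional

SAMPLE_INVITE = (
    "INVITE sip:0299990002@provider.example SIP/2.0\r\n"
    "Via: SIP/2.0/UDP 192.0.2.10:5060;branch=z9hG4bK776asdhds\r\n"
    "From: <sip:1001@customer.example>;tag=1928301774\r\n"
    "To: <sip:0299990002@provider.example>\r\n"
    "Call-ID: a84b4c76e66710@192.0.2.10\r\n"
    "CSeq: 314159 INVITE\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)


def parse_invite(raw: str, source_ip: str) -> Optional[Dict[str, str]]:
    """Return the fields to retain for an INVITE request, or None for anything else."""
    head = raw.split("\r\n\r\n", 1)[0]      # drop the message body (SDP etc.)
    lines = head.split("\r\n")
    start = lines[0].split(" ")
    if len(start) != 3 or start[0] != "INVITE":
        return None                          # responses and other methods are ignored
    headers: Dict[str, str] = {}
    for line in lines[1:]:
        if ":" not in line:
            continue
        name, value = line.split(":", 1)
        headers[name.strip().lower()] = value.strip()
    return {
        "request_uri": start[1],
        "call_id": headers.get("call-id", ""),
        "from": headers.get("from", ""),
        "to": headers.get("to", ""),
        "source_ip": source_ip,              # taken from the captured packet, not the SIP text
    }


print(parse_invite(SAMPLE_INVITE, "192.0.2.10"))
```

Taking the source IP from the packet rather than from the `Via` or `Contact` headers is a deliberate choice here, since the headers are supplied by the sender and may not reflect where the traffic actually came from.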

This has the added advantage that you get a full copy of the SIP dialogue, whether the call is successful or not. Where your system acts as a B2BUA (the most common scenario), you can also use this data to identify faults and troubleshoot systems on your network.
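Because the capture includes the responses as well as the requests, the outcome of each attempt can be derived from the SIP status codes seen for a call. The sketch below relies on the standard SIP status classes (1xx provisional, 2xx success, 3xx redirection, 4xx to 6xx failure). The function names and outcome labels are assumptions for illustration, and it takes the simple view that the last final response decides the outcome.

```python
from typing import Iterable


def classify_status(code: int) -> str:
    """Map a SIP status code to a coarse outcome label."""
    if 100 <= code < 200:
        return "provisional"
    if 200 <= code < 300:
        return "answered"
    if 300 <= code < 400:
        return "redirected"
    if 400 <= code < 700:
        return "failed"
    raise ValueError(f"not a SIP status code: {code}")


def call_outcome(status_codes: Iterable[int]) -> str:
    """Return the outcome of a call from the responses seen, in order."""
    outcome = "no_final_response"
    for code in status_codes:
        label = classify_status(code)
        if label != "provisional":
            outcome = label   # the most recent final response wins
    return outcome


print(call_outcome([100, 180, 200]))  # answered call
print(call_outcome([100, 180, 486]))  # busy, so an unsuccessful attempt
print(call_outcome([100, 302]))       # redirected elsewhere
```

Storing this label alongside each record covers both the unsuccessful attempts and the forwarded or redirected calls that a CDR may never see.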

From the perspective of a Data Retention implementation, this is exactly the sort of thing to address in your Data Retention Implementation Plan (DRIP). If you need to make system changes to your environment to comply with your data retention obligations, you have 18 months in which to complete them, provided you have lodged a DRIP indicating your intention to comply.

What if I resell a SIP service provided by another provider?
One of the questions I’m frequently asked by my customers (who use our Hosted PBX or SIP Trunking Service) is what data they have to retain in relation to the services they sell, as they traditionally only have access to the CDRs.

The FAQ released by the Attorney-General’s Department (not available online) makes it clear that you need to provide information as far as it’s reasonable for you to have visibility.

In a situation where you rely on an upstream provider for SIP termination and you ultimately bill the call, it’s reasonable to expect that you would have access to basic CDR information as well as the subscriber details. It would also be reasonable to expect that you can identify the carrier/provider that terminated the call. (This is relevant because you may “change” the carrier you use to provide or terminate calls from time to time.) If you received a request for metadata relating to a particular call, you could provide the basic information from the data you hold, including the upstream carrier that terminated the call. The upstream carrier could then be sent a request for more specific information.

The same principle applies when you are interacting with an upstream VoIP provider. The Data Retention rules do not mean you need to “know” where your upstream VoIP provider sends their calls. You just need to know that you sent the call to them. They can then send it wherever they need to, and they must retain the data about that.
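Since the carrier you use can change over time, a reseller needs some way to answer “which upstream handled this call?” for a call made on a given date. A minimal sketch, assuming you keep a dated history of carrier changes: the carrier names, dates and function name below are all hypothetical.

```python
from bisect import bisect_right
from datetime import date
from typing import List, Optional, Tuple

# (date the carrier came into use, carrier name), sorted by date. Hypothetical data.
CARRIER_HISTORY: List[Tuple[date, str]] = [
    (date(2015, 1, 1), "carrier-a"),
    (date(2015, 9, 1), "carrier-b"),
]


def carrier_on(day: date, history: List[Tuple[date, str]]) -> Optional[str]:
    """Return the upstream carrier in use on the given day, or None if before the first entry."""
    start_dates = [start for start, _ in history]
    index = bisect_right(start_dates, day) - 1   # last entry starting on or before `day`
    if index < 0:
        return None
    return history[index][1]


print(carrier_on(date(2015, 10, 13), CARRIER_HISTORY))
```

If you route different destinations to different carriers at the same time, recording the carrier on each CDR as the call is placed is the more reliable option; a dated history like this only works when one carrier handles everything at any given moment.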

I like the idea of implementing my retention this way, but I don’t know how to do it/it’s too much work?
Pardon the commercial plug, but Real World can help you implement a scalable capture solution using open source technologies like Homer and Captagent (or even simple pcap filters) to capture this information. We’ve been through this process and have learnt the hard way that, even with a relatively modest number of subscribers, the volume of data created by the capture process can be very large. Parsing and processing the data is essential to keep the volume of data you retain low.



Andrew is the CEO of the Real World Group, a family of IT and telecommunications companies based in Sydney, Australia. Andrew loves Jesus, his wife Bess and his four awesome kids.
