Data Retention Advice

Where has the govt’s $128m data retention funding gone? – Telco/ISP – iTnews

Promised dollars are MIA.

Source: Where has the govt’s $128m data retention funding gone? – Telco/ISP – iTnews

ISP Column – August 2015

Excellent opinion piece by Geoff Huston, particularly in relation to CGN (carrier-grade NAT). Source: ISP Column – August 2015

Data Retention – Overview from the Workshop

And the audio from the workshop:

Your obligations stop with the services you sell

Just a quick reminder to those working to complete their DRIPs (Data Retention Implementation Plans) this week: your obligations stop with the service you sell. If you sell someone a VPS or dedicated server, consider the server itself to be the service. The application your client runs on the server is up to them. If your client has DR obligations, they may have to retain data about their own service, but you don’t need to know anything beyond your obligations for the server.

For a server, those obligations likely cover who the customer is, when the service was connected, any IP addressing you provide to the server and perhaps a port log recording when the network port(s) the server is connected to went down and came back up. If you provide a management interface for accessing a server’s console, DRAC or iLO (I don’t know anyone that does), it might also be important to record access that occurs through these interfaces.
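As an illustration only, the record below sketches what those server obligations might look like as data. It is a minimal Python sketch: the class and field names (`ServerServiceRecord`, `PortEvent`, `ConsoleAccess`) are hypothetical, and the customer ID, switch port and address in the example are made up (the address is from a documentation range).

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class PortEvent:
    """A link state change on a switch port the server is connected to."""
    timestamp: datetime
    port: str
    state: str  # "up" or "down"


@dataclass
class ConsoleAccess:
    """An access through an out-of-band management interface (e.g. DRAC or iLO)."""
    timestamp: datetime
    interface: str
    source_ip: str
    username: str


@dataclass
class ServerServiceRecord:
    """Hypothetical retention record for one VPS or dedicated server service."""
    customer_id: str
    connected_at: datetime
    ip_addresses: List[str]
    disconnected_at: Optional[datetime] = None
    port_events: List[PortEvent] = field(default_factory=list)
    console_access: List[ConsoleAccess] = field(default_factory=list)

    def record_port_event(self, when: datetime, port: str, state: str) -> None:
        """Append a port up/down event, rejecting unexpected states."""
        if state not in ("up", "down"):
            raise ValueError(f"unexpected port state: {state!r}")
        self.port_events.append(PortEvent(when, port, state))

    def was_active(self, when: datetime) -> bool:
        """True if the service was connected at the given time."""
        if when < self.connected_at:
            return False
        return self.disconnected_at is None or when <= self.disconnected_at


# Example with a documentation-range address and made-up identifiers
record = ServerServiceRecord("cust-001", datetime(2015, 9, 1), ["203.0.113.10"])
record.record_port_event(datetime(2015, 9, 2, 3, 15), "sw1/0/12", "down")
record.record_port_event(datetime(2015, 9, 2, 3, 16), "sw1/0/12", "up")
```

Keeping the port log as a list of events, rather than a single "last bounced" timestamp, means you can answer questions about any point in the retention period.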

I’m a VISP. What data do I have to retain?

It’s quite common for CSPs to “outsource” large parts of their service delivery to a wholesale provider, who delivers all the functional aspects of the services they sell, such as the internet transit, access tail and IP addressing. An internet provider that operates this way is called a VISP (virtual ISP).

Typically, a VISP will have some basic information about the customer:

The billing and subscriber information of a service
The physical location that a fixed broadband service is sold to

and then receive billing summaries from their VISP provider that indicate the volume of data an end user transferred in a given period.

A VISP typically does not have visibility of individual end-user sessions (in the case of an xDSL or NBN service) or location data (in the case of an MVNO). So what does a VISP need to do about its DR obligations for these services?

Question 1 – are these services relevant?
The first question to ask is whether these services are relevant. In the scenarios I’m discussing here the answer is definitely yes: these are internet access services delivered by someone who is a CSP or ISP as far as the Telecommunications Act is concerned, because they are billing a third party for internet access.

Question 2 – does the provider own or operate infrastructure in Australia?
If the provider owns “infrastructure” in Australia, then they need to consider their obligations. The definition of infrastructure is quite broad and includes things like billing systems and servers for applications. Even if you don’t own the infrastructure, you are covered if you could be seen to be the one that “operates” it.

I suggest that most VISP operators will satisfy both of these requirements for a significant number of services.

What data do I have to retain?
The starting point for this question is typically to review the data matrix to determine what data you would need to retain for each service. This includes:

The subscriber of the service
The source of a communication (i.e. the account, source telephone number, IP addressing details, etc)
The destination of a communication (i.e. destination phone number, terminating LAC with port identification details or any other identifier that you might have, but not “internet destinations”)
The date, time and duration of the communication
The location of the equipment or line used for the service

It’s clear that while VISPs hold parts of this data set, there are probably significant pieces of information in it that a VISP cannot see.

The Attorney-General’s Department has released an FAQ for Industry that states the following:

2.5. How does data retention differ between wholesale service providers, retail service providers and resellers? (NEW)

A service provider is only obliged to retain data from the data set that it uses to provide its relevant service. The concept of having “visibility” can be useful in understanding what data a service provider must retain to meet its data retention obligations in relation to a particular relevant service.

For example, data relating to an over-the-top email service is only retained by the relevant over-the-top provider.

Contractual agreements can be used to define the boundaries between a wholesaler’s and retailer/reseller’s relevant services or for one provider to cause another provider to retain data on its behalf. The data retention obligations will remain with the provider who operates the relevant service.

One of the key aspects here is how the AGD has understood the term service:

1.4. What is a “service” for the purposes of data retention? (NEW)

The Australian telecommunications industry uses the term “service” in a number of ways. Some industry participants use the term “service” to refer to a commercial product that can be sold to a customer, such as “a mobile phone service”. The term is also used where many providers work together to deliver the final commercial product.

In the context of data retention, a provider’s “service” is the particular element of a commercial product that the provider operates.

For example, a voicemail product offered to a customer may comprise a telephony service that connects users to the voicemail server, an SMS service that would alert users to the existence of a voicemail message on the server, and the voicemail server itself. These services could be operated by different providers or the same provider depending on business models.

Providers need to take into account the commercial and technological context of a “service”.

While a wholesaler, retailer and reseller may all provide an internet access service, each of the elements is a different “service” for the purpose of data retention.

Understanding “service” in this way helps ensure that providers only need to keep data that relates to the service they provide (being data they have “visibility” of).

Based on this definition, the FAQ encourages service providers to define their services in a way that reflects the information they actually have visibility of.

The case studies in Annexure B (page 41) provide some really good examples of how this works in practice. They take a traditional “3 tier” service aggregation model and explain what the service (and therefore the scope) is for each provider in the tier. Your own model could, of course, be slightly different.

Is that all there is to it?
One of the main concerns is whether the AGD’s interpretation is correct. A number of experts have stated that the interpretation taken in the FAQ may be flawed.

Specifically, the definition of a service suggested by the AGD, while convenient, does not line up with any text in the law. A provider who relies on this advice to not retain information it doesn’t have may find itself in breach, and potentially in trouble when a law enforcement agency requests data that it cannot supply.

In fact, in other cases the AGD has clearly indicated that even if you don’t have access to the information or don’t normally require it, you are required to create it for the purposes of data retention, even if this means modifying your systems. If you ever found yourself in court defending a failure to collect data, you could find that a judge who doesn’t understand the nuances of the technology agrees with the prosecution and issues a hefty fine.

Submitting a DRIP provides some protection for a provider until the DRIP compliance period expires (at the time of writing, 13 April 2017). After that date, the provider is expected to be fully compliant with the legislation.

What should VISPs do?
At this point, I would recommend that VISPs (and all service providers) that find gaps in their data set submit a DRIP. The DRIP outlines how you, as a service provider, intend to comply with the legislation. In some cases you may be able to negotiate with your upstream providers to supply more detail, for example individual RADIUS sessions, location data, or the IMEI of the connected mobile device for each session.

You’ll need to self-assess what data you have access to, and what data you might be able to get.

Importantly, however, I would also suggest that you specifically apply for an exemption for any data in the data set that you are unlikely to be able to obtain. Once you have been granted an exemption for that portion of the data set, you are in the clear.

I can’t say for certain that the AGD would grant an exemption like this, but a request for exemption from collecting the matrix data that is outside the provider’s visibility, which specifically references the AGD’s own definitions in Sections 1.4 and 2.5 of the “Frequently Asked Questions for Industry Version 1.1”, would be very hard to refuse.

Do I have Data Retention Obligations?

One of the questions I see most often is simply “Do I have data retention obligations?” David Ohri provided an excellent overview in his presentation.

If an entity wants to know whether the data retention scheme imposes data retention obligations on it, it should ask itself the following three questions. If the answer to all three questions is yes, the entity will be required to retain data under the data retention scheme. If the answer to any one of the questions is no, it will not have any obligations to retain data under the data retention scheme:

Are you a ‘carrier’, ‘carriage service provider’ or ‘internet service provider’?

Do you operate at least one ‘relevant service’?

Do you own or operate ‘infrastructure’ in Australia that enables the provision of at least one of your relevant services?

A service provider that satisfies all of these conditions will be referred to as a Relevant Service Provider.
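The test itself is a simple conjunction, which can be written down as below. This is a sketch of the logic only: the names are hypothetical, and answering each question honestly is the hard part (and may need legal advice).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderAssessment:
    """Self-assessed answers to the three questions (hypothetical structure)."""
    is_carrier_csp_or_isp: bool
    operates_relevant_service: bool
    owns_or_operates_infrastructure_in_australia: bool


def is_relevant_service_provider(answers: ProviderAssessment) -> bool:
    """Obligations arise only if all three answers are yes; any single no rules them out."""
    return (answers.is_carrier_csp_or_isp
            and answers.operates_relevant_service
            and answers.owns_or_operates_infrastructure_in_australia)


# For example, a VISP that bills for internet access and runs its billing system in Australia
visp = ProviderAssessment(True, True, True)
```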

Are you a ‘carrier’, ‘carriage service provider’ or ‘internet service provider’?
A lot of potential providers approach this question in the wrong order. They start from the perspective that they “run a relevant service” and then try to work out from there whether they are a carriage service provider. However, the legislation isn’t drafted that way.

It’s worth spending a bit of time considering each of these points individually. If you are a “carrier”, you probably already know: you will have applied for, and been granted, a carrier licence by the ACMA.

The next two, “carriage service provider” and “internet service provider”, can be a little more complicated. If you are an “internet service provider”, i.e. you provide internet access to someone by re-billing another party’s service, by terminating layer 2 internet tails and adding internet on top, or in some other form, you are going to be covered.

If you are a member of the TIO (Telecommunications Industry Ombudsman), you are a carriage service provider and so are covered.

If you purchase a server in a data centre with internet from someone else and then deliver some service on top of it, you may not, in fact, be covered. In these situations you probably need to seek legal advice to determine your status. I’ve asked industry groups such as Internet Australia to consider whether they can assist small organisations that are struggling to determine whether they have obligations, for example with a more complete matrix of obligations.

I certainly hold the view that hotels, serviced offices, multi-tenanted buildings and other locations you might think would “not” be covered are, in fact, probably technically carriage service providers or internet service providers and so have obligations under this legislation.

Do you operate at least one ‘relevant service’?
If you operate one relevant service (and let’s face it, you probably do if you are reading this blog), you are covered.

Do you own or operate ‘infrastructure’ in Australia that enables the provision of at least one of your relevant services?
For the purposes of this question, infrastructure is any server or equipment that facilitates the communications, including billing systems and servers. This may be a firewall on a customer site or a mail filtering appliance. We imagine it would be very rare for someone to answer “yes” to questions 1 and 2 and then “no” to this question, but in some cases it may be possible.

I’d love to discuss (in the comments section) more of the intricacies of whether particular scenarios, such as hosting providers or VoIP-only providers (that don’t deliver internet access), have obligations in this area. I’m sure this isn’t the “intent” of the legislation, but it probably reflects the letter of it.

Data Set Obligations for Voice Over IP

One of the more confusing services to consider is Voice over IP. This is generally a “telephony” service through which your customers reach the PSTN. When we first examined the data set, we were tempted to treat VoIP calls the “same” as traditional telephony calls. When we dug a little further into what the legislation requires, it became clear that this would not be sufficient.

Let’s quickly review what the data set requires you to retain.

The subscriber of the service
The source of a communication
The destination of a communication
The date, time and duration of the communication
The type of a communication
The location of the devices used in connection with a communication

Many smaller providers use VoIP call routing platforms such as Asterisk, FreeSWITCH, OpenSIPS, Kamailio or 3CX. Each of these systems has a mechanism for recording call detail records (CDRs) that include:

The VoIP “endpoint” that placed a call
The Caller ID presented by the “originating party” for a call
The Number dialled
The time and date of the call
The duration of the call
The VoIP “endpoint” that received the call

This is a great starting point for data to be retained, but it falls short in the following key areas:

In the case of most of the platforms mentioned above, the CDRs do not record the IP address of the VoIP device that originated or received the communication. This matters because a subscriber may have registered from one of several IP addresses, and the IP address is an important part of identifying the subscriber’s location. It’s also important in potential SIP fraud scenarios, where a customer’s credentials have been compromised and used from another SIP system to place fraudulent calls.
The data set requires you to record both successful and unsuccessful communication attempts, which your system may not handle correctly.
The data set requires you to record (for the destination of a communication – a call in this scenario) whether a call “has been forwarded, routed or transferred, or attempted to be forwarded, routed or transferred.”
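One way to make these gaps visible is to model the record you actually need and check which fields your existing CDRs leave empty. The sketch below assumes the first seven fields map onto what your PBX already exports; the extra fields (`origin_ip`, `destination_ip`, `answered`, `forwarded_or_transferred`) are illustrative names, not a schema taken from the legislation or from any of the platforms above. The example values are made up.

```python
from dataclasses import dataclass, fields
from datetime import datetime
from typing import List, Optional


@dataclass
class RetainedCallRecord:
    # Fields a typical PBX CDR already provides
    subscriber: str
    originating_endpoint: str
    caller_id: str
    dialled_number: str
    started_at: datetime
    duration_seconds: int
    receiving_endpoint: str
    # Fields the data set calls for that basic CDRs often lack (illustrative names)
    origin_ip: Optional[str] = None
    destination_ip: Optional[str] = None
    answered: Optional[bool] = None                  # successful or unsuccessful attempt
    forwarded_or_transferred: Optional[bool] = None  # includes attempted forwards/transfers


def missing_fields(record: RetainedCallRecord) -> List[str]:
    """Names of fields still empty, i.e. gaps to close in your DRIP."""
    return [f.name for f in fields(record) if getattr(record, f.name) is None]


# A record populated only from a basic CDR export (made-up values)
cdr = RetainedCallRecord("cust-042", "ext-1001", "0299990000", "0398880000",
                         datetime(2015, 9, 1, 9, 30), 125, "trunk-upstream")
```

Running `missing_fields(cdr)` over a sample of real records is a cheap way to see which gaps are systematic rather than occasional.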

In most cases, VoIP providers use a protocol called SIP to manage their calls. Let’s consider an example SIP dialog for a prepaid VoIP system that looks like this:

Example PrePaid SIP dialog (courtesy of http://www.opensips.org/Documentation/Tutorials-B2BUA)

In this case our customer initiates a SIP call to us, and we send an INVITE for it on to a third party, a SIP provider. This SIP provider uses a load balancer that works through SIP re-INVITEs: it inspects our SIP request and then “re-invites” us to another SIP endpoint to terminate the call. The data set requires you to record the “origination IP” of the call and the final “destination IP” of the call.

While this scenario is relatively benign, imagine a different one where an external party originates a call to your customer, who then “re-invites” the call to another SIP URI. Some customer PBX systems insist on handling transfers this way. In these cases, the final location where the call terminates is determined by the final address you send it to, so you need to record both legs of the transaction.

“Leg-based accounting” is not impossible in most systems, but it creates some major headaches for billing. In some cases it may be possible to extract the destination IPs natively from your PBX platform. For instance, in Asterisk the source and final destination SIP endpoints can be extracted using the standard SIP channel driver and stored in the CDR, but not the IPs of any legs along the way without some very complicated Asterisk Manager Interface (AMI) scripting.
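If you do go down the leg-based route, the core idea is to keep every leg and derive the end-to-end view from them, rather than keeping only a summary. This is a sketch under a big assumption: that you have some key (`correlation_id`, a hypothetical name) linking the legs of one call. Within a single SIP dialog the Call-ID is shared, but a B2BUA typically generates a new Call-ID for its outbound leg, so a real system would need its own mapping.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Tuple


@dataclass
class Leg:
    correlation_id: str  # hypothetical key linking the legs of one call
    started_at: datetime
    source_ip: str
    destination_ip: str


def end_to_end(legs: List[Leg]) -> Dict[str, Tuple[str, str, int]]:
    """Per call: (first origin IP, final destination IP, number of legs)."""
    by_call: Dict[str, List[Leg]] = {}
    for leg in legs:
        by_call.setdefault(leg.correlation_id, []).append(leg)
    summary: Dict[str, Tuple[str, str, int]] = {}
    for call_id, call_legs in by_call.items():
        # Order legs by start time so the first and last legs bound the call
        ordered = sorted(call_legs, key=lambda leg: leg.started_at)
        summary[call_id] = (ordered[0].source_ip, ordered[-1].destination_ip, len(ordered))
    return summary
```

The individual legs are still what you retain; the summary is just a convenient view when answering a request.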

For SIP-based systems, I’d generally suggest a different approach. Software such as Homer provides a capture agent (captagent) that lets you intercept SIP traffic using standard PCAP tooling and send it to another system for logging and analysis. In the case of Homer, it’s relatively trivial to modify the code to pay attention only to INVITE dialogues and record these in a database, which could form the basis of your SIP data retention implementation. You can even do this “off-host”, so that it doesn’t interrupt normal operations, by mirroring the traffic to and from your SIP load balancers or proxies to one or more hosts that intercept the SIP dialogues.

This has the added advantage that you get a full copy of the SIP dialogue, whether or not the call is successful. Where your system acts as a B2BUA (the most common scenario), you can also use this data to identify faults and troubleshoot systems on your network.

From a data retention implementation perspective, this is exactly the kind of change to describe in your DRIP. If you need to make system changes to comply with your data retention obligations, you have 18 months in which to complete them, provided you have lodged a DRIP indicating your intention to comply.
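To give a feel for the capture approach, here is a minimal Python sketch of the “only pay attention to INVITEs” filter, storing a few fields in SQLite. It is not Homer’s code or schema. It assumes your capture layer (captagent, a pcap tool, or a mirror port) already hands you each SIP message as text along with its source and destination IPs; it does not reassemble TCP streams or handle TLS. The function and table names are hypothetical.

```python
import sqlite3
from datetime import datetime
from typing import Dict, Optional

# RFC 3261 compact header forms for the headers we keep
COMPACT = {"i": "call-id", "f": "from", "t": "to"}


def parse_invite(payload: str) -> Optional[Dict[str, Optional[str]]]:
    """Return key fields of a SIP INVITE request, or None for any other message."""
    lines = payload.replace("\r\n", "\n").split("\n")
    request_line = lines[0].split()
    if len(request_line) < 3 or request_line[0] != "INVITE":
        return None  # responses and other methods are ignored
    headers: Dict[str, str] = {}
    for line in lines[1:]:
        if not line.strip():
            break  # a blank line ends the headers; any SDP body follows
        name, sep, value = line.partition(":")
        if sep:
            key = name.strip().lower()
            headers.setdefault(COMPACT.get(key, key), value.strip())
    return {
        "request_uri": request_line[1],
        "call_id": headers.get("call-id"),
        "from_hdr": headers.get("from"),
        "to_hdr": headers.get("to"),
    }


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS invites ("
        "seen_at TEXT, src_ip TEXT, dst_ip TEXT, request_uri TEXT,"
        " call_id TEXT, from_hdr TEXT, to_hdr TEXT)"
    )
    return db


def store_invite(db: sqlite3.Connection, payload: str,
                 src_ip: str, dst_ip: str, seen_at: datetime) -> bool:
    """Insert one row per INVITE (re-INVITEs included); return False if skipped."""
    invite = parse_invite(payload)
    if invite is None:
        return False
    db.execute(
        "INSERT INTO invites VALUES (?, ?, ?, ?, ?, ?, ?)",
        (seen_at.isoformat(), src_ip, dst_ip, invite["request_uri"],
         invite["call_id"], invite["from_hdr"], invite["to_hdr"]),
    )
    db.commit()
    return True
```

Because re-INVITEs are also INVITE requests, they are recorded as separate rows, which is what the re-invite scenarios need. Unsuccessful attempts are captured too, since the INVITE is stored before any response arrives.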

What if I resell a SIP service provided by another provider?
One of the questions my customers (who use our Hosted PBX or SIP Trunking service) frequently ask is what data they have to retain for the services they sell, as they have traditionally only had access to the CDRs.

The FAQ released by the Attorney-General’s Department (not available online) makes it clear that you need to provide information as far as it’s reasonable for you to have visibility of it.

In a situation where you rely on an upstream provider for SIP termination and you ultimately bill the call, it’s reasonable to expect that you would have access to basic CDR information as well as the subscriber details. It would also be reasonable to expect that you can identify the carrier or provider that terminated the call. (This is relevant because you may change the carrier you use to originate or terminate calls from time to time.) If you received a request for metadata relating to a particular call, you could supply the basic information from the data you hold, including the upstream carrier that terminated the call. That upstream carrier could then be sent a request for more specific information.

The same principle applies when you are interacting with an upstream VoIP provider. The data retention rules do not mean you need to “know” where your upstream VoIP provider sends its calls; you just need to know that you sent them to it. The upstream provider can then send them wherever it needs to, and it must retain the data about that.
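The simplest approach is to stamp the terminating carrier onto each CDR at routing time. If you only keep a dated log of carrier changes, a lookup like the one below can answer “who terminated this call?” after the fact. It is a sketch that assumes a single upstream carrier at any given time (it does not model least-cost routing across several carriers), and the class and carrier names are hypothetical.

```python
import bisect
from datetime import datetime
from typing import List, Tuple


class CarrierHistory:
    """Dated log of the upstream carrier used for termination (hypothetical).

    Assumes exactly one upstream carrier is in use at any given time.
    """

    def __init__(self) -> None:
        self._changes: List[Tuple[datetime, str]] = []  # kept sorted by date

    def record_change(self, effective_from: datetime, carrier: str) -> None:
        bisect.insort(self._changes, (effective_from, carrier))

    def carrier_at(self, when: datetime) -> str:
        """Return the carrier in effect at `when` (a change applies from its start time)."""
        starts = [start for start, _ in self._changes]
        idx = bisect.bisect_right(starts, when) - 1
        if idx < 0:
            raise LookupError(f"no carrier recorded on or before {when.isoformat()}")
        return self._changes[idx][1]


history = CarrierHistory()
history.record_change(datetime(2015, 8, 1), "Carrier B")
history.record_change(datetime(2015, 1, 1), "Carrier A")
```

With this log, a call at 11:59 pm on 31 July 2015 resolves to "Carrier A" and one at midnight on 1 August resolves to "Carrier B", which is the upstream you would then send the more detailed request to.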

I like the idea of implementing my retention this way, but I don’t know how to do it/it’s too much work?
Pardon the commercial plug, but Real World can help you implement a scalable capture solution using open source technologies like Homer and captagent (or even simple pcap filters) to capture this information. We’ve been through this process and have learnt the hard way that, even with a relatively modest number of subscribers, the volume of data the capture process creates can be very large. Parsing and processing the data is essential to keep the volume of data you retain low.
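A quick back-of-envelope calculation shows why parsing matters. The sketch below multiplies call volume by bytes stored per call over the scheme’s two-year retention period; the example inputs are illustrative assumptions, not measurements from our platform.

```python
# Approximate two-year retention period, ignoring leap days
RETENTION_DAYS = 2 * 365


def retained_gigabytes(calls_per_day: float, bytes_per_call: float,
                       days: int = RETENTION_DAYS) -> float:
    """Storage needed for `days` of records, in decimal gigabytes (10**9 bytes)."""
    return calls_per_day * bytes_per_call * days / 1e9


# Illustrative inputs only: raw captured dialogues vs parsed INVITE records
raw_gb = retained_gigabytes(calls_per_day=10_000, bytes_per_call=20_000)
parsed_gb = retained_gigabytes(calls_per_day=10_000, bytes_per_call=500)
```

With a hypothetical 10,000 calls a day, keeping roughly 20 KB of raw capture per call works out to about 146 GB over two years, while a parsed record of around 500 bytes per call comes to about 3.65 GB. Plug in your own call volumes and record sizes before drawing conclusions.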

Data Retention – What is it?

At the recent Data Retention workshop hosted by Real World and Dell Australia, David Ohri from Thomson Geer provided an excellent overview of the Data Retention legislation.

We’ll be publishing a video summary of his talk in the next couple of days, but in the meantime his slides are available for download here.

Understanding Data Retention obligations in a WAN context

At yesterday’s Data Retention Workshop we discussed a range of common service scenarios that we are dealing with and that, we found, are not properly explained by the material the AGD has published. I felt it was important to capture the results of the workshop and share them with the service provider community.

The goal of these blog posts is to provide some high level indications of how I believe Service Providers should interpret their obligations in these contexts.

What is my approach to determining DR obligations?

In general, we take the following steps to determine our obligations from a “what do we sell?” perspective.

Work out what it is you sell to your customers. Draw a diagram, or anything else that will help you understand all the aspects of what you do. Try to make this as detailed as possible.
Work through your diagram to determine which parts of your product are “relevant service(s)”. We won’t go to great lengths to define this term in this post.
Work out if there are any relevant exclusions for your “relevant service(s)” that would mean that they are “not relevant” for the purposes of data retention.

Once you’ve completed these, the process of assessing your obligations and developing a compliance plan (which is part of your DRIP) is pretty straightforward:

Work out what data you need to collect to meet your obligations. Try to do this without regard to what data you currently collect.
Work out what data you currently have available to you. This is probably a subset of the required dataset.
Work out what you need to do to create/generate/record any additional information you need and how long it’s going to take you to get there. Make sure it’s less than 18 months!
Record this information in your DRIP.

We won’t go into this level of detail in this post, but I’ll aim to publish some information about what you might need to consider around individual services over the coming weeks.
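The compliance-plan steps above boil down to a set difference plus a deadline check. As a minimal sketch, assuming you list data set items as simple labels (the labels below are illustrative), and using the 13 April 2017 date given for the end of the DRIP compliance period as the deadline:

```python
from datetime import date
from typing import Set

# End of the DRIP compliance period, as stated at the time of writing
DRIP_DEADLINE = date(2017, 4, 13)


def data_gaps(required: Set[str], available: Set[str]) -> Set[str]:
    """Data set items you must retain but do not currently collect."""
    return required - available


def plan_is_on_time(planned_completion: date, deadline: date = DRIP_DEADLINE) -> bool:
    """True if the planned completion falls on or before the deadline."""
    return planned_completion <= deadline


# Illustrative labels for a VoIP service
required = {"subscriber", "source", "destination", "date_time_duration", "type", "location"}
available = {"subscriber", "source", "destination", "date_time_duration"}
gaps = data_gaps(required, available)
```

Each item in `gaps` becomes a line in your DRIP: what you will do to collect it, and by when.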

The Managed WAN Scenario

One of the most common scenarios we are asked about is a Managed WAN. This is a good starting point to consider some of the intricacies of your Data Retention obligations as it covers a range of services. For the purposes of this exercise, we are going to take the viewpoint that the person assessing their obligations is the “Layer 2 Network Operator”. This means we are going to assume that the “internet” service is provided by them, that they own the MPLS infrastructure, and that they are also the provider of the PSTN connectivity for the phone system.

Step 1. Diagram out the service

We’ve done this already. It’s important to identify the possible places that you might have Data Retention requirements so that you can properly self-assess what you need to do.

Step 2. Determine the Possibly Relevant Services

In this scenario there are a number of possibly relevant services. At this point it might be tempting to start excluding services because you think they might not be relevant, or to look at individual services and ask a lot of “what if” questions. The key questions are “what do we sell?” and “could we potentially have an obligation around it?”

In this diagram, the potentially relevant services are:

Internet Access Service
Comms links to site (via a variety of technologies)
The “MPLS WAN bit”
The routers at each site
The “LAN” and “Voice” networks on each site
The edge “firewalls”
The VPN servers
The SIP connectivity
The PSTN connectivity
The PBX(s)
The Application Servers

For each of these we need to consider whether the potential “service” is in fact going to be subject to data retention obligations.

Step 3. Determine if there are any relevant exclusions for the service

Generally speaking, two exclusions are going to apply: an exclusion for the supply of services in “the same area”, and an exclusion for the supply of services to a person’s “immediate circle”.

Internet Access Service
This service is definitely relevant, and no exclusions would apply.

Comms links to site(s)
These are tricky. In this scenario, the comms links to sites are not “internet access services”; they support the supply of services to a person’s immediate circle. In my view, that means that where you are providing a corporate WAN, there are no data retention obligations for the individual links that connect the WAN into your MPLS network.

The “MPLS WAN bit”
In our opinion this is not a relevant service, even though it is part of the carriage of communications. If you believe that it is, in fact, a service, then the same exclusion we applied to the comms links would apply here.

The routers at each site
I would consider that we can safely rule these out for two reasons:

We have already ruled out the comms links to the individual site(s) as not relevant because they provide service(s) to a person’s immediate circle.
Even if the comms links were in scope, what goes on behind the routers would be irrelevant; in considering the comms links we would already capture any relevant data.

The “LAN” and “Voice” networks on each site
For the same reasons we exclude the routers, we can safely exclude the LAN. Specifically, we are applying both the “same area” exclusion and the “immediate circle” exclusion here.

The edge “firewalls”
These are perhaps trickier. I’ll start by saying that I don’t believe they are relevant in this context, because we are already capturing all the relevant data on this customer’s internet links. Even if you are performing NAT on the firewalls, there is still a single entity behind them, and it’s clear that the DR legislation does not require you to track the individual usage of an employee, student, etc. The point of the legislation is to identify the entity that did something, not the individual employee or student that did it.

The possible difference here is if your firewalls are multi-tenanted. You would need enough information to identify which customer performed a particular action. That means that if customers share a single public IP address, you will need to retain NAT records to pinpoint which customer used that IP address at a given time, though you may need or want to de-identify these to hide the actual “IP endpoint” behind the firewall. If each customer has a separate public IP address, you probably don’t need to do anything.

The long and short of it: in the scenario pictured, the firewalls are excluded (not relevant).
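For the multi-tenanted case, the question the retained data must answer is “which customer was using this public IP and port at this time?” The sketch below assumes the firewall can log port-block allocations per tenant (a common way to keep NAT logs manageable, but whether your firewall supports it is an assumption), and it stores a customer ID rather than the inside address, which fits the de-identification point above. The names and addresses are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from ipaddress import ip_address
from typing import List, Optional


@dataclass(frozen=True)
class PortBlockAllocation:
    customer_id: str              # who, not which inside host (de-identified)
    public_ip: str
    port_start: int
    port_end: int                 # inclusive
    valid_from: datetime
    valid_to: Optional[datetime]  # None while the allocation is still active


def customer_for(allocations: List[PortBlockAllocation], public_ip: str,
                 port: int, when: datetime) -> Optional[str]:
    """Return the customer holding `public_ip:port` at `when`, or None if unknown."""
    target = ip_address(public_ip)
    for a in allocations:
        if ip_address(a.public_ip) != target:
            continue
        if not a.port_start <= port <= a.port_end:
            continue
        if a.valid_from <= when and (a.valid_to is None or when < a.valid_to):
            return a.customer_id
    return None


log = [
    PortBlockAllocation("cust-A", "198.51.100.7", 1024, 2047, datetime(2015, 9, 1), None),
    PortBlockAllocation("cust-B", "198.51.100.7", 2048, 3071, datetime(2015, 9, 1),
                        datetime(2015, 9, 15)),
]
```

A linear scan is fine for illustration; at any real volume you would index the log by public IP and time.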

The VPN Servers
A lot depends on how you implement the VPN scenario. If the VPN is single-tenanted (i.e. one public VPN IP address per customer), it’s probably not relevant. If you multi-tenant your VPN delivery (i.e. one public VPN endpoint that connects customers to different networks based on their credentials or the specific URL they hit), you will need to retain logs sufficient to determine which customer network a user’s connection landed in.

The SIP Connectivity
This is relevant.

The PSTN Connectivity
I’m assuming that there is some sort of “PSTN” or “ISDN” fallback. Regardless, this is relevant.

The PBX(s)
Even if you manage these servers, they are used to provide service to an “immediate” circle and so can be considered excluded. You would not be required to retain information about internal calls between endpoints in the same immediate circle. In a Hosted PBX scenario the answer could be different.

The Application Servers
For the same reasons we exclude the PBXs, we are going to exclude the application servers. If these servers were multi-tenanted then the obligations could be different.

This leaves us with the following services we need to determine our data retention obligations on:

Internet Service
SIP/PSTN Service
VPN Services
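The walk-through above can be recorded as data, which is also a convenient form for your DRIP notes. This is a minimal sketch: the rule that multi-tenancy re-opens an exclusion is my simplification of the remarks about multi-tenanted firewalls, VPNs and application servers, and the VPN servers are marked multi-tenanted here as an assumption so the output matches the list above. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CandidateService:
    name: str
    exclusion: Optional[str] = None  # why it is out of scope, if it is
    multi_tenanted: bool = False     # simplification: multi-tenancy re-opens an exclusion


def needs_assessment(service: CandidateService) -> bool:
    """A service stays in scope unless excluded; multi-tenancy brings it back."""
    return service.exclusion is None or service.multi_tenanted


# The Managed WAN walk-through (reasons paraphrase the discussion above)
MANAGED_WAN: List[CandidateService] = [
    CandidateService("Internet Access Service"),
    CandidateService("Comms links to site", exclusion="immediate circle"),
    CandidateService("MPLS WAN", exclusion="not a relevant service"),
    CandidateService("Site routers", exclusion="immediate circle"),
    CandidateService("LAN and Voice networks", exclusion="same area; immediate circle"),
    CandidateService("Edge firewalls", exclusion="data already captured on internet links"),
    CandidateService("VPN servers", multi_tenanted=True),  # assumed multi-tenanted
    CandidateService("SIP connectivity"),
    CandidateService("PSTN connectivity"),
    CandidateService("PBX(s)", exclusion="immediate circle"),
    CandidateService("Application servers", exclusion="immediate circle"),
]

in_scope = [s.name for s in MANAGED_WAN if needs_assessment(s)]
```

Keeping the exclusion reason next to each service means the reasoning travels with the conclusion, which is useful if the AGD later asks why something was left out.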