
Churn prediction and management system

Published: 2020-09-07

ABSTRACT

A system and method for managing churn among the customers of a business is provided. The system and method provide for an analysis of the causes of customer churn and identify customers who are most likely to churn in the future. Identifying likely churners allows appropriate steps to be taken to prevent customers who are likely to churn from actually churning. The system includes a dedicated data mart, a population architecture, a data manipulation module, a data mining tool, and an end user access module for accessing results and preparing preconfigured reports. The method includes adopting an appropriate definition of churn, analyzing historical customer data to identify significant trends and variables, preparing data for data mining, training a prediction model, verifying the results, deploying the model, defining retention targets, and identifying the most responsive targets.

CLAIMS

1. A system for managing churn among customers of a business having a statistically large customer base, the system comprising:

a data mart;

a population architecture adapted to receiving customer data from one or more data sources, the customer data defining a plurality of customer attributes for each customer in the customer base;

a data manipulation module for preparing one or more analytical records from data stored in the data mart for data mining;

a data mining tool for analyzing the one or more analytical records prepared by the data manipulation module, the data mining tool adapted to return results identifying clusters of customers sharing common customer attributes and calculating individual customers' propensities to churn during a predefined period in the future, the data manipulation module storing the results in the data mart; and

an end user access module for accessing the results returned from the data mining tool and presenting the results to a user.

2. The system for managing churn of claim 1 wherein the data mining tool comprises an SAS Enterprise Miner.

3. The system for managing churn of claim 1 wherein the data mining tool comprises a KXEN data mining tool.

4. The system for managing churn of claim 1 wherein the data manipulation module is adapted to calculate derived variables based on customer data stored in the data mart.

5. The system for managing churn of claim 4 wherein the data manipulation module is adapted to generate an analytical record containing variable data, including derived variable data, associated with a plurality of customers for input to the data mining tool.

6. The system for managing churn of claim 5 wherein the variable data included in the analytical record are selected to provide customer behavioral data to the data mining tool to allow the data mining tool to identify significant clusters of customers based on common behavioral characteristics.

7. The system for managing churn of claim 5 wherein the variable data included in the analytical record are selected to provide customer value data to the data mining tool to allow the data mining tool to identify significant clusters of customers based on common value characteristics.

8. The system for managing churn of claim 5 wherein the variable data included in the analytical record are selected to provide customer data to the data mining tool necessary to allow the data mining tool to calculate individual customers' propensities to churn.

9. The system for managing churn of claim 1 wherein the end user access module is adapted to generate one or more reports analyzing churn based on customer data stored in the data mart.

10. The system for managing churn of claim 9 wherein a report compares active customers to churned customers.

11. The system of claim 9 wherein the end user access module calculates a churn rate from historical data and generates a report that illustrates the churn rate versus classes of customers defined according to customer distribution relative to a selected customer attribute.

12. The system of claim 9 wherein the end user access module calculates a churn rate and generates a report that illustrates the churn rate for a cluster of customers.

13. The system of claim 9 wherein the end user access module calculates a churn rate and generates a report that illustrates the churn rate versus a first behavioral cluster variable and a second value cluster variable.
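The churn-rate reports of claims 9 to 13 (and FIGs. 6 to 8) rest on a simple ratio: churned customers divided by all customers in a class. Below is a minimal Python sketch. It assumes each customer record is a dict with a boolean `churned` flag. The field names, the revenue bands, and the functions `churn_rate_by_class` and `revenue_band` are hypothetical illustrations, not part of the patent:

```python
from collections import defaultdict

def churn_rate_by_class(customers, class_of):
    """Return {class_label: churned / total} for each class produced by class_of.

    customers: iterable of dicts with a boolean "churned" key (hypothetical layout).
    class_of:  function mapping a customer record to any hashable class label.
    """
    totals = defaultdict(int)
    churned = defaultdict(int)
    for c in customers:
        label = class_of(c)
        totals[label] += 1
        if c["churned"]:
            churned[label] += 1
    return {label: churned[label] / totals[label] for label in totals}

def revenue_band(c):
    # Hypothetical banding of monthly revenue into three classes.
    if c["monthly_revenue"] < 20:
        return "low"
    if c["monthly_revenue"] < 60:
        return "mid"
    return "high"

sample = [
    {"monthly_revenue": 10, "churned": True},
    {"monthly_revenue": 15, "churned": False},
    {"monthly_revenue": 40, "churned": False},
    {"monthly_revenue": 45, "churned": False},
    {"monthly_revenue": 80, "churned": True},
    {"monthly_revenue": 90, "churned": True},
]
print(churn_rate_by_class(sample, revenue_band))  # {'low': 0.5, 'mid': 0.0, 'high': 1.0}
```

`class_of` may return any hashable label. A tuple such as `(behavioral_cluster, value_cluster)` therefore gives the two-dimensional breakdown described in claim 13.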

14. A method of designing an efficient customer retention program for managing customer churn among customers of a business having a statistically large customer base, the customer retention program including an analysis of the causes of customer churn and identifying customers who are most likely to churn in the future, so that appropriate steps may be taken to prevent customers who are likely to churn in the future from churning, the method comprising:

adopting a definition of churn sufficient to encompass all customers in the customer base and which relies on objective factors to determine whether individual customers have churned or remain active;

analyzing historical customer data to identify significant trends and variables that provide insight into causes of churn and to identify classes of customers who are more likely to churn than others;

preparing customer data, including data corresponding to the identified trends and variables, for data mining and predictive modeling;

training at least one predictive model on historical customer data;

verifying the accuracy of the at least one predictive model based on historical data;

deploying the at least one trained model on current customer data to generate a propensity to churn score for individual customers indicating the relative likelihood that the individual customer will churn within a specified time period in the future;

defining characteristics of the target customers to be contacted during the course of the customer retention program; and

compiling a list of targeted customers having the defined characteristics.

15. The method of designing an efficient customer retention program of claim 14 wherein training the predictive model comprises:

assembling a first historical data set that includes prepared customer data from a training period in the past for which churn results are already known;

applying the first historical data set to the predictive model to obtain a first set of training results;

comparing the first set of training results to the known churn results for the training period; and

adjusting the model to compensate for discrepancies between the training results and the known churn results.
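Claim 15 describes training as a loop. Historical data with known churn outcomes are applied to the model, its predictions are compared with those outcomes, and the model is adjusted to reduce the discrepancies. The patent does not name a model family. The sketch below uses plain logistic regression fitted by batch gradient descent as one assumed instance of that loop. The feature, the data, the learning rate, and the epoch count are all hypothetical:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, x):
    # w[0] is the bias term; w[1:] pairs with the features in x.
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def train_logistic(X, y, epochs=2000, lr=0.1):
    """Fit a logistic model by batch gradient descent on the log-loss.

    X holds prepared feature vectors from a past training period and y the
    known churn results (1 = churned, 0 = active). Each epoch compares the
    model's predictions with the known results and adjusts the weights in
    proportion to the discrepancy.
    """
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            error = predict_proba(w, xi) - yi  # discrepancy vs. known outcome
            grad[0] += error
            for j, xj in enumerate(xi):
                grad[j + 1] += error * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

# Hypothetical single feature: slope of monthly usage (negative = declining).
X = [[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0]]
y = [1, 1, 1, 0, 0, 0]
w = train_logistic(X, y)
print(round(predict_proba(w, [-1.0]), 2), round(predict_proba(w, [1.0]), 2))
```

The same loop covers claim 16 if `X` and `y` are built by concatenating several historical data sets, or if training is repeated over each set in turn.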

16. The method of designing an efficient customer retention program of claim 15 wherein training the predictive model further comprises:

assembling a plurality of historical data sets from a plurality of training periods in the past for which the churn results are already known;

applying the historical data sets to the predictive model in an iterative process; and

comparing the training results to the known churn results for each iteration, and adjusting the model accordingly.

17. The method of designing an efficient customer retention program of claim 16 wherein the plurality of historical data sets are taken from different but overlapping training periods.

18. The method of designing an efficient customer retention program of claim 14 wherein verifying the accuracy of the at least one predictive model comprises:

assembling a model verification data set that includes prepared customer data from a verification period in the past for which churn results are already known;

applying the verification data set to the predictive model to obtain a set of verification test results;

comparing the verification test results to the known churn results for the verification period; and

determining whether the verification results are satisfactory.

19. The method of designing an efficient customer retention program of claim 14 wherein preparing customer data for data mining and predictive modeling comprises calculating derived variables from customer data, the derived variables being applied to data mining and predictive modeling.

20. The method of designing an efficient customer retention program of claim 19 wherein calculating a derived variable comprises calculating an average value from a plurality of data values based on multiple observations of a single data variable.

21. The method of designing an efficient customer retention program of claim 19 wherein calculating a derived variable comprises calculating a trend line that represents a best fit among a plurality of data points based on multiple observations of a single variable, and calculating a slope of the trend line.

22. The method of designing an efficient customer retention program of claim 19 wherein calculating a derived variable comprises calculating a distribution of customers based on a value of a data variable associated with individual customers, and classifying customers based on where they fall within the distribution according to the values of the data variable associated with each individual customer.
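Claims 20 and 21 describe two derived variables computed from repeated monthly observations of one base variable: an average and the slope of a least-squares trend line. Below is a minimal sketch. It assumes observations are equally spaced in time and indexed 0, 1, 2, and so on. The function names and sample values are illustrative only:

```python
def average(values):
    # Mean of repeated observations of a single variable.
    return sum(values) / len(values)

def trend_slope(values):
    """Least-squares slope of the values against observation index 0..n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend line")
    xs = range(n)
    x_mean = (n - 1) / 2
    y_mean = average(values)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, values))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den

# Hypothetical monthly call minutes for one customer over four months.
minutes = [300, 280, 250, 210]
print(average(minutes), trend_slope(minutes))  # 260.0 -30.0
```

A negative slope flags declining usage, which is the kind of early warning sign the description attributes to derived variables.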

23. The method of designing an efficient customer retention program of claim 14 wherein preparing customer data for data mining and predictive modeling comprises assembling an analytical record including data from a plurality of customers, the data including variable values associated with individual customers for the variables that have been identified as being significant for analyzing and predicting churn.

24. The method of designing an efficient customer retention program of claim 23 wherein preparing customer data for data mining and predictive modeling comprises assembling a first analytical record for input to a clustering data mining operation in which significant groups of customers are identified based on common behavior characteristics, and assembling a second analytical record for input to the clustering data mining operation to identify significant groups of customers based on common value characteristics.

25. The method of designing an efficient customer retention program of claim 14 wherein defining characteristics of target customers to be contacted during the course of the customer retention program comprises establishing a threshold churn propensity score, and targeting customers having a churn propensity score greater than the established threshold.

26. The method of designing an efficient customer retention program of claim 25 further comprising identifying a customer characteristic other than a customer's churn propensity score, and further filtering targeted customers based on the other characteristic.

27. The method of designing an efficient customer retention program of claim 26 wherein the other customer characteristic is customer value.
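Claims 25 to 27 select targets in two passes. First, keep customers whose churn propensity score exceeds a threshold. Second, filter on another characteristic such as customer value. The sketch below assumes scores and values are already attached to each customer record. The record layout, the threshold values, and `select_targets` are hypothetical. Sorting by propensity is an added convenience that the claims do not require:

```python
def select_targets(customers, min_propensity, min_value=None):
    """Return ids of customers whose propensity exceeds min_propensity and,
    if min_value is given, whose value is at least min_value; highest
    propensity first."""
    targets = [
        c for c in customers
        if c["propensity"] > min_propensity                  # claim 25: strictly above threshold
        and (min_value is None or c["value"] >= min_value)   # claims 26-27: value filter
    ]
    targets.sort(key=lambda c: c["propensity"], reverse=True)
    return [c["id"] for c in targets]

# Hypothetical scored customers (propensity from a deployed model, value in currency units).
customers = [
    {"id": "A", "propensity": 0.82, "value": 120.0},
    {"id": "B", "propensity": 0.35, "value": 300.0},
    {"id": "C", "propensity": 0.91, "value": 15.0},
    {"id": "D", "propensity": 0.67, "value": 80.0},
]
print(select_targets(customers, 0.6))                 # ['C', 'A', 'D']
print(select_targets(customers, 0.6, min_value=50))   # ['A', 'D']
```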

28. A method of identifying targets for a customer retention program, the method comprising:

identifying a set of customer data variables from which a customer's propensity to churn during a future period may be estimated based on values of the identified customer data variables associated with the customer;

providing a data mining tool with predictive modeling capabilities, the tool supporting at least one predictive model for estimating the propensity of individual customers to churn during the future period;

training the at least one predictive model on historical customer data for which churn results are known such that the at least one predictive model may be refined based on a comparison of the estimated churn propensities of individual customers against actual churn results;

deploying the trained model on current data to estimate churn propensities of individual customers for the future period; and

selecting targets for the customer retention program based on said churn propensities.

29. The method of identifying targets of a customer retention program of claim 28 further comprising:

receiving customer data in monthly installments; and

defining a prediction horizon such that the predictive model calculates customer propensities to churn during the prediction horizon based on customer data installments received in previous months.

30. The method of identifying targets of a customer retention program of claim 29 further comprising compiling a first historical data training set including a plurality of historical customer data installments, an historical data analysis month, and an historical prediction horizon, all corresponding to a period of time in the past, the first historical data set including actual churn results accumulated during the historical prediction horizon.

31. The method of identifying targets for a customer retention program of claim 30 wherein training the at least one predictive model on historical customer data comprises applying the first historical data training set to the predictive model to predict churn events expected to have occurred in the historical prediction horizon and comparing the predicted churn events with the actual churn results accumulated during the historical prediction horizon, and refining the predictive model based on any discrepancies.

32. The method of identifying targets for a customer retention program of claim 31 further comprising compiling a second historical data training set substantially similar to the first historical data training set but wherein the historical customer data installments, the historical data analysis month, and the historical prediction horizon of the second historical data training set are offset in time from the historical customer data installments, the historical data analysis month, and the historical prediction horizon of the first historical data training set.

33. The method of identifying targets for a customer retention program of claim 31 further comprising compiling a model verification data set substantially similar to the first historical data training set but wherein the historical customer data installments, the historical data analysis month, and the historical prediction horizon of the model verification data set do not correspond in time with the historical customer data installments, the historical data analysis month, and the historical prediction horizon of the first historical data training set.
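Claims 29 to 33 arrange historical data into windows. Each window is made of monthly data installments, an analysis month, and a prediction horizon. Additional training windows are offset in time, and a verification window does not coincide with the training windows. The claims do not fix the exact layout. The sketch below therefore assumes that the analysis month immediately follows the last installment and that the horizon immediately follows the analysis month. Months are represented as integer indices. All names are hypothetical:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class HistoricalWindow:
    installments: tuple   # month indices whose data installments feed the model
    analysis_month: int   # month at which the model would be run
    horizon: tuple        # months in which actual churn results are accumulated

def make_window(start, n_installments, horizon_len):
    # Assumed layout: installments, then the analysis month, then the horizon.
    installments = tuple(range(start, start + n_installments))
    analysis_month = start + n_installments
    horizon = tuple(range(analysis_month + 1, analysis_month + 1 + horizon_len))
    return HistoricalWindow(installments, analysis_month, horizon)

def staggered_training_windows(first_start, count, n_installments, horizon_len, offset=1):
    # Each training set is shifted by `offset` months; small offsets give overlap (claim 17).
    return [make_window(first_start + i * offset, n_installments, horizon_len)
            for i in range(count)]

def verification_window(training, n_installments, horizon_len):
    # Start only after every month used by any training window (horizon_len >= 1 assumed),
    # so the verification set does not correspond in time with the training sets.
    start = max(w.horizon[-1] for w in training) + 1
    return make_window(start, n_installments, horizon_len)

training = staggered_training_windows(0, count=3, n_installments=6, horizon_len=2)
for w in training:
    print(w)
print(verification_window(training, n_installments=6, horizon_len=2))
```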

34. The method of identifying targets for a customer retention program of claim 33 further comprising applying the model verification data set to the predictive model to predict churn events expected to occur in the historical prediction horizon of the verification data set, and comparing the results predicted by the predictive model with the actual churn results accumulated during the verification data set prediction horizon.

35. The method of identifying targets for a customer retention program according to claim 28 wherein the data mining tool comprises an SAS Data Miner.

36. The method of identifying targets for a customer retention program according to claim 28 wherein the data mining tool comprises a KXEN data mining tool.

37. The method of identifying targets for a customer retention program according to claim 28 further comprising defining at least one derived variable and calculating a derived variable value for individual customers.

38. The method of identifying targets for a customer retention program according to claim 37 wherein calculating a value for the derived variable comprises:

selecting a base variable for which individual customers have a corresponding value each month;

calculating a base variable average value for individual customers based on individual customers' base variable monthly values over a number of months;

calculating a customer distribution based on individual customers' base variable average values;

classifying customers based on their position within the distribution; and

storing individual customers' classifications as the customers' derived variable values.
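Claim 38 spells out one derived variable step by step. Average a base variable's monthly values for each customer, build the distribution of those averages, classify each customer by position in it, and store the class. Below is a minimal sketch. It assumes equal-count classes (quantile bins) numbered from 1 for the lowest values. The patent does not say how class boundaries are chosen, and the function name and data are hypothetical:

```python
def classify_by_distribution(monthly_values, n_classes=4):
    """monthly_values: {customer_id: [base-variable value for each month]}.
    Returns {customer_id: class number 1..n_classes} (hypothetical equal-count bins)."""
    # Step 1: base-variable average per customer over the available months.
    averages = {cid: sum(vals) / len(vals) for cid, vals in monthly_values.items()}
    # Step 2: the distribution, here simply customers ranked by their average.
    ranked = sorted(averages, key=averages.get)
    n = len(ranked)
    # Steps 3-4: class by position in the ranking; the dict is the stored derived variable.
    return {cid: position * n_classes // n + 1 for position, cid in enumerate(ranked)}

# Hypothetical monthly revenue for four customers over three months.
monthly_revenue = {
    "c1": [10, 12, 11],
    "c2": [50, 55, 60],
    "c3": [30, 28, 32],
    "c4": [80, 75, 90],
}
print(classify_by_distribution(monthly_revenue, n_classes=2))
```

Equal-count bins keep every class populated. Fixed value thresholds, such as the revenue classes of FIG. 6, would be an equally valid reading of the claim.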

DESCRIPTION

BACKGROUND

Consumers typically purchase products or subscribe to services from businesses that they perceive to be offering the best products or services at the lowest price. And while consumers are often loyal to providers and brands they are familiar with, they will surely shift allegiance if they believe they can obtain better products or services or a better price somewhere else. Established, ongoing relationships with existing customers can be a significant source of revenue for many businesses, so losing customers to competitors can significantly cut into a company's revenue. Managing this phenomenon by taking active steps to prevent customer "churn" is a high priority for many businesses.

In many cases it is less expensive for a business to retain existing customers than to acquire new ones. For this reason, many companies will go to great lengths to maintain their existing customer base. In highly competitive industries, it is common for companies to implement elaborate customer loyalty programs or aggressive customer retention programs to prevent or limit churn. Such programs may offer incentives to entice customers to continue buying the company's products or services, or they may simply provide some personalized contact or message to existing customers to reinforce and strengthen the relationship.

Designing an efficient and effective customer retention program can be difficult, especially when confronted with a large, diversified customer base. Companies may not know whether churn is a significant problem and, if it is, which customer groups are most affected. Furthermore, a company's tolerance threshold for churn may be very low: customer churn may be considered a problem even though it affects only a small percentage of the overall customer base. Contacting all customers during a customer retention program is too expensive and inefficient. However, contacting too few customers could result in a failure to reach many customers who are likely to churn and who are the appropriate targets of the customer retention program. Deciding whom to contact represents a significant obstacle to preparing an effective customer retention program.

Ideally, a customer retention program will reach the maximum number of potential churners with the fewest total customer contacts. This point is illustrated in the graph 10 of Fig. 1. The horizontal axis represents the percentage of the total customer population contacted, from 0 to 100%. The vertical axis represents the percentage of future churners reached. In this example, churners comprise 5% of the overall customer base. A first curve 12 shows the results of contacting existing customers at random. Since churners make up only 5% of the total customer population, they can be expected to comprise approximately 5% of any truly random sample of the customer population, regardless of the size of the sample. Under these circumstances, 100% of the customer population must be contacted to ensure contacting 100% of all churners, 75% of the total customer base must be contacted to reach 75% of the churners, and so forth. Because of the relatively low percentage of churners, a large number of customer contacts are wasted on customers who will not churn. In other words, an excessive number of non-churners must be contacted in order to reach a meaningful number of churners. The inefficiency of this method is apparent.

A second curve 14 represents the ideal situation in which the identity of all future churners is known. In this case only churners need be contacted, and no contacts are wasted on non-churners. Since churners comprise 5% of the total customer population, 100% of all churners can be reached by contacting only 5% of the total customer population. Obviously, contacting only known churners is a far more efficient mechanism for reaching significant numbers of churners than contacting customers at random. Unfortunately, the identities of customers who will churn are not known in advance, and it is not realistic to put together a customer retention target list that includes only the names of those customers who will assuredly churn in the near future.

A third curve 16 represents an attractive targeting profile for a customer retention program. While it is impossible to determine in advance which customers will churn, it is possible to determine, with some degree of accuracy, which customers are more likely to churn than others. In this case, customers who are more likely to churn are targeted first. Predicting who will and will not churn is not a precise science: some customers who will not churn may be contacted, and some customers who will end up churning may not be. Nonetheless, the overall effect is a significant improvement in targeting efficiency over the random selection method of curve 12. As can be seen, the shape of curve 16 approximates the shape of the ideal curve 14. Approximately 70% of all churners may be reached by contacting only 10% of the total customer population (a significant improvement over the random contact method, in which 70% of all customers would have to be contacted to reach 70% of churners). A good targeting profile has a very steep initial rise, indicating that most of the customers initially contacted are in fact churners. The key to developing a good targeting profile is accurately predicting which customers are likely to churn and which are not. To make such predictions, an intimate and detailed knowledge of the customer base is essential.
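The three curves of Fig. 1 are cumulative gains curves. Customers are sorted by a score and contacted in that order, and the share of all churners reached is plotted against the share of customers contacted. The sketch below computes those points. The 20-customer sample is hypothetical and mirrors the 5% churn rate of the example. Scoring with the true outcome reproduces the ideal curve 14, while a random score would, on average, trace the diagonal of curve 12:

```python
def gains_curve(scores, churned):
    """Return (pct_contacted, pct_churners_reached) points, starting at (0, 0),
    when customers are contacted in descending order of score."""
    total_churners = sum(churned)
    if total_churners == 0:
        raise ValueError("no churners in the sample")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    points = [(0.0, 0.0)]
    reached = 0
    for contacted, i in enumerate(order, start=1):
        reached += churned[i]
        points.append((100.0 * contacted / len(scores), 100.0 * reached / total_churners))
    return points

# 20 hypothetical customers, one of whom (index 7) churns: a 5% churn rate.
churned = [0] * 20
churned[7] = 1
ideal = gains_curve([float(c) for c in churned], churned)
print(ideal[1])  # (5.0, 100.0): every churner reached after contacting 5% of customers
```

A model's scores can be passed in the same way. The steeper the early part of its curve, the closer it comes to curve 14.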

BRIEF SUMMARY

The present invention relates to a system and method for analyzing and predicting churn within a business's customer base so that steps may be taken to limit or otherwise manage churn. The system and method provide business intelligence to business users responsible for retaining customers. The business intelligence provided by the invention facilitates efforts to retain high-profitability customers and prevent erosion of the customer base. The invention allows business intelligence consumers to analyze their customer base, identifying customer behavior patterns and tracking trends that impact customer churn. Such analysis can be beneficial in understanding the causes of churn and identifying early warning signs that may indicate when a customer is contemplating or has decided to drop a particular service plan. Knowing the causes of customer churn, a business may take steps to improve products and services to reduce churn in the future. Furthermore, identifying potential churners early allows a business to take proactive steps to retain customers who may otherwise be lost.

According to the invention, historical data are analyzed in order to develop a strict definition of churn and to distinguish between active and churned customers. The characteristics of churners and non-churners are analyzed to identify the key characteristics of each and to identify the reasons why customers churn. Data mining processes identify clusters of customers based on a large number of variables that define various customer attributes. The clustering function allows business intelligence consumers to see patterns and associations between customers and customer groups that would otherwise remain hidden in the vast amounts of data the present invention considers. Statistical models are created to score customers based on their propensity to churn. Customers having a high propensity to churn may be contacted as part of a customer retention or churn management program and offered incentives not to drop a particular service or service plan. For example, potential churners may be offered special pricing terms, extra services, or other incentives to dissuade them from dropping a service.

The present invention analyzes the characteristics and behavior patterns of past churners and non-churners alike. The invention identifies the factors and behavior and usage patterns that often precede either a customer's decision to churn or the actual event itself after the decision has been taken. The information gleaned from past customer behavior is applied to current customer data in order to predict which present customers are likely to churn in the future. Customers with the highest propensity to churn may be selected as targets for a customer retention program. By targeting only customers having a high propensity to churn, the present invention provides optimized customer lists designed to include a much higher percentage of potential churners out of a limited portion of the overall customer base. The present invention provides the processes and tools for designing and implementing effective customer retention programs.

According to an embodiment of the invention, a system for managing churn among the customers of a business having a statistically large customer base is provided. The heart of the system is an optimized data mart configured to receive and store vast amounts of customer data. A population architecture is provided to receive customer data from one or more external data sources and load the data into the data mart. The customer data stored in the data mart define a plurality of customer attributes for the customers in the customer base. A data manipulation module is provided for preparing one or more analytical records from data stored in the data mart for data mining. A data mining tool is provided for analyzing the one or more analytical records prepared by the data manipulation module. The data mining tool is adapted to return results identifying clusters of customers sharing common customer attributes and to calculate individual customers' propensities to churn during a predefined period in the future. The data manipulation module receives the results and stores them in the data mart. An end user access module is provided for accessing the results returned from the data mining tool and presenting the results to a user.

Another embodiment provides a method of designing an efficient customer retention program for managing customer churn among the customers of a business having a statistically large customer base. The customer retention program includes an analysis of the causes of customer churn and identifies customers who are most likely to churn in the future. Identifying likely churners allows appropriate steps to be taken to prevent customers who are likely to churn from actually churning. The method includes adopting a set of definitions of churn sufficient to encompass all customers in the customer base and which rely on objective factors to determine whether individual customers have churned or remain active. Historical customer data are analyzed to identify significant trends and variables that provide insight into causes of churn and to identify classes of customers who are more likely to churn than others. Customer data, including data corresponding to the identified trends and variables, are prepared for data mining and predictive modeling. A predictive model is trained on historical customer data, and the accuracy of the predictive model is verified based on historical data. Once the model is trained and its accuracy verified, the model is deployed on current customer data to generate a propensity-to-churn score for individual customers. The propensity-to-churn score indicates the relative likelihood that the individual customer will churn within a specified time period in the future. Once the customers are scored, the characteristics of target customers who are to be contacted during the course of the customer retention program are defined, and a list of targeted customers having the defined characteristics is compiled.

In another embodiment, a method of identifying targets for a customer retention program is provided. The method of this embodiment includes identifying a set of customer data variables from which a customer's propensity to churn during a future period may be estimated based on values of the identified customer data variables associated with the customer. The method further calls for providing a data mining tool with predictive modeling capabilities. The data mining tool supports at least one predictive model for estimating the propensity of individual customers to churn during the future period. The predictive model is then trained on historical customer data for which churn results are known. The at least one predictive model is then refined based on a comparison of the estimated churn propensities of individual customers against actual churn results. Once trained, the predictive model is deployed on current data to estimate churn propensities of individual customers for the future period. Targets for the customer retention program are then selected based on customer churn propensities.

Other systems, methods, features and advantages of the invention will be, or will become, apparent to one with skill in the art upon examination of the following figures and detailed description. It is intended that all such additional systems, methods, features and advantages be included within this description, be within the scope of the invention, and be protected by the following claims.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a graph showing the percentage of future churners contacted during a customer retention program versus the overall percentage of customers contacted.

FIG. 2 is a block diagram of a churn prediction and management system according to the invention.

FIG. 3 is a flow chart of a method of predicting and managing churn according to the invention.

FIG. 4 is a graphical report analyzing the distribution of customers in a customer population based on active or churned status.

FIG. 5 is a graphical report analyzing monthly trends of activated and churned customers.

FIG. 6 is a graphical report showing the churn rate for various monthly revenue classes.

FIG. 7 is a graphical report showing the churn rate for various traffic cost classes.

FIG. 8 is a graphical report showing the churn rate for various monthly traffic volume classes.

FIG. 9 is a historical data set for training a predictive model.

FIG. 10 shows a plurality of staggered historical data sets for training a predictive model.

FIG. 11 is a graphical report showing customer clusters based on a behavioral variable and a value variable.

FIG. 12 is a report showing the number of churned customers in clusters based on a behavior variable and a value variable.

FIG. 13 is a graphical report showing the average churn rate of clusters based on a behavior variable and a value variable.

FIG. 14 is a graphical report showing the percentage of business customers and the percentage of business revenue impacted by potential churn plotted against churn probability.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Fig. 2 shows a block diagram of a system 100 for analyzing and predicting churn. The system 100 includes a plurality of data sources 102, 104, 106. A dedicated data mart 110 forms the core of the system 100. A population architecture 108 is provided to perform extraction, transformation and loading functions for populating the data mart 110 with the data received from the various data sources 102, 104, 106. A data manipulation module 114 prepares data stored in the data mart 110 to be input to other applications such as a data mining module 116 and an end user access module 118. The end user access module 118 provides an interface through which business users may interact with, view, and analyze the data collected and stored in the data mart 110. The end user access module 118 may be configured to generate a plurality of predefined reports 120 for analyzing the data. The end user access module 118 includes online analytical processing (OLAP) functionality that allows a user to manipulate and contrast data "on-the-fly" to gain further insight into customer data, historical trends, and the characteristics of active and churned customers. External systems such as CRM 122 may also consume the data stored in the data mart 110.

In order to support the churn analysis and predictive methods of the present invention, the data mart 110 must be populated with a substantial amount of customer data for each customer in the customer base. Revenue data may be provided by the enterprise billing system. Customer demographics, geographic data, and other data may be provided by a customer relationship management (CRM) system. If the enterprise is a telecommunications services provider, usage patterns, traffic and interconnection data may be provided directly from network control systems. Other data sources may provide other types of customer data for enterprises engaged in other industries. Alternatively, all or some of the data necessary to populate the data mart 110 may be provided by a data warehouse system or other mass storage system.

According to an embodiment, the data requirements of the system 100 are pre-configured and organized into logical flows, so that the data source systems 102, 104, 106, etc., supply the necessary data at the proper times to the proper location. Typically this involves writing a large text file (formatted as necessary) containing all of the requisite data to a designated directory. Because most enterprises operate on a monthly billing cycle, the data will typically be extracted monthly to update the data mart 110.

The population architecture 108 is an application program associated with the data mart 110. The population architecture is responsible for reading the text files deposited in the designated directories by the various data sources at the appropriate times. The population architecture may perform quality checks on the data to ensure that the necessary data are present and in the proper format. The population architecture 108 includes data loading scripts that transform the data and load the data into the appropriate tables of the data mart 110 data model.
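The read, check, transform and load cycle described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's actual loader: the extract is assumed to be a pipe-delimited text file with hypothetical columns CUSTOMER_ID, MONTH and REVENUE, the target table `customer_revenue` is hypothetical, and Python's built-in `sqlite3` in-memory database stands in for the Oracle or SQL Server data mart.

```python
import csv
import io
import sqlite3

# Hypothetical column layout of one monthly extract file.
REQUIRED_COLUMNS = ("CUSTOMER_ID", "MONTH", "REVENUE")


def check_row(row):
    """Quality check: every required field is present and REVENUE is numeric."""
    if any(not row.get(col) for col in REQUIRED_COLUMNS):
        return False
    try:
        float(row["REVENUE"])
    except ValueError:
        return False
    return True


def load_extract(conn, text):
    """Read a pipe-delimited extract, reject bad rows, load the good ones.

    Returns a (loaded, rejected) tuple of row counts.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="|")
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"extract is missing columns: {missing}")
    loaded, rejected = 0, 0
    for row in reader:
        if not check_row(row):
            rejected += 1
            continue
        # Transform: normalize text and types before loading (hypothetical table).
        conn.execute(
            "INSERT INTO customer_revenue (customer_id, month, revenue) VALUES (?, ?, ?)",
            (row["CUSTOMER_ID"].strip(), row["MONTH"].strip(), float(row["REVENUE"])),
        )
        loaded += 1
    conn.commit()
    return loaded, rejected


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer_revenue (customer_id TEXT, month TEXT, revenue REAL)")
    extract = "CUSTOMER_ID|MONTH|REVENUE\nC1|2020-01|42.5\nC2|2020-01|\nC3|2020-01|abc\n"
    print(load_extract(conn, extract))  # (1, 2)
```

Rejected rows are only counted here; a production loader would more likely log them for follow-up with the source system.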

The data mart 110 is a traditional relational database and may be based on, for example, the Oracle or Microsoft SQL Server platforms. The data mart 110 is the core of the system architecture 100. The customer and revenue data are optimized for fast access and analytic reporting according to a customized data model. Star schemas allow efficient analysis of key performance indicators along various dimensions. Flat tables containing de-normalized data are created for feeding the predictive modeling systems.
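The two storage patterns mentioned above, a star schema for KPI analysis and a flat de-normalized table for modeling, can be illustrated with a toy example. This is a sketch only: the table names (`dim_customer`, `dim_month`, `fact_revenue`) and sample rows are hypothetical, SQLite stands in for the production database, and the REVENUE_2 and REVENUE_3 column names are borrowed from Table 1.

```python
import sqlite3

# Hypothetical star schema: one fact table keyed to two dimension tables.
SCHEMA = """
CREATE TABLE dim_customer (customer_id TEXT PRIMARY KEY, region TEXT);
CREATE TABLE dim_month (month_id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE fact_revenue (
    customer_id TEXT REFERENCES dim_customer(customer_id),
    month_id INTEGER REFERENCES dim_month(month_id),
    revenue REAL
);
"""


def build_mart():
    """Create an in-memory toy mart with illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dim_customer VALUES (?, ?)",
                     [("C1", "North"), ("C2", "South")])
    conn.executemany("INSERT INTO dim_month VALUES (?, ?)",
                     [(2, "month 2"), (3, "month 3")])
    conn.executemany("INSERT INTO fact_revenue VALUES (?, ?, ?)",
                     [("C1", 2, 30.0), ("C1", 3, 20.0),
                      ("C2", 2, 10.0), ("C2", 3, 15.0)])
    return conn


def revenue_by_region(conn):
    """KPI analysis: slice the fact table by a dimension attribute."""
    return conn.execute(
        """SELECT d.region, SUM(f.revenue)
           FROM fact_revenue f JOIN dim_customer d USING (customer_id)
           GROUP BY d.region ORDER BY d.region"""
    ).fetchall()


def flat_modeling_table(conn):
    """De-normalize: one row per customer, one revenue column per month."""
    return conn.execute(
        """SELECT customer_id,
                  SUM(CASE WHEN month_id = 2 THEN revenue END) AS REVENUE_2,
                  SUM(CASE WHEN month_id = 3 THEN revenue END) AS REVENUE_3
           FROM fact_revenue GROUP BY customer_id ORDER BY customer_id"""
    ).fetchall()
```

The star schema keeps each fact once and joins dimensions at query time; the flat table trades that normalization for a one-row-per-customer layout that modeling tools can consume directly.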

As will be described in more detail below, the data mining module 116 performs clustering functions to identify significant groupings of customers based on common characteristics or attributes. Such clusters are discovered across a large number of customer variables with no preconceived target variables or predefined groupings. The data mining module 116 further creates predictive models for calculating each customer's propensity to churn. The data mining module 116 may be a commercially available data mining tool such as the SAS data miner or the KXEN data mining tool. To maximize the discovery power of the data mining tool, variables known to be significant for identifying and predicting churn are provided to the data mining module 116. The data manipulation module 114 pulls the necessary data from the data mart 110, calculates derived variables, and formats others to create the data files fed into the data mining module 116. The effectiveness of the data mining operation depends heavily on the quality of the data provided to the data mining tool. Accordingly, as will be described in more detail below, great care must be taken in selecting the variables supplied to the data mining tool. The data manipulation module 114 is also responsible for receiving the output from the data mining module 116 and loading the results back into the data mart 110.
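The clustering step is performed by a commercial tool such as SAS or KXEN, whose internal algorithms are not described here. As a hedged stand-in, the sketch below uses plain k-means (Lloyd's algorithm) to show what "grouping customers with no predefined target" means in practice. The two input features per customer (for example, scaled revenue and scaled usage) are hypothetical, and the deterministic "first k points" initialization is a simplification chosen for illustration.

```python
import math


def kmeans(points, k, iterations=20):
    """Lloyd's algorithm: assign each point to its nearest centroid, then
    move each centroid to the mean of the points assigned to it.

    points: list of equal-length numeric tuples (hypothetical customer features).
    Returns (labels, centroids).
    """
    centroids = [list(p) for p in points[:k]]  # simplistic deterministic init
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: recompute each centroid as the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids
```

The resulting cluster label plays the same role as BEHAVIOUR_CLUSTER_ID in Table 1: once stored back in the data mart, it becomes an input variable for the propensity model and a filter dimension for reporting.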

The end user access module 118 pulls data from the data mart 110 to be displayed in the various pre-configured reports 120. The end user access module 118 includes online analytical processing capabilities based on industry-standard reporting software. Because all of the data in the data mart 110 are accumulated and stored on a customer-by-customer basis, the online analytical processing capabilities of the end user access module 118 allow the end user to alter display criteria and filter customers by various attributes, such as relevant clusters, churn propensity, and the like, significantly expanding the business intelligence insights that may be gleaned from the churn analysis and predictive modeling system.
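Because results are stored per customer, filtering and rolling up by cluster and churn propensity is straightforward. The sketch below shows the idea in plain Python rather than in an OLAP tool; the `CustomerScore` record and the threshold value are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class CustomerScore:
    customer_id: str
    cluster_id: int          # e.g. a behavior cluster returned by the mining tool
    churn_propensity: float  # model output in [0, 1]


def high_risk_by_cluster(scores, threshold):
    """Roll up: count customers at or above the propensity threshold per cluster."""
    return dict(Counter(s.cluster_id for s in scores if s.churn_propensity >= threshold))


def drill_down(scores, cluster_id, threshold):
    """Drill down: high-risk customers in one cluster, highest propensity first."""
    hits = [s for s in scores
            if s.cluster_id == cluster_id and s.churn_propensity >= threshold]
    return [s.customer_id
            for s in sorted(hits, key=lambda s: s.churn_propensity, reverse=True)]
```

A roll-up followed by a drill-down on the most affected cluster mirrors the typical interactive path an analyst would take in the reporting front end.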

Fig. 3 is a flow chart outlining the tasks for implementing a churn prediction and management program according to the invention. A first preliminary task 130 is to create transparency among the customers in the customer base. The present invention is expected to be implemented within a large and diverse customer base. For example, an embodiment of the invention may be implemented to predict and manage churn within a telecommunications service provider's customer base. A telecommunications service provider (telecom) may have millions of customers. Customers may have different service plans, different billing arrangements (pre-paid, post-paid, etc.), or other service options. Creating transparency involves providing a set of flexible but rigorous definitions of churn that may be applied to all customers within the telecom's customer base. A satisfactory definition of churn is one that can be translated into technical constraints which, when applied to customer data, leave no doubt as to which customers are active, which customers have churned and, for customers who have churned, the timing of the transition from active customer to churned customer (the churn date). The definition of churn may differ from business to business and across product or service lines. Whatever definition is finally adopted will depend heavily on the services offered by the business and on other operational considerations. Provisions must be made for distinguishing between internal and external churn, voluntary and involuntary churn, and the like.
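As an example of translating a churn definition into a technical constraint, Table 1 defines IS_CHURN = 1 when END_DATE minus LAST_CALL_DATE is greater than 2 months. The sketch below encodes that rule. Two points are assumptions not fixed by the source: "months" is measured as complete calendar months, and the churn date is taken to be the last call date.

```python
from datetime import date


def whole_months_between(start, end):
    """Number of complete calendar months from start to end."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the final month is not yet complete
    return months


def classify(last_call_date, end_date, inactivity_months=2):
    """Return (IS_CHURN, churn_date) under the Table 1 style rule:
    churned if END_DATE minus LAST_CALL_DATE exceeds inactivity_months."""
    if whole_months_between(last_call_date, end_date) > inactivity_months:
        return 1, last_call_date  # assumption: churn date = last call date
    return 0, None
```

For instance, a customer whose last call was on 15 January and whose observation window ends on 16 April has three complete months of inactivity and is classified as churned; with an end date of 14 April, only two months are complete and the customer remains active.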

Once churn has been adequately defined, historical customer data can be analyzed to gain insights into the factors and circumstances that lead to churn. For example, once churn has been defined, it is a fairly straightforward process to classify current and past customers as either active or churned. Analysis of these two groups (their usage patterns, profitability, the average tenure of customers within each group, and many other trends and variables) can provide significant insights into the causes of churn and clues for identifying the customers likely to churn in the future. As an illustration, Fig. 4 shows a report 150 that may be generated directly from the customer data stored in the data mart 110 once an adequate definition of churn has been established. Once again, the data illustrated here relate to an embodiment for predicting and managing churn for a telecommunications service provider. In the report 150, customers are divided among active customers who have generated traffic 152 (60.95%), active customers with no traffic 154 (7.58%), churned-inactive customers 156 (18.29%), and churned-deactivated customers 158 (13.18%). The report 150 provides a quick, easy way to grasp the present state of the customer base. Thus, even at this early stage of the churn prediction and management process, useful information has been gathered and presented. Personnel responsible for managing churn can use the report 150 to gauge how serious a problem churn is.
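A report like report 150 reduces to counting customers per status and dividing by the total. The sketch below assumes each customer has already been assigned one status label; the label strings are hypothetical.

```python
from collections import Counter


def status_shares(statuses):
    """Percentage of customers in each status, rounded to two decimals."""
    counts = Counter(statuses)
    total = sum(counts.values())
    return {status: round(100 * n / total, 2) for status, n in counts.items()}
```

For example, a status of "active_traffic" held by 6,095 of 10,000 customers yields a share of 60.95%. Because every customer receives exactly one status, the shares sum to 100% up to rounding, as the four segments of report 150 do.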

Fig. 5 is a report showing the monthly trend of activated customers 160 versus churned customers 162. This report indicates that the period between September and August was the most critical, because this period had the biggest gap between the number of customers activated and the number of customers who churned.
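The trend in Fig. 5 can be summarized as a net change per month (activations minus churners). The sketch below assumes "most critical" means the month in which churners most exceed activations; the source does not state the direction of the gap, so this is an interpretive choice, and the month keys are hypothetical.

```python
def net_change_by_month(activated, churned):
    """Net customer change per month: activations minus churners."""
    return {month: activated[month] - churned.get(month, 0) for month in activated}


def most_critical_month(activated, churned):
    """Month with the lowest net change, i.e. where churn most exceeds activation."""
    net = net_change_by_month(activated, churned)
    return min(net, key=net.get)
```

Using `max` on the absolute net change instead would flag the month with the largest gap in either direction.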

Another preliminary task in the churn prediction and management process is identifying the significant trends and variables that impact churn (task 132). The purpose of task 132 is to identify the most significant customer variables which, when aggregated, averaged, compared, or otherwise dissected, manipulated, and evaluated, may provide insights into customer churn and into the individual decisions that lead customers to churn. The trends and variables identified at this stage will depend heavily on the specific products and services a company or service provider offers. For example, according to an embodiment of the invention, approximately 200 variables and trends have been identified for analyzing historical data for predicting and managing churn among the customers of a telecommunications service provider. A complete list of these variables and a brief description of each is shown in Table 1. Some of the variables may be obtained directly from the data provided by the operational data sources 102, 104, 106 (Fig. 1). Many others must be derived from the raw data.
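Several of the derived revenue variables in Table 1 follow directly from the monthly revenue figures. The sketch below computes REVENUE_2 to REVENUE_7, LAG1_REV to LAG5_REV, MEAN_R and MEAN1_R from a mapping of month index to revenue. The month indices follow Table 1; their calendar meaning is not stated there, and the input format is an assumption.

```python
def revenue_features(revenue):
    """Derive Table 1 revenue variables from a dict {month_index: revenue}
    covering month indices 2 through 7."""
    feats = {f"REVENUE_{m}": revenue[m] for m in range(2, 8)}
    # LAGk_REV = revenue of month k+1 minus revenue of month k+2 (k = 1..5)
    for k in range(1, 6):
        feats[f"LAG{k}_REV"] = revenue[k + 1] - revenue[k + 2]
    feats["MEAN_R"] = sum(revenue[m] for m in range(2, 8)) / 6   # months 2 to 7
    feats["MEAN1_R"] = sum(revenue[m] for m in range(2, 5)) / 3  # months 2 to 4
    return feats
```

A positive LAG value means revenue rose from one month to the next; a run of negative values is the kind of declining-spend signal the predictive model can pick up.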

Table 1

Variable | Type | Measurement | Definition
---------|------|-------------|-----------
CUSTOMER_ID | | nominal | Customer identification key
IS_CHURN | target | binary | Flag variable used as the target for churn prediction; IS_CHURN = 1 if END_DATE minus LAST_CALL_DATE is greater than 2 months, else IS_CHURN = 0
BEHAVIOUR_CLUSTER_ID | input | nominal | Cluster identification from behavior clustering
CITY | input | nominal | City
GENDER | input | nominal | Gender
LANGUAGE | input | nominal | Language
MARITAL_STATUS | input | nominal | Marital status
NATIONALITY | input | nominal | Nationality
PROVINCE | input | nominal | Province
REGION | input | nominal | Region
ZIP_CODE | input | nominal | Zip code
XYZ_1_2_24 | input | interval | Number of deactivated products of the product group XYZ per month
ACCESS_INTERNET_1_24_SUM | input | interval | Number of active products of the product group ACCESS_INTERNET for the last 6 months
ACCESS_INTERNET_1_2_24 | input | interval | Number of deactivated products of the product group ACCESS_INTERNET per month
ACCESS_INTERNET_1_2_25 | input | interval | Number of active products of the product group ACCESS_INTERNET per month
ACCESS_INTERNET_1_3_24 | input | interval | Number of deactivated products of the product group ACCESS_INTERNET per month
ACCESS_INTERNET_1_4_24 | input | interval | Number of deactivated products of the product group ACCESS_INTERNET per month
ACCESS_INTERNET_1_5_24 | input | interval | Number of active products of the product group ACCESS_INTERNET per month
ACCESS_INTERNET_1_6_24 | input | interval | Number of deactivated products of the product group ACCESS_INTERNET per month
ACCESS_INTERNET_1_7_24 | input | interval | Number of deactivated products of the product group ACCESS_INTERNET per month
ACCESS_VOICE_1_24_SUM | input | interval | Number of deactivated products of the product group ACCESS_VOICE for 6 months
ACCESS_VOICE_1_2_24 | input | interval | Number of deactivated products of the product group ACCESS_VOICE per month
ACCESS_VOICE_1_2_25 | input | interval | Number of active products of the product group ACCESS_VOICE per month
ACCESS_VOICE_1_3_24 | input | interval | Number of deactivated products of the product group ACCESS_VOICE per month
ACCESS_VOICE_1_4_24 | input | interval | Number of deactivated products of the product group ACCESS_VOICE per month
ACCESS_VOICE_1_5_24 | input | interval | Number of deactivated products of the product group ACCESS_VOICE per month
ACCESS_VOICE_1_6_24 | input | interval | Number of deactivated products of the product group ACCESS_VOICE per month
ACCESS_VOICE_1_7_24 | input | interval | Number of deactivated products of the product group ACCESS_VOICE per month
ACCESS_VOICE_DIVERSE_1_2_25 | input | nominal | Number of active products of the product group ACCESS_VOICE_DIVERSE per month
KYT_1_2_24 | input | nominal | Number of deactivated products of the product group KYT per month
BUNDLE_ACCESS_VOICE_1_2_25 | input | nominal | Number of active products of the product group BUNDLE_ACCESS_VOICE per month
EBILL_1_2_25 | input | nominal | Number of active products of the product group EBILL per month
YTR_1_2_24 | input | nominal | Number of deactivated products of the product group YTR per month
IDENTIFIKATION_1_2_24 | input | nominal | Number of deactivated products of the product group IDENTIFIKATION per month
REBATE_VOICE_1_2_25 | input | interval | Number of active products of the product group REBATE_VOICE per month
SERVICES_1_2_25 | input | nominal | Number of active products of the product group SERVICES per month
SERVICE_SUPPORT_1_2_24 | input | nominal | Number of deactivated products of the product group SERVICE_SUPPORT per month
SPECIAL_OPTIONS_1_2_24 | input | nominal | Number of deactivated products of the product group SPECIAL_OPTIONS per month
STANDARDISIERTE_OPTI_1_2_24 | input | nominal | Number of deactivated products of the product group STANDARDISIERTE_OPTI per month
LAG1_REV | input | interval | Revenue in month 2 minus revenue in month 3
LAG2_REV | input | interval | Revenue in month 3 minus revenue in month 4
LAG3_REV | input | interval | Revenue in month 4 minus revenue in month 5
LAG4_REV | input | interval | Revenue in month 5 minus revenue in month 6
LAG5_REV | input | interval | Revenue in month 6 minus revenue in month 7
LAG1_USAGE | input | interval | Cost of voice, surf and SMS usage, month 2 minus month 3
LAG2_USAGE | input | interval | Cost of voice, surf and SMS usage, month 3 minus month 4
LAG1_VOICE | input | interval | Cost of voice event type, month 2 minus month 3
LAG2_VOICE | input | interval | Cost of voice event type, month 3 minus month 4
MEAN_PERC_U | input | interval | Mean usage as a percentage of mean revenue for months 2, 3 and 4
MEAN_R | input | interval | Average revenue over months 2 to 7
MEAN_U | input | interval | Average cost of usage (surf, voice, SMS) over months 2 to 4
MEAN1_R | input | interval | Average revenue over months 2 to 4
USAGE_2 | input | interval | Cost of usage (SMS, voice, surf) for month 2
USAGE_3 | input | interval | Cost of usage (SMS, voice, surf) for month 3
USAGE_4 | input | interval | Cost of usage (SMS, voice, surf) for month 4
REVENUE_2 | input | interval | Amount of revenue per month (revenue minus discount)
REVENUE_3 | input | interval | Amount of revenue per month (revenue minus discount)
REVENUE_4 | input | interval | Amount of revenue per month (revenue minus discount)
REVENUE_5 | input | interval | Amount of revenue per month (revenue minus discount)
REVENUE_6 | input | interval | Amount of revenue per month (revenue minus discount)
REVENUE_7 | input | interval | Amount of revenue per month (revenue minus discount)
N_AMOUNT_2_22 | input | interval | Amount of revenue without discount per month
N_AMOUNT_3_22 | input | interval | Amount of revenue without discount per month
N_AMOUNT_4_22 | input | interval | Amount of revenue without discount per month
N_AMOUNT_5_22 | input | interval | Amount of revenue without discount per month
N_AMOUNT_6_22 | input | interval | Amount of revenue without discount per month
N_AMOUNT_7_22 | input | interval | Amount of revenue without discount per month
Y_AMOUNT_2_22 | input | interval | Amount of applied discount per month
Y_AMOUNT_3_22 | input | interval | Amount of applied discount per month
Y_AMOUNT_4_22 | input | interval | Amount of applied discount per month
Y_AMOUNT_5_22 | input | interval | Amount of applied discount per month
Y_AMOUNT_6_22 | input | interval | Amount of applied discount per month
Y_AMOUNT_7_22 | input | interval | Amount of applied discount per month
PERC_USAGE_2 | input | interval | Percentage of surf, voice and SMS usage for month 2
PERC_USAGE_3 | input | interval | Percentage of surf, voice and SMS usage for month 3
PERC_USAGE_4 | input | interval | Percentage of surf, voice and SMS usage for month 4
PERC_VOICE_2 | input | interval | Percentage of voice destination for month 2
PERC_VOICE_3 | input | interval | Percentage of voice destination for month 3
PERC_VOICE_4 | input | interval | Percentage of voice destination for month 4
SMS_COST_2_5 | input | interval | Cost of usage for SMS event type per month
SMS_COST_3_5 | input | interval | Cost of usage for SMS event type per month
SMS_COST_4_5 | input | interval | Cost of usage for SMS event type per month
SMS_COST_5_5 | input | interval | Cost of usage for SMS event type per month
SMS_COST_6_5 | input | interval | Cost of usage for SMS event type per month
SMS_COST_7_5 | input | interval | Cost of usage for SMS event type per month
SURF_COST_2_5 | input | interval | Cost of usage for SURF event type per month
SURF_COST_3_5 | input | interval | Cost of usage for SURF event type per month
SURF_COST_4_5 | input | interval | Cost of usage for SURF event type per month
SURF_COST_5_5 | input | interval | Cost of usage for SURF event type per month
SURF_COST_6_5 | input | interval | Cost of usage for SURF event type per month
SURF_COST_7_5 | input | interval | Cost of usage for SURF event type per month

VOICE_COST_2_5

input

interval