Survey data and Bayesian analysis: a cost-efficient way to estimate customer equity

We present a Bayesian framework for estimating the customer lifetime value (CLV) and the customer equity (CE) based on the purchasing behavior deducible from market surveys on customer purchasing behavior. The proposed framework systematically addresses the challenges faced when the future value of customers is estimated from survey data. The scarcity of the survey data and the sampling variance are countered by utilizing prior information and by quantifying the uncertainty of the CE and CLV estimates with posterior distributions. Furthermore, information on the purchase behavior of competitors' customers available in the survey data is integrated into the framework. The introduced approach is directly applicable in domains where a customer relationship can be thought of as monogamous. As an example of the use of the framework, we analyze a consumer survey on mobile phones carried out in Finland in February 2013. The survey data contain consumer-provided information on the current and previous brand of the phone and the times of the last two purchases.


Introduction
The monetary value of a future relationship with a customer is a fundamental concept for the rational long-term management of the customer base and the planning of marketing activities. In recent years, considerable effort has been invested in developing models for estimating the future value of a customer relationship, or customer lifetime value (CLV) (Bejou et al., 2006; Gupta and Lehmann, 2005; Blattberg et al., 2008). Thanks to this development, companies are nowadays able to utilize forward-looking monetary estimates when assessing the value of the company (Bauer et al., 2003; Pfeifer, 2011), planning marketing actions (Kumar et al., 2008) or optimizing the customer base and allocating resources (Venkatesan and Kumar, 2004; Kumar and Petersen, 2005; Rust et al., 2001). If applied properly, CLV and its direct derivative, the future value of the total customer base, or customer equity (CE) (Kumar and George, 2007), can reveal the customers that are most valuable to a company in the long run and direct actions to produce an optimal return on marketing investments.
For companies with access to customer-level purchase histories, CLV can be estimated for each customer individually, based on the data on the customer's past purchases and stochastic modeling of the general purchasing behavior of the population (Schmittlein et al., 1987; Fader et al., 2005b,a). For many companies, however, purchase histories of individual customers are not available, and CLV models based on such data are not applicable. This can be due to many reasons. Many companies producing consumer goods or services use retailers to sell their products and thus lack direct interaction with their end customers. In some cases, the time interval between repurchases is long, and gathering meaningful individual-level purchase histories takes a lot of time. And even where individual-level customer data would in principle be available, an immature data-gathering process, strict privacy policies or the prohibitive cost of collecting purchase data might prevent the use of the data for involved statistical modeling.
As all companies need to plan their marketing actions and portfolio to maximize return on investment (ROI), companies without access to individual purchase histories would also benefit from understanding the average CLV of different customer segments. For example, it is very common that a company manufacturing consumer goods is much bigger than the individual retailers that distribute the goods to consumers. Hence the manufacturing company needs to make substantial marketing investments to ensure demand for its products, even if the customer relationship, and the data on individual purchase histories, are owned by the retailers. Ensuring that these marketing investments are optimally distributed is a major task for many large manufacturing companies. The goal of this paper is to present a practical method for estimating CLV and CE when company-level data on individual purchase histories are not available. Below we introduce a modeling framework to estimate CLV and CE from survey data. Using data from a market survey solves the problem of data availability and may be a significantly cheaper option than implementing full-scale collection of individual purchase histories.
Survey data have both advantages and disadvantages compared to customer registries. The data from a customer registry may suffer from adverse selection due to cohort heterogeneity (Fader and Hardie, 2010) and thus give a biased view of retention. Survey data are drawn as a random sample from the whole population and therefore provide unbiased estimates of retention and churn, even in a non-contractual business setting. In addition, survey respondents can also be asked about their purchase intentions, which are not visible in purchase histories. In an anonymous survey, respondents can also be asked for rich background information that can be utilized to segment the customer base, whereas collecting the same information for the whole customer base could be considered intrusive.
On the other hand, collecting information on customer purchasing behavior via surveys has limitations that need to be addressed when CLV is estimated from survey data. Compared to full purchase histories, survey data are scarce. Survey respondents can realistically be assumed to remember only their few most recent transactions. Information on the timing of the transactions is also imprecise, which makes the purchase data interval censored. And even when the survey respondents form a representative sample of the whole customer base, the fairly limited number of observations prohibits the use of complex statistical models, as they would easily overfit the limited training data and produce biased CLV forecasts. The limited number of sample observations also introduces a sampling variance into the CLV and CE estimates that is absent from data describing the full purchase histories of individual customers.
Our framework simultaneously addresses the listed challenges and makes full use of the information available in the survey data, including customer behavior with competitors. The framework uses survey data to model the brand switching behavior of customers between the focal brand and its competitors. The scarcity of the survey data is compensated for by using prior information to guide the estimation process. Bayesian analysis (Gelman et al., 2013; Rossi et al., 2005) provides a natural way to handle the uncertainty of both the prior information and the data, and Bayesian models can directly deal with interval censored purchase data. The estimated Bayesian model not only gives point estimates of the CLV and CE of different customer segments but also quantifies the uncertainty of the estimates. This greatly increases the potential use of the estimates in real-world decision making, where the risk, or the expected opportunity loss, from a wrong marketing decision needs to be understood (Hubbard, 2010).
Despite its flexibility and usefulness, the Bayesian approach has not been frequently applied to CLV and CE estimation. Borle et al. (2008) and Abe (2009) proposed hierarchical Bayes extensions to the Pareto/NBD model (Schmittlein et al., 1987). Nagano et al. (2013) extended the model further by allowing more flexibility for the individual-level regression parameters. Jen et al. (2009) used hierarchical Bayes models to model the temporal dependence of purchase quantity and timing. Singh et al. (2009) used a Markov chain Monte Carlo (MCMC) based data augmentation framework for estimating CLV in a noncontractual context. Our framework differs from these works by focusing on survey data and on the uncertainty of the recorded purchase times. The cost-efficiency perspective has not previously been used to motivate the use of survey data and Bayesian analysis.
The general Bayesian framework for the estimation of CLV and CE from survey data is presented in Section 2. In Section 3, we demonstrate the proposed framework for estimating CE from survey data by analyzing a survey with 536 respondents carried out in Finland in February 2013. The respondents provided information on the brand of their current and previous mobile phone and the times of the last two purchases. For each individual, we model the intensity of the purchase process and the personal probability of repurchase as latent variables that depend on the covariates measured in the survey. Conclusions are given in Section 4.

Bayesian estimation of customer equity from survey data
The business objective is to estimate the CLV and CE of the major companies for mobile phones or other electronic devices in a geographically specified market. Assume that data on purchase behavior are obtained for a small random sample of the population. The collected data contain answers to the following questions:
1. What is the brand of your current device?
2. When did you buy your current device?
3. What was the brand of your previous device?
4. When did you buy your previous device?
5. Which brand would be the most interesting for you if you were to buy a new device now?
In addition, there may be various questions on the background of the customer, such as age and gender, which can be used to define the customer segments.
The structure of the data is illustrated in Figure 1. The purchase interval T_i is defined as the difference between the dates from questions 2 and 4. However, survey respondents usually do not remember these dates exactly, which leads to interval censored dates and, consequently, to interval censored purchase intervals. It follows that the models and methods used in the analysis should be capable of handling interval censored data. Bayesian analysis fulfills this condition in a natural way, whereas many frequentist methods would require special techniques to deal with interval censoring.
It is also possible to utilize the data on the intended next brand (question 5). The 'Intended repurchase' model assumes that the next brand will be the intended brand. The time of the next purchase is unknown, but the difference between the survey date and the purchase date of the current device (question 2) can be used as a lower limit for the next purchase interval. The resulting right censored purchase intervals are referred to as backward recurrence times and are potentially problematic in maximum likelihood estimation (Allison, 1985). Here the problems are avoided because the right censored times are used together with the interval censored times from questions 2 and 4: the interval censored times provide information on the right tail of the purchase interval distribution, which cannot be learned from right censored times alone. Comparing the 'Intended repurchase' model with the 'Historical repurchase' model, in which customer intentions are not utilized, may reveal the direction of future changes in brand popularity.
Figure 1: Illustration of the interval censoring of the purchase intervals. In the 'Historical repurchase' model, the data on the current brand and the previous brand are used; in the 'Intended repurchase' model, the data on the intended next brand are also used.
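As a minimal sketch of how the censored observations enter a likelihood, the code below assumes a Gamma(κ, λ) purchase-interval distribution with λ treated as a rate, so that the mean interval is κ/λ (this parameterization is an assumption). An interval-censored interval contributes log[F(t_max) − F(t_min)] and a right-censored backward recurrence time contributes log[1 − F(t_min)], where F is the Gamma CDF. The function names and numeric values are hypothetical; the CDF uses a power series for the regularized lower incomplete gamma function so that only the Python standard library is needed.

```python
import math
from typing import Optional


def gamma_cdf(x: float, shape: float, rate: float) -> float:
    """CDF of Gamma(shape, rate) at x via the power series for the regularized
    lower incomplete gamma function (adequate for moderate rate * x)."""
    if x <= 0.0:
        return 0.0
    z = rate * x
    term, total, n = 1.0, 1.0, 0
    # P(a, z) = z^a e^(-z) / Gamma(a + 1) * sum_{n >= 0} z^n / ((a + 1) ... (a + n))
    while term > 1e-16 * total:
        n += 1
        term *= z / (shape + n)
        total += term
    log_prefactor = shape * math.log(z) - z - math.lgamma(shape + 1.0)
    return min(1.0, math.exp(log_prefactor) * total)


def censored_loglik(t_min: float, t_max: Optional[float], shape: float, rate: float) -> float:
    """Log-likelihood of a purchase interval known only to satisfy t_min <= T <= t_max.

    t_max=None encodes a right-censored backward recurrence time (T > t_min).
    """
    upper = 1.0 if t_max is None else gamma_cdf(t_max, shape, rate)
    prob = upper - gamma_cdf(t_min, shape, rate)
    return math.log(prob) if prob > 0.0 else -math.inf


# Hypothetical values: shape 1.5, mean interval 36 months (rate = 1.5 / 36 per month).
print(censored_loglik(26.0, 30.0, 1.5, 1.5 / 36.0))  # interval censored
print(censored_loglik(14.0, None, 1.5, 1.5 / 36.0))  # right censored
```

In a full Bayesian fit these terms would be summed over respondents and combined with the priors; MCMC software can equivalently treat the unobserved intervals as latent variables constrained to their censoring limits.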
The general procedure for Bayesian estimation of CE with survey data has the following steps:
1. Specify a model for the customer behavior and choose the prior distributions for the model parameters.
2. Design the survey so that the model parameters can be efficiently estimated, and collect the survey data.
3. Use Markov chain Monte Carlo (MCMC) or other simulation techniques to generate observations from the joint posterior distribution of the parameters.
4. For each set of parameters generated from the joint posterior distribution and for each individual in the sample, generate several purchase histories for the forthcoming years. From these purchase histories, calculate the individual CLVs and the CE as their sum. These values are observations from the posterior distribution of the CE of the survey sample.
5. Using knowledge of the size of the market, scale the CE distribution of the sample to represent the whole customer population.
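Steps 4 and 5 can be sketched as follows, assuming the MCMC output is a list of posterior parameter draws and that a model-specific function returns the CLV of one simulated purchase history for one individual. Each (draw, replicate) pair yields one CE value for the sample, which is then scaled by population size divided by sample size. The toy CLV function, its `p_buy` and `price` fields and the population figure are hypothetical placeholders, not the model of Section 3.

```python
import random
import statistics
from typing import Callable, Dict, List, Sequence

Params = Dict[str, float]
Person = Dict[str, float]


def ce_posterior(draws: Sequence[Params], sample: Sequence[Person],
                 simulate_clv: Callable[[Params, Person, random.Random], float],
                 population_size: int, histories_per_draw: int = 5,
                 seed: int = 1) -> List[float]:
    """Steps 4-5: one population-level CE value per (posterior draw, history replicate)."""
    rng = random.Random(seed)
    scale = population_size / len(sample)  # step 5: sample CE -> population CE
    ce_values = []
    for params in draws:
        for _ in range(histories_per_draw):
            # Step 4: one simulated history per individual; the sample CE is their sum.
            sample_ce = sum(simulate_clv(params, person, rng) for person in sample)
            ce_values.append(scale * sample_ce)
    return ce_values


def toy_clv(params: Params, person: Person, rng: random.Random) -> float:
    """Hypothetical stand-in model: one purchase at `price` with probability `p_buy`."""
    return person["price"] if rng.random() < params["p_buy"] else 0.0


draws = [{"p_buy": 0.4}, {"p_buy": 0.5}, {"p_buy": 0.6}]  # stand-in posterior draws
sample = [{"price": 100.0} for _ in range(50)]
ce = ce_posterior(draws, sample, toy_clv, population_size=10_000)
q = statistics.quantiles(ce, n=40, method="inclusive")  # q[0] = 2.5%, q[-1] = 97.5%
print(statistics.mean(ce), q[0], q[-1])
```

Scaling by a single factor treats the respondents as equally weighted; with stratified or post-stratified samples, per-respondent weights would replace it.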
The procedure can be applied with a wide variety of models. On the one hand, the model needs to capture the purchase behavior of customers with reasonable accuracy. On the other hand, the model needs to be identifiable from the (limited) survey data and general enough to avoid overfitting.
In Section 3 we demonstrate how the general approach can be applied to the real-world problem of estimating the CLV and CE of the major mobile handset makers in the Finnish market. The example shows how the model for customer behavior can be tailored to the business domain and the available survey data. In the Appendix, we empirically validate the proposed approach with a general semi-Markov brand switching model and simulated data. The example in the Appendix shows that the total CE can be estimated with sufficient accuracy from survey samples of realistic size.
3 Customer equity for mobile phone brands

Survey data
To demonstrate the practical applicability of the proposed framework, we apply it to estimate the CLV of different mobile phone brands in Finland from real survey data. The mobile phone data were collected in February 2013 together with the National Consumer Net Shopping Study conducted by the market research company Tietoykkönen Oy. The target group was 15-79-year-old mobile phone owners in Finland. The data were collected by telephone interviews using a computer-assisted telephone interviewing (CATI) system. The sample source was the targeting service Fonecta Finder B2C, which contains all publicly available phone numbers in Finland. Random sampling was carried out with quotas on the respondents' gender, age and region at the major-region level, excluding the Åland autonomous region. The sample size was 536 completed interviews. Table 1 shows the structure of the data. Compared to the Finnish official statistics (Statistics Finland, 2012), the proportions of women and of the youngest age group (15-24 years) in the data are slightly too small, and those of men and of the oldest age group (65-79 years) too large. The Finnish market is well saturated with mobile phones: practically all households have at least one mobile phone at their disposal. In total, there were 9.3 million active mobile subscriptions at the end of 2012 (Finnish Communications Regulatory Authority, 2013), while the population of Finland was 5.4 million.
Mobile operators sell the majority of mobile phones. While some operators offer subsidies if the device is purchased together with an operator contract, the subsidies are not very aggressive. Furthermore, devices are not locked to a contract or to an operator, and all major operators offer all major device brands. Hence, the Finnish mobile device market is not as operator-controlled as, for instance, the US market, and consumers are not heavily steered by operators toward a particular device brand or repurchase rate.
All 536 survey respondents had a mobile phone. The respondents answered the following questions (originally in Finnish):
1. What is the brand of your mobile phone?
2. When did you purchase your mobile phone? (year and month; if the month was not recalled, the season was asked)
3. What was the brand of your previous mobile phone?
4. When did you purchase your previous mobile phone? (year and month; if the month was not recalled, the season was asked)
5. Which brand would be the most interesting for you if you were to buy a mobile phone now?
6. Is your mobile phone a smart phone, a feature phone with an internet connection or a phone without an internet connection?
In addition, the respondents were asked for their gender, age group (…).
Table 2 presents the distribution of the brand by gender and age group. There is a clear association between the age group and the current brand: 73% of the respondents have a Nokia as their current phone, but Nokia's share of the installed base varies from 93% in the age group 65-79 years to 46% in the age group 15-24 years. In contrast, Apple and Samsung are relatively strong in the younger age groups. This suggests that the CLV analysis should be stratified by age group.
The purchase times are interval censored: the respondents were asked only for the purchase month, not for the day, and many respondents could not recall the time of the purchase. Out of the 536 respondents, 310 were able to tell the purchase month and year of their current phone, an additional 115 were able to tell the season and year, 74 mentioned only the year and 37 were not able to tell even the year. Nineteen respondents said that they did not have a mobile phone earlier or could not remember its brand. Out of the 517 respondents who mentioned the previous brand, 117 were able to tell the purchase month and year, an additional 91 were able to tell the season and year, 146 mentioned only the year and 163 were not able to tell even the year. Figure 1 illustrates the calculation of the minimum and maximum purchase intervals. A maximum purchase interval of 200 months is assumed when the purchase year is missing.
The survey data allow the modeling of purchase intervals and brand choices but do not provide information on the sell-in prices that are also needed to estimate CE. Companies may obtain detailed pricing data from their internal sources, but in this example we rely on publicly available external information. Information on the sell-in prices is collected from the quarterly reports of Nokia, Apple and Samsung.
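A sketch of the minimum and maximum purchase intervals computed from recalled dates. It assumes month indices year × 12 + month − 1 and treats each month as a continuous one-month span (the day is unknown). Seasons are mapped to three-month blocks with winter running from December of the previous year to February, which is an assumption. When either purchase year is missing, the range runs from 0 to the 200-month maximum stated above; the lower bound of 0 in that case is also an assumption.

```python
from typing import Optional, Tuple

MonthRange = Tuple[int, int]

# Hypothetical season-to-month mapping (1-based months); winter spans December
# of the previous year to February.
SEASON_MONTHS = {"winter": (0, 2), "spring": (3, 5), "summer": (6, 8), "autumn": (9, 11)}
MAX_INTERVAL = 200.0  # months, assumed when a purchase year is missing


def month_range(year: Optional[int], month: Optional[int] = None,
                season: Optional[str] = None) -> Optional[MonthRange]:
    """Earliest and latest possible purchase month as integer month indices."""
    if year is None:
        return None
    base = year * 12
    if month is not None:  # exact month recalled (1..12)
        return base + month - 1, base + month - 1
    if season is not None:  # only the season recalled
        first, last = SEASON_MONTHS[season]
        return base + first - 1, base + last - 1
    return base, base + 11  # only the year recalled


def interval_bounds(prev: Optional[MonthRange], curr: Optional[MonthRange]) -> Tuple[float, float]:
    """Minimum and maximum purchase interval in months between two recalled dates."""
    if prev is None or curr is None:
        return 0.0, MAX_INTERVAL
    # Month index k covers the continuous time span [k, k + 1).
    t_min = max(0.0, float(curr[0] - (prev[1] + 1)))
    t_max = float(curr[1] + 1 - prev[0])
    return t_min, t_max


print(interval_bounds(month_range(2010, season="summer"), month_range(2012, month=11)))
# (26.0, 30.0)
```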
For the 4th quarter of 2012, Nokia reported an average selling price (ASP) of 186 euros for smart phones and 31 euros for (other) mobile phones. For the same period, the ASP was 641 US dollars (473 euros) for Apple and 178 euros for Samsung. As the actual ASPs for Finland have not been published, these global numbers are used in the CLV estimation. The Nokia ASP is calculated to be 68 euros using the ratio of smart phones to other mobile phones among the Nokia owners in the data.
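The blended Nokia figure can be read as a share-weighted average of the two reported ASPs. Assuming a simple weighted average (the text does not spell out the calculation), 68 euros corresponds to a smart phone share of (68 − 31)/(186 − 31) ≈ 0.24 among the Nokia owners in the data, which the sketch below back-solves.

```python
def blended_asp(smart_share: float, asp_smart: float = 186.0, asp_other: float = 31.0) -> float:
    """Share-weighted average selling price in euros."""
    return smart_share * asp_smart + (1.0 - smart_share) * asp_other


def implied_smart_share(target_asp: float, asp_smart: float = 186.0,
                        asp_other: float = 31.0) -> float:
    """Smart phone share that reproduces a given blended ASP."""
    return (target_asp - asp_other) / (asp_smart - asp_other)


share = implied_smart_share(68.0)
print(round(share, 3))               # 0.239
print(round(blended_asp(share), 1))  # 68.0
```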

Bayesian modeling
The structure of the Bayesian model and the survey data is presented in Figure 2. The observed covariates in the model are the age group, the income group, gender, the region and the current brand. The current brand is a dynamic covariate that changes at each transaction, and the other covariates are assumed to be static. The latent variables in the model are the purchase rate λ_i, the repurchase probability p_i and the acquisition probabilities for each brand. The purchase rate and the repurchase probability depend on all the covariates, but the acquisition probabilities depend only on the age group because the small size of the data does not allow the estimation of acquisition probabilities for all covariate combinations. The purchase interval T_i is not observed exactly but is interval censored with observed lower limit t_i^min and upper limit t_i^max. The repurchase indicator R_i ∈ {0, 1} and the purchased brand are observed. The common covariates (region, income group, gender and age group) control the dependence structure between the purchase intervals and the repurchase probabilities.
The model consists of three parts: the submodel for the purchase intervals, the submodel for repurchase and the submodel for the choice of the new brand in the case of churn. In the submodel for the purchase intervals, the intervals are assumed to follow the Gamma distribution with shape parameter κ common to all intervals and scale parameter λ_i defined separately for each interval. Assuming the Gamma distribution increases the flexibility of the model compared to the exponential distribution. An informative Gamma prior is assumed for κ because the value of the shape parameter is not expected to be very far from κ = 1, which corresponds to the exponential distribution. The scale parameter λ_i follows a log-linear model in which the regression coefficients for the covariates have uninformative priors. Only main effects are included because the small size of the data does not allow the interactions of the covariates to be estimated reliably. The regression coefficients are defined so that the reference is a young, low-income man who lives in Helsinki-Uusimaa and owns a Nokia phone. The parameter β_0 defines the value of the scale parameter for the reference.
Figure 2: Graphical model for the data generating mechanism of the mobile phone survey. The observed variables are presented as rectangles and the latent variables as ellipses. The arrows describe the relationships between the variables. The dashed arrow from new brand to current brand indicates that the new brand will be the current brand when the next transaction is considered.
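A sketch of the purchase-interval submodel for a single respondent. It assumes that λ_i acts as a rate (mean interval κ/λ_i), which matches reading larger β_Apple and β_Samsung as higher purchase rates; the text calls λ_i a scale parameter, so the parameterization is an assumption. Only main effects enter the log-linear predictor, reference categories contribute zero, and all coefficient values are made up for illustration.

```python
import math
import random
from typing import Dict

# Hypothetical coefficients on the log-rate scale (rate per month). Reference categories
# (age 15-24, lowest income group, male, Helsinki-Uusimaa, Nokia) are absent, i.e. zero.
BETA_0 = math.log(1.0 / 30.0)
BETA = {"brand=Apple": 0.3, "brand=Samsung": 0.2, "age=65-79": -0.1}


def purchase_rate(covariates: Dict[str, str]) -> float:
    """Main-effects log-linear model: lambda_i = exp(beta_0 + sum of category effects)."""
    eta = BETA_0 + sum(BETA.get(f"{name}={level}", 0.0) for name, level in covariates.items())
    return math.exp(eta)


def draw_interval(kappa: float, rate: float, rng: random.Random) -> float:
    """T ~ Gamma(shape=kappa, rate); random.gammavariate expects a scale, so pass 1/rate."""
    return rng.gammavariate(kappa, 1.0 / rate)


rng = random.Random(0)
owner = {"brand": "Apple", "age": "25-34", "gender": "female"}
lam = purchase_rate(owner)
draws = [draw_interval(1.5, lam, rng) for _ in range(20_000)]
print(1.5 / lam, sum(draws) / len(draws))  # analytic vs simulated mean interval, months
```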
In the submodel for repurchase, the repurchase indicator R_i follows a logistic regression model in which the regression coefficients for the covariates have uninformative priors. The reference is defined as above.
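The repurchase submodel can be sketched in the same way: a logistic link maps the main-effects linear predictor to p_i, and R_i is a Bernoulli draw. The coefficient values below are hypothetical illustrations, not estimates.

```python
import math
import random
from typing import Dict

# Hypothetical logit-scale coefficients; reference categories contribute zero.
ALPHA_0 = 1.0
ALPHA = {"brand=Samsung": -0.8, "brand=Other": -1.0, "age=65-79": 0.7}


def repurchase_probability(covariates: Dict[str, str]) -> float:
    """Logistic model: p_i = 1 / (1 + exp(-(alpha_0 + sum of category effects)))."""
    eta = ALPHA_0 + sum(ALPHA.get(f"{name}={level}", 0.0) for name, level in covariates.items())
    return 1.0 / (1.0 + math.exp(-eta))


def draw_repurchase(covariates: Dict[str, str], rng: random.Random) -> int:
    """R_i ~ Bernoulli(p_i): 1 if the next device is of the current brand."""
    return int(rng.random() < repurchase_probability(covariates))


print(repurchase_probability({"brand": "Nokia", "age": "65-79"}))
print(repurchase_probability({"brand": "Samsung", "age": "15-24"}))
```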
Finally, in the submodel for the choice of the new brand in the case of churn, the brand weights are given the prior w_jh ∼ Beta(2, 2) for all j, h.
The weights w_jh, where j refers to the brand and h to the age group, are auxiliary variables that describe the relative popularity of each brand. The probability that a churning customer selects brand j equals the weight of the brand divided by the sum of the weights of all brands except the current one, i.e. w_jh / Σ_{k ≠ c} w_kh, where c is the current brand. Using the Beta(2, 2) prior for the weights instead of the uniform prior stabilizes the scale of the weights and ensures convergence in the MCMC estimation. With these definitions, the choice of the new brand follows a multinomial distribution whose selection probabilities depend on the age group.
The data can be utilized in the estimation in two alternative ways: the 'Historical repurchase' model uses only the actual historical purchase events, whereas the 'Intended repurchase' model assumes that the next brand will be the brand the individual is most interested in according to the survey (question 5 in the list of questions above). These two alternatives are also illustrated in Figure 1. The purchase intervals are defined and analyzed as described in Section 2. The same distributional assumptions are used for both models.
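The brand-choice rule can be written directly from the weights: for a churning customer in age group h whose current brand is c, P(new brand = j) = w_jh / Σ_{k≠c} w_kh. The sketch below computes these probabilities and draws a new brand; the weight values are hypothetical (in the model each w_jh has a Beta(2, 2) prior).

```python
import random
from typing import Dict


def switch_probabilities(weights: Dict[str, float], current: str) -> Dict[str, float]:
    """P(new brand = j | churn) = w_j / sum_{k != current} w_k within one age group."""
    others = {brand: w for brand, w in weights.items() if brand != current}
    total = sum(others.values())
    return {brand: w / total for brand, w in others.items()}


def draw_new_brand(weights: Dict[str, float], current: str, rng: random.Random) -> str:
    """Draw the new brand of a churning customer from the categorical distribution."""
    probs = switch_probabilities(weights, current)
    brands = list(probs)
    return rng.choices(brands, weights=[probs[b] for b in brands], k=1)[0]


# Hypothetical weights w_jh for one age group h; each would carry a Beta(2, 2) prior.
w_h = {"Nokia": 0.6, "Apple": 0.5, "Samsung": 0.4, "Other": 0.1}
print(switch_probabilities(w_h, "Nokia"))
print(draw_new_brand(w_h, "Apple", random.Random(0)))
```

Excluding the current brand from the denominator is what makes this a churn model: the probability of keeping the brand is handled entirely by the repurchase submodel.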

The estimation is carried out using OpenBUGS 3.2.2 (Lunn et al., 2009), R (R Core Team, 2012) and R2OpenBUGS R package (Sturtz et al., 2005). The BUGS code is provided in Appendix 2. The convergence of the MCMC chains is monitored separately for each parameter using the interval criterion proposed by Brooks and Gelman (1998).

Estimated model
The estimated parameters are presented in Table 3 for the models 'Historical repurchase' and 'Intended repurchase'. In theory, the parameters of the submodel for the purchase intervals are the same in both models, and the small differences are due only to MCMC sampling variability. The differences in the parameter estimates in the submodels for repurchase and brand choice reflect real differences between the two models. The shape parameter of the purchase interval distribution is estimated to be κ = 1.5, which indicates that the hazard of a transaction increases as the time since the last purchase increases; for the exponential distribution (κ = 1), the hazard is constant. The estimated average purchase interval is 2.9 years, based on simulations from Gamma(κ, λ_i). For Apple and Samsung, the purchase rates are higher than for Nokia (parameters β_Apple and β_Samsung). There are no clear differences in the purchase rates between age groups, genders, income groups or geographical areas (parameters from β_25-34 to β_Eastern & Northern).
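To see why κ > 1 implies an increasing hazard, the sketch below evaluates the Gamma hazard h(t) = f(t) / [1 − F(t)] for κ = 1 and κ = 1.5, with the rate chosen so that both have a mean interval of about 35 months (roughly the 2.9-year average). It assumes the shape-rate parameterization and uses a power series for the CDF; the κ = 1 hazard stays at the constant rate, whereas the κ = 1.5 hazard rises toward its limiting value λ.

```python
import math


def gamma_pdf(t: float, shape: float, rate: float) -> float:
    """Density of Gamma(shape, rate) at t > 0."""
    return math.exp(shape * math.log(rate) + (shape - 1.0) * math.log(t)
                    - rate * t - math.lgamma(shape))


def gamma_cdf(t: float, shape: float, rate: float) -> float:
    """Regularized lower incomplete gamma P(shape, rate * t) by its power series, t > 0."""
    z = rate * t
    term, total, n = 1.0, 1.0, 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (shape + n)
        total += term
    return math.exp(shape * math.log(z) - z - math.lgamma(shape + 1.0)) * total


def hazard(t: float, shape: float, rate: float) -> float:
    """Instantaneous purchase intensity given no purchase during the first t months."""
    return gamma_pdf(t, shape, rate) / (1.0 - gamma_cdf(t, shape, rate))


MEAN_MONTHS = 35.0  # roughly the reported 2.9-year average interval
for kappa in (1.0, 1.5):
    rate = kappa / MEAN_MONTHS  # same mean interval kappa / rate for both shapes
    print(kappa, [round(hazard(t, kappa, rate), 4) for t in (6.0, 12.0, 24.0, 48.0)])
```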
When repurchase probabilities are considered, the owners of Apple or Nokia seem to be more loyal than the owners of Samsung or other brands (parameters α_Apple, α_Samsung and α_Other brand). The older age groups are more loyal to their current brand than the younger age groups. Gender, income group and geographical region do not have a major effect on the repurchase probabilities. When the models 'Historical repurchase' and 'Intended repurchase' are compared, the repurchase probabilities are higher in the latter model for all brands, but especially for Apple and Samsung.

Model diagnostics
The model fit was studied by comparing the purchase intervals and repurchase probabilities between the data and the model. The real purchase intervals are not observed directly but are interval censored, which complicates checking the model fit. For each individual in the data, 9000 values of the parameters κ and λ_i are drawn from their posterior distribution obtained from the MCMC estimation. For each realization of κ and λ_i, purchase intervals are generated, which yields 9000 posterior realizations of the cumulative distribution function (CDF) of the purchase interval. One hundred randomly selected realizations of the CDF are plotted in Figure 3 together with the CDFs of the minimum and the maximum purchase intervals in the data. The CDF of the minimum (maximum) purchase intervals is obtained as the empirical CDF of the lower (upper) limits of the purchase intervals. Because every true interval lies between its limits, the CDF of the true intervals must lie between these two curves. In general, the posterior CDFs lie between the CDFs of the minimum and the maximum purchase intervals, which indicates that the model fits the data.
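A numeric variant of this visual check, assuming posterior draws of κ and of each λ_i are available: for every draw, simulate one interval per respondent, evaluate its empirical CDF on a grid, and record the share of grid points where it lies between the CDFs of the upper and lower limits. The toy data with year-level recall, the rate parameterization of λ_i and the grid are hypothetical.

```python
import random
from bisect import bisect_right
from typing import List, Sequence, Tuple


def ecdf(values: Sequence[float], grid: Sequence[float]) -> List[float]:
    """Empirical CDF of `values` evaluated at each grid point."""
    s = sorted(values)
    return [bisect_right(s, g) / len(s) for g in grid]


def share_within_bounds(t_min: Sequence[float], t_max: Sequence[float],
                        draws: Sequence[Tuple[float, Sequence[float]]],
                        grid: Sequence[float], rng: random.Random) -> List[float]:
    """Per posterior draw, share of grid points where F_max(t) <= F_sim(t) <= F_min(t)."""
    f_lo, f_hi = ecdf(t_max, grid), ecdf(t_min, grid)  # band that must contain the true CDF
    shares = []
    for kappa, rates in draws:  # rates holds one lambda_i per respondent
        sim = [rng.gammavariate(kappa, 1.0 / lam) for lam in rates]
        f_sim = ecdf(sim, grid)
        inside = sum(lo <= f <= hi for lo, f, hi in zip(f_lo, f_sim, f_hi))
        shares.append(inside / len(grid))
    return shares


rng = random.Random(42)
true_t = [rng.gammavariate(1.5, 24.0) for _ in range(1000)]  # hypothetical true intervals
t_min = [12.0 * (t // 12.0) for t in true_t]                  # year-level recall
t_max = [lo + 12.0 for lo in t_min]
draws = [(1.5, [1.0 / 24.0] * len(true_t)) for _ in range(20)]
shares = share_within_bounds(t_min, t_max, draws, [6.0 + 12.0 * k for k in range(6)], rng)
print(min(shares), sum(shares) / len(shares))
```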
The fit of the estimated repurchase and acquisition probabilities is checked by regenerating the distribution of the current brand from the previous brand and the estimated posterior parameters, and comparing it to the observed distribution of the current brand in the data. The comparison in Table 4 shows that the frequencies in the data are always inside the 95% credible intervals calculated from the simulations. We conclude that the model fits the data.
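The brand check can be sketched as a posterior predictive simulation: for each posterior draw, regenerate each respondent's current brand from the previous brand, then form an equal-tailed 95% interval for each brand's count and compare it with the observed count. The repurchase probabilities, the equal switching weights, the reuse of a single parameter set in place of genuine posterior draws and the observed count are all hypothetical.

```python
import random
from collections import Counter
from typing import Dict, Sequence, Tuple


def simulate_current_brands(prev_brands: Sequence[str], p_repurchase: Dict[str, float],
                            switch_probs: Dict[str, Dict[str, float]],
                            rng: random.Random) -> Counter:
    """One replicate: keep the previous brand with probability p_repurchase[brand],
    otherwise switch according to switch_probs[brand]."""
    counts: Counter = Counter()
    for prev in prev_brands:
        if rng.random() < p_repurchase[prev]:
            counts[prev] += 1
        else:
            options = switch_probs[prev]
            counts[rng.choices(list(options), weights=list(options.values()), k=1)[0]] += 1
    return counts


def credible_interval(values: Sequence[float], level: float = 0.95) -> Tuple[float, float]:
    """Equal-tailed percentile interval of simulated values."""
    s = sorted(values)
    lo = s[round((1.0 - level) / 2.0 * (len(s) - 1))]
    hi = s[round((1.0 + level) / 2.0 * (len(s) - 1))]
    return lo, hi


rng = random.Random(7)
prev = ["Nokia"] * 400 + ["Apple"] * 30 + ["Samsung"] * 40 + ["Other"] * 47
p_rep = {"Nokia": 0.8, "Apple": 0.85, "Samsung": 0.6, "Other": 0.4}
switch = {b: {o: 1.0 for o in p_rep if o != b} for b in p_rep}  # equal weights
draws = [(p_rep, switch)] * 1000  # stand-in for 1000 posterior draws
sims = [simulate_current_brands(prev, p, s, rng) for p, s in draws]
lo, hi = credible_interval([c["Nokia"] for c in sims])
observed_nokia = 340  # hypothetical observed count
print(lo, hi, lo <= observed_nokia <= hi)
```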

Estimation of CLV and CE
Using the estimated posterior distributions of the parameters, 2000 purchase histories were generated for each individual in the sample. Due to the volatile nature of the mobile phone market, we restricted the simulation to the next five years, from March 2013 to February 2018. It can be argued that the uncertainty of future revenues is so high that decisions on marketing actions should not be based on transactions beyond the next five years. Each simulated purchase history contains all purchases, with purchase date and purchased brand, during the next five years. As the data do not offer information on customer-level costs (calls to customer care, etc.), we exclude these costs and concentrate on revenues. With the ASPs given in Section 3.1 and an annual discount rate of 10%, the net present value of the purchases can be calculated. As a result, we obtain 2000 simulated five-year CLVs for each individual. All calculations were repeated using two alternative models and data: a) only historical purchases or b) both historical and intended purchases. The average five-year CLV by brand and customer status is presented in Table 5. As expected, the CLV was higher for current customers than for non-customers. The average CLV of a current Nokia owner for Nokia is 70 euros, which roughly means that the customer is going to buy one Nokia phone during the next five years. The average value of a current Apple owner for Apple is 840 euros or 1183 euros, depending on the model. The latter number is equivalent to purchasing an Apple phone every second year during the next five years. The result reflects the high purchase rate, the high repurchase probability and the high ASP of Apple. The average value of a current Samsung owner for Samsung is approximately equivalent to buying one Samsung phone during the next five years.
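One simulated five-year purchase history and its discounted value to a focal brand can be sketched as below. It assumes Gamma intervals with a brand-dependent rate, brand-specific repurchase probabilities, a categorical switching rule, the ASPs from Section 3.1 and a discount factor 1.10^(−t/12) for a purchase t months after the survey. As a simplification that is not necessarily the authors' treatment, the first interval is counted from the survey date without conditioning on the time already elapsed since the current purchase; the repurchase probabilities and the rate function are hypothetical.

```python
import random
from typing import Callable, Dict

ASP_EUR = {"Nokia": 68.0, "Apple": 473.0, "Samsung": 178.0}  # Section 3.1 figures


def simulate_clv(current: str, focal: str, kappa: float, rate: Callable[[str], float],
                 p_repurchase: Dict[str, float], switch_probs: Dict[str, Dict[str, float]],
                 rng: random.Random, horizon_months: float = 60.0,
                 annual_discount: float = 0.10) -> float:
    """Discounted five-year revenue to brand `focal` from one simulated purchase history."""
    t, brand, clv = 0.0, current, 0.0
    while True:
        # Simplification: intervals are counted from the survey date, ignoring the
        # time already elapsed since the current purchase.
        t += rng.gammavariate(kappa, 1.0 / rate(brand))
        if t > horizon_months:
            return clv
        if rng.random() >= p_repurchase[brand]:  # churn: pick another brand
            options = switch_probs[brand]
            brand = rng.choices(list(options), weights=list(options.values()), k=1)[0]
        if brand == focal:
            clv += ASP_EUR[focal] / (1.0 + annual_discount) ** (t / 12.0)


def rate(brand: str) -> float:
    """Hypothetical purchase rate per month: a 35-month mean interval for every brand."""
    return 1.5 / 35.0


rng = random.Random(3)
p_rep = {"Nokia": 0.8, "Apple": 0.85, "Samsung": 0.6, "Other": 0.4}  # hypothetical
switch = {b: {o: 1.0 for o in p_rep if o != b} for b in p_rep}     # equal weights
clvs = [simulate_clv("Apple", "Apple", 1.5, rate, p_rep, switch, rng) for _ in range(2000)]
print(sum(clvs) / len(clvs))  # Monte Carlo five-year CLV of an Apple owner for Apple
```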
When the intentions are taken into account, the CLV of Apple and Samsung increases and the CLV of Nokia decreases. The results also demonstrate the impact of survey uncertainty: the credible intervals of the mean are wider for Apple and Samsung than for Nokia because the number of their purchases in the data is smaller. The five-year CE by brand and age group is presented in Figure 4. The five-year CE is calculated by scaling the average five-year CLV by brand and age group to the population level using the population statistics by Statistics Finland given in Table 1. Apple clearly has the highest CE in Finland, and the margin is largest in the young age groups. The survey uncertainty is reflected by the wide credible intervals for Apple. Nokia and Samsung have approximately the same CE at the population level, but Samsung is strongest in the young age groups, while the CE of Nokia comes equally from all age groups. The CE of Nokia can be estimated rather accurately from the survey data, but for Apple and Samsung a larger sample size would have been desirable.
Figure 4: The customer equities by brand and age group calculated using the historical and the intended purchase behavior. Panels: Apple, Nokia and Samsung customer equity, each with historical and with intended repurchase behavior.

Conclusions
We have presented a cost-efficient Bayesian approach to estimating the average CLV and the CE on the basis of survey data. The approach is motivated by the need for CLV-based decision making in the absence of personal purchase histories. Most companies manufacturing consumer goods do not sell the goods directly to their end customers and hence are unable to collect transactional purchase data at the level of individual customers. For these companies, surveys offer a natural option to obtain information about the purchasing behavior of different customer segments. Although the amount of data is significantly smaller in surveys than in customer registries, the survey-based approach to the estimation of CE has important benefits. A survey also gives insight into the purchase behavior of the customers of competitors. A properly collected sample represents the population and thus avoids the problems of cohort heterogeneity. Carrying out a survey is usually easier than organizing the systematic collection of purchase histories and does not require investments in transactional data collection and storage. The use of survey data and Bayesian analysis could be a cost-efficient alternative for estimating CE for many companies that do not collect transactional data.
As always in survey sampling, the validity of the results depends on the representativeness of the sample, and major differences in survey response rates between customer segments may bias the CE estimates. In many cases, the bias can be removed or reduced by stratified sampling or post-stratification, which leads to unequal weighting of the individuals in the sample. Customer panels provide an alternative to cross-sectional surveys. Panels provide prospective data on transactions, but the population representativeness of the customer behavior may be a bigger problem than in surveys.
As a demonstration of the proposed survey-based approach, we analyzed simulated data (results given in the Appendix) as well as real data on the Finnish mobile market. The applied statistical models were different in these examples to emphasize the generality of the approach. From the simulated data we learn that rather small sample sizes may be sufficient for unbiased CLV and CE estimation. The same was observed in the real data example, where reasonable CLV and CE estimates were obtained from a survey with only 536 respondents.
The mobile phone survey example demonstrates that, when forward-looking customer value is estimated, high customer loyalty and a high average selling price (ASP) matter more than past success. With an installed base of 73% of the population, Nokia has an expected five-year CE of 280 million euros, whereas Apple, with an installed base of only 7%, has an expected five-year CE of 700 million euros. The mobile phone market is known to be particularly volatile. Disruptive innovations, such as the introduction of the Apple iPhone in 2007, are difficult to predict but may have a major impact on future repurchase probabilities. In fact, in September 2013 it was announced that Microsoft Corporation would acquire the mobile phone business of Nokia. This may or may not have an impact on consumer behavior in Finland.
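The contrast between the two brands can be quantified with simple arithmetic on the figures quoted above. The sketch below divides each brand's expected five-year CE by its installed base, assuming both percentages refer to the same population. Under that assumption, one percentage point of Apple's installed base carries about 100 million euros of expected five-year CE versus about 3.8 million euros for Nokia, a ratio of roughly 26.

```python
# Expected five-year CE (million euros) and installed base (% of population),
# as quoted in the text.
figures = {"Nokia": (280.0, 73.0), "Apple": (700.0, 7.0)}

# CE per percentage point of installed base (assumes a common population base).
ce_per_point = {brand: ce / base for brand, (ce, base) in figures.items()}
ratio = ce_per_point["Apple"] / ce_per_point["Nokia"]

for brand, value in ce_per_point.items():
    print(f"{brand}: {value:.2f} million euros per percentage point")
print(f"Apple / Nokia: {ratio:.1f}")
```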
Taking the unpredictability of long-run market dynamics into account, the CE should not be understood as a forecast of future market development but as a projection of the current state into the future. From this perspective, CLV and CE work as tools for comparing brands and customer segments under the (unrealistic) assumption that the present model will also describe the future. The survey questions on intended purchase behavior provide insight into the validity of this assumption. Significant differences between the CE estimates based on historical repurchase behavior and those based on intended repurchase behavior indicate that the market dynamics are unpredictable. Even in a changing business landscape, however, some CLV and CE estimates are required for decisions on marketing actions. On the other hand, the differences between historical and intended behavior also reflect actual changes in the market dynamics, as in the example case above. Hence, the proposed approach can be used to quantify the CE impact of an ongoing change in the market in the short term.
An obvious weakness of the mobile phone survey example is the lack of individual-level price data. The respondents could naturally be asked for the price paid for their current phone, but this does not solve the problem straightforwardly. Operator subsidies and recall bias complicate the data collection, and even in the best case the respondents can name only the retail price, not the sell-in price. Further, the purchase rate and customer loyalty are aspects of individual-level customer behavior, whereas future phone prices are not chosen by the customer but depend on the market. Despite these challenges, we recommend collecting price data in surveys.
We believe that the presented approach combining survey data and Bayesian modeling will help demonstrate the usefulness of CLV and CE modeling even in the absence of personal purchase histories. Furthermore, we believe that attention to the uncertainty of the CLV and CE estimates will make the risks more explicit for decision makers.
Full purchase histories are generated for a population from which small survey samples are drawn. The CE estimated from each sample is compared with the CE of the population. The procedure is repeated for a number of survey samples to obtain information on the sampling variation.
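The repeated-sampling procedure can be sketched as follows. This is an illustration under stated assumptions, not the estimator of the study: a simple plug-in estimate, the population size times the sample mean CLV, is swapped in for the Bayesian posterior, and the population of individual CLVs is hypothetical (gamma-distributed). The function name `sampling_variation` is also hypothetical.

```python
import random
import statistics


def sampling_variation(population_clv, sample_sizes, n_reps, rng):
    """Bias and standard deviation of a plug-in CE estimate over repeated samples.

    The estimate N * (sample mean CLV) stands in for the Bayesian posterior
    used in the text; it only illustrates the repeated-sampling procedure.
    """
    N = len(population_clv)
    true_ce = sum(population_clv)
    results = {}
    for n in sample_sizes:
        estimates = [N * statistics.fmean(rng.sample(population_clv, n))
                     for _ in range(n_reps)]
        results[n] = (statistics.fmean(estimates) - true_ce,  # average bias
                      statistics.stdev(estimates))            # sampling SD
    return true_ce, results


rng = random.Random(2013)
# Hypothetical population of 10,000 individual CLVs (euros).
population = [rng.gammavariate(2.0, 150.0) for _ in range(10_000)]
true_ce, results = sampling_variation(population, [50, 200, 800], 200, rng)
for n, (bias, sd) in results.items():
    print(f"n={n:4d}  relative bias={bias / true_ce:+.3f}  relative SD={sd / true_ce:.3f}")
```

With this setup the average bias stays near zero while the spread of the estimates shrinks roughly in proportion to one over the square root of the sample size.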
The survey data collected for the individuals $i = 1, 2, \ldots, n$ are the current state $S_i$ (1 for the focal company and 0 for the competitors), the previous state $S_i^{(-1)}$, the time between the last two purchases $T_i$ and the time from the latest purchase $T_i^*$. As the transactions follow a Poisson process, the time between purchases follows an exponential distribution with rate $\lambda_i$, that is, $T_i \sim \mathrm{Exp}(\lambda_i)$. The time from the latest purchase to the day of the survey is an observation from the same exponential distribution, $T_i^* \sim \mathrm{Exp}(\lambda_i)$, because the Poisson process is memoryless. For the current state $S_i$, relation (1) holds. For the previous state $S_i^{(-1)}$, the corresponding relation involves $S_i^{(-2)}$, the state before the previous one. There are no observations of $S_i^{(-2)}$, but the formula follows from the equilibrium state of the Markov chain characterizing the brand switching.
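The data-generating mechanism behind one survey record can be sketched in Python. The roles of $p_i$ and $q_i$ are not defined in this section, so the sketch assumes a hypothetical parameterization: $p$ is the probability that the next purchase is of the focal brand given that the previous one was, and $q$ the same probability given that the previous purchase was a competitor's. Under this assumption the equilibrium share of the focal brand is $q/(1-p+q)$. The function name, the 30-year window and the parameter values are illustrative choices.

```python
import random


def simulate_survey_record(lam, p, q, rng, window=30.0):
    """Simulate one respondent's survey answers from a purchase history.

    Assumptions (hypothetical parameterization): purchases form a Poisson
    process with rate lam per year; the brand follows a two-state Markov chain
    with P(focal | previous focal) = p and P(focal | previous competitor) = q;
    the history starts `window` years before the survey at the chain's
    equilibrium. Returns None if fewer than two purchases fall in the window.
    """
    pi = q / (1.0 - p + q)                 # equilibrium share of the focal brand
    state = 1 if rng.random() < pi else 0  # state before the first simulated purchase
    t = -window                            # the survey takes place at t = 0
    times, states = [], []
    while True:
        t += rng.expovariate(lam)          # exponential gaps give a Poisson process
        if t > 0.0:
            break
        prob_focal = p if state == 1 else q
        state = 1 if rng.random() < prob_focal else 0
        times.append(t)
        states.append(state)
    if len(times) < 2:
        return None
    return {
        "S": states[-1],                   # current state S_i
        "S_prev": states[-2],              # previous state S_i^(-1)
        "T": times[-1] - times[-2],        # time between the last two purchases
        "T_star": -times[-1],              # time from the latest purchase to the survey
    }


rng = random.Random(1)
records = [simulate_survey_record(0.5, 0.7, 0.2, rng) for _ in range(5000)]
records = [r for r in records if r is not None]
mean_T = sum(r["T"] for r in records) / len(records)
mean_T_star = sum(r["T_star"] for r in records) / len(records)
print(round(mean_T, 2), round(mean_T_star, 2))  # both close to 1 / lam = 2
```

Both simulated times average close to $1/\lambda$, in line with the statement that $T_i$ and $T_i^*$ follow the same exponential distribution, and the share of current focal customers stays near the equilibrium value $0.2/(1-0.7+0.2) = 0.4$.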

Simulation setup
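We first simulate the complete purchase histories of 100,000 individuals, who are divided between the focal company and the competitors according to the market shares. Then, using the proposed approach, we estimate the average CLV and the CE from a small 'survey sample' of the simulated data and compare the results with the true CLV and CE. The parameters used in the simulation are $\gamma = 3$, $\delta = 10$, $\alpha_p = 4$, $\beta_p = 6$, $\alpha_q = 4$ and $\beta_q = 6$, and the intensity is defined as the number of transactions per year. The value of a purchase is assumed to be 100 euros. The purchase histories are generated for 40 years forward and 30 years backward from the time of the survey. The true CLVs and CEs are calculated from the whole population and the purchase histories generated for the forthcoming 40 years. With the annual discounting …

The final part of the model code, in BUGS/JAGS syntax, defines the hyperpriors (the beginning of the model, including the prior for the mean intensity `ml`, is not included here):

```
  # Prior for the variance of the purchase intensity
  vl ~ dgamma(2, 1)
  # Shape and rate of the gamma distribution of the intensity, chosen so that
  # its mean is ml and its variance is vl; the small constant avoids division by zero
  gammal <- ml * ml / (vl + 0.00001)
  deltal <- ml / (vl + 0.00001)
  # Means of the beta distributions of p and q
  mp ~ dunif(0, 1)
  mq ~ dunif(0, 1)
  # Concentration (alpha + beta) of the beta distributions
  kp ~ dgamma(10, 1)
  kq ~ dgamma(10, 1)
  # Beta shape parameters from mean and concentration
  alphap <- kp * mp
  betap <- kp * (1 - mp)
  alphaq <- kq * mq
  betaq <- kq * (1 - mq)
}
```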
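The computation of the true CLVs can be sketched as a forward simulation. The parameter values $\gamma = 3$, $\delta = 10$ (gamma intensity with shape $\gamma$ and rate $\delta$, matching the BUGS/JAGS parameterization above), the beta parameters, the 100-euro purchase value and the 40-year horizon come from the text. Several points are assumptions: the discount rate is not given in this section, so `DISCOUNT_RATE = 0.10` is purely hypothetical; each purchase is discounted continuously in time as $(1+r)^{-t}$; the roles of $p$ and $q$ follow the hypothetical two-state Markov parameterization ($P$(focal | focal) $= p$, $P$(focal | competitor) $= q$); and summing over all individuals, including competitors' current customers who may switch, is only one possible reading of the CE.

```python
import math
import random

# Values quoted in the text.
GAMMA, DELTA = 3.0, 10.0          # gamma distribution of the yearly purchase intensity
ALPHA_P, BETA_P = 4.0, 6.0        # beta distribution of p
ALPHA_Q, BETA_Q = 4.0, 6.0        # beta distribution of q
PURCHASE_VALUE = 100.0            # euros per purchase of the focal brand
HORIZON = 40.0                    # years simulated forward from the survey
DISCOUNT_RATE = 0.10              # HYPOTHETICAL annual rate (not given in this section)


def discounted_future_value(lam, p, q, state, rng,
                            horizon=HORIZON, value=PURCHASE_VALUE, r=DISCOUNT_RATE):
    """Discounted value of one individual's future focal-brand purchases.

    Assumes a Poisson purchase process with rate lam and a two-state Markov
    chain for the brand: P(focal | focal before) = p, P(focal | competitor before) = q.
    """
    t, total = 0.0, 0.0
    while True:
        t += rng.expovariate(lam)
        if t > horizon:
            return total
        state = 1 if rng.random() < (p if state == 1 else q) else 0
        if state == 1:
            total += value * (1.0 + r) ** (-t)


def simulate_true_ce(n_individuals, rng):
    """Average CLV by current state and total discounted value over the population."""
    sums = {0: 0.0, 1: 0.0}
    counts = {0: 0, 1: 0}
    for _ in range(n_individuals):
        lam = rng.gammavariate(GAMMA, 1.0 / DELTA)  # shape GAMMA, rate DELTA
        p = rng.betavariate(ALPHA_P, BETA_P)
        q = rng.betavariate(ALPHA_Q, BETA_Q)
        pi = q / (1.0 - p + q)                      # equilibrium share of the focal brand
        state = 1 if rng.random() < pi else 0       # current state at the survey
        sums[state] += discounted_future_value(lam, p, q, state, rng)
        counts[state] += 1
    avg_clv = {s: sums[s] / counts[s] for s in (0, 1) if counts[s] > 0}
    return avg_clv, sums[0] + sums[1]


# The text uses 100,000 individuals; a smaller population keeps the example quick.
avg_clv, total_value = simulate_true_ce(10_000, random.Random(42))
print(avg_clv, total_value)
```

A quick check of the discounting: for a loyal customer ($p = q = 1$) the expected value equals $100\,\lambda\,(1-(1+r)^{-40})/\ln(1+r)$, the integral of the discounted purchase rate over the horizon.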

Simulation results
The simulation results are shown in Figure 5 and in Table 6. Figure 5 shows that the estimated posterior distributions are concentrated around the true value of the CE and that the systematic bias is small or nonexistent. As expected, the variance is smaller for larger sample sizes. Sample sizes of 800 or more seem to give sufficient accuracy.
The CLV posterior distributions for the customers of the focal company and the customers of a competitor are presented in Table 6. It can be seen that the posteriors estimated from a sample of size 1000 are very similar to the true CLV distribution of the population.