Year
2000
Abstract
Kohonen maps are used for projection and clustering of data when topology has a meaning or a managerial interest. Beyond the use of these neural networks for visualisation and graphic representation, the purpose of this research was to study the improvement obtained by using a Kohonen map as the underlying clustering step in a response model. Two variants of a methodology were used. In the first, a Kohonen map was fitted; for each neuron, the response frequency for a new offer (a book on Florence art history) was computed, and this information was introduced as an additional variable in the logistic response model. The topology of the map had to be chosen: a decision was made on the structure (sheet, cylinder or torus), and for the size two measures were used, the quantization and topographic errors. In a large map, the frequency computed for each neuron can vary greatly, and this methodology may lead to poor stability in the model coefficients, with large gaps between the working and test samples. A second methodology was therefore also proposed, in which a further clustering step was included: the neurons of the Kohonen map were clustered into several groups. The number of groups was decided by minimization of a statistical criterion. For the empirical validation, a large database from an American book club was used. Book reading should be fairly rational, even at the family level, so stable and meaningful clusters could be expected. The file was divided into a working sample of 20,000 customers and a test sample of 30,000 customers. A response model was built on the working sample and validated on the test sample. The computation was done with SOM (self-organising map), a Matlab implementation of the Kohonen map. In the first variant of the methodology, the best structure was the sheet (versus cylinder or torus).
In the absence of a statistical criterion for deciding the size of the map, parsimony guided the choice for the first variant (5×10), and a larger map capable of retaining the complexity of the information was used for the second variant (25×25). The second-level clustering was done with a K-means algorithm, with an optimal number of 15 clusters. The first advantage of a Kohonen map is the visual presentation of the information, which eases interpretation. Several maps were therefore drawn to present the distances between neurons, the position of the books in the neuron space, and the buying frequency of new books. Beyond this descriptive use, the Kohonen map results were used in the predictive response model. The criterion used was a Gini concentration curve: the sample was ordered by increasing buying frequency and divided into 20 quantiles of 5% each. The coefficients computed were the mean squared gaps between the response prediction and a naïve model of constant response rate (9%), and between the actual buying frequencies in the working and test samples. The results are very encouraging: with both variants, response prediction was better than with the Recency, Frequency, Monetary (RFM) model or a traditional logistic regression, in terms of both predictive ability and stability of the results. Although the empirical work was done with a real database and very large samples, the conclusions need to be supported by work on other databases, as some results may be specific to this buying behaviour. An important direction for further research is the integration of the two steps. Indeed, the methodology was a traditional two-step approach, with clustering first and response prediction second. It should nevertheless be possible to increase the quality of the response prediction by building a specific model for each segment.
This could be done by estimating specific logistic regression parameters for each group, or by using specialised software that handles clustering and prediction simultaneously, such as Typren or Glimmix.
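The first variant and the quantile-based evaluation described in the abstract can be illustrated with a minimal sketch. It is not the author's Matlab code. It assumes a map that has already been trained (here replaced by a random `codebook`), uses synthetic data, and scores customers directly by the response frequency of their neuron instead of feeding that frequency into a logistic model. All function and variable names are hypothetical.

```python
import numpy as np

def best_matching_units(X, codebook):
    """Index of the nearest codebook vector (neuron) for each row of X."""
    # Squared Euclidean distance between every customer and every neuron.
    d2 = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def neuron_response_frequency(bmu, y, n_neurons, fallback):
    """Response frequency per neuron; empty neurons get the fallback rate."""
    counts = np.bincount(bmu, minlength=n_neurons)
    responders = np.bincount(bmu, weights=y, minlength=n_neurons)
    freq = np.full(n_neurons, fallback, dtype=float)
    nonempty = counts > 0
    freq[nonempty] = responders[nonempty] / counts[nonempty]
    return freq

def quantile_response_rates(score, y, n_quantiles=20):
    """Order by increasing score, split into equal groups, return the rate per group."""
    order = np.argsort(score, kind="stable")
    groups = np.array_split(y[order], n_quantiles)
    return np.array([g.mean() for g in groups])

def mean_squared_gap(a, b):
    """Mean squared difference between two series of rates."""
    return float(np.mean((np.asarray(a) - np.asarray(b)) ** 2))

# Synthetic stand-ins (assumptions): 50 neurons for a 5x10 map, 4 input variables,
# and a 9% base response rate that is unrelated to the inputs.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(50, 4))
X_work, X_test = rng.normal(size=(2000, 4)), rng.normal(size=(3000, 4))
y_work = (rng.random(2000) < 0.09).astype(float)
y_test = (rng.random(3000) < 0.09).astype(float)

# Frequencies are estimated on the working sample only, then applied to both samples.
bmu_work = best_matching_units(X_work, codebook)
bmu_test = best_matching_units(X_test, codebook)
freq = neuron_response_frequency(bmu_work, y_work, len(codebook), y_work.mean())

rates_work = quantile_response_rates(freq[bmu_work], y_work)
rates_test = quantile_response_rates(freq[bmu_test], y_test)

gap_vs_naive = mean_squared_gap(rates_test, np.full(20, 0.09))  # predictive ability
gap_work_test = mean_squared_gap(rates_work, rates_test)        # stability
```

With real data, a large gap against the constant 9% rate together with a small gap between working and test samples would correspond to the pattern the abstract reports as desirable: good discrimination that also holds up out of sample.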
DESMET, P. (2000). Analyse des paniers d’achat avec une typologie par une carte de Kohonen. In: 7ème Rencontre Internationale – ACSEG (Approches Connexionnistes en Sciences Economiques et de Gestion). Centre national de la recherche scientifique (CNRS), pp. 189-201.