Knowledge discovery techniques to improve the services of internet economics

  1. Ruiz Agúndez, Igor
Supervised by:
  1. Yoseba Koldobika Peña Landaburu Co-director
  2. Pablo García Bringas Co-director

Defence university: Universidad de Deusto

Defence date: 30 May 2012

  1. Mario G. Piattini Velthuis Chair
  2. Diego López de Ipiña González de Artaza Secretary
  3. Josuka Díaz Labrador Committee member
  4. Juan Garbajosa Sopeña Committee member
  5. Eduardo Jacob Taquet Committee member

Type: Thesis

Teseo: 330734


Clustering algorithms in the literature follow different strategies, each focused on extracting a certain kind of knowledge, so their results are usually disparate. Such diversity makes choosing the optimal clustering algorithm for a given purpose very difficult, since each model is tailored to a specific goal; indeed, model comparison is one of the classic clustering challenges. Traditionally, this task has been tackled in unverified ways: performing the validations manually, without a clear method, or based on formal knowledge. One of the most promising fields to which clustering algorithms may contribute is Internet Economics, a research area that studies the services that users can consume. So far, its scope has mainly concentrated on data transport accounting: customers usually pay for the use of the network resources of certain access providers but do not pay for any other type of service. The evolution of the Internet already forces us to offer new and varied kinds of applications that run on top of the transport infrastructure. Therefore, Internet Economics must provide suitable tools and answers to these changes. For instance, the requirements of support systems have become stricter and more demanding, pressuring service providers to improve their service quality, customer care, marketing, management, and so forth. This scenario hinders the discipline from achieving maturity, since its procedures and semantics remain far from completely standardised. Thus, we have evaluated the application of clustering algorithms to service data in order to enrich support systems in Internet Economics. To find the technique that extracts the most significant knowledge, we have to cope with challenges related to the difficulty of comparing algorithms with different goals, inputs, and outputs: structuring the input data, obtaining the models' layout, determining the quality of the results, and so on.
Indeed, within a problem-independent methodology, each problem requires its own metric to obtain the best representation of knowledge. Against this background, we propose a knowledge discovery methodology that analyses the data generated by a service and extracts the model that best represents each Internet service, so that it can be used to improve the aforementioned support systems. This methodology applies a number of clustering algorithms to a service dataset. First, it collects and compares the results of the clustering algorithms; it then selects the optimal model to represent the service. In order to compare these models, we have defined a common base attribute, a metric over it, and a criterion that picks the best solution according to that metric. Still, designing a formal, theoretical way to accomplish this task for every possible problem specification is not feasible, since the metric, base attribute, and criterion are all area-specific. Therefore, we propose the use of problem-independent metrics that enable the comparison of results by problem-specific criteria. We have validated this novel methodology through a real Internet Economics use case, namely a Voice over Internet Protocol (VoIP) service. Such a service may range from a simple call between two users to more complex practices including multi-user teleconferencing, call transfer, call-centre functionalities, and so on. The dataset used corresponds to a medium-sized corporation, and the extracted knowledge has been used to support and advise the infrastructure dimensioning system. In this way, we demonstrate how clustering techniques can extract significant knowledge that may assist the support systems of an Internet Economics service in their task.
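The comparison step described in the abstract (several clustering algorithms, a common base attribute, a metric over it, and a criterion that selects the best model) might be sketched as follows. This is only an illustration under stated assumptions, not the thesis's actual implementation: the call durations are invented sample data, the two toy algorithms (`kmeans_1d`, `equal_width_bins`) and the silhouette coefficient are stand-ins chosen here, and "highest score wins" is an assumed criterion.

```python
from statistics import mean

def kmeans_1d(values, k, iterations=20):
    """Tiny 1-D k-means with deterministic initial centres (hypothetical helper)."""
    ordered = sorted(values)
    centres = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assign each value to its nearest centre, then recompute the centres.
        labels = [min(range(k), key=lambda c: abs(v - centres[c])) for v in values]
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centres[c] = mean(members)
    return labels

def equal_width_bins(values, k):
    """Cluster by splitting the value range into k equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0
    return [min(int((v - lo) / width), k - 1) for v in values]

def silhouette(values, labels):
    """Mean silhouette coefficient over a single numeric base attribute."""
    clusters = set(labels)
    if len(clusters) < 2:
        return -1.0  # assumed convention: one cluster is the worst outcome
    scores = []
    for i, (v, lab) in enumerate(zip(values, labels)):
        own = [abs(v - w) for j, (w, m) in enumerate(zip(values, labels))
               if m == lab and j != i]
        if not own:
            scores.append(0.0)  # singleton cluster
            continue
        a = mean(own)  # mean distance to the point's own cluster
        b = min(mean(abs(v - w) for w, m in zip(values, labels) if m == other)
                for other in clusters if other != lab)  # nearest other cluster
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return mean(scores)

# Assumed base attribute: call duration in seconds (invented sample data).
durations = [30, 35, 40, 45, 300, 310, 320, 330, 900, 950]

# Step 1: apply several clustering algorithms to the same service dataset.
candidates = {
    "kmeans_k2": kmeans_1d(durations, 2),
    "kmeans_k3": kmeans_1d(durations, 3),
    "bins_k3": equal_width_bins(durations, 3),
}

# Step 2: score every model with the same metric over the base attribute.
scores = {name: silhouette(durations, labels) for name, labels in candidates.items()}

# Step 3: assumed criterion, the model with the highest score is selected.
best_name = max(scores, key=scores.get)
print(best_name, round(scores[best_name], 3))
```

Because every model is reduced to a labelling of the same base attribute, algorithms with different inputs and outputs become directly comparable; swapping the metric or the criterion changes only steps 2 and 3.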