Optimizacion geometrica para problemas de clasificacion

Doctoral thesis by Pablo Perez Lantero

Data mining is a major discipline in computer science whose main goal is to explore data and extract potentially useful, previously unknown information. Using mathematical tools such as operations research, statistics, artificial intelligence and, more recently, computational geometry, data mining solves problems in many areas that involve large databases. Within computational geometry, geometric optimization techniques can be applied to many problems in this field. Typically, data mining problems concern data belonging to two classes, say red and blue, and arise mainly in important subareas such as the classification of new data and pattern recognition.

This thesis focuses on optimization problems with applications in data classification and pattern recognition. In all of them, we are given a two-class data set represented as red and blue points in the plane, and the objective is to find simple geometric shapes that meet some requirements for classification. The problems are approached from the computational geometry point of view, and efficient algorithms that exploit the inherent geometry of the problems are proposed.

A crucial problem in data mining is the so-called «maximum box problem», where the geometric shape to be found is a maximum box, that is, an axis-aligned rectangle containing the maximum number of elements of only one class in the given data set. This thesis solves some natural variants of this basic problem by considering two boxes (one per class), the minimum number of boxes needed to cover a class, or the maximum box in kinetic scenarios.

Commonly, classification methods assume a «good» data distribution, so that a clustering procedure can be applied. However, if the classes are «well mixed», clustering to select prototypes that represent a class is not possible. In that sense, this thesis studies a new parameter to measure, a priori, whether a given two-class data set is suitable for classification.
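
To make the «maximum box problem» concrete, the following is a minimal brute-force sketch in Python. It is not the method of the thesis, whose algorithms are not described in this summary; it only illustrates the problem statement. The function name `max_box`, the point representation as `(x, y)` tuples, and the choice of a closed box (points on the boundary count as inside) are assumptions made for illustration. The sketch relies on the observation that an optimal box can always be shrunk to the bounding box of the blue points it contains, so only boxes whose sides pass through blue coordinates need to be tried.

```python
from itertools import combinations_with_replacement


def max_box(blue, red):
    """Brute-force maximum box (illustrative only, not the thesis's algorithm).

    Returns (count, (x_lo, x_hi, y_lo, y_hi)) for a closed axis-aligned box
    that contains the most blue points and no red point, or (0, None) if no
    such box contains a blue point.
    """
    # Candidate side positions: only coordinates of blue points are needed.
    xs = sorted({x for x, _ in blue})
    ys = sorted({y for _, y in blue})
    best = (0, None)
    for x_lo, x_hi in combinations_with_replacement(xs, 2):
        for y_lo, y_hi in combinations_with_replacement(ys, 2):

            def inside(p):
                # Assumption: the box is closed, so boundary points count.
                return x_lo <= p[0] <= x_hi and y_lo <= p[1] <= y_hi

            if any(inside(r) for r in red):
                continue  # the box must contain points of one class only
            count = sum(inside(b) for b in blue)
            if count > best[0]:
                best = (count, (x_lo, x_hi, y_lo, y_hi))
    return best


if __name__ == "__main__":
    blue_points = [(0, 0), (1, 1), (2, 2), (3, 0)]
    red_points = [(1.5, 1.5)]
    print(max_box(blue_points, red_points))
```

This sketch tries on the order of n^4 candidate boxes and scans every point for each one, so it is only practical for small inputs. Swapping the roles of the two lists gives the maximum box for the other class.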

 

Academic details of the doctoral thesis «Optimizacion geometrica para problemas de clasificacion»

  • Thesis title:  Optimizacion geometrica para problemas de clasificacion
  • Author:  Pablo Perez Lantero
  • University:  Sevilla
  • Thesis defense date:  02/07/2010 (DD/MM/YYYY)

 

Supervision and examination committee

  • Thesis supervisor
    • José Miguel Diaz Bañez
  • Examination committee
    • Committee chair: Ferran Hurtado Díaz
    • Manuel Abellanas Oar (member)
    • Sergey Bereg (member)
    • Jorge Urrutia Galicia (member)

 
