Trade-off among timeliness, messages and accuracy for large-scale information management

Doctoral thesis by René Brunner

Information management requires new techniques to deal with the growing amount of data and the growing number of nodes in large-scale environments. Examples of such environments are the decentralized infrastructures of computational grid and computational cloud applications. These large-scale applications need different kinds of aggregated information, such as resource monitoring, resource discovery or economic information. The challenge of providing timely and accurate information in large-scale environments arises from the distribution of the information. Delays in distributed information systems are caused by long transmission times due to the distribution, by churn and by failures.

One problem of large applications such as peer-to-peer (P2P) systems is that the retrieval time of the information increases because the data is decentralized and the system is prone to failures. However, many applications need timely information provision. Another problem of large applications is scalability, because network consumption increases with size. Approximation techniques reduce the retrieval time and the network consumption, but they also decrease the accuracy of the results. Thus, the remaining problem is to offer a trade-off that reconciles the conflicting requirements of fast information retrieval, accurate results and low messaging cost.

Our goal is a self-adaptive decision mechanism that offers a trade-off among the retrieval time, the network consumption and the accuracy of the result. Self-adaptation enables distributed software to modify its behavior based on changes in the operating environment. In large-scale information systems that use hierarchical data aggregation, we apply self-adaptation to control the approximation used for information retrieval, and thereby reduce the network consumption and the retrieval time.

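The trade-off described above can be illustrated with a minimal sketch of an approximate query over an aggregation tree. It is not the algorithm of the thesis: the names `Node`, `cached_subtree_sum`, `build_tree` and `aggregate`, the sum aggregate, and the cost model of two messages per contacted child are all assumptions made for illustration. A query that stops at a given depth answers from a possibly stale cached summary, which saves messages but can lose accuracy.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    value: float                      # local measurement of this node
    cached_subtree_sum: float = 0.0   # possibly stale sum over all descendants
    children: List["Node"] = field(default_factory=list)


def build_tree(levels: int, fanout: int) -> Node:
    """Build a complete tree whose cached summaries are exact at build time."""
    node = Node(value=1.0)
    if levels > 0:
        node.children = [build_tree(levels - 1, fanout) for _ in range(fanout)]
        node.cached_subtree_sum = sum(
            c.value + c.cached_subtree_sum for c in node.children
        )
    return node


def aggregate(node: Node, max_depth: int, depth: int = 0) -> Tuple[float, int]:
    """Return (estimated sum, messages sent) for a query limited to max_depth."""
    if depth >= max_depth or not node.children:
        # Pruned (or leaf): answer locally from the cached summary.
        return node.value + node.cached_subtree_sum, 0
    total, messages = node.value, 0
    for child in node.children:
        child_sum, child_messages = aggregate(child, max_depth, depth + 1)
        total += child_sum
        messages += 2 + child_messages  # one request plus one reply per child
    return total, messages


root = build_tree(levels=3, fanout=2)              # 15 nodes, each with value 1.0
root.children[0].children[0].children[0].value = 5.0  # a leaf changes; caches go stale

exact_sum, exact_messages = aggregate(root, max_depth=3)
approx_sum, approx_messages = aggregate(root, max_depth=1)
print(exact_sum, exact_messages)
print(approx_sum, approx_messages)
```

In this toy setting the full query contacts every node and sees the changed leaf, while the depth-limited query contacts only the children of the root and returns the outdated total.
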
The hypothesis of the thesis is that approximation techniques can reduce the retrieval time and the network consumption while guaranteeing the accuracy of the results and taking the user's defined priorities into account.

First, this research addresses the trade-off among timely information retrieval, accurate results and low messaging cost by proposing a summarization algorithm for resource discovery in P2P content networks. After identifying how summarization can improve the discovery process, we propose an algorithm that uses a precision-recall metric to compare the accuracy and to offer a user-driven trade-off.

Second, we propose an algorithm that applies self-adaptive decision making on each node. Each node decides whether to prune the query and return the result or to continue the query. Compared with continuing the query, pruning reduces the retrieval time and the network consumption at the cost of lower accuracy. The algorithm uses an analytic hierarchy process to assess the user's priorities and to propose a trade-off that satisfies the accuracy requirements with a low message cost and a short delay.

The proposed content summarization algorithm reduces the information retrieval time from a logarithmic increase to a constant factor. Furthermore, the summarization technique reduces the message size significantly. For the user, a precision-recall metric allows defining the relation between the retrieval time and the accuracy. The self-adaptive algorithm reduces the number of messages needed from an exponential increase to a constant factor. At the same time, the retrieval time is reduced to a constant factor under an increasing number of nodes. Finally, the algorithm delivers the data with the required accuracy by adjusting the depth of the query according to the network conditions.
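
The per-node decision between pruning and continuing a query can be sketched with a small analytic hierarchy process (AHP) example. This is an illustrative sketch under stated assumptions, not the thesis's implementation: the pairwise comparison matrices, the per-action scores and the function names `ahp_weights` and `choose_action` are hypothetical, and the weights are derived with the row geometric-mean method, a common approximation of the AHP priority vector.

```python
import math

CRITERIA = ("time", "messages", "accuracy")


def ahp_weights(pairwise):
    """Derive priority weights from a reciprocal pairwise-comparison matrix.

    pairwise[i][j] states how much more important criterion i is than j.
    Uses the row geometric-mean method and normalizes the result to sum to 1.
    """
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def choose_action(weights, scores):
    """Pick the action with the highest weighted score.

    scores maps an action name to one score per criterion, each in [0, 1],
    where higher is better (so a short delay means a high "time" score).
    """
    def utility(action):
        return sum(w * s for w, s in zip(weights, scores[action]))
    return max(scores, key=utility)


# Hypothetical user who values accuracy far more than time or messages.
accuracy_first = [
    [1.0, 2.0, 1 / 4],
    [1 / 2, 1.0, 1 / 6],
    [4.0, 6.0, 1.0],
]
# Hypothetical user who values a short delay most.
speed_first = [
    [1.0, 2.0, 4.0],
    [1 / 2, 1.0, 2.0],
    [1 / 4, 1 / 2, 1.0],
]
# Assumed local estimates of how well each action does per criterion.
scores = {
    "prune": [0.9, 0.9, 0.5],
    "continue": [0.3, 0.2, 0.95],
}

for name, matrix in (("accuracy_first", accuracy_first), ("speed_first", speed_first)):
    weights = ahp_weights(matrix)
    print(name, [round(w, 3) for w in weights], choose_action(weights, scores))
```

With these assumed numbers, the accuracy-oriented priorities lead the node to continue the query, while the speed-oriented priorities lead it to prune. In a self-adaptive setting the scores would be re-estimated from current network conditions before each decision.
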

 

Academic details of the doctoral thesis "Trade-off among timeliness, messages and accuracy for large-scale information management"

  • Thesis title:  Trade-off among timeliness, messages and accuracy for large-scale information management
  • Author:  René Brunner
  • University:  Politécnica de Catalunya
  • Thesis defense date:  18/11/2011

 

Supervision and examination committee

  • Thesis supervisor
    • Felix Freitag
  • Examination committee
    • Committee chair: Torsten Eymann
    • Dirk Neumann (member)
    • Luis Manuel Díaz de Cerio Ripalda (member)
    • Óscar Ardaiz Villanueva (member)

 
