Development of new computational methodologies applied to mass spectrometry and to the large-scale analysis of data generated in proteomics projects using second-generation techniques

Doctoral thesis by Pedro José Navarro Álvarez

High-throughput identification of peptides in databases from tandem mass spectrometry data is a key technique in modern proteomics. In this work, we introduce a novel indicator, the probability ratio, which optimally takes into account the statistical information provided by the best and second-best scores obtained by the database search engine SEQUEST. The probability ratio is a non-parametric, robust indicator that makes it unnecessary to classify spectra according to parameters such as charge state, and it yields peptide identification performance, measured by false discovery rates, at least as good as that of other empirical statistical approaches. The indicator can also be modified to take into account the isoelectric point information obtained after IEF (isoelectric focusing) peptide fractionation. The probability ratio also compares favorably with statistical probability indicators obtained by constructing single-spectrum SEQUEST score distributions. Its robustness, conceptual simplicity and ease of automation make the probability ratio algorithm a very attractive alternative for determining peptide identification confidences and error rates in high-throughput experiments.

On the other hand, statistical models for the analysis of protein expression changes by stable isotope labeling are still poorly developed, and large-scale test experiments to validate the null hypothesis are lacking. In this work we analyze several large-scale, null-hypothesis quantitative proteomics experiments performed using different isotope labeling approaches and mass spectrometers. Current statistical models based on normality and homogeneity of variance were found unsuitable to describe the null hypothesis in all the situations tested, producing false expression changes. A random-effects model was therefore developed that includes four different sources of variance, at the spectrum-fitting, scan, peptide and protein levels. With the new model, the number of outliers at the scan and peptide levels and the number of false expression changes were negligible in all the cases analyzed. Under the new model, data at all three quantitation levels passed normality tests, making it the first integrated, null-hypothesis-tested statistical model capable of interpreting any kind of quantitative data obtained by stable isotope labeling. All these algorithms and statistical models have been integrated into a software platform called QuiXoT.
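The abstract measures identification performance by false discovery rates but does not state here how those rates were estimated. As a generic illustration only, the sketch below uses the widely used target/decoy counting idea: matches against a decoy database that score above a threshold are taken as an estimate of the number of false target matches. The function names, the `(score, is_decoy)` input format and the "lowest threshold that meets the FDR cap" rule are assumptions for illustration; the score can be any identification indicator in which higher values are assumed to mean higher confidence.

```python
from typing import List, Optional, Tuple

# Hypothetical input format: one (score, is_decoy) pair per peptide-spectrum match.
PSM = Tuple[float, bool]


def fdr_at_threshold(psms: List[PSM], threshold: float) -> float:
    """Estimate the FDR among target matches scoring >= threshold.

    Assumes a concatenated target/decoy search, so the number of decoy hits
    above the threshold estimates the number of false target hits.
    Returns 0.0 when no target passes the threshold.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


def lowest_threshold(psms: List[PSM], alpha: float) -> Optional[float]:
    """Return the lowest observed score whose estimated FDR is <= alpha, or None."""
    for candidate in sorted({score for score, _ in psms}):
        if fdr_at_threshold(psms, candidate) <= alpha:
            return candidate
    return None


if __name__ == "__main__":
    # Toy data for illustration only, not results from the thesis.
    toy = [(3.0, False), (2.5, False), (2.0, True), (1.5, False), (1.0, True)]
    print(fdr_at_threshold(toy, 2.0))  # 0.5
    print(lowest_threshold(toy, 0.4))  # 1.5
```

Because this simple estimate is not monotonic in the threshold, practical pipelines often convert it to q-values before choosing a cutoff.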
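To illustrate how four nested sources of variance (spectrum fitting, scan, peptide and protein) can enter a quantitative model, the sketch below uses generic random-effects inverse-variance weighting: at each level, a lower-level estimate is weighted by the inverse of its own variance plus that level's between-unit variance. This is a minimal sketch under stated assumptions, not the thesis's model: the nested `protein -> peptide -> scans` input layout, the function names and the assumption that all variance components are already known are hypothetical, and the estimation of those components and of outliers is not shown.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical nested input: protein -> peptide -> list of (log2 ratio, fitting variance), one per scan.
Data = Dict[str, Dict[str, List[Tuple[float, float]]]]


def combine(estimates: List[float], variances: List[float], between_var: float) -> Tuple[float, float]:
    """Inverse-variance weighted mean under a random-effects model.

    Each lower-level estimate i carries its own variance variances[i]; the level
    being integrated adds an independent between-unit variance `between_var`.
    Returns the weighted mean and its variance.
    """
    weights = [1.0 / (v + between_var) for v in variances]
    total = sum(weights)
    mean = sum(w * x for w, x in zip(weights, estimates)) / total
    return mean, 1.0 / total


def protein_z_scores(data: Data, var_scan: float, var_peptide: float, var_protein: float) -> Dict[str, float]:
    """Integrate scans -> peptides -> proteins and standardize protein log-ratios.

    The variance components are assumed known (hypothetical inputs here).
    """
    proteins: Dict[str, Tuple[float, float]] = {}
    for protein, peptides in data.items():
        pep_means, pep_vars = [], []
        for scans in peptides.values():
            # Scans -> peptide: each weight uses the scan's fitting variance plus the scan-level variance.
            mean, var = combine([x for x, _ in scans], [v for _, v in scans], var_scan)
            pep_means.append(mean)
            pep_vars.append(var)
        # Peptides -> protein: adds the peptide-level variance.
        proteins[protein] = combine(pep_means, pep_vars, var_peptide)
    # Proteins -> grand mean: adds the protein-level variance.
    grand, _ = combine([m for m, _ in proteins.values()], [v for _, v in proteins.values()], var_protein)
    # Under the null hypothesis these standardized values are expected to behave like N(0, 1).
    return {p: (m - grand) / math.sqrt(v + var_protein) for p, (m, v) in proteins.items()}


if __name__ == "__main__":
    # Toy data for illustration only, not results from the thesis.
    toy = {"A": {"p1": [(1.0, 0.01)]}, "B": {"p2": [(-1.0, 0.01)]}}
    print(protein_z_scores(toy, var_scan=0.01, var_peptide=0.01, var_protein=0.02))
```

Weighting by total variance at each level means that a poorly fitted spectrum or a single noisy peptide contributes less to the protein estimate, which is the kind of behavior a model with several variance sources is meant to capture.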

 

Academic details of the doctoral thesis "Development of new computational methodologies applied to mass spectrometry and to the large-scale analysis of data generated in proteomics projects using second-generation techniques"

  • Thesis title: Development of new computational methodologies applied to mass spectrometry and to the large-scale analysis of data generated in proteomics projects using second-generation techniques
  • Author: Pedro José Navarro Álvarez
  • University: Universidad Autónoma de Madrid
  • Defense date: 12 March 2010

 

Supervision and examining committee

  • Thesis supervisor
    • Jesús Vázquez Cobos
  • Examining committee
    • Chair: José María Carazo García
    • Francisco Zafra Gómez (member)
    • Benito Cañas Montalvo (member)
    • Paulino Gómez Puertas (member)
