Data Mining and Modeling
Principal investigators
Laura Elo, Ph.D.
Department of Mathematics, University of Turku, FI-20014 Turku, Finland
Email: laura.elo (at) utu.fi
Tero Aittokallio, Ph.D., Adjunct Professor in Biomathematics
Department of Mathematics, University of Turku, FI-20014 Turku, Finland
Email: tero.aittokallio (at) utu.fi
Olli Nevalainen, Ph.D., Professor of Computer Science
Turku Centre for Computer Science, Joukahaisenkatu 3-5 B, FI-20520 Turku, Finland
Email: olli.nevalainen (at) utu.fi

Personnel
Post-doctoral researchers:
- Jussi Salmi, Ph.D.
Graduate students:
- Bin Gao, M.Sc.
- Jukka Hiissa, M.Sc.
- Ville Koskinen, M.Sc.
- Mikael Laine, M.Sc.
- Rolf Linden, M.Sc.
- Johannes Tuikkala, M.Sc.
- Heidi Vähämaa, M.Sc.
Undergraduate students:
- Aki Järvinen
- Lari Natri
- Mirva Piippo
- Pekka Salmela
Description of the research
The research group develops mathematical modelling methods and implements computational analysis tools for mining data generated by modern high-throughput biotechnologies. The large number of components probed, together with high technical and biological variability, can make it difficult to extract pertinent biological information from the background noise. This has increased the need for computational models and tools that can efficiently integrate, visualize and analyze the experimental data so that the most important questions can be addressed and meaningful interpretations can be made. The eventual aim is to model and explain the observations as a dynamic interaction of the key molecular components and mechanisms controlling the underlying system.
Data mining protocols developed so far cover a wide range of high-throughput biotechnologies: gene and exon arrays (cDNA, Affymetrix and Illumina platforms) for global gene expression profiling; RNA interference (RNAi) and chromatin immunoprecipitation (ChIP) studies (ChIP-chip and ChIP-seq) for monitoring transcriptional regulation on a global scale; mass-spectrometry (MS)-based assays for large-scale proteomic studies; and comparative genomic hybridization (CGH) for detecting gene amplification or deletion events. One of the most important computational challenges is to take full advantage of all the accumulated data, both from our own laboratory and from public repositories, to obtain a more comprehensive view of the system under study.
We are developing a data integration approach that can effectively correct for the technical variation characteristic of various experimental platforms, and hence improve the comparability of different experiments, the identification of differentially expressed genes and proteins, and the inference of their interaction partners in global cellular networks. Such an integrative network-based modelling approach can provide a robust and unbiased means to reveal the key molecular mechanisms behind the system's behaviour and to predict its response to various perturbations. In clinically oriented research, the modelling approach has the potential to improve our understanding of disease pathogenesis and help us to identify novel molecular markers for pharmaceutical or diagnostic applications.
Funding
The Academy of Finland, Systems Biology research programme, and the Graduate School in Computational Biology, Bioinformatics, and Biometry (ComBi).
Key collaborators
Riitta Lahesmaa (Turku Centre for Biotechnology), Tuula Nyman (University of Helsinki), Matej Oresic (VTT Biotechnology), Benno Schwikowski (Pasteur Institute, Paris), Mats Gyllenberg (University of Helsinki), Esa Uusipaikka (University of Turku), Timo Koski (Royal Institute of Technology, Stockholm), Jan Westerholm (Åbo Akademi University), Mauno Vihinen (University of Tampere), Samuel Kaski (Helsinki University of Technology), Esa Tyystjärvi (University of Turku), Eija Korpelainen (CSC – IT Center for Science).
Selected publications
Laajala, T.D., Raghav, S., Tuomela, S., Lahesmaa, R., Aittokallio, T. and Elo, L.L. A practical comparison of methods for detecting transcription factor binding sites in ChIP-seq experiments. To appear in BMC Genomics.
Lahti, L., Elo, L.L., Aittokallio, T. and Kaski, S. Probabilistic analysis of probe reliability in differential gene expression studies with short oligonucleotide arrays. To appear in IEEE/ACM Transactions on Computational Biology and Bioinformatics.
Clément-Ziza, M., Malabat, C., Weber, C., Moszer, I., Aittokallio, T., Letondal, C. and Rousseau, S. Genoscape: a Cytoscape plug-in to automate the retrieval and integration of gene expression data and molecular networks. To appear in Bioinformatics.
Laajala, E., Aittokallio, T., Lahesmaa, R. and Elo, L.L. (2009) Probe-level estimation improves the detection of differential splicing in Affymetrix exon array studies. Genome Biology 10: R77.
Hiissa, J., Elo, L.L., Huhtinen, K., Perheentupa, A., Poutanen, M. and Aittokallio, T. (2009) Resampling reveals sample-level differential expression in clinical genome-wide studies. OMICS Journal of Integrative Biology 13: 381-396.
Elo, L.L., Hiissa, J., Tuimala, J., Kallio, A., Korpelainen, E. and Aittokallio, T. (2009) Optimized detection of differential expression in global profiling experiments: case studies in clinical transcriptomic and quantitative proteomic datasets. Briefings in Bioinformatics 10: 547-555.
Huhtinen, K., Suvitie, P., Hiissa, J., Junnila, J., Huvila, J., Kujari, H., Setälä, M., Härkki, P., Jalkanen, J., Fraser, J., Mäkinen, J., Auranen, A., Poutanen, M. and Perheentupa, A. (2009) Serum HE4 concentration differentiates malignant ovarian tumours from ovarian endometriotic cysts. Br J Cancer 100: 1315-1319.
Salmi, J., Nyman, T.A., Nevalainen, O.S. and Aittokallio, T. (2009) Filtering strategies for improving protein identification in high-throughput MS/MS studies. Proteomics 9: 848-860.
Aittokallio, T. (2009) Module finding approaches for protein interaction networks. In: Li, X.-L. and Ng, S.-K. (eds.) Biological Data Mining in Protein Interaction Networks, Medical Information Science Series, Chapter 18, pp. 335-353. IGI Global, Hershey, Pennsylvania, U.S.A.
Järvinen, A.P., Hiissa, J., Elo, L.L. and Aittokallio, T. (2008) Predicting quantitative genetic interactions by means of sequential matrix approximation. PLoS ONE 3: e3284.
Elo, L.L., Filén, S., Lahesmaa, R. and Aittokallio, T. (2008) Reproducibility-optimized test statistic for ranking genes in microarray studies. IEEE/ACM Transactions on Computational Biology and Bioinformatics 5: 423-431.
Ahola, V., Aittokallio, T., Vihinen, M. and Uusipaikka, E. (2008) Model-based prediction of sequence alignment quality. Bioinformatics 24: 2165-2171.
Tuikkala, J., Elo, L.L., Nevalainen, O.S. and Aittokallio, T. (2008) Missing value imputation improves clustering and interpretation of gene expression microarray data. BMC Bioinformatics 9: 202.
Vandenbogaert, M., Li-Thiao-Te, S., Kaltenbach, M., Zhang, R., Aittokallio, T. and Schwikowski, B. (2008) Alignment of LC-MS images, with applications to biomarker discovery and protein identification. Annual Reviews Issue. Proteomics 8: 650-672.
Talvinen, K., Tuikkala, J., Nevalainen, O., Rantanen, A., Hirsimäki, P., Sundström, J. and Kronqvist, P. (2008) Proliferation marker securin identifies favourable outcome in invasive ductal breast cancer. Br J Cancer 99: 335-340.
Salmela, P., Nevalainen, O.S. and Aittokallio, T. (2008) A multilevel graph layout algorithm for Cytoscape bioinformatics software platform. Turku Centre for Computer Science, Technical Report 861.
Elo, L. (2007) Strategies for dealing with incomplete information in gene expression studies, PhD Dissertation, Annales Universitatis Turkuensis AI 379, pp. 1-71.
Elo, L.L., Tuikkala, J., Nevalainen, O.S. and Aittokallio, T. (2007) Predicting gene expression from combined expression and promoter profile similarity with application to missing value imputation. In: Deutsch, A., Brusch, L., Byrne, H., de Vries, G. and Herzel, H. (eds.) Mathematical Modeling of Biological Systems, Volume I, Birkhäuser Boston, pp. 97-104.
Elo, L.L., Järvenpää, H., Oresic, M., Lahesmaa, R. and Aittokallio, T. (2007) Systematic construction of gene coexpression networks with applications to human T helper cell differentiation process. Bioinformatics 23: 2096-2103.
Vähämaa, H., Ojala, P., Pahikkala, T., Nevalainen, O., Lahesmaa, R. and Aittokallio, T. (2007) Computer-assisted identification of multi-trace electrophoretic patterns in differential display experiments. Electrophoresis 28: 879-893.
Salmi, J. (2006) Improving Data Analysis in Proteomics, PhD Dissertation, TUCS Dissertations No 76, pp. 1-75.
Salmi, J., Moulder, R., Filén, J.J., Nevalainen, O.S., Nyman, T.A., Lahesmaa, R. and Aittokallio, T. (2006) Quality classification of tandem mass spectrometry data. Bioinformatics 22: 400-406.
Tuikkala, J., Elo, L.L., Nevalainen, O.S. and Aittokallio, T. (2006) Improving missing value estimation in microarray data with gene ontology. Bioinformatics 22: 566-572.
Aittokallio, T. and Schwikowski, B. (2006) Graph-based methods for analyzing networks in cell biology, Invited review. Briefings in Bioinformatics 7: 243-255.
Elo, L.L., Katajamaa, M., Lund, R., Oresic, M., Lahesmaa, R. and Aittokallio, T. (2006) Improving identification of differentially expressed genes by integrative analysis of Affymetrix and Illumina arrays. OMICS Journal of Integrative Biology 10: 369-380.
Hahtola, S., Tuomela, S., Elo, L., Häkkinen, T., Karenko, L., Nedoszytko, B., Heikkilä, H., Saariaho-Kere, U., Roszkiewicz, J., Aittokallio, T., Lahesmaa, R. and Ranki, A. (2006) Th1-response and cytotoxicity genes are downregulated in cutaneous T-cell lymphoma. Clinical Cancer Res. 12: 4812-4821.
Talvinen, K., Tuikkala, J., Grönroos, J., Huhtinen, H., Kronqvist, P., Aittokallio, T., Nevalainen, O., Hiekkanen, H., Nevalainen, T. and Sundström, J. (2006) Biochemical and clinical approaches in evaluating the prognosis of colon cancer. Anticancer Res. 26: 4745-4751.
Elo, L.L., Lahesmaa, R. and Aittokallio, T. (2006) Inference of gene co-expression networks by integrative analysis across microarray experiments. J. Integrative Bioinformatics 3: 33.
Aittokallio, T., Ojala, P., Nevalainen, T.J. and Nevalainen, O.S. (2006) Automated pattern ranking in differential display data analysis. Meth. Mol. Biol. 317: 111-122.
Ahola, V., Aittokallio, T., Vihinen, M. and Uusipaikka, E. (2006) A statistical score for assessing the quality of multiple sequence alignments. BMC Bioinformatics 7: 484.
Elo, L.L., Lahti, L., Skottman, H., Kyläniemi, M., Lahesmaa, R. and Aittokallio, T. (2005) Integrating probe-level expression changes across generations of Affymetrix arrays. Nucleic Acids Res. 33: e193.
Nikula, T., West, A., Katajamaa, M., Lönnberg, T., Sara, R., Aittokallio, T., Nevalainen, O. and Lahesmaa, R. (2005) A human ImmunoChip cDNA microarray provides a comprehensive tool to study immune response. J. Immunol. Methods 303: 122-134.
Moulder, R., Filén, J.J., Salmi, J., Katajamaa, M., Nevalainen, O., Oresic, M., Aittokallio, T., Lahesmaa, R. and Nyman, T.A. (2005) A comparative evaluation of software for the analysis of liquid chromatography-tandem mass spectrometry data from isotope coded affinity tag experiments. Proteomics 5: 2748-2760.
Aittokallio, T., Salmi, J., Nyman, T.A. and Nevalainen, O.S. (2005) Geometrical distortions in two-dimensional gels: applicable correction methods, Invited review in the Special issues on Proteomic Databases. J. Chromatogr. B 815: 25-37.
Ahola, V., Aittokallio, T., Uusipaikka, E. and Vihinen, M. (2003) Efficient estimation of emission probabilities in profile hidden Markov models. Bioinformatics 19: 2359-2368.
Rosengren, A.T., Salmi, J.M., Aittokallio, T., Westerholm, J., Lahesmaa, R., Nyman, T.A. and Nevalainen, O. (2003) Comparison of PDQuest and Progenesis software packages in the analysis of two-dimensional electrophoresis gels. Proteomics 3: 1936-1946.
Aittokallio, T., Kurki, M., Nikula, T., West, A., Lahesmaa, R. and Nevalainen, O.S. (2003) Computational strategies for analyzing data in gene expression microarray experiments. J. Bioinform. Comput. Biol. 1: 541-586.
Lund, R., Aittokallio, T., Nevalainen, O. and Lahesmaa, R. (2003) Identification of novel genes regulated by IL-12, IL-4, or TGF-β during the early polarization of CD4+ lymphocytes. J. Immunol. 171: 5328-5336.
Chen, Z., Lund, R., Aittokallio, T., Kosonen, M., Nevalainen, O. and Lahesmaa, R. (2003) Identification of novel IL-4/Stat6-regulated genes in T lymphocytes. J. Immunol. 171: 3627-3635.
Aittokallio, T., Pahikkala, T., Ojala, P., Nevalainen, T.J. and Nevalainen, O. (2003) Electrophoretic signal comparison applied to mRNA differential display analysis. BioTechniques 34: 116-123.
Salmi, J., Aittokallio, T., Westerholm, J., Griese, M., Rosengren, A., Nyman, T.A., Lahesmaa, R. and Nevalainen, O. (2002) Hierarchical grid transformation for image warping in the analysis of two-dimensional electrophoresis gels. Proteomics 2: 1504-1515.
Aittokallio, T., Ojala, P., Nevalainen, T.J. and Nevalainen, O. (2001) Automated detection of differentially expressed fragments in mRNA differential display. Electrophoresis 22: 1935-1945.
Aittokallio, T., Ojala, P., Nevalainen, T.J. and Nevalainen, O. (2000) Analysis of similarity of electrophoretic patterns in mRNA differential display. Electrophoresis 21: 2947-2956.
