A computational framework for complex disease stratification from multiple large-scale datasets

De Meulder, B.; Lefaudeux, D.; Bansal, A. T.; Mazein, A.; Chaiboonchoe, A.; Ahmed, H.; Balaur, I.; Saqi, M.; Pellet, J.; Ballereau, S.; Lemonnier, N.; Sun, K.; Pandis, I.; Yang, X.; Batuwitage, M.; Kretsos, K.; Van Eyll, J.; Bedding, A.; Davison, T.; Dodson, P.; Larminie, C.; Postle, A.; Corfield, J.; Djukanovic, R.; Chung, K. F.; Adcock, I. M.; Guo, Y. -K.; Sterk, P. J.; Manta, A.; Rowe, A.; Baribaud, F.; Auffray, C.; Gibeon, D.; Hoda, U.; Kuo, S.; Meah, S.; Meiser, A.; Fleming, L. J.; Hu, S.; Pavlidis, S.; Rossios, C.; Russel, K.; Wiegman, C.; Nezhad, A. T.; Oehmichen, A.; O'Malley, D.; Guitton, F.; Emam, I.; Agapow, P.; Rice, P.; Miles, S.; Elyasigomari, V.; Bel, E.; Brinkman, P.; Dekker, T.; Dijkhuis, A.; Hashimoto, S.; Hekking, P. -P.; Lone-Latif, S.; Lutter, R.; Ravanetti, L.; Smids, B.; Van Aalderen, W.; Van De Pol, M.; Van Drunen, K.; Van Drunen, M.; Wagener, A.; Zwinderman, K.; Adriaens, N.; Carusi, A. M.; Richard, F.; Nogueira, M. M.; Taibi, N.; Brasier, O.; Aliprantis, A.; Alving, K.; Faulenbach, C.; Braun, A.; Hohlfeld, J.; Krug, N.; Badorrek, P.; Bakke, P.; Berglind, A.; Chaleckis, R.; Dahlen, B.; Delin, I.; Gallart, H.; Gomez, C.; Hedlin, G.; Henriksson, E.; James, A. J.; Kolmert, J.; Konradsen, J.; Kupczyk, M.; Lantz, A. -S.; Lazarinis, L.; Mathon, C.; Middelveld, R.; Naz, S.; Nordlund, B.; Petren, A.; Reinke, S.; Sjodin, M.; Soderman, P.; Strandberg, K.; Wheelock, C. E.; Zetterquist, W.; Balgoma, D.; Brandsma, J.; Burg, D.; Dennison, P.; Nicholas, B.; Schofield, J. P. R.; Skipp, P. J.; Staykova, D.; Tariq, K.; Ward, J.; Wilson, S. J.; Barber, C.; Loza, M. J.; Bautmans, A.; Sandstrom, T.; Behndig, A. F.; De Alba, J.; Beleta, J.; Berton, A.; De Verdier, M. G.; Nihlen, U.; Ostling, J.; Dalentoft, T.; Lindgren, E.; Boedigheimer, M. J.; Hu, R.; Hu, X.; Yu, W.; Bigler, J.; Bonnelykke, K.; Thorsen, J.; Vising, N.; Bisgaard, H.; Bochenek, G.; Caruso, M.; Emma, R.; Campagna, D.; Thornton, B.; Carayannopoulos, L.; Gent, J.; Manzies-Gow, A.; Sogbesan, A.; Da Purificacao Rocha, P. C.; Pedro, J.; Chanez, P.; Edwards, J.; Flood, B.; Hudson, V.; Kennington, E. J.; Metcalf, L.; Rahman-Amin, M.; Reynolds, L.; Roberts, A.; Smith, J.; Supple, D.; Versnel, J.; Walker, S.; Coleman, C.; Hasan, S.; Compton, C.; Myles, D.; Riley, J.; Sousa, A. R.; Yeyasingham, E.; Pennazza, G.; Santoninco, M.; D'Amico, A.; Dahlen, S. -E.; De Boer, P.; Robberechts, M.; De Lepeleire, I.; Fitch, N.; Garret, T.; Wagers, S.; Draper, A.; Thorngren, J. -O.; Ericsson, M.; Erpenbeck, V.; Kluglich, M.; Nething, K.; Riemann, K.; Schoelch, C.; Seibold, W.; Sigmund, R.; Wald, F.; Wetzel, K.; Fichtner, K.; Erzen, D.; Galffy, G.; Horvath, I.; Szentkereszty, M.; Tamasi, L.; Fowler, S. J.; Krueger, L.; Singer, F.; Frey, U.; Gahlemann, M.; Geiser, T.; Hewitt, L.; Howarth, P.; Marouzet, L.; Martin, J.; Pink, S.; Ray, E.; Roberts, G.; Smith, C.; Gove, K.; Gozzard, N.; Williams, S.; Haughney, J.; Higgenbottam, T.; Matthews, J. G.; Holweg, C.; Rutgers, M.; Kamphuis, J.; Kerry, D.; Vink, A.; Knobel, H.; Knowles, R.; Shaw, D. E.; Smith, K. M.; Know, A.; Kots, M.; Lambrecht, B.; Masefield, S.; Nilsson, P.; Mikus, M.; Miralpeix, M.; Monk, P.; Mores, N.; Valente, S.; Montuschi, P.; Murray, C. S.; Musial, J.; Pacino, A.; Pahus, L.; Palkonen, S.; Powel, P.; Rao, N.; Santini, G.; Vestbo, J.; Von Garnier, C.; Weiszhart, Z.; Woodcock, A.; Biryukov, M.; Schneider, R.; Herzinger, S.; Satagopam, V.; Gu, W.; Da Silva, A. B.; Tielmann, A.; Bergeron, J.; Gaudette, A.; Silberberg, A.; Henderson, D.; Hayat, S.; Elefsinioti, A.; Moltzen, E. K.; Harbo, I. S.; Birgitte, J.; Bratfalean, D.; Houston, P.; Kisler, B.; Capdevila, F. B.; Verbeeck, D.; Marchetti, G.; Rahal, G.; Schuermann, H. D.; Mazuranok, L.; Hendlich, M.; Painell'S, L.; Marren, D.; Martasek, J.; Rimell, J.; Romacker, M.; Braxenthaler, M.; Sansone, S. -A.; Rocca-Serra, P.

doi:10.1186/s12918-018-0556-z

Background: Multilevel data integration is becoming a major area of research in systems biology. Within this area, multi-'omics datasets on complex diseases are increasingly available, and standards and good practices are needed for the integrated analysis of biological, clinical and environmental data. We present a framework to plan and generate single and multi-'omics signatures of disease states.

Methods: The framework is divided into four major steps: dataset subsetting, feature filtering, 'omics-based clustering and biomarker identification.

Results: We illustrate the usefulness of this framework by identifying potential patient clusters based on integrated multi-'omics signatures in a publicly available ovarian cystadenocarcinoma dataset. The analysis generated a higher number of stable and clinically relevant clusters than previously reported, and enabled the generation of predictive models of patient outcomes.

Conclusions: This framework will help health researchers plan and perform multi-'omics big data analyses to generate hypotheses and make sense of their rich, diverse and ever-growing datasets, and so enable the implementation of translational P4 medicine.
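
As an illustration only, the four steps named in the Methods (dataset subsetting, feature filtering, clustering, biomarker identification) can be sketched on synthetic data as follows. The abstract does not say which algorithms the framework uses, so every choice here is an assumption made for the example: the variance-based filter, the plain k-means clustering, the variance-ratio ranking, and all function names (`subset_samples`, `filter_features`, `kmeans`, `rank_biomarkers`, `run_pipeline`) are hypothetical and are not taken from the paper.

```python
import numpy as np


def subset_samples(X, keep_mask):
    """Step 1 (dataset subsetting): keep only the rows (patients) flagged True."""
    return X[keep_mask]


def filter_features(X, n_keep):
    """Step 2 (feature filtering): keep the n_keep highest-variance columns.

    Variance filtering is an assumed stand-in for whatever filter a study uses.
    Returns the filtered matrix and the original column indices that were kept.
    """
    variances = X.var(axis=0)
    kept = np.sort(np.argsort(variances)[::-1][:n_keep])
    return X[:, kept], kept


def kmeans(X, k, n_iter=50, seed=0):
    """Step 3 (clustering): plain Lloyd's k-means, used here only as a placeholder."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every sample to every center.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centers; an empty cluster keeps its previous center.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def rank_biomarkers(X, labels):
    """Step 4 (biomarker identification): rank columns by the ratio of
    between-cluster to within-cluster sum of squares, best first."""
    overall = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for j in np.unique(labels):
        grp = X[labels == j]
        between += len(grp) * (grp.mean(axis=0) - overall) ** 2
        within += ((grp - grp.mean(axis=0)) ** 2).sum(axis=0)
    score = between / (within + 1e-12)  # small constant avoids division by zero
    return np.argsort(score)[::-1]


def run_pipeline(X, keep_mask, n_keep, k):
    """Chain the four steps; the ranking is reported in original column indices."""
    Xs = subset_samples(X, keep_mask)
    Xf, kept = filter_features(Xs, n_keep)
    labels = kmeans(Xf, k)
    ranking = kept[rank_biomarkers(Xf, labels)]
    return labels, ranking


# Synthetic example: 60 hypothetical patients, 20 hypothetical features.
rng = np.random.default_rng(42)
X = rng.normal(size=(60, 20))
X[:30, 3] += 8.0                 # make feature 3 differ between two groups
keep_mask = np.ones(60, dtype=bool)
keep_mask[-5:] = False           # pretend the last five patients are excluded
labels, ranking = run_pipeline(X, keep_mask, n_keep=5, k=2)
```

Here `labels` holds one cluster assignment per retained patient and `ranking` lists the retained feature indices from most to least cluster-discriminating. A real analysis would replace each placeholder with methods suited to the data types at hand and would assess cluster stability, which this sketch does not attempt.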