Distributed, continuous and scalable data processing platforms
One of the great challenges in data analytics is managing large datasets across distributed computing systems. In this research, trainees will leverage the well-developed formalisms of graphical models to manage this complexity. Raw datasets can be modelled as a family of random variables indexed by the nodes of a graph that captures probabilistic dependencies among data elements. Various transformations (e.g., regressions, wavelet decompositions) are required to extract valuable hidden variables from these data. Trainees will develop a linear discrete signal processing (DSP) framework for large datasets whose underlying dependency relations are also represented graphically. Trainees working on the project will extend fundamental DSP structures and concepts, including filtering theory, the Fourier transform, spectral decomposition, and frequency response, to such datasets by viewing them as signals indexed by graph nodes. This will also demand new, effective methods for visualizing the graphical models that represent both data and transforms. This graphical-model approach will support distributed processing of large datasets.
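The idea of treating data as signals indexed by graph nodes can be sketched with a small example. One common graph-DSP convention (assumed here; the proposal does not fix one) defines the graph Fourier basis as the eigenvectors of the graph Laplacian; the 4-node path graph and signal values below are illustrative only.

```python
import numpy as np

# Adjacency matrix of a 4-node path graph: 0 - 1 - 2 - 3
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

D = np.diag(A.sum(axis=1))   # degree matrix
L = D - A                    # combinatorial graph Laplacian

# Eigenvectors of L form the graph Fourier basis; eigenvalues act as
# graph "frequencies" (small eigenvalue = smooth variation over edges).
eigvals, U = np.linalg.eigh(L)

x = np.array([1.0, 0.9, 0.2, 0.1])   # a signal indexed by graph nodes
x_hat = U.T @ x                      # graph Fourier transform
x_rec = U @ x_hat                    # inverse transform recovers x

# A simple low-pass graph filter: keep only the two smoothest modes.
h = (eigvals <= eigvals[1]).astype(float)
x_smooth = U @ (h * x_hat)
```

Filtering, spectral decomposition and frequency response all follow from this change of basis, which is what lets classical DSP concepts carry over to graph-indexed data.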
Social media text analytics for attention, influence and affect, concept mapping and clustering, personal visualization, supply chain management, customer personalization
The distributed framework outlined above is critical for social media applications that require classification, compression, and linear prediction over large distributed datasets, enabling more informed decisions. Ideally, these decisions will be reached through dynamic interaction with the data. Interactive visual data mining is a new way to explore large datasets by combining data mining and information visualization techniques. An interactive user interface lets the user guide the automatic mining process, applying domain knowledge and providing feedback on intermediate mining results.
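The feedback loop described above can be illustrated with a minimal relevance-feedback sketch (Rocchio-style query refinement is used here as a stand-in for the interactive mining step; the document vectors and the user's feedback are hypothetical):

```python
import numpy as np

# Hypothetical term-vector representations of three documents.
docs = np.array([[1.0, 0.0, 0.2],
                 [0.9, 0.1, 0.1],
                 [0.0, 1.0, 0.9]])
query = np.array([1.0, 0.0, 0.0])

def rank(query, docs):
    """Rank documents by dot-product score, best first."""
    return np.argsort(docs @ query)[::-1]

first_pass = rank(query, docs)

# Intermediate results are shown to the user, who marks doc 2 as
# relevant and doc 0 as not relevant; the query is adjusted accordingly.
alpha, beta, gamma = 1.0, 0.75, 0.25
query2 = alpha * query + beta * docs[2] - gamma * docs[0]
second_pass = rank(query2, docs)
```

The point of the sketch is the loop itself: the system produces an intermediate result, the user's domain knowledge adjusts the model, and the mining step re-runs.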
Beyond interaction, there is a powerful requirement to adapt and tailor the delivery of information to the individual. In this research, trainees will fuse design methods and practices, computational visualization research, cognitive science, aesthetics studies and data analytics to address personalization in data analytics and visualization. Further, trainees will study how improved personal data profiles will allow more effective use of recommender engines and stimulate social media engagement. The project will also explore a broader range of representational modalities. For example, trainees will explore visualization methods for big data analytics systems through accessible interfaces such as tactile, vibro-tactile and sonified systems.
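How a personal data profile feeds a recommender engine can be sketched with item-based collaborative filtering over a rating matrix; the ratings below are hypothetical, and real profiles would combine many more signals.

```python
import numpy as np

# Hypothetical user-item rating matrix (rows: users, cols: items; 0 = unrated).
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 0, 5, 4]], dtype=float)

def cosine_sim(M):
    """Pairwise cosine similarity between columns (items)."""
    norms = np.linalg.norm(M, axis=0)
    norms[norms == 0] = 1.0
    return (M.T @ M) / np.outer(norms, norms)

S = cosine_sim(R)

def predict(user, item):
    """Predict a rating as a similarity-weighted average of the user's ratings."""
    rated = R[user] > 0
    weights = S[item, rated]
    if weights.sum() == 0:
        return 0.0
    return float(weights @ R[user, rated] / weights.sum())

score = predict(user=1, item=2)  # predicted rating for an item user 1 has not rated
```

A richer personal profile sharpens the rows of this matrix, which is why better profiles translate directly into better recommendations.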
Medical imagery, blood flow, cognitive and neurological data, epidemiology, mobile apps for people with disabilities, monitoring of patients and elderly people, best practices decision support
High-dimensional medical and health data pose a challenge for signal processing and clustering algorithms, as the “curse of dimensionality” severely affects quality, speed and robustness. In this research, trainees will develop new correlation clustering algorithms suitable for high-dimensional data based on signal/image processing techniques and Projective Adaptive Resonance Theory (PART) neural network architectures.
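The two claims in this paragraph can be made concrete with a small numerical sketch: first the distance concentration behind the "curse of dimensionality", then clustering restricted to informative dimensions, which is the idea behind projected/correlation clustering approaches such as PART (the variance-based dimension selection below is an illustrative stand-in, not the PART architecture itself).

```python
import numpy as np

rng = np.random.default_rng(0)

def distance_contrast(dim, n=200):
    """Relative contrast (max - min) / min of distances from one point
    to all others; it shrinks as the dimension grows."""
    X = rng.random((n, dim))
    d = np.linalg.norm(X - X[0], axis=1)[1:]
    return (d.max() - d.min()) / d.min()

low_dim_contrast = distance_contrast(2)
high_dim_contrast = distance_contrast(1000)

# Two clusters whose structure lives in only 2 of 1000 dimensions:
# selecting the informative dimensions makes them trivially separable.
X = rng.normal(0, 0.05, (100, 1000))
X[:50, :2] += 1.0                              # cluster signal in dims 0-1
informative = np.argsort(X.var(axis=0))[-2:]   # pick high-variance dims
labels = (X[:, informative].sum(axis=1) > 1.0).astype(int)
```

In the full ambient space, the near-uniform distances degrade the quality, speed and robustness of distance-based clustering; operating per-cluster on relevant dimensions is what the proposed algorithms target.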
Trainees will also expand integrative analysis by scalable, interactive visualization of complex typed graphs in NAViGaTOR, linking information across patients using physical protein interaction, transcription regulatory, microRNA:gene and metabolic networks, pathways, and drug:protein interactions. This will enable us not only to identify superior prognostic and predictive signatures, but also to present and simulate cancer-perturbed molecular networks, which will in turn lead to identifying personalized interventions.
Trainees will apply these methods to the problem of identifying controllable risk factors that impact health conditions, and in particular to identifying meaningful clusters of patients, to allow the tailoring of communications that may influence these groups. As with social media, personalization of the visualization is also a critical issue; trainees will develop personalized systems for visualization of patient data on collaborative mobile platforms.
Personalized mobile-device visualizations also have the potential to benefit individuals with physical or cognitive disabilities by providing real-time situational awareness in complex urban environments. In this work, trainees will research and develop methods for geotagging urban maps and for computing and visualizing routes, both indoor and outdoor, adapted to the needs of the individual (e.g., accessible routing).
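Accessible routing can be sketched as shortest-path search over a geotagged graph, skipping edges whose barrier tags conflict with the user's profile. The node names, edge tags and distances below are hypothetical; a real system would derive them from annotated map data.

```python
import heapq

# Edges: (neighbour, distance_m, tags), where tags describe barriers.
graph = {
    "entrance": [("stairs_hall", 10, {"stairs"}), ("ramp_hall", 25, set())],
    "stairs_hall": [("atrium", 5, set())],
    "ramp_hall": [("atrium", 8, set())],
    "atrium": [],
}

def shortest_accessible_path(graph, start, goal, avoid):
    """Dijkstra's algorithm, skipping edges whose tags intersect `avoid`."""
    pq = [(0, start, [start])]
    seen = set()
    while pq:
        dist, node, path = heapq.heappop(pq)
        if node == goal:
            return dist, path
        if node in seen:
            continue
        seen.add(node)
        for nbr, w, tags in graph[node]:
            if tags & avoid:  # barrier for this user profile
                continue
            heapq.heappush(pq, (dist + w, nbr, path + [nbr]))
    return None

# A wheelchair user avoids stairs, so the longer ramp route is chosen.
route = shortest_accessible_path(graph, "entrance", "atrium", avoid={"stairs"})
```

The same graph serves every user; only the `avoid` set changes per profile, which is what makes the routing personalized.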
Data analytics and visualization technology also has great potential to improve the safety of patients and elderly people in the home, in hospitals and in extended care facilities. Trainees will develop real-time computer vision algorithms that leverage existing video monitoring systems to analyze the flow of people within care facilities, allowing automatic detection of unusual events (e.g., patient falls). This application is a natural extension of our Smart Cities research.
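One minimal form of such event detection is frame differencing: flag a burst of motion followed by stillness, a crude heuristic for a fall. Real systems would use camera streams and learned models; the synthetic frames and thresholds below are illustrative only.

```python
import numpy as np

def motion_energy(frames):
    """Mean absolute difference between consecutive grayscale frames."""
    frames = np.asarray(frames, dtype=float)
    return np.abs(np.diff(frames, axis=0)).mean(axis=(1, 2))

def detect_unusual_event(energy, spike=20.0, still=1.0):
    """Flag a spike of motion followed by stillness; return its time index."""
    for t in range(len(energy) - 1):
        if energy[t] > spike and energy[t + 1:].max() < still:
            return t
    return None

rng = np.random.default_rng(1)
# Synthetic 8x8 frames: routine motion, one abrupt change, then stillness.
frames = [rng.integers(0, 10, (8, 8)) for _ in range(5)]  # routine motion
frames += [np.full((8, 8), 200)]                          # sudden change
frames += [frames[-1]] * 4                                # no motion after
event_time = detect_unusual_event(motion_energy(frames))
```

Routine activity produces moderate, ongoing motion energy, so only the spike-then-stillness pattern triggers an alert, keeping false alarms down.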
Our cities face a number of pressing mobility challenges: sub-optimal planning of transit and roadways has led to excess greenhouse gas emissions, crime makes some areas unsafe at night when crowds are sparse, and those with disabilities find many routes closed to them. A key requirement in meeting these challenges is the ability to measure, analyze, simulate and visualize the movement of people through the 3D cityscape, regardless of their mode of transit. In this project, trainees will research integrated technologies that provide this ability.
These technologies build upon two recent scientific advances made by labs at York University. The first is a technology called 3DTown, which combines innovations in large-scale 3D urban modelling and computer vision to produce a real-time web-based 3D visualization of urban activities. The second is an approach to large-scale, realistic 3D modelling and visualization of crowd behaviour (for example, SteerSuite). Our goal is to fuse these two technologies into an integrated system that goes well beyond the state of the art in providing real-time 3D visualization, analytics and simulation of urban dynamics.
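The crowd-simulation side can be sketched with a minimal social-force-style update (agents relax toward a desired velocity aimed at their goal, and repel each other at close range). The constants and the two-agent corridor scenario are illustrative; this is not SteerSuite's actual steering model.

```python
import numpy as np

def step(pos, vel, goals, dt=0.1, speed=1.4, tau=0.5, A=2.0, B=0.3):
    """One social-force-style update for all agents."""
    pos, vel = pos.copy(), vel.copy()
    n = len(pos)
    for i in range(n):
        # Goal-driven force: relax toward the desired velocity.
        d = goals[i] - pos[i]
        desired = speed * d / (np.linalg.norm(d) + 1e-9)
        force = (desired - vel[i]) / tau
        # Repulsion from other agents decays exponentially with distance.
        for j in range(n):
            if j != i:
                diff = pos[i] - pos[j]
                r = np.linalg.norm(diff) + 1e-9
                force += A * np.exp(-r / B) * diff / r
        vel[i] = vel[i] + dt * force
        pos[i] = pos[i] + dt * vel[i]
    return pos, vel

# Two agents walking toward opposite ends of a corridor.
pos = np.array([[0.0, 0.0], [5.0, 0.1]])
vel = np.zeros((2, 2))
goals = np.array([[5.0, 0.0], [0.0, 0.0]])
for _ in range(50):
    pos, vel = step(pos, vel, goals)
```

Feeding real trajectories measured by the 3DTown pipeline into such a simulator, and rendering the result in the 3D cityscape, is the kind of fusion the integrated system targets.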