Publications

Integrating Belief-Desire-Intention agents with large language models for reliable human–robot interaction and explainable Artificial Intelligence (2025)

We developed an innovative communication interface between humans and robots for human-in-the-loop interaction through a natural-language user interface. The novelty of the presented approach lies in the integration of large language models with a Belief-Desire-Intention agent that communicates directly with a robot and ensures safety properties and verifiable decision making. We establish a framework that allows users to formulate commands in natural language and grounds them into actionable goals, building on the proven ability of Belief-Desire-Intention agents to perform verifiable reasoning and goal management. We utilize the ability of this agent to represent and store its mind state, including its belief base, plan library, and history of selected events and actions, to allow LLM-based explanations of its behavior. Our findings demonstrate that this architecture provides benefits in performance and safety. Our research is novel in the combination of LLMs with symbolic AI for XAI and aims at inspiring new developments in human-centered Artificial Intelligence systems. Read https://doi.org/10.1016/j.engappai.2024.109771 and listen to the podcast below [WoS]

LLM in the Loop: A Framework for Contextualizing Counterfactual Segment Perturbations in Point Clouds (2025)

3D Point Cloud Data analysis has seen success with the introduction of PointNet algorithms. Key challenges remain in optimizing segment perturbations to influence model outcomes. Traditional methods struggle to generate realistic and contextually appropriate perturbations, limiting their effectiveness e.g. in autonomous robotic systems. In this work we integrate Large Language Models into a counterfactual reasoning process, unlocking a new level of automation in segment perturbation. Our approach begins with semantic segmentation, after which LLMs intelligently select optimal replacements based on features (class label, color, area, height). By leveraging the reasoning capabilities of LLMs, we generate perturbations that are computationally efficient and semantically meaningful. Our framework combines human inspection of LLM-generated suggestions with quantitative analysis of semantic classification model performance across different LLM variants. By bridging the gap between geometric transformations and high-level semantic reasoning we redefine PCD analysis, and pave the way for more interpretable AI-driven solutions, bringing us to real-world applications with explainability and robustness. Read https://doi.org/10.1109/ACCESS.2025.3568052 and listen [WoS]

Enhancing trust in automated 3D point cloud data interpretation through explainable counterfactuals (2025)

This paper introduces a novel framework for augmenting explainability in the interpretation of point cloud data by fusing expert knowledge with counterfactual reasoning. Given the complexity and voluminous nature of point cloud datasets, derived predominantly from LiDAR and 3D scanning technologies, achieving interpretability remains a significant challenge, particularly in smart cities, smart agriculture, and smart forestry. This research posits that integrating expert knowledge with counterfactual explanations – speculative scenarios illustrating how altering input data points could lead to different outcomes – can significantly reduce the opacity of deep learning models processing point cloud data. The proposed optimization-driven framework utilizes expert-informed ad-hoc perturbation techniques to generate meaningful counterfactual scenarios when employing state-of-the-art deep learning architectures. Read here: https://doi.org/10.1016/j.inffus.2025.103032 and listen below [WoS] [Scopus]

On the disagreement problem in Human-in-the-Loop federated machine learning (2025)

The popularity of Artificial Intelligence (AI) has risen sharply in recent years, revolutionizing applications in most domains. Milestones and achievements like ChatGPT demonstrate not only the impressive capabilities of data-driven AI, but also how accessible such technologies have become. The success of AI depends on the underlying data integration processes. Among the most important processes are the training of the AI model at the core of the application and the collection and pre-processing of training data. The task of collecting high-quality training data is costly and resource-intensive, as large amounts of data have to be annotated manually. A human-in-the-loop must have extensive domain expertise for certain tasks. In this paper, we present a novel framework to maximize the efficiency of human experts in a Machine Learning (ML) pipeline, optimizing the use of human experience in active learning. This is done by constantly measuring the quality of human experts’ input, as well as by involving humans only when needed. We showcase the benefits of our proposed framework by applying it to a problem in image classification, proving its usefulness to reduce the cost of annotating training data. Read the paper https://doi.org/10.1016/j.jii.2025.100827 and listen to the podcast below [WoS]

Tree smoothing: Post-hoc regularization of tree ensembles for interpretable machine learning (2025)

Random Forests (RFs) are powerful ensemble learning algorithms but tend to overfit noisy or irrelevant features, resulting in decreased generalization performance. We introduce Beta-Binomial Tree Smoothing (BBTS), a novel post-hoc regularization technique designed for RFs. Our method significantly enhances the accuracy and interpretability of machine learning models. Unlike conventional pruning methods that alter a tree’s structure, BBTS achieves regularization by adjusting node probabilities, emphasizing reliable class distributions closer to the root whilst diminishing the influence of potentially noisy splits deeper in the tree. This approach not only leads to superior predictive performance but also provides valuable insights into the model’s confidence in its predictions, a critical aspect for Explainable AI (XAI). Furthermore, the framework allows for the integration of human expert knowledge (human-in-the-loop), which can boost the model’s reliability and robustness, ultimately fostering greater trust in AI systems. This work sets a new benchmark for developing more transparent and trustworthy AI applications in critical domains. Read the paper https://doi.org/10.1016/j.ins.2024.121564 and listen to the podcast below. [WoS]
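
As a rough illustration of what "adjusting node probabilities" instead of altering the tree structure can look like, the sketch below shrinks each node's class proportion toward its parent's smoothed estimate using a Beta prior. The function name `smooth_path`, the `strength` parameter, and the root prior of 0.5 are assumptions made for this example only; the published BBTS method may differ in its details.

```python
from typing import List, Tuple


def smooth_path(counts: List[Tuple[int, int]], strength: float = 10.0) -> List[float]:
    """Smooth class-1 probabilities along one root-to-leaf path.

    counts[i] holds (positives, total) for the training samples that
    reach the node at depth i. Each node's estimate is the posterior
    mean of a Beta prior centred on the parent's smoothed estimate,
    updated with the node's own counts (a Beta-Binomial update).
    """
    smoothed = []
    prior_mean = 0.5  # hypothetical uninformative prior at the root
    for positives, total in counts:
        alpha = strength * prior_mean
        beta = strength * (1.0 - prior_mean)
        posterior_mean = (positives + alpha) / (total + alpha + beta)
        smoothed.append(posterior_mean)
        prior_mean = posterior_mean  # the child inherits this as its prior
    return smoothed


# A leaf with only 3 samples (all positive) has a raw estimate of 1.0;
# after smoothing it is pulled back toward its better-supported ancestors.
path = [(60, 100), (30, 40), (3, 3)]
print(smooth_path(path))
```

Nodes with many samples keep estimates close to their raw proportions, while sparsely populated deep nodes are dominated by the prior, which is the qualitative behaviour the abstract describes.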

Fine-tuning language model embeddings to reveal domain knowledge: An explainable artificial intelligence perspective on medical decision making (2025)

We propose a novel explainable AI (XAI) tool to accelerate data-driven AI cancer research. We apply the Bidirectional Encoder Representations from Transformers (BERT) model to German-language pathology reports, examining the effects of domain-specific language adaptation and fine-tuning. We demonstrate our model on a real-world pathology dataset, analyzing the contextual representations of diagnostic reports. By illustrating decisions made by fine-tuned models, we provide decision values that can be applied in medical research. In critical domains, inspection of the knowledge map in conjunction with expert evaluation reveals valuable information about how contextual representations of key features are categorized. This ultimately benefits data structuring and labeling and paves the way for even more advanced approaches to XAI that combine text with other input modalities, such as images, and are then applicable to various engineering problems. Read here: https://doi.org/10.1016/j.engappai.2024.109561 and listen below [WoS]

Class imbalance in multi-resident activity recognition: an evaluative study on explainability of deep learning approaches (2025)

Recognizing multiple residents’ activities is a pivotal domain within active assisted living technologies, where the diversity of actions in a multi-occupant home poses a challenge due to their unequal distribution. Frequent activities contrast with those occurring sporadically, necessitating adept handling of class imbalance to ensure the integrity of activity recognition systems based on raw sensor data. While deep learning has proven its merit in identifying activities for solitary residents within balanced datasets, its application to multi-resident scenarios requires careful consideration. This study tackles the issue of class imbalance and explores the efficacy of Long Short-Term Memory and Bidirectional Long Short-Term Memory networks in discerning activities of multiple residents, considering both individual and aggregate labeling of actions. Our research scrutinizes the explicability of deep learning models, enhancing their transparency and reliability, offering insights into the models’ behavior and contributing to the advancement of trustworthy multi-resident activity recognition systems. Read the paper: https://doi.org/10.1007/s10209-024-01123-0 and listen to the podcast below [WoS]

NiaAML: AutoML for classification and regression pipelines (2025)

In this paper we present NiaAML, an AutoML framework that we have developed for creating machine learning pipelines and hyperparameter tuning. The composition of machine learning pipelines is presented as an optimization problem that can be solved using various stochastic, population-based, nature-inspired algorithms. Nature-inspired algorithms are powerful tools for solving real-world optimization problems, especially those that are highly complex, nonlinear, and involve large search spaces where traditional algorithms may struggle. They are applied widely in various fields, including robotics, operations research, and bioinformatics. This paper provides a comprehensive overview of the software architecture and describes the main tasks of NiaAML, including the automatic composition of classification and regression pipelines. The overview is supported by a practical illustrative example – see on GitHub. Read the paper https://doi.org/10.1016/j.softx.2024.101974 and listen to the podcast below [WoS]

Post-hoc vs ante-hoc explanations: xAI design guidelines for data scientists (2024) Highly-Cited Paper in Journal Cognitive Systems Research

The growing field of explainable Artificial Intelligence (xAI) has given rise to a multitude of techniques and methodologies, yet this expansion has created a growing gap between existing xAI approaches and their practical application. This poses a considerable obstacle for data scientists striving to identify the optimal xAI technique for their needs. To address this problem, our study presents a customized decision support framework to aid data scientists in choosing a suitable xAI approach for their use-case. Drawing from a literature survey and insights from interviews with five experienced data scientists, we introduce a decision tree based on the trade-offs inherent in various xAI approaches, guiding the selection between six commonly used xAI tools. Our work critically examines six prevalent ante-hoc and post-hoc xAI methods, assessing their applicability in real-world contexts through expert interviews. The aim is to equip data scientists and policymakers with the capacity to select xAI methods that not only demystify the decision-making process, but also enrich user understanding and interpretation, ultimately advancing the application of xAI in practical settings. Read the paper here: https://doi.org/10.1016/j.cogsys.2024.101243 and listen to the podcast below [WoS]

Human-in-the-Loop Reinforcement Learning (2024) Highly-Cited Paper in Journal of Artificial Intelligence Research (JAIR)

Artificial intelligence (AI), particularly reinforcement learning (RL), enables autonomous task performance with superhuman capabilities. Despite this, RL remains fundamentally a Human-in-the-Loop (HITL) paradigm, especially when reward functions are hard to specify. Reinforcement Learning from Human Feedback (RLHF), as used in systems like ChatGPT, illustrates the value of integrating human input into the training process. HITL RL involves iterative refinement of agent behaviour through human feedback, underscoring the need for human-centric design. This paper reviews current explainability methods in HITL RL and proposes improvements using explainable AI (xAI) to support diverse user groups. Based on ML and software workflows, we define four human-involvement phases in HITL RL: Agent Development, Learning, Evaluation, and Deployment. For each phase, we examine human roles, explanation needs, challenges, and goals. We identify promising areas for xAI research and propose a vision for effective human-agent collaboration. Read the paper: https://doi.org/10.1613/jair.1.15348 and listen to the Podcast below [WoS]

On generating trustworthy counterfactual explanations

Deep learning models like ChatGPT exemplify AI success but necessitate trust, which can be achieved using counterfactual explanations. This is how humans become familiar with unknown processes: by understanding the hypothetical input circumstances under which the output changes. We argue that the generation of counterfactual explanations requires considering several aspects of the generated counterfactual instances. We present a framework that formulates its goal as a multiobjective optimization problem balancing 3 objectives: plausibility, intensity of changes, and adversarial power. We use a generative adversarial network to model the distribution of the input, along with a multiobjective counterfactual discovery solver balancing these objectives. We demonstrate the usefulness of the framework on 6 classification tasks with image/3D data, confirming the existence of a trade-off between objectives, the consistency of the counterfactual explanations with human knowledge, and the capability of the framework to unveil the existence of concept-based biases and misrepresented attributes in the input domain of the audited model. We hope to inspire further work on plausible counterfactuals in real-world scenarios where attribute-/concept-based annotations are available for the domain under analysis [WoS]
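
To make the notion of a trade-off between objectives concrete, the sketch below filters a set of candidate counterfactuals down to its Pareto front, i.e. the candidates that no other candidate beats on all three objectives at once. The objective values are made up, and it is assumed here that all three objectives have been expressed as quantities to minimise; this is a generic dominance filter, not the solver used in the paper.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep only the points that are not dominated by any other point."""
    return [
        p for p in points
        if not any(dominates(q, p) for q in points if q is not p)
    ]


# Hypothetical scores per candidate, all to be minimised:
# (implausibility, intensity of changes, lack of adversarial power)
candidates = [
    (0.2, 0.5, 0.4),
    (0.3, 0.2, 0.6),
    (0.4, 0.6, 0.7),  # worse than the first candidate on every objective
    (0.1, 0.9, 0.3),
]
front = pareto_front(candidates)
print(front)
```

The surviving candidates each represent a different compromise, for example a very plausible counterfactual that needs large changes versus a subtle one that is less plausible.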

The Next Frontier: AI We Can Really Trust

This is the keynote paper “The Next Frontier: AI We Can Really Trust” presented at the ECML-PKDD 2021 – Machine Learning and Principles and Practice of Knowledge Discovery in Databases: Holzinger A. (2021) The Next Frontier: AI We Can Really Trust. In: Kamp M. et al. (eds) Machine Learning and Principles and Practice of Knowledge Discovery in Databases. ECML PKDD 2021. Communications in Computer and Information Science, vol 1524. Springer, Cham. https://doi.org/10.1007/978-3-030-93736-2_33 This paper fosters a human-centered AI approach, because the use of AI in domains that impact human life has led to an increased demand for robust, explainable, and hence trustworthy AI. One possible step to make AI more robust is to combine statistical learning with knowledge representations. For certain tasks, it can be advantageous to include a human in the loop to make use of the fascinating abilities of the natural intelligence of humans. [paper, pdf, 752 KB] [Scholar] [WoS] 39

Digital Transformation in Smart Farm and Forest Operations Needs Human-Centered AI

In this paper we describe the use of trustworthy AI, fostered by explainability and robustness, in two domains that are important for mankind and our planet: agriculture and forestry. One step to make AI more robust is to use expert knowledge. For example, a farmer or a forester can bring in their experience and conceptual understanding to the AI pipeline. Consequently, human-centred AI (HCAI) can be seen as a combination of ‘artificial intelligence’ and ‘natural intelligence’ to strengthen, enhance and complement human performance, rather than replacing humans. To ensure practical success we introduce three Frontier Research Areas: (1) Intelligent Information Fusion, (2) Robotics and Embodied Intelligence, and (3) Augmentation, Explanation and Verification for Trustworthy Decision Support. [Scholar] [WoS]

Towards multi-modal causability with Graph Neural Networks enabling information fusion for explainable AI

Andreas Holzinger, Bernd Malle, Anna Saranti & Bastian Pfeifer (2021). Towards Multi-Modal Causability with Graph Neural Networks enabling Information Fusion for explainable AI. Information Fusion, 71, (7), 28-37, doi:10.1016/j.inffus.2021.01.008 In this paper our central hypothesis is that using conceptual knowledge as a guiding model of reality will help to train more explainable, more robust and less biased machine learning models, ideally able to learn from less data. One important aspect in many application domains is that various modalities contribute to one single result. Our main question is “How can we construct a multi-modal feature representation space (spanning images, text, genomics data) using knowledge bases as an initial connector for the development of novel explanation techniques?”. In this paper we argue for using Graph Neural Networks as a method-of-choice, enabling information fusion for multi-modal causability (causability, not to be confused with causality, is the measurable extent to which an explanation to a human expert achieves a specified level of causal understanding). [Project Page] [Scholar] [publons] 79

Classification by ordinal sums of conjunctive & disjunctive functions for explainable AI & interpretable machine learning

Miroslav Hudec, Erika Minarikova, Radko Mesiar, Anna Saranti & Andreas Holzinger (2021). Classification by ordinal sums of conjunctive and disjunctive functions for explainable AI and interpretable machine learning solutions. Knowledge Based Systems, doi:10.1016/j.knosys.2021.106916 In this paper we propose a novel classification according to aggregation functions of mixed behavior by variability in ordinal sums of conjunctive and disjunctive functions. Domain experts are empowered to assign only the most important observations regarding the considered attributes. This has the advantage that the variability of the functions provides opportunities for machine learning to learn the best possible option. Such a solution is comprehensible, reproducible and explainable-per-design to domain experts. We discuss the proposed approach with examples and outline the research steps in interactive machine learning with a human-in-the-loop over aggregation functions. Although human experts are not always able to explain anything either, they are sometimes able to bring in experience, contextual understanding and implicit knowledge, which is desirable in certain machine learning tasks and can contribute to the robustness of algorithms. [Project Page] [Scholar] [publons] 03

KANDINSKYPatterns - An experimental exploration environment for Pattern Analysis and Machine Intelligence

Andreas Holzinger, Anna Saranti & Heimo Mueller (2021). arXiv 2103.00519. Machine intelligence is successful at recognition tasks when high-quality data is available. There is still a gap between machine-level pattern recognition and human-level concept learning. Humans can learn under uncertainty from few examples and generalize these concepts to solve new problems. The growing interest in explainable AI requires diagnostic tests to analyze weaknesses in existing approaches. In this paper, we discuss existing diagnostic test data sets such as CLEVR, CLEVRER, CLOSURE, CURI, Bongard-LOGO, V-PROM, and present our own experimental environment: KANDINSKYPatterns, named after Wassily Kandinsky, who made contributions to compositivity, i.e. that all perceptions consist of geometrically elementary individual components. This was experimentally proven by Hubel & Wiesel in the 1960s and became the basis for machine learning approaches such as the Neocognitron and later Deep Learning. While KANDINSKYPatterns (KP) have computationally controllable properties, bringing ground truth, they are also distinguishable by human observers, i.e., controlled patterns can be described by both humans and algorithms, making them another important contribution to international research in machine intelligence. [Project Page] [Scholar]

Artificial Intelligence and Machine Learning for Digital Pathology

Andreas Holzinger, Randy Goebel, Michael Mengel, Heimo Mueller (eds.) 2020. Artificial Intelligence and Machine Learning for Digital Pathology: State-of-the-Art and Future Challenges, Springer Lecture Notes in Artificial Intelligence Volume 12090, doi:10.1007/978-3-030-50402-1 Data-driven AI and ML in digital pathology, radiology, and dermatology is promising. In specific cases Deep Learning even exceeds human performance; however, in the context of medicine it is important for a human expert to verify the outcome. There is a need for transparency and re-traceability of state-of-the-art solutions to make them usable for ethically responsible medical decision support. Moreover, big data is required for training, covering a wide spectrum of human diseases in different organ systems. These data sets must meet top-quality and regulatory criteria and must be well annotated for ML at patient-, sample-, and image-level. Mentioned by WHO as an important application of AI for human health [Book Homepage]

Measuring the Quality of Explanations: The System Causability Scale (SCS). Comparing Human and Machine Explanations.

Andreas Holzinger, Andre Carrington & Heimo Müller 2020. Measuring the Quality of Explanations: The System Causability Scale (SCS). Comparing Human and Machine Explanations. KI – Künstliche Intelligenz (German Journal of Artificial intelligence), Special Issue on Interactive Machine Learning, Edited by Kristian Kersting, TU Darmstadt, 34, (2), doi:10.1007/s13218-020-00636-z, available online via https://link.springer.com/article/10.1007/s13218-020-00636-z In this paper we introduce the System Causability Scale (SCS) to measure the quality of explanations. It is based on the notion of Causability (Holzinger et al., 2019) combined with concepts adapted from the widely accepted System Usability Scale (SUS). In the same way as usability measures the quality of use, Causability measures the quality of explanations. [xAI-Project] [Scholar] [publons] 68
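
A questionnaire-based scale of this kind is typically turned into a single number by normalising the summed ratings. The sketch below assumes Likert ratings from 1 to 5 and a score equal to the sum of ratings divided by the maximum attainable sum; the item count, the rating range, and the function name `scs_score` are assumptions for illustration and should be checked against the paper before use.

```python
from typing import Sequence


def scs_score(ratings: Sequence[int], scale_max: int = 5) -> float:
    """Normalise summed Likert ratings to a score between 0 and 1."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > scale_max for r in ratings):
        raise ValueError(f"ratings must lie between 1 and {scale_max}")
    return sum(ratings) / (scale_max * len(ratings))


# Hypothetical answers of one participant to ten statements.
answers = [5, 4, 5, 4, 5, 5, 4, 5, 5, 5]
print(scs_score(answers))
```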

Causability and Explainability of Artificial Intelligence in Medicine

Andreas Holzinger, Georg Langs, Helmut Denk, Kurt Zatloukal & Heimo Mueller 2019. Causability and Explainability of AI in Medicine. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, doi:10.1002/widm.1312 In this paper we introduce the notion of Causability, which extends explainability and is of great importance for future Human-AI interfaces (see our paper on dialogue systems for intelligent user interfaces). Such interfaces for explainable AI have to map the technical explainability (which is a property of an AI, e.g. the heatmap of a neural network produced by e.g. layer-wise relevance propagation) to causability (which is a property of a human, i.e. the extent to which the technical explanation is interpretable by a human) and to answer questions of why we need a ground truth, i.e. a framework for understanding. Here counterfactuals $P(y_x \mid x', y')$ are important, with the typical activity of “retrospection” and questions including “what-if?” [Systems Causability Scale] [Scholar] [publons] 287

KANDINSKY Patterns: A Swiss-Knife for the Study of Explainable AI

Andreas Holzinger, Peter Kieseberg & Heimo Müller 2020. KANDINSKY Patterns: A Swiss-Knife for the Study of Explainable AI. ERCIM News, (120), 41-42. [pdf, 755 KB] Available online: https://ercim-news.ercim.eu/en120/r-i/kandinsky-patterns-a-swiss-knife-for-the-study-of-explainable-ai Kandinsky Patterns enable testing, benchmarking and evaluating machine learning algorithms under mathematically strictly controllable conditions, but at the same time are accessible and understandable for human observers and offer the possibility to produce (and hide) a ground truth. This helps us in understanding “how do humans explain?” and in doing basic research on ground truth. This is important, as adversarial examples have already demonstrated their potential in attacking security mechanisms applied in various domains, especially medical environments. Last, but not least, Kandinsky Patterns can be used to produce “counterfactuals” – the “what if”, which is difficult to handle for both humans and machines – but can provide novel insights into the behaviour of explanation methods. [Project page]

A new concordant partial AUC and partial c statistic for imbalanced data in the evaluation of machine learning algorithms

André M. Carrington, Paul W. Fieguth, Hammad Qazi, Andreas Holzinger, Helen H. Chen, Franz Mayr & Douglas G. Manuel 2020. A new concordant partial AUC and partial c statistic for imbalanced data in the evaluation of machine learning algorithms. Springer/Nature BMC Medical Informatics and Decision Making, 20, (1), 4, doi:10.1186/s12911-019-1014-6. In explainable AI a very important issue is the robustness of machine learning algorithms. For measuring robustness, we introduce a novel concordant partial Area Under the Curve (AUC) and a new partial c statistic for Receiver Operating Characteristic (ROC) data, as foundational measures to help to understand and to explain ROC and AUC. Our partial measures are continuous and discrete versions of the same measure, are derived from the AUC and c statistic respectively, are validated as equal to each other, and validated as equal in summation to whole measures where expected. [relevant for xAI] [Scholar] [publons] 38
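
For readers unfamiliar with partial measures, the sketch below computes an ordinary partial AUC, i.e. the area under a ROC curve restricted to an interval of false positive rates, using the trapezoidal rule. This is the conventional partial AUC and not the concordant variant introduced in the paper; the ROC points are made up. It does illustrate one property the abstract mentions for the new measures, namely that partial areas over adjacent intervals sum to the whole AUC.

```python
from typing import List, Tuple


def partial_auc(roc: List[Tuple[float, float]], fpr_lo: float, fpr_hi: float) -> float:
    """Trapezoidal area under ROC points (fpr, tpr) within [fpr_lo, fpr_hi].

    The points must be sorted by increasing fpr. Segments that cross an
    interval boundary are cut there by linear interpolation.
    """
    area = 0.0
    for (x0, y0), (x1, y1) in zip(roc, roc[1:]):
        lo, hi = max(x0, fpr_lo), min(x1, fpr_hi)
        if hi <= lo:
            continue  # segment is vertical or lies outside the interval
        y_lo = y0 + (y1 - y0) * (lo - x0) / (x1 - x0)
        y_hi = y0 + (y1 - y0) * (hi - x0) / (x1 - x0)
        area += 0.5 * (y_lo + y_hi) * (hi - lo)
    return area


# Hypothetical ROC curve as (false positive rate, true positive rate) pairs.
roc_points = [(0.0, 0.0), (0.1, 0.6), (0.3, 0.8), (1.0, 1.0)]

whole = partial_auc(roc_points, 0.0, 1.0)
low_fpr = partial_auc(roc_points, 0.0, 0.2)
high_fpr = partial_auc(roc_points, 0.2, 1.0)
print(whole, low_fpr, high_fpr)
```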

KANDINSKY Patterns as Intelligence Test for machines

Andreas Holzinger, Michael Kickmeier-Rust & Heimo Mueller 2019. KANDINSKY Patterns as IQ-Test for machine learning. Springer Lecture Notes LNCS 11713. Cham (CH): Springer Nature Switzerland, pp. 1-14, doi:10.1007/978-3-030-29726-8_1. AI follows the notion of human intelligence, which is not a clearly defined term but, according to cognitive science, includes the abilities to think abstractly, to reason, and to solve problems from the real world. A hot topic in current AI/machine learning research is to find out whether and to what extent algorithms are able to learn abstract thinking and reasoning as humans do, or whether the learning remains based on purely statistical correlations. In this paper we propose to use our Kandinsky Patterns as an IQ-Test for machines and to study concept learning, which is a fundamental problem for future AI/ML. [Paper] [exploration environment] [TEDx] [Scholar] [publons] 11

Dialogue Systems for Intelligent Human Computer Interactions

Erinc Merdivan, Deepika Singh, Sten Hanke & Andreas Holzinger 2019. Dialogue Systems for Intelligent Human Computer Interactions. Electronic Notes in Theoretical Computer Science, 343, 57-71, doi:10.1016/j.entcs.2019.04.010. Available online via: https://www.sciencedirect.com/science/article/pii/S1571066119300106 In this paper we present some fundamentals of communication techniques for interaction in dialogues involving speech, gesture, semantic and pragmatic knowledge, and present a new image-based method in an Out Of Vocabulary setting. The results show that using dialogue as an image performs well and, in comparison to Memory Networks, helps the dialogue manager in expanding out-of-vocabulary dialogue tasks. This is important for future Human-AI interfaces. [relevant for xAI] [Scholar] [publons] 13

The first publication on our KANDINSKY Universe, the experimental environment for explainability and causability

Mueller, H. & Holzinger, A. 2021. Kandinsky Patterns. Artificial intelligence, 300, (11), 103546, doi:10.1016/j.artint.2021.103546 In the medical domain (e.g. histopathology) the ground truth is in generally accepted textbooks, hence in the brain of the human pathologist, but often not directly accessible. Here the KANDINSKY Figures and KANDINSKY Patterns come into play: those are mathematically-logically describable, simple, self-contained, hence controllable test data sets for the development, validation and training of explainability/interpretability in artificial intelligence (AI) and machine learning (ML). While they possess these computationally manageable properties, they are at the same time easily distinguishable by human observers, so can be described by both humans and algorithms. We invite the international machine learning research community to a challenge to experiment with our Kandinsky Patterns to expand and thus make progress in the field of explainable AI and to contribute to the upcoming field of explainability and causability. [Project Page] [Scholar] [publons] 01

Interactive machine learning: experimental evidence for the human in the algorithmic loop: A case study on Ant Colony Optimization

Andreas Holzinger, Markus Plass, Michael Kickmeier-Rust, Katharina Holzinger, Gloria Cerasela Crişan, Camelia-M. Pintea & Vasile Palade 2019. Interactive machine learning: experimental evidence for the human in the algorithmic loop. Applied Intelligence, 49, (7), 2401-2414, doi:10.1007/s10489-018-1361-5. Online available: https://link.springer.com/article/10.1007/s10489-018-1361-5 In this paper we provide novel experimental insights on how we can improve computational intelligence by complementing it with human intelligence in an interactive machine learning approach (iML). For this purpose, we used the Ant Colony Optimization (ACO) framework, because this fosters multi-agent approaches with human agents in the loop (see when we need a human-in-the-loop). We propose a unification between human intelligence and interaction abilities and the computing power of an artificial machine learning system. The “human-in-the-loop” brings in conceptual knowledge that no algorithm on this planet yet has. [Scholar] [publons] 73

Evaluating the Quality of Machine Learning Explanations: A Survey on Methods and Metrics

Zhou, J., Gandomi, A. H., Chen, F. & Holzinger, A. 2021. Evaluating the Quality of Machine Learning Explanations: A Survey on Methods and Metrics. Electronics, 10, (5), 593, doi:10.3390/electronics10050593 While numerous explanation methods have been explored, there is a need for evaluations to quantify the quality of explanation methods to determine whether and to what extent the offered explainability achieves the defined objective, and compare available explanation methods and suggest the best explanation from the comparison for a specific task. This survey paper presents a comprehensive overview of methods proposed in the current literature for the evaluation of ML explanations. We identify properties of explainability from the review of definitions of explainability. The identified properties of explainability are used as objectives that evaluation metrics should achieve. The survey found that the quantitative metrics for both model-based and example-based explanations are primarily used to evaluate the parsimony/simplicity of interpretability, while the quantitative metrics for attribution-based explanations are primarily used to evaluate the soundness of fidelity of explainability. The survey also demonstrated that subjective measures, such as trust and confidence, have been embraced as the focal point for the human-centered evaluation of explainable systems. The paper concludes that the evaluation of ML explanations is a multidisciplinary research topic. It is also not possible to define an implementation of evaluation metrics, which can be applied to all explanation methods. [Scholar] [publons] 22

From Computer Innovation to Human Integration: Current Trends and Challenges for Pervasive Health Technologies

Röcker, C., Ziefle, M. & Holzinger, A. 2014. From Computer Innovation to Human Integration: Current Trends and Challenges for Pervasive Health Technologies. In: Holzinger, Andreas, Ziefle, Martina & Röcker, Carsten (eds.) Pervasive Health: State-of-the-Art and Beyond. Springer London, pp. 1-17, doi:10.1007/978-1-4471-6413-5_1 This chapter starts with an overview of the technical innovations and societal transformation processes we have seen in the last decades, as well as the consequences those changes have for the design of pervasive healthcare systems. Based on this theoretical foundation, emerging design requirements and research challenges are outlined, which are crucial to address when developing future health technologies. [Scholar] [publons] 38

Rapid prototyping for a Virtual Medical Campus interface

Holzinger, A. 2004. Rapid Prototyping to the User Interface Development for a Virtual Medical Campus. IEEE Software, 21, (1), 92–99, doi:10.1109/MS.2004.1259241. This best-practice paper describes the design of a Virtual Medical Campus (VMC). Working under a strict timeline, we used simple, rapid, cost-effective prototyping techniques to create a user interface and release a working system within six months. Involving users early in the interface design facilitated acceptance. The VMC system architecture includes a multimedia repository for reusable learning objects; middleware containing the VMC logic that arranges learning objects into lectures, themes, and modules; and a user interface. To ensure the interface suited the target population, we used the user-centered design method. This pioneering project began in 2002 and was one of the first large-scale projects worldwide to develop a virtual campus system supporting full, large-scale online learning. Twenty years later, in times of the COVID-19 pandemic, all this has become important again. [Scholar] [publons] 37

Why imaging data alone is not enough: AI-based integration of imaging, omics, and clinical data

Andreas Holzinger, Benjamin Haibe-Kains & Igor Jurisica 2019. Why imaging data alone is not enough: AI-based integration of imaging, omics, and clinical data. European Journal of Nuclear Medicine and Molecular Imaging, 46, (13), 2722-2730, doi:10.1007/s00259-019-04382-9. Integration of clinical, imaging, molecular data is necessary to understand complex diseases, and to achieve accurate diagnosis to provide the best possible treatment. In addition to the need for sufficient computing resources, suitable algorithms, models, and data infrastructure, three important aspects are often neglected: (1) the need for multiple independent, sufficiently large and, above all, high-quality data sets; (2) the need for domain knowledge and ontologies; and (3) the requirement for multiple networks that provide relevant relationships among biological entities. While one will always get results out of high-dimensional data, all three aspects are essential to provide robust training and validation of ML models, to provide explainable hypotheses and results, and to achieve the necessary trust in AI and confidence for clinical applications. [Preprint available here]

Human Activity Recognition Using Recurrent Neural Networks

Deepika Singh, Erinc Merdivan, Ismini Psychoula, Johannes Kropf, Sten Hanke, Matthieu Geist & Andreas Holzinger 2017. Human Activity Recognition Using Recurrent Neural Networks. In: Lecture Notes in Computer Science LNCS 10410. Cham: Springer International, pp. 267-274, doi:10.1007/978-3-319-66808-6_18. In this paper, we introduce a deep learning model that learns to classify human activities without using any prior knowledge. For this purpose, a Long Short Term Memory (LSTM) Recurrent Neural Network was applied to three real world smart home datasets. The results of our experiments show that the proposed approach outperforms existing ones in terms of accuracy and performance. Human activity recognition using smart home sensors is one of the bases of ubiquitous computing in smart environments and a topic undergoing intense research in the field of ambient assisted living. The increasingly large number of data sets calls for machine learning methods. https://arxiv.org/abs/1804.07144

Augmenting Statistical Data Dissemination by Short Quantified Sentences of Natural Language

Miroslav Hudec, Erika Bednárová & Andreas Holzinger 2018. Augmenting Statistical Data Dissemination by Short Quantified Sentences of Natural Language. Journal of Official Statistics (JOS), 34, (4), 981, doi:10.2478/jos-2018-0048. Online available: https://content.sciendo.com/view/journals/jos/34/4/article-p981.xml In this paper we study the potential of natural language summaries expressed in short quantified sentences. Linguistic summaries are not intended to replace existing dissemination approaches, but can augment them by providing alternatives for the benefit of diverse users (e.g. domain experts, general public, disabled people, …). The concept of linguistic summaries is demonstrated on test interfaces, which can be important for future human-AI dialogue systems. [relevant for xAI]
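
To make the idea of a short quantified sentence concrete, the following is a minimal sketch of how the truth value of a summary such as "most regions have a high value" could be computed with a fuzzy relative quantifier. It is an illustration only, not the method of the paper: the function names, the membership breakpoints (40/60 for "high", 0.5/0.85 for "most") and the sample data are assumptions chosen for the example.

```python
from typing import Callable, Sequence


def high(x: float, low: float = 40.0, full: float = 60.0) -> float:
    """Membership degree of x in the fuzzy set 'high' (assumed breakpoints)."""
    if x <= low:
        return 0.0
    if x >= full:
        return 1.0
    return (x - low) / (full - low)


def most(p: float) -> float:
    """Fuzzy relative quantifier 'most' over a proportion p in [0, 1].

    Assumed shape: 0 up to 0.5, 1 from 0.85, linear in between.
    """
    if p <= 0.5:
        return 0.0
    if p >= 0.85:
        return 1.0
    return (p - 0.5) / 0.35


def truth_of_summary(
    values: Sequence[float],
    predicate: Callable[[float], float],
    quantifier: Callable[[float], float],
) -> float:
    """Truth value of 'Q records satisfy P': quantifier of the mean membership."""
    if not values:
        raise ValueError("cannot summarize an empty data set")
    proportion = sum(predicate(v) for v in values) / len(values)
    return quantifier(proportion)


if __name__ == "__main__":
    # Hypothetical indicator values for five regions.
    sample = [70, 65, 80, 30, 90]
    print(f"'Most regions have a high value' holds to degree "
          f"{truth_of_summary(sample, high, most):.2f}")
```

A sentence would then be shown to the user only if its truth value exceeds some chosen threshold; the threshold, like the membership functions, is a design decision that depends on the data and the audience.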

Computational approaches for mining user’s opinions on the Web 2.0

Gerald Petz, Michał Karpowicz, Harald Fuerschuss, Andreas Auinger, Vaclav Stritesky & Andreas Holzinger 2014. Computational approaches for mining user’s opinions on the Web 2.0. Information Processing & Management, 50, (6), 899-908, doi:10.1016/j.ipm.2014.07.005. Computational opinion mining discovers, extracts and analyzes people’s opinions, sentiments, attitudes and emotions towards certain topics in social media. While providing interesting market research information, the user generated content presents numerous challenges regarding systematic analysis, the differences and unique characteristics of the various social media channels. Here we report on the determination of such particularities, and deduce their impact on text preprocessing and opinion mining algorithms (sentiment analysis). [RG] [publons] 25

Explainable AI: The New 42?

Randy Goebel, Ajay Chander, Katharina Holzinger, Freddy Lecue, Zeynep Akata, Simone Stumpf, Peter Kieseberg & Andreas Holzinger 2018. Explainable AI: the new 42? Springer Lecture Notes in Computer Science LNCS 11015. Cham: Springer, pp. 295-303, doi:10.1007/978-3-319-99740-7_21. In this 2018 output of our yearly xAI-workshop at the CD-MAKE conference we discuss some issues of the current state-of-the-art in what is now called explainable AI and outline what we think is the next big thing in AI/machine learning: the combination of statistical probabilistic machine learning methods with classic logic-based symbolic artificial intelligence. Maybe the field of explainable AI can act as an ideal bridge to combine these two worlds. [pdf, 875 kB]

Integrated web visualizations for protein-protein interaction databases

Jeanquartier, F., Jean-Quartier, C. & Holzinger, A. 2015. Integrated web visualizations for protein-protein interaction databases. BMC Bioinformatics, 16, (1), 195, doi:10.1186/s12859-015-0615-z Understanding living systems is crucial for curing diseases. To achieve this task we have to understand biological networks based on protein-protein interactions. Bioinformatics has come up with a great amount of databases and tools that support analysts in exploring protein-protein interactions on an integrated level for knowledge discovery. They provide predictions and correlations, indicate possibilities for future experimental research and fill the gaps to complete the picture of biochemical processes. There are numerous and huge databases of protein-protein interactions used to gain insights into answering some of the many questions of systems biology. Many computational resources integrate interaction data with additional information on molecular background. However, the vast number of diverse Bioinformatics resources poses an obstacle to the goal of understanding. We present a survey of databases that enable the visual analysis of protein networks. We selected M=10 out of N=53 resources supporting visualization, and we tested against the following set of criteria: interoperability, data integration, quantity of possible interactions, data visualization quality and data coverage. The study reveals differences in usability, visualization features and quality as well as the quantity of interactions. StringDB is the recommended first choice. CPDB presents a comprehensive dataset and IntAct lets the user change the network layout. A comprehensive comparison table is available via web. [Scholar] [WoS]

Interpretierbare KI: Neue Methoden zeigen Entscheidungswege künstlicher Intelligenz auf

Andreas Holzinger 2018. Interpretierbare KI: Neue Methoden zeigen Entscheidungswege künstlicher Intelligenz auf. c’t Magazin für Computertechnik, 22, 136-141. Machine learning today produces AI systems that make decisions faster than a human can. But should humans allow themselves to be disempowered? New methods make decision paths traceable, interpretable and thus transparent, and in doing so create trust and acceptance, or they uncover misunderstandings. Humans can (sometimes, not always) understand relationships in context and generalize from a few examples. A human expert can help where AI reaches its limits, but AI can also help where humans reach theirs. Physicians can be relieved of monotonous routine tasks, while AI systems and human experts together make better decisions than either would alone [pdf, 871 kB]. Available online: https://www.heise.de/select/ct/2018/22/1540263049336608

Explainable AI

Andreas Holzinger 2018. Explainable AI (ex-AI). Informatik-Spektrum, 41, (2), 138-143, doi:10.1007/s00287-018-1102-5. “Explainable AI” is not a new field. The problem of explainability is as old as AI itself; indeed, it is a result of AI itself. While the rule-based solutions of early AI were comprehensible “glass-box” approaches, their weakness lay in dealing with the uncertainties of the real world. With the introduction of probabilistic modeling and statistical learning methods, applications became increasingly successful, but also increasingly complex and opaque. For example, words of natural language are mapped to high-dimensional vectors and thereby become unintelligible to humans. In the future, context-adaptive methods will be needed that link statistical learning methods with large knowledge representations (ontologies) and allow traceability, understandability and explainability, which is the goal of “explainable AI”. Available online: https://link.springer.com/article/10.1007/s00287-018-1102-5

Emotion Detection: Application of the Valence Arousal Space for Rapid Biological Usability Testing

Stickel, C., Ebner, M., Steinbach-Nordmann, S., Searle, G. & Holzinger, A. 2009. Emotion Detection: Application of the Valence Arousal Space for Rapid Biological Usability Testing to Enhance Universal Access. In: Stephanidis, Constantine (ed.) Universal Access in Human-Computer Interaction. Addressing Diversity, Lecture Notes in Computer Science, LNCS 5614. Berlin, Heidelberg: Springer, pp. 615-624, doi:10.1007/978-3-642-02707-9_70 Emotion is an important mental and physiological state, in times of AI even more important, influencing cognition, perception, learning, communication, decision making, etc. It is considered a definitely important aspect of user experience (UX), although it is the least well developed and, above all, lacks experimental evidence. This paper deals with an application for emotion detection in usability testing of software. It describes the approach to utilize the valence arousal space for emotion modeling in a formal experiment. Our study revealed correlations between low performance and negative emotional states. Reliable emotion detection in usability tests will help to prevent negative emotions and attitudes in the final products. [Scholar] [publons] 34

Human Annotated Dialogues Dataset for Natural Conversational Agents

Erinc Merdivan, Deepika Singh, Sten Hanke, Johannes Kropf, Andreas Holzinger & Matthieu Geist 2020. Human Annotated Dialogues Dataset for Natural Conversational Agents. Applied Sciences, 10, (3), 1-16, doi:10.3390/app10030762. [Scholar] We developed a benchmark dataset with human annotations and replies, useful to develop metrics for conversational agents. This is relevant for the xAI research community, because conversational agents are gaining huge popularity in industrial applications (e.g. digital assistants, chatbots, and particularly systems for natural language understanding (NLU), for medical decision support). A major drawback is the unavailability of a common metric to evaluate the replies against human judgement for conversation agents. Human responses include: (i) ratings of the dialogue reply in relevance to the dialogue history; and (ii) unique dialogue replies for each dialogue history from the users. This enables evaluating models against six unique human responses for each given history. Detailed analysis on how dialogues are structured and human perception on dialogue score in comparison with existing models are also presented.

Convolutional and Recurrent Neural Networks for Activity Recognition in Smart Environment

Singh, D., Merdivan, E., Hanke, S., Kropf, J., Geist, M. & Holzinger, A. 2017. Convolutional and Recurrent Neural Networks for Activity Recognition in Smart Environment. In: Holzinger, Andreas, Goebel, Randy, Ferri, Massimo & Palade, Vasile (eds.) Towards Integrative Machine Learning and Knowledge Extraction: BIRS Workshop, Banff, AB, Canada, July 24-26, 2015, Revised Selected Papers. Cham: Springer International Publishing, pp. 194–205, doi:10.1007/978-3-319-69775-8_12 Convolutional Neural Networks (CNN) are very useful for fully automatic extraction of discriminative features from raw sensor data. This is an important problem in activity recognition, which is of enormous interest in ambient sensor environments due to its universality on various applications. Activity recognition in smart homes uses large amounts of time-series sensor data to infer daily living activities and to extract effective features from those activities, which is a challenging task. In this paper we demonstrate the use of the CNN and a comparison of results, which has been performed with Long Short Term Memory (LSTM), recurrent neural networks and other machine learning algorithms, including Naive Bayes, Hidden Markov Models, Hidden Semi-Markov Models and Conditional Random Fields. [Scholar] [WoS]

Towards a Deeper Understanding of How a Pathologist Makes a Diagnosis

Birgit Pohn, Michaela Kargl, Robert Reihs, Andreas Holzinger, Kurt Zatloukal & Heimo Müller. Towards a Deeper Understanding of How a Pathologist Makes a Diagnosis: Visualization of the Diagnostic Process in Histopathology. IEEE Symposium on Computers and Communications (ISCC 2019), 2019 Barcelona. IEEE, 1081-1086, doi:10.1109/ISCC47284.2019.8969598. Advancements in Artificial Intelligence (AI) and Machine Learning (ML) are enabling new diagnostic capabilities. In this paper we argue that the very first step before introducing AI/ML into diagnostic workflows is a deep understanding of how pathologists work. We developed a visualization concept, including: (a) the sequence of the views observed by the pathologist (Observation Path), (b) the sequence of the spoken comments and statements of the pathologist (Dictation Path), (c) the underlying knowledge and experience of the pathologist (Knowledge Path), (d) information about the current phase of the diagnostic process and (e) the current magnification factor of the microscope chosen by the pathologist. This is highly important for explainable AI [Paper] [Scholar]

NLP for the Generation of Training Data Sets for Ontology-Guided Weakly-Supervised Machine Learning in Digital Pathology

Robert Reihs, Birgit Pohn, Kurt Zatloukal, Andreas Holzinger & Heimo Müller. NLP for the Generation of Training Data Sets for Ontology-Guided Weakly-Supervised Machine Learning in Digital Pathology. 2019 IEEE Symposium on Computers and Communications (ISCC), 2019. IEEE, 1072-1076, doi:10.1109/ISCC47284.2019.8969703. The combination of ontologies with machine learning (ML) approaches is a hot topic that has not yet been extensively investigated but has great future potential, particularly for explainable AI and interpretable machine learning. Since full annotation on pixel level would be impracticably expensive, a practical solution lies in weakly-supervised ML. In this paper we used ontology-guided natural language processing (NLP) for term extraction and a decision tree built with an expert-curated classification system. This demonstrates the practical value of our solution to analyze and structure training data sets for ML and as a tool for the generation of biobank catalogues. [xAI-Project] [Scholar] [RG]

In silico modeling for tumor growth visualization

Fleur Jeanquartier, Claire Jean-Quartier, David Cemernek & Andreas Holzinger 2016. In silico modeling for tumor growth visualization. BMC Systems Biology, 10, (1), 1-15, doi:10.1186/s12918-016-0318-8.

In-silico methods overcome the lack of wet experimental possibilities and, as dry methods, succeed in terms of reduction, refinement and replacement of animal experimentation, also known as the 3R principles. Our visualization approach to simulation allows for more flexible usage and easy extension to facilitate understanding and gain novel insight. Biomedical research in general and research on tumor growth in particular will benefit from the systems biology perspective. We aim to provide a comprehensive and expandable simulation tool for visualizing tumor growth. This novel Web-based application offers the advantage of a user-friendly graphical interface with several manipulable input variables to correlate different aspects of tumor growth. [Paper] [Scholar]

In silico cancer research towards 3R

Claire Jean-Quartier, Fleur Jeanquartier, Igor Jurisica & Andreas Holzinger 2018. In silico cancer research towards 3R. Springer/Nature BMC cancer, 18, (1), 408, doi:10.1186/s12885-018-4302-0

Underlining and extending the in-silico approach with respect to the 3Rs (replacement, reduction, refinement) will lead cancer research towards efficient and effective precision medicine. Therefore, we suggest refined translational models and testing methods based on integrative analyses and the incorporation of computational biology within cancer research. We give an overview of in vivo, in vitro and in silico methods used in cancer research. Common models such as cell lines, xenografts, or genetically modified rodents reflect relevant pathological processes to different degrees, but cannot replicate the full spectrum of human disease. Computational biology is of increasing importance, advancing from the task of assisting biological analysis with network biology approaches as the basis for understanding a cell’s functional organization up to model building for predictive systems. [Paper] [Scholar]

From extreme programming & usability engineering to extreme usability in software engineering education (XP+UE > XU)

The success of extreme programming (XP) is based, among other things, on optimal communication in teams of 6-12 persons, simplicity, frequent releases and a reaction to changing demands. Most of all, the customer is integrated into the development process, with constant feedback. This is very similar to usability engineering (UE), which follows a spiral four-phase procedure model (analysis, draft, development, test) and a three-step (paper mock-up, prototype, final product) production model. In comparison, these phases are extremely shortened in XP; also, the ideal team size in UE user-centered development is 4-6 people, including the end-user. The two development approaches have different goals but, at the same time, employ similar methods to achieve them. It seems obvious that there must be synergy in combining them. The authors present ideas on how to combine them in an even more powerful development method called extreme usability (XU). The most important issue of this paper is that the authors have embedded their ideas into software engineering education. [Scholar]

Biomedical informatics: Discovering knowledge in big data

This book provides a broad overview of the topic Bioinformatics with focus on data, information and knowledge. From data acquisition and storage to visualization, ranging through privacy, regulatory and other practical and theoretical topics, the author touches several fundamental aspects of the innovative interface between Medical and Technology domains that is Biomedical Informatics. Each chapter starts by providing a useful inventory of definitions and commonly used acronyms for each topic and throughout the text, the reader finds several real-world examples, methodologies and ideas that complement the technical and theoretical background. This new edition includes new sections at the end of each chapter, called “future outlook and research avenues,” providing pointers to future challenges. At the beginning of each chapter a new section called “key problems”, has been added, where the author discusses possible traps and unsolvable or major problems. https://www.springer.com/de/book/9783319045276

Expectations of Artificial Intelligence for Pathology

Peter Regitnig, Heimo Mueller & Andreas Holzinger 2020. Expectations of Artificial Intelligence in Pathology. Springer Lecture Notes in Artificial Intelligence LNAI 12090. Cham: Springer, pp. 1-15, doi:10.1007/978-3-030-50402-1_1 [For students, pdf, 1.3 MB]

Within the last ten years, essential steps have been made to bring artificial intelligence (AI) successfully into the field of pathology. However, most medical experts are still far away from using AI in daily practice. This paper focuses on tasks which could be solved, and which could be done better, by AI or image-based algorithms compared to a human expert. In particular, this paper focuses on the needs and demands of surgical pathologists; examples include: finding small tumour deposits within lymph nodes, detection and grading of cancer, quantification of positive tumour cells in immunohistochemistry, pre-check of Papanicolaou-stained gynaecological cytology in cervical cancer screening, text feature extraction, text interpretation for tumour-coding error prevention and AI in the next-generation virtual autopsy.

Legal, regulatory, ethical frameworks for standards in artificial intelligence and autonomous robotic surgery

Shane O’Sullivan, Nathalie Nevejans, Colin Allen, Andrew Blyth, Simon Leonard, Ugo Pagallo, Katharina Holzinger, Andreas Holzinger, Mohammed Imran Sajid & Hutan Ashrafian 2019. Legal, regulatory, and ethical frameworks for development of standards in artificial intelligence (AI) and autonomous robotic surgery. The International Journal of Medical Robotics and Computer Assisted Surgery, 15, (1), 1-12, doi:10.1002/rcs.1968. We classify responsibility into (1) Accountability; (2) Liability; and (3) Culpability. All three aspects were addressed when discussing responsibility for AI and autonomous surgical robots, be these civil or military patients (however, these aspects may require revision in cases where robots become citizens). The component which produces the least clarity is Culpability, since it is unthinkable in the current state of technology. We envision that in the near future a surgical robot can learn and perform routine operative tasks that can then be supervised by a human surgeon. This represents a surgical parallel to autonomously driven vehicles. Here a human remains in the ‘driving seat’ as a ‘doctor‐in‐the‐loop’ thereby safeguarding patients undergoing operations that are supported by surgical machines with autonomous capabilities.

Analysis of biomedical data with multilevel glyphs

Heimo Müller, Robert Reihs, Kurt Zatloukal & Andreas Holzinger 2014. Analysis of biomedical data with multilevel glyphs. BMC Bioinformatics, 15, (Suppl 6), S5, doi:10.1186/1471-2105-15-S6-S5. We present multilevel data glyphs optimized for interactive knowledge discovery and visualization of large biomedical data sets. Data glyphs are 3D objects defined by multiple levels of geometric descriptions (levels of detail) combined with a mapping of data attributes to graphical elements and methods, which specify their spatial position. In the data mapping phase, meta information about the attributes (scale, number of distinct values) is compared with the visual capabilities of the graphical elements in order to give the user feedback about the correctness of the variable mapping. The spatial arrangement of glyphs is done in a dimetric view, which leads to high data density, simplifies 3D navigation and avoids perspective distortion. We show the usage of data glyphs in the disease analyser for personalized medicine, where they are successfully applied. In particular, the automatic validation of the data mapping, the selection of subgroups within histograms and the visual comparison of the value distributions were seen by experts as important functionality.

From Machine Learning to Explainable AI (reading for students)

Andreas Holzinger 2018. From Machine Learning to Explainable AI. 2018 World Symposium on Digital Intelligence for Systems and Machines (IEEE DISA). IEEE, pp. 55-66, doi:10.1109/DISA.2018.8490530. The success of statistical machine learning (ML) methods made the field of Artificial Intelligence (AI) so popular again, after the last AI winter. Meanwhile deep learning approaches even exceed human performance in particular tasks. However, such approaches have some disadvantages besides needing big quality data, much computational power and engineering effort; those approaches are becoming increasingly opaque, and even if we understand the underlying mathematical principles of such models they still lack explicit declarative knowledge. For example, words are mapped to high-dimensional vectors, making them unintelligible to humans. What we need in the future are context-adaptive procedures, i.e. systems that construct contextual explanatory models for classes of real-world phenomena. This is the goal of explainable AI, which is not a new field; rather, the problem of explainability is as old as AI itself. While rule-based approaches of early AI were comprehensible “glass-box” approaches at least in narrow domains, their weakness was in dealing with uncertainties of the real world. Maybe one step further is in linking probabilistic learning methods with large knowledge representations (ontologies) and logical approaches, thus making results re-traceable, explainable and comprehensible on demand. [For my students]

On Graph Extraction from Image Data

Andreas Holzinger, Bernd Malle & Nicola Giuliani 2014. On Graph Extraction from Image Data. In: Slezak, Dominik, Peters, James F., Tan, Ah-Hwee & Schwabe, Lars (eds.) Brain Informatics and Health, BIH 2014, Lecture Notes in Artificial Intelligence, LNAI 8609. Heidelberg, Berlin: Springer, pp. 552-563, doi:10.1007/978-3-319-09891-3_50 A hot topic in AI/machine learning is to learn from graphs, particularly as graphs are a data structure which fosters explainability/causability. For any such approach one first needs a relevant and robust representation from the image data. In this paper we present a novel approach for knowledge discovery by extracting graph structures from natural image data. For this purpose, we created a framework built upon modern Web technologies, utilizing HTML canvas and pure JavaScript inside a Web browser, which is a very promising engineering approach. This was the basis for our Graphinius project. [Paper]

The European Legal Framework for Medical AI

Schneeberger, D., Stöger, K. & Holzinger, A. The European Legal Framework for Medical AI. In: Springer Lecture Notes in Computer Science LNCS 12279, (2020) Cham. Springer International, doi:10.1007/978-3-030-57321-8_12. In Feb 2020, the EC published a White Paper on AI and a report on the safety and liability implications of AI, the Internet of Things (IoT) and robotics. The EC highlighted the “European Approach” to AI, stressing that “it is vital that European AI is grounded in our values and fundamental rights such as human dignity and privacy protection”. It also announced its intention to propose EU legislation for “high risk” AI applications in the nearer future, which will include the majority of medical AI applications. We analyse the current European framework regulating medical AI. Starting with the fundamental rights framework as clear guidelines, we focus on data protection, product approval procedures and liability law. This analysis of the current state of law, including its problems and ambiguities regarding AI, is complemented by an outlook at the proposed amendments to product approval procedures and liability law, which, by endorsing a human-centred approach, will influence how medical AI will be used in Europe in the future. [paper] [Scholar] [publons] 23

Current Advances, Trends and Challenges in Machine Learning and Knowledge Extraction

Andreas Holzinger, Peter Kieseberg, Edgar Weippl & A Min Tjoa (2018). Current Advances, Trends and Challenges of Machine Learning and Knowledge Extraction: From Machine Learning to Explainable AI. Springer Lecture Notes in Computer Science LNCS 11015. Cham: Springer, pp. 1-8, doi:10.1007/978-3-319-99740-7_1 In this editorial we present thoughts on future trends in AI generally, and ML specifically. Industry is investing heavily in AI, and spin-offs and start-ups are emerging at an unprecedented rate. The European Union is allocating a lot of additional funding into AI research grants, and various institutions are calling for a joint European AI research institute. Even universities are taking AI/ML into their curricula and strategic plans. Finally, even the people on the street talk about it, and if grandma knows what her grandson is doing in his new start-up, then the time is ripe: We are reaching a new AI spring. However, as fantastic as current approaches seem to be, there are still huge problems to be solved: the best performing models lack transparency, hence are considered to be black boxes. The general and worldwide trends in privacy, data protection, safety and security make such black box solutions difficult to use in practice. This holds specifically in Europe, where the new General Data Protection Regulation (GDPR), which came into effect on May 25, 2018, affects everybody (right of explanation). Consequently, explainable AI, a niche field for many years, is exploding in importance. For the future, we envision a fruitful marriage of classic logical approaches (ontologies) with statistical approaches, which may lead to context-adaptive systems (stochastic ontologies) that might work similarly to our human brain.

The Ten Commandments of Ethical Medical AI

Mueller, H., Mayrhofer, M. T., Veen, E.-B. V. & Holzinger, A. 2021. The Ten Commandments of Ethical Medical AI. IEEE COMPUTER, 54, (7), 119–123, doi:10.1109/MC.2021.3074263. In this paper we propose ten commandments as practical guidelines for those applying artificial intelligence to provide a concise checklist to a wide group of stakeholders. The aim of the third United Nations (UN) Sustainable Development Goal, dedicated to “Good Health and Well-Being,” is that all people can access the health services they need without facing financial hardship. The goal has three targets: 1) 1 billion more people should benefit from universal health coverage, 2) 1 billion more people should be better protected from health emergencies, and 3) 1 billion more people should enjoy better health and well-being (World Health Organization, 2018). Artificial intelligence (AI) is generally acknowledged as an important component in achieving these three targets. [Scholar] [publons] 20

Digital Transformation for Sustainable Development Goals (SDGs) - A Security, Safety and Privacy Perspective on AI

Holzinger, A., Weippl, E., Tjoa, A. M. & Kieseberg, P. 2021. Digital Transformation for Sustainable Development Goals (SDGs) – a Security, Safety and Privacy Perspective on AI. Springer Lecture Notes in Computer Science, LNCS 12844. Cham: Springer, pp. 1-20, doi:10.1007/978-3-030-84060-0_1. The main driver of digital transformation is artificial intelligence (AI). The potential of AI to benefit humanity and its environment is enormous. AI can help find new solutions to the most pressing challenges in virtually all areas of life: from agriculture and forest ecosystems that affect our entire planet, to the health of every single human being. This article highlights a very different aspect: For all its benefits, the large-scale adoption of AI technologies also holds enormous and unimagined potential for new kinds of unforeseen threats. All stakeholders, governments, policy makers, industry, academia, must ensure that AI is developed with these potential threats in mind and that the safety, traceability, transparency, explainability, validity, and verifiability of AI applications in our everyday lives are ensured. It is the responsibility of all stakeholders to ensure the use of trustworthy AI. Achieving this will require a concerted effort to ensure that AI is always consistent with human values and includes a future that is safe in every way for all people on this planet. In this paper, we describe some of these threats and show that safety, security and explainability are indispensable cross-cutting issues and highlight this with two exemplary selected application areas: smart agriculture and smart health. [Scholar] [publons] 12

Legal aspects of data cleansing in medical AI

Data quality is of paramount importance for the smooth functioning of modern data-driven AI applications with machine learning as a core technology. This is also true for medical AI, where malfunctions due to “dirty data” can have particularly dramatic harmful implications. Consequently, data cleansing is an important part in improving the usability of (Big) Data for medical AI systems. However, it should not be overlooked that data cleansing can also have negative effects on data quality if not performed carefully. This paper takes an interdisciplinary look at some of the technical and legal challenges of data cleansing against the background of European medical device law, with the key message that technical and legal aspects must always be considered together in such a sensitive context. Stoeger, K., Schneeberger, D., Kieseberg, P. & Holzinger, A. (2021). Legal aspects of data cleansing in medical AI. Computer Law and Security Review, 42, 105587, doi:10.1016/j.clsr.2021.105587. [Scholar] [publons]

Network Module Detection from Multi-Modal Node Features with a Greedy Decision Forest for Actionable Explainable AI (AXAI)

Network-based algorithms are often used in real-world applications and are of great practical value. In this work, we demonstrate subnetwork detection based on multimodal node features using a new Greedy Decision Forest for better interpretability. The latter will be a crucial factor to gain the trust of human experts in the future. We show a concrete application example from bioinformatics and systems biology with a special focus on biomedicine. However, our methodological approach is applicable in many other fields as well. Systems biology is a very good example of a field where statistical data-driven machine learning enables the analysis of large amounts of multimodal biomedical data. This is important to achieve the future goal of precision applications (e.g., precision medicine), where complexity is modeled at the system level to, for example, optimally tailor decisions, health practices, and therapies to individual patients. Our glass-box approach could be revolutionary in uncovering disease-causing network modules from multi-omics data to better understand diseases such as cancer. https://arxiv.org/abs/2108.11674 [Project Page]

Medical Artificial Intelligence: The European Legal Perspective

Karl Stöger, David Schneeberger & Andreas Holzinger (2021). Medical Artificial Intelligence: The European Legal Perspective. Communications of the ACM, 64, (11), doi:10.1145/3458652. Although the European Commission proposed new legislation for the use of “high-risk artificial intelligence” earlier this year, the existing European fundamental rights framework already provides some clear guidance on the use of medical AI. The European Commission has already published a white paper on artificial intelligence (AI) and an accompanying report on the security and liability implications of AI, the Internet of Things (IoT), and robotics. Here, the “European approach” to AI is highlighted, emphasizing that “it is crucial that European AI is based on human values and fundamental rights and privacy protection.” In April 2021, a proposal for a regulation entitled the Artificial Intelligence Act was presented. This regulation is intended to regulate the use of “high-risk” AI applications, which include those AI applications that affect human life in some way. [pdf preprint] [Scholar] [publons]

Robust, explainable, and trustworthy artificial intelligence

Holzinger, A., Dehmer, M., Emmert-Streib, F., Cucchiara, R., Augenstein, I., Del Ser, J., Samek, W., Jurisica, I. & Díaz-Rodríguez, N. 2022. Information fusion as an integrative cross-cutting enabler to achieve robust, explainable, and trustworthy medical artificial intelligence. Information Fusion, 79, (3), 263–278, doi:10.1016/j.inffus.2021.10.007. In this paper we argue that if we want to use AI to solve real-world problems outside the lab and in routine environments (beyond i.i.d. data), we need to integrate conceptual knowledge as a guiding model of reality, so as to help develop more robust, explainable, and less biased machine learning models that can ideally learn from less data. We argue that achieving these goals will require a coordinated joint effort that combines three complementary pioneering research areas: (1) complex networks (graphs) and their inference, (2) graphical causal models and counterfactual models, and (3) verification, interpretability and explainability methods. [Scholar] [publons]

Explainable AI Methods - A Brief Overview

Holzinger, A., Saranti, A., Molnar, C., Biecek, P. & Samek, W. 2022. Explainable AI Methods – A Brief Overview. XXAI – Lecture Notes in Artificial Intelligence LNAI 13200. Cham: Springer, pp. 13-38, doi:10.1007/978-3-031-04083-2_2. In this paper, we briefly introduce a few selected methods and discuss them in a short, clear and concise way. The goal of this article is to give beginners, especially application engineers and data scientists, a quick overview of the state of the art in the current topic of explainable AI (XAI). The following 17 methods are covered in this chapter: LIME, Anchors, GraphLIME, LRP, DTD, PDA, TCAV, XGNN, SHAP, ASV, Break-Down, Shapley Flow, Textual Explanations of Visual Models, Integrated Gradients, Causal Models, Meaningful Perturbations, and X-NeSyL. [Scholar] [publons]

Toward Human-AI Interfaces to Support Explainability and Causability in Medical AI

Andreas Holzinger & Heimo Mueller (2021). Toward Human-AI Interfaces to Support Explainability and Causability in Medical AI. IEEE COMPUTER, 54, (10), 78-86, doi:10.1109/MC.2021.3092610. Our concept of causability is a measure of whether and to what extent humans can understand a given machine explanation. We motivate causability with a clinical case from cancer research. We argue for using causability in medical artificial intelligence (AI) to develop and evaluate future human–AI interfaces. In Figure 2, we outline a model for the information flow between humans and an AI system. On the interaction surface, which can be seen as a “border” between human intelligence and AI, the information flow is maximal. As one gradually goes “deeper” into the AI system, the information flow decreases; at the same time, the semantic richness (SR) of potential information objects increases. In traditional human–computer interactions, the information flow is extremely asymmetrical; that is, much more information is shown by high-resolution displays compared to mouse and/or textual input—not to mention other input modalities (see the dotted line in Figure 2). [WoS]

The explainability paradox: Challenges for xAI in digital pathology

Theodore Evans, Carl Orge Retzlaff, Christian Geißler, Michaela Kargl, Markus Plass, Heimo Müller, Tim-Rasmus Kiehl, Norman Zerbe & Andreas Holzinger (2022). The explainability paradox: Challenges for xAI in digital pathology. Future Generation Computer Systems, 133, (8), 281–296, doi:10.1016/j.future.2022.03.009. The increasing prevalence of digitised workflows in diagnostic pathology opens the door to life-saving applications of artificial intelligence (AI). Explainability is identified as a critical component for the safety, approval and acceptance of AI systems for clinical use. Despite the cross-disciplinary challenge of building explainable AI (xAI), very few application- and user-centric studies in this domain have been carried out. We conducted the first mixed-methods study of user interaction with samples of state-of-the-art AI explainability techniques for digital pathology. This study reveals challenging dilemmas faced by developers of xAI solutions for medicine and proposes empirically-backed principles for their safer and more effective design. [WoS]

Emotion Detection: Application of the Valence Arousal Space for Rapid Biological Usability Testing

Christian Stickel, Martin Ebner, Silke Steinbach-Nordmann, Gig Searle & Andreas Holzinger (2009). Emotion Detection: Application of the Valence Arousal Space for Rapid Biological Usability Testing to Enhance Universal Access. In: Stephanidis, Constantine (ed.) Universal Access in Human-Computer Interaction. Addressing Diversity, Lecture Notes in Computer Science, LNCS 5614. Berlin, Heidelberg: Springer, pp. 615–624, doi:10.1007/978-3-642-02707-9-70. Emotions are an important mental and physiological state that influences cognition, perception, learning, communication, decision making, etc. They are considered an important aspect of user experience (UX), even though they are not well developed and, most importantly, experimental evidence is not yet available. This contribution addresses an application for emotion detection in software usability testing. It describes the approach of using the valence arousal space for emotion modeling in a formal experiment. Our study showed correlations between low performance and negative emotional states. Reliable detection of emotions in usability tests will help to avoid negative emotions and attitudes in final products. This can be a great advantage to improve Universal Access. [Physiological Computing] [Scholar] [WoS]

AI for Life: Trends in Artificial Intelligence for Biotechnology

Holzinger, A., Keiblinger, K., Holub, P., Zatloukal, K. & Müller, H. 2023. AI for Life: Trends in Artificial Intelligence for Biotechnology. New Biotechnology, 74, (1), 16–24, doi:10.1016/j.nbt.2023.02.001. Due to popular successes (e.g., ChatGPT), Artificial Intelligence (AI) is on everyone’s lips today. When advances in biotechnology are combined with advances in AI, unprecedented new potential solutions become available. This can help with many global problems and contribute to important Sustainable Development Goals. Current examples include Food Security, Health and Well-being, Clean Water, Clean Energy, Responsible Consumption and Production, Climate Action, Life below Water, and Life on Land (protecting, restoring and promoting the sustainable use of terrestrial ecosystems, sustainably managing forests, combating desertification, halting and reversing land degradation, and halting biodiversity loss). AI is ubiquitous in the life sciences today. Topics range from machine learning and Big Data analytics, knowledge discovery and data mining, biomedical ontologies, knowledge-based reasoning, natural language processing, decision support and reasoning under uncertainty, and temporal and spatial representation and inference to methodological aspects of explainable AI (XAI) with applications in biotechnology. [WoS]

Quod erat demonstrandum?-Towards a typology of the concept of explanation for the design of explainable AI

Cabitza, F., Campagner, A., Malgieri, G., Natali, C., Schneeberger, D., Stoeger, K. & Holzinger, A. 2023. Quod erat demonstrandum?-Towards a typology of the concept of explanation for the design of explainable AI. Expert Systems with Applications, 213, (3), 1–16, doi:10.1016/j.eswa.2022.118888. In this paper, we present a fundamental framework for defining different types of explanations of AI systems and the criteria for evaluating their quality. Starting from a structural view of how explanations can be constructed, i.e., in terms of an explanandum (what needs to be explained), multiple explanantia (explanations, clues, or parts of information that explain), and a relationship linking explanandum and explanantia, we propose an explanandum-based typology and point to other possible typologies based on how explanantia are presented and how they relate to explanandia. We also highlight two broad and complementary perspectives for defining possible quality criteria for assessing explainability: epistemological and psychological (cognitive). These definition attempts aim to support the three main functions that we believe should attract the interest and further research of XAI scholars: clear inventories, clear verification criteria, and clear validation methods. [WoS]

Deep ROC Analysis and AUC as Balanced Average Accuracy, for Improved Classifier Selection, Audit and Explanation

Carrington, A. M., Manuel, D. G., Fieguth, P. W., Ramsay, T., Osmani, V., Wernly, B., Benett, C., Hawken, S., Mcinnes, M., Magwood, O., Sheikh, Y. & Holzinger, A. 2023. Deep ROC Analysis and AUC as Balanced Average Accuracy, for Improved Classifier Selection, Audit and Explanation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45, (1), 329–341, doi:10.1109/TPAMI.2022.3145392. Optimal performance is desired for decision-making in any field with binary classifiers and diagnostic tests; however, common performance measures lack depth in information. The area under the receiver operating characteristic curve (AUC) and the area under the precision recall curve are too general because they evaluate all decision thresholds including unrealistic ones. Conversely, accuracy, sensitivity, specificity, positive predictive value and the F1 score are too specific: they are measured at a single threshold that is optimal for some instances, but not others, which is not equitable. In between both approaches, we propose deep ROC analysis to measure performance in multiple groups of predicted risk (like calibration), or groups of true positive rate or false positive rate. In each group, we measure the group AUC (properly), normalized group AUC, and averages of: sensitivity, specificity, positive and negative predictive value, and likelihood ratio positive and negative. The measurements can be compared between groups, to whole measures, to point measures and between models. We also provide a new interpretation of AUC in whole or part, as balanced average accuracy, relevant to individuals instead of pairs. We evaluate models in three case studies using our method and Python toolkit and confirm its utility. [WoS]
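
The idea of measuring performance per group of false positive rate, rather than over the whole ROC curve or at a single threshold, can be illustrated with a minimal Python sketch. This is not the authors' Python toolkit, and it does not compute their group AUC or normalized group AUC: it only integrates the empirical ROC curve over three equal-width FPR groups, a simplification chosen for illustration. The function names `roc_points` and `partial_auc` and the toy labels and scores are assumptions made up for this example.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (false positive rate, true positive rate)


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> List[Point]:
    """Empirical ROC curve from (0, 0) to (1, 1); tied scores share one point."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points: List[Point] = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every instance that shares this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def partial_auc(points: Sequence[Point], lo: float, hi: float) -> float:
    """Area under the ROC curve restricted to the FPR interval [lo, hi]."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        a, b = max(x0, lo), min(x1, hi)
        if b <= a:
            continue  # segment is vertical or lies outside the interval
        slope = (y1 - y0) / (x1 - x0)
        ya = y0 + slope * (a - x0)
        yb = y0 + slope * (b - x0)
        area += (b - a) * (ya + yb) / 2  # trapezoid on the clipped segment
    return area


# Toy data (made up for illustration): 4 positives, 4 negatives.
labels = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]

curve = roc_points(labels, scores)
bounds = [0.0, 1 / 3, 2 / 3, 1.0]  # three FPR groups (an arbitrary choice)
groups = list(zip(bounds, bounds[1:]))
group_areas = [partial_auc(curve, lo, hi) for lo, hi in groups]
whole_auc = partial_auc(curve, 0.0, 1.0)

for (lo, hi), area in zip(groups, group_areas):
    # Dividing by the group width gives the average sensitivity in the group.
    print(f"FPR {lo:.2f}-{hi:.2f}: area={area:.4f}, mean TPR={area / (hi - lo):.4f}")
print(f"whole AUC={whole_auc:.4f}")
```

In this sketch the group areas add up to the whole AUC, so a single summary value can be broken down into the FPR ranges that matter for a given application.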

GNN-SubNet: disease subnetwork detection with explainable Graph Neural Networks.

Pfeifer, B., Saranti, A. & Holzinger, A. 2022. GNN-SubNet: disease subnetwork detection with explainable Graph Neural Networks. Bioinformatics, 38, (S-2), ii120-ii126, doi:10.1093/bioinformatics/btac478.

The tremendous success of graph neural networks (GNNs) has already had a major impact on systems biology research. For example, GNNs are currently being used for drug target recognition in protein–drug interaction networks, as well as for cancer gene discovery and more. Important aspects whose practical relevance is often underestimated are comprehensibility, interpretability and explainability.
In this work, we present a novel graph-based deep learning framework for disease subnetwork detection via explainable GNNs. Each patient is represented by the topology of a protein–protein interaction (PPI) network, and the nodes are enriched with multi-omics features from gene expression and DNA methylation. In addition, we propose a modification of the GNNExplainer that provides model-wide explanations for improved disease subnetwork detection.
Availability and implementation. The proposed methods and tools are implemented in the GNN-SubNet Python package, which we have made available on our GitHub for the international research community. [WoS]
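
The patient representation described above (one shared PPI topology, with per-patient node features from gene expression and DNA methylation) can be sketched as a small data structure. This is an illustrative assumption about how such data might be laid out, not the API of the GNN-SubNet package; the gene names, edges, feature values, and the names `PatientGraph` and `feature_matrix` are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Shared topology: every patient uses the same PPI edges (placeholder genes).
PPI_EDGES: List[Tuple[str, str]] = [("GENE_A", "GENE_B"), ("GENE_B", "GENE_C")]


@dataclass
class PatientGraph:
    """One patient: the shared PPI topology plus patient-specific node features."""

    patient_id: str
    # gene -> (gene expression, DNA methylation); values here are made up.
    node_features: Dict[str, Tuple[float, float]]

    def neighbors(self, gene: str) -> List[str]:
        """Genes connected to `gene` in the shared PPI network."""
        return sorted({b if a == gene else a for a, b in PPI_EDGES if gene in (a, b)})


def feature_matrix(graph: PatientGraph, gene_order: List[str]) -> List[List[float]]:
    """Stack node features row by row in a fixed gene order (genes x 2 modalities)."""
    return [list(graph.node_features[gene]) for gene in gene_order]


patient = PatientGraph(
    patient_id="patient_001",
    node_features={
        "GENE_A": (0.8, 0.1),
        "GENE_B": (0.2, 0.7),
        "GENE_C": (0.5, 0.4),
    },
)
genes = ["GENE_A", "GENE_B", "GENE_C"]
print(patient.neighbors("GENE_B"))
print(feature_matrix(patient, genes))
```

Keeping the edge list outside the per-patient object reflects the point that only the node features differ between patients while the network topology is common to all of them.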

Exploring artificial intelligence for applications of drones in forest ecology and management

This paper highlights the significance of Artificial Intelligence (AI) in the realm of drone applications in forestry. Drones have revolutionized various forest operations, and their role in mapping, monitoring, and inventory procedures is explored comprehensively. Leveraging advanced imaging technologies and data processing techniques, drones enable real-time tracking of changes in forested landscapes, facilitating effective monitoring of threats such as fire outbreaks and pest infestations. They expedite forest inventory by swiftly surveying large areas, providing precise data on tree species identification, size estimation, and health assessment, thus supporting informed decision-making and sustainable forest management practices. Moreover, drones contribute to tree planting, pruning, and harvesting, while monitoring reforestation efforts in real time. Wildlife monitoring is also enhanced, aiding in the identification of conservation concerns and informing targeted conservation strategies. Drones offer a safer and more efficient alternative in search and rescue operations within dense forests, reducing response time and improving outcomes. Additionally, drones equipped with thermal cameras allow early detection of wildfires, enabling timely response, mitigation, and preservation efforts. The integration of AI and drones holds immense potential for enhancing forestry practices and contributing to sustainable land management. In the future, explainable AI (XAI) can improve trust and safety by providing transparency in decision-making, aiding in liability issues, and enabling precise operations. XAI facilitates better environmental monitoring and impact analysis, contributing to efficient forest management and preservation efforts. If a drone’s AI can explain its actions, it will be easier to understand why it chose a particular path or action, which could inform safety procedures and improvements.

Sensors for Digital Transformation in Smart Forestry

In our paper we envision a framework where sensors of all kinds interact with other sensors, but also with robots and drones and with the human-in-the-loop, who brings in common sense (Hausverstand). We explore the role of artificial intelligence in smart forestry, emphasizing the importance of high-quality, sensor-derived data for effective AI deployment. We discuss the challenges of data acquisition in complex forest environments and argue for a human-in-the-loop approach to enhance adaptability and effectiveness. We highlight the integration of autonomous robotic systems as mobile sensor hubs, enabling efficient data collection and processing. Central to our contribution is a universal sensor platform, through which we demonstrate that careful sensor selection and robust initial data generation are critical to advancing digital transformation in forestry. Read the paper https://doi.org/10.3390/s24030798 and listen to this podcast: “Decoding the Digital Forest: How Sensors and AI Are Rewriting the Rules of Sustainability” below [WoS]

Scholar, DBLP, ORCID, SCI

Integrating Belief-Desire-Intention agents with large language models for reliable human–robot interaction and explainable Artificial Intelligence (2025)

LLM in the Loop: A Framework for Contextualizing Counterfactual Segment Perturbations in Point Clouds (2025)

Enhancing trust in automated 3D point cloud data interpretation through explainable counterfactuals (2025)

On the disagreement problem in Human-in-the-Loop federated machine learning (2025)

Tree smoothing: Post-hoc regularization of tree ensembles for interpretable machine learning (2025)

Fine-tuning language model embeddings to reveal domain knowledge: An explainable artificial intelligence perspective on medical decision making (2025)

Class imbalance in multi-resident activity recognition: an evaluative study on explainability of deep learning approaches (2025)

NiaAML: AutoML for classification and regression pipelines (2025)

Post-hoc vs ante-hoc explanations: xAI design guidelines for data scientists (2024) Highly-Cited Paper in Journal Cognitive Systems Research

Human-in-the-Loop Reinforcement Learning (2024) Highly-Cited Paper in Journal of Artificial Intelligence Research (JAIR)

On generating trustworthy counterfactual explanations

The next frontier: AI We Can Really Trust

Digital Transformation in Smart Farm and Forest Operations Needs Human-Centered AI

Towards multi-modal causability with Graph Neural Networks enabling information fusion for explainable AI

Classification by ordinal sums of conjunctive & disjunctive functions for explainable AI & interpretable machine learning

KANDINSKYPatterns - An experimental exploration environment for Pattern Analysis and Machine Intelligence

Artificial Intelligence and Machine Learning for Digital Pathology

Measuring the Quality of Explanations: The Systems Causability Scale (SCS). Comparing Human and Machine Explanations.

Causability and Explainability of Artificial Intelligence in Medicine

KANDINSKY Patterns: A Swiss-Knife for the Study of Explainable AI

A new concordant partial AUC and partial c statistic for imbalanced data in the evaluation of machine learning algorithms

KANDINSKY Patterns as Intelligence Test for machines

Dialogue Systems for Intelligent Human Computer Interactions

The first publication on our KANDINSKY Universe, the experimental environment for explainability and causability

Interactive machine learning: experimental evidence for the human in the algorithmic loop: A case study on Ant Colony Optimization

Evaluating the Quality of Machine Learning Explanations: A Survey on Methods and Metrics

From Computer Innovation to Human Integration: Current Trends and Challenges for Pervasive Health Technologies

Rapid prototyping for a Virtual Medical Campus interface

Why imaging data alone is not enough: AI-based integration of imaging, omics, and clinical data

Human Activity Recognition Using Recurrent Neural Networks

Augmenting Statistical Data Dissemination by Short Quantified Sentences of Natural Language

Computational approaches for mining user’s opinions on the Web 2.0

Explainable AI: The New 42?

Integrated web visualizations for protein-protein interaction databases

Interpretierbare KI: Neue Methoden zeigen Entscheidungswege künstlicher Intelligenz auf

Explainable AI

Emotion Detection: Application of the Valence Arousal Space for Rapid Biological Usability Testing

Human Annotated Dialogues Dataset for Natural Conversational Agents

Convolutional and Recurrent Neural Networks for Activity Recognition in Smart Environment

Towards a Deeper Understanding of How a Pathologist Makes a Diagnosis

NLP for the Generation of Training Data Sets for Ontology-Guided Weakly-Supervised Machine Learning in Digital Pathology

In silico modeling for tumor growth visualization

In silico cancer research towards 3R

From extreme programming & usability engineering to extreme usability in software engineering education (XP+UE > XU)

Biomedical informatics: Discovering knowledge in big data

Expectations of Artificial Intelligence for Pathology

Legal, regulatory, ethical frameworks for standards in artificial intelligence and autonomous robotic surgery

Analysis of biomedical data with multilevel glyphs

From Machine Learning to Explainable AI (reading for students)

On Graph Extraction from Image Data

The European Legal Framework for Medical AI

Current Advances, Trends and Challenges in Machine Learning and Knowledge Extraction

The Ten Commandments of Ethical Medical AI

Digital Transformation for Sustainable Development Goals (SDGs) - A Security, Safety and Privacy Perspective on AI

Legal aspects of data cleansing in medical AI

Network Module Detection from Multi-Modal Node Features with a Greedy Decision Forest for Actionable Explainable AI (AXAI)

Medical Artificial Intelligence: The European Legal Perspective

Robust, explainable, and trustworthy artificial intelligence

Explainable AI Methods - A Brief Overview

Toward Human-AI Interfaces to Support Explainability and Causability in Medical AI

The explainability paradox: Challenges for xAI in digital pathology

Emotion Detection: Application of the Valence Arousal Space for Rapid Biological Usability Testing

AI for Life: Trends in Artificial Intelligence for Biotechnology

Quod erat demonstrandum?-Towards a typology of the concept of explanation for the design of explainable AI

Deep ROC Analysis and AUC as Balanced Average Accuracy, for Improved Classifier Selection, Audit and Explanation

GNN-SubNet: disease subnetwork detection with explainable Graph Neural Networks.

Exploring artificial intelligence for applications of drones in forest ecology and management

Sensors for Digital Transformation in Smart Forestry