Diligent helpers in data analysis: How AI becomes transparent and reproducible
Interview with Prof. Dr. David B. Blumenthal, Biomedical Network Science Lab, Friedrich-Alexander University Erlangen-Nuremberg (FAU)
Huge amounts of data are generated in laboratories every day, data that would otherwise have to be laboriously analyzed by hand. This is where artificial intelligence (AI) comes in as an ideal helper: it can evaluate such data volumes far faster than humans ever could. The problem is that when AI systems are developed, there are hardly any guidelines or standards that make them comparable with one another.
Prof. David B. Blumenthal
Prof. David Blumenthal and his team also ran into these hurdles in their own AI research. In response, they developed AIMe, a tool that aims to make AI reproducible and more transparent. In an interview with MEDICA-tradefair.com, Blumenthal explains how the tool works, how it stays up to date and what possibilities he sees for AI in the field of biomolecular research.
What goal are you pursuing with AIMe?
David Blumenthal: We want to help researchers and developers create transparent reports on biomedical AI systems, describing in particular the meta-parameters, the validation strategy and the data in detail. The goal is to increase the transparency and reproducibility of such systems and thereby strengthen trust in them, for example among people who might use them in clinical practice.
What use cases is AIMe intended for?
Blumenthal: AIMe is a generic standard that can in principle be used for all biomedical AI systems. Since we ourselves come from the field of molecular research, we had corresponding use cases whose starting point was molecular data. We had this background in mind when developing the AIMe standard, but ultimately AIMe can be used for any biomedical AI application.
And how exactly does AIMe work?
Blumenthal: On the one hand, there is the AIMe standard itself: an information standard consisting of about 25 questions, divided into five sections of roughly five questions each. The first section is meta-data; the second is purpose, that is, what is the goal of the AI; the third is the data, where the data sets used are to be described. Then there are questions about the method: which AI methods were used, how were the hyper-parameters set, how was the system validated. Finally, there are questions about reproducibility, which concern being able to trace the AI and its development as well as possible: Where is the source code available? Where are the data sets available? Are there good tutorials that explain how the particular AI works?
These five sections make up the information standard.
All of this is available in digital form: in the AIMe web service, interested parties are presented with an online questionnaire asking these questions. Once the questions are answered, an AIMe entry is created, stored in our database and given a unique URL. The database is searchable: anyone interested in specific applications can use a keyword search to browse the existing entries.
The URL also serves to describe the particular AI in more detail in research papers: researchers working on a paper can note that more detailed descriptions of the data sets, validation strategy and so on are given in the paper's appendix, and provide the URL there so that readers can find further information about the AI on our website.
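The five-section structure Blumenthal describes can be pictured as a simple data record. The following Python sketch is purely illustrative: the field names and the keyword search are hypothetical stand-ins, not the actual AIMe schema or web service API.

```python
# Hypothetical sketch of an AIMe-style report entry, based on the five
# sections described in the interview (meta-data, purpose, data set,
# method, reproducibility). Field names are illustrative only.
from dataclasses import dataclass


@dataclass
class AIMeEntry:
    metadata: dict         # e.g. title, authors, contact
    purpose: dict          # what is the goal of the AI?
    dataset: dict          # which data sets were used?
    method: dict           # AI methods, hyper-parameters, validation
    reproducibility: dict  # source code, data availability, tutorials
    url: str = ""          # unique URL assigned once the entry is stored


def keyword_search(entries, keyword):
    """Return all entries whose purpose description mentions the keyword."""
    keyword = keyword.lower()
    return [e for e in entries
            if keyword in e.purpose.get("description", "").lower()]


# Example: register one entry and search the "database" (here, a plain list).
entry = AIMeEntry(
    metadata={"title": "Demo classifier"},
    purpose={"description": "Predict drug response from molecular profiles"},
    dataset={"source": "public expression data"},
    method={"model": "random forest", "validation": "5-fold cross-validation"},
    reproducibility={"code": "https://example.org/repo"},
)
results = keyword_search([entry], "drug response")
```

The real AIMe service fills these fields via its online questionnaire and assigns the unique URL on submission; the sketch only mirrors that structure.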
Finally, there is the AIMe steering committee, which is responsible for collecting user feedback on the current standard and updating it each year based on that feedback. The committee also covers the technical side, such as hosting the website.
Artificial intelligence can be a great help in data analysis. To make the various AI systems comparable, a team led by David Blumenthal has developed AIMe.
How did the development of AIMe come about?
Blumenthal: It started with our frustration that things were not reproducible: if you want to develop a method yourself and use other papers as a guide, you quickly realize that about half of the information on the AI is missing, so the work cannot be reproduced or compared. We thought: either we stay annoyed, or we develop a tool ourselves to improve the situation. So we started thinking about what such a tool could look like. At TU Munich, we made a draft and circulated it via mailing lists and social media to get researchers from other institutions and other countries on board. In the end, a good 20 of us from many different countries developed the standard together.
AI is a research field that is still very much changing. How do you make sure that the standard you have set up now doesn't become obsolete?
Blumenthal: We have built an update cycle into the initiative: from January to September each year, people can provide feedback on the current standard through the website. From October to December, there is a consolidation phase in which the AIMe steering committee evaluates that feedback, so that a new version can be released in January of the following year. Anyone who wants to can take part in this consolidation phase; it is not a closed group, and everyone who is interested is welcome. The only prerequisite is experience in biomedicine or in AI research; ideally, of course, at the intersection of both.
What strengths and weaknesses do you see for AI in relation to biomolecular research?
Blumenthal: The strengths are well known: There are huge amounts of data, especially molecular profiling data. There is a tremendous amount of information in it, but to evaluate it, you need automated approaches. Of course, an AI is ideally suited for this.
More interesting are the challenges. In my opinion, there are three main ones. The first is reproducibility, which is currently a major problem: many AI systems, or rather the published results, are simply not reproducible. We are trying to improve that.
The second is explainability. Many AI systems are essentially black boxes: you put an input in and get a prediction out, but often it is not entirely clear why that prediction comes out the way it does. That is a problem, especially for clinically relevant decisions: why does a patient get treatment A and not treatment B? If the AI only indicates that treatment A is 95 percent likely to be better than treatment B, without any clues as to why, it becomes harder for a physician to communicate and justify that decision. The challenge, then, is to make such predictions explainable. This has become an active field of research in recent years, and many approaches already exist, but the challenge remains.
The third challenge is data. Biomedical data are very sensitive because they are patient data, which is why access is often difficult or impossible without red tape. On the one hand, data protection must of course be maintained. On the other hand, it means that many researchers train on the smaller data sets they do have access to. These data sets are often relatively homogeneous, so patient groups that could perhaps also benefit from AI are not represented in them. Data privacy is therefore something of a dilemma for which good solutions have yet to be found.
What do you think will be the role of AI in biomolecular research in the coming years?
Blumenthal: We are currently working on attempts to mechanistically redefine complex diseases, for example Alzheimer's and other neurological diseases. What do we mean by that? These diseases are diagnosed not on the basis of a molecular mechanism but on the basis of symptoms. That is problematic, because if you do not know the mechanistic basis, you can only treat the disease symptomatically. This is where the field should set new goals: we have a lot of data available, and we have the means to analyze it. So now we need to apply AI systems to these large data sets in order to arrive at mechanistic, molecular-biology-based redefinitions of complex diseases. That will allow us to treat causally rather than just symptomatically. I hope to see breakthroughs in the near future thanks to the use of AI.