17. November 2022 · Reading Time: 4 Min.
Customer segmentation with AI for biotech company
From the single-celled bacterium to the highly complex human being: Fats (lipids) play an important role in all living organisms. The Saxon company Lipotype is the world’s leading provider for the analysis of lipids in biomedical research. The Lipotype Lipidomics technology, based on mass spectrometry, enables rapid and comprehensive identification of thousands of lipids per sample.
The international team of molecular biologists, biostatisticians, physicians, biochemists, mass spectrometry specialists and bioinformatics experts led by the company’s founder Prof. Dr. Kai Simons has set itself the task of contributing to a better understanding of life and health with the help of detailed access to lipid data.
As a provider of highly specialized technology, Lipotype wants to address the right target groups in product marketing. To do this, they want to get to know their customers better, identify their interests and expertise, and understand how customers use their products. The formation of customer segments is the basis for optimizing the following processes:
- Precisely tailor content and product marketing
- Adapt sales processes to target groups
- Draw conclusions for product development
- Generate new leads that correspond to the target groups and acquire them in a targeted manner
Lipotype’s customers and interested parties are active in science and publish papers in the bio-medical field. These publications represent a treasure trove of data for us, as they already contain valuable information about their customers, such as which research field they are active in, with whom they cooperate, which problems they are working on and how they use lipidomics for this purpose.
The basic idea is therefore to find and analyze publications of Lipotype customers, to identify their expertise and interests in order to better understand one’s own environment and to form customer segments.
To solve the task manually, it would be necessary to read the publications of all customers, to record them thematically and to evaluate them. The time required for this would be gigantic. With the means of AI, more precisely the subarea of Natural Language Processing, this process can be automated.
We trained an AI model over publications that builds relationships between authors and topics. The model analyzes the publications and captures their content, it establishes thematic similarities and differences between authors and can extract relevant topics.
Paper scraping and disambiguation of author names
In the first step, we identify the publications of authors who are already customers or interested parties of Lipotype. These can be obtained from public databases, but the difficulty here is that there is no clear mapping between names and publications. For example, there are very many publications by John Smith, but it is unlikely that they are all by the same author.
Therefore, we developed an author disambiguation model that determines the probability of association between names and authors based on the available information (name, affiliation, publication content etc.).
Training of the author-topic model
We have now compiled a pool of publications from Lipotype customers and prospects to use as a basis for model training.
The model learns representations (so-called embeddings) of the publications in which their content properties are encoded. It thus relates publications, authors and topics in one common space.
Define customer segments
With the help of these representations, clustering procedures can be used to divide the publications and authors into subject areas relevant to Lipotype. These clusters form the individual customer segments. This provides Lipotype with a qualitative and quantitative statement about which topics (specific types of cancer, treatment methods, etc.) play a role for which customers and can align its strategies accordingly.
Communications Officer at Lipotype
If you are not a data scientist yourself, it is difficult to express complex problems in marketing and sales in “data science” language. Unless you work with exdatis – then the right questions are asked. In the end, I knew my problem better than I did at the beginning, and it was solved!