Beyond models

Félix Iglesias Vázquez on open code practices and the development of versatile algorithms for real-world data challenges.

Félix Iglesias Vázquez with network traffic captures

We meet Dr Félix Iglesias Vázquez at the Institute of Telecommunications on TU Wien’s Gußhaus campus, where he has been working for over 12 years. Surrounded by screens and servers, we dive into a conversation about data-centric research, open code practices for truly reproducible science, and the responsible use of AI and large language models.

Leading a network security working group and drawing on a background in electrical engineering, data analysis, and machine learning, Félix focuses on developing innovative methodologies and algorithms that detect anomalies in complex datasets from a variety of research fields.

Theory meets application

“From the theoretical part to the application – maybe it's too much to be everywhere. But if you are working on theory or methodology, it's very important to always have the application in mind – trying to solve a real problem. Not just play with mathematics.”

We discuss the intricacies of developing new theoretical approaches and designing models for data analysis while staying grounded in real-world applications. This balance is especially important in domains dealing with highly personalised data, such as medical research, where measurement instruments, laboratory artefacts, and patient records provide critical context for analysis. Similar issues arise in network security, where researchers face the persistent challenge of obtaining high-quality, well-documented datasets while respecting privacy, security, and data anonymisation requirements.

Mapping anomalies

“I'm going to use the tools and models that have been published for anomaly detection, but then you realise that they don't work in your domain. And you start asking yourself, why? Why aren't these tools working?”

One key insight is that anomaly detection must align with the nature of the data and the research context. In some domains, anomalies manifest as compact, dense clusters instead of isolated outliers, which requires tailored detection strategies. This calls for a data-centric approach and a broader perspective on anomalies – not just isolated points, but also novelties or patterns that don’t fit predefined norms.
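
A loose illustration of this point, using synthetic data and scikit-learn's LocalOutlierFactor purely as a stand-in detector (not one of the tools from Dr Iglesias's own research): a neighbourhood-based method tuned for isolated outliers can rate a compact anomalous cluster as perfectly normal.

    # Sketch: a compact cluster of anomalies can fool a detector
    # whose neighbourhood size is tuned for isolated outliers.
    import numpy as np
    from sklearn.neighbors import LocalOutlierFactor

    rng = np.random.default_rng(0)
    normal = rng.normal(0.0, 1.0, size=(500, 2))          # bulk of ordinary observations
    lone_outlier = np.array([[8.0, 8.0]])                 # classic isolated anomaly
    dense_anomaly = rng.normal(6.0, 0.05, size=(30, 2))   # compact anomalous cluster

    X = np.vstack([normal, lone_outlier, dense_anomaly])
    scores = -LocalOutlierFactor(n_neighbors=20).fit(X).negative_outlier_factor_

    print("isolated point:", scores[500])           # far above 1: flagged
    print("cluster members:", scores[501:].mean())  # around 1: looks "locally normal"
    # Raising n_neighbors above the cluster size (e.g. to 50) is one simple
    # adaptation that exposes the cluster again.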

Data-centric methodologies 

“When I use a dataset, I want to know exactly which application it serves, which problem it addresses, how it is used, and what the metadata tells me – especially about the labelling. […] Then you need to rethink the concept of anomaly. Because an anomaly is not only an isolated point in the space, it can also be a novelty. Right? Something that doesn't fit a predefined normality – you have to change your perspective.”

Dr Iglesias emphasises the importance of findable metadata and thorough documentation, explaining that poor data quality and the reliance on synthetic benchmark datasets – unrepresentative of real communications – have slowed progress in the field. He has shifted to data-centric approaches, which prioritise the quality, relevance, and context of datasets over merely refining algorithms. Because model-centric tools often fail in new domains when data and intended application are mismatched, his group focuses on robust, adaptable methods and well-matched datasets, strategically developing versatile algorithms and publishing code – such as SDO (Sparse Data Observers) for anomaly detection and the Go-flows tool – that work effectively across various domains.
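
The article names SDO without describing how it works, so the sketch below is only a loose reconstruction of the general idea behind observer-based scoring – sample a handful of "observers" from the data, drop those that are rarely consulted, and score each point by its distance to the nearest remaining observers. The parameter names are invented; this is not the published algorithm or its reference implementation.

    # Simplified, illustrative observer-based anomaly scorer (not the published SDO code).
    import numpy as np

    def observer_scores(X, n_observers=100, x=5, idle_quantile=0.1, seed=0):
        rng = np.random.default_rng(seed)
        idx = rng.choice(len(X), size=min(n_observers, len(X)), replace=False)
        obs = X[idx]                                                  # randomly sampled "observers"
        d = np.linalg.norm(X[:, None, :] - obs[None, :, :], axis=2)   # point-to-observer distances
        nearest = np.argsort(d, axis=1)[:, :x]                        # x closest observers per point
        counts = np.bincount(nearest.ravel(), minlength=len(obs))     # how often each observer is used
        active = counts >= np.quantile(counts, idle_quantile)         # drop rarely used ("idle") observers
        d_active = np.sort(d[:, active], axis=1)[:, :x]
        return np.median(d_active, axis=1)                            # higher score = more anomalous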

Reproducible code publishing

“The ideal for code sharing is: the fewer libraries you need, the better. The best solution that I found is just to dockerize everything – to publish the code and also a Docker version for reproducibility. In this Docker container, you can put the libraries, pin dependencies, and even define the operating system environment to create infrastructure independence. It’s like its own ecosystem to run and reproduce the experiments.”

He stresses the importance of adhering not only to the FAIR principles but also to field-specific standards and smart data models that make datasets more interpretable and useful for the community. To address reproducibility problems caused by software dependencies and library versioning, he advocates Docker containerisation, ensuring that experiments can be reliably replicated over time.
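
A minimal sketch of what such a container recipe might look like – the base image, file names, and script name here are placeholders, not taken from his published artefacts:

    # Illustrative Dockerfile: pins the OS layer, interpreter, and libraries.
    FROM python:3.11-slim

    WORKDIR /experiment

    # Pinned dependencies (e.g. numpy==1.26.4) so library versions cannot drift.
    COPY requirements.txt .
    RUN pip install --no-cache-dir -r requirements.txt

    # Copy the experiment code; a single command reproduces the results.
    COPY . .
    CMD ["python", "run_experiments.py"]

Building and running the image (docker build -t experiment . followed by docker run experiment) then recreates the same environment on any machine, even years later.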

Turning to recent advances in AI, Dr Iglesias acknowledges that large language models (LLMs) are powerful tools, especially when acting as agents that can test and interpret results in complex environments. On the broader evolution of data analysis, he voices cautious optimism about the potential and the risks of AI agents, particularly concerning error propagation and model degradation.

Looking into the future

“The problem with machine learning and artificial intelligence is not that they fail – because they can fail. The problem is if they fail and we don’t notice – this is a catastrophe, because in such architectures, it is quite likely that all systems fail in a similar way.”

According to Dr Iglesias, the real danger is not just random mistakes, but systematic, invisible failures that can propagate on a large scale. That’s why it’s especially important to have strong monitoring and careful checks by humans whose different backgrounds and experiences can offer context-aware evaluations of these technologies. 

Félix further underscores the need for critical thinking, transparency, and adaptability in scientific practice, expressing the hope that his algorithms and educational influence will inspire his students to form their own opinions.

Contact

Dr Félix Iglesias Vázquez
Institute of Telecommunications
TU Wien
felix.iglesias@tuwien.ac.at

Center for Research Data Management
TU Wien
research.data@tuwien.ac.at