
Projects

01. April 2024

DGBIAS – Detecting gender bias in children's books

[Illustration: Aladdin and the Wonderful Lamp]

Principal Investigator: Laura Vana-Gür (TU Wien)

Team: Camilla Damian (VU Amsterdam), Laura Vana-Gür (TU Wien)

Program / Call: FWF 1000 Ideas Programme

Start: 01 December 2021, duration: 24 months

 

Aim

Gender stereotypes form early in a child's development and persist through adolescence into adulthood, with long-lasting effects that may influence activity and career choices as well as academic performance. Books in particular can have considerable influence, as their characters shape young children's role models of femininity and masculinity. Gender under- and misrepresentation in children's literature can therefore contribute to the internalization and reinforcement of negative stereotypes. To address this issue, we aim to identify and measure relevant dimensions of gender bias in children's books using both qualitative and quantitative techniques: on the one hand, a systematic cross-disciplinary literature review, synthesis and (expert) validation; on the other, state-of-the-art natural language processing (NLP) methods. We believe that this integrated research framework allows us to automate the detection of potentially biased text while enhancing the interpretability and transparency of the results.

 

What is gender bias and what are its relevant dimensions?

Designing a unified, measurable concept of gender bias is an ambitious task, mostly due to the complexity of the concept itself. There is no real consensus on how exactly to measure gender bias and gender stereotypes, or on which factors are relevant when building such a measure.

To identify the dimensions along which gender bias should be measured, we drew on specific recommendations from the qualitative literature on detecting gender bias and stereotypes in children's literature. Important topics include (see, e.g., Railsback, 1993; Narahara, 1998; Turner-Bowker, 1996; Wollman-Bonilla, 1998; McCabe et al., 2011; Pownall & Heflick, 2023):

  • Representation
    • What is the gender of the main character?
    • Is there a balanced number of female and male characters?
    • Do the female characters play an integral part, i.e., do they speak a lot, and how central or influential are they to the story?
  • Stereotypes in characters
    • Active vs. dependent roles: e.g., who are the breadwinners in the family? Who takes on leadership roles?
    • Professions: Are technical careers more likely to be pursued by the male characters? Are the female characters taking on more homemaking/childcare roles while males work outside of the home?
    • Arts vs. science: are boys presented as being more predisposed to science and girls more to arts?
    • Positive vs. negative descriptions: are characters of one gender described more positively or more negatively than those of the other?
    • Physical abilities vs. features: are male characters predominantly portrayed as strong and good at sports, while female characters are more often characterized by their looks?
  • Stereotypical language
    • Does the text use shared and generic pronouns (such as they or one) when possible?
    • Does the text use generic and not gender-specific nouns (e.g., police officers rather than policemen)?
    • Does the text contain clichés?
    • Does the text use titles appropriately (e.g., Dr. Lucy Surgeon vs. woman doctor Lucy)?
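Some of the stereotypical-language questions above lend themselves to simple lexicon lookups. The Python sketch below is illustrative only and not the project's method: the lexicons `GENDERED_TO_GENERIC`, `GENERIC_PRONOUNS` and `GENDERED_PRONOUNS` and the helper names are hypothetical, and the regular-expression tokenizer stands in for a proper one. It counts gender-specific nouns that have a generic alternative and computes the share of generic pronoun forms.

```python
import re
from collections import Counter

# HYPOTHETICAL, deliberately tiny lexicons for illustration only.
GENDERED_TO_GENERIC = {
    "policeman": "police officer",
    "policemen": "police officers",
    "fireman": "firefighter",
    "firemen": "firefighters",
    "chairman": "chairperson",
    "mailman": "mail carrier",
}
# "one" can also be a numeral, so this simple lookup may overcount it.
GENERIC_PRONOUNS = {"they", "them", "their", "one"}
GENDERED_PRONOUNS = {"he", "him", "his", "she", "her", "hers"}


def tokenize(text):
    """Lowercase word tokens; a stand-in for a proper tokenizer."""
    return re.findall(r"[a-z]+", text.lower())


def flag_gendered_nouns(text):
    """Map each gender-specific noun found to (count, suggested generic noun)."""
    hits = Counter(t for t in tokenize(text) if t in GENDERED_TO_GENERIC)
    return {word: (n, GENDERED_TO_GENERIC[word]) for word, n in hits.items()}


def generic_pronoun_share(text):
    """Share of generic pronoun forms among all generic and gendered pronoun forms."""
    tokens = tokenize(text)
    n_generic = sum(t in GENERIC_PRONOUNS for t in tokens)
    n_gendered = sum(t in GENDERED_PRONOUNS for t in tokens)
    total = n_generic + n_gendered
    return n_generic / total if total else 0.0


text = "The policeman said he would help. Two policemen and a fireman came. They were kind."
print(flag_gendered_nouns(text))
print(generic_pronoun_share(text))  # 0.5
```

A lexicon approach is transparent but brittle; titles (e.g., "woman doctor") and clichés would need phrase patterns or a trained model rather than single-word lookups.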

To complement the literature review, we applied topic modeling techniques (Blei et al., 2003) to a larger corpus of psychological, sociological and pedagogical research on gender bias in children's development more generally. The results of this quantitative analysis confirmed, first of all, that characters in books and media have an impact on how children perceive themselves and the world. The relevant topics emerging from this exercise again relate to leadership bias, mathematics and STEM gender stereotypes, and the gender brilliance stereotype. Another interesting finding is that caregivers and educators who adopt these stereotypes amplify their effect on how children perceive their own traits and abilities, leading to so-called “self-fulfilling prophecies” (e.g., if girls are not encouraged to pursue technical interests, they come to believe they are less capable than boys, which in turn discourages them from pursuing such interests).
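The reference list cites Latent Dirichlet Allocation (Blei et al., 2003). Assuming LDA is the kind of topic model meant here, the sketch below fits it with collapsed Gibbs sampling, an inference method swapped in for simplicity (the original paper uses variational inference). The function names, the hyperparameters `alpha`, `beta` and `n_iter`, and the toy corpus are assumptions for illustration, not the project's settings.

```python
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    docs: list of documents, each a list of word tokens.
    Returns (vocab, topic_word), where topic_word[k][v] is the smoothed
    probability of vocab[v] under topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    n_dk = [[0] * n_topics for _ in docs]       # document-topic counts
    n_kw = [[0] * V for _ in range(n_topics)]   # topic-word counts
    n_k = [0] * n_topics                        # tokens per topic
    z = []                                      # topic assignment of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Full conditional p(z_i = t | all other assignments), up to a constant.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    topic_word = [
        [(n_kw[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)]
        for k in range(n_topics)
    ]
    return vocab, topic_word


def top_words(vocab, topic_word, n=3):
    """The n most probable words of each topic."""
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
        for row in topic_word
    ]


# Hypothetical toy corpus of pre-tokenized abstracts.
docs = [
    ["leader", "boss", "lead", "leader"],
    ["math", "science", "stem", "math"],
    ["leader", "lead", "boss"],
    ["stem", "science", "math"],
]
vocab, topic_word = lda_gibbs(docs, n_topics=2, n_iter=50)
print(top_words(vocab, topic_word))
```

On a real corpus, preprocessing (stop-word removal, lemmatization) and the choice of the number of topics matter far more than the sampler itself.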

Intersectionality is also a relevant topic in the literature on the development of representations in children. The intersection of race and gender is particularly important in this context, but gender also intersects with other dimensions such as medical conditions. For example, developmental disorders associated with learning problems, such as ADHD or autism, are diagnosed more often in boys than in girls (Guy et al., 2022). While these aspects are outside the scope of the current project, we highlight their relevance for future research.

How can the aspects identified in the previous section be measured?

Review of papers in the field of natural language processing

We reviewed papers published on arXiv, an online repository for preprints, to identify which natural language processing (NLP) tools can be used to extract the previously identified dimensions from unstructured text. We queried abstracts published after 2014 in computer science, specifically the categories Computation and Language and Computers and Society. A total of 295 papers were deemed relevant (as of August 28, 2023). The papers deemed useful were read, and a review data set was built from them.

Character extraction

A key step in the analysis, and not a trivial one, is character extraction. Named entity recognition (NER) is an NLP task that seeks to locate and classify named entities mentioned in text. Available NER models have been pre-trained on news corpora and do not transfer well to books in general, and to children's books in particular.

For this purpose, we trained a custom NER model on a selection of stories from Project Gutenberg and manually annotated the main characters and their gender based on plot summaries on Wikipedia.

Computing the measures

Once the characters are extracted and their gender is known, representation measures such as the proportions of female vs. male characters can easily be computed.
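Given per-mention character names (for example from the NER step) and a manually annotated gender for each character, the representation measures reduce to counting. In this hypothetical sketch, the function name, the toy names and counts, and the use of the most-mentioned character as a proxy for the main character are all assumptions, not the project's definitions.

```python
from collections import Counter


def representation_measures(mentions, gender):
    """Simple representation measures for one book.

    mentions: character names, one entry per mention (e.g. output of an NER step).
    gender:   character name -> "female" or "male" (e.g. manual annotation).
    Mentions of names missing from `gender` are ignored; returns None if none remain.
    """
    counts = Counter(m for m in mentions if m in gender)
    if not counts:
        return None
    characters = list(counts)
    female_chars = sum(gender[c] == "female" for c in characters)
    female_mentions = sum(n for c, n in counts.items() if gender[c] == "female")
    # Assumption: the most-mentioned character serves as a proxy for the protagonist.
    main = counts.most_common(1)[0][0]
    return {
        "share_female_characters": female_chars / len(characters),
        "share_female_mentions": female_mentions / sum(counts.values()),
        "main_character": main,
        "main_character_gender": gender[main],
    }


# Hypothetical toy input, not project data.
mentions = ["Tom", "Anna", "Tom", "Bob", "Tom", "Mia", "Anna"]
gender = {"Tom": "male", "Anna": "female", "Bob": "male", "Mia": "female"}
print(representation_measures(mentions, gender))
```

Counting mentions as well as characters distinguishes a book with balanced casts but male-dominated narration from one that is balanced on both counts.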

But how important are the different characters to the story? To answer this, we built a network of characters based on how often they are mentioned in close proximity to each other. From this network, we can compute centrality measures, such as betweenness, for each character.
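One way to build such a network, sketched here under assumptions: each mention is a `(token_position, character)` pair, two characters gain one unit of edge weight whenever their mentions fall within a hypothetical `window` of 15 tokens, and betweenness is computed with Brandes' algorithm on the unweighted graph (edge weights only enter the weighted degree, or "strength"). The function names are illustrative, not the project's code.

```python
from collections import defaultdict, deque


def cooccurrence_network(mentions, window=15):
    """Weighted character network from (token_position, character) mentions.

    Two distinct characters get +1 edge weight for every pair of their
    mentions that lie at most `window` tokens apart.
    """
    mentions = sorted(mentions)
    edges = defaultdict(int)
    for i, (pos_i, a) in enumerate(mentions):
        for pos_j, b in mentions[i + 1:]:
            if pos_j - pos_i > window:
                break  # sorted by position, so later mentions are even farther away
            if a != b:
                edges[frozenset((a, b))] += 1
    return dict(edges)


def betweenness(adj):
    """Brandes' algorithm: betweenness centrality of an unweighted, undirected graph."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, pred = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths from s
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:  # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies in order of decreasing distance
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each undirected shortest path was counted once from each endpoint.
    return {v: b / 2 for v, b in bc.items()}


def character_centrality(mentions, window=15):
    """Weighted degree ("strength") and betweenness for every mentioned character."""
    edges = cooccurrence_network(mentions, window)
    adj = {name: set() for _, name in mentions}
    strength = dict.fromkeys(adj, 0)
    for pair, weight in edges.items():
        a, b = tuple(pair)
        adj[a].add(b)
        adj[b].add(a)
        strength[a] += weight
        strength[b] += weight
    bc = betweenness(adj)
    return {v: {"strength": strength[v], "betweenness": bc[v]} for v in adj}


# Hypothetical toy mentions: B links A and C, which never co-occur directly.
mentions = [(0, "A"), (5, "B"), (20, "C")]
print(character_centrality(mentions, window=15))
```

Betweenness rewards characters who connect otherwise separate parts of the cast, which complements plain mention counts as a measure of how integral a character is to the story.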

For the stereotypes related to professions, appearance and intelligence, we create dictionaries for the given categories and for gendered words such as woman, man, queen, king, etc. This allows us to measure how a target word relates to gendered words through their co-occurrence within a context window.
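A minimal sketch of such a dictionary-based co-occurrence measure, assuming toy dictionaries (`FEMALE_WORDS`, `MALE_WORDS`, `CATEGORIES`) and a symmetric context of `window` tokens on each side; the project's actual dictionaries, window size and scoring may differ. For every category word, it counts female and male words in the surrounding context and reports the female share per category.

```python
import re
from collections import Counter

# HYPOTHETICAL toy dictionaries; a real study would use curated, validated lists.
FEMALE_WORDS = {"woman", "women", "girl", "girls", "queen", "she", "her", "mother"}
MALE_WORDS = {"man", "men", "boy", "boys", "king", "he", "him", "his", "father"}
CATEGORIES = {
    "appearance": {"beautiful", "pretty", "handsome", "dress"},
    "intelligence": {"clever", "smart", "wise", "brilliant"},
    "professions": {"doctor", "nurse", "engineer", "teacher"},
}


def gendered_cooccurrence(text, window=5):
    """Count female/male words within `window` tokens of each category word."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = {cat: Counter() for cat in CATEGORIES}
    for i, tok in enumerate(tokens):
        for cat, words in CATEGORIES.items():
            if tok not in words:
                continue
            context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            counts[cat]["female"] += sum(t in FEMALE_WORDS for t in context)
            counts[cat]["male"] += sum(t in MALE_WORDS for t in context)
    return counts


def female_share(counts):
    """Female share of gendered co-occurrences per category (None if there are none)."""
    return {
        cat: c["female"] / (c["female"] + c["male"]) if c["female"] + c["male"] else None
        for cat, c in counts.items()
    }


text = "The beautiful queen smiled. Many days later, far away, the clever king spoke."
counts = gendered_cooccurrence(text, window=3)
print(female_share(counts))
```

Because this simple tokenizer ignores sentence boundaries, context windows can cross sentences; a real pipeline would likely restrict windows to sentences or use dependency relations.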

Final model

Finally, we build a gender response from word embeddings obtained from large language models, assuming that these embeddings do indeed encode gender bias (Bolukbasi et al., 2016). We then fit a regression model that uses our computed measures to explain this gender response, learning the weight each measure should receive in the construction of a transparent score.
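The text does not specify how the gender response is derived from the embeddings. The sketch below assumes one simple option inspired by Bolukbasi et al. (2016): a unit gender direction built from female-minus-male word-pair differences (they use PCA over several pairs instead), with a text's response taken as the mean cosine similarity of its words with that direction. The regression step is sketched as ordinary least squares with an intercept, solved via the normal equations. The 3-dimensional embeddings, word pairs, feature values and function names are hypothetical; real embeddings would come from a language model.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gender_direction(emb, pairs=(("she", "he"), ("woman", "man"))):
    """Unit vector along the summed female-minus-male differences of the word pairs.

    A simplification: Bolukbasi et al. (2016) derive the direction with PCA.
    """
    diff = [0.0] * len(next(iter(emb.values())))
    for f, m in pairs:
        diff = [d + a - b for d, a, b in zip(diff, emb[f], emb[m])]
    norm = math.sqrt(dot(diff, diff))
    return [d / norm for d in diff]


def gender_response(tokens, emb, direction):
    """Mean cosine similarity of in-vocabulary tokens with the direction (> 0 leans female)."""
    sims = [dot(emb[t], direction) / math.sqrt(dot(emb[t], emb[t])) for t in tokens if t in emb]
    return sum(sims) / len(sims) if sims else 0.0


def ols(X, y):
    """Least squares with intercept: solve (X'X) b = X'y by Gauss-Jordan elimination.

    Returns [intercept, weight_1, ..., weight_p].
    """
    rows = [[1.0] + list(x) for x in X]
    p = len(rows[0])
    cols = [[r[i] for r in rows] for i in range(p)]
    # Augmented normal-equation matrix [X'X | X'y].
    A = [[dot(cols[i], cols[j]) for j in range(p)] + [dot(cols[i], y)] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda row: abs(A[row][c]))  # partial pivoting
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Hypothetical toy embeddings.
emb = {
    "she": [1.0, 0.0, 0.0], "he": [-1.0, 0.0, 0.0],
    "woman": [1.0, 0.1, 0.0], "man": [-1.0, 0.1, 0.0],
    "queen": [0.8, 0.5, 0.0], "king": [-0.8, 0.5, 0.0], "table": [0.0, 0.0, 1.0],
}
direction = gender_direction(emb)
print(gender_response(["queen", "table"], emb, direction))

# Hypothetical per-book measures (rows) and gender responses, fitted by OLS.
X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1, 3, -2, 0, 2]
print(ols(X, y))  # approximately [1, 2, -3]
```

A linear model keeps the score transparent: each fitted weight states how much one measure contributes, which is the interpretability the project aims for.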

References

  • Blei, D. M., et al. (2003). “Latent Dirichlet Allocation”. Journal of Machine Learning Research, 3, 993–1022. https://www.jmlr.org/papers/volume3/blei03a/blei03a.pdf
  • Bolukbasi, T., et al. (2016). “Man is to computer programmer as woman is to homemaker? Debiasing word embeddings”. Advances in Neural Information Processing Systems, 29. https://proceedings.neurips.cc/paper_files/paper/2016/file/a486cd07e4ac3d270571622f4f316ec5-Paper.pdf
  • Guy, J., et al. (2022). “Dimensions of cognition, behavior, and mental health in struggling learners: A spotlight on girls”. JCPP Advances, 2(4), e12082. https://acamh.onlinelibrary.wiley.com/doi/10.1002/jcv2.12082
  • McCabe, J., et al. (2011). “Gender in twentieth-century children’s books: Patterns of disparity in titles and central characters”. Gender & Society, 25(2), 197–226. http://www.jstor.org/stable/23044136
  • Narahara, M. M. (1998). “Gender stereotypes in children's picture books”. https://eric.ed.gov/?id=ED419248
  • Pownall, M., & Heflick, N. (2023). “Mr. Active and Little Miss Passive? The transmission and existence of gender stereotypes in children’s books”. Sex Roles, 89(11), 758–773. https://link.springer.com/article/10.1007/s11199-023-01409-2
  • Railsback, D. E. (1993). “Reading for equality: An examination of gender-bias in children's literature”. https://scholarworks.lib.csusb.edu/cgi/viewcontent.cgi?article=1680&context=etd-project
  • Turner-Bowker, D. M. (1996). “Gender stereotyped descriptors in children's picture books: Does ‘Curious Jane’ exist in the literature?”. Sex Roles, 35, 461–488. https://digitalcommons.uri.edu/theses/1585/
  • Wollman-Bonilla, J. E. (1998). “Outrageous viewpoints: Teachers’ criteria for rejecting works of children’s literature”. Language Arts, 75(4), 287–295. http://www.jstor.org/stable/41962063
© TU Wien