DGBIAS – Detecting gender bias in children's books

Aladdin and the Wonderful Lamp

Principal Investigator: Laura Vana-Gür (TU Wien)

Team: Camilla Damian (VU Amsterdam), Laura Vana-Gür (TU Wien)

Program / Call: FWF 1000 Ideas Programme

Start: 01 December 2021, duration: 24 months



Gender stereotypes form early in the child's development and are carried over throughout adolescence into adulthood, leaving long-lasting effects which may impact activity and career choices, as well as academic performance. Books, in particular, can have considerable influence, as their characters serve to shape role models of femininity and masculinity for young children. Thus, gender under- and misrepresentation in children's textual literature can contribute to the internalization and reinforcement of negative stereotypes. To address this issue, we aim to identify and measure relevant dimensions of gender bias in children's books with the aid of both qualitative and quantitative techniques: systematic literature review across disciplines, synthesis and (expert) validation on the one hand and state-of-the-art NLP methods on the other. By exploiting such an integrated research framework, we believe that we can automate the detection of potentially biased text while enhancing the interpretability and transparency of the results.


What is gender bias and what are its relevant dimensions?

Designing a unified, measurable concept of gender bias is an ambitious task, mostly because of the complexity of the concept itself. There is no real consensus on how exactly to measure gender bias and gender stereotypes, or on which factors or aspects are relevant when building such a measure.

To identify dimensions along which gender bias should be measured, we drew on specific recommendations from the qualitative literature on detecting gender bias and stereotypes in children’s literature. Important topics here are (see, e.g., Railsback, 1993; Narahara, 1998; Turner-Bowker, 1996; Wollman-Bonilla, 1998; McCabe et al., 2011; Pownall et al., 2023):

  • Representation
    •  What is the gender of the main character?
    •  Is there a balanced number of female and male characters?
    •  Do the female characters play an integral part, i.e., do they speak a lot, and how central or influential are they to the story?
  • Stereotypes in characters
    • Active vs. dependent roles: E.g., Who are the breadwinners in the family? Who takes on leadership roles?
    • Professions: Are technical careers more likely to be pursued by the male characters? Are the female characters taking on more homemaking/childcare roles while males work outside of the home?
    • Arts vs. science: Are boys presented as being more predisposed to science and girls more to arts?
    • Positive vs. negative descriptions
    • Physical abilities vs. features: Are male characters predominantly portrayed as being strong and good at sports? On the other hand, are females more often characterized by their looks?
  • Stereotypical language
    • Does the text use shared and generic pronouns (such as they or one) when possible?
    • Does the text use generic and not gender-specific nouns (e.g., police officers rather than policemen)?
    • Does the text contain cliches?
    • Does the text use titles appropriately (e.g., Dr. Lucy Surgeon vs. woman doctor Lucy)?

To complement the literature review, we applied topic modeling techniques (reference) to a larger corpus of psychological, sociological and pedagogical research related to gender bias in children’s development more generally. The results of this quantitative analysis confirmed, first of all, that characters in books and media have an impact on children’s perception of themselves and the world. The relevant topics emerging from this exercise again relate to leadership bias, mathematics and STEM gender stereotypes, and the gender brilliance stereotype. Another interesting finding is that the adoption of these stereotypes by caregivers and educators amplifies children’s perception of their own traits and abilities, leading to so-called “self-fulfilling prophecies” (e.g., if girls are not encouraged to pursue technical interests, they come to believe they are less capable than boys, which in turn makes them not want to pursue such interests).

Intersectionality is also a relevant topic in the literature on the development of representations in children. The intersection of race and gender is particularly important in this context. Gender also intersects with other dimensions such as medical conditions. For example, diagnoses of developmental disorders associated with learning problems, such as ADHD or autism, are made more often for boys than for girls (Guy et al., 2022). While these aspects are outside the scope of the current project, we highlight their relevance for future research.

How can the aspects identified in the previous section be measured?

Review of papers in the field of natural language processing

We performed a literature review of papers published on arXiv, an online repository for pre-prints, to identify which natural language processing (NLP) tools can be employed to extract the previously identified dimensions from unstructured text. We queried abstracts in the fields of computer science, computation and language, and computation and society published after 2014. A total of 295 papers were deemed to be relevant (as of August 28, 2023). Papers deemed useful were read and a review data set was built.

Character extraction

A key step in the analysis is character extraction, which is not trivial. Named entity recognition (NER) is an NLP task which seeks to locate and classify named entities mentioned in text. Existing algorithms for this task have largely been pre-trained on news corpora and do not transfer well to books in general and children's books in particular.

We therefore trained a custom NER model on some of the stories in Project Gutenberg and also manually encoded the main characters and their gender based on plot summaries on Wikipedia.

Computing the measures

Once the characters are extracted and their gender is known, representation measures such as proportions of female vs male characters can easily be computed.
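As a minimal sketch of such representation measures, the Python snippet below computes the share of characters and the share of mentions per gender. The character list, the mention counts and the function name gender_shares are invented for illustration and are not taken from the project's data or code.

```python
from collections import Counter

# Hypothetical output of the character-extraction step: (name, gender, mentions).
# All values are made up for illustration.
characters = [
    ("Aladdin", "male", 120),
    ("Princess", "female", 45),
    ("Magician", "male", 60),
    ("Mother", "female", 30),
]


def gender_shares(chars):
    """Return (share of characters, share of mentions) per gender."""
    char_counts = Counter(gender for _, gender, _ in chars)
    mention_counts = Counter()
    for _, gender, mentions in chars:
        mention_counts[gender] += mentions
    n_chars = sum(char_counts.values())
    n_mentions = sum(mention_counts.values())
    char_share = {g: c / n_chars for g, c in char_counts.items()}
    mention_share = {g: m / n_mentions for g, m in mention_counts.items()}
    return char_share, mention_share


char_share, mention_share = gender_shares(characters)
print(char_share)
print(mention_share)
```

In this toy example the characters are balanced by head count, while male characters account for most of the mentions, which is why it can be useful to look at both shares.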

But how important are the different characters to the story? To answer this, we built a network of characters by counting how many times they are mentioned in close proximity to each other. Based on this network we can compute centrality measures, such as betweenness, for each character.
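The idea of a character network can be sketched in a few lines of Python. The window size, the toy sentence and all function names below are assumptions made for illustration; the project's actual proximity definition and centrality measures are not specified here. Betweenness is computed with Brandes' algorithm on the unweighted graph, which is one common choice among several.

```python
from collections import defaultdict, deque
from itertools import combinations


def cooccurrence_weights(tokens, characters, window=4):
    """Count how often two different characters occur within `window` tokens."""
    mentions = [(i, t) for i, t in enumerate(tokens) if t in characters]
    weights = defaultdict(int)
    for (i, a), (j, b) in combinations(mentions, 2):
        if a != b and j - i <= window:
            weights[frozenset((a, b))] += 1
    return dict(weights)


def adjacency(weights, characters):
    """Turn the weighted pairs into an undirected, unweighted adjacency map."""
    adj = {c: set() for c in characters}
    for pair in weights:
        a, b = tuple(pair)
        adj[a].add(b)
        adj[b].add(a)
    return adj


def betweenness(adj):
    """Unnormalised betweenness centrality (Brandes' algorithm, unweighted)."""
    score = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)   # number of shortest paths from s
        dist = dict.fromkeys(adj, -1)   # -1 marks unvisited nodes
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                    # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:                    # accumulate dependencies backwards
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # In an undirected graph every path is counted once from each endpoint.
    return {v: b / 2 for v, b in score.items()}


# Toy text, lower-cased and tokenised by whitespace (illustrative only).
tokens = ("aladdin followed the magician to the cave and then "
          "aladdin married the princess").split()
characters = {"aladdin", "magician", "princess"}

weights = cooccurrence_weights(tokens, characters, window=4)
strength = {c: sum(w for pair, w in weights.items() if c in pair)
            for c in characters}
central = betweenness(adjacency(weights, characters))
print(strength)
print(central)
```

In this toy network the magician and the princess are only connected through Aladdin, so Aladdin is the only character with non-zero betweenness.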

For the stereotypes related to professions, appearance and intelligence, we create dictionaries for the given categories and for gendered words such as woman, man, queen, king, etc. This allows us to measure how a target word relates to gendered words by means of their co-occurrence in a context window.
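A rough sketch of such a dictionary-based co-occurrence count is given below. The word lists, the window size and the function name window_cooccurrence are hypothetical; the project's actual dictionaries are not reproduced here.

```python
from collections import Counter

# Hypothetical mini-dictionaries, invented for illustration.
GENDERED = {
    "female": {"woman", "queen", "she", "her", "girl"},
    "male": {"man", "king", "he", "his", "boy"},
}
TARGETS = {
    "appearance": {"beautiful", "pretty"},
    "intelligence": {"clever", "wise"},
}


def window_cooccurrence(tokens, targets, gendered, window=5):
    """Count gendered words within `window` tokens of each target word."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        for category, words in targets.items():
            if tok not in words:
                continue
            # Context: up to `window` tokens on either side of the target.
            context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            for gender, gwords in gendered.items():
                counts[(category, gender)] += sum(w in gwords for w in context)
    return counts


text = "the queen was beautiful and the king was wise but she was clever too"
counts = window_cooccurrence(text.split(), TARGETS, GENDERED, window=3)
for key, value in sorted(counts.items()):
    print(key, value)
```

Raw counts like these would normally be normalised, for example by the overall frequency of the gendered words, before being compared across books.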

Final model

Finally, we build a gender response from word embeddings obtained from large language models, by assuming that these word embeddings do indeed encode gender bias (Bolukbasi et al., 2016). We then build a regression model using our computed measures to explain this gender response and to learn the weights each measure should get in the construction of a transparent score.
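One simple way to obtain a gender response from word embeddings is to project each word vector onto a gender direction. The sketch below uses the difference between the vectors for "he" and "she" as that direction, which is a simplification of Bolukbasi et al. (2016), who derive the direction from several definitional word pairs. The three-dimensional vectors and the function names are invented for illustration; this is not necessarily how the project constructs its response.

```python
import math

# Toy 3-dimensional vectors, invented for illustration. Real embeddings come
# from a trained model and have hundreds of dimensions.
EMB = {
    "he": [1.0, 0.2, 0.0],
    "she": [-1.0, 0.2, 0.0],
    "engineer": [0.6, 0.8, 0.1],
    "nurse": [-0.7, 0.7, 0.1],
    "teacher": [0.0, 1.0, 0.0],
}


def unit(v):
    """Scale a vector to length one."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def gender_response(word, emb=EMB):
    """Cosine between a word vector and the he-minus-she direction.

    Positive values lean towards "he", negative values towards "she".
    """
    direction = unit([a - b for a, b in zip(emb["he"], emb["she"])])
    return sum(a * b for a, b in zip(unit(emb[word]), direction))


for word in ("engineer", "nurse", "teacher"):
    print(word, round(gender_response(word), 3))
```

Values of this kind could then serve as the response variable in a regression on the computed measures.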