Developing a New Research Method, Part 1: Photovoice, critical fabulation, and archives

by Prakash Krishnan

Prakash Krishnan is a 2024 Flickr Foundation Research Fellow, working to engage community organizations with the creative possibilities afforded through archival and photo research as well as to unearth and activate some of the rich histories embedded in the Flickr archive.

I had the wonderful opportunity to visit London and Flickr Foundation HQ during the month of May 2024. The first month of my fellowship was a busy one, getting settled in, meeting the team, and making contacts around the UK to share and develop my idea for a new qualitative research method that was inspired by my perusing of just a minuscule fraction of the billions of photos uploaded and visible on Flickr.com.

Unlike the brilliant and techno-inspired minds of my Flickr Foundation cohort: George, Alex, Ewa, and Eryk, my head is often drifting in the clouds (the ones in the actual sky) or deep in books, articles, and archives. Since rediscovering Flickr and contemplating its many potential uses, I have activated my past work as a researcher, artist, and cultural worker, to reflect upon the ways Flickr could be used to engage communities in various visual and digital ethnographies.

Stemming from anthropology and the social sciences more broadly, ethnography is a branch of qualitative research involving the study of cultures, communities, or organizations. A visual ethnography thereby employs visual methods, such as photography, film, drawing, or painting. Similarly, digital ethnography refers to the ethnographic study of cultures and communities as they interact with digital and internet technologies.

In this first post, I will trace a nonlinear timeline of different community-based and academic research projects I have conducted in recent years. Important threads from each of these projects came together to form the basis of the new ethnographic method I have developed over the course of this fellowship, which I call Archivevoice.

Visual representations of community

The research I conducted for my master’s thesis was an example of a digital, visual ethnography. For a year, I observed Instagram accounts sharing curated South Asian visual media, analyzing the types of content they shared, the media they used, the platform affordances they engaged with, the comments and discussions the posts incited, and how the posts reflected contemporary news, culture, and politics. I also interviewed five people whose content I had studied. Through this research I observed a strong presence of uniquely diasporic concerns and aesthetics. Many posts critiqued the idea of distinct nationhoods and national affiliations with the countries founded after the partition of India in 1947 – a violent division of the subcontinent resulting in mass displacement and casualties whose effects are still felt today. Because of this violent displacement, and with multiple generations of people descended from the Indian subcontinent living outside their ancestral territory, I observed among many in the community a rejection of nationalist identities specific to, say, India, Pakistan, or Bangladesh. Instead, people used the term “South Asian” as a general catchall for communities living in the region as well as in the diaspora. Drawing from queer cultural theorist José Esteban Muñoz, I labelled this digital, cultural phenomenon “digital disidentification.”[1]

My explorations of community-based visual media predate this research. In 2022, I worked with Cyber Love Hotel, a Montreal grassroots artist collective and studio, to develop Things+Time, a digital archive and exhibition space for 3D-scanned artworks and cultural objects. In 2023, we hosted a several-week-long residency with 10 local, racialized, and queer artists. The residents were trained in archival description and tagging principles, and then selected what to archive. The objects curated and scanned during this residency responded to the overarching theme of loss during the Covid-19 pandemic, as closures of queer spaces, restaurants, nightlife, music venues, and other community gathering spaces proliferated across the city.

During the full pandemic lockdown, while working as the manager for cultural mediation at the contemporary gallery Centre CLARK, I conducted a similar project in which participants took photographs responding to a specific prompt. In partnership with the community organization Head & Hands, I mailed disposable cameras to participants from a Black youth group based there. Together with artist and CLARK member Eve Tangy, I created educational videos on the principles of photography and disposable camera use, and tasked the participants with photographing moments around their neighbourhoods that, in their eyes, sparked Black Joy – the theme of the project. Following a feedback session with Eve and myself, the two preferred photos from each participant’s reel were printed and mounted at the entrance of Centre CLARK’s gallery as part of a community exhibition entitled Nous sommes ici (“We’re Here”).


These public community projects were not formal or academic, but I came to understand each of them as an example of what is called research-creation (also known as practice-based or arts-based research). Through creative methods like photography and curating objects for digital archiving, I, as the facilitator/researcher, was interested in how the media comprising each exhibition would inform me and the greater public about the experiences of marginalized artists and Black youth at such pivotal moments for these communities.

Photovoice: Empowering research participants

Both of these projects involved working with a community and giving its members creative control over how they were represented, which reminded me of Photovoice, a popular qualitative research method used within public health, sociology, and anthropology. The method was originally coined as Photo Novella in 1992 and later renamed Photovoice in 1996 by researchers Caroline Wang and Mary Ann Burris. The flagship study that established the method involved providing cameras and photography training to low-income women living in rural villages of Yunnan, China.

The goals of this Photovoice research were to better understand, through the perspectives of these women, the challenges they faced within their communities and societies, and to communicate these concerns to policymakers who might be more amenable to photographic representations than to text. Citing Paulo Freire, Wang and Burris note the potential of photographs, as political objects, to raise consciousness and promote collective action.[5]

According to Wang and Burris, “these images and tales have the potential to reach generations of children to come.”[6] The images created a medium through which the women could share their experiences and relate to one another. Even with 50 villages represented in the research, certain photographs evoked shared experiences and strong reactions among participants – including a picture of a young child lying in a field while her mother farmed nearby.

According to the authors, “the image was virtually universal to their own experience. When families must race to finish seasonal cultivating, when their work load is heavy, and when no elders in the family can look after young ones, mothers are forced to bring their babies to the field. Dust and rain weaken the health of their infants… The photograph was a lightening [sic] rod for the women’s discussion of their burdens and needs.” [8]

Since its conception in the 1990s as a means for participatory needs assessment, the Photovoice methodology has been expanded by many scholars and researchers. Given the exponential increase in camera access via smartphones, Photovoice is an increasingly feasible method for this kind of research. Recurring themes in Photovoice work include community health, mental health, ethnic and race-based studies, research with queer communities, and specific neighbourhood and urban studies. During the pandemic lockdowns, Photovoice studies were also conducted entirely online, giving rise to the method of virtual Photovoice.[9]

Critical Fabulation: Filling the gaps in visual history

Following my master’s thesis research, I became more interested in how communities sought to represent themselves through photography and digital media – and in how communities would form around and engage with content circulated on social media, despite not being its originators.

In my research, people reacted most strongly to family photographs depicting migration from South Asia to the Global North. Although reasons for emigration varied across the respondents, many people faced similar challenges with the immigration process and resettlement in a new territory. They shared their experiences through commenting online. 

People in communities underrepresented in traditional archives are often forced to work with limited documentation, and must do the critical and imaginative work of extrapolating from what they find. While photographs can convey biographical, political, or historical meaning, exploring archived images with imagination can foster creative interpretation that fills gaps in the archival record. The scholar of African-American studies Saidiya Hartman introduced the term “critical fabulation” to denote this practice of reimagining the sequences of events and actors behind the narratives contained within the archive. In her words, this reconfiguration of story elements attempts “to jeopardize the status of the event, to displace the received or authorized account, and to imagine what might have happened or might have been said or might have been done.”[10] In reference to depictions of narratives from the Atlantic slave trade, in which enslaved people are often referred to as commodities, Hartman writes: “the intent of this practice is not to give voice to the slave, but rather to imagine what cannot be verified, a realm of experience which is situated between two zones of death—social and corporeal death—and to reckon with the precarious lives which are visible only in the moment of their disappearance. It is an impossible writing which attempts to say that which resists being said (since dead girls are unable to speak). It is a history of an unrecoverable past; it is a narrative of what might have been or could have been; it is a history written with and against the archive.”[11]

I am investigating what it means to imagine the unverifiable and to reckon with what only becomes visible at its disappearance. In 2020, I wrote about Facebook pages serving as archives of queer life in my home town, Montreal.[12] For that study, I once again conducted a digital ethnography, this time of the event pages surrounding a QTPOC (queer/trans person of colour)-led event series known as Gender B(l)ender. Drawing from Sam McBean, I argued that simply having access to these event pages on Facebook creates a space of possibility in which one can imagine oneself as part of these events and communities – even when physical, in-person participation is not possible. Although critical fabulation was not a method used in that study, it now seems like a precursor to this concept of collectively rethinking, reformulating, and resurrecting untold, unknown, or forgotten histories in the archives. This finally leads us to the project of my fellowship here at the Flickr Foundation.

In addition to this fellowship, I am coordinator of the Access in the Making Lab, a university research lab working broadly on issues of critical disability studies, accessibility, anti-colonialism, and the environmental humanities. In my work, I am increasingly preoccupied with the question of methods: 1) how do we do archival research – especially ethical archival research – with historically marginalized communities; and 2) how can research “subjects” be empowered and recognized as co-producers of research?

I trace this convoluted genealogy of my own fragmented research and community projects to explain the method I am developing and have proposed to university researchers as part of my fellowship. Following my work on Facebook and Instagram, I similarly position Flickr as a participatory archive, made by millions of people in millions of communities.[13] Eryk Salvaggio, a fellow 2024 Flickr Foundation research fellow, also positions Flickr as an archive, one that “holds digital copies of historical artifacts for individual reflection and context.”[14] From this theoretical groundwork of seeing online social image repositories as archives, I seek to position archival items – i.e. the photos uploaded to Flickr.com – as a medium for creative interpretation through which researchers can better understand the lived realities of different communities, much like the Photovoice researchers. I am calling this set of work and use cases “Archivevoice”.

In part two of this series, I will explore the methodology itself in more detail, including a guide for researchers interested in engaging with this method.

Footnotes

[1] Prakash Krishnan, “Digital Disidentifications: A Case Study of South Asian Instagram Community Archives,” in The Politics and Poetics of Indian Digital Diasporas: From Desi to Brown (Routledge, 2024), https://www.routledge.com/The-Politics-and-Poetics-of-Indian-Digital-Diasporas-From-Desi-to-Brown/Jiwani-Tremblay-Bhatia/p/book/9781032593531.

[2] Caroline Wang and Mary Ann Burris, “Empowerment through Photo Novella: Portraits of Participation,” Health Education Quarterly 21, no. 2 (1994): 171–86.

[3] Kunyi Wu, Visual Voices: 100 Photographs of Village China by the Women of Yunnan Province, 1995.

[4] Wu.

[5] Caroline Wang and Mary Ann Burris, “Photovoice: Concept, Methodology, and Use for Participatory Needs Assessment,” Health Education & Behavior 24, no. 3 (1997): 384.

[6] Wang and Burris, “Empowerment through Photo Novella,” 179.

[7] Wang and Burris, “Empowerment through Photo Novella.”

[8] Wang and Burris, 180.

[9] John L. Oliffe et al., “The Case for and Against Doing Virtual Photovoice,” International Journal of Qualitative Methods 22 (March 1, 2023): 16094069231190564, https://doi.org/10.1177/16094069231190564.

[10] Saidiya Hartman, “Venus in Two Acts,” Small Axe 12, no. 2 (2008): 11.

[11] Hartman, 12.

[12] Prakash Krishnan and Stefanie Duguay, “From ‘Interested’ to Showing Up: Investigating Digital Media’s Role in Montréal-Based LGBTQ Social Organizing,” Canadian Journal of Communication 45, no. 4 (December 8, 2020): 525–44, https://doi.org/10.22230/cjc.2020v44n4a3694.

[13] Isto Huvila, “Participatory Archive: Towards Decentralised Curation, Radical User Orientation, and Broader Contextualisation of Records Management,” Archival Science 8, no. 1 (March 1, 2008): 15–36, https://doi.org/10.1007/s10502-008-9071-0.

[14] Eryk Salvaggio, “The Ghost Stays in the Picture, Part 1: Archives, Datasets, and Infrastructures,” Flickr Foundation (blog), May 29, 2024, https://www.flickr.org/the-ghost-stays-in-the-picture-part-1-archives-datasets-and-infrastructures/.

Bibliography

Hartman, Saidiya. “Venus in Two Acts.” Small Axe 12, no. 2 (2008): 1–14.

Huvila, Isto. “Participatory Archive: Towards Decentralised Curation, Radical User Orientation, and Broader Contextualisation of Records Management.” Archival Science 8, no. 1 (March 1, 2008): 15–36. https://doi.org/10.1007/s10502-008-9071-0.

Krishnan, Prakash. “Digital Disidentifications: A Case Study of South Asian Instagram Community Archives.” In The Politics and Poetics of Indian Digital Diasporas: From Desi to Brown. Routledge, 2024. https://www.routledge.com/The-Politics-and-Poetics-of-Indian-Digital-Diasporas-From-Desi-to-Brown/Jiwani-Tremblay-Bhatia/p/book/9781032593531.

Krishnan, Prakash, and Stefanie Duguay. “From ‘Interested’ to Showing Up: Investigating Digital Media’s Role in Montréal-Based LGBTQ Social Organizing.” Canadian Journal of Communication 45, no. 4 (December 8, 2020): 525–44. https://doi.org/10.22230/cjc.2020v44n4a3694.

Oliffe, John L., Nina Gao, Mary T. Kelly, Calvin C. Fernandez, Hooman Salavati, Matthew Sha, Zac E. Seidler, and Simon M. Rice. “The Case for and Against Doing Virtual Photovoice.” International Journal of Qualitative Methods 22 (March 1, 2023): 16094069231190564. https://doi.org/10.1177/16094069231190564.

Salvaggio, Eryk. “The Ghost Stays in the Picture, Part 1: Archives, Datasets, and Infrastructures.” Flickr Foundation (blog), May 29, 2024. https://www.flickr.org/the-ghost-stays-in-the-picture-part-1-archives-datasets-and-infrastructures/.

Wang, Caroline, and Mary Ann Burris. “Empowerment through Photo Novella: Portraits of Participation.” Health Education Quarterly 21, no. 2 (1994): 171–86.

———. “Photovoice: Concept, Methodology, and Use for Participatory Needs Assessment.” Health Education & Behavior 24, no. 3 (1997): 369–87.

Wu, Kunyi. Visual Voices: 100 Photographs of Village China by the Women of Yunnan Province, 1995.

The Ghost Stays in the Picture, Part 3: The Power of the Image

Eryk Salvaggio is a 2024 Flickr Foundation Research Fellow, diving into the relationships between images, their archives, and datasets through a creative research lens. This three-part series focuses on the ways archives such as Flickr can shape the outputs of generative AI in ways akin to a haunting. You can read parts one and two.

“Definitions belong to the definers, not the defined.”
― Toni Morrison, Beloved

Generative Artificial Intelligence is sometimes described as a remix engine. It is one of the more easily graspable metaphors for understanding the images it produces, but it’s also wrong.

As a digital collage artist working before the rise of artificial intelligence, I was always remixing images. I would manually search the public domain works available through the Internet Archive or Flickr Commons, downloading images into folders named for specific characteristics. An orange would be added to the folder for fruits, but also round and the color orange; cats could be found in both cats and animals.

I was organizing images solely by visual appearance, anticipating their retrieval whenever certain needs might emerge. If I needed something round to balance a particular composition, I could find it in the round folder, surrounded by other round things: fruits and stones and images of the sun, the globes of planets and human eyes.

Once in the folder, the images were shapes, and I could draw from them regardless of what they depicted. It didn’t matter where they came from. They were redefined according to their anticipated use. 

A Churning

This was remixing, but I look back on the practice with fresh eyes when I consider the metaphor as it is applied to diffusion models. My transformation of source material was based not merely on its shapes but on its meaning. New juxtapositions emerged, recontextualizing those images. They retained their original form, but engaged in new dialogues through virtual assemblages.

As I explore AI images and the datasets that help produce them, I find myself moving away from the concept of the remix. The remix is a form of picking up a melody and evolving it, and it relies on human expression. It is a relationship, a gesture made in response to another gesture.

To believe we could “automate” remixing assumes too much of the systems that do this work. Remixes require an engagement with the source material. Generative AI systems have no relationship with the meanings embedded in the materials they reconfigure. In the absence of engagement, what machines do is better described as a churn, combining two senses of the word. Generative AI models churn images in that they dissolve the surface of these images. Then they churn out new images, that is, “produce mechanically and in great volume.”

Of course, people can diffuse the surface meaning of images too. As a collagist, I could ignore the context of any image I liked. We can look at the stereogram below and see nothing but the moon. We don’t have to think about the tools used to make that image, or how it was circulated, or who profited from its production. But as a collagist, I could choose to engage with questions that were hidden by the surfaces of things. I could refrain from engagements with images, and their ghosts, that I did not want to disturb. 

Actions taken by a person can model actions taken by a machine. But the ability to automate a person’s actions does not suggest the right or the wisdom to automate those actions. I wonder if, in the case of diffusion models, we shouldn’t more closely scrutinize the act of prising meaning from an image and casting it aside. This is something humans do when they are granted, or demand, the power to do so. The automation of that power may be legal. But it also calls for thoughtful restraint. 

In this essay, I want to explore the power to inscribe into images. Traditionally, the power to extract images from a place has been granted to those with the means to do so. Over the years, the distribution and circulation of images has been weighed against those who hold little power to resist it. In the automation of image extraction for training generative artificial intelligence, I believe we are embedding this practice into a form of data colonialism. I suggest that power differentials haunt the images produced by AI, because those differentials have molded the contents of the datasets, and infrastructures, that result in those images.

The Crying Child

Temi Odumosu has written about the “digital reproduction of enslaved and colonized subjects held in cultural heritage collections.” In The Crying Child, Odumosu looks at the role of the digital image as a means of extending the life of a photographic memory. But this process is fraught, and Odumosu dedicates the paper to “revisiting those breaches (in trust) and colonial hauntings that follow photographed Afro-diasporic subjects from moment of capture, through archive, into code” (S290). The paper does so by focusing on a single image, taken in St. Croix in 1910:

“This photograph suspends in time a Black body, a series of compositional choices, actions, and a sound. It represents a child standing alone in a nondescript setting, barefoot with overpronation, in a dusty linen top too short to be a dress, and crying. Clearly in visible distress, with a running nose and copious tears rolling down its face, the child’s crinkled forehead gives a sense of concentrated energy exerted by all the emotion … Emotions that object to the circumstances of iconographic production.”

The image emerges from the Royal Danish Library. It was taken by Axel Ovesen, a military officer who operated a commercial photography business. The photograph was circulated as a postcard, and appears in a number of personal and commercial photo albums Odumosu found in the archive.

The unnamed crying child appeared to the Danish colonizers of the island as an amusement, and is labeled only as “the grumpy one” (in the sense of “uncooperative”). The contexts in which this image appeared and circulated were all oriented toward soothing and distancing the colonizers from the colonized. By reframing the photograph as a humorous novelty, the power to apply and remove meaning is exercised on behalf of those who purchase the postcard and mail it to others for a laugh. What is literally depicted in these postcards is, Odumosu writes, “the means of production, rights of access, and dissemination” (S295).

I am describing this essay at length because the practice of categorizing this image in an archive is so neatly aligned with the collection and categorization of training data for algorithmic images. Too often, the images used for training are treated solely as data, and training is defended as an act that leaves no traces. This is true in one sense: the digital copy remains intact.

But the image is degraded, literally, step by step until nothing remains but digital noise. The image is churned, the surface broken apart, and its traces stored as math tucked away in some vector space. It all seems very tidy, technical, and precise, if you treat the image as data. But to say so requires us to agree that the structures and patterns of the crying child in the archive — the shape of the child’s body, the details of the wrinkled skin around the child’s mouth — are somehow distinct from the meaning of the image. 
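That step-by-step dissolution has a precise mechanical form in standard diffusion models: a fixed “noising” schedule mixes a little more Gaussian noise into the image at every step until no signal survives. The following is a minimal sketch in Python with NumPy; the linear schedule and its endpoint values are common defaults from the diffusion-model literature, not those of any particular system, and the function names are my own.

```python
import numpy as np

def forward_diffusion_signal(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Fraction of the original image's signal variance that survives
    after each step of a linear forward-diffusion noise schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    # alpha_bar_t = product of (1 - beta) up to step t
    return np.cumprod(1.0 - betas)

def noised_image(image, t, alphas_cumprod, rng):
    """Closed-form sample of the image after t noising steps:
    x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * noise."""
    noise = rng.standard_normal(image.shape)
    a_bar = alphas_cumprod[t]
    return np.sqrt(a_bar) * image + np.sqrt(1.0 - a_bar) * noise

a_bar = forward_diffusion_signal()
rng = np.random.default_rng(0)
image = rng.uniform(-1.0, 1.0, size=(8, 8))  # stand-in for a real photograph
early = noised_image(image, 10, a_bar, rng)   # nearly intact
late = noised_image(image, 999, a_bar, rng)   # almost pure noise
```

By the final step the surviving-signal fraction is effectively zero: the surface of the photograph has been dissolved entirely, and what the model retains are only the statistical regularities it learned while watching that dissolution.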

By diffusing these images into an AI model, and pairing existing text labels to them within the model, we extend the reach of Danish colonial power over the image. For centuries, archives have organized collections into assemblages shaped and informed by the vision of those with power over those whose power is held back. The colonizing eye sets the crying child into the category of amusements, where it lingers until unearthed and questioned.

If these images are diffused into new images — untraceable images, images that claim to be without context or lineage — how do we uncover the way that this power is wielded and infused into the datasets, the models, and the images ultimately produced by the assemblage? What obligations linger beneath the surfaces of things? 

Every Archive a Collage

Collage can be one path for people to access these images and evaluate their historical context. The human collage maker, the remixer, can assess and determine the appropriateness of the image for whatever use they have in mind. This can be an exercise of power, too, and it ought to be handled consciously. It has featured as a tool of Situationist détournement, a means of taking images from advertising and propaganda to reveal their contradictions and agendas. These are direct confrontations, artistic gestures that undermine the organization of the world that images impose on our sense of things. The collage can be used to exert power or to challenge the status quo.

Every archive is a collage, a way of asserting that there is a place for things within an emergent or imposed structure. The scholar and artist Beth Coleman’s work points to the reversal of this relationship, citing W.E.B. Du Bois’ exhibition at the 1900 Paris Exposition. M. Murphy writes,

“Du Bois’s use of [photographic] evidence disrupted racial kinds rather than ordered them … Du Bois’s exhibition was crucially not an exhibit of ‘facts’ and ‘data’ that made black people in Georgia knowable to study, but rather a portrait in variation and difference so antagonistic to racist sociology as to dislodge race as a coherent object of study” (71).

The imposed structures of algorithmically generated images rely on facts and data, defined a certain way. They struggle with context and difference. The images these tools produce are constrained to the central tendencies of the data they were trained on, making this an inherently conformist technology.

To challenge these central tendencies means engaging with the structures the model imposes on this data, and critiquing this churn of images into data to begin with. Matthew Fuller and Eyal Weizman describe “hyper-aesthetic” images as not merely “part of a symbolic regime of representation, but actual traces and residues of material relations and of mediatic structures assembled to elicit them” (80).

Consider the stereoscope. Once the most popular means of accessing photographs, it relied on a trick of the eye, akin to the use of 3D glasses. A stereogram combined two photographs of the same scene, taken from vantage points slightly to the left and right of one another. When viewed through a special device, the eyes superimpose them, and the overlap creates the illusion of physical depth in a flat plane. We can find examples of these on Flickr (including from the Danish Film Museum) and in the Library of Congress’ Stereograph collection.

The time period in which this technology was popular happened to overlap with an era of brutal colonization, and the archival artifacts of this era contain traces of how images projected power. 

I was struck by stereoscopic images of American imperialism in the Philippines during the US occupation, which began in 1899. These cards aimed to “bring to life” images of Filipino men dying in fields and other scenes of war, using the spectacle of the stereoscopic image as a mechanism for propaganda. They were circulated as novelties to Americans on the mainland, a way of asserting a gaze of dominance over the occupied.

In the long American tradition of infotainment, the stereogram fused a novel technological spectacle with the effort to assert military might, paired with captions describing the US cause as just and noble while severely diminishing the numbers of civilian casualties. In Body Parts of Empire: Visual Abjection, Filipino Images, and the American Archive, Nerissa Balce writes that

“The popularity of war photographs, stereoscope viewers, and illustrated journals can be read as the public’s support for American expansion. It can also be read as the fascination for what were then new imperial ‘technologies of vision’” (52).

The link between the stereogram as a style of image and the gaze of colonizing power is now deeply entrenched in the vector spaces of image synthesis systems. Prompt Midjourney for the style of a stereogram, and this history haunts the images it returns. Many images prompted with “Stereograms, 1900” do not even render the expected, highly formulaic structure of a stereogram (two near-identical images, side by side, at a slight angle). They do, however, conjure images of those occupied lands. We see a visual echo of the colonizing gaze.

Images produced for the more generally used “stereoview,” even without the use of a date, still gravitate to a similar visual language. With “stereoview,” we are given the technical specifics of the medium. The content is more abstract: people are missing, but strongly suggested. These perhaps get me closest to the idea of a “haunted” image: a scene which suggests a history that I cannot directly access.

Perhaps there are two kinds of absences embedded in these systems: the people the colonizers wanted to erase, and the evidence of the colonizers themselves. Crucially, this gaze haunts these images.

Here are four sets of two pairs.

These styles are embedded into the prompt for the technology of image capture, the stereogram. The source material is inscribed with the gaze that controlled this apparatus. The method of that inscription — the stereogram — inscribes this material into the present images. The history is loaded into the keyword and its neighboring associations in the vector space. History becomes part of the churn. These are new old images, built from the associations of a single word (stereoview) and its messy surroundings.

It’s important to remember that the images above are not documents of historical places or events. They’re “hallucinations,” that is, they are a sample of images from a spectrum of possible images that exists at the intersection of every image labeled “stereoview.” But “stereoview” as a category does not isolate the technology from how it was used. The technology of the stereogram, or the stereoviewer, was deeply integrated into regimes of war, racial hierarchies, and power. The gaze, and the subject, are both aggregated, diffused, and made to emerge through the churning of the model.

Technologies of Flattening

The stereoview and the diffusion model are both technologies of spectacle, and the affordance of power to those who control them is similar. They are technologies for flattening, containing, and re-contextualizing the world into a specific order. For viewers, the generated image is never merely the surfaces of photography churned into new, abstract forms that resemble our prompts. It is an activation of the model’s symbolic regime, which is derived from the corpus of images because the model has the power to isolate images from their meaning.

AI has the power of finance, which enables computational resources that make obtaining 5 billion images for a dataset possible, regardless of its impact on local environments. It has the resources to train these images; the resources to recruit underpaid labor to annotate and sort these images. The critiques of AI infrastructure are numerous.

I am most interested here in one form of power that is the most invisible, which is the power of naturalizing and imposing an order of meaning through diffused imagery. The machine controls the way language becomes images. At the same time, it renders historical documentation meaningless — we can generate all kinds of historical footage now.

These images are reminders of the ways data colonialism has become embedded within not merely image generation but the infrastructures of machine learning. The scholar Tiara Roxanne has been investigating the haunting of AI systems long before me. In 2022 Roxanne noted that,

“in data colonialism, forms of technological hauntings are are experienced when Indigenous peoples are marked as ‘other,’ and remain unseen and unacknowledged. In this way, Indigenous peoples, as circumscribed through the fundamental settler-colonial structures built within machine learning systems, are haunted and confronted by this external technological force. Here, technology performs as a colonial ghost, one that continues to harm and violate Indigenous perspectives, voices, and overall identities” (49).

AI can ignore “the traces and residues of material relations” (Fuller and Weizman) as it reduces the image to its surfaces instead of the constellations of power that structured the original material. These images are the product of imbalances of power in the archive, and whatever interests those archives protected are now protected by an impenetrable, uncontestable, automated set of decisions steered by the past.

The Abstracted Colonial Subject

What we see in the above images are an inscription by association. The generated image, as a type of machine learning system, matters not only because of how it structures history into the present. It matters because it is a visualization that reaches to something far greater about automated decision making and the power it exerts over others. 

These striations of power in the archive or museum, in the census or the polling data, in the medical records or the migration records, determine what we see and what we do not. What we see in generated images must contort itself around what has been excluded from the archives. What is visible is shaped by the invisible. In the real world, this can manifest as families living on a street serving as an indication of those who could not live on that street. It could be that loans granted by an algorithmic assessment always contain an echo of loans that were not approved. 

The synthetic image visualizes these traces. They churn the surfaces, not the tangled reality beneath them. The images that emerge are glossy, professional, saturated. Hiding behind these products by and for the attention economy is the world of the not-seen. What are our obligations as viewers to the surfaces we churn when we prompt an image model? How do we reconcile our knowledge of context and history with the algorithmic detachment of these automated remixes?

The media scholar Roland Meyer writes that,

“[s]omewhere in the training data that feeds these models are photographs of real people, real places, and real events that have somehow, if only statistically, found their way into the image we are looking at. Historical reality is fundamentally absent from these images, but it haunts them nonetheless.”

In a seance, you raise spirits you have no right to speak to. The folly of it is the subject of countless warnings in stories, songs and folklore. 

What if we took the prompt so seriously? What if typing words to trigger an image was treated as a means of summoning a hidden and unsettled history? Because that is what the prompt does. It agitates the archives. Sometimes, by accident, it surfaces something many would not care to see. Boldly — knowing that I am acting from a place of privilege, and power, I ask the system to return “the abstracted colonial subject of photography.” I know I am conjuring something I should not be. 

My words are transmitted into the model within a data center, where they flow through a set of vectors, the in-between state of thousands of photographs. My words are broken apart into key words — “abstracted, colonial, colonial subject, subject, photography.” These are further sliced into numerical tokens to represent the mathematical coordinates of these ideas within the model. From there, these coordinates offer points of cohesion which are applied to find an image within a jpg of digital static. The machine removes the noise toward an image that exists in the overlapping space of these vectors.

Avery Gordon, whose book Ghostly Matters is a rich source of thinking for this research, writes:

“… if there is one thing to be learned from the investigation of ghostly matters, it is that you cannot encounter this kind of disappearance as a grand historical fact, as a mass of data adding up to an event, marking itself in straight empty time, settling the ground for a future cleansed of its spirit” (63).

If history is present in the archives, the images churned from the archive disrupt our access to the flow of history. It prevents us from relating to the image with empathy, because there is no single human behind the image or within it. It’s the abstracted colonial gaze of power applied as a styling tool. It’s a mass of data claiming to be history.

Human and Mechanical Readings

I hope you will indulge me as my eye wanders through the resulting image.

I am struck by the glossiness of it. Midjourney is fine-tuned toward an aesthetic dataset, leaning into images found visually appealing based on human feedback. I note the presence of palm trees, which brings me to the Caribbean Islands of St. Croix where The Crying Child photograph was taken. I see the presence of barbed wire, a signifier of a colonial presence.

The image is a double exposure. It reminds me of spirit photography, in which so-called psychic photographers would surreptitiously photograph a ghostly puppet before photographing a client. The image of the “ghost” was superimposed on the film to emerge in the resulting photo. These are associations that come to my mind as I glance at this image. I also wonder about what I don’t know how to read: the style of the dress, the patterns it contains, the haircut, the particulars of vegetation.

We can also look at the image as a machine does. Midjourney’s describe feature will tell us what words might create an image we show it. If I use it with the images it produces, it offers a kind of mirror-world insight into the relationship between the words I’ve used to summon that image and the categories of images from which it was drawn.

To be clear, both “readings” offer a loose, intuitive methodology, keeping in the spirit of the seance — a Ouija board of pixel values and text descriptors. They are a way in to the subject matter, offering paths for more rigorous documentation: multiple images for the same prompt, evaluated together to identify patterns and the prevalence of those patterns. That reveals something about the vector space. 

Here, I just want to see something, to compare the image as I see it to what the machine “sees.”

The image returned for the abstract colonial subject of photography is described by Midjourney this way: 

“There is a man standing in a field of tall grass, inverted colors, tropical style, female image in shadow, portrait of bald, azure and red tones, palms, double exposure effect, afrofuturist, camouflage made of love, in style of kar wai wong, red and teal color scheme, symmetrical realistic, yellow infrared, blurred and dreamy illustration.”

My words produced an image, and then those words disappeared from the image that was produced. “Colonized Subject” is adjacent to the words the machine does see: “tall grass,” “afrofuturism,” “tropical.” Other descriptions recur as I prompt the model over and over again to describe this image, such as “Indian.” I have to imagine that this idea of colonized subjects “haunts” these keywords. The idea of the colonial subject is recognized by the system, but shuffled off to nearest synonyms and euphemisms. Might this be a technical infrastructure through which the images are haunted? Could certain patterns of images be linked through unacknowledged, invisible categories the machine can only indirectly acknowledge? 

I can only speculate. That’s the trouble with hauntings. It’s the limit to drawing any conclusions from these observations. But I would draw the reader’s attention to an important distinction between my actions as a collage artist and the images made by Midjourney. The image will be interpreted by many of us, who will find different ways to see it, and a human artist may put those meanings into adjacency through conscious decisions. But to create this image, we rely solely on a tool for automated churning.

We often describe the power of images in terms of what impact an image can have on the world. Less often we discuss the power that impacts the image: the power to structure and give the image form, to pose or arrange photographic subjects. 

Every person interprets an image in different ways. A machine makes images for every person from a fixed set of coordinates, its variety constrained by the borders of its data. That concentrates power over images into the unknown coordination of a black box system. How might we intervene and challenge that power?  

The Indifferent Archivist 

We have no business of conjuring ghosts if we don’t know how to speak to them. As a collage artist, “remixing” in 2016 meant creating new arrangements from old materials, suggesting new interpretations of archival images. I was able to step aside — as a white man in California, I would never use the images of colonized people for something as benign as “expressing myself.” I would know that I could not speak to that history. Best to leave that power to shift meanings and shape new narratives to those who could speak to it. Nonetheless, it is a power that can be wielded by those who have no rights to it.  

Yes, by moving any accessible image from the online archive and transmuting it into training data, diffusion models assert this same power. But it is incapable of historic acknowledgement or obligation. The narratives of the source materials are blocked from view, in service to a technically embedded narrative that images are merely their surfaces and that surfaces are malleable. At its heart is the idea that the context of these images can be stripped and reduced into a molding clay, for anyone’s hands to shape to their own liking. 

What matters is the power to determine the relationships our images have with the systems that include or exclude. It’s about the power to choose what becomes documented, and on what terms. Through directed attention, we may be able to work through the meanings of these gaps and traces. It is a useful antidote to the inattention of automated generalizations. To greet the ghosts in these archives presents an opportunity to intervene on behalf of complexity, nuance, and care.

That is literal meaning of curation, at its Latin root: “curare,” to care. In this light, there is no such thing as automated curation.

Reclaiming Traceability

In 2021, Magda Tyzlik-Carver wrote “the practice of curating data is also an epistemological practice that needs interventions to consider futures, but also account for the past. This can be done by asking where data comes from. The task in curating data is to reclaim their traceability and to account for their lineage.”

When I started the “Ghost Stays in the Picture” research project, I intended to make linkages between the images produced by these systems and the categories within their training data. It would be a means of surfacing the power embedded into the source of this algorithmic churning within the vector space. I had hoped to highlight and respond to these algorithmic imaginaries by revealing the technical apparatus beneath the surface of generated images. 

In 2024, no mainstream image generation tool offers the access necessary for us to gather any insights into its curatorial patterns. The image dataset I initially worked with for this project is gone. Images of power and domination were the reason — specifically, the Stanford Internet Observatory’s discovery of more than 3,000 images in the LAION 5B dataset depicting abused children. Realizing this, the churn of images became visceral, in the pit of my stomach. The traces of those images, the pain of any person in the dataset, lingers in the models. Perhaps imperceptibly, they shape the structures and patterns of the images I see.

In gathering these images, there was no right to refuse, no intervention of care. Ghosts, Odumosu writes, “make their presences felt, precisely in those moments when the organizing structure has ruptured a caretaking contract; when the crime has not been sufficiently named or borne witness to; when someone is not paying attention” (S299). 

The training of Generative Artificial Intelligence systems has relied upon the power to automate indifference. And if synthetic images are structured in this way, it is merely a visualization of how “artificial intelligence systems” structure the material world when carelessly deployed in other contexts. The synthetic image offers us a glimpse of what that world would look like, if only we would look critically at the structures that inform its spectacle. If we can read algorithmic decision-making a lapse in care, a disintegration of accountability, we might see fresh pavement has been poured onto sacred land. 

This regime of Artificial Intelligence is not an inevitability. It is not even a single ideology. It is a computer system, and computer systems, and norms of interaction and participation with those systems, are malleable. Even with training datasets locked away behind corporate walls, it might still be possible “to insist on care where there has historically been none” (Odumosu S297), and by extension, to identify and refuse the automated inscription of the colonizing ghost.

 

This post concludes my research work at the Flickr Foundation, but I am eager to continue it. I am seeking publishers of art books, or curators for art or photographic exhibitions, who may be interested in a longer set of essays or a curatorial project that explores this methodology for reading AI generated images. If you’re interested, please reach out to me directly: eryk.salvaggio@gmail.com.

Introducing Eliza Gregory, research partner

Eliza Gregory is a social practice artist, a photographer, an educator and a writer.

Research is a key facet of the Flickr Foundation’s work. We are gathering a group of intersectional researcher partners to question the idea of a 21st century image archive together, and Eliza is one of them.

Who ARE you, Eliza?

My name is Eliza Gregory. I’m a mom of two daughters, a wife/partner, a photographer, a social practice artist, a curator, and an educator. I like cake and noodles and I keep chickens. I have issues with chronic clutter. I am getting more and more interested in plants. This might be the result of middle age, or it might be related to feeling like connecting with plants is the roadmap back from total social and environmental collapse. Or both.

For about ten years I made work about cultural identity and cultural adaptation through a mixture of large format portraiture, interviews, events and relationships. Those projects focused on resettled refugee households in Phoenix, Arizona; mapping the wide array of Australian cultural identities (indigenous, recent-immigrant, and long-time-ago-immigrant; cultural identity tied to gender and sexuality, etc.) in the neighborhood where I lived in Melbourne; and immigration to the Bay Area in California over the last 40+ years.

More recently, I curated a show called Photography & Tenderness that investigates how we can hold photography accountable for the ways in which it has been used to build a racist society and somehow still use it to make something tender. That took place at Wave Pool Art Fulfillment Center in Cincinnati, OH as part of the Cincinnati FotoFocus 2022 Biennial.

And I’ve been working on a project I call [Placeholder], about holding and being held by place. It investigates relationships between people and land and asks what might happen if we acknowledged the fundamental rupture that has occurred between land and people, and began working to repair it. So far I’m mainly in the research phase of that work, but my research has taken place with my students at Sacramento State University, and with other artists, and I’ve pulled together two different exhibitions to invite audiences into that research at Axis Gallery, Sacramento, CA: [Placeholder] a studio visit with Eliza Gregory and [Placeholder]: florilegia.

 

I started out my career trained as a fine art photographer and a creative writer. I have always been interested in telling stories with pictures, but as soon as I tried my hand at it I got caught up in questions about the ethical implications of making an object about (i.e. objectifying) another person. I started to solve those problems by building out relationships and project structures that relied on exchange and accountability, and then went to grad school in Art & Social Practice at Portland State University. That program was a revelation for me and really provided the tools and the language I needed to keep building out my work in a way that felt good. In my experience, the dialogue around social practice is much more radical and useful and socially critical than the dialogue around photography, so I’ve really leaned into that space. But I still enjoy pictures and appreciate how powerful they can be.

Flickr is an interesting organization because it hosts a lot of pictures, but it also catalyzes a lot of relationships and interactions around those pictures. So Flickr represents an institution based around social practice and photography, in a certain way.

Why did you join as a research partner at flickr.org?

What is the relationship between justice and photographic representation? That is a question I think about a lot.

The human brain likes to simplify things. It’s how we are able to perceive so much and yet still focus on a single task or idea. And it’s why we take something like a human being, with a whole life full of perceptions and feelings and paradoxes, and reduce them to a single descriptor–child. American. Woman. White. Cis-gendered. Hetero. Middle aged. Tall. Pink. (I had someone I was photographing once tell me I was “big and pink.” And…I couldn’t argue.) Or we take an individual from another species, who has a whole life full of specific experiences, and reduce it to just the species name: rat. Grey squirrel. Monarch. Or even more reductively: Tree. Butterfly.

Photographs basically do the same thing. You take a whole moment filled with a million different feelings, thoughts, respirations, scents, sensations, views and reduce it to one small, flat, rectangle. And we call that a picture. And we equate it with “truth.”

That’s a problematic process, based on a problematic (though necessary and useful) human tendency. It’s inherently reductive. And yet we see it as a mechanism for communication, inquiry and learning. Photography can be a mechanism for those things, certainly. I used it for that purpose in a project called Massive Urban Change, where I photographed a dynamic urban environment that you can never fully take in SO that it would hold still; so that you could look at it more closely. But that reductive quality of photography can be used for radically different ends. It has also been a tool for building racist societies; for creating and cementing stereotypes; for mapping natural resources for extraction and destruction. Sometimes photography obfuscates truly important complexities by reducing things too much.

A lot of my work has been about interrogating the process of making photographs, especially of people (and now of places) to try to understand when photography is doing what we like to tell ourselves it’s doing, and when it’s doing something else.

I want to know, how do photographs shape the stories we tell ourselves, and how do those stories, in turn, shape society?

Thinking about Flickr is a way of approaching some of these questions. And thinking about how to conserve Flickr adds a whole new dimension to them.  I wanted to work with the Flickr Foundation mostly because I like the people it is bringing together–there is so much work going on around archiving images and cataloging images and reading images and finding certain images that goes beyond what I know as a maker of images. I love getting to be at the table with people who work on photography from such different angles. It helps blast me out of my normal frame of reference.

I also want to be bringing my students into photographic dialogues that are larger than our classroom. The Flickr Foundation is actively thinking about how to intersect with students and curriculum design. I want to create opportunities for my students to do meaningful work, and I see the Flickr Foundation as a partner in that.

Finally, I really love exhibitions. In some ways, exhibitions seem to be heading toward obsolescence, much like museums themselves. Both those structures are built on gatekeeping, colonial hierarchies, and a top-down, hierarchical flow of knowledge. So in the social practice dialogues I am a part of, sometimes the exhibition as a form feels sort of passé. But I love it as a way of creating experiences for people, of shaping or catalyzing dialogues, of giving people a gift. And the Flickr Foundation feels like a partner that I could potentially build visual experiences (exhibitions!) with.

What do you think will be the hardest parts of achieving its 100-year plan?

The questions around how to conserve digital material for a hundred years are HARD. That’s what I learned from bringing some of those questions to a group of senior photography students at Sacramento State University this fall. George has been delivering a 100 year plan workshop to various groups, and we conducted a version of that experience with my students. It’s basically asking people to think about what digital images will look like, consist of, and be viewed through in 100 years. As well as, What will it take to preserve a digital image we have now for that long? And how do you build an organization that can do that?

George had us start with finding an image of a place that’s meaningful to us, and then going out and trying to find the oldest photograph we can of that same place. Right away, that activity makes you think about how we view places, and what photographs we have access to, and what places we have access to visually. I once asked a group of photo history students, What is a photograph you wish you could see that’s impossible to make? A really surprising number of them said, “I wish I could see a picture of the pyramids being constructed!” That feels like a complementary mind-exercise to me, because we are so used to being able to see anything and everything we want in pictures. It’s important to remember that they haven’t always existed. And to contemplate what is un-photographable.

Then my students and I struggled to project our imaginations even into the near future to anticipate how technology will change, how behaviors will change around technology (both as it currently exists, and in terms of platforms and processes that haven’t been invented yet), and what it will mean to actually translate a jpg into multiple new file formats without losing whatever data make it a recognizable image in the first place.

Everything about this seems hard to me. The only things I’ve been able to hang on to so far, and visualize, are some of the foundation’s ideas around ritual—perhaps there will be a ritualized translation from one format to another every five or ten years. The idea that conserving something by allowing it to change feels very resonant—perhaps that is a shift in perspective that we are approaching on many fronts at once, from interpersonal relations (growth mindset!) to global ecology (I’m thinking of Anna Tsing’s book The Mushroom at the End of the World).

The scale is also difficult to fathom. 50 billion images is…so many images. And the collection is likely to grow. So the usual questions around archives are present too—what do we keep? What do we throw away? How does someone access the resource? How does someone FIND what they are looking for? (And along the way can we help them maybe find a few things they aren’t looking for but need or want to see?)

At the end we made zines to try to pull our thoughts together.

How do you hope to use the partnership to further your own research?

In my current artistic work, I research intergenerational narratives—both because inserting ourselves into them in families leads to improved mental health and in terms of how thinking about intergenerational narratives shifts our understanding of stewardship of the land that cares for us—and I’m a photographer. So the question, How do we approach the conservation of digital images for future generations? relates to HOW we are going to tell those intergenerational stories. I think that some of the long-term storytelling strategies we’ve lost track of or never understood within British-influenced contemporary American colonist culture—such as oral history and land-based, place-based knowledge—are tools we might turn to. But right now we are so image-obsessed that pictures will be in the mix too, and they might be the bridge that gets us to new (or old!) styles of connection, communication and storytelling.

Eliza Gregory is an artist and educator. She makes complex projects that unfold over time to reveal compassion, insight and new social forms.
www.elizagregory.org

With apologies to Eliza for leaving it so long to post this! ❤️
– George

Data Lifeboat Update 3

March has been productive. The short version is it’s complicated but we’re exploring happily, and adjusting the scope in small ways to help simplify it. Let me summarise the main things we did this month.

Legal workshop

We welcomed two of our advisors—Neil from the Bodleian and Andrea from GLAM e-Lab—to our HQ to get into the nitty gritty of what a 50-year-old Data Lifeboat needs to accommodate. 

As we began the conversation, I centred us in the C.A.R.E. Principles and asked that we always keep them in our sights for this work. The main future challenges are settling around the questions of how identity and the right to be forgotten must be expressed, how Flickr account holders can or should be identified, and whether an external name resolver service of some kind could help us. We think we should develop policies for Flickr members (on consent to be in a Data Lifeboat), Data Lifeboat creators (on their obligations as creators), and Dock Operators (an operations manual & obligations for operating a dock). It’s possible there will also be some challenges ahead around database rights, but we don’t know enough yet to give a good update. We’d like a first-take legal framework of the Data Lifeboat system to be an outcome of these first six months.

Privacy & licensing

These are key concepts central to Flickr—privacy and licensing—and we must make sure we do our utmost to respect them in all our work. It would be irresponsible for us to jettison the desires encoded in those settings for our convenience, tempting though that may be. By that I mean, it would be easier for us to make Data Lifeboats that contained whatever photos from whomever, but we must respect the desires of Flickr creators in the creation process. 

There are still big and unanswered questions about consent, and how we get millions of Flickr members to agree to participate and give permission to allow their pictures to be put in other people’s Data Lifeboats. 

Extending the prototype Data Lifeboat sets 

Initially, we had planned to run this 6-month prototype stage with just one test set of images, which would be some or all of the Flickr Commons photographs. But in order to explore the challenges around privacy and licensing, we’ve decided to expand our set of working prototypes to also include the entire Library of Congress Flickr Commons account, and all the photos tagged with “flickrhq” (since that set is something the Flickr Foundation may decide to collect for its own archive and contains photographs from different Flickr members who also happen to have been Flickr staff and would therefore (theoretically) be more sympathetic to the consent question).

Visit to Greenwich

Ewa spotted that there was an exhibition of ambrotype photographic portraits of women in the RNLI at the Maritime Museum in Greenwich at the moment, so we decided to take a day trip to see the portraits and poke around the brilliant museum. We ended up taking a boat from Greenwich to Battersea which was a nice way to experience the Thames (and check out that boat’s life saving capabilities).

Day Out: Maritime Museum & Lifeboats

Day Out: Maritime Museum & Lifeboats

The Data Lifeboat creation process

I found myself needing to start sketching out what it could look like to actually create a Data Lifeboat, and particularly not via a command line, so we spent a while in front of a whiteboard kicking that off. 

At this point, we’re imagining a few key steps:

  1. The Query – “I want these photos” – is like a search. We could borrow from our existing Flinumeratr toy.
  2. The Results – Show the images, some metadata. But it’s hard to show information about the set in aggregate at this stage, e.g., how many of the contents are licensed in which way. This could form a manifest for the Data Lifeboat..
  3. Agreement – We think there’s a need for the Data Lifeboat creator to agree to certain terms. Simple, active language that echoes the CARE principles, API ToS, and Flickr Community Guidelines. We think this should also be included in the Data Lifeboat it’s connected with.
  4. README / Note to the Future – we love the idea that the Data Lifeboat creator could add a descriptive narrative at this point, about why they are making this lifeboat, and for whom, but we recognised that this may not get done at all, especially if it’s too complicated or time-consuming. This is also a good spot to describe or configure warnings, timers, or other conditions needed for future access. Thanks also to two of our other advisors – Commons members Mary Grace and Alan – who shared with us their organisation’s policies on acquisitions for reference.
  5. Packaging – This would be asynchronous and invisible to the creator; downloading everything in the background. We realised it could take days, especially if there are lots of Data Lifeboats being made at once.
  6. Ready! – The Data Lifeboat creator gets a note somehow about the Data Lifeboat being ready for download. We may need to consider keeping it available only for a short time(?).

Creation Schematic, 19th March

Emergency v Non-Emergency 

We keep coming up against this… 

The original concept of the Data Lifeboat is a response to the near-death experience that Flickr had in 2017 when its then-owner, Verizon/Yahoo, almost decided to vaporise it because they deemed it too expensive to sell (something known as “the cost of economic divestment”). So, in the event of that kind of emergency, we’d want to try to save as much of this unique collection as possible as quickly as possible, so we’d need a million lifeboats full of pictures created more or less simultaneously or certainly in a relatively short period of time. 

In the early days of this work, Alex said that the pressure of this kind of emergency would be the equivalent of being “hugged to death by the archivists,” as we all try—in very caring and responsible ways—to save as much as we can. And then there’s the bazillion-emergency-hits-to-the-API-connection problem—aka the “Thundering Herd” problem—which we do not yet have a solution for, and which is very likely to affect any other social media platforms that may also be curious to explore this concept.
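The “Thundering Herd” is a well-known systems problem, and one common generic mitigation (illustrative only; nothing here is an adopted Flickr design) is exponential backoff with random jitter, so that millions of clients retrying against the API don’t do so in lockstep:

```python
import random

def backoff_delays(attempts, base=1.0, cap=60.0):
    """Full-jitter exponential backoff: retry number a waits a random time
    in [0, min(cap, base * 2**a)] seconds, decorrelating the herd."""
    return [random.uniform(0, min(cap, base * 2 ** a)) for a in range(attempts)]

# Each simulated client would sleep for these durations between retries.
for i, delay in enumerate(backoff_delays(5)):
    print(f"retry {i}: wait up to {delay:.2f}s")
```

Spreading retries like this doesn’t reduce the total work, but it flattens the spike of simultaneous requests that would otherwise hit the API at the moment of an emergency.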

We’re connecting with the Flickr.com team to start discussing how to address this challenge. We’re beginning to think about how emergency selection might work, as well as the present and future challenges of establishing the identity of photo subjects and account owners. The millions of lifeboats that would be created would surely need the support of the company to launch if they’re ever needed.

This work is supported by the National Endowment for the Humanities.

NEH logo

Welcome, Jenn!

Meet the Foundation’s first ever Research Fellow!

It is with great pleasure that I introduce you to the Flickr Foundation’s inaugural research fellow, Jenn. In her own words…

Hi, I’m Jenn Phillips-Bacher, the Flickr Foundation’s first-ever Research Fellow. I’ve been a Flickr user since 2007, when my first public photos were taken on a point-and-shoot digital camera. Oh, how the quality of photos has improved since then! It’s an absolute marvel to be able to trawl decades’ worth of (ever-improving) photography, still all in one place.

Before joining Flickr Foundation, I was most recently a Product Manager at Wellcome Collection, working to make its library and archive collections accessible to as many people as possible. I’ve also recently been a content strategist at the UK’s Government Digital Service where I focussed on tagging and taxonomies to help people find stuff. I’ve also been a web editor, project manager, reference librarian and technology trainer, all within the GLAM (that’s galleries, libraries, archives and museums) world.

My modus operandi for the 20+ years of my career has been to 1) find interesting work to do with kind people and 2) labor for the public good. That’s why I am delighted and honored to be part of Flickr Foundation’s efforts to preserve and sustain our digital heritage.

So what does it mean to be a research fellow?

Given my career history, I’d never considered that I could be a Research Fellow. I used to think research fellowships were reserved for academics (“real” researchers), which I resolutely am not. I’m still figuring out what it does mean to be a research fellow, but here’s where I’ve settled for now: a research fellowship allows me to take time out of normal life for learning and thinking while offering a practical benefit to the Flickr Foundation. That means I’ll use my research skills honed as a librarian and product manager to seek out existing knowledge and expertise, connecting the dots along the way, in order to help shape the Flickr Foundation’s work.

As the fellowship progresses, I’ll write more about what it’s like to move from a digital practitioner role into a Research Fellow role.

My research focus

My research is aimed at the Content Mobility program where I’m specifically interested in how we might design a Data Lifeboat. Not only the logistics of creating a portable archive of any facet of Flickr, but also how to plan for a digital collection’s ‘good ending’. I’ve always been interested in the idea of digital weeding—removing digital collections that no longer serve their purpose, as librarians do with physical materials. As we become more aware of the environmental impact of any digital activity, including online access and long-term preservation, we need to be even more intentional with what we save and what we let go.

As a complementary bit of research, I’ll be digging into the carbon costs of digital collections. I’m curious to see whether there’s something useful to do here that would help the GLAM sector make carbon-conscious digital collection decisions. (If you or anyone you know is already doing this work, I’d love to meet you/them!)

What else? When not working, I can be found nosing around galleries and museums and perambulating around cities in search of human-friendly architecture and good cafes. And like anyone who’s ever lived in Chicago, I have Opinions on hot dogs.

Superdawg drive-in

Photo by jordanfischer, CC BY 2.0.

When Past Meets Predictive: An interview with the curators of ‘A Generated Family of Man’

by Tori McKenna, Oxford Internet Institute

Design students Juwon Jung and Maya Osaka, the inaugural cohort of Flickr Foundation’s New Curators program, embarked on a journey exploring what happens when you interface synthetic image production with historic archives.

This blog post marks the release of Flickr Foundation’s A Generated Family of Man, the third iteration in a series of reinterpretations of the 1955 MoMA photography exhibition, The Family of Man.

Capturing the reflections, sentiments and future implications raised by Jung and Osaka, these working ‘field notes’ function as a snapshot in time of where we stand as users, creators and curators facing computed image generation. At a time when Artificial Intelligence and Large Language Models are still in their infancy, yet have been recently made widely accessible to internet users, this experiment is by no means an exhaustive analysis of the current state of play. However, by focusing on a single use-case, Edward Steichen’s The Family of Man, Jung and Osaka were able to reflect in greater detail and specificity over a smaller selection of images — and the resultant impact of image generation on this collection.

Observations from this experiment are phrased as a series of conversations, or ‘interfaces’ with the ‘machine’.

Interface 1: ‘That’s not what I meant’

If the aim of image generation is verisimilitude, the first thing to remark upon when feeding captions into image generation tools is that there are often significant discrepancies and deviations from the original photographs. AI produces images based on most-likely scenarios, and it became evident from certain visual elements that the generator was ‘filling in’ what the machine ‘expects’. For example, when replicating the photograph of an Austrian family eating a meal, the image generator resorted to stock food and dress types. In order to gain greater accuracy, as Jung explained, “we needed to find key terms that might ‘trick’ the algorithm”. These included supplementing with descriptive prompts of details (e.g. ‘eating from a communal bowl in the centre of the table’), as well as more subjective categories gleaned from the curators’ interpretations of the images (’working-class’, ‘attractive’, ‘melancholic’). As Osaka remarked, “the human voice in this process is absolutely necessary”. This constitutes talking with the algorithm, a back-and-forth dialogue to produce true-to-life images, further centering the role of the prompt generator or curator.

This experiment was not about producing new fantasies, but about testing how well the generator could reproduce historical context or reinterpret archival imagery. Adding time-period prompts, such as “1940s-style”, results in approximations based on the narrow window of historical content within the image generator’s training set. “When they don’t have enough data from certain periods AI’s depiction can be skewed”, explains Jung. This risks reflecting or reinforcing biased or incomplete representations of the period at hand. When we consider that more images were produced in the last 20 years than in the previous 200, image generators have a far greater quarry to ‘mine’ from the contemporary period and, as we saw, often struggle with historical detail.

Key take-away:
Generated images of the past are only as good as their training bank of images, which themselves are very far from representative of historical accuracy. Therefore, we ought to develop a set of best practices for projects that seek communion between historic images or archives and generated content.

Interface 2: ‘I’m not trying to sell you anything’

In addition to synthetic image generation, Jung & Osaka also experimented with synthetic caption generation: deriving text from the original images of The Family of Man. The generated captions were far from objective or purely descriptive. As Osaka noted, “it became clear the majority of these tools were developed for content marketing and commercial usage”, with Jung adding, “there was a cheesy, Instagram-esque feel to the captions with the overuse of hashtags and emojis”. Not only was this outdated style instantly transparent and ‘eyeroll-inducing’ for savvy internet users, but in some unfortunate cases the generator wholly misrepresented the context. For Al Chang’s photo of a grief-stricken American soldier being comforted by his fellow troops in Korea, the image generator produced the following tone-deaf caption:

“Enjoying a peaceful afternoon with my best buddy 🐶💙 #dogsofinstagram #mananddog #bestfriendsforever” (there was no dog in the photograph).

When these “Instagram-esque” captions were fed back into image generation, naturally they produced overly positive, dreamy, aspirational images that lacked the ‘bite’ of the original photographs – thus creating a feedback loop of misrecognition and misunderstood sentiment.

The image and caption generators that Jung & Osaka selected were free services, in order to test what the ‘average user’ would most likely first encounter in synthetic production. This led to another consideration around the commercialism of such tools; as the internet adage goes, “if it’s free, you’re the product”. Using free AI services often means relinquishing input data, a fact that might be hidden in the fine print. “One of the dilemmas we were internally facing was ‘what is actually happening to these images when we upload them?’”, as Jung pondered, “are we actually handing these over to the generators’ future data-sets?”. “It felt a little disrespectful to the creator”, according to Osaka, “in some cases we used specific prompts that emulate the style of particular photographs. It’s a grey area, but perhaps this could even be an infringement on their intellectual property”.

Key take-away:
The majority of synthetic production tools are built with commercial uses in mind. If we presume there are very few ‘neutral’ services available, we must be conscious of data ownership and creator protection.

Interface 3: ‘I’m not really sure how I feel about this’

The experiment resulted in hundreds of synthetic guesses, which induced surprising feelings of guilt in the curators. “In a sense, I felt almost guilty about producing so many images”, reports Jung, with e-waste and resource-intensive processing power front of mind. “But we can also think about this another way”, Osaka continues, “the originals, being in their analogue form, were captured with such care and consideration. Even their selection for the exhibition was a painstaking, well-documented process”.

We might interpret this as simply a nostalgic longing for the finiteness of a bygone era, and our disillusionment at today’s easy, instant access. But perhaps there is something unique to synthetic generation here: the more steps the generator takes from the original image, the more degraded the original essence, or meaning, becomes. In this process, not only does the image get further from ‘truth’ in a representational sense, but also in terms of the original intention of the creator. If the underlying sense of warmth and cooperation in the original photographs disappears along the generated chain, is there a role for image generation in this context at all? “It often feels like something is missing”, concludes Jung, “at its best, synthetic image generation might be able to replicate moments from the past, but is this all that a photograph is and can be?”

Key take-away: Intention and sentiment are incredibly hard to reproduce synthetically. Human empathy must first be deployed to decipher the ‘purpose’ or background of the image. Naturally, human subjectivity will be input.

Our findings

Our journey into synthetic image generation underscores the indispensable role of human intervention. While the machine can be guided towards accuracy by the so-called ‘prompt generator’, human input is still required to flesh out context where the machine may be lacking in historic data.

At its present capacity, while image generation can approximate visual fidelity, it falters when it attempts to appropriate sentiment and meaning. Consider the uncanny distortions we see in so many of the images of A Generated Family of Man: monstrous fingers, blurred faces, and melting body parts are now so common to artificially generated images that they’ve become almost a genre in themselves. These appendages and synthetic ad-libs contravene our possible human identification with the image. This lack of empathic connection, the inability to bridge across the divide, is perhaps what feels so disquieting when we view synthetic images.

As we have seen when feeding these images into caption generators to ‘read’ the picture, only humans can reliably extract meaning from them. Trapped within this image-to-text-to-image feedback loop, as creators or viewers we’re ultimately left calling out to the machine: Once More, with Feeling!

We hope projects like this spark the flourishing of similar experiments among users of image generators who are critical and curious about the current state of artificial “intelligence”.

Find out more about A Generated Family of Man in our New Curators program area.

Making A Generated Family of Man: Revelations about Image Generators

Juwon Jung | Posted 29 September 2023

I’m Juwon, here at the Flickr Foundation for the summer this year. I’m doing a BA in Design at Goldsmiths. There’s more background on this work in the first blog post on this project that talks about the experimental stages of using AI image and caption generators.

“What would happen if we used AI image generators to recreate The Family of Man?”

When George first posed this question in our office back in June, we couldn’t really predict what we would encounter. Now that we’ve wrapped up this uncanny yet fascinating summer project, it’s time to make sense out of what we’ve discovered, learned, and struggled with as we tried to recreate this classic exhibition catalogue.

Bing Image Creator generates better imitations when humans write the directions

We used the Bing Image Creator throughout the project and now feel quite familiar with its strengths and weaknesses. There were a few instances where the Bing Image Creator produced photographs surprisingly similar to the originals when we wrote the captions, as can be seen below:

Here are the caption iterations we made for the image of the judge (shown above, on the right page of the book):

1st iteration:
A grainy black and white portrait shot taken in the 1950s of an old judge. He has light grey hair and bushy eyebrows and is wearing black judges robes and is looking diagonally past the camera with a glum expression. He is sat at a desk with several thick books that are open. He is holding a page open with one hand. In his other hand is a pen. 

2nd iteration:
A grainy black and white portrait shot taken in the 1950s of an old judge. His body is facing towards the camera and he has light grey hair that is short and he is clean shaven. He is wearing black judges robes and is looking diagonally past the camera with a glum expression. He is sat at a desk with several thick books that are open. 

3rd iteration:
A grainy black and white close up portrait taken in the 1950s of an old judge. His body is facing towards the camera and he has light grey hair that is short and he is clean shaven. He is wearing black judges robes and is looking diagonally past the camera with a glum expression. He is sat at a desk with several thick books that are open. 

Bing Image Creator is able to demonstrate such surprising capabilities only when the human user accurately directs it with sharp prompts. Since Bing Image Creator uses natural language processing to generate images, the ‘prompt’ is an essential component to image generation. 

Human description vs AI-generated interpretation

We can compare human-written captions to the AI-generated captions made by another tool we used, Image-to-Caption.io. Since the primary purpose of Image-to-Caption.io is to generate ‘engaging’ captions for social media content, the captions generated by this platform contained cheesy descriptors, hashtags, and emojis.

Using screenshots from the original catalogue, we fed images into that tool and watched as captions came out. This nonsensical response emerged for the same picture of the judge:

“In the enchanted realm of the forest, where imagination takes flight and even a humble stick becomes a magical wand. ✨🌳 #EnchantedForest #MagicalMoments #ImaginationUnleashed”

As a result, all of the images generated from AI captions looked like they were from the early Instagram era of around 2010: highly polished, with strong, vibrant color filters.

Here’s a selection of images generated using AI prompts from Image-to-Caption.io

Ethical implications of generated images?

As we compared all of these generated images, we naturally found ourselves wondering about the actual logic and datasets the generative algorithm was operating upon. There were also certain instances where the Bing Image Creator was unable to generate the correct ethnicity of the subject in the photograph, despite the prompt clearly specifying it (over the span of 4-5 iterations).

Here are some examples of ethnicity not being represented as directed: 

What’s under the hood of these technologies?

What does this really mean though? I wanted to know more about the relationship between these observations and the underlying technology of the image generators, so I looked into the DALL-E 2 model (which is used in Bing Image Creator). 

DALL-E 2 and most other image generation tools today use a diffusion model to generate a new image that conveys the same, or at least the most similar, semantic information as the input caption. In order to correctly match visual semantic information to the corresponding textual semantic information (e.g. matching the image of an apple to the word apple), these generative models are trained on large subsets of images and image descriptions found online.
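This text-image matching is often implemented by embedding both captions and images into a shared vector space and comparing them with cosine similarity (the approach used by OpenAI’s CLIP model, which guided DALL-E 2). A toy sketch with made-up vectors, purely to illustrate the comparison step:

```python
import math

def cosine_similarity(a, b):
    """Similarity of two embedding vectors: ~1.0 = same direction, ~0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-d embeddings; real models use hundreds of dimensions.
text_apple = [0.9, 0.1, 0.0]
image_apple = [0.8, 0.2, 0.1]
image_plane = [0.0, 0.1, 0.9]

print(cosine_similarity(text_apple, image_apple))  # high, close to 1
print(cosine_similarity(text_apple, image_plane))  # low, close to 0
```

If the training data mislabels images, as in the plane/car example below, the “apple” text embedding ends up nearest the wrong image embeddings, and the mismatch propagates into generation.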

OpenAI has admitted that the “technology is constantly evolving, and DALL-E 2 has limitations” in their informational video about DALL-E 2.

Such limitations include:

  • If the data used to train the model has been flawed and contains images that are incorrectly labeled, it may produce an image that doesn’t correspond to the text prompt. (e.g. if there are more images of a plane matched with the word car, the model can produce an image of a plane from the prompt ‘car’) 
  • The model may exhibit representational bias if it hasn’t been trained enough on a certain subject (e.g. producing an image of any kind of monkey rather than the species from the prompt ‘howler monkey’) 

From this brief research, I realized that these subtle errors from Bing Image Creator shouldn’t simply be overlooked. Whether or not Image Creator produces relatively more errors for certain prompts could signify that, in some instances, the generated images reflect the visual biases, stereotypes, or assumptions that exist in our world today.

A revealing experiment for our back cover

After having worked with very specific captions for hoped-for outcomes, we decided to zoom way out to create a back cover for our book. Instead of anything specific, we spent a short period after lunch one day experimenting with very general captioning to see the raw outputs. Since the theme of The Family of Man is the oneness of mankind and humanity, we tried entering the short words, “human,” “people,” and “human photo” in the Bing Image Creator.

These are the very general images returned to us: 

What do these shadowy, basic results really mean?
Is this what we, humans, reduce down to in the AI’s perspective? 

As I stared at these images on my laptop in the Flickr Foundation headquarters, we were all stunned by the reflections of us created by the machine. Consisting mainly of elementary, undefined figures, the generated images representing the word “humans” ironically conveyed something that felt inherently opposite.

This quick experiment at the end of the project revealed to us that perhaps having simple, general words as prompts instead of thorough descriptions may most transparently reveal how these AI systems fundamentally see and understand our world.

A Generated Family of Man is just the tip of the iceberg.

These findings aren’t concrete, but suggest possible hypotheses and areas of image generation technology that we can conduct further research on. We would like to invite everyone to join the Flickr Foundation on this exciting journey, to branch out from A Generated Family of Man and truly pick the brains of these newly introduced machines. 

Here are the summarizing points of our findings from A Generated Family of Man:
  • The ability of Bing Image Creator to generate images with the primary aim of verisimilitude is impressive when the prompt (image caption) is either written by humans or accurately denotes the semantic information of the image.
  • In certain instances, the Image Creator made relatively more errors when determining the ethnicity of the subject. This may indicate underlying visual biases or stereotypes in the datasets the Image Creator was trained on.
  • When entering short, simple words related to humans into the Image Creator, it responded with undefined, cartoon-like human figures. Using such short prompts may reveal how the AI fundamentally sees our world and us. 

Open questions to consider

Using these findings, I thought that changing certain parameters of the investigation could make interesting starting points for new investigations, whether we spent more time at the Flickr Foundation or someone else continued the research. Here are some different parameters that could be explored:

  • Frequency of iteration: increase the number of trials of prompt modification or general iterations to create larger data sets for better analysis.
  • Different subject matter: investigate specific photography subjects that will allow an acute analysis on narrower fields (e.g. specific types of landscapes, species, ethnic groups).
  • Image generator platforms: look into other image generation software to observe the distinct qualities of different platforms.

How exciting would it be if different groups of people from all around the world participated in a collective activity to evaluate the current status of synthetic photography, and really analyzed the fine details of these models? Maybe that wouldn’t scientifically reverse-engineer these models, but findings emerge even from qualitative investigations. What more will we be able to find? Will there be a way to match and cross-compare the qualitative and even quantitative investigations to deduce a solid (perhaps not definite) conclusion? And if these investigations were to take place at intervals, which variables would change?

To gain inspiration for these questions, take a look at the full collection of images of A Generated Family of Man on Flickr!

A Flickr of Humanity: Who is The Family of Man?

Author: Maya Osaka (Design Intern) Posted July 10th 2023

Please enjoy a progress report on our R&D as we continue to develop the A Flickr of Humanity project. It’s a deep dive into the catalogue of the 1955 The Family of Man exhibition.

The Family of Man was an exhibition held at MoMA in 1955.

Organized by Edward Steichen, the acclaimed photographer, curator, and director of MoMA’s Department of Photography, the exhibition showcased 503 photographs from 68 countries. It celebrated universal aspects of the human experience, and was a declaration of solidarity following on from the Second World War. Photos from the exhibition were published as a physical catalog, and it’s largely considered a photographic classic.

Tasked with doing some research into The Family of Man, I spent some time really looking at the book.

(The Family of Man 30th Anniversary Edition, 1986)

What I mean by ‘really looking at it’ is that, instead of just flicking through the pages and briefly glancing at the photos, I took the time to really take in each image, and to notice the narrative told through the photographs and how Steichen chose to curate the images to portray it. From this I was able to see a clear order and narrative to the book, which I listed in a spreadsheet. Each photo credits the photographer, where it was taken, and which client or publication it was for (e.g. Life Magazine).

The introduction in the book explains that the exhibition was “conceived as a mirror of the universal elements and emotions in the everydayness of life—as a mirror of the essential oneness of mankind throughout the world.”

As I explored the book, I found myself wanting to answer the following questions:

  1. Where were the photographers from?
  2. Where were the photos taken?
  3. How many female photographers were involved?
  4. Who were the most featured photographers? 

In order to answer these questions I created a master index of the photographs.

This shows where they appear in the book, the country depicted, the photographer and which organization the image is associated with or was made for. From this ‘master’ spreadsheet I compiled three more views:
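Those compiled views are essentially group-by counts over the master index. As a sketch (the field names and rows here are hypothetical stand-ins for the actual spreadsheet), the same tallies could be computed like this:

```python
from collections import Counter

def tally(rows, field):
    """Count photos per value of a field (e.g. country or photographer),
    most frequent first."""
    return Counter(row[field] for row in rows).most_common()

# Hypothetical rows from the master index
index = [
    {"country": "USA", "photographer": "Wayne Miller"},
    {"country": "USA", "photographer": "Dorothea Lange"},
    {"country": "France", "photographer": "Henri Cartier-Bresson"},
]
print(tally(index, "country"))  # [('USA', 2), ('France', 1)]
print(tally(index, "photographer"))
```

Each of the three views below (photos by geography, photographer nationality, most featured photographers) is a tally of this kind over a different column.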

Here is what I discovered:

46% of the photos were taken in the USA (vs the rest of the world).

Out of 484 images depicted in The Family of Man 30th Anniversary Edition (1986), 220 are from the USA. That’s 46% of all the photos. The most heavily featured countries after America were France (32 images), Germany (21 images) and England (15 images), all in Europe. Compared to America’s 46%, France, the runner-up, makes up only 7% of the total number of images.

The image is a screenshot of a section of the photos by geography spreadsheet.

 

75% of the images were shot in North America or Europe. 
  • Northern America: 231 images (out of which 220 are from the USA)
  • Europe: 128 images
  • Asia: 69 images (including 12 images shot in Russia)
  • Africa: 24
  • South America: 12
  • Oceania: 8
  • Arctic: 3
  • Australia: 2

At this stage I will note that, as Russia spans both Asia and Europe, Russia’s 12 images have been included within Asia’s statistics (not Europe’s). The infographic also excludes the 3 images taken in the Arctic, as it was not explicitly stated which part of the Arctic they were taken in.

The image is a screenshot of a section of the photos by geography spreadsheet.

56% of the photographers were American.

Out of 251 known photographers, 155 were American. That is 56% of the total number of photographers. The most common nationalities that followed were German (17) and British and French (12 each); 15 photographers’ nationalities were unknown. It is important to note that some of the photographers held multiple nationalities, and in these instances their birth nationality was counted. Information on the photographers’ nationalities was collected by searching their names on the internet and looking for credible sources.

The image is a screenshot of a section of the photographers’ biographical data spreadsheet.

17% of the photographers were female.

Out of the 251 known photographers, 48 were women. That is 17% of the total number of photographers.

Note: There was one photograph that was credited to Diane and Allan Arbus. I counted them as two separate individuals (one male, one female).

The image is a screenshot of the photographers’ biographical data spreadsheet.

Which photographers were featured most?

  1. Wayne Miller (11 photos)
  2. Henri Cartier-Bresson (9 photos)
  3. Alfred Eisenstaedt (8 photos), Dmitri Kessel (8 photos), Dorothea Lange (8 photos), Nat Farbman (8 photos), Ruth Orkin (8 photos).

The image is a screenshot of the most featured photographers spreadsheet. 

Conclusions

  1. The majority of photos were shot in the US and Europe. 
  2. More than half of the photographers were American.
  3. Most of the photographers were men.
  4. Among the top 10 most featured photographers were three women (Dorothea Lange, Ruth Orkin and Margaret Bourke-White).

Where are the lost photos?

On the back of The Family of Man (30th Anniversary Edition, 1986) it is stated that all 503 images from the original exhibition are showcased within the book. However, after checking through the book multiple times, the number of images I have counted (excluding the introductory images featuring the exhibition itself and a portrait of Steichen) is 484. This means 19 images are missing.

This mystery is currently being solved by my fellow intern, Juwon Jung, who, as I write this, is cross-referencing the original MoMA exhibition master checklist with the book. We will keep you posted on whether this mystery gets solved!

Creating the Infographics

While collecting this data, I began to think about how this data could be visualized. Datasets on a spreadsheet are boring to look at and can struggle to effectively communicate what they mean. So I decided to create an infographic to showcase the datasets. 

Creating the infographics posed many creative challenges, especially because this was one of my first attempts at this sort of data visualization. One of the key challenges was to create visuals that are eye-catching but simple to read and communicate a clear message. In this case: that a disproportionately large amount of the photos and photographers are of or from the USA and the majority of photographers were men.

In order to draw attention to those facts, I used a combination of techniques. Firstly, the statistics I wanted to draw the most attention to are in the brightest shade of pink (the same pink as the Flickr Foundation logo). Secondly, the pie chart’s and bar chart’s proportions are accurate, and highlight just how disproportionate the statistics are. A comment next to each chart states a percentage that further underlines the point being made.

George Oates (Executive Director at Flickr.org)—who has extensive experience working in data visualisation—helped a lot with perfecting the look of the infographic. (Thanks George!)

Below you can see how the graphics evolved.
*Note that the statistics on previous versions are not accurate!