Developing a New Research Method, Part 2: Introduction to Archivevoice

By Prakash Krishnan, 2024 Research Fellow

Many of my previous projects centre observational analysis of photography in community group settings. As my practice developed, I was led to the participatory research method called Photovoice. In 2016, Apaza & DeSantis documented a five-phase process methodology for the Photovoice method, and I am applying and extending it to selection and processing of archival photography and documentation that respond to researchers’ questions. I am calling this extension “Archivevoice.” But before I go deeper into that, let’s outline our framing, starting with the basics.

What is an archive?

At its simplest, an archive is a repository of historical records like photographs, documents, sound recordings, books and artworks. Speciality archives may focus on a particular medium, such as the Moving Image Archive, or a place, like the London Metropolitan Archives. Archives house physical or digital records or a combination of both. Many archives are found within larger institutions such as universities, libraries, museums, government offices, and established public or private organizations. Usually, these archives have their materials grouped into collections managed by professionals called archivists.

There are all kinds of informal archives as well. Lots of smaller community and cultural organizations keep records of their activities but may not have a dedicated archivist to keep them organized. As individuals, we also record our lives through photography, sometimes printing the photos, keeping them in digital photo albums, or posting them online on social media platforms like Instagram, Facebook, or Flickr.

What is Photovoice?

Originally conceived and put into practice by health researchers Caroline Wang and Mary Ann Burris in the early 1990s, Photovoice involves working alongside participants to take photographs and subsequently discuss them in order to be able to collectively illuminate and reflect upon contemporary issues within a community. At the end of the project, a selection of the photos taken and discussed is exhibited for the community to share the insights that were collectively produced. Often, researchers engaging in Photovoice seek to recalibrate the power imbalance between researcher and subject by lending the tools for research (i.e. the camera) to the active participants, thus elevating them to the position of collaborator, co-researcher, or co-producer.

Archivevoice is an extension of Photovoice, alongside others like Videovoice and Comicvoice. By using the principles of participatory action research developed in Photovoice, other researchers have modified their methods engaging in different artistic mediums for participants’ self-expression. Videovoice has the goal of getting “people, who are usually the subjects or consumers of mainstream media [to] get behind video cameras to research issues of concern, communicate their knowledge, and advocate for change.” Comicvoice, coined by John Baird, engages research groups in creating their own narratives from outsourced comics.

Why Archives?

There is so much rich, historical information available within archives. One of the limits of the Photovoice method is what is available to be photographed. Through Archivevoice, the research participants are able to navigate through much broader windows of time and space to reflect and discuss how historical events shape the contemporary moment. This kind of embodied practice of looking back and critically engaging with one’s community and culture through such a deep, reflective practice has been referred to by scholars as “ancestor work”. 

Archivevoice

Archivevoice is a Participatory Action Research (PAR) method that adapts the core principles of Photovoice using photographs or other archival documentation to undergo community-centered research.

“Through exhibitions and outreach events community members can be brought to the archive and made aware of what records are present in the archives. They can see themselves, their families, and their histories represented in the materials and engage with those who are responsible for preserving and describing that history. Through workshops and naming events, the archives can be brought to community members for the purpose of facilitating discussion, memory making, and healing together.”

– Kristen Young

What is the purpose of Archivevoice?

  • To use photographs and other archival documentation to reflect on collective experiences affecting communities
  • To gain insight about a particular community’s histories, activities, and concerns
  • To engage communities with their own archival records
  • To empower communities to lend their voice to heritage projects and document their own histories
  • To have community participants be co-producers of research
  • To activate the archive through creative presentation of the selected records (e.g. exhibition, zine, photobook)

How could I run an Archivevoice session?

Archivevoice borrows the same five-phase approach outlined in Vanesa Apaza and Phoebe DeSantis’s 2016 Facilitator’s Toolkit for a Photovoice Project, with the authors’ permission. The five phases adapted to Archivevoice are as follows:

Phase 1: Introduction to Archivevoice

Introduce the method, the project, and its research questions. Introduce participants to the archive or collection that will be explored, and to the possible project outputs.


Phase 2: Selection of archival photos or other documentation



In Archivevoice, this phase replaces the photo-taking step in Photovoice. This is when the participants will receive archive research training tailored to the particular archive they are working with. They search the archive and select the photographs or other materials they wish to discuss. The project manager or lead researcher may want to preselect the items available for study (e.g. limiting the search within specific collections or setting specific inclusion criteria such as using materials with ‘no known copyright restrictions’).


Phase 3: Discussion around selected media

Just as in Apaza & DeSantis’ process, the “SHOWeD” method can be used to prompt the research participants to discuss the selected media.

SHOWeD is an acronym used in community-based health care research inspired by the pedagogical teachings of Paulo Freire. 

S – What things did you see?

H – What was happening?

O – Does this happen in our community?

W – Why does this happen?

D – What can we do about it?



While SHOWeD has a long history of being used in conjunction with Photovoice, other reflection and discussion methods such as focus groups, semi-structured interviews, and narrative writing are also possible or can be used in conjunction with SHOWeD depending on what is deemed appropriate or relevant by the lead researcher.

Phase 4: Media processing for archive activation

Here the selected archival media must be prepared for display. This could involve ensuring that the researcher has the appropriate rights or permissions to reproduce the identified media and that copyright is cleared (or that no known copyright restrictions apply), as well as digitizing, formatting, and printing the media.

Phase 5: Community exhibition or other public output

Photovoice projects often culminate in a public exhibition for the community who were the subjects of the photos taken during the project. Similarly, once proper permissions are secured for the selected archival media, an exhibition of these items can be produced either physically (in the archive itself, a gallery, or a community centre) or online. Other options for public presentation of the selected media include a book or zine, a documentary, a podcast, and more.

Archivevoice serves as an adaptation of Photovoice to facilitate engagement with archival intervention and activation. According to Freire, “education must begin with the solution of the teacher-student contradiction, by reconciling the poles of the contradiction so that both are simultaneously teachers and students.” In this sense, a critical pedagogy requires both parties, “teachers” and “students,” to understand that they each have something to learn from the other and that knowledge can be freely transferred from one to another. Participants become de facto co-researchers in these projects: they are given relative autonomy to express themselves through the selected media (e.g. photo, video, comic, archive), they are empowered to have their feedback and knowledge heard and understood, and they make planning and curatorial decisions.

By elevating participants’ status to co-researchers and collaborators, Archivevoice and the other -voice projects dissolve knowledge hierarchies, asserting that lived experience and community knowledge merit their place in research and public pedagogy projects.

For my final blog post, coming soon, I will report on my own investigation of the Archivevoice method, through a workshop I ran recently with researchers from the Access in the Making Lab at Concordia University in Montreal, Canada. Members of the lab are currently engaged in a project researching the disabling effects that climate change and systems of extraction are having on various populations and ecosystems around the world. Together, we went through the steps of the Archivevoice method using Flickr as the source archive, looking for and discussing images that related to the researchers’ individual projects. The vast quantity of photos available in the Flickr archive prompted many interesting topics of discussion that will be explored in part three of this series.

Bibliography

Apaza, Vanesa, Phoebe DeSantis, Aurea DeLeon, Jaclyn Keelin, Alexandra Ovits, Sherrine Schuldt, and Michael Spillane. “Facilitator’s Toolkit for a Photovoice Project.” United for Prevention in Passaic County and the William Paterson University Department of Public Health, 2016. https://www.up-in-pc.org/clientuploads/Whatwedo/Flyers/UPinPC_Photovoice_Facilitator_Toolkit_Final.pdf.

Baird, John Loige. “Comicvoice: Community Education through Sequential Art.” In Pop Culture Association Annual Meeting, Vol. 13, 2010.

Catalani, Caricia E. C. V., Anthony Veneziale, Larry Campbell, Shawna Herbst, Brittany Butler, Benjamin Springgate, and Meredith Minkler. “Videovoice: Community Assessment in Post-Katrina New Orleans.” Health Promotion Practice 13, no. 1 (January 1, 2012): 18–28. https://doi.org/10.1177/1524839910369070.

Freire, Paulo, Donaldo P. Macedo, Ira Shor, and Myra Bergman Ramos. Pedagogy of the Oppressed. 50th anniversary edition. New York: Bloomsbury Academic, 2018. https://nls.ldls.org.uk/welcome.html?ark:/81055/vdc_100055048362.0x000001.

Shaffer, Roy. “Beyond the Dispensary.” English Press: Nairobi, Kenya, 1986. https://www.amoshealth.org/wp-content/uploads/sites/62/2019/10/Beyond-the-Dispensary.pdf.

Young, Kristen. “Black Community Archives in Practice.” In Black Community Archives in Practice, 211–21. McGill-Queen’s University Press, 2023. https://doi.org/10.1515/9780228019152-011.

 

Field Notes #01: Lughnasadh

by Fattori McKenna

Deep Reading in the Last Days of Summer

 

I joined the Foundation team in early August, with the long-term goal of better understanding future users of the Data Lifeboat project and Safe Harbor Network. Thanks to the Digital Humanities Advancement Grant we were awarded by the National Endowment for the Humanities, my first task was to get up to speed with the Data Lifeboat project, a concept that has been in the works since 2022 as part of Flickr.org’s Content Mobility Program and that recently reached the working-prototype stage. I have the structured independence to design my own research plan and, as every researcher knows, being able to immerse oneself in the topic beforehand is a huge advantage. It allows us to frame the problem at hand, to be resolute with objectives, and to ground the research in what is known and current.

 

Stakeholder interviews

To understand what would be needed from the research plan, I first wanted to understand how we got to where we are with the Data Lifeboat project.

I spoke with Flickr.org’s tight-knit internal team to gather perspectives that emphasised varying approaches to the question of long-term digital preservation: ranging from the technological, to the speculative, to the communal. It was curious to see how different team members viewed the project, each speaking from their own specialty, with their wider ambitions and community in mind.

Branching out, I enlisted external stakeholders for half-hour chats, those who’ve had a hand in the Data Lifeboat project since it was in napkin-scribble format. The tool owes its present form to a cadre of digital preservation experts and enthusiasts, who do not work on the project full-time, but have generously given their hours to partake in workshops, coffees, Whereby calls, and a blissfully meandering Slack thread. Knowing these folks would be, themselves, a huge repository of knowledge, I wanted a way to capture this. Besides introductions to the Safe Harbor Network co-design workshops (as supported by the recent Mellon Foundation grant) and my new role, I centred our conversation around three key questions:

  1. What has your experience of the last six months of the Data Lifeboat project been like? How do you think we are doing? Any favourite moments, any concerns?
  2. What are the existing practices around digital acquisition, storage and maintenance in your organisation(s)? How would the Data Lifeboat and Safe Harbor Network differ from the existing practices?
  3. Where are the blind-spots that still exist for developing the Data Lifeboat project and Safe Harbor Network? What might we want to find out from the co-design workshops in October and November?

Here it was notable to learn what had stuck with them in the repose since the last Data Lifeboat project meet-up. For some the emphasis was on how the Data Lifeboat tool could connect institutions, for others it was how the technology can decentralise power and ownership of data. All were keen to see what shape the project would take next.

One point that we ought to carry forward into research, however, remained amorphous to all stakeholders: what is the problem that the Data Lifeboat project is solving, specifically in a non-emergency scenario (as the emergency need is intuitive)? How can we best articulate that problem to our imagined users?

As our prototype user group is likely to be institutional users of Flickr (Galleries, Libraries, Archives and Museums), it will be important to meet them where they are, which brought me onto my next August task: the mini-literature review.

 

Mini Literature Review

Next, I wanted to get up to date on the contemporary discourses around digital preservation. Whilst stakeholders have brought their understanding of these topics to shaping the Data Lifeboat project, it felt as if the project was missing its own bibliography or set of citations. I wanted to ask: what are the existing conversations that the Data Lifeboat project is speaking to?

It goes without saying that this is a huge topic and, despite my humble background in digital heritage research (almost always theoretical), cramming this all into one month would be impossible. Thus, I adopted the approach of the archival ‘sliver’ that so informs the ethos of the Data Lifeboat project, taking a snapshot of current literature. After reviewing the writing to date on the project (shout-out to Jenn’s reporting), I landed on three guiding topics for the literature review:

 

The Status of Digital Preservation

  • What are the predominant tools and technologies of digital preservation?
  • What are recent reflections and learnings from web archiving experiments?
  • What are current institutional and corporate strategies for digital social collecting and long-term data storage?

Care & Ethics of Archives

  • What are the key ethical considerations among archivists today?
  • How are care practices being embedded into archives and archival practice?
  • What reflections and responses exist to previous ethical interventions?

Collaboration and Organisation in Archival Practice

  • What are the infrastructures (hard and soft) of archival practice?
  • What are the predominant organisational structures, considerations and difficulties in digital archives?
  • How does collaboration appear in archives? Who are the (visible and invisible) stakeholders?

 

A selection of academic articles, blog posts and industry guidelines served as source materials, alongside favourites crowdsourced from the Flickr.org team. In reading these texts, I kept one question top of mind: ‘What does this mean for the Data Lifeboat project and the Safe Harbor Network?’ In more granular terms, this means: ‘What can we learn from these investigations?’, ‘Where are we positioned in the wider ecosystem of digital preservation?’ and finally, ‘What should we be thinking about that we aren’t yet?’

Naturally with more time, or with an academic audience in mind, a more rigorous methodology to discourse capture would be appropriate. For our purposes, however, this snapshot approach suffices – ultimately the data this research is grounded in comes not from textual problematising, but instead will emerge from our workshops with future users.

Having this resource is of huge benefit to meeting our session participants where they stand. Whilst there will inevitably be discourses, approaches and critiques I have missed, I will at least be able to speak the same language as our participants and get into the weeds of our problems in a complex, rather than baseline, manner. Furthermore, my ambition is for this bibliography to become an ongoing and open-source asset, expanding as the project develops.

These three headers (1. The Status of Digital Preservation, 2. Care & Ethics of Archives, 3. Collaboration and Organisation in Archival Practice) currently constitute placeholders for our workshop topics. It is likely, however, that these titles could evolve, splinter or coalesce as we come closer to a more refined and targeted series of questions for investigating with our participants.

 

Question Repository [in the works]

Concurrently with these ongoing workstreams, I am building a repository, or long-list, of questions for our upcoming workshops. The aim is to first go broad, listing all possible questions, in an attempt to capture as many inquisitive voices as possible. These will then be refined and grouped under thematic headings, which will in turn structure the sub-points or provocations for our sessions. This iterative process reflects a ground-up methodology, derived from interviews, reading, and the collective knowledge of the Flickr.org community, to finally land on working session titles for our October and November Safe Harbor Network co-design workshops.

Looking ahead, there is an opportunity to test several of these provocations around Data Lifeboat at our Birds-of-a-Feather session, taking place at this year’s International Conference on Digital Preservation (iPres) in Ghent later this month. Here we might foresee which questions generate lively and engaged discussion; which features of the Data Lifeboat tool and project prompt anticipation or concern; and finally, which pathways we ought to explore further.

 

Other things I’ve been thinking about this month

Carl Öhman’s concept of the Neo-Natufians in The Afterlife of Data: What Happens to Your Information When you Die and Why You Should Care

Öhman proposes that the digital age has ushered in a major shift in how we interact with our deceased. Referencing the Natufians, the first non-nomadic peoples to keep the dead among their tribe (who would adorn skulls with seashells and place them in the walls) instead of leaving them behind to the elements, he posits that our current position is equally seismic. The dead now live alongside us in the digital realm. In a profound shift from the family shoebox of photographs, the dead are accessible from virtually anywhere at any time, their (visible and invisible) data trail co-existing with ours. This is an inescapable provocation for the Data Lifeboat project to consider.

“The imago mask, printed not in wax but in ones and zeros”

The Shikinen Sengu Ritual at Ise Jingu, Japan

The Shikinen Sengu is a ritual held at the Ise Grand Shrine in Japan every 20 years, where the shrine is completely rebuilt and the sacred objects are transferred to the new structure. This practice has been ongoing for over a millennium. It makes me think about the mobility of cultural heritage (analogue or digital), and about how stasis, despite its intuitive appeal, can cause objects to perish. I am reminded of the oft-exalted quote from di Lampedusa’s Sicilian epic:

“If we want things to stay as they are, things will have to change.” The Leopard, by Giuseppe Tomasi di Lampedusa

Furthermore, the Shikinen Sengu highlights the importance of ritual in sustaining objects, despite the wear-and-tear that handling over millennia may cause. What might our rituals around digital cultural data be, and what practices could we generate (even if the original impetus gets lost)?

 

Background Ephemera

Currently Playing: Laura Misch Sample the Earth and Sample the Sky

Currently Reading: The Hearing Trumpet by Leonora Carrington

Currently Drinking: Clipper Green Tea

Making some marvelous maps

This week we added maps to our Commons Explorer, and it’s proving to be a fun new way to find photos.

There are over 50,000 photos in the Flickr Commons collection which have location information telling us where the photo was taken. We can plot those locations on a map of the world, so you can get a sense of the geographical spread.

This map is interactive, so you can zoom in and move around to focus on a specific place. As you do, we’ll show you a selection of photos from the area you’ve selected.

You can also filter the map, so you see photos from just a single Commons member. For smaller members the map points can tell a story in themselves, and give you a sense of where a collection is and what it’s about.

These maps are available now, and know about the location of every geotagged photo in Flickr Commons.

Give them a try!

How can you add a location to a Flickr Commons photo?

For the first version of this map, we use the geotag added by the photo’s owner.

If you’re a Flickr Commons member, you can add locations to your photos and they’ll automatically show up on this map. The Flickr Help Center has instructions for how to do that.

It’s possible for other Flickr members to add machine tags to photos, and there are already thousands of crowdsourced tags that have location-related information. We don’t show those on the map right now, but we’re thinking about how we might do that in future!
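As a rough illustration of what reading those crowdsourced tags could involve, here is a minimal Python sketch. It assumes the tags follow the `namespace:predicate=value` machine tag form and use `geo:lat` and `geo:lon` for coordinates, which is one common convention but not the only one. The function name `location_from_machine_tags` is hypothetical, and this is not something the Commons Explorer does today.

```python
import re

# Assumed machine tag shape: namespace:predicate=value, e.g. "geo:lat=51.5".
MACHINE_TAG = re.compile(
    r"^(?P<namespace>[a-z][a-z0-9_]*):(?P<predicate>[a-z][a-z0-9_]*)=(?P<value>.+)$",
    re.IGNORECASE,
)


def location_from_machine_tags(tags):
    """Return (latitude, longitude) if the tags contain a usable pair, else None."""
    coords = {}
    for tag in tags:
        match = MACHINE_TAG.match(tag.strip())
        if not match or match.group("namespace").lower() != "geo":
            continue  # plain tags and other namespaces are ignored
        predicate = match.group("predicate").lower()
        if predicate in ("lat", "lon"):
            try:
                coords[predicate] = float(match.group("value"))
            except ValueError:
                continue  # skip values that are not numbers

    if "lat" in coords and "lon" in coords:
        lat, lon = coords["lat"], coords["lon"]
        # Reject coordinates that fall outside the valid range.
        if -90 <= lat <= 90 and -180 <= lon <= 180:
            return (lat, lon)
    return None
```

Calling `location_from_machine_tags(["london", "geo:lat=51.5", "geo:lon=-0.12"])` gives a coordinate pair, while a photo with only one of the two tags gives `None`. Crowdsourced tags can be wrong or conflicting, so a real version would need some way to review them before plotting.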

How does the map work?

There are three technologies that make these maps possible.

The first is SQLite, the database engine we use to power the Commons Explorer. We have a table which contains every photo in the Flickr Commons, and it includes any latitude and longitude information. SQLite is wicked fast and our collection is small potatoes, so it can get the data to draw these maps very quickly.

(I’d love to tell you about some deeply nerdy piece of work to hyper-optimize our queries, but it wasn’t necessary. I wrote the naïve query, added a couple of column indexes, and that first attempt was plenty fast. Tallying the locations for the entire Flickr Commons collection takes ~45ms; tallying the locations for an individual member is often under a millisecond.)
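To give a feel for what a naïve tally query with a couple of indexes might look like, here is a small Python sketch using the standard `sqlite3` module. The table name `photos`, its columns, the index names, and the sample rows are all assumptions made for illustration; the real Commons Explorer schema and query are not shown in this post.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical photos table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE photos (
            id        TEXT PRIMARY KEY,
            member_id TEXT NOT NULL,
            latitude  REAL,   -- NULL when the photo has no geotag
            longitude REAL
        )
        """
    )
    # A couple of plain column indexes, in the spirit of the post.
    conn.execute("CREATE INDEX idx_photos_member ON photos (member_id)")
    conn.execute("CREATE INDEX idx_photos_location ON photos (latitude, longitude)")
    conn.executemany(
        "INSERT INTO photos VALUES (?, ?, ?, ?)",
        [
            ("p1", "member_a", 51.5, -0.12),
            ("p2", "member_a", 51.5, -0.12),
            ("p3", "member_b", 45.5, -73.57),
            ("p4", "member_b", None, None),  # not geotagged
        ],
    )
    return conn


def tally_locations(conn, member_id=None):
    """Count geotagged photos per location, optionally for one member."""
    sql = (
        "SELECT latitude, longitude, COUNT(*) FROM photos "
        "WHERE latitude IS NOT NULL AND longitude IS NOT NULL"
    )
    params = []
    if member_id is not None:
        sql += " AND member_id = ?"
        params.append(member_id)
    sql += " GROUP BY latitude, longitude ORDER BY COUNT(*) DESC, latitude"
    return conn.execute(sql, params).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(tally_locations(conn))              # whole collection
    print(tally_locations(conn, "member_b"))  # a single member
```

Each row that comes back is a latitude, a longitude, and a photo count, which is the shape a map layer needs in order to place and size its markers.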

The second is Leaflet.js, a JavaScript library for interactive maps. This is a popular and feature-rich library that made it easy for us to add a map to the site. Combined with a marker clustering plugin, we had a lot of options for configuring the map to behave exactly as we wanted, and to connect it to Flickr Commons data.

The third is OpenStreetMap. This is a world map maintained by a community of volunteers, and we use their map tiles as the backdrop for our map.

Plus ça Change

To help us track changes to the Commons Explorer, we’ve added another page: the changelog.

This is part of our broader goal of archiving the organization. Even in the six months since we launched the Explorer, it’s easy to forget what happened when, and new features quickly feel normal. The changelog is a place for us to remember what’s changed and what the site used to look like, as we continue to make changes and improvements.

Developing a New Research Method, Part 1: Photovoice, critical fabulation, and archives

by Prakash Krishnan

Prakash Krishnan is a 2024 Flickr Foundation Research Fellow, working to engage community organizations with the creative possibilities afforded through archival and photo research as well as to unearth and activate some of the rich histories embedded in the Flickr archive.

I had the wonderful opportunity to visit London and Flickr Foundation HQ during the month of May 2024. The first month of my fellowship was a busy one, getting settled in, meeting the team, and making contacts around the UK to share and develop my idea for a new qualitative research method that was inspired by my perusing of just a minuscule fraction of the billions of photos uploaded and visible on Flickr.com.

Unlike the brilliant and techno-inspired minds of my Flickr Foundation cohort: George, Alex, Ewa, and Eryk, my head is often drifting in the clouds (the ones in the actual sky) or deep in books, articles, and archives. Since rediscovering Flickr and contemplating its many potential uses, I have activated my past work as a researcher, artist, and cultural worker, to reflect upon the ways Flickr could be used to engage communities in various visual and digital ethnographies.

Stemming from anthropology and the social sciences more broadly, ethnography is a branch of qualitative research involving the study of cultures, communities, or organizations. A visual ethnography thereby employs visual methods, such as photography, film, drawing, or painting. Similarly, digital ethnography refers to the ethnographic study of cultures and communities as they interact with digital and internet technologies.

In this first post, I will trace a nonlinear timeline of different community-based and academic research projects I have conducted in recent years. Important threads from each of these projects came together to form the basis of the new ethnographic method I have developed over the course of this fellowship, which I call Archivevoice.

Visual representations of community

The research I conducted for my master’s thesis was an example of a digital, visual ethnography. For a year, I observed Instagram accounts sharing curated South Asian visual media, analyzing the types of content they shared, the different media used, the platform affordances that were engaged with, the comments and discussions the posts incited, and how the posts reflected contemporary news, culture, and politics. I also interviewed five people whose content I had studied. Through this research I observed a strong presence of uniquely diasporic concerns and aesthetics. Many posts critiqued the idea of different nationhoods and national affiliations with the countries founded after the partition of India in 1947 – a violent division of the country resulting in mass displacement and human casualty whose effects are still felt today. Because of this violent displacement, and with multiple generations of people descended from the Indian subcontinent living outside of their ancestral territory, I observed among many within the community a rejection of nationalist identities specific to, say, India, Pakistan, or Bangladesh. Instead, people were using the term “South Asian” as a general catchall for communities living in the region as well as in the diaspora. Drawing from queer cultural theorist José Esteban Muñoz, I labelled this digital, cultural phenomenon I observed “digital disidentification.”[1]

My explorations of community-based visual media predate this research. In 2022, I worked with the Montreal grassroots artist collective and studio, Cyber Love Hotel, to develop a digital archive and exhibition space for 3D-scanned artworks and cultural objects called Things+Time. In 2023, we hosted a several-week-long residency program with 10 local, racialized, and queer artists. The residents were trained on archival description and tagging principles, and then selected what to archive. The objects curated and scanned in the context of this residency responded to the overarching theme of loss during the Covid-19 pandemic, in which closures of queer spaces, restaurants, nightlife, music venues, and other community gathering spaces were proliferating across the city.

During complete pandemic lockdown, while working as the manager for cultural mediation at the contemporary gallery Centre CLARK, I conducted a similar project which involved having participants take photographs which responded to a specific prompt. In partnership with the community organization Head & Hands, I mailed disposable cameras to participants from a Black youth group whose activities were based at Head & Hands. Together with artist and CLARK member Eve Tangy, we created educational videos on the principles of photography and disposable camera use and tasked the participants to go around their neighbourhoods taking photos of moments that, in their eyes, sparked Black Joy, the theme of the project. Following a feedback session with Eve and me, the two preferred photos from each participant’s photo reel were printed and mounted as part of a community exhibition entitled Nous sommes ici (“We’re Here”) at the entry of Centre CLARK’s gallery.


These public community projects were not formal or academic, but I came to understand each of them as an example of what is called research-creation (or practice-based research or arts-based research). Through creative methods like curating objects for digital archiving and photography, I, as the facilitator/researcher, was interested in how the media comprising each exhibition would inform me and the greater public about the experiences of marginalized artists and Black youth at such pivotal moments in these communities.

Photovoice: Empowering research participants

The fact that both these projects involved working with a community and giving them creative control over how they wanted their research presented reminded me of the popular qualitative research method used often within the fields of public health, sociology, and anthropology called Photovoice. The method was originally coined as Photo Novella in 1992 and then later renamed Photovoice in 1996 by researchers Caroline Wang and Mary Ann Burris. The flagship study that established this method for decades involved scholars providing cameras and photography training to low-income women living in rural villages of Yunnan, China.

The goals of this Photovoice research were to better understand, through the perspectives of these women, the challenges they faced within their communities and societies, and to communicate these concerns to policymakers who might be more amenable to photographic representations rather than text. Citing Paulo Freire, Wang and Burris note the potential photographs have to raise consciousness and promote collective action due to their political nature. [5]

According to Wang and Burris, “these images and tales have the potential to reach generations of children to come.” [6] The images created a medium through which these women were able to share their experiences and also relate to each other. Even with 50 villages represented in the research, shared experience and strong reactions to certain photographs came up for participants – including this picture of a young child lying in a field while her mother farmed nearby. 

According to the authors, “the image was virtually universal to their own experience. When families must race to finish seasonal cultivating, when their work load is heavy, and when no elders in the family can look after young ones, mothers are forced to bring their babies to the field. Dust and rain weaken the health of their infants… The photograph was a lightening [sic] rod for the women’s discussion of their burdens and needs.” [8]

Since its conception in the 1990s as a means for participatory needs assessment, many scholars and researchers have expanded Photovoice methodology. Given the exponential increase of camera access via smartphones, Photovoice is an increasingly feasible method for this kind of research. Recurring themes in Photovoice work include community health, mental health studies, ethnic and race-based studies, research with queer communities, as well as specific neighbourhood and urban studies. During the pandemic lockdowns, there were also Photovoice studies conducted entirely online, thus giving rise to the method of virtual Photovoice. [9]

Critical Fabulation: Filling the gaps in visual history

Following my master’s thesis research, I became more interested in how communities sought to represent themselves through photography and digital media. Not only that, but also how communities would form and engage with content circulated on social media – despite these people not being the originators of this content.

In my research, people reacted most strongly to family photographs depicting migration from South Asia to the Global North. Although reasons for emigration varied across the respondents, many people faced similar challenges with the immigration process and resettlement in a new territory. They shared their experiences through commenting online. 

People in communities which are underrepresented in traditional archives are often forced to work with limited documentation. They must do the critical and imaginative work of extrapolating from what they find. While photographs can convey biographical, political, or historical meaning, exploring archived images with imagination can foster creative interpretation to fill gaps in the archival record. Saidiya Hartman, a scholar of African-American studies, introduced the term “critical fabulation” to denote this practice of reimagining the sequences of events and actors behind the narratives contained within the archive. In her words, this reconfiguration of story elements attempts “to jeopardize the status of the event, to displace the received or authorized account, and to imagine what might have happened or might have been said or might have been done.” [10] In reference to depictions of narratives from the Atlantic slave trade in which enslaved people are often referred to as commodities, Hartman writes “the intent of this practice is not to give voice to the slave, but rather to imagine what cannot be verified, a realm of experience which is situated between two zones of death—social and corporeal death—and to reckon with the precarious lives which are visible only in the moment of their disappearance. It is an impossible writing which attempts to say that which resists being said (since dead girls are unable to speak). It is a history of an unrecoverable past; it is a narrative of what might have been or could have been; it is a history written with and against the archive.” [11]

I am investigating what it means to imagine the unverifiable and to reckon with what only becomes visible at its disappearance. In 2020, I wrote about Facebook pages serving as archives of queer life in my home town, Montreal. [12] For this study, I once again conducted a digital ethnography, this time of the event pages surrounding a QTPOC (queer/trans person of colour)-led event series known as Gender B(l)ender. Drawing from Sam McBean, I argued that simply having access to these event pages on Facebook creates a space of possibility in which one can imagine themselves as part of these events, as part of these communities – even when physical, in-person participation is not possible. Although critical fabulation was not a method used in this study, it seemed like a precursor to this concept of collectively rethinking, reformulating, and resurrecting untold, unknown, or forgotten histories of the archives. This finally leads us to the project of my fellowship here at the Flickr Foundation.

In addition to this fellowship, I am coordinator of the Access in the Making Lab, a university research lab working broadly on issues of critical disability studies, accessibility, anti-colonialism, and environmental humanities. In my work, I am increasingly preoccupied with the question of methods: 1) how do we do archival research—especially ethical archival research—with historically marginalized communities; and, 2) how can research “subjects” be empowered to become seen as co-producers of research. 

I trace this convoluted genealogy of my own fragmented research and community projects to explain the method I am developing and have proposed to university researchers as a part of my fellowship. Following my work on Facebook and Instagram, I similarly position Flickr as a participatory archive, made by millions of people in millions of communities. [13] Eryk Salvaggio, fellow 2024 Flickr Foundation research fellow, also positions Flickr as an archive such that it “holds digital copies of historical artifacts for individual reflection and context.” [14] From this theoretical groundwork of seeing these online social image/media repositories as archives, I seek to position archival items – i.e. the photos uploaded to Flickr.com – as a medium for creative interpretation by which researchers could better understand the lived realities of different communities, just like the Photovoice researchers. I am calling this set of work and use cases “Archivevoice”.

In part two of this series, I will explore the methodology itself in more detail including a guide for researchers interested in engaging with this method.

Footnotes

[1] Prakash Krishnan, “Digital Disidentifications: A Case Study of South Asian Instagram Community Archives,” in The Politics and Poetics of Indian Digital Diasporas: From Desi to Brown (Routledge, 2024), https://www.routledge.com/The-Politics-and-Poetics-of-Indian-Digital-Diasporas-From-Desi-to-Brown/Jiwani-Tremblay-Bhatia/p/book/9781032593531.

[2] Caroline Wang and Mary Ann Burris, “Empowerment through Photo Novella: Portraits of Participation,” Health Education Quarterly 21, no. 2 (1994): 171–86.

[3] Kunyi Wu, Visual Voices, 100 Photographs of Village China by the Women of Yunnan Province, 1995.

[4] Wu.

[5] Caroline Wang and Mary Ann Burris, “Photovoice: Concept, Methodology, and Use for Participatory Needs Assessment,” Health Education & Behavior 24, no. 3 (1997): 384.

[6] Wang and Burris, “Empowerment through Photo Novella,” 179.

[7] Wang and Burris, “Empowerment through Photo Novella.”

[8] Wang and Burris, 180.

[9] John L. Oliffe et al., “The Case for and Against Doing Virtual Photovoice,” International Journal of Qualitative Methods 22 (March 1, 2023): 16094069231190564, https://doi.org/10.1177/16094069231190564.

[10] Saidiya Hartman, “Venus in Two Acts,” Small Axe 12, no. 2 (2008): 11.

[11] Hartman, 12.

[12] Prakash Krishnan and Stefanie Duguay, “From ‘Interested’ to Showing Up: Investigating Digital Media’s Role in Montréal-Based LGBTQ Social Organizing,” Canadian Journal of Communication 45, no. 4 (December 8, 2020): 525–44, https://doi.org/10.22230/cjc.2020v44n4a3694.

[13] Isto Huvila, “Participatory Archive: Towards Decentralised Curation, Radical User Orientation, and Broader Contextualisation of Records Management,” Archival Science 8, no. 1 (March 1, 2008): 15–36, https://doi.org/10.1007/s10502-008-9071-0.

[14] Eryk Salvaggio, “The Ghost Stays in the Picture, Part 1: Archives, Datasets, and Infrastructures,” Flickr Foundation (blog), May 29, 2024, https://www.flickr.org/the-ghost-stays-in-the-picture-part-1-archives-datasets-and-infrastructures/.

Bibliography

Hartman, Saidiya. “Venus in Two Acts.” Small Axe 12, no. 2 (2008): 1–14.

Huvila, Isto. “Participatory Archive: Towards Decentralised Curation, Radical User Orientation, and Broader Contextualisation of Records Management.” Archival Science 8, no. 1 (March 1, 2008): 15–36. https://doi.org/10.1007/s10502-008-9071-0.

Krishnan, Prakash. “Digital Disidentifications: A Case Study of South Asian Instagram Community Archives.” In The Politics and Poetics of Indian Digital Diasporas: From Desi to Brown. Routledge, 2024. https://www.routledge.com/The-Politics-and-Poetics-of-Indian-Digital-Diasporas-From-Desi-to-Brown/Jiwani-Tremblay-Bhatia/p/book/9781032593531.

Krishnan, Prakash, and Stefanie Duguay. “From ‘Interested’ to Showing Up: Investigating Digital Media’s Role in Montréal-Based LGBTQ Social Organizing.” Canadian Journal of Communication 45, no. 4 (December 8, 2020): 525–44. https://doi.org/10.22230/cjc.2020v44n4a3694.

Oliffe, John L., Nina Gao, Mary T. Kelly, Calvin C. Fernandez, Hooman Salavati, Matthew Sha, Zac E. Seidler, and Simon M. Rice. “The Case for and Against Doing Virtual Photovoice.” International Journal of Qualitative Methods 22 (March 1, 2023): 16094069231190564. https://doi.org/10.1177/16094069231190564.

Salvaggio, Eryk. “The Ghost Stays in the Picture, Part 1: Archives, Datasets, and Infrastructures.” Flickr Foundation (blog), May 29, 2024. https://www.flickr.org/the-ghost-stays-in-the-picture-part-1-archives-datasets-and-infrastructures/.

Wang, Caroline, and Mary Ann Burris. “Empowerment through Photo Novella: Portraits of Participation.” Health Education Quarterly 21, no. 2 (1994): 171–86.

———. “Photovoice: Concept, Methodology, and Use for Participatory Needs Assessment.” Health Education & Behavior 24, no. 3 (1997): 369–87.

Wu, Kunyi. Visual Voices, 100 Photographs of Village China by the Women of Yunnan Province, 1995.

The Ghost Stays in the Picture, Part 3: The Power of the Image

Eryk Salvaggio is a 2024 Flickr Foundation Research Fellow, diving into the relationships between images, their archives, and datasets through a creative research lens. This three-part series focuses on the ways archives such as Flickr can shape the outputs of generative AI in ways akin to a haunting. You can read part one and two.

“Definitions belong to the definers, not the defined.”
― Toni Morrison, Beloved

Generative Artificial Intelligence is sometimes described as a remix engine. It is one of the more easily graspable metaphors for understanding these images, but it’s also wrong. 

As a digital collage artist working before the rise of artificial intelligence, I was always remixing images. I would do a manual search of the public domain works available through the Internet Archive or Flickr Commons. I would download images into folders named for specific characteristics of various images. An orange would be added to the folder for fruits, but also round, and the color orange; cats could be found in both cats and animals.

I was organizing images solely on visual appearance. I was anticipating their retrieval whenever certain needs might emerge. If I needed something round to balance a particular composition, I could find it in the round folder, surrounded by other round things: fruits and stones and images of the sun, the globes of planets and human eyes.

Once in the folder, the images were shapes, and I could draw from them regardless of what they depicted. It didn’t matter where they came from. They were redefined according to their anticipated use. 
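The folder scheme described above amounts to a many-to-many index, in which one image is filed under several visual labels at once. A minimal sketch in Python, assuming hypothetical file names and labels chosen only to echo the examples in the text:

```python
from collections import defaultdict


def build_folder_index(labels_by_image):
    """Invert {image: [labels]} into {label: [images]}.

    This mirrors filing a single image into several folders at once.
    """
    folders = defaultdict(list)
    for image, labels in labels_by_image.items():
        for label in labels:
            folders[label].append(image)
    return dict(folders)


# Hypothetical file names and labels, for illustration only.
labels_by_image = {
    "orange.jpg": ["fruits", "round", "orange"],
    "cat.jpg": ["cats", "animals"],
    "sun.jpg": ["round"],
}

folders = build_folder_index(labels_by_image)

# Everything filed under "round", regardless of what it depicts.
print(folders["round"])
```

Looking up a label returns shapes rather than subjects: the index records what an image can be used for, not where it came from.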

A Churning

This was remixing, but I look back on this practice with fresh eyes when I consider the metaphor as it is applied to diffusion models. My transformation of source material was not merely based on their shapes, but their meaning. New juxtapositions emerged, recontextualizing those images. They retained their original form, but engaged in new dialogues through virtual assemblages. 

As I explore AI images and the datasets that help produce them, I find myself moving away from the concept of the remix. The remix is a form of picking up a melody and evolving it, and it relies on human expression. It is a relationship, a gesture made in response to another gesture.

To believe we could “automate” remixing assumes too much of the systems that do this work. Remixes require an engagement with the source material. Generative AI systems do not have any relationship with the meanings embedded into the materials they reconfigure. In the absence of engagement, what machines do is better described as a churn, combining two senses of the word. Generative AI models churn images in that they dissolve the surface of these images. Then they churn out new images, that is, “to produce mechanically and in great volume.”

Of course, people can diffuse the surface meaning of images too. As a collagist, I could ignore the context of any image I liked. We can look at the stereogram below and see nothing but the moon. We don’t have to think about the tools used to make that image, or how it was circulated, or who profited from its production. But as a collagist, I could choose to engage with questions that were hidden by the surfaces of things. I could refrain from engagements with images, and their ghosts, that I did not want to disturb. 

Actions taken by a person can model actions taken by a machine. But the ability to automate a person’s actions does not suggest the right or the wisdom to automate those actions. I wonder if, in the case of diffusion models, we shouldn’t more closely scrutinize the act of prising meaning from an image and casting it aside. This is something humans do when they are granted, or demand, the power to do so. The automation of that power may be legal. But it also calls for thoughtful restraint. 

In this essay, I want to explore the power to inscribe into images. Traditionally, the power to extract images from a place has been granted to those with the means to do so. Over the years, the distribution and circulation of images has been balanced against those who hold little power to resist it. In the automation of image extraction for training generative artificial intelligence, I believe we are embedding this practice into a form of data colonialism. I suggest that power differentials haunt the images that are produced by AI, because they have molded the contents of datasets, and infrastructures, that result in those images.

The Crying Child

Temi Odumosu has written about the “digital reproduction of enslaved and colonized subjects held in cultural heritage collections.” In The Crying Child, Odumosu looks at the role of the digital image as a means of extending the life of a photographic memory. But this process is fraught, and Odumosu dedicates the paper to “revisiting those breaches (in trust) and colonial hauntings that follow photographed Afro-diasporic subjects from moment of capture, through archive, into code” (S290). It does so by focusing on a single image, taken in St. Croix in 1910: 

“This photograph suspends in time a Black body, a series of compositional choices, actions, and a sound. It represents a child standing alone in a nondescript setting, barefoot with overpronation, in a dusty linen top too short to be a dress, and crying. Clearly in visible distress, with a running nose and copious tears rolling down its face, the child’s crinkled forehead gives a sense of concentrated energy exerted by all the emotion … Emotions that object to the circumstances of iconographic production.”

The image emerges from the Royal Danish Library. It was taken by Axel Ovesen, a military officer who operated a commercial photography business. The photograph was circulated as a postcard, and appears in a number of personal and commercial photo albums Odumosu found in the archive.

The unnamed crying child appeared to the Danish colonizers of the island as an amusement, and is labeled only as “the grumpy one” (in the sense of “uncooperative”). The contexts in which this image appeared and circulated were all oriented toward soothing and distancing the colonizers from the colonized. By reframing it as a humorous novelty, the power to apply and remove meaning is exercised on behalf of those who purchase the postcard and mail it to others for a laugh. What is literally depicted in these postcards is, Odumosu writes, “the means of production, rights of access, and dissemination” (S295).

I am describing this essay at length because the practice of categorizing this image in an archive is so neatly aligned with the collection and categorization of training data for algorithmic images. Too often, the images used for training are treated solely as data, and training defended as an act that leaves no traces. This is true. The digital copy remains intact.

But the image is degraded, literally, step by step until nothing remains but digital noise. The image is churned, the surface broken apart, and its traces stored as math tucked away in some vector space. It all seems very tidy, technical, and precise, if you treat the image as data. But to say so requires us to agree that the structures and patterns of the crying child in the archive — the shape of the child’s body, the details of the wrinkled skin around the child’s mouth — are somehow distinct from the meaning of the image. 

Because by diffusing these images into an AI model, and pairing existing text labels to it within the model, we extend the reach of Danish colonial power over the image. For centuries, archives have organized collections into assemblages shaped and informed by a vision of those with power over those whose power is held back. The colonizing eye sets the crying child into the category of amusements, where it lingers until unearthed and questioned.

If these images are diffused into new images — untraceable images, images that claim to be without context or lineage — how do we uncover the way that this power is wielded and infused into the datasets, the models, and the images ultimately produced by the assemblage? What obligations linger beneath the surfaces of things? 

Every Archive a Collage

Collage can be one path for people to access these images and evaluate their historical context. The human collage maker, the remixer, can assess and determine the appropriateness of the image for whatever use they have in mind. This can be an exercise of power, too, and it ought to be handled consciously. It has featured as a tool of Situationist detournement, a means of taking images from advertising and propaganda to reveal their contradictions and agendas. These are direct confrontations, artistic gestures that undermine the organization of the world that images impose on our sense of things. The collage can be used to exert power or challenge the status quo. 

Every archive is a collage, a way of asserting that there is a place for things within an emergent or imposed structure. The scholar and artist Beth Coleman’s work points to the reversal of this relationship, citing W.E.B. Du Bois’ exhibition at the 1900 Paris Exposition. M. Murphy writes,

“Du Bois’s use of [photographic] evidence disrupted racial kinds rather than ordered them … Du Bois’s exhibition was crucially not an exhibit of ‘facts’ and ‘data’ that made black people in Georgia knowable to study, but rather a portrait in variation and difference so antagonistic to racist sociology as to dislodge race as a coherent object of study” (71).

The imposed structures of algorithmically generated images rely on facts and data, defined a certain way. They struggle with context and difference. The images these tools produce are constrained to the central tendencies of the data they were trained on, which makes this an inherently conformist technology.

To challenge these central tendencies means to engage with the structures it imposes on this data, and to critique this churn of images into data to begin with. Matthew Fuller and Eyal Weizman describe “hyper-aesthetic” images as not merely “part of a symbolic regime of representation, but actual traces and residues of material relations and of mediatic structures assembled to elicit them” (80). 

Consider the stereoscope. Once the most popular means of accessing photographs, the stereoscope relied on a trick of the eye, akin to the use of 3D glasses. It combined two views of the same scene, one taken slightly to the left of the other. When they are viewed through a special viewing device, the human eye superimposes them, and the overlap creates the illusion of physical depth in a flat plane. We can find some examples of these on Flickr (including the Danish Film Museum) or at The Library of Congress’ Stereograph collection.

The time period in which this technology was popular happened to overlap with an era of brutal colonization, and the archival artifacts of this era contain traces of how images projected power. 

I was struck by stereoscopic images of American imperialism in the Philippines during the US occupation, starting in 1899. They aimed to “bring to life” images of Filipino men dying in fields and other images of war, using the spectacle of the stereoscopic image as a mechanism for propaganda. These were circulated as novelties to Americans on the mainland, a way of asserting a gaze of dominance over those they occupied.

In the long American tradition of infotainment, the stereogram fused a novel technological spectacle with the effort to assert military might, paired with captions describing the US cause as just and noble while severely diminishing the numbers of civilian casualties. In Body Parts of Empire: Visual Abjection, Filipino Images, and the American Archive, Nerissa Balce writes that

“The popularity of war photographs, stereoscope viewers, and illustrated journals can be read as the public’s support for American expansion. It can also be read as the fascination for what were then new imperial ‘technologies of vision’” (52).

The link between stereograms as a style of image and the gaze of colonizing power is now deeply entrenched in the vector spaces of image synthesis systems. Prompt Midjourney for the style of a stereogram, and this history haunts the images it returns. Many prompted images for “Stereograms, 1900” do not even render the expected, highly formulaic structure of a stereogram (two of the same images, side by side, at a slight angle). They do, however, conjure images of those occupied lands. We see a visual echo of the colonizing gaze.

Images produced for the more generally used “stereoview,” even without the use of a date, still gravitate to a similar visual language. With “stereoview,” we are given the technical specifics of the medium. The content is more abstract: people are missing, but strongly suggested. These perhaps get me closest to the idea of a “haunted” image: a scene which suggests a history that I cannot directly access.

Perhaps there are two kinds of absences embedded in these systems. The people that colonizers want to erase, and then the evidence of the colonizers themselves. Crucially, this gaze haunts these images. 

Here are four sets of two pairs.

These styles are embedded into the prompt for the technology of image capture, the stereogram. The source material is inscribed with the gaze that controlled this apparatus. The method of that inscription — the stereogram — inscribes this material into the present images.  The history is loaded into the keyword and its neighboring associations in the vector space. History becomes part of the churn. These are new old images, built from the associations of a single word (stereoview) into its messy surroundings.

It’s important to remember that the images above are not documents of historical places or events. They’re “hallucinations,” that is, they are a sample of images from a spectrum of possible images that exists at the intersection of every image labeled “stereoview.” But “stereoview” as a category does not isolate the technology from how it was used. The technology of the stereogram, or the stereoviewer, was deeply integrated into regimes of war, racial hierarchies, and power. The gaze, and the subject, are both aggregated, diffused, and made to emerge through the churning of the model.

Technologies of Flattening

The stereoview and the diffusion model are both technologies of spectacle, and each affords a similar power to those who control it. They are technologies for flattening, containing, and re-contextualizing the world into a specific order. For us as viewers, the generated image is never merely the surfaces of photography churned into new, abstract forms that resemble our prompts. It is an activation of the model’s symbolic regime, which is derived from the corpus of images because it has the power to isolate images from their meaning.

AI has the power of finance, which enables computational resources that make obtaining 5 billion images for a dataset possible, regardless of its impact on local environments. It has the resources to train on these images; the resources to recruit underpaid labor to annotate and sort these images. The critiques of AI infrastructure are numerous.

I am most interested here in one form of power that is the most invisible, which is the power of naturalizing and imposing an order of meaning through diffused imagery. The machine controls the way language becomes images. At the same time, it renders historical documentation meaningless — we can generate all kinds of historical footage now.

These images are reminders of the ways data colonialism has become embedded within not merely image generation but the infrastructures of machine learning. The scholar Tiara Roxanne has been investigating the haunting of AI systems long before me. In 2022 Roxanne noted that,

“in data colonialism, forms of technological hauntings are experienced when Indigenous peoples are marked as ‘other,’ and remain unseen and unacknowledged. In this way, Indigenous peoples, as circumscribed through the fundamental settler-colonial structures built within machine learning systems, are haunted and confronted by this external technological force. Here, technology performs as a colonial ghost, one that continues to harm and violate Indigenous perspectives, voices, and overall identities” (49).

AI can ignore “the traces and residues of material relations” (Fuller and Weizman) as it reduces the image to its surfaces instead of the constellations of power that structured the original material. These images are the product of imbalances of power in the archive, and whatever interests those archives protected are now protected by an impenetrable, uncontestable, automated set of decisions steered by the past.

The Abstracted Colonial Subject

What we see in the above images is an inscription by association. The generated image, as a type of machine learning system, matters not only because of how it structures history into the present. It matters because it is a visualization that reaches to something far greater about automated decision making and the power it exerts over others.

These striations of power in the archive or museum, in the census or the polling data, in the medical records or the migration records, determine what we see and what we do not. What we see in generated images must contort itself around what has been excluded from the archives. What is visible is shaped by the invisible. In the real world, this can manifest as families living on a street serving as an indication of those who could not live on that street. It could be that loans granted by an algorithmic assessment always contain an echo of loans that were not approved. 

The synthetic image visualizes these traces. They churn the surfaces, not the tangled reality beneath them. The images that emerge are glossy, professional, saturated. Hiding behind these products by and for the attention economy is the world of the not-seen. What are our obligations as viewers to the surfaces we churn when we prompt an image model? How do we reconcile our knowledge of context and history with the algorithmic detachment of these automated remixes?

The media scholar Roland Meyer writes that,

“[s]omewhere in the training data that feeds these models are photographs of real people, real places, and real events that have somehow, if only statistically, found their way into the image we are looking at. Historical reality is fundamentally absent from these images, but it haunts them nonetheless.”

In a seance, you raise spirits you have no right to speak to. The folly of it is the subject of countless warnings in stories, songs and folklore. 

What if we took the prompt just as seriously? What if typing words to trigger an image was treated as a means of summoning a hidden and unsettled history? Because that is what the prompt does. It agitates the archives. Sometimes, by accident, it surfaces something many would not care to see. Boldly, knowing that I am acting from a place of privilege and power, I ask the system to return “the abstracted colonial subject of photography.” I know I am conjuring something I should not be.

My words are transmitted into the model within a data center, where they flow through a set of vectors, the in-between state of thousands of photographs. My words are broken apart into key words — “abstracted, colonial, colonial subject, subject, photography.” These are further sliced into numerical tokens to represent the mathematical coordinates of these ideas within the model. From there, these coordinates offer points of cohesion which are applied to find an image within a jpg of digital static. The machine removes the noise toward an image that exists in the overlapping space of these vectors.
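As a rough illustration of the sequence just described (words become numbers, numbers become coordinates, and noise is removed step by step toward those coordinates), here is a toy sketch in Python. It is not how Midjourney or any real diffusion model is implemented: real systems rely on learned neural networks, whereas the hashed word vectors, the `embed`, `prompt_vector`, and `denoise` functions, and the eight-value “image” below are all assumptions made purely for illustration.

```python
import hashlib
import random


def embed(word, dims=8):
    """Map a word to fixed pseudo-coordinates (a stand-in for a learned embedding)."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:dims]]


def prompt_vector(prompt, dims=8):
    """Average the word vectors: a crude stand-in for text conditioning."""
    vectors = [embed(word, dims) for word in prompt.lower().split()]
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]


def denoise(target, steps=50, rate=0.1, seed=0):
    """Start from random static and nudge it toward the target, one step at a time."""
    rng = random.Random(seed)
    image = [rng.random() for _ in target]  # the "digital static"
    for _ in range(steps):
        image = [x + rate * (t - x) for x, t in zip(image, target)]
    return image


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


target = prompt_vector("abstracted colonial subject of photography")
static = denoise(target, steps=0)   # pure noise, no steps taken
result = denoise(target, steps=50)  # noise pulled toward the prompt's coordinates

print(distance(static, target), distance(result, target))
```

The second distance is far smaller than the first: the static has been drawn toward a point defined entirely by the words, with no reference to any single source image.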

Avery Gordon, whose book Ghostly Matters is a rich source of thinking for this research, writes:

“… if there is one thing to be learned from the investigation of ghostly matters, it is that you cannot encounter this kind of disappearance as a grand historical fact, as a mass of data adding up to an event, marking itself in straight empty time, settling the ground for a future cleansed of its spirit” (63).

If history is present in the archives, the images churned from the archive disrupt our access to the flow of history. It prevents us from relating to the image with empathy, because there is no single human behind the image or within it. It’s the abstracted colonial gaze of power applied as a styling tool. It’s a mass of data claiming to be history.

Human and Mechanical Readings

I hope you will indulge me as my eye wanders through the resulting image.

I am struck by the glossiness of it. Midjourney is fine-tuned toward an aesthetic dataset, leaning into images found visually appealing based on human feedback. I note the presence of palm trees, which brings me to the Caribbean island of St. Croix, where The Crying Child photograph was taken. I see the presence of barbed wire, a signifier of a colonial presence.

The image is a double exposure. It reminds me of spirit photography, in which so-called psychic photographers would surreptitiously photograph a ghostly puppet before photographing a client. The image of the “ghost” was superimposed on the film to emerge in the resulting photo. These are associations that come to my mind as I glance at this image. I also wonder about what I don’t know how to read: the style of the dress, the patterns it contains, the haircut, the particulars of vegetation.

We can also look at the image as a machine does. Midjourney’s describe feature will tell us what words might create an image we show it. If I use it with the images it produces, it offers a kind of mirror-world insight into the relationship between the words I’ve used to summon that image and the categories of images from which it was drawn.

To be clear, both “readings” offer a loose, intuitive methodology, keeping in the spirit of the seance — a Ouija board of pixel values and text descriptors. They are a way in to the subject matter, offering paths for more rigorous documentation: multiple images for the same prompt, evaluated together to identify patterns and the prevalence of those patterns. That reveals something about the vector space. 
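The more rigorous route mentioned above, in which many outputs for the same prompt are evaluated together, could begin with something as simple as a tally. The sketch below, in Python, counts how often each comma-separated descriptor recurs across a set of machine descriptions. The three descriptions are placeholders written for illustration, not recorded outputs, and `descriptor_prevalence` is a hypothetical helper rather than part of any Midjourney tooling.

```python
from collections import Counter


def descriptor_prevalence(descriptions):
    """Return the share of descriptions in which each comma-separated descriptor appears."""
    counts = Counter()
    for text in descriptions:
        # Use a set so a descriptor repeated within one description counts once.
        terms = {term.strip().lower() for term in text.split(",") if term.strip()}
        counts.update(terms)
    total = len(descriptions)
    return {term: n / total for term, n in counts.most_common()}


# Placeholder descriptions for illustration only; not recorded model outputs.
descriptions = [
    "tall grass, tropical style, palms, double exposure effect",
    "tall grass, afrofuturist, palms",
    "tropical style, palms, blurred and dreamy illustration",
]

prevalence = descriptor_prevalence(descriptions)
print(prevalence["palms"], prevalence["tall grass"])
```

Descriptors that appear in nearly every description point to associations the model makes consistently; those that appear once may be noise.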

Here, I just want to see something, to compare the image as I see it to what the machine “sees.”

The image returned for the abstract colonial subject of photography is described by Midjourney this way: 

“There is a man standing in a field of tall grass, inverted colors, tropical style, female image in shadow, portrait of bald, azure and red tones, palms, double exposure effect, afrofuturist, camouflage made of love, in style of kar wai wong, red and teal color scheme, symmetrical realistic, yellow infrared, blurred and dreamy illustration.”

My words produced an image, and then those words disappeared from the image that was produced. “Colonized Subject” is adjacent to the words the machine does see: “tall grass,” “afrofuturist,” “tropical.” Other descriptions recur as I prompt the model over and over again to describe this image, such as “Indian.” I have to imagine that this idea of colonized subjects “haunts” these keywords. The idea of the colonial subject is recognized by the system, but shuffled off to nearest synonyms and euphemisms. Might this be a technical infrastructure through which the images are haunted? Could certain patterns of images be linked through unacknowledged, invisible categories the machine can only indirectly acknowledge?

I can only speculate. That’s the trouble with hauntings. It’s the limit to drawing any conclusions from these observations. But I would draw the reader’s attention to an important distinction between my actions as a collage artist and the images made by Midjourney. The image will be interpreted by many of us, who will find different ways to see it, and a human artist may put those meanings into adjacency through conscious decisions. But to create this image, we rely solely on a tool for automated churning.

We often describe the power of images in terms of what impact an image can have on the world. Less often we discuss the power that impacts the image: the power to structure and give the image form, to pose or arrange photographic subjects. 

Every person interprets an image in different ways. A machine makes images for every person from a fixed set of coordinates, its variety constrained by the borders of its data. That concentrates power over images into the unknown coordination of a black box system. How might we intervene and challenge that power?  

The Indifferent Archivist 

We have no business conjuring ghosts if we don’t know how to speak to them. As a collage artist, “remixing” in 2016 meant creating new arrangements from old materials, suggesting new interpretations of archival images. I was able to step aside: as a white man in California, I would never use the images of colonized people for something as benign as “expressing myself.” I would know that I could not speak to that history. Best to leave that power to shift meanings and shape new narratives to those who could speak to it. Nonetheless, it is a power that can be wielded by those who have no rights to it.

Yet, by moving any accessible image from the online archive and transmuting it into training data, diffusion models assert this same power. But they are incapable of historic acknowledgement or obligation. The narratives of the source materials are blocked from view, in service to a technically embedded narrative that images are merely their surfaces and that surfaces are malleable. At its heart is the idea that the context of these images can be stripped and reduced into a molding clay, for anyone’s hands to shape to their own liking.

What matters is the power to determine the relationships our images have with the systems that include or exclude. It’s about the power to choose what becomes documented, and on what terms. Through directed attention, we may be able to work through the meanings of these gaps and traces. It is a useful antidote to the inattention of automated generalizations. To greet the ghosts in these archives presents an opportunity to intervene on behalf of complexity, nuance, and care.

That is the literal meaning of curation, at its Latin root: “curare,” to care. In this light, there is no such thing as automated curation.

Reclaiming Traceability

In 2021, Magda Tyzlik-Carver wrote “the practice of curating data is also an epistemological practice that needs interventions to consider futures, but also account for the past. This can be done by asking where data comes from. The task in curating data is to reclaim their traceability and to account for their lineage.”

When I started the “Ghost Stays in the Picture” research project, I intended to make linkages between the images produced by these systems and the categories within their training data. It would be a means of surfacing the power embedded into the source of this algorithmic churning within the vector space. I had hoped to highlight and respond to these algorithmic imaginaries by revealing the technical apparatus beneath the surface of generated images. 

In 2024, no mainstream image generation tool offers the access necessary for us to gather any insights into its curatorial patterns. The image dataset I initially worked with for this project is gone. Images of power and domination were the reason: specifically, the Stanford Internet Observatory’s discovery of more than 3,000 images in the LAION 5B dataset depicting abused children. Realizing this, the churn of images became visceral, in the pit of my stomach. The traces of those images, the pain of any person in the dataset, linger in the models. Perhaps imperceptibly, they shape the structures and patterns of the images I see.

In gathering these images, there was no right to refuse, no intervention of care. Ghosts, Odumosu writes, “make their presences felt, precisely in those moments when the organizing structure has ruptured a caretaking contract; when the crime has not been sufficiently named or borne witness to; when someone is not paying attention” (S299). 

The training of Generative Artificial Intelligence systems has relied upon the power to automate indifference. And if synthetic images are structured in this way, it is merely a visualization of how “artificial intelligence systems” structure the material world when carelessly deployed in other contexts. The synthetic image offers us a glimpse of what that world would look like, if only we would look critically at the structures that inform its spectacle. If we can read algorithmic decision-making as a lapse in care, a disintegration of accountability, we might see fresh pavement has been poured onto sacred land.

This regime of Artificial Intelligence is not an inevitability. It is not even a single ideology. It is a computer system, and computer systems, and norms of interaction and participation with those systems, are malleable. Even with training datasets locked away behind corporate walls, it might still be possible “to insist on care where there has historically been none” (Odumosu S297), and by extension, to identify and refuse the automated inscription of the colonizing ghost.

 

This post concludes my research work at the Flickr Foundation, but I am eager to continue it. I am seeking publishers of art books, or curators for art or photographic exhibitions, who may be interested in a longer set of essays or a curatorial project that explores this methodology for reading AI generated images. If you’re interested, please reach out to me directly: eryk.salvaggio@gmail.com.

The Ghost Stays in the Picture, Part 2: Data Casts Shadows

Eryk Salvaggio is a 2024 Flickr Foundation Research Fellow, diving into the relationships between images, their archives, and datasets through a creative research lens. This three-part series focuses on the ways archives such as Flickr can shape the outputs of generative AI in ways akin to a haunting. Read part one, or continue to part three.

“Today the photograph has transformed again.” – David A. Shamma, in a blog post announcing the YFCC100M dataset.

In part one of this series, I wrote about the differences between archives, datasets, and infrastructures. We explored the movement of images into archives through the simple act of sharing a photograph in an online showcase. We looked at the transmutation of archives into datasets: the ways those archives, composed of individual images, become a category unto themselves and are analyzed as an object of much larger scale. Once an archive becomes a dataset, seeing its contents as individual pieces, each with its own story and value, requires a special commitment to archival practices.

Flickr is an archive — a living and historical record of images taken by people living in the 21st century, a repository for visual culture and cultural heritage. It is also a dataset: the vast sum of this data, framed as an overwhelming challenge for organizing, sorting, and contextualizing what it contains. That data becomes AI infrastructure, as datasets made to aid the understanding of the archive become used in unexpected and unanticipated ways.  

In this post, I shift my analysis from image to archive to dataset, and trace the path of images as they become AI infrastructure — particularly in the field of data-driven machine learning and computer vision. I’ll again turn to the Flickr archive and datasets derived from it.

99.2 Million Rows

A key case study is a collection of millions of images shared in June 2014. That’s when Yahoo! Labs released the YFCC100M dataset, which contained 99.2 million rows of metadata describing photos by 578,268 Flickr members, all uploaded to Flickr between 2004 and 2014 and tagged with a CC license. The dataset contained information such as photo IDs, URLs, and a handful of metadata fields such as the title, tags, and description. I believe that the YFCC100M release was emblematic of a shift in the public’s (and Silicon Valley’s) perception of the visual archive into the category of “image datasets.”

Certainly, it wasn’t the first image dataset. Digital images had been collected into digital databases for decades, usually for the task of training image recognition systems, whether for handwriting, faces, or object detection. Many of these assembled similar images, such as Stanford’s dogs dataset or NVIDIA’s collection of faces. Nor was it the first transition that a curated archive made into the language of “datasets.” For example, the Tate Modern introduced a dataset of 70,000 digitized artworks in 2013.  

What made YFCC100M interesting was that it was so big, but also diverse. That is, it wasn’t a pre-assembled dataset of specific categories, it was an assortment of styles, subject matter, and formats. Flickr was not a cultural heritage institution but a social media network with a user base that had uploaded far more images than the world’s largest libraries, archives, or museums. In terms of pure photography, no institution could compete on scale and community engagement. 

The YFCC100M includes the description, tags, geotags, camera types, and links to 100 million source images. As a result, we see YFCC100M appear over and over again in papers about image recognition, and then image synthesis. It has been used to train, test, or calibrate countless machine vision projects, including high-rated image labeling systems at Google and OpenAI’s CLIP, which was essential to building DALL-E. Its influence in these systems rivals that of ImageNet, a dataset of 14 million images which was used as a benchmark for image recognition systems, though Nicolas Malevé notes that nearly half of ImageNet’s photos came from Flickr URLs. (ImageNet has been explored in-depth by Kate Crawford and Trevor Paglen.)

10,000 Images of San Francisco

It is always interesting to go in and look at the contents of a dataset, and I’m often surprised how rarely people do this. Whenever we dive into the actual content of datasets we discover interesting things. The YFCC100M dataset contains references to 200,000 images by photographer Andy Nystrom alone, a prolific street photographer who has posted nearly 8 million images to Flickr since creating their account in 2008. 

The dataset contains more than 10,000 images each of London, Paris, Tokyo, New York, San Francisco, and Hong Kong, which outnumber those of other cities. Note the gaps here: all are cities of the Northern Hemisphere. When I ask Midjourney for an image of a city, I see traces of these locations in the output.

Are these strange hybrids a result of the prevalence of Flickr in the calibration and testing of these systems? Are they a bias accumulated through the longevity of these datasets and their embeddedness into AI infrastructures? I’m not confident enough to say for sure. But missing from the images produced from the generic prompt “city” are traces of what Midjourney considers an African city. What emerges are not shiny, glistening postcard shots or images that would be plastered on posters by the tourist bureau. Instead, they seem to affirm the worst of the colonizing imagination: unpaved roads, cars broken down in the street. The images for “city” are full of windows reflecting streaks of sunlight; for “African city,” these are windows absent of glass. 

“A prompt about a ‘building in Dakar’ will likely return a deserted field with a dilapidated building while Dakar is a vibrant city with a rich architectural history,” notes the Senegalese curator Linda Dounia. She adds: “For a technology that was developed in our times, it feels like A.I. has missed an opportunity to learn from the fraught legacies that older industries are struggling to untangle themselves from.”

Beyond the training data, these legacies are also entangled in digital infrastructures. We know images from Flickr have come to shape the way computers represent the world, and how we define tests of AI-generated output as “realistic.” These definitions emerge from data, but also from infrastructures of AI. Here, one might ask if the process of calibrating images to places has been centered on the geographic regions where Flickr has access to ample images: 10,000 images each from cities of the Northern Hemisphere. These created categories for future assessment and comparison.

What we see in those images of an “African city” is what we don’t see in the dataset. What we see is what is missing from that infrastructure: 10,000 pictures of Lagos or Nairobi. When these images are absent from the training data, they influence the result. When they are absent from the classifiers and calibration tools, that absence is entrenched.

The sociologist Avery Gordon writes of ghosts, too. For Gordon, the ghost, or the haunting, is “the intermingling of fact, fiction and desire as it shapes the personal and social memory … what does the ghost say as it speaks, barely, in the interstices of the visible and invisible?” In these images, the ghost is the image not taken, the history not preserved, the gaps that haunt the archives. It’s clear these absences move into the data, too, and that the images of artificial intelligence are haunted by them, conjuring up images that reveal these gaps, if we can attune ourselves to see them.

There is a limit to this kind of visual infrastructural analysis of image generation tools — its reliance on intuition. There is always a distance between these representations of reality in the generated image and the reality represented in the datasets. Hence our language of the seance. It is a way of poking through the uncanny, to see if we can find its source, however remote the possibility may be.  

Representativeness

We do know a few things, in fact. We know this dataset was tested for representativeness, which was defined as how evenly it aligned with Flickr’s overall content, not the world at large. We know, then, that the dataset was meant to represent the broader content of Flickr as a whole, and that the biases of the dataset (such as the strong presence of these particular cities) are therefore the biases of Flickr. In 2024, an era where images have been scraped from the web wholesale for training data without warning or permission, we can ask if the YFCC100M dataset reflected the biases we see in tools like DALL-E and Midjourney. We can also ask if the dataset, in becoming a tool for measuring and calibrating these systems, may have shaped those biases as a piece of data infrastructure.

As biased data becomes a piece of automated infrastructure, we see biases come into play from factors beyond just the weights of the training data. They also come into play in the ways the system maps words to images, sorts out and rejects useful images, and more. One of the ways YFCC100M’s influence may shape these outcomes is through its role in training the OpenAI tool I mentioned earlier, called CLIP.

CLIP looks at patterns of pixels in an image and compares them to labels for similar sets of pixels. It’s a bridge that connects the descriptions of images to the words of a user’s prompt. CLIP is a core connection point between words and images within generative AI. Recognizing whether an image resembles a set of words is how researchers decided what images to include in training datasets such as LAION 5B.
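To make that matching step concrete, here is a minimal sketch of the general idea: score how closely an image and its caption agree, and keep only the pairs that score above a cutoff. Everything in it is an assumption for illustration. The three-number vectors stand in for the much larger embeddings a model like CLIP would produce, the function names are hypothetical, and the 0.3 threshold is an arbitrary choice, not a value used by any real dataset.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 means 'pointing the same way'."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def keep_pairs(pairs, threshold):
    """Keep the captions whose image and text vectors agree at least `threshold`."""
    return [
        caption
        for caption, image_vec, text_vec in pairs
        if cosine_similarity(image_vec, text_vec) >= threshold
    ]

# Toy, hand-written vectors (hypothetical): real embeddings come from a trained model.
PAIRS = [
    ("street at dusk", [0.9, 0.1, 0.2], [0.8, 0.2, 0.1]),  # caption fits the image
    ("mislabeled photo", [0.9, 0.1, 0.2], [0.0, 1.0, 0.0]),  # caption does not fit
]

print(keep_pairs(PAIRS, threshold=0.3))
```

Note what the filter never consults: who took the photograph, why, or under what license. Only the numeric agreement between picture and caption decides what stays.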

Calibration

CLIP’s training and calibration dataset contained a subset of YFCC100M, about 15 million images out of CLIP’s 400 million total. But CLIP was calibrated with, and its results tested against, classifications using YFCC100M’s full set. By training and calibrating CLIP against YFCC100M, that dataset played a role in establishing the “ground truth” that shaped CLIP’s ability to link images to text. 

CLIP was assessed on its ability to scale the classifications produced by YFCC100M and MS-COCO, another dataset which consisted entirely of images downloaded from Flickr. The result is that the logic of Flickr users and tagging has become deeply embedded into the fabric of image synthesis. The captions created by Flickr members modeled, and then shaped, the ways images of all kinds would be labeled in the future. In turn, that structured the ways machines determined the accuracy of those labels. If we want to look at the infrastructural influences of these digital “ghosts in the machine,” then the age, ubiquity, and openness of the YFCC100M dataset suggest it has a subtle but important role to play in the way images are produced by diffusion models.

We might ask about “dataset bias,” a form of bias that doesn’t refer to the dataset, or the archive, or the images they contain. Instead, it’s a bias introduced through the simple act of calling something a dataset, rather than acknowledging its constitutive pieces. This shift in focus shifts our relationship to these pieces, asking us to look at the whole. Might the idea of a “dataset” bias us from the outset toward ignoring context, and distract us from our obligation of care to the material it contains?  

From Drips Comes the Deluge

The YFCC100M dataset was paired with a paper, YFCC100M: The New Data in Multimedia Research, which focused on the needs of managing visual archives at scale. YFCC100M was structured as an index of the archive: a tool for generating insight about what the website held. The authors hoped it might be used to create tools for handling an exponential tide of visual information, rather than developing tools that contributed to the onslaught. 

The words “generative AI” never appear in the paper. It would have been difficult, in 2014, to anticipate that such datasets would be seen through a fundamental shift from “index” to “content” for image generation tools. That is a shift driven by the mindset of AI companies that rose to prominence years later.

In looking at the YFCC100M dataset and paper, I was struck by the difference between the problems it was established to address and the eventual, mainstream use of the dataset. Yahoo! released the paper in response to the problems of proprietary datasets, which they claimed were hampering replication across research efforts. The limits on the reuse of datasets also meant that researchers had to gather their own training data, which was a time consuming and expensive process. This is what made the data valuable enough to protect in the first place — an interesting historical counterpoint to today’s paradoxical claim by AI companies that image data is both rare and ubiquitous, essential but worth very little.  

Attribution

Creative Commons licensed pictures were selected for inclusion in order to facilitate the widest possible range of uses, with the authors noting that they were providing “a public dataset with clearly marked licenses that do not overly impose restrictions on how the data is used” (2). Only a third of the images in the dataset were marked as appropriate for commercial use, and 17% required only attribution. But, in accordance with the terms of the Creative Commons licenses used, every image in the dataset required attribution of some kind. When the dataset was shared with the public, it was assumed that researchers would use the dataset to determine how to use the images contained within it, picking images that complied with their own experiments.

The authors of the paper acknowledge that archives are growing beyond our ability to parse them as archivists. But they also acknowledge Flickr as an archive, that is, a site of memory: 

“Beyond archived collections, the photostreams of individuals represent many facets of recorded visual information, from remembering moments and storytelling to social communication and self-identity [19]. This presents a grand challenge of sensemaking and understanding digital archives from non-homogeneous sources. Photographers and curators alike have contributed to the larger collection of Creative Commons images, yet little is known on how such archives will be navigated and retrieved, or how new information can be discovered therein.”

Despite this, there was a curious contradiction in the way Yahoo! Labs structured the release of the dataset. The least restrictive license in the dataset is CC-BY — images where the license requires attribution. Nearly 68 million out of the 100 million images in the dataset specifically stated there could be no commercial use of their images. Yet, the dataset itself was then released without any restrictions at all, described as “publicly and freely usable.”  

The dataset of YFCC100M wasn’t the images themselves. It was the list of images, a sample of the larger archive that was made referenceable as a way to encourage researchers to make sense of the scale of image hosting platforms. The strange disconnect between boldly declaring the contents as CC-licensed, while making them available to researchers to violate those licenses, is perhaps evident only in hindsight.

Publicly Available

It may not have been a deliberate violation of boundaries so much as it was a failure to grapple with the ways boundaries might be transgressed. The paper, then, serves as a unique time capsule for understanding the shift from the logic of datasets as descriptions of things to the understanding of datasets as the collection of things themselves. This is a logic that we can see carried out in the relationships that AI companies have to the data they use. These companies see the datasets as markedly different from the images that the data refers to, suggesting that they have the right to use datasets of images under “fair use” rules that apply to data, but not to intellectual property.

This breaks with the early days of datafication and machine learning, which made clearer distinctions between the description of an archive and the archive itself. When Stability AI used LAION 5B as a set of pointers to consumable content, this relationship between description and content collapsed. What was a list of image URLs and the text describing what would be found there became pointers to training data. The context was never considered. 

That collapse is the result of a fairly recent set of beliefs about the world which increasingly sees the “image” as an assemblage of color information paired with technical metadata. We hear echoes of this in the defense of AI companies, that their training data is “publicly available,” a term with no actual, specific meaning. OpenAI says that CLIP was trained on “text–image pairs that are already publicly available” in its white paper.

In releasing the dataset, Yahoo’s researchers may have contributed to a shift: from understanding online platforms through the lens of archives, into understanding them as data sources to be plundered. Luckily, it’s not too late to reassert this distinction. Visual culture, memory, and history can be preserved through a return to the original mission of data science and machine learning in the digital humanities. We need to make sense of a growing number of images, which means preserving and encouraging new contexts and relationships between images rather than replacing them with context-free abstractions produced by diffusion models. 

Generative AI is a product of datasets and machine learning and digital humanities research. But in the past ten years, data about images and the images themselves have become increasingly interchangeable. Datasets were built to preserve and study metadata about images. But now, the metadata is stripped away, aside from the URL, which is used to analyze an image. The image is translated into abstracted information, ignoring where these images came from and the meaning – and relationships of power – that are embedded into what they depict. In erasing these sources, we lose insight into what they mean and how they should be understood: whether an image of a city was taken by a tourism board or an aid agency, for example. The biases that result from these absences are made clear.

Correcting these biases requires care and attention. It requires pathways for intervention and critical thinking about where images are sourced. It means prioritizing context over convenience. Without attention to context, correcting the source biases is far more challenging.

Data Casts Shadows

In my fellowship with the Flickr Foundation, I am continuing my practice with AI, looking at the gaps between archives and data, and data and infrastructures, through the lens of an archivist. It is a creative research approach that examines how translations of translations shape the world. I am deliberately relying on the language of intuition — ghosts, hauntings, the ritual of the seance — to encourage a more human-scaled, intuitive relationship to this information. It’s a rebuttal of the idea that history, documentation, images and media can be reduced to objective data. 

That means examining the emerging infrastructure built on top of data, and returning to the archival view to see what was erased and what remains. What are the images in this dataset? What do they show us, and what do they mean? Malevé writes that to become AI infrastructure, a Flickr image is pulled from the context of its original circulation, losing currency. It is relabeled by machines, and even the associations of metadata itself become superfluous to the goal of image alignment. All that matters is what the machine sees and how it compares to similar images. The result is a calibration: the creation of a category. The original image is discarded, but the residue of whatever was learned lingers in the system.

While writing this piece, I became transfixed by shadows within synthetic images. Where does the shadow cast in an AI generated image come from? They don’t come from the sun, because there is no sunlight within the black box of the AI system. Despite the hype, these models do not understand the physics of light, but merely produce traces of light abstracted from other sources.

Unlike photographic evidence, synthetic photographs don’t rely on being present to the world of light bouncing from objects onto film or sensors. The shadows we see in an AI generated image are the shadows cast by other images. The generated image is itself a shadow of shadows, a distortion of a distortion of light. The world depicted in the synthetic image is always limited to the worlds pre-arranged by the eyes of countless photographers. Those arrangements are further extended and mediated as these data shadows stretch across datasets, calibration systems, engineering decisions, design choices and automated processes that ignore or obscure their presence.

Working Backward from the Ghost

When we don’t know the source of decisions made about the system, the result is unexplainable, mysterious, spooky. But image generation platforms are a series of systems stacked on top of one another, trained on hastily assembled stews of image data. The outcomes go through multiple steps of analysis and calibration, outputs of one machine fed into another. Most of these systems are built upon a subset of human decisions scaled to cover inhuman amounts of information. Once automated, these decisions become disembodied, influencing the results.

In part 3 – the conclusion of this series – I’ll examine a means of reading AI generated images through the lens of power, hoping to reveal the intricate entanglement of context, control, and shifting meanings within text and image pairs. Just as shadows move across the AI generated image, so too, I propose, does the gaze of power contained within the archives.

I’ll attempt to trace the flow of power and meaning through datasets and data infrastructures that produce these prompted images, working backwards from what is produced. Where do these training images come from? What stories and images do they contain, or lack? In some ways, it is impossible to parse, like a ghost whose message from the past is buried in cryptic riddles. A seance is rarely satisfying, and shadows disappear under a flashlight.

But it’s my hope that learning to read and uncover these relationships improves our literacy about so-called AI images, and how we relate to them beyond toys for computer art. Rather, I hope to show that these are systems that perpetuate power, through inclusion and exclusion, and the sorting logic of automated computation. The more we automate a system, the more the system is haunted by unseen decisions. I hope to excavate the context of decisions embedded within the system and examine the ways that power moves through it. Otherwise, the future of AI will be dictated by what can most easily be forgotten.  

Read part three here.

***

I would be remiss not to point to the excellent and abundant work on Flickr as a dataset that has been published by Katrina Sluis and Nicolas Malevé, whose work is cited here but merits a special thank you in shaping the thinking throughout this research project. I am also grateful to scholars such as Timnit Gebru, whose work on dataset auditing has deeply informed this work, and to Dr. Abeba Birhane, whose work on the content of the LAION 5B dataset has inspired this creative research practice. 

In the images accompanying this text, I’ve paired images created in Stable Diffusion 1.6 for the prompt “Flickr.com street shadows.” They’re paired with images from actual Flickr members. I did not train AI on these photos, nor did I reference the originals in my prompts. But by pairing the two, we can see the ways that the original Flickr photos might have formed the hazy structures of those generated by Stable Diffusion. 

Improving millions of files on Wikimedia Commons with Flickypedia Backfillr Bot

Last year, we built Flickypedia, a new tool for copying photos from Flickr to Wikimedia Commons. As part of our planning, we asked for feedback on Flickr2Commons and analysed other tools. We spotted two consistent themes in the community’s responses:

  • Write more structured data for Flickr photos
  • Do a better job of detecting duplicate files

We tried to tackle both of these in Flickypedia, and initially, we were just trying to make our uploader better. Only later did we realize that we could take our work a lot further, and retroactively apply it to improve the metadata of the millions of Flickr photos already on Wikimedia Commons. At that moment, Flickypedia Backfillr Bot was born. Last week, the bot completed its millionth update, and we guesstimate we will be able to operate on another 13 million files.

The main goals of the Backfillr Bot are to improve the structured data for Flickr photos on Wikimedia Commons and to make it easier to find out which photos have been copied across. In this post, I’ll talk about what the bot does, and how it came to be.

Write more structured data for Flickr photos

There are two ways to add metadata to a file on Wikimedia Commons: by writing Wikitext or by creating structured data statements.

When you write Wikitext, you write your metadata in a MediaWiki-specific markup language that gets rendered as HTML. This markup can be written and edited by people, and the rendered HTML is designed to be read by people as well. Here’s a small example, which adds some metadata to a file, linking it back to the original Flickr photo:

== {{int:filedesc}} ==
{{Information
|Description={{en|1=Red-whiskered Bulbul photographed in Karnataka, India.}}
|Source=https://www.flickr.com/photos/shivanayak/12448637/
|Author=[[:en:User:Shivanayak|Shiva shankar]]
|Date=2005-05-04
|Permission=
|other_versions=
}}

and here’s what that Wikitext looks like when rendered as HTML:

A table with four rows: Description (Red-whiskered Bulbul photographed in Karnataka, India), Date (4 May 2005), Source (a Flickr URL) and Author (Shiva shankar)

This syntax is convenient for humans, but it’s fiddly for computers – it can be tricky to extract key information from Wikitext, especially when things get more complicated.
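
To illustrate why this is fiddly, here is a minimal Python sketch that pulls the Source field out of the example above with a regular expression. The function name `extract_source` is hypothetical, and the pattern is an assumption that only covers a simple Information template with one field per line; nested templates, multi-line values or differently named fields would defeat it.

```python
import re

WIKITEXT = """== {{int:filedesc}} ==
{{Information
|Description={{en|1=Red-whiskered Bulbul photographed in Karnataka, India.}}
|Source=https://www.flickr.com/photos/shivanayak/12448637/
|Author=[[:en:User:Shivanayak|Shiva shankar]]
|Date=2005-05-04
|Permission=
|other_versions=
}}"""

# Assumption: the field sits on its own line in the form "|Source=<value>".
SOURCE_RE = re.compile(
    r"^\|[ \t]*Source[ \t]*=[ \t]*(.*)$", re.MULTILINE | re.IGNORECASE
)


def extract_source(wikitext):
    """Return the value of the Source field, or None if it is missing or empty."""
    match = SOURCE_RE.search(wikitext)
    if match is None:
        return None
    value = match.group(1).strip()
    return value or None


print(extract_source(WIKITEXT))
```

Even this toy version has to decide what to do about case, whitespace and empty fields, which hints at how quickly real-world Wikitext parsing gets complicated.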

In 2017, Wikimedia Commons added support for structured data. This allows editors to add metadata in a machine-readable format. This makes it much easier to edit metadata programmatically, and there’s a strong desire from the community for new tools to write high-quality structured metadata that other tools can use.

When you add structured data to a file, you create “statements” which are attached to properties. The list of properties is chosen by the volunteers in the Wikimedia community.

For example, there’s a property called “source of file” which is used to indicate where a file came from. The file in our example has a single statement for this property, which says the file is available on the Internet, and points to the original Flickr URL:

Structured data is exposed via an API, and you can retrieve this information in nice machine-readable XML or JSON:

$ curl 'https://commons.wikimedia.org/w/api.php?action=wbgetentities&sites=commonswiki&titles=File%3ARed-whiskered%20Bulbul-web.jpg&format=xml'
<?xml version="1.0"?>
<api success="1">
  …
  <P7482>
    …
    <P973>
      <_v snaktype="value" property="P973">
        <datavalue
          value="https://www.flickr.com/photos/shivanayak/12448637/"
          type="string"/>
      </_v>
    </P973>
    …
  </P7482>
</api>

(Here “P7482” means “source of file” and “P973” is “described at URL”.)
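
The same lookup can be done in code. Below is a minimal Python sketch that walks a parsed JSON response and collects the “described at URL” values attached to “source of file”. It assumes the JSON mirrors the XML above, with statements stored under a `statements` key and the P973 value stored as a qualifier; the function name, the sample dictionary and the entity ID `M123` are made up for illustration.

```python
def described_at_urls(response):
    """Collect every P973 ('described at URL') qualifier on P7482 ('source of file')."""
    urls = []
    for entity in response.get("entities", {}).values():
        # Assumption: a file without structured data may carry an empty value
        # here, so fall back to an empty dict.
        statements = entity.get("statements") or {}
        for statement in statements.get("P7482", []):
            qualifiers = statement.get("qualifiers") or {}
            for snak in qualifiers.get("P973", []):
                if snak.get("snaktype") == "value":
                    urls.append(snak["datavalue"]["value"])
    return urls


# Hand-written, heavily trimmed stand-in for an API response (not real output).
SAMPLE = {
    "entities": {
        "M123": {
            "statements": {
                "P7482": [
                    {
                        "qualifiers": {
                            "P973": [
                                {
                                    "snaktype": "value",
                                    "property": "P973",
                                    "datavalue": {
                                        "value": "https://www.flickr.com/photos/shivanayak/12448637/",
                                        "type": "string",
                                    },
                                }
                            ]
                        }
                    }
                ]
            }
        }
    }
}

print(described_at_urls(SAMPLE))
```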

Part of being a good structured data citizen is following the community’s established patterns for writing structured data. Ideally every tool would create statements in the same way, so the data is consistent across files – this makes it easier to work with later.

We spent a long time discussing how Flickypedia should use structured data, and we got a lot of helpful community feedback. We’ve documented our current data model as part of our Wikimedia project page.

Do a better job of detecting duplicate files

If a photo has already been copied from Flickr onto Wikimedia Commons, nobody wants to copy it a second time.

This sounds simple – just check whether the photo is already on Commons, and don’t offer to copy it if it’s already there. In practice, it’s quite tricky to tell if a given Flickr photo is on Commons. There are two big challenges:

  1. Files on Wikimedia Commons aren’t consistent in where they record the URL of the original Flickr photo. Newer files put the URL in structured data; older files only put the URL in Wikitext or the revision descriptions. You have to look in multiple places.
  2. Files on Wikimedia Commons aren’t consistent about which form of the Flickr URL they use – with and without a trailing slash, with the user NSID or their path alias, or the myriad other URL patterns that have been used in Flickr’s twenty-year history.

Here’s a sample of just some of the different URLs we saw in Wikimedia Commons:

https://www.flickr.com/photos/joyoflife//44627174
https://farm5.staticflickr.com/4586/37767087695_bb4ecff5f4_o.jpg
www.flickr.com/photo_edit.gne?id=3435827496
https://www.flickr.com/photo.gne?short=2ouuqFT

There’s no easy way to query Wikimedia Commons and see if a Flickr photo is already there. You can’t, for example, do a search for the current Flickr URL and be sure you’ll find a match – it wouldn’t find any of the examples above. You can combine various approaches that will improve your chances of finding an existing duplicate, if there is one, but it’s a lot of work and you get varying results.

For the first version of Flickypedia, we took a different approach. We downloaded snapshots of the structured data for every file on Wikimedia Commons, and we built a database of all the links between files on Wikimedia Commons and Flickr photos. For every file in the snapshot, we looked at the structured data properties where we might find a Flickr URL. Then we tried to parse those URLs using our Flickr URL parsing library, and find out what Flickr photo they point at (if any).

This gave us a SQLite database that mapped Flickr photo IDs to Wikimedia Commons filenames. We could use this database to do fast queries to find copies of a Flickr photo that already exist on Commons. This proved the concept, but it had a few issues:

  • It was an incomplete list – we only looked in the structured data, and not the Wikitext. We estimate we were missing at least a million photos.
  • Nobody else can use this database; it only lives on the Flickypedia server. Theoretically somebody else could create it themselves – the snapshots are public, and the code is open source – but it seems unlikely.
  • This database is only as up-to-date as the latest snapshot we’ve downloaded – it could easily fall behind what’s on Wikimedia Commons.
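
For readers who want a feel for the lookup database described above, here is a minimal Python sketch using the standard `sqlite3` module. The table name, column names and helper functions are hypothetical; this is not the schema Flickypedia actually used, only one plausible shape for a mapping from Flickr photo IDs to Commons filenames.

```python
import sqlite3


def create_db(path=":memory:"):
    """Open (or create) the lookup database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS flickr_photos_on_commons ("
        " flickr_photo_id TEXT NOT NULL,"
        " commons_filename TEXT NOT NULL,"
        " PRIMARY KEY (flickr_photo_id, commons_filename))"
    )
    return conn


def record_link(conn, photo_id, filename):
    """Remember that a Commons file is a copy of a Flickr photo."""
    conn.execute(
        "INSERT OR IGNORE INTO flickr_photos_on_commons VALUES (?, ?)",
        (photo_id, filename),
    )


def find_copies(conn, photo_id):
    """Return the Commons filenames known to be copies of this Flickr photo."""
    rows = conn.execute(
        "SELECT commons_filename FROM flickr_photos_on_commons"
        " WHERE flickr_photo_id = ? ORDER BY commons_filename",
        (photo_id,),
    )
    return [row[0] for row in rows]


conn = create_db()
record_link(conn, "12448637", "File:Red-whiskered Bulbul-web.jpg")
print(find_copies(conn, "12448637"))
```

The composite primary key lets one Flickr photo map to several Commons files, which matters because duplicates are exactly what the database is meant to reveal.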

We wanted to make this process easier – both for ourselves, and anybody else building Flickr–Wikimedia Commons integrations.

Adding the Flickr Photo ID property

Every photo on Flickr has a unique numeric ID, so we proposed a new Flickr photo ID property to add to structured data on Wikimedia Commons. This proposal was discussed and accepted by the Wikimedia Commons community, and gives us a better way to match files on Wikimedia Commons to photos on Flickr:

This is a single field that you can query, and there’s an unambiguous, canonical way that values should be stored in this field – you don’t need to worry about the different variants of Flickr URL.

We added this field to Flickypedia, so any files uploaded with our tool will get this new field, and we hope that other Flickr upload tools will consider adding this field as well. But what about the millions of Flickr photos already on Wikimedia Commons? This is where Flickypedia Backfillr Bot was born.

Updating millions of files

Flickypedia Backfillr Bot applies our structured data mapping to every Flickr photo it can find on Wikimedia Commons – whether or not it was uploaded with Flickypedia. For every photo which was copied from Flickr, it compares the structured data to the live Flickr metadata, and updates the structured data if the two don’t match. This includes the Flickr Photo ID.

It reuses code from our duplicate detector: it goes through a snapshot looking for any files that come from Flickr photos. Then it gets metadata from Flickr, checks if the structured data matches that metadata, and if not, it updates the file on Wikimedia Commons.

Here’s a brief sketch of the process:

Most of the time this logic is fairly straightforward, but occasionally the bot will get confused – this is when the bot wants to write a structured data statement, but there’s already a statement with a different value. In this case, the bot will do nothing and flag it for manual review. There are edge cases and unusual files in Wikimedia Commons, and it’s better for the bot to do nothing than write incorrect or misleading data that will need to be reverted later.
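
The rule described above can be written as a small decision function. This is a simplified Python sketch, not the bot’s actual code: the names `Action` and `decide` are hypothetical, and it compares plain values for equality, whereas real structured data statements also carry qualifiers and references.

```python
from enum import Enum


class Action(Enum):
    ADD = "add the new statement"
    NOTHING = "already up to date"
    REVIEW = "flag for manual review"


def decide(existing_values, new_value):
    """Choose what to do for one property on one file.

    existing_values: the values already stored on Wikimedia Commons
    new_value: the value derived from the live Flickr metadata
    """
    if not existing_values:
        # Nothing is recorded yet, so it is safe to write.
        return Action.ADD
    if new_value in existing_values:
        # The existing data already agrees with Flickr.
        return Action.NOTHING
    # There is a statement with a different value: do not overwrite it.
    return Action.REVIEW


print(decide([], "Donostia Kultura"))
print(decide(["Leire Cano"], "Donostia Kultura"))
```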

Here are two examples:

  • Sometimes Wikimedia Commons has more specific metadata than Flickr. For example, this Flickr photo was posted by the Donostia Kultura account, and the description identifies Leire Cano as the photographer.

    Flickypedia Backfillr Bot wants to add a creator statement for “Donostia Kultura”, because it can’t understand the description – but when this file was copied to Wikimedia Commons, somebody added a more specific creator statement for “Leire Cano”.

    The bot isn’t sure which statement is correct, so it does nothing and flags this for manual review – and in this case, we’ve left the existing statement as-is.

  • Sometimes existing data on Wikimedia Commons has been mapped incorrectly. For example, this Flickr photo was taken “circa 1943”, but when it was copied to Wikimedia Commons somebody added an overly precise “date taken” statement claiming it was taken on “1 Jan 1943”.

    This bug probably occurred because of a misunderstanding of the Flickr API. The Flickr API will always return a complete timestamp in the “date” field, and then return a separate granularity value telling you how accurate it is. If you ignored that granularity value, you’d create an incorrect statement of what the date is.

    The bot isn’t sure which statement is correct, so it does nothing and flags this for manual review – and in this case, we made a manual edit to replace the statement with the correct date.
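
The date problem in the second example can be sketched in a few lines of Python. The granularity codes used here (0 for a full timestamp, 4 for year and month, 6 for year only, 8 for circa) are my reading of the Flickr API documentation and should be treated as an assumption, as should the function name `describe_date_taken`.

```python
from datetime import datetime


def describe_date_taken(taken, granularity):
    """Render a Flickr 'date taken' value no more precisely than it claims to be.

    taken: timestamp string such as "1943-01-01 00:00:00"
    granularity: the accompanying granularity code (int or numeric string)
    """
    dt = datetime.strptime(taken, "%Y-%m-%d %H:%M:%S")
    granularity = int(granularity)
    if granularity == 0:
        return dt.strftime("%Y-%m-%d %H:%M:%S")  # exact timestamp
    if granularity == 4:
        return dt.strftime("%Y-%m")  # month precision
    if granularity == 6:
        return dt.strftime("%Y")  # year precision
    if granularity == 8:
        return "circa " + dt.strftime("%Y")  # approximate year
    raise ValueError(f"unrecognised granularity: {granularity}")


# Ignoring the granularity would turn this into "1 Jan 1943".
print(describe_date_taken("1943-01-01 00:00:00", 8))
```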

What next?

We’re going to keep going! There were a few teething problems when we started running the bot, but the Wikimedia community helped us fix our mistakes. It’s now been running for a month or so, and processed over a million files.

All the Flickypedia code is open source on GitHub, and a lot of it isn’t specific to Flickr – it’s general-purpose code for working with structured data on Wikimedia Commons, and could be adapted to build similar bots. We’ve already had conversations with a few people about other use cases, and we’ve got some sketches for how that code could be extracted into a standalone library.

We estimate that at least 14 million files on Wikimedia Commons are photos that were originally uploaded to Flickr – more than 10% of all the files on Commons. There’s plenty more to do. Onwards and upwards!

Data Lifeboat 5: Prototypes and policy

We are now past the midpoint of our first project stage, and have our three basic prototype Data Lifeboats. At the moment, they run locally via the command line and generate rough versions of what Data Lifeboats will eventually contain—data and pictures.

The last step for those prototypes is to move them into a clicky web prototype showing the full workflow—something we will share with our working group (but may not put online publicly). We are working towards completing this first prototyping stage around the end of June and writing up the project in July.

We’ve made a few key decisions since we last posted an update, namely about who we’re designing for and what other expertise we need to bring in. We still have more questions than answers, but really, that’s what prototyping is for.

Who might do which bit

It took us a while to get to this decision, but once we had gone through the initial discovery phase, it became clear that we need to concentrate our efforts on three key user groups:

  1. Flickr members – People who’ve uploaded pictures to Flickr, have set licenses and permissions, and may either be happy or not happy for their pictures to be put into Data Lifeboats.
  2. Data Lifeboat creators – Could be archivists or other curatorial types looking to gather sets of pictures to copy into archives elsewhere, whether that be an institution like the Library of Congress, or a family archivist with a Dropbox account.
  3. Dock operators – This group is a bit more speculative, but we envision that Data Lifeboats could actually land (or dock) in specific destinations and be treated with special care there. Our ideal scenario would be to develop a network of docks (something we’ve been calling a “Safe Harbor Network”) made up of members that are our great and good cultural organizations: they are already really good at keeping things safe over the long term.

It’ll be good to flesh out the needs and wants of these three groups in more detail in our next stage. If you are a Flickr member reading this, and want to share your story about what your Flickr account means to you, we’d love to hear it.

Web archive vs object archive

Some digital/web preservation experts take the opinion that it’s archivally important to also archive the user interface of a digital property in order to fully understand a digital object’s context. This has arguably resulted in web archives containing a whole lot more information and structural stuff than is useful or necessary. It’s sort of like archiving the entire house within which the shoebox of photos was found.

We have decided that archiving the flickr.com interface itself is not necessary for a Data Lifeboat, and we will be designing a special viewer that will live inside each Data Lifeboat to help people explore its contents.

Analysing the need for new policy

The Data Lifeboat idea is about so much more than technology. Even though that’s certainly challenging, the more we think about it, the more challenging the social and ethical aspects are. It’s gritty, complex stuff, made more so by the delicate socio-technical settings available to Flickr members, like privacy, search settings, and licensing. The crosshatch of these three vectors makes managing stable permissions over time harder than weaving a complicated textile!

Once we narrowed down our focus to these specific user groups it also became clear that we need to address the (very) complex legal landscape surrounding the potential for archiving of Flickr images external to the service. It’s particularly gnarly when you start considering how permissions might change over time, or how access might shift for different scales of audience. For example, a Flickr member might be happy for Data Lifeboats containing their images to be shared with friends of friends, but a little apprehensive about them being shared with a recognized cultural institution that would use them for research. They may be much less happy for their Flickr pictures to be fully archived and available to anyone in perpetuity.

To help us explore these questions, and begin prototyping policies for each type of user group we foresee, we have enlisted the help of Dr. Andrea Wallace of the Law School at the University of Exeter. She is working with us to develop legal and policy frameworks tailored to the needs of each of these three groups, and to study how the current Flickr Terms of Service may be suitable for, or need adaptation around, this idea of a Data Lifeboat. This may include drafting the terms and conditions needed to create a Data Lifeboat, working out how we might be able to enhance rights management, and exploring how to manage expiration or decay of privacy or licensing into the future.

Data Lifeboat prototypes

We have generated three different prototype Data Lifeboats to think with, and show to our working group:

  1. Photos tagged with “Flickrhq”: This prototype includes thousands of tagged images of ‘life working at Flickr’, which is useful to explore the tricky aspects of collating other people’s pictures into a Data Lifeboat. Creating it revealed a search foible, whereby the result set that is delivered by searching via a tag is not consistent. Many of the pictures are also marked as All Rights Reserved, with 33% having downloads disabled. This raises juicy questions about licensing and permissions that need further discussion.
  2. Two photos from each Flickr Commons Member: We picked this subset because Flickr Commons photos are earmarked with the ‘no known copyright restrictions’ assertion, so questions about copying or reusing are theoretically simpler. 
  3. All photos from the Library of Congress (LoC) account: Comprising roughly 42,000 photos also marked as “no known copyright restrictions,” this prototype contains a set that is simpler to manage as all images have a uniform license setting. It was also useful to generate a Data Lifeboat of this size as it allowed us to do some very early benchmarking on questions like how long it takes to create one and where changes to our APIs might be helpful.

Preparing these prototypes has underscored the challenges of balancing the legal, social, and technical aspects of this kind of social media archiving, making clear the need for a special set of terms & conditions for Data Lifeboat creation. They also reveal the limitations of tags in capturing all relevant content (which, to some extent, we were expecting) and the user-imposed restrictions set on images in the Flickr context, like ‘can be downloaded.’

Remaining questions?

OMG, so many. Although the prototypes are still in progress, they have already stimulated great discussion and raised some key questions, such as:

  • How might user intentions or permissions change over time and how could software represent them?
  • How could the scope or scale of sharing influence how shared images are perceived, updated, and utilized?
  • How can we understand the different use cases, and how archivists/librarians could engage with the Data Lifeboats?
  • How important is it to make sure Data Lifeboats are launched with embedded rights information, and how might those decay over time?
  • How should we be considering the descriptive or social contexts that accompany images, and how should they inform subsequent decisions about expiration dates?

Long term sustainability and funding models

It’s really so early to be talking about this – and we’re definitely not ready to present any actual, reasonable, viable models here because we don’t know enough yet about how Data Lifeboats could be used or under what circumstances. We did do a first pass review of some obvious potential business models, for example:

  • A premium subscription service that allows Flickr.com users to create personalized Data Lifeboats for their own collections.
  • A consulting service for institutions and individuals who want to create Data Lifeboats for specific archival purposes.
  • Training and certification programs for digital archiving that use Data Lifeboats as the foundation.
  • Membership fees for members of the Safe Harbor network, or charging fees for access to the Data Lifeboat archives.

While there were aspects of each that appealed to our partners, there were also significant flaws, so overall we’re still a long way from having an answer. This is something else we’re planning to explore more broadly in partnership with the wider Flickr Commons membership in subsequent phases of this project.

Next steps

This month we’ll be wrapping up this first prototyping phase supported by the National Endowment for the Humanities. After we’ve completed the required reporting, we’ll move into the next phase in earnest, reaching out to those three user groups more deliberately to learn more about how Data Lifeboats could operate for them and what they would need them to do. 

Two upcoming in-person events!

We’re also very happy to be able to tell you the Mellon Foundation has awarded us a grant to support this next stage, and we’re especially looking forward to running two small events later in the year to gather people from our Flickr Commons partner institutions, as well as other birds of a feather, to discuss these key challenges together.

If you’d like to register your interest in attending one of these meetings, please let us know via this short Registration of Interest form. Please note that these will be small, with maybe 20ish people at each. Registering interest does not guarantee a spot, and we’ve only just begun planning in earnest.

 

The surprising utility of a Flickr URL parser

In my first week at the Flickr Foundation, we made a toy called Flinumeratr. This is a small web app that takes a Flickr URL as input, and shows you all the photos which are present at that URL.

As part of this toy, I made a Python library which parses Flickr URLs, and tells you what the URL points to – a single photo, an album, a gallery, and so on. Initially it just handled fairly common patterns, the sort of URLs that you’d encounter if you use Flickr today, but it’s grown to handle more complicated URLs.

$ flickr_url_parser "https://www.flickr.com/photos/sdasmarchives/50567413447"
{"type": "single_photo", "photo_id": "50567413447"}

$ flickr_url_parser "https://www.flickr.com/photos/aljazeeraenglish/albums/72157626164453131"
{"type": "album", "user_url": "https://www.flickr.com/photos/aljazeeraenglish", "album_id": "72157626164453131", "page": 1}

$ flickr_url_parser "https://www.flickr.com/photos/blueminds/page3"
{"type": "user", "user_url": "https://www.flickr.com/photos/blueminds"}

The implementation is fairly straightforward: I use the hyperlink library to parse the URL text into a structured object, then I compare that object to a list of known patterns. Does it look like this type of URL? Or this type of URL? Or this type of URL? And so on.
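
To make the pattern-matching idea concrete, here is a much-reduced Python sketch. It is not the library’s code: it swaps in the standard library’s `urllib.parse` for hyperlink, the function name `parse_flickr_url` is hypothetical, and it only recognises a handful of patterns (anything else, including albums and short URLs, returns `None`).

```python
import re
from urllib.parse import parse_qs, urlparse


def parse_flickr_url(url):
    """Classify a Flickr URL, or return None if no known pattern matches."""
    # Tolerate URLs written without a scheme, e.g. "www.flickr.com/...".
    if "://" not in url:
        url = "https://" + url
    parts = urlparse(url)
    host = parts.netloc.lower()
    # Dropping empty segments also absorbs doubled or trailing slashes.
    segments = [s for s in parts.path.split("/") if s]
    query = parse_qs(parts.query)

    # Pattern 1: a raw JPEG on a static host; the filename starts with the photo ID.
    if host.endswith(("staticflickr.com", "static.flickr.com")):
        match = re.match(r"(\d+)_", segments[-1]) if segments else None
        if match:
            return {"type": "single_photo", "photo_id": match.group(1)}
        return None

    if host not in ("flickr.com", "www.flickr.com"):
        return None

    # Pattern 2: a legacy .gne page that carries the photo ID in the query string.
    if len(segments) == 1 and segments[0].endswith(".gne") and "id" in query:
        photo_id = query["id"][0]
        if photo_id.isdigit():
            return {"type": "single_photo", "photo_id": photo_id}

    # Pattern 3: /photos/<user>/<photo id>
    if len(segments) == 3 and segments[0] == "photos" and segments[2].isdigit():
        return {"type": "single_photo", "photo_id": segments[2]}

    # Pattern 4: /photos/<user>, optionally followed by a page number.
    if len(segments) >= 2 and segments[0] == "photos":
        is_page = len(segments) == 3 and re.fullmatch(r"page\d+", segments[2])
        if len(segments) == 2 or is_page:
            user_url = f"https://www.flickr.com/photos/{segments[1]}"
            return {"type": "user", "user_url": user_url}

    return None


print(parse_flickr_url("https://www.flickr.com/photos/sdasmarchives/50567413447"))
```

Each pattern is an independent check, so supporting a new URL variant means adding one more branch (and a test for it) without disturbing the others.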

You can run this library as a command-line tool, or call it from Python – there are instructions in the GitHub README.

There are lots of URL variants

In my second week and beyond, I started to discover more variants, which should probably be expected in 20-year-old software! I’ve been looking into collections of Flickr URLs that have been built up over multiple years, and although most of these URLs follow common patterns, there are lots of unusual variants in the long tail.

Some of these are pretty simple. For example, the URL to a user’s photostream can be formed using either their Flickr user NSID or their path alias, so flickr.com/photos/197130754@N07/ and flickr.com/photos/flickrfoundation/ point to the same page.

Others are more complicated, and you can trace the history of Flickr through some of the older URLs. Some of my favorites include:

  • Raw JPEG files, on live.staticflickr.com, farm1.static.flickr.com, and several other subdomains.

  • Links with a .gne suffix, like www.flickr.com/photo_edit.gne?id=3435827496 (from Wikimedia Commons). This acronym stands for Game Neverending, the online game out of which Flickr was born.

  • A Flash video player called stewart.swf, which might be a reference to Stewart Butterfield, one of the cofounders of Flickr.

I’ve added support for every variant of Flickr URL I’ve found so far to the parsing library – if you want to see a complete list, check out the tests. I need over a hundred tests to check all the variants are parsed correctly.

Where we’re using it

I’ve been able to reuse this parsing code in a bunch of different projects, including:

  • Building a similar “get photos at this URL” interface in Flickypedia.

  • Looking for Flickr photo URLs in Wikimedia Commons. This is for detecting Flickr photos which have already been uploaded to Commons, which I’ll describe more in another post.

  • Finding Flickr pages which have been captured in the Wayback Machine – I can get a list of saved Flickr URLs, and then see what sort of pages have actually been saved.

When I created the library, I wasn’t sure if this code was actually worth extracting as a standalone package – would I use it again, or was this a premature abstraction?

Now that I’ve seen more of the diversity of Flickr URLs and found more uses for this code, I’m much happier with the decision to abstract it into a standalone library. Now we only need to add support for each new URL variant once, and then all our projects can benefit.

If you want to try the Flickr URL parser yourself, all the code is open source on GitHub.

On the way to 100 years of Flickr

A report on archival strategies

By Ashley Kelleher Skjøtt

Flickr is an important piece of social history that pioneered user-driven curation, through folksonomic tags and through a publicly-accessible platform at scale, crystallising the web 2.0 internet. Applying tags to one’s own images and those of others, Flickr’s users significantly contributed to the emergence of commons culture. These collective practices became a core tenet of Flickr’s design ethos as a platform, decentralising and democratising the role of curation.

Of course, Flickr was not alone in pioneering this. Hashtags and social sharing on other platforms added momentum to a general shift that was democratising overall, giving users agency over what they shared, experienced, and categorised. This shift in curatorial agency is just one aspect of Flickr’s significance as a living piece of social history.

Flickr continues to be one of the largest public collections of photographs on the planet, comprising tens of billions of images. Flickr celebrated its 20th birthday in February 2024. The challenge of archiving Flickr at scale, then, perhaps becomes about designing processes for preservation which can also be decentralised.

In August 2023, I learnt from a dear friend and colleague, Dan Pett, that the Flickr Foundation, newly based in London, was beginning to build an innovative archival practice for the platform. With my interest in digital cultural memory systems, an interest for which I have moved continents, I was determined to contribute in some way to the Foundation’s new goal. After exploring and discussing the space with George Oates, Director of the Flickr Foundation, we agreed that a practice-based information-gathering exercise could be useful in building up an understanding of such a practice.

So, what would an archive for Flickr look like?

Flickr is a living social media environment, with up to 25 million images uploaded each day. The company has been acquired by several different parent companies over its 20-year lifetime (already a remarkable timespan by social media standards), which makes a stark case for working to ensure its contents remain available long into the future. This is a priority shared today between Flickr itself and the new Flickr Foundation.

I have prepared a report of findings, written at a deliberately slow pace, which aims to present a colloquial yet current answer to the question of archival practice for Flickr as a unique case, both in its scale and in defining what should be prioritised for preservation. Presuming that the platform is not invulnerable to media obsolescence, what on earth (or space) should an archive preserving the best of Flickr look like today? The work of asking this question again and again through the days, months, years, and decades to come leads us to the Foundation’s own question: what does it look like to ensure Flickr lasts for one hundred years?

REPORT: 20 Years of Flickr: Archiving the Living Environment

This information-gathering exercise consisted of seven interviews with sector peers across a wide range of practice, from academia to a small company, to a global design practice and within the museum world. My sincere thanks to:

  • Alex Seville (Head of Flickr),
  • Cass Fino-Radin (Small Data Industries),
  • Richard Palmer (V&A Museum),
  • Annet Dekker (University of Amsterdam),
  • Jenny Basford (British Library),
  • Matthew Hoerl (Arch Mission Foundation), and
  • Julie May (Bjarke Ingels Group)

Many thanks to them all for taking the time to generously share their thoughts on the prospect, reflections on their own work, and expertise in the area.

The report sets out to define the value of what should be preserved for Flickr, as (1) a social platform, (2) a network-driven community, (3) a collection of uniquely user-generated metadata, and (4) an invaluable image collection, specifically of photography. It then discusses the risks identified through the course of the interviews. Finally, it works through ten identified areas of practice which can be addressed in the Foundation’s archival plan, divided into long- and short-term initiatives. The report closes with six recommendations for the present.

An archive for Flickr which honours its considerable legacy should be created in the same vein. One interviewee reflected that the work of the archivist is to select what to preserve. This is, effectively, curation – the curation of archival material. It follows then, that if a central innovation of Flickr as a platform was to democratise the application of curatorial tools – enabling tags as metadata based in natural language, at scale – then the approach to archiving such a platform should follow this model in allowing its selection to be driven by users. What about a “preserve” tag?

Thanks to Flickr and other internet pioneers, this is far from any kind of revolutionary idea – and is one worth creating an archival practice around, so that coming generations can access the stories we want to tell about Flickr: the story of the internet, of the commons, of building open structures to find new images and of what it means to be a community, online.