Making some marvelous maps

This week we added maps to our Commons Explorer, and it’s proving to be a fun new way to find photos.

There are over 50,000 photos in the Flickr Commons collection which have location information telling us where the photo was taken. We can plot those locations on a map of the world, so you can get a sense of the geographical spread:

This map is interactive, so you can zoom in and move around to focus on a specific place. As you do, we’ll show you a selection of photos from the area you’ve selected.

You can also filter the map, so you see photos from just a single Commons member. For smaller members the map points can tell a story in themselves, and give you a sense of where a collection is and what it’s about:

These maps are available now, and know about the location of every geotagged photo in Flickr Commons.

Give them a try!

How can you add a location to a Flickr Commons photo?

For the first version of this map, we use the geotag added by the photo’s owner.

If you’re a Flickr Commons member, you can add locations to your photos and they’ll automatically show up on this map. The Flickr Help Center has instructions for how to do that.

It’s possible for other Flickr members to add machine tags to photos, and there are already thousands of crowdsourced tags that have location-related information. We don’t show those on the map right now, but we’re thinking about how we might do that in future!

How does the map work?

There are three technologies that make these maps possible.

The first is SQLite, the database engine we use to power the Commons Explorer. We have a table which contains every photo in the Flickr Commons, and it includes any latitude and longitude information. SQLite is wicked fast and our collection is small potatoes, so it can get the data to draw these maps very quickly.

I’d love to tell you about some deeply nerdy piece of work to hyper-optimize our queries, but it wasn’t necessary. I wrote the naïve query, added a couple of column indexes, and that first attempt was plenty fast. Tallying the locations for the entire Flickr Commons collection takes ~45ms; tallying the locations for an individual member is often under a millisecond.
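To make the "naïve query plus a couple of indexes" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names, index names, and sample rows are all assumptions made for illustration; the post does not describe the real Commons Explorer schema or queries.

```python
import sqlite3

# Hypothetical schema: one row per photo, with optional coordinates.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE photos (
        id TEXT PRIMARY KEY,
        member_id TEXT NOT NULL,
        latitude REAL,
        longitude REAL
    )
    """
)

# "A couple of column indexes": one for filtering by member,
# one for grouping by location. Both names are made up.
conn.execute("CREATE INDEX idx_photos_member ON photos (member_id)")
conn.execute("CREATE INDEX idx_photos_location ON photos (latitude, longitude)")

# Invented sample data; photo 4 has no geotag.
conn.executemany(
    "INSERT INTO photos VALUES (?, ?, ?, ?)",
    [
        ("1", "member_a", 51.5074, -0.1278),
        ("2", "member_a", 51.5074, -0.1278),
        ("3", "member_b", 45.5019, -73.5674),
        ("4", "member_b", None, None),
    ],
)


def tally_locations(conn, member_id=None):
    """Count geotagged photos at each location, optionally for one member."""
    sql = (
        "SELECT latitude, longitude, COUNT(*) FROM photos "
        "WHERE latitude IS NOT NULL AND longitude IS NOT NULL"
    )
    params = []
    if member_id is not None:
        sql += " AND member_id = ?"
        params.append(member_id)
    sql += " GROUP BY latitude, longitude ORDER BY latitude, longitude"
    return conn.execute(sql, params).fetchall()


all_points = tally_locations(conn)
member_points = tally_locations(conn, "member_a")
```

Each returned row is a (latitude, longitude, count) tuple, which is roughly the shape of data a map needs to draw one marker per location.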

The second is Leaflet.js, a JavaScript library for interactive maps. This is a popular and feature-rich library that made it easy for us to add a map to the site. Combined with a marker clustering plugin, we had a lot of options for configuring the map to behave exactly as we wanted, and to connect it to Flickr Commons data.
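As a rough illustration of what marker clustering does, the sketch below groups nearby points into grid cells and reports one averaged marker per cell. This is a deliberately simplified stand-in written in Python, not the algorithm used by the Leaflet clustering plugin, and the function name and cell size are assumptions.

```python
import math
from collections import defaultdict


def cluster_points(points, cell_size_degrees):
    """Group (lat, lon) points into square grid cells of the given size."""
    cells = defaultdict(list)
    for lat, lon in points:
        key = (
            math.floor(lat / cell_size_degrees),
            math.floor(lon / cell_size_degrees),
        )
        cells[key].append((lat, lon))

    clusters = []
    for members in cells.values():
        count = len(members)
        clusters.append(
            {
                # Place the cluster marker at the average position.
                "lat": sum(lat for lat, _ in members) / count,
                "lon": sum(lon for _, lon in members) / count,
                "count": count,
            }
        )
    # Largest clusters first.
    return sorted(clusters, key=lambda c: -c["count"])


# Two points in London fall in the same 1-degree cell; Montreal is separate.
clusters = cluster_points(
    [(51.50, -0.12), (51.52, -0.10), (45.50, -73.57)], cell_size_degrees=1.0
)
```

A real clustering plugin recomputes clusters as you zoom, so that markers split apart as the map shows more detail; a fixed grid like this only approximates that behavior at one zoom level.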

The third is OpenStreetMap. This is a world map maintained by a community of volunteers, and we use their map tiles as the backdrop for our map.
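For readers curious how a latitude and longitude relate to a map tile, here is a small sketch of the widely used "slippy map" tile numbering scheme that OpenStreetMap tiles follow. Leaflet normally does this calculation for you; the function names here are invented for illustration, and the URL pattern is the standard OpenStreetMap tile server rather than anything confirmed by the post.

```python
import math


def lat_lon_to_tile(lat, lon, zoom):
    """Convert WGS84 coordinates to slippy-map tile (x, y) at a zoom level."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so coordinates at the map edges still give a valid tile.
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return x, y


def tile_url(lat, lon, zoom):
    """Build a tile URL for the tile containing the given point."""
    x, y = lat_lon_to_tile(lat, lon, zoom)
    return f"https://tile.openstreetmap.org/{zoom}/{x}/{y}.png"


london_tile = lat_lon_to_tile(51.5074, -0.1278, 10)
```

Each extra zoom level doubles the number of tiles along each axis, which is why zooming in loads a fresh set of more detailed images.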

Plus ça Change

To help us track changes to the Commons Explorer, we’ve added another page: the changelog.

This is part of our broader goal of archiving the organization. Even in the six months since we launched the Explorer, it’s easy to forget what happened when, and new features quickly feel normal. The changelog is a place for us to remember what’s changed and what the site used to look like, as we continue to make changes and improvements.

Developing a New Research Method, Part 1: Photovoice, critical fabulation, and archives

by Prakash Krishnan

Prakash Krishnan is a 2024 Flickr Foundation Research Fellow, working to engage community organizations with the creative possibilities afforded through archival and photo research as well as to unearth and activate some of the rich histories embedded in the Flickr archive.

I had the wonderful opportunity to visit London and Flickr Foundation HQ during the month of May 2024. The first month of my fellowship was a busy one, getting settled in, meeting the team, and making contacts around the UK to share and develop my idea for a new qualitative research method that was inspired by my perusing of just a minuscule fraction of the billions of photos uploaded and visible on Flickr.com.

Unlike the brilliant and techno-inspired minds of my Flickr Foundation cohort: George, Alex, Ewa, and Eryk, my head is often drifting in the clouds (the ones in the actual sky) or deep in books, articles, and archives. Since rediscovering Flickr and contemplating its many potential uses, I have activated my past work as a researcher, artist, and cultural worker, to reflect upon the ways Flickr could be used to engage communities in various visual and digital ethnographies.

Stemming from anthropology and the social sciences more broadly, ethnography is a branch of qualitative research involving the study of cultures, communities, or organizations. A visual ethnography thereby employs visual methods, such as photography, film, drawing, or painting. Similarly, digital ethnography refers to the ethnographic study of cultures and communities as they interact with digital and internet technologies.

In this first post, I will trace a nonlinear timeline of different community-based and academic research projects I have conducted in recent years. Important threads from each of these projects came together to form the basis of the new ethnographic method I have developed over the course of this fellowship, which I call Archivevoice.

Visual representations of community

The research I conducted for my master’s thesis was an example of a digital, visual ethnography. For a year, I observed Instagram accounts sharing curated South Asian visual media, analyzing the types of content they shared, the different media used, the platform affordances that were engaged with, the comments and discussions the posts incited, and how the posts reflected contemporary news, culture, and politics. I also interviewed five people whose content I had studied. Through this research I observed a strong presence of uniquely diasporic concerns and aesthetics. Many posts critiqued the idea of different nationhoods and national affiliations with the countries founded after the partition of India in 1947 – a violent division of the country resulting in mass displacement and human casualty whose effects are still felt today. Because of this violent displacement, and with multiple generations of people descended from the Indian subcontinent living outside of their ancestral territory, I observed among many within the community a rejection of nationalist identities specific to, say, India, Pakistan, or Bangladesh. Instead, people were using the term “South Asian” as a general catchall for communities living in the region as well as in the diaspora. Drawing from queer cultural theorist José Esteban Muñoz, I labelled this digital, cultural phenomenon I observed “digital disidentification.”[1]

My explorations of community-based visual media predate this research. In 2022, I worked with the Montreal grassroots artist collective and studio, Cyber Love Hotel, to develop a digital archive and exhibition space for 3D-scanned artworks and cultural objects called Things+Time. In 2023, we hosted a several-week-long residency program with 10 local, racialized, and queer artists. The residents were trained on archival description and tagging principles, and then selected what to archive. The objects curated and scanned in the context of this residency were in response to the overarching theme of loss during the Covid-19 pandemic, in which rampant closures of queer spaces, restaurants, nightlife, music venues, and other community gathering spaces were proliferating across the city.

During complete pandemic lockdown, while working as the manager for cultural mediation at the contemporary gallery Centre CLARK, I conducted a similar project which involved having participants take photographs which responded to a specific prompt. In partnership with the community organization Head & Hands, I mailed disposable cameras to participants from a Black youth group whose activities were based at Head & Hands. Together with artist and CLARK member, Eve Tangy, we created educational videos on the principles of photography and disposable camera use and tasked the participants to go around their neighbourhoods taking photos of moments that, in their eyes, sparked Black Joy, the theme of the project. Following a feedback session with Eve and myself, the two preferred photos from each participant’s photo reel were printed and mounted as part of a community exhibition entitled Nous sommes ici (“We’re Here”) at the entry of Centre CLARK’s gallery.


These public community projects were not formal or academic, but I came to understand each of them as an example of what is called research-creation (or practice-based research or arts-based research). Through creative methods like photography and curating objects for digital archiving, I, as the facilitator/researcher, was interested in how the media comprising each exhibition would inform me and the greater public about the experiences of marginalized artists and Black youth at such pivotal moments in these communities.

Photovoice: Empowering research participants

The fact that both these projects involved working with a community and giving them creative control over how they wanted their research presented reminded me of the popular qualitative research method used often within the fields of public health, sociology, and anthropology called Photovoice. The method was originally coined as Photo Novella in 1992 and then later renamed Photovoice in 1996 by researchers Caroline Wang and Mary Ann Burris. The flagship study that established this method for decades involved scholars providing cameras and photography training to low-income women living in rural villages of Yunnan, China.

The goals of this Photovoice research were to better understand, through the perspectives of these women, the challenges they faced within their communities and societies, and to communicate these concerns to policymakers who might be more amenable to photographic representations rather than text. Citing Paulo Freire, Wang and Burris note the potential photographs have to raise consciousness and promote collective action due to their political nature. [5]

According to Wang and Burris, “these images and tales have the potential to reach generations of children to come.” [6] The images created a medium through which these women were able to share their experiences and also relate to each other. Even with 50 villages represented in the research, shared experience and strong reactions to certain photographs came up for participants – including this picture of a young child lying in a field while her mother farmed nearby. 

According to the authors, “the image was virtually universal to their own experience. When families must race to finish seasonal cultivating, when their work load is heavy, and when no elders in the family can look after young ones, mothers are forced to bring their babies to the field. Dust and rain weaken the health of their infants… The photograph was a lightening [sic] rod for the women’s discussion of their burdens and needs.” [8]

Since its conception in the 1990s as a means for participatory needs assessment, many scholars and researchers have expanded Photovoice methodology. Given the exponential increase of camera access via smartphones, Photovoice is an increasingly feasible method for this kind of research. Recurring themes in Photovoice work include community health, mental health studies, ethnic and race-based studies, research with queer communities, as well as specific neighbourhood and urban studies. During the pandemic lockdowns, there were also Photovoice studies conducted entirely online, thus giving rise to the method of virtual Photovoice. [9]

Critical Fabulation: Filling the gaps in visual history

Following my master’s thesis research, I became more interested in how communities sought to represent themselves through photography and digital media. Not only that, but also how communities would form and engage with content circulated on social media – despite these people not being the originators of this content.

In my research, people reacted most strongly to family photographs depicting migration from South Asia to the Global North. Although reasons for emigration varied across the respondents, many people faced similar challenges with the immigration process and resettlement in a new territory. They shared their experiences through commenting online. 

People in communities which are underrepresented in traditional archives are often forced to work with limited documentation. They must do the critical and imaginative work of extrapolating what they find. While photographs can convey biographical, political, or historical meaning, exploring archived images with imagination can foster creative interpretation to fill gaps in the archival record. Scholar of African-American studies, Saidiya Hartman, introduced the term “critical fabulation” to denote this practice of reimagining the sequences of events and actors behind the narratives contained within the archive. In her words, this reconfiguration of story elements attempts “to jeopardize the status of the event, to displace the received or authorized account, and to imagine what might have happened or might have been said or might have been done.” [10] In reference to depictions of narratives from the Atlantic slave trade in which enslaved people are often referred to as commodities, Hartman writes “the intent of this practice is not to give voice to the slave, but rather to imagine what cannot be verified, a realm of experience which is situated between two zones of death—social and corporeal death—and to reckon with the precarious lives which are visible only in the moment of their disappearance. It is an impossible writing which attempts to say that which resists being said (since dead girls are unable to speak). It is a history of an unrecoverable past; it is a narrative of what might have been or could have been; it is a history written with and against the archive.” [11]

I am investigating what it means to imagine the unverifiable and to reckon with what only becomes visible at its disappearance. In 2020, I wrote about Facebook pages serving as archives of queer life in my home town, Montreal. [12] For this study, I once again conducted a digital ethnography, this time of the event pages surrounding a QTPOC (queer/trans person of colour)-led event series known as Gender B(l)ender. Drawing from Sam McBean, I argued that simply having access to these event pages on Facebook creates a space of possibility in which one can imagine themselves as part of these events, as part of these communities – even when physical, in-person participation is not possible. Although critical fabulation was not a method used in this study, it seemed like a precursor to this concept of collectively rethinking, reformulating, and resurrecting untold, unknown, or forgotten histories of the archives. This finally leads us to the project of my fellowship here at the Flickr Foundation.

In addition to this fellowship, I am coordinator of the Access in the Making Lab, a university research lab working broadly on issues of critical disability studies, accessibility, anti-colonialism, and environmental humanities. In my work, I am increasingly preoccupied with the question of methods: 1) how do we do archival research, especially ethical archival research, with historically marginalized communities; and 2) how can research “subjects” be empowered to be seen as co-producers of research?

I trace this convoluted genealogy of my own fragmented research and community projects to explain the method I am developing and have proposed to university researchers as a part of my fellowship. Following my work on Facebook and Instagram, I similarly position Flickr as a participatory archive, made by millions of people in millions of communities. [13] Eryk Salvaggio, fellow 2024 Flickr Foundation research fellow, also positions Flickr as an archive such that it “holds digital copies of historical artifacts for individual reflection and context.” [14] From this theoretical groundwork of seeing these online social image/media repositories as archives, I seek to position archival items – i.e. the photos uploaded to Flickr.com – as a medium for creative interpretation by which researchers could better understand the lived realities of different communities, just like the Photovoice researchers. I am calling this set of work and use cases “Archivevoice”.

In part two of this series, I will explore the methodology itself in more detail including a guide for researchers interested in engaging with this method.

Footnotes

[1] Prakash Krishnan, “Digital Disidentifications: A Case Study of South Asian Instagram Community Archives,” in The Politics and Poetics of Indian Digital Diasporas: From Desi to Brown (Routledge, 2024), https://www.routledge.com/The-Politics-and-Poetics-of-Indian-Digital-Diasporas-From-Desi-to-Brown/Jiwani-Tremblay-Bhatia/p/book/9781032593531.

[2] Caroline Wang and Mary Ann Burris, “Empowerment through Photo Novella: Portraits of Participation,” Health Education Quarterly 21, no. 2 (1994): 171–86.

[3] Kunyi Wu, Visual Voices, 100 Photographs of Village China by the Women of Yunnan Province, 1995.

[4] Wu.

[5] Caroline Wang and Mary Ann Burris, “Photovoice: Concept, Methodology, and Use for Participatory Needs Assessment,” Health Education & Behavior 24, no. 3 (1997): 384.

[6] Wang and Burris, “Empowerment through Photo Novella,” 179.

[7] Wang and Burris, “Empowerment through Photo Novella.”

[8] Wang and Burris, 180.

[9] John L. Oliffe et al., “The Case for and Against Doing Virtual Photovoice,” International Journal of Qualitative Methods 22 (March 1, 2023): 16094069231190564, https://doi.org/10.1177/16094069231190564.

[10] Saidiya Hartman, “Venus in Two Acts,” Small Axe 12, no. 2 (2008): 11.

[11] Hartman, 12.

[12] Prakash Krishnan and Stefanie Duguay, “From ‘Interested’ to Showing Up: Investigating Digital Media’s Role in Montréal-Based LGBTQ Social Organizing,” Canadian Journal of Communication 45, no. 4 (December 8, 2020): 525–44, https://doi.org/10.22230/cjc.2020v44n4a3694.

[13] Isto Huvila, “Participatory Archive: Towards Decentralised Curation, Radical User Orientation, and Broader Contextualisation of Records Management,” Archival Science 8, no. 1 (March 1, 2008): 15–36, https://doi.org/10.1007/s10502-008-9071-0.

[14] Eryk Salvaggio, “The Ghost Stays in the Picture, Part 1: Archives, Datasets, and Infrastructures,” Flickr Foundation (blog), May 29, 2024, https://www.flickr.org/the-ghost-stays-in-the-picture-part-1-archives-datasets-and-infrastructures/.

Bibliography

Hartman, Saidiya. “Venus in Two Acts.” Small Axe 12, no. 2 (2008): 1–14.

Huvila, Isto. “Participatory Archive: Towards Decentralised Curation, Radical User Orientation, and Broader Contextualisation of Records Management.” Archival Science 8, no. 1 (March 1, 2008): 15–36. https://doi.org/10.1007/s10502-008-9071-0.

Krishnan, Prakash. “Digital Disidentifications: A Case Study of South Asian Instagram Community Archives.” In The Politics and Poetics of Indian Digital Diasporas: From Desi to Brown. Routledge, 2024. https://www.routledge.com/The-Politics-and-Poetics-of-Indian-Digital-Diasporas-From-Desi-to-Brown/Jiwani-Tremblay-Bhatia/p/book/9781032593531.

Krishnan, Prakash, and Stefanie Duguay. “From ‘Interested’ to Showing Up: Investigating Digital Media’s Role in Montréal-Based LGBTQ Social Organizing.” Canadian Journal of Communication 45, no. 4 (December 8, 2020): 525–44. https://doi.org/10.22230/cjc.2020v44n4a3694.

Oliffe, John L., Nina Gao, Mary T. Kelly, Calvin C. Fernandez, Hooman Salavati, Matthew Sha, Zac E. Seidler, and Simon M. Rice. “The Case for and Against Doing Virtual Photovoice.” International Journal of Qualitative Methods 22 (March 1, 2023): 16094069231190564. https://doi.org/10.1177/16094069231190564.

Salvaggio, Eryk. “The Ghost Stays in the Picture, Part 1: Archives, Datasets, and Infrastructures.” Flickr Foundation (blog), May 29, 2024. https://www.flickr.org/the-ghost-stays-in-the-picture-part-1-archives-datasets-and-infrastructures/.

Wang, Caroline, and Mary Ann Burris. “Empowerment through Photo Novella: Portraits of Participation.” Health Education Quarterly 21, no. 2 (1994): 171–86.

———. “Photovoice: Concept, Methodology, and Use for Participatory Needs Assessment.” Health Education & Behavior 24, no. 3 (1997): 369–87.

Wu, Kunyi. Visual Voices, 100 Photographs of Village China by the Women of Yunnan Province, 1995.

The Ghost Stays in the Picture, Part 3: The Power of the Image

Eryk Salvaggio is a 2024 Flickr Foundation Research Fellow, diving into the relationships between images, their archives, and datasets through a creative research lens. This three-part series focuses on the ways archives such as Flickr can shape the outputs of generative AI in ways akin to a haunting. You can read parts one and two.

“Definitions belong to the definers, not the defined.”
― Toni Morrison, Beloved

Generative Artificial Intelligence is sometimes described as a remix engine. It is one of the more easily graspable metaphors for understanding these images, but it’s also wrong. 

As a digital collage artist working before the rise of artificial intelligence, I was always remixing images. I would do a manual search of the public domain works available through the Internet Archive or Flickr Commons. I would download images into folders named for specific characteristics of various images. An orange would be added to the folder for fruits, but also round, and the color orange; cats could be found in both cats and animals.

I was organizing images solely on visual appearance, anticipating their retrieval whenever certain needs might emerge. If I needed something round to balance a particular composition, I could find it in the round folder, surrounded by other round things: fruits and stones and images of the sun, the globes of planets and human eyes.

Once in the folder, the images were shapes, and I could draw from them regardless of what they depicted. It didn’t matter where they came from. They were redefined according to their anticipated use. 

A Churning

This was remixing, but I look back on this practice with fresh eyes when I consider the metaphor as it is applied to diffusion models. My transformation of source material was not merely based on their shapes, but their meaning. New juxtapositions emerged, recontextualizing those images. They retained their original form, but engaged in new dialogues through virtual assemblages. 

As I explore AI images and the datasets that help produce them, I find myself moving away from the concept of the remix. The remix is a form of picking up a melody and evolving it, and it relies on human expression. It is a relationship, a gesture made in response to another gesture.

To believe we could “automate” remixing assumes too much of the systems that do this work. Remixes require an engagement with the source material. Generative AI systems do not have any relationship with the meanings embedded into the materials they reconfigure. In the absence of engagement, what machines do is better described as a churn, combining two senses of the word. Generative AI models churn images in that they dissolve the surface of these images. Then they churn out new images, that is, “to produce mechanically and in great volume.”

Of course, people can diffuse the surface meaning of images too. As a collagist, I could ignore the context of any image I liked. We can look at the stereogram below and see nothing but the moon. We don’t have to think about the tools used to make that image, or how it was circulated, or who profited from its production. But as a collagist, I could choose to engage with questions that were hidden by the surfaces of things. I could refrain from engagements with images, and their ghosts, that I did not want to disturb. 

Actions taken by a person can model actions taken by a machine. But the ability to automate a person’s actions does not suggest the right or the wisdom to automate those actions. I wonder if, in the case of diffusion models, we shouldn’t more closely scrutinize the act of prising meaning from an image and casting it aside. This is something humans do when they are granted, or demand, the power to do so. The automation of that power may be legal. But it also calls for thoughtful restraint. 

In this essay, I want to explore the power to inscribe into images. Traditionally, the power to extract images from a place has been granted to those with the means to do so. Over the years, the distribution and circulation of images has been balanced against those who hold little power to resist it. In the automation of image extraction for training generative artificial intelligence, I believe we are embedding this practice into a form of data colonialism. I suggest that power differentials haunt the images that are produced by AI, because they have molded the contents of the datasets, and the infrastructures, that result in those images.

The Crying Child

Temi Odumosu has written about the “digital reproduction of enslaved and colonized subjects held in cultural heritage collections.” In The Crying Child, Odumosu looks at the role of the digital image as a means of extending the life of a photographic memory. But this process is fraught, and Odumosu dedicates the paper to “revisiting those breaches (in trust) and colonial hauntings that follow photographed Afro-diasporic subjects from moment of capture, through archive, into code” (S290). It does so by focusing on a single image, taken in St. Croix in 1910: 

“This photograph suspends in time a Black body, a series of compositional choices, actions, and a sound. It represents a child standing alone in a nondescript setting, barefoot with overpronation, in a dusty linen top too short to be a dress, and crying. Clearly in visible distress, with a running nose and copious tears rolling down its face, the child’s crinkled forehead gives a sense of concentrated energy exerted by all the emotion … Emotions that object to the circumstances of iconographic production.”

The image emerges from the Royal Danish Library. It was taken by Axel Ovesen, a military officer who operated a commercial photography business. The photograph was circulated as a postcard, and appears in a number of personal and commercial photo albums Odumosu found in the archive.

The unnamed crying child appeared to the Danish colonizers of the island as an amusement, and is labeled only as “the grumpy one” (in the sense of “uncooperative”). The contexts in which this image appeared and circulated were all oriented toward soothing and distancing the colonizers from the colonized. By reframing it as a humorous novelty, the power to apply and remove meaning is exercised on behalf of those who purchase the postcard and mail it to others for a laugh. What is literally depicted in these postcards is, Odumosu writes, “the means of production, rights of access, and dissemination” (S295).

I am describing this essay at length because the practice of categorizing this image in an archive is so neatly aligned with the collection and categorization of training data for algorithmic images. Too often, the images used for training are treated solely as data, and training defended as an act that leaves no traces. This is true. The digital copy remains intact.

But the image is degraded, literally, step by step until nothing remains but digital noise. The image is churned, the surface broken apart, and its traces stored as math tucked away in some vector space. It all seems very tidy, technical, and precise, if you treat the image as data. But to say so requires us to agree that the structures and patterns of the crying child in the archive — the shape of the child’s body, the details of the wrinkled skin around the child’s mouth — are somehow distinct from the meaning of the image. 

Because by diffusing these images into an AI model, and pairing existing text labels to them within the model, we extend the reach of Danish colonial power over the image. For centuries, archives have organized collections into assemblages shaped and informed by a vision of those with power over those whose power is held back. The colonizing eye sets the crying child into the category of amusements, where it lingers until unearthed and questioned.

If these images are diffused into new images — untraceable images, images that claim to be without context or lineage — how do we uncover the way that this power is wielded and infused into the datasets, the models, and the images ultimately produced by the assemblage? What obligations linger beneath the surfaces of things? 

Every Archive a Collage

Collage can be one path for people to access these images and evaluate their historical context. The human collage maker, the remixer, can assess and determine the appropriateness of the image for whatever use they have in mind. This can be an exercise of power, too, and it ought to be handled consciously. It has featured as a tool of Situationist detournement, a means of taking images from advertising and propaganda to reveal their contradictions and agendas. These are direct confrontations, artistic gestures that undermine the organization of the world that images impose on our sense of things. The collage can be used to exert power or challenge the status quo. 

Every archive is a collage, a way of asserting that there is a place for things within an emergent or imposed structure. The scholar and artist Beth Coleman’s work points to the reversal of this relationship, citing W.E.B. Du Bois’ exhibition at the 1900 Paris Exposition. M. Murphy writes,

“Du Bois’s use of [photographic] evidence disrupted racial kinds rather than ordered them … Du Bois’s exhibition was crucially not an exhibit of ‘facts’ and ‘data’ that made black people in Georgia knowable to study, but rather a portrait in variation and difference so antagonistic to racist sociology as to dislodge race as a coherent object of study” (71).

The imposed structures of algorithmically generated images rely on facts and data, defined a certain way. They struggle with context and difference. The images these tools produce are constrained to the central tendencies of the data they were trained on, an inherently conformist technology. 

To challenge these central tendencies means to engage with the structures it imposes on this data, and to critique this churn of images into data to begin with. Matthew Fuller and Eyal Weizman describe “hyper-aesthetic” images as not merely “part of a symbolic regime of representation, but actual traces and residues of material relations and of mediatic structures assembled to elicit them” (80). 

Consider the stereoscope. Once the most popular means of accessing photographs, the stereoscope relied on a trick of the eye, akin to the use of 3D glasses. It combined two visions of the same scene, taken from slightly to the left and slightly to the right of one another. When viewed through a special viewing device, the human eye superimposes them, and the overlap creates the illusion of physical depth in a flat plane. We can find some examples of these on Flickr (including the Danish Film Museum) or at The Library of Congress’ Stereograph collection.

The time period in which this technology was popular happened to overlap with an era of brutal colonization, and the archival artifacts of this era contain traces of how images projected power. 

I was struck by stereoscopic images of American imperialism in the Philippines during the US occupation, starting in 1899. They aimed to “bring to life” images of Filipino men dying in fields and other images of war, using the spectacle of the stereoscopic image as a mechanism for propaganda. These were circulated as novelties to Americans on the mainland, a way of asserting a gaze of dominance over those they occupied.

In the long American tradition of infotainment, the stereogram fused a novel technological spectacle with the effort to assert military might, paired with captions describing the US cause as just and noble while severely diminishing the numbers of civilian casualties. In Body Parts of Empire: Visual Abjection, Filipino Images, and the American Archive, Nerissa Balce writes that

“The popularity of war photographs, stereoscope viewers, and illustrated journals can be read as the public’s support for American expansion. It can also be read as the fascination for what were then new imperial ‘technologies of vision’” (52).

The link between stereograms as a style of image and the gaze of colonizing power is now deeply entrenched in the vector spaces of image synthesis systems. Prompt Midjourney for the style of a stereogram, and this history haunts the images it returns. Many prompted images for “Stereograms, 1900” do not even render the expected, highly formulaic structure of a stereogram (two of the same images, side by side, at a slight angle). They do, however, conjure images of those occupied lands. We see a visual echo of the colonizing gaze.

Images produced for the more generally used “stereoview,” even without the use of a date, still gravitate to a similar visual language. With “stereoview,” we are given the technical specifics of the medium. The content is more abstract: people are missing, but strongly suggested. These perhaps get me closest to the idea of a “haunted” image: a scene which suggests a history that I cannot directly access.

Perhaps there are two kinds of absences embedded in these systems. The people that colonizers want to erase, and then the evidence of the colonizers themselves. Crucially, this gaze haunts these images. 

Here are four sets of two pairs.

These styles are embedded into the prompt for the technology of image capture, the stereogram. The source material is inscribed with the gaze that controlled this apparatus. The method of that inscription — the stereogram — inscribes this material into the present images.  The history is loaded into the keyword and its neighboring associations in the vector space. History becomes part of the churn. These are new old images, built from the associations of a single word (stereoview) into its messy surroundings.

It’s important to remember that the images above are not documents of historical places or events. They’re “hallucinations,” that is, they are a sample of images from a spectrum of possible images that exists at the intersection of every image labeled “stereoview.” But “stereoview” as a category does not isolate the technology from how it was used. The technology of the stereogram, or the stereoviewer, was deeply integrated into regimes of war, racial hierarchies, and power. The gaze, and the subject, are both aggregated, diffused, and made to emerge through the churning of the model.

Technologies of Flattening

The stereoview and the diffusion models are both technologies of spectacle, and each affords a similar power to those who control it. They are technologies for flattening, containing, and re-contextualizing the world into a specific order. For viewers, the generated image is never merely the surfaces of photography churned into new, abstract forms that resemble our prompts. It is an activation of the model’s symbolic regime, which is derived from the corpus of images because it has the power to isolate images from their meaning.

AI has the power of finance, which enables computational resources that make obtaining 5 billion images for a dataset possible, regardless of its impact on local environments. It has the resources to train these images; the resources to recruit underpaid labor to annotate and sort these images. The critiques of AI infrastructure are numerous.

I am most interested here in one form of power that is the most invisible, which is the power of naturalizing and imposing an order of meaning through diffused imagery. The machine controls the way language becomes images. At the same time, it renders historical documentation meaningless — we can generate all kinds of historical footage now.

These images are reminders of the ways data colonialism has become embedded within not merely image generation but the infrastructures of machine learning. The scholar Tiara Roxanne has been investigating the haunting of AI systems long before me. In 2022 Roxanne noted that,

“in data colonialism, forms of technological hauntings are experienced when Indigenous peoples are marked as ‘other,’ and remain unseen and unacknowledged. In this way, Indigenous peoples, as circumscribed through the fundamental settler-colonial structures built within machine learning systems, are haunted and confronted by this external technological force. Here, technology performs as a colonial ghost, one that continues to harm and violate Indigenous perspectives, voices, and overall identities” (49).

AI can ignore “the traces and residues of material relations” (Fuller and Weizman) as it reduces the image to its surfaces instead of the constellations of power that structured the original material. These images are the product of imbalances of power in the archive, and whatever interests those archives protected are now protected by an impenetrable, uncontestable, automated set of decisions steered by the past.

The Abstracted Colonial Subject

What we see in the above images is an inscription by association. The generated image, as a type of machine learning system, matters not only because of how it structures history into the present. It matters because it is a visualization that reaches to something far greater about automated decision making and the power it exerts over others.

These striations of power in the archive or museum, in the census or the polling data, in the medical records or the migration records, determine what we see and what we do not. What we see in generated images must contort itself around what has been excluded from the archives. What is visible is shaped by the invisible. In the real world, this can manifest as families living on a street serving as an indication of those who could not live on that street. It could be that loans granted by an algorithmic assessment always contain an echo of loans that were not approved. 

The synthetic image visualizes these traces. They churn the surfaces, not the tangled reality beneath them. The images that emerge are glossy, professional, saturated. Hiding behind these products by and for the attention economy is the world of the not-seen. What are our obligations as viewers to the surfaces we churn when we prompt an image model? How do we reconcile our knowledge of context and history with the algorithmic detachment of these automated remixes?

The media scholar Roland Meyer writes that,

“[s]omewhere in the training data that feeds these models are photographs of real people, real places, and real events that have somehow, if only statistically, found their way into the image we are looking at. Historical reality is fundamentally absent from these images, but it haunts them nonetheless.”

In a seance, you raise spirits you have no right to speak to. The folly of it is the subject of countless warnings in stories, songs and folklore. 

What if we took the prompt this seriously? What if typing words to trigger an image was treated as a means of summoning a hidden and unsettled history? Because that is what the prompt does. It agitates the archives. Sometimes, by accident, it surfaces something many would not care to see. Boldly, knowing that I am acting from a place of privilege and power, I ask the system to return “the abstracted colonial subject of photography.” I know I am conjuring something I should not be.

My words are transmitted into the model within a data center, where they flow through a set of vectors, the in-between state of thousands of photographs. My words are broken apart into key words — “abstracted, colonial, colonial subject, subject, photography.” These are further sliced into numerical tokens to represent the mathematical coordinates of these ideas within the model. From there, these coordinates offer points of cohesion which are applied to find an image within a jpg of digital static. The machine removes the noise toward an image that exists in the overlapping space of these vectors.

Avery Gordon, whose book Ghostly Matters is a rich source of thinking for this research, writes:

“… if there is one thing to be learned from the investigation of ghostly matters, it is that you cannot encounter this kind of disappearance as a grand historical fact, as a mass of data adding up to an event, marking itself in straight empty time, settling the ground for a future cleansed of its spirit” (63).

If history is present in the archives, the images churned from the archive disrupt our access to the flow of history. It prevents us from relating to the image with empathy, because there is no single human behind the image or within it. It’s the abstracted colonial gaze of power applied as a styling tool. It’s a mass of data claiming to be history.

Human and Mechanical Readings

I hope you will indulge me as my eye wanders through the resulting image.

I am struck by the glossiness of it. Midjourney is fine-tuned toward an aesthetic dataset, leaning into images found visually appealing based on human feedback. I note the presence of palm trees, which brings me to the Caribbean island of St. Croix, where The Crying Child photograph was taken. I see the presence of barbed wire, a signifier of a colonial presence.

The image is a double exposure. It reminds me of spirit photography, in which so-called psychic photographers would surreptitiously photograph a ghostly puppet before photographing a client. The image of the “ghost” was superimposed on the film to emerge in the resulting photo. These are associations that come to my mind as I glance at this image. I also wonder about what I don’t know how to read: the style of the dress, the patterns it contains, the haircut, the particulars of vegetation.

We can also look at the image as a machine does. Midjourney’s describe feature will tell us what words might create an image we show it. If I use it with the images it produces, it offers a kind of mirror-world insight into the relationship between the words I’ve used to summon that image and the categories of images from which it was drawn.

To be clear, both “readings” offer a loose, intuitive methodology, keeping in the spirit of the seance — a Ouija board of pixel values and text descriptors. They are a way in to the subject matter, offering paths for more rigorous documentation: multiple images for the same prompt, evaluated together to identify patterns and the prevalence of those patterns. That reveals something about the vector space. 

Here, I just want to see something, to compare the image as I see it to what the machine “sees.”

The image returned for the abstracted colonial subject of photography is described by Midjourney this way:

“There is a man standing in a field of tall grass, inverted colors, tropical style, female image in shadow, portrait of bald, azure and red tones, palms, double exposure effect, afrofuturist, camouflage made of love, in style of kar wai wong, red and teal color scheme, symmetrical realistic, yellow infrared, blurred and dreamy illustration.”

My words produced an image, and then those words disappeared from the image that was produced. “Colonized Subject” is adjacent to the words the machine does see: “tall grass,” “afrofuturist,” “tropical.” Other descriptions recur as I prompt the model over and over again to describe this image, such as “Indian.” I have to imagine that this idea of colonized subjects “haunts” these keywords. The idea of the colonial subject is recognized by the system, but shuffled off to its nearest synonyms and euphemisms. Might this be a technical infrastructure through which the images are haunted? Could certain patterns of images be linked through unacknowledged, invisible categories the machine can only indirectly acknowledge?

I can only speculate. That’s the trouble with hauntings. It’s the limit to drawing any conclusions from these observations. But I would draw the reader’s attention to an important distinction between my actions as a collage artist and the images made by Midjourney. The image will be interpreted by many of us, who will find different ways to see it, and a human artist may put those meanings into adjacency through conscious decisions. But to create this image, we rely solely on a tool for automated churning.

We often describe the power of images in terms of what impact an image can have on the world. Less often we discuss the power that impacts the image: the power to structure and give the image form, to pose or arrange photographic subjects. 

Every person interprets an image in different ways. A machine makes images for every person from a fixed set of coordinates, its variety constrained by the borders of its data. That concentrates power over images into the unknown coordination of a black box system. How might we intervene and challenge that power?  

The Indifferent Archivist 

We have no business conjuring ghosts if we don’t know how to speak to them. For me as a collage artist, “remixing” in 2016 meant creating new arrangements from old materials, suggesting new interpretations of archival images. I was able to step aside: as a white man in California, I would never use the images of colonized people for something as benign as “expressing myself.” I would know that I could not speak to that history. Best to leave that power to shift meanings and shape new narratives to those who could speak to it. Nonetheless, it is a power that can be wielded by those who have no right to it.

Yet, by moving any accessible image from the online archive and transmuting it into training data, diffusion models assert this same power. But they are incapable of historic acknowledgement or obligation. The narratives of the source materials are blocked from view, in service to a technically embedded narrative that images are merely their surfaces and that surfaces are malleable. At its heart is the idea that the context of these images can be stripped and reduced into a molding clay, for anyone’s hands to shape to their own liking.

What matters is the power to determine the relationships our images have with the systems that include or exclude. It’s about the power to choose what becomes documented, and on what terms. Through directed attention, we may be able to work through the meanings of these gaps and traces. It is a useful antidote to the inattention of automated generalizations. To greet the ghosts in these archives presents an opportunity to intervene on behalf of complexity, nuance, and care.

That is the literal meaning of curation, at its Latin root: “curare,” to care. In this light, there is no such thing as automated curation.

Reclaiming Traceability

In 2021, Magda Tyzlik-Carver wrote “the practice of curating data is also an epistemological practice that needs interventions to consider futures, but also account for the past. This can be done by asking where data comes from. The task in curating data is to reclaim their traceability and to account for their lineage.”

When I started the “Ghost Stays in the Picture” research project, I intended to make linkages between the images produced by these systems and the categories within their training data. It would be a means of surfacing the power embedded into the source of this algorithmic churning within the vector space. I had hoped to highlight and respond to these algorithmic imaginaries by revealing the technical apparatus beneath the surface of generated images. 

In 2024, no mainstream image generation tool offers the access necessary for us to gather any insights into its curatorial patterns. The image dataset I initially worked with for this project is gone. Images of power and domination were the reason: specifically, the Stanford Internet Observatory’s discovery of more than 3,000 images in the LAION 5B dataset depicting abused children. When I realized this, the churn of images became visceral, in the pit of my stomach. The traces of those images, the pain of any person in the dataset, linger in the models. Perhaps imperceptibly, they shape the structures and patterns of the images I see.

In gathering these images, there was no right to refuse, no intervention of care. Ghosts, Odumosu writes, “make their presences felt, precisely in those moments when the organizing structure has ruptured a caretaking contract; when the crime has not been sufficiently named or borne witness to; when someone is not paying attention” (S299). 

The training of Generative Artificial Intelligence systems has relied upon the power to automate indifference. And if synthetic images are structured in this way, it is merely a visualization of how “artificial intelligence systems” structure the material world when carelessly deployed in other contexts. The synthetic image offers us a glimpse of what that world would look like, if only we would look critically at the structures that inform its spectacle. If we can read algorithmic decision-making as a lapse in care, a disintegration of accountability, we might see fresh pavement has been poured onto sacred land.

This regime of Artificial Intelligence is not an inevitability. It is not even a single ideology. It is a computer system, and computer systems, and norms of interaction and participation with those systems, are malleable. Even with training datasets locked away behind corporate walls, it might still be possible “to insist on care where there has historically been none” (Odumosu S297), and by extension, to identify and refuse the automated inscription of the colonizing ghost.

 

This post concludes my research work at the Flickr Foundation, but I am eager to continue it. I am seeking publishers of art books, or curators for art or photographic exhibitions, who may be interested in a longer set of essays or a curatorial project that explores this methodology for reading AI generated images. If you’re interested, please reach out to me directly: eryk.salvaggio@gmail.com.

The Ghost Stays in the Picture, Part 2: Data Casts Shadows

Eryk Salvaggio is a 2024 Flickr Foundation Research Fellow, diving into the relationships between images, their archives, and datasets through a creative research lens. This three-part series focuses on the ways archives such as Flickr can shape the outputs of generative AI in ways akin to a haunting. Read part one, or continue to part three.

“Today the photograph has transformed again.” – David A. Shamma, in a blog post announcing the YFCC100M dataset.

In part one of this series, I wrote about the differences between archives, datasets, and infrastructures. We explored the movement of images into archives through the simple act of sharing a photograph in an online showcase. We looked at the transmutation of archives into datasets: the ways those archives, composed of individual images, become a category unto themselves and are analyzed as an object of much larger scale. Once an archive becomes a dataset, seeing its contents as individual pieces, each with its own story and value, requires a special commitment to archival practices.

Flickr is an archive — a living and historical record of images taken by people living in the 21st century, a repository for visual culture and cultural heritage. It is also a dataset: the vast sum of this data, framed as an overwhelming challenge for organizing, sorting, and contextualizing what it contains. That data becomes AI infrastructure, as datasets made to aid the understanding of the archive become used in unexpected and unanticipated ways.  

In this post, I shift my analysis from image to archive to dataset, and trace the path of images as they become AI infrastructure — particularly in the field of data-driven machine learning and computer vision. I’ll again turn to the Flickr archive and datasets derived from it.

99.2 Million Rows

A key case study is a collection of millions of images shared in June 2014. That’s when Yahoo! Labs released the YFCC100M dataset, which contained 99.2 million rows of metadata describing photos by 578,268 Flickr members, all uploaded to Flickr between 2004 and 2014 and tagged with a CC license. The dataset contained information such as photo IDs, URLs, and a handful of metadata such as the title, tags, and description. I believe that the YFCC100M release was emblematic of a shift in the perception of the visual archive, both the public’s and Silicon Valley’s, into the category of “image datasets.”

Certainly, it wasn’t the first image dataset. Digital images had been collected into digital databases for decades, usually for the task of training image recognition systems, whether for handwriting, faces, or object detection. Many of these assembled similar images, such as Stanford’s dogs dataset or NVIDIA’s collection of faces. Nor was it the first transition that a curated archive made into the language of “datasets.” For example, the Tate Modern introduced a dataset of 70,000 digitized artworks in 2013.  

What made YFCC100M interesting was that it was so big, but also diverse. That is, it wasn’t a pre-assembled dataset of specific categories, it was an assortment of styles, subject matter, and formats. Flickr was not a cultural heritage institution but a social media network with a user base that had uploaded far more images than the world’s largest libraries, archives, or museums. In terms of pure photography, no institution could compete on scale and community engagement. 

The YFCC100M includes the description, tags, geotags, camera types, and links to 100 million source images. As a result, we see YFCC100M appear over and over again in papers about image recognition, and then image synthesis. It has been used to train, test, or calibrate countless machine vision projects, including high-rated image labeling systems at Google and OpenAI’s CLIP, which was essential to building DALL-E. Its influence in these systems rivals that of ImageNet, a dataset of 14 million images which was used as a benchmark for image recognition systems, though Nicolas Maleve notes that nearly half of ImageNet’s photos came from Flickr URLs. (ImageNet has been explored in-depth by Kate Crawford and Trevor Paglen.)

10,000 Images of San Francisco

It is always interesting to go in and look at the contents of a dataset, and I’m often surprised how rarely people do this. Whenever we dive into the actual content of datasets we discover interesting things. The YFCC100M dataset contains references to 200,000 images by photographer Andy Nystrom alone, a prolific street photographer who has posted nearly 8 million images to Flickr since creating their account in 2008. 

The dataset contains more than 10,000 images each of London, Paris, Tokyo, New York, San Francisco, and Hong Kong, which outnumber those of other cities. Note the gaps here: all cities of the Northern Hemisphere. When I ask Midjourney for an image of a city, I see traces of these locations in the output.
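
Looking inside a dataset in this way can begin with a simple tally. The sketch below is illustrative only: the rows, the `city` field, and the threshold are hypothetical stand-ins, not the YFCC100M schema or its real counts (the dataset is described as holding geotags rather than city names).

```python
from collections import Counter

# Hypothetical metadata rows, invented for illustration.
rows = [
    {"photo_id": 1, "city": "London"},
    {"photo_id": 2, "city": "Tokyo"},
    {"photo_id": 3, "city": "London"},
    {"photo_id": 4, "city": None},  # a photo with no location information
    {"photo_id": 5, "city": "Lagos"},
    {"photo_id": 6, "city": "London"},
]


def tally_cities(rows):
    """Count photos per city, skipping rows that have no location."""
    return Counter(row["city"] for row in rows if row["city"])


def cities_over(counts, threshold):
    """Return the cities with more than `threshold` photos, sorted by name."""
    return sorted(city for city, n in counts.items() if n > threshold)


counts = tally_cities(rows)
print(counts.most_common())
print(cities_over(counts, threshold=2))
```

A tally like this shows which places dominate, and the cities that never cross the threshold are the gaps the essay is concerned with.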

Are these strange hybrids a result of the prevalence of Flickr in the calibration and testing of these systems? Are they a bias accumulated through the longevity of these datasets and their embeddedness into AI infrastructures? I’m not confident enough to say for sure. But missing from the images produced from the generic prompt “city” are traces of what Midjourney considers an African city. When I prompt for an “African city” instead, what emerges are not shiny, glistening postcard shots or images that would be plastered on posters by the tourist bureau. Instead, they seem to affirm the worst of the colonizing imagination: unpaved roads, cars broken down in the street. The images for “city” are full of windows reflecting streaks of sunlight; for “African city,” these are windows absent of glass.

“A prompt about a ‘building in Dakar’ will likely return a deserted field with a dilapidated building while Dakar is a vibrant city with a rich architectural history,” notes the Senegalese curator Linda Dounia. She adds: “For a technology that was developed in our times, it feels like A.I. has missed an opportunity to learn from the fraught legacies that older industries are struggling to untangle themselves from.”

Beyond the training data, these legacies are also entangled in digital infrastructures. We know images from Flickr have come to shape the way computers represent the world, and how we define tests of AI-generated output as “realistic.” These definitions emerge from data, but also from infrastructures of AI. Here, one might ask if the process of calibrating images to places has been centered on the geographic regions where Flickr has access to ample images: 10,000 images each from cities of the Northern Hemisphere. These created categories for future assessment and comparison.

What we see in those images of an “African city” is what we don’t see in the dataset. What we see is what is missing from that infrastructure: 10,000 pictures of Lagos or Nairobi. When these images are absent from the training data, they influence the result. When they are absent from the classifiers and calibration tools, that absence is entrenched.

The sociologist Avery Gordon writes of ghosts, too. For Gordon, the ghost, or the haunting, is “the intermingling of fact, fiction and desire as it shapes the personal and social memory … what does the ghost say as it speaks, barely, in the interstices of the visible and invisible?” In these images, the ghost is the image not taken, the history not preserved, the gaps that haunt the archives. It’s clear these absences move into the data, too, and that the images of artificial intelligence are haunted by them, conjuring up images that reveal these gaps, if we can attune ourselves to see them.

There is a limit to this kind of visual infrastructural analysis of image generation tools — its reliance on intuition. There is always a distance between these representations of reality in the generated image and the reality represented in the datasets. Hence our language of the seance. It is a way of poking through the uncanny, to see if we can find its source, however remote the possibility may be.  

Representativeness

We do know a few things, in fact. We know this dataset was tested for representativeness, which was defined as how evenly it aligned with Flickr’s overall content, not the world at large. We know, then, that the dataset was meant to represent the broader content of Flickr as a whole, and that the biases of the dataset (such as the strong presence of these particular cities) are therefore the biases of Flickr. In 2024, an era where images have been scraped from the web wholesale for training data without warning or permission, we can ask if the YFCC100M dataset reflected the biases we see in tools like DALL-E and Midjourney. We can also ask if the dataset, in becoming a tool for measuring and calibrating these systems, may have shaped those biases as a piece of data infrastructure.
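
One way to make this narrow sense of “representativeness” concrete is to compare how categories are distributed in a sample against the collection it was drawn from. The sketch below uses total variation distance as a stand-in measure. This is an assumption for illustration: nothing here says which test was actually applied to the dataset, and the category counts are invented.

```python
def shares(counts):
    """Convert raw counts into proportions that sum to 1."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()}


def total_variation(sample_counts, full_counts):
    """Half the summed absolute difference between two share distributions.

    0.0 means the sample mirrors the collection; 1.0 means no overlap at all.
    """
    p, q = shares(sample_counts), shares(full_counts)
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Invented counts: a collection skewed toward a few cities, and a sample of it.
full_collection = {"London": 500, "Tokyo": 300, "Lagos": 20, "Nairobi": 10}
sample = {"London": 50, "Tokyo": 30, "Lagos": 2, "Nairobi": 1}

print(total_variation(sample, full_collection))
```

In this invented case the sample mirrors the collection, so it counts as representative while carrying the same skew. Being representative of an archive is not the same as being representative of the world.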

As biased data becomes a piece of automated infrastructure, we see biases come into play from factors beyond just the weights of the training data. They also come into play in the ways the system maps words to images, sorts out and rejects useful images, and more. One of the ways YFCC100M’s influence may shape these outcomes is through its role in training the OpenAI tool I mentioned earlier, called CLIP.

CLIP looks at patterns of pixels in an image and compares them to labels for similar sets of pixels. It’s a bridge that connects the descriptions of images to words of a user’s prompt. CLIP is a core connection point between words and images within generative AI. Recognizing whether an image resembles a set of words is how researchers decided what images to include in training datasets such as LAION 5B. 
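
The comparison at the heart of this matching can be sketched as cosine similarity between vectors. This is a toy illustration, not CLIP itself: the three-number “embeddings” and the labels below are made up, whereas a real model learns vectors with hundreds of dimensions from pairs of images and captions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_label(image_vector, label_vectors):
    """Return the label whose vector is most similar to the image's vector."""
    return max(
        label_vectors,
        key=lambda label: cosine_similarity(image_vector, label_vectors[label]),
    )


# Made-up vectors standing in for learned embeddings.
label_vectors = {
    "city": [0.9, 0.1, 0.0],
    "field": [0.1, 0.9, 0.2],
}
image_vector = [0.8, 0.3, 0.1]

print(best_label(image_vector, label_vectors))
```

The sketch can only ever answer with one of the labels it was given. Whatever the label set leaves out, the comparison cannot name.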

Calibration

CLIP’s training and calibration dataset contained a subset of YFCC100M, about 15 million images out of CLIP’s 400 million total. But CLIP was calibrated with, and its results tested against, classifications using YFCC100M’s full set. By training and calibrating CLIP against YFCC100M, that dataset played a role in establishing the “ground truth” that shaped CLIP’s ability to link images to text. 

CLIP was assessed on its ability to scale the classifications produced by YFCC100M and MS-COCO, another dataset which consisted entirely of images downloaded from Flickr. The result is that the logic of Flickr users and tagging has become deeply embedded into the fabric of image synthesis. The captions created by Flickr members modeled, and then shaped, the ways images of all kinds would be labeled in the future. In turn, that structured the ways machines determined the accuracy of those labels. If we want to look at the infrastructural influences of these digital “ghosts in the machine,” then the age, ubiquity, and openness of the YFCC100M dataset suggest it has a subtle but important role to play in the way images are produced by diffusion models.

We might ask about “dataset bias,” a form of bias that doesn’t refer to the dataset, or the archive, or the images they contain. Instead, it’s a bias introduced through the simple act of calling something a dataset, rather than acknowledging its constitutive pieces. This shift in focus shifts our relationship to these pieces, asking us to look at the whole. Might the idea of a “dataset” bias us from the outset toward ignoring context, and distract us from our obligation of care to the material it contains?  

From Drips Comes the Deluge

The YFCC100M dataset was paired with a paper, YFCC100M: The New Data in Multimedia Research, which focused on the needs of managing visual archives at scale. YFCC100M was structured as an index of the archive: a tool for generating insight about what the website held. The authors hoped it might be used to create tools for handling an exponential tide of visual information, rather than developing tools that contributed to the onslaught. 

The words “generative AI” never appear in the paper. It would have been difficult, in 2014, to anticipate that such datasets would be seen through a fundamental shift from “index” to “content” for image generation tools. That is a shift driven by the mindset of AI companies that rose to prominence years later.

In looking at the YFCC100M dataset and paper, I was struck by the difference between the problems it was established to address and the eventual, mainstream use of the dataset. Yahoo! released the paper in response to the problems of proprietary datasets, which they claimed were hampering replication across research efforts. The limits on the reuse of datasets also meant that researchers had to gather their own training data, which was a time consuming and expensive process. This is what made the data valuable enough to protect in the first place — an interesting historical counterpoint to today’s paradoxical claim by AI companies that image data is both rare and ubiquitous, essential but worth very little.  

Attribution

Creative Commons licensed pictures were selected for inclusion in order to facilitate the widest possible range of uses, with the authors noting that they were providing “a public dataset with clearly marked licenses that do not overly impose restrictions on how the data is used” (2). Only a third of the images in the dataset were marked as appropriate for commercial use, and 17% required only attribution. But, in accordance with the terms of the Creative Commons licenses used, every image in the dataset required attribution of some kind. When the dataset was shared with the public, it was assumed that researchers would use the dataset to determine how to use the images contained within it, picking images that complied with their own experiments.
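
Picking out compliant images in this way might look like the sketch below. The rows and the shorthand `license` strings are hypothetical and are not the dataset’s actual field values. The license elements themselves (BY for attribution, NC for non-commercial, SA for share-alike, ND for no derivatives) are the standard Creative Commons ones.

```python
def allows_commercial_use(license_code):
    """A CC license permits commercial use unless it carries the NC element."""
    return "NC" not in license_code.upper().split("-")


def requires_attribution(license_code):
    """Attribution is required whenever the license carries the BY element."""
    return "BY" in license_code.upper().split("-")


# Hypothetical rows, invented for illustration.
rows = [
    {"photo_id": 1, "license": "BY"},
    {"photo_id": 2, "license": "BY-NC"},
    {"photo_id": 3, "license": "BY-NC-SA"},
    {"photo_id": 4, "license": "BY-SA"},
    {"photo_id": 5, "license": "BY-NC-ND"},
]

commercial_ok = [r["photo_id"] for r in rows if allows_commercial_use(r["license"])]
attribution_only = [r["photo_id"] for r in rows if r["license"].upper() == "BY"]

print(commercial_ok, attribution_only)
print(all(requires_attribution(r["license"]) for r in rows))
```

Even the rows that pass the commercial filter still require attribution, so a filter like this narrows the obligations without removing them.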

The authors of the paper acknowledge that archives are growing beyond our ability to parse them as archivists. But they also acknowledge Flickr as an archive, that is, a site of memory: 

“Beyond archived collections, the photostreams of individuals represent many facets of recorded visual information, from remembering moments and storytelling to social communication and self-identity [19]. This presents a grand challenge of sensemaking and understanding digital archives from non-homogeneous sources. Photographers and curators alike have contributed to the larger collection of Creative Commons images, yet little is known on how such archives will be navigated and retrieved, or how new information can be discovered therein.”

Despite this, there was a curious contradiction in the way Yahoo! Labs structured the release of the dataset. The least restrictive license in the dataset is CC-BY — images where the license requires attribution. Nearly 68 million out of the 100 million images in the dataset specifically stated there could be no commercial use of their images. Yet, the dataset itself was then released without any restrictions at all, described as “publicly and freely usable.”  

The dataset of YFCC100M wasn’t the images themselves. It was the list of images, a sample of the larger archive that was made referenceable as a way to encourage researchers to make sense of the scale of image hosting platforms. The strange disconnect between boldly declaring the contents as CC-licensed, while making them available to researchers to violate those licenses, is perhaps evident only in hindsight.

Publicly Available

It may not have been a deliberate violation of boundaries so much as it was a failure to grapple with the ways boundaries might be transgressed. The paper, then, serves as a unique time capsule for understanding the shift from the logic of datasets as descriptions of things to the understanding of datasets as the collection of things themselves. This is a logic that we can see carried out in the relationships that AI companies have to the data they use. These companies see the datasets as markedly different from the images that the data refers to, suggesting that they have the right to use datasets of images under “fair use” rules that apply to data, but not to intellectual property.

This breaks with the early days of datafication and machine learning, which made clearer distinctions between the description of an archive and the archive itself. When Stability AI used LAION 5B as a set of pointers to consumable content, this relationship between description and content collapsed. What was a list of image URLs and the text describing what would be found there became pointers to training data. The context was never considered. 

That collapse is the result of a fairly recent set of beliefs about the world which increasingly sees the “image” as an assemblage of color information paired with technical metadata. We hear echoes of this in the defense of AI companies, that their training data is “publicly available,” a term with no actual, specific meaning. OpenAI says that CLIP was trained on “text–image pairs that are already publicly available” in its white paper.

In releasing the dataset, Yahoo’s researchers may have contributed to a shift: from understanding online platforms through the lens of archives, into understanding them as data sources to be plundered. Luckily, it’s not too late to reassert this distinction. Visual culture, memory, and history can be preserved through a return to the original mission of data science and machine learning in the digital humanities. We need to make sense of a growing number of images, which means preserving and encouraging new contexts and relationships between images rather than replacing them with context-free abstractions produced by diffusion models. 

Generative AI is a product of datasets and machine learning and digital humanities research. But in the past ten years, data about images and the images themselves have become increasingly interchangeable. Datasets were built to preserve and study metadata about images. But now, the metadata is stripped away, aside from the URL, which is used to analyze an image. The image is translated into abstracted information, ignoring where these images came from and the meaning – and relationships of power – that are embedded into what they depict. In erasing these sources, we lose insight into what they mean and how they should be understood: whether an image of a city was taken by a tourism board or an aid agency, for example. The biases that result from these absences are made clear.

Correcting these biases requires care and attention. It requires pathways for intervention and critical thinking about where images are sourced. It means prioritizing context over convenience. Without attention to context, correcting the source biases is far more challenging.

Data Casts Shadows

In my fellowship with the Flickr Foundation, I am continuing my practice with AI, looking at the gaps between archives and data, and data and infrastructures, through the lens of an archivist. It is a creative research approach that examines how translations of translations shape the world. I am deliberately relying on the language of intuition — ghosts, hauntings, the ritual of the seance — to encourage a more human-scaled, intuitive relationship to this information. It’s a rebuttal of the idea that history, documentation, images and media can be reduced to objective data. 

That means examining the emerging infrastructure built on top of data, and returning to the archival view to see what was erased and what remains. What are the images in this dataset? What do they show us, and what do they mean? Malevé writes that to become AI infrastructure, a Flickr image is pulled from the context of its original circulation, losing currency. It is relabeled by machines, and even the associations of metadata itself become superfluous to the goal of image alignment. All that matters is what the machine sees and how it compares to similar images. The result is a calibration: the creation of a category. The original image is discarded, but the residue of whatever was learned lingers in the system.

While writing this piece, I became transfixed by shadows within synthetic images. Where does the shadow cast in an AI generated image come from? It doesn’t come from the sun, because there is no sunlight within the black box of the AI system. Despite the hype, these models do not understand the physics of light, but merely produce traces of light abstracted from other sources.

Unlike photographic evidence, synthetic photographs don’t rely on being present to the world of light bouncing from objects onto film or sensors. The shadows we see in an AI generated image are the shadows cast by other images. The generated image is itself a shadow of shadows, a distortion of a distortion of light. The world depicted in the synthetic image is always limited to the worlds pre-arranged by the eyes of countless photographers. Those arrangements are further extended and mediated as these data shadows stretch across datasets, calibration systems, engineering decisions, design choices and automated processes that ignore or obscure their presence.

Working Backward from the Ghost

When we don’t know the source of decisions made about the system, the result is unexplainable, mysterious, spooky. But image generation platforms are a series of systems stacked on top of one another, trained on hastily assembled stews of image data. The outcomes go through multiple steps of analysis and calibration, outputs of one machine fed into another. Most of these systems are built upon a subset of human decisions scaled to cover inhuman amounts of information. Once automated, these decisions become disembodied, influencing the results.

In part 3 – the conclusion of this series – I’ll examine a means of reading AI generated images through the lens of power, hoping to reveal the intricate entanglement of context, control, and shifting meanings within text and image pairs. Just as shadows move across the AI generated image, so too, I propose, does the gaze of power contained within the archives.

I’ll attempt to trace the flow of power and meaning through datasets and data infrastructures that produce these prompted images, working backwards from what is produced. Where do these training images come from? What stories and images do they contain, or lack? In some ways, it is impossible to parse, like a ghost whose message from the past is buried in cryptic riddles. A seance is rarely satisfying, and shadows disappear under a flashlight.

But it’s my hope that learning to read and uncover these relationships improves our literacy about so-called AI images, and how we relate to them beyond toys for computer art. Rather, I hope to show that these are systems that perpetuate power, through inclusion and exclusion, and the sorting logic of automated computation. The more we automate a system, the more the system is haunted by unseen decisions. I hope to excavate the context of decisions embedded within the system and examine the ways that power moves through it. Otherwise, the future of AI will be dictated by what can most easily be forgotten.  

Read part three here.

***

I would be remiss not to point to the excellent and abundant work on Flickr as a dataset that has been published by Katrina Sluis and Nicolas Malevé, whose work is cited here but merits a special thank you in shaping the thinking throughout this research project. I am also grateful to scholars such as Timnit Gebru, whose work on dataset auditing has deeply informed this work, and to Dr. Abeba Birhane, whose work on the content of the LAION 5B dataset has inspired this creative research practice. 

In the images accompanying this text, I’ve paired images created in Stable Diffusion 1.6 for the prompt “Flickr.com street shadows.” They’re paired with images from actual Flickr members. I did not train AI on these photos, nor did I reference the originals in my prompts. But by pairing the two, we can see the ways that the original Flickr photos might have formed the hazy structures of those generated by Stable Diffusion. 

Improving millions of files on Wikimedia Commons with Flickypedia Backfillr Bot

Last year, we built Flickypedia, a new tool for copying photos from Flickr to Wikimedia Commons. As part of our planning, we asked for feedback on Flickr2Commons and analysed other tools. We spotted two consistent themes in the community’s responses:

  • Write more structured data for Flickr photos
  • Do a better job of detecting duplicate files

We tried to tackle both of these in Flickypedia, and initially, we were just trying to make our uploader better. Only later did we realize that we could take our work a lot further, and retroactively apply it to improve the metadata of the millions of Flickr photos already on Wikimedia Commons. At that moment, Flickypedia Backfillr Bot was born. Last week, the bot completed its millionth update, and we guesstimate we will be able to operate on another 13 million files.

The main goals of the Backfillr Bot are to improve the structured data for Flickr photos on Wikimedia Commons and to make it easier to find out which photos have been copied across. In this post, I’ll talk about what the bot does, and how it came to be.

Write more structured data for Flickr photos

There are two ways to add metadata to a file on Wikimedia Commons: by writing Wikitext or by creating structured data statements.

When you write Wikitext, you write your metadata in a MediaWiki-specific markup language that gets rendered as HTML. This markup can be written and edited by people, and the rendered HTML is designed to be read by people as well. Here’s a small example, which adds some metadata to a file, linking it back to the original Flickr photo:

== {{int:filedesc}} ==
{{Information
|Description={{en|1=Red-whiskered Bulbul photographed in Karnataka, India.}}
|Source=https://www.flickr.com/photos/shivanayak/12448637/
|Author=[[:en:User:Shivanayak|Shiva shankar]]
|Date=2005-05-04
|Permission=
|other_versions=
}}

and here’s what that Wikitext looks like when rendered as HTML:

A table with four rows: Description (Red-whiskered Bulbul photographed in Karnataka, India), Date (4 May 2005), Source (a Flickr URL) and Author (Shiva shankar)

This syntax is convenient for humans, but it’s fiddly for computers – it can be tricky to extract key information from Wikitext, especially when things get more complicated.

In 2017, Wikimedia Commons added support for structured data. This allows editors to add metadata in a machine-readable format. This makes it much easier to edit metadata programmatically, and there’s a strong desire from the community for new tools to write high-quality structured metadata that other tools can use.

When you add structured data to a file, you create “statements” which are attached to properties. The list of properties is chosen by the volunteers in the Wikimedia community.

For example, there’s a property called “source of file” which is used to indicate where a file came from. The file in our example has a single statement for this property, which says the file is available on the Internet, and points to the original Flickr URL:

Structured data is exposed via an API, and you can retrieve this information in nice machine-readable XML or JSON:

$ curl 'https://commons.wikimedia.org/w/api.php?action=wbgetentities&sites=commonswiki&titles=File%3ARed-whiskered%20Bulbul-web.jpg&format=xml'
<?xml version="1.0"?>
<api success="1">
  …
  <P7482>
    …
    <P973>
      <_v snaktype="value" property="P973">
        <datavalue
          value="https://www.flickr.com/photos/shivanayak/12448637/"
          type="string"/>
      </_v>
    </P973>
    …
  </P7482>
</api>

(Here “P7482” means “source of file” and “P973” is “described at URL”.)
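To give a rough sense of why this format is easier for programs to work with, here is a minimal Python sketch that pulls the “described at URL” value out of a “source of file” statement. It makes some assumptions: that the JSON form of the response keeps statements in a dictionary keyed by property ID, with qualifiers nested underneath in the same way as the XML above. The function name and the sample dictionary are our own, written by hand for illustration, not captured from the API.

```python
def find_source_urls(entity):
    """Return every "described at URL" (P973) value attached as a
    qualifier to a "source of file" (P7482) statement.

    `entity` is assumed to be a dict shaped like one entity in the
    JSON response, with a "statements" key.
    """
    urls = []
    for statement in entity.get("statements", {}).get("P7482", []):
        for qualifier in statement.get("qualifiers", {}).get("P973", []):
            # Only snaks of type "value" carry a datavalue, so skip the rest.
            if qualifier.get("snaktype") == "value":
                urls.append(qualifier["datavalue"]["value"])
    return urls


# Hand-written sample mirroring the XML above (not a real API response).
sample_entity = {
    "statements": {
        "P7482": [
            {
                "qualifiers": {
                    "P973": [
                        {
                            "snaktype": "value",
                            "property": "P973",
                            "datavalue": {
                                "value": "https://www.flickr.com/photos/shivanayak/12448637/",
                                "type": "string",
                            },
                        }
                    ]
                }
            }
        ]
    }
}

print(find_source_urls(sample_entity))
```

Compare this with Wikitext, where the same URL would have to be dug out of free-form template markup.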

Part of being a good structured data citizen is following the community’s established patterns for writing structured data. Ideally every tool would create statements in the same way, so the data is consistent across files – this makes it easier to work with later.

We spent a long time discussing how Flickypedia should use structured data, and we got a lot of helpful community feedback. We’ve documented our current data model as part of our Wikimedia project page.

Do a better job of detecting duplicate files

If a photo has already been copied from Flickr onto Wikimedia Commons, nobody wants to copy it a second time.

This sounds simple – just check whether the photo is already on Commons, and don’t offer to copy it if it’s already there. In practice, it’s quite tricky to tell if a given Flickr photo is on Commons. There are two big challenges:

  1. Files on Wikimedia Commons aren’t consistent in where they record the URL of the original Flickr photo. Newer files put the URL in structured data; older files only put the URL in Wikitext or the revision descriptions. You have to look in multiple places.
  2. Files on Wikimedia Commons aren’t consistent about which form of the Flickr URL they use – with and without a trailing slash, with the user NSID or their path alias, or the myriad other URL patterns that have been used in Flickr’s twenty-year history.

Here’s a sample of just some of the different URLs we saw in Wikimedia Commons:

https://www.flickr.com/photos/joyoflife//44627174
https://farm5.staticflickr.com/4586/37767087695_bb4ecff5f4_o.jpg
www.flickr.com/photo_edit.gne?id=3435827496
https://www.flickr.com/photo.gne?short=2ouuqFT

There’s no easy way to query Wikimedia Commons and see if a Flickr photo is already there. You can’t, for example, do a search for the current Flickr URL and be sure you’ll find a match – it wouldn’t find any of the examples above. You can combine various approaches that will improve your chances of finding an existing duplicate, if there is one, but it’s a lot of work and you get varying results.
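To make the problem concrete, here is a simplified Python sketch that tries to recover the numeric photo ID from URLs like the samples above. It is not our URL parsing library, just an illustration using the standard library; the function name `guess_photo_id` is hypothetical, and it only knows about three URL shapes. It leaves out short IDs like the `short=` example, which need decoding first.

```python
import re
from urllib.parse import parse_qs, urlsplit


def guess_photo_id(url):
    """Best-effort guess at the numeric Flickr photo ID in a URL.

    Returns None for any URL shape this sketch doesn't recognise.
    """
    # Some URLs are recorded without a scheme; add one so urlsplit
    # treats "www.flickr.com" as the host rather than part of the path.
    if "://" not in url:
        url = "https://" + url
    parts = urlsplit(url)
    host = parts.netloc.lower()

    # Static image files: <photo id>_<secret>[_<size>].jpg
    if host.endswith("staticflickr.com") or host.endswith("static.flickr.com"):
        match = re.search(r"/(\d+)_[0-9a-f]+(?:_[a-z0-9]+)?\.jpg$", parts.path)
        return match.group(1) if match else None

    if host in ("flickr.com", "www.flickr.com"):
        # Old-style .gne pages carry the ID in the query string.
        if parts.path.endswith(".gne"):
            ids = parse_qs(parts.query).get("id", [])
            return ids[0] if ids and ids[0].isdigit() else None

        # /photos/<user>/<photo id>, tolerating doubled slashes.
        segments = [s for s in parts.path.split("/") if s]
        if len(segments) >= 3 and segments[0] == "photos" and segments[2].isdigit():
            return segments[2]

    return None


for sample in [
    "https://www.flickr.com/photos/joyoflife//44627174",
    "https://farm5.staticflickr.com/4586/37767087695_bb4ecff5f4_o.jpg",
    "www.flickr.com/photo_edit.gne?id=3435827496",
    "https://www.flickr.com/photo.gne?short=2ouuqFT",
]:
    print(sample, "->", guess_photo_id(sample))
```

Even this small sketch needs a separate branch for each URL shape, which is why a plain text search for one canonical URL misses so many files.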

For the first version of Flickypedia, we took a different approach. We downloaded snapshots of the structured data for every file on Wikimedia Commons, and we built a database of all the links between files on Wikimedia Commons and Flickr photos. For every file in the snapshot, we looked at the structured data properties where we might find a Flickr URL. Then we tried to parse those URLs using our Flickr URL parsing library, and find out what Flickr photo they point at (if any).

This gave us a SQLite database that mapped Flickr photo IDs to Wikimedia Commons filenames. We could use this database to do fast queries to find copies of a Flickr photo that already exist on Commons. This proved the concept, but it had a few issues:

  • It was an incomplete list – we only looked in the structured data, and not the Wikitext. We estimate we were missing at least a million photos.
  • Nobody else can use this database; it only lives on the Flickypedia server. Theoretically somebody else could create it themselves – the snapshots are public, and the code is open source – but it seems unlikely.
  • This database is only as up-to-date as the latest snapshot we’ve downloaded – it could easily fall behind what’s on Wikimedia Commons.
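For readers who want a picture of the lookup database described above, here is a minimal sketch using Python’s built-in sqlite3 module. The table, column and function names are hypothetical and chosen for illustration; the real schema may well differ.

```python
import sqlite3


def create_lookup_db():
    """Create an in-memory database linking Flickr photo IDs to Commons files."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE flickr_photos_on_commons (
            flickr_photo_id  TEXT NOT NULL,
            commons_filename TEXT NOT NULL
        )
        """
    )
    # The index keeps lookups by photo ID fast as the table grows.
    conn.execute(
        "CREATE INDEX idx_flickr_photo_id "
        "ON flickr_photos_on_commons (flickr_photo_id)"
    )
    return conn


def record_link(conn, flickr_photo_id, commons_filename):
    """Record that a Commons file is a copy of a Flickr photo."""
    conn.execute(
        "INSERT INTO flickr_photos_on_commons VALUES (?, ?)",
        (flickr_photo_id, commons_filename),
    )


def find_copies(conn, flickr_photo_id):
    """Return the Commons filenames recorded for a Flickr photo ID."""
    rows = conn.execute(
        "SELECT commons_filename FROM flickr_photos_on_commons "
        "WHERE flickr_photo_id = ? ORDER BY commons_filename",
        (flickr_photo_id,),
    )
    return [filename for (filename,) in rows]


conn = create_lookup_db()
record_link(conn, "12448637", "File:Red-whiskered Bulbul-web.jpg")
print(find_copies(conn, "12448637"))
```

One photo ID can map to several files, which is why the lookup returns a list and not a single filename.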

We wanted to make this process easier – both for ourselves, and anybody else building Flickr–Wikimedia Commons integrations.

Adding the Flickr Photo ID property

Every photo on Flickr has a unique numeric ID, so we proposed a new Flickr photo ID property to add to structured data on Wikimedia Commons. This proposal was discussed and accepted by the Wikimedia Commons community, and gives us a better way to match files on Wikimedia Commons to photos on Flickr:

This is a single field that you can query, and there’s an unambiguous, canonical way that values should be stored in this field – you don’t need to worry about the different variants of Flickr URL.

We added this field to Flickypedia, so any files uploaded with our tool will get this new field, and we hope that other Flickr upload tools will consider adding this field as well. But what about the millions of Flickr photos already on Wikimedia Commons? This is where Flickypedia Backfillr Bot was born.

Updating millions of files

Flickypedia Backfillr Bot applies our structured data mapping to every Flickr photo it can find on Wikimedia Commons – whether or not it was uploaded with Flickypedia. For every photo which was copied from Flickr, it compares the structured data to the live Flickr metadata, and updates the structured data if the two don’t match. This includes the Flickr Photo ID.

It reuses code from our duplicate detector: it goes through a snapshot looking for any files that come from Flickr photos. Then it gets metadata from Flickr, checks if the structured data matches that metadata, and if not, it updates the file on Wikimedia Commons.

Here’s a brief sketch of the process:

Most of the time this logic is fairly straightforward, but occasionally the bot will get confused – this is when the bot wants to write a structured data statement, but there’s already a statement with a different value. In this case, the bot will do nothing and flag it for manual review. There are edge cases and unusual files in Wikimedia Commons, and it’s better for the bot to do nothing than write incorrect or misleading data that will need to be reverted later.
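The “do nothing if in doubt” rule can be written down in a few lines of Python. This is a simplified sketch and not the bot’s actual code: it assumes each statement can be reduced to a plain value that can be compared for equality, and the function name and return labels are made up for illustration.

```python
def plan_update(existing_values, new_value):
    """Decide what to do for one property on one file.

    `existing_values` is the list of values already stored for the
    property; `new_value` is what the live Flickr metadata suggests.
    """
    if not existing_values:
        # Nothing there yet, so it's safe to write the new statement.
        return "add"
    if new_value in existing_values:
        # The structured data already matches Flickr.
        return "skip"
    # There's a statement with a different value: don't guess,
    # leave the file alone and let a person look at it.
    return "flag_for_review"


print(plan_update([], "value from Flickr"))
print(plan_update(["value from Flickr"], "value from Flickr"))
print(plan_update(["something else"], "value from Flickr"))
```

The important design choice is that the third branch never overwrites anything.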

Here are two examples:

  • Sometimes Wikimedia Commons has more specific metadata than Flickr. For example, this Flickr photo was posted by the Donostia Kultura account, and the description identifies Leire Cano as the photographer.

    Flickypedia Backfillr Bot wants to add a creator statement for “Donostia Kultura”, because it can’t understand the description – but when this file was copied to Wikimedia Commons, somebody added a more specific creator statement for “Leire Cano”.

    The bot isn’t sure which statement is correct, so it does nothing and flags this for manual review – and in this case, we’ve left the existing statement as-is.

  • Sometimes existing data on Wikimedia Commons has been mapped incorrectly. For example, this Flickr photo was taken “circa 1943”, but when it was copied to Wikimedia Commons somebody added an overly precise “date taken” statement claiming it was taken on “1 Jan 1943”.

    This bug probably occurred because of a misunderstanding of the Flickr API. The Flickr API will always return a complete timestamp in the “date” field, and then return a separate granularity value telling you how accurate it is. If you ignored that granularity value, you’d create an incorrect statement of what the date is.

    The bot isn’t sure which statement is correct, so it does nothing and flags this for manual review – and in this case, we made a manual edit to replace the statement with the correct date.
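The date example is easier to see in code. The sketch below assumes the granularity codes listed in Flickr’s API documentation for “date taken” values (0 for an exact timestamp, 4 for month, 6 for year, 8 for circa); the function name is hypothetical, and the year and month are read by position so that the timestamp is never treated as more precise than the granularity allows.

```python
def describe_date_taken(taken, granularity):
    """Render a Flickr "date taken" value no more precisely than its
    granularity allows.

    `taken` is a timestamp string like "1943-01-01 00:00:00";
    `granularity` is the accompanying code (int or numeric string).
    """
    year, month = taken[0:4], taken[5:7]
    granularity = int(granularity)

    if granularity == 0:
        return taken                 # exact date and time
    if granularity == 4:
        return f"{year}-{month}"     # month and year only
    if granularity == 6:
        return year                  # year only
    if granularity == 8:
        return f"circa {year}"       # approximate year
    raise ValueError(f"Unrecognised granularity: {granularity!r}")


print(describe_date_taken("1943-01-01 00:00:00", 8))
```

Ignoring the second argument and using the timestamp as-is is what turns “circa 1943” into “1 Jan 1943”.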

What next?

We’re going to keep going! There were a few teething problems when we started running the bot, but the Wikimedia community helped us fix our mistakes. It’s now been running for a month or so, and processed over a million files.

All the Flickypedia code is open source on GitHub, and a lot of it isn’t specific to Flickr – it’s general-purpose code for working with structured data on Wikimedia Commons, and could be adapted to build similar bots. We’ve already had conversations with a few people about other use cases, and we’ve got some sketches for how that code could be extracted into a standalone library.

We estimate that at least 14 million files on Wikimedia Commons are photos that were originally uploaded to Flickr – more than 10% of all the files on Commons. There’s plenty more to do. Onwards and upwards!

Data Lifeboat 5: Prototypes and policy

We are now past the midpoint of our first project stage, and have our three basic prototype Data Lifeboats. At the moment, they run locally via the command line and generate rough versions of what Data Lifeboats will eventually contain—data and pictures.

The last step for those prototypes is to move them into a clicky web prototype showing the full workflow—something we will share with our working group (but may not put online publicly). We are working towards completing this first prototyping stage around the end of June and writing up the project in July.

We’ve made a few key decisions since we last posted an update, namely about who we’re designing for and what other expertise we need to bring in. We still have more questions than answers, but really, that’s what prototyping is for.

Who might do which bit

It took us a while to get to this decision, but once we had gone through the initial discovery phase, it became clear that we need to concentrate our efforts on three key user groups:

  1. Flickr members – People who’ve uploaded pictures to Flickr, have set licenses and permissions, and may either be happy or not happy for their pictures to be put into Data Lifeboats.
  2. Data Lifeboat creators – Could be archivists or other curatorial types looking to gather sets of pictures to copy into archives elsewhere, whether that be an institution like The Library of Congress, or a family archivist with a Dropbox account.
  3. Dock operators – This group is a bit more speculative, but we envision that Data Lifeboats could actually land (or dock) in specific destinations and be treated with special care there. Our ideal scenario would be to develop a network of docks (something we’ve been calling a “Safe Harbor Network”) made up of members that are our great and good cultural organizations: they are already really good at keeping things safe over the long term.

It’ll be good to flesh the needs and wants of these three groups out in more detail in our next stage. If you are a Flickr member reading this, and want to share your story about what your Flickr account means to you, we’d love to hear it.

Web archive vs object archive

Some digital/web preservation experts take the opinion that it’s archivally important to also archive the user interface of a digital property in order to fully understand a digital object’s context. This has arguably resulted in web archives containing a whole lot more information and structural stuff than is useful or necessary. It’s sort of like archiving the entire house within which the shoebox of photos was found.

We have decided that archiving the flickr.com interface itself is not necessary for a Data Lifeboat, and we will be designing a special viewer that will live inside each Data Lifeboat to help people explore its contents.

Analysing the need for new policy

The Data Lifeboat idea is about so much more than technology. Even though that’s certainly challenging, the more we think about it, the more challenging the social and ethical aspects are. It’s gritty, complex stuff, made more so by the delicate socio-technical settings available to Flickr members, like privacy, search settings, and licensing. The crosshatch of these three vectors makes managing stable permissions over time harder than weaving a complicated textile!

Once we narrowed down our focus to these specific user groups it also became clear that we need to address the (very) complex legal landscape surrounding the potential for archiving of Flickr images external to the service. It’s particularly gnarly when you start considering how permissions might change over time, or how access might shift for different scales of audience. For example, a Flickr member might be happy for Data Lifeboats containing their images to be shared with friends of friends, but a little apprehensive about them being shared with a recognized cultural institution that would use them for research. They may be much less happy for their Flickr pictures to be fully archived and available to anyone in perpetuity.

To help us explore these questions, and begin prototyping policies for each type of user group we foresee, we have enlisted the help of Dr. Andrea Wallace of the Law School at the University of Exeter. She is working with us to develop legal and policy frameworks tailored to the needs of each of these three groups, and to study how the current Flickr Terms of Service may be suitable for, or need adaptation around, this idea of a Data Lifeboat. This may include drafting terms and conditions needed to create a Data Lifeboat, how we might be able to enhance rights management, and exploring how to manage expiration or decay of privacy or licensing into the future.

Data Lifeboat prototypes

We have generated three different prototype Data Lifeboats to think with, and show to our working group:

  1. Photos tagged with “Flickrhq”: This prototype includes thousands of tagged images of ‘life working at Flickr’, which is useful to explore the tricky aspects of collating other people’s pictures into a Data Lifeboat. Creating it revealed a search foible, whereby the result set that is delivered by searching via a tag is not consistent. Many of the pictures are also marked as All Rights Reserved, with 33% having downloads disabled. This raises juicy questions about licensing and permissions that need further discussion.
  2. Two photos from each Flickr Commons Member: We picked this subset because Flickr Commons photos are earmarked with the ‘no known copyright restrictions’ assertion, so questions about copying or reusing are theoretically simpler. 
  3. All photos from the Library of Congress (LoC) account: Comprising roughly 42,000 photos also marked as “no known copyright restrictions,” this prototype contains a set that is simpler to manage as all images have a uniform license setting. It was also useful to generate a Data Lifeboat of this size as it allowed us to do some very early benchmarking on questions like how long it takes to create one and where changes to our APIs might be helpful.

Preparing these prototypes has underscored the challenges of balancing the legal, social, and technical aspects of this kind of social media archiving, making clear the need for a special set of terms & conditions for Data Lifeboat creation. They also reveal the limitations of tags in capturing all relevant content (which, to some extent, we were expecting) and the user-imposed restrictions set on images in the Flickr context, like ‘can be downloaded.’

Remaining questions?

OMG, so many. Although the prototypes are still in progress, they have already stimulated great discussion and raised some key questions, such as:

  • How might user intentions or permissions change over time and how could software represent them?
  • How could the scope or scale of sharing influence how shared images are perceived, updated, and utilized?
  • How can we understand the different use cases, and how archivists/librarians could engage with the Data Lifeboats?
  • How important is it to make sure Data Lifeboats are launched with embedded rights information, and how might those decay over time?
  • How should we be considering the descriptive or social contexts that accompany images, and how should they inform subsequent decisions about expiration dates?

Long term sustainability and funding models

It’s really so early to be talking about this – and we’re definitely not ready to present any actual, reasonable, viable models here because we don’t know enough yet about how Data Lifeboats could be used or under what circumstances. We did do a first pass review of some obvious potential business models, for example:

  • A premium subscription service that allows Flickr.com users to create personalized Data Lifeboats for their own collections.
  • A consulting service for institutions and individuals who want to create Data Lifeboats for specific archival purposes.
  • Developing training and certification programs for digital archiving that use Data Lifeboats as the foundation.
  • Membership fees for members of the Safe Harbor network, or charging fees for access to the Data Lifeboat archives.

While there were aspects to each that appealed to our partners, there were also significant flaws, so overall we’re still a long way from having an answer. This is something else we’re planning to explore more broadly in partnership with the wider Flickr Commons membership in subsequent phases of this project.

Next steps

This month we’ll be wrapping up this first prototyping phase supported by the National Endowment for the Humanities. After we’ve completed the required reporting, we’ll move into the next phase in earnest, reaching out to those three user groups more deliberately to learn more about how Data Lifeboats could operate for them and what they would need them to do. 

Two upcoming in-person events!

We’re also very happy to be able to tell you the Mellon Foundation has awarded us a grant to support this next stage, and we’re especially looking forward to running two small events later in the year to gather people from our Flickr Commons partner institutions, as well as other birds of a feather, to discuss these key challenges together.

If you’d like to register your interest in attending one of these meetings, please let us know via this short Registration of Interest form. Please note that these will be small, maybe 20ish people at each, so registering interest does not guarantee a spot. We’ve only just begun planning in earnest.

 

The surprising utility of a Flickr URL parser

In my first week at the Flickr Foundation, we made a toy called Flinumeratr. This is a small web app that takes a Flickr URL as input, and shows you all the photos which are present at that URL.

As part of this toy, I made a Python library which parses Flickr URLs, and tells you what the URL points to – a single photo, an album, a gallery, and so on. Initially it just handled fairly common patterns, the sort of URLs that you’d encounter if you use Flickr today, but it’s grown to handle more complicated URLs.

$ flickr_url_parser "https://www.flickr.com/photos/sdasmarchives/50567413447"
{"type": "single_photo", "photo_id": "50567413447"}

$ flickr_url_parser "https://www.flickr.com/photos/aljazeeraenglish/albums/72157626164453131"
{"type": "album", "user_url": "https://www.flickr.com/photos/aljazeeraenglish", "album_id": "72157626164453131", "page": 1}

$ flickr_url_parser "https://www.flickr.com/photos/blueminds/page3"
{"type": "user", "user_url": "https://www.flickr.com/photos/blueminds"}

The implementation is fairly straightforward: I use the hyperlink library to parse the URL text into a structured object, then I compare that object to a list of known patterns. Does it look like this type of URL? Or this type of URL? Or this type of URL? And so on.
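Here is a cut-down sketch of that “does it look like this type of URL?” approach. It is not the library’s code: it swaps in the standard library’s `urllib.parse` for hyperlink so the example has no dependencies, the function name `classify_flickr_url` is made up, and it only covers the three URL types shown in the examples above.

```python
import re
from urllib.parse import urlsplit


def classify_flickr_url(url):
    """Say what a Flickr URL points at, for a few common patterns."""
    parts = urlsplit(url)
    if parts.netloc not in ("flickr.com", "www.flickr.com"):
        raise ValueError(f"Not a flickr.com URL: {url!r}")

    # Break the path into its non-empty components.
    segments = [s for s in parts.path.split("/") if s]

    if len(segments) >= 2 and segments[0] == "photos":
        user_url = f"https://www.flickr.com/photos/{segments[1]}"
        rest = segments[2:]

        # Does it look like a single photo? /photos/<user>/<photo id>
        if len(rest) == 1 and rest[0].isdigit():
            return {"type": "single_photo", "photo_id": rest[0]}

        # Or an album? /photos/<user>/albums/<album id>
        if len(rest) == 2 and rest[0] == "albums" and rest[1].isdigit():
            # This pattern has no /pageN suffix, so it's always page 1.
            return {
                "type": "album",
                "user_url": user_url,
                "album_id": rest[1],
                "page": 1,
            }

        # Or a photostream? /photos/<user> or /photos/<user>/page<N>
        if not rest or (len(rest) == 1 and re.fullmatch(r"page\d+", rest[0])):
            return {"type": "user", "user_url": user_url}

    raise ValueError(f"Unrecognised Flickr URL: {url!r}")


print(classify_flickr_url("https://www.flickr.com/photos/sdasmarchives/50567413447"))
```

Each new URL variant becomes one more pattern in the chain, and anything that matches none of them is rejected; the sketch never guesses.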

You can run this library as a command-line tool, or call it from Python – there are instructions in the GitHub README.

There are lots of URL variants

In my second week and beyond, I started to discover more variants, which should probably be expected in 20-year-old software! I’ve been looking into collections of Flickr URLs that have been built up over multiple years, and although most of these URLs follow common patterns, there are lots of unusual variants in the long tail.

Some of these are pretty simple. For example, the URL to a user’s photostream can be formed using your Flickr user NSID or your path alias, so flickr.com/photos/197130754@N07/ and flickr.com/photos/flickrfoundation/ point to the same page.

Others are more complicated, and you can trace the history of Flickr through some of the older URLs. Some of my favorites include:

  • Raw JPEG files, on live.staticflickr.com, farm1.static.flickr.com, and several other subdomains.

  • Links with a .gne suffix, like www.flickr.com/photo_edit.gne?id=3435827496 (from Wikimedia Commons). This acronym stands for Game Neverending, the online game out of which Flickr was born.

  • A Flash video player called stewart.swf, which might be a reference to Stewart Butterfield, one of the cofounders of Flickr.
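As an illustration of the first of these, here is a rough sketch of pulling a photo ID out of a raw image URL. It assumes the filename starts with the photo ID, followed by an underscore, a secret, and an optional size suffix. The function name, the regular expression, and the example URLs in the usage note are made up for illustration, and the library's real handling may well differ.

```python
import re
from urllib.parse import urlsplit

# Assumed filename layout: <photo id>_<secret>, an optional _<size suffix>,
# then the file extension.
STATIC_FILENAME = re.compile(
    r"^(?P<photo_id>[0-9]+)_[0-9a-f]+(_[a-z0-9]+)?\.(jpg|png|gif)$"
)


def photo_id_from_static_url(url):
    """Return the photo ID from a raw image URL, or None if it doesn't match."""
    parts = urlsplit(url)
    host = parts.hostname or ""

    # Only look at the image-serving subdomains, e.g. live.staticflickr.com
    # or farm1.static.flickr.com.
    if not host.endswith((".staticflickr.com", ".static.flickr.com")):
        return None

    # The filename is the last segment of the path.
    filename = parts.path.rsplit("/", 1)[-1]
    match = STATIC_FILENAME.match(filename)
    return match.group("photo_id") if match else None
```

With a made-up URL such as https://live.staticflickr.com/65535/50567413447_0123456789_b.jpg, this returns "50567413447". A regular photo page URL returns None, because its hostname isn't one of the image subdomains.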

I’ve added support for every variant of Flickr URL to the parsing library – if you want to see a complete list, check out the tests. I need over a hundred tests to check all the variants are parsed correctly.

Where we’re using it

I’ve been able to reuse this parsing code in a bunch of different projects, including:

  • Building a similar “get photos at this URL” interface in Flickypedia.

  • Looking for Flickr photo URLs in Wikimedia Commons. This is for detecting Flickr photos which have already been uploaded to Commons, which I’ll describe more in another post.

  • Finding Flickr pages which have been captured in the Wayback Machine – I can get a list of saved Flickr URLs, and then see what sort of pages have actually been saved.

When I created the library, I wasn’t sure if this code was actually worth extracting as a standalone package – would I use it again, or was this a premature abstraction?

Now that I’ve seen more of the diversity of Flickr URLs and found more uses for this code, I’m much happier with the decision to abstract it into a standalone library. Now we only need to add support for each new URL variant once, and then all our projects can benefit.

If you want to try the Flickr URL parser yourself, all the code is open source on GitHub.

On the way to 100 years of Flickr

A report on archival strategies

By Ashley Kelleher Skjøtt

Flickr is an important piece of social history that pioneered user-driven curation, through folksonomic tags and through a publicly-accessible platform at scale, crystallising the web 2.0 internet. Applying tags to one’s own images and those of others, Flickr’s users significantly contributed to the emergence of commons culture. These collective practices became a core tenet of Flickr’s design ethos as a platform, decentralising and democratising the role of curation.

Of course, Flickr was not alone in pioneering this—hashtags and social sharing on other platforms added momentum to the general shift which was overall democratising by giving users agency over what they shared, experienced, and categorised. This shift in curatorial agency is just one aspect of Flickr’s significance as a living piece of social history.

Flickr continues to be one of the largest public collections of photographs on the planet, comprising tens of billions of images. Flickr celebrated its 20th birthday in February 2024. The challenge of archiving Flickr at scale, then, perhaps becomes about designing processes for preservation which can also be decentralised.

In August 2023, I learnt from a dear friend and colleague, Dan Pett, that the Flickr Foundation, newly based in London, was beginning to build an innovative archival practice for the platform. With my interest in digital cultural memory systems, an interest for which I have moved continents, I was determined to contribute in some way to the Foundation’s new goal. After exploring and discussing the space with George Oates, Director of the Flickr Foundation, we agreed that a practice-based information-gathering exercise could be useful in building up an understanding of such a practice.

So, what would an archive for Flickr look like?

Flickr is a living social media environment, with up to 25 million images uploaded each day. The reality of the company’s being acquired by a number of different parent companies over the course of its 20-year lifetime—already a remarkable timespan by social media standards—additionally brings to the forefront a stark case for working to ensure the availability of its contents into the long future. This is a priority shared today between Flickr itself and the new Flickr Foundation.

I have prepared a report of findings, written over a deliberately slow period, which aims to present a colloquial yet current answer to the question of archival practice for Flickr as a unique case, both in its scale and in defining what should be prioritised for preservation. Presuming that the platform is not invulnerable to media obsolescence, what on earth (or space) should an archive preserving the best of Flickr look like today? The work of asking this question again and again through the days, months, years, and decades to come leads us to the Foundation’s own question: what does it look like to ensure Flickr lasts for one hundred years?

REPORT: 20 Years of Flickr: Archiving the Living Environment

This information-gathering exercise consisted of seven interviews with sector peers across a wide range of practice, from academia to a small company, to a global design practice and within the museum world. My sincere thanks to:

  • Alex Seville (Head of Flickr),
  • Cass Fino-Radin (Small Data Industries),
  • Richard Palmer (V&A Museum),
  • Annet Dekker (University of Amsterdam),
  • Jenny Basford (British Library),
  • Matthew Hoerl (Arch Mission Foundation), and
  • Julie May (Bjarke Ingels Group)

Many thanks to them all for taking the time to generously share their thoughts on the prospect, reflections on their own work, and expertise in the area.

The report sets out to define the value of what should be preserved for Flickr, as (1) a social platform, (2) a network-driven community, (3) a collection of uniquely user-generated metadata, and (4) an invaluable image collection, specifically of photography. It then proceeds through a discussion of risks identified through the course of interviews. Finally, it proceeds through ten identified areas of practice which can be addressed in the Foundation’s archival plan, divided into long- and short-term initiatives. The report closes with six recommendations for the present.

An archive for Flickr which honours its considerable legacy should be created in the same vein. One interviewee reflected that the work of the archivist is to select what to preserve. This is, effectively, curation – the curation of archival material. It follows, then, that if a central innovation of Flickr as a platform was to democratise the application of curatorial tools – enabling tags as metadata based in natural language, at scale – then the approach to archiving such a platform should follow this model in allowing its selection to be driven by users. What about a “preserve” tag?

Thanks to Flickr and other internet pioneers, this is far from any kind of revolutionary idea – and is one worth creating an archival practice around, so that coming generations can access the stories we want to tell about Flickr: the story of the internet, of the commons, of building open structures to find new images and of what it means to be a community, online.

On being a research fellow

By Jenn Phillips-Bacher

A half-year of fellowing

Six months is not a very long time when you’re working with a team whose ambition is to secure a vast digital archive for the next 100 years. That’s only two quarters of a single calendar year; twelve two-week sprints, if you’re counting in agile time. It’s said that it takes around six months to feel competent in a new job. Six months is both no time at all and just enough time to get a grasp of your work environment. 

Now it’s time for me, at the end of these six months, to look back at what I’ve done as the Flickr Foundation’s first Research Fellow. I’d like to share some reflections on my time here: what it was like to be a fellow, what I accomplished, and four crucial work-life lessons. 

How could I be a researcher?

When I joined Flickr Foundation, I was curious but apprehensive about what I could deliver for the organization. I didn’t come to this fellowship with an existing research practice. My last run of ‘proper’ education was at the turn of the century! 

The thing about being the first is that there is no precedent, no blueprint I can follow to be a good fellow. And what flavor of research should I even be doing? I was open to the fellowship taking its own course, and my research and other day-to-day activities were largely led by the team’s 2024 plan. Having an open agenda meant that I could steer myself where I would be most useful, with an emphasis on the Flickr Commons and Content Mobility programs. 

My mode of research followed my experience working in product management: conduct discovery and design research, synthesize it, and then extract and share insights that inform Flickr Foundation’s next steps. I had the freedom to adopt a research approach that clicked for me.

New perspectives, old inclinations

My main motivation in this fellowship was to get fresh perspectives on work I’d been doing for several years; that is, building digital services to help people find and use cultural heritage collections. I’d been doing it first as a librarian, then through managing several projects and products within the GLAM sector.

When working as a Product Manager on a multidisciplinary team within a large company, professional development was tightly tied to my role. When my daily focus was on team dynamic and delivering against a quarterly and annual plan, my self-development was geared toward optimizing those things. I didn’t get much time to look into the more distant future while working to shorter timelines. I relished the idea that I’d get some long-overdue, slow thinking time during my fellowship.

But I’ll be the first to admit that I couldn’t shake the habit of being in delivery mode. I was most drawn to practical, problem-solving work that was well within my comfort zone, like mapping an improved workflow for new members to register for Flickr Commons, or doing a content audit and making recommendations for improvements on flickr.org. 

I stretched myself in other ways, particularly in my writing practice. I won’t claim to be prolific in publishing words to screen, but I wrote several things for Flickr Foundation, including: 

In the midst of all the doing, I read. A lot. (Here’s some of what I covered). Some days it felt like I was spending all day looking at text and video, which is both a luxury and a frustration. I felt an urge to put new knowledge into immediate practice. But when this knowledge is situated in the context of a 100-year plan, ‘immediate’ isn’t necessary. It’s a mindset shift that can challenge anyone’s output-focussed ways of working.

“Aim low, succeed often” and other career lessons 

I want to conclude with four important takeaways from my fellowship. I’m confident they’ll find their way into the next stage of my career. Thank you to George Oates, Alex Chan, Jessamyn West, Ewa Spohn and all of the people I met through the Flickr Foundation for being part of this learning journey.

All technology is about people

Whether being a subject of data, being subjected to a technology, or even being the worker behind the tech, it’s always about people.

Take the Data Lifeboat project as an illustration. If the idea is to decouple Flickr photos from the flickr.com context in order to preserve them for the long term, you have to think about the people in the past who took the photos and their creative rights. What about the people depicted in the photos – did they consent? What would they think about being part of an archive? And what about the institutions who might be a destination for a Data Lifeboat? Guess what, they’re totally made of people who work in complex organizations with legacies, guidelines and policies and strategies of their own (and those governance structures are made by people too).

To build technology responsibly is a human endeavor, not a mere technical problem. We owe it to each other to do it well.

Slow down, take a longer view

One of the team mottos is “Aim low, succeed often”; meaning, take on only what feels sustainable and take your time to make it achievable. We’re working to build an organization that’s meant to steward our photographic heritage for a century and beyond. We can’t solve everything at once. 

This was a much needed departure from the hamster-wheel of agile product delivery and the planning cycles that can accompany it. But it also fits nicely into product-focussed ways of working – in fact, it’s the ideal. Break down the hard stuff into smaller, achievable aims but have that long-term vision in mind. Work sustainably.

A strong metaphor brings people along

How do you describe a thing that doesn’t exist yet? If you want to shape a narrative for a new product or service, a strong metaphor is your friend. Simple everyday concepts that everyone can understand are perfect framing devices for novel work that’s complex and ambiguous. 

The Data Lifeboat project is held afloat with metaphor. We dock, we build a network of Safe Harbors. If you’re a visual kind of person, the metaphor draws its own picture. Metaphor is helping us to navigate something that’s not perfectly defined.

Meeting culture is how you meet

Everyone in almost every office-based workplace complains about all of the meetings. Having spent time in a small office with three other Flickr Foundation staff, I’ve learned that meeting culture is just how you meet. If you sit around a table and have coffee while talking about what you’re working on, that’s a meeting. It’s been a joy to break out of the dominant, pre-scheduled, standing one-hour meeting culture. Let’s go for a walk. Let’s figure out the best way to get stuff done together. 

This is me, captured by George at a Flickr Photowalk in London.

How does the Commons Explorer work?

Last week we wrote an introductory post about our new Commons Explorer; today we’re diving into some of the technical details. How does it work under the hood?

When we were designing the Commons Explorer, we knew we wanted to look across the Commons collection – we love seeing a mix of photos from different members, not just one account at a time. We wanted to build more views that emphasize the breadth of the collection, and help people find more photos from more members.

We knew we’d need the Flickr API, but it wasn’t immediately obvious how to use it for this task. The API exposes a lot of data, but it can only query the data in certain ways.

For example, we wanted the homepage to show a list of recent uploads from every Flickr Commons member. You can make an API call to get the recent uploads for a single user, but there’s no way to get all the uploads for multiple users in a single API call. We could make an API call for every member, but with over 100 members we’d be making a lot of API calls just to render one component of one page!

It would be impractical to fetch data from the API every time we render a page – but we don’t need to. We know that there isn’t that much activity in Flickr Commons – it isn’t a social media network with thousands of updates a second – so rather than get data from the API every time somebody loads a page, we decided it’s good enough to get it once a day. We trade off a bit of “freshness” for a much faster and more reliable website.

We’ve built a Commons crawler that runs every night, and makes thousands of Flickr API calls (within the API’s limits) to populate a SQLite database with all the data we need to power the Commons Explorer. SQLite is a great fit for this sort of data – it’s easy to run, it gives us lots of flexibility in how we query the data, and it’s wicked fast with the size of our collection.

There are three main tables in the database:

  • The members
  • The photos uploaded by all the members
  • The comments on all those photos
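To picture how those three tables might fit together, here is a small sketch using Python's built-in sqlite3 module. The table and column names are illustrative guesses, not the Commons Explorer's real schema.

```python
import sqlite3

# Hypothetical schema: the table and column names are illustrative guesses,
# not the Commons Explorer's real ones.
SCHEMA = """
CREATE TABLE members (
    id TEXT PRIMARY KEY,
    name TEXT NOT NULL,
    description TEXT
);
CREATE TABLE photos (
    id TEXT PRIMARY KEY,
    member_id TEXT NOT NULL REFERENCES members(id),
    title TEXT,
    date_uploaded TEXT NOT NULL,
    count_comments INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE comments (
    id TEXT PRIMARY KEY,
    photo_id TEXT NOT NULL REFERENCES photos(id),
    author_name TEXT,
    text TEXT NOT NULL
);
"""


def create_database(path=":memory:"):
    """Create an empty database containing the three tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

Each photo points at the member who uploaded it, and each comment points at its photo, so a single query can join across all three.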

We’re using a couple of different APIs to get this information:

  • The flickr.commons.getInstitutions API gives us a list of all the current Commons members. We combine this with the flickr.people.getInfo API to get more detailed information about each member (like their profile page description).
  • The flickr.people.getPhotos API gives us a list of all the photos in each member’s photostream. This takes quite a while to run – it returns up to 500 photos per call, but there are over 1.8 million photos in Flickr Commons.
  • The flickr.photos.comments.getList API gives us a list of all the comments on a single photo. To save us calling this 1.8 million times, we have some logic to check if there are any (new) comments since the last crawl – we don’t need to call this API if nothing has changed.
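Some back-of-the-envelope arithmetic shows why the crawl takes a while, and why skipping unchanged photos matters. The sketch below is an assumption-laden illustration rather than the crawler's real logic: the helper names are made up, and comparing stored comment counts is just one plausible way to decide whether anything has changed since the last crawl.

```python
import math

PHOTOS_PER_CALL = 500  # the maximum page size mentioned above


def min_photo_list_calls(total_photos, per_call=PHOTOS_PER_CALL):
    """Lower bound on the calls needed to list every photo once."""
    return math.ceil(total_photos / per_call)


def photos_needing_comment_fetch(previous_counts, current_counts):
    """Return the IDs of photos whose comment count changed since the last crawl.

    Both arguments map photo ID -> comment count. Comparing counts is an
    assumed stand-in for the crawler's real "anything new?" check.
    """
    return [
        photo_id
        for photo_id, count in current_counts.items()
        if count != previous_counts.get(photo_id, 0)
    ]
```

With 1.8 million photos at 500 per call, min_photo_list_calls(1_800_000) works out to 3,600 calls just to list the photos. The real number will be somewhat higher, because each member's photostream is paged separately. The second helper only returns photos whose counts differ, so a quiet night needs very few comment calls.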

We can then write SQL queries to query this data in interesting ways, including searching photos and comments from every member at once.
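For instance, the "recent uploads from every member" list that would need one API call per member becomes a single query once the photos are in one table. This is a sketch against a made-up photos table with made-up rows, so the column names and the index are assumptions rather than the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE photos ("
    "id TEXT PRIMARY KEY, member_id TEXT, title TEXT, date_uploaded TEXT)"
)
# An index on the sort column lets SQLite read rows in date order instead of
# sorting the whole table for every request.
conn.execute("CREATE INDEX idx_photos_date_uploaded ON photos (date_uploaded)")

# Made-up rows standing in for crawled data.
conn.executemany(
    "INSERT INTO photos VALUES (?, ?, ?, ?)",
    [
        ("1", "member-a", "Harbour at dawn", "2024-05-01"),
        ("2", "member-b", "Street market", "2024-05-03"),
        ("3", "member-a", "Railway station", "2024-05-02"),
        ("4", "member-c", "Lighthouse", "2024-04-28"),
    ],
)


def recent_uploads(conn, limit):
    """Return the newest photos across every member as (id, member_id, title)."""
    cursor = conn.execute(
        "SELECT id, member_id, title FROM photos "
        "ORDER BY date_uploaded DESC LIMIT ?",
        (limit,),
    )
    return cursor.fetchall()
```

Calling recent_uploads(conn, 3) returns the three newest rows regardless of which member uploaded them.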

We have a lightweight Flask web app that queries the SQLite database and renders the results as nice HTML pages. This is what you see when you browse the website at https://commons.flickr.org/.

We have a couple of pages where we call the Flickr API to get the most up-to-date data (on individual member pages and the cross-Commons search), but most of the site is coming from the SQLite database. After fine-tuning the database with a couple of indexes, it’s now plenty fast, and gives us a bunch of exciting new ways to explore the Commons.

Having all the data in our own database also allows us to learn new stuff about the Flickr Commons collection that we can’t see on Flickr itself – like the fact that it has 1.8 million photos, or that Flickr Commons as a whole has had 4.4 billion views.

This crawling code has been an interesting test bed for another project – we’ll be doing something very similar to populate a Data Lifeboat, but we’ll talk more about that in a separate post.